ASCII Printable Characters-Based Chinese Character Encoding for Internet Messages
RFC 1842 ASCII/Chinese Character Encoding August 1995
2. Description
For an arbitrary mixed text with both Chinese coded text strings and
ASCII text strings, we designate two distinguishable text modes,
ASCII mode and HZ mode, as the only two states allowed in the text.
At any given time, the text is in one of these two modes or in the
transition from one to the other. In HZ mode, only printable ASCII
characters (0x21-0x7E) are meaningful, and the basic text unit is
two bytes long.
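This section states that an HZ-mode unit is two printable ASCII bytes but does not spell out how a GB2312 character maps onto them. The Python sketch below assumes the usual HZ relation: each byte is the corresponding GB2312 (EUC-CN) byte with its high bit cleared. The helper names hz_pair and gb_char are hypothetical; only the standard "gb2312" codec is real.

```python
# Sketch only. ASSUMPTION (not stated in this section): an HZ-mode unit is the
# two-byte GB2312 (EUC-CN) code of a character with the high bit of each byte
# cleared. hz_pair and gb_char are hypothetical helper names.

def hz_pair(ch: str) -> bytes:
    """Return the two printable ASCII bytes that stand for `ch` in HZ mode."""
    gb = ch.encode("gb2312")  # raises UnicodeEncodeError if ch is not in GB2312
    if len(gb) != 2:
        raise ValueError(f"{ch!r} is not a double-byte GB2312 character")
    return bytes(b & 0x7F for b in gb)  # 0xA1-0xFE -> 0x21-0x7E


def gb_char(pair: bytes) -> str:
    """Inverse of hz_pair: restore the high bits and decode as GB2312."""
    if len(pair) != 2 or not all(0x21 <= b <= 0x7E for b in pair):
        raise ValueError("an HZ-mode unit is exactly two bytes in 0x21-0x7E")
    return bytes(b | 0x80 for b in pair).decode("gb2312")


ch = gb_char(b"<:")     # a two-byte unit taken from the HZ examples
print(ch, hz_pair(ch))  # the character, then its unit again: b'<:'
```

Because clearing the high bit maps the GB2312 byte range 0xA1-0xFE onto 0x21-0x7E, both bytes of every unit stay within the printable range this section requires.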
In ASCII mode, the basic text unit is one (1) byte, with the
exception of '~~', which is the special sequence representing the
ASCII character '~'. In both ASCII mode and HZ mode, '~' introduces
an escape sequence. However, because the basic text unit in HZ mode
is 2 bytes long, a '~' is considered the start of an escape sequence
only when it appears as the first byte of a two-byte character frame.
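The framing rule can be illustrated with a frame-aligned scan for the '~}' that ends an HZ-mode segment. This is a sketch: the function name end_of_hz_segment is hypothetical, and the sample bytes are chosen only to show the alignment effect; they are not meant to be valid GB2312 text.

```python
# Sketch of the framing rule: inside HZ mode the input is consumed two bytes at
# a time, and '~' starts an escape only as the FIRST byte of a frame.
# end_of_hz_segment is a hypothetical helper name.

def end_of_hz_segment(data: bytes, start: int) -> int:
    """Return the index of the '~}' that closes the HZ segment whose first
    unit begins at `start` (just after '~{'), or -1 if it is never closed."""
    i = start
    while i + 1 < len(data):
        if data[i:i + 2] == b"~}":  # '~' sits at the first byte of this frame
            return i
        i += 2                      # advance by a whole frame, never by one byte
    return -1


# Bytes chosen only to exercise alignment (not meant as real GB2312 text):
# the frames are b"0~", b"}A", b"~}", so the '~}' at index 1 straddles two
# frames and is not an escape sequence.
sample = b"0~}A~}"
print(sample.find(b"~}"))            # naive byte search: 1
print(end_of_hz_segment(sample, 0))  # frame-aligned search: 4
```

A plain substring search would end the segment one byte into a character; stepping by whole frames is what keeps the decoder synchronized.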
The default mode is ASCII mode, and each line of text starts in the
default ASCII mode. Therefore, every Chinese character string must be
enclosed in a '~{' and '~}' pair within the same text line.
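A minimal encoder sketch that follows this per-line rule: each maximal run of GB2312 characters is wrapped in '~{' ... '~}' within its own line, and a literal '~' in ASCII text is written as '~~'. It assumes the input contains only ASCII and GB2312 characters and that an HZ-mode unit is the GB2312 code with the high bit of each byte cleared; encode_hz and encode_hz_line are hypothetical names.

```python
# Minimal HZ encoder sketch. ASSUMPTIONS: the input holds only ASCII and
# GB2312 characters, and an HZ-mode unit is the GB2312 (EUC-CN) code with the
# high bit of each byte cleared. encode_hz / encode_hz_line are hypothetical.

def encode_hz_line(line: str) -> bytes:
    """Encode one line; every '~{' opened here is closed before the line ends."""
    out = bytearray()
    in_hz = False
    for ch in line:
        if ord(ch) < 0x80:                  # ASCII: one-byte unit
            if in_hz:
                out += b"~}"                # leave HZ mode
                in_hz = False
            out += b"~~" if ch == "~" else ch.encode("ascii")
        else:                               # GB2312: two-byte unit
            if not in_hz:
                out += b"~{"                # enter HZ mode
                in_hz = True
            out += bytes(b & 0x7F for b in ch.encode("gb2312"))
    if in_hz:
        out += b"~}"                        # each line must end in ASCII mode
    return bytes(out)


def encode_hz(text: str) -> bytes:
    # Every line starts in the default ASCII mode, so lines are encoded independently.
    return b"\n".join(encode_hz_line(line) for line in text.split("\n"))


print(encode_hz("Cost ~ 5\n" + b"~{<:Ky~}".decode("hz")))
```

Closing HZ mode at the end of every line means a reader can start decoding at any line boundary without knowing the state of earlier lines.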
The escape sequences defined are as follows:
~{ ---- escape from ASCII mode to GB2312 HZ mode
~} ---- escape from HZ mode to ASCII mode
~~ ---- ASCII character '~' in ASCII mode
~\n ---- line continuation in ASCII mode
~[!-z|] ---- reserved for future HZ mode character sets
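The escape sequences listed above translate into a small two-state decoder. This sketch (hypothetical name decode_hz) treats GB2312 as the only HZ-mode character set, rejects the reserved '~[!-z|]' sequences and any other unlisted use of '~', and assumes an HZ-mode unit is the GB2312 code with the high bit of each byte cleared. Python's standard library also ships an "hz" codec, which the checks use as a cross-reference.

```python
# Minimal HZ decoder sketch for the escape sequences listed above.
# ASSUMPTIONS: GB2312 is the only HZ-mode character set, an HZ-mode unit is
# the GB2312 code with the high bit of each byte cleared, and the reserved
# '~[!-z|]' sequences (or any other unlisted '~' use) are treated as errors.
# decode_hz is a hypothetical name.

def decode_hz(data: bytes) -> str:
    out: list[str] = []
    i, in_hz = 0, False
    while i < len(data):
        if not in_hz:                              # ASCII mode: 1-byte units
            b = data[i]
            if b == 0x7E:                          # '~' introduces an escape
                nxt = data[i + 1:i + 2]
                if nxt == b"~":
                    out.append("~")                # '~~' -> literal '~'
                elif nxt == b"{":
                    in_hz = True                   # '~{' -> GB2312 HZ mode
                elif nxt == b"\n":
                    pass                           # '~\n' -> line continuation
                else:
                    raise ValueError(f"unsupported escape at byte {i}")
                i += 2
            elif b < 0x80:
                out.append(chr(b))                 # plain 7-bit ASCII byte
                i += 1
            else:
                raise ValueError(f"8-bit byte at {i}")
        else:                                      # HZ mode: 2-byte frames
            frame = data[i:i + 2]
            if frame == b"~}":
                in_hz = False                      # '~}' -> back to ASCII mode
            elif len(frame) == 2 and all(0x21 <= c <= 0x7E for c in frame):
                # unassigned codes raise UnicodeDecodeError, a ValueError subclass
                out.append(bytes(c | 0x80 for c in frame).decode("gb2312"))
            else:                                  # e.g. a newline inside HZ mode
                raise ValueError(f"invalid HZ-mode unit at byte {i}")
            i += 2
    return "".join(out)


print(decode_hz(b"a~~b ~{<:Ky~} c"))
```

Note that '~~' and '~\n' are honored only in ASCII mode, matching the list above, so a line break inside an unclosed HZ segment is reported as an error rather than silently accepted.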
A few examples of the 7-bit representation of Chinese GB-coded text,
taken directly from [Lee89], are listed below:
Example 1: (Suppose there is no line size limit.)
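This sentence is in ASCII.
The next sentence is in GB.~{<:Ky2;S{#,NpJ)l6HK!#~}Bye.
Example 2: (Suppose the maximum line size is 42.)
This sentence is in ASCII.
The next sentence is in GB.~{<:Ky2;S{#,~}~
~{NpJ)l6HK!#~}Bye.
Example 3: (Suppose a new line is started for every mode switch.)
This sentence is in ASCII.
The next sentence is in GB.~
~{<:Ky2;S{#,NpJ)l6HK!#~}~
Bye.
Wei, et. al. Informational [Page 3]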
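When a maximum line size is imposed, a long line has to be continued with '~\n'. Because that continuation is defined only in ASCII mode and every line starts in ASCII mode, an open HZ segment must be closed with '~}' before the continuation and reopened with '~{' on the next line. The greedy wrapper below is one hypothetical way to do this for a single logical line (wrap_hz_line is not from the RFC); it reserves room for the closing bytes after every unit so that a break is always possible. It does not implement the alternative policy of starting a new line at every mode switch, and it assumes the high-bit-cleared GB2312 mapping for HZ-mode units.

```python
# Greedy line-wrapping sketch. ASSUMPTIONS: input is ASCII plus GB2312 only,
# HZ units are GB2312 bytes with the high bit cleared, and wrap_hz_line is a
# hypothetical helper that handles one logical line (no newline inside).

def wrap_hz_line(text: str, max_len: int) -> list[str]:
    """Encode one logical line as HZ, split into physical lines of at most
    max_len bytes joined by '~' + newline continuations."""
    if max_len < 7:  # '~{' + one 2-byte unit + '~}~' must fit on a line
        raise ValueError("max_len must be at least 7")
    lines: list[str] = []
    cur, in_hz = "", False
    for ch in text:
        needs_hz = ord(ch) >= 0x80
        if needs_hz:
            unit = bytes(b & 0x7F for b in ch.encode("gb2312")).decode("ascii")
        else:
            unit = "~~" if ch == "~" else ch
        if needs_hz and not in_hz:
            switch = "~{"                     # enter HZ mode
        elif in_hz and not needs_hz:
            switch = "~}"                     # leave HZ mode
        else:
            switch = ""
        tail = 3 if needs_hz else 1           # room kept for '~}~' or '~'
        if len(cur) + len(switch) + len(unit) + tail > max_len:
            # close HZ mode if needed, then continue with '~' + newline
            lines.append(cur + ("~}~" if in_hz else "~"))
            cur, in_hz = "", False            # the next line starts in ASCII mode
            switch = "~{" if needs_hz else ""
        cur += switch + unit
        in_hz = needs_hz
    lines.append(cur + ("~}" if in_hz else ""))  # final line ends in ASCII mode
    return lines


# Sample text built from the HZ bytes of the examples via Python's "hz" codec.
sample = b"The next sentence is in GB.~{<:Ky2;S{#,NpJ)l6HK!#~}Bye.".decode("hz")
for physical in wrap_hz_line(sample, 42):
    print(physical)
```

With a 42-byte limit this sketch splits the sample into "The next sentence is in GB.~{<:Ky2;S{#,~}~" and "~{NpJ)l6HK!#~}Bye.", breaking inside the Chinese run.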