ASCII Printable Characters-Based Chinese Character Encoding for Internet Messages




RFC 1842            ASCII/Chinese Character Encoding         August 1995


2. Description

   For an arbitrary mixed text containing both Chinese coded text
   strings and ASCII text strings, we designate two distinguishable text
   modes, ASCII mode and HZ mode, as the only two states allowed in the
   text. At any given time, the text is in one of these two modes or in
   transition from one to the other. In HZ mode, only printable ASCII
   characters (0x21-0x7E) are meaningful, and the basic text unit is
   two bytes long.

   In ASCII mode, the basic text unit is one (1) byte long, with the
   exception of '~~', the special sequence that represents the ASCII
   character '~'. In both ASCII mode and HZ mode, '~' introduces an
   escape sequence. However, because the basic text unit in HZ mode is
   two bytes long, only a '~' that appears in the first byte of a
   two-byte character frame is considered the start of an escape
   sequence.
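
   The framing rule can be sketched as a small scanner. This is an
   illustrative sketch only: the function name split_units and the
   choice to raise ValueError on anything unexpected are assumptions,
   and the line continuation escape is left out, so the input is taken
   to be a single line without its newline.

```python
def split_units(line: bytes) -> list[tuple[str, bytes]]:
    """Split one line into (mode, unit) pairs.

    ASCII mode yields one-byte units ('~~' yields a single '~').
    HZ mode yields two-byte units. '~{' and '~}' switch modes and are
    not emitted. Hypothetical helper, not part of the specification.
    """
    units = []
    mode = "ASCII"
    i = 0
    while i < len(line):
        if mode == "ASCII":
            if line[i:i + 1] == b"~":
                esc = line[i:i + 2]
                if esc == b"~~":
                    units.append(("ASCII", b"~"))
                elif esc == b"~{":
                    mode = "HZ"
                else:
                    raise ValueError(f"unsupported escape {esc!r}")
                i += 2
            else:
                units.append(("ASCII", line[i:i + 1]))
                i += 1
        else:
            frame = line[i:i + 2]
            if len(frame) < 2:
                raise ValueError("truncated two-byte frame")
            if frame[:1] == b"~":
                # '~' in the FIRST byte of a frame starts an escape.
                if frame != b"~}":
                    raise ValueError(f"unsupported escape {frame!r}")
                mode = "ASCII"
            else:
                # A '~' in the second byte is ordinary data.
                units.append(("HZ", frame))
            i += 2
    return units


units = split_units(b"a~~~{!~~}b")
```

   In this input, the '~' that follows '!' sits in the second byte of a
   frame, so it is kept as data, and the next frame '~}' ends HZ mode.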

   The default mode is ASCII mode, and each line of text starts in
   ASCII mode. Therefore, every Chinese character string must be
   enclosed in a '~{' and '~}' pair within the same text line.
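
   A minimal encoder sketch shows the per-line rule. It assumes,
   beyond what this section states, that a two-byte HZ unit is the GB
   2312 code with the high bit of each byte cleared, and it relies on
   Python's built-in "gb2312" codec. The names hz_encode_line and
   hz_encode are hypothetical.

```python
def hz_encode_line(text: str) -> bytes:
    """Encode one line so that it starts and ends in ASCII mode."""
    out = bytearray()
    in_hz = False
    for ch in text:
        if ord(ch) < 0x80:
            if in_hz:
                out += b"~}"          # return to ASCII mode
                in_hz = False
            # A literal '~' is written as '~~' in ASCII mode.
            out += b"~~" if ch == "~" else ch.encode("ascii")
        else:
            # Raises UnicodeEncodeError for characters outside GB 2312.
            gb = ch.encode("gb2312")
            if not in_hz:
                out += b"~{"          # enter HZ mode
                in_hz = True
            # Assumption: HZ bytes are GB 2312 bytes with bit 7 cleared.
            out += bytes(b & 0x7F for b in gb)
    if in_hz:
        out += b"~}"                  # never leave a line in HZ mode
    return bytes(out)


def hz_encode(text: str) -> bytes:
    """Encode multi-line text, closing HZ mode before every newline."""
    return b"\n".join(hz_encode_line(line) for line in text.split("\n"))


one_line = hz_encode("\u4e2d\u6587")       # two Chinese characters
two_lines = hz_encode("\u4e2d\n\u6587")    # same characters, split by a newline
```

   Because each line is encoded independently, a Chinese string that
   spans a newline is closed with '~}' and reopened with '~{' on the
   next line.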

   The following escape sequences are defined:

        ~{       ---- escape from ASCII mode to GB2312 HZ mode
        ~}       ---- escape from HZ mode to ASCII mode
        ~~       ---- ASCII character '~' in ASCII mode
        ~\n      ---- line continuation in ASCII mode
        ~[!-z|]  ---- reserved for future HZ mode character sets
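
   The table can be read as a lookup on the byte that follows '~'. The
   sketch below is illustrative: the function name classify_escape and
   the returned labels are assumptions, and it does not check whether
   the sequence is legal in the current mode.

```python
def classify_escape(seq: bytes) -> str:
    """Return a label for a two-byte sequence that starts with '~'."""
    if len(seq) != 2 or seq[:1] != b"~":
        raise ValueError("an escape sequence is '~' plus one byte")
    second = seq[1]
    if second == ord("{"):
        return "enter-hz"            # ASCII mode -> GB2312 HZ mode
    if second == ord("}"):
        return "leave-hz"            # HZ mode -> ASCII mode
    if second == ord("~"):
        return "literal-tilde"       # ASCII '~' in ASCII mode
    if second == ord("\n"):
        return "line-continuation"   # ASCII mode only
    # '!' (0x21) through 'z' (0x7A), plus '|' (0x7C), are reserved.
    if 0x21 <= second <= 0x7A or second == ord("|"):
        return "reserved"
    return "undefined"


labels = [classify_escape(s) for s in (b"~{", b"~}", b"~~", b"~\n", b"~|")]
```

   The reserved class covers every printable second byte except '{',
   '}' and '~', which already have assigned meanings.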


   A few examples of the 7-bit representation of Chinese GB coded text,
   taken directly from [Lee89], are listed below:

   Example 1:  (Suppose there is no line size limit.)
               This sentence is in ASCII.
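               The next sentence is in GB.~{<:Ky2;S{#,NpJ)l6HK!#~}Bye.

   Example 2:  (Suppose the maximum line size is 42.)
               This sentence is in ASCII.
               The next sentence is in GB.~{<:Ky2;S{#,~}~
               ~{NpJ)l6HK!#~}Bye.

   Example 3:  (Suppose a new line is started for every mode switch.)
               This sentence is in ASCII.
               The next sentence is in GB.~
               ~{<:Ky2;S{#,NpJ)l6HK!#~}~
               Bye.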
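
   As a cross-check, the sample sentence in its three layouts (no line
   limit, a 42-character limit, and a new line for every mode switch)
   can be passed to an existing decoder. The sketch below uses the "hz"
   codec that ships with CPython, which is not part of this document,
   and assumes that codec accepts the '~\n' line continuation. The
   variable names are hypothetical.

```python
# Second line of each example, with physical line breaks written as \n.
ex1 = b"The next sentence is in GB.~{<:Ky2;S{#,NpJ)l6HK!#~}Bye."
ex2 = b"The next sentence is in GB.~{<:Ky2;S{#,~}~\n~{NpJ)l6HK!#~}Bye."
ex3 = b"The next sentence is in GB.~\n~{<:Ky2;S{#,NpJ)l6HK!#~}~\nBye."

# Decode each layout with CPython's built-in HZ codec.
decoded = [raw.decode("hz") for raw in (ex1, ex2, ex3)]

# All three layouts should carry the same logical text.
same_text = decoded[0] == decoded[1] == decoded[2]

# Longest physical line in the second layout.
longest_ex2 = max(len(part) for part in ex2.split(b"\n"))
```

   If the codec behaves as assumed, same_text is True, since line
   continuations and extra mode switches change only the layout, not
   the decoded text.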