HZ - A Data Format for Exchanging Files of Arbitrarily Mixed Chinese and ASCII characters



RFC 1843        HZ - A Data Format for Exchanging Files      August 1995


2. Specification

   The HZ format is specified as follows.

   Without loss of generality, we assume that all Chinese characters
   (HanZi) have already been encoded in GB.  A GB (GB1 and GB2) code is
   a two byte code, where the first byte is in the range $21-$77
   (hexadecimal), and the second byte is in the range $21-$7E.

   A graphical ASCII character is a byte in the range $21-$7E. A non-
   graphical ASCII character is a byte in the range $0-$20 or of the
   value $7F.
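
   As an illustration only (not part of the specification), the byte
   ranges above can be written as simple predicates.  The Python
   sketch below uses hypothetical function names; only the numeric
   ranges are taken from the text.

```python
# Illustrative sketch; the function names are assumptions, the ranges
# are the ones stated in the text above.

def is_gb_code(b1: int, b2: int) -> bool:
    """True if (b1, b2) lies in the two-byte GB range $21-$77, $21-$7E."""
    return 0x21 <= b1 <= 0x77 and 0x21 <= b2 <= 0x7E


def is_graphical_ascii(b: int) -> bool:
    """True if b is a graphical ASCII character ($21-$7E)."""
    return 0x21 <= b <= 0x7E


def is_non_graphical_ascii(b: int) -> bool:
    """True if b is a non-graphical ASCII character ($0-$20 or $7F)."""
    return 0x00 <= b <= 0x20 or b == 0x7F
```

   Every byte of a GB code also satisfies is_graphical_ascii(), so a
   single byte cannot be classified without knowing the current mode.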

   Since the range of a graphical ASCII character overlaps that of a GB
   byte, a byte in the range $21-$7E is interpreted according to the
   mode it is in.  There are two modes, namely ASCII mode and GB mode.

   By convention, a non-graphical ASCII character should only appear in
   ASCII mode.

   The default mode is ASCII mode.

   In ASCII mode, a byte is interpreted as an ASCII character, unless a
   '~' is encountered. The character '~' is an escape character. By
   convention, it must be immediately followed ONLY by '~', '{' or '\n'
   (<LF>), with the following special meaning.

   o The escape sequence '~~' is interpreted as a '~'.
   o The escape-to-GB sequence '~{' switches the mode from ASCII to
     GB.
   o The escape sequence '~\n' is a line-continuation marker to be
     consumed with no output produced.

   In GB mode, characters are interpreted two bytes at a time as (pure)
   GB codes until the escape-from-GB code '~}' is read. This code
   switches the mode from GB back to ASCII.  (Note that the escape-
   from-GB code '~}' ($7E7D) is outside the defined GB range.)

   The decoding process is clear from the above description.
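
   For readers who prefer code, the decoding rules above can be
   sketched as a two-mode state machine.  This is a minimal Python
   sketch, not a reference implementation.  The name hz_decode is
   hypothetical.  Raising ValueError on malformed input is an
   assumption, since the text only states conventions.  Rejecting
   bytes above $7F in ASCII mode is likewise an assumption.  Each GB
   code is turned into text by setting the high bit of both bytes
   (the EUC-CN form) and using Python's "gb2312" codec.

```python
def hz_decode(data: bytes) -> str:
    """Decode HZ-encoded bytes to text (illustrative sketch)."""
    out = []
    gb_mode = False                      # the default mode is ASCII mode
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if not gb_mode:
            if b != 0x7E:                # ordinary byte in ASCII mode
                if b > 0x7F:             # assumption: reject non-ASCII
                    raise ValueError("non-ASCII byte in ASCII mode")
                out.append(chr(b))
                i += 1
                continue
            if i + 1 >= n:
                raise ValueError("'~' at end of input")
            nxt = data[i + 1]
            if nxt == 0x7E:              # '~~' yields a literal '~'
                out.append("~")
            elif nxt == 0x7B:            # '~{' switches to GB mode
                gb_mode = True
            elif nxt == 0x0A:            # '~\n' produces no output
                pass
            else:
                raise ValueError("invalid escape after '~'")
            i += 2
        else:
            if i + 1 >= n:
                raise ValueError("truncated two-byte unit in GB mode")
            b2 = data[i + 1]
            if b == 0x7E and b2 == 0x7D: # '~}' switches back to ASCII
                gb_mode = False
            elif 0x21 <= b <= 0x77 and 0x21 <= b2 <= 0x7E:
                # Set the high bits to obtain the EUC-CN byte pair.
                out.append(bytes([b | 0x80, b2 | 0x80]).decode("gb2312"))
            else:
                raise ValueError("byte pair outside the GB range")
            i += 2
    return "".join(out)
```

   For example, hz_decode(b"ab~{VPND~}cd~~") returns the string "ab",
   followed by the two HanZi U+4E2D and U+6587, followed by "cd~".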

   The encoding process is straightforward. Note that an (ASCII) '~' is
   always encoded as '~~'. A sequence of GB codes is enclosed in '~{'
   and '~}'.
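
   The encoding direction can be sketched in the same way.  The
   following Python sketch is illustrative only; the name hz_encode is
   hypothetical.  It assumes that every non-ASCII character of the
   input is available in Python's "gb2312" codec (otherwise that codec
   raises UnicodeEncodeError), and it obtains the GB code by clearing
   the high bit of each EUC-CN byte.  It never emits the
   line-continuation marker '~\n'.

```python
def hz_encode(text: str) -> bytes:
    """Encode text as HZ bytes (illustrative sketch)."""
    out = bytearray()
    gb_mode = False                      # the default mode is ASCII mode
    for ch in text:
        if ord(ch) < 0x80:
            if gb_mode:                  # leave GB mode before any ASCII
                out += b"~}"
                gb_mode = False
            # An ASCII '~' is always encoded as '~~'.
            out += b"~~" if ch == "~" else ch.encode("ascii")
        else:
            euc = ch.encode("gb2312")    # two bytes with the high bit set
            if not gb_mode:              # open a run of GB codes
                out += b"~{"
                gb_mode = True
            out += bytes(b & 0x7F for b in euc)
    if gb_mode:                          # close a trailing run of GB codes
        out += b"~}"
    return bytes(out)
```

   Consecutive HanZi share one '~{' ... '~}' pair, and any ASCII
   character, including a non-graphical one such as '\n', is written
   in ASCII mode, in line with the convention stated earlier.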








Lee                          Informational