HZ - A Data Format for Exchanging Files of Arbitrarily Mixed Chinese and ASCII characters




RFC 1843        HZ - A Data Format for Exchanging Files      August 1995


3. Remarks & Recommendations

   We choose to encode any ASCII character except '~' as it is, rather
   than as a two-byte code, and we choose ASCII as the default mode for
   the following reasons. The computer systems we use are ASCII based.
   An HZ file containing pure ASCII characters (i.e. no Chinese
   characters) except '~' is precisely a pure ASCII file. In general,
   the English (ASCII) portion of an HZ file is directly readable.
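
   As a minimal sketch of this property, the ASCII portion of a text
   can be encoded by doubling each '~' and copying every other
   character unchanged. The function name hz_escape_ascii is
   hypothetical and is not part of this specification.

```python
def hz_escape_ascii(text: str) -> bytes:
    """Encode ASCII-only text as HZ: '~' becomes '~~', all else is copied.

    Hypothetical helper for illustration; not defined by the RFC.
    """
    # Raises UnicodeEncodeError if the text contains non-ASCII characters.
    data = text.encode("ascii")
    return data.replace(b"~", b"~~")
```

   For input that contains no '~', the result is byte-for-byte
   identical to the input, which is the sense in which such a file is
   already a pure ASCII file.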

   The escape character '~' is chosen not only because it is commonly
   used in the ASCII world, but also because '~' ($7E) is outside the
   defined range ($21-$77) of the first byte of a GB code.

   In ASCII mode, other potential escape sequences, i.e., two byte
   sequences beginning with '~' (other than '~~', '~{', '~\n') are
   currently invalid HZ sequences. Hence, they can be used for future
   extension of HZ with total upward compatibility.
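
   The rule above can be sketched as a small classifier for the byte
   that follows '~' in ASCII mode. The function name and the returned
   labels are assumptions made for illustration only; a real decoder
   would more likely act on each case directly.

```python
def classify_ascii_escape(second: int) -> str:
    """Classify the byte that follows '~' while in ASCII mode.

    The labels are hypothetical; only the three valid cases come from
    the HZ specification.
    """
    if second == ord("~"):
        return "literal-tilde"       # '~~' encodes a single '~'
    if second == ord("{"):
        return "enter-gb"            # '~{' switches to GB mode
    if second == ord("\n"):
        return "line-continuation"   # '~\n' is consumed by the decoder
    return "invalid"                 # currently invalid; free for extension
```

   Note that '~}' falls into the last case here: it is meaningful only
   in GB mode.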

   The line-continuation marker '~\n' is useful if one wants to encode
   long lines in the original text into short lines in this data format
   without introducing extra newline characters in the decoding process.

   There is no limit on the length of a line. In fact, the whole file
   could be one long line or even contain no newline characters. Any
   DECODER of this HZ data format should not and has no need to operate
   on the concept of a line.

   It is easy to write encoders and decoders for HZ. An encoder or
   decoder needs to look ahead at most one character in the input data
   stream.
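
   The following is a minimal decoder sketch, not a reference
   implementation. It assumes the whole input is available as a byte
   string, starts in ASCII mode, and uses Python's "gb2312" codec to map
   each GB code to a character by setting the high bit of both bytes.
   The name hz_decode is hypothetical. The variable nxt is the only
   lookahead the decoder needs.

```python
def hz_decode(data: bytes) -> str:
    """Decode an HZ byte stream, starting in ASCII mode (illustrative)."""
    out = []
    gb_mode = False
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        nxt = data[i + 1] if i + 1 < n else None  # one byte of lookahead
        if not gb_mode:
            if b > 0x7F:
                raise ValueError(f"non-ASCII byte at offset {i}")
            if b != 0x7E:              # ordinary ASCII character
                out.append(chr(b))
                i += 1
            elif nxt == 0x7E:          # '~~' -> literal '~'
                out.append("~")
                i += 2
            elif nxt == 0x7B:          # '~{' -> enter GB mode
                gb_mode = True
                i += 2
            elif nxt == 0x0A:          # '~\n' -> line continuation
                i += 2
            else:
                raise ValueError(f"invalid escape at offset {i}")
        else:
            if b == 0x7E and nxt == 0x7D:   # '~}' -> back to ASCII mode
                gb_mode = False
                i += 2
            elif nxt is None:
                raise ValueError("truncated GB code at end of input")
            else:
                # Two 7-bit bytes form one GB code; the codec rejects
                # pairs outside the defined GB range.
                pair = bytes([b | 0x80, nxt | 0x80])
                out.append(pair.decode("gb2312"))
                i += 2
    return "".join(out)
```

   For example, hz_decode(b"ab~\ncd") returns "abcd", and
   hz_decode(b"~{VPND~}") returns the two characters U+4E2D U+6587.
   The decoder never inspects line boundaries except through the
   '~\n' escape.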

   Given the current mode, it is also possible and easy to decode an HZ
   data stream by scanning backward. One implication is that
   "backspaces" can be handled correctly by a terminal emulator.

   To facilitate the effective use of programs supporting line/page
   skips such as "more" on UNIX with a terminal emulator understanding
   the HZ format, it is RECOMMENDED that the ENCODER (which outputs in
   HZ) set a maximum line size of less than 80 characters.  Since '\n'
   is an ASCII character, the syntax of HZ then automatically implies
   that GB codes appearing at the end of a line must be terminated with
   the escape-from-GB code '~}', and the line-continuation marker '~\n'
   should be inserted appropriately. The price to be paid is that the
   encoded file size is slightly larger.
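
   One possible way to follow this recommendation is sketched below.
   The function name hz_encode_wrapped, the default width of 78, and
   the choice to count bytes excluding the newline are all assumptions
   of this sketch. It relies on Python's "gb2312" codec and clears the
   high bit of each byte to obtain the 7-bit GB code. Before a line
   would grow too long, it closes GB mode with '~}' if necessary and
   emits '~\n'.

```python
def hz_encode_wrapped(text: str, width: int = 78) -> bytes:
    """Encode text as HZ with output lines of at most `width` bytes.

    Illustrative only. The newline that ends a line is not counted.
    """
    if width < 7:
        raise ValueError("width must be at least 7")
    out = bytearray()
    col = 0            # bytes already written on the current output line
    gb_mode = False

    for ch in text:
        if ch == "\n":
            token, want_gb = b"\n", False
        elif ch == "~":
            token, want_gb = b"~~", False
        elif ord(ch) < 0x80:
            token, want_gb = ch.encode("ascii"), False
        else:
            hi, lo = ch.encode("gb2312")   # raises if not a GB character
            token, want_gb = bytes([hi & 0x7F, lo & 0x7F]), True

        switch = 2 if want_gb != gb_mode else 0   # cost of '~{' or '~}'
        size = 0 if ch == "\n" else len(token)
        # Keep room for a later '~}' (GB mode only) and the '~' of '~\n'.
        reserve = (2 if want_gb else 0) + 1
        if col + switch + size + reserve > width:
            if gb_mode:
                out += b"~}"               # GB text must end before '\n'
                gb_mode = False
            out += b"~\n"                  # line-continuation marker
            col = 0
        if want_gb != gb_mode:
            out += b"~{" if want_gb else b"~}"
            gb_mode = want_gb
            col += 2
        out += token
        col = 0 if ch == "\n" else col + len(token)

    if gb_mode:
        out += b"~}"                       # always finish in ASCII mode
    return bytes(out)
```

   With this sketch, a short input such as the two characters U+4E2D
   U+6587 encodes to b"~{VPND~}" with no continuation markers, while
   longer input gains an extra '~}', '~\n' and '~{' at each break,
   which accounts for the slightly larger file size.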

   It is important to understand the following distinction.  Note that
   the above recommendation does NOT change the HZ format.  It is simply
   an encoding "style" which follows the syntax of HZ. Note that this



Lee                          Informational