RFC 1345          Character Mnemonics & Character Sets         June 1992


   character sets, including ASCII, national variants of the ISO 646
   7-bit character set and various EBCDICs.  In addition, the numeric
   values of the coded representations of all these characters are the
   same in all coded character sets compatible with ISO standards.  All
   of them except two, EXCLAMATION MARK and QUOTATION MARK, have the
   same coded representation in all variants of EBCDIC.  This minimal
   set of characters is called the reference character set in this memo.
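
   As an illustration only (not part of this memo), the Python sketch
   below uses the standard-library codecs to check both claims for a
   few letters and digits, which are assumed here to belong to the
   reference character set.  The choice of codecs, in particular IBM
   code pages 037 and 500 as two representative EBCDIC variants, is an
   assumption of the example, and the helper name byte_values is
   hypothetical.

```python
# Codecs assumed to be representative of ISO-compatible coded character sets.
ISO_COMPATIBLE = ["ascii", "latin_1", "iso8859_5", "iso8859_7"]
# Two EBCDIC code pages from Python's standard codecs (assumed representative).
EBCDIC = ["cp037", "cp500"]


def byte_values(ch, codecs):
    """Return the single-byte value of ch under each named codec (hypothetical helper)."""
    return {name: ch.encode(name)[0] for name in codecs}


# Letters and digits: one shared byte value across all ISO-compatible codecs...
for ch in "AZaz09":
    assert len(set(byte_values(ch, ISO_COMPATIBLE).values())) == 1
# ...and one shared byte value across the two EBCDIC code pages.
for ch in "AZaz09":
    assert len(set(byte_values(ch, EBCDIC).values())) == 1

print(byte_values("A", ISO_COMPATIBLE))
# {'ascii': 65, 'latin_1': 65, 'iso8859_5': 65, 'iso8859_7': 65}
print(byte_values("!", EBCDIC))
# {'cp037': 90, 'cp500': 79}  -> EXCLAMATION MARK differs between variants
```

   These two code pages happen to agree on QUOTATION MARK, so showing
   that exception would require comparing a different EBCDIC variant.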

   The mnemonics can be used in Internet standards for easy and
   unambiguous reference, and they can also serve as a fallback
   representation in various Internet specifications.

   The coded character sets covered include all parts of ISO 8859, ISO
   6937-2 and all ISO 646 conforming coded character sets in the ISO
   character set registry managed by ECMA according to ISO 2375.  Almost
   all graphic coded character sets in the ECMA registry (1) are
   covered.  The graphic coded character sets not included are registry
   numbers 31, 38, 39, 53, 59, 68, 71, 72, 129 and 137.  In addition,
   many vendor-defined character sets are covered, including PC
   codepages (4), (7), (8), many EBCDIC character sets (4), (5), (6) and
   HP, DEC and Apple character sets (8), (9), (10), (13), (14).  The
   East-Asian 16-bit character sets from the ECMA registry are also
   included in this memo.

2.  CHARACTER MNEMONICS

2.1  General Syntax

   The character mnemonics are taken from the ISO committee draft (CD)
   of the POSIX.2 standard (3).  They are classified into two groups:

   1. A group with two-character mnemonics
      - primarily intended for alphabetic scripts like Latin, Greek,
        Cyrillic, Hebrew and Arabic, and special characters.
   2. A group with variable-length mnemonics
      - primarily intended for non-alphabetic scripts like Japanese and
        Chinese, but also used for some accented letters and special
        characters.

   In the two-character mnemonics, all invariant graphic characters in
   the ISO 646 character codes except "&" are used, i.e. the following
   characters:

           ! "     %   ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
             A B C D E F G H I J K L M N O P Q R S T U V W X Y Z       _
             a b c d e f g h i j k l m n o p q r s t u v w x y z

   The character "_" is not used as the first character.
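
   The following Python sketch (illustrative, not part of this memo)
   derives the same character set from ISO 646: it takes the printable
   ASCII range 0x21-0x7E, removes the positions that national variants
   may redefine (# $ @ [ \ ] ^ ` { | } ~), removes "&", and then checks
   a candidate two-character mnemonic, including the rule that "_" is
   not the first character.  The names MNEMONIC_CHARS and
   is_two_char_mnemonic are hypothetical.

```python
# ISO 646 positions that national variants may redefine (hence not invariant).
NATIONAL_VARIANT = set("#$@[\\]^`{|}~")

# Invariant graphic characters: printable ASCII 0x21..0x7E minus variant positions.
INVARIANT_GRAPHIC = {chr(c) for c in range(0x21, 0x7F)} - NATIONAL_VARIANT

# Two-character mnemonics use every invariant graphic character except "&".
MNEMONIC_CHARS = INVARIANT_GRAPHIC - {"&"}


def is_two_char_mnemonic(m):
    """True if m is syntactically a two-character mnemonic (hypothetical helper)."""
    return (
        len(m) == 2
        and all(ch in MNEMONIC_CHARS for ch in m)
        and m[0] != "_"  # "_" may appear only in the second position
    )


print(len(MNEMONIC_CHARS))  # 81
print(is_two_char_mnemonic("a'"), is_two_char_mnemonic("_a"), is_two_char_mnemonic("a&"))
# True False False
```

   The 81 characters are the 28 in the first row of the table, the 27
   in the second and the 26 in the third.  Only syntax is checked here;
   whether a mnemonic is actually assigned to a character is a separate
   lookup.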

   In the variable-length mnemonics, the character "_" is not used as
   the first character.  If it occurs elsewhere in a mnemonic, it is
   written twice ("__").
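
   One possible reading of the doubling rule is sketched below in
   Python: every "_" in a name is written as "__", and decoding
   collapses each pair back to a single "_", rejecting a leading or
   undoubled "_".  This is an interpretation for illustration, not a
   normative algorithm; the function names and the sample name "ab_c"
   are hypothetical.

```python
def escape_underscores(name):
    """Spell a variable-length mnemonic with each '_' doubled (assumed reading)."""
    if name.startswith("_"):
        raise ValueError("'_' may not be the first character of a mnemonic")
    return name.replace("_", "__")


def unescape_underscores(mnemonic):
    """Recover the name by collapsing each '__' pair; a lone '_' is rejected."""
    if mnemonic.startswith("_"):
        raise ValueError("'_' may not be the first character of a mnemonic")
    out = []
    i = 0
    while i < len(mnemonic):
        if mnemonic[i] == "_":
            # Under the doubling rule an underscore must be followed by another.
            if mnemonic[i + 1:i + 2] != "_":
                raise ValueError(f"undoubled '_' at position {i}")
            out.append("_")
            i += 2
        else:
            out.append(mnemonic[i])
            i += 1
    return "".join(out)


print(escape_underscores("ab_c"))     # ab__c
print(unescape_underscores("ab__c"))  # ab_c
```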

Simonsen