RFC 1502 X.400 Use of Extended Character Sets August 1993
The author believes that this document gives a specification that can
easily accommodate the use of any character set in the ISO registry,
and, by giving guidance rules for choosing character sets, will help
interworking.
2.2. Families of character sets
2.2.1. ISO 6937/T.61
ISO 6937 is a code technique used and recommended in T.51 and T.101
(Teletex and Videotex service) and in X.500, providing a repertoire
of 333 characters from the Latin script by use of non-spacing
diacritical marks. It corresponds closely to CCITT recommendation
T.61.
The problem with that technique is that the character stream mixes
two widths: some characters are coded with one byte and some with
two (composite characters). This makes information processing
systems such as an E-mail user agent (UA) or gateway (GW) more complex.
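The mixed-width problem can be illustrated with a minimal Python
sketch. It is not a complete ISO 6937 decoder: the function name
decode_iso6937_sketch and the table NON_SPACING are hypothetical, the
table covers only a small subset of the non-spacing diacritical
marks, and the sketch assumes that the base letter following a mark
is a US-ASCII character.

```python
import unicodedata

# Illustrative subset of the non-spacing diacritic lead bytes,
# mapped to the Unicode combining mark used to rebuild the character.
NON_SPACING = {
    0xC1: "\u0300",  # grave accent
    0xC2: "\u0301",  # acute accent
    0xC3: "\u0302",  # circumflex accent
    0xC4: "\u0303",  # tilde
    0xC8: "\u0308",  # diaeresis
}


def decode_iso6937_sketch(data: bytes) -> str:
    """Decode a byte stream in which a diacritic byte precedes its base letter."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in NON_SPACING:
            # Two-byte (composite) character: diacritic first, then base letter.
            if i + 1 >= len(data) or data[i + 1] >= 0x80:
                raise ValueError("diacritic at offset %d has no ASCII base letter" % i)
            base = chr(data[i + 1])
            out.append(unicodedata.normalize("NFC", base + NON_SPACING[b]))
            i += 2
        elif b < 0x80:
            # One-byte character from the US-ASCII range.
            out.append(chr(b))
            i += 1
        else:
            raise ValueError("byte 0x%02X is not covered by this sketch" % b)
    return "".join(out)


text = decode_iso6937_sketch(b"caf\xc2e")
print(text, len(b"caf\xc2e"), len(text))
```

The decoder has to look ahead one byte whenever it meets a diacritic,
so the byte count and the character count of a string differ, which
is the extra complexity described above.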
It is also not extensible to other languages like Korean or Chinese,
or even Greek, without invoking the character set switching
techniques of ISO 2022.
2.2.2. ISO 8859
ISO 8859 defines a set of character sets, each suitable for use in
some group of languages. Each character in ISO 8859 is coded in a
single byte.
There are currently 11 parts of ISO 8859, plus a "supplementary" set,
registered as ISO IR 154. Most languages using single-byte characters
can be written in one or another of the ISO 8859 sets. There are
sets covering Greek, Hebrew and Arabic, but there is still
controversy over the problem of the rendering direction for Hebrew
and Arabic.
All the ISO 8859 sets include US-ASCII as a subset. All use 8 bits.
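As a small illustration, the following Python sketch decodes the same
bytes under two ISO 8859 parts using the codecs shipped with Python.
The helper name decode_in_parts is hypothetical. Bytes in the
US-ASCII range decode identically in both parts, while a byte with
the high bit set yields a different character in each.

```python
def decode_in_parts(raw: bytes, parts=("iso-8859-1", "iso-8859-7")) -> dict:
    """Decode the same single-byte data under several ISO 8859 parts."""
    return {part: raw.decode(part) for part in parts}


# 0x41 is in the US-ASCII range; 0xE1 is in the upper half of the table.
result = decode_in_parts(bytes([0x41, 0xE1]))
for part, text in result.items():
    print(part, repr(text))
```

Every byte maps to exactly one character, so byte offsets and
character offsets coincide, but the receiver must know which part was
used in order to interpret the upper half correctly.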
ISO 8859 is regarded by many as a solution; for instance, the X
Window System now comes with ISO-8859-1 as the "standard" character
set, with the possibility of specifying others. But since such
applications often do not support character set switching within
text, it is problematic to use these sets in a truly multilingual
environment. (Also, most fonts claiming to be "ISO-8859-1" in X11R5
are actually 7-bit fonts. The implied lie is very unfortunate.)
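The multilingual problem can be shown with a short Python sketch that
checks which ISO 8859 parts are able to encode a given text without
switching character sets. The helper name parts_that_can_encode and
the choice of candidate parts are assumptions made for illustration.

```python
CANDIDATE_PARTS = ("iso-8859-1", "iso-8859-5", "iso-8859-7")


def parts_that_can_encode(text: str, parts=CANDIDATE_PARTS) -> list:
    """Return the parts in which every character of the text can be encoded."""
    usable = []
    for part in parts:
        try:
            text.encode(part)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this part
        usable.append(part)
    return usable


print(parts_that_can_encode("caf\u00e9"))
print(parts_that_can_encode("\u03b1\u03b2\u03b3"))
print(parts_that_can_encode("caf\u00e9 \u03b1\u03b2\u03b3"))
```

A text that mixes Latin letters with diacritics and Greek letters
fits in none of the candidate parts, so an application limited to one
set per text cannot represent it.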
Alvestrand