

IANA Charset Registration Procedures






RFC 2978          IANA Charset Registration Procedures      October 2000


1.2.  Character

   A member of a set of elements used for the organization, control, or
   representation of data.

1.3.  Charset

   The term "charset" (referred to as a "character set" in previous
   versions of this document) is used here to refer to a method of
   converting a sequence of octets into a sequence of characters.  This
   conversion may also produce additional control information
   such as directionality indicators.

   Note that unconditional and unambiguous conversion in the other
   direction is not required, in that not all characters may be
   representable by a given charset and a charset may provide more than
   one sequence of octets to represent a particular sequence of
   characters.
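
   As a minimal sketch of the asymmetry described above, the following
   Python fragment uses the codecs in Python's standard library as
   stand-ins for charsets.  This is an assumption made for illustration;
   this document does not reference them.  It shows octets converted to
   characters, a character that US-ASCII cannot represent, and two
   distinct octet sequences that the "utf-16" codec converts to the same
   character.

```python
# Illustrative only: Python's built-in codecs stand in for "charsets".

# Octets -> characters: the direction a charset must define.
octets = b"caf\xe9"
text = octets.decode("iso-8859-1")      # "caf" followed by U+00E9

# Characters -> octets is not unconditional: US-ASCII has no U+00E9.
try:
    text.encode("us-ascii")
    ascii_can_represent = True
except UnicodeEncodeError:
    ascii_can_represent = False

# Nor is it unambiguous: with the "utf-16" codec a byte order mark
# selects the byte order, so two different octet sequences yield the
# same character sequence.
little_endian = b"\xff\xfea\x00"
big_endian = b"\xfe\xff\x00a"
same_text = little_endian.decode("utf-16") == big_endian.decode("utf-16")

print(ascii(text), ascii_can_represent, same_text)
```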

   This definition is intended to allow charsets to be defined in a
   variety of different ways, from simple single-table mappings such as
   US-ASCII to complex table switching methods such as those that use
   ISO 2022's techniques.  However, the definition associated with a
   charset name must fully specify the mapping to be performed.  In
   particular, use of external profiling information to determine the
   exact mapping is not permitted.
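
   The two ends of this range can be sketched in Python, again treating
   the standard library codecs as stand-ins for charsets.  ISO-2022-JP
   is chosen here only as one readily available charset built on ISO
   2022's switching techniques; it is an assumption for illustration
   and is not named in this section.  In both cases the charset name
   and the octets alone determine the result, with no external
   profiling information.

```python
# Illustrative only: Python's codecs stand in for the charsets named here.

# Simple single-table mapping: each US-ASCII octet maps to one character.
simple = b"abc".decode("us-ascii")

# Table switching: ISO-2022-JP embeds escape sequences in the octet
# stream to select which table applies to the octets that follow.
encoded = "a\u3053".encode("iso2022_jp")   # 'a' then HIRAGANA LETTER KO
has_switch = b"\x1b$B" in encoded          # ESC $ B selects JIS X 0208
decoded = encoded.decode("iso2022_jp")     # the stream alone suffices

print(simple, has_switch, ascii(decoded))
```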

   HISTORICAL NOTE: The term "character set" was originally used in MIME
   to describe such straightforward schemes as US-ASCII and ISO-8859-1,
   which consist of a small set of characters and a simple one-to-one
   mapping from single octets to single characters.  Multi-octet
   character encoding schemes and switching techniques make the
   situation much more complex.  As such, the definition of this term
   was revised to emphasize the conversion aspect of the process, and
   the term itself has been changed to "charset" to emphasize that it
   is not, after all, just a set of characters.  A discussion of
   these issues as well as specification of standard terminology for use
   in the IETF appears in RFC 2130.

1.4.  Coded Character Set

   A Coded Character Set (CCS) is a one-to-one mapping from a set of
   abstract characters to a set of integers.  Examples of coded
   character sets are ISO 10646 [ISO-10646], US-ASCII [US-ASCII], and
   the ISO-8859 series [ISO-8859].
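
   A coded character set can be illustrated with Python's ord() and
   chr(), which expose Unicode code points; these are assumed here to
   stand in for ISO 10646 values.  The sketch shows only the mapping
   between abstract characters and integers.  How an integer is turned
   into octets is a separate step that a CCS does not define.

```python
# Illustrative only: ord() and chr() expose Unicode code points, used
# here as a stand-in for the ISO 10646 coded character set.

code_a = ord("A")          # abstract character -> integer
char_a = chr(code_a)       # integer -> abstract character

# One-to-one: distinct characters receive distinct integers, and each
# integer maps back to the character it came from.
sample = "Aa\u00e9"
codes = [ord(c) for c in sample]
round_trip = "".join(chr(n) for n in codes)
is_one_to_one = len(set(codes)) == len(set(sample))

print(code_a, char_a, codes, is_one_to_one)
```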






Freed & Postel           Best Current Practice