RFC 1641                Using Unicode with MIME                July 1994


Overview

   Several ways of using Unicode are possible. This document specifies
   both guidelines for use of Unicode within MIME, and a specific usage.
   The usage specified in this document is a straightforward use of
   Unicode as specified in "The Unicode Standard, Version 1.1".

   This encoding is intended for situations where sender and recipient
   do not want to do a lot of processing, when the text does not consist
   primarily of characters from the US-ASCII character set, or when
   sender and receiver are known in advance to support Unicode.

   Another encoding is intended for situations where the text consists
   primarily of US-ASCII, with occasional characters from other parts of
   Unicode. This encoding allows the US-ASCII portion to be read by all
   recipients without having to support Unicode. This encoding is
   specified in another document, "UTF-7: A Mail Safe Transformation
   Format of Unicode" [UTF-7].
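
   As an illustration only, the property described above (US-ASCII text
   stays readable, other characters are escaped) can be observed with
   Python's built-in "utf-7" codec. This is a sketch under assumptions:
   the helper names to_mail_safe and from_mail_safe are hypothetical,
   and Python's codec may differ in detail from the format defined in
   [UTF-7].

```python
def to_mail_safe(text: str) -> bytes:
    """Encode text with Python's built-in "utf-7" codec (an assumption,
    not the normative definition from [UTF-7])."""
    return text.encode("utf-7")


def from_mail_safe(data: bytes) -> str:
    """Reverse of to_mail_safe, using the same built-in codec."""
    return data.decode("utf-7")


# ASCII greeting followed by two characters from outside US-ASCII.
sample = "Hello, \u4e16\u754c"
wire = to_mail_safe(sample)

# Every octet stays in the 7-bit range, and the US-ASCII prefix is
# carried unchanged, so a reader without Unicode support can still
# make out that part of the text.
assert all(octet < 0x80 for octet in wire)
assert wire.startswith(b"Hello, ")
assert from_mail_safe(wire) == sample
```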

   Finally, in keeping with the principles set forth in RFC 1521, text
   which can be represented using the US-ASCII or ISO-8859-x character
   sets should be so represented where possible, for maximum
   interoperability.
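
   One way a sender might apply this preference is to try the narrower
   character sets first and fall back to Unicode only when none of them
   can represent the text. The sketch below is hypothetical: the
   function name pick_charset, the particular ISO-8859-x candidates, and
   the use of Python's "utf-16-be" codec as a stand-in for the
   two-octet Unicode form are all assumptions, not part of this
   specification.

```python
# Candidate character sets, narrowest first. The ISO-8859-x entries are
# an arbitrary illustrative selection (an assumption).
CANDIDATES = ("us-ascii", "iso-8859-1", "iso-8859-2", "iso-8859-5")

# Python codec used here as a stand-in for the Unicode fallback
# (an assumption; it matches the two-octet form only for 16-bit codes).
UNICODE_FALLBACK = "utf-16-be"


def pick_charset(text: str) -> tuple[str, bytes]:
    """Return (codec name, encoded octets) for the first codec that can
    represent every character of the text."""
    for name in CANDIDATES:
        try:
            return name, text.encode(name)
        except UnicodeEncodeError:
            continue  # this character set cannot represent the text
    return UNICODE_FALLBACK, text.encode(UNICODE_FALLBACK)
```

   For example, pick_charset("hello") selects "us-ascii", while text
   containing a CJK ideograph falls through to the Unicode fallback.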

Definitions

   The definition of character set Unicode:

      The 16 bit character set Unicode is defined by "The Unicode
      Standard, Version 1.1". This character set is identical with the
      character repertoire and coding of the international standard
      ISO/IEC 10646-1:1993(E); Coded Representation Form=UCS-2;
      Subset=300; Implementation Level=3.

      Note. Unicode 1.1 further specifies the use and interaction of
      these character codes beyond the ISO standard. However, any valid
      10646 BMP (Basic Multilingual Plane) sequence is a valid Unicode
      sequence, and vice versa; Unicode supplies interpretations of
      sequences on which the ISO standard is silent as to
      interpretation.

      This character set is encoded as sequences of octets, two per 16-
      bit character, with the most significant octet first. Text with an
      odd number of octets is ill-formed.
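
      The serialization described above can be sketched as follows. This
      is a minimal illustration, not a reference implementation: the
      function names encode_ucs2_be and decode_ucs2_be are hypothetical,
      and the sketch does not check whether each 16-bit value is an
      assigned character.

```python
import struct


def encode_ucs2_be(text: str) -> bytes:
    """Serialize text as two octets per character, most significant
    octet first."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            # Not representable as a single 16-bit character.
            raise ValueError(f"U+{code:04X} does not fit in 16 bits")
        out += struct.pack(">H", code)  # ">H": big-endian unsigned 16-bit
    return bytes(out)


def decode_ucs2_be(data: bytes) -> str:
    """Decode octet pairs back to text; an odd octet count is rejected
    as ill-formed."""
    if len(data) % 2 != 0:
        raise ValueError("ill-formed text: odd number of octets")
    count = len(data) // 2
    codes = struct.unpack(f">{count}H", data)
    return "".join(chr(code) for code in codes)
```

      For instance, the two characters U+0041 and U+00E9 serialize to
      the four octets 00 41 00 E9, and a three-octet input is rejected.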

      Rationale. ISO/IEC 10646-1:1993(E) specifies that when characters
      in the UCS-2 form are serialized as octets, the most significant
      octet appears first.  This is also in keeping with



Goldsmith & Davis