Using Unicode with MIME
RFC 1641 Using Unicode with MIME July 1994
Overview
Several ways of using Unicode are possible. This document specifies
both guidelines for the use of Unicode within MIME and a specific usage.
The usage specified in this document is a straightforward use of
Unicode as specified in "The Unicode Standard, Version 1.1".
This encoding is intended for situations where sender and recipient
do not want to do a lot of processing, when the text does not consist
primarily of characters from the US-ASCII character set, or when
sender and receiver are known in advance to support Unicode.
Another encoding is intended for situations where the text consists
primarily of US-ASCII, with occasional characters from other parts of
Unicode. This encoding allows the US-ASCII portion to be read by all
recipients without having to support Unicode. This encoding is
specified in another document, "UTF-7: A Mail Safe Transformation
Format of Unicode" [UTF-7].
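The property described above, that the US-ASCII portion stays readable, can be illustrated with a short sketch. It assumes Python's built-in "utf-7" codec, which follows a later revision of the UTF-7 specification than the [UTF-7] document cited here, so exact byte output may differ from that document. The function name and sample string are illustrative only.

```python
def to_mail_safe(text: str) -> bytes:
    """Encode text with Python's built-in UTF-7 codec (an assumption:
    this codec tracks a later UTF-7 revision than the cited document)."""
    return text.encode("utf-7")


sample = "Hi Mom \u263a"          # mostly US-ASCII, plus one other character
encoded = to_mail_safe(sample)

# Every octet is 7-bit, and the US-ASCII prefix is carried through unchanged,
# so a recipient without Unicode support can still read "Hi Mom ".
is_seven_bit = all(octet < 0x80 for octet in encoded)
ascii_prefix_intact = encoded.startswith(b"Hi Mom ")
round_trip = encoded.decode("utf-7")
```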
Finally, in keeping with the principles set forth in RFC 1521, text
which can be represented using the US-ASCII or ISO-8859-x character
sets should be so represented where possible, for maximum
interoperability.
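One way a sender might apply this preference is sketched below: try the most widely supported character sets first and fall back to Unicode only when none of them can represent the text. The function name, the short candidate list (only two of the ISO-8859-x sets), and the "unicode" return value are assumptions made for illustration; the return value is a placeholder, not a registered charset label.

```python
def pick_charset(text: str) -> str:
    """Return the first candidate character set that can represent text."""
    # Candidates in order of preference; a real sender would likely
    # consider more of the ISO-8859-x family than these two.
    candidates = ("us-ascii", "iso-8859-1", "iso-8859-2")
    for name in candidates:
        try:
            text.encode(name)
            return name
        except UnicodeEncodeError:
            continue
    # Placeholder meaning "fall back to the Unicode encoding"; hypothetical.
    return "unicode"
```

For example, `pick_charset("hello")` selects US-ASCII, while text containing characters outside every candidate set falls through to the Unicode fallback.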
Definitions
The definition of character set Unicode:
The 16-bit character set Unicode is defined by "The Unicode
Standard, Version 1.1". This character set is identical with the
character repertoire and coding of the international standard
ISO/IEC 10646-1:1993(E); Coded Representation Form=UCS-2;
Subset=300; Implementation Level=3.
Note. Unicode 1.1 further specifies the use and interaction of
these character codes beyond the ISO standard. However, any valid
10646 BMP (Basic Multilingual Plane) sequence is a valid Unicode
sequence, and vice versa; Unicode supplies interpretations of
sequences on which the ISO standard is silent as to
interpretation.
This character set is encoded as sequences of octets, two per 16-
bit character, with the most significant octet first. Text with an
odd number of octets is ill-formed.
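The octet serialization just described can be sketched as follows. This is a minimal illustration, not a reference implementation; the function names are hypothetical, and the encoder simply rejects any code value that does not fit in 16 bits.

```python
def encode_ucs2_be(text: str) -> bytes:
    """Serialize text as two octets per 16-bit character, high octet first."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError(f"U+{code:04X} does not fit in 16 bits")
        out.append(code >> 8)    # most significant octet first
        out.append(code & 0xFF)  # least significant octet second
    return bytes(out)


def decode_ucs2_be(data: bytes) -> str:
    """Rebuild text from octet pairs; an odd octet count is ill-formed."""
    if len(data) % 2 != 0:
        raise ValueError("ill-formed text: odd number of octets")
    return "".join(
        chr((data[i] << 8) | data[i + 1]) for i in range(0, len(data), 2)
    )
```

Under this sketch, the character U+0041 becomes the octets 00 41 and U+65E5 becomes 65 E5, while a one-octet input is rejected as ill-formed.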
Rationale. ISO/IEC 10646-1:1993(E) specifies that when characters
in the UCS-2 form are serialized as octets, the most significant
octet appears first. This is also in keeping with
Goldsmith & Davis