UTF-7 A Mail-Safe Transformation Format of Unicode

RFC 2152                         UTF-7                          May 1997


   with the Quoted-Printable content transfer encoding of MIME
   represents US-ASCII characters in one octet, but other characters may
   require up to nine octets.

Overview

   UTF-7 encodes Unicode characters as US-ASCII octets, together with
   shift sequences to encode characters outside that range. For this
   purpose, one of the characters in the US-ASCII repertoire is reserved
   for use as a shift character.
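
   As an illustration only, the behaviour described above can be
   observed with Python's built-in "utf-7" codec. This is a minimal
   sketch, not part of this specification: the helper names to_utf7
   and from_utf7 and the constant SHIFT are hypothetical, and the
   choice of "+" as the shift character is taken from the rules given
   later in this document.

```python
# Sketch only: relies on Python's standard "utf-7" codec.
# to_utf7, from_utf7 and SHIFT are illustrative names, not RFC terms.

SHIFT = "+"  # the US-ASCII character that UTF-7 reserves as its shift


def to_utf7(text: str) -> bytes:
    """Encode text as UTF-7; the result holds only US-ASCII octets."""
    return text.encode("utf-7")


def from_utf7(data: bytes) -> str:
    """Decode UTF-7 octets back into Unicode text."""
    return data.decode("utf-7")


if __name__ == "__main__":
    samples = ("Hi Mom", "Hi Mom -\u263a-!", "1 + 1 = 2")
    for sample in samples:
        encoded = to_utf7(sample)
        # Every octet fits in 7 bits, so 7-bit transports can carry it.
        assert all(octet < 0x80 for octet in encoded)
        # Decoding restores the original text exactly.
        assert from_utf7(encoded) == sample
        print(repr(sample), "->", encoded)
```

   Text made up solely of letters, digits and spaces passes through
   unchanged. A character outside US-ASCII is carried inside a shift
   sequence introduced by "+", and a literal "+" must itself be
   escaped, which is why the shift character is excluded from the
   directly encoded set.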

   Many mail gateways and systems cannot handle the entire US-ASCII
   character set (those based on EBCDIC, for example), and so UTF-7
   contains provisions for encoding characters within US-ASCII in a way
   that all mail systems can accommodate.

   UTF-7 should normally be used only in the context of 7-bit
   transports, such as mail. In other contexts, straight Unicode or
   UTF-8 is preferred.

   See RFC 1641, "Using Unicode with MIME", for the overall
   specification on usage of Unicode transformation formats with MIME.

Definitions

   First, the definition of Unicode:

      The 16-bit character set Unicode is defined by "The Unicode
      Standard, Version 2.0". This character set is identical with the
      character repertoire and coding of the international standard
      ISO/IEC 10646-1:1993(E); Coded Representation Form=UCS-2;
      Subset=300; Implementation Level=3, including the first 7
      amendments to 10646 plus editorial corrections.

      Note. Unicode 2.0 further specifies the use and interaction of
      these character codes beyond the ISO standard. However, any valid
      10646 sequence is a valid Unicode sequence, and vice versa;
      Unicode supplies interpretations of sequences on which the ISO
      standard is silent as to interpretation.

   Next, some handy definitions of US-ASCII character subsets:

      Set D (directly encoded characters) consists of the following
      characters (derived from RFC 1521, Appendix B, which no longer
      appears in RFC 2045): the upper and lower case letters A through Z
      and a through z, the 10 digits 0-9, and the following nine special
      characters (note that "+" and "=" are omitted):




Goldsmith & Davis            Informational