UTF-7 - A Mail-Safe Transformation Format of Unicode




RFC 1642                         UTF-7                         July 1994


   UTF-FSS together with the Quoted-Printable content transfer encoding
   of MIME represents US-ASCII characters in one octet, but other
   characters may require up to nine octets.
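   The nine-octet figure can be checked with a small sketch. It assumes
   that Python's "utf-8" codec is an acceptable stand-in for UTF-FSS
   (the earlier name for UTF-8) and that the standard quopri module is
   an acceptable stand-in for the MIME Quoted-Printable encoding. The
   sample characters are illustrative choices, not taken from this
   document.

```python
import quopri

# Assumption: Python's "utf-8" codec stands in for UTF-FSS, and the
# quopri module stands in for MIME Quoted-Printable.
ascii_char = "A"          # a US-ASCII character
bmp_char = "\u263A"       # a 16-bit character that needs three UTF-8 octets

ascii_qp = quopri.encodestring(ascii_char.encode("utf-8"))
bmp_qp = quopri.encodestring(bmp_char.encode("utf-8"))

# Each non-ASCII octet becomes "=XX", so three octets grow to nine.
print(len(ascii_qp), ascii_qp)
print(len(bmp_qp), bmp_qp)
```

   Each of the three octets produced for the non-ASCII character is
   written as an equals sign followed by two hexadecimal digits, which
   is where the factor of three comes from.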

Overview

   UTF-7 encodes Unicode characters as US-ASCII, together with shift
   sequences to encode characters outside that range. For this purpose,
   one of the characters in the US-ASCII repertoire is reserved for use
   as a shift character.
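   As a rough illustration of the shift mechanism, the sketch below
   uses Python's built-in "utf-7" codec. This is an assumption made for
   convenience: that codec follows the later revision of UTF-7 (RFC
   2152), so its choices about which characters to encode directly may
   differ in detail from this document. The shift character it uses is
   "+", and the sample string is an illustrative choice.

```python
# Assumption: Python's built-in "utf-7" codec (which follows RFC 2152)
# is used here only to illustrate the idea of shift sequences.
sample = "Hi Mom \u263A!"            # U+263A lies outside US-ASCII

encoded = sample.encode("utf-7")

# Every octet of the encoded form fits in 7 bits.
is_seven_bit = all(octet < 0x80 for octet in encoded)

# Decoding restores the original Unicode string.
round_trip = encoded.decode("utf-7")

# A shifted sequence: "+" opens it, "-" closes it.
shifted = b"+Jjo-".decode("utf-7")   # decodes to U+263A

# The shift character itself must be escaped when meant literally.
literal_plus = "+".encode("utf-7")   # b'+-'

print(encoded, is_seven_bit, round_trip == sample)
```

   The US-ASCII portion of the string passes through unchanged, and
   only the character outside that range is carried inside a shift
   sequence.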

   Many mail gateways and systems cannot handle the entire US-ASCII
   character set (those based on EBCDIC, for example), and so UTF-7
   contains provisions for encoding characters within US-ASCII in a way
   that all mail systems can accommodate.

   UTF-7 should normally be used only in the context of 7-bit
   transports, such as mail and news. In other contexts, straight
   Unicode or UTF-8 is preferred.

   See the document "Using Unicode with MIME" for the overall
   specification on usage of Unicode transformation formats with MIME.

Definitions

   First, the definition of Unicode:

      The 16-bit character set Unicode is defined by "The Unicode
      Standard, Version 1.1". This character set is identical with the
      character repertoire and coding of the international standard
      ISO/IEC 10646-1:1993(E); Coded Representation Form=UCS-2;
      Subset=300; Implementation Level=3.

      Note. Unicode 1.1 further specifies the use and interaction of
      these character codes beyond the ISO standard. However, any valid
      10646 BMP (Basic Multilingual Plane) sequence is a valid Unicode
      sequence, and vice versa; Unicode supplies interpretations of
      sequences on which the ISO standard is silent as to
      interpretation.

   Next, some handy definitions of US-ASCII character subsets:

      Set D (directly encoded characters) consists of the following
      characters (derived from RFC 1521, Appendix B): the upper and
      lower case letters A through Z and a through z, the 10 digits 0-9,
      and the following nine special characters (note that "+" and "="
      are omitted):




Goldsmith & Davis