UTF-7 - A Mail-Safe Transformation Format of Unicode
RFC 1642 UTF-7 July 1994
UTF-FSS together with the Quoted-Printable content transfer encoding
of MIME represents US-ASCII characters in one octet, but other
characters may require up to nine octets.
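The nine-octet figure can be checked with a short sketch. It assumes that UTF-FSS corresponds to what Python calls the "utf-8" codec and uses the standard quopri module for Quoted-Printable. The helper name qp_utf8_length and the sample character U+2262 are chosen here for illustration only.

```python
import quopri

def qp_utf8_length(ch: str) -> int:
    """Octets needed for one character as UTF-8 wrapped in Quoted-Printable."""
    return len(quopri.encodestring(ch.encode("utf-8")))

# A US-ASCII letter passes through Quoted-Printable unchanged.
ascii_len = qp_utf8_length("A")

# U+2262 takes three UTF-8 octets; each one is escaped as "=XX".
bmp_len = qp_utf8_length("\u2262")
```

Under these assumptions, each of the three high-bit octets expands to a three-octet escape, which is where the factor of nine comes from.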
Overview
UTF-7 encodes Unicode characters as US-ASCII, together with shift
sequences to encode characters outside that range. For this purpose,
one of the characters in the US-ASCII repertoire is reserved for use
as a shift character.
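As an informal illustration of the shift mechanism, the sketch below uses the "utf-7" codec that ships with Python. That codec is an assumption of this example and is not part of this specification. It uses "+" as the shift character.

```python
# "A", NOT IDENTICAL TO (U+2262), GREEK CAPITAL LETTER ALPHA (U+0391), "."
text = "A\u2262\u0391."

encoded = text.encode("utf-7")     # bytes in the US-ASCII range only
decoded = encoded.decode("utf-7")  # round-trips back to the original text

# Text drawn only from letters needs no shift sequence at all.
plain = "Hello".encode("utf-7")

# The shifted run for the two non-ASCII characters starts with "+".
has_shift = b"+" in encoded
```

The letter "A" and the period stay as themselves, while the two characters outside US-ASCII are carried inside a run introduced by the shift character.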
Many mail gateways and systems cannot handle the entire US-ASCII
character set (those based on EBCDIC, for example), and so UTF-7
contains provisions for encoding characters within US-ASCII in a way
that all mail systems can accommodate.
UTF-7 should normally be used only in the context of 7-bit
transports, such as mail and news. In other contexts, straight
Unicode or UTF-8 is preferred.
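The difference on a 7-bit transport can be sketched as follows. The helper is_7bit_clean is a hypothetical name introduced for this example, and the codecs are those bundled with Python rather than anything defined in this document.

```python
def is_7bit_clean(data: bytes) -> bool:
    """True when no octet has its high bit set."""
    return all(octet < 0x80 for octet in data)

sample = "caf\u00e9"  # ends in LATIN SMALL LETTER E WITH ACUTE

utf7_ok = is_7bit_clean(sample.encode("utf-7"))  # stays within 7 bits
utf8_ok = is_7bit_clean(sample.encode("utf-8"))  # contains high-bit octets
```

A transport that strips or mangles the eighth bit would leave the UTF-7 form intact but damage the UTF-8 form, which is why UTF-8 is reserved for contexts that are 8-bit clean.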
See the document "Using Unicode with MIME" for the overall
specification on usage of Unicode transformation formats with MIME.
Definitions
First, the definition of Unicode:
The 16-bit character set Unicode is defined by "The Unicode
Standard, Version 1.1". This character set is identical with the
character repertoire and coding of the international standard
ISO/IEC 10646-1:1993(E); Coded Representation Form=UCS-2;
Subset=300; Implementation Level=3.
Note. Unicode 1.1 further specifies the use and interaction of
these character codes beyond the ISO standard. However, any valid
10646 BMP (Basic Multilingual Plane) sequence is a valid Unicode
sequence, and vice versa; Unicode supplies interpretations of
sequences on which the ISO standard is silent as to
interpretation.
Next, some handy definitions of US-ASCII character subsets:
Set D (directly encoded characters) consists of the following
characters (derived from RFC 1521, Appendix B): the upper and
lower case letters A through Z and a through z, the 10 digits 0-9,
and the following nine special characters (note that "+" and "="
are omitted):
Goldsmith & Davis