UTF-7 A Mail-Safe Transformation Format of Unicode
RFC 2152 UTF-7 May 1997
with the Quoted-Printable content transfer encoding of MIME
represents US-ASCII characters in one octet, but other characters may
require up to nine octets.
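The nine-octet figure can be checked with a short sketch. It assumes the character encoding being combined with Quoted-Printable is UTF-8, where a character in the range covered by 16 bit Unicode occupies at most three octets and Quoted-Printable writes each non-ASCII octet as "=XX". The helper name qp_utf8_length is hypothetical; quopri is the Python standard library module for Quoted-Printable.

```python
import quopri


def qp_utf8_length(ch: str) -> int:
    """Octets needed for `ch` as UTF-8 under Quoted-Printable.

    Hypothetical helper for illustration; assumes UTF-8 is the
    underlying character encoding.
    """
    utf8_octets = ch.encode("utf-8")
    # Each non-ASCII octet becomes a three-octet "=XX" escape.
    return len(quopri.encodestring(utf8_octets))


for ch in ("A", "\u00e9", "\u65e5"):
    print(repr(ch), qp_utf8_length(ch))
```

A US-ASCII letter stays at one octet, a two-octet UTF-8 character grows to six, and a three-octet UTF-8 character grows to nine.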
Overview
UTF-7 encodes Unicode characters as US-ASCII octets, together with
shift sequences to encode characters outside that range. For this
purpose, one of the characters in the US-ASCII repertoire is reserved
for use as a shift character.
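As a rough illustration of the shift mechanism, the sketch below uses the "utf-7" codec built into Python rather than an implementation written from this document. The wrapper names to_utf7 and from_utf7 are hypothetical. The reserved shift character is "+", so a literal "+" in the input has to be written specially as well.

```python
def to_utf7(text: str) -> bytes:
    """Encode text with Python's built-in UTF-7 codec (illustrative wrapper)."""
    return text.encode("utf-7")


def from_utf7(data: bytes) -> str:
    """Decode UTF-7 octets back to text (illustrative wrapper)."""
    return data.decode("utf-7")


# U+263A lies outside US-ASCII, so it is carried inside a shifted
# sequence that starts with "+".
sample = "Hi Mom -\u263a-!"
encoded = to_utf7(sample)

print(encoded)
print(all(octet < 128 for octet in encoded))  # only 7-bit octets are produced
print(from_utf7(encoded) == sample)           # the round trip is lossless

# A literal "+" cannot stand for itself, because it is the shift character.
print(to_utf7("1 + 1"))
```

Other encoders may make different choices about which US-ASCII characters to shift, so the exact octets can vary between implementations while still decoding to the same text.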
Many mail gateways and systems cannot handle the entire US-ASCII
character set (those based on EBCDIC, for example), and so UTF-7
contains provisions for encoding characters within US-ASCII in a way
that all mail systems can accommodate.
UTF-7 should normally be used only in the context of 7 bit
transports, such as mail. In other contexts, straight Unicode or
UTF-8 is preferred.
See RFC 1641, "Using Unicode with MIME", for the overall specification
on usage of Unicode transformation formats with MIME.
Definitions
First, the definition of Unicode:
The 16 bit character set Unicode is defined by "The Unicode
Standard, Version 2.0". This character set is identical with the
character repertoire and coding of the international standard
ISO/IEC 10646-1:1993(E); Coded Representation Form=UCS-2;
Subset=300; Implementation Level=3, including the first 7
amendments to 10646 plus editorial corrections.
Note. Unicode 2.0 further specifies the use and interaction of
these character codes beyond the ISO standard. However, any valid
10646 sequence is a valid Unicode sequence, and vice versa;
Unicode supplies interpretations of sequences on which the ISO
standard is silent as to interpretation.
Next, some handy definitions of US-ASCII character subsets:
Set D (directly encoded characters) consists of the following
characters (derived from RFC 1521, Appendix B, which no longer
appears in RFC 2045): the upper and lower case letters A through Z
and a through z, the 10 digits 0-9, and the following nine special
characters (note that "+" and "=" are omitted):
Goldsmith & Davis Informational