Network Working Group                                       D. Goldsmith
Request for Comments: 1642                                      M. Davis
Category: Experimental                                    Taligent, Inc.
                                                               July 1994


                                 UTF-7


              A Mail-Safe Transformation Format of Unicode

Status of this Memo

   This memo defines an Experimental Protocol for the Internet
   community.  This memo does not specify an Internet standard of any
   kind.  Distribution of this memo is unlimited.

Abstract

   The Unicode Standard, version 1.1, and ISO/IEC 10646-1:1993(E)
   jointly define a 16-bit character set (hereafter referred to as
   Unicode) which encompasses most of the world's writing systems.
   However, Internet mail (STD 11, RFC 822) currently supports only
   7-bit US-ASCII as a character set. MIME (RFC 1521 and RFC 1522)
   extends Internet mail to support different media types and character
   sets, and thus could support Unicode in mail messages. MIME neither
   defines Unicode as a permitted character set nor specifies how it
   would be encoded, although it does provide for the registration of
   additional character sets over time.

   This document describes a new transformation format of Unicode that
   contains only 7-bit ASCII characters and is intended to be readable
   by humans in the limiting case that the document consists of
   characters from the US-ASCII repertoire. It also specifies how this
   transformation format is used in the context of RFC 1521, RFC 1522,
   and the document "Using Unicode with MIME".
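
   The two properties claimed above (output restricted to 7-bit
   octets, and US-ASCII text remaining readable) can be illustrated
   with a minimal sketch. It assumes Python's built-in "utf-7" codec,
   which may differ in detail from the rules this memo specifies. The
   helper is_seven_bit and both sample strings are hypothetical names
   introduced only for this illustration.

```python
# Illustrative sketch only: relies on Python's built-in "utf-7" codec,
# which is an assumption here and is not defined by this memo.

def is_seven_bit(data: bytes) -> bool:
    """Return True if every octet lies in the range 0 through 127."""
    return all(octet < 128 for octet in data)

# Hypothetical sample strings chosen for this sketch.
ascii_text = "Hello, World"
mixed_text = "Hi Mom \u263A"  # ends with U+263A WHITE SMILING FACE

ascii_encoded = ascii_text.encode("utf-7")
mixed_encoded = mixed_text.encode("utf-7")

# Letters, the comma, and the space pass through unchanged, so a
# US-ASCII-only document stays readable in its encoded form.
print(ascii_encoded == ascii_text.encode("ascii"))

# The non-ASCII character is represented using only 7-bit octets.
print(is_seven_bit(mixed_encoded))

# For contrast, UTF-8 represents the same character with octets >= 128.
print(is_seven_bit(mixed_text.encode("utf-8")))

# Decoding recovers the original text.
print(mixed_encoded.decode("utf-7") == mixed_text)
```

   The sketch checks properties of the encoded octets rather than an
   exact encoded form, since the exact form may vary between
   implementations.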

Motivation

   Although other transformation formats of Unicode exist and could
   conceivably be used in this context (most notably UTF-1 and UTF-8,
   also known as UTF-2 or UTF-FSS), they suffer the disadvantage that
   they use octets in the range decimal 128 through 255 to encode
   Unicode characters outside the US-ASCII range. Thus, in the context
   of mail, those octets must themselves be encoded. This requires
   putting text through two successive encoding processes, and leads to
   a significant expansion of characters outside the US-ASCII range,
   putting non-English speakers at a disadvantage. For example, using



Goldsmith & Davis