RFC 2781 (rfc2781) - Page 1 of 14


UTF-16, an encoding of ISO 10646



Alternative Format: Original Text Document



Network Working Group                                        P. Hoffman
Request for Comments: 2781                     Internet Mail Consortium
Category: Informational                                      F. Yergeau
                                                      Alis Technologies
                                                          February 2000


                    UTF-16, an encoding of ISO 10646

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2000).  All Rights Reserved.

1. Introduction

   This document describes the UTF-16 encoding of Unicode/ISO-10646,
   addresses the issues of serializing UTF-16 as an octet stream for
   transmission over the Internet, discusses MIME charset naming as
   described in [CHARSET-REG], and contains the registration for three
   MIME charset parameter values: UTF-16BE (big-endian), UTF-16LE
   (little-endian), and UTF-16.

1.1 Background and motivation

   The Unicode Standard [UNICODE] and ISO/IEC 10646 [ISO-10646] jointly
   define a coded character set (CCS), hereafter referred to as Unicode,
   which encompasses most of the world's writing systems [WORKSHOP].
   UTF-16, the object of this specification, is one of the standard ways
   of encoding Unicode character data; it has the characteristics of
   encoding all currently defined characters (in plane 0, the BMP) in
   exactly two octets and of being able to encode all other characters
   likely to be defined (the next 16 planes) in exactly four octets.

   The Unicode Standard further defines additional character properties
   and other application details of great interest to implementors. Up
   to the present time, changes in Unicode and amendments to ISO/IEC
   10646 have tracked each other, so that the character repertoires and
   code point assignments have remained in sync. The relevant
   standardization committees have committed to maintain this very
   useful synchronism, as well as not to assign characters outside of
   the 17 planes accessible to UTF-16.




Hoffman & Yergeau            Informational