Network Working Group                                         P. Hoffman
Request for Comments: 2781                      Internet Mail Consortium
Category: Informational                                       F. Yergeau
                                                       Alis Technologies
                                                           February 2000
Status of this Memo
This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2000). All Rights Reserved.
1. Introduction
This document describes the UTF-16 encoding of Unicode/ISO-10646,
addresses the issues of serializing UTF-16 as an octet stream for
transmission over the Internet, discusses MIME charset naming as
described in [CHARSET-REG], and contains the registration for three
MIME charset parameter values: UTF-16BE (big-endian), UTF-16LE
(little-endian), and UTF-16.
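As a rough illustration of what the three labels mean for an octet stream, the following sketch uses Python's built-in codecs. The codec names "utf-16-be", "utf-16-le", and "utf-16" are Python's own spellings and are assumed here to stand in for the charset values named above. Python's behavior of writing a byte order mark in the platform's native order for "utf-16" is a property of that implementation, not something this document requires.

```python
import codecs

text = "A"  # U+0041

# Explicit byte order, no byte order mark in the output.
big_endian = text.encode("utf-16-be")     # octets 00 41
little_endian = text.encode("utf-16-le")  # octets 41 00

# Python's generic "utf-16" codec prepends a byte order mark (BOM)
# in the platform's native byte order, then the text in that order.
with_bom = text.encode("utf-16")
has_bom = with_bom[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)

# Decoding with the generic codec uses the BOM to pick the byte order.
round_trip = with_bom.decode("utf-16")
```

The two explicit variants differ only in the order of the two octets. The generic variant carries two extra octets at the start that tell the receiver which order follows.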
1.1 Background and motivation
The Unicode Standard [UNICODE] and ISO/IEC 10646 [ISO-10646] jointly
define a coded character set (CCS), hereafter referred to as Unicode,
which encompasses most of the world's writing systems [WORKSHOP].
UTF-16, the object of this specification, is one of the standard ways
of encoding Unicode character data; it has the characteristics of
encoding all currently defined characters (in plane 0, the BMP) in
exactly two octets and of being able to encode all other characters
likely to be defined (the next 16 planes) in exactly four octets.
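The two-octet and four-octet cases can be checked with a short sketch. It assumes Python's "utf-16-be" codec as the encoder, and the helper name utf16_octet_count is hypothetical, introduced only for this illustration. The sample characters are arbitrary choices from plane 0, plane 1, and the end of plane 16.

```python
def utf16_octet_count(ch: str) -> int:
    """Return the number of octets in the UTF-16 encoding of one character."""
    # "utf-16-be" is used so that no byte order mark is added to the output.
    return len(ch.encode("utf-16-be"))

bmp_char = "\u00e9"         # U+00E9, in plane 0 (the BMP)
plane1_char = "\U00010400"  # U+10400, in plane 1
last_char = "\U0010ffff"    # U+10FFFF, the last code point of plane 16

bmp_len = utf16_octet_count(bmp_char)        # 2
plane1_len = utf16_octet_count(plane1_char)  # 4
last_len = utf16_octet_count(last_char)      # 4
```

Characters in the BMP occupy a single 16-bit unit, while characters in the 16 planes above it occupy two 16-bit units.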
The Unicode Standard further defines additional character properties
and other application details of great interest to implementors. Up
to the present time, changes in Unicode and amendments to ISO/IEC
10646 have tracked each other, so that the character repertoires and
code point assignments have remained in sync. The relevant
standardization committees have committed to maintain this very
useful synchronism, as well as not to assign characters outside of
the 17 planes accessible to UTF-16.
Hoffman & Yergeau Informational