RFC 2781 UTF-16, an encoding of ISO 10646 February 2000
2.1 Encoding UTF-16
Encoding of a single character from an ISO 10646 character value to
UTF-16 proceeds as follows. Let U be the character number, no greater
than 0x10FFFF.
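1) If U < 0x10000, encode U as a 16-bit unsigned integer and
terminate.
2) Let U' = U - 0x10000. Because U is less than or equal to 0x10FFFF,
U' must be less than or equal to 0xFFFFF. That is, U' can be
represented in 20 bits.
3) Initialize two 16-bit unsigned integers, W1 and W2, to 0xD800 and
0xDC00, respectively. These integers each have 10 bits free to
encode the character value, for a total of 20 bits.
4) Assign the 10 high-order bits of the 20-bit U' to the 10 low-order
bits of W1 and the 10 low-order bits of U' to the 10 low-order
bits of W2. Terminate.
Graphically, steps 2 through 4 look like:
U' = yyyyyyyyyyxxxxxxxxxx
W1 = 110110yyyyyyyyyy
W2 = 110111xxxxxxxxxx
2.2 Decoding UTF-16
Decoding of a single character from UTF-16 to an ISO 10646 character
value proceeds as follows. Let W1 be the next 16-bit integer in the
sequence of integers representing the text. Let W2 be the (eventual)
next integer following W1.
1) If W1 < 0xD800 or W1 > 0xDFFF, the character value U is the value
of W1. Terminate.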
2) Determine if W1 is between 0xD800 and 0xDBFF. If not, the sequence
is in error and no valid character can be obtained using W1.
Terminate.
3) If there is no W2 (that is, the sequence ends with W1), or if W2
is not between 0xDC00 and 0xDFFF, the sequence is in error.
Terminate.
4) Construct a 20-bit unsigned integer U', taking the 10 low-order
bits of W1 as its 10 high-order bits and the 10 low-order bits of
W2 as its 10 low-order bits.
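The decoding steps above can be sketched in Python as follows. This is
an illustrative sketch, not part of the specification: the function
names decode_utf16_char and encode_utf16 are hypothetical, the input
is assumed to be a list of 16-bit integers already extracted from the
byte stream (byte order is not handled), and errors are reported by
raising ValueError. The sketch also applies the standard final step of
UTF-16 decoding, adding 0x10000 to U' to obtain U. A small encoder is
included only so the round trip can be checked.

```python
def decode_utf16_char(units, i=0):
    """Decode one character starting at units[i].

    Returns (character value U, number of 16-bit units consumed).
    Raises ValueError where the steps say the sequence is in error.
    """
    w1 = units[i]
    # Step 1: W1 outside 0xD800..0xDFFF is the character value itself.
    if w1 < 0xD800 or w1 > 0xDFFF:
        return w1, 1
    # Step 2: a leading unit must lie in 0xD800..0xDBFF.
    if w1 > 0xDBFF:
        raise ValueError("W1 is not between 0xD800 and 0xDBFF")
    # Step 3: W2 must exist and lie in 0xDC00..0xDFFF.
    if i + 1 >= len(units):
        raise ValueError("sequence ends with W1")
    w2 = units[i + 1]
    if not 0xDC00 <= w2 <= 0xDFFF:
        raise ValueError("W2 is not between 0xDC00 and 0xDFFF")
    # Step 4: low 10 bits of W1 become the high 10 bits of U',
    # low 10 bits of W2 become the low 10 bits of U'.
    u_prime = ((w1 & 0x3FF) << 10) | (w2 & 0x3FF)
    # Final step: U = U' + 0x10000.
    return u_prime + 0x10000, 2


def encode_utf16(u):
    """Encode one character value (at most 0x10FFFF) as 16-bit units."""
    if not 0 <= u <= 0x10FFFF:
        raise ValueError("character number out of range")
    if u < 0x10000:
        return [u]
    u_prime = u - 0x10000  # fits in 20 bits
    return [0xD800 | (u_prime >> 10), 0xDC00 | (u_prime & 0x3FF)]


if __name__ == "__main__":
    units = encode_utf16(0x12345)
    print([hex(w) for w in units])   # ['0xd808', '0xdf45']
    print(decode_utf16_char(units))  # (74565, 2)
```

For example, the character value 0x12345 gives U' = 0x2345, whose 10
high-order bits are 0x008 and whose 10 low-order bits are 0x345, so
W1 = 0xD808 and W2 = 0xDF45; decoding that pair returns 0x12345.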
Hoffman & Yergeau Informational