Tags for the Identification of Languages
RFC 3066 Tags for Identification of Languages January 2001
This document specifies an identifier mechanism, a registration
function for values to be used with that identifier mechanism, and a
construct for matching against those values.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC 2119].
2. The Language tag
2.1 Language tag syntax
The language tag is composed of one or more parts: A primary language
subtag and a (possibly empty) series of subsequent subtags.
The syntax of this tag in ABNF [RFC 2234] is:
    Language-Tag   = Primary-subtag *( "-" Subtag )
    Primary-subtag = 1*8ALPHA
    Subtag         = 1*8(ALPHA / DIGIT)
The productions ALPHA and DIGIT are imported from RFC 2234; they
denote respectively the characters A to Z in upper or lower case and
the digits from 0 to 9. The character "-" is HYPHEN-MINUS (ABNF:
%x2D).
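The grammar above can be checked mechanically. The following is a minimal sketch in Python, not part of this specification; the names is_well_formed and split_tag are illustrative assumptions. It tests syntax only and says nothing about whether a subtag is actually assigned or registered.

```python
import re

# Direct transcription of the ABNF:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_well_formed(tag: str) -> bool:
    """Return True if `tag` matches the Language-Tag production."""
    return _LANGUAGE_TAG.fullmatch(tag) is not None


def split_tag(tag: str) -> tuple[str, list[str]]:
    """Split a well-formed tag into (primary subtag, subsequent subtags)."""
    if not is_well_formed(tag):
        raise ValueError(f"not a well-formed language tag: {tag!r}")
    primary, *rest = tag.split("-")
    return primary, rest
```

For example, is_well_formed("en-US") and is_well_formed("de-1996") hold, while "en-" (empty subtag), "123" (digits in the primary subtag) and "abcdefghi" (nine letters) are rejected.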
All tags are to be treated as case insensitive; there exist
conventions for capitalization of some of them, but these should not
be taken to carry meaning. For instance, [ISO 3166] recommends that
country codes be capitalized (MN for Mongolia), while [ISO 639]
recommends that language codes be written in lower case (mn for
Mongolian).
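As an illustration of case-insensitive handling, the sketch below compares tags after lowercasing and, purely for display, applies the capitalization conventions mentioned above. The function names are hypothetical. Treating a 2-letter second subtag as a country code is an assumption made for this example; the paragraph above does not state it. The input is assumed to be a syntactically valid tag.

```python
def tags_equal(a: str, b: str) -> bool:
    """Compare two language tags without regard to case."""
    return a.lower() == b.lower()


def conventional_case(tag: str) -> str:
    """Return `tag` in conventional capitalization, for display only.

    Assumption for this sketch: a 2-letter second subtag is a country
    code and is shown in upper case; everything else is lower case.
    The result carries no meaning beyond that of the input tag.
    """
    subtags = tag.lower().split("-")
    if len(subtags) > 1 and len(subtags[1]) == 2 and subtags[1].isalpha():
        subtags[1] = subtags[1].upper()
    return "-".join(subtags)
```

With these definitions, tags_equal("mn-MN", "MN-mn") is true, and conventional_case("MN-mn") yields "mn-MN". Comparisons should always go through something like tags_equal rather than rely on the display form.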
2.2 Language tag sources
The namespace of language tags is administered by the Internet
Assigned Numbers Authority (IANA) [RFC 2860] according to the rules
in section 3 of this document.
The following rules apply to the primary subtag:
- All 2-letter subtags are interpreted according to assignments found
in ISO standard 639, "Code for the representation of names of
languages" [ISO 639], or assignments subsequently made by the ISO
639 part 1 maintenance agency or governing standardization bodies.
(Note: A revision is underway, and is expected to be released as