URN Syntax
RFC 2141 URN Syntax May 1997
2.2 Namespace Specific String Syntax
As required by RFC 1737, there is a single canonical representation
of the NSS portion of an URN. The format of this single canonical
form follows:
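<NSS>         ::= 1*<URN chars>

<URN chars>   ::= <trans> | "%" <hex> <hex>

<trans>       ::= <upper> | <lower> | <number> | <other> | <reserved>

<hex>         ::= <number> | "A" | "B" | "C" | "D" | "E" | "F" |
                  "a" | "b" | "c" | "d" | "e" | "f"

<other>       ::= "(" | ")" | "+" | "," | "-" | "." |
                  ":" | "=" | "@" | ";" | "$" |
                  "_" | "!" | "*" | "'"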
Depending on the rules governing a namespace, valid identifiers in a
namespace might contain characters that are not members of the URN
character set above (<URN chars>). Such strings MUST be translated
into canonical NSS format before using them as protocol elements or
otherwise passing them on to other applications. Translation is done
by encoding each character outside the URN character set as a
sequence of one to six octets using UTF-8 encoding [5], and the
encoding of each of those octets as "%" followed by two characters
from the <hex> character set above. The two characters give the
hexadecimal representation of that octet.
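
The translation just described might be sketched as follows. This is a hedged example, not a normative implementation: the names UNESCAPED and encode_nss are hypothetical, upper-case hex digits are an arbitrary choice (the grammar accepts either case), and the sketch assumes that the reserved characters "%", "/", "?" and "#" are escaped along with everything else outside the letters, digits and listed punctuation. Python's UTF-8 codec emits at most four octets per character, which stays within the one-to-six range stated above.

```python
import string

# Characters copied through unchanged: letters, digits, and the listed
# punctuation. Everything else, including "%", is escaped (an assumption
# of this sketch).
OTHER = "()+,-.:=@;$_!*'"
UNESCAPED = frozenset(string.ascii_letters + string.digits + OTHER)


def encode_nss(identifier: str) -> str:
    """Translate a namespace identifier string into %-escaped NSS form."""
    out = []
    for ch in identifier:
        if ch in UNESCAPED:
            out.append(ch)
        else:
            # Encode the character as UTF-8, then write each octet as
            # "%" followed by two hexadecimal digits.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)
```

Under these assumptions, encode_nss("a b") gives "a%20b" and encode_nss("100%") gives "100%25".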
2.3 Reserved characters
The remaining character set left to be discussed above is the
reserved character set, which contains various characters reserved
from normal use. The reserved character set follows, with a
discussion on the specifics of why each character is reserved.
The reserved character set is:
<reserved>    ::= "%" | "/" | "?" | "#"
2.3.1 The "%" character
The "%" character is reserved in the URN syntax for introducing the
escape sequence for an octet. Literal use of the "%" character in a
namespace must be encoded using "%25" in URNs for that namespace.
Each "%" character in an URN MUST be followed by two characters from
the <hex> character set.
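
A receiver could check this rule with a single regular expression. The following is a minimal sketch; the names _BAD_PERCENT and has_valid_escapes are hypothetical, and the check covers only the "%" rule, not the rest of the URN syntax.

```python
import re

# Matches any "%" that is NOT followed by two hexadecimal digits.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def has_valid_escapes(nss: str) -> bool:
    """Return True if every "%" in nss introduces a two-digit hex escape."""
    return _BAD_PERCENT.search(nss) is None
```

With this sketch, "a%20b" and "%25" pass, while "100%", "%2" and "a%2Gb" are rejected.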
Moats Standards Track