RTP Payload Format for 12-bit DAT Audio and 20- and 24-bit Linear Sampled Audio
RFC 3190 RTP Payload Format January 2002
sample ordering and channel interleaving specified in [2] plus
extensions specified here. This document also specifies out-of-band
negotiation methods for the extended channel interleaving rules and
for use when an analog preemphasis technique is applied to the audio
data.
1.1 Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [6].
2. The need for RTP encapsulation of 12-, 20- and 24-bit audio
Many high-quality digital audio and visual systems, such as DAT and
DV, adopt sample-based audio encodings. Different audio formats are
used in various situations. To transport the audio data using RTP,
an encapsulation needs to be defined for each specific format. Only
16-bit linear audio encapsulation (L16) has thus far been defined.
Other encoding formats have already appeared, such as the 12-bit
nonlinear, 20-bit linear and 24-bit linear encodings used in the DAT
and DV video world. This specification defines RTP payload
encapsulation formats so that these encodings can be carried over
RTP.
3. 12-bit nonlinear audio encapsulation
IEC 61119 [3] specifies the 12-bit nonlinear audio format in DAT and
DV, called LP (Long Play) audio. It would be easy to convert 12-bit
nonlinear audio into 16-bit linear form at the RTP sender and
transmit it using the L16 audio format already defined. However,
this would consume 33% more network bandwidth than necessary. This
payload format is specified as a more efficient alternative.
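The 33% figure follows directly from the ratio of sample widths
(16 / 12 = 4/3). The sketch below works through that arithmetic for an
illustrative stream. The 32 kHz sampling rate and two channels are
assumptions chosen for the example, not values taken from this
document, and pcm_bit_rate is a hypothetical helper name. Only the
audio payload is counted; RTP, UDP and IP header overhead is ignored.

```python
def pcm_bit_rate(sample_rate_hz, channels, bits_per_sample):
    """Payload bit rate in bits per second, excluding all packet headers."""
    return sample_rate_hz * channels * bits_per_sample


# Assumed example parameters (not taken from this document).
SAMPLE_RATE_HZ = 32000
CHANNELS = 2

rate_12 = pcm_bit_rate(SAMPLE_RATE_HZ, CHANNELS, 12)  # native 12-bit packing
rate_16 = pcm_bit_rate(SAMPLE_RATE_HZ, CHANNELS, 16)  # expanded to L16

# Extra bandwidth spent by sending L16 instead of the 12-bit form.
overhead = (rate_16 - rate_12) / rate_12

print(f"12-bit payload: {rate_12} bit/s")
print(f"L16 payload:    {rate_16} bit/s")
print(f"overhead:       {overhead:.1%}")
```

The overhead ratio does not depend on the sampling rate or channel
count chosen, since both cancel out of the quotient.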
The 12-bit nonlinear encoding is the same as for 16-bit linear audio
except for the packing of each sampled data element. Each sample of
12-bit nonlinear audio is derived from a single sample of 16-bit
linear audio by a nonlinear compression. Table 1 shows the details
of the conversion from 16 to 12 bits. The result is a 12-bit signed
value in the range -2048 to 2047, represented in two's complement
notation. The 12-bit samples are packed contiguously into
payload octets starting with the most significant bit. When the
payload contains an odd number of samples, the four LSBs of the last
octet are unused. Parameters other than quantization, e.g., sampling
frequency and audio channel assignment, are the same as in the L16
payload format. In particular, samples are packed into the packet in
time sequence beginning with the oldest sample.
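The packing rule described above can be sketched as follows. This is a
minimal illustration, not a reference implementation: the function
names pack_dat12 and unpack_dat12 are hypothetical, the input is
assumed to be already-compressed 12-bit signed values (the 16-to-12
conversion of Table 1 is not performed here), and the four unused
LSBs of a final odd octet are set to zero, which is an assumption
since the text only states that they are unused.

```python
def pack_dat12(samples):
    """Pack 12-bit signed samples into octets, most significant bit first."""
    out = bytearray()
    acc = 0      # bit accumulator holding bits not yet written out
    nbits = 0    # number of valid bits currently in acc
    for s in samples:
        if not -2048 <= s <= 2047:
            raise ValueError(f"sample {s} does not fit in 12 bits")
        # Append the 12-bit two's complement form of the sample.
        acc = (acc << 12) | (s & 0xFFF)
        nbits += 12
        # Emit every complete octet, highest bits first.
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1
    if nbits:
        # Odd sample count: 4 bits remain; pad the low nibble with zeros
        # (assumed padding value).
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_dat12(payload, count):
    """Recover `count` signed 12-bit samples from a packed payload."""
    samples = []
    for i in range(count):
        byte, offset = divmod(i * 12, 8)
        if offset == 0:
            # Sample starts on an octet boundary: 8 bits + high nibble.
            v = (payload[byte] << 4) | (payload[byte + 1] >> 4)
        else:
            # Sample starts mid-octet: low nibble + following 8 bits.
            v = ((payload[byte] & 0x0F) << 8) | payload[byte + 1]
        # Sign-extend from 12 bits.
        if v & 0x800:
            v -= 0x1000
        samples.append(v)
    return samples
```

Two samples occupy exactly three octets, so a payload of N samples
takes ceil(N * 12 / 8) octets. The receiver needs the sample count
(or must infer it from the payload length and channel count) to know
whether the last low nibble carries data or padding.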
Kobayashi, et al. Standards Track