RTP Payload Format for the 1998 Version of ITU-T Rec
RFC 2429 H.263+ October 1998

The 1998 version of ITU-T Recommendation H.263 added numerous coding
options to improve codec performance over the 1996 version. The 1998
version is referred to as H.263+ in this document. Among the new
options, the ones with the biggest impact on the RTP payload
specification and the error resilience of the video content are the
slice structured mode, the independent segment decoding mode, the
reference picture selection mode, and the scalability mode. This
section summarizes the impact of these new coding options on
packetization. Refer to [4] for more information on coding options.

The slice structured mode was added to H.263+ for three purposes: to
provide enhanced error resilience capability, to make the bitstream
more amenable to use with an underlying packet transport such as RTP,
and to minimize video delay. The slice structured mode supports
fragmentation at macroblock boundaries.
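
As a rough illustration of why slices suit a packet transport, the
following sketch groups whole slices into packets so that every packet
boundary coincides with a slice boundary. The function name
pack_slices, the representation of a slice as a byte string, and the
max_payload limit are assumptions made for this example; none of them
is defined by H.263+ or by this payload format.

```python
from typing import List


def pack_slices(slices: List[bytes], max_payload: int) -> List[List[bytes]]:
    """Group consecutive slices into packets of at most max_payload bytes.

    Hypothetical helper: each element of `slices` stands for one
    already-encoded slice. A slice larger than max_payload is placed
    alone in its own packet here; a real packetizer would have to
    fragment it further.
    """
    packets: List[List[bytes]] = []
    current: List[bytes] = []
    used = 0
    for s in slices:
        # Start a new packet when the next slice would not fit.
        if current and used + len(s) > max_payload:
            packets.append(current)
            current, used = [], 0
        current.append(s)
        used += len(s)
    if current:
        packets.append(current)
    return packets
```

Because no slice is split across packets in this sketch, the loss of
one packet affects only the slices that packet carried.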
With the independent segment decoding (ISD) option, a video picture
is broken into segments and encoded in such a way that each
segment is independently decodable. Using ISD in a lossy network
environment helps to prevent the propagation of errors from one
segment of the picture to others.

The reference picture selection mode allows the use of an older
reference picture rather than the one immediately preceding the
current picture. Usually, the last transmitted frame is implicitly
used as the reference picture for inter-frame prediction. If the
reference picture selection mode is used, the data stream identifies
which reference frame to use, with the temporal reference serving as
the ID for that reference frame. The reference
picture selection mode can be used with or without a back channel,
which provides information to the encoder about the internal status
of the decoder. However, no special provision is made herein for
carrying back channel information.
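
The choice of reference described above can be sketched as a lookup
keyed by temporal reference. This is a minimal sketch under stated
assumptions: the class ReferenceStore, its method names, and the use
of a byte string to stand for a decoded picture are all hypothetical
and are not taken from H.263+ or from this document.

```python
from typing import Dict, Optional


class ReferenceStore:
    """Hypothetical decoder-side store of decoded pictures.

    Pictures are keyed by temporal reference (TR). The class name, the
    method names, and the use of bytes for a picture are assumptions
    for illustration only.
    """

    def __init__(self) -> None:
        self._pictures: Dict[int, bytes] = {}
        self._last_tr: Optional[int] = None

    def add(self, tr: int, picture: bytes) -> None:
        """Record a decoded picture under its temporal reference."""
        self._pictures[tr] = picture
        self._last_tr = tr

    def reference_for(self, signalled_tr: Optional[int] = None) -> bytes:
        """Return the picture to predict from.

        With no signalled TR, the most recently added picture is used
        implicitly; otherwise the picture with the signalled TR is used.
        """
        tr = self._last_tr if signalled_tr is None else signalled_tr
        if tr is None or tr not in self._pictures:
            raise KeyError(f"no stored picture for temporal reference {tr}")
        return self._pictures[tr]
```

For instance, an encoder that learns that the most recent picture
did not arrive intact could signal the temporal reference of an
earlier picture, so that prediction proceeds from a picture the
decoder actually holds.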
H.263+ also includes bitstream scalability as an optional coding
mode. Three kinds of scalability are defined: temporal, signal-to-
noise ratio (SNR), and spatial scalability. Temporal scalability is
achieved via the disposable nature of bi-directionally predicted
frames, or B-frames. (A low-delay form of temporal scalability known
as P-picture temporal scalability can also be achieved by using the
reference picture selection mode described in the previous
paragraph.) SNR scalability permits refinement of encoded video
frames, thereby improving the quality (or SNR). Spatial scalability
is similar to SNR scalability, except that the refinement layer is twice
the size of the base layer in the horizontal dimension, vertical
dimension, or both.
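
Two of these ideas can be illustrated with a short sketch: dropping
B-frames to obtain a lower frame rate, and computing the picture size
of a spatial refinement layer. The function names, the use of the
strings "I", "P", and "B" to label frames, and the boolean flags are
assumptions chosen for this example only.

```python
from typing import List, Tuple


def drop_b_frames(frame_types: List[str]) -> List[str]:
    """Return the frame sequence with all B-frames removed.

    B-frames are disposable because no other frame is predicted from
    them, so the remaining frames still decode correctly.
    """
    return [t for t in frame_types if t != "B"]


def spatial_enhancement_size(
    base_width: int,
    base_height: int,
    double_horizontal: bool,
    double_vertical: bool,
) -> Tuple[int, int]:
    """Return (width, height) of a spatial refinement layer.

    Each selected dimension is twice that of the base layer.
    """
    width = base_width * 2 if double_horizontal else base_width
    height = base_height * 2 if double_vertical else base_height
    return width, height
```

With a QCIF base layer of 176 x 144, doubling both dimensions gives a
CIF refinement layer of 352 x 288.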
Bormann, et al. Standards Track