RTP Payload Format for MPEG-4 Audio/Visual Streams
RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000
functionality and thus require no such functionality from MPEG-4
Systems. H.323 terminals are an example of such systems, where
MPEG-4 Audio/Visual streams are not managed by MPEG-4 Systems Object
Descriptors but by H.245. The streams are directly mapped onto RTP
packets without using MPEG-4 Systems Sync Layer. Other examples are
SIP and RTSP where MIME and SDP are used. MIME types and SDP usages
of the RTP payload formats described in this document are defined to
directly specify the attributes of Audio/Visual streams (e.g., media
type, packetization format and codec configuration) without using
MPEG-4 Systems. The obvious benefit is that these MPEG-4
Audio/Visual RTP payload formats can be handled in a unified way
together with those formats defined for non-MPEG-4 codecs. The
disadvantage is that interoperability with environments using MPEG-4
Systems may be difficult; other payload formats may be better suited
to those applications.
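As an illustration of describing a stream through SDP alone, the
following Python sketch builds a minimal media description for an
MPEG-4 Visual stream. The encoding name "MP4V-ES" and the
"profile-level-id" parameter are taken from the later sections of this
specification rather than from this passage; the function name, port,
payload type number and default clock rate are arbitrary assumptions
chosen for the example.

```python
def mpeg4_visual_sdp(port: int, payload_type: int,
                     profile_level_id: int,
                     clock_rate: int = 90000) -> str:
    """Return SDP lines describing one MPEG-4 Visual RTP stream.

    Hypothetical helper for illustration only. The media type,
    packetization format and codec profile are all carried in SDP
    attributes, with no MPEG-4 Systems Object Descriptor involved.
    """
    lines = [
        # Media line: media type, transport port, RTP profile, payload type.
        f"m=video {port} RTP/AVP {payload_type}",
        # Binds the dynamic payload type to the payload format and clock.
        f"a=rtpmap:{payload_type} MP4V-ES/{clock_rate}",
        # Format-specific parameters (codec configuration hints).
        f"a=fmtp:{payload_type} profile-level-id={profile_level_id}",
    ]
    # SDP lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(mpeg4_visual_sdp(port=49170, payload_type=98, profile_level_id=1))
```

A receiver that understands SDP but not MPEG-4 Systems can treat such a
description like that of any other video codec, which is the "unified
way" of handling referred to above.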
The semantics of RTP headers in such cases need to be clearly
defined, including the association with MPEG-4 Audio/Visual data
elements. In addition, it is beneficial to define the fragmentation
rules of RTP packets for MPEG-4 Video streams so as to enhance error
resiliency by utilizing the error resilience tools provided inside
the MPEG-4 Video stream.
1.1 MPEG-4 Visual RTP payload format
MPEG-4 Visual is a visual coding standard with many new features:
high coding efficiency; high error resiliency; multiple, arbitrary
shape object-based coding; etc. [2]. It covers a wide range of
bitrates from scores of Kbps to several Mbps. It also covers a wide
variety of networks, ranging from those guaranteed to be almost
error-free to mobile networks with high error rates.
With respect to the fragmentation rules for an MPEG-4 Visual
bitstream defined in this document, since MPEG-4 Visual is used for a
wide variety of networks, it is desirable not to apply too much
restriction on fragmentation, and a fragmentation rule such as "a
single video packet shall always be mapped on a single RTP packet"
may be inappropriate. On the other hand, careless, media-unaware
fragmentation may cause degradation in error resiliency and bandwidth
efficiency. The fragmentation rules described in this document are
flexible, yet they define the minimum rules needed to prevent
meaningless fragmentation while still exploiting the error resilience
functionalities of MPEG-4 Visual.
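The difference between media-aware and media-unaware fragmentation can
be sketched in Python as follows. This is one possible policy under
stated assumptions, not the normative rules of this document: the
function name is hypothetical, the sizes of the video packets within a
single VOP are assumed to be known, and "max_payload" stands for the
bytes available per RTP packet after headers. Whole video packets are
grouped while they fit, and a video packet is split only when it is
larger than the limit by itself.

```python
from typing import List


def group_video_packets(video_packet_sizes: List[int],
                        max_payload: int) -> List[List[int]]:
    """Plan RTP payloads for the video packets of a single VOP.

    Returns one list per RTP packet; each entry is the size in bytes
    of a whole video packet or of a fragment of an oversized one.
    Illustrative policy only.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")

    payloads: List[List[int]] = []
    current: List[int] = []
    used = 0

    for size in video_packet_sizes:
        if size > max_payload:
            # Flush what has been collected, then split the oversized
            # video packet; its fragments share an RTP packet with
            # nothing else.
            if current:
                payloads.append(current)
                current, used = [], 0
            remaining = size
            while remaining > 0:
                chunk = min(remaining, max_payload)
                payloads.append([chunk])
                remaining -= chunk
            continue

        if used + size > max_payload:
            # Start a new RTP packet at a video packet boundary rather
            # than cutting the video packet in the middle.
            payloads.append(current)
            current, used = [], 0

        current.append(size)
        used += size

    if current:
        payloads.append(current)
    return payloads


if __name__ == "__main__":
    print(group_video_packets([400, 500, 700, 3000, 200], 1400))
```

Because every RTP packet boundary falls on a video packet boundary
whenever possible, the loss of one RTP packet leaves the remaining
video packets independently decodable, which a byte-count-only split
would not guarantee.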
The fragmentation rule recommends against mapping more than one VOP
into a single RTP packet, so that the RTP timestamp uniquely indicates
the VOP time framing. On the other hand, MPEG-4 video may generate
VOPs of very
small size, in cases with an empty VOP (vop_coded=0) containing only
Kikuchi, et al. Standards Track