Internet Engineering Task Force                       J. van der Meer
Internet Draft                                    Philips Electronics
                                                            D. Mackie
                                                       Apple Computer
                                                       V. Swaminathan
                                                Sun Microsystems Inc.
                                                            D. Singer
                                                       Apple Computer
                                                           P. Gentric
                                                  Philips Electronics
                                                        December 2002
Expires June 2003
Document: draft-ietf-avt-mpeg4-simple-05.txt


               Transport of MPEG-4 Elementary Streams

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This specification is a product of the Audio/Video Transport working
group within the Internet Engineering Task Force. Comments are
solicited and should be addressed to the working group's mailing list
at avt@ietf.org and/or the authors.

<< Note for the RFC editor: xxxx should be replaced with the RFC
number that will be assigned. >>

Abstract

The MPEG Committee (ISO/IEC JTC1/SC29 WG11) is a working group in ISO
that produced the MPEG-4 standard. MPEG defines tools to compress
content such as audio-visual information into elementary streams.
This specification defines a simple, but generic RTP payload format
for transport of any non-multiplexed MPEG-4 elementary stream.

van der Meer et al.         Expires June 2003                [Page 1]
RFC xxxx      Transport of MPEG-4 Elementary Streams    December 2002

Table of Contents

1. Introduction
2. Carriage of MPEG-4 elementary streams over RTP
   2.1. Introduction
   2.2. MPEG Access Units
   2.3. Concatenation of Access Units
   2.4. Fragmentation of Access Units
   2.5. Interleaving
   2.6. Time stamp information
   2.7. State indication of MPEG-4 system streams
   2.8. Random Access Indication
   2.9. Carriage of auxiliary information
   2.10. MIME format parameters and configuring conditional fields
   2.11. Global structure of payload format
   2.12. Modes to transport MPEG-4 streams
   2.13. Alignment with RFC 3016
3. Payload format
   3.1. Usage of RTP header fields and RTCP
   3.2. RTP payload structure
      3.2.1. The AU Header Section
         3.2.1.1. The AU-header
      3.2.2. The Auxiliary Section
      3.2.3. The Access Unit Data Section
         3.2.3.1. Fragmentation
         3.2.3.2. Interleaving
         3.2.3.3. Constraints for interleaving
   3.3. Usage of this specification
      3.3.1. General
      3.3.2. The generic mode
      3.3.3. Constant bit rate CELP
      3.3.4. Variable bit rate CELP
      3.3.5. Low bit rate AAC
      3.3.6. High bit rate AAC
      3.3.7. Additional modes
4. IANA considerations
   4.1. MIME type registration
   4.2. Registration of mode definitions with IANA
   4.3. Concatenation of parameters
   4.4. Usage of SDP
      4.4.1. The a=fmtp keyword
5. Security considerations
6. Acknowledgements
7. References
8. Author addresses
APPENDIX: Usage of this payload format
A. Examples of delay analysis with interleave
   A.1 Introduction
   A.2 De-interleaving and error concealment
   A.3 Simple Group interleave
      A.3.1 Introduction
      A.3.2 Determining the de-interleave buffer size
      A.3.3 Determining the maximum displacement
   A.4 More subtle group interleave
      A.4.1 Introduction
      A.4.2 Determining the de-interleave buffer size
      A.4.3 Determining the maximum displacement
   A.5 Continuous interleave
      A.5.1 Introduction
      A.5.2 Determining the de-interleave buffer size
      A.5.3 Determining the maximum displacement

1. Introduction

The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29
that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4
standards [1]. The MPEG-4 standard specifies compression of
audio-visual data into, for example, an audio or video elementary
stream. In the MPEG-4 standard, these streams take the form of
audio-visual objects that may be arranged into an audio-visual scene
by means of a scene description. Each MPEG-4 elementary stream
consists of a sequence of Access Units; examples of an Access Unit
(AU) are an audio frame and a video picture.

This specification defines a general and configurable payload
structure to transport MPEG-4 elementary streams, in particular
MPEG-4 audio (including speech) streams, MPEG-4 video streams and
also MPEG-4 systems streams, such as BIFS (BInary Format for Scenes),
OCI (Object Content Information), OD (Object Descriptor) and IPMP
(Intellectual Property Management and Protection) streams. The RTP
payload defined in this document is simple to implement and
reasonably efficient. It allows for optional interleaving of Access
Units (such as audio frames) to increase error resiliency to packet
loss.

Some types of MPEG-4 elementary streams include "crucial" information
whose loss cannot be tolerated, but RTP does not provide reliable
transmission so receipt of that crucial information is not assured.
Section 3.2.1.1 specifies how stream state is conveyed so that the
receiver can detect the loss of crucial information and cease
decoding until the next random access point is received. Applications
transmitting streams that include crucial information, such as OD
commands, BIFS commands, or programmatic content such as MPEG-J
(Java) and ECMAScript, should include random access points
sufficiently often, depending upon the probability of loss, to reduce
stream corruption to an acceptable level.
An example is the carousel mechanism as defined by MPEG in ISO/IEC
14496-1.

Such applications may also employ additional protocols or services to
reduce the probability of loss. At the RTP layer, these measures
include payload formats and profiles for retransmission or forward
error correction (such as in RFC 2733), which must be employed with
due consideration to congestion control. Another solution that may be
appropriate for some applications is to carry RTP over TCP (such as
in RFC 2326, section 10.12). At the network layer, resource
allocation or preferential service may be available to reduce the
probability of loss. For a general description of methods to repair
streaming media, see RFC 2354.

Though the RTP payload format defined in this document is capable of
transporting any MPEG-4 stream, other, more specific, formats may
exist, such as RFC 3016 for transport of MPEG-4 video (part 2).

Configuration of the payload is provided to accommodate transport of
any MPEG-4 stream at any possible bit rate. However, for a specific
MPEG-4 elementary stream typically only very few configurations are
needed. So as to allow for the design of simplified, but dedicated
receivers, this specification requires that specific modes are
defined for transport of MPEG-4 streams. This document defines modes
for MPEG-4 CELP and AAC streams, as well as a generic mode that can
be used to transport any MPEG-4 stream. In the future, new RFCs are
expected to specify additional modes for transport of MPEG-4 streams.

The RTP payload format defined in this document specifies carriage of
system-related information that is often equivalent to the
information that may be contained in the MPEG-4 Sync Layer (SL) as
defined in MPEG-4 Systems [1]. This document does not prescribe how
to transcode or map information from the SL to fields defined in the
RTP payload format.
Such processing, if any, is left to the discretion of the
application. However, to anticipate the need for transport of any
additional system-related information in the future, an auxiliary
field can be configured that may carry any such data.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [3].

2. Carriage of MPEG-4 elementary streams over RTP

2.1 Introduction

With this payload format a single MPEG-4 elementary stream can be
transported. Information on the type of MPEG-4 stream carried in the
payload is conveyed by MIME format parameters, for example in an SDP
[6] message or by other means (see section 4). These MIME format
parameters specify the configuration of the payload. To allow for
simplified and dedicated receivers, a MIME format parameter is
available to signal a specific mode of using this payload. A mode
definition MAY include the type of MPEG-4 elementary stream as well
as the applied configuration, so that receivers need not parse all
MIME format parameters. The applied mode MUST be signaled.

2.2 MPEG Access Units

For carriage of compressed audio-visual data MPEG defines Access
Units. An MPEG Access Unit (AU) is the smallest data entity to which
timing information is attributed. In the case of audio, an Access
Unit may represent an audio frame; in the case of video, a picture.
MPEG Access Units are by definition octet-aligned. If, for example,
an audio frame is not octet-aligned, up to 7 zero-padding bits MUST
be inserted at the end of the frame to achieve octet-aligned Access
Units, as required by the MPEG-4 specification. MPEG-4 decoders MUST
be able to decode AUs in which such padding is applied.

Consistent with the MPEG-4 specification, this document requires that
each MPEG-4 part 2 video Access Unit includes all the coded data of a
picture, any video stream headers that may precede the coded picture
data, and any video stream stuffing that may follow it, up to, but
not including the startcode indicating the start of a new video
stream or the next Access Unit.

2.3 Concatenation of Access Units

Frequently it is possible to carry multiple Access Units in one RTP
packet. This is particularly useful for audio; for example, when AAC
is used for encoding of a stereo signal at 64 kbit/s, AAC frames
contain on average approximately 200 octets. On a LAN with a 1500
octet MTU this would allow on average 7 complete AAC frames to be
carried per RTP packet.

Access Units may have a fixed size in octets, but a variable size is
also possible. To facilitate parsing in case of multiple concatenated
AUs in one RTP packet, the size of each AU is made known to the
receiver. When concatenating in case of a constant AU size, this size
is communicated "out of band" through a MIME format parameter. When
concatenating in case of variable size AUs, the RTP payload carries
"in band" an AU size field for each contained AU. In combination with
the RTP payload length the size information allows the RTP payload to
be split by the receiver back into the individual AUs.

To simplify the implementation of RTP receivers, it is required that
when multiple AUs are carried in an RTP packet, each AU MUST be
complete, i.e. the number of AUs in an RTP packet MUST be integral.
In addition, an AU MUST NOT be repeated in other RTP packets; hence
repetition of an AU is only possible by using a duplicate RTP packet.

2.4 Fragmentation of Access Units

MPEG allows for very large Access Units.
Since most IP networks have significantly smaller MTU sizes, this
payload format allows for the fragmentation of an Access Unit over
multiple RTP packets so as to avoid IP layer fragmentation. To
simplify the implementation of RTP receivers, an RTP packet SHALL
either carry one or more complete Access Units or a single fragment
of one Access Unit (i.e. packets MUST NOT contain fragments of
multiple Access Units).

2.5 Interleaving

When an RTP packet carries a contiguous sequence of Access Units, the
loss of such a packet can result in a "decoding gap" for the user.
One method to alleviate this problem is to allow for the Access Units
to be interleaved in the RTP packets. For a modest cost in latency
and implementation complexity, significant error resiliency to packet
loss can be achieved.

To support optional interleaving of Access Units, this payload format
allows for index information to be sent for each Access Unit. After
informing receivers about buffer resources to allocate for
de-interleaving, the RTP sender is free to choose the interleaving
pattern without propagating this information a priori to the
receiver(s). Indeed the sender could dynamically adjust the
interleaving pattern based on the Access Unit size, error rates, etc.
The RTP receiver does not need to know the interleaving pattern used;
it only needs to extract the index information of the Access Unit and
insert the Access Unit into the appropriate sequence in the decoding
or rendering queue. An example of interleaving is given below.

Assume that each RTP packet contains 3 AUs, and that the AUs are
numbered 0, 1, 2, 3, 4, etc. If an interleaving group length of 9 is
chosen, then RTP packet(i) contains the following AU(n):

   RTP packet(0): AU(0),  AU(3),  AU(6)
   RTP packet(1): AU(1),  AU(4),  AU(7)
   RTP packet(2): AU(2),  AU(5),  AU(8)
   RTP packet(3): AU(9),  AU(12), AU(15)
   RTP packet(4): AU(10), AU(13), AU(16)
   Etc.

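The pattern in the example above can be reproduced with a short
sketch. The function name and its parameters are illustrative only
and are not defined by this specification; the sketch assumes a
simple group interleave in which the group length is a multiple of
the number of AUs per packet. Senders remain free to choose other
patterns.

```python
def group_interleave(num_aus, aus_per_packet, group_length):
    """Return packets as lists of AU numbers for a simple group interleave.

    Hypothetical helper: assumes group_length is a multiple of aus_per_packet.
    """
    if group_length % aus_per_packet != 0:
        raise ValueError("group_length must be a multiple of aus_per_packet")
    stride = group_length // aus_per_packet  # packets per interleaving group
    packets = []
    for group_start in range(0, num_aus, group_length):
        for offset in range(stride):
            # Take every stride-th AU of the group, starting at this offset.
            packet = [group_start + offset + k * stride
                      for k in range(aus_per_packet)]
            packet = [n for n in packet if n < num_aus]  # drop AUs past the end
            if packet:
                packets.append(packet)
    return packets


for i, packet in enumerate(group_interleave(18, 3, 9)):
    print(f"RTP packet({i}):", ", ".join(f"AU({n})" for n in packet))
```

With 18 AUs, 3 AUs per packet and a group length of 9, the first five
packets match the listing above. Losing any single packet then
removes every third AU of a group rather than three consecutive AUs.
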
2.6 Time stamp information

The RTP time stamp MUST carry the sampling instant of the first AU
(fragment) in the RTP packet. When multiple AUs are carried within an
RTP packet, the time stamps of subsequent AUs can be calculated if
the frame period of each AU is known. For audio and video this is
possible if the frame rate is constant. However, in some cases it is
not possible to make such a calculation, for example for variable
frame rate video and for MPEG-4 BIFS streams carrying composition
information. To support such cases, this payload format can be
configured to carry a time stamp in the RTP payload for each
contained Access Unit. A time stamp MAY be conveyed in the RTP
payload only for non-first AUs in the RTP packet, and SHALL NOT be
conveyed for the first AU (fragment), as the time stamp for the first
AU in the RTP packet is carried by the RTP time stamp.

MPEG-4 defines two types of time stamps, the composition time stamp
(CTS) and the decoding time stamp (DTS). The CTS represents the
sampling instant of an AU, and hence the CTS is equivalent to the RTP
time stamp. The DTS may be used in MPEG-4 video streams that use
bi-directional coding, i.e. when pictures are predicted in both
forward and backward direction by using either a reference picture in
the past, or a reference picture in the future. The DTS cannot be
carried in the RTP header. In some cases the DTS can be derived from
the RTP time stamp using frame rate information; this requires deep
parsing in the video stream, which may be considered objectionable.
Moreover, if the video frame rate is variable, the required
information may not even be present in the video stream. For both
reasons, the capability has been defined to optionally carry the DTS
in the RTP payload for each contained Access Unit.
To keep the coding of time stamps efficient, each time stamp
contained in the RTP payload is coded differentially: the CTS
relative to the RTP time stamp, and the DTS relative to the CTS.

2.7 State indication of MPEG-4 system streams

ISO/IEC 14496-1 defines states for MPEG-4 system streams. So as to
convey state information when transporting MPEG-4 system streams,
this payload format allows for the optional carriage in the RTP
payload of the stream state for each contained Access Unit. Stream
states are used to signal "crucial" AUs that carry information whose
loss cannot be tolerated and are also useful when repeating AUs
according to the carousel mechanism defined in ISO/IEC 14496-1.

2.8 Random access indication

Random access to the content of MPEG-4 elementary streams may be
possible at some but not all Access Units. To signal Access Units
where random access is possible, a random access point flag can
optionally be carried in the RTP payload for each contained Access
Unit. Carriage of random access points is particularly useful for
MPEG-4 system streams in combination with the stream state.

2.9 Carriage of auxiliary information

This payload format defines a specific field to carry auxiliary data.
The auxiliary data field is preceded by a field that specifies the
length of the auxiliary data, so as to facilitate skipping of the
data without parsing it. The coding of the auxiliary data is not
defined in this document; instead the format, meaning and signaling
of auxiliary information is expected to be specified in one or more
future RFCs. Auxiliary information MUST NOT be transmitted until its
format, meaning and signaling have been specified and its use has
been signaled. Receivers that have knowledge of the auxiliary data
MAY decode the auxiliary data, but receivers without knowledge of
such data MUST skip the auxiliary data field.

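Skipping the auxiliary data can be sketched as follows, using the
field layout that section 3.2.2 gives for the Auxiliary Section: a
size field of configurable bit length that states the length in bits
of the auxiliary data, followed by the data and up to 7 padding bits.
The function and parameter names are hypothetical, and the sketch
assumes that the section starts on an octet boundary and that bits
are read most significant bit first.

```python
def auxiliary_section_octets(payload: bytes, offset: int,
                             size_field_bits: int) -> int:
    """Return how many octets the Auxiliary Section occupies at payload[offset:].

    size_field_bits is the configured length of auxiliary-data-size
    (hypothetical parameter name); 0 means the section is absent.
    """
    if size_field_bits == 0:
        return 0
    size_octets = (size_field_bits + 7) // 8
    if offset + size_octets > len(payload):
        raise ValueError("payload too short for auxiliary-data-size")
    # Read the first size_field_bits bits, most significant bit first.
    raw = int.from_bytes(payload[offset:offset + size_octets], "big")
    aux_bits = raw >> (size_octets * 8 - size_field_bits)
    # Size field plus data, rounded up to whole octets (covers the padding).
    total_octets = (size_field_bits + aux_bits + 7) // 8
    if offset + total_octets > len(payload):
        raise ValueError("auxiliary data runs past the end of the payload")
    return total_octets
```

A receiver without knowledge of the auxiliary data would advance its
read position by the returned number of octets and continue with the
Access Unit Data Section.
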
2.10 MIME format parameters and configuring conditional fields

To support the features described in the previous sections several
fields are defined for carriage in the RTP payload. However, their
use strongly depends on the type of MPEG-4 elementary stream that is
carried. Sometimes a specific field is needed with a certain length,
while in other cases such a field is not needed at all. To be
efficient in either case, the fields to support these features are
configurable by means of MIME format parameters. In general, a MIME
format parameter defines the presence and length of the associated
field. A length of zero indicates absence of the field. As a
consequence, parsing of the payload requires knowledge of MIME format
parameters. The MIME format parameters are conveyed to the receiver
via SDP [6] messages, as specified in section 4.4.1, or through other
means.

2.11 Global structure of payload format

The RTP payload following the RTP header contains three octet-aligned
data sections, of which the first two MAY be empty. See figure 1.

   +---------+-----------+-----------+---------------+
   | RTP     | AU Header | Auxiliary | Access Unit   |
   | Header  | Section   | Section   | Data Section  |
   +---------+-----------+-----------+---------------+

             <----------RTP Packet Payload----------->

   Figure 1: Data sections within an RTP packet

The first data section is the AU (Access Unit) Header Section, which
contains one or more AU-headers; however, each AU-header MAY be
empty, in which case the entire AU Header Section is empty. The
second section is the Auxiliary Section, containing auxiliary data;
this section MAY also be configured empty. The third section is the
Access Unit Data Section, containing either a single fragment of one
Access Unit or one or more complete Access Units. The Access Unit
Data Section MUST NOT be empty.

2.12 Modes to transport MPEG-4 streams

While it is possible to build fully configurable receivers capable of
receiving any MPEG-4 stream, this specification also allows for the
design of simplified, but dedicated receivers that are capable, for
example, of receiving only one type of MPEG-4 stream. This is
achieved by requiring that specific modes be defined for using this
specification. Each mode may define constraints for transport of one
or more types of MPEG-4 streams, for instance on the payload
configuration.

The applied mode MUST be signaled. Signaling the mode is particularly
important for receivers that are only capable of decoding one or more
specific modes. Such receivers need to determine whether the applied
mode is supported, so as to avoid problems with processing of
payloads that are beyond the capabilities of the receiver.

In this document several modes are defined for transport of MPEG-4
CELP and AAC streams, as well as a generic mode that can be used for
any MPEG-4 stream. In the future, new RFCs may specify other modes of
using this specification. However, each mode MUST be in full
compliance with this specification (see section 3.3.7).

2.13 Alignment with RFC 3016

This payload can be configured to be nearly identical to the payload
format defined in RFC 3016 [5] for the MPEG-4 video configurations
recommended in RFC 3016. Hence, receivers that comply with RFC 3016
can decode such RTP payload, provided that additional packets
containing video decoder configuration (VO, VOL, VOSH) are inserted
in the stream, as required by RFC 3016.

Conversely, receivers that comply with the specification in this
document should be able to decode payloads, names and parameters
defined for MPEG-4 video in RFC 3016. In this respect it is strongly
RECOMMENDED to implement the ability to ignore "in band" video
decoder configuration packets in the RFC 3016 payload.

Note that the "out of band" availability of the video decoder
configuration is optional in RFC 3016.
To achieve maximum interoperability with the RTP payload format
defined in this document, applications that use RFC 3016 to transport
MPEG-4 video (part 2) are recommended to make the video decoder
configuration available as a MIME parameter.

3. Payload Format

3.1 Usage of RTP Header Fields and RTCP

Payload Type (PT): The assignment of an RTP payload type for this
   packet format is outside the scope of this document; it is
   specified by the RTP profile under which this payload format is
   used.

Marker (M) bit: The M bit is set to 1 to indicate that the RTP packet
   payload contains either the final fragment of a fragmented Access
   Unit or one or more complete Access Units.

Extension (X) bit: Defined by the RTP profile used.

Sequence Number: The RTP sequence number SHOULD be generated by the
   sender in the usual manner with a constant random offset.

Timestamp: Indicates the sampling instant of the first AU contained
   in the RTP payload. This sampling instant is equivalent to the CTS
   in the MPEG-4 time domain. When using SDP, the clock rate of the
   RTP time stamp MUST be expressed using the "rtpmap" attribute. If
   an MPEG-4 audio stream is transported, the rate SHOULD be set to
   the same value as the sampling rate of the audio stream. If an
   MPEG-4 video stream is transported, it is RECOMMENDED to set the
   rate to 90 kHz.

   In all cases, the sender SHALL make sure that RTP time stamps are
   identical only if the RTP time stamp refers to fragments of the
   same Access Unit.

   According to RFC 1889 [2] (section 5.1), RTP time stamps are
   RECOMMENDED to start at a random value for security reasons. This
   is not an issue for synchronization of multiple RTP streams.
   When, however, streams from multiple sources are to be
   synchronized (for example one stream from local storage, another
   from an RTP streaming server), synchronization may become
   impossible if the receiver only knows the original time stamp
   relationships. In such cases, synchronization may require that the
   correct relationship between time stamps be provided by out of
   band means. The format of such information as well as methods to
   convey such information are beyond the scope of this
   specification.

SSRC: Set as described in RFC 1889 [2].

CC and CSRC fields are used as described in RFC 1889 [2].

RTCP SHOULD be used as defined in RFC 1889 [2]. Note that time stamps
in RTCP Sender Reports may be used to synchronize multiple MPEG-4
elementary streams and also to synchronize MPEG-4 streams with
non-MPEG-4 streams, provided that these streams are delivered over
RTP.

3.2 RTP Payload Structure

3.2.1 The AU Header Section

When present, the AU Header Section consists of the AU-headers-length
field, followed by a number of AU-headers. See figure 2.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
   |AU-headers-length|AU-header|AU-header|      |AU-header|padding|
   |                 |   (1)   |   (2)   |      |   (n)   | bits  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+

   Figure 2: The AU Header Section

The AU-headers are configured using MIME format parameters and MAY be
empty. If the AU-header is configured empty, the AU-headers-length
field SHALL NOT be present and consequently the AU Header Section is
empty. If the AU-header is not configured empty, then the
AU-headers-length is a two octet field that specifies the length in
bits of the immediately following AU-headers, excluding the padding
bits.

Each AU-header is associated with a single Access Unit (fragment)
contained in the Access Unit Data Section in the same RTP packet.
For each contained Access Unit (fragment) there is exactly one
AU-header. Within the AU Header Section, the AU-headers are bit-wise
concatenated in the order in which the Access Units are contained in
the Access Unit Data Section. Hence, the n-th AU-header refers to the
n-th AU (fragment). If the concatenated AU-headers consume a
non-integer number of octets, up to 7 zero-padding bits MUST be
inserted at the end in order to achieve octet-alignment of the AU
Header Section.

3.2.1.1 The AU-header

Each AU-header may contain the fields given in figure 3. The length
in bits of these fields, with the exception of the CTS-flag, the
DTS-flag and the RAP-flag fields, is defined by MIME format
parameters; see section 4.1. If a MIME format parameter has the
default value of zero, then the associated field is not present. If
present, the fields MUST occur in the mutual order given in figure 3.
In the general case a receiver can only discover the size of an
AU-header by parsing it, since the presence of the CTS-delta and
DTS-delta fields is signaled by the value of the CTS-flag and
DTS-flag, respectively.

   +---------------------------------------+
   |     AU-size                           |
   +---------------------------------------+
   |     AU-Index / AU-Index-delta         |
   +---------------------------------------+
   |     CTS-flag                          |
   +---------------------------------------+
   |     CTS-delta                         |
   +---------------------------------------+
   |     DTS-flag                          |
   +---------------------------------------+
   |     DTS-delta                         |
   +---------------------------------------+
   |     RAP-flag                          |
   +---------------------------------------+
   |     Stream-state                      |
   +---------------------------------------+

   Figure 3: The fields in the AU-header. If used, the AU-Index field
             only occurs in the first AU-header within an AU Header
             Section; in any other AU-header the AU-Index-delta field
             occurs instead.

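The framing rules of section 3.2.1 determine where the AU Header
Section ends. A minimal sketch, assuming that the two-octet
AU-headers-length field is in network byte order and that the caller
already knows from the MIME format parameters whether AU-headers are
configured; the function and parameter names are hypothetical.

```python
def au_header_section_octets(payload: bytes,
                             au_headers_configured: bool) -> int:
    """Return the size in octets of the AU Header Section of an RTP payload."""
    if not au_headers_configured:
        return 0  # no AU-headers-length field, so the section is empty
    if len(payload) < 2:
        raise ValueError("payload too short for AU-headers-length")
    # AU-headers-length counts bits and excludes the padding bits.
    headers_bits = int.from_bytes(payload[:2], "big")
    total = 2 + (headers_bits + 7) // 8  # round up: up to 7 padding bits
    if total > len(payload):
        raise ValueError("AU-headers run past the end of the payload")
    return total
```

For example, an AU-headers-length of 20 bits yields a section of
2 + 3 = 5 octets, the last 4 bits of which are padding.
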
AU-size: Indicates the size in octets of the associated Access Unit
   in the Access Unit Data Section in the same RTP packet. When the
   AU-size is associated with an AU fragment, the AU size indicates
   the size of the entire AU and not the size of the fragment. In
   this case, the size of the fragment is known from the size of the
   AU data section. This can be exploited to determine whether a
   packet contains an entire AU or a fragment, which is particularly
   useful after losing a packet carrying the last fragment of an AU.

AU-Index: Indicates the serial number of the associated Access Unit
   (fragment). For each (in decoding order) consecutive AU or AU
   fragment, the serial number is incremented by 1. When present, the
   AU-Index field occurs in the first AU-header in the AU Header
   Section, but MUST NOT occur in any subsequent (non-first)
   AU-header in that Section. To encode the serial number in any such
   non-first AU-header, the AU-Index-delta field is used. If each
   AU-Index field is coded with the value 0, the serial number of the
   AU (fragment) is not specified, and in that case receivers may
   ignore the AU-Index field.

AU-Index-delta: The AU-Index-delta field is an unsigned integer that
   specifies the serial number of the associated AU as the difference
   with respect to the serial number of the previous Access Unit.
   Hence, for the n-th (n>1) AU the serial number is found from:

      AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1

   If the AU-Index field is present in the first AU-header in the AU
   Header Section, then the AU-Index-delta field MUST be present in
   any subsequent (non-first) AU-header. When the AU-Index-delta is
   coded with the value 0, it indicates that the Access Units are
   consecutive in decoding order. An AU-Index-delta value larger than
   0 signals that interleaving is applied.

CTS-flag: Indicates whether the CTS-delta field is present.
   A value of 1 indicates that the field is present, a value of 0
   that it is not present. The CTS-flag field MUST be present in each
   AU-header if the length of the CTS-delta field is signaled to be
   larger than zero. In that case, the CTS-flag field MUST have the
   value 0 in the first AU-header and MAY have the value 1 in all
   non-first AU-headers. The CTS-flag field SHOULD be 0 for any
   non-first fragment of an Access Unit.

CTS-delta: Encodes the CTS by specifying the value of CTS as a 2's
   complement offset (delta) from the time stamp in the RTP header of
   this RTP packet. The CTS MUST use the same clock rate as the time
   stamp in the RTP header.

DTS-flag: Indicates whether the DTS-delta field is present. A value
   of 1 indicates that DTS-delta is present, a value of 0 that it is
   not present. The DTS-flag field MUST be present in each AU-header
   if the length of the DTS-delta field is signaled to be larger than
   zero. The DTS-flag field MUST have the same value for all
   fragments of an Access Unit.

DTS-delta: Specifies the value of the DTS as a 2's complement offset
   (delta) from the CTS. The DTS MUST use the same clock rate as the
   time stamp in the RTP header. The DTS-delta field MUST have the
   same value for all fragments of an Access Unit.

RAP-flag: Indicates when set to 1 that the associated Access Unit
   provides a random access point to the content of the stream. If an
   Access Unit is fragmented, the RAP flag, if present, MUST be set
   to 0 for each non-first fragment of the AU.

Stream-state: Specifies the state of the stream for an AU of an
   MPEG-4 system stream; each state is identified by a value of a
   modulo counter. In ISO/IEC 14496-1, MPEG-4 system streams use the
   AU_SequenceNumber to signal stream states. When the stream state
   changes, the value of stream-state MUST be incremented by one.

   Note: no relation is required between stream-states of different
   streams.

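The field rules of section 3.2.1.1 can be illustrated with a small
parser sketch. It is not a normative implementation. The
configuration attribute names are hypothetical stand-ins for the MIME
format parameters of section 4.1, the presence of the RAP-flag is
modeled as a simple boolean, and bits are assumed to be read most
significant bit first. The sketch takes a non-empty AU Header Section
(including the AU-headers-length field) and applies the
AU-Index-delta formula and the 2's complement interpretation of the
delta fields.

```python
from dataclasses import dataclass


@dataclass
class AuHeaderConfig:
    """Field lengths in bits; 0 means absent (hypothetical names)."""
    size_length: int = 0
    index_length: int = 0
    index_delta_length: int = 0
    cts_delta_length: int = 0
    dts_delta_length: int = 0
    rap_flag: bool = False
    stream_state_length: int = 0


class BitReader:
    """Reads unsigned fields, MSB first, from the first `limit` bits."""

    def __init__(self, data: bytes, limit: int):
        if limit > len(data) * 8:
            raise ValueError("AU-headers-length exceeds available data")
        self.value = int.from_bytes(data, "big")
        self.total = len(data) * 8
        self.limit = limit
        self.pos = 0

    def read(self, nbits: int) -> int:
        if nbits == 0:
            return 0
        if self.pos + nbits > self.limit:
            raise ValueError("AU-header runs past AU-headers-length")
        shift = self.total - self.pos - nbits
        self.pos += nbits
        return (self.value >> shift) & ((1 << nbits) - 1)


def to_signed(value: int, nbits: int) -> int:
    """Interpret an nbits-wide unsigned value as 2's complement."""
    return value - (1 << nbits) if value >> (nbits - 1) else value


def parse_au_headers(section: bytes, cfg: AuHeaderConfig) -> list:
    """Parse a non-empty AU Header Section into a list of dicts."""
    headers_bits = int.from_bytes(section[:2], "big")  # AU-headers-length
    reader = BitReader(section[2:], headers_bits)
    headers, serial = [], 0
    while reader.pos < headers_bits:
        start, hdr = reader.pos, {}
        if cfg.size_length:
            hdr["au_size"] = reader.read(cfg.size_length)
        if cfg.index_length:
            if not headers:  # first AU-header carries AU-Index
                serial = reader.read(cfg.index_length)
            else:  # AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1
                serial += reader.read(cfg.index_delta_length) + 1
            hdr["serial"] = serial
        if cfg.cts_delta_length and reader.read(1):  # CTS-flag set
            hdr["cts_delta"] = to_signed(
                reader.read(cfg.cts_delta_length), cfg.cts_delta_length)
        if cfg.dts_delta_length and reader.read(1):  # DTS-flag set
            hdr["dts_delta"] = to_signed(
                reader.read(cfg.dts_delta_length), cfg.dts_delta_length)
        if cfg.rap_flag:
            hdr["rap"] = bool(reader.read(1))
        if cfg.stream_state_length:
            hdr["stream_state"] = reader.read(cfg.stream_state_length)
        if reader.pos == start:
            raise ValueError("configuration yields empty AU-headers")
        headers.append(hdr)
    return headers
```

As a usage example, with a 13-bit AU-size and 3-bit index fields, the
octets 00 20 06 40 05 A2 describe two AUs of 200 and 180 octets; the
second header carries an AU-Index-delta of 2, so the serial numbers
are 0 and 3.
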
Expires June 2003 [Page 14] RFC xxxx Transport of MPEG-4 Elementary Streams December 2002 3.2.2 The Auxiliary Section The Auxiliary Section consists of the auxiliary-data-size field followed by the auxiliary-data field. Receivers MAY (but are not required to) parse the auxiliary-data field; to facilitate skipping of the auxiliary-data field by receivers, the auxiliary-data-size field indicates the length in bits of the auxiliary-data. If the concatenation of the auxiliary-data-size and the auxiliary-data fields consume a non-integer number of octets, up to 7 zero padding bits MUST be inserted immediately after the auxiliary data in order to achieve octet-alignment. See figure 4. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+ | auxiliary-data-size | auxiliary-data |padding bits | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+ Figure 4: The fields in the Auxiliary Section The length in bits of the auxiliary-data-size field is configurable by a MIME format parameter; see section 4.1. The default length of zero indicates that the entire Auxiliary Section is absent. auxiliary-data-size: specifies the length in bits of the immediately following auxiliary-data field; auxiliary-data: the auxiliary-data field contains data of a format not defined by this specification. 3.2.3 The Access Unit Data Section The Access Unit Data Section contains an integer number of complete Access Units or a single fragment of one AU. The Access Unit Data Section is never empty. If data of more than one Access Unit is present, then the AUs are concatenated into a contiguous string of octets. See figure 5. The AUs inside the Access Unit Data Section MUST be in decoding order, though not necessarily contiguous in the case of interleaving. The size and number of Access Units SHOULD be adjusted such that the resulting RTP packet is not larger than the path MTU. 
To handle larger packets, this payload format relies on lower layers for fragmentation, which may result in reduced performance.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|AU(1)                                                          |
+                                                               |
|                                                               |
|               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |AU(2)                                          |
+-+-+-+-+-+-+-+-+                                               |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               | AU(n)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|AU(n) continued|
|-+-+-+-+-+-+-+-+

Figure 5: Access Unit Data Section; each AU is octet-aligned.

When multiple Access Units are carried, the size of each AU MUST be made available to the receiver. If the AU size is variable then the size of each AU MUST be indicated in the AU-size field of the corresponding AU-header. However, if the AU size is constant for a stream, this mechanism SHOULD NOT be used, but instead the fixed size SHOULD be signaled by the MIME format parameter "ConstantSize", see section 4.1.

The absence of both AU-size in the AU-header and the ConstantSize MIME format parameter indicates carriage of a single AU (fragment), i.e. that a single Access Unit (fragment) is transported in each RTP packet for that stream.

3.2.3.1 Fragmentation

A packet SHALL carry either one or more complete Access Units, or a single fragment of an Access Unit. Fragments of the same Access Unit have the same time stamp but different RTP sequence numbers. The marker bit in the RTP header is 1 on the last fragment of an Access Unit, and 0 on all other fragments.

3.2.3.2 Interleaving

Access Units MAY be interleaved. Senders MAY perform interleaving. Receivers MUST support interleaving, except if the receiver only supports modes in which no interleaving is allowed.

When interleaving of Access Units is used it SHALL be implemented using the AU-Index and AU-Index-delta fields in the AU-header.
Based on the RTP sequence number, the RTP time stamp, the AU-Index and the AU-Index-delta, a receiver can unambiguously reconstruct the original order even in case of out-of-order packets, packet loss or duplication. Note that for this purpose the AU-Index is redundant when the RTP time stamp and the AU-Index-delta values are sufficient for placing the AUs correctly in time. In such cases receivers MAY ignore the AU-Index value and senders MAY code the AU-Index field with the value 0, but only if they code each AU-Index field with that value. If the AU-Index is not redundant, senders SHOULD use a length of the AU-Index field so that this field is not coded with the value 0 in two subsequent RTP packets.

When interleaving is applied, a de-interleave buffer is needed in receivers to put the Access Units in their correct logical consecutive decoding order. This requires the computation of the time stamp for each Access Unit. In case of a fixed time duration per Access Unit, the time stamp of the i-th access unit in an RTP packet with RTP time stamp T is calculated as follows:

   Timestamp[0] = T
   Timestamp[i, i > 0] = T + (Sum(for k=1 to i of (AU-Index-delta[k] + 1))) * access-unit-duration

When AU-Index-delta is always 0, this reduces to T + i * (access-unit-duration). This is the non-interleaved case, where the frames are consecutive in decoding order. Note that the AU-Index field (present for the first Access Unit) is not needed in this calculation. Hence in cases where the access-unit-duration has a fixed and known value, the AU-Index does not need to provide index information and can be coded with the value 0. See also the semantics of the AU-Index field in 3.2.1.1.

If the Access Units are not fixed duration, the AU-Index is not redundant, and MUST provide the index information required for re-ordering.
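The fixed-duration time stamp calculation above can be sketched in a few lines of Python. This is only an illustration, not part of the payload format: the function name au_timestamps and its argument names are hypothetical, index_deltas is assumed to hold the AU-Index-delta values of the non-first AU-headers in packet order, and the result is reduced modulo 2^32 because RTP time stamps are 32-bit values.

```python
from typing import List

RTP_TIMESTAMP_MODULUS = 2 ** 32  # RTP time stamps are 32-bit values


def au_timestamps(rtp_timestamp: int, index_deltas: List[int],
                  au_duration: int) -> List[int]:
    """Time stamps of all AUs in one packet, fixed AU duration assumed.

    index_deltas[k-1] is the AU-Index-delta of the k-th non-first AU-header.
    The first AU always takes the RTP time stamp T of the packet.
    """
    timestamps = [rtp_timestamp % RTP_TIMESTAMP_MODULUS]
    steps = 0  # running Sum(AU-Index-delta[k] + 1)
    for delta in index_deltas:
        steps += delta + 1
        timestamps.append(
            (rtp_timestamp + steps * au_duration) % RTP_TIMESTAMP_MODULUS)
    return timestamps


# Non-interleaved packet: three consecutive AUs of 1024 ticks each.
consecutive = au_timestamps(1000, [0, 0], 1024)
# Interleaved packet carrying AU(0), AU(3), AU(6): each delta is 2.
interleaved = au_timestamps(0, [2, 2], 1024)
```

With all deltas equal to 0 the loop reduces to T + i * access-unit-duration, matching the non-interleaved case described in the text.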
The number of bits of the AU-Index field MUST be chosen so that valid index information is provided at the applied interleaving scheme, without causing problems due to roll-over of the AU-Index field. Note that the CTS-delta may be required to compute the correct time stamp for each AU.

3.2.3.3 Constraints for interleaving

The size of the packets should be suitably chosen to be appropriate to both the path MTU and the capacity of the receiver's de-interleave buffer. The maximum packet size for a session SHOULD be chosen not to exceed the path MTU.

To allow receivers to allocate sufficient resources for de-interleaving, senders MUST provide the information to receivers as specified in this section.

AUs enter the decoder in decoding order. The de-interleave buffer is used to re-order a stream of interleaved AUs back into decoding order. When interleaving is applied, the decoding of "early" AUs has to be postponed until all AUs that precede in decoding order have been received. Therefore these "early" AUs are stored in the de-interleave buffer. As an example in figure 6 the interleaving pattern from section 2.5 is considered.

                              +--+--+--+--+--+--+--+--+--+--+--+-
Interleaved AUs               | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
                              +--+--+--+--+--+--+--+--+--+--+--+-
Storage of "early" AUs             3  3  3  3  3  3
                                      6  6  6  6  6  6
                                            4  4  4
                                               7  7  7
                                                              12 12

Figure 6: Storage of "early" AUs in the de-interleave buffer per interleaved AU.

AU(3) is to be delivered to the decoder after AU(0), AU(1) and AU(2); of these AUs, AU(2) arrives last and hence AU(3) needs to be stored until AU(2) is received. Similarly, AU(6) is to be stored until AU(5) is received, while AU(4) and AU(7) are to be stored until AU(2) and AU(5) are received, respectively. Note that the fullness of the de-interleave buffer varies in time. In figure 6, the de-interleave buffer contains at most 4 AUs, but often fewer.
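The buffer fullness shown in figure 6 can be reproduced with a small simulation. The sketch below is illustrative only: the function name deinterleave_occupancy is hypothetical, AUs are identified by their serial number, no loss or duplication is assumed, and an "early" AU is counted as stored up to and including the arrival of the AU that releases it, which is the convention figure 6 appears to use.

```python
from typing import List, Set


def deinterleave_occupancy(arrival_order: List[int],
                           first_index: int = 0) -> List[int]:
    """Number of "early" AUs held when each AU of the pattern arrives."""
    stored: Set[int] = set()
    expected = first_index  # next AU the decoder is waiting for
    occupancy = []
    for au in arrival_order:
        if au != expected:
            stored.add(au)  # an "early" AU has to wait in the buffer
        occupancy.append(len(stored))  # counted before any release
        if au == expected:
            expected += 1
            # Release every stored AU that is now next in decoding order.
            while expected in stored:
                stored.remove(expected)
                expected += 1
    return occupancy


pattern = [0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12]
fullness = deinterleave_occupancy(pattern)
peak = max(fullness)
```

For the pattern of figure 6 the peak is 4 AUs, reached when AU(7) and AU(2) arrive.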
So as to give a rough indication of the resources needed in the receiver for de-interleaving, the maximum displacement in time of an AU is defined. The maximum displacement in time of an AU is the maximum difference between the time stamp of any received AU and the time stamp of the earliest AU that is not yet received. In other words, when considering a sequence of interleaved AUs, then:

   Maximum displacement = max{TS(i) - TS(j)}, for any i and any j>i,

where i and j indicate the index of the AU in the interleaving pattern and TS denotes the time stamp of the AU.

As an example in figure 7 the interleaving pattern from section 2.5 is considered. For each AU in the pattern the earliest not yet received AU is indicated. A "-" indicates that all previous AUs are received. If the AU period is constant, the maximum displacement equals 5 AU periods, as found for AU(6) and AU(7).

                              +--+--+--+--+--+--+--+--+--+--+--+-
Interleaved AUs               | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
                              +--+--+--+--+--+--+--+--+--+--+--+-
Earliest not yet received AU    -  1  1  -  2  2  -  -  -  - 10

Figure 7: The earliest not yet received AU for each AU in the interleaving pattern.

When interleaving, senders MUST signal the maximum displacement in time during the session via the MIME format parameter "maxDisplacement"; see section 4.1.

An estimate of the size of the de-interleave buffer is found by multiplying the maximum displacement by the maximum bit rate:

   size(de-interleave buffer) = {(maxDisplacement) * Rate(max)} / (RTP clock frequency),

where Rate(max) is the maximum bit-rate of the transported stream. Note that receivers can derive Rate(max) from the MIME format parameters StreamType, Profile-level-id, and config.

However, this calculation estimates the size of the de-interleave buffer and its size may be larger than calculated.
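The two calculations above can be worked through in Python as follows. This is a sketch under stated assumptions: the function names are hypothetical, time stamps are given in arrival order and are assumed not to wrap, and the buffer estimate is returned in bits on the assumption that Rate(max) is expressed in bits per second.

```python
from typing import List


def max_displacement(timestamps_in_arrival_order: List[int]) -> int:
    """max{TS(i) - TS(j)} over all i and all j > i in the pattern."""
    best = 0
    for i, ts_i in enumerate(timestamps_in_arrival_order):
        for ts_j in timestamps_in_arrival_order[i + 1:]:
            best = max(best, ts_i - ts_j)
    return best


def estimated_buffer_bits(max_displacement_ticks: int, max_bit_rate: float,
                          rtp_clock_rate: int) -> float:
    """(maxDisplacement * Rate(max)) / (RTP clock frequency), in bits."""
    return max_displacement_ticks * max_bit_rate / rtp_clock_rate


# Interleaving pattern of figure 7, using the AU number as the time stamp,
# i.e. measuring displacement in AU periods.
pattern = [0, 3, 6, 1, 4, 7, 2, 5, 8]
displacement_in_au_periods = max_displacement(pattern)
```

For the pattern of figure 7 this gives 5 AU periods, found for the pairs AU(6)/AU(1) and AU(7)/AU(2). Multiplying by the AU duration in RTP clock ticks yields a value in the units used by maxDisplacement.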
If this calculation under-estimates the size of the de-interleave buffer, then senders, when interleaving, MUST signal a size of the de-interleave buffer that is large enough to contain all "early" AUs at any point in time during the session via the MIME format parameter "de-interleaveBufferSize"; see section 4.1. If the "de-interleaveBufferSize" parameter is present, then the applied buffer for de-interleaving in a receiver MUST have a size that is at least equal to the signaled size of the de-interleave buffer, else a size that is at least equal to the calculated size of the de-interleave buffer.

No matter what interleaving scheme is used, the scheme must be analyzed to calculate the applicable maxDisplacement value, as well as the required size of the de-interleave buffer. Senders SHOULD signal values that are not larger than the strictly required values; if larger values are signalled, the receiver will buffer excessively.

Note that for low bit-rate material, the applied interleaving may make packets shorter than the MTU size.

3.2.3.5. Crucial and non-crucial AUs with MPEG-4 System data

Some Access Units with MPEG-4 system data, called "crucial" AUs, carry information whose loss cannot be tolerated, either in the presentation or in the decoder. At each crucial AU in an MPEG-4 system stream, the stream state changes. The stream-state MAY remain constant at non-crucial AUs. In ISO/IEC 14496-1, MPEG-4 system streams use the AU_SequenceNumber to signal stream states.

Example: Given three AUs, AU1 = "Insertion of node X", AU2 = "Set position of node X", AU3 = "Set position of node X". AU1 is crucial, since if it is lost, AU2 cannot be executed. However, AU2 is not crucial, since AU3 can be executed even if AU2 is lost.

When a crucial AU is (possibly) lost, the stream is corrupted.
For example, when an AU is lost and the stream state has changed at the next received AU, then it is possible that the lost AU was crucial. Once corrupted, the stream remains corrupted until the next random access point. Note that loss of non-crucial AUs does not corrupt the stream. When a decoder starts receiving a stream, the decoder MUST consider the stream corrupted until an AU is received that provides a random access point.

An AU that provides a random access point, as signaled by the RAP-flag, may be crucial or not. Non-crucial RAP AUs provide a "repeated" random access point for use by decoders that recently joined the stream or that need to re-start decoding after a stream corruption. Non-crucial RAP AUs MUST include all updates since the last crucial RAP AU.

Upon receiving AUs, decoders are to react as follows:

a) if the RAP-flag is set to 1 and the stream-state changes, then the AU is a crucial RAP AU, and the AU MUST be decoded.

b) if the RAP-flag is set to 1 and the stream state does not change, then the AU is a non-crucial RAP AU, and the receiver SHOULD decode it if the stream is corrupted. Otherwise, the decoder MUST ignore the AU.

c) if the RAP-flag is set to 0, then the AU MUST be decoded, unless the stream is corrupted, in which case the AU MUST be ignored.

3.3 Usage of this specification

3.3.1 General

Usage of this specification requires definition of a mode. A mode defines how to use this specification, as deemed appropriate. Senders MUST signal the applied mode via the MIME format parameter "Mode", as specified in section 4.1. This specification defines a generic mode that can be used for any MPEG-4 stream, as well as specific modes for transport of MPEG-4 CELP and MPEG-4 AAC streams, defined in ISO/IEC 14496-3.

When use of this payload format is signaled using SDP [6], an "rtpmap" attribute is part of that signaling.
The same requirements apply for the rtpmap attribute in any mode compliant to this specification. The general form of an rtpmap attribute is:

   a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding parameters>]

For audio streams, <encoding parameters> specifies the number of audio channels: 2 for stereo material (see RFC 2327) and 1 for mono. Provided no additional parameters are needed, this parameter may be omitted for mono material, hence its default value is 1.

3.3.2 The generic mode

The generic mode can be used for any MPEG-4 stream. In this mode no mode-specific constraints are applied; hence, in the generic mode the full flexibility of this specification can be exploited. The generic mode is signaled by mode=generic.

An example is given below for transport of a BIFS stream. In this example carriage of multiple BIFS Access Units is allowed in one RTP packet. The AU-header contains the AU-size field, the CTS-flag and, if the CTS-flag is set to 1, the CTS-delta field. The number of bits of the AU-size and the CTS-delta fields is 10 and 16, respectively. The AU-header also contains the RAP-flag and the Stream-state of 4 bits. This results in an AU-header with a total size of two or four octets per BIFS AU. The RTP time stamp uses a 1 kHz clock. Note that the media type name is video, because the BIFS stream is part of an audio-visual presentation. For conventions on media type names see section 4.1.

In detail:

   m=video 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/1000
   a=fmtp:96 streamtype=3; profile-level-id=257; mode=generic;
   ObjectType=2; config=BIFSConfiguration(); SizeLength=10;
   CTSDeltaLength=16; RandomAccessIndication=1;
   StreamStateIndication=4

Note: The a=fmtp line has been wrapped to fit the page; it comprises a single line in the SDP file. BIFSConfiguration() is the hexadecimal string as defined in ISO/IEC 14496-1; for the description of MIME parameters see section 4.1.
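The "two or four octets" figure quoted for the BIFS example follows from adding up the configured field lengths. The short Python sketch below spells out that arithmetic; the function name au_header_bits and its argument names are hypothetical, and only the fields configured in this example are counted (the index and DTS fields are left at their default length of zero).

```python
def au_header_bits(size_length: int, cts_delta_length: int,
                   random_access_indication: int,
                   stream_state_indication: int, cts_flag: int) -> int:
    """Length in bits of one AU-header for the fields used in the example."""
    bits = size_length                    # AU-size
    if cts_delta_length > 0:
        bits += 1                         # CTS-flag is present in each AU-header
        if cts_flag:
            bits += cts_delta_length      # CTS-delta only when the flag is 1
    if random_access_indication:
        bits += 1                         # RAP-flag
    bits += stream_state_indication       # Stream-state
    return bits


# SizeLength=10, CTSDeltaLength=16, RandomAccessIndication=1,
# StreamStateIndication=4, as in the BIFS example.
without_cts = au_header_bits(10, 16, 1, 4, cts_flag=0)
with_cts = au_header_bits(10, 16, 1, 4, cts_flag=1)
```

The two cases come to 10 + 1 + 1 + 4 = 16 bits and 16 + 16 = 32 bits, i.e. two and four octets.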
3.3.3 Constant bit-rate CELP

This mode is signaled by mode=CELP-cbr. In this mode one or more complete CELP frames of fixed size can be transported in one RTP packet; there is no support for interleaving. The RTP payload consists of one or more concatenated CELP frames, each of the same size. CELP frames MUST NOT be fragmented when using this mode. Both the AU Header Section and the Auxiliary Section MUST be empty.

The MIME format parameter ConstantSize MUST be provided to specify the length of each CELP frame.

For example:

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100/2
   a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-cbr;
   config=AudioSpecificConfig(); ConstantSize=xxx;

Note: The a=fmtp line has been wrapped to fit the page; it comprises a single line in the SDP file. AudioSpecificConfig() is the hexadecimal string as defined in ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio stream type is CELP. For the description of MIME parameters see section 4.1.

3.3.4 Variable bit-rate CELP

This mode is signaled by mode=CELP-vbr. With this mode one or more complete CELP frames of variable size can be transported in one RTP packet with optional interleaving. As CELP frames are very small, while the largest possible AU-size in this mode is greater than the maximum CELP frame size, there is no support for fragmentation of CELP frames. Hence CELP frames MUST NOT be fragmented when using this mode.

In this mode the RTP payload consists of the AU Header Section, followed by one or more concatenated CELP frames. The Auxiliary Section MUST be empty. For each CELP frame contained in the payload there MUST be a one octet AU-header in the AU Header Section to provide:

(a) the size of each CELP frame in the payload and

(b) index information for computing the sequence (and hence timing) of each CELP frame.

Transport of CELP frames requires that the AU-size field is coded with 6 bits. In this mode therefore 6 bits are allocated to the AU-size field, and 2 bits to the AU-Index(-delta) field. Each AU-Index field MUST be coded with the value 0. In the AU Header Section, the concatenated AU-headers are preceded by the 16-bit AU-headers-length field, as specified in section 3.2.1.

In addition to the required MIME format parameters, the following parameters MUST be present: SizeLength, IndexLength, and IndexDeltaLength. When interleaving is applied (AU-Index-delta coded with a value larger than 0), the parameter maxDisplacement MUST also be present.

For example:

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100/2
   a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-vbr;
   config=AudioSpecificConfig(); SizeLength=6; IndexLength=2;
   IndexDeltaLength=2

Note: The a=fmtp line has been wrapped to fit the page; it comprises a single line in the SDP file. AudioSpecificConfig() is the hexadecimal string as defined in ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio stream type is CELP. For the description of MIME parameters see section 4.1.

3.3.5 Low bit-rate AAC

This mode is signaled by mode=AAC-lbr. This mode supports transport of one or more complete AAC frames of variable size. In this mode the AAC frames are allowed to be interleaved and hence receivers MUST support de-interleaving. The maximum size of an AAC frame in this mode is 63 octets. AAC frames MUST NOT be fragmented when using this mode.

The payload configuration in this mode is the same as in the variable bit-rate CELP mode as defined in 3.3.4. The RTP payload consists of the AU Header Section, followed by concatenated AAC frames. The Auxiliary Section MUST be empty. For each AAC frame contained in the payload the one octet AU-header MUST provide:

(a) the size of each AAC frame in the payload and

(b) index information for computing the sequence (and hence timing) of each AAC frame.
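The one-octet AU-header configuration shared by the variable bit-rate CELP and low bit-rate AAC modes (6-bit AU-size, 2-bit AU-Index(-delta), preceded by the 16-bit AU-headers-length field) can be parsed along the following lines. This Python sketch is illustrative only: the function name and return shape are hypothetical, and it assumes that AU-headers-length gives the length of the AU-headers in bits, that fields are packed most significant bit first, and that AU-size precedes the index field, as laid out in section 3.2.1.

```python
from typing import List, Tuple


def parse_one_octet_au_headers(
        payload: bytes) -> Tuple[List[Tuple[int, int]], List[bytes]]:
    """Split an RTP payload into (AU-size, index field) pairs and frames."""
    if len(payload) < 2:
        raise ValueError("payload too short for AU-headers-length")
    headers_length_bits = int.from_bytes(payload[:2], "big")
    if headers_length_bits % 8 != 0:
        raise ValueError("expected a whole number of one-octet AU-headers")
    count = headers_length_bits // 8
    if len(payload) < 2 + count:
        raise ValueError("payload too short for the signaled AU-headers")

    headers = []
    for octet in payload[2:2 + count]:
        au_size = octet >> 2        # upper 6 bits: AU-size in octets
        index_field = octet & 0x03  # lower 2 bits: AU-Index or AU-Index-delta
        headers.append((au_size, index_field))

    # The frames follow the AU Header Section, concatenated in header order.
    frames, offset = [], 2 + count
    for au_size, _ in headers:
        frame = payload[offset:offset + au_size]
        if len(frame) != au_size:
            raise ValueError("Access Unit Data Section is truncated")
        frames.append(frame)
        offset += au_size
    return headers, frames


# Two frames of 5 and 3 octets; the second header carries index field 1.
example = bytes([0x00, 0x10, (5 << 2) | 0, (3 << 2) | 1]) + b"AAAAA" + b"BBB"
parsed_headers, parsed_frames = parse_one_octet_au_headers(example)
```

The index field of the first header is the AU-Index and that of every later header is the AU-Index-delta; the caller decides how to interpret them for re-ordering.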
In the AU-header, the AU-size MUST be coded with 6 bits and the AU-Index(-delta) with 2 bits; the AU-Index field MUST have the value 0 in each AU-header. In the AU Header Section, the concatenated AU-headers MUST be preceded by the 16-bit AU-headers-length field, as specified in section 3.2.1.

In addition to the required MIME format parameters, the following parameters MUST be present: SizeLength, IndexLength, and IndexDeltaLength. When interleaving is applied (AU-Index-delta coded with a value larger than 0), the parameter maxDisplacement MUST also be present.

For example:

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100/2
   a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-lbr;
   config=AudioSpecificConfig(); SizeLength=6; IndexLength=2;
   IndexDeltaLength=2

Note: The a=fmtp line has been wrapped to fit the page; it comprises a single line in the SDP file. AudioSpecificConfig() is the hexadecimal string as defined in ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio stream type is AAC. For the description of MIME parameters see section 4.1.

3.3.6 High bit-rate AAC

This mode is signaled by mode=AAC-hbr. This mode supports transport of variable size AAC frames. In one RTP packet either one or more complete AAC frames are carried, or a single fragment of an AAC frame. In this mode the AAC frames are allowed to be interleaved and hence receivers MUST support de-interleaving. The maximum size of an AAC frame in this mode is 8191 octets.

In this mode the RTP payload consists of the AU Header Section, followed by either one AAC frame, several concatenated AAC frames or one fragmented AAC frame. The Auxiliary Section MUST be empty.
For each AAC frame contained in the payload there MUST be an AU-header in the AU Header Section to provide:

(a) the size of each AAC frame in the payload and

(b) index information for computing the sequence (and hence timing) of each AAC frame.

To code the maximum size of an AAC frame requires 13 bits. Therefore in this configuration 13 bits are allocated to the AU-size, and 3 bits to the AU-Index(-delta) field. Thus each AU-header has a size of 2 octets. Each AU-Index field MUST be coded with the value 0. In the AU Header Section, the concatenated AU-headers MUST be preceded by the 16-bit AU-headers-length field, as specified in section 3.2.1.

In addition to the required MIME format parameters, the following parameters MUST be present: SizeLength, IndexLength, and IndexDeltaLength. When interleaving is applied (AU-Index-delta coded with a value larger than 0), the parameter maxDisplacement MUST also be present.

For example:

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100/2
   a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr;
   config=AudioSpecificConfig(); SizeLength=13; IndexLength=3;
   IndexDeltaLength=3

Note: The a=fmtp line has been wrapped to fit the page; it comprises a single line in the SDP file. AudioSpecificConfig() is the hexadecimal string as defined in ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio stream type is AAC. For the description of MIME parameters see section 4.1.

3.3.7 Additional modes

This specification only defines the modes specified in sections 3.3.2 up to 3.3.6. Additional modes are expected to be defined in future RFCs. Each additional mode MUST be in full compliance with this specification.

Any new mode MUST be defined such that an implementation including all the features of this specification can decode the payload format corresponding to this new mode.
For this reason a mode MUST NOT specify new default values for MIME parameters. In particular, MIME parameters that configure the RTP payload MUST be present (unless they have the default value), even if their presence is redundant in case the mode assigns a fixed value to a parameter. A mode may define additionally that some MIME parameters are required instead of optional, that some MIME parameters have fixed values (or ranges), and that there are rules restricting the usage.

4. IANA considerations

This section describes the MIME types and names associated with this payload format. Section 4.1 registers the MIME types, as per RFC 2048.

This format may require additional information about the mapping to be made available to the receiver. This is done using parameters also described in the next section.

4.1 MIME type registration

MIME media type name: "video" or "audio" or "application"

"video" MUST be used for MPEG-4 Visual streams (ISO/IEC 14496-2) or MPEG-4 Systems streams (ISO/IEC 14496-1) that convey information needed for an audio/visual presentation.

"audio" MUST be used for MPEG-4 Audio streams (ISO/IEC 14496-3) or MPEG-4 Systems streams that convey information needed for an audio only presentation.

"application" MUST be used for MPEG-4 Systems streams (ISO/IEC 14496-1) that serve purposes other than audio/visual presentation, e.g. in some cases when MPEG-J (Java) streams are transmitted.

Depending on the required payload configuration, MIME format parameters need to be available to the receiver. This is done using the parameters described in the next section. There are required and optional parameters.

Optional parameters are of two types: general parameters and configuration parameters. The configuration parameters are used to configure the fields in the AU Header section and in the auxiliary section.
The absence of any configuration parameter is equivalent to the associated field set to its default value, which is always zero. The absence of all configuration parameters resolves into a default "basic" configuration with an empty AU-header section and an empty auxiliary section in each RTP packet.

MIME subtype name: mpeg4-generic

Required parameters:

MIME format parameters are not case dependent; however for clarity both upper and lower case are used in the names of the parameters described in this specification.

StreamType: The integer value that indicates the type of MPEG-4 stream that is carried; its coding corresponds to the values of the streamType as defined in Table 9 (streamType Values) in ISO/IEC 14496-1.

Profile-level-id: A decimal representation of the MPEG-4 Profile Level indication. This parameter MUST be used in the capability exchange or session set-up procedure to indicate the MPEG-4 Profile and Level combination of which the relevant MPEG-4 media codec is capable.

For MPEG-4 Audio streams, this parameter is the decimal value from Table 5 (audioProfileLevelIndication Values) in ISO/IEC 14496-1, indicating which MPEG-4 Audio tool subsets are required to decode the audio stream.

For MPEG-4 Visual streams, this parameter is the decimal value from Table G-1 (FLC table for profile and level indication) of ISO/IEC 14496-2, indicating which MPEG-4 Visual tool subsets are required to decode the visual stream.

For BIFS streams, this parameter is the decimal value that is obtained from (SPLI + 256*GPLI), where:

SPLI is the decimal value from Table 4 in ISO/IEC 14496-1 with the applied sceneProfileLevelIndication;

GPLI is the decimal value from Table 7 in ISO/IEC 14496-1 with the applied graphicsProfileLevelIndication.
For MPEG-J streams, this parameter is the decimal value from table 13 (MPEGJProfileLevelIndication) in ISO/IEC 14496-1, indicating the profile and level of the MPEG-J stream.

For OD streams, this parameter is the decimal value from table 3 (ODProfileLevelIndication) in ISO/IEC 14496-1, indicating the profile and level of the OD stream.

For IPMP streams, this parameter has either the decimal value 0, indicating an unspecified profile and level, or a value larger than zero, indicating an MPEG-4 IPMP profile and level as defined in a future MPEG-4 specification.

For Clock Reference streams and Object Content Info streams, this parameter has the decimal value zero, indicating that profile and level information is conveyed through the OD framework.

Config: A hexadecimal representation of an octet string that expresses the media payload configuration. Configuration data is mapped onto the hexadecimal octet string in an MSB-first basis. The first bit of the configuration data SHALL be located at the MSB of the first octet. In the last octet, if necessary to achieve octet-alignment, up to 7 zero-valued padding bits shall follow the configuration data.

For MPEG-4 Audio streams, config is the audio object type specific decoder configuration data AudioSpecificConfig() as defined in ISO/IEC 14496-3. For Structured Audio, the AudioSpecificConfig() may be conveyed by other means, not defined by this specification. If the AudioSpecificConfig() is conveyed by other means for Structured Audio, then the config MUST be a quoted empty hexadecimal octet string, as follows: config="".

Note that a future mode of using this RTP payload format for Structured Audio may define such other means.

For MPEG-4 Visual streams, config is the MPEG-4 Visual configuration information as defined in subclause 6.2.1 Start codes of ISO/IEC 14496-2.
The configuration information indicated by this parameter SHALL be the same as the configuration information in the corresponding MPEG-4 Visual stream, except for first-half-vbv-occupancy and latter-half-vbv-occupancy, if it exists, which may vary in the repeated configuration information inside an MPEG-4 Visual stream (See 6.2.1 Start codes of ISO/IEC 14496-2).

For BIFS streams, this is the BIFSConfig() information as defined in ISO/IEC 14496-1. For version 1, BIFSConfig is defined in section 9.3.5.2, and for version 2 in section 9.3.5.3. The MIME format parameter ObjectType signals the version of BIFSConfig.

For IPMP streams, this is either a quoted empty hexadecimal octet string, indicating the absence of any decoder configuration information (config=""), or the IPMPConfiguration() as defined in a future MPEG-4 IPMP specification.

For Object Content Info (OCI) streams, this is the OCIDecoderConfiguration() information of the OCI stream, as defined in section 8.4.2.4 in ISO/IEC 14496-1.

For OD streams, Clock Reference streams and MPEG-J streams, this is a quoted empty hexadecimal octet string (config=""), as no information on the decoder configuration is required.

Mode: The mode in which this specification is used. The following modes can be signaled: mode=generic, mode=CELP-cbr, mode=CELP-vbr, mode=AAC-lbr and mode=AAC-hbr. Other modes are expected to be defined in future RFCs. See also section 3.3.7 and 4.2 of RFC xxxx.

Optional general parameters:

ObjectType: The decimal value from Table 8 in ISO/IEC 14496-1, indicating the value of the objectTypeIndication of the transported stream. For BIFS streams this parameter MUST be present to signal the version of BIFSConfiguration(). Note that ObjectTypeIndication may signal a non-MPEG-4 stream and that the RTP payload format defined in this document may not be suitable to carry a stream that is not defined by MPEG-4.
ObjectType SHOULD NOT be set to a value that signals a stream that cannot be carried by this payload format.

ConstantSize: The constant size in octets of each Access Unit for this stream. The ConstantSize and the SizeLength parameters MUST NOT be simultaneously present.

maxDisplacement: The decimal representation of the maximum displacement in time of an interleaved AU, as defined in section 3.2.3.3, expressed in units of the RTP time stamp clock. This parameter MUST be present when interleaving is applied.

de-interleaveBufferSize: The decimal representation in number of octets of the size of the de-interleave buffer, described in section 3.2.3.3. When interleaving, this parameter MUST be present if the calculation of the de-interleave buffer size given in 3.2.3.3 and based on maxDisplacement and rate(max) under-estimates the size of the de-interleave buffer. If this calculation does not under-estimate the size of the de-interleave buffer, then the de-interleaveBufferSize parameter SHOULD NOT be present.

Optional configuration parameters:

SizeLength: The number of bits on which the AU-size field is encoded in the AU-header. The SizeLength and the ConstantSize parameters MUST NOT be simultaneously present.

IndexLength: The number of bits on which the AU-Index is encoded in the first AU-header. The default value of zero indicates the absence of the AU-Index and AU-Index-delta fields in each AU-header.

IndexDeltaLength: The number of bits on which the AU-Index-delta field is encoded in any non-first AU-header.

CTSDeltaLength: The number of bits on which the CTS-delta field is encoded in the AU-header.

DTSDeltaLength: The number of bits on which the DTS-delta field is encoded in the AU-header.

RandomAccessIndication: A decimal value of zero or one, indicating whether the RAP-flag is present in the AU-header.
The decimal value of one indicates presence of the RAP-flag, the default value zero its absence.

StreamStateIndication: The number of bits on which the Stream-state field is encoded in the AU-header. This parameter MAY be present when transporting MPEG-4 system streams, and SHALL NOT be present for MPEG-4 audio and MPEG-4 video streams.

AuxiliaryDataSizeLength: The number of bits used to encode the auxiliary-data-size field.

Applications MAY use more parameters, in addition to those defined above. Each additional parameter MUST be registered with IANA, to ensure that there is no clash of names. Each additional parameter MUST be accompanied by a specification in the form of an RFC, MPEG standard, or other permanent and readily available reference (the "Specification Required" policy defined in RFC 2434). Receivers MUST tolerate the presence of such additional parameters, but these parameters SHALL NOT impact the decoding of receivers that comply with this specification.

Encoding considerations: This MIME subtype is defined for RTP transport only. System bitstreams MUST be generated according to MPEG-4 Systems specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio bitstreams MUST be generated according to MPEG-4 Audio specifications (ISO/IEC 14496-3). The RTP packets MUST be packetized according to the RTP payload format defined in RFC xxxx.

Security considerations: As defined in section 5 of RFC xxxx.

Interoperability considerations: MPEG-4 provides a large and rich set of tools for the coding of visual objects. For effective implementation of the standard, subsets of the MPEG-4 tool sets have been provided for use in specific applications. These subsets, called 'Profiles', limit the size of the tool set a decoder is required to implement.
In order to restrict computational complexity, one or more 'Levels' are set for each Profile. A Profile@Level combination allows:

. a codec builder to implement only the subset of the standard he needs, while maintaining interworking with other MPEG-4 devices that implement the same combination, and

. checking whether MPEG-4 devices comply with the standard ('conformance testing').

A stream SHALL be compliant with the MPEG-4 Profile@Level specified by the parameter "profile-level-id". Interoperability between a sender and a receiver is achieved by specifying the parameter "profile-level-id" in MIME content. In the capability exchange / announcement procedure this parameter may mutually be set to the same value.

Published specification: The specifications for MPEG-4 streams are presented in ISO/IEC 14496-1, 14496-2, and 14496-3. The RTP payload format is described in RFC xxxx.

Applications which use this media type: Multimedia streaming and conferencing tools.

Additional information: none

Magic number(s): none

File extension(s): None. A file format with the extension .mp4 has been defined for MPEG-4 content but is not directly correlated with this MIME type for which the sole purpose is RTP transport.

Macintosh File Type Code(s): none

Person & email address to contact for further information: Authors of RFC xxxx, IETF Audio/Video Transport working group.

Intended usage: COMMON

Author/Change controller: Authors of RFC xxxx, IETF Audio/Video Transport working group.

4.2 Registration of mode definitions with IANA

This specification can be used in a number of modes. The mode of operation is signaled using the "Mode" MIME parameter, with the initial set of values specified in section 4.1. New modes may be defined at any time, as described in section 3.3.7. These modes MUST be registered with IANA, to ensure that there is no clash of names.
A new mode registration MUST be accompanied by a specification in the form of an RFC, MPEG standard, or other permanent and readily available reference (the "Specification Required" policy defined in RFC 2434).

4.3 Concatenation of parameters

Multiple parameters SHOULD be expressed as a MIME media type string, in the form of a semicolon-separated list of parameter=value pairs (for parameter usage examples see sections 3.3.2 up to 3.3.6).

4.4 Usage of SDP

4.4.1 The a=fmtp keyword

It is assumed that one typical way to transport the above-described parameters associated with this payload format is via an SDP message [6], for example transported to the client in reply to an RTSP DESCRIBE [8] or via SAP [7]. In that case the (a=fmtp) keyword MUST be used as described in RFC 2327 [6], section 6, the syntax then being:

   a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>]

5. Security Considerations

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [2]. This implies that confidentiality of the media streams is achieved by encryption. Because the data compression used with this payload format is applied end-to-end, encryption may be performed on the compressed data so there is no conflict between the two operations. The packet processing complexity of this payload type (i.e. excluding media data processing) does not exhibit any significant non-uniformity on the receiver side that could cause a denial-of-service threat.

However, it is possible to inject non-compliant MPEG streams (Audio, Video, and Systems) to overload the receiver/decoder's buffers, which might compromise the functionality of the receiver or even crash it. This is especially true for end-to-end systems like MPEG where the buffer models are precisely defined.
MPEG-4 Systems supports stream types including commands that are executed on the terminal like OD commands, BIFS commands, etc. and programmatic content like MPEG-J (Java(TM) Byte Code) and ECMAScript. It is possible to use one or more of the above in a manner non-compliant to MPEG to crash the receiver or make it temporarily unavailable. Senders that transport MPEG-4 content SHOULD ensure that such content is MPEG compliant, as defined in the compliance part of ISO/IEC 14496 [1]. Receivers that support MPEG-4 content should prevent malfunctioning of the receiver in case of non-MPEG-compliant content.

Authentication mechanisms can be used to validate the sender and the data to prevent security problems due to non-compliant malignant MPEG-4 streams.

In ISO/IEC 14496-1 a security model is defined for MPEG-4 Systems streams carrying MPEG-J access units which comprise Java(TM) classes and objects. MPEG-J defines a set of Java APIs and a secure execution model. MPEG-J content can call this set of APIs and Java(TM) methods from a set of Java packages supported in the receiver within the defined security model. According to this security model, downloaded byte code is forbidden to load libraries, define native methods, start programs, read or write files, or read system properties.

Receivers can implement intelligent filters to validate the buffer requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J, ECMAScript) commands in the streams. However, this can increase the complexity significantly.

6. Acknowledgements

This document evolved through several revisions thanks to contributions by people from the ISMA forum, from the IETF AVT Working Group and from the 4-on-IP ad-hoc group within MPEG. The authors wish to thank all involved people, and in particular Andrea Basso, Stephen Casner, M.
Reha Civanlar, Carsten Herpel, John Lazaro, Zvi Lifshitz, Young-kwon Lim, Alex MacAulay, Bill May, Colin Perkins, Dorairaj V and Stephan Wenger for their valuable comments and support.

7. References

[1] ISO/IEC International Standard 14496 (MPEG-4); "Information technology - Coding of audio-visual objects", January 2000.

[2] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, Internet Engineering Task Force, January 1996.

[3] S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, March 1997.

[4] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, "RTP payload format for MPEG1/MPEG2 Video", RFC 2250, January 1998.

[5] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, "RTP payload format for MPEG-4 Audio/Visual streams", RFC 3016, November 2000.

[6] M. Handley, V. Jacobson, "SDP: Session Description Protocol", RFC 2327, Internet Engineering Task Force, April 1998.

[7] M. Handley, C. Perkins, E. Whelan, "SAP: Session Announcement Protocol", RFC 2974, Internet Engineering Task Force, October 2000.

[8] H. Schulzrinne, A. Rao, R. Lanphier, "Real Time Streaming Protocol (RTSP)", RFC 2326, Internet Engineering Task Force, April 1998.

8. Author Addresses

Jan van der Meer
Philips Digital Networks
Cederlaan 4
5600 JB Eindhoven
Netherlands
Email: jan.vandermeer@philips.com

David Mackie
Apple Computer, Inc.
One Infinite Loop, MS:302-2LF
Cupertino CA 95014
Email: dmackie@apple.com

Viswanathan Swaminathan
Sun Microsystems Inc.
901 San Antonio Road, M/S UMPK15-214
Palo Alto, CA 94303
Email: viswanathan.swaminathan@sun.com

David Singer
Apple Computer, Inc.
One Infinite Loop, MS:302-3MT
Cupertino CA 95014
Email: singer@apple.com

Philippe Gentric
Philips Digital Networks, MP4Net
51 rue Carnot
92156 Suresnes
France
Email: philippe.gentric@philips.com
Full Copyright Statement

Copyright (C) The Internet Society (December 2002). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process MUST be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

APPENDIX: Usage of this payload format

Appendix A. Interleave analysis

A.1 Introduction

In this appendix interleaving issues are discussed. Some general notes are provided on de-interleaving and error concealment, while a number of interleaving patterns are examined, in particular for determining the maximum displacement in time and the size of the de-interleave buffer.
In these examples, the maximum displacement is cited in terms of an access unit count, for ease of reading. In actual streams, it is signalled in units of the RTP time stamp clock.

A.2 De-interleaving and error concealment

This appendix does not describe any details on de-interleaving and error concealment, as the control of the AU decoding and error concealment process has little to do with interleaving. If the next AU to be decoded is present and there is sufficient storage available for the decoded AU, then decode it now. If not, wait. When the decoding deadline is reached (i.e., the time when decoding must begin in order to be completed by the time the AU is to be presented), or if the decoder is some hardware that presents a constant delay between initiation of decoding of an AU and presentation of that AU, then decoding must begin at that deadline time.

If the next AU to be decoded is not present when the decoding deadline is reached, then that AU is lost, so the receiver must take whatever error concealment measures are deemed appropriate. The playout delay may need to be adjusted at that point (especially if other AUs have also missed their deadline recently). Or, if it was a momentary delay, and maintaining the latency is important, then the receiver should minimize the glitch and continue processing with the next AU.

A.3 Simple Group interleave

A.3.1 Introduction

An example of regular interleave is when packets are formed into groups. If the 'stride' of the interleave (the distance between interleaved AUs) is N, packet 0 could contain AU(0), AU(N), AU(2N), and so on; packet 1 could contain AU(1), AU(1+N), AU(1+2N), and so on. If there are M access units in a packet, then there are M*N access units in the group.
An example with N=M=3 follows; note that this is the same example as given in section 2.5:

   Packet   Time stamp   Carried AUs   AU-Index, AU-Index-delta
   P(0)     T[0]         0, 3, 6       0, 2, 2
   P(1)     T[1]         1, 4, 7       0, 2, 2
   P(2)     T[2]         2, 5, 8       0, 2, 2
   P(3)     T[9]         9,12,15       0, 2, 2

In the above example the AU-Index is coded with the value 0, as required for the modes defined in this document. The position of the first AU of each packet within the group is defined by the RTP time stamp, while the AU-Index-delta field indicates the position of subsequent AUs relative to the first AU in the packet. All AU-Index-delta fields are coded with the value N-1, equal to 2 in this example. Hence the RTP time stamp and the AU-Index-delta are used to reconstruct the original order. See also section 3.2.3.2.

A.3.2 Determining the de-interleave buffer size

For the regular pattern as in this example, figure 6 in section 3.2.3.3 shows that the de-interleave buffer size is equal to 4 AU sizes.

A.3.3 Determining the maximum displacement

For the regular pattern as in this example, figure 7 in section 3.2.3.3 shows that the value of maxDisplacement equals 5 AU periods.

A.4 More subtle group interleave

A.4.1 Introduction

Another example of forming packets with group interleave is given below. In this example the packets are formed such that the loss of two subsequent RTP packets does not cause the loss of two subsequent AUs. Note that in this example the RTP time stamps of packet 3 and packet 4 are earlier than the RTP time stamps of packets 1 and 2, respectively.

   Packet   Time stamp   Carried AUs   AU-Index, AU-Index-delta
   0        T[0]          0,  5        0, 5
   1        T[2]          2,  7        0, 5
   2        T[4]          4,  9        0, 5
   3        T[1]          1,  6        0, 5
   4        T[3]          3,  8        0, 5
   5        T[10]        10, 15        0, 5
   and so on ..
In this example the AU-Index is coded with the value 0, as required for the modes defined in this document. To reconstruct the original order, the RTP time stamp and the AU-Index-delta (coded with the value 5) are used. See also section 3.2.3.2.

A.4.2 Determining the de-interleave buffer size

From figure 8 it can be determined that at most 5 "early" AUs are to be stored. If the AUs are of constant size, then this value equals 5 times the AU size.

                      +--+--+--+--+--+--+--+--+--+--+
Interleaved AUs       | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
                      +--+--+--+--+--+--+--+--+--+--+
                        -  -  5  -  5  -  2  7  4  9
                                    7     4  9  5
Received "early" AUs                      5     6
                                          7     7
                                          9     9

Figure 8: Storage of "early" AUs in the de-interleave buffer per interleaved AU.

A.4.3 Determining the maximum displacement

From figure 9 it can be seen that maxDisplacement has a value of 8 AU periods.

                              +--+--+--+--+--+--+--+--+--+--+
Interleaved AUs               | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
                              +--+--+--+--+--+--+--+--+--+--+
Earliest not yet received AU    -  1  1  1  1  1  -  3  -  -

Figure 9: The earliest not yet received AU for each AU in the interleaving pattern.

A.5 Continuous interleave

A.5.1 Introduction

In continuous interleave, once the scheme is 'primed', the number of AUs in a packet exceeds the 'stride' (the distance between them). This shortens the buffering needed, smooths the data-flow, and gives slightly larger packets -- and thus lower overhead -- for the same interleave. For example, here is a continuous interleave also over a stride of 3 AUs, but with 4 AUs per packet, for a run of 20 AUs. This shows both how the scheme 'starts up' and how it finishes.
   Packet   Time-stamp   Carried AUs    AU-Index, AU-Index-delta
   0        T[0]          0             0
   1        T[1]          1  4          0 2
   2        T[2]          2  5  8       0 2 2
   3        T[3]          3  6  9 12    0 2 2 2
   4        T[7]          7 10 13 16    0 2 2 2
   5        T[11]        11 14 17 20    0 2 2 2
   6        T[15]        15 18          0 2
   7        T[19]        19             0

Also in this example the AU-Index is coded with the value 0, as required for the modes defined in this document. To reconstruct the original order, the RTP time stamp and the AU-Index-delta (coded with the value 2) are used. See also 3.2.3.2. Note that this example has RTP time-stamps in increasing order.

A.5.2 Determining the de-interleave buffer size

For this example the de-interleave buffer size can be derived from figure 10. The maximum number of "early" AUs is three. If the AUs are of constant size, then this value equals 3 times the AU size. Compared to the example in A.3, for constant size AUs the de-interleave buffer size is reduced from 4 to 3 times the AU size, while maintaining the same 'stride'.

                      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
Interleaved AUs       | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
                      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
                        -  -  -  4  -  -  4  8  -  -  8 12  -  -
                                          5           9
Received "early" AUs                      8          12

Figure 10: Storage of "early" AUs in the de-interleave buffer per interleaved AU.

A.5.3 Determining the maximum displacement

For this example the maxDisplacement has a value of 5 AU periods. See figure 11.

                              +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
Interleaved AUs               | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
                              +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
Earliest not yet received AU    -  -  2  -  3  3  -  -  7  7  -  - 11 11

Figure 11: The earliest not yet received AU for each AU in the interleaving pattern.
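
The counting used in figures 8 through 11 can be reproduced mechanically. The following is a minimal, non-normative sketch in Python. It assumes that an "early" AU is any already received AU with a higher number than the AU currently arriving, and that the displacement of an AU is its distance to the earliest AU not yet received; both readings are inferred from the figures in this appendix, and the normative definitions remain those of section 3.2.3.3. The names interleave_stats and flatten are hypothetical helpers introduced here for illustration only.

```python
from itertools import chain


def interleave_stats(transmission_order):
    """Return (max_early_aus, max_displacement) for a list of AU numbers
    given in the order in which they are received.

    Assumption: AU numbers start at 0 and every AU eventually arrives.
    """
    received = set()
    next_missing = 0       # lowest AU number that has not yet been received
    max_early = 0
    max_displacement = 0
    for au in transmission_order:
        # "Early" AUs: already received, but later in decoding order than this AU.
        early = sum(1 for r in received if r > au)
        max_early = max(max_early, early)
        received.add(au)
        while next_missing in received:
            next_missing += 1
        # Displacement relative to the earliest AU that has not yet arrived.
        if next_missing < au:
            max_displacement = max(max_displacement, au - next_missing)
    return max_early, max_displacement


def flatten(packets):
    """Concatenate per-packet AU lists into one reception order."""
    return list(chain.from_iterable(packets))


# Packet contents taken from the tables in A.3.1, A.4.1 and A.5.1.
simple_group = flatten([[0, 3, 6], [1, 4, 7], [2, 5, 8]])
subtle_group = flatten([[0, 5], [2, 7], [4, 9], [1, 6], [3, 8]])
continuous = flatten([[0], [1, 4], [2, 5, 8], [3, 6, 9, 12], [7, 10, 13, 16],
                      [11, 14, 17, 20], [15, 18], [19]])

for name, order in [("A.3", simple_group), ("A.4", subtle_group),
                    ("A.5", continuous)]:
    early, displacement = interleave_stats(order)
    print(name, "early AUs:", early,
          "max displacement (AU periods):", displacement)
```

For the three patterns above this yields 4, 5 and 3 "early" AUs and maximum displacements of 5, 8 and 5 AU periods, in agreement with A.3, A.4 and A.5. The results are expressed in AU counts; for an actual stream the displacement would still have to be converted to units of the RTP time stamp clock, and the "early" AU count to octets, before being signalled as maxDisplacement and de-interleaveBufferSize.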