Internet Engineering Task Force                         H. Schulzrinne
Audio-Video Transport WG                        AT&T Bell Laboratories
INTERNET-DRAFT                                       December 15, 1992
                                                       Expires: 5/1/93


            A Transport Protocol for Real-Time Applications

Status of this Memo

   This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

   Please check the I-D abstract listing contained in each Internet Draft directory to learn the current status of this or any other Internet Draft.

   Distribution of this document is unlimited.

Abstract

   This draft describes a protocol (RTP) suitable for the transport of real-time data, such as audio, video or simulation data. The data transport is enhanced by a control protocol designed to provide minimal control and identification functionality. A reverse control protocol provides mechanisms for monitoring quality of service and other content-specific requests. This protocol is intended for experimental use.

1 Introduction

   This draft concisely specifies a real-time transport protocol. A discussion of the design decisions can be found in the current version of the companion Internet draft draft-ietf-avt-issues.txt.

   The transport protocol provides end-to-end delivery services for one or more flows of data with real-time characteristics, for example, interactive audio and video. It does not guarantee delivery or prevent out-of-order delivery, nor does it assume that the underlying network is reliable and delivers packets in sequence.
   RTP is designed to run on top of a variety of network and transport protocols, for example, IP, ST-II or UDP. RTP transfers data in a single direction, possibly to multiple destinations if supported by the underlying network. A mechanism for indicating a return path for control data is provided.

   While RTP is primarily designed to satisfy the needs of multiparticipant multimedia conferences, it is not limited to that particular application. Storage of continuous data, interactive distributed simulation, and control and measurement applications may also find RTP applicable. Profiles are used to instantiate certain header fields and options for particular sets of applications.

   This document defines two packet formats and protocols:

   o  the real-time transport protocol (RTP), for exchanging data with real-time properties;

   o  the real-time control protocol (RTCP), for conveying information about the sites in an ongoing association. RTCP information may be ignored without affecting the ability to receive information correctly.

   Control fields (options) for RTP and RTCP share the same structure and numbering space and are carried within the same packet. Within a packet, RTP options precede RTCP options. Each option consists of the final bit, the option type designation, a one-octet length field denoting the total number of 32-bit long words comprising the option (including the final bit, type and length fields), and finally any option-specific data. The last option before the data portion of the packet has the 'F' (final) bit set to one; for all other options this bit has a value of zero.

   Fields within the fixed header and within options are aligned to the natural length of the field, i.e., 16-bit words are aligned on even addresses, 32-bit long words are aligned at addresses divisible by four, etc. Octets designated as padding have the value zero. Options unknown to the RTP implementation or the application are to be ignored.
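   The option layout described above can be illustrated with a short Python sketch. It is not part of this specification: the bit positions (the 'F' bit as the most significant bit of the first octet, the option type in the remaining seven bits, the length in the second octet) are inferred from the option diagrams later in this draft, and the names Option, encode_option and parse_options are hypothetical.

```python
import struct
from typing import List, NamedTuple, Tuple


class Option(NamedTuple):
    final: bool    # 'F' bit: set only on the last option before the data
    opt_type: int  # option type, assumed to occupy the low 7 bits of octet 0
    body: bytes    # option-specific data after the F/type and length octets


def encode_option(opt_type: int, body: bytes, final: bool) -> bytes:
    """Build one option, zero-padding it to a whole number of 32-bit words."""
    if not 0 <= opt_type <= 127:
        raise ValueError("option type must fit in 7 bits")
    total = 2 + len(body)        # F/type octet + length octet + data
    words = (total + 3) // 4     # length counts 32-bit words, header included
    if words > 255:
        raise ValueError("option too long for a one-octet length field")
    first = (0x80 if final else 0x00) | opt_type
    padding = bytes(4 * words - total)  # padding octets are zero
    return struct.pack("!BB", first, words) + body + padding


def parse_options(buf: bytes, offset: int = 0) -> Tuple[List[Option], int]:
    """Walk the option list; return the options and the offset of the data."""
    options: List[Option] = []
    while True:
        if offset + 2 > len(buf):
            raise ValueError("truncated option header")
        first, words = struct.unpack_from("!BB", buf, offset)
        end = offset + 4 * words
        if words == 0 or end > len(buf):
            raise ValueError("bad option length")
        final = bool(first & 0x80)
        options.append(Option(final, first & 0x7F, buf[offset + 2:end]))
        offset = end
        if final:  # the option with F = 1 is the last one
            return options, offset


# Example: a CSRC option (type 0) followed by a final BOP option (type 2).
csrc = encode_option(0, struct.pack("!H", 7) + bytes([10, 0, 0, 1]), final=False)
bop = encode_option(2, struct.pack("!H", 0x1234), final=True)
opts, data_start = parse_options(csrc + bop + b"payload")
```

   Because every option carries its own length, a receiver can step over an option type it does not recognize without understanding its contents, which is what makes the rule of ignoring unknown options workable. A receiver would run such a walker only when the O bit of the fixed header is set.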
   Options with option types having values from 64 to 127 inclusive are passed unaltered to the appropriate application. Fields designated as MBZ ('must be zero') must have a value of binary zero and are to be ignored by the receiver. All integer fields are carried in network byte order, that is, most significant byte (octet) first. The transmission order is described in detail in [1], Appendix B. Unless otherwise noted, constants are in decimal (base 10).

2 Real-time Data Transfer Protocol -- RTP

2.1 Framing

   If and only if RTP protocol data units (RPDUs) are carried over underlying protocols that provide the abstraction of a continuous bit stream rather than messages, each RPDU (and any synchronization source identifier, as defined below) is prefixed by a 32-bit framing field containing the length of the RPDU measured in octets, including the synchronization source identifier but not including the framing field itself. If an RPDU traverses a mixture of octet-stream and message-oriented networks, each gateway between these networks is responsible for adding and removing the framing field.

2.2 Synchronization Source Encapsulation

   A content source is defined to be the actual source of the data carried, for example, the application and workstation where the audio was digitized. The synchronization source is the combination of one or more content sources with its own timing. A network source is the network-level origin of the RPDUs as seen by the end system. Unless otherwise specified, the content source and synchronization source are both assumed to be identical to the network source.

   If the synchronization source differs from the network source, the RPDU is prefixed with a 32-bit IP address designating the synchronization source (see Figure 1). This encapsulation may be used by gateways and transport-level firewalls.
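   The framing and encapsulation rules of Sections 2.1 and 2.2 can be combined in a short Python sketch that adds the optional prefixes in the order shown in Figure 1 (framing field first, then the synchronization source address, then the RPDU) and splits a reassembled octet stream back into units. It assumes IPv4 addresses and assumes the receiver already knows, by means outside this specification, whether encapsulation is in use; the function names frame_rpdu, deframe and strip_encapsulation are hypothetical.

```python
import ipaddress
import struct
from typing import Iterator, Optional, Tuple


def frame_rpdu(rpdu: bytes, sync_source: Optional[str] = None,
               stream_transport: bool = False) -> bytes:
    """Add the optional prefixes, outermost first as in Figure 1."""
    unit = rpdu
    if sync_source is not None:
        # Encapsulation (2.2): 32-bit IPv4 address of the synchronization source.
        unit = ipaddress.IPv4Address(sync_source).packed + unit
    if stream_transport:
        # Framing (2.1): 32-bit octet count covering the address and the RPDU,
        # but not the framing field itself.
        unit = struct.pack("!I", len(unit)) + unit
    return unit


def deframe(stream: bytes) -> Iterator[bytes]:
    """Split a reassembled octet stream into units using the framing field."""
    offset = 0
    while offset < len(stream):
        if offset + 4 > len(stream):
            raise ValueError("truncated framing field")
        (length,) = struct.unpack_from("!I", stream, offset)
        start, end = offset + 4, offset + 4 + length
        if end > len(stream):
            raise ValueError("truncated RPDU")
        yield stream[start:end]
        offset = end


def strip_encapsulation(unit: bytes) -> Tuple[str, bytes]:
    """Remove the address prefix; only valid if encapsulation is known to be used."""
    return str(ipaddress.IPv4Address(unit[:4])), unit[4:]


# Example: two RPDUs on a stream transport, the first one encapsulated.
first_rpdu = bytes(8)    # placeholder for an 8-octet fixed RTP header
second_rpdu = bytes(12)  # placeholder for a header plus 4 octets of data
stream = (frame_rpdu(first_rpdu, "192.0.2.7", stream_transport=True)
          + frame_rpdu(second_rpdu, stream_transport=True))
units = list(deframe(stream))
source, rpdu = strip_encapsulation(units[0])
```

   Nothing in the four prefixed octets reliably distinguishes them from the start of an RTP header, so a function like strip_encapsulation cannot be applied speculatively; the receiver has to be told whether encapsulation is in use.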
   The end system has to determine, by some means outside this specification, whether it is being served by such a gateway or firewall. [NOTE: The method of determining whether encapsulation is used or not is unsatisfactory, particularly for sites where only some conference participants are connected through reflectors. The method was chosen to allow reflectors to be independent of the protocol particulars.]

   A synchronization unit consists of one or more packets that, as a group, share a common fixed delay between generation and playout of each part of the group, or can only be scheduled as a whole. The delay may change at the beginning of such a synchronization unit. The most common synchronization units are talkspurts for voice and frames for video transmission.

2.3 RTP Header Fields

   The header fields (see Figure 1) have the following meaning:

   protocol version: 2 bits
      Defines the protocol version. The version number of the protocol defined in this draft is one.

   flow: 6 bits
      The value of the field is the flow identifier, one of the items used by the receiver for demultiplexing.

   option present bit (O): 1 bit
      This flag has a value of one if the fixed RTP header is followed by one or more options.

   end-of-synchronization-unit (S): 1 bit
      This flag has a value of one in the last packet of a synchronization unit and a value of zero otherwise.

   content: 6 bits
      The content field forms an index into a table defined through a conference announcement protocol (to be specified), RTCP messages, a conference server or some other out-of-band means. If no mapping has been defined in this manner, a standard mapping to be specified by RFC 1340, Assigned Numbers, or its successor, is to be used.

   sequence number: 16 bits
      The sequence number counts RPDUs.

   timestamp: 32 bits
      The timestamp reflects the wallclock time when the RPDU was generated. The timestamp consists of the middle 32 bits of a 64-bit NTP timestamp, as defined in RFC 1305 [2].
      Note that several consecutive packets may have equal timestamps. The maximum difference between the timestamp and the true time is encoded in the RTCP CDESC option.

      The timestamp of the first packet(s) within a synchronization unit is expected to closely reflect the actual sampling instant, measured by the local system clock. It is not expected that the timestamp of the beginning of every synchronization unit is based on a local synchronized system clock. However, the local clock should be consulted frequently enough that clock drift between the synchronized system clock and the sampling clock can be compensated for gradually. The local system clock should be controlled by a time synchronization protocol such as NTP. For packets inside a synchronization unit, it may be appropriate to compute timestamps based on the logical timing relationships; for audio samples, for example, the nominal sampling interval may be used. If the clock quality field of the CDESC option does not indicate otherwise, it is assumed that the timestamp at the beginning of a synchronization unit is derived from a synchronized system clock.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   packet length (optional)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         address of synchronization source (optional)          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver|   flow    |O|S|  content  |        sequence number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      timestamp (seconds)      |     timestamp (fraction)      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          options ...                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 1: RTP header format

   The packet header is followed by options, if any, and the media data. Optional fields are summarized below. Unless otherwise noted, each option may appear only once per packet. Each packet may contain any number of options.

   CSRC 0
      Globally unique content source identifier. The option is replicated within a packet for each contributor to this packet. A source is identified by a globally unique six-octet string formed by concatenating a two-octet numeric source id, unique within the host containing the content source, and the four-octet Internet address of the content source. The length of the content source address, and thus of the CSRC option, may change in the future; a receiver should be prepared to accept identifiers of up to ten octets. If the option is missing, the network source is considered the content source.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|    CSRC     |  length = 2   |     id unique within host     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 IP address of content source                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   SSRC 1
      Globally unique synchronization source identifier. The format of the option is the same as that of the CSRC option. This option prevails over the specification of the synchronization source through encapsulation.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|    SSRC     |  length = 2   |     id unique within host     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             IP address of synchronization source              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   BOP 2 (beginning of playout unit)
      The 16-bit sequence number designating the first packet within the current playout unit.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|     BOP     |  length = 1   |        sequence number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3 Real Time Control Packets -- RTCP

   The real-time control protocol (RTCP) conveys minimal out-of-band advisory information during a conference. The services provided by RTCP enhance RTP, but an end node does not have to implement RTCP features to participate in conferences(1). RTCP does not aim to provide the services of a conference control protocol and does not provide services desirable for two-party conversations.

   ------------------------------
   1. There is one exception to this rule: if an application sends CDESC options, the receiver has to decode these in order to properly interpret the RTP payload.

   Unless otherwise noted, control information is carried periodically as options within RPDUs. In the absence of media data, packets containing only RTCP data are sent periodically to the same multicast group as data packets, using the same time-to-live value. The period should be varied randomly to avoid synchronization of all sources and should be roughly inversely proportional to the number of participants in the conference. The length of the period determines, for example, how long a receiver joining a conference has to wait in the worst case until it can identify the source. An initial period varying randomly between 3 and 10 seconds is recommended.

   The option types are defined below.

3.1 Forward Control Options

   The following options are sent in the same direction as the data stream.

   CDESC 32
      Content description. The option contains the following fields:

      content: 6 bits
         The 'content' field designates the index value from the 'content' fixed header field, with values ranging from 0 to 63.
      Return port number: 16 bits
         The return port number is the port to be used as the destination port for transmitting control information from the receiver of RTP data to its sender. A value of zero indicates that no control information should be returned.

      Clock quality: 8 bits
         Provides an indication of the sender-perceived quality of the timestamps in the RTP header. The octet is interpreted as a quantity indicating the maximum dispersion to a root time server, measured in fractions of a second and expressed as a power of two. If a source is known to be slaved to NTP but does not know its dispersion, or the dispersion is greater than TBD, the value TBD is used. If the clock is based on the nominal sample rate of the source, a value of TBD is used. [These values need to be finalized.] The clock quality indication can be used to judge how the delay measurements reported by the QOS option can be interpreted (as absolute delay or only as delay variation). It is also useful for determining to what extent several sources with different clocks can be synchronized.

      Content descriptor: 32 bits
         The content descriptor field describes what type of data is contained within the RTP payload; it can be considered a next-protocol field. This field and all octets following it are not interpreted by RTP, but are passed on to the next higher layer.

      Content-dependent data: variable
         Content-dependent data may or may not appear in a CDESC option. It is passed to the next layer and not interpreted by RTP.

      A CDESC mapping changes the interpretation of a given 'content' value starting at the packet containing the CDESC option. The option only affects the synchronization source of the packet. A sender should refrain from changing the content type and flow index of a mapping defined by external means such as a conference registry, a conference announcement protocol or an otherwise agreed-upon mapping.
      Dynamic changes to these values may result in misinterpretation of the RTP payload if the packet(s) containing the CDESC option are lost.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|    CDESC    |    length     |0|0|  content  |      MBZ      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      return port number       | clock quality |      MBZ      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      content descriptor                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  content-dependent data ...                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   SDESC 33
      This option provides a mapping between a numeric source identifier (consisting of a two-octet identifier unique within a host and a four-octet IP address) and a human-readable text string describing the source. Examples include the name of a speaker or the station identification for a rebroadcast radio station. The variable-length string is padded with zeros so that the total length of the item, including the type and length octets, is a multiple of four octets. The content is not specified or authenticated. The content is encoded according to ISO standard 10646 (also known as NET-UTF). US-ASCII is a subset of this encoding.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|    SDESC    |    length     |     id unique within host     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        IP address of content or synchronization source        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                text describing the source ...                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   FDESC 34
      Flow content description.
      The option describes the flow corresponding to the given flow index, drawn from the numbering space used by the flow index in the CDESC option. Character set and padding are the same as for the SDESC option. The text string describes the current content of the flow. Example applications include the session title for a conference distribution, or the current program title for radio or television redistribution through packet networks.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|    FDESC    |    length     |          text string          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                describing the flow content ...                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   BYE 35
      The site specified by the host-unique ID and the IP address is leaving the conference. The option is padded to a multiple of 32 bits. If the length is one, the synchronization source of the packet is implied to be the network source.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|     BYE     |  length = 1   |       0       |       0       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|     BYE     |  length = 2   |     unique ID within host     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     IP address of source                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4 Reverse Control

   This section describes a means for the receiver of RTP protocol data to signal back to the sender or to a third party (reverse control). Use of reverse control packets is optional.

   Reverse control packets have the format shown below. The packet is preceded by a packet length field if and only if the underlying transport layer does not support framing.
   The packet length field contains the number of octets within the packet, not including the packet length field itself.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    optional packet length     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  flow index   |      MBZ      |      MBZ      |      MBZ      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         reverse-control options (variable length) ...         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The following options may be used within reverse control packets:

   QOS 64
      Quality of service measurement. The source identifier (as in the CSRC option) is followed by the number of packets received (16 bits), the number of packets expected (16 bits, labeled "sequence number range" in the diagram), the minimum delay, the maximum delay and the average delay. The delay measures are encoded as 6/10 NTP timestamps, that is, six bits encode the number of seconds and 10 bits the fraction of a second.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|     QOS     |  length = 5   |            user id            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     IP address of source                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       packets received        |     sequence number range     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         minimum delay         |         maximum delay         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         average delay         |              MBZ              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   RAD 65
      Reverse application data. The data contained in the option is passed directly to the application, without interpretation by RTP.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|     RAD     |    length     |   reverse application data    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Security Considerations

   Security issues need to be discussed before this draft is submitted as an RFC. RTP suffers from the same security deficiencies as the underlying protocols, for example, the ability of an impostor to fake source or destination IP addresses. The use of network addresses for identification within the protocol has additional security implications:

   o  false identification of content sources through the CSRC option;

   o  false synchronization source;

   o  BYE sent from a site other than the content source or synchronization source; this can also be used for denial-of-service attacks.

   Impersonation and denial-of-service attacks can be made more difficult by providing digital signatures for all or parts of a message.

   IP multicast provides no direct means for a sender to know all the receivers of the data sent. RTP options make it easy for all participants in a conference to identify themselves; if deemed important for a particular application, it is the responsibility of the application writer to make listening without identification difficult. It should be noted, however, that within an internet, privacy of the payload can generally only be assured by encryption.

   Details of the support for authentication, encryption and integrity checks remain for further study.

Acknowledgments

   This draft is based on discussion within the IETF audio-video transport working group chaired by Stephen Casner.
   The current protocol has its origins in the Network Voice Protocol and the Packet Video Protocol (Danny Cohen and Randy Cole) and the protocol implemented by the 'vat' application (Van Jacobson and Steve McCanne).

5 Addresses of Authors

   Stephen Casner
   USC/Information Sciences Institute
   4676 Admiralty Way
   Marina del Rey, CA 90292-6695
   telephone: (213) 822-1511 x153
   electronic mail: casner@isi.edu

   Henning Schulzrinne
   AT&T Bell Laboratories
   MH 2A244
   600 Mountain Avenue
   Murray Hill, NJ 07974
   telephone: 908 582-2262
   electronic mail: hgs@research.att.com

References

   [1] J. Postel, "Internet protocol," Network Working Group Request for Comments RFC 791, Information Sciences Institute, Sept. 1981.

   [2] D. L. Mills, "Network time protocol (version 3) -- specification, implementation and analysis," Network Working Group Request for Comments RFC 1305, University of Delaware, Mar. 1992.