Internet Engineering Task Force                     Audio-Video Transport WG
INTERNET-DRAFT                                      H. Schulzrinne/S. Casner
draft-ietf-avt-rtp-03.txt                                           AT&T/ISI
                                                          September 15, 1993
                                                           Expires: 11/01/93


          RTP: A Transport Protocol for Real-Time Applications


Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a ``working draft'' or ``work in progress.''

Please check the I-D abstract listing contained in each Internet Draft directory to learn the current status of this or any other Internet Draft.

Distribution of this document is unlimited.

Abstract

This memorandum describes a protocol called RTP suitable for the end-to-end network transport of real-time data, such as audio, video or simulation data, for both multicast and unicast transport services. The data transport is augmented by a control protocol (RTCP) designed to provide minimal control and identification functionality, particularly in multicast networks. RTP and RTCP are designed to be independent of the underlying transport and network layers. The protocol supports the use of RTP-level translators and bridges. Within multicast associations, sites can direct control messages to individual sites. The protocol does not address resource reservation and does not guarantee quality-of-service for real-time services.

This specification is a product of the Audio-Video Transport working group within the Internet Engineering Task Force.
Comments are solicited and should be addressed to the working group's mailing list at rem-conf@es.net and/or the authors.

Contents

1 Introduction
2 RTP Protocol Use Scenarios
   2.1 Simple Multicast Audio Conference
   2.2 Bridges
   2.3 Translators
   2.4 Security
3 Definitions
4 Byte Order, Alignment, and Reserved Values
5 Real-Time Data Transfer Protocol -- RTP
   5.1 RTP Fixed Header Fields
   5.2 The RTP Options
      5.2.1 CSRC: Content source identifiers
      5.2.2 SSRC: Synchronization source identifier
      5.2.3 BOS: Beginning of synchronization unit
   5.3 Reverse-Path Option
      5.3.1 SDST: Synchronization destination identifier
   5.4 Security Options
      5.4.1 ENC: Encryption
      5.4.2 MIC: Message integrity check
      5.4.3 MICA: Message integrity check, asymmetric encryption
      5.4.4 MICK: Message integrity check, keyed
      5.4.5 MICS: Message integrity check, symmetric-key encrypted
6 Real Time Control Protocol --- RTCP
   6.1 FMT: Format description
   6.2 SDES: Source descriptor
   6.3 BYE: Goodbye
   6.4 QOS: Quality of service measurement
7 Security Considerations
8 RTP over Network and Transport Protocols
   8.1 Defaults
   8.2 ST-II
A Implementation Notes
   A.1 Timestamp Recovery
   A.2 Detecting the Beginning of a Synchronization Unit
   A.3 Demultiplexing and Locating the Synchronization Source
   A.4 Parsing RTP Options
   A.5 Determining the Expected Number of RTP Packets
B Addresses of Authors

1 Introduction

This memorandum specifies a transport protocol for real-time applications. A discussion of real-time services and algorithms for their implementation, as well as some of the RTP design decisions, can be found in the current version of the companion Internet draft draft-ietf-avt-issues.

The transport protocol provides end-to-end delivery services for data with real-time characteristics, for example, interactive audio and video. RTP itself does not provide any mechanism to ensure timely delivery or provide other quality-of-service guarantees, but relies on lower-layer services to do so. It does not guarantee delivery or prevent out-of-order delivery, nor does it assume that the underlying network is reliable and delivers packets in sequence. The sequence numbers included in RTP allow the end system to reconstruct the sender's packet sequence; they may also be used to determine the proper location of a packet, for example in video decoding, without necessarily decoding packets in sequence.
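As a minimal sketch of how a receiver might use the 16-bit sequence numbers to restore the sender's order and count gaps, the Python below compares sequence numbers modulo 2^16. The half-range comparison rule and the helper names `seq_less`, `reorder` and `count_missing` are assumptions made for illustration; this memo does not prescribe a particular comparison or loss-counting algorithm.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide (Section 5.1)
HALF = SEQ_MOD // 2


def seq_less(a: int, b: int) -> bool:
    """True if sequence number a precedes b, assuming the two are
    fewer than 32768 packets apart (a common modulo-2^16 convention)."""
    return a != b and (b - a) % SEQ_MOD < HALF


def reorder(seqs: list[int]) -> list[int]:
    """Return received sequence numbers in sender order, even when the
    window straddles the 65535 -> 0 wrap-around."""
    if not seqs:
        return []
    base = seqs[0]

    def unwrapped(s: int) -> int:
        # Signed distance from the first packet received, in [-32768, 32767].
        d = (s - base) % SEQ_MOD
        return d - SEQ_MOD if d >= HALF else d

    return sorted(seqs, key=unwrapped)


def count_missing(seqs: list[int]) -> int:
    """Packets absent between the earliest and latest sequence numbers
    seen; duplicates are counted once."""
    ordered = reorder(list(set(seqs)))
    if not ordered:
        return 0
    expected = (ordered[-1] - ordered[0]) % SEQ_MOD + 1
    return expected - len(ordered)
```

For example, `reorder([65534, 0, 65535, 1])` yields `[65534, 65535, 0, 1]`. The sketch only works while the packets being compared span less than half the sequence space, which is why the comparison is expressed relative to a nearby reference packet.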
RTP is designed to run on top of a variety of network and transport protocols, for example, IP, ST-II or UDP.(1) RTP transfers data in a single direction, possibly to multiple destinations if supported by the underlying network. A mechanism is also provided for sending control data in the opposite direction, reversing the path traversed by regular data.

While RTP is primarily designed to satisfy the needs of multi-participant multimedia conferences, it is not limited to that particular application. Storage of continuous data, interactive distributed simulation, active badge, and control and measurement applications may also find RTP applicable. Profiles are used to instantiate certain header fields and options for particular sets of applications. A profile for audio and video data may be found in the companion Internet draft draft-ietf-avt-profile.

The current Internet does not support the widespread use of real-time services. High-bandwidth services using RTP, such as video, may seriously degrade other network services. Thus, implementors should take appropriate precautions to limit accidental bandwidth usage. Application documentation should clearly outline the limitations and possible operational impact of high-bandwidth real-time services on the Internet and other network services.

This document defines a packet format shared by two protocols:

   o the real-time transport protocol (RTP), for exchanging data that has real-time properties. The RTP header consists of a fixed-length portion plus optional control fields;

   o the RTP control protocol (RTCP), for conveying information about the participants in an on-going session. RTCP consists of additional header options that may be ignored without affecting the ability to receive data correctly. RTCP is used for ``loosely controlled'' sessions, i.e., where there is no explicit membership control and set-up. Its functionality may be subsumed by a session control protocol, which is beyond the scope of this document.

------------------------------
1. For most applications, RTP offers insufficient demultiplexing to run directly on IP.

2 RTP Protocol Use Scenarios

The following sections describe some aspects of the use of RTP. The examples were chosen to illustrate the basic operation of applications using RTP, not to limit what RTP may be used for. In these examples, RTP is carried on top of IP and UDP, and follows the conventions established by the profile for audio and video specified in the companion Internet draft draft-ietf-avt-profile.

2.1 Simple Multicast Audio Conference

A working group of the IETF meets to discuss the latest protocol draft, using the IP multicast services of the Internet for voice communications. Through some allocation mechanism, the working group chair obtains a multicast group address; all participants use the destination UDP port specified by the profile. The multicast address and port are distributed, say, by electronic mail, to all intended participants. The mechanisms for discovering available multicast addresses and distributing the information to participants are beyond the scope of RTP.

The audio conferencing application used by each conference participant sends audio data in small chunks of, say, 20 ms duration. Each chunk of audio data is preceded by an RTP header; RTP header and data are in turn contained in a UDP packet. The Internet, like other packet networks, occasionally loses and reorders packets and delays them by variable amounts of time. To cope with these impairments, the RTP header contains timing information and a sequence number that allow the receivers to reconstruct the timing produced by the source, so that, in our case, a chunk of audio is delivered to the loudspeaker every 20 ms.
The sequence number can also be used by the receiver to estimate how many packets are being lost. Each RTP packet also indicates what type of audio encoding (such as PCM, ADPCM or GSM) is being used, so that senders can change the encoding during a conference, for example, to accommodate a new participant that is connected through a low-bandwidth link.

Since members of the working group join and leave during the conference, it is useful to know who is participating at any moment. For that purpose, each instance of the audio application in the conference periodically multicasts the name, email address and other information of its user. Such control information is carried as RTCP SDES options within RTP packets, with or without audio data (see Section 6.2). These periodic messages also provide some indication as to whether the network connection is still functioning. A site sends the RTCP BYE option (Section 6.3) when it leaves a conference. The RTCP QOS option (Section 6.4) indicates how well the current speaker is being received and may be used to control adaptive encodings.

2.2 Bridges

So far, we have assumed that all sites want to receive audio data in the same format. However, this may not always be appropriate. Consider the case where participants in one area are connected through a low-speed link to the majority of the conference participants, who enjoy high-speed network access. Instead of forcing everyone to use a lower-bandwidth, reduced-quality audio encoding, a bridge is placed near the low-bandwidth area. This bridge resynchronizes incoming audio packets to reconstruct the constant 20 ms spacing generated by the sender, mixes these reconstructed audio streams, translates the audio encoding to a lower-bandwidth one and forwards the lower-bandwidth packet stream to the low-bandwidth sites.
After the mixing, the identity of the high-speed site that is speaking can no longer be determined from the network origin of the packet. Therefore, the bridge inserts into each mixed packet a CSRC option (Section 5.2.1) containing a list of short site identifiers that indicate which site(s) ``contributed'' to that packet. An example of this is shown for bridge B1 in Fig. 1. As name and location information is received by the bridge in SDES options from the high-speed sites, that information is passed on to the receivers along with a mapping to the CSRC identifiers.

   [E1]                                    [E6]
    |                                       |
    | E1:17                           E6:15 |   E6:63/6
    V  B1:48 (1,2)         B1:28/1 (1,2)    V   B1:63/5 (1,2)
   (B1)--------------->[T1]--------------->[T2]--------------->[E7]
    ^                   ^  E4:28/2          ^   E4:63/3
    | E2:1              | E4:47             |   B3:63/4 (1,4)
    |                   |                   |
   [E2]                [E4]                 |
                                            |
   [E3]--------->(B2)----------->(B3)-------+
         E3:64        B2:12 (3)   ^
                                  | E5:45
                                  |
                                 [E5]

   LEGEND:  [En] end system    (Bn) bridge    [Tn] translator
            link label:  content:source port/SSRC (CSRCs)

   Figure 1: Sample RTP network with end systems, bridges and translators

2.3 Translators

Not all sites are directly accessible through IP multicast. For these sites, mixing may not be necessary, but a translation of the underlying transport protocol is. RTP-level gateways that do not restore timing or mix packets from different sources are called translators in this document. For example, application-level firewalls do not let any IP packets pass. In that case, two translators are installed, one on either side of the firewall, the outside one funneling all multicast packets received through the secure connection to the translator inside the firewall. The translator inside the firewall sends them again as multicast packets to a multicast group restricted to the site's internal network. Other examples include the connection of a group of hosts speaking only IP/UDP to a group of hosts that understand only ST-II.
After RTP packets have passed through a translator, they all carry the network source address of the translator, making it impossible for the receiver to distinguish packets from different speakers based on network source addresses. Since each sending site has its own sequence number space and slightly offset timestamp space, the receiver could not properly mix the audio packets. (For video, it could not properly separate them into distinct displays.) Instead of forcing all senders to include some globally unique identifier in each packet, a translator inserts an SSRC option (Section 5.2.2) carrying a short identifier for the source that only needs to be unique locally, at that translator. This also works if an RTP packet has to travel through several translators, with the SSRC value being mapped into a new locally unique value at each translator. An example is shown in Fig. 1, where hosts T1 and T2 are translators. Translator T1 identifies the RTP packets from host E4 with SSRC value 2 and those coming from bridge B1 with SSRC value 1. Similarly, translator T2 labels packets from E6, B1, E4 and B3 with SSRC values 6, 5, 3 and 4, respectively (or some other unique values).

2.4 Security

Conference participants would often like to ensure that nobody else can listen to their deliberations. Encryption, indicated by the presence of the ENC option (Section 5.4.1), provides that privacy. The encryption method and key can be changed during the conference by indexing into a table. For example, a meeting may go into executive session, protected by a different encryption key accessible only to a subset of the meeting participants.

For authentication, a number of methods are provided, depending on needs and computational capabilities. All these message integrity check (MIC) options (Sections 5.4.2 and following) compute cryptographic checksums, also known as message digests, over the RTP data.

3 Definitions

The payload is the data following the RTP fixed header and the RTP/RTCP options.
The payload format and interpretation are beyond the scope of this memo. RTP packets without payload are valid. Examples of payload include audio samples and video data.

An RTP packet consists of the encapsulation specific to a particular underlying protocol, the fixed RTP header, RTP and RTCP options (if any) and the payload, if any. A single packet of the underlying protocol may contain several RTP packets if permitted by the encapsulation method.

A transport port is the ``abstraction that transport protocols use to distinguish among multiple destinations within a given host computer. TCP/IP protocols identify ports using small positive integers.'' [1] The transport selectors (TSEL) used by the OSI transport layer are equivalent to ports.

A transport address denotes the combination of network address, e.g., the 4-octet IP Version 4 address, and the transport protocol port, e.g., the UDP port. In OSI systems, the transport address is called the transport service access point (TSAP). The destination transport address may be a unicast or multicast address.

A content source is the actual source of the data carried in an RTP packet, for example, the application that originally generated some audio data. Data from one or more content sources may be combined into a single RTP packet by a bridge, which becomes the synchronization source (see next paragraph). Content sources identify the logical source of the data, for example, to highlight the current speaker in an audio conference; they have no effect on the delivery or playout timing of the data itself. In Fig. 1, E1 and E2 are the content sources of the data received by E7 from bridge B1, while B1 is the synchronization source.

A synchronization source is the combination of one or more content sources with its own timing. Each synchronization source has its own sequence number space.
The audio coming from a single microphone and the video from a camera are examples of synchronization sources. The receiver groups packets by synchronization source for playback. Typically a single synchronization source emits a single medium (e.g., audio or video). A synchronization source may change its data format, e.g., audio encoding, over time. Synchronization sources are identified by their transport address and the identifier carried in the SSRC option. If the SSRC option is absent, a value of zero is assumed for that identifier.

A transport source is the transport-level origin of the RTP packets as seen by the receiving end system. In Fig. 1, host T2, port 63 is the transport source of all packets received by end system E7.

A channel comprises all synchronization sources sending to the same destination transport address using the same RTP channel identifier.

An end system generates the content to be sent in RTP packets and consumes the content of received RTP packets. An end system can act as one or more synchronization sources. (Most end systems are expected to be a single synchronization source.)

An (RTP-level) bridge receives RTP packets from one or more sources, combines them in some manner and then forwards a new RTP packet. A bridge may change the data format. Since the timing among multiple input sources will not generally be synchronized, the bridge will make timing adjustments among the streams and generate its own timing for the combined stream. Therefore, bridges are synchronization sources, and the sources whose packets were combined into an outgoing RTP packet are the content sources for that outgoing packet. Audio bridges and media converters are examples of bridges. In Fig. 1, end systems E1 and E2 use the services of bridge B1. B1 inserts CSRC identifiers for E1 and E2 when they are active (e.g., talking in an audio conference).
The RTP-level bridges described in this document are unrelated to the data link-layer bridges found in local area networks. If there is a possibility of confusion, the term `RTP-level bridge' should be used. The name bridge follows common telecommunication industry usage.

An (RTP-level) translator forwards RTP packets, but does not alter their sequence numbers or timestamps. Examples of its use include encoding conversion without mixing or retiming, conversion from multicast to unicast, and application-level filters in firewalls. A translator is neither a synchronization nor a content source.

The properties of bridges and translators are summarized in Table 1. Entries in parentheses designate possible but unlikely actions. The RTP options are explained in Sections 5.2 and 5.3, the RTCP options in Section 6.

                                end sys.   bridge   translator
   mix sources                     --         x         --
   change encoding                N/A         x         x
   encrypt                         x          x        (x)
   sign for authentication         x          x         --
   alter content                   x          x         x
   insert CSRC (RTP)               --         x         --
   insert SSRC (RTP)               x          x         x
   insert SDST (RTP)               x          x         --
   insert SDES (RTCP)              x          x         --

   Table 1: The properties of end systems, bridges and translators

A synchronization unit consists of one or more packets that are emitted contiguously by the sender. The most common synchronization units are talkspurts for voice and frames for video transmission. During playout synchronization, the receiver must reconstruct exactly the time difference between packets within a synchronization unit. The time difference between synchronization units may be changed by the receiver to compensate for clock drift or to adjust to changing network delay jitter. For example, if audio packets are generated at fixed intervals during talkspurts, the receiver has to play back packets with exactly the same spacing. However, if, for example, a silence period between synchronization units (talkspurts) lasts 600 ms, the receiver may adjust it to, say, 500 ms without this being noticed by the listener.

The term non-RTP mechanisms refers to other protocols and mechanisms that may be needed to provide a usable service. In particular, for multimedia conferences, a conference control application may distribute encryption and authentication keys, negotiate the encryption algorithm to be used, and determine the mapping from the RTP format field to the actual data format used. For simple applications, electronic mail or a conference database may also be used. The specification of such mechanisms is outside the scope of this memorandum.

4 Byte Order, Alignment, and Reserved Values

All integer fields are carried in network byte order, that is, most significant byte (octet) first. This byte order is commonly known as big-endian. The transmission order is described in detail in [2], Appendix A. Unless otherwise noted, numeric constants are in decimal (base 10). Numeric constants prefixed by '0x' are in hexadecimal.

Fields within the fixed header and within options are aligned to the natural length of the field, i.e., 16-bit words are aligned on even addresses, 32-bit long words are aligned at addresses divisible by four, etc. Octets designated as padding have the value zero.

Textual information is encoded according to the UTF-2 encoding of the ISO standard 10646 (Annex F) [3,4]. US-ASCII is a subset of this encoding and requires no additional encoding. The presence of multi-octet encodings is indicated by setting the most significant bit to a value of one. An octet with a binary value of zero may be used as a string terminator for padding purposes. However, strings are not required to be zero terminated.
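The following Python sketch illustrates the conventions of this section: the `!` prefix of the standard `struct` module packs integers in network (big-endian) order, and text is padded with zero octets to a 32-bit boundary. It assumes that Python's `utf-8` codec produces the same octets as the UTF-2 encoding cited above for the characters used here (UTF-2 being taken as the encoding later standardized as UTF-8); the helpers `encode_text` and `has_multi_octet` are hypothetical.

```python
import struct

# Two 16-bit words followed by a 32-bit long word: each field starts at an
# offset that is a multiple of its own size, so no padding is needed, and
# every field is written most significant octet first.
wire = struct.pack("!HHI", 0x1234, 0x5678, 0x89ABCDEF)


def encode_text(text: str, align: int = 4) -> bytes:
    """Encode text (assumption: UTF-8 octets equal UTF-2 octets here) and
    pad with zero octets up to the next multiple of `align` octets."""
    data = text.encode("utf-8")
    return data + b"\x00" * (-len(data) % align)


def has_multi_octet(data: bytes) -> bool:
    """Octets belonging to multi-octet encodings have the top bit set;
    plain US-ASCII octets never do."""
    return any(octet & 0x80 for octet in data)
```

Here `wire.hex()` is `"1234567889abcdef"`, and `encode_text("Jörg")` yields five encoded octets (the `ö` takes two, both with the top bit set) followed by three zero octets of padding.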
5 Real-Time Data Transfer Protocol -- RTP

5.1 RTP Fixed Header Fields

The RTP header has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver| ChannelID |P|S|  format   |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     timestamp (seconds)       |     timestamp (fraction)      |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                          options ...                          |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

The first eight octets are present in every RTP packet and have the following meaning:

protocol version: 2 bits
     Identifies the protocol version. The version number of the protocol defined in this memo is one (1).

channel ID: 6 bits
     The channel identifier field forms part of the tuple identifying a channel (see definition in Section 3) to provide an additional level of multiplexing at the RTP layer. The channel ID field is convenient if several different channels are to receive the same treatment by the underlying layers or if a profile allows for the concatenation of several RTP packets on different channels into a single packet of the underlying protocol layer.

option present bit (P): 1 bit
     This flag has a value of one (1) if the fixed RTP header is followed by one or more options and a value of zero otherwise.

end-of-synchronization-unit (S): 1 bit
     This flag has a value of one in the last packet of a synchronization unit, a value of zero otherwise. [As shown in Appendix A, the beginning of a synchronization unit can be readily established from this flag. If this flag were to signal the beginning of a synchronization unit instead, the end of a synchronization unit could not be established in real time.]
format: 6 bits
     The format field forms an index into a table defined through the RTCP FMT option or non-RTP mechanisms (see Section 3). The mapping establishes the format of the RTP payload and determines its interpretation by higher layers. If no mapping has been defined in this manner, a standard mapping is specified by the companion profile document, RFC TBD. Also, default formats may be defined by the current edition of the Assigned Numbers RFC.

sequence number: 16 bits
     The sequence number counts RTP packets. The sequence number increments by one for each packet sent. The sequence number may be used by the receiver to detect packet loss, to restore packet sequence and to identify packets to the application.

timestamp: 32 bits
     The timestamp reflects the wall clock time when the RTP packet was generated. Several consecutive RTP packets may have equal timestamps if they are generated at once. The timestamp consists of the middle 32 bits of a 64-bit NTP timestamp, as defined in RFC 1305 [5]. That is, it counts time since 0 hours UTC, January 1, 1900, with a resolution of 65536 ticks per second. (UTC is Coordinated Universal Time, approximately equal to the historical Greenwich Mean Time.) Since only the low-order 16 bits of the seconds are kept, the RTP timestamp wraps around every 65536 seconds (2^32 ticks at 65536 ticks per second), i.e., approximately every 18 hours.

     The timestamp of the first packet within a synchronization unit is expected to closely reflect the actual sampling instant, measured by the local system clock. If possible, the local system clock should be controlled by a time synchronization protocol such as NTP. However, it is permissible to operate without synchronized time on those systems where it is not available, unless a profile or session protocol requires otherwise.
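A minimal Python sketch of the fixed header fields described above: it packs and unpacks the two 32-bit words in network byte order and derives the 32-bit timestamp from a Unix clock reading. The class `FixedHeader`, the function `rtp_timestamp`, and the use of the Unix epoch with its 2,208,988,800-second offset from the NTP epoch are illustrative assumptions, not part of this memo; field values are masked to their widths rather than validated.

```python
import struct
from dataclasses import dataclass

NTP_UNIX_OFFSET = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01 UTC


def rtp_timestamp(unix_time: float) -> int:
    """Middle 32 bits of the 64-bit NTP timestamp: the low 16 bits of the
    NTP seconds and the high 16 bits of the fraction (1/65536 s ticks)."""
    return int((unix_time + NTP_UNIX_OFFSET) * 65536) & 0xFFFFFFFF


@dataclass
class FixedHeader:
    version: int          # 2 bits; 1 for the protocol in this memo
    channel_id: int       # 6 bits
    option_present: bool  # P bit: options follow the fixed header
    end_of_unit: bool     # S bit: last packet of a synchronization unit
    fmt: int              # 6 bits; index into the format table
    seq: int              # 16 bits
    timestamp: int        # 32 bits: 16-bit seconds | 16-bit fraction

    def pack(self) -> bytes:
        word = ((self.version & 0x3) << 30
                | (self.channel_id & 0x3F) << 24
                | int(self.option_present) << 23
                | int(self.end_of_unit) << 22
                | (self.fmt & 0x3F) << 16
                | (self.seq & 0xFFFF))
        return struct.pack("!II", word, self.timestamp & 0xFFFFFFFF)

    @classmethod
    def unpack(cls, data: bytes) -> "FixedHeader":
        word, ts = struct.unpack_from("!II", data)  # needs at least 8 octets
        return cls(version=word >> 30,
                   channel_id=(word >> 24) & 0x3F,
                   option_present=bool((word >> 23) & 1),
                   end_of_unit=bool((word >> 22) & 1),
                   fmt=(word >> 16) & 0x3F,
                   seq=word & 0xFFFF,
                   timestamp=ts)
```

For example, `FixedHeader(1, 5, False, True, 3, 0xBEEF, rtp_timestamp(0.5)).pack().hex()` gives `'4543beef7e808000'`; adding 65536 seconds to the clock reading produces the same timestamp, which is the 18-hour wrap-around noted above.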
     It is not necessary to reference the local system clock to obtain the timestamp for the beginning of every synchronization unit, but the local clock should be referenced frequently enough that drift between the synchronized system clock and the sampling clock can be compensated for gradually. Within one synchronization unit, it may be appropriate to compute timestamps based on the logical timing relationships between the packets. For audio samples, for example, the nominal sampling interval may be used.

5.2 The RTP Options

The packet header may be followed by options and then the payload. Each option consists of the F (final) bit, the option type designation, a one-octet length field denoting the total number of 32-bit words comprising the option (including F bit, type and length), followed by any option-specific data. The last option before the payload has the F bit set to one; for all other options this bit has a value of zero. An application may discard options with types unknown to it. Private and experimental options should use option types 64 through 127. Fields designated as ``reserved'' or ``R'' are set aside for future use; they should be set to zero by senders and ignored by receivers.

Unless otherwise noted, each option may appear only once per packet. Each packet may contain any number of options. Options may appear in any order, unless specifically restricted by the option description. In particular, the position of some security options may have significance.

The RTP options have the following type values:

   name    value
   CSRC      0
   SSRC      1
   SDST      2
   BOS       3
   ENC       8
   MIC       9
   MICA     10
   MICK     11
   MICS     12

5.2.1 CSRC: Content source identifiers

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|    CSRC     |    length     | content source identifier ... |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The content source option, inserted only by bridges, lists all sources that contributed to the packet. For example, for audio packets, all sources that were mixed together to create this packet are enumerated, allowing correct talker indication at the receiver.

Each CSRC option may contain one or more 16-bit content source identifiers. The identifier values must be unique for all content sources received from a particular synchronization source on a particular channel; the value of binary zero is reserved and must not be used. If the number of content sources is even, the two octets needed to pad the list to a multiple of four octets are set to zero. (The two-octet option header plus an even number of two-octet identifiers leaves the option two octets short of a 32-bit boundary.) There should only be a single CSRC option within a packet. If no CSRC option is present, the content source identifier is assumed to have a value of zero. CSRC options are not modified by RTP-level translators. A conformant RTP implementation does not have to be able to generate or interpret the CSRC option.

5.2.2 SSRC: Synchronization source identifier

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|    SSRC     |  length = 1   |          identifier           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The SSRC option may be inserted by RTP-level translators, end systems and bridges. It is typically used only by translators, but it may be used by an end system application to distinguish several sources sent with the same transport source address. Multiple synchronization sources with the same transport source address (e.g., the same IP address and UDP port) must each insert a distinct SSRC identifier. Conversely, synchronization sources that are distinguishable by their transport address do not require the use of SSRC options.
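The option layouts of Sections 5.2.1 and 5.2.2 can be sketched in Python as follows, reading the diagrams as a 1-bit F flag, a 7-bit type and a one-octet length in 32-bit words. The function names are hypothetical, and the range checks reflect only the rules stated for these two options (16-bit, non-zero identifiers; a length that fits in one octet).

```python
import struct

CSRC, SSRC = 0, 1  # option type values from the table in Section 5.2


def option_header(final: bool, opt_type: int, length_words: int) -> bytes:
    """F bit and 7-bit type in the first octet; total option length in
    32-bit words (including this header) in the second octet."""
    if not 0 < length_words <= 0xFF:
        raise ValueError("option length must fit in one octet")
    return bytes([(0x80 if final else 0x00) | (opt_type & 0x7F), length_words])


def encode_csrc(ids: list[int], final: bool = False) -> bytes:
    """CSRC option: one or more 16-bit, non-zero content source identifiers."""
    if not ids or any(not 0 < i <= 0xFFFF for i in ids):
        raise ValueError("need one or more identifiers in 1..65535")
    body = struct.pack(f"!{len(ids)}H", *ids)
    if len(ids) % 2 == 0:
        # 2-octet header + an even number of 2-octet ids is 2 mod 4 octets,
        # so two zero octets complete the last 32-bit word.
        body += b"\x00\x00"
    return option_header(final, CSRC, (2 + len(body)) // 4) + body


def encode_ssrc(identifier: int, final: bool = False) -> bytes:
    """SSRC option: always one 32-bit word; the value zero is reserved."""
    if not 0 < identifier <= 0xFFFF:
        raise ValueError("SSRC identifier must be in 1..65535")
    return option_header(final, SSRC, 1) + struct.pack("!H", identifier)
```

For example, `encode_csrc([1, 2])` yields the octets `00 02 00 01 00 02 00 00`: two identifiers plus two zero octets of padding make a two-word option, while a single identifier fits exactly into one word.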
The SSRC value zero is reserved; the receiver treats the packet as if the SSRC option were not present. If no SSRC option is present, the transport source address is assumed to indicate the synchronization source. There must be no more than one SSRC option per packet; thus, a translator must remap the SSRC identifier of an incoming packet into a new, locally unique SSRC identifier rather than add a second option. The SSRC option can be viewed as an extension of the source port number in protocols like UDP, ST-II or TCP. An RTP receiver must support the SSRC option. RTP senders only need to support this option if they intend to send more than one source to the same channel using the same source port.

5.2.3 BOS: Beginning of synchronization unit

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|     BOS     |  length = 1   |        sequence number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The sequence number field of this option contains the sequence number of the first packet within the current synchronization unit. The BOS option allows the receiver to compute the offset of a packet with respect to the beginning of the synchronization unit, even if the last packet of the previous synchronization unit (the one carrying the S bit) was lost. It is expected that many applications will be able to tolerate such a loss, and so will not use the BOS option but rely on the S bit.

5.3 Reverse-Path Option

With two-party (unicast) communications, having a receiver of data relay back control information to the sender is straightforward. Similarly, for multicast communications, control information can easily be sent to all members of the group. It may, however, be desirable to send a unicast message to a single member of a multicast group, for example, to request retransmission of a particular data frame or to request or send a reception quality report. For this particular use, RTP includes a mechanism for sending so-called reverse RTP packets.
The format of reverse RTP packets is exactly the same as for regular RTP packets, and they can make use of all the options defined in this memorandum, except SSRC, as appropriate. The support for and semantics of particular options are to be specified by a profile. Reverse RTP packets travel through the same translators as forward RTP packets.

A site distinguishes reverse RTP packets from forward RTP packets by their arrival port: reverse RTP packets arrive on the same port that the site uses as the source port for forward (data) RTP packets. Only reverse RTP packets carry the SDST option; if RTP packets are carried directly within IP or other network-layer protocols, the presence of the SDST option signals that the packet is a reverse RTP packet.

A receiver of reverse RTP packets cannot rely on sequence numbers being consecutive, as a sender is allowed to use the same sequence number space while communicating through this reverse path with several sites. In particular, a receiver of reverse RTP packets cannot tell from the sequence numbers whether it has received all reverse RTP packets sent to it. The sequence number space of reverse RTP packets has to be completely separate from that used for RTP packets sent to the multicast group. If the same sequence number space were used, the members of the multicast group not receiving reverse RTP packets would detect a gap in their received sequence number space. The sender of reverse RTP packets should ensure that sequence numbers are unique, modulo wrap-around, so that they can, if necessary, be used for matching request and response. (Currently, no such request-response mechanism has been defined.) As a hypothetical example, consider defining a request to pan the remote video camera.
After completing the request, the receiver of the request would send a generic acknowledgement to the requestor, carrying the sequence number of the request in an option (not as the packet sequence number in the fixed header).

The timestamp should reflect the approximate sending time of the packet. The channel identifier must be the same as that used in the corresponding forward RTP packets.

If many receivers send a reverse RTP packet in response to a stimulus in the data stream, the simultaneous delivery of a large number of packets back to the data source can cause congestion for both the network and the destination (this is known as an ``ack implosion''). Thus reverse RTP packets should be used with care, perhaps with mechanisms such as response rate limiting and random delays to spread out the simultaneous delivery.

5.3.1 SDST: Synchronization destination identifier

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|    SDST     |  length = 1   |          identifier           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The SDST option is inserted only by RTP end systems and bridges that want to send unicast information to a particular site within the multicast group. Packets containing an SDST option must not contain an SSRC option, and vice versa. Packets containing an SDST option are always reverse RTP packets. The SDST option may be used to distinguish reverse RTP packets from forward RTP packets if the port-number mechanism described earlier in this section is not available, e.g., because RTP packets are carried directly within IP packets, without UDP.
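The two ways of telling reverse packets from forward packets (the arrival-port rule given earlier in this section, and the SDST option when no port numbers exist) can be combined into one check. This is a minimal sketch; the function name and parameters are hypothetical, and `forward_source_port` stands for the source port the site itself uses when sending forward data.

```python
from typing import Optional


def is_reverse_packet(has_sdst: bool,
                      arrival_port: Optional[int] = None,
                      forward_source_port: Optional[int] = None) -> bool:
    """Classify an incoming RTP packet as reverse (True) or forward (False).

    With a transport layer such as UDP, reverse packets arrive on the port
    the site uses as its source port for forward data. Without port numbers
    (RTP carried directly in IP), only an SDST option marks a reverse packet.
    """
    if arrival_port is not None and forward_source_port is not None:
        return arrival_port == forward_source_port
    return has_sdst
```

For E1 in the example that follows, which sends its forward data from port 17, a packet arriving on port 17 is a reverse packet and one arriving on port 8000 is forward data.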
If a forward RTP packet carries SSRC identifier X when sent from A to B, where A and B may be either two translators or an end system and a translator, the unicast reverse RTP packet from B to A will carry an SDST option with identifier X.

Consider the topology shown in Fig. 1. Assume that all forward RTP packets are addressed to destination port 8000. If B1 wants to send a reverse packet to E1, B1 simply sends it to the source address and port of E1's forward packets, that is, port 17 in this example. E1 can tell by the arrival on port 17 that the packet is a reverse packet rather than a regular (forward) packet.

The mechanism is somewhat more complicated when translators intervene. We focus on end system E7. E7 receives, say, video from a range of sources, E1 through E6, as indicated by the arrows. The transmission from T2 to E7 could be either multicast or unicast. Assume that E7 wants to send a retransmission request, a request to pan the camera, etc., to end system E4 and only to E4. E7 may not be able to reach E4 directly, as E4 may be using a network protocol unknown to E7 or be located behind a firewall. According to the figure, video transmissions from E4 reach E7 through T2 with source port 63 and SSRC identifier 3. For the reverse message, E7 sends a message to T2, with destination port 63 and SDST identifier 3. T2 finds in its table that it labels forward data coming from T1 with identifier 3. T2 also knows that those packets from T1 carry SSRC 2 and arrive with source port 28. Just like E7, T2 places the SSRC identifier, 2 in this case, into the SDST option and forwards the packet to T1 at port 28. Finally, translator T1 consults its table to find that it labels packets coming from E4, port 47, with SSRC value 2, and thus knows to forward the reverse packet to E4, port 47. T1 can place either an SDST value of zero or no SDST option into that packet.

Note that E4 cannot directly determine that E7 sent the reverse packet, rather than, say, E6.
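The translator walk-through above can be replayed with a small model of the translator tables. The hop names and numbers (T2 port 63 with SSRC 3, T1 port 28 with SSRC 2, E4 port 47) come from the Fig. 1 example; the `Upstream` record, the `TABLES` layout and `route_reverse` are hypothetical illustrations of the state a translator would keep, not a defined interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Upstream:
    """Where a translator received the forward packets it labels with an SSRC."""
    host: str            # previous hop (translator or end system)
    port: int            # source port of those forward packets on that hop
    ssrc: Optional[int]  # SSRC they carried on arrival; None if they had none


# Tables from the Fig. 1 example, keyed by the SSRC identifier each
# translator places into the forward packets it sends downstream.
TABLES: Dict[str, Dict[int, Upstream]] = {
    "T2": {3: Upstream(host="T1", port=28, ssrc=2)},
    "T1": {2: Upstream(host="E4", port=47, ssrc=None)},
}

Hop = Tuple[str, int, Optional[int]]  # (host, destination port, SDST value)


def route_reverse(first_hop: str, port: int, sdst: int) -> List[Hop]:
    """Follow a reverse packet hop by hop, rewriting SDST at each translator.

    A final SDST of None means the last translator omitted the option
    (sending SDST zero would be equally valid).
    """
    hops: List[Hop] = [(first_hop, port, sdst)]
    host, current = first_hop, sdst
    while host in TABLES:                      # keep going while at a translator
        if current is None or current not in TABLES[host]:
            raise KeyError(f"{host} has no entry for SDST {current}")
        upstream = TABLES[host][current]
        hops.append((upstream.host, upstream.port, upstream.ssrc))
        host, current = upstream.host, upstream.ssrc
    return hops
```

Each real translator needs only its own table; the combined `TABLES` view exists only to keep the example self-contained.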
If identifying the sender of a reverse packet is important, a global identifier as defined for the QOS option needs to be included in the reverse packet.

Only applications that need to send or receive reverse control RTP packets need to implement the SDST option.

5.4 Security Options

The security options below offer message integrity, authentication and privacy, and combinations of the three. Support for the security options is not mandatory, but see the discussion of the ENC option. The four message integrity check options (MIC, MICA, MICK and MICS) are mutually exclusive, i.e., only one of them should be used in a single RTP packet. Combinations of one of the message integrity check options (MIC, MICA, MICK, or MICS) and the encryption (ENC) option described below can be used to provide a variety of security services:

confidentiality: Confidentiality means that only the intended receiver(s) can decode the received RTP packets; for others, the RTP packet contains no useful information. Confidentiality of the content is achieved by encryption. The presence of encryption and the encryption initialization vector is indicated by the ENC option.[For efficiency reasons, this specification does not insist that content encryption be used only in conjunction with message integrity and authentication mechanisms. In most cases, it will be obvious to the person receiving the data if he or she does not possess the right encryption key.]

authentication and message integrity: In combination with certificates(2), the receiver can ascertain that the claimed originator is indeed the originator of the data (authentication) and that the data has not been altered after leaving the sender (message integrity). These two security services are provided by the message integrity check options. Certificates for MICA must be distributed through means outside of RTP.
The services offered by MICA and by MIC/MICK/MICS differ: with MIC, MICK or MICS, the receiver can only verify that the message originated within the group holding the secret key, rather than authenticate the individual sender, while the MICA option affords true authentication of the sender.

authentication, message integrity, and confidentiality: By carrying both a message integrity check option and the ENC option in RTP packets, the authenticity, message integrity and confidentiality of the packet can be assured (subject to the limitations discussed in the previous paragraph). The message integrity check is applied first to all parts of the outgoing packet to be authenticated, and the message integrity check option is prepended to those parts. Then the packet, including the message integrity check option, is encrypted using the shared secret key. The ENC option must be followed immediately by the message integrity check option, without any other options in between. The receiver first decrypts the octets following the ENC option and then authenticates the decrypted data using the signature contained in the message integrity check option. For this combination of security features and group authentication, the combination of ENC and MIC is recommended (instead of MICS or MICK), as it yields the lowest processing overhead. A message integrity check option followed by an ENC option should not be used.

------------------------------
2. For a description of certificates see, for example, RFC 1422 or [6].

All message integrity check options are computed over the fixed header, the first four octets of the message integrity check option and the data, that is, the remaining header options and payload that follow the message integrity check option. The MICK option includes the whole MICK option itself in the message integrity check.
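The processing order and coverage rules above can be sketched as follows for the recommended ENC plus MIC combination with the default MD5 digest. This is a minimal illustration under stated assumptions: the numeric option types are not given in this section, so `option_type` is a caller-supplied placeholder; `encrypt` and `decrypt` stand in for DES CBC with the padding of RFC 1423, which the Python standard library does not provide; and the caller is assumed to know the lengths of the fixed header, of any SSRC/SDST options, and of the ENC option. All function names are hypothetical.

```python
import hashlib
import hmac
from typing import Callable

DIGEST_LEN = 16  # MD5, the default (descriptor 0) digest


def mic_head(option_type: int, final: bool, descriptor: int = 0) -> bytes:
    """First four octets of a MIC option that will carry a 16-octet digest."""
    if not 0 <= option_type < 0x80:
        raise ValueError("option type is a 7-bit field")
    length_words = (4 + DIGEST_LEN) // 4          # 20 octets = 5 words
    return bytes([(0x80 if final else 0) | option_type, length_words, 0, descriptor])


def protect(fixed_header: bytes, clear_options: bytes, enc_option: bytes,
            head: bytes, data: bytes, encrypt: Callable[[bytes], bytes]) -> bytes:
    # 1. Digest over the fixed header, the first four octets of the MIC option
    #    and the data (the options and payload that follow the MIC option).
    digest = hashlib.md5(fixed_header + head + data).digest()
    # 2. Prepend the completed MIC option to the authenticated data.
    # 3. Encrypt everything after the ENC option; the fixed header, any
    #    SSRC/SDST options (clear_options) and the ENC option stay in the clear.
    return fixed_header + clear_options + enc_option + encrypt(head + digest + data)


def unprotect(packet: bytes, fixed_len: int, clear_len: int, enc_len: int,
              decrypt: Callable[[bytes], bytes]) -> bytes:
    """Decrypt, check the MIC and return the data; raise ValueError on mismatch."""
    fixed_header = packet[:fixed_len]
    plain = decrypt(packet[fixed_len + clear_len + enc_len:])
    head, digest, data = plain[:4], plain[4:4 + DIGEST_LEN], plain[4 + DIGEST_LEN:]
    expected = hashlib.md5(fixed_header + head + data).digest()
    if not hmac.compare_digest(digest, expected):
        raise ValueError("message integrity check failed")
    return data
```

Because the SSRC and SDST options sit outside both the digest and the ciphertext, a translator can rewrite them without invalidating the check, while any change to the fixed header or to the protected data is detected.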
The fixed header is protected to foil replay attacks and reassignment to a different channel. The message integrity check options and the ENC option shall not cover the SSRC and SDST options, i.e., SSRC and SDST must be inserted between the fixed header and the ENC or message integrity check options; SSRC and SDST are subject to change by translators that are unlikely to possess the necessary descriptor table (see below) and encryption keys. Translators that have the necessary keys and descriptor translation table may modify the contents of the RTP packet, unless the MICA option is used (see the MICA description in Section 5.4.3).

All security options carry a one-octet descriptor field. This descriptor is an index into two tables, one for the message integrity check options and one for the ENC option, established by non-RTP means and containing digest algorithms (MD2, MD5, etc.), encryption algorithms (DES variants) and encryption keys or shared secrets (for the MICK option). All sources within the same channel share the same tables; this reduces per-site state information. The descriptor value may change during a session, for example, to switch to a different encryption key. The descriptor value zero selects a set of default algorithms, namely MD5 for the message digest algorithm and DES CBC for the encryption algorithm.
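A minimal sketch of the two descriptor tables, assuming they are held as in-memory dictionaries keyed by the one-octet descriptor. The table names, the `CipherEntry` record and the all-zero placeholder key are hypothetical; only the descriptor-zero defaults (MD5 and DES CBC) come from the text. MD2 is omitted because Python's `hashlib` does not reliably provide it.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class CipherEntry:
    algorithm: str  # name of a symmetric algorithm, e.g. "DES-CBC"
    key: bytes      # key established by non-RTP means


# Hypothetical per-channel tables; every source on the channel shares them.
# Descriptor 0 is the default: MD5 digests and DES-CBC encryption.
MIC_TABLE: Dict[int, Callable[[bytes], Any]] = {0: hashlib.md5}
ENC_TABLE: Dict[int, CipherEntry] = {
    0: CipherEntry("DES-CBC", key=bytes(8)),  # placeholder; real keys come from outside RTP
}


def digest_for(descriptor: int, data: bytes) -> bytes:
    """Compute the digest selected by a message integrity check descriptor."""
    try:
        algorithm = MIC_TABLE[descriptor]
    except KeyError:
        raise ValueError(f"unknown MIC descriptor {descriptor}") from None
    return algorithm(data).digest()


def cipher_for(descriptor: int) -> CipherEntry:
    """Look up the algorithm and key selected by an ENC descriptor."""
    try:
        return ENC_TABLE[descriptor]
    except KeyError:
        raise ValueError(f"unknown ENC descriptor {descriptor}") from None
```

Switching keys mid-session, as the text allows, then amounts to adding an entry such as `ENC_TABLE[1]` through the out-of-band channel and starting to send descriptor 1.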
5.4.1 ENC: Encryption

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|     ENC     |  length = 3   |   reserved    |  descriptor   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      DES (CBC) initialization vector, bytes 0 through 3       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      DES (CBC) initialization vector, bytes 4 through 7       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|     ENC     |  length = 1   |   reserved    |  descriptor   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

All packet data after this option is encrypted, using the encryption key and symmetric encryption algorithm specified by the descriptor field. Every encrypted RTP packet must contain this option. Note that the fixed header is specifically not encrypted, because some fields must be interpreted by translators that will not have access to the key.

The descriptor value may change over time to accommodate varying security requirements or to limit the amount of ciphertext produced with the same key. For example, in a job interview conducted across a network, the candidate and interviewers could share one key, with a second key set aside for the interviewers only. For symmetric keys, source-specific keys offer no advantage.

The descriptor value zero is reserved for a default mode using the Data Encryption Standard (DES) algorithm in CBC (cipher block chaining) mode, as described in Section 1.1 of RFC 1423 [7]. The padding specified in that section is to be used. The 8-octet initialization vector (IV) may be carried unencrypted within the ENC option, generated anew for each packet.
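The two ENC layouts above differ only in whether the initialization vector travels in the option. The sketch below extracts the descriptor and the IV from either form. It assumes that the fixed RTP header supplies exactly 8 octets (one DES block) when used as the IV and rejects anything else; it ignores the F bit and option type; and its fallback to the fixed header for a length-one option follows the rule stated in the next paragraph. The function name is hypothetical.

```python
from typing import Tuple

DES_BLOCK = 8  # octets


def enc_parameters(option: bytes, fixed_header: bytes) -> Tuple[int, bytes]:
    """Return (descriptor, IV) for an ENC option.

    Length 3 (12 octets): the 8-octet IV follows the first word.
    Length 1 (4 octets): no IV is carried; the fixed RTP header is used.
    """
    if len(option) < 4:
        raise ValueError("ENC option shorter than one word")
    length_words, descriptor = option[1], option[3]
    if len(option) < 4 * length_words:
        raise ValueError("truncated ENC option")
    if length_words == 3:
        iv = option[4:12]
    elif length_words == 1:
        iv = fixed_header
    else:
        raise ValueError(f"unexpected ENC option length {length_words}")
    if len(iv) != DES_BLOCK:
        raise ValueError("DES CBC needs an 8-octet IV")
    return descriptor, iv
```

Carrying the IV costs 8 octets per packet but gives a fresh IV each time; the length-one form saves those octets at the price of the tradeoffs referenced in [8].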
If the ENC option does not contain an initialization vector (indicated by an option length of one), the fixed RTP header is used as the initialization vector. (Using the fixed RTP header as the initialization vector avoids generating a new initialization vector for each packet and incurs less header overhead.) For details on the tradeoffs of CBC initialization vector use, see [8].

Support for encryption is not required. Implementations that do not support encryption should recognize the ENC option so that they can avoid processing encrypted messages and provide a meaningful failure indication. Implementations that support encryption should, at a minimum, always support the DES CBC algorithm.

5.4.2 MIC: Message integrity check

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|     MIC     |    length     |   reserved    |  descriptor   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  message digest (unencrypted) ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The MIC option is used only in combination with the ENC option immediately preceding it, to provide privacy and group membership authentication. The message integrity check uses the digest algorithm specified by the descriptor field. (A message digest function is a cryptographic hash function that maps a message of any length to a fixed-length byte string such that it is computationally infeasible to find another, different message with the same digest.) The descriptor value zero implies the use of the MD5 message digest. Note that the MIC option is not separately encrypted.
5.4.3 MICA: Message integrity check, asymmetric encryption

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|    MICA     |    length     |        message digest ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  (asymmetrically encrypted) ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Currently, only the use of the MD2 and MD5 message digest algorithms is defined, as described in RFC 1319 [9] (as corrected in Section 2.1 of RFC 1423) and RFC 1321 [10], respectively. The MD2 and MD5 message digests are 16 octets long.

RFC 1423, Section 2.1:

   To avoid any potential ambiguity regarding the ordering of the octets of an MD2 message digest that is input as a data value to another encryption process (e.g., RSAEncryption), the following holds true. The first (or left-most displayed, if one thinks in terms of a digest's ``print'' representation) octet of the digest (i.e., digest[0] as specified in RFC 1319), when considered as an RSA data value, has numerical weight 2**120. The last (or right-most displayed) octet (i.e., digest[15] as specified in RFC 1319) has numerical weight 2**0.

RFC 1423, Section 2.2:

   To avoid any potential ambiguity regarding the ordering of the octets of an MD5 message digest that is input as an RSA data value to the RSA encryption process, the following holds true. The first (or left-most displayed, if one thinks in terms of a digest's ``print'' representation) octet of the digest (i.e., the low-order octet of A as specified in RFC 1321), when considered as an RSA data value, has numerical weight 2**120. The last (or right-most displayed) octet (i.e., the high-order octet of D as specified in RFC 1321) has numerical weight 2**0.
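The octet-ordering rule quoted above amounts to reading the 16-octet digest as a big-endian (network byte order) integer, so that digest[0] carries weight 2**120 and digest[15] weight 2**0. A one-function sketch, with a hypothetical name:

```python
def digest_rsa_value(digest: bytes) -> int:
    """Numeric value of a 16-octet MD2/MD5 digest as an RSA data value.

    int.from_bytes(..., "big") gives octet i the weight 256**(15 - i), i.e.
    2**120 for the first octet and 2**0 for the last, as RFC 1423 requires.
    """
    if len(digest) != 16:
        raise ValueError("MD2 and MD5 digests are 16 octets long")
    return int.from_bytes(digest, "big")
```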
The message digest is encrypted, using asymmetric keys, with the sender's private key, using the algorithm described in Section 4.2.1 of RFC 1423:

   As described in PKCS #1, all quantities input as data values to the RSAEncryption process shall be properly justified and padded to the length of the modulus prior to the encryption process. In general, an RSAEncryption input value is formed by concatenating a leading NULL octet, a block type BT, a padding string PS, a NULL octet, and the data quantity D, that is,

      RSA input value = 0x00, BT, PS, 0x00, D.

   To prepare a MIC for RSAEncryption, the PKCS #1 ``block type 01'' encryption-block formatting scheme is employed. The block type BT is a single octet containing the value 0x01 and the padding string PS is one or more octets (enough octets to make the length of the complete RSA input value equal to the length of the modulus) each containing the value 0xFF. The data quantity D is comprised of the MIC and the MIC algorithm identifier.

The encoding is described in detail in RFC 1423. For encrypting MD2 and MD5, the data quantity D comprises the 16-octet checksum, preceded by the following octet sequences, shown here in hexadecimal:

   MD2: 0x30, 0x20, 0x30, 0x0C, 0x06, 0x08, 0x2A, 0x86, 0x48,
        0x86, 0xF7, 0x0D, 0x02, 0x02, 0x05, 0x00, 0x04, 0x10

   MD5: 0x30, 0x20, 0x30, 0x0C, 0x06, 0x08, 0x2A, 0x86, 0x48,
        0x86, 0xF7, 0x0D, 0x02, 0x05, 0x05, 0x00, 0x04, 0x10

Contrary to what is specified in RFC 1423 for privacy enhanced mail, the asymmetrically signed MIC is carried in binary, not represented in the printable encoding of RFC 1421, Section 4.3.2.4. The encrypted length of the signature will be equal to the length of the RSA modulus used, rounded up to the next integral octet count. The modulus and public key are conveyed to the receivers by non-RTP means.
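The block type 01 formatting can be sketched as below, using the two prefixes listed above. The function name is hypothetical, the RSA private-key operation itself is not shown, and the check for at least eight padding octets comes from PKCS #1 itself, which is stricter than the ``one or more'' in the quoted text.

```python
MD2_PREFIX = bytes([0x30, 0x20, 0x30, 0x0C, 0x06, 0x08, 0x2A, 0x86, 0x48,
                    0x86, 0xF7, 0x0D, 0x02, 0x02, 0x05, 0x00, 0x04, 0x10])
MD5_PREFIX = bytes([0x30, 0x20, 0x30, 0x0C, 0x06, 0x08, 0x2A, 0x86, 0x48,
                    0x86, 0xF7, 0x0D, 0x02, 0x05, 0x05, 0x00, 0x04, 0x10])


def block_type_01(digest: bytes, prefix: bytes, modulus_len: int) -> bytes:
    """Build the RSA input value 0x00, 0x01, PS, 0x00, D with D = prefix + digest.

    modulus_len is the length of the RSA modulus in octets; PS is filled with
    0xFF so that the whole block is exactly modulus_len octets long.
    """
    if len(digest) != 16:
        raise ValueError("MD2 and MD5 digests are 16 octets long")
    data = prefix + digest
    ps_len = modulus_len - 3 - len(data)
    if ps_len < 8:  # PKCS #1 requires at least eight padding octets
        raise ValueError("modulus too short for this digest")
    return b"\x00\x01" + b"\xff" * ps_len + b"\x00" + data
```

With a 512-bit (64-octet) modulus, D is 34 octets and PS is 27 octets of 0xFF.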
Asymmetric keys are used because symmetric keys would not allow authentication of the individual source in the multicast case. The signature is padded as necessary; the value of the padding is left unspecified. The number of non-padding bits within the signature is known to the receiver, as it is equal to the key length. The MIC algorithm is identified through the octets prepended to the actual 16-octet digest.

A translator is not allowed to modify the parts of an RTP packet covered by the MICA option, as the receiver would have no way of establishing the identity of the translator and thus could not verify the integrity of the RTP packet. Support for sending or interpreting MICA options is not required.

5.4.4 MICK: Message integrity check, keyed

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|    MICK     |    length     |   reserved    |  descriptor   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  message digest (keyed) ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This message integrity check option does not require encryption. In addition to the RTP packet parts to be included in the message digest according to the introduction to this section, the shared secret is placed in the MICK option and included in the message digest. The shared secret is equivalent to the key used for the MICS and ENC options, but is 16 octets long, padded if needed with binary zeroes. The shared secret in the MICK option is then replaced by the computed 16-octet message digest. The receiver stores the message digest contained in the MICK option, replaces it with the shared secret key and computes the message digest in the same manner as the sender.
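The sender and receiver steps just described can be sketched with `hashlib.md5`. This is an illustration, not a vetted implementation: `option_type` is a placeholder because the numeric type is not given here, the function names are hypothetical, and `hmac.compare_digest` is used only for a constant-time comparison; the digest itself is the keyed-MD5 construction described in this section, not HMAC.

```python
import hashlib
import hmac

MICK_WORDS = 5  # 4-octet head + 16-octet digest = 20 octets


def mick_head(option_type: int, final: bool, descriptor: int) -> bytes:
    """First four octets of a MICK option (option_type is a placeholder)."""
    if not 0 <= option_type < 0x80:
        raise ValueError("option type is a 7-bit field")
    return bytes([(0x80 if final else 0) | option_type, MICK_WORDS, 0, descriptor])


def _keyed_digest(fixed_header: bytes, head: bytes, secret: bytes, data: bytes) -> bytes:
    if len(secret) > 16:
        raise ValueError("shared secret must fit in 16 octets")
    placeholder = secret.ljust(16, b"\x00")  # pad with binary zeroes
    # The whole MICK option (head + secret in the digest slot) is covered.
    return hashlib.md5(fixed_header + head + placeholder + data).digest()


def mick_sign(fixed_header: bytes, head: bytes, secret: bytes, data: bytes) -> bytes:
    """Return the finished MICK option: head followed by the computed digest."""
    return head + _keyed_digest(fixed_header, head, secret, data)


def mick_verify(fixed_header: bytes, option: bytes, secret: bytes, data: bytes) -> bool:
    """Recompute with the secret in place of the received digest and compare."""
    head, received = option[:4], option[4:20]
    expected = _keyed_digest(fixed_header, head, secret, data)
    return hmac.compare_digest(received, expected)
```

Because the head is inside the digest, a changed descriptor octet also causes verification to fail.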
If the RTP packet has not been tampered with and has originated with one of the holders of the shared secret, the computed message digest will agree with the digest found on reception in the MICK option.[The message integrity check follows the practice of SNMP Version 2, as described in RFC 1446, Section 1.5.1. The MICK option itself is covered by the digest in order to detect tampering with the descriptor field. Using the secret key in the digest computation instead of encrypting the MD5 message digest avoids the use of an encryption algorithm when only authentication is desired. However, the security of this approach has not been as well established as that of authentication based on encrypting message digests, as used in the MICS, MIC and MICA options.]

5.4.5 MICS: Message integrity check, symmetric-key encrypted

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|    MICS     |    length     |   reserved    |  descriptor   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              message digest (symmetrically encrypted) ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This message integrity check encrypts the message digest using DES ECB mode, as described in RFC 1423, Section 3.1.

6 Real Time Control Protocol --- RTCP

The real-time control protocol (RTCP) conveys minimal control and advisory information during a session. It provides support for ``loosely controlled'' sessions, i.e., where participants enter and leave without membership control and parameter negotiation. The services provided by RTCP augment RTP, but an end system does not have to implement RTCP features to participate in sessions. There is one exception to this rule: if an application sends FMT options, the receiver has to decode these in order to properly interpret the RTP payload.
RTCP does not aim to provide the services of a session control protocol and does not provide some of the services desirable for two-party conversations. If a session control protocol is in use, the services of RTCP should not be required. (As of the writing of this document, no session or conference control protocol has been specified within the Internet.)

RTCP options share the same structure and numbering space as RTP options, which are described in Section 5. Unless otherwise noted, control information is carried periodically as options within RTP packets, with or without payload. RTCP packets are sent to all members of a session. These packets are part of the same sequence number space as RTP packets not containing RTCP options.

The period should be varied randomly to avoid synchronization of all sources, and its mean should increase with the number of participants in the session to limit the growth of the overall network and host interrupt load. The length of the period determines, for example, how long a receiver joining a session has to wait until it can identify the source. A receiver may remove from its list of active sites a site that it has not heard from for a given time-out period; the time-out period may depend on the number of sites or the observed average interarrival time of RTCP messages. Note that not every periodic message has to contain all RTCP options; for example, the EMAIL item within the SDES option might only be sent every few messages. RTCP options should also be sent when the information they carry changes, but the generation of RTCP options should be rate-limited.

The option types are defined below:

6.1 FMT: Format description

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|     FMT     |    length     |R|R|  format   |   reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    format-dependent data ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

format: 6 bits
   The format field corresponds to the index value from the format field in the RTP fixed header, with values ranging from 0 to 63.

format-dependent data: variable length
   Format-dependent data may or may not appear in an FMT option. It is passed to the next layer and not interpreted by RTP.

An FMT mapping changes the interpretation of a given format value carried in the fixed RTP header, starting at the packet containing the FMT option. The new interpretation applies only to packets from the same synchronization source as the packet containing the FMT option. If format mappings are changed through the FMT option, the option should be sent periodically, as otherwise sites that did not receive the FMT option, because of packet loss or because they joined the session after the FMT option was sent, will not know how to interpret the particular format value.

6.2 SDES: Source descriptor

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|    SDES     |    length     |       source identifier       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  type = ADDR  |    length     |   reserved    | address type  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    network-layer address ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  type = ADDR  |  length = 2   |   reserved    |addr. type = 1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         IPv4 address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  type = PORT  |  length = 1   |             port              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  type = PORT  |  length > 1   |   reserved    |   reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              port ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  type = CNAME |    length     |   user and domain name ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  type = EMAIL |    length     |   electronic mail address ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  type = NAME  |    length     |   common name of source ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  type = LOC   |    length     |   geographic location of site ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  type = TXT   |    length     |   text describing source ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The SDES option provides a mapping between a numeric source identifier and one or more items identifying the source.[Several attributes were combined into one option so that the receiver does not have to perform multiple mappings from identifiers to site data structures.]
For those applications where the size of a multi-item SDES option would be a concern, multiple SDES options may be formed with subsets of the items, to be sent in separate packets.

A bridge uses an identifier value of zero within the SDES option to refer to itself rather than to the content sources bridged by it. For each content source, a bridge forwards the SDES information received from that source, but changes the SDES source identifier to the value used in the CSRC option when identifying that content source. A bridge that contributes local data to outgoing packets should select another non-zero source identifier for that traffic and send CSRC and SDES options for it. Translators do not modify or insert SDES options.

To identify a particular source, the end system performs the same mapping it uses to identify the content sources, that is, it uses the combination of network source, synchronization source and the source identifier within this SDES option. SDES information is specific to a particular channel, unless a profile or a higher-layer control protocol defines that all packets with the same source identifier (network and transport-level source addresses and the optional SSRC value) from a set of channels defined by the control protocol are described by the same SDES.

Currently, the items listed in Table 2 are defined. Each has a structure similar to that of RTCP and RTP options, that is, a type field followed by a length field, measured in multiples of four octets. No final bit (see Section 5.2) is needed since the overall length is known. Text items are encoded according to the rules in Section 4. All of the SDES items are optional; however, if quality-of-service monitoring is to be used, one ADDR item and the PORT item are mandatory, as described for the QOS option. Only the TXT item is expected to change during a session. Option types 128 through 255 are reserved for private or experimental extensions.
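The item layout can be sketched as a generic encoder plus a few item-specific helpers. This sketch assumes that the length field counts the whole item, including the type and length octets, which is consistent with the PORT (length 1) and IPv4 ADDR (length 2) diagrams above; it takes the type values from Table 2 and the zero padding from the rule that follows, and it assumes plain ASCII for text items because the Section 4 rules are not reproduced here. The helper names are hypothetical.

```python
import ipaddress

ADDR, PORT, CNAME = 1, 2, 4  # item type values from Table 2


def sdes_item(item_type: int, body: bytes) -> bytes:
    """Type octet, length octet (4-octet words, whole item), body, zero padding."""
    total = 2 + len(body)
    padded = (total + 3) // 4 * 4
    if padded // 4 > 0xFF:
        raise ValueError("item too long for an 8-bit length field")
    return bytes([item_type, padded // 4]) + body + bytes(padded - total)


def port_item(port: int) -> bytes:
    # Short form (length 1): the port occupies octets three and four.
    return sdes_item(PORT, port.to_bytes(2, "big"))


def ipv4_addr_item(address: str) -> bytes:
    # Reserved octet, address type 1 (DNS RR type A), then the 4 address octets.
    return sdes_item(ADDR, bytes([0, 1]) + ipaddress.IPv4Address(address).packed)


def text_item(item_type: int, text: str) -> bytes:
    # Assumes ASCII text; Section 4 defines the actual encoding rules.
    return sdes_item(item_type, text.encode("ascii"))
```

Concatenating `port_item(...)` and the first ADDR item gives the octet string that the PORT discussion below describes as usable for a globally unique identifier.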
Items are padded with the binary value zero to the next multiple of four octets. Each item may appear only once unless otherwise noted. A more detailed description of the content of some of these items follows:

   type    value  description

   ADDR      1    network address of source
   PORT      2    source port
   CNAME     4    canonical user and host identifier, e.g.,
                  ``doe@sleepy.megacorp.com'' or ``sleepy.megacorp.com''
   EMAIL     5    user's electronic mail address, e.g.,
                  ``John.Doe@megacorp.com''
   NAME      6    common name describing the source, e.g.,
                  ``John Doe, Bit Recycler, Megacorp''
   LOC       8    geographic user location, e.g.,
                  ``Rm. 2A244, Murray Hill, NJ''
   TXT      16    text describing the source, e.g., ``out for lunch''

                  Table 2: Summary of SDES items

ADDR: This item contains the network address of the source, for example, the IP version 4 address or an NSAP. The address is carried in binary form, not as ``dotted decimal'' or similar human-readable form. A source may send several network addresses, but only one for each address type value. Address types are identified by the Domain Name System Resource Record (RR) type, as specified in the current edition of the Assigned Numbers RFC.

PORT: If the length field is one, the transport selector, such as the UDP port number, is carried as octets three and four in the first and only word of the item. If the length field is greater than one, octets three and four are zero and the transport selector appears in words two and following of this item, in network byte order. The figure shows the use of the PORT item for the TCP and UDP protocols. There must be no more than one PORT item in an SDES option. The PORT item should immediately precede any ADDR items.[Multiple concurrent transport addresses are not meaningful.
The ordering simplifies processing at the receiver, as the consecutive octet string of PORT followed by the first ADDR can be used as a globally unique identifier. The transport protocol does not need to be identified, as the receiver will only see one type of transport protocol for a session.]

CNAME: The CNAME item must have the format ``user@host'' or ``host'', where ``host'' is the fully qualified domain name of the host from which the real-time data originates, formatted according to the rules specified in RFC 1034, RFC 1035 and Section 2.1 of RFC 1123. The ``host'' form may be used if a user name is not available, for example on single-user systems. The user name should be in a form that a program such as ``finger'' or ``talk'' could use, i.e., it typically is the login name rather than the ``real life'' name. Note that the CNAME is not necessarily identical to the electronic mail address of the participant; the latter is provided through the EMAIL item.

LOC: Depending on the application, different degrees of detail are appropriate for this item. For conference applications, a string like ``Murray Hill, New Jersey'' may be sufficient, while, for an active badge system, strings like ``Room 2A244, AT&T BL MH'' might be appropriate. The degree of detail is left to the implementation and/or user, but format and content may be prescribed by a profile.

6.3 BYE: Goodbye

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|     BYE     |  length = 1   |   content source identifier   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The BYE option indicates that a particular session participant is no longer active. A bridge sends BYE options with a (non-zero) content source value.
An identifier value of zero indicates that the source identified by the synchronization source (SSRC) option and transport address is no longer active. If a bridge shuts down, it should first send BYE options for all content sources it handles, followed by a BYE option with an identifier value of zero. Each RTCP packet may contain one or more BYE options. Multiple identifiers in a single BYE option are not allowed, to avoid ambiguities between the special value of zero and any necessary padding.

6.4 QOS: Quality of service measurement

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|     QOS     |     length    |    reserved   |    reserved   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        packets expected                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        packets received                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    minimum delay (seconds)    |    minimum delay (fraction)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    maximum delay (seconds)    |    maximum delay (fraction)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    average delay (seconds)    |    average delay (fraction)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  type = PORT  |     length    |   transport address ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  type = ADDR  |     length    |    reserved   |  address type |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     network-layer address ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The QOS option conveys statistics of a single synchronization source belonging to the channel identified by the multicast address, destination port and channel identifier.
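The fixed-length part of the QOS option can be built as in the C sketch below. It reads each delay pair in the diagram above as one 32-bit fixed-point value with 16 bits of whole seconds followed by 16 bits of fraction, i.e. units of 1/65536 second. It assumes the length field counts 32-bit words for the whole option, including the appended PORT and ADDR items, which the caller would write directly after these 24 octets. The names qos_delay_fixed, put32 and qos_put_fixed are hypothetical; the type value 36 mirrors RTP_QOS in the Appendix A definitions.

```c
#include <stdint.h>

#define RTCP_OPT_QOS 36   /* option type, as RTP_QOS in Appendix A */

/* Hypothetical helper: delay in seconds -> 16.16 fixed point
 * (upper 16 bits whole seconds, lower 16 bits fraction). Negative
 * values become 0; values too large for 32 bits saturate. */
static uint32_t qos_delay_fixed(double delay_s)
{
    double units = delay_s * 65536.0;   /* units of 1/65536 second */
    if (units < 0.0)
        units = 0.0;
    if (units > 4294967295.0)
        units = 4294967295.0;
    return (uint32_t)(units + 0.5);     /* round to nearest unit */
}

/* Hypothetical helper: store a 32-bit value in network byte order. */
static void put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical helper: fill the 24-octet fixed part of a QOS option.
 * length_words is assumed to cover the whole option in 32-bit words,
 * including the PORT and ADDR items the caller appends after buf[23]. */
static void qos_put_fixed(uint8_t *buf, int final, uint8_t length_words,
                          uint32_t expected, uint32_t received,
                          double dmin, double dmax, double davg)
{
    buf[0] = (uint8_t)((final ? 0x80 : 0x00) | RTCP_OPT_QOS);
    buf[1] = length_words;
    buf[2] = 0;                          /* reserved */
    buf[3] = 0;                          /* reserved */
    put32(buf + 4, expected);
    put32(buf + 8, received);
    put32(buf + 12, qos_delay_fixed(dmin));
    put32(buf + 16, qos_delay_fixed(dmax));
    put32(buf + 20, qos_delay_fixed(davg));
}
```

Rounding to the nearest 1/65536 second and saturating at the 32-bit limit are choices made for this sketch, not requirements of the option format.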
The synchronization source is identified by appending the first of the ADDR items together with the PORT item from the SDES option. These SDES items are appended directly to the fixed-length part of the QOS option, with PORT preceding ADDR. For a description of these items, see the SDES option. If the QOS option is used in reverse control packets, the destination port number identifies the channel, along with the channel identifier. For that reason, every multicast group should be associated with a unique source port.

The other fields of the option contain the number of packets received, the number of packets expected, the minimum delay, the maximum delay and the average delay. The expected number of packets may be computed according to the algorithm in Section A.5. The delay measures are in units of 1/65536 of a second, that is, with the same resolution as the timestamp in the fixed RTP header: the upper 16 bits of each delay field carry whole seconds and the lower 16 bits the fraction.

A single RTCP packet may contain several QOS options. It is left to the implementor to decide how often to transmit QOS options and which sources are to be included.

7 Security Considerations

Without the use of the security options described in Section 5.4, RTP suffers from the same security deficiencies as the underlying protocols, for example, the ability of an impostor to fake source or destination network addresses, or to change header or payload without detection. For example, the SDES fields may be used to impersonate another participant.

IP multicast provides no direct means for a sender to know all the receivers of the data sent. RTP options make it easy for all participants in a session to identify themselves; if deemed important for a particular application, it is the responsibility of the application writer to make listening without identification difficult. It should be noted, however, that privacy of the payload can generally be assured only by encryption.

The periodic transmission of session messages may make it possible to detect denial-of-service attacks, as the receiver can detect the absence of these expected messages.

Compared with other data, ciphertext-only attacks may be more difficult for compressed audio and video sources. Such data is very close to white noise, making statistics-based ciphertext-only attacks difficult. Even without message integrity check options, it may be difficult for an attacker to detect automatically when he or she has found the secret cryptographic key, since the bit pattern after correct decryption may not look significantly different from one decrypted with the wrong key. However, the session information is more or less constant and predictable, allowing known-plaintext attacks. Chosen-plaintext attacks appear, in general, to be difficult.

The integrity of the timestamp in the fixed RTP header can be protected by the message integrity options. If clocks are known to be synchronized, an attacker only has a very limited time window, perhaps a few seconds every 18.2 hours (the wrap-around period of the 32-bit timestamp), to replay recorded RTP packets without detection by the receiver.

Key distribution and certificates are outside the scope of this document.

8 RTP over Network and Transport Protocols

This section describes issues specific to carrying RTP packets within particular network and transport protocols.

8.1 Defaults

The following rules apply unless superseded by protocol-specific subsections in this section. The rules apply to both forward and reverse RTP packets.

RTP packets contain no length field or other delineation, so a framing mechanism is needed if they are carried in underlying protocols that provide the abstraction of a continuous bit stream rather than messages (packets). TCP is an example of such a protocol.
Framing is also needed if the underlying protocol may contain padding, so that the extent of the RTP payload cannot be determined. For these cases, each RTP packet is prefixed by a 32-bit framing field containing the length of the RTP packet measured in octets, not including the framing field itself. If an RTP packet traverses a path over a mixture of octet-stream and message-oriented protocols, each RTP-level bridge between these protocols is responsible for adding and removing the framing field.

A profile may determine that this framing method is to be used even when RTP is carried in protocols that do provide framing, in order to allow carrying several RTP packets in one lower-layer protocol data unit, such as a UDP packet. Carrying several RTP packets in one network or transport packet reduces header overhead and may simplify synchronization between different streams.

8.2 ST-II

When used in conjunction with RTP, ST-II [11] service access ports (SAPs) have a length of 16 bits. The next protocol field (``NextPCol'', Section 4.2.2.10 in RFC 1190) is used to distinguish two encapsulations of RTP over ST-II. The first uses NextPCol value TBD and places the RTP packet directly into the ST-II data area. The second uses a different NextPCol value, also TBD, and precedes the RTP header with the 32-bit header shown below. The octet count determines the number of octets in the RTP header and payload to be checksummed. The 16-bit checksum uses the TCP and UDP checksum algorithm.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | count of octets to be checked |           check sum           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          RTP packet (fixed header, options and payload) ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

A Implementation Notes

We describe aspects of the receiver implementation in this section. There may be other implementation methods that are faster in particular operating environments or have other advantages. These implementation notes are for informational purposes only.

The following definitions are used for all examples; the structure definitions are valid for 32-bit big-endian architectures only. Bit fields are assumed to be packed tightly, with no additional padding.

   #include <sys/types.h>   /* u_char, u_short, u_long */

   typedef double CLOCK_t;

   typedef enum {
     RTP_CSRC = 0,
     RTP_SSRC = 1,
     RTP_SDST = 2,
     RTP_BOS  = 3,
     RTP_ENC  = 8,
     RTP_MIC  = 9,
     RTP_MICA = 10,
     RTP_MICK = 11,
     RTP_MICS = 12,
     RTP_FMT  = 32,
     RTP_SDES = 34,
     RTP_BYE  = 35,
     RTP_QOS  = 36
   } rtp_option_t;

   typedef struct {
     unsigned int ver:2;      /* version number */
     unsigned int channel:6;  /* channel id */
     unsigned int o:1;        /* option present */
     unsigned int s:1;        /* sync bit */
     unsigned int format:6;   /* content type */
     u_short seq;             /* sequence number */
     u_long ts;               /* time stamp */
   } rtp_hdr_t;

   typedef union {
     struct {
       unsigned int final:1;  /* final option */
       unsigned int type:7;   /* option type (unsigned, so 64..127 stay positive) */
       u_char length;         /* length in 32-bit words, including type/length */
       short id[1];
     } csrc;
     /* ... */
   } rtp_t;

A.1 Timestamp Recovery

For some applications it is useful to have the receiver reconstruct the sender's high-order bits of the NTP timestamp from the received 32-bit RTP timestamp. The following code uses double-precision floating point numbers to hold whole numbers with a 48-bit range. Other type definitions of CLOCK_t may be appropriate for different operating environments, e.g., 64-bit architectures or systems with slow floating point support. The routine applies to any clock frequency, not just the RTP value of 65,536 Hz, and any clock starting point.
It will reconstruct the correct high-order bits as long as the local clock value now is within one half of the wrap-around time of the 32-bit timestamp, e.g., approximately 9.1 hours (32768 seconds) for RTP timestamps.

   #include <math.h>   /* fmod() */

   #define MOD32bit 4294967296.
   #define MAX31bit 0x7fffffff

   CLOCK_t clock_extend(u_long ts,    /* in: timestamp, low-order 32 bits */
                        CLOCK_t now)  /* in: current local time */
   {
     u_long high, low;  /* high and low order bits of 48-bit clock */

     low  = fmod(now, MOD32bit);
     high = now / MOD32bit;
     if (low > ts) {
       /* local clock far ahead: timestamp has wrapped into the next cycle */
       if (low - ts > MAX31bit) high++;
     } else {
       /* timestamp far ahead: it belongs to the previous cycle */
       if (ts - low > MAX31bit) high--;
     }
     return high * MOD32bit + ts;
   } /* clock_extend */

Using the full timestamp internally has the advantage that the remainder of the receiver code does not have to be concerned with modulo arithmetic. The current local time does not have to be derived directly from the system clock for every packet; a clock based on samples, e.g., incremented by the nominal audio frame duration, is sufficient.

The whole seconds within NTP timestamps can be obtained by adding 2208988800 to the value of the standard Unix clock (generated, for example, by the gettimeofday system call), which counts seconds from January 1, 1970, while NTP counts from January 1, 1900. For the RTP timestamp, only the least significant 16 bits of the seconds are used.

A.2 Detecting the Beginning of a Synchronization Unit

RTP packets contain a bit flag (the s bit of the fixed header) indicating the end of a synchronization unit. The following code fragment determines, based on sequence numbers, whether a packet is the beginning of a synchronization unit. It assumes that the packet header has been converted to host byte order.

   static u_short seq_eos;  /* sequence number of the last end-of-unit packet */
   static int flag;         /* set once an end-of-unit packet has been seen */
   rtp_hdr_t *h;            /* fixed header of the current packet */

   if (h->s) {
     flag = 1;
     seq_eos = h->seq;
   }
   /* the 16-bit cast handles wrap-around of the sequence number */
   else if (flag && (u_short)(h->seq - seq_eos) < 32768) {
     flag = 0;
     /* handle beginning of synchronization unit */
   }

A.3 Demultiplexing and Locating the Synchronization Source

The combination of destination address, destination port and channel identifier determines the channel. For each channel, the receiver maintains a list of all sources, content and synchronization sources alike, in a table or other suitable data structure. Synchronization sources are stored with a content source value of zero.

When an RTP packet arrives, the receiver determines its network source address and port (from information returned by the operating system), synchronization source (SSRC option) and content source(s) (CSRC option). To locate the table entry containing timing information, the mapping from content descriptor to actual encoding, etc., the receiver sets the content source to zero and locates a table entry based on the triple (transport source address, synchronization source identifier, 0).

The receiver identifies the contributors to the packet (for example, the speaker who is heard in the packet) through the list of content sources carried in the CSRC option. To locate their table entries, it matches on the triple (network address and port, synchronization source identifier, content source).

Note that since the network addresses used for matching are only generated locally at the receiver, the receiver can choose whatever format seems most appropriate. For example, a Berkeley Unix-based system may use struct sockaddr data types if it expects network sources with non-IP addresses.

A.4 Parsing RTP Options

The following code segment walks through the RTP options, preventing infinite loops due to zero and invalid length fields. Structure definitions are valid for big-endian architectures only.
   u_long len;     /* length of RTP packet in octets */
   u_long *pt;     /* pointer to the current option */
   rtp_hdr_t *h;   /* fixed header */
   rtp_t *r;       /* current option */

   if (h->o) {
     for (pt = (u_long *)(h+1);; pt += r->csrc.length) {
       /* the option's type/length word must lie inside the packet */
       if ((char *)(pt + 1) - (char *)h > len)
         return -1;
       r = (rtp_t *)pt;
       /* a zero length would loop forever; the option must not
          extend beyond the end of the packet */
       if (r->csrc.length == 0 ||
           (char *)(pt + r->csrc.length) - (char *)h > len)
         return -1;
       switch (r->csrc.type) {
       case RTP_BYE:
         /* handle BYE option */
         break;
       case RTP_CSRC:
         /* handle CSRC option */
         break;
       /* ... */
       default:
         /* undefined option */
         break;
       }
       if (r->csrc.final) break;
     }
   }

A.5 Determining the Expected Number of RTP Packets

The number of packets expected can be computed by the receiver by tracking the first sequence number received (seq0), the last sequence number received (seq), and the number of complete sequence number cycles (cycles):

   expected = cycles * 65536 + seq - seq0 + 1;

The cycle count is updated for each packet, where seq_prior is the sequence number of the prior packet. Since sequence numbers are 16 bits wide, the distance between two of them is taken modulo 65536, so that a slightly reordered packet is not mistaken for a wrap-around:

   unsigned long seq, seq_prior;  /* 16-bit sequence numbers */
   long cycles;                   /* sequence number cycles completed */
   unsigned short delta;          /* seq - seq_prior, modulo 65536 */

   delta = (unsigned short)(seq - seq_prior);
   if (seq < seq_prior && delta < 32768)
     cycles++;   /* moved forward across the wrap-around */
   else if (seq > seq_prior && delta >= 32768)
     cycles--;   /* late packet from before the wrap-around */
   seq_prior = seq;

Acknowledgments

This memorandum is based on discussions within the IETF audio-video transport working group chaired by Stephen Casner. The current protocol has its origins in the Network Voice Protocol and the Packet Video Protocol (Danny Cohen and Randy Cole) and the protocol implemented by the vat application (Van Jacobson and Steve McCanne). Stuart Stubblebine (ISI) helped with the security aspects of RTP. Ron Frederick (Xerox PARC) provided extensive editorial assistance.
B Addresses of Authors

Stephen Casner
USC/Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292-6695
telephone: +1 310 822 1511 (extension 153)
electronic mail: casner@isi.edu

Henning Schulzrinne
AT&T Bell Laboratories
MH 2A244
600 Mountain Avenue
Murray Hill, NJ 07974-0636
telephone: +1 908 582 2262
facsimile: +1 908 582 5809
electronic mail: hgs@research.att.com

References

[1] D. E. Comer, Internetworking with TCP/IP, vol. 1. Englewood Cliffs, New Jersey: Prentice Hall, 1991.

[2] J. Postel, ``Internet protocol,'' Network Working Group Request for Comments RFC 791, Information Sciences Institute, Sept. 1981.

[3] International Organization for Standardization, ``ISO/IEC DIS 10646-1:1993 information technology -- universal multiple-octet coded character set (UCS) -- part I: Architecture and basic multilingual plane,'' 1993.

[4] The Unicode Consortium, The Unicode Standard. New York, New York: Addison-Wesley, 1991.

[5] D. L. Mills, ``Network time protocol (version 3) -- specification, implementation and analysis,'' Network Working Group Request for Comments RFC 1305, University of Delaware, Mar. 1992.

[6] S. Kent, ``Understanding the Internet certification system,'' in Proceedings of the International Networking Conference (INET), (San Francisco, California), pp. BAB-1 to BAB-10, Internet Society, Aug. 1993.

[7] D. Balenson, ``Privacy enhancement for internet electronic mail: Part III: Algorithms, modes, and identifiers,'' Network Working Group Request for Comments RFC 1423, IETF, Feb. 1993.

[8] V. L. Voydock and S. T. Kent, ``Security mechanisms in high-level network protocols,'' ACM Computing Surveys, vol. 15, pp. 135-171, June 1983.

[9] B. S. Kaliski, Jr., ``The MD2 message-digest algorithm,'' Network Working Group Request for Comments RFC 1319, RSA Laboratories, Apr. 1992.

[10] R. Rivest, ``The MD5 message-digest algorithm,'' Network Working Group Request for Comments RFC 1321, IETF, Apr. 1992.

[11] C. Topolcic, S. Casner, C. Lynn, Jr., P. Park, and K. Schroder, ``Experimental internet stream protocol, version 2 (ST-II),'' Network Working Group Request for Comments RFC 1190, BBN Systems and Technologies, Oct. 1990.