Network Working Group                                     Stephan Wenger
INTERNET-DRAFT                                             Umesh Chandra
Expires: January 2006                                              Nokia
                                                       Magnus Westerlund
                                                                Ericsson
                                                           July 11, 2005

 Codec Control Messages in the Audio-Visual Profile with Feedback (AVPF)
                   <draft-wenger-avt-avpf-ccm-00.txt>

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2005).

Abstract

This document specifies a few extensions to the messages defined in the Audio-Visual Profile with Feedback (AVPF). They are useful primarily in conversational multimedia scenarios where centralized multipoint functionalities are in use. However, some are also usable in smaller multicast environments and point-to-point calls. The extensions discussed are Full Intra Request, Freeze Request, Temporary Maximum Media Bit-rate and Temporal Spatial Tradeoff.
Wenger, Chandra, Westerlund                                     [Page 1]
INTERNET-DRAFT          AVPF RTCP-RR Extensions            July 11, 2005

TABLE OF CONTENTS

Status of this Memo
Copyright Notice
Abstract
TABLE OF CONTENTS
1. Introduction
2. Definitions
   2.1. Glossary
   2.2. Terminology
3. Motivation (Informative)
   3.1. Use Cases
   3.2. Feedback Messages
      3.2.1. Full Intra Request Command
      3.2.2. Freeze Request Indication
      3.2.3. Temporal Spatial Tradeoff Request and Acknowledge
      3.2.4. Temporary Maximum Media Bit-rate Request and Ack
   3.3. Using the Media Path
4. Solution (Informative)
   4.1. Reliability
   4.2. Multicast
   4.3. Freeze Picture
   4.4. Full Intra Request Command
   4.5. Temporal Spatial Tradeoff
   4.6. Temporary Maximum Media Bit-Rate
5. RTCP Receiver Report Extensions
   5.1. Transport Layer Feedback Messages
      5.1.1. Temporary Maximum Media Bit-rate Request (TMMBR)
      5.1.2. Temporary Maximum Media Bit-rate Acknowledgement
   5.2. Payload Specific Feedback Messages
      5.2.1. Full Intra Request (FIR)
      5.2.2. Temporal-Spatial Tradeoff Request (TSTR)
      5.2.3. Temporal-Spatial Tradeoff Acknowledgement (TSTA)
      5.2.4. Freeze Indication
6. Congestion Control
7. Security Considerations
8. SDP Definitions
   8.1. Extension of rtcp-fb attribute
   8.2. Offer-Answer
   8.3. Examples
9. IANA Considerations
10. Open Issues
11. Acknowledgements
12. References
   12.1. Normative references
   12.2. Informative references
13. Authors' Addresses
RFC Editor Considerations

Wenger, Chandra, Westerlund     Standards Track                 [Page 3]

1. Introduction

When the Audio-Visual Profile with Feedback (AVPF) [AVPF] was developed, the main emphasis of the authors lay in the efficient support of point-to-point and small multipoint scenarios without centralized multipoint control. However, in practice, many small multipoint conferences are conveyed utilizing devices known as Multipoint Control Units (MCUs). MCUs comprise mixers and translators (in RTP [RFC3550] terminology), but also signaling support.
Long-standing experience of the conversational video conferencing industry suggests that there is a need for a few additional feedback messages to efficiently support MCU-based multipoint conferencing. It appears that at least some of the messages are also desirable in non-MCU based communication relationships. Four messages are introduced, two of them with associated acknowledge messages.

The Full Intra Request (FIR) Command requires the receiver of the feedback message (and sender of the video stream) to insert a decoder refresh point (e.g. an IDR/Intra picture) immediately. In order to fulfil congestion control constraints, this may imply a significant drop in frame rate, as IDR/Intra pictures are commonly much larger than regular predicted pictures. The use of this message is restricted to cases where no other means of decoder refresh can be employed, e.g. during the join-phase of a new participant in a multipoint conference. It is explicitly disallowed to use the FIR command for error resilience purposes; for that purpose, AVPF's PLI message, which reports lost pictures and has been included in AVPF for exactly this reason, is to be used instead. Today, the FIR message appears to be useful primarily with video streams, but in the future it may also become helpful in conjunction with other media codecs that support temporal prediction across RTP packets.

The Temporary Maximum Media Bit-rate Request (TMMBR) Message allows a receiver to signal to the media sender the currently maximal supported media bit-rate for a given media stream. Usage scenarios include limiting media senders in MCU scenarios to the slowest receiver, and graceful bandwidth adaptation in scenarios where the upper limit of the connection bandwidth to a receiver changes but is known in the interval between these dynamic changes. The TMMBR message is useful for all media types that are not inherently of constant bit rate.
The Video Freeze Indication is used by MCUs to tell a receiver to stop rendering video, freeze the current image, and await a freeze release in the media stream.

Finally, the Temporal-Spatial Tradeoff Request Message enables a receiver to signal to the video sender its preference for spatial quality or high temporal resolution (frame rate). The receiver of the video stream typically generates this signal based on input from its user interface, so as to react to explicit requests of the user. However, some implicit use forms are also known. For example, the trade-offs commonly used for live video and document camera content are different. Obviously, this indication is relevant only with respect to video transmission.

After the Introduction and the Definitions, the informative sections 3 and 4 provide information on the Motivation and the Solutions. Section 5 contains the normative definition of the feedback messages introduced before. The subsequent sections cover congestion control, security considerations, and signalling.

2. Definitions

2.1. Glossary

   FEC   - Forward Error Correction
   FIR   - Full Intra Request
   MCU   - Multipoint Control Unit
   TMMBR - Temporary Maximum Media Bit-rate Request
   TMMBA - Temporary Maximum Media Bit-rate Acknowledgement
   PLI   - Picture Loss Indication
   TSTA  - Temporal Spatial Tradeoff Acknowledgement
   TSTR  - Temporal Spatial Tradeoff Request

2.2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
Message: codepoint defined by this specification, of one of the following types:

   Request: message that requires Acknowledgement
   Acknowledgement: message that answers a Request
   Command: message that forces the receiver to an action
   Indication: message that reports a situation

   Note that this terminology is in rough alignment with ITU-T Rec. H.245.

Decoder Refresh Point: A bit string, packetized in one or more RTP packets, that completely resets the decoder to a known state. A typical example of a Decoder Refresh Point is an H.261 Intra picture. However, there are also much more complex decoder refresh points.

Rendering: The operation of presenting (parts of) the reconstructed media stream to the user.

Decoding: The operation of reconstructing the media stream.

3. Motivation

This informative section presents the motivation and use of the different video and media control messages. The video control messages have been discussed previously, and a requirement draft was drawn up [Basso]. Unfortunately this draft has expired; however, we do quote relevant parts of that draft to provide motivation and requirements.

3.1. Use Cases

There are a number of possible usages for the proposed feedback messages. Let us begin by looking through the use cases Basso et al. [Basso] proposed. Some of the use cases have been reformulated and commented:

1. An RTP video mixer composes multiple encoded video sources into a single encoded video stream. Each time a video source is added, the RTP mixer needs to request a decoder refresh point from the video source, so as to start an uncorrupted prediction chain on the spatial area of the mixed picture occupied by the data from the new video source.

2. An RTP video mixer receives multiple encoded RTP video streams from conference participants, and dynamically selects (e.g.
through voice activation) one of the streams to be included in its output RTP stream. At the time of a bit stream change (determined through means such as voice activation or the user interface), the mixer requests a decoder refresh point from the remote source, in order to avoid using unrelated content as reference data for inter picture prediction. After requesting the decoder refresh point, the video mixer stops the delivery of the current RTP stream and monitors the RTP stream from the new source until it detects data belonging to the decoder refresh point. At that time, the RTP mixer starts forwarding the newly selected stream to the receiver(s).

3. An application needs to signal to the remote encoder a request to change the desired tradeoff in temporal/spatial resolution. For example, one user may prefer a higher frame rate and a lower spatial quality, and another user may prefer the opposite. This choice is also highly content dependent. Many current video conferencing systems offer, in the user interface, a mechanism to make this selection, usually in the form of a slider.

4. Use case 4 of the Basso draft applies only to AVPF's PLI and is not reproduced here.

5. A video mixer switches its output stream to a new video source, similar to use case 2. It instructs the receiving endpoints, by means of a codec control message, to complete the decoding of the current picture and then freeze the picture (stop rendering but continue decoding) until the freeze picture request is released. The freeze picture release codepoint is a mechanism that can be selected on a per picture basis and can be conveyed in-band in most video coding standards. Concurrently, the video mixer requests a decoder refresh point from the new video source, and immediately switches to the new source.
Once the new source receives the request for the reference picture and acts on it, it produces a decoder refresh point with an embedded Freeze-Release. Once they have received the decoder refresh point with the freeze release information, the receiving endpoints restart rendering and display an uncorrupted new picture. The main benefit of this method, as opposed to the one of use case 2, is that the video mixer does not have to discover the beginning of a decoder refresh point. Stream switching can be performed media-unaware.

6. A video mixer dynamically selects one of the received video streams to be sent out to participants, and tries to provide the highest bit rate possible to all participants, while minimizing stream transrating. One way of achieving this is to set up sessions with endpoints using the maximum bit rate accepted by that endpoint and by the call admission method used by the mixer. By means of commands that allow flow control, the mixer can then reduce the maximum bit rate sent by endpoints to the lowest common denominator of all received streams. As the lowest common denominator changes due to endpoints joining, leaving, or network congestion, the mixer can adjust the limits to which endpoints can send their streams to match the new limit. The mixer would then request a new maximum bit rate, which is equal to or less than the maximum bit-rate negotiated at session setup, for a specific media stream, and the remote endpoint can respond with the actual bit-rate that it can support.

The picture Basso et al. draw up covers most applications we foresee. However, we would like to extend the list with one additional use case:

7. The congestion control algorithms in use (AIMD and TFRC) probe for more bandwidth as long as there is something to send.
With congestion control using packet loss as the indication of congestion, this probing generally results in reduced media quality due to packet loss and increased delay. In a number of deployment scenarios, especially cellular ones, the bottleneck link is often the last hop link. That cellular link also commonly has some type of QoS negotiation enabling the cellular device to learn the maximal bit-rate available over this last hop. Thus, indicating the maximum available bit-rate to the transmitting party can be beneficial to prevent it from even trying to exceed the known hard limit that exists. For cellular or other mobile devices the available known bit-rate can also quickly change due to handover to another transmission technology, QoS renegotiation due to mobility induced congestion, etc. To minimize disruption of service, a media path signalling method that allows quick convergence, especially in cases of reduced bandwidth, is desired.

3.2. Feedback Messages

After these use cases, let us review the semantics of the different proposed feedback messages and how they apply to the different use cases.

3.2.1. Full Intra Request Command

A Full Intra Request (FIR), also known as "video fast update", involves sending a decoder refresh point (normally an Intra or IDR picture in the current video compression standards) to a decoder. More formally, sending a decoder refresh point implies refraining from using any picture data sent prior to that point as a reference for the encoding process of any subsequent picture sent in the stream.

The Full Intra Request instructs the video encoder to complete the encoding of the current video picture and to generate a decoder refresh point at the earliest opportunity. The evaluation of such an opportunity includes the current encoder coding strategy and the currently available network resources.
An H.264 encoder shall react to a Full Intra Request with an IDR picture or a series of pictures forming a gradual decoder refresh, as discussed for example in section D.2.7. of [H.264].

Decoder Refresh points, especially Intra or IDR pictures, are normally several times larger than predicted pictures, regardless of the instant in time at which they are encoded. Therefore, in scenarios in which the available bandwidth is small, the use of a decoder refresh point implies a delay that is significantly longer than the typical picture duration.

Full Intra Request is motivated by use cases 1, 2, and 5.

The sender side's fulfilment of the Full Intra Request can trivially be detected in the media stream. Therefore no acknowledgement of the reception of the command is necessary.

3.2.2. Freeze Request Indication

The Freeze Request Indication instructs the video decoder to complete the decoding of the current video picture and subsequently display it until either a timeout period has elapsed, or until the reception of a signal (in band in the video stream) that indicates the release of the frozen picture. Note that a freeze picture release signal is part of at least the H.261, H.263 and H.264 video coding specifications. Coding schemes that support picture freeze release in their bitstreams are required to use freeze release to signal the remote end to resume rendering. Most video compression standards also define a timeout that forces the resumption of video rendering in case a Freeze Picture Request has been issued, but no explicit Freeze Release is received. In H.264, for example, the timeout mentioned is at least 6 seconds. As a last resort, this specification contains its own timeout mechanism that forces the resumption of rendering after 30 seconds.
In adding this feature, the specification reflects the lossy nature of a normal RTP transmission, where explicit freeze release signals can get lost.

Historically, the freeze indication has been used in MCUs according to use case 5. Nowadays, most MCUs operate media aware and simply stop sending media data of the old stream at a defined picture boundary. The new stream is spliced in at a decoder refresh point. Hence, for modern MCUs, the Freeze indication is of much less relevance.

However, a mechanism known as gradual decoder refresh may make the Freeze indication attractive again. Using a gradual decoder refresh, a new user can join a conference by listening in to a sequence of pictures (spanning perhaps a second of video), which are guaranteed to gradually form a complete reference picture. The associated problems in the video encoding are non-trivial, but solvable, and applications exist where they have been solved successfully. In order to shield the user from the slow and annoying gradual build-up of the picture, a stop of the rendering is desirable. The freeze picture indication can serve this purpose. We note that other, more complex means (that may involve control protocols) may also be available serving a similar purpose.

Ideally, the freeze indication requires synchronous delivery with the media data. In its current form, the draft suggests transmitting the freeze indication in the RTCP forward channel, which is not implicitly synchronized with the media stream. Hence a synchronization problem exists. Both early and late arrival of a freeze indication may result in a bad user experience. Early arrival could result in an unnecessarily long frozen picture.
Late arrival could result in the freezing of a picture that should have been frozen earlier; hence it may already be at least partly gradually refreshed and not at all appealing to the user.

We solve the early arrival problem by including, in the freeze indication message, the RTP timestamp of the instant from which the freeze indication applies.

The problem resulting from late arrival, which is more severe, cannot be solved easily. There are scenarios where the freezing instant is not known far enough in advance to send the request early enough to guarantee timely arrival. One important example would be voice activated switching, where a fast reaction is desirable and which is unforeseeable by a decoder.

It should be remarked, though, that not including the freeze indication at all does not solve the above problem either. By including it, we do not create a new problem; we just haven't found the perfect solution for an existing one. Without freeze indication, we would be worse off than with a partly broken one.

As stated in section 10 (open issues), we invite readers to propose solutions to this problem. The only obvious solution we found (apart from pushing the problem to the media coding standardization) appears to be to somehow splice the freeze request into the forward media stream (instead of using RTCP). Ideally, the freeze request would be piggybacked onto each media packet it applies to. This option remains for further consideration.

3.2.3. Temporal Spatial Tradeoff Request and Acknowledgement

The Temporal Spatial Tradeoff Request (TSTR) instructs the video encoder to change its trade-off between temporal and spatial resolution. Index values from 0 to 31 indicate monotonically a desire for higher frame rate. In general the encoder reaction time may be significantly longer than the typical picture duration. See use case 3 for an example of how this is used. To give the application sending the TSTR confirmation of reception, an acknowledgement process is defined.
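
As an illustration of the index semantics described above, the following is a minimal sketch of how a receiving application might derive a TSTR index from a user-interface slider, as in use case 3. The function name, the constant name, and the slider scale of 0.0 (favour spatial quality) to 1.0 (favour frame rate) are assumptions made for this example only; the draft itself defines just the index range of 0 to 31.

```python
TSTR_INDEX_MAX = 31  # upper end of the index range 0..31 given in the text


def slider_to_tstr_index(position: float) -> int:
    """Map a hypothetical UI slider position in [0.0, 1.0] to a TSTR index.

    Higher positions express a stronger desire for frame rate, matching
    the monotonic meaning of the index. Out-of-range input is clamped.
    """
    clamped = min(max(position, 0.0), 1.0)
    return round(clamped * TSTR_INDEX_MAX)
```

Clamping is a design choice of this sketch, made so that a misbehaving user interface can never produce an index outside the defined range.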
3.2.4. Temporary Maximum Media Bit-rate Request and Acknowledgement

The Temporary Maximum Media Bit-rate Request (TMMBR) is used by a receiver or MCU to request a sender to limit the individual maximum bit-rate for a media stream to, or below, the provided value. The primary usage for this is a scenario with an MCU (use case 6) that can be depicted as follows:

   +---+      +------------+      +---+
   | A |------| Conference |------| B |
   +---+      |   Bridge   |      +---+
              |   (MCU)    |
   +---+      |            |      +---+
   | C |------|            |------| D |
   +---+      +------------+      +---+

   Figure 1 - Conference bridge scenario

In Figure 1 a small multiparty conference is ongoing. All four participants (A-D) have negotiated a common maximum bit-rate that this session can use. However, that bit-rate is merely the one that all participants guarantee to be able to encode and decode, and for which they may have sufficient bandwidth. There exist no guarantees that the links between the MCU and the participants will be able to handle these bit-rates.

The conference operates over a number of unicast links between the participants and the MCU. The congestion situation on each of these links can easily be monitored by the participant in question and by the MCU, utilizing for example RTCP Receiver Reports or DCCP [DCCP]. However, any given participant has no knowledge of the congestion situation of the connections to the other participants. Worse, without mechanisms similar to the ones discussed in this draft, the MCU (which is aware of the congestion situation on all connections it manages) has no standardized means to inform participants to slow down, short of forging receiver reports (which is undesirable). In principle, an RTP mixer confronted with such a situation is obliged to thin streams intended for connections which detected congestion.
In practice, stream thinning, if performed media aware, is unfortunately a very difficult and cumbersome operation and adds undesirable delay. If done media unaware, it leads very quickly to unacceptable reproduced media quality. Hence, means to slow down senders even in the absence of congestion on their connections to the MCU are desirable.

To allow the MCU to perform congestion control on the individual links without performing transcoding, there must be a mechanism that enables the MCU to request the participants' media encoders to limit the maximum media bit-rate they currently use. The MCU handles the detection of a congestion state between itself and a participant as follows:

1. Start thinning the media traffic to the supported bit-rate.

2. Use the TMMBR to request the media sender(s) to reduce the media bit-rate sent by them to the MCU to a value that is in compliance with congestion control principles for the slowest link. Slow refers here to the available bandwidth and packet rate after congestion control.

3. As soon as the bit-rate has been reduced by the sending party, the MCU stops stream thinning implicitly, because there is no need for it any more as the stream is in compliance with congestion control.

The above algorithm may suggest to some that there is no need for the TMMBR; it should be sufficient to rely solely on stream thinning. As much as this is desirable from a network protocol designer's viewpoint, it has the disadvantage that it doesn't work very well. At the very minimum, to make stream thinning work without severe quality degradation, the video encoder has to be cooperative in that it tailors the bit stream to make it suitable for thinning. While this is possible, it already has negative implications for the coding efficiency and hence quality, as thinnable streams are less efficient than non-thinnable streams.
Furthermore, the more thinnable a stream is, the worse its coding efficiency. Stream thinning of video streams not tailored for that purpose very quickly results in unusable reproduced quality. It appears to be a reasonable compromise to rely on stream thinning as an immediate reaction tool to combat congestion, and to have a quick, minimal control mechanism that instructs the original sender to reduce its bit-rate.

Note also that the standard RTCP receiver report may not serve the purpose mentioned. In some MCU environments, the RTCP RR is only being sent between the RTP receiver in the endpoint and the RTP sender in the MCU. The stream that needs to be bandwidth-reduced, however, is the one between the original sending endpoint and the MCU. This endpoint doesn't see the aforementioned RTCP RRs, and hence needs to be explicitly informed about desired bandwidth adjustments.

The TMMBR only provides an upper limit, because the media sender may be required to lower the media bit-rate to levels lower than the indicated value. One example is detected congestion between the media sender and the MCU.

It is the MCU's responsibility to take into consideration the multiple maximum media bit-rate requests which it receives from the receivers, together with its knowledge about the congestion control state, and to select the lowest of those bit-rate values. The MCU may also support certain transcoding capabilities, which can be employed for some of the receivers so as not to reduce the conference bit rate to a lowest common denominator, which would affect the user experience. It may also be faced with the problem that it needs to change more than one media stream, although many audio/video conferences usually only change the video bit-rate.

The TMMBR needs to be acknowledged, as it is fundamental that the MCU knows that the new value has been established at the media sender side.
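
The selection rule described above (take the lowest of the received requests and the MCU's own congestion knowledge, never exceeding the maximum negotiated at session setup) can be sketched as follows. This is an illustrative sketch only: the function and parameter names are assumptions, bit-rates are assumed to be expressed in bits per second, and an MCU that transcodes for some receivers would apply a more elaborate policy.

```python
from typing import Iterable, Optional


def select_conference_limit(session_max_bps: int,
                            receiver_requests_bps: Iterable[int],
                            mcu_congestion_limit_bps: Optional[int] = None) -> int:
    """Return the bit-rate the MCU would request from the media sender.

    session_max_bps: maximum bit-rate negotiated at session setup.
    receiver_requests_bps: maximum media bit-rate requests received so far.
    mcu_congestion_limit_bps: limit derived from the MCU's own congestion
        control state, if it has one (hypothetical input for this sketch).
    """
    candidates = [session_max_bps, *receiver_requests_bps]
    if mcu_congestion_limit_bps is not None:
        candidates.append(mcu_congestion_limit_bps)
    # The lowest value wins, so the result never exceeds the session maximum.
    return min(candidates)
```

For example, with a session maximum of 512000 and receiver requests of 384000 and 256000, the sketch yields 256000. Because the value is only an upper limit, the media sender remains free to send at a lower rate.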
In addition to the use case mentioned above, there seem to exist opportunities to use the method discussed to improve performance in the scenario described below. However, this is still under discussion.

In use case 7 it may be possible to use TMMBR to improve the performance at times of changes in the known upper limit of the bit-rate. In this use case the signalling protocol will normally have established an upper limit for the session and media bit-rates. However, at the time of change a receiver could avoid serious congestion by sending a TMMBR to the sending side. Then, when it is certain that the new bit-rate will be what applies in this session, it can perform a renegotiation of the session upper limit using the signalling protocol.

3.3. Using the Media Path

There are multiple reasons why we propose to use the media path for the messages.

First, systems employing MCUs usually separate the control and media processing parts. As these messages are intended for or generated by the media processing part rather than the signalling part of the MCU, having them on the media path avoids interfaces and unnecessary control traffic between signalling and processing.

Secondly, the signalling path quite commonly contains several signalling entities, e.g. SIP-proxies and application servers. Avoiding signalling entities avoids delay for several reasons. Signalling proxies may have less stringent delay requirements than media processing, and due to their complex and more generic nature may introduce significant processing delay. The topological locations of the signalling entities are also commonly not optimized for minimal delay, but rather for other architectural goals. Thus the signalling path can be significantly longer in both the geographical and the delay sense.

4. Solution

This informative section discusses the solution and how the different components fit together. The formal definitions of the AVPF feedback messages are provided in section 5, and the signalling in section 8.
We employ the AVPF [AVPF] and its feedback message framework. AVPF provides a simple way of implementing the new messages and also provides timing rules controlling when the feedback messages can be sent, which we re-use by reference.

The signalling allows each individual type of function to be configured or negotiated on an RTP session basis. The freeze picture, full intra request and temporal spatial trade-off can even be negotiated on payload type level within an RTP session.

4.1. Reliability

The use of RTCP messages implies that each message transfer is unreliable, unless the lower layer transport provides reliability. The different messages proposed in this specification have different requirements in terms of reliability. However, in all cases, some way of dealing with occasional loss of feedback messages must be supported.

With TMMBR and TSTR, a request and reception acknowledgement mechanism is proposed. It is used to allow the sender of the request to know that the request recipient has received the request. This is desirable behaviour for both mechanisms, as neither TMMBR nor TSTR necessarily results in an easily identifiable (or any) change of behaviour by the receiver of the request.

The FIR command should result in the delivery of a decoder refresh point. Decoder refresh points are easily identifiable from the bit stream. Hence there is no need for protocol-level acknowledgement, and a simple command repetition mechanism is sufficient for ensuring the level of reliability required. However, the potential use of repetition does require a mechanism to prevent the recipient from responding to messages already received and responded to.

The Freeze indication is only valid for a specific duration. This fact alone already indicates the need for timely delivery.
The loss of the indication results in lowered rendered media quality; however, it will not cause any permanent damage to the stream. Hence, to provide protection against packet loss, the indication can be repeated during the freeze period as long as the repeated messages indicate the time instance from which they apply. As mentioned before, the problem of late delivery exists, and there appears to be no good solution for it (at least when using RTCP as the transmission mechanism).

4.2. Multicast

The media related requests might be used with multicast. The RTCP timing rules specified in [RFC3550] and [AVPF] ensure that the messages do not cause overload of the RTCP connection. More problematic are inconsistent messages arriving at the RTP sender from different receivers when multicast is employed. Lack of time prevented us from addressing this problem adequately. In later revisions of this draft, we plan to add, in each message definition, advice on how to handle those inconsistencies.

4.3. Freeze Picture

The Freeze Picture request advises the receiver to complete the decoding of the current picture and to freeze that picture (stop rendering). Normally, rendering commences again as soon as a freeze release signal is received (typically in-band in the video stream) or the media specific timeout (defined in the video coding specification, normally 6 seconds) expires. For robustness, this specification contains its own timeout mechanism. Decoding of the stream continues during the freezing phase.

Freeze requests are normally issued because the video stream contains coded video that is unappealing to the user. A typical example is a gradual decoder refresh, which looks very much like a slow build-up of an image using blocks of 16x16 pixels. To avoid rendering this unappealing data, the freeze request has high demands for timely delivery.
Therefore, early or, even better, immediate feedback mode is recommended.

The Freeze picture feedback message contains both the SSRC of the party sending the request and the SSRC of the media source the request applies to. They may be the same SSRC; however, normally they will be different.

To ensure the highest possible delivery probability of the freeze request, the request may be repeated during the whole freeze period. To allow the receiver to correctly time out the freeze request and determine from what point in the media it was valid, the freeze request contains the RTP media timestamp corresponding to the picture from which the request applies.

Unfortunately, a late reception of the freeze request may result in a very annoying picture quality; see the discussion above. This could be resolved by including the freeze indication in all media packets to which it applies. This concept is for future study.

4.4. Full Intra Request Command

The Full Intra Request (FIR) command, when received by the designated media sender, requires that the media sender send a decoder refresh point (normally an Intra or IDR picture, depending on the video standard employed) as soon as possible, considering congestion control. The FIR contains the SSRC of the requesting party and of the media sender that shall send the decoder refresh point. The FIR also contains a request sequence number for distinguishing repetitions of a request from new requests.

To ensure the best possible reliability, a sender of FIR may repeat the FIR request until a response has been received. The repetition interval is determined by the RTCP timing rules the session operates under. Upon reception of a complete decoder refresh point, or the detection of an attempt to send a decoder refresh point (which got damaged due to a packet loss), the repetition of the FIR must stop.
If another FIR is necessary, the request sequence number must be increased. To combat loss of the decoder refresh points sent, a sender that receives repetitions of the FIR 2 RTT after the transmission of the decoder refresh point shall send a new decoder refresh point. A FIR sender shall not have more than one FIR request (different request sequence number) outstanding at any time per media sender in the session.

The first FIR command message may be sent using early or immediate feedback RTCP packets.

Usage in multicast is possible; however, aggregation of the commands will be necessary. A media sender that receives a request shortly (within 2*RTT) after sending a decoder refresh point should await a second command, to make sure that the requester has not already been served by the previously delivered decoder refresh point.

4.5. Temporal Spatial Tradeoff

The solution for Temporal Spatial Tradeoff consists of one request and one acknowledgement message. The request (TSTR) is sent by a receiver to the source whose tradeoff it wishes to change. The source determines whether the request will result in a change of the tradeoff. As acknowledgement of the reception of the request, the value used after the request is returned using the acknowledgement message (TSTA).

4.6. Temporary Maximum Media Bit-Rate

The temporary maximum media bit-rate messages are generic messages that can be applied to any media.

The temporary limiting of the maximum media bit-rate that the sender is allowed to use is implemented by a request/acknowledge message pair. The temporary maximum media bit-rate request message (TMMBR) sets the maximum bit-rate that the sender may use towards this receiver. If multiple maximum bit-rates are set in a given session where the media is common to all the receivers (for example multicast), the sender should set its sending bit-rate to the lowest value received.
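The lowest-value rule for a stream shared by several receivers can be illustrated with a short sketch. The Python fragment below is illustrative only and not part of this specification; the function name effective_max_bitrate and the dictionary keyed by receiver SSRC are assumptions made for the example.

```python
from typing import Dict, Optional


def effective_max_bitrate(limits: Dict[int, int]) -> Optional[int]:
    """Return the lowest requested maximum bit-rate (bit/s), or None.

    `limits` maps a receiver SSRC to the most recent maximum bit-rate that
    receiver requested via TMMBR. The mapping is a hypothetical bookkeeping
    structure, not something defined by this memo.
    """
    if not limits:
        return None  # no receiver has set a temporary limit
    return min(limits.values())


# Three receivers of a common (e.g. multicast) stream have sent a TMMBR.
limits = {0x1111: 512_000, 0x2222: 384_000, 0x3333: 768_000}
print(effective_max_bitrate(limits))  # 384000
```

Deleting an entry from the mapping and calling the function again yields the limit that applies to the remaining receivers.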
The maximum bit-rate values from receivers that time out from the session shall be removed from consideration, possibly triggering a change in the maximum bit-rate value used. In these cases it is recommended that the sender transmit a Maximum bit-rate indication.

The corresponding acknowledgement message signals the reception of the request, and is called temporary maximum media bit-rate acknowledgement (TMMBA). It shall be sent for each reception of a TMMBR message, even for repetitions of earlier received messages. The acknowledgement includes the sequence number of the request received.

The feedback messages may be used in both multicast and unicast sessions. For sessions with a larger number of participants, using the lowest common denominator may not be the wisest course. It is also important to consider the security risks involved with faked TMMBRs.

5. RTCP Receiver Report Extensions

This memo specifies six new feedback messages. The Freeze Picture Indication, Full Intra Request (FIR) Command, Temporal-Spatial Tradeoff Request (TSTR), and Temporal-Spatial Tradeoff Acknowledgement (TSTA) are all "Payload Specific Feedback Messages" in the sense of section 6.3 of AVPF [AVPF]. The Temporary Maximum Media Bit-rate Request (TMMBR) and Temporary Maximum Media Bit-rate Acknowledgement (TMMBA) are "Transport Layer Feedback Messages" in the sense of section 6.2 of AVPF.

In the following subsections, the new feedback messages are defined, following a structure similar to that of sections 6.2 and 6.3 of the AVPF specification, respectively.

5.1. Transport Layer Feedback Messages

Transport Layer FB messages are identified by the value RTPFB as RTCP message type.

In AVPF, one message of this category had been defined. This memo specifies two more messages for a total of three messages of this type.
They are identified by means of the FMT parameter as follows:

   0:    unassigned
   1:    Generic NACK (as per AVPF)
   2:    Temporary Maximum Media Bit-rate Request (TMMBR)
   3:    Temporary Maximum Media Bit-rate Acknowledgement (TMMBA)
   4-30: unassigned
   31:   reserved for future expansion of the identifier number space

The following subsections define the formats of the FCI field for this type of FB message.

5.1.1. Temporary Maximum Media Bit-rate Request (TMMBR)

The FCI field MUST contain a single TMMBR per feedback message.

5.1.1.1. Semantics

The TMMBR is used to indicate the highest bit-rate per sender of a media, which the receiver currently supports in this session. The message sender SHOULD set this bit-rate to the maximum sending rate the receiver wishes to process. The media sender MAY use any lower bit-rate, as it may need to address a congestion situation or other limiting factors. See section 6 (congestion control) for more discussion.

The "SSRC of the packet sender" field indicates the source of the request, and the "SSRC of media source" field denotes the media sender the message applies to.

TMMBR feedback SHOULD NOT be used if the underlying transport protocol is capable of providing similar feedback information to the sender.

5.1.1.2. Message Format

The Feedback Control Information (FCI) field has the following syntax (figure 2):

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Seq. nr    |           Maximum bit-rate in bit/s           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 2: Syntax for the TMMBR message

Seq. nr: Request sequence number. Each feedback sending source (SSRC) has its own sequence number space. The sequence number SHALL be increased by 1 modulo 256 for each new request. A repetition SHALL NOT increase the sequence number.
Maximum bit-rate: The temporary maximum media bit-rate value in bit/s.

The length of the FB message MUST be set to 3.

5.1.1.3. Timing Rules

The first transmission of the request message MAY use early or immediate feedback in cases when timeliness is desirable. Any repetition of a request message SHOULD follow the normal RTCP transmission timing.

5.1.2. Temporary Maximum Media Bit-rate Acknowledgement (TMMBA)

The FCI field MAY contain one or more TMMBA per feedback message.

5.1.2.1. Semantics

This feedback message is used to acknowledge the reception of a TMMBR. It SHALL be sent for each TMMBR received that was targeted to this receiver, i.e. for each TMMBR received in which the "SSRC of media source" field is identical to the receiving entity's SSRC. The acknowledgement SHALL also be sent for repetitions received. If the request's receiver has received TMMBRs with several different sequence numbers from a single requestor, it MAY aggregate several acknowledgements in the same message by concatenating the FCI fields for each sequence number.

5.1.2.2. Message Format

The Feedback Control Information (FCI) field has the following syntax (figure 3):

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Seq. nr    |                   Reserved                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 3: Syntax for the TMMBA message

Seq. nr: Request sequence number being acknowledged.

Reserved: These bits SHALL be set to 0 and SHALL be ignored by the receiver.

The length field value of the FB message MAY be 3 or more.

5.1.2.3. Timing Rules

The acknowledgement SHOULD be sent as soon as allowed by the applied timing rules for the session, preferably using an immediate or early feedback message.

5.2. Payload Specific Feedback Messages

Payload-Specific FB messages are identified by the value PT=PSFB as RTCP message type.

AVPF defines three payload-specific FB messages and one application layer FB message. This memo specifies four additional payload-specific feedback messages. All are identified by means of the FMT parameter as follows:

   0:     unassigned
   1:     Picture Loss Indication (PLI)
   2:     Slice Loss Indication (SLI)
   3:     Reference Picture Selection Indication (RPSI)
   4:     Full Intra Request Command (FIR)
   5:     Temporal-Spatial Tradeoff Request (TSTR)
   6:     Temporal-Spatial Tradeoff Acknowledgement (TSTA)
   7:     Freeze Indication
   8-14:  unassigned
   15:    Application layer FB message
   16-30: unassigned
   31:    reserved for future expansion of the identifier number space

The following subsections define the new FCI formats for the payload-specific FB messages.

5.2.1. Full Intra Request (FIR)

The FIR FB message is identified by PT=PSFB and FMT=4.

There MUST be exactly one FIR contained in the FCI field.

5.2.1.1. Semantics

Upon reception of a FIR message, an encoder MUST send a decoder refresh point as soon as possible. A "decoder refresh point" is a video picture (or a sequence of video pictures) that resets all cross-picture prediction mechanisms in the decoder into a known state.

Note: Currently, video appears to be the only useful application for FIR, as it appears to be the only payload widely deployed that relies heavily on media prediction across RTP packet boundaries. However, use of FIR could also reasonably be envisioned for other media types that share essential properties with compressed video, namely cross-frame prediction (whatever a frame may be for that media type). One possible example may be the dynamic updates of MPEG-4 scene descriptions.
It is suggested that payload formats for such media types refer to FIR and other message types defined in this specification and in AVPF, instead of creating similar mechanisms in the payload specifications. The payload specifications may have to explain how the payload specific terminologies map to the video-centric terminology used here.

Note: Typical examples for "hard" decoder refresh points are Intra pictures in H.261, H.263, MPEG 1/2 and MPEG-4 part 2, and IDR pictures in H.264. "Gradual" decoder refresh points may also be used; see for example [Gradual]. While both "hard" and "gradual" decoder refresh points are acceptable in the scope of this specification, in most cases the user experience will benefit from using a "hard" decoder refresh point.

Note: A decoder refresh point also contains all header information above the picture layer (or equivalent, depending on the video compression standard) that is conveyed in-band. In H.264, for example, a decoder refresh point contains parameter set NAL units that generate parameter sets necessary for the decoding of the following slice/data partition NAL units (and that are not conveyed out of band).

Note: In environments where the sender has no control over the codec (e.g. when streaming pre-recorded and pre-coded content), the reaction to this command cannot be specified. One suitable reaction of a sender would be to skip forward in the video bit stream to the next decoder refresh point. In other scenarios, it may be preferable not to react to the command at all, e.g. when streaming to a large multicast group. Other reactions may also be possible.
When deciding on a strategy, a sender could take into account factors such as the size of the receiving multicast group, the "importance" of the sender of the FIR message (however "importance" may be defined in this specific application), the frequency of decoder refresh points in the content, and others. However, FIR should not be used in a session which predominantly handles pre-coded content, as there is no encoder accessible that could react appropriately. Instead, usage of a transport level reliability mechanism is recommended.

The sender MUST consider congestion control as outlined in section 6, which MAY restrict its ability to send a decoder refresh point quickly.

Note: The relationship between the Picture Loss Indication and FIR is as follows. As discussed in section 6.3.1 of AVPF, a Picture Loss Indication informs the encoder about the loss of a picture and hence the likelihood of misalignment of the reference pictures in encoder and decoder. Such a scenario is normally related to losses in an ongoing connection. In point-to-point scenarios, and without the presence of advanced error resilience tools, one possible option an encoder has is to send a decoder refresh point. However, there are other options including ignoring the PLI, for example if only one receiver of many has sent a PLI or when the embedded stream redundancy is likely to clean up the reproduced picture within a reasonable amount of time. The FIR, in contrast, leaves a real-time encoder no choice but to send a decoder refresh point. It does not allow the encoder to take considerations such as the ones mentioned above into account.

Note: Mandating a maximum delay for completing the sending of a decoder refresh point would be desirable from an application viewpoint, but may be problematic from a congestion control point of view. The phrase "as soon as possible" as mentioned above appears to be a reasonable compromise.
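The difference between PLI and FIR described in the note above can be sketched in a few lines. The Python fragment below is illustrative only; the function name, its parameters, and the specific policy of ignoring a PLI that only one receiver of many has sent are assumptions chosen for the example (the text names that policy merely as one possible option). The FMT values are the ones listed in section 5.2.

```python
FMT_PLI = 1  # Picture Loss Indication (defined in AVPF)
FMT_FIR = 4  # Full Intra Request (defined in this memo)


def must_send_refresh(fmt: int, pli_senders: int, receivers: int) -> bool:
    """Decide whether a real-time encoder sends a decoder refresh point.

    Hypothetical policy: a PLI is honoured when there is a single receiver
    or when more than one receiver reported a loss; a FIR always results in
    a decoder refresh point.
    """
    if fmt == FMT_FIR:
        return True  # FIR leaves the encoder no choice
    if fmt == FMT_PLI:
        # The encoder may ignore a PLI sent by only one receiver of many.
        return receivers == 1 or pli_senders > 1
    return False  # other feedback types do not request a refresh here


print(must_send_refresh(FMT_FIR, 0, 50))  # True
print(must_send_refresh(FMT_PLI, 1, 50))  # False
```

Even when the function returns True, the transmission of the decoder refresh point remains subject to the congestion control considerations of section 6.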
FIR SHALL NOT be sent as a reaction to picture losses; it is RECOMMENDED to use PLI instead. FIR SHOULD be used only in situations where not sending a decoder refresh point would render the video unusable for the users.

Note: A typical example where sending FIR is adequate is when, in a multipoint conference, a new user joins the session and no regular decoder refresh point interval is established. Another example would be a video switching MCU that changes streams. Here, normally, the MCU issues a freeze picture request to the receiver(s), switches the streams, and issues a FIR to the new sender so as to force it to emit a decoder refresh point. The decoder refresh point normally includes a Freeze Picture Release, which restarts the rendering process of the receivers. Both techniques mentioned are commonly used in MCU-based multipoint conferences.

Other RTP payload specifications such as RFC 2032 [RFC2032] already define a feedback mechanism for certain codecs. An application supporting both schemes MUST use the feedback mechanism defined in this specification when sending feedback. For backward compatibility reasons, such an application SHOULD also be capable of receiving and reacting to the feedback scheme defined in the respective RTP payload format, if this is required by that payload format.

5.2.1.2. Message Format

FIR does not require parameters. Therefore, the length field MUST be 2, and there MUST NOT be any Feedback Control Information.

The semantics of this FB message is independent of the payload type.

5.2.1.3. Timing Rules

The timing follows the rules outlined in section 3 of [AVPF]. FIR MAY be used with early or immediate feedback.

5.2.1.4. Remarks

FIR messages typically trigger the sending of full intra or IDR pictures. Both are several times larger than predicted (inter) pictures.
Their size is independent of the time they are generated. In most environments, especially when employing bandwidth-limited links, the use of an intra picture implies an allowed delay that is a significant multiple of the typical frame duration. An example: if the sending frame rate is 10 fps, and an intra picture is assumed to be 10 times as big as an inter picture, then a full second of latency has to be accepted. In such an environment there is no need for a particularly short delay in sending the FIR message. Hence, waiting for the next possible time slot allowed by RTCP timing rules as per [AVPF] does not have a negative impact on the system performance.

5.2.2. Temporal-Spatial Tradeoff Request (TSTR)

The TSTR FB message is identified by PT=PSFB and FMT=5.

There MUST be exactly one TSTR contained in the FCI field.

5.2.2.1. Semantics

A decoder can suggest the use of a temporal-spatial tradeoff by sending a TSTR message to an encoder. If the encoder is capable of adjusting its temporal-spatial tradeoff, it SHOULD take the received TSTR message into account for future coded pictures. A value of 0 suggests a high spatial quality and a value of 31 suggests a high frame rate. The values from 0 to 31 indicate monotonically a desire for higher frame rate. Actual values do not correspond to precise values of spatial quality or frame rate.

5.2.2.2. Message Format

The Temporal-Spatial Tradeoff Request uses one additional FCI field, the content of which is depicted in figure 4. The length of the FB message MUST be set to 3.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Seq nr.    |                                     |  Index  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 4: Syntax of the TSTR

Seq. nr: Request sequence number.
Each feedback sending source (SSRC) has its own sequence number space. The sequence number SHALL be increased by 1 modulo 256 for each new request. A repetition SHALL NOT increase the sequence number.

Index: An integer value between 0 and 31 that indicates the relative tradeoff that is requested. An index value of 0 indicates the highest possible spatial quality, while 31 indicates the highest possible temporal resolution.

5.2.2.3. Timing Rules

The timing follows the rules outlined in section 3 of [AVPF]. This request message is not time critical and SHOULD be sent using regular RTCP timing.

5.2.2.4. Remarks

The term "spatial quality" does not necessarily refer to the resolution, measured by the number of pixels the reconstructed video is using. In fact, in most scenarios the video resolution will likely stay constant during the lifetime of a session. However, all video compression standards have means to adjust the spatial quality at a given resolution, normally referred to as Quantizer Parameter or QP. A numerically low QP results in a good reconstructed picture quality, whereas a numerically high QP yields a coarse picture. The typical reaction of an encoder to this request is to change its rate control parameters to use a lower frame rate and a numerically lower (on average) QP, or vice versa. The precise mapping of Index, frame rate, and QP is intentionally left open here, as it depends on factors such as the compression standard employed, spatial resolution, content, bit-rate, and many more.

5.2.3. Temporal-Spatial Tradeoff Acknowledgement (TSTA)

The TSTA FB message is identified by PT=PSFB and FMT=6.

There MUST be at least one TSTA in the FCI field.

5.2.3.1. Semantics

This feedback message is used to acknowledge the reception of a TSTR. It SHALL be sent for each TSTR targeted to this receiver, i.e. each TSTR received in which the "SSRC of media source" field contains the receiving entity's SSRC. The acknowledgement SHALL also be sent for repetitions received. If the request receiver has received TSTRs with several different sequence numbers from a single requestor, it MAY aggregate several acknowledgements in the same message by concatenating the FCI fields for each sequence number.

5.2.3.2. Message Format

The Temporal-Spatial Tradeoff Acknowledgement uses one additional FCI field, the content of which is depicted in figure 5. The length of the FB message MUST be set to 3.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Seq nr.    |                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 5: Syntax of the TSTA

Seq. nr: Request sequence number being acknowledged.

5.2.3.3. Timing Rules

The timing follows the rules outlined in section 3 of [AVPF]. This acknowledgement message is not time critical and SHOULD be sent using regular RTCP timing.

5.2.3.4. Remarks

5.2.4. Freeze Indication

The Freeze Indication FB message is identified by PT=PSFB and FMT=7.

5.2.4.1. Semantics

Upon reception of this message, the media receiver MUST continue decoding the media stream, but SHOULD stop rendering it, until one of the following conditions is met:

1. A media specific timeout occurs. Note that most video compression standards define such a timeout, usually around 5 seconds.

2. A media-specific freeze release signal is detected. Note that most video compression standards contain means, e.g. bits in the picture header or an SEI message, to signal a freeze release in-band.

3. A timeout of 30 seconds. Note: this timeout is included as a perhaps entirely redundant safety measure to fix problems resulting from non-compliant encoders.
The value of 30 seconds has been arbitrarily chosen to be significantly higher than all reasonable media timeouts.

5.2.4.2. Message Format

Freeze does not require parameters. Therefore, the length field MUST be 2, and there MUST NOT be any Feedback Control Information.

5.2.4.3. Timing Rules

The timing follows the rules outlined in section 3 of [AVPF]. This request message is time critical and SHOULD be sent using immediate or early RTCP timing if possible.

5.2.4.4. Remarks

6. Congestion Control

The correct application of the AVPF timing rules should prevent the network flooding by feedback messages. Hence, assuming a correct implementation, the RTCP channel cannot break its bit-rate commitment and introduce congestion.

The reception of some of the feedback messages modifies the behavior of the media senders or, more specifically, the media encoders. All of these modifications MUST only be performed within the bandwidth limits the applied congestion control provides. For example, when reacting to a FIR, the unusually high number of packets that form the decoder refresh point have to be paced in compliance with the congestion control algorithm, even if the user experience suffers from a slowly transmitted decoder refresh point.

A change of the Temporary Maximum Media Bit-rate value can only mitigate congestion, but not cause congestion. An increase of the value REQUIRES that any transmission up to that value be allowed by the used congestion control mechanism at the time of sending. A reduction of the value may result in a reduced transmission bit-rate thus reducing the risk for congestion.

7. Security Considerations

The defined messages have certain properties that have security implications. These must be addressed and taken into account by users of this protocol.
The defined signaling mechanism is sensitive to modification attacks that can result in session creation with sub-optimal configuration and, in the worst case, session rejection. To prevent this type of attack, authentication and integrity protection of the signaling is required.

Spoofing of the feedback messages defined in this specification can have the following implications:

a. Severely reduced media bit-rate due to false TMMBR messages that set the maximum to a possibly very low value.

b. Sending of TSTRs that result in a video quality different from the user's desire, rendering the session less useful.

c. The usage of Freeze to constantly freeze the receiver's video output, hence reducing the practical frame rate of the video to (worst case) 1 frame every 30 seconds. This attack only requires sending a freeze message every 30 seconds to each of the receivers.

To prevent these attacks, authentication and integrity protection of the feedback messages need to be applied. This can be accomplished against group external threats using SRTP [SRTP]. In the MCU cases, separate security contexts can be applied between the MCU and the participants, thus protecting other MCU users from a misbehaving participant.

8. SDP Definitions

Section 4 of [AVPF] defines new SDP attributes that are used for the capability exchange of the AVPF commands and indications, such as Reference Picture Selection, Picture Loss Indication etc. The defined SDP attribute is known as rtcp-fb and its ABNF is described in section 4.2 of [AVPF]. In this section we extend the rtcp-fb attribute to include the commands and indications that are described in this document for the codec control protocol. We also discuss the Offer/Answer implications for the codec control commands and indications.

8.1. Extension of rtcp-fb attribute

As described in [AVPF], the rtcp-fb attribute is defined to indicate the capability of using RTCP feedback. The rtcp-fb attribute MUST only be used as a media level attribute and MUST NOT be provided at session level. All the rules described in [AVPF] for the rtcp-fb attribute relating to payload type and to multiple rtcp-fb attributes in a session description hold for the new feedback messages for codec control defined in this document.

The ABNF for the rtcp-fb attribute as defined in [AVPF] is

   rtcp-fb-syntax = "a=rtcp-fb:" rtcp-fb-pt SP rtcp-fb-val CRLF

where rtcp-fb-pt is the payload type and rtcp-fb-val defines the type of the feedback message such as ack, nack, trr-int and rtcp-fb-id. For example, to indicate the support of feedback of picture loss indication, the sender declares the following in SDP:

   v=0
   o=alice 3203093520 3203093520 IN IP4 host.example.com
   s=Media with feedback
   t=0 0
   c=IN IP4 host.example.com
   m=video 49170 RTP/AVPF 98
   a=rtpmap:98 H263-1998/90000
   a=rtcp-fb:98 nack pli

In this document we define a new feedback value type called "ccci", which indicates the support of codec control commands using RTCP feedback messages. The "ccci" feedback value should be used with parameters, which indicate which codec commands the session would use. In this draft we define four parameters, which can be used with the ccci feedback value type:

   o "fir" indicates the support of Full Intra Request.

   o "tmmbr" indicates the support of Temporary Maximum Media Bit-rate.

   o "tstr" indicates the support of Temporal Spatial Tradeoff Request.

   o "frz" indicates the support of Freeze Indication.

In the ABNF for rtcp-fb-val defined in [AVPF], there is a placeholder called rtcp-fb-id to define new feedback types.
The ccci is defined as a new feedback type in this document and the ABNF for the parameters for ccci is defined here (please refer to section 4.2 of [AVPF] for the complete ABNF syntax).

   rtcp-fb-param      = SP "app"
                      / SP rtcp-fb-ccci-param
                      / ; empty

   rtcp-fb-ccci-param = 1*(ccci-params)

   ccci-params        = "fir"   ; Full Intra Request
                      / "tmmbr" ; Temporary Maximum Media Bit-rate
                      / "tstr"  ; Temporal Spatial Tradeoff
                      / "frz"   ; Freeze Indication
                      / token   ; for future commands/indications

8.2. Offer-Answer

The Offer/Answer [RFC3264] implications for codec control protocol feedback messages are similar to those described in [AVPF]. The offerer MAY indicate the capability to support selected codec commands and indications. The answerer MUST remove all ccci parameters which it does not understand or does not wish to use in this particular media session. The answerer MUST NOT add new ccci parameters in addition to what has been offered. The answer is binding for the media session and both offerer and answerer MUST only use feedback messages negotiated in this way.

8.3. Examples

Example 1: The following SDP describes a point-to-point video call with H.263, with the originator of the call declaring its capability to support some codec control messages. The SDP is carried in a high level signaling protocol like SIP.

   v=0
   o=alice 3203093520 3203093520 IN IP4 host.example.com
   s=Point-to-Point call
   c=IN IP4 172.11.1.124
   m=audio 49170 RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   m=video 51372 RTP/AVPF 98
   a=rtpmap:98 H263-1998/90000
   a=rtcp-fb:98 ccci tstr fir

In the above example, when the sender receives a TSTR message from the remote party, it can adjust the tradeoff as indicated in the RTCP TSTA feedback message.

Example 2: The following SDP describes a SIP end point joining a video MCU that is hosting a multiparty video conferencing session.
The participant supports only the FIR (Full Intra Request) codec
control command and declares this in its session description. The
video MCU can send an FIR RTCP feedback message to this end point
when it needs to send this participant's video to other participants
of the conference.

   v=0
   o=alice 3203093520 3203093520 IN IP4 host.example.com
   s=Multiparty Video Call
   c=IN IP4 172.11.1.124
   m=audio 49170 RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   m=video 51372 RTP/AVPF 98
   a=rtpmap:98 H263-1998/90000
   a=rtcp-fb:98 ccci fir

When the video MCU decides to route the video of this participant, it
sends an RTCP FIR feedback message. Upon receiving this feedback
message, the end point is mandated to generate a full intra picture.

Example 3: The following example describes the Offer/Answer
implications for the codec control messages. The offerer wishes to
support all the commands and indications of the codec control
messages. The offered SDP is

   -------------> Offer

   v=0
   o=alice 3203093520 3203093520 IN IP4 host.example.com
   s=Offer/Answer
   c=IN IP4 172.11.1.124
   m=audio 49170 RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   m=video 51372 RTP/AVPF 98
   a=rtpmap:98 H263-1998/90000
   a=rtcp-fb:98 ccci tstr fir frz tmmbr

The answerer only wishes to support the FIR and TSTR messages as
codec control messages, and the answerer SDP is

   <---------------- Answer

   v=0
   o=alice 3203093520 3203093524 IN IP4 host.anywhere.com
   s=Offer/Answer
   c=IN IP4 189.13.1.37
   m=audio 47190 RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   m=video 53273 RTP/AVPF 98
   a=rtpmap:98 H263-1998/90000
   a=rtcp-fb:98 ccci fir tstr

9. IANA Considerations

The new value of ccci for the rtcp-fb attribute needs to be
registered with IANA.

   Value name:  ccci
   Long name:   Codec Control Commands and Indications
   Reference:   RFC XXXX

For use with "ccci", the following values also need to be registered.

   Value name:  fir
   Long name:   Full Intra Request Command
   Usable with: ccci
   Reference:   RFC XXXX

   Value name:  tmmbr
   Long name:   Temporary Maximum Media Bit-rate
   Usable with: ccci
   Reference:   RFC XXXX

   Value name:  tstr
   Long name:   Temporal Spatial Trade Off
   Usable with: ccci
   Reference:   RFC XXXX

   Value name:  frz
   Long name:   Freeze Indication
   Usable with: ccci
   Reference:   RFC XXXX

10. Open Issues

As this draft is under development, certain open issues remain to be
resolved. Please provide feedback on the following open issues:

1. Freeze Picture: Because a late freeze indication that is delivered
   after a video packet from the new stream risks severely reducing
   media quality, the following open issues exist:

   a) Should another method of delivery be used, such as piggybacking
      the indication on all video packets?

   b) Should it be supported at all, given that alternatives exist,
      although they are more complex?

2. General concept of two-way Request/Ack in RTCP. Desirable?

3. Request/Ack mechanism for Temporal/Spatial Tradeoff. Is it
   necessary?

4. Should semantic acknowledgement of TMMBR and TSTR be defined? By
   semantic we mean that the request receiver indicates whether, and
   to what extent, it will honor the information received.

5. The Temporary Maximum Media Bit-rate Request (TMMBR) could be used
   in cases other than with MCUs to improve behavior when the bit-rate
   is reduced due to receiver detectable events. Should this be
   pursued?

6. Which feedback messages should allow not only specific targets but
   also all receivers or senders?

7. For the TSTA, should it be possible to indicate both positive and
   negative acknowledgement? Or should support from an end-point only
   be negotiated at session setup time?

8. Should Freeze Indication be implemented using a protocol other
   than RTCP?

9. Is 30 seconds a reasonable timeout for Freeze Picture?
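
As an informal illustration of the Offer/Answer rule in Section 8.2
(it is not part of this specification), the following Python sketch
shows how an answerer might derive its ccci parameters from an offer.
The function names and the SUPPORTED set are hypothetical. The sketch
assumes that ccci parameters are separated by spaces, as in the
examples of Section 8.3, and that an answerer with no parameters left
simply omits the attribute; the draft does not specify that case.

```python
from typing import Dict, List, Optional, Set

# Hypothetical set of commands/indications this answerer is willing to use.
SUPPORTED: Set[str] = {"fir", "tstr"}


def parse_ccci(sdp: str) -> Dict[str, List[str]]:
    """Map each payload type to the ccci parameters found in a=rtcp-fb lines."""
    result: Dict[str, List[str]] = {}
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("a=rtcp-fb:"):
            continue
        fields = line[len("a=rtcp-fb:"):].split()
        # fields[0] is the payload type, fields[1] the feedback value type.
        if len(fields) >= 2 and fields[1] == "ccci":
            result.setdefault(fields[0], []).extend(fields[2:])
    return result


def answer_ccci(offered: List[str], supported: Set[str]) -> List[str]:
    """Keep only offered parameters the answerer supports; never add new ones."""
    return [param for param in offered if param in supported]


def format_ccci(payload_type: str, params: List[str]) -> Optional[str]:
    """Build the answer's attribute line, or None if nothing was agreed.

    Omitting the line when no parameter remains is an assumption of this sketch.
    """
    if not params:
        return None
    return "a=rtcp-fb:{} ccci {}".format(payload_type, " ".join(params))


if __name__ == "__main__":
    offer = (
        "m=video 51372 RTP/AVPF 98\r\n"
        "a=rtpmap:98 H263-1998/90000\r\n"
        "a=rtcp-fb:98 ccci tstr fir frz tmmbr\r\n"
    )
    for pt, offered in parse_ccci(offer).items():
        print(format_ccci(pt, answer_ccci(offered, SUPPORTED)))
```

Applied to the offer of Example 3, the sketch keeps "tstr" and "fir"
and drops "frz" and "tmmbr". It preserves the order used in the offer,
which is a choice of the sketch; the draft does not define an
ordering. Unknown tokens reserved for future commands are dropped in
the same way, because they are not in the supported set.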

11. Acknowledgements

The authors would like to thank Andrea Basso, Orit Levin, and Nermeen
Ismail for their work on the requirement and discussion draft
[Basso]. We further thank Roni Even and Joerg Ott for their valuable
advice.

12. References

12.1. Normative references

[AVPF]    draft-ietf-avt-rtcp-feedback-11.txt

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
          Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
          Jacobson, "RTP: A Transport Protocol for Real-Time
          Applications", STD 64, RFC 3550, July 2003.

[RFC2032] Turletti, T. and C. Huitema, "RTP Payload Format for H.261
          Video Streams", RFC 2032, October 1996.

[RFC2327] Handley, M. and V. Jacobson, "SDP: Session Description
          Protocol", RFC 2327, April 1998.

[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
          Video Conferences with Minimal Control", STD 65, RFC 3551,
          July 2003.

[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
          with Session Description Protocol (SDP)", RFC 3264, June
          2002.

12.2. Informative references

[Basso]   Basso, A., et al., "Requirements for transport of video
          control commands", draft-basso-avt-videoconreq-02.txt,
          expired Internet Draft, October 2004.

[AVC]     Joint Video Team of ITU-T and ISO/IEC JTC 1, ITU-T Draft
          Recommendation and Final Draft International Standard of
          Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC
          14496-10 AVC), Joint Video Team (JVT) of ISO/IEC MPEG and
          ITU-T VCEG, JVT-G050, March 2003.

[SRTP]    Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
          Norrman, "The Secure Real-time Transport Protocol (SRTP)",
          RFC 3711, March 2004.

[Gradual] A reference on gradual decoder refresh (or H.264 spec)

[DCCP]    Kohler, E., et al., "Datagram Congestion Control Protocol
          (DCCP)", draft-ietf-dccp-spec-11.txt, March 2005.

[H.264]   ITU-T Rec.
          H.264 (2005)

13. Authors' Addresses

Stephan Wenger
Nokia Corporation
P.O. Box 100
FIN-33721 Tampere
FINLAND

Phone: +358-50-486-0637
EMail: Stephan.Wenger@nokia.com

Umesh Chandra
Nokia Research Center
6000 Connection Drive
Irving, Texas 75063
USA

Phone: +1-972-894-6017
EMail: Umesh.Chandra@nokia.com

Magnus Westerlund
Ericsson Research
Ericsson AB
SE-164 80 Stockholm, SWEDEN

Phone: +46 8 7190000
EMail: magnus.westerlund@ericsson.com

Full Copyright Statement

Copyright (C) The Internet Society (2005).

This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.

This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.

Acknowledgment

Funding for the RFC Editor function is currently provided by the
Internet Society.

RFC Editor Considerations

The RFC editor is requested to replace all occurrences of XXXX with
the RFC number this document receives.