Internet DRAFT - draft-wenger-avt-rtcp-feedback


       INTERNET-DRAFT                                            Stephan Wenger
       draft-wenger-avt-rtcp-feedback-02.txt                          TU Berlin
                                                                      Joerg Ott
                                                        Universitaet Bremen TZI

                                                                  2 March, 2001
                                                         Expires September 2001

                   RTCP-based Feedback: Concepts and Message Timing Rules

       Status of this Memo

       This document is an Internet-Draft and is in full conformance with all
       provisions of Section 10 of RFC 2026.  Internet-Drafts are working
       documents of the Internet Engineering Task Force (IETF), its areas, and
       its working groups.  Note that other groups may also distribute working
       documents as Internet-Drafts.

       Internet-Drafts are draft documents valid for a maximum of six months
       and may be updated, replaced, or obsoleted by other documents at any
       time.  It is inappropriate to use Internet-Drafts as reference material
       or to cite them other than as "work in progress."

       The list of current Internet-Drafts can be accessed at

       The list of Internet-Draft Shadow Directories can be accessed at


          Real-time media streams are not resilient against packet losses. RTP
          [1] provides all the necessary mechanisms to restore ordering and
          timing to properly reproduce a media stream at the recipient.  RTP
          also provides continuous feedback about the overall reception quality
          from all receivers -- thereby allowing the sender(s) in the mid-term
          (in the order of several seconds to minutes) to adapt their coding
          scheme and transmission behavior to the observed network QoS.
          However, except for a few payload specific mechanisms [2], RTP makes
          no provision for timely feedback that would allow a sender to repair
          the media stream immediately: through retransmissions, retro-active
          FEC, or media-specific mechanisms such as reference picture
          selection.

          This document specifies a modification to the algorithm for
          scheduling RTCP packets in order to allow occasional timely feedback
          to events observed by a receiver (such as lost packets).  The message
          format for RTCP-based feedback is defined in a companion document [7].

       Wenger/Ott              Expires September 2001                [Page 1]
       Internet Draft                                       24 November, 2000

       1. Introduction

          Real-time media streams are not resilient against packet losses. RTP
          [1] provides all the necessary mechanisms to restore ordering and
          timing present at the sender to properly reproduce a media stream at
          a recipient.  RTP also provides continuous feedback about the overall
          reception quality from all receivers -- thereby allowing the
          sender(s) in the mid-term (in the order of several seconds to
          minutes) to adapt their coding scheme and transmission behavior to
          the observed network QoS.  However, except for a few payload specific
          mechanisms [2], RTP makes no provision for timely feedback that would
          allow a sender to repair the media stream immediately: through
          retransmissions, retro-active FEC, or media-specific mechanisms such
          as reference picture selection.

          Current mechanisms available with RTP to improve error resilience
          include audio redundancy coding [3], video redundancy coding [4],
          RTP-level FEC [5], and general considerations on more robust media
          stream transmission [6].  Particularly in small groups, however,
          virtually all types of real-time media streams could benefit from a
          mechanism that would enable a sender to perform media stream repair
          -- including but not limited to audio, video, DTMF, and text chat
          streams.  In networks with acceptable round-trip times but scarce
          bandwidth, occasional retransmissions may be much preferred over
          continuous transmission of redundant information.

          For example, predictive video coding is not loss resilient.  Any loss
          of coded data leads to annoying artifacts not only in the reproduced
          picture in which the loss occurred, but also in subsequent pictures.
          Error resilience can be achieved by spending bits to convey redundant
          information using source coding based mechanisms or transport based
          mechanisms.  This can be done without the use of any feedback between
          the decoder(s) and the encoder.  Similar considerations apply to
          protecting e.g. DTMF (and other tones) carried in an RTP stream [9].

          Alternatively, where applicable, receivers can inform the sender
          through a feedback channel about a loss situation, and the sender can
          react accordingly.  This approach provides better media quality and
          is more efficient with respect to the bandwidth used by the sender to
          achieve a given media quality.  However, using feedback mechanisms is
          limited to certain application scenarios identified by encoder
          characteristics, delay constraints, and/or the number of recipients.

          This memo specifies a profile based upon [1] and [10] with enhanced
          rules for sending receiver reports to support feedback transmission
          reflecting the need for very low delay for conveying feedback, which
          is necessary to make them efficient (or workable at all).  Immediate
          Feedback messages (FB messages) and Early Receiver Reports (Early
          RRs) and algorithms are specified that allow for low delay in small
          multicast groups, but prevent network flooding in larger ones.
          Special consideration is given to point-to-point scenarios.

          In addition, this memo gives some consideration to specific
          application scenarios and their respective feedback requirements, at
          the moment focusing on predictive video coding.

          A companion document [7] discusses various types of general purpose
          feedback information (also allowing for extensions specific to
          certain media payload) and defines an RTCP packet format to transmit
          FBs in an RTP environment.  It can be used in conjunction with all
          payload specifications for predictive video coding schemes currently
          available for RTP.

       2. Motivation

          2.1 Example: Predictive Video Coding

          2.1.1 Video Encoder-decoder synchronicity

          Most current video coding schemes for compressed video, such as
          ITU-T H.261 and H.263 and ISO/IEC MPEG-1/2/4, employ a mechanism
          known as Inter Picture Prediction.  Each picture is divided into
          macroblocks of uniform size.   For each macroblock, one or more
          motion vectors may be identified and transmitted.  The residual
          signal after motion compensation is DCT-transformed, quantized,
          entropy coded, and transmitted as well.  The encoder reconstructs,
          based on this information, a so-called reference picture, which is
          used to perform the motion compensation and residual signal coding
          steps for the subsequent picture.  Since the reference picture is
          generated using only such information that is also available at the
          decoder, the reference picture is identical to the reconstructed
          picture at the decoder.  Having identical reference pictures at the
          encoder and decoder is referred to as encoder-decoder synchronicity.

          Whenever data is damaged or lost on the way between the encoder and
          the decoder, the reconstructed picture at the decoder is no longer
          identical to the encoder's reference picture -- the encoder-decoder
          synchronicity is lost.

          Any loss of the encoder-decoder synchronicity results in annoying
          artifacts at the decoder.  Because the prediction of subsequent
          pictures in the decoder is based on a damaged reference picture, the
          annoying artifacts are present not only in the picture in which the
          loss occurred; they propagate to all subsequent pictures, until,
          through source coding based mechanisms, the encoder-decoder
          synchronicity is restored.  Therefore, the goal of systems employing
          predictive video coding in a lossy environment must be to keep the
          encoder-decoder synchronicity, or, if this is not possible, to regain
          that synchronicity as quickly as possible.

          2.1.2. Non-feedback based mechanisms

          Avoiding the loss of the encoder-decoder synchronicity corresponds to
          avoiding the loss of coded picture data.  Such a task can be
          performed on the transport layer.  In RTP environments, the use of
          packet-based FEC is a good example of such a technique.  (The use of
          TCP or reliable multicast as the transport for media streams would be
          an even better one but is inappropriate for low-delay (interactive)
          real-time systems.)  FEC schemes, interleaving, and other means for
          repairing real-time media streams may also add delay and significant
          bit rate overhead without being able to guarantee compensation of
          virtually all packet losses.

          Once the encoder-decoder synchronicity is lost, only source coding
          oriented mechanisms can help to regain it.  One common way is to send
          a non-predictively coded picture (known as an Intra picture).  Intra
          pictures have the disadvantage of being several times bigger than
          predictively coded pictures (Inter pictures).  Therefore, sending
          Intra pictures has negative implications both on the bandwidth and
          (in bandwidth limited environments) delay.  Another way is to use
          Intra macroblock refresh.  Here, certain parts of the picture (those
          affected by a packet loss) are coded non-predictively in order to
          resynchronize the encoder and decoder over time.  Intra macroblock
          refresh has better delay characteristics than full Intra pictures
          because the picture size can be kept constant, but is less efficient
          in terms of bit rate/distortion than full Intra pictures.  More
          sophisticated means such as Reference Picture Selection (RPS) are
          also available in modern video coding standards.

          Systems not employing feedback channels may use any combination of
          the mechanisms described above to add error resilience -- at the cost
          of added bit rate and, sometimes, added delay.  The number of
          additional bits spent for error resilience can be adapted using the
          long-term packet loss rate information in the RTCP receiver reports.
          But, even when using such adaptive means, it is still likely that
          systems spend many more bits than theoretically necessary to achieve
          error resilience in order to be on the safe side.  Plus, as regular
          RTCP feedback is aimed at longer terms, reactivity to sudden losses
          is limited.  In all practical applications today this means that
          fewer bits are available for non-redundant picture data, and hence
          the overall picture quality suffers.

          2.1.3 Feedback based systems

          Feedback-based systems try to avoid spending too many bits for
          redundant information by informing the encoder about a loss situation
          at the decoder(s).  The encoder can then react accordingly and spend
          redundant bits only when needed, possibly only for the part of the
          picture that was affected by the loss -- thereby reducing the number
          of redundant bits and leaving more bits for useful information.  As a
          result, a higher reproduced picture quality can generally be expected
          when feedback channels are available.

          As in section 2.1.2, transport-based and source coding based
          mechanisms that react to loss situations reported by feedback can be
          distinguished.

          Transport-based systems employing feedback react in a media-unaware
          manner by retransmitting lost packets.  TCP is a good example of a
          protocol following such a scheme.  Transport-based feedback in
          real-time and/or multicast environments is a complex matter and the
          subject of much engineering and research inside and outside of the
          IETF.  This specification is not concerned with pure transport-based
          feedback.

          Source coding based mechanisms may react upon the arrival of a
          feedback message indicating a loss situation by adding bits that
          restore, or at least make an effort to restore, the encoder-decoder
          synchronicity.  This process has to be performed by a real-time
          encoder.  However, schemes have been reported that allow the use of
          feedback also for non-real-time encoders by storing multiple
          representations of the same data (e.g. Inter and Intra coded), and
          dynamically switching between those representations.

          Several types of feedback messages, called Feedback Messages or FB
          messages, can be defined for such a case.  An FB message can be as
          simple as a Boolean condition, indicating for example the loss of a
          full picture (and, therefore, the need for a full Intra picture
          transmission).  Other feedback messages may contain more complex
          information such as information about the damage of a spatial region
          of the picture.  A special form consists of a message the format and
          semantics of which are not known at the transport level, because they
          are defined in the video codec standards.

          2.2 Feedback Messages

          Most FB messages contain negative acknowledgement information,
          indicating an erroneous situation at the decoder.  In others, the
          nature of the acknowledgement (positive, negative, or both) is part
          of the feedback message itself.  When used in multicast environments,
          positive acknowledgements must not be used.

          This document assumes that feedback messages are transmitted using
          RTCP packets.  RTCP messages from the receivers to the sender cannot
          be sent at any possible time, in order to prevent traffic explosion
          in case of large multicast groups.  Instead, the bit rate for all
          RTCP messages of all receivers together has to obey a maximum
          fraction of the total RTP session bit rate, yielding a very limited
          bit rate budget for a single receiver when having a large multicast
          group.  This, in turn, leads to an increased average delay when the
          size of the receiving multicast group grows.  (See section 6 of [1]
          for details.)
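
          The growth of the reporting delay with the group size can be
          illustrated by a minimal sketch.  It assumes the commonly used
          defaults of RTP [1] (5% of the session bandwidth for RTCP, 75% of
          that share for receivers, and a five-second minimum interval) and
          ignores randomization and reconsideration.  The function name, the
          128 kbit/s session bandwidth, and the 800-bit average RTCP packet
          size are hypothetical values chosen for illustration only.

```python
# Illustrative sketch only; simplified from the interval rules of RTP [1].
RTCP_FRACTION = 0.05     # assumed share of session bandwidth used for RTCP
RECEIVER_SHARE = 0.75    # assumed part of the RTCP share left to receivers
MIN_INTERVAL_S = 5.0     # assumed minimum interval between regular reports


def receiver_report_interval(session_bw_bps, receivers, avg_rtcp_packet_bits):
    """Return the deterministic RTCP interval (seconds) for one receiver."""
    receiver_bw_bps = session_bw_bps * RTCP_FRACTION * RECEIVER_SHARE
    interval = receivers * avg_rtcp_packet_bits / receiver_bw_bps
    return max(interval, MIN_INTERVAL_S)


if __name__ == "__main__":
    # Hypothetical session: 128 kbit/s media, 800-bit average RTCP packets.
    for n in (2, 10, 100, 1000):
        t = receiver_report_interval(128_000, n, 800)
        print(f"{n:5d} receivers -> {t:7.1f} s between reports")
```

          Under these assumptions the interval stays at the five-second
          minimum for small groups and grows linearly with the number of
          receivers once the shared RTCP budget becomes the limiting factor.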

          This specification defines an algorithm that adheres to the bit rate
          limitations for the feedback channel on the long term, but allows
          short-term overdrafting for any receiver (but not all of them
          simultaneously).  Thus, the algorithm allows for better real-time
          performance than the one specified in [1].  Traffic explosion in
          cases in which many receivers identify picture damage
          simultaneously is prevented by dithering.

          As this specification assumes a sender that has full control over its
          transmission bit rate (e.g. a real-time encoder), there is no scaling
          problem on the forward channel.  Any reaction to negative feedback
          generates additional bits, which have to be conveyed, but these are
          taken from the sender's total bit rate budget.  The encoder can take
          this into account by, for example, changing the encoding mode, packet
          size, and so forth.  The sender is also free to simply ignore
          feedback messages.  Adjusting the tradeoff between the reproduced
          media quality of all receivers of a multicast group and the amount of
          additional repair traffic is a media-dependent, very complex task and
          is not covered in this specification.

          Finally, frequent RTCP-based feedback messages may provide additional
          input to the senders' congestion control algorithms and thus improve
          their reactivity towards network congestion.

          Feedback messages as well as sender and receiver behavior are to be
          specified in separate documents (such as [7]).  Such specifications
          need to consider that, frequently, packet loss is an indication of
          network congestion and thus define mechanisms for media-specific
          congestion control in the presence of feedback as defined in this

          2.3. Applications and Relationships to other Standards

          This specification is based on RTCP, which implies its use in an RTP
          environment.  RTP itself is used in a variety of systems such as in
          SIP- or H.323-based multimedia conferencing/telephony, SAP-announced
          Mbone conferences, and RTSP-based media streaming.

          As for the video codecs, there is currently a small set of standards
          that are, for the purpose of this discussion, roughly comparable.
          Many mechanisms for regaining encoder-decoder synchronicity are
          applicable to all video codecs.  Others require certain tools (such
          as Reference Picture Selection, aka NEWPRED) that are available only
          in certain versions of the standards, and/or optional tools whose use
          must be negotiated prior to being used.

          A few RTP payload specifications such as RFC 2032 [2] already define
          a feedback mechanism for some of the coding algorithms considered in
          this specification.  An application capable of performing both
          schemes MUST use the feedback mechanism defined in this
          specification, although, for backward compatibility reasons, it MUST
          also be capable of conforming to the feedback scheme defined in the
          respective RTP payload format, if this is required by that payload

          Also, audio, DTMF, and text streams could benefit from more immediate
          feedback even though the redundancy payload formats work well for
          these media.

          All kinds of non-interactive media streams (such as RTSP-controlled
          media streaming applications) could benefit significantly because,
          without interactivity, more time is available for media repair.

          2.4 Remarks on the size of the multicast group

          This specification prevents traffic explosion on the feedback channel
          in much the same way as RTP does, with the exception of allowing
          individual receivers to overdraft their bit rate budget from time to
          time.  This is necessary in order to allow for low delay, which is
          needed by the algorithms reacting to Feedback messages.

          This scaling, however, limits the usefulness of this mechanism in
          multicast groups from a certain size upwards (where the size
          threshold depends on a number of parameters including loss rate,
          frame rate, number of packets per frame, and session bandwidth).  The
          maximum size of the multicast group is soft and also depends on
          application requirements and is therefore not specified here.
          Considerations on the multicast group sizes will be presented in
          section 3.5.

          2.5 Terminology

          The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
          "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
          document are to be interpreted as described in RFC 2119 [8].

       3. Low delay RTCP Feedback

          Two components constitute RTCP-based feedback as described in this

          .  Status reports are contained in SR/RR messages and are transmitted
             at regular intervals as part of compound RTCP packets (which also
             include SDES and possibly other messages); these status reports
             provide an overall indication for the recent reception quality of a
             media stream.  RTP [1] defines rules for the transmission of these
             status reports.

          .  Feedback messages as defined in a companion document [7] that
             indicate loss or reception of particular pieces of a media stream
             (or provide some other form of rather immediate feedback on the
             data received).  Rules for the transmission of feedback messages
             are newly introduced in this memo.

          As discussed in [7], RTCP Feedback (FB) messages are just another
          RTCP message type.  Thus multiple FB messages may be combined in a
          single RTCP packet.  FB messages may be sent in full compound RTCP
          packets along with SR/RR, SDES, and other RTCP messages.  Or they may
          be transmitted in minimal compound RTCP FB packets (which, to reduce
          the message size, contain only the RR/SR and, if necessary, an
          encryption prefix in addition to the FB messages).  RTCP packets that
          do not contain FB messages are referred to as non-FB RTCP packets.

          3.1 Algorithm Outline

          FB messages are part of the RTCP control streams and are thus subject
          to the same bandwidth constraints as other RTCP traffic.  This means
          in particular that it may not be possible to report a packet loss at
          a receiver immediately back to the sender.  However, the value of
          feedback given to a sender typically decreases over time -- in terms
          of the media quality as perceived by the user at the receiving end
          and/or the cost required to achieve media stream repair.

          RTP [1] specifies rules for when compound RTCP packets should be
          sent.  This specification modifies those rules in order to allow
          applications to report media loss or reception events in a timely
          manner, since most algorithms that use FB messages are very sensitive
          to feedback timing.  See section 5 and following for a discussion of
          FB messages and the impact of delay on the performance of these FB
          types.

          The modified algorithm can be outlined as follows: Normally, when no
          FB messages have to be conveyed, compound RTCP packets are sent
          following the rules of RTP [1].  If a receiver detects the need for
          an FB message, the receiver first checks whether it has already seen
          a corresponding FB message from any other receiver (which it can do
          with all FB messages that are transmitted via multicast; for unicast
          sessions, there is no such delay).  If this is the case then the
          receiver refrains from sending the FB message, and continues to
          follow the regular RTCP sending schedule.  If the receiver has not
          yet seen a similar FB message from any other receiver, it checks
          whether it has recently exceeded its RTCP bit rate budget to transmit
          another FB message (without waiting for its regularly scheduled RTCP
          transmission time).  Only if this is not the case does it send the FB
          message, after waiting a short, random dithering interval (in the
          case of multicast).
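
          The two checks of this outline can be summarized in a minimal
          sketch.  The names Action and decide_feedback are hypothetical, and
          falling back to the regular RTCP schedule when the budget has
          recently been exceeded is an assumption of the sketch; the outline
          only states that no early FB message is sent in that case.

```python
from enum import Enum


class Action(Enum):
    SUPPRESS = "suppress"          # a similar FB message was already seen
    WAIT_FOR_REGULAR = "regular"   # assumption: fall back to regular RTCP
    SEND_EARLY = "early"           # send after a short random dither


def decide_feedback(seen_similar_fb, recently_exceeded_budget):
    """Map the two checks of the outline to an action (hypothetical names)."""
    if seen_similar_fb:
        # Another receiver has reported the event; stay on the regular schedule.
        return Action.SUPPRESS
    if recently_exceeded_budget:
        # No early transmission is permitted at this time.
        return Action.WAIT_FOR_REGULAR
    return Action.SEND_EARLY
```

          For example, decide_feedback(False, False) yields
          Action.SEND_EARLY, whereas a receiver that has already seen a
          similar FB message always obtains Action.SUPPRESS.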

          FB messages are sent as part of minimal compound RTCP packets.  Full
          compound RTCP packets are interspersed as per [1] at regular
          intervals of at least five seconds.

          3.2 Modes of Operation

          RTCP-based feedback may operate in one of three modes (figure 1):

          a) Immediate feedback mode: the group size is below a certain
             threshold (the FB threshold) which gives each receiving party
             sufficient bandwidth to transmit the feedback traffic for the
             intended purpose.  This means that each receiver has enough
             bandwidth to report every event it is expected to report by means
             of a virtually "immediate" Early RTCP packet.

             The group size threshold is a function of a number of parameters
             including (but not necessarily limited to) the type of feedback
             used (e.g. ACK vs. NACK), bandwidth, packet rate, packet loss
             probability, media type, codec, and -- again depending on the type
             of FB used -- the (worst case or observed) frequency of events to
             report (e.g. frame received, packet lost).

             A special case of this is the ACK mode (where positive
             acknowledgements are used to confirm reception of data) which is
             restricted to point-to-point communications.

          b) In Early RTCP mode, the group size and other parameters no longer
             allow each receiver to react to each event that would be worth
             reporting (or would need to be reported).  But feedback can still
             be given sufficiently often to allow the sender to adapt the media
             stream and thereby increase the overall reproduced media quality.

          c) From some group size upwards, it is no longer useful to provide
             feedback from individual receivers at all -- because of the time
             scale in which the feedback could be provided and/or because in
             large groups the sender(s) have no chance to react to individual
             feedback anymore.

          As the feedback algorithm described in this memo scales, there is no
          need for an agreement on the precise values of the respective
          "thresholds" within the group.  Hence the borders between all these
          modes are fluid.

            :<- - - -  NACK feedback - - - ->//
            :   Immediate   ||
            : Feedback mode ||Early RTCP mode   Regular RTCP mode
            :               ||
           -+---------------||---------------//------------------> group size
            2               ||
             Application-specific FB Threshold
                = f(rate,loss,codec,...)

          Figure 1: Modes of operation

          The respective thresholds depend on a number of technical parameters
          (of the codec, the transport, the feedback used, etc.) but also on
          the respective application scenarios.  Section 3.5 provides some
          useful hints (but no complete precise calculations) on estimating
          these thresholds.

          3.3 Definitions

          a) Let the media stream be transmitted at a (roughly) constant packet
             rate f (in packets per second).  This results in an average
             inter-packet interval of tau=1/f.

          b) Let T_rtt be the maximum round trip time as measured by RTCP
             (if available to the receiver).  Note that this may be asymmetric.

          c) Let t_rr and t_(rr-1) be the time for the next (last) scheduled
             RTCP RR transmission calculated prior to reconsideration.
             Let T_rr + t_(rr-1) = t_rr.  (In RTP [1] these are termed tp, tn,

          d) Let t_e be the time for which a feedback packet is scheduled.

          e) Let T_dither_max be the maximum interval for which an RTCP
             feedback packet may be additionally delayed (to prevent

          f) Let T_fd be the delay for the feedback message that a certain
             packet P caused to return to the sender after reception of P.

          g) Let S be the number of active senders in the RTP session.

          h) Let N be the current estimate of the number of receivers in the
             RTP session.

          The feedback situation for an event to report at a receiver is
          depicted in figure 2 below.  At time t0, such an event (e.g. a packet
          loss) is detected at the receiver.  The receiver decides -- based upon
          current T_rtt, group size, and other (application-specific)
          parameters -- that a feedback message shall be sent back to the
          sender.

          To avoid an implosion of immediate feedback packets, the receiver
          delays transmission of the compound feedback packet by a random
          amount T_fd (with the random number evenly distributed in the
          interval [0, T_dither_max]).  Transmission of the compound RTCP packet
          is then scheduled for t_e = t0 + T_fd.
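
          The dithering step can be sketched as follows.  This is a minimal
          illustration, assuming times expressed in seconds; the function
          name schedule_early_feedback and the optional rng parameter (which
          makes the random source replaceable for testing) are hypothetical
          and not part of this specification.

```python
import random


def schedule_early_feedback(t0, t_dither_max, rng=random):
    """Return (T_fd, t_e) with T_fd drawn uniformly from [0, T_dither_max]."""
    if t_dither_max < 0:
        raise ValueError("T_dither_max must be non-negative")
    t_fd = rng.uniform(0.0, t_dither_max)   # random dithering delay
    t_e = t0 + t_fd                         # scheduled transmission time
    return t_fd, t_e
```

          With T_dither_max = 0 the packet is scheduled for t_e = t0, that
          is, without any additional delay.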

          The T_dither_max parameter is chosen based upon the group size, the
          RTCP bandwidth constraints, and, if available, the round-trip time.
          In addition, the receiver may take into account a number of other
          parameters (such as the estimated round-trip time, the type of
          feedback to be provided) to possibly extend the upper bound for the
          feedback while ensuring that the feedback information still will make
          sense when it reaches the sender.

          If a compound RTCP feedback packet is scheduled, the time slot for
          the next scheduled compound RTCP packet is updated accordingly to a
          new t_rr.

                    event to
                       |  RTCP feedback
                       vXXXXXXXXXXXXXXXXXXXX            ) )
          |---+--------+-------------+-----+------------| |--------+--------->
              |        |             |     |            ( (        |
              |       t0            te                             |
           t_(rr-1)                                              t_rr
                        \_______  ________/

          Figure 2: Event report and parameters for Early RTCP scheduling

          3.4 Early RTCP Algorithm

          Assume an active sender S0 (out of S senders) and a number N of
          receivers with R being one of these receivers.

          Assume further that R has verified that using feedback mechanisms is
          reasonable in the current constellation (which is highly application
          specific and hence not specified in this memo).

          Then, the following rules apply to transmitting a Feedback Message
          as a minimal compound RTCP packet:

          Initially, R sets allow_early := TRUE.

          At a point in time t0, R has transmitted the last RTCP RR packet at
          t_(rr-1) and has scheduled the next transmission (prior to
          reconsideration) for t_rr.

          Now R detects the need to transmit a feedback message (e.g. because a
          media "unit" needs to be ACKed or NACKed) at time t0.

          R first checks whether there is still a feedback packet waiting for
          transmission.  If so, the new feedback message is appended to the
          packet and the increased RTCP packet size is updated in the RTCP
          bandwidth calculation (which may later lead to an adjustment of
          t_rr); the schedule for the waiting RTCP feedback packet remains
          unchanged.

          If no feedback message is already awaiting transmission, a new
          (minimal) compound RTCP feedback message is created and the interval
          T_dither_max is chosen as follows:

          i)   If the session is a unicast session (group size = 2) then
               T_dither_max := 0.

          ii)  If the receiver has an RTT estimate to the originator of the
               media unit to provide feedback about, then

                                / T_rtt/2     if T_rtt/2 > 10ms
               T_dither_max := <
                                \ 10ms        otherwise.

          iii) If the receiver does not have an RTT estimate to the originator,

                                / T_rr/2      if T_rr/2 < 100ms
               T_dither_max := <
                                \ 100ms       otherwise.

          (Note: These values are *still* open to discussion.)

          (Note that application-specific feedback considerations may make it
          worthwhile to increase T_dither_max beyond this value.)
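
          Rules i) to iii) above can be sketched as follows.  This is a
          minimal illustration, not a normative definition: the function
          name, the use of seconds as the unit, and the reading of T_rr as
          the regular RTCP reporting interval are assumptions made for this
          example only.

```python
from typing import Optional


def t_dither_max(group_size: int, t_rr: float,
                 t_rtt: Optional[float] = None) -> float:
    """Return T_dither_max in seconds (hypothetical helper).

    group_size -- number of session participants
    t_rr       -- regular RTCP reporting interval in seconds (assumed meaning)
    t_rtt      -- RTT estimate to the media originator in seconds, if known
    """
    if group_size == 2:
        # Rule i): unicast session, no dithering needed.
        return 0.0
    if t_rtt is not None:
        # Rule ii): T_rtt/2, but never less than 10 ms.
        return max(t_rtt / 2, 0.010)
    # Rule iii): T_rr/2, but never more than 100 ms.
    return min(t_rr / 2, 0.100)
```

          An application may still choose a larger value, as noted above.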

          Then, R checks whether its next regularly scheduled RTCP packet is
          within the time bounds for the RTCP FB (t_e + T_dither_max > t_rr).
          If so, no Early RTCP is scheduled; instead the FB message is appended
          to the regular RTCP packet and the RTCP bandwidth calculation is
          updated to reflect the additional RTCP size.  The updated bandwidth
          calculation may result in a slightly increased t_rr (=t_rr') but,
          even if t_rr' > t_e + T_dither_max, this does not change the updated
          transmission time t_rr'.

          (Q: if the FB is piggybacked onto a regularly scheduled RTCP RR
          message but the same or a superset of the feedback information is
          received from another receiver, should the FB then be removed from
          the compound RR/FB and its transmission time be revised again from
          t_rr' to t_rr as calculated before?)

          Otherwise, R MUST check whether it is allowed to transmit an Early
          RTCP packet (allow_early == TRUE).

             If so, R schedules an Early RTCP packet for t_e = t0 + RND *
             T_dither_max with the RND function evenly distributed between 0
             and 1.

             If R receives an RTCP feedback packet (indicating the same or a
             superset of the feedback information R wanted to transmit) before
             t_e is reached, the FB information is discarded and the
             transmission schedule for the next RR packet is reset to t_rr as
             calculated before.

             Otherwise, when t_e is reached, R creates an RR, appends the FB
             information, and transmits the RTCP packet.  R then sets
             allow_early := FALSE and recalculates t_rr := t_e + 2*T_rr.  As
             soon as R sends its next regularly scheduled RTCP RR
             (at the new t_rr), it sets allow_early := TRUE again.

          If allow_early == FALSE then R checks the time for the next scheduled
          RR: if t_rr - t0 < T_dither_max then R creates an FB message for
          transmission along with the RTCP packet at a then slightly modified
          t_rr' (see above).  Otherwise, R does not send an RTCP feedback
          message at all.
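
          The core of the decision logic above can be sketched as follows.
          All names (ReceiverState, schedule_feedback, on_early_sent,
          on_regular_rr_sent) are hypothetical.  The sketch assumes that the
          piggyback test compares t0 + T_dither_max against t_rr, in line
          with the allow_early == FALSE rule above, and it deliberately
          omits appending to a waiting packet, suppression on reception of
          equivalent feedback, and the bandwidth recalculation that yields
          t_rr'.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class ReceiverState:
    t_rr: float               # time of the next regularly scheduled RTCP RR
    allow_early: bool = True  # initial value as stated in the text


def schedule_feedback(state: ReceiverState, t0: float, t_dither_max: float,
                      rnd: Callable[[], float] = random.random
                      ) -> Tuple[str, Optional[float]]:
    """Decide how a feedback message detected at time t0 is sent.

    Returns ("piggyback", t_rr), ("early", t_e) or ("drop", None).
    """
    # Assumption: the window test uses t0, since t_e is not yet known here.
    if t0 + t_dither_max > state.t_rr:
        # The regular RR is close enough: append the FB message to it.
        return ("piggyback", state.t_rr)
    if state.allow_early:
        # Dither uniformly within [0, T_dither_max].
        return ("early", t0 + rnd() * t_dither_max)
    # Early RTCP already used and the regular RR is too far away.
    return ("drop", None)


def on_early_sent(state: ReceiverState, t_e: float, t_rr_interval: float) -> None:
    """Bookkeeping after an Early RTCP packet was transmitted at t_e."""
    state.allow_early = False
    state.t_rr = t_e + 2 * t_rr_interval


def on_regular_rr_sent(state: ReceiverState, next_t_rr: float) -> None:
    """Bookkeeping after the next regularly scheduled RR was transmitted."""
    state.allow_early = True
    state.t_rr = next_t_rr
```

          Passing a fixed function for rnd makes the dithering reproducible
          in tests; a real receiver would keep the default.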

          In regular RTCP intervals as specified by [1] (i.e. at most every
          five seconds), a full compound RTCP packet is sent (which may also
          contain a feedback message if one has been created according to the
          above rules and scheduled for transmission along with the full
          compound RTCP message).

          The E bit in the message header [7] is used upon reception to detect
          whether this RTCP feedback message was sent as Early RTCP or not.
          Hence, a feedback message that is sent as an Early RTCP packet MUST
          set the E bit in the message header to "1".  Feedback messages
          piggybacked on regularly scheduled RTCP packets MUST set the E bit to
          "0".

          3.5 Considerations on the Group Size

          This section intends to give some brief guidelines to the group sizes
          at which the various feedback modes may be used.

          3.5.1 ACK mode

          The group size MUST be exactly two participants, i.e. point-to-point
          communications.  Unicast addresses SHOULD be used in the session

          For unidirectional as well as bi-directional communication between
          two parties, 2.5% of the RTP session bandwidth is available for
          feedback.  Assuming a ratio of 1:10 for minimal to full compound RTCP
          packets, at 64kbit/s, a receiver can report 2.5 events per second
          back to the sender, at 256kbit/s 10 events and so forth.

          From 768kbit/s upwards, a receiver would be able to acknowledge each
          individual frame (not packet!) in a 30 fps video stream.
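
          The figures above can be reproduced with a short calculation.
          Note that the text does not state a feedback packet size; the
          80 bytes used below is merely the value that yields the quoted
          event rates and is an assumption of this sketch, as is the
          function name.

```python
def ack_events_per_second(session_bps: float,
                          packet_bytes: int = 80,
                          feedback_share: float = 0.025) -> float:
    """Feedback events per second that fit into the feedback budget.

    packet_bytes is an assumed size of one minimal compound RTCP
    feedback packet; it is inferred from the numbers in the text.
    """
    feedback_bps = session_bps * feedback_share
    return feedback_bps / (8 * packet_bytes)


for rate in (64_000, 256_000, 768_000):
    print(rate, round(ack_events_per_second(rate), 2))
```

          Under this assumption the three session rates give roughly 2.5,
          10, and 30 events per second respectively.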

          ACK strategies have to be defined accordingly to work with these
          bandwidth limitations.

          3.5.2 NACK mode

          Negative acknowledgements (or similar types of feedback) have to be
          used for all groups larger than two.

          Whether or not the use of Immediate or Early RTCP packets should be
          considered depends upon a number of parameters including session
          bandwidth, codec, special type of feedback, number of senders and
          receivers, among many others.

          The crucial parameters -- to which all of the above can be reduced --
          are the allowed minimal interval between two RTCP reports and the
          number of events that presumably need reporting per time interval.
          The minimum interval is derived from the available RTCP bandwidth and
          the expected average size of an RTCP packet.  The number of events to
          report (e.g. per second) may be derived from the packet loss rate and
          sender's rate of transmitting packets.  From these two values, the
          allowable group size for the Immediate feedback mode can be

          The upper bound for the Early RTCP mode then solely depends on the
          acceptable quality degradation, i.e. how many events per time
          interval may go unreported.

          Example: If a 256kbit/s video with 30 fps is transmitted through a
          network with an MTU size of some 1500 bytes, then, in most cases,
          each frame would fit in its own packet leading to a packet rate of 30
          packets per second.  If 5% packet loss occurs in the network (equally
          distributed, no inter-dependence between receivers), then each
          receiver will have to report 3 packets lost each two seconds.
          Assuming a single sender and more than three receivers yields 3.75%
          of the session bandwidth allocated to the receivers' RTCP traffic
          and thus 9.6kbit/s.
          Assuming further a size of 100 bytes for the average compound RTCP
          packet allows 12 RTCP packets to be sent per second or 24 in two
          seconds.  If every receiver needs to report three packets, this
          yields a maximum group size of 8 receivers if all loss events shall
          be reported.  The rules for transmission of immediate RTCP packets
          should provide sufficient flexibility for most of this reporting to
          occur in a timely fashion.

          Extending this example to determine the upper bound for Early RTCP
          mode leads to the following considerations: assume that the
          underlying coding scheme and the application (as well as the tolerant
          users) allow in the order of one loss without repair per two seconds.
          Thus the number of packets to be reported by each receiver decreases
          to two per two seconds, which increases the group size to 12.
          Assuming further that some number of packet losses are correlated,
          feedback traffic is further reduced and group sizes of some 15 to 20
          can be reasonably well supported using Early RTCP mode.
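
          The two calculations above can be retraced as follows.  The
          function names are hypothetical, and exact rational arithmetic is
          used only to avoid rounding artifacts in the illustration.  The
          correlation effect mentioned in the last sentence is not modelled.

```python
from fractions import Fraction


def rtcp_packets_per_window(session_bps, receiver_share, packet_bytes, window_s):
    """RTCP packets that all receivers together may send in window_s seconds."""
    receiver_bps = session_bps * receiver_share
    return receiver_bps * window_s / (8 * packet_bytes)


def max_group_size(packets_per_window, reports_per_receiver):
    """Largest receiver count for which every receiver can send its reports."""
    return int(packets_per_window // reports_per_receiver)


WINDOW_S = 2
# 3.75% of 256 kbit/s, 100-byte compound RTCP packets, two-second window.
budget = rtcp_packets_per_window(256_000, Fraction(375, 10_000), 100, WINDOW_S)
# 30 packets/s at 5% loss gives the number of losses per receiver and window.
lost = 30 * Fraction(5, 100) * WINDOW_S

immediate_limit = max_group_size(budget, lost)   # every loss is reported
early_limit = max_group_size(budget, lost - 1)   # one loss may go unreported
print(budget, lost, immediate_limit, early_limit)
```

          With the example's numbers this gives a budget of 24 packets per
          two seconds, 3 losses per receiver, and group sizes of 8 and 12.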

          3.6 Summary of decision steps

          3.6.1 General Hints

          Before even considering whether or not to send RTCP feedback
          information an application has to determine whether this mechanism is

          1) An application has to decide whether -- for the current ratio of
             packet rate with the associated (application-specific) maximum
             feedback delay and the currently observed round-trip time (if
             available) -- feedback mechanisms can be applied at all.

             This decision may obviously be based upon (and dynamically revised
             following) regular RTCP reception statistics.

          2) The application has to decide whether -- for a certain observed
             error rate, assigned bandwidth, frame rate, and group size -- (and
             which) feedback mechanisms can be applied.

             Regular RTCP provides valuable input to this step, too.

          3) If these tests pass, the application has to follow the rules for
             transmitting Early RTCP packets or regularly scheduled RTCP
             packets with piggybacked feedback.

          3.6.2 Session Description Attributes

          A number of additional SDP parameters may be used to describe a
          session.  These are defined as session level and/or media level

          RTCP Feedback

          a=rtcp-fb: {"ack"|"nack"|extension} params

          This attribute is used to indicate the feedback (to be) supported by
          the sender. "ack" MUST only be used if the media session is allowed
          to operate in ACK mode as defined in

          It is up to the recipients whether or not they send feedback
          information and up to the sender(s) to make use of feedback provided.
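
          A receiver of such a session description might read the attribute
          roughly as sketched below.  The function name is hypothetical, and
          since the syntax of "params" is not detailed here, the sketch
          simply assumes whitespace-separated tokens.

```python
from typing import List, Tuple


def parse_rtcp_fb(line: str) -> Tuple[str, List[str]]:
    """Split an a=rtcp-fb attribute line into (mode, params).

    mode is "ack", "nack", or an extension token; params are assumed
    to be whitespace-separated tokens (not specified by the text).
    """
    prefix = "a=rtcp-fb:"
    if not line.startswith(prefix):
        raise ValueError("not an rtcp-fb attribute: %r" % line)
    fields = line[len(prefix):].split()
    if not fields:
        raise ValueError("rtcp-fb attribute without feedback mode")
    return fields[0], fields[1:]


print(parse_rtcp_fb("a=rtcp-fb: nack"))
```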


          If an m= line in the SDP describing a session indicates unicast
          addresses for a particular media type (and does not operate in multi-
          unicast mode with all recipients listed explicitly but still
          addressed via unicast), the RTCP feedback MAY operate in ACK feedback

       4. Format of RTCP Feedback messages

          The general format of the FB messages is defined in [7].

       5. Security Considerations

          RTP packets transporting information with the proposed payload
          format are subject to the security considerations discussed in the RTP
          specification [1]. This implies that confidentiality of the media
          streams is achieved by encryption.

          If the entire stream (extension data and AU data) is to be secured
          and all the participants are expected to have the keys to decode the
          entire stream, then the encryption is performed in the usual manner,
          and there is no conflict between the two operations (encapsulation
          and encryption).

          The need for a portion of the stream (e.g. extension data) to be
          encrypted with a different key, or not to be encrypted, would require
          application level signaling protocols to be aware of the usage of
          the XT field, and to exchange keys and negotiate their usage on the
          media and extension data separately.

       6. Acknowledgements

          Large parts of the syntax and the text concerned with RPS and NEWPRED
          were borrowed from an early I-D from Fukunaga et al. that was
          concerned with MPEG-4 ES packetization.

       7. Full Copyright Statement

          Copyright (C) The Internet Society (2001). All Rights Reserved.

          This document and translations of it may be copied and furnished to
          others, and derivative works that comment on or otherwise explain it
          or assist in its implementation may be prepared, copied, published
          and distributed, in whole or in part, without restriction of any
          kind, provided that the above copyright notice and this paragraph are
          included on all such copies and derivative works.

          However, this document itself may not be modified in any way, such as
          by removing the copyright notice or references to the Internet
          Society or other Internet organizations, except as needed for the
          purpose of developing Internet standards in which case the procedures
          for copyrights defined in the Internet Standards process must be
          followed, or as required to translate it into languages other than
          English.

          The limited permissions granted above are perpetual and will not be
          revoked by the Internet Society or its successors or assigns.

          This document and the information contained herein is provided on an

       8. Authors' Addresses

          Stephan Wenger (
          TU Berlin
          Sekr. FR 6-3
          Franklinstr. 28-29
          D-10587 Berlin

          Joerg Ott (
          Universitaet Bremen TZI
          MZH 5180
          Bibliothekstr. 1
          D-28359 Bremen

       9. Bibliography

          [1]  H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP -
               A Transport Protocol for Real-time Applications," Internet
               Draft, draft-ietf-avt-rtp-new-08.txt, Work in Progress, July

          [2]  T. Turletti and C. Huitema, "RTP Payload Format for H.261 Video
               Streams," RFC 2032, October 1996.

          [3]  C. Perkins, I. Kouvelas, O. Hodson, V. Hardman, M. Handley, J.C.
               Bolot, A. Vega-Garcia, and S. Fosse-Parisis, "RTP Payload for
               Redundant Audio Data," RFC 2198, September 1997.

          [4]  C. Bormann, L. Cline, G. Deisher, T. Gardos, C. Maciocco, D.
               Newell, J. Ott, G. Sullivan, S. Wenger, and C. Zhu, "RTP Payload
               Format for the 1998 Version of ITU-T Rec. H.263 Video (H.263+),"
               RFC 2429, October 1998.

          [5]  C. Perkins and O. Hodson, "Options for Repair of Streaming
               Media," RFC 2354, June 1998.

          [6]  J. Rosenberg and H. Schulzrinne, "An RTP Payload Format for
               Generic Forward Error Correction," RFC 2733, December 1999.

          [7]  S. Fukunaga, N. Sato, K. Yano, A. Miyazaki, K. Hata, R.
               Hakenberg, C. Burmeister, "Low Delay RTCP Feedback Format,"
               Internet Draft draft-fukunaga-low-delay-rtcp-02.txt, Work in
               Progress, February 2001.

          [8]  S. Bradner, "Key words for use in RFCs to Indicate Requirement
               Levels," RFC 2119, March 1997.

          [9]  H. Schulzrinne and S. Petrack, "RTP Payload for DTMF Digits,
               Telephony Tones and Telephony Signals," RFC 2833, May 2000.

          [10] H. Schulzrinne and S. Casner, "RTP Profile for Audio and Video
               Conferences with Minimal Control," Internet Draft draft-ietf-
               avt-profile-new-09.txt, July 2000.

       Appendix A: Considerations On Video

          This section of this memo covers feedback messages for a Picture Loss
          Indication (PLI), Slice Loss Indication (SLI), and Reference Picture
          Selection Indication (RPSI).  PLI indicates the loss of a full
          picture and roughly corresponds to the Fast Intra Request known from
          H.320 systems and from RFC 2032 (H.261 packetization).  Algorithms
          using SLI can be found under the acronym Automatic Repeat Request
          (ARQ) in the signal processing literature.  Reference Picture
          Selection, aka NEWPRED, is available in certain profiles of MPEG-4
          (version 2 and later) and as an optional mode in H.263 (version 2 and
          later).  The packet format specified in this document is open to
          extensions so that future feedback mechanisms can easily be

          All these messages use the payload specific feedback format as
          defined in [7], using PT=PSFB and the FMT field to further
          distinguish between the three subtypes.  These messages are defined
          for payload types indicating H.263 and MPEG-4.

          Note that the Bit 00 of the first (counting from 1) 32-bit word in
          the messages described below is placed in Bit 08 of the fourth
          (counting from 1) 32-bit word of the payload type specific feedback

          A.1 Message Type 1: Picture Loss Indication (PLI)

          A.1.1 Semantics

          With the Picture Loss Indication message a decoder informs the
          encoder about the loss of one or more full pictures.

          A.1.2 Format

          PLI does not require parameters.  Therefore, the length field MUST be
          0, and there MUST NOT be Feedback Control Information.

          A.1.3 Timing Rules

          The timing follows the rules outlined in section 3.  In systems that
          employ both PLI and other FB types it may be advisable to follow the
          regular RTCP RR timing rules, since PLI is not as delay critical as
          other FB types.

          A.1.4 Remarks

          PLI messages typically trigger the sending of full Intra pictures.
          Intra pictures are several times larger than predicted (Inter)
          pictures.  Their size is independent of the time they are generated.
          In most environments, especially when employing bandwidth-limited
          links, the use of an Intra picture implies an allowed delay that is a
          significant multiple of the typical frame duration.  An example: If
          the sending frame rate is 10 fps, and an Intra picture is assumed to
          be 10 times as big as an Inter picture (not an unrealistic
          assumption, see [] for details), then a full second of latency has to
          be accepted.  In such an environment there is no need for a
          particularly short delay in sending the feedback message.  Hence
          waiting for the next possible time slot allowed by RFC1889bis RTCP
          timing rules does not negatively influence system performance.

          A.2 Message Type 2: Slice Lost Indication

          A.2.1 Semantics

          With the Slice Lost Indication a decoder can inform an encoder that
          it was unable to decode one, or several consecutive, macroblocks.
          The encoder can take appropriate action in order to re-synchronize
          encoder and decoder by means of its choice, typically by sending the
          lost macroblocks in Intra mode.  This feedback message SHALL NOT be
          used for video codecs with non-uniform, dynamically changeable
          macroblock sizes such as H.263 with enabled Annex Q.  In such a case,
          an encoder cannot always identify the corrupted spatial region.

          A.2.2 Format

          When FBT indicates a Slice Lost Indication, then there is one
          additional UCI field the content of which is in the following format:

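           0                   1                   2                   3
           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |            First        |  Number                 |  TR       |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+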

          First: 13 bits
          The macroblock (MB) address of the first lost macroblock.  The MB
          numbering is done such that the macroblock in the upper left corner
          of the picture is considered macroblock number 1 and the number for
          each macroblock increases from left to right and then from top to
          bottom in raster-scan order (such that if there is a total of N
          macroblocks in a picture, the bottom right macroblock is considered
          macroblock number N).

          Number: 13 bits
          The number of lost macroblocks, in scan order as discussed above.

          TR: 6 bits
          The six least significant bits of the Temporal Reference of the

          A.2.3 Timing Rules

          The efficiency of algorithms using the Slice Lost Indication is
          reduced greatly when the Indication is not transmitted in a timely
          fashion.  Motion compensation propagates corrupted pixels that are
          not reported as being corrupted.  Therefore, the use of the algorithm
          discussed in section 3 is highly recommended.

          Constraints on T_dither_max to be discussed.

          A.2.4 Remarks

          The First field of the UCI defines the first macroblock of a picture
          as 1 and not, as one could suspect, as 0.  This was done to align
          this specification with the comparable mechanism available in H.245.
          The maximum number of macroblocks in a picture (2**13 or 8192)
          corresponds to the maximum picture sizes of the ITU-T and ISO/IEC
          video codecs.  If future video codecs offer larger picture sizes
          and/or smaller macroblock sizes, then an additional feedback message
          has to be defined.  The six least significant bits of the Temporal
          Reference field are deemed to be sufficient to indicate the picture
          in which the loss occurred.
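
          The field layout of A.2.2 (First: 13 bits, Number: 13 bits, TR:
          6 bits) can be packed into one 32-bit word as sketched below.  The
          function names are hypothetical, and the sketch assumes the usual
          convention that bit 0 is the most significant bit and that the
          word is sent in network byte order.

```python
from typing import Tuple


def pack_sli(first: int, number: int, temporal_reference: int) -> bytes:
    """Pack First (13 bits), Number (13 bits) and TR (6 bits) into 4 bytes."""
    if not 0 <= first < (1 << 13):
        raise ValueError("First does not fit into 13 bits")
    if not 0 <= number < (1 << 13):
        raise ValueError("Number does not fit into 13 bits")
    tr = temporal_reference & 0x3F  # six least significant bits only
    word = (first << 19) | (number << 6) | tr
    return word.to_bytes(4, "big")  # assumed network byte order


def unpack_sli(data: bytes) -> Tuple[int, int, int]:
    """Inverse of pack_sli: return (first, number, tr)."""
    word = int.from_bytes(data[:4], "big")
    return word >> 19, (word >> 6) & 0x1FFF, word & 0x3F


# One lost macroblock at the upper left corner (macroblock number 1).
print(pack_sli(1, 1, 0).hex())
```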

          Algorithms were reported that keep track of the regions affected by
          motion compensation, in order to allow for a transmission of Intra
          macroblocks to all those areas, regardless of the timing of the FB
          [TBP.].  While the timing of the FB is less critical when those
          algorithms are used than without them, it has to be observed that
          those algorithms correct large parts of the picture and, therefore,
          have to transmit many more bits in case of delayed FBs.

          A.3 Message Type 3: Reference Picture Selection Indication

          A.3.1 Semantics

          Modern video coding standards such as MPEG-4 visual version 2 or
          H.263 version 2 allow the use of reference pictures older than the
          most recent one.  Typically, a first-in-first-out queue of reference
          pictures is maintained.  If an encoder has learned about a loss of
          encoder-decoder synchronicity, a known-as-correct reference picture
          can be used.  As this reference picture is temporally further away
          than usual, the resulting predictively coded picture will use more

          Both MPEG-4 and H.263 define a binary format for the payload of an
          RPSI message that includes information such as the temporal ID of the
          damaged picture and the size of the damaged region.  This bit string
          is typically small (a couple of dozen bits), of variable length,
          and self-contained, i.e. it contains all information that is necessary
          to perform reference picture selection.

          Note that both MPEG-4 and H.263 allow the use of RPSI with positive
          feedback information as well.  That is, all corrected pictures are
          reported.  Any form of positive feedback MUST NOT be used when in a
          multicast environment (reporting positive feedback about individual
          reference pictures at RTCP intervals is not expected to be of much
          use anyway).  For point-to-point communication, positive feedback MAY
          be used but, again, the bit rate budget of RTCP feedback will prevent
          the use in most scenarios anyway.

          A.3.2 Format

          When FB indicates an RPSI, then the length field is set to the number
          of bits of the following bit string that contains the RPS
          information.  This bit string follows byte aligned in the UCI field.
          Bit padding is used to achieve 32-bit word alignment of the UCI
          message (and the whole packet).
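
          The length and padding rule can be illustrated as follows.  The
          function name is hypothetical, the bit string is represented as
          text of '0' and '1' characters for readability, and the use of
          zero bits for padding is an assumption of this sketch.

```python
from typing import Tuple


def pad_rpsi(bits: str) -> Tuple[int, bytes]:
    """Return (length field in bits, bit string padded to 32-bit words)."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("expected a non-empty string of '0' and '1'")
    length = len(bits)
    pad = -length % 32                 # bits needed to reach a word boundary
    padded = bits + "0" * pad          # assumed: padding bits are zero
    return length, int(padded, 2).to_bytes(len(padded) // 8, "big")


length, payload = pad_rpsi("1" * 20)
print(length, payload.hex())
```

          The length field keeps the unpadded bit count so that the sender
          can strip the padding again.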

          A.3.3 Timing Rules

          RPS is even more delay-critical than algorithms using SLI.  This
          is due to the fact that the older the RPS message is, the more bits
          the encoder has to spend to achieve encoder-decoder synchronicity.
          See [TBP.] for some information about the overhead of RPS for certain
          bit rate/frame rate/loss rate scenarios.

          Therefore, RPS messages should typically be sent as soon as possible,
          employing the algorithm of section 3.

          Constraints on T_dither_max to be discussed.

          A.3.4 Remarks

