INTERNET-DRAFT                                            Stephan Wenger
      draft-wenger-avt-rtcp-feedback-00.txt                          TU Berlin
                                                                     Joerg Ott
                                                       Universitaet Bremen TZI

                                                                 July 14, 2000
                                                         Expires December 2000


                  RTCP-based Feedback for Predictive Video Coding


      Status of this Memo

      This document is an Internet-Draft and is in full conformance with all
      provisions of Section 10 of RFC 2026.  Internet-Drafts are working
      documents of the Internet Engineering Task Force (IETF), its areas, and
      its working groups.  Note that other groups may also distribute working
      documents as Internet-Drafts.

      Internet-Drafts are draft documents valid for a maximum of six months
      and may be updated, replaced, or obsoleted by other documents at any
      time.  It is inappropriate to use Internet-Drafts as reference material
      or to cite them other than as "work in progress."

      The list of current Internet-Drafts can be accessed at
      http://www.ietf.org/ietf/1id-abstracts.txt

      The list of Internet-Draft Shadow Directories can be accessed at
      http://www.ietf.org/shadow.html.


      0. Open Issues

      1) Should the draft limit itself to supporting feedback for video only
         or should it target a more general solution for feedback?  At the
         moment, the draft covers only video.

      2) Should the feedback be restricted to point-to-point scenarios or
         should we support (small group) multicast?  At the moment, the draft
         is designed to scale to (small) groups.

      3) Feedback traffic explosion is prevented by a) dithering and b)
         damping.  a) constrains the timely transmission of feedback.  b)
         prevents the encoder from learning about the _severity_ of a loss
         problem (e.g. how many receivers currently have a bad picture),
         which in turn prevents adaptive encoder reaction based on the
         perceived quality of the whole group.  At the moment, a) and b) are
         both to be used in order to be network friendly.  Which mechanisms
         (besides flooding the network, which we want to avoid) are
         conceivable to support an approach that is able to achieve a better
         perceived picture quality?


      Wenger/Ott              Expires December 2000                 [Page 1]


      Internet Draft                                           July 14, 2000


      4) Is the maximum number of 8191 MBs for SLI sufficient?  Yes for
         MPEG-1, MPEG-2 and ITU-T H.261, H.263.  What about MPEG-4?

      5) Should there be a special mode (possibly optimized for point-to-point
         communication) that allows UM packets without an RR (see section 3)?

      6) RPS/NEWPRED also make use of positive acknowledgements.  Obviously,
         this inherently does not scale to multicast.  Should there be a
         point-to-point mode that allows positive ACKs?

      7) We have not yet considered the use of layered codecs.  When
         transporting each layer in its own RTP stream, everything should be
         ok.  If not, then we can foresee problems.

      8) Section 7 on NEWPRED needs more work (probably based on the Fukunaga
         et al. draft).

      9) Further work is needed on maximum group size estimation for using
         feedback and on more detailed guidelines on calculating the maximum
         dithering delay for Early RRs (T_dither_max) per UM type.

      10) Further investigations are desirable for the Early RR/UM
          scheduling and damping and the relationship of Early RR/UM
          scheduling to regular RTCP report scheduling.


      1. Abstract

         Predictive video coding is not loss resilient.  Any loss of coded
         data leads to annoying artifacts not only in the reproduced picture
         in which the loss occurred, but also in subsequent pictures.  Error
         resilience can be achieved by spending bits to convey redundant
         information using source coding based mechanisms or transport based
         mechanisms.  This can be done without the use of any feedback between
         the decoder(s) and the encoder.

         Alternatively, where applicable, decoders can inform the encoder
         through a feedback channel about a loss situation, and the encoder
         can react accordingly.  This approach provides better picture quality
         and is more efficient with respect to the bandwidth used by the
         encoder to achieve a given quality.  However, using feedback
         mechanisms is limited to certain application scenarios identified by
         encoder characteristics, delay constraints, and/or the number of
         recipients.  This document discusses various types of feedback
         information (called _upstream messages_, UMs) for predictive video
         coding and defines an RTCP packet format to transmit UMs in an RTP
         environment.  It can be used in conjunction with all payload
         specifications for predictive video coding schemes currently
         available for RTP.  To reflect the need for very low delay in the
         transmission of UMs, which is necessary to make them efficient,
         the rules for sending receiver reports are enhanced to support Early
         Receiver Reports (Early RRs), and an algorithm is specified that
         allows for low delay in small multicast groups but prevents network
         flooding.

      2. Introduction

         2.1. Video Encoder-decoder synchronicity

         Most current video coding schemes for compressed video, such as
         ITU-T H.261 and H.263 and ISO/IEC MPEG-1, MPEG-2, and MPEG-4, employ
         a mechanism known as Inter Picture Prediction.  Each picture is
         divided into macroblocks of uniform size.  For each macroblock, one
         or more motion vectors may be identified and transmitted.  The residual
         signal after motion compensation is DCT-transformed, quantized,
         entropy coded, and transmitted as well.  The encoder reconstructs,
         based on this information, a so-called reference picture, which is
         used to perform the motion compensation and residual signal coding
         steps for the subsequent picture.  Since the reference picture is
         generated using only such information that is also available at the
         decoder, the reference picture is identical to the reconstructed
         picture at the decoder.  Having identical reference pictures at the
         encoder and decoder is referred to as encoder-decoder synchronicity.

         Whenever data is damaged or lost on the way between the encoder and
         the decoder, the reconstructed picture at the decoder is no longer
         identical to the encoder's reference picture -- the encoder-decoder
         synchronicity is lost.

         Any loss of the encoder-decoder synchronicity results in annoying
         artifacts at the decoder.  Because the prediction of subsequent
         pictures in the decoder is based on a damaged reference picture, the
         annoying artifacts are present not only in the picture in which the
         loss occurred; they propagate to all subsequent pictures, until,
         through source coding based mechanisms, the encoder-decoder
         synchronicity is restored.  Therefore, the goal of systems employing
         predictive video coding in a lossy environment must be to keep the
         encoder-decoder synchronicity, or, if this is not possible, to regain
         that synchronicity as quickly as possible.

         2.2. Non-feedback based mechanisms

         Avoiding the loss of the encoder-decoder synchronicity corresponds to
         avoiding the loss of coded picture data.  Such a task can be
         performed on the transport layer.  In RTP environments, the use of
         packet-based FEC is a good example of such a technique.  (The use of
         TCP or reliable multicast as the transport for media streams would be
         an even better one but is inappropriate for low-delay (interactive)
         real-time systems.)  FEC schemes, interleaving, and other means for
         repairing real-time media streams may also add delay and significant
         bit rate overhead without being able to guarantee compensation of
         virtually all packet losses.


         Once the encoder-decoder synchronicity is lost, only source coding
         oriented mechanisms can help to regain it.  One common way is to send
         a non-predictively coded picture (known as an Intra picture).  Intra
         pictures have the disadvantage of being several times bigger than
         predictively coded pictures (Inter pictures).  Therefore, sending
         Intra pictures has negative implications both on the bandwidth and
         (in bandwidth limited environments) on delay.  Another way is to use
         Intra macroblock refresh.  Here, certain parts of the picture (those
         affected by a packet loss) are coded non-predictively in order to
         resynchronize the encoder and decoder over time.  Intra macroblock
         refresh has better delay characteristics than full Intra pictures
         because the picture size can be kept constant, but is less efficient
         in terms of bit rate/distortion than full Intra pictures.  More
         sophisticated means such as Reference Picture Selection (RPS) are
         also available in modern video coding standards.

         Systems not employing feedback channels may use any combination of
         the mechanisms described above to add error resilience -- at the cost
         of added bit rate and, sometimes, added delay.  The number of
         additional bits spent for error resilience can be adapted using the
         long-term packet loss rate information in the RTCP receiver reports.
         But, even when using such adaptive means, it is still likely that
         systems spend many more bits than theoretically necessary to achieve
         error resilience in order to be on the safe side.  In addition, as
         regular RTCP feedback is aimed at longer terms, reactivity to sudden
         losses is limited.  In all practical applications today this means
         that fewer bits are available for non-redundant picture data, and
         hence the overall picture quality suffers.

         2.3. Feedback based systems

         Feedback-based systems try to avoid spending too many bits for
         redundant information by informing the encoder about a loss situation
         at the decoder(s).  The encoder can then react accordingly and spend
         redundant bits only when needed, possibly only for the part of the
         picture that was affected by the loss -- thereby reducing the number
         of redundant bits and leaving more bits for useful information.  As a
         result, a higher reproduced picture quality can generally be expected
         when feedback channels are available.

         Similar to the observations of section 2.2, transport and source
         coding based mechanisms that react to loss situations reported by
         feedback can be distinguished.

         Transport based systems employing feedback react in a media-unaware
         fashion, by re-transmitting lost packets.  TCP is a good example of
         a protocol following such a scheme.  Transport-based feedback in
         real-time and/or multicast environments is a complex matter and the
         subject of a lot of engineering and research in and outside of the
         IETF.  This specification is not concerned with pure transport-based
         feedback.


         Source coding based mechanisms may react upon the arrival of an
         upstream message indicating a loss situation by adding bits that
         restore, or at least make an effort to restore, the encoder-decoder
         synchronicity.  This process has to be performed by a real-time
         encoder.  However, schemes have been reported that allow the use of
         feedback also for non-real-time encoders by storing multiple
         representations of the same data (e.g. Inter and Intra coded) and
         dynamically switching between those representations.

         Several types of feedback messages, called Upstream Messages or UMs,
         are defined in this specification.  A UM can be as simple as a
         Boolean condition, indicating for example the loss of a full picture
         (and, therefore, the need for a full Intra picture transmission).
         Other feedback messages may contain more complex information such as
         information about the damage of a spatial region of the picture.  A
         special form consists of a message the format and semantics of which
         are not known at the transport level, because they are defined in the
         video codec standards.

         Most UMs contain negative acknowledgement information, indicating an
         erroneous situation at the decoder.  In others, the nature of the
         acknowledgement (positive, negative, or both) is part of the feedback
         message itself.  When used in multicast environments, positive
         acknowledgements MUST NOT be used.

         This document assumes that feedback messages are transmitted using
         RTCP packets.  In order to prevent traffic explosion in large
         multicast groups, RTCP messages from the receivers to the sender
         cannot be sent at arbitrary times.  Instead, the bit rate for all
         RTCP messages of all receivers together has to stay within a maximum
         fraction of the total RTP session bit rate, yielding a very limited
         bit rate budget for a single receiver in a large multicast group.
         This, in turn, leads to an increased average delay as the size of
         the receiving multicast group grows (see section 6 of
         draft-ietf-avt-rtp-new-06.txt for details).
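
         As a rough illustration of why the average feedback delay grows with
         the group size, the following Python sketch computes a simplified
         receiver report interval.  It assumes the commonly used RTCP
         parameters (5% of the session bandwidth for RTCP, 75% of that shared
         among the receivers, a 5 second minimum interval) and an average
         compound packet size of 100 octets.  The function name and all
         defaults are illustrative assumptions, not values defined by this
         draft; randomization and timer reconsideration are omitted.

```python
def receiver_report_interval(session_bw_bps, n_receivers,
                             avg_rtcp_size_octets=100.0,
                             rtcp_fraction=0.05,
                             receiver_share=0.75,
                             t_min=5.0):
    """Simplified deterministic RTCP interval (seconds) for one receiver.

    All defaults are assumptions chosen for illustration only.
    """
    if session_bw_bps <= 0 or n_receivers < 1:
        raise ValueError("bandwidth and receiver count must be positive")
    # RTCP budget available to all receivers together, in octets/second.
    receivers_bw = session_bw_bps * rtcp_fraction * receiver_share / 8.0
    # The budget is shared equally, so the interval grows linearly with
    # the number of receivers once the minimum interval is exceeded.
    return max(t_min, n_receivers * avg_rtcp_size_octets / receivers_bw)
```

         Under these assumptions, a 64 kbit/s session stays at the 5 second
         minimum for up to 15 receivers and reaches about 10 seconds at 30
         receivers.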

         This specification defines an algorithm that adheres to the bit rate
         limitations for the feedback channel in the long term, but allows
         short-term overdrafting for any receiver (but not all of them
         simultaneously).  Thus, the algorithm allows for better real-time
         performance than the one specified in draft-ietf-avt-rtp-new-06.txt.
         Traffic explosion in cases in which many receivers identify picture
         damage simultaneously is prevented by dithering.

         As this specification assumes a real-time encoder that has full
         control over its transmission bit rate, there is no scaling problem
         on the forward channel.  Any reaction to negative feedback generates
         additional bits, which have to be conveyed, but these are taken from
         the sender's total bit rate budget.  The encoder can take this into
         account by, for example, sending fewer pictures per second or
         lowering the quality and bit rate by changing quantization
         parameters.  The sender is also free to simply ignore feedback
         messages.


         Adjusting the tradeoff between the reproduced picture quality of all
         receivers of a multicast group and the amount of bits spent for
         encoder-decoder re-synchronization is a very complex task and is not
         covered in this specification.

         This document currently covers feedback messages for a Picture Loss
         Indication (PLI), Slice Loss Indication (SLI), and Reference Picture
         Selection Indication (RPSI).  PLI indicates the loss of a full
         picture and roughly corresponds to the Fast Intra Request known from
         H.320 systems and from RFC 2032 (H.261 packetization).  Algorithms
         using SLI can be found under the acronym Automatic Repeat Request
         (ARQ) in the signal processing literature.  Reference Picture
         Selection, aka NEWPRED, is available in certain profiles of MPEG-4
         (version 2 and later) and as an optional mode in H.263 (version 2 and
         later).  The packet format specified in this document is open to
         extensions so that future feedback mechanisms can easily be
         integrated.


         2.4. Applications and Relationships to other Standards

         This specification is based on RTCP, which implies its use in an RTP
         environment.  RTP itself is used in a variety of systems such as in
         SIP- or H.323-based multimedia conferencing/telephony.

         As for the video codecs, there is currently a small set of standards
         that are, for the purpose of this discussion, roughly comparable.
         Many mechanisms for regaining encoder-decoder synchronicity are
         applicable to all video codecs.  Others require certain tools (such
         as Reference Picture Selection, aka NEWPRED) that are available only
         in certain versions of the standards, and/or optional tools whose use
         must be negotiated prior to being used.

         A few RTP payload specifications such as RFC 2032 already define a
         feedback mechanism for some of the coding algorithms considered in
         this specification.  An application capable of performing both
         schemes MUST use the feedback mechanism defined in this
         specification, although, for backward compatibility reasons, it MUST
         also be capable of conforming to the feedback scheme defined in the
         respective RTP payload format, if this is required by that payload
         format.

         2.5. Remarks on the size of the multicast group

         This specification makes an attempt to prevent traffic explosion on
         the feedback channel in a very similar way as RTP does, with the
         exception of allowing individual receivers to overdraft their bit
         rate budget from time to time.  This is necessary in order to allow
         for low delay, which is needed by the algorithms reacting to UMs.

         This scaling, however, limits the usefulness of this mechanism in
         multicast groups from a certain size upwards (where the size
         threshold depends on a number of parameters including loss rate and
         frame rate).  The maximum size of the multicast group is not
         specified here, as it is a soft limit that also depends on
         application requirements.  The authors have done some rough
         calculations (too preliminary to be presented here in detail) that
         suggest that feedback is not expected to yield acceptable results for
         group sizes larger than 10 receivers (often fewer than five),
         assuming today's network conditions (RTT, loss rate) and common bit
         rates.

         2.6. Terminology

         The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
         "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
         document are to be interpreted as described in RFC 2119 [xxx].


      3. Low delay RTCP Feedback

         UMs are part of the RTCP control streams and are thus subject to the
         same bandwidth constraints as other RTCP traffic.  This means in
         particular, that it may not be possible to report a packet loss at a
         receiver immediately back to the sender.  However, the value of
         feedback given to a sender typically decreases over time -- in terms
         of the media quality as perceived by the user at the receiving end
         and/or the cost required to achieve media stream repair.

         RFC1889bis (i.e. draft revision draft-ietf-avt-rtp-new-06.txt)
         specifies rules for when RTCP receiver reports (RRs) should be sent.
         This specification modifies those rules in order to allow
         applications to report damaged pictures in a timely manner, since
         most algorithms that use UMs are very sensitive to UM timing.  See
         section 5 and following for a discussion of the impact of delay on
         the performance of each UM type.

         The modified algorithm can be outlined as follows: Normally, when no
         UMs have to be conveyed, RRs are sent following the rules of
         RFC1889bis.  If a receiver detects the need for a UM, it first
         checks whether it has already seen a corresponding UM from any other
         receiver (which it can do as UMs are transmitted via multicast).  If
         this is the case, the receiver refrains from sending the UM and
         continues to follow the regular RR sending schedule.  If the receiver
         has not yet seen a similar UM from any other receiver, it checks
         whether it has already overdrafted its RTCP bit rate budget before
         (i.e. sent an RR without waiting for its regularly scheduled RR
         time).  Only if this is not the case does it send the UM, after
         waiting for a short, random dithering interval.  Note that a complete
         RR is always sent in addition to the UM, in order to a) follow the
         rules for compound packets, and b) make sure that a sufficiently
         large number of RRs from each receiver is transmitted.  Considering
         the overhead for IP and UDP packets, it is believed that these
         advantages outweigh the disadvantage of preventing RTCP packets that
         contain only UMs.


         3.1. Definitions

         [Note: not all are used in this first revision of the draft.]

         a) Let the video stream be transmitted at a (roughly) constant frame
            rate f (in frames per second).  This results in an inter-frame
            time period of tau=1/f if frames are sent at regular intervals.

         b) For timing considerations, we assume that a single frame is always
            carried in a single packet.  If a frame does not fit into the MTU,
            then the frame is split across several packets.  Gaps are then
            measured between always the first or always the last packet of a
            frame.  For later considerations on feedback delay, if a frame is
            split and its packets are paced for transmission (rather than sent
            as a burst) over some time period T_split, this can be modeled as
            a _constant_ added to the overall transmission delay from the
            sender to the receiver.

         c) Let T_rtt be the maximum round trip time as measured by RTCP.
            Note that this may be asymmetric.

         d) Let T_jitter be the maximum jitter measured from a sender to a
            receiver.

         e) Let t_rr and t_(rr-1) be the times of the next and the last
            scheduled RTCP RR transmission, respectively, calculated prior to
            reconsideration.  Let T_rr = t_rr - t_(rr-1).  (In the RFC1889bis
            draft, t_(rr-1) and t_rr are termed tp and tn, respectively.)

         f) Let t_e be the time for which a feedback packet is scheduled.

         g) Let T_dither_max be the maximum interval for which an RTCP
            feedback packet may be additionally delayed (to prevent
            implosions).

         h) Let T_fd be the delay by which a receiver defers the transmission
            of a feedback message after detecting a loss.

         i) Let S be the number of active senders in the RTP session.

         j) Let N be the current estimate of the number of receivers in the
            RTP session.


         3.2. RTCP Feedback

         The feedback situation for a packet loss at a receiver is depicted in
         figure 1 below.  At time t0, a packet loss is detected at the
         receiver.  The receiver decides -- based upon current T_rtt, group
         size, and other (application-specific) parameters -- that a certain
         type of feedback information shall be sent back to the sender.


         To avoid an implosion of immediate feedback packets, the receiver
         delays transmission of the feedback packet (an Early RTCP RR/FB
         packet) by a random amount T_fd (with the random number evenly
         distributed in the interval [0, T_dither_max]).  Transmission of
         the RTCP RR/FB is then scheduled for t_e = t0 + T_fd.
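
         The dithering step can be sketched in Python as follows.  This is a
         minimal illustration only: the function name and the optional rng
         parameter are hypothetical, and the choice of T_dither_max is left
         to the caller.

```python
import random


def schedule_early_feedback(t0, t_dither_max, rng=random):
    """Return (t_e, T_fd) for a loss detected at time t0.

    T_fd is drawn uniformly from [0, t_dither_max] and t_e = t0 + T_fd.
    The rng argument (hypothetical) only needs a uniform(a, b) method.
    """
    if t_dither_max < 0:
        raise ValueError("t_dither_max must be non-negative")
    t_fd = rng.uniform(0.0, t_dither_max)
    return t0 + t_fd, t_fd
```

         Passing a seeded random.Random instance as rng makes the schedule
         reproducible, which can be convenient when simulating several
         receivers.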

         The T_dither_max parameter depends on the feedback algorithm used
         (PLI, SLI, RPSI) and needs to take into account a number of other
         parameters (such as the estimated round-trip time) to limit the upper
         bound for the feedback in a way that ensures that the feedback
         information still makes sense when it reaches the sender.

         If an RTCP feedback packet is scheduled, the time slot for the next
         scheduled RTCP RR is updated accordingly to a new t_rr taken from
         the interval [t_(rr-1) + 2*T_rr, t_e + 2*T_rr] (with T_rr being the
         newly calculated deterministic RTCP interval).


                   pkt loss
                   detected
                      |
                      |  RTCP feedback
                      vXXXXXXXXXXXXXXXXXXXX            ) )
         |---+--------+-------------+-----+------------| |--------+--------->
             |        |             |     |            ( (        |
             |       t0            te                             |
          t_(rr-1)                                              t_rr
                       \_______  ________/
                               \/
                         T_dither_max


         Figure 1: Packet loss and parameters for Early RR scheduling


         3.3. Early RR/UM Algorithm

         Assume an active sender S0 (out of S senders) and a number N of
         receivers with R being one of these receivers.

         Assume further that R has verified that using feedback mechanisms is
         reasonable at the current constellation (which is highly application
         specific and hence not specified in this document at the moment; a
         future revision may contain more detailed guidelines to this end).

         Then, the following rules apply to transmitting an Upstream Message
         (UM) as a compound packet with an RTCP RR and possibly other
         information.  This compound RTCP packet is referred to as
         _RTCP RR/UM_.


         Initially, R sets allow_early=TRUE.


         At a point in time t0, R has transmitted the last RTCP RR packet at
         t_(rr-1) and has scheduled the next transmission (prior to
         reconsideration) for t_rr.

         If R detects a packet loss at time t0 then R should check first
         whether its next regularly scheduled RTCP RR is within the time
         bounds for the RTCP UM (t0 + T_dither_max > t_rr).  If so, no Early
         RR is scheduled; instead the UM is appended to the regular RTCP RR.
         Otherwise, R should check whether it is allowed to transmit an Early
         RR/FB packet (allow_early==TRUE).

            If so, R creates a UM unit, calculates T_dither_max and then
            schedules an early RR/UM packet for t_e = t0 + RND * T_dither_max
            with the RND function evenly distributed between 0 and 1.

            If R receives an RR/UM packet (indicating the same or a superset
            of the feedback information R wanted to transmit) before t_e is
            reached, the FB information is discarded and the transmission
            schedule for the next RR packet is reset to t_rr as calculated
            before.
            (Note: if the UM is piggybacked onto a regularly scheduled RTCP RR
             message, this should not affect transmission of the RR; but
             should the UM then be removed from the compound RR/UM?)

            Otherwise, when t_e is reached, R creates an RR, appends the UM
            information, and transmits the RR/UM packet.  R then sets
            allow_early=FALSE and recalculates t_rr += T_rr (possibly
            t_rr = t_e + 2*T_rr or some value in between; this needs further
            work).  As soon as R sends its next regularly scheduled RTCP RR
            (at the new t_rr), it sets allow_early=TRUE again.

            Option: R also starts a timer T_allow (e.g. T_allow=T_rr).
                    If T_allow expires before an Early RR/UM is received
                    from another participant in the RTP session, R sets
                    allow_early=TRUE.  If an Early RR/UM is received from
                    another participant before T_allow expires, T_allow
                    is cancelled.

         If allow_early==FALSE then R calculates T_dither_max and checks the
         time for the next scheduled RR: if t_rr - t0 < T_dither_max then R
         creates an FB unit for transmission along with the RR packet at t_rr
         (see above).  Otherwise, R does not send an RTCP RR/UM.

         Note: A bit in the UM unit is required to indicate whether the
               transmission occurs as an Early RR/FB or as a regularly
               scheduled RR/FB packet.  This E-bit is to be set accordingly.
               See section 4 for details.

         Note: Numerous variations spring to mind on RTCP RR/UM scheduling,
               dithering, damping, etc.  Right now, this is deliberately kept
               simple for an easy starting point and to provoke further
               discussions.
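
         The rules of this section can be sketched as a small per-receiver
         state machine in Python.  This is a non-normative illustration
         under stated assumptions: all class, method, and return value names
         are hypothetical; the check against the regular RR uses
         t0 + T_dither_max > t_rr in both the allow_early==TRUE and the
         allow_early==FALSE case; the optional T_allow timer is omitted; and
         the computation of the next regular t_rr is left to the caller.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EarlyFeedbackState:
    """Per-receiver state for the Early RR/UM rules (names hypothetical)."""
    t_rr: float                   # next regular RR (before reconsideration)
    T_rr: float                   # deterministic RTCP interval
    allow_early: bool = True      # R may still overdraft its budget
    t_e: Optional[float] = None   # pending Early RR/UM time, if any
    piggyback: bool = False       # a UM is queued for the regular RR

    def on_loss(self, t0: float, t_dither_max: float,
                rnd: Callable[[], float] = random.random) -> str:
        """Decide how a loss detected at time t0 is reported."""
        if t0 + t_dither_max > self.t_rr:
            # The regular RR is within the time bounds: append the UM.
            self.piggyback = True
            return "piggyback"
        if not self.allow_early:
            # Budget already overdrafted and the regular RR is too late.
            return "suppress"
        # Dithering: schedule the Early RR/UM at a random point in time.
        self.t_e = t0 + rnd() * t_dither_max
        return "early"

    def on_remote_um(self, covers_ours: bool) -> None:
        """Damping: another receiver reported the same or a superset."""
        if covers_ours and self.t_e is not None:
            self.t_e = None       # discard the FB; t_rr stays unchanged

    def on_early_timer(self) -> None:
        """t_e reached: the RR/UM is sent with the E bit set."""
        if self.t_e is None:
            return                # the early packet was cancelled
        self.t_e = None
        self.allow_early = False
        self.t_rr += self.T_rr    # the draft notes this needs further work

    def on_regular_rr(self) -> None:
        """Regular RR sent; early sending is allowed again."""
        self.piggyback = False
        self.allow_early = True   # scheduling the next t_rr is not modeled
```

         A caller would invoke on_loss() when a loss is detected,
         on_remote_um() when an RR/UM from another receiver arrives, and the
         two timer methods when t_e or t_rr is reached.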


         3.4. Summary of decision steps

         Before even considering whether or not to send RTCP UM information,
         an application has to determine whether this mechanism is applicable:

         1) An application has to decide whether -- for the current ratio of
            frame rate with the associated (application-specific) maximum
            feedback delay and the currently observed round-trip time --
            feedback mechanisms can be applied at all.

         2) The application has to decide whether -- for a certain observed
            error rate, assigned bandwidth, frame rate, and group size -- (and
            which) feedback mechanisms can be applied.

         3) If these tests pass, the application has to follow the rules for
            transmitting early RTCP RRs or regularly scheduled RTCP RRs with
            piggybacked UMs.


      4. Format of RTCP Feedback messages

         The general format of a UM is outlined below.  Compound packets
         including UMs are possible.  All UMs concerning any given picture of
         any given receiver MUST be conveyed in a single compound packet, in
         order to prevent the loss of parts of such a combined message.
         Combining different types of UMs for any given picture of any given
         receiver SHOULD be avoided.

          0                   1                   2                   3
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |V=2|    UMT  |E| PT=RTCP-Feedb |           length              |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                              SSRC                             |
         +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
         |  Upstream Control Information (UCI)
         |
         |                                     +-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                                     :          padding        |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         version (V): 2 bits
         Identifies the version of RTP, which is the same in RTCP packets as
         in RTP data packets.

         Upstream Message Type (UMT): 5 bits
         Identifies the type of the upstream message.

         0:     forbidden
         1:     Picture Loss Indication
         2:     Slice Lost Indication
         3:     Reference Picture Selection Indication
         4-31:  reserved

         Packet Type (PT): 8 bits
         Constant value (TBD) identifying RTCP Upstream messages.

         Early Upstream Message (E): 1 bit
         A bit that, when set, indicates that the UM is sent early, i.e. did
         not follow the regular schedule for sending RTCP Receiver Reports.


         Length: 16 bits
         Number of bits valid in the UCI field.  A zero value indicates that
         the UCI field is not present (e.g. in case of a Picture Loss
         Indication).

         SSRC: 32 bits
         SSRC is the synchronization source identifier for the sender of this
         packet.

         Upstream Control Information (UCI): variable
         Format and semantics of the UCI differ for the various upstream
         message types; see the following sections for their definitions.
         Fragmentation of an upstream message into several UCI fields is
         prohibited.
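
         The header layout above can be exercised with a short sketch.  It
         assumes network byte order and zero-valued padding octets, neither
         of which is stated explicitly in this section, and it uses a
         placeholder for the packet type because the PT value is still TBD.
         The names pack_um and unpack_um_header are illustrative only.

```python
import struct

RTCP_VERSION = 2
PT_PLACEHOLDER = 0  # NOT a real assignment: the draft leaves the PT value TBD


def pack_um(umt: int, early: bool, ssrc: int, uci: bytes = b"",
            uci_bits: int = 0, pt: int = PT_PLACEHOLDER) -> bytes:
    """Build one UM: 8-octet header plus UCI padded to a 32-bit boundary."""
    if not 1 <= umt <= 31:
        raise ValueError("UMT is 5 bits wide and the value 0 is forbidden")
    if not 0 <= uci_bits <= 0xFFFF or uci_bits > 8 * len(uci):
        raise ValueError("length must fit in 16 bits and lie within the UCI")
    # First octet: V (2 bits) | UMT (5 bits) | E (1 bit).
    first = (RTCP_VERSION << 6) | (umt << 1) | int(early)
    header = struct.pack("!BBHI", first, pt, uci_bits, ssrc)
    padding = b"\x00" * (-len(uci) % 4)  # assumed: padding octets are zero
    return header + uci + padding


def unpack_um_header(packet: bytes) -> dict:
    """Split the fixed 8-octet header into its fields."""
    first, pt, length, ssrc = struct.unpack("!BBHI", packet[:8])
    return {"version": first >> 6, "umt": (first >> 1) & 0x1F,
            "early": bool(first & 1), "pt": pt,
            "length": length, "ssrc": ssrc}
```

         Note that the length field counts valid UCI bits rather than 32-bit
         words, so a receiver has to round it up to the next multiple of 32
         to find the end of the message.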


      5. Message Type 1: Picture Loss Indication (PLI)

         5.1 Semantics

         With the Picture Loss Indication message, a decoder informs the
         encoder about the loss of one or more full pictures.

         5.2 Format

         PLI does not require parameters.  Therefore, the length field MUST be
         0, and there MUST NOT be Upstream Control Information.
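
         Since a PLI carries no UCI, the whole message is just the 8-octet
         header of section 4.  A minimal sketch, assuming network byte order
         and a placeholder packet type (the PT value is TBD); build_pli and
         is_well_formed_pli are hypothetical helper names:

```python
import struct

UMT_PLI = 1
PT_PLACEHOLDER = 0  # NOT a real assignment: the draft leaves the PT value TBD


def build_pli(ssrc: int, early: bool = False,
              pt: int = PT_PLACEHOLDER) -> bytes:
    """Return a PLI message: header only, length field 0, no UCI."""
    first = (2 << 6) | (UMT_PLI << 1) | int(early)
    return struct.pack("!BBHI", first, pt, 0, ssrc)


def is_well_formed_pli(message: bytes) -> bool:
    """Check the constraints of section 5.2 on a single UM."""
    if len(message) != 8:  # any extra octets would be UCI, which is forbidden
        return False
    first, _pt, length, _ssrc = struct.unpack("!BBHI", message)
    return (first >> 6 == 2
            and (first >> 1) & 0x1F == UMT_PLI
            and length == 0)
```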

         5.3 Timing Rules

         The timing follows the rules outlined in section 3.  In systems that
         employ both PLI and other UM types it may be advisable to follow the
         regular RTCP RR timing rules, since PLI is not as delay-critical as
         other UM types.

         5.4 Remarks

         PLI messages typically trigger the sending of full Intra pictures.
         Intra pictures are several times larger than predicted (Inter)
         pictures.  Their size is independent of the time they are generated.
         In most environments, especially when employing bandwidth-limited
         links, the use of an Intra picture implies an allowed delay that is a
         significant multiple of the typical frame duration.  An example: If
         the sending frame rate is 10 fps, and an Intra picture is assumed to
         be 10 times as big as an Inter picture (not an unrealistic
         assumption, see [] for details), then transmitting it takes about
         ten frame durations, so a full second of latency has to be accepted.
         In such an environment there is no need for a particularly short
         delay in sending the upstream message.  Hence, waiting for the next
         possible time slot allowed by RFC1889bis RTCP timing rules does not
         negatively influence system performance.
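
         The arithmetic behind the example can be written down directly.  The
         sketch assumes a link whose bit rate is just sufficient to carry one
         Inter picture per frame interval, which is a simplification and not
         something the draft states; intra_transmission_delay is a
         hypothetical name.

```python
def intra_transmission_delay(frame_rate: float,
                             intra_to_inter_ratio: float) -> float:
    """Seconds needed to send one Intra picture.

    Assumes one Inter picture occupies exactly one frame interval on the
    link, so an Intra picture occupies intra_to_inter_ratio intervals.
    """
    if frame_rate <= 0:
        raise ValueError("frame rate must be positive")
    return intra_to_inter_ratio / frame_rate


# The draft's example: 10 fps, Intra picture 10 times the size of an Inter
# picture, giving one full second.
delay = intra_transmission_delay(frame_rate=10, intra_to_inter_ratio=10)
```

         Under the same assumption, a frame rate of 25 fps with the same
         size ratio gives 0.4 seconds, still ten frame durations.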

      6. Message Type 2: Slice Lost Indication

         6.1 Semantics

         With the Slice Lost Indication, a decoder can inform an encoder that
         it was unable to decode one or several consecutive macroblocks.
         The encoder can take appropriate action in order to re-synchronize
         encoder and decoder by a means of its choice, typically by sending
         the lost macroblocks in Intra mode.  This upstream message SHALL NOT
         be used for video codecs with non-uniform, dynamically changeable
         macroblock sizes such as H.263 with Annex Q enabled.  In such a case,
         an encoder cannot always identify the corrupted spatial region.


         6.2 Format

         When UMT indicates a Slice Lost Indication, there is one UCI field,
         the content of which has the following format:

          0                   1                   2                   3
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |            First        |  Number                 |  TR       |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         First: 13 bits
         The macroblock (MB) address of the first lost macroblock.  The MB
         numbering is done such that the macroblock in the upper left corner
         of the picture is considered macroblock number 1 and the number for
         each macroblock increases from left to right and then from top to
         bottom in raster-scan order (such that if there is a total of N
         macroblocks in a picture, the bottom right macroblock is considered
         macroblock number N).

         Number: 13 bits
         The number of lost macroblocks, in scan order as discussed above.

         TR: 6 bits
         The six least significant bits of the Temporal Reference of the
         picture.
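
         The 32-bit UCI word can be packed and unpacked as sketched below.
         The sketch assumes network byte order, plain unsigned binary coding
         of First and Number (so that values from 1 to 8191 are
         representable), and that Number is at least 1; none of these is
         spelled out above.  The function names are illustrative.

```python
import struct

FIELD_13_MAX = (1 << 13) - 1  # largest value of a 13-bit unsigned field


def pack_sli_uci(first: int, number: int, tr: int) -> bytes:
    """Pack First (13 bits), Number (13 bits) and TR (6 bits)."""
    if not 1 <= first <= FIELD_13_MAX:
        raise ValueError("First must be a 1-based address fitting 13 bits")
    if not 1 <= number <= FIELD_13_MAX:  # assumed: at least one MB is lost
        raise ValueError("Number must be between 1 and 8191")
    # Only the six least significant bits of the Temporal Reference are sent.
    word = (first << 19) | (number << 6) | (tr & 0x3F)
    return struct.pack("!I", word)


def unpack_sli_uci(uci: bytes) -> tuple:
    """Return (first, number, tr) from the first four UCI octets."""
    (word,) = struct.unpack("!I", uci[:4])
    return word >> 19, (word >> 6) & 0x1FFF, word & 0x3F
```

         The length field of the enclosing UM would be 32 in this case, as
         all bits of the UCI word are valid.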

         6.3 Timing Rules

         The efficiency of algorithms using the Slice Lost Indication is
         reduced greatly when the Indication is not transmitted in a timely
         fashion.  Motion compensation propagates corrupted pixels that are
         not reported as being corrupted.  Therefore, the use of the algorithm
         discussed in section 3 is highly recommended.

         Constraints on T_dither_max to be discussed.

         6.4 Remarks

         The First field of the UCI defines the first macroblock of a picture
         as 1 and not, as one might expect, as 0.  This was done to align
         this specification with the comparable mechanism available in H.245.
         The maximum number of macroblocks in a picture (2**13 or 8192)
         corresponds to the maximum picture sizes of the ITU-T and ISO/IEC
         video codecs.  If future video codecs offer larger picture sizes
         and/or smaller macroblock sizes, then an additional upstream message
         has to be defined.  The six least significant bits of the Temporal
         Reference field are deemed to be sufficient to indicate the picture
         in which the loss occurred.
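
         The 1-based raster-scan numbering can be illustrated with a small
         helper that converts a macroblock position into the address used in
         the First field.  It assumes fixed 16x16 macroblocks, as in the
         codecs this message is intended for; macroblock_address is a
         hypothetical name.

```python
MB_SIZE = 16  # macroblock width and height in luma samples (assumed fixed)


def macroblock_address(col: int, row: int, picture_width: int) -> int:
    """Return the 1-based raster-scan address of macroblock (col, row).

    col and row are 0-based macroblock coordinates; picture_width is in
    luma samples.
    """
    mbs_per_row = (picture_width + MB_SIZE - 1) // MB_SIZE
    if not 0 <= col < mbs_per_row or row < 0:
        raise ValueError("macroblock position outside the picture")
    return row * mbs_per_row + col + 1


# CIF (352x288) has 22 x 18 = 396 macroblocks.
top_left = macroblock_address(0, 0, 352)        # 1
bottom_right = macroblock_address(21, 17, 352)  # 396
```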

         Algorithms have been reported that keep track of the regions
         affected by motion compensation, in order to allow for a
         transmission of Intra macroblocks to all those areas, regardless of
         the timing of the UM [TBP.].  While the timing of the UM is less
         critical when those algorithms are used than without them, it has to
         be observed that those algorithms correct large parts of the picture
         and, therefore, have to transmit many more bits in case of delayed
         UMs.


      7. Message Type 3: Reference Picture Selection Indication

         7.1 Semantics

         Modern video coding standards such as MPEG-4 visual version 2 or
         H.263 version 2 allow the use of reference pictures older than the
         most recent one.  Typically, a first-in-first-out queue of reference
         pictures is maintained.  If an encoder has learned about a loss of
         encoder-decoder synchronicity, a known-as-correct reference picture
         can be used.  As this reference picture is temporally further away
         than usual, the resulting predictively coded picture will use more
         bits.

         Both MPEG-4 and H.263 define a binary format for the payload of an
         RPSI message that includes information such as the temporal ID of the
         damaged picture and the size of the damaged region.  This bit string
         is typically small (a couple of dozen bits), of variable length,
         and self-contained, i.e. it contains all information that is
         necessary to perform reference picture selection.

         Note that both MPEG-4 and H.263 allow the use of RPSI with positive
         feedback information as well.  That is, all correctly decoded
         pictures are reported.  Any form of positive feedback MUST NOT be
         used in a multicast environment (reporting positive feedback about
         individual reference pictures at RTCP intervals is not expected to
         be of much use anyway).  For point-to-point communication, positive
         feedback MAY be used but, again, the bit rate budget of RTCP
         feedback will prevent its use in most scenarios anyway.

         7.2 Format

         When UMT indicates an RPSI, the length field is set to the number
         of bits of the bit string that contains the RPS information.  This
         bit string follows, byte aligned, in the UCI field.  Bit padding is
         used to achieve 32-bit word alignment of the UCI message (and the
         whole packet).
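
         The relationship between the length field and the padded UCI can be
         sketched as follows.  The sketch treats the codec-defined RPS bit
         string as an opaque sequence of bits and assumes that padding bits
         are zero and follow the valid bits, which this section does not
         specify; build_rpsi_uci is a hypothetical name.

```python
def build_rpsi_uci(rps_bits: str) -> tuple:
    """Return (length_field, uci_octets) for an RPS bit string.

    rps_bits is a string of '0' and '1' characters, most significant bit
    first, as produced by the codec.
    """
    if any(bit not in "01" for bit in rps_bits):
        raise ValueError("rps_bits must contain only '0' and '1'")
    length = len(rps_bits)  # number of valid bits, goes into 'length'
    if length > 0xFFFF:
        raise ValueError("bit string does not fit the 16-bit length field")
    padded_len = -(-length // 32) * 32  # round up to a 32-bit boundary
    if padded_len == 0:
        return 0, b""
    padded = rps_bits.ljust(padded_len, "0")  # assumed: zero padding bits
    return length, int(padded, 2).to_bytes(padded_len // 8, "big")
```

         A 19-bit RPS string, for example, yields a length field of 19 and a
         4-octet UCI in which the last 13 bits are padding.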

         7.3 Timing Rules

         RPS is even more delay-critical than algorithms using SLI.  This
         is due to the fact that the older the RPS message is, the more bits
         the encoder has to spend to achieve encoder-decoder synchronicity.
         See [TBP.] for some information about the overhead of RPS for certain
         bit rate/frame rate/loss rate scenarios.

         Therefore, RPS messages should typically be sent as soon as possible,
         employing the algorithm of section 3.

         Constraints on T_dither_max to be discussed.

         7.4 Remarks

         [To Do]


      8. Security considerations

         RTP packets transporting information with the proposed payload
         format are subject to the security considerations discussed in the
         RTP specification [1].  This implies that confidentiality of the
         media streams is achieved by encryption.


         If the entire stream (extension data and AU data) is to be secured
         and all the participants are expected to have the keys to decode the
         entire stream, then the encryption is performed in the usual manner,
         and there is no conflict between the two operations (encapsulation
         and encryption).

         The need for a portion of the stream (e.g. extension data) to be
         encrypted with a different key, or not to be encrypted, would require
         application level signaling protocols to be aware of the usage of
         the XT field, and to exchange keys and negotiate their usage on the
         media and extension data separately.




      9. Acknowledgements

         Large parts of the syntax and the text concerned with RPS and NEWPRED
         were borrowed from an early I-D by Fukunaga et al. that was
         concerned with MPEG-4 ES packetization.

      10. Full Copyright Statement

         Copyright (C) The Internet Society (1999). All Rights Reserved.

         This document and translations of it may be copied and furnished to
         others, and derivative works that comment on or otherwise explain it
         or assist in its implementation may be prepared, copied, published
         and distributed, in whole or in part, without restriction of any
         kind, provided that the above copyright notice and this paragraph are
         included on all such copies and derivative works.

         However, this document itself may not be modified in any way, such as
         by removing the copyright notice or references to the Internet
         Society or other Internet organizations, except as needed for the
         purpose of developing Internet standards in which case the procedures
         for copyrights defined in the Internet Standards process must be
         followed, or as required to translate it into languages other than
         English.

         The limited permissions granted above are perpetual and will not be
         revoked by the Internet Society or its successors or assigns.

         This document and the information contained herein is provided on an
         "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
         TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
         BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
         HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
         MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

      11. Authors' Addresses

            Stephan Wenger (stewe@cs.tu-berlin.de)
            TU Berlin
            Sekr. FR 6-3
            Franklinstr. 28-29
            D-10587 Berlin
            Germany

            Joerg Ott (jo@tzi.uni-bremen.de)
            Universitaet Bremen TZI
            MZH 5180
            Bibliothekstr. 1
            D-28359 Bremen
            Germany


      12. Bibliography: TODO

