<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-midtskogen-netvc-chromapred-00"
     ipr="trust200902">
  <front>
    <title abbrev="Improved chroma prediction">Improved chroma prediction</title>

    <author fullname="Steinar Midtskogen" initials="S." surname="Midtskogen">
      <organization>Cisco</organization>

      <address>
        <postal>
          <street></street>

          <city>Lysaker</city>

          <country>Norway</country>
        </postal>

        <phone></phone>

        <email>stemidts@cisco.com</email>
      </address>
    </author>

    <date month="July" year="2016"/>

    <abstract>
      <t>This document describes the technique used to improve the chroma
  prediction in the Thor video codec.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>Modern video codecs such as <xref
      target="I-D.fuldseth-netvc-thor">Thor</xref> form predictions
      for the luma channel (Y) and chroma channels (U and V), which are
      encoded separately (in that order).  The prediction for each
      channel has spatial or temporal dependencies only in its own
      channel.  Most of the perceived information of a video is
      found in the luma channel, but correlations remain
      between the luma and chroma channels.  For instance, the same
      shape of an object can often be seen in all three channels, and
      if this correlation is not exploited, some structural
      information will be transmitted three times.  Thor attempts
      to improve the chroma prediction by finding a linear relationship
      between each of the initial chroma predictions and the luma
      prediction and, if certain criteria are satisfied, using that
      relationship to form a new prediction based on the reconstructed
      luma samples.

      </t>
    </section>

    <section title="Definitions">

      <section title="Requirements Language">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in <xref
        target="RFC2119">RFC 2119</xref>.</t>
      </section>
    </section>

    <section title="Background">
      <t>
        The improved predictions are derived from the reconstructed
        luma samples using a mapping. The underlying assumption is
        that the colours can be identified by their luminosities.
        Informally we can say that a new chroma prediction is formed
        from the reconstructed luma block painted with the colours of
        the initial chroma prediction.
      </t>
      <t>
        There is often a linear correlation between the luma and
        chroma channels, so that a chroma sample c can be expressed by
        the linear function
        <figure align="center" anchor="eq1" title="Linear relationship">
          <artwork align="center">
            <![CDATA[
c = a*y + b
            ]]>
          </artwork>
        </figure>
        where y is the corresponding luma sample.  This observation
        has previously been used in techniques to convert YUV
        4:2:0 and YUV 4:2:2 images to YUV 4:4:4, and in a (rejected)
        proposal for HEVC as a special intra mode.  Thor, however,
        generalises the prediction, so it does not depend on the
        coding mode (i.e. whether inter or intra, or the kind of
        inter/intra mode).
      </t>
      <t>
        Since it would be too costly to transmit the values a and b in
        the linear mapping, and since both the encoder and decoder
        must be able to compute identical predictions, a and b are
        derived from data available to both using linear regression.
      </t>
      </section>

      <section title="Computing the improved prediction">
      <t>
        The correlation in the predicted block is not always the same
        as in the reconstructed block, so the new prediction from luma
        might not be better even when there is a very good correlation
        in the predicted block.  Therefore, we can only expect an
        improvement if the initial prediction is bad, and the luma
        residual is used as an estimate for this.  The initial chroma
        prediction is kept unless the average squared difference
        between the reconstructed luma samples yr and the predicted
        luma samples y for an N*N prediction block is above 64:
        <figure align="center" anchor="eq2" title="Requirement for improvement 1">
          <artwork align="center">
            <![CDATA[
        _N_ _N_                  
        \   \                    
        /__ /__ (yr(i, j) - y(i, j)) ^ 2
        i=1 j=1                  
        -------------------------------- > 64
                       N*N        
            ]]>
          </artwork>
        </figure>
      </t>
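      <t>
        As a minimal sketch of this test, assuming 8-bit samples stored
        row by row, the comparison can be made without a division by
        scaling the threshold by N*N instead.  The function name
        luma_prediction_is_poor and its parameters are hypothetical and
        are not taken from the Thor source:
        <figure align="center">
          <artwork align="left">
            <![CDATA[
```c
#include <stdint.h>

/* Hypothetical helper: returns 1 when the mean squared difference
 * between the reconstructed luma block yr and the predicted luma
 * block y is above 64, and 0 otherwise.
 * n is the block width and height; stride is the distance in samples
 * between the starts of two consecutive rows in both buffers. */
int luma_prediction_is_poor(const uint8_t *yr, const uint8_t *y,
                            int n, int stride)
{
    int64_t sq = 0;

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int d = yr[i * stride + j] - y[i * stride + j];
            sq += d * d;
        }
    }
    /* sum / (N*N) > 64 is the same as sum > 64 * N * N. */
    return sq > (int64_t)64 * n * n;
}
```
            ]]>
          </artwork>
        </figure>
        A caller would keep the initial chroma prediction whenever the
        function returns 0.
      </t>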
      <t>
        The encoder and decoder must compute a and b using the same
        least-squares fit for an N*N prediction block, where y and c denote the
        luma and chroma samples in the initial prediction:
        <figure align="center" anchor="eq3" title="Equations for linear regression 1">
          <artwork align="center">
            <![CDATA[
        _N_ _N_                            _N_ _N_         
        \   \                              \   \           
 Ysum = /__ /__ y(i, j)             Csum = /__ /__ c(i, j) 
        i=1 j=1                            i=1 j=1         

        _N_ _N_                            _N_ _N_         
        \   \                              \   \           
YYsum = /__ /__ y(i, j) ^ 2        CCsum = /__ /__ c(i, j) ^ 2
        i=1 j=1                            i=1 j=1         
       
        _N_ _N_                  
        \   \                    
YCsum = /__ /__ y(i, j) * c(i, j)
        i=1 j=1                  
            ]]>
          </artwork>
        </figure>
      </t>
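      <t>
        The five sums can be accumulated in a single pass over the
        block.  The following is a sketch only, assuming 8-bit samples
        and a common row stride for both buffers; the type
        RegressionSums and the function regression_sums are
        hypothetical names introduced for illustration:
        <figure align="center">
          <artwork align="left">
            <![CDATA[
```c
#include <stdint.h>

/* Hypothetical container for the five sums of the regression. */
typedef struct {
    int32_t ysum;   /* sum of y       */
    int32_t csum;   /* sum of c       */
    int32_t yysum;  /* sum of y * y   */
    int32_t ccsum;  /* sum of c * c   */
    int32_t ycsum;  /* sum of y * c   */
} RegressionSums;

/* Accumulate the sums over an n*n block of predicted luma samples y
 * and predicted chroma samples c, both stored with the given stride. */
RegressionSums regression_sums(const uint8_t *y, const uint8_t *c,
                               int n, int stride)
{
    RegressionSums s = {0, 0, 0, 0, 0};

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int32_t ys = y[i * stride + j];
            int32_t cs = c[i * stride + j];
            s.ysum  += ys;
            s.csum  += cs;
            s.yysum += ys * ys;
            s.ccsum += cs * cs;
            s.ycsum += ys * cs;
        }
    }
    return s;
}
```
            ]]>
          </artwork>
        </figure>
        For a 2*2 block with y = {1, 2, 3, 4} and c = {2, 4, 6, 8} this
        gives Ysum = 10, Csum = 20, YYsum = 30, CCsum = 120 and
        YCsum = 60.
      </t>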
      <t>
        Each of these sums fits in a 32-bit signed integer.  The
        following must then be computed using 64-bit arithmetic:
        <figure align="center" anchor="eq4" title="Equations for linear regression 2">
          <artwork align="center">
            <![CDATA[
SSyy = YYsum - ((Ysum * Ysum) >> 2*log2(N))
SScc = CCsum - ((Csum * Csum) >> 2*log2(N))
SSyc = YCsum - ((Ysum * Csum) >> 2*log2(N))
            ]]>
          </artwork>
        </figure>
      </t>
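      <t>
        A sketch of this step in 64-bit arithmetic follows.  It assumes
        that the cross term subtracts the product of the two plain sums
        Ysum and Csum, as in the usual least-squares formulation, and
        that N is a power of two so that the division by N*N is a right
        shift.  The type RegressionSS and the function regression_ss
        are hypothetical names:
        <figure align="center">
          <artwork align="left">
            <![CDATA[
```c
#include <stdint.h>

/* Hypothetical container for the three scaled (co)variance terms. */
typedef struct {
    int64_t ssyy;
    int64_t sscc;
    int64_t ssyc;
} RegressionSS;

/* Derive SSyy, SScc and SSyc from the five sums.  log2n is log2(N).
 * The products of sums are non-negative for unsigned samples, so the
 * right shifts below are well defined. */
RegressionSS regression_ss(int64_t ysum, int64_t csum, int64_t yysum,
                           int64_t ccsum, int64_t ycsum, int log2n)
{
    int shift = 2 * log2n;  /* divide by N*N */
    RegressionSS r;

    r.ssyy = yysum - ((ysum * ysum) >> shift);
    r.sscc = ccsum - ((csum * csum) >> shift);
    r.ssyc = ycsum - ((ysum * csum) >> shift);
    return r;
}
```
            ]]>
          </artwork>
        </figure>
        With Ysum = 10, Csum = 20, YYsum = 30, CCsum = 120, YCsum = 60
        and N = 2, the results are SSyy = 5, SScc = 20 and SSyc = 10.
      </t>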
      <t>
        Still using 64-bit arithmetic, if
        <figure align="center" anchor="eq5" title="Requirement for improvement 2">
          <artwork align="center">
            <![CDATA[
SSyy > 0 /\ 2 * SSyc * SSyc > SSyy * SScc
            ]]>
          </artwork>
        </figure>
        then it is assumed that the correlation is
        reasonably good and a new prediction will be computed and used.
        Otherwise, the initial prediction will be kept.  First,
        a and b must be computed:
        <figure align="center" anchor="eq6" title="Equation for linear regression 3">
          <artwork align="center">
            <![CDATA[
a = (SSyc << 16) / SSyy
b = ((Csum << 16) - a * Ysum) >> 2*log2(N)
            ]]>
          </artwork>
        </figure>
        The final operations are performed with 32-bit arithmetic, so
        a must be clipped to [-2^23, 2^23] and b must be clipped to
        [-2^31, 2^31-1].  Then a new chroma prediction c' is computed
        using the reconstructed luma samples yr, a and b, and a
        clipping function saturating the results to an 8-bit value:
        <figure align="center" anchor="eq7" title="Improved chroma prediction">
          <artwork align="center">
            <![CDATA[
c'(i, j) = clip((a * yr(i, j) + b) >> 16)
            ]]>
          </artwork>
        </figure>
      </t>
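      <t>
        The decision and the fit can be sketched as follows.  The
        sketch makes several assumptions that should be checked against
        the reference implementation: the correlation test compares
        twice the squared covariance term SSyc with the product
        SSyy * SScc, the offset b is derived from the plain sums Csum
        and Ysum, division truncates toward zero as in C, and the final
        multiply-add is widened to 64 bits to avoid signed overflow, so
        it does not reproduce any 32-bit wraparound.  All function
        names are hypothetical:
        <figure align="center">
          <artwork align="left">
            <![CDATA[
```c
#include <stdint.h>

/* Saturate v to the closed range [lo, hi]. */
static int64_t clip64(int64_t v, int64_t lo, int64_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* floor(v / 2^k) for either sign of v.  This avoids right-shifting a
 * negative value, which is implementation-defined in C. */
static int64_t floor_shift(int64_t v, int k)
{
    int64_t d = (int64_t)1 << k;
    return v >= 0 ? v >> k : -((-v + d - 1) >> k);
}

/* Returns 1 and stores the 16.16 fixed-point slope *a and offset *b
 * when the correlation criterion holds; returns 0 (leaving *a and *b
 * untouched) when the initial prediction should be kept. */
int fit_chroma_model(int64_t ysum, int64_t csum, int64_t ssyy,
                     int64_t sscc, int64_t ssyc, int log2n,
                     int32_t *a, int32_t *b)
{
    if (ssyy <= 0 || 2 * ssyc * ssyc <= ssyy * sscc)
        return 0;

    /* Multiply by 65536 rather than shifting left, since ssyc and
     * the intermediate values may be negative. */
    int64_t a64 = (ssyc * 65536) / ssyy;
    int64_t b64 = floor_shift(csum * 65536 - a64 * ysum, 2 * log2n);

    *a = (int32_t)clip64(a64, -(1 << 23), 1 << 23);
    *b = (int32_t)clip64(b64, INT32_MIN, INT32_MAX);
    return 1;
}

/* New chroma sample from one reconstructed luma sample yr,
 * saturated to the 8-bit range [0, 255]. */
uint8_t predict_chroma(int32_t a, int32_t b, uint8_t yr)
{
    int64_t v = (int64_t)a * yr + b;

    if (v < 0)
        return 0;  /* any negative value saturates to 0 */
    v >>= 16;
    return (uint8_t)(v > 255 ? 255 : v);
}
```
            ]]>
          </artwork>
        </figure>
        With Ysum = 10, Csum = 20, SSyy = 5, SScc = 20, SSyc = 10 and
        N = 2, the criterion holds and the fit gives a = 2 * 2^16 and
        b = 0, so a reconstructed luma sample of 7 maps to a chroma
        prediction of 14.
      </t>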
      <t>
        The above assumes 4:4:4 format.  For the 4:2:0 format the
        predicted luma block must be subsampled first:
        <figure align="center" anchor="eq8" title="Subsampling of predicted luma block">
          <artwork align="center">
            <![CDATA[
y'(i,j) = (y(2*i, 2*j)   + y(2*i+1, 2*j) +
           y(2*i, 2*j+1) + y(2*i+1, 2*j+1) + 2) >> 2
            ]]>
          </artwork>
        </figure>
        The resulting new chroma prediction must also be subsampled,
        with the clipping performed before the subsampling:
        <figure align="center" anchor="eq9" title="Subsampling of improved chroma prediction">
          <artwork align="center">
            <![CDATA[
c'(i, j) = (clip((a*yr(2*i, 2*j) + b) >> 16) +
            clip((a*yr(2*i+1, 2*j) + b) >> 16) +
            clip((a*yr(2*i, 2*j+1) + b) >> 16) +
            clip((a*yr(2*i+1, 2*j+1) + b) >> 16) + 2) >> 2
            ]]>
          </artwork>
        </figure>
        
      </t>
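      <t>
        The two 4:2:0 subsampling steps can be sketched as below,
        assuming 8-bit samples, a luma block of 2*nc by 2*nc samples
        and a chroma block of nc by nc samples stored without padding.
        Each output sample is the rounded average of a 2*2 group.  The
        function names, the parameter nc and the 64-bit widening of the
        multiply-add are assumptions made for illustration:
        <figure align="center">
          <artwork align="left">
            <![CDATA[
```c
#include <stdint.h>

/* One clipped prediction in [0, 255] from a luma sample yr, using the
 * 16.16 fixed-point slope a and offset b. */
static int predict_clipped(int32_t a, int32_t b, int yr)
{
    int64_t v = (int64_t)a * yr + b;

    if (v < 0)
        return 0;  /* any negative value saturates to 0 */
    v >>= 16;
    return v > 255 ? 255 : (int)v;
}

/* Subsample the predicted luma block y into the nc*nc block out. */
void subsample_luma_420(const uint8_t *y, int stride,
                        uint8_t *out, int nc)
{
    for (int i = 0; i < nc; i++) {
        for (int j = 0; j < nc; j++) {
            const uint8_t *p = y + 2 * i * stride + 2 * j;
            out[i * nc + j] = (uint8_t)((p[0] + p[1] + p[stride] +
                                         p[stride + 1] + 2) >> 2);
        }
    }
}

/* Form the nc*nc improved chroma block out from the reconstructed
 * luma block yr: predict and clip at luma resolution first, then
 * average each 2*2 group with rounding. */
void improved_chroma_420(const uint8_t *yr, int stride,
                         int32_t a, int32_t b, uint8_t *out, int nc)
{
    for (int i = 0; i < nc; i++) {
        for (int j = 0; j < nc; j++) {
            const uint8_t *p = yr + 2 * i * stride + 2 * j;
            int sum = predict_clipped(a, b, p[0]) +
                      predict_clipped(a, b, p[1]) +
                      predict_clipped(a, b, p[stride]) +
                      predict_clipped(a, b, p[stride + 1]);
            out[i * nc + j] = (uint8_t)((sum + 2) >> 2);
        }
    }
}
```
            ]]>
          </artwork>
        </figure>
        Clipping before averaging matters when only some of the four
        samples saturate, since averaging first and clipping afterwards
        could then give a different result.
      </t>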
      <t>
        In intra mode the chroma prediction improvement must be
        performed right after each transform, since the new chroma
        reconstruction will be used to predict the next block.
      </t>
    </section>

<section title="Performance">
<t>
  The improved chroma prediction may significantly improve the
  compression efficiency for images or video containing high
  correlations between the channels. It is particularly useful for
  encoding screen content, 4:4:4 content, high frequency content and
  "difficult" content where traditional prediction techniques perform
  poorly.  Little quality change is seen for content not in these
  categories, but there is a general small increase in chroma PSNR.
  </t><t>

  An encoder configured for low delay and medium complexity was used
  for the following results.  The numbers have been computed using the
  Bjontegaard Delta Rate (<xref target="BDR">BDR</xref>).  The rates
  for Y, U and V are shown separately.

</t>
<t>
<figure align="center" anchor="perf1" title="Compression Performance, improved prediction for intra blocks only">
<artwork align="center">
<![CDATA[
+--------------+--------------------+--------------------+
|              |        4:4:4       |        4:2:0       |
+--------------+------+------+------+------+------+------+
|Sequence      |   Y  |   U  |   V  |   Y  |   U  |   V  |
+--------------+------+------+------+------+------+------+
|cad_waveform  |-14.2%|-17.5%|-16.1%| -3.7%| -5.2%| -5.3%|
|pcb_layout    | -4.8%| -7.1%| -8.2%| -1.1%| -1.8%| -1.5%|
|ppt_doc_xls   |-19.6%| -9.1%|-10.8%| -0.3%| -1.2%| -0.0%|
|vc_doc_sharing| -3.0%| -6.5%| -6.7%| -0.0%| -0.2%| -2.1%|
|web_browsing  | -0.5%| -0.8%| -0.8%| -0.7%| -3.6%| -1.1%|
|wordEditing   | -4.3%| -6.0%| -3.5%| -0.1%| -0.4%| -0.7%|
|park_joy      | -0.2%| -0.5%| -0.2%| -0.5%| -4.4%| -1.1%|
|old_town_cross| -0.2%| -1.4%| -0.7%| -0.0%| -4.2%| -1.7%|
+--------------+------+------+------+------+------+------+
|Average       | -5.9%| -6.1%| -5.9%| -0.8%| -2.6%| -1.7%|
+--------------+------+------+------+------+------+------+
]]>
</artwork>
</figure>
</t>


<t>
<figure align="center" anchor="perf4" title="Compression Performance, improved prediction using intra only coding">
<artwork align="center">
<![CDATA[
+--------------+--------------------+--------------------+
|              |        4:4:4       |        4:2:0       |
+--------------+------+------+------+------+------+------+
|Sequence      |   Y  |   U  |   V  |   Y  |   U  |   V  |
+--------------+------+------+------+------+------+------+
|cad_waveform  |-22.6%|-27.9%|-25.9%| -2.8%| -3.9%| -3.7%|
|pcb_layout    |-18.9%|-27.1%|-20.5%| -1.1%| -1.8%| -1.6%|
|ppt_doc_xls   | -6.4%|-12.4%|-13.5%| -0.4%| -0.2%| -0.8%|
|vc_doc_sharing| -5.7%|-11.9%|-11.9%| -0.1%| -2.9%| -0.6%|
|web_browsing  | -1.4%| -1.8%| -1.8%| -0.6%| -1.0%| -1.2%|
|wordEditing   |-12.9%|-16.3%|-13.5%| -0.3%| -5.4%| -1.2%|
|park_joy      | -5.7%| -7.3%| -6.9%| -1.3%| -3.0%| -1.9%|
|old_town_cross| -1.9%| -2.4%| -2.4%| -0.2%| -4.9%| -1.7%|
+--------------+------+------+------+------+------+------+
|Average       | -9.4%|-13.4%|-12.1%| -0.8%| -2.8%| -1.7%|
+--------------+------+------+------+------+------+------+
]]>
</artwork>
</figure>
</t>

<t>
<figure align="center" anchor="perf2" title="Compression Performance, improved prediction for inter blocks only">
<artwork align="center">
<![CDATA[
+--------------+--------------------+--------------------+
|              |        4:4:4       |        4:2:0       |
+--------------+------+------+------+------+------+------+
|Sequence      |   Y  |   U  |   V  |   Y  |   U  |   V  |
+--------------+------+------+------+------+------+------+
|cad_waveform  |-10.3%|-13.5%|-11.6%| -0.6%| -1.1%| -1.3%|
|pcb_layout    | -3.6%| -5.8%| -5.2%|  0.0%|  0.0%|  0.0%|
|ppt_doc_xls   | -1.1%| -0.6%| -0.5%|  0.0%|  0.0%|  0.0%|
|vc_doc_sharing| -0.0%|  0.0%| -1.5%|  0.0%| -0.1%|  0.1%|
|web_browsing  | -0.1%| -0.1%| -0.1%|  0.0%| -0.2%| -0.4%|
|wordEditing   | -9.2%|-13.3%|-13.1%|  0.0%| -0.1%|  0.1%|
|park_joy      | -1.3%| -7.1%| -1.1%| -0.3%| -8.0%| -1.5%|
|old_town_cross|  0.0%| -0.1%|  0.1%|  0.0%| -0.0%|  0.0%|
+--------------+------+------+------+------+------+------+
|Average       | -3.2%| -5.1%| -4.1%| -0.1%| -1.2%| -0.4%|
+--------------+------+------+------+------+------+------+
]]>
</artwork>
</figure>
</t>

<t>
<figure align="center" anchor="perf3" title="Compression Performance, improved prediction for intra and inter blocks">
<artwork align="center">
<![CDATA[
+--------------+--------------------+--------------------+
|              |        4:4:4       |        4:2:0       |
+--------------+------+------+------+------+------+------+
|Sequence      |   Y  |   U  |   V  |   Y  |   U  |   V  |
+--------------+------+------+------+------+------+------+
|cad_waveform  |-20.0%|-24.7%|-22.4%| -4.1%| -5.7%| -5.6%|
|pcb_layout    | -7.3%|-11.1%|-10.1%| -1.1%| -1.8%| -1.6%|
|ppt_doc_xls   |-19.6%| -8.9%| -9.0%| -0.3%| -1.2%| -0.8%|
|vc_doc_sharing| -3.2%| -6.5%|-10.1%|  0.2%| -0.0%| -0.5%|
|web_browsing  | -0.5%| -0.3%| -0.5%| -0.8%| -3.7%| -2.5%|
|wordEditing   | -9.3%|-14.1%|-13.9%| -0.1%| -1.0%| -0.6%|
|park_joy      | -1.4%| -7.4%| -1.2%| -0.8%| -9.9%| -1.4%|
|old_town_cross| -0.2%| -1.4%| -0.5%| -0.0%| -4.3%| -1.7%|
+--------------+------+------+------+------+------+------+
|Average       | -7.7%| -9.3%| -8.5%| -0.9%| -3.4%| -1.8%|
+--------------+------+------+------+------+------+------+
]]>
</artwork>
</figure>
</t>

</section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document has no IANA considerations yet. TBD</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>This document has no security considerations yet. TBD</t>
    </section>

    <section anchor="Acknowledgments" title="Acknowledgments">
      <t>The author would like to thank Arild Fuldseth and Mo Zanaty for reviewing
        this document and design.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>
      <?rfc include='reference.I-D.fuldseth-netvc-thor'?>
    </references>

    <references title="Informative References">
      <reference anchor="BDR">
         <front>
           <title>Calculation of average PSNR differences between RD-curves</title>
           <author initials="G." surname="Bjontegaard" fullname="Gisle Bjontegaard" />
           <date month="April" year="2001"/>
         </front>
        <seriesInfo name="ITU-T SG16 Q6 VCEG-M33" value="" />
      </reference>
    </references>
  </back>
</rfc>
