<?xml version="1.0" encoding="utf-8"?>
<!--
     draft-rfcxml-general-template-standard-00

     This template includes examples of the most commonly used features of RFCXML with comments
     explaining how to customise them. This template can be quickly turned into an I-D by editing
     the examples provided. Look for [REPLACE], [REPLACE/DELETE], [CHECK] and edit accordingly.
     Note - 'DELETE' means delete the element or attribute, not just the contents.

     Documentation is at https://authors.ietf.org/en/templates-and-schemas
-->
<?xml-model href="rfc7991bis.rnc"?>  <!-- Required for schema validation and schema-aware editing -->
<!-- <?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?> -->
<!-- This third-party XSLT can be enabled for direct transformations in XML processors, including most browsers -->


<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<!-- If further character entities are required then they should be added to the DOCTYPE above.
     Use of an external entity file is not recommended. -->

<rfc
  xmlns:xi="http://www.w3.org/2001/XInclude"
  category="std"
  consensus="true"
  docName="draft-ietf-mlcodec-opus-scalable-quality-extension-00"
  ipr="trust200902"
  obsoletes=""
  updates="6716"
  submissionType="IETF"
  xml:lang="en"
  version="3">

  <front>
    <title abbrev="Scalable Quality Extension">Scalable Quality Extension for the Opus Codec (Opus HD)</title>

    <seriesInfo name="Internet-Draft" value="draft-ietf-mlcodec-opus-scalable-quality-extension-00"/>

    <author fullname="Jean-Marc Valin" initials="JM" surname="Valin">
      <organization>Google</organization>
      <address>
        <postal>
          <country>CA</country>
          <!-- Uses two letter country code -->
        </postal>
        <email>jeanmarcv@google.com</email>
      </address>
    </author>

    <date year="2025"/>
    <!-- On draft submission:
         * If only the current year is specified, the current day and month will be used.
         * If the month and year are both specified and are the current ones, the current day will
           be used
         * If the year is not the current one, it is necessary to specify at least a month and day="1" will be used.
    -->

    <area>Applications and Real-Time</area>
    <workgroup>mlcodec</workgroup>
    <!-- "Internet Engineering Task Force" is fine for individual submissions.  If this element is
          not present, the default is "Network Working Group", which is used by the RFC Editor as
          a nod to the history of the RFC Series. -->

    <keyword>Opus</keyword>
    <keyword>RFC6716</keyword>

    <abstract>
      <t>This document updates RFC 6716 to add support for a scalable quality extension layer.</t>
    </abstract>

  </front>

  <middle>

    <section>
      <name>Introduction</name>
      <t>This document updates <xref target="RFC6716"/> to add support for a scalable quality extension layer.
        Implementations conforming to this document are referred to as Opus HD.</t>

      <section>
        <name>Requirements Language</name>
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
          "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT
          RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
          interpreted as described in BCP 14 <xref target="RFC2119"/>
          <xref target="RFC8174"/> when, and only when, they appear in
          all capitals, as shown here.</t>
      </section>

    </section>

    <section anchor="format">
      <name>Scalable Quality Extension</name>
      <t>The Opus codec was designed to operate at sampling frequencies up to 48 kHz,
        with an audio bandwidth up to 20 kHz.
        The CELT mode that is used for high bitrate coding uses vector quantization with
        a mostly implicit bit allocation system that is dictated by the bitstream definition.
        Opus can allocate up to 8 bits per MDCT bin in some of the bands.
      </t>

      <t>While the Opus capabilities listed above are sufficient to achieve
        perceptually transparent audio coding, there is a use for codecs that scale
        beyond those specifications. That includes the current market for 24-bit/96 kHz codecs,
        but also any application where the intended recipient is not (only) a human being,
        e.g., ultrasonic applications.</t>

      <t>This document proposes a scalable quality extension layer that both increases the
        resolution of existing Opus quantizers below 20 kHz, and defines a way of coding audio
        above 20 kHz, with a sampling rate of 96 kHz. The extension is designed to be forward
        and backward compatible with <xref target="RFC6716"/>.
        All extra bits use the Opus extension mechanism defined in <xref target="opus-extension"/>
        and a 96 kHz decoder is designed to decode a regular 48 kHz RFC 6716 stream and vice versa.
        </t>

      <t>
        The code corresponding to this draft (work in progress) is available on the exp_qext26
        branch of the Opus repository at https://gitlab.xiph.org/xiph/opus/ .
      </t>

      <section anchor="existing">
        <name>Extended resolution</name>
        <t>
        To reduce the coding error, we need to increase the resolution of three different quantizers:
        the fine energy quantizer (scalar), the band pyramid vector quantizer (PVQ), and the
        band splitting angle quantizer.
        We also introduce a new cubic quantizer that scales to higher
        bit depths than PVQ.
        To preserve compatibility, all of the bits extending the Opus resolution are stored in
        the extension payload.</t>
        <section anchor="fine">
          <name>Fine energy quantizer</name>
          <t>For each band we can increase the resolution of the fine energy quantizer by adding
            extra bits.
            The extra bits are added in the same way as the regular fine energy quantizer adds
            resolution on top of the coarse energy quantizer.</t>
        </section>
        <section anchor="pvq">
          <name>PVQ</name>
          <t>From a size-K PVQ codebook in N dimensions we can create an extended codebook of size
            u*K, where u is always odd and selected as 2^b-1, where b is the extra depth.
            Let y_i be the (integer) value for dimension i of the size-K codebook and z_i be
            the corresponding value for the size-u*K codebook.
            We define a refinement r_i = z_i - u*y_i where |r_i| &lt; u.
            In the N=2 special case, |r_i| &lt; (u+1)/2.
            Only the refinement r_i needs to be coded since the regular Opus bitstream already
            includes y_i.
            The last refinement value r_{N-1} does not need to be coded since its value can be
            inferred from the other values and the knowledge that the sum of the absolute
            values is u*K. The only exception is when y_{N-1}=0, in which case a single
            sign bit is coded, but the magnitude is still inferred.</t>
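          <t>As a non-normative illustration of the refinement described above, the following
            Python sketch rebuilds the extended codeword z from the base codeword y and the coded
            refinements. The function name, the argument layout, and the convention that last_sign
            is +1 or -1 are assumptions made for this example only; they are not taken from the
            reference implementation.</t>
          <sourcecode type="python"><![CDATA[
def reconstruct_extended_pvq(y, r, K, b, last_sign=1):
    """Rebuild the size-u*K codeword z from the base codeword y.

    y: integer base-layer codeword with sum(|y_i|) == K
    r: coded refinements r_0 .. r_{N-2} (the last one is inferred)
    b: extra depth, so that u = 2^b - 1
    last_sign: +1 or -1, only used when y[N-1] == 0
    """
    u = (1 << b) - 1
    # z_i = u*y_i + r_i for every dimension except the last one.
    z = [u * yi + ri for yi, ri in zip(y[:-1], r)]
    # The absolute values of z sum to u*K, which fixes the magnitude
    # of the last value.
    last_mag = u * K - sum(abs(zi) for zi in z)
    if last_mag < 0:
        raise ValueError("refinements exceed the u*K budget")
    # When y[N-1] != 0, |r| < u implies z[N-1] has the sign of y[N-1].
    # Otherwise the explicitly coded sign bit is used.
    if y[-1] > 0:
        sign = 1
    elif y[-1] < 0:
        sign = -1
    else:
        sign = last_sign
    z.append(sign * last_mag)
    return z
]]></sourcecode>
          <t>For example, with y = [2, -1, 0], K = 3, b = 2 (u = 3), r = [-1, 0] and a negative
            sign for the last dimension, the sketch yields z = [5, -3, -1], whose absolute values
            sum to u*K = 9.</t>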
          <t>
            Even though |r_i| &lt; u, smaller values of r_i are more likely, so we benefit from
            entropy coding r_i. We assume that the likelihood of |r_i| &lt; (u+1)/2 is 7/8 and
            use that probability for decoding a "large" flag. If large=0, we decode b bits
            and subtract u/2 to get r_i. If large=1, we decode a sign bit, followed by an integer
            with b-1 bits to which we add u/2+1 and apply the sign.
            </t>
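          <t>The mapping from decoded symbols to r_i can be sketched as follows (non-normative).
            The sketch assumes that u/2 denotes integer division, that raw bits are read most
            significant bit first, and that the "large" flag has already been decoded by the
            range decoder; the helper names make_bit_reader and decode_refinement are
            hypothetical.</t>
          <sourcecode type="python"><![CDATA[
def make_bit_reader(bits):
    """Return a read_bits(n) function over a list of 0/1 values."""
    it = iter(bits)

    def read_bits(n):
        value = 0
        for _ in range(n):
            value = (value << 1) | next(it)
        return value

    return read_bits


def decode_refinement(large, read_bits, b):
    """Decode one refinement r_i for extra depth b (u = 2^b - 1)."""
    u = (1 << b) - 1
    half = u // 2  # assumption: "u/2" in the text is integer division
    if large:
        negative = read_bits(1)
        magnitude = read_bits(b - 1) + half + 1
        return -magnitude if negative else magnitude
    # Small values: b raw bits, re-centered around zero.
    return read_bits(b) - half
]]></sourcecode>
          <t>With b = 3 (u = 7), the raw bits 1,0,1 with large=0 give r_i = 5 - 3 = 2, while the
            same bits with large=1 are read as a negative sign followed by the value 1, giving
            r_i = -(1 + 3 + 1) = -5.</t>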
        </section>
        <section anchor="angle">
          <name>Angle quantizer</name>
          <t>When using mid-side stereo or when splitting a band, we code an angle representing
            the arctangent of the ratio of two sub-vectors' magnitudes.
            The standard Opus encoder can code angles with up to 8 bits.
            In a similar way to how we define the PVQ refinement, we pick u = 2^b-1 where u is
            the number of (equidistant) extra quantization levels to be added between each of
            the original levels.
            We code a unit symbol between 0 and u-1, where 0 is almost mid-point to the previous
            (lower) quantization level, u-1 is almost mid-point to the next (higher) level,
            and (u+1)/2 perfectly lines up with the originally selected quantization of the
            standard Opus layer.
            </t>
        </section>
        <section anchor="cubic">
          <name>Cubic quantizer</name>
          <t>The existing Opus PVQ only scales up to 32-bit codebooks. For cases where there is
            no PVQ in the base Opus layer, we define a new cubic quantizer.
            Whereas the PVQ codebook is defined as a reflected simplex warped onto the unit sphere,
            the cubic quantizer warps an N-dimensional cubic shell to the same unit sphere.
            Cubic codewords specify which face of the cube the vector lies on by coding the dimension
            and sign of the largest component (using 1+log2(N) bits).
            The face of an N-dimensional hypercube shell is a full (N-1)-dimensional cube and can be
            coded with N-1 scalar values from 0 to Q-1 ((N-1)*log2(Q) bits).
            We use even Q (Q=2^b) for non-transient bands (B==1) and odd Q (Q=2^b-1) for transient
            bands (B>1).
            </t>
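          <t>As a non-normative illustration, the size of a cubic codeword implied by the two bit
            counts above can be computed as follows. The function name and the boolean transient
            argument (standing for B&gt;1) are assumptions of this sketch.</t>
          <sourcecode type="python"><![CDATA[
import math


def cubic_codeword_bits(N, b, transient):
    """Bits needed for one cubic codeword in N dimensions at depth b."""
    # Even Q for non-transient bands, odd Q for transient bands.
    Q = (1 << b) - 1 if transient else (1 << b)
    # Dimension and sign of the largest component.
    face_bits = 1 + math.log2(N)
    # N-1 scalar values in [0, Q-1] on the selected face.
    coord_bits = (N - 1) * math.log2(Q)
    return face_bits + coord_bits
]]></sourcecode>
          <t>For N = 8 and b = 4 in a non-transient band (Q = 16), this gives 1 + 3 + 7*4 = 32 bits;
            the transient case (Q = 15) needs slightly fewer.</t>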
        </section>
      </section>

      <section anchor="extra">
        <name>Extended frequency range</name>
        <t>
        To extend the audio bandwidth, we need to define more frequency bands.
        Because psychoacoustics is no longer involved past 20 kHz, all new bands are defined
        to have a width of 2 kHz. Therefore, when encoding 48-kHz content we add 2 extra bands
        and when encoding 96-kHz content, we add 14 extra bands. A flag is encoded to specify
        whether 2 or 14 bands are added. The decoder uses that flag to know how many bands to
        decode, regardless of whether decoding at 48 or 96 kHz.</t>
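        <t>The band counts above follow from simple arithmetic, sketched here in Python
          (non-normative). The sketch assumes that the first added band starts at exactly 20 kHz
          and that the added bands extend to the Nyquist frequency; the function name is
          hypothetical.</t>
        <sourcecode type="python"><![CDATA[
def extra_band_edges(sample_rate_hz):
    """Edges (in Hz) of the 2 kHz-wide bands added above 20 kHz."""
    nyquist = sample_rate_hz // 2
    return list(range(20000, nyquist + 1, 2000))


edges_48k = extra_band_edges(48000)   # 20, 22, 24 kHz
edges_96k = extra_band_edges(96000)   # 20, 22, ..., 48 kHz
bands_48k = len(edges_48k) - 1        # 2 extra bands
bands_96k = len(edges_96k) - 1        # 14 extra bands
]]></sourcecode>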
      </section>

      <section anchor="allocation">
        <name>Bit allocation</name>
        <t>
        The allocation of the extra bit depth b is explicitly signaled one band at a time, using
        a resolution of 1/4 bit depth between 0 and a band-dependent cap C, where C=12 for bands
        up to 20 kHz, and C=14 for the added bands.
        For band b_i, we use entropy coding to give a higher probability to three different cases:
        b_i=0, b_i=C, and b_i=b_{i-1}.
        In the case where b_{i-1} is either 0 or C, we merge two of the probabilities.
        The ICDF for the general case is {120, 112, 70, 0}, where the first symbol means
        b_i=0, the second means b_i=C, the third means b_i=b_{i-1}, and the last symbol means
        that b_i is equal to 1 plus a unit value coded from 0 to C-1.
        For b_{i-1} = 0, we use the ICDF {64, 50, 0} and for b_{i-1}=C, we use {110, 60, 0},
        where the last symbol always means that a unit is coded.
        We start with b_{-1} = 0.
        </t>
        <t>
          Given b_i, the number of extra energy bits is (b_i+3)/4.
          The number of 1/8 bits (BITRES) allocated for PVQ refinement and/or cubic codebook bits is
          ((W-1)*C * b_i * 8 + 2)/4, where W is the number of bins in the band and C here denotes the
          number of channels (not the cap defined above).
        </t>
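        <t>The two formulas above can be sketched in Python as follows (non-normative). The
          sketch assumes that both divisions are integer (floor) divisions; the function names
          are hypothetical, and the channel count is named channels to avoid confusion with
          the cap C.</t>
        <sourcecode type="python"><![CDATA[
def extra_energy_bits(b_i):
    """Extra fine-energy bits for a band with signaled depth b_i."""
    return (b_i + 3) // 4   # assumption: integer division


def refinement_bits_q3(b_i, W, channels):
    """Bits, in 1/8-bit units, for PVQ refinement / cubic codebook."""
    return ((W - 1) * channels * b_i * 8 + 2) // 4
]]></sourcecode>
        <t>For instance, b_i = 4 with W = 8 bins and 2 channels gives 1 extra energy bit and
          112 eighth-bits (14 bits), i.e., one bit for each of the (W-1)*2 = 14 coded values.</t>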
      </section>

      <section anchor="time-domain">
        <name>Time-domain processing at 96 kHz</name>
        <t>CELT includes two time-domain filter pairs that require updating for 96 kHz:
          the preemphasis/deemphasis filters, as well as the pitch prefilter/postfilter.
          The CELT deemphasis filter is currently defined as D(z)=1/(1 - a1*z^-1) for a 48 kHz
          signal, where a1=27853/32768.
          To obtain approximately the same response in the 0-20 kHz range using a
          sampling rate of 96 kHz, we instead use D(z)=g*(1 - b1*z^-1)/(1 - a1*z^-1),
          where g=5415/8192, b1=7209/32768, a1=30245/32768.</t>
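        <t>As a non-normative check of the "approximately the same response" statement, the
          following Python sketch evaluates the magnitude response of both deemphasis filters
          at a few frequencies below 20 kHz. The function names and the choice of test
          frequencies are assumptions of this example.</t>
        <sourcecode type="python"><![CDATA[
import cmath
import math


def deemphasis_48k(f_hz):
    """Magnitude of D(z) = 1/(1 - a1*z^-1) at 48 kHz."""
    a1 = 27853 / 32768
    z_inv = cmath.exp(-2j * math.pi * f_hz / 48000)
    return abs(1 / (1 - a1 * z_inv))


def deemphasis_96k(f_hz):
    """Magnitude of D(z) = g*(1 - b1*z^-1)/(1 - a1*z^-1) at 96 kHz."""
    g = 5415 / 8192
    b1 = 7209 / 32768
    a1 = 30245 / 32768
    z_inv = cmath.exp(-2j * math.pi * f_hz / 96000)
    return abs(g * (1 - b1 * z_inv) / (1 - a1 * z_inv))


if __name__ == "__main__":
    for f_hz in (0, 1000, 10000, 20000):
        print(f_hz, deemphasis_48k(f_hz), deemphasis_96k(f_hz))
]]></sourcecode>
        <t>At these frequencies the two magnitudes are expected to agree to within a few
          percent, with the largest deviation near 20 kHz.</t>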

        <t>For the pitch pre-filter/post-filter, we use zero-insertion upsampling of the 48 kHz
          filters, which results in the same frequency response below 24 kHz and a "folded" image
          above 24 kHz. For example, if for a pitch period T (in 48 kHz units) the postfilter was
          P(z)=1/(1 - a0*z^(-T+1) - a1*z^(-T) - a2*z^(-T-1)), then for the same pitch, the 96 kHz filter
          becomes P(z)=1/(1 - a0*z^(-2T+2) - a1*z^(-2T) - a2*z^(-2T-2)).</t>
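        <t>The effect of zero-insertion upsampling on the postfilter taps can be sketched as
          follows (non-normative). The pitch period and the tap values below are hypothetical
          and chosen only for illustration; the function names are likewise assumptions.</t>
        <sourcecode type="python"><![CDATA[
import cmath
import math


def zero_insert_taps(taps_48k):
    """Map {delay: coefficient} taps from 48 kHz to 96 kHz."""
    return {2 * d: c for d, c in taps_48k.items()}


def postfilter_response(taps, f_hz, sample_rate_hz):
    """Evaluate P(z) = 1/(1 - sum(c * z^-d)) at frequency f_hz."""
    w = 2 * math.pi * f_hz / sample_rate_hz
    acc = sum(c * cmath.exp(-1j * w * d) for d, c in taps.items())
    return 1 / (1 - acc)


T = 100                                      # hypothetical pitch period
taps_48k = {T - 1: 0.2, T: 0.4, T + 1: 0.2}  # hypothetical a0, a1, a2
taps_96k = zero_insert_taps(taps_48k)        # delays 2T-2, 2T, 2T+2
]]></sourcecode>
        <t>Evaluating postfilter_response at the same frequency below 24 kHz gives the same
          value for taps_48k at 48 kHz and taps_96k at 96 kHz, and the 96 kHz magnitude at
          48 kHz minus f mirrors the magnitude at f, which is the "folded" image mentioned
          above.</t>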
      </section>

    </section>

    <section anchor="payload">
      <name>Format</name>
      <t>The extension payload is entropy-coded in the following order:</t>

      <table>
        <thead>
          <tr><th>Symbol(s)</th><th>PDF/Description</th></tr>
        </thead>
        <tbody>
          <tr><td>96 kHz flag</td> <td>{1, 1}/2</td></tr>
          <tr><td>Intensity stereo</td> <td>uint</td></tr>
          <tr><td>Dual stereo</td> <td>{1, 1}/2</td></tr>
          <tr><td>Intra coarse energy</td> <td>{7, 1}/8</td></tr>
          <tr><td>Coarse energy (high bands)</td> <td></td></tr>
          <tr><td>Bit allocation</td> <td><xref target="allocation"/></td></tr>
          <tr><td>Fine energy (low bands)</td> <td><xref target="fine"/></td></tr>
          <tr><td>PVQ refinement</td> <td><xref target="pvq"/>, <xref target="angle"/></td></tr>
          <tr><td>Fine energy (high bands)</td> <td><xref target="allocation"/></td></tr>
          <tr><td>PVQ and cubic codebook (high bands)</td> <td><xref target="cubic"/></td></tr>
        </tbody>
      </table>
    </section>

    <section anchor="conformance">
      <name>Conformance</name>
      <t>This section defines tests for evaluating Opus HD conformance. The evaluations are based on test vectors,
        along with a custom comparison tool named qext_compare, which is derived from the original opus_compare tool
        from <xref target="RFC6716"/>.</t>
      <section anchor="conformance-decoder">
        <name>Decoder</name>
        <t>For a decoder to conform to this specification, its output MUST be within the specified bounds
          for all test vectors when compared using qext_compare. Two sets of test vectors are provided. The first set,
          qext_vector01.bit through qext_vector06.bit, consists of high-quality 1024&nbsp;kb/s bitstreams for which the
          decoder target files are qext_vector01dec.f32 through qext_vector06dec.f32.</t>
        <t>Using the reference decoder, a test vector can be decoded as:</t>
        <sourcecode>
% opus_demo -d 96000 2 -f32 qext_vector01.bit qext_test01.f32
        </sourcecode>
        <t>
          Then the output can be compared to the reference with specific thresholds:</t>
        <sourcecode>
% qext_compare -s -f32 -thresholds 0.05 0.1 0.1 \
                       qext_vector01dec.f32 qext_test01.f32
          </sourcecode>
        <t>which will output "Comparison PASSED" if the tested decoder is close enough to the target output.
        </t>
        <t>The second set of test vectors is meant to test corner cases. The bitstream files are
          qext_vector01fuzz.bit through qext_vector06fuzz.bit, and the corresponding target files are
          qext_vector01decfuzz.f32 through qext_vector06decfuzz.f32. These are decoded in the same way as the first
          set of test vectors, but the comparison thresholds are looser:</t>
        <sourcecode>
% qext_compare -s -f32 -thresholds 0.1 0.5 1.0  \
                       qext_vector01decfuzz.f32 qext_test01fuzz.f32
        </sourcecode>
          <t>For the tested decoder to be deemed compliant with this specification, all test vectors from both
            sets MUST pass.</t>
      </section>
      <section anchor="conformance-encoder">
        <name>Encoder</name>
        <t>It is RECOMMENDED, but not mandatory, for an encoder to meet the following criteria.
          Encoder testing involves encoding uncoded test vectors, decoding them with the reference decoder, and
          comparing the result to the original uncoded files. The test is meant to evaluate encoding at bitrates around 1&nbsp;Mb/s.
          For example, encoding at 1024 kb/s can be done with the following command (the invocation may differ for the encoder being tested):</t>
        <sourcecode>
% opus_demo -e audio 96000 2 1024000 -f32 -cbr -qext \
                           qext_vector01.f32 qext_test01.bit
        </sourcecode>
        <t>The resulting bitstream then needs to be decoded with the reference implementation with:</t>
        <sourcecode>
% opus_demo -d 96000 2 -f32 qext_test01.bit qext_test01enc.f32
        </sourcecode>
        <t>The decoded output (from the tested encoder) can then be compared against
           the reference original uncoded PCM:</t>
        <sourcecode>
% qext_compare -s -f32 -skip &lt;N&gt; \
                   -thresholds 0.1 0.5 &lt;RMS threshold&gt; \
                   qext_vector01.f32 qext_test01enc.f32
          </sourcecode>
        <t>where the &lt;RMS threshold&gt; values for test vectors 1 through 6 are 5, 320, 20, 20, 40, and 5, respectively.
          The &lt;N&gt; value for the skip compensates for the encoder delay.
          For the "audio" encoding mode, N=624 samples.
          For "restricted-lowdelay", N=240 samples.</t>
        <t>Because of the inherent limitations of objective quality evaluation metrics -- including
          the qext_compare tool -- it is also RECOMMENDED to perform a subjective evaluation of an encoder.
          </t>
      </section>
    </section>

    <section anchor="IANA">
      <name>IANA Considerations</name>
      <t>[Note: Until the IANA performs the actions described below, implementers should use 124 instead of 33 as the extension number.]</t>
      <t>This document assigns ID 33 in the "Opus Extension IDs" registry created in
        <xref target="opus-extension"/> to the proposed scalable quality extension.</t>
    </section>


    <section anchor="Security">
      <!-- All drafts are required to have a security considerations section. See RFC 3552 for a guide. -->
      <name>Security Considerations</name>
      <t>This document does not add security considerations beyond those already documented in <xref target="RFC6716"/>.
          </t>
    </section>

  </middle>

  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>

        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.2119.xml"/>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.8174.xml"/>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.6716.xml"/>
        <reference anchor="opus-extension">
        <!-- Manually added reference -->
          <front>
            <title>Extension Formatting for the Opus Codec (draft-ietf-mlcodec-opus-extension)</title>
            <author initials="T.B." surname="Terriberry" fullname="Timothy B. Terriberry">
              <organization/>
            </author>
            <author initials="J.-M." surname="Valin" fullname="Jean-Marc Valin">
              <organization/>
            </author>
            <date year="2023" month="October"/>
            <abstract>
              <t>Opus extension format.
              </t>
            </abstract>
          </front>
        </reference>

        <!-- The recommended and simplest way to include a well known reference -->

      </references>

    </references>


 </back>
</rfc>
