<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?xml-stylesheet type='text/xsl' href='http://xml.resource.org/authoring/rfc2629.xslt' ?>
<!-- Alterations to I-D/RFC boilerplate -->
<?rfc private="" ?>
<!-- Default private="" Produce an internal memo 2.5pp shorter than an I-D or RFC -->
<?rfc rfcprocack="yes" ?>
<!-- Default rfcprocack="no" add a short sentence acknowledging xml2rfc -->
<?rfc strict="no" ?>
<!-- Default strict="no" Don't check I-D nits -->
<?rfc rfcedstyle="yes" ?>
<!-- Default rfcedstyle="yes" attempt to closely follow finer details from the latest observable RFC-Editor style -->
<!-- IETF process -->
<?rfc iprnotified="no" ?>
<!-- Default iprnotified="no" I haven't disclosed existence of IPR to IETF -->
<!-- ToC format -->
<?rfc toc="yes" ?>
<!-- Default toc="no" No Table of Contents -->
<!-- ToC depth -->
<?rfc tocdepth="4" ?>
<!-- Default tocDepth="3" Exclude subsections of depth >3 from Table of Contents -->
<!-- Cross referencing, footnotes, comments -->
<?rfc symrefs="yes"?>
<!-- Default symrefs="no" Don't use anchors, but use numbers for refs -->
<?rfc sortrefs="yes"?>
<!-- Default sortrefs="no" Don't sort references into order -->
<?rfc comments="yes" ?>
<!-- Default comments="no" Don't render comments -->
<?rfc inline="no" ?>
<!-- Default inline="no" if comments is "yes", then render comments inline; otherwise render them in an `Editorial Comments' section -->
<!-- Pagination control -->
<?rfc compact="yes"?>
<!-- Default compact="no" Start sections on new pages -->
<?rfc subcompact="no"?>
<!-- Default subcompact="(as compact setting)" yes/no is not quite as compact as yes/yes -->
<!-- HTML formatting control -->
<?rfc emoticonic="yes" ?>
<!-- Default emoticonic="no" Doesn't prettify HTML format -->
<rfc category="std" consensus="yes" docName="draft-ietf-tcpm-accurate-ecn-21"
     ipr="trust200902" updates="3168">
  <front>
    <title abbrev="Accurate TCP-ECN Feedback">More Accurate ECN Feedback in
    TCP</title>

    <author fullname="Bob Briscoe" initials="B." surname="Briscoe">
      <organization>Independent</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <country>UK</country>
        </postal>

        <email>ietf@bobbriscoe.net</email>

        <uri>http://bobbriscoe.net/</uri>
      </address>
    </author>

    <author fullname="Mirja K&uuml;hlewind" initials="M."
            surname="K&uuml;hlewind">
      <organization>Ericsson</organization>

      <address>
        <postal>
          <street/>

          <country>Germany</country>
        </postal>

        <email>ietf@kuehlewind.net</email>
      </address>
    </author>

    <author fullname="Richard Scheffenegger" initials="R."
            surname="Scheffenegger">
      <organization>NetApp</organization>

      <address>
        <postal>
          <street/>

          <city>Vienna</city>

          <region/>

          <code/>

          <country>Austria</country>
        </postal>

        <email>Richard.Scheffenegger@netapp.com</email>
      </address>
    </author>

    <date year=""/>

    <area>Transport</area>

    <workgroup>TCP Maintenance &amp; Minor Extensions (tcpm)</workgroup>

    <keyword>Congestion Control and Management</keyword>

    <keyword>Congestion Notification</keyword>

    <keyword>Feedback</keyword>

    <keyword>Reliable</keyword>

    <keyword>Ordered</keyword>

    <keyword>Protocol</keyword>

    <keyword>ECN</keyword>

    <abstract>
      <t>Explicit Congestion Notification (ECN) is a mechanism where network
      nodes can mark IP packets instead of dropping them to indicate incipient
      congestion to the end-points. Receivers with an ECN-capable transport
      protocol feed back this information to the sender. ECN was originally
      specified for TCP in such a way that only one feedback signal can be
      transmitted per Round-Trip Time (RTT). Recent TCP mechanisms like
      Congestion Exposure (ConEx), Data Center TCP (DCTCP) or Low Latency Low
      Loss Scalable Throughput (L4S) need more accurate ECN feedback
      information whenever more than one marking is received in one RTT. This
      document updates the original ECN specification to specify a scheme to
      provide more than one feedback signal per RTT in the TCP header. Given
      TCP header space is scarce, it allocates a reserved header bit
      previously assigned to the ECN-Nonce. It also overloads the two existing
      ECN flags in the TCP header. The resulting extra space is exploited to
      feed back the IP-ECN field received during the 3-way handshake as well.
      Supplementary feedback information can optionally be provided in a new
      TCP option, which is never used on the TCP SYN. The document also
      specifies the treatment of this updated TCP wire protocol by
      middleboxes.</t>
    </abstract>
  </front>

  <!-- ================================================================ -->

  <middle>
    <!-- ================================================================ -->

    <section anchor="accecn_Introduction" title="Introduction">
      <t>Explicit Congestion Notification (ECN) <xref target="RFC3168"/> is a
      mechanism where network nodes can mark IP packets instead of dropping
      them to indicate incipient congestion to the end-points. Receivers with
      an ECN-capable transport protocol feed back this information to the
      sender. In RFC 3168, ECN was specified for TCP in such a way that only
      one feedback signal could be transmitted per Round-Trip Time (RTT).
      More recently proposed mechanisms like Congestion Exposure (ConEx <xref
      target="RFC7713"/>), DCTCP <xref target="RFC8257"/> or L4S <xref
      target="I-D.ietf-tsvwg-l4s-arch"/> need to know when more than one
      marking is received in one RTT, which is information that the feedback
      scheme specified in <xref target="RFC3168"/> cannot provide. This
      document specifies an update to the ECN
      feedback scheme of RFC 3168 that provides more accurate information and
      could be used by these and potentially other future TCP extensions. A
      fuller treatment of the motivation for this specification is given in
      the associated requirements document <xref target="RFC7560"/>.</t>

      <t>This document specifies a standards-track scheme for ECN feedback in
      the TCP header to provide more than one feedback signal per RTT. It will
      be called the more accurate ECN feedback scheme, or AccECN for short.
      This document updates RFC 3168 with respect to negotiation and use of
      the feedback scheme for TCP. All aspects of RFC 3168 other than the TCP
      feedback scheme, in particular the definition of ECN at the IP layer,
      remain unchanged by this specification. <xref
      target="accecn_3168_updates"/> gives a more detailed specification of
      exactly which aspects of RFC 3168 this document updates.</t>

      <t>AccECN is intended to be a complete replacement for classic TCP/ECN
      feedback, not a fork in the design of TCP. AccECN feedback complements
      TCP's loss feedback and it can coexist alongside 'classic' <xref
      target="RFC3168"/> TCP/ECN feedback. So its applicability is intended to
      include all public and private IP networks (and even any non-IP networks
      over which TCP is used today), whether or not any nodes on the path
      support ECN, of whatever flavour. This document uses the term Classic
      ECN when it needs to distinguish the RFC 3168 ECN TCP feedback scheme
      from the AccECN TCP feedback scheme.</t>

      <t>AccECN feedback overloads the two existing ECN flags in the TCP
      header and allocates the currently reserved flag (previously called NS)
      in the TCP header, so that the three bits together form one three-bit
      counter field indicating the number of packets that arrived marked
      Congestion Experienced (CE). Given the new
      definitions of these three bits, both ends have to support the new wire
      protocol before it can be used. Therefore, during the TCP handshake the
      two ends use these three bits in the TCP header to negotiate the most
      advanced feedback protocol that they can both support, in a way that is
      backward compatible with <xref target="RFC3168"/>.</t>

      <t>AccECN is solely a change to the TCP wire protocol; it covers the
      negotiation and signaling of more accurate ECN feedback from a TCP Data
      Receiver to a Data Sender. It is completely independent of how TCP might
      respond to congestion feedback, which is out of scope here, even though
      it is ultimately the motivation for accurate ECN feedback. Like Classic
      ECN feedback,
      AccECN can be used by standard Reno congestion control <xref
      target="RFC5681"/> to respond to the existence of at least one
      congestion notification within a round trip. Or, unlike Reno, AccECN can
      be used to respond to the extent of congestion notification over a round
      trip, as for example DCTCP does in controlled environments <xref
      target="RFC8257"/>. For congestion response, this specification refers
      to RFC 3168, or ECN experiments such as those referred to in <xref
      target="RFC8311"/>, namely: a TCP-based Low Latency Low Loss Scalable
      (L4S) congestion control <xref target="I-D.ietf-tsvwg-l4s-arch"/>; or
      Alternative Backoff with ECN (ABE) <xref target="RFC8511"/>.</t>

      <t>It is RECOMMENDED that the AccECN protocol is implemented alongside
      SACK <xref target="RFC2018"/> and the experimental ECN++ protocol <xref
      target="I-D.ietf-tcpm-generalized-ecn"/>, which allows the ECN
      capability to be used on TCP control packets. Therefore, this
      specification does not discuss implementing AccECN alongside <xref
      target="RFC5562"/>, which was an earlier experimental protocol with
      narrower scope than ECN++.</t>

      <section title="Document Roadmap">
        <t>The following introductory section outlines the goals of AccECN
        (<xref target="accecn_Goals"/>). Then terminology is defined (<xref
        target="accecn_Terminology"/>) and a recap of existing prerequisite
        technology is given (<xref target="accecn_Recap"/>).</t>

        <t><xref target="accecn_Overview"/> gives an informative overview of
        the AccECN protocol. Then <xref target="accecn_Spec"/> gives the
        normative protocol specification, and <xref
        target="accecn_3168_updates"/> clarifies which aspects of RFC 3168 are
        updated by this specification. <xref
        target="accecn_Interact_Variants"/> assesses the interaction of AccECN
        with commonly used variants of TCP, whether standardized or not. <xref
        target="accecn_Properties"/> summarizes the features and properties of
        AccECN.</t>

        <t><xref target="accecn_IANA_Considerations"/> summarizes the protocol
        fields and numbers that IANA will need to assign and <xref
        target="accecn_Security_Considerations"/> points to the aspects of the
        protocol that will be of interest to the security community.</t>

        <t><xref target="accecn_Algo_Examples"/> gives pseudocode examples for
        the various algorithms that AccECN uses and <xref
        target="accecn_flags_rationale"/> explains why AccECN uses flags in
        the main TCP header and quantifies the space left for future use.</t>
      </section>

      <section anchor="accecn_Goals" title="Goals">
        <t><xref target="RFC7560"/> enumerates requirements that a candidate
        feedback scheme will need to satisfy, under the headings: resilience,
        timeliness, integrity, accuracy (including ordering and lack of bias),
        complexity, overhead and compatibility (both backward and forward). It
        recognizes that a perfect scheme that fully satisfies all the
        requirements is unlikely and trade-offs between requirements are
        likely. <xref target="accecn_Properties"/> presents the properties of
        AccECN against these requirements and discusses the trade-offs
        made.</t>

        <t>The requirements document recognizes that a protocol as ubiquitous
        as TCP needs to be able to serve as-yet-unspecified requirements.
        Therefore an AccECN receiver aims to act as a generic (dumb) reflector
        of congestion information so that, in future, new sender behaviours can
        be deployed unilaterally.</t>
      </section>

      <section anchor="accecn_Terminology" title="Terminology">
        <t>
          <list style="hanging">
            <t hangText="AccECN:">The more accurate ECN feedback scheme will
            be called AccECN for short.</t>

            <t hangText="Classic ECN:">the ECN protocol specified in <xref
            target="RFC3168"/>.</t>

            <t hangText="Classic ECN feedback:">the feedback aspect of the ECN
            protocol specified in <xref target="RFC3168"/>, including
            generation, encoding, transmission and decoding of feedback, but
            not the Data Sender's subsequent response to that feedback.</t>

            <t hangText="ACK:">A TCP acknowledgement, with or without a data
            payload (ACK=1).</t>

            <t hangText="Pure ACK:">A TCP acknowledgement without a data
            payload.</t>

            <t hangText="Acceptable packet / segment:">A packet or segment
            that passes the acceptability tests in <xref target="RFC0793"/>
            and <xref target="RFC5961"/>.</t>

            <!-- <t hangText="SupAccECN:">The Supplementary Accurate ECN field that
            provides additional resilience as well as information about the
            ordering of ECN markings covered by a delayed ACK.</t> -->

            <t hangText="TCP client:">The TCP stack that originates a
            connection.</t>

            <t hangText="TCP server:">The TCP stack that responds to a
            connection request.</t>

            <t hangText="Data Receiver:">The endpoint of a TCP half-connection
            that receives data and sends AccECN feedback.</t>

            <t hangText="Data Sender:">The endpoint of a TCP half-connection
            that sends data and receives AccECN feedback.</t>

            <!-- <t
            hangText="Outgoing AccECN Protocol Handler (or, Outgoing Protocol Handler):">The
            protocol handler at the Data Receiver that marshals the AccECN
            fields when sending an ACK.</t>

            <t
            hangText="Incoming AccECN Protocol Handler (or, Incoming Protocol Handler):">The
            protocol handler at the Data Sender that reads the AccECN fields
            when receiving an ACK.</t> -->
          </list>
        </t>

        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in BCP 14 <xref
        target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they
        appear in all capitals, as shown here.</t>
      </section>

      <section anchor="accecn_Recap"
               title="Recap of Existing ECN feedback in IP/TCP">
        <t>ECN <xref target="RFC3168"/> uses two bits in the IP header. Once
        ECN has been negotiated with the receiver at the transport layer, an
        ECN sender can set two possible codepoints (ECT(0) or ECT(1)) in the
        IP header to indicate an ECN-capable transport (ECT). <!-- It is
        prohibited from doing so unless it has checked that the receiver will
        understand ECN and be able to feed it back.--> If both ECN bits are
        zero, the packet is considered to have been sent by a Not-ECN-capable
        Transport (Not-ECT). When a network node experiences congestion, it
        will occasionally either drop or mark a packet, with the choice
        depending on the packet's ECN codepoint. If the codepoint is Not-ECT,
        only drop is appropriate. If the codepoint is ECT(0) or ECT(1), the
        node can mark the packet by setting both ECN bits, which is termed
        'Congestion Experienced' (CE), or loosely a 'congestion mark'. <xref
        target="accecn_Tab_ECN"/> summarises these codepoints.</t>

        <texttable anchor="accecn_Tab_ECN"
                   title="The ECN Field in the IP Header">
          <ttcol>IP-ECN codepoint</ttcol>

          <ttcol>Codepoint name</ttcol>

          <ttcol>Description</ttcol>

          <c>0b00</c>

          <c>Not-ECT</c>

          <c>Not&nbsp;ECN-Capable&nbsp;Transport</c>

          <c>0b01</c>

          <c>ECT(1)</c>

          <c>ECN-Capable&nbsp;Transport (1)</c>

          <c>0b10</c>

          <c>ECT(0)</c>

          <c>ECN-Capable&nbsp;Transport (0)</c>

          <c>0b11</c>

          <c>CE</c>

          <c>Congestion&nbsp;Experienced</c>
        </texttable>
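
        <t>As an informal illustration of the codepoints in <xref
        target="accecn_Tab_ECN"/> (not part of this specification), the
        following sketch decodes the IP-ECN field and models a congested node
        marking an ECN-capable packet. It assumes the field occupies the two
        least significant bits of the IPv4 Type of Service or IPv6 Traffic
        Class octet, as defined in <xref target="RFC3168"/>. The helper names
        ip_ecn_name and mark_ce are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
# Codepoint names keyed by the 2-bit IP-ECN field (from the table).
ECN_CODEPOINTS = {
    0b00: "Not-ECT",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}


def ip_ecn_name(tos_octet):
    """Name the IP-ECN codepoint carried in a ToS / Traffic Class
    octet (hypothetical helper; assumes ECN is the low two bits)."""
    return ECN_CODEPOINTS[tos_octet & 0b11]


def mark_ce(tos_octet):
    """Model a congested node marking a packet.

    An ECN-capable packet, ECT(0) or ECT(1), has both ECN bits set.
    A Not-ECT packet is returned unchanged, because only drop is
    appropriate for it and dropping is not modelled here.
    """
    if tos_octet & 0b11 == 0b00:
        return tos_octet
    return tos_octet | 0b11
```
]]></artwork>
        </figure>

        <t>For instance, under these assumptions an octet of 0x02 decodes as
        ECT(0), and marking it yields 0x03, which decodes as CE.</t>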

        <t>In the TCP header the first two bits in byte 14 are defined as
        flags for the use of ECN (CWR and ECE in <xref
        target="accecn_Fig_TCPHdr"/> <xref target="RFC3168"/>). A TCP client
        indicates it supports ECN by setting ECE=CWR=1 in the SYN, and an
        ECN-enabled server confirms ECN support by setting ECE=1 and CWR=0 in
        the SYN/ACK. On reception of a CE-marked packet at the IP layer, the
        Data Receiver starts to set the ECN-Echo (ECE) flag
        continuously in the TCP header of ACKs, which ensures the signal is
        received reliably even if ACKs are lost. The TCP sender confirms that
        it has received at least one ECE signal by responding with the
        congestion window reduced (CWR) flag, which allows the TCP receiver to
        stop repeating the ECN-Echo flag. This always leads to a full RTT of
        ACKs with ECE set. Thus any additional CE markings arriving within
        this RTT cannot be fed back.</t>
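
        <t>The limitation can be seen in a minimal sketch of the Classic ECN
        feedback loop described above. The class and function names are
        hypothetical, and the model is deliberately simplified: it assumes one
        ACK per arriving packet and ignores delayed ACKs and ACK loss.</t>

        <figure>
          <artwork><![CDATA[
```python
CE = 0b11  # IP-ECN codepoint for Congestion Experienced


class ClassicEcnReceiver:
    """Toy model of a Classic ECN Data Receiver (illustrative only)."""

    def __init__(self):
        self.ece_latched = False

    def on_packet(self, ip_ecn, cwr):
        """Return the ECE flag to set on the ACK of this packet."""
        if cwr:             # sender confirmed: stop echoing
            self.ece_latched = False
        if ip_ecn == CE:    # a new mark (re)starts the echo
            self.ece_latched = True
        return self.ece_latched


def ece_sequence(packets):
    """Feed (ip_ecn, cwr) pairs to a fresh receiver and list the
    ECE flag that each resulting ACK would carry."""
    receiver = ClassicEcnReceiver()
    return [receiver.on_packet(ecn, cwr) for ecn, cwr in packets]
```
]]></artwork>
        </figure>

        <t>In this model, a sequence of packets containing two CE marks before
        the CWR flag arrives produces exactly the same sequence of ECE flags
        as a sequence containing one CE mark, so the Data Sender cannot tell
        the two cases apart.</t>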

        <t>The last bit in byte 13 of the TCP header was defined as the Nonce
        Sum (NS) for the ECN Nonce <xref target="RFC3540"/>. In the absence of
        widespread deployment, RFC 3540 has been reclassified as historic <xref
        target="RFC8311"/> and the respective flag has been marked as
        "reserved", making this TCP flag available for use by AccECN
        instead.</t>

        <?rfc needLines="8" ?>

        <figure align="center" anchor="accecn_Fig_TCPHdr"
                title="The (post-ECN Nonce) definition of the TCP header flags">
          <artwork align="center"><![CDATA[             
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|               |           | N | C | E | U | A | P | R | S | F |
| Header Length | Reserved  | S | W | C | R | C | S | S | Y | I |
|               |           |   | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
]]></artwork>
        </figure>
      </section>
    </section>

    <!-- ================================================================ -->

    <section anchor="accecn_Overview"
             title="AccECN Protocol Overview and Rationale">
      <t>This section provides an informative overview of the AccECN protocol
      that will be normatively specified in <xref target="accecn_Spec"/>.</t>

      <t>Like the original TCP approach, the Data Receiver of each TCP
      half-connection sends AccECN feedback to the Data Sender on TCP
      acknowledgements, reusing data packets of the other half-connection
      whenever possible.</t>

      <!--<section title="Essential and Supplementary Parts">-->

      <t>The AccECN protocol has had to be designed in two parts:<list
          style="symbols">
          <t>an essential part that re-uses ECN TCP header bits for the Data
          Receiver to feed back the number of packets arriving with CE in the
          IP-ECN field. This provides more accuracy than classic ECN feedback,
          but limited resilience against ACK loss;</t>

          <t>a supplementary part using a new AccECN TCP Option that provides
          additional feedback on the number of bytes that arrive marked with
          each of the three ECN codepoints in the IP-ECN field (not just CE
          marks). This provides greater resilience against ACK loss than the
          essential feedback, but it is more likely to suffer from middlebox
          interference. <!-- <t>a supplementary part that serves three functions:<list
                style="symbols">
                <t>it greatly improves the resilience of AccECN feedback
                information against loss of ACKs;</t>

                <t>it provides information about the order in which ECN
                markings in the IP header arrived at the Data Receiver;</t>

                <t>it improves the timeliness of AccECN feedback when a
                delayed ACK covers multiple congestion signals.</t>
              </list> --></t>
        </list>The two-part design was necessary, given limitations on the
      space available for TCP options and given the possibility that certain
      incorrectly designed middleboxes prevent TCP using any new options.</t>

      <t>The essential part overloads the previous definition of the three
      flags in the TCP header that had been assigned for use by ECN. This
      design choice deliberately replaces the classic ECN feedback protocol,
      rather than leaving classic ECN feedback intact and adding more accurate
      feedback separately because:<list style="symbols">
          <t>this efficiently reuses scarce TCP header space, given TCP option
          space is approaching saturation;</t>

          <t>a single upgrade path for the TCP protocol is preferable to a
          fork in the design;</t>

          <t>otherwise classic and accurate ECN feedback could give
          conflicting feedback on the same segment, which could open up new
          security concerns and make implementations unnecessarily
          complex;</t>

          <t>middleboxes are more likely to faithfully forward the TCP ECN
          flags than newly defined areas of the TCP header.</t>
        </list></t>

      <t>AccECN is designed to work even if the supplementary part is removed
      or zeroed out, as long as the essential part gets through.</t>

      <section title="Capability Negotiation">
        <t>AccECN is a change to the wire protocol of the main TCP header;
        therefore, it can only be used if both endpoints have been upgraded to
        understand it. The TCP client signals support for AccECN on the
        initial SYN of a connection and the TCP server signals whether it
        supports AccECN on the SYN/ACK. The TCP flags on the SYN that the
        client uses to signal AccECN support have been carefully chosen so
        that a TCP server will interpret them as a request to support the most
        recent variant of ECN feedback that it supports. Then the client falls
        back to the same variant of ECN feedback.</t>

        <t>An AccECN TCP client does not send the new AccECN Option on the SYN
        as SYN option space is limited. The TCP server sends the AccECN Option
        on the SYN/ACK and the client sends it on the first ACK to test
        whether the network path forwards the option correctly.</t>
      </section>

      <section title="Feedback Mechanism">
        <t>A Data Receiver maintains four counters initialized at the start of
        the half-connection. Three count the number of arriving payload bytes
        respectively marked CE, ECT(1) and ECT(0) in the IP-ECN field. The
        fourth counts the number of packets arriving marked with a CE
        codepoint (including control packets without payload if they are
        CE-marked).</t>

        <t>The Data Sender maintains four equivalent counters for the
        half-connection, and the AccECN protocol is designed to ensure they will
        match the values in the Data Receiver's counters, albeit after a
        little delay.</t>

        <t>Each ACK carries the three least significant bits (LSBs) of the
        packet-based CE counter using the ECN bits in the TCP header, now
        renamed the Accurate ECN (ACE) field (see <xref
        target="accecn_Fig_ACE_ACK"/> later). The 24 LSBs of each byte counter
        are carried in the AccECN Option.</t>
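
        <t>The Data Receiver's side of this mechanism might be sketched as
        follows. This is an informal illustration, not the normative
        behaviour: the counter names (cep, ceb, e0b, e1b) are hypothetical,
        and the initial values of zero are placeholders, because the values
        to use at the start of a half-connection are a matter for the
        normative specification. The on-the-wire layout of the AccECN Option
        is not modelled.</t>

        <figure>
          <artwork><![CDATA[
```python
from dataclasses import dataclass

NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


@dataclass
class ReceiverCounters:
    """The four Data Receiver counters (hypothetical names).

    Zero initial values are placeholders, not the specified ones.
    """
    cep: int = 0  # packets marked CE, including control packets
    ceb: int = 0  # payload bytes marked CE
    e0b: int = 0  # payload bytes marked ECT(0)
    e1b: int = 0  # payload bytes marked ECT(1)

    def on_packet(self, ip_ecn, payload_len):
        """Update the counters for one arriving packet."""
        if ip_ecn == CE:
            self.cep += 1
            self.ceb += payload_len
        elif ip_ecn == ECT0:
            self.e0b += payload_len
        elif ip_ecn == ECT1:
            self.e1b += payload_len

    def ace(self):
        """Three least significant bits of the CE packet counter."""
        return self.cep & 0b111

    def option_values(self):
        """24 least significant bits of each byte counter."""
        mask = (1 << 24) - 1
        return {
            "ceb": self.ceb & mask,
            "e0b": self.e0b & mask,
            "e1b": self.e1b & mask,
        }
```
]]></artwork>
        </figure>

        <t>Note that a CE-marked packet without payload increments only the
        packet counter in this sketch, which is why the packet-based and
        byte-based counters can diverge.</t>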
      </section>

      <section title="Delayed ACKs and Resilience Against ACK Loss">
        <t>With both the ACE and the AccECN Option mechanisms, the Data
        Receiver continually repeats the current LSBs of each of its
        respective counters. There is no need to acknowledge these continually
        repeated counters, so the congestion window reduced (CWR) mechanism is
        no longer used. Even if some ACKs are lost, the Data Sender ought to
        be able to infer how much to increment its own counters, even if the
        protocol field has wrapped.</t>

        <t>The 3-bit ACE field can wrap fairly frequently. Therefore, even if
        it appears to have incremented by one (say), the field might have
        actually cycled completely then incremented by one. The Data Receiver
        is not allowed to delay sending an ACK to such an extent that the ACE
        field would cycle. However, cycling is still a possibility at the Data
        Sender because a whole sequence of ACKs carrying intervening values of
        the field might all be lost or delayed in transit.</t>
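
        <t>The ambiguity can be made concrete with a short sketch of
        modulo-8 arithmetic on the ACE field. The function names are
        hypothetical and the code only enumerates possibilities; it does not
        choose between them.</t>

        <figure>
          <artwork><![CDATA[
```python
ACE_MODULUS = 8  # the ACE field is 3 bits wide


def ace_delta(prev_ace, new_ace):
    """Smallest non-negative increment consistent with two
    successive ACE values."""
    return (new_ace - prev_ace) % ACE_MODULUS


def candidate_increments(prev_ace, new_ace, max_packets):
    """All increments of the CE packet counter, up to max_packets,
    that would produce the same pair of ACE values."""
    first = ace_delta(prev_ace, new_ace)
    return list(range(first, max_packets + 1, ACE_MODULUS))
```
]]></artwork>
        </figure>

        <t>For example, if the ACE field moves from 7 to 0, the apparent
        increment is 1, but increments of 9 or 17 would look identical to the
        Data Sender.</t>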

        <!-- "Further, if the lost ACKs included no payload they would never be retransmitted." Commented out, because even data ACks would be retransmitted 
with a different ACE field anyway.-->

        <t>The fields in the AccECN Option are larger, but they will increment
        in larger steps because they count bytes not packets. Nonetheless,
        their size has been chosen such that a whole cycle of the field would
        never occur between ACKs unless there had been an infeasibly long
        sequence of ACK losses. Therefore, as long as the AccECN Option is
        available, it can be treated as a dependable feedback channel.</t>
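
        <t>To give a rough sense of scale for the two field sizes, the
        following sketch computes how many consecutively CE-marked packets
        are needed to cycle each field once. The 1460-byte payload is an
        assumed typical value, not one taken from this document, and the
        function names are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
ACE_BITS = 3              # width of the ACE field
OPTION_COUNTER_BITS = 24  # width of each AccECN Option byte counter


def ce_packets_per_ace_cycle():
    """CE-marked packets needed to cycle the ACE field once."""
    return 1 << ACE_BITS


def segments_per_option_cycle(payload_bytes=1460):
    """Whole segments of the given payload size (assumed value)
    that fit in one cycle of a 24-bit byte counter."""
    return (1 << OPTION_COUNTER_BITS) // payload_bytes
```
]]></artwork>
        </figure>

        <t>Under these assumptions the ACE field cycles after 8 CE-marked
        packets, whereas a byte counter cycles only after more than eleven
        thousand full-sized marked segments.</t>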

        <t>If the AccECN Option is not available, e.g.&nbsp;it is being
        stripped by a middlebox, the AccECN protocol will only feed back
        information on CE markings (using the ACE field). Although not ideal,
        this will be sufficient, because it is envisaged that neither ECT(0)
        nor ECT(1) will ever indicate more severe congestion than CE, even
        though future uses for ECT(0) or ECT(1) are still unclear <xref
        target="RFC8311"/>. Because the 3-bit ACE field is so small, when it
        is the only field available, the Data Sender has to interpret it
        assuming the most likely wrap, but with a degree of conservatism.</t>
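
        <t>One possible reading of "a degree of conservatism" is sketched
        below. It is an assumed policy offered for illustration, not the
        algorithm of this specification: it returns the largest increment of
        the CE packet counter that is consistent with the ACE field and does
        not exceed the number of packets newly acknowledged. An
        implementation might apply such an upper bound only when there is
        evidence of ACK loss. The function and parameter names are
        hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
ACE_MODULUS = 8  # the ACE field is 3 bits wide


def conservative_ce_increment(prev_ace, new_ace, newly_acked_pkts):
    """Largest CE-packet increment that matches the ACE field and
    does not exceed the packets newly acknowledged (assumed policy)."""
    delta = (new_ace - prev_ace) % ACE_MODULUS
    if newly_acked_pkts < delta:
        # No additional wrap can fit, so take the field at face value.
        return delta
    spare = (newly_acked_pkts - delta) % ACE_MODULUS
    return newly_acked_pkts - spare
```
]]></artwork>
        </figure>

        <t>For example, if the ACE field moves from 7 to 0 on an ACK that
        newly acknowledges 20 packets, this policy assumes 17 CE-marked
        packets, whereas if only 5 packets are newly acknowledged it assumes
        1.</t>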

        <t>Certain specified events trigger the Data Receiver to include an
        AccECN Option on an ACK. The rules are designed to ensure that the
        order in which different markings arrive at the receiver is
        communicated to the sender (as long as options are reaching the sender
        and as long as there is no ACK loss). Implementations are encouraged
        to send an AccECN Option more frequently, but this is left up to the
        implementer.</t>

        <!--As one ACK might acknowledge multiple data segments at the same time the 
proposed scheme providing accumulated information does not preserve the 
order at which the marking were received.This decision was taken 
deliberately to reduce complexity.-->
      </section>

      <section title="Feedback Metrics">
        <t>The CE packet counter in the ACE field and the CE byte counter in
        the AccECN Option both provide feedback on received CE-marks. The CE
        packet counter includes control packets that do not have payload data,
        while the CE byte counter solely includes marked payload bytes. If
        both are present, the byte counter in the option will provide the more
        accurate information needed for modern congestion control and policing
        schemes, such as L4S, DCTCP or ConEx. If the option is stripped, a
        simple algorithm to estimate the number of marked bytes from the ACE
        field is given in <xref target="accecn_Algo_ACE_Bytes"/>.</t>

        <t>Feedback in bytes is provided in order to protect against the
        receiver using attacks similar to 'ACK-Division' to artificially
        inflate the congestion window, which is why <xref target="RFC5681"/>
        now recommends that TCP counts acknowledged bytes not packets.</t>
      </section>

      <section anchor="accecn_demb_reflector" title="Generic (Dumb) Reflector">
        <t>The ACE field provides feedback about CE markings in the IP-ECN
        field of both data and control packets. According to <xref
        target="RFC3168"/> the Data Sender is meant to set the IP-ECN field of
        control packets to Not-ECT. However, mechanisms in certain private
        networks (e.g.&nbsp;data centres) set control packets to be ECN
        capable because they are precisely the packets that performance
        depends on most.</t>

        <t>For this reason, AccECN is designed to be a generic reflector of
        whatever ECN markings it sees, whether or not they are compliant with
        a current standard. Then as standards evolve, Data Senders can upgrade
        unilaterally without any need for receivers to upgrade too. It is also
        useful to be able to rely on generic reflection behaviour when senders
        need to test for unexpected interference with markings (for instance
        <xref target="accecn_sec_ecn-mangling"/>, <xref
        target="accecn_sec_ACE_init_invalid"/> and <xref
        target="accecn_Mbox_Interference"/> of the present document and para 2
        of Section 20.2 of <xref target="RFC3168"/>).</t>

        <t>The initial SYN is the most critical control packet, so AccECN
        provides feedback on its IP-ECN field. Although RFC 3168 prohibits an
        ECN-capable SYN, providing feedback of ECN marking on the SYN supports
        future scenarios in which SYNs might be ECN-enabled (without
        prejudging whether they ought to be). For instance, <xref
        target="RFC8311"/> updates this aspect of RFC 3168 to allow
        experimentation with ECN-capable TCP control packets.</t>

        <t>Even if the TCP client (or server) has set the SYN (or SYN/ACK) to
        not-ECT in compliance with RFC 3168, feedback on the state of the
        IP-ECN field when it arrives at the receiver could still be useful,
        because middleboxes have been known to overwrite the IP-ECN field as
        if it is still part of the old Type of Service (ToS) field <xref
        target="Mandalari18"/>. For example, if a TCP client has set the SYN
        to Not-ECT, but receives feedback that the IP-ECN field on the SYN
        arrived with a different codepoint, it can detect such middlebox
        interference. Previously, neither end knew what IP-ECN field the other
        had sent. So, if a TCP server received ECT or CE on a SYN, it could
        not know whether it was invalid (or valid) because only the TCP client
        knew whether it originally marked the SYN as Not-ECT (or ECT).
        Therefore, prior to AccECN, the server's only safe course of action in
        this example was to disable ECN for the connection. Instead, the
        AccECN protocol allows the server to feed back the received ECN field
        to the client, which then has all the information to decide whether
        the connection has to fall back from supporting ECN (or not).</t>
      </section>
    </section>

    <!-- ================================================================ -->

    <section anchor="accecn_Spec" title="AccECN Protocol Specification">
      <section anchor="accecn_Negotiation" title="Negotiating to use AccECN">
        <t/>

        <section anchor="accecn_Negotiation_3WHS"
                 title="Negotiation during the TCP handshake">
          <t>Given the ECN Nonce <xref target="RFC3540"/> has been
          reclassified as historic <xref target="RFC8311"/>, the present
          specification re-allocates the TCP flag at bit 7 of the TCP header,
          which was previously called NS (Nonce Sum), as the AE (Accurate ECN)
          flag (see IANA Considerations in <xref
          target="accecn_IANA_Considerations"/>) as shown below.</t>

          <figure align="center" anchor="accecn_Fig_TCPHdr_AE"
                  title="The (post-AccECN) definition of the TCP header flags during the TCP handshake">
            <artwork align="center"><![CDATA[             
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|               |           | A | C | E | U | A | P | R | S | F |
| Header Length | Reserved  | E | W | C | R | C | S | S | Y | I |
|               |           |   | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
]]></artwork>
          </figure>

          <t>During the TCP handshake at the start of a connection, to request
          more accurate ECN feedback the TCP client (host A) MUST set the TCP
          flags AE=1, CWR=1 and ECE=1 in the initial SYN segment.</t>

          <t>If a TCP server (B) that is AccECN-enabled receives a SYN with
          the above three flags set, it MUST set both its half connections
          into AccECN mode. Then it MUST set the AE, CWR and ECE TCP flags on
          the SYN/ACK to the combination in the top block of <xref
          target="accecn_Tab_Negotiation"/> that feeds back the IP-ECN field
          that arrived on the SYN. This applies whether or not the server
          itself supports setting the IP-ECN field on a SYN or SYN/ACK (see
          <xref target="accecn_demb_reflector"/> for rationale).</t>

          <t>When the TCP server returns any of the four combinations in the
          top block of <xref target="accecn_Tab_Negotiation"/>, it confirms
          that it supports AccECN. The TCP server MUST NOT set any of these
          four combinations of flags on the SYN/ACK unless the preceding SYN
          requested support for AccECN as above.</t>
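
          <t>As a non-normative illustration, the server's choice of SYN/ACK
          flags could be sketched in Python as follows. The constant and
          function names are hypothetical and are not part of this
          specification; the flag values are taken from the top block of
          <xref target="accecn_Tab_Negotiation"/>.</t>

          <figure>
            <artwork><![CDATA[
```python
# Non-normative sketch; all names here are hypothetical.
# IP-ECN codepoints as 2-bit values (RFC 3168).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

# (AE, CWR, ECE) that an AccECN server sets on the SYN/ACK to feed back
# the IP-ECN codepoint that arrived on the SYN.
SYNACK_FLAGS = {
    NOT_ECT: (0, 1, 0),
    ECT1: (0, 1, 1),
    ECT0: (1, 0, 0),
    CE: (1, 1, 0),
}


def synack_flags_for(syn_ip_ecn):
    """Return (AE, CWR, ECE) for the SYN/ACK, given the SYN's IP-ECN field."""
    return SYNACK_FLAGS[syn_ip_ecn]
```
]]></artwork>
          </figure>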


          <!--Bob: Out of scope: move to fall-back draft.-->

          <!--If the sending host (A) indicated AccECN support, the receiving host (B) may set the IP ECN field of the SYN/ACK to ECT.

 <t>If the SYN was ECT and the SYN/ACK indicates that a CE mark was received
         (NS=1), the originating host (A) MUST react to this congestion
         indication e.g.&nbsp;by selecting a lower initial sending window.</t> 
         
         <t>If the SYN was ECT marked, but the receiving host is not AccECN enabled
         (ECE=0 and CWR=0 in SYN/ACK), the originating host (A) SHOULD conservatively
         reduce its initial window as if the SYN had been CE-marked.</t> -->

          <t>Once a TCP client (A) has sent the above SYN to declare that it
          supports AccECN, and once it has received the above SYN/ACK segment
          that confirms that the TCP server supports AccECN, the TCP client
          MUST set both its half connections into AccECN mode.</t>

          <t>Once in AccECN mode, a TCP client or server has the rights and
          obligations to participate in the ECN protocol defined in <xref
          target="accecn_implications_accecn_mode"/>.</t>

          <t>The procedure for the client to follow if a SYN/ACK does not
          arrive before its retransmission timer expires is given in <xref
          target="accecn_sec_SYN_rexmt"/>.</t>
        </section>

        <section anchor="accecn_sec_backward_compat"
                 title="Backward Compatibility">
          <t>The three flags set to 1 to indicate AccECN support on the SYN
          have been carefully chosen to enable natural fall-back to prior
          stages in the evolution of ECN, as above. <xref
          target="accecn_Tab_Negotiation"/> tabulates all the negotiation
          possibilities for ECN-related capabilities that involve at least one
          AccECN-capable host. The entries in the first two columns have been
          abbreviated, as follows: <list hangIndent="4" style="hanging">
              <t hangText="AccECN:">More Accurate ECN Feedback (the present
              specification)</t>

              <t hangText="Nonce:">ECN Nonce feedback <xref
              target="RFC3540"/></t>

              <t hangText="ECN:">'Classic' ECN feedback <xref
              target="RFC3168"/></t>

              <t hangText="No ECN:">Not-ECN-capable. Implicit congestion
              notification using packet drop.</t>
            </list></t>

          <!--Could turn first 4 columns into 2 columns headed A & B, with Ac, N, E, I within the columns.-->

          <!-- <?rfc needLines="23" ?> -->

          <table align="center" anchor="accecn_Tab_Negotiation">
            <name>ECN capability negotiation between Client (A) and Server (B)</name>
            <thead>
              <tr>
                <th align="left">A</th>
                <th align="left">B</th>
                <th align="center">SYN<br/>A-&gt;B<br/>AE&nbsp;CWR&nbsp;ECE</th>
                <th align="center">SYN/ACK<br/>B-&gt;A<br/>AE&nbsp;CWR&nbsp;ECE</th>
                <th align="left">Feedback Mode</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td align="left">AccECN<br/>AccECN<br/>AccECN<br/>AccECN</td>
                <td align="left">AccECN<br/>AccECN<br/>AccECN<br/>AccECN</td>
                <td align="center">1 &nbsp; 1 &nbsp; 1<br/>1 &nbsp; 1 &nbsp; 1<br/>1 &nbsp; 1 &nbsp; 1<br/>1 &nbsp; 1 &nbsp; 1</td>
                <td align="center">0 &nbsp; 1 &nbsp; 0<br/>0 &nbsp; 1 &nbsp; 1<br/>1 &nbsp; 0 &nbsp; 0<br/>1 &nbsp; 1 &nbsp; 0</td>
                <td align="left">AccECN (Not-ECT on SYN)<br/>AccECN (ECT(1) on SYN)<br/>AccECN (ECT(0) on SYN)<br/>AccECN (CE on SYN)</td>
              </tr>
              <tr>
                <td align="left"/>
                <td align="left"/>
                <td align="center"/>
                <td align="center"/>
                <td align="left"/>
              </tr>
              <tr>
                <td align="left">AccECN<br/>AccECN<br/>AccECN</td>
                <td align="left">Nonce<br/>ECN<br/>No ECN</td>
                <td align="center">1 &nbsp; 1 &nbsp; 1<br/>1 &nbsp; 1 &nbsp; 1<br/>1 &nbsp; 1 &nbsp; 1</td>
                <td align="center">1 &nbsp; 0 &nbsp; 1<br/>0 &nbsp; 0 &nbsp; 1<br/>0 &nbsp; 0 &nbsp; 0</td>
                <td align="left">(Reserved)<br/>classic ECN<br/>Not ECN</td>
              </tr>
              <tr>
                <td align="left"/>
                <td align="left"/>
                <td align="center"/>
                <td align="center"/>
                <td align="left"/>
              </tr>
              <tr>
                <td align="left">Nonce<br/>ECN<br/>No ECN</td>
                <td align="left">AccECN<br/>AccECN<br/>AccECN</td>
                <td align="center">0 &nbsp; 1 &nbsp; 1<br/>0 &nbsp; 1 &nbsp; 1<br/>0 &nbsp; 0 &nbsp; 0</td>
                <td align="center">0 &nbsp; 0 &nbsp; 1<br/>0 &nbsp; 0 &nbsp; 1<br/>0 &nbsp; 0 &nbsp; 0</td>
                <td align="left">classic ECN<br/>classic ECN<br/>Not ECN</td>
              </tr>
              <tr>
                <td align="left"/>
                <td align="left"/>
                <td align="center"/>
                <td align="center"/>
                <td align="left"/>
              </tr>
              <tr>
                <td align="left">AccECN</td>
                <td align="left">Broken</td>
                <td align="center">1 &nbsp; 1 &nbsp; 1</td>
                <td align="center">1 &nbsp; 1 &nbsp; 1</td>
                <td align="left">Not ECN</td>
              </tr>
            </tbody>
          </table>

          <t><xref target="accecn_Tab_Negotiation"/> is divided into blocks
          each separated by an empty row.<list style="numbers">
              <t>The top block shows the case already described in <xref
              target="accecn_Negotiation"/> where both endpoints support
              AccECN and how the TCP server (B) indicates congestion
              feedback.</t>

              <t>The second block shows the cases where the TCP client (A)
              supports AccECN but the TCP server (B) supports some earlier
              variant of TCP feedback, indicated in its SYN/ACK. Therefore, as
              soon as an AccECN-capable TCP client (A) receives the SYN/ACK
              shown it MUST set both its half connections into the feedback
              mode shown in the rightmost column. If it has set itself into
              classic ECN feedback mode it MUST then comply with <xref
              target="RFC3168"/>.<vspace blankLines="1"/>The server response
              called 'Nonce' in the table is now historic. For an AccECN
              implementation, there is no need to recognize or support ECN
              Nonce feedback <xref target="RFC3540"/>, which has been
              reclassified as historic <xref target="RFC8311"/>. AccECN is
              compatible with alternative ECN feedback integrity approaches
              (see <xref target="accecn_Integrity"/>).</t>

              <t>The third block shows the cases where the TCP server (B)
              supports AccECN but the TCP client (A) supports some earlier
              variant of TCP feedback, indicated in its SYN.<vspace
              blankLines="1"/>When an AccECN-enabled TCP server (B) receives a
              SYN with AE,CWR,ECE = 0,1,1 it MUST do one of the
              following:<list style="symbols">
                  <t>set both its half connections into the classic ECN
                  feedback mode and return a SYN/ACK with AE, CWR, ECE = 0,0,1
                  as shown. Then it MUST comply with <xref
                  target="RFC3168"/>.</t>

                  <t>set both its half-connections into No ECN mode and return
                  a SYN/ACK with AE,CWR,ECE = 0,0,0, then continue with ECN
                  disabled. This latter case is unlikely to be desirable, but
                  it is allowed as a possibility, e.g.&nbsp;for minimal TCP
                  implementations.</t>
                </list>When an AccECN-enabled TCP server (B) receives a SYN
              with AE,CWR,ECE = 0,0,0 it MUST set both its half connections
              into the Not ECN feedback mode, return a SYN/ACK with AE,CWR,ECE
              = 0,0,0 as shown and continue with ECN disabled.</t>

              <t>The fourth block displays a combination labelled 'Broken'.
              Some older TCP server implementations incorrectly set the
              reserved flags in the SYN/ACK by reflecting those in the SYN.
              Such broken TCP servers (B) cannot support ECN, so as soon as an
              AccECN-capable TCP client (A) receives such a broken SYN/ACK it
              MUST fall back to Not ECN mode for both its half connections and
              continue with ECN disabled.</t>
            </list></t>
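
          <t>As a non-normative illustration, an AccECN client's reading of
          the SYN/ACK flags, after it has sent a SYN with AE=CWR=ECE=1, could
          be sketched in Python as follows. The function name and the
          returned strings are hypothetical. The sketch only reports the
          '(Reserved)' label for the historic Nonce response, because the
          table assigns that combination no feedback mode.</t>

          <figure>
            <artwork><![CDATA[
```python
# Non-normative sketch; all names here are hypothetical.
# SYN/ACK flag combinations that confirm AccECN, mapped to the IP-ECN
# codepoint the server saw on the SYN.
ACCECN_SYNACK = {
    (0, 1, 0): "Not-ECT",
    (0, 1, 1): "ECT(1)",
    (1, 0, 0): "ECT(0)",
    (1, 1, 0): "CE",
}


def client_feedback_mode(ae, cwr, ece):
    """Return (mode, IP-ECN seen on SYN or None) for an AccECN client."""
    flags = (ae, cwr, ece)
    if flags in ACCECN_SYNACK:
        return ("AccECN", ACCECN_SYNACK[flags])
    if flags == (0, 0, 1):
        return ("classic ECN", None)
    if flags == (1, 0, 1):
        # Historic ECN Nonce response; the table lists it as (Reserved).
        return ("Reserved", None)
    # (0, 0, 0) is a non-ECN server; (1, 1, 1) is a 'Broken' reflection.
    return ("Not ECN", None)
```
]]></artwork>
          </figure>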

          <t>The following additional rules do not fit the structure of the
          table, but they complement it:<list style="hanging">
              <t hangText="Simultaneous Open:">An originating AccECN Host (A),
              having sent a SYN with AE=1, CWR=1 and ECE=1, might receive
              another SYN from host B. Host A MUST then enter the same
              feedback mode as it would have entered had it been a responding
              host and received the same SYN. Then host A MUST send the same
              SYN/ACK as it would have sent had it been a responding host.</t>

              <t hangText="In-window SYN during TIME-WAIT:">Many TCP
              implementations create a new TCP connection if they receive an
              in-window SYN packet during TIME-WAIT state. When a TCP host
              enters TIME-WAIT or CLOSED state, it ought to ignore any
              previous state about the negotiation of AccECN for that
              connection and renegotiate the feedback mode according to <xref
              target="accecn_Tab_Negotiation"/>.</t>
            </list></t>
        </section>

        <section anchor="accecn_sec_forward_compat"
                 title="Forward Compatibility">
          <t>If a TCP server that implements AccECN receives a SYN with the
          three TCP header flags (AE, CWR and ECE) set to any combination
          other than 000, 011 or 111, it MUST negotiate the use of AccECN as
          if they had been set to 111. This ensures that future uses of the
          other combinations on a SYN can rely on consistent behaviour from
          the installed base of AccECN servers.</t>

          <t>For the avoidance of doubt, the behaviour described in the
          present specification applies whether or not the three remaining
          reserved TCP header flags are zero.</t>
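
          <t>As a non-normative illustration, the way an AccECN server
          selects a feedback mode from the three flags on the SYN could be
          sketched in Python as follows. The function name and the
          allow_classic_ecn parameter are hypothetical; the parameter models
          the choice that <xref target="accecn_sec_backward_compat"/> gives
          the server when it receives AE,CWR,ECE = 0,1,1.</t>

          <figure>
            <artwork><![CDATA[
```python
# Non-normative sketch; all names here are hypothetical.
def server_feedback_mode(ae, cwr, ece, allow_classic_ecn=True):
    """Return the feedback mode an AccECN server enters for a given SYN."""
    flags = (ae, cwr, ece)
    if flags == (0, 0, 0):
        return "Not ECN"
    if flags == (0, 1, 1):
        # Classic ECN client; a minimal server may instead disable ECN.
        return "classic ECN" if allow_classic_ecn else "Not ECN"
    # 111 requests AccECN; for forward compatibility every other
    # combination is negotiated as if it had been 111.
    return "AccECN"
```
]]></artwork>
          </figure>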
        </section>

        <section anchor="accecn_sec_SYN_rexmt"
                 title="Retransmission of the SYN">
          <!--Bob: Out of scope: move to fall-back draft.
        <t> In AccECN mode the originating host (A) MAY set the IP ECN field to
        ECT in the first ACK that finalizes the three way handshake (3WSH). 
        E.g.&nbsp;to test ECN support of the path, setting the SYN/ACK as well as
        the first ACK to ECT allows each end to determine as soon as possible
        whether the path passes ECT or a middlebox bleaches or overwrites the
        IP ECN field.</t>
-->

          <t>If the sender of an AccECN SYN times out before receiving the
          SYN/ACK, the sender SHOULD attempt to negotiate the use of AccECN at
          least one more time by continuing to set all three TCP ECN flags on
          the first retransmitted SYN (using the usual retransmission
          time-outs). If this first retransmission also fails to be
          acknowledged, the sender SHOULD send subsequent retransmissions of
          the SYN with the three TCP-ECN flags cleared (AE=CWR=ECE=0). A
          retransmitted SYN MUST use the same ISN as the original SYN.</t>
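
          <t>As a non-normative illustration, the recommended retry schedule
          above could be sketched in Python as follows. The function name and
          the attempt numbering are hypothetical, and the sketch covers only
          the default strategy, not the alternatives an implementer might
          choose.</t>

          <figure>
            <artwork><![CDATA[
```python
# Non-normative sketch; all names here are hypothetical.
def syn_flags_for_attempt(attempt):
    """Return (AE, CWR, ECE) for a SYN transmission.

    attempt 0 is the original SYN, 1 the first retransmission, and so on.
    Every attempt is assumed to reuse the ISN of the original SYN.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # Request AccECN on the original SYN and on the first retransmission,
    # then fall back by clearing all three flags.
    return (1, 1, 1) if attempt <= 1 else (0, 0, 0)
```
]]></artwork>
          </figure>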

          <t>Retrying once before fall-back adds delay in the case where a
          middlebox drops an AccECN (or ECN) SYN deliberately. However,
          current measurements imply that a drop is less likely to be due to
          middlebox interference than other intermittent causes of loss,
          e.g.&nbsp;congestion, wireless interference, etc.</t>

          <t>Implementers MAY use other fall-back strategies if they are found
          to be more effective (e.g.&nbsp;attempting to negotiate AccECN on
          the SYN only once, or more than twice, the latter being most
          appropriate during high levels of congestion). However, other
          fall-back strategies will need
          to follow all the rules in <xref
          target="accecn_implications_accecn_mode"/>, which concern behaviour
          when SYNs or SYN/ACKs negotiating different types of feedback have
          been sent within the same connection.</t>

          <t>Further, it might make sense to also remove any other new or
          experimental fields or options on the SYN in case a middlebox might
          be blocking them, although the required behaviour will depend on the
          specification of the other option(s) and any attempt to co-ordinate
          fall-back between different modules of the stack.</t>

          <t>Whichever fall-back strategy is used, the TCP initiator SHOULD
          cache failed connection attempts. If it does, it SHOULD NOT give up
          attempting to negotiate AccECN on the SYN of subsequent connection
          attempts until it is clear that the blockage is persistently and
          specifically due to AccECN. The cache needs to be arranged to expire
          so that the initiator will infrequently attempt to check whether the
          problem has been resolved.</t>

          <t>The fall-back procedure if the TCP server receives no ACK to
          acknowledge a SYN/ACK that tried to negotiate AccECN is specified in
          <xref target="accecn_Mbox_Interference"/>.</t>
        </section>

        <section anchor="accecn_implications_accecn_mode"
                 title="Implications of AccECN Mode">
          <t><xref target="accecn_Negotiation_3WHS"/> describes the only ways
          that a host can enter AccECN mode, whether as a client or as a
          server.</t>

          <t>As a Data Sender, a host in AccECN mode has the rights and
          obligations concerning the use of ECN defined below, which build on
          those in <xref target="RFC3168"/> as updated by <xref
          target="RFC8311"/>:<list style="symbols">
              <t>Using ECT:<list style="symbols">
                  <t>It can set an ECT codepoint in the IP header of packets
                  to indicate to the network that the transport is capable and
                  willing to participate in ECN for this packet.</t>

                  <t>It does not have to set ECT on any packet (for instance
                  if it has reason to believe such a packet would be
                  blocked).</t>
                </list></t>

              <t>Switching feedback negotiation (e.g.&nbsp;fall-back):<list
                  style="symbols">
                  <t>It SHOULD NOT set ECT on any packet if it has received at
                  least one valid SYN or Acceptable SYN/ACK with AE=CWR=ECE=0.
                  A "valid SYN" has the same port numbers and the same ISN as
                  the SYN that caused the server to enter AccECN mode.</t>

                  <t>It MUST NOT send an ECN-setup SYN <xref
                  target="RFC3168"/> within the same connection as it has sent
                  a SYN requesting AccECN feedback.</t>

                  <t>It MUST NOT send an ECN-setup SYN/ACK <xref
                  target="RFC3168"/> within the same connection as it has sent
                  a SYN/ACK agreeing to use AccECN feedback.</t>
                </list>The above rules are necessary because, if one peer were
              to negotiate the feedback mode in two different types of
              handshake, it would not be possible for the other peer to know
              for certain which handshake packet(s) the other end had
              eventually received or in which order it received them. So, in
              the absence of these rules, the two peers could end up using
              different feedback modes without knowing it.</t>

              <t>Congestion response:<list style="symbols">
                  <t>It is still obliged to respond appropriately to AccECN
                  feedback that indicates there were ECN marks on packets it
                  had previously sent, as defined in Section 6.1 of <xref
                  target="RFC3168"/> and updated by Sections 2.1 and 4.1 of
                  <xref target="RFC8311"/>.<vspace blankLines="1"/>In general,
                  it is obliged to respond to congestion feedback even when it
                  is solely sending non-ECN-capable packets (for rationale,
                  some examples and some exceptions see <xref
                  target="accecn_sec_ecn-mangling"/>, <xref
                  target="accecn_sec_ACE_init_invalid"/>).</t>

                  <t>The commitment to respond appropriately to incoming
                  indications of congestion remains even if it sends a SYN
                  packet with AE=CWR=ECE=0, in a later transmission within the
                  same TCP connection.</t>

                  <t>Unlike an RFC 3168 data sender, it MUST NOT set CWR to
                  indicate it has received and responded to indications of
                  congestion (for the avoidance of doubt, this does not
                  preclude it from setting the bits of the ACE counter field,
                  which includes an overloaded use of the same bit).</t>
                </list></t>
            </list></t>

          <t>As a Data Receiver:<list style="symbols">
              <t>a host in AccECN mode MUST feed back the information in the
              IP-ECN field of incoming packets using Accurate ECN feedback, as
              specified in <xref target="accecn_feedback"/> below.</t>

              <t>if it receives an ECN-setup SYN or ECN-setup SYN/ACK <xref
              target="RFC3168"/> during the same connection as it receives a
              SYN requesting AccECN feedback or a SYN/ACK agreeing to use
              AccECN feedback, it MUST reset the connection with a RST
              packet.</t>

              <t>if for any reason it is not willing to provide ECN feedback
              on a particular TCP connection, to indicate this unwillingness
              it SHOULD clear the AE, CWR and ECE flags in all SYN and/or
              SYN/ACK packets that it sends.</t>

              <t>it MUST NOT use reception of packets with ECT set in the
              IP-ECN field as an implicit signal that the peer is ECN-capable.
              Reason: ECT at the IP layer does not explicitly confirm the peer
              has the correct ECN feedback logic, as the packets could have
              been mangled at the IP layer.</t>
            </list></t>
        </section>
      </section>

      <section anchor="accecn_feedback" title="AccECN Feedback">
        <t>Each Data Receiver of each half connection maintains four counters,
        r.cep, r.ceb, r.e0b and r.e1b:<list style="symbols">
            <t>The Data Receiver MUST increment the CE packet counter (r.cep)
            for every Acceptable packet that it receives with the CE
            codepoint in the IP-ECN field, including CE-marked control packets
            but excluding CE on SYN packets (SYN=1; ACK=0).</t>

            <t>A Data Receiver that supports sending of the AccECN TCP Option
            MUST increment the r.ceb, r.e0b or r.e1b byte counters by the
            number of TCP payload octets in Acceptable packets marked
            respectively with the CE, ECT(0) and ECT(1) codepoint in their
            IP-ECN field, including any payload octets on control packets, but
            not including any payload octets on SYN packets (SYN=1;
            ACK=0).</t>
          </list></t>

        <t>Each Data Sender of each half connection maintains four counters,
        s.cep, s.ceb, s.e0b and s.e1b intended to track the equivalent
        counters at the Data Receiver.</t>

        <t>A Data Receiver feeds back the CE packet counter using the Accurate
        ECN (ACE) field, as explained in <xref target="accecn_ACE"/>. It
        optionally feeds back all the byte counters using the AccECN TCP
        Option, as specified in <xref target="accecn_option"/>.</t>

        <t>Whenever a host feeds back the value of any counter, it MUST report
        the most recent value, no matter whether it is in a pure ACK, an ACK
        with new payload data or a retransmission. Therefore the feedback
        carried on a retransmitted packet is unlikely to be the same as the
        feedback on the original packet.</t>

        <section anchor="accecn_init_counters"
                 title="Initialization of Feedback Counters">
          <t>When a host first enters AccECN mode, in its role as a Data
          Receiver it initializes its counters to r.cep = 5, r.e0b = r.e1b = 1
          and r.ceb = 0.</t>

          <t>Non-zero initial values are used to support a stateless handshake
          (see <xref target="accecn_Interaction_SYN_Cookies"/>) and to be
          distinct from cases where the fields are incorrectly zeroed
          (e.g.&nbsp;by middleboxes - see <xref
          target="accecn_sec_zero_option"/>).</t>

          <t>When a host enters AccECN mode, in its role as a Data Sender it
          initializes its counters to s.cep = 5, s.e0b = s.e1b = 1 and s.ceb =
          0.</t>
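
          <t>As a non-normative illustration, the Data Receiver's counters,
          their initial values and the increment rules of <xref
          target="accecn_feedback"/> could be sketched in Python as follows.
          The class and method names are hypothetical. The sketch assumes the
          caller passes in only Acceptable packets and ignores any limit on
          counter width. The byte counters are only needed by a receiver that
          supports sending the AccECN TCP Option.</t>

          <figure>
            <artwork><![CDATA[
```python
# Non-normative sketch; all names here are hypothetical.
from dataclasses import dataclass

# IP-ECN codepoints as 2-bit values (RFC 3168).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


@dataclass
class ReceiverCounters:
    cep: int = 5  # r.cep: CE packet counter
    ceb: int = 0  # r.ceb: CE byte counter
    e0b: int = 1  # r.e0b: ECT(0) byte counter
    e1b: int = 1  # r.e1b: ECT(1) byte counter

    def on_packet(self, ip_ecn, payload_len, syn=False, ack=True):
        """Update the counters for one Acceptable packet."""
        if syn and not ack:
            return  # SYN packets (SYN=1; ACK=0) are never counted
        if ip_ecn == CE:
            self.cep += 1
            self.ceb += payload_len
        elif ip_ecn == ECT0:
            self.e0b += payload_len
        elif ip_ecn == ECT1:
            self.e1b += payload_len
```
]]></artwork>
          </figure>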
        </section>

        <section anchor="accecn_ACE" title="The ACE Field">
          <t>After AccECN has been negotiated on the SYN and SYN/ACK, both
          hosts overload the three TCP flags (AE, CWR and ECE) in the main TCP
          header as one 3-bit field. Then the field is given a new name, ACE,
          as shown in <xref target="accecn_Fig_ACE_ACK"/>.</t>

          <!-- <?rfc needLines="9" ?> -->

          <figure align="center" anchor="accecn_Fig_ACE_ACK"
                  title="Definition of the ACE field within bytes 13 and 14 of the TCP Header (when AccECN has been negotiated and SYN=0).">
            <artwork align="center"><![CDATA[  
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|               |           |           | U | A | P | R | S | F |
| Header Length | Reserved  |    ACE    | R | C | S | S | Y | I |
|               |           |           | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
]]></artwork>
          </figure>

          <t>The original definition of these three flags in the TCP header,
          including the addition of support for the ECN Nonce, is shown for
          comparison in <xref target="accecn_Fig_TCPHdr"/>. This specification
          does not rename these three TCP flags to ACE unconditionally; it
          merely overloads them with another name and definition once an
          AccECN connection has been established.</t>

          <t>With one exception (<xref target="accecn_ACE_3rdACK"/>), a host
          with both of its half-connections in AccECN mode MUST interpret the
          AE, CWR and ECE flags as the 3-bit ACE counter on a segment with the
          SYN flag cleared (SYN=0). On such a packet, a Data Receiver MUST
          encode the three least significant bits of its r.cep counter into
          the ACE field that it feeds back to the Data Sender. A host MUST NOT
          interpret the 3 flags as a 3-bit ACE field on any segment with SYN=1
          (whether ACK is 0 or 1), or if AccECN negotiation is incomplete or
          has not succeeded.</t>

          <t>Both parts of each of these conditions are equally important. For
          instance, even if AccECN negotiation has been successful, the ACE
          field is not defined on any segments with SYN=1 (e.g.&nbsp;a
          retransmission of an unacknowledged SYN/ACK, or when both ends send
          SYN/ACKs after AccECN support has been successfully negotiated
          during a simultaneous open).</t>

          <section anchor="accecn_ACE_3rdACK"
                   title="ACE Field on the ACK of the SYN/ACK">
            <t>A TCP client (A) in AccECN mode MUST feed back which of the 4
            possible values of the IP-ECN field was on the SYN/ACK by writing
            it into the ACE field of a pure ACK with no SACK blocks using the
            binary encoding in <xref target="accecn_Tab_SYN-ACK_fb2"/> (which
            is the same as that used on the SYN/ACK in <xref
            target="accecn_Tab_Negotiation"/>). This shall be called the
            handshake encoding of the ACE field, and it is the only exception
            to the rule that the ACE field carries the 3 least significant
            bits of the r.cep counter on packets with SYN=0.</t>

            <t>Normally, a TCP client acknowledges a SYN/ACK with an ACK that
            satisfies the above conditions anyway (SYN=0, no data, no SACK
            blocks). If an AccECN TCP client intends to acknowledge the
            SYN/ACK with a packet that does not satisfy these conditions
            (e.g.&nbsp;it has data to include on the ACK), it SHOULD first
            send a pure ACK that does satisfy these conditions (see <xref
            target="accecn_Interaction_Other"/>), so that it can feed back
            which of the four values of the IP-ECN field arrived on the
            SYN/ACK. A valid exception to this "SHOULD" would be where the
            implementation will only be used in an environment where mangling
            of the ECN field is unlikely.</t>

            <texttable anchor="accecn_Tab_SYN-ACK_fb2"
                       title="The encoding of the ACE field in the ACK of the SYN/ACK to reflect the SYN/ACK's IP-ECN field">
              <ttcol>IP-ECN codepoint on SYN/ACK</ttcol>

              <ttcol>ACE on pure ACK of SYN/ACK</ttcol>

              <ttcol>r.cep of client in AccECN mode</ttcol>

              <c>Not-ECT</c>

              <c>0b010</c>

              <c>5</c>

              <c>ECT(1)</c>

              <c>0b011</c>

              <c>5</c>

              <c>ECT(0)</c>

              <c>0b100</c>

              <c>5</c>

              <c>CE</c>

              <c>0b110</c>

              <c>6</c>
            </texttable>
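
            <t>As a non-normative illustration, the client side of the
            handshake encoding in <xref target="accecn_Tab_SYN-ACK_fb2"/>
            could be sketched in Python as follows. The constant and function
            names are hypothetical.</t>

            <figure>
              <artwork><![CDATA[
```python
# Non-normative sketch; all names here are hypothetical.
# IP-ECN codepoints as 2-bit values (RFC 3168).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

# ACE value on the pure ACK of the SYN/ACK for each IP-ECN codepoint
# that arrived on the SYN/ACK.
HANDSHAKE_ACE = {
    NOT_ECT: 0b010,
    ECT1: 0b011,
    ECT0: 0b100,
    CE: 0b110,
}


def ack_of_synack(synack_ip_ecn):
    """Return (ACE to send, client r.cep after processing the SYN/ACK)."""
    ace = HANDSHAKE_ACE[synack_ip_ecn]
    # r.cep starts at 5; a CE mark on the SYN/ACK counts as one CE packet.
    r_cep = 6 if synack_ip_ecn == CE else 5
    return ace, r_cep
```
]]></artwork>
            </figure>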

            <t>When an AccECN server in SYN-RCVD state receives a pure ACK
            with SYN=0 and no SACK blocks, instead of treating the ACE field
            as a counter, it MUST infer the meaning of each possible value of
            the ACE field from <xref target="accecn_Tab_SYN-ACK_fb"/>, which
            also shows the value that an AccECN server MUST set s.cep to as a
            result.</t>

            <t>Given this encoding of the ACE field on the ACK of a SYN/ACK is
            exceptional, an AccECN server using large receive offload (LRO)
            might prefer to disable LRO until such an ACK has transitioned it
            out of SYN-RCVD state.</t>

            <texttable anchor="accecn_Tab_SYN-ACK_fb"
                       title="Meaning of the ACE field on the ACK of the SYN/ACK">
              <ttcol>ACE on ACK of SYN/ACK</ttcol>

              <ttcol>IP-ECN codepoint on SYN/ACK inferred by server</ttcol>

              <ttcol>s.cep of server in AccECN mode</ttcol>

              <c>0b000</c>

              <c>{Notes 1, 3}</c>

              <c>Disable ECN</c>

              <c>0b001</c>

              <c>{Notes 2, 3}</c>

              <c>5</c>

              <c>0b010</c>

              <c>Not-ECT</c>

              <c>5</c>

              <c>0b011</c>

              <c>ECT(1)</c>

              <c>5</c>

              <c>0b100</c>

              <c>ECT(0)</c>

              <c>5</c>

              <c>0b101</c>

              <c>Currently Unused {Note 2}</c>

              <c>5</c>

              <c>0b110</c>

              <c>CE</c>

              <c>6</c>

              <c>0b111</c>

              <c>Currently Unused {Note 2}</c>

              <c>5</c>
            </texttable>

            <t>{Note 1}: If the server is in AccECN mode, the value of zero
            raises suspicion of zeroing of the ACE field on the path (see
            <xref target="accecn_sec_ACE_init_invalid"/>).</t>

            <t>{Note 2}: If the server is in AccECN mode, these values are
            Currently Unused but the AccECN server's behaviour is still
            defined for forward compatibility. Then the designer of a future
            protocol can know for certain what AccECN servers will do with
            these codepoints.</t>

            <t>{Note 3}: In the case where a server that implements AccECN is
            also using a stateless handshake (termed a SYN cookie) it will not
            remember whether it entered AccECN mode. The values 0b000 or 0b001
            will remind it that it did not enter AccECN mode, because AccECN
            does not use them (see <xref
            target="accecn_Interaction_SYN_Cookies"/> for details). If a
            stateless server that implements AccECN receives either of these
            two values in the ACK, its action is implementation-dependent and
            outside the scope of this spec. It will certainly not take the
            action in the third column because, after it receives either of
            these values, it is not in AccECN mode. I.e., it will not disable
            ECN (at least not just because ACE is 0b000) and it will not set
            s.cep.</t>
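
            <t>As an informal illustration only, the rows of the table above
            could be held as a lookup keyed on the 3-bit ACE value carried on
            the ACK of the SYN/ACK. This is a sketch in Python, not part of
            the specification: the names HANDSHAKE_ACE and init_s_cep are
            hypothetical, and None in the s.cep position stands for "Disable
            ECN". It assumes the server knows it is in AccECN mode, so it does
            not model the stateless case in Note 3.</t>

            <figure>
              <artwork><![CDATA[
```python
# Hypothetical lookup (illustration only, not normative).
# Key:   3-bit ACE field on the ACK of the SYN/ACK.
# Value: (IP-ECN codepoint inferred on the SYN/ACK, initial s.cep).
# A codepoint of None means no codepoint is inferred;
# an s.cep of None stands for "Disable ECN".
HANDSHAKE_ACE = {
    0b000: (None, None),     # Notes 1 and 3
    0b001: (None, 5),        # Notes 2 and 3
    0b010: ("Not-ECT", 5),
    0b011: ("ECT(1)", 5),
    0b100: ("ECT(0)", 5),
    0b101: (None, 5),        # Currently Unused, Note 2
    0b110: ("CE", 6),
    0b111: (None, 5),        # Currently Unused, Note 2
}


def init_s_cep(ace):
    """Return (inferred codepoint, s.cep) for a 3-bit ACE value."""
    return HANDSHAKE_ACE[ace & 0b111]
```
]]></artwork>
            </figure>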
          </section>

          <section anchor="accecn_sec_ACE_feedback"
                   title="Encoding and Decoding Feedback in the ACE Field">
            <t>Whenever the Data Receiver sends an ACK with SYN=0 (with or
            without data), unless the handshake encoding in <xref
            target="accecn_ACE_3rdACK"/> applies, the Data Receiver MUST
            encode the least significant 3 bits of its r.cep counter into the
            ACE field (see <xref target="accecn_Algo_ACE_Wrap"/>).</t>

            <t>Whenever the Data Sender receives an ACK with SYN=0 (with or
            without data), it first checks whether the ACK has already been
            superseded by another ACK, in which case it ignores the ECN
            feedback. If the ACK has not been superseded, and if the special
            handshake encoding in <xref target="accecn_ACE_3rdACK"/> does not
            apply, the Data Sender decodes the ACE field as follows (see <xref
            target="accecn_Algo_ACE_Wrap"/> for examples).<list
                style="symbols">
                <t>It takes the least significant 3 bits of its local s.cep
                counter and subtracts them from the incoming ACE counter to
                work out the minimum positive increment it could apply to
                s.cep (assuming the ACE field only wrapped at most once).</t>

                <t>It then follows the safety procedures in <xref
                target="accecn_ACE_Safety_S"/> to calculate or estimate how
                many packets the ACK could have acknowledged under the
                prevailing conditions to determine whether the ACE field might
                have wrapped more than once.</t>
              </list></t>
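
            <t>The encoding rule and the first decoding step above can be
            sketched as follows. This is a minimal Python illustration under
            the stated assumption that the ACE field wrapped at most once; the
            function names are hypothetical. Note that the result is zero when
            the incoming ACE field matches the low bits of s.cep, so the sketch
            returns a non-negative rather than a strictly positive value. The
            safety procedures that detect multiple wraps are not modelled
            here.</t>

            <figure>
              <artwork><![CDATA[
```python
ACE_BITS = 3
ACE_MASK = (1 << ACE_BITS) - 1    # 0b111


def encode_ace(r_cep):
    """Data Receiver: 3 least significant bits of r.cep."""
    return r_cep & ACE_MASK


def min_cep_increment(s_cep, ace):
    """Data Sender: smallest increment to s.cep consistent with the
    incoming ACE field (hypothetical helper, single wrap assumed)."""
    return (ace - (s_cep & ACE_MASK)) & ACE_MASK
```
]]></artwork>
            </figure>

            <t>For instance, with s.cep at 5 (0b101) and an incoming ACE field
            of 0b001, the minimum increment is 4, which takes s.cep to 9, whose
            least significant 3 bits are again 0b001.</t>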

            <t>The encode/decode procedures during the three-way handshake are
            exceptions to the general rules given so far, so they are spelled
            out step by step below for clarity:<list style="symbols">
                <t>If a TCP server in AccECN mode receives a CE mark in the
                IP-ECN field of a SYN (SYN=1, ACK=0), it MUST NOT increment
                r.cep (it remains at its initial value of 5). <vspace
                blankLines="1"/>Reason: It would be redundant for the server
                to include CE-marked SYNs in its r.cep counter, because it
                already reliably delivers feedback of any CE marking using the
                encoding in <xref target="accecn_Tab_Negotiation"/> in the
                SYN/ACK. This also ensures that, when the server starts using
                the ACE field, it has not unnecessarily consumed more than one
                initial value, given they can be used to negotiate variants of
                the AccECN protocol (see <xref
                target="accecn_space_evolution"/>).</t>

                <t>If a TCP client in AccECN mode receives CE feedback in the
                TCP flags of a SYN/ACK, it MUST NOT increment s.cep (it
                remains at its initial value of 5), so that it stays in step
                with r.cep on the server. Nonetheless, the TCP client still
                triggers the congestion control actions necessary to respond
                to the CE feedback.</t>

                <t>If a TCP client in AccECN mode receives a CE mark in the
                IP-ECN field of a SYN/ACK, it MUST increment r.cep, but no
                more than once no matter how many CE-marked SYN/ACKs it
                receives (i.e.&nbsp;incremented from 5 to 6, but no further).
                <vspace blankLines="1"/>Reason: Incrementing r.cep ensures the
                client will eventually deliver any CE marking to the server
                reliably when it starts using the ACE field. Even though the
                client also feeds back any CE marking on the ACK of the
                SYN/ACK using the encoding in <xref
                target="accecn_Tab_SYN-ACK_fb2"/>, this ACK is not delivered
                reliably, so it can be considered as a timely notification
                that is redundant but unreliable. The client does not
                increment r.cep more than once, because the server can only
                increment s.cep once (see next bullet). Also, this limits the
                unnecessarily consumed initial values of the ACE field to
                two.</t>

                <t>If a TCP server in AccECN mode and in SYN-RCVD state
                receives CE feedback in the TCP flags of a pure ACK with no
                SACK blocks, it MUST increment s.cep (from 5 to 6). The TCP
                server then triggers the congestion control actions necessary
                to respond to the CE feedback.<vspace
                blankLines="1"/>Reasoning: The TCP server can only increment
                s.cep once, because the first ACK it receives will cause it to
                transition out of SYN-RCVD state. The server's congestion
                response would be no different even if it could receive
                feedback of more than one CE-marked SYN/ACK.<vspace
                blankLines="1"/>Once the TCP server transitions to ESTABLISHED
                state, it might later receive other pure ACK(s) with the
                handshake encoding in the ACE field. A server MAY implement a
                test for such a case, but it is not required. Therefore, once
                in the ESTABLISHED state, it will be sufficient for the server
                to consider the ACE field to be encoded as the normal ACE
                counter on all packets with SYN=0.<vspace
                blankLines="1"/>Reasoning: Such ACKs will be quite unusual,
                e.g.&nbsp;a SYN/ACK (or ACK of the SYN/ACK) that is delayed
                for longer than the server's retransmission timeout; or packet
                duplication by the network. And the impact of any error in the
                feedback on such ACKs will only be temporary.</t>
              </list></t>
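
            <t>The four handshake rules above could be summarized in code
            roughly as follows. This Python sketch is illustrative only: the
            class and method names are hypothetical, the boolean arguments are
            assumed to have been derived elsewhere from the IP-ECN field and
            the TCP flags, and the congestion response itself is left to the
            caller.</t>

            <figure>
              <artwork><![CDATA[
```python
class HandshakeCounters:
    """Hypothetical per-host CE packet counters during the 3WHS."""

    def __init__(self):
        self.r_cep = 5    # initial value
        self.s_cep = 5    # initial value

    def server_on_syn(self, ip_ce):
        """Server, SYN arrives: r.cep stays at 5 even if CE-marked,
        because the SYN/ACK flags already feed the marking back."""
        return self.r_cep

    def client_on_synack(self, ip_ce, flags_say_ce):
        """Client, SYN/ACK arrives. Returns True if the caller
        needs to respond to CE feedback about its SYN."""
        if ip_ce and self.r_cep == 5:
            self.r_cep = 6          # 5 -> 6, but no further
        # s.cep is deliberately left at 5.
        return flags_say_ce

    def server_on_first_ack(self, ace_says_ce):
        """Server in SYN-RCVD, pure ACK with no SACK blocks arrives.
        Returns True if the caller needs to respond to congestion."""
        if ace_says_ce:
            self.s_cep = 6
        return ace_says_ce
```
]]></artwork>
            </figure>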
          </section>

          <section anchor="accecn_sec_ecn-mangling"
                   title="Testing for Mangling of the IP/ECN Field">
            <t>The value of the ACE field on the SYN/ACK indicates the value
            of the IP/ECN field when the SYN arrived at the server. The client
            can compare this with how it originally set the IP/ECN field on
            the SYN. If this comparison implies an invalid transition (defined
            below) of the IP/ECN field, for the remainder of the
            half-connection the client is advised to send non-ECN-capable
            packets, but it still ought to respond to any feedback of CE
            markings (explained below). However, the client MUST remain in the
            AccECN feedback mode and it MUST continue to feed back any ECN
            markings on arriving packets (in its role as Data Receiver). <!--There is no need to say the following for forward compatibility:
"If the server deliberately sends false feedback in the ACE field that implies an unsafe transition, it MUST continue the connection 
even if the client does not disable sending ECN-capable packets"--></t>

            <t>The value of the ACE field on the last ACK of the 3WHS
            indicates the value of the IP/ECN field when the SYN/ACK arrived
            at the client. The server can compare this with how it originally
            set the IP/ECN field on the SYN/ACK. If this comparison implies an
            invalid transition of the IP/ECN field, for the remainder of the
            half-connection the server is advised to send non-ECN-capable
            packets, but it still ought to respond to any feedback of CE
            markings (explained below). However, the server MUST remain in the
            AccECN feedback mode and it MUST continue to feed back any ECN
            markings on arriving packets (in its role as Data Receiver).<!--There is no need to say the following for forward compatibility:
"If the client deliberately sends false feedback in the ACE field that implies an unsafe transition, it MUST continue the connection 
even if the server does not disable sending ECN-capable packets"--></t>

            <t>If a Data Sender in AccECN mode starts sending non-ECN-capable
            packets because it has detected mangling, it is still advised to
            respond to CE feedback. Reason: any CE-marking arriving at the
            Data Receiver could be due to something early in the path mangling
            the non-ECN-capable IP/ECN field into an ECN-capable codepoint and
            then, later in the path, a network bottleneck might be applying
            CE-markings to indicate genuine congestion. This argument applies
            whether the handshake packet originally sent by the client or
            server was non-ECN-capable or ECN-capable because, in either case,
            an unsafe transition could imply that future non-ECN-capable
            packets might get mangled.</t>

            <t>Once a Data Sender has entered AccECN mode it is advised to
            check whether it is receiving continuous CE marking. Specifying
            exactly how to do this is beyond the scope of the present
            specification, but the sender might check whether the feedback for
            every packet it sends for the first three or four rounds indicates
            CE-marking. If continuous CE-marking is detected, for the
            remainder of the half-connection, the Data Sender ought to send
            non-ECN-capable packets and it is advised not to respond to any
            feedback of CE markings. The Data Sender might occasionally test
            whether it can resume sending ECN-capable packets. As always, once
            a host has entered AccECN mode, it MUST remain in the same
            feedback mode and it MUST continue to feed back any ECN markings
            on arriving packets.</t>

            <t>The above advice on switching to sending non-ECN-capable
            packets but still responding to CE-markings unless they become
            continuous is not stated normatively (in capitals), because the
            best strategy might depend on experience of the most likely types
            of mangling, which can only be known at the time of
            deployment.</t>

            <t>The ACK of the SYN/ACK is not reliably delivered (nonetheless,
            the count of CE marks is still eventually delivered reliably). If
            this ACK does not arrive, the server is advised to continue to
            send ECN-capable packets without having tested for mangling of the
            IP/ECN field on the SYN/ACK.</t>

            <t>All the fall-back behaviours in this section are necessary in
            case mangling of the IP/ECN field is asymmetric, which is
            currently common over some mobile networks <xref
            target="Mandalari18"/>. Then one end might see no unsafe
            transition and continue sending ECN-capable packets, while the
            other end sees an unsafe transition and stops sending ECN-capable
            packets.</t>

            <t>Invalid transitions of the IP/ECN field are defined in section
            18 of <xref target="RFC3168"/> and repeated here for
            convenience:<list style="symbols">
                <t>the not-ECT codepoint changes;</t>

                <t>either ECT codepoint transitions to not-ECT;</t>

                <t>the CE codepoint changes.</t>
              </list></t>

            <t>RFC 3168 says that a transition from ECT to not-ECT at a
            router is invalid but safe. However, from a host's viewpoint, this
            transition is unsafe because it could be the result of two
            transitions at different routers on the path: ECT to CE (safe)
            then CE to not-ECT (unsafe). This scenario could well happen where
            an ECN-enabled home router congests its upstream mobile broadband
            bottleneck link, then the ingress to the mobile network clears the
            ECN field <xref target="Mandalari18"/>.</t>
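
            <t>The three invalid transitions listed above can be expressed as
            a small predicate. The Python sketch below is illustrative only;
            the function name is hypothetical and the constants are the 2-bit
            IP-ECN codepoints of RFC 3168. A change between ECT(0) and ECT(1)
            is treated as not invalid solely because the list above does not
            name it, which is an assumption of this sketch.</t>

            <figure>
              <artwork><![CDATA[
```python
# IP-ECN codepoints (RFC 3168)
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def is_invalid_transition(sent, received):
    """True if the IP/ECN field changed in one of the three ways
    listed above (hypothetical helper)."""
    if sent == NOT_ECT:
        return received != NOT_ECT    # the not-ECT codepoint changes
    if sent == CE:
        return received != CE         # the CE codepoint changes
    # sent is ECT(0) or ECT(1)
    return received == NOT_ECT        # ECT transitions to not-ECT
```
]]></artwork>
            </figure>

            <t>The client would call this with the codepoint it set on the SYN
            and the codepoint fed back on the SYN/ACK; the server would do
            likewise for the SYN/ACK and the feedback on the last ACK of the
            3WHS.</t>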
          </section>

          <section anchor="accecn_sec_ACE_init_invalid"
                   title="Testing for Zeroing of the ACE Field">
            <t><xref target="accecn_ACE"/> required the Data Receiver to
            initialize the r.cep counter to a non-zero value. Therefore, in
            either direction the initial value of the ACE counter ought to be
            non-zero.</t>

            <t>If AccECN has been successfully negotiated, the Data Sender
            SHOULD check the value of the ACE counter in the first packet
            (with or without data) that arrives with SYN=0. If the value of
            this ACE field is zero (0b000), for the remainder of the
            half-connection the Data Sender ought to send non-ECN-capable
            packets and it is advised not to respond to any feedback of CE
            markings. Reason: the symptoms imply either potential mangling of
            the ECN fields in both the IP and TCP headers, or a broken remote
            TCP implementation. This advice is not stated normatively (in
            capitals), because the best strategy might depend on experience of
            the most likely types of mangling, which can only be known at the
            time of deployment.<!--There is no need to say the following for forward compatibility:
"If a data receiver negotiates AccECN but then zeros the ACE field in its first segment with SYN=0, 
it MUST continue the connection even if the data sender does not disable sending ECN-capable packets."--></t>

            <t>If reordering occurs, "the first packet ... that arrives" will
            not necessarily be the same as the first packet in sequence order.
            The test has been specified loosely like this to simplify
            implementation, and because it would not have been any more
            precise to have specified the first packet in sequence order,
            which would not necessarily be the first ACE counter that the Data
            Receiver fed back anyway, given it might have been a
            retransmission. Usually, the server checks the ACK of the SYN/ACK
            from the client, while the client checks the first data segment
            from the server.</t>

            <t>The possibility of reordering means that there is a small
            chance that the ACE field on the first packet to arrive is
            genuinely zero (without middlebox interference). This would cause
            a host to unnecessarily disable ECN for a half-connection.
            Therefore, in environments where there is no evidence of the ACE
            field being zeroed, implementations can skip this test.</t>

            <t>Note that the Data Sender MUST NOT test whether the arriving
            counter in the initial ACE field has been initialized to a
            specific valid value - the above check solely tests whether the
            ACE fields have been incorrectly zeroed. This allows hosts to use
            different initial values as an additional signalling channel in
            future.</t>
          </section>

          <section anchor="accecn_ACE_Safety"
                   title="Safety against Ambiguity of the ACE Field">
            <t>If too many CE-marked segments are acknowledged at once, or if
            a long run of ACKs is lost or thinned out, the 3-bit counter in
            the ACE field might have cycled between two ACKs arriving at the
            Data Sender. The following safety procedures minimize this
            ambiguity.</t>

            <section anchor="accecn_ACE_Safety_R"
                     title="Data Receiver Safety Procedures">
              <t>The following rules define when a Data Receiver in AccECN
              mode emits an ACK:<list style="hanging">
                  <t hangText="Change-Triggered ACKs:">An AccECN Data Receiver
                  SHOULD emit an ACK whenever a data packet marked CE arrives
                  after the previous packet was not CE.<vspace
                  blankLines="1"/>Even though this rule is stated as a
                  "SHOULD", it is important for a transition to trigger an ACK
                  if at all possible. The only valid exception to this rule is
                  given below these bullets.<vspace blankLines="1"/>For the
                  avoidance of doubt, this rule is deliberately worded to
                  apply solely when <spanx style="emph">data</spanx> packets
                  arrive, but the comparison with the previous packet includes
                  any packet, not just data packets.</t>

                  <t hangText="Increment-Triggered ACKs:">An AccECN Data
                  Receiver MUST emit an ACK if 'n' CE marks have arrived since
                  the previous ACK. If there is newly delivered data to
                  acknowledge, 'n' SHOULD be 2. If there is no newly delivered
                  data to acknowledge, 'n' SHOULD be 3 and MUST be no less
                  than 3. In either case, 'n' MUST be no greater than 7.</t>
                </list>The above rules for when to send an ACK are designed to
              be complemented by those in <xref
              target="accecn_option_usage"/>, which concern whether the AccECN
              TCP Option ought to be included on ACKs.</t>
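
              <t>As a rough illustration of how the two rules above interact,
              the following Python sketch decides, per arriving packet, whether
              an ACK ought to be emitted. It is not a definitive
              implementation: the class and its members are hypothetical,
              every data packet is assumed to deliver new data, the lower
              bound of 1 on 'n' for the data case is an assumption (the text
              above only gives an upper bound), and ordinary delayed-ACK logic
              is expected to call on_ack_sent() for ACKs it sends for other
              reasons.</t>

              <figure>
                <artwork><![CDATA[
```python
class AckTrigger:
    """Hypothetical helper for the two ACK-triggering rules."""

    def __init__(self, n_data=2, n_no_data=3):
        # Assumed bounds: 'n' <= 7 always, >= 3 without new data.
        if not (1 <= n_data <= 7 and 3 <= n_no_data <= 7):
            raise ValueError("'n' out of range")
        self.n_data = n_data
        self.n_no_data = n_no_data
        self.prev_ce = False      # was the previous packet CE?
        self.ce_since_ack = 0     # CE marks since the previous ACK
        self.new_data = False     # new data awaiting acknowledgement?

    def on_packet(self, ce, has_data):
        """Return True if an ACK ought to be emitted now."""
        # Change-triggered: CE data packet after a non-CE packet.
        change = has_data and ce and not self.prev_ce
        self.prev_ce = ce
        if ce:
            self.ce_since_ack += 1
        self.new_data = self.new_data or has_data
        # Increment-triggered: 'n' CE marks since the previous ACK.
        n = self.n_data if self.new_data else self.n_no_data
        if change or self.ce_since_ack >= n:
            self.on_ack_sent()
            return True
        return False

    def on_ack_sent(self):
        """Call whenever any ACK is sent, for whatever reason."""
        self.ce_since_ack = 0
        self.new_data = False
```
]]></artwork>
              </figure>

              <t>With the default values in this sketch, a run of CE-marked
              data packets following a packet that was not CE yields an ACK on
              the first CE mark and then on every second CE mark after
              that.</t>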

              <t>If the arrivals of a number of data packets are all processed
              as one event, e.g.&nbsp;using large receive offload (LRO) or
              generic receive offload (GRO), both the above rules SHOULD be
              interpreted as requiring multiple ACKs to be emitted
              back-to-back (for each transition and for each repetition by 'n'
              CE marks). If this is problematic for high performance, either
              rule can be interpreted as requiring just a single ACK at the
              end of the whole receive event.</t>

              <t>Even if a number of data packets do not arrive as one event,
              the 'Change-Triggered ACKs' rule could sometimes cause the ACK
              rate to be problematic for high performance (although high
              performance protocols such as DCTCP already successfully use
              change-triggered ACKs). The rationale for change-triggered ACKs
              is so that the Data Sender can rely on them to detect queue
              growth as soon as possible, particularly at the start of a flow.
              The approach can lead to some additional ACKs but it feeds back
              the timing and the order in which ECN marks are received with
              minimal additional complexity. If CE marks are infrequent, as is
              the case for most AQMs at the time of writing, or there are
              multiple marks in a row, the additional load will be low.
              However, marking patterns with numerous non-contiguous CE marks
              could increase the load significantly. One possible compromise
              would be for the receiver to heuristically detect whether the
              sender is in slow-start, then to implement change-triggered ACKs
              while the sender is in slow-start, and offload otherwise.</t>

              <t>With ECN-capable pure ACKs <xref
              target="I-D.ietf-tcpm-generalized-ecn"/>, the
              'Increment-Triggered ACKs' rule could cause ECN-marked pure ACKs
              to trigger further ACKs. Although TCP normally only ACKs newly
              delivered data, in this case the ACKs of ACKs would feed back
              new congestion state. The minimum of 3 for 'n' in this case
              ensures that, even if there is pathological congestion in both
              directions, any resulting ping-pong of ACKs will be rapidly
              damped.</t>

              <t>These ACKs of ACKs could be misidentified as duplicate ACKs
              in certain circumstances described below. Therefore, a host in
              AccECN mode that is sending ECN-capable pure ACKs SHOULD add one
              of the following additional checks when it tests whether an
              incoming pure ACK is a duplicate:<list style="symbols">
                  <t>If SACK has been negotiated for the connection, but
                  there is no SACK option on the incoming pure ACK, it is not
                  a duplicate;</t>

                  <t>If timestamps are in use, and the incoming pure ACK
                  echoes a timestamp older than the oldest unacknowledged
                  data, it is not a duplicate.</t>
                </list>In the unlikely event that neither SACK nor timestamps
              are in use, or if the implementation has opted not to include
              either of the above two checks, it SHOULD NOT send ECN-capable
              pure ACKs. If it does, it could lead to false detection of
              duplicate ACKs, causing spurious retransmission(s) with a
              resulting unnecessary reduction in congestion window; but only
              in certain circumstances. Specifically, this can happen if TCP
              peer A has been sending data, then receiving, and then starts
              sending again within one round trip, while the ECN-capable pure
              ACKs it sent in the previous round encounter heavy enough
              congestion to trigger peer B to invoke the above 'n'-CE-mark
              rule. Also note that
              falsely considering these ACKs as duplicates would incorrectly
              imply that data left the network.</t>
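
              <t>The two additional duplicate-ACK checks listed above might be
              sketched as follows. This Python fragment is illustrative only
              and applies both checks, although the text above requires only
              one of them. The function names and arguments are hypothetical;
              in particular, oldest_unacked_ts is assumed to be the timestamp
              value that was sent with the oldest unacknowledged data, and
              ts_echo the timestamp echoed on the incoming pure ACK.</t>

              <figure>
                <artwork><![CDATA[
```python
def ts_before(a, b):
    """True if 32-bit timestamp a is older than b (modulo 2**32)."""
    return a != b and ((b - a) & 0xFFFFFFFF) < 0x80000000


def may_be_duplicate(sack_negotiated, has_sack_option,
                     ts_in_use, ts_echo, oldest_unacked_ts):
    """Return False if either extra check rules the incoming pure
    ACK out as a duplicate. True only means that the usual
    duplicate-ACK tests still need to be applied."""
    if sack_negotiated and not has_sack_option:
        return False
    if ts_in_use and ts_before(ts_echo, oldest_unacked_ts):
        return False
    return True
```
]]></artwork>
              </figure>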
            </section>

            <section anchor="accecn_ACE_Safety_S"
                     title="Data Sender Safety Procedures">
              <t>If the Data Sender has not received AccECN TCP Options to
              give it more dependable information, and it detects that the ACE
              field could have cycled, it SHOULD deem whether it cycled by
              taking the safest likely case under the prevailing conditions.
              It can detect if the counter could have cycled by using the jump
              in the acknowledgement number since the last ACK to calculate or
              estimate how many segments could have been acknowledged. An
              example algorithm to implement this policy is given in <xref
              target="accecn_Algo_ACE_Wrap"/>. An implementer MAY develop an
              alternative algorithm as long as it satisfies these
              requirements.</t>

              <t>If missing acknowledgement numbers arrive later (reordering)
              and prove that the counter did not cycle, the Data Sender MAY
              attempt to neutralize the effect of any action it took based on
              a conservative assumption that it later found to be
              incorrect.</t>

              <t>The Data Sender can estimate how many packets (of any
              marking) an ACK acknowledges. If the ACE counter on an ACK seems
              to imply that the minimum number of newly CE-marked packets is
              greater than the number of newly acknowledged packets, the Data
              Sender SHOULD believe the ACE counter, unless it can be sure
              that it is counting all control packets correctly.</t>
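
              <t>One conservative reading of the procedures above is sketched
              below in Python. It is an assumption of this sketch, not a
              statement of the required algorithm, that "the safest likely
              case" means the largest number of CE marks that is consistent
              with the ACE field and does not exceed the number of newly
              acknowledged packets. The function name is hypothetical, and
              newly_acked_pkts is assumed to have been estimated elsewhere
              from the jump in the acknowledgement number.</t>

              <figure>
                <artwork><![CDATA[
```python
ACE_WRAP = 8    # the 3-bit ACE field cycles every 8 CE marks


def safest_cep_increment(s_cep, ace, newly_acked_pkts):
    """Hypothetical helper: increment to apply to s.cep when no
    AccECN Option is available to resolve the ambiguity."""
    d_cep = (ace - s_cep) % ACE_WRAP    # minimum increment
    if newly_acked_pkts < d_cep:
        # ACE implies more CE marks than packets acknowledged:
        # believe the ACE counter.
        return d_cep
    # Largest value <= newly_acked_pkts with the same low 3 bits.
    return newly_acked_pkts - (newly_acked_pkts - d_cep) % ACE_WRAP
```
]]></artwork>
              </figure>

              <t>For example, if the ACE field implies a minimum of 2 new CE
              marks but the ACK newly acknowledges 20 packets, this sketch
              assumes 18 CE marks (two wraps of the counter).</t>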
            </section>
          </section>
        </section>

        <section anchor="accecn_option" title="The AccECN Option">
          <t>The AccECN Option is defined as shown in <xref
          target="accecn_Fig_TCPopt"/>. The initial 'E' of each field name
          stands for 'Echo'.</t>

          <figure align="center" anchor="accecn_Fig_TCPopt"
                  title="The AccECN TCP Option">
            <artwork><![CDATA[ 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Kind = TBD0  |  Length = 11  |          EE0B field           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EE0B (cont'd) |           ECEB field                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  EE1B field                   |             Order 0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Kind = TBD1  |  Length = 11  |          EE1B field           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EE1B (cont'd) |           ECEB field                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  EE0B field                   |             Order 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
          </figure>

          <t><xref target="accecn_Fig_TCPopt"/> shows two option field orders:
          order 0 and order 1. They both consist of three 24-bit fields.
          Order 0 provides the 24 least significant bits of the r.e0b, r.ceb
          and r.e1b counters, respectively. Order 1 provides the same fields,
          but in the opposite order. On each packet, the Data Receiver can use
          whichever order is more efficient.</t>

          <t>When a Data Receiver sends an AccECN Option, it MUST set the Kind
          field to TBD0 if using Order 0, or to TBD1 if using Order 1. These
          two new TCP Option Kinds are registered in <xref
          target="accecn_IANA_Considerations"/> and called respectively
          AccECN0 and AccECN1.</t>

          <t>Note that there is no field to feed back Not-ECT bytes.
          Nonetheless an algorithm for the Data Sender to calculate the number
          of payload bytes received as Not-ECT is given in <xref
          target="accecn_Algo_Not-ECT"/>.</t>

          <t>Whenever a Data Receiver sends an AccECN Option, the rules in
          <xref target="accecn_option_usage"/> allow it to omit unchanged
          fields from the tail of the option, to help cope with option space
          limitations, as long as it preserves the order of the remaining
          fields and includes any field that has changed. The length field
          MUST indicate which fields are present as follows:</t>

          <texttable suppress-title="true"
                     title="Fields included in AccECN TCP Options of each length and type">
            <ttcol>Length</ttcol>

            <ttcol>Order 0</ttcol>

            <ttcol>Order 1</ttcol>

            <c>11</c>

            <c>EE0B, ECEB, EE1B</c>

            <c>EE1B, ECEB, EE0B</c>

            <c>8</c>

            <c>EE0B, ECEB</c>

            <c>EE1B, ECEB</c>

            <c>5</c>

            <c>EE0B</c>

            <c>EE1B</c>

            <c>2</c>

            <c>(empty)</c>

            <c>(empty)</c>
          </texttable>

          <t>The empty option of Length=2 is provided to allow for a case
          where an AccECN Option has to be sent (e.g.&nbsp;on the SYN/ACK to
          test the path), but there is very limited space for the option.</t>
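
          <t>To make the layout and the length table above concrete, the
          following Python sketch serializes an AccECN Option. It is
          illustrative only: the function name is hypothetical, the Kind value
          is passed in because TBD0 and TBD1 are not yet assigned, network
          (big-endian) byte order is assumed, and the decision about which
          trailing fields may be omitted is left to the caller.</t>

          <figure>
            <artwork><![CDATA[
```python
def build_accecn_option(kind, order, e0b, ceb, e1b, nfields=3):
    """Return the option as bytes (hypothetical helper).

    kind    -- option Kind for the chosen order (TBD0 or TBD1)
    order   -- 0 or 1, selecting the field order
    nfields -- fields kept from the front of the order (0 to 3)
    """
    if order not in (0, 1) or not 0 <= nfields <= 3:
        raise ValueError("bad order or field count")
    counters = (e0b, ceb, e1b) if order == 0 else (e1b, ceb, e0b)
    out = bytearray([kind, 2 + 3 * nfields])    # Kind, Length
    for value in counters[:nfields]:
        # 24 least significant bits of the counter
        out += (value & 0xFFFFFF).to_bytes(3, "big")
    return bytes(out)
```
]]></artwork>
          </figure>

          <t>Calling this sketch with nfields set to 3, 2, 1 or 0 produces
          options of Length 11, 8, 5 or 2 respectively.</t>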

          <t>All implementations of a Data Sender that read any AccECN Option
          MUST be able to read in AccECN Options of any of the above lengths.
          For forward compatibility, if the AccECN Option is of any other
          length, implementations MUST use those whole 3-octet fields that fit
          within the length and ignore the remainder of the option, treating
          it as padding.<!--ToDo: I'm sure we can make this more flexible, so we can introduce a 1 byte initial field later.--></t>
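
          <t>The reading rules above, including the forward-compatibility
          rule for other lengths, could be sketched in Python as follows. The
          function name and the dictionary it returns are hypothetical, the
          caller is assumed to have derived the order from the option Kind,
          and network (big-endian) byte order is assumed.</t>

          <figure>
            <artwork><![CDATA[
```python
def parse_accecn_option(option, order):
    """Return a dict of the 24-bit fields present in 'option',
    the raw bytes starting at the Kind octet (hypothetical)."""
    if len(option) < 2:
        raise ValueError("truncated option")
    length = option[1]
    if length < 2 or length > len(option):
        raise ValueError("malformed option length")
    names = ("EE0B", "ECEB", "EE1B")
    if order == 1:
        names = names[::-1]
    body = option[2:length]
    # Use only whole 3-octet fields; any remainder is padding.
    nfields = min(len(body) // 3, 3)
    return {
        names[i]: int.from_bytes(body[3 * i:3 * i + 3], "big")
        for i in range(nfields)
    }
```
]]></artwork>
          </figure>

          <t>For instance, an option of Length 6 is read by this sketch as a
          single 24-bit field followed by one octet of padding.</t>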

          <t>The AccECN Option has to be optional to implement, because both
          sender and receiver have to be able to cope without the option
          anyway - in cases where it does not traverse a network path. It is
          RECOMMENDED to implement both sending and receiving of the AccECN
          Option. Support for the AccECN Option is particularly valuable over
          paths that introduce a high degree of ACK filtering, where the 3-bit
          ACE counter alone might sometimes be insufficient, when it is
          ambiguous whether it has wrapped. If sending of the AccECN Option is
          implemented, the fall-backs described in this document will need to
          be implemented as well (unless solely for a controlled environment
          where path traversal is not considered a problem). Even if a
          developer does not implement logic to understand received AccECN
          Options, it is RECOMMENDED that they implement logic to send AccECN
          Options. Otherwise, those remote peers that implement the receiving
          logic will still be excluded from congestion feedback that is robust
          against the increasingly aggressive ACK filtering in the Internet.
          The logic to send AccECN Options is the simpler to implement of the
          two sides.<!--ToDo: Choose whether sending but not receiving is more important, and iff so swap in the following:
Even if a developer does not implement logic to understand received AccECN Options, 
it is RECOMMENDED that they still implement logic to send AccECN Options, to provide 
richer feedback to those remote peers that do understand it.--></t>

          <t>If a Data Receiver intends to send the AccECN Option at any time
          during the rest of the connection it is RECOMMENDED to also test
          path traversal of the AccECN Option as specified in <xref
          target="accecn_Mbox_Interference"/>.</t>

          <section title="Encoding and Decoding Feedback in the AccECN Option Fields">
            <t>Whenever the Data Receiver includes any of the counter fields
            (ECEB, EE0B, EE1B) in an AccECN Option, it MUST encode the 24
            least significant bits of the current value of the associated
            counter into the field (respectively r.ceb, r.e0b, r.e1b).</t>

            <t>Whenever the Data Sender receives an ACK carrying an AccECN
            Option, it first checks whether the ACK has already been
            superseded by another ACK, in which case it ignores the ECN
            feedback. If the ACK has not been superseded, the Data Sender
            normally decodes the fields in the AccECN Option as follows. For
            each field, it takes the least significant 24 bits of its
            associated local counter (s.ceb, s.e0b or s.e1b) and subtracts
            them from the counter in the associated field of the incoming
            AccECN Option (respectively ECEB, EE0B, EE1B), to work out the
            minimum positive increment it could apply to s.ceb, s.e0b or s.e1b
            (assuming the field in the option only wrapped at most once).</t>

            <t><xref target="accecn_Algo_Option_Coding"/> gives an example
            algorithm for the Data Receiver to encode its byte counters into
            the AccECN Option, and for the Data Sender to decode the AccECN
            Option fields into its byte counters.</t>
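
            <t>Separately from the example algorithm referenced above, the
            decoding step for a single field can be illustrated in a few lines
            of Python. This is a minimal sketch under the stated assumption
            that the field wrapped at most once; the function name is
            hypothetical.</t>

            <figure>
              <artwork><![CDATA[
```python
FIELD_MASK = (1 << 24) - 1    # each option field is 24 bits wide


def update_byte_counter(s_counter, field):
    """Advance a local byte counter (s.ceb, s.e0b or s.e1b) by the
    minimum increment consistent with the incoming 24-bit field
    (hypothetical helper, single wrap assumed)."""
    delta = (field - (s_counter & FIELD_MASK)) & FIELD_MASK
    return s_counter + delta
```
]]></artwork>
            </figure>

            <t>For example, a local counter of 0xFFFFF0 and an incoming field
            of 0x000010 give an increment of 0x20, so the local counter
            advances past the 24-bit boundary to 0x1000010.</t>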

            <t>Note that, as specified in <xref target="accecn_feedback"/>,
            any data on the SYN (SYN=1, ACK=0) is not included in any of the
            byte counters held locally for each ECN marking nor in the AccECN
            Option on the wire.</t>
          </section>

          <section anchor="accecn_Mbox_Interference"
                   title="Path Traversal of the AccECN Option">
            <t/>

            <section anchor="accecn_AccECN_Option_3WHS"
                     title="Testing the AccECN Option during the Handshake">
              <t>The TCP client MUST NOT include the AccECN TCP Option on the
              SYN. If there is somehow an AccECN Option on a SYN, it MUST be
              ignored when forwarded or received. (A fall-back strategy for
              the loss of the SYN, possibly due to middlebox interference, is
              specified in <xref target="accecn_sec_SYN_rexmt"/>.)</t>

              <t>A TCP server that confirms its support for AccECN (in
              response to an AccECN SYN from the client as described in <xref
              target="accecn_Negotiation"/>) SHOULD include an AccECN TCP
              Option on the SYN/ACK.</t>

              <t>A TCP client that has successfully negotiated AccECN SHOULD
              include an AccECN Option in the first ACK at the end of the
              3WHS. However, this first ACK is not delivered reliably, so the
              TCP client SHOULD also include an AccECN Option on the first
              data segment it sends (if it ever sends one).</t>

              <t>A host MAY omit the AccECN Option in any of the above three
              cases due to insufficient option space or if it has cached
              knowledge that the packet would be likely to be blocked on the
              path to the other host if it included an AccECN Option.</t>
            </section>

            <section anchor="accecn_AccECN_Option_Loss"
                     title="Testing for Loss of Packets Carrying the AccECN Option">
              <t><!--Should we make rexmt of SYN/ACK with AccECN flags, but not AccECN Option that default?-->If
              after the normal TCP timeout the TCP server has not received an
              ACK to acknowledge its SYN/ACK, the SYN/ACK might just have been
              lost, e.g.&nbsp;due to congestion, or a middlebox might be
              blocking the AccECN Option. To expedite connection setup, the
              TCP server SHOULD retransmit the SYN/ACK repeating the same AE,
              CWR and ECE TCP flags as on the original SYN/ACK but with no
              AccECN Option. If this retransmission times out, to expedite
              connection setup, the TCP server SHOULD disable AccECN and ECN
              for this connection by retransmitting the SYN/ACK with
              AE=CWR=ECE=0 and no AccECN Option.</t>

              <t>Implementers MAY use other fall-back strategies if they are
              found to be more effective (e.g.&nbsp;retrying the AccECN Option
              for a second time before fall-back - most appropriate during
              high levels of congestion). However, other fall-back strategies
              will need to follow all the rules in <xref
              target="accecn_implications_accecn_mode"/>, which concern
              behaviour when SYNs or SYN/ACKs negotiating different types of
              feedback have been sent within the same connection.</t>
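
              <t>The default fall-back sequence for the SYN/ACK described
              above can be sketched as follows. This is only an illustration
              in Python, not a definitive implementation: the function name
              synack_for_attempt, the attempt numbering and the dictionary
              used to hold the flags are all assumptions made for the
              example.</t>

              <figure>
                <artwork><![CDATA[
```python
def synack_for_attempt(attempt, orig_flags):
    """Return (flags, with_option) for one SYN/ACK transmission.

    attempt    -- 0 for the original SYN/ACK, 1 for the first
                  retransmission, 2 or more for later ones
    orig_flags -- dict holding the AE, CWR and ECE values that
                  were sent on the original SYN/ACK
    """
    if attempt == 0:
        return dict(orig_flags), True    # SHOULD carry the option
    if attempt == 1:
        return dict(orig_flags), False   # same flags, no option
    # A further timeout: disable AccECN and ECN for the connection.
    return {"AE": 0, "CWR": 0, "ECE": 0}, False
```
]]></artwork>
              </figure>

              <t>An implementation that retries the AccECN Option once more
              before falling back would differ only in the attempt number at
              which the option is dropped.</t>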

              <t>If the TCP client detects that the first data segment it sent
              with the AccECN Option was lost, it SHOULD fall back to no
              AccECN Option on the retransmission. Again, implementers MAY use
              other fall-back strategies such as attempting to retransmit a
              second segment with the AccECN Option before fall-back, and/or
              caching whether the AccECN Option is blocked for subsequent
              connections. <xref target="RFC9040"/> further discusses caching
              of TCP parameters and status information.</t>

              <t>If a host falls back to not sending the AccECN Option, it
              will continue to process any incoming AccECN Options as
              normal.</t>

              <t>Either host MAY include the AccECN Option in a subsequent
              segment to retest whether the AccECN Option can traverse the
              path.</t>

              <t>If the TCP server receives a second SYN with a request for
              AccECN support, it is advised to resend the SYN/ACK, again
              confirming its support for AccECN, but this time without the
              AccECN Option. This approach rules out any interference by
              middleboxes that might drop packets with unknown options, even
              though it is more likely that the SYN/ACK would have been lost
              due to congestion. The TCP server MAY try to send another packet
              with the AccECN Option at a later point during the connection
              but it ought to monitor if that packet got lost as well, in
              which case it SHOULD disable the sending of the AccECN Option
              for this half-connection.</t>

              <t>Similarly, an AccECN end-point MAY separately memorize which
              data packets carried an AccECN Option and disable the sending of
              AccECN Options if the loss probability of those packets is
              significantly higher than that of all other data packets in the
              same connection.</t>
            </section>

            <section title="Testing for Absence of the AccECN Option">
              <t>If the TCP client has successfully negotiated AccECN but does
              not receive an AccECN Option on the SYN/ACK (e.g.&nbsp;because
              it has been stripped by a middlebox or not sent by the server),
              the client switches into a mode that assumes that the AccECN
              Option is not available for this half connection.</t>

              <t>Similarly, if the TCP server has successfully negotiated
              AccECN but does not receive an AccECN Option on the first
              segment that acknowledges sequence space at least covering the
              ISN, it switches into a mode that assumes that the AccECN Option
              is not available for this half connection.</t>

              <t>While a host is in this mode that assumes incoming AccECN
              Options are not available, it MUST adopt the conservative
              interpretation of the ACE field discussed in <xref
              target="accecn_ACE_Safety"/>. However, it cannot make any
              assumption about support of outgoing AccECN Options on the other
              half connection, so it SHOULD continue to send the AccECN Option
              itself (unless it has established that sending the AccECN Option
              is causing packets to be blocked as in <xref
              target="accecn_AccECN_Option_Loss"/>).</t>

              <t>If a host is in the mode that assumes incoming AccECN Options
              are not available, but it receives an AccECN Option at any later
              point during the connection, this clearly indicates that the
              AccECN Option is not blocked on the respective path, and the
              AccECN endpoint MAY switch out of the mode that assumes the
              AccECN Option is not available for this half connection.</t>
            </section>

            <section anchor="accecn_sec_zero_option"
                     title="Test for Zeroing of the AccECN Option">
              <t>For a related test for invalid initialization of the ACE
              field, see <xref target="accecn_sec_ACE_init_invalid"/>.</t>

              <t><xref target="accecn_init_counters"/> requires the Data
              Receiver to initialize the r.e0b and r.e1b counters to a
              non-zero value. Therefore, in either direction the initial value
              of the EE0B field or EE1B field in the AccECN Option (if one
              exists) ought to be non-zero. If AccECN has been
              negotiated:<list style="symbols">
                  <t>the TCP server MAY check that the initial value of the
                  EE0B field or the EE1B field is non-zero in the first
                  segment that acknowledges sequence space that at least
                  covers the ISN plus 1. If it runs a test and either initial
                  value is zero, the server will switch into a mode that
                  ignores the AccECN Option for this half connection.</t>

                  <t>the TCP client MAY check that the initial value of the EE0B
                  field or the EE1B field is non-zero on the SYN/ACK. If it
                  runs a test and either initial value is zero, the client
                  will switch into a mode that ignores the AccECN Option for
                  this half connection.</t>
                </list></t>

              <t>While a host is in the mode that ignores the AccECN Option it
              MUST adopt the conservative interpretation of the ACE field
              discussed in <xref target="accecn_ACE_Safety"/>.</t>

              <t>Note that the Data Sender MUST NOT test whether the arriving
              byte counters in the initial AccECN Option have been initialized
              to specific valid values - the above checks solely test whether
              these fields have been incorrectly zeroed. This allows hosts to
              use different initial values as an additional signalling channel
              in future. Also note that the initial value of either field
              might be greater than its expected initial value, because the
              counters might already have been incremented. Nonetheless, the
              initial values of the counters have been chosen so that they
              cannot wrap to zero on these initial segments.</t>
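
              <t>The optional zeroing test in this section might be sketched
              as below. The function name option_looks_zeroed and the use of
              None to represent a field that is absent from the option are
              assumptions made for illustration. In line with the note above,
              the sketch tests only for zero and accepts any non-zero initial
              value.</t>

              <figure>
                <artwork><![CDATA[
```python
def option_looks_zeroed(ee0b=None, ee1b=None):
    """True if a present EE0B or EE1B field is zero.

    A field is None when it is absent from the option. Only zero
    is tested: any non-zero initial value is accepted.
    """
    return any(f == 0 for f in (ee0b, ee1b) if f is not None)
```
]]></artwork>
              </figure>

              <t>A host that runs this test on the relevant initial segment
              and obtains a true result would switch into the mode that
              ignores the AccECN Option for that half connection.</t>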
            </section>

            <section title="Consistency between AccECN Feedback Fields">
              <t>When the AccECN Option is available it ought to provide more
              unambiguous feedback. However, it supplements but does not
              replace the ACE field. An endpoint using AccECN feedback MUST
              always reconcile the information provided in the ACE field with
              that in any AccECN Option, so that the state of the ACE-related
              packet counter can be relied on if future feedback does not
              carry the AccECN Option.</t>

              <t>If the AccECN Option is present, the s.cep counter might
              increase more than expected from the increase of the s.ceb
              counter (e.g.&nbsp;due to a CE-marked control packet). The
              sender's response to such a situation is out of scope, and needs
              to be dealt with in a specification that uses ECN-capable
              control packets. Theoretically, this situation could also occur
              if a middlebox mangled the AccECN Option but not the ACE field.
              However, the Data Sender has to assume that the integrity of the
              AccECN Option is sound, based on the above test of the
              well-known initial values and optionally other integrity tests
              (<xref target="accecn_Integrity"/>).</t>

              <t>If either end-point detects that the s.ceb counter has
              increased but the s.cep has not (and by testing ACK coverage it
              is certain how much the ACE field has wrapped), and if there is
              no explanation other than an invalid protocol transition due to
              some form of feedback mangling, the Data Sender MUST disable
              sending ECN-capable packets for the remainder of the
              half-connection by setting the IP/ECN field in all subsequent
              packets to Not-ECT.<!--There is no need to say the following for forward compatibility:
"If a data receiver negotiates AccECN but then deliberately makes the counters inconsistent, 
it MUST continue the connection 
even if the data sender does not disable sending ECN-capable packets."--></t>
            </section>
          </section>

          <section anchor="accecn_option_usage"
                   title="Usage of the AccECN TCP Option">
            <t>If a Data Receiver in AccECN mode intends to use the AccECN TCP
            Option to provide feedback, the rules below determine when it
            includes an AccECN TCP Option, and which fields to include, given
            other options might be competing for limited option space:<list
                style="hanging">
                <t hangText="Importance of Congestion Control:">AccECN is for
                congestion control, which SHOULD generally be considered
                important relative to other TCP options.<vspace
                blankLines="1"/>If SACK has been negotiated, and the smallest
                recommended AccECN Option would leave insufficient space for
                two SACK blocks on a particular ACK, the Data Receiver MUST
                give precedence to the SACK option (total 18 octets), because
                loss feedback is more critical.</t>

                <t hangText="Recommended Simple Scheme:">The Data Receiver
                SHOULD include an AccECN TCP Option on every scheduled ACK if
                any byte counter has incremented since the last ACK. Whenever
                possible, it SHOULD include a field for every byte counter
                that has changed at some time during the connection (see
                examples later). <vspace blankLines="1"/>A scheduled ACK means
                an ACK that the Data Receiver would send by its regular
                delayed ACK rules. Recall that <xref
                target="accecn_Terminology"/> defines an 'ACK' as an
                acknowledgement either with or without a data payload. But
                the above rule is worded so that,
                in the common case when most of the data is from a server to a
                client, the server only includes an AccECN TCP Option while it
                is acknowledging data from the client.</t>
              </list>When available TCP option space is limited on particular
            packets, the recommended scheme will need to include compromises.
            To guide the implementer the rules below are ranked in order of
            importance, but the final decision has to be
            implementation-dependent, because tradeoffs will alter as new TCP
            options are defined and new use-cases arise.<list style="hanging">
                <t hangText="Necessary Option Length:">The Data Receiver MUST
                only include an AccECN TCP Option on a packet if it includes
                all the counter(s) that have incremented since the previous
                AccECN Option. It MUST only truncate unchanged fields from the
                right-hand tail of the option to preserve the order of the
                remaining fields (see <xref target="accecn_option"/>);</t>

                <t hangText="Change-Triggered AccECN TCP Options:">If an
                arriving packet increments a different byte counter to that
                incremented by the previous packet, the Data Receiver SHOULD
                feed it back in an AccECN Option on the next scheduled ACK.
                <vspace blankLines="1"/>For the avoidance of doubt, this rule
                does not concern the arrival of control packets with no
                payload, because they cannot alter any byte counters.</t>

                <t hangText="Continual Repetition:">Otherwise, if arriving
                packets continue to increment the same byte counter:<list
                    style="symbols">
                    <t>the Data Receiver SHOULD include a counter that has
                    continued to increment on the next scheduled ACK following
                    a change-triggered AccECN TCP Option;</t>

                    <t>while the same counter continues to increment, it
                    SHOULD include the counter every n ACKs as consistently as
                    possible, where n can be chosen by the implementer;</t>

                    <t>it SHOULD always include an AccECN Option if the r.ceb
                    counter is incrementing and it MAY include an AccECN
                    Option if r.e0b or r.e1b is incrementing;</t>

                    <t>it SHOULD include each counter at least once for every
                    2^22 bytes incremented to prevent overflow during
                    continual repetition.</t>
                  </list></t>
              </list></t>

            <t>The above rules complement those in <xref
            target="accecn_ACE_Safety"/>, which determine when to generate an
            ACK irrespective of whether an AccECN TCP Option is to be
            included.</t>
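
            <t>As an illustration of the 'Continual Repetition' rules, the
            following Python sketch tracks one byte counter that keeps
            incrementing and decides on which scheduled ACKs to repeat it. It
            covers only the every-n-ACKs rule and the 2^22 byte rule. The
            class RepetitionState, its method names and the choice to reset
            both tallies together are hypothetical, and the value of n
            remains the implementer's choice.</t>

            <figure>
              <artwork><![CDATA[
```python
MAX_UNREPORTED = 1 << 22  # bytes, from the overflow rule


class RepetitionState:
    """Tracks one byte counter that keeps incrementing."""

    def __init__(self, n):
        self.n = n            # implementer-chosen ACK interval
        self.acks_since = 0   # scheduled ACKs since last report
        self.bytes_since = 0  # bytes counted since last report

    def on_bytes(self, count):
        """Record payload bytes added to the counter."""
        self.bytes_since += count

    def include_on_next_ack(self):
        """Decide for one scheduled ACK; reset if included."""
        self.acks_since += 1
        due = (self.acks_since >= self.n
               or self.bytes_since >= MAX_UNREPORTED)
        if due and self.bytes_since > 0:
            self.acks_since = 0
            self.bytes_since = 0
            return True
        return False
```
]]></artwork>
            </figure>

            <t>A real implementation would also have to apply the
            change-triggered rule and the option-space priorities, which this
            sketch leaves out.</t>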

            <t>The recommended scheme is intended as a simple way to ensure
            that all the relevant byte counters will be carried on any ACK
            that reaches the Data Sender, no matter how many pure ACKs are
            filtered or coalesced along the network path, and without
            consuming the space available for payload data with counter
            field(s) that have never changed.</t>

            <t>As an example of the recommended scheme, if ECT(0) is the only
            codepoint that has ever arrived in the IP-ECN field, the Data
            Receiver will feed back an AccECN0 TCP Option with only the EE0B
            field on every packet. However, as soon as even one CE-marked
            packet arrives, on every packet that acknowledges new data it will
            start to include an option with two fields, EE0B and ECEB. As a
            second example, if the first packet to arrive happens to be
            CE-marked, the Data Receiver will have to arbitrarily choose
            whether to precede the ECEB field with an EE0B field or an EE1B
            field. If it chooses, say, EE0B but it turns out never to receive
            ECT(0), it can start sending EE1B and ECEB instead - it does not
            have to include the EE0B field if the r.e0b counter has never
            changed during the connection.</t>
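
            <t>The examples above can be reproduced with the following Python
            sketch, which picks the shortest option that still carries every
            counter that has changed. It assumes the field orders and the
            option lengths (2 octets of kind and length plus 3 octets per
            field) given in <xref target="accecn_option"/>. The names
            FIELD_ORDER and choose_option are hypothetical, and the tie-break
            in favour of AccECN0 is an arbitrary choice made for the
            example.</t>

            <figure>
              <artwork><![CDATA[
```python
# Assumed field order of the two option types.
FIELD_ORDER = {
    "AccECN0": ("EE0B", "ECEB", "EE1B"),
    "AccECN1": ("EE1B", "ECEB", "EE0B"),
}


def choose_option(changed):
    """Pick the shortest option carrying every changed counter.

    changed -- set of field names whose counters have changed at
               some time during the connection
    Returns (option_name, fields, length_in_octets).
    """
    best = None
    for name, order in FIELD_ORDER.items():
        # Only the right-hand tail may be truncated, so keep every
        # field up to and including the last changed one.
        keep = max((i + 1 for i, f in enumerate(order)
                    if f in changed), default=0)
        if best is None or keep < len(best[1]):
            best = (name, order[:keep], 2 + 3 * keep)
    return best
```
]]></artwork>
            </figure>

            <t>With only ECT(0) seen, the sketch selects AccECN0 carrying
            EE0B alone. Once the CE counter has also changed it selects EE0B
            and ECEB, and if only the ECT(1) and CE counters have changed it
            selects AccECN1 carrying EE1B and ECEB.</t>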

            <t>With the recommended scheme, if the data sending direction
            switches during a connection, there can be cases where the AccECN
            TCP Option that is meant to feed back the counter values at the
            end of a volley in one direction never reaches the other peer, due
            to packet loss. ACE feedback ought to be sufficient to fill this
            gap, given accurate feedback becomes moot after data transmission
            has paused.</t>

            <t><xref target="accecn_Algo_ACE_Bytes"/> gives an example
            algorithm to estimate the number of marked bytes from the ACE
            field alone, if the AccECN Option is not available.</t>

            <t>If a host has determined that segments with the AccECN Option
            always seem to be discarded somewhere along the path, it is no
            longer obliged to follow any of the rules in this section.</t>
          </section>
        </section>
      </section>

      <!-- <section anchor="accecn_Rcvr_Operation"
               title="Accurate ECN Receiver Operation">
        <t>A TCP receiver MUST only feedback ECN information arriving in a
        segment that it deems is part of the flow, by using regular TCP
        techniques based on sequence numbers.</t>

        <t>{ToDo: It might be useful to describe receiver end of the feedback
        process, including special cases, e.g.&nbsp;pure ACKs, retransmissions,
        window probes, partial ACKs, etc. Does AccECN feed back each ECN
        codepoint when a data packet is duplicated?}</t>
      </section>

      <section anchor="accecn_Sndr_Operation"
               title="Accurate ECN Sender Operation">
        <t>A TCP sender MUST only accept ECN feedback on ACKs that it deems is
        part of the flow, by using regular TCP techniques based on sequence
        numbers.</t>

        <t>{ToDo: It might be useful to describe the sender end of the
        feedback process, including special cases, e.g.&nbsp;pure ACKs,
        retransmissions, window probes, partial ACKs, etc.}</t>
      </section> -->

      <!-- Comment by Mirja: not sure if the following section is needed. Of
       course a proxy should comply to the spec. Just writing this down explicitly
       doesn't help the problem; especially as the problem is old boxes that
       never get updated...! 
       Bob adds: Of course it doesn't stop legacy middleboxes being wrong, 
       but it allows us (or an operator that buys a middlebox) to say a middlebox 
       does not comply with this RFC, which can be important if the contract 
       to maintain the box says it has to comply with updated standards -->

      <section anchor="accecn_Mbox_Operation"
               title="AccECN Compliance Requirements for TCP Proxies, Offload Engines and other Middleboxes">
        <t/>

        <section title="Requirements for TCP Proxies">
          <t>A large class of middleboxes split TCP connections. Such a
          middlebox would be compliant with the AccECN protocol if the TCP
          implementation on each side complied with the present AccECN
          specification and each side negotiated AccECN independently of the
          other side.</t>
        </section>

        <section anchor="accecn_middlebox_transparent_normalizers"
                 title="Requirements for Transparent Middleboxes and TCP Normalizers">
          <t>Another large class of middleboxes intervenes to some degree at
          the transport layer, but attempts to be transparent (invisible) to
          the end-to-end connection. A subset of this class of middleboxes
          attempts to 'normalize' the TCP wire protocol by checking that all
          values in header fields comply with a rather narrow interpretation
          of the TCP specifications that is also not always up to date.</t>

          <t>A middlebox that is not normalizing the TCP protocol and does not
          itself act as a back-to-back pair of TCP endpoints (i.e.&nbsp;a
          middlebox that intends to be transparent or invisible at the
          transport layer) ought to forward the AccECN TCP Option unaltered,
          whether or not the length value matches one of those specified in
          <xref target="accecn_option"/>, and whether or not the initial
          values of the byte-counter fields match those in <xref
          target="accecn_init_counters"/>. This is because blocking apparently
          invalid values prevents the standardized set of values being
          extended in future (given outdated normalizers would block updated
          hosts from using the extended AccECN standard).</t>

          <t>A TCP normalizer is likely to block or alter an AccECN TCP Option
          if the length value or the initial values of its byte-counter fields
          do not match one of those specified in <xref
          target="accecn_option"/> or <xref target="accecn_init_counters"/>.
          However, to comply with the present AccECN specification, a
          middlebox MUST NOT change the ACE field; or those fields of the
          AccECN Option that are currently specified in <xref
          target="accecn_option"/>; or any AccECN field covered by integrity
          protection (e.g.&nbsp;<xref target="RFC5925"/>).</t>

          <!-- This includes the explicitly stated requirements to forward
        Reserved (Rsvd) and Currently Unused (CU) values unaltered. 
An 'ideal' TCP normalizer would not have to change to accommodate AccECN, because AccECN does not directly 
contravene any existing TCP specifications, 
even though it uses existing TCP fields in unorthodox ways.
-->
        </section>

        <section title="Requirements for TCP ACK Filtering">
          <t>Section 5.2.1 of BCP&nbsp;69 <xref target="RFC3449"/> gives best
          current practice on filtering (also known as thinning or coalescing) of pure
          TCP ACKs. It advises that filtering ACKs carrying ECN feedback ought
          to preserve the correct operation of ECN feedback. As the present
          specification updates the operation of ECN feedback, this section
          discusses how an ACK filter might preserve correct operation of
          AccECN feedback as well.</t>

          <t>The problem divides into two parts: determining if an ACK is part
          of a connection that is using AccECN and then preserving the correct
          operation of AccECN feedback:<list style="symbols">
              <t>To determine whether a pure TCP ACK is part of an AccECN
              connection without resorting to connection tracking and per-flow
              state, a useful heuristic would be to check for a non-zero ECN
              field at the IP layer (because the ECN++ experiment only allows
              TCP pure ACKs to be ECN-capable if AccECN has been negotiated
              <xref target="I-D.ietf-tcpm-generalized-ecn"/>). This heuristic
              is simple and stateless. However, it might omit some AccECN
              ACKs, because it is only recommended but not obligatory to use
              ECN++ with AccECN - only deployment experience will tell. Also,
              TCP ACKs might be ECN-capable owing to some scheme other than
              AccECN, e.g.&nbsp;<xref target="RFC5690"/> or some future
              standards action. Again, only deployment experience will
              tell.</t>

              <t>The main concern with preserving correct AccECN operation
              involves leaving enough ACKs for the Data Sender to work out
              whether the 3-bit ACE field has wrapped. In the worst case, in
              feedback about a run of received packets that were all
              ECN-marked, the ACE field will wrap every 8 acknowledged
              packets. ACE field wrap might be of less concern if packets also
              carry the AccECN TCP Option. However, note that logic to read an
              AccECN TCP Option is optional to implement (albeit recommended
              &mdash; see <xref target="accecn_option"/>). So one end writing
              an AccECN TCP Option into a packet does not necessarily imply
              that the other end will read it.</t>
            </list></t>

          <t>Note that the present specification of AccECN in TCP does not
          presume to rely on any of the above ACK filtering behaviour in the
          network, because it has to be robust against pre-existing network
          nodes that do not distinguish AccECN ACKs, and robust against ACK
          loss during overload more generally.</t>
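
          <t>The stateless heuristic in the first bullet above might be
          sketched as follows. The function names are hypothetical, and the
          sketch assumes that the caller has already identified the packet as
          a TCP segment with the ACK flag set. As noted above, the result is
          only a guess: it can miss AccECN ACKs that are not ECN-capable and
          can match ACKs that are ECN-capable for other reasons.</t>

          <figure>
            <artwork><![CDATA[
```python
NOT_ECT = 0b00


def ip_ecn(tos_or_traffic_class):
    """ECN field: the two least significant bits of the IPv4 TOS
    or IPv6 Traffic Class octet."""
    return tos_or_traffic_class & 0b11


def may_be_accecn_ack(tos_or_traffic_class, payload_len):
    """Stateless guess: an ECN-capable pure ACK."""
    return (payload_len == 0
            and ip_ecn(tos_or_traffic_class) != NOT_ECT)
```
]]></artwork>
          </figure>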
        </section>

        <section title="Requirements for TCP Segmentation Offload">
          <t>Hardware to offload certain TCP processing represents another
          large class of middleboxes (even though it is often a function of a
          host's network interface and rarely in its own 'box').</t>

          <t>The ACE field changes with every received CE marking, so today's
          receive offloading could lead to many interrupts in high congestion
          situations. Although that would be useful (because congestion
          information is received sooner), it could also significantly
          increase processor load, particularly in scenarios such as DCTCP or
          L4S where the marking rate is generally higher.</t>

          <t>Current offload hardware ejects a segment from the coalescing
          process whenever the TCP ECN flags change. Thus Classic ECN causes
          offload to be inefficient. In data centres it has been fortunate for
          this offload hardware that DCTCP-style feedback changes less often
          when there are long sequences of CE marks, which is more common with
          a step marking threshold (but less likely the more short flows are
          in the mix). The ACE counter approach has been designed so that
          coalescing can continue over arbitrary patterns of marking and only
          needs to stop when the counter wraps. Nonetheless, until the
          particular offload hardware in use implements this more efficient
          approach, it is likely to be more efficient for AccECN connections
          to implement this counter-style logic using software segmentation
          offload.</t>

          <t>ECN encodes a varying signal in the ACK stream, so it is
          inevitable that offload hardware will ultimately need to handle any
          form of ECN feedback exceptionally. The ACE field has been designed
          as a counter so that it is straightforward for offload hardware to
          pass on the highest counter, and to push a segment from its cache
          before the counter wraps. The purpose of working towards
          standardized TCP ECN feedback is to reduce the risk for hardware
          developers, who would otherwise have to guess which scheme is likely
          to become dominant.</t>

          <t>The above process has been designed to enable a continuing
          incremental deployment path towards more highly dynamic congestion
          control. Once offload hardware supports AccECN, it will be able to
          coalesce efficiently for any sequence of marks, instead of relying
          for efficiency on the long marking sequences from step marking. In
          the next stage, marking can evolve from a step to a ramp function.
          That in turn will allow host congestion control algorithms to
          respond faster to dynamics, while being backwards compatible with
          existing host algorithms.</t>
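
          <t>The counter-style coalescing logic discussed in this section
          could be approximated in software along the following lines. This
          Python sketch is only an illustration under stated assumptions: the
          class AceCoalescer and its method are hypothetical, and each
          arriving segment is assumed to advance the 3-bit ACE field by fewer
          than 8 relative to the previous segment.</t>

          <figure>
            <artwork><![CDATA[
```python
ACE_MOD = 8  # the ACE field is a 3-bit counter


class AceCoalescer:
    """Merges segments until the ACE counter is about to wrap."""

    def __init__(self, first_ace):
        self.last = first_ace  # highest ACE merged so far
        self.advance = 0       # total ACE advance in this batch

    def try_merge(self, ace):
        """True if merged, False if the batch must be pushed."""
        step = (ace - self.last) % ACE_MOD
        if self.advance + step >= ACE_MOD:
            return False       # merging would hide a wrap
        self.advance += step
        self.last = ace
        return True
```
]]></artwork>
          </figure>

          <t>When try_merge returns False, the offload function would pass
          up the batch with the highest counter merged so far and start a new
          batch with the segment that was refused.</t>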
        </section>
      </section>
    </section>

    <section anchor="accecn_3168_updates" title="Updates to RFC 3168">
      <t>Normative statements in the following sections of RFC 3168 are updated
      by the present AccECN specification: <list style="symbols">
          <t>The whole of "6.1.1 TCP Initialization" of <xref
          target="RFC3168"/> is updated by <xref target="accecn_Negotiation"/>
          of the present specification.</t>

          <t>In "6.1.2. The TCP Sender" of <xref target="RFC3168"/>, all
          mentions of a congestion response to an ECN-Echo (ECE) ACK packet
          are updated by <xref target="accecn_feedback"/> of the present
          specification to mean an increment to the sender's count of
          CE-marked packets, s.cep. And the requirements to set the CWR flag
          no longer apply, as specified in <xref
          target="accecn_implications_accecn_mode"/> of the present
          specification. Otherwise, the remaining requirements in "6.1.2. The
          TCP Sender" still stand.<vspace blankLines="1"/>It will be noted
          that RFC 8311 already updates, or potentially updates, a number of
          the requirements in "6.1.2. The TCP Sender". Section 6.1.2 of RFC
          3168 extended standard TCP congestion control <xref
          target="RFC5681"/> to cover ECN marking as well as packet drop,
          whereas RFC 8311 enables experimentation with alternative responses
          to ECN marking, if specified for instance by an experimental RFC on
          the IETF document stream. RFC 8311 also strengthened the statement
          that "ECT(0) SHOULD be used" to a "MUST" (see <xref
          target="RFC8311"/> for the details).</t>

          <t>The whole of "6.1.3. The TCP Receiver" of <xref
          target="RFC3168"/> is updated by <xref target="accecn_feedback"/> of
          the present specification, with the exception of the last paragraph
          (about congestion response to drop and ECN in the same round trip),
          which still stands. Incidentally, this last paragraph is in the
          wrong section, because it relates to TCP sender behaviour.</t>

          <t>The following text within "6.1.5. Retransmitted TCP packets":
          <list style="empty">
              <t>"the TCP data receiver SHOULD ignore the ECN field on
              arriving data packets that are outside of the receiver's current
              window."</t>
            </list> is updated by more stringent acceptability tests for any
          packet (not just data packets) in the present specification.
          Specifically, in the normative specification of AccECN (<xref
          target="accecn_Spec"/>) only 'Acceptable' packets contribute to the
          ECN counters at the AccECN receiver and <xref
          target="accecn_Terminology"/> defines an Acceptable packet as one
          that passes the acceptability tests in both <xref target="RFC0793"/>
          and <xref target="RFC5961"/>.</t>

          <t>Sections 5.2, 6.1.1, 6.1.4, 6.1.5 and 6.1.6 of <xref
          target="RFC3168"/> prohibit use of ECN on TCP control packets and
          retransmissions. The present specification does not update that
          aspect of RFC 3168, but it does say what feedback an AccECN Data
          Receiver ought to provide if it receives an ECN-capable control
          packet or retransmission. This ensures AccECN is forward compatible
          with any future scheme that allows ECN on these packets, as provided
          for in section 4.3 of <xref target="RFC8311"/> and as proposed in
          <xref target="I-D.ietf-tcpm-generalized-ecn"/>.</t>
        </list></t>
    </section>

    <section anchor="accecn_Interact_Variants"
             title="Interaction with TCP Variants">
      <t>This section is informative, not normative.</t>

      <section anchor="accecn_Interaction_SYN_Cookies"
               title="Compatibility with SYN Cookies">
        <t>A TCP server can use SYN Cookies (see Appendix A of <xref
        target="RFC4987"/>) to protect itself from SYN flooding attacks. It
        places minimal commonly used connection state in the SYN/ACK, and
        deliberately does not hold any state while waiting for the subsequent
        ACK (e.g.&nbsp;it closes the thread). Therefore it cannot record the
        fact that it entered AccECN mode for both half-connections. Indeed, it
        cannot even remember whether it negotiated the use of classic ECN
        <xref target="RFC3168"/>.</t>

        <t>Nonetheless, such a server can determine that it negotiated AccECN
        as follows. If a TCP server using SYN Cookies supports AccECN and if
        it receives a pure ACK that acknowledges an ISN that is a valid SYN
        cookie, and if the ACK contains an ACE field with the value 0b010 to
        0b111 (decimal 2 to 7), it can assume that:<list style="symbols">
            <t>the TCP client has to have requested AccECN support on the
            SYN</t>

            <t>it (the server) has to have confirmed that it supported
            AccECN</t>
          </list>Therefore the server can switch itself into AccECN mode, and
        continue as if it had never forgotten that it switched itself into
        AccECN mode earlier.</t>

        <t>If the pure ACK that acknowledges a SYN cookie contains an ACE
        field with the value 0b000 or 0b001, these values indicate that the
        client did not request support for AccECN and therefore the server
        does not enter AccECN mode for this connection. Further, 0b001 on the
        ACK implies that the server sent an ECN-capable SYN/ACK, which was
        marked CE in the network, and the non-AccECN client fed this back by
        setting ECE on the ACK of the SYN/ACK.</t>
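
        <t>The decision described above can be sketched as follows. This is
        a minimal, non-normative illustration in Python, not part of the
        protocol definition. The names EcnMode, mode_from_cookie_ack and
        synack_was_ce_marked are assumptions made for this example, and the
        sketch assumes the pure ACK has already been validated as
        acknowledging a SYN cookie.</t>

        <figure>
          <artwork><![CDATA[
```python
from enum import Enum


class EcnMode(Enum):
    """Feedback mode a stateless server could adopt (illustrative names)."""
    ACCECN = "accecn"
    NOT_ACCECN = "not-accecn"


def mode_from_cookie_ack(ace: int,
                         server_supports_accecn: bool = True) -> EcnMode:
    """Infer the negotiated mode from the ACE field of the cookie ACK.

    `ace` is the 3-bit ACE field (0..7) of a pure ACK that is assumed to
    have already been validated as acknowledging a SYN cookie.
    """
    if not 0 <= ace <= 7:
        raise ValueError("ACE is a 3-bit field")
    if server_supports_accecn and 2 <= ace <= 7:
        # 0b010..0b111: the client requested AccECN on the SYN and the
        # server has to have confirmed it on the SYN/ACK.
        return EcnMode.ACCECN
    # 0b000 or 0b001: the client did not request AccECN.
    return EcnMode.NOT_ACCECN


def synack_was_ce_marked(ace: int) -> bool:
    """0b001: a non-AccECN client set ECE to report CE on the SYN/ACK."""
    return ace == 0b001
```
]]></artwork>
        </figure>

        <t>In this sketch a server that does not support AccECN never enters
        AccECN mode, whatever value the ACE field carries.</t>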
      </section>

      <section anchor="accecn_Interaction_Other"
               title="Compatibility with TCP Experiments and Common TCP Options">
        <t>AccECN is compatible (at least on paper) with the most commonly
        used TCP options: MSS, time-stamp, window scaling, SACK and TCP-AO. It
        is also compatible with the experimental TCP options
        TCP Fast Open (TFO <xref target="RFC7413"/>) and Multipath TCP (MPTCP
        <xref target="RFC6824"/>). AccECN is friendly to all these protocols,
        because space for TCP options is particularly scarce on the SYN, where
        AccECN consumes zero additional header space.</t>

        <t>When option space is under pressure from other options, <xref
        target="accecn_option_usage"/> provides guidance on how important it
        is to send an AccECN Option relative to other options, and which
        fields are more important to include.</t>

        <t>Implementers of TFO need to take careful note of the recommendation
        in <xref target="accecn_ACE_3rdACK"/>. That section recommends that,
        if the client has successfully negotiated AccECN, when acknowledging
        the SYN/ACK, even if it has data to send, it sends a pure ACK
        immediately before the data. Then it can reflect the IP-ECN field of
        the SYN/ACK on this pure ACK, which allows the server to detect ECN
        mangling. Note that, as specified in <xref target="accecn_feedback"/>,
        any data on the SYN (SYN=1, ACK=0) is not included in any of the byte
        counters held locally for each ECN marking, nor in the AccECN Option
        on the wire.</t>
      </section>

      <section anchor="accecn_Integrity"
               title="Compatibility with Feedback Integrity Mechanisms">
        <t>Three alternative mechanisms are available to assure the integrity
        of ECN and/or loss signals. AccECN is compatible with any of these
        approaches:<list style="symbols">
            <t>The Data Sender can test the integrity of the receiver's ECN
            (or loss) feedback by occasionally setting the IP-ECN field to a
            value normally only set by the network (and/or deliberately
            leaving a sequence number gap). Then it can test whether the Data
            Receiver's feedback faithfully reports what it expects (similar to
            para 2 of Section 20.2 of <xref target="RFC3168"/>). Unlike the
            ECN Nonce <xref target="RFC3540"/>, this approach does not waste
            the ECT(1) codepoint in the IP header, it does not require
            standardization and it does not rely on misbehaving receivers
            volunteering to reveal feedback information that allows them to be
            detected. However, setting the CE mark by the sender might conceal
            actual congestion feedback from the network and therefore ought to
            only be done sparingly.</t>

            <t>Networks generate congestion signals when they are becoming
            congested, so networks are more likely than Data Senders to be
            concerned about the integrity of the receiver's feedback of these
            signals. A network can enforce a congestion response to its ECN
            markings (or packet losses) using congestion exposure (ConEx)
            audit <xref target="RFC7713"/>. Whether the receiver or a
            downstream network is suppressing congestion feedback or the
            sender is unresponsive to the feedback, or both, ConEx audit can
            neutralize any advantage that any of these three parties would
            otherwise gain. <vspace blankLines="1"/>ConEx is an experimental
            change to the Data Sender that would be most useful when combined
            with AccECN. Without AccECN, the ConEx behaviour of a Data Sender
            would have to be more conservative than would be necessary if it
            had the accurate feedback of AccECN.</t>

            <t>The standards track TCP authentication option (TCP-AO <xref
            target="RFC5925"/>) can be used to detect any tampering with
            AccECN feedback between the Data Receiver and the Data Sender
            (whether malicious or accidental). The AccECN fields are immutable
            end-to-end, so they are amenable to TCP-AO protection, which
            covers TCP options by default. However, TCP-AO is too
            brittle to use on many end-to-end paths, where middleboxes can
            make verification fail in their attempts to improve performance or
            security, e.g.&nbsp;by resegmentation or shifting the sequence
            space.</t>
          </list>Originally the ECN Nonce <xref target="RFC3540"/> was
        proposed to ensure integrity of congestion feedback. With minor
        changes AccECN could be optimized for the possibility that the ECT(1)
        codepoint might be used as an ECN Nonce. However, given RFC 3540 has
        been reclassified as historic, the AccECN design has been generalized
        so that it ought to be able to support other possible uses of the
        ECT(1) codepoint, such as a lower severity or a more instant
        congestion signal than CE.</t>
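
        <t>The first approach above (the Data Sender testing the receiver's
        feedback) could be sketched as follows. This is a hedged,
        non-normative illustration in Python. The class CeProbeChecker and
        its method names are assumptions made for this example, and it
        assumes that the path does not clear a CE mark once it has been
        set.</t>

        <figure>
          <artwork><![CDATA[
```python
class CeProbeChecker:
    """Checks that CE marks set by the sender itself are fed back."""

    def __init__(self) -> None:
        # Packets the sender itself marked CE that have since been ACKed.
        self.probes_acked = 0
        # CE-marked packets reported in the receiver's feedback so far.
        self.ce_fed_back = 0

    def on_ack(self, newly_acked_probes: int,
               newly_reported_ce: int) -> None:
        """Record what one ACK covered and what it reported."""
        self.probes_acked += newly_acked_probes
        self.ce_fed_back += newly_reported_ce

    def feedback_plausible(self) -> bool:
        # The network can add CE marks of its own, so the reported count
        # may exceed the probe count, but (under the assumption above)
        # honest feedback never reports fewer CE packets than the sender
        # deliberately marked.
        return self.ce_fed_back >= self.probes_acked
```
]]></artwork>
        </figure>

        <t>A failed check only indicates that feedback is being suppressed
        somewhere between the two endpoints; it does not identify which party
        is responsible.</t>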
      </section>
    </section>

    <!-- ================================================================ -->

    <section anchor="accecn_Properties" title="Protocol Properties">
      <t>This section is informative, not normative. It describes how well the
      protocol satisfies the agreed requirements for a more accurate ECN
      feedback protocol <xref target="RFC7560"/>.<list style="hanging">
          <t hangText="Accuracy:">From each ACK, the Data Sender can infer the
          number of new CE-marked segments since the previous ACK. This
          provides better accuracy on CE feedback than classic ECN. In
          addition, if the AccECN Option is present (not blocked by the network
          path), the numbers of bytes marked with CE, ECT(1) and ECT(0) are
          provided.</t>

          <!-- <t hangText="Accuracy:">The Data Receiver can feed back to the Data
           Sender a list of the order of the IP-ECN markings covered by each
           delayed ACK.</t> -->

          <t hangText="Overhead:">The AccECN scheme is divided into two parts.
          The essential part reuses the 3 flags already assigned to ECN in the
          TCP header. The supplementary part adds a TCP option
          consuming up to 11 bytes. However, no TCP option is consumed in the
          SYN.</t>

          <t hangText="Ordering:">The order in which marks arrive at the Data
          Receiver is preserved in AccECN feedback, because the Data Receiver
          is expected to send an ACK immediately whenever a different mark
          arrives.</t>

          <!-- <t hangText="Overhead:">Two alternative locations for the
           supplementary protocol field are proposed:<list style="numbers">
           <t>In the 16-bit Urgent Pointer when URG=0. This specification
           reserves 15 bits of this space, but while the specification is
           only experimental it refrains from using this space in the main
           TCP header. If AccECN progresses to the standards track and uses
           these 15b, it will require zero additional overhead, because it
           will overload fields that already takes up space in every TCP
           header</t>
           
           <t>In a TCP option. This takes up 4B; the fifteen bits have to
           be rounded up to 2B, plus 2B for the TCP option Kind and
           Length.</t>
           </list></t> -->

          <t hangText="Timeliness:">While the same ECN markings are arriving
          continually at the Data Receiver, it can defer ACKs as TCP does
          normally, but it will immediately send an ACK as soon as a different
          ECN marking arrives.</t>

          <t hangText="Timeliness vs Overhead:">Change-Triggered ACKs are
          intended to enable latency-sensitive uses of ECN feedback by
          capturing the timing of transitions but not wasting resources while
          the state of the signalling system is stable. Within the constraints
          of the change-triggered ACK rules, the receiver can control how
          frequently it sends the AccECN TCP Option and therefore to some
          extent it can control the overhead induced by AccECN.</t>

          <!-- <t hangText="Timeliness:">{ToDo: Add improved timeliness if the
           Delayed ACK Control (DAC) feature is included.}</t> -->

          <t hangText="Resilience:">All information is provided based on
          counters. Therefore, if ACKs are lost, the counters on the first ACK
          following the losses allow the Data Sender to immediately recover
          the number of ECN markings that it missed. And if data or ACKs
          are reordered, stale congestion information can be identified and
          ignored.</t>

          <t hangText="Resilience against Bias:">Because feedback is based on
          repetition of counters, random losses do not remove any information;
          they only delay it. Therefore, even though some ACKs are
          change-triggered, random losses will not alter the proportions of
          the different ECN markings in the feedback.</t>

          <t hangText="Resilience vs Overhead:">If space is limited in some
          segments (e.g.&nbsp;because more options are needed on some
          segments, such as the SACK option after loss), the Data Receiver can
          send AccECN Options less frequently or truncate fields that have not
          changed, usually down to as little as 5 bytes. However, it has to
          send a full-sized AccECN Option at least three times per RTT, which
          the Data Sender can rely on as a regular beacon or checkpoint.</t>

          <t hangText="Resilience vs Timeliness and Ordering:">Ordering
          information and the timing of transitions cannot be communicated in
          three cases: i) during ACK loss; ii) if something on the path strips
          the AccECN Option; or iii) if the Data Receiver is unable to support
          Change-Triggered ACKs. Following ACK reordering, the Data Sender can
          reconstruct the order in which feedback was sent, but not until all
          the missing feedback has arrived.</t>

          <!-- reworked end -->

          <!-- <t hangText="Resilience:">Subsequent ACKs will allow it to recover
           the number of other ECN markings that it missed.</t>
          
           <t hangText="Resilience against Bias:">Undetected ACK loss is as
           likely to decrease as increase congestion signals detected by the
           Data Sender.</t>
           
           <t hangText="Resilience against Bias:">However, if the supplementary
           part is unavailable, the required conservative decoding of feedback
           during ACK loss is more likely to increase perceived congestion
           signals, which would otherwise be more likely to be
           under-reported.</t> 
          
           <t hangText="Timeliness vs Overhead:">For efficiency, each delayed
           ACK only includes one of the counters at a time, therefore recovery
           of the count of the other signals might not be immediate if an ACK
           is lost that covers more than one signal. The receiver cannot
           predict which ACKs might get lost, if any. Therefore it repeats the
           count of each signal roughly in proportion to how often each signal
           changes.</t>
           
           <t hangText="Ordering:">The order of arriving ECN codepoints is
           communicated in a 10-bit field in the supplementary part;</t>
           
           <t hangText="Resilience vs. Ordering:">Following an ACK loss, only a
           count of the lost ECN signals is recovered, not their order of
           arrival over the sequence covered by the loss.</t>
           
           <t hangText="Ordering vs. Overhead:">The encoding is tailored for
           sequences of ECN codepoints expected to be typical. It can encode
           sequences of up to 15 segments but, if the pattern of arrivals
           becomes too complex, the protocol forces the Data Receiver to emit
           an ACK. The protocol can always encode any sequence of 3 segments in
           one delayed ACK;</t>
           
           <t hangText="Ordering, Timeliness and Resilience:">If one delayed
           ACK covers changes to more than one congestion counter the
           supplementary sequence information provides more timely congestion
           feedback than waiting for the other congestion counters on future
           ACKs, and it provides resilience against the possibility of those
           future ACKs going missing;</t> -->

          <!-- new -->

          <t hangText="Complexity:">An AccECN implementation solely involves
          simple counter increments, some modulo arithmetic to communicate the
          least significant bits and allow for wrap, and some heuristics for
          safety against fields cycling due to prolonged periods of ACK loss.
          Each host needs to maintain eight additional counters. The hosts
          have to apply some additional tests to detect tampering by
          middleboxes, but in general the protocol is simple to understand,
          simple to implement and requires few cycles per packet to
          execute.</t>

          <t hangText="Integrity:">AccECN is compatible with at least three
          approaches that can assure the integrity of ECN feedback. If the
          AccECN Option is stripped the resolution of the feedback is
          degraded, but the integrity of this degraded feedback can still be
          assured.</t>

          <t hangText="Backward Compatibility:">If only one endpoint supports
          the AccECN scheme, it will fall back to the most advanced ECN
          feedback scheme supported by the other end.</t>

          <!-- <t hangText="Backward Compatibility:">Each endpoint can detect
           normalization of the Supplementary AccECN field by middleboxes at
           any time during a connection. It could then fall-back to the
           essential part using only the fewer but safer bits in the TCP
           header.</t> -->

          <!-- new -->

          <t hangText="Backward Compatibility:">If the AccECN Option is
          stripped by a middlebox, AccECN still provides basic congestion
          feedback in the ACE field. Further, AccECN can be used to detect
          mangling of the IP ECN field; mangling of the TCP ECN flags;
          blocking of ECT-marked segments; and blocking of segments carrying
          the AccECN Option. It can detect these conditions during TCP's 3WHS
          so that it can fall back to operation without ECN and/or operation
          without the AccECN Option.</t>

          <!-- new end -->

          <t hangText="Forward Compatibility:">The behaviour of endpoints and
          middleboxes is carefully defined for all reserved or currently
          unused codepoints in the scheme. This allows the designers of
          security devices to understand which currently unused values might
          appear in future. So, even if they choose to treat such values as
          anomalous while they are not widely used, any blocking will at least
          be under policy control, not hard-coded. Then, if previously unused
          values start to appear on the Internet (or in standards), such
          policies could be quickly reversed.</t>
        </list></t>
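
      <t>The Resilience and Complexity properties above can be illustrated
      with a short, non-normative sketch in Python. It assumes a 3-bit
      feedback field that carries the least significant bits of the
      receiver's count of CE-marked packets. The constant name DIVACE and the
      function names are assumptions made for this example, and the sketch
      omits the safety heuristics that the Complexity item mentions.</t>

      <figure>
        <artwork><![CDATA[
```python
DIVACE = 2 ** 3  # the feedback field is assumed to carry 3 bits


def encode_ace(r_cep: int) -> int:
    """Receiver side: expose only the least significant bits."""
    return r_cep % DIVACE


def update_sender_count(s_cep: int, ace: int) -> int:
    """Sender side: advance s_cep by the smallest non-negative step
    that makes its low-order bits equal to the received field."""
    delta = (ace - s_cep) % DIVACE
    return s_cep + delta


def simulate(ce_per_ack, delivered, start=0):
    """Replay a run of ACKs, some of which are lost on the way back.

    ce_per_ack[i] is the number of CE-marked packets covered by ACK i and
    delivered[i] says whether that ACK reaches the Data Sender.
    Returns the final (receiver count, sender count).
    """
    r_cep = s_cep = start
    for n_ce, arrives in zip(ce_per_ack, delivered):
        r_cep += n_ce
        if arrives:
            s_cep = update_sender_count(s_cep, encode_ace(r_cep))
    return r_cep, s_cep
```
]]></artwork>
      </figure>

      <t>With simulate([1, 2, 0, 3], [True, False, False, True]) the two
      middle ACKs are lost, yet both counts end at 6, because the last ACK
      repeats the cumulative counter. With simulate([1, 9], [True, True])
      the field wraps between ACKs and the sender's count falls behind, which
      is the case that the safety heuristics are intended to cover.</t>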
    </section>

    <!-- ================================================================ -->

    <section anchor="accecn_IANA_Considerations" title="IANA Considerations">
      <t>This document reassigns bit 7 of the TCP header flags to the AccECN
      protocol. This bit was previously called the Nonce Sum (NS) flag <xref
      target="RFC3540"/>, but RFC 3540 has been reclassified as historic <xref
      target="RFC8311"/>. The flag will now be defined as:</t>

      <texttable suppress-title="true" title="TCP header flag reassignment">
        <ttcol>Bit</ttcol>

        <ttcol>Name</ttcol>

        <ttcol>Reference</ttcol>

        <c>7</c>

        <c>AE (Accurate ECN)</c>

        <c>RFC XXXX</c>
      </texttable>

      <t>[TO BE REMOVED: IANA is requested to update the existing entry in the
      Transmission Control Protocol (TCP) Header Flags registration
      (https://www.iana.org/assignments/tcp-header-flags/tcp-header-flags.xhtml#tcp-header-flags-1)
      for Bit 7 to "AE (Accurate ECN), previously used as NS (Nonce Sum) by
      [RFC3540], which is now Historic [RFC8311]" and change the reference to
      this RFC-to-be instead of RFC8311.]</t>

      <t>This document also defines two new TCP options for AccECN, assigned
      values of TBD0 and TBD1 (decimal) from the TCP option space. These
      values are defined as:</t>

      <texttable suppress-title="true" title="New TCP Option assignments">
        <ttcol>Kind</ttcol>

        <ttcol>Length</ttcol>

        <ttcol>Meaning</ttcol>

        <ttcol>Reference</ttcol>

        <c>TBD0</c>

        <c>N</c>

        <c>Accurate ECN Order 0 (AccECN0)</c>

        <c>RFC XXXX</c>

        <c>TBD1</c>

        <c>N</c>

        <c>Accurate ECN Order 1 (AccECN1)</c>

        <c>RFC XXXX</c>
      </texttable>

      <t>[TO BE REMOVED: This registration should take place at the following
      location:
      http://www.iana.org/assignments/tcp-parameters/tcp-parameters.xhtml#tcp-parameters-1
      ]</t>

      <t>Early implementations used experimental option 254 per <xref
      target="RFC6994"/> with the single magic number 0xACCE (16 bits), as
      allocated in the IANA "TCP Experimental Option Experiment Identifiers
      (TCP ExIDs)" registry. Later implementations of the two AccECN Options
      used 16-bit magic numbers 0xACC0 and 0xACC1 respectively for Order 0 and
      1. Uses of these experimental options SHOULD migrate to use the new
      option kinds (TBD0 &amp; TBD1).</t>

      <t>[TO BE REMOVED: The description of the three values in the TCP
      ExIDs registry should be changed to "AccECN (current and new
      implementations SHOULD use option kinds TBD0 and TBD1)" at the following
      location:
      https://www.iana.org/assignments/tcp-parameters/tcp-parameters.xhtml#tcp-exids
      ]</t>
    </section>

    <!-- ================================================================ -->

    <section anchor="accecn_Security_Considerations"
             title="Security Considerations">
      <t>If the supplementary part of AccECN, based on the new AccECN TCP
      Option, is ever unusable (for example, due to middlebox interference),
      the essential part of AccECN's congestion feedback offers only limited
      resilience to long runs of ACK loss (see <xref
      target="accecn_ACE_Safety"/>). These problems are unlikely to be due to
      malicious intervention (because if an attacker could strip a TCP option
      or discard a long run of ACKs it could wreak other arbitrary havoc).
      However, it would be of concern if AccECN's resilience could be
      indirectly compromised during a flooding attack. AccECN is still
      considered safe though, because if the option is not present, the AccECN
      Data Sender is then required to switch to more conservative assumptions
      about wrap of congestion indication counters (see <xref
      target="accecn_ACE_Safety"/> and <xref
      target="accecn_Algo_ACE_Wrap"/>).</t>

      <t><xref target="accecn_Interaction_SYN_Cookies"/> describes how a TCP
      server can negotiate AccECN and use the SYN cookie method for mitigating
      SYN flooding attacks.</t>

      <t>There is concern that ECN feedback could be altered or suppressed,
      particularly because a misbehaving Data Receiver could increase its own
      throughput at the expense of others. AccECN is compatible with the three
      schemes known to assure the integrity of ECN feedback (see <xref
      target="accecn_Integrity"/> for details). If the AccECN Option is
      stripped by an incorrectly implemented middlebox, the resolution of the
      feedback will be degraded, but the integrity of this degraded
      information can still be assured. Assuring that Data Senders respond
      appropriately to ECN feedback is possible, but the scope of the present
      document is confined to the feedback protocol, and excludes the response
      to this feedback.</t>

      <t>In <xref target="accecn_option"/> a Data Sender is allowed to ignore
      an unrecognized TCP AccECN Option length and read as many whole 3-octet
      fields from it as possible up to a maximum of 3, treating the remainder
      as padding. This opens up a potential covert channel of up to 29B (40 -
      (2+3*3))B. However, it is really an overt channel (not hidden) and it is
      no different to the use of unknown TCP options with unknown option
      lengths in general. Therefore, where this is of concern, it can already
      be adequately mitigated by regular TCP normalizer technology (see <xref
      target="accecn_middlebox_transparent_normalizers"/>).</t>
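
      <t>The arithmetic behind the 29B figure can be checked with the
      following non-normative sketch in Python. The function name
      split_accecn_option and the constant names are assumptions made for
      this example; the 40B limit is the maximum space available for options
      in a TCP header.</t>

      <figure>
        <artwork><![CDATA[
```python
MAX_TCP_OPTION_SPACE = 40  # bytes of option space in a TCP header
KIND_AND_LENGTH = 2        # Kind octet plus Length octet
FIELD_SIZE = 3             # each counter field is 3 octets
MAX_FIELDS = 3             # at most three fields are read


def split_accecn_option(length: int):
    """Return (whole fields read, bytes treated as padding).

    `length` is the total option length, including Kind and Length.
    """
    if not KIND_AND_LENGTH <= length <= MAX_TCP_OPTION_SPACE:
        raise ValueError("option length outside the TCP option space")
    payload = length - KIND_AND_LENGTH
    fields = min(payload // FIELD_SIZE, MAX_FIELDS)
    padding = payload - fields * FIELD_SIZE
    return fields, padding
```
]]></artwork>
      </figure>

      <t>For a full-sized option, split_accecn_option(11) yields 3 fields
      and no padding. For the largest possible option,
      split_accecn_option(40) yields 3 fields and 29 bytes of padding, which
      is the upper bound on the channel described above.</t>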

      <!--Bob adds: I removed the following 3 sentences, which I felt were weak. I think it is better to admit there is a 
security concern, than try to claim it is not a problem (when it is). 
If a receiver has driven a network from marking into loss, it has already probably harmed other flows and gained a 
large share of resources for itself. 
Anyway, a receiver can regulate concealment of ECN marks to give itself more resources without driving a link into loss.-->

      <!--The motivation for concealing ECN marks is generally considered to be self-interest. Causing congestion collapse would 
not be in the interest of a receiver, 
and it has not been identified as a realistic motivation for attacks that conceal ECN marks.-->


      <!--"However, if congestion is persistent but no congestion notification is provided to the Data Sender, the congestion 
will lead to packet loss which cannot easily be concealed by a reliable TCP connection. 
Therefore the absence of ECN-based packet feedback will not lead to  congestion collapse. Further note that classic 
ECN also do not have an integrity check. 
ECN Nonce was specified separately therefore a end point that wants to conceal ECN feedback can simply present to not 
support ECN Nonce."-->

      <t>The AccECN protocol is not believed to introduce any new privacy
      concerns, because it merely counts and feeds back signals at the
      transport layer that were already visible at the IP layer. A covert
      channel can be used to compromise privacy. However, as explained above,
      undefined TCP options in general open up such channels and common
      techniques are available to close them off.</t>

      <t>There is a potential concern that a Data Receiver could deliberately
      omit the AccECN Option, pretending that it had been stripped by a
      middlebox. No way is yet known for a receiver to take advantage of this
      behaviour, which seems to always degrade its own performance. However,
      the concern is mentioned here for completeness.</t>
    </section>

    <!-- ================================================================ -->

    <section anchor="accecn_Acknowledgements" title="Acknowledgements">
      <t>We want to thank Koen De Schepper, Praveen Balasubramanian, Michael
      Welzl, Gorry Fairhurst, David Black, Spencer Dawkins, Michael Scharf,
      Michael Tuexen, Yuchung Cheng, Kenjiro Cho, Olivier Tilmans, Ilpo
      J&auml;rvinen, Neal Cardwell, Yoshifumi Nishida, Martin Duke, Jonathan
      Morton and Vidhi Goel for their input and discussion. The idea of using
      the three ECN-related TCP flags as one field for more accurate TCP-ECN
      feedback was first introduced in the re-ECN protocol that was the
      ancestor of ConEx.</t>

      <t>Bob Briscoe was part-funded by the Comcast Innovation Fund, the
      European Community under its Seventh Framework Programme through the
      Reducing Internet Transport Latency (RITE) project (ICT-317700) and
      through the Trilogy 2 project (ICT-317756), and the Research Council of
      Norway through the TimeIn project. The views expressed here are solely
      those of the authors.</t>

      <t>Mirja Kuehlewind was partly supported by the European Commission
      under Horizon 2020 grant agreement no. 688421 Measurement and
      Architecture for a Middleboxed Internet (MAMI), and by the Swiss State
      Secretariat for Education, Research, and Innovation under contract no.
      15.0268. This support does not imply endorsement.</t>
    </section>

    <!-- ================================================================ -->

    <section anchor="accecn_Comments_Solicited" title="Comments Solicited">
      <t>Comments and questions are encouraged and very welcome. They can be
      addressed to the IETF TCP maintenance and minor modifications working
      group mailing list &lt;tcpm@ietf.org&gt;, and/or to the authors.</t>
    </section>
  </middle>

  <back>
    <!-- ================================================================ -->

    <references title="Normative References">
      <?rfc include="reference.RFC.0793" ?>

      <?rfc include="reference.RFC.2119" ?>

      <?rfc include="reference.RFC.3168" ?>

      <?rfc include="reference.RFC.5681" ?>

      <?rfc include="reference.RFC.8174" ?>
    </references>

    <references title="Informative References">
      <?rfc include="reference.RFC.2018" ?>

      <?rfc include="reference.RFC.3449" ?>

      <?rfc include="reference.RFC.3540" ?>

      <?rfc include="reference.RFC.4987" ?>

      <?rfc include="reference.RFC.5562" ?>

      <?rfc include="reference.RFC.5690" ?>

      <?rfc include="reference.RFC.5925" ?>

      <?rfc include="reference.RFC.5961" ?>

      <?rfc include="reference.RFC.6824" ?>

      <?rfc include="reference.RFC.6994" ?>

      <?rfc include="reference.RFC.7560" ?>

      <?rfc include="reference.RFC.7413" ?>

      <?rfc include="reference.RFC.7713" ?>

      <?rfc include="reference.RFC.8257" ?>

      <?rfc include="reference.RFC.8311" ?>

      <?rfc include="reference.I-D.ietf-tcpm-generalized-ecn" ?>

      <?rfc include="reference.RFC.9040" ?>

      <?rfc include="reference.RFC.8511" ?>

      <?rfc include="reference.I-D.ietf-tsvwg-l4s-arch" ?>

      <reference anchor="Mandalari18">
        <front>
          <title>Measuring ECN++: Good News for ++, Bad News for ECN over
          Mobile</title>

          <author fullname="Anna Mandalari" initials="A." surname="Mandalari">
            <organization>UC3M</organization>
          </author>

          <author fullname="Andra Lutu" initials="A." surname="Lutu">
            <organization>Simula</organization>

            <address>
              <postal>
                <street/>

                <city/>

                <region/>

                <code/>

                <country/>
              </postal>

              <phone/>

              <facsimile/>

              <email/>

              <uri/>
            </address>
          </author>

          <author fullname="Bob Briscoe" initials="B." surname="Briscoe">
            <organization>Simula</organization>

            <address>
              <postal>
                <street/>

                <city/>

                <region/>

                <code/>

                <country/>
              </postal>

              <phone/>

              <facsimile/>

              <email/>

              <uri/>
            </address>
          </author>

          <author fullname="Marcelo Bagnulo" initials="M." surname="Bagnulo">
            <organization>UC3M</organization>

            <address>
              <postal>
                <street/>

                <city/>

                <region/>

                <code/>

                <country/>
              </postal>

              <phone/>

              <facsimile/>

              <email/>

              <uri/>
            </address>
          </author>

          <author fullname="&Ouml;zg&uuml; Alay" initials="&Ouml;."
                  surname="Alay">
            <organization>Simula</organization>

            <address>
              <postal>
                <street/>

                <city/>

                <region/>

                <code/>

                <country/>
              </postal>

              <phone/>

              <facsimile/>

              <email/>

              <uri/>
            </address>
          </author>

          <date month="March" year="2018"/>
        </front>

        <seriesInfo name="IEEE Communications Magazine" value=""/>

        <format target="http://www.it.uc3m.es/amandala/ecn++/ecn_commag_2018.html"
                type="PDF"/>
      </reference>
    </references>


    <section anchor="accecn_Algo_Examples" title="Example Algorithms">
      <t>This appendix is informative, not normative. It gives example
      algorithms that would satisfy the normative requirements of the AccECN
      protocol. However, implementers are free to choose other ways to
      implement the requirements.</t>

      <section anchor="accecn_Algo_Option_Coding"
               title="Example Algorithm to Encode/Decode the AccECN Option">
        <t><!--ToDo: Example code to check the AccECN Option fields are consistent with the ACE field.-->The
        example algorithms below show how a Data Receiver in AccECN mode could
        encode its CE byte counter r.ceb into the ECEB field within the AccECN
        TCP Option, and how a Data Sender in AccECN mode could decode the ECEB
        field into its byte counter s.ceb. The other counters for bytes marked
        ECT(0) and ECT(1) in the AccECN Option would be similarly encoded and
        decoded.</t>

        <t>It is assumed that each local byte counter is an unsigned integer
        wider than 24 bits (probably 32 bits), and that the following
        constant has been assigned:<list style="empty">
            <t>DIVOPT = 2^24</t>
          </list></t>

        <t>Every time a CE-marked data segment arrives, the Data Receiver
        increments its local value of r.ceb by the size of the TCP Data.
        Whenever it sends an ACK with the AccECN Option, the value it writes
        into the ECEB field is <list style="empty">
            <t>ECEB = r.ceb % DIVOPT</t>
          </list></t>

        <t>where '%' is the remainder operator.</t>

        <t>On the arrival of an AccECN Option, the Data Sender first makes
        sure the ACK has not been superseded in order to avoid winding the
        s.ceb counter backwards. It uses the TCP acknowledgement number and
        any SACK options to calculate newlyAckedB, the amount of new data that
        the ACK acknowledges in bytes (newlyAckedB can be zero but not
        negative). If newlyAckedB is zero, either the ACK has been superseded
        or CE-marked packet(s) without data could have arrived. To
        distinguish these two cases, the Data Sender could use timestamps (if
        present) to work out newlyAckedT, the amount of new time that the ACK
        acknowledges. If the Data Sender determines that the ACK has been
        superseded, it ignores the AccECN Option. Otherwise, the Data Sender
        calculates the minimum non-negative difference d.ceb between the ECEB
        field and its local s.ceb counter, using modulo arithmetic as
        follows:</t>

        <figure>
          <artwork><![CDATA[   if ((newlyAckedB > 0) || (newlyAckedT > 0)) {
       d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT
       s.ceb += d.ceb
   }
]]></artwork>
        </figure>

        <t>For example, if s.ceb is 33,554,433 and ECEB is 1461 (both
        decimal), then</t>

        <figure>
          <artwork><![CDATA[   s.ceb % DIVOPT = 1
            d.ceb = (1461 + 2^24 - 1) % 2^24
                  = 1460
            s.ceb = 33,554,433 + 1460
                  = 33,555,893
]]></artwork>
        </figure>
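
        <t>The encoding and decoding steps above can be sketched in Python as
        follows. This is only an illustrative sketch, not part of the AccECN
        specification: the function and parameter names are assumptions
        introduced here, and Python's unbounded integers stand in for the
        fixed-width (probably 32-bit) counters that a real implementation
        would use, so wrapping of the local counter itself is not
        modelled.</t>

        <figure>
          <artwork><![CDATA[
```python
DIVOPT = 2 ** 24  # the ECEB field is 24 bits wide


def encode_eceb(r_ceb: int) -> int:
    """Data Receiver: value written into the ECEB field."""
    return r_ceb % DIVOPT


def decode_eceb(s_ceb: int, eceb: int,
                newly_acked_b: int, newly_acked_t: int = 0) -> int:
    """Data Sender: return the updated s.ceb counter.

    A superseded ACK (no new bytes and no new time acknowledged)
    leaves the counter unchanged.
    """
    if newly_acked_b > 0 or newly_acked_t > 0:
        # Minimum non-negative difference, modulo 2^24.
        d_ceb = (eceb + DIVOPT - (s_ceb % DIVOPT)) % DIVOPT
        s_ceb += d_ceb
    return s_ceb
```
]]></artwork>
        </figure>

        <t>With the values of the worked example, decode_eceb(33554433, 1461,
        newly_acked_b=1460) returns 33555893, whereas the same call with
        newly_acked_b=0 and no new time acknowledged leaves s.ceb at
        33554433.</t>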

        <t>In practice an implementation might use heuristics to guess the
        feedback in missing ACKs, then when it subsequently receives feedback
        it might find that it needs to correct its earlier heuristics as part
        of the decoding process. The above decoding process does not include
        any such heuristics.</t>
      </section>

      <section anchor="accecn_Algo_ACE_Wrap"
               title="Example Algorithm for Safety Against Long Sequences of ACK Loss">
        <t>The example algorithms below show how a Data Receiver in AccECN
        mode could encode its CE packet counter r.cep into the ACE field, and
        how the Data Sender in AccECN mode could decode the ACE field into its
        s.cep counter. The Data Sender's algorithm includes code to
        heuristically detect a long enough unbroken string of ACK losses that
        could have concealed a cycle of the congestion counter in the ACE
        field of the next ACK to arrive.</t>

        <t>Two variants of the algorithm are given: i) a more conservative
        variant for a Data Sender to use if it detects that the AccECN Option
        is not available (see <xref target="accecn_ACE_Safety"/> and <xref
        target="accecn_Mbox_Interference"/>); and ii) a less conservative
        variant that is feasible when complementary information is available
        from the AccECN Option.</t>

        <section title="Safety Algorithm without the AccECN Option">
          <t>It is assumed that each local packet counter is a sufficiently
          large unsigned integer (probably 32 bits) and that the following
          constant has been assigned:<list style="empty">
              <t>DIVACE = 2^3</t>
            </list></t>

          <t>Every time an Acceptable CE-marked packet arrives (<xref
          target="accecn_sec_ACE_feedback"/>), the Data Receiver increments
          its local value of r.cep by 1. It repeats the same value of ACE in
          every subsequent ACK until the next CE marking arrives, where<list
              style="empty">
              <t>ACE = r.cep % DIVACE.</t>
            </list></t>

          <t>If the Data Sender received an earlier value of the counter that
          had been delayed due to ACK reordering, it might incorrectly
          calculate that the ACE field had wrapped. Therefore, on the arrival
          of every ACK, the Data Sender first checks that the ACK has not been
          superseded, using the TCP acknowledgement number, any SACK options
          and timestamps (if available) to calculate newlyAckedB and
          newlyAckedT, as in <xref target="accecn_Algo_Option_Coding"/>. If
          the ACK has not been superseded, the Data Sender calculates the
          minimum non-negative difference d.cep
          between the ACE field and its local s.cep counter, using modulo
          arithmetic as follows:</t>

          <figure>
            <artwork><![CDATA[   if ((newlyAckedB > 0) || (newlyAckedT > 0))
       d.cep = (ACE + DIVACE - (s.cep % DIVACE)) % DIVACE
]]></artwork>
          </figure>

          <t><xref target="accecn_ACE_Safety"/> expects the Data Sender to
          assume that the ACE field cycled if it is the safest likely case
          under prevailing conditions. The 3-bit ACE field in an arriving ACK
          could have cycled and become ambiguous to the Data Sender if a
          sequence of ACKs goes missing that covers a stream of data long
          enough to contain 8 or more CE marks. We use the word 'missing'
          rather than 'lost', because some or all of the missing ACKs might
          arrive eventually, but out of order. Even if some of the missing
          ACKs were piggy-backed on data (i.e.&nbsp;not pure ACKs),
          retransmissions will not repair the lost AccECN information, because
          AccECN requires retransmissions to carry the latest AccECN counters,
          not the original ones.</t>

          <t>The phrase 'under prevailing conditions' allows for
          implementation-dependent interpretation. A Data Sender might take
          account of the prevailing size of data segments and the prevailing
          CE marking rate just before the sequence of missing ACKs. However,
          we shall start with the simplest algorithm, which assumes that all
          segments are full-sized and, ultra-conservatively, that ECN marking
          was 100% on the forward path when all ACKs on the reverse path
          started to be dropped. Specifically, if newlyAckedB is the
          amount of data that an ACK acknowledges since the previous ACK, then
          the Data Sender could assume that this acknowledges newlyAckedPkt
          full-sized segments, where newlyAckedPkt = newlyAckedB/MSS. Then it
          could assume that the ACE field incremented by</t>

          <figure>
            <artwork><![CDATA[    dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE)]]></artwork>
          </figure>

          <t>For example, imagine an ACK acknowledges newlyAckedPkt=9 more
          full-size segments than any previous ACK, and that ACE increments by
          a minimum of 2 CE marks (d.cep=2). The above formula works out that
          it would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8)
          = 2). However, if ACE increases by a minimum of 2 but acknowledges
          10 full-sized segments, then it would be necessary to assume that
          there could have been 10 CE marks (because 10 - ((10-2) % 8) =
          10).</t>

          <t>Note that checks would need to be added to the above pseudocode
          for (d.cep &gt; newlyAckedPkt), which could occur if newlyAckedPkt
          had been wrongly estimated using an inappropriate packet size.</t>
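
          <t>A minimal Python sketch of the decoding step and the conservative
          formula, including one possible form of the check mentioned above,
          might look as follows. The function names, the rounding down of
          newlyAckedB/MSS and the fallback to d.cep when d.cep exceeds
          newlyAckedPkt are all assumptions made for illustration; the text
          above does not prescribe them.</t>

          <figure>
            <artwork><![CDATA[
```python
DIVACE = 2 ** 3  # the ACE field is 3 bits wide


def decode_ace(s_cep: int, ace: int) -> int:
    """Minimum non-negative increase d.cep implied by ACE."""
    return (ace + DIVACE - (s_cep % DIVACE)) % DIVACE


def safer_increase(d_cep: int, newly_acked_b: int,
                   mss: int) -> int:
    """dSafer.cep, assuming full-sized segments, all CE-marked."""
    # Assumption: round down to whole full-sized segments.
    newly_acked_pkt = newly_acked_b // mss
    if d_cep > newly_acked_pkt:
        # Assumed guard (not specified in the text): the packet
        # estimate is too small to be trusted, so keep d.cep.
        return d_cep
    return newly_acked_pkt - ((newly_acked_pkt - d_cep) % DIVACE)
```
]]></artwork>
          </figure>

          <t>With MSS = 1460, safer_increase(2, 9 * 1460, 1460) returns 2 and
          safer_increase(2, 10 * 1460, 1460) returns 10, matching the two
          cases worked through above.</t>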

          <t>ACKs that acknowledge a large stretch of packets might be common
          in data centres to achieve a high packet rate or might be due to ACK
          thinning by a middlebox. In these cases, cycling of the ACE field
          would often appear to have been possible, so the above algorithm
          would be over-conservative, leading to a false high marking rate and
          poor performance. Therefore it would be reasonable to use
          dSafer.cep rather than d.cep only if the moving average of
          newlyAckedPkt was well below 8.</t>

          <t>Implementers could build in more heuristics to estimate
          prevailing average segment size and prevailing ECN marking. For
          instance, newlyAckedPkt in the above formula could be replaced with
          newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing
          segment size and p is the prevailing ECN marking probability.
          However, ultimately, if TCP's ECN feedback becomes inaccurate it
          still has loss detection to fall back on. Therefore, it would seem
          safe to implement a simple algorithm, rather than a perfect one.</t>

          <t>The simple algorithm for dSafer.cep above requires no monitoring
          of prevailing conditions and it would still be safe if, for example,
          segments were on average at least 5% of full size, as long as ECN
          marking was 5% or less. Assuming it was used, the Data Sender would
          increment its packet counter as follows:<list style="empty">
              <t>s.cep += dSafer.cep</t>
            </list></t>
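
          <t>The heuristic substitute for newlyAckedPkt described above can be
          written out as a short Python sketch. The function name is an
          assumption made here, and exact fractions are used only to keep the
          illustration free of rounding error; an implementation would more
          likely use scaled integers.</t>

          <figure>
            <artwork><![CDATA[
```python
from fractions import Fraction


def newly_acked_pkt_heur(newly_acked_pkt: int, p: Fraction,
                         mss: int, s: Fraction) -> Fraction:
    """newlyAckedPkt * p * MSS / s, kept exact with Fractions.

    p is the prevailing marking probability and s the prevailing
    segment size in bytes; how they are measured is left open.
    """
    return newly_acked_pkt * p * mss / s
```
]]></artwork>
          </figure>

          <t>The result is no greater than newlyAckedPkt whenever s/MSS is at
          least p. For instance, with MSS = 1460, p = 5% and s = 73 bytes (5%
          of full size), the heuristic value for 10 packets is exactly 10,
          which is the boundary case of the 5% example above.</t>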

          <t>If missing acknowledgement numbers arrive later (due to
          reordering), <xref target="accecn_ACE_Safety"/> says "the Data
          Sender MAY attempt to neutralize the effect of any action it took
          based on a conservative assumption that it later found to be
          incorrect". To do this, the Data Sender would have to store the
          values of all the relevant variables whenever it made assumptions,
          so that it could re-evaluate them later. Given this could become
          complex and it is not required, we do not attempt to provide an
          example of how to do this.</t>
        </section>

        <section title="Safety Algorithm with the AccECN Option">
          <!--ToDo: Ilpo says this algo is useless, 'cos (I think) you don't have the state of d.ceb and d.cep at the same time.
See emails 3/1/20.-->

          <t>When the AccECN Option is available on the ACKs before and after
          the possible sequence of ACK losses, if the Data Sender only needs
          CE-marked bytes, it will have sufficient information in the AccECN
          Option without needing to process the ACE field. If for some reason
          it needs CE-marked packets and dSafer.cep is different from d.cep,
          it can determine whether d.cep is likely to be a safe enough
          estimate by checking whether the average marked segment size (s =
          d.ceb/d.cep) is less than the MSS (where d.ceb is the amount of
          newly CE-marked bytes - see <xref
          target="accecn_Algo_Option_Coding"/>). Specifically, it could use
          the following algorithm:</t>

          <figure>
            <artwork><![CDATA[   SAFETY_FACTOR = 2
   if (dSafer.cep > d.cep) {
       if (d.ceb <= MSS * d.cep) {  // Same as (s <= MSS), but
                                    // avoids division by zero
           sSafer = d.ceb/dSafer.cep
           if (sSafer < MSS/SAFETY_FACTOR)
               dSafer.cep = d.cep  // d.cep is a safe enough estimate
       } // No else branch needed; dSafer.cep is already correct,
         // because d.cep must have been too small
   }
]]></artwork>
          </figure>

          <t>The chart below shows when the above algorithm will consider
          d.cep can replace dSafer.cep as a safe enough estimate of the number
          of CE-marked packets:</t>

          <figure>
            <artwork><![CDATA[                 ^
           sSafer|
                 |
              MSS+
                 |
                 |         dSafer.cep
                 |                  is
MSS/SAFETY_FACTOR+--------------+    safest
                 |              |
                 | d.cep is safe|
                 |    enough    |
                 +-------------------->
                               MSS     s

]]></artwork>
          </figure>

          <t>The following examples give the reasoning behind the algorithm,
          assuming MSS = 1460 bytes:<list style="symbols">
              <t>if d.cep=0, dSafer.cep=8 and d.ceb=1460, then s=infinity and
              sSafer=182.5.<vspace blankLines="0"/>Therefore even though the
              average size of 8 data segments is unlikely to have been as
              small as MSS/8, d.cep cannot have been correct, because it would
              imply an average segment size greater than the MSS.</t>

              <t>if d.cep=2, dSafer.cep=10 and d.ceb=1460, then s=730 and
              sSafer=146.<vspace blankLines="0"/>Therefore d.cep is safe
              enough, because the average size of 10 data segments is unlikely
              to have been as small as MSS/10.</t>

              <t>if d.cep=7, dSafer.cep=15 and d.ceb=10200, then s=1457 and
              sSafer=680.<vspace blankLines="0"/>Therefore d.cep is safe
              enough, because the average data segment size is more likely to
              have been just less than one MSS, rather than below MSS/2.</t>
            </list></t>
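
          <t>The three examples above can be checked against a direct Python
          transcription of the algorithm. This is a sketch only; the function
          name and its return-value convention (it returns the increase that
          the Data Sender would apply to s.cep) are assumptions introduced
          here.</t>

          <figure>
            <artwork><![CDATA[
```python
SAFETY_FACTOR = 2


def choose_cep_increase(d_cep: int, d_safer_cep: int,
                        d_ceb: int, mss: int) -> int:
    """Return the CE packet increase the sender would apply."""
    if d_safer_cep > d_cep:
        # Same test as (d.ceb / d.cep <= MSS) without dividing,
        # so d.cep == 0 cannot cause a division by zero.
        if d_ceb <= mss * d_cep:
            s_safer = d_ceb / d_safer_cep
            if s_safer < mss / SAFETY_FACTOR:
                return d_cep  # d.cep is a safe enough estimate
    # Otherwise dSafer.cep stands.
    return d_safer_cep
```
]]></artwork>
          </figure>

          <t>With MSS = 1460, the three cases listed above return 8, 2 and 7
          respectively: the first keeps dSafer.cep and the other two accept
          d.cep.</t>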

          <t>If pure ACKs were allowed to be ECN-capable, missing ACKs would
          be far less likely. However, because <xref target="RFC3168"/>
          currently precludes this, the above algorithm assumes that pure ACKs
          are not ECN-capable.</t>
        </section>
      </section>

      <section anchor="accecn_Algo_ACE_Bytes"
               title="Example Algorithm to Estimate Marked Bytes from Marked Packets">
        <t>If the AccECN Option is not available, the Data Sender can only
        decode CE-marking from the ACE field in packets. Every time an ACK
        arrives, to convert this into an estimate of CE-marked bytes, it needs
        an average of the segment size, s_ave. Then it can add or subtract
        s_ave from the value of d.ceb as the value of d.cep increments or
        decrements. Some possible ways to calculate s_ave are outlined below.
        The precise details will depend on why an estimate of marked bytes is
        needed.</t>

        <t>The implementation could keep a record of the byte numbers of all
        the boundaries between packets in flight (including control packets),
        and recalculate s_ave on every ACK. However it would be simpler to
        merely maintain a counter packets_in_flight for the number of packets
        in flight (including control packets), which is reset once per RTT.
        Either way, it would estimate s_ave as:<list style="empty">
            <t>s_ave ~= flightsize / packets_in_flight,</t>
          </list>where flightsize is the variable that TCP already maintains
        for the number of bytes in flight. To avoid floating point arithmetic,
        it could right-bit-shift flightsize by lg(packets_in_flight) rounded
        to an integer, where lg() means log base 2.</t>
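
        <t>As a rough Python illustration of this estimate, the sketch below
        compares exact integer division with the shift-based approximation.
        The function names are assumptions made here, as is the choice to
        round lg() down, which the text above leaves open.</t>

        <figure>
          <artwork><![CDATA[
```python
def s_ave_exact(flightsize: int, packets_in_flight: int) -> int:
    """Integer division: flightsize / packets_in_flight."""
    return flightsize // max(packets_in_flight, 1)


def s_ave_shift(flightsize: int, packets_in_flight: int) -> int:
    """Approximate the division with a right shift.

    Assumption: lg() is rounded down to an integer, so the
    divisor is the largest power of two <= packets_in_flight.
    """
    shift = max(packets_in_flight, 1).bit_length() - 1
    return flightsize >> shift
```
]]></artwork>
        </figure>

        <t>The two agree when packets_in_flight is a power of two (11680 bytes
        over 8 packets gives 1460 either way). Otherwise, rounding down
        over-estimates s_ave by a factor of less than two; for example, 14600
        bytes over 10 packets gives 1825 rather than 1460.</t>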

        <t>An alternative would be to maintain an exponentially weighted
        moving average (EWMA) of the segment size:<list style="empty">
            <t>s_ave = a * s + (1-a) * s_ave,</t>
          </list>where a is the decay constant for the EWMA. However, then it
        is necessary to choose a good value for this constant, which ought to
        depend on the number of packets in flight. Also, the decay constant
        needs to be a negative power of two (1/2^n) to avoid floating point
        arithmetic.</t>
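
        <t>An integer-only form of this EWMA might be sketched in Python as
        follows. The function name and the value of EWMA_SHIFT (giving a =
        1/8) are hypothetical choices for illustration, not recommendations
        from this document.</t>

        <figure>
          <artwork><![CDATA[
```python
EWMA_SHIFT = 3  # hypothetical choice: a = 1 / 2**3 = 1/8


def update_s_ave(s_ave: int, s: int,
                 shift: int = EWMA_SHIFT) -> int:
    """s_ave = a*s + (1-a)*s_ave with a = 1 / 2**shift.

    Rearranged as s_ave + a*(s - s_ave), so only an add,
    a subtract and an arithmetic right shift are needed.
    Python's >> rounds a negative difference downwards.
    """
    return s_ave + ((s - s_ave) >> shift)
```
]]></artwork>
        </figure>

        <t>For example, starting from s_ave = 1460, a 100-byte segment moves
        the average to 1290, which equals 100/8 + 1460*7/8.</t>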
      </section>

      <section anchor="accecn_Algo_Not-ECT"
               title="Example Algorithm to Count Not-ECT Bytes">
        <t>A Data Sender in AccECN mode can infer the amount of TCP payload
        data arriving at the receiver marked Not-ECT from the difference
        between the amount of newly ACKed data and the sum of the bytes with
        the other three markings, d.ceb, d.e0b and d.e1b.</t>

        <!--ToDo: write-up pseudocode, rather than just describe it.-->

        <t>For this approach to be precise, it has to be assumed that spurious
        (unnecessary) retransmissions do not lead to double counting. This
        assumption is currently correct, given that RFC 3168 requires that the
        Data Sender marks retransmitted segments as Not-ECT. However, the
        converse is not true; necessary retransmissions will result in
        under-counting.</t>

        <t>However, such precision is unlikely to be necessary. The only known
        use of a count of Not-ECT marked bytes is to test whether equipment on
        the path is clearing the ECN field (perhaps due to an out-dated
        attempt to clear, or bleach, what used to be the ToS field). To detect
        bleaching it will be sufficient to detect whether nearly all bytes
        arrive marked as Not-ECT. Therefore there ought to be no need to keep
        track of the details of retransmissions.</t>
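
        <t>The inference described at the start of this section, together with
        a coarse bleaching test, could be sketched in Python as follows. The
        function names, the clamping of negative results to zero and the
        value of BLEACH_THRESHOLD are all assumptions introduced for
        illustration; this document does not define what proportion counts as
        'nearly all'.</t>

        <figure>
          <artwork><![CDATA[
```python
BLEACH_THRESHOLD = 0.95  # hypothetical: "nearly all" bytes


def not_ect_bytes(newly_acked_b: int, d_ceb: int,
                  d_e0b: int, d_e1b: int) -> int:
    """Newly ACKed bytes not accounted for by CE, ECT(0), ECT(1)."""
    d_nb = newly_acked_b - (d_ceb + d_e0b + d_e1b)
    # Assumed clamp: imprecise counting must not go negative.
    return max(d_nb, 0)


def looks_bleached(total_acked_b: int,
                   total_not_ect_b: int) -> bool:
    """True if nearly all ACKed bytes arrived marked Not-ECT."""
    if total_acked_b == 0:
        return False
    return total_not_ect_b >= BLEACH_THRESHOLD * total_acked_b
```
]]></artwork>
        </figure>

        <t>A Data Sender using this sketch would accumulate the results of
        not_ect_bytes() over many ACKs before calling looks_bleached(), so
        that the imprecision caused by retransmissions has little effect on
        the outcome.</t>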
      </section>
    </section>

    <section anchor="accecn_flags_rationale"
             title="Rationale for Usage of TCP Header Flags">
      <section title="Three TCP Header Flags in the SYN-SYN/ACK Handshake">
        <t>AccECN uses a rather unorthodox approach to negotiate the highest
        version of the TCP ECN feedback scheme that both ends support, as
        justified below. It follows from the original TCP ECN capability
        negotiation
        <xref target="RFC3168"/>, in which the client set the 2 least
        significant of the original reserved flags in the TCP header, and fell
        back to no ECN support if the server responded with the 2 flags
        cleared, which had previously been the default.</t>

        <t>ECN originally used header flags rather than a TCP option because
        it was considered more efficient to use a header flag for 1 bit of
        feedback per ACK, and this bit could be overloaded to indicate support
        for ECN during the handshake. During the development of ECN, 1 bit
        crept up to 2, in order to deliver the feedback reliably and to work
        round some broken hosts that reflected the reserved flags during the
        handshake.</t>

        <t>In order to be backward compatible with RFC 3168, AccECN continues
        this approach, using the 3rd least significant TCP header flag that
        had previously been allocated for the ECN nonce (now historic). Then,
        whatever form of server an AccECN client encounters, the connection
        can fall back to the highest version of feedback protocol that both
        ends support, as explained in <xref target="accecn_Negotiation"/>.</t>

        <t>If AccECN had used the more orthodox approach of a TCP option, it
        would still have had to set the two ECN flags in the main TCP header,
        in order to be able to fall back to Classic RFC 3168 ECN, or to
        disable ECN support, without another round of negotiation. Then AccECN
        would also have had to handle all the different ways that servers
        currently respond to settings of the ECN flags in the main TCP header,
        including all the conflicting cases where a server might have said it
        supported one approach in the flags and another approach in the new
        TCP option. And AccECN would have had to deal with all the additional
        possibilities where a middlebox might have mangled the ECN flags, or
        removed the TCP option. Thus, usage of the 3rd reserved TCP header
        flag simplified the protocol.</t>

        <t>The third flag was used in a way that could be distinguished from
        the ECN nonce, in case any nonce deployment was encountered. Previous
        usage of this flag for the ECN nonce was integrated into the original
        ECN negotiation. This further justified the 3rd flag's use for AccECN,
        because a non-ECN usage of this flag would have had to use it as a
        separate single bit, rather than in combination with the other 2 ECN
        flags.</t>

        <t>Indeed, having overloaded the original uses of these three flags
        for its handshake, AccECN overloads all three bits again as a 3-bit
        counter.</t>
      </section>

      <section title="Four Codepoints in the SYN/ACK">
        <t>Of the 8 possible codepoints that the 3 TCP header flags can
        indicate on the SYN/ACK, 4 already indicated earlier (or broken)
        versions of ECN support. In the early design of AccECN, an AccECN
        server could use only 2 of the 4 remaining codepoints. They both
        indicated AccECN support, but one fed back that the SYN had arrived
        marked as CE. Even though ECN support on a SYN is not yet on the
        standards track, the idea is for either end to act as a dumb
        reflector, so that future capabilities can be unilaterally deployed
        without requiring 2-ended deployment (justified in <xref
        target="accecn_demb_reflector"/>).</t>

        <t>During traversal testing it was discovered that the ECN field in
        the SYN was mangled on a non-negligible proportion of paths. Therefore
        it was necessary to allow the SYN/ACK to feed back to the client any
        of the four IP/ECN codepoints that the SYN could arrive with. Without
        this, the client could not know whether to disable ECN for the
        connection due to mangling of the IP/ECN field (also explained in
        <xref target="accecn_demb_reflector"/>). This development consumed the
        remaining 2 codepoints on the SYN/ACK that had been reserved for
        future use by AccECN in earlier versions.</t>
      </section>

      <section anchor="accecn_space_evolution"
               title="Space for Future Evolution">
        <t>Despite availability of usable TCP header space being extremely
        scarce, the AccECN protocol has taken all possible steps to ensure
        that there is space to negotiate possible future variants of the
        protocol, either if a variant of AccECN is required, or if a
        completely different ECN feedback approach is needed:<list
            style="hanging">
            <t hangText="Future AccECN variants:">When the AccECN capability
            is negotiated during TCP's 3WHS, the rows in <xref
            target="accecn_Tab_Negotiation"/> tagged as 'Nonce' and 'Broken'
            in the column for the capability of node B are unused by any
            current protocol in the RFC series. These could be used by TCP
            servers in future to indicate a variant of the AccECN protocol. In
            recent measurement studies in which the response of large numbers
            of servers to an AccECN SYN has been tested, e.g.&nbsp;<xref
            target="Mandalari18"/>, a very small number of SYN/ACKs arrive
            with the pattern tagged as 'Nonce', and a small but more
            significant number arrive with the pattern tagged as 'Broken'. The
            'Nonce' pattern could be a sign that a few servers have
            implemented the ECN Nonce <xref target="RFC3540"/>, which has now
            been reclassified as historic <xref target="RFC8311"/>, or it
            could be the random result of some unknown middlebox behaviour.
            The greater prevalence of the 'Broken' pattern suggests that some
            instances still exist of the broken code that reflects the
            reserved flags on the SYN.<vspace blankLines="1"/>The requirement
            not to reject unexpected initial values of the ACE counter (in the
            main TCP header) in the last paragraph of <xref
            target="accecn_sec_ACE_init_invalid"/> ensures that 3 unused
            codepoints on the ACK of the SYN/ACK, 6 unused values on the first
            SYN=0 data packet from the client and 7 unused values on the first
            SYN=0 data packet from the server could be used to declare future
            variants of the AccECN protocol. The word 'declare' is used rather
            than 'negotiate' because, at this late stage in the 3WHS, it would
            be too late for a negotiation between the endpoints to be
            completed. A similar requirement not to reject unexpected initial
            values in the TCP option (<xref target="accecn_sec_zero_option"/>)
            is for the same purpose. If traversal of the TCP option were
            reliable, this would have enabled a far wider range of future
            variation of the whole AccECN protocol. Nonetheless, it could be
            used to reliably negotiate a wide range of variation in the
            semantics of the AccECN Option.</t>

            <t hangText="Future non-AccECN variants:">Five codepoints out of
            the 8 possible in the 3 TCP header flags used by AccECN are unused
            on the initial SYN (in the order AE,CWR,ECE): 001, 010, 100, 101,
            110. <xref target="accecn_sec_forward_compat"/> ensures that the
            installed base of AccECN servers will all assume these are
            equivalent to AccECN negotiation with 111 on the SYN. These
            codepoints would not allow fall-back to Classic ECN support for a
            server that did not understand them, but this approach ensures
            they are available in future, perhaps for uses other than ECN
            alongside the AccECN scheme. All possible combinations of SYN/ACK
            could be used in response except either 000 or reflection of the
            same values sent on the SYN. <vspace blankLines="1"/>Of course,
            other means could be used to extend AccECN or ECN
            in future, although their traversal properties are likely to be
            inferior. They include a new TCP option; using the remaining
            reserved flags in the main TCP header (preferably extending the
            3-bit combinations used by AccECN to 4-bit combinations, rather
            than burning one bit for just one state); a non-zero urgent
            pointer in combination with the URG flag cleared; or some other
            unexpected combination of fields yet to be invented.</t>
          </list></t>
      </section>
    </section>
  </back>
</rfc>
