<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" []>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt"?>
<?rfc toc="yes"?>
<?rfc compact="no"?>
<?rfc subcompact="no"?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no"?>
<?rfc strict="yes"?>
<rfc ipr="trust200902"
     category="std"
     docName="draft-ietf-ipsecme-iptfs-02"     submissionType="IETF">
  <front>
    <title abbrev="IP Traffic Flow Security">IP Traffic Flow Security</title>
    <author initials='C.' surname='Hopps' fullname='Christian Hopps'>
      <organization>LabN Consulting, L.L.C.</organization>
      <address><email>chopps@chopps.org</email></address>
    </author>
    <date/>
    <abstract><t>This document describes a mechanism to enhance IPsec traffic flow
security by adding traffic flow confidentiality to encrypted IP
encapsulated traffic. Traffic flow confidentiality is provided by
obscuring the size and frequency of IP traffic using a fixed-sized,
constant-send-rate IPsec tunnel. The solution allows for congestion
control as well.</t></abstract>  </front>  <middle>

<section title="Introduction" anchor="sec-introduction">
<t>Traffic Analysis (<xref target="RFC4301"/>, <xref target="AppCrypt"/>) is the act of extracting
information about data being sent through a network. While one may
directly obscure the data through the use of encryption <xref target="RFC4303"/>,
the traffic pattern itself exposes information due to variations in
its shape and timing (<xref target="I-D.iab-wire-image"/>, <xref target="AppCrypt"/>).
Hiding the size and frequency of traffic is referred to as Traffic
Flow Confidentiality (TFC) per <xref target="RFC4303"/>.</t>

<t><xref target="RFC4303"/> provides for TFC by allowing padding to be added to encrypted
IP packets and allowing for transmission of all-pad packets
(indicated using protocol 59). This method has the major limitation
that it can significantly under-utilize the available bandwidth.</t>

<t>The IP-TFS solution provides for full TFC without the aforementioned
bandwidth limitation. This is accomplished by using a
constant-send-rate IPsec <xref target="RFC4303"/> tunnel with fixed-sized
encapsulating packets; however, these fixed-sized packets can contain
partial, whole or multiple IP packets to maximize the bandwidth of
the tunnel.</t>

<t>For a comparison of the overhead of IP-TFS with the RFC4303
prescribed TFC solution see <xref target="sec-comparisons-of-ip-tfs"></xref>.</t>

<t>Additionally, IP-TFS provides for dealing with network congestion
<xref target="RFC2914"/>. This is important when the IP-TFS user is not in full
control of the domain through which the IP-TFS tunnel path flows.</t>

<section title="Terminology &amp; Concepts">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
<xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they appear in all capitals,
as shown here.</t>

<t>This document assumes familiarity with IP security concepts described
in <xref target="RFC4301"/>.</t>

</section>

</section>

<section title="The IP-TFS Tunnel">
<t>As mentioned in <xref target="sec-introduction"></xref>, IP-TFS utilizes an IPsec <xref target="RFC4303"/> tunnel
(SA) as its transport. To provide for full TFC, fixed-sized
encapsulating packets are sent at a constant rate on the tunnel.</t>

<t>The primary input to the tunnel algorithm is the requested bandwidth
of the tunnel. Two values are then required to provide for this
bandwidth: the fixed size of the encapsulating packets, and the rate
at which to send them.</t>

<t>The fixed packet size may either be specified manually or can be
determined through the use of Path MTU discovery <xref target="RFC1191"/> and <xref target="RFC8201"/>.</t>

<t>Given the encapsulating packet size and the requested tunnel
bandwidth, the corresponding packet send rate can be calculated. The
packet send rate is the requested bandwidth divided by the payload
size of the encapsulating packet.</t>
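
<t>The calculation above can be sketched in a few lines of Python. This
is an illustrative sketch only, not part of the specification: the
function name and the choice of units (bits per second for the
bandwidth, octets for the payload size) are assumptions made for the
example.</t>

<figure><artwork><![CDATA[
```python
def packet_send_rate(bandwidth_bps: float, payload_octets: int) -> float:
    """Return packets per second for a requested tunnel bandwidth.

    bandwidth_bps and payload_octets are hypothetical parameter names;
    the text only says the rate is bandwidth divided by payload size.
    """
    if payload_octets <= 0:
        raise ValueError("payload size must be positive")
    return bandwidth_bps / (payload_octets * 8)


# Example: 10 Mbit/s requested, 1250 octets of payload per packet.
rate = packet_send_rate(10_000_000, 1250)
send_interval_seconds = 1.0 / rate
```
]]></artwork></figure>

<t>With these example inputs the sender would transmit one encapsulating
packet every send interval, regardless of whether inner traffic is
available.</t>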

<t>The egress of the IP-TFS tunnel MUST allow for and expect the ingress
(sending) side of the IP-TFS tunnel to vary the size and rate of
sent encapsulating packets, unless constrained by other policy.</t>

<section title="Tunnel Content">
<t>As previously mentioned, one issue with the TFC padding solution in
<xref target="RFC4303"/> is the large amount of wasted bandwidth as only one IP
packet can be sent per encapsulating packet. In order to maximize
bandwidth IP-TFS breaks this one-to-one association.</t>

<t>IP-TFS aggregates as well as fragments the inner IP traffic flow into
fixed-sized encapsulating IPsec tunnel packets. Padding is only added
to the tunnel packets if there is no data available to be sent at
the time of tunnel packet transmission, or if fragmentation has been
disabled by the receiver.</t>

<t>This is accomplished using a new Encapsulating Security Payload (ESP,
<xref target="RFC4303"/>) type which is identified by the IP protocol number
IPTFS_PROTOCOL (TBD1).</t>

</section>

<section title="IPTFS_PROTOCOL Payload Content">
<t>The IPTFS_PROTOCOL payload content defined in this document is
comprised of a 4 or 16 octet header followed by either a partial data
block, a full data block, or multiple partial or full data blocks.
The following diagram
illustrates this IPTFS_PROTOCOL payload within the ESP packet. See
<xref target="sec-ip-tfs-payload"></xref> for the exact formats of the IPTFS_PROTOCOL payload.</t>

<figure title="Layout of an IP-TFS IPsec Packet" anchor="sec-layout-of-an-ip-tfs-ipsec-packet"><artwork><![CDATA[
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . Outer Encapsulating Header ...                                .
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . ESP Header...                                                 .
 +---------------------------------------------------------------+
 |               ...            :           BlockOffset          |
 +---------------------------------------------------------------+
 :                  [Optional Congestion Info]                   :
 +---------------------------------------------------------------+
 |       DataBlocks ...                                          ~
 ~                                                               ~
 ~                                                               |
 +---------------------------------------------------------------+
 . ESP Trailer...                                                .
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
]]></artwork></figure>

<t>The <spanx style='verb'>BlockOffset</spanx> value is either zero or some offset into or past
the end of the <spanx style='verb'>DataBlocks</spanx> data.</t>

<t>If the <spanx style='verb'>BlockOffset</spanx> value is zero it means that the <spanx style='verb'>DataBlocks</spanx>
data begins with a new data block.</t>

<t>Conversely, if the <spanx style='verb'>BlockOffset</spanx> value is non-zero it points to the
start of the new data block, and the initial <spanx style='verb'>DataBlocks</spanx> data
belongs to a previous data block that is still being re-assembled.</t>

<t>The <spanx style='verb'>BlockOffset</spanx> can point past the end of the <spanx style='verb'>DataBlocks</spanx> data
which indicates that the next data block occurs in a subsequent
encapsulating packet.</t>

<t>Having the <spanx style='verb'>BlockOffset</spanx> always point at the next available data
block allows for recovering the next full inner packet in the
presence of outer encapsulating packet loss.</t>
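
<t>The three <spanx style='verb'>BlockOffset</spanx> cases described above (zero, inside the
data, and past the end of the data) can be illustrated with a short
Python sketch. The function name and return shape are assumptions made
for this example; a real receiver would also track reassembly state
across packets.</t>

<figure><artwork><![CDATA[
```python
from typing import Tuple


def split_datablocks(block_offset: int,
                     datablocks: bytes) -> Tuple[bytes, bytes]:
    """Split DataBlocks into (continuation, new_blocks).

    continuation: octets belonging to a data block that started in an
                  earlier encapsulating packet.
    new_blocks:   octets starting at the first new data block; empty
                  when BlockOffset points at or past the end.
    """
    if block_offset < 0:
        raise ValueError("BlockOffset is unsigned")
    # Python slicing clamps to the buffer length, which matches the
    # "offset past the end" case: everything is continuation data.
    return datablocks[:block_offset], datablocks[block_offset:]
```
]]></artwork></figure>

<t>If the preceding encapsulating packet was lost, a receiver following
this sketch would discard the continuation octets and resume with the
new data blocks.</t>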

<t>An example IP-TFS packet flow can be found in <xref target="sec-example-of-an-encapsulated-ip-packet-flow"></xref>.</t>

<section title="Data Blocks">
<figure title="Layout of IP-TFS data block" anchor="sec-layout-of-ip-tfs-data-block"><artwork><![CDATA[
 +---------------------------------------------------------------+
 | Type  | rest of IPv4, IPv6 or pad.
 +--------
]]></artwork></figure>

<t>A data block is defined by a 4-bit type code followed by the data
block data. The type values have been carefully chosen to coincide
with the IPv4/IPv6 version field values so that no per-data block
type overhead is required to encapsulate an IP packet. Likewise, the
length of the data block is extracted from the encapsulated IPv4 or
IPv6 packet's length field.</t>
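
<t>As an illustration, the following Python sketch derives the type and
length of a data block from its leading octets. It is a sketch under
stated assumptions rather than a normative algorithm: the function name
is hypothetical, and for IPv6 it adds the 40 octet fixed header to the
Payload Length field, since that field does not count the fixed header
<xref target="RFC8200"/>.</t>

<figure><artwork><![CDATA[
```python
import struct
from typing import Optional

IPV6_FIXED_HEADER_LEN = 40  # octets, not counted by Payload Length


def data_block_length(block: bytes) -> Optional[int]:
    """Return the data block length in octets, or None for a pad block.

    Raises ValueError if the type is unknown or if the length field is
    not yet available in the octets supplied.
    """
    if not block:
        raise ValueError("empty data block")
    block_type = block[0] >> 4  # upper nibble of the first octet
    if block_type == 0x0:
        return None  # pad: extends to the end of the outer packet
    if block_type == 0x4:
        if len(block) < 4:
            raise ValueError("IPv4 Total Length not yet available")
        (total_length,) = struct.unpack_from("!H", block, 2)
        return total_length
    if block_type == 0x6:
        if len(block) < 6:
            raise ValueError("IPv6 Payload Length not yet available")
        (payload_length,) = struct.unpack_from("!H", block, 4)
        return IPV6_FIXED_HEADER_LEN + payload_length
    raise ValueError("unknown data block type")
```
]]></artwork></figure>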

</section>

<section title="No Implicit End Padding Required">
<t>It's worth noting that since a data block type is identified by its
first octet there is never a need for an implicit pad at the end of
an encapsulating packet. Even when the start of a data block occurs
near the end of an encapsulating packet such that there is no room for
the length field of the encapsulated header to be included in the
current encapsulating packet, the fact that the length comes at a
known location and is guaranteed to be present is enough to fetch the
length field from the subsequent encapsulating packet payload. Only
when there is no data to encapsulate is end padding required, and
then an explicit <spanx style='verb'>Pad Data Block</spanx> would be used to identify the
padding.</t>

</section>

<section title="Fragmentation, Sequence Numbers and All-Pad Payloads">
<t>In order for a receiver to be able to reassemble fragmented
inner-packets, the sender MUST send the inner-packet fragments
back-to-back in the logical IP-TFS packet stream (i.e., using
consecutive ESP sequence numbers). However, the sender is allowed to
insert "all-pad" IP-TFS packets (i.e., packets having payloads with a
<spanx style='verb'>BlockOffset</spanx> of zero and a single pad <spanx style='verb'>DataBlock</spanx>) in between the
IP-TFS packets carrying the inner-packet fragment payloads. This
possible interleaving of all-pad packets allows the sender to always
be able to send an IP-TFS tunnel packet, regardless of the
encapsulation computational requirements.</t>

<t>When a receiver is reassembling an inner-packet, and it receives an
"all-pad" IP-TFS tunnel packet, it increments the expected sequence
number that the next inner-packet fragment is expected to arrive in.</t>

</section>

<section title="Empty Payload">
<t>In order to support reporting of congestion control information
(described later) on a non-IP-TFS enabled SA, IP-TFS allows for the
sending of an IP-TFS payload with no data blocks (i.e., the ESP
payload length is equal to the IP-TFS header length). This special
payload is called an empty payload.</t>

</section>

<section title="IP Header Value Mapping">
<t><xref target="RFC4301"/> provides some direction on when and how to map various values
from an inner IP header to the outer encapsulating header, namely the
Don't-Fragment (DF) bit (<xref target="RFC0791"/> and <xref target="RFC8200"/>), the Differentiated
Services (DS) field <xref target="RFC2474"/> and the Explicit Congestion Notification
(ECN) field <xref target="RFC3168"/>. Unlike <xref target="RFC4301"/>, IP-TFS may and often will be
encapsulating more than one IP packet per ESP packet. To deal with
this, these mappings are restricted further. In particular
IP-TFS never maps the inner DF bit as it is unrelated to the IP-TFS
tunnel functionality; IP-TFS never IP fragments the inner packets and
the inner packets will not affect the fragmentation of the outer
encapsulation packets. Likewise, the ECN value need not be mapped as
any congestion related to the constant-send-rate IP-TFS tunnel is
unrelated (by design!) to the inner traffic flow. Finally, by default
the DS field SHOULD NOT be copied although an implementation MAY
choose to allow for configuration to override this behavior. An
implementation SHOULD also allow the DS value to be set by
configuration.</t>

</section>

</section>

<section title="Exclusive SA Use">
<t>It is not the intention of this specification to allow for mixed use
of an IP-TFS enabled SA. In other words, an SA that has IP-TFS
enabled is exclusively for IP-TFS use and MUST NOT have non-IP-TFS
payloads such as IP (IP protocol 4), TCP transport (IP protocol 6),
or ESP pad packets (protocol 59) intermixed with non-empty IP-TFS (IP
protocol TBD1) payloads. While it's possible to envision making the
algorithm work in the presence of sequence number skips in the IP-TFS
payload stream, the added complexity is not deemed worthwhile. Other
IPsec uses can configure and use their own SAs.</t>

</section>

<section title="Zero-Conf Receive-Side Operation On The SA">
<t>Receive-side operation of IP-TFS does not require any per-SA
configuration on the receiver; as such, an IP-TFS implementation
SHOULD support the option of switching to IP-TFS receive-side
operation on receipt of the first IP-TFS payload.</t>

</section>

<section title="Modes of Operation">
<t>Just as with normal IPsec/ESP tunnels, IP-TFS tunnels are
unidirectional. Bidirectional IP-TFS functionality is achieved by
setting up 2 IP-TFS tunnels, one in either direction.</t>

<t>An IP-TFS tunnel can operate in 2 modes, a non-congestion controlled
mode and congestion controlled mode.</t>

<section title="Non-Congestion Controlled Mode">
<t>In the non-congestion controlled mode IP-TFS sends fixed-sized
packets at a constant rate. The packet send rate is constant and is
not automatically adjusted regardless of any network congestion
(e.g., packet loss).</t>

<t>For similar reasons as given in <xref target="RFC7510"/> the non-congestion
controlled mode should only be used where the user has full
administrative control over the path the tunnel will take. This is
required so the user can guarantee the bandwidth and also be sure
not to negatively affect network congestion <xref target="RFC2914"/>. In this
case packet loss should be reported to the administrator (e.g.,
via syslog, YANG notification, SNMP traps, etc.) so that any
failures due to a lack of bandwidth can be corrected.</t>

</section>

<section title="Congestion Controlled Mode" anchor="sec-congestion-controlled-mode">
<t>With the congestion controlled mode, IP-TFS adapts to network
congestion by lowering the packet send rate to accommodate the
congestion, as well as raising the rate when congestion subsides.
Since overhead is per packet, by allowing for maximal fixed-size
packets and varying the send rate, transport overhead is minimized.</t>

<t>The output of the congestion control algorithm will adjust the rate
at which the ingress sends packets. While this document does not
require a specific congestion control algorithm, best current
practice RECOMMENDS that the algorithm conform to <xref target="RFC5348"/>. Congestion
control principles are documented in <xref target="RFC2914"/> as well. An example of
an implementation of the <xref target="RFC5348"/> algorithm which matches the
requirements of IP-TFS (i.e., designed for fixed-size packet and send
rate varied based on congestion) is documented in <xref target="RFC4342"/>.</t>

<t>The required inputs for the TCP friendly rate control algorithm
described in <xref target="RFC5348"/> are the receiver's loss event rate and the
sender's estimated round-trip time (RTT). These values are provided by
IP-TFS using the congestion information header fields described in
<xref target="sec-congestion-information"></xref>. In particular these values are sufficient to
implement the algorithm described in <xref target="RFC5348"/>.</t>

<t>At a minimum, the congestion information must be sent, from the
receiver and from the sender, at least once per RTT. Prior to
establishing an RTT the information SHOULD be sent constantly from
the sender and the receiver so that an RTT estimate can be
established. The lack of receiving this information over multiple
consecutive RTT intervals should be considered a congestion event
that causes the sender to lower its sending rate. For
example, <xref target="RFC4342"/> calls this the "no feedback timeout" and it is equal
to 4 RTT intervals. When a "no feedback timeout" has occurred <xref target="RFC4342"/>
halves the sending rate.</t>
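
<t>The timeout behavior in the example above can be sketched as follows.
This is a minimal illustration, not the <xref target="RFC4342"/> algorithm
itself: the function and parameter names are hypothetical, times are in
seconds, and no rate floor or timer restart logic is modeled.</t>

<figure><artwork><![CDATA[
```python
NO_FEEDBACK_RTTS = 4  # timeout length in RTT intervals, per the text


def rate_after_feedback_check(send_rate: float, rtt: float,
                              last_feedback: float, now: float) -> float:
    """Halve the send rate if no congestion info arrived in 4 RTTs.

    With no RTT estimate yet (rtt == 0) the rate is left unchanged.
    """
    if rtt > 0 and (now - last_feedback) >= NO_FEEDBACK_RTTS * rtt:
        return send_rate / 2
    return send_rate
```
]]></artwork></figure>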

<t>An implementation could choose to always include the congestion
information in its IP-TFS payload header if sending on an IP-TFS
enabled SA. Since IP-TFS normally will operate with a large packet
size, the congestion information should represent a small portion of
the available tunnel bandwidth.</t>

<t>When an implementation is choosing a congestion control algorithm (or
a selection of algorithms) one should remember that IP-TFS is not
providing for reliable delivery of IP traffic, and so per packet ACKs
are not required and are not provided.</t>

<t>It's worth noting that the variable send-rate of a congestion
controlled IP-TFS tunnel is not private; however, this send-rate is
being driven by network congestion, and as long as the encapsulated
(inner) traffic flow shape and timing are not directly affecting the
(outer) network congestion, the variations in the tunnel rate will
not weaken the provided inner traffic flow confidentiality.</t>

<section title="Circuit Breakers">
<t>In addition to congestion control, implementations MAY choose to
define and implement circuit breakers <xref target="RFC8084"/> as a recovery method
of last resort. Enabling circuit breakers is also a reason a user may
wish to enable congestion information reports even when using the
non-congestion controlled mode of operation. The definition of
circuit breakers is outside the scope of this document.</t>

</section>

</section>

</section>

</section>

<section title="Congestion Information" anchor="sec-congestion-information">
<t>In order to support the congestion control mode, the sender needs to
know the loss event rate and also be able to approximate the RTT
(<xref target="RFC5348"/>). In order to obtain these values the receiver sends
congestion control information on its SA back to the sender. Thus,
in order to support congestion control the receiver must have a
paired SA back to the sender (this is always the case when the tunnel
was created using IKEv2). If the SA back to the sender is a
non-IP-TFS enabled SA then an IPTFS_PROTOCOL empty payload (i.e.,
header only) is used to convey the information.</t>

<t>In order to calculate a loss event rate compatible with <xref target="RFC5348"/>, the
receiver needs to have a round-trip time estimate. Thus the sender
communicates this estimate in the <spanx style='verb'>RTT</spanx> header field. On startup this
value will be zero as no RTT estimate is yet known.</t>

<t>In order to allow the sender to calculate the <spanx style='verb'>RTT</spanx> value, the
receiver communicates the last sequence number it has seen to the
sender in the <spanx style='verb'>LastSeqNum</spanx> header field. In addition to the
<spanx style='verb'>LastSeqNum</spanx> value, the receiver sends an estimate of the amount of
time between receiving the <spanx style='verb'>LastSeqNum</spanx> packet and transmitting
the <spanx style='verb'>LastSeqNum</spanx> value back to the sender in the congestion
information. It places this time estimate in the <spanx style='verb'>Delay</spanx> header
field along with the <spanx style='verb'>LastSeqNum</spanx>.</t>
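
<t>One way a sender might turn these two fields into an RTT sample is
sketched below. This is an assumption-laden illustration: it supposes
the sender keeps a table of transmit times keyed by the lower 32 bits
of the sequence number, and all names are hypothetical. Smoothing of
samples into an estimate is left to the chosen congestion control
algorithm.</t>

<figure><artwork><![CDATA[
```python
from typing import Dict, Optional


def rtt_sample_ms(send_times_ms: Dict[int, int], last_seq_num: int,
                  delay_ms: int, now_ms: int) -> Optional[int]:
    """Return one RTT sample in milliseconds, or None if unusable.

    send_times_ms maps a sequence number to the local time at which
    that packet was sent (hypothetical sender-side bookkeeping).
    """
    sent_ms = send_times_ms.get(last_seq_num)
    if sent_ms is None:
        return None  # packet too old or never sent; no sample
    # Subtract the time the receiver held the information (Delay).
    sample = now_ms - sent_ms - delay_ms
    return sample if sample >= 0 else None
```
]]></artwork></figure>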

<t>The receiver also calculates, and communicates in the <spanx style='verb'>LossEventRate</spanx>
header field, the loss event rate for use by the sender. This is
slightly different from <xref target="RFC4342"/> which periodically sends all the loss
interval data back to the sender so that it can do the calculation.
See <xref target="sec-a-send-and-loss-event-rate-calculation"></xref> for a suggested way to
calculate the loss event rate value. Initially this value will be
zero (indicating no loss) until enough data has been collected by the
receiver to update it.</t>

<section title="ECN Support">
<t>In addition to normal packet loss information, IP-TFS supports use
of the ECN bits in the encapsulating IP header <xref target="RFC3168"/> for
identifying congestion. If ECN use is enabled and a packet arrives at
the egress endpoint with the Congestion Experienced (CE) value set,
then the receiver considers that packet as being dropped, although it
does not drop it. The receiver MUST set the E bit in any
IPTFS_PROTOCOL payload header containing a <spanx style='verb'>LossEventRate</spanx> value
derived from a CE value being considered.</t>

<t>As noted in <xref target="RFC3168"/> the ECN bits are not protected by IPsec and
thus may constitute a covert channel. For this reason ECN use SHOULD
NOT be enabled by default.</t>

</section>

</section>

<section title="Configuration">
<t>IP-TFS is meant to be deployable with a minimal amount of
configuration. All IP-TFS specific configuration should be able to be
specified at the unidirectional tunnel ingress (sending) side. It
is intended that non-IKEv2 operation is supported, at least, with
local static configuration.</t>

<section title="Bandwidth">
<t>Bandwidth is a local configuration option. For non-congestion
controlled mode the bandwidth SHOULD be configured. For
congestion controlled mode one can configure the bandwidth
or have no configuration and let congestion control discover the
maximum bandwidth available. No standardized configuration method is
required.</t>

</section>

<section title="Fixed Packet Size">
<t>The fixed packet size to be used for the tunnel encapsulation packets
can be configured manually or can be automatically determined using
Path MTU discovery (see <xref target="RFC1191"/> and <xref target="RFC8201"/>). No standardized
configuration method is required.</t>

</section>

<section title="Congestion Control">
<t>Congestion control is a local configuration option. No standardized
configuration method is required.</t>

</section>

</section>

<section title="IKEv2">
<section title="USE_IPTFS Notification Message" anchor="sec-use-tfs-notification-message">
<t>When using IKEv2, a new "USE_IPTFS" Notification Message is used to
enable operation of IP-TFS on a child SA pair. The method used is
similar to how USE_TRANSPORT_MODE is negotiated, as described in
<xref target="RFC7296"/>.</t>

<t>To request IP-TFS operation on the Child SA pair, the initiator
includes the USE_IPTFS notification in an SA payload requesting a new
Child SA (either during the initial IKE_AUTH or during non-rekeying
CREATE_CHILD_SA exchanges). If the request is accepted then the response
MUST also include a notification of type USE_IPTFS. If the responder
declines the request the child SA will be established without IP-TFS
enabled. If this is unacceptable to the initiator, the initiator MUST
delete the child SA.</t>

<t>The USE_IPTFS notification MUST NOT be sent, and MUST be ignored,
during a CREATE_CHILD_SA rekeying exchange as it is not allowed to
change IP-TFS operation during rekeying.</t>

<t>The USE_IPTFS notification contains a 1 octet payload of flags that
specify any requirements from the sender of the message. If any
requirement flags are not understood or cannot be supported by the
receiver then the receiver should not enable IP-TFS mode (either by
not responding with the USE_IPTFS notification, or in the case of the
initiator, by deleting the child SA if the now established non-IP-TFS
operation is unacceptable).</t>

<t>The notification type and payload flag values are defined in <xref target="sec-ikev2-use-iptfs-notification-message"></xref>.</t>

</section>

</section>

<section title="Packet and Data Formats">
<section title="IP-TFS Payload" anchor="sec-ip-tfs-payload">
<t>An IP-TFS payload is identified by the IP protocol number
IPTFS_PROTOCOL (TBD1). The first octet of this payload indicates the
format of the remaining payload data.</t>

<figure><artwork><![CDATA[
  0 1 2 3 4 5 6 7
 +-+-+-+-+-+-+-+-+-+-+-
 |   Sub-type    | ...
 +-+-+-+-+-+-+-+-+-+-+-
]]></artwork></figure>

<t><list style="hanging">
<t hangText="Sub-type:"><vspace/>An 8 bit value indicating the payload format.</t>
</list></t>

<t>This specification defines 2 payload sub-types. These payload formats
are defined in the following sections.</t>

<section title="Non-Congestion Control IPTFS_PROTOCOL Payload Format">
<t>The non-congestion control IPTFS_PROTOCOL payload is comprised of a 4
octet header followed by a variable amount of <spanx style='verb'>DataBlocks</spanx> data as
shown below.</t>

<figure><artwork><![CDATA[
                      1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  Sub-Type (0) |   Reserved    |          BlockOffset          |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |       DataBlocks ...
 +-+-+-+-+-+-+-+-+-+-+-
]]></artwork></figure>

<t><list style="hanging">
<t hangText="Sub-type:"><vspace/>An octet indicating the payload format. For this
non-congestion control format, the value is 0.</t>
<t hangText="Reserved:"><vspace/>An octet set to 0 on generation, and ignored on
receipt.</t>
<t hangText="BlockOffset:"><vspace/>A 16 bit unsigned integer counting the number of
octets of <spanx style='verb'>DataBlocks</spanx> data before the start of a
new data block. <spanx style='verb'>BlockOffset</spanx> can count past the end
of the <spanx style='verb'>DataBlocks</spanx> data in which case all the
<spanx style='verb'>DataBlocks</spanx> data belongs to the previous data block
being re-assembled. If the <spanx style='verb'>BlockOffset</spanx> extends
into subsequent packets it continues to only count
subsequent <spanx style='verb'>DataBlocks</spanx> data (i.e., it does not
count subsequent packets' non-<spanx style='verb'>DataBlocks</spanx> octets).</t>
<t hangText="DataBlocks:"><vspace/>Variable number of octets that begins with the start
of a data block, or the continuation of a previous
data block, followed by zero or more additional data
blocks.</t>
</list></t>
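
<t>The 4 octet header above maps directly onto a fixed-layout structure.
The following Python sketch shows one way to build and parse it using
the standard <spanx style='verb'>struct</spanx> module; the function names are hypothetical
and the sketch performs only minimal validation.</t>

<figure><artwork><![CDATA[
```python
import struct
from typing import Tuple

# Sub-type (8 bits), Reserved (8 bits), BlockOffset (16 bits),
# all in network byte order.
NON_CC_HEADER = struct.Struct("!BBH")


def pack_non_cc_payload(block_offset: int, datablocks: bytes) -> bytes:
    """Build a sub-type 0 payload; Reserved is set to 0 on generation."""
    return NON_CC_HEADER.pack(0, 0, block_offset) + datablocks


def unpack_non_cc_payload(payload: bytes) -> Tuple[int, bytes]:
    """Return (BlockOffset, DataBlocks); Reserved is ignored."""
    if len(payload) < NON_CC_HEADER.size:
        raise ValueError("payload shorter than 4 octet header")
    sub_type, _reserved, block_offset = NON_CC_HEADER.unpack_from(payload)
    if sub_type != 0:
        raise ValueError("not a non-congestion control payload")
    return block_offset, payload[NON_CC_HEADER.size:]
```
]]></artwork></figure>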

</section>

<section title="Congestion Control IPTFS_PROTOCOL Payload Format">
<t>The congestion control IPTFS_PROTOCOL payload is comprised of a 16
octet header followed by a variable amount of <spanx style='verb'>DataBlocks</spanx> data as
shown below.</t>

<figure><artwork><![CDATA[
                      1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  Sub-type (1) |  Reserved   |E|          BlockOffset          |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |              RTT              |             Delay             |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                          LossEventRate                        |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                           LastSeqNum                          |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |       DataBlocks ...
 +-+-+-+-+-+-+-+-+-+-+-
]]></artwork></figure>

<t><list style="hanging">
<t hangText="Sub-type:"><vspace/>An octet indicating the payload format. For this
congestion control format, the value is 1.</t>
<t hangText="Reserved:"><vspace/>A 7 bit field set to 0 on generation, and ignored on
receipt.</t>
<t hangText="E:"><vspace/>A 1 bit value that, if set, indicates that Congestion
Experienced (CE) ECN bits were received and used in deriving the
reported <spanx style='verb'>LossEventRate</spanx>.</t>
<t hangText="BlockOffset:"><vspace/>The same value as the non-congestion controlled
payload format value.</t>
<t hangText="RTT:"><vspace/>A 16 bit value specifying the sender's current round-trip
time estimate in milliseconds. The value MAY be zero prior
to the sender having calculated a round-trip time estimate.
The value SHOULD be set to zero on non-IP-TFS enabled SAs.</t>
<t hangText="Delay:"><vspace/>A 16 bit value specifying the delay in milliseconds
incurred between the receiver receiving the <spanx style='verb'>LastSeqNum</spanx>
packet and the sending of this acknowledgement of it.</t>
<t hangText="LossEventRate:"><vspace/>A 32 bit value specifying the inverse of the
current loss event rate as calculated by the
receiver. A value of zero indicates no loss.
Otherwise the loss event rate is
<spanx style='verb'>1/LossEventRate</spanx>.</t>
<t hangText="LastSeqNum:"><vspace/>A 32 bit value containing the lower 32 bits of the
largest sequence number last received. This is the
latest in the sequence, not necessarily the most
recent (in the case of re-ordering of packets it may
be less recent). When determining the largest sequence
number while 64 bit extended sequence numbers are in
use, the upper 32 bits should be used during the comparison.</t>
<t hangText="DataBlocks:"><vspace/>Variable number of octets that begins with the start
of a data block, or the continuation of a previous
data block, followed by zero or more additional data
blocks. For the special case of sending congestion
control information on a non-IP-TFS enabled SA this
value MUST be empty (i.e., be zero octets long).</t>
</list></t>
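
<t>For illustration, the 16 octet header can be encoded and decoded as
shown in the following Python sketch. The type and function names are
hypothetical, and the placement of the E bit as the least significant
bit of the second octet is read from the diagram above.</t>

<figure><artwork><![CDATA[
```python
import struct
from typing import NamedTuple, Tuple

# Sub-type, Reserved+E, BlockOffset, RTT, Delay, LossEventRate,
# LastSeqNum; network byte order, 16 octets in total.
CC_HEADER = struct.Struct("!BBHHHII")


class CCHeader(NamedTuple):
    ecn_used: bool
    block_offset: int
    rtt_ms: int
    delay_ms: int
    loss_event_rate: int  # inverse of the loss event rate; 0 = no loss
    last_seq_num: int


def pack_cc_header(h: CCHeader) -> bytes:
    flags = 0x01 if h.ecn_used else 0x00  # 7 reserved bits, then E
    return CC_HEADER.pack(1, flags, h.block_offset, h.rtt_ms,
                          h.delay_ms, h.loss_event_rate, h.last_seq_num)


def unpack_cc_payload(payload: bytes) -> Tuple[CCHeader, bytes]:
    """Return (header, DataBlocks); reserved bits are ignored."""
    if len(payload) < CC_HEADER.size:
        raise ValueError("payload shorter than 16 octet header")
    fields = CC_HEADER.unpack_from(payload)
    sub_type, flags, offset, rtt, delay, ler, lsn = fields
    if sub_type != 1:
        raise ValueError("not a congestion control payload")
    header = CCHeader(bool(flags & 0x01), offset, rtt, delay, ler, lsn)
    return header, payload[CC_HEADER.size:]


def loss_event_fraction(h: CCHeader) -> float:
    """Convert the inverse encoding back to a loss event rate."""
    return 0.0 if h.loss_event_rate == 0 else 1.0 / h.loss_event_rate
```
]]></artwork></figure>

<t>An empty payload, as used on a non-IP-TFS enabled SA, is simply the
packed header with no <spanx style='verb'>DataBlocks</spanx> octets following it.</t>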

</section>

<section title="Data Blocks">
<figure><artwork><![CDATA[
                      1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | Type  | IPv4, IPv6 or pad...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
]]></artwork></figure>

<t><list style="hanging">
<t hangText="Type:"><vspace/>A 4 bit field where 0x0 identifies a pad data block, 0x4
indicates an IPv4 data block, and 0x6 indicates an IPv6
data block.</t>
</list></t>

<section title="IPv4 Data Block">
<figure><artwork><![CDATA[
                      1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  0x4  |  IHL  | TypeOfService |          TotalLength          |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | Rest of the inner packet ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
]]></artwork></figure>

<t>These values are the actual values within the encapsulated IPv4
header. In other words, the start of this data block is the start of
the encapsulated IP packet.</t>

<t><list style="hanging">
<t hangText="Type:"><vspace/>A 4 bit value of 0x4 indicating IPv4 (i.e., first nibble of
the IPv4 packet).</t>
<t hangText="TotalLength:"><vspace/>The 16 bit unsigned integer "Total Length" field of
the IPv4 inner packet.</t>
</list></t>

</section>

<section title="IPv6 Data Block">
<figure><artwork><![CDATA[
                      1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  0x6  | TrafficClass  |               FlowLabel               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |         PayloadLength         | Rest of the inner packet ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
]]></artwork></figure>

<t>These values are the actual values within the encapsulated IPv6
header. In other words, the start of this data block is the start of
the encapsulated IP packet.</t>

<t><list style="hanging">
<t hangText="Type:"><vspace/>A 4 bit value of 0x6 indicating IPv6 (i.e., first nibble of
the IPv6 packet).</t>
<t hangText="PayloadLength:"><vspace/>The 16 bit unsigned integer "Payload Length" field
of the inner IPv6 packet.</t>
</list></t>

</section>

<section title="Pad Data Block">
<figure><artwork><![CDATA[
                      1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  0x0  | Padding ...
 +-+-+-+-+-+-+-+-+-+-+-
]]></artwork></figure>

<t><list style="hanging">
<t hangText="Type:"><vspace/>A 4 bit value of 0x0 indicating a padding data block.</t>
<t hangText="Padding:"><vspace/>extends to end of the encapsulating packet.</t>
</list></t>
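
<t>As a non-normative illustration, the following Python sketch shows
how a receiver might classify a data block from its first nibble and
derive the length of the inner packet from the length fields described
above. The function name and the returned tuple are assumptions made
for this example only. The sketch also assumes that the buffer ends
where the encapsulating packet ends, and it relies on the fixed 40
octet IPv6 header, which the IPv6 "Payload Length" field does not
count.</t>

<figure><artwork><![CDATA[
```python
import struct

IPV6_HEADER_LEN = 40  # fixed IPv6 header; Payload Length excludes it


def data_block_length(buf):
    """Return (kind, length) for the data block starting at buf[0].

    Hypothetical helper: kind is "pad", "ipv4" or "ipv6"; length is
    the number of octets the complete data block occupies.
    """
    if not buf:
        raise ValueError("empty data block")
    block_type = buf[0] >> 4  # the Type field is the first nibble
    if block_type == 0x0:
        # Padding extends to the end of the encapsulating packet.
        return ("pad", len(buf))
    if block_type == 0x4:
        if len(buf) < 4:
            raise ValueError("truncated IPv4 data block header")
        # TotalLength occupies octets 2-3 and covers the whole packet.
        (total_length,) = struct.unpack_from("!H", buf, 2)
        return ("ipv4", total_length)
    if block_type == 0x6:
        if len(buf) < 6:
            raise ValueError("truncated IPv6 data block header")
        # PayloadLength occupies octets 4-5 and excludes the header.
        (payload_length,) = struct.unpack_from("!H", buf, 4)
        return ("ipv6", IPV6_HEADER_LEN + payload_length)
    raise ValueError("unknown data block type")
```
]]></artwork></figure>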

</section>

</section>

<section title="IKEv2 USE_IPTFS Notification Message" anchor="sec-ikev2-use-iptfs-notification-message">
<t>As discussed in <xref target="sec-use-tfs-notification-message"></xref>, a notification message
USE_IPTFS is used to negotiate IP-TFS operation in IKEv2.</t>

<t>The USE_IPTFS Notification Message Status Type is (TBD2).</t>

<t>The notification payload contains 1 octet of requirement flags. There
are currently 2 requirement flags defined. This may be revised by
later specifications.</t>

<figure><artwork><![CDATA[
 +-+-+-+-+-+-+-+-+
 |0|0|0|0|0|0|C|D|
 +-+-+-+-+-+-+-+-+
]]></artwork></figure>

<t><list style="hanging">
<t hangText="0:"><vspace/>6 bits - reserved, MUST be zero on send, unless defined by
later specifications.</t>
<t hangText="C:"><vspace/>Congestion Control bit. If set, then the sender is requiring
that congestion control information MUST be returned to it
periodically as defined in <xref target="sec-congestion-information"></xref>.</t>
<t hangText="D:"><vspace/>Don't Fragment bit. If set, it indicates that the sender of the notify
message does not support receiving packet fragments (i.e., inner
packets MUST be sent using a single <spanx style='verb'>Data Block</spanx>). This value only
applies to what the sender is capable of receiving; the sender MAY
still send packet fragments unless similarly restricted by the
receiver in its USE_IPTFS notification.</t>
</list></t>
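
<t>The flag octet can be sketched in Python as follows. This is an
illustration only: the function names are hypothetical, the bit values
assume the diagram above is drawn most significant bit first, and the
choice to ignore reserved bits on receipt is an assumption of this
sketch rather than a requirement stated here.</t>

<figure><artwork><![CDATA[
```python
FLAG_C = 0x02  # Congestion Control bit (assumes MSB-first diagram)
FLAG_D = 0x01  # Don't Fragment bit (least significant bit)


def encode_use_iptfs_flags(congestion_control, dont_fragment):
    """Build the 1 octet flags payload; reserved bits stay zero."""
    flags = 0
    if congestion_control:
        flags |= FLAG_C
    if dont_fragment:
        flags |= FLAG_D
    return bytes([flags])


def decode_use_iptfs_flags(payload):
    """Return the C and D flags from a 1 octet flags payload.

    Reserved bits are ignored here (an assumption of this sketch).
    """
    if len(payload) != 1:
        raise ValueError("expected exactly 1 octet of flags")
    flags = payload[0]
    return {"C": bool(flags & FLAG_C), "D": bool(flags & FLAG_D)}
```
]]></artwork></figure>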

</section>

</section>

</section>

<section title="IANA Considerations">
<section title="IPTFS_PROTOCOL Type">
<t>This document requests that a protocol number, IPTFS_PROTOCOL, be allocated
by IANA from the "Assigned Internet Protocol Numbers" registry for
identifying the IP-TFS payload.</t>

<t><list style="hanging">
<t hangText="Type:"><vspace/>TBD1</t>
<t hangText="Description:"><vspace/>An IP-TFS payload.</t>
<t hangText="Reference:"><vspace/>This document</t>
</list></t>

</section>

<section title="IPTFS_PROTOCOL Sub-Type Registry">
<t>This document requests that IANA create a registry called "IPTFS_PROTOCOL
Sub-Type Registry" under the "IPTFS_PROTOCOL Parameters" IANA registries.
The registration policy for this registry is "Standards Action"
(<xref target="RFC8126"/> and <xref target="RFC7120"/>).</t>

<t><list style="hanging">
<t hangText="Name:"><vspace/>IPTFS_PROTOCOL Sub-Type Registry</t>
<t hangText="Description:"><vspace/>IPTFS_PROTOCOL Payload Formats.</t>
<t hangText="Reference:"><vspace/>This document</t>
</list></t>

<t>The initial content for this registry is as follows:</t>

<figure><artwork><![CDATA[
 Sub-Type  Name                           Reference     
--------------------------------------------------------
        0  Non-Congestion Control Format  This document 
        1  Congestion Control Format      This document 
    2-255  Reserved                                     
]]></artwork></figure>

</section>

<section title="USE_IPTFS Notify Message Status Type">
<t>This document requests a status type USE_IPTFS be allocated
from the "IKEv2 Notify Message Types - Status Types" registry.</t>

<t><list style="hanging">
<t hangText="Value:"><vspace/>TBD2</t>
<t hangText="Name:"><vspace/>USE_IPTFS</t>
<t hangText="Reference:"><vspace/>This document</t>
</list></t>

</section>

</section>

<section title="Security Considerations">
<t>This document describes a mechanism to add Traffic Flow
Confidentiality to IP traffic. Use of this mechanism is expected to
increase the security of the traffic being transported. Other than
the additional security afforded by using this mechanism, IP-TFS
utilizes the security protocols <xref target="RFC4303"/> and <xref target="RFC7296"/> and so their
security considerations apply to IP-TFS as well.</t>

<t>As noted previously in <xref target="sec-congestion-controlled-mode"></xref>, for TFC to be
fully maintained, the encapsulated traffic flow should not affect
network congestion in a predictable way. If it would, then
non-congestion-controlled mode should be considered instead.</t>

</section>

</middle>
<back>
<references title="Normative References">


<reference  anchor='RFC2119' target='https://www.rfc-editor.org/info/rfc2119'>
<front>
<title>Key words for use in RFCs to Indicate Requirement Levels</title>
<author initials='S.' surname='Bradner' fullname='S. Bradner'><organization /></author>
<date year='1997' month='March' />
<abstract><t>In many standards track documents several words are used to signify the requirements in the specification.  These words are often capitalized. This document defines these words as they should be interpreted in IETF documents.  This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t></abstract>
</front>
<seriesInfo name='BCP' value='14'/>
<seriesInfo name='RFC' value='2119'/>
<seriesInfo name='DOI' value='10.17487/RFC2119'/>
</reference>


<reference  anchor='RFC4303' target='https://www.rfc-editor.org/info/rfc4303'>
<front>
<title>IP Encapsulating Security Payload (ESP)</title>
<author initials='S.' surname='Kent' fullname='S. Kent'><organization /></author>
<date year='2005' month='December' />
<abstract><t>This document describes an updated version of the Encapsulating Security Payload (ESP) protocol, which is designed to provide a mix of security services in IPv4 and IPv6.  ESP is used to provide confidentiality, data origin authentication, connectionless integrity, an anti-replay service (a form of partial sequence integrity), and limited traffic flow confidentiality.  This document obsoletes RFC 2406 (November 1998).  [STANDARDS-TRACK]</t></abstract>
</front>
<seriesInfo name='RFC' value='4303'/>
<seriesInfo name='DOI' value='10.17487/RFC4303'/>
</reference>


<reference  anchor='RFC7296' target='https://www.rfc-editor.org/info/rfc7296'>
<front>
<title>Internet Key Exchange Protocol Version 2 (IKEv2)</title>
<author initials='C.' surname='Kaufman' fullname='C. Kaufman'><organization /></author>
<author initials='P.' surname='Hoffman' fullname='P. Hoffman'><organization /></author>
<author initials='Y.' surname='Nir' fullname='Y. Nir'><organization /></author>
<author initials='P.' surname='Eronen' fullname='P. Eronen'><organization /></author>
<author initials='T.' surname='Kivinen' fullname='T. Kivinen'><organization /></author>
<date year='2014' month='October' />
<abstract><t>This document describes version 2 of the Internet Key Exchange (IKE) protocol.  IKE is a component of IPsec used for performing mutual authentication and establishing and maintaining Security Associations (SAs).  This document obsoletes RFC 5996, and includes all of the errata for it.  It advances IKEv2 to be an Internet Standard.</t></abstract>
</front>
<seriesInfo name='STD' value='79'/>
<seriesInfo name='RFC' value='7296'/>
<seriesInfo name='DOI' value='10.17487/RFC7296'/>
</reference>


<reference  anchor='RFC8174' target='https://www.rfc-editor.org/info/rfc8174'>
<front>
<title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
<author initials='B.' surname='Leiba' fullname='B. Leiba'><organization /></author>
<date year='2017' month='May' />
<abstract><t>RFC 2119 specifies common key words that may be used in protocol  specifications.  This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the  defined special meanings.</t></abstract>
</front>
<seriesInfo name='BCP' value='14'/>
<seriesInfo name='RFC' value='8174'/>
<seriesInfo name='DOI' value='10.17487/RFC8174'/>
</reference>
</references>
<references title="Informative References">
<reference anchor="AppCrypt">
<front>
<title>Applied Cryptography: Protocols, Algorithms, and Source Code in C</title>
<author initials='B.' surname='Schneier' fullname='Bruce Schneier'><organization/></author>
<date day="1" month="11" year="2017"/>
</front>
</reference>


<reference  anchor='RFC0791' target='https://www.rfc-editor.org/info/rfc791'>
<front>
<title>Internet Protocol</title>
<author initials='J.' surname='Postel' fullname='J. Postel'><organization /></author>
<date year='1981' month='September' />
</front>
<seriesInfo name='STD' value='5'/>
<seriesInfo name='RFC' value='791'/>
<seriesInfo name='DOI' value='10.17487/RFC0791'/>
</reference>


<reference  anchor='RFC1191' target='https://www.rfc-editor.org/info/rfc1191'>
<front>
<title>Path MTU discovery</title>
<author initials='J.C.' surname='Mogul' fullname='J.C. Mogul'><organization /></author>
<author initials='S.E.' surname='Deering' fullname='S.E. Deering'><organization /></author>
<date year='1990' month='November' />
<abstract><t>This memo describes a technique for dynamically discovering the maximum transmission unit (MTU) of an arbitrary internet path.  It specifies a small change to the way routers generate one type of ICMP message.  For a path that passes through a router that has not been so changed, this technique might not discover the correct Path MTU, but it will always choose a Path MTU as accurate as, and in many cases more accurate than, the Path MTU that would be chosen by current practice.  [STANDARDS-TRACK]</t></abstract>
</front>
<seriesInfo name='RFC' value='1191'/>
<seriesInfo name='DOI' value='10.17487/RFC1191'/>
</reference>


<reference  anchor='RFC2474' target='https://www.rfc-editor.org/info/rfc2474'>
<front>
<title>Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers</title>
<author initials='K.' surname='Nichols' fullname='K. Nichols'><organization /></author>
<author initials='S.' surname='Blake' fullname='S. Blake'><organization /></author>
<author initials='F.' surname='Baker' fullname='F. Baker'><organization /></author>
<author initials='D.' surname='Black' fullname='D. Black'><organization /></author>
<date year='1998' month='December' />
<abstract><t>This document defines the IP header field, called the DS (for differentiated services) field.  [STANDARDS-TRACK]</t></abstract>
</front>
<seriesInfo name='RFC' value='2474'/>
<seriesInfo name='DOI' value='10.17487/RFC2474'/>
</reference>


<reference  anchor='RFC2914' target='https://www.rfc-editor.org/info/rfc2914'>
<front>
<title>Congestion Control Principles</title>
<author initials='S.' surname='Floyd' fullname='S. Floyd'><organization /></author>
<date year='2000' month='September' />
<abstract><t>The goal of this document is to explain the need for congestion control in the Internet, and to discuss what constitutes correct congestion control.  This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t></abstract>
</front>
<seriesInfo name='BCP' value='41'/>
<seriesInfo name='RFC' value='2914'/>
<seriesInfo name='DOI' value='10.17487/RFC2914'/>
</reference>


<reference  anchor='RFC3168' target='https://www.rfc-editor.org/info/rfc3168'>
<front>
<title>The Addition of Explicit Congestion Notification (ECN) to IP</title>
<author initials='K.' surname='Ramakrishnan' fullname='K. Ramakrishnan'><organization /></author>
<author initials='S.' surname='Floyd' fullname='S. Floyd'><organization /></author>
<author initials='D.' surname='Black' fullname='D. Black'><organization /></author>
<date year='2001' month='September' />
<abstract><t>This memo specifies the incorporation of ECN (Explicit Congestion Notification) to TCP and IP, including ECN's use of two bits in the IP header.  [STANDARDS-TRACK]</t></abstract>
</front>
<seriesInfo name='RFC' value='3168'/>
<seriesInfo name='DOI' value='10.17487/RFC3168'/>
</reference>


<reference  anchor='RFC4301' target='https://www.rfc-editor.org/info/rfc4301'>
<front>
<title>Security Architecture for the Internet Protocol</title>
<author initials='S.' surname='Kent' fullname='S. Kent'><organization /></author>
<author initials='K.' surname='Seo' fullname='K. Seo'><organization /></author>
<date year='2005' month='December' />
<abstract><t>This document describes an updated version of the &quot;Security Architecture for IP&quot;, which is designed to provide security services for traffic at the IP layer.  This document obsoletes RFC 2401 (November 1998).  [STANDARDS-TRACK]</t></abstract>
</front>
<seriesInfo name='RFC' value='4301'/>
<seriesInfo name='DOI' value='10.17487/RFC4301'/>
</reference>


<reference  anchor='RFC4342' target='https://www.rfc-editor.org/info/rfc4342'>
<front>
<title>Profile for Datagram Congestion Control Protocol (DCCP) Congestion Control ID 3: TCP-Friendly Rate Control (TFRC)</title>
<author initials='S.' surname='Floyd' fullname='S. Floyd'><organization /></author>
<author initials='E.' surname='Kohler' fullname='E. Kohler'><organization /></author>
<author initials='J.' surname='Padhye' fullname='J. Padhye'><organization /></author>
<date year='2006' month='March' />
<abstract><t>This document contains the profile for Congestion Control Identifier 3, TCP-Friendly Rate Control (TFRC), in the Datagram Congestion Control Protocol (DCCP).  CCID 3 should be used by senders that want a TCP-friendly sending rate, possibly with Explicit Congestion Notification (ECN), while minimizing abrupt rate changes.  [STANDARDS-TRACK]</t></abstract>
</front>
<seriesInfo name='RFC' value='4342'/>
<seriesInfo name='DOI' value='10.17487/RFC4342'/>
</reference>


<reference  anchor='RFC5348' target='https://www.rfc-editor.org/info/rfc5348'>
<front>
<title>TCP Friendly Rate Control (TFRC): Protocol Specification</title>
<author initials='S.' surname='Floyd' fullname='S. Floyd'><organization /></author>
<author initials='M.' surname='Handley' fullname='M. Handley'><organization /></author>
<author initials='J.' surname='Padhye' fullname='J. Padhye'><organization /></author>
<author initials='J.' surname='Widmer' fullname='J. Widmer'><organization /></author>
<date year='2008' month='September' />
<abstract><t>This document specifies TCP Friendly Rate Control (TFRC).  TFRC is a congestion control mechanism for unicast flows operating in a best-effort Internet environment.  It is reasonably fair when competing for bandwidth with TCP flows, but has a much lower variation of throughput over time compared with TCP, making it more suitable for applications such as streaming media where a relatively smooth sending rate is of importance.</t><t>This document obsoletes RFC 3448 and updates RFC 4342.  [STANDARDS-TRACK]</t></abstract>
</front>
<seriesInfo name='RFC' value='5348'/>
<seriesInfo name='DOI' value='10.17487/RFC5348'/>
</reference>


<reference  anchor='RFC7120' target='https://www.rfc-editor.org/info/rfc7120'>
<front>
<title>Early IANA Allocation of Standards Track Code Points</title>
<author initials='M.' surname='Cotton' fullname='M. Cotton'><organization /></author>
<date year='2014' month='January' />
<abstract><t>This memo describes the process for early allocation of code points by IANA from registries for which &quot;Specification Required&quot;, &quot;RFC Required&quot;, &quot;IETF Review&quot;, or &quot;Standards Action&quot; policies apply.  This process can be used to alleviate the problem where code point allocation is needed to facilitate desired or required implementation and deployment experience prior to publication of an RFC, which would normally trigger code point allocation.  The procedures in this document are intended to apply only to IETF Stream documents.</t></abstract>
</front>
<seriesInfo name='BCP' value='100'/>
<seriesInfo name='RFC' value='7120'/>
<seriesInfo name='DOI' value='10.17487/RFC7120'/>
</reference>


<reference  anchor='RFC7510' target='https://www.rfc-editor.org/info/rfc7510'>
<front>
<title>Encapsulating MPLS in UDP</title>
<author initials='X.' surname='Xu' fullname='X. Xu'><organization /></author>
<author initials='N.' surname='Sheth' fullname='N. Sheth'><organization /></author>
<author initials='L.' surname='Yong' fullname='L. Yong'><organization /></author>
<author initials='R.' surname='Callon' fullname='R. Callon'><organization /></author>
<author initials='D.' surname='Black' fullname='D. Black'><organization /></author>
<date year='2015' month='April' />
<abstract><t>This document specifies an IP-based encapsulation for MPLS, called MPLS-in-UDP for situations where UDP (User Datagram Protocol) encapsulation is preferred to direct use of MPLS, e.g., to enable UDP-based ECMP (Equal-Cost Multipath) or link aggregation.  The MPLS-in-UDP encapsulation technology must only be deployed within a single network (with a single network operator) or networks of an adjacent set of cooperating network operators where traffic is managed to avoid congestion, rather than over the Internet where congestion control is required.  Usage restrictions apply to MPLS-in-UDP usage for traffic that is not congestion controlled and to UDP zero checksum usage with IPv6.</t></abstract>
</front>
<seriesInfo name='RFC' value='7510'/>
<seriesInfo name='DOI' value='10.17487/RFC7510'/>
</reference>


<reference  anchor='RFC8084' target='https://www.rfc-editor.org/info/rfc8084'>
<front>
<title>Network Transport Circuit Breakers</title>
<author initials='G.' surname='Fairhurst' fullname='G. Fairhurst'><organization /></author>
<date year='2017' month='March' />
<abstract><t>This document explains what is meant by the term &quot;network transport Circuit Breaker&quot;.  It describes the need for Circuit Breakers (CBs) for network tunnels and applications when using non-congestion-controlled traffic and explains where CBs are, and are not, needed. It also defines requirements for building a CB and the expected outcomes of using a CB within the Internet.</t></abstract>
</front>
<seriesInfo name='BCP' value='208'/>
<seriesInfo name='RFC' value='8084'/>
<seriesInfo name='DOI' value='10.17487/RFC8084'/>
</reference>


<reference  anchor='RFC8126' target='https://www.rfc-editor.org/info/rfc8126'>
<front>
<title>Guidelines for Writing an IANA Considerations Section in RFCs</title>
<author initials='M.' surname='Cotton' fullname='M. Cotton'><organization /></author>
<author initials='B.' surname='Leiba' fullname='B. Leiba'><organization /></author>
<author initials='T.' surname='Narten' fullname='T. Narten'><organization /></author>
<date year='2017' month='June' />
<abstract><t>Many protocols make use of points of extensibility that use constants to identify various protocol parameters.  To ensure that the values in these fields do not have conflicting uses and to promote interoperability, their allocations are often coordinated by a central record keeper.  For IETF protocols, that role is filled by the Internet Assigned Numbers Authority (IANA).</t><t>To make assignments in a given registry prudently, guidance describing the conditions under which new values should be assigned, as well as when and how modifications to existing values can be made, is needed.  This document defines a framework for the documentation of these guidelines by specification authors, in order to assure that the provided guidance for the IANA Considerations is clear and addresses the various issues that are likely in the operation of a registry.</t><t>This is the third edition of this document; it obsoletes RFC 5226.</t></abstract>
</front>
<seriesInfo name='BCP' value='26'/>
<seriesInfo name='RFC' value='8126'/>
<seriesInfo name='DOI' value='10.17487/RFC8126'/>
</reference>


<reference  anchor='RFC8200' target='https://www.rfc-editor.org/info/rfc8200'>
<front>
<title>Internet Protocol, Version 6 (IPv6) Specification</title>
<author initials='S.' surname='Deering' fullname='S. Deering'><organization /></author>
<author initials='R.' surname='Hinden' fullname='R. Hinden'><organization /></author>
<date year='2017' month='July' />
<abstract><t>This document specifies version 6 of the Internet Protocol (IPv6). It obsoletes RFC 2460.</t></abstract>
</front>
<seriesInfo name='STD' value='86'/>
<seriesInfo name='RFC' value='8200'/>
<seriesInfo name='DOI' value='10.17487/RFC8200'/>
</reference>


<reference  anchor='RFC8201' target='https://www.rfc-editor.org/info/rfc8201'>
<front>
<title>Path MTU Discovery for IP version 6</title>
<author initials='J.' surname='McCann' fullname='J. McCann'><organization /></author>
<author initials='S.' surname='Deering' fullname='S. Deering'><organization /></author>
<author initials='J.' surname='Mogul' fullname='J. Mogul'><organization /></author>
<author initials='R.' surname='Hinden' fullname='R. Hinden' role='editor'><organization /></author>
<date year='2017' month='July' />
<abstract><t>This document describes Path MTU Discovery (PMTUD) for IP version 6. It is largely derived from RFC 1191, which describes Path MTU Discovery for IP version 4.  It obsoletes RFC 1981.</t></abstract>
</front>
<seriesInfo name='STD' value='87'/>
<seriesInfo name='RFC' value='8201'/>
<seriesInfo name='DOI' value='10.17487/RFC8201'/>
</reference>


<reference anchor='I-D.iab-wire-image'>
<front>
<title>The Wire Image of a Network Protocol</title>

<author initials='B' surname='Trammell' fullname='Brian Trammell'>
    <organization />
</author>

<author initials='M' surname='Kuehlewind' fullname='Mirja Kuehlewind'>
    <organization />
</author>

<date month='November' day='5' year='2018' />

<abstract><t>This document defines the wire image, an abstraction of the information available to an on-path non-participant in a networking protocol.  This abstraction is intended to shed light on the implications that increased encryption has for network functions that use the wire image.</t></abstract>

</front>

<seriesInfo name='Internet-Draft' value='draft-iab-wire-image-01' />
<format type='TXT'
        target='http://www.ietf.org/internet-drafts/draft-iab-wire-image-01.txt' />
</reference>
</references>
<section title="Example Of An Encapsulated IP Packet Flow" anchor="sec-example-of-an-encapsulated-ip-packet-flow">
<t>An example inner IP packet flow within the encapsulating tunnel
packet stream is shown below. Notice how encapsulated IP packets can
start and end anywhere, and how more than one, or less than one, may
occur in a single encapsulating packet.</t>

<figure title="Inner and Outer Packet Flow" anchor="sec-inner-and-outer-packet-flow"><artwork><![CDATA[
  Offset: 0        Offset: 100    Offset: 2900    Offset: 1400
 [ ESP1  (1500) ][ ESP2  (1500) ][ ESP3  (1500) ][ ESP4  (1500) ]
 [--800--][--800--][60][-240-][--4000----------------------][pad]
]]></artwork></figure>

<t>The encapsulated IP packet flow (lengths include IP header and
payload) is as follows: an 800 octet packet, an 800 octet packet, a 60
octet packet, a 240 octet packet, a 4000 octet packet.</t>

<t>The <spanx style='verb'>BlockOffset</spanx> values in the 4 IP-TFS payload headers for this
packet flow would thus be: 0, 100, 2900, 1400 respectively. The first
encapsulating packet ESP1 has a zero <spanx style='verb'>BlockOffset</spanx> which points at the
IP data block immediately following the IP-TFS header. The following
packet ESP2's <spanx style='verb'>BlockOffset</spanx> points inward 100 octets to the start of the
60 octet data block. The third encapsulating packet ESP3 contains the
middle portion of the 4000 octet data block so the offset points past
its end and into the fourth encapsulating packet. The fourth packet
ESP4's offset is 1400, pointing at the padding which follows the
completion of the continued 4000 octet packet.</t>
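
<t>The offsets in this example can be reproduced with a short Python
sketch. It is illustrative only: the function name is hypothetical, and
like the figure it treats each encapsulating packet as carrying 1500
octets of data blocks, ignoring the size of the IP-TFS header itself.
For each encapsulating packet it finds the first position at or after
the packet start where a new data block (or the trailing padding)
begins.</t>

<figure><artwork><![CDATA[
```python
def block_offsets(inner_sizes, outer_payload):
    """Return the BlockOffset of each encapsulating packet.

    Simplification (as in the figure): outer_payload counts only the
    octets available to data blocks in one encapsulating packet.
    """
    # Stream positions at which each inner packet starts, followed
    # by the position where padding starts after the last packet.
    starts = []
    position = 0
    for size in inner_sizes:
        starts.append(position)
        position += size
    starts.append(position)

    # Number of encapsulating packets needed (ceiling division).
    num_outer = -(-position // outer_payload)

    offsets = []
    for index in range(num_outer):
        base = index * outer_payload
        next_start = next(s for s in starts if s >= base)
        offsets.append(next_start - base)
    return offsets


print(block_offsets([800, 800, 60, 240, 4000], 1500))
```
]]></artwork></figure>

<t>With the packet sizes from the figure, the sketch yields the same
four values listed above: 0, 100, 2900 and 1400.</t>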

</section>

<section title="A Send and Loss Event Rate Calculation" anchor="sec-a-send-and-loss-event-rate-calculation">
<t>The current best practice indicates that congestion control should be
done in a TCP friendly way. A TCP friendly congestion control
algorithm is described in <xref target="RFC5348"/>. For this IP-TFS use case (as with
<xref target="RFC4342"/>) the (fixed) packet size is used as the segment size for the
algorithm. The formula for the send rate is then as follows:</t>

<figure><artwork><![CDATA[
                                1
   X_Pps = -----------------------------------------------
           R * (sqrt(2*p/3) + 12*sqrt(3*p/8)*p*(1+32*p^2))
]]></artwork></figure>

<t>Where <spanx style='verb'>X_Pps</spanx> is the send rate in packets per second, <spanx style='verb'>R</spanx> is the
round trip time estimate and <spanx style='verb'>p</spanx> is the loss event rate (the inverse
of which is provided by the receiver).</t>

<t>The IP-TFS receiver, having the RTT estimate from the sender, MAY use
the same method as described in <xref target="RFC4342"/> to collect the loss intervals
and calculate the loss event rate value using the weighted average as
indicated. The receiver communicates the inverse of this value back
to the sender in the IPTFS_PROTOCOL payload header field
<spanx style='verb'>LossEventRate</spanx>.</t>

<t>The IP-TFS sender now has both the <spanx style='verb'>R</spanx> and <spanx style='verb'>p</spanx> values and can
calculate the correct sending rate (<spanx style='verb'>X_Pps</spanx>). If following <xref target="RFC5348"/>,
the sender SHOULD also use the slow start mechanism described therein
when the IP-TFS SA is first established.</t>
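
<t>As a non-normative illustration, the send rate formula above can be
evaluated with the following Python sketch. The function names are
hypothetical. The sketch assumes the round trip time is expressed in
seconds, so that the result is in packets per second, and it rejects a
zero loss event rate, for which the formula is undefined.</t>

<figure><artwork><![CDATA[
```python
from math import sqrt


def send_rate_pps(rtt_seconds, loss_event_rate):
    """Evaluate X_Pps for round trip time R and loss event rate p."""
    if rtt_seconds <= 0:
        raise ValueError("round trip time must be positive")
    if not 0 < loss_event_rate <= 1:
        raise ValueError("loss event rate must be in (0, 1]")
    p = loss_event_rate
    denominator = rtt_seconds * (
        sqrt(2 * p / 3) + 12 * sqrt(3 * p / 8) * p * (1 + 32 * p ** 2)
    )
    return 1.0 / denominator


def send_rate_from_inverse(rtt_seconds, loss_event_rate_inverse):
    """Same calculation, starting from the inverse of p as sent
    by the receiver."""
    return send_rate_pps(rtt_seconds, 1.0 / loss_event_rate_inverse)
```
]]></artwork></figure>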

</section>

<section title="Comparisons of IP-TFS" anchor="sec-comparisons-of-ip-tfs">

<section title="Comparing Overhead">
<section title="IP-TFS Overhead">
<t>The overhead of IP-TFS is 40 bytes per outer packet. Therefore the
octet overhead per inner packet is 40 divided by the number of outer
packets required (fractional allowed). The overhead as a percentage of
inner packet size is a constant based on the Outer MTU size.</t>

<figure><artwork><![CDATA[
   OH = 40 * Inner Packet Size / Outer Payload Size
   OH % of Inner Packet Size = 100 * OH / Inner Packet Size
   OH % of Inner Packet Size = 4000 / Outer Payload Size
]]></artwork></figure>

<figure title="IP-TFS Overhead as Percentage of Inner Packet Size" anchor="sec-ip-tfs-overhead-as-percentage-of-inner-packet-size"><artwork><![CDATA[
     Type  IP-TFS  IP-TFS  IP-TFS
      MTU     576    1500    9000
    PSize     536    1460    8960
   -------------------------------
       40   7.46%   2.74%   0.45%
      576   7.46%   2.74%   0.45%
     1500   7.46%   2.74%   0.45%
     9000   7.46%   2.74%   0.45%
]]></artwork></figure>
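
<t>The percentages in the table follow directly from the 40 octets of
overhead per outer packet. The Python sketch below is an illustration
with hypothetical function names; it takes the Outer Payload Size
(PSize, the MTU minus 40) as input.</t>

<figure><artwork><![CDATA[
```python
IPTFS_OVERHEAD = 40  # octets of IP-TFS overhead per outer packet


def iptfs_overhead_octets(inner_packet_size, outer_payload_size):
    """Overhead attributed to one inner packet; an inner packet uses
    a (possibly fractional) number of outer packets."""
    outer_packets = inner_packet_size / outer_payload_size
    return IPTFS_OVERHEAD * outer_packets


def iptfs_overhead_percent(outer_payload_size):
    """Overhead as a percentage of inner packet size; the inner
    packet size cancels out, so only the outer payload matters."""
    return 100.0 * IPTFS_OVERHEAD / outer_payload_size


for psize in (536, 1460, 8960):
    print(psize, round(iptfs_overhead_percent(psize), 2))
```
]]></artwork></figure>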

</section>

<section title="ESP with Padding Overhead">
<t>The overhead per inner packet for constant-send-rate padded ESP
(i.e., traditional IPsec TFC) is 36 octets plus any padding, unless
fragmentation is required.</t>

<t>When fragmentation of the inner packet is required to fit in the
outer IPsec packet, the overhead is the number of outer packets required
to carry the fragmented inner packet times the sum of the inner IP
overhead (20) and the outer packet overhead (36), minus the initial
inner IP overhead, plus any required tail padding in the last
encapsulation packet. The required tail padding is the number of
required packets times the difference of the Outer Payload Size and
the IP Overhead, minus the Inner Payload Size. So:</t>

<figure><artwork><![CDATA[
  Inner Payload Size = IP Packet Size - IP Overhead
  Outer Payload Size = MTU - IPsec Overhead

                Inner Payload Size
  NF0 = ----------------------------------
         Outer Payload Size - IP Overhead

  NF = CEILING(NF0)

  OH = NF * (IP Overhead + IPsec Overhead)
       - IP Overhead
       + NF * (Outer Payload Size - IP Overhead)
       - Inner Payload Size

  OH = NF * (IPsec Overhead + Outer Payload Size)
       - (IP Overhead + Inner Payload Size)

  OH = NF * (IPsec Overhead + Outer Payload Size)
       - Inner Packet Size
]]></artwork></figure>
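
<t>The fragmentation case above can be sketched in Python as follows.
This is an illustration only, with a hypothetical function name. It
uses the 20 octet IP overhead and 36 octet IPsec overhead given in this
section, and it is meant for inner packets that are too large for a
single outer packet.</t>

<figure><artwork><![CDATA[
```python
IP_OVERHEAD = 20     # inner IP header octets
IPSEC_OVERHEAD = 36  # outer packet overhead octets


def esp_pad_fragment_overhead(inner_packet_size, mtu):
    """Overhead in octets when the inner packet must be fragmented."""
    inner_payload = inner_packet_size - IP_OVERHEAD
    outer_payload = mtu - IPSEC_OVERHEAD
    per_fragment = outer_payload - IP_OVERHEAD
    # NF = CEILING(NF0), computed with integer arithmetic.
    num_fragments = -(-inner_payload // per_fragment)
    return (num_fragments * (IPSEC_OVERHEAD + outer_payload)
            - inner_packet_size)
```
]]></artwork></figure>

<t>For example, a 1460 octet inner packet carried over a 576 octet MTU
needs 3 fragments, giving 3 * 576 - 1460 = 268 octets of overhead.</t>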

</section>

</section>

<section title="Overhead Comparison">
<t>The following tables collect the overhead values for some common L3
MTU sizes in order to compare them. The first table is the number of
octets of overhead for a given L3 MTU sized packet. The second table
is the percentage of overhead in the same MTU sized packet.</t>



<figure title="Overhead comparison in octets" anchor="sec-overhead-comparison-in-octets"><artwork><![CDATA[
        Type  ESP+Pad  ESP+Pad  ESP+Pad  IP-TFS  IP-TFS  IP-TFS 
      L3 MTU      576     1500     9000     576    1500    9000 
       PSize      540     1464     8964     536    1460    8960 
     -----------------------------------------------------------
          40      500     1424     8924     3.0     1.1     0.2 
         128      412     1336     8836     9.6     3.5     0.6 
         256      284     1208     8708    19.1     7.0     1.1 
         536        4      928     8428    40.0    14.7     2.4 
         576      576      888     8388    43.0    15.8     2.6 
        1460      268        4     7504   109.0    40.0     6.5 
        1500      228     1500     7464   111.9    41.1     6.7 
        8960     1408     1540        4   668.7   245.5    40.0 
        9000     1368     1500     9000   671.6   246.6    40.2 
]]></artwork></figure>

<figure title="Overhead as Percentage of Inner Packet Size" anchor="sec-overhead-as-percentage-of-inner-packet-size"><artwork><![CDATA[
       Type  ESP+Pad  ESP+Pad   ESP+Pad  IP-TFS  IP-TFS  IP-TFS 
        MTU      576     1500      9000     576    1500    9000 
      PSize      540     1464      8964     536    1460    8960 
     -----------------------------------------------------------
         40  1250.0%  3560.0%  22310.0%   7.46%   2.74%   0.45% 
        128   321.9%  1043.8%   6903.1%   7.46%   2.74%   0.45% 
        256   110.9%   471.9%   3401.6%   7.46%   2.74%   0.45% 
        536     0.7%   173.1%   1572.4%   7.46%   2.74%   0.45% 
        576   100.0%   154.2%   1456.2%   7.46%   2.74%   0.45% 
       1460    18.4%     0.3%    514.0%   7.46%   2.74%   0.45% 
       1500    15.2%   100.0%    497.6%   7.46%   2.74%   0.45% 
       8960    15.7%    17.2%      0.0%   7.46%   2.74%   0.45% 
       9000    15.2%    16.7%    100.0%   7.46%   2.74%   0.45% 
]]></artwork></figure>

</section>

<section title="Comparing Available Bandwidth">
<t>Another way to compare the two solutions is to look at the amount of
available bandwidth each solution provides. The following sections
consider and compare the percentage of available bandwidth. For the
sake of providing a well-understood baseline, normal (unencrypted)
Ethernet as well as normal ESP values are included.</t>

<section title="Ethernet">
<t>In order to calculate the available bandwidth, the per-packet overhead
is calculated first. The total overhead of Ethernet is 14+4 octets of
header and CRC plus an additional 20 octets of framing (preamble,
start, and inter-packet gap) for a total of 38 octets. Additionally,
the minimum payload is 46 octets.</t>


<figure title="L2 Octets Per Packet" anchor="sec-l2-octets-per-packet"><artwork><![CDATA[
      Size  E + P  E + P  E + P  IPTFS  IPTFS  IPTFS  Enet   ESP 
       MTU    590   1514   9014    590   1514   9014   any   any 
        OH     74     74     74     78     78     78    38    74 
     ------------------------------------------------------------
        40    614   1538   9038     45     42     40    84   114 
       128    614   1538   9038    146    134    129   166   202 
       256    614   1538   9038    293    269    258   294   330 
       536    614   1538   9038    614    564    540   574   610 
       576   1228   1538   9038    659    606    581   614   650 
      1460   1842   1538   9038   1672   1538   1472  1498  1534 
      1500   1842   3076   9038   1718   1580   1513  1538  1574 
      8960  11052  10766   9038  10263   9438   9038  8998  9034 
      9000  11052  10766  18076  10309   9480   9078  9038  9074 
]]></artwork></figure>


<figure title="Packets Per Second on 10G Ethernet" anchor="sec-packets-per-second-on-10g-ethernet"><artwork><![CDATA[
     Size  E + P  E + P  E + P  IPTFS  IPTFS  IPTFS  Enet   ESP   
      MTU  590    1514   9014   590    1514   9014   any    any   
       OH  74     74     74     78     78     78     38     74    
    --------------------------------------------------------------
       40  2.0M   0.8M   0.1M   27.3M  29.7M  31.0M  14.9M  11.0M 
      128  2.0M   0.8M   0.1M   8.5M   9.3M   9.7M   7.5M   6.2M  
      256  2.0M   0.8M   0.1M   4.3M   4.6M   4.8M   4.3M   3.8M  
      536  2.0M   0.8M   0.1M   2.0M   2.2M   2.3M   2.2M   2.0M  
      576  1.0M   0.8M   0.1M   1.9M   2.1M   2.2M   2.0M   1.9M  
     1460  678K   812K   138K   747K   812K   848K   834K   814K  
     1500  678K   406K   138K   727K   791K   826K   812K   794K  
     8960  113K   116K   138K   121K   132K   138K   138K   138K  
     9000  113K   116K   69K    121K   131K   137K   138K   137K  
]]></artwork></figure>

<figure title="Percentage of Bandwidth on 10G Ethernet" anchor="sec-percentage-of-bandwidth-on-10g-ethernet"><artwork><![CDATA[
 Size   E + P   E + P   E + P   IPTFS   IPTFS   IPTFS    Enet     ESP 
  MTU     590    1514    9014     590    1514    9014     any     any 
   OH      74      74      74      78      78      78      38      74 
----------------------------------------------------------------------
   40   6.51%   2.60%   0.44%  87.30%  94.93%  99.14%  47.62%  35.09% 
  128  20.85%   8.32%   1.42%  87.30%  94.93%  99.14%  77.11%  63.37% 
  256  41.69%  16.64%   2.83%  87.30%  94.93%  99.14%  87.07%  77.58% 
  536  87.30%  34.85%   5.93%  87.30%  94.93%  99.14%  93.38%  87.87% 
  576  46.91%  37.45%   6.37%  87.30%  94.93%  99.14%  93.81%  88.62% 
 1460  79.26%  94.93%  16.15%  87.30%  94.93%  99.14%  97.46%  95.18% 
 1500  81.43%  48.76%  16.60%  87.30%  94.93%  99.14%  97.53%  95.30% 
 8960  81.07%  83.22%  99.14%  87.30%  94.93%  99.14%  99.58%  99.18% 
 9000  81.43%  83.60%  49.79%  87.30%  94.93%  99.14%  99.58%  99.18% 
]]></artwork></figure>
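
<t>The constant IPTFS percentages in the table above appear to be the
outer packet's payload divided by its total L2 size. The following
Python sketch reproduces them under assumptions read off the tables
rather than stated in the text: the listed MTU values (590, 1514, and
9014) are taken to include the 14-octet Ethernet header, so a further
24 octets of CRC and framing are added, and the 78-octet "OH" value is
subtracted to obtain the payload. All names are illustrative.</t>

<figure><artwork><![CDATA[
```python
CRC_AND_FRAMING = 4 + 20   # CRC plus preamble, start, and gap
IPTFS_OVERHEAD = 78        # "OH" row of the IPTFS columns
ETH_OVERHEAD = 38          # plain Ethernet per-packet overhead
ETH_MIN_PAYLOAD = 46       # minimum Ethernet payload, in octets


def iptfs_percent(mtu):
    """Payload share of a fully packed outer packet (assumed model)."""
    l2 = mtu + CRC_AND_FRAMING
    return 100.0 * (l2 - IPTFS_OVERHEAD) / l2


def ethernet_percent(size):
    """Payload share of a native Ethernet packet of `size` octets."""
    return 100.0 * size / (max(size, ETH_MIN_PAYLOAD) + ETH_OVERHEAD)


if __name__ == "__main__":
    for mtu in (590, 1514, 9014):
        print("IP-TFS", mtu, round(iptfs_percent(mtu), 2))
    # Inner packet sizes for which a 1514 tunnel beats native Ethernet.
    for size in (40, 128, 256, 536, 576, 1460, 1500):
        wins = iptfs_percent(1514) > ethernet_percent(size)
        print(size, round(ethernet_percent(size), 2), wins)
```
]]></artwork></figure>

<t>Because aggregation makes the IP-TFS share independent of the
inner packet size, it exceeds the native Ethernet share whenever the
inner packets are small enough that the 38 octets of per-packet
Ethernet overhead dominate.</t>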

<t>A sometimes unexpected result of using IP-TFS (or any
packet-aggregating tunnel) is that, for small- to medium-sized
packets, the available bandwidth is actually greater than with native
Ethernet. This is due to the reduction in Ethernet framing overhead.
The increased bandwidth is paid for with an increase in latency: the
time needed to send the unrelated octets in the outer tunnel frame.
The following table illustrates the latency for some common values on
a 10G Ethernet link. The table also includes the latency introduced by
padding when using ESP with padding.</t>

<figure title="Added Latency" anchor="sec-added-latency"><artwork><![CDATA[
      Size  ESP+Pad  ESP+Pad  IP-TFS   IP-TFS
            1500     9000     1500     9000
      ----------------------------------------
        40  1.14 us  7.14 us  1.17 us  7.17 us
       128  1.07 us  7.07 us  1.10 us  7.10 us
       256  0.97 us  6.97 us  1.00 us  7.00 us
       536  0.74 us  6.74 us  0.77 us  6.77 us
       576  0.71 us  6.71 us  0.74 us  6.74 us
      1460  0.00 us  6.00 us  0.04 us  6.04 us
      1500  1.20 us  5.97 us  0.00 us  6.00 us
]]></artwork></figure>
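
<t>The IP-TFS latency values above are close to the time needed to
serialize the rest of the outer packet on the link. The following
Python sketch is a minimal model, assuming the added latency is the
outer payload size minus the inner packet size, in octets, sent at
10 Gb/s. The function name is illustrative, and for some rows the
result may differ from the table in the last digit.</t>

<figure><artwork><![CDATA[
```python
LINK_BPS = 10_000_000_000  # 10G Ethernet


def added_latency_us(inner_size, outer_size):
    """Microseconds to send the octets sharing the outer packet."""
    # Assumption: only octets beyond the inner packet add latency.
    unrelated_octets = max(outer_size - inner_size, 0)
    return unrelated_octets * 8 / LINK_BPS * 1e6


if __name__ == "__main__":
    for size in (40, 128, 256, 536, 576, 1500):
        print(size,
              round(added_latency_us(size, 1500), 2),
              round(added_latency_us(size, 9000), 2))
```
]]></artwork></figure>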

<t>Notice that the latency values are very similar between the two
solutions; however, whereas IP-TFS provides for constant high
bandwidth, in some cases even exceeding native Ethernet, ESP with
padding often greatly reduces available bandwidth.</t>

</section>

</section>

</section>

<section title="Acknowledgements">
<t>We would like to thank Don Fedyk for help in reviewing and editing
this work.</t>

</section>

<section title="Contributors">
<t>The following people made significant contributions to this document.</t>

<figure><artwork><![CDATA[
   Lou Berger
   LabN Consulting, L.L.C.

   Email: lberger@labn.net
]]></artwork></figure>

</section>
  </back>
</rfc>
