<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc strict='yes'?>
<?rfc iprnotified='no'?>
<rfc category="std" docName="draft-templin-intarea-parcels-03"
     ipr="trust200902" updates="RFC2675">
  <front>
    <title abbrev="IP Parcels">IP Parcels</title>

    <author fullname="Fred L. Templin" initials="F. L." role="editor"
            surname="Templin">
      <organization>Boeing Research &amp; Technology</organization>

      <address>
        <postal>
          <street>P.O. Box 3707</street>

          <city>Seattle</city>

          <region>WA</region>

          <code>98124</code>

          <country>USA</country>
        </postal>

        <email>fltemplin@acm.org</email>
      </address>
    </author>

    <date day="20" month="December" year="2021"/>

    <keyword>I-D</keyword>

    <keyword>Internet-Draft</keyword>

    <abstract>
      <t>IP packets (both IPv4 and IPv6) are understood to contain a unit of
      data which becomes the retransmission unit in case of loss. Upper layer
      protocols such as the Transmission Control Protocol (TCP) prepare data
      units known as "segments", with traditional arrangements including a
      single segment per packet. This document presents a new construct known
      as the "IP Parcel" which permits a single packet to carry multiple
      segments, essentially creating a "packet-of-packets". Parcels can be
      broken into smaller parcels by a middlebox on the path if necessary,
      then rejoined into one or more repackaged parcels to be forwarded
      further toward the final destination. Reordering of segments within
      parcels and loss of individual segments are undesirable but possible.
      What matters is that the number of parcels delivered to the final
      destination is kept to a minimum, and that loss or receipt of
      individual segments (and not parcel size) determines the retransmission
      unit.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="intro" title="Introduction">
      <t>IP packets (both IPv4 <xref target="RFC0791"/> and IPv6 <xref
      target="RFC8200"/>) are understood to contain a unit of data which
      becomes the retransmission unit in case of loss. Upper layer protocols
      such as the Transmission Control Protocol (TCP) <xref
      target="RFC0793"/>, QUIC <xref target="RFC9000"/>, LTP <xref
      target="RFC5326"/> and others prepare data units known as "segments",
      with traditional arrangements including a single segment per packet.
      This document presents a new construct known as the "IP Parcel" which
      permits a single packet to carry multiple segments. This essentially
      creates a "packet-of-packets" with the IP layer headers appearing only
      once but with possibly multiple upper layer protocol segments.</t>

      <t>Parcels are formed when an upper layer protocol entity (identified by
      the "5-tuple" source IP address/port number, destination IP address/port
      number and protocol number) prepares a buffer of data with the
      concatenation of up to 64 properly-formed segments that can be broken
      out into smaller parcels using a copy of the IP header. All segments
      except the final segment must be equal in size and no larger than 65535
      octets (minus headers), while the final segment must be no larger than the
      others but may be smaller. The upper layer protocol entity then delivers
      the buffer and non-final segment size to the IP layer, which appends the
      necessary IP headers to identify this as a parcel and not an ordinary
      packet.</t>

      <t>Each parcel can be opened at a first-hop middlebox on the path with
      its included segments broken out into smaller parcels, then rejoined
      into one or more parcels at a last-hop middlebox to be forwarded to the
      final destination. Repackaging of parcels is therefore commonplace, while
      reordering of segments within a parcel or loss of individual segments is
      possible but not desirable. What matters is that the number of parcels
      delivered to the final destination is kept to a minimum, and that
      loss or receipt of individual segments (and not parcel size) determines
      the retransmission unit.</t>

      <t>The following sections discuss rationale for creating and shipping
      parcels as well as the actual protocol constructs and procedures
      involved. It is expected that the parcel concept may drive future
      innovation in applications, operating systems, network equipment and
      data links.</t>
    </section>

    <section anchor="terms" title="Terminology">
      <t>A "parcel" is defined as "a thing or collection of things wrapped in
      paper in order to be carried or sent by mail". Indeed, there are many
      examples of parcel delivery services worldwide that provide an essential
      transit backbone for efficient business and consumer transactions.</t>

      <t>In this same spirit, an "IP parcel" is a collection of up to
      64 packets wrapped in an efficient package for transmission and delivery
      (i.e., a "packet of packets"), while a "singleton IP parcel" is a
      parcel that contains a single packet. IP parcels are distinguished from
      ordinary packets through the special header constructions discussed in
      this document.</t>

      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
      "OPTIONAL" in this document are to be interpreted as described in BCP 14
      <xref target="RFC2119"/><xref target="RFC8174"/> when, and only when,
      they appear in all capitals, as shown here.</t>
    </section>

    <section anchor="aero-omni" title="Background and Motivation">
      <t>Studies have shown that by sending and receiving larger packets
      applications can realize greater performance due to reduced numbers of
      system calls and interrupts as well as larger atomic data copies between
      kernel and user space. Within edge networks, large packets also result
      in reduced numbers of device interrupts and better network utilization
      in comparison with smaller packet sizes.</t>

      <t>A first study involved performance enhancement of the QUIC protocol
      <xref target="RFC9000"/> using the Generic Segmentation/Receive Offload
      (GSO/GRO) facility <xref target="QUIC"/>. GSO/GRO provides a robust (but
      non-standard) service very similar in nature to the IP parcel service
      described here, and its application has shown significant performance
      increases due to the increased transfer unit size between the operating
      system kernel and QUIC application. A second study showed that GSO/GRO
      also improved performance for the Licklider Transmission Protocol (LTP)
      <xref target="RFC5326"/> in a similar fashion <xref
      target="I-D.templin-dtn-ltpfrag"/>. Historically, the NFS protocol also
      saw dramatic performance increases when using larger UDP datagram sizes
      even when those sizes invoked IP fragmentation.</t>

      <t>TCP also benefits from larger packet sizes, and efforts have
      investigated TCP performance using jumbograms internally with changes to
      the Linux GSO/GRO facilities <xref target="BIG-TCP"/>. The idea is to
      use the jumbo payload internally and to allow GSO and GRO to use buffer
      sizes larger than ~64KB, with the understanding that links that
      support jumbograms natively are not yet widely available. Hence, IP
      parcels provide a packaging that can be considered in the near term under
      current deployment limitations.</t>

      <t>The issue with sending large packets is that they are often lost at
      links with smaller Maximum Transmission Units (MTUs), and the resulting
      Packet Too Big (PTB) message may be lost somewhere in the path back to
      the original source. This "Path MTU black hole" condition can cripple
      application performance unless supplemented with robust path
      probing techniques. Even so, the best-case performance always occurs when
      no packets are lost due to size restrictions.</t>

      <t>These considerations therefore motivate a design where the maximum
      segment size should be no larger than 65535 (minus headers), while
      parcels that carry the segments may themselves be significantly larger.
      Then, even if a middlebox needs to open the parcels to deliver
      individual segments further toward final hops as separate IP packets, an
      important performance optimization for both the original source and
      final destination can be realized.</t>

      <t>An analogy: when a consumer orders 50 small items from a major online
      retailer, the retailer does not ship the order in 50 separate small
      boxes. Instead, the retailer puts as many of the small boxes as possible
      into one or a few larger boxes (or parcels) then places the parcels on a
      semi-truck or airplane. The parcels arrive at a regional distribution
      center where they may be further redistributed into slightly smaller
      parcels that get delivered to the consumer. But most often, the consumer
      will only find one or a few parcels at their doorstep and not 50
      individual boxes. This greatly reduces handling overhead for both the
      retailer and consumer.</t>
    </section>

    <section anchor="parcels" title="IP Parcel Formation">
      <t>IP parcel formation is invoked by an upper layer protocol (identified
      by the 5-tuple as above) when it produces a data buffer containing the
      concatenation of up to 64 segments. All non-final segments MUST be equal
      in length while the final segment MUST NOT be larger but MAY be smaller.
      Each non-final segment MUST be no larger than 65535 minus the length of
      the IP header plus extensions, minus the length of an additional IPv6
      header in case encapsulation is necessary (see: <xref target="xmit"/>).
      The upper layer protocol then presents the buffer and non-final segment
      size to the IP layer which appends a single IP header (plus any
      extension headers) before presenting the parcel to lower layers.</t>
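
      <t>As a non-normative illustration, the segment rules above can be
      sketched as a small check that an upper layer might run before handing
      a buffer to the IP layer. The function name check_parcel_segments, the
      constant names and the ip_header_len parameter are assumptions made for
      this sketch and are not defined by this document.</t>

      <figure>
        <artwork><![CDATA[
```python
MAX_SEGMENTS = 64        # upper bound on segments per parcel
MAX_IP_LENGTH = 65535    # largest value of a 16-bit IP length field
IPV6_HEADER_LEN = 40     # room reserved for a possible IPv6 encapsulation


def check_parcel_segments(segments, ip_header_len):
    """Validate a list of byte strings; return the non-final segment size.

    ip_header_len is the IP header plus extensions (an assumed input).
    """
    if len(segments) > MAX_SEGMENTS:
        raise ValueError("a parcel carries at most 64 segments")
    if not segments:
        return 0  # nothing to check for an empty buffer
    limit = MAX_IP_LENGTH - ip_header_len - IPV6_HEADER_LEN
    seg_len = len(segments[0])
    if seg_len > limit:
        raise ValueError("segment exceeds the per-segment size limit")
    if any(len(s) != seg_len for s in segments[1:-1]):
        raise ValueError("all non-final segments must be equal in length")
    if len(segments[-1]) > seg_len:
        raise ValueError("final segment must not be larger than the others")
    return seg_len
```
]]></artwork>
      </figure>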

      <t>For IPv4, the IP layer prepares the parcel by appending an IPv4
      header with a Jumbo Payload option (identified by option code TBD1)
      formed as follows:<figure>
          <artwork><![CDATA[+--------+--------+--------+--------+--------+--------+
|000(TBD1)00000110|       Jumbo Payload Length        |
+--------+--------+--------+--------+--------+--------+]]></artwork>
        </figure>where "Jumbo Payload Length" is a 32-bit unsigned integer
      value (in network byte order) set to the combined length of the IPv4
      header and all concatenated segments. The IP layer next sets the IPv4
      header DF bit to 1, then sets the IPv4 header Total Length field to the
      length of the IPv4 header plus the length of the first segment only.
      Note that the IP
      layer can form true IPv4 jumbograms (as opposed to parcels) by instead
      setting the IPv4 header Total Length field to 0 (see: <xref
      target="jumbo"/>).</t>
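
      <t>A minimal, non-normative sketch of the option encoding shown above
      follows. TBD1 has not been assigned, so PLACEHOLDER_TBD1 is an
      arbitrary stand-in, and the function names are assumptions made for
      this sketch. The padding helper reflects the general IPv4 requirement
      that the header length be a multiple of 4 octets.</t>

      <figure>
        <artwork><![CDATA[
```python
import struct

PLACEHOLDER_TBD1 = 0x1F  # arbitrary stand-in; TBD1 is NOT yet assigned


def ipv4_jumbo_option(option_number, jumbo_payload_length):
    """Build the 6-octet option: type, length (6), 32-bit length value."""
    option_type = option_number & 0x1F  # copy bit 0, class 00, 5-bit number
    return struct.pack("!BBI", option_type, 6, jumbo_payload_length)


def pad_ipv4_options(options):
    """Pad with zero octets (End of Option List) to a 4-octet boundary."""
    return options + bytes(-len(options) % 4)
```
]]></artwork>
      </figure>

      <t>For example, pad_ipv4_options(ipv4_jumbo_option(PLACEHOLDER_TBD1,
      70000)) yields an 8-octet options field whose third through sixth
      octets carry the value 70000 in network byte order.</t>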

      <t>For IPv6, the IP layer forms a parcel by appending an IPv6 header
      with a Jumbo Payload option <xref target="RFC2675"/> the same as for
      IPv4 above, where "Jumbo Payload Length" is set to the combined length of
      the IPv6 Hop-by-Hop Options header, any other extension headers present
      and all concatenated segments. The IP layer next sets the IPv6 header
      Payload Length field to the combined length of the IPv6 Hop-by-Hop
      Options header, any other extension headers present and the first
      segment only. As with IPv4, the IP layer can form true IPv6
      jumbograms (as opposed to parcels) by instead setting the IPv6 header
      Payload Length field to 0 (see: <xref target="RFC2675"/>).</t>
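
      <t>For illustration only, the following sketch builds a minimal IPv6
      Hop-by-Hop Options header that carries nothing but the Jumbo Payload
      option. It assumes the option keeps the wire encoding of RFC 2675
      (Option Type 0xC2, Opt Data Len 4); the function name is hypothetical
      and not part of this specification.</t>

      <figure>
        <artwork><![CDATA[
```python
import struct

JUMBO_OPTION_TYPE = 0xC2  # Jumbo Payload option type from RFC 2675


def hop_by_hop_jumbo(next_header, jumbo_payload_length):
    """Return an 8-octet Hop-by-Hop Options header with one Jumbo option.

    Layout: Next Header, Hdr Ext Len (0 means 8 octets in total),
    Option Type, Opt Data Len (4), 32-bit Jumbo Payload Length.
    """
    return struct.pack("!BBBBI", next_header, 0,
                       JUMBO_OPTION_TYPE, 4, jumbo_payload_length)
```
]]></artwork>
      </figure>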

      <t>An IP parcel therefore has the following appearance:</t>

      <t><figure>
          <artwork><![CDATA[+--------+--------+--------+--------+
|                                   |
~        Segment J (K octets)       ~
|                                   |
+--------+--------+--------+--------+
~                                   ~
~                                   ~
+--------+--------+--------+--------+
|                                   |
~        Segment 3 (L octets)       ~
|                                   |
+--------+--------+--------+--------+
|                                   |
~        Segment 2 (L octets)       ~
|                                   |
+--------+--------+--------+--------+
|                                   |
~        Segment 1 (L octets)       ~
|                                   |
+--------+--------+--------+--------+
|     IP Header Plus Extensions     |
~    [Total, Payload] Length = M    ~
|      Jumbo Payload Length = N     |
+--------+--------+--------+--------+
]]></artwork>
        </figure>where J is the total number of segments (between 1 and 64), L
      is the length of each non-final segment which MUST be no larger than
      65535 minus the length of the IP header plus extensions, minus the
      length of an additional IPv6 header in case encapsulation is necessary,
      and K is the length of the final segment which MUST be no larger than L.
      The values M and N are then set to the length of the IP header plus
      extensions for IPv4 or to the length of the extensions only for IPv6,
      then further calculated as follows:<list style="empty">
          <t>M = M + ((J - 1) ? L : K)</t>

          <t>N = N + (((J - 1) * L) + K)</t>
        </list>where the expression "(J - 1) ? L : K" evaluates to L when J is
      greater than 1 and to K when J is 1. Note that a NULL parcel consisting
      of only the IP header plus extensions is also a legal parcel. In that
      case, J is 0 and the above segment length calculation is omitted.</t>
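
      <t>The calculation of M and N above can be sketched as follows. This is
      a non-normative example; the function and parameter names are
      assumptions, with base_len standing for the initial value of M and N
      (the IPv4 header plus options, or the IPv6 extension headers only).</t>

      <figure>
        <artwork><![CDATA[
```python
def parcel_lengths(base_len, seg_count, seg_len, final_len):
    """Return (M, N) for J = seg_count, L = seg_len and K = final_len."""
    if not 0 <= seg_count <= 64:
        raise ValueError("J must be between 0 and 64")
    if seg_count == 0:
        return base_len, base_len  # NULL parcel: headers only
    if seg_count > 1 and final_len > seg_len:
        raise ValueError("K must be no larger than L")
    first_len = seg_len if seg_count > 1 else final_len
    m = base_len + first_len                              # first segment
    n = base_len + (seg_count - 1) * seg_len + final_len  # all segments
    return m, n
```
]]></artwork>
      </figure>

      <t>With a 20-octet IPv4 header and three segments of 1000, 1000 and 400
      octets, parcel_lengths(20, 3, 1000, 400) gives M = 1020 and N = 2420.
      For a singleton, M and N are equal.</t>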
    </section>

    <section anchor="xmit" title="Transmission of IP Parcels">
      <t>The IP layer next presents the parcel to the outgoing network
      interface. For OMNI interfaces <xref target="I-D.templin-6man-omni"/>,
      the OMNI Adaptation Layer (OAL) source sub-divides the parcel into
      smaller parcels if necessary then forwards these smaller parcels into
      the OMNI link. (The smallest subdivision possible is a singleton where J
      in the above equation is 1, in which case M and N become equal.) These
      smaller parcels eventually arrive at the OAL destination which may
      re-combine them into a larger parcel or parcels to forward to the final
      destination. Details for OAL parcel processing are discussed in <xref
      target="I-D.templin-6man-omni"/>.</t>
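
      <t>The OAL procedures themselves are specified elsewhere, so the
      following is only a hedged sketch of the general idea of sub-dividing a
      parcel at segment boundaries. The function name, the base_len overhead
      and the max_parcel_len limit are all assumptions made for
      illustration.</t>

      <figure>
        <artwork><![CDATA[
```python
def subdivide_parcel(segments, base_len, max_parcel_len):
    """Split a parcel's segments into sub-parcels at segment boundaries.

    base_len is the per-parcel header overhead and max_parcel_len is the
    largest sub-parcel accepted; both are assumed inputs.
    """
    if not segments:
        return []
    seg_len = max(len(segments[0]), 1)  # avoid dividing by zero
    per_parcel = (max_parcel_len - base_len) // seg_len
    if per_parcel < 1:
        raise ValueError("not even a singleton parcel fits")
    return [segments[i:i + per_parcel]
            for i in range(0, len(segments), per_parcel)]
```
]]></artwork>
      </figure>

      <t>Because every non-final segment has the same length and the final
      segment is no larger, each resulting sub-parcel still satisfies the
      segment size rules, and the last one may be a singleton.</t>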

      <t>For ordinary network interfaces, the IP layer instead forwards the
      parcel according to the path MTU to either an OAL source or the final
      destination itself, whichever comes first. If the parcel is no larger
      than the path MTU, the IP layer simply forwards the parcel the same as
      it would an ordinary IP packet and processes any PTB messages that may
      arrive while first applying encapsulation if necessary (see: <xref
      target="compat"/>). If the parcel is larger than 65535 (minus
      encapsulation headers) and also larger than the path MTU, the IP layer
      instead discards the parcel and returns a packet size error to the upper
      layer protocol or a PTB to the original source.</t>

      <t>If the parcel is no larger than 65535 (minus encapsulation headers)
      but larger than the path MTU, the IP layer instead performs IP
      encapsulation with destination set to the IP address of the OAL source
      or final destination and [Total, Payload] Length set to N plus the
      encapsulation header length. The IP layer then performs
      source-fragmentation on the encapsulated parcel the same as for an
      ordinary IP packet by generating IP fragments destined for the OAL
      source or final destination.</t>
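
      <t>The three cases for ordinary network interfaces can be summarized in
      a short, non-normative sketch. The Action names and the function
      signature are assumptions, and parcel_len is taken here to be the full
      parcel length before encapsulation.</t>

      <figure>
        <artwork><![CDATA[
```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward as an ordinary IP packet"
    ENCAPSULATE_AND_FRAGMENT = "encapsulate, then source-fragment"
    DISCARD = "discard and report a packet size error or PTB"


def transmit_action(parcel_len, path_mtu, encap_header_len):
    """Choose how the IP layer handles a parcel on an ordinary interface."""
    if parcel_len <= path_mtu:
        return Action.FORWARD
    if parcel_len <= 65535 - encap_header_len:
        return Action.ENCAPSULATE_AND_FRAGMENT
    return Action.DISCARD
```
]]></artwork>
      </figure>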

      <t>When the OAL source or final destination receives the fragments or
      whole parcels, it reassembles if necessary, discards the encapsulation
      headers then presents the parcel to the OMNI link in the first case or
      the upper layer protocol in the second case.</t>
    </section>

    <section anchor="compat" title="Compatibility">
      <t>Legacy networking gear that forwards parcels over ordinary data links
      may fail to recognize the new Jumbo Payload option coding and
      instead act only on the [Total, Payload] Length field value. In that
      case, the legacy gear would likely forward only the IP header plus first
      segment while truncating the remainder of the parcel.</t>

      <t>In networks where compatibility is thought to be an issue, the
      original source can perform encapsulation on parcels uniformly whether
      or not fragmentation is necessary to ensure they are delivered to the
      OAL source or final destination (whichever comes first). In the same way
      the OAL destination can uniformly perform encapsulation to ensure that
      parcels are delivered to the final destination.</t>

      <t>When the original source or OAL destination applies encapsulation, it
      sets the encapsulation header [Total, Payload] Length to N plus the
      encapsulation header length if that value is no larger than 65535.
      Otherwise, it sets [Total, Payload] Length to 0 and MUST include a Jumbo
      Payload option in the encapsulation header with its length set to N
      plus the encapsulation header length.</t>
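
      <t>As a non-normative sketch, the length rule for the encapsulation
      header can be written as a function that returns the [Total, Payload]
      Length value together with the Jumbo Payload Length to include, or None
      when no Jumbo Payload option is needed. The function and parameter
      names are assumptions.</t>

      <figure>
        <artwork><![CDATA[
```python
def encap_length_fields(parcel_len_n, encap_header_len):
    """Return (length_field, jumbo_payload_length_or_None)."""
    total = parcel_len_n + encap_header_len
    if total <= 65535:
        return total, None  # fits in the 16-bit length field
    return 0, total         # length field 0, Jumbo Payload option required
```
]]></artwork>
      </figure>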
    </section>

    <section anchor="tcpopt" title="TCP Parcel-Permitted Option">
      <t>TCP peers that wish to employ IP parcels must negotiate their use
      upon connection establishment by including the Parcel-Permitted option.
      This two-byte option may be sent in a SYN by a TCP that has been
      extended to receive (and presumably process) IP parcels once the
      connection has opened. It MUST NOT be sent on non-SYN segments. The TCP
      option has the following format:<figure>
          <artwork><![CDATA[       TCP Parcel-Permitted Option:

       Kind: TBD2

       +---------+---------+
       |Kind=TBD2| Length=2|
       +---------+---------+]]></artwork>
        </figure>A TCP that includes the Parcel-Permitted option MUST be
      capable of reassembling the maximum-length encapsulated parcel that can
      undergo fragmentation (see: <xref target="parcels"/> and <xref
      target="xmit"/>).</t>
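
      <t>The following non-normative sketch encodes the two-byte option and
      scans the options field of a received SYN for it, using the standard
      TCP kind/length option layout. TBD2 has not been assigned, so
      PLACEHOLDER_TBD2 is an arbitrary stand-in and both function names are
      hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
PLACEHOLDER_TBD2 = 0xFD  # arbitrary stand-in; TBD2 is NOT yet assigned


def parcel_permitted_option(kind):
    """Encode the option as Kind followed by Length = 2."""
    return bytes([kind, 2])


def has_parcel_permitted(options, kind):
    """Return True if the TCP options field carries the option."""
    i = 0
    while i < len(options):
        k = options[i]
        if k == 0:          # End of Option List
            break
        if k == 1:          # No-Operation, a single octet
            i += 1
            continue
        if i + 1 >= len(options):
            break           # truncated option
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break           # malformed option
        if k == kind and length == 2:
            return True
        i += length
    return False
```
]]></artwork>
      </figure>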

      <t>Note: the TCP protocol is currently under revision for second edition
      RFC publication <xref target="I-D.ietf-tcpm-rfc793bis"/>. An
      investigation of the applicability of IP parcels for the second edition
      publication is recommended.</t>
    </section>

    <section anchor="integrity" title="Integrity">
      <t>Parcels can range in length from as small as the IP headers alone
      to as large as the IP headers plus (64 * (65535 minus headers)) octets.
      Although link layer integrity checks provide sufficient protection for
      contiguous data blocks up to approximately 9KB, reliance on the presence
      of link-layer integrity checks may not be possible over links such as
      tunnels. Moreover, the segment contents of a received parcel may arrive
      in an incomplete and/or rearranged order with respect to their original
      packaging.</t>

      <t>For these reasons, upper layers must include an individual integrity
      check with each segment in the parcel, with a strength compatible with
      the segment length. The receiver must then verify the integrity check
      on a per-segment basis, discarding any corrupted segments and treating
      each as a loss event.</t>
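
      <t>A minimal sketch of per-segment verification follows. It uses a
      CRC-32 trailer purely as an illustrative stand-in; this document does
      not select an integrity algorithm, and a real upper layer protocol
      would use its own check of suitable strength. The function names are
      assumptions.</t>

      <figure>
        <artwork><![CDATA[
```python
import struct
import zlib


def protect(segment):
    """Append a 4-octet CRC-32 trailer (illustrative choice only)."""
    return segment + struct.pack("!I", zlib.crc32(segment))


def accept_segments(received):
    """Verify each segment on its own; drop those that fail the check."""
    good = []
    for seg in received:
        if len(seg) < 4:
            continue  # too short to hold a trailer: treat as lost
        body = seg[:-4]
        (crc,) = struct.unpack("!I", seg[-4:])
        if zlib.crc32(body) == crc:
            good.append(body)
        # otherwise the segment is discarded and counted as a loss event
    return good
```
]]></artwork>
      </figure>

      <t>Since each segment is checked independently, one corrupted segment
      does not invalidate the others carried in the same parcel.</t>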
    </section>

    <section anchor="issues" title="RFC2675 Updates">
      <t>Section 3 of <xref target="RFC2675"/> provides a list of certain
      conditions to be considered as errors. In particular:<list style="empty">
          <t>error: IPv6 Payload Length != 0 and Jumbo Payload option
          present</t>

          <t>error: Jumbo Payload option present and Jumbo Payload Length &lt;
          65,536</t>
        </list></t>

      <t>Implementations that obey this specification ignore these conditions
      and do not consider them as errors.</t>
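
      <t>The effect of this update on a receiver can be sketched as follows.
      This is a non-normative example; the function name, the string labels
      and the parcels_enabled flag are assumptions used to contrast the
      RFC 2675 error checks with the relaxed behavior described here.</t>

      <figure>
        <artwork><![CDATA[
```python
def classify(length_field, jumbo_payload_length=None, parcels_enabled=True):
    """Classify a received packet from its two length indications.

    length_field is the [Total, Payload] Length value; the second argument
    is None when no Jumbo Payload option is present.
    """
    if jumbo_payload_length is None:
        return "packet"
    if parcels_enabled:
        # The two RFC 2675 error conditions are no longer treated as errors.
        return "jumbogram" if length_field == 0 else "parcel"
    if length_field != 0 or jumbo_payload_length < 65536:
        return "error"  # behavior of an unmodified RFC 2675 receiver
    return "jumbogram"
```
]]></artwork>
      </figure>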
    </section>

    <section anchor="jumbo" title="IPv4 Jumbograms">
      <t>By defining a new IPv4 Jumbo Payload option, this document also
      implicitly enables an IPv4 jumbogram service defined as an IPv4 packet
      with Total Length set to 0 and with a Jumbo Payload option among the IPv4
      header options. All aspects of IPv4 jumbograms (including length
      determination for upper layer protocols) are exactly the same as for
      IPv6 jumbograms as specified in <xref target="RFC2675"/>.</t>
    </section>

    <section anchor="implement" title="Implementation Status">
      <t>Widely deployed implementations include services such as TCP
      Segmentation Offload (TSO) and Generic Segmentation/Receive Offload
      (GSO/GRO). These provide a robust (but not standardized) service that
      has been shown to improve performance in many instances.
      Implementation of the IP parcel service is a work in progress.</t>
    </section>

    <section anchor="iana" title="IANA Considerations">
      <t>The IANA is instructed to allocate a new IP option code in the 'ip
      option numbers' registry for the "JUMBO - IPv4 Jumbo Payload" option.
      The Copy and Class fields must both be set to 0, and the Number and
      Value fields must both be set to 'TBD1 (to be assigned by IANA)'. The
      reference must be set to this document (RFCXXXX).</t>

      <t>The IANA is instructed to allocate a new TCP Option Kind Number in
      the 'tcp-parameters' registry for the "IP Parcel Permitted" option. The
      Kind must be set to 'TBD2' (to be assigned by IANA) and the Length must
      be set to 2. The reference must be set to this document (RFCXXXX).</t>
    </section>

    <section anchor="secure" title="Security Considerations">
      <t>Communications networking security is necessary to preserve
      confidentiality, integrity and availability.</t>
    </section>

    <section anchor="ack" title="Acknowledgements">
      <t>This work was inspired by ongoing AERO/OMNI/DTN investigations. The
      concepts were further motivated through discussions on the intarea
      list.</t>

      <t>A considerable body of work over recent years has produced useful
      "segmentation offload" facilities available in widely-deployed
      implementations.</t>

    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include="reference.RFC.8174"?>

      <?rfc include="reference.RFC.2675"?>

      <?rfc include="reference.RFC.0791"?>

      <?rfc include="reference.RFC.8200" ?>
    </references>

    <references title="Informative References">

      <?rfc include="reference.RFC.0793"?>

      <?rfc include="reference.I-D.templin-6man-omni"?>

      <?rfc include="reference.RFC.9000"?>

      <?rfc include="reference.RFC.5326"?>

      <?rfc include="reference.I-D.ietf-tcpm-rfc793bis"?>

      <?rfc include="reference.I-D.templin-dtn-ltpfrag"?>

      <reference anchor="QUIC">
        <front>
          <title>Accelerating UDP packet transmission for QUIC,
          https://blog.cloudflare.com/accelerating-udp-packet-transmission-for-quic/</title>

          <author fullname="Alessandro Ghedini" initials="A."
                  surname="Ghedini">
            <organization/>
          </author>

          <date day="8" month="January" year="2020"/>
        </front>
      </reference>

      <reference anchor="BIG-TCP">
        <front>
          <title>BIG TCP, Netdev 0x15 Conference (virtual),
          https://netdevconf.info/0x15/session.html?BIG-TCP</title>

          <author fullname="Eric Dumazet" initials="E." surname="Dumazet">
            <organization/>
          </author>

          <date day="31" month="August" year="2021"/>
        </front>
      </reference>
    </references>
  </back>
</rfc>
