Transport Area Working Group                                  B. Briscoe
Internet-Draft                                                        BT
Intended status: Standards Track                           June 30, 2007
Expires: January 1, 2008


            Layered Encapsulation of Congestion Notification
                   draft-briscoe-tsvwg-ecn-tunnel-00

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 1, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document redefines how the explicit congestion notification
   (ECN) field of the outer IP header of a tunnel should be constructed.
   It brings all IP in IP tunnels (v4 or v6) into line with the way
   IPsec tunnels now construct the ECN field, ensuring that the outer
   header reveals any congestion experienced so far on the path.  It
   specifies the default ECN tunneling behaviour for any Diffserv per-
   hop behaviour (PHB), but also gives general principles to guide the
   design of alternate congestion marking behaviours for specific PHBs


Briscoe                  Expires January 1, 2008                [Page 1]

Internet-Draft               ECN Tunnelling                    June 2007


   and for lower layer congestion notification schemes.


Table of Contents

   1.  Introduction
   2.  Requirements notation
   3.  Design Constraints
     3.1.  Security Constraints
     3.2.  Control Constraints
     3.3.  Management Constraints
   4.  Design Principles
   5.  Default ECN Tunnelling Rules
   6.  Backward Compatibility
   7.  Changes from Earlier RFCs
   8.  IANA Considerations
   9.  Security Considerations
   10. Conclusions
   11. Acknowledgements
   12. Comments Solicited
   13. References
     13.1. Normative References
     13.2. Informative References
   Appendix A.  In-path Load Regulation
   Author's Address
   Intellectual Property and Copyright Statements


1.  Introduction

   This document redefines how the explicit congestion notification
   (ECN) field [RFC3168] of the outer IP header of a tunnel should be
   constructed.  It brings all IP in IP tunnels (v4 or v6) into line
   with the way IPsec tunnels [RFC4301] now construct the ECN field,
   ensuring that the outer header reveals any congestion experienced so
   far on the path.  Although this memo focuses on IP in IP tunnelling
   it also gives generalised advice for any encapsulation by lower layer
   headers.

   ECN allows a congested resource to notify the onset of congestion
   without having to drop packets, by explicitly marking a proportion of
   packets with the congestion experienced (CE) codepoint.  Congestion
   notification is unusual in that it propagates from the physical layer
   upwards to the transport layer, because congestion is exhaustion of a
   physical resource.  The transport layer can directly detect loss of a
   packet (or frame) by a lower layer.  But if a lower layer marks a
   packet (or frame) to notify incipient congestion, this marking has to
   be explicitly copied up the layers at every header decapsulation.
   So, at each decapsulation of an outer (lower layer) header a
   congestion marking has to be arranged to propagate into the forwarded
   (upper layer) header.  It must continue upwards until it reaches the
   destination transport, which should feed congestion notification back
   to the source transport.

   Note that often lower layer resources are arranged to be protected by
   higher layer buffers, so instead of blocking occurring at the lower
   layer, it occurs when the higher layer queue overflows.  Thus, non-
   blocking link and physical layer technologies do not have to
   implement congestion notification, which can be introduced solely in
   IP layer active queue management (AQM).  However, if we want to use
   congestion notification, we have to arrange for it to be explicitly
   copied up the layers when IP is tunnelled in IP (and if a particular
   link layer technology isn't protected from blocking by network layer
   queues).

   IPsec tunnel mode is a specific form of tunnelling that can hide the
   inner headers.  Because the ECN field has to be mutable, it cannot be
   covered by IPsec encryption or authentication calculations.
   Therefore concern has been raised in the past that the ECN field
   could be used as a low bandwidth covert channel to communicate with
   someone on the unprotected public Internet even if an end-host is
   restricted to only communicate with the public Internet through an
   IPsec gateway.  However, the recently updated version of IPsec
   [RFC4301] chose not to block this covert channel, deciding that the
   threat could be managed given the channel bandwidth is so limited
   (ECN is a 2-bit field).

   An unfortunate sequence of standards actions leading up to this
   latest change in IPsec has left us with nearly the worst of all
   possible combinations of outcomes, despite the best endeavours of
   everyone concerned.  Even though information about congestion
   experienced on the upstream path has various uses if it is revealed
   in the outer header of a tunnel, when ECN was standardised [RFC3168]
   it was decided that all IP in IP tunnels should hide upstream
   congestion information simply to avoid the extra complexity of two
   different mechanisms for IPsec and non-IPsec tunnels.  However, now
   that [RFC4301] IPsec tunnels deliberately no longer hide this
   information, we are left in the perverse position where non-IPsec
   tunnels still hide congestion information unnecessarily.  This
   document is designed to correct that anomaly.

   Specifically, RFC3168 says that, if a tunnel supports ECN (termed a
   'full-functionality' ECN tunnel), the tunnel ingress must not copy a
   CE marking from the inner header into the outer header that it
   creates.  Instead the tunnel ingress has to set the ECN field of the
   outer header to ECT(0) (i.e. codepoint 10).  We term this 'resetting'
   a CE codepoint.  However, RFC4301 reverses this, stating that the
   tunnel ingress must simply copy the ECN field from the inner to the
   outer header.  The main purpose of this document is to carry over
   this new relaxed attitude to covert channels from IPsec to all IP in
   IP tunnels, so all tunnel ingress nodes consistently copy the ECN
   field.
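
   The difference between the two ingress behaviours can be sketched
   in Python as below.  This is a minimal illustration, not text from
   either RFC: the enum and function names are hypothetical, and only
   the codepoint values (Not-ECT = 00, ECT(1) = 01, ECT(0) = 10,
   CE = 11) are taken from [RFC3168].

```python
from enum import Enum


class ECN(Enum):
    """The four codepoints of the 2-bit ECN field, as defined in RFC3168."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def ingress_rfc3168_full(inner: ECN) -> ECN:
    """RFC3168 full-functionality ingress: 'reset' CE to ECT(0), else copy."""
    return ECN.ECT_0 if inner is ECN.CE else inner


def ingress_rfc4301(inner: ECN) -> ECN:
    """RFC4301 ingress: simply copy the inner ECN field into the outer."""
    return inner


if __name__ == "__main__":
    # Only the CE row differs between the two behaviours.
    for cp in ECN:
        print(f"inner {cp.name:<7}  RFC3168 outer {ingress_rfc3168_full(cp).name:<7}"
              f"  RFC4301 outer {ingress_rfc4301(cp).name}")
```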

   The rest of the document deals with the knock-on effects of this
   apparently minor change.  It is organised as follows:

   o  S.5 of RFC3168 permits the Diffserv codepoint (DSCP) [RFC2474] to
      'switch in' different behaviours for marking the ECN field, just
      as it switches in different per-hop behaviours (PHBs) for
      scheduling.  Therefore we cannot discuss only the default ECN
      protocol given in RFC3168; we also need to give guidance for
      alternative marking schemes.  So in Section 3 we lay out the
      design constraints when tunnelling congestion notification.

   o  Then in Section 4 we resolve the tensions between these
      constraints to give general design principles on how a tunnel
      should process congestion notification; principles that could
      apply to any marking behaviour for any PHB, not just the default
      in RFC3168.  In particular, we examine the underlying principles
      behind whether CE should be reset or copied into the outer header
      at the ingress to a tunnel--or indeed at the ingress of any
      layered encapsulation of headers with congestion notification
      fields.

   o  Section 5 then confirms the precise rules for the default ECN
      tunnelling behaviour based on the above design principles.  These
      rules apply to all PHBs, unless stated otherwise in the
      specification of a PHB.  There is no requirement for a PHB to
      state anything about ECN behaviour if the default behaviour is
      sufficient.

   o  Extending the new IPsec tunnel ingress behaviour to all IP in IP
      tunnels causes one further knock-on effect that is dealt with in
      Section 6 on Backward Compatibility.  If one end of an IPsec
      tunnel is compliant with [RFC4301], assuming IKEv2 key management
      is used, the other end can be guaranteed to also be [RFC4301]
      compliant.  So there is no backward compatibility problem with
      IKEv2 RFC4301 IPsec tunnels.  But once we extend our scope to any
      IP in IP tunnel, we have to cater for the possibility that a
      tunnel ingress compliant with this specification is sending to an
      egress that doesn't even understand ECN (e.g. a legacy [RFC2003]
      tunnel egress).  If a tunnel ingress copied incoming ECN-capable
      headers into outer headers, then a legacy tunnel egress would
      discard any congestion markings added to the outer header within
      the tunnel.  ECN-capable traffic sources would not see any
      congestion feedback and instead continually ratchet up their share
      of the bandwidth without realising that cross-flows from other ECN
      sources were continually having to ratchet down.

   The scope of this document is all IP in IP tunnelling, irrespective
   of whether IPv4 or IPv6 is used for either of the inner and outer
   headers.  The document only concerns wire protocol processing at
   tunnel endpoints and makes no changes or recommendations concerning
   algorithms for congestion marking or congestion response.  The
   general design principles of Section 4 may also be useful when any
   datagram/packet/frame with a congestion notification capability is
   encapsulated by a connectionless outer header [BBnet] that might also
   support a congestion notification capability in the future as
   discussed in S.9.3 of [RFC3168] (e.g. IP encapsulated in L2TP
   [RFC2661], GRE [RFC1701] or PPTP [RFC2637]).  However, of course, the
   IETF does not have standards authority over every link or tunnel
   protocol, so this document focuses only on IP in IP.
   [I-D.ietf-tsvwg-ecn-mpls] applies these principles to IP in MPLS and
   to MPLS in MPLS.


2.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].


3.  Design Constraints

   Tunnel processing of a congestion notification field has to meet
   congestion control needs without creating new information security
   vulnerabilities (if information security is required).

3.1.  Security Constraints

   Information security can be assured by using various end to end
   security solutions (including IPsec in transport mode [RFC4301]), but
   a commonly used scenario involves the need to communicate between two
   physically protected domains across the public Internet.  In this
   case there are certain management advantages to using IPsec in tunnel
   mode solely across the publicly accessible part of the path.  The
   path followed by a packet then crosses security 'domains'; the ones
   protected by physical or other means before and after the tunnel and
   the one protected by an IPsec tunnel across the otherwise unprotected
   domain.  We will use the scenario in Figure 1 where endpoints 'A' and
   'B' communicate through a tunnel with ingress 'I' and egress 'E'
   within physically protected edge domains across an unprotected
   internetwork where there may be 'men in the middle', M.

             physically       unprotected     physically
         <-protected domain-><--domain--><-protected domain->
         +------------------+            +------------------+
         |                  |      M     |                  |
         |    A-------->I=========>==========>E-------->B   |
         |                  |            |                  |
         +------------------+            +------------------+
                        <----IPsec secured---->
                                tunnel

                      Figure 1: IPsec Tunnel Scenario

   IPsec encryption is typically used to prevent 'M' seeing messages
   from 'A' to 'B'.  IPsec authentication is used to prevent 'M'
   masquerading as the sender of messages from 'A' to 'B' or altering
   their contents.  But 'I' can also use IPsec tunnel mode to allow 'A'
   to communicate with 'B', but impose encryption to prevent 'A' leaking
   information to 'M'.  Or 'E' can insist that 'I' uses tunnel mode
   authentication to prevent 'M' communicating information to 'B'.
   Mutable IP header fields such as the ECN field (as well as the TTL/
   Hop Limit and DS fields) cannot be included in the cryptographic
   calculations of IPsec.  Therefore, if 'I' encrypts but copies these
   mutable fields into the outer header that is exposed across the
   tunnel, it will have allowed a covert channel from 'A' to 'M'.  And if
   'E' copies these fields from the outer header to the inner, even if
   it validates authentication from 'I', it will have allowed a covert
   channel from 'M' to 'B'.

   ECN at the IP layer is designed to carry information about congestion
   from a congested resource to some downstream node that will feed the
   information back somehow to the point upstream of the congestion that
   can regulate the load on the congested resource.  In terms of the
   above scenario, ECN is effectively intended to create an information
   channel from 'M' to 'B', for 'B' to forward to 'A'.  Therefore the
   goals of IPsec and ECN are mutually incompatible.

   With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says,
   "controls are provided to manage the bandwidth of this [covert]
   channel".  Using the ECN processing rules of RFC4301, the channel
   bandwidth is two bits per datagram from 'A' to 'M' and one bit per
   datagram from 'M' to 'A' because 'E' limits the combinations it will
   copy.  In both cases the covert channel bandwidth is further reduced
   by noise from any real congestion marking.  RFC4301 therefore implies
   that these covert channels are sufficiently limited to be considered
   a manageable threat.  However, with respect to the larger 6-bit DS
   field, the same section of RFC4301 says not copying is the default,
   but a configuration option can allow copying "to allow a local
   administrator to decide whether the covert channel provided by
   copying these bits outweighs the benefits of copying".  Of course, an
   administrator considering copying of the DS field has to take into
   account that it could be concatenated with the ECN field, giving an
   8-bit per-datagram channel.

3.2.  Control Constraints

   Congestion control requires that any congestion notification marked
   into packets by a resource will be able to traverse a feedback loop
   back to a node capable of controlling the load on that resource.  To
   avoid ambiguity later, rather than calling this node the data source,
   we will call it the Load Regulator.  This will allow us to deal with
   exceptional cases where load is not regulated by the data source, but
   usually the two will be synonymous.  Note the term "a node _capable
   of_ controlling the load" deliberately includes a source application
   that doesn't actually control the load but ought to (e.g. an
   application without congestion control that uses UDP).

              A--->R--->I=========>M=========>E-------->B

                     Figure 2: Simple Tunnel Scenario

   We now consider a similar tunnelling scenario to the IPsec one just
   described, but without the different security domains, so that we
   can focus on ensuring the control loop and management monitoring can
   work (Figure 2).  If we want resources in the tunnel to be able to
   explicitly notify congestion and the feedback loop is from 'B' to
   'A', it will certainly be necessary for 'E' to copy any CE marking
   from the outer header to the inner header for onward transmission to
   'B', otherwise congestion notification from resources like 'M' cannot
   be fed back to the Load Regulator ('A').  But it doesn't seem
   necessary for 'I' to copy CE markings from the inner to the outer
   header.  For instance, if resource 'R' is congested, it can send
   congestion information to 'B' using the congestion field in the inner
   header without 'I' copying the congestion field into the outer header
   and 'E' copying it back to the inner header.  'E' can then write any
   additional congestion marking introduced across the tunnel into the
   congestion field of the inner header.

   Indeed, this arrangement can be extended to multi-level congestion
   marking (such as that proposed for PCN [PCN-arch]) as long as all the
   marks have unambiguously ranked values.  For instance, if a
   hypothetical multi-level marking scheme for PCN had PCN-capable
   codepoints ranked 1, 2 and 3, then, if 'I' reset the outer congestion
   field to the lowest ranked value that is PCN-capable (1), 'E' would
   simply write the highest ranked of the inner and outer congestion
   markings into the forwarded header.  So, if the inner
   marking on arrival at 'I' was 3 and 'I' reset the outer to 1, but 'M'
   subsequently set it to 2, then the header forwarded by 'E' would be
   max(3,2) = 3.
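
   The ranked-marking rule above can be sketched as follows, assuming
   (as the example implies) that a resource inside the tunnel can only
   raise a marking, never lower it.  The ranks 1 to 3 are the
   hypothetical PCN codepoints of the example and the function names
   are illustrative only.  The sketch also shows that 'E' forwards the
   same result whether 'I' reset or copied the outer field.

```python
LOWEST_CAPABLE_RANK = 1  # hypothetical: lowest ranked PCN-capable codepoint


def ingress(inner_rank: int, reset: bool) -> int:
    """'I' creates the outer field: either reset it or copy the inner mark."""
    return LOWEST_CAPABLE_RANK if reset else inner_rank


def mark(outer_rank: int, rank_at_resource: int) -> int:
    """A congested resource such as 'M' can only raise the marking (assumed)."""
    return max(outer_rank, rank_at_resource)


def egress(inner_rank: int, outer_rank: int) -> int:
    """'E' forwards the highest ranked of the inner and outer markings."""
    return max(inner_rank, outer_rank)


inner = 3                       # marking on arrival at 'I'
for reset in (True, False):
    outer = ingress(inner, reset)
    outer = mark(outer, 2)      # 'M' marks the outer header with rank 2
    print("reset" if reset else "copy", egress(inner, outer))
# prints "reset 3" then "copy 3"
```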

   It might be useful for the tunnel egress to be able to tell whether
   congestion occurred across a tunnel or upstream of it.  If outer
   header congestion marking was reset at the tunnel ingress ('I'), by
   the end of a tunnel ('E') the outer headers would indicate congestion
   experienced across the tunnel ('I' to 'E'), while the inner header
   would indicate congestion upstream of 'I'.  But the same information
   could be gleaned even if the tunnel ingress copied the inner to the
   outer headers.  By the end of the tunnel ('E'), any packet with an
   _extra_ mark in the outer header relative to the inner header would
   indicate congestion across the tunnel ('I' to 'E'), while the inner
   header would still indicate congestion upstream of 'I'.
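
   For the default two-level ECN marking (CE or not), the inference at
   the egress just described might look like the sketch below.  It
   assumes the egress knows whether the ingress copied or reset the
   outer field; the names are hypothetical.  Note one limitation of the
   copying case: if the inner header is already CE, a further mark
   inside the tunnel changes nothing visible, so that particular packet
   cannot reveal congestion across the tunnel.

```python
from typing import NamedTuple


class Congestion(NamedTuple):
    upstream: bool       # congestion marked before the tunnel ingress 'I'
    in_tunnel: bool      # congestion visibly marked between 'I' and 'E'


def locate_congestion(inner_ce: bool, outer_ce: bool,
                      ingress_copied: bool) -> Congestion:
    """Infer where a packet was marked from the headers arriving at 'E'."""
    if ingress_copied:
        # The outer header started as a copy of the inner one, so only an
        # *extra* CE mark in the outer header is attributable to the tunnel.
        return Congestion(upstream=inner_ce,
                          in_tunnel=outer_ce and not inner_ce)
    # The outer header was reset at 'I', so any CE on it arose in the tunnel.
    return Congestion(upstream=inner_ce, in_tunnel=outer_ce)


# Inner not CE, outer CE: marked across the tunnel under either policy.
print(locate_congestion(False, True, ingress_copied=True))
print(locate_congestion(False, True, ingress_copied=False))
```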

   All this shows that 'E' can preserve the control loop irrespective of
   whether 'I' copies congestion notification into the outer header or
   resets it.

3.3.  Management Constraints

   As well as control, there are also management constraints.
   Specifically, a management system may monitor congestion markings in
   passing packets, perhaps at the border between networks as part of a
   service level agreement.  For instance, monitors at the borders of
   autonomous systems may need to measure how much congestion has
   accumulated since the original source to determine between them how
   much of the congestion is contributed by each domain.

   Therefore it should be clear how far back in the path the congestion
   markings have accumulated from.  In this document we term this the
   baseline of the congestion marking, i.e. the source of the layer that
   last reset rather than copied the congestion notification field when
   creating an outer header.  Given that some tunnels cross domain
   borders (e.g. consider that 'M' in Figure 2 is monitoring a border),
   it is therefore desirable for 'I' to copy congestion accumulated so
   far into the outer headers exposed across the tunnel.

   Appendix A discusses various scenarios where the Load Regulator lies
   in-path, not at the source host as we would typically expect.  It
   concludes that the baseline for congestion notification should be
   determined by where the Load Regulator function is, whether it is at
   the source host or within the path.  Therefore every tunnel ingress
   should copy the ECN field into the outer header it creates unless it
   is also a Load Regulator, in which case it should reset any CE
   markings, which is an exception to the normal copying rule for a
   tunnel ingress.


4.  Design Principles

   The constraints from the three perspectives of security, control and
   management in Section 3 are somewhat in tension as to whether a
   tunnel ingress should copy congestion markings into the outer header
   it creates or reset them.  From the control perspective either
   copying or resetting works.  From the management perspective copying
   is preferable (with the exception of an in-path load regulator).
   From the security perspective resetting is preferable but copying is
   now considered acceptable given the bandwidth of a 2-bit covert
   channel can be managed.

   Therefore an outer encapsulating header capable of carrying
   congestion markings SHOULD reflect accumulated congestion since the
   last interface designed to regulate load (the Load Regulator).  This
   implies congestion notification SHOULD be copied into each new outer
   encapsulating header that supports it, except at an in-path Load
   Regulator.  An in-path Load Regulator knows its function is to
   regulate load, so if it also acts as the ingress to a tunnel, in
   every new outer header it creates it MUST reset any congestion
   marking.

   The Load Regulator is the node to which congestion feedback should be
   returned by the next downstream node with a transport layer function
   (typically but not always the data receiver).  The Load Regulator is
   not always (or even typically) the same thing as the node identified
   by the source address of the outermost exposed header.  In general
   the addressing of the outermost encapsulation header says nothing
   about the identifiers of either the upstream or the downstream
   transport layer functions.  As long as the transport functions know
   each other's addresses, they don't have to be identified in the
   network layer or in any link layer.  It was only a convenience that a
   TCP receiver assumed that the address of the source transport is the
   same as the network layer source address of a packet it receives.

   More generally, the return transport address could be identified
   solely in the transport layer protocol.  For instance, a signalling
   protocol like RSVP [RFC2205] breaks up a path into transport layer
   hops and informs each hop of the address of its transport layer
   neighbour without any need to identify these hops in the network
   layer.  RSVP can be arranged so that these transport layer hops are
   bigger than the underlying network layer hops.  The host identity
   protocol (HIP) architecture [RFC4423] also supports the same
   principled separation (for mobility amongst other things), where the
   transport layer receiver identifies the transport layer sender using
   an identifier provided by the transport layer, which gets mapped to a
   network layer address below the transport layer.

   Note that this principle deliberately doesn't require a packet header
   to reveal the origin address of the baseline that congestion
   notification has accumulated from.  It is not necessary for the
   network and lower layers to know the address of the Load Regulator.
   Only the destination transport needs to know that.  With congestion
   notification, the network and link layers only notify congestion
   forwards; they aren't involved in feeding it backwards.  If they are,
   e.g. backward congestion notification (BCN) in Ethernet [802.1au],
   that should be considered as a transport function added to the lower
   layer, which must sort out its own addressing.  Indeed, this is one
   reason why ICMP source quench is now deprecated [RFC1254]; when
   congestion occurs within a tunnel it is complex (particularly in the
   case of IPsec tunnels) to return the ICMP messages beyond the tunnel
   ingress back to the Load Regulator.

   Similarly, if a management system is monitoring congestion and needs
   to know the baseline of congestion notification, the management
   system has to find this out from the transport; in general it cannot
   tell solely by looking at the network or link layer headers.

   We have said that a tunnel ingress that is not a Load Regulator
   SHOULD (as opposed to MUST) copy incoming congestion notification
   into an outer encapsulating header that supports it.  In the case of
   2-bit ECN, the IETF security area have deemed the benefit always
   outweighs the risk.  Therefore for 2-bit ECN we can and we will say
   'MUST' (Section 5).  But in this section where we are setting down
   general design principles, we leave it as a 'SHOULD'.  This allows
   for future multi-bit congestion notification fields where the risk
   from the covert channel created by copying congestion notification
   might outweigh the congestion control benefit of copying.


5.  Default ECN Tunnelling Rules

   The following ECN tunnel processing rules are the default for a
   packet with any DSCP.  If required, different ECN processing rules
   MAY be defined for the appropriate Diffserv PHB using the guidelines
   in Section 4.

   When a tunnel ingress creates an encapsulating IP header, the 2-bit
   ECN field of the inner IP header MUST be copied into the outer IP
   header, for all types of IP in IP tunnel (except if the tunnel
   ingress is in compatibility mode--see Section 6).  If the tunnel
   ingress is also a Load Regulator, it MUST instead reset any CE
   codepoint in the outer header to ECT(0).
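
   A minimal sketch of this encapsulation rule, including the
   compatibility-mode exception of Section 6, follows.  It assumes that
   'reset' has the meaning given in Section 1 (a CE codepoint is
   replaced by ECT(0); other codepoints are copied).  The enum,
   function and parameter names are hypothetical.

```python
from enum import Enum


class ECN(Enum):
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def encapsulate_ecn(inner: ECN, *, load_regulator: bool = False,
                    compatibility_mode: bool = False) -> ECN:
    """Return the ECN field for a new outer IP header."""
    if compatibility_mode:
        # Section 6: the egress may not understand ECN, so never send an
        # ECN-capable outer header.
        return ECN.NOT_ECT
    if load_regulator and inner is ECN.CE:
        # An in-path Load Regulator restarts the congestion baseline.
        return ECN.ECT_0
    # Default for any IP in IP tunnel: copy the inner ECN field.
    return inner


print(encapsulate_ecn(ECN.CE).name)                              # CE
print(encapsulate_ecn(ECN.CE, load_regulator=True).name)         # ECT_0
print(encapsulate_ecn(ECN.ECT_1, compatibility_mode=True).name)  # NOT_ECT
```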

   To decapsulate the inner header at the tunnel egress, the outgoing
   inner header MUST be calculated from the combination of the incoming
   inner and outer headers setting the outgoing ECN field to the
   codepoints displayed in the body of Table 1.

                      +--Incoming Outer Header---

   +------------------+---------+------------+------------+------------+
   |  Incoming Inner  | Not-ECT |   ECT(0)   |   ECT(1)   |     CE     |
   |      Header      |         |            |            |            |
   +------------------+---------+------------+------------+------------+
   |     Not-ECT      | Not-ECT | drop (!!!) | drop (!!!) | drop (!!!) |
   |      ECT(0)      |  ECT(0) |   ECT(0)   |   ECT(0)   |     CE     |
   |      ECT(1)      |  ECT(1) |   ECT(1)   |   ECT(1)   |     CE     |
   |        CE        |    CE   |  CE (!!!)  |  CE (!!!)  |     CE     |
   +------------------+---------+------------+------------+------------+

                      +-----Outgoing Header------

                      Table 1: IP in IP Decapsulation

   The exclamation marks '(!!!)' in Table 1 indicate that this
   combination of inner and outer headers should not be possible if only
   legal transitions have taken place.  So, the decapsulator should drop
   or mark the ECN field as the table specifies, but it MAY also raise
   an appropriate alarm.  It MUST NOT raise an alarm so often that the
   illegal combinations would amplify into a flood of alarm messages.
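
   Table 1 and the alarm rule can be captured in a few lines.  The
   sketch below is illustrative only: the function names are
   hypothetical, a return value of None stands for 'drop', and the
   one-alarm-per-interval policy is just one assumed way of meeting the
   requirement not to flood alarm messages.

```python
from enum import Enum
from typing import Optional, Tuple


class ECN(Enum):
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def decapsulate_ecn(inner: ECN, outer: ECN) -> Tuple[Optional[ECN], bool]:
    """Apply Table 1.

    Returns (outgoing ECN field, or None to drop the packet; True if the
    combination is one of those marked '(!!!)').
    """
    if outer is ECN.NOT_ECT:
        return inner, False              # first column: keep the inner field
    if inner is ECN.NOT_ECT:
        return None, True                # Not-ECT inner, ECN-capable outer
    if outer is ECN.CE:
        return ECN.CE, False             # propagate CE from the outer header
    # Outer is ECT(0) or ECT(1): keep the inner field, but an inner CE
    # under an ECT outer should be impossible after legal transitions.
    return inner, inner is ECN.CE


class AlarmLimiter:
    """Raise at most one alarm per min_interval seconds (assumed policy)."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval
        self._last_alarm = float("-inf")
        self.suppressed = 0

    def should_alarm(self, now: float) -> bool:
        if now - self._last_alarm >= self.min_interval:
            self._last_alarm = now
            return True
        self.suppressed += 1
        return False


limiter = AlarmLimiter(min_interval=1.0)
for t, (inner, outer) in enumerate([(ECN.ECT_0, ECN.CE),
                                    (ECN.CE, ECN.ECT_1),
                                    (ECN.NOT_ECT, ECN.ECT_0)]):
    out, illegal = decapsulate_ecn(inner, outer)
    if illegal and limiter.should_alarm(now=t * 0.5):
        print(f"alarm: inner {inner.name}, outer {outer.name}")
    print("drop" if out is None else out.name)
```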


6.  Backward Compatibility

   A legacy tunnel egress may not know how to process an ECN field, so
   it will most likely simply disregard the ECN field of all outer
   headers.  Therefore, unless a compliant tunnel ingress has
   established that the tunnel egress understands ECN processing, it
   MUST only send packets with the ECN field set to Not-ECT in the outer
   header.  Congested resources within the tunnel will then drop rather
   than mark these packets, so congestion is still signalled by loss.
   Otherwise, if ECN-capable outer headers were sent towards a legacy
   egress, the egress would dangerously discard any congestion marking
   added within the tunnel.

   A tunnel ingress may establish whether its tunnel egress will
   understand ECN processing by configuration or by negotiation.  Note
   that a [RFC4301] tunnel ingress that has used IKEv2 key management
   [RFC4306] can guarantee that the tunnel egress is also RFC4301-
   compliant and therefore need not negotiate ECN capabilities.

   To be compliant with this specification a tunnel ingress that does
   not know the egress ECN capability (e.g. by configuration) MUST
   implement a 'normal' mode and a 'compatibility' mode, and it MUST
   initiate each negotiated tunnel in compatibility mode.  On the other
   hand, a compliant tunnel egress MUST merely implement the one
   behaviour in Section 5, which we term 'full-functionality' mode.

   Before switching to normal mode, a compliant tunnel ingress that does
   not know the egress ECN capability (e.g. by configuration) MUST
   negotiate with the tunnel egress to establish whether the egress is
   in full functionality mode.  If the egress is in full functionality
   mode, the ingress puts itself into normal mode.  In normal mode the
   ingress follows the encapsulation rule in Section 5 (i.e. it copies
   the inner ECN field into the outer header).  If the egress is not in
   full-functionality mode or doesn't understand the question, the
   tunnel ingress MUST remain in compatibility mode.

   A tunnel ingress in compatibility mode MUST set all outer headers to
   Not-ECT.
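
   The ingress rules above can be sketched as a small per-tunnel state
   machine.  This is a minimal illustration, not part of this
   specification: the class and method names, and the string
   "full-functionality" standing for the egress's negotiation reply,
   are hypothetical.  The sketch models one object per tunnel, so each
   tunnel keeps its own mode setting.

```python
from enum import Enum

# Two-bit ECN codepoints as defined in RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


class IngressMode(Enum):
    COMPATIBILITY = "compatibility"
    NORMAL = "normal"


class TunnelIngress:
    """Hypothetical ingress state for ONE tunnel (one mode per tunnel)."""

    def __init__(self, egress_configured_compliant: bool = False) -> None:
        # If configuration already says the egress is compliant, start in
        # normal mode; otherwise start every negotiated tunnel in
        # compatibility mode.
        self.mode = (IngressMode.NORMAL if egress_configured_compliant
                     else IngressMode.COMPATIBILITY)

    def on_negotiation_reply(self, reply) -> None:
        # 'reply' is a hypothetical encoding of the egress answer:
        # "full-functionality" if the egress confirms that mode; any other
        # value, or None for no answer / not understood, leaves the mode
        # unchanged (i.e. the ingress remains in compatibility mode).
        if reply == "full-functionality":
            self.mode = IngressMode.NORMAL

    def outer_ecn(self, inner_ecn: int) -> int:
        """ECN field to place in the outer header of an encapsulated packet."""
        if self.mode is IngressMode.NORMAL:
            return inner_ecn   # normal mode: copy the inner ECN field
        return NOT_ECT         # compatibility mode: outer header is Not-ECT


# Usage: a tunnel whose egress capability has not been configured.
tunnel = TunnelIngress()
print(tunnel.outer_ecn(CE))   # 0 (Not-ECT): still in compatibility mode
tunnel.on_negotiation_reply("full-functionality")
print(tunnel.outer_ecn(CE))   # 3 (CE): normal mode copies the inner field
```

   Keeping the mode on a per-tunnel object rather than per node is a
   design choice of the sketch; a real implementation might store it in
   whatever per-tunnel state it already maintains.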

   The decapsulation rules for the egress of the tunnel in Section 5
   have been defined so that congestion control will still work safely
   if any of the earlier versions of ECN processing is used unilaterally
   at the encapsulating ingress of the tunnel.  If a tunnel ingress
   requests either limited-functionality or full-functionality mode, a
   decapsulating tunnel egress compliant with this specification MUST
   agree to the request, even though its behaviour will be the same in
   both cases.  For 'forward compatibility', a compliant tunnel egress
   MUST raise a warning about any request to enter a mode it does not
   recognise, but it can continue operating.  If no ECN-related mode is
   requested, no error or warning need be raised


Briscoe                  Expires January 1, 2008               [Page 12]

   as the egress behaviour is compatible with all the legacy ingress
   behaviours that don't negotiate capabilities.
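
   The egress side of the negotiation described above can be sketched
   as follows.  The mode names and the function are hypothetical labels
   for illustration; actual signalling would depend on whichever
   negotiation protocol the tunnel uses.  Declining an unrecognised
   request is also an assumption of this sketch, since the text only
   requires a warning.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("ecn-tunnel-egress")

# Hypothetical labels for the two modes this specification mentions.
RECOGNISED_MODES = {"limited-functionality", "full-functionality"}


def handle_mode_request(requested_mode=None) -> bool:
    """Return True if the egress agrees to the requested mode.

    Whatever is agreed, the egress keeps the single Section 5 behaviour.
    """
    if requested_mode is None:
        # Legacy ingress that does not negotiate: no error or warning.
        return True
    if requested_mode in RECOGNISED_MODES:
        # MUST agree to either request; behaviour is identical anyway.
        return True
    # Forward compatibility: warn about an unrecognised mode and keep
    # operating (declining the request is an assumption of this sketch).
    log.warning("unrecognised ECN tunnel mode requested: %s", requested_mode)
    return False


print(handle_mode_request("limited-functionality"))  # True
print(handle_mode_request("some-future-mode"))       # False, and a warning
```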

   Note that if a compliant node is the ingress for multiple tunnels, it
   will need to store a mode setting for each of them.  However, if a
   node is the egress for multiple tunnels, it need not store a mode
   setting for any of them, because a compliant egress can only be in
   one mode.


7.  Changes from Earlier RFCs

   The rule that a tunnel ingress MUST copy the inner ECN field into the
   outer header is a change to RFC3168 (unless the ingress is also a
   Load Regulator, in which case there is no change).

   The rules for calculating the outgoing ECN field on decapsulation at
   a tunnel egress are in line with the full functionality mode of ECN
   in RFC3168 and with RFC4301, except that neither identified the need
   to raise an alarm if the inner header was CE but the outer header was
   ECT.
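
   The decapsulation behaviour summarised above can be sketched as a
   function of the inner and outer ECN fields.  This is an illustrative
   sketch only, assuming that Section 5 follows the RFC3168
   full-functionality rules (including dropping a Not-ECT packet whose
   outer header is CE, as we understand RFC3168) plus the new alarm.
   Forwarding the inner CE unchanged when the alarm is raised, and the
   DROP sentinel, are assumptions of the sketch.

```python
# Two-bit ECN codepoints as defined in RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11
ECT = (ECT0, ECT1)
DROP = None  # hypothetical sentinel meaning "drop the packet"


def decapsulate(inner: int, outer: int):
    """Return (outgoing ECN field or DROP, alarm) at a tunnel egress."""
    if outer == CE:
        if inner == NOT_ECT:
            # The inner transport cannot understand CE, so signal
            # congestion by dropping instead.
            return DROP, False
        return CE, False          # propagate congestion to the inner header
    if inner == CE and outer in ECT:
        # The combination neither RFC identified: inner CE, outer ECT.
        return CE, True           # forward CE but raise an alarm
    # All other cases, including contradictory ECT values: keep the inner.
    return inner, False


print(decapsulate(CE, ECT0))      # (3, True)
```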

   The rules for how a tunnel ingress establishes whether the egress has
   full-functionality ECN capabilities are an update to RFC3168.  For
   all the typical cases, RFC4301 is not updated by the ECN capability
   check in this specification, because a typical RFC4301 tunnel
   ingress will have already established that it is talking to an
   RFC4301 tunnel egress (e.g. if it uses IKEv2).  However, there may be
   some corner cases (e.g. manual keying) where an RFC4301 tunnel
   ingress talks with an egress with limited-functionality ECN handling.
   For such corner cases, the requirement to use compatibility mode in
   this specification updates RFC4301.

   The optional ECN Tunnel field in the IPsec security association
   database (SAD) and the optional ECN Tunnel Security Association
   Attribute defined in RFC3168 are no longer needed.  The security
   association (SA) has no policy on ECN usage, because all RFC4301
   tunnels now support ECN without any policy choice.

   RFC3168 defines a (required) limited-functionality mode and an
   (optional) full-functionality mode for a tunnel, but RFC4301 does not
   need modes.  In this specification only the ingress might need two
   modes, whereas the modes of RFC3168 were properties of the pair of
   tunnel endpoints, agreed by negotiation.

   All these ECN processing rules update RFC2003 on IP in IP tunnelling.


8.  IANA Considerations

   This memo includes no request to IANA.


9.  Security Considerations

   Section 3.1 discusses the security constraints imposed on ECN tunnel
   processing.  The Design Principles of Section 4 trade off security
   (covert channels) against congestion monitoring and control.  In
   fact, ensuring congestion markings are not lost is itself another
   aspect of security, because if we allowed congestion notification to
   be lost, any attempt to enforce a response to congestion would be
   much harder.

   We keep the behaviour defined in both RFC3168 and RFC4301 where, if
   the inner and outer headers carry contradictory ECT values, the inner
   header is preserved for onward forwarding.  However, in writing this
   document we noticed that this behaviour would hide illegal
   suppression of congestion notification from the detection mechanism
   designed for this attack.  One reason two ECT codepoints were defined
   was to enable the source to detect if a CE marking had been applied
   and then subsequently removed.  The source could detect this by
   weaving a pseudo-random sequence of ECT(0) and ECT(1) values into a
   stream of packets [RFC3540].  With the rules as they stand in RFC3168
   and RFC4301, a CE marking could be added within a tunnel and then
   removed by a non-compliant node without detection, because the
   decapsulator discards the outer ECT value, which is the only evidence
   of such misbehaviour (e.g. a wrongly guessed ECT codepoint).
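
   The loophole can be illustrated by following one packet whose CE
   marking is erased inside a tunnel.  This sketch compares the
   decapsulation rule kept by this document with the rejected
   alternative in which an outer ECT value overwrites the inner one.
   The "nonce check" here is a deliberate simplification (the receiver
   reports the single ECT value it received); it is not the nonce-sum
   mechanism of [RFC3540], and all function names are hypothetical.

```python
# Two-bit ECN codepoints as defined in RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11
ECT = (ECT0, ECT1)


def decap_inner_wins(inner, outer):
    """Rule kept by this document: an outer CE propagates; otherwise a
    contradictory outer ECT value is ignored and the inner is kept."""
    if outer == CE and inner in ECT:
        return CE
    return inner


def decap_outer_ect_wins(inner, outer):
    """Rejected alternative: an outer ECT value overwrites the inner one."""
    if outer == CE and inner in ECT:
        return CE
    if outer in ECT and inner in ECT:
        return outer
    return inner


def suppression_detected(sent, guess, decap):
    """Can the source spot a CE mark erased inside the tunnel?"""
    outer = sent      # normal-mode ingress copies the inner ECN field
    outer = CE        # a congested node inside the tunnel marks CE
    outer = guess     # a non-compliant node erases CE with a guessed ECT
    delivered = decap(sent, outer)
    # Simplified check: the source compares the ECT value the receiver
    # reports against the value it originally sent.
    return delivered != sent


for decap in (decap_inner_wins, decap_outer_ect_wins):
    print(decap.__name__, [suppression_detected(s, g, decap)
                           for s in ECT for g in ECT])
# decap_inner_wins [False, False, False, False]
# decap_outer_ect_wins [False, True, True, False]
```

   With the inner header preserved, a wrong guess never reaches the
   source, so the erasure always goes unnoticed; under the rejected rule
   a wrong guess would be exposed.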

   We could have specified that an outer header value of ECT should
   overwrite a contradictory ECT value in the inner header to close this
   loophole.  But we chose not to, for two reasons: i) we wanted to
   avoid any changes to IPsec tunnelling behaviour; ii) allowing ECT
   values in the outer header to override the inner header would have
   increased the bandwidth of the covert channel through the egress
   gateway from 1 to 1.5 bits per datagram, potentially upsetting the
   consensus established in the security area that the bandwidth of
   this covert channel can now be safely managed.


10.  Conclusions

   This document updates the tunnelling treatment of RFC3168 ECN for all
   IP in IP tunnels to bring it into line with the new behaviour in the
   IPsec architecture of RFC4301.

   At the tunnel egress, header decapsulation for the default ECN
   marking behaviour is broadly unchanged except that one exceptional
   case (an inner header of CE with an outer header of ECT) now raises
   an alarm.  At the ingress, for all forms of IP in IP tunnel,
   encapsulation has been brought into line with the new IPsec rules in
   RFC4301, which copy rather than reset CE markings when creating outer
   headers.  Previously, upstream congestion information was not
   revealed in the outer header, which limited the scope of some
   management monitoring techniques and prevented certain active queue
   management algorithms from taking account of upstream congestion
   markings.  The change ensures that all IP in IP tunnels reflect the
   more relaxed attitude to revealing congestion information in the new
   IPsec architecture, which now deems that the threat from 2-bit covert
   channels can be managed without disabling ECN.

   Also, this document defines more generic principles to guide the
   design of alternate forms of tunnel processing of congestion
   notification, if required for specific Diffserv PHBs (such as those
   the PCN working group will require) or for other lower-layer
   encapsulating protocols that might support congestion notification in
   the future (e.g. MPLS).


11.  Acknowledgements

   Thanks to David Black, Bruce Davie, Toby Moncaster and Gabriele
   Corliano for their careful review comments.


12.  Comments Solicited

   Comments and questions are encouraged and very welcome.  They can be
   addressed to the IETF Transport Area working group mailing list
   <tsvwg@ietf.org>, and/or to the author.


13.  References

13.1.  Normative References

   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
              October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              December 1998.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

13.2.  Informative References

   [802.1au]  "IEEE Standard for Local and Metropolitan Area Networks--
              Virtual Bridged Local Area Networks - Amendment 10:
              Congestion Notification", 2006,
              <http://www.ieee802.org/1/pages/802.1au.html>.

              (Work in Progress; Access Controlled link within page)

   [BBnet]    Sexton, M. and A. Reid, "Broadband Networking: ATM, SDH
              and SONET", Artech House Telecommunications Library,
              ISBN 0-89006-578-0, 1997.

   [I-D.ietf-tsvwg-ecn-mpls]
              Davie, B., "Explicit Congestion Marking in MPLS",
              draft-ietf-tsvwg-ecn-mpls-00 (work in progress),
              March 2007.

   [I-D.rosen-pwe3-congestion]
              Rosen, E., "Pseudowire Congestion Control Framework",
              draft-rosen-pwe3-congestion-04 (work in progress),
              October 2006.

   [PCN-arch]
              Eardley, P., Babiarz, J., Chan, K., Charny, A., Geib, R.,
              Karagiannis, G., Menth, M., and T. Tsou, "Pre-Congestion
              Notification Architecture",
              draft-eardley-pcn-architecture-00 (work in progress),
              June 2007.

   [PCNcharter]
              IETF, "Congestion and Pre-Congestion Notification (pcn)",
              IETF working group charter, February 2007,
              <http://www.ietf.org/html.charters/pcn-charter.html>.

   [RFC1254]  Mankin, A. and K. Ramakrishnan, "Gateway Congestion
              Control Survey", RFC 1254, August 1991.

   [RFC1701]  Hanks, S., Li, T., Farinacci, D., and P. Traina, "Generic
              Routing Encapsulation (GRE)", RFC 1701, October 1994.

   [RFC2205]  Braden, B., Zhang, L., Berson, S., Herzog, S., and S.
              Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
              Functional Specification", RFC 2205, September 1997.

   [RFC2637]  Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little,
              W., and G. Zorn, "Point-to-Point Tunneling Protocol",
              RFC 2637, July 1999.

   [RFC2661]  Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn,
              G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"",
              RFC 2661, August 1999.

   [RFC3426]  Floyd, S., "General Architectural and Policy
              Considerations", RFC 3426, November 2002.

   [RFC3540]  Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
              Congestion Notification (ECN) Signaling with Nonces",
              RFC 3540, June 2003.

   [RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
              RFC 4306, December 2005.

   [RFC4423]  Moskowitz, R. and P. Nikander, "Host Identity Protocol
              (HIP) Architecture", RFC 4423, May 2006.

   [Shayman]  "Using ECN to Signal Congestion Within an MPLS Domain",
              2000, <http://www.ee.umd.edu/~shayman/papers.d/
              draft-shayman-mpls-ecn-00.txt>.

              (Expired)


Appendix A.  In-path Load Regulation

   In the traditional Internet architecture one tends to think of the
   source host as the Load Regulator for a path.  It is generally not
   desirable or practical for a node part way along the path to regulate
   the load.  However, various reasonable proposals for in-path load
   regulation have been made from time to time (e.g. fair queuing,
   traffic engineering).  Also the IETF has recently chartered a working
   group to standardise admission control across a part of a path using
   pre-congestion notification (PCN) [PCNcharter], which involves in-
   path load regulation.  This is of particular relevance here because
   it involves congestion notification with an in-path Load Regulator
   and it can involve tunnelling.

   We will use the more complex scenario in Figure 3 to tease out all
   the issues that arise when combining congestion notification and
   tunnelling with various possible in-path load regulation schemes.  In
   this case 'I1' and 'E2' break up the path into three separate
   congestion control loops.  The feedback for these loops is shown
   going right to left across the top of the figure.  The 'V's are arrow
   heads representing the direction of feedback, not letters.  But there
   are also two tunnels within the middle control loop: 'I1' to 'E1' and
   'I2' to 'E2'.  The two tunnels might be VPNs, perhaps over two MPLS
   core networks.  M is a congestion monitoring point, perhaps between
   two border routers where the same tunnel continues unbroken across
   the border.
        ______     _______________________________________      _____
       /      \   /                                        \   /     \
      V        \ V                                M         \ V       \
      A--->R--->I1===========>E1----->I2=========>==========>E2------->B

                     Figure 3: complex Tunnel Scenario

   The question is, should the congestion markings in the outer exposed
   headers of a tunnel represent congestion only since the tunnel
   ingress or over the whole upstream path from the source of the inner
   header (whatever that may mean)?  Or put another way, should 'I1' and
   'I2' copy or reset CE markings?

   The answer is that the baseline of congestion marking should be the
   nearest upstream interface designed to regulate traffic load--the
   Load Regulator.  In Figure 3, 'A', 'I1' and 'E2' are all Load
   Regulators.  We have shown the feedback loops returning to each of
   these nodes so that they can regulate the load causing the congestion
   notification.  So the baseline for congestion markings exposed to M
   should be 'I1' (the Load Regulator), not 'I2'.  That is, 'I2' SHOULD
   copy any CE marking into the outer header it creates, while 'I1' is
   an exception because it is an in-path load regulator, so it should
   reset the ECN field in the outer header it creates.
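
   The copy-or-reset answer can be checked by following one packet
   along Figure 3.  In this sketch congestion occurs between 'I1' and
   'E1', and we ask what monitoring point M sees depending on whether
   'I2' copies or resets.  The helper names are hypothetical, the
   decapsulation step is simplified to the one case exercised, and
   interpreting "reset" as rewriting CE to ECT(0) in the outer header is
   an assumption made for illustration.

```python
# Two-bit ECN codepoints as defined in RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11
ECT = (ECT0, ECT1)


def encapsulate(inner, ingress_is_load_regulator):
    """Outer ECN field created by a tunnel ingress (hypothetical helper)."""
    if ingress_is_load_regulator and inner == CE:
        return ECT0           # reset: assumed here to mean CE -> ECT(0)
    return inner              # copy: upstream congestion stays visible


def decapsulate(inner, outer):
    """Simplified egress: only propagates an outer CE into an ECT inner."""
    return CE if outer == CE and inner in ECT else inner


def ecn_seen_by_m(i2_resets):
    """Follow one packet from 'A' to monitoring point M in Figure 3."""
    inner = ECT0                          # 'A' sends an ECN-capable packet
    outer = encapsulate(inner, True)      # 'I1' is a Load Regulator
    if outer in ECT:
        outer = CE                        # congested router in I1->E1 tunnel
    inner = decapsulate(inner, outer)     # 'E1' decapsulates
    return encapsulate(inner, i2_resets)  # 'I2' builds the header M sees


print(ecn_seen_by_m(i2_resets=False))  # 3 (CE): M sees congestion since I1
print(ecn_seen_by_m(i2_resets=True))   # 2 (ECT(0)): that congestion is hidden
```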

   The following further examples illustrate how this answer might be
   applied:

   o  Preemption marking is currently defined for PCN [PCN-arch] so that
      the rate of unmarked packets at the end of a path of multiple
      bottlenecks determines the maximum sustainable aggregate bit rate
      over that path.  To produce the correct marking by the end, each
      congested node must only consider packets to be eligible for
      marking if they have not already been marked by any previous
      bottleneck along a path that may span multiple tunnels (including
      MPLS encapsulations etc.).  This scheme only results in the
      correct marking rate if the markings accumulated so far along the
      path are copied into the outer exposed header of each tunnel or
      encapsulation.  Consider that 'I1' and 'E2' in the complex
      scenario of Figure 3 are edge gateways of a PCN region.  Admission
      control based on PCN measurements is a form of load regulation, so
      'I1' regulates the load on the PCN region.  Therefore 'I1' should
      be the baseline of congestion marking for _both_ tunnels within
      the scope of its feedback loop.  Consequently, 'I2' should follow
      the normal rules and copy congestion marking into the outer tunnel
      header, while 'I1' is an exception because it is also a load
      regulator, so it should reset CE markings in the outer header.

   o  [Shayman] suggested feedback of ECN accumulated across an MPLS
      domain could cause the ingress to trigger re-routing to mitigate
      congestion.  This case is more like the simple scenario of
      Figure 2, with a feedback loop across the MPLS domain ('E' back to
      'I').  The baseline for congestion exposed in outer headers in
      this case will be the tunnel ingress, which should therefore reset
      the ECN field in the outer headers it creates.  But it should act
      as the baseline because it is an in-path load regulator
      (re-routing around congestion is a load regulation function), not
      just because it is a tunnel ingress.

   o  The PWE3 working group of the IETF is considering the problem of
      how and whether an aggregate private wire emulation should respond
      to congestion [I-D.rosen-pwe3-congestion].  Although the study is
      still at the requirements stage, some (controversial) solution
      proposals include in-path load regulation at the ingress to the
      tunnel that could lead to tunnel arrangements with similar
      complexity to that of Figure 3.
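
   The effect described in the PCN example can be illustrated with an
   idealised rate model.  Each bottleneck is assumed to mark only
   packets it sees as unmarked, leaving at most its sustainable rate
   unmarked.  If a tunnel ingress resets CE, bottlenecks inside the
   tunnel see every packet as unmarked; the sketch assumes they then
   choose which packets to mark independently of the inner marks.  The
   rates and function names are hypothetical.

```python
def unmarked_with_copy(offered, sustainable_rates):
    """Every bottleneck sees earlier marks (CE copied into each outer
    header), so it marks only traffic that is still unmarked."""
    unmarked = offered
    for rate in sustainable_rates:
        unmarked = min(unmarked, rate)
    return unmarked


def unmarked_with_reset(offered, before_tunnel, inside_tunnel):
    """Same, but the tunnel ingress resets CE in the outer header."""
    unmarked_before = unmarked_with_copy(offered, before_tunnel)
    # Inside the tunnel, outer headers start unmarked again.
    outer_unmarked = unmarked_with_copy(offered, inside_tunnel)
    # Assumption: which outer headers get marked is independent of the
    # inner marks, so only this share of truly unmarked traffic survives.
    return unmarked_before * outer_unmarked / offered


# Hypothetical rates (e.g. Mb/s): 10 offered, bottlenecks of 8 then 6.
print(unmarked_with_copy(10, [8, 6]))     # 6: the tightest bottleneck
print(unmarked_with_reset(10, [8], [6]))  # 4.8: traffic is over-marked
```

   With copying, the unmarked rate at the end equals the rate of the
   tightest bottleneck, which is what preemption marking aims to
   measure; with a reset, marks land on already-marked packets and the
   sustainable rate is underestimated.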

   These are not contrived scenarios--they could be a lot worse.  For
   instance, a host may create a tunnel for IPsec which is placed inside
   a tunnel for Mobile IP over a remote part of its path.  And around
   all this we may have MPLS labels being pushed and popped as packets
   pass across different core networks.  Similarly, subnets could be
   built from link technology (e.g. Ethernet switches) whose link
   headers might in future carry congestion notification, so that
   adding and removing those headers would raise all the same issues as
   IP in IP tunnels.

   The reason we introduced the concept of a Load Regulator was to allow
   for in-path load regulation.  In the traditional Internet
   architecture one tends to think of a host and a Load Regulator as
   synonymous, but when considering tunnelling, even the definition of a
   host is too fuzzy, whereas a Load Regulator is a clearly defined
   function.  Similarly, the concept of the innermost header is too
   fuzzy to support the (wrong) claim that the source address of the
   innermost header should be the baseline.  Which is the innermost
   header when
   multiple encapsulations may be in use?  Where do we stop?  If we say
   the original source in the above IPsec-Mobile IP case is the host,
   how do we know it isn't tunnelling an encrypted packet stream on
   behalf of another host in a p2p network?

   The reason there has been so much confusion over the question of
   whether a tunnel ingress should copy or reset CE markings is that we
   have become used to thinking that only hosts regulate load.  The end
   to end design principle advises that this is a good idea [RFC3426],
   but it also advises that it is only a guiding principle intended to
   make the designer think very carefully before breaking it.  We do
   have proposals where load regulation functions sit within a network
   path for good, if sometimes controversial, reasons, e.g. PCN edge
   admission control gateways [PCN-arch] or traffic engineering
   functions at domain borders to re-route around congestion [Shayman].


Author's Address

   Bob Briscoe
   BT
   B54/77, Adastral Park
   Martlesham Heath
   Ipswich  IP5 3RE
   UK

   Phone: +44 1473 645196
   Email: bob.briscoe@bt.com
   URI:   http://www.cs.ucl.ac.uk/staff/B.Briscoe/


Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Acknowledgments

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).  This document was produced
   using xml2rfc v1.32 (of http://xml.resource.org/) from a source in
   RFC-2629 XML format.
