<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [<!ENTITY RFC2119 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC8287 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8287.xml">
<!ENTITY RFC8029 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8029.xml">
<!ENTITY RFC8403 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8403.xml">
<!ENTITY RFC8402 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8402.xml">
<!ENTITY RFC8604 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8604.xml">
<!ENTITY RFC7743 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7743.xml">
<!ENTITY RFC3107 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3107.xml">
<!ENTITY RFC8660 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8660.xml">
<!ENTITY RFC7110 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7110.xml">
<!ENTITY RFC9087 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.9087.xml">
<!ENTITY RFC8277 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8277.xml">
<!ENTITY RFC8174 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8174.xml">
]>
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-ietf-mpls-spring-inter-domain-oam-01" ipr="trust200902">
<front>
  <title abbrev="Inter-as-OAM">PMS/Head-end based MPLS Ping and Traceroute in Inter-domain
   SR Networks</title>

  <author initials="S." surname="Hegde" fullname="Shraddha Hegde">
    <organization>Juniper Networks Inc.</organization>
    <address>
      <postal>
        <street>Exora Business Park</street>
        <city>Bangalore</city>
        <region>KA</region>
        <code>560103</code>
        <country>India</country>
      </postal>
      <email>shraddha@juniper.net</email>
    </address>
  </author>
  
  <author initials="K." surname="Arora" fullname="Kapil Arora">
    <organization>Juniper Networks Inc.</organization>
    <address>
      <postal>
        <street></street>
        <city></city>
        <region></region>
        <code></code>
        <country></country>
      </postal>
      <email>kapilaro@juniper.net</email>
    </address>
  </author>
    
  <author initials="M." surname="Srivastava" fullname="Mukul Srivastava">
    <organization>Juniper Networks Inc.</organization>
    <address>
      <postal>
        <street></street>
        <city></city>
        <region></region>
        <code></code>
        <country></country>
      </postal>
      <email>msri@juniper.net</email>
    </address>
  </author>

<author initials="S." surname="Ninan" fullname="Samson Ninan">
    <organization>Individual Contributor</organization>
    <address>
      <postal>
        <street></street>
        <city></city>
        <region></region>
        <code></code>
        <country></country>
      </postal>
      <email>samson.cse@gmail.com</email>
    </address>
  </author>

   <author initials="N." surname="Kumar" fullname="Nagendra Kumar">
    <organization>Cisco Systems, Inc.</organization>
    <address>
      <postal>
        <street></street>
        <city></city>
        <region></region>
        <code></code>
        <country></country>
      </postal>
      <email>naikumar@cisco.com</email>
    </address>
  </author>

   <date year="2021"/>
  <area>Routing</area>
  <workgroup>Routing area</workgroup>
  <keyword>OAM</keyword>
  <keyword>EPE</keyword>
  <keyword>BGP-LS</keyword>
  <keyword>BGP</keyword>
  <keyword>SPRING</keyword>
  <keyword>SDN</keyword>
  <abstract>
 <t>The Segment Routing (SR) architecture leverages source routing and
   tunneling paradigms and can be directly applied to the use of a
   Multiprotocol Label Switching (MPLS) data plane. A network may
   consist of multiple IGP
   domains or multiple ASes under the control of the same organization.
   It is useful to have Label Switched Path (LSP)
   ping and traceroute procedures when an SR
   end-to-end path spans multiple ASes or domains.
   This document describes mechanisms to facilitate LSP ping and
   traceroute in inter-AS/inter-domain
   SR-MPLS networks in an efficient manner, using a simple Operations,
   Administration, and Maintenance (OAM) protocol
   extension that relies on
   data plane forwarding alone for sending the echo reply.</t>
  </abstract>

   <note title="Requirements Language">
    <t> The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
      NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
      "MAY", and "OPTIONAL" in this document are to be interpreted as
      described in BCP 14 <xref target ="RFC2119"/> 
	  <xref target ="RFC8174"/> when, and only when, they
      appear in all capitals, as shown here.</t>
  </note>

</front>

<middle>
<section title="Introduction" anchor='intro'>

<figure anchor="Topology_1" title="Inter-AS Segment Routing topology">

      <artwork>
                    +----------------+
                    | Controller/PMS |
                    +----------------+



 |---AS1-----|                |------AS2------|            |----AS3---|
   
                ASBR2----ASBR3                ASBR5------ASBR7
                /             \               /            \
               /               \             /              \
 PE1----P1---P2                 P3---P4---PE4               P5---P6--PE5
               \               /            \               /
                \             /              \             /
                 ASBR1----ASBR4              ASBR6------ASBR8

       </artwork>
</figure>

<t>Many networks consist of multiple
Autonomous Systems (ASes), either for ease of operations or as a result of network
mergers and acquisitions. Segment Routing can be deployed in such scenarios
to provide end-to-end paths that traverse multiple ASes. These
paths consist of Segment Identifiers (SIDs) of
different types as per <xref target ="RFC8402"/>. </t>
 <t>    <xref target='RFC8660'/> specifies the
forwarding plane behaviour that allows Segment Routing to operate on top of
the MPLS data plane.
<xref target='RFC9087'/> describes BGP peering
SIDs, which help in steering packets from one Autonomous System to another.
Using the above SR capabilities, paths that span multiple Autonomous Systems
can be created. </t>
<t>For example, <xref target="Topology_1"/>
describes an inter-AS network scenario consisting of ASes AS1 and AS2.
Both AS1 and AS2 are Segment Routing enabled, and the EPE links
have EPE labels configured
and advertised via <xref target="I-D.ietf-idr-bgpls-segment-routing-epe"/>.
A controller or head-end can build an
end-to-end Traffic-Engineered path consisting of Node-SIDs,
Adjacency-SIDs and EPE-SIDs.
It is advantageous for operations to be able to perform LSP ping and traceroute
procedures on these inter-AS SR-MPLS paths. LSP ping/traceroute procedures use IP
connectivity for the echo reply to
reach the head-end. In inter-AS networks, IP connectivity may not be available
from each router in the path. For example,
in <xref target="Topology_1"/> P3 and P4 may not have IP connectivity to PE1.</t>

<t><xref target ="RFC8403"/> describes mechanisms to carry out
MPLS ping/traceroute from a Path Monitoring System (PMS).
It is possible to build GRE tunnels or static routes to each router
in the network
to get IP connectivity for the reverse path.
This mechanism is operationally very heavy and requires the PMS to be capable
of building a huge number of GRE tunnels, which may not be feasible.</t>



<t>It is not possible to use the existing LSP ping and traceroute
mechanisms (<xref target="RFC8287"/> and
<xref target="RFC8029"/>) on these paths to verify basic connectivity
and to isolate faults.
This is because there is no IP connectivity
from the destination of the ping/traceroute to the source address of the
ping packet, which is in a different AS.
</t> 
<t>  <xref target="RFC7743"/> describes an Echo-relay based solution
that relies on advertising a new Relay Node Address Stack TLV containing a
stack of Echo-relay IP addresses. These mechanisms can be applied to
segment routing networks as well. The <xref target="RFC7743"/>
mechanism requires the return ping packet to be processed in the slow
path or as a bump-in-the-wire
on every relay node. The motivation of the current document
is to provide an alternate mechanism for ping/traceroute in
inter-domain segment routing networks.
</t>
<t>
This document describes a new mechanism that is efficient and simple
and can be easily deployed in SR-MPLS networks. This mechanism uses the MPLS path and
requires no changes in the forwarding path.
Any MPLS-capable node will be able to forward
the echo reply packet in the fast path.
</t>
<t>
The mechanism uses the Reply Path TLV <xref target="RFC7110"/>
to convey the reverse path. Three new sub-TLVs for the Reply Path TLV are defined
that facilitate encoding a segment routing label stack.
The TLV can be derived by a smart application or controller that has a full
topology view. This document also proposes mechanisms to derive the return path dynamically
during traceroute procedures.
</t>

<section anchor='domain_definition' title='Definition of Domain'>
<t> The term domain used in this document implies an IGP domain where every node
is visible to every other node for the purposes of shortest path computation. The
domain implies an IGP area or level. An Autonomous System (AS) consists of
one or more IGP domains. The procedures described in this document are applicable
to paths built across multiple domains, which includes inter-area as well as inter-AS paths.
This document is applicable to SR-MPLS networks
where all nodes in each of the domains are SR capable. It is also applicable to
SR-MPLS networks where SR acts as an overlay on top of SR-incapable underlay nodes. In such
networks, the traceroute procedure is executed only on the overlay SR nodes.</t>
</section>
</section>

<section anchor='inter_domain' title='Inter domain networks with multiple IGPs'>
<figure anchor="Topology_2" title="Inter-domain networks with multiple IGPs">
      <artwork>
                   

 |-Domain 1|-------Domain 2-----|--Domain 3-|
   
                     
 PE1------ABR1--------P--------ABR2------PE4
  \        / \                  /\        /
   --------   -----------------   -------
    BGP-LU         BGP-LU          BGP-LU

       </artwork>
</figure>
<t>When the network consists of a large number of nodes, the nodes are
segregated into multiple IGP domains.
The connectivity to the remote PEs can be achieved using
BGP-Labeled Unicast (BGP-LU)
<xref target="RFC8277"/> or by stacking the labels
for each domain as described in <xref target="RFC8604"/>.
It is useful to support MPLS ping and traceroute
mechanisms for these networks. The procedures described in
this document for constructing the Reply Path TLV and using it
in the echo reply are equally applicable to networks consisting
of multiple IGP domains that use BGP-LU or label stacking.
</t>
</section>
<section anchor='Reply_path_TLV' title='Reply Path TLV'>
<t>Segment Routing networks statically assign the labels to
nodes, and the PMS/Head-end
may know the entire database. The reverse path can be built
by the PMS/Head-end by stacking
segments for the reverse path. The Reply Path TLV as defined in
<xref target="RFC7110"/> is used
to carry the return path. While using the procedures
described in this document, the
Reply Mode MUST be set to 5 “Reply via Specified Path” and, as
specified in <xref target="RFC7110"/>, the Reply Path TLV MUST be
included in the echo request message.
The procedures described in <xref target="RFC7110"/> are
applicable for constructing
the Reply Path TLV. This document defines three new sub-TLVs
to encode the Segment Routing path.</t>


<t>
The type of segment that the head-end
chooses to send in the Reply Path TLV is governed
by local policy. Implementations may provide CLI input parameters in the form of labels,
IPv4 addresses, IPv6 addresses, or a combination of these, which get encoded
in the Reply Path TLV. Implementations may also provide mechanisms to acquire
the database of remote domains and compute the return path based
on the acquired database. For traceroute purposes, the return path has to
account for the reply being sent from every node along the path. The return path
changes as the traceroute progresses and crosses each domain. One way to
implement this on the head-end is to acquire the entire database (of all domains) and build the return path
for every node along the SR-MPLS path based on the knowledge of the database.
Another mechanism is to use a dynamically
computed return path as described in <xref target="Dynamic_TLV_building"/>.
</t>
<t>
Some networks may consist of pure IPv4 domains and pure IPv6 domains.
Handling end-to-end MPLS OAM for such networks
is out of scope for this document. It is recommended to use dual
stack in such cases and use end-to-end IPv6 addresses
for MPLS ping and traceroute procedures.
</t>
</section>
<section anchor='segment_sub_tlv' title='Segment sub-TLV'>   
<t> <xref target="I-D.ietf-spring-segment-routing-policy"/>
defines various types of segments.
The types of segments applicable to this document are defined in this section for use in
MPLS OAM.
The intent is to keep the definitions the same as in
<xref target="I-D.ietf-spring-segment-routing-policy"/>,
with modifications made only where absolutely needed.
One or more segment sub-TLVs can be included in the Reply Path TLV.
The segment sub-TLVs included in a Reply Path TLV MAY be of different types.</t>

<t> The following types of segment sub-TLVs are applicable to the
Reply Path TLV.</t>

<t>Type A: SID only, in the form of MPLS Label</t>
<t>Type C: IPv4 Node Address with optional SID for SR-MPLS</t>
<t>Type D: IPv6 Node Address with optional SID for SR-MPLS</t>


<section anchor='type1' title='Type A: SID only, in the form of MPLS Label'> 

   <t>The Type-A Segment Sub-TLV encodes a single SID in the form of an
   MPLS label.  The format is as follows:</t>
   
   <figure anchor="type1_tlv" title="Type A Segment sub-TLV">
    <artwork>

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type                      |   Length                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Flags       |   RESERVED                                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Label                        | TC  |S|       TTL     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   
   
   </artwork>
	  </figure>
   <t>
   where:</t>
   

   <t>  Type: TBD1 (to be assigned by IANA from the registry
   "Sub-TLVs for TLV Types 1, 16, and 21").</t>

   <t> Length is 8.</t>

   <t> Flags: 1 octet of flags as defined in  <xref target="flags"/>.</t>

   <t> RESERVED: 3 octets of reserved bits.  SHOULD be unset on
      transmission and MUST be ignored on receipt.</t>

   <t>  Label: 20 bits of label value.</t>

   <t>  TC: 3 bits of traffic class.</t>

   <t>  S: 1 bit, reserved.</t>

   <t>  TTL: 1 octet of TTL.</t>

  <t> The following applies to the Type-A Segment sub-TLV:</t>



   <t>  The S bit SHOULD be zero upon transmission, and MUST be ignored
      upon reception.</t>

   <t>  If the originator wants the receiver to choose the TC value, it
      sets the TC field to zero.</t>

   <t> If the originator wants the receiver to choose the TTL value, it
      sets the TTL field to 255.</t>

   <t>  If the originator wants to recommend a value for these fields, it
      puts those values in the TC and/or TTL fields.</t>

   <t>  The receiver MAY override the originator's values for these
      fields.  This would be determined by local policy at the receiver.
      One possible policy would be to override the fields only if the
      fields have the default values specified above.</t>
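   <t>As an informal illustration of the layout above, the following Python
   sketch packs a Type-A Segment sub-TLV. It is a sketch under stated
   assumptions, not a normative encoding: the sub-TLV type value is a
   hypothetical placeholder because TBD1 has not been assigned, and the function
   and constant names are invented for this example.</t>
   <figure>
   <artwork><![CDATA[
```python
import struct

# Hypothetical placeholder: the real sub-TLV type (TBD1) is not yet assigned.
TYPE_A_PLACEHOLDER = 0xFFF1


def encode_type_a(label, tc=0, ttl=255, flags=0,
                  sub_tlv_type=TYPE_A_PLACEHOLDER):
    """Pack a Type-A Segment sub-TLV: 4-octet header plus 8 octets of value."""
    if not 0 <= label < (1 << 20):
        raise ValueError("label must fit in 20 bits")
    if not 0 <= tc < 8:
        raise ValueError("TC must fit in 3 bits")
    if not 0 <= ttl <= 255:
        raise ValueError("TTL must fit in 1 octet")
    # Label (20 bits) | TC (3 bits) | S (1 bit, sent as zero) | TTL (8 bits)
    label_word = (label << 12) | (tc << 9) | ttl
    # Type (2), Length (2), Flags (1), RESERVED (3 zero octets), label word (4)
    return struct.pack("!HHB3xI", sub_tlv_type, 8, flags, label_word)


# TC 0 and TTL 255 leave the choice of both values to the receiver.
tlv = encode_type_a(16004)
print(tlv.hex())
```
]]></artwork>
   </figure>
   <t>The defaults mirror the text above: a TC of zero and a TTL of 255 tell
   the receiver to choose those values itself.</t>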
	
	  
</section>


<section anchor='type3' title='Type C: IPv4 Node Address with optional SID for SR-MPLS'>

   <t>The Type-C Segment Sub-TLV encodes an IPv4 node address, SR Algorithm
   and an optional SID in the form of an MPLS label.  The format is as
   follows:</t>
	<figure anchor="type3_tlv" title="Type C Segment sub-TLV">
    <artwork>
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type                      |   Length                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Flags       |       RESERVED (MBZ)          | SR Algorithm  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 IPv4 Node Address (4 octets)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                SID (optional, 4 octets)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    </artwork>
	  </figure>
   <t>where:</t>

   <t>  Type: TBD3 (to be assigned by IANA from the registry
   "Sub-TLVs for TLV Types 1, 16, and 21").</t>

   <t> Length is 8 or 12.</t>

   <t>  Flags: 1 octet of flags as defined in <xref target="flags"/>.</t>

   <t>  SR Algorithm: 1 octet specifying the SR Algorithm as described in
      section 3.1.1 of <xref target="RFC8402"/>, when the A-Flag
      defined in <xref target="flags"/> is set. The SR Algorithm is used
	  by the receiver to derive the label. When the A-Flag is unset,
	  this field has no meaning and thus MUST be set to zero on
	  transmission and ignored on receipt.</t>

   <t> RESERVED: 2 octets of reserved bits. MUST be set to zero when sending;
                MUST be ignored on receipt.</t>


   <t>  IPv4 Node Address: a 4 octet IPv4 address representing a node.</t>

   <t>  SID: optional 4-octet field containing label, TC, S and
      TTL as defined in <xref target="type1"/>.</t>

   <t>The following applies to the Type-C Segment sub-TLV:</t>

   <t>  The IPv4 Node Address MUST be present.</t>

   <t>  The SID is optional and specifies a 4-octet MPLS SID containing
      label, TC, S and TTL as defined in <xref target="type1"/>.</t>

   <t>  If length is 8, then only the IPv4 Node Address is present.</t>

   <t> If length is 12, then the IPv4 Node Address and the MPLS SID are
      present. When the MPLS SID field is present, it MUST be used for constructing the
	  Reply Path TLV.</t>
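   <t>To illustrate the length rules above, the following Python sketch
   decodes the value portion of a Type-C Segment sub-TLV. It is illustrative
   only: the A-Flag mask assumes the bit position drawn in the Flags figure
   (bit 1, counting from the most significant bit), and the function name and
   the dictionary keys are invented for this example.</t>
   <figure>
   <artwork><![CDATA[
```python
import ipaddress
import struct

# Assumption: A is bit 1 of the Flags octet (MSB is bit 0), per the Flags figure.
A_FLAG = 0x40


def parse_type_c_value(value):
    """Decode the value part (after Type and Length) of a Type-C sub-TLV."""
    if len(value) not in (8, 12):
        raise ValueError("Type-C length must be 8 or 12")
    # Flags (1 octet), RESERVED (2 octets, ignored), SR Algorithm (1 octet)
    flags, algorithm = struct.unpack("!B2xB", value[:4])
    segment = {
        "node": ipaddress.IPv4Address(value[4:8]),
        # The SR Algorithm field is meaningful only when the A-Flag is set.
        "algorithm": algorithm if flags & A_FLAG else None,
        "sid": None,
    }
    if len(value) == 12:
        (word,) = struct.unpack("!I", value[8:12])
        # Same layout as Type A: Label (20) | TC (3) | S (1) | TTL (8)
        segment["sid"] = {
            "label": word >> 12,
            "tc": (word >> 9) & 0x7,
            "ttl": word & 0xFF,
        }
    return segment


with_sid = parse_type_c_value(
    bytes([0x40, 0, 0, 1]) + bytes([192, 0, 2, 1]) + bytes.fromhex("03e840ff")
)
print(with_sid)
```
]]></artwork>
   </figure>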
	 

</section>

<section anchor='type4' title='Type D: IPv6 Node Address with optional SID for SR MPLS'>

   <t>The Type-D Segment Sub-TLV encodes an IPv6 node address, SR Algorithm
   and an optional SID in the form of an MPLS label.  The format is as
   follows:</t>
	<figure anchor="type4_tlv" title="Type D Segment sub-TLV">
    <artwork>
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type                      |   Length                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Flags       |       RESERVED(MBZ)           | SR Algorithm  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   //                IPv6 Node Address (16 octets)                //
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                SID (optional, 4 octets)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	  </artwork>
	  </figure>
   <t>where:</t>

   <t>  Type: TBD4 (to be assigned by IANA from the registry
   "Sub-TLVs for TLV Types 1, 16, and 21").</t>

   <t>  Length is 20 or 24.</t>

   <t>  Flags: 1 octet of flags as defined in <xref target="flags"/>.</t>

   <t> SR Algorithm: 1 octet specifying the SR Algorithm as described in
      section 3.1.1 of <xref target="RFC8402"/>, when the A-Flag
      defined in <xref target="flags"/> is set. The SR Algorithm is
	  used by the receiver to derive the label. When the A-Flag is unset, this field has no
	  meaning and thus MUST be set to zero on transmission and ignored on receipt.</t>

   <t> RESERVED: 2 octets of reserved bits. MUST be set to zero when sending;
                 MUST be ignored on receipt.</t>

   <t> IPv6 Node Address: a 16 octet IPv6 address representing a node.</t>

   <t> SID: optional 4-octet field containing label, TC, S and
      TTL as defined in <xref target="type1"/>.</t>

   <t> The following applies to the Type-D Segment sub-TLV:</t>

   <t> The IPv6 Node Address MUST be present.</t>

   <t>  The SID is optional and specifies a 4-octet MPLS SID containing
      label, TC, S and TTL as defined in <xref target="type1"/>.</t>

   <t>  If length is 20, then only the IPv6 Node Address is present.</t>

   <t>  If length is 24, then the IPv6 Node Address and the MPLS SID are
      present. When the MPLS SID field is present, it MUST be used for constructing the
	  Reply Path TLV.</t>
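   <t>The two permitted lengths can be seen in the following Python sketch,
   which packs a Type-D Segment sub-TLV with and without the optional SID.
   It is a sketch under stated assumptions: the type value is a hypothetical
   placeholder for the unassigned TBD4, the A-Flag mask follows the bit
   position drawn in the Flags figure, and all names are invented for this
   example.</t>
   <figure>
   <artwork><![CDATA[
```python
import ipaddress
import struct

# Hypothetical placeholder: the real sub-TLV type (TBD4) is not yet assigned.
TYPE_D_PLACEHOLDER = 0xFFF4
# Assumption: A is bit 1 of the Flags octet (MSB is bit 0), per the Flags figure.
A_FLAG = 0x40


def encode_type_d(node, algorithm=None, sid_word=None,
                  sub_tlv_type=TYPE_D_PLACEHOLDER):
    """Pack a Type-D Segment sub-TLV; Length comes out as 20 or 24."""
    # Set the A-Flag only when an SR Algorithm is supplied; otherwise the
    # SR Algorithm octet is sent as zero.
    flags = A_FLAG if algorithm is not None else 0
    value = struct.pack("!B2xB", flags, algorithm or 0)
    value += ipaddress.IPv6Address(node).packed  # 16 octets
    if sid_word is not None:
        value += struct.pack("!I", sid_word)     # optional 4-octet SID
    return struct.pack("!HH", sub_tlv_type, len(value)) + value


address_only = encode_type_d("2001:db8::1")
with_sid = encode_type_d("2001:db8::1", algorithm=128, sid_word=0x03E840FF)
print(len(address_only) - 4, len(with_sid) - 4)
```
]]></artwork>
   </figure>
   <t>The address 2001:db8::1 is taken from the IPv6 documentation prefix and
   stands in for a real node address.</t>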

	  </section>
	  
<section anchor='flags' title='Segment Flags'>	  

   <t>The Segment Types described above contain the following flags in the
   "Flags" field (codes to be assigned by IANA from the
   registry "Segment sub-TLV Flags"):</t>
<figure anchor="flags_field" title="Flags">
    <artwork>
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   | |A|           |
   +-+-+-+-+-+-+-+-+
</artwork>
	  </figure>
   <t>where:</t>

    
      <t>A-Flag: This flag indicates the presence of an SR Algorithm identifier in the
      "SR Algorithm" field applicable to various Segment Types.  </t>

      <t>Unused bits in the Flags octet SHOULD be set to zero upon
      transmission and MUST be ignored upon receipt.</t>

   <t>The following applies to the Segment Flags:</t>

   <t>  The A-Flag is applicable to Segment Types C and D.  If the A-Flag
      appears with any other Segment Type, it MUST be ignored.</t>
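   <t>A receiver could apply these rules along the lines of the following
   Python sketch. It is illustrative only: the mask value assumes the A-Flag
   position drawn in the Flags figure, the segment types are represented by
   the letters used for the node address segments (IPv4 and IPv6), and the
   function name is invented for this example.</t>
   <figure>
   <artwork><![CDATA[
```python
# Assumption: A is bit 1 of the Flags octet (MSB is bit 0), per the Flags figure.
A_FLAG = 0x40

# Segment types that carry an SR Algorithm field (IPv4/IPv6 node address).
TYPES_WITH_ALGORITHM = frozenset({"C", "D"})


def algorithm_to_use(segment_type, flags, sr_algorithm):
    """Return the SR Algorithm to honour, or None if the field has no meaning."""
    if segment_type not in TYPES_WITH_ALGORITHM:
        return None  # an A-Flag on any other segment type is ignored
    if not flags & A_FLAG:
        return None  # the field carries no meaning when the A-Flag is unset
    # Only the A bit is tested, so unused flag bits are ignored on receipt.
    return sr_algorithm
```
]]></artwork>
   </figure>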
	  
</section>
</section>

<section anchor='SRv6' title='SRv6 Dataplane'>
<t>SRv6 dataplane is not in the scope of this document and will be 
addressed in a separate document.</t>

</section>


<section anchor='procedure' title='Detailed Procedures'>

<section anchor='initiator_procedure' title='Sending an echo request'>
<t>In an inter-AS scenario where there is no reverse path connectivity,
the procedures described in this document should be used.
The LSP ping initiator MUST set the Reply Mode of the echo
request to "Reply via Specified Path", and a Reply Path
TLV MUST be carried in the echo request message correspondingly.
The Reply Path TLV must contain the Segment Routing path in the
reverse direction, encoded as an ordered list
of segments. The first segment MUST correspond to the top segment in
the MPLS header that the responder MUST use while sending the echo reply.
 </t>
</section>
<section anchor='responder_procedure' title='Receiving an echo request'>
<t>As described in <xref target="RFC7110"/>, when the Reply Mode is set
to 5 (Reply via Specified Path), the echo request MUST contain the Reply
Path TLV. An echo request without the Reply Path TLV is treated as a malformed
echo request.
When an echo request is
   received, if the egress LSR does not know the Reply Mode 5 defined in
   <xref target="RFC7110"/>, an echo reply with the return code set to "Malformed
   echo request received" and the Subcode set to zero will be sent back
   to the ingress LSR according to the rules of <xref target="RFC8029"/>.
</t>
<t>
When a Reply Path TLV is received and the responder
supports processing it, the responder MUST use the segments in the Reply Path TLV to build
the echo reply. The responder MUST follow the normal FEC validation
procedures as described in <xref target="RFC8029"/>
and <xref target="RFC8287"/>; this document does not suggest
any change to those procedures. When the echo reply has to be sent
out, the Reply Path TLV is used to construct the MPLS packet to send out.
</t>
</section>
<section anchor='sending_echo_reply' title='Sending an echo reply'>
<t>The echo reply message is sent as an MPLS packet with an MPLS label stack.
The echo reply message MUST be constructed
as described in <xref target="RFC8029"/>. An MPLS packet is
constructed with the echo reply in the payload.
The top label MUST be constructed from the first segment in the
Reply Path TLV.
The remaining labels MUST follow the order from the Reply Path TLV.
The responder MAY check the reachability of the top label in its
own Label Forwarding Information Base (LFIB) before sending the echo reply.
In certain scenarios the head-end
may choose to send Type C/Type D segments consisting of an IPv4 address or an
IPv6 address. Optionally, a SID may also be associated with a Type C/Type D segment.
In such cases the node sending the
echo reply MUST derive the MPLS labels based on the Node-SIDs associated with the
IPv4/IPv6 addresses or from the optional MPLS SIDs in the Type C/Type D segments,
and encode the echo reply with those MPLS labels.</t>
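<t>The label derivation described above can be sketched in Python as follows.
This is a minimal illustration, not a prescribed implementation: segments are
modelled as plain dictionaries, the lookup table standing in for the
responder's knowledge of Node-SIDs is hypothetical, and the addresses and
label values are made up for the example.</t>
<figure>
<artwork><![CDATA[
```python
import ipaddress

# Hypothetical local table: (node address, SR algorithm) -> Node-SID label,
# standing in for whatever the responder learned from its routing database.
NODE_SID_LABELS = {
    (ipaddress.ip_address("192.0.2.4"), 0): 16004,
    (ipaddress.ip_address("192.0.2.1"), 0): 16001,
}


def reply_label_stack(segments, node_sid_labels=NODE_SID_LABELS):
    """Return the echo reply label stack, top label first."""
    stack = []
    for seg in segments:
        if seg.get("label") is not None:
            # Label-only segment, or a node address segment with the
            # optional SID: the carried label is used as is.
            stack.append(seg["label"])
            continue
        # Address-only segment: derive the label from the Node-SID.
        key = (ipaddress.ip_address(seg["node"]), seg.get("algorithm") or 0)
        if key not in node_sid_labels:
            raise LookupError(f"no Node-SID known for {seg['node']}")
        stack.append(node_sid_labels[key])
    return stack


segments = [{"label": 24001}, {"node": "192.0.2.1"}]
print(reply_label_stack(segments))
```
]]></artwork>
</figure>
<t>The order of the returned list follows the order of the segments in the
Reply Path TLV, so the first segment becomes the top label.</t>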
<t>
The reply path return code MUST be set as described in section 7.4 of  <xref target="RFC7110"/>.
The Reply Path TLV MUST be included in echo reply indicating the 
specified return path that the echo reply message is required to follow as described in 
section 5.3 of  <xref target="RFC7110"/>.</t>
<t>
When the node is configured to dynamically create the return path for the next echo request,
the procedures described in <xref target="Dynamic_TLV_building"/> MUST be used.
The reply path return code MUST be set to TBA1, and the same Reply Path TLV
or a new Reply Path TLV MUST be included in the echo reply.


</t>
</section>

<section anchor='Receving_echo_reply' title='Receiving an echo reply'>
<t>The rules and processes defined in Section 4.6 of <xref target="RFC8029"/>
and section 5.4 of <xref target="RFC7110"/> apply here. In addition,
if the reply path return code is "Use Reply Path TLV
in echo reply for next echo request", the Reply Path TLV from the
echo reply MUST be sent in the next echo request with the TTL incremented by 1.</t>
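<t>The initiator behaviour described above might look like the following
Python sketch. It is illustrative only: messages are modelled as plain
dictionaries with invented key names, the return code is represented by a
placeholder string because its value is still to be assigned, and the sketch
assumes a traceroute in which every subsequent request increments the TTL.</t>
<figure>
<artwork><![CDATA[
```python
# Placeholder only: the value of this return code has not been assigned yet.
USE_REPLY_PATH_FOR_NEXT_REQUEST = "TBA1"


def build_next_request(previous_request, echo_reply):
    """Return the next traceroute echo request as a plain dict."""
    reply_path = previous_request["reply_path"]
    code = echo_reply.get("reply_path_return_code")
    if code == USE_REPLY_PATH_FOR_NEXT_REQUEST:
        # Reuse the Reply Path TLV that the responder put in its echo reply.
        reply_path = echo_reply["reply_path"]
    return {
        "ttl": previous_request["ttl"] + 1,
        "reply_path": list(reply_path),
    }


previous = {"ttl": 3, "reply_path": ["N-PE1"]}
reply = {
    "reply_path_return_code": USE_REPLY_PATH_FOR_NEXT_REQUEST,
    "reply_path": ["EPE-ASBR4-ASBR1", "N-PE1"],
}
print(build_next_request(previous, reply))
```
]]></artwork>
</figure>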
</section>
</section>

<section anchor='Topology_description' title='Detailed Example '>
<t>The example topologies given in <xref target="Topology_1"/> and
<xref target="Topology_2"/> are used in the sections below to explain the
LSP ping and traceroute procedures. The PMS/Head-end has a complete view
of the topology. PE1, P1, P2, ASBR1 and ASBR2 are in AS1. Similarly, ASBR3,
ASBR4, P3, P4 and PE4 are in AS2.</t>
<t>AS1 and AS2 have Segment Routing enabled.
IGPs like OSPF/ISIS are used to flood
SIDs in each Autonomous System. ASBR1,
ASBR2, ASBR3 and ASBR4 advertise BGP EPE SIDs
for the inter-AS links.
The topologies of AS1 and AS2 are advertised via BGP-Link State (BGP-LS)
to the controller/PMS or head-end node.
The EPE-SIDs are also advertised via BGP-LS as described in
<xref target="I-D.ietf-idr-bgpls-segment-routing-epe"/>.</t>
<t>The description in this document uses the following notation for
Segment Identifiers (SIDs).</t>
<t>Node SIDs: N-PE1, N-P1, N-ASBR1, N-ABR1, N-ABR2, etc.</t>
<t>Adjacency SIDs: Adj-PE1-P1, Adj-P1-P2, etc.</t>
<t>EPE SIDs: EPE-ASBR2-ASBR3, EPE-ASBR1-ASBR4, EPE-ASBR3-ASBR2, etc.</t>

<t>Consider a traffic-engineered path built from PE1 to PE4
with the Segment List stack
[N-P1, N-ASBR1, EPE-ASBR1-ASBR4, N-PE4] for the following procedures.
This stack may be programmed
by the controller/PMS, or the head-end router PE1 may have
imported the whole topology information
from BGP-LS and computed the inter-AS path.
</t>

<section anchor='Mpls_ping_procedures' title='Procedures for Segment Routing LSP ping'>
<t>Consider an LSP ping on an SR-MPLS path from PE1 to PE4
with the label stack [N-P1, N-ASBR1, EPE-ASBR1-ASBR4, N-PE4].
The remote end (PE4) needs IP connectivity to the head-end (PE1) for the
Segment Routing ping to succeed,
because the echo reply needs to travel back from PE4 to PE1.
But in a typical deployment scenario there will be no IP route from
PE4 to PE1, as they belong to different ASes.</t>
<t> PE1 adds the return path from PE4 to PE1 in the MPLS echo request
using multiple segments in the “Reply Path TLV”
as defined above.
An example return path from PE4 to PE1 for
LSP ping is [N-ASBR4, EPE-ASBR4-ASBR1, N-PE1].
An implementation may also build a return path consisting only of the labels
needed to reach its own AS. Once the
label stack is popped off, the echo reply message will be exposed.
Further packet forwarding will be based on IP lookup.
An example return path for this case could be [N-ASBR4, EPE-ASBR4-ASBR1].</t>
<t>On receiving the MPLS echo request, PE4 first validates the FEC in the echo request.
PE4 then builds the label stack to send the response from PE4 to PE1 by copying
the labels from the “Reply Path TLV”. PE4 builds the echo reply packet,
imposes MPLS headers with the constructed label stack
on top of the echo reply packet, and sends the packet out towards PE1.
This Segment List stack can successfully steer the reply back to the head-end node (PE1).</t>
</section>
<section anchor='Mpls_traceroute_procedures' 
title='Procedures for Segment Routing LSP Traceroute'>
<t>The traceroute procedure involves visiting every node on the path, with an echo reply
sent from every node. This section describes the traceroute mechanisms
when the headend/PMS has complete visibility of the database. The headend/PMS
computes the return path from each node in the entire SR-MPLS path that
is being tracerouted. The return path computation is implementation dependent.
As the headend/PMS completely controls the return path, it can use proprietary
computations to build the return path. </t>
<t>One way to build the return path
is to add each domain
border node's Node-SID to the return path label stack as the traceroute progresses.
For inter-AS networks, in addition to the border node's Node-SID, the EPE-SID in the reverse
direction also needs to be added to the label stack.
</t>
<t>The inter-domain/inter-AS traceroute procedure uses the TTL expiry
mechanism as specified in <xref target="RFC8029"/> and
<xref target="RFC8287"/>. In every echo request packet, the headend/PMS
MUST include the appropriate return path in the Reply Path TLV.
The node that receives the echo request MUST follow the procedures
described in section <xref target="responder_procedure"/> and
section <xref target="sending_echo_reply"/> to send out the echo reply.
</t>

<t>For example:</t>
<t>Consider the topology in <xref target="Topology_1"/> and
the SR-MPLS path [N-P1, N-ASBR1, EPE-ASBR1-ASBR4, N-PE4].
The traceroute is executed on this inter-AS path for destination PE4.
PE1 sends the first echo request with the TTL set to 1 and includes a return path TLV
consisting of a Type A segment containing a label derived from its
own SR Global Block (SRGB).
Note that the type of segment used in constructing the return path is
a matter of local policy. If the entire network has the same SRGB configured, Type A segments
can be used. The TTL expires on P1, and P1 sends the echo reply using the
return path. Note that implementations may choose to exclude the return path TLV
until the traceroute reaches the first domain border, as the return IP path to PE1
is expected to be available inside the first domain.</t>
<t> The TTL is then set to 2 and the next echo request is sent out. Until the 
traceroute procedure reaches the domain border node ASBR1, the same return path
TLV consisting of a single label (PE1's node label) is used. When an echo request reaches
ASBR1 and the echo reply is received, the next echo request needs to include an additional
label, as ASBR1 is a border node. The return path TLV is built based on the forward path.
As the forward path consists of EPE-ASBR1-ASBR4, an EPE-SID in the reverse direction
is included in the return path TLV. The return path now consists of two
labels [N-PE1, EPE-ASBR4-ASBR1]. The echo reply from ASBR4 will use this return path.</t>
<t>
The next echo request after visiting the border node ASBR4 updates the return path
with the Node-SID label of ASBR4. The return path beyond ASBR4 will be 
[N-PE1, EPE-ASBR4-ASBR1, N-ASBR4]. This same return path is used until the
traceroute procedure reaches the next set of border nodes. When there are multiple ASes,
the traceroute procedure continues by adding a set of node labels and EPE labels as
the border nodes are visited.
</t>
<t>Note that the above return path building procedure requires the database of all the
domains to be available at the headend/PMS. </t>

<t> The above description assumes that the same SRGB is configured on all nodes along the path.
However, the SR architecture <xref target="RFC8402"/> allows nodes to use different SRGBs,
so the SRGB may differ from one node to another. In such scenarios, PE1 sends a Type 3
(or Type 4 in case of IPv6 networks) segment with the node address of PE1 and an optional MPLS SID
associated with the node address. The receiving node derives the label for the return path based on
its own SRGB. When the traceroute procedure crosses the border node ASBR1, the headend PE1 should 
send a Type 1 segment for N-PE1 with the label derived from ASBR1's SRGB. This is
required because the nodes in AS2 (ASBR4, P3, P4, etc.) may not have the topology information to
derive the SRGB for PE1. After the traceroute procedure reaches ASBR4, the return path
will be [N-PE1 (Type 1 with label based on ASBR1's SRGB), EPE-ASBR4-ASBR1, N-ASBR4 (Type 3)].</t>

<t> To extend the example to three or more ASes, consider
a traceroute from PE1 to PE5 in <xref target="Topology_1"/>. In this example,
the PE1 to PE5 path has to cross three domains: AS1, AS2, and AS3. Consider a path from PE1
to PE5 that goes through [PE1, ASBR1, ASBR4, ASBR6, ASBR8, PE5].
While the traceroute procedure is visiting the nodes in AS1, the Reply Path TLV
sent from the headend consists of [N-PE1]. When the traceroute procedure reaches ASBR4,
the return path consists of [N-PE1, EPE-ASBR4-ASBR1]. While visiting nodes in AS2,
the echo requests carry the Reply Path TLV [N-PE1, EPE-ASBR4-ASBR1, N-ASBR4].
Similarly, while visiting ASBR8, the Reply Path TLV adds the EPE-SID from ASBR8 to ASBR6.
While visiting nodes in AS3, the Node-SID of ASBR8 is also added, which makes the 
return path [N-PE1, EPE-ASBR4-ASBR1, N-ASBR4, EPE-ASBR8-ASBR6, N-ASBR8].</t>
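<t>The progression of the return path in the three-AS example can be sketched in code.
This is a minimal illustration under stated assumptions, not a normative algorithm: the function
name return_path, the as_of map, and the assignment of nodes to AS numbers are hypothetical and
inferred from the example text, and the node list omits internal P nodes exactly as the example does.
Segments are listed in the same order as in the bracketed lists of the example.</t>
<figure><artwork><![CDATA[
```python
from typing import Dict, List


def return_path(path: List[str], as_of: Dict[str, int], hop: int) -> List[str]:
    """Return path for the echo request whose TTL expires at path[hop].

    path[0] is the headend; hop must be 1 or greater.
    """
    segments = ["N-" + path[0]]  # the headend Node-SID is always present
    for i in range(1, hop + 1):
        prev, cur = path[i - 1], path[i]
        if as_of[prev] != as_of[cur]:
            # cur is the first ASBR of a new AS: add the reverse EPE-SID.
            segments.append("EPE-" + cur + "-" + prev)
            if i != hop:
                # The traceroute has moved beyond cur: add its Node-SID too.
                segments.append("N-" + cur)
    return segments


# Hypothetical encoding of the PE1-to-PE5 example (AS numbers assumed).
PATH = ["PE1", "ASBR1", "ASBR4", "ASBR6", "ASBR8", "PE5"]
AS_OF = {"PE1": 1, "ASBR1": 1, "ASBR4": 2, "ASBR6": 2, "ASBR8": 3, "PE5": 3}

for hop in range(1, len(PATH)):
    print(PATH[hop], return_path(PATH, AS_OF, hop))
```
]]></artwork></figure>
<t>The sketch relies on the headend knowing the AS of every node on the path, which matches the
assumption of this section that the headend/PMS has the database of all domains.</t>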

<t>Consider another example using the topology in <xref target="Topology_2"/>.
This topology consists of a multi-domain IGP with a common border node between the domains. 
This could be achieved with a multi-area or multi-level IGP, or with multiple IGP instances
deployed on the same node.
The return path computation for this topology is similar to the multi-AS computation,
except that the return path carries only a single label per border node. When the traceroute
procedure is visiting node P, the return path consists of [N-PE1, N-ABR1].</t>


</section>
</section>

<section anchor='Dynamic_TLV_building' title='Building Reply Path TLV dynamically'>
<t>In some cases, the head-end may not have complete visibility of the
inter-AS/inter-domain topology. 
In such cases, it can rely on downstream routers to build the 
reverse path for MPLS traceroute procedures.
For this purpose, the Reply Path TLV
in the echo reply carries the return path to be 
used in the next echo request.


<figure anchor="return_code" title="Reply path return Code">
    <artwork>
   Value         Meaning
   ------        ----------------------
   TBA1        Use Reply Path TLV in echo reply for next echo request.
   
</artwork>
	  </figure>

  
</t>


<section anchor='TLV_build_procedures' title='The procedures to build the return path'>
<t> In order to dynamically build the return path for traceroute procedures,
the domain border nodes along the path being tracerouted MUST support the 
procedures described in this section. Local policy on the domain border nodes
SHOULD determine whether the domain border node participates in building the
return path dynamically during traceroute.</t>
<t> The headend/PMS node MAY include its own node label when initiating the traceroute
procedure. When an ABR receives the echo request and local policy
permits building the return path dynamically, the ABR MUST include its own node label. 
If a Reply Path TLV is included in the received echo request message,
the ABR's node label is added before the existing segments. The type of
segment added is based on local policy. When the SRGB is not uniform across the
network, it is RECOMMENDED to add a Type 3 or Type 4 segment.
If the existing segment in the Reply Path TLV is a Type 3/Type 4 segment, 
that segment MUST be converted to a Type 1 segment based on the ABR's
own SRGB. This is because downstream nodes will not know which SRGB to use
to translate the IP address to a label. As the ABR has added its own node label, it is
guaranteed that this ABR will be on the return path and will forward the 
traffic based on the next label after its own label.</t>
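<t>The ABR handling described above can be sketched as follows. This is an illustration only:
the Segment class is a hypothetical in-memory model and not the sub-TLV wire format, the function and
parameter names are invented for the example, and the addresses, SRGB base, and SID indexes are made-up
values. The label for a Type 1 segment is assumed to be the ABR's SRGB base plus the SID index of the
node, as in the SR architecture <xref target="RFC8402"/>.</t>
<figure><artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    seg_type: int                       # 1 = label, 3 = IPv4 node, 4 = IPv6 node
    label: Optional[int] = None
    node_address: Optional[str] = None


def abr_update_reply_path(received: List[Segment], abr_address: str,
                          srgb_base: int, sid_index: Dict[str, int],
                          uniform_srgb: bool) -> List[Segment]:
    """Illustrative ABR handling of the Reply Path TLV segments."""
    updated = []
    for seg in received:
        if seg.seg_type in (3, 4):
            # Convert to Type 1 using this ABR's own SRGB.
            label = srgb_base + sid_index[seg.node_address]
            updated.append(Segment(1, label=label))
        else:
            updated.append(seg)
    if uniform_srgb:
        own = Segment(1, label=srgb_base + sid_index[abr_address])
    else:
        # Non-uniform SRGB: a Type 3 or Type 4 segment is RECOMMENDED.
        own = Segment(4 if ":" in abr_address else 3,
                      node_address=abr_address)
    # The ABR's own segment goes before the existing segments.
    return [own] + updated


# Hypothetical values: PE1 is 192.0.2.1 and the ABR is 192.0.2.11.
SID_INDEX = {"192.0.2.1": 1, "192.0.2.11": 11}
result = abr_update_reply_path(
    [Segment(3, node_address="192.0.2.1")], "192.0.2.11",
    srgb_base=16000, sid_index=SID_INDEX, uniform_srgb=False)
```
]]></artwork></figure>
<t>In this sketch the received Type 3 segment for PE1 becomes a Type 1 segment carrying a label from the
ABR's SRGB, while the ABR's own segment stays address-based so that the next responder can resolve it
against its own SRGB.</t>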

<t>When an ASBR receives an echo request from another AS and the ASBR is 
configured to build the return path dynamically,
the ASBR MUST build a Reply Path TLV and include it in the echo reply.
The Reply Path TLV MUST consist of its own node label and an EPE-SID 
to the AS from which the traceroute message was received.
A Reply Path return code of TBA1 MUST be set in the echo reply to indicate that the next echo request
should use the return path from the Reply Path TLV in the echo reply.
The ASBR MUST locally decide the outgoing interface
for the echo reply packet. Generally, the remote ASBR will choose the interface
on which the incoming OAM packet was received to send the echo reply out.
The Reply Path TLV is built by adding two segment sub-TLVs. The top segment
sub-TLV consists of the ASBR's Node-SID, and the second segment sub-TLV consists 
of the EPE-SID in the reverse direction to reach the AS from which 
the OAM packet was received. The type of segment chosen to build the
Reply Path TLV is a matter of local policy. It is RECOMMENDED to use a Type 3/Type 4 segment for 
the top segment when the SRGB is not guaranteed to be uniform in the domain.
</t>
<t> Irrespective of which type of segment is included in the Reply Path TLV,
 the responder
 to the echo request MUST always translate the Reply Path TLV to a label stack and build the
 MPLS header for the echo reply packet. This procedure can be applied to
 an end-to-end path consisting of multiple ASes.
 Each ASBR that receives an echo request from another AS adds its 
 Node-SID and EPE-SID on top of the
 existing segments in the Reply Path TLV.</t> 
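<t>A minimal sketch of the ASBR step described above is shown here. The function name, the string
form of the segments, and the TBA1 placeholder constant are assumptions made for illustration, since
the code point is not yet assigned. Lists in this sketch are ordered with the top segment first, so
the newly added segments appear at the front.</t>
<figure><artwork><![CDATA[
```python
from typing import List, Tuple

TBA1 = "TBA1"  # placeholder: the return code value is not yet assigned


def asbr_build_reply_path(existing: List[str], asbr: str,
                          peer_asbr: str) -> Tuple[List[str], str]:
    """Build the Reply Path TLV an ASBR returns for a request from another AS.

    existing  -- segments received in the echo request, top segment first
    asbr      -- name of this ASBR
    peer_asbr -- name of the neighboring ASBR the request arrived from
    """
    # Top segment: own Node-SID. Second segment: EPE-SID back to the peer AS.
    new_segments = ["N-" + asbr, "EPE-" + asbr + "-" + peer_asbr]
    return new_segments + list(existing), TBA1


# ASBR4 receives the request from ASBR1, then ASBR8 receives it from ASBR6.
at_asbr4, code4 = asbr_build_reply_path(["N-PE1"], "ASBR4", "ASBR1")
at_asbr8, code8 = asbr_build_reply_path(at_asbr4, "ASBR8", "ASBR6")
```
]]></artwork></figure>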
 <t> An ASBR that receives the echo request from a neighbor 
 belonging to the same AS
 MUST examine the Reply Path TLV received in the echo request. 
 If the Reply Path TLV
 contains a Type 3/Type 4 segment, the ASBR MUST convert that segment to a Type 1
 segment by deriving the label from its own SRGB. The ASBR MUST set the
 Reply Path return code to TBA1
 and send the newly constructed Reply Path TLV in the echo reply.</t>
 
 <t>Internal nodes (nodes that are not domain border nodes) MAY choose not to set the Reply Path 
 return code to TBA1 in the 
 echo reply message, as there is no change in the return path. In these cases,
the headend/PMS that initiates the traceroute procedure MUST continue
to send the previously sent
Reply Path TLV in every subsequent echo request. </t>
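<t>The headend behavior across successive echo replies can be sketched as below. The function name,
the string form of the segments, and the TBA1 placeholder constant are hypothetical. Falling back to
the previous return path when TBA1 is set but no Reply Path TLV is present is an assumption of this
sketch, not a rule stated in this document.</t>
<figure><artwork><![CDATA[
```python
from typing import List, Optional

TBA1 = "TBA1"  # placeholder: the return code value is not yet assigned


def next_reply_path(previous: List[str],
                    reply_path_return_code: Optional[str],
                    reply_path_in_reply: Optional[List[str]]) -> List[str]:
    """Choose the Reply Path TLV contents for the next echo request."""
    if reply_path_return_code == TBA1 and reply_path_in_reply is not None:
        # A border node rebuilt the return path: adopt it.
        return list(reply_path_in_reply)
    # Internal node, or nothing usable in the reply: keep the previous path.
    return list(previous)


current = ["N-PE1"]
# Reply from an internal node: return code not set to TBA1.
current = next_reply_path(current, None, None)
after_internal = current
# Reply from a border node that added its own node label.
current = next_reply_path(current, TBA1, ["N-PE1", "N-ABR1"])
```
]]></artwork></figure>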

<t>Note that an ASBR's local policy may prohibit it from participating in the dynamic
traceroute procedures. If such an ASBR is encountered on the forward path, the dynamic return path
building procedures will fail. In such cases, an ASBR that supports this document MUST set the
return code TBA2 to indicate that local policy does not allow dynamic return path building.</t>
<t>
<figure anchor="return_code2" title="Local policy return Code">
    <artwork>
   Value         Meaning                                                 
   ------        ---------------------------------------------------                         
    TBA2        Local policy does not allow dynamic return Path building.  

</artwork>
	  </figure>

  
</t>

</section>
<section anchor='TLV_build_procedure_example' title='Details with example'>
<t>Consider the topology in <xref target="Topology_1"/> and
an SR Policy path built from PE1 to PE4 
with the following label stack:
N-P1, N-ASBR1, EPE-ASBR1-ASBR4, N-PE4.

PE1 begins the traceroute with the TTL set to 1 and includes [N-PE1] in the Reply Path TLV.
The TTL of the traceroute packet expires on P1, and P1 processes the traceroute as per the
procedures described in <xref target="RFC8029"/> and <xref target="RFC8287"/>.
P1 sends an echo reply with the same Reply Path TLV and with the Reply Path return code set to 6.
The return code of the echo reply itself is set as per 
<xref target="RFC8029"/> and <xref target="RFC8287"/>.
The traceroute does not need any changes to the Reply Path TLV 
until it leaves AS1. P1 and P2 may either include the received Reply Path TLV in the
echo reply or omit the Reply Path TLV, so that the headend continues to
use the same return path that it used in the previous echo request.</t>
 
 <t>When ASBR1 receives the echo request and the Reply Path TLV in the echo request
 contains a Type 3/Type 4 segment, it converts that segment to
 Type 1 based on its own SRGB.
 When ASBR4 receives the echo request, it should form the Reply Path TLV 
 using its own Node-SID (N-ASBR4)
 and EPE-SID (EPE-ASBR4-ASBR1) labels and set the Reply Path return code to TBA1. 
 PE1 should then use this Reply Path TLV in subsequent echo requests.
 In this example, when the subsequent echo request reaches P3, P3 should use
 this Reply Path TLV for sending the
 echo reply. The same Reply Path TLV is sufficient for any router in AS2 to send the reply, 
 because the first label (N-ASBR4) directs the echo reply to ASBR4 
 and the second label (EPE-ASBR4-ASBR1) directs the
 echo reply to AS1. Once the echo reply reaches AS1, normal IP forwarding or the N-PE1 label delivers 
 it to PE1.</t>
 <t> The example described in the above paragraphs can be extended to multiple ASes by following
 the same procedure, with each ASBR adding its Node-SID and EPE-SID on receiving an echo request from
 a neighboring AS.</t>
 
 <t>Consider the topology in <xref target="Topology_2"/>.
 It consists of multiple IGP domains, built with multiple areas/levels or separate IGP instances.
 A single border node separates the two domains. In this case,
 PE1 sends a traceroute packet with the TTL set to 1 and includes N-PE1 in the Reply Path TLV.
 ABR1 receives the echo request and, while sending the echo reply, adds its own node label
 to the Reply Path TLV and sets the Reply Path return code to TBA1. The Reply Path TLV
 in the echo reply from ABR1 consists of [N-PE1, N-ABR1]. The next echo request, with TTL 2, reaches
 the P node. It is an internal node, so it does not change the return path. The echo request with TTL 3
 reaches ABR2, which adds its own node label, so the Reply Path TLV sent in the echo reply
 will be [N-PE1, N-ABR1, N-ABR2]. The echo request with TTL 4 reaches PE4, which sends an echo reply with the return code 
 set to Egress. PE4 does not include any Reply Path TLV in the echo reply. The above example assumes 
 a uniform SRGB throughout the domain. In case of different SRGBs, the top segment will be a Type 3/Type 4
 segment and all other segments will be Type 1. Each border node converts the Type 3/Type 4 segment to
 Type 1 before adding its own segment to the Reply Path TLV.</t>
 
</section>
</section>
  
  <section title='Security Considerations' anchor='sec-con'>
    <t> The procedures described in this document enable LSP ping and traceroute to be
	executed across multiple domains or multiple ASes that belong to the same administration
	or to closely cooperating administrations. It is assumed that sharing domain-internal
	information across such domains does not pose a security risk. 
	However, the procedures described in this document may be used by an attacker to extract
	domain-internal information. An operator MUST deploy appropriate filter policies
	as described in <xref target="RFC8029"/> to restrict the LSP ping/traceroute packets based on origin.
	An operator SHOULD also deploy security mechanisms such as MACsec
	on inter-domain links or security-vulnerable links to prevent spoofing attacks. </t>
  </section>
  <section anchor="IANA" title="IANA Considerations">
   
	   
	     <t>Sub-TLVs for TLV Types 1, 16, and 21
        <list>
		<t>SID only in the form of MPLS label : TBD (Range 32768-65535)</t>
		<t>IPv4 Node Address with optional SID for SR-MPLS : TBD (Range 32768-65535)</t>
		<t>IPv6 Node Address with optional SID for SR-MPLS : TBD (Range 32768-65535)</t>
		</list>
	   </t>
	   <t> Reply Path Return Codes Registry</t>
	   <t>
	       <list>
	       <t>TBA1: Use Reply Path TLV in echo reply for next echo request.</t>
	       <t>TBA2: Local policy does not allow dynamic return Path building.</t>
           </list>
        </t>

  </section>
    <section title='Contributors'>
	<t>1.Carlos Pignataro</t>
	<t>Cisco Systems, Inc.</t>
	<t>cpignata@cisco.com</t>
	
	<t>   </t>
	<t>2. Zafar Ali</t>
	<t>Cisco Systems, Inc.</t>
	<t>zali@cisco.com</t>
	
	
  </section>
  <section title='Acknowledgments'>
    <t> Thanks to Bruno Decraene for suggesting the use of the generic Segment sub-TLV.
	    Thanks to Adrian Farrel, Huub van Helvoort, Dhruv Dhody, and Dongjie for their careful review and comments.
		Thanks to Mach Chen
		for suggesting the use of the Reply Path TLV. Thanks to Gregory
		Mirsky for a detailed review that greatly improved the readability of the
		document.
       	</t>
  </section>

</middle>

<back>
  <references title='Normative References'>
    &RFC8174;
	&RFC2119;
	&RFC8287;
	&RFC8029; 	
	&RFC7110; 		
    &RFC9087;

 
  </references>
  <references title='Informative References'>  
	
    &RFC8403;
    &RFC8402;
    &RFC8604;
	&RFC7743;
    &RFC8277;
	&RFC8660;
   <?rfc include="reference.I-D.ietf-idr-bgpls-segment-routing-epe"?>   
   
   <?rfc include="reference.I-D.ietf-spring-segment-routing-policy"?>
  </references>
</back>
</rfc>
