Network Working Group                                                 Tony Li
INTERNET DRAFT                                                  Li Consulting

                                                              Tony Przygienda
                                                                Siara Systems

                                                                    Henk Smit
                                                                Cisco Systems
                                                                    June 1999

         Domain-wide Prefix Distribution with Multi-Level IS-IS

                  <draft-ietf-isis-domain-wide-01.txt>

Status

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

1.0 Abstract

   This document describes extensions to the IS-IS protocol to support
   optimal routing within a multi-level domain.  The IS-IS protocol is
   specified in ISO 10589 [1], with extensions for supporting IPv4
   specified in RFC 1195 [2].

   This document extends the semantics presented in RFC 1195 so that a
   routing domain running with both Level 1 and Level 2 Intermediate
   Systems (IS) [routers] can distribute IP prefixes between Level 1 and

Expiration Date December 1999                                   [Page 1]

INTERNET DRAFT                                                 June 1999

   Level 2 and vice versa.  This distribution requires certain
   restrictions to ensure that persistent forwarding loops do not form.
   The goal of this domain-wide prefix distribution is to increase the
   granularity of the routing information within the domain.

2.0 Introduction

   An IS-IS routing domain (a.k.a., an autonomous system running IS-IS)
   can be partitioned into multiple level 1 (L1) areas, and a level 2
   (L2) connected subset of the topology that interconnects all of the
   L1 areas.  Within each L1 area, all routers exchange link state
   information.  L2 routers also exchange L2 link state information to
   compute routes between areas.

   RFC 1195 [2] defines the Type, Length and Value (TLV) tuples that are
   used to transport IPv4 routing information in IS-IS.  RFC 1195 also
   specifies the semantics and procedures for interactions between
   levels.  Specifically, routers in a L1 area will exchange information
   within the L1 area.  For IP destinations not found in the prefixes in
   the L1 database, the L1 router should forward packets to the nearest
   router that is in both L1 and L2 (i.e., an L1L2 router) with the
   'attach' bit set in its L1 Link State Protocol Data Unit (LSP).
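
   The default forwarding behaviour described above can be sketched as
   follows.  This is a minimal illustration under stated assumptions,
   not text from RFC 1195: the function name pick_default_exit and the
   two input structures are hypothetical, and the L1 SPF costs are
   assumed to have been computed already.

```python
def pick_default_exit(l1_costs, attached):
    """Return the nearest L1L2 router whose L1 LSP has the attach bit set.

    l1_costs: dict mapping router name -> L1 SPF cost from the local router
              (hypothetical input, assumed already computed).
    attached: set of router names that advertise the attach bit.
    Returns None when no attached router is reachable.
    """
    candidates = [(cost, name) for name, cost in l1_costs.items()
                  if name in attached]
    if not candidates:
        return None
    # Lowest cost wins; ties fall back to the name only to keep the
    # sketch deterministic.
    return min(candidates)[1]
```

   For example, with costs {"A": 10, "B": 5, "C": 1} and only A and B
   attached, the sketch selects B even though C is closer, because C
   does not advertise the attach bit.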

   Also per RFC 1195, an L1L2 router should be manually configured with
   a set of prefixes that summarize the IP prefixes found in that L1
   area.  These summaries are injected into L2.  RFC 1195 specifies no
   further interactions between L1 and L2 for IPv4 prefixes.

2.1 Motivations for domain-wide prefix distribution

   The mechanisms specified in RFC 1195 are appropriate in many
   situations, and lead to excellent scalability properties.  However,
   in certain circumstances, the domain administrator may wish to
   sacrifice some amount of scalability and distribute more specific
   information than is described by RFC 1195.  This section discusses
   the various reasons why the domain administrator may wish to make
   such a tradeoff.

   One major reason for distributing more prefix information is to
   improve the quality of the resulting routes.  A well-known property of
   prefix summarization or any abstraction mechanism is that it
   necessarily results in a loss of information.  This loss of
   information in turn results in the computation of a route based upon
   less information, which will frequently result in routes that are not
   optimal.

   A simple example can serve to demonstrate this adequately.  Suppose
   that a L1 area has two L1L2 routers that both advertise a single
   summary of all prefixes within the L1 area.  To reach a destination
   inside the L1 area, any other L2 router is going to compute the
   shortest path to one of the two L1L2 routers for that area.  Suppose,
   for example, that both of the L1L2 routers are equidistant from the
   L2 source, and that the L2 source arbitrarily selects one L1L2
   router.  This router may not be the optimal router when viewed from
   the L1 topology.  In fact, the path from the selected L1L2 router to
   the destination router may traverse the L1L2 router that was not
   selected.  If more detailed topological or metric information were
   available to the L2 source router, it could compute a better route.
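
   The effect can be made concrete with a small worked example.  The
   router names S, X and Y and all of the costs below are hypothetical,
   chosen only to mirror the scenario just described: X and Y are
   equidistant from the L2 source S, and Y reaches the destination
   through X.

```python
# Hypothetical costs: S is the L2 source, X and Y are the area's L1L2 routers.
L2_COST_FROM_S = {"X": 10, "Y": 10}    # X and Y are equidistant from S
L1_COST_TO_DEST = {"X": 1, "Y": 21}    # Y reaches the destination via X

def choose_entry(l2_cost, true_l1_cost, visible_l1_cost=None):
    """Pick the L1L2 router that S forwards to, plus the real end-to-end cost.

    With visible_l1_cost=None only the summary is visible, so S compares
    L2 cost alone.  Ties are broken in reverse name order purely to model
    an arbitrary choice.
    """
    def visible(router):
        extra = visible_l1_cost[router] if visible_l1_cost else 0
        return l2_cost[router] + extra

    candidates = sorted(l2_cost, reverse=True)
    chosen = min(candidates, key=visible)
    return chosen, l2_cost[chosen] + true_l1_cost[chosen]

summary_only = choose_entry(L2_COST_FROM_S, L1_COST_TO_DEST)
detailed = choose_entry(L2_COST_FROM_S, L1_COST_TO_DEST, L1_COST_TO_DEST)
```

   With only the summary, S may pick Y and pay a total cost of 31; with
   the L1 metric visible, it picks X and pays 11.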

   This situation is symmetric in that an L1 router has no information
   about prefixes in L2 or within a different L1 area.  In using the
   nearest L1L2 router, that L1L2 is effectively injecting a default
   route without metric information into the L1 area.  The route
   computation that the L1 router performs is similarly suboptimal.

   Besides the optimality of the routes computed, there are two other
   significant drivers for the domain-wide distribution of prefix
   information.

   When a router learns multiple possible paths to external destinations
   via BGP, it will select only one of those routes to be installed in
   the forwarding table.  One of the factors in the BGP route selection
   is the IGP cost to the BGP next hop address.  Many ISP networks
   depend on this technique, which is known as "shortest exit routing".
   If a L1 router does not know the exact IGP metric to all BGP speakers
   in other L1 areas, it cannot do effective shortest exit routing.
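
   A minimal sketch of why this fails follows.  It models only the IGP
   cost comparison and assumes all earlier BGP selection steps are tied;
   the function names, the next hop addresses and the costs are
   hypothetical.

```python
def igp_cost_to(next_hop, specific_routes, default_cost):
    """IGP cost an L1 router sees towards a BGP next hop.

    Without a specific route the router can only see the cost to its
    default exit (the nearest attached L1L2 router).
    """
    return specific_routes.get(next_hop, default_cost)

def shortest_exit(next_hops, specific_routes, default_cost):
    """Choose the BGP next hop with the lowest visible IGP cost.

    Ties fall back to the address string, standing in for whatever
    later tie-breaker an implementation applies.
    """
    return min(
        next_hops,
        key=lambda nh: (igp_cost_to(nh, specific_routes, default_cost), nh),
    )

exits = ["192.0.2.1", "192.0.2.2"]
blind = shortest_exit(exits, {}, default_cost=5)
informed = shortest_exit(exits, {"192.0.2.1": 40, "192.0.2.2": 15}, default_cost=5)
```

   In the blind case both next hops appear to cost 5, so the choice is
   made by the tie-breaker rather than by distance; once specific
   metrics are known, the closer exit is selected.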

   The third driver is the current practice of using the IGP (IS-IS)
   metric as part of the BGP Multi-Exit Discriminator (MED).  The value
   in the MED is advertised to other domains and is used to inform other
   domains of the optimal entry point into the current domain.  Current
   practice is to take the IS-IS metric and insert it as the MED value.
   This tends to cause external traffic to enter the domain at the point
   closest to the exit router.  Note that the receiving domain may,
   based upon policy, choose to ignore the MED that is advertised.
   However, current practice is to distribute the IGP metric in this way
   in order to optimize routing wherever possible.  This works in
   current networks that consist of a single area, but becomes
   problematic if hierarchy is introduced.  This is again
   because the loss of end-to-end metric information means that the MED
   value will not reflect the true distance across the advertising
   domain.  Full distribution of prefix information within the domain
   would alleviate this problem as it would allow accurate computation
   of the IS-IS metric across the domain, resulting in an accurate value
   presented in the MED.

2.2 Scalability

   The disadvantage of performing the domain-wide prefix distribution
   described above is that it has an impact on the scalability of IS-IS.
   Areas within IS-IS help scalability in that LSPs are contained within
   a single area.  This limits the size of the link state database,
   which in turn limits the complexity of the shortest path computation.

   Further, the summarization of the prefix information aids scalability
   in that the abstraction of the prefix information reduces the number
   of data items to be transported and the number of routes to be
   computed.

   It should be noted quite strongly that the distribution of prefixes
   on a domain-wide basis impacts the scalability of IS-IS in the second
   respect.  It will increase the number of prefixes throughout the
   domain.  This will result in increased memory consumption,
   transmission requirements and computation requirements throughout the
   domain.

   It must also be noted that the domain-wide distribution of prefixes
   has no effect whatsoever on the first aspect of scalability, namely
   the existence of areas and the limitation of the distribution of the
   link state database.

   Thus, the net result is that the introduction of domain-wide prefix
   distribution into a formerly flat, single area network is a clear
   benefit to the scalability of that network.  However, it is a
   compromise and does not provide the maximum scalability available
   with IS-IS.  Domains that choose to make use of this facility should
   be aware of the tradeoff that they are making between scalability and
   optimality and provision and monitor their networks accordingly.
   Normal provisioning guidelines that would apply to a fully
   hierarchical deployment of IS-IS will not apply to this type of
   configuration.

3.0 New semantics for external type metrics

   RFC 1195 defines two TLVs for carrying IP prefixes.  TLV 128 is
   defined to carry 'internal' prefixes and TLV 130 is defined to carry
   'external' prefixes.  The original intent of RFC 1195 was to carry
   intra-domain routes within the internal prefix TLV and inter-domain
   routes or intra-domain routes from alternate IGPs in an external
   prefix TLV.  Interestingly, TLV type 130 is not documented to exist
   in Level 1 LSPs.

   In addition to this distinction, RFC 1195 provides for a bit in each
   of these TLVs that distinguishes between an internal metric type and
   an external metric type.  Similarly, the clear intent was that the
   internal metric type should reflect a total metric that is the sum of
   the metrics to the advertising router plus the metric to the prefix.
   Further, for an external metric type, the total metric should simply
   be the metric advertised to the prefix, not including the total
   metric necessary to reach the exit router.  Prefixes with internal
   metrics are always preferred over external metrics, regardless of the
   value of the metrics.
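
   The metric type semantics described above can be sketched as
   follows.  This is an illustration only; the Candidate type and its
   field names are hypothetical and do not come from RFC 1195.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    cost_to_advertiser: int  # SPF cost to the router advertising the prefix
    advertised_metric: int   # metric carried in the TLV
    external_metric: bool    # True when the metric type bit marks it external

def total_metric(c):
    """Internal: path cost plus advertised metric.  External: advertised only."""
    if c.external_metric:
        return c.advertised_metric
    return c.cost_to_advertiser + c.advertised_metric

def preference_key(c):
    # Internal metric types sort ahead of external ones regardless of value.
    return (c.external_metric, total_metric(c))

def best(candidates):
    return min(candidates, key=preference_key)

internal = Candidate(cost_to_advertiser=50, advertised_metric=10, external_metric=False)
external = Candidate(cost_to_advertiser=50, advertised_metric=5, external_metric=True)
```

   Here the internal candidate has a total metric of 60 and the
   external one a total metric of 5, yet the internal candidate is
   still selected.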

   It should be noted that the combination of an internal prefix with an
   external metric type is not obviously useful, and is not allowed by
   RFC 1195.

   It should also be noted that as of this writing, the authors know of
   no deployed implementations that make use of either the external
   prefix or the external metric type.  The implication is that this
   proposal is free to redefine the semantics of the external metric
   type bit without conflicting with existing protocol deployment.

   An essential property when redistributing prefixes between levels is
   to ensure that no persistent loops form in the distribution of
   information (i.e., a routing loop), as this would lead to the
   indefinite propagation of the information, even in the event that the
   information was no longer originated by some system in the domain.
   Further, a routing loop is likely to form a forwarding loop, where
   actual traffic traverses the network in a cycle in the topology.
   Forwarding loops are known to consume large amounts of resources and
   are to be avoided.

3.1 Proposed semantics for inter-area routes

   To provide the above properties, this proposal defines the following
   syntax and semantics.

   An intra-area route is a route computed based on a prefix advertised
   by some IS-IS router in the area.  Thus, a prefix advertised in the
   L1 link state database may become a L1 intra-area route within the
   area of the advertiser.  Similarly, a prefix advertised in the L2
   link state database may become a L2 intra-area route within L2.
   Prefixes associated with an intra-area route are also said to be
   intra-area prefixes.

   An inter-area route is a route computed based on a prefix advertised
   by an IS-IS router not in the local area.  Inter-area routes exist
   either in L2, in which case they are L1->L2 inter-area routes, or in
   L1, in which case they are L2->L1 inter-area routes.  Prefixes
   associated with an inter-area route are also said to be inter-area
   prefixes.

   External prefixes are reserved for prefixes originating outside of
   the IS-IS system, usually learned from another routing protocol.

   The following tables describe the types of prefixes now defined
   within IS-IS and how they are encoded:

         Level-1 LSPs       |  Internal TLV (128)  | External TLV (130)
     ----------------------------------------------------------------------
     Internal metric-type   |   L1 intra-area      |   external           |
     ----------------------------------------------------------------------
     External metric-type   |   L2->L1 inter-area  |   external           |
     ----------------------------------------------------------------------

         Level-2 LSPs       |  Internal TLV (128)  | External TLV (130)
     ----------------------------------------------------------------------
     Internal metric-type   |   L2 intra-area  or  |   external           |
                            |   L1->L2 inter-area  |                      |
     ----------------------------------------------------------------------
     External metric-type   |   should not exist   |   external           |
     ----------------------------------------------------------------------
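
   The two tables can be read as a lookup from (LSP level, TLV type,
   metric type) to prefix type.  A minimal sketch of that lookup
   follows; the function name prefix_type and the returned strings are
   illustrative choices, while the TLV numbers are those given above.

```python
INTERNAL_TLV = 128
EXTERNAL_TLV = 130

def prefix_type(level, tlv, external_metric):
    """Classify a prefix by LSP level, TLV type and metric type bit."""
    if level not in (1, 2):
        raise ValueError("level must be 1 or 2")
    if tlv == EXTERNAL_TLV:
        # The external TLV means 'external' at either level and metric type.
        return "external"
    if tlv != INTERNAL_TLV:
        raise ValueError("expected TLV 128 or 130")
    if level == 1:
        return "L2->L1 inter-area" if external_metric else "L1 intra-area"
    if external_metric:
        return "should not exist"
    # Per the table, the Level-2 encoding alone does not separate these two.
    return "L2 intra-area or L1->L2 inter-area"
```

   Note that in Level-2 LSPs the encoding does not distinguish an L2
   intra-area prefix from an L1->L2 inter-area prefix, so the sketch
   returns a combined label for that cell.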

   Based on these definitions and encodings, this proposal defines the
   following redistribution rules:

   1) Only L1 intra-area prefixes and external prefixes are
   redistributed from L1 into L2.

   2) All prefixes can be redistributed from L2 into L1 and become
   L2->L1 inter-area routes.  A L2 prefix must not be redistributed into
   a L1 area if that same prefix is an intra-area prefix in the L1 area.

   3) Within L1, an intra-area prefix is preferred over an inter-area
   prefix, regardless of the comparison of the metrics.
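
   The three rules above can be sketched as simple predicates.  This is
   a hedged illustration, not an implementation: the function names and
   the string labels are hypothetical, and best_in_l1 models only the
   intra-area versus inter-area comparison that rule (3) covers.

```python
L1_INTRA = "L1 intra-area"
L2_TO_L1 = "L2->L1 inter-area"
EXTERNAL = "external"

def may_redistribute_up(l1_prefix_type):
    """Rule 1: only L1 intra-area and external prefixes go from L1 into L2."""
    return l1_prefix_type in (L1_INTRA, EXTERNAL)

def may_redistribute_down(prefix, l1_intra_area_prefixes):
    """Rule 2: an L2 prefix may go into an L1 area unless that area already
    has the same prefix as an intra-area prefix."""
    return prefix not in l1_intra_area_prefixes

def best_in_l1(routes):
    """Rule 3: routes are (prefix_type, metric) pairs limited to L1_INTRA and
    L2_TO_L1; intra-area wins regardless of metric."""
    return min(routes, key=lambda r: (r[0] != L1_INTRA, r[1]))
```

   A prefix redistributed down becomes an L2->L1 inter-area prefix, for
   which may_redistribute_up returns False, so it cannot re-enter L2.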

   Based on these rules, we first observe that this proposal is free
   from routing loops.  No prefix can be redistributed from L2 to L1 and
   back into L2, because the route first becomes an L2->L1 inter-area
   prefix by rule (2) and by rule (1) cannot be redistributed into L2.
   Similarly, a prefix redistributed from L1 to L2 becomes an L1->L2
   inter-area prefix by rule (1) but will not be redistributed into the
   original L1 area by rule (2).

   Even when following all the indicated rules, there is the possibility
   of a transient routing loop when the original prefix is withdrawn and
   the inter-area prefix is selected.  However, all link state protocols
   are subject to transient routing loops, so this is no worse than the
   status quo.

   Note that this proposal is not radically different from the current
   semantics of RFC 1195: internal metric types are always preferred
   over externals, so rule (3) is an extension that allows external
   metric types in internal prefix TLVs.  It does not introduce a new
   comparison between internal and external metric values.

3.2 Transition issues

   Because no implementations currently make use of the external metric
   type, the deployment of prefixes with an external metric type is
   somewhat problematic.  There is the possibility that the new type of
   advertisement may result in software instability in systems that do
   not deal with even the original semantics correctly.  Further, there
   is a danger that haphazard deployment of systems supporting this
   proposal and legacy systems would have an unfortunate interaction.
   For any L1 area that is to perform the mutual redistribution
   described in this proposal, the L1L2 systems must be upgraded first.
   If these systems operate correctly, this is sufficient to ensure
   that there are no persistent routing loops.  Where L1L2 systems have
   not been upgraded, persistent routing loops are possible.  The
   following figure gives an example:

            Level 1/8 @ 200
                 |
                 |     +-- L2 link cost 1 --+
                 |     |                    |
                 |    computes 1/8          |
                 |    @ 64  through (B)     |
                [1]    |                    |
                 V     V                    ^
              +--+--------+           +-----+-----+
              | new style |           | old style |
          (A) | L1/L2     |           | L1/L2     | (B)
              | leaks 1/8 |           | leaks up  |
              | @E-cost 63|           | @E-cost 63|
              +----+------+           +-------+---+
                   V                          ^
                   |                          |
                   |                    computes L1 route
                   |                    1/8 @ cost 128
                   |                          |
                   +---- L1 link with cost 1 -+

   Initially, upgraded L1L2 router (A) computes its best route towards
   prefix 1/8 through interface 1, at a cost of 200.  (A) leaks the
   prefix into the L1 area at the maximum cost of 63 with the I/E bit
   set.  L1L2 router (B), which has not been upgraded, uses this
   advertisement to compute a best route to 1/8 at cost 128 in L1.  We
   assume that (B) does not mask out the I/E bit but uses it as part of
   the metric; the scenario holds equally if (B) perceives the metric
   to be 63.  (B) prefers this L1 route to any computed L2 route.
   Assuming that (B) leaks 1/8 into L2, (A) uses that advertisement in
   another L2 computation and ends up with a shorter L2 route to 1/8
   through (B).  Hence, a forwarding loop has been formed.
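
   The arithmetic in this example can be checked with a short sketch.
   It assumes the I/E bit sits directly above a six-bit metric field,
   which is consistent with "cost 63 with I/E bit set" being perceived
   as 127; the constant and function names are hypothetical.

```python
IE_BIT = 0x40        # assumed position of the I/E bit in the metric octet
METRIC_MASK = 0x3F   # low six bits carry the metric value (0..63)

def encode(metric, external):
    """Build the metric octet for a value 0..63 and an I/E flag."""
    if not 0 <= metric <= METRIC_MASK:
        raise ValueError("metric must fit in six bits")
    return metric | (IE_BIT if external else 0)

def decode_upgraded(octet):
    """Upgraded router: separate the metric value from the I/E flag."""
    return octet & METRIC_MASK, bool(octet & IE_BIT)

def decode_unmasked(octet):
    """Router that does not mask the I/E bit: reads it as part of the metric."""
    return octet & (IE_BIT | METRIC_MASK)

leaked = encode(63, external=True)        # what (A) leaks into L1
cost_at_b = decode_unmasked(leaked) + 1   # plus the L1 link with cost 1
cost_at_a = 63 + 1                        # (B)'s leak at 63 plus the L2 link
```

   The unmasked reading gives 127, so (B) computes 128 in L1, and (A)
   then sees 1/8 through (B) at 64, which beats its original cost of
   200.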

   As described in the previous section, rule (2) must be followed to
   prevent looping when this extension is deployed in an area that
   mixes L1 routers understanding the semantics of the L1 external
   metric with RFC 1195 routers that treat the metric as purely
   internal.  The following example illustrates a forwarding loop
   encountered under those assumptions.

                           1/8 with L2 @ cost 200
                               /
                              /
                        +==========+
                        | L1/L2    |   routing table for 1/8
                        | leaking  |   (top is active route)
                    (A) | 1/8 down |       L1   @ 131 active
                        | with cost|       L2   @ 200
                        | of 63 and|       L1-E @ 127
                        | I/E set  |
                        +====+====++
                             |     \
                             |      \
                             |       \ cost 1
                             |        \
                     cost 1  |       +-+--+ routing table
                             |   (C) | L1 |    L1-E @128 active
                             |       +---++    L1   @130
                             |            \
                             |             \
                             |            path with
                             |            total cost
      routing table      +---+-+          of 130
       L1-E @128 active  | L1  | (B)           \
       L1   @131         +---+-+                \
                             |                   \
                             |                    \
                             |                   +-+--+
                             +- path with -------+ L1 | advertises
                                total cost       +----+ 1/8 as L1
                                of 131             (D)  attached
                                                        prefix

   (A) is an L1L2 router that leaks into L1 a prefix 1/8 that it
   computed through L2, at the maximum cost of 127 (or, expressed
   differently, at cost 63 with the I/E bit set); this violates rule
   (2).  Routers (B), (C) and (D) are all purely RFC 1195 compliant, so
   they perceive the leaked prefix as internal L1.  At the same time,
   (D) advertises the same prefix 1/8 into L1 as a directly attached
   subnet.  To distinguish the different copies, the leaked prefix is
   shown as L1-E (for L1 external metric).  (B) computes the L1-E route
   at a cost of 127+1 and prefers it to the one through (D), since that
   route has a cost of 131.  Therefore (B) forwards a packet destined
   for 1/8 towards (A).  (A) cannot prefer the L1-E route, since it
   could not really forward using it, but would have to use L2 to get
   the packet into the L2 backbone.  However, the L1 route computed to
   (D) must be preferred to L2 based on the usual preference rules.
   Hence, (A) forwards the packet towards (C).  (C) has the L1-E route
   as preferred (since it looks like a cheaper L1 route to 1/8 than the
   L1 route through (D)) and forwards the packet back to (A).
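
   The loop can be reproduced with a small trace over the routing
   tables shown in the figure.  This is a sketch only: the ROUTES
   structure and the function names are hypothetical, the costs are
   taken from the figure, and next hops are named by the router or
   region the traffic is sent towards.

```python
# Candidate routes to 1/8 as (perceived level, cost, next hop).
# (B) and (C) are legacy routers, so the leaked L1-E route looks like L1.
ROUTES = {
    "A": [("L1", 131, "C"), ("L2", 200, "L2 backbone")],
    "B": [("L1", 128, "A"), ("L1", 131, "D")],
    "C": [("L1", 128, "A"), ("L1", 130, "D")],
}

def next_hop(candidates):
    """Prefer L1 over L2, then the lowest cost."""
    return min(candidates, key=lambda r: (r[0] != "L1", r[1]))[2]

def trace(start, routes):
    """Follow next hops from start; return (path, loop_detected)."""
    path, node = [start], start
    while node in routes:
        node = next_hop(routes[node])
        if node in path:
            return path + [node], True
        path.append(node)
    return path, False

# Same tables without the leaked L1-E routes, i.e. with rule (2) respected.
ROUTES_RULE_2 = {
    "A": [("L1", 131, "C"), ("L2", 200, "L2 backbone")],
    "B": [("L1", 131, "D")],
    "C": [("L1", 130, "D")],
}

looping = trace("B", ROUTES)
clean = trace("B", ROUTES_RULE_2)
```

   With the leaked route present, a packet entering at (B) goes to (A),
   then (C), then back to (A).  Without it, the packet from (B) heads
   directly towards (D).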

4.0 Comparisons with other proposals

   Another proposal is currently being discussed which is similar to
   this one in nature.

   In [3], a new TLV is proposed to transport IP prefix information.
   Because this is a new TLV, it is somewhat harder to deploy, requiring
   that all systems understand the new TLV before it can become
   effective.  For this reason, this proposal provides an alternative
   that can be deployed sooner.  There is no effective semantic
   difference between the two proposals.  In [3], a bit is defined to
   mark a prefix as 'up' or 'down'.  This is essentially the same
   semantics as is proposed here.

5.0 Security Considerations

   This document raises no new security issues for IS-IS.

6.0 References

   [1] ISO 10589, "Intermediate System to Intermediate System Intra-
   Domain Routeing Exchange Protocol for use in Conjunction with the
   Protocol for Providing the Connectionless-mode Network Service (ISO
   8473)" [Also republished as RFC 1142]

   [2] RFC 1195, "Use of OSI IS-IS for routing in TCP/IP and dual
   environments", R.W. Callon, Dec. 1990

   [3] Smit, H., Li, T. "IS-IS extensions for Traffic Engineering",
   draft-ietf-isis-traffic-00.txt, work in progress

7.0 Authors' Addresses

   Tony Li
   Li Consulting
   Email: tony1@home.net

   Tony Przygienda
   Siara Systems
   300 Ferguson Drive
   Mountain View, CA 94043
   Email: prz@siara.com
   Voice: +1 650 237 2173

   Henk Smit
   Cisco Systems, Inc.
   210 West Tasman Drive
   San Jose, CA 95134
   Email: hsmit@cisco.com
   Voice: +31 20 342 3736