Routing over Large Clouds Working Group                    Juha Heinanen
Request for Comments: DRAFT                            (Telecom Finland)
Expires Apr 15, 1994                                     Ramesh Govindan
                                                              (Bellcore)
                                                        October 15, 1993


                NBMA Next Hop Resolution Protocol (NHRP)


Status of this Memo

   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   ``working draft'' or ``work in progress.''

   Please check the 1id-abstracts.txt listing contained in the
   internet-drafts Shadow Directories on nic.ddn.mil, nnsc.nsf.net,
   nic.nordu.net, ftp.nisc.sri.com, or munnari.oz.au to learn the
   current status of any Internet Draft.

Abstract

   This document describes the NBMA Next Hop Resolution Protocol (NHRP).
   NHRP can be used by a source terminal (host or router) connected to a
   Non-Broadcast, Multi-Access link layer (NBMA) network to find out the
   IP and NBMA addresses of the "NBMA next hop" towards a destination
   terminal. The NBMA next hop is the destination terminal itself, if
   the destination is connected to the NBMA network. Otherwise, it is
   the egress router from the NBMA network that is "nearest" to the
   destination terminal. Although this document focuses on NHRP in the
   context of IP, the technique is applicable to other network layer
   protocols as well.

1. Introduction

   The NBMA Next Hop Resolution Protocol (NHRP) allows a source terminal
   (a host or router), wishing to communicate over a Non-Broadcast,
   Multi-Access link layer (NBMA) network, to find out the IP and NBMA
   addresses of the "NBMA next hop" towards a destination terminal.
   The NBMA next hop is the destination terminal itself, if the
   destination is connected to the NBMA network. Otherwise, it is the
   egress router (out of the NBMA network) nearest to the destination
   terminal.

   Conventional hop-by-hop IP routing may not be sufficient to resolve
   the "NBMA next hop" towards the destination terminal. An NBMA network
   may, in general, consist of multiple logically independent IP subnets
   (LISs, [3]); IP routing would only resolve the next hop LIS towards
   the destination terminal.

   Once the NBMA next hop has been resolved, the source may either start
   sending IP packets to the destination (in a connectionless NBMA
   network such as SMDS) or may first establish a connection to the
   destination with the desired bandwidth and QOS characteristics (in a
   connection oriented NBMA network such as ATM).

   An NBMA network can be non-broadcast either because it technically
   doesn't support broadcasting (e.g. an X.25 network) or because
   broadcasting is not feasible for one reason or another (e.g. an SMDS
   broadcast group or an extended Ethernet would be too large).

2. Protocol Overview

   In this section, we briefly describe how a source S uses NHRP to
   determine the "NBMA next hop" to destination D.

   S first determines the next hop to D through normal routing
   processes. If this next hop is reachable through its NBMA interface,
   S formulates an NHRP request containing the source and destination IP
   addresses and QOS information. S then forwards the request to an
   entity called the "Next Hop Server" (NHS).

   For administrative and policy reasons, a physical NBMA network may be
   partitioned into several disjoint logical NBMA networks (discussed
   later in this section); NHSs cooperatively resolve the NBMA next hop
   within their logical NBMA network. Unless otherwise specified, we use
   NBMA network to mean logical NBMA network.

   Each NHS "serves" a pre-configured set of terminals and peers with a
   pre-configured set of NHSs, which all belong to the same NBMA
   network. An NHS exchanges routing information with its peers (and
   possibly with the terminals it serves) using regular routing
   protocols. (However, an NHS, unless it is also an egress/ingress
   router, need not necessarily be able to switch regular IP packets.)
   This exchange is used to construct a forwarding table per QOS in
   every NHS. The forwarding table determines the next hop NHS towards
   the NHRP request's destination. This next hop NHS may depend on the
   request's QOS information.

   After receiving an NHRP request, the NHS checks if it "serves" D. If
   so, the NHS resolves D's NBMA address, using mechanisms beyond the
   scope of this document (examples of such mechanisms include ARP
   [1, 2] and pre-configured tables). The NHS then either forwards the
   NHRP request to D or generates a positive NHRP reply on its behalf.
   The reply contains the IP and NBMA addresses of D (in this case D
   itself is S's NBMA next hop) and is sent back to S. NHRP replies
   usually traverse the same sequence of NHSs as the NHRP request (in
   reverse order, of course).

   If the NHS does not serve D, it extracts from its forwarding table
   the next hop towards D. If no such next hop entry is found, the NHS
   generates a negative NHRP reply. If the next hop is behind the NHS's
   NBMA interface, the NHS forwards the NHRP request to the next hop. If
   the next hop is behind some other interface, the NHS may be willing
   to act as an egress router for traffic bound to D. In that case, the
   NHS generates a positive NHRP reply containing its own IP and NBMA
   address (i.e., the NHS is the NBMA next hop from S to D).

   An NHS receiving an NHRP reply may cache the NBMA next hop
   information contained therein.
   To a subsequent NHRP request, this NHS might respond with the cached,
   non-authoritative, NBMA next hop or with cached negative information.
   If a communication attempt based on non-authoritative information
   fails, a source terminal can choose to send an authoritative NHRP
   request. NHSs never respond to authoritative NHRP requests with
   cached information.

   NHRP requests and replies never cross the borders of a logical NBMA
   network. Thus, IP traffic out of and into a logical NBMA network
   always traverses an IP router at its border. Network layer filtering
   can then be implemented at these border routers.

   NHRP provides a mechanism to aggregate NBMA next hop information in
   NHS caches. Suppose that NHS X is the NBMA next hop from S to D.
   Suppose further that X is an egress router for all terminals sharing
   an IP address prefix with D. When X generates an NHRP reply in
   response to a request, it may replace the IP address of D with this
   prefix. The prefix to egress router mapping in the reply is cached in
   all NHSs on the path of the reply. A subsequent (non-authoritative)
   NHRP request for some destination that shares an IP address prefix
   with D can be satisfied with this cached information.

   To dynamically detect link-layer filtering in NBMA networks, NHRP
   incorporates a "Route record" in replies. This Route record contains
   the network and link layer addresses of intermediate NHSs willing to
   route packets from the source to the destination prefix. When a
   source terminal is unable to open a connection to the responder, it
   tries the NHSs in the Route record one after another until a
   connection succeeds. This approach finds the optimal next hop in the
   presence of link-layer filtering.

3. Configuration

   Terminals

   To participate in NHRP, a terminal connected to an NBMA network
   should be configured with the IP address(es) of its NHS(s).
   These NHS(s) may be physically located on the terminal's default or
   peer routers, so their addresses may be obtained from the terminal's
   IP forwarding table. If the terminal is attached to several link
   layer networks (including logical NBMA networks), it should also be
   configured to receive routing information from its NHS(s) and peer
   routers so that the terminal can determine which IP networks are
   reachable through which link layer networks.

   Next Hop Servers

   An NHS is configured with a set of IP address prefixes that
   correspond to the IP addresses of the terminals it is serving.
   Moreover, the NHS must be configured to exchange routing information
   with its peer NHSs (if any). If a served terminal is attached to
   several link layer networks, the NHS may also need to be configured
   to advertize routing information to such terminals.

   If an NHS is acting as an egress router for terminals connected to
   other link layer networks than the NBMA network, the NHS must, in
   addition to the above, be configured to exchange routing information
   between the NBMA network and these other link layer networks.

   In all cases, routing information is exchanged using regular
   intra-domain and/or inter-domain routing protocols.

4. Packet Formats

   NHRP requests and replies are carried as ICMP messages.
   This section describes the packet formats of NHRP requests and
   replies:

   NHRP Request

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Hop Count   |                    Unused                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination IP address                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source IP address                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type
      19

   Code
      A response to an NHRP request may contain cached information. If
      an authoritative answer is desired, then code 2 (NHRP request for
      authoritative information) should be used. Otherwise, a code
      value of 1 (NHRP request) should be used.

   Hop Count
      The Hop count indicates the maximum number of NHSs that a request
      or reply is allowed to traverse before being discarded.

   Source and Destination IP Addresses
      Respectively, these are the IP addresses of the NHRP request
      initiator and the terminal for which the NBMA next hop is
      desired.
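
   As an illustration of the request layout above, the following Python
   sketch packs an NHRP request and fills in the ICMP checksum. It is a
   minimal sketch, not a reference implementation. The names
   build_nhrp_request and internet_checksum are hypothetical, and the
   field widths for Hop Count (one octet) and Unused (three octets) are
   assumptions; only the Type and Code values come from the text.

```python
import socket
import struct

NHRP_REQUEST_TYPE = 19   # ICMP type for an NHRP request (from the text)
CODE_REQUEST = 1         # cached (non-authoritative) answers acceptable
CODE_AUTH_REQUEST = 2    # authoritative answer required


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_nhrp_request(src_ip: str, dst_ip: str, hop_count: int,
                       authoritative: bool = False) -> bytes:
    """Pack an NHRP request; destination precedes source, as in the figure."""
    code = CODE_AUTH_REQUEST if authoritative else CODE_REQUEST
    # ASSUMPTION: Hop Count is one octet followed by three unused octets.
    body = struct.pack("!B3x4s4s", hop_count,
                       socket.inet_aton(dst_ip), socket.inet_aton(src_ip))
    # Compute the checksum over the message with the Checksum field zeroed.
    zeroed = struct.pack("!BBH", NHRP_REQUEST_TYPE, code, 0) + body
    checksum = internet_checksum(zeroed)
    return struct.pack("!BBH", NHRP_REQUEST_TYPE, code, checksum) + body
```

   A receiver can validate such a message by running internet_checksum
   over the whole packet and checking that the result is zero.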

   NHRP Reply

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Hop Count   |    Unused     |      Route record length      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source IP address                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination IP address                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Destination mask                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                 NHRP route record (variable)                  |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type
      20

   Code
      NHRP replies may be positive or negative. An NHRP positive,
      non-authoritative reply carries a code of 1, while a positive,
      authoritative reply carries a code of 2. An NHRP negative,
      non-authoritative reply carries a code of 3 and a negative,
      authoritative reply carries a code of 4. An NHS is not allowed to
      reply to an NHRP request for authoritative information with
      cached information, but may do so for an NHRP Request.

   Route Record Length
      The length in words of the NHRP route record (see below).

   Source IP Address
      The address of the initiator of the corresponding NHRP request.

   Destination IP Address and Mask
      If the NHRP Request's destination is on the NBMA, the reply
      contains that destination address and a mask of all 1s.
      Otherwise, the responder may choose to act as the egress router
      for all terminals in the destination's subnet. If so, the reply
      contains a prefix of the requested destination IP address and the
      corresponding mask.

   NHRP Route Record
      The NHRP route record is a list of NHRP "Route elements" for NHSs
      on the path of a positive NHRP reply.
      Only NHSs that are willing to act as egress routers for packets
      from the source to the destination insert a Route element in the
      NHRP reply. Negative replies do not carry Route elements. The
      first Route element is always that of the destination terminal
      or, if the destination is not directly attached to the NBMA, that
      of the responding egress router.

   Each Route element is formatted as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          IP address                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   LL length   |            Link Layer (LL) address            |
   +-+-+-+-+-+-+-+-+                                               |
   |                       (variable length)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The LL length field is the length of the link layer address in bits.
   The LL address itself is zero-filled to the nearest 32-bit boundary.

   On the reply path, an NHS willing to route packets from source to the
   destination prefix should append its Route element to the current
   Route Record, adjust the Route record length appropriately, and
   recompute the ICMP checksum. The Route record is used to discover
   link layer filters, as described in Section 2.

   If the first Route element's IP address and the destination's IP
   address differ, the source terminal may assume that the reply was
   generated by an egress router.

   An NHS may cache replies containing a Route record. Subsequently,
   when it responds to an NHRP request with the cached reply,
   intermediate NHSs on the path to the initiator may attach Route
   elements to the reply.

5. Protocol Operation

   The external behavior of an NHS may be described in terms of two
   procedures (processRequest and processReply) operating on two tables
   (forwardingTable and cacheTable). In an actual implementation, the
   code and data structures may be realized differently.
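
   Before turning to the tables, the Route element layout of Section 4
   can be sketched in Python as follows. This is an illustration under
   stated assumptions, not the authors' implementation: the function
   names are hypothetical, LL length is assumed to be a single octet,
   "words" are assumed to be 32 bits wide, and the zero fill is assumed
   to bring the LL length octet plus the LL address to a 32-bit
   boundary.

```python
import socket
import struct
from typing import List, Tuple


def encode_route_element(ip: str, ll_addr: bytes) -> bytes:
    """IP address, LL length in bits, LL address zero-filled to 32 bits."""
    ll_bits = len(ll_addr) * 8
    if ll_bits > 255:
        raise ValueError("LL address too long for a one-octet bit count")
    raw = socket.inet_aton(ip) + struct.pack("!B", ll_bits) + ll_addr
    return raw + b"\x00" * (-len(raw) % 4)   # pad to a 4-octet boundary


def decode_route_record(data: bytes) -> List[Tuple[str, int, bytes]]:
    """Split a Route record into (ip, ll_bits, ll_addr) tuples."""
    elements = []
    offset = 0
    while offset < len(data):
        ip = socket.inet_ntoa(data[offset:offset + 4])
        ll_bits = data[offset + 4]
        ll_len = (ll_bits + 7) // 8            # octets holding the address
        ll_addr = data[offset + 5:offset + 5 + ll_len]
        elements.append((ip, ll_bits, ll_addr))
        # 4 octets of IP address, then length octet + address, padded.
        offset += 4 + ((1 + ll_len + 3) // 4) * 4
    return elements


def route_record_length_words(record: bytes) -> int:
    """Route record length, ASSUMING 32-bit words."""
    return len(record) // 4
```

   An NHS appending its own Route element on the reply path would
   concatenate the output of encode_route_element to the existing
   record and update the Route record length before recomputing the
   checksum.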

   Each NHS has for each supported QOS an NHRP forwardingTable
   consisting of entries with the fields:

      <networkLayerAddrPrefix, type, outIf, outIfAddr>

   In case the NHS is also a host or serving as a router, the NHRP
   forwarding table may be integrated with the normal IP forwarding
   table of the NHS.

   The networkLayerAddrPrefix field identifies a set of network layer
   addresses known to the NHS. It consists of two subfields
   <ipAddr, mask>.

   The type field indicates the type of the networkLayerAddrPrefix. The
   possible values are:

   - locallyServed: The NHS is itself serving the
     networkLayerAddrPrefix. The outIf field denotes the NBMA interface
     via which the served terminals can be reached and the outIfAddr
     field has no meaning. Such a forwardingTable entry has been
     created by manual configuration.

   - nhsLearned: The NHS has learned about the networkLayerAddrPrefix
     from another NHS. The outIf and outIfAddr fields, respectively,
     denote the NBMA interface and IP address of this next hop NHS.
     Such a forwardingTable entry is a result of network layer address
     prefix information exchange with one of the NHS's peers.

   - externallyLearned: The NHS has learned about the
     networkLayerAddrPrefix via normal IP routing from outside of the
     NBMA network. In this case, the NHS may act as an egress router
     for the terminals sharing the networkLayerAddrPrefix. The outIf
     and outIfAddr fields, respectively, denote the interface and IP
     address of the next hop router. If the outIfAddr field is empty,
     the networkLayerAddrPrefix is assumed to be directly connected to
     the outIf of the NHS.

   The protocol used to exchange networkLayerAddrPrefix information
   among the NHSs or between NHSs and their peer routers can be any
   regular IP intra-domain or inter-domain routing protocol.

   In addition to the forwardingTable, each NHS has for each supported
   QOS an NHRP cacheTable consisting of entries with the fields:

      <networkLayerAddrPrefix, routeElementList>

   The entries in the cacheTable are learned from NHRP replies
   traversing the NHS.
   The networkLayerAddrPrefix field identifies a set of IP addresses
   sharing a common Route record. The networkLayerAddrPrefix field
   consists of two subfields <ipAddr, mask>. The routeElementList field
   is either empty (in case of a negative cache entry) or consists of a
   list of subfields, each holding the IP address and link layer
   address of one Route element. The cacheTable entries could also
   include a timeStamp field to be used to age cacheTable entries after
   a certain hold period.

   The following pseudocode defines how NBMA NHRP requests and replies
   are processed by an NHS.

   procedure processRequest(request);
     let bestMatch == matchForwardingTable(request.dIPa) do
       if bestMatch then
         if bestMatch.type == locallyServed then
           let nbmaAddr == arp(request.dIPa) do
             if nbmaAddr then
               genPosAuthReply(request.sIPa, request.dIPa,
                               0xFFFFFFFF, request.dIPa, nbmaAddr)
             else
               genNegAuthReply(request.sIPa, request.dIPa)
             end
           end
         elseif bestMatch.type == nhsLearned then
           if not requestForAuthInfo?(request) then
             let cacheMatch == matchCacheTable(request.dIPa) do
               if cacheMatch then
                 if cacheMatch.routeElementList == EMPTY then
                   genNegNonAuthReply(request.sIPa, request.dIPa)
                 else
                   genPosNonAuthReply(request.sIPa,
                     cacheMatch.networkLayerAddrPrefix.ipAddr,
                     cacheMatch.networkLayerAddrPrefix.mask,
                     cacheMatch.routeElementList);
                 end
               else /* no cache match */
                 forwardRequest(request, bestMatch.outIf,
                                bestMatch.outIfAddr)
               end
             end
           else /* request for authoritative information */
             forwardRequest(request, bestMatch.outIf,
                            bestMatch.outIfAddr)
           end
         else /* bestMatch.type == externallyLearned */
           genPosAuthReply(request.sIPa,
             bestMatch.networkLayerAddrPrefix.ipAddr,
             bestMatch.networkLayerAddrPrefix.mask,
             selfIpAddr, selfNbmaAddr)
         end
       else /* no match in forwardingTable */
         genNegAuthReply(request.sIPa, request.dIPa)
       end
     end
   end

   procedure processReply(reply);
     addCacheTableEntry(reply.dIPa, reply.dm, reply.routeRecord);
     if reply.sIPa == selfIpAddr then
       /* reply is to the NHS itself */
     else
       let bestMatch == matchForwardingTable(reply.sIPa) do
         if bestMatch then
           if bestMatch.type != externallyLearned then
             forwardReply(reply, bestMatch.outIf, bestMatch.outIfAddr)
           else /* bestMatch.type == externallyLearned */
             /* request should never originate outside of the NBMA */
           end
         end
       end
     end
   end

   The semantics of the procedures and constants used in the pseudocode
   are explained below.

   matchForwardingTable(ipAddress) returns the forwardingTable entry
   whose networkLayerAddrPrefix field is the longest match for
   ipAddress or FALSE if no match is found.

   arp(ipAddress) resolves the NBMA address corresponding to ipAddress.
   It returns FALSE if the resolution fails.

   genPosAuthReply(sourceIpAddr, destinationIpAddr, destinationMask,
   originatorIpAddr, originatorNbmaAddr) generates a positive,
   authoritative reply with sourceIpAddr, destinationIpAddr, and
   destinationMask in Source IP address, Destination IP address and
   Destination mask fields, respectively. The Route record field of the
   reply consists of one Route element that contains originatorIpAddr
   and originatorNbmaAddr as its IP and Link layer addresses.

   genNegAuthReply(sourceIpAddr, destinationIpAddr) and
   genNegNonAuthReply(sourceIpAddr, destinationIpAddr) respectively
   generate a negative, authoritative and non-authoritative reply with
   sourceIpAddr and destinationIpAddr in Source IP address and
   Destination IP address fields. The Destination mask field always has
   the value 0xFFFFFFFF and the Route record field is empty.

   selfIpAddr and selfNbmaAddr denote the egress router's own IP and
   NBMA addresses in the NBMA via which its peer NHSs can be reached.

   requestForAuthInfo?(request) tests if request is a Request for
   authoritative information.
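
   The longest-match lookup performed by matchForwardingTable can be
   illustrated with the short Python sketch below. It is only one
   possible realization: the ForwardingEntry class and the function
   name match_forwarding_table are hypothetical, the entry fields
   mirror those described for the forwardingTable, and None stands in
   for FALSE.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ForwardingEntry:
    prefix: ipaddress.IPv4Network      # networkLayerAddrPrefix <ipAddr, mask>
    type: str                          # locallyServed, nhsLearned, externallyLearned
    out_if: str                        # outIf
    out_if_addr: Optional[str] = None  # outIfAddr (may be empty)


def match_forwarding_table(table: List[ForwardingEntry],
                           ip_address: str) -> Optional[ForwardingEntry]:
    """Return the entry with the longest prefix covering ip_address."""
    addr = ipaddress.IPv4Address(ip_address)
    candidates = [entry for entry in table if addr in entry.prefix]
    if not candidates:
        return None                    # corresponds to FALSE in the pseudocode
    return max(candidates, key=lambda entry: entry.prefix.prefixlen)
```

   A linear scan keeps the sketch short; an NHS with a large table
   would more likely use a trie or reuse its IP forwarding table
   lookup.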

   matchCacheTable(ipAddr) returns a cacheTable entry whose
   networkLayerAddrPrefix field is the best match for ipAddr or FALSE
   if no match is found.

   genPosNonAuthReply(sourceIpAddr, destinationIpAddr, destinationMask,
   routeElementList) generates a positive, non-authoritative reply with
   sourceIpAddr, destinationIpAddr, and destinationMask in Source IP
   address, Destination IP address and Destination mask fields,
   respectively. The Route record field of the reply is constructed
   from routeElementList.

   forwardRequest(request, interface, ipAddr) decrements the Hop count
   field of request, recomputes the ICMP Checksum field, and forwards
   request to ipAddr of interface provided that the value of the Hop
   count field remains positive.

   addCacheTableEntry(ipAddr, mask, routeRecord) adds a new entry to
   the cacheTable or overwrites an existing entry whose
   networkLayerAddrPrefix field is equal to <ipAddr, mask>. A new entry
   is not added if matchCacheTable(ipAddr) returns an entry whose
   routeElementList is equivalent to routeRecord. The
   networkLayerAddrPrefix field of the new entry is <ipAddr, mask>. The
   routeElementList field is constructed from routeRecord. In addition,
   if the NHS processing the reply would be willing to serve as an
   egress router for <ipAddr, mask>, it should add a new Route element
   to the end of the routeElementList field.

   forwardReply(reply, interface, ipAddr) decrements the Hop count
   field of reply, recomputes the ICMP Checksum field, and forwards
   reply to ipAddr of interface provided that the value of the Hop
   count field remains positive. If the NHS processing the reply would
   be willing to serve as an egress router for <reply.dIPa, reply.dm>,
   it should, before recomputing the Checksum field, add a new Route
   element to the end of reply.routeRecord.

   An NBMA terminal has, for each supported QOS, a forwardingTable and
   one or more cacheTables.
   The former can be the terminal's IP forwarding table and is either
   manually configured or filled via routing information exchange with
   the terminal's NHSs or peer routers. There is one cacheTable per
   connected NBMA network. If the terminal's forwardingTable shows that
   a particular destination is behind an NBMA network, the terminal
   first consults the corresponding cacheTable. If no match is found,
   it generates an NHRP request to the NHS pointed to by the
   forwardingTable entry. When the reply arrives, the terminal updates
   the appropriate cacheTable in the same way as an NHS does.

6. Discussion

   The result of an NHRP request depends on how routing is configured
   among the NHSs of an NBMA network. If the destination terminal is
   directly connected to the NBMA network and the NHSs always prefer
   NBMA routes over routes via other link layer networks, the NHRP
   replies always return the NBMA address of the destination terminal
   itself rather than the NBMA address of some egress router. For
   destinations outside the NBMA network, egress routers and routers in
   the other link layer networks should exchange routing information so
   that the optimal egress router is always found.

   When the NBMA next hop towards a destination is not the destination
   terminal itself, the optimal NBMA next hop may change dynamically.
   This can happen, for instance, when an egress router nearer to the
   destination becomes available. To detect this change, a source
   terminal can periodically reissue the NHRP request. Alternatively,
   the source can be configured to receive routing information from its
   NHSs. When it detects an improvement in the route to the
   destination, the source can reissue the NHRP request to obtain the
   current optimal NBMA next hop.

   In addition to NHSs, an NBMA terminal could also be associated with
   one or more regular routers that could act as "connectionless
   servers" for the terminal.
Then the terminal could choose to resolve the NBMA next hop or just send the IP packets to one of the terminal's connectionless servers. The latter option may be desirable if communication with the destination is short-lived and/or doesn't require much network resources. The connectionless servers could, of course, be physically integrated in the NHSs by augmenting them with IP switching functionality. NHRP supports portability of NBMA terminals. A terminal can be moved anywhere within the NBMA network and still keep its original IP address as long as its NHS(s) remain the same. Requests for authoritative information will always return the correct link layer address. References [1] Address Resolution Protocol, David C. Plummer, RFC 826. [2] Classical IP and ARP over ATM, Mark Laubach, Internet Draft. [3] Transmission of IP datagrams over the SMDS service, J. Lawrence and D. Piscitello, RFC 1209. Heinanen & Govindan Expires April 15, 1994 [Page 12] RFC DRAFT NBMA NHRP October 15, 1993 Acknowledgements We would like to thank John Burnett of Adaptive, Dennis Ferguson of ANS, Joel Halpern of Network Systems, and Paul Francis of Bellcore for their valuable insight and comments to earlier versions of this draft. Authors' Addresses Juha Heinanen Ramesh Govindan Telecom Finland, Bell Communications Research PO Box 228, MRE 2P-341, 445 South Street SF-33101 Tampere, Morristown, NJ 07960 Finland Phone: +358 49 500 958 Phone: +1 201 829 4406 Email: Juha.Heinanen@datanet.tele.fi Email: rxg@thumper.bellcore.com Heinanen & Govindan Expires April 15, 1994 [Page 13] Routing over Large Clouds Working Group Juha Heinanen Request for Comments: DRAFT (Telecom Finland) Expires Apr 15, 1994 Ramesh Govindan (Bellcore) October 15, 1993 NBMA Next Hop Resolution Protocol (NHRP) Status of this Memo This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. 
Note that other groups may also distribute working documents as Internet Drafts. Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a ``working draft'' or ``work in progress.'' Please check the 1id- abstracts.txt listing contained in the internet-drafts Shadow Directories on nic.ddn.mil, nnsc.nsf.net, nic.nordu.net, ftp.nisc.sri.com, or munnari.oz.au to learn the current status of any Internet Draft. Abstract This document describes the NBMA Next Hop Resolution Protocol (NHRP). NHRP can be used by a source terminal (host or router) connected to a Non-Broadcast, Multi-Access link layer (NBMA) network to find out the IP and NBMA addresses of the "NBMA next hop" towards a destination terminal. The NBMA next hop is the destination terminal itself, if the destination is connected to the NBMA network. Otherwise, it is the egress router from the NBMA network that is "nearest" to the destination terminal. Although this document focuses on NHRP in the context of IP, the technique is applicable to other network layer protocols as well. 1. Introduction The NBMA Next Hop Resolution Protocol (NHRP) allows a source terminal (a host or router), wishing to communicate over a Non-Broadcast, Multi-Access link layer (NBMA) network, to find out the IP and NBMA addresses of the "NBMA next hop" towards a destination terminal. The NBMA next hop is the destination terminal itself, if the destination is connected to the NBMA network. Otherwise, it is the egress router Heinanen & Govindan Expires April 15, 1994 [Page 1] RFC DRAFT NBMA NHRP October 15, 1993 (out of the NBMA network) nearest to the destination terminal. Conventional hop-by-hop IP routing may not be sufficient to resolve the "NBMA next hop" towards the destination terminal. 
An NBMA network may, in general, consist of multiple logically independent IP subnets (LISs, [3]); IP routing would only resolve the next hop LIS towards the destination terminal. Once the NBMA next hop has been resolved, the source may either start sending IP packets to the destination (in a connectionless NBMA network such as SMDS) or may first establish a connection to the destination with the desired bandwidth and QOS characteristics (in a connection oriented NBMA network such as ATM). An NBMA network can be non-broadcast either because it technically doesn't support broadcasting (e.g. an X.25 network) or because broadcasting is not feasible for one reason or another (e.g. an SMDS broadcast group or an extended Ethernet would be too large). 2. Protocol Overview In this section, we briefly describe how a source S uses NHRP to determine the "NBMA next hop" to destination D. S first determines the next hop to D through normal routing processes. If this next hop is reachable through its NBMA interface, S formulates an NHRP request containing the source and destination IP addresses and QOS information. S then forwards the request to an entity called the "Next Hop Server" (NHS). For administrative and policy reasons, a physical NBMA network may be partitioned into several disjoint logical NBMA networks (discussed later in this section); NHSs cooperatively resolve the NBMA next hop within their logical NBMA network. Unless otherwise specified, we use NBMA network to mean logical NBMA network. Each NHS "serves" a pre-configured set of terminals and peers with a pre-configured set of NHSs, which all belong to the same NBMA network. An NHS exchanges routing information with its peers (and possibly with the terminals it serves) using regular routing protocols. (However, an NHS, unless it is also an egress/ingress router, need not necessarily be able to switch regular IP packets). This exchange is used to construct a forwarding table per QOS in every NHS. 
The forwarding table determines the next hop NHS towards the NHRP request's destination. This next hop NHS may depend on the request's QOS information. After receiving an NHRP request, the NHS checks if it "serves" D. If so, the NHS resolves D's NBMA address, using mechanisms beyond the Heinanen & Govindan Expires April 15, 1994 [Page 2] RFC DRAFT NBMA NHRP October 15, 1993 scope of this document (examples of such mechanisms include ARP [1, 2] and pre-configured tables). The NHS then either forwards the NHRP request to D or generates a positive NHRP reply on its behalf. The reply contains D's (D is S's NBMA next hop) IP and NBMA address and is sent back to S. NHRP replies usually traverse the same sequence of NHSs as the NHRP request (in reverse order, of course). If the NHS does not serve D, it extracts from its forwarding table the next hop towards D. If no such next hop entry is found, the NHS generates a negative NHRP reply. If the next hop is behind the NHS's NBMA interface, the NHS forwards the NHRP request to the next hop. If the next hop is behind some other interface, the NHS may be willing to act as an egress router for traffic bound to D. In that case, the NHS generates a positive NHRP reply containing its own IP and NBMA address (i.e., the NHS is the NBMA next hop from S to D). An NHS receiving an NHRP reply may cache the NBMA next hop information contained therein. To a subsequent NHRP request, this NHS might respond with the cached, non-authoritative, NBMA next hop or with cached negative information. If a communication attempt based on non-authoritative information fails, a source terminal can choose to send an authoritative NHRP request. NHSs never respond to authoritative NHRP requests with cached information. NHRP requests and replies never cross the borders of a logical NBMA network. Thus, IP traffic out of and into a logical NBMA network always traverses an IP router at its border. 
Network layer filtering can then be implemented at these border routers. NHRP provides a mechanism to aggregate NBMA next hop information in NHS caches. Suppose that NHS X is the NBMA next hop from S to D. Suppose further that X is an egress router for all terminals sharing an IP address prefix with D. When X generates an NHRP reply in response to a request, it may replace the IP address of D with this prefix. The prefix to egress router mapping in the reply is cached in all NHSs on the path of the reply. A subsequent (non- authoritative) NHRP request for some destination that shares an IP address prefix with D can be satisfied with this cached information. To dynamically detect link-layer filtering in NBMA networks, NHRP incorporates a "Route record" in replies. This Route record contains the network and link layer addresses of intermediate NHSs willing to route packets from the source to the destination prefix. When a source terminal is unable to open a connection to the responder, it attempts to do so successively with one of the NHSs in the Route record until it succeeds. This approach finds the optimal best hop in Heinanen & Govindan Expires April 15, 1994 [Page 3] RFC DRAFT NBMA NHRP October 15, 1993 the presence of link-layer filtering. 3. Configuration Terminals To participate in NHRP, a terminal connected to an NBMA network should to be configured with the IP address(es) of its NHS(s). These NHS(s) may be physically located on the terminal's default or peer routers, so their addresses may be obtained from the terminal's IP forwarding table. If the terminal is attached to several link layer networks (including logical NBMA networks), it should also be configured to receive routing information from its NHS(s) and peer routers so that the terminal can determine which IP networks are reachable through which link layer networks. 
Next Hop Servers

An NHS is configured with a set of IP address prefixes that correspond to the IP addresses of the terminals it is serving. Moreover, the NHS must be configured to exchange routing information with its peer NHSs (if any). If a served terminal is attached to several link layer networks, the NHS may also need to be configured to advertise routing information to such terminals.

If an NHS is acting as an egress router for terminals connected to link layer networks other than the NBMA network, the NHS must, in addition to the above, be configured to exchange routing information between the NBMA network and these other link layer networks.

In all cases, routing information is exchanged using regular intra-domain and/or inter-domain routing protocols.

4. Packet Formats

NHRP requests and replies are carried as ICMP messages. This section describes the packet formats of NHRP requests and replies.

NHRP Request

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type | Code | Checksum |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Hop Count | Unused |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Destination IP address |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Source IP address |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type
   19

Code
   A response to an NHRP request may contain cached information. If an authoritative answer is desired, then code 2 (NHRP request for authoritative information) should be used. Otherwise, a code value of 1 (NHRP request) should be used.

Hop Count
   The Hop count indicates the maximum number of NHSs that a request or reply is allowed to traverse before being discarded.
Source and Destination IP Addresses
   Respectively, these are the IP addresses of the NHRP request initiator and the terminal for which the NBMA next hop is desired.

NHRP Reply

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type | Code | Checksum |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Hop Count | Unused | Route record length |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Source IP address |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Destination IP address |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Destination mask |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | |
   | NHRP route record (variable) |
   | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type
   20

Code
   NHRP replies may be positive or negative. An NHRP positive, non-authoritative reply carries a code of 1, while a positive, authoritative reply carries a code of 2. An NHRP negative, non-authoritative reply carries a code of 3 and a negative, authoritative reply carries a code of 4. An NHS is not allowed to reply to an NHRP request for authoritative information with cached information, but may do so for an NHRP Request.

Route Record Length
   The length in words of the NHRP route record (see below).

Source IP Address
   The address of the initiator of the corresponding NHRP request.

Destination IP Address and Mask
   If the NHRP Request's destination is on the NBMA, the reply contains that destination address and a mask of all 1s. Otherwise, the responder may choose to act as the egress router for all terminals in the destination's subnet. If so, the reply contains a prefix of the requested destination IP address and the corresponding mask.
NHRP Route Record
   The NHRP route record is a list of NHRP "Route elements" for NHSs on the path of a positive NHRP reply. Only NHSs that are willing to act as egress routers for packets from the source to the destination insert a Route element in the NHRP reply. Negative replies do not carry Route elements. The first Route element is always that of the destination terminal or, if the destination is not directly attached to the NBMA, that of the responding egress router. Each Route element is formatted as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | IP address |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | LL length | Link Layer (LL) address |
   +-+-+-+-+-+-+-+-+-+ |
   | (variable length) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The LL length field is the length of the link layer address in bits. The LL address itself is zero-filled to the nearest 32-bit boundary.

   On the reply path, an NHS willing to route packets from the source to the destination prefix should append its Route element to the current Route Record, adjust the Route record length appropriately, and recompute the ICMP checksum.

   The Route record is used to discover link layer filters, as described in Section 2. If the first Route element's IP address and the destination's IP address differ, the source terminal may assume that the reply was generated by an egress router.

   An NHS may cache replies containing a Route record. Subsequently, when it responds to an NHRP request with the cached reply, intermediate NHSs on the path to the initiator may attach Route elements to the reply.

5. Protocol Operation

The external behavior of an NHS may be described in terms of two procedures (processRequest and processReply) operating on two tables (forwardingTable and cacheTable).
In an actual implementation, the code and data structures may be realized differently.

Each NHS has for each supported QOS an NHRP forwardingTable consisting of entries with the fields:

   (networkLayerAddrPrefix, type, outIf, outIfAddr)

In case the NHS is also a host or serving as a router, the NHRP forwarding table may be integrated with the normal IP forwarding table of the NHS.

The networkLayerAddrPrefix field identifies a set of network layer addresses known to the NHS. It consists of two subfields (ipAddr, mask).

The type field indicates the type of the networkLayerAddrPrefix. The possible values are:

- locallyServed: The NHS is itself serving the networkLayerAddrPrefix. The outIf field denotes the NBMA interface via which the served terminals can be reached and the outIfAddr field has no meaning. Such a forwardingTable entry has been created by manual configuration.

- nhsLearned: The NHS has learned about the networkLayerAddrPrefix from another NHS. The outIf and outIfAddr fields, respectively, denote the NBMA interface and IP address of this next hop NHS. Such a forwardingTable entry is a result of network layer address prefix information exchange with one of the NHS's peers.

- externallyLearned: The NHS has learned about the networkLayerAddrPrefix via normal IP routing from outside of the NBMA network. In this case, the NHS may act as an egress router for the terminals sharing the networkLayerAddrPrefix. The outIf and outIfAddr fields, respectively, denote the interface and IP address of the next hop router. If the outIfAddr field is empty, the networkLayerAddrPrefix is assumed to be directly connected to the outIf of the NHS.

The protocol used to exchange networkLayerAddrPrefix information among the NHSs or between NHSs and their peer routers can be any regular IP intra-domain or inter-domain routing protocol.
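As a rough illustration only, the forwardingTable and its longest-match lookup could be modelled as in the Python sketch below. The class name ForwardingEntry, the function name match_forwarding_table, the interface names, and the sample addresses are all hypothetical and are not part of the protocol; the sketch also returns None where the pseudocode in this section uses FALSE.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional

# Entry types named in the text; the string values are an assumption.
LOCALLY_SERVED = "locallyServed"
NHS_LEARNED = "nhsLearned"
EXTERNALLY_LEARNED = "externallyLearned"


@dataclass
class ForwardingEntry:
    """One forwardingTable entry (hypothetical representation)."""
    prefix: ipaddress.IPv4Network      # networkLayerAddrPrefix: address plus mask
    type: str                          # one of the three entry types above
    out_if: str                        # outIf: interface name
    out_if_addr: Optional[str] = None  # outIfAddr: next hop IP address, may be empty


def match_forwarding_table(table: List[ForwardingEntry],
                           ip_addr: str) -> Optional[ForwardingEntry]:
    """Return the entry with the longest prefix covering ip_addr, or None."""
    addr = ipaddress.ip_address(ip_addr)
    candidates = [entry for entry in table if addr in entry.prefix]
    if not candidates:
        return None
    return max(candidates, key=lambda entry: entry.prefix.prefixlen)


# Sample table with made-up prefixes and interfaces.
table = [
    ForwardingEntry(ipaddress.ip_network("10.1.0.0/16"), LOCALLY_SERVED, "nbma0"),
    ForwardingEntry(ipaddress.ip_network("10.0.0.0/8"), NHS_LEARNED, "nbma0", "10.2.0.1"),
    ForwardingEntry(ipaddress.ip_network("192.0.2.0/24"), EXTERNALLY_LEARNED, "eth0",
                    "198.51.100.1"),
]
```

With this sample table, a lookup for 10.1.2.3 selects the locallyServed /16 entry rather than the shorter nhsLearned /8 entry, and a lookup for an address covered by no prefix yields None, which corresponds to the case where an NHS generates a negative reply.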
In addition to the forwardingTable, each NHS has for each supported QOS an NHRP cacheTable consisting of entries with the fields:

   (networkLayerAddrPrefix, routeElementList)

The entries in the cacheTable are learned from NHRP replies traversing the NHS. The networkLayerAddrPrefix field identifies a set of IP addresses sharing a common Route record. The networkLayerAddrPrefix field consists of two subfields (ipAddr, mask). The routeElementList field is either empty (in case of a negative cache entry) or consists of a list of subfields of the form (IP address, NBMA address). The cacheTable entries could also include a timeStamp field to be used to age cacheTable entries after a certain hold period.

The following pseudocode defines how NBMA NHRP requests and replies are processed by an NHS.

   procedure processRequest(request);
      let bestMatch == matchForwardingTable(request.dIPa) do
         if bestMatch then
            if bestMatch.type == locallyServed then
               let nbmaAddr == arp(request.dIPa) do
                  if nbmaAddr then
                     genPosAuthReply(request.sIPa, request.dIPa,
                                     0xFFFFFFFF, request.dIPa, nbmaAddr)
                  else
                     genNegAuthReply(request.sIPa, request.dIPa)
                  end
               end
            elseif bestMatch.type == nhsLearned then
               if not requestForAuthInfo?(request) then
                  let cacheMatch == matchCacheTable(request.dIPa) do
                     if cacheMatch then
                        if cacheMatch.routeElementList == EMPTY then
                           genNegNonAuthReply(request.sIPa, request.dIPa)
                        else
                           genPosNonAuthReply(request.sIPa,
                              cacheMatch.networkLayerAddrPrefix.ipAddr,
                              cacheMatch.networkLayerAddrPrefix.mask,
                              cacheMatch.routeElementList);
                        end
                     else /* no cache match */
                        forwardRequest(request, bestMatch.outIf,
                                       bestMatch.outIfAddr)
                     end
                  end
               else /* request for authoritative information */
                  forwardRequest(request, bestMatch.outIf,
                                 bestMatch.outIfAddr)
               end
            else /* bestMatch.type == externallyLearned */
               genPosAuthReply(request.sIPa,
                  bestMatch.networkLayerAddrPrefix.ipAddr,
                  bestMatch.networkLayerAddrPrefix.mask,
                  selfIpAddr, selfNbmaAddr)
            end
         else /* no match in forwardingTable */
            genNegAuthReply(request.sIPa, request.dIPa)
         end
      end
   end

   procedure processReply(reply);
      addCacheTableEntry(reply.dIPa, reply.dm, reply.routeRecord);
      if reply.sIPa == selfIpAddr then
         /* reply is to the NHS itself */
      else
         let bestMatch == matchForwardingTable(reply.sIPa) do
            if bestMatch then
               if bestMatch.type != externallyLearned then
                  forwardReply(reply, bestMatch.outIf, bestMatch.outIfAddr)
               else /* bestMatch.type == externallyLearned */
                  /* request should never originate outside of the NBMA */
               end
            end
         end
      end
   end

The semantics of the procedures and constants used in the pseudocode are explained below.

matchForwardingTable(ipAddress) returns the forwardingTable entry whose networkLayerAddrPrefix field is the longest match for ipAddress or FALSE if no match is found.

arp(ipAddress) resolves the NBMA address corresponding to ipAddress. It returns FALSE if the resolution fails.

genPosAuthReply(sourceIpAddr, destinationIpAddr, destinationMask, originatorIpAddr, originatorNbmaAddr) generates a positive, authoritative reply with sourceIpAddr, destinationIpAddr, and destinationMask in Source IP address, Destination IP address and Destination mask fields, respectively. The Route record field of the reply consists of one Route element that contains originatorIpAddr and originatorNbmaAddr as its IP and Link layer addresses.

genNegAuthReply(sourceIpAddr, destinationIpAddr) and genNegNonAuthReply(sourceIpAddr, destinationIpAddr) respectively generate a negative, authoritative and non-authoritative reply with sourceIpAddr and destinationIpAddr in Source IP address and Destination IP address fields. The Destination mask field always has the value 0xFFFFFFFF and the Route record field is empty.

selfIpAddr and selfNbmaAddr denote the egress router's own IP and NBMA addresses in the NBMA via which its peer NHSs can be reached.
requestForAuthInfo?(request) tests if request is a Request for authoritative information.

matchCacheTable(ipAddr) returns a cacheTable entry whose networkLayerAddrPrefix field is the best match for ipAddr or FALSE if no match is found.

genPosNonAuthReply(sourceIpAddr, destinationIpAddr, destinationMask, routeElementList) generates a positive, non-authoritative reply with sourceIpAddr, destinationIpAddr, and destinationMask in Source IP address, Destination IP address and Destination mask fields, respectively. The Route record field of the reply is constructed from routeElementList.

forwardRequest(request, interface, ipAddr) decrements the Hop count field of request, recomputes the ICMP Checksum field, and forwards request to ipAddr of interface provided that the value of the Hop count field remains positive.

addCacheTableEntry(ipAddr, mask, routeRecord) adds a new entry to the cacheTable or overwrites an existing entry whose networkLayerAddrPrefix field is equal to (ipAddr, mask). A new entry is not added if matchCacheTable(ipAddr) returns an entry whose routeElementList is equivalent to routeRecord. The networkLayerAddrPrefix field of the new entry is (ipAddr, mask). The routeElementList field is constructed from routeRecord. In addition, if the NHS processing the reply would be willing to serve as an egress router for (ipAddr, mask), it should add a new Route element to the end of the routeElementList field.

forwardReply(reply, interface, ipAddr) decrements the Hop count field of reply, recomputes the ICMP Checksum field, and forwards reply to ipAddr of interface provided that the value of the Hop count field remains positive. If the NHS processing the reply would be willing to serve as an egress router for (reply.dIPa, reply.dm), it should, before recomputing the Checksum field, add a new Route element to the end of reply.routeRecord.

An NBMA terminal has, for each supported QOS, a forwardingTable and one or more cacheTables.
The former can be the terminal's IP forwarding table and is either manually configured or filled via routing information exchange with the terminal's NHSs or peer routers. There is one cacheTable per connected NBMA network.

If the terminal's forwardingTable shows that a particular destination is behind an NBMA network, the terminal first consults the corresponding cacheTable. If no match is found, it generates an NHRP request to the NHS pointed to by the forwardingTable entry. When the reply arrives, the terminal updates the appropriate cacheTable in the same way as an NHS does.

6. Discussion

The result of an NHRP request depends on how routing is configured among the NHSs of an NBMA network. If the destination terminal is directly connected to the NBMA network and the NHSs always prefer NBMA routes over routes via other link layer networks, the NHRP replies always return the NBMA address of the destination terminal itself rather than the NBMA address of some egress router. For destinations outside the NBMA network, egress routers and routers in the other link layer networks should exchange routing information so that the optimal egress router is always found.

When the NBMA next hop towards a destination is not the destination terminal itself, the optimal NBMA next hop may change dynamically. This can happen, for instance, when an egress router nearer to the destination becomes available. To detect this change, a source terminal can periodically reissue the NHRP request. Alternatively, the source can be configured to receive routing information from its NHSs. When it detects an improvement in the route to the destination, the source can reissue the NHRP request to obtain the current optimal NBMA next hop.

In addition to NHSs, an NBMA terminal could also be associated with one or more regular routers that could act as "connectionless servers" for the terminal.
Then the terminal could choose to resolve the NBMA next hop or just send the IP packets to one of the terminal's connectionless servers. The latter option may be desirable if communication with the destination is short-lived and/or does not require many network resources. The connectionless servers could, of course, be physically integrated in the NHSs by augmenting them with IP switching functionality.

NHRP supports portability of NBMA terminals. A terminal can be moved anywhere within the NBMA network and still keep its original IP address as long as its NHS(s) remain the same. Requests for authoritative information will always return the correct link layer address.

References

[1] Address Resolution Protocol, David C. Plummer, RFC 826.

[2] Classical IP and ARP over ATM, Mark Laubach, Internet Draft.

[3] Transmission of IP datagrams over the SMDS service, J. Lawrence and D. Piscitello, RFC 1209.

Acknowledgements

We would like to thank John Burnett of Adaptive, Dennis Ferguson of ANS, Joel Halpern of Network Systems, and Paul Francis of Bellcore for their valuable insight and comments on earlier versions of this draft.

Authors' Addresses

Juha Heinanen
Telecom Finland,
PO Box 228,
SF-33101 Tampere,
Finland
Phone: +358 49 500 958
Email: Juha.Heinanen@datanet.tele.fi

Ramesh Govindan
Bell Communications Research
MRE 2P-341, 445 South Street
Morristown, NJ 07960
Phone: +1 201 829 4406
Email: rxg@thumper.bellcore.com