Internet-Draft                                        Grenville Armitage
                                                                Bellcore
                                                       February 4th, 1995

        Support for Multicast over UNI 3.1 based ATM Networks.

Status of this Memo

This document was submitted to the IETF IP over ATM WG. Publication of this document does not imply acceptance by the IP over ATM WG of any ideas expressed within. Comments should be submitted to the ip-atm@matmos.hpl.hp.com mailing list. Distribution of this memo is unlimited.

This memo is an internet draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts. Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress". Please check the 1id-abstracts.txt listing contained in the internet-drafts shadow directories on nic.ddn.mil, nnsc.nsf.net, nic.nordu.net, ftp.nisc.src.com, or munnari.oz.au to learn the current status of any Internet Draft.

Abstract

This memo describes a Multicast Address Resolution Server (MARS) architecture that allows ATM based IP hosts to support RFC 1112 style Level 2 IP multicast using the ATM Forum's UNI 3.1 point to multipoint connection service. It also describes how this architecture can be generalized to support other protocols wishing to multicast over UNI 3.1 based ATM service.

[Editorial note: The differences between this version and 03.txt are substantial in the area of multicast server support. This impacts on Chapter 8, and anything referring to MARS_MSERV. Two control VCs have been identified and named, two sequence numbers are now used, and three major appendices have been added discussing issues that cannot at this time be standardized. The MARS_JOIN/LEAVE message format has been extended by 32 bits, and modified to support multiple address groups. Scattered editorial/clarificatory changes have been made to the rest of the document. Editorial notes will be removed.]

1. Introduction.

Multicast support allows a source host or protocol entity to send a packet to multiple destinations simultaneously using a single, local 'transmit' operation. This facility is utilized by network layer protocols such as IP. Most models, like the one described in RFC 1112 [1] for IP multicasting, assume sources may send their packets to abstract 'multicast group addresses'. Link layer support for such an abstraction is assumed to exist, and is provided by technologies such as Ethernet.

ATM is being utilized as a new link layer technology to support a variety of protocols, including IP. With RFC 1483 [2] the IETF defined a multiprotocol mechanism for encapsulating and transmitting packets using AAL5 over ATM Virtual Channels (VCs). However, the ATM Forum's currently published signalling specification (UNI 3.0 [4], with additions for UNI 3.1 released in late 1994) does not provide the multicast address abstraction. Unicast connections are supported by point to point, bidirectional VCs. Multicasting is supported through point to multipoint VCs. The key limitation is that the sender must have prior knowledge of each intended recipient, and explicitly establish a VC with itself as the root node and the recipients as the leaf nodes.
The main goal of this document is to define an address registration and distribution mechanism that allows UNI 3.1 based networks to support the multicast service of protocols such as IP. The second goal is to define specific endpoint behaviour and management of point to multipoint VCs. As the IETF is currently at the forefront of using wide area multicasting this document's descriptions will focus on IP version 4 (IPv4). A final chapter will note the more general application of the architecture.

The Multicast Address Resolution Server (MARS), a distant relative of the ATM ARP Server introduced in RFC 1577 [3], acts as a registry of multicast group membership. MARS messages, based on the ATM ARP format, support the distribution of multicast group membership information between the MARS and hosts or endpoints. Endpoint address resolution entities query the MARS when a multicast group address needs to be resolved. The actual mechanism for multicasting data packets may be through meshes of point to multipoint VCs, or the use of Multicast Servers. To provide for asynchronous notification of group membership changes the MARS manages two point to multipoint VCs - one out to all endpoints desiring multicast support, and the other to all multicast servers registered as providing support to any multicast groups. The choice of mesh or multicast server is configurable on a group by group basis. The numerical size of link layer multicast groups will be constrained by practical concerns such as limited VC support within endpoint ATM interfaces.

Each MARS manages a 'cluster' of ATM-attached endpoints. A cluster is defined as a set of endpoints willing to be grouped together as link layer members of multicast groups. It is assumed that specially configured routers are used to pass multicast traffic between clusters. This document explicitly avoids specifying the nature of inter-cluster multicast routing protocols.

The mapping of clusters to other constrained sets of endpoints (such as Logical IP Subnets) is left to network administrators. A simple approach in overlaid IP environments would be for each LIS to be served by a separate MARS, with the cluster being built from the LIS members. IP multicast routers would interconnect each LIS as they do with conventional subnets. However, there is no requirement that a cluster be limited to a single LIS.

Section 2 provides an overview of IP multicast and what RFC 1112 required from Ethernet. Section 3 outlines the set of generic functions that should be available to clients of a local host's UNI 3.1 signalling service. Section 4 specifies the encapsulation to be used for MARS messages and multicast packet traffic. The basic behaviour for the sending side of an interface is described in section 5, with section 6 covering the mechanism whereby a host joins and leaves multicast groups. Section 7 covers the way in which hosts respond to dynamic group membership changes. Configuring the use of Multicast Servers is covered in section 8. Support for multicast routers is described in section 9, and section 10 explains the features included to improve the reliability of the membership management mechanisms. Section 11 discusses the application of this document beyond IP. Section 12 is a summary of the document's key points.

The appendices provide discussion on issues that arise out of the implementation of this memo. Appendix A discusses MARS and endpoint algorithms for parsing MARS messages.
Appendix B describes the particular problems introduced by the current IGMP paradigms, and possible interim work-arounds. Finally, Appendix C covers the various designs that are possible for multicast server support within clusters.

This document assumes an understanding of concepts explained in greater detail in RFC 1112, RFC 1577, UNI 3.1, and [6].

2. Review of RFC 1112 and IP Multicast over Ethernet.

Under IP version 4 (IPv4) addresses in the range 224.0.0.0 through 239.255.255.255 are termed 'Class D' or 'multicast group' addresses. In RFC 1112 the behaviour of the transmit and receive sides are quite independent, making the concept of being a 'member' of an IP multicast group imprecise at the link layer interface. The interface must support the transmission of IP packets to an IP multicast group address, whether or not the node considers itself a 'member' of that group. Consequently, group membership is effectively irrelevant to the transmit side of the link layer interface. No address resolution is required to transmit packets - an algorithmic mapping from IP multicast address to Ethernet multicast address is performed locally before the packet is sent out the local interface in the same 'send and forget' manner as a unicast IP packet.

Joining and leaving an IP multicast group is more explicit on the receive side - with the primitives JoinLocalGroup and LeaveLocalGroup affecting what groups the local link layer interface should accept packets from. When the IP layer wants to receive packets from a group, it issues JoinLocalGroup. When it no longer wants to receive packets, it issues LeaveLocalGroup. A key point to note is that changing state is a local issue; it has no effect on other hosts attached to the Ethernet.

IGMP is defined in RFC 1112 to support IP multicast routers attached to a given subnet. Hosts issue IGMP Report messages when they perform a JoinLocalGroup, or in response to an IP multicast router sending an IGMP Query. By periodically transmitting queries IP multicast routers are able to identify what IP multicast groups have non-zero membership on a given subnet.

A specific IP multicast address, 224.0.0.1, is allocated for the transmission of IGMP Query messages. All IP multicast hosts must issue JoinLocalGroup for 224.0.0.1 during their initialisation. Each host keeps a list of IP multicast groups it has been JoinLocalGroup'd to. When a router issues an IGMP Query on 224.0.0.1 each host begins to send IGMP Reports for each group it is a member of. IGMP Reports are sent to the group address, not 224.0.0.1, "so that other members of the same group on the same network can overhear the Report" and not bother sending one of their own. IP multicast routers conclude that a group has no members on the subnet when IGMP Queries no longer elicit associated replies.
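For concreteness, RFC 1112's algorithmic mapping places the low-order 23 bits of the Class D address into the low-order 23 bits of the reserved Ethernet multicast block 01-00-5E-00-00-00. A minimal Python sketch of this mapping (illustrative only; the helper name is an invention of this example):

   import socket, struct

   def ip_to_ethernet_multicast(group):
       # Map an IPv4 Class D address to its Ethernet multicast
       # address per RFC 1112: keep the low 23 bits, prefix with
       # the reserved block 01-00-5E.
       addr, = struct.unpack("!I", socket.inet_aton(group))
       mac = (0x01005E000000 | (addr & 0x7FFFFF)).to_bytes(6, "big")
       return "-".join("%02X" % b for b in mac)

   # 224.0.0.1 (the 'all hosts' group) maps to 01-00-5E-00-00-01.
   assert ip_to_ethernet_multicast("224.0.0.1") == "01-00-5E-00-00-01"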
3. Multicast support under UNI 3.1.

This document will describe its operation in terms of 'generic' functions that should be available to clients of a UNI 3.1 signalling entity in a given ATM endpoint. The ATM model broadly describes 'AAL Users' as any entity that establishes and manages VCs and underlying AAL service to exchange data. An IP over ATM interface is a form of 'AAL User' (either directly, when VC multiplexing is used, or indirectly, when LLC/SNAP encapsulation is used).

The most fundamental limitations of UNI 3.1's multicast support are:

   Only point to multipoint, unidirectional VCs may be established.

   Only the root node of a given VC may add or remove leaf nodes.

Within these constraints, multicast group members can communicate by the use of multicast meshes, or multicast servers. With a mesh each transmitting host is the Root of a point to multipoint VC that has every other host in the group as a Leaf. The Multicast Server model has every group member send their packets directly to a 'server' entity somewhere on the ATM cloud, which then retransmits copies to all other members. This document defines the MARS-Endpoint signalling required to support both mechanisms. Issues relating to the architecture, operation, and management of multicast servers are discussed in Appendix C.

The following generic signalling functions are presumed to be available to local AAL Users:

   L_CALL_RQ    - Establish a unicast VC to a specific endpoint.
   L_MULTI_RQ   - Establish multicast VC to a specific endpoint.
   L_MULTI_ADD  - Add new leaf node to previously established VC.
   L_MULTI_DROP - Remove specific leaf node from established VC.
   L_RELEASE    - Release unicast VC, or all Leaves of a multicast VC.

The following indications are assumed to be available to AAL Users, generated by the local UNI 3.1 signalling entity:

   L_ACK          - Successful completion of a request to the
                    signalling entity.
   L_REMOTE_CALL  - A new VC has been established to the AAL User.
   ERR_L_RQFAILED - A remote ATM endpoint rejected an L_CALL_RQ,
                    L_MULTI_RQ, or L_MULTI_ADD.
   ERR_L_RELEASE  - A remote ATM endpoint has elected to terminate a
                    pre-existing VC.

The signalling exchanges and local information passed between AAL User and UNI 3.1 signalling entity with these functions are currently beyond the scope of this document.

UNI 3.1 defines two ATM address formats - E.164 and ISO NSAP. In UNI 3.1 an 'ATM Number' is the primary identification of an ATM endpoint, and it may use either format. Under some circumstances an ATM endpoint must be identified by both an E.164 address (identifying the attachment point of a private network to a public network) and an ISO NSAP address (the 'ATM Subaddress') identifying the final endpoint within the private network. For the rest of this document the term 'ATM Address' will be used to mean either a single 'ATM Number' or an 'ATM Number' combined with an 'ATM Subaddress'.
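To make the preceding primitives concrete, they might be rendered locally as an abstract interface along the following lines. This is purely a hypothetical sketch - the class and method names are inventions of this example, and the actual exchanges with the signalling entity remain outside this document's scope:

   class AALUserSignalling:
       # Hypothetical rendering of the generic UNI 3.1 functions
       # assumed by this memo. Requests return a VC handle; the
       # on_* methods are indications delivered asynchronously by
       # the local signalling entity.
       def l_call_rq(self, atm_address): ...       # unicast VC
       def l_multi_rq(self, atm_address): ...      # pt-to-mpt VC, first leaf
       def l_multi_add(self, vc, atm_address): ...
       def l_multi_drop(self, vc, atm_address): ...
       def l_release(self, vc): ...

       def on_l_ack(self, request): ...
       def on_l_remote_call(self, vc): ...
       def on_err_l_rqfailed(self, request): ...
       def on_err_l_release(self, vc): ...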
4. Overview of the Multicast Address Resolution Server.

The MARS may reside within any ATM endpoint that is directly addressable by the endpoints it is serving. Endpoints wishing to join a multicast cluster must be configured with the ATM address of the node on which the cluster's MARS resides. This is the cluster's Primary MARS. If a cluster is to be served by a backup MARS, endpoints are configured with the ATM address of a Secondary MARS. Section 10 will discuss the relationship between the Primary MARS and Secondary MARS during failure conditions. Although a Secondary MARS is optional, endpoint implementations must be capable of utilizing one as described in section 10. References to 'the MARS' in following sections will be assumed to mean the acting MARS for the cluster.

Architecturally the MARS is similar to the RFC 1577 ARP Server, although there is little overlap between the information they manage. Whilst the ARP Server keeps a table of {IP, ATM} address pairs for all IP endpoints in the LIS, the MARS keeps extended tables of {multicast address, ATM.1, ATM.2, ..... ATM.n} mappings. It can either be configured with certain mappings, or dynamically 'learn' mappings.

The MARS distributes group membership information to cluster members over a point to multipoint VC known as the ClusterControlVC. When supporting multicast servers within a cluster, the MARS also establishes a separate point to multipoint VC known as the ServerControlVC. All cluster members are leaf nodes of ClusterControlVC. All registered multicast servers are leaf nodes of ServerControlVC (section 8 will discuss the use of ServerControlVC).

The MARS message format is an extension of the ATM ARP message format. By default all MARS messages MUST be LLC/SNAP encapsulated in accordance with RFC 1483, using the same encapsulation as ATM ARP:

   LLC = 0xAA-AA-03
   OUI = 0x00-00-00
   PID = 0x08-06

The default for data traffic carried on point to multipoint VCs is LLC/SNAP encapsulation with a header appropriate to the protocol being carried. For IP traffic this is defined in RFC 1483 as:

   LLC = 0xAA-AA-03
   OUI = 0x00-00-00
   PID = 0x08-00

The choice of common encapsulation and message format means that MARS and ARP Server functionality may be implemented within a common entity if a network designer so chooses.
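As an illustration, the two LLC/SNAP prefixes above are fixed 8 byte strings that an implementation might simply precompute. A minimal sketch (the helper is an invention of this example; the byte values themselves are the ones listed above):

   def llc_snap_header(oui, pid):
       # 3 byte LLC (0xAA-AA-03) + 3 byte OUI + 2 byte PID.
       return bytes([0xAA, 0xAA, 0x03]) + oui.to_bytes(3, "big") \
              + pid.to_bytes(2, "big")

   MARS_CONTROL = llc_snap_header(0x000000, 0x0806)  # shared with ATM ARP
   IPV4_DATA    = llc_snap_header(0x000000, 0x0800)  # RFC 1483 routed IPv4

   assert MARS_CONTROL.hex() == "aaaa030000000806"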
5. Transmitting to Multicast groups.

[Editorial note: This section has discarded the MARS_MSERV function of version ipmc-03.txt. MARS_MSERV is now used in an entirely different fashion. Endpoint VC management is now entirely independent of whether the group is mesh or mc server supported.]

The following description will be in terms of an IP/ATM interface that is capable of transmitting packets to a Class D address at any time, without prior warning. When a packet arrives for transmission, and there is no outgoing VC already marked as serving the packet's multicast destination address, the MARS is queried for the set of ATM endpoints currently making up the multicast group. The query is executed by issuing a MARS_REQUEST. The MARS_REQUEST message is formatted as an ATM ARP_REQUEST with type code of 11 (decimal). The reply from the MARS may take one of two forms:

   MARS_MULTI - A sequence of MARS_MULTI messages returns the set of
                endpoints in the group.
   MARS_NAK   - No mapping found, group is empty.

The request/response traffic MUST occur on a point to point VC established by the host to the MARS. Where the MARS and ARP Server are co-resident, this VC may be shared between ATM ARP traffic and MARS traffic.

5.1 Retrieving Group Membership from the MARS.

If the MARS has no mapping for the desired Class D address a MARS_NAK will be returned. In this case the IP packet MUST be discarded silently. If a match is found in the MARS's tables it proceeds to return addresses ATM.1 through ATM.n in a sequence of one or more MARS_MULTIs.

A simple mechanism is used to detect and recover from loss of MARS_MULTI messages. Each MARS_MULTI carries a boolean field x, and a 15 bit integer field y - expressed as MARS_MULTI(x,y). Field y acts as a sequence number, starting at 1 and incrementing for each MARS_MULTI sent. Field x acts as an 'end of reply' marker. When x == 1 the MARS response is considered complete. In addition, each MARS_MULTI may carry multiple ATM addresses from the set {ATM.1, ATM.2, .... ATM.n}. A MARS MUST minimise the number of MARS_MULTIs transmitted by placing as many group members' addresses in a single MARS_MULTI as possible. The limit on MARS_MULTI message length MUST be the MTU of the underlying VC.

Assume n ATM addresses must be returned, each MARS_MULTI is limited to only p ATM addresses, and p << n. This would require a sequence of k MARS_MULTI messages (where k = (n/p)+1, using integer arithmetic), transmitted as follows:

   MARS_MULTI(0,1) carries back {ATM.1 ... ATM.p}
   MARS_MULTI(0,2) carries back {ATM.(p+1) ... ATM.(2p)}
   [.......]
   MARS_MULTI(1,k) carries back { ... ATM.n}

If k == 1 then only MARS_MULTI(1,1) is sent.

The typical failure mode will be the loss of one or more of MARS_MULTI(0,1) through MARS_MULTI(0,k-1). This is detected when y jumps by more than one between consecutive MARS_MULTIs. An alternative failure mode is losing MARS_MULTI(1,k). A timer MUST be implemented to flag the failure of the last MARS_MULTI to arrive. A default value of 10 seconds is suggested.

If a 'sequence jump' is detected, the host MUST wait for the MARS_MULTI(1,k), discard all results, and repeat the MARS_REQUEST. If a timeout occurs, the host MUST discard all results, and repeat the MARS_REQUEST. Corruption of cell contents will lead to loss of a MARS_MULTI through AAL5 CPCS_PDU reassembly failure, which will be detected through the mechanisms described above.

If the MARS is managing a cluster of endpoints spread across different but directly accessible ATM networks it will not be able to return all the group members in a single MARS_MULTI. The MARS_MULTI message format allows for either E.164, ISO NSAP, or (E.164 + NSAP) to be returned as ATM addresses. However, each MARS_MULTI message may only return ATM addresses of the same type. The returned addresses MUST be grouped according to type (E.164, ISO NSAP, or both) and returned in a sequence of separate MARS_MULTI parts.

5.2 MARS_REQUEST, MARS_MULTI, MARS_MSERV, and MARS_NAK formats.

MARS_REQUEST is based on an ATM ARP_REQUEST, but with an 'operation type value' of 11 (decimal). The multicast address being resolved is placed into the target protocol address field (ar$tpa). The hardware type (ar$hrd) is set to 19 (decimal), and in IP environments the protocol type is 2048 (decimal). Section 6.6 of RFC 1577 should be consulted for specific details and coding of the ar$shtl, ar$sstl, ar$thtl, and ar$tstl fields. MARS_NAK is the MARS_REQUEST returned with an operation type value of 16 (decimal).

The MARS_MULTI message is identified by an 'operation type value' of 12 (decimal). The message format is:

   Data:
      ar$hrd    16 bits   Hardware type (19 decimal, 0x13 hex)
      ar$pro    16 bits   Protocol type
      ar$shtl   8 bits    Type & length of source ATM number (q)
      ar$sstl   8 bits    Type & length of source ATM subaddress (r)
      ar$op     16 bits   Operation code (MARS_MULTI)
      ar$spln   8 bits    Length of source protocol address (s)
      ar$thtl   8 bits    Type & length of target ATM number (x)
      ar$tstl   8 bits    Type & length of target ATM subaddress (y)
      ar$tpln   8 bits    Length of target multicast group address (z)
      ar$tnum   16 bits   Number of target ATM addresses returned (N)
      ar$seqxy  16 bits   Boolean flag x and sequence number y
      ar$msn    32 bits   MARS Sequence Number
      ar$sha    q octets  source ATM number
      ar$ssa    r octets  source ATM subaddress
      ar$spa    s octets  source protocol address
      ar$tha.1  x octets  target ATM number 1
      ar$tsa.1  y octets  target ATM subaddress 1
      ar$tpa    z octets  target multicast group address
      ar$tha.2  x octets  target ATM number 2
      ar$tsa.2  y octets  target ATM subaddress 2
      [.......]
      ar$tha.N  x octets  target ATM number N
      ar$tsa.N  y octets  target ATM subaddress N

ar$seqxy is coded with flag x in the leading bit, and sequence number y coded as an unsigned integer in the remaining 15 bits.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |x|              y              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

ar$tnum is an unsigned integer indicating how many pairs of {ar$tha, ar$tsa} (i.e. how many group members' ATM addresses) are present in the message. ar$msn is an unsigned 32 bit number filled in by the MARS before transmitting each MARS_MULTI. Its use is described further in section 10. Section 6.6 of RFC 1577 should be consulted for specific details and coding of all other fields.

As an example, assume we have a multicast cluster using 4 byte protocol addresses, 20 byte ATM numbers, and 0 byte ATM subaddresses. For n group members in a single MARS_MULTI we require a (44 + 20n) byte message. If we assume the default MTU of 9180 bytes, we can return a maximum of 456 group members' addresses in a single MARS_MULTI. (A sketch of this arithmetic appears at the end of section 5.)

5.3 Establishing the Multicast VC.

Following the completion of the MARS_MULTI reply the endpoint may establish a new point to multipoint VC, or reuse an existing one. If establishing a new VC, an L_MULTI_RQ is issued for ATM.n, followed by an L_MULTI_ADD for every member of the set {ATM.1, .... ATM.(n-1)} (assuming the set is non-null). The packet is then transmitted over the newly created VC just as it would be for a unicast VC. After transmitting the packet, the local interface holds the VC open and marks it as the active path out of the host for any subsequent IP packets being sent to that Class D address.

When establishing a new multicast VC it is possible that one or more returned endpoints may reject an L_MULTI_RQ or L_MULTI_ADD. If this occurs then the endpoint's ATM address is dropped from the set {ATM.1, ATM.2, .... ATM.n} returned by the MARS, and the creation of the multipoint VC continues.

Multicast VCs have the potential to be expensive in their use of resources. Therefore each VC MUST have a configurable inactivity timer associated with it. If the timer expires, an L_RELEASE is issued for that VC, and the Class D address is no longer considered to have an active path out of the local host. The timer SHOULD be no less than 1 minute, and a default of 20 minutes is RECOMMENDED. Choice of specific timer periods is beyond the scope of this document.

VC consumption may also be reduced by endpoints noting when a new group's set of {ATM.1, .... ATM.n} matches that of a pre-existing VC out to another group. With careful local management, and assuming the QoS of the existing VC is sufficient for both groups, a new point to multipoint VC may not be necessary. Algorithms for performing this type of optimization are not discussed here, and are not required for conformance with this memo.

Section 7 describes the endpoint's response to group membership changes while the VC is open. Section 10 describes the mechanism for ensuring hosts remain up to date with changes that occur while the VC is open.
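The fragmentation rules of sections 5.1 and 5.2 reduce to simple arithmetic. The following sketch (illustrative only, using the example sizes above: a 44 byte fixed part plus 20 bytes per ATM address) shows how a MARS might lay a membership set out across a MARS_MULTI(x,y) sequence:

   def mars_multi_layout(members, mtu=9180, fixed=44, per_addr=20):
       # Yield (x, y, chunk) tuples: y is the 15 bit sequence number
       # starting at 1, x == 1 marks the final message of the reply.
       p = (mtu - fixed) // per_addr          # addresses per message
       parts = [members[i:i + p] for i in range(0, len(members), p)]
       k = len(parts)
       for y, chunk in enumerate(parts, start=1):
           yield (1 if y == k else 0, y, chunk)

   # With the 9180 byte default MTU, p == 456. A 1000 member group
   # is returned as MARS_MULTI(0,1), MARS_MULTI(0,2), MARS_MULTI(1,3).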
6. Joining and Leaving Multicast Groups.

A cluster member is a 'group member' (in the sense that it receives packets directed at the group) when its ATM address appears in the MARS's table entry for the group's multicast address. A key requirement within each cluster is the distribution of group membership information between the MARS and cluster members. Two new messages are defined: MARS_JOIN and MARS_LEAVE. These are sent to the MARS by endpoints joining or leaving a multicast group. The MARS propagates these messages back out to the cluster over its ClusterControlVC, to ensure the knowledge is distributed in a timely fashion. ClusterControlVC is an outgoing, point to multipoint VC with each cluster member as a leaf node.

RFC 1112 expects that IP multicast routers are capable of behaving 'promiscuously'. This functionality may be emulated by allowing routers to request that the MARS returns them as 'wild card' members of all Class D addresses. However, a problem inherent in the current ATM model is that completely promiscuous behaviour may be wasteful of reassembly resources on the router's ATM interface. This document describes a generalisation to the notion of 'wild card' entries, enabling routers to limit themselves to 'blocks' of the Class D address space. The application of this facility is described in greater detail in section 9.

A block can be as small as 1 (a single group) or as large as the entire Class D address space (default IPv4 'promiscuous' behaviour). A block is defined as all addresses between, and inclusive of, a <min, max> address pair.

The key extensions required to manage the MARS table entries are two new message types:

   MARS_JOIN  - carries one or more <min, max> pairs (specifying one
                or more blocks of groups being joined) and a unicast
                ATM address (of the node joining).
   MARS_LEAVE - carries one or more <min, max> pairs (specifying one
                or more blocks of groups being left) and a unicast
                ATM address (of the node leaving).

When a MARS_JOIN is received by the MARS it adds the specified ATM address to the table entry for the specified multicast group address(es). When a MARS_LEAVE is received by the MARS it removes the specified ATM address from the table entry for the specified multicast group address(es). MARS_JOIN and MARS_LEAVE messages arriving from individual hosts are processed locally by the MARS and retransmitted on ClusterControlVC (possibly after modification, as detailed in section 8). All endpoints MUST ignore MARS_JOIN or MARS_LEAVE messages that simply confirm information already held. The MARS retransmits redundant messages, but otherwise takes no action. Section 7 describes how endpoints utilize retransmitted MARS_JOIN and MARS_LEAVE messages.

Cluster members MUST only include a single <min, max> pair in each JOIN/LEAVE message they issue. They MUST be able to process multiple <min, max> pairs in JOIN/LEAVE messages received on ClusterControlVC from the MARS (the interpretation being that the join/leave operation applies to all addresses in the range from <min> to <max> inclusive, for every <min, max> pair).

In IPv4 environments JoinLocalGroup now results in two messages being transmitted:

   A MARS_JOIN, sent over a VC to the MARS. It identifies the single
   IP group being joined, and the host's unicast ATM address.

   An IGMP Report, except for group 224.0.0.1 (in accordance with
   RFC 1112).
In IPv4 environments LeaveLocalGroup now results in a MARS_LEAVE being sent over a VC to the MARS, identifying the IP group being left, and the host's unicast ATM address.

Endpoints with special requirements (e.g. multicast routers) may directly issue MARS_JOINs and MARS_LEAVEs specifying blocks of multicast group addresses. No IGMP Report is issued for such operations in IP environments.

An endpoint must register with a MARS in order to become a member of a cluster and be added as a leaf to ClusterControlVC. Registration is covered in section 6.2.

6.1 Format of the MARS_JOIN and MARS_LEAVE Messages.

The MARS_JOIN message is indicated by an operation type value of 14 (decimal). MARS_LEAVE has the same format and an operation type value of 15 (decimal). The message format is:

   Data:
      ar$hrd    16 bits   Hardware type (19 decimal)
      ar$pro    16 bits   Protocol type
      ar$shtl   8 bits    Type & length of source ATM number (q)
      ar$sstl   8 bits    Type & length of source ATM subaddress (r)
      ar$op     16 bits   Operation code (MARS_JOIN or MARS_LEAVE)
      ar$spln   8 bits    Length of source protocol address (s)
      ar$tpln   8 bits    Length of multicast group address (z)
      ar$pnum   16 bits   Number of multicast group address pairs (N)
      ar$resv   16 bits   Reserved
      ar$msn    32 bits   MARS Sequence Number
      ar$sha    q octets  source ATM number (E.164 or ATM Forum NSAPA)
      ar$ssa    r octets  source ATM subaddress (ATM Forum NSAPA)
      ar$spa    s octets  source protocol address
      ar$min.1  z octets  Minimum multicast group address - pair.1
      ar$max.1  z octets  Maximum multicast group address - pair.1
      [.......]
      ar$min.N  z octets  Minimum multicast group address - pair.N
      ar$max.N  z octets  Maximum multicast group address - pair.N

Refer to RFC 1577, section 6.6 for the coding of the ar$shtl and ar$sstl fields. For conventional IPv4 environments ar$spln and ar$tpln are both set to 4. Note that the message format differs from ATMARP_REPLY in the fields after ar$op.

ar$msn is an unsigned 32 bit number filled in by the MARS before re-transmitting a MARS_JOIN or MARS_LEAVE. The originator SHOULD set it to zero, although it will be ignored by the MARS. Its use is described further in section 10.

A join/leave message carries a set of pairs {<min.1, max.1>, <min.2, max.2>, ...., <min.N, max.N>}, with at least one pair. ar$pnum indicates how many pairs are included in the message. To simplify MARS and endhost interpretation, the following restrictions are imposed. Assume max(N) is the <max> field from the Nth pair, min(N) is the <min> field from the Nth pair, and a join/leave message arrives with K pairs. The following must hold:

   max(N) <  min(N+1)   for 1 <= N < K
   max(N) >= min(N)     for 1 <= N <= K

In plain English, the set must specify an ascending sequence of address blocks. The definition of "greater" or "less than" may be protocol specific. In IP environments the addresses are treated as simple unsigned binary values.

6.1.1 Important IPv4 default values.

The JoinLocalGroup and LeaveLocalGroup operations are only valid for a single group. For any arbitrary group address X the associated MARS_JOIN or MARS_LEAVE MUST specify a single pair <X, X>.

A router choosing to behave strictly in accordance with RFC 1112 MUST specify the entire Class D space. The associated MARS_JOIN or MARS_LEAVE MUST specify a single pair <224.0.0.0, 239.255.255.255>. The use of alternative values is discussed in section 9.
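The ordering rules above are easily checked. A sketch of the test an endpoint or MARS might apply (illustrative only; addresses are treated as unsigned integers, as IP environments require):

   def valid_block_set(pairs):
       # 'pairs' is a list of (min, max) tuples. Enforce
       # max(N) >= min(N) and max(N) < min(N+1): an ascending,
       # non-overlapping sequence of address blocks.
       for n, (lo, hi) in enumerate(pairs):
           if hi < lo:
               return False
           if n > 0 and pairs[n - 1][1] >= lo:
               return False
       return True

   assert valid_block_set([(10, 20), (30, 40)])        # ascending blocks
   assert not valid_block_set([(10, 20), (15, 40)])    # overlap - illegal
   assert valid_block_set([(0xE0000001, 0xE0000001)])  # single group <X, X>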
6.2 Registering with the MARS.

Two separate signalling paths exist between cluster members and their associated MARS. The first is a transient point to point VC that cluster members establish to the MARS when they need to issue MARS_REQUESTs, MARS_JOINs, or MARS_LEAVEs. This VC is used by the MARS to return MARS_MULTI messages. It has an associated idle timer, and is dismantled if not used for a configurable period of time. The minimum suggested value for this time is 1 minute, and the RECOMMENDED default is 20 minutes.

The second signalling path is ClusterControlVC. Every endpoint registered as a cluster member is added as a leaf node to this VC, which exists for the lifetime of the MARS. It is used to re-distribute MARS_JOIN and MARS_LEAVE messages received by the MARS from individual cluster members.

Registration with the MARS as a cluster member occurs when an endpoint issues a MARS_JOIN for a protocol specific multicast group address. Once this occurs the endpoint is added as a leaf node to ClusterControlVC.

In IPv4 environments the 'all nodes' Class D address of 224.0.0.1 is used to register with the MARS. RFC 1112 requires that all hosts (including routers) that wish to participate in Level 2 IP multicasting must explicitly issue a JoinLocalGroup for group 224.0.0.1 when they initialise (Level 1 is not supported by this memo). The JoinLocalGroup for 224.0.0.1 will result in a MARS_JOIN being transmitted from the host to the MARS.

If an IPv4 endpoint issues a LeaveLocalGroup for 224.0.0.1 it will also be considered to have ceased membership of all other groups it has joined. The MARS MUST flush that endpoint's ATM address from any Class D address entries it appears in. Finally, the endpoint is released as a leaf node from ClusterControlVC. If the MARS receives an ERR_L_RELEASE on ClusterControlVC indicating that a cluster member has died, that member's ATM address MUST be removed from all groups it has joined.

Registration of endpoints for other protocols is currently beyond the scope of this document.

7. Endpoint management of point to multipoint VCs.

Once a cluster member has established a new VC to the members returned in a MARS_MULTI response it must:

   Monitor traffic on ClusterControlVC for updates to the group's
   membership.

   Revalidate a group's membership if a leaf node releases itself
   from the VC.

7.1 Monitoring updates on ClusterControlVC.

When a cluster member joins or leaves a particular multicast group it is not sufficient to simply update the mapping table in the cluster's MARS. Endpoints that are already transmitting to the multicast group's members must be informed of the change so they may add or remove a leaf node as appropriate. Cluster members track MARS_JOIN and MARS_LEAVE messages retransmitted by the MARS to determine when another endpoint joins or leaves a group or block of groups.

If a MARS_JOIN is seen that refers to (or encompasses) a group for which the transmit side already has a VC open, the new member's ATM address is extracted and an L_MULTI_ADD issued locally. This ensures that hosts already sending to a given group will immediately add the new member to their list of recipients. It also ensures that routers joining a 'block' of groups are added by all endpoints currently sending to groups within the block.
If a MARS_LEAVE is seen that refers to (or encompasses) a group for which the transmit side already has a VC open, the old member's ATM address is extracted and an L_MULTI_DROP issued locally. This ensures that hosts already sending to a given group will immediately drop the old member from their list of recipients.

In an IPv4 environment any endpoint leaving 224.0.0.1 is assumed to be ceasing support for IP multicast operation. If a MARS_LEAVE is seen that refers to group 224.0.0.1 then the ATM address of the endpoint specified in the message MUST be removed from every multipoint VC on which it is listed as a leaf node.

The transmit side of the interface MUST NOT shut down an active VC to a group for which the receive side has just executed a LeaveLocalGroup. This behaviour is consistent with the model of hosts transmitting to groups regardless of their own membership status.

If a MARS_JOIN or MARS_LEAVE arrives with ar$pnum == 0 it carries no <min, max> pairs, and is only used for validation as described in section 10.

7.2 Revalidating when leaf nodes drop themselves.

During the life of a multipoint VC an ERR_L_RELEASE may be received indicating that a leaf node has terminated its participation at the ATM level. The ATM endpoint associated with the ERR_L_RELEASE MUST be removed from the locally held set {ATM.1, ATM.2, .... ATM.n} associated with the VC. After a random period of time between 1 and 10 seconds the endpoint MUST revalidate the associated group's membership by re-issuing a MARS_REQUEST. The returned set of members {NewATM.1, NewATM.2, .... NewATM.n} is compared with the set already held locally. L_MULTI_DROPs are issued on the group's VC for each node that appears in the original set of members but not in the revalidated set. L_MULTI_ADDs are issued on the group's VC for each node that appears in the revalidated set of members but not in the original set. (A sketch of this reconciliation appears below.)
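A sketch of that reconciliation, reusing the hypothetical AALUserSignalling interface from section 3 (illustrative only; both membership arguments are Python sets of ATM addresses):

   def revalidate_group(sig, vc, old_members, new_members):
       # Drop leaves that vanished and add leaves that appeared
       # while this sender was out of step (section 7.2).
       for leaf in old_members - new_members:
           sig.l_multi_drop(vc, leaf)
       for leaf in new_members - old_members:
           sig.l_multi_add(vc, leaf)
       return new_members     # becomes the locally held set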
8. Configuring for Multicast Servers or Multicast Meshes.

Endpoints assume that all groups are supported by meshes of point to multipoint VCs. Under certain circumstances the consumption of VCs and AAL resources around the cluster can make meshes unattractive, despite their performance advantages. The MARS protocol provides a mechanism for introducing multicast servers on a per-multicast group basis, and in a manner that is completely transparent to cluster members.

The multicast server has two key roles:

   Providing one (or a limited number of) leaf nodes for outgoing VCs
   from cluster members.

   Constructing a single point to multipoint VC, with each group
   member as a leaf. This reduces the AAL consumption to one per
   group, rather than one per sender per group.

The MARS must keep two sets of mappings for each multicast group address supported by multicast servers. The original {multicast address, ATM.1, ATM.2, ... ATM.n} mapping (the 'host map', although it includes routers) is augmented by a parallel {multicast address, server.1, server.2, .... server.K} mapping (the 'server map'). It is assumed that no ATM addresses appear in both the server and host maps for the same multicast group. Typically K will be 1, but it will be larger when multiple multicast servers are configured to share the data load of a given group.

When the MARS receives a MARS_REQUEST for a multicast address that has both host and server maps it generates a response based on the identity of the request's source. If the requestor is a member of the server map for the requested group then the MARS returns the contents of the host map in a sequence of one or more MARS_MULTIs. Otherwise the MARS returns the contents of the server map in a sequence of one or more MARS_MULTIs. Servers use the host map to establish a basic distribution VC for the group. Cluster members will establish outgoing multipoint VCs to members of the group's server map, without being aware that their packets will not be going directly to the multicast group's members.

The MARS also maintains a point to multipoint VC out to any multicast servers it is aware of, called ServerControlVC. This serves an analogous role to ClusterControlVC, allowing the MARS to update the servers with group membership changes as they occur. A set of four MARS messages cover the current requirements:

   MARS_MSERV  - Register as multicast server for one or more groups.
   MARS_UNSERV - Deregister as multicast server for one or more
                 groups.
   MARS_SJOIN  - A JOIN message on ServerControlVC.
   MARS_SLEAVE - A LEAVE message on ServerControlVC.

MARS_SJOIN/SLEAVE are identical in format to MARS_JOIN/LEAVE, but have different operation codes so that a node acting as both a cluster member and multicast server may distinguish between updates arriving on ServerControlVC and ClusterControlVC.

8.1 Registering and deregistering multicast servers.

MARS_MSERV and MARS_UNSERV are identical to the MARS_JOIN message. MARS_MSERV uses the set {<min.1, max.1>, ...., <min.N, max.N>} to specify one or more blocks of multicast groups that a multicast server is willing to support. MARS_UNSERV indicates the set of groups that the multicast server is no longer willing to support. The operation code for MARS_MSERV is 11 (decimal), and MARS_UNSERV is 17 (decimal).

When a node registers with MARS_MSERV the MARS adds the new ATM address to the server maps for each specified group, possibly constructing a new server map if this is the first multicast server for the group. If the multicast server is not already a leaf node of ServerControlVC it is added. When a node deregisters with MARS_UNSERV the MARS removes its ATM address from the server maps for each specified group, deleting the server map if this was the only server for the group.

Both of these messages are sent to the MARS over a point to point VC, and echoed on ServerControlVC by the MARS (section 10 covers the use of this behaviour). The operation code is then changed to MARS_JOIN or MARS_LEAVE respectively, and a copy of the original message is transmitted on ClusterControlVC. The MARS retransmits but otherwise ignores redundant MARS_MSERV and MARS_UNSERV messages.

It is assumed that at least one server will have registered to support a group before the first cluster member joins it. If a MARS_MSERV arrives for a group that has a non-null host map but no server map the default response of the MARS will be to drop the MARS_MSERV without any further action. The originating multicast server will eventually flag an error when repeated attempts to register fail.

The opposite situation is where the last or only multicast server for a group deregisters itself while the group still has members. The default solution is for multicast servers to sever all VCs to which they are attached as leaf nodes when they deregister, forcing any active senders to the group to revalidate (as described in section 7).
Since the MARS will have deleted the server map, the revalidation will result in the host map being returned, and the group reverts to being a mesh. This shall be the default mechanism until future work develops a more elegant approach. Appendix C discusses possible extensions to allow dynamic transitions between mesh and multicast server support while a group is active. However, these are not required for conformance with this memo.

8.2 Handling group membership changes.

The existence of multicast servers supporting some groups but not others requires the MARS to intervene in the distribution of single and block join/leave updates to cluster members. The MARS_SJOIN and MARS_SLEAVE messages are identical to MARS_JOIN, with operation codes 18 and 19 (decimal) respectively. They exist to allow a node combining cluster member and multicast server to distinguish between information arriving on ClusterControlVC and ServerControlVC.

When a cluster member issues MARS_JOIN or MARS_LEAVE for a single group, the MARS checks to see if the group has an associated server map. If the specified group does not have a server map the MARS_JOIN or MARS_LEAVE is retransmitted on ClusterControlVC. If it does have a server map two transmissions occur:

   A copy is made with type MARS_SJOIN or MARS_SLEAVE as appropriate
   and transmitted on ServerControlVC. This allows the server(s)
   supporting the group to note the new member and add it as a leaf
   node.

   The original message's ar$pnum field is set to 0, and it is
   transmitted back using the VC it arrived on (rather than
   ClusterControlVC).

(Section 10 requires cluster members to have a mechanism to confirm the reception of their message by the MARS. For mesh supported groups, using ClusterControlVC serves the dual purpose of providing this confirmation and distributing group update information. When using multicast servers there is no reason for having all cluster members process and discard null join/leave messages on ClusterControlVC.)

Receipt of a block join/leave (e.g. from a router coming on-line) requires a more complex response. Cluster members must be directly informed of which mesh supported groups the block covers. Multicast servers must also be informed in case they support one of the groups covered by the block being joined.

The solution is for the MARS to 'punch holes' in the block of addresses supplied in the join/leave message, creating a set of <min, max> pairs that excludes those addresses/groups supported by the multicast servers. This hole-punched set is then sent out on ClusterControlVC, ensuring the router is immediately noted by senders to any mesh supported groups in the block. The original MARS_JOIN/LEAVE is then converted to a MARS_SJOIN/SLEAVE and transmitted on ServerControlVC. Appendix A discusses some algorithms for 'hole punching'; a simple sketch also appears at the end of this section. If punching holes in the originally specified block leaves a null set, the ar$pnum field is set to zero before sending the modified MARS_JOIN/LEAVE on ClusterControlVC.

8.3 Multicast server architectures.

Specification of multicast server architectures, and the synchronisation of multiple multicast servers supporting single multicast groups, is beyond the scope of this document and is expected to be the subject of further work. Appendix C discusses some possible approaches.
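A minimal sketch of the hole punching step (illustrative only; it treats each multicast-server supported group as a single address, and all addresses as unsigned integers):

   def punch_holes(block, server_groups):
       # Return the <min, max> pairs covering 'block' minus every
       # group in 'server_groups'. An empty result corresponds to
       # a message sent with ar$pnum == 0.
       lo, hi = block
       pairs = []
       for g in sorted(g for g in server_groups if lo <= g <= hi):
           if g > lo:
               pairs.append((lo, g - 1))
           lo = g + 1
       if lo <= hi:
           pairs.append((lo, hi))
       return pairs

   # Punching groups 12 and 15 out of <10, 20>:
   assert punch_holes((10, 20), {12, 15}) == [(10, 11), (13, 14), (16, 20)]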
9. Utilizing Blocks for Multicast Routers.

Multicast routers are required for the propagation of multicast traffic beyond the constraints of a single cluster. There is a sense in which they are multicast servers acting at the next higher layer, with clusters rather than individual endpoints as their abstract sources and destinations. Multicast routers typically participate in higher layer multicast routing algorithms and policies that are beyond the scope of this memo (e.g. DVMRP [5] in the IPv4 environment).

It is assumed that the multicast routers will be implemented over the same sort of IP/ATM interface that a multicast host would use. They will use the basic services described in the preceding sections to join and leave multicast groups as necessary, and will register with the MARS as cluster members. The rest of this section will assume a simple IPv4 scenario where the scope of a cluster has been limited to a particular LIS that is part of an overlaid IP network. Not all members of the LIS are necessarily registered cluster members.

9.1 Sending to a Group.

If the multicast router needs to transmit a packet to a group within the cluster it opens a VC in the same manner as a normal host would. Once a VC is open, the router watches for MARS_JOIN and MARS_LEAVE messages and responds to them as a normal host would. The multicast router's transmit side MUST implement inactivity timers to shut down idle outgoing VCs, as for normal hosts. As with normal hosts, the multicast router does not need to be a member of a group it is sending to.

9.2 Promiscuously Joining Groups.

Once registered and initialised, the simplest model of IPv4 multicast router operation is for it to issue a MARS_JOIN encompassing the entire Class D address space. In effect it becomes 'promiscuous', as it will be a leaf node to all present and future multipoint VCs established to IPv4 groups on the cluster. How a router chooses which groups to propagate outside the cluster is beyond the scope of this memo. Consistent with RFC 1112, IP multicast routers may retain the use of IGMP Query and IGMP Report messages to ascertain group membership.

9.3 Forwarding Multicast Traffic Across the Cluster.

Under some circumstances the cluster may simply be another hop between IP subnets that have participants in a multicast group.

   [LAN.1] ----- IPmcR.1 -- [LIS] -- IPmcR.2 ----- [LAN.2]

LAN.1 and LAN.2 are subnets (such as Ethernet) with attached hosts that are members of group X. IPmcR.1 and IPmcR.2 are multicast routers with interfaces to the LIS. A traditional solution would be to treat the LIS as a unicast subnet, and use tunneling routers. However, this would not allow hosts on the LIS to participate in the cross-LIS traffic.

Assume IPmcR.1 is receiving packets promiscuously on its LAN.1 interface. Assume further it is configured to propagate multicast traffic to all attached interfaces. In this case that means the LIS. When a packet for group X arrives on its LAN.1 interface, IPmcR.1 simply sends the packet to group X on the LIS interface as a normal host would (issuing a MARS_REQUEST for group X, creating the VC, and sending the packet).

Assuming IPmcR.2 initialised itself with the MARS as a member of the entire Class D space, it will have been returned as a member of X even if no other nodes on the LIS were members. All packets for group X received on IPmcR.2's LIS interface may be retransmitted on LAN.2.
If IPmcR.1 is similarly initialised the reverse process will apply for multicast traffic from LAN.2 to LAN.1, for any multicast group. The benefit of this scenario is that cluster members within the LIS may also join and leave group X at any time.

9.4 Restricted 'Promiscuous' Operation.

Both unicast and multicast IP routers have a common problem - limitations on the number of AAL contexts available at their ATM interfaces. Being 'promiscuous' in the RFC 1112 sense means that for every M hosts sending to N groups, a multicast router's ATM interface will have M*N incoming reassembly engines tied up.

It is not hard to envisage situations where a number of multicast groups are active within the LIS but are not required to be propagated beyond the LIS itself. An example might be a distributed simulation system specifically designed to use the high speed IP/ATM environment. There may be no practical way its traffic could be utilised on 'the other side' of the multicast router, yet under the conventional scheme the router would have to be a leaf to each participating host anyway. As this problem occurs at the link layer, it is worth noting that 'scoping' mechanisms at the IP multicast routing level do not provide a solution.

In this situation the network administrator might configure their multicast routers to exclude sections of the Class D address space when issuing MARS_JOIN(s). Multicast groups that will never be propagated beyond the cluster will not have the router listed as a member by the MARS, and the router will never have to receive and ignore traffic from those groups.

Another scenario involves the product M*N exceeding the capacity of a single router's interface (especially if the same interface must also support a unicast IP router service). A network administrator may choose to add a second node, to function as a parallel IP multicast router. Each router would be configured to be 'promiscuous' over separate parts of the Class D address space, thus exposing itself to only part of the VC load. This sharing would be completely transparent to IP hosts within the LIS.

Restricted promiscuous mode does not break RFC 1112's use of IGMP Report messages. If the router is configured to serve a given block of Class D addresses, it will receive the IGMP Report. If the router is not configured to support a given block, then the existence of an IGMP Report for a group in that block is irrelevant to the router. All routers are able to track membership changes through the MARS_JOIN and MARS_LEAVE traffic anyway. Mechanisms for establishing these modes of operation are beyond the scope of this memo.
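As an illustration, a router excluding a hypothetical 'local only' range (the specific range below is an invention of this example) would issue a MARS_JOIN whose <min, max> pairs cover the rest of the Class D space, in ascending order as section 6.1 requires:

   import socket, struct

   def ip(dotted):
       # Dotted quad to unsigned 32 bit integer.
       return struct.unpack("!I", socket.inet_aton(dotted))[0]

   # Stay out of 224.5.0.0 - 224.5.255.255; join everything else.
   restricted_join_pairs = [
       (ip("224.0.0.0"), ip("224.4.255.255")),
       (ip("224.6.0.0"), ip("239.255.255.255")),
   ]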
10. Robustness of interaction with the MARS.

Transient problems may result in the loss of messages between the MARS, cluster members, and multicast servers. More serious problems may result in the failure of the MARS itself. There are two problem scenarios that are addressed.

The first is the inability of a cluster member to send messages to the MARS itself, either through cell loss on the VC to the MARS, or the cluster member's inability to establish a VC to the MARS.

The second is with the MARS_JOIN/SJOIN/LEAVE/SLEAVE messages retransmitted from the MARS. If a cluster member or multicast server currently sending to a group misses a join update, the newly joined member misses out on some traffic to the group. If a cluster member or multicast server currently sending to a group misses a leave update, the cluster member that left will continue to receive packets unnecessarily.

10.1 Ensuring the MARS hears you.

A simple algorithm solves the first problem. Cluster members retransmit MARS_JOIN and MARS_LEAVE messages at regular intervals until they receive a copy back again, either on ClusterControlVC or the VC on which they are sending the messages. At this point the local endpoint can be certain that at least the MARS received it. Multicast servers retransmit MARS_MSERV and MARS_UNSERV messages at regular intervals until they receive a copy back on ServerControlVC. The interval should be no shorter than 5 seconds, and a default value of 10 seconds is recommended. After 5 retransmissions the attempt should be flagged locally as a failure. This should be considered as a MARS failure, and handled as described in section 10.2.

A 'copy' is defined as seeing a message of the same operation code containing the local host's identity in the source address fields. The <min, max> pair set is not checked, and does not have to be the same (this is required so that cluster members may verify a MARS_JOIN they've sent even if the MARS's hole-punching creates a totally different set of pairs).
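A sketch of this retransmit-until-echoed behaviour (illustrative only; a real interface would be event driven rather than polled, and 'send' and 'echoed' are placeholders for the local transmission and copy-detection machinery):

   import time

   def send_until_echoed(send, echoed, interval=10, attempts=5):
       # Retransmit a MARS_JOIN/LEAVE (or MARS_MSERV/UNSERV) until a
       # copy carrying our source address comes back (section 10.1).
       for _ in range(attempts):
           send()
           deadline = time.monotonic() + interval
           while time.monotonic() < deadline:
               if echoed():
                   return True
               time.sleep(0.1)
       return False   # flag as MARS failure; see section 10.2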
10.2 Temporary failure of the MARS.

Two failure modes indicate problems with the MARS itself:

   If an ERR_L_RELEASE occurs for the cluster member's attachment to
   ClusterControlVC it may be assumed some problem exists with the
   MARS.

   If the cluster member receives ERR_L_RQFAILED when it attempts to
   establish a point to point VC to the MARS in order to send MARS
   messages.

The cluster member should wait a random period of time between 1 and 10 seconds before attempting to re-register with the MARS. If the registration MARS_JOIN is successful (in accordance with section 10.1) then:

   The cluster member MUST then proceed to rejoin every group that
   its local higher layer protocol(s) have joined. It is recommended
   that a random delay between 1 and 10 seconds be inserted before
   the transmission of each MARS_JOIN.

   Finally, using the mechanism described in section 7, the cluster
   member MUST begin revalidating every multicast group it was
   sending to.

The rejoin and revalidation procedure must not disrupt the cluster member's use of multipoint VCs that were already open at the time of the MARS failure.

If the re-registration with the Primary MARS fails, and there is no configured Secondary MARS, the cluster member MUST wait for at least 1 minute before repeating the re-registration procedure. It is RECOMMENDED that the cluster member signals an error condition in some locally significant fashion.

If the re-registration with the Primary MARS fails, and a Secondary MARS has been configured, the Secondary and Primary MARS addresses are swapped and the cluster member immediately repeats the re-registration procedure. If this is successful the cluster member will resume normal operation using the Secondary MARS. It is RECOMMENDED that the cluster member signals a warning of this condition in some locally significant fashion.

If the attempt at re-registration with the Secondary MARS fails, the cluster member MUST wait for at least 1 minute before reverting back to the Primary MARS and starting the whole re-registration process over again. In the worst case scenario this will result in cluster members looping between registration attempts with the Primary MARS and Secondary MARS until network administrators manually intervene. Multicast servers shall behave in a similar manner to cluster members on this issue.

10.3 The MARS Sequence Number.

There is an unsigned 32 bit sequence number identified as ar$msn in most MARS messages. The following extensions govern its use:

The MARS keeps two independent counters, the Cluster Sequence Number (CSN) and the Server Sequence Number (SSN). They are incremented every time a message is sent out ClusterControlVC or ServerControlVC respectively. [Editorial note - in ipmc-03.txt the counter was incremented only when a change occurred in the mapping tables. This is a simplification.] The current CSN is copied into the ar$msn field of MARS messages being sent to cluster members (either out ClusterControlVC or on an individual VC). The current SSN is copied into the ar$msn field of MARS messages being sent to multicast servers (either out ServerControlVC or on an individual VC).

Cluster members and multicast servers track the increments of CSN or SSN to determine if they have missed any update messages. Calculations on the sequence numbers MUST be performed as unsigned 32 bit arithmetic, to ensure no glitches when the counters roll over.

Every cluster member keeps its own 32 bit Host Sequence Number (HSN) to track the MARS's sequence number. Whenever a MARS_MULTI, MARS_JOIN, or MARS_LEAVE is received the following check is performed on the ar$msn field of the new message:

   Seq.diff = ar$msn - HSN
   ar$msn -> HSN
   {...process MARS message as appropriate...}
   if ((Seq.diff != 1) && (Seq.diff != 0))
      then {...revalidate group membership information...}

The basic result is that the cluster member attempts to keep locked in step with membership changes noted by the MARS. If it ever detects that a membership change occurred (in any group) without it noticing, it re-validates the membership of all groups it currently has multicast VCs open to. Revalidation involves treating each VC as though an ERR_L_RELEASE was received from a leaf node, and executing the procedure described in section 7.

The ar$msn field of consecutive MARS_MULTIs sent in response to a MARS_REQUEST must be constant. If the ar$msn field changes then all the messages MUST be discarded at the completion of the response, and the MARS_REQUEST re-issued. One implication of this mechanism is that the MARS should serialize its processing of 'simultaneous' MARS_REQUEST, MARS_JOIN and MARS_LEAVE messages. Join and Leave operations should be queued within the MARS along with MARS_REQUESTs, and not processed until all the reply packets of a preceding MARS_REQUEST have been transmitted.

The MARS is free to choose the initial values of CSN and SSN. When a new cluster member starts up it should initialise HSN to zero. When the cluster member sends the MARS_JOIN to register, the HSN will be correctly set when it receives a copy of its MARS_JOIN from the MARS. If Seq.diff > 1 when the MARS_JOIN returns no action will be taken anyway, as the host will not have any multicast related VCs established at this stage.

If a sequence number jump occurs when establishing a new group's VC the cluster member should not revalidate the membership of the group it just established. The membership returned in the MARS_MULTIs that carried the new ar$msn field should be considered already validated.
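A sketch of the check above with explicit unsigned 32 bit wraparound (illustrative only):

   MOD32 = 1 << 32

   def check_msn(state, ar_msn):
       # Update the Host Sequence Number and report whether group
       # membership must be revalidated (section 10.3).
       seq_diff = (ar_msn - state["hsn"]) % MOD32
       state["hsn"] = ar_msn
       return seq_diff not in (0, 1)   # a jump means updates were missed

   host = {"hsn": 0xFFFFFFFF}
   assert check_msn(host, 0) is False  # clean rollover, stay in step
   assert check_msn(host, 2) is True   # jumped by two: revalidate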
10.4 Why a global sequence number?

The CSN and SSN are global within the context of a given protocol
(e.g. IP). They count ClusterControlVC and ServerControlVC activity
without reference to the multicast group(s) involved. This may be
perceived as a limitation, because there is no way for cluster
members or multicast servers to isolate exactly which multicast group
they may have missed an update for. An alternative would have been
per-group sequence numbers. Unfortunately per-group sequence numbers
are not practical.

The current mechanism allows sequence information to be piggy-backed
onto MARS messages already in transit for other reasons. The ability
to specify blocks of multicast addresses with a single MARS_JOIN or
MARS_LEAVE means that a single message can refer to membership
changes in multiple groups simultaneously. A single ar$msn field
cannot provide meaningful information about each group's sequence,
and multiple ar$msn fields would have been unwieldy.

Any MARS or cluster member that supports different protocols MUST
keep separate mapping tables and sequence numbers for each protocol.

10.5 Synchronizing the Primary and Secondary MARS.

If a Secondary MARS exists for a given cluster then some mechanism is
needed to ensure reasonable consistency between its mapping tables
and those of the Primary MARS, especially as cluster members will
only ever register with one MARS at a time. The inter-server protocol
also needs to cope with post-failure situations where some cluster
members end up registered with the Primary MARS and others with the
Secondary MARS.

The definition of an inter-server protocol is beyond the current
scope of this document, and is expected to be the subject of further
work in the area.

11. Using the MARS in non-IP environments.

A deliberate attempt has been made to describe the MARS and
associated mechanisms in a manner independent of the specific higher
layer protocol being run over the ATM cloud. The immediate
application of this document will be in an IPv4 environment, and this
is reflected by the focus of the key examples. However, any higher
layer protocol identifiable by a two byte Ethernet Type code can be
supported by a MARS: the 16 bit 'Protocol type' field at the start of
each MARS message is taken from the set of Ethernet Type codes.

Every MARS MUST implement entirely separate logical mapping tables
and support for each protocol it serves. Every cluster member must
interpret messages from the MARS in the context of the protocol type
that the MARS message refers to.

The LLC/SNAP encapsulation specified in section 4 should not be
considered a hindrance in non-IP environments. Experimenters
deploying IPX or AppleTalk over ATM are encouraged to use the
architecture described in this document to support their possible
multicast needs.
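As one illustration of the per-protocol separation required above, a
MARS implementation might key all of its state on that 16 bit
protocol type. The structure below is a hypothetical sketch of such a
layout, not a format mandated by this memo.

   #include <stdint.h>

   /* Hypothetical per-protocol MARS state, keyed on the 16 bit
      Ethernet Type code carried in every MARS message.            */
   struct mars_protocol_state {
       uint16_t protocol_type;  /* e.g. 0x0800 for IPv4            */
       uint32_t csn;            /* Cluster Sequence Number         */
       uint32_t ssn;            /* Server Sequence Number          */
       void    *host_map;       /* group -> cluster member ATM
                                   addresses                       */
       void    *server_map;     /* group -> multicast server ATM
                                   addresses                       */
       struct mars_protocol_state *next;  /* one entry per
                                             supported protocol    */
   };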
12. Key Decisions and open issues.

The key decisions this memo proposes:

   A Multicast Address Resolution Server (MARS) is proposed to
   co-ordinate and distribute mappings of ATM endpoint addresses to
   arbitrary higher layer 'multicast group addresses'. The specific
   case of IP version 4 multicast is used as the example.

   Individual multicast groups may be supported by multicast meshes
   between group members, or by multicast servers.

   The concept of 'clusters' is introduced to define the scope of a
   MARS's responsibility, and the set of ATM endpoints willing to
   participate in link level multicasting.

   MARS message formats and encapsulation allow co-resident MARS and
   ATM ARP Server implementations.

   New message types MARS_JOIN, MARS_LEAVE, and MARS_REQUEST allow
   endpoints to join, leave, and request the current membership list
   of multicast groups.

   New message type MARS_MULTI allows multiple ATM addresses to be
   returned by the MARS in response to a MARS_REQUEST.

   New message types MARS_MSERV and MARS_UNSERV allow multicast
   servers to register and deregister themselves with the MARS.

   New message types MARS_SJOIN and MARS_SLEAVE allow the MARS to
   pass group membership changes on to multicast servers.

   'Wild card' MARS mapping table entries are possible, where a
   single ATM address is simultaneously associated with blocks of
   multicast group addresses.

Some issues have not been addressed, although they may be in future
revisions:

   The MARS has no mechanism for realising that cluster members have
   silently died.

   The future addition of ATM Group Addresses and Leaf Initiated
   Join to the ATM Forum's UNI specification has not been addressed.
   The problems identified in this memo with respect to VC scarcity
   and the impact on AAL contexts will not be fixed by such
   developments in the signalling protocol.

Security Considerations

Security considerations are not addressed in this memo.

Acknowledgments

The discussions within the IP over ATM Working Group have helped
clarify the ideas expressed in this document. John Moy of Cascade
Communications Corp. initially suggested the idea of wild-card
entries in the ARP Server. Drew Perkins of Fore Systems provided
rigorous and useful critique of early proposed mechanisms for
distributing and validating group membership information. Susan
Symington (and co-workers at MITRE Corp., Don Chirieleison, Rich
Verjinski, and Bill Barns) clearly articulated the need for multicast
server support, proposed a solution, and challenged earlier block
join/leave mechanisms.

Author's Address

Grenville Armitage
MRE 2P340, 445 South Street
Morristown, NJ, 07960-6438
USA

Email: gja@thumper.bellcore.com

References

[1] Deering, S., "Host Extensions for IP Multicasting", RFC 1112,
    Stanford University, August 1989.

[2] Heinanen, J., "Multiprotocol Encapsulation over ATM Adaptation
    Layer 5", RFC 1483, Telecom Finland, July 1993.

[3] Laubach, M., "Classical IP and ARP over ATM", RFC 1577,
    Hewlett-Packard Laboratories, December 1993.

[4] ATM Forum, "ATM User-Network Interface Specification Version
    3.0", Englewood Cliffs, NJ: Prentice Hall, September 1993.

[5] Waitzman, D., Partridge, C., and Deering, S., "Distance Vector
    Multicast Routing Protocol", RFC 1075, November 1988.

[6] Perez, M., Liaw, F., Grossman, D., Mankin, A., Hoffman, E., and
    Malis, A., "ATM Signaling Support for IP over ATM", Internet
    Draft, IP over ATM Working Group, draft-ietf-ipatm-sig-02.txt,
    November 1994.
Appendix A. Parsing MARS messages.

Implementations are entirely free to comply with the body of this
memo in any way they see fit. This appendix is purely for
clarification.

A smart MARS implementation will pre-construct a set of pairs (P)
that reflects the entire Class D space, excluding any addresses
currently supported by multicast servers. The <min> field of the
first pair MUST be 224.0.0.0, and the <max> field of the last pair
MUST be 239.255.255.255. The first and last pair may be the same.
This set is updated whenever a multicast server registers or
deregisters.

When the MARS must perform 'hole punching' it might consider the
following algorithm:

   Assume the MARS_JOIN/LEAVE received by the MARS from the cluster
   member specified the block <Emin, Emax>.

   Assume Pmin(N) and Pmax(N) are the <min> and <max> fields from
   the Nth pair in the MARS's current set P.

   Assume set P has K pairs. Pmin(1) MUST equal 224.0.0.0, and
   Pmax(K) MUST equal 239.255.255.255. (If K == 1 then no hole
   punching is required.)

   Execute pseudo-code:

      create copy of set P, call it set C.

      index1 = 1;
      while (Pmax(index1) < Emin)
         index1++;

      index2 = K;
      while (Pmin(index2) > Emax)
         index2--;

      if (Cmin(index1) < Emin)
         Cmin(index1) = Emin;

      if (Cmax(index2) > Emax)
         Cmax(index2) = Emax;

   The pairs index1 through index2 of set C form the required 'hole
   punched' set of address blocks.

The resulting set C retains all the MARS's pre-constructed 'holes'
covering the multicast servers, but will have been pruned to cover
the section of the Class D space specified by the originating host's
<min, max> values.

The host end should keep a table, H, of open VCs in ascending order
of Class D address. Assume H(x).addr is the Class D address
associated with VC x, and that H(x).addr < H(x+1).addr. The pseudo
code for updating VCs based on an incoming JOIN/LEAVE might be:

      x = 1;
      N = 1;
      while (x <= no. of VCs open)
      {
         while (H(x).addr > max(N))
         {
            N++;
            if (N > no. of pairs in JOIN/LEAVE)
               return(0);
         }
         if ((H(x).addr <= max(N)) && (H(x).addr >= min(N)))
            perform_VC_update();
         x++;
      }
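Translated into C, the MARS side hole punching might look like the
following non-normative sketch. It assumes Class D addresses are held
as host-order 32 bit integers and that the caller supplies an output
array at least K pairs long; all identifiers are illustrative.

   #include <stdint.h>

   struct pair {
       uint32_t min;  /* first Class D address in the block */
       uint32_t max;  /* last Class D address in the block  */
   };

   /* Prune a copy of the MARS's pre-computed pair set P (K pairs,
      ascending, covering 224.0.0.0 through 239.255.255.255) down
      to the block <emin, emax> given in an incoming MARS_JOIN or
      MARS_LEAVE. Writes the result into C and returns the number
      of pairs, which is zero if the block lies wholly inside a
      server-supported hole.                                       */
   int hole_punch(const struct pair *P, int K,
                  uint32_t emin, uint32_t emax, struct pair *C)
   {
       int i, i1 = 0, i2 = K - 1, n = 0;

       while (P[i1].max < emin)  /* skip pairs below the block */
           i1++;
       while (P[i2].min > emax)  /* skip pairs above the block */
           i2--;
       if (i1 > i2)
           return 0;             /* block falls inside a hole  */

       for (i = i1; i <= i2; i++)  /* copy overlapping pairs   */
           C[n++] = P[i];

       if (C[0].min < emin)      /* clip first pair to block   */
           C[0].min = emin;
       if (C[n-1].max > emax)    /* clip last pair to block    */
           C[n-1].max = emax;

       return n;
   }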
Appendix B. Coping with IPv4 idiosyncrasies.

Implementing any part of this appendix is not required for
conformance with this memo. It is provided solely to document issues
that have been identified.

The intent of section 5.3 is for cluster members to hold outgoing
point to multipoint VCs only when they are actually sending data to a
particular multicast group. However, in most IPv4 environments the
multicast routers attached to a cluster will periodically issue IGMP
Queries to ascertain whether particular groups have members. The
current IGMP specification attempts to avoid having every group
member respond by insisting that each member wait a random period and
reply only if no other member has replied before it. The IGMP Reply
is sent to the multicast address of the group being queried.

Unfortunately, as it stands the IGMP algorithm is a nuisance for
cluster members that are essentially passive receivers within a given
multicast group. It is just as likely that a passive member, with no
outgoing VC already established to the group, will decide to send an
IGMP Reply - causing a VC to be established where there was no need
for one. This is not a fatal problem for small clusters, but it will
seriously impact the ability of a cluster to scale.

Various solutions exist, providing short and long term fixes for the
problem.

One long term solution would be to modify the IGMP algorithm, for
example:

   If the group member has a VC open to the group, proceed as per
   RFC 1112 (picking a random reply delay between 0 and 10 seconds).

   If the group member does not have a VC already open to the group,
   pick a random reply delay between 10 and 20 seconds instead, and
   then proceed as per RFC 1112.

If even one group member is sending to the group at the time the IGMP
Query is issued, then all the passive receivers will find the IGMP
Reply has been transmitted before their delay expires, so no new VC
is required. If all group members are passive at the time of the IGMP
Query then a response will eventually arrive, but 10 seconds later
than under conventional circumstances.

The preceding solution requires re-writing existing IGMP code, and
implies the ability of the IGMP entity to ascertain the status of VCs
on the underlying ATM interface. This is not likely to be available
in the short term.

One short term solution is to provide something like the preceding
functionality with a 'hack' at the IP/ATM driver level within cluster
members, as sketched at the end of this appendix. Arrange for the
IP/ATM driver to snoop inside IP packets looking for IGMP traffic. If
an IGMP packet is accepted for transmission, the IP/ATM driver can
buffer it locally if there is no VC already active to that group. A
10 second timer is started, and if an IGMP Reply for that group is
received from elsewhere on the cluster the timer is reset. If the
timer expires, the IP/ATM driver establishes a VC to the group as it
would for a normal IP multicast packet.

Some network implementors may find it advantageous to configure a
multicast server to support the group 224.0.0.1, rather than rely on
a mesh. Given that IP multicast routers regularly send IGMP Queries
to this address, a mesh means that each router will permanently
consume an AAL context within each cluster member. In clusters served
by multiple routers, the VC load within switches in the underlying
ATM network will become a scaling problem.

Finally, if a multicast server is used to support 224.0.0.1, another
ATM driver level hack becomes a possible solution to IGMP Reply
traffic. The ATM driver may choose to grab all outgoing IGMP packets
and send them out on the VC established for sending to 224.0.0.1,
regardless of the Class D address the IGMP message was actually for.
Given that all hosts and routers must be members of 224.0.0.1, the
intended recipients will still receive the IGMP Replies. The negative
impact is that all cluster members will receive the IGMP Replies.
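The driver-level hold-down described above might be sketched as
follows. This is a hypothetical illustration, not a specification:
every function name is an assumed local primitive, and only the 10
second hold-down comes from the text above.

   #include <stdint.h>

   #define IGMP_HOLD_SECS 10  /* hold-down before opening a new VC */

   extern int  vc_open_to(uint32_t group);  /* outgoing VC exists? */
   extern void transmit(uint32_t group, void *pkt);
   extern void hold_packet(uint32_t group, void *pkt);
   extern void drop_held_packet(uint32_t group);
   extern void start_timer(uint32_t group, unsigned secs);
   extern void cancel_timer(uint32_t group);

   /* Driver transmit path: an outgoing IGMP Reply is held for 10
      seconds when no VC to the group is already open. On timer
      expiry the driver opens the VC and sends the held Reply as it
      would a normal IP multicast packet.                          */
   void driver_tx_igmp_reply(uint32_t group, void *pkt)
   {
       if (vc_open_to(group)) {
           transmit(group, pkt);  /* VC already up, just send it   */
           return;
       }
       hold_packet(group, pkt);
       start_timer(group, IGMP_HOLD_SECS);
   }

   /* Driver receive path: another cluster member answered the
      Query first, so our held Reply is no longer needed.          */
   void driver_rx_igmp_reply(uint32_t group)
   {
       cancel_timer(group);
       drop_held_packet(group);
   }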
If one or more multicast servers register for a group that already
has members, it would be desirable for the current senders to the
group to migrate their outgoing VCs from the actual cluster members
to the newly registered multicast server(s). One approach might be to
have the MARS issue a sequence of fabricated MARS_JOINs for the
multicast servers, followed by MARS_LEAVEs for each member of the
group's current host map (see the sketch at the end of this
appendix). The load this would place on the MARS, and its
scalability, have not been considered.

An elegant mechanism for the reverse migration might well be based on
the reverse process: issue MARS_JOINs for all entries in the host
map, then issue MARS_LEAVEs for all remaining entries in the server
map.

In the case of groups served by multiple multicast servers, the
current expectation is that each server retrieves the entire group's
membership with MARS_REQUESTs. This memo expects there to be an
external mechanism for multiple multicast servers to synchronize the
load sharing amongst themselves. Whether the MARS should be extended
to play a part is a subject for further work.

An issue not immediately related to the MARS architecture is whether
a multicast server retransmits using a single point to multipoint VC
out to group members, or a set of one VC per group member. The first
approach makes better use of the underlying ATM fabric, but data
sources that are also members of the group will receive copies of
their own traffic back. The alternative avoids this problem, but at
the expense of consuming more VCs and bandwidth on the path out of
the multicast server itself.

Situations where either issue is a problem should simply revert to
using a multicast mesh between the participating endpoints, where the
source never sees copies of its own packets, and the multicasting
happens within the ATM switch fabrics.
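The mesh-to-server migration idea mentioned above might be sketched
as follows. This is purely speculative, matching the non-normative
status of this appendix; all identifiers are hypothetical.

   #include <stdint.h>

   extern int  server_count(uint32_t group);  /* registered servers */
   extern int  host_count(uint32_t group);    /* host map entries   */
   extern void send_fabricated_join(uint32_t group, int server);
   extern void send_fabricated_leave(uint32_t group, int host);

   /* Fabricate JOINs for each registered multicast server, then
      LEAVEs for each current host map member, so that active
      senders add the server(s) as leaves and prune the mesh.      */
   void migrate_group_to_servers(uint32_t group)
   {
       int i;

       for (i = 0; i < server_count(group); i++)
           send_fabricated_join(group, i);
       for (i = 0; i < host_count(group); i++)
           send_fabricated_leave(group, i);
   }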