Network Working Group                                      R. R. Stewart
INTERNET-DRAFT                                                  Motorola
                                                                  Q. Xie
                                                                Motorola
Expires in six months                                       1 April 1999


             MULTI-NETWORK DATAGRAM TRANSMISSION PROTOCOL

Status of This Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This Internet Draft discusses an experimental call control protocol,
namely the Multi-network Datagram Transmission Protocol (MDTP), that is
intended to provide fault-tolerant reliable/unreliable data transfer
between communicating processes over IP networks [1]. MDTP is proposed
as an application-level protocol which is designed with a high emphasis
on supporting redundant networks and transparent fault management. MDTP
also gives the application a great degree of timing control and
configuration flexibility.

The motivation for developing MDTP is to establish a framework for
supporting Internet-based high reliability real-time commercial
applications such as signaling and call control for Internet telephony.

Stewart & Xie                                                   [Page 1]
Internet Draft  Multi-network Datagram Transmission Protocol    Apr 1999

TABLE OF CONTENTS

1.  Introduction
    1.1  Multi-network Datagram Transmission Protocol
    1.2  Interfaces to MDTP
    1.3  Operation of MDTP
2.  Design Principles
3.  Header Format
    3.1  MDTP Header Format Description
    3.2  Notes on Multicast Header format
4.  Transmission Initialization
    4.1  Normal Initialization
    4.2  Multiple Network Addresses
    4.3  Initialization Collision
    4.4  Re-initialization
    4.5  Link rotation
5.  Reliable Transfer Mode
    5.1  Timer Control
    5.2  Gap Acknowledgments
    5.3  Congestion Control
    5.4  Sequence Number Reset
    5.5  Retransmission on Multiple Networks
         5.5.1  Randomization of the T3-Send timer at resend
    5.6  Termination of an Endpoint
    5.7  Endpoint Drain
    5.8  Advisory Acknowledgments
    5.9  RTT Measurement
    5.10 Heart Beat Ack
6.  Unreliable Transfer Mode
    6.1  Ordered reception
7.  Reliable flows
    7.1  Initiating a flow
    7.2  Flow acknowledgments
    7.3  Flow session closing
8.  Mixed Mode Data Transmission
9.  Bundled Messages
    9.1  Format of Bundled Datagram
    9.2  Bundled Transfer
10. Fragmented Messages
11. Non-protocol Datagrams
12. Broadcast and Multicast
    12.1 Multicast/Broadcast Initialization
    12.2 Transmission of Broadcast Datagrams
    12.3 Transmission of Multicast Datagrams
    12.4 Reset of the Multicast Datagram Sequence Number
13. Interface with upper level protocols
    13.1 Init.MDTP primitive
    13.2 Send.Data primitive
    13.3 Receive.Data primitive
    13.4 Data.Arrive notification
    13.5 Send.Failure notification
    13.5 Link.Status.Change notification
    13.6 Communication.Lost notification
14. Suggested Timer and Protocol Parameter Values
15. Acknowledgments
16. Author's Addresses
17. References

1. Introduction

This Internet Draft discusses an experimental protocol, namely the
Multi-network Datagram Transmission Protocol (MDTP), that is intended
to provide fault-tolerant reliable/unreliable data transfer between
communicating processes over IP networks [1]. MDTP is proposed as an
application-level protocol which is designed with a high emphasis on
supporting redundant networks and transparent fault management. MDTP
also gives the application a great degree of timing control and
configuration flexibility.

The motivation for developing MDTP is to establish a framework for
supporting Internet-based high reliability real-time commercial
applications such as signaling and call control for Internet telephony.

This document describes the functional interface and the details
necessary to implement MDTP.

1.1 Multi-network Datagram Transmission Protocol (MDTP)

The Multi-network Datagram Transmission Protocol (MDTP) presented in
this Internet Draft is designed to meet the following critical
requirements common to real-time call control environments employing
redundant networks:

A) A process may need to be in simultaneous communication with
   thousands of endpoints performing various call processing
   functions. These endpoints may be codec converters, SS7 to IP
   translation applications, or, in the case of mobile networks, data
   selector and combiner applications.

B) A process needs very fine control over the timing for delivering a
   datagram. The timing should be easily adjusted depending on the
   message type and the destination. For example, after a few seconds
   of non-delivery the call which the message is about may not exist
   anymore.

C) A process communicating with a peer should be able to take
   advantage of the redundant networks in a transparent way. This
   means that the application or upper level protocols need not be
   involved in the network fault management. Instead, when a network
   failure occurs the transmission protocol should be able to
   automatically re-route the out-bound datagram to the alternate
   network without intervention from the application.

D) Datagrams may arrive out of order, or may arrive in duplicate
   copies. This is especially true in a redundant network environment.
   The transmission protocol should be robust enough to properly
   handle both situations with little intervention from the upper
   level protocol or application.

To accomplish the above objectives we have defined MDTP to reside in
user-space, i.e., it is not intended to be implemented as a module in
an operating system.
This gives the application or upper level protocols that use MDTP great
flexibility in controlling the timing and other operational
characteristics of the data transmissions.

MDTP is also made multi-network aware. This means that if more than one
path exists between two endpoints (such as redundant LANs), MDTP will
take advantage of the multiple networks by automatically switching to
the alternate LAN if datagram delivery becomes unavailable or
inefficient (e.g., too many re-transmissions) on the current LAN. The
ability of MDTP to handle multiple networks can also greatly facilitate
the implementation of various traffic balancing schemes in the
application or upper level protocols.

In the redundant network setting, out-of-order or duplicate datagrams
have proven to be most harmful during MDTP transmission initiations and
re-initiations. To cope with the problem, MDTP utilizes a tag mechanism
to guard against out-of-order or duplicate datagrams.

MDTP assumes that a UDP-like [2] transport protocol is available at the
operating system level for data transport. We have successfully
implemented and tested MDTP over UDP and Sun Microsystems' CLTS
transport layers.

Compared to traditional TCP [3], the MDTP design is tuned towards a
special set of applications, namely time-critical fault-tolerant
applications using redundant LANs. It is not designed to replace TCP as
a general purpose transmission protocol.

1.2 Interfaces to MDTP

MDTP interfaces with the application programs or higher level protocols
through a set of function calls. Because MDTP is an application level
protocol, these calls are not executed within the operating system, but
within the user process (i.e., in the user space).

The application or higher level protocols pass data to MDTP by making
calls to MDTP, which then enqueues the data for transmission.
When data arrives, MDTP will distribute the data to the application or
higher level protocols via mechanisms predefined by the application.

The application also has an interface to change the operational mode of
an MDTP endpoint and the default operational mode of the MDTP endpoint.
The default operational mode is used in the absence of any specific
direction from the application. More details on the MDTP interface to
the upper level protocol/application can be found in section 13.

As noted above, it is assumed that a UDP-like data transport protocol
will provide the interface between MDTP and the operating system. No
other special interfaces or changes are assumed within the operating
system; all queuing and internal pseudo-connection information is
maintained inside the MDTP endpoint.

1.3 Operation of MDTP

MDTP operates in three different modes:

A) Reliable transfer mode

B) Unreliable transfer mode

C) Raw UDP transfer mode

The two ends in a communication connection can operate in different
modes with respect to each other, with the exception of the raw UDP
mode. For example, suppose two endpoints A and B are communicating with
each other. Endpoint A may be sending information to B in reliable
transfer mode, while B, on the other hand, may be sending information
to A in unreliable transfer mode. All communications from A to B will
be acknowledged by B, but A will not need to acknowledge data received
from B.

Raw UDP transfer is used when one of the endpoints in communication
does not support MDTP. This allows compatibility with non-MDTP
endpoints. Two MDTP capable endpoints are also allowed to communicate
in raw UDP transfer mode. However, both sides will have to be in raw
UDP mode once one of them indicates that it will use raw UDP transfer
mode.

MDTP also provides a bundling option for both the reliable and
unreliable transfer modes.
This allows each side to hold the data before transmission for some
period of time, so that small datagrams can be combined and sent in a
single larger datagram to improve network utilization efficiency.

2. Design Principles

One of the major objectives dictating the design of MDTP is to provide
a data transmission protocol that transparently supports highly fault
tolerant implementations. To accomplish this, provisions for two
communicating endpoints to use multiple networks are essential. MDTP is
therefore designed to yield the best fault tolerance when the
application shares the load over multiple network connections. When an
original transmission fails, MDTP can attempt retransmissions over an
alternate network connection even when the upper level protocol or the
application is completely ignorant of the existence of the alternate
route.

Many of the fundamental concepts that have made TCP such a useful
protocol are reused, and some of the advantages of UDP are also merged
into the design of MDTP. This has led to an effective, robust protocol
for fault tolerant data communications.

3. Header Format

MDTP inserts a header at the beginning of every datagram. This header
is composed of various flags and integers. The integers are always kept
in network byte order. The following diagrams illustrate the common
MDTP header overlay and its variants. Note that one tick mark
represents one bit position.

               MDTP Header Format - Non Multicast

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 1                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 2                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Acknowledgment Number (Seen)                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Sequence Number (Send)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Data Size           |     Part      |      Of       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Flags     |     Mode      |    Version    |   In Queue    |
|N N W I F R D A|B S W R R B G U|               |               |
|O O I S I E A C|R H N E E U A N|               |               |
|G B N B R S T K|O U R 1 2 N R R|               |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\                                                               \
/                             data                              /
\                                                               \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               MDTP Header Format - Multicast Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 1                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 2                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Acknowledgment Number (Seen)                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Sequence Number (Send)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Data Size           |     Part      |      Of       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Flags     |     Mode      |    Version    |   In Queue    |
|N N W I F R D A|B S W R R B G U|               |               |
|O O I S I E A C|R H N E E U A N|               |               |
|G B N B R S T K|O U R 1 2 N R R|               |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Multicast To Transmit address                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Multicast From - senders base address              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\                                                               \
/                             data                              /
\                                                               \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               MDTP Header Format - RTT Ack

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 1                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 2                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Acknowledgment Number (Seen)                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Sequence Number (Send)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Data Size           |     Part      |      Of       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Flags     |     Mode      |    Version    |   In Queue    |
|N N W I F R D A|B S W R R B G U|               |               |
|O O I S I E A C|R H N E E U A N|               |               |
|G B N B R S T K|O U R 1 2 N R R|               |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Transparent Time Int-1                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Transparent Time Int-2                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Flow Initiate/Close Message

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 1                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 2                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Acknowledgment Number (Seen/flow num)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Sequence Number (Send)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Data Size           |     Part      |      Of       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Flags     |     Mode      |    Version    |   In Queue    |
|N N W I F R D A|B S W R R B G U|               |               |
|O O I S I E A C|R H N E E U A N|               |               |
|G B N B R S T K|O U R 1 2 N R R|               |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Ack Flow (opening)       |      Ack datagram number      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Flow Extended Acknowledgment

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 1                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  MDTP Protocol Identifier 2                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Ack Flow (Seen)        |      Ack datagram number      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Number of flow Acks                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Data Size           |     Part      |      Of       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Flags     |     Mode      |    Version    |   In Queue    |
|N N W I F R D A|B S W R R B G U|               |               |
|O O I S I E A C|R H N E E U A N|               |               |
|G B N B R S T K|O U R 1 2 N R R|               |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Ack Flow (Seen)        |      Ack datagram number      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\                                                               /
/              one for each 'Number of flow Acks'               \
\                                                               /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Ack Flow (Seen)        |      Ack datagram number      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.1 MDTP Header Format

MDTP Protocol Identifier 1: 32 bits

   This is a fixed long value of 0xf7873072.

MDTP Protocol Identifier 2: 32 bits

   This is a fixed long value of 0x17074012.

   MDTP Protocol Identifier 1 and 2 are jointly examined to determine
   whether a received datagram is an MDTP protocol datagram.
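
The identifier check and the common 24-octet header overlay can be
sketched in Python as follows. This is only an illustrative sketch, not
part of the protocol definition. The names MDTP_ID1, MDTP_ID2, HEADER,
FLAG_BITS, MODE_BITS, is_mdtp, decode_bits and parse_header are
hypothetical. The bit masks assume the usual diagram convention that
bit 0 is the most significant bit of an octet, which this document does
not state explicitly.

```python
import struct

MDTP_ID1 = 0xF7873072  # MDTP Protocol Identifier 1
MDTP_ID2 = 0x17074012  # MDTP Protocol Identifier 2

# Common header in network byte order: two identifiers, Seen and Send
# (32 bits each), Data Size (16 bits), then Part, Of, Flags, Mode,
# Version and In Queue (8 bits each): 24 octets in total.
HEADER = struct.Struct("!IIIIHBBBBBB")

# Bit names in diagram order. ASSUMPTION: diagram bit 0 is the most
# significant bit of the octet.
FLAG_BITS = ("NOG", "NOB", "WIN", "ISB", "FIR", "RES", "DAT", "ACK")
MODE_BITS = ("BRO", "SHU", "WNR", "RE1", "RE2", "BUN", "GAR", "UNR")


def is_mdtp(datagram: bytes) -> bool:
    """Return True only if both protocol identifiers match."""
    if len(datagram) < HEADER.size:
        return False
    id1, id2 = struct.unpack_from("!II", datagram)
    return id1 == MDTP_ID1 and id2 == MDTP_ID2


def decode_bits(octet: int, names) -> set:
    """Map the set bits of one octet to their names."""
    return {name for i, name in enumerate(names) if octet & (0x80 >> i)}


def parse_header(datagram: bytes) -> dict:
    """Split the common header into named fields."""
    if not is_mdtp(datagram):
        raise ValueError("not an MDTP datagram")
    (_, _, seen, send, data_size, part, of,
     flags, mode, version, in_queue) = HEADER.unpack_from(datagram)
    return {
        "seen": seen, "send": send, "data_size": data_size,
        "part": part, "of": of,
        "flags": decode_bits(flags, FLAG_BITS),
        "mode": decode_bits(mode, MODE_BITS),
        "version": version, "in_queue": in_queue,
    }


# Example: a header with FIR and RES set in Flags and GAR set in Mode.
example = HEADER.pack(MDTP_ID1, MDTP_ID2, 0, 0x1234, 0, 0, 1,
                      0x0C, 0x02, 2, 0)
fields = parse_header(example)
```

Checking both identifiers before touching any other field lets a
receiver hand anything else to its non-MDTP handling path without
interpreting the remaining octets.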

Acknowledgment Number (or Seen): 32 bits

   If the ACK flag is set, this value is the next sequence number that
   the sender of this datagram expects to receive from the receiver of
   this datagram. However, during initialization negotiation and in
   multicast and broadcast transmissions, this field has special
   meanings (see 4 and 12).

Sequence Number (or Send): 32 bits

   If the DAT flag is set, this value represents the sequence number
   of the first data octet that follows this header. Otherwise, this
   value is the sequence number of the first octet of the next data
   unit that will be sent. However, during initialization negotiation
   and in multicast and broadcast transmissions, this field has
   special meanings (see 4 and 12).

Part: 8 bits

   This value represents the Part number of a fragmented message. The
   first fragment of a message is always part '0'.

Of: 8 bits

   This value represents the total number of fragments in a fragmented
   message. The valid range for this value is from '1' to '255'. For
   broadcast and multicast datagrams this value is set to '1' to
   indicate that no fragmentation should occur.

Data Size: 16 bits

   This value represents, in number of octets, the size of the data
   field that follows this header in the current datagram.

Flags: 8 bits

   NOG - No Guaranteed delivery. This bit is used in negotiation and
   is set to indicate that the sender does not wish to use reliable
   delivery. When this bit has been set in negotiation, the receiver
   should prevent its application from putting communication with this
   endpoint in reliable mode. In normal data transfer (after the
   initiate sequence) this bit should be set to 0, except when
   responding to an RTT Ack request.

   NOB - No Bundling. This bit is used in negotiation and is set to
   indicate that the sender does not wish to perform bundling or
   un-bundling of datagrams.
   When this bit has been set in negotiation, the receiver should
   prevent its application from putting communication with this
   endpoint in bundled mode. In normal data transfer this bit should
   be set to 0; if it is set to 1, the message is part of a flow.

   WIN - Window Up. This bit is set by the sender of this datagram to
   indicate that the sender needs the receiver to acknowledge
   previously received datagrams before it can send more datagrams.

   ISB - Is Bundled. This bit is set by the sender to indicate that
   this datagram is bundled. This bit should never be set if either
   end set the NOB bit during negotiation.

   FIR - First Datagram. This flag is set to indicate that this is a
   negotiation datagram.

   RES - Reset Sequence Number. This bit is set to indicate that the
   sequence number is being reset. The sequence number should be reset
   whenever the sending count is greater than 0x7fffffff.

   DAT - Data Present. This bit is set to indicate that application
   data is present in this datagram, following this header.

   ACK - Acknowledge. This bit is set to indicate that the sender is
   acknowledging receipt of the specified Acknowledgment Number.

Mode: 8 bits

   BRO - Broadcast. This bit is set to indicate a broadcast or
   multicast datagram. When this bit is set, bits SHU, WNR, BUN, and
   GAR are not used and should be set to '0'. The datagram is a
   multicast datagram if the UNR bit is also set. Otherwise, it is a
   broadcast datagram.

   SHU - Shutdown. This bit is set when the sender initiates its
   closing procedure and indicates to the receiver that the sender is
   no longer a valid destination. If the UNR bit is set in conjunction
   with the SHU bit, an incomplete shutdown is specified. After an
   incomplete shutdown, the receiver can still re-establish
   communication with the sender by re-initiating with the sender (see
   5.7).

   WNR - Window Up Response.
   This bit is set in the acknowledgment reply to a Window Up flag.

   RE1 - This bit represents one of two things. If the GAR bit is set
   to 1, then setting the RE1 bit indicates to the receiver that the
   sender is requesting an advisory ACK. This is normally sent in a
   datagram when 1/2 of the current window has been sent. If this bit
   is set to 0 (when the GAR bit is set), then the sender is NOT
   requesting an advisory ACK. If the UNR bit is set and the RE1 bit
   is set, then the receiver is requested to order the datagrams (if
   more than one have not been read). If the receiver has already
   delivered a datagram of higher sequence, then the receiver should
   discard lower sequence number datagrams that arrive late.

   RE2 - This bit represents one of two things. If the GAR bit is set
   to 1, the DAT bit is set to 0 and the ACK bit is set to 1, then
   this is an ACK with a Round Trip Time Request format. This also
   indicates that the RTT Ack header format is in place. If the UNR
   bit is set to 1 and the DAT bit is set to 0, then this datagram is
   used in an implementation specific way but carries no data. The
   datagram can be safely ignored and discarded.

   BUN - Bundled Mode. This bit is set to indicate that bundled mode
   is in effect for the sender. This bit should never be set if either
   endpoint set the NOB flag during negotiation.

   GAR - Guaranteed Mode. This bit is set to indicate that the
   reliable mode is in effect for the sender, i.e., the sender expects
   an acknowledgment. This bit should never be set if either endpoint
   set the NOG flag during negotiation.

   UNR - Unreliable Mode. This bit is set to indicate that unreliable
   mode is in effect for the sender and the sender does not expect an
   acknowledgment. This bit has special meanings if the BRO or SHU bit
   is set (see above).

Version: 8 bits

   This field represents the version number of the MDTP protocol. A
   value of 1 indicates that the sender does not support Round Trip
   Time (RTT) calculation or the Heart Beat of the reliable protocol.
   A value of 2 indicates that the sender does support RTT and Heart
   Beat. A value of 3 indicates that the sender/receiver supports
   reliable flows.

In Queue: 8 bits

   This field contains the number of messages the sender has on its
   incoming queue, waiting to be read by the application. This gives
   the receiver an indication of the flow control conditions within
   the sender.

The message header is always followed by the data field. If there are
fewer than 4 octets of application data to send with the datagram, the
data field of the datagram should be padded with '0' octets to make it
four (4) octets long. The padding octets, if any, are not counted in
the Data Size.

The maximal Data Size for a single MDTP datagram is the MTU size of the
underlying transport protocol (e.g., UDP) minus the MDTP header size,
which is twenty-four (24) octets. The combination of the maximal 'Of'
value, which is 255, and the maximal Data Size determines the maximal
size of a single message that MDTP can send or receive.

3.2 MDTP Multicast Header Format

The multicast header format is identical to the standard MDTP header
format, as discussed above, except for the following extensions.

Multicast To Transmit address - This is the multicast address, in
network byte order, that the sender transmitted the data to. The
receiver can use this information for internal tracking purposes.

Multicast From - This is the base address (address 0 in the initiate
message, see below) of the sender, in network byte order. Since a
multicast sender may not have gone through the initiate procedures,
this address is the base reference that the receiver is to use to look
up the sender. It should be used to reference any internal cache,
rather than the network address the datagram arrived from.

4. Transmission Initialization

4.1 Normal Initialization

Before the first data transmission can take place from one endpoint (A)
to another endpoint (Z), the two endpoints need to complete an
initialization process. The initialization process consists of the
following steps.

A) Endpoint A should first send an initiation datagram, while
   withholding the application data from transmission.

Endpoint A                                  Endpoint Z

[Header Flags=FIR|RES
 Mode=options
 Seen=0,Send=Tag_A] ----------------------->
(Start T1-init timer)
(Enter Tag_A-lock mode)

   The initiation datagram is identified by setting the FIR and RES
   bits in the Flags field. No user data should be carried in the
   initiation datagram.

   Endpoint A should fill in the appropriate options, e.g., BUN, GAR,
   or UNR, in the Mode field to indicate the transmission type it has
   chosen. It may also use the NOB and NOG bits in the Flags field to
   specify whether or not its peer is allowed to use bundling or
   reliable transfer mode.

   The Seen field will be set to '0', but an initiation tag, Tag_A,
   generated by Endpoint A, will be carried in the Send field, as
   shown in the above diagram. If re-initializations are needed
   between two endpoints subsequently (see 4.4), a different tag with
   a unique value should be used for each re-initialization.

   After sending the initiation datagram, Endpoint A shall start the
   T1-init timer and enter Tag_A-lock mode. During Tag_A-lock mode,
   Endpoint A will wait for the initiation Ack datagram with the Seen
   value set to Tag_A. Any other incoming datagrams from Endpoint Z,
   except for new initiation datagrams, will be discarded. The arrival
   of new initiation datagrams during Tag_A-lock mode indicates an
   initialization collision, which is discussed in 4.3.

   If the T1-init timer expires, the same initiation datagram will be
   retransmitted and the timer restarted.
   This will be repeated Max.Init.Retransmit times before Endpoint A
   considers Endpoint Z unreachable and optionally reports the
   failure.

B) Upon the receipt of the above initiation datagram from Endpoint A,
   Endpoint Z should respond immediately with an initiation Ack as
   shown below:

Endpoint A                                  Endpoint Z

                                            [Header Flags=FIR|RES|ACK
                                             Mode=Options
                                /----------- Seen=Tag_A,Send=Tag_Z]
                               /            (Enter Tag_Z-lock mode)
(Cancel T1-init timer)<-------/

   The initiation Ack datagram is identified by the FIR, RES, and ACK
   bits being set to '1' in the Flags field. Similarly, Endpoint Z
   will specify its preferred transmission mode and type by setting
   the proper bits in the Mode and Flags fields.

   In addition, in the out-bound initiation Ack datagram, Endpoint Z
   should set the Seen field to Tag_A and supply its own initiation
   tag, Tag_Z, in the Send field.

   Once the initiation Ack is transmitted, Endpoint Z should enter
   Tag_Z-lock mode. In Tag_Z-lock mode Endpoint Z will ignore any
   incoming initiation Ack datagrams and also discard any other
   incoming datagram whose Seen field is not equal to Tag_Z, except
   for new initiation datagrams.

   If a new initiation datagram is received when Endpoint Z is in
   Tag_Z-lock mode, Endpoint Z will acknowledge the initiation
   datagram only when the tag carried in the Send field matches the
   Tag_A previously recorded by Endpoint Z. Otherwise, Endpoint Z will
   send an initiation datagram with the Send field set to Tag_Z back
   to Endpoint A to elicit an initiation Ack.

C) After transmitting the initiation Ack, Endpoint Z can start
   transmitting datagrams with user data. However, the Seen field in
   the first out-bound datagram with user data must be set to Tag_A.

D) Upon the receipt of the initiation Ack with Seen equal to Tag_A,
   Endpoint A can start transmitting datagrams with user data.
   However, the first datagram with application data transmitted by
   Endpoint A should have the Seen value set to Tag_Z, which is
   obtained from the initiation Ack.

Endpoint A                                  Endpoint Z

{first app message}
[Header Flags=ACK|DAT
 Mode=options
 Seen=Tag_Z,Send=1]
[data field] -----------\
                         \
                          \-------> (Leave Tag_Z-lock mode)

E) Upon the receipt of the first datagram with user data from Endpoint
   A and with the Seen value set to Tag_Z, Endpoint Z should leave
   Tag_Z-lock mode.

F) Similarly, upon the receipt of the first datagram with user data
   and the Seen value set to Tag_A from Endpoint Z, Endpoint A should
   leave Tag_A-lock mode.

The upper level protocol or application can predefine a set of default
transmission modes, which will be used by the endpoint for
initialization. However, it should be pointed out that the transmission
modes between two endpoints are allowed to change on a datagram by
datagram basis, as illustrated in later sections.

4.2 Multiple Network Addresses

In order to support multiple networks, both endpoints need to have
knowledge of all network addresses available to each other. This
information needs to be passed to the other end during the
initialization. The data field of the initiation and initiation Ack
datagrams is used for this purpose.

Depending on the underlying network configuration, the data field will
be filled in one of the two following ways:

A) If the sending endpoint of the initiation or initiation Ack
   datagram does not have access to multiple networks, the data field
   will be set to the pad value of 4 octets of '0's.

B) If the sending endpoint has access to multiple networks (for
   example two redundant LANs), the first 4 octets of the data field
   will be an unsigned long integer (in network byte order) specifying
   how many networks the endpoint has access to. Following these 4
   octets will be a list of network addresses.
Each address begins with a header of 4 octets followed by the actual
address. The first 2 octets of the header are an unsigned integer
indicating the size of the actual address. The next 2 octets of the
header are the type of the address.

For an IPv4 address, the address header will have the size set to 8
and the type set to AF_INET (2). Of the 8 octets used by the actual
IPv4 address, the first 4 octets will contain the IP address (in
network byte order) of the path. The next 2 octets will contain the
UDP port number (in network byte order). The last 2 octets will be
padded with '0's.

The data field of the initiation or initiation Ack datagram from an
endpoint with access to two IPv4 networks would look like the
following:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Number of Networks = 2                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Size of address=8       |  Type of Address=AF_INET (2)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             IP Address of Network 1 = 0x88b68108              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Port = 52212          |          Padding = 0          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Size of address=8       |  Type of Address=AF_INET (2)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             IP Address of Network 2 = 0x0a100001              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Port = 52212          |          Padding = 0          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Any data following the initiation network list can be ignored.
Implementations may, at their option, use additional data sent in
subsequent locations for implementation-specific data exchanges. No
user data, however, is allowed to be transported in this datagram.
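
The data field layout described above can be sketched in Python as
follows. This is an illustrative sketch, not a reference
implementation: the function names are hypothetical, only IPv4 entries
are handled, and it assumes that an endpoint with a single network
always sends the 4-octet zero pad of case A.

```python
import socket
import struct

AF_INET_TYPE = 2  # address type value the text gives for IPv4


def pack_address_list(endpoints):
    """Build the data field from a list of (ipv4_string, udp_port)."""
    if len(endpoints) <= 1:
        # Case A: no multiple networks, so send the 4-octet zero pad.
        return b"\x00" * 4
    # Case B: 4-octet network count, then one entry per network.
    out = struct.pack("!I", len(endpoints))
    for ip, port in endpoints:
        out += struct.pack("!HH", 8, AF_INET_TYPE)    # size, type
        out += socket.inet_aton(ip)                   # 4-octet address
        out += struct.pack("!HH", port, 0)            # port, padding
    return out


def unpack_address_list(data):
    """Parse the data field back into a list of (ipv4_string, port)."""
    (count,) = struct.unpack_from("!I", data, 0)
    offset = 4
    result = []
    for _ in range(count):
        size, addr_type = struct.unpack_from("!HH", data, offset)
        offset += 4
        if addr_type == AF_INET_TYPE and size == 8:
            ip = socket.inet_ntoa(data[offset:offset + 4])
            (port,) = struct.unpack_from("!H", data, offset + 4)
            result.append((ip, port))
        # Entries of an unknown type are skipped using their size.
        offset += size
    return result
```

With the two networks of the diagram (0x88b68108 is 136.182.129.8 and
0x0a100001 is 10.16.0.1), pack_address_list() yields 28 octets: the
4-octet count followed by two 12-octet entries.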

4.3 Initialization Collision

If both endpoints attempt to initialize the communication at about the
same instant, a collision will occur. In a collision, each endpoint
will receive an initiation datagram from the other side after it has
transmitted its own. Both sides must acknowledge the initiation
datagram following the normal procedure described in 4.1.

The following is an example of an initialization collision:

Endpoint A                                    Endpoint Z

[Header Flags=FIR|RES                  [Header Flags=FIR|RES
        Mode=options                           Mode=options
 Seen=0,Send=Tag_A] --------\   /----- Seen=0,Send=Tag_Z]
(Start T1-init timer)        \ /       (Start T1-init timer)
                              /
                             / \
[Header Flags=FIR|RES|ACK <-/   \
        Mode=options             \---> [Header Flags=FIR|RES|ACK
 Seen=Tag_Z,Send=Tag_A]----\                   Mode=options
                            \  /------ Seen=Tag_A,Send=Tag_Z]
                             \/
                             /\
(Cancel T1-init timer) <----/  \-----> (Cancel T1-init timer)
..
[Header Flags=ACK|DAT
        Mode=options
 Seen=Tag_Z,Send=1] ------------------>
..
                                       [Header Flags=ACK|DAT
                                               Mode=options
                    <----------------- Seen=Tag_A,Send=1]

4.4 Re-initialization

An endpoint is allowed to re-initialize an established communication.
In the case of re-initialization, the endpoint which initiates the
re-initialization (i.e., the initiator) should use a tag different
from the one used in the previous initialization. The initiator should
follow the standard initialization procedure as stated in 4.1. Upon
the arrival of the initiation datagram, the peer of the initiator
should also follow the procedure stated in 4.1 to respond.

Note that any outstanding flows that were open are considered closed
once the communication is re-initialized.

4.5 Link Rotation

When multiple networks exist between two communicating endpoints, the
MDTP implementation MUST record, every time the application transmits
a datagram, which network the transmission was sent on in the MDTP
protocol variable 'last.sent.intf'.
If the user does not specifically override rotation, each send should
be rotated in a round-robin fashion amongst all available networks,
and the protocol variable 'last.sent.intf' should be updated to
indicate which interface was used last. The MDTP implementation should
apply the rules defined in "5.5 Retransmission on Multiple Networks"
to determine whether a network is "available".

The MDTP implementation MUST allow a user to override this rotation,
defeating MDTP's rotation upon each send.

5. Reliable Transfer Mode

Reliable transfer mode is indicated if the sending endpoint sets the
GAR option on the current datagram. If the sending endpoint was
previously transmitting in unreliable mode (by setting the UNR bit in
each previous datagram), the receiver must reset its Seen counter to
the Send value of this current datagram upon receiving it.

The following example illustrates both piggy-backed and
non-piggy-backed acknowledgments with both ends transmitting in
reliable mode:

Endpoint A                                    Endpoint Z

{App sends 3 messages}
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=1,Send=1,Size=100]-------------> (Start T2-receive timer)
(Start T3-send timer)
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=1,Send=101,Size=100]----------->
(Restart T3-send timer)
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=1,Send=201,Size=100]----------->
(Stop and restart T3-send timer)
                                       {Timer T2 expires}
         <---------------------------- [Header Flags=ACK
                                               Mode=0
                                               Part=0,Of=0
                                        Seen=301,Send=1]
(cancel T3-send timer)
..
{App sends 1 message}
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=1,Send=301,Size=100]-----------> (Start T2-receive timer)
(Start T3-send timer)
                                       {App sends 1 message}
                                       (cancel T2-receive timer)
         <---------------------------- [Header Flags=DAT|ACK
                                               Mode=GAR
                                               Part=0,Of=1
                                        Seen=401,Send=1,Size=45]
                                       (Start T3-send timer)
(cancel T3-send timer)
(Start T2-receive timer)
..
{Timer T2 Expires}
[Header Flags=ACK
        Part=0,Of=0
 Seen=46,Send=401]-------------------> (cancel T3-send timer)

In the above example, the first series of 3 messages of 100 octets
each is sent by Endpoint A. The messages are unbundled in this
example, i.e., each message is transmitted in a single datagram.
Endpoint A starts its send timer T3 after sending the first datagram,
and each subsequent send stops and restarts the send timer T3,
extending the life of the send timer. Upon receiving the first
datagram, Endpoint Z starts the receive timer T2. When timer T2 in
Endpoint Z expires, Endpoint Z transmits an Ack. Upon receipt of this
Ack, Endpoint A stops timer T3 and discards the first 3 datagrams
(held for possible retransmission).

After the first three messages have been transmitted successfully, the
application at Endpoint A sends another message of 100 octets. After
sending this datagram, Endpoint A starts timer T3 again. Upon receipt
of the datagram, Endpoint Z starts timer T2. Before Endpoint Z's T2
timer expires, the application at Endpoint Z sends a message of 45
octets to Endpoint A. This causes Endpoint Z to cancel the T2 timer
and to piggyback an Ack on the out-bound datagram being transmitted to
Endpoint A. After the transmission, Endpoint Z starts its T3 timer.
Upon receipt of this datagram, Endpoint A cancels its T3 timer (since
all data it has sent is acknowledged) and starts a receive timer T2.
At the expiration of the T2 timer, Endpoint A acks the receipt of the
last datagram from Endpoint Z. This Ack causes Endpoint Z to cancel
its T3-send timer.
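
The Seen values in the example above are consistent with each Ack
carrying the sequence number of the next octet the receiver expects,
i.e., the Send value plus the Size of the last in-order datagram. This
reading is inferred from the numbers in the example rather than from a
normative statement, and the function name below is hypothetical. A
minimal sketch in Python:

```python
def cumulative_seen(expected, datagrams):
    """Return the Seen value to acknowledge after a run of datagrams.

    expected:  sequence number of the next octet the receiver expects
    datagrams: iterable of (send, size) pairs in arrival order
    """
    seen = expected
    for send, size in datagrams:
        if send != seen:
            # Not the next in-order octet: stop at the first gap.
            break
        seen = send + size
    return seen
```

For the three 100-octet datagrams sent by Endpoint A (Send=1, 101,
201) this gives 301, and for the single 45-octet datagram sent by
Endpoint Z (Send=1) it gives 46, matching the Acks in the diagram.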
It is very important to notice in the above example that the
acknowledgments to the received datagrams are always delayed by timer
T2. This delay gives the receiving endpoint a window in which to
piggyback the Acks onto subsequent datagrams traveling in the opposite
direction, and thus avoid sending the Acks in separate datagrams.

5.1 Timer Control

The basic rules for timer control are as follows:

A) When all outstanding datagrams are acknowledged, the T3-send timer
shall be stopped, if one is running.

B) When a datagram with application data (i.e., with the DAT flag set)
is received, the endpoint shall start a T2-receive timer if no timer
is running.

C) Upon the expiration of the T2-receive timer, the endpoint shall ack
to the sender all the un-acked data it has received.

D) When a datagram with application data is sent out, the sending
endpoint shall start a T3-send timer. If the T3-send timer is already
running, the endpoint shall first stop the old T3 timer and then start
a new one. If the T2-receive timer is running, the endpoint shall
first stop the T2 timer, piggyback an Ack onto the out-bound datagram,
and then start a T3-send timer.

E) If the T3-send timer expires, the endpoint shall attempt
re-transmission according to the rules described in 5.5.

F) No more than one timer of any type should be running on an endpoint
at any given moment.

G) When a T2-receive timer expires, any bundled data waiting to be
transmitted should be sent immediately with a piggy-backed Ack to
acknowledge all un-acked data previously received.

H) Whenever a T3-send timer is to be started, any running timer should
be stopped and supplanted by the T3-send timer.

I) In bundling mode, if the total size of all application messages
pending to be sent is less than the bundle size, the messages should
be withheld and the T4-bundle timer should be started.

J) If the total size of all application messages pending to be sent
exceeds the bundle size, the T4-bundle timer should be stopped and the
message(s) should be sent immediately.

K) If a T4-bundle timer is running and data arrives, the T2-receive
timer should not be started.

L) A T4-bundle timer should never be canceled unless it is being
supplanted by a T3-send timer.

M) When the first datagram with the Tag which unlocks the initiation
is received, no T2-receive timer should be started; instead, an
acknowledgment must be sent without delay.

The following example shows the use of various timers.

Endpoint A                                    Endpoint Z

{App sends 2 messages}
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=1,Send=501,Size=100]-----------> (Start T2-receive timer)
(Start T3-send timer)
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1                    {App sends 1 message}
 Seen=1,Send=601,Size=100]-\       /-- (cancel T2-receive timer)
(stop and restart T3-send   \     /    [Header Flags=DAT|ACK
 timer)                      \   /             Mode=GAR
                              \ /              Part=0,Of=1
                               \        Seen=601,Send=1,Size=100]
                              / \      (Start T3-send timer)
                             /   \
                       <----/     \-->
..
{T3-send timer expires}
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=101,Send=601,Size=100]---------> (Cancel T3-send timer)
(Restart T3-send timer)                (Start T2-receive timer)
..
                                       {Timer T2 expires}
(Cancel T3-send timer) <-------------- [Header Flags=ACK
                                               Mode=0
                                               Part=0,Of=0
                                        Seen=701,Send=101]

In this example, the application at Endpoint A sends 2 messages to
Endpoint Z. Both messages are 100 octets in length. Before the second
datagram arrives at Endpoint Z, Endpoint Z's application sends a
message to Endpoint A. This causes Endpoint Z to cancel its T2-receive
timer and piggyback the Ack for the first received datagram on the
out-bound datagram destined to Endpoint A. After transmitting the
datagram, Endpoint Z starts its T3-send timer. When the T3-send timer
at Endpoint A expires, it will re-send its earlier datagram.
The retransmitted datagram is the same, except that it now
acknowledges all outstanding packets that Endpoint Z has sent. After
retransmitting the datagram, Endpoint A restarts its T3-send timer.
The arrival of the retransmitted datagram causes Endpoint Z to cancel
its T3-send timer and discard the duplicate datagram, and it now
starts its T2-receive timer. At the expiration of the T2-receive
timer, Endpoint Z sends the Ack to Endpoint A. Upon receipt of the
Ack, Endpoint A cancels its T3 timer.

5.2 Gap Acknowledgments

If a datagram is lost during a series of transmissions, a special type
of acknowledgment known as the gap Ack will be sent. The gap Ack tells
the sender of the missing datagram that retransmission is needed.

The following example shows the use of the gap Ack.

Endpoint A                                    Endpoint Z

{App sends 3 messages}
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=146,Send=701,Size=100]--------> (Start T2-receive timer)
(Start T3-send timer)
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=146,Send=801,Size=100]-----X (lost)
(Restart T3-send timer)
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=146,Send=901,Size=100]--------> (A gap detected in data)
(Restart T3-send timer)
..
                                      {T2-receive timer expires}
                              /------ [Header Flags=ACK
                             /                Mode=0
                            /          Seen=801,Send=146,
                           /                  Part=1,Of=1
                          /            data=(long integer)901]
(Prepare retransmit) <---/

In this example, when Endpoint Z receives the third datagram from
Endpoint A, it realizes that a gap exists in the received data. At the
expiration of the T2-receive timer, Endpoint Z sends a gap Ack, in
place of a normal Ack, to Endpoint A to indicate the missing data.

In the gap Ack, the Part and Of fields are both set to '1', as opposed
to '0' as in a normal Ack. The data field of the gap Ack is a four (4)
octet long integer containing the sequence number of the first octet
received after the gap (which is 901 in this example).
The Seen field in the gap Ack will contain the sequence number of the
first octet of the gap.

Using these two values, Endpoint A should be able to calculate the
position and size of the missing data (which is 801-900 in this
example) and thus determine which datagrams need to be retransmitted.

Gap Acks cannot be piggy-backed with application data. The following
is another example of using the gap Ack:

Endpoint A                                    Endpoint Z

{App sends 3 messages}
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=146,Send=701,Size=100]--------> (Start T2-receive timer)
(Start T3-send timer)
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=146,Send=801,Size=100]-----X (lost)
(Restart T3-send timer)
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=146,Send=901,Size=100]--------> (A gap is detected)
(Restart T3-send timer)
..
                                      {App sends a message}
                                      (Cancel T2-receive timer)
                              /------ [Header Flags=ACK
                             /                Mode=0
                            /          Seen=801,Send=146,
                           /                  Part=1,Of=1
                          /            data=(network long)901]
(Retransmit missing data) <-----/
[Header Flags=DAT|ACK              -  [Header Flags=DAT|ACK
        Mode=GAR                  /           Mode=GAR
        Part=0,Of=1              /            Part=0,Of=1
 Seen=146,Send=801,Size=100]-   /      Seen=801,Send=146,Size=100]
(Restart T3-send timer)      \ /      (Start T3-send timer)
                              \/
                              /\
                  <---------/  \
                                \
                                 \-->
..
                                      {T3-Send timer expires}
                                      (Retransmit app data)
(Cancel T3-send timer) <------------- [Header Flags=DAT|ACK
(Start T2-receive timer)                      Mode=GAR
                                              Part=0,Of=1
                                       Seen=1001,Send=146,Size=100]
                                      (Restart T3-send timer)
..
{T2-receive timer expires}
[Header Flags=ACK
        Part=0,Of=0
 Seen=246,Send=1001]----------------> (Cancel T3-send timer)

In this example, Endpoint Z detected the missing data when it received
the second datagram. However, before the T2-receive timer expired, the
application at Endpoint Z requested to send a message (of 100 octets
in length).
This caused Endpoint Z to cancel its T2-receive timer and send the gap
Ack before it sent out the datagram containing the application
message. After transmitting the application message, Endpoint Z
started its T3-send timer. When Endpoint Z's T3-send timer expired, it
retransmitted the previous datagram and at the same time acked all of
Endpoint A's outstanding datagrams. Upon receipt of the retransmission
from Endpoint Z, Endpoint A started its own T2-receive timer. At the
expiration of its T2-receive timer, Endpoint A sent an Ack to Endpoint
Z and resolved the outstanding datagram at Endpoint Z.

5.3 Congestion Control

Three different mechanisms should be used jointly to achieve flow and
congestion control in MDTP.

First, a limit should be set on the number of out-bound messages
queued up at an endpoint. If the limit is reached, new send requests
from the application should be rejected until the number of messages
in the queue drops back below the limit.

Secondly, MDTP uses a transmission window to control the number of
outstanding datagrams, i.e., datagrams that have been sent but are yet
to be acknowledged. The length of the window is defined as the maximum
number of outstanding datagrams a sending endpoint can allow. This
length is adjusted dynamically, depending on the current number of
successful transmissions as well as the number of lost datagrams.

When the number of outstanding datagrams reaches the current window
length, the endpoint may still accept send requests from the
application, but will transmit no more datagrams until an Ack is
received. Also, when the window length is reached, the next send
request from the application will trigger the sending endpoint to
transmit a special Window Up message.
Upon receiving this Window Up message, the receiver must respond with
a Window Up Response message, as illustrated by the following diagram
(assume the current window length is 3):

Endpoint A                                    Endpoint Z

{App sends 3 messages}
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=146,Send=1001,Size=100]--------> (Start T2-receive timer)
(Start T3-send timer)
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=146,Send=1101,Size=100]-------->
(Restart T3-send timer)
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=146,Send=1201,Size=100]-------->
(Restart T3-send timer)
{App sends 1 message}
{queue 100 byte message}
[Header Flags=WIN|ACK
 Seen=146,Send=1301]-----------------> (cancel T2-receive timer)
                                 /---- [Header Flags=ACK
                                /              Mode=WNR
                               /               Part=0,Of=0
                              /         Seen=1301,Send=146]
[Header Flags=DAT|ACK <------/
        Mode=GAR
        Part=0,Of=1
 Seen=146,Send=1301,Size=100]--------> (Start T2-receive timer)

In this example, after the transmission of the first three datagrams,
Endpoint A reached its window length. The next message from the
application triggered a Window Up message that was sent to Endpoint Z.
The Window Up message always contains no data and has its WIN flag
set. In response, Endpoint Z cancelled timer T2 and immediately sent
an Ack with the WNR bit set in the Mode field. The arrival of this Ack
from Endpoint Z effectively resolved all the outstanding datagrams at
Endpoint A, and thus allowed Endpoint A to send out the next datagram.

The window length is initially set to 2, and is then dynamically
adjusted based on the performance of the underlying networks.

If the current window length is equal to or greater than 4, every time
4 consecutive outstanding datagrams are acknowledged at once by the
receiver, the sender's window length will be raised by 1 until it
reaches 20.
If the length is less than 4, every time the number of consecutively
acknowledged outstanding datagrams is equal to or greater than the
current window length, the sender's window will be raised by 1 until
it reaches 20.

The sender's window length will be decreased if datagram loss occurs.
If between 1 and 3 consecutive datagrams are lost, the window length
will be decreased by 1. If between 4 and 7 datagrams are lost, the
window length will be decreased by 2. If 8 or more datagrams are lost,
the window length will be decreased by 4. When the window length
reaches 2, it will not be decreased any further.

Moreover, any time a Window Up is sent to the receiving endpoint, the
sender's window length will be decreased by 1. Also, if a timeout
forces a retransmission, the sender's window length will be decreased
by 1. Finally, if a duplicate Ack is received by a sender, this should
be taken to indicate network congestion, and the number of outstanding
packets allowed should be decreased by 4.

The following table summarizes these rules:

-----------------------------------------------------------------------
Duplicate Ack received by sender   | Adjust down by 4
-----------------------------------------------------------------------
Greater than 8 datagrams lost      | Adjust down by 4
-----------------------------------------------------------------------
Greater than 4 datagrams lost      | Adjust down by 2
-----------------------------------------------------------------------
Greater than 0 datagrams lost      | Adjust down by 1
-----------------------------------------------------------------------
Timeout forces retransmission      | Adjust down by 1
-----------------------------------------------------------------------
Window Up sent                     | Adjust down by 1
-----------------------------------------------------------------------
4 or more consecutive datagrams    | Adjust up by 1
acknowledged (window length > 4)   |
-----------------------------------------------------------------------
1/2 Window length or more acked    | Adjust up by 1
(window length <=4)                |
-----------------------------------------------------------------------

Finally, the third flow control mechanism is to exchange incoming
queue information between the two communicating endpoints. By using
the In Queue field in the MDTP header, the sender can inform the
receiver of the number of pending datagrams which the sender has
received but has yet to deliver to its application.

The following example shows how the endpoints use the In Queue value
to accomplish flow control.

Assume that Endpoint A sent Endpoint Z 20 datagrams, and that when
Endpoint Z acked the receipt of all 20 datagrams, only the first of
the 20 datagrams had been delivered to the application at Endpoint Z.
In the last Ack sent by Endpoint Z, the In Queue field would then have
a value of 19, indicating the number of datagrams pending delivery to
its application. This value would be checked by Endpoint A before it
sent the next datagram to Endpoint Z. If this value was found to be
greater than its current window length, Endpoint A would not send the
next datagram. Instead, Endpoint A would start its T3-send timer and
send a Window Up message to Endpoint Z at the expiration of the timer.
This would force Endpoint Z to send an Ack with an updated In Queue
value. If the new In Queue value was still greater than its window
length, Endpoint A would restart its T3-send timer, repeating this
procedure until the In Queue value of Endpoint Z dropped below the
current window length of Endpoint A. Then, the transmission at
Endpoint A would resume.

5.4 Sequence Number Reset

It may become necessary for an endpoint to reset the sequence number
while it is sending data to a peer.
However, the endpoint must inform the peer about this event by:

1) sending a Window Up message to force the peer to acknowledge all
received datagrams which have not been acknowledged, and

2) sending the next datagram with the RES bit set in the Flags field.

3) A sending endpoint should always reset its sequence counter before
the counter reaches 0x7fffffff. When the counter reaches this value,
the sending endpoint is required to reset its sequence counter.

4) A sending endpoint should never reset its sequence counter until
after reaching 0x7fff05ff.

Note: This section will be obsoleted in a future version of the draft
and be replaced by a deterministic roll-over algorithm.

The following example illustrates the sequence number reset procedure
(assume that Endpoint A opts to do a reset when the data sequence
number becomes greater than 0x7ffff000).

Endpoint A                                    Endpoint Z

{App sends 2 messages}
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=46,Send=0x7ffff000,Size=100]---> (Start T2-receive timer)
(Start T3-send timer)
(Reset sequence number)
[Header Flags=WIN|ACK
 Seen=46,Send=0x7ffff100]------------> (cancel T2-receive timer)
                              /------- [Header Flags=ACK
                             /                 Mode=WNR
                            /                  Part=0,Of=0
                           /            Seen=0x7ffff100,Send=46]
(Cancel T3-send timer) <--/
[Header Flags=DAT|ACK|RES
        Mode=GAR
        Part=0,Of=1
 Seen=46,Send=2,Size=100]------------> (Start T2-receive timer)
(Restart T3-send timer)
..
                                       {App sends 1 message}
                                       (cancel T2-receive timer)
(Cancel T3-send timer) <-------------- [Header Flags=DAT|ACK
(Start T2-receive timer)                       Mode=GAR
                                               Part=0,Of=1
                                        Seen=102,Send=46,Size=100]
                                       (Start T3-send timer)

In the above example, after transmitting the first datagram, Endpoint
A determines that its data sequence number needs to be reset before it
transmits the next datagram. It first sends out a Window Up message to
force Endpoint Z to send back a Window Up Response to ack all the
outstanding received data.
Then, it transmits the datagram it has been withholding, with the new
sequence number and the RES flag set. Upon detecting the RES flag in
the header of the incoming datagram, Endpoint Z resets the data
sequence counter it maintains for Endpoint A.

5.5 Retransmission on Multiple Networks

Whenever a T3-send timer expires, the endpoint will take one of the
following three actions:

A) If the current window length is not reached (see 5.3) and there is
application data pending, a new datagram will be sent out.

B) If the current window length is reached, a Window Up message will
be sent out.

C) If the window length is not reached, but there is no pending
application data to send, the datagram with the lowest Send value that
is still outstanding (i.e., not yet acked) will be retransmitted.

When multiple networks exist between two communicating endpoints, the
re-transmission should be attempted on the network specified in the
MDTP protocol variable 'last.good.intf'. The value of 'last.good.intf'
is always updated to refer to the network on which the last datagram
from the peer endpoint arrived.

Moreover, the number of consecutive re-transmissions is also recorded
in a variable 'retran.count' for each network. Every time a datagram
is received from a network, the corresponding retran.count is reset to
'0'. If the value in the retran.count of the current network exceeds
half of the value of the protocol parameter 'Max.Retransmit', the
'last.good.intf' will be changed, so as to force the next
re-transmission to be directed to an alternate network.

The total number of consecutive re-transmissions across all the
networks is also recorded. If this value exceeds the limit defined by
'Max.Retransmit', the sending endpoint should consider the peer
endpoint unreachable and stop transmitting data to it, and optionally
report the failure.
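
The network selection rules above can be sketched in Python as
follows. The class and method names are hypothetical. Two points that
the text leaves open are filled in here as assumptions: the alternate
network is chosen round-robin, and the total count of consecutive
re-transmissions is cleared whenever any datagram arrives from the
peer.

```python
class RetransmitPathSelector:
    """Tracks last.good.intf and retran.count for one peer endpoint."""

    def __init__(self, networks, max_retransmit):
        self.networks = list(networks)
        self.max_retransmit = max_retransmit          # Max.Retransmit
        self.retran_count = {n: 0 for n in self.networks}
        self.total_retran = 0
        self.last_good_intf = self.networks[0]

    def on_datagram_received(self, network):
        # A datagram from the peer resets the counter for that network
        # and makes it the preferred network for re-transmission.
        self.retran_count[network] = 0
        self.total_retran = 0          # assumption: receipt ends the run
        self.last_good_intf = network

    def next_retransmit_network(self):
        """Return the network to retransmit on, or None if the peer
        should be considered unreachable."""
        network = self.last_good_intf
        self.retran_count[network] += 1
        self.total_retran += 1
        if self.total_retran > self.max_retransmit:
            return None
        if self.retran_count[network] > self.max_retransmit / 2:
            # Direct the NEXT re-transmission to an alternate network
            # (assumption: simple round-robin choice).
            idx = self.networks.index(network)
            self.last_good_intf = self.networks[(idx + 1) % len(self.networks)]
        return network
```

With two networks and Max.Retransmit set to 4, five consecutive
timeouts yield the first network three times, the second network once,
and then None, at which point the peer is declared unreachable.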

5.5.1 Randomization of the T3-send timer at retransmission

When a T3-send timer is started after retransmitting a packet, the
value of the next T3-send timer for this destination should be
extended by a random amount. The amount must be bounded so that the
application can predict with a reasonable degree of precision when the
destination endpoint will be declared unreachable.

For performance reasons, this can be implemented by pre-calculating a
set of random values and then using a different value to extend the
T3-send timer for each re-transmission to the same destination
endpoint.

5.6 Termination of an Endpoint

When an endpoint terminates, it should send a shutdown message to each
of the peer endpoints with which it has ever initiated communication.
The shutdown message is sent in unreliable transfer mode and need not
be acknowledged.

When an endpoint receives a shutdown message from its peer, it will
remove the sender from its records, and optionally report the
termination of that peer.

The following sequence shows an example of the termination of an
endpoint (Endpoint A).

Endpoint A

{App indicates termination}
[Header Flags=FIR
        Mode=SHU
 Seen=146,Send=1301]------------------------> to Endpoint X
[Header Flags=FIR
        Mode=SHU
 Seen=1496,Send=101]------------------------> to Endpoint Y
[Header Flags=FIR
        Mode=SHU
 Seen=1460,Send=201]------------------------> to Endpoint Z

As shown in this example, the shutdown message is indicated by having
both the FIR flag and the SHU mode bit set. Also, notice that no
acknowledgment is sent back by Endpoint X, Y, or Z.

5.7 Endpoint Drain

An endpoint may decide to "drain" a connection without completely
shutting it down. By draining a connection, both endpoints will remove
any record and pending datagrams associated with the connection.
Further communications between the two endpoints can be resumed by
going through a re-initialization procedure.
A "drain" message is specified with the UNR bit set in a shutdown
message. No Ack is required for a "drain" message.

The following sequence shows an example.

Endpoint A

{App indicates termination}
[Header Flags=FIR|UNR
        Mode=SHU
 Seen=146,Send=1301]------------------------> to Endpoint X

5.8 Advisory Acknowledgments

To increase bandwidth utilization, a sending endpoint may (at its
option) request an advisory acknowledgment. An endpoint would
typically do this when half of its window is unacknowledged, and on
the last datagram that will fill its window. Upon reception of an
advisory acknowledgment request, the receiver shall, with no delay,
transmit an acknowledgment of all received packets, canceling any
T2-receive timer that may be running. The sequence would look as
follows:

Endpoint A                                    Endpoint Z

{App sends 3 messages}
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=1,Send=1,Size=100]-------------> (Start T2-receive timer)
(Start T3-send timer)
[Header Flags=DAT|ACK
        Mode=GAR
        Part=0,Of=1
 Seen=1,Send=101,Size=100]----------->
(Restart T3-send timer)
[Header Flags=DAT|ACK
        Mode=GAR|RE1
        Part=0,Of=1
 Seen=1,Send=201,Size=100]-----------> (cancel T2-receive timer)
(Stop and restart T3-send timer)
         <---------------------------- [Header Flags=ACK
                                               Mode=0
                                               Part=0,Of=0
                                        Seen=301,Send=1]

5.9 RTT Measurement

On occasion, either end may wish to do a Round Trip Time (RTT)
measurement of a network. There are two methods of measuring Round
Trip Time. Method 1 involves a ping-pong using a special ACK; Method 2
involves a rider on top of a datagram. If Method 2 is invoked, then
the Round Trip Time includes the T2-receive timer (this may actually
be more useful than the pure RTT, since each endpoint may have a
different T2-receive timer value).

Method 1:

When an endpoint wishes to make an RTT measurement, it shall send an
ACK datagram with RE2 set to 1, GAR set to 1 and DAT set to 0.
The sender should place in Time Int 1 and Time Int 2 the value of the
current time of day in seconds/microseconds. Upon receipt of a
datagram with RE2 set to 1, GAR set to 1 and DAT set to 0, the
recipient should return the datagram to the sender over the arriving
network with the NOG bit set. The sender can then use Time Int 1 and
Time Int 2 to calculate the current RTT.

Endpoint A                                    Endpoint Z

RTT - Request Now=x.y
[Header Flags=ACK
        Mode=GAR|RE2
        Part=0,Of=1
 Seen=1,Send=301,Size=0
 Time-Int1=x
 Time-Int2=y]------------------------>
         <---------------------------- [Header Flags=ACK|NOG
                                               Mode=0
                                               Part=0,Of=0
                                        Seen=301,Send=1
                                        Time-Int1=x
                                        Time-Int2=y]

Endpoint A subtracts x.y (carried in the arriving datagram) from the
current time to calculate the RTT.

Method 2:

If an endpoint wishes to piggyback an RTT test that includes the T2
timer at the remote endpoint, the sending endpoint fills out the
datagram in the normal way for reliable communication, but also sets
the RE2 flag and places at the end of the datagram (outside the length
of the data) two long integers as a trailer.

When the receiving endpoint recognizes the RE2 flag, it should extract
the two integers and place them in internal storage until the next
datagram is scheduled to be returned (i.e., at the expiration of the
T2-Recv timer). If the T2-Recv timer expires, the receiving endpoint
should send the acknowledgment as above, with the addition of the NOB
flag as well. If the receiving endpoint's upper layer sends a datagram
causing the T2-Recv timer to be canceled, then the datagram should
include the trailing integers and have the NOB flag set.

In cases where an intervening Window Up is received, the receiving
endpoint should respond with a Window Up Response (per the Window Up
procedure) but NOT cancel its T2-Recv timer.
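
The timestamp handling common to both methods can be sketched in
Python as follows. The function names are hypothetical. The sketch
assumes that Time Int 1 carries seconds and Time Int 2 carries
microseconds, and that both are unsigned 32-bit integers in network
byte order occupying the last 8 octets of the datagram; the text above
does not fix the encoding explicitly.

```python
import struct


def pack_rtt_trailer(sec, usec):
    """Encode Time Int 1 (seconds) and Time Int 2 (microseconds)."""
    return struct.pack("!II", sec, usec)


def unpack_rtt_trailer(datagram):
    """Extract the two integers from the last 8 octets of a datagram."""
    return struct.unpack("!II", datagram[-8:])


def rtt_microseconds(datagram, now_sec, now_usec):
    """RTT = current time minus the time echoed back by the peer."""
    sent_sec, sent_usec = unpack_rtt_trailer(datagram)
    sent = sent_sec * 1_000_000 + sent_usec
    now = now_sec * 1_000_000 + now_usec
    return now - sent
```

For example, a request stamped at 10.900000 whose echo arrives at
11.100000 gives an RTT of 200000 microseconds. With Method 2 the same
arithmetic applies, but the result also includes the time spent
waiting on the peer's T2-Recv timer.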
Stewart & Xie [Page 31] Internet Draft Multi-network Datagram Transmission Protocol Apr 1999 Example 1 - T2-Recv timer expires Endpoint A Endpoint Z RTT - Request Now=x.y [Header Flags=ACK|DAT Mode=GAR|RE2 Part=0,Of=1 Seen=1,Send=301,Size=100 {data of 100 octets} Time-Int1=x Time-Int2=y]-------------> (started T2-Recv) {T2-Recv Expires } <---------------------------- [Header Flags=ACK|NOG|NOB Mode=0 Part=0,Of=0 Seen=301,Send=1 Time-Int1=x Time-Int2=y] Example 2 - Datagram causes T2-Recv timer cancel Endpoint A Endpoint Z RTT - Request Now=x.y [Header Flags=ACK|DAT Mode=GAR|RE2 Part=0,Of=1 Seen=1,Send=301,Size=100 {data of 100 octets} Time-Int1=x Time-Int2=y]-------------> (started T2-Recv) {datagram sent by application} (cancel T2-Recv) <---------------------------- [Header Flags=DAT|ACK|NOG Mode=GAR Part=0,Of=1,Size=100 Seen=401,Send=1 {data of 100 octets} Time-Int1=x Time-Int2=y] 5.10 Heart Beat Ack At request by the application, the user may wish a Heart Beat acknowledgment sent. The Heart Beat should only be allowed to be enabled if the senders Mod is Gar (reliable delivery) and version is 2. Once enabled when no datagrams are being transmitted, a T5-Heart Beat timer should be started. When the T5 timer expires a ACK should be sent using the next available link, following the link rotation procedure outlined in "4.5 Link Rotation". After sending the Ack another T5-Heart Beat timer should be started. If, before the expiration of T5-Heart Beat, a datagram is transmitted or received, Stewart & Xie [Page 32] Internet Draft Multi-network Datagram Transmission Protocol Apr 1999 the T5 timer should be stopped and the appropriate T2-T4 timer should be started. The T5 timer has the lowest precedence of all timers. When sending a Heart Beat Ack, the format should be that of a RTT time test. This will require the receiver to respond on the network. 
If the sender does not get a response on the network that carried the
heartbeat by the time the next heartbeat is to be sent, then the network
that the last heartbeat was sent upon should be counted as a transmission
failure as described in section "5.5 Retransmission on Multiple
Networks", and the failure should be counted against the 'retran.count'
and the protocol parameter 'Max.Retransmit'.

6. Unreliable Transfer Mode

The unreliable transfer mode allows two endpoints to send to each other
without acknowledging receipt. This can usually achieve higher data
throughput than the reliable transfer mode.

To indicate the unreliable transfer mode the sender of a datagram simply
sets the UNR bit in the mode field. The following sequence illustrates
unreliable data transfer.

Endpoint A                                   Endpoint Z

{App sends 2 messages}
[Header Flags=DAT|ACK
        Mode=UNR
        Part=0,Of=1
        Seen=1,Send=11001,Size=100]-------->
[Header Flags=DAT|ACK
        Mode=UNR
        Part=0,Of=1
        Seen=1,Send=11101,Size=100]-------->
                                             {App sends 1 message}
                 <------- [Header Flags=DAT|ACK
                                  Mode=UNR
                                  Part=0,Of=1
                                  Seen=11201,Send=1,Size=450]
{App sends 2 more messages}
[Header Flags=DAT|ACK
        Mode=UNR
        Part=0,Of=1
        Seen=451,Send=11201,Size=100]------>
[Header Flags=DAT|ACK
        Mode=UNR
        Part=0,Of=1
        Seen=451,Send=11301,Size=100]------>

Note that no timers are started by either end. Also note that even
though both ends are in UNR mode, the ACK flag is still set by the sender
of the datagram. This means that the Seen field in the datagram header is
still valid for indicating the sequence number of the last octet received
by the sender. However, the sender makes no claim as to whether pieces of
data are missing. The upper application can use this information to help
detect missing or duplicated pieces. In unreliable mode, MDTP makes no
effort to re-transmit missing data or to screen out duplicated datagrams.
6.1 Ordered reception In unreliable transfer if the sender sets the RE1 bit the receiver should order the datagrams upon arrival. Any datagrams that have not been read by the receivers application should be ordered so that the datagrams will be received in order the datagrams were transmitted (using the sendStartsAt field). If a datagram arrives after a new datagram then the datagram should be discarded. The sequence would look as follows: Endpoint A Endpoint Z {App sends 4 messages} [Header Flags=DAT|ACK Mode=UNR|RE1 Part=0,Of=1 Seen=1,Send=11001,Size=100]--------> [Header Flags=DAT|ACK Mode=UNR|RE1 Part=0,Of=1 Seen=1,Send=11101,Size=100]\ /--> \ / \ / (User reads/Receives all [Header Flags=DAT|ACK \ / datagrams 11001 & 11201) Mode=UNR|RE1 \ Part=0,Of=1 / \ Seen=451,Send=11201,Size=100]/ \---> { Datagram is discarded } [Header Flags=DAT|ACK Mode=UNR|RE1 Part=0,Of=1 Seen=1,Send=11301,Size=100]\ /--> \ / \ / [Header Flags=DAT|ACK \ / Mode=UNR|RE1 \ Part=0,Of=1 / \ Seen=451,Send=11401,Size=100]/ \--->(User reads/Receives all datagrams in order 11301 & 11401) Stewart & Xie [Page 34] Internet Draft Multi-network Datagram Transmission Protocol Apr 1999 7. Reliable flows A flow is a ordered reliable sequence of datagrams that is delivered to the receiver in order without constraint to other flows. There is a set way to initiate (open) a flow and close a flow. Each flow is initiated by the sender. Multiple flows may be initiated between two endpoints at the same time. Once initiated a flow will follow the same retransmission and link rotation schema's has the rest of MDTP. However each flow is independent of any other flow, so if datagram 1 and 2 of flow 5 arrives, but datagram 1 of flow 4 is lost (having been sent ahead of flow 5's datagrams), flow 5's datagrams are delivered to the application without blocking for retransmission of the lost datagram from flow 4 (datagram 1 of flow 4). All flow related datagrams will have the NOB bit set. 
Each flow will also have a separate timer associated with it that is
unique and different from any non-flow related timers that are running.
The Seen and Send fields will be broken down and interpreted in the
following manner.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Flow Number          |    Datagram number in flow    | (Seen)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Flow Number          |    Datagram number in flow    | (Send)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The Send field will contain the flow number of this datagram. Flow 0 is
always reserved and is NOT used. The datagram number is the sequential
number of the datagram. The Seen field is used to acknowledge receipt of
the indicated datagram for the specified flow. The flow number in the
acknowledgment does NOT need to be the same as the flow number in the
Send field. This format is only used for flow datagrams.

A flow can have bundled data (see section 9) but cannot have fragmented
messages. Fragmented messages are not supported for two reasons: first,
to keep the flows simple; second, flows are thought of as call control
related, which limits their size to no more than one datagram per
message. If a flow packet number reaches 0xffff, then the next packet
number should wrap to 1.

Before a flow can be used it must be initiated; after the flow is
complete it should be closed. Note it is assumed that before any flows
can be opened the MDTP initiate sequence has taken place (see section 4).
When an MDTP initiate sequence occurs, any endpoint being re-initialized
will cause a closing of all outstanding flows during that
re-initialization.

Before opening a flow the opening end should verify that the version
number of the receiving MDTP endpoint is at least 3. If the version
number is less than 3 then the MDTP endpoint must NOT attempt to open a
flow.
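The flow interpretation of the Seen and Send fields can be sketched as
follows. This is a minimal illustration, not part of the protocol
definition: the function names are hypothetical, and it assumes the flow
number occupies the high-order 16 bits of the 32-bit field and the
datagram number the low-order 16 bits, which matches values such as
Send=0x0005 0001 for datagram 1 of flow 5 in the examples of section 7.2.

```python
MAX_16 = 0xFFFF


def pack_flow_field(flow, datagram):
    """Combine a flow number and a datagram number into one 32-bit value."""
    if not 0 <= flow <= MAX_16:
        raise ValueError("flow number must fit in 16 bits")
    if not 0 <= datagram <= MAX_16:
        raise ValueError("datagram number must fit in 16 bits")
    return (flow << 16) | datagram


def unpack_flow_field(value):
    """Split a 32-bit Seen or Send value into (flow, datagram)."""
    return (value >> 16) & MAX_16, value & MAX_16


def next_datagram_number(datagram):
    """Advance the per-flow datagram number, wrapping 0xffff to 1.

    Datagram number 0 is skipped because it is never used for data
    transfer.
    """
    return 1 if datagram >= MAX_16 else datagram + 1


send_field = pack_flow_field(5, 1)          # 0x00050001
flow, dg = unpack_flow_field(0x00090004)    # flow 9, datagram 4
```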
Stewart & Xie [Page 35] Internet Draft Multi-network Datagram Transmission Protocol Apr 1999 7.1 Initiating a flow. A flow is initiated by sending a Flow Initiate/Close Message. In all flow datagram the NOB bit is set. For the Flow Initiate Message the UNR mode bit set as well. The Acknowledgment number (Seen) and the Sequence Number (Send) is set to 0 unless this is the first message in which case the TAG unlock value is set in the Send (see section 4.1). Until a flow is open successfully a receiver of a non-opened flow datagram will silently discard the datagram. Upon sending a flow initiation a T3-Send timer will be started on flow 0. The timer will follow the same rules for retransmission and timing as outlined in section 5. The following illustration demonstrates the opening of flow 5: Endpoint A Endpoint Z {App Initiates flow 5} [Header Flags=NOB Mode=UNR Part=0,Of=1 Seen=00000000,Send=0x0000 0000,Size=0, flow=0x0005 dg=0000 ]------> (Start T3-send timer f=5) (Cancel T3-send timer f=5) <----------------- [Header Flags=NOB|ACK Mode=UNR Part=0,Of=1 Seen=0x00000005,Send=0x00000000, Size=0, flow=0000 dg=0000] In the above example note that for flow 0, unlike all others, no T2-Recv timer is ever started. Each flow open/close must be independently acknowledged. Note also that in the reply acknowledgment the ACK bit is set. 
If unlikely event that Endpoint-Z wished to piggy back the open of flow 5 with a flow open of its own the sequence would look as follows: Stewart & Xie [Page 36] Internet Draft Multi-network Datagram Transmission Protocol Apr 1999 Endpoint A Endpoint Z {App Initiates flow 5} [Header Flags=NOB Mode=UNR Part=0,Of=1 Seen=0,Send=0, Size=0, flow=5, dg=0 ]------> (Start T3-send timer f-5) {App Initiates flow 8} (Cancel T3-send timer f-5) <----------------- [Header Flags=NOB|ACK Mode=UNR Part=0,Of=1 Seen=5, Send=0, Size=0,flow=0008 dg=0000] (Start T3-send timer - f8) [Header Flags=NOB|ACK Mode=0 Part=0,Of=1 Seen=8,Send=0,Size=0, flow=0, dg=0]-------------------------------->(Cancel T3-send timer - f8) Note that at the initiate of a flow, the timer started is considered the first timer for the flow, but it is sent over flow 0. Note also that a piggyback open is not allowed if the TAG sequences have not been exchanged. 7.2 Flow acknowledgments Normal dataflow's follow the normal MDTP transmission formats (see section 5) Acknowledgments when possible are piggy-backed on datagrams. Each flow maintains its own send timer. When no piggyback of data and acknowledgments is possible, more than one flow can be be acknowledged at the same time by using the Flow Extend Acknowledgment format. The Send field (now considered the number of extended acknowledgments) will contain the number of acknowledgments in the array. During data transfer if the when the datagram number reaches 0xffff the next packet should be labeled 1. Pkt 0 is never used for datagram transfer. One T2-Recv timer is maintained for all flows. If more than one flow is being timed and a datagram is to be transmitted then one of the flows will be acknowledged and the T2-Recv timer will be left running until expiration, which will then cause the Flow Extended Acknowledgment to be sent, acknowledging all remaining flows. The following examples illustrate examples of flow acknowledgments. 
For these examples we assume that Endpoint A has 3 flows open: 5, 7 and
9. Endpoint Z has 4 flows open: 0x11, 8, 4 and 1.

Example 1: Endpoint A sends to Endpoint Z and the T2-Recv timer expires

Endpoint A                                   Endpoint Z

{ App sends first datagram on flow 5}
[Header Flags=NOB|DAT
        Mode=REL
        Part=0,Of=1
        Seen=0x0000 0000,Send=0x0005 0001,Size=20]------>(Start T2-Recv)
(Start T3-send timer-f5)
                                             { T2-Recv Timer Expires }
(Cancel T3-send timer) <--------------- [Header Flags=NOB|ACK
                                                Mode=REL
                                                Part=0,Of=1
                                                Seen=0x00050001,
                                                Send=0x00000000,
                                                Size=0]
                                             (Start T3-send timer)

Example 2: Endpoint A sends multiple messages to Endpoint Z and the
T2-Recv timer expires

Endpoint A                                   Endpoint Z

{ App sends 1 datagram on flow 5}
[Header Flags=NOB|DAT
        Mode=REL
        Part=0,Of=1
        Seen=0x0000 0000,Send=0x0005 0002,Size=20]------>(Start T2-Recv)
(Start T3-send timer-f5)
{ App sends 1 datagram on flow 9}
[Header Flags=NOB|DAT
        Mode=REL
        Part=0,Of=1
        Seen=0x0000 0000,Send=0x0009 0004,Size=20]------>
(Start T3-send timer-f9)
{ App sends 1 datagram on flow 5}
[Header Flags=NOB|DAT
        Mode=REL
        Part=0,Of=1
        Seen=0x0000 0000,Send=0x0005 0003,Size=20]------>
{ App sends 1 datagram on flow 7}
[Header Flags=NOB|DAT
        Mode=REL
        Part=0,Of=1
        Seen=0x0000 0000,Send=0x0007 0011,Size=20]------>
                                             { T2-Recv Timer Expires }
(Cancel T3-send timer-f5) <-------------- [Header Flags=NOB|ACK
(Cancel T3-send timer-f9)                         Mode=REL
(Cancel T3-send timer-f7)                         Part=0,Of=1
                                                  Seen=0x00050003,
                                                  Send=0x00000002,
                                                  Size=0,
                                                  ex[0]=0x00090004,
                                                  ex[1]=0x00070011 ]

Example 3: Endpoint A sends a message to Endpoint Z, and Endpoint Z
piggy-backs an ack.

Endpoint A                                   Endpoint Z

{ App sends 1 datagram on flow 5}
[Header Flags=NOB|DAT
        Mode=REL
        Part=0,Of=1
        Seen=0x0000 0000,Send=0x0005 0004,Size=20]------>(Start T2-Recv)
(Start T3-send timer-f5)
                                             { App sends 1 message flow 0x11}
                                             ( cancel T2-Recv Timer )
(Cancel T3-send timer-f5) <----------------- [Header Flags=NOB|DAT|ACK
(Start T2-Recv timer)                                Mode=REL
                                                     Part=0,Of=1
                                                     Seen=0x0005 0004,
                                                     Send=0x0011 0008,
                                                     Size=10]
                                             (Start T3-send timer-f0x11)
{ T2-Recv Timer Expires }
[Header Flags=NOB|ACK
        Mode=REL
        Part=0,Of=1
        Seen=0x0000 0000,Send=0x0011 0008,Size=0]------>(Cancel T3-send-f0x11)

Example 4: Endpoint A sends multiple messages to Endpoint Z; Endpoint Z
piggy-backs an ack and sends an Extended flow Ack.

Endpoint A                                   Endpoint Z

{ App sends 1 datagram on flow 5}
[Header Flags=NOB|DAT
        Mode=REL
        Part=0,Of=1
        Seen=0x0000 0000,Send=0x0005 0005,Size=20]------>(Start T2-Recv)
(Start T3-send timer-f5)
{ App sends 1 datagram on flow 9}
[Header Flags=NOB|DAT
        Mode=REL
        Part=0,Of=1
        Seen=0x0000 0000,Send=0x0009 0004,Size=20]------>
(Start T3-send timer-f9)
                                             { App sends 1 message flow 0x4}
(Cancel T3-send timer-f5) <-------------- [Header Flags=NOB|DAT|ACK
(Start T2-Recv timer)                             Mode=REL
                                                  Part=0,Of=1
                                                  Seen=0x00050005,
                                                  Send=0x00040004,
                                                  Size=10]
                                             (Start T3-send timer-f0x4)
                                             { T2-Recv Timer Expires }
                                             (Start T3-send timer)
(Cancel T3-send timer) <-------------- [Header Flags=NOB|ACK
                                               Mode=REL
                                               Part=0,Of=1
                                               Seen=0x00090004,
                                               Send=0x00000000,
                                               Size=0]
{ T2-Recv Timer Expires }
[Header Flags=NOB|ACK
        Mode=REL
        Part=0,Of=1
        Seen=0x0000 0000,Send=0x0004 0004,Size=0]------>(Cancel T3-send-f0x4)

Retransmissions and resends are handled per section 5 but using the flow
formats (i.e. the NOB bit set) as described above.
The rules for retransmission, windowing, flow control and declaration of endpoint death are applied has defined in section 5. Note that messages to the different flows are handed up ordered correctly within the flow but not delayed with respect to any other flows transmission or retransmission. 7.3 Flow session closing The application may signal a closing of a flow. If this occurs the implementation will inform its peer of the closing so that resources used to track and maintain the flow can be reused/freed. The following sequence is used to release a flow in this example we see the closing of flow 5. Note it is up to the sender to assure that all outstanding datagrams are acknowledged before closing a flow: Stewart & Xie [Page 41] Internet Draft Multi-network Datagram Transmission Protocol Apr 1999 Endpoint A Endpoint Z {App Initiates flow 5} [Header Flags=NOB|RES Mode=UNR Part=0,Of=1 Seen=0,Send=0,Size=0, flow=5, dg=0 ]------> (Start T3-send timer f-5) (Cancel T3-send timer f-5) <----------------- [Header Flags=NOB|ACK|RES Mode=UNR Part=0,Of=1 Seen=5,Send=0, Size=0, flow=0, dg=0] Datagrams received by a endpoint directed to a closed flow should be silently discarded. 8. Mixed Mode Data Transmission An endpoint can switch between reliable and unreliable transfer modes at any time during the data transfer. The following sequence illustrates such a transfer mode change, in which both endpoints starts with the unreliable transfer mode, and then Endpoint A switches to reliable transfer mode. Endpoint A Endpoint Z {App send 1 message} <------------------ [Header Flags=DAT|ACK Mode=UNR Part=0,Of=1 Seen=11201,Send=1,Size=450] .. {App send 1 message} [Header Flags=DAT|ACK Mode=UNR Part=0,Of=1 Seen=451,Send=11201,Size=100]------> Stewart & Xie [Page 42] Internet Draft Multi-network Datagram Transmission Protocol Apr 1999 .. 
{App send 1 message} [Header Flags=DAT|ACK Mode=GAR Part=0,Of=1 Seen=451,Send=11301,Size=100]------> (Start T2-receive timer) (Start T3-send timer) {App sends 1 message} (Cancel T2-receive timer) /------- [Header Flags=DAT|ACK / Mode=UNR / Part=0,Of=1 / Seen=11401,Send=1,Size=450] (Cancel T3-send timer) <-------/ .. {App sends 1 message} [Header Flags=DAT|ACK Mode=GAR Part=0,Of=1 Seen=451,Send=11401,Size=100]------> (Start T2-receive timer) (Start T3-send timer) .. {Timer T2 Expires} (Cancel T3-send timer) <------------------- [Header Flags=ACK Mode=0 Part=0,Of=0 Seen=11501,Send=146] Note that in the second datagram sent by Endpoint A the mode is switched to reliable transfer mode (with GAR bit set). This causes Endpoint A to start its T3-send timer. When Endpoint Z receives the datagram and realizes the mode change, it starts its T2-receive timer. At this point, Endpoint Z also must update its Seen value to 11301. This will allow Endpoint Z to align its Seen counter to the Seen value of this first reliable datagram from Endpoint A. This prevents Endpoint Z from requesting retransmission of data that Endpoint A may not have. 9. Bundled Messages In order to increase network utilization, MDTP allows an endpoint to bundle small application messages into one single datagram for transmission. This bundled mode can be applied to both reliable and unreliable datagrams. An endpoint indicates to its peer that it is currently in bundled Stewart & Xie [Page 43] Internet Draft Multi-network Datagram Transmission Protocol Apr 1999 mode by setting the BUN bit in the mode field. 9.1 Format of Bundled Datagram The ISB bit in the flag field is set to indicate the current datagram is bundled, i.e., it contains multiple messages. 
The format of a bundled datagram is defined as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  MDTP Protocol Identifier 1                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  MDTP Protocol Identifier 2                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Acknowledgment Number (Seen)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Sequence Number (Send)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Data Size           |     Part      |      Of       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Flags     |     Mode      |    Version    | Num On Queue  |
   |N N W I F R D A|B S W R R B G U|               |               |
   |O O I S I E A C|R H N E E U A N|               |               |
   |G B N B R S T K|O U R 1 2 N R R|               |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Number Of Messages       |   Size of first message B1    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                       B1 octets of data                       |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Size of second message B2   |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                       B2 octets of data                       |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   \                                                               \
   /                                                               /
   \                                                               \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Size of last message BL    |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                       BL octets of data                       |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Data Size in a bundled datagram indicates the actual size of the data
field of the datagram, including both the bundling overhead and the
actual application data. Since no fragmentation is allowed in a bundled
datagram, the Part field will always be '0' and the Of field will always
be '1'.
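The data field of a bundled datagram can be sketched as follows. This is
a minimal illustration that covers only the data field, not the MDTP
header: a 16-bit message count followed, for each message, by a 16-bit
size and the message octets, with all integers in network byte order. The
function names bundle and unbundle are hypothetical.

```python
import struct


def bundle(messages):
    """Build the data field of a bundled datagram from a list of bytes."""
    if len(messages) > 0xFFFF:
        raise ValueError("too many messages for a 16-bit count")
    parts = [struct.pack("!H", len(messages))]       # Number Of Messages
    for msg in messages:
        if len(msg) > 0xFFFF:
            raise ValueError("message too large for a 16-bit size")
        parts.append(struct.pack("!H", len(msg)))    # Size of message
        parts.append(msg)                            # message octets
    return b"".join(parts)


def unbundle(data):
    """Split the data field of a bundled datagram back into messages."""
    (count,) = struct.unpack_from("!H", data, 0)
    offset = 2
    messages = []
    for _ in range(count):
        (size,) = struct.unpack_from("!H", data, offset)
        offset += 2
        if offset + size > len(data):
            raise ValueError("bundled datagram is truncated")
        messages.append(data[offset:offset + size])
        offset += size
    return messages


# Three 100-octet messages occupy 2 + 3 * (2 + 100) octets.
field = bundle([bytes(100)] * 3)
data_size = len(field)
```

The value of data_size here is what a sender would place in the Data Size
header field for such a datagram.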
The first two octets of the data field are a 16-bit integer indicating
the number of messages bundled in the datagram. This is followed
immediately by a list of bundled messages. Each bundled message starts
with an integer of two octets indicating the size of the data in the
message, followed by the data itself.

All integers in the datagram should be transmitted in network byte
order.

9.2 Bundled Transfer

Two protocol parameters, namely Min.Bundle and Max.Bundle, are used to
control the assembly of bundled datagrams.

If the current size of a bundled datagram is smaller than Min.Bundle, the
endpoint will withhold the datagram from transmission and start the
T4-bundle timer.

If new out-bound data becomes available for transmission, the endpoint
will attempt to bundle the new data with the current withheld datagram by
using the following rules:

A) If the size of the new data is greater than or equal to Min.Bundle,
   the current withheld datagram will be transmitted and the T4-bundle
   timer will be canceled. Then, the new data will be transmitted in a
   separate datagram.

B) If the size of the new data is less than Min.Bundle, but the combined
   size of the current datagram and the new data is greater than or
   equal to Max.Bundle, the current datagram will be sent and the new
   data will be withheld as the new current datagram.

C) If the size of the new data is less than Min.Bundle, and the combined
   size of the current datagram and the new data is at least Min.Bundle
   but less than Max.Bundle, the new data will be bundled into the
   current datagram and the bundled datagram will be immediately
   transmitted.

D) If the size of the new data is less than Min.Bundle, and the combined
   size of the current datagram and the new data is less than
   Min.Bundle, the new data will be bundled into the current datagram
   and the T4-bundle timer will be restarted.
E) If the T4-bundle timer expires, the current datagram will be sent
   immediately.

F) If the size of the new data is greater than Max.Bundle, the current
   datagram will be sent. Then, the new data will be fragmented for
   transmission (see section 10).

The following is an example of bundled data transfer, assuming
Max.Bundle=4096 and Min.Bundle=1700:

Endpoint A                                   Endpoint Z

{App sends 1 message of 100 octets}
(withhold and Start T4-Bundle timer)
..
{App sends 1 message of 100 octets}
(bundling into current datagram)
..
{App sends 1 message of 100 octets}
(bundling into current datagram)
..
{T4-bundle timer expires}
[Header Flags=DAT|ACK
        Mode=GAR|BUN
        Part=0,Of=1
        Seen=146,Send=1001,Size=308]--------> (Start T2-receive timer)
(T3-send timer starts)
..
                                             {Timer T2 Expires}
(cancel T3-send) <---------------- [Header Flags=ACK
                                           Mode=0
                                           Part=0,Of=0
                                           Seen=1309,Send=146]

Notice that the Data Size in the datagram sent by Endpoint A is not 300
but 308. This is because the size reflects the size of the data field of
the datagram including the bundling overhead (2 octets for the message
count plus a 2-octet size for each of the three messages).

When the bundled datagram arrives at the receiving endpoint, each message
is unbundled and delivered separately to the upper level application.

10. Fragmented Messages

When the size of an out-bound message exceeds the value defined in the
protocol parameter Max.Bundle, the endpoint will fragment the message
into smaller pieces of sizes equal to or smaller than Max.Bundle and send
each piece out in a separate datagram. The Part and Of fields are used to
disassemble and reassemble the fragmented message.
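The use of the Part and Of fields can be sketched as follows. This is a
minimal illustration under stated assumptions, not a definitive
implementation: the function names and the dictionary representation are
hypothetical, the piece size is passed in by the caller, and the Send
value is assumed to advance by the Size of each datagram, as the sequence
numbers in this document's examples suggest.

```python
def fragment(message, first_send, max_piece):
    """Split a message into pieces and label each with Part, Of and Send."""
    if max_piece <= 0:
        raise ValueError("max_piece must be positive")
    pieces = [message[i:i + max_piece]
              for i in range(0, len(message), max_piece)] or [b""]
    fragments = []
    send = first_send
    for part, data in enumerate(pieces):
        fragments.append({"part": part, "of": len(pieces),
                          "send": send, "size": len(data), "data": data})
        send += len(data)          # assumed: Send counts octets
    return fragments


def reassemble(fragments):
    """Return the original message, or None while parts are missing."""
    if not fragments:
        return None
    total = fragments[0]["of"]
    by_part = {f["part"]: f["data"] for f in fragments}
    if any(part not in by_part for part in range(total)):
        return None                # a gap: reliable mode would wait
    return b"".join(by_part[part] for part in range(total))


# An 8544-octet message cut into pieces of at most 4072 octets, the
# piece size used in this section's example for Max.Bundle=4096.
frags = fragment(bytes(8544), first_send=1001, max_piece=4072)
labels = [(f["part"], f["of"], f["send"], f["size"]) for f in frags]
```

In reliable mode the receiver would call something like reassemble only
once all parts are present; in unreliable mode each fragment is handed to
the application as it arrives.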
The following example shows the transmission of a fragmented message
(assuming Max.Bundle=4096, Min.Bundle=1700):

Endpoint A                                   Endpoint Z

{App sends 1 message 8544 octets long}
[Header Flags=DAT|ACK
        Mode=GAR|BUN
        Part=0,Of=3
        Seen=146,Send=1001,Size=4072]-------> (Start T2-receive timer)
[Header Flags=DAT|ACK
        Mode=GAR|BUN
        Part=1,Of=3
        Seen=146,Send=5073,Size=4072]------->
[Header Flags=DAT|ACK
        Mode=GAR|BUN
        Part=2,Of=3
        Seen=146,Send=9145,Size=400]-------->
(Start T3-send timer)
..
                                             {Timer T2 Expires}
                          /----------- [Header Flags=ACK
                         /                     Mode=0
                        /                      Part=0,Of=0
(cancel timer T3) <-----------/                Seen=9545,Send=146]

Notice that Endpoint A is using the reliable transfer mode to send the
fragmented message. In this mode, Endpoint Z will hold the fragments and
request retransmission if a fragment is found missing, i.e., a gap is
found in the received data (see section 5). When all the parts of the
fragmented message are received, the endpoint will re-assemble the
message and dispatch it to the upper level application.

MDTP also allows a fragmented message to be sent using unreliable
transfer mode. However, in unreliable mode, each fragment datagram will
be dispatched to the application upon its arrival, and no retransmission
will be requested even if a fragment is found missing.

Bundling is prohibited if the current datagram contains a fragment of a
fragmented message.

11. Non-protocol Datagrams

The MDTP protocol allows an endpoint to send and receive non-protocol
datagrams such as traditional UDP datagrams. Non-protocol datagrams are
detected by the absence of the MDTP protocol identifiers at the beginning
of the datagram.

A non-protocol transmission received by an MDTP endpoint is termed a
"raw" datagram. When a raw datagram arrives, the receiving endpoint will
set itself into raw mode and start sending back to its peer in raw mode
as well.
Once an endpoint is in raw mode with a peer, only a change of operational
mode by the application or the reception of an MDTP datagram will bring
the endpoint out of raw mode. In the latter case, the endpoint will use
the default MDTP operational mode predefined by the application for MDTP
transmissions.

When an endpoint changes from raw mode into MDTP mode, the normal MDTP
initiation messages must be exchanged between the two endpoints, as
described in section 4.

12. Broadcast and Multicast

Broadcast and multicast are supported by MDTP when the underlying
transport layer supports them. Both types of transmissions are carried
out in unreliable transfer mode.

For broadcast datagrams, the BRO bit will be set to '1' and the UNR bit
will be set to '0' in the mode field. For multicast datagrams, both the
BRO bit and the UNR bit will be set to '1'.

For multicast datagrams, the value in the Send field will indicate the
number of multicast datagrams transmitted by the sender. This information
makes it possible for the receiver of the multicast to detect duplicated
multicast datagrams and also to detect lost multicast datagrams. A
multicast datagram transmission MUST use the alternate multicast header,
filling in the multicast transmit-to address and placing the sender's
lowest network address in the multicast from address.

Bundling and fragmentation are not allowed in either multicast or
broadcast datagrams.

12.1 Multicast/Broadcast initialization.

No initiation is needed for an endpoint to transmit multicast or
broadcast datagrams.

However, caution should be taken when transmitting non-protocol datagrams
(i.e., datagrams with no MDTP protocol header) in multicast or broadcast
transmission. This is because the non-protocol datagrams may
inadvertently force all the receiving endpoints of the multicast or
broadcast transmission into raw mode (see section 11).

12.2 Transmission of Broadcast Datagrams.
When sending a broadcast datagram, the endpoint will make no effort to
prevent duplicate transmissions (these are likely to occur, especially
when multiple networks exist). The application at the receiving end must
be prepared to handle duplicate broadcast messages.

The following is an example of broadcast datagram transmission:

Endpoint A                                   Endpoint Z

{application sends 2 messages }
[Header Flags=DAT
        Mode=BRO
        Part=0,Of=1
        Seen=0,Send=0,Size=200]--------------> (Datagram may appear
                                                more than once.)
[Header Flags=DAT
        Mode=BRO
        Part=0,Of=1
        Seen=0,Send=0,Size=100]-------------->

Notice that no timers are used on either end, and the Seen and Send
values in the datagrams are always '0'.

12.3 Transmission of Multicast Datagrams.

Unlike broadcast transmission, when multicast datagrams are transmitted
the receiving endpoints should make an effort to prevent duplicate copies
of datagrams from being distributed to their applications. This is
possible because the transmission of multicast datagrams is usually
addressed to a special multicast network address. The receiving endpoints
can thus use this multicast address in combination with the sender's
address to detect duplicate transmissions of a multicast datagram.

The following example illustrates multicast transmissions between two
endpoints.

Endpoint A                                   Endpoint Z

{app multicasts a message}
[Header Flags=DAT
        Mode=BRO|UNR
        Part=0,Of=1
        Seen=0,Send=5,Size=250]--------------> (may receive more than
                                                one copy)
..
{app multicasts a message}
[Header Flags=DAT
        Mode=BRO|UNR
        Part=0,Of=1
        Seen=0,Send=6,Size=500]--------------> (may receive more than
                                                one copy)

Notice the values of the Send field in the multicast datagrams (which are
5 and 6, respectively). They represent the sequence numbers of the
multicast datagrams Endpoint A has sent out.
Endpoint Z should use the Send value found in the incoming multicast
datagrams to detect any missing or duplicate datagrams. Duplicate
datagrams will be discarded and no effort will be made to retransmit lost
multicast datagrams.

For example, each endpoint can track the last 32 datagrams received by
using a sliding window of 32 bits. Each time a new datagram with a
sequence number higher than the current window head is received, the
window can be moved up. If a datagram received has a sequence number
below the current window head, then a check of the last 32 received
datagrams' sequence numbers can determine whether the new datagram is a
duplicate. If the sequence number of the new datagram is below the
current window tail then the datagram should be considered a duplicate
and discarded.

12.4 Reset of the Multicast Datagram Sequence Number

If the Seen field in a multicast datagram is set to '1', it is an
indication that the sender has reset its multicast datagram sequence
number.

The receiving endpoint, upon detecting this reset indicator in the
incoming multicast datagram, should start a procedure to adopt the new
sequence number for error detection. However, caution should be taken to
prevent false resets due to duplicated datagrams with the reset indicator
propagating through multiple networks.

To guarantee that all receivers of the multicast group adopt the new
sequence number, the reset indicator should be repeated within the first
N multicast datagrams sent out after the reset. N is predefined by the
protocol parameter Num.Of.Mcast.Reset.Msg.

At the receiving endpoint, when the reset indicator is detected the new
sequence number will be adopted. However, if two reset events are
detected within a predefined time interval (Min.Mcast.Time.To.Reset), the
second reset indicator will be ignored.
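The 32-bit sliding window of section 12.3 and the reset rule above can be
sketched together as follows. This is only one possible reading, not a
definitive implementation: the class and method names are hypothetical,
the injectable clock exists only to make the sketch testable, and a
datagram whose reset indicator is ignored is assumed to be checked
against the window like any other datagram.

```python
import time

WINDOW = 32
MASK = (1 << WINDOW) - 1


class McastReceiver:
    """Duplicate filter for multicast datagrams from one sender."""

    def __init__(self, min_time_to_reset, clock=time.monotonic):
        self.min_time_to_reset = min_time_to_reset  # Min.Mcast.Time.To.Reset
        self.clock = clock
        self.head = None        # highest sequence number accepted so far
        self.bitmap = 0         # bit i set: sequence (head - i) was seen
        self.last_reset = None  # time the last reset was adopted

    def accept(self, send, seen=0):
        """Return True to deliver the datagram, False to discard it."""
        if seen == 1:
            now = self.clock()
            if (self.last_reset is None
                    or now - self.last_reset >= self.min_time_to_reset):
                # Adopt the new sequence number and restart the window.
                self.last_reset = now
                self.head = send
                self.bitmap = 1
                return True
            # A second reset inside the interval is ignored; fall through.
        if self.head is None or send > self.head:
            shift = WINDOW if self.head is None else send - self.head
            if shift >= WINDOW:
                self.bitmap = 1
            else:
                self.bitmap = ((self.bitmap << shift) | 1) & MASK
            self.head = send
            return True
        offset = self.head - send
        if offset >= WINDOW:
            return False        # below the window tail: treat as duplicate
        bit = 1 << offset
        if self.bitmap & bit:
            return False        # already received
        self.bitmap |= bit
        return True
```

A receiver would keep one such object per multicast address and sender
address pair, since that pair is what identifies a multicast stream.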
The following is an example (assuming Num.Of.Mcast.Reset.Msg = 4):

Endpoint A                                   Endpoint Z

[Header Flags=DAT
        Mode=BRO|UNR
        Part=0,Of=1
        Seen=0,Send=17859,Size=300]---------->
{reset message sequence number indicated}
[Header Flags=DAT
        Mode=BRO|UNR
        Part=0,Of=1
        Seen=1,Send=1,Size=250]--------------> (record new sequence
                                                number, datagram may
                                                appear more than once)
[Header Flags=DAT
        Mode=BRO|UNR
        Part=0,Of=1
        Seen=1,Send=2,Size=250]--------------> (may appear more than
                                                once)
[Header Flags=DAT
        Mode=BRO|UNR
        Part=0,Of=1
        Seen=1,Send=3,Size=500]--------------> (may appear more than
                                                once)
[Header Flags=DAT
        Mode=BRO|UNR
        Part=0,Of=1
        Seen=1,Send=4,Size=500]--------------> (may appear more than
                                                once)
[Header Flags=DAT
        Mode=BRO|UNR
        Part=0,Of=1
        Seen=0,Send=5,Size=100]--------------> (may appear more than
                                                once)

In the above example Endpoint Z would detect the reset indicator in the
second multicast datagram and adopt the new sequence number, which is 1.
Then, it would ignore the reset indicator in the subsequent three (3)
datagrams since they arrived within a very short time interval.

13. Interface with upper level protocols

The upper level protocols (ULP) shall request services by passing
primitives to MDTP and shall receive notifications from MDTP for various
events.

The primitives and notifications described in this section should be used
as a guideline for implementing MDTP.

13.1 Init.MDTP primitive

This primitive allows MDTP to initialize its internal data structures and
allocate necessary resources for setting up its operation environment.
Note that once MDTP is initialized, ULP can communicate directly with any
other endpoints without re-invoking this primitive.

Mandatory attributes: None.
Optional attributes: The following types of attributes may be passed along with the primitive:

   o Timer selection and its operation syntax -- to indicate to MDTP an alternative timer that MDTP should use for its operation;
   o Initial MDTP operation mode;
   o IP port number, if ULP wants it to be specified.

13.2 Send.Data primitive

This is the main method to send datagrams via MDTP.

Mandatory attributes:

   o data - This is the payload ULP wants to transmit;
   o size - The size of the payload in number of octets;
   o to-address - The IP address and port number of the intended receiver. In case of redundant networks, to-address can be any one of the multiple IP addresses of the receiver. The network through which the datagram will actually be sent is determined by MDTP according to link rotation, unless the current mode prohibits MDTP link rotation; in that case the datagram will be sent through the network specified by to-address (see section 4.5).

Optional attributes:

   o mode-flags - This indicates a new MDTP operation mode, which takes effect immediately, including for the current datagram send;
   o context - Optional information that will be carried in the Send.Failure notification to the ULP if the transport of this datagram fails.

13.3 Receive.Data primitive

This primitive shall return the first datagram in the MDTP in-queue to ULP, if there is one available. It may, depending on the specific implementation, also return other information such as the sender's address, whether there are more datagrams available for retrieval, etc. The behavior is undefined if no datagram is available when this primitive is invoked.

Mandatory attributes:

   o buffer - The memory location indicated by the ULP to store the received datagram and other information.

Optional attributes: None.
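As an illustration of how the three primitives above might map onto a programming interface, the following is a minimal in-memory sketch. It performs no network I/O, and every name in it (MdtpStub, SendRequest, init_mdtp, send_data, receive_data, and the queue attributes) is hypothetical rather than defined by this document. Raising an error from receive_data when the in-queue is empty is a choice made by the sketch only; the draft leaves that behavior undefined.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Optional, Tuple

Address = Tuple[str, int]  # (IP address, port number)


@dataclass
class SendRequest:
    """One queued Send.Data request with its attributes."""
    data: bytes
    size: int
    to_address: Address
    mode_flags: int
    context: Any = None


class MdtpStub:
    """Hypothetical stand-in for an MDTP instance; it only queues data."""

    def __init__(self):
        self.initialized = False
        self.timer = None
        self.mode = 0
        self.port = None
        self.out_queue = deque()  # SendRequest objects awaiting transmission
        self.in_queue = deque()   # (sender address, payload) pairs

    def init_mdtp(self, timer=None, mode=0, port=None):
        # Init.MDTP: all three attributes are optional.
        self.timer, self.mode, self.port = timer, mode, port
        self.initialized = True

    def send_data(self, data: bytes, size: int, to_address: Address,
                  mode_flags: Optional[int] = None, context: Any = None):
        # Send.Data: data, size and to-address are mandatory.
        if not self.initialized:
            raise RuntimeError("Init.MDTP has not been invoked")
        if size != len(data):
            raise ValueError("size must equal the payload length in octets")
        if mode_flags is not None:
            self.mode = mode_flags  # applies to this send and later ones
        self.out_queue.append(
            SendRequest(data, size, to_address, self.mode, context))

    def receive_data(self, buffer: bytearray):
        # Receive.Data: copy the first queued datagram into the ULP buffer.
        if not self.in_queue:
            raise LookupError("no datagram available (undefined in the draft)")
        sender, payload = self.in_queue.popleft()
        buffer[:len(payload)] = payload
        more_available = bool(self.in_queue)
        return len(payload), sender, more_available
```

A ULP would call init_mdtp once, then send_data for each outgoing datagram, and receive_data after it has been told that a datagram is ready for retrieval. The context value is kept with each request so that it could be handed back if delivery of that datagram fails.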
13.4 Data.Arrive notification

MDTP shall invoke this notification on the ULP when a datagram is successfully received and ready for retrieval.

13.5 Send.Failure notification

If a datagram cannot be delivered, MDTP shall invoke this notification on the ULP.

The following may be optionally passed with the notification:

   o data - The location where ULP can find the undelivered datagram;
   o context - Optional information associated with this datagram (see 13.2).

13.6 Link.Status.Change notification

When a link is marked down (e.g., when MDTP detects a link failure), or marked up (e.g., when MDTP detects a link recovery), MDTP shall invoke this notification on the ULP.

The following shall be passed with the notification:

   o link-address - This indicates the IP address of the affected link;
   o new-status - This indicates the new status of the link.

13.7 Communication.Lost notification

When MDTP loses communication to an endpoint completely or detects that the endpoint has performed a shut-down operation, it shall invoke this notification on the ULP.

The following shall be passed with the notification:

   o status - This indicates what type of event has occurred;
   o endpoint-id - The IP address and port number that identify the endpoint;
   o packets-enqueue - The number and location of unsent datagrams still held by MDTP;
   o last-acked - The sequence number last acked by that peer endpoint;
   o last-sent - The sequence number last sent to that peer endpoint.

14. Suggested timer and MTU values
The following are suggested timer values for MDTP:

   T1-init Timer    - 160 ms
   T2-receive Timer - 20 ms
   T3-send Timer    - 160 ms
   T4-bundle Timer  - 40 ms
   T5-Heart Beat    - 4000 ms

The following protocol parameters are recommended:

   Min.Bundle              - 1000 octets
   Max.Bundle              - 1432 octets
   Max.Retransmit          - 10 attempts
   Max.Init.Retransmit     - 8 attempts
   Min.Mcast.Time.To.Reset - 5 seconds
   Num.Of.Mcast.Reset.Msg  - 5 messages

15. Acknowledgments

The authors wish to thank Brian Wyld, Sankar A, Henry Houh, Gary Lehecka, Ken Morneault, Lyndon Ong, and others for their very valuable comments.

16. Authors' Addresses

   Randall R. Stewart                 Tel: +1-847-632-7438
   Cellular Infrastructure Group      EMail: stewrtrs@cig.mot.com
   Motorola, Inc.
   1475 W. Shure Drive, #2C-6
   Arlington Heights, IL 60004
   USA

   Qiaobing Xie                       Tel: +1-847-632-3028
   Cellular Infrastructure Group      EMail: xieqb@cig.mot.com
   Motorola, Inc.
   1501 W. Shure Drive, #2309
   Arlington Heights, IL 60004
   USA

17. References

   [1] Postel, J. (ed.), "Internet Protocol - DARPA Internet Program Protocol Specification", RFC 791, USC/Information Sciences Institute, September 1981.

   [2] Postel, J., "User Datagram Protocol", RFC 768, USC/Information Sciences Institute, August 1980.

   [3] Postel, J. (ed.), "Transmission Control Protocol", RFC 793, USC/Information Sciences Institute, September 1981.

This Internet Draft expires in 6 months from April 1999.