Internet Engineering Task Force                               S. Dawkins
INTERNET DRAFT                                             G. Montenegro
                                                                 M. Kojo
                                                               V. Magret
                                                               N. Vaidya
                                                      September 22, 2000

        End-to-end Performance Implications of Links with Errors
                      draft-ietf-pilc-error-05.txt

Status of This Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. It expires on March 22, 2001.

Comments should be submitted to the PILC mailing list at pilc@grc.nasa.gov. Distribution of this memo is unlimited.

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

The rapidly-growing Internet is being accessed by an increasingly wide range of devices over an increasingly wide variety of links. At least some of these links do not provide the reliability that hosts expect, and this expansion into unreliable links causes some Internet protocols, especially TCP [RFC793], to perform poorly.

Specifically, TCP congestion control [RFC2581], while appropriate for connections that lose traffic primarily because of congestion and buffer exhaustion, interacts badly with connections that traverse links with high uncorrected error rates. Whether the losses occur on the data path or on the acknowledgement path, the sender may spend an excessive amount of time waiting for acknowledgements that are not coming. Then, although these losses are not due to congestion-related buffer exhaustion, the sending TCP transmits at substantially reduced traffic levels as it probes the network to determine "safe" traffic levels.

This document discusses the specific TCP mechanisms that are problematic in these environments, and discusses what can be done to mitigate the problems without introducing intermediate devices into the connection. This document does not address issues with other transport protocols, for example, UDP.

Table of Contents

   1.0 Introduction
       1.1 Relationship of this recommendation and [PILC-PEP]
       1.2 Relationship of this recommendation and [PILC-LINK]
       1.3 Should you be reading this recommendation?
   2.0 Errors and Interactions with TCP Mechanisms
       2.1 Slow Start and Congestion Avoidance [RFC2581]
       2.2 Fast Retransmit and Fast Recovery [RFC2581]
       2.3 Selective Acknowledgements [RFC2018, SACK-EXT]
   3.0 Summary of Recommendations
   4.0 Topics For Further Work
       4.1 Achieving, and maintaining, large windows
   5.0 Acknowledgements
   Changes
   References
   Authors' addresses
   Appendix A: When TCP Defers Recovery to the Link Layer
   Appendix B: Detecting Transmission Errors With Explicit Notifications
   Appendix C: Appropriate Byte Counting [ALL99] (Experimental)

1.0 Introduction

It has been axiomatic that most losses on the Internet are due to congestion, as routers run out of buffers and discard incoming traffic. This observation is the basis for current TCP congestion avoidance strategies - if losses are due to congestion, there is no need for an explicit "congestion encountered" notification to the sender. Quoting Van Jacobson in 1988:

   "If packet loss is (almost) always due to congestion and if a
   timeout is (almost) always due to a lost packet, we have a good
   candidate for the `network is congested' signal." [VJ-DCAC]

This axiom has served the Internet community well, because it allowed the deployment of TCP implementations that let the Internet accommodate explosive growth in link speeds and traffic levels. This same explosive growth has attracted users of networking technologies that DON'T have low uncorrected error rates - including many satellite-connected users and many wireless Wide Area Network-connected users. Users connected to these networks may not be able to transmit and receive at anything like the available bandwidth, because their TCP connections spend time in congestion avoidance procedures, or even slow-start procedures, that were triggered by transmission errors in the absence of congestion.

This document makes recommendations about what the participants in connections that traverse high error-rate links may wish to consider doing to improve utilization of available bandwidth in ways that do not threaten the stability of the Internet.

Applications use TCP in very different ways, and these uses interact with TCP's behavior [HPF-CWV]. Nevertheless, it is possible to make some basic assumptions about TCP flows. Accordingly, the mechanisms discussed here are applicable to all uses of TCP, albeit in varying degrees according to different scenarios (as noted where appropriate). This document does not address issues with non-TCP transport protocols, for example, UDP.

1.1 Relationship of this recommendation and [PILC-PEP]

This document discusses end-to-end mechanisms that do not require TCP-level awareness by intermediate nodes. This places severe limitations on what the end nodes can know about the nature of losses that are occurring between the end nodes. Attempts to apply heuristics to distinguish between congestion and transmission error have not been successful [BV97, BV98, BV98a].

A companion PILC document on Performance-Enhancing Proxies, [PILC-PEP], relaxes this restriction; because PEPs can be placed on boundaries where network characteristics change dramatically, PEPs have an additional opportunity to improve performance over links with uncorrected errors.
However, generalized use of PEPs contravenes the end-to-end principle and is highly undesirable, given their deleterious implications with respect to the following [PILC-PEP]:

- fate sharing (a PEP adds a third point of failure besides the endpoints themselves),
- end-to-end reliability and diagnostics,
- security (particularly network-layer security such as IPsec),
- mobility (handoffs are much more complex because state must be transferred),
- asymmetric routing (PEPs typically must be on both the forward and reverse paths of a connection),
- scalability (PEPs add more state to maintain), and
- QoS transparency and guarantees.

Not every type of PEP has all the drawbacks listed above. Nevertheless, the use of PEPs may have very serious consequences, which must be weighed carefully.

1.2 Relationship of this recommendation and [PILC-LINK]

This recommendation is for use with TCP over subnetwork technologies that have already been deployed. A companion PILC recommendation, [PILC-LINK], is for designers of subnetworks that are intended to carry Internet protocols but have not been completely specified, so that the designers have the opportunity to reduce the number of uncorrected errors TCP will encounter.

1.3 Should you be reading this recommendation?

All known subnetwork technologies provide an "imperfect" subnetwork service - the bit error rate is non-zero. But there is no obvious way for end stations to tell the difference between losses due to congestion and losses due to transmission errors. It may be obvious if a directly-attached subnetwork reports transmission errors, but in all but the most trivial networks the two hosts will not be directly attached to the same subnetwork, so even if one host receives specific error reports, the other host probably won't.

Another way of deciding whether a subnetwork should be considered to have a "high error rate" is by appealing to mathematics. A formula giving an upper bound on the performance of any additive-increase, multiplicative-decrease algorithm likely to be implemented in TCP in the future was derived in [MSMO97]:

                  MSS      1
   BW = 0.93 * ------- * -------
                  RTT    sqrt(p)

   where
      MSS is the segment size being used by the connection,
      RTT is the end-to-end round trip time of the TCP connection, and
      p   is the packet loss rate for the path (i.e., 0.01 if there is
          1% packet loss).

If plugging an observed packet loss rate into this formula predicts a bandwidth greater than the link speed, the connection will not benefit from the recommendations in this document, because the level of packet loss being encountered does not limit the ability of TCP to utilize the link. If, however, the predicted bandwidth is less than the link speed, packet losses are affecting the ability of TCP to utilize the link; if further investigation reveals a subnetwork with significant transmission error rates, the recommendations in this document will improve the ability of TCP to utilize the link.
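As an illustration of this test, the following sketch (illustrative only; the segment size, round trip time, loss rate, and link speed are assumed example values, not measurements) evaluates the [MSMO97] bound:

   from math import sqrt

   def mathis_upper_bound(mss_bytes, rtt_seconds, loss_rate):
       # Upper bound on TCP throughput (bytes/second) from [MSMO97]:
       # BW = 0.93 * MSS / (RTT * sqrt(p))
       return 0.93 * mss_bytes / (rtt_seconds * sqrt(loss_rate))

   # Assumed example: 1460-byte segments, 200 ms RTT, 1% packet loss,
   # over a 2 Mbit/s link (250000 bytes/second).
   bound = mathis_upper_bound(mss_bytes=1460, rtt_seconds=0.200, loss_rate=0.01)
   link_speed = 2_000_000 / 8

   if bound >= link_speed:
       print("Loss does not limit this link; these recommendations won't help.")
   else:
       print("Losses limit TCP to about %.0f of %.0f bytes/second."
             % (bound, link_speed))

In this example the bound (roughly 68 kilobytes per second) is well below the link speed, so the recommendations below are relevant.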
2.0 Errors and Interactions with TCP Mechanisms

A TCP sender adapts its use of bandwidth based on feedback from the receiver. When TCP is not able to distinguish between losses due to congestion and losses due to uncorrected errors, it is not able to accurately determine the available bandwidth. Some TCP mechanisms, targeting recovery from losses due to congestion, coincidentally assist in recovery from losses due to uncorrected errors as well.

2.1 Slow Start and Congestion Avoidance [RFC2581]

Slow Start and Congestion Avoidance [RFC2581] are essential to the Internet's stability. These mechanisms were designed to accommodate networks that didn't provide explicit congestion notification. Although experimental mechanisms like [RFC2481] are moving in the direction of explicit notification, the effect of ECN on ECN-aware TCPs is essentially the same as the effect of implicit congestion notification through congestion-related loss.

TCP connections experiencing high error rates interact badly with Slow Start and with Congestion Avoidance, because high error rates make the interpretation of losses ambiguous - the sender cannot know whether detected losses are due to congestion or to data corruption. TCP makes the "safe" choice and assumes that the losses are due to congestion.

- Whenever TCP's retransmission timer expires, the sender assumes that the network is congested and invokes slow start.

- Less-reliable link layers often use small link MTUs. This slows the rate of increase in the sender's window size during slow start, because the sender's window is increased in units of segments. Small link MTUs alone don't improve things unless Path MTU discovery is also used to prevent fragmentation. Path MTU discovery allows the most rapid opening of the sender's window size during slow start, but a number of round trips may still be required to open the window completely.

Recommendation: Slow Start and Congestion Avoidance are MUSTs in [RFC1122], itself a full Internet Standard. Recommendations in this document will not interfere with these mechanisms.

2.2 Fast Retransmit and Fast Recovery [RFC2581]

TCPs deliver data as a reliable byte-stream to applications, so when a segment is lost (whether due to congestion or to transmission loss), delivery of data to the receiving application must wait until the missing data is received. The receiver detects missing segments when segments arrive with out-of-order sequence numbers. TCPs SHOULD immediately send an acknowledgement when data is received out-of-order [RFC2581], sending the next expected sequence number with no delay, so that the sender can retransmit the required data and the receiver can resume delivery of data to the receiving application. When an acknowledgement carries the same expected sequence number as an acknowledgement already sent for the last in-order segment received, these acknowledgements are called "duplicate ACKs".

Because IP networks are allowed to reorder packets, the receiver may send duplicate acknowledgements for segments that arrive out of order due to routing changes, link-level retransmission, etc. When a TCP sender receives three duplicate ACKs, fast retransmit [RFC2581] allows it to infer that a segment was lost. The sender retransmits what it considers to be this lost segment without waiting for the full retransmission timeout, thus saving time.

After a fast retransmit, a sender halves its congestion window and invokes the fast recovery [RFC2581] algorithm, whereby it invokes congestion avoidance, but not slow start from a one-segment congestion window. This also saves time.
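A simplified sketch of these sender-side reactions follows. It is an illustration of the behavior described above, not the [RFC2581] specification; the congestion window is counted in segments, and the variable names and initial values are assumptions.

   class SimplifiedTcpSender:
       # Toy model of the sender reactions described in sections 2.1 and 2.2.
       DUPACK_THRESHOLD = 3            # fast retransmit after 3 duplicate ACKs

       def __init__(self):
           self.cwnd = 1               # congestion window, in segments
           self.ssthresh = 64          # slow-start threshold, in segments
           self.dupacks = 0

       def on_new_ack(self):
           self.dupacks = 0
           if self.cwnd < self.ssthresh:
               self.cwnd += 1          # slow start: one segment per ACK
           else:
               self.cwnd += 1.0 / self.cwnd   # congestion avoidance: ~1 segment/RTT

       def on_duplicate_ack(self):
           self.dupacks += 1
           if self.dupacks == self.DUPACK_THRESHOLD:
               # Fast retransmit / fast recovery: halve the window and
               # continue in congestion avoidance, skipping slow start.
               self.ssthresh = max(self.cwnd / 2, 2)
               self.cwnd = self.ssthresh
               # (the presumed-lost segment is retransmitted here)

       def on_retransmission_timeout(self):
           # Timeout: assume congestion, fall back to a one-segment window.
           self.ssthresh = max(self.cwnd / 2, 2)
           self.cwnd = 1
           self.dupacks = 0

Note that a transmission error that produces three duplicate ACKs and a transmission error that produces a timeout both shrink the window, even when no congestion exists.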
It's important to be realistic about the maximum throughput that TCP can have over a connection that traverses a high error-rate link. Even using Fast Retransmit/Fast Recovery, the sender will halve the congestion window each time a window contains one or more segments that are lost, and will re-open the window by one additional segment for each acknowledgement that is received. If a connection path traverses a link that loses one or more segments during recovery, the one-half reduction takes place again, this time on an already-reduced congestion window - and this downward spiral will continue until the connection is able to recover completely without experiencing loss.

In general, TCP can increase its congestion window beyond the delay-bandwidth product. On links with high error rates, however, the TCP window may remain rather small for long periods of time, for any of the following reasons:

1. TCP's congestion avoidance strategy is additive-increase, multiplicative-decrease, which means that if additional errors are encountered before the congestion window recovers completely from a 50-percent reduction, the effect can be a "downward spiral" of the congestion window due to additional 50-percent reductions. This "downward spiral" will hold the congestion window below the capacity of the path between the endpoints until the error rate decreases, allowing full recovery by additive increase. Of course, no downward spiral occurs if the error rate is constantly high and the congestion window always remains small.

2. If a network path with high uncorrected error rates DOES cross a highly congested wireline Internet path, congestion losses on the Internet have the same effect as losses due to corruption.

Not all causes of small windows are related to errors. For example, HTTP/1.0 commonly closes TCP connections to indicate boundaries between requested resources. This means that these applications are constantly closing "trained" TCP connections and opening "untrained" TCP connections which will execute slow start, beginning with one or two segments. This can happen even with HTTP/1.1, if webmasters configure their HTTP/1.1 servers to close connections instead of waiting to see if the connection will be useful again.

A small window - especially a window of less than four segments - effectively prevents the sender from taking advantage of Fast Retransmit. Moreover, efficient recovery from multiple losses within a single window requires adoption of new proposals (NewReno [RFC2582]).

Recommendation: Implement Fast Retransmit and Fast Recovery at this time. This is a widely-implemented optimization and is currently at Proposed Standard level. [RFC2488] recommends implementation of Fast Retransmit/Fast Recovery in satellite environments. In cases where SACK (see the next section) cannot be enabled for both sides of a connection, NewReno [RFC2582] may be used by TCP senders to better handle partial ACKs and multiple losses in a single window.
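The "downward spiral" described above is easy to reproduce in a toy model. The following sketch (a rough illustration with assumed parameters, not a validated simulation) applies one additive-increase or multiplicative-decrease step per round trip and reports how large the window ever gets:

   import random

   def largest_cwnd(rtts=200, loss_per_segment=0.01, max_window=64, seed=1):
       # Round-by-round model: if any segment in the current window is
       # lost, halve the window; otherwise grow it by one segment per RTT.
       random.seed(seed)
       cwnd = 1.0
       largest = 0.0
       for _ in range(rtts):
           window_had_loss = any(random.random() < loss_per_segment
                                 for _ in range(int(cwnd)))
           if window_had_loss:
               cwnd = max(cwnd / 2, 1.0)          # multiplicative decrease
           else:
               cwnd = min(cwnd + 1, max_window)   # additive increase
           largest = max(largest, cwnd)
       return largest

   # Higher per-segment loss rates keep the window pinned far below
   # max_window, regardless of the actual capacity of the path.
   for p in (0.001, 0.01, 0.05):
       print(p, largest_cwnd(loss_per_segment=p))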
2.3 Selective Acknowledgements [RFC2018, SACK-EXT]

Selective Acknowledgements allow the repair of multiple segment losses per window without requiring one (or more) round trips per loss.

[SACK-EXT] proposes an extension to SACK that allows receivers to provide more information about the order of delivery of segments, allowing "more robust operation in an environment of reordered packets, ACK loss, packet replication, and/or early retransmit timeouts". [SACK-EXT] has been approved for Proposed Standard as a minor but useful update to Selective Acknowledgements.

Unless explicitly stated otherwise, in this document "Selective Acknowledgements" (or "SACK") refers to the combination of [RFC2018] and [SACK-EXT].

Selective Acknowledgements are most useful in LFNs ("Long Fat Networks"), because of the long round trip times that may be encountered in these environments, according to Section 1.1 of [RFC1323]. They are especially useful if large windows are required, because there is a higher probability of multiple segment losses per window.

On the other hand, if error rates are generally low but occasionally increase due to interference, TCP will have the opportunity to increase its window to larger values. When interference occurs, multiple losses within a window are likely. In this case, SACK provides benefits by speeding recovery and preventing unnecessary extra reduction of the window size.

Recommendation: SACK as specified in [RFC2018] and updated by [SACK-EXT] is a Proposed Standard. Implement SACK now for compatibility with other TCPs.
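The benefit of SACK is that the sender can see every hole in a window at once, instead of discovering them one retransmission at a time. The sketch below (a toy "scoreboard"; the data layout and function name are assumptions for illustration, and sequence-number wraparound is ignored) shows how a sender might combine the cumulative ACK with SACK blocks to select retransmissions:

   def segments_to_retransmit(cum_ack, sack_blocks, segments_sent):
       # cum_ack:       highest cumulative ACK received
       # sack_blocks:   list of (left_edge, right_edge) byte ranges SACKed
       # segments_sent: list of (sequence_number, length) still outstanding
       def sacked(seq, length):
           return any(left <= seq and seq + length <= right
                      for left, right in sack_blocks)

       return [(seq, length) for seq, length in segments_sent
               if seq >= cum_ack and not sacked(seq, length)]

   # Assumed example: five 1000-byte segments outstanding; the receiver has
   # cumulatively ACKed up to byte 1000 and SACKed bytes 2000-3000, 4000-5000.
   print(segments_to_retransmit(
       cum_ack=1000,
       sack_blocks=[(2000, 3000), (4000, 5000)],
       segments_sent=[(i * 1000, 1000) for i in range(5)]))
   # -> [(1000, 1000), (3000, 1000)]: both holes are visible immediately.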
3.0 Summary of Recommendations

Because existing TCPs have only one implicit loss feedback mechanism, it is not possible to use this mechanism to distinguish between congestion loss and transmission error without additional information. Because congestion affects all traffic on a path while transmission loss affects only the specific traffic encountering uncorrected errors, avoiding congestion has to take precedence over quickly repairing transmission errors. This means that the best that can be achieved without new feedback mechanisms is to minimize the amount of time spent unnecessarily in congestion avoidance.

Fast Retransmit/Fast Recovery allows quick repair of loss without giving up the safety of congestion avoidance. In order for Fast Retransmit/Fast Recovery to work, the window size must be large enough that the receiver sends three duplicate acknowledgements before the retransmission timeout expires and forces a full TCP slow start.

Selective Acknowledgements (SACK) extend the benefit of Fast Retransmit/Fast Recovery to situations where multiple segment losses in the window need to be repaired more quickly than can be accomplished by executing Fast Retransmit for each segment loss, only to discover the next segment loss.

These mechanisms cover both wireless and wireline environments. This general applicability attracts more attention and analysis from the research community. All of these mechanisms continue to work in the presence of IPsec.

4.0 Topics For Further Work

Delayed Duplicate Acknowledgements is an attractive scheme, especially when link layers use fixed retransmission timer mechanisms that may still be trying to recover when TCP-level retransmission timeouts occur, adding additional traffic to the network. This proposal is worthy of additional study, but is not recommended at this time, because we don't know how to calculate appropriate amounts of delay for an arbitrary network topology.

It is not possible to use explicit congestion notification as a surrogate for explicit transmission error notification (no matter how much we wish it were!). Some mechanism to provide explicit notification of transmission error would be very helpful.

This might be more easily provided in a PEP environment, especially when the PEP is the "first hop" in a connection path, because current checksum mechanisms do not distinguish between a transmission error in the payload and a transmission error in the header - and, if the header is damaged, it is problematic to send explicit transmission error notification to the right endpoints.

Losses that take place on the ACK stream, especially while a TCP is learning network characteristics, can make the data stream quite bursty (resulting in losses on the data stream as well). Several ways of limiting this burstiness have been proposed, including "Appropriate Byte Counting" (ABC) [ALL99], TCP transmit pacing at the sender, and ACK rate control within the network.

ABC can lead to behavior that is less bursty than standard TCP, because the congestion window is opened by the number of bytes that have been successfully transferred to the receiver, giving more appropriate behavior for application protocols that initiate connections with relatively short packets. For SMTP, for instance, the client might send a short HELO packet, a short MAIL packet, one or more short RCPT packets, and a short DATA packet - followed by the entire mail body sent as maximum-length packets. ABC would not use the ACKs for each of these short packets to increase the congestion window and allow additional full-length packets.
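The difference is easy to see in a small sketch (illustrative only; the segment size, initial window, and command sizes are assumed values, and real ABC as described in [ALL99] and Appendix C includes limits not modeled here):

   MSS = 1460   # assumed maximum segment size, in bytes

   def cwnd_after_short_commands(acked_sizes, initial_cwnd=2 * MSS,
                                 byte_counting=True):
       # With byte counting, cwnd grows by the bytes acknowledged;
       # with ACK counting, it grows by one MSS per ACK received.
       cwnd = initial_cwnd
       for size in acked_sizes:
           cwnd += size if byte_counting else MSS
       return cwnd

   # SMTP-like opening: short HELO, MAIL, two RCPT and DATA commands,
   # each acknowledged before the full-length mail body is sent.
   short_commands = [40, 60, 70, 70, 10]
   print("ACK counting: ", cwnd_after_short_commands(short_commands,
                                                     byte_counting=False))
   print("byte counting:", cwnd_after_short_commands(short_commands))

ACK counting inflates the window by five full segments before any full-length data has been transferred; byte counting adds only the few hundred bytes actually acknowledged, so the mail body does not go out as an inappropriate burst.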
4.1 Achieving, and maintaining, large windows

The recommendations described in this document will aid TCPs in injecting packets into connections that traverse high error-rate links as fast as possible without destabilizing the Internet, thus optimizing the use of available bandwidth.

In addition to these TCP-level recommendations, there is still additional work to do at the application level, especially with the dominant application protocol on the World Wide Web, HTTP.

HTTP/1.0 (and its predecessor, HTTP/0.9) used TCP connection closing to signal a receiver that all of a requested resource had been transmitted. Because WWW objects tend to be small in size [P-HTTP], TCPs carrying HTTP/1.0 traffic experience difficulty in "training" on available bandwidth (a substantial portion of the transfer has already happened by the time the TCP gets out of slow start). Several HTTP modifications have been introduced to improve this interaction with TCP ("persistent connections" in HTTP/1.0, with improvements in HTTP/1.1 [RFC2616]). For a variety of reasons, many HTTP interactions are still HTTP/1.0-style - relatively short-lived.

Proposals which reuse TCP congestion information across connections, like TCP Control Block Interdependence [RFC2140] or the more recent Congestion Manager [BS99] proposal, will have the effect of making multiple parallel connections impact the network as if they were a single connection, "trained" after a single startup transient. These proposals are critical to the long-term stability of the Internet, because today's users always have the choice of clicking on the "reload" button in their browsers and cutting off TCP's exponential backoff - replacing connections which are building knowledge of the available bandwidth with connections that have no knowledge at all.

5.0 Acknowledgements

This recommendation has grown out of RFC 2757, "TCP Over Long Thin Networks", which was in turn based on work done in the IETF TCPSAT working group. The authors are indebted to the active members of the PILC working group. In particular, Mark Allman gave us copious and insightful feedback, and Jamshid Mahdavi provided text replacements.

Changes

Changes between versions 02 and 03:

- Restructured the document into a discussion of standard mechanisms, work remaining to be done, and appendices on experimental mechanisms.
- Changed "Explicit Corruption Notification" to "Explicit Transmission Error Notification", in order to avoid confusion with "Explicit Congestion Notification".
- Other editorial changes and corrections.

Changes between versions 03 and 04:

- Incorporated comments from Mark Allman, too numerous to list here. Also incorporated some changes suggested by Jamshid Mahdavi.
- SACK-EXT is now approved for Proposed Standard. Reflected this change in status in the text by treating SACK-EXT in the same way as SACK.
- Changed the section name from "Delayed Duplicate Acknowledgements" to "When TCP Defers Recovery to the Link Layer" and mentioned Reiner Ludwig's Eifel algorithm.
- Added a reference to [LINK-OUTAGE] in the appendix.
- Other editorial changes and corrections.

Changes between versions 04 and 05:

- Added section 1.3.

References

[ALL99] Mark Allman, "TCP Byte Counting Refinements," ACM Computer Communication Review, Volume 29, Number 3, July 1999. http://www.acm.org/sigcomm/ccr/archive/1999/jul99/ccr-9907-allman.pdf

[BBKVP96] Bakshi, B., Krishna, P., Vaidya, N., Pradhan, D.K., "Improving Performance of TCP over Wireless Networks," Technical Report 96-014, Texas A&M University, 1996.

[BPSK96] Balakrishnan, H., Padmanabhan, V., Seshan, S., Katz, R., "A Comparison of Mechanisms for Improving TCP Performance over Wireless Links," in ACM SIGCOMM, Stanford, California, August 1996.

[BS99] Hari Balakrishnan, Srinivasan Seshan, "The Congestion Manager", July 2000. Work in progress, available at http://www.ietf.org/internet-drafts/draft-ietf-ecm-cm-00.txt

[BV97] Biaz, S., Vaidya, N., "Using End-to-end Statistics to Distinguish Congestion and Corruption Losses: A Negative Result," Texas A&M University, Technical Report 97-009, August 18, 1997.

[BV98] Biaz, S., Vaidya, N., "Sender-Based Heuristics for Distinguishing Congestion Losses from Wireless Transmission Losses," Texas A&M University, Technical Report 98-013, June 1998.

[BV98a] Biaz, S., Vaidya, N., "Discriminating Congestion Losses from Wireless Losses using Inter-Arrival Times at the Receiver," Texas A&M University, Technical Report 98-014, June 1998.

[HPF-CWV] Handley, M., Padhye, J., Floyd, S., "TCP Congestion Window Validation," March 2000. Approved for Informational RFC, available at http://search.ietf.org/internet-drafts/draft-handley-tcp-cwv-02.txt

[LINK-OUTAGE] G. Montenegro, "Link Outage ICMP Notification," July 2000. Work in progress, available at http://www.ietf.org/internet-drafts/draft-montenegro-pilc-link-outage-00.txt

[LK00] Reiner Ludwig and Randy Katz, "The Eifel Algorithm: Making TCP Robust Against Spurious Retransmissions," ACM Computer Communication Review, Volume 30, Number 1, January 2000. Available at http://www.acm.org/sigcomm/ccr/archive/2000/jan00/ccr-200001-ludwig.pdf

[MD95] Gabriel Montenegro and Steve Drach, "System Isolation and Network Fast-Fail Capability in Solaris," Second USENIX Symposium on Mobile and Location-Independent Computing, April 1995. http://www.usenix.org/publications/library/proceedings/mob95/montenegro.html

[MSMO97] M. Mathis, J. Semke, J. Mahdavi, T. Ott, "The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm," Computer Communication Review, Volume 27, Number 3, July 1997. Available at http://www.acm.org/sigcomm/ccr/archive/1997/jul97/ccr-9707-mathis.html
[MV97] Mehta, M., Vaidya, N., "Delayed Duplicate-Acknowledgements: A Proposal to Improve Performance of TCP on Wireless Links," Texas A&M University, December 24, 1997. Available at http://www.cs.tamu.edu/faculty/vaidya/mobile.html

[PILC-LINK] Phil Karn, Aaron Falk, Joe Touch, Marie-Jose Montpetit, Jamshid Mahdavi, Gabriel Montenegro, Dan Grossman, Gorry Fairhurst, "Advice for Internet Subnetwork Designers", July 2000. Work in progress, available at http://www.ietf.org/internet-drafts/draft-ietf-pilc-link-design-03.txt

[PILC-PEP] J. Border, M. Kojo, Jim Griner, G. Montenegro, "Performance Implications of Link-Layer Characteristics: Performance Enhancing Proxies", July 2000. Work in progress, available at http://www.ietf.org/internet-drafts/draft-ietf-pilc-pep-03.txt

[PILC-SLOW] S. Dawkins, G. Montenegro, M. Kojo, V. Magret, "Performance Implications of Link-Layer Characteristics: Slow Links", July 2000. Work in progress, available at http://www.ietf.org/internet-drafts/draft-ietf-pilc-slow-04.txt

[P-HTTP] Jeffrey C. Mogul, "The Case for Persistent-Connection HTTP", Research Report 95/4, May 1995. Available at http://www.research.digital.com/wrl/techreports/abstracts/95.4.html

[RFC793] Jon Postel, "Transmission Control Protocol", RFC 793, September 1981.

[RFC1122] Braden, R., "Requirements for Internet Hosts -- Communication Layers", RFC 1122, October 1989.

[RFC1323] Van Jacobson, Robert Braden, and David Borman, "TCP Extensions for High Performance", RFC 1323, May 1992.

[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and Romanow, A., "TCP Selective Acknowledgment Options," RFC 2018, October 1996.

[RFC2140] J. Touch, "TCP Control Block Interdependence", RFC 2140, April 1997.

[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C., Peterson, L., Ramakrishnan, K.K., Shenker, S., Wroclawski, J., Zhang, L., "Recommendations on Queue Management and Congestion Avoidance in the Internet," RFC 2309, April 1998.

[RFC2481] Ramakrishnan, K.K., Floyd, S., "A Proposal to add Explicit Congestion Notification (ECN) to IP", RFC 2481, January 1999.

[RFC2488] Mark Allman, Dan Glover, Luis Sanchez, "Enhancing TCP Over Satellite Channels using Standard Mechanisms," RFC 2488 (BCP 28), January 1999.

[RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control," RFC 2581, April 1999.

[RFC2582] Floyd, S., Henderson, T., "The NewReno Modification to TCP's Fast Recovery Algorithm," RFC 2582, April 1999.

[RFC2616] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. (Draft Standard)

[SACK-EXT] Sally Floyd, Jamshid Mahdavi, Matt Mathis, Matthew Podolsky, Allyn Romanow, "An Extension to the Selective Acknowledgement (SACK) Option for TCP", August 1999. Approved for Proposed Standard, available at http://www.ietf.org/internet-drafts/draft-floyd-sack-00.txt

[SF98] Nihal K. G. Samaraweera and Godred Fairhurst, "Reinforcement of TCP Error Recovery for Wireless Communication", Computer Communication Review, Volume 28, Number 2, April 1998. Available at http://www.acm.org/sigcomm/ccr/archive/1998/apr98/ccr-9804-samaraweera.pdf
[VJ-DCAC] Van Jacobson, "Dynamic Congestion Avoidance / Control", e-mail dated February 11, 1988, available from http://www.kohala.com/~rstevens/vanj.88feb11.txt

[VMPM99] N. H. Vaidya, M. Mehta, C. Perkins, G. Montenegro, "Delayed Duplicate Acknowledgements: A TCP-Unaware Approach to Improve Performance of TCP over Wireless," Technical Report 99-003, Computer Science Dept., Texas A&M University, February 1999.

Authors' addresses

Questions about this document may be directed to:

   Spencer Dawkins
   Fujitsu Network Communications
   2801 Telecom Parkway
   Richardson, Texas 75082
   Voice: +1-972-479-3782
   E-Mail: spencer.dawkins@fnc.fujitsu.com

   Gabriel E. Montenegro
   Sun Labs Networking and Security Group
   Sun Microsystems, Inc.
   901 San Antonio Road
   Mailstop UMPK 15-214
   Mountain View, California 94303
   Voice: +1-650-786-6288
   Fax: +1-650-786-6445
   E-Mail: gab@sun.com

   Markku Kojo
   University of Helsinki/Department of Computer Science
   P.O. Box 26 (Teollisuuskatu 23)
   FIN-00014 HELSINKI
   Finland
   Voice: +358-9-7084-4179
   Fax: +358-9-7084-4441
   E-Mail: kojo@cs.helsinki.fi

   Vincent Magret
   Corporate Research Center
   Alcatel Network Systems, Inc.
   1201 Campbell, Mail stop 446-310
   Richardson, Texas 75081, USA
   Voice: +1-972-996-2625
   Fax: +1-972-996-5902
   E-mail: vincent.magret@aud.alcatel.com

   Nitin Vaidya
   Dept. of Computer Science
   Texas A&M University
   College Station, TX 77843-3112
   Voice: +1 409-845-0512
   Fax: +1 409-847-8578
   Email: vaidya@cs.tamu.edu

Appendix A: When TCP Defers Recovery to the Link Layer

When link layers try aggressively to correct a high underlying error rate, it is imperative to prevent interaction between link-layer retransmission and TCP retransmission, as these layers duplicate each other's efforts. It may be preferable to allow a local mechanism to resolve a local problem, instead of invoking TCP's end-to-end mechanism and incurring the associated costs, both in terms of wasted bandwidth and in terms of its effect on TCP's window behavior.

In such an environment it may make sense to delay TCP's efforts so as to give the link layer a chance to recover. With this in mind, the Delayed Dupacks scheme [MV97, VMPM99] selectively delays duplicate acknowledgements at the receiver. At this time, it is not well understood how long the receiver should delay the duplicate acknowledgements. In particular, the impact of the medium access control (MAC) protocol on the choice of the delay parameter needs to be studied. The MAC protocol may affect the ability to choose the appropriate delay (either statically or dynamically). In general, significant variability in link-level retransmission times can have an adverse impact on the performance of the Delayed Dupacks scheme.

The Delayed Dupacks scheme makes very few assumptions about TCP implementations. If, however, one assumes that the implementations support TCP timestamps, then other schemes are possible. For example, the Eifel algorithm [LK00] uses timestamps (or, alternatively, two of the currently four unused bits in the TCP header) to make TCP more robust in the face of spurious timeouts and packet reorderings.

Recommendation: Delaying duplicate acknowledgements and the Eifel algorithm are not standards-track mechanisms. They may be useful in specific network topologies, but a general recommendation requires further research and experience.
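To make the scheme concrete, the following receiver-side sketch delays the third and subsequent duplicate acknowledgements and cancels them if the missing segment arrives in the meantime. It is an illustration only: the two-dupack threshold, the fixed 100 ms hold time, and the names used are assumptions, and choosing the delay well is exactly the open issue noted above.

   import threading

   class DelayedDupackReceiver:
       def __init__(self, send_ack, delay=0.1, immediate_dupacks=2):
           self.send_ack = send_ack         # callback that transmits an ACK
           self.delay = delay               # assumed hold time, in seconds
           self.immediate_dupacks = immediate_dupacks
           self.dupacks_seen = 0
           self.pending = []                # timers for held duplicate ACKs

       def on_out_of_order_segment(self, ack_number):
           self.dupacks_seen += 1
           if self.dupacks_seen <= self.immediate_dupacks:
               # Fewer than three dupacks cannot trigger fast retransmit,
               # so these may be sent immediately.
               self.send_ack(ack_number)
           else:
               # Hold the dupack, giving link-layer recovery a chance.
               timer = threading.Timer(self.delay, self.send_ack,
                                       args=(ack_number,))
               self.pending.append(timer)
               timer.start()

       def on_missing_segment_arrived(self, new_ack_number):
           # The hole was repaired locally: drop held dupacks and send a
           # cumulative ACK covering the repaired data instead.
           for timer in self.pending:
               timer.cancel()
           self.pending.clear()
           self.dupacks_seen = 0
           self.send_ack(new_ack_number)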
Appendix B: Detecting Transmission Errors With Explicit Notifications

As noted above, today's TCPs assume that any loss is due to congestion, and they encounter difficulty distinguishing between congestion loss and corruption loss because this "implicit notification" mechanism can't carry both meanings at once. [SF98] reports simulation results showing that performance improvements are possible when TCP can correctly distinguish between losses due to congestion and losses due to corruption.

With explicit notification from the network it is possible to determine when a loss is due to corruption. Several proposals along these lines include:

- Explicit Loss Notification (ELN) [BPSK96]

- Explicit Bad State Notification (EBSN) [BBKVP96]

- Explicit Loss Notification to the Receiver (ELNR), and Explicit Delayed Dupack Activation Notification (EDDAN) [MV97]

- Space Communication Protocol Specification - Transport Protocol (SCPS-TP), which uses explicit "negative acknowledgements" to notify the sender that a damaged packet has been received.

In addition to notifications about corruption affecting specific packets, it is useful to provide notification of sustained interruptions in link connectivity. These conditions can be reported with an ICMP Host Unreachable message [LINK-OUTAGE]. IP is required to pass any such messages up to transport layers like UDP and TCP, and these, in turn, pass them to the applications above them [RFC1122]. What is not clearly defined is which code within an ICMP Host Unreachable message should be used to report such a condition. For conditions of network outage, a currently unused 'host isolated' code (code 8) was introduced for routers (actually, IMPs) to inform hosts of an outage. Additionally, [MD95] argues for the application of 'host isolated' to notifications emanating from a host's own lower layers. In summary, these notifications to upper layers can originate either from within a host itself or from another host altogether. ICMP includes the necessary information to determine the sender of the notification, as well as a part of the datagram which encountered the error.

These proposals offer promise, but none has been proposed as a standards-track mechanism for adoption in the IETF.

Recommendation: Researchers should continue to investigate true corruption-notification mechanisms, especially mechanisms like ELNR and EDDAN [MV97], in which the only systems that need to be modified are the base station and the mobile device. We also note that the requirement that the base station be able to examine TCP headers at link speeds raises performance issues with respect to IPsec-encrypted packets.
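As a small illustration of the information an endpoint already receives in such a notification, the sketch below parses an ICMPv4 Destination Unreachable message (RFC 792 layout) and recovers the code and the embedded portion of the offending datagram. The 'host isolated' interpretation follows the discussion above; the function and field names are assumptions for illustration, not part of any cited proposal.

   import struct

   ICMP_DEST_UNREACH = 3
   CODE_HOST_ISOLATED = 8        # 'host isolated', as discussed above

   def parse_icmp_unreachable(icmp_message):
       # type (1 byte), code (1), checksum (2), 4 unused bytes, then the
       # offending datagram's IP header plus at least 8 bytes of payload.
       icmp_type, code, _checksum = struct.unpack_from("!BBH", icmp_message, 0)
       if icmp_type != ICMP_DEST_UNREACH:
           return None
       embedded = icmp_message[8:]
       ihl = (embedded[0] & 0x0F) * 4          # embedded IP header length
       src, dst = struct.unpack_from("!4s4s", embedded, 12)
       return {
           "possible_link_outage": code == CODE_HOST_ISOLATED,
           "original_source": ".".join(map(str, src)),
           "original_destination": ".".join(map(str, dst)),
           "transport_bytes": embedded[ihl:ihl + 8],   # enough for TCP ports
       }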
Appendix C: Appropriate Byte Counting [ALL99] (Experimental)

Researchers have pointed out an interaction between delayed acknowledgements and TCP's acknowledgement-based self-clocking, and various proposals have been made to improve bandwidth utilization during slow start. One proposal, called "Appropriate Byte Counting" (ABC), increases cwnd based on the number of bytes acknowledged, instead of the number of ACKs received.

This proposal is a refinement of earlier proposals. It limits the increase in cwnd so that cwnd does not "spike" in the presence of "stretch ACKs" - ACKs that cover more than two segments, whether as intentional behavior by the receiver or as the result of lost ACKs - and it limits cwnd growth based on byte counting to the initial slow-start exchange.

This proposal is still at the experimental stage, but implementors may wish to follow this work, because the effect is that the congestion window opens more aggressively when ACKs are lost during the initial slow-start exchange, without this aggressiveness acting to the detriment of other flows.
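A sketch of the per-ACK window increase under such a scheme follows (illustrative only; the two-segment per-ACK limit and the MSS are assumed example values, and restricting the rule to the initial slow-start exchange is assumed here rather than modeled):

   MSS = 1460    # assumed sender maximum segment size, in bytes

   def abc_slow_start_increase(bytes_acked, limit_segments=2):
       # Grow cwnd by the bytes newly acknowledged, but never by more than
       # limit_segments * MSS per ACK, so a "stretch ACK" covering many
       # segments cannot make cwnd spike.
       return min(bytes_acked, limit_segments * MSS)

   print(abc_slow_start_increase(4 * MSS))   # stretch ACK -> increase of 2 * MSS
   print(abc_slow_start_increase(100))       # short packet -> increase of 100 bytes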