Internet Engineering Task Force
Internet Draft
Document: draft-ietf-pilc-asym-04.txt
Category: BCP
May 2001

Hari Balakrishnan, MIT LCS
Venkata N. Padmanabhan, Microsoft Research
Gorry Fairhurst, University of Aberdeen, U.K.
Mahesh Sooriyabandara, University of Aberdeen, U.K.

TCP Performance Implications of Network Asymmetry

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

1. Abstract

This document describes TCP performance problems that arise because of asymmetric effects. These problems arise in several access networks, including bandwidth-asymmetric networks and packet radio networks, for different underlying reasons. However, the end result on TCP performance is the same in both cases: performance often degrades significantly because of imperfection and variability in the ACK feedback from the receiver to the sender.

This document details several mitigations to these effects, which have either been proposed or evaluated in the literature, or are currently deployed in networks.
These solutions use a combination of local link-layer techniques, subnetwork mechanisms, and end-to-end mechanisms, consisting of: (i) techniques to manage the channel used for the upstream bottleneck link carrying the ACKs, typically using header compression or reducing the frequency of TCP ACKs, (ii) techniques to handle this reduced ACK frequency to retain the TCP sender's acknowledgment-triggered self-clocking, and (iii) techniques to schedule the data and ACK packets in the reverse direction to improve performance in the presence of two-way traffic.

Expires November 2001 [page 1]
INTERNET DRAFT PILC - Asymmetric Links May 2001

2. Conventions used in this document

FORWARD DIRECTION: The dominant direction of data transfer over an asymmetric network. It corresponds to the direction with better characteristics in terms of bandwidth, latency, error rate, etc. We term data transfer in the forward direction as a "forward transfer". Packets traveling in the forward direction follow the forward path through the IP network.

REVERSE DIRECTION: The direction in which acknowledgments of a forward TCP transfer flow. Data transfer could also happen in this direction (and it is termed "reverse transfer"), but it is typically less voluminous than that in the forward direction. The reverse direction typically exhibits worse characteristics than the forward direction.

DOWNSTREAM LINK: A link on the forward path, which has reduced capability in the reverse direction.

UPSTREAM LINK: The bottleneck link that normally has much less capability than the corresponding downstream link.

ACK: A cumulative TCP acknowledgment. In this document, we use this term to refer to a TCP segment that carries a cumulative acknowledgement, but no data.

DELAYED ACK FACTOR, d: The number of TCP data segments acknowledged by a TCP ACK.

STRETCH ACK: Stretch ACKs are acknowledgements that cover more than 2 segments of previously unacknowledged data (d>2).
Stretch ACKs can occur by design (although this is not standard), due to implementation bugs [All97b, RFC2525], or due to ACK loss [RFC2760].

NORMALISED BANDWIDTH RATIO, k: The ratio of the raw bandwidth of the forward direction to that of the reverse direction, divided by the ratio of the packet sizes used in the two directions.

SOFTSTATE: Per-flow state established in a network device which is used by the protocol. The state expires after a period of time (i.e. is not required to be explicitly deleted when a session expires), and is continuously refreshed while a flow continues (i.e. lost state may be reconstructed without needing to exchange additional control messages).

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3. Motivation

Asymmetric characteristics are exhibited by several network technologies, including cable modems, direct broadcast satellite with an interactive return channel, Very Small Aperture satellite Terminals (VSAT), Asymmetric Digital Subscriber Line (ADSL), Hybrid Fiber-Coaxial (HFC), and several packet radio networks. Given that these networks are increasingly being deployed as high-speed access networks, it is highly desirable to achieve good TCP performance over such networks. However, the asymmetry of the networks often makes this challenging.

Asymmetry may manifest itself as a difference in transmit and receive bandwidth, an imbalance in the packet loss rate, or differences between the transmit and receive paths [RFC3077]. For example, when bandwidth is asymmetric such that the reverse path used by TCP ACKs is constrained, the slow or infrequent ACK feedback degrades TCP performance in the forward direction.
Even when bandwidth is symmetric, asymmetry in the underlying medium access control (MAC) protocol could make it expensive to transmit ACKs (disproportionately to the size of the ACKs) in one direction. In a wireless packet radio network, the asymmetry of the MAC protocol is often a fundamental consequence of the hub-and-spokes architecture of the network (e.g., a single base station that communicates with multiple mobile stations) rather than an artifact of poor engineering choices. Satellite networks employing dynamic bandwidth on demand (BoD) are another example of systems that consume MAC resources for each packet sent; in such systems, transmitting an ACK can be as costly as transmitting a data packet. These MAC interactions may result in significant degradation of TCP performance.

Some networks are heterogeneous: the upstream and downstream links are implemented using different technologies. One commonly found example is a network employing a forward link that uses Digital Video Broadcast (DVB) transmission, and a much slower uplink (upstream link) using standard network technology (such as dial-up modem, line of sight microwave, or cellular radio). DVB-based networks are becoming increasingly common in commercial satellite systems, where the downstream link may typically operate at 38-45 Mbps, at least many hundreds of times the upstream capacity [CLC99].

Other specialized networks may also be highly asymmetric by design. Examples include some common military networks [KSG98], such as those used by NATO to provide Internet access to in-transit/isolated nodes and/or shipboard terminals [Seg00]. These use a high-bandwidth downstream link over high-power satellite links (at 3 Mbps and 2 Mbps) with a narrowband upstream link (9.6 kbps and 2400 bps) using UHF/DAMA or Inmarsat satellite links. The bandwidth
The bandwidth Expires November 2001 [page 3] INTERNET DRAFT PILC - Asymmetric Links May 2001 asymmetric ratios in these networks require mitigations to provide acceptable overall performance. Despite the technological differences between asymmetric-bandwidth and packet radio networks, TCP performance suffers in both these kinds of networks for the same fundamental reason: the imperfection and variability of ACK feedback. This document discusses the problem in detail and describes several techniques that may reduce or eliminate the constraints. 4. How does asymmetry degrade TCP performance? This section describes the implications of network asymmetry on TCP performance. We refer the reader to [BPK99, Bal98, Pad98, FSS01] for more details and experimental results. 4.1 Bandwidth asymmetry We first discuss the problems that degrade unidirectional transfer performance in bandwidth-asymmetric networks. Depending on the characteristics of the upstream link, two types of situations arise for unidirectional traffic over such networks: when the upstream bottleneck link has sufficient queuing to prevent packet (ACK) losses, and when the upstream bottleneck link has a small buffer. We consider each situation in turn. If the upstream bottleneck link has deep queues, so that ACKs do not get dropped in the reverse direction, then performance is a strong function of the normalized bandwidth ratio, k, defined in [LMS97]. k is the ratio of the raw bandwidths divided by the ratio of the packet sizes used in the two directions. For example, for a 10 Mbps downstream link and a 50 Kbps upstream link, the raw bandwidth ratio is 200. With 1000-byte data packets and 40-byte ACKs, the ratio of the packet sizes is 25. This implies that k is 200/25 = 8. Thus, if the receiver acknowledges more frequently than one ACK every k = 8 data packets, the upstream bottleneck link will become saturated before the downstream bottleneck link does, limiting the throughput in the forward direction. 
If ACKs are not dropped (at the upstream bottleneck link) and k > 1, or k > 2 when delayed ACKs are used [RFC1122] (since the receiver then sends only one ACK for every two data packets), TCP ACK-clocking breaks down. Consider two data packets transmitted by the sender in quick succession. En route to the receiver, these packets get spaced apart according to the bandwidth of the slowest (bottleneck) link in the forward direction. The principle of ACK clocking is that the ACKs generated in response to receiving these data packets preserve this temporal spacing all the way back to the sender, enabling it to transmit new data packets that maintain the same spacing [Jac88]. However, the limited upstream bandwidth and queuing at the upstream bottleneck router alter the inter-ACK spacing on the reverse path, and hence that observed at the sender. When ACKs arrive at the upstream bottleneck link at a faster rate than the link can support, they get queued behind one another. The spacing between them when they emerge from the link is dilated with respect to their original spacing, and is a function of the upstream bottleneck bandwidth. Thus the sender clocks out new data packets at a slower rate than if there had been no queuing of ACKs. The performance of the connection is no longer dependent on the downstream bottleneck link alone; instead, it is throttled by the rate of arriving ACKs. As a side effect, the sender's rate of congestion window growth also slows down.

A second side effect arises when the upstream bottleneck link on the reverse path is saturated. The saturated link causes persistent queuing of packets, leading to an increasing path RTT observed by all end systems using the bottleneck link. This can impact the protocol control loops, and may also trigger spurious timeouts (when the sending host has underestimated the path RTT).

A different situation arises when the upstream bottleneck link has a relatively small amount of buffer space to accommodate ACKs.
As the transmission window grows, this queue fills and ACKs are dropped. If the receiver were to acknowledge every packet, only one of every k ACKs would get through to the sender, and the remaining (k-1) would be dropped due to buffer overflow at the upstream link buffer (here k is the normalized bandwidth ratio as before). In this case, the reverse bottleneck link capacity and slow ACK arrival are not directly responsible for any degraded performance. However, the infrequent arrival of ACKs still degrades performance, for three important reasons:

1. First, the sender transmits data in large bursts of packets, limited only by the available congestion window (cwnd). If the sender receives only one ACK in k, it transmits data in bursts of k (or more) packets, because each ACK shifts the sliding window by at least k (acknowledged) data packets (more formally, TCP data segments). This increases the likelihood of data packet loss along the forward path, especially when k is large, because routers do not handle large bursts of packets well.

2. Second, TCP sender implementations increase their congestion window by counting the number of ACKs they receive and not by how much data is actually acknowledged by each ACK. Thus fewer ACKs imply a slower rate of growth of the congestion window, which degrades performance over long-delay connections.

3. Third, the sender TCP's Fast Retransmission and Fast Recovery algorithms [RFC 2581] are less effective when ACKs are lost. The sender may not receive the threshold number of duplicate ACKs even if the receiver transmits more than the DupACK threshold (> 3 DupACKs). Furthermore, the sender may not receive enough duplicate ACKs to adequately inflate its congestion window during Fast Recovery.

4.2 MAC protocol interactions

The interaction of TCP with MAC protocols may degrade end-to-end performance.
Variable round-trip delays and ACK queuing are the main symptoms of this problem.

One example is the impact on terrestrial wireless networks [Bal98]. The need for the communicating peers to first synchronize via a Ready To Send / Clear To Send (RTS/CTS) protocol before communication, and the significant turn-around time for the radios, result in a high per-packet overhead. Furthermore, this overhead is variable, since the RTS/CTS exchange needs to back off exponentially when the polled radio is busy (for example, engaged in a conversation with a different peer). This leads to large and variable communication latencies in packet-radio networks. In addition, an asymmetric workload (with most data flowing in one direction to clients) tends to cause ACKs to get queued in certain radio units (especially in the client modems), exacerbating the variable communication latencies.

These variable latencies and queuing of ACKs adversely affect smooth data flow. In particular, TCP ACK traffic interferes with the flow of data and increases the traffic load on the system.

For example, experiments conducted on Metricom's Ricochet packet radio network [Met] in 1996 and 1997 clearly demonstrated the effect of the radio turnarounds and increased RTT variability, which degrade TCP performance. It is not uncommon for TCP connections to experience timeouts that last between 9 and 12 seconds each. As a result, a connection may be idle for a very significant fraction of its lifetime. (We observed instances in the context of the Ricochet network where the idle time is 35% of the total transfer time!) Clearly, this leads to gross under-utilization of the available bandwidth. These observations are not an artifact of a particular network, but in fact show up in many wireless situations.

Why are these timeouts so long in duration? Ideally, the smoothed round-trip time estimate (srtt) of a TCP data transfer will be relatively constant (i.e., have a low linear deviation, rttvar).
Then the TCP retransmission timeout, set to srtt + 4*rttvar, will track the smoothed round-trip time estimate and respond well when multiple losses occur in a window. Unfortunately, this is not true for connections in the Ricochet network. Because of the high variability in RTT, the retransmission timer is on the order of 10 seconds, leading to the long idle timeout periods. In general, it is correct for the retransmission timer to trigger a segment retransmission only after an amount of time dependent on both the round-trip time and the linear (or standard) deviation. If only the mean or median round-trip estimates were taken into account, the potential for spurious retransmissions of data packets still in transit would be large. Connections traversing multiple wireless hops are especially vulnerable to this effect, because it is now more likely that the radio units are already engaged in conversation with other peers.

Satellite and wireless subnetworks often employ shared frequency channels and arbitrate use of the channel bandwidth by using Medium Access Control (MAC) protocols that employ a Bandwidth on Demand (BoD) scheme. Such links can also exhibit a per-packet transmission overhead (each packet sent consumes satellite resources that could otherwise be used to transfer useful data). For this reason, many Very Small Aperture satellite Terminals (VSATs) employ some form of asymmetry mitigation technique to optimise the overall system performance.

Note that the subnetwork MAC contention problem is a significant function of the number of packets (e.g., ACKs) transmitted rather than their size. In other words, there is a significant cost to transmitting a packet regardless of its size.

4.3 Bi-directional traffic

We now consider the case when TCP transfers simultaneously occur in opposite directions over an asymmetric network.
An example scenario is one in which a user sends out data packets in the reverse direction (for example, an e-mail message) while simultaneously receiving other data packets in the forward direction (for example, Web pages). For ease of exposition, we restrict our discussion to the case of one connection in each direction. In many practical cases, several simultaneous connections need to share the available bandwidth, increasing the level of congestion.

In the presence of bi-directional traffic, the effects discussed in Section 4.1 are more pronounced, because part of the upstream link bandwidth is used up by the reverse transfer. This effectively increases the degree of bandwidth asymmetry for the forward transfer.

In addition, there are other effects that arise due to the interaction between data packets of the reverse transfer and ACKs of the forward transfer. Suppose the reverse connection is initiated first and that it has saturated the bottleneck upstream link and buffer with its data packets at the time the forward connection is initiated. There is then a high probability that many ACKs of the newly initiated forward connection will encounter a full upstream link buffer and hence get dropped. Even after these initial problems, ACKs of the forward connection could often get queued up behind large data packets of the reverse connection, which could have long transmission times (e.g., it takes about 280 ms to transmit a 1 KB data packet over a 28.8 Kbps line). This causes the forward transfer to stall for long periods of time. It is only at times when the reverse connection loses packets (due to a buffer overflow at an intermediate router) and slows down that the forward connection gets the opportunity to make rapid progress and quickly build up its congestion window.
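The 280 ms figure, and the stall it causes, can be checked with a short calculation. This Python sketch assumes a FIFO upstream queue, takes 1 KB as 1000 bytes, and ignores link-layer overhead; the helper names are illustrative only. An ACK of the forward transfer that arrives behind n queued data packets of the reverse transfer must wait for all of them to be serialized onto the upstream link.

```python
def serialization_delay_s(packet_bytes, link_bps):
    """Time to clock one packet onto a link of link_bps bits per second."""
    return packet_bytes * 8 / link_bps


def ack_wait_behind_data_s(queued_data_packets, data_bytes, link_bps):
    """Delay seen by an ACK queued (FIFO) behind queued_data_packets
    reverse-direction data packets, excluding the ACK's own transmission."""
    return queued_data_packets * serialization_delay_s(data_bytes, link_bps)


if __name__ == "__main__":
    print(f"{serialization_delay_s(1000, 28_800) * 1000:.0f} ms")  # 278 ms
    print(f"{ack_wait_behind_data_s(4, 1000, 28_800):.2f} s")      # 1.11 s
```

Even a modest backlog of four reverse-direction packets delays each forward ACK by over a second on such a line, which is why the forward transfer can stall.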
When ACKs are queued behind other traffic for appreciable periods of time, the bursty nature of TCP traffic and self-synchronising effects can result in an effect known as ACK Compression [ZSC91], which reduces the throughput of TCP. It occurs when a series of ACKs in one direction are queued behind a burst of other packets (e.g. data packets traveling in the same direction) and become compressed in time. This, in turn, results in an intense burst of data packets in the other direction (in response to the burst of compressed ACKs arriving at the sender). This phenomenon has been investigated in detail for bi-directional traffic; recent analytical work [LMS97] has predicted that ACK Compression may also result from asymmetry, and it has been observed in practical asymmetric satellite networks [FSS01]. However, in the case of extreme asymmetry (k >> 1), the effect of ACK Dilation can be more significant than ACK Compression. That is, the inter-ACK spacing can actually increase due to queuing.

In summary, any sharing of the upstream bottleneck link by multiple flows (e.g. multiple TCP flows to the same host, or flows to a number of hosts sharing a common upstream link) increases the level of ACK Congestion. The presence of bi-directional traffic exacerbates the constraints introduced by bandwidth asymmetry because of the adverse interaction between (large) data packets of a reverse direction connection and the ACKs of a forward direction connection.

4.4 Forward path packet losses

In the case of long delay paths, another complication can arise due to the slow upstream link. In these types of networks, TCP large windows [RFC1323] may be used to maximise throughput in the forward direction.
Loss of data packets on the forward path, due to congestion or link loss (common for some wireless links), will generate a large number of back-to-back duplicate ACKs (or TCP SACK packets [RFC2018]), one for each correctly received data packet following a loss. The TCP sender employs Fast Retransmission to recover from the loss, but even if this is successful, the ACK to the retransmitted data segment may be significantly delayed by any duplicate ACKs still queued at the upstream link buffer. This can ultimately lead to a timeout, which can cause premature ending of TCP Slow Start or reduction of the congestion window, resulting in poor forward path throughput. Section 6.2.1 describes a mitigation to counter this effect.

5. Improving TCP Performance using Host Mitigations

There are two key issues that need to be addressed in order to improve TCP performance over asymmetric networks. The first issue is to manage the bandwidth of the upstream bottleneck link, used by ACKs (and possibly other traffic). A number of techniques exist which work by reducing the number of ACKs that flow in the reverse direction. This has the side effect of potentially destroying the desirable self-clocking property of the TCP sender, where transmission of new data packets is triggered by incoming ACKs. Thus, the second issue is to avoid any adverse impact of infrequent ACKs.

Each of these issues can be handled by local link-layer solutions and/or by end-to-end techniques. In this section, we discuss end-to-end modifications. Some techniques require TCP receiver changes (5.1, 5.4, 5.5), some require TCP sender changes (5.6, 5.7), and a pair requires changes to both the TCP sender and receiver (5.2, 5.3). One technique requires a sender modification at the receiving host (5.8).
The techniques may be used independently; however, some sets of techniques are complementary, for example, pacing (5.6) and byte counting (5.7), which have been bundled into a single TCP Sender Adaptation scheme [BPK97].

It is normally envisaged that these changes would occur in the end hosts using the asymmetric path; however, they could be (and have been) used in a middle-box or Performance Enhancing Proxy (PEP) employing split TCP. This document does not discuss the issues concerning PEPs (see [PEP-ID]). Section 6 describes several techniques which do not require end-to-end changes.

5.1 Modified Delayed ACKs

There are two standard methods that can be used by TCP receivers to generate acknowledgments. The method outlined in [RFC793] generates an ACK for each incoming data segment. [RFC1122] states that hosts SHOULD use "delayed acknowledgments". Using this algorithm, an ACK is generated for every second full-sized segment (d=2), or if a second full-sized segment does not arrive within a given timeout (which must not exceed 500 ms [RFC 1122], and is typically less than 200 ms). Relaxing the requirement to acknowledge at least every second full-sized segment (i.e. allowing d>2) may generate stretch ACKs [RFC2760]. This provides a possible mitigation, which reduces the rate at which ACKs are returned by the receiver. Reducing the number of ACKs per received data segment has a number of undesirable effects, including: (i) increased path RTT, (ii) an increase in the time TCP takes to open the cwnd, and (iii) increased TCP sender burst size, as the window opens in larger steps. In addition, a TCP receiver is often unable to determine an optimum setting for a large d, since it will normally be unaware of the details of the links that form the reverse path.

RECOMMENDATION: The algorithm recommended by [RFC2581] (i.e. d=2) MUST be used by a TCP receiver.
Changing the algorithm would require a host modification to the TCP receiver and awareness by the receiving host that it is using a connection with an asymmetric path. Such a change is the subject of on-going research and SHOULD NOT be used within the Internet.

5.2 Use of large MSS

If the TCP sender were to use a larger MSS, it would reduce the number of ACKs generated per transmitted byte of data [RFC2488]. The problem with this approach is that most current networks do not support arbitrarily large MTUs; in practice, many Internet paths use an MTU of approximately 1500 bytes (that of Ethernet). However, individual subnetworks may support a larger MTU. Path MTU Discovery [RFC 1191] may be used to determine the maximum packet size a connection can use on a given network path without being subjected to IP fragmentation, and provides a way to automatically use the largest MSS possible. By electing not to use Path MTU Discovery, IP fragmentation could be used to support a larger MSS. However, increasing the unit of error recovery and congestion control (MSS) above the unit of transmission (the IP packet) is not recommended, since it can aggravate network congestion [Ken88].

RECOMMENDATION: IP fragmentation by routers is NOT recommended. Network providers MAY use a large MTU on the links in the forward direction. A larger forward path MTU is desirable for paths with bandwidth asymmetry. TCP end hosts using Path MTU (PMTU) discovery may be able to take advantage of a large MTU by automatically selecting an appropriate larger MSS, without requiring modification.

5.3 ACK Congestion Control

ACK congestion control (ACC) is a proposed technique that operates end-to-end. The key idea in ACC is to extend congestion control to TCP ACKs, since they do make non-negligible demands on resources at the bandwidth-constrained upstream link.
ACKs occupy slots in the upstream buffer, whose capacity is often limited to a certain number of packets (rather than bytes).

ACC has two parts: (a) a mechanism for the network to indicate to the receiver that the ACK path is congested, and (b) the receiver's response to such an indication. One possibility for the former is the RED (Random Early Detection) algorithm [FJ93] in the router feeding the upstream bottleneck link. The router detects incipient congestion by tracking the average queue size over a time window in the recent past. If the average exceeds a threshold, the router selects a packet at random and marks it, i.e. sets an Explicit Congestion Notification (ECN) bit in the IP header. This notification is reflected back to the upstream TCP end-host by its downstream peer.

ACC extends the ECN scheme of IP so that both TCP data packets and ACKs carry the ECN-Capable Transport (ECT) indication and are thus candidates for being marked with an ECN bit. Therefore, when an ACK with the ECN bit set [RFC 2481] arrives at the TCP sender, the sender reflects this indication back to the TCP receiver, which then reduces the rate at which it sends ACKs. The receiver maintains a dynamically varying delayed-ACK factor, d, and sends one ACK for every d data packets received. When it receives such a congestion indication, it increases d multiplicatively, thereby decreasing the frequency of ACKs also multiplicatively. Then for each subsequent round-trip time (determined using the TCP timestamp option) during which it does not receive an ECN, it linearly decreases the factor d, thereby increasing the frequency of ACKs. Thus, the receiver mimics the standard congestion control behavior of TCP senders in the manner in which it sends ACKs.

There are bounds on the delayed ACK factor (d). The minimum value of d is 1 [RFC793], since at most one ACK should be sent for each data packet.
The maximum value of d is determined by the sender's window size, which is conveyed to the receiver in a new TCP option. The receiver should send at least one ACK (preferably more) for each window of data from the sender. Otherwise, it could cause the sender to stall until the receiver's delayed ACK timer kicks in and forces an ACK to be sent.

Despite RED+ECN, there may be times when the upstream link buffer queue fills up and it needs to drop a packet. The router can pick a packet to drop in various ways. For instance, it can drop from the tail of the queue, or it can drop a packet already in the queue at random.

RECOMMENDATION: ACK Congestion Control (ACC) requires TCP sender and receiver modifications. Future versions of TCP may evolve to include this or similar techniques. This scheme is a subject of on-going research and SHOULD NOT be used within the Internet in its current form.

5.4 Window Prediction Mechanism

The Window Prediction Mechanism (WPM) is a TCP receiver side end-to-end solution to asymmetric paths. This scheme [CLP98] uses a dynamic ACK delay factor resembling the ACC scheme (section 5.3). The TCP receiver reconstructs the congestion control behavior of the TCP sender by predicting a congestion window (cwnd) value. This value is used along with the allowed window to adjust the ACK delay factor (d). WPM also accounts for unnecessary retransmissions resulting from losses due to link errors.

RECOMMENDATION: Window Prediction Mechanism (WPM) is a TCP receiver side modification. Future versions of TCP may evolve to include this or similar techniques; however, this scheme is still a subject of on-going research and SHOULD NOT be used within the Internet in its current form.

5.5 Acknowledgement based on Cwnd Estimation
Acknowledgement based on Cwnd Estimation (ACE) [MJW00] tries to measure the congestion window (cwnd) at the TCP receiver and maintain a varying ACK delay factor (d). The cwnd is estimated by counting the number of packets received during a path RTT. The technique may improve the accuracy with which a suitable cwnd is predicted.

RECOMMENDATION: Acknowledgement based on Cwnd Estimation (ACE) is a TCP receiver side modification. Future versions of TCP may evolve to include this or similar techniques. This scheme is a subject of on-going research and SHOULD NOT be used within the Internet in its current form.

5.6 TCP Pacing

Reducing the frequency of ACKs alleviates congestion on the upstream bottleneck link, but can lead to increased size of TCP sender bursts (section 4.1). This may slow down congestion window growth, and is undesirable when used over shared network paths, since it may significantly increase the maximum number of packets in the bottleneck link buffer, potentially resulting in an increase in network congestion. Congestion may also lead to ACK Compression [ZSC91] under some conditions.

TCP Pacing [AST00] employs an adapted TCP sender to alleviate transmission burstiness. A bound is placed on the maximum number of packets the TCP sender can transmit back-to-back (at local line rate), even if the window(s) allow the transmission of more data. If necessary, more bursts of data packets are scheduled for later points in time computed based on the TCP connection's transmission rate. The transmission rate is estimated from the ratio cwnd/srtt, where cwnd is the TCP congestion window and srtt is the smoothed RTT estimate. Thus, large bursts of data packets get broken up into smaller bursts spread out over time. A subnetwork may also provide pacing (e.g. Generic Traffic Shaping (GTS)), but this implies a significant increase in the per-packet processing overhead and buffer requirement at the router where shaping is performed (see section 6.3.3).
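A minimal sketch of the sender-side scheduling described above is shown below in Python. It assumes the window already permits num_packets new segments (for example, after a stretch ACK), uses the rate estimate cwnd/srtt from the text, and introduces a hypothetical max_burst parameter for the bound on back-to-back packets. It illustrates the idea only and does not reproduce any specific published implementation.

```python
from collections import Counter


def paced_send_times(num_packets, cwnd, srtt_s, max_burst):
    """Return a send time (seconds from now) for each of num_packets segments.

    At most max_burst segments leave back-to-back; later bursts are delayed
    so that the long-run sending rate matches cwnd / srtt (packets/second).
    """
    if max_burst < 1 or cwnd <= 0 or srtt_s <= 0:
        raise ValueError("max_burst, cwnd and srtt_s must be positive")
    rate_pps = cwnd / srtt_s            # estimated transmission rate
    burst_gap_s = max_burst / rate_pps  # time to send one burst at that rate
    return [(i // max_burst) * burst_gap_s for i in range(num_packets)]


if __name__ == "__main__":
    # Hypothetical case: a stretch ACK opens the window by 8 segments,
    # with cwnd = 16 segments and srtt = 0.5 s.
    times = paced_send_times(8, cwnd=16, srtt_s=0.5, max_burst=2)
    print(times)  # [0.0, 0.0, 0.0625, 0.0625, 0.125, 0.125, 0.1875, 0.1875]
    print(max(Counter(times).values()))  # largest back-to-back burst: 2
```

In this example the peak back-to-back burst drops from 8 to 2 packets, at the cost of one timer event per burst; choosing max_burst reflects the processing-cost versus network-performance trade-off noted in the recommendation that follows.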
RECOMMENDATION: TCP Sender Pacing requires a TCP sender modification. It may be beneficial in IP networks and will significantly reduce the burst size of packets transmitted by a host. This successfully mitigates the impact of receiving stretch ACKs. TCP sender pacing requires an increased processing cost per packet and a prediction algorithm to suggest a suitable transmission rate; hence there are performance trade-offs between end-system cost and network performance. Suitable algorithms remain an area of on-going research. Use of TCP sender pacing is safe; it MAY be used by TCP hosts, but it is not currently widely deployed.

5.7 TCP Byte Counting

The TCP sender can avoid a slowdown in congestion window growth by taking into account the volume of data acknowledged by each ACK, rather than opening the window based on the number of received ACKs. So, if an ACK acknowledges d data packets (or TCP data segments), the congestion window would be grown as if d separate ACKs had been received. This is called TCP Byte Counting [RFC2581; RFC2760]. (One could treat the single ACK as being equivalent to d/2, instead of d ACKs, to mimic the effect of the TCP delayed ACK algorithm.) This policy works because the window growth is only tied to the available bandwidth in the forward direction, so the number of ACKs is immaterial.

This may mitigate the impact of asymmetry when used in combination with other techniques (e.g. a combination of TCP Pacing (section 5.6) and ACC (section 5.3), associated with a duplicate ACK threshold at the receiver).

There are issues associated with this approach. The main issue is that the scheme may generate undesirably long bursts of TCP packets at the host line rate. An implementation must also consider that data packets in the forward direction and ACKs in the reverse direction may both travel over networks that perform some amount of packet reordering.
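To make the difference between the two window-growth rules concrete, the sketch below compares them for a window-limited sender in slow start. It is a simplified model under stated assumptions: cwnd is counted in whole segments, each ACK acknowledges d >= 1 new segments, and the optional per-ACK cap limit is a hypothetical burst safeguard, not a parameter defined by [RFC2581] or [abc-ID]. The function and parameter names are illustrative.

```python
def slow_start(initial_cwnd, acks, byte_counting, limit=None):
    """Simulate slow-start growth of cwnd (in segments) for a list of ACKs.

    acks: for each arriving ACK, the number d >= 1 of new segments it covers.
    Returns (final_cwnd, bursts); bursts[i] is the number of segments a
    window-limited sender may send back-to-back when ACK i arrives: the d
    segments that left the network plus the growth in cwnd.
    """
    cwnd = initial_cwnd
    bursts = []
    for d in acks:
        if d < 1:
            raise ValueError("each ACK must cover at least one new segment")
        if byte_counting:
            # Grow as if d separate ACKs had arrived, optionally capped
            # at `limit` segments per ACK (hypothetical safeguard).
            increase = d if limit is None else min(d, limit)
        else:
            # ACK counting: one segment per ACK, however large d is.
            increase = 1
        cwnd += increase
        bursts.append(d + increase)
    return cwnd, bursts
```

For a 16-segment window acknowledged by two stretch ACKs with d=8, ACK counting grows cwnd only to 18, with bursts of d+1 = 9 segments; unlimited byte counting grows it to 32 but releases 16-segment bursts at line rate, which is the burst problem described above. With the hypothetical cap of 2 segments per ACK, cwnd reaches 20 with 10-segment bursts.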
Reordering of IP packets is currently common, and may arise from various causes [BPS00]. It is strongly recommended [RFC2581; RFC2760] that any byte counting scheme SHOULD also include a mechanism to prevent excessive transmission bursts (e.g. TCP Pacing (section 5.6), ABC [abc-ID]).

RECOMMENDATION: TCP Byte Counting requires a small TCP sender modification. The simplest modification may generate large bursts of TCP data packets, particularly when stretch ACKs are received. Unlimited byte counting SHOULD NOT be used within the Internet without a method to mitigate the potentially large bursts of TCP data packets the algorithm can cause. If the burst size or burst rate (e.g. by Pacing) of the TCP sender can be controlled, then the scheme is beneficial when stretch ACKs are received. Provided these safeguards are in place, it is not expected to significantly contribute to Internet congestion. Determining safe algorithms remains an area of on-going research.

5.8 Enhanced Backpressure

A technique to enhance the performance of bi-directional traffic has been proposed for hosts directly connected to the upstream bottleneck link [KVR98]. This scheme only applies to hosts implementing backpressure for the host transmit queue. It first limits the number of data packets in the outgoing upstream link queue by applying backpressure to the TCP layer. This limits the queuing delay caused by the accumulation of data packets at the upstream link queue. Similar generic schemes which may be implemented in hosts/routers are discussed in section 6.4.

Backpressure can be unfair to a reverse direction connection and make its throughput highly sensitive to the dynamics of the forward connection(s).

RECOMMENDATION: Backpressure requires a modification to the sender protocol stack of a host directly connected to an upstream bottleneck link.
Use of backpressure is an implementation issue, rather than a network protocol issue. Where backpressure is implemented, the optimizations described in this section could be desirable and can benefit bi-directional traffic for hosts. Enhanced backpressure is a subject of on-going research and SHOULD NOT be used within the Internet in its current form.

6. Improving TCP performance using Transparent Modifications

Various link and network layer techniques have been suggested to mitigate the effect of an upstream bottleneck link. These techniques may provide benefit without modification to either the TCP sender or receiver, or may alternately be used in conjunction with one or more of the schemes identified in section 5. In this document, these techniques are known as "transparent" because, at the transport layer, the TCP sender and receiver are not necessarily aware of their existence. This does not imply that they do not modify the pattern and timing of packets as observed at the network layer. The techniques are classified here into three types based on the point at which they are introduced.

Most techniques require the individual TCP connections to be separately identified and imply that some per-flow state is maintained for active TCP connections. A link scheduler may also be employed (section 6.4). The techniques (with one exception, ACK Decimation) require:

(i) Visibility of an unencrypted IP and TCP packet header (e.g. no use of IPSEC with Payload Encryption).
(ii) Knowledge of IP/TCP options/tunnels (or the ability to suspend processing of packets with unknown formats).
(iii) The ability to demultiplex flows (by using address/protocol/port number, or an explicit flow-id).

The approach of the techniques described in this section differs from that of a Performance Enhancing Proxy (PEP) [PEP-ID] in that they do NOT modify the end-to-end semantics, and do not inspect/modify any TCP or UDP payload data.
They also do not modify port numbers or link addresses. Many of the risks associated with PEPs do not exist for these schemes.

6.1 TYPE 0: Header Compression

A client may reduce the volume of bits carrying ACKs by using compression. Most modern dial-up modems support ITU-T V.42 compression. In contrast to bulk compression, header compression is known to be very effective at reducing the number of bits sent on the upstream link [RFC1144]. This relies on the observation that most TCP packet headers vary only in a few bit positions between successive packets in a flow, and that the variations can often be predicted.

6.1.1 TCP Header Compression

RFC 1144 [RFC1144] (sometimes known as V-J compression) describes TCP header compression for use over low-bandwidth links running SLIP or PPP. It greatly reduces the size of ACKs on the reverse link when losses are infrequent (a situation that ensures that the states of the compressor and decompressor remain synchronized). However, this alone does not address all of the problems:

1. In some (e.g. wireless) networks there is a significant per-packet MAC overhead that is independent of packet size.
2. A reduction in the size of ACKs does not prevent adverse interaction with large upstream data packets in the presence of bi-directional traffic (discussed in section 4.3).
3. TCP header compression cannot compress packets that carry IP or TCP options (including those used by IPSEC, TCP RTTM, TCP SACK, etc.).
4. The performance of header compression described by RFC 1144 is significantly degraded when packets are lost. This suggests the scheme should only be used on links that see a low level of packet loss.
5. The normal implementation of Header Compression inhibits compression when IP is used to support tunneling (e.g. L2TP, GRE, IP-in-IP). The tunnel encapsulation complicates locating the appropriate packet headers.
Although GRE allows Header Compression on the inner (tunneled) IP header [RFC2784], this is not recommended, since loss of a packet (e.g. due to router congestion along the tunnel path) will result in discard of all packets for one RTT [RFC1144].

RECOMMENDATION: TCP Header Compression is a transparent modification performed at both ends of the upstream bottleneck link. The technique benefits paths that have a low-to-medium bandwidth asymmetry (e.g. k<10). The scheme is widely implemented and deployed and MAY be used over an Internet link. In the form described in [RFC1144], it provides very poor performance when used over paths which may exhibit appreciable rates of packet loss. The scheme on its own does not provide significant improvement for links with bi-directional traffic. It also offers no benefit for packets employing IPSEC.

6.1.2 Alternate Robust Header Compression Algorithms

VJ Header compression [RFC1144] and IP header compression [RFC2507] do not perform well on error-prone links. Further, they do not compress packets with TCP option fields such as SACK [RFC-SACK] and Timestamp (RTTM). However, recent work on more robust schemes suggests that a new generation of compression algorithms may be developed which are much more robust. The IETF ROHC working group [rohc] is currently examining a number of schemes that may provide improved header compression and could be beneficial to asymmetric networks.

RECOMMENDATION: Robust header compression is a transparent modification performed at both ends of the upstream bottleneck link. Robust header compression techniques MAY be used over an Internet link. They benefit paths that have a low-to-medium bandwidth asymmetry and may be robust to packet loss. Selection of suitable compression algorithms remains an area of on-going research.
It is possible that schemes may be derived which support IPSEC authentication, but not IPSEC payload encryption. Such schemes do not alone provide significant improvement in asymmetric networks with bi-directional traffic.

6.2 TYPE 1: Reverse Link Bandwidth Management

To effectively address the performance problems caused by asymmetry, there is a need for techniques over and above TCP header compression. One set of techniques is implemented only at one point on the reverse direction path, within the router/host connected to the upstream bottleneck link. They use class or per-flow queues at the upstream link interface to manage the queue of packets waiting for transmission on the bottleneck upstream link. In this type of technique, the upstream link buffer queue size is bounded, and an algorithm is employed to remove (discard) excess ACK packets from each queue. Like the host modification to increase ACK delay (d>2), this relies on the cumulative nature of TCP ACKs. Two approaches are described which employ this type of mitigation.

6.2.1 ACK Filtering

ACK Filtering (AF) [DMT96, BPK97] (also known as ACK Suppression [SF98, Sam99, FSS01]) is a TCP-aware link-layer technique that reduces the number of TCP ACKs sent on the upstream link. The challenge is to ensure that the sender does not stall waiting for ACKs, which can happen if ACKs are removed indiscriminately on the reverse path. AF removes only certain ACKs without starving the sender by taking advantage of the fact that TCP ACKs are cumulative.

When an ACK from the receiver is about to be enqueued at an upstream link interface, the router or the end-host's link layer (if the host is directly connected to the bottleneck upstream link) checks the transmit queues for any older ACKs belonging to the same TCP connection.
If any are found, it removes some (or all) of them from the queue, thereby reducing the number of ACKs that go back to the TCP sender. The removal of these "redundant" ACKs frees up buffer space for other data and ACK packets. Some ACKs also have other functions in TCP [RFC1144], and should not be deleted, to ensure normal operation. AF should therefore not delete an ACK that carries any data or has TCP flags set (SYN, RST, URG, and FIN). In addition, it should avoid removing from the queue a series of 3 duplicate ACKs that indicate the need for Fast Retransmission [RFC2581], or TCP ACKs with the Selective ACK option (SACK) [RFC2018], to avoid causing problems to TCP's data-driven loss recovery mechanisms. Appropriate treatment is also needed to preserve correct operation of ECN feedback (carried in the TCP header). This technique has been deployed in specific production networks (e.g. asymmetric satellite networks [ASB96]).

A range of policies to filter ACKs may be used. These may be either deterministic or random (similar to a random-drop gateway, but taking the semantics of the items in the queue into consideration). Algorithms have also been suggested to ensure a minimum ACK rate to guarantee the sender's window is updated [Sam99, FSS01], and to limit the number of data packets (TCP segments) acknowledged by a stretch ACK. Per-flow state needs to be maintained only for connections with at least one packet in the queue (akin to FRED [LM97]). This state is soft [Cla88], and if necessary, can easily be reconstructed from the contents of the queue.

The undesirable effect of delayed DupACKs (section 4.4) can be reduced by deleting duplicate ACKs up to a threshold value [MJW00, CLP98], allowing Fast Retransmission but avoiding early timeouts, which may otherwise result from excessive queuing of DupACKs. Future schemes may include more advanced rules allowing removal of selected SACKs [RFC2018].
Such a scheme could prevent the upstream link queue from becoming filled by back-to-back ACKs with SACK blocks. Since a SACK packet is much larger than an ACK, it would otherwise add significantly to the reverse path delay. Selection of suitable algorithms remains an on-going area of research.

RECOMMENDATION: ACK Filtering requires a modification to the upstream link interface. It can benefit paths with any degree of bandwidth asymmetry. The scheme has been deployed in networks where the extra processing overhead (per ACK) may be compensated for by avoiding the need to modify TCP. At high asymmetry (k>>1) (or with bi-directional traffic) the scheme will increase the burst size of the TCP sender; use of a scheme to mitigate the effect of stretch ACKs or control TCP sender burst size is therefore recommended. The scheme MAY be used in the Internet, and has been deployed; however, a scheme to mitigate the effect of stretch ACKs or control burst size SHOULD be used in combination with ACK Filtering. Suitable algorithms to support IPSEC authentication, SACK, and ECN remain areas of on-going research.

6.2.2 ACK Decimation

ACK Decimation is based on standard router mechanisms. By using an appropriate configuration of (small) per-flow queues and a chosen scheduling and dropping policy (e.g. WFQ) at the upstream bottleneck link, an effect similar to AF (section 6.2.1) may be obtained, but with less control of the actual packets which are dropped. In this scheme, the router/host at the bottleneck upstream link maintains per-flow queues and services them fairly (or with priorities) by handling the queuing and scheduling of ACKs and data packets in the reverse direction. A small queue threshold is maintained to drop the excessive ACKs from the tail of the queue, in order to reduce ACK Congestion. The inability to identify the special ACK packets (cf.
AF) introduces some major drawbacks to this approach, such as the possibility of losing DupACKs, FIN/ACK, RST packets, or packets carrying ECN information. Loss of these packets does not significantly impact network congestion, but does adversely impact the performance of the TCP session observing the loss.

A WFQ scheduler may assign a higher priority to interactive traffic and provide a fair share of the remaining bandwidth to the data traffic. In the presence of bi-directional traffic, and with a suitable scheduling policy, this may ensure fairer sharing for ACK and data packets. An increased forward transmission rate is achieved over asymmetric links by an increased ACK Decimation rate, leading to generation of stretch ACKs. As in AF, TCP sender burst size increases when stretch ACKs are received unless other techniques are used in combination with this technique.

This technique has been deployed in specific networks. One example is a network with high bandwidth asymmetry that supports high-speed data services to in-transit mobile and shipboard hosts [Seg00]. It has proven to be workable, resulting in significant performance improvement for asymmetric transfers. Although not optimal, its use offered a potential mitigation which may be applicable even when the TCP header is difficult to identify or no longer visible (due to IPSEC encryption).

RECOMMENDATION: ACK Decimation uses standard router mechanisms at the upstream link interface to constrain the rate at which ACKs are fed to the upstream bottleneck link. At high asymmetry (k>>1) (or with bi-directional traffic) the scheme will increase the burst size of the TCP sender; use of a scheme to mitigate the effect of stretch ACKs or control burstiness is therefore recommended when this scheme is used.
The approach is, however, suboptimal, in that it may lead to inefficient TCP error recovery (and hence in some cases degraded TCP performance), and provides only crude control of the link behavior. It is therefore recommended that ACK Filtering be used in preference to ACK Decimation. The scheme is widely implemented and deployed and MAY be used over an Internet link; however, a scheme to mitigate the effect of stretch ACKs or control burst size SHOULD be used in combination with ACK Decimation.

6.3 TYPE 2: Handling Infrequent ACKs

TYPE 2 mitigations perform TYPE 1 upstream link bandwidth management, but also employ a second active element which mitigates the effect of the reduced ACK rate and bursty transmission. This is desirable when hosts use standard TCP sender implementations (e.g. those that do not implement the techniques in sections 5.6 and 5.7).

Consider a path where a TYPE 1 scheme forwards a stretch ACK covering d TCP packets (i.e. where the acknowledgement number is d*MSS larger than the last ACK received by the TCP sender). When the TCP sender receives this ACK, it can send a burst of d (or d+1) TCP data packets, since the sender congestion window increases by at most one segment, independent of the size of d.

A TYPE 2 scheme mitigates the impact of the reduced ACK frequency that results when a TYPE 1 scheme is used. This is achieved by interspersing additional ACKs before each received stretch ACK. The additional ACKs, together with the original ACK, provide the TCP sender with sufficient ACKs to allow the TCP cwnd to open in the same way as if each of the original ACKs sent by the TCP receiver had been forwarded by the reverse path. In addition, by attempting to restore the spacing between ACKs, such a scheme can also restore the TCP self-clocking behavior, and reduce the TCP sender burst size. Such schemes need to ensure conservative behavior (i.e.
SHOULD NOT introduce more ACKs than were originally sent) and reduce the probability of ACK Compression [ZSC91]. The action is performed at two points on the return path: the upstream link interface (where excess ACKs are removed), and a point further along the reverse path (after the bottleneck upstream link(s)) where replacement ACKs are inserted. This attempts to reconstruct the ACK stream sent by the TCP receiver when used in combination with AF (section 6.2.1) or ACK Decimation (section 6.2.2).

TYPE 2 mitigations can be deployed by Internet Service Providers (ISPs) of asymmetric access technologies. They may be performed locally at the receive interface directly following the upstream bottleneck link, or may alternatively be applied at any place further along the reverse path (note that this does not need to be on the forward path, since asymmetric routing may employ different forward and reverse Internet paths). Since the techniques may generate multiple ACKs upon reception of each individual stretch ACK, it is recommended that the expander implement a scheme to prevent exploitation as a "packet amplifier" in Denial of Service (DoS) attacks (e.g. by verifying the originator of the compacted ACK). Identification of the sender could be accomplished by appropriately configured packet filters or by tunnel encryption procedures. A limit on the number of reconstructed ACKs that may be generated from a single packet may also be desirable.

6.3.1 ACK Reconstruction

ACK Reconstruction (AR) [BPK97] is used in conjunction with AF (section 6.2.1). AR deploys a soft-state [Cla88] agent (e.g. a middle-box) called an ACK Reconstructor on the reverse path following the upstream bottleneck link. The soft state can easily be regenerated if lost, based on received ACKs. When a stretch ACK is received, AR introduces additional ACKs by filling in the gaps in the ACK sequence.
Some potential denial of service vulnerabilities may arise (see section 6.3) and need to be addressed by appropriate security techniques.

The reconstructor determines the number of additional ACKs by estimating the number of filtered ACKs. This uses implicit information present in the received ACK stream, by observing the ACK sequence number of each received ACK. An example implementation could set an ACK threshold, ack_thresh, to twice the TCP Maximum Segment Size (MSS) (the largest TCP data packet). The factor of two corresponds to the standard TCP delayed-ACK policy (d=2 at the TCP receiver). Thus, if successive ACKs arrive separated by delta_a, the reconstructor interposes ceil(delta_a/ack_thresh) - 2 ACKs, where ceil() is the ceiling operator.

To reduce the TCP sender burst size and allow the congestion window to increase at a rate governed by the downstream link, the reconstructed ACKs need to be sent at a consistent rate (i.e. with regular temporal spacing between reconstructed ACKs). One method is for the reconstructor to measure the rate at which ACKs arrive using an exponentially weighted moving average estimator. This rate depends on the output rate from the upstream bottleneck link and on the presence of other traffic sharing the link. The output of the estimator, delta_t, indicates the average temporal spacing of ACKs (and the average rate at which ACKs would reach the TCP sender if there were no further losses or delays). If the ACK reconstructor sets the temporal spacing of reconstructed ACKs (ack_interval) equal to delta_t, then it would operate at a rate governed by the upstream bottleneck link. If TCP sender adaptation were used (e.g. a combination of the techniques in sections 5.6 and 5.7), then the TCP sender would behave as if the acknowledged sequence number advanced at a rate of delta_a/delta_t.
A suitable ack_interval may be obtained by equating the rates at which increments in the ACK sequence occur in the two cases. That is, setting ack_interval so that delta_a/delta_t = ack_thresh/ack_interval, implying that ack_interval = (ack_thresh/delta_a)*delta_t. Therefore, the latest received stretch ACK is held back for a time of approximately delta_t.

AR trades a modest increase in the round-trip time estimate at the TCP sender for reduced TCP sender burstiness, a better rate of congestion window increase, and a reduction in round-trip variation. The technique cannot perform reconstruction on connections using IPSEC, since it is unable to regenerate appropriate security information. An ACK Reconstructor operates correctly (i.e. generates no spurious ACKs and preserves the end-to-end semantics of the connection), provided that the TCP receiver uses ACK Delay (d=2), the reconstructor receives only in-order ACKs, all ACKs are routed via the reconstructor, and the reconstructor correctly determines the TCP MSS used by the session.

RECOMMENDATION: ACK Reconstruction is a transparent modification performed on the reverse path following the upstream bottleneck link. It is designed to be used in conjunction with a TYPE 1 mitigation. It reduces the burst size of TCP transmission in the forward direction, which may otherwise increase when TYPE 1 schemes are used alone. The scheme requires modification of equipment after the upstream bottleneck link (including maintaining per-flow soft state). Selection of appropriate algorithms to pace the ACK traffic also remains an open research issue. This scheme is a subject of on-going research and SHOULD NOT be used within the Internet in its current form.
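The reconstructor arithmetic described in this section can be sketched per flow as below. This is an illustrative model, not the [BPK97] implementation. The class and method names are hypothetical, and the sketch assumes byte-based ACK numbers without wraparound, a d=2 receiver (ack_thresh = 2*MSS), an EWMA gain of 1/8 for delta_t, clamping of ceil(delta_a/ack_thresh) - 2 at zero, interposed ACK numbers placed at ack_thresh steps, and a cap max_regen on ACKs generated per received packet (the limit suggested in section 6.3). It does not coordinate schedules when stretch ACKs arrive faster than their reconstructed ACKs drain.

```python
import math


class AckReconstructor:
    """Per-flow soft state for a hypothetical ACK Reconstructor."""

    def __init__(self, mss, gain=0.125, max_regen=16):
        self.ack_thresh = 2 * mss   # assumes a d=2 delayed-ACK receiver
        self.gain = gain            # EWMA gain for delta_t (assumption)
        self.max_regen = max_regen  # cap on ACKs generated per packet
        self.last_ack = None        # highest ACK number seen so far
        self.last_time = None       # arrival time of the previous ACK
        self.delta_t = None         # smoothed ACK inter-arrival time

    def on_ack(self, ackno, now):
        """Process one ACK arriving at time `now` (seconds).

        Returns (send_time, ackno) pairs to forward to the TCP sender.
        """
        if self.last_ack is None:
            self.last_ack, self.last_time = ackno, now
            return [(now, ackno)]
        # EWMA estimate of the spacing of ACKs leaving the bottleneck.
        sample = now - self.last_time
        if self.delta_t is None:
            self.delta_t = sample
        else:
            self.delta_t += self.gain * (sample - self.delta_t)
        self.last_time = now
        delta_a = ackno - self.last_ack
        if delta_a <= 0:
            return [(now, ackno)]   # duplicate or old ACK: forward as-is
        prev_ack, self.last_ack = self.last_ack, ackno
        # ceil(delta_a/ack_thresh) - 2 ACKs, clamped at 0 and capped.
        n = min(max(0, math.ceil(delta_a / self.ack_thresh) - 2),
                self.max_regen)
        if n == 0:
            return [(now, ackno)]   # nothing to fill in: forward at once
        ack_interval = (self.ack_thresh / delta_a) * self.delta_t
        # Interposed ACKs advance from prev_ack in ack_thresh steps and
        # always stay below ackno, so no new data is acknowledged.
        out = [(now + i * ack_interval, prev_ack + i * self.ack_thresh)
               for i in range(1, n + 1)]
        # The stretch ACK itself follows, held back by roughly delta_t.
        out.append((now + (n + 1) * ack_interval, ackno))
        return out
```

For example, with an MSS of 1000 bytes, a stretch ACK that advances the sequence by 10000 bytes 100 ms after the previous ACK yields three interposed ACKs (2000, 4000 and 6000) spaced 20 ms apart, and the stretch ACK itself is released about 80 ms after it arrived, close to delta_t.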
ACK Reconstruction cannot perform reconstruction on connections using IPSEC (AH or encryption), since it is unable to generate appropriate security information. Some potential denial of service vulnerabilities may arise and need to be addressed by appropriate security techniques.

6.3.2 ACK Compaction/Companding

ACK Compaction and ACK Companding [Sam99, FSS01] also operate at a point on the reverse path following the constrained ACK bottleneck. Like AR (section 6.3.1), ACK Compaction and ACK Companding are both used in conjunction with an AF technique (section 6.2.1) and regenerate the filtered ACKs, restoring the ACK stream. They do, however, differ from AR in that the techniques use a modified AF (known as a compactor or compressor), in which explicit information is added to all stretch ACKs generated by the AF. This is used to explicitly synchronise the reconstruction operation (referred to as Expansion).

The modified AF combines two modifications: First, when the compressor deletes an ACK from the upstream bottleneck link queue, it appends explicit information to the remaining ACK (this ACK is marked to ensure it is not subsequently deleted). The additional information contains details of the conditions under which ACKs were previously filtered. A variety of information may be encoded in the compacted ACK. This includes the number of ACKs deleted by the AF and the average number of bytes acknowledged. This may subsequently be used by an expander at the remote end of the tunnel. Further timing information may also be added to control the pacing of the regenerated ACKs [FSS01].

Encoding the extra information requires the Expander to recognise a modified ACK header. This would normally limit the Expander to link-local operation (at the receive interface of the upstream bottleneck link).
If remote expansion is needed further along the reverse path, a tunnel may be used to pass the modified ACKs to the remote Expander. The tunnel introduces extra overhead; however, networks with asymmetric bandwidth and symmetric routing [RFC3077] frequently already employ such tunnels (e.g. in a UDLR network [RFC3077], the expander may be co-located with the feed router).

ACK expansion uses a stateless algorithm (i.e. each received packet is processed independently of previously received packets) to expand the ACK. It uses the prefixed information, together with the acknowledgment field in the received ACK, to produce an equivalent number of ACKs to those previously deleted by the compactor. These ACKs are forwarded to the original destination (i.e. the TCP sender), preserving normal TCP ACK clocking. In this way, ACK Compaction, unlike AR, is not reliant on specific ACK policies, nor must it see all ACKs associated with the reverse path (e.g. it may be compatible with schemes such as DAASS [RFC2760]).

As with AR, some potential denial of service vulnerabilities may arise (see section 6.3) and need to be addressed by appropriate security techniques. The technique cannot perform reconstruction on connections using IPSEC, since it is unable to regenerate appropriate security information. It is possible to explicitly encode IPSEC security information from suppressed packets, allowing operation with IPSEC AH; however, this remains an open research issue, and implies an additional overhead per ACK.

Other techniques in a similar vein to ACK Compaction have also been proposed [JSK99]. An ACK compressor concatenates multiple ACKs and sends them to the decompressor together with the arrival time of the concatenated ACKs at the queue. The decompressor/expander uses this information to reconstruct individual ACKs. These schemes enable more accurate regeneration of ACKs compared to AF/AR.
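No encoding for the added information is defined here, so the sketch below models it as two hypothetical fields carried by the kept ACK: the number of ACKs deleted and an average step in acknowledged bytes (the two items named earlier in this section). It shows only the information exchange: compact summarises a set of deleted ACK numbers in the ACK that remains, and expand regenerates ACKs statelessly from that single packet, with a per-packet cap. It assumes pure, in-order, distinct ACK numbers without wraparound. Regenerated ACK numbers are evenly interpolated, so they reproduce the kept ACK exactly and the deleted ones only approximately (exactly when the byte span divides evenly).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompactedAck:
    """Kept ACK plus hypothetical compaction fields (no wire format)."""
    ackno: int          # cumulative ACK number of the ACK that remains
    deleted: int = 0    # number of older ACKs removed by the compactor
    avg_bytes: int = 0  # average ACK-number step, oldest deleted to kept


def compact(deleted_acknos, kept_ackno):
    """Compactor side: record the deleted ACKs in the ACK that remains.

    Assumes pure, in-order ACKs with distinct numbers below kept_ackno.
    """
    if not deleted_acknos:
        return CompactedAck(kept_ackno)
    k = len(deleted_acknos)
    if len(set(deleted_acknos)) != k or max(deleted_acknos) >= kept_ackno:
        raise ValueError("deleted ACKs must be distinct and precede kept ACK")
    oldest = min(deleted_acknos)
    return CompactedAck(kept_ackno, deleted=k,
                        avg_bytes=(kept_ackno - oldest) // k)


def expand(ack, max_expand=8):
    """Expander side: regenerate ACKs from one packet, keeping no state.

    max_expand bounds how many ACKs one received packet can generate,
    limiting its use as a packet amplifier.
    """
    n = min(ack.deleted, max_expand)
    regenerated = [ack.ackno - i * ack.avg_bytes for i in range(n, 0, -1)]
    return regenerated + [ack.ackno]
```

For instance, compacting deleted ACKs 1000, 2000 and 3000 into a kept ACK of 4000 gives deleted=3 and avg_bytes=1000, and expansion restores 1000, 2000, 3000 and 4000. Because each regenerated number is strictly below the kept ACK, expansion never acknowledges data the receiver has not acknowledged.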
RECOMMENDATION: ACK Compaction/Companding are transparent modifications performed on the reverse path following the upstream bottleneck link. They are designed to be used in conjunction with a modified TYPE 1 mitigation and reduce the burst size of TCP transmission in the forward direction, which may otherwise increase when TYPE 1 schemes are used alone. The technique is desirable, but requires modification of equipment after the upstream bottleneck link (including processing of a modified ACK header). Selection of appropriate algorithms to pace the ACK traffic also remains an open research issue. The technique has not, at the time of writing, been widely deployed. This scheme is a subject of on-going research and SHOULD NOT be used within the Internet in its current form. Some potential denial of service vulnerabilities may arise and need to be addressed by appropriate security techniques.

6.3.3 Mitigating the TCP packet bursts generated by Infrequent ACKs

The bursts of data packets generated when a TYPE 1 scheme is used on the reverse path may be mitigated by introducing a router supporting Generic Traffic Shaping (GTS) on the forward path [Seg00]. GTS is a standard router mechanism implemented in many deployed routers. This technique does not eliminate the bursts of data generated by the TCP sender, but attempts to smooth out the bursts by employing scheduling and queuing techniques, producing traffic which resembles that produced when TCP Pacing is used (section 5.6). These techniques require maintaining per-flow soft state in the router, and increase per-packet processing overhead. Some additional buffer capacity is needed to queue packets being shaped.

To perform GTS, the router needs to select appropriate traffic shaping parameters, which require knowledge of the network policy, connection behavior and/or downstream bottleneck characteristics.
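The shaping algorithm used by GTS is not specified here. A token bucket is one common shaping model, and the sketch below assumes it: rate (bytes per second) and depth (bytes) stand in for the traffic shaping parameters mentioned above, and the class name TokenBucketShaper is hypothetical. It computes when each packet of one flow may leave the shaper, delaying (queuing) rather than dropping packets that exceed the profile.

```python
class TokenBucketShaper:
    """Token-bucket shaper for one flow: computes packet departure times.

    rate:  sustained rate in bytes per second (shaping parameter).
    depth: bucket size in bytes, i.e. the largest burst passed at once.
    Packets that find too few tokens are delayed (queued), not dropped.
    """

    def __init__(self, rate, depth):
        self.rate = float(rate)
        self.depth = float(depth)
        self.tokens = float(depth)   # bucket starts full
        self.clock = 0.0             # time at which self.tokens is valid

    def depart(self, arrival, size):
        """Return the departure time of a packet of `size` bytes."""
        if size > self.depth:
            raise ValueError("packet larger than bucket depth")
        # A packet cannot leave before it arrives or before earlier packets.
        t = max(arrival, self.clock)
        # Refill tokens for the time elapsed, up to the bucket depth.
        self.tokens = min(self.depth, self.tokens + (t - self.clock) * self.rate)
        if self.tokens < size:
            # Wait until enough tokens have accumulated.
            t += (size - self.tokens) / self.rate
            self.tokens = float(size)
        self.tokens -= size
        self.clock = t
        return t
```

With rate=1000 and depth=2000, a burst of five 1000-byte packets arriving together leaves as two back-to-back packets followed by one packet per second (departures at 0, 0, 1, 2 and 3 s), the kind of smoothing attributed to GTS above. Choosing rate and depth is exactly the parameter-selection problem this section identifies.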
GTS may also be used to enforce other network policies and promote fairness between competing TCP connections (and also UDP and multicast flows). It also reduces the probability of ACK Compression [ZSC91].

The smoothing of packet bursts reduces the impact of the TCP transmission bursts on routers and hosts following the point at which GTS is performed. It is therefore desirable to perform GTS near to the sending host, or at least at a point before the first forward path bottleneck router.

RECOMMENDATION: Generic Traffic Shaping (GTS) is a transparent technique employed at a router on the forward path. The algorithms to implement GTS are available in widely deployed routers and MAY be used on an Internet link, but do imply significant additional per-packet processing cost. When correctly configured, they reduce the size of TCP data packet bursts, mitigating the effects of TYPE 1 techniques.

6.4 TYPE 3: Upstream Link Scheduling

Many of the above schemes imply using per-flow queues (or per-connection queues in the case of TCP) at the upstream bottleneck link. Per-flow queuing (e.g. FQ, CBQ) offers benefit when used on any slow link (where the time to transmit a packet forms an appreciable part of the path round trip time, RTT). Since the upstream link of an asymmetric bandwidth network is usually slow, TYPE 3 schemes offer additional benefit when used with one of the above techniques.

6.4.1 Per-Flow queuing at the upstream bottleneck link

When bi-directional traffic exists in a bandwidth asymmetric network, competing ACK and packet data traffic in the return path may degrade the performance of both upstream and downstream flows [KVR98]. Therefore, it is highly desirable to use a queuing strategy combined with a scheduling mechanism at the upstream link. On a slow upstream link, appreciable jitter may be introduced by sending large data packets ahead of ACKs.
A simple scheme may be implemented using per-flow queuing with a fair scheduler (e.g. round robin service to all flows, or priority scheduling). A modified scheduler [KVR98] could limit the number of ACKs a host is allowed to transmit upstream before it transmits a data packet (assuming at least one data packet is waiting in the upstream link queue). This guarantees the reverse connection at least a certain minimum share of the bandwidth, while enabling the forward direction connection(s) to achieve high throughput.

A smaller MTU, link-level transparent fragmentation [RFC1990, RFC2686], or a link-level suspend/resume capability (where higher priority frames may pre-empt transmission of lower priority frames) may be used to mitigate the impact (jitter) of bi-directional traffic on low speed links. More advanced schemes (e.g. WFQ) may also be used to improve the performance of transfers with multiple ACK streams, such as HTTP [Seg00].

RECOMMENDATION: Per-flow queuing is a transparent modification performed at the upstream bottleneck link. Per-flow (or per-class) scheduling does not impact the congestion behavior of the Internet, and MAY be used on any Internet link. The scheme has particular benefits for slow links. It is widely implemented and widely deployed on links operating at less than 2 Mbps. It is recommended as a mitigation on its own or in combination with one of the other techniques outlined here.

6.4.2 ACKs-first Scheduling

In the case of bi-directional transfers, data and ACK packets compete for resources over the shared upstream bottleneck link. A single FIFO queue for both data packets and ACKs could impact the performance of forward transfers. For example, if the upstream bottleneck link is a 28.8 Kbps dialup line, the transmission of a 1 KB data packet would take about 280 ms.
So even if just two such data packets are queued ahead of ACKs (not an uncommon occurrence, since data packets are sent out in pairs during slow start), they would shut out ACKs for well over half a second. If more than two data packets are queued ahead of an ACK, the ACK would be delayed even further.

A possible approach to alleviating this is to schedule data and ACKs differently from FIFO. One algorithm in particular is ACKs-first scheduling, which always accords ACKs a higher priority than data packets. The motivation for such scheduling is that it minimizes the idle time of the forward connection by minimizing the time that ACKs spend queued behind data packets at the upstream link. At the same time, with Type 0 techniques such as header compression [RFC1144], the transmission time of ACKs becomes small enough that the impact on subsequent data packets is minimal. (Networks in which the per-packet overhead of the upstream link is large, e.g. packet radio networks, are an exception.) This scheduling scheme does not require the upstream bottleneck router/host to explicitly identify or maintain state for individual TCP connections.

ACKs-first scheduling does not avoid the delay caused by a data packet that is already in transmission. Link fragmentation or suspend/resume may be beneficial in this case.

RECOMMENDATION: ACKs-first scheduling is a transparent modification performed at the upstream bottleneck link. If it is used without a mechanism to regulate the volume of ACKs (such as ACK Congestion Control (ACC)), it could lead to starvation of data packets. This is a performance penalty experienced by hosts using the link and does not modify Internet congestion behavior. Experiments indicate that ACKs-first scheduling in combination with ACC is promising. However, further development of the technique remains an open research issue, and therefore the scheme SHOULD NOT be used within the Internet in its current form.

7. Security Considerations

The authors believe the recommendations contained in this memo do not affect the security of the TCP protocol or of applications using TCP. Some security considerations in the context of this Internet Draft arise from the use of IPSEC by the end hosts or by routers operating along the return path. Use of IPSEC prevents, or complicates, some of the mitigations:

1. When IPSEC ESP is used to encrypt the IP payload, the TCP header can neither be read nor modified by intermediate entities. This rules out header compression, ACK Filtering, ACK Reconstruction, and the current form of ACK Compaction.

2. With IPSEC AH or TF-ESP, the TCP header can be read, but not modified, by intermediaries. This rules out ACK Reconstruction, but may in future allow extensions to support ACK Filtering and ACK Compaction. The enhanced header compression scheme discussed in [RFC2507] would also work with AH.

There are potential Denial of Service (DoS) implications when using Type 2 schemes. Unless additional security mechanisms are used, a reconstructor/expander could be exploited as a packet amplifier. A third party may inject unauthorized stretch ACKs into the reverse path, triggering the generation of additional ACKs. These ACKs would consume capacity on the return path and processing resources at the systems along the path, including the destination host. This provides a potential platform for a DoS attack. The usual precautions must be taken to verify the correct tunnel end point, and to ensure that applications cannot falsely inject packets that expand to generate unwanted traffic. Imposing a rate limit and a bound on the distance (d) would also lessen the impact of any undetected exploitation.

8. Summary

This document considers several TCP performance constraints that arise from asymmetry in the properties of the forward and reverse paths of an IP network. Such performance constraints arise as a result of both asymmetry in bandwidth and interactions with media access control (MAC) protocols. Asymmetric bandwidth provision may cause TCP ACKs to be lost or to become inordinately delayed (e.g., when a bottleneck link is shared between many flows, or when there is bi-directional traffic). This effect may be exacerbated by media-access delays (e.g., in certain multi-hop radio networks, or with satellite BoD access). Asymmetry, and particularly high asymmetry, raises a set of TCP performance issues.

A set of techniques providing performance improvement is surveyed. These include techniques to alleviate ACK congestion and techniques that enable a TCP sender to cope with infrequent ACKs without destroying TCP self-clocking. The techniques include end-to-end, local link-layer, and subnetwork schemes. Many of these techniques have been evaluated in detail via analysis, simulation, and/or implementation on asymmetric subnetworks forming part of the Internet.

The authors' recommendations for end-to-end host modifications are summarised in Table 1 below. The table lists each technique, the section in which it is discussed, and where it is applied (S denotes the host sending TCP data packets in the forward direction, R denotes the host which receives these data packets).

+------------------------+-------------+------------+--------+
| Technique              | Use         | Section    | Where  |
+------------------------+-------------+------------+--------+
| Modified Delayed ACKs  | NOT REC     | 5.1        | TCP R  |
| Large MSS & IP FRAG    | NOT REC     | 5.2        | TCP S  |
| Large MSS & Large MTU  | MAY         | 5.2        | TCP S  |
| ACK Congestion Control | NOT REC     | 5.3        | TCP SR |
| Window Pred. Mech (WPM)| NOT REC     | 5.4        | TCP R  |
| Window Cwnd. Est. (ACE)| NOT REC     | 5.5        | TCP R  |
| TCP Sender Pacing      | MAY         | 5.6        | TCP S  |
| Byte Counting          | NOT REC *1  | 5.7        | TCP S  |
| Enhanced Backpressure  | NOT REC     | 5.8        | TCP R  |
+------------------------+-------------+------------+--------+
Table 1: Recommendations concerning host modifications.

*1 Dependent on a scheme for preventing excessive TCP transmission bursts.

The remaining techniques are known in this document as "transparent", because at the transport layer the TCP sender and receiver are not necessarily aware of their existence. This does not imply that they do not modify the pattern and timing of packets as observed at the network layer.

The recommendations for these transparent techniques are summarised in Table 2 below. For each mechanism, the table lists the section in which it is discussed, its type, and where it is applied (S denotes the sending interface prior to the upstream bottleneck link, R denotes the receiving interface following the upstream bottleneck link).

+------------------------+-------------+------------+--------+
| Mechanism              | Use         | Section    | Type   |
+------------------------+-------------+------------+--------+
| Header Compr. (V-J)    | MAY         | 6.1.1      | 0 SR   |
| Header Compr. (ROHC)   | MAY *1      | 6.1.2      | 0 SR   |
+------------------------+-------------+------------+--------+
| ACK Filtering (AF)     | See REC *2  | 6.2.1      | 1 S    |
| ACK Decimation         | See REC *2  | 6.2.2      | 1 S    |
+------------------------+-------------+------------+--------+
| ACK Reconstruction (AR)| NOT REC     | 6.3.1      | 2 *3   |
| ACK Compaction/Compand.| NOT REC     | 6.3.2      | 2 S *3 |
| Gen. Traff. Shap. (GTS)| MAY         | 6.3.3      | 2 *4   |
+------------------------+-------------+------------+--------+
| Fair Queueing (FQ)     | MAY         | 6.4.1      | 3 S    |
| ACK First Scheduling   | NOT REC     | 6.4.2      | 3 S    |
+------------------------+-------------+------------+--------+
Table 2: Recommendations concerning transparent modifications.
*1 Standardisation of new compression protocols is the subject of on-going work within the ROHC WG.

*2 Dependent on a scheme for preventing excessive TCP transmission bursts. Refer to the recommendation in the relevant section.

*3 Performed at a point along the reverse path after the upstream bottleneck link.

*4 Performed at a point along the forward path.

The techniques proposed in this document differ from those used in Performance Enhancing Proxies (PEP) [PEP-ID] in that they do NOT seek to modify the end-to-end semantics, and do not inspect or modify any TCP or UDP payload data. They also do not modify the port numbers or addresses of packets. Many of the risks associated with PEP therefore do not exist for such schemes.

9. Acknowledgments

This document has benefited from comments from the members of the Performance Implications of Link Characteristics (PILC) Working Group. In particular, we would like to thank Ramon Segura, Rod Ragland, Spencer Dawkins and Aaron Falk for their useful comments about this document.

10. References

[abc-ID] Allman, M., draft-allman-tcp-abc-00.txt, Internet Draft, WORK IN PROGRESS.

[All97b] Allman, M., "Fixing Two BSD TCP Bugs", Technical Report CR-204151, NASA Lewis Research Center, October 1997.

[ASB96] Arora, V., Suphasindhu, N., Baras, J.S., and D. Dillon, "Asymmetric Internet Access over Satellite-Terrestrial Networks", Proc. AIAA: 16th International Communications Satellite Systems Conference and Exhibit, Part 1, Washington, D.C., February 25-29, 1996, pp.476-482.

[AST00] Aggarwal, A., Savage, S., and T. Anderson, "Understanding the Performance of TCP Pacing", Proc. IEEE INFOCOM, Tel-Aviv, Israel, V.3, March 2000, pp.1157-1165.

[Bal98] Balakrishnan, H., "Challenges to Reliable Data Transport over Heterogeneous Wireless Networks", Ph.D. Thesis, University of California at Berkeley, USA, August 1998. http://www.cs.berkeley.edu/~hari/thesis/

[BPK97] Balakrishnan, H., Padmanabhan, V.N., and R.H. Katz, "The Effects of Asymmetry on TCP Performance", Proc. ACM/IEEE MOBICOM, Budapest, Hungary, September 1997, pp.77-89.

[BPK99] Balakrishnan, H., Padmanabhan, V.N., and R.H. Katz, "The Effects of Asymmetry on TCP Performance", ACM Mobile Networks and Applications (MONET), Vol.4, No.3, 1999, pp.219-241. An expanded version of a paper published at Proc. ACM/IEEE MOBICOM '97.

[BPS00] Bennett, J.C., Partridge, C., and N. Schectman, "Packet Reordering is Not Pathological Network Behaviour", IEEE/ACM Transactions on Networking, Vol.7, 2000, pp.789-798.

[Cla88] Clark, D.D., "The Design Philosophy of the DARPA Internet Protocols", Proc. ACM SIGCOMM, Stanford, CA, 1988, pp.106-114.

[CLC99] Clausen, H., Linder, H., and B. Collini-Nocker, "Internet over Broadcast Satellites", IEEE Commun. Mag., 1999, pp.146-151.

[CLP98] Calveras, A., Linares, J., and J. Paradells, "Window Prediction Mechanism for Improving TCP in Wireless Asymmetric Links", Proc. IEEE GLOBECOM, Sydney, Australia, November 1998, pp.533-538.

[CR98] Cohen, R., and S. Ramanathan, "TCP for High Performance in Hybrid Fiber Coaxial Broad-Band Access Networks", IEEE/ACM Transactions on Networking, Vol.6, No.1, February 1998, pp.15-29.

[DMT96] Durst, R., Miller, G., and E. Travis, "TCP Extensions for Space Communications", Proc. ACM MOBICOM, New York, USA, November 1996, pp.15-26.

[FJ93] Floyd, S., and V. Jacobson, "Random Early Detection gateways for Congestion Avoidance", IEEE/ACM Transactions on Networking, Vol.1, No.4, August 1993, pp.397-413.

[FSS01] Fairhurst, G., Samaraweera, N.K.G., Sooriyabandara, M., Harun, H., Hodson, K., and R. Donardio, "Performance Issues in Asymmetric Service Provision using Broadband Satellite", IEE Proc. Commun., Vol.148, No.2, April 2001, pp.95-99.

[Jac88] Jacobson, V., "Congestion Avoidance and Control", Proc. ACM SIGCOMM, Stanford, CA, CCR Vol.18, No.4, August 1988, pp.314-329.
[JSK99] Johansson, G.L., Shakargi, H., Kanljung, C., and J. Kullander, "ACKNOWLEDGEMENT Compression Rev B", Technical Report, December 1999.

[Ken87] Kent, C.A., and J.C. Mogul, "Fragmentation Considered Harmful", Proc. ACM SIGCOMM, USA, CCR Vol.17, No.5, 1987, pp.390-401.

[KSG98] Krout, T., Solsman, M., and J. Goldstein, "The Effects of Asymmetric Satellite Networks on Protocols", Proc. IEEE MILCOM, Bradford, MA, USA, Vol.3, 1998, pp.1072-1076.

[KVR98] Kalampoukas, L., Varma, A., and K.K. Ramakrishnan, "Improving TCP Throughput over Two-Way Asymmetric Links: Analysis and Solutions", Proc. ACM SIGMETRICS, Madison, USA, 1998, pp.78-89.

[LM97] Lin, D., and R. Morris, "Dynamics of Random Early Detection", Proc. ACM SIGCOMM, Cannes, France, CCR Vol.27, No.4, 1997, pp.78-89.

[LMS97] Lakshman, T.V., Madhow, U., and B. Suter, "Window-based Error Recovery and Flow Control with a Slow Acknowledgement Channel: A Study of TCP/IP Performance", Proc. IEEE INFOCOM, Vol.3, Kobe, Japan, 1997, pp.1199-1209.

[Met] Metricom Inc., http://www.metricom.com

[MJW00] Ming-Chit, I.T., Jinsong, D., and W. Wang, "Improving TCP Performance Over Asymmetric Networks", ACM SIGCOMM CCR, Vol.30, No.3, 2000.

[Pad98] Padmanabhan, V.N., "Addressing the Challenges of Web Data Transport", Ph.D. Thesis, University of California at Berkeley, USA, September 1998 (also Tech Report UCB/CSD-98-1016). http://www.research.microsoft.com/~padmanab/phd-thesis.html

[PEP-ID] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. Shelby, "Performance Enhancing Proxies That Improve Link-Related Degradations", draft-ietf-pilc-pep-06.txt, Internet Draft, WORK IN PROGRESS.

[rohc] Robust Header Compression Working Group IETF, http://www.ietf.org/html.charters/rohc-charter.html.

[RFC1144] Jacobson, V., "Compressing TCP/IP Headers for Low-Speed Serial Links", RFC1144.

[RFC1990] Sklower, K., Lloyd, B., McGregor, G., Carr, D., and T. Coradetti, "The PPP Multilink Protocol (MP)", RFC1990.

[RFC2026] Bradner, S., "The Internet Standards Process -- Revision 3", RFC2026.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", RFC2119.

[RFC2481] Ramakrishnan, K., and S. Floyd, "A Proposal to add Explicit Congestion Notification (ECN) to IP", Experimental RFC2481.

[RFC2507] Degermark, M., Nordgren, B., and S. Pink, "IP Header Compression", RFC2507.

[RFC2525] Paxson, V., Allman, M., Dawson, S., Heavens, I., and B. Volz, "Known TCP Implementation Problems", RFC2525.

[RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion Control", RFC2581.

[RFC2686] Bormann, C., "The Multi-Class Extension to Multi-Link PPP", RFC2686.

[RFC2760] Allman, M., Dawkins, S., Glover, D., Griner, J., Henderson, T., Heidemann, J., Kruse, H., Ostermann, S., Scott, K., Semke, J., Touch, J., and D. Tran, "Ongoing TCP Research Related to Satellites", RFC2760.

[RFC3077] Duros, E., Dabbous, W., Izumiyama, H., Fujii, N., and Y. Zhang, "A Link-Layer Tunneling Mechanism for Unidirectional Links", RFC3077.

[Sam99] Samaraweera, N.K.G., "Return Link Optimization for Internet Service Provision Using DVB-S Networks", ACM CCR, Vol.29, No.3, 1999, pp.4-19.

[Seg00] Segura, R., "Asymmetric Networking Techniques For Hybrid Satellite Communications", NC3A, The Hague, Netherlands, NATO Technical Note 810, August 2000, pp.32-37.

[SF98] Samaraweera, N.K.G., and G. Fairhurst, "High Speed Internet Access using Satellite-based DVB Networks", Proc. IEEE International Networks Conference (INC98), Plymouth, UK, 1998, pp.23-28.

[ZSC91] Zhang, L., Shenker, S., and D.D. Clark, "Observations on the Dynamics of a Congestion Control Algorithm: The Effects of Two-Way Traffic", Proc. ACM SIGCOMM, 1991, pp.133-147.

11. Authors' Addresses

Hari Balakrishnan
Laboratory for Computer Science
200 Technology Square
Massachusetts Institute of Technology
Cambridge, MA 02139
USA
Phone: +1-617-253-8713
Fax: +1-617-253-0147
Email: hari@lcs.mit.edu
Web: http://nms.lcs.mit.edu/~hari/

Venkata N. Padmanabhan
Microsoft Research
One Microsoft Way
Redmond, WA 98052
USA
Phone: +1-425-705-2790
Fax: +1-425-936-7329
Email: padmanab@microsoft.com
Web: http://www.research.microsoft.com/~padmanab/

Godred Fairhurst
Department of Engineering
Fraser Noble Building
University of Aberdeen
Aberdeen AB24 3UE
UK
Fax: +44-1224-272497
Email: gorry@erg.abdn.ac.uk
Web: http://www.erg.abdn.ac.uk/users/gorry

Mahesh Sooriyabandara
Department of Engineering
Fraser Noble Building
University of Aberdeen
Aberdeen AB24 3UE
UK
Fax: +44-1224-272497
Email: mahesh@erg.abdn.ac.uk
Web: http://www.erg.abdn.ac.uk/users/mahesh

Full Copyright Statement

Copyright (C) The Internet Society (2001). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

12. IANA Considerations

There are no IANA considerations associated with this draft.