INTERNET-DRAFT
Thomas Narten
IBM
June 1999

Privacy Extensions for Stateless Address Autoconfiguration in IPv6

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

Nodes use IPv6 stateless address autoconfiguration to generate addresses without the necessity of a DHCP server. Addresses are formed by combining network prefixes with a constant interface identifier derived from the interface's IEEE Identifier. This document describes an optional extension to IPv6 stateless address autoconfiguration that results in a node generating addresses from an interface identifier that changes over time. Changing the interface identifier over time makes it more difficult for eavesdroppers and other information collectors to identify when different addresses used in different transactions actually correspond to the same node.

draft-ietf-ipngwg-addrconf-privacy-00.txt [Page 1]
INTERNET-DRAFT June 24, 1999

Contents

Status of this Memo
1. Introduction
2. Background
3. Protocol Description
4. Implications of Changing Interface Identifiers
5.
Open Issues and Future Work
6. Security Considerations
7. References
8. Authors' Addresses

1. Introduction

Stateless address autoconfiguration [ADDRCONF] defines how an IPv6 node generates addresses without the need for a DHCP server. Network interfaces typically come with an embedded IEEE Identifier (i.e., a link-layer MAC address), and stateless address autoconfiguration uses the IEEE identifier to generate a 64-bit interface identifier [ADDRARCH]. By design, the interface identifier will typically be globally unique. The interface identifier is in turn appended to a prefix to form a 128-bit IPv6 address.

All nodes use this technique to generate link-local addresses for their attached interfaces. Additional addresses, including site-local and global-scope addresses, are then created by combining prefixes advertised in Router Advertisements via Neighbor Discovery [DISCOVERY] with the interface identifier. As mobile devices (e.g., laptops, PDAs, etc.) move topologically, they form new addresses for their current topological point of attachment. While the node's address changes as it moves, however, the interface identifier contained within the address remains the same.

Because the interface identifier associated with a node can potentially remain fixed for a long period of time (e.g., months or years), concern has been voiced that the interface identifier could in some cases be used to track the movement and usage of a particular machine. For example, a server that logs the source addresses of incoming connections would simultaneously collect identical information keyed on the interface identifier, allowing one to correlate activities based on interface identifiers in addition to addresses.
This is of particular concern with the expected proliferation of next-generation network-connected devices (e.g., PDAs, cell phones, etc.) in which large numbers of devices are in practice associated with a single user. Thus, the interface identifier embedded within an address could be used to track activities of an individual.

This document discusses concerns associated with the embedding of interface identifiers within IPv6 addresses and describes optional extensions to stateless address autoconfiguration that can help mitigate those concerns in environments where such concerns are significant.

2. Background

This section discusses the problem in more detail, provides context for evaluating the significance of the concerns in specific environments, and compares them with existing practices.

2.1. Extended Use of the Same Identifier

The use of a non-changing interface identifier to form addresses is a specific instance of the more general case where a constant identifier is reused over an extended period of time and in multiple independent activities. Any time the same identifier is used in multiple contexts, it becomes possible for that identifier to be used to correlate seemingly unrelated activity. For example, a network sniffer placed strategically on a link that carries all traffic to and from a particular host could keep track of which destinations a node communicated with and at what times. Such information can in some cases be used to infer things, such as what hours an employee was active, when someone is at home, etc.

One of the requirements for correlating seemingly unrelated activities is the use (and reuse) of an identifier that is recognizable over time within different contexts. IP addresses provide one obvious example, but there are more. Many nodes also have DNS names associated with their addresses, in which case the DNS name serves as a similar identifier.
Although the DNS name associated with an address is more work to obtain (it may require a DNS query), the information is often readily available. In such cases, changing the address on a machine over time would do little to address the concern raised in this document, as the DNS name would become the correlating identifier.

The use of a constant identifier within an address is of special concern because addresses are a fundamental requirement of communication and cannot easily be hidden from eavesdroppers and other parties. Even when higher layers encrypt their payloads, addresses in packet headers appear in the clear. Consequently, if a mobile host (e.g., laptop) accessed the network from several different locations, an eavesdropper might be able to track the movement of that mobile host from place to place, even if the upper layer payloads were encrypted [SERIALNUM].

2.2. Not a New Issue

Although the topic of this document may at first appear to be an issue new to IPv6, similar issues already exist in today's Internet. That is, addresses used in today's Internet are often constant in practice for extended periods of time. In many sites, addresses are assigned statically; such addresses typically change infrequently. However, many sites are moving away from static allocation to dynamic allocation via DHCP. In theory, the address a client gets via DHCP can change over time, but in practice servers return the same address to the same client (unless addresses are in such short supply that they are reused immediately by a different node when they become free). Thus, although many sites use DHCP, clients end up using the same address for months at a time.

Nodes that need a (non-changing) DNS name generally have static addresses assigned to them to simplify the configuration of DNS servers.
Although Dynamic DNS [DDNS] can be used to update the DNS dynamically, it is not widely deployed today. In addition, changing an address but keeping the same DNS name does not really address the underlying concern, since the DNS name becomes a non-changing identifier. Servers generally require a DNS name (so clients can connect to them), and clients often do as well (e.g., some servers refuse to speak to a client whose address cannot be mapped into a DNS name that also maps back into the same address).

Many network services require that the client authenticate itself to the server before gaining access to a resource. The authentication step binds the activity (e.g., TCP connection) to a specific entity (e.g., an end user). In such cases, a server already has the ability to track usage by an individual, independent of the address they happen to use. Indeed, such tracking is an important part of accounting.

Web browsers and servers typically exchange "cookies" with each other. Such cookies allow web servers to correlate a current activity with a previous activity. One common use is to send targeted advertising to a browser, based on the observation that the current transaction was started by an entity whose earlier requests had the side effect of revealing its interests.

2.3. Possible Approaches

One way to avoid some of the problems discussed above would be to use DHCP for obtaining addresses. With DHCP, the DHCP server could arrange to hand out addresses that change over time.

Another approach, one compatible with the stateless address autoconfiguration architecture, would be to change the interface identifier portion of an address over time. For example, upon each system restart, select a new interface identifier different from the ones used previously.
Changing the interface identifier makes it more difficult to look at the IP addresses in independent transactions and identify which ones actually correspond to the same node.

In order to make it difficult to make educated guesses as to whether two different interface identifiers belong to the same node, the algorithm for generating alternate identifiers must include input that has an unpredictable component from the perspective of the outside entities collecting information. Picking identifiers from a pseudorandom sequence suffices, so long as the specific sequence cannot be determined by an outsider examining just the identifiers that appear in addresses. This document proposes the use of an MD5 hash, using a per-interface "key" that varies from one interface to another. Specifically, we use the interface identifier generated using the normal procedure [ADDRARCH] as the key.

3. Protocol Description

The goal of this section is to define procedures that:

   1) Result in a different interface identifier being generated at each system restart or attachment to a network.

   2) Produce a sequence of interface identifiers that appear to be random in the sense that it is difficult for an outside observer to predict a future identifier based on a current one and it is difficult to determine previous identifiers knowing only the present one.

We describe two approaches. The first assumes the presence of stable storage that can be used to record state history for use as input into the next iteration of the algorithm, i.e., after a system restart. A second approach addresses the case where stable storage is unavailable and the interface identifier must be generated at random.

3.1. When Stable Storage is Present

The following algorithm assumes the presence of a 64-bit "history value" that is used as input in generating an interface identifier.
The very first time the system boots (i.e., out-of-the-box), any value can be used, including all zeros. Whenever a new interface identifier is generated, its value is saved as the history value for the next iteration of the process.

Section 5.3 of [ADDRCONF] describes the steps for generating a link-local address when an interface becomes enabled. This document modifies that step in the following way. Rather than use interface identifiers generated as described in [ADDRARCH], the identifier is generated as follows:

   1) Take the history value from the previous iteration (or 0 if there is no previous value) and append to it the interface identifier generated as described in [ADDRARCH].

   2) Compute the MD5 message digest [MD5] over the quantity created in step 1).

   3) Take the left-most 64 bits of the MD5 digest and set bit 6 (the left-most bit is numbered 0) to zero. This creates an interface identifier with the universal/local bit indicating local significance only. Use the resultant identifier for generating addresses as outlined in [ADDRCONF]. That is, use the interface identifier to generate a link-local and other appropriate addresses.

   4) Save the interface identifier created in step 3) in stable storage as the history value to be used in the next iteration of the algorithm.

MD5 was chosen for convenience, not because of strict requirements. IPv6 nodes are already required to implement MD5 as part of IPsec [IPSEC]; thus the code will already be present on IPv6 machines.

3.2. In The Absence of Stable Storage

In the absence of stable storage, no history information will be available to generate a pseudo-random sequence of interface identifiers. Consequently, identifiers will need to be generated at random. A number of techniques might be appropriate. Consult [RANDOM] for suggestions on good sources for obtaining random numbers.
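
As an illustration of the procedure in Section 3.1, the following Python sketch implements steps 1) through 3); step 4) is left to the caller, who would store the returned value as the next history value. The function name next_interface_id and the sample identifier are hypothetical and not part of this specification. The sketch assumes that both inputs are 8-byte strings and that the history value comes first in the concatenation, as step 1) describes.

```python
import hashlib


def next_interface_id(history: bytes, addrarch_id: bytes) -> bytes:
    """Sketch of steps 1) to 3) of Section 3.1 (hypothetical helper).

    history     -- 64-bit history value (all zeros on the very first boot)
    addrarch_id -- 64-bit interface identifier formed as in [ADDRARCH]
    """
    if len(history) != 8 or len(addrarch_id) != 8:
        raise ValueError("both inputs must be exactly 64 bits (8 bytes)")

    # Steps 1) and 2): append the [ADDRARCH] identifier to the history
    # value and compute the MD5 digest over the result.
    digest = hashlib.md5(history + addrarch_id).digest()

    # Step 3): keep the left-most 64 bits of the digest.
    ident = bytearray(digest[:8])

    # Bit 6, counting the left-most bit as 0, is the 0x02 bit of the
    # first octet (the universal/local bit). Clear it to mark the
    # identifier as having local significance only.
    ident[0] &= 0xFD
    return bytes(ident)


# Example use. The identifier below is made up for illustration.
history = bytes(8)                              # first boot: all zeros
sample_id = bytes.fromhex("0200c0fffe123456")   # hypothetical [ADDRARCH] id

first = next_interface_id(history, sample_id)
# Step 4) would save `first` in stable storage; on the next restart it
# is read back and used as the history value.
second = next_interface_id(first, sample_id)
print(first.hex(), second.hex())
```

Because each identifier is fed back as the next history value, an observer who sees only the identifiers in addresses would also need the node's [ADDRARCH] identifier to compute the next one in the sequence.
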
Note that even though a machine may not have stable storage for storing the previously used interface identifier, it will in many cases have configuration information that differs from one machine to another (e.g., user identity, security keys, etc.). One approach to generating random interface identifiers in such cases is to use the configuration information to generate some data bits (which may remain constant for the life of the machine, but will vary from one machine to another), append some random data, and compute the MD5 digest as before. The remaining details for generating addresses would be analogous to those of the previous section.

4. Implications of Changing Interface Identifiers

The IPv6 addressing architecture goes to great lengths to ensure that interface identifiers are globally unique. During the IPng discussions of the GSE proposal [GSE-ANALYSIS], it was felt that keeping interface identifiers globally unique in practice might prove useful to future transport protocols. Usage of the algorithms in this document would eliminate that future flexibility.

The desire to protect individual privacy and the desire to effectively maintain and debug a network can conflict with each other. Having clients use addresses that change over time will make it more difficult to track down and isolate operational problems. For example, when looking at packet traces, it could become more difficult to determine whether one is seeing behavior caused by a single errant machine, or by a number of them.

5. Open Issues and Future Work

This document specifies that a node generate a new interface identifier each time it autoconfigures an interface. The same identifier is used to generate all addresses, including link-local, site-local and global. However, the concerns this document addresses are most likely relevant only to global-scope addresses.
Thus, it may make sense for a node to have two interface identifiers, the standard one [ADDRCONF] used for link-local and site-local addresses, with a changing one used only for global-scope addresses. This would appear to require only small changes from the current specification.

In some cases, one could imagine the need to change an address more frequently than upon reboot or movement to a new location. For example, for machines that do not restart for months at a time, one might change addresses every few days or weeks. In extreme cases, one might even want to change addresses upon the initiation of each new TCP connection. Doing frequent changes would appear to add significant issues and possible implementation complications. For example, an implementation might need to support a significant number of addresses on an interface simultaneously. An implementation would also need to keep track of which addresses were being used so as to be able to stop using an address once no upper layer protocols are using it (but not before). This is in contrast to current approaches where addresses are removed from an interface when they become invalid [ADDRCONF], independent of whether or not upper layer protocols are still using them.

Some machines serve as both clients and servers. In such cases, the server would need a DNS name. Whether the address stays fixed or changes doesn't matter since the DNS name remains constant. Simultaneously, when acting as a client (e.g., initiating communication) it may want to vary the address it uses. In such environments, one might need multiple addresses. Source address selection rules would need to take into account the policy aspects of which addresses would be acceptable for use when initiating communication.

6. Security Considerations

The motivation for this document stems from privacy concerns for individuals.
This document does not appear to add any security issues beyond those already associated with stateless address autoconfiguration [ADDRCONF].

7. References

   [ADDRARCH] Hinden, R. and S. Deering, "IP Version 6 Addressing Architecture", RFC 2373, July 1998.

   [ADDRCONF] Thomson, S. and T. Narten, "IPv6 Address Autoconfiguration", RFC 2462, December 1998.

   [DHCP] Droms, R., "Dynamic Host Configuration Protocol", RFC 2131, March 1997.

   [DDNS] Vixie et al., "Dynamic Updates in the Domain Name System (DNS UPDATE)", RFC 2136, April 1997.

   [DISCOVERY] Narten, T., Nordmark, E. and W. Simpson, "Neighbor Discovery for IP Version 6 (IPv6)", RFC 2461, December 1998.

   [GSE-ANALYSIS] Crawford et al., "Separating Identifiers and Locators in Addresses: An Analysis of the GSE Proposal for IPv6", draft-ietf-ipngwg-esd-analysis-04.txt.

   [IPSEC] Kent, S. and R. Atkinson, "Security Architecture for the Internet Protocol", RFC 2401, November 1998.

   [MD5] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, April 1992.

   [SERIALNUM] Moore, K., "Privacy Considerations for the Use of Hardware Serial Numbers in End-to-End Network Protocols", draft-iesg-serno-privacy-00.txt.

8. Authors' Addresses

   Thomas Narten
   IBM Corporation
   P.O. Box 12195
   Research Triangle Park, NC 27709-2195
   USA

   Phone: +1 919 254 7798
   EMail: narten@raleigh.ibm.com