INTERNET-DRAFT                                             Thomas Narten
                                                                     IBM
<draft-ietf-ipngwg-addrconf-privacy-00.txt>                    June 1999

    Privacy Extensions for Stateless Address Autoconfiguration in IPv6

                <draft-ietf-ipngwg-addrconf-privacy-00.txt>

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   Nodes use IPv6 stateless address autoconfiguration to generate
   addresses without the necessity of a DHCP server. Addresses are
   formed by combining network prefixes with a constant interface
   identifier derived from the interface's IEEE Identifier.  This
   document describes an optional extension to IPv6 stateless address
   autoconfiguration that results in a node generating addresses from an
   interface identifier that changes over time. Changing the interface
   identifier over time makes it more difficult for eavesdroppers and
   other information collectors to identify when different addresses
   used in different transactions actually correspond to the same node.

draft-ietf-ipngwg-addrconf-privacy-00.txt                       [Page 1]

INTERNET-DRAFT                                             June 24, 1999

   Contents

   Status of this Memo..........................................    1

   1.  Introduction.............................................    2

   2.  Background...............................................    3

   3.  Protocol Description.....................................    5

   4.  Implications of Changing Interface Identifiers...........    7

   5.  Open Issues and Future Work..............................    7

   6.  Security Considerations..................................    8

   7.  References...............................................    8

   8.  Authors' Addresses.......................................    9

1.  Introduction

   Stateless address autoconfiguration [ADDRCONF] defines how an IPv6
   node generates addresses without the need for a DHCP server. Network
   interfaces typically come with an embedded IEEE Identifier (i.e., a
   link-layer MAC address), and stateless address autoconfiguration uses
   the IEEE identifier to generate a 64-bit interface identifier
   [ADDRARCH]. By design, the interface identifier will typically be
   globally unique. The interface identifier is in turn appended to a
   prefix to form a 128-bit IPv6 address. All nodes use this technique
   to generate link-local addresses for their attached interfaces.
   Additional addresses, including site-local and global-scope
   addresses, are then created by combining prefixes advertised in
   Router Advertisements via Neighbor Discovery [DISCOVERY] with the
   interface identifier.
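
   As an illustration of the derivation described above, the following
   Python sketch builds an EUI-64 based interface identifier from a
   48-bit MAC address, following the procedure in [ADDRARCH], and
   appends it to a 64-bit prefix. The function names, the sample MAC
   address, and the sample prefix are hypothetical and chosen only for
   this example.

```python
import ipaddress


def eui64_interface_id(mac: str) -> bytes:
    """Derive a 64-bit interface identifier from a 48-bit MAC address."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Insert 0xFFFE between the company id and the extension id.
    eui64 = bytearray(octets[:3] + b"\xff\xfe" + octets[3:])
    # Invert the universal/local bit (bit 6, counting from the left).
    eui64[0] ^= 0x02
    return bytes(eui64)


def form_address(prefix: str, interface_id: bytes) -> ipaddress.IPv6Address:
    """Append a 64-bit interface identifier to a 64-bit prefix."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64 or len(interface_id) != 8:
        raise ValueError("expected a /64 prefix and a 64-bit identifier")
    return network.network_address + int.from_bytes(interface_id, "big")


iid = eui64_interface_id("00:00:5e:00:53:01")
print(iid.hex())                       # 02005efffe005301
print(form_address("fe80::/64", iid))  # fe80::200:5eff:fe00:5301
```

   Because the identifier depends only on the MAC address, the low-order
   64 bits are the same under every prefix the node uses.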

   As mobile devices (e.g., laptops, PDAs, etc.) move topologically,
   they form new addresses for their current topological point of
   attachment. Although the node's address changes as it moves, the
   interface identifier contained within the address remains the same.
   Because the interface identifier associated with a node can
   potentially remain fixed for a long period of time (e.g., months or
   years), concern has been voiced that the interface identifier could
   in some cases be used to track the movement and usage of a particular
   machine. For example, a server that logs the source addresses of
   incoming connections would simultaneously collect identical
   information keyed on the interface id, allowing one to correlate
   activities based on interface identifiers in addition to addresses.
   This is of particular concern with the expected proliferation of
   next-generation network-connected devices (e.g., PDAs, cell phones,
   etc.) in which large numbers of devices are in practice associated
   with a single user. Thus, the interface identifier embedded within an
   address could be used to track activities of an individual.

   This document discusses concerns associated with the embedding of
   interface identifiers within IPv6 addresses and describes optional
   extensions to stateless address autoconfiguration that can help
   mitigate those concerns in environments where such concerns are
   significant.

2.  Background

   This section discusses the problem in more detail, provides context
   for evaluating the significance of the concerns in specific
   environments, and makes comparisons with existing practices.

2.1.  Extended Use of the Same Identifier

   The use of a non-changing interface identifier to form addresses is a
   specific instance of the more general case where a constant
   identifier is reused over an extended period of time and in multiple
   independent activities. Anytime the same identifier is used in
   multiple contexts, it becomes possible for that identifier to be used
   to correlate seemingly unrelated activity. For example, a network
   sniffer placed strategically on a link that carries all traffic to
   and from a particular host could keep track of which destinations
   that host communicated with and at what times. Such
   information can in some cases be used to infer things, such as what
   hours an employee was active, when someone is at home, etc.

   One of the requirements for correlating seemingly unrelated
   activities is the use (and reuse) of an identifier that is
   recognizable over time within different contexts. IP addresses
   provide one obvious example, but there are more. Many nodes also have
   DNS names associated with their addresses, in which case the DNS name
   serves as a similar identifier. Although the DNS name associated with
   an address is more work to obtain (it may require a DNS query), the
   information is often readily available. In such cases, changing the
   address on a machine over time would do little to address the concern
   raised in this document, as the DNS name would become the correlating
   identifier.

   The use of a constant identifier within an address is of special
   concern because addresses are a fundamental requirement of
   communication and cannot easily be hidden from eavesdroppers and
   other parties. Even when higher layers encrypt their payloads,
   addresses in packet headers appear in the clear.  Consequently, if a
   mobile host (e.g., laptop) accessed the network from several
   different locations, an eavesdropper might be able to track the
   movement of that mobile host from place to place, even if the upper
   layer payloads were encrypted [SERIALNUM].

2.2.  Not a New Issue

   Although the topic of this document may at first appear to be an
   issue new to IPv6, similar issues already exist in today's Internet.
   That is, addresses used in today's Internet are often
   constant in practice for extended periods of time. In many sites,
   addresses are assigned statically; such addresses typically change
   infrequently. However, many sites are moving away from static
   allocation to dynamic allocation via DHCP. In theory, the address a
   client gets via DHCP can change over time, but in practice servers
   return the same address to the same client (unless addresses are in
   such short supply that they are reused immediately by a different
   node when they become free). Thus, although many sites use DHCP,
   clients end up using the same address for months at a time.

   Nodes that need a (non-changing) DNS name generally have static
   addresses assigned to them to simplify the configuration of DNS
   servers. Although Dynamic DNS [DDNS] can be used to update the DNS
   dynamically, it is not widely deployed today. In addition, changing
   an address but keeping the same DNS name does not really address the
   underlying concern, since the DNS name becomes a non-changing
   identifier. Servers generally require a DNS name (so clients can
   connect to them), and clients often do as well (e.g., some servers
   refuse to speak to a client whose address cannot be mapped into a DNS
   name that also maps back into the same address).

   Many network services require that the client authenticate itself to
   the server before gaining access to a resource. The authentication
   step binds the activity (e.g., TCP connection) to a specific entity
   (e.g., an end user). In such cases, a server already has the ability
   to track usage by an individual, independent of the address they
   happen to use. Indeed, such tracking is an important part of
   accounting.

   Web browsers and servers typically exchange "cookies" with each
   other. Such cookies allow web servers to correlate a current activity
   with a previous activity. One common usage is to send back targeted
   advertising to a browser by noting that a transaction that it is
   performing was started by an entity that previously requested
   information that had the side-effect of indicating the interest of
   the querier.

2.3.  Possible Approaches

   One way to avoid some of the problems discussed above would be to use
   DHCP [DHCP] for obtaining addresses. With DHCP, the DHCP server could
   arrange to hand out addresses that change over time.

   Another approach, one compatible with the stateless address
   autoconfiguration architecture, would be to change the interface id
   portion of an address over time. For example, upon each system
   restart, select a new interface identifier different from the ones
   used previously. Changing the interface identifier makes it more
   difficult to look at the IP addresses in independent transactions and
   identify which ones actually correspond to the same node.

   In order to make it difficult to make educated guesses as to whether
   two different interface identifiers belong to the same node, the
   algorithm for generating alternate identifiers must include input
   that has an unpredictable component from the perspective of the
   outside entities collecting information. Picking identifiers from a
   pseudorandom sequence suffices, so long as the specific sequence
   cannot be determined by an outsider examining just the identifiers
   that appear in addresses. This document proposes the use of an MD5
   hash, using a per-interface "key" that varies from one interface to
   another. Specifically, we use the interface identifier generated
   using the normal procedure [ADDRARCH] as the key.

3.  Protocol Description

   The goal of this section is to define procedures that:

   1) Result in a different interface identifier being generated at each
      system restart or attachment to a network.

   2) Produce a sequence of interface identifiers that appear to be
      random in the sense that it is difficult for an outside observer
      to predict a future identifier based on a current one and it is
      difficult to determine previous identifiers knowing only the
      present one.

   We describe two approaches. The first assumes the presence of stable
   storage that can be used to record state history for use as input
   into the next iteration of the algorithm, i.e., after a system
   restart. A second approach addresses the case where stable storage is
   unavailable and the interface identifier must be generated at random.

3.1.  When Stable Storage is Present

   The following algorithm assumes the presence of a 64-bit "history
   value" that is used as input in generating an interface identifier.
   The very first time the system boots (i.e., out-of-the-box), any
   value can be used, including all zeros. Whenever a new interface
   identifier is generated, its value is saved as the history value
   for the next iteration of the process.

   Section 5.3 of [ADDRCONF] describes the steps for generating a link-
   local address when an interface becomes enabled. This document
   modifies that step in the following way. Rather than use interface
   identifiers generated as described in [ADDRARCH], the identifier is
   generated as follows:

   1) Take the history value from the previous iteration (or 0 if there
      is no previous value) and append to it the interface identifier
      generated as described in [ADDRARCH].
   2) Compute the MD5 message digest [MD5] over the quantity created in
      step 1).
   3) Take the left-most 64 bits of the MD5 digest and set bit 6 (the
      left-most bit is numbered 0) to zero. This creates an interface
      identifier with the universal/local bit indicating local
      significance only. Use the resultant identifier for generating
      addresses as outlined in [ADDRCONF]. That is, use the interface
      identifier to generate a link-local and other appropriate
      addresses.
   4) Save the interface identifier created in step 3) in stable storage
      as the history value to be used in the next iteration of the
      algorithm.

   MD5 was chosen for convenience, not because of strict requirements.
   IPv6 nodes are already required to implement MD5 as part of IPsec
   [IPSEC]; thus, the code will already be present on IPv6 machines.
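
   The four steps above can be sketched in Python as follows. This is
   a minimal illustration, not a definitive implementation: the
   function name next_interface_id is hypothetical, the sample
   identifier is made up, and an in-memory variable stands in for the
   stable storage that would hold the history value across restarts.

```python
import hashlib


def next_interface_id(history: bytes, eui64_id: bytes) -> bytes:
    """Return the next 64-bit interface identifier.

    history  -- 64-bit history value from the previous iteration
                (all zeros if there is no previous value)
    eui64_id -- 64-bit identifier generated as described in [ADDRARCH]
    """
    if len(history) != 8 or len(eui64_id) != 8:
        raise ValueError("both inputs must be 64 bits long")
    # Steps 1 and 2: append the identifier to the history value and
    # compute the MD5 digest over the result.
    digest = hashlib.md5(history + eui64_id).digest()
    # Step 3: keep the left-most 64 bits and set bit 6 to zero. With the
    # left-most bit numbered 0, bit 6 is mask 0x02 of the first octet.
    identifier = bytearray(digest[:8])
    identifier[0] &= 0xFD
    return bytes(identifier)


# Simulate three restarts. The history variable is an assumption that
# stands in for stable storage.
history = bytes(8)
eui64_id = bytes.fromhex("02005efffe005301")
for restart in range(3):
    identifier = next_interface_id(history, eui64_id)
    print(restart, identifier.hex())
    history = identifier  # Step 4: save for the next iteration.
```

   Each identifier serves as the history value for the next iteration,
   so the sequence depends on both the interface and its past values.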

3.2.  In The Absence of Stable Storage

   In the absence of stable storage, no history information will be
   available to generate a pseudo-random sequence of interface
   identifiers. Consequently, identifiers will need to be generated at
   random. A number of techniques might be appropriate. Consult [RANDOM]
   for suggestions on good sources for obtaining random numbers. Note
   that even though a machine may not have stable storage for storing
   the previously used interface identifier, it will in many cases
   have configuration information that differs from one machine to
   another (e.g., user identity, security keys, etc.). One approach to
   generating random interface identifiers in such cases is to use the
   configuration information to generate some data bits (which may
   remain constant for the life of the machine, but will vary from one
   machine to another), append some random data and compute the MD5
   digest as before. The remaining details for generating addresses
   would be analogous to those of the previous section.
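
   One possible reading of the approach above is sketched below in
   Python. The function name, the sample configuration string, and the
   choice of 16 random octets from os.urandom are assumptions made for
   illustration; [RANDOM] should be consulted when choosing a real
   source of randomness.

```python
import hashlib
import os
from typing import Optional


def random_interface_id(config_data: bytes,
                        random_data: Optional[bytes] = None) -> bytes:
    """Return a 64-bit interface identifier without using any history.

    config_data -- machine-specific bits (constant for one machine)
    random_data -- fresh random octets; drawn from os.urandom if omitted
    """
    if random_data is None:
        random_data = os.urandom(16)  # assumed length, not from the text
    # Append the random data to the configuration bits and hash as before.
    digest = hashlib.md5(config_data + random_data).digest()
    # Keep the left-most 64 bits and set bit 6 to zero (mask 0x02).
    identifier = bytearray(digest[:8])
    identifier[0] &= 0xFD
    return bytes(identifier)


# Hypothetical configuration information for one machine.
config = b"user=example;host-key=0123456789abcdef"
print(random_interface_id(config).hex())
print(random_interface_id(config).hex())  # differs from the line above
```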

4.  Implications of Changing Interface Identifiers

   The IPv6 addressing architecture goes to great lengths to ensure that
   interface identifiers are globally unique. During the IPng
   discussions of the GSE proposal [GSE-ANALYSIS], it was felt that
   keeping interface identifiers globally unique in practice might prove
   useful to future transport protocols. Usage of the algorithms in this
   document would eliminate that future flexibility.

   The desire to protect individual privacy and the desire to
   effectively maintain and debug a network can conflict with each
   other. Having clients use addresses that change over time will make
   it more difficult to track down and isolate operational problems. For
   example, when looking at packet traces, it could become more
   difficult to determine whether one is seeing behavior caused by a
   single errant machine, or by a number of them.

5.  Open Issues and Future Work

   This document specifies that a node generate a new interface
   identifier each time it autoconfigures an interface. The same
   identifier is used to generate all addresses, including link-local,
   site-local and global. However, the concerns this document addresses
   are most likely relevant only to global-scope addresses. Thus, it may
   make sense for a node to have two interface identifiers, the standard
   one [ADDRCONF] used for link-local and site-local addresses, with a
   changing one used only for global-scope addresses. This would appear
   to require only small changes from the current specification.
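
   A rough sketch of the two-identifier idea follows, assuming the
   link-local (fe80::/10) and site-local (fec0::/10) prefixes defined
   in [ADDRARCH]. The function name and the sample identifiers and
   prefixes are hypothetical.

```python
import ipaddress

LINK_LOCAL = ipaddress.IPv6Network("fe80::/10")
SITE_LOCAL = ipaddress.IPv6Network("fec0::/10")


def choose_interface_id(prefix: str, stable_id: bytes,
                        changing_id: bytes) -> bytes:
    """Pick the interface identifier to combine with a given prefix."""
    network = ipaddress.IPv6Network(prefix)
    # Link-local and site-local prefixes keep the standard identifier.
    if network.subnet_of(LINK_LOCAL) or network.subnet_of(SITE_LOCAL):
        return stable_id
    # Every other prefix is treated as global scope in this sketch.
    return changing_id


stable = bytes.fromhex("02005efffe005301")    # made-up standard identifier
changing = bytes.fromhex("1c9a3b44d07e21f6")  # made-up changing identifier
for prefix in ("fe80::/64", "fec0:0:0:1::/64", "2001:db8:0:1::/64"):
    chosen = choose_interface_id(prefix, stable, changing)
    print(prefix, "stable" if chosen == stable else "changing")
```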

   In some cases, one could imagine the need to change an address more
   frequently than upon reboot or movement to a new location. For
   example, for machines that do not restart for months at a time, one
   might change addresses every few days or weeks. In extreme cases, one
   might even want to change addresses upon the initiation of each new
   TCP connection. Doing frequent changes would appear to add
   significant issues and possible implementation complications. For
   example, an implementation might need to support a significant number
   of addresses on an interface simultaneously. An implementation
   would also need to keep track of which addresses were being used so
   as to be able to stop using an address once no upper layer protocols
   are using it (but not before). This is in contrast to current
   approaches where addresses are removed from an interface when they
   become invalid [ADDRCONF], independent of whether or not upper layer
   protocols are still using them.
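
   One way an implementation might do this bookkeeping is with a
   per-address use count, sketched below in Python. The class and
   method names are hypothetical, and the sketch ignores address
   lifetimes and all other details of [ADDRCONF].

```python
class AddressTable:
    """Tracks the addresses on one interface and their current users."""

    def __init__(self):
        self._users = {}       # address -> number of upper-layer users
        self._retired = set()  # addresses not to be given to new users

    def add(self, address):
        self._users.setdefault(address, 0)

    def acquire(self, address):
        """Record that an upper-layer protocol started using address."""
        if address not in self._users or address in self._retired:
            raise ValueError("address is not available for new use")
        self._users[address] += 1

    def release(self, address):
        """Record that an upper-layer protocol stopped using address."""
        if self._users.get(address, 0) == 0:
            raise ValueError("address has no users")
        self._users[address] -= 1
        self._remove_if_unused(address)

    def retire(self, address):
        """Stop offering address; remove it once nothing uses it."""
        if address not in self._users:
            raise KeyError(address)
        self._retired.add(address)
        self._remove_if_unused(address)

    def _remove_if_unused(self, address):
        if address in self._retired and self._users[address] == 0:
            del self._users[address]
            self._retired.discard(address)

    def addresses(self):
        return sorted(self._users)


table = AddressTable()
table.add("2001:db8:0:1::1")
table.acquire("2001:db8:0:1::1")  # e.g., a TCP connection opens
table.retire("2001:db8:0:1::1")   # a newer address takes over
print(table.addresses())          # still present: one user remains
table.release("2001:db8:0:1::1")  # the connection closes
print(table.addresses())          # now removed
```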

   Some machines serve as both clients and servers. In such cases, the
   server would need a DNS name. Whether the address stays fixed or
   changes doesn't matter since the DNS name remains constant.
   Simultaneously, when acting as a client (e.g., initiating
   communication) it may want to vary the address it uses. In such
   environments, one might need multiple addresses. Source address
   selection rules would need to take into account the policy aspects of
   which addresses would be acceptable for use when initiating
   communication.

6.  Security Considerations

   The motivation for this document stems from privacy concerns for
   individuals. This document does not appear to add any security issues
   beyond those already associated with stateless address
   autoconfiguration [ADDRCONF].

7.  References

   [ADDRARCH] Hinden, R. and S. Deering, "IP Version 6 Addressing
           Architecture", RFC 2373, July 1998.

   [ADDRCONF] Thomson, S. and T. Narten, "IPv6 Address
           Autoconfiguration", RFC 2462, December 1998.

   [DHCP] Droms, R., "Dynamic Host Configuration Protocol", RFC 2131,
           March 1997.

   [DDNS] Vixie et al., "Dynamic Updates in the Domain Name System (DNS
           UPDATE)", RFC 2136, April 1997.

   [DISCOVERY] Narten, T., Nordmark, E. and W. Simpson, "Neighbor
           Discovery for IP Version 6 (IPv6)", RFC 2461, December 1998.

   [GSE-ANALYSIS] Crawford et al., "Separating Identifiers and Locators
           in Addresses: An Analysis of the GSE Proposal for IPv6",
           draft-ietf-ipngwg-esd-analysis-04.txt.

   [IPSEC] Kent, S. and R. Atkinson, "Security Architecture for the
           Internet Protocol", RFC 2401, November 1998.

   [MD5] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, April
           1992.

   [RANDOM] Eastlake, D., Crocker, S. and J. Schiller, "Randomness
           Recommendations for Security", RFC 1750, December 1994.

   [SERIALNUM] Moore, K., "Privacy Considerations for the Use of
           Hardware Serial Numbers in End-to-End Network Protocols",
           draft-iesg-serno-privacy-00.txt.

8.  Authors' Addresses

   Thomas Narten
   IBM Corporation
   P.O. Box 12195
   Research Triangle Park, NC 27709-2195
   USA

   Phone: +1 919 254 7798
   EMail: narten@raleigh.ibm.com
