<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [

<!ENTITY rfc1002 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.1002.xml'>
<!ENTITY rfc1033 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.1033.xml'>
<!ENTITY rfc1035 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.1035.xml'>
<!ENTITY rfc2131 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2131.xml'>
<!ENTITY rfc2132 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2132.xml'>
<!ENTITY rfc2782 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2782.xml'>
<!ENTITY rfc3315 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3315.xml'>

<!ENTITY rfc3596 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3596.xml'>

<!ENTITY rfc4620 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4620.xml'>

<!ENTITY rfc4795 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4795.xml'>

<!ENTITY rfc6762 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6762.xml'>

<!ENTITY rfc6763 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6763.xml'>

<!ENTITY rfc7288 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.7288.xml'>

<!ENTITY rfc7719 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.7719.xml'>

<!ENTITY rfc7819 PUBLIC ''  
   "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7819.xml"> 
 
<!ENTITY rfc7824 PUBLIC ''  
   "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7824.xml"> 

<!ENTITY rfc7844 PUBLIC ''  
   "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7844.xml"> 


]>

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc compact="yes"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>

<!-- Expand crefs and put them inline -->
<?rfc comments='yes' ?>
<?rfc inline='yes' ?>

<rfc category="info" 
     docName="draft-ietf-intarea-hostname-practice-04.txt" 
     ipr="trust200902">

<front>
    <title abbrev="Harmful Hostname Practice">
      Current Hostname Practice Considered Harmful
    </title>

   <author fullname="Christian Huitema" initials="C." surname="Huitema">
      <organization>Private Octopus Inc.</organization>
      <address>
        <postal>
          <street> </street>
          <city>Friday Harbor</city>
          <code>98250</code>
          <region>WA</region>
          <country>U.S.A.</country>
        </postal>
        <email>huitema@huitema.net</email>
      </address>
    </author>

   <author fullname="Dave Thaler" initials="D." surname="Thaler">
      <organization>Microsoft</organization>
      <address>
        <postal>
          <street> </street>
          <city>Redmond</city>
          <code>98052</code>
          <region>WA</region>
          <country>U.S.A.</country>
        </postal>
        <email>dthaler@microsoft.com</email>
      </address>
    </author>

<author fullname="Rolf Winter" initials="R." surname="Winter">
      <organization>University of Applied Sciences Augsburg</organization>
      <address>
        <postal>
          <street> </street>
          <city>Augsburg</city>
          <country>DE</country>
        </postal>
        <email>rolf.winter@hs-augsburg.de</email>
      </address>
    </author>
    <date year="2017" />

    <abstract>
        <t> 
Giving a hostname to your computer and publishing it as you roam from
one network to another is the Internet equivalent of walking around with
<!-- RW: "network to hot spot" gave the impression that this happens when changing
to a wireless network, which is not necessary -->
a name tag affixed to your lapel. This current practice can significantly 
compromise your privacy, and something should change in order to mitigate
these privacy threats.
        </t>
<t>
There are several possible remedies, such as fixing a variety of
protocols or avoiding disclosing a hostname at all. This document describes
some of the protocols that reveal hostnames today and sketches
another possible remedy, which is to replace static hostnames by frequently
changing randomized values.
<!-- RW: if you write "this idea obviously needs more work" it begs the question
why you're not doing the work now -->
</t>
    </abstract>
</front>

<middle>
<section title="Introduction">
<t>
There is a long established practice of giving names to computers. In the Internet
protocols, these names are referred to as "hostnames" <xref target="RFC7719" />.
<!-- RW: RFC 7719 defines host name as a term. It does write it as "host name" and 
not "hostname" but acknowledges that "hostname" is indeed equivalent -->
Hostnames are normally used in conjunction with a domain name suffix to build the "Fully Qualified Domain 
Name" (FQDN) of a host. However, it is common practice to use the hostname without
further qualification in a variety
of applications from file sharing to network management. Hostnames are typically
published as part of domain names, and can be obtained through a variety of
name lookup and discovery protocols.
</t>
<t>
Hostnames have to be unique within the domain in which they are created and used. They 
do not have to be globally unique identifiers, but they will always be at 
least partial identifiers, as discussed in <xref target="PartialID" />.
</t>
<t>
The disclosure of information through hostnames creates a problem for
mobile devices. Adversaries that monitor a remote network such as a 
Wi-Fi hot spot can obtain the hostname through passive monitoring or active probing 
of a variety of Internet protocols, such as DHCP or multicast DNS (mDNS).
They can correlate the hostname with various other information extracted
from traffic analysis and other information sources, and can potentially 
identify the device, its properties and its user <xref target="TRAC2016" />.
<!-- RW: That really depends on how you name your device. We have seen things like
"printer-third-floor" or "Henrys-ThinkpadC300" or "iphone-von-rolf" etc. etc.
so the language of the user is encoded, the device type, manufacturer and model
and many other things. I took the liberty to reference a paper here that got
just accepted and which mentions many of theses things. -->
</t>
</section>
<section title="Naming Practices">
<t>
There are many reasons to give names to computers. This is particularly true 
when computers operate on a network. Operating systems like Microsoft Windows or 
Unix assume that computers have a "hostname." This enables users and 
administrators to do things such as ping a computer, add its name to an access control list,
remotely mount a computer disk, or connect to the computer through tools such as
telnet or remote desktop. Other operating systems maintain multiple hostnames
for different purposes, e.g. for use with certain protocols such as mDNS.
<!-- RW: Mac OS X does this. It maintains a hostname, a computer name and a local
computer name. The latter seems to be used for bonjour services -->
</t>
<t>
In most consumer networks, naming is largely left to the fancy of the user. Some will
pick names of planets or stars, others names of fruits or flowers, and others 
will pick whatever suits their mood when they unwrap the device. As long as users
are careful not to pick a name already in use on the same network, anything goes.
Very often, however, the operating system suggests a hostname at install time, which can
contain the user name, the login name and information learned from the device
itself such as the brand, model or maker of the device <xref target="TRAC2016" />.
</t>
<t>
In large organizations, collisions are more likely and a more structured approach 
is necessary. In theory, organizations could use multiple DNS subdomains to ease
the pressure on uniqueness, but in practice many don't and insist on unique flat
names, if only to simplify network management. To ensure
unique names, organizations will set naming guidelines and enforce some kind of structured 
naming. For example, within the Microsoft corporate network, computer names are
derived from the login name of the main user, leading to names like "huitema-test2"
for a machine that one of the authors used to test software.
</t>
<t>
There is less pressure to assign names to small devices, including for example
smart phones, as these devices typically do not enable sharing of their disks or
remote login. As a consequence, these devices often have manufacturer-assigned
names, which vary from very generic like "Windows Phone" to completely unique 
like "BrandX-123456-7890-abcdef". Such names often contain the name of the device owner,
the device's brand name, and a hint as to which language the device
owner speaks <xref target="TRAC2016" />.
</t>
</section>
<section title="Partial Identifiers" anchor="PartialID" >
<t>
Suppose an adversary wants to track the people connecting to a specific Wi-Fi hot spot,
for example in a railroad station. Assume that the adversary is able to
retrieve the hostname used by a specific laptop. That, in itself, might
not be enough to identify the laptop's owner. Suppose however that the
adversary observes that the laptop name is "huitema-laptop" and that the
laptop has established a VPN connection to the Microsoft corporate network. The two
pieces of information, put together, firmly point to Christian Huitema, employed 
by Microsoft. The identification is successful.
</t>
<t>
In the example, we saw a login name inside the hostname, and that certainly helped 
identification. But generic names like "jupiter" or "rosebud" also provide
partial identification, especially if the adversary is capable of
maintaining a database recording, among other information, the hostnames 
of devices used by specific users. Generic names are picked from vocabularies that
include thousands of potential choices. Finding the name reduces the scope
of the search significantly. Other information such as the visited
sites will quickly complement that data and can lead to user identification.
</t>
<t>
The special circumstances of the network can also play a role. 
Experiments on operational networks such as the IETF meeting network
have shown that with the help of external data such as the publicly available
IETF attendees list or other data sources such as LDAP servers on the network
<xref target="TRAC2016" />, the identification of the device owner
can become trivial given only partial identifiers in a hostname.
</t>
<t>
Unique names assigned by manufacturers do not directly encode a user identifier,
but they have the property of being stable and unique to the device in a large 
context. A unique name like "BrandX-123456-7890-abcdef" allows efficient tracking 
across multiple domains. In theory, this only allows tracking of the device but 
not of the user. However, an adversary could correlate the device to the user
through other means, for example the one-time capture of some clear text traffic.
Adversaries could then maintain databases linking unique hostnames to user 
identities. This will allow efficient tracking of both the user and the
device.

<!-- RW: I think this is not a good example. The unique name in this case only gives 
away the brand and is unique but that does not help in identification of the user
as was the example at the beginning. You can track the device but not identify it.
If uniqueness is the only concern here the
MAC address can be used as well, so this is not so special I'd say. 
This is not the same identification referred to in the
first part of this paragraph and uniqueness has to be guaranteed in the broadcast 
domain anyway. -->

<!-- CH: I reworked the text of the paragraph to acknowledge your objections. I 
understand that there are other identifiers, such as MAC addresses, but we are working
on these too. -->


</t>
</section>
<section title="Protocols that leak Hostnames" >
<t>
Many IETF protocols can leak the "hostname" of a computer. A non-exhaustive
list includes DHCP, DNS address to name resolution, Multicast DNS,
Link-local Multicast Name Resolution, and 
DNS service discovery.
</t>

<section title="DHCP" >
<t>
Shortly after connecting to a new network, a host can use DHCP <xref target="RFC2131" />
to acquire an IPv4 address and other parameters <xref target="RFC2132" />. A DHCP
query can disclose the "hostname." DHCP traffic is sent to the broadcast address
<!-- RW: DHCPv6 uses multicast but DHCPv4 uses broadcast -->
and can be easily monitored, enabling adversaries to discover the hostname 
associated with a computer visiting a particular network. DHCPv6
<xref target="RFC3315" /> shares similar issues.
</t>
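<t>
As a minimal sketch of why the hostname is so easy to read off the wire, the
following Python fragment encodes the Host Name option (option code 12 in
<xref target="RFC2132" />) as the code, length and value octets that appear in
clear text among the options of a DHCP message. The function name and the sample
hostname are illustrative assumptions; a real client builds a complete DHCP
message around this option.
</t>
<figure><artwork><![CDATA[
```python
HOST_NAME_OPTION = 12  # option code for "Host Name" in RFC 2132


def encode_host_name_option(hostname: str) -> bytes:
    """Encode a hostname as a DHCP option: code, length, value.

    Hypothetical helper, for illustration only.
    """
    value = hostname.encode("ascii")
    if not 1 <= len(value) <= 255:
        raise ValueError("option value must be 1 to 255 octets long")
    return bytes([HOST_NAME_OPTION, len(value)]) + value


option = encode_host_name_option("huitema-laptop")
print(option)
```
]]></artwork></figure>
<t>
The value octets are the unmodified hostname, so any observer of the broadcast
traffic can recover it without further decoding.
</t>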
<t>
The problems with the hostname and FQDN parameters in DHCP are analyzed  
in <xref target = "RFC7819" /> and <xref target="RFC7824" />.
Possible mitigations are described in <xref target="RFC7844" />.
</t>
</section>

<section title="DNS Address to Name Resolution" >
<t>
The Domain Name System (DNS) design 
<xref target = "RFC1035" /> includes the specification of the special domain
"in-addr.arpa" for resolving the name of the computer using a particular IPv4 
address, using the PTR format defined in  <xref target="RFC1033" />. A similar
domain, "ip6.arpa", is defined in <xref target="RFC3596" /> for finding the name of a 
computer using a specific IPv6 address.
</t>
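<t>
The mapping from an address to the name queried for a PTR record can be sketched
with the Python standard library, as below. The helper name and the documentation
addresses are assumptions chosen for illustration; the sketch only builds the
query name and does not send any DNS query.
</t>
<figure><artwork><![CDATA[
```python
import ipaddress


def reverse_lookup_name(address: str) -> str:
    """Return the domain name whose PTR record names the host at `address`."""
    return ipaddress.ip_address(address).reverse_pointer


# IPv4 addresses map into "in-addr.arpa", one label per octet, reversed.
print(reverse_lookup_name("192.0.2.5"))
# IPv6 addresses map into "ip6.arpa", one label per hex digit, reversed.
print(reverse_lookup_name("2001:db8::1"))
```
]]></artwork></figure>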
<t>
Adversaries who observe a particular address in use on a specific network
can try to retrieve the PTR record associated with that address, and thus
the hostname of the computer, or even the fully qualified domain name of
that computer. The retrieval may not be useful in many IPv4 networks due
to the prevalence of NAT, but it could work in IPv6 networks. Other name lookup
mechanisms, such as <xref target="RFC4620" />, share similar issues.
</t>
</section>

<section title="Multicast DNS" >
<t>
Multicast DNS (mDNS) is defined in <xref target="RFC6762" />. It enables hosts to
send DNS queries over multicast, and to elicit responses from hosts
participating in the service.
</t>
<t>
If an adversary
suspects that a particular host is present on a network, the adversary can 
send mDNS requests to find, for example, the A or AAAA 
records associated with the hostname in the ".local" domain. A positive
reply will confirm the presence of the host.
</t>
<t>
When a new responder starts, it must send a set of
multicast probe queries to verify that the name that it advertises is unique on the network,
followed by announcements that populate the caches of other mDNS hosts.
Adversaries can monitor this traffic and discover the hostname of computers
as they join the monitored network.
</t>

<t>
mDNS further allows queries to be sent via unicast to port 5353. An adversary might
decide to use unicast instead of multicast in order to hide from e.g. intrusion
detection systems.
</t>
</section>

<section title="Link-local Multicast Name Resolution" >
<t>
Link-local Multicast Name Resolution (LLMNR) is defined in <xref target="RFC4795" />. 
The specification did not achieve consensus as an IETF standard, but it is widely 
deployed. Like mDNS, it enables hosts to send DNS queries over multicast,
and to elicit responses from computers implementing the LLMNR service.
</t>
<t>
Like mDNS, LLMNR can be used by adversaries to confirm the presence of a specific 
host on a network, by issuing a multicast request to find
the A or AAAA records associated with the hostname in the ".local" domain. 
</t>
<t>
When an LLMNR responder starts, it sends a set of
multicast queries to verify that the name that it advertises is unique on the network.
Adversaries can monitor this traffic and discover the hostname of computers
as they join the monitored network.
</t>
</section>

<section title="DNS-Based Service Discovery" >
<t>
DNS-Based Service Discovery (DNS-SD) is described in <xref target="RFC6763" />. It enables
participating hosts to retrieve the location of services proposed by other hosts. It can
be used with DNS servers, or in conjunction with mDNS in a server-less environment.
</t>
<t>
Participating hosts publish a service described by an "instance name," typically 
chosen by the user responsible for the publication. While this is obviously
an active disclosure of information, the privacy impact can be mitigated by user control. 
Services should be published only when the user decides to do so, and the information disclosed in
the service name should be well under the control of the device's owner. 
</t>
<t>
In theory there should not be any privacy issue, but in practice the publication of
a service also forces the publication of the hostname, due to a chain of dependencies.
The service name is used to publish a PTR record announcing the service. The PTR record 
typically points to the service name in the local domain. The service names, in turn,
are used to publish TXT records describing service parameters, and SRV records 
describing the service location.
</t>
<t>
SRV records are described in <xref target="RFC2782" />. Each record contains four parameters:
priority, weight, port number and hostname. While the service name published in the PTR
record is chosen by the user, the "hostname" in the SRV record is indeed the hostname
of the device.
</t>
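<t>
To make the dependency concrete, the following Python sketch parses the four SRV
parameters from their textual form and shows that the last field is the device
hostname. The class name, the function name and the sample record are hypothetical
and serve only as an illustration; real mDNS traffic carries these fields in DNS
wire format.
</t>
<figure><artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str  # the hostname of the device offering the service


def parse_srv_rdata(rdata: str) -> SrvRecord:
    """Parse SRV data written as "priority weight port target"."""
    priority, weight, port, target = rdata.split()
    return SrvRecord(int(priority), int(weight), int(port), target)


# Sample record (an assumption): a service on port 631 of "huitema-laptop".
record = parse_srv_rdata("0 0 631 huitema-laptop.local.")
print(record.target)
```
]]></artwork></figure>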
<t>
Adversaries can monitor the mDNS traffic associated with DNS-SD and retrieve the hostname 
of computers advertising any service with DNS-SD.
</t>
</section>

<section title="NetBIOS-over-TCP">
<t>
Amongst other things, NetBIOS-over-TCP (<xref target="RFC1002" />) implements a 
name registration and resolution mechanism called the NetBIOS Name Service. 
In practice, NetBIOS resource names are often based on hostnames.  
</t>
<t>NetBIOS allows an application to register resource names and to resolve such
names to IP addresses. In environments without a NetBIOS Name Server, the 
protocol makes extensive use of broadcasts from which resource names can 
be easily extracted. NetBIOS also allows querying for the names registered
by a node directly (node status).
</t>
</section>

</section>

<section title="Randomized Hostnames as Remedy" >
<t>
There are several ways to remedy the hostname practices. We could instruct 
people to just turn off any protocol that leaks hostnames, at least when
they visit some "insecure" place. We could also examine each particular
standard that publishes hostnames, and somehow fix the corresponding 
protocols. Or, we could attempt to revise the way devices manage
the hostname parameter.
</t>
<t>
There is a lot of merit in "turning off unneeded protocols when visiting
insecure places." This amounts to attack surface reduction, and is
clearly beneficial -- this is an advantage of
the stealth mode defined in <xref target="RFC7288" />.
However, there are several issues with this advice. First, it
relies on recognizing which networks are secure or insecure. This is hard to
automate, and relying on end-user judgment may not always provide good 
results. Second, some protocols such as DHCP cannot be turned off 
without losing connectivity, which limits the value of this option. Third, 
services that rely on hostname-leaking protocols such as mDNS become
unavailable when those protocols are switched off. Finally, hostname-leaking 
protocols are not always well known, as they might be proprietary and come with an installed 
application instead of being provided by the operating system.
<!-- RW: just to be fair, the above is the case, but has not been mentioned -->
</t>
<t>
It may be possible in many cases to examine a protocol and
prevent it from leaking hostnames. This is for example what is attempted for
DHCP in <xref target="RFC7844" />. However, 
it is unclear whether we can identify, revisit and fix all the protocols
that publish hostnames. In particular, this is impossible for proprietary
protocols.
<!-- RW: we have observed multiple protocols that are not based on standards
that already leak privacy-sensitive data today. It appeards likely that those
will become more over time -->
</t>
<t>
We may be able to mitigate most of the effects of hostname leakage by
revisiting the way platforms handle hostnames. This is in a way similar to 
the approach of MAC address randomization described in 
<xref target="RFC7844" />. Let's assume that the
operating system, at the time of connecting to a new network, picks
a random hostname and starts publicizing that random name in protocols such as DHCP or mDNS,
instead of the static value. This will
render monitoring and identification of users by adversaries much more difficult, without preventing protocols
such as DNS-SD from operating as expected. This of course has implications for 
the applications making use of such protocols, for example when the hostname is 
displayed to users of the application. Users will not be able to
identify network shares or services as easily based on the hostname carried in the 
underlying protocols. Also, the generation of new hostnames should be synchronized
with the change of other tokens used in network protocols, such as the MAC or IP
address, to prevent correlation of this information. For example, if the IP address 
changes but the hostname stays the same, the new IP address can be linked
to the same device based on the leaked hostname. 
<!-- RW: I think the above considerations are important to be mentioned here -->
<!-- CH: I agree. Reading that, there are two issues, randomizing and timing of randomization. We could develop it. -->
</t>
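<t>
One possible way to generate such a value is sketched below in Python. The
"host-" prefix, the 8 random octets and the function name are assumptions made
for illustration, not a format proposed by this document. A platform would call
such a function each time it attaches to a new network, at the same moment it
changes its other identifiers.
</t>
<figure><artwork><![CDATA[
```python
import secrets


def random_hostname(prefix: str = "host", random_octets: int = 8) -> str:
    """Return a fresh, unpredictable hostname label (hypothetical format)."""
    # token_hex() draws from the operating system's secure random source
    # and returns two hexadecimal digits per octet.
    label = f"{prefix}-{secrets.token_hex(random_octets)}"
    if len(label) > 63:  # a single DNS label is limited to 63 octets
        raise ValueError("hostname label too long")
    return label


name = random_hostname()
print(name)
```
]]></artwork></figure>
<t>
Using a cryptographically strong source matters here: a predictable generator
would let an adversary link successive hostnames of the same device.
</t>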
<t>
Some operating systems, including Windows, support "per network" hostnames, but some other
operating systems only support "global" hostnames. In that case, changing the hostname
may be difficult if the host is multi-homed, as the same name will be used
on several networks. Other operating systems already use potentially different
hostnames for different purposes, which might be a good model to combine both
static hostnames and randomized hostnames based on their potential use and threat to 
a user's privacy.
Obviously, further studies are required before the idea of randomized hostnames
can be implemented.
</t>
</section>

<section title="Security Considerations">
<t> 
This draft does not introduce any new protocol. It does point to
potential privacy issues in a set of existing protocols.
</t>
</section>

<section title="IANA Considerations" anchor="iana">
<t> 
This draft does not require any IANA action.
</t> 
</section>

<section title="Acknowledgments">
    <t>
Thanks to the members of the INTAREA Working Group for discussions and reviews.
    </t>
</section>
</middle>

<back>

<references title="Informative References">

       &rfc1002;
       &rfc1033;
       &rfc1035;
       &rfc2131;
       &rfc2132;
       &rfc2782;
       &rfc3315;
       &rfc3596;
       &rfc4620; 
       &rfc4795;
       &rfc6762;
       &rfc6763;
       &rfc7288;
       &rfc7719;
       &rfc7819;
       &rfc7824;
       &rfc7844;

       <reference anchor="TRAC2016">
         <front>
		   <title>How Broadcast Data Reveals Your Identity and Social Graph</title>
		   <author surname="Faath" fullname="Michael Faath" initials="M." />
		   <author surname="Weisshaar" fullname="Fabian Weisshaar" initials="F." />
		   <author surname="Winter" fullname="Rolf Winter" initials="R." />
		   <date month="September" year="2016" />
		 </front>
		 <seriesInfo name="7th International Workshop on TRaffic Analysis and Characterization" value="IEEE TRAC 2016"/>
	   </reference>

</references>  

</back>
</rfc>
