INTERNET-DRAFT                                                  N. Popp
December 23, 1999                                        RealNames Inc.
Expires May 23, 2000                                        M. Mealling
draft-ietf-cnrp-00.txt                          Network Solutions, Inc.
                                                             M. Moseley
                                                          Netword, Inc.


                      CNRP PROTOCOL SPECIFICATION

1. Status of this memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Please send comments on this draft to CNRP-IETF@LISTS.INTERNIC.NET.

2. Abstract

People often refer to things in the real world by a common name or phrase, e.g., a trade name, company name, or a book title. These names are sometimes easier for people to remember and type than URLs. Furthermore, because of the limited syntax of URLs, companies and individuals are finding that the ones that might be most reasonable for their resources are being used elsewhere and so are unavailable. Services are arising that offer a mapping from common names to Internet resources (e.g., as identified by a URI). These services often resolve common name categories such as company names, trade names, or common keywords. Thus, such a resolution service may operate in one or a small number of categories or domains, or may expect the client to limit the resolution scope to a limited number of categories or domains.
For example, the phrase "Internet Engineering Task Force" is a common name in the "organization" category, as is "Moby Dick" in the book category.

Two classes of clients of such services are being built: browser improvements and web-accessible front-end services. Browser enhancements modify the "open" or "address" field of a browser so that a common name can be entered instead of a URL. Internet search sites integrate common name resolution services as a complement to search. In both cases, these may be clients of back-end resolution services. In the browser case, the browser must talk to a service that will resolve the common name. The search sites are accessed via a browser. In some cases, the search site may also be the back-end resolution service, but in others, the search site is a front-end to a collection of back-end services.

This effort is about the creation of a protocol for client applications to communicate with common name resolution services, as exemplified in both the browser enhancement and search site paradigms. Although the protocol's primary function is resolution, it is intended to address the issues of internationalization and privacy as well. Name resolution services are not generic search services and thus do not need to provide complex Boolean query, relevance ranking or similar capabilities. The protocol is a simple, minimal interoperable core. Mechanisms for extension are provided, so that additional capabilities can be added.

Several other issues, while of importance to the deployment of common name resolution services, are outside of the resolution protocol itself and are not in the initial scope of the proposed effort. These include discovery and selection of resolution service providers, administration of resolution services, name registration, name ownership, and methods for creating, identifying or ensuring unique common names.

3. Introduction

For the purposes of this document, a "common name" is a word or a phrase, without imposed syntactic structure, that may be associated with a resource. These common names will be used primarily by humans, as opposed to machine agents. A common name "resolution service" handles these associations between common names and data (resources, information about resources, pointers to locations, etc.). A single common name may be associated with different data records, and more than one resolution service is expected to exist. Any common name may be used in any resolution service.

Common names are not URIs (Uniform Resource Identifiers) in that they lack the syntactic structure imposed by URIs; furthermore, unlike URNs, there is no requirement of uniqueness or persistence of the association between a common name and a resource. (Note: common names may be expressed in a URI, the syntax for which is described herein.)

This document will define a protocol for the parameterized resolution necessary to make common names useful. "Resolution" is defined as the retrieval of data associated (a priori) with descriptors that match the input request. "Parameterized" means the ability to have a multi-component descriptor. Descriptors are not required to provide unique identification; therefore zero or more records may be returned in response to a specific input query.

4. Basic object model

The protocol will consist of a simple request/response mechanism. There will be two types of queries.

   1. A `special' initial query that establishes the schema for a particular CNRP database and communicates that to the client. The CNRP client will send this query, and in turn receive an XML document defining the query parameters that the database supports.

   2. A `standard' query, which is the submission of the common name search string, along with its parameters, to the database. The query will conform to the previously established schema.
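The two query types above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name ToyResolver, its method names, the dictionary-shaped query and the case-insensitive substring matching rule are all hypothetical and are not defined by this draft, which leaves matching rules to each service and encodes queries in XML.

```python
from dataclasses import dataclass


@dataclass
class ToyResolver:
    """In-memory stand-in for a CNRP database (hypothetical, not from the draft)."""

    records: list                        # each record is a dict with a "commonname" key
    parameters: tuple = ("commonname",)  # query parameters this database advertises

    def schema(self) -> list:
        """'Special' initial query: report the supported query parameters."""
        return list(self.parameters)

    def resolve(self, query: dict) -> list:
        """'Standard' query: return zero or more records for a common name."""
        unknown = set(query) - set(self.parameters)
        if unknown:
            raise ValueError(f"query does not conform to schema: {sorted(unknown)}")
        # Assumed matching rule: case-insensitive substring match.
        needle = query["commonname"].casefold()
        return [r for r in self.records if needle in r["commonname"].casefold()]


db = ToyResolver(
    records=[
        {"commonname": "bmw", "uri": "http://www.bmw.de"},
        {"commonname": "BMW clubs", "uri": "http://example.org/bmw-clubs"},  # made-up entry
        {"commonname": "Fidonet", "uri": "http://www.fidonet.ca"},
    ],
    parameters=("commonname", "language"),
)
supported = db.schema()                    # first exchange: learn the schema
matches = db.resolve({"commonname": "BMW"})  # second exchange: resolve a name
```

Because descriptors need not be unique, resolve() returns a list that may be empty or hold several records; a client has to be prepared for both.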
There will be a set of query parameters, listed below, treated as hints by the server. Note: a CNRP database will accept any correctly encoded CNRP query parameter; the extent to which a query result is responsive to those parameters is a service differentiator. The base properties that are always supported are common name, language, geography, category, and range (start and length of the result set). CNRP allows database service providers to create unique data types and surface them to any CNRP client via the CNRP schema XML documents.

4.1 Hints

A hint is an assertion by the user about himself or herself and the context in which he or she is operating. There is no data type `hint'; a hint is expressed within the structure of the query itself and is limited or enabled by the richness of the defined query namespace. In effect, a query and any parameter within it is a hint. An example of this would be the required language parameter, with which a query can specify the primary language in which you want to see results, the secondary language, and so on. So a request to see results in US English, followed by European French and South American Spanish, would list those three languages in that order.

The fact that a hint exists does not mean that a CNRP database must respond to it. This best-effort approach is similar to relevance ranking in a search engine (high precision, low recall); hints are similar to a search engine's selection criteria. CNRP services will attempt to return the results "closest" to the selection criteria. This is quite different from a SQL database approach, where a SQL query returns the entire results set and each result in the set must match all the requirements expressed by the qualifier (the SQL WHERE clause).

4.2 Transport independence

This document defines CNRP in terms of an object model, the encoding scheme used to express it (XML documents), and a request/response interaction model. Therefore CNRP is transport-independent.
It is expected that the primary transport used for CNRP will be HTTP, but that is certainly not a requirement. Most aspects of authentication and security are a requirement of the transport and not of CNRP. The protocol does not, in and of itself, support any authentication and security.

Discovery of the transport associated with a CNRP database is accomplished through DNS. The syntax for a CNRP URI is:

   CNRP:<[host]>:<[port]>/path/; paraname=value,paraname=value,...

"CNRP", in conjunction with the URI content, denotes a DNS entry containing a Naming Authority Pointer (NAPTR). The NAPTR specifies how a CNRP URI is dynamically rewritten by the client to adhere to some transport (HTTP, GOPHER, etc.). Because the result of this rewrite can be a URL, a CNRP URI can thus be cached and assigned a time-to-live (TTL).

5 Object Model:

5.1 Properties:

5.1.1 Base properties

In CNRP, objects are property lists. A property has a unique name and type. Some properties can be part of the query or the results list or both. For simplicity, CNRP limits property values to string values.

CNRP introduces a set of base properties. Among these properties, CNRP distinguishes between core properties and optional properties. Core properties are the minimal set of properties that all CNRP services MUST support. The core properties define the level of interoperability between CNRP services. The proposed core properties are:

   1. CommonName: the common name associated with a resource.

   2. ID: an opaque string that serves as a unique identifier (typically a database ID).

   3. URI: a URI as defined by RFC 2396.

In addition to core properties, CNRP introduces optional properties to enable a wider range of CNRP-based services. Although these properties are not required, it is expected that many services, especially large ones, will implement them. An equally important goal for introducing additional properties is to provide a powerful results filtering mechanism.
This is a requirement for large namespaces that contain several million names. The optional properties are:

   1. Language: The language associated with a resource.

   2. Geography: The geographical region or location associated with a resource.

   3. Category: The category associated with a resource.

   4. Description: A short text abstract associated with a resource.

   5. Range: The range is a results set control parameter. The range property is used to specify the starting point and the length of a results set (e.g. "I want 5 records starting at the 10th record").

The language property is expressed using language values as defined by ISO 639.

5.1.2 Multi type properties

The "geography" and "category" properties introduced in the CNRP model can be expressed using many different value sets. For example, geography can be specified in terms of a country code, a postal code or spatial coordinates. Therefore, for such properties, CNRP introduces a "type" attribute. To facilitate interoperability, CNRP defines the main primitive types as well. Property types can be extended by a specific service through the definition of new type values (see extensibility section). The multi-type properties and the main types are defined below:

5.1.2.1 Geography:

   1. type = "free form"
      value = a free form expression for a geographical location (e.g. "palo alto in california").

   2. type = "ISO3166-1"
      value = a geographical region expressed using a standard country code as defined by ISO3166-1 (e.g. "US").

   3. type = "ISO3166-2"
      value = a geographical region expressed using standard region and country codes as defined by ISO3166-2 (e.g. "US-CA").

   4. type = "GPS"
      value = a geographical location expressed using the standard GPS coordinates system (e.g. ???)

   5. type = "LLE"
      value = a geographical location expressed using the Latitude-Longitude-Elevation coordinates system, i.e. the latitude, longitude and elevation of the location (e.g. ???)

5.1.2.2 Category

   1. type = "free form"
      value = a free form expression for a category (e.g. "movies").

   2. type = "NAICS"
      value = a code from the North American Industry Classification System.

When the "type" is unspecified, it defaults to "free form". The free form type is important because it allows a very simple user interface in which the user can enter a value in a text field. It is up to the service to interpret the value correctly and take advantage of it to increase the relevance of results (using specialized dictionaries, for instance).

5.1.2.3 Common name - String encoding and equivalence rules

CNRP specifies that common name strings should be encoded using UTF-8. CNRP does not specify any string equivalence rules for matching a common name in the query against a common name of a Resource. String equivalence rules are language and service dependent. They are specific to relevance ranking algorithms, and are hence treated as a matter for each CNRP service. Consequently, string equivalence rules are not part of the CNRP protocol specification. For example, the query member:

   bmw

should be read as a selection criterion for a resource with a common name LIKE (similar to) the string "bmw", where the exact definition of the LIKE operator is intuitive, yet specific to the queried CNRP service.

5.2 Objects:

5.2.1 Query:

The Query object encapsulates all the query parameters such as CommonName, ID, language, geography, category, and range. A Query cannot be empty. A Query must contain either a common name or an ID. A Query can also contain the custom properties defined by a specific CNRP service. For example, a query for the first 5 resources whose common name is like "bmw" would be expressed as:

   bmw

5.2.1.1 Logical operations within a Query

The Query syntax is extremely simple. CNRP does not extensively support Boolean logic operators such as OR, AND or NOT.
However, there exist two implicit logical operations that can be expressed through the Query object and its properties.

First, a query with multiple property-value pairs implicitly expresses an AND operation on the query terms. For instance, the CNRP query to request all the resources whose common name is like "bmw" AND whose language is "German" can be expressed as:

   bmw de-DE

Note, however, that because the server is only trying to best match the Query criteria, there is no guarantee that all or any of the resources in the results match both requirements.

In addition, for enumerated value types only (e.g. language), CNRP allows the client to express a logical OR by specifying multiple values for the same property within the Query. For example, the logical expression:

   property = value1 OR property = value2 ... OR property = valueN

will be expressed as:

   value1 value2 ... valueN

It is important to emphasize that this form is only applicable to enumerated types. In particular, logical OR operations on the common name are not supported.

Note that the ordering of the property-value pairs in the query implies a precedence. As a consequence, CNRP also introduces one special string value: "*". Not surprisingly, "*" means all admissible values for the typed property. For example, the following query requests all the resources whose common name is like BMW and whose language is preferably German, then French, then any other language.

   bmw de-DE fr-FR *

5.2.1.2 Direct and indirect property values in the Query:

An important concept is that in a query, a CNRP property can carry two different types of values: an indirect and a direct value. The indirect value corresponds to an implicit, configuration-like setting. Indirect values will typically be derived from a user's global preferences, general client settings or any other source of user profile information.
For example, a language preference set to French for browsing would translate into:

   fr-FR

Another important source for indirect values is the physical "entry point" for that specific query, that is, the context surrounding the environment where the query has been entered. For example, if a CNRP query has been entered on a German Web site, there is a strong (yet implicit) assumption that the user is predominantly looking for resources in the German language.

The direct value, on the other hand, corresponds to the local value specified by the CNRP user for a specific query ("In this CNRP query, I am looking for Web pages with the common name "fettuccini" that are in the Italian language, although my default language is set to French."). Distinguishing between indirect and direct query parameters is extremely important for a CNRP server in order to generate the most relevant list of results. To differentiate a direct value from an indirect value, CNRP uses a "source" property attribute:

   source = "indirect" | "direct"

Because they have different precedence, it is important for a CNRP resolution service to be able to distinguish between the two types of information sources. When unspecified, the source is assumed to be "direct" (the default value of the source attribute).

For instance, in a wireless device, the device location (geography) can be determined accurately. The wireless CNRP client may decide to pass the device's current location as an indirect specification of the desired geography for the resource. For example, if the user is looking for resources with common name "Star Bucks", the device could pass the following information to the client:

   345,455,3555

At the same time, the same user intending to travel to Montreal may want to explicitly specify in the query that he is actually looking for a Star Bucks in Montreal. In such a case the query would contain the following information:

   Montreal 345,455,3555

5.3 Results:

The results object is a container for CNRP results.
The types of objects contained in Results can be: Resource, Service, Error, Referral and Schema.

5.3.1 Resource

A Resource object describes a resource (e.g. a Web page, a person, an object identified by a URI). The Resource object can contain the commonname, URI, ID, description, language, geography, and category of the resource. A Resource can also be augmented using custom properties. Lastly, a Resource can also aggregate a service object.

   bmw de-DE foo.com:234364 http://www.bmw.de de car companies Wunderbar BMWs!

5.3.2 Service

The Service object provides an encapsulation of an instance of a CNRP service. A service is uniquely identified through the ServiceURI property. In order to support services relying on a network of distributed servers, CNRP introduces the serverURI property. The ServerURI embodies a protocol, a server name and a port number for accessing the service. Services can also include a description, a brief textual description of the service.

   http://cnrp.us.foo.com:8081 http://cnrp.foo.com foo.com is a CNRP service specialized on cocktail recipes

The service object can also be extended by including existing properties to further describe the service. For instance, a service that focuses on French companies could be expressed as:

   http://cnrp.us.foo.com:8081 http://cnrp.foo.com companies FR

The service object also encapsulates a list of server objects. The server object is used to describe a CNRP server (or a cluster of servers). A server is identified through its serverURI. A server can be further described using existing properties. For example, the following defines two clusters of CNRP servers, one in the US and one in France.

   http://cnrp.us.foo.com:8081 http://cnrp.foo.com http://router.us.widgetco.com:4321/foo? US http://router.fr.acmecorp.com:4321/foo? FR

5.3.3 Error

An Error object indicates an error in the results set. The error object encapsulates two properties: an error number and an error description.

   345 The CNRP foo.com database is temporarily unreachable

5.3.4 Referral

A Referral object in the results set is a placeholder for un-fetched results from a different service. Referrals typically occur when a CNRP server knows of another service capable of providing relevant results for the query and wants to notify the client about this possibility. The client can decide whether it wants to follow the referral and resolve the extra results by contacting the referred-to service using the information contained within the Referral object (a Service object). The Referral is a simple mechanism to enable hierarchical resolution as well as to join multiple resolution services together.

   bmw de-DE foo.com:234364 http://www.bmw.de/ http://cnrp.bar.com/ http://resolver.bar.com/

5.3.5 ServiceQuery & schema:

A subclass of Query, the ServiceQuery object supports the dynamic discovery of a specific CNRP service's characteristics. The response to a ServiceQuery returns the Service object described in section 5.3.2. As explained before, the service object describes the CNRP service. Furthermore, a Service object describes:

   1. The new Properties introduced by the CNRP service (Property schema),

   2. The properties used to describe the Service object (Service schema),

   3. The properties that belong to the query interface (Query schema),

   4. The properties that belong to a resource within the results (Resource schema).

These lead to the following new object definitions:

   * PropertySchema -- A property schema describes all the custom properties introduced by the service.

   * PropertyDefinition -- A property definition describes a custom property. A property definition has a name and a type (the name and the type of the property).

   * PropertyReference -- A property reference is a reference to a property definition so that it can be included within a given schema (a service, query or resource schema).

   * ServiceSchema -- The service schema defines the properties used to describe the service.
   * QuerySchema -- A query schema describes the structure of a query handled by the CNRP service.

   * ResourceSchema -- A resource schema describes the resource returned as a result by the CNRP service.

For example, a CNRP query to discover a service's capabilities will be in the form:

And for a CNRP service for cocktail recipes in French, the corresponding response would be:

   cocktailrecipe freeForm< propertyType > language < propertyReference required="no" idref="1"/>

6 XML DTD for CNRP

6.1 Examples

6.2 Service Description Request

This is what the client sends when it is requesting a server's schema.

This is the result. Notice how the Service tag is used to allow the service to describe itself in its own terms.

   urn:foo:bar http://host1.acmecorp.com:4321/foo? 1 smtp://host2.acmecorp.com:4321/foo? 1 This is the AcmeCorp CNRP Service 544554 http://adserver.acmecorp.com/ workgroupID freeform domainname BannerAdServer URI

6.3 Sending A Query and Getting A Response

This is the query that is sent from the client to the server:

   Fido CA-QC CA fr-CA

This is the result set. It is sent back in response to the query. This result set includes a referral and a non-fatal error.

   http://acmecorp.com http://serverfarm.acmecorp.com http://servers.acmecorp.co.uk Fidonet 1333459455 http://www.fidonet.ca This is ye olde Canadian Fidonet Fidonet 1333459455 http://host:port/bla

6.4 Examples to be done:

6.4.1 Complex Result

6.4.2 No Results

6.4.3 Error Conditions

7 Transport

Two CNRP transport protocols are specified. HTTP is used due to its popularity and ease of integration with other web applications. SMTP is also used as a way to illustrate a protocol that has a much different range of latency than most protocols.

7.1.1 HTTP transport

The HTTP transport is fairly simple. The client connects to an HTTP-based CNRP server and issues the POST method with the Content-Type and Accept headers set to "application/xml". The content of the POST body is the CNRP XML document that is being sent.
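The request side of the HTTP transport can be sketched with Python's standard library as follows. This is a minimal illustration, not a reference client: the helper name build_cnrp_post is hypothetical, the XML body is a placeholder supplied by the caller rather than real CNRP markup, and encoding the body as UTF-8 is an assumption. The request is only constructed here, not sent.

```python
import urllib.request


def build_cnrp_post(server_url: str, cnrp_xml: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying a CNRP XML document."""
    body = cnrp_xml.encode("utf-8")  # assumption: the document is sent as UTF-8
    return urllib.request.Request(
        server_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/xml",  # type of the POST body
            "Accept": "application/xml",        # type expected in the response
        },
    )


# "<placeholder/>" stands in for a CNRP document; it is not CNRP markup.
request = build_cnrp_post("http://cnrp.foo.com", "<placeholder/>")
```

Passing the resulting object to urllib.request.urlopen() would perform the exchange against a live server.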
The results are sent back to the client with a Content-Type of "application/xml". The body of the result is the CNRP XML document being sent to the client.

7.1.2 SMTP transport

The SMTP transport is very similar to the HTTP transport. Since there is no method to specify, the CNRP XML document is simply sent to a particular SMTP endpoint with its Content-Type set to "application/xml". The server responds by sending a response to the originator of the request with the results in the body and the Content-Type set to "application/xml".

8 Security Considerations

This is where we talk about the various security threats. Two that need to be addressed are man-in-the-middle attacks and posing as a service by spoofing a Service object.

The proposed solution for man-in-the-middle attacks is to utilize transport-level authentication and encryption where available. In the case where the transport can't provide the required level of authentication, individual entries or the entire response can be signed/encrypted.

In the case where a service attempts to pose as another by spoofing the serviceURI in the Service object, the Service object should be signed. A client can then verify the Service object's veracity by verifying the signature. How the client obtains the authoritative public key is out of scope, since it depends on the service discovery problem.

8.1 XXX To be done: add additional threat scenarios

9 IANA Considerations

The major consideration for the IANA is that the IANA will be registering well known properties and property types. It will not register values. Since this document does not discuss CNRP service discovery, the IANA will not be registering the existence of servers or Server objects.

There are two types of entities the IANA can register: parameters and parameter types. If a parameter or type is not registered with the IANA, then its name must start with "x-".
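As a rough illustration of the "x-" rule, a server or client could check parameter names as sketched below. The function name is hypothetical, the registered set simply mirrors the parameter names in the Appendix A templates, and comparing names case-insensitively is an assumption that this document does not state.

```python
# Parameter names from the Appendix A templates; lower-casing them is an assumption.
REGISTERED_PARAMETERS = frozenset({"geography", "language", "category", "range"})


def is_acceptable_parameter(name: str) -> bool:
    """Return True if `name` is registered or carries the unregistered "x-" prefix."""
    lowered = name.lower()
    return lowered in REGISTERED_PARAMETERS or lowered.startswith("x-")
```

Under these assumptions, "Geography" and "x-colour" would both be accepted, while an unregistered name without the prefix, such as "colour", would be rejected.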
The required information for the registration of a new parameter is the parameter's name, its default type, and a general description. A new type requires the type's name, what parameters it is valid for, and a description. See Appendix A for some example parameter and type registrations.

10 Appendix A: Well Known Parameter and Type Registration Templates

Parameter Name: Geography
Default Type: ISO3166-1
Description: A geographic location

Parameter Name: Language
Default Type: RFCXXXX
Description: A language specification

Parameter Name: Category
Default Type: freeform
Description: A node in some system of semantic relationships that is considered relevant to the common-name.

Parameter Name: Range
Default Type: range
Description: A range given in the format "x,y" where x is the starting point and y is the length. This parameter is used by the client to tell the server that it is requesting a subrange of the results.

Types:

Type: freeform
Parameter: ALL
Description: The value is to be interpreted by the server the best way it knows how. This value has no defined structure.

Type: ISO3166-2
Parameter: Geography
Description: The combination of country and subregion codes.

Type: ISO3166-1
Parameter: Geography
Description: Country Codes

Type: POSTALCODE
Parameter: Geography
Description: A postal code that is valid for some region. A good example is the Zip code system used in the US.

Type: GPS
Parameter: Geography
Description: A code in the format used by the Global Positioning System

Type: ISO639
Parameter: Language
Description: language codes

Type: NAICS
Parameter: Category
Description: North American Industry Classification System

11 Author contact information

Nico Popp
RealNames Corporation
2 Circle Star Way, 2nd Floor
San Carlos, CA 94070-1350
Phone: (650) 298 8080
Email: nico@realnames.com

Michael Mealling
Network Solutions, Inc.
505 Huntmar Park Drive
Herndon, VA 22070
Phone: (703) 742-0400
EMail: michaelm@netsol.com

Marshall Moseley
Netword, Inc.
702 Russell Avenue
Gaithersburg, MD 20877-2606
Phone: (240) 631-1100
EMail: marshall@netword.com

This Internet Draft expires on May 23, 2000