Network Working Group M. Nottingham
Internet-Draft E. Hammer-Lahav
Intended status: Informational October 16, 2008
Expires: April 19, 2009
draft-nottingham-site-meta-00
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 19, 2009.
Abstract
This memo describes a method for locating site-wide metadata for Web
sites.
Nottingham & Hammer-Lahav Expires April 19, 2009 [Page 1]
Internet-Draft Site-Wide Metadata for the Web October 2008
Table of Contents
1. Introduction
2. Notational Conventions
3. The site-meta File Format
3.1. Site Metadata Entries
4. Discovering site-meta Files
5. Security Considerations
6. IANA Considerations
6.1. application/site-meta+xml media type registration
7. References
7.1. Normative References
7.2. Informative References
Appendix A. Acknowledgements
Appendix B. Frequently Asked Questions
B.1. Is this mechanism appropriate for all kinds of metadata?
B.2. Why not use OPTIONS * with content negotiation to discover
different types of metadata directly?
B.3. Why not use a META tag or microformat in the root resource?
B.4. Why not use response headers on the root resource, and have
clients use HEAD?
B.5. Why scope metadata to be site-wide?
B.6. Why /site-meta?
B.7. Aren't you concerned about pre-empting an authority's URI
namespace?
B.8. Why use link relations instead of media types to identify kinds
of metadata?
B.9. What impact does this have on existing mechanisms, such as P3P
and robots.txt?
B.10. Why not (insert existing similar mechanism here)?
Authors' Addresses
Intellectual Property and Copyright Statements
1. Introduction
It is increasingly common for Web-based protocols to require the
discovery of policy or metadata about a site before communicating
with it. For example, the Robots Exclusion Protocol specifies a way
for automated processes to obtain permission to access resources;
likewise, the Platform for Privacy Preferences [W3C.REC-P3P-20020416]
tells user-agents how to discover privacy policy beforehand.
While there are several ways to access per-resource metadata (e.g.,
HTTP headers, WebDAV's PROPFIND [RFC4918]), the overhead associated
with them often precludes their use in these scenarios.
When this happens, it is common to designate a "well-known location"
for site metadata, so that it can be easily located. However, this
approach has the drawback of risking collisions, both with other such
designated "well-known locations" and with pre-existing resources.
To address this, this memo proposes a single (and hopefully last)
"well-known location", /site-meta, which acts as a directory to the
interesting metadata about that site. Future mechanisms that require
site-wide metadata can easily include an entry in the site-meta
directory, thereby making their metadata cheaply available (indeed,
because the site directory can be cached, the more mechanisms that
use it, the more efficient it becomes) without impinging on sites'
URI space.
The directory format allows different types of site metadata to be
referenced by URI or included inline.
Please discuss this draft on the www-talk@w3.org [1] mailing list.
2. Notational Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
3. The site-meta File Format
The site-meta file format is an extremely simple XML-based language
[W3C.REC-xml] that allows an authority (in the URI sense) to indicate
where metadata about its resources is located.
The root element is the "metadata" element, which may contain any
number of "meta" elements.
foo = bar
baz = bat
Unrecognised elements and attributes SHOULD be silently ignored when
parsing the format, unless specified otherwise. Likewise, unless
otherwise specified, the ordering of sibling elements SHOULD be ignored.
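As a minimal sketch of the format described above, the following Python fragment parses a small site-meta document and skips anything it does not recognise. The "rel" values, media types, href path, and the "future-extension" element are hypothetical, chosen only for illustration; the sketch also assumes the elements are not in an XML namespace, which this memo does not state.

```python
import xml.etree.ElementTree as ET

# Hypothetical site-meta document. The "rel" values, media types and the
# href path are invented for illustration; this memo does not define them.
SITE_META = """<metadata>
  <meta rel="example-policy" type="application/xml" href="/policy.xml"/>
  <future-extension>not understood by this parser</future-extension>
  <meta rel="example-settings" type="text/plain" unknown-attr="x">foo = bar
baz = bat</meta>
</metadata>
"""


def parse_site_meta(text):
    """Return (rel, type, href, inline_text) tuples, one per "meta" element."""
    root = ET.fromstring(text)
    if root.tag != "metadata":
        raise ValueError("root element is not 'metadata'")
    entries = []
    for child in root:
        if child.tag != "meta":
            continue  # unrecognised element: silently ignored
        # Only the attributes this sketch knows about are read;
        # any other attribute is ignored.
        entries.append((
            child.get("rel"),
            child.get("type"),
            child.get("href"),
            (child.text or "").strip() or None,
        ))
    return entries


for entry in parse_site_meta(SITE_META):
    print(entry)
```

The list preserves document order only as a side effect of parsing; a consumer following the text above would not attach meaning to that order.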
3.1. Site Metadata Entries
Each "meta" element represents a kind of site metadata that is
available. It MUST have a "rel" attribute containing a link relation
[ref TBD]. It SHOULD have a "type" attribute whose content MUST be
an internet media type [RFC4288] that hints at the format of the
metadata.
The actual metadata content may be inlined as the content of the
"meta" element, and/or referred to using the "href" attribute. The
metadata MUST be made available by at least one of these methods.
If the "href" attribute is present, its content MUST be a URI-
Reference [RFC3986] that locates the metadata. Relative URIs MUST be
evaluated with the site root URI as the base URI.
If the metadata content is included inline, it MUST appear as a child
of the "meta" element. If the metadata format is XML-based, the root
element of the metadata will thus be the first (and only) child
element of the "meta" element. If the metadata format is textual, it
will be the text content of the "meta" element (appropriately
escaped, with CDATA section(s) and/or entities).
The "meta" element MUST NOT contain any children other than inlined
metadata content.
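The requirements in this section can be sketched as a small validation routine. This is an illustration under stated assumptions rather than a reference implementation: the function name "read_entry", its return shape, and the example URIs are hypothetical, and relative references are resolved with the standard library's urljoin against a site root URI supplied by the caller.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def read_entry(meta, site_root):
    """Validate one "meta" element and return a dict describing it.

    site_root is the site root URI, e.g. "http://www.example.com/".
    """
    rel = meta.get("rel")
    if not rel:
        raise ValueError('"meta" element has no "rel" attribute')

    href = meta.get("href")
    children = list(meta)
    text = (meta.text or "").strip()

    # XML-based inline metadata is the first (and only) child element;
    # textual inline metadata is the text content of the element.
    if len(children) > 1:
        raise ValueError("inline XML metadata must have a single root element")
    inline = children[0] if children else (text or None)

    if href is None and inline is None:
        raise ValueError("metadata is neither inlined nor referenced by href")

    return {
        "rel": rel,
        "type": meta.get("type"),  # SHOULD be present; may be None
        # Relative references are resolved against the site root URI.
        "href": urljoin(site_root, href) if href is not None else None,
        "inline": inline,
    }


# Hypothetical entry with a relative reference.
meta = ET.fromstring('<meta rel="example" href="policies/p.xml"/>')
print(read_entry(meta, "http://www.example.com/")["href"])
```

An entry may carry both an href and inline content; the sketch returns both and leaves the choice between them to the caller.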
4. Discovering site-meta Files
The site-wide metadata for a given authority can be discovered by
dereferencing the path /site-meta. For example, in HTTP the
following request would obtain site metadata for the authority
"www.example.com":
GET /site-meta HTTP/1.1
Host: www.example.com
If the resource is unavailable or does not exist (in HTTP, a 404 or 410
status code), the client SHOULD infer that site metadata is not
available via this mechanism. If a representation is successfully
obtained, but is not in the format described above, clients SHOULD
infer that the site is using this URI for other purposes, and not
process it as a site-meta file.
To aid in this process, sites using this mechanism SHOULD correctly
label site-meta responses with the "application/site-meta+xml"
internet media type.
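The client behaviour described in this section might be sketched as follows. The function operates on an already-received response so that it does not depend on any particular HTTP library. Its name, the "require_label" parameter, and the handling of status codes other than 200, 404 and 410 are assumptions made for this sketch; the memo does not specify them.

```python
import xml.etree.ElementTree as ET

SITE_META_TYPE = "application/site-meta+xml"


def interpret_response(status, content_type, body, require_label=False):
    """Classify the response to a GET for /site-meta.

    Returns the parsed root element, or None when the client should
    conclude that no site metadata is available via this mechanism.
    """
    if status in (404, 410):
        return None  # resource does not exist: no site metadata
    if status != 200:
        return None  # assumption: other statuses are treated the same way

    # Strip media type parameters such as "charset" before comparing.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if require_label and media_type != SITE_META_TYPE:
        return None  # hypothetical strict mode: insist on the label

    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None  # not XML: the URI is being used for other purposes
    if root.tag != "metadata":
        return None  # XML, but not in the site-meta format
    return root


# A correctly labelled response with one hypothetical entry.
root = interpret_response(
    200,
    "application/site-meta+xml; charset=utf-8",
    b"<metadata><meta rel='example' href='/y'/></metadata>",
)
print(root is not None)
```

By default the sketch treats the media type as a hint and relies on the format check, since labelling is a SHOULD for sites; a stricter client could pass require_label=True.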
5. Security Considerations
6. IANA Considerations
6.1. application/site-meta+xml media type registration
The site-meta format, when serialized as XML 1.0, can be identified
with the following media type:
MIME media type name: application
MIME subtype name: site-meta+xml
Mandatory parameters: None.
Optional parameters:
"charset": This parameter has identical semantics to the charset
parameter of the "application/xml" media type as specified in
RFC 3023 [RFC3023].
Encoding considerations: Identical to those of "application/xml" as
described in RFC 3023 [RFC3023], section 3.2.
Security considerations: As defined in this specification. [[update
upon publication]]
In addition, as this media type uses the "+xml" convention, it
shares the same security considerations as described in RFC 3023
[RFC3023], section 10.
Interoperability considerations: There are no known interoperability
issues.
Published specification: This specification. [[update upon
publication]]
Applications which use this media type: No known applications
currently use this media type.
Additional information:
Magic number(s): As specified for "application/xml" in RFC 3023
[RFC3023], section 3.2.
File extension: None
Fragment identifiers: As specified for "application/xml" in RFC 3023
[RFC3023], section 5.
Base URI: As specified in RFC 3023 [RFC3023], section 6.
Macintosh File Type code: TEXT
Person and email address to contact for further information: Mark
Nottingham
Intended usage: COMMON
Author/Change controller: This specification's author(s). [[update
upon publication]]
7. References
7.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3023] Murata, M., St. Laurent, S., and D. Kohn, "XML Media
Types", RFC 3023, January 2001.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66,
RFC 3986, January 2005.
[RFC4288] Freed, N. and J. Klensin, "Media Type Specifications and
Registration Procedures", BCP 13, RFC 4288, December 2005.
[W3C.REC-xml]
Bray, T., Paoli, J., Sperberg-McQueen, C., and E. Maler,
"Extensible Markup Language (XML) 1.0 (2nd ed)", W3C REC-xml,
October 2000.
7.2. Informative References
[RFC4918] Dusseault, L., "HTTP Extensions for Web Distributed
Authoring and Versioning (WebDAV)", RFC 4918, June 2007.
[W3C.REC-P3P-20020416]
Marchiori, M., "The Platform for Privacy Preferences 1.0
(P3P1.0) Specification", W3C REC REC-P3P-20020416,
April 2002.
URIs
[1]
Appendix A. Acknowledgements
The authors take all responsibility for errors and omissions.
Appendix B. Frequently Asked Questions
B.1. Is this mechanism appropriate for all kinds of metadata?
No. The primary use cases are described in the introduction; when
it's necessary to discover metadata or policy before a resource is
accessed, and/or it's necessary to describe metadata for a whole site
(or large portions of it), site-meta is appropriate. In other cases
(e.g., fine-grained metadata that doesn't need to be known ahead of
time), other mechanisms are more appropriate.
B.2. Why not use OPTIONS * with content negotiation to discover
different types of metadata directly?
Two reasons: a) OPTIONS is not cacheable -- a severe problem for
scaling -- and b) it is not well-supported in browsers, and difficult
to configure in servers.
B.3. Why not use a META tag or microformat in the root resource?
This places constraints on the format of a site's root resource to be
HTML or similar. While extremely common, it isn't universal (e.g.,
mobile sites, machine-to-machine communication, etc.). Also, some
root resources are very large, which would place additional overhead
on clients and intervening networks.
B.4. Why not use response headers on the root resource, and have
clients use HEAD?
This is attractive, in that you could either put metadata directly in
response headers, or you could refer to a resource in a similar
manner to site-meta. However, it requires an extra round-trip for
metadata discovery, which is unacceptable in some scenarios.
B.5. Why scope metadata to be site-wide?
The alternative is to allow scoping to be dynamic and determined
locally, but this has its own issues, which usually come down to a)
an unreasonable number of requests to determine authoritative
metadata, and b) increased complexity, with a higher likelihood of
implementation and interoperability (or even security) problems.
Besides, many mechanisms on the Web already presume a site scope
(e.g., robots.txt, P3P, cookies, JavaScript security), and the effort
and cost required to mint a new URI authority is small and shrinking.
B.6. Why /site-meta?
It's short, descriptive and, according to search indices, not widely
used.
B.7. Aren't you concerned about pre-empting an authority's URI
namespace?
Yes, but it's unfortunately a necessary (and already present) evil;
this proposal tries to minimise future abuses.
B.8. Why use link relations instead of media types to identify kinds of
metadata?
A link relation declares the intent and use of the link (or inline
content, when present); a media type defines the format and
processing model for those bits.
B.9. What impact does this have on existing mechanisms, such as P3P and
robots.txt?
None, until they choose to use this mechanism.
B.10. Why not (insert existing similar mechanism here)?
We are aware that there are several existing proposals with similar
functionality. In our estimation, none have gained sufficient
traction. This may be because they were perceived to be too complex,
or tied too closely to one use case.
Authors' Addresses
Mark Nottingham
Email: mnot@pobox.com
URI: http://www.mnot.net/
Eran Hammer-Lahav
Email: eran@hueniverse.com
URI: http://www.hueniverse.com/
Full Copyright Statement
Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.