Dublin Core Workshop Series                                   S. Weibel
Internet-Draft                                                 J. Kunze
draft-kunze-dc-00.txt                                         C. Lagoze
9 February 1997
Expire in six months


          Dublin Core Metadata for Simple Resource Description


1. Status of this Document

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited. Please send comments to weibel@oclc.org, or to the discussion list meta2@mrrl.lut.ac.uk.

2. Introduction

Finding information on the World Wide Web has become increasingly problematic in proportion to the explosive growth of available resources. Web indexing evolved rapidly to fill the demand for resource discovery tools, but indexing, while enormously useful, is a poor substitute for richer varieties of resource description.

An invitational workshop in March of 1995 brought together librarians, digital library researchers, and text-markup specialists to address the problem of resource description for networked resources. This activity evolved into a series of related workshops and ancillary activities that have become known collectively as the Dublin Core Metadata Workshop Series. This report summarizes the state of this effort.
The initial motivation for the first workshop was simply to do something that would improve the prospects for resource discovery on the Web. Specifically, the goal was to identify a simple set of common description elements that authors (or content managers) could embed in their documents to promote their discovery -- something like a catalog card for a network resource. The term "Dublin Core" applies to this simple core of descriptive elements.

3. Simple Resource Description

The goals that motivate the Dublin Core effort are:

     Simplicity of creation and maintenance
     Commonly understood semantics
     International scope and applicability
     Extensibility

These requirements work at cross purposes to some degree, but all are desirable goals. The ensuing two years of discussion have been to some degree an exercise in minimizing the tensions among them.

The development of formal ontologies is currently a prominent line of research in digital library communities, aimed at identifying the structure of knowledge in a given discipline, and linking these structures into a larger whole. In contrast, one might think of this workshop series as an attempt to identify an "emergent ontology", that is, a consensus among experienced practitioners across many disciplines about the basic elements of resource discovery.

4. Description of Dublin Core Elements

The following comprises the reference definition of the Dublin Core Metadata Element set as of December, 1996. The elements or their names are not expected to change substantively from this list, though the application of some of them is currently experimental and subject to interpretation. Further, it is expected that practice will evolve to include qualifiers for certain of the elements. The reference description of the elements resides at

     http://purl.org/metadata/dublin_core_elements

Note that elements have a descriptive name intended to convey a common semantic understanding of the element.
In addition, a formal, single-word label is specified to make syntactic specification of elements simpler in encoding schemes. Each element is optional and repeatable. Element descriptions follow.

4.1. Title
Label: TITLE

The name given to the resource by the CREATOR or PUBLISHER.

4.2. Author or Creator
Label: CREATOR

The person(s) or organization(s) primarily responsible for the intellectual content of the resource. Examples include authors in the case of written documents, and artists, photographers, or illustrators in the case of visual resources.

4.3. Subject and Keywords
Label: SUBJECT

The topic of the resource, or keywords or phrases that describe the subject or content of the resource. The intent of the specification of this element is to promote the use of controlled vocabularies and keywords. This element might well include scheme-qualified classification data (for example, Library of Congress Classification Numbers or Dewey Decimal numbers) or scheme-qualified controlled vocabularies (such as Medical Subject Headings or Art and Architecture Thesaurus descriptors) as well.

4.4. Description
Label: DESCRIPTION

A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources. Future metadata collections might well include computational content description (spectral analysis of a visual resource, for example) that may not be embeddable in current network systems. In such a case this field might contain a link to such a description rather than the description itself.

4.5. Publisher
Label: PUBLISHER

The entity responsible for making the resource available in its present form, such as a publisher, a university department, or a corporate entity. The intent of specifying this field is to identify the entity that provides access to the resource.

4.6. Other Contributor
Label: CONTRIBUTOR

Person(s) or organization(s) in addition to those specified in the CREATOR element who have made significant intellectual contributions to the resource but whose contribution is secondary to the individuals or entities specified in the CREATOR element (for example, editors, transcribers, illustrators, and convenors).

4.7. Date
Label: DATE

The date the resource was made available in its present form. The recommended best practice is an 8-digit number in the form YYYYMMDD as defined by ANSI X3.30-1985. In this scheme, the date element for the day this is written would be 19961203, or December 3, 1996. Many other schemes are possible, but if used, they should be identified in an unambiguous manner.

4.8. Resource Type
Label: TYPE

The category of the resource, such as home page, novel, poem, working paper, technical report, essay, dictionary. It is expected that RESOURCE TYPE will be chosen from an enumerated list of types.

4.9. Format
Label: FORMAT

The data representation of the resource, such as text/html, ASCII, Postscript file, executable application, or JPEG image. The intent of specifying this element is to provide information necessary to allow people or machines to make decisions about the usability of the encoded data (what hardware and software might be required to display or execute it, for example). As with RESOURCE TYPE, FORMAT will be assigned from enumerated lists such as registered Internet Media Types (MIME types). In principle, formats can include physical media such as books, serials, or other non-electronic media.

4.10. Resource Identifier
Label: IDENTIFIER

String or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally-unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names, would also be candidates for this element.

4.11. Source
Label: SOURCE

The work, either print or electronic, from which this resource is derived, if applicable. For example, an HTML encoding of a Shakespearean sonnet might identify the paper version of the sonnet from which the electronic version was transcribed.

4.12. Language
Label: LANGUAGE

Language(s) of the intellectual content of the resource. Where practical, the content of this field should coincide with the NISO Z39.53 three-character codes for written languages.

4.13. Relation
Label: RELATION

Relationship to other resources. The intent of specifying this element is to provide a means to express relationships among resources that have formal relationships to others, but exist as discrete resources themselves. Examples include images in a document, chapters in a book, or items in a collection. A formal specification of RELATION is currently under development. Users and developers should understand that use of this element should currently be considered experimental.

4.14. Coverage
Label: COVERAGE

The spatial locations and temporal durations characteristic of the resource. Formal specification of COVERAGE is currently under development. Users and developers should understand that use of this element should currently be considered experimental.

4.15. Rights Management
Label: RIGHTS

The content of this element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement, or perhaps a server that would provide such information in a dynamic way. The intent of specifying this field is to allow providers a means to associate terms and conditions or copyright statements with a resource or collection of resources. No assumptions should be made by users if such a field is empty or not present.

5. Security Considerations

The Dublin Core element set poses no risk to computers and networks.
It poses minimal risk to searchers who obtain incorrect or private information due to careless mapping from rich data descriptions to the simple Dublin Core scheme. No other security concerns are likely to be affected by the element description consensus documented here.

6. References

[1] Weibel, S., Miller, E., "Dublin Core Metadata Element Set: Reference Description", http://purl.org/metadata/dublin_core_elements

7. Authors' Addresses

Stuart L. Weibel
OCLC Online Computer Library Center, Inc.
Office of Research
6565 Frantz Rd.
Dublin, Ohio, 43017, USA
Email: weibel@oclc.org
Voice: +1 614-764-6081
Fax: +1 614-764-2344

John A. Kunze
Center for Knowledge Management
University of California, San Francisco
530 Parnassus Ave, Box 0840
San Francisco, CA 94143-0840, USA
Email: jak@ckm.ucsf.edu
Voice: +1 415-502-6660
Fax: +1 415-476-4653

Carl Lagoze
Digital Library Research Group
Department of Computer Science
Cornell University
Ithaca, NY 14853, USA
Email: lagoze@cs.cornell.edu
Voice: +1-607-255-6046
Fax: +1-607-255-4428


APPENDIX: A Proposed Convention for Embedding Metadata in HTML

The following proposed convention reflects the consensus of a break-out group at the W3C Distributed Indexing and Searching Workshop, May 28-29, 1996, concerning tagging of meta information in HTML. This break-out group included representatives of the Dublin Core/Warwick Framework Metadata meetings, Lycos, Microsoft, WebCrawler, the IEEE metadata effort, Verity Software, and the W3C.

Attendees (alphabetically):

     Nick Arnett           narnett@verity.com
     Mic Bowman            bowman@transarc.com
     Eliot Christian       echristi@usgs.gov
     Dan Connolly          conolly@w3.org
     Martijn Koster        m.koster@webcrawler.com
     John Kunze            jak@ckm.ucsf.edu
     Carl Lagoze           lagoze@cs.cornell.edu
     Michael Mauldin       fuzzy@lycos.com
     Christian Mogensen    christian@vivid.com
     Wick Nichols          wickn@microsoft.com
     Timothy Niesen        tmn@swl.msd.ray.com
     Stuart Weibel         weibel@oclc.org
     Andrew Wood           woody@dstc.edu.au

1. The Problem

The problem is to identify a simple means of embedding metadata within HTML documents without requiring additional tags or changes to browser software, and without unnecessarily compromising current practices for robot collection of data.

While metadata is intended for display in some situations, it is judged undesirable for such embedded metadata to display on browser screens as a side effect of displaying a document. Therefore, any solution requires encoding information in tag attributes rather than as container element content.

The goal was to agree on a simple convention for encoding structured metadata information of a variety of types (which may or may not be registered with a central registry analogous to the MIME Type registry). It was judged that a registry may be a necessary feature of the metadata infrastructure as alternative schemas are elaborated, but that deployment in the short-term could go forward without such a registry, especially in light of the proposed use of the LINK tag to link descriptions to a standard schema description as described below.

2. A Proposed Convention

The solution agreed upon is to encode schema elements in META tags, one element per META tag, and as many META tags as are necessary. Grouping of schema elements is achieved by a prefix schema identifier associated with each schema element.

The convention agreed upon is as follows:

Thus, a partial Dublin Core citation might be encoded as follows:

And a collection of Microsoft Word metadata might be encoded as follows:

3. Linkage to the Reference Description of a Metadata Schema

It is judged useful to provide a means for linking to the reference definition of the metadata schema (or schemata) used in a document. Doing so serves as a primitive registration mechanism for metadata schemata, and lays the foundation for a more formal, machine-readable linkage mechanism in the future.
The proposed convention for doing so is as follows:

Thus, the reference description of one metadata scheme, the Dublin Core Metadata Element Set, would be referenced in the LINK HREF as follows:

The description of an element could be accessed by constructing a URL that uses the # token to identify a named anchor. Thus, the derived URL below actually links to the title element in the reference description of the Dublin Core Metadata Element Set.

     http://purl.org/metadata/dublin_core_elements#title

This URL would correspond to the human-readable description of the title element within the document by a NAME anchor such as:

     Title
          The name of the work provided by the author or publisher.

While use of the LINK tag is not required for a given schema, when used, it will make possible retrieval of the reference definition of a given schema element, and will therefore reduce the need for a formal metadata scheme registry. Multiple LINK tags can be used so that elements derived from multiple schemas can be referenced within a single document.

4. Consistency of Description Schemas

To promote consistency among resource description schemas, it is suggested that the semantics for metadata elements be related to existing well-known schemas whenever feasible.
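
The META convention of this appendix, together with the YYYYMMDD date form and the named-anchor URL construction, can be illustrated with a short Python sketch. It is not part of the proposal. The exact attribute layout (NAME="<schema>.<element>" with the value in CONTENT), the "DC" prefix, the lowercase element names, and the helper names meta_tags, element_url, and MetaCollector are all assumptions made for illustration; only the reference URL and the example date come from the text.

```python
from collections import defaultdict
from datetime import date
from html import escape
from html.parser import HTMLParser

# Reference description URL quoted in the text.
DC_REFERENCE = "http://purl.org/metadata/dublin_core_elements"


def meta_tags(schema_id, elements):
    """Render one META tag per (element, value) pair.

    Assumption: the schema prefix and element name are joined with a dot
    in NAME, and the value goes in CONTENT.
    """
    return [
        '<META NAME="{}.{}" CONTENT="{}">'.format(
            schema_id, name, escape(value, quote=True)
        )
        for name, value in elements
    ]


def element_url(reference_url, element):
    """Derive the URL of one element's description via a named anchor."""
    return "{}#{}".format(reference_url, element)


class MetaCollector(HTMLParser):
    """Group META values by schema prefix, keeping repeated elements."""

    def __init__(self):
        super().__init__()
        self.by_schema = defaultdict(lambda: defaultdict(list))

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        schema_id, dot, element = name.partition(".")
        if dot:  # skip META tags that carry no schema prefix
            content = attributes.get("content") or ""
            self.by_schema[schema_id][element].append(content)


# Elements are optional and repeatable, so the record is a list of pairs.
record = [
    ("title", "Dublin Core Metadata for Simple Resource Description"),
    ("creator", "S. Weibel"),
    ("creator", "J. Kunze"),
    ("date", date(1996, 12, 3).strftime("%Y%m%d")),  # YYYYMMDD form
]
tags = meta_tags("DC", record)

# Read the tags back the way an indexing robot might.
collector = MetaCollector()
collector.feed("<HEAD>\n" + "\n".join(tags) + "\n</HEAD>")
```

Keeping each record as a list of pairs rather than a dictionary preserves repeated elements such as the two creators, and a robot reading the tags back can separate schemas by the prefix alone without consulting a registry.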