Dublin Core Workshop Series                                   S. Weibel
Internet-Draft                                                J. Kunze
draft-kunze-dc-00.txt                                        C. Lagoze
9 February 1997
Expires in six months


          Dublin Core Metadata for Simple Resource Description


1. Status of this Document

This document is an Internet-Draft.  Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups.  Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.  Please send comments
to weibel@oclc.org, or to the discussion list meta2@mrrl.lut.ac.uk.


2. Introduction

Finding information on the World Wide Web has become increasingly
problematic in proportion to the explosive growth of available resources.
Web indexing evolved rapidly to fill the demand for resource discovery
tools, but indexing, while enormously useful, is a poor substitute for
richer varieties of resource description.

An invitational workshop in March of 1995 brought together librarians,
digital library researchers, and text-markup specialists to address the
problem of resource description for networked resources.  This activity
evolved into a series of related workshops and ancillary activities
that have become known collectively as the Dublin Core Metadata
Workshop Series.  This report summarizes the state of this effort.

The initial motivation for the first workshop was simply to do
something that would improve the prospects for resource discovery on
the Web.  Specifically, the goal was to identify a simple set of common
description elements that authors (or content managers) could embed in
their documents to promote their discovery -- something like a catalog
card for a network resource.  The term "Dublin Core" applies to
this simple core of descriptive elements.

3. Simple Resource Description

The goals that motivate the Dublin Core effort are:

    Simplicity of creation and maintenance
    Commonly understood semantics
    International scope and applicability
    Extensibility

These requirements work at cross purposes to some degree, but all are
desirable goals.  The ensuing two years of discussion have been to some
degree an exercise in minimizing the tensions among them.

The development of formal ontologies is currently a prominent line of
research in digital library communities, aimed at identifying the
structure of knowledge in a given discipline, and linking these
structures into a larger whole.  In contrast, one might think of this
workshop series as an attempt to identify an "emergent ontology",
that is, a consensus among experienced practitioners across many
disciplines about the basic elements of resource discovery.


4. Description of Dublin Core Elements  

The following comprises the reference definition of the Dublin Core
Metadata Element set as of December, 1996.  The elements or their names
are not expected to change substantively from this list, though the
application of some of them is currently experimental and subject
to interpretation.  Further, it is expected that practice will evolve
to include qualifiers for certain of the elements.  The reference
description of the elements resides at

    http://purl.org/metadata/dublin_core_elements

Note that elements have a descriptive name intended to convey a common
semantic understanding of the element.  In addition, a formal, single-
word label is specified to make syntactic specification of elements
simpler in encoding schemes.  Each element is optional and repeatable.
Element descriptions follow.


4.1. Title			Label: TITLE

     The name given to the resource by the CREATOR or PUBLISHER.

4.2. Author or Creator		Label: CREATOR

     The person(s) or organization(s) primarily responsible for the
     intellectual content of the resource.  For example, authors in the
     case of written documents, artists, photographers, or illustrators
     in the case of visual resources.

4.3. Subject and Keywords	Label: SUBJECT

     The topic of the resource, or keywords or phrases that describe
     the subject or content of the resource.  The intent of the
     specification of this element is to promote the use of controlled
     vocabularies and keywords.  This element might well include
     scheme-qualified classification data (for example, Library of
     Congress Classification Numbers or Dewey Decimal numbers) or
     scheme-qualified controlled vocabularies (such as Medical Subject
     Headings or Art and Architecture Thesaurus descriptors) as well.
   
4.4. Description		Label: DESCRIPTION

     A textual description of the content of the resource, including
     abstracts in the case of document-like objects or content
     descriptions in the case of visual resources.  Future metadata
     collections might well include computational content description
     (spectral analysis of a visual resource, for example) that may not
     be embeddable in current network systems.  In such a case this
     field might contain a link to such a description rather than the
     description itself.

4.5. Publisher			Label: PUBLISHER

     The entity responsible for making the resource available in its
     present form, such as a publisher, a university department, or a
     corporate entity.   The intent of specifying this field is to
     identify the entity that provides access to the resource.
     
4.6. Other Contributor 		Label: CONTRIBUTOR

     Person(s) or organization(s) in addition to those specified in the
     CREATOR element who have made significant intellectual contributions
     to the resource but whose contribution is secondary to the individuals
     or entities specified in the CREATOR element (for example, editors,
     transcribers, illustrators, and convenors).

4.7. Date			Label: DATE

     The date the resource was made available in its present form.  The
     recommended best practice is an 8-digit number in the form YYYYMMDD
     as defined by ANSI X3.30-1985. In this scheme, the date element for
     the day this is written would be 19961203, or December 3, 1996.
     Many other schemes are possible, but if used, they should be
     identified in an unambiguous manner.
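
     The recommended YYYYMMDD form can be produced and checked
     mechanically.  The following is a minimal Python sketch, not part
     of the element definition; the helper names to_dc_date and
     from_dc_date are hypothetical and chosen only for illustration.

```python
from datetime import date, datetime


def to_dc_date(d: date) -> str:
    """Format a date as the 8-digit YYYYMMDD string recommended for DATE."""
    return d.strftime("%Y%m%d")


def from_dc_date(text: str) -> date:
    """Parse an 8-digit YYYYMMDD string; raise ValueError for anything else."""
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"not an 8-digit YYYYMMDD date: {text!r}")
    return datetime.strptime(text, "%Y%m%d").date()


# The example date used in the element description above.
print(to_dc_date(date(1996, 12, 3)))
print(from_dc_date("19951126"))
```

     Rejecting anything that is not exactly eight digits keeps other
     date schemes from being silently misread as YYYYMMDD.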
   
4.8. Resource Type 		Label: TYPE

     The category of the resource, such as home page, novel, poem, working
     paper, technical report, essay, dictionary.  It is expected that
     RESOURCE TYPE will be chosen from an enumerated list of types.

4.9. Format  			Label: FORMAT
   
     The data representation of the resource, such as text/html, ASCII,
     PostScript file, executable application, or JPEG image.  The intent
     of specifying this element is to provide information necessary to
     allow people or machines to make decisions about the usability of
     the encoded data (what hardware and software might be required to
     display or execute it, for example).  As with RESOURCE TYPE, FORMAT
     will be assigned from enumerated lists such as registered Internet
     Media Types (MIME types).  In principle, formats can include
     physical media such as books, serials, or other non-electronic media.
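
     As an illustration of drawing FORMAT values from an enumerated
     list, the sketch below looks up an Internet Media Type from a file
     name using Python's standard mimetypes module.  The function name
     guess_format and the choice of application/octet-stream as the
     fallback are assumptions made for this example; nothing here is
     prescribed by the element definition.

```python
import mimetypes


def guess_format(filename: str) -> str:
    """Return a media type for a file name, or a generic fallback."""
    media_type, _encoding = mimetypes.guess_type(filename)
    # Assumption: unknown extensions fall back to the generic binary type.
    return media_type or "application/octet-stream"


print(guess_format("index.html"))
print(guess_format("rfc1866.txt"))
```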

      
4.10. Resource Identifier 	Label: IDENTIFIER

     String or number used to uniquely identify the resource.  Examples
     for networked resources include URLs and URNs (when implemented).
     Other globally-unique identifiers, such as International Standard
     Book Numbers (ISBN) or other formal names would also be candidates
     for this element.

4.11. Source			Label: SOURCE

     The work, either print or electronic, from which this resource
     is derived, if applicable. For example, an HTML encoding of a
     Shakespearean sonnet might identify the paper version of the
     sonnet from which the electronic version was transcribed.

4.12. Language 			Label: LANGUAGE

     Language(s) of the intellectual content of the resource.  Where
     practical, the content of this field should coincide with the
     NISO Z39.53 three-character codes for written languages.

4.13. Relation			Label: RELATION 

     Relationship to other resources.  The intent of specifying this
     element is to provide a means to express relationships among
     resources that have formal relationships to others, but exist as
     discrete resources themselves.  Examples include images in a
     document, chapters in a book, and items in a collection.  A formal
     specification of RELATION is currently under development.  Users
     and developers should understand that use of this element should
     currently be considered experimental.

4.14. Coverage			Label: COVERAGE

     The spatial locations and temporal durations characteristic of the
     resource.  Formal specification of COVERAGE is currently under
     development.  Users and developers should understand that use of
     this element should currently be considered experimental.

4.15. Rights Management 	Label: RIGHTS
   
     The content of this element is intended to be a link (a URL or
     other suitable URI as appropriate) to a copyright notice, a
     rights-management statement, or perhaps a server that would
     provide such information in a dynamic way.  The intent of
     specifying this field is to allow providers a means to associate
     terms and conditions or copyright statements with a resource or
     collection of resources.   No assumptions should be made by users
     if such a field is empty or not present.


5. Security Considerations

The Dublin Core element set poses no risk to computers and networks.
It poses minimal risk to searchers who obtain incorrect or private
information due to careless mapping from rich data descriptions to
the simple Dublin Core scheme.  No other security concerns are likely
to be affected by the element description consensus documented here.


6. References

   [1] Weibel, S., Miller, E., "Dublin Core Metadata Element Set:
       Reference Description",
       http://purl.org/metadata/dublin_core_elements


7. Authors' Addresses

Stuart L. Weibel
OCLC Online Computer Library Center, Inc.
Office of Research
6565 Frantz Rd.
Dublin, Ohio, 43017, USA
Email: weibel@oclc.org
Voice: +1 614-764-6081
Fax:   +1 614-764-2344

John A. Kunze
Center for Knowledge Management
University of California, San Francisco
530 Parnassus Ave, Box 0840
San Francisco, CA  94143-0840, USA
Email: jak@ckm.ucsf.edu
Voice: +1 415-502-6660
Fax:   +1 415-476-4653

Carl Lagoze
Digital Library Research Group
Department of Computer Science
Cornell University
Ithaca, NY  14853, USA
Email: lagoze@cs.cornell.edu
Voice: +1-607-255-6046
Fax:   +1-607-255-4428


APPENDIX:  A Proposed Convention for Embedding Metadata in HTML.

The following proposed convention reflects the consensus of a break-out
group at the W3C Distributed Indexing and Searching Workshop, May 28-29,
1996, concerning tagging of meta information in HTML.  This break-out
group included representatives of the Dublin Core/Warwick Framework
Metadata meetings, Lycos, Microsoft, WebCrawler, the IEEE metadata effort,
Verity Software, and the W3C.

                        Attendees (alphabetically):

 Nick Arnett        narnett@verity.com       Mic Bowman       bowman@transarc.com
 Eliot Christian    echristi@usgs.gov        Dan Connolly     conolly@w3.org
 Martijn Koster     m.koster@webcrawler.com  John Kunze       jak@ckm.ucsf.edu
 Carl Lagoze        lagoze@cs.cornell.edu    Michael Mauldin  fuzzy@lycos.com
 Christian Mogensen christian@vivid.com      Wick Nichols     wickn@microsoft.com
 Timothy Niesen     tmn@swl.msd.ray.com      Stuart Weibel    weibel@oclc.org
 Andrew Wood        woody@dstc.edu.au


1. The Problem

The problem is to identify a simple means of embedding metadata within HTML
documents without requiring additional tags or changes to browser software,
and without unnecessarily compromising current practices for robot
collection of data.

While metadata is intended for display in some situations, it is judged
undesirable for such embedded metadata to display on browser screens as
a side effect of displaying a document. Therefore, any solution requires
encoding information in tag attributes rather than as container element
content.

The goal was to agree on a simple convention for encoding structured
metadata information of a variety of types (which may or may not be
registered with a central registry analogous to the MIME Type registry).
It was judged that a registry may be a necessary feature of the metadata
infrastructure as alternative schemas are elaborated, but that deployment
in the short-term could go forward without such a registry, especially
in light of the proposed use of the LINK tag to link descriptions to a
standard schema description as described below.


2. A Proposed Convention

The solution agreed upon is to encode schema elements in META tags, one
element per META tag, and as many META tags as are necessary.  Grouping of
schema elements is achieved by a prefix schema identifier associated with
each schema element.  The convention agreed upon is as follows:

     <META NAME    = "schema_identifier.element_name"
           CONTENT = "string data">

Thus, a partial Dublin Core citation might be encoded as follows:

     <META NAME    = "DC.title"
           CONTENT = "HTML 2.0 Specification">

     <META NAME    = "DC.creator"
           CONTENT = "Berners-Lee, Tim">

     <META NAME    = "DC.creator"
           CONTENT = "Connolly, Dan">

     <META NAME    = "DC.date"
           CONTENT = "19951126">

     <META NAME    = "DC.identifier"
           CONTENT = "ftp://ds.internic.net/rfc/rfc1866.txt">

And a collection of Microsoft Word metadata might be encoded as follows:

     <META NAME    = "MSW.title"
           CONTENT = "W3C Indexing Work Shop Report">

     <META NAME    = "MSW.creator"
           CONTENT = "Nichols, Wick">

     <META NAME    = "MSW.date"
           CONTENT = "19960630">
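
The convention above lends itself to mechanical generation.  The
following Python sketch emits one META tag per element value, so that
repeated elements (such as two creators) simply produce repeated tags.
The function name meta_tags and the dictionary layout of the record are
assumptions made for illustration; escaping of the attribute values is
an added precaution that the convention itself does not discuss.

```python
from html import escape


def meta_tags(schema_id: str, record: dict) -> str:
    """Emit one META tag per value, named schema_id.element_name."""
    lines = []
    for element, values in record.items():
        for value in values:  # every element is optional and repeatable
            name = escape(f"{schema_id}.{element}", quote=True)
            content = escape(value, quote=True)
            lines.append(f'<META NAME="{name}" CONTENT="{content}">')
    return "\n".join(lines)


# The partial Dublin Core citation shown above, as element -> list of values.
record = {
    "title": ["HTML 2.0 Specification"],
    "creator": ["Berners-Lee, Tim", "Connolly, Dan"],
    "date": ["19951126"],
    "identifier": ["ftp://ds.internic.net/rfc/rfc1866.txt"],
}
print(meta_tags("DC", record))
```

Passing a different schema identifier, such as "MSW", groups the same
kind of record under another schema without any change to the code.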


3. Linkage to the Reference Description of a Metadata Schema

It is judged useful to provide a means for linking to the reference
definition of the metadata schema (or schemata) used in a document.  Doing
so serves as a primitive registration mechanism for metadata schemata, and
lays the foundation for a more formal, machine-readable linkage mechanism
in the future. The proposed convention for doing so is as follows:

     <LINK REL = SCHEMA.schema_identifier HREF="URL">

Thus, the reference description of one metadata scheme, the Dublin Core
Metadata Element Set, would be referenced in the LINK HREF as follows:

     <LINK REL = SCHEMA.dc HREF = "http://purl.org/metadata/dublin_core_elements">

The description of an element could be accessed by constructing a URL
using the # token to identify a named anchor. Thus, the derived URL below
actually links to the title element in the reference description of the
Dublin Core Metadata Element Set.

     http://purl.org/metadata/dublin_core_elements#title

This URL would correspond to the human-readable description of the title
element within the document by a NAME anchor such as:

     <A NAME = "title"> Title </A>

         The name of the work provided by the author or publisher.

While use of the LINK tag is not required for a given schema, when used,
it will make possible retrieval of the reference definition of a given
schema element, and will therefore reduce the need for a formal metadata
scheme registry. Multiple LINK tags can be used so that elements derived
from multiple schemas can be referenced within a single document.
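
A consumer of these conventions might proceed roughly as in the Python
sketch below, which collects schema-prefixed META elements and
SCHEMA.* LINK references from a page and derives the URL of an
element's description by appending "#" and the element name.  The class
name MetadataCollector, the case-insensitive matching of schema
identifiers (so that "DC" in a META tag pairs with "dc" in a LINK tag),
and the use of the reference description URL as the LINK target are
assumptions of this example rather than part of the proposal.

```python
from html.parser import HTMLParser


class MetadataCollector(HTMLParser):
    """Collect schema-prefixed META elements and SCHEMA.* LINK references."""

    def __init__(self):
        super().__init__()
        self.elements = []  # (schema_id, element_name, content) triples
        self.schemas = {}   # schema_id -> URL of the reference description

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser lowercases tag and attribute names
        name = attrs.get("name") or ""
        rel = attrs.get("rel") or ""
        if tag == "meta" and "." in name:
            schema_id, element = name.split(".", 1)
            content = attrs.get("content") or ""
            self.elements.append((schema_id.lower(), element, content))
        elif tag == "link" and rel.upper().startswith("SCHEMA."):
            schema_id = rel.split(".", 1)[1].lower()
            self.schemas[schema_id] = attrs.get("href") or ""

    def element_url(self, schema_id, element):
        """Derive the URL of an element's description, or None if unlinked."""
        base = self.schemas.get(schema_id.lower())
        return f"{base}#{element}" if base else None


page = """
<LINK REL = SCHEMA.dc HREF = "http://purl.org/metadata/dublin_core_elements">
<META NAME    = "DC.title"
      CONTENT = "HTML 2.0 Specification">
<META NAME    = "MSW.title"
      CONTENT = "W3C Indexing Work Shop Report">
"""
collector = MetadataCollector()
collector.feed(page)
collector.close()
for schema_id, element, content in collector.elements:
    print(schema_id, element, content, collector.element_url(schema_id, element))
```

Elements whose schema has no LINK tag, such as the MSW title here, are
still collected; only the lookup of their reference description fails.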


4. Consistency of Description Schemas

To promote consistency among resource description schemas, it is suggested
that the semantics for metadata elements be related to existing well-known
schemas whenever feasible.