Title: An Introduction to Dublin Core
1An Introduction to Dublin Core
Making Sense of Metadata, Society of Archivists EAD/Data Exchange SIG
London, Thursday 17 November 2005
Pete Johnston, Research Officer, UKOLN, University of Bath
UKOLN is supported by
www.bath.ac.uk
2An Introduction to Dublin Core
- A brief history
- What is Dublin Core, really?
- The DCMI Abstract Model
- Encoding Dublin Core metadata
- DC Application Profiles
- DC in practice
3A Brief History
4A brief history (1)
- Mid 1990s rapid growth of World Wide Web
- Challenge of resource discovery
- search engines providing many hits, but little precision
- recognition that library approach to cataloguing could not scale to Web resources
- 1995 OCLC/NCSA Workshop in Dublin, Ohio
- interdisciplinary consensus on 13 "metadata elements"
- for discovery of "document-like objects"
- relatively simple, usable by non-cataloguers
- 1996 OCLC/CNI Workshop in Dublin, Ohio
- expand to 15 elements
- explicitly cross-domain
- for discovery of broad range of "resources"
5The Dublin Core Metadata Element Set
- Title
- Subject
- Description
- Creator
- Publisher
- Contributor
- Date
- Type
- Format
- Identifier
- Source
- Language
- Relation
- Coverage
- Rights
6A brief history (2)
- 1997-2000 Development of notion of "qualification"
- tension between simplicity and complexity
- element refinement
- Narrow the meaning of a DC element
- e.g. "date modified" v "date"
- encoding scheme
- Provide additional information about a value
- e.g. that a subject is a Library of Congress Subject Heading
- the "Dumb-Down" principle
- Rules for transforming "qualified" description into "simple" description
- the "One-to-One" rule
- A DC description describes exactly one resource
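The "Dumb-Down" principle above can be sketched in a few lines of Python. This is only an illustration, not a DCMI-published algorithm: the `REFINES` table, the function name and the `(property, value, encoding scheme)` tuple layout are assumptions made for the example. The idea taken from the slide is that a refinement is replaced by the element it refines and the encoding scheme is discarded.

```python
# Hypothetical lookup table: element refinement -> the DC element it refines.
REFINES = {
    "modified": "date",
    "created": "date",
    "abstract": "description",
}


def dumb_down(statements):
    """Reduce qualified statements to simple (property, value) pairs.

    Each input statement is a (property, value, encoding_scheme) tuple;
    encoding_scheme may be None.
    """
    simple = []
    for prop, value, _scheme in statements:
        # Replace a refinement by its parent element and drop the scheme.
        simple.append((REFINES.get(prop, prop), value))
    return simple


# Illustrative qualified description of a single resource.
qualified = [
    ("modified", "2005-11-17", "W3CDTF"),
    ("subject", "Metadata", "LCSH"),
    ("title", "An Introduction to Dublin Core", None),
]
print(dumb_down(qualified))
```

Information is lost in the process (a "date modified" becomes just a "date"), which is the price of interoperability with applications that only understand simple DC.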
7A brief history (3)
- 1997-2000 What is a "resource"?
- e.g. Can the DCMES be applied to people?
- DCMI Type Vocabulary
- Collection, Dataset, Event, Image (Still or Moving), Interactive Resource, Service, Software, Sound, Text, Physical Object
- But still fairly non-prescriptive
- 1998- Emergence of Resource Description Framework (RDF)
- 2000-2001 "Grammatical Principles" as informal data model
8A brief history (4)
- 2000-2005 Development of notion of DC "Application Profile"
- tailoring metadata standards for context
- providing local guidelines, constraints
- combining components from different sources
- 2003-2005 Formalisation of DCMI Abstract Model
- concepts used in DC metadata
- different types of terms used in DC metadata
- how those terms are used in combination to construct descriptions
9What is Dublin Core, really?
10Dublin Core is...
- a conceptual framework/set of rules...
- DCMI Abstract Model
- describes how to use certain types of terms
- ... to make statements...
- ... that form descriptions (of resources)
- a "core" vocabulary/set of terms...
- managed by DCMI (Usage Board)
- growing (relatively) slowly as new requirements arise
- each identified by a Uniform Resource Identifier (URI)
- a set of specifications for representing or encoding DC metadata descriptions in various formats
11DCMI Abstract Model (a slightly simplified view)
12DCMI Abstract Model
- A description
- describes exactly one resource
- may specify a resource URI
- consists of a set of statements
13DCMI Abstract Model Descriptions
14DCMI Abstract Model
- A statement must contain
- a reference to a property
- property URI
- all DC "elements" are properties
- properties may be defined by agencies other than DCMI
- a reference to a second resource (value)
- value URI, and/or
- one or more value representations
- value string
- rich representation
15DCMI Abstract Model Statements
16DCMI Abstract Model
- A statement may contain
- a reference to a vocabulary encoding scheme
- vocabulary encoding scheme URI
- type of value
- a reference to a syntax encoding scheme
- syntax encoding scheme URI
- how value string is interpreted
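The description and statement constructs listed above can be pictured as two small Python data classes. This is a simplified sketch under stated assumptions: the class and field names are invented for the example and the rich representation is left out; it is not an official DCMI data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Statement:
    property_uri: str                                     # required: reference to a property
    value_uri: Optional[str] = None                       # reference to the value resource
    value_strings: List[str] = field(default_factory=list)
    vocabulary_encoding_scheme_uri: Optional[str] = None  # type of value
    syntax_encoding_scheme_uri: Optional[str] = None      # how a value string is interpreted

    def is_valid(self) -> bool:
        # A statement needs a property plus a value URI and/or a value string.
        return bool(self.property_uri) and bool(self.value_uri or self.value_strings)


@dataclass
class Description:
    resource_uri: Optional[str] = None                    # a description may name its resource
    statements: List[Statement] = field(default_factory=list)


DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Illustrative description of one (example) resource.
doc = Description(
    resource_uri="http://example.org/doc/1234/",
    statements=[
        Statement(DC + "title", value_strings=["A Guide to DC Metadata"]),
        Statement(DC + "subject", value_strings=["Metadata"],
                  vocabulary_encoding_scheme_uri=DCTERMS + "LCSH"),
        Statement(DC + "date", value_strings=["2005-11-17"],
                  syntax_encoding_scheme_uri=DCTERMS + "W3CDTF"),
    ],
)
print(all(s.is_valid() for s in doc.statements))
```

Keeping the encoding scheme references optional mirrors the "must contain" / "may contain" split on the slides.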
17DCMI Abstract Model Statements
18DCMI Abstract Model
- A description describes one resource
- Applications typically based on description sets
- groups of descriptions
- where the described resources may be related in some way
- Description sets encoded or serialised as records
- according to rules of binding
20Encoding Dublin Core metadata (a very brief introduction!)
21DCMI Abstract Model and Bindings
- For transfer between applications, descriptions must be represented as digital objects
- Binding maps between constructs in conceptual model and components in a digital format
- Two way
- encoding application: description set -> record
- decoding application: record -> description set
- DCMI currently provides three "encoding guidelines" specifications
- Other agencies may also provide bindings
22Using X/HTML meta link elements
- The set of meta/link elements represent a single DC description.
- The resource described is the X/HTML document in which the metadata is embedded.
- Each meta/link element represents a single statement
- Property and Encoding Scheme URIs encoded as prefixed names
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DC.title" content="A guide to DC metadata" />
<meta name="DCTERMS.audience" content="information managers" />
<meta name="DC.language" scheme="DCTERMS.ISO639-2" content="eng" />
<link rel="DCTERMS.references" href="http://dublincore.org/documents/dcq-html" />
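An "encoding application" for this binding might look roughly like the following Python sketch. The function name, the `SCHEMAS` table and the four-part tuple layout are assumptions made for illustration; the output pattern (one `link` per schema prefix, `meta` for string values, `link` for URI values) follows the example on the slide.

```python
from html import escape

# Prefix -> namespace URI, declared with <link rel="schema.PREFIX" ...>.
SCHEMAS = {
    "DC": "http://purl.org/dc/elements/1.1/",
    "DCTERMS": "http://purl.org/dc/terms/",
}


def to_html_head(statements):
    """Render statements as X/HTML <link>/<meta> elements.

    Each statement is (prefixed_name, value, scheme, is_uri): a value that
    is a URI becomes <link>, a plain string becomes <meta>.
    """
    lines = [
        '<link rel="schema.%s" href="%s" />' % (prefix, escape(uri, quote=True))
        for prefix, uri in SCHEMAS.items()
    ]
    for name, value, scheme, is_uri in statements:
        value = escape(value, quote=True)  # keep attribute values well-formed
        if is_uri:
            lines.append('<link rel="%s" href="%s" />' % (name, value))
        elif scheme:
            lines.append('<meta name="%s" scheme="%s" content="%s" />'
                         % (name, scheme, value))
        else:
            lines.append('<meta name="%s" content="%s" />' % (name, value))
    return "\n".join(lines)


statements = [
    ("DC.title", "A guide to DC metadata", None, False),
    ("DC.language", "eng", "DCTERMS.ISO639-2", False),
    ("DCTERMS.references", "http://dublincore.org/documents/dcq-html", None, True),
]
print(to_html_head(statements))
```

The result is intended to be pasted into the `head` of the document being described, since in this binding the embedding document is the described resource.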
23Using the DC-XML format
- Supports only limited subset of Abstract Model (revision forthcoming)
- The container element, here <meta>, represents a single DC description.
- Each child element represents a single statement
- Property URIs and Encoding Scheme URIs encoded as XML QNames
<?xml version="1.0"?>
<meta xmlns="http://www.ukoln.ac.uk/metadata/dcdot/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:identifier>http://example.org/doc/1234/</dc:identifier>
  <dc:title>A Guide to DC Metadata</dc:title>
  <dc:language xsi:type="dcterms:ISO639-2">eng</dc:language>
  <dcterms:references>http://dublincore.org/documents/dcq-html</dcterms:references>
</meta>
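A "decoding application" for this format can be sketched with Python's standard `xml.etree.ElementTree` module. The record below is the one from the slide, with a `dcterms` namespace declaration added so that it parses (an assumption: the slide text does not show one). The function name and the returned tuple layout are also made up for the example.

```python
import xml.etree.ElementTree as ET

RECORD = """<?xml version="1.0"?>
<meta xmlns="http://www.ukoln.ac.uk/metadata/dcdot/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:identifier>http://example.org/doc/1234/</dc:identifier>
  <dc:title>A Guide to DC Metadata</dc:title>
  <dc:language xsi:type="dcterms:ISO639-2">eng</dc:language>
  <dcterms:references>http://dublincore.org/documents/dcq-html</dcterms:references>
</meta>"""

# ElementTree exposes namespaced attribute names as "{namespace}local".
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def decode(record):
    """Return (property_uri, value_string, encoding_scheme_qname) tuples."""
    root = ET.fromstring(record)
    statements = []
    for child in root:
        # Tags also arrive as "{namespace}local"; concatenating the two
        # parts gives the property URI.
        namespace, local = child.tag[1:].split("}")
        statements.append((namespace + local, child.text, child.get(XSI_TYPE)))
    return statements


for statement in decode(RECORD):
    print(statement)
```

Note that the encoding scheme is returned as the QName found in `xsi:type`; expanding it to a full URI would need the prefix mappings, which `ElementTree` does not keep after parsing.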
24Using the Resource Description Framework (RDF)
- Specifications for DC in RDF do exist
- but currently work in progress to
- resolve ambiguities
- revise in light of DCAM
25Dublin Core Application Profiles
26DC Application Profile
- Implementers adapt metadata standards to the context of their application
- Tension between localisation and interoperability
- A DC Application Profile
- specifies the terms (properties, vocabulary/syntax encoding schemes) used in a class of description sets
- describes how those terms are used
- supplementary information on how properties applied/interpreted in context
- constraints on occurrence of properties
- constraints on values and value representations (encoding schemes)
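The two kinds of constraint listed above (occurrence of properties, permitted encoding schemes) can be checked mechanically. The sketch below is a toy example: the `PROFILE` table, its tuple layout and the function name are hypothetical and do not correspond to any published application profile.

```python
# Hypothetical profile: property -> (minimum occurrences,
# maximum occurrences or None for unlimited,
# allowed encoding schemes or None for "no constraint").
PROFILE = {
    "title": (1, 1, None),
    "subject": (0, None, {"LCSH"}),
    "date": (0, 1, {"W3CDTF"}),
}


def check(description, profile=PROFILE):
    """Return a list of human-readable profile violations.

    description is a list of (property, value, encoding_scheme) tuples.
    """
    problems = []
    counts = {}
    for prop, _value, scheme in description:
        if prop not in profile:
            problems.append("property not in profile: " + prop)
            continue
        counts[prop] = counts.get(prop, 0) + 1
        allowed = profile[prop][2]
        if allowed is not None and scheme not in allowed:
            problems.append("%s: encoding scheme %r not allowed" % (prop, scheme))
    for prop, (low, high, _allowed) in profile.items():
        n = counts.get(prop, 0)
        if n < low:
            problems.append("%s: required, found %d" % (prop, n))
        if high is not None and n > high:
            problems.append("%s: at most %d allowed, found %d" % (prop, high, n))
    return problems


# A description with no title and a date that names no encoding scheme.
description = [
    ("subject", "Metadata", "LCSH"),
    ("date", "17 November 2005", None),
]
print(check(description))
```

A real profile would also carry the supplementary guidance for cataloguers, which cannot be validated by code.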
27DC Application Profiles Examples
- "Simple Dublin Core"
- use of the 15 properties of the DCMES
- all optional and repeatable
- values represented by value strings
- no vocabulary or syntax encoding schemes
- UK eGMS
- use of selected properties from DCMI vocabularies, additional properties
- guidelines on use of properties
- some properties mandated/recommended
- some vocabulary encoding schemes mandated/recommended
- guidance on content of value strings
28DC Application Profiles Examples
- JISC Information Environment Service Registry (IESR) Metadata Schema
- supports description of several related resources (Collection, Service, Agents)
- use of selected properties from DCMI vocabularies, selected properties from RSLP CD vocabularies, some properties created for IESR
- for each subject resource type, guidelines on use of properties
- some properties mandated/recommended
- many vocabulary encoding schemes mandated/recommended
30DC in Practice
31Dublin Core in X/HTML
- Initial implementation focused on DC-in-HTML
- Robot crawls individual HTML pages to extract metadata
- But today little/no use by large Web search engines
- Problems of spamming/trust
- Lack of take-up by authors/publishers
- Success of full-text crawling/indexing, esp. Google!
- However, some use in controlled domains
- Intranets
- Trusted groups of providers (e.g. eGMS)
- Embedding DC in XHTML useful if you know a search engine exploits it
33- Picture Australia
- images "related to all things Australian" from 40 cultural agencies
- central search service based (initially at least) on crawling HTML-embedded DC metadata
- providers migrating to OAI-PMH
- currently hybrid approach?
http://www.pictureaustralia.org/
36Dublin Core and OAI-PMH
- Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
- Fairly simple mechanism for sharing metadata records between applications
- Has origins in e-prints community
- Built on HTTP, XML
- Allows a harvester to ask a repository for all or some of its metadata records (in a specified metadata format)
- i.e. supports "incremental harvesting"
- "Give me all your records updated since yyyy-mm-dd"
- "OAI-DC" (Simple DC) is mandatory format
- But no limitation on format that can be transferred (as long as can be described by XML Schema)
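The "give me all your records updated since yyyy-mm-dd" request corresponds to the OAI-PMH `ListRecords` verb with a `from` argument. The Python sketch below only builds the request URL and sends nothing over the network; the repository address and the function name are hypothetical, while `verb`, `metadataPrefix`, `from` and the `oai_dc` prefix are defined by the protocol.

```python
from urllib.parse import urlencode


def list_records_url(base_url, since=None, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL.

    since is a date string (yyyy-mm-dd); when given, the repository is
    asked only for records created or changed from that date onwards.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if since:
        params["from"] = since
    return base_url + "?" + urlencode(params)


# Hypothetical repository address, used only for illustration.
BASE_URL = "http://repository.example.org/oai"

print(list_records_url(BASE_URL))                      # full harvest
print(list_records_url(BASE_URL, since="2005-11-01"))  # incremental harvest
```

A harvester would fetch the URL over HTTP and parse the XML response, following any resumption token the repository returns for large result sets.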
37Repositories
38- OAIster (University of Michigan)
- "academically-oriented digital resources"
- "5,947,627 records from 557 institutions" (2005-11-15)
http://oaister.umdl.umich.edu/
41Summary
- DCMES/"Simple DC" as a "core" for discovery of wide range of resources
- "Simple DC" is, by definition, simple!
- Limitations in terms of functions/services that can be offered
- DCMI Abstract Model provides a framework for extensibility and modularity
- A DC Application Profile describes a real-world usage of that model