Title: Metadata for the Web: From Discovery to Description
1Metadata for the Web: From Discovery to Description
- CS 502, 2002-02-26
- Carl Lagoze, Cornell University
2Co-existing Cost/Functionality Levels
Greater Functionality Cost
3Dublin Core Qualifiers
- From fuzzy buckets to more specific description
- Model of graceful degradation
- Support both simplicity and specificity
- Intra-domain and inter-domain semantics
4[Diagram: anatomy of a Dublin Core statement]
Resource (implied subject) has (implied verb) property (one of 15 properties: DC:Creator, DC:Title, DC:Subject, DC:Date, ...) X (property value, an appropriate literal)
qualifiers (adjectives): two optional qualifiers may be attached
5Varieties of Qualifiers: Element Refinements
- Make the meaning of an element narrower or more specific.
- Narrowing implies an "is a" relationship
- a "date created" is a "date"
- an "is part of" relation is a "relation"
- If your software does not understand the qualifier, you can safely ignore it.
6Varieties of Qualifiers: Value Encoding Schemes
- Says that the value is
- a term from a controlled vocabulary (e.g., Library of Congress Subject Headings)
- a string formatted in a standard way (e.g., "2001-05-02" means May 2, not February 5)
- Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
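The date example above can be sketched in a few lines of Python. This is only an illustration, assuming the value is a complete date in the YYYY-MM-DD form; the function name parse_complete_date is made up for this sketch and is not part of any Dublin Core tooling.

```python
from datetime import date


def parse_complete_date(value: str) -> date:
    """Read a YYYY-MM-DD string as year, month, day (hypothetical helper)."""
    return date.fromisoformat(value)


parsed = parse_complete_date("2001-05-02")
# The scheme fixes the field order: the month is 5 (May), the day is 2.
print(parsed.year, parsed.month, parsed.day)
```

Software that does not know the scheme can still treat "2001-05-02" as an opaque string and index it for discovery.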
7Resource has Subject "Languages -- Grammar" (LCSH)
Resource has Date "2000-06-13" (Revised, ISO8601)
8Dumb-Down Principle for Qualifiers
- The fifteen elements should be usable and understandable with or without the qualifiers
- Qualifiers refine meaning (but may be harder to understand)
- Nouns can stand on their own without adjectives
- If your software encounters an unfamiliar qualifier, look it up -- or just ignore it!
- "has a" relations break the model
- E.g., a creator has a hair color
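One way to picture the dumb-down principle is a function that drops the refinement from a dotted name such as DC.Date.Created and keeps only the base element. The sketch below is an assumption about how a consumer might do this, not a prescribed algorithm; the function name dumb_down is hypothetical.

```python
# The fifteen Dublin Core elements.
DC_ELEMENTS = {
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
}


def dumb_down(name: str) -> str:
    """Reduce a dotted name like 'DC.Date.Created' to 'DC.Date' (hypothetical helper)."""
    parts = name.split(".")
    if len(parts) < 2 or parts[0] != "DC" or parts[1] not in DC_ELEMENTS:
        raise ValueError(f"not a Dublin Core element name: {name!r}")
    # Anything after the element is a refinement and can be ignored safely.
    return ".".join(parts[:2])


print(dumb_down("DC.Date.Created"))       # refinement dropped
print(dumb_down("DC.Relation.isPartOf"))  # refinement dropped
print(dumb_down("DC.Title"))              # already unqualified
```

A "has a" qualifier such as a creator's hair color would fail the same test: after dumbing down, the value would no longer be a correct value for the element.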
9Test for good qualifiers: cover the qualifier and ask
-- Does the statement still make sense?
-- Is it still correct?
Resource has Subject "Languages -- Grammar" (LCSH)
Resource has Date "2000-06-13" (Revised, ISO8601)
10Incorrect Qualification
Resource has creator Cornell University (affiliation)
Resource has subject pre-schoolers (audience)
11Open questions in this model
- Are uncontrolled and unconstrained values really useful for discovery?
- Is it possible for an organization (DCMI) to control the evolution of a language?
- How can "simple discovery metadata" be combined with complex descriptions? Is there a notion of graceful degradation?
- Can DC serve as a lingua franca (mapping template) among more complex models?
12Models for Deploying Metadata
- Embedded in the resource
- low deployment threshold
- Limited flexibility, limited model
- Linked to from resource
- Using xlink
- Is there only one source of metadata?
- Independent resource referencing resource
- Model of accessing the object through its surrogate
13Syntax Alternatives: HTML
- Advantages
- Simple mechanism: META tags embedded in content
- Widely deployed tools and knowledge
- Disadvantages
- Limited structural richness (won't support hierarchical, tree-structured data or entity distinctions).
14Dublin Core in HTML
- http://www.dublincore.org/documents/2000/08/15/dcq-html/
- HTML constructs
- <link> to establish pseudo-namespace
- <meta> for metadata statements
- name attribute for DC element (DC.element.ER)
- content attribute for element value
- scheme attribute for encoding scheme or controlled vocabulary
- lang attribute for language of element value
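The attribute roles listed above can be sketched as a small Python helper that renders one metadata statement as a meta tag. This is a minimal sketch under the assumption that each statement has a name and content plus optional scheme and lang; the function name dc_meta is hypothetical.

```python
from html import escape


def dc_meta(name, content, scheme=None, lang=None):
    """Render one Dublin Core statement as an HTML meta tag (hypothetical helper)."""
    attrs = [("name", name)]
    if scheme:
        attrs.append(("scheme", scheme))  # encoding scheme or controlled vocabulary
    if lang:
        attrs.append(("lang", lang))      # language of the element value
    attrs.append(("content", content))    # the element value itself
    rendered = " ".join(f'{key}="{escape(value, quote=True)}"' for key, value in attrs)
    return f"<meta {rendered}>"


tag = dc_meta("DC.Date.Created", "2000-10-23", scheme="W3CDTF")
print(tag)
```

Escaping the attribute values keeps the tag well-formed when a title or description contains quotes or ampersands.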
15Dublin Core in HTML example
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1">
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Title" lang="es" content="negocio inusual">
<meta name="DC.Creator" content="Carl Lagoze">
<meta name="DC.Subject" content="bibliographic control web cataloging">
<meta name="DC.Date.Created" scheme="W3CDTF" content="2000-10-23">
<meta name="DC.Format" content="text/html">
<meta name="DC.Identifier" content="http://lcweb.loc.gov/lagoze_paper.html">
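A harvester could read such embedded statements back out with Python's standard html.parser module. The sketch below is illustrative only: the class name DCMetaParser is made up, the sample reuses three tags from the example above, and the keywords tag is an invented non-DC tag added to show the filtering.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect meta tags whose name starts with 'DC.' (hypothetical class)."""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("DC."):
            self.statements.append({
                "name": name,
                "content": attributes.get("content"),
                "scheme": attributes.get("scheme"),
                "lang": attributes.get("lang"),
            })


SAMPLE = """
<meta name="keywords" content="not Dublin Core">
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Title" lang="es" content="negocio inusual">
<meta name="DC.Date.Created" scheme="W3CDTF" content="2000-10-23">
"""

parser = DCMetaParser()
parser.feed(SAMPLE)
parser.close()
for statement in parser.statements:
    print(statement)
```

The result is a flat list of statements, which reflects the limited structural richness of the HTML syntax.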
16Unqualified Dublin Core in XML
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF SYSTEM "http://dublincore.org/2000/12/01-dcmes-xml-dtd.dtd">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.ilrt.bristol.ac.uk/people/cmdjb/">
    <dc:title>Dave Beckett's Home Page</dc:title>
    <dc:creator>Dave Beckett</dc:creator>
    <dc:publisher>ILRT, University of Bristol</dc:publisher>
    <dc:date>2000-06-06</dc:date>
  </rdf:Description>
</rdf:RDF>
http://www.dublincore.org/documents/2000/11/dcmes-xml/
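A record in this XML form can be read with Python's standard xml.etree.ElementTree module. This is a rough sketch, not a full RDF parser: it assumes the simple one-level layout of the example, omits the XML declaration and DOCTYPE for brevity, and the function name dc_statements is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.ilrt.bristol.ac.uk/people/cmdjb/">
    <dc:title>Dave Beckett's Home Page</dc:title>
    <dc:creator>Dave Beckett</dc:creator>
    <dc:publisher>ILRT, University of Bristol</dc:publisher>
    <dc:date>2000-06-06</dc:date>
  </rdf:Description>
</rdf:RDF>"""


def dc_statements(xml_text):
    """Return (resource, element, value) tuples for each dc: child (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    dc_prefix = "{" + DC_NS + "}"
    statements = []
    for description in root.findall("{" + RDF_NS + "}Description"):
        about = description.get("{" + RDF_NS + "}about")
        for child in description:
            if child.tag.startswith(dc_prefix):
                element = child.tag[len(dc_prefix):]
                statements.append((about, element, (child.text or "").strip()))
    return statements


statements = dc_statements(RECORD)
for statement in statements:
    print(statement)
```

Each tuple reads as "resource has element value", the same implied sentence used for the HTML form.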
17Example of Dublin Core Use
- A map in the United States Library of Congress on-line American Memory Collection
18Title
- The name given to the resource
<META name="DC.Title"
      content="Novi Belgii Novæque Angliæ nec non partis Virginiæ tabula multis in locis emendata"
      lang="la">
19Creator
- An entity primarily responsible for making the content of the resource
<META name="DC.Creator"
      content="Nicolaum Visscher">
20Subject
- The topic of the content of the resource
<META name="DC.Subject"
      content="Middle Atlantic States"
      scheme="LCSH">
<META name="DC.Subject"
      content="Maps"
      scheme="LCSH">
<META name="DC.Subject"
      content="Early works to 1800"
      scheme="LCSH">
21Description
- An account of the content of the resource
<META name="DC.Description.Abstract"
      content="An historical map showing the coast of New Jersey as perceived in the seventeenth century">
22Publisher
- An entity responsible for making the resource available
<META name="DC.Publisher"
      content="Library of Congress, United States">
23Contributor
- An entity responsible for making contributions to the content of the resource.
<META name="DC.Contributor"
      content="Historic Urban Plans">
24Date
- A date associated with an event in the lifecycle of the resource
<META name="DC.Date.Created"
      content="1996-04-17"
      scheme="W3C-DTF">
25Type
- The nature or genre of the content of the resource
<META name="DC.Type"
      content="image"
      scheme="DCMIType">
26Format
- The physical or digital manifestation of the resource
<META name="DC.Format.Medium"
      content="image/gif"
      scheme="IMT">
<META name="DC.Format.Extent"
      content="556K">
27Identifier
- An unambiguous reference to the resource in the current context
<META name="DC.Identifier"
      content="http://loc.gov/coll1/img456.jpg"
      scheme="URI">
28Source
- A reference to a resource from which the present resource is derived.
<META name="DC.Source"
      content="G3715 1685 .V5 1969 (LOC catalog)">
29Language
- Language of the intellectual content of the object
<META name="DC.Language"
      content="nl"
      scheme="ISO 639-2">
30Relation
- A reference to a related resource
<META name="DC.Relation.isPartOf"
      content="http://lcweb2.loc.gov/ammem/gmdhtml/dsxpimg.html"
      scheme="URI">
31Coverage
- The extent or scope of the content of the resource
<META name="DC.Coverage.Spatial"
      content="New Jersey"
      scheme="TGN">
<META name="DC.Coverage.Temporal"
      content="1650"
      scheme="W3C-DTF">
32Rights
- Information about rights in and over the resource
<META name="DC.Rights"
      content="http://www.loc.gov/rights_statement.htm">
33Distributed Content: The Metadata Challenge
- From fixed, contained physical artifacts to fluid, distributed digital objects
- Need for basis of trust and authenticity in network environment
- Decentralization and specialization of resource description and need for mapping formalisms
34Multi-entity nature of object description
35Understanding Metadata based on Query Capabilities
- Simple boolean tags?
- Creator = "Tom Baker" and Title contains "Dublin Core"
- Agent, time, place questions?
- Who was responsible for what and when and where?
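The simple boolean case can be sketched as a filter over flat attribute/value records. The records below are invented purely for illustration (the titles are not real publications), and the function name matches is hypothetical.

```python
# Made-up flat records: one value per attribute, no entity structure.
RECORDS = [
    {"creator": "Tom Baker", "title": "Notes on Dublin Core"},
    {"creator": "Tom Baker", "title": "Notes on Something Else"},
    {"creator": "Another Author", "title": "Dublin Core in Practice"},
]


def matches(record):
    """Creator equals 'Tom Baker' AND title contains 'Dublin Core'."""
    return record["creator"] == "Tom Baker" and "Dublin Core" in record["title"]


hits = [record["title"] for record in RECORDS if matches(record)]
print(hits)
```

Flat records answer this kind of question well; they have no place to record who did what, when, and where.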
36Attribute/Value approaches to metadata
The playwright of Hamlet was Shakespeare
Hamlet has a creator Shakespeare
37run into problems for richer descriptions
The playwright of Hamlet was Shakespeare, who was born in Stratford
[Diagram: Hamlet has a creator; birthplace Stratford]
38because of their failure to model entity distinctions
[Diagram: two resources, R1 and R2, joined by labeled arcs]
R1 title Hamlet
R1 creator R2
R2 name Shakespeare
R2 birthplace Stratford
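The difference between the flat record and the two-entity picture can be sketched with plain Python data. The identifiers R1 and R2 come from the slide; the arc directions, the triple layout, and the helper names are assumptions made for illustration.

```python
# Flat attribute/value record: "birthplace" appears to describe the play.
FLAT = {"title": "Hamlet", "creator": "Shakespeare", "birthplace": "Stratford"}

# Entity-aware form: the work (R1) and the person (R2) are separate resources.
TRIPLES = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]


def objects(subject, predicate):
    """All values of one property for one resource."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def creator_birthplaces(work):
    """Follow work -> creator -> birthplace across the two entities."""
    return [place
            for creator in objects(work, "creator")
            for place in objects(creator, "birthplace")]


print(objects("R1", "birthplace"))   # the play itself has no birthplace
print(creator_birthplaces("R1"))     # the birthplace belongs to the creator
```

With separate entities, the birthplace attaches to the person rather than to the play.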
39Applying a Model-Centric Approach
- Formally define common entities and relationships underlying multiple metadata vocabularies
- Describe them (and their inter-relationships) in a simple logical model
- Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.
40Events are key to understanding metadata relationships?
- Modeling implied events as first-class objects provides attachment points for common entities, e.g., agents, contexts (times and places), roles.
- Clarifying attachment points facilitates understanding and querying "who was responsible for what, when".
41Content, Events, Descriptions
42ABC/Harmony: Event-aware metadata ontology
- Recognizing inherent lifecycle aspects of description (esp. of digital content)
- Modeling incorporates time (events and situations) as first-class objects
- Supplies clear attachment points for agents, roles, existential properties
- Resource description as a story-telling activity
43Resource-centric Metadata
Title: Anna Karenina
Author: Leo Tolstoy
Illustrator: Orest Vereisky
Translator: Margaret Wettlin
Date Created: 1877
Date Translated: 1978
Description: Adultery Depression
Birthplace: Moscow
Birthdate: 1828
45Queries over complex descriptive graphs
- Ability to ask questions like "show me all the translations of War and Peace between 1980 and 1990"
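Such a question becomes a simple filter once events are stored as objects with their own agent and time. The sketch below is not the ABC/Harmony model itself, only a loose illustration: the Event class, its field names, and the War and Peace entries (with placeholder translator names and years) are all invented. Only the Anna Karenina entries reuse values from the resource-centric slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # e.g. "creation" or "translation"
    work: str    # title of the work the event concerns
    agent: str   # who was responsible
    year: int    # when it happened


EVENTS = [
    Event("creation", "Anna Karenina", "Leo Tolstoy", 1877),
    Event("translation", "Anna Karenina", "Margaret Wettlin", 1978),
    # Hypothetical entries, for illustration only:
    Event("translation", "War and Peace", "Translator A", 1985),
    Event("translation", "War and Peace", "Translator B", 1995),
]


def translations(work, start, end):
    """Translation events for a work with start <= year <= end."""
    return [event for event in EVENTS
            if event.kind == "translation"
            and event.work == work
            and start <= event.year <= end]


found = translations("War and Peace", 1980, 1990)
for event in found:
    print(event.agent, event.year)
```

A flat record with a single "Date Translated" field cannot express two translations by different agents; separate event objects can.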