Title: Metadata for Networked Resources
1 Metadata for Networked Resources
2 DESIRE Project, September 20, 1999, Bristol, UK
Metadata for Networked Resources
Introduction to Metadata
3 What is the Problem?
- 3.6 million Web sites
- Five hundred million or more addressable pages on the Web
- High consumer expectations conflicting with primitive tools and mechanisms
- Uncertain quality, integrity, trust
4A Critical Perspective on the Information
Landscape of the Web
- The Web changes relationships among
- authors
- publishers
- information intermediaries and distributors
- users
- Disseminating short-lived or dynamic resources is greatly simplified
- Providing access to resources is more difficult
5The Web as an Information System
- Lower barriers to publication
- rapid dissemination of information and ideas
- less advantage to size or centralization
- greatly expanded access
- Manageability is reduced
- resource discovery is chaotic
- organization is haphazard
- preservation is almost non-existent
6The Web is missing much of what we associate
with a library...
- Search systems are motivated by advertising
- Index coverage is unpredictable and limited (1/3)
- Too much recall, too little precision
- Index spam abounds
- Resources (and their names) are volatile
- What about versions, editions, back issues?
- Archiving is presently unsolved
- Authority and quality of service are spotty
- Managing Access Rights is hard
7 Metadata: Enabling higher quality information services on the Web
- Structured data about data
- helps to impose order on chaos
- enables automated discovery/manipulation
- Many dimensions
- richness
- functionality
- discipline
- language/culture
8 Metadata takes many forms
- resource discovery
- document administration
- rights management
- content rating
- security and authentication
- archival status
- products and services
- database schemas
- process control or description
9 Metadata Challenges
- Accommodate multiple varieties of metadata
- Tension between functionality and simplicity
- Tension between extensibility and interoperability
- Human and machine creation and use
- Community-specific functionality, creation, administration, access
10 Warwick Framework: Containing Chaos
- Conceptual architecture for metadata from the Warwick Metadata Workshop (DC-2)
- Conceptual architecture to support the specification, collection, encoding, and exchange of modular metadata
- Provide context for metadata efforts (including Dublin Core)
- avoids the black hole of comprehensive element sets
- encourages decentralized, community-based solutions
11 Modularization allows Distributed Management
- Communities of expertise (not software vendors) are responsible for
- Semantics
- Registration
- Administration
- Access management
- Authority of data
- Sharing and Distribution
12Modularization and Distribution Present New
Challenges
- Preservation
- Reliability
- Integrity
- Semantic Interaction
13 Resource Description Communities
A resource description community is characterized by common semantic, structural, and syntactic conventions for exchange of resource description information.
Example: Libraries (MARC, AACR2)
14 The Internet Commons embraces many Resource
Description Communities
15 Interoperability requires conventions about
- Semantics
- The meaning of the elements
- Structure
- human-readable
- machine-parseable
- Syntax
- grammars to convey semantics and structure
16Dublin Core Metadata
- How to improve resource discovery on the Web?
- simple resource description semantics
- Build an interdisciplinary consensus about a core element set for resource discovery
- simple and intuitive
- cross-disciplinary
- international
- flexible
17 Dublin Core Workshop Series and Related Events
- Chicago WWW Conference Oct, 1994
- OCLC/NCSA Metadata Workshop Mar, 1995
- OCLC/UKOLN Warwick Workshop Apr, 1996
- W3C Indexing and Searching Workshop May, 1996
- CNI/OCLC Image Metadata Workshop Sep, 1996
- DC-4, Canberra, Australia Mar, 1997
- DC-5, Helsinki, Finland Oct, 1997
- DC-6, Washington, D.C. Nov, 1998
- DC-7, Frankfurt, Germany Oct, 1999
18 The Dublin Core Metadata Element Set
- Title
- Author/Creator
- Subject/Keywords
- Description
- Publisher
- Other Contributor
- Date
- Resource Type
- Format
- Resource Identifier
- Source
- Language
- Relation
- Coverage
- Rights Management
19 Central Characteristics of the Dublin Core Metadata Element Set
- Descriptive metadata for resource discovery
- All elements optional
- constraints are established at application level, not by the semantic specification
- All elements repeatable
- Extensible (a starting place for richer description)
- Interdisciplinary (semantic interoperability)
- International (21 languages, 4 continents)
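One way to picture "all elements optional, all elements repeatable" is a record that maps each element name to a list of values. The following is only a minimal sketch: the class name `DCRecord` and the lowercase identifier spellings are assumptions made for illustration, not part of the Dublin Core specification.

```python
from collections import defaultdict

# The fifteen elements listed on the previous slide, reduced to simple
# identifiers. These identifier spellings are an assumption of this sketch.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


class DCRecord:
    """A record in which every element is optional and repeatable."""

    def __init__(self):
        # Each element maps to a list of values; a missing key means "absent".
        self._values = defaultdict(list)

    def add(self, element, value):
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        self._values[element].append(value)

    def get(self, element):
        # Returns an empty list for an element that was never set.
        return list(self._values.get(element, []))


record = DCRecord()
record.add("title", "UKOLN Home Page")
record.add("subject", "metadata")
record.add("subject", "resource discovery")
print(record.get("subject"))    # two values: the element is repeatable
print(record.get("publisher"))  # empty list: the element is optional
```

Any stricter rule (for example "title is mandatory") would live in the application that uses the record, in line with the point that constraints are set at application level.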
20A Maintenance Agency for the Dublin Core?
- International consensus is the primary asset
- Dublin Core Directorate
- DC Policy Advisory Committee
- Provide avenue of communication among major international stakeholders
- DC Technical Advisory Committee
- Working Group leaders
21 A Maintenance Agency for the Dublin Core Initiative
Dublin Core Web Site
Dublin Core Directorate
DC Policy Advisory Committee
Stakeholder Communities
DC-General Dublin Core Mail Server
22 Dublin Core Working Groups (http://www.mailbase.ac.uk)
- DC-General
- DC-Data Model
- DC-Internationalization
- DC-Implementors
- DC-Guides
- DC-Standards
- DC-Citation
- DC-One2one
- DC-Agents
- DC-Coverage
- DC-Date
- DC-Format
- DC-Relation
- DC-SubDesc
- DC-Title
- DC-Type
23Steps Toward Standardization
- IETF informational RFCs of Dublin Core semantics and syntax
- RFC 2413
- IETF Informational Draft on DC in HTML
- NISO standardization initiated
- CEN standardization initiated
- ISO standardization under discussion
- The challenge is to establish a common path for
disparate standards processes
24 What will be standardized?
- Dublin Core Element Set 1.1 will be submitted for NISO and CEN standardization at the same time
- Element Working Groups have reviewed and finalized element definitions
- Format of element definitions brought into line with the ISO 11179 standard for expression of element semantics
25Relationships to other Metadata Initiatives
- MARC/AACR2
- Z39.50
- INDECS Project
- IMS
26 MARC/AACR2
- DC is strongly influenced by MARC/AACR2
- Important differences in structure, detail, and focus
- Substantial effort invested in crosswalks
- LC MARC Standards Office
- Nordic Metadata Project
- Australian Metadata Initiatives at NLA, DSTC
- CORC project at OCLC
27 Z39.50 and Dublin Core
- Dublin Core is the proposed Cross Domain attribute set
- Creator/Contributor/Publisher are collapsed into a single abstract attribute (Name)
- http://www.oclc.org/levan/docs/crossdomainattributeset.html
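Collapsing Creator, Contributor and Publisher into one abstract Name attribute means that a single search term is matched against all three elements. A rough sketch of that idea follows; the mapping table, the `search` function and the sample records are hypothetical and do not reproduce the actual Z39.50 attribute set.

```python
# Hypothetical mapping from Dublin Core elements to abstract search attributes.
ATTRIBUTE_FOR_ELEMENT = {
    "creator": "name",
    "contributor": "name",
    "publisher": "name",
    "title": "title",
    "subject": "subject",
}


def search(records, attribute, term):
    """Return records where any element mapped to `attribute` contains `term`."""
    term = term.lower()
    elements = [e for e, a in ATTRIBUTE_FOR_ELEMENT.items() if a == attribute]
    hits = []
    for record in records:
        values = [v for e in elements for v in record.get(e, [])]
        if any(term in v.lower() for v in values):
            hits.append(record)
    return hits


# Sample records invented for this sketch: element name -> list of values.
records = [
    {"title": ["Annual Report"], "publisher": ["OCLC"]},
    {"title": ["Metadata Primer"], "creator": ["OCLC Research"]},
    {"title": ["OCLC History"], "creator": ["Smith, John"]},
]
matches = search(records, "name", "oclc")
print([r["title"][0] for r in matches])
```

The third record mentions the term only in its title, so a Name search does not return it.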
28 INDECS project
- INDECS: Interoperability of Data in E-Commerce Systems
- Rights Management Metadata: identification of common functional requirements for managing IP on the Internet
- Substantial overlap with Resource Discovery
- Data model based on IFLA FRBR model
- http://www.indecs.org/
29IMS
- Instructional Management System
- Extended semantics to support description of educational materials
- Core semantics based on Dublin Core
30DC Implementation Projects
- 100 major implementation projects in 20 countries
- Government Information
- Australian Government Locator Service
- Danish Online Government Information
- Finnish Online Government Information
31Projects (continued)
- Science and Mathematics
- Environment Australia
- Australian Geodynamics Cooperative Research Centre (AGCRC)
- EULER (European Libraries and Electronic Resources in Mathematical Sciences)
- Swedish EnviroNet
- German Mathematical Society Preprint Project
32Projects (continued)
- Education
- EDNA (Educational Network of Australia)
- GEM (Gateway to Educational Materials)
- German Education Resources Server
- IMS (Instructional Management System)
- DC discipline-specific elements
33Projects (continued)
- Humanities
- AHDS Arts and Humanities Data Service
- CIMI Metadata Testbed Project
- SCRAN (Scottish Cultural Resources Access
Network)
34Projects (continued)
- Libraries and Digital Libraries
- CORC Project (OCLC)
- Pandora Project (NLA)
- The Nordic Metadata Project
- BIBLINK (Europe)
- ELISE (Electronic Image Service for Europe)
- Florida International University Digital Library
- University of Washington Digital Library
- State Library of Queensland
35Commerce
- Intranets
- e.g. Ford, Nokia, Boeing
- Netscape's Open Directory Project
36 Why Consider the Dublin Core?
- You have a rich standard, need a simple one (probably for cost reasons)
- You want to reveal your data to other communities (via the Web) using commonly understood semantics
- You want to provide unified access to databases with different underlying schemas
- You need core description semantics and don't feel compelled to invent them anew
37Additional Information on Dublin Core
- Dublin Core Metadata Initiative Homepage
- http://purl.org/dc
- D-Lib Magazine (all workshop reports)
- http://www.dlib.org
38 DESIRE Metadata Tools
DESIRE Project, September 20, 1999, Bristol, UK
Metadata for Networked Resources
39A little light relief ?
- Dublin Core in HTML
- Some DESIRE metadata tools...
- Dublin Core editors
- DC-dot
- Nordic DC generator
- ROADS - metadata management
- Web robots
- Combine
- Harvest
40 DC in HTML
<html><head>
<title>UKOLN Home Page</title>
<meta name="DC.Title" content="UKOLN: UK Office for Library and Information Networking">
<meta name="DC.Subject" content="national centre, network information support, library community, awareness, research, information services, public library networking, bibliographic management, distributed library systems, metadata, resource discovery, conferences, lectures, workshops">
<meta name="DC.Description" content="UKOLN is a national centre for support in network information management in the library and information communities. It provides awareness, research and information services">
<meta name="DC.Creator" content="UKOLN Information Services Group">
</head>
...
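Tags of this kind can be produced mechanically from a list of element/value pairs. The sketch below is illustrative only: the function name `dc_meta_tags` is an assumption, and it follows the "DC.Element" naming used on the slide while escaping characters that would otherwise break the attribute values.

```python
from html import escape


def dc_meta_tags(record):
    """Render (element, value) pairs as HTML <meta> tags, one per line."""
    lines = []
    for element, value in record:
        name = escape(f"DC.{element}", quote=True)
        content = escape(value, quote=True)
        lines.append(f'<meta name="{name}" content="{content}">')
    return "\n".join(lines)


# Sample values; the repeated Subject shows one tag per value.
record = [
    ("Title", "UKOLN Home Page"),
    ("Subject", "metadata"),
    ("Subject", "resource discovery"),
    ("Creator", 'UKOLN "Information Services" Group'),
]
print(dc_meta_tags(record))
```

Repeated elements simply become repeated tags, and the quotation marks in the last value are written as character entities so the tag stays valid.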
41 Editors - DC-dot
- Web-based DC creator and editor
- Automatic generation of some metadata
- Extraction of metadata from MS-Office, PDF and HTML files
- Context sensitive help
- Simple
- Generates HTML <meta> tags and a variety of other formats
- Can be integrated with browser
- Validates existing HTML metadata
45Editors - Nordic Template
- Web-based DC creator and editor
- More complex than DC-dot, e.g.
- support for schemes
- simple support for repeated elements
48Metadata management
- Potential problems
- embedded metadata fairly static
- hard to make bulk changes
- hard to migrate to new metadata formats
- so
- store metadata separately
- embed into Web-pages on-the-fly
49DC-ROADS - Summary
- Embed on-the-fly
- Apache SSI script
- Store metadata in ROADS database
- ROADS Web-based tool to edit/update metadata records
- Associate metadata with resource by assigning a unique ID (will be able to use the resource URL in the future)
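The "store separately, embed on-the-fly" approach can be sketched in a few lines. This is not the DC-ROADS implementation: the dictionary stands in for the ROADS database, and the placeholder comment and the function `render_page` are hypothetical names chosen for illustration.

```python
from html import escape

# Hypothetical stand-in for the ROADS database: records keyed by a unique ID.
METADATA_STORE = {
    "ukoln-home": [("Title", "UKOLN Home Page"), ("Subject", "metadata")],
}

# Hypothetical placeholder; the slides use an Apache SSI directive instead.
PLACEHOLDER = "<!-- dc-metadata -->"


def render_page(template, resource_id, store=METADATA_STORE):
    """Replace the placeholder with <meta> tags looked up by resource ID."""
    record = store.get(resource_id, [])
    tags = "\n".join(
        f'<meta name="DC.{escape(element)}" content="{escape(value)}">'
        for element, value in record
    )
    return template.replace(PLACEHOLDER, tags)


template = "<html><head><title>UKOLN</title>\n<!-- dc-metadata -->\n</head></html>"
page = render_page(template, "ukoln-home")
print(page)
```

Because the page only carries a placeholder, a bulk change or a move to a new metadata format touches the store and the rendering function, not every page.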
50 DC-ROADS - authoring
Apache syntax for calling server-side script: <!--#exec cmd="roads2metadc.pl" -->
HTML editor
<html> <head> <title></title> <!--#exec cmd="roads2metadc.pl" --> </head> ...
ROADS database
ROADS editor
51 DC-ROADS - embedding
[Diagram with steps numbered 1 to 6, connecting: Web client or robot; UKOLN Web server; SSI script; ROADS database. The page held on the server is:]
<html> <head> <title></title> <!--#exec cmd="roads2metadc.pl" --> </head> ...
52Metadata embedded in page
Edit metadata button for authors
Link to metadata display for end users
53DC Usage - Web Robots
- Combine
- Support for embedded Dublin Core
- Used for the Nordic Web Index (NWI)
- Index of all pages in the Nordic countries
- Promoted in combination with the Nordic template
- Searchable using Z39.50
- Harvest
- Support for embedded Dublin Core
- Used as basis for AC/DC
- UK academic Web index
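A robot that supports embedded Dublin Core has to pull the relevant tags out of each page it fetches. The following sketch shows one way to do that with Python's standard HTML parser; it is not how Combine or Harvest are implemented, and the class name `DCMetaParser` and the sample page are assumptions.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect <meta> tags whose name starts with "DC." (case-insensitive)."""

    def __init__(self):
        super().__init__()
        self.elements = []  # list of (name, content) in document order

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.lower().startswith("dc.") and content is not None:
            self.elements.append((name, content))


# A small sample page invented for this sketch.
html_page = """<html><head>
<title>UKOLN Home Page</title>
<meta name="DC.Title" content="UKOLN Home Page">
<meta name="DC.Subject" content="metadata">
<meta name="keywords" content="ignored by this sketch">
</head><body></body></html>"""

parser = DCMetaParser()
parser.feed(html_page)
print(parser.elements)
```

The harvested pairs can then be indexed by element, which is what makes fielded searching of a Web index possible.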
54 References
- DC-dot
- http://www.ukoln.ac.uk/metadata/dcdot/
- Nordic Metadata Template
- http://www.lub.lu.se/cgi-bin/nmdc.pl
- DC-ROADS: ROADS for metadata management
- http://www.ukoln.ac.uk/metadata/roads/metadata-mgmt/
- Combine
- http://www.lub.lu.se/combine/
- Nordic Web Index
- http://nwi.ub2.lu.se/?lang=en
- Harvest
- http://www.tardis.ed.ac.uk/harvest/
55 DESIRE Project, September 20, 1999, Bristol, UK
Metadata for Networked Resources
Qualifying and Extending Metadata Semantics
56 Tom Baker's Theory of Pidgin Metadata
- Pidgin languages result from the need for communication among groups who do not share a common language
- simplification and hybridization
- Creolization is the process of complexification of a pidgin language
- Addition of semantic and syntactic nuance that supports the inherent complexity of natural language
57 Pidginization and Creolization
Museums
Metadata Creoles
Metadata Elements
Pidgin Metadata
Interoperability
58 Extensibility (refined semantics)
- Ukrainian Doll model
- improve description precision with sub-structure (sub-elements and schemes)
- should degrade gracefully to preserve interoperability
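Graceful degradation can be read as follows: an application that does not understand a qualifier drops it and still has a valid unqualified element. The sketch below assumes a dotted "DC.Element.Qualifier" naming style; the qualifier names and the functions `dumb_down` and `degrade` are hypothetical.

```python
def dumb_down(qualified_name):
    """Reduce "DC.Element.Qualifier" to "DC.Element"; leave other names alone."""
    parts = qualified_name.split(".")
    if len(parts) >= 2 and parts[0] == "DC":
        return ".".join(parts[:2])
    return qualified_name


def degrade(record):
    """Return a copy of the record with every qualifier stripped."""
    return [(dumb_down(name), value) for name, value in record]


# Hypothetical qualified names, used only to illustrate the idea.
qualified = [
    ("DC.Date.Created", "1998-04-02"),
    ("DC.Creator.PersonalName", "Smith, John"),
    ("DC.Title", "RDF Presentation"),
]
print(degrade(qualified))
```

This only works if qualifiers refine an element rather than extend it, since the value must still make sense under the unqualified name.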
59Modular Extensibility
- Extensibility via modularity
- additional elements to support local or discipline-specific requirements
- complementary packages of metadata
60 What might Extensibility mean for Specific Communities?
- Basic elements can be thought of as a semantic framework
- High-level descriptors to describe general characteristics of a resource or collection
- Use of domain-specific schemes for further precision
- Schemes to refine the semantics of Subject, Description, Format, Relation, Coverage
- Controlled vocabularies, thesauri, namespaces, and encoding rules
61The Purpose of Qualifiers
- Increase semantic specificity
- Specification of encoding rules
- Definition of substructure
- Authority Control
62Increase Semantic Specificity
- DC-4: Qualifiers should refine, not extend, the semantics of elements
- Additional detail is often required to support the needs of local or domain-specific applications
- Controlled vocabularies provide for more effective classification and retrieval (LCSH, Dewey, MeSH, AAT)
- Enumerated lists of possible values
- Formats and Types
- Language Codes (ISO xxxx)
63Specification of Encoding Rules
- 2-4-1998
- The fourth day of February?
- The second day of April?
- Schemes that define the parsing rules for a value
- ISO 8601
- 1998-04-02
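The ambiguity is easy to reproduce: without a declared scheme, the same string parses cleanly under two different conventions. A short sketch using Python's standard library, for illustration only:

```python
from datetime import date, datetime

ambiguous = "2-4-1998"

# Without a declared scheme, both readings parse successfully.
day_first = datetime.strptime(ambiguous, "%d-%m-%Y").date()
month_first = datetime.strptime(ambiguous, "%m-%d-%Y").date()
print(day_first, month_first)

# With ISO 8601 declared as the scheme, there is a single reading.
unambiguous = date.fromisoformat("1998-04-02")
print(unambiguous)
```

Neither parse fails, so software cannot detect the problem on its own; the scheme has to be stated alongside the value.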
64Define the Substructure of a Compound Value
- Established schemas are essential for interpreting certain data
- An agent may include additional structured information along with names
- vCard
- LCNA
65Authority Control
- Authority Records assure unique identity of people, places, corporate entities
- Libraries have strong commitment to authority control
- Other communities as well
- Interested Party names (music industry)
- Important for contractual purposes
66 Tradeoffs of Qualification
- On the one hand: Keep it Simple
- no sub-elements or substructure
- interoperability is highest priority
- simplicity promotes deployment
- On the other: Make it flexible
- complexity of description is unavoidable
- schemas will help bridge the complexity
- query precision is more important than simplicity
- All applications probably require some level of Qualification
67 DESIRE Consortium, September 20, 1999, Bristol, UK
Metadata for Networked Resources
An Introduction to RDF
68 To recap...
- People and economies depend on information
- Exchange of information has been hindered by incompatible hardware, software, protocols
- e.g. library community: MARC, AACR2, Z39.50
- Less of a problem before... big problem now. The Web forces us to recognize this.
69How do we solve this...
- Design enabling technologies / standards
- W3C World Wide Web Consortium
- Dublin Core Metadata Initiative
- Recognize problem context
- Multiple stakeholders and requirements
- International community
- Requirements will evolve
- Make assumption
- Common architectural components (syntax,
structure, semantics, protocols, etc) help
70 Common Syntax: XML
- XML - eXtensible Markup Language
- Markup Language - a mechanism to define tags and the structural relationship between them in documents
- eXtensible - semantics not defined, no pre-coordinated set of tags
HTML: <meta name="author" content="Smith, John">
XML: <author><fn>John</fn><ln>Smith</ln></author>
71 XML Continued...
- W3C Recommendation, Feb 1998
- Broad industry endorsement
- Subset of SGML (ISO 8879)
- lighter, stronger, able to leap tall buildings, etc.
- Validation
- Still able to benefit from SGML DTDs
- XML Schema (in progress)
- Notion of Well-formedness
- <A><B></B></A>
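Well-formedness can be checked without any DTD or schema: a parser only needs to confirm that tags are properly nested and closed. A minimal sketch with Python's standard XML parser follows; the helper name `is_well_formed` is an assumption.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<A><B></B></A>"))  # properly nested
print(is_well_formed("<A><B></A></B>"))  # overlapping tags
```

Validation against a DTD or schema is a separate, stricter step that this sketch does not attempt.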
72Data Transmission Methods
73 XML for Describing Data
- Oftentimes common syntax isn't enough
- Common structural representation for expressing statements is required
- The author of a document is Eric
<author> <url> http://doc_url </url> <name> Eric </name> </author>
<document> <author> <name> Eric </name> </author> <url> http://doc_url </url> </document>
<document href="http://doc_url" author="Eric" />
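All three encodings are well-formed XML and carry the same statement, yet each needs its own extraction code. The sketch below makes that cost visible; the function names are hypothetical and whitespace inside the elements has been trimmed for brevity.

```python
import xml.etree.ElementTree as ET

variant_1 = "<author><url>http://doc_url</url><name>Eric</name></author>"
variant_2 = ("<document><author><name>Eric</name></author>"
             "<url>http://doc_url</url></document>")
variant_3 = '<document href="http://doc_url" author="Eric" />'


def read_variant_1(text):
    # Document URL and author name are both children of <author>.
    root = ET.fromstring(text)
    return root.findtext("url"), root.findtext("name")


def read_variant_2(text):
    # The name is nested one level deeper, under <document>/<author>.
    root = ET.fromstring(text)
    return root.findtext("url"), root.findtext("author/name")


def read_variant_3(text):
    # Both values are attributes of a single empty element.
    root = ET.fromstring(text)
    return root.get("href"), root.get("author")


statements = {
    read_variant_1(variant_1),
    read_variant_2(variant_2),
    read_variant_3(variant_3),
}
print(statements)
```

The set collapses to a single (document, author) pair, but only because three different readers were written by hand; a shared structural convention removes that need.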
74 Common Structure: RDF
- RDF: Resource Description Framework
- W3C Recommendation, Feb 1999
- Data Model
- Designed to impose structural constraint on syntax to support consistent encoding, exchange and processing of metadata
- Schema
- Enables resource description communities to define (and share) vocabularies (museum, library, e-commerce)
75RDF Continued...
[Diagram: node URIR; property "author"; value "Eric"]
<rdf:Description rdf:about="http://uri_of_document" bib:author="Eric" />
76RDF Example 1
[Diagram: node URIR; properties "title" and "creator"; values "RDF Presentation" and "Eric Miller"]
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax" xmlns:dc="http://purl.org/dc/elements/1.0">
<rdf:Description rdf:about="URIR">
<dc:title> RDF Presentation </dc:title>
<dc:creator> Eric Miller </dc:creator>
</rdf:Description>
</rdf:RDF>
77RDF Example 3
[Diagram: nodes URIR and URIERIC; properties "title" and "creator"; values "RDF Presentation" and "Eric Miller"]
78RDF Example 2
[Diagram: nodes URIR and URIERIC; properties "title" and "creator"; value "RDF Presentation"]
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax" xmlns:dc="http://purl.org/dc/elements/1.0">
<rdf:Description rdf:about="URIR">
<dc:title> RDF Presentation </dc:title>
<dc:creator rdf:resource="URIERIC"/>
</rdf:Description>
</rdf:RDF>
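Because RDF fixes the structure, one generic routine can turn descriptions like these into (resource, property, value) statements. The following sketch handles only this simple shape of RDF/XML and is not a full RDF parser. The example.org URIs are hypothetical placeholders, and the rdf namespace URI is the one from the published W3C Recommendation, which is an assumption on my part.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the published RDF Recommendation (assumed here).
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical example document; the example.org URIs are placeholders.
rdf_xml = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.0/">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>RDF Presentation</dc:title>
    <dc:creator rdf:resource="http://example.org/people/eric"/>
  </rdf:Description>
</rdf:RDF>"""


def triples(text):
    """Yield (subject, predicate, object) for simple rdf:Description blocks."""
    root = ET.fromstring(text)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree spells qualified names as "{namespace}local".
            predicate = prop.tag.replace("{", "").replace("}", "")
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            yield subject, predicate, obj


for triple in triples(rdf_xml):
    print(triple)
```

A property value is either literal text or, when `rdf:resource` is present, a reference to another resource that can carry its own description.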
79Description Vocabularies
URIR
msKgrip
John Smith
80 Common Semantics
- Enabling technologies
- XML provides flexible syntax, RDF provides common data model for representation and declaration mechanisms for semantics
- Resource Description communities define vocabularies that satisfy community requirements
- share and reuse vocabularies
- Dublin Core Metadata Initiative is a prime example
81 Dublin Core Metadata Initiative
- Simple element set designed for resource description
- International, inter-discipline, community consensus
- Semantic interface among resource description communities
82More Info
- W3C: World Wide Web Consortium
- http://www.w3.org/
- XML home page
- http://www.w3.org/XML/
- RDF home page
- http://www.w3.org/RDF/
- Dublin Core Metadata Initiative
- http://purl.org/dc/
83 DESIRE Consortium, September 20, 1999, Bristol, UK
Metadata for Networked Resources
84Building block for systems and services
- Revealing information about a resource
- Managing resources
- Negotiating transactions
- Providing discovery, locate, delivery services
85 Metadata is used for...
- Supporting operations carried out on information objects
- Enabling software and humans to initiate actions on resources
86What does metadata describe?
- papers, articles
- information pages
- images
- sound
- collections
- user profiles
- ... digital and physical manifestations
87Diversity of services
- Resource discovery services
- Web site management
- Content rating
- Digital preservation
- Rights management
88Selective services
- Added value descriptions
- subject headings
- subject classifications
- summary descriptions
- authority control
- Selection
- target audience
- quality of resource
- by subject area
- by region
89Benefits of shared approaches
- Compatible technical solutions
- Shared semantics (common metadata sets)
- Shared syntax (HTML, RDF/XML)
- Consistency of content (cataloguing rules)
90Information gateways
- Support activities
- ROADS, DESIRE, IMesh
- Range of associated information gateways
- DutchESS
- Finnish Virtual Library project
- EELS
- NOVAGate
- SOSIG, EEVL, OMNI, BizEd ...
- Internet Scout, etc.
91 Metadata creation
- Who creates metadata?
- Authors
- Experts
- Metadata creation agencies
- Where?
- Embedded in a resource
- Linked to resource
- Local database
- Third party database
92Collaborative metadata creation
- Information providers
- Publishers
- Libraries
- Service providers
- information gateways (RNC, Nordic Web Index, AGLS)
- bibliographic utilities (OCLC, BookData ...)
93Description of BIBLINK Workspace
Publishers
BIBLINK Workspace: a shared facility for storing and manipulating BIBLINK workspace records
Third parties, e.g. identification agencies (ISBN, ISSN, etc.)
BIBLINK Workspace Administrator
National Bibliographic Agencies
94Future options?
- More complex creation models
- Re-use of metadata
- Enhancement of harvested metadata
- Incremental additions to metadata
- Targeted services
- Facilitating personalised views
- Providing structured environments
95 DESIRE Consortium, September 20, 1999, Bristol, UK
Metadata for Networked Resources
Metadata into the Mainstream
96 Semantic Web
"If HTML and the Web made all the online documents look like one huge book, RDF, schema and inference languages will make all the data in the world look like one huge database." (Tim Berners-Lee, 1999)
98Data Transmission Methods
99RDF as Building Blocks
100Trusted Third Party metadata e.g.
Resource Referenced by Service in RDF
102Trusted Third Party metadata e.g.
Resource Referenced by Service in RDF
Embedded and Associated metadata e.g.
Site-Maps in RDF
104Sitemaps / Channels described in RDF/DC
Search Results described in RDF/DC
105 Open Source / Open Standards
The consensus-building role played by the Dublin Core within the metadata community is similar to that played by the Mozilla Organization and related initiatives in the 'open source' software world. It should be possible to leverage the work of the DC community to provide non-proprietary, multilingual vocabularies for Mozilla-based applications.
http://www.mozilla.org/rdf/doc/vocabs.html
106Panel Session
- Nicky Ferguson, ILRT (Chair)
- Eric Miller, OCLC
- Carl Lagoze, Cornell
- Rachel Heery, UKOLN
- Dan Brickley, ILRT
- Andy Powell, UKOLN
- Debra Hiom, SOSIG, ILRT