Title: Metadata Overview

Metadata: data that describes data; structured data about data. Pure metadata has meaning only in relation to the primary data being described.

Metadata may be either:
- Extrinsic: existing independently of the primary data being described, usually in an indexable metadatabase; or
- Intrinsic: existing as part of the primary data being described.

Design Criteria for a Metadata System
- Durable: independent of changes to hardware, software and network infrastructure.
- Interoperable: can be shared seamlessly across the web with disparate hardware, software, network infrastructure and search engines.
- Precise: enables the creation of customized virtual collections, pulling objects together seamlessly from any digital space to meet exact information requirements.
- Flexible: supports any search engine, search strategy, transport or display option.
- Efficient: provides immediate access to the most appropriate asset for the searcher.
- Controlled: ensures that digital assets pass from a trusted source to an authorized end user.
- Granular: able to search the top page or subsequent pages, or to drill down to an underlying database of objects, breaking through the web "skin."
[Figure: a query goes to the search engine, which searches the metadatabase and drills down to the underlying object database.]

Key Concepts
- Semantics: the meaning ascribed by a community to a metadata element or to the values for that element, organized into a vocabulary.
- Structure: imposes order for the unambiguous expression of the semantics, that is, consistent coding, exchange and display of metadata elements, so that the end user interprets them consistently.
- Syntax: provides a means to represent one or more structures in a flexible, extensible manner; it is the underlying mechanism for encoding, exchange, display and machine processing of metadata. Example: XML.
- Schema: identifies, defines, organizes and constrains the elements in a set, together with their characteristics and descriptions. A schema involves both semantics and structure. Examples: Dublin Core, RDF.
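To make the three layers concrete, here is a minimal Python sketch, assuming the standard `xml.etree.ElementTree` module and the Dublin Core 1.1 element namespace `http://purl.org/dc/elements/1.1/`. The Dublin Core vocabulary supplies the semantics (element names), placing repeated elements under one record supplies the structure, and XML is the syntax. The wrapper element `record` and the helper `dc_record_to_xml` are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Semantics: the shared Dublin Core vocabulary, identified by its namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dc_record_to_xml(fields: dict) -> str:
    """Serialize {element name: [values]} as XML (hypothetical helper).

    Structure: each value becomes its own element under a hypothetical
    <record> wrapper. Syntax: ElementTree writes the XML text.
    """
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = dc_record_to_xml({"title": ["Metadata Overview"], "creator": ["Grace Agnew"]})
print(xml_text)
```

Swapping the serializer (for example, to HTML `<meta>` tags) would change only the syntax layer; the vocabulary and the record structure stay the same.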

Types of Metadata
- Structural: describes the physical and logical attributes of the object related to creation, transport, storage and display:
  - the hardware and software used to create the object (some place this in administrative metadata);
  - the hardware, software and bandwidth needed to transport and display the object.
  Structural metadata may be machine-readable, human-readable or both, and may be part of the digital object header (e.g., TWAIN).
- Provenance/Ingest: the admission ticket to the archive or data repository. It acknowledges the rules of entry and identifies the object for positioning within the archive, and is best kept intrinsic to the object, e.g., in the header. It:
  - identifies the owner/creator of the metadata;
  - identifies the owner/creator of the digital asset;
  - provides the date created, the permanence of the asset, and updates and modifications to the asset;
  - may push the asset to users when content changes.
- Rights/Access: provides requirements for access, display and download/storage of the asset. It should integrate with the organization's access and authorization system, e.g.:
  - a reference/hyperlink to a digital certificate authority;
  - an indication of user restrictions (which may reference an attribute on the certificate authority's user attribute server);
  - support for multilayered access: download only vs. store, free vs. fee, and asset versions (high resolution vs. low resolution).
- Descriptive: should uniquely identify an asset through:
  - physical description (overlaps with structural metadata);
  - publication/creation information (overlaps with ingest metadata).
  It should also describe the information content in subject and free-text fields, so that a search engine can identify and select the asset in response to a query.
- Linking: provides persistent links connecting:
  - the metadata record and the described asset;
  - all physical instantiations of the asset;
  - registries for the metadata schemas used, providing a meta-schema to describe the object;
  - the security system for access and authorization, and/or an intermediary access page.
  Linking metadata overlaps considerably with the other metadata types.
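The multilayered access described under Rights/Access can be sketched as a small policy check. This is a hypothetical Python model, not any particular authorization system: `AssetVersion`, `User` and `permitted_actions` are illustrative names, and the sketch assumes that each asset version is either free or fee-based and either storable or download-only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetVersion:
    """One version of an asset, e.g. high or low resolution (hypothetical model)."""
    resolution: str   # "high" or "low"
    fee: bool         # True if this version is fee-based
    storable: bool    # False means download only: no local storage permitted


@dataclass(frozen=True)
class User:
    """Attributes assumed to come from the organization's authorization system."""
    authenticated: bool
    paid: bool


def permitted_actions(user: User, version: AssetVersion) -> set:
    """Return the actions this user may perform on this asset version."""
    if not user.authenticated:
        return set()
    if version.fee and not user.paid:
        return set()
    actions = {"display", "download"}
    if version.storable:
        actions.add("store")
    return actions


low_res = AssetVersion("low", fee=False, storable=True)
high_res = AssetVersion("high", fee=True, storable=False)
guest = User(authenticated=True, paid=False)
subscriber = User(authenticated=True, paid=True)
print(sorted(permitted_actions(guest, low_res)))       # ['display', 'download', 'store']
print(sorted(permitted_actions(guest, high_res)))      # []
print(sorted(permitted_actions(subscriber, high_res))) # ['display', 'download']
```

In a real deployment the `User` attributes would be looked up from the certificate authority's attribute server rather than constructed by hand.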

Mining Web Assets: Current Practice
A query is sent to a proprietary search engine, or to a metasearch engine that queries many engines.
- Benefits: ubiquitous and free; competition results in better precision and coverage.
- Drawbacks: provides access to assets only, not long-term management; metadata is ephemeral; the asset creator has no control over description and access.

Standards Are Developed to:
- create durable, persistent metadata records that define the asset precisely, so that exactly the relevant assets are identified and retrieved in response to a query;
- create metadata that is flexible, extensible and scalable enough to support the needs of any organization, any type of asset, and metadata creators of varying skill and interest levels;
- allow metadata records from many schemas, with differing levels of complexity, to interoperate for data discovery;
- enable machine intervention for the automatic interpretation of metadata and for data discovery, particularly among disparate search and retrieval platforms.

ISO 11179
A joint standard of the ISO (International Organization for Standardization) and the IEC (International Electrotechnical Commission) that provides a robust framework for defining data elements unambiguously and persistently within user committees. It also provides a framework for creating and maintaining metadata registries that store and maintain data element definitions. NCITS L8 draft standards are available at the following websites:
- http://www.jtc1.org/
- http://pueblo.lbl.gov/olken/X3L8/drafts/draft.docs.html

Relevant Metadata Standards
Dublin Core Element Set v. 1.1 (IETF Recommendation)
- A flexible, lowest-common-denominator standard with 15 optional, repeatable fields.
- XML- and HTML-based; integrates completely with assets that live on the web, or that are accessed via the web and live in an attached database.
- May be intrinsic to the asset described or separate from it.
- Automated tools for generating/validating Dublin Core are freely available, e.g., DC.dot: http://www.ukoln.ac.uk/metadata/dcdot/
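The "15 optional, repeatable fields" property can be checked mechanically. Below is a minimal Python sketch. It assumes a record is held as a dictionary from element name to a list of string values, which is an illustrative representation and not one prescribed by Dublin Core. `check_dc_record` is a hypothetical helper and does not replace tools such as DC.dot.

```python
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def check_dc_record(record: dict) -> list:
    """Return a list of problems in a {element: [values]} record (hypothetical helper).

    Every element is optional and repeatable, so an empty record, or many
    values for one element, is acceptable. Only names outside the element
    set and empty or non-string values are reported.
    """
    problems = []
    for name, values in record.items():
        if name.lower() not in DC_ELEMENTS:
            problems.append(f"unknown element: {name}")
        elif not isinstance(values, list) or not all(
            isinstance(v, str) and v.strip() for v in values
        ):
            problems.append(f"{name}: values must be a list of non-empty strings")
    return problems


record = {
    "title": ["A Thousand Wheels are Set in Motion"],
    "contributor": ["Chritton, Heather", "Crafts, Laurel"],  # repeated element: allowed
    "author": ["Agnew, Grace"],                              # not a Dublin Core element
}
print(check_dc_record(record))  # ['unknown element: author']
```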

[Figure from "Description of Dublin Core Elements," http://purl.oclc.org/metadata/dublin_core_elements]

Dublin Core Drawbacks
- Too flexible and simple for complex, sophisticated collections.
- Elements lack standardized use and precision. Different communities are developing extensions to specify and categorize the elements; approved extensions are available but slow to appear.
- Some elements (Rights, Coverage) are ambiguous in their application.
- Intended for web objects that are textual or primarily textual. It does not provide for:
  - media asset components (video sequences, scenes, shots, frames, objects);
  - sequential media (audio and video, slide shows);
  - synchronized media (video, audio, caption file or transcription, slide shows).

Result: Every Community Creates Its Own
- Archives: EAD (Encoded Archival Description)
- Government: GILS (Government or Global Information Locator Service)
- IMS (Instructional Management Systems)
- TEI (Text Encoding Initiative): books and humanities; the TEI Header (TEIH) is used for metadata description
- Dublin Core flavors:
  - EdNA: http://www.edna.edu.au/edna/owa/info.getpage?spautopagecode5210
  - CIMI Guide to Best Practice: Dublin Core, available as PDF from http://www.cimi.org/
- MARC (Machine-Readable Cataloging): used by most library catalogs worldwide.
- MPEG-7: digital audio, video and still image files (in development; committee draft due October 2000).

MPEG-7
Intended to describe audiovisual information regardless of storage, coding, display, medium or technology; it will include analog and digital media and combinations of media formats. It will standardize:
- a core set of Descriptors (D);
- Description Schemes (DS): codified structures of Descriptors (definition, constraints, relationships among Descriptors);
- a language for defining Description Schemes and Descriptors.

[Figure: MPEG-7 structural model. Source: Jane Hunter, "MPEG-7 Behind the Scenes," D-Lib Magazine, September 1999 (v. 5, no. 9).]

A possible MPEG-7 schema incorporating DC:

    <DC:Type>Image.Moving.TV.News.sequence.scene</DC:Type>
    <DC:Description.text>Footage of Grenade Attack</DC:Description.text>
    <DC:Description.transcript>Sam Rainsy knows the violence of political life in Cambodia. Four months ago, 16 of his supporters were killed in a grenade attack in Phnom Penh.</DC:Description.transcript>
    <DC:Format.Length>10 seconds</DC:Format.Length>
    <DC:Coverage.t.min DC:Scheme="SMPTE">19:31:57:1</DC:Coverage.t.min>
    <DC:Coverage.t.max DC:Scheme="SMPTE">19:32:07:1</DC:Coverage.t.max>

From Jane Hunter and Renato Iannella, "The Application of Metadata Standards to Video Indexing," in Research and Advanced Technology for Digital Libraries: Second European Conference, ECDL '98, Heraklion, Crete, Greece, September 21-23, 1998, Proceedings (Berlin: Springer, 1998; Lecture Notes in Computer Science 1513), 135-156.
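The two Coverage values can be checked against DC:Format.Length. Below is a minimal Python sketch with two assumptions, neither stated in the example. First, the values are read as SMPTE timecodes of the form HH:MM:SS:FF, here 19:31:57:1 and 19:32:07:1. Second, the frame rate is 30 frames per second, non-drop-frame. `smpte_to_frames` and `duration_seconds` are hypothetical helper names.

```python
FPS = 30  # assumed frame rate (non-drop-frame); the example does not specify one


def smpte_to_frames(timecode: str, fps: int = FPS) -> int:
    """Convert an HH:MM:SS:FF timecode to an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not (0 <= minutes < 60 and 0 <= seconds < 60 and 0 <= frames < fps):
        raise ValueError(f"invalid SMPTE timecode: {timecode}")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def duration_seconds(t_min: str, t_max: str, fps: int = FPS) -> float:
    """Length of the segment between two timecodes, in seconds."""
    return (smpte_to_frames(t_max, fps) - smpte_to_frames(t_min, fps)) / fps


print(duration_seconds("19:31:57:1", "19:32:07:1"))  # 10.0
```

Under these assumptions, the start and end timecodes are consistent with the stated 10-second length.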

Beyond the Metadata Schema: Access to Information
- Information stored and managed within your organization (possibly under different metadata schemas).
- Information stored and managed by outside organizations.

[Figure: the query "Books and web sites written by Grace Agnew" goes to a search engine, which maps the Author parameter (Agnew, Grace) to DC.Creator and DC.Contributor. In the Dublin Core metadatabase, Record 1 matches on DC.Creator (Grace Agnew) and Record 70 on DC.Contributor (Grace Agnew), producing the result set AGNEW, GRACE 1999; AGNEW, GRACE 1994...]

[Figure: the same query, "Books and web sites written by Grace Agnew," goes to two search engines. One maps the Author parameter (Agnew, Grace) to MARC fields 100 and 700; the other maps it to DC.Creator and DC.Contributor.]
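The parameter mapping in these two diagrams can be sketched in Python. This is a toy model, not a real search engine:

- `Engine`, `normalize_name` and `metasearch` are hypothetical names.
- The records are invented.
- The name normalization, which treats "Agnew, Grace" and "Grace Agnew" as the same heading, is an assumption about how the engines compare names.

```python
from dataclasses import dataclass


def normalize_name(name: str) -> str:
    """Treat 'Agnew, Grace' and 'Grace Agnew' as the same name (assumed rule)."""
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    return " ".join(name.split()).casefold()


@dataclass
class Engine:
    """Hypothetical search engine with its own parameter mapping."""
    name: str
    field_map: dict  # query parameter -> local fields, e.g. "Author" -> ["100", "700"]
    records: dict    # record id -> {field: [values]}

    def search(self, parameter: str, value: str) -> list:
        """Map the parameter to local fields and return matching record ids."""
        fields = self.field_map.get(parameter, [])
        target = normalize_name(value)
        return sorted(
            record_id
            for record_id, record in self.records.items()
            if any(normalize_name(v) == target for f in fields for v in record.get(f, []))
        )


def metasearch(engines: list, parameter: str, value: str) -> dict:
    """Send the same query to every engine; each applies its own mapping."""
    return {engine.name: engine.search(parameter, value) for engine in engines}


catalog = Engine("MARC catalog", {"Author": ["100", "700"]}, {
    "m1": {"100": ["Agnew, Grace"]},
    "m2": {"100": ["Smith, Ann"], "700": ["Agnew, Grace"]},
    "m3": {"100": ["Smith, Ann"]},
})
web = Engine("DC metadatabase", {"Author": ["DC.Creator", "DC.Contributor"]}, {
    "r1": {"DC.Creator": ["Grace Agnew"]},
    "r70": {"DC.Contributor": ["Grace Agnew"]},
})
print(metasearch([catalog, web], "Author", "Agnew, Grace"))
# {'MARC catalog': ['m1', 'm2'], 'DC metadatabase': ['r1', 'r70']}
```

The query itself never names MARC tags or Dublin Core elements. Each engine's `field_map` carries that knowledge, which is the point of the diagrams.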

Z39.50: Information Retrieval (Z39.50) Application Service Definition and Protocol Specification
Enables a client to interact with multiple servers, which employ different search engines and different data element formats and definitions, to search databases and retrieve the records that result from the search.

Z39.50
- Initiates a session between client and server.
- Executes a query from the client against one or more databases on the server.
- Creates a result set consisting of records that match the query on one or more query attributes (access points).
- Returns a report on the number of records matching the search.
- Returns records (individual records selected by the client) in a format selected by the client.
- Primary formats returned: MARC and SUTRS, extending to SQL, Dublin Core and other schemas.
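The session cycle above can be imitated in a few lines of Python: initialize, search, report a hit count, then present selected records in a chosen format. This is only a toy, in-memory model, with these assumptions:

- It performs no real Z39.50 protocol encoding.
- `ToyServer` and its method names are hypothetical. They echo the standard's Init, Search and Present services but are not any library's API.
- Records are plain dictionaries.
- The "SUTRS" branch simply renders plain text.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Per-client state: named result sets created by searches."""
    client_id: str
    result_sets: dict = field(default_factory=dict)


class ToyServer:
    """Hypothetical in-memory imitation of the Init/Search/Present cycle (no wire protocol)."""

    def __init__(self, databases: dict):
        # database name -> list of records; a record maps access point -> list of values
        self.databases = databases

    def init(self, client_id: str) -> Session:
        return Session(client_id)

    def search(self, session, database_names, access_point, term, result_set="default") -> int:
        """Run the query against one or more databases and store the result set."""
        wanted = term.casefold()
        hits = [
            record
            for name in database_names
            for record in self.databases[name]
            if any(v.casefold() == wanted for v in record.get(access_point, []))
        ]
        session.result_sets[result_set] = hits
        return len(hits)  # the report on the number of matching records

    def present(self, session, result_set, start, count, syntax="SUTRS"):
        """Return records start..start+count-1 (1-based) in the requested format."""
        selected = session.result_sets[result_set][start - 1 : start - 1 + count]
        if syntax == "SUTRS":  # plain, unstructured text
            return ["\n".join(f"{k}: {'; '.join(vs)}" for k, vs in rec.items()) for rec in selected]
        if syntax == "raw":
            return selected
        raise ValueError(f"unsupported record syntax: {syntax}")


server = ToyServer({
    "books": [{"author": ["Agnew, Grace"], "title": ["Book A"]}],
    "web": [{"author": ["Agnew, Grace"], "title": ["Site B"]},
            {"author": ["Smith, Ann"], "title": ["Site C"]}],
})
session = server.init("client-1")
print(server.search(session, ["books", "web"], "author", "agnew, grace"))  # 2
print(server.present(session, "default", 2, 1))  # ['author: Agnew, Grace\ntitle: Site B']
```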

Z39.50 Version 3
Extends the capabilities of the standard to include:
- Boolean and proximity searching;
- extended services, including saved queries to be re-executed periodically (SDI, selective dissemination of information);
- an Explain facility that allows the client to solicit information about the server and dynamically reconfigure itself.

Z39.50 Profiles for User Groups
- LOC: Access to Digital Collections
- LOC: Access to Digital Library Objects
- CIMI: companion profile for museum digital collections and objects
- GEO: geospatial datasets
- ZSQL: extension to the SQL query language

Z39.50 Limitations
- Requires client software and Z39.50-enabled server software (which in turn requires a Z39.50-aware search engine).
- Most commercial client/server products have not implemented the Explain feature of version 3.
- Requires human collaboration for implementation, particularly at the profile level.
- Limited primarily to the features provided by commercial servers and clients.
- Indexing parameters proprietary to the server database are not shared with the client, so the client cannot override or extend the proprietary search parameters.
- Databases that are not on a Z39.50 server are invisible.

Metadata Registries
- Dynamic specification, maintenance and description of metadatabase structures:
  - unambiguous definition of data structures;
  - unambiguous definition and description of relationships between data structures, behaviors of data structures, and integrity constraints on the contents of data structures;
  - semantics (meaning in context) and structure definition.
- Links/hooks into subordinate registries used to define data content within a metadata element.
- Mapping of data structures between registries.
Registries should be both eye-readable and interpretable by computer programs. This supports seamless, unambiguous discovery, query and display across disparate database and search engine structures, and enables intelligent query agents, advanced data mining, etc.
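The registry functions listed above can be sketched as a small data structure: element definitions, hooks into subordinate registries, and mappings between registries. Below is a minimal Python sketch under stated assumptions:

- `ElementDef` and `Registry` are hypothetical names, not an ISO 11179 or REGGIE API.
- The element definitions are those of Dublin Core 1.1.
- The DC-to-MARC mapping (Creator to 100, Contributor to 700, Subject to 650) is a deliberately simplified crosswalk.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ElementDef:
    """A registered element (hypothetical structure)."""
    name: str
    definition: str
    value_registry: Optional[str] = None  # hook into a subordinate registry for element content


@dataclass
class Registry:
    """Hypothetical registry: element definitions plus mappings to other registries."""
    name: str
    elements: dict = field(default_factory=dict)  # element name -> ElementDef
    mappings: dict = field(default_factory=dict)  # other registry -> {local element: foreign element}

    def register(self, element: ElementDef) -> None:
        self.elements[element.name] = element

    def undefined(self, record: dict) -> list:
        """Names in a record that this registry does not define."""
        return [name for name in record if name not in self.elements]

    def translate(self, record: dict, target: str) -> dict:
        """Re-express a record in another registry's elements via the stored mapping."""
        mapping = self.mappings[target]
        out = {}
        for name, values in record.items():
            out.setdefault(mapping[name], []).extend(values)
        return out


dc = Registry("DC")
dc.register(ElementDef("Creator", "An entity primarily responsible for making the content of the resource."))
dc.register(ElementDef("Contributor", "An entity responsible for making contributions to the content of the resource."))
dc.register(ElementDef("Subject", "The topic of the content of the resource.", value_registry="LCSH"))
# Simplified crosswalk to MARC tags (assumption; real crosswalks are more detailed).
dc.mappings["MARC"] = {"Creator": "100", "Contributor": "700", "Subject": "650"}

record = {"Creator": ["Agnew, Grace"], "Subject": ["Metadata"]}
print(dc.undefined({"Author": ["Agnew, Grace"]}))  # ['Author']
print(dc.translate(record, "MARC"))                # {'100': ['Agnew, Grace'], '650': ['Metadata']}
```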

Work on metadata registries is a collaborative effort of Joint Technical Committee 1 (JTC1) of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).
Open Forum on Metadata Registries: http://www.sdct.itl.nist.gov/ftp/l8/sc32wg2/2000/events/openforum/index.htm

REGGIE: a Java applet that dynamically creates metadata according to available online registries. It also allows you to enter your own registry, describing, characterizing and constraining all the elements in the set. http://metadata.net (a UK/Australia joint effort)

[Figure: the query "Anything by Grace Agnew?" goes to a search engine. The engine consults the registry named in the metadatabase (Scheme: DC, <URL of Registry>), which defines Author, in Dublin Core, as Creator and Contributor.]

Resource Description Framework
W3C Resource Description Framework (RDF) Model and Syntax Specification (22 February 1999): http://www.w3.org/TR/REC-rdf-syntax/
Purpose: to provide robust application of metadata in the web environment, through a model for unambiguous, schema-independent description of resources.
Key Concepts
- Resource: any object uniquely identifiable by a URI (Uniform Resource Identifier).
- Property type: a property associated with a resource.
- Value: associated with a property type; it may be atomic (a string) or another resource, creating a new hierarchy.

RDF property types express the relationships of values associated with resources. Famous example: "The Author of Metadata Overview is Grace Agnew."
[Figure: the resource "Metadata Overview" (http://www.edu/mo) is linked by the property type Author to the value "Grace Agnew".]
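The resource / property type / value model can be written down directly as statements. Below is a minimal Python sketch, assuming an in-memory list of statements rather than any RDF library. `urn:example:person:agnew` is a hypothetical URI. It shows a value that is itself a resource, which starts a new level of the hierarchy.

```python
from typing import NamedTuple, Union


class Resource(NamedTuple):
    """Anything identifiable by a URI."""
    uri: str


class Statement(NamedTuple):
    resource: Resource
    property_type: str
    value: Union[str, Resource]  # atomic string, or another resource


MO = Resource("http://www.edu/mo")
AGNEW = Resource("urn:example:person:agnew")  # hypothetical URI for illustration

graph = [
    Statement(MO, "Title", "Metadata Overview"),  # atomic value
    Statement(MO, "Author", AGNEW),               # value is another resource
    Statement(AGNEW, "Name", "Grace Agnew"),
]


def values(graph, resource, property_type):
    """All values of one property type on one resource."""
    return [s.value for s in graph if s.resource == resource and s.property_type == property_type]


def follow(graph, resource, *path):
    """Walk a chain of property types, descending into values that are resources."""
    current = [resource]
    for property_type in path:
        current = [v for r in current if isinstance(r, Resource) for v in values(graph, r, property_type)]
    return current


print(follow(graph, MO, "Author", "Name"))  # ['Grace Agnew']
```

The slide's simpler form, `Statement(MO, "Author", "Grace Agnew")`, is equally valid. Making the value a resource lets other statements describe the author in turn.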

RDF
- Enables interoperability among metadata schemes, including the modular use of multiple schemes within a metadata record, using the XML namespace facility.
- Adds machine-interpretable semantics to the encoding, exchange and reuse of structured metadata.
- Enables automatic negotiation between search engine, metadata record and metadata registry for powerful, flexible search and retrieval independent of server and client search and retrieval infrastructures (or, at least, it will!).

Application of Dublin Core and RDF for Resource Description
Dublin Core in HTML resides in the header element:

    <html>
    <head>
    <title>A Thousand Wheels are set in Motion - Georgia Tech Library and Information Center</title>
    <link rel="schema.DC" href="http://purl.org/dc">
    <meta name="DC.Title" content="A Thousand Wheels are Set in Motion">
    <meta name="DC.Title.Alternative" content="The Building of Georgia Tech at the Turn of the 20th Century, 1888-1908">
    <meta name="DC.Creator.CorporateName" scheme="LCNAF" content="Georgia Institute of Technology Library and Information Center">
    <meta name="DC.Subject" scheme="LCSH" content="Georgia Institute of Technology--Buildings">
    <meta name="DC.Description" content="This Web site provides photographs, engravings and sketches of the first buildings on the Georgia Tech Campus, from 1888-1908. As of 9/20/1999, 88 images are provided but more will be added. Cataloged in EAD Single Item Metadata format.">
    <meta name="DC.Publisher.CorporateName" scheme="LCNAF" content="Georgia Institute of Technology Library and Information Center">
    <meta name="DC.Contributor.PersonalName" scheme="LCNAF" content="Chritton, Heather">
    <meta name="DC.Contributor.PersonalName" scheme="LCNAF" content="Crafts, Laurel">
    </head>

Full metadata record: http://www.library.gatech.edu/gtbuildings
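Because Dublin Core in HTML lives in ordinary `<meta>` elements, a harvester can recover it with a plain HTML parser. Below is a minimal sketch using Python's standard `html.parser` module. `DCMetaParser` and `extract_dc` are hypothetical names, and the sample is an abbreviated copy of the header above.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect <meta name="DC.*"> elements as {name: [(content, scheme), ...]} (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)  # attribute names arrive lower-cased
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.upper().startswith("DC.") and content is not None:
            self.elements.setdefault(name, []).append((content, attributes.get("scheme")))


def extract_dc(html: str) -> dict:
    parser = DCMetaParser()
    parser.feed(html)
    parser.close()
    return parser.elements


SAMPLE = """<html><head>
<link rel="schema.DC" href="http://purl.org/dc">
<meta name="DC.Title" content="A Thousand Wheels are Set in Motion">
<meta name="DC.Contributor.PersonalName" scheme="LCNAF" content="Chritton, Heather">
<meta name="DC.Contributor.PersonalName" scheme="LCNAF" content="Crafts, Laurel">
</head></html>"""

for name, entries in extract_dc(SAMPLE).items():
    print(name, entries)
```

Repeated elements, such as the two contributors, are kept in document order, and each value keeps its `scheme` (LCNAF, LCSH) for later authority checking.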

RDF / Dublin Core in XML

    <?xml:namespace href="http://www.w3c.org/RDF/" as="RDF"?>
    <?xml:namespace href="http://purl.org/RDF/DC" as="DC"?>
    <?xml:namespace href="http://loc.gov/LCNAF" as="LCNAF"?>
    <?xml:namespace href="http://loc.gov/LCSH" as="LCSH"?>
    <RDF:RDF>
      <RDF:Description RDF:HREF="http://purl.org/metadata/dublin_core_elements">
        <DC:Title>A Thousand Wheels are Set in Motion</DC:Title>
        <DC:Title.Alternative>The Building of Georgia Tech at the Turn of the 20th Century, 1888-1908</DC:Title.Alternative>
        <DC:Creator.CorporateName>
          <RDF:Description>
            <LCNAF:CorporateName>Georgia Tech Library and Information Center</LCNAF:CorporateName>
          </RDF:Description>
        </DC:Creator.CorporateName>
        <DC:Subject>
          <RDF:Description>
            <LCSH:CorporateName>Georgia Institute of Technology--Buildings</LCSH:CorporateName>
          </RDF:Description>
        </DC:Subject>
        <DC:Description>This Web site provides photographs, engravings and sketches of the first buildings on the Georgia Tech Campus, from 1888-1908. As of 9/20/1999, 88 images are provided but more will be added. Cataloged in EAD Single Item Metadata (SIM) format.</DC:Description>
        <DC:Contributor>
          <RDF:Seq>
            <RDF:LI><LCNAF:PersonalName>Chritton, Heather</LCNAF:PersonalName></RDF:LI>
            <RDF:LI><LCNAF:PersonalName>Crafts, Laurel</LCNAF:PersonalName></RDF:LI>
          </RDF:Seq>
        </DC:Contributor>
      </RDF:Description>
    </RDF:RDF>

Notes
1. RDF provides three types of container for collections of resources:
   - Sequence: a specified ordering of members;
   - Bag: all members of equal importance;
   - Alternatives: a choice between members.
   In this example, I specify that among the contributors, Heather Chritton, the web page developer, appears first and Laurel Crafts, the digital image creator, appears second. Other contributors (text creation, metadata creation, indexing, etc.) follow in a specified order in the complete record. I use the RDF Sequence list to establish this fixed contributor order.
2. LCSH (Library of Congress Subject Headings) and LCNAF (Library of Congress Name Authority File) do not currently reside on web pages at a URL. The URLs provided are for illustration only.
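The difference between the three container types comes down to how order and choice are treated. Below is a minimal Python sketch of those semantics; it is not an RDF library. The class names mirror the RDF container names. The Alt example, which chooses between hypothetical MPEG-1 and QuickTime versions, is illustrative only.

```python
from collections import Counter


class Bag:
    """Unordered: all members are of equal importance, so order is ignored."""

    def __init__(self, *members):
        self.members = list(members)

    def __eq__(self, other):
        return isinstance(other, Bag) and Counter(self.members) == Counter(other.members)


class Seq:
    """Ordered: position is significant (RDF numbers members _1, _2, ...)."""

    def __init__(self, *members):
        self.members = list(members)

    def __eq__(self, other):
        return isinstance(other, Seq) and self.members == other.members

    def position(self, member):
        return self.members.index(member) + 1  # 1-based, like rdf:_1


class Alt:
    """Alternatives: one member is chosen; the first acts as the default."""

    def __init__(self, default, *others):
        self.members = [default, *others]

    def choose(self, acceptable):
        return next((m for m in self.members if m in acceptable), self.members[0])


contributors = Seq("Chritton, Heather", "Crafts, Laurel")
print(contributors.position("Crafts, Laurel"))                     # 2
print(contributors == Seq("Crafts, Laurel", "Chritton, Heather"))  # False
print(Bag("Chritton, Heather", "Crafts, Laurel") == Bag("Crafts, Laurel", "Chritton, Heather"))  # True
print(Alt("MPEG-1", "QuickTime").choose({"QuickTime"}))            # QuickTime
```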

XML
XML (Extensible Markup Language), a subset of SGML (Standard Generalized Markup Language), provides the ability to define elements within a web document. XML documents have both a logical and a physical structure. Physically, a document is composed of units called entities, and an entity may refer to other entities to include them in the document. Logically, a document is composed of declarations, elements, comments, character references and processing instructions. Structural relationships are expressed through nesting.
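Nesting is what carries the structure. Below is a minimal Python sketch that uses the standard `xml.etree.ElementTree` parser to print the element hierarchy of a small document. The element names are hypothetical. The comment in the sample shows that a parser may discard parts of the logical structure: ElementTree drops comments by default.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
SAMPLE = """<?xml version="1.0"?>
<!-- a comment: part of the logical structure, dropped by ElementTree by default -->
<record>
  <title>A Thousand Wheels are Set in Motion</title>
  <creator>
    <corporateName>Georgia Tech Library and Information Center</corporateName>
  </creator>
</record>"""


def outline(elem, depth=0):
    """List element names, indented by nesting depth."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
print("\n".join(outline(root)))
```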

XML display is governed by an attached style document, written in CSS (Cascading Style Sheets) or XSL (Extensible Stylesheet Language), that provides the rules for display. Styles can be applied to single elements as well as to the entire document. More than one style sheet can be provided for a document or element, with precedence rules governing the resulting display.

DTD: the Document Type Definition provides a formally defined structure, vocabulary and syntax for an XML document type. Documents are validated against a DTD to ensure that the nesting structure and semantic constraints are followed, so that meaning is consistent across documents.
DCD (Document Content Description): a semantic superset of XML DTDs, intended to conform to the RDF Model and Syntax Specification. It describes an XML vocabulary for schemas, that is, for specifying object classes. It is based on elements (RDF property types) and attributes, and supports RDF vocabulary and constructs.

SOX (Schema for Object-Oriented XML): an alternative to DTDs for validating XML documents. It supports scalar (numeric) datatypes, enumerated datatypes (enumerations of values) and format datatypes. An expanded namespace facility allows objects from any identifiable namespace to be used to build the document.
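The three SOX datatype families can be illustrated with simple checks. This is a hypothetical Python sketch of the idea, not SOX syntax or any SOX validator. The field names and limits (`imageCount`, `type`, `date`) are invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Scalar:
    """Numeric value within a range."""
    minimum: float
    maximum: float

    def check(self, value: str) -> bool:
        try:
            number = float(value)
        except ValueError:
            return False
        return self.minimum <= number <= self.maximum


@dataclass(frozen=True)
class Enumeration:
    """Value must be one of a fixed set."""
    values: frozenset

    def check(self, value: str) -> bool:
        return value in self.values


@dataclass(frozen=True)
class Format:
    """Value must match a pattern."""
    pattern: str

    def check(self, value: str) -> bool:
        return re.fullmatch(self.pattern, value) is not None


# Hypothetical schema: field names and limits are illustrative only.
SCHEMA = {
    "imageCount": Scalar(0, 10_000),
    "type": Enumeration(frozenset({"Text", "Image", "Sound"})),
    "date": Format(r"\d{4}-\d{2}-\d{2}"),
}


def invalid_fields(document: dict, schema: dict = SCHEMA) -> list:
    """Names of fields that are undeclared or fail their datatype check."""
    return sorted(
        name for name, value in document.items()
        if name not in schema or not schema[name].check(value)
    )


print(invalid_fields({"imageCount": "88", "type": "Image", "date": "1999-09-20"}))  # []
print(invalid_fields({"imageCount": "-1", "type": "Video", "date": "9/20/1999"}))
# ['date', 'imageCount', 'type']
```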

Role of the Database
A database whose contents can be parsed and reported in a validated XML metadata format, as well as in other metadata syntaxes, provides a robust space for metadata development. It can also report to any XML document type and hook into applications via APIs to support unique user needs.
[Figure: an Oracle database feeding a subject-specific web research tool, a MARC-based catalog, a personal research space, a collaborative research space and a web-based courseware application.]
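Reporting database content to XML is essentially a query plus a serializer. Below is a minimal sketch. It assumes Python's built-in `sqlite3` in place of the Oracle database in the diagram, and a hypothetical one-row-per-value `metadata` table; `build_demo_db` and `report_as_xml` are illustrative names. The `seq` column preserves element order, which matters for a fixed contributor sequence like the one in the RDF notes.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with one row per metadata value (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metadata (record_id INTEGER, seq INTEGER, element TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO metadata VALUES (?, ?, ?, ?)",
        [
            (1, 1, "title", "A Thousand Wheels are Set in Motion"),
            (1, 2, "contributor", "Chritton, Heather"),
            (1, 3, "contributor", "Crafts, Laurel"),
        ],
    )
    return conn


def report_as_xml(conn: sqlite3.Connection) -> str:
    """Group rows by record and emit one <record> element per record."""
    root = ET.Element("records")
    current_id, current = None, None
    rows = conn.execute("SELECT record_id, element, value FROM metadata ORDER BY record_id, seq")
    for record_id, element, value in rows:
        if record_id != current_id:
            current = ET.SubElement(root, "record", id=str(record_id))
            current_id = record_id
        ET.SubElement(current, element).text = value
    return ET.tostring(root, encoding="unicode")


print(report_as_xml(build_demo_db()))
```

Other metadata syntaxes could be produced from the same query by swapping the serializer, which is the "report to other syntaxes" role described above.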

Last Step: Data Retrieval
- The data storage, access and delivery architecture should be open, standards-based, and hardware- and software-independent. It should provide users across platforms with a common, consistent interface and underlying storage structure for efficient retrieval, display, storage and use of digital information.
- The data architecture should support a well-defined, widely available security system that validates the authenticity of users and provides data for a variety of uses according to a scalable authorization hierarchy.
- The data architecture should support data as objects for scalable, extensible access, with sophisticated and flexible support for object relationships. This is particularly important for different physical instantiations of identical data, e.g., a digital video object as D1, MPEG-1, QuickTime, etc.
- CORBA (Common Object Request Broker Architecture): an emerging architecture for open distributed object computing, intended to provide transparent access to applications and databases regardless of the hardware and software infrastructure at each end of the transaction.

Putting It All Together: A Digital Archive Architecture
Reference Model for Open Archival Information Systems (OAIS), developed by a US ISO archiving group under ISO TC20/SC13 and the Consultative Committee for Space Data Systems (CCSDS). The model has recently been released for formal ISO and CCSDS review. An electronic version of the OAIS Reference Model can be found at http://www.ccsds.org/RP9905/RP9905.html

[Figure: Reference Model for Open Archival Information Systems (OAIS), external data flow diagram.]