Title: OAI Overview
1. OAI Overview
- DLESE OAI Workshop
- April 29-30, 2002
- John Weatherley (jweather_at_ucar.edu)
2. Workshop Schedule
- Day 1
  - Morning
    - Overview of OAI
    - Look at OAI tools and resources
  - Afternoon
    - DLESE OAI software installation, configuration, and setup
- Day 2
  - Morning
    - Overview of NSDL and DLESE interoperability architecture
    - NSDL metadata overview
    - Metadata and OAI
3. Resources
- Workshop presentation slides and links to tools and other OAI resources are located at http://oai.dlese.org
4. What are DLESE and NSDL?
- DLESE: Digital Library for Earth System Education
  - Provides access to digitally accessible resources for learning about the Earth system
- NSDL: National Science (STEM) Digital Library
  - Network of scholarly and educational digital libraries related to science (DLESE will be part of this network)
5. Section 1: What is the OAI?
- What is the Open Archives Initiative (OAI)?
  - Organization dedicated to solving problems of digital library interoperability by defining simple protocols and standards
  - Grew out of the e-prints (arXiv) community at Los Alamos
- What is the OAI Protocol for Metadata Harvesting (OAI-PMH)?
  - Protocol to transfer metadata from a source archive to a destination archive
- How is the OAI-PMH being used by the NSDL and DLESE?
  - The OAI-PMH has been adopted as a primary means of gathering and sharing metadata among contributors
  - Also used to facilitate internal management of metadata stores
6. What is Metadata?
- Data refers to digital objects, e.g. the resources themselves
- Metadata is data about data, e.g. a description of a resource, not the resource itself
- OAI is used to transmit metadata
7. Section 2: Definitions / Concepts
- Basic Principles
  - Harvesting vs. Federation
  - Data Providers vs. Service Providers
- Underlying Technology
  - HTTP and XML
  - XML Namespaces and Schema
- Protocol Policies and Conventions
  - Basic Policies
  - Sets
8. Harvesting vs. Federation
- Competing approaches to interoperability
- Federation is when services such as searching are run remotely
- Harvesting is when metadata is transferred from remote sources to the destination where the services are located
- Federation requires more effort at the remote site but is easier for the local system
- Harvesting requires less effort at the remote site; services are provided by the local system
- OAI uses the harvesting model
9. Data Providers vs. Service Providers
- Data Providers are entities that possess metadata and are willing to share it with others (e.g. collection builders)
- Service Providers are entities that harvest metadata from Data Providers in order to provide higher-level services to users (e.g. searching, browsing, recommender systems). The NSDL and DLESE are examples.
10. Features of the OAI Approach
- Lightweight: low overhead for Data Providers
- Protocol is relatively simple to implement
- Many plug-and-play tools publicly available
- Transports any metadata framework that can be made available in XML form (details to come)
- Details of searching, browsing, annotation and other advanced services are handled by the Service Provider
11. Metadata Harvesting Framework
[Diagram: Data Providers (collection builders) are connected to the Service Provider (DLESE, NSDL) by the OAI protocol (over HTTP). The Service Provider holds the Harvested Records and serves the Library User.]
1. Service Provider polls periodically for new records
2. New records are downloaded and cached by the Service Provider
3. Service Provider offers searching, browsing, and other services over the data
12. HTTP and XML
- The OAI-PMH is an almost stateless request/response protocol
- Requests and responses are sent via the HTTP protocol
- Requests are encoded as GET/POST operations
- Responses are well-formed XML documents
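As a rough illustration of how a request can be encoded, the sketch below builds the same verb and arguments once as a GET URL and once as a form-encoded POST body. The base URL and record identifier are taken from the sample URLs in this deck; the helper name `encode_request` is made up for this example and is not part of any OAI tool.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL taken from the sample URLs in these slides.
BASE_URL = "http://oai.dlese.org/provider"


def encode_request(verb, **arguments):
    """Return (get_url, post_body) for one request (hypothetical helper)."""
    # Every request carries a verb plus zero or more keyword arguments.
    params = {"verb": verb, **arguments}
    # urlencode percent-encodes reserved characters such as ':' (-> %3A).
    query = urlencode(params)
    get_url = f"{BASE_URL}?{query}"
    # For POST, the same key=value pairs travel in the request body.
    post_body = query.encode("ascii")
    return get_url, post_body


get_url, post_body = encode_request(
    "GetRecord",
    identifier="dlese:DLESE-000-000-000-002",
    metadataPrefix="dlese_ims",
)
print(get_url)

# Decoding the query string recovers the original argument values.
decoded = parse_qs(urlsplit(get_url).query)
print(decoded["identifier"])
```

Nothing here touches the network; sending the request would be a separate step, for example with `urllib.request.urlopen`.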
13. Well-formed and Valid XML
- Correct
    <car>
      <make>Dodge</make>
      <model>Spirit</model>
      <year>1994</year>
      <owner>
        <name>you</name>
        <plate>CO</plate>
      </owner>
    </car>
- Incorrect
    <car>
      <make>Dodge</make>
      <model>Spirit</model>
      <year>1994
      <owner>
        <plate>CO</plate>
        <name>you</name>
    </car>
    </owner>
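One way to see the difference between the two car documents is to hand them to an XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `is_well_formed` is chosen for this example. It checks well-formedness only (tags closed and properly nested), not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET

# The "Correct" document from the slide: every tag is closed and properly nested.
CORRECT = """<car>
  <make>Dodge</make>
  <model>Spirit</model>
  <year>1994</year>
  <owner>
    <name>you</name>
    <plate>CO</plate>
  </owner>
</car>"""

# The "Incorrect" document: <year> is never closed and </car> precedes </owner>.
INCORRECT = """<car>
  <make>Dodge</make>
  <model>Spirit</model>
  <year>1994
  <owner>
    <plate>CO</plate>
    <name>you</name>
</car>
</owner>"""


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(CORRECT))
print(is_well_formed(INCORRECT))
```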
14. DTDs, Schemas, and Namespaces
- DTDs (Document Type Definitions)
  - Describe the elements of XML instance documents
  - Not themselves written as well-formed XML
  - Some data-typing
  - Namespaces are harder to deal with
- Schemas
  - Describe the elements of XML instance documents
  - Written as well-formed XML
  - Strong data-typing
  - Namespaces are easier to deal with
- Namespace
  - Collection of related element names identified by a name label (e.g. dc)
15. XML Namespaces and Schema
- Consistency and data quality are ensured by using XML Schema descriptions for each possible response
- XML Namespaces are used where necessary to clearly define which parts of the responses are actual metadata and which support the OAI-PMH
- Example
  - http://www.cstc.org/cgi-bin/OAI/CSTC.pl?verb=GetRecord&identifier=oai%3ACSTC%3A103&metadataPrefix=oai_dc
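To make the role of namespaces concrete, the following sketch parses a small hand-written response and separates the protocol envelope from the Dublin Core metadata by namespace. The response text is an assumption, not output from the server above: it uses the OAI-PMH 2.0 envelope namespace, which may differ from the protocol version these slides describe, and the datestamp, title, and creator values are placeholders.

```python
import xml.etree.ElementTree as ET

# Hand-written sample (assumed shape and placeholder values, not a real server reply).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:CSTC:103</identifier>
        <datestamp>2002-04-29</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title</dc:title>
          <dc:creator>Example creator</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""

# Prefixes used only inside this script; the URIs are what actually matter.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

root = ET.fromstring(SAMPLE_RESPONSE)

# Protocol part: the record identifier lives in the OAI namespace.
identifier = root.findtext(".//oai:header/oai:identifier", namespaces=NS)

# Metadata part: the title lives in the Dublin Core namespace.
titles = [el.text for el in root.iterfind(".//dc:title", NS)]

# ElementTree stores tags as "{namespace-uri}localname"; keep the OAI ones.
oai_prefix = "{" + NS["oai"] + "}"
protocol_tags = [el.tag[len(oai_prefix):] for el in root.iter()
                 if el.tag.startswith(oai_prefix)]

print(identifier, titles, protocol_tags)
```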
16. Basic OAI Policies and Conventions
- Each metadata record from a given Data Provider must have a unique ID (the OAI ID is not necessarily the same as the record ID)
- Each metadata record must be persistent so that Service Providers can always refer back to the source
- Each record must have a date stamp indicating its creation / modification date
- Dates provide a mechanism for incremental and continuous transfer of metadata by only requesting records that have changed since the previous harvest
- Flow Control: Resumption Tokens can be used to return partial results; the client is issued a token which may be presented to the server to receive more results
- Multiple formats of metadata are allowed
  - Examples: Dublin Core, DLESE IMS
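The date stamp and resumption token policies can be sketched together with a small in-memory stand-in for a Data Provider. Everything below is hypothetical: the records, the page size, and the token format (`from-date|offset`) are invented for illustration, since the content of a token is up to the server and opaque to the harvester.

```python
from datetime import date

# Hypothetical records held by a Data Provider; a real one answers over HTTP.
RECORDS = [
    {"id": "oai:example:1", "datestamp": date(2002, 1, 10)},
    {"id": "oai:example:2", "datestamp": date(2002, 3, 5)},
    {"id": "oai:example:3", "datestamp": date(2002, 4, 20)},
    {"id": "oai:example:4", "datestamp": date(2002, 4, 28)},
]
PAGE_SIZE = 2  # assumed number of records returned per response


def list_records(from_date=None, resumption_token=None):
    """Return (page, next_token); next_token is None on the last page."""
    if resumption_token is not None:
        # The token alone carries the state needed to continue the list.
        from_iso, offset_text = resumption_token.split("|")
        from_date = date.fromisoformat(from_iso)
        offset = int(offset_text)
    else:
        from_date = from_date or date.min
        offset = 0
    matching = [r for r in RECORDS if r["datestamp"] >= from_date]
    page = matching[offset:offset + PAGE_SIZE]
    next_offset = offset + PAGE_SIZE
    if next_offset < len(matching):
        return page, f"{from_date.isoformat()}|{next_offset}"
    return page, None


def harvest(from_date=None):
    """Collect every page, following resumption tokens until none is issued."""
    harvested = []
    page, token = list_records(from_date=from_date)
    harvested.extend(page)
    while token is not None:
        page, token = list_records(resumption_token=token)
        harvested.extend(page)
    return harvested


full = harvest()                                   # first harvest: everything
incremental = harvest(from_date=date(2002, 4, 1))  # later: only changed records
print([r["id"] for r in incremental])
```

A harvester that remembers the date of its previous run only needs the second call on later runs, which is the incremental transfer the policy describes.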
17. Sets
- OAI-PMH mechanism to allow for harvesting of sub-collections
- Semantics for sets are defined outside of the protocol
- Sets are defined by conventions established between data and service providers
- Example sets within DLESE might be DWEL, COMET, LDEO, etc.
- Example sets within the NSDL might be DLESE, DLESE:DWEL, DLESE:COMET, DLESE:LDEO, etc.
- Sets can be established that enable querying (e.g. by topic, author name, subject area, etc.)
- Example: The Open Digital Library (Suleman, 2001)
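A minimal sketch of selective harvesting by set follows. It assumes hierarchical set names are written with a colon separator (for example `DLESE:DWEL` as a sub-set of `DLESE`), and that asking for a parent set also returns records in its sub-sets. The record ids and the `OTHER` set are invented for the example.

```python
# Hypothetical record-to-set assignments at a Service Provider such as the NSDL.
RECORD_SETS = {
    "rec-1": "DLESE:DWEL",
    "rec-2": "DLESE:COMET",
    "rec-3": "DLESE:LDEO",
    "rec-4": "OTHER",
}


def in_set(record_set, requested_set):
    """True if record_set is requested_set itself or one of its sub-sets."""
    # Appending ':' avoids matching unrelated names that merely share a prefix.
    return record_set == requested_set or record_set.startswith(requested_set + ":")


def select(requested_set):
    """Return the ids of all records that fall inside the requested set."""
    return sorted(rid for rid, name in RECORD_SETS.items()
                  if in_set(name, requested_set))


print(select("DLESE"))       # parent set: all three DLESE sub-collections
print(select("DLESE:DWEL"))  # one sub-collection only
```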
18. Section 3: Requirements to be a Data Provider
- Source of metadata
- Human or automated resource catalogers
- Metadata mappings
- Crosswalks from native formats to DC or other formats
- Server technology
- Handled by the OAI software
- Datestamps
- Deletions
- Unique identifiers
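A crosswalk can be as simple as a lookup table from native field names to Dublin Core element names. The sketch below is illustrative only: the native field names and the sample record are invented, and real crosswalks usually also need to transform values, not just rename fields.

```python
# Hypothetical native field names mapped to unqualified Dublin Core elements.
CROSSWALK = {
    "resource_title": "title",
    "author": "creator",
    "summary": "description",
    "topic": "subject",
}


def to_dublin_core(native_record):
    """Map a native record (a dict) to Dublin Core elements with lists of values."""
    dc = {}
    for field, value in native_record.items():
        element = CROSSWALK.get(field)
        if element is None:
            continue  # no equivalent in this mapping, so the field is dropped
        # Dublin Core elements are repeatable, so each one collects a list.
        dc.setdefault(element, []).append(value)
    return dc


native = {
    "resource_title": "Example resource",
    "author": "Example author",
    "internal_code": "X-17",  # has no Dublin Core counterpart in the mapping
}
print(to_dublin_core(native))
```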
19. Section 4: The OAI-PMH
- Service Requests
  - Identify
  - ListMetadataFormats
  - ListSets
  - GetRecord
  - ListIdentifiers
  - ListRecords
- Date Ranges
- Resumption Tokens
20. Identify
- Purpose
  - Return general information about the archive and its policies
- Parameters
  - None
- Sample URL
  - http://oai.dlese.org/provider?verb=Identify
21. ListMetadataFormats
- Purpose
  - List metadata formats supported by the archive as well as their schema locations and namespaces
- Parameters
  - Identifier for a specific record (O, optional)
- Sample URL
  - http://oai.dlese.org/provider?verb=ListMetadataFormats
22. ListSets
- Purpose
  - Provide a hierarchical listing of sets in which records may be organized
- Parameters
  - None
- Sample URL
  - http://oai.dlese.org/provider?verb=ListSets
23. GetRecord
- Purpose
  - Returns the metadata for a single identifier in the form of an OAI record
- Parameters
  - identifier: id for the record (R, required)
  - metadataPrefix: metadata format (R, required)
- Sample URL
  - http://oai.dlese.org/provider?verb=GetRecord&identifier=dlese%3ADLESE-000-000-000-002&metadataPrefix=dlese_ims
24. ListIdentifiers
- Purpose
  - List all unique identifiers corresponding to the records in the repository
- Parameters
  - from: start date (O, optional)
  - until: end date (O, optional)
  - resumptionToken: flow control mechanism (X, exclusive)
- Sample URL
  - http://oai.dlese.org/provider?verb=ListIdentifiers
25. ListRecords
- Purpose
  - Retrieves metadata for multiple records
- Parameters
  - from: start date (O, optional)
  - until: end date (O, optional)
  - resumptionToken: flow control mechanism (X, exclusive)
  - set: set to harvest from (O, optional)
  - metadataPrefix: metadata format (R, required)
- Sample URL
  - http://oai.dlese.org/provider?verb=ListRecords&metadataPrefix=dlese_ims
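The parameter tables on the six request slides can be collected into one lookup and used to check a request before it is sent. This is a sketch under stated assumptions: the R / O / X markers are read as required, optional, and exclusive (an exclusive argument must be the only one besides the verb), and the table mirrors only what these slides list, which may not match every protocol version. The function name `check_request` is made up for the example.

```python
# Argument tables as listed on the slides: R = required, O = optional, X = exclusive.
VERBS = {
    "Identify": {},
    "ListMetadataFormats": {"identifier": "O"},
    "ListSets": {},
    "GetRecord": {"identifier": "R", "metadataPrefix": "R"},
    "ListIdentifiers": {"from": "O", "until": "O", "resumptionToken": "X"},
    "ListRecords": {"from": "O", "until": "O", "set": "O",
                    "metadataPrefix": "R", "resumptionToken": "X"},
}


def check_request(verb, arguments):
    """Return a list of problems; an empty list means the request looks acceptable."""
    if verb not in VERBS:
        return [f"unknown verb: {verb}"]
    spec = VERBS[verb]
    problems = [f"unexpected argument: {name}"
                for name in arguments if name not in spec]
    exclusive = [name for name in arguments if spec.get(name) == "X"]
    if exclusive:
        # An exclusive argument replaces all others, including required ones.
        if len(arguments) > 1:
            problems.append(f"{exclusive[0]} must be the only argument")
        return problems
    problems += [f"missing required argument: {name}"
                 for name, kind in spec.items()
                 if kind == "R" and name not in arguments]
    return problems


print(check_request("ListRecords", {"metadataPrefix": "dlese_ims"}))
print(check_request("ListRecords", {}))
```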
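26. DLESE Architecture
[Diagram: Library Users reach the DLESE Portal, which offers Services (e.g. What's New) and Search / Discovery over the Metadata Repository. Other labeled components: Resources, Collections, Direct Entry, and NSDL. Three connections in the diagram are labeled OAI.]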
27. References
- Building Interoperable Digital Libraries: A Practical Guide to Creating Open Archives, Hussein Suleman (hussein_at_vt.edu), JCDL 2001 Tutorial.
- A Framework for Building Open Digital Libraries, Hussein Suleman and Edward A. Fox, in D-Lib Magazine, December 2001. http://www.dlib.org/dlib/december01/suleman/12suleman.html
- The Open Archives Initiative: http://www.openarchives.org