Title: OAI Protocol for Metadata Harvesting
1 OAI Protocol for Metadata Harvesting
Tim Brody
Intelligence, Agents, Multimedia Group, University of Southampton
OpCit http://opcit.eprints.org/ www.ecs.soton.ac.uk
BCS Metadata Meeting, London 29th May 2002
(Many slides borrowed from Michael L. Nelson)
2 OAI 2.0
- Public, stable version not released yet (but very close)
- Beta released mid-May
- Public release scheduled 1st June
- 2.0 implementations in the pipeline
- British Library, Cornell Univ, Ex Libris, my.OAI, Humboldt Univ, InQuirion Pty Ltd, Library of Congress, NASA, OCLC, Old Dominion Univ, U. of Illinois, U. of Southampton, UCLA, Johns Hopkins U., Indiana U., NYU, UKOLN, Virginia Tech
3 Open Archives Initiative
4 Metadata Harvesting
- Move away from distributed searching
- Extract metadata from various sources
- Build services on local copies of metadata
- Resources remain at remote repositories
[Figure: a user query (e.g. "search for cfd applications") runs against a local copy of metadata, where all searching, browsing, etc. is performed. The metadata is harvested offline from each node; each node is independently maintained and can still support direct user interaction.]
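The harvesting model on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the repositories are in-memory dictionaries rather than remote servers, and the names `REPOSITORIES`, `harvest` and `search`, along with the sample titles, are hypothetical and not part of OAI-PMH.

```python
# Hypothetical stand-ins for remote repositories: each maps an item
# identifier to a metadata record (here reduced to a title).
REPOSITORIES = {
    "repo-a": {
        "oai:a:1": "CFD applications in aerospace",
        "oai:a:2": "Digital library interoperability",
    },
    "repo-b": {
        "oai:b:1": "Parallel CFD solvers",
    },
}


def harvest(repositories):
    """Copy metadata from every repository into one local index (offline step)."""
    local_index = {}
    for repo_name, records in repositories.items():
        for identifier, title in records.items():
            local_index[identifier] = {"source": repo_name, "title": title}
    return local_index


def search(local_index, term):
    """Search only the local copy; the resources stay at the remote repositories."""
    term = term.lower()
    return sorted(
        identifier
        for identifier, metadata in local_index.items()
        if term in metadata["title"].lower()
    )


index = harvest(REPOSITORIES)
hits = search(index, "cfd")
print(hits)
```

The search never contacts a repository, which is the point of the model: the service provider pays the cost of harvesting once and answers queries from its own copy.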
5 Metadata Harvesting
- Repositories (archives etc.): low implementation cost
- Services: higher implementation cost
- Similar to web search model
- DP9 gateway makes it exactly the same
6 Santa Fe convention
OAI-PMH v.1.0/1.1
OAI-PMH v.2.0
7 OAI-PMH v.2.0 06/2002
- Goal: recurrent exchange of metadata about resources between systems
- Input:
- OAI-PMH v.1.0 01/01 09/02
- feedback on the OAI-implementers mailing list
- deliberations by OAI-tech 09/01 -
- alpha test group of OAI-PMH v.2.0 03/02 -
8 OAI-PMH v.2.0 06/2002
- low-barrier interoperability specification
- metadata harvesting model: data provider / service provider
- metadata about resources
- autonomous protocol
- distinction between protocol and periphery
- community-specific extensions
- HTTP based
- XML responses
- unqualified Dublin Core
- stable (1.0 characterized as experimental)
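Because the protocol is HTTP based, a request is just a base URL plus a `verb` and its arguments as query parameters. The following is a minimal sketch using only the Python standard library; the helper name `build_request` is hypothetical, the base URL and identifier are taken from the arXiv example later in these slides, and `oai_dc` is the metadata prefix for unqualified Dublin Core.

```python
from urllib.parse import urlencode


def build_request(base_url, verb, **arguments):
    """Return an OAI-PMH GET request URL for the given verb and arguments."""
    params = {"verb": verb}
    # Skip arguments that were not supplied (passed as None).
    params.update({k: v for k, v in arguments.items() if v is not None})
    return base_url + "?" + urlencode(params)


url = build_request(
    "http://arXiv.org/oai2",
    "GetRecord",
    identifier="oai:arXiv:cs/0112017",
    metadataPrefix="oai_dc",
)
print(url)
```

`urlencode` percent-encodes the reserved characters in the identifier, so the colon and slash do not clash with the URL syntax.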
9 OAI Data Model: Resources / Items / Records
- item: identifier
- record: identifier + metadata format + datestamp
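One way to picture the item/record distinction is as two small data types: an item carries only an identifier, while each record is the item's metadata in one particular format at one datestamp. The sketch below is an assumption-laden illustration, not a structure defined by the protocol; the class names, the `add_format` helper and the second prefix `oai_marc` are chosen here for the example only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    identifier: str       # identifier of the item the record belongs to
    metadata_prefix: str  # metadata format, e.g. "oai_dc"
    datestamp: str        # date of creation or last modification


@dataclass
class Item:
    identifier: str
    # One record per metadata format: metadata_prefix -> Record
    records: dict = field(default_factory=dict)

    def add_format(self, metadata_prefix, datestamp):
        """Attach a record for one metadata format to this item."""
        record = Record(self.identifier, metadata_prefix, datestamp)
        self.records[metadata_prefix] = record
        return record


item = Item("oai:arXiv:cs/0112017")
item.add_format("oai_dc", "2001-12-14")
item.add_format("oai_marc", "2001-12-14")
print(sorted(item.records))
```

Both records share the item's identifier; what tells them apart is the metadata format.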
10 Overview of OAI Verbs
- archival metadata: Identify, ListMetadataFormats, ListSets
- harvesting verbs: ListIdentifiers, ListRecords, GetRecord
- most verbs take arguments: dates, sets, ids, metadata formats and resumption token (for flow control)
11 Identify
1.1
- Arguments
- none
- Errors
- none
2.0
- Arguments
- none
- Errors
- badArgument
12 ListMetadataFormats
1.1
- Arguments
- identifier (OPTIONAL)
- Errors
- id does not exist
2.0
- Arguments
- identifier (OPTIONAL)
- Errors
- badArgument
- noMetadataFormats
- idDoesNotExist
13 ListSets
1.1
- Arguments
- resumptionToken (EXCLUSIVE)
- Errors
- no set hierarchy
2.0
- Arguments
- resumptionToken (EXCLUSIVE)
- Errors
- badArgument
- badResumptionToken
- noSetHierarchy
14 ListIdentifiers
1.1
- Arguments
- from (OPTIONAL)
- until (OPTIONAL)
- set (OPTIONAL)
- resumptionToken (EXCLUSIVE)
- Errors
- no records match
2.0
- Arguments
- from (OPTIONAL)
- until (OPTIONAL)
- set (OPTIONAL)
- resumptionToken (EXCLUSIVE)
- metadataPrefix (REQUIRED)
- Errors
- badArgument
- cannotDisseminateFormat
- badResumptionToken
- noSetHierarchy
- noRecordsMatch
15 ListRecords
1.1
- Arguments
- from (OPTIONAL)
- until (OPTIONAL)
- set (OPTIONAL)
- resumptionToken (EXCLUSIVE)
- metadataPrefix (REQUIRED)
- Errors
- no records match
- metadata format cannot be disseminated
2.0
- Arguments
- from (OPTIONAL)
- until (OPTIONAL)
- set (OPTIONAL)
- resumptionToken (EXCLUSIVE)
- metadataPrefix (REQUIRED)
- Errors
- noRecordsMatch
- cannotDisseminateFormat
- badResumptionToken
- noSetHierarchy
- badArgument
16 GetRecord
1.1
- Arguments
- identifier (REQUIRED)
- metadataPrefix (REQUIRED)
- Errors
- id does not exist
- metadata format cannot be disseminated
2.0
- Arguments
- identifier (REQUIRED)
- metadataPrefix (REQUIRED)
- Errors
- badArgument
- cannotDisseminateFormat
- idDoesNotExist
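The 2.0 argument lists on the six verb slides above lend themselves to a small lookup table. The sketch below shows how a repository might decide between `badVerb` and `badArgument`; the table layout and the function name `check_request` are assumptions made for illustration, and the check covers only which argument names are present, not whether their values are valid.

```python
# Arguments shared by the two list-harvesting verbs in 2.0.
LIST_ARGS = {
    "required": {"metadataPrefix"},
    "optional": {"from", "until", "set"},
    "exclusive": "resumptionToken",
}

# Arguments per verb, as listed in the 2.0 columns of the slides.
VERBS = {
    "Identify": {"required": set(), "optional": set(), "exclusive": None},
    "ListMetadataFormats": {"required": set(), "optional": {"identifier"},
                            "exclusive": None},
    "ListSets": {"required": set(), "optional": set(),
                 "exclusive": "resumptionToken"},
    "ListIdentifiers": LIST_ARGS,
    "ListRecords": LIST_ARGS,
    "GetRecord": {"required": {"identifier", "metadataPrefix"},
                  "optional": set(), "exclusive": None},
}


def check_request(verb, arguments):
    """Return None if the request is well formed, else an error code."""
    spec = VERBS.get(verb)
    if spec is None:
        return "badVerb"
    names = set(arguments)
    if spec["exclusive"] in names:
        # An exclusive argument must be the only argument besides the verb.
        return None if names == {spec["exclusive"]} else "badArgument"
    if not spec["required"] <= names:
        return "badArgument"  # a required argument is missing
    if names - spec["required"] - spec["optional"]:
        return "badArgument"  # an argument the verb does not accept
    return None


print(check_request("GetRecord", {"identifier": "oai:arXiv:cs/0112017"}))
```

The printed request lacks `metadataPrefix`, so the check reports `badArgument`.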
17 response no errors
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord">http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata>
        ..
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
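A harvester mostly needs the header fields of such a response. The following sketch parses a response shaped like the one on this slide with the standard library's `xml.etree.ElementTree`. It rests on two assumptions: the elements are un-namespaced, as shown on the slide (a real 2.0 response declares an XML namespace, which the lookups would then have to include), and the elided metadata is represented by an empty element. The function name `parse_header` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample response modelled on the slide; the metadata body is left empty
# because the slide elides it.
RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord">http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata/>
    </record>
  </GetRecord>
</OAI-PMH>"""


def parse_header(xml_bytes):
    """Extract identifier, datestamp and set membership from a GetRecord response."""
    root = ET.fromstring(xml_bytes)
    header = root.find("./GetRecord/record/header")
    return {
        "identifier": header.findtext("identifier"),
        "datestamp": header.findtext("datestamp"),
        "sets": [s.text for s in header.findall("setSpec")],
    }


header = parse_header(RESPONSE.encode("utf-8"))
print(header)
```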
18 response with error
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>
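Since 2.0 reports protocol errors inside a normal XML response, a client has to look for an `error` element before using the payload. A minimal sketch, assuming the un-namespaced layout shown on this slide; the exception class `OAIError` and the helper `raise_on_error` are hypothetical names introduced for the example.

```python
import xml.etree.ElementTree as ET

ERROR_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>"""


class OAIError(Exception):
    """Raised when a response carries an OAI-PMH error element."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def raise_on_error(xml_bytes):
    """Return the parsed response, or raise OAIError if it reports an error."""
    root = ET.fromstring(xml_bytes)
    error = root.find("error")
    if error is not None:
        raise OAIError(error.get("code"), error.text or "")
    return root


try:
    raise_on_error(ERROR_RESPONSE.encode("utf-8"))
    caught = None
except OAIError as exc:
    caught = exc
print(caught)
```

Turning the error code into an exception attribute lets calling code branch on values such as `badResumptionToken` or `noRecordsMatch` without re-parsing the XML.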
19 resumptionToken Flow-Control
- Idempotency of resumptionToken: return same incomplete list when rT is re-issued
- while no changes occur in the repo: strict
- while changes occur in the repo: all items with unchanged datestamp
- new attributes for the resumptionToken
- expirationDate
- completeListSize
- cursor
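Flow control with resumptionTokens amounts to a loop on the harvester side: request a list, keep the incomplete list returned, and re-request with the token until none is returned. The sketch below assumes a hypothetical callable `fetch_page(token)` that returns a pair of (items, next token) and stands in for the HTTP request and XML parsing; the fake repository `PAGES` is likewise invented for the example.

```python
def harvest_all(fetch_page):
    """Collect a complete list by following resumptionTokens.

    fetch_page(token) is a hypothetical callable returning (items, next_token).
    token is None for the first request; next_token is None on the last page.
    """
    items, token = [], None
    while True:
        page, token = fetch_page(token)
        items.extend(page)
        if token is None:
            return items


# Fake repository that serves one complete list as three incomplete lists.
PAGES = {
    None: (["id1", "id2"], "t1"),
    "t1": (["id3", "id4"], "t2"),
    "t2": (["id5"], None),
}


def fake_fetch(token):
    return PAGES[token]


all_ids = harvest_all(fake_fetch)
print(all_ids)
```

Because re-issuing a token returns the same incomplete list, a harvester that loses a response can repeat the last request instead of starting over.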
20 Adoption
- evolution
- from talking about OAI-PMH
- to talking about projects that use OAI-PMH
- to talking about projects and failing to mention they use OAI-PMH
- => OAI-PMH becomes part of the infrastructure
21 Data Providers (a.k.a. repositories)
- 49 registered repositories 11/2001
- 65 registered repositories 03/2002
- 77 registered repositories 05/2002
- 5 million records
- many unregistered repositories
- private implementations (e.g. RDN)
22 Service Providers
- Arc: cross-searching of registered repositories, http://arc.cs.odu.edu
- CiteBase: research literature search and citation ranking, http://citebase.eprints.org
- OLAC: cross-searching of Language Archive Community repositories, http://www.language-archives.org/index.html
23 Service Providers
- Scirus: scientific search engine (Elsevier), http://www.scirus.com
- my.OAI: user-tailorable cross-searching of registered repositories (FS Consulting, Inc.), http://www.myoai.com
- Growing interest from web search engines
24 OAI-PMH tools
- Repository Explorer: interactive exploration of repositories (Virginia Tech), http://www.purl.org/NET/oai_explorer
- eprints.org: generic OAI-PMH compliant repository software (U of Southampton), http://www.eprints.org
- ALCME: repository and harvester software (OCLC), http://alcme.oclc.org/index.html
- APIs, other tools at www.openarchives.org
25 http://www.openarchives.org/ openarchives@openarchives.org