Title: OAI from the needle box
1OAI from the needle box
Thomas Krichel
Palmer School of Library and Information Science
Long Island University
With apologies to Carl Lagoze
Humboldt Universität Berlin, March 20, 2002
2Where I come from...
- Trained economist
- Early (1991) visionary of free online scholarship
- Creator of NetEc in 1993
- Principal founder of RePEc in 1997
- Largest distributed academic digital library in the world
- Collection that is open for
- Contribution
- Usage
- Grown to over 200 archives, over 10 partly
interoperable user services
3Metadata collection process
- Metadata is expensive to collect.
- Free online scholarship requires academic
self-documentation
- Building free metadata collection is difficult
- no established business model
- no established funding channels
- Only a collaborative effort will succeed.
4The example of eprint servers
- attractive building block for the transformation
of scholarly communication
- but isolated efforts do not make for a scholarly
communication system
- need to federate archives
- need to interoperate with other scholarly
communication components
5Example e-print accessibility
6Example e-print accessibility
7metadata harvesting
metadata
e-print
8metadata harvesting
metadata
e-print
9other examples
- within the area of scholarly communication
- already implemented in RePEc
- Sharing of log data between service providers
- Provision of non-document data for document data
providers
- personal data
- institutional data
10core concepts in OAI 1.1
- low-barrier interoperability
- data-provider / service-provider model
- metadata harvesting model
OAI 1.1 protocol
HTTP based
Dublin Core
- parallel metadata formats
Community specific
11harvester / repository
12OAI protocol requests
service provider
data provider
- Supporting protocol requests
- Identify
- ListMetadataFormats
- ListSets
- Harvesting protocol requests
- ListRecords
- ListIdentifiers
- GetRecord
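The two groups of requests above can be written down as a small lookup. This is only a sketch: the verb names come from the slide, while the function name `classify_verb` and the group labels are hypothetical and not part of the protocol.

```python
# Verb names as listed on the slide; the grouping labels are illustrative.
SUPPORTING_VERBS = ("Identify", "ListMetadataFormats", "ListSets")
HARVESTING_VERBS = ("ListRecords", "ListIdentifiers", "GetRecord")


def classify_verb(verb):
    """Return 'supporting' or 'harvesting' for a known verb (hypothetical helper)."""
    if verb in SUPPORTING_VERBS:
        return "supporting"
    if verb in HARVESTING_VERBS:
        return "harvesting"
    raise ValueError(f"unknown verb: {verb!r}")


for name in SUPPORTING_VERBS + HARVESTING_VERBS:
    print(name, "->", classify_verb(name))
```

A service provider would typically issue the supporting requests first, to learn what a data provider offers, and only then start harvesting.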
13HTTP encoding - requests
BASE-URL -----------> an.oa.org/OAI-script
keyword arguments --> verb=ListIdentifiers&set=S1
GET
http://an.oa.org/OAI-script?verb=ListIdentifiers&set=S1
POST
POST http://an.oa.org/OAI-script HTTP/1.0
Content-Length: 78
Content-Type: application/x-www-form-urlencoded

verb=ListIdentifiers&set=S1
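The request encoding above can be sketched with the Python standard library. The base URL and the arguments are the example values from the slide; the helper names `build_get_url` and `build_post_body` are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

BASE_URL = "http://an.oa.org/OAI-script"  # example base URL from the slide


def build_get_url(base_url, **arguments):
    """GET form: base URL, '?', then the URL-encoded keyword arguments."""
    return base_url + "?" + urlencode(arguments)


def build_post_body(**arguments):
    """POST form: the same keyword arguments, sent as the request body
    with Content-Type application/x-www-form-urlencoded."""
    return urlencode(arguments).encode("ascii")


get_url = build_get_url(BASE_URL, verb="ListIdentifiers", set="S1")
post_body = build_post_body(verb="ListIdentifiers", set="S1")

print(get_url)
print("Content-Length:", len(post_body))
print(post_body.decode("ascii"))
```

Both forms carry the same keyword arguments; only their position in the HTTP message differs.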
14HTTP encoding - responses
<?xml version="1.0" encoding="UTF-8"?>
<GetRecord xmlns="http://oai.namespace.uri"
    xmlns:xsi="http://w3.namespace.uri"
    xsi:schemaLocation="http://oai.namespace.uri http://oai.schemaURL">
  <responseDate>2000-19-01T19:30:30-04:00</responseDate>
  <requestURL>http://an.oa.org/OAI-script?verb=GetRecord&amp;identifier=oai%3AarXiv%3A0001&amp;metadataPrefix=oai_dc</requestURL>
  <record> record contents </record>
  additional records
</GetRecord>
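A harvester has to read such a response back. The sketch below parses a cut-down response with `xml.etree.ElementTree`. The namespace URI is the placeholder used on the slide, the `responseDate` value is an illustrative one chosen for this example, and `summarize_response` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

OAI_NS = "http://oai.namespace.uri"  # placeholder namespace from the slide

# Cut-down sample response; the responseDate is illustrative only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<GetRecord xmlns="http://oai.namespace.uri">
  <responseDate>2002-03-20T19:30:30-04:00</responseDate>
  <requestURL>http://an.oa.org/OAI-script?verb=GetRecord&amp;identifier=oai%3AarXiv%3A0001&amp;metadataPrefix=oai_dc</requestURL>
  <record>record contents</record>
</GetRecord>"""


def summarize_response(xml_bytes):
    """Extract the verb, response date, echoed request and record count."""
    root = ET.fromstring(xml_bytes)
    ns = {"oai": OAI_NS}
    request_url = root.findtext("oai:requestURL", namespaces=ns)
    # Decode the echoed request back into its keyword arguments.
    arguments = parse_qs(urlsplit(request_url).query)
    return {
        "verb": root.tag.split("}", 1)[-1],  # strip the '{namespace}' prefix
        "responseDate": root.findtext("oai:responseDate", namespaces=ns),
        "arguments": {key: values[0] for key, values in arguments.items()},
        "records": len(root.findall("oai:record", ns)),
    }


summary = summarize_response(SAMPLE)
print(summary)
```

Note that the root element is named after the verb, and that the escaped `&amp;` and `%3A` sequences in `requestURL` decode back to the original arguments.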
15record
<record>
  <header>
    <identifier>oai:eg001</identifier>
    <datestamp>1999-01-01</datestamp>
  </header>
  <metadata>
    <dc xmlns="http://purl.org/dc">
      <title>My Example</title>
    </dc>
  </metadata>
  <about>
    <ea xmlns="http://www.arXiv.org/ea">
      <usage>No restrictions</usage>
    </ea>
  </about>
</record>
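The three parts of a record (header, metadata, about) can be pulled apart as follows. This is a sketch only: the XML mirrors the slide's example, the exact punctuation of the identifier value is an assumption, and `read_record` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as given on the slide.
NAMESPACES = {
    "dc": "http://purl.org/dc",
    "ea": "http://www.arXiv.org/ea",
}

# Example record; the identifier's punctuation is assumed, not confirmed.
RECORD = b"""<record>
  <header>
    <identifier>oai:eg001</identifier>
    <datestamp>1999-01-01</datestamp>
  </header>
  <metadata>
    <dc xmlns="http://purl.org/dc">
      <title>My Example</title>
    </dc>
  </metadata>
  <about>
    <ea xmlns="http://www.arXiv.org/ea">
      <usage>No restrictions</usage>
    </ea>
  </about>
</record>"""


def read_record(xml_bytes):
    """Return the header fields, the title and the usage statement."""
    record = ET.fromstring(xml_bytes)
    return {
        "identifier": record.findtext("header/identifier"),
        "datestamp": record.findtext("header/datestamp"),
        "title": record.findtext("metadata/dc:dc/dc:title", namespaces=NAMESPACES),
        "usage": record.findtext("about/ea:ea/ea:usage", namespaces=NAMESPACES),
    }


fields = read_record(RECORD)
print(fields)
```

The header is what selective harvesting works on; the metadata and about parts each live in their own namespace.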
16selective harvesting - datestamps
17selective harvesting - sets
S2
18Communication re OAI
- lists: subscribe via http://www.openarchives.org
- oai-general list
- oai-implementers list
- web: http://www.openarchives.org
- FAQ: http://www.openarchives.org/faq.htm
- mail: openarchives_at_openarchives.org
19revision of specifications
- Version 1.1: frozen specifications for 12-18
months
- stable for experimentation, not definitive
- minimize risk for early adopters
- maximize chances for future interoperability
across communities
The technical committee is working on the
definitive specifications. They will come
out on 2002-05-01.
20The technical committee
- Herbert Van de Sompel (LANL)
- Carl Lagoze (Cornell U)
- Thomas Krichel (Long Island U, RePEc)
- Jeff Young (OCLC)
- Tim Cole (U of Illinois at Urbana Champaign)
- Hussein Suleman (Virginia Tech)
- Simeon Warner (Cornell U, arXiv)
- Michael Nelson (NASA, NACA)
- Caroline Arms (Library of Congress)
- Muhammad Zubair (Old Dominion U, ARC)
- Steven Bird (U Penn, Open Language Archive Community)
- Robert Tansley (MIT, DSpace)
- Andy Powell (UKOLN, UK)
- Mogens Sandfær (DTV, Denmark)
- Thomas Severiens (Oldenburg U, Physnet)
- Thomas Baron (CERN)
- Les Carr (U of Southampton)
- Thomas Place (Tilburg U)
21Issues in front of the committee
- Error Handling
- SOAP
- Harvesting Granularity
- Mandatory DC
- Set Semantics and Collection Description
- XML Schema
- Result Set Filtering
- Flow Control, Result Set Cardinality, Response
Level Container
- Awareness Mechanisms
- Multiple Metadata Return and "Best" Metadata
Selection
- Machine Readable Rights Management
- From GetRecord to GetRecords
- Deduplication Issues
- idempotency of base-urls
- xml format for mini-archives
- response compression
22Thank you for your attention!
- Thomas Krichel
- Palmer School of Library and Information Science
- 720 Northern Boulevard
- Brookville NY 11548-1300
- USA
- http://openlib.org/home/krichel
- Krichel_at_openlib.org
23Error handling
- badArgument
- badGranularity
- badResumptionToken
- badVerb
- cannotDisseminateFormat
- idDoesNotExist
- noRecordsMatch
- noSetHierarchy
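A harvester could check responses against the codes listed above. The sketch below assumes that an error is reported as an `error` element carrying a `code` attribute, which is how OAI-PMH 2.0 later specified it; the slides do not show the encoding. The sample document and the names `find_errors` and `OAIError` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Error codes as listed on the slide.
ERROR_CODES = {
    "badArgument", "badGranularity", "badResumptionToken", "badVerb",
    "cannotDisseminateFormat", "idDoesNotExist", "noRecordsMatch",
    "noSetHierarchy",
}


class OAIError(Exception):
    """Raised when a response carries a protocol-level error (hypothetical)."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def find_errors(xml_bytes):
    """Return (code, message) pairs for every error element in a response.

    Assumes <error code="...">message</error>, in any namespace.
    """
    root = ET.fromstring(xml_bytes)
    errors = []
    for element in root.iter():
        if element.tag.split("}", 1)[-1] == "error":
            errors.append((element.get("code"), (element.text or "").strip()))
    return errors


# Illustrative response, not taken from a real repository.
SAMPLE = b"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <error code="badVerb">Illegal verb</error>
</OAI-PMH>"""

errors = find_errors(SAMPLE)
for code, message in errors:
    known = code in ERROR_CODES
    print(code, message, "(known code)" if known else "(unknown code)")
```

Reporting errors inside the XML response, rather than only through HTTP status codes, lets a harvester tell a protocol problem such as `badVerb` apart from a transport failure.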
24SOAP
- SOAP is a mechanism to transmit service requests
over the Internet.
- As yet it is not a fully matured protocol.
- A SOAP compatible version of the protocol may be
written later.
25Harvesting granularity
- From and Until arguments may allow finer
time stamps, down to one second.
- Level supported is chosen by the data provider
and set in the response to the Identify verb.
- All times expressed in UTC.
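As a sketch of what this means for a harvester, the function below renders a timestamp for a From or Until argument at day or second granularity, always in UTC. The two granularity strings are the values defined in OAI-PMH 2.0 and are an assumption here, since the slide does not spell them out; `format_datestamp` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Granularity labels as defined in OAI-PMH 2.0 (assumed, not on the slide).
DAY = "YYYY-MM-DD"
SECOND = "YYYY-MM-DDThh:mm:ssZ"


def format_datestamp(moment, granularity):
    """Render a timezone-aware datetime in UTC at the given granularity."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    if granularity == DAY:
        return utc.strftime("%Y-%m-%d")
    if granularity == SECOND:
        return utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"unsupported granularity: {granularity!r}")


# Half past midnight in a UTC+1 zone is still the previous day in UTC.
local = datetime(2002, 3, 20, 0, 30, 0, tzinfo=timezone(timedelta(hours=1)))
print(format_datestamp(local, DAY))
print(format_datestamp(local, SECOND))
```

Converting to UTC before truncating matters: truncating the local time first would put the example above on the wrong day.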