An Overview of OAI - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: An Overview of OAI


1
An Overview of OAI & OAI-PMH
  • by
  • Filbert Minj
  • Francis Jayakanth
  • NCSI, IISc

2
Agenda
  • Part I: Overview of the OAI
  • Part II: Overview of the OAI-PMH
  • High-Tea Break
  • Part III: Demo of Harvesting of Metadata and
    Searching
  • Discussion?

3
Part I: Overview of the OAI
  • General Information
  • The Journal System
  • Growth of ePrint Archives
  • The ePrint System
  • The UPS Prototype
  • The Dawn of OAI
  • Important Resources

4
Most Relevant Resource
  • Open Archives Forum
  • http://www.oaforum.org/tutorial/index.php
  • This presentation is based largely on the
    tutorial available at the above URL.
  • Several slides from that site have been
    incorporated into this file

5
General Information
  • ePrints
  • ePrints are commonly defined as research articles
    in electronic form (with an underlying assumption
    that they are available online)
  • Preprints (Before peer-review)
  • PostPrints (final, revised, refereed, and
    accepted draft)

6
General Information
  • Repository
  • a repository is a network accessible server that
    holds ePrints
  • Archive
  • is generally accepted as a synonym for repository

7
General Information
  • ePrint Archive: an established medium for
    communicating non-peer-reviewed scholarly
    literature (preprints)

8
General Information
  • Metadata
  • structured information about resources
  • descriptive information about an object or
    a resource, whether it is in physical or
    electronic form

9
General Information
  • DC (Dublin Core)
  • is a metadata format defined on the basis of
    international consensus. The DC Metadata Element
    Set defines fifteen elements for simple resource
    description and discovery
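
As a rough illustration of the fifteen-element set, the Python sketch below builds a simple (unqualified) Dublin Core record. The element names and namespace URIs are those of DC 1.1 and the oai_dc container; the record values and the helper name build_dc_record are made up for this example.

```python
import xml.etree.ElementTree as ET

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def build_dc_record(fields):
    """Build an oai_dc element from a dict of {element name: [values]}."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:  # every DC element is optional and repeatable
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


# Hypothetical record values, for illustration only.
record = build_dc_record({
    "title": ["A sample preprint"],
    "creator": ["Author, A.", "Author, B."],
    "date": ["2003-01-15"],
})
```

Because every element is optional and repeatable, a record with only a title, two creators and a date is still a valid simple DC description.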

10
General Information
  • OAI (Open Archives Initiative)
  • OAI is an initiative to develop and promote
    interoperability standards that aim to facilitate
    the efficient dissemination of content.

11
General Information
  • Protocol
  • a protocol is a set of rules defining
    communication between systems. FTP (File Transfer
    Protocol) and HTTP (Hypertext Transfer Protocol)
    are examples of protocols used for communication
    between systems across the Internet.

12
General Information
  • OAI-PMH (OAI Protocol for Metadata Harvesting)
  • OAI-PMH is a lightweight harvesting protocol for
    sharing metadata between services.

13
General Information
  • Data Provider
  • a Data Provider maintains one or more
    repositories (web servers) that support the
    OAI-PMH as a means of exposing metadata.
  • Service Provider
  • a Service Provider issues OAI-PMH requests to
    data providers and uses the metadata as a basis
    for building value-added services.

14
General Information
  • Harvesting
  • in the OAI context, harvesting refers
    specifically to the gathering together of
    metadata from a number of distributed
    repositories into a combined data store
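
The idea of a combined data store can be sketched in a few lines of Python. This is only an in-memory illustration: the repository names, record fields and the function harvest_all are assumptions, and the lists stand in for what a real harvester would fetch over OAI-PMH.

```python
def harvest_all(repositories):
    """Merge records from several repositories into one combined store.

    `repositories` maps a repository name to a list of record dicts,
    standing in for what a real harvester would fetch over OAI-PMH.
    """
    store = {}
    for repo_name, records in repositories.items():
        for rec in records:
            # Keep the source alongside the metadata so a service can link back.
            store[rec["identifier"]] = {"source": repo_name, **rec}
    return store


# Made-up sample data for two repositories.
repos = {
    "physics-archive": [{"identifier": "oai:phys:1", "title": "Paper A"}],
    "economics-archive": [
        {"identifier": "oai:econ:1", "title": "Paper B"},
        {"identifier": "oai:econ:2", "title": "Paper C"},
    ],
}
combined = harvest_all(repos)
```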

15
General Information
  • Interoperability
  • is the ability of systems, services and
    organizations to work together seamlessly toward
    common or diverse goals. In the technical arena
    it is supported by open standards for
    communication between systems and for description
    of resources and collections, among others.
    Interoperability is considered here primarily in
    the context of resource discovery and access.

16
General Information
  • XML (Extensible Markup Language)
  • XML defines a means of describing data. An XML
    document can be validated against a DTD or schema
    that sets out the elements of the language created
  • DTD (Document Type Definition)
  • a DTD is a formal specification of the structure
    of a document
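
A minimal sketch of the first step, checking that a document is well-formed XML, using Python's standard library. Note the limitation: xml.etree.ElementTree only checks well-formedness; validating against a DTD or schema needs a validating parser, which is outside this sketch. The sample documents are made up.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = "<record><title>A sample preprint</title></record>"
bad = "<record><title>A sample preprint</record>"  # <title> is never closed
```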

17
The Journal System
  • Significant challenges to the journal system
  • Explosive growth of the Internet
  • Publication delay
  • Full transfer of rights by authors to publishers
  • The implementation of peer review
  • Skyrocketing subscription prices
  • These challenges have led to the exploration of
    alternative models for scholarly communication

18
Growth of ePrint Archives
  • The roots of OAI lie in the growing number of ePrint
    archives. Several of these began as
  • Informal vehicles for the dissemination of
  • preliminary research results and
  • gray literature
  • A number of them have evolved into an essential
    medium for sharing research results among
    colleagues in a field

19
Growth of ePrint Archives
  • arXiv (xxx): 1991; Physics; Los Alamos
    (Cornell?); 2.5 lakh preprints; OAI-PMH
  • CogPrints: Cognitive Science; Univ. of Southampton;
    OAI-PMH
  • RePEc (NetEc): 1993; Economics; Univ. of
    Surrey; Guildford Protocol
  • NCSTRL: Computer Science; Dienst to OAI; ODU, VT and
    others
  • NDLTD: Theses and Dissertations; Virginia Tech.

20
Growth of ePrint Archives
  • The growth of ePrint archives exemplifies a more
    equitable and efficient model for disseminating
    research results
  • An important challenge is to increase the impact
    of the ePrint archives.
  • The growth of ePrint archives demonstrates a shift
    in the traditional scholarly communication model
    (the journal system)

21
Growth of ePrint Archives
  • There are indications that a growing number of
    disciplines, organizations and even commercial
    publishers are inspired by this pioneering work
    and are investigating alternative models for
    scholarly communication

22
Open Access Journals
  • BMC (BioMed Central): open access publisher
  • PLoS (Public Library of Science) will launch
    peer-reviewed open access journals
  • PLoS Biology already launched and
  • PLoS Medicine will follow
  • DOAJ: Directory of Open Access Journals
  • http://www.doaj.org/

23
ePrint Archives
  • Basic aims of the ePrint archives initiative
  • create a more effective scholarly communication
    mechanism and
  • thereby provide an alternative to the existing
    scholarly communication model

24
ePrint Archives
  • Approaches taken by individual archives differ in
    a number of ways
  • Centralized model
  • arXiv
  • Distributed departmental/institutional model
  • RePEc
  • Some deal with gray literature

25
ePrint Archives
  • Approaches taken by individual archives differ in
    a number of ways
  • Some incorporate metadata of peer-reviewed papers
  • Some deal with metadata only, others metadata and
    full text
  • Different protocols
  • Dienst, Guildford

26
ePrint Archives
  • The different approaches and protocols in use meant
  • Discovery was not facilitated
  • Different search interfaces
  • No provision to share metadata (interoperability)

27
ePrint Archives
  • Key players recognised the need for a single search
    interface to all the archives through
    interoperability
  • Two key interoperability problems impairing the
    impact of ePrint archives were identified
  • Multiple search interfaces
  • No machine-based way of sharing the metadata

28
ePrint Archives
  • Solutions explored included
  • Cross searching of archives
  • Harvesting metadata from various archives and
    building a central index
  • In July 1999, Ginsparg, Luce and Van de Sompel
    issued a call for technical experts to attend a
    meeting in Santa Fe, NM in Oct. 1999
29
Creation of UPS
  • Creation of UPS (Universal Preprint Service) for
    author self-archived scholarly literature was
    proposed
  • UPS would be the fundamental and free layer of
    scholarly information, above which both free and
    commercial services could flourish

30
Creation of UPS
  • The first step towards establishing UPS was
    identification/creation of interoperable
    technologies and frameworks for the
    dissemination of ePrints

31
Luce, Van de Sompel, Ginsparg
32
UPS Prototype
  • Architectural framework for UPS?
  • Cross searching (Z39.50) vs. harvesting of
    metadata

33
UPS Prototype
  • Searching vs. harvesting
  • US digital library experience in this area (e.g.
    NCSTRL) indicated that cross-searching is not the
    preferred approach: distributed searching of N
    nodes is viable, but only for small values of N
  • NCSTRL: N > 100, not satisfactory
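
One plausible way to see why cross-searching degrades as N grows is a toy availability model, sketched below in Python. The model is an assumption added for illustration, not something stated in the slides: each node answers a live query independently with the same probability p, and a complete result needs every node to answer.

```python
def all_nodes_respond(p, n):
    """Probability that all n nodes answer, each independently with probability p."""
    return p ** n


# Illustrative availability of 99% per node (an assumed figure).
for n in (5, 20, 100):
    print(n, round(all_nodes_respond(0.99, n), 3))
```

Under these assumptions, with 100 nodes a live cross-search returns a complete answer only a little over a third of the time, whereas a service built on harvested metadata answers from its own local copy.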

34
UPS Prototype
  • The UPS Prototype at Santa Fe, Oct. 1999
  • Services based on a collection of harvested
    metadata
  • SFX/OpenURL linking
  • Based on NCSTRL Dienst protocol
  • Insights regarding lack of interoperability
  • Recommendation: metadata harvesting

35
UPS Prototype
  • UPS architecture identified two logical roles
  • Data Provider (deposit, publish, expose metadata)
  • Service Provider (harvest, provide service)

36
The Dawn of OAI
  • The name UPS was quickly changed
  • to avoid a clash with an already established
    commercial parcel service and
  • because not all e-print archives contained preprints
  • The framework within which this universal service
    would be developed was now designated the Open
    Archives initiative (OAi), and later OAI

37
Requirements for Metadata Harvesting
  • For the harvesting method to work, there must be
    agreement on
  • Transport protocol (HTTP)
  • Metadata formats (DC, MARC..)
  • Quality assurance (mandatory fields)
  • IP and usage rights (who can do what with the
    records)

38
The Dawn of a Protocol
  • An initial agreement in key areas made it
    possible to develop a protocol for metadata
    harvesting, named the Santa Fe Convention in
    honour of the meeting where the agreement was
    reached.

39
Benefits of Interoperability
  • Facilitates information discovery, linking and
    peer reviewing
  • Increases visibility (impact)
  • Single search interface

40
What's in the Name
  • Open Archives Initiative
  • Open: the protocol is openly documented, and is
    compliant with open standards (HTTP, DC and XML)
  • Archive/Repository: contains a collection of
    document-like objects
  • Initiative: OAI is happening at break-neck speed

41
Questions?
42
Part II
  • An Overview of the OAI-PMH

43
OAI-PMH Version History
  • Santa Fe Convention was the first incarnation of
    the OAI-PMH (02/2000)
  • Goal: optimise discovery of e-prints
  • Inputs
  • UPS prototype
  • RePEc/SODA data/service provider model
  • Dienst protocol
  • Deliberations at the Santa Fe Meeting (10/99)

44
OAI-PMH Version History
  • OAI-PMH v.1.0 (01/2001)
  • Goal: optimise discovery of document-like objects
  • Inputs
  • Santa Fe Convention
  • various DLF meetings on metadata harvesting
  • deliberations at Cornell
  • alpha-testers of OAI-PMH v.1.0
  • recognition of DC as best core metadata format
    for interoperability across multiple archives

45
OAI-PMH v.1.0 (01/2001)
  • Low-barrier interoperability specification
  • Metadata harvesting model: data provider /
    service provider
  • Focus on document-like objects
  • HTTP based
  • XML responses
  • Unqualified Dublin Core
  • Experimental: 12-18 months

46
OAI-PMH Version History
  • OAI-PMH v.2.0
  • Goal: recurrent exchange of metadata about
    resources between systems
  • Inputs
  • OAI-PMH v.1.0
  • feedback on OAI-implementers
  • deliberations by OAI-tech (09/01 - 06/02)
  • alpha test group of OAI-PMH v.2.0 (03/02 - 06/02)
  • officially released June 14, 2002

47
OAI-PMH v.2.0 (06/2002)
  • Low-barrier interoperability specification
  • Metadata harvesting model: data provider /
    service provider
  • Metadata about resources
  • HTTP based
  • XML responses
  • Unqualified Dublin Core
  • Stable; no backward compatibility
  • Future releases will be backward compatible

48
What OAI-PMH is not
  • Not a search system on its own
  • Not a database management system
  • Not a single metadata schema
  • Not an OAIS

49
Basic Functioning of OAI-PMH
50
OAI General Assumption
  • Two groups of participants
  • Data Providers (Open Archives, Repositories)
  • free access to metadata
  • not necessarily free access to full texts /
    resources
  • easy to implement, low barrier solution

51
OAI General Assumption
  • Two groups of participants
  • Service Providers
  • use OAI interfaces of the Data Providers
  • harvest and store metadata (no live requests!)
  • may select certain subsets from Data
    Providers (set hierarchy, date stamp)
  • offer (value-added) service on the basis of the
    metadata
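
The "no live requests" point can be sketched as follows: the service provider answers a user query from its own harvested copy instead of contacting data providers at query time. The store contents and the function name search are invented for the illustration, and a real service would use a proper index in place of a linear scan.

```python
def search(store, term):
    """Return identifiers of harvested records whose title contains term."""
    term = term.lower()
    return sorted(
        identifier
        for identifier, record in store.items()
        if term in record["title"].lower()
    )


# Made-up harvested metadata, keyed by record identifier.
local_store = {
    "oai:phys:1": {"title": "Quantum harvesting of photons"},
    "oai:econ:1": {"title": "Working papers in economics"},
    "oai:econ:2": {"title": "Harvesting economic metadata"},
}
hits = search(local_store, "harvesting")
```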

52
Multiple data and service providers

  • [Diagram] Service providers harvesting from
    multiple data providers, based on OAI-PMH
53
Aggregators
  • [Diagram] Data providers, an aggregator, and
    service providers
54
OAI-PMH Structure Model
  • [Diagram] A Service Provider (harvester) sends
    requests to several Data Providers, each fronting
    a repository (e-prints, images, OPAC, museum,
    archive), and receives responses
  • Requests: Identify, ListMetadataFormats,
    ListSets, ListIdentifiers, ListRecords, GetRecord
  • Responses: general information, metadata
    formats, set structure, record identifier,
    metadata
55
OAI-PMH Protocol Overview
  • Protocol is based on HTTP
  • Requests are issued using the HTTP GET or POST
    method
  • Responses are encoded in XML syntax
  • Supports any metadata format (at least Dublin
    Core)
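
A minimal sketch of how a harvester might assemble a GET request, using only Python's standard library. The base URL is the one used in the sample URLs of this presentation; the helper name build_request is an assumption for the example, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode


def build_request(base_url, verb, arguments=None):
    """Return an OAI-PMH GET request URL for the given verb and arguments."""
    params = {"verb": verb}
    params.update(arguments or {})
    return base_url + "?" + urlencode(params)


# Base URL taken from the sample URLs in this presentation.
BASE_URL = "http://eprints.iisc.ernet.in/perl/oai2"
url = build_request(BASE_URL, "ListIdentifiers", {"metadataPrefix": "oai_dc"})
```

Arguments are passed as a dict because one of the protocol's argument names, from, is a reserved word in Python and cannot be used as a keyword argument.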

56
OAI-PMH Protocol Overview
  • Data providers may support selective harvesting
    by service providers
  • Define a logical set hierarchy
  • Date stamps (last change of metadata set)
  • Error messages are HTTP based
  • Supports flow control
  • Supports six request types (known as verbs)
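
Selective harvesting by date stamp and set can be sketched from the data provider's side as a simple filter. The record layout, sample data and function name select_records are assumptions for illustration. The sketch treats both date bounds as inclusive and treats a set name such as physics:hep as a child of physics, following the colon-separated set hierarchy of OAI-PMH 2.0.

```python
def select_records(records, from_date=None, until_date=None, set_spec=None):
    """Return the records matching optional datestamp and set criteria.

    Datestamps are ISO 8601 day strings (YYYY-MM-DD), which compare
    correctly as plain strings. Both date bounds are inclusive.
    """
    selected = []
    for rec in records:
        if from_date is not None and rec["datestamp"] < from_date:
            continue
        if until_date is not None and rec["datestamp"] > until_date:
            continue
        if set_spec is not None and not any(
            s == set_spec or s.startswith(set_spec + ":") for s in rec["sets"]
        ):
            continue
        selected.append(rec)
    return selected


# Made-up records for illustration.
sample = [
    {"identifier": "oai:example:1", "datestamp": "2001-03-10", "sets": ["physics"]},
    {"identifier": "oai:example:2", "datestamp": "2002-06-01", "sets": ["physics:hep"]},
    {"identifier": "oai:example:3", "datestamp": "2002-09-20", "sets": ["economics"]},
]
recent_physics = select_records(sample, from_date="2002-01-01", set_spec="physics")
```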

57
OAI Verbs
  • Identify
  • ListSets
  • ListMetadataFormats
  • ListIdentifiers
  • GetRecord
  • ListRecords
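
The six verbs and their arguments can be summarised as a small lookup table with a request checker, sketched below. The R/O/X markers follow the per-verb slides in this presentation (required, optional, exclusive); the resumptionToken entry for ListSets comes from the OAI-PMH 2.0 specification rather than from the slides. The function name check_request and its messages are invented for the example.

```python
# Arguments accepted by each verb: R = required, O = optional, X = exclusive.
VERB_ARGUMENTS = {
    "Identify": {},
    "ListSets": {"resumptionToken": "X"},
    "ListMetadataFormats": {"identifier": "O"},
    "ListIdentifiers": {"from": "O", "until": "O", "set": "O",
                        "metadataPrefix": "R", "resumptionToken": "X"},
    "GetRecord": {"identifier": "R", "metadataPrefix": "R"},
    "ListRecords": {"from": "O", "until": "O", "set": "O",
                    "metadataPrefix": "R", "resumptionToken": "X"},
}


def check_request(verb, arguments):
    """Return a list of problems with a request; an empty list means none found."""
    if verb not in VERB_ARGUMENTS:
        return [f"unknown verb: {verb}"]
    allowed = VERB_ARGUMENTS[verb]
    problems = [f"unexpected argument: {a}" for a in arguments if a not in allowed]
    exclusive = [a for a in arguments if allowed.get(a) == "X"]
    if exclusive:
        # An exclusive argument must be the only argument besides the verb.
        if len(arguments) > 1:
            problems.append(f"{exclusive[0]} must be used alone")
        return problems
    problems += [f"missing required argument: {a}"
                 for a, kind in allowed.items()
                 if kind == "R" and a not in arguments]
    return problems
```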

58
OAI Verbs - Identify
  • Purpose
  • Return general information about the archive and
    its policies (e.g., date stamp granularity)
  • Parameters
  • None
  • Sample URL
  • http://eprints.iisc.ernet.in/perl/oai2?verb=Identify
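
To show what a harvester might do with the answer, the sketch below parses an Identify response with Python's standard library. The sample response is abbreviated and made up (a real one carries further elements, and the archive.example.org address is a placeholder); the element names and namespace are those of OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Abbreviated, made-up Identify response for illustration.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2003-01-15T10:00:00Z</responseDate>
  <request verb="Identify">http://archive.example.org/oai2</request>
  <Identify>
    <repositoryName>Example Archive</repositoryName>
    <baseURL>http://archive.example.org/oai2</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <earliestDatestamp>2001-01-01</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""


def parse_identify(xml_text):
    """Return the children of <Identify> as a {local name: text} dict."""
    root = ET.fromstring(xml_text)
    identify = root.find("oai:Identify", OAI_NS)
    # Tags look like "{namespace}name"; keep only the local name.
    return {child.tag.split("}", 1)[1]: child.text for child in identify}


info = parse_identify(SAMPLE_IDENTIFY)
```

The granularity value tells the harvester which date stamp format it may use in later from and until arguments.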

59
OAI Verbs - ListSets
  • Purpose
  • Provide a listing of sets in which records may be
    organized
  • Parameters
  • None
  • Sample URL
  • http://eprints.iisc.ernet.in/perl/oai2?verb=ListSets

60
OAI Verbs - ListMetadataFormats
  • Purpose
  • List metadata formats supported by the archive as
    well as their schema locations and namespaces
  • Parameters
  • identifier: for a specific record (O, optional)
  • Sample URL
  • http://eprints.iisc.ernet.in/perl/oai2?verb=ListMetadataFormats

61
OAI Verbs - ListIdentifiers
  • Purpose
  • List headers for all items corresponding to the
    specified parameters
  • Parameters (R = required, O = optional, X = exclusive)
  • from: start date (O)
  • until: end date (O)
  • set: set to harvest from (O)
  • metadataPrefix: metadata format to list
    identifiers for (R)
  • resumptionToken: flow control mechanism (X)
  • Sample URL
  • http://eprints.iisc.ernet.in/perl/oai2?verb=ListIdentifiers&metadataPrefix=oai_dc

62
OAI Verbs - GetRecord
  • Purpose
  • Returns the metadata for a single item in the
    form of an OAI record
  • Parameters
  • identifier: unique id for item (R)
  • metadataPrefix: metadata format for the record
    (R)
  • Sample URL
  • http://eprints.iisc.ernet.in/perl/oai2?verb=GetRecord&identifier=oai:iiscePrints.OAI2:10&metadataPrefix=oai_dc
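
A sketch of reading a GetRecord response: the record has a header (identifier, date stamp, sets) and a metadata part, here in oai_dc. The sample response is made up and trimmed (responseDate and request are omitted); the namespaces are those of OAI-PMH 2.0 and Dublin Core, and parse_record is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up, trimmed GetRecord response for illustration.
SAMPLE_RECORD = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:example:10</identifier>
        <datestamp>2003-01-15</datestamp>
        <setSpec>physics</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample preprint</dc:title>
          <dc:creator>Author, A.</dc:creator>
          <dc:creator>Author, B.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def parse_record(xml_text):
    """Extract header fields and a few DC elements from a GetRecord response."""
    root = ET.fromstring(xml_text)
    record = root.find(".//oai:record", NS)
    header = record.find("oai:header", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [e.text for e in header.findall("oai:setSpec", NS)],
        "titles": [e.text for e in record.findall(".//dc:title", NS)],
        "creators": [e.text for e in record.findall(".//dc:creator", NS)],
    }


parsed = parse_record(SAMPLE_RECORD)
```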

63
OAI Verbs - ListRecords
  • Purpose
  • Retrieves metadata records for multiple items
  • Parameters
  • from: start date (O)
  • until: end date (O)
  • set: set to harvest from (O)
  • resumptionToken: flow control mechanism (X)
  • metadataPrefix: metadata format (R)
  • Sample URL
  • http://www.anarchive.org/cgi-bin/OAI?verb=ListRecords&metadataPrefix=oai_dc&from=2001-01-01

64
Protocol Details: Flow Control
  • [Diagram] Exchange between the Service Provider
    (harvester) and the Data Provider (repository)
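
Flow control with resumptionToken can be sketched as a loop: the harvester repeats the list request, passing back the token from the previous response, until no further token is returned. In this sketch fetch_page is a stand-in for the HTTP request and XML parsing, and the simulated data provider with its numeric tokens is invented; real tokens are opaque strings chosen by the repository.

```python
def harvest(fetch_page):
    """Collect all records from a list request that may span several responses.

    `fetch_page(token)` stands in for issuing the HTTP request; it returns
    (records, next_token), where next_token is None or "" when the list
    is complete.
    """
    records = []
    token = None  # the first request carries no resumptionToken
    while True:
        page, token = fetch_page(token)
        records.extend(page)
        if not token:
            return records


# Simulated data provider that returns its records two at a time.
ALL_RECORDS = ["rec1", "rec2", "rec3", "rec4", "rec5"]


def fake_fetch_page(token, page_size=2):
    start = int(token) if token else 0
    end = start + page_size
    next_token = str(end) if end < len(ALL_RECORDS) else None
    return ALL_RECORDS[start:end], next_token


harvested = harvest(fake_fetch_page)
```
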
65
OAI Compliant Tools
  • eprints.org (http://www.eprints.org)
  • DSpace (http://dspace.org)
  • CDSware (http://cdsware.cern.ch)
  • Kepler (http://kepler.cs.odu.edu/)

66
OAI-PMH Based Services
  • Repository Explorer
  • http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai/
  • Search engines
  • Arc: http://arc.cs.odu.edu/
  • MyOAI: http://www.myoai.org/
  • Physnet (subset of arXiv, IOP)
  • http://physnet.uni-oldenburg.de/oai/query.php
  • OAIster: http://oaister.umdl.umich.edu/o/oaister/

67
Summary
  • Low-cost mechanism for harvesting metadata
    records from one system to another
  • Based on HTTP and XML: Web-friendly
  • Development over the last 2-3 years has seen a move
    from the specific (discovery of e-prints) to the
    generic (sharing descriptions of any resource)

68
Summary
  • Recommends simple DC as record format but
    extensible to any format encoded in XML
  • OAI-PMH is not a search protocol
  • Metadata and full-text typically made freely
    available but not a requirement
  • OAI-PMH can be used between closed groups

69
Other Important Resources
  • OAI Web site
  • http://www.openarchives.org/
  • Open Archives Forum
  • http://www.oaforum.org/tutorial/index.php
  • The Santa Fe Convention of the Open Archives
    Initiative, by Herbert Van de Sompel and Carl
    Lagoze, D-Lib Magazine, Vol. 6 No. 2, Feb 2000

70
Questions?
71
Thank you for your Presence & Patience