OAI-based Harvesting - PowerPoint PPT Presentation

1
OAI-based Harvesting
IVOA Registry Working Group
  • Ray Plante

2
Harvesting in the Registry Framework
[Diagram. Labels: VO Projects, Data Centers, Local Searchable Registry, Full Searchable Registry, Specialized Portals Services. Arrow label: harvest (pull).]
Harvesting is about publishing
3
Harvesting in the Registry Framework
[Diagram. Labels: VO Projects, Data Centers, Local Searchable Registry, Full Searchable Registry, Specialized Portals Services. Arrow labels: harvest (pull), replicate.]
Harvesting is about publishing
4
Harvesting in the Registry Framework
[Diagram. Labels: VO Projects, Data Centers, Local Searchable Registry, Full Searchable Registry, Specialized Portals Services. Arrow labels: harvest (pull), replicate, selective harvesting.]
Harvesting is about publishing
5
Searching in the Registry Framework
[Diagram. Labels: VO Projects, Data Centers, Local Searchable Registry, Full Searchable Registry, Client Applications, Specialized Portals Services. Arrow label: search queries.]
Harvesting is not about searching
6
Open Archives Initiative (OAI) Protocol for Metadata Harvesting
  • Existing standard for harvesting resource
    descriptions widely supported in digital library
    community
  • http://www.openarchives.org/
  • Supports aggregation of resource descriptions
  • Deployed successfully as part of NVO registry
    prototype
  • Currently part of framework supporting the NVO
    Data Inventory Service (DIS)

7
OAI-PMH Features
  • Defines 6 operations: Identify, GetRecord,
    ListIdentifiers, ListMetadataFormats,
    ListRecords, ListSets
  • Features
  • Support for multiple description formats
    (metadataPrefix)
  • Harvesting by date (from, until)
  • Harvesting by category (set)
  • Marking records as deleted
  • Support for resumption tokens
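
A minimal sketch of how a harvester might assemble these requests as HTTP GET URLs. The parameter names (verb, metadataPrefix, from, until, set, resumptionToken) come from OAI-PMH; the function name build_request and the endpoint registry.example.org are assumptions made for illustration only.

```python
from urllib.parse import urlencode

# The six verbs listed on the slide.
OAI_VERBS = {
    "Identify", "GetRecord", "ListIdentifiers",
    "ListMetadataFormats", "ListRecords", "ListSets",
}


def build_request(base_url, verb, resumption_token=None, **args):
    """Return the HTTP GET URL for one OAI-PMH request (illustrative helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    if resumption_token is not None:
        # A resumption token continues an earlier list request, so it is
        # sent alone with the verb and the other arguments are dropped.
        params = {"verb": verb, "resumptionToken": resumption_token}
    else:
        params = {"verb": verb}
        for name, value in args.items():
            if value is not None:
                # "from" is a Python keyword, so callers pass "from_".
                params[name.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


# Harvest by date in the Dublin Core format (hypothetical endpoint).
url = build_request(
    "http://registry.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
    from_="2004-01-01",
)
print(url)
```

Passing resumption_token on a follow-up call would fetch the next chunk of a long list; a real harvester would also need error handling and retry logic, which this sketch leaves out.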

8
OAI as a Web Service
  • IVOA presumed preference for Web Services
  • OAI-PMH is defined as a set of HTTP GET services
  • Broad interest in seeing Web Service version
  • Evolving the standard toward WS
  • NVO has prototyped WS versions
  • Gretchen Greene, Wil O'Mullane (STScI)
  • Charlie Cowert (SDSC)
  • Ray Plante (NCSA)
  • In contact with OAI community
  • Opportunity to present a proposal for a standard
    WS version to the OAI community

9
Reasons to adopt OAI
  • Existing, well-tested standard we don't have to
    reinvent
  • Easy to implement
  • Demonstrated by NVO
  • Lots of existing OAI software tools
  • For clients and servers
  • Lowers cost of implementation
  • Interoperability with larger digital library
    community
  • Do these hold in the web service context?
  • Yes, if the WS version leverages the original OAI
    schema

10
A Harvesting Standard based on OAI
  • Spelled out in alternate section 4 to the
    Registry Interface working draft
  • The OAI standard is defined by:
  • The (existing) OAI-PMH v2.0 specification
  • Operations, behavior, message schema
  • OAI-PMH schema defines envelope for resource
    description
  • The (proposed) OAI WSDL interface
  • Mapping to WS interface
  • Imports the OAI-PMH schema
  • Standard IVOA use of OAI
  • OAI spec provides hooks for community-specific
    semantics
  • Resource description format
  • Sets

11
IVOA Metadata Format
  • Define metadata format: ivo_vor
  • Description using the VOResource schema
  • Restricted to a resource sub-type defined in a
    standard extension schema
  • Today: one of the working draft extensions
  • Non-standard extensions should be accessible via
    a non-standard metadata format name
  • Harvester can expect to fully comprehend the
    resource description
  • Dublin core format, oai_dc, required
  • For cross-disciplinary interoperability
  • Support is trivial via standard XSL stylesheet
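
As a rough illustration, a harvester could confirm that a registry offers both formats by inspecting a ListMetadataFormats response. The sketch below assumes the standard OAI-PMH 2.0 namespace and response layout; the function name missing_formats and the sample response are made up for this example and are not part of the proposal.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# The two formats named on the slide.
REQUIRED_PREFIXES = {"ivo_vor", "oai_dc"}


def missing_formats(response_xml):
    """Return the required metadataPrefix values absent from the response."""
    root = ET.fromstring(response_xml)
    offered = {
        element.text.strip()
        for element in root.iter(OAI_NS + "metadataPrefix")
        if element.text
    }
    return REQUIRED_PREFIXES - offered


# Hypothetical response from a registry that only offers Dublin Core.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-05-01T12:00:00Z</responseDate>
  <request verb="ListMetadataFormats">http://registry.example.org/oai</request>
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

print(missing_formats(SAMPLE))
```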

12
Sets: Named Categories of Records
  • The OAI notion of sets
  • Each record may belong to zero or more named
    categories called sets
  • Sets may be defined by a community or the
    individual provider
  • Enables selective harvesting by category
  • Proposed use of sets for IVOA
  • Implicit definition of a set for each standard
    resource sub-type
  • E.g. Organisation, Registry, SimpleImageAccess,
    etc.
  • Set name of the form ivo_<type>, e.g.
    ivo_Registry
  • Allows harvesting of specific types
  • Explicit definition of special sets
  • ivo_managed: those records with an authority ID
    that originates with that registry
  • ivo_standard: any record of a standard resource
    sub-type
  • Full registry replication by omitting set argument
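
The set rules above can be sketched as a small function that decides which sets a record belongs to. This is an illustration only: STANDARD_TYPES holds just the three sub-types named on the slide (the real list is longer), and the function name, argument names, and example authority IDs are hypothetical.

```python
# Illustrative subset: only the sub-types named on the slide.
STANDARD_TYPES = {"Organisation", "Registry", "SimpleImageAccess"}


def sets_for_record(resource_type, authority_id, managed_authorities):
    """Return the IVOA set names a record would belong to under the proposal."""
    sets = []
    if resource_type in STANDARD_TYPES:
        sets.append("ivo_" + resource_type)  # implicit per-type set
        sets.append("ivo_standard")          # any standard sub-type
    if authority_id in managed_authorities:
        sets.append("ivo_managed")           # record originates with this registry
    return sets


managed = {"example.org"}  # authority IDs this registry manages (made up)

print(sets_for_record("Registry", "example.org", managed))
print(sets_for_record("Organisation", "other.org", managed))
```

A harvester that wants only one registry's own records would then request the set ivo_managed, while a request with no set argument would return everything.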

13
Other miscellaneous specifications
  • Required resource records
  • One Registry record describing the registry
    itself
  • One Authority record for each AuthorityID it
    manages
  • One Organisation record for each publisher that
    registers an AuthorityID
  • The Identify operation response must include
    the registry record for the registry
  • e.g. <resource xsi:type="Registry">

14
Related Harvester Interface
  • Additional standard operation to be supported by
    searchable registry (i.e. the harvester)
  • Harvest Me: a mechanism to tell a harvester
    that an update is available
  • Inputs
  • ivo-id: the ID of the harvestable registry
  • harvestingType: HTTP GET or WS version
  • Should we allow either?
  • baseURL: endpoint for harvesting interface
  • lastUpdate: date of most recent update to
    registry contents
  • Harvester may choose when/if to harvest
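
One way to picture the Harvest Me inputs is as a small record handed to the harvester, which then applies its own policy. The four fields mirror the inputs listed above; the class name, the should_harvest function, and the "harvest only if newer" policy are assumptions for illustration, since the slide leaves the decision to the harvester.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HarvestMeRequest:
    ivo_id: str            # ID of the harvestable registry
    harvesting_type: str   # "HTTP GET" or "WS"
    base_url: str          # endpoint for the harvesting interface
    last_update: datetime  # most recent update to registry contents


def should_harvest(request, last_harvested):
    """Example policy: harvest if never harvested, or if contents are newer.

    last_harvested maps an ivo_id to the datetime of the previous harvest.
    """
    previous = last_harvested.get(request.ivo_id)
    return previous is None or request.last_update > previous


# Hypothetical notification from a registry the harvester already knows.
notice = HarvestMeRequest(
    ivo_id="ivo://example.org/registry",
    harvesting_type="HTTP GET",
    base_url="http://registry.example.org/oai",
    last_update=datetime(2004, 5, 1),
)
history = {"ivo://example.org/registry": datetime(2004, 4, 1)}

print(should_harvest(notice, history))
```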

15
Conclusions
  • Reasons to adopt OAI-based harvesting
  • Existing, well-tested standard we don't have to
    reinvent
  • Easy to implement
  • Lots of existing OAI software tools
  • Interoperability with larger digital library
    community
  • OAI fulfills all of the harvesting functionality
    set proposed in current WD
  • Proposal presumes a WS interface is desired
  • Opportunity to contribute to DL community
  • Action
  • Endorse preference to support the standard if it:
  • Meets needs
  • Has favorable cost/benefit ratio
  • Continue to develop within context of Registry
    Interface WD
  • Projects should study OAI spec carefully
  • If it doesn't meet the above criteria, enumerate how

16
Myths about OAI
  • OAI does more than we need, too heavy
  • If you do this right, you will reinvent most/all
    of the required functionality
  • It's cheaper to do something simpler
  • It's cheaper to adopt a standard if you can
    leverage existing software
  • OAI + WS: aren't these envelopes redundant?
  • The two envelopes serve different functions
  • OAI envelope: managing a set of related
    operations in a format-independent way
  • Response management: responseDate, request inputs
  • Record management: identifier, datestamp, set
    membership
  • Envelopes are easily skipped
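
To illustrate how a client might skip the OAI envelope, the sketch below pulls the resource description out of each record and ignores the response-level elements. It assumes the standard OAI-PMH 2.0 namespace; the function name, the sample response, and the placeholder payload namespace urn:example:placeholder are invented for this example.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def extract_payloads(response_xml):
    """Return (identifier, payload element) pairs, skipping the OAI envelope."""
    root = ET.fromstring(response_xml)
    payloads = []
    for record in root.iter(OAI_NS + "record"):
        header = record.find(OAI_NS + "header")
        metadata = record.find(OAI_NS + "metadata")
        if header is None or metadata is None or len(metadata) == 0:
            continue  # e.g. a deleted record has a header but no metadata
        identifier = header.findtext(OAI_NS + "identifier")
        payloads.append((identifier, metadata[0]))
    return payloads


# Hypothetical GetRecord response with a placeholder resource description.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-05-01T12:00:00Z</responseDate>
  <request verb="GetRecord">http://registry.example.org/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>ivo://example.org/registry</identifier>
        <datestamp>2004-04-30</datestamp>
        <setSpec>ivo_Registry</setSpec>
      </header>
      <metadata>
        <resource xmlns="urn:example:placeholder">
          <title>Example Registry</title>
        </resource>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""

for identifier, payload in extract_payloads(SAMPLE):
    print(identifier, payload.tag)
```

The record header still carries the identifier, datestamp, and set membership, so a client that needs record management can read it, and one that does not can go straight to the payload.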