Title: Publishing and Resource Discovery with Registries
THE US NATIONAL VIRTUAL OBSERVATORY
Kevin Benson, Sebastien Derriere, Pierre Fernique, Matthew Graham, Gretchen Greene, Bob Hanisch, Paul Harrison, Martin Hill, Jeongin Lee, Gerard Lemson, Tony Linde, Tom McGlynn, Wil O'Mullane, Keith Noddle, Ramon Williamson
Visit the NVO Demo Booth
Summary (2003)
- We built a working prototype registry system to support an end-user VO service
  - Distributed Publishing and Searchable components
  - Encoded descriptions using emerging VO XML standard schemas
  - OAI Harvesting Standard deployed easily
  - Used to discover Cone Search and SIA services
- What's next: interoperable registries IVOA-wide
  - Stabilize XML metadata standard
  - Standardize registry interfaces
Summary (2004)
- We built a working production registry system to support end-user VO services
  - DataScope discovers Cone Search and Simple Image Access services
  - OpenSkyQuery Portal discovers OpenSkyNodes
- What's next: interoperable registries IVOA-wide
  - Stabilize XML metadata standard
  - Standardize registry interfaces
  - -> IVOA frozen working draft standard for January 2005 releases
Registries 2004
- Review of Registry architecture
- Resource Metadata Model
- IVOA Registry Interface Standard
- Harvesting
- Searching
- The NVO Publishing Process
- Searching for Resources
- Curation Issues
The role of Resource Registries
- Used to discover and locate resources (data and services) that can be used in a VO application
- Resource: anything that is describable and identifiable
  - Besides data and services: organizations, projects, software, ...
  - Presently concerned with a simple set of resource types
- Registry: a list of resource descriptions
  - Expressed as structured metadata
  - to enable automated processing and searching
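Because each description is structured metadata rather than free text, software can pull out individual fields and search on them. The following is a minimal sketch of that idea using Python's standard XML parser; the record layout and element names (`identifier`, `title`, `subject`, a `type` attribute) are hypothetical simplifications for illustration and do not follow any particular version of the VOResource schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified resource description. Element names are
# illustrative only, not taken from a specific VOResource schema version.
RECORD = """
<Resource type="ConeSearch">
  <identifier>ivo://example.org/catalog/demo</identifier>
  <title>Demo Source Catalog</title>
  <subject>stars</subject>
  <subject>photometry</subject>
</Resource>
"""

def summarize(xml_text):
    """Extract a few searchable fields from a resource description."""
    root = ET.fromstring(xml_text.strip())
    return {
        "type": root.get("type"),
        "identifier": root.findtext("identifier"),
        "title": root.findtext("title"),
        "subjects": [s.text for s in root.findall("subject")],
    }

def matches_keyword(summary, keyword):
    """Case-insensitive keyword match against the title and subject fields."""
    kw = keyword.lower()
    fields = [summary["title"] or ""] + summary["subjects"]
    return any(kw in f.lower() for f in fields)

if __name__ == "__main__":
    s = summarize(RECORD)
    print(s["identifier"], matches_keyword(s, "STARS"))
```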
Selected Requirements
- Allow the user to select resources that are likely to pertain to a scientific question
  - Select resources based on characteristics
    - Type of resource: catalogs, image archives, EPO, services
    - Coverage in space, time, and frequency
    - Where the data comes from, who curates it
- Dynamic: resources will come and go
- Distributed: should not depend on a single point of failure or a single view of the VO
- Preserve the data providers' control over their data
  - Curators control what gets registered, its content, and updates
- Allow integration with existing resource management
- Allow extension to new types of resources
IVOA Registry Working Group (RWG)
- IVOA: International Virtual Observatory Alliance
- Common, global approach to registries
- Towards a standard framework
- Registry Model
- Resource Identifiers
- Metadata schemas
- Registry Interface
- Distributed model for registries
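Resource identifiers are what let distributed registries refer to the same resource. A minimal sketch, assuming the `ivo://<authority>/<resource-key>` form used in the IVOA identifier drafts: the authority part names the publisher and the remainder is a key chosen by that publisher. Validation here is deliberately minimal and the example identifiers are hypothetical.

```python
from urllib.parse import urlsplit

def parse_ivo_id(identifier):
    """Split an IVOA identifier of the form ivo://authority/resource-key.

    Returns (authority_id, resource_key). The resource key is empty when the
    identifier names the authority itself.
    """
    parts = urlsplit(identifier)
    if parts.scheme != "ivo" or not parts.netloc:
        raise ValueError(f"not an IVOA identifier: {identifier!r}")
    # netloc holds the authority ID; the path (minus its leading slash) is the key.
    return parts.netloc, parts.path.lstrip("/")

if __name__ == "__main__":
    print(parse_ivo_id("ivo://example.org/catalog/demo"))
```

Keeping the authority in the identifier means any registry can tell which publisher is responsible for a record, even after the record has been harvested elsewhere.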
Registry Model
[Diagram, built up over several slides. Components: VO Projects, Data Centers, Local Searchable Registry, Full Searchable Registry, Specialized Portals/Services, Client Applications. Links labelled "harvest (pull)", "replicate", "selective harvesting", and "search queries".]
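The harvest, replicate, and selective-harvesting links in the registry model can be illustrated with a toy in-memory simulation. This is a sketch under simplifying assumptions, not how any NVO registry is implemented: records are plain dicts keyed by identifier, "updated" is an integer stand-in for a datestamp, and replication is modelled simply as one full registry harvesting everything from another.

```python
from dataclasses import dataclass, field

@dataclass
class Registry:
    """Toy registry holding resource descriptions keyed by identifier."""
    name: str
    records: dict = field(default_factory=dict)

    def publish(self, identifier, rtype, updated):
        """Add or update a local resource description."""
        self.records[identifier] = {"type": rtype, "updated": updated}

    def harvest_from(self, source, since=0, want=None):
        """Pull records updated after `since`; if `want` is given, keep only those types.

        Returns the number of records pulled.
        """
        pulled = 0
        for ident, rec in source.records.items():
            if rec["updated"] <= since:
                continue  # incremental harvest: skip records not changed since last pull
            if want is not None and rec["type"] not in want:
                continue  # selective harvesting: skip unwanted resource types
            self.records[ident] = dict(rec)
            pulled += 1
        return pulled

if __name__ == "__main__":
    center = Registry("data-center")
    project = Registry("vo-project")
    center.publish("ivo://center.example/cone", "ConeSearch", updated=1)
    center.publish("ivo://center.example/images", "SimpleImageAccess", updated=2)
    project.publish("ivo://project.example/org", "Organisation", updated=1)

    full = Registry("full")
    for src in (center, project):
        full.harvest_from(src)                       # harvest (pull)

    mirror = Registry("mirror")
    mirror.harvest_from(full)                        # replicate

    portal = Registry("portal")
    portal.harvest_from(full, want={"ConeSearch"})   # selective harvesting
    print(len(full.records), len(mirror.records), len(portal.records))  # prints: 3 3 1
```

Pulling (rather than having publishers push) keeps publishers in control of their own records: a harvester only ever copies what the publishing registry exposes.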
Registries in Use: DataScope
[Diagram, built up over three slides. Labels: JHU/STScI, AstroGrid, CDS, Caltech, HEASARC, NCSA; two Full Searchable Registries; a Local Publishing Registry; Data Providers offering Simple Image Access services; DataScope (DS). Links labelled "harvest (pull)" and "search for services".]
Registries in Use
- Registries in the NVO are currently operating and functional
  - DataScope discovers Cone Search and Simple Image Access (SIA) services
  - OpenSkyQuery Portal discovers OpenSkyNodes
- CDS Aladin/GLU (Pierre Fernique)
  - harvests Cone Search and SIA services
  - converts them into GLU dictionary records
  - accessible directly by the Aladin image and catalog viewer
- AstroGrid Registry: foundation for building workflows
  - Portal uses descriptions to stitch services together
  - (Previous talk by Keith Noddle)
- Cross-project harvesting
  - NVO, AstroGrid, AVO (VizieR, GLU)
- Registries are at the leading edge of VO development
Resource Metadata Model
[Diagram, built up over four slides: core metadata expressed as XML, based on the IVOA Working Draft VOResource. VOResource defines Resource, Organisation, and Service; extension schemas add further types: VORegistry (Authority, Registry), VODataService (DataCollection, SkyService, TabularSkyService), SIA (SimpleImageAccess), ConeSearch (ConeSearch), and VOCEA (CEAApplication, CEAService).]
IVOA Working Draft: Registry Interface (RI) Standard
- Kevin Benson (AstroGrid), Editor
- Harvesting
  - Delivering resource descriptions from publishers to searchable registries
  - Adoption of the Open Archives Initiative (OAI) standard Protocol for Metadata Harvesting: http://www.openarchives.org/
  - RI defines the application of OAI to VO resource records
    - Plug in VOResource as a metadata format
  - Optional SOAP version to augment the HTTP GET standard
- Searching
  - Returns XML VOResource records
  - Keyword search
  - Advanced search
    - Uses the Astronomical Dataset Query Language (ADQL)
    - Refers to metadata items via a simplified XPath
    - Easily mapped to either SQL for an RDBMS implementation or XQuery for an XML DB implementation
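On the harvesting side, OAI-PMH over HTTP GET comes down to repeated `ListRecords` requests, following `resumptionToken` values until the publisher returns an empty token. A minimal sketch of that loop: the verb and argument names and the OAI-PMH 2.0 namespace come from the OAI-PMH specification, while the base URL, the `ivo_vor` metadata prefix, and the injected `fetch` callable are assumptions made for illustration (a real harvester would check the registry's `ListMetadataFormats` response and handle OAI errors).

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix=None, resumption_token=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for the HTTP GET binding."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb accompanies it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # incremental harvest since a datestamp
    return base_url + "?" + urllib.parse.urlencode(params)

def harvest(base_url, fetch, metadata_prefix, from_date=None):
    """Yield <record> elements, following resumption tokens until exhausted.

    `fetch` maps a URL to response text; injecting it keeps the loop
    testable without network access.
    """
    token = None
    while True:
        url = list_records_url(base_url, metadata_prefix, token, from_date)
        root = ET.fromstring(fetch(url))
        yield from root.iterfind(".//oai:record", OAI_NS)
        tok = root.find(".//oai:resumptionToken", OAI_NS)
        # An absent or empty resumptionToken marks the end of the list.
        token = tok.text.strip() if tok is not None and tok.text else None
        if not token:
            break
```

Searching is a separate interface; the harvest loop only moves whole records from publishers to searchable registries.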
Publishing to the NVO: http://www.us-vo.org/publish.cfm
- Resources are published if one can use VO facilities to find them
- Multiple layers of publishing
  - Starts with a registry description of the resource
  - Data Access Services
  - Incremental exposure for incremental effort
- Who are you? How you publish depends on what you want to publish.
  - An individual with a small data collection
  - An archive center
  - Someone with a cool service
    - Example: Extinction Correction Service
      - Developed by C. Miller and K. S. Krughoff
      - In one day of the NVO Summer School, using VO tools
Small collections: VO-ready Repositories
- Repositories that allow users to deposit data to share with the community
  - Guarantee long-term storage and availability
  - Automatic support for VO publishing mechanisms
    - Entries into the NVO Registry
    - Support for standard services: Cone Search, SIA, SSA, SkyNode
- Currently available repositories
  - Images: NCSA Astronomy Digital Image Library, http://adil.ncsa.uiuc.edu/
  - Spectra: Spectrum Services for the VO, http://voservices.net/spectrum/
- More public repositories are expected to emerge
  - Check the NVO website (http://us-vo.org/) for the latest
Persistent Archives: Tools for Federation
- Registering your resources with a public VO publishing registry
[Screenshots of the STScI Registry and NCSA Registry registration pages, including the "Choose resource type" and "Edit Form" screens.]
Persistent Archives: Tools for Federation
- Registering your resources with a VO publishing registry
  - Enter a description into the registration form at one of the available NVO registries
    - STScI/JHU Registry: http://nvo.stsci.edu/voregistry/
    - NCSA Registration Portal: http://nvo.ncsa.uiuc.edu/nvoregistration.html
    - Caltech Carnivore: http://mercury.cacr.caltech.edu:8080/carnivore/
  - If you have a large number of resources to register, you can run your own registry on your own site
    - NCSA VORegistry-in-a-Box: http://nvo.ncsa.uiuc.edu/VO/software/
    - Caltech Carnivore: http://mercury.cacr.caltech.edu:8080/carnivore/
Persistent Archives: Tools for Federation
- What can/should you register?
  - Should: your Organization
    - Declares yourself as a publisher with an ID
  - Should: your Collection
    - Users at least know how to access it via a browser
  - Can: your existing services
    - Browser-based services, e.g. a search page
    - Traditional CGI services
    - Web Services
- The next level
  - Implement and register one or more standard services
    - Cone Search
    - Simple Image Access
    - SkyNode
    - Simple Spectral Access (standard still in development)
  - NVO Summer School software package: server-side templates and toolkits, http://www.us-vo.org/summer-school/
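Of the standard services, Cone Search is the simplest to implement: a request gives a sky position (RA, DEC) and a search radius (SR), all in decimal degrees, and the service returns the catalog rows inside that cone. A minimal sketch of the server-side selection, assuming an in-memory list of rows with hypothetical `ra`/`dec` keys standing in for a real catalog database; a real service would also serialize the result as a VOTable and would use a spatial index rather than a linear scan.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance between two sky positions, in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(rows, ra, dec, sr):
    """Return the rows lying within `sr` degrees of (ra, dec)."""
    if not (0 <= ra < 360 and -90 <= dec <= 90 and sr >= 0):
        raise ValueError("RA, DEC or SR out of range")
    return [r for r in rows
            if angular_separation_deg(ra, dec, r["ra"], r["dec"]) <= sr]
```

The haversine form is used because it stays numerically stable for the small radii typical of cone searches and handles the RA = 0/360 wrap-around without special cases.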
Cool Services: Integrating with the VO
- Register your service at a registry
- Integrate support for standard VO formats and schemas
  - FITS and VOTable
    - Enable integration with existing tools, e.g. visualizers
  - Standard Data Model schemas (emerging)
    - VOResource, Space-Time Coordinates, Spectra
    - Enable integration with other services using these models
- Implement the Standard Support Interface
  - a standard in development for self-description and for tracking health and usage
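Returning results as VOTable is what lets existing tools consume a new service directly. A minimal sketch of writing a small table with the TABLEDATA encoding (VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR, TD) using the standard library; it omits the VOTable XML namespace, UCDs, units, and other metadata a production service should include, and the `version="1.1"` attribute is an assumption about which version a service would target.

```python
import xml.etree.ElementTree as ET

def to_votable(fields, rows):
    """Serialize rows as a minimal VOTable document using TABLEDATA encoding.

    `fields` is a list of (name, datatype) pairs, e.g. ("ra", "double");
    each row is a sequence of values in the same order as `fields`.
    """
    votable = ET.Element("VOTABLE", version="1.1")
    table = ET.SubElement(ET.SubElement(votable, "RESOURCE"), "TABLE")
    for name, datatype in fields:
        attrs = {"name": name, "datatype": datatype}
        if datatype == "char":
            attrs["arraysize"] = "*"  # variable-length string column
        ET.SubElement(table, "FIELD", attrs)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")

if __name__ == "__main__":
    print(to_votable([("id", "char"), ("ra", "double"), ("dec", "double")],
                     [("a", 10.0, 20.0)]))
```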
Searching the Registry
- Use a searchable registry to find data and services
- NVO has two searchable registries available
  - STScI/JHU Registry: http://nvo.stsci.edu/voregistry/
  - Caltech Carnivore: http://mercury.cacr.caltech.edu:8080/carnivore/
- Two types of searches
  - Simple keyword-based search
  - Advanced search
    - STScI/JHU: SQL-based
    - Caltech: XQuery-based
- Currently working on user-oriented improvements to the interactive interface
  - G. Greene, W. O'Mullane @ STScI
  - Help with advanced searches
  - Improved organization of returned results
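For an SQL-based registry, a keyword search can be translated into a `WHERE` clause over the stored metadata. A minimal sketch using an in-memory SQLite database; the one-table schema (`identifier`, `title`, `description`) is a hypothetical simplification, not the STScI/JHU registry's actual schema. Keywords are passed as bound parameters, so they cannot inject SQL, although `%` and `_` inside a keyword still act as `LIKE` wildcards.

```python
import sqlite3

def make_registry_db(records):
    """Load (identifier, title, description) rows into an in-memory SQLite table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE resource "
                "(identifier TEXT PRIMARY KEY, title TEXT, description TEXT)")
    con.executemany("INSERT INTO resource VALUES (?, ?, ?)", records)
    return con

def keyword_search(con, keywords):
    """Return identifiers whose title or description contains every keyword."""
    clauses, params = [], []
    for kw in keywords:
        clauses.append("(title LIKE ? OR description LIKE ?)")
        like = f"%{kw}%"
        params += [like, like]
    where = " AND ".join(clauses) or "1=1"
    sql = f"SELECT identifier FROM resource WHERE {where} ORDER BY identifier"
    # SQLite's LIKE is case-insensitive for ASCII text by default.
    return [row[0] for row in con.execute(sql, params)]
```

An advanced (ADQL-style) search would generate the same kind of `WHERE` clause from a structured query instead of a keyword list.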
Accessing the Registry from Applications
- Custom Web Service interfaces available
  - keyword and advanced search functions
  - currently used by DataScope and SkyPortal
- IVOA Standard Web Service interface
  - Full support targeted for January 2005 roll-out
  - Beta support available from Caltech Carnivore
- Available Java client software
  - Currently available via the NVO Summer School software distribution
    - Zip file: http://chart.stsci.edu/twiki/bin/view/Main/Software
    - HowTos: http://chart.stsci.edu/twiki/bin/view/Main/NVOSummerSchoolCourseNotes
  - Includes
    - Client library for the IVOA Standard search interface
    - Sample client code for both custom and standard interfaces
Curation Issues
- NVO Registries now contain over 3000 records
- Lots of problematic metadata
  - Missing information, incorrect usage, truncated values
  - Duplicates, deprecated records, missing resources
  - Broken/non-compliant services
- People need to assume responsibility for curation
  - Software can help, but is not sufficient
  - Role of Registry administrator?
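Some of the problems listed (missing information, duplicates) are mechanical enough for software to flag, leaving judgement calls to a human curator. A minimal sketch of such a check over records represented as dicts; the required field names (`identifier`, `title`, `accessURL`) are hypothetical choices for illustration, not a list taken from any registry standard.

```python
from collections import Counter

REQUIRED = ("identifier", "title", "accessURL")  # hypothetical required fields

def curation_report(records):
    """Flag missing required fields and duplicate identifiers.

    Returns a list of (record index or identifier, problem description).
    """
    problems = []
    for i, rec in enumerate(records):
        for key in REQUIRED:
            if not rec.get(key):  # absent or empty value
                problems.append((i, f"missing {key}"))
    counts = Counter(r.get("identifier") for r in records if r.get("identifier"))
    for ident, n in counts.items():
        if n > 1:
            problems.append((ident, f"duplicate identifier ({n} records)"))
    return problems
```

A check like this cannot tell whether a title is misleading or a service returns sensible data, which is why curation still needs people.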
A practical approach to Curation
- Proposal: a VerificationLevel tag attached to resource descriptions by a registry curator
  - 3 levels
    - Unverified
    - Verified by software
    - Verified by human curator
  - Tag exposed to users/apps, e.g. select only highly verified resources
  - Tag is specific to a registry; it can be overridden when harvested by another registry
- Software verification
  - NCSA is building a suite of software verifiers
  - Can be incorporated directly into registries
    - Either locally or by calling a remote web service
  - First example: Cone Search Verifier, http://nvo.ncsa.uiuc.edu/services/csvalidate.html
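The proposed VerificationLevel tag is naturally an ordered value, so applications can ask for "at least verified by software". A minimal sketch under stated assumptions: the numeric encoding (0, 1, 2) is hypothetical, and so is the policy that a harvesting registry resets the tag to its own level (defaulting to unverified) rather than trusting the source registry's value.

```python
from enum import IntEnum

class VerificationLevel(IntEnum):
    """The three proposed levels; the numeric values are a hypothetical encoding."""
    UNVERIFIED = 0
    SOFTWARE = 1   # verified by software
    HUMAN = 2      # verified by a human curator

def select_verified(records, minimum):
    """Return records whose registry-assigned level is at least `minimum`."""
    return [r for r in records if r["level"] >= minimum]

def on_harvest(record, local_level=VerificationLevel.UNVERIFIED):
    """Copy a harvested record, replacing the source registry's tag with a local one.

    Assumed policy: the tag is registry-specific, so the harvester applies
    its own level instead of inheriting the source's.
    """
    copy = dict(record)
    copy["level"] = local_level
    return copy
```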
Summary 2004
- NVO is operating production registries
  - serving end-user applications
  - greater emphasis on user interfaces
  - registry searches easily integrated into applications
  - Full release of latest improvements by January 2005
- Interoperable exchange between IVOA registries
  - Extensible Resource Metadata model
  - IVOA Registry Interface Standard is emerging
- What's next: shift from development to curation
  - Finalize RI standard
  - Address curation issues
  - No talk on registries next year