Title: IVOA's Data Integration Approach
Slide 1: IVOA's Data Integration Approach
Slide 2: Definitions
- A Virtual Observatory (VO) is a collection of interoperating data archives and software tools that use the Internet to form a scientific research environment in which astronomical research programs can be conducted. The VO consists of a collection of data centres, each with unique collections of astronomical data, software systems and processing capabilities.
- Various VO projects are funded through national and international programs, and all projects work together under the International Virtual Observatory Alliance (IVOA) to share expertise and develop common standards and infrastructures for data exchange and interoperability.
- The goal of the IVOA is to make architectural decisions and develop standards in the astronomy domain.
- NVO (National Virtual Observatory) is a VO project that is compatible with IVOA.
Slide 3: IVOA
- Data is communicated between services in two basic formats: FITS and XML.
- The IVOA architecture uses services at different levels: HTTP GET/POST services, SOAP services and Grid services.
- IVOA has an executive committee and interest groups such as the GGF Astro-RG.
Slide 4: IVOA Architecture Diagram
Slide 5: IVOA Standards
- Metadata Registries for VO
- Resource Metadata for the Virtual Observatory
- IVOA Metadata Registry Interface
- VOTable Format Definition
- Unified Content Descriptors (UCD)
- DAL Architecture
- Simple Image Access Protocol
- Simple Spectral Access Specification
- IVOA Query Language
- IVOA SkyNode Interface
- Astronomical Data Query Language (ADQL)
- VO Query Language
- Data Modeling
- A unified domain model for astronomy, for use in the Virtual Observatory
- Data model for quantity
- IVOA Observation data model
- Simple Spectral Data Model
- Simulation Data Model
Our focus area: IVOA's approach to data access and integration
Slide 6: Data Access and Integration Issues
- DAL Architecture
- Simple Image Access Protocol
- Simple Spectral Access Specification
- VOQL and SkyNode Interfaces
- IVOA Query Language
- IVOA SkyNode Interface
- Astronomical Data Query Language (ADQL)
- VO Query Language
Slide 7: Data Access Layer (DAL)
- Defines and formulates standards for uniform access to VO data that may have heterogeneous representations by different data providers.
- A family of data access services providing access to VO resources:
  - 1. Simple Image Access (SIA)
    - uniform access to image archives
    - atlas and pointed image archives
    - image cutouts, image mosaics
    - the image is returned as a FITS file or a graphics file
  - 2. Simple Spectral Access (SSA, currently being specified)
    - access to 1D spectra and SEDs
    - spectra are returned as ASCII, VOTable or FITS
  - 3. VO Query Language and IVOA SkyNode Interfaces
    - VONode
    - OpenSkyNode
    - OpenSkyQuery Portal and Protocol
Data Integration
Slide 8: 1. Simple Image Access Protocol
- A protocol for retrieving image data from a variety of astronomical image repositories through a uniform interface.
- SOAP/WSDL and HTTP GET-based Web Services implementations are defined.
- The SIA data model uses the familiar notion of an astronomical image, which generally means a 2D sky projection with a data array that is logically a regular grid of pixels, encoded as a FITS image, GIF/JPEG, etc.
- The SIA includes standardized dataset metadata such as provenance, image geometry, scale, format, position, time of observation, spectral bandpass and access information.
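As a rough illustration of the HTTP GET binding, the sketch below builds an SIA-style query URL from a position (POS, as "ra,dec" in decimal degrees), a search size (SIZE, in degrees) and a FORMAT filter. The helper name `sia_query_url` and the endpoint `http://example.org/sia` are hypothetical placeholders; the SIA specification defines the full parameter set and the response format.

```python
from urllib.parse import urlencode


def sia_query_url(base_url: str, ra: float, dec: float, size: float,
                  fmt: str = "image/fits") -> str:
    """Build an SIA-style HTTP GET query URL (hypothetical helper).

    POS is "ra,dec" in decimal degrees and SIZE is the search-box size in
    degrees; FORMAT restricts the image formats returned.
    """
    params = {"POS": f"{ra},{dec}", "SIZE": str(size), "FORMAT": fmt}
    # Keep ',' and '/' readable instead of percent-encoding them.
    query = urlencode(params, safe=",/")
    # Append with '&' if the endpoint already carries a query string.
    sep = "&" if "?" in base_url else "?"
    return f"{base_url}{sep}{query}"


# Placeholder endpoint; a real one would typically be looked up in a registry.
print(sia_query_url("http://example.org/sia", 19.5, 36.7, 0.02))
# http://example.org/sia?POS=19.5,36.7&SIZE=0.02&FORMAT=image/fits
```

Because every SIA service accepts the same parameters, a client can send the same query to many archives and only the base URL changes.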
Slide 9: 2. Simple Spectral Access Specification
- A simple query on POS, SIZE and FORMAT, as in SIA, possibly refined by spectral or time bandpass, etc. In the simplest case, the data returned could be wavelength and flux as text (for a spectrum).
- The goal of the Simple Spectral Access (SSA) specification is to define a uniform interface to spectral data, including spectral energy distributions (SEDs), 1D spectra and time series data. In contrast to 2D images, spectra are stored in a wide variety of formats, and there is no widely used standard in astronomy for representing spectral data.
- The data model for SEDs defines a set of spectra or time series, some of which may have only one or a few data points (photometry), and each of which may have different contextual metadata such as aperture, position, etc.
- Spectra are returned as ASCII, VOTable or FITS.
- SOAP/WSDL and HTTP GET-based Web Services implementations are defined.
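For the simplest case above (wavelength and flux returned as text), a client only needs to split lines into two numeric columns. The sketch below assumes a hypothetical layout: whitespace-separated columns, blank lines ignored, and lines starting with `#` treated as comments. The function name `parse_text_spectrum` and the sample data are illustrative only.

```python
def parse_text_spectrum(text: str) -> tuple[list[float], list[float]]:
    """Parse a plain-text spectrum with one 'wavelength flux' pair per line.

    Assumed (hypothetical) layout: whitespace-separated columns, blank
    lines skipped, lines starting with '#' treated as comments.
    """
    wavelengths, fluxes = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Only the first two columns are used; extra columns are ignored.
        wl, flux = line.split()[:2]
        wavelengths.append(float(wl))
        fluxes.append(float(flux))
    return wavelengths, fluxes


sample = """# wavelength flux
4000.0 1.25
4001.0 1.31
"""
wl, fx = parse_text_spectrum(sample)
print(wl, fx)  # [4000.0, 4001.0] [1.25, 1.31]
```

The lack of a common spectral format is exactly why SSA also offers VOTable and FITS: a plain-text layout like this one carries no units or contextual metadata.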
Slide 10: 3. VO Query Language and IVOA SkyNode
- Data (in Databases)
- Integration issues
- ADQL, VOQL, SkyNode, OpenSkyServer
Slide 11: Astronomical Data Query Language (ADQL)
- ADQL is based on a subset of SQL plus a region construct, with support for at least a circle (Cone Search).
- ADQL has two forms:
  - ADQL/x: an XML document conforming to the ADQL XML Schema (XSD).
  - ADQL/s: a string form based on SQL92 and conforming to the ADQL grammar. Some non-standard extensions are added.
- Extensions to SQL92 include:
  - Region specification. A region would look something like Region(CIRCLE J2000 19.5 36.7 0.02).
  - JDBC mathematical functions are allowed in ADQL.
  - XMATCH, which implies a cross-match between two or more astronomical catalogues.
  - To support XQuery as well as SQL, it will be possible to express selections and selection criteria as a simple XPath.
  - Syntax to return only the first N records from a query.
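To make the semantics of a circular region concrete, the sketch below evaluates a cone search in memory: it keeps the rows whose great-circle distance from the centre is within the radius, then optionally keeps only the first N rows. It assumes that in Region(CIRCLE J2000 19.5 36.7 0.02) the numbers are RA, Dec and radius in degrees, and that rows are dicts with `ra` and `dec` keys; both are assumptions, as are the function names. A real SkyNode would push this predicate into its database instead.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))


def circle_region(rows, ra0, dec0, radius, top=None):
    """Rows inside CIRCLE(ra0, dec0, radius); if `top` is set, only the first N."""
    selected = [r for r in rows
                if angular_separation_deg(r["ra"], r["dec"], ra0, dec0) <= radius]
    return selected if top is None else selected[:top]


rows = [
    {"id": 1, "ra": 19.5, "dec": 36.7},
    {"id": 2, "ra": 19.51, "dec": 36.7},  # about 0.008 deg from the centre
    {"id": 3, "ra": 20.0, "dec": 36.7},   # about 0.4 deg away: outside
]
print([r["id"] for r in circle_region(rows, 19.5, 36.7, 0.02)])         # [1, 2]
print([r["id"] for r in circle_region(rows, 19.5, 36.7, 0.02, top=1)])  # [1]
```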
Slide 12: Sample ADQL/x
ADQL/s might be as follows: Select a.* from Tab a where Region('Circle Cartesian 1.2 2.4 3.6 0.2'). This is represented in XML (ADQL/x) as:
<?xml version="1.0" encoding="utf-16"?>
<Select xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Selection>
    <Items>
      <SelectionItem xsi:type="ExprSelectionItem">
        <Expr xsi:type="ColumnExpr">
          <Column xsi:type="AllColumnReference">
            <TableName>a</TableName>
          </Column>
        </Expr>
      </SelectionItem>
    </Items>
  </Selection>
  <TableClause>
    <FromClause>
      <TableReference>
        <Table>
          <Name>Tab</Name>
          <AliasName>a</AliasName>
        </Table>
      </TableReference>
    </FromClause>
    <WhereClause>
      <Condition xsi:type="RegionSearch">
        <Region xmlns:q1="urn:nvo-region" xsi:type="q1:circleType">
          <q1:Center>
            <Pos3Vector xmlns="urn:nvo-coords">
              <CoordValue>
                <Value>
                  <double>1.2</double>
                  <double>2.4</double>
                  <double>2.4</double>
                </Value>
              </CoordValue>
            </Pos3Vector>
          </q1:Center>
          <q1:Radius>0.2</q1:Radius>
        </Region>
      </Condition>
    </WhereClause>
  </TableClause>
</Select>
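The mapping from ADQL/s to ADQL/x is mechanical, which the sketch below illustrates by building the same tree with Python's standard `xml.etree.ElementTree`. The helper `build_adqlx_circle` is hypothetical, and only the element names and namespaces visible in the sample are used; when serialized, ElementTree gives the `urn:nvo-coords` namespace an auto-generated prefix (such as `ns0`) instead of a default `xmlns`, which is equivalent XML.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
REGION_NS = "urn:nvo-region"
COORDS_NS = "urn:nvo-coords"
# Bind prefixes so the serialized output resembles the sample (xsi:, q1:).
ET.register_namespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")
ET.register_namespace("q1", REGION_NS)


def build_adqlx_circle(table, alias, center, radius):
    """Hypothetical helper: ADQL/x tree for
    SELECT alias.* FROM table alias WHERE Region('Circle Cartesian x y z r')."""
    root = ET.Element("Select")

    # SELECT alias.*  (an AllColumnReference on the alias)
    items = ET.SubElement(ET.SubElement(root, "Selection"), "Items")
    item = ET.SubElement(items, "SelectionItem", {XSI_TYPE: "ExprSelectionItem"})
    expr = ET.SubElement(item, "Expr", {XSI_TYPE: "ColumnExpr"})
    column = ET.SubElement(expr, "Column", {XSI_TYPE: "AllColumnReference"})
    ET.SubElement(column, "TableName").text = alias

    # FROM table alias
    clause = ET.SubElement(root, "TableClause")
    ref = ET.SubElement(ET.SubElement(clause, "FromClause"), "TableReference")
    table_el = ET.SubElement(ref, "Table")
    ET.SubElement(table_el, "Name").text = table
    ET.SubElement(table_el, "AliasName").text = alias

    # WHERE Region(...): a circle with a Cartesian centre vector and a radius
    where = ET.SubElement(clause, "WhereClause")
    cond = ET.SubElement(where, "Condition", {XSI_TYPE: "RegionSearch"})
    region = ET.SubElement(cond, "Region", {XSI_TYPE: "q1:circleType"})
    centre = ET.SubElement(region, f"{{{REGION_NS}}}Center")
    vec = ET.SubElement(centre, f"{{{COORDS_NS}}}Pos3Vector")
    value = ET.SubElement(ET.SubElement(vec, f"{{{COORDS_NS}}}CoordValue"),
                          f"{{{COORDS_NS}}}Value")
    for component in center:
        ET.SubElement(value, f"{{{COORDS_NS}}}double").text = str(component)
    ET.SubElement(region, f"{{{REGION_NS}}}Radius").text = str(radius)
    return root


tree = build_adqlx_circle("Tab", "a", (1.2, 2.4, 3.6), 0.2)
print(ET.tostring(tree, encoding="unicode"))
```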
Slide 13: VO Query Language
- The Virtual Observatory Query Language (VOQL) is an ambitious language at a higher level than ADQL. A VOQL portal would accept VOQL programs.
- Layers of VOQL:
  - VOQL1 (Web Services): ADQL and VOTable are used to exchange information between machines.
  - VOQL2 (Federation): an SQL-like query language and federation system, i.e. a combination of SkyQuery, JVOQL and VO standards.
  - VOQL3 (SkyXQuery): a future XML-based query language.
- The highest level of VOQL is a semantics-based language that allows astronomers to build queries in the language of astronomy rather than the language of databases.
Slide 14: IVOA SkyNode Interface
- The SkyNode Interface describes the minimum interface required to participate in the IVOA as a queryable VONode, as well as the requirements to be a Full OpenSkyNode, part of the OpenSkyQuery Portal.
- The OpenSkyQuery protocol drives a data service that allows querying of a relational database or a federation of databases. The request is written in a specific XML abstraction of SQL that is part of ADQL (Astronomical Data Query Language).
- The Portal formulates a plan and creates multiple queries, typically one per archive. The results are collected, joined and served to the users.
- There are two types of SkyNodes:
  - Basic SkyNode
  - Full SkyNode
Slide 15: SkyNodes (Basic / Full)
- Basic: simple ADQL/x queries.
- Full: ADQL/x and ADQL/s, performance queries, ExecPlan, XMatch and footprint.
Slide 16: OpenSkyQuery from NVO
- An example of data integration following the IVOA approach.
Slide 17: OpenSkyQuery
- A Virtual Observatory prototype application that marries Web Services technology with emerging VO standards to enable dynamic cross-matching queries between different VO-enabled archives.
Slide 18: OpenSkyQuery consists of
- Open SkyNodes
  - The basic building blocks of the federated query system. They offer core services, including some specialized search functions. They are identical Web Services; only the content of their databases differs.
- Open SkyPortal
  - The starting point for queries. Queries are divided up, organized into a sorted plan and sent off to the first node. The only thing a portal really has to do is split up a query and ship it.
- NVO Registry
  - All nodes must be registered in this registry.
Slide 19: SkyNodes
- SkyNodes are services supporting ADQL.
- Database query interfaces based on Web Services.
- They take ADQL and return data.
- The next generation of the DAL Cone Search protocol, providing federated access to distributed astronomical databases.
- The formalism for distributed astronomical DBMS queries through large-scale processing depends on the VOStore formalism.
- SkyNode and SkyServer are used interchangeably. In the case of OpenSkyServer, SkyServer is used.
Slide 20: OpenSkyPortal
- Enables OpenSkyQuery (OSQ).
- Ability to build queries using a graphical interface (OSS). OSQ includes a query builder for creating complex ADQL queries. OSQ is also integrated with VOPlot to plot query results.
- Planning and execution: the ExecPlan document
  - The portal makes a plan by asking each node for its estimate of the amount of data for the given query (Perform Queries).
  - The nodes are ordered based on this information: the one with the least data is the first to execute.
  - The ExecPlan is then sent to the first node in the plan (the one that will execute last), which passes it recursively to the other nodes.
  - The data is passed back from each node and cross-matched (XMatch). Finally, the result is passed to the portal.
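The ordering rule follows from the two statements above: the plan is forwarded starting at its first node, but the first node executes last, so the node with the least data must sit at the end of the list. A minimal sketch, assuming each node has already answered the portal's estimate request with a row count; the `NodeEstimate` type, the node names and the counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeEstimate:
    short_name: str      # the node's shortName in the registry
    estimated_rows: int  # the node's answer to the portal's estimate request


def order_plan(estimates):
    """Order nodes for the ExecPlan.

    The plan is forwarded from the first node onward and executed in
    reverse, so the node with the least data goes last in the list
    (sort by descending estimate).
    """
    return sorted(estimates, key=lambda n: n.estimated_rows, reverse=True)


estimates = [
    NodeEstimate("TWOMASS", 120_000),
    NodeEstimate("FIRST", 800),
    NodeEstimate("SDSS", 500_000),
]
plan = order_plan(estimates)
print([n.short_name for n in plan])            # ['SDSS', 'TWOMASS', 'FIRST']
print([n.short_name for n in reversed(plan)])  # execution order: FIRST first
```

Executing the smallest result set first keeps the data passed back between nodes as small as possible.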
Slide 21: Execution Overview
- After the portal has constructed the ExecPlan document, it sends it to the first node.
- The first node does not execute its section yet; instead, it passes the plan on to the next node, and this continues until the plan reaches the last node.
- The last node runs its section and returns its results to the previous node; this continues until the results reach the portal again.
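The recursive flow can be simulated in a few lines: each node first forwards the plan, then runs its own section once results come back, and cross-matches its rows with them. This is a sketch under stated assumptions: the in-memory catalogues, the row layout (`ra`, `dec`, `ids`), the hypothetical `execute`/`xmatch` names and the naive small-angle cross-match are all illustrative, not the OpenSkyQuery implementation.

```python
import math


def xmatch(local, incoming, tol_deg):
    """Naive O(n*m) positional cross-match (small-angle approximation)."""
    matched = []
    for a in local:
        for b in incoming:
            dra = (a["ra"] - b["ra"]) * math.cos(math.radians(a["dec"]))
            ddec = a["dec"] - b["dec"]
            if math.hypot(dra, ddec) <= tol_deg:
                # Keep the local position; accumulate source ids along the chain.
                matched.append({"ra": a["ra"], "dec": a["dec"],
                                "ids": a["ids"] + b["ids"]})
    return matched


def execute(plan, i, run_section, tol_deg):
    """Simulate node plan[i] handling the ExecPlan."""
    if i == len(plan) - 1:
        return run_section(plan[i])  # the last node runs its section first
    incoming = execute(plan, i + 1, run_section, tol_deg)  # forward before running
    return xmatch(run_section(plan[i]), incoming, tol_deg)


catalogues = {  # hypothetical node contents
    "A": [{"ra": 10.0, "dec": 0.0, "ids": ["A1"]},
          {"ra": 20.0, "dec": 0.0, "ids": ["A2"]}],
    "B": [{"ra": 10.0001, "dec": 0.0, "ids": ["B1"]}],
    "C": [{"ra": 10.0, "dec": 0.0002, "ids": ["C1"]},
          {"ra": 30.0, "dec": 0.0, "ids": ["C2"]}],
}
log = []


def run_section(name):
    log.append(name)  # record the order in which nodes actually execute
    return catalogues[name]


result = execute(["A", "B", "C"], 0, run_section, tol_deg=0.001)
print(log)                         # ['C', 'B', 'A']
print([r["ids"] for r in result])  # [['A1', 'B1', 'C1']]
```

The value returned from `execute(..., 0, ...)` plays the role of the final result handed back to the portal.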
Slide 22: OpenSkyQuery Architecture
[Figure: OpenSkyQuery architecture. Labels: Open SkyQuery Portal; VOQL Portal (uses only the Registry and Level 3 and Level 4 SkyNodes; a high-level language allowing seemingly uniform access to services); Clients (may use services at any level); timeline labels "End 2003" and "later"; other labels: SkyQuery, SkyQueryWebApp, LEV3, MetaData, Tables, Columns, PerformQuery, XMatch, ExecPlan, VOQLQuery.]
Slide 23: Part of the original ExecPlan document that the portal passes to the nodes
[Figure: an ExecPlan document listing the nodes and their queries.]
Slide 24: ExecPlan Document
- Two sections in the plan:
  - 1. Format
    - The specified transport for this particular plan. It is almost always VOTABLE, but may occasionally be DataSet. VOTABLE is the only format that nodes are required to support.
  - 2. PlanElements
    - An array of PlanElement objects.
    - Sorted from lowest index to highest: the node at PlanElements[0] is the first to receive the plan.
- PlanElement
  - Statement
  - Hosts (list of mirrors for that node: the serviceURL in the registry)
  - Target (shortName, from the registry, of the intended node)
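The two-section structure maps naturally onto a pair of record types. The sketch below is a hypothetical model: the class names follow the slide, but the XML element names used in `to_xml` (including the `string` children of `Hosts`) are assumptions for illustration, not the actual OpenSkyQuery schema. The statement is kept as opaque text here.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class PlanElement:
    statement: str    # the node's query (ADQL/x in OpenSkyQuery), kept as text here
    hosts: list[str]  # mirror service URLs for the node (serviceURL in the registry)
    target: str       # registry shortName of the intended node


@dataclass
class ExecPlan:
    format: str = "VOTABLE"  # transport: almost always VOTABLE, occasionally DataSet
    plan_elements: list[PlanElement] = field(default_factory=list)

    def __post_init__(self):
        if self.format not in ("VOTABLE", "DataSet"):
            raise ValueError(f"unsupported transport format: {self.format}")

    def to_xml(self) -> ET.Element:
        """Serialize using hypothetical element names mirroring the slide."""
        root = ET.Element("ExecPlan")
        ET.SubElement(root, "Format").text = self.format
        elements = ET.SubElement(root, "PlanElements")
        # Index 0 is the first node to receive the plan (and the last to execute).
        for pe in self.plan_elements:
            el = ET.SubElement(elements, "PlanElement")
            ET.SubElement(el, "Statement").text = pe.statement
            hosts = ET.SubElement(el, "Hosts")
            for url in pe.hosts:
                ET.SubElement(hosts, "string").text = url
            ET.SubElement(el, "Target").text = pe.target
        return root


plan = ExecPlan(plan_elements=[
    PlanElement("(ADQL/x for node A)", ["http://node-a.example.org/skynode"], "A"),
    PlanElement("(ADQL/x for node B)", ["http://node-b.example.org/skynode"], "B"),
])
root = plan.to_xml()
print([t.text for t in root.iter("Target")])  # ['A', 'B']
```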
Slide 25: Summary
- IVOA is not directly addressing data integration issues.
- To solve data integration problems, they do not propose any innovative architecture; instead, they plan to use:
  - SRB or NGAS for the implementation of VOSpace
  - GridFTP or a simple application of OGSA-DAI for the implementation of VOStore.
Slide 26: APPENDIX
Slide 27: Standards for accessing and querying data and metadata
- Create standard metadata attributes to describe astronomical quantities.
  - UCD: Unified Content Descriptor
- Use standard data formats for storage and transmission.
  - FITS image format
  - VOTable
- Create standard services for accessing data formats using standard metadata.
Slide 28: Conflicting Terms: SkyNode, SkyServer, VOStore, VOSpace
- VOStore and VOSpace are defined in Grid and Web Services specs. These are more generic terms than SkyNode and SkyServer.
- IVOA is still not sure about the underlying implementation of VOStore and VOSpace.
- VOStore seeks to develop a common API for managing and using remote read/write storage.
- VOSpace manages metadata and data collections and sits between the user and VOStore.
  - A minimal information management system to organize shared collections.
  - In the near term, VOSpace can function as an SRB or NGAS interoperability layer.
- SkyNodes are services supporting ADQL.
- Database query interfaces based on Web Services.
- The next generation of the DAL Cone Search protocol, providing federated access to distributed astronomical databases.
- The formalism for distributed astronomical DBMS queries through large-scale processing depends on the VOStore formalism.
- SkyNode and SkyServer are used interchangeably. In the case of OpenSkyServer, SkyServer is used.
Slide 29: Terms
- SkyNodes are VONodes serving data kept in databases.
- SkyNodes have ADQL-based SOAP interfaces returning VOTable-based results.
- The OpenSkyQuery Portal is a portal that allows access to multiple SkyNodes and enables integration of data.
- IVOA has specifications for VOStore (SkyNode) and VOSpace, but not implementation specifications.
- VOSpace manages metadata and data collections and sits between the user (portal) and VOStore.
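To show what a VOTable-based result looks like, the sketch below writes a minimal table using the VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR and TD elements. The helper `to_votable` is hypothetical; the `version` attribute is illustrative, the VOTable XML namespace is omitted for brevity, and the `ucd` values are examples of UCD labels for equatorial coordinates.

```python
import xml.etree.ElementTree as ET


def to_votable(fields, rows):
    """Hypothetical helper: serialize rows as a minimal TABLEDATA VOTable.

    fields: list of (name, datatype, ucd) tuples describing the columns.
    rows:   list of tuples, one value per field.
    """
    votable = ET.Element("VOTABLE", version="1.1")  # namespace omitted for brevity
    table = ET.SubElement(ET.SubElement(votable, "RESOURCE"), "TABLE")
    # Column metadata comes first, one FIELD per column.
    for name, datatype, ucd in fields:
        ET.SubElement(table, "FIELD", name=name, datatype=datatype, ucd=ucd)
    # Then the data itself: one TR per row, one TD per cell.
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")


xml_text = to_votable(
    [("ra", "double", "pos.eq.ra"), ("dec", "double", "pos.eq.dec")],
    [(19.5, 36.7)],
)
print(xml_text)
```

Carrying the column metadata (datatype, UCD) alongside the data is what lets a portal combine results from different SkyNodes without knowing each database's schema in advance.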
Slide 30: ADQL/s
- Sample ADQL/s querying two distributed databases.
- The FROM clause specifies which databases to use and defines aliases for them.
- The XMATCH and Region clauses are OpenSkyQuery extensions to SQL.
- The XML version of the sample ADQL/s query is called ADQL/x and is put into the Statement tag of the ExecPlan document (see Slide 21).
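As a sketch of how such a two-database query could be assembled, the hypothetical helper below writes a FROM clause with one aliased table per database, an XMATCH condition over those aliases, and a circular Region. The `NODE:Table` naming, the `XMATCH(...) < tolerance` form and the `CIRCLE J2000 ra dec radius` string are assumptions made for illustration; the exact syntax and units accepted by a given portal may differ.

```python
def xmatch_query(tables, columns, tolerance, circle):
    """Hypothetical ADQL/s builder for a cross-match over several databases.

    tables:    {alias: "NODE:Table"}; one entry per database in the FROM clause
    columns:   selected columns, e.g. ["o.objId", "t.ra"]
    tolerance: cross-match tolerance (units depend on the portal; assumed)
    circle:    (ra, dec, radius) for the Region clause (units assumed)
    """
    from_clause = ", ".join(f"{src} {alias}" for alias, src in tables.items())
    aliases = ", ".join(tables)  # dict keys are the aliases, in insertion order
    ra, dec, radius = circle
    return (f"SELECT {', '.join(columns)} FROM {from_clause} "
            f"WHERE XMATCH({aliases}) < {tolerance} "
            f"AND Region('CIRCLE J2000 {ra} {dec} {radius}')")


q = xmatch_query(
    {"o": "SDSS:PhotoPrimary", "t": "TWOMASS:PhotoPrimary"},  # hypothetical names
    ["o.objId", "t.ra"],
    3.5,
    (181.3, -0.76, 6.5),
)
print(q)
```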