Title: ChinaVO Data Access Service
1 Chinese Virtual Observatory
China-VO Data Access Service Based on OGSA
Jian Sang, National Astronomical Observatory of China
2 Outline
- VO, Grid and OGSA
- Build the catalog data service
- Build the image mosaic service
- Technical difficulties encountered
3 The Increase of Astronomical Data
4 Challenges
- The volume of data is approaching the petabyte (PB) scale.
- The data are distributed and stored in heterogeneous DBMSs running in heterogeneous host environments.
5 The VO's Goal
- The VO's initial goal is to federate existing astronomical data archives and provide standard services for manipulating these data.
HOW CAN WE REACH THIS GOAL?
Grid technology can solve the problem!
6 What Is the Grid?
- Grid technology grew out of metacomputing, but in practice the Grid is about resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations.
- It focuses on how to enable, maintain and control the sharing of resources to achieve a common goal.
7 What the Grid Offers
- Resource management protocols and services that support secure remote access to shared data and computing resources, and the co-allocation of multiple resources.
- Security solutions that support the management of credentials and policies.
- Information query protocols and services that provide configuration and status information about resources, organizations and services.
- Data management services that locate datasets and transport them between storage systems and applications.
8 What Is OGSA?
- The Open Grid Services Architecture (OGSA) represents an evolution towards a Grid system architecture based on Web services concepts and technologies.
- OGSA integrates key Grid technologies (including the Globus Toolkit) with Web services mechanisms to create a distributed system framework based on the Open Grid Services Infrastructure (OGSI).
In Grids, everything is a service.
9 The Open Grid Services Architecture
- Service orientation to virtualize resources
- From Web services (everything is a service):
  - Standard interface definition mechanisms: multiple protocol bindings, multiple implementations, local/remote transparency
- Building on the Globus Toolkit:
  - Grid service semantics for service interactions
  - Management of transient instances
  - Factory, Registry, Discovery and other services
  - Reliable and secure transport
- Multiple hosting environments: J2EE, .NET, C, ...
10 The Structure of a Grid Service
11 Grid Service Interfaces
12 Constructing the Astronomical Data Grid
- The astronomical data service is the most fundamental and important component of a Virtual Observatory.
- In terms of data sharing, the VO can be thought of as an astronomical Data Grid.
VO ≈ Astronomical Data Grid
13 Outline
- VO, Grid and OGSA
- Build the catalog data access service
- Build the image mosaic service
- Technical difficulties encountered
14 Classification of Astronomical Data Services
- Astronomical catalog service
- Image mosaic service
- Spectrum data service
- Simulation data service
15 Existing Astronomical Datasets We Have
16 Building the Catalog Data Service
- How can we federate catalog data into the VO? That is, how can we build data services on top of existing databases and programs?
17 Defining the Catalog Service Interface
Standards we use:
- Input query language: SQL (now), ADQL (planned)
- Output data format: VOTable 1.0
- Catalog resource metadata registry protocol: VOResource 0.9
Taking an ADQL query as input and returning a VOTable result keeps the service interface/API simple.
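Because every catalog service ends by returning VOTable, the output side of the interface can be sketched on its own. The following is a minimal Python sketch, assuming the plain VOTable 1.0 layout (VOTABLE/RESOURCE/TABLE with FIELD definitions and TABLEDATA rows); the helper name `rows_to_votable` is hypothetical and this is not the China-VO implementation, which sits behind a GT3 interface.

```python
import xml.etree.ElementTree as ET

def rows_to_votable(fields, rows):
    """Wrap tabular rows in a minimal VOTable 1.0 document (hypothetical helper).

    fields: list of (name, datatype) pairs, e.g. ("ra", "double").
    rows:   list of tuples, one value per field.
    """
    votable = ET.Element("VOTABLE", version="1.0")
    table = ET.SubElement(ET.SubElement(votable, "RESOURCE"), "TABLE")
    for name, datatype in fields:
        # One FIELD element per column describes its name and type.
        ET.SubElement(table, "FIELD", name=name, datatype=datatype)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")  # one TR per result row
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)  # one TD per cell
    return ET.tostring(votable, encoding="unicode")

xml = rows_to_votable([("id", "int"), ("ra", "double"), ("dec", "double")],
                      [(1, 10.68, 41.27), (2, 83.82, -5.39)])
print(xml)
```

Any client that understands VOTable can consume this output without knowing which DBMS or program produced it, which is the point of fixing the output format in the interface.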
18 How to Use Existing Databases and Programs to Create a Catalog Data Service
- How do we create a catalog data service that understands ADQL and returns results in VOTable format? We adopt two approaches:
  - Reconstruct the existing catalog DBMS.
  - Encapsulate an existing search program, such as pmm.
- CDS already provides search programs for large catalogs such as USNO-A2.0.
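19 Catalog Data Service Based on a DBMS
Figure: an ADQL query arrives through the GT3 interface -> the ADQL/SQL Translator produces SQL -> the SQL is executed over JDBC against the DBMS holding the catalog and its metadata -> the JDBC ResultSet is passed to the VOTable Wrapper -> the VOTable result is returned through the GT3 interface.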
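The DBMS-based pipeline can be sketched end to end in Python. This is an illustrative sketch, not the GT3/Java service: `sqlite3` plays the role of JDBC and the DBMS, the translator handles only one real ADQL/SQL difference (ADQL's `SELECT TOP n` versus the `LIMIT n` used by MySQL and SQLite), and the function names `adql_to_sql`, `votable_datatype` and `query_catalog` are hypothetical.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

TOP = re.compile(r"\s*SELECT\s+TOP\s+(\d+)\s+(.*)$", re.IGNORECASE | re.DOTALL)

def adql_to_sql(adql):
    """Toy ADQL/SQL translator: rewrites 'SELECT TOP n ...' into 'SELECT ... LIMIT n'.
    A real translator must also handle geometry functions, dialects, etc."""
    m = TOP.match(adql)
    if m is None:
        return adql  # nothing to translate in this toy version
    limit, rest = m.groups()
    return f"SELECT {rest.strip().rstrip(';')} LIMIT {limit}"

def votable_datatype(value):
    """Map a Python value from the result set to VOTable FIELD attributes."""
    if isinstance(value, int):
        return {"datatype": "int"}
    if isinstance(value, float):
        return {"datatype": "double"}
    return {"datatype": "char", "arraysize": "*"}

def query_catalog(conn, adql):
    """ADQL -> SQL -> DB-API cursor (JDBC's role) -> result set -> VOTable wrapper."""
    cursor = conn.execute(adql_to_sql(adql))
    rows = cursor.fetchall()
    names = [col[0] for col in cursor.description]
    sample = rows[0] if rows else [None] * len(names)  # first row decides the types
    votable = ET.Element("VOTABLE", version="1.0")
    table = ET.SubElement(ET.SubElement(votable, "RESOURCE"), "TABLE")
    for name, value in zip(names, sample):
        ET.SubElement(table, "FIELD", name=name, **votable_datatype(value))
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = "" if value is None else str(value)
    return ET.tostring(votable, encoding="unicode")

# Hypothetical in-memory catalog standing in for a reconstructed catalog DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (id INTEGER, ra REAL, decl REAL, mag REAL)")
conn.executemany("INSERT INTO catalog VALUES (?, ?, ?, ?)",
                 [(1, 10.68, 41.27, 3.4), (2, 83.82, -5.39, 4.0), (3, 201.37, -43.02, 6.8)])
xml = query_catalog(conn, "SELECT TOP 2 id, ra FROM catalog ORDER BY mag")
print(xml)
```

Keeping translation and wrapping as separate steps mirrors the figure: only the translator needs to change when the back-end DBMS speaks a different SQL dialect.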
20 Advantages and Disadvantages
- Advantages:
  - The full power of SQL can be used, so complex queries can be implemented.
  - DBMSs offer the most powerful functions for data management and maintenance.
- Disadvantages:
  - Reconstructing the databases requires a lot of work.
  - For large catalogs such as USNO-B1.0 and the 2MASS PSC, query efficiency is low.
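22 Data Service Based on a Search Program
Figure: an ADQL query arrives through the GT3 interface -> the ADQL Translator converts it into parameters -> the parameters are passed via JNI/... to the search program, which reads the catalog data files -> the program's output stream goes to the VOTable Wrapper -> the VOTable result is returned through the GT3 interface.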
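For the search-program route, the translator's job is to reduce a query to the few parameters the program accepts. The Python sketch below assumes the later ADQL 1.0 string syntax for a cone constraint (`CONTAINS(POINT(...), CIRCLE('ICRS', ra, dec, r)) = 1`) and replaces the native program (pmm or a CDS tool, called via JNI in the figure) with a hypothetical stand-in, `search_program`, that scans "id ra dec" records and streams back matches. The VOTable wrapping step is omitted.

```python
import math
import re

# ADQL-style cone constraint, e.g.
#   CONTAINS(POINT('ICRS', ra, dec), CIRCLE('ICRS', 10.7, 41.3, 0.1)) = 1
CIRCLE = re.compile(r"CIRCLE\(\s*'ICRS'\s*,\s*([-+\d.eE]+)\s*,\s*([-+\d.eE]+)\s*,"
                    r"\s*([-+\d.eE]+)\s*\)", re.IGNORECASE)

def adql_to_parameters(adql):
    """ADQL translator for a search program: only a cone search is expressible."""
    m = CIRCLE.search(adql)
    if m is None:
        raise ValueError("this search program supports positional (cone) queries only")
    return tuple(float(g) for g in m.groups())  # (ra, dec, radius) in degrees

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def search_program(ra, dec, radius, data_lines):
    """Hypothetical stand-in for the native search program: scans 'id ra dec'
    records and streams back (yields) the lines that fall inside the cone."""
    for line in data_lines:
        _, obj_ra, obj_dec = line.split()
        if angular_sep_deg(ra, dec, float(obj_ra), float(obj_dec)) <= radius:
            yield line

data_file = ["1 10.68 41.27", "2 10.70 41.30", "3 83.82 -5.39"]
params = adql_to_parameters("SELECT * FROM usno WHERE CONTAINS(POINT('ICRS', ra, dec), "
                            "CIRCLE('ICRS', 10.7, 41.3, 0.1)) = 1")
print(params, list(search_program(*params, data_file)))
```

The `ValueError` branch illustrates the disadvantage listed for this approach: a query the program cannot express, such as a statistical aggregate, has to be rejected.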
23 Advantages and Disadvantages
- Advantage: positional searches are faster than with a DBMS.
- Disadvantage: the service can offer only the search functions the program itself provides. Many programs offer only positional search, with no statistical functions.
24 Catalog Access Services We Provide
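25 How to Call a Catalog Data Service
1. Grid Client -> Resource Registry: <Find Factory>
2. Resource Registry -> Grid Client: <Factory GSH>
3. Grid Client -> Data Service Factory: <create data service>
4. The Data Service Factory creates a Data Service Instance and returns <Data service GSH> to the Grid Client
5. Grid Client -> Data Service Instance: <data request (ADQL)>
6. The Data Service Instance queries its Database and returns <result (VOTable)> to the Grid Client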
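The six-step call sequence can be mocked in-process to show who holds which handle. This Python sketch is only an illustration of the factory/instance pattern described above: the classes, the `HANDLES` dictionary (standing in for Grid Service Handle resolution) and the example GSH URL are all hypothetical, and no real GT3 API is used.

```python
import itertools

# Hypothetical stand-in for resolving a Grid Service Handle (GSH) to a service.
HANDLES = {}

class DataServiceInstance:
    """Transient data service instance bound to one catalog."""
    def __init__(self, gsh, catalog):
        self.gsh, self.catalog = gsh, catalog
        HANDLES[gsh] = self

    def query(self, adql):
        # Steps 5-6: a real instance would translate the ADQL, query its
        # database and wrap the result set as VOTable; this returns a placeholder.
        return f"VOTable result of {adql!r} from {self.catalog}"

class DataServiceFactory:
    """Persistent factory that creates data service instances on request."""
    def __init__(self, gsh, catalog):
        self.gsh, self.catalog = gsh, catalog
        self._serial = itertools.count(1)
        HANDLES[gsh] = self

    def create_service(self):
        # Steps 3-4: create a transient instance and hand back its GSH.
        gsh = f"{self.gsh}/instance-{next(self._serial)}"
        return DataServiceInstance(gsh, self.catalog).gsh

class Registry:
    """Resource registry mapping catalog names to factory GSHs."""
    def __init__(self):
        self._factories = {}

    def register(self, catalog, factory_gsh):
        self._factories[catalog] = factory_gsh

    def find_factory(self, catalog):
        # Steps 1-2: look up the factory and return its GSH.
        return self._factories[catalog]

def call_catalog_service(registry, catalog, adql):
    """Grid client side of the six-step sequence."""
    factory = HANDLES[registry.find_factory(catalog)]
    instance = HANDLES[factory.create_service()]
    return instance.query(adql)

FACTORY_GSH = "http://example.org/ogsa/services/usno-a2/factory"  # hypothetical
registry = Registry()
registry.register("USNO-A2.0", DataServiceFactory(FACTORY_GSH, "USNO-A2.0").gsh)
print(call_catalog_service(registry, "USNO-A2.0", "SELECT TOP 5 * FROM usno"))
```

Each call creates a fresh transient instance, which matches the OGSA notion of a persistent factory serving short-lived service instances.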
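26 Using Data Services to Build WWW Services for End Users
Figure: a layered architecture.
- Web Client: the end user talks to the Web server over HTTP and does not need to know where the data services are.
- Web server: provides Data Mining, Data Processing and Data Visualization Services.
- Grid Client: finds resources and services through the Resources Register and the Services Register.
- Data sources behind the data services: MySQL, Oracle 9i and files.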
27 Using Data Services to Create Other Services
- Our next task is to build a multi-wavelength cross-identification (MWCI) service based on the catalog data service.
- What is multi-wavelength cross-identification?
  - Datasets are cross-identified by positional consistency, so that we can understand objects through their properties at different wavelengths.
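Positional cross-identification reduces to an angular-distance test against an error radius. The Python sketch below uses the haversine formula and a deliberately naive pairwise scan; the function names and sample coordinates are hypothetical, and a production MWCI service would need a spatial index (for example HTM cells) rather than comparing every pair.

```python
import math

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def cross_identify(cat_a, cat_b, radius_arcsec):
    """For each (id, ra, dec) source in cat_a, list the ids in cat_b lying within
    the error radius. Naive O(N*M) scan; large catalogs need a spatial index
    so that only nearby candidates are compared."""
    return {
        id_a: [id_b for id_b, ra_b, dec_b in cat_b
               if angular_sep_arcsec(ra_a, dec_a, ra_b, dec_b) <= radius_arcsec]
        for id_a, ra_a, dec_a in cat_a
    }

# Hypothetical sources from two wavelength bands (e.g. infrared vs radio).
cat_a = [("A1", 10.0, 20.0), ("A2", 150.0, -30.0)]
cat_b = [("B1", 10.0, 20.0 + 2 / 3600), ("B2", 10.0 + 3 / 3600, 20.0),
         ("B3", 150.0, -30.0 + 20 / 3600)]
print(cross_identify(cat_a, cat_b, radius_arcsec=5.0))
```

Note that B2 is 3 arcsec away in RA but only about 2.8 arcsec away on the sky at Dec = 20°, which is why the separation must be computed on the sphere rather than from raw coordinate differences.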
28 The Steps of Multi-Wavelength Cross-Identification
- Cross-identify datasets from different wavelengths within an error radius.
- Divide the cross-identification results into three situations: one-to-one, one-to-two and one-to-many.
- Choose the one-to-one entries for data mining.
- The other two situations need statistical analysis to determine which sources are the true counterparts.
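The classification step takes the output of a cross-match (each source mapped to its candidate counterparts) and sorts it by the number of candidates. A minimal Python sketch with a hypothetical `classify_matches` function; sources with no counterpart fall outside the three situations above and are kept in a separate "no-match" group as an assumption.

```python
from collections import defaultdict

def classify_matches(matches):
    """Group cross-identification results by number of counterparts.

    matches: dict mapping a source id to the list of candidate ids
    found within the error radius.
    """
    groups = defaultdict(dict)
    for source, candidates in matches.items():
        n = len(candidates)
        if n == 0:
            label = "no-match"      # not one of the three situations; kept aside
        elif n == 1:
            label = "one-to-one"    # goes straight to data mining
        elif n == 2:
            label = "one-to-two"    # needs statistical analysis
        else:
            label = "one-to-many"   # needs statistical analysis
        groups[label][source] = candidates
    return dict(groups)

groups = classify_matches({"A1": ["B1"], "A2": ["B2", "B3"],
                           "A3": ["B4", "B5", "B6"], "A4": []})
for_mining = groups.get("one-to-one", {})
print(for_mining)
```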
29 Requirements
- Locate the datasets users want to use (dataset discovery).
- Cross-match datasets held in heterogeneous DBMSs at different locations effectively and efficiently.
- Find storage resources to store the results.
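30 Figure: architecture of the MWCI service
- User Application
- Registry
- MWCI Service Provider: MWCI Factory and MWCI service instance
- Data Services for the catalogs to be matched (2MASS, NVSS, ...)
- Storage Service Provider: storage Factory and storage
The original figure numbers the interactions between these components 1 to 7.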
31 Outline
- VO, Grid and OGSA
- Build the catalog data access service
- Build the image mosaic service
- Technical difficulties encountered
32 Building the Image Mosaic Service
- We used DSS-I sky images to build our first image mosaic service.
33 Definition of the Service Interface
- Input parameters: 1. RA, 2. Dec, 3. image height, 4. image width
- Transport protocol: GridFTP
- Output data format: FITS
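Before any image is cut, the service has to turn the four input parameters into a sky region. The Python sketch below is a hypothetical request object; its units are an assumption (RA/Dec in degrees, height/width in arcminutes), since the interface above does not state them. It uses the small-field approximation in which the RA half-width is stretched by 1/cos(Dec).

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class MosaicRequest:
    """Mosaic request parameters. ASSUMED units: RA/Dec in degrees,
    image height/width in arcminutes (not stated in the interface)."""
    ra: float
    dec: float
    height: float
    width: float

    def __post_init__(self):
        if not (0.0 <= self.ra < 360.0 and -90.0 <= self.dec <= 90.0):
            raise ValueError("RA must be in [0, 360) and Dec in [-90, 90] degrees")
        if self.height <= 0 or self.width <= 0:
            raise ValueError("image height and width must be positive")

    def bounds(self):
        """Approximate (ra_min, ra_max, dec_min, dec_max) in degrees for a small field.
        RA limits wrap at 0/360, so ra_min > ra_max means the field straddles RA = 0."""
        half_h = self.height / 120.0  # arcmin -> degrees, halved
        half_w = self.width / 120.0
        dec_min = max(-90.0, self.dec - half_h)
        dec_max = min(90.0, self.dec + half_h)
        cos_dec = math.cos(math.radians(self.dec))
        if dec_max >= 90.0 or dec_min <= -90.0 or half_w / cos_dec >= 180.0:
            return 0.0, 360.0, dec_min, dec_max  # field touches a pole: all RA
        half_ra = half_w / cos_dec  # RA degrees shrink on the sky by cos(Dec)
        return (self.ra - half_ra) % 360.0, (self.ra + half_ra) % 360.0, dec_min, dec_max

print(MosaicRequest(ra=180.0, dec=60.0, height=30.0, width=30.0).bounds())
```

At Dec = 60° a 30-arcminute-wide field spans about 1° of RA, which is why the RA range cannot simply be width/2 either side of the centre.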
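34 Realization of the DSS-I Image Mosaic Service
Figure: the input parameters arrive through the GT3 interface -> they are passed via JNI/... to the GetImage program -> GetImage builds a FITS file from the DSS-I image files -> the FITS file is delivered by GridFTP.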
35 Outline
- VO, Grid and OGSA
- Build the catalog data access service
- Build the image mosaic service
- Technical difficulties encountered
36 Technical Difficulties
- Service/resource registry and discovery
- ADQL-to-SQL (ADQL2SQL) translation
- Protocol shortcomings
37 Protocol Shortcomings
- Shortcomings of the VOTable 1.0 protocol:
  - 1. How to encapsulate the result of a join query
  - 2. No standard for encapsulating spectrum data
  - 3. The definition of the FIELD element is not strict enough and is incomplete
- Shortcomings of UCD:
  - 1. A UCD cannot express concrete meaning; e.g. for ERROR, the error of what?
  - 2. The UCD list is incomplete; e.g. HTMID has no UCD
- Lack of a standard for units
38 Thank You
Q&A
www. .org
39 Catalogs Provided by Our Catalog Service
40 The Steps of Calling a Data Service
41 Transparencies for Astronomical Data Access
- Heterogeneity transparency
- Name transparency
- Distribution transparency
42 What Is a Grid Service?
43 What Is the Data Grid?
- Data Grid: a dynamic logical namespace that enables coordinated sharing of heterogeneous distributed storage resources and digital entities, based on local and global policies, across administrative domains in a virtual enterprise.
- A Data Grid provides:
  - A logical namespace for location-independent identifiers
  - Abstractions for storage repositories, information repositories and access APIs
  - Latency management
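A logical namespace decouples what a dataset is called from where its copies live. The toy Python replica catalog below illustrates that idea under stated assumptions: the class name, the `vo://example/...` logical names, the site names and the paths are all hypothetical, and "prefer a nearby site" is only a crude stand-in for real latency management.

```python
from dataclasses import dataclass, field

@dataclass
class LogicalNamespace:
    """Toy replica catalog: location-independent logical names -> physical
    replicas, each a (storage site, path) pair."""
    replicas: dict = field(default_factory=dict)

    def register(self, logical_name, site, path):
        # One logical name may have several physical copies.
        self.replicas.setdefault(logical_name, []).append((site, path))

    def resolve(self, logical_name, preferred_sites=()):
        """Return one physical location, trying preferred sites first
        (a crude stand-in for latency management)."""
        locations = self.replicas.get(logical_name)
        if not locations:
            raise KeyError(f"no replica registered for {logical_name!r}")
        for site in preferred_sites:
            for location in locations:
                if location[0] == site:
                    return location
        return locations[0]  # fall back to the first registered replica

ns = LogicalNamespace()
ns.register("vo://example/dss1/plate-001", "site-a", "/data/dss1/p001.fits")
ns.register("vo://example/dss1/plate-001", "site-b", "/mirror/dss1/p001.fits")
print(ns.resolve("vo://example/dss1/plate-001", preferred_sites=["site-b"]))
```

Clients only ever hold the logical name, so replicas can be added, moved or retired without changing any client code.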
44 Using a Data Grid in the Abstract
Figure: Data Grid
- The user asks the data grid for data.