Title: VO Query Language
1VO Query Language
- GSFC XML Group
- Ed Shaya
- Brian Thomas
- Kirk Borne
2VOQL Requirements
- Provide a means for users to submit general requests for astronomical information from a distributed set of repositories.
- Allow for the science use cases.
- Easy to learn and use
- Hide from the user obvious but tedious steps
- May require several levels of language, with only the top level being easy.
- Allow for web form entry.
- Independent of internal arrangement of data at repositories.
- Plug-n-play metadata and ontology.
- Span a distributed set of heterogeneous services.
- Each VO query can transform to multiple queries in local dialects.
- Workflow of interactions between registries, services, and user.
- Integration of multiple responses
3More VOQL Requirements
- Easy to parse and transform into other forms
- Extensible
- Sites can extend query language through local namespaces
- VO namespace can add language elements in the future.
4XML Query Language
- Compatible XML and Human-Readable versions
- XQuery is a superset of XPath
- Based on Quilt, XQL, and XML-QL
- Quilt is based on Object Query Language (OQL)
- OQL is based on Structured Query Language (SQL)
- If-then-else, case, switch; basic functions; define new functions
- FLWR (for, let, where, return)
- for $i in (1 to 3)
- let $j := (1 to $i)
- Results in
- i = 1, j = 1
- i = 2, j = (1,2)
- i = 3, j = (1,2,3)
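The difference between "for" (iterate over a sequence) and "let" (bind a whole sequence to one name) in the example above can be mirrored in a few lines of Python. This is only an illustrative sketch; the function name `flwr_bindings` is an assumption and not part of XQuery.

```python
# Mirror of the FLWR fragment: "for" iterates, "let" binds a whole sequence.
def flwr_bindings(n):
    """Return (i, j) pairs where j is the sequence (1 to i), for i in (1 to n)."""
    bindings = []
    for i in range(1, n + 1):        # for i in (1 to n)
        j = list(range(1, i + 1))    # let j be the whole sequence (1 to i)
        bindings.append((i, j))
    return bindings


for i, j in flwr_bindings(3):
    print(f"i = {i}, j = {j}")
```

Each pass of the loop yields one value of i together with the complete sequence bound to j, which is the pattern the slide's results list shows.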
5XQuery Continued
- for $s in document('bright_stars.xml')//id_main
- let $b := document('photometry.xml')//star[name = $s]/band
- where count($b) gt 1
- return
- <colors>
- <starName>{$s}</starName>
- for $j in (2 to count($b))
- <color name="{$b[$j]/@name} - {$b[$j-1]/@name}">
- {$b[$j]/value - $b[$j-1]/value}
- </color>
- </colors>
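The same query logic (for each bright star with more than one band, form colors from adjacent bands) can be sketched in Python with the standard library's ElementTree. The document layout below, with `star`, `name`, `band` and `value` elements, and the sample stars and magnitudes are assumptions made for illustration; the slide does not show the actual files.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for bright_stars.xml and photometry.xml.
BRIGHT_STARS = "<stars><id_main>Star A</id_main><id_main>Star B</id_main></stars>"
PHOTOMETRY = """
<photometry>
  <star><name>Star A</name>
    <band name="B"><value>1.25</value></band>
    <band name="V"><value>1.0</value></band>
  </star>
  <star><name>Star B</name>
    <band name="V"><value>2.5</value></band>
  </star>
</photometry>
"""


def colors(bright_xml, phot_xml):
    """Map each star with at least two bands to its adjacent-band colors."""
    bright = ET.fromstring(bright_xml)
    phot = ET.fromstring(phot_xml)
    result = {}
    for s in bright.iter("id_main"):                       # the "for" clause
        bands = [b for star in phot.iter("star")
                 if star.findtext("name") == s.text
                 for b in star.findall("band")]            # the "let" clause
        if len(bands) > 1:                                 # the "where" clause
            result[s.text] = {
                f"{cur.get('name')} - {prev.get('name')}":
                    float(cur.findtext("value")) - float(prev.findtext("value"))
                for prev, cur in zip(bands, bands[1:])
            }
    return result


print(colors(BRIGHT_STARS, PHOTOMETRY))
```

Star B is dropped because it has only one band, which corresponds to the count test in the where clause.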
7OLAP/XMLA
- Online Analytical Processing
- Reduces the bandwidth and time needed to get data out
- Statistical package add-on to databases
- Analysis of DataCubes
- Hierarchy of Axis Values
- Years, Months, Days, Hours, minutes
- Degrees, minutes, seconds
- Interior, core, mantle, atmosphere, mesosphere,
exosphere
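A roll-up along one of these hierarchies is the basic data-cube operation: totals are computed where the data lives and only the summary is returned, which is how bandwidth is saved. The following is a minimal sketch using the years/months/days axis; the record layout and the observation counts are hypothetical.

```python
from collections import Counter

# Hierarchy of axis values, coarsest level first.
LEVELS = ("year", "month", "day")


def roll_up(records, level):
    """Sum 'count' over all records that share axis values down to `level`."""
    depth = LEVELS.index(level) + 1
    totals = Counter()
    for rec in records:
        key = tuple(rec[name] for name in LEVELS[:depth])
        totals[key] += rec["count"]
    return dict(totals)


# Hypothetical observation counts per day.
OBSERVATIONS = [
    {"year": 2001, "month": 1, "day": 5, "count": 4},
    {"year": 2001, "month": 1, "day": 9, "count": 6},
    {"year": 2001, "month": 2, "day": 1, "count": 3},
    {"year": 2002, "month": 7, "day": 4, "count": 2},
]

print(roll_up(OBSERVATIONS, "year"))   # one total per year
print(roll_up(OBSERVATIONS, "month"))  # one total per (year, month)
```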
8JVO Query Language (Naoki Yasuda)
- Retrieves catalog data and images from multiple data servers via a single user interface
- Extension of SQL
- Catalog.UCD
- Box(Point(c1.ra,c1.dec), width1,height1)
- XMATCH(c1, c2, !c3, ...) < 3 arcsec
- Select Catalog Keyword1 Keyword2
- Select by MAX|MIN(PROPERTY) ALL NAME
- Area inside|outside area0
- Area1 overlap|union area2 shape
- SHAPE: box, circle, oval, triangle, point
- DIFF(x.obs_date, y.obs_date) > 30 days
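What a positional cross-match such as the XMATCH condition has to compute can be sketched as a brute-force angular-separation test. This is not the JVO implementation; the function names, the dictionary layout of a catalog row, and the sample positions are assumptions, and the exclusion form (`!c3`) is not modelled.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcsec of two positions given in degrees (haversine)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0


def xmatch(cat1, cat2, radius_arcsec=3.0):
    """Return (id1, id2) for every pair closer than the radius. O(N*M) scan."""
    return [(a["id"], b["id"])
            for a in cat1 for b in cat2
            if separation_arcsec(a["ra"], a["dec"], b["ra"], b["dec"]) < radius_arcsec]


# Hypothetical catalog rows (degrees).
C1 = [{"id": "a1", "ra": 10.0, "dec": 20.0}]
C2 = [{"id": "b1", "ra": 10.0, "dec": 20.0005},   # 1.8 arcsec away
      {"id": "b2", "ra": 10.0, "dec": 20.01}]     # 36 arcsec away

print(xmatch(C1, C2))
```

A production service would use a spatial index rather than comparing every pair, but the acceptance test per pair is the same.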
9Data mining
- Beyond finding data: intense data filtering, conditioning, knowledge synthesis.
- Grid Services?
- Principal Component Analysis
- Iterative solutions
- Genetic algorithms
- Maximum-likelihood functions
- Neural nets
- Decision trees
- Cluster analysis
- Regression analysis
10Data Objects
- Dataset
- Tables
- Fields
- Units
- Class (UCD)
- Range
- Values
- Images
- Axes
- Coordinate Maps
- Data Values
- Spectra
- Wavelength
- Intensity
11ADQL
- Obtain Data Sets
- By bibliographic query
- Author, date published, title, journal, volume
- By description
- Keywords, abstract, mission name
- Obtain tables
- By title, table, field names
- By XPath
- /LocalGroup/galaxyM31/region7/v-band
- Obtain table data by UCDs or field names
- Min/max of range, regular expression
- Obtain N-cube data
- Subset by axis values
- Subset by RA, Dec, radius or, more generally, Func(axes1..)
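Selecting table data by UCD with a min/max range or a regular expression, as listed above, might look like the following sketch. The field list, the rows, and the function name `select_by_ucd` are hypothetical, and the UCD strings are used only as illustrative labels.

```python
import re

# Hypothetical field metadata and rows for one table.
FIELDS = [
    {"name": "Name", "ucd": "ID_MAIN"},
    {"name": "Vmag", "ucd": "PHOT_JHN_V"},
]
ROWS = [
    {"Name": "S1", "Vmag": 12.5},
    {"Name": "S2", "Vmag": 13.7},
    {"Name": "T3", "Vmag": 14.9},
    {"Name": "S4", "Vmag": 15.2},
]


def select_by_ucd(fields, rows, ucd, lo=None, hi=None, pattern=None):
    """Return values of the column tagged `ucd` that pass the optional filters."""
    column = next(f["name"] for f in fields if f["ucd"] == ucd)
    selected = []
    for row in rows:
        value = row[column]
        if lo is not None and value < lo:
            continue
        if hi is not None and value > hi:
            continue
        if pattern is not None and not re.search(pattern, str(value)):
            continue
        selected.append(value)
    return selected


print(select_by_ucd(FIELDS, ROWS, "PHOT_JHN_V", lo=13, hi=15))
print(select_by_ucd(FIELDS, ROWS, "ID_MAIN", pattern=r"^S"))
```

The caller never names the local column; only the UCD is needed, which is the point of querying by UCD rather than by field name.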
12Astronomy Data Query Language (ADQL)
13ADQL/Query Schema
14Knowledge Based Query
- Class -> Instance -> Objects
- Property (V-band) -> Instance -> value (-1.4)
- Measurement property: values are data
- Modifier (aperture) -> Instance -> value (3 arcsec)
- Modifier (inequality) -> Instance -> value (before, not)
- Aggregate property: member, region, component
- Values are bags of objects
- SubclassOf property: subclass has restricted property value range or restricted list of properties.
- Property Space: N properties form a space.
- A bit of math is needed to relate values.
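The distinction between a property value and the modifiers that qualify it can be made concrete with a small data structure. This is a sketch only; the class names are assumptions, and the second measurement is invented to show why the aperture modifier matters.

```python
from dataclasses import dataclass, field


@dataclass
class Modifier:
    name: str    # e.g. "aperture"
    value: str   # e.g. "3 arcsec"


@dataclass
class PropertyInstance:
    name: str                                     # e.g. "V-band"
    value: float                                  # the measured datum
    modifiers: list = field(default_factory=list)

    def modifier(self, name):
        """Return the value of the named modifier, or None if absent."""
        for m in self.modifiers:
            if m.name == name:
                return m.value
        return None


@dataclass
class ObjectInstance:
    cls: str                                      # e.g. "star"
    properties: list = field(default_factory=list)

    def values(self, prop_name, **required):
        """Property values whose modifiers match every required name=value pair."""
        return [p.value for p in self.properties
                if p.name == prop_name
                and all(p.modifier(k) == v for k, v in required.items())]


star = ObjectInstance("star", [
    PropertyInstance("V-band", -1.4, [Modifier("aperture", "3 arcsec")]),
    PropertyInstance("V-band", -1.3, [Modifier("aperture", "10 arcsec")]),  # hypothetical
])
print(star.values("V-band", aperture="3 arcsec"))
```

Without the modifier, the two V-band values are indistinguishable; with it, a query can ask for exactly one of them.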
15Problem Statement Language Root
16PSL Constraint
17PSL AstroObject
18Dataset Schema
<dataset subject="astronomy">
  <title>AC 2000.2: The Astrographic Catalogue on the Hipparcos System</title>
  <altname type="ADC">1275</altname>
  <altname type="CDS">I/275</altname>
  <altname type="brief">The AC 2000.2 Catalogue</altname>
  <references type="source">
    <reference>
      <title>AC 2000.2: The Astrographic Catalogue on the Hipparcos System</title>
      <author><initial>S</initial><initial>E</initial><lastName>Urban</lastName></author>
      <author><initial>T</initial><initial>E</initial><lastName>Corbin</lastName></author>
      <author><initial>G</initial><initial>L</initial><lastName>Wycoff</lastName></author>
      <author><initial>E</initial><lastName>Hoeg</lastName></author>
      <author><initial>C</initial><lastName>Fabricius</lastName></author>
      <author><initial>V</initial><initial>V</initial><lastName>Makarov</lastName></author>
      <journal>
        <name>Astron. J.</name><volume>115</volume><pageno>1212</pageno>
        <date><year>1998</year></date>
        <bibcode>1998AJ....115.1212U</bibcode>
      </journal>
    </reference>
  </references>
19Dataset Continued
  <keywords xml:base="http://adc.gsfc.nasa.gov/keywordLists/adc/" parentListURL="adc_keywordList.html">
    <keyword xlink:href="kw_p.html#Positional_data">Positional data</keyword>
    <keyword xlink:href="kw_a.html#Astrographic_zones">Astrographic zones</keyword>
    <keyword xlink:href="kw_s.html#Surveys">Surveys</keyword>
  </keywords>
  <descriptions>
    <description>
      <para>The AC 2000.2 is a revised version of the 1997 release of the AC 2000 (Cat. &lt;I/247&gt;). It was decided that the availability of an improved reference catalogue and the inclusion of photometry from the Tycho-2 catalogue would be sufficient to warrant a complete re-reduction of the data and a new distribution of the catalogue. The AC 2000.2 catalog contains positions of 4,621,751 stars at the average epoch of plate exposures for each star (average 1907).</para>
    </description>
20Case Study 0 Setting up the Query
- Return RA, Dec, Vmag for stars with 13 < Vmag < 15 and 101253.5 < RA < 131343 and 183800 < DE < 184000.
- PSL
- <object class="star">
- <property name="Vmag">
- <range min="13" max="15"/>
- <value>?vmag</value>
- </property>
- <property name="RA">
- <range min="101253.5" max="131343"/>
- <value>?ra</value>
- </property>
- <property name="DE">
- <range min="183800" max="184000"/>
- <value>?de</value>
- </property>
- </object>
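One way a service could read such a PSL request is to parse it into a list of per-property constraints. The sketch below assumes the request is well-formed XML with quoted attribute values, and keeps the range limits as strings because the slide does not state the coordinate format. The function name `parse_psl` is hypothetical.

```python
import xml.etree.ElementTree as ET

PSL_REQUEST = """
<object class="star">
  <property name="Vmag"><range min="13" max="15"/><value>?vmag</value></property>
  <property name="RA"><range min="101253.5" max="131343"/><value>?ra</value></property>
  <property name="DE"><range min="183800" max="184000"/><value>?de</value></property>
</object>
"""


def parse_psl(text):
    """Return (object class, list of constraints) from a PSL object element."""
    root = ET.fromstring(text)
    constraints = []
    for prop in root.findall("property"):
        rng = prop.find("range")
        constraints.append({
            "name": prop.get("name"),
            "min": rng.get("min"),                        # kept as text
            "max": rng.get("max"),
            "var": prop.findtext("value").lstrip("?"),    # "?vmag" -> "vmag"
        })
    return root.get("class"), constraints


obj_class, constraints = parse_psl(PSL_REQUEST)
print(obj_class, constraints)
```

The variables marked with "?" are the values the query asks to have returned, so they are recorded alongside each constraint.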
21Case Study 0 Mapping Query to Metadata
- Search for tables with metadata that satisfy
- Object/class=star search-> keyword, description
- Property@name=Vmag search-> field/UCD, name
- Property@name=RA search-> field/UCD, name
- Property@name=DE search-> field/UCD, name
- Property/range search-> field/min and field/max or coverage attributes
- For all such tables, return
- ?vmag, ?ra, ?de
- Also, return group/field@name=error for group with Vmag info.
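The mapping step can be sketched as a filter over table metadata: a table qualifies if its keywords mention the object class and every requested property is a field whose min/max coverage overlaps the requested range. The table entries, the decimal-degree ranges, and the function names are all hypothetical.

```python
# Hypothetical table metadata: keywords plus (min, max) coverage per field.
TABLES = [
    {"name": "table_a", "keywords": ["star", "photometry"],
     "fields": {"Vmag": (2.0, 16.0), "RA": (0.0, 360.0), "DE": (-90.0, 90.0)}},
    {"name": "table_b", "keywords": ["star"],
     "fields": {"Vmag": (16.5, 22.0), "RA": (0.0, 360.0), "DE": (-90.0, 90.0)}},
    {"name": "table_c", "keywords": ["galaxy"],
     "fields": {"Vmag": (10.0, 20.0), "RA": (0.0, 360.0), "DE": (-90.0, 90.0)}},
]


def overlaps(field_range, query_range):
    """True if the two closed intervals share at least one point."""
    return field_range[0] <= query_range[1] and field_range[1] >= query_range[0]


def matching_tables(tables, obj_class, query_ranges):
    """Names of tables whose metadata can satisfy the class and every range."""
    matches = []
    for table in tables:
        if obj_class not in table["keywords"]:
            continue
        fields = table["fields"]
        if all(name in fields and overlaps(fields[name], rng)
               for name, rng in query_ranges.items()):
            matches.append(table["name"])
    return matches


# Illustrative ranges in magnitudes and decimal degrees.
QUERY = {"Vmag": (13.0, 15.0), "RA": (150.0, 200.0), "DE": (18.6, 18.7)}
print(matching_tables(TABLES, "star", QUERY))
```

Only metadata is consulted here; no table rows are read until the candidate tables are known.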
22Problem Statement Language (PSL)
PSL pull-down: AndConstraints, AndProperties
Property Name pull-down: Name, Class, etc.
MathML pull-down: +, -, /, *, sum, avg, <, >, etc.
- Begin Request
- Constraint: Find astronomical objects with the following properties
- AND these properties
- 1. Name: assign to var1
- 2. Class is "cluster of galaxies galaxy cluster"
- 3. Measurement quantities satisfy
- a. X-ray brightness > 3.3E7 Jy: assign to var2
- 1. Time interval of measurement: 1998Y-1999Y
- Using the above variables, satisfy the math formulae
- 1. (var2 var3) < (var1 logvar4)
- OR these constraints
- several constraints for which one must be true, etc.
- Return a table with the following sequence of fields: var1, var2
- End Request
23Brian Thomas' Infrastructure
24Tony Linde's Infrastructure
- VO activity
- User
- Problem Assistant: service to help user state the problem
- Ontology: terms and relationships derived from existing data
- Workflow: to retrieve data, merge it, analyze it, reduce it
- Registry: lists all services and their high-level metadata
- Job Control: decides which jobs and when
- Data Centre: receiver of query for all internal data sources
- Data Source Service: uses translator to restate query
- Translator: from data query language to implemented service
- Languages
- Problem Statement Language (PSL)
- Workflow Language (WFL)
- Astronomical dataset Query Language (ADQL)
- Ontology Query Language (OQL)
- Registry Query Language (RQL)
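The translator role, restating one generic query in each data source's own terms, can be sketched as follows. The data-centre names, table names, and column mappings are hypothetical, and the SQL is built by plain string formatting for illustration only; a real service would use bound parameters. Note that BETWEEN is inclusive at both ends.

```python
def translate(select, ranges, source):
    """Render a generic range query as SQL text for one data source."""
    columns = source["columns"]                 # generic name -> local column
    select_list = ", ".join(columns[name] for name in select)
    where = " AND ".join(
        f"{columns[name]} BETWEEN {lo} AND {hi}"
        for name, (lo, hi) in ranges.items()
    )
    return f"SELECT {select_list} FROM {source['table']} WHERE {where}"


# Hypothetical per-source mappings held by each translator.
SOURCES = {
    "centre_a": {"table": "stars",
                 "columns": {"RA": "ra_deg", "DE": "dec_deg", "Vmag": "v_mag"}},
    "centre_b": {"table": "PHOT_CAT",
                 "columns": {"RA": "RAJ2000", "DE": "DEJ2000", "Vmag": "VMAG"}},
}

queries = {name: translate(["RA", "DE", "Vmag"], {"Vmag": (13, 15)}, src)
           for name, src in SOURCES.items()}
for name, sql in queries.items():
    print(name, "->", sql)
```

One generic request produces a different statement per source, while the caller works only with the shared names.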
25Conclusion
- Metadata should clearly distinguish between values that are property values and those that are modifiers of properties.
- Then, a mapping from a natural(ish) scientific knowledge-based language (PSL) to a request language for data-center common items (ADQL) is possible.
- A federated system with a VO-wide vocabulary plus specialized (local) namespaces is best for getting started right away and permitting evolution.