Title: AstroGrid Datacenters


1
AstroGrid Datacenters
  • AstroGrid Consortium Review
  • Dec 2004
  • Martin Hill (AstroGrid_at_ROE)

2
Outline
  • Challenge
  • Approach
  • Developed
  • Storepoints
  • Describing data
  • Query Language
  • Status
  • Versioning
  • Software: Publishers' AstroGrid Library (PAL)

3
Problem / Challenge Outline
  • Large datasets (to Petabytes)
  • So?
  • Distributed Science comes from combining
  • Bandwidth rising slower than
  • No/few established suitable standards
    • FITS images/tables. Ambiguous headers.
      Ambiguous subformat, eg spectra.
    • VOTable introduced. Ambiguous subformat, eg
      spectra vs catalogue. Verbose.
  • No/few established common terms
  • Involves Scientists

4
Approach: Publishers' AstroGrid Library (PAL)
  • General solution to
  • Discover problems faced, accumulate solutions in
    software
  • Experimentally publish sets and types (not host).
  • Many smaller datasets owned by people without web
    skills (eg solar), so:
    • Need 'easy'/unskilled installation
    • Able to proxy; 3rd parties can publish data
      without requiring more work from owner (eg
      VizieR, Trace)
    • Free website, range of standard interfaces
  • Danger: too general (any query against any
    dataset producing any results).

5
Existing Solutions
  • Common task: publish RDBMSs to web
  • Accumulated tools & skill-sets
  • No combined solution offering:
    • Standard interface (eg query language)
    • Scientific values (errors, units)
    • Spatial querying (common)
    • VO Metadata for query and results

6
Developing Standards
  • Resource metadata
  • Query language (ADQL/s, ADQL/x)
  • Web interfaces
  • Working beyond standards
  • -> Feeding research to IVOA
  • Parallel development
    • In the VO: eg Starlink, NVO, VizieR
    • External: SRB, Taverna, GridPP monitor
  • Convergence

7
Protocols & Interfaces
  • Human: web pages
  • SOAP
    • Toolkit incompatibilities
    • Streaming awkward (via toolkits)
    • Longer term benefits?
  • Raw HTTP POST (eg servlets, CGI)
    • Simpler
    • More existing skills amongst astronomers
  • Mixed (eg SIAP, SkyNode)
  • -> Don't choose: implement
  • Mix & Match, Plug & Play
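
As a rough illustration of the "raw HTTP" style of interface, the sketch below builds a cone-search style request URL from plain key/value parameters. The endpoint address is hypothetical, and the RA/DEC/SR parameter names (decimal degrees) are taken from the NVO cone search convention; nothing is sent over the network.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not a real service address).
BASE_URL = "http://datacenter.example.org/pal/cone"


def cone_search_url(ra_deg, dec_deg, radius_deg, base_url=BASE_URL):
    """Build a request URL carrying a position and search radius in decimal degrees."""
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("RA must be in [0, 360)")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("Dec must be in [-90, 90]")
    if radius_deg < 0.0:
        raise ValueError("radius must be non-negative")
    # Plain key/value pairs: no SOAP envelope or toolkit is needed on either side.
    params = {"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg}
    return base_url + "?" + urlencode(params)


url = cone_search_url(180.0, -30.5, 0.1)
print(url)
```

Any client that can open a URL can call such an interface, which is the "more existing skills" argument made on the slide.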

10
Releasing
  • Deploy early, even if temporarily
  • Independent & Integrated Access
  • Versioning
    • Servers & clients, ie new clients can still use
      old servers, and new servers work with old
      clients.
    • Add and deprecate, don't change
    • Delete intelligently
    • (Quickly remove unused interfaces, eg CEA if CEA
      upgrades, JSPs)
  • Need hosts
  • Hosts need hardware
  • Publishers need to know their data
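
The versioning rule on this slide (add and deprecate, never change) could be sketched as follows. This is a minimal illustration, not the PAL implementation: the `InterfaceRegistry` class and the two "cone" handlers are hypothetical names invented for the example.

```python
import warnings


class InterfaceRegistry:
    """Keeps every published interface version; old ones are deprecated, not changed."""

    def __init__(self):
        self._handlers = {}       # (name, version) -> callable
        self._deprecated = set()  # (name, version) pairs flagged for later removal

    def add(self, name, version, handler):
        key = (name, version)
        if key in self._handlers:
            # Changing a published version would break old clients.
            raise ValueError(f"{name} v{version} already published; add a new version")
        self._handlers[key] = handler

    def deprecate(self, name, version):
        if (name, version) not in self._handlers:
            raise KeyError((name, version))
        self._deprecated.add((name, version))

    def call(self, name, version, *args):
        key = (name, version)
        if key in self._deprecated:
            warnings.warn(f"{name} v{version} is deprecated", DeprecationWarning, stacklevel=2)
        return self._handlers[key](*args)


registry = InterfaceRegistry()
registry.add("cone", 1, lambda ra, dec, sr: {"ra": ra, "dec": dec, "sr": sr})
registry.add("cone", 2, lambda ra, dec, sr: {"pos": (ra, dec), "radius": sr, "units": "deg"})
registry.deprecate("cone", 1)  # old clients still work, but are warned
```

Old clients keep calling version 1 and receive a warning, while new clients use version 2; deleting version 1 then becomes a separate, deliberate step.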

11
Describing Data
  • Registry Resource documents
  • IVO Tabular Sky Service
  • Units, UCDs
  • Solar vs Sky vs
  • Images vs Catalogues
  • Concept extended for RdmsMetadata
  • UCD1 - Dictionaries Ontologies
  • Relationships (simple errors)
  • Queryable
  • Mirrors vs Copies

12
Query Language
  • SQL - ADQL/xml
  • Defined common functions: CIRCLE, XMATCH (sky,
    not solar)
  • Working on:
    • XQL
    • Units
    • Investigating UCDs instead of columns
    • Cross-dataset querying
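
To make the CIRCLE function concrete, here is a minimal sketch of the test it implies: keep a row if its sky position lies within a given angular radius of a centre. The function names and the sample rows are assumptions for illustration, and the haversine formula is one common way to compute the separation, not necessarily the one used by any ADQL implementation.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, stable for small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def in_circle(ra, dec, centre_ra, centre_dec, radius_deg):
    """True if (ra, dec) lies within radius_deg of the centre; all values in degrees."""
    return angular_separation_deg(ra, dec, centre_ra, centre_dec) <= radius_deg


# Hypothetical catalogue rows: (name, ra, dec) in decimal degrees.
rows = [("a", 10.0, 20.0), ("b", 10.05, 20.0), ("c", 12.0, 20.0)]
selected = [name for name, ra, dec in rows if in_circle(ra, dec, 10.0, 20.0, 0.1)]
print(selected)
```

Working on the sphere rather than on raw column differences means the test still behaves near the poles and across the RA = 0/360 wrap.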

13
Results
  • QueryMetadataRawResults VoResults
  • FITS vs VOTable vs HDF vs CSV vs HTML vs
  • -> All of them
  • Results - queryable data - inputs
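
One way to offer "all of them" is to keep the query result neutral and plug in one writer per output format. The sketch below does this for CSV and HTML only; the `WRITERS` table and function names are hypothetical, and FITS, VOTable and HDF writers are left out because they would need format libraries not assumed here.

```python
import csv
import html
import io


def write_csv(columns, rows, out):
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow(row)


def write_html(columns, rows, out):
    header = "".join(f"<th>{html.escape(c)}</th>" for c in columns)
    out.write(f"<table>\n<tr>{header}</tr>\n")
    for row in rows:
        cells = "".join(f"<td>{html.escape(str(v))}</td>" for v in row)
        out.write(f"<tr>{cells}</tr>\n")
    out.write("</table>\n")


# Adding a format means adding one entry here; the query code is untouched.
WRITERS = {"csv": write_csv, "html": write_html}


def render(fmt, columns, rows):
    """Render the same raw rows in the requested output format."""
    out = io.StringIO()
    WRITERS[fmt](columns, rows, out)
    return out.getvalue()


print(render("csv", ["ra", "dec"], [(10.0, 20.0)]))
```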

14
Data Analysis
(Clive Page)
  • Faster -> feasible
  • Joins
    • Polar coordinate matches (HTM, HEALPix).
    • Cross-match algorithms
  • Distributed queries
    • Breaking down query
    • Moving the right data
    • Combining the results
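
A cross-match becomes feasible once each source is compared only with nearby candidates instead of the whole second catalogue. The sketch below uses simple declination zones as the spatial index; this is a stand-in chosen for brevity, not HTM or HEALPix, and the catalogue layout `(id, ra, dec)` in degrees is an assumption.

```python
import math
from collections import defaultdict


def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a, cat_b, radius_deg):
    """Return (id_a, id_b) pairs separated by at most radius_deg (must be > 0)."""
    # Bucket catalogue B into declination zones one radius high.
    zones = defaultdict(list)
    for src in cat_b:
        zones[math.floor((src[2] + 90.0) / radius_deg)].append(src)
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        zone = math.floor((dec_a + 90.0) / radius_deg)
        # A match cannot differ in declination by more than the radius,
        # so only the source's own zone and its two neighbours are checked.
        for z in (zone - 1, zone, zone + 1):
            for id_b, ra_b, dec_b in zones.get(z, ()):
                if separation_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
                    matches.append((id_a, id_b))
    return matches


# Hypothetical sources; the second pair straddles the RA = 0/360 wrap.
cat_a = [("a1", 10.0, 20.0), ("a2", 359.9995, -45.0)]
cat_b = [("b1", 10.0002, 20.0001), ("b2", 0.0002, -45.0), ("b3", 50.0, 20.0)]
pairs = cross_match(cat_a, cat_b, 2.0 / 3600.0)  # 2 arcsecond radius
```

Within a zone this sketch still scans every source in right ascension; a production index would also partition along RA, which is where schemes such as HTM and HEALPix come in.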

15
Status
  • Readily available
  • Debugging: developer
  • Debugging: astronomer
  • Inform User

16
Storepoints
  • No data persistence at PALs
    • Web server machines, not data storage ones
  • Large result sets
    • No workspace, memory models, etc
  • -> Streaming outputs
  • SRB, GridFTP not ready.
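
Streaming the output means rows are written to the target as they arrive, so a large result set never has to sit in memory or on the web server's disk. A minimal sketch, assuming a row source that behaves like a database cursor (`query_rows` is a hypothetical stand-in) and a file-like target:

```python
import csv
import io


def query_rows(n):
    """Stand-in for a database cursor: yields one row at a time (made-up values)."""
    for i in range(n):
        yield (i, 10.0 + i * 0.001, -5.0)


def stream_csv(columns, rows, target, flush_every=1000):
    """Write rows to target as they arrive; returns the number of rows written."""
    writer = csv.writer(target, lineterminator="\n")
    writer.writerow(columns)
    count = 0
    for row in rows:  # only the current row is held in memory
        writer.writerow(row)
        count += 1
        if count % flush_every == 0:
            target.flush()
    target.flush()
    return count


target = io.StringIO()  # stands in for a socket, an FTP upload or a MySpace file
written = stream_csv(["id", "ra", "dec"], query_rows(5000), target)
```

Because `query_rows` is a generator, the server's memory use stays flat however many rows the query returns.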

17
Identifying Storepoints
  • Concepts

[Diagram: storepoint concepts. Labels: MySpace, Community HomeSpace, VoSpace (Registered), SRB, FTP, GridFTP, HTTP]
  • -> FTP, File, MySpace; extend.
  • 3rd iteration; 2nd in use
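
Identifying a storepoint by a URI lets the scheme select the transport, and new transports can be added without changing callers. The sketch below is an assumption-laden illustration: the `myspace://` syntax and the example addresses are invented here, and only the three schemes the slide lists as usable are accepted.

```python
from urllib.parse import urlsplit

# Schemes named on the slide as usable now; others (eg SRB, GridFTP) can be added later.
SUPPORTED_SCHEMES = {"ftp", "file", "myspace"}


def parse_storepoint(uri):
    """Split a storepoint URI into scheme, location (host/account) and path."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported storepoint scheme: {scheme!r}")
    return {"scheme": scheme, "location": parts.netloc, "path": parts.path}


# Hypothetical addresses, for illustration only.
print(parse_storepoint("ftp://ftp.example.org/results/q1.csv"))
print(parse_storepoint("myspace://user@community.example.org/results/q1.vot"))
print(parse_storepoint("file:///tmp/q1.csv"))
```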

18
19
Data Service Architecture
[Diagram labels: JSP, SIAP, CEA, Axis, AstroGrid, SkyNode, Cone; Plugin Manager; Datacenter Implementation; Slinger; /XML/CSV, zip/plain; email/file/ftp/myspace]

20
Publishers' AstroGrid Library (PAL)
  • Easy to publish to the VO
  • Web Application, includes:
    • SOAP (AstroGrid, CEA, prepped for SkyNode)
    • CGI (SIAP, NVO-cone search, SSA)
    • HTML pages (cone search, query builder, status
      monitor)
  • Features
    • Asynchronous (stateful) & Synchronous Queries
    • Queues
    • Comprehensive Status (incl historical)
    • Variety of results
    • Fully streamed: no curation issues
  • Server Plugins, including:
    • RDBMS (JDBC)
    • FITS file collection
    • eXist (XML)
  • Helper Tools
    • Metadata Generators
    • Ready-made website access
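
The combination of server plugins, queued asynchronous queries and historical status could look roughly like the sketch below. Every name in it (`ListPlugin`, `QueryService`, the state labels) is hypothetical, the "query" is just a Python predicate, and results are held in a list for brevity rather than streamed.

```python
import queue
import time


class ListPlugin:
    """Hypothetical server plugin: answers queries from an in-memory table."""

    def __init__(self, rows):
        self.rows = rows

    def run(self, predicate):
        return (row for row in self.rows if predicate(row))


class QueryService:
    def __init__(self, plugin):
        self.plugin = plugin          # swap in an RDBMS or FITS plugin here
        self.pending = queue.Queue()  # queries waiting to run
        self.history = {}             # query id -> list of (state, timestamp)
        self.results = {}
        self._next_id = 0

    def _mark(self, qid, state):
        self.history.setdefault(qid, []).append((state, time.time()))

    def submit(self, predicate):
        """Asynchronous entry point: queue the query and return its id at once."""
        self._next_id += 1
        qid = self._next_id
        self.pending.put((qid, predicate))
        self._mark(qid, "QUEUED")
        return qid

    def run_next(self):
        """Run one queued query; a real server would do this in worker threads."""
        qid, predicate = self.pending.get_nowait()
        self._mark(qid, "RUNNING")
        try:
            self.results[qid] = list(self.plugin.run(predicate))
            self._mark(qid, "COMPLETED")
        except Exception:
            self._mark(qid, "FAILED")
        return qid

    def ask(self, predicate):
        """Synchronous entry point: run immediately and return the rows."""
        return list(self.plugin.run(predicate))

    def status(self, qid):
        return self.history[qid][-1][0]


service = QueryService(ListPlugin([{"mag": 12.0}, {"mag": 18.5}]))
qid = service.submit(lambda row: row["mag"] < 15.0)
```

The client polls `status(qid)` until the query completes, and the per-query history list is what a status monitor page would display.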

21
Situation Now
  • Installed
  • SuperCOSMOS Science Archive (RDBMS)
  • astrogrid.roe.ac.uk:8080/pal-ssa/
  • astrogrid.roe.ac.uk:8080/pal-twomass/
  • astrogrid.roe.ac.uk:8080/pal-usnob/
  • 6dF Spectra
  • grendel12.roe.ac.uk:8080/pal-6df/
  • Wide Field Survey
  • TRACE (FITS files, Solar, under test)
  • Proxy (bespoke special plugins)
  • All NVO-cone-compatible DBs (test)
  • VizieR
  • Evaluated/ing at
  • ESO
  • RAL (solar)
  • JBO (Merlin)
  • Reviewing Query Language, metadata documents, etc

22
Future
  • Quality
  • Metadata wizards
  • Sell to hosts: deploy to Leicester, JBO, ESO,
    RAL, The World....
  • Explicit and Investigative Queries
  • Distributed queries combining results (NVO Exec
    plans)
  • Full SIA, SSA interface
  • More user admin web pages
  • Local authorisation