Data Management and grid methods in astronomy - PowerPoint PPT Presentation

1
Data Management and grid methods in astronomy
  • Case Studies from the ANU Supercomputer Facility

2
Overview
  • Taking Astronomy online
  • Deploying shrink-wrapped solutions
  • Building to a standard
  • Custom implementation

3
Taking Astronomy Online
  • Astronomy today is end-to-end data management

4
Case Study 1: Shrink-wrapped Solutions
  • 2MASS: The Two Micron All Sky Survey
  • Catalogue of 473 million astronomical objects
  • Collected in the near infrared over 99.998% of the sky
  • From Arizona and Chile between 1997 and 2001
  • Distributed on DVD as SQL schema and dumps
  • Setup database and tables using schema
  • Ingest data from dump files
  • Open database to user queries
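The three steps above (create tables from the schema, ingest the dumps, open the database to queries) can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for the SQL server, and the table name, column names and rows are hypothetical, far simpler than the real 2MASS schema.

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for a catalogue schema.
SCHEMA = """
CREATE TABLE point_source (
    designation TEXT PRIMARY KEY,
    ra    REAL NOT NULL,   -- right ascension, degrees
    decl  REAL NOT NULL,   -- declination, degrees
    j_mag REAL             -- J-band magnitude
);
CREATE INDEX idx_point_source_decl ON point_source (decl);
"""

# Made-up rows standing in for the contents of a dump file.
DUMP_ROWS = [
    ("src-0001", 275.10, -25.40, 12.3),
    ("src-0002", 275.30, -25.50, 14.1),
    ("src-0003", 10.00, 41.00, 9.8),
]


def build_catalogue(conn):
    """Step 1: create tables from the schema. Step 2: ingest the dump rows."""
    conn.executescript(SCHEMA)
    with conn:  # commit the bulk insert as one transaction
        conn.executemany("INSERT INTO point_source VALUES (?, ?, ?, ?)", DUMP_ROWS)


def sources_in_box(conn, ra_min, ra_max, dec_min, dec_max):
    """Step 3: a typical user query, here a simple RA/DEC box search."""
    cur = conn.execute(
        "SELECT designation FROM point_source "
        "WHERE ra BETWEEN ? AND ? AND decl BETWEEN ? AND ? "
        "ORDER BY designation",
        (ra_min, ra_max, dec_min, dec_max),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
build_catalogue(conn)
print(sources_in_box(conn, 275.0, 275.5, -26.0, -25.0))
```

At the scale of the real catalogue the ingest step dominates, so a production setup would normally use the database's own bulk loader rather than row-by-row inserts.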

5
Case Study 1 (continued): Shrink-wrapped Solutions
  • SDSS: Sloan Digital Sky Survey
  • 1.5 deg² images covering 25% of the sky from New
    Mexico
  • 215 million objects (SDSS Data Release 5, 1 July
    2005)
  • Distributed as SkyServer website
  • Microsoft .NET website
  • Microsoft SQL backend database
  • Setup .NET and SQL servers
  • Download and install SkyServer site
  • Download and ingest data
  • Open up website for internet access

6
Case Study 1 (continued): Shrink-wrapped Solutions
  • Issues
  • Data volumes
  • 2MASS: 43 GB data distribution, 163 GB as SQL DB
  • SDSS: 2.5 TB data distribution, 4 TB as SQL DB
  • Access and operations
  • 2MASS: SQL database setup, user access
  • SDSS: website installation, configuration
    issues
  • Updates
  • 2MASS: static data set from completed project
  • SDSS: periodic new data releases (cumulative)
    and website software updates
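The data-volume figures imply that the loaded database is considerably larger than the distribution media. A quick calculation of the growth factor, assuming 1 TB = 1000 GB and taking the quoted sizes at face value:

```python
# Sizes quoted on the slide, in GB (assuming 1 TB = 1000 GB).
DISTRIBUTIONS_GB = {
    "2MASS": (43.0, 163.0),     # (as distributed, as SQL database)
    "SDSS": (2500.0, 4000.0),
}


def expansion_factor(distributed_gb, database_gb):
    """How many times larger the loaded database is than the distribution."""
    return database_gb / distributed_gb


for survey, (distributed, database) in DISTRIBUTIONS_GB.items():
    print(f"{survey}: x{expansion_factor(distributed, database):.1f}")
```

This gives a factor of roughly 3.8 for 2MASS and 1.6 for SDSS, which is the storage headroom to plan for before ingest starts.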

7
Case Study 2: Building to a Standard
  • The Virtual Observatory (VO)
  • A data and services grid for Astronomy
  • Standardised
  • Data storage and exchange formats
  • Service interfaces and protocols
  • Tools and analysis software
  • Publication services, registries and portals
  • International Virtual Observatory Alliance (IVOA)
  • Developing and publishing consensual VO standards

8
Case Study 2 (continued): Building to a Standard
  • VO standard services for the MACHO data
  • 127,000 0.5 deg² images (7 TB) collected from
    1991 to 2003 by the ANU Mount Stromlo 50" telescope
  • ConeSearch service to IVOA standard
  • Locate all images within a given circle on the
    sky
  • Input
  • RA and DEC (i.e. x, y) coordinates plus search
    radius
  • Output
  • VOTable (XML) metadata for all candidate images
  • Including URLs which enable image archive
    retrieval
  • Implementation
  • Ruby-on-Rails web service: base URL plus
    parameters, HTTP GET
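The core of a cone search is an angular-distance test between the query position and each image. The sketch below is not the MACHO service's code; it assumes each image is described only by a hypothetical centre position (`ra`, `dec` in degrees) and ignores the image footprint, which a real service would need to take into account.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, stable for small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(images, ra, dec, sr):
    """Return the images whose centre lies within sr degrees of (ra, dec)."""
    return [img for img in images
            if angular_separation_deg(ra, dec, img["ra"], img["dec"]) <= sr]


# Hypothetical image centres, for illustration only.
IMAGES = [
    {"id": "A", "ra": 275.11, "dec": -25.46},
    {"id": "B", "ra": 275.50, "dec": -25.46},
    {"id": "C", "ra": 276.00, "dec": -25.46},
    {"id": "D", "ra": 275.11, "dec": -26.00},
]

matches = [img["id"] for img in cone_search(IMAGES, ra=275.11, dec=-25.46, sr=0.5)]
print(matches)
```

Note that a plain difference in RA overstates the separation away from the equator, which is why the cos(dec) terms appear in the formula.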

9
Case Study 2 (continued): Building to a Standard
  • MACHO ConeSearch example
  • Query
  • http://macho.anu.edu.au/image/conesearch?RA=275.11&DEC=-25.46&SR=0.5
  • Response

    <VOTABLE ... "http://www.ivoa.net/xml/VOTable/v1.1" ID="MACHO_OBS_SEARCH_RESULTS">
      <DESCRIPTION>NVO VOTable format document containing MACHO Image search results.</DESCRIPTION>
      <COOSYS ID="J2000" equinox="J2000." epoch="J2000."/>
      <RESOURCE name="MACHO Observation search results">
        <DESCRIPTION>Ordered list of MACHO Observation records as selected by given parameters</DESCRIPTION>
        ... ON ...
        <PARAM ... ID="dec" name="dec"/>
        <PARAM ... value="275.11" ID="ra" name="ra"/>
        <PARAM datatype="char" value="0.5" ID="sr" name="sr"/>
        <TABLE name="... Observation combined metadata data table">
          <DESCRIPTION>MACHO Observation combined metadata data table</DESCRIPTION>
          <FIELD datatype="int" unit="---" ID="OBSID" name="Obs Id ucd1" width="6" ucd="meta.idobs" ...
          <FIELD ... name="Obs Time" width="8" ucd="time.epoch" arraysize="8" ...
          <FIELD datatype="char" unit="gmt" ID="DATEOBS" name="Obs Date" width="10" ucd="time.epoch" arraysize="10" ....
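A client can assemble the ConeSearch query from the three input parameters with the standard library alone. The sketch below only builds the URL and makes no network request; the base URL is the one shown on the slide and may no longer be live, and the range checks are an assumption about what a careful client would validate.

```python
from urllib.parse import urlencode

# Base URL as shown in the example query; treat it as historical.
BASE_URL = "http://macho.anu.edu.au/image/conesearch"


def cone_search_url(ra, dec, sr, base_url=BASE_URL):
    """Build a ConeSearch GET URL from RA, DEC and search radius (all in degrees)."""
    if not 0.0 <= ra < 360.0:
        raise ValueError("RA must be in [0, 360) degrees")
    if not -90.0 <= dec <= 90.0:
        raise ValueError("DEC must be in [-90, 90] degrees")
    if sr < 0.0:
        raise ValueError("SR must be non-negative")
    return f"{base_url}?{urlencode({'RA': ra, 'DEC': dec, 'SR': sr})}"


url = cone_search_url(275.11, -25.46, 0.5)
print(url)
```

Using `urlencode` rather than string concatenation keeps the `=` and `&` separators correct and escapes any unusual characters.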
10
Case Study 2 (continued): Building to a Standard
  • Issues in MACHO ConeSearch development
  • Data curation
  • Metadata content: addition of WCS locators
  • Metadata ontology: determination of UCDs
  • Metadata format: conversion to VOTable
  • Data delivery
  • Accessing stateful HSM data store via stateless
    HTTP protocol: avoiding timeouts, tracking
    requests
  • Service publication
  • Service metadata for different registries

11
Case Study 3: Custom Implementation
  • The Southern Sky Survey (S3)
  • 100% of the southern sky imaged 3 to 6 times by the
    ANU Siding Spring SkyMapper telescope
  • 250,000 images, each 5.7 deg² and 512 MB
  • Two planned data releases with no embargo period,
    direct to the Virtual Observatory
  • Foundation southern sky resource for Virtual
    Observatory
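The image count and size quoted above fix the raw archive volume. A short calculation, assuming every image is the full 512 MB and using decimal units (1 TB = 1,000,000 MB):

```python
IMAGE_COUNT = 250_000   # images, from the slide
IMAGE_SIZE_MB = 512     # MB per image, from the slide


def raw_volume_tb(image_count, image_size_mb):
    """Total raw image volume in TB, assuming uniform image size."""
    return image_count * image_size_mb / 1_000_000


print(f"{raw_volume_tb(IMAGE_COUNT, IMAGE_SIZE_MB):.0f} TB")
```

That is about 128 TB of raw images before any calibrated products or catalogues are added.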

12
Case Study 3 (continued): Custom Implementation
  • S3 data capture
  • Images read out to local 5TB RAID array
  • Trickled via gigabit link to ANUSF
  • Max 1.5TB per night, 5MB/s transfer required
  • Use Lightweight Data Replicator (LDR)
  • Underlying mechanism is Globus GridFTP
  • File replicas tracked with Globus RLS
  • Sweeper script scans for new observations and
    registers these with LDR; it also maintains the
    disk cache by removing older transferred files
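The sweeper's two duties (register new files, then trim the cache) can be sketched as a single pass over the cache directory. This is a minimal illustration, not the project's script: the `register` callback stands in for the real LDR registration step, which is not shown in the slides, and the set of already-transferred file names is assumed to come from the replication system.

```python
import os
import tempfile


def sweep(cache_dir, registered, transferred, register, max_cache_bytes):
    """Run one sweeper pass and return the names of files evicted from the cache."""
    # Oldest files first, so eviction removes the oldest transferred data.
    entries = sorted(
        (e for e in os.scandir(cache_dir) if e.is_file()),
        key=lambda e: e.stat().st_mtime,
    )
    # 1. Register observations not seen on a previous pass.
    for entry in entries:
        if entry.name not in registered:
            register(entry.path)          # hypothetical hook into the replicator
            registered.add(entry.name)
    # 2. Trim the cache, never deleting a file that has not been transferred.
    total = sum(e.stat().st_size for e in entries)
    evicted = []
    for entry in entries:
        if total <= max_cache_bytes:
            break
        if entry.name in transferred:
            total -= entry.stat().st_size
            os.remove(entry.path)
            evicted.append(entry.name)
    return evicted


# Usage with three 10-byte stand-in files and a 25-byte cache limit.
with tempfile.TemporaryDirectory() as cache:
    for age, name in enumerate(["obs1.fits", "obs2.fits", "obs3.fits"]):
        path = os.path.join(cache, name)
        with open(path, "wb") as fh:
            fh.write(b"x" * 10)
        os.utime(path, (1000 + age, 1000 + age))  # obs1 is the oldest
    seen, newly_registered = set(), []
    evicted = sweep(cache, seen, {"obs1.fits", "obs2.fits"},
                    newly_registered.append, max_cache_bytes=25)
    remaining = sorted(os.listdir(cache))

print(evicted, remaining)
```

Only the oldest transferred file is removed here, since deleting it already brings the cache under the limit.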

13
Case Study 3 (continued): Custom Implementation
  • S3 data capture

14
Case Study 3 (continued): Custom Implementation
  • S3 data processing
  • Science Data Pipeline System (SDPS)
  • Running on APAC-NF SGI Altix cluster at ANUSF
  • Image calibration and metadata augmentation
  • Photometric standardisation and reduction
  • Image and object catalogues generation
  • C/Perl with PostgreSQL science and engineering DBs
  • Generated data products hosted on ANUSF Massdata
    store

15
Case Study 3 (continued): Custom Implementation
  • S3 data processing

16
Case Study 3 (continued): Custom Implementation
  • S3 data exploration
  • Engineering Data Web Interface
  • Project personnel only
  • Data quality assurance
  • Scheduler manual intervention
  • PHP and Perl accessing PostgreSQL engineering DB
  • Science Data Web Interface
  • IVOA standards compliant search and delivery
  • Access to object catalogues and images
  • Ruby-on-Rails accessing PostgreSQL science DB

17
Case Study 3 (continued): Custom Implementation
  • S3 data management issues
  • Data capture
  • Custom integration of instrument and storage
    using data-grid technologies.
  • Tune TCP window sizes and reduce MD5 checksum
    checking to obtain required 5MB/s transfer rate
  • Data processing
  • Collation of images taken months apart, and thus
    offline on the HSM system: coordination of data
    staging
  • Data exploration
  • Curation: ontological augmentation via UCDs
  • VOTable generation

18
Acknowledgements
  • Jonathan McCabe, ANU Supercomputer Facility
  • Stephen McMahon, ANU Supercomputer Facility
  • Tim Preston, ANU Mount Stromlo Observatory

19
References
  • Data Intensive Science Needs for Australian
    Astronomy, Discussion Paper, Peter Quinn, UWA,
    Sept. 2007.
  • 2MASS: http://www.ipac.caltech.edu/2mass
  • SDSS: http://www.sdss.org
  • IVOA: http://www.ivoa.net
  • IVOA Standards: http://www.ivoa.net/Documents
  • MACHO: http://wwwmacho.anu.edu.au
  • MACHO VO Services: http://macho.anu.edu.au
  • S3: http://www.mso.anu.edu.au/skymapper