Title: Data Management and grid methods in astronomy
1. Data Management and grid methods in astronomy
- Case Studies from the ANU Supercomputer Facility
2. Overview
- Taking Astronomy online
- Deploying shrink-wrapped solutions
- Building to a standard
- Custom implementation
3. Taking Astronomy Online
- Astronomy today is end-to-end data management
4. Case Study 1: Shrink-wrapped Solutions
- 2MASS: The Two Micron All Sky Survey
- Catalogue of 473 million astronomical objects
- Collected in near infrared, 99.998% of the sky
- From Arizona and Chile between 1997 and 2001
- Distributed on DVD as SQL schema and dumps
- Set up database and tables using schema
- Ingest data from dump files
- Open database to user queries
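The three deployment steps above (create tables from the schema, ingest the dump files, open the database to queries) can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table name, columns, sample rows and pipe-delimited dump format are assumptions made for the example, not the actual 2MASS schema or distribution format.

```python
import csv
import io
import sqlite3

# Hypothetical, heavily simplified schema; the real 2MASS schema is far larger.
SCHEMA = """
CREATE TABLE point_source (
    designation TEXT PRIMARY KEY,
    ra   REAL NOT NULL,   -- right ascension, degrees
    decl REAL NOT NULL,   -- declination, degrees
    j_m  REAL             -- J-band magnitude
);
"""

# Stand-in for a dump file: one pipe-delimited row per object (assumed format).
DUMP = io.StringIO(
    "obj-a|275.10|-25.40|12.3\n"
    "obj-b|275.90|-24.10|14.8\n"
    "obj-c|10.00|41.00|9.1\n"
)


def setup_database(conn):
    """Step 1: create tables from the schema."""
    conn.executescript(SCHEMA)


def ingest_dump(conn, dump):
    """Step 2: bulk-load rows from a dump file; returns the number of rows loaded."""
    rows = [(d, float(ra), float(dec), float(j))
            for d, ra, dec, j in csv.reader(dump, delimiter="|")]
    with conn:  # commit once for the whole batch
        conn.executemany("INSERT INTO point_source VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def bright_objects(conn, max_mag):
    """Step 3: an example user query, with the parameter bound, not interpolated."""
    cur = conn.execute(
        "SELECT designation FROM point_source WHERE j_m < ? ORDER BY j_m",
        (max_mag,))
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
setup_database(conn)
loaded = ingest_dump(conn, DUMP)
print(loaded, bright_objects(conn, 13.0))
```

At real catalogue scale the ingest step dominates, so a production load would normally use the database's native bulk loader rather than row-wise inserts.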
5. Case Study 1 (continued): Shrink-wrapped Solutions
- SDSS: Sloan Digital Sky Survey
- 1.5 deg² images covering 25% of sky from New Mexico
- 215 million objects (SDSS Data Release 5, 1 July 2005)
- Distributed as SkyServer website
- Microsoft .NET website
- Microsoft SQL backend database
- Set up .NET and SQL servers
- Download and install SkyServer site
- Download and ingest data
- Open up website for internet access
6. Case Study 1 (continued): Shrink-wrapped Solutions
- Issues
- Data volumes
- 2MASS: data distribution 43GB, 163GB as SQL DB
- SDSS: data distribution 2.5TB, 4TB as SQL DB
- Access and operations
- 2MASS: SQL database setup, user access
- SDSS: website installation, configuration issues
- Updates
- 2MASS: static data set from completed project
- SDSS: periodic new data releases (cumulative) and website software updates
7. Case Study 2: Building to a Standard
- The Virtual Observatory (VO)
- A data and services grid for Astronomy
- Standardised
- Data storage and exchange formats
- Service interfaces and protocols
- Tools and analysis software
- Publication services, registries and portals
- International Virtual Observatory Alliance (IVOA)
- Developing and publishing consensual VO standards
8. Case Study 2 (continued): Building to a Standard
- VO standard services for the MACHO data
- 127,000 images of 0.5 deg² each (7TB), collected from 1991 to 2003 by the ANU Mount Stromlo 50-inch telescope
- ConeSearch service to IVOA standard
- Locate all images within a given circle on the sky
- Input
- RA and DEC (i.e. x,y) coordinates plus search radius
- Output
- VOTable (XML) metadata for all candidate images
- Including URLs which enable image archive retrieval
- Implementation
- Ruby-on-Rails web service: base URL plus parameters via HTTP GET
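The core of a cone search is a filter on angular distance from the query position. The following is a minimal Python sketch of that selection step, not the Ruby-on-Rails implementation described above; the image records, field names and example.org URLs are hypothetical.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, stable for small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(images, ra, dec, sr):
    """Return images whose centre lies within sr degrees of (ra, dec)."""
    return [img for img in images
            if angular_separation_deg(ra, dec, img["ra"], img["dec"]) <= sr]


# Hypothetical image records: centre position plus an archive retrieval URL.
IMAGES = [
    {"obsid": 1, "ra": 275.20, "dec": -25.50, "url": "http://example.org/img/1"},
    {"obsid": 2, "ra": 276.50, "dec": -25.46, "url": "http://example.org/img/2"},
    {"obsid": 3, "ra": 95.00, "dec": 10.00, "url": "http://example.org/img/3"},
]

hits = cone_search(IMAGES, ra=275.11, dec=-25.46, sr=0.5)
print([img["obsid"] for img in hits])
```

This sketch matches image centres only. A service returning every image that overlaps the circle would also need to account for each image's footprint, and at archive scale the filter would normally run as an indexed database query.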
9. Case Study 2 (continued): Building to a Standard
- MACHO ConeSearch example
- Query
- http://macho.anu.edu.au/image/conesearch?RA=275.11&DEC=-25.46&SR=0.5
- Response
<VOTABLE ... ww.ivoa.net/xml/VOTable/v1.1 ID="MACHO_OBS_SEARCH_RESULTS">
<DESCRIPTION>NVO VOTable format document containing MACHO Image search results.</DESCRIPTION>
<COOSYS ID="J2000" equinox="J2000." epoch="J2000."/>
<RESOURCE name="MACHO Observation search results">
<DESCRIPTION>Ordered list of MACHO Observation records as selected by given parameters</DESCRIPTION>
<PARAM ... ID="dec" name="dec"/>
<PARAM ... value="275.11" ID="ra" name="ra"/>
<PARAM datatype="char" value="0.5" ID="sr" name="sr"/>
<TABLE name="... Observation combined metadata data table">
<DESCRIPTION>MACHO Observation combined metadata data table</DESCRIPTION>
<FIELD datatype="int" unit="---" ID="OBSID" name="Obs Id ucd1" width="6" ucd="meta.idobs" ...>
<FIELD ... name="Obs Time" width="8" ucd="time.epoch" arraysize="8" ...>
<FIELD datatype="char" unit="gmt" ID="DATEOBS" name="Obs Date" width="10" ucd="time.epoch" arraysize="10" ...>
....
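From the client side, a cone search is a plain HTTP GET with RA, DEC and SR appended to the service base URL, and the reply is an XML document whose FIELD elements describe the result columns. The sketch below builds such a URL and lists the column names from a VOTable using only the Python standard library. The SAMPLE document is an illustrative stand-in written for this example, not actual service output, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def cone_search_url(base_url, ra, dec, sr):
    """Append RA, DEC and SR (decimal degrees) to the service base URL."""
    return base_url + "?" + urlencode({"RA": ra, "DEC": dec, "SR": sr})


def field_names(votable_xml):
    """List the column (FIELD) names declared in a VOTable document."""
    root = ET.fromstring(votable_xml)
    # Compare local tag names so the lookup works with or without a namespace.
    return [el.get("name") for el in root.iter()
            if el.tag.rsplit("}", 1)[-1] == "FIELD"]


# Illustrative stand-in for a response; not actual MACHO service output.
SAMPLE = """<VOTABLE version="1.1">
  <RESOURCE name="MACHO Observation search results">
    <TABLE>
      <FIELD datatype="int" ID="OBSID" name="Obs Id"/>
      <FIELD datatype="char" ID="DATEOBS" name="Obs Date" arraysize="10"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

url = cone_search_url("http://macho.anu.edu.au/image/conesearch",
                      275.11, -25.46, 0.5)
print(url)
print(field_names(SAMPLE))
```

For real work a dedicated VOTable parser is a better choice than hand-rolled XML handling, since it also decodes the data rows according to each FIELD's datatype.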
10. Case Study 2 (continued): Building to a Standard
- Issues in MACHO ConeSearch development
- Data curation
- Metadata content: addition of WCS locators
- Metadata ontology: determination of UCDs
- Metadata format: conversion to VOTable
- Data delivery
- Accessing stateful HSM data store via stateless HTTP protocol: avoiding timeouts, tracking requests
- Service publication
- Service metadata for different registries
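One common way to reconcile a slow, stateful tape store with stateless HTTP is to answer the first request immediately with a ticket and let the client poll until the file has been staged. The sketch below illustrates that general pattern in Python. It is not a description of how the MACHO service solved the problem, and the class, method names and file path are all hypothetical.

```python
import itertools


class StagingTracker:
    """Tracks offline-file staging requests so HTTP handlers can return at once."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._status = {}  # ticket -> "staging" or "ready"
        self._paths = {}   # ticket -> requested file path

    def submit(self, path):
        """Called by the initial request: queue the file and hand back a ticket."""
        ticket = f"req-{next(self._ids)}"
        self._status[ticket] = "staging"
        self._paths[ticket] = path
        return ticket

    def mark_ready(self, ticket):
        """Called by the (hypothetical) staging worker once the file is on disk."""
        self._status[ticket] = "ready"

    def poll(self, ticket):
        """Called by follow-up requests: an HTTP-style status code plus a body."""
        status = self._status.get(ticket)
        if status is None:
            return 404, "unknown ticket"
        if status == "staging":
            return 202, "still staging, retry later"
        return 200, self._paths[ticket]


tracker = StagingTracker()
ticket = tracker.submit("/hsm/macho/image_0001.fits")  # hypothetical path
before = tracker.poll(ticket)   # polled while the file is still on tape
tracker.mark_ready(ticket)
after = tracker.poll(ticket)    # polled after staging has completed
print(before, after)
```

Because each poll is a short, independent request, no single HTTP connection has to stay open for the duration of a tape recall, which is what avoids the timeouts.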
11. Case Study 3: Custom Implementation
- The Southern Sky Survey (S3)
- 100% of southern sky imaged 3 to 6 times by the ANU Siding Spring SkyMapper telescope
- 250,000 images, each 5.7 deg² and 512MB
- Two planned data releases with no embargo period, direct to the Virtual Observatory
- Foundation southern sky resource for Virtual Observatory
12. Case Study 3 (continued): Custom Implementation
- S3 data capture
- Images read out to local 5TB RAID array
- Trickled via gigabit link to ANUSF
- Max 1.5TB per night, 5MB/s transfer required
- Use Lightweight Data Replicator (LDR)
- Underlying mechanism is Globus GridFTP
- File replicas tracked with Globus RLS
- Sweeper script scans for new observations and registers these with LDR; maintains disk cache by removing older transferred files
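The sweeper's two duties (register new observations, evict older files that have already been handed over) can be sketched as a single pass over the cache directory. This is an illustrative Python sketch, not the project's script: the `register` callback merely stands in for the LDR registration step and is not an LDR API, and file age is used as a simple eviction rule by assumption.

```python
import os
import tempfile
import time


def sweep(cache_dir, registered, register, max_age_s, now=None):
    """One sweeper pass: register new files, then evict old registered ones.

    `register` is a caller-supplied callback standing in for the LDR
    registration step; it is not an LDR API.
    """
    now = time.time() if now is None else now
    newly_registered, removed = [], []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if name not in registered:
            register(path)
            registered.add(name)
            newly_registered.append(name)
        elif now - os.path.getmtime(path) > max_age_s:
            # Already handed over and old enough: free space in the disk cache.
            os.remove(path)
            removed.append(name)
    return newly_registered, removed


with tempfile.TemporaryDirectory() as cache:
    for name in ("obs_001.fits", "obs_002.fits"):
        with open(os.path.join(cache, name), "wb") as fh:
            fh.write(b"\0")
    seen = set()
    log = []
    first = sweep(cache, seen, log.append, max_age_s=3600)
    # Pretend two hours have passed so the registered files count as old.
    second = sweep(cache, seen, log.append, max_age_s=3600,
                   now=time.time() + 7200)
    remaining = os.listdir(cache)

print(first, second, remaining)
```

The sketch treats "registered" as equivalent to "transferred", which is a simplification. A real sweeper would confirm that a replica exists at the destination before deleting the local copy.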
14. Case Study 3 (continued): Custom Implementation
- S3 data processing
- Science Data Pipeline System (SDPS)
- Running on APAC-NF SGI Altix cluster at ANUSF
- Image calibration and metadata augmentation
- Photometric standardisation and reduction
- Image and object catalogues generation
- C/Perl with PostgreSQL science and engineering DBs
- Generated data products hosted on ANUSF Massdata store
16. Case Study 3 (continued): Custom Implementation
- S3 data exploration
- Engineering Data Web Interface
- Project personnel only
- Data quality assurance
- Scheduler manual intervention
- PHP/Perl accessing PostgreSQL engineering DB
- Science Data Web Interface
- IVOA standards compliant search and delivery
- Access to object catalogues and images
- Ruby-on-Rails accessing PostgreSQL science DB
17. Case Study 3 (continued): Custom Implementation
- S3 data management issues
- Data capture
- Custom integration of instrument and storage using data-grid technologies
- Tune TCP window sizes and reduce MD5 checksum checking to obtain required 5MB/s transfer rate
- Data processing
- Collation of images taken months apart, thus offline on HSM system: coordination of data staging
- Data exploration
- Curation: ontological augmentation via UCDs
- VOTable generation
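The reason TCP window size limits the transfer rate is the bandwidth-delay product: a single TCP stream can have at most one window of data in flight per round trip, so throughput is bounded by window / RTT. The worked example below uses the 5MB/s target from the slides. The 20 ms round-trip time is an illustrative guess, not a measured value for the Siding Spring to ANUSF link.

```python
def required_window_bytes(rate_bytes_per_s, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to sustain the rate."""
    return rate_bytes_per_s * rtt_s


def max_throughput_bytes_per_s(window_bytes, rtt_s):
    """Upper bound on single-stream TCP throughput for a given window and RTT."""
    return window_bytes / rtt_s


TARGET_RATE = 5_000_000    # 5MB/s from the slides, taken as decimal megabytes
ASSUMED_RTT = 0.020        # 20 ms round trip: an illustrative guess, not measured
SMALL_WINDOW = 64 * 1024   # 64 KiB, roughly the limit without TCP window scaling

needed = required_window_bytes(TARGET_RATE, ASSUMED_RTT)
ceiling = max_throughput_bytes_per_s(SMALL_WINDOW, ASSUMED_RTT)
print(f"window needed: {needed / 1000:.0f} kB")
print(f"64 KiB window ceiling: {ceiling / 1e6:.2f} MB/s")
```

Under these assumed numbers a 64 KiB window tops out at roughly 3.3 MB/s, below the 5MB/s target, while a window of about 100 kB would suffice. A longer round-trip time raises the required window proportionally.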
18. Acknowledgements
- Jonathan McCabe, ANU Supercomputer Facility
- Stephen McMahon, ANU Supercomputer Facility
- Tim Preston, ANU Mount Stromlo Observatory
19. References
- Data Intensive Science Needs for Australian Astronomy, Discussion Paper, Peter Quinn, UWA, Sept. 2007.
- 2MASS: http://www.ipac.caltech.edu/2mass
- SDSS: http://www.sdss.org
- IVOA: http://www.ivoa.net
- IVOA Standards: http://www.ivoa.net/Documents
- MACHO: http://wwwmacho.anu.edu.au
- MACHO VO Services: http://macho.anu.edu.au
- S3: http://www.mso.anu.edu.au/skymapper