Title: VO-Enabling the MACHO data
1 VO-Enabling the MACHO data
- ANU Supercomputer Facility
- Data-Grids Group
- Jon Smillie
- Data.Grids_at_anu.edu.au
2 Overview
- What is the Virtual Observatory?
- The MACHO data-set in review.
- VO-enabling the MACHO data-set.
- Web-interface
- Metadata
- Images
- Future projects.
3 What is the Virtual Observatory?
- Worldwide Astronomy Data Online
- Don't go to a telescope, go online!
- Search and process multiple data-sets, wavelengths, and instruments.
- Mine existing data-sets first
- Web interfaces, plus
- Data/Metadata standards
- Data-mining, visualisation
- Theory/Modelling
- and more
4 What is the Virtual Observatory?
- A distributed international community
- International: IVOA
- International Virtual Observatory Alliance
- United States: NVO
- The National Virtual Observatory
- United Kingdom: AstroGrid
- Australia: AusVO
- 4-way ARC-LIEF partnership in 2003
- 10-way ARC-LIEF partnership in 2004
- APAC Grid-program Application support project 2004-2006
- Mix of top-down prescriptive infrastructures and bottom-up collaboration.
6 The Macho Project
- Dark Matter search
- gravitational microlensing
- Lightcurves for 20 million stars spanning 10 years
- 100,000 74MB two-colour 0.5 sq-deg images of LMC, SMC and Galactic Bulge
- Utilised Mt Stromlo 50-inch Great Melbourne Telescope
7 Macho Data at ANUSF
- 10TB calibrated images, photometry and time-series lightcurve data
- Plus instrument calibration data, metadata, etc
- Online on APAC-NF Mass Data Storage System at ANU Supercomputer Facility
- Data was streamed nightly direct from telescope at Mt Stromlo
9 Macho Data: VO-enabled
- The objective
- Put the right interfaces on the data
- Bottom-up integration with the VO
- Present the data in a VO standardised way
- Thus available to compatible tools
- Searchable web-interface out now
- Upgrade metadata formats
- Legacy XML metadata construction
- Rationalisation of FITS headers
- Augmentation of FITS headers with WCS data
- Delivery of VOTable 1.0 format metadata
- Convert images to VO compliant FITS files
11 Macho Data: Web Interface
- Web interface at wwwmacho.anu.edu.au
- Apache web server
- Zebra metadata server: Z39.50 metadata
- ProFTPD ftp server: FTP data delivery
12 Macho Data: Web Interface
- Demo: MACHO Lightcurve Extracter
- E.g. lightcurve for 207.16604.214
- PS: Trainable lightcurve browser by Margaret Kahn and Markus Hegland
14 Macho Data: Legacy Metadata
- Existing metadata spread across multiple binary databases accessed via a library of C/C++ utilities (legacy of original data acquisition system).
- This data is being collated into one XML set
- Taking all opportunities to refine, correct and enhance
- XML metadata will be the foundation of all search and analysis interfaces to MACHO data to come.
- Painstaking process
- Missing data needs to be reconstructed
- Data format needs to be made uniform across the entire data-set (formats changed organically through 10 years of project)
- Erroneous data needs to be detected and corrected
15 Macho Data: FITS Metadata
- MACHO images contained in FITS files
- Flexible Image Transport System: astronomy standard
- FITS headers carry metadata related to image.
- FITS headers of all images will be aligned with XML metadata (remember, 120,000 images!)
- Technology
- Python code
- PyFITS: Python interface to FITS format files
- Python code walks the XML tree for each image, loading values into FITS header as it goes.
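The XML-to-header walk described above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the XML element names, the `KEYWORD_MAP` table and the sample record are all hypothetical, and the cards are formatted by hand rather than through PyFITS (whose present-day successor is `astropy.io.fits`).

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy-XML record for one image; element names are illustrative only.
SAMPLE_XML = """
<image id="example">
  <observation>
    <exptime>300.0</exptime>
    <filter>red</filter>
  </observation>
  <telescope>
    <name>Great Melbourne Telescope</name>
  </telescope>
</image>
"""

# Hypothetical mapping from XML element paths to FITS keywords (8 characters at most).
KEYWORD_MAP = {
    "observation/exptime": "EXPTIME",
    "observation/filter": "FILTER",
    "telescope/name": "TELESCOP",
}


def walk(element, path=""):
    """Yield (path, text) for every leaf element below `element`."""
    for child in element:
        child_path = f"{path}/{child.tag}" if path else child.tag
        if len(child) == 0:
            yield child_path, (child.text or "").strip()
        else:
            yield from walk(child, child_path)


def parse_value(text):
    """Interpret a string as int, then float, falling back to plain text."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def format_card(keyword, value):
    """Format one fixed-format FITS header card (exactly 80 characters)."""
    if isinstance(value, str):
        # Strings are quoted, embedded quotes doubled, padded to at least 8 characters.
        field = ("'" + value.replace("'", "''").ljust(8) + "'").ljust(20)
    else:
        # Numbers are right-justified so that they end in column 30.
        field = repr(value).rjust(20)
    card = f"{keyword.upper():<8}= {field}"
    if len(card) > 80:
        raise ValueError(f"value too long for a single card: {keyword}")
    return card.ljust(80)


def xml_to_cards(xml_text):
    """Walk the XML tree and emit a header card for every mapped leaf element."""
    root = ET.fromstring(xml_text.strip())
    cards = []
    for path, text in walk(root):
        keyword = KEYWORD_MAP.get(path)
        if keyword is not None:
            cards.append(format_card(keyword, parse_value(text)))
    return cards
```

With a FITS library the `format_card` step would be replaced by assigning each value to the header object, which handles the card layout itself.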
16 Macho Data: WCS Calibration
- World Coordinate System Augmentation
- For each MACHO image
- Determine canonical image location on sky
- (relative to UCAC star catalogue)
- Add these locators to FITS header
- Allows unambiguous image location
- Free of MACHO/Telescope specifics
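To illustrate why such header locators make an image's position unambiguous, here is a hedged sketch of how a reader could turn a pixel position into sky coordinates. It assumes the locators are stored as the standard FITS WCS reference pixel, reference value and CD matrix with a tangent-plane projection; the slides do not say which keywords were used, and the numbers in the usage note are invented.

```python
import math


def pixel_to_sky(x, y, crpix, crval, cd):
    """Map pixel (x, y) to (RA, Dec) in degrees for a linear WCS with TAN projection.

    crpix: reference pixel (CRPIX1, CRPIX2)
    crval: sky position of the reference pixel in degrees (CRVAL1, CRVAL2)
    cd:    2x2 matrix in degrees per pixel (CD1_1, CD1_2 / CD2_1, CD2_2)
    """
    dx, dy = x - crpix[0], y - crpix[1]
    # Intermediate world coordinates, converted from degrees to radians.
    xi = math.radians(cd[0][0] * dx + cd[0][1] * dy)
    eta = math.radians(cd[1][0] * dx + cd[1][1] * dy)
    ra0, dec0 = math.radians(crval[0]), math.radians(crval[1])
    # Inverse gnomonic (tangent-plane) projection.
    denom = math.cos(dec0) - eta * math.sin(dec0)
    ra = ra0 + math.atan2(xi, denom)
    dec = math.atan2(math.sin(dec0) + eta * math.cos(dec0), math.hypot(xi, denom))
    return math.degrees(ra) % 360.0, math.degrees(dec)
```

For example, `pixel_to_sky(100, 100, (100, 100), (80.0, -69.0), [[-0.0002, 0.0], [0.0, 0.0002]])` returns the reference position itself, because the pixel offset is zero. Nothing in the calculation depends on MACHO or telescope specifics.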
17 Macho Data: WCS Calibration
- World Coordinate System Augmentation: the gory details.
- For each original MACHO Image
- Stars chosen using SExtractor.
- Stars identified from the UCAC2 catalog based on header RA, DEC.
- Triangle matching algorithm used to match up trios of stars (a triangle is scale and rotationally invariant, so this code can be used on almost any data without knowledge of image orientation details).
- Matching triangles are then fit with a linear transformation of the form X2 = a*X1 + b*Y1 + c, Y2 = d*X1 + e*Y1 + f, and the match is considered complete when at least 5 trios of stars are found in a small region of the 6-parameter space (a->f).
- This linear transformation is used to match up all stars from the astrometric catalog with those identified by SExtractor.
- Using IRAF, a 3rd order transformation is then used to map from x,y on the image to RA, DEC on the sky, and this is written into the image header.
- This code courtesy of Brian Schmidt of MSO.
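Two ingredients of the matching steps above can be sketched in a few lines: a triangle descriptor that does not change under shift, rotation or uniform scaling, and a least-squares fit of the six parameters a to f. This is an illustrative sketch only, not the code credited on the slide; the function names are hypothetical and the voting over the 6-parameter space is left out.

```python
import math


def triangle_signature(p1, p2, p3):
    """Return (shortest/longest, middle/longest) side-length ratios.

    The ratios are unchanged by translation, rotation and uniform scaling,
    so image and catalogue triangles can be compared directly.
    """
    sides = sorted([math.dist(p1, p2), math.dist(p2, p3), math.dist(p3, p1)])
    return sides[0] / sides[2], sides[1] / sides[2]


def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [value] for row, value in zip(matrix, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - known) / a[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares fit of X2 = a*X1 + b*Y1 + c and Y2 = d*X1 + e*Y1 + f.

    src and dst are equal-length lists of matched (x, y) positions.
    """
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations: (A^T A) p = A^T b, solved once per output coordinate.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atx = [sum(r[i] * q[0] for r, q in zip(rows, dst)) for i in range(3)]
    aty = [sum(r[i] * q[1] for r, q in zip(rows, dst)) for i in range(3)]
    a, b, c = solve3(ata, atx)
    d, e, f = solve3(ata, aty)
    return a, b, c, d, e, f
```

In a full matcher, each pair of triangles with similar signatures would yield one (a to f) estimate, and a cluster of at least five agreeing estimates would be accepted as the match.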
18 Macho Data: Metadata VO-enabled
- Presenting metadata in VOTable 1.0
- Emerging international standard for VO data-exchange
- XML based.
- Maximise opportunities for use of MACHO data by VO search-engines, visualisation tools, etc.
- Technology
- XSLT pipeline? Python/libxml?
- Legacy XML data-set -> VOTable
- Convert en-masse, or on-demand?
- Storage vs. Transmission
19 Macho Data: Metadata VO-enabled
- Converting metadata to VOTable 1.0
- Issues
- UCD semantic mappings: VO ontology
- According to the VOTable standard, a Unified Content Descriptor (UCD) needs to be associated with each data field, but which one?
- Approx 1500 UCDs defined
- E.g. 22 for time alone:
- TIME: Time and related Quantities
- TIME_AGE: Age
- TIME_CROSSING: Crossing Time
- TIME_CYCLE: Cycle number of a variable object.
- TIME_DATE: Date (Julian Date or Heliocentric Julian date)
- TIME_DELAY: Time Delay
- TIME_DIFF: Time Difference O-C Residual
- TIME_DP/DT: Period Rate Of Change
- TIME_EPOCH: Epoch
- TIME_EQUINOX: Equinox
- TIME_EVOLUTION: Evolution Time
- TIME_EXPTIME: Exposure Time
- TIME_INTERVAL: Time Interval
- TIME_LIMIT: Time Limit
- TIME_MISC: Time in general
- TIME_PERIOD: Period
- TIME_PHASE: Phase (in the context of periodic variable objects)
- TIME_RATE: Rate Of Certain Phenomenon
- TIME_RELAXATION: Relaxation Time
- TIME_RESOLUTION: Time resolution
- TIME_SCALE: Time Scale
- TIME_ZONE: Time Zone
- UCDs need to be verified by real astronomers
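As a rough sketch of the Python route to VOTable, the snippet below builds a small VOTable 1.0 document in which every FIELD carries a UCD. The column names, units and table name are hypothetical, and the two UCDs picked from the time list are only plausible guesses of the kind an astronomer would still need to verify.

```python
import xml.etree.ElementTree as ET

# Hypothetical column definitions; the UCD choices are illustrative, not verified.
FIELDS = [
    {"name": "obs_date", "datatype": "double", "unit": "d", "ucd": "TIME_DATE"},
    {"name": "exptime", "datatype": "float", "unit": "s", "ucd": "TIME_EXPTIME"},
]


def build_votable(rows, table_name="macho_images"):
    """Serialise a list of row dicts as a VOTable 1.0 document (TABLEDATA form)."""
    votable = ET.Element("VOTABLE", version="1.0")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE", name=table_name)
    # One FIELD element per column, carrying its datatype, unit and UCD.
    for field in FIELDS:
        ET.SubElement(table, "FIELD", **field)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    # One TR per row, one TD per column, in FIELD order.
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for field in FIELDS:
            ET.SubElement(tr, "TD").text = str(row[field["name"]])
    return ET.tostring(votable, encoding="unicode")
```

Because the function works on one batch of rows at a time, it could serve either an en-masse conversion or an on-demand one; the storage versus transmission trade-off is unaffected by the choice of tooling.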
20 Macho Data: Images VO-enabled
- Convert images to VO compliant FITS files
- Construct multi-image ME-FITS files
- Original MACHO images
- Tar bundle of 16 FITS format files
- Legacy of MACHO camera system
- 8 CCDs per colour x 2 colours x 1 FITS file each
- After WCS processing
- 2 ME-FITS files, 1 per colour, each containing 8 images
- Final stage of WCS conversion pipeline.
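The regrouping of 16 per-CCD files into two per-colour files can be sketched as a planning step. The colour labels, file names and output naming below are assumptions made for illustration; writing the actual multi-extension files would be done with a FITS library and is not shown.

```python
from collections import defaultdict


def plan_mefits(ccd_files):
    """Group per-CCD files into one ordered extension list per colour.

    ccd_files: iterable of (colour, ccd_index, filename) tuples.
    Returns {output_name: [filenames in CCD order]}.
    """
    groups = defaultdict(list)
    for colour, ccd, filename in ccd_files:
        groups[colour].append((ccd, filename))
    plan = {}
    for colour, members in groups.items():
        # Each colour is expected to contribute exactly 8 CCD images.
        if len(members) != 8:
            raise ValueError(f"expected 8 CCD files for {colour}, got {len(members)}")
        plan[f"{colour}.mef.fits"] = [name for _, name in sorted(members)]
    return plan
```

Keeping the CCD order fixed within each output file means an extension number always refers to the same chip.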
21 Macho Data: Conversion pipeline
[Diagram: FITS Alignment, WCS Augmentation, Multi-image FITS Files]
- Convert images to VO compliant FITS files
- 1. World Coordinate System Augmentation
- 2. Convert to Multi-image ME-FITS files
22 Macho Data: Conversion pipeline
- Conversion pipeline on APAC LC cluster
- Highly parallel process
- Stage original images from MDSS
- Spawn one job per image to perform WCS augmentation and alignment of FITS header metadata
- Stage updated FITS files back to MDSS
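The one-job-per-image pattern can be sketched with a local worker pool standing in for the cluster's batch system. Every function here is a hypothetical placeholder: the real staging, WCS and header steps are external tools, and the dictionaries only record which steps ran.

```python
from concurrent.futures import ThreadPoolExecutor


def stage_in(image_id):
    """Placeholder: fetch the original image from mass storage."""
    return {"id": image_id, "steps": ["staged_in"]}


def augment_wcs(image):
    """Placeholder: add WCS information to the image header."""
    image["steps"].append("wcs")
    return image


def align_header(image):
    """Placeholder: align FITS header values with the XML metadata."""
    image["steps"].append("header")
    return image


def stage_out(image):
    """Placeholder: copy the updated file back to mass storage."""
    image["steps"].append("staged_out")
    return image


def process_image(image_id):
    """Run the full per-image chain; images are independent of each other."""
    return stage_out(align_header(augment_wcs(stage_in(image_id))))


def run_pipeline(image_ids, workers=4):
    """Process every image in parallel, returning results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_image, image_ids))
```

Since no image depends on any other, the work scales with the number of workers until storage staging becomes the bottleneck.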
24 After MACHO: Future VO Work
- WFI: Wide Field Imager
- 8 CCD instrument in use at AAT and SSO 40-inch
- 52' x 52' field of view on SSO 40-inch
- Legacy data archive under construction
- Valuable VO resource ripe for VO-enabling.
- 4S: Stromlo Southern Sky Survey
- To use upcoming SkyMapper telescope
- 5 sq-deg field of view images
- 30 times larger than full moon
- Data to stream direct to APAC MDSS
- 25TB of images expected
- VO Compliant data standards from day one
- Build on MACHO/WFI lessons and infrastructure
25 Summary
- Philosophy
- get the data out there
- VO enable legacy data-sets
- MACHO
- WFI
- Deal with the issues
- Metadata and data standardisation
- Ontological/semantic standardisation
- Interfaces and interoperability
- Ready for new data streams as they come online
- WFI
- 4S: VO compliant from day one