Title: UK E-Science Initiative and its Application to SDO
1 UK E-Science Initiative and its Application to SDO
2 SUMMARY
- The UK Astrogrid
- Dealing with SDO Data Volumes
- The PPARC E-Science AO
- HMI Data Products and Pipeline
3 What is the Grid?
- Ian Foster (Argonne National Lab and University of Chicago): a Grid is a system that
- Coordinates resources that are not subject to centralized control.
- Uses standard, open, general-purpose protocols and interfaces.
- Delivers nontrivial qualities of service.
- Source: Ian Foster, "What is the Grid? A Three Point Checklist"
[Diagram labels: GRID, Network, PC, Laptop, Mainframe, Phone / PDA, Space Missions, Printer]
4 UK Astrogrid
- Astrogrid is one of three major world-wide projects (along with the European AVO and US-VO projects) which aim to create an astronomical Virtual Observatory
- Astrogrid has a significant Solar Physics component
- The Virtual Observatory will be a set of co-operating and interoperable software systems that
- allow users to interrogate multiple data centres in a seamless and transparent way
- provide powerful new analysis and visualisation tools
- give data centres a standard framework for publishing and delivering services using their data.
5 How does Astrogrid work?
Web Service: A web service is any piece of software that makes itself available over the Internet and uses a standardized XML messaging system. - Ethan Cerami, Top Ten FAQs for Web Services, The O'Reilly Network
[Diagram labels: User, Web Interface, Web Service (repeated), Data Archive, Data Storage, Data Transformation Processing, RESOURCES, Distributed Network of Registries]
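The web service definition on this slide rests on "a standardized XML messaging system". As a minimal sketch of that idea, assuming a made-up message layout (the element names `request`, `service` and `param` are illustrative and are not taken from Astrogrid), a client could build a request and a service could parse it like this:

```python
import xml.etree.ElementTree as ET


def build_request(service, params):
    """Serialise a request as an XML string (hypothetical element names)."""
    root = ET.Element("request")
    ET.SubElement(root, "service").text = service
    for name, value in params.items():
        param = ET.SubElement(root, "param", name=name)
        param.text = str(value)
    return ET.tostring(root, encoding="unicode")


def parse_request(xml_text):
    """Recover the service name and the parameters from the XML message."""
    root = ET.fromstring(xml_text)
    service = root.findtext("service")
    params = {p.get("name"): p.text for p in root.findall("param")}
    return service, params


# Illustrative values only.
message = build_request("data_archive_query", {"instrument": "HMI", "date": "2003-04-08"})
service, params = parse_request(message)
```

Because both sides agree only on the message format, the archive, storage and processing services can be implemented independently of the web interface that calls them.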
6 Astrogrid Registry
Registry: Dynamic database of metadata describing a set of Internet-available resources. A registry is used to identify and locate resources satisfying user-specified criteria, and to direct more detailed information requests to the relevant services. - Robert Hanisch, STScI
- METADATA
- Basic: ID, title, service type
- Curation: Location, contact, publisher, creator, etc.
- Metadata: Allowed methods, input / output variables, etc.
- Metadata Format: Wavelength, coordinates, instrument coverage
Registries contain information about resources
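The registry idea (metadata records searched against user-specified criteria) can be sketched in a few lines. This is an illustration only: the record fields loosely follow the metadata categories listed on the slide, and all identifiers, titles and publishers are invented.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceRecord:
    """One registry entry; field names are illustrative."""
    identifier: str
    title: str
    service_type: str
    publisher: str
    wavelength: str
    methods: list = field(default_factory=list)


def find_resources(registry, **criteria):
    """Return the records whose attributes equal every given criterion."""
    return [
        rec for rec in registry
        if all(getattr(rec, key) == value for key, value in criteria.items())
    ]


# Invented example entries.
registry = [
    ResourceRecord("ex:solar-xray", "Example solar X-ray archive",
                   "archive", "Example Centre A", "x-ray", ["query"]),
    ResourceRecord("ex:solar-optical", "Example solar optical archive",
                   "archive", "Example Centre B", "optical", ["query"]),
    ResourceRecord("ex:lightcurve-tool", "Example lightcurve extractor",
                   "processing", "Example Centre A", "x-ray", ["run"]),
]

matches = find_resources(registry, service_type="archive", wavelength="x-ray")
```

The search returns only descriptions of resources; a detailed data request would then go to the service named in the matching record.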
7 Solar Interior to Outer Atmosphere
- Science goal: Connect observations of the interior to fluctuations in the solar atmosphere
- Data required: Helioseismology observations connected with solar atmosphere observations
- Current difficulties: Being able to search efficiently for solar atmospheric events that may be responding to an excitation source in the interior
- Grid future: Ability to
- Search easily for events, e.g. flux emergence, AR evolution, flares, coronal mass ejections, over specific time periods
- Extract parameters over the cycle from the atmosphere and interior in order to compare their evolution
- Crucial for SDO to relate convection zone observations to magnetic field data for the Photosphere and above
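The "search easily for events over specific time periods" capability amounts to filtering an event catalogue by type and time window. A minimal sketch, assuming a simple in-memory list of events (the entries below are made up and are not real observations):

```python
from datetime import datetime

# Made-up events for illustration only.
EVENTS = [
    {"kind": "flare", "time": datetime(2003, 1, 5, 12, 0)},
    {"kind": "cme", "time": datetime(2003, 1, 5, 13, 30)},
    {"kind": "flux_emergence", "time": datetime(2003, 2, 1, 8, 0)},
    {"kind": "flare", "time": datetime(2003, 3, 10, 22, 15)},
]


def search_events(events, kinds, start, end):
    """Select events of the given kinds with start <= time < end."""
    return [e for e in events if e["kind"] in kinds and start <= e["time"] < end]


hits = search_events(EVENTS, {"flare", "cme"},
                     datetime(2003, 1, 1), datetime(2003, 2, 1))
```

In a Grid setting the same query would be sent to several archives, with the registry identifying which ones hold event lists of the requested kind.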
8 SDO HMI Archiving and Processing
- SDO instruments generate raw data (2 Tbyte/day) along with derived products
- Derived products result from pipeline processing that must keep up with the flow of incoming data
- GRID or Virtual Observatory approach could allow
- Distributed data holding
- Distributed processing capability
- Network bandwidths and processing power at single sites set limits
- Available network bandwidths for users could limit data transfer from/between multiple archives
- All data at one site implies considerable processing power accessible by many distributed users
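To put the 2 Tbyte/day figure in network terms, the sketch below converts a daily data volume into the average bit rate a pipeline or link must sustain to keep up. It assumes decimal units (1 Tbyte = 10^12 bytes) and ignores protocol overhead and bursts.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mbps(bytes_per_day):
    """Average rate in megabits per second needed to move one day's data in one day."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


# Assumption: 1 Tbyte = 10**12 bytes.
raw_rate = sustained_rate_mbps(2e12)
print(f"{raw_rate:.0f} Mbit/s sustained")
```

Under these assumptions the raw stream alone needs roughly 185 Mbit/s on average, before any derived products or user requests are added.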
9 Distributed Archive Approach
- Multiple copies of the data are desirable
- Needs a minimum of two geographically separated sources, with the advantages
- Greater resilience in ability to supply users
- Load sharing between different providers (network and processing)
- Avoids need for a single site to provide excessive processing power
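The resilience and load-sharing advantages can be illustrated with a small site-selection routine. This is a sketch under simple assumptions: each site reports whether it is up and a load figure between 0 and 1, and the site names are placeholders.

```python
def choose_site(sites):
    """Pick the available site with the lowest load, or None if all are down."""
    available = [s for s in sites if s["up"]]
    if not available:
        return None
    return min(available, key=lambda s: s["load"])


# Two geographically separated copies of the archive (placeholder names).
sites = [
    {"name": "site_a", "up": True, "load": 0.8},
    {"name": "site_b", "up": True, "load": 0.3},
]

primary = choose_site(sites)["name"]    # load sharing: the less busy site is used
sites[1]["up"] = False                  # simulate an outage at site_b
fallback = choose_site(sites)["name"]   # resilience: the request still succeeds
```

With a single archive the second call would have nothing to fall back on, which is the case for holding at least two copies.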
10 Single Archive Approach
- Solar data are normally stored in raw form and need to be processed before use
- Processing involves extraction and calibration of selected observations.
- For data (e.g. helioseismology data) involving extended time intervals, processing data at source is desirable
- Advantages that result
- Reduced amount of information to be returned to user
- Affords the instrument teams more control over the processing and quality of their data products
- but
- Heavy loading of processors at single archive site unless requests are for high-level lower-volume data products
11 Network Issues
- UK has SuperJanet backbone currently at 10 Gbps
- Local access points operate at 2.5 Gbps (e.g. UCL interconnect rate to backbone)
- Europe has Geant backbone at 10 Gbps covering UK, France, Germany, Sweden, Switzerland with 2.5 Gbps local interconnects
- Transatlantic connection to Geant currently 2.5 Gbps with upgrade to 10 Gbps planned for 2004
- Discussion of Global 1 Tbps network by 2006??
- Geant driven in part by needs of HEP community for LHC, hence SDO may not have a problem in moving data between sites
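The claim that moving data between sites may not be a problem can be checked with simple arithmetic. The sketch below estimates how long one day of SDO raw data (2 Tbyte, as quoted for SDO in this presentation, decimal units assumed) takes to cross a 2.5 Gbps or 10 Gbps link. The `efficiency` parameter is an assumption added here to represent the usable fraction of the nominal rate.

```python
def transfer_hours(n_bytes, link_gbps, efficiency=1.0):
    """Hours needed to move n_bytes over a link, given the usable fraction of its rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = n_bytes * 8 / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


one_day = 2e12  # bytes per day; 1 Tbyte = 10**12 bytes assumed
for gbps in (2.5, 10):
    print(f"{gbps} Gbps: {transfer_hours(one_day, gbps):.2f} h")
```

At full nominal rate a day of raw data fits in under two hours on a 2.5 Gbps link, so the quoted backbones have headroom as long as SDO gets a reasonable share of the capacity.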
12 PPARC E-Science AO
- Proposals due by 31st May, 2003
- Existence of first level Astrogrid infrastructure assumed
- Proposals should
- Be for the application of infrastructure and related techniques to real data sets
- Underpin science, but close connection between projects and the science programme is essential
- Demonstrate an enabling role for eventual science exploitation
- Ensure development of standards and deployment of Grid infrastructure
- SDO bid is now anticipated by PPARC
13 HMI Data Analysis Pipeline
[Figure labels: Net Access / Mirror, Enabling Code / Algorithms, Data Product]
HMI SRR/SCR Presentation April 8-10
14 HMI Science Data Analysis Plan
Science Exploitation
15 HMI Data Volumes
Net Access
16
17 What is Astrogrid?
Astrogrid is a £5M data grid project that will link data archives, resources, and disciplines from UK space institutions into a virtual observatory.
- Data Archives
- Mullard Space Science Laboratory
- Rutherford Appleton Laboratory
- University of Cambridge
- University of Leicester
- Royal Observatory Edinburgh
- Queen's University Belfast
- Jodrell Bank Observatory
18 GRID/Virtual Observatory
- Within a virtual observatory
- Not required for all datasets to be stored at a single site
- Metadata and registries allow system to handle a distributed archive.
- Different organisations or countries could host the different datasets or different parts of the datasets (e.g. split by time).
- Complete catalogues relating to particular datasets should be held wherever the data are held.
- Distributed data holding reduces the pressure on
- Network connection to an archive
- Processing capabilities needed at the archive site
- Most accessed data could be selectively copied to distributed archives e.g. EGSO, Astrogrid
- Derived data products should be held at distributed sites
- Material needed for more detailed searches should be described by metadata in appropriate registries.
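A dataset "split by time" across hosts implies that a request for a time range has to be broken into per-site pieces. The sketch below shows one way this could work, assuming each site registers a half-open date range of holdings. The site names and dates are hypothetical.

```python
from datetime import date

# Hypothetical holdings: each site stores the half-open range [start, end).
HOLDINGS = [
    ("site_a", date(2008, 1, 1), date(2009, 1, 1)),
    ("site_b", date(2009, 1, 1), date(2010, 1, 1)),
]


def sites_for(start, end, holdings=HOLDINGS):
    """List (site, sub_start, sub_end) pieces covering the requested [start, end)."""
    pieces = []
    for site, held_from, held_to in holdings:
        lo, hi = max(start, held_from), min(end, held_to)
        if lo < hi:  # the request overlaps this site's holding
            pieces.append((site, lo, hi))
    return pieces


plan = sites_for(date(2008, 11, 1), date(2009, 2, 1))
```

Here the holdings table plays the role of registry metadata: the user asks for one time range and the system works out which archive serves which part.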
19 Example: Solar / Stellar Flares
Science Problem: A solar physicist studying the flare mechanism would like to gather data on both solar and stellar flares.
Data Required: X-ray datasets (lightcurves, spectra, and redshift / blueshift information) from SOHO, Yohkoh, EXOSAT, ROSAT, XMM, Chandra, etc.
Current Issues: No stellar flare catalogue (at the time the science problem was written); datasets provided by several different archives with no common interface.
[Diagram labels: NEW Stellar Flare Catalogue, Merged Solar Flare List, User, Web Interface]
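A "merged solar flare list" drawn from archives with no common interface requires mapping each archive's records to one schema and then combining them in time order. A minimal sketch, assuming two archives that use different field names (all field names and entries below are invented for illustration):

```python
import heapq

# Made-up entries; each archive is assumed to use its own field names.
archive_a = [{"t_start": "2003-01-05T12:00", "id": "A-001"},
             {"t_start": "2003-03-10T22:15", "id": "A-002"}]
archive_b = [{"begin": "2003-02-01T08:00", "name": "B-17"}]


def normalise_a(rec):
    """Map an archive_a record to the common schema."""
    return {"start": rec["t_start"], "event_id": rec["id"], "source": "archive_a"}


def normalise_b(rec):
    """Map an archive_b record to the common schema."""
    return {"start": rec["begin"], "event_id": rec["name"], "source": "archive_b"}


def merged_flare_list(a, b):
    """Merge both catalogues in time order.

    Assumes each input is already sorted by start time; ISO 8601 strings
    in the same format sort chronologically.
    """
    return list(heapq.merge(map(normalise_a, a), map(normalise_b, b),
                            key=lambda rec: rec["start"]))


merged = merged_flare_list(archive_a, archive_b)
```

The per-archive `normalise_*` functions stand in for the common interface that the slide identifies as missing; adding another archive means adding one more adapter.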
20 HMI Data Archive
21 HMI Data Flow
22 HMI Dataflow Concept
23 HMI Standard Data Products
24 UK Astrogrid Scientific Aims
- Improve the quality, efficiency, ease, speed, and cost-effectiveness of on-line astronomical research
- Make comparison and integration of data from diverse sources seamless and transparent
- Remove data analysis barriers to interdisciplinary research
- Make science involving manipulation of large datasets as easy and as powerful as possible.
25 UK Astrogrid Practical Goals
- Develop, with our IVOA partners (including European Grid of Solar Observations/EGSO), internationally agreed standards for data, metadata, data exchange and provenance
- Develop a software infrastructure for data services
- Establish a physical grid of resources shared by AstroGrid and key data centres
- Construct and maintain an AstroGrid Service and Resource Registry
- Implement a working Virtual Observatory system based around key UK databases and of real scientific use to astronomers
- Provide a user interface to that VO system
- Provide, either by construction or by adaptation, a set of science user tools to work with that VO system
- Establish a leading position for the UK in VO work