Title: The European Grid of Solar Observations
1 The European Grid of Solar Observations
- Bob Bentley (UCL) and the EGSO Team
- 20 November 2003, IAS
2 Outline
- Overview of EGSO and its relationship with other projects
- The problem being addressed by EGSO
- EGSO Search capability
- Solar Event and Feature Catalogues
- Current status of EGSO
3 EGSO: European Grid of Solar Observations
- EGSO is a Grid test-bed related to a particular application
- Designed to improve access to solar data for the solar physics and other communities
- Addresses the generic problem of a distributed heterogeneous data set and a scattered user community
- Funded under the Information Society Technologies (IST) thematic priority of the EC's Fifth Framework Programme (FP5)
- Started March 2002; duration of 36 months
- Involves 11 groups in Europe and the US, led by UCL-MSSL
- 4 in UK, 2 in France, 2 in Italy, 1 in Switzerland, 2 in US
- Several associate partners, mainly in the US
- EGSO is interacting with many other projects
- US-VSO, CoSEC and EGSO working closely together
- Also working with the ILWS/SDO Project
- Collaborated with ESA's study project SpaceGRID
- Involved with other EC funded Grid projects through GRIDSTART
4 Objectives of EGSO
- Support user community scattered around the world
- Current and future projects are international collaborations
- EGSO funded by EC, but has US partners
- Build enhanced search capability for solar data
- Analysis of solar data is event driven
- Search capability linked to this not currently available
- Increasing data volumes, etc. require new methodology
- Provide access to solar data centres and observatories around the world
- Data available in Europe (or US) not enough for many studies
- Provide ability to process data at source
- Both pipeline and more complex processing
- Includes ability to upload code to some providers
- Need for these capabilities not unique to solar physics
5 The extended EGSO family
- Partners provide expertise in solar physics and IT
- UK
- UCL-MSSL, UCL-CS, RAL, Univ. Bradford, Astrium
- France
- IAS (Orsay), Observatoire de Paris-Meudon
- Italy
- Istituto Nazionale di Astrofisica (INAF), Politecnico di Torino
- INAF includes observatories of Turin, Florence, Naples and Trieste
- Switzerland
- Univ. Applied Sciences (Windisch)
- Netherlands
- ESA Solar Group
- US
- SDAC (NASA-GSFC), National Solar Observatory (VSO)
- Stanford University, Montana State University (VSO)
- Lockheed-Martin (CoSEC)
6 Connections with related Projects
- Virtual Solar Observatory (US-VSO)
- SDAC (NASA-GSFC) and NSO are partners in EGSO
- EGSO Coordinator is Chair of VSO Steering Committee
- Differences in scale and objectives (big/small box)
- Sun-Earth Connector (CoSEC)
- EGSO Coordinator is a CoI in CoSEC continuation grant
- Significant synergies between EGSO and CoSEC
- The three projects are trying to collaborate closely
- Joint sessions at AAS and AGU; regular telecons
- Joint Technical Meeting in conjunction with AGU (Dec.)
- European attendance supported by US$8K from NASA-HQ
- VSPO and VHO also attending
- All involved in new WG of Div. II (Sun and Heliosphere) of IAU on International Data Access (chair: EGSO Coordinator)
- EGSO leading discussions on interoperability with the IVOA
7 Potential expansion
- EGSO architecture is designed to be flexible, and the system should be able to handle potential areas of expansion. On the horizon are:
- International Living with a Star (ILWS)
- Solar Dynamics Observatory (SDO) will produce 2 TB/day
- EGSO could enable immediate access to the data
- And optimize creation of duplicate copy in Europe
- Modelling and Simulations
- Should be possible if we can support uploading of code
- Space Weather
- Immediacy of access to data: few minutes to few hours
- Access to other types of data: STP, magnetospheric, etc.
- Use of models with parameters derived from data as input
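To give a feel for what keeping a duplicate copy of SDO data in Europe would involve, the 2 TB/day figure can be turned into a sustained network rate. This is only a back-of-envelope sketch: it assumes decimal terabytes (10^12 bytes) and a perfectly even transfer over 24 hours, and the function name is purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def sustained_rate_mbit_per_s(terabytes_per_day: float) -> float:
    """Average rate in Mbit/s needed to move the given daily volume."""
    bits_per_day = terabytes_per_day * 1e12 * 8  # assumes 1 TB = 10**12 bytes
    return bits_per_day / SECONDS_PER_DAY / 1e6

rate = sustained_rate_mbit_per_s(2.0)
print(f"{rate:.0f} Mbit/s sustained")
```

Real transfers are bursty and carry protocol overhead, so the link would need headroom above this average.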
8 Generic Solar Physics Query
- Identify suitable observations (many serendipitous)
- As many different data sets as are available
- Should be possible without accessing the data
- Locate the data
- Data scattered, with differing means of access (some proprietary)
- Often only need a subset of each data set
- Process the data
- Involves extraction and calibration of a subset of raw data
- Uses code defined by instrument teams (SolarSoft, C)
- Return results to the User
- Compare results from different instruments
- SolarSoft (IDL) provides a standard platform for analysis
- Note the exchange in order of the 3rd and 4th bullets in this Grid expression of the problem, as compared to current practice
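The four steps of the generic query (identify, locate, process at source, return) can be sketched as a small pipeline. Everything here is hypothetical: the record layout, the provider names and the function names are assumptions made for illustration, and the "processing" step is a stand-in that merely clips each observation to the requested interval.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class CatalogueEntry:
    instrument: str   # hypothetical instrument label
    provider: str     # hypothetical archive holding the raw data
    start: float      # observation start, hours from an arbitrary epoch
    end: float        # observation end, same units

def identify(catalogue, t0, t1):
    """Step 1: find overlapping observations using metadata only."""
    return [e for e in catalogue if e.start < t1 and e.end > t0]

def locate(entries):
    """Step 2: group the selected observations by the provider holding them."""
    by_provider = defaultdict(list)
    for e in entries:
        by_provider[e.provider].append(e)
    return dict(by_provider)

def process_at_source(entry, t0, t1):
    """Step 3: stand-in for extraction/calibration run at the provider.
    Only the requested sub-interval is kept."""
    return (entry.instrument, max(entry.start, t0), min(entry.end, t1))

def run_query(catalogue, t0, t1):
    """Step 4: return the reduced products to the user."""
    located = locate(identify(catalogue, t0, t1))
    return sorted(process_at_source(e, t0, t1)
                  for entries in located.values() for e in entries)

catalogue = [
    CatalogueEntry("EUV imager", "archive-A", 0.0, 6.0),
    CatalogueEntry("H-alpha telescope", "archive-B", 4.0, 10.0),
    CatalogueEntry("Radio array", "archive-B", 12.0, 14.0),
]
results = run_query(catalogue, 5.0, 8.0)
```

Processing happens before the results travel to the user, which is the reordering of steps that the last bullet points out.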
9 Nature of solar observations
- For a complete picture of what is happening, we need to use as wide a range of observations as possible
- The appearance of the Sun changes dramatically with wavelength
- Different layers of the solar atmosphere and material at different temperatures are best seen at different wavelengths
- For technical and practical reasons
- UV, EUV, X-rays and γ-rays observed from space
- Radio and optical wavelengths observed from the ground
- Issues related to coverage by each observatory
- Differences in approach to handling data have developed
- The observations are used to build up a picture of the plasma in multi-dimensional parameter space (incl. x, y, z, t, T, ...)
- How plasma contained in 3D structures evolves with time
- Where and how energy is released and how it affects the system
- Etc.
10 Some generic issues
- We need to build on the existing situation
- User community scattered around the world
- Capabilities of users and their computing facilities vary greatly
- Users want to know if data addressing a problem exists
- Not really interested in where the data are located
- Or, how the data are accessed, processed, etc.
- Increasing desire for combined studies with other regimes
- Astrophysics, Climate Physics, Space Weather, etc.
- Data centres and observatories located around the world
- Large and small data providers (with varying resources)
- Need to make it as easy as possible to add new data sets
- Planned data volumes much larger than for current instruments
- Cataloguing differs in quality, contents, and dependencies
- Must handle multiple copies of data and proprietary data
- Must ensure integrity of data providers
- Authentication an issue that needs serious consideration
- Need to minimize how it affects the user, etc.
11 Handling the data
- EGSO will dramatically enhance access to solar data
- Data could be located anywhere in the world
- User only needs to know observations exist, not where located
- System able to optimize use of sources (closest, least used, etc.) and handle replicated data and aggregated sources
- Burden on provider minimized to encourage participation
- As far as possible, process the data at source
- Involves extraction and calibration of a subset of the raw data
- Software for processing defined by instrument team (IDL, C)
- Processing reduces volumes of data moved around
- Simplifies requirements on users' own system
- Standard (pipeline) processing adequate for many users
- More complex problems require ability to upload code
- Used in analysis of extended data sets (helioseismology, etc.)
- System allocates resources; security an issue
- Models and simulations have similar requirements
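One way to picture "optimize use of sources (closest, least used, etc.)" is a simple cost function over the replicas of a data set. This is a minimal sketch under stated assumptions, not a description of how EGSO actually selects sources: the `Replica` fields, the site names and the weighting of load against distance are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    provider: str
    distance_km: float      # hypothetical measure of how "close" the source is
    active_requests: int    # hypothetical measure of how heavily it is used
    online: bool = True

def choose_replica(replicas, load_weight_km=500.0):
    """Pick the on-line replica with the lowest combined cost.
    Each active request is treated as equivalent to load_weight_km of distance."""
    candidates = [r for r in replicas if r.online]
    if not candidates:
        raise LookupError("no on-line replica holds the requested data")
    return min(candidates,
               key=lambda r: r.distance_km + load_weight_km * r.active_requests)

replicas = [
    Replica("site-europe", 300.0, 13),
    Replica("site-us", 6000.0, 1),
    Replica("site-offline", 100.0, 0, online=False),
]
best = choose_replica(replicas)
```

With these numbers the busy nearby site loses to the distant but idle one; setting `load_weight_km=0.0` reduces the rule to "closest on-line copy".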
12 The EGSO Search Engine
- In order to provide an enhanced search capability, EGSO will improve the quality and availability of metadata
- Enhanced cataloguing describes the data more fully
- Standardized metadata versions of observing catalogues tie together the heterogeneous data sets
- New types of catalogue allow searches on events, features and phenomena rather than just date, time, pointing, etc.
- Ancillary data used to provide additional search criteria
- Images, time series, derived products, etc.
- Search Registry describes all metadata available for search
- It will be possible to access EGSO through
- A flexible Graphical User Interface (GUI): the normal route
- An Application Program Interface (API): this provides access for users from other applications, communities or Grids
14 One User Interface implementation will not satisfy all user requirements, and users will be able to tailor the interface to their needs
15 The enhanced solar catalogues
- Unified Observing Catalogues (UOC)
- Metadata form of observing catalogues used to tie together the heterogeneous data, leaving the data unchanged
- Self-describing (e.g. XML), quantised by time and instrument, with no dependencies on ancillary data or proprietary software, and with any errors corrected
- Standards defined for future data sets (e.g. STEREO, ILWS, Solar-B)
- Solar Event Catalogues (SEC)
- Built from information contained in published lists
- Flare lists, CME lists, lists in SGD, etc.
- Solar Feature Catalogue (SFC)
- Lists of the occurrence of events, phenomena and features provide an alternate means of selecting data
- Derived using image recognition software developed in WP5
- Similar hierarchical cataloguing required in other data Grid projects
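As an illustration of what a self-describing catalogue record, quantised by time and instrument, might look like, the sketch below builds and re-reads one XML entry with Python's standard library. The element and attribute names (`observation`, `start`, `end`, `wavelength`) and the instrument label are assumptions made for this example; the slides do not define the UOC schema.

```python
import xml.etree.ElementTree as ET

def make_uoc_entry(instrument, start, end, wavelength_nm):
    """Build one catalogue record for a single instrument and time interval."""
    entry = ET.Element("observation", attrib={"instrument": instrument})
    ET.SubElement(entry, "start").text = start
    ET.SubElement(entry, "end").text = end
    # The unit travels with the value, so no external lookup is needed.
    wl = ET.SubElement(entry, "wavelength", attrib={"unit": "nm"})
    wl.text = str(wavelength_nm)
    return entry

entry = make_uoc_entry("example-imager",
                       "2003-11-20T10:00:00Z", "2003-11-20T10:12:00Z", 19.5)
xml_text = ET.tostring(entry, encoding="unicode")

# Any consumer can recover the fields without instrument-specific software.
parsed = ET.fromstring(xml_text)
```

Because units and field names are carried in the record itself, a reader needs neither ancillary data nor proprietary software to interpret it.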
16 The enhanced solar catalogues
Objective of the improved metadata is to pose questions like: "Identify events when a filament eruption occurred within 30° of the north-west limb and there were good observations in Hα, EUV and soft X-rays"
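A question of this kind combines an event type, a location on the disc and the availability of observations in several wavebands. The sketch below shows how such a filter could be expressed over event records. The `Event` fields are hypothetical, and the test for "near the north-west limb" is a deliberately crude assumption (northern hemisphere, and within the given number of degrees of longitude of the west limb taken at 90° west); a real catalogue would need a properly defined geometric criterion.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    kind: str
    latitude_deg: float        # heliographic latitude, north positive
    longitude_deg: float       # central meridian distance, west positive
    observed_in: set = field(default_factory=set)  # wavebands with good data

def near_north_west_limb(ev, within_deg=30.0):
    """Crude assumption: northern hemisphere and within `within_deg`
    of longitude of the west limb (taken as 90 degrees west)."""
    return ev.latitude_deg > 0 and 90.0 - within_deg <= ev.longitude_deg <= 90.0

def search(events, kind, wavebands):
    """Return events of the given kind, near the NW limb, seen in all wavebands."""
    return [ev for ev in events
            if ev.kind == kind
            and near_north_west_limb(ev)
            and set(wavebands) <= ev.observed_in]

events = [
    Event("filament eruption", 25.0, 70.0, {"H-alpha", "EUV", "soft X-ray"}),
    Event("filament eruption", 25.0, 70.0, {"H-alpha"}),
    Event("filament eruption", -30.0, 80.0, {"H-alpha", "EUV", "soft X-ray"}),
    Event("flare", 20.0, 75.0, {"H-alpha", "EUV", "soft X-ray"}),
]
matches = search(events, "filament eruption", ["H-alpha", "EUV", "soft X-ray"])
```

Only the first record satisfies all three conditions; the others fail on waveband coverage, hemisphere or event type respectively.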
17 Prototypes of some Services
- Solar Event Catalogue (SEC) Server
- Server specializing in event catalogues
- EGSO interested in things that could be added
- Interface into EGSO as a Web Service
- Currently being tested with
- Flare lists from NOAA and RHESSI
- NOAA Proton Event list
- CME lists from SOHO/LASCO
- Can be seen through URL http://radiosun.ts.astro.it/sec/sec.php
- Planning to add
- NOAA Active Region (NAR) database
- Various indices: Sunspot Number (SSN), 10.7 cm flux, Kp
- Database for Solar Observatories (DSO)
- Registry needed within EGSO to provide more detailed information on (all) possible data sources
- Currently being populated and will shortly be accessible through the Web
18 Feature Recognition in EGSO
- The enhanced search capability of EGSO requires development of new types of metadata; the Solar Feature Catalog (SFC) is a major part of this
- Key to developing alternate routes into the data
- EGSO has a work package (WP5) dedicated to developing tools needed to detect common solar features and then employing them to derive the feature catalog
- Major undertaking! Where possible we need help from others in the community to
- Help verify the results and extend capability to as wide a range of features as possible
- Help refine ideas of how the results can be used
19 Outline of progress on the SFC
- Software developed to prepare images for feature recognition codes
- Removal of artifacts, regularize shape, etc.
- Now working on codes to detect the features
- Codes for sunspots, active regions and filaments developed and under test
- Codes to recognize coronal holes and magnetic neutral lines under investigation
- Document discussing techniques available shortly
- Trying to define standard way of describing features for the feature catalogue
- Preliminary version of SFC has been prepared
- Document on format available for discussion
- Now starting to experiment with the results to determine if objectives can be realized with the stored information
20 Image Preprocessing Toolkit
First part is code to clean images prior to further processing
- Difficulties with Images
- Image shape (ellipse), centre and pole coordinates
- Weather transparency (clouds) and different thickness of atmosphere
- Centre-to-limb darkening
- Defects in data (strips, lines, intensity)
- Errors in FITS header information
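Of the difficulties listed, centre-to-limb darkening is the easiest to illustrate numerically. The sketch below uses the textbook linear limb-darkening law, I(μ)/I(1) = 1 - u(1 - μ) with μ = sqrt(1 - (r/R)^2), to flatten a pixel value. This is an assumption chosen for illustration: the slides do not say which correction the EGSO toolkit applies, and the coefficient u = 0.6 is only a placeholder value.

```python
import math

def limb_darkening_factor(r, disc_radius, u=0.6):
    """Relative intensity at distance r from disc centre under the linear law
    I(mu)/I(1) = 1 - u * (1 - mu), with mu = sqrt(1 - (r/R)^2)."""
    if not 0 <= r < disc_radius:
        raise ValueError("pixel lies on or outside the solar limb")
    mu = math.sqrt(1.0 - (r / disc_radius) ** 2)
    return 1.0 - u * (1.0 - mu)

def flatten_pixel(intensity, r, disc_radius, u=0.6):
    """Divide out the expected darkening so quiet-Sun pixels have similar values."""
    return intensity / limb_darkening_factor(r, disc_radius, u)

# A quiet-Sun pixel at 60% of the disc radius is expected to be dimmer than
# disc centre; flattening rescales it to the disc-centre level.
corrected = flatten_pixel(88.0, r=0.6, disc_radius=1.0)
```

After flattening, a fixed intensity threshold means the same thing at disc centre and near the limb, which is what later feature-detection steps would rely on.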
21 Sunspot detection in white light
Original image on the left and detected sunspots on the right
22 Filament Detection in Hα
Original image on the left and detected filaments on the right
23 Active Region detection in Hα
Detected active regions on the right with corresponding result from Big Bear Solar Observatory on the left
24 AR detection in Ca II, Hα and EUV images
25 Use of the Solar Feature Catalog
- The SFC can be used in at least three ways
- Outline features recognized in one wavelength on an image taken in another (at a different time)
- Determine when events related to features have occurred, e.g. filament eruptions, flux emergence
- Track relative motion of features, e.g. sunspots
- The SFC will be deployed as a Server addressed through Web Services
- Not clear whether the SFC Server will be combined with the already deployed SEC Server
- Server will be accessible to other VO projects
- Feature Recognition software will be released under EGSO's Open Source software policy
- Requirement of EC on IST Projects
26 Current Status of EGSO
- Extensive survey of requirements in 2002
- Working architecture defined and detailed during the first half of 2003
- Release 1 of EGSO was demonstrated at IST2003 in Milan (October 2-4)
- Demonstration of how the three roles work together
- Solve simple query based on time and wavelength
- Access to data resources initially through SolarWeb
- Working prototype of the Solar Event Catalog Server
- New version of EGSO Data Model document was released recently
- Describes both solar and heliospheric data
27 Activities in the near future
- First components of Feature Recognition software and documentation available shortly
- Image Cleaning software is first part of toolkit
- Release 2 of EGSO due at the end of November
- Development of interface to Data Providers
- More complex query supported through SEC Server
- Greater GUI capabilities
- Release 3 of EGSO due mid-February 2004
- Building profiles of data (etc.) providers
- Discussing file formats and metadata with producers
- Trying to finalize designs of UOC and Search Registry, and description of providers in Resource Registry
- Format of synoptic maps discussed with NOAA and NSO
- Discussing concerns and interfaces with data sources
28 Conclusions
- Of necessity the solar community needs to move towards a virtual environment to access solar and related data
- EGSO is a Data/Computing GRID that will greatly enhance access to solar data and provide advanced search capabilities
- EGSO represents the European contribution to the global virtual solar observatory and has established close links with counterparts in the US. Discussions in progress will ensure full interoperability with the heliospheric community
- For more information on EGSO see
- http://www.egso.org
- Or e-mail
- bentley_at_egso.org
30 Building the Solar Feature Catalog
- For each feature we must first
- Fully test the feature recognition code using images from a wide time period and several sources
- Finalize the format of how the information is described in the catalog
- Then run code on a representative set of images
- Summary synoptic data gathered by SOHO is one example of this type, but not necessarily coherent
- Ideally need image cadence that allows us to have a reasonable idea of when things change
- Probably requires use of images from several GBOs
- Raises issues of consistency related to image quality, etc.
31 Architecture Viewed as 3 Tiers
32 Possible requirements for Data Providers
- Need to register each dataset. Required info. could include
- Catalogue Information: All observations should be summarised in observing catalogues (prime source only)
- Data Map: This defines in broad outline what time intervals of the data are actually held. It should be considered dynamic.
- Type of storage: On-line, near on-line or off-line.
- Means to retrieve data: The exact meaning of this information depends on whether the provider is active or passive.
- Active source: address where processed data can be retrieved from and the means of retrieval (ftp, http, etc.).
- Passive source: map of the physical location of data within the provider system and the means of retrieval.
- Resource limits: The resource usage beyond which a provider switches from active to passive mode needs to be defined.
- Details of access restrictions: If any part of a data set is proprietary (for some period) or otherwise restricted.
- Frequency of updates: So that the system does not have to constantly monitor all data sources
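The registration items listed for each dataset map naturally onto a single record. The sketch below is one possible shape for such a record, with two helper methods showing how the Data Map and the resource limit might be used. All field names, the example URL and the rule "active until the limit is exceeded" are assumptions for illustration, not the EGSO Resource Registry format.

```python
from dataclasses import dataclass

@dataclass
class DatasetRegistration:
    name: str
    catalogue_url: str                   # where the observing catalogue lives
    data_map: list                       # [(start, end), ...] intervals actually held
    storage: str                         # "on-line", "near on-line" or "off-line"
    retrieval: str                       # means of retrieval, e.g. "ftp" or "http"
    max_active_requests: int             # resource limit before going passive
    restricted: bool = False             # proprietary or otherwise restricted
    update_interval_hours: float = 24.0  # how often the registry should re-check

    def holds(self, start, end):
        """True if one held interval fully covers [start, end]."""
        return any(s <= start and end <= e for s, e in self.data_map)

    def mode(self, current_requests):
        """Provider acts as 'active' until its resource limit is exceeded."""
        return "active" if current_requests <= self.max_active_requests else "passive"

reg = DatasetRegistration(
    name="example-dataset",
    catalogue_url="http://provider.example/catalogue",  # placeholder address
    data_map=[(0, 100), (150, 200)],
    storage="on-line",
    retrieval="http",
    max_active_requests=5,
)
```

Since the Data Map is described as dynamic, a real registry would refresh `data_map` at the stated update frequency rather than treat it as fixed.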
33 Access to Resources
- EGSO is a Grid and activities depend on access to resources
- Resources described by entries in a Resource Registry and managed by a Broker. Types include
- Metadata: from prime data providers
- Data: from data centres, observatories, etc.
- Processing: simple, multi-instance processors, HPC(?)
- Storage: cache space, on-line mass storage, etc.
- Services: support of complex (meta)data products
- Note: Some providers can support multiple capabilities
- The Broker allocates resources and controls
- How much is being requested of a particular provider
- Processing of data and staging of results
- Processing may be at different site to data provider
- Broker and Registries replicated to provide system resilience and permit load sharing
34 Ancillary Data
- This is a catch-all term for all non-catalogue items
- Items used to set the context for a search
- Images, time series, derived parameters, etc.
- Processed products from data-intensive instruments
- STP, etc. data could be incorporated in this way
- Some data items will have to be derived on-the-fly
- Not possible for everything to have been prepared already
- Servers will provide this type of complex data product
- Products include derived parameters, specialized actions, etc., e.g. GOES temperature, matched areas of images, etc.
- SolarSoft packages, e.g. Chianti, could be brought up like this
35 EGSO Services and Products
- EGSO will establish a number of services which can also be used by other communities and Grids
- API access to the EGSO Search Engine
- Supports complex queries of all metadata as a service
- Solar Event Catalogue servers
- Sites that specialize in providing access to Solar Event data
- Servers of complex data products
- Servers able to process requested items on the fly
- Products include derived parameters, specialized actions, etc., e.g. GOES temperature, matched areas of images, etc.
- SolarSoft packages, e.g. Chianti, could be brought up like this
- A lot of scope for processing a number of things in this way
- Extracted processed data products
- Requested data can be provided in a number of formats
- Single file type/format will not satisfy all requirements
36 Definition of Requirements
- Solicitation of Requirements
- User Survey - in collaboration with SpaceGRID, in March 2002, with over 100 responses
- Use Cases - covering multiple usage modes and scientific goals
- User Consultation - discussions at meetings and other discussions
- Brainstorming - Systems Concepts Document, cross-fertilization from other GRID and Virtual Observatory projects
- Total of 220 requirements defined
- Requirements refined to address system capabilities, not implementation
- Specified in formal manner applicable in system design, but annotated to allow for meaning to be readily understood by stakeholders
- Currently soliciting feedback and finalizing priority of each requirement
37 Architecture: Roles and Relations
- Architecture defined in terms of roles
- Provider, Broker and Consumer
38 http://radiosun.ts.astro.it/sec/sec.php