Title: GRID COMPUTING FOR NEW EARTH SCIENCE PARADIGMS
1 WP2 Data Management
Horst Schwichtenberg
2 WP2 Data Management
- Contents
- Overview of tasks
- Including results and deliverables D2.1/2.2
- Test suite: data management example
- Contributions by partner
- SCAI, IISAS, KNMI, GCRAS, CNRS, CGG
3 WP2 Data Management: Tasks 2.1/2.2
- Analysis of existing data technologies and data usage policies in ES
- What is typical for data provision and data flow in complex ES scenarios?
- What are the typical ES data policies?
- How are the data information systems/repositories organized?
- Deliverable (PM 12): survey ready
- Begin PM 1, end PM 6
- Milestones: PM 6 and PM 12
- Effort by partner
4 WP2 Data Management: Tasks 2.1/2.2
- Requirements on data management and policies
- Questionnaire to describe the data management of a given application
- Data organisation
- Data policy
- Data access
- Data information systems
- Data flow before and during the computation
- Metadata
- 21 different scenarios were analyzed
- and classified (simple, complex, complex workflows (WP1))
- On the grid, partly on a grid, not yet gridified
- Some of them are based mainly on web services for data dissemination
- Some of them use Grid infrastructures
5 WP2 Deliverable: Tasks 2.1/2.2
- ES has
- Global, regional, local applications
- Alternative use of the data at different time and spatial resolution
- Large historical distributed archives
- Long term data archives have to be exploited
- Near real-time access to data
- For processing, value adding and dissemination
- For now-casting and alert
- Models to provide long term trends and forecast
- Processing-intensive, data-intensive and complex applications
- Integrate different data sources
- Data fusion, data assimilation, data mining, modelling
- Standardisation, Virtual Organisation
- Link data to information system and knowledge
6 WP2 Deliverable: Tasks 2.1/2.2
- Data format
- As many standard formats as instruments and/or user communities
- Self-describing format (NetCDF, HDF, ...) or not
- ASCII or binary, compressed or not
- Meteorological formats (GRIB, BUFR)
- Data files
- Flat files
- Organisation: simple to complex architecture, depending on
- Size and number of files created
- End users
- Metadata linked to catalogue, especially for shared data
- Database
- Rarely used for the data itself, depending on
- The size of the data (if relatively small)
- Organisation, data provider
- Mainly for metadata
7 WP2 Deliverable: Tasks 2.1/2.2
- A data policy always exists and concerns
- Use of data
- Publication of the results (co-authorship, acknowledgement, reference)
- Large variety of data policies
- User and data use: academic, industrial, commercial, accepted proposal
- Data source
- Confidential or sensitive
- Restricted to authorized users, even for purchased data
- Free on a web site
- Organisation delivering the data
- Access may be restricted for a limited time (thematic campaigns)
- Absolute need to restrict access by group and even by person
8 WP2 Deliverable: Tasks 2.1/2.2
- Metadata for discovery of data and information
- Resource metadata (computing and storage resources, LFNs, ...)
- Discovery metadata (data objects for scientific description, e.g. ISO 19115, 19139, 19119, Dublin Core, ...)
- Use metadata describes data objects and files needed for access to data
- Metadata is central for ES
- => Middleware has to support restricted access
- Data discovery by ES portals (GEON)
- Semantic and ontology techniques used for searching, discovering and accessing widely dispersed, heterogeneous data sources
9 WP2 Deliverable: Tasks 2.1/2.2
- General requirements for middleware stacks and SOA
- Interfaces/layers for access to heterogeneous federated RDBMS
- Access to data from different locations in a grid and from locations outside of a grid infrastructure
- Web service (WS-* standards) based interfaces (esp. with OpenGIS services)
- Fast transfer of large files and of a large number of different files
- For complex workflows, robust and fast data replication is indispensable
- Data access and management also for Microsoft .NET
- Support of metadata-intensive applications in distributed environments
- User/role-based access control to metadata and data
- Ontology technologies should be available for the resources and specific ES domains
10 WP2 ESR Data Management: Requirements for Grid middleware stacks (T2.3)
- Task 2.3: Comparison of existing Grid services and ES requirements for data management
- Find missing pieces in Grid infrastructures (EGEE) and middleware stacks based on the WP1 and WP2 requirements for ES applications
- Recommendations to ES for new developments and porting of applications to grid environments
- For example
- Access to existing ES databases outside of Grid infrastructures like EGEE through DB interfaces like AMGA or OGSA-DAI
- Integration of web service based standards like OGC/GIS with existing classical middleware stacks like gLite or UNICORE
11 WP2 ESR Data Management: Requirements for Grid middleware stacks (T2.3)
- Task 2.3
- Begin PM 4, end PM 21
- Milestone (M2.4): PM 21
- Deliverable (D2.3): PM 21
- Effort by partner
Next milestone will be in PM 23
DEGREE IST 2005- 034619
Internal Review at CRS4 12 June 2007
12 WP2 ESR Data Management: Test suite (Task 2.4)
- Work will be done in close cooperation with WP1
- Begin PM x, end PM 21
- Milestones (Mx): PM 21
- Deliverable (Dx): PM 21
- Effort by partner
WP2 will contribute data-management-relevant applications to the test suite
13 WP2 ESR Data Management: Test suite (Task 2.4)
- Task will provide typical ES applications with emphasis on data management
- First application provided: GOME
Example: validation of the GOME/ERS experiment with Lidar data. Two different instruments: a ground-based Lidar and a spectrometer aboard the ERS satellite. The satellite data are stored by orbit or by pixel (different algorithms). The Lidar data are stored in monthly files with one profile per night.
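The data management problem behind this validation example is matching records from the two instruments. A minimal sketch of such a co-location step is given here, assuming that a satellite pixel and a Lidar profile are compared when they fall on the same date and within a distance threshold. The class names, the `colocate` function and the 500 km default are hypothetical and not taken from the GOME application.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class LidarProfile:
    """One ground-based profile per night (hypothetical record layout)."""
    night: date
    lat: float
    lon: float


@dataclass(frozen=True)
class SatellitePixel:
    """One satellite ground pixel (hypothetical record layout)."""
    orbit: int
    day: date
    lat: float
    lon: float


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def colocate(profiles, pixels, max_km=500.0):
    """Pair each pixel with the profiles of the same date within max_km.

    The threshold is an assumed illustration value, not a project setting.
    """
    by_night = {}
    for profile in profiles:
        by_night.setdefault(profile.night, []).append(profile)

    matches = []
    for pixel in pixels:
        for profile in by_night.get(pixel.day, []):
            distance = great_circle_km(profile.lat, profile.lon,
                                       pixel.lat, pixel.lon)
            if distance <= max_km:
                matches.append((profile, pixel))
    return matches
```

Indexing the profiles by night first keeps the search cheap, which mirrors the "one profile per night" organisation of the Lidar files.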
14 WP2 ESR Data Management: Test suite (Task 2.4)
Part of Opera/NNO metadata scheme
Column            Type
----------------  ---------------------------
dataset           character varying(50)
level             character varying(5)
version           character varying(4)
orbit             integer
file_name         character varying(50)
start_date        timestamp without time zone
stop_date         timestamp without time zone
lat               numeric(8,2)
lon               numeric(8,2)
proc_center       character varying(50)
proc_date         timestamp without time zone
file_input        character varying(50)
proc_description  character varying(50)
footprint         geometry (Multipolygon)
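To make the scheme concrete, the following sketch creates the same columns in an in-memory SQLite database and runs a time-based lookup. Several things here are assumptions: the table name `gome_metadata`, the helper names, and the mapping of types (timestamps as ISO 8601 text, the footprint as a WKT string, because plain SQLite has no geometry type). The original catalogue presumably runs on an RDBMS with spatial support.

```python
import sqlite3

# Column names follow the scheme above; the SQLite types are a stand-in.
DDL = """
CREATE TABLE gome_metadata (
    dataset          VARCHAR(50),
    level            VARCHAR(5),
    version          VARCHAR(4),
    orbit            INTEGER,
    file_name        VARCHAR(50),
    start_date       TEXT,          -- ISO 8601 timestamp, no time zone
    stop_date        TEXT,          -- ISO 8601 timestamp, no time zone
    lat              NUMERIC(8,2),
    lon              NUMERIC(8,2),
    proc_center      VARCHAR(50),
    proc_date        TEXT,          -- ISO 8601 timestamp, no time zone
    file_input       VARCHAR(50),
    proc_description VARCHAR(50),
    footprint        TEXT           -- WKT MULTIPOLYGON in this sketch
)
"""


def open_catalogue():
    """Create an empty in-memory catalogue with the scheme above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    return conn


def add_record(conn, record):
    """Insert one metadata record given as a dict of column -> value.

    Column names come from trusted code here; values are bound as parameters.
    """
    columns = ", ".join(record)
    placeholders = ", ".join(":" + name for name in record)
    conn.execute(
        f"INSERT INTO gome_metadata ({columns}) VALUES ({placeholders})",
        record,
    )


def files_covering(conn, when):
    """Return file names whose [start_date, stop_date] interval contains `when`."""
    rows = conn.execute(
        "SELECT file_name FROM gome_metadata "
        "WHERE start_date <= ? AND stop_date >= ? ORDER BY file_name",
        (when, when),
    )
    return [row[0] for row in rows]
```

ISO 8601 strings of equal format sort in chronological order, so the plain string comparison in `files_covering` is sufficient for this illustration.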
15 WP2 ESR Data Management: Test suite (Task 2.4)
ES requirements for middleware developers
- Secure and restricted access to (external) metadata in a grid environment
- Preferably, interfaces should follow industry standards (part of the ES community is industry)
- The RDBMS needs to support spatial data types (OpenGIS conformant)
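The first requirement, restricted access to metadata by group and even by person, can be illustrated with a small filter. This is a sketch under stated assumptions only: the `User` and `MetadataRecord` classes and the rule "a record without restrictions is public" are hypothetical and do not describe any existing grid middleware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset  # e.g. frozenset({"academic"})


@dataclass(frozen=True)
class MetadataRecord:
    file_name: str
    allowed_roles: frozenset = frozenset()  # groups that may read the record
    allowed_users: frozenset = frozenset()  # individual persons that may read it

    def is_public(self):
        # Assumed rule: no restriction listed means the record is public.
        return not self.allowed_roles and not self.allowed_users


def can_read(user, record):
    """Grant access if the record is public, or by person, or by role."""
    if record.is_public():
        return True
    if user.name in record.allowed_users:
        return True
    return bool(user.roles & record.allowed_roles)


def visible_records(user, records):
    """Return only the records the user is allowed to see, in input order."""
    return [record for record in records if can_read(user, record)]
```

In a real deployment the check would sit in the middleware layer in front of the catalogue, so that restricted records are never returned to unauthorized clients.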
16 WP2 ESR Data Management: Partner contribution
- Partner SCAI, Tasks 2.1/2.2
- Work done
- Preparation of data management questionnaire
- Collecting examples
- Contribution to D2.1/2.2
- Effort: 0.78 (official)
17 WP2 ESR Data Management: Partner contribution
- Partner SCAI, Task 2.3
- Work to date
- Requirement: access to external distributed RDBMS from the Grid
- Layers to be considered: OGSA-DAI, AMGA, Spitfire, ...
- (Opaque layers hide the differences of the underlying DB systems)
- Interoperability of the interfaces with grid services of middleware stacks
- Capabilities and missing features of grid services and interfaces
- First results (exp.): OGSA-DAI (quasi-standard) is not integrated into gLite
- AMGA is an integrated tool of gLite, but very specific
- Effort to date: 2.34 PM
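The idea of an opaque layer that hides the differences between underlying DB systems can be sketched as a common interface with interchangeable backends. The classes and the `metadata` table below are hypothetical illustrations. They do not reproduce the OGSA-DAI, AMGA or Spitfire interfaces.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import List


class MetadataBackend(ABC):
    """Common interface the application sees, whatever the storage system."""

    @abstractmethod
    def files_for_orbit(self, orbit: int) -> List[str]:
        ...


class SqliteBackend(MetadataBackend):
    """Backend for a relational store (SQLite stands in for any RDBMS)."""

    def __init__(self, conn):
        self._conn = conn

    def files_for_orbit(self, orbit):
        rows = self._conn.execute(
            "SELECT file_name FROM metadata WHERE orbit = ? ORDER BY file_name",
            (orbit,),
        )
        return [row[0] for row in rows]


class InMemoryBackend(MetadataBackend):
    """Backend for a non-relational source, here a plain list of dicts."""

    def __init__(self, records):
        self._records = list(records)

    def files_for_orbit(self, orbit):
        return sorted(r["file_name"] for r in self._records if r["orbit"] == orbit)


def files_for_orbit_everywhere(backends, orbit):
    """Query every backend through the same call and concatenate the results."""
    found = []
    for backend in backends:
        found.extend(backend.files_for_orbit(orbit))
    return found
```

The calling code never sees SQL or storage details, which is the property the slide asks of such layers. The open question raised in the slide, integration with the grid middleware itself, is outside this sketch.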
18 WP2 ESR Data Management: Partner contribution
- Partner SCAI, planned work
- Task 2.3
- Availability of grid services/tools to implement data policies
- Continue with further requirements from WP1 and WP2 (D2.1/2.2)
- Preparation and coordination of D2.3
- Effort until PM 21: 2.34 PM
- Task 2.4
- Contribution to test suite: description of GOME
- Contribution to test suite: 2nd example
- Effort until end of project: xx
19 WP2 ESR Data Management: Partner contribution
- Partner SCAI, planned work
- Task 2.3
- Availability of grid services/tools to implement data policies
- Preparation and coordination of D2.3
- Effort until PM 23: xx
- Task 2.4
- Contribution to test suite: description of GOME
- Contribution to test suite: 2nd example
- Effort until end of project: xx
20 WP2 ESR Data Management: Partner contribution
- Partner UISAV, Task WP2.1/2.2
- Work done
- WP2 application questionnaires analysis
- WP1 application questionnaires analysis
- Contribution to D2.1/2.2
- Data provision
- Integration of relevant sections from WP1 questionnaires into D2.1/2.2
- Effort: 1.29 (official)
21 WP2 ESR Data Management: Partner contribution
- Partner UISAV, Task 2.3
- Work to date
- Analysis of catalogue services for application-specific and grid-infrastructure-specific metadata catalogues
- Metadata catalogue types needed: application-specific metadata, compute and storage resources metadata, discovery metadata, VO and security metadata
- Analyzed software
- Standard grid metadata catalogues (MDS/WS-MDS, RLS, MCS, AMGA)
- RDF and ontology-capable (semantic) catalogues (RDFPeers, SDR, DSWS-R, TUPELO, Edutella)
- Other catalogues (Graffiti, DIMES)
- Observed properties
- Content language, security, maturity, query language, distribution/integration of content, standards conformance
22 WP2 ESR Data Management: Partner contribution
- Partner UISAV, Task WP2.3
- Planned work
- Analysis of catalogue services for application-specific and grid-infrastructure-specific metadata catalogues
- Extend the set of analyzed catalogue services and observed properties
- Prepare categorization of analyzed services
- Document the analysis (reports, deliverable)
- Task WP2.4
- Work to date
- Draft of flood prediction application test suite description
- Planned work
- Further elaboration of data-management-specific issues in the flood application test suite
- Contribution to test suite: 2nd example
23 WP2 ESR Data Management: Partner contribution
- Partner GCRAS
- Task WP2.1/2.2
- Work done
- Evaluation of ES grid data fusion applications (SPIDR, ESSE, CLASS)
- GIS applications review and OpenGIS standards summary
- Contribution to D2.1/2.2
- Effort: 1 PM
24 WP2 ESR Data Management: Partner contribution
- Partner GCRAS
- Task WP2.3
- Work to date
- Analysis of interoperability at the query language and data model levels between OGC (WCS), OGSA-DAI (SQL) and NetCDF/OPeNDAP
- Metadata standards: catalogue, inventory level and ordering extensions
- Data access: analysis of scientific array-based data models and relational structure models (SQL, XML/XQuery, OPeNDAP, ESSE)
- Effort to date: 1.5
25 WP2 ESR Data Management: Partner contribution
- Partner GCRAS, planned work
- Task 2.3
- Data visualization tools (connection to Grid environments)
- Metadata search engines
- ES grid services for data export, processing and mining
- Task 2.4
- Contribution to test suite: description of GEONGrid
- Contribution to test suite: 2nd example
- Effort until end of project: 0.9
26 WP2 ESR Data Management: Partner contribution
- Partner CNRS
- Task WP2.1/2.2
- Work done
- Evaluation of data policies and security
- Contribution to D2.1/2.2
- Effort: x PM
- Task WP2.4
- Providing examples for the test suite (1st: GOME)
- Effort: x PM
27 WP2 ESR Data Management: Partner contribution
- Partner KNMI
- Task WP2.1/2.2
- Work done
- Evaluation of data technologies in weather forecasting
- Contribution to D2.1/2.2
- Effort: x PM
- Task WP2.4
- Providing examples for the test suite
- Effort: x PM
28 WP2 ESR Data Management: Partner contribution
- Partner CGG
- Task WP2.1/2.2
- Work done
- Evaluation of data technologies in geophysics, esp. in large enterprises
- Contribution to D2.1/2.2
- Effort: x PM
- Task WP2.3
- In preparation
- Effort: x PM
- Task WP2.4
- Providing examples for the test suite
- Effort: x PM
29 WP2 ESR Data Management: Dependencies on other work packages
WP1 Requirements
WP2 Data management
- WP3
- Job management
- Co-scheduling of data
- Workflows
30 WP2 ESR Data Management: Requirements for Grid middleware stacks
Road map for Task 2.3
- Timeline marks: Jun, Aug, Oct, Jan (PM 13, 15, 17, 21)
- Checkpoint: internal report, esp. on missing pieces with EGEE
- Sync with WP3/4
- Checkpoint: WP1/WP2 requirements considered
- D2.3 ready
31 WP2 ESR Data Management: Requirements for Grid middleware stacks
- Future
- Universal or domain-specific WS platform to access data in different Grid infrastructures (EGEE, NorduGrid)
- OpenGIS services for gLite, UNICORE, ...