Test Suite results data management

DEGREE IST 2006-034619


1
Test Suite results data management
  • Horst Schwichtenberg, André Gemünd (SCAI,
    Fraunhofer)

2
Overview
  • Requirements/Gaps
  • (DEGREE Deliverable D2.1)
  • Test suite
  • Feedback from developers
  • GOME Test by EGEE NA4 ES

3
Requirements/Gaps
  • Support required for: search, locate, access and
    process ES datasets
  • metadata-intensive applications in distributed
    environments
  • interfaces to heterogeneous and federated data
    resources
  • use of Web Service (WS-*) and OGF standards
    (e.g. ByteIO)
  • accessibility from all platforms (including
    .NET)
  • interfaces to ES data resources (conforming to
    OGC standards)
  • robust and fast replication of data in a
    complex workflow
  • ontology technologies (e.g. Semantic Grid, see
    also OGF / Semantic OGSA Architecture)
  • definition of ES domain-specific ontologies
    (addresses ES researchers)
  • interoperability between query languages, data
    stores and data formats
  • e.g. SQL, SPARQL (W3C Semantic Web), OPeNDAP,
    LAS, DODS, NetCDF, HDF

4
Requirements/Gaps
  • A comparison of grid middleware and tools for
    data management is given in
    DEGREE Deliverable D2.3
  • see http://eu-degree.org

5
Test suites available

6
Feedback
  • On data management aspects
  • Many middleware developers are interested
  • But there is not much feedback in such a short
    time
  • First results from EGEE-NA4 Earth Science are
    available for GOME, covering interfaces to
    heterogeneous and federated data resources
  • Results from Grid'5000 (France) will be available
    soon

7
GOME TEST (EGEE)
  • GOME-Validation Test Suite
  • High amount of datasets from two sources
  • GOME satellite measurements
  • LIDAR ground station measurements
  • Correlate by metadata
  • geo-coordinates and date of measurement
  • Target components (as specified)
  • Data management
  • Database access
  • Workflow control
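
The correlation step described above (matching GOME and LIDAR data by geo-coordinates and date of measurement) might be sketched as follows. This is only an illustration: the `Measurement` record, the distance and time thresholds, and all sample values are assumptions and do not come from the test suite specification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Measurement:
    file_id: str    # hypothetical identifier of the data file
    lat: float      # latitude in degrees
    lon: float      # longitude in degrees
    time: datetime  # date and time of measurement


def distance_km(a, b):
    """Great-circle distance between two measurements (haversine formula)."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def correlate(gome, lidar, max_km=300.0, max_dt=timedelta(hours=12)):
    """Return (gome, lidar) pairs that are close in both space and time.

    The default thresholds are illustrative assumptions.
    """
    return [
        (g, lid)
        for g in gome
        for lid in lidar
        if abs(g.time - lid.time) <= max_dt and distance_km(g, lid) <= max_km
    ]


# Made-up sample records, for illustration only.
gome = [Measurement("gome-a", 69.0, 16.5, datetime(2003, 1, 8, 10, 0))]
lidar = [
    Measurement("lidar-near", 69.3, 16.0, datetime(2003, 1, 8, 18, 0)),
    Measurement("lidar-late", 69.3, 16.0, datetime(2003, 2, 1, 18, 0)),
]
pairs = correlate(gome, lidar)
print([(g.file_id, lid.file_id) for g, lid in pairs])
```

The nested loop is quadratic in the number of records; with a high amount of datasets, this filtering would more plausibly be pushed into the metadata backend as a query.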

8
Test Suite Evaluation by EGEE: Motivation
  • Testing the DEGREE test suite
  • Sufficiency of specification and utility
  • Investigate AMGA and GRelC as alternatives
  • OGSA-DAI has been used since EGEE-I
  • OGSA-DAI supplies only a Java client library
  • SCAI used a Java wrapper to access it from Python
    and Perl
  • also in EGEE-NA4 Life Science
  • gLite integration (auth by VOMS)
  • Are there better, completely different approaches?
  • Check for features/functionality, no benchmarking!

9
GOME TEST (EGEE)
  • Three different approaches

10
GOME TEST (EGEE)
  • What was done
  • Implement GOME-Validation as a representative
    workflow
  • Transmission and Grid registration of data files
  • Extraction and archiving of Metadata
  • Bidirectional correlation of files through
    Metadata
  • Abstraction of Metadata backend
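
One way to picture the "abstraction of metadata backend" item is a small common interface that the workflow codes against, with one adapter per backend. The sketch below is an assumption throughout: `MetadataBackend`, `InMemoryBackend` and their methods are hypothetical names, not the API of AMGA, GRelC or OGSA-DAI, and the in-memory class merely stands in for a real adapter.

```python
from abc import ABC, abstractmethod


class MetadataBackend(ABC):
    """Hypothetical common interface the workflow would program against."""

    @abstractmethod
    def store(self, file_id, attributes):
        """Archive the metadata attributes extracted from one file."""

    @abstractmethod
    def find(self, **criteria):
        """Return the ids of all files whose attributes match the criteria."""


class InMemoryBackend(MetadataBackend):
    """Stand-in implementation; a real adapter would wrap one backend."""

    def __init__(self):
        self._entries = {}

    def store(self, file_id, attributes):
        self._entries[file_id] = dict(attributes)

    def find(self, **criteria):
        return sorted(
            file_id
            for file_id, attrs in self._entries.items()
            if all(attrs.get(key) == value for key, value in criteria.items())
        )


def archive(backend: MetadataBackend, file_id, attributes):
    """Workflow code sees only the interface, never a concrete backend."""
    backend.store(file_id, attributes)


backend = InMemoryBackend()
archive(backend, "lidar-1", {"station": "ano", "source": "LIDAR"})
archive(backend, "gome-1", {"source": "GOME"})
print(backend.find(source="LIDAR"))
```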

11
Proceeding
  • Software Design

12
GOME TEST (EGEE)
  • APIs
  • AMGA: C, Java, Python, Perl, PHP
  • GRelC: C, C++, Java (GRelCJProxy)
  • OGSA-DAI: Java
  • Thus we used Java as common denominator.

13
GOME TEST (EGEE)
  • Problems / Characteristics
  • Backend Compatibility
  • Bulk Action support
  • Data schema and types
  • GIS features
  • Hierarchical metadata
  • Indexing (IDs)
  • Query language
  • Access to existing (Reuse of) Data sources

14
GOME TEST (EGEE)
  • Database Compatibility
  • AMGA
  • uses ODBC
  • MySQL, Oracle, pgSQL, etc.
  • Only a few extensions supported
  • GRelC
  • Native C API libraries of DB manufacturer
  • PostgreSQL, MySQL, Oracle (SQLite ODBC backend
    in development)
  • Needs pgSQL as configuration backend

15
GOME TEST (EGEE)
  • Database Compatibility
  • OGSA-DAI
  • Unique strength
  • Uses JDBC, eXist and custom drivers
  • Write data providers for arbitrary data sources
  • Databases and files already included
  • Combine data from different sources
  • Execute Transformations on data
  • Deliver to GridFTP, Grid service, client, ...

16
GOME TEST (EGEE)
  • Data schema (AMGA)
  • custom schema
  • AMGA uses path structures (POSIX-like)
  • possible advantages
  • Entity-specific attributes (different attribs
    per file)
  • Dynamic change (on the fly change of schema)
  • Inheritance of attributes (hierarchical metadata)

17
GOME TEST (EGEE)
  • Using hierarchies in AMGA: example
  • /gometest/lidar/ano/hgl/30108/
  • /ano/
  • Identifies station and thus also coordinates
  • Here: Andoya, Norway
  • /hgl/
  • Author, here: Georg Hansen
  • /30108/
  • Identifies file entity
  • Files in this directory
  • Real datasets
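
The idea of inheriting attributes along such a path can be illustrated with a short sketch: each directory level contributes attributes, and deeper levels add to or override them. This shows the concept only and says nothing about how AMGA implements it. The attribute names and the `inherited_attributes` helper are hypothetical; only the path and the station and author values come from the example above.

```python
# Hypothetical attributes attached to the directory levels of the example path.
ATTRIBUTES = {
    "/gometest/lidar/ano": {"station": "Andoya, Norway"},
    "/gometest/lidar/ano/hgl": {"author": "Georg Hansen"},
    "/gometest/lidar/ano/hgl/30108": {"kind": "file entity"},
}


def inherited_attributes(path, attributes=ATTRIBUTES):
    """Merge attributes from the root down to `path`; deeper levels win."""
    merged = {}
    prefix = ""
    for part in path.strip("/").split("/"):
        prefix += "/" + part
        merged.update(attributes.get(prefix, {}))
    return merged


print(inherited_attributes("/gometest/lidar/ano/hgl/30108/"))
```

A file below `/ano/hgl/` thus picks up the station and the author without storing either value itself.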

18
GOME TEST (EGEE)
  • Location of measurement
  • e.g. point or polygon
  • Use of PostGIS datatypes?
  • AMGA can use int, float, varchar, timestamp,
    text, or numeric
  • But unknown field types of the database are
    returned as text
  • OGSA-DAI and GRelC let you choose
  • No datatype abstraction
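
If a geometry column comes back as plain text, the client has to interpret it itself. The sketch below assumes the text is in Well-Known Text form, such as `POINT(16.0 69.3)`. That is an assumption for illustration: the slides do not say which textual form is returned, and the coordinates are made up.

```python
import re

# Matches a 2D WKT point such as "POINT(16.0 69.3)" (x = lon, y = lat).
_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(text):
    """Parse a 2D WKT point returned as text into an (x, y) float tuple."""
    match = _POINT.match(text)
    if match is None:
        raise ValueError(f"not a WKT point: {text!r}")
    return float(match.group(1)), float(match.group(2))


print(parse_wkt_point("POINT(16.0 69.3)"))
```

Polygons and other geometry types would need their own parsing, which is one reason a datatype abstraction in the middleware would help.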

19
GOME TEST (EGEE)
  • Query language
  • OGSA-DAI and GRelC use SQL
  • Highly coupled to table schema
  • Differences in SQL dialect (e.g. pgSQL <->
    Oracle)
  • Support for SQL functions, Views, Extensions
  • Both add support for XQuery if working with
    XMLDBs
  • AMGA defines own query language
  • selectattr Table:attribute like(Table:attribute,
    value)
  • Reusable queries, but limited expressiveness
  • Additional functions need source changes

20
GOME TEST (EGEE)
  • Bulk Actions
  • AMGA additionally supports socket connections
    instead of document-based ones (SOAP)
  • Low latency
  • Multiple queries without delay
  • High transfer rates possible
  • OGSA-DAI workflows
  • Pipeline, Parallel grouping of activities
  • Powerful but cumbersome

21
GOME TEST (EGEE)
  • Integration with gLite and EGEE
  • Work in progress for all three
  • AMGA
  • VOMS auth, experiments as file catalogue
  • GRelC
  • Publishes to BDII: DBs and their data schema etc.
  • VOMS auth and integrating with Broker
  • OGSA-DAI
  • VOMS auth in progress (OMII Europe)

22
GOME TEST (EGEE)
  • Data schema (OGSA-DAI and GRelC)
  • Raw SQL tables
  • Taken directly from test suite specification
  • 2 tables
  • One for LIDAR and one for GOME files
  • Problem: 1 LIDAR file hosts n datasets
  • Different time / coordinates
  • Save redundant or introduce relations?
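
The "introduce relations" option for the one-file, n-datasets problem could look like the sketch below: one row per LIDAR file and one row per dataset, linked by a foreign key. Table and column names and the sample rows are hypothetical and not taken from the test suite specification. SQLite is used only so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per LIDAR file (hypothetical columns).
    CREATE TABLE lidar_file (
        file_id TEXT PRIMARY KEY,
        station TEXT NOT NULL
    );
    -- One row per dataset; each carries its own time and coordinates.
    CREATE TABLE lidar_dataset (
        dataset_id  INTEGER PRIMARY KEY,
        file_id     TEXT NOT NULL REFERENCES lidar_file(file_id),
        measured_at TEXT NOT NULL,
        lat         REAL NOT NULL,
        lon         REAL NOT NULL
    );
""")

# Made-up sample data: one file hosting two datasets.
conn.execute("INSERT INTO lidar_file VALUES (?, ?)", ("30108", "ano"))
conn.executemany(
    "INSERT INTO lidar_dataset (file_id, measured_at, lat, lon) "
    "VALUES (?, ?, ?, ?)",
    [
        ("30108", "2003-01-08T18:00", 69.3, 16.0),
        ("30108", "2003-01-08T22:00", 69.3, 16.0),
    ],
)

rows = conn.execute("""
    SELECT f.file_id, f.station, COUNT(d.dataset_id)
    FROM lidar_file AS f
    JOIN lidar_dataset AS d ON d.file_id = f.file_id
    GROUP BY f.file_id, f.station
""").fetchall()
print(rows)
```

The redundant alternative would repeat the file-level columns in every dataset row, which avoids the join at the cost of duplicated values.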

23
Conclusions: DB access
  • What EGEE Earth Science VO would like to have
  • Integration of external data sources like
    OGSA-DAI
  • For custom data sources (e.g. GIS)
  • Integration to gLite
  • Integration with file catalogue
  • Browsable in both directions
  • Support for aliases and replicas
  • Assess best replica for current location
  • VOMS-based authorization and authentication
  • Extensible for GIS features and the like
  • APIs for Java, C, Python and Perl

24
Conclusions: DB access
  • Datatypes
  • GIS types
  • To use GIS functions in queries (like CONTAINS())
  • Relations
  • Correlation, containment, adjacency,
  • Custom relations (ontology-like)
  • isResultOf
  • isUsedInExperiment
  • Array types (vector / matrix data sets)
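
Custom, ontology-like relations such as isResultOf could be modelled as subject/relation/object triples that can be followed in both directions. The `RelationStore` class and the entity names below are hypothetical and only illustrate the wish expressed above; they are not a feature of any of the three tools.

```python
from collections import defaultdict


class RelationStore:
    """Hypothetical store of (subject, relation, object) triples."""

    def __init__(self):
        self._by_subject = defaultdict(set)
        self._by_object = defaultdict(set)

    def add(self, subject, relation, obj):
        """Record the triple and index it for lookups in both directions."""
        self._by_subject[(subject, relation)].add(obj)
        self._by_object[(obj, relation)].add(subject)

    def objects(self, subject, relation):
        """Follow the relation forwards: what is `subject` related to?"""
        return sorted(self._by_subject.get((subject, relation), ()))

    def subjects(self, relation, obj):
        """Follow the relation backwards: what is related to `obj`?"""
        return sorted(self._by_object.get((obj, relation), ()))


store = RelationStore()
store.add("validation-1", "isResultOf", "job-17")
store.add("gome-a", "isUsedInExperiment", "validation-1")
store.add("lidar-near", "isUsedInExperiment", "validation-1")

print(store.objects("validation-1", "isResultOf"))
print(store.subjects("isUsedInExperiment", "validation-1"))
```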