The BioDA Project - PowerPoint PPT Presentation

Provided by: shirleyc7

Title: The BioDA Project


1
Data Integration in Bioinformatics Using OGSA-DAI
  • The BioDA Project
  • Shirley Crompton, Brian Matthews (CCLRC)
  • Alex Gray, Andrew Jones, Richard White (Cardiff University)

2
Overview
  • Bioinformatics Data Access and Integration
    Requirements
  • Generic
  • BioDA Workshop and Questionnaire
  • BDWorld-specific
  • OGSA-DAI exemplar

3
The BioDA Project
  • Independent Evaluation of OGSA-DAI
  • the suitability of that software in its present
    form
  • how to leverage OGSA-DAI in bioinformatics GRID
  • OGSA-DAI Product Improvement
  • Feedback to the DAIT Team
  • Knowledge Dissemination
  • Evaluation Report
  • Publications/Presentations
  • Workshop on OGSA-DAI for the bioinformatics
    eResearch community

4
Bioinformatics
  • The application and development of computing and
    mathematics to the management, analysis and
    understanding of data to solve biological
    questions.
  • Attwood, TK and Parry-Smith, DJ 1999

Data Management
Data Analysis
5
Grid Computing
  • ... flexible, secure, coordinated resource
    sharing among dynamic collections of individuals,
    institutions and resources
  • Foster, Kesselman and Tuecke, 2001

6
1st BioDA Workshop (NeSC, 8 Dec 2004)
  • Workshop Objectives
  • to examine the bioinformatics community's needs
    for data access and integration (DAI) on the grid, and
  • to explore the application of OGSA-DAI, a
    middleware developed expressly to address the DAI
    requirements of eScience projects

7
The BioDA Survey
8
The Results
  • 17 key requirements; those at the top of the list include:
  • schema integration
  • schema mapping
  • mixed language query
  • complex join across databases
  • provenance data
  • flexible resource discovery
  • RDF database access

9
The BioDA Exemplar
  • The BioDiversity World
  • To create a GRID-based problem solving
    environment. 
  • Enable collaborative exploration and analysis of
    global biodiversity patterns using workflow and
    rich data sources from around the world
  • Example applications would be modeling species
    distributions against climate change,
    conservation prioritization and linking
    evolutionary changes to past climates. 

10
BDWorld (Source: BDWorld)
11
BDWorld Data Resources Key Issues
  • geographically distributed and autonomous
  • heterogeneous in structure and data standards
  • mainly read via HTTP/XML protocols using custom
    wrappers
  • SQL queries are limited to the EBI EMBL store and
    BDWorld cache databases
  • potentially resource-intensive to harvest
  • a single taxon name may resolve into a large
    number of accepted taxon names
  • same query repeated on different data collections
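
The last two issues compound each other: one name can expand into many accepted names, and each of those must then be queried on every collection. A minimal sketch of that fan-out follows. The synonym table, the two collection functions and all names in it are hypothetical stand-ins, not BDWorld code.

```python
from typing import Callable, Dict, List

# Hypothetical synonym table: one queried name -> accepted taxon names.
ACCEPTED_NAMES: Dict[str, List[str]] = {
    "Example name": ["Accepted name A", "Accepted name B"],
}


# Hypothetical data collections, each modelled as a function from an
# accepted name to a list of records.
def collection_one(name: str) -> List[str]:
    return [f"one:{name}"]


def collection_two(name: str) -> List[str]:
    return [f"two:{name}"]


def resolve(name: str) -> List[str]:
    """Return the accepted names for a queried name (itself if unknown)."""
    return ACCEPTED_NAMES.get(name, [name])


def fan_out(name: str, collections: List[Callable[[str], List[str]]]) -> List[str]:
    """Repeat the same query for every accepted name on every collection."""
    results: List[str] = []
    for accepted in resolve(name):
        for query in collections:
            results.extend(query(accepted))
    return results


records = fan_out("Example name", [collection_one, collection_two])
print(len(records))  # number of accepted names times number of collections
```

The number of remote calls grows as (accepted names) x (collections), which is why harvesting can become resource-intensive.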

12
Resource Wrapping (Source: BDWorld)
User
Remote Resource
Workflow enactment engine
Wrapper
The GRID
13
Implications for BioDA
  • abstraction layer (BGI) → proprietary invocation
    mechanism
  • InvokeOperation(ResourceHandler, Operation, XmlDataCollection)
  • prepared search statements defined in each
    individual data resource wrapper
  • BGI protocols → BDW communication objects.
    Search parameters and results are passed as
    XmlDataCollection
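
The shape of this invocation mechanism can be sketched as follows. This is an illustrative Python model only: the class and function names mirror the slide, but their bodies, the "searchByName" operation and the sample data are assumptions, not the BDW or BGI API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class XmlDataCollection:
    """Stand-in for the BDW communication object: a list of XML strings."""
    documents: List[str] = field(default_factory=list)


Operation = Callable[[XmlDataCollection], XmlDataCollection]


class ResourceHandler:
    """Stand-in for a wrapper holding the prepared searches of one resource."""

    def __init__(self, operations: Dict[str, Operation]) -> None:
        self._operations = operations

    def run(self, operation: str, params: XmlDataCollection) -> XmlDataCollection:
        if operation not in self._operations:
            raise KeyError(f"wrapper does not define operation {operation!r}")
        return self._operations[operation](params)


def invoke_operation(handler: ResourceHandler, operation: str,
                     params: XmlDataCollection) -> XmlDataCollection:
    """Single entry point: callers name an operation, they never write a query."""
    return handler.run(operation, params)


def search_by_name(params: XmlDataCollection) -> XmlDataCollection:
    # Hypothetical prepared search: wraps each parameter document as a result.
    return XmlDataCollection([f"<result>{d}</result>" for d in params.documents])


handler = ResourceHandler({"searchByName": search_by_name})
out = invoke_operation(handler, "searchByName",
                       XmlDataCollection(["<name>Quercus</name>"]))
print(out.documents)
```

Because the caller can only pick from searches the wrapper has prepared, an arbitrary query has no path through this interface.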

14
BioDA Exemplar
  • Two main possibilities within BDW
  • Augment BGI to support inclusion of queries in
    workflows and to be sent directly to OGSA-DAI
    enabled databases.
  • Define precise search parameters, exclude
    ineligible data early on
  • Potential application of OGSA-DQP
  • Requires a very major revision to BDW protocols
  • Also, many resources of interest are simply not
    exposed as databases.
  • Provide facilities within individual wrappers
    that benefit from OGSA-DAI.

15
OGSA-DAI Prototype (What we'd have liked)
Components: OGSA-DAI Client, OGSA-DAI R5 GDS, BDWQueryActivity, Wrapper Module, XSLTransform
Activities: deliverFromURL(url), deliverFromURL(xsl), deliverToURL/GFTP
Steps shown:
  • 1. BGI InvokeOperation()
  • 2. Create GDS and query
  • 3. Invoke wrapper
  • 5. Download URL
  • 6. Download url
  • 7. url
  • 8. XSL transform to BDW format
  • 9. To WF unit
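
The chain in this prototype (query, download from a URL, transform, deliver) can be sketched as a client-driven sequence of calls. Everything below is a hypothetical stand-in written for illustration: the function names echo the slide labels but are not OGSA-DAI calls, the URL and data are invented, and a plain ElementTree rewrite takes the place of a real XSL transform.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical in-memory "web": URL -> document body.
FAKE_WEB: Dict[str, str] = {
    "http://example.org/results.xml":
        "<records><rec name='a'/><rec name='b'/></records>",
}


def query_wrapper(search_term: str) -> str:
    """Query stand-in: the wrapper answers with a URL holding the results."""
    return "http://example.org/results.xml"


def deliver_from_url(url: str) -> str:
    """Download stand-in: fetch the document behind the URL."""
    return FAKE_WEB[url]


def transform_to_bdw(xml_text: str) -> str:
    """Transform stand-in: reshape source XML (a real system would apply XSL)."""
    source = ET.fromstring(xml_text)
    target = ET.Element("XmlDataCollection")
    for rec in source.findall("rec"):
        ET.SubElement(target, "item").text = rec.get("name")
    return ET.tostring(target, encoding="unicode")


def client_flow(search_term: str) -> str:
    """The client issues each request in turn and threads the results along."""
    url = query_wrapper(search_term)   # first request
    raw = deliver_from_url(url)        # second request
    return transform_to_bdw(raw)       # third request; delivery would follow


result = client_flow("a")
print(result)
```

Each step depends on the previous one's output, so the client has to hold the whole sequence together itself.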
16
Key Issues encountered
  • Complex client-side coding to orchestrate the
    application flow
  • requires several GDS perform requests
  • Difficult to synchronise
  • Remote web databases have different response times
    (or do not respond at all!)
  • Different series of data transformations apply
    to different data resources
  • BDW Protocols specify data returned as a BDW
    XmlDataCollection object
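
The response-time issue can be made concrete with a small sketch: query several resources in parallel and keep only the answers that arrive before a deadline. The two resource functions, their delays and the timeout value are assumptions chosen for illustration; this uses only the Python standard library and is not how OGSA-DAI schedules requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Tuple


def fast_resource(term: str) -> str:
    return f"fast:{term}"


def slow_resource(term: str) -> str:
    time.sleep(1.0)  # simulates a remote database that responds late
    return f"slow:{term}"


def query_all(term: str, resources: Dict[str, Callable[[str], str]],
              timeout_s: float) -> Tuple[Dict[str, str], List[str]]:
    """Query every resource in parallel; keep whatever finished in time."""
    pool = ThreadPoolExecutor(max_workers=len(resources))
    futures = {pool.submit(fn, term): name for name, fn in resources.items()}
    done, not_done = wait(futures, timeout=timeout_s)
    answered = {futures[f]: f.result() for f in done}
    timed_out = sorted(futures[f] for f in not_done)
    pool.shutdown(wait=False)  # do not block on the stragglers
    return answered, timed_out


answered, timed_out = query_all(
    "Quercus", {"fast": fast_resource, "slow": slow_resource}, timeout_s=0.2)
print(answered, timed_out)
```

The caller still has to decide what a partial answer means for the workflow, which is part of what makes synchronisation difficult.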

17
OGSA-DAI Prototype (What we ended up doing)
Components: OGSA-DAI Client, OGSA-DAI R5 GDS, BDWQueryActivity
Steps shown:
  • 3. Invoke wrapper/s
  • 4. Query, transform
  • 6. return XmlRemoteData
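
In this simpler arrangement a single activity invokes the wrappers, queries and transforms on the server side, and hands back one result. A rough sketch of that idea follows, with hypothetical wrapper functions and an invented record layout. Only the names BDWQueryActivity and XmlRemoteData come from the slide; the code is not the project's implementation.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Wrapper = Callable[[str], List[Dict[str, str]]]


def wrapper_a(term: str) -> List[Dict[str, str]]:
    # Hypothetical wrapper for one remote resource.
    return [{"name": term, "source": "A"}]


def wrapper_b(term: str) -> List[Dict[str, str]]:
    # Hypothetical wrapper for a second remote resource.
    return [{"name": term, "source": "B"}]


def bdw_query_activity(term: str, wrappers: List[Wrapper]) -> str:
    """Invoke each wrapper, transform its rows, and return one XML document."""
    root = ET.Element("XmlRemoteData")
    for wrapper in wrappers:
        for row in wrapper(term):
            ET.SubElement(root, "record", row)  # row becomes XML attributes
    return ET.tostring(root, encoding="unicode")


xml_result = bdw_query_activity("Quercus", [wrapper_a, wrapper_b])
print(xml_result)
```

The client makes one request and receives one document, at the cost of putting resource-specific logic inside the activity.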
18
Conclusion
  • Highlighted key bioinformatics eScience project
    requirements for OGSA-DAI
  • support for a metadata-driven two-step access to
    data and data integration
  • Reviewed BDWorld DAI requirements
  • uniform access to disparate, heterogeneous data
    resources
  • including anonymous access to web information
    systems
  • Reviewed the BDWorld OGSA-DAI exemplar and
    issues encountered