Workflow and data integration in ebioscience - PowerPoint PPT Presentation

1
Workflow and data integration in e-bioscience
  • Some user requirements

2
Integrative Bioinformatics Unit and collaborations
  • IBU: Márcia Inda, Oskar Bruning, Scott
    Marshall, Lennart Post, Tessa Pronk, Han
    Rauwerda, Marco Roos, Igor Serov, Robert Stad
    (FlexGen), Peter Sterk (EBI), Frans Verster,
    Asli Umur, Timo Breit
  • Collaborations
  • Semantic modeling and its applications: Pieter
    Adriaans (UvA); Guus Schreiber, Machiel Jansen
    (VU); Werner Ceusters, Anand Kumar, Barry Smith
    (IFOMIS); Marijke Keet ()
  • Information management: Ersin Kaletas, Bob
    Hertzberger (UvA); Joost Kok, Fons Verbeek
    (LIACS)
  • Case studies: Lennart Post, Roel van Driel
    (UvA); Wendy Bruins, Jeroen Pennings, Annemieke
    de Vries (RIVM); Dutch system biology of
    Lactococcus lactis consortium
  • Workflow: Adam Belloum (UvA); Taverna
    developers (EBI)
  • More information: www.micro-array.nl
3
Outline
  • Computational experiments in an e-science
    environment
  • Requirements
  • Annotation of both data and services
  • Provenance
  • Intersecting views of data
  • Mapping experimental data to meaning
  • What is an ontology? Essential concepts.
  • Semantic annotation of experimental data
  • A brief look at Taverna
  • Conclusions

4
Computational experiment
[Diagram: Computational experiment in workflow
environment, connected to several databases]
5
Issues raised by computational experimentation
  • How will we find relevant data?
  • How will we automatically integrate such data
    into our experiment?
  • How will we find appropriate services?
  • How will we integrate our results as usable data
    for a new (computational) experiment?
  • → Annotation

6
Computational experiments: anticipated needs of
the data consumer
  • Data integration - combining different types of
    data
  • Data annotation beyond formats
  • Not only data types (integer, string, etc.)
  • But also:
  • Data semantics: what do the data represent?
  • Determined by the experimental design
  • Provenance: what has been done to the data?
  • Description of the procedure(s) that
    produced/transformed the data
  • Find and apply appropriate (web) services
  • Reuse results from a computational experiment as
    data in another computational experiment
  • Derived data is tagged and put into the
    repository
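The annotation needs listed above (data type, data semantics, provenance, and tagging of derived data before it goes back into the repository) can be sketched as a small data model. This is a minimal illustration, not part of the presented system: `AnnotatedData`, `ProvenanceStep`, `Repository` and the helper `derive` are hypothetical names, and the tags are plain strings standing in for ontology terms.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProvenanceStep:
    """One procedure that produced or transformed the data."""
    procedure: str      # e.g. the name of a (web) service or analysis step
    inputs: List[str]   # identifiers of the data items the procedure consumed


@dataclass
class AnnotatedData:
    identifier: str
    data_type: str                 # syntactic type, e.g. "float matrix"
    semantic_tags: List[str]       # what the data represent (stand-ins for ontology terms)
    provenance: List[ProvenanceStep] = field(default_factory=list)


class Repository:
    """Hypothetical in-memory repository of annotated data."""

    def __init__(self) -> None:
        self._items: Dict[str, AnnotatedData] = {}

    def register(self, item: AnnotatedData) -> None:
        self._items[item.identifier] = item

    def find_by_tag(self, tag: str) -> List[AnnotatedData]:
        return [d for d in self._items.values() if tag in d.semantic_tags]


def derive(repo: Repository, source: AnnotatedData, new_id: str,
           procedure: str, data_type: str, extra_tags: List[str]) -> AnnotatedData:
    """Create derived data, tag it, record its provenance, and register it."""
    derived = AnnotatedData(
        identifier=new_id,
        data_type=data_type,
        semantic_tags=source.semantic_tags + extra_tags,  # inherits the source's tags
        provenance=source.provenance + [ProvenanceStep(procedure, [source.identifier])],
    )
    repo.register(derived)
    return derived


# Usage: register raw data, then derive and register a normalized version.
repo = Repository()
raw = AnnotatedData("exp1", "float matrix", ["mRNA level", "mouse embryo"])
repo.register(raw)
norm = derive(repo, raw, "exp1-norm", "normalization", "float matrix", ["normalized"])
```

Whether derived data should inherit all of its source's semantic tags is a design choice; here it does, so a search on "mRNA level" finds both the raw and the normalized dataset.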

7
Anticipated needs of the data supplier (and
consumer)
  • Data in
  • Simple submission/registration of data to an
    e-science repository
  • Semi-automatic annotation
  • Data out
  • Easy search and retrieval of previous datasets
    (my personal data and my group's data)
  • Easy search and retrieval of relevant datasets
    from public repositories
  • Combining data
  • Different types and different sources
  • Example: intersecting views of data
  • Data mapped to physical or semantic space
    (examples follow)

8
Intersecting views of data: mRNA levels mapped
to embryo cross-section
9
Why semantic annotation?
  • We want annotation to be machine-readable
  • Free text: arbitrary text tags generated by
    users won't always match up
  • Simplest problem: finding a named object
  • Synonyms - different names exist for the same
    object in different contexts and roles.
  • Homonyms - the same name is used for different
    objects.
  • Which name should I use?
  • Standardized vocabulary list
  • Can only find literal matches
  • Example: using data types to search for services
    will find too many!
  • Semantic tags
  • Allow searching for similar items
  • "Find items like this one."
  • Allow searching with a description
  • "Find items with these properties."
  • Semantic description of services (OWL-S) as well
    as data (OWL)
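The difference between matching literal names and matching semantic tags can be shown with a toy vocabulary in which several synonyms resolve to one concept identifier. The identifiers such as `CONCEPT:TP53` and the dataset names are invented for illustration; a real system would use terms from a published ontology, and would ideally store the concept identifier (the ontotag) at annotation time instead of resolving names during search.

```python
# Hypothetical vocabulary: several names (synonyms) resolve to one concept identifier.
SYNONYMS = {
    "p53": "CONCEPT:TP53",
    "tp53": "CONCEPT:TP53",
    "tumor protein p53": "CONCEPT:TP53",
    "apoptosis": "CONCEPT:APOPTOSIS",
    "apoptotic process": "CONCEPT:APOPTOSIS",
}

# Datasets annotated with free-text tags chosen by different users.
DATASETS = [
    ("ds1", "p53"),
    ("ds2", "TP53"),
    ("ds3", "tumor protein p53"),
    ("ds4", "apoptotic process"),
]


def literal_search(query):
    """Standardized-vocabulary style: only exact string matches are found."""
    return [name for name, tag in DATASETS if tag == query]


def concept_search(query):
    """Semantic style: compare the concepts that the query and each tag resolve to."""
    concept = SYNONYMS.get(query.lower())
    if concept is None:
        return []  # unknown name: nothing to compare against
    return [name for name, tag in DATASETS if SYNONYMS.get(tag.lower()) == concept]
```

Here `literal_search("p53")` finds only `ds1`, while `concept_search("p53")` also finds the datasets tagged "TP53" and "tumor protein p53".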

10
What is an ontology?
  • Definitions
  • A collection of things that are defined in terms
    of their properties and relations to other
    things.
  • A specification of a conceptualization that is
    designed for reuse across multiple applications
    and implementations (Gruber 93, 95, Guarino
    96, Guarino and Giaretta 95)
  • General applications
  • Searching for objects that are resources,
    documents, concepts, experimental data, or
    collections of these things.
  • Knowledge capture
  • Example: biological model with hypothetical
    knowledge
  • Common applications in bioinformatics
  • Annotation of database entries (e.g. gene
    products)

11
Inheritance in ontologies
[Diagram: is-a hierarchy with Animal at the top,
Mammal and Bird below it, and Robin, Heron and
Penguin below Bird]
  • Often represented as DAGs (Directed Acyclic
    Graphs) or hierarchies (trees)
  • Power of inheritance
  • Inclusion (is-a) relations are transitive, so
    class membership and properties are inherited
    downward along chains in the hierarchy.
  • Use an element as a metadata tag for semantic
    annotation (ontotag)
  • An ontotag serves as a pointer into a semantic
    space
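The inheritance mechanism can be sketched with the slide's hierarchy stored as a child-to-parents map (a list of parents, so the same code works for a DAG as well as a tree). The properties attached to each class are illustrative assumptions. `ancestors` computes the transitive closure of is-a, so a query for `Bird` also matches data ontotagged with `Robin`.

```python
from typing import Dict, List, Set

# Hierarchy from the slide: child -> parents (a list, so a DAG is also allowed).
IS_A: Dict[str, List[str]] = {
    "Mammal": ["Animal"],
    "Bird": ["Animal"],
    "Robin": ["Bird"],
    "Heron": ["Bird"],
    "Penguin": ["Bird"],
}

# Properties asserted directly on a class (illustrative assumptions).
PROPERTIES: Dict[str, Set[str]] = {
    "Animal": {"is multicellular"},
    "Mammal": {"has hair"},
    "Bird": {"has feathers", "lays eggs"},
}


def ancestors(cls: str) -> Set[str]:
    """Transitive closure of is-a: every class reachable upward from cls."""
    seen: Set[str] = set()
    stack = list(IS_A.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen


def inherited_properties(cls: str) -> Set[str]:
    """Own properties plus everything inherited from ancestors."""
    props = set(PROPERTIES.get(cls, set()))
    for ancestor in ancestors(cls):
        props |= PROPERTIES.get(ancestor, set())
    return props


def is_a(cls: str, other: str) -> bool:
    """True if data ontotagged with cls should match a query for other."""
    return cls == other or other in ancestors(cls)
```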

12
Gene Ontology
Mouse p53: {List of GO identifiers}
  • Process: apoptosis, DNA damage response, signal
    transduction by p53 class mediator...
  • Component: cytoplasm, cytosol...
  • Function: DNA binding, protein binding...
  • Cluster of genes X from microarray analysis
  • Collection of the lists of GO identifiers for
    each gene in the cluster
  • Most prevalent GO identifiers
  • Apoptosis, Cytosol, Protein Binding
  • Significant relationships between GO classes
    (e.g. cell death and DNA damage response)
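Finding the most prevalent GO identifiers in a cluster is a tally over the per-gene annotation lists. The genes and their annotations below are invented, and GO term names are used instead of GO identifiers to keep the sketch readable. Prevalence alone says nothing about significance; identifying significant relationships would additionally require comparing against a background gene set, for example with a hypergeometric test.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical cluster from a microarray analysis: gene -> annotated GO term names.
CLUSTER: Dict[str, List[str]] = {
    "geneA": ["apoptosis", "cytosol", "protein binding"],
    "geneB": ["apoptosis", "cytosol", "DNA binding"],
    "geneC": ["apoptosis", "nucleus", "protein binding"],
    "geneD": ["signal transduction", "cytosol", "protein binding"],
}


def term_prevalence(cluster: Dict[str, List[str]]) -> Counter:
    """Number of genes in the cluster annotated with each term."""
    counts: Counter = Counter()
    for terms in cluster.values():
        counts.update(set(terms))  # count each term at most once per gene
    return counts


def most_prevalent(cluster: Dict[str, List[str]], n: int = 3) -> List[str]:
    """The n most frequent terms (terms with equal counts come in no guaranteed order)."""
    return [term for term, _ in term_prevalence(cluster).most_common(n)]
```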

13
(No Transcript)
14
Intersecting views of data II: mRNA levels mapped
to gene ontology
15
Applications for search
  • Finding an object when we don't know its name
    (for example, because the ontology has changed!)
  • "It belongs to Class E5 and has these attributes
    (x, y, ..) and relations (a, b, ..)."
  • "It's similar to Object A but plays a role in
    context G."
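Searching by description rather than by name can be sketched as a filter that accepts any object whose class is the requested class or one of its subclasses, and whose attributes and relations include the requested ones. The class names (`E5`, `E5a`, `E5b`), the `Obj` record and `describe_search` are hypothetical; attributes are matched by exact equality, which a real system would likely relax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical class hierarchy: child class -> parent class.
PARENT = {"E5a": "E5", "E5b": "E5", "E5": "E"}


def is_subclass(cls: Optional[str], target: str) -> bool:
    """True if cls equals target or reaches it by following parent links."""
    while cls is not None:
        if cls == target:
            return True
        cls = PARENT.get(cls)
    return False


@dataclass
class Obj:
    name: str
    cls: str
    attributes: Dict[str, object] = field(default_factory=dict)
    relations: Set[str] = field(default_factory=set)


def describe_search(objects: List[Obj], cls: str,
                    attributes: Optional[Dict[str, object]] = None,
                    relations: Optional[Set[str]] = None) -> List[str]:
    """Find objects by class (including subclasses), attributes, and relations."""
    attributes = attributes or {}
    relations = relations or set()
    return [
        o.name for o in objects
        if is_subclass(o.cls, cls)
        and all(o.attributes.get(k) == v for k, v in attributes.items())
        and relations <= o.relations
    ]


OBJECTS = [
    Obj("obj1", "E5a", {"x": 1, "y": 2}, {"a", "b"}),
    Obj("obj2", "E5b", {"x": 1, "y": 3}, {"a"}),
    Obj("obj3", "F", {"x": 1, "y": 2}, {"a", "b"}),
]
```

A query for class `E5` with attribute `x = 1` and relation `a` returns `obj1` and `obj2` (both subclasses of `E5`), but not `obj3`, even though its attributes match.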

16
Ontological search for annotated data
[Diagram: annotated experimental data and a
domain ontology; labels: Human, Zebrafish, von
Hippel-Lindau, Polycystic Kidney Disease]
17
Ontological search for similar model - model
extension
[Diagram: My Knowledge Model and Another
Knowledge Model; labels: Gene A, Gene B]
18
Semantic annotation - ontotags
[Diagram; labels: Evidence Ontology, Provenance,
Author, Gene Ontology, Metadata]
19
Computational experiment
[Diagram: computational experiment connected to
several databases; caption: "Some provenance
should be added by the module/service itself"]
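One way for a module or service to add its own provenance is to wrap it so that every call records what ran, on which inputs, and when. This is a minimal Python sketch using a decorator. `records_provenance`, the in-memory `PROVENANCE_LOG` and the `normalize` service are hypothetical; a real workflow system would persist such records alongside the output data rather than in a global list.

```python
import functools
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

# Hypothetical shared log; a real system would store records with the output data.
PROVENANCE_LOG: List[Dict[str, Any]] = []


def records_provenance(service_name: str) -> Callable:
    """Decorator: the wrapped service appends its own provenance record on each call."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            result = func(*args, **kwargs)
            PROVENANCE_LOG.append({
                "service": service_name,
                "function": func.__name__,
                "inputs": {"args": args, "kwargs": kwargs},
                "timestamp": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorator


@records_provenance("normalization-service")
def normalize(values: List[float]) -> List[float]:
    """Hypothetical service: scale values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]
```

Calling `normalize([1.0, 3.0])` returns `[0.25, 0.75]` and leaves a record naming the service, the function, its inputs and a UTC timestamp, without the caller doing anything extra.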
20
What is Taverna?
  • Taverna (myGrid, SourceForge): Tom Oinn,
    Matthew Pocock, Martin Senger, Anil Wipat, Peter
    Li, Kevin Glover
  • Institutes
  • European Bioinformatics Institute (EBI),
  • IT Innovation
  • Rosalind Franklin Centre for Genomic Research
    (RFCGR)
  • Newcastle Computer Science faculty
  • Newcastle Centre for Life
  • Manchester Computer Science faculty
  • Nottingham University Mixed Reality Lab
  • Release 1.0 January 24, 2005
  • Motivation: Scufl (Simple Conceptual Unified
    Flow Language) was created because WSFL (Web
    Services Flow Language) and BPEL (Business
    Process Execution Language) do not have the
    level of user abstraction necessary for most
    bioinformaticians and ...

21
Taverna Highlights
  • Language-, platform-, and domain-independent
  • Services available as remote and local components
  • Visual interface
  • Workflow graph
  • Visualisers
  • Access to computing clusters, such as those at
    the European Bioinformatics Institute, via
    services (no administrative overhead)
  • Workflow exchange through XML (XScufl)
  • Provenance
  • Personalisation

22
Taverna Workflow diagram
23
Taverna Advanced model explorer
24
GoViz workflow output
25
Workflow wishlist - Visualization
[Diagram: analysis steps Feature Extraction,
Preprocessing/Normalization, Differential
Expression, Clustering]
  • Visualization
  • Interactive Visualization
  • Especially linked brushing, where selections in
    one view become active in other views
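Linked brushing is commonly implemented with the observer pattern: views share one selection model, and a selection made in any view is broadcast to all of them. The sketch below assumes nothing about Taverna's visualisers; `SelectionModel` and `View` are hypothetical stand-ins for real plots such as a heat map and a cluster tree.

```python
from typing import Callable, List, Set


class SelectionModel:
    """Shared selection; every subscribed view is notified when it changes."""

    def __init__(self) -> None:
        self._selected: Set[str] = set()
        self._listeners: List[Callable[[Set[str]], None]] = []

    def subscribe(self, listener: Callable[[Set[str]], None]) -> None:
        self._listeners.append(listener)

    def select(self, items: Set[str]) -> None:
        self._selected = set(items)
        for listener in self._listeners:
            listener(set(self._selected))  # each view gets its own copy


class View:
    """Stand-in for a plot, e.g. a heat map or a cluster tree."""

    def __init__(self, name: str, model: SelectionModel) -> None:
        self.name = name
        self.model = model
        self.highlighted: Set[str] = set()
        model.subscribe(self.on_selection)

    def on_selection(self, items: Set[str]) -> None:
        self.highlighted = items  # a real view would redraw here

    def brush(self, items: Set[str]) -> None:
        # A selection made in this view goes to the shared model,
        # which makes it active in every other view as well.
        self.model.select(items)


# Usage: brushing genes in the heat map highlights them in the cluster tree.
model = SelectionModel()
heatmap = View("heat map", model)
tree = View("cluster tree", model)
heatmap.brush({"geneA", "geneB"})
```

Keeping the selection in a separate model means views never need to know about each other; adding a third view is one more `View(..., model)` call.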

26
Taverna - Intermediate results
27
Taverna - Provenance
28
Conclusions
  • Semantic annotation is essential for data
    integration.
  • Ontological tags (ontotags) can be used for
    semantic annotation of both data and
    (web)services.
  • Ontotags and provenance can be added by the
    (web)services themselves.
  • Interaction will sometimes be needed from
    (web)services.
  • Taverna provides a foundation for the further
    implementation of semantic annotation and
    provenance.

29
The End
  • "Science is built up of facts, as a house is
    built of stones; but an accumulation of facts is
    no more a science than a heap of stones is a
    house."
  • Henri Poincaré
  • Science and Hypothesis, 1905

30
VL-e wishlist applied to Taverna
  • Present
  • Absent
  • Potential or Intention

31
Functional Wishlist
  • Language, platform (not browser), and domain
    independent
  • Encapsulation of procedures for novice users and
    best practice
  • Access to a DBMS: a service in which a workflow
    entity can store data and from which it can
    retrieve data
  • Access to databases from workflow
    (storage/retrieval/querying) (ODBC)
  • Integration of 3rd-party software: the ability to
    integrate existing software packages in a
    workflow (R, Matlab, VTK, ITK, FSL, etc.)
  • Discovery and invocation of existing web services
    developed/maintained by others (e.g. EMBOSS)
  • Typing mechanism for input/output data: connected
    entities in a workflow should only be allowed to
    exchange data if the type of the data produced by
    the outputting module is the same as the type
    consumed by the inputting module
  • Fan-in (the input data of an entity can come
    from multiple entities) and fan-out (the output
    of an entity can be passed to multiple entities)
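The typing and fan-in/fan-out requirements can be sketched as a workflow graph whose `connect` method rejects links between ports of different types, while allowing any number of links per output and per module. The module and port names and the type labels are hypothetical. Types are compared here as plain strings; a richer system might consult an ontology so that a subtype is accepted where a supertype is expected.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Module:
    name: str
    inputs: Dict[str, str]   # input port name -> data type consumed
    outputs: Dict[str, str]  # output port name -> data type produced


@dataclass
class Workflow:
    modules: Dict[str, Module] = field(default_factory=dict)
    links: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def add(self, module: Module) -> None:
        self.modules[module.name] = module

    def connect(self, src: str, out_port: str, dst: str, in_port: str) -> None:
        produced = self.modules[src].outputs[out_port]
        consumed = self.modules[dst].inputs[in_port]
        if produced != consumed:
            raise TypeError(f"{src}.{out_port} produces {produced!r}, "
                            f"but {dst}.{in_port} consumes {consumed!r}")
        # Nothing limits how many links leave one output (fan-out)
        # or how many upstream modules feed one module (fan-in).
        self.links.append((src, out_port, dst, in_port))


# Usage: fetch fans out to blast and align; merge fans in from both.
wf = Workflow()
wf.add(Module("fetch", {}, {"seq": "FASTA"}))
wf.add(Module("blast", {"query": "FASTA"}, {"hits": "BLAST report"}))
wf.add(Module("align", {"seqs": "FASTA"}, {"aln": "alignment"}))
wf.add(Module("merge", {"hits": "BLAST report", "aln": "alignment"}, {"report": "text"}))
wf.connect("fetch", "seq", "blast", "query")
wf.connect("fetch", "seq", "align", "seqs")
wf.connect("blast", "hits", "merge", "hits")
wf.connect("align", "aln", "merge", "aln")
```

An attempt such as `wf.connect("fetch", "seq", "merge", "hits")` raises `TypeError`, because a FASTA output cannot feed a port that expects a BLAST report.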

32
User interface and SW Engineering wishlist
  • User-friendly (graphical, sensible defaults,
    wizards)
  • Interactive graph editing of workflow diagram
  • Encapsulation (the ability to create hierarchies
    of workflows), copy/paste (topologies are
    first-class objects: a topology can be loaded as
    if it were a module)
  • Capture of workflow and provenance
  • Based on well-established standards (e.g. Grid
    software; easy to install and maintain)
  • Software engineering: maintainability of
    dependencies on 3rd-party software
  • Open source
  • Semantic annotation of web services as well as
    the data produced by a given module
  • Visualization from a service component
  • Interaction with (the visualization from) a
    service component, especially selections
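Encapsulation, where a finished topology can be reused as if it were a single module, follows naturally if a workflow is represented as a callable with the same interface as its steps. The sketch below covers only linear chains (a real workflow is a graph), and `workflow` and the step functions are hypothetical examples.

```python
import math
from typing import Callable, List

Step = Callable[[List[float]], List[float]]


def workflow(*steps: Step) -> Step:
    """Chain steps sequentially; the result is itself a step that can be reused."""
    def run(data: List[float]) -> List[float]:
        for step in steps:
            data = step(data)
        return data
    return run


def log2_transform(values: List[float]) -> List[float]:
    return [math.log2(v) for v in values]


def center(values: List[float]) -> List[float]:
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def keep_positive(values: List[float]) -> List[float]:
    return [v for v in values if v > 0]


preprocess = workflow(log2_transform, center)    # a reusable sub-workflow
analysis = workflow(preprocess, keep_positive)   # used exactly like a single module
```

For input `[1, 2, 4, 8]`, `preprocess` yields `[-1.5, -0.5, 0.5, 1.5]` and `analysis` yields `[0.5, 1.5]`.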

33
Run-time wishlist
  • Controlled execution of workflow (e.g. stepwise,
    useful in debugging)
  • Distributed execution (e.g. across a Grid of
    systems)
  • Interactive, dynamic execution of workflow;
    dynamic workflow (execution is not predetermined)
  • Monitoring execution of workflow: gathering
    information on execution of workflow (metadata)
    (also from inside a workflow)
  • Maintain history/log of executed workflow for
    later scrutiny; reproduction of experiment
  • Checkpointing: both data (as a BLOB) and
    process checkpointing
  • nohup execution (being able to execute a
    workflow in the background, without having to
    be logged in all the time)
  • Control flow (while/for/if-then-else,
    parallel/sequential/recursion, executing the same
    workflow with multiple different inputs,
    parameter sweeping, gathering/collecting of
    results)
  • Resource brokering: given the description of
    resources required by a workflow entity and the
    description of abilities provided by a resource,
    the (automatic) brokering of an entity onto a
    resource
  • Quality of service: fault tolerant, stable, high
    availability, dependable
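Two items from this list, parameter sweeping and data checkpointing, combine into a small pattern: run a step for each parameter value and save the accumulated results after every run, so that a restarted sweep skips the values it has already computed. `sweep_with_checkpoint` and the JSON checkpoint format are assumptions made for this sketch. It checkpoints results only and does not attempt process checkpointing.

```python
import json
import os
import tempfile
from typing import Callable, Dict, Iterable


def sweep_with_checkpoint(run: Callable[[float], float],
                          parameters: Iterable[float],
                          checkpoint_path: str) -> Dict[str, float]:
    """Run `run` once per parameter value, checkpointing results after each run."""
    results: Dict[str, float] = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            results = json.load(fh)  # resume from an earlier, interrupted sweep
    for p in parameters:
        key = repr(p)  # JSON object keys must be strings
        if key in results:
            continue  # already computed; skip it
        results[key] = run(p)
        with open(checkpoint_path, "w") as fh:
            json.dump(results, fh)
    return results


# Usage: the second sweep resumes from the checkpoint and only computes 3.0.
calls = []


def square(x: float) -> float:
    calls.append(x)
    return x * x


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sweep.json")
    first = sweep_with_checkpoint(square, [1.0, 2.0], path)
    second = sweep_with_checkpoint(square, [1.0, 2.0, 3.0], path)
```

A crash in the middle of `json.dump` could leave a corrupt checkpoint; a more robust version would write to a temporary file and then move it into place with `os.replace`.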