Fluxion: The ComparaGRID Data Integration Architecture

1
Fluxion: The ComparaGRID Data Integration Architecture
  • Matthew Pocock, Tony Burdett, Rob Davey, Andrew Gibson, Trevor Paterson

2
The Collaboration
  • Developing a GRID-based system for integrating
    and exploring data from comparative genomics, to
    discover biological knowledge that cannot be
    discovered from any one source
  • Collaborative BBSRC project
  • 5 sites across the UK
  • http://www.comparagrid.org

3
Outline
  • Introductory Example: Human-chimp
  • The Fluxion Data-Integration Architecture
  • The Domain Ontology
  • Publishing Data
  • Runcible Rules: mapping between ontologies
  • Bringing it all together

5
Introductory Example
7
The Fluxion Stack
[Diagram; labels: Raw data, Aggregation, Semantics, Syntax, integrator, Pub svc, Trans svc, query, data]
8
Query Semantics
  • Query: an OWL class, interpreted as
  • K query: the epistemic closure of the query
  • Against the knowledge-base exposed by that
    data-source, not The World
  • Result is a knowledge-base
  • All of it is entailed by the queried KB (it's a subset)
  • Can be statements in the original KB, or any
    statements that always follow from it
  • Contains at least the statements needed to
  • Allow a reasoner to correctly classify all the
    individuals that match the query
  • Preferably using properties, not asserted types
    (A-box preferred over T-box; don't over-commit)
  • An application should always run the query
    result through an OWL reasoner
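
The query contract above can be illustrated with a toy model. This is only a sketch under loud assumptions: the knowledge base is a plain set of (subject, property, value) statements, the "query class" is a single property restriction, and classify() is a stand-in for a real OWL reasoner. None of these names come from Fluxion itself.

```python
# Toy knowledge base: a set of (subject, property, value) statements.
# All identifiers below are hypothetical illustrations.
KB = {
    ("gene1", "located_on", "chr1"),
    ("gene1", "has_name", "FOXP2"),
    ("gene2", "located_on", "chr2"),
    ("gene2", "has_name", "HBB"),
    ("chr1", "has_id", "1"),
}


def answer(kb, prop, value):
    """Return the sub-KB describing individuals with prop == value.

    The result carries property statements (A-box) rather than an
    asserted type, so the caller still has to classify it.
    """
    matches = {s for (s, p, v) in kb if p == prop and v == value}
    return {(s, p, v) for (s, p, v) in kb if s in matches}


def classify(kb, prop, value):
    """Stand-in for an OWL reasoner: find members of the query class."""
    return {s for (s, p, v) in kb if p == prop and v == value}


result = answer(KB, "located_on", "chr1")
members = classify(result, "located_on", "chr1")
```

Here result is a subset of KB, and the caller recovers the matching individuals only by running the classification step over it, which mirrors the advice to pass every result through a reasoner.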

9
Rationale
  • Low barrier-to-entry for implementers
  • Support a range of implementations
  • Trade speed for accuracy
  • Trade implementation complexity for data-volume
  • Simplistic implementations
  • Return all instances of known classes (e.g. a db
    table) with minimal filtering; if in doubt,
    return it
  • Complex implementations
  • Can compute the schema of k(query), and return
    the remaining knowledge base
  • Intermediate implementations semi-canned query
  • query ? Ci ? ?
  • Compute a safe bounds k(query) ? k(query)
  • Extra filtering to discard the knowledge-base
    pertaining to k(query)
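
The range of implementations can be sketched as follows. This assumes a toy list of rows in place of a real data source, and the function names are hypothetical. The point is only that any source returning a superset of the answer is acceptable, provided exact filtering is applied afterwards.

```python
# Hypothetical rows standing in for a legacy table.
ROWS = [
    {"id": "g1", "species": "human", "length": 1200},
    {"id": "g2", "species": "chimp", "length": 900},
    {"id": "g3", "species": "human", "length": 300},
]


def query(row):
    """The full query: long human genes."""
    return row["species"] == "human" and row["length"] > 1000


def simplistic_source(rows):
    """Ignore the query and return everything: if in doubt, return it."""
    return list(rows)


def semi_canned_source(rows, species):
    """Apply one cheap canned filter.

    This is a safe bound only when the full query implies the filter,
    which is an assumption the caller must guarantee.
    """
    return [r for r in rows if r["species"] == species]


def exact(rows, predicate):
    """Extra filtering that discards whatever the bound let through."""
    return [r for r in rows if predicate(r)]


loose = simplistic_source(ROWS)
tighter = semi_canned_source(ROWS, "human")
final_ids = [r["id"] for r in exact(tighter, query)]
```

Both sources lead to the same final answer; the semi-canned one simply moves less data at the cost of a little more implementation effort.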

11
Role of Ontology in Fluxion
  • Fluxion
  • Uses semantics of OWL
  • Not any domain-specific information
  • Any domain
  • A domain ontology defines what Fluxion integrates
  • Developing a good domain ontology is
  • Hard work
  • Poorly scoped
  • No widely-validated methodology
  • Biologist ≠ Modeller, so there is a language gap

12
Ontology
[Diagram; labels: Upper classes, Domain classes, Derives, Informs, Classes used by data model(s), Datatypes]
14
Publishing Data
  • Vast amounts of data in legacy databases
  • SQL
  • Text/flat-file
  • Custom/proprietary formats
  • Implicit and under-defined semantics
  • Data Publisher Role
  • Schema as OWL concepts
  • Queries populate OWL instances
  • Supported formats automated
  • mix-in knowledge
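
A minimal sketch of the publisher role for an SQL source, assuming each table maps to one class, each row to one individual, and each column to one property. The src: prefix, the naming scheme, and the publish() function are hypothetical, not Fluxion's actual API; statements are plain tuples rather than real OWL.

```python
import sqlite3


def publish(conn, table, key, ns="src:"):
    """Yield (subject, property, value) statements for one table.

    The table name is interpolated into the SQL, so it must come from
    trusted configuration, never from user input.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    for row in cur:
        rec = dict(zip(cols, row))
        subject = f"{ns}{table}/{rec[key]}"
        # Schema as concept: the table becomes the individual's class.
        yield (subject, "rdf:type", f"{ns}{table}")
        # Query results populate the individual's properties.
        for col, val in rec.items():
            if val is not None:
                yield (subject, f"{ns}{table}_has_{col}", val)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Seq_region (Seq_region_id INTEGER, name TEXT)")
conn.execute("INSERT INTO Seq_region VALUES (1, 'chr1')")
triples = list(publish(conn, "Seq_region", "Seq_region_id"))
```

The generated statements stay close to the source schema on purpose; relating them to the domain ontology is left to a separate mapping step.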

16
Runcible Rules
  • Source databases have different models
  • Application-specific
  • Mutually incompatible
  • Ontology could become universal union
  • Subsumption not the solution
  • Expert knowledge required to map from source
    schema to domain ontology
  • Do not want this fossilized in application code
  • Map a source schema to multiple domains

17
Runcible Rules
  • Declarative
  • Like XPath/XQuery, XSLT
  • Patterns
  • OWL class expressions with holes
  • Match against source database
  • Bind variables
  • Generate domain/application OWL
  • Fill in template OWL statements using bound
    variables
  • Rule application semantics are reversible
  • Given source->domain rules, domain->source rules
    can be machine-generated
  • Supports a wide range of optimization strategies
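
The match, bind, and fill cycle, and the idea of reversing a rule, can be sketched as below. This is a toy under stated assumptions: a rule is one triple pattern plus one triple template, terms starting with "?" are variables, and every variable appears on both sides. Real Runcible rules are written in XML over OWL, and the dictionary format here is hypothetical.

```python
def match(pattern, triple, binding):
    """Unify one triple pattern with one triple; '?x' terms are variables."""
    b = dict(binding)
    for p, t in zip(pattern, triple):
        if isinstance(p, str) and p.startswith("?"):
            # Bind the variable, or fail if it is already bound differently.
            if b.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return b


def apply_rule(rule, triples):
    """Match rule['match'] against each triple and fill rule['emit']."""
    out = set()
    for t in triples:
        b = match(rule["match"], t, {})
        if b is not None:
            out.add(tuple(b.get(term, term) for term in rule["emit"]))
    return out


def reverse(rule):
    """Generate the opposite rule by swapping pattern and template."""
    return {"match": rule["emit"], "emit": rule["match"]}


rule = {
    "match": ("?sr", "domain:Seq_region_has_Seq_region_id", "?id"),
    "emit": ("?sr", "target:has_id", "?id"),
}
source = {("sr1", "domain:Seq_region_has_Seq_region_id", 1)}
domain = apply_rule(rule, source)
round_trip = apply_rule(reverse(rule), domain)
```

Swapping the two sides is enough here only because the rule is a one-to-one rewrite; rules that create new individuals or apply value transformations would need more care to invert.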

18
Rules Demo
<?xml version="1.0" ?>
<mapping>
  <rule>
    <forall id="?sr">
      <in>
        <owl:Class rdf:about="domain:Seq_region"/>
      </in>
      <do>
        <individual id="?cr">
          <rdf:type>
            <owl:Class rdf:about="target:Chromosome_Representation"/>
          </rdf:type>
        </individual>
        <forall id="?sri">
          <in>
            <walk from="?sr">
              <down rdf:resource="domain:Seq_region_has_Seq_region_id"/>
            </walk>
          </in>
          <do>
            <value of="?c">
              <onProperty rdf:resource="target:has_id"/>
              <set id="?sri"/>
              <!-- this may be subject to some transformation operation -->
            </value>
20
Bringing It All Together
21
Acknowledgements
  • Newcastle
  • Anil Wipat
  • Darren Wilkinson
  • Richard Boys
  • Matthew Pocock
  • Madhu Bhattacharjee
  • Dan Swan
  • Phil Lord
  • EBI
  • Peter Rice
  • Tony Burdett
  • http://www.comparagrid.org
  • mailto:comparagrid_at_lists.bbsrc.ac.uk
  • Manchester
  • Robert Stevens
  • Andrew Gibson
  • Roslin
  • Andy Law
  • Trevor Paterson
  • John Innes Centre
  • Jo Dicks
  • Rob Davey
  • http://deanmoor.ncl.ac.uk/blogs
  • http://deanmoor.ncl.ac.uk/websvn