Title: Fluxion: The ComparaGRID Data Integration Architecture
1Fluxion: The ComparaGRID Data Integration Architecture
- Matthew Pocock, Tony Burdett, Rob Davey, Andrew Gibson, Trevor Paterson
2The Collaboration
- Developing a GRID-based system for integrating
and exploring data from comparative genomics, to
discover biological knowledge that cannot be
discovered from any one source
- Collaborative BBSRC project
- 5 sites across the UK
- http://www.comparagrid.org
3Outline
- Introductory Example: Human-chimp
- The Fluxion Data-Integration Architecture
- The Domain Ontology
- Publishing Data
- Runcible Rules: mapping between ontologies
- Bringing it all together
5Introductory Example
7The Fluxion Stack
[Figure: layered stack diagram. Layer labels: Raw data, Syntax, Semantics, Aggregation. Component labels: Pub svc, Trans svc, integrator. Flow labels: query, data.]
8Query Semantics
- Query: an OWL class, interpreted as
- k(query): the epistemic closure of the query
- Against the knowledge-base exposed by that
data-source, not The World
- Result is a knowledge-base
- All entailed by the queried KB (it's a subset)
- Can be statements in the original KB, or any
statements that always follow
- Contains at least the statements needed to
- Allow a reasoner to classify all the individuals
who match the query correctly
- Preferably using properties, not asserted types
(a-box preferred over t-box, don't over-commit)
- An application should always run the query
result through an OWL reasoner
9Rationale
- Low barrier-to-entry for implementers
- Support a range of implementations
- Trade speed for accuracy
- Trade implementation complexity for data volume
- Simplistic implementations
- Return all instances of known classes, e.g. a db
table with minimal filtering; if in doubt,
return it
- Complex implementations
- Can compute the schema of k(query), and return
the remaining knowledge base
- Intermediate implementations: semi-canned query
- query ? Ci ? ?
- Compute a safe bounds k(query) ? k(query)
- Extra filtering to discard the knowledge-base
pertaining to k(query)
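The trade-off between simplistic and intermediate implementations can be sketched with a toy example. This is only an illustration under stated assumptions, not Fluxion code: the knowledge base is modelled as a set of triples, the "reasoner" is a stand-in function that classifies by a property value, and every name and number below is hypothetical. The point it tries to show is that a data source may safely return too much, as long as the client re-classifies what it receives.

```python
# Toy knowledge base: a set of (subject, property, value) triples.
# All identifiers and values are hypothetical illustration data.
KB = {
    ("chr1", "has_id", 1),
    ("chr1", "length", 248_000_000),
    ("chr2", "has_id", 2),
    ("chr2", "length", 242_000_000),
    ("geneA", "located_on", "chr1"),
}


def matches(individual, kb, min_length):
    """Stand-in for a reasoner: classify by properties, not asserted types."""
    return any(
        s == individual and p == "length" and v >= min_length
        for s, p, v in kb
    )


def simplistic_source(kb):
    """Minimal filtering: return every statement about anything with a length."""
    candidates = {s for s, p, _ in kb if p == "length"}
    return {t for t in kb if t[0] in candidates}


def intermediate_source(kb, min_length):
    """Extra filtering: keep only statements about individuals that can match."""
    candidates = {s for s, p, v in kb if p == "length" and v >= min_length}
    return {t for t in kb if t[0] in candidates}


def classify(result_kb, min_length):
    """Client side: always re-run the 'reasoner' over the returned KB."""
    return {s for s, _, _ in result_kb if matches(s, result_kb, min_length)}


loose = simplistic_source(KB)
tight = intermediate_source(KB, 245_000_000)
```

Both results are subsets of the queried KB, and classifying either one on the client gives the same individuals. The simplistic source just ships more data to get there.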
11Role of Ontology in Fluxion
- Fluxion
- Uses semantics of OWL
- Not any domain-specific information
- Any domain
- A domain ontology defines what Fluxion integrates
- Developing a good domain ontology is
- Hard work
- Poorly scoped
- No widely-validated methodology
- Biologist ≠ Modeller, so language gap
12Ontology
[Figure: ontology layer diagram. Box labels: Upper classes, Domain classes, Classes used by data model(s), Datatypes. Arrow labels: Derives, Informs.]
14Publishing Data
- Vast amounts of data in legacy databases
- SQL
- Text/flat-file
- Custom/proprietary formats
- Implicit and under-defined semantics
- Data Publisher Role
- Schema as OWL concepts
- Queries populate OWL instances
- Supported formats automated
- Mix-in knowledge
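The data publisher role for an SQL source could look roughly like the sketch below. It is a minimal illustration under assumptions, not the Fluxion publisher: the table becomes a class, each column becomes a property and each row becomes an individual, all emitted as plain triples. The `publish` function, the `domain` prefix, the naming scheme for individuals and the sample rows are hypothetical. The property naming follows the `Seq_region_has_Seq_region_id` pattern that appears in the rules demo.

```python
import sqlite3


def publish(conn, table, key_column, prefix="domain"):
    """Expose one table as triples: table -> class, column -> property,
    row -> individual. The table name is trusted here (sketch only)."""
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        individual = f"{prefix}:{table}_{record[key_column]}"
        # Schema as concept: every row is an instance of the table's class.
        triples.append((individual, "rdf:type", f"{prefix}:{table}"))
        # Query results populate the instance's properties.
        for column, value in record.items():
            triples.append((individual, f"{prefix}:{table}_has_{column}", value))
    return triples


# Hypothetical legacy table, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Seq_region (Seq_region_id INTEGER PRIMARY KEY, name TEXT)"
)
conn.executemany(
    "INSERT INTO Seq_region VALUES (?, ?)", [(1, "chr1"), (2, "chr2")]
)
triples = publish(conn, "Seq_region", "Seq_region_id")
```

A generic mapping like this captures only what the schema states. The implicit semantics of the source would still have to be mixed in by someone who knows the data.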
16Runcible Rules
- Source databases have different models
- Application-specific
- Mutually incompatible
- Ontology could become universal union
- Subsumption not the solution
- Expert knowledge required to map from source
schema to domain ontology
- Do not want this fossilized in application code
- Map a source schema to multiple domains
17Runcible Rules
- Declarative
- Like XPath/XQuery, XSLT
- Patterns
- OWL class expressions with holes
- Match against source database
- Bind variables
- Generate domain/application OWL
- Fill in template OWL statements using bound
variables
- Rule application semantics are reversible
- Given source->domain rules, domain->source rules
can be machine-generated
- Supports a wide range of optimization strategies
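The match, bind and generate cycle described above can be sketched in a few lines. This is a simplified illustration, not the Runcible engine: patterns and templates are plain triples with `?variables` standing in for OWL class expressions with holes, and the function names and sample data are hypothetical. It also reuses the source identifier for the generated individual, which a real mapping need not do.

```python
def is_var(term):
    """A term is a variable (a 'hole') if it is a string starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple):
    """Return variable bindings if the triple fits the pattern, else None."""
    bindings = {}
    for want, got in zip(pattern, triple):
        if is_var(want):
            # A variable must bind to the same value everywhere it occurs.
            if bindings.setdefault(want, got) != got:
                return None
        elif want != got:
            return None
    return bindings


def apply_rule(source, pattern, template):
    """Match the pattern against each source triple and fill in the template."""
    out = []
    for triple in source:
        bindings = match(pattern, triple)
        if bindings is not None:
            for statement in template:
                out.append(tuple(bindings.get(t, t) for t in statement))
    return out


# Hypothetical source data and a one-statement rule.
source = [
    ("sr1", "domain:Seq_region_has_Seq_region_id", 1),
    ("sr2", "domain:Seq_region_has_Seq_region_id", 2),
]
pattern = ("?sr", "domain:Seq_region_has_Seq_region_id", "?sri")
template = [("?sr", "target:has_id", "?sri")]

target = apply_rule(source, pattern, template)
# Reversing this simple rule: swap the roles of pattern and template.
back = apply_rule(target, template[0], [pattern])
```

For a one-to-one rule like this, the reverse rule is obtained by swapping pattern and template. Whether and how that generalises to richer rules is a property of the rule language and is not demonstrated by this sketch.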
18Rules Demo
<?xml version="1.0" ?>
<mapping>
  <rule>
    <forall id="?sr">
      <in>
        <owl:Class rdf:about="domain:Seq_region"/>
      </in>
      <do>
        <individual id="?cr">
          <rdf:type>
            <owl:Class rdf:about="target:Chromosome_Representation"/>
          </rdf:type>
        </individual>
        <forall id="?sri">
          <in>
            <walk from="?sr">
              <down rdf:resource="domain:Seq_region_has_Seq_region_id"/>
            </walk>
          </in>
          <do>
            <value of="?c">
              <onProperty rdf:resource="target:has_id"/>
              <set id="?sri"/>
              <!-- this may be subject to some transformation operation -->
            </value>
20Bringing It All Together
21Acknowledgements
- Newcastle
- Anil Wipat
- Darren Wilkinson
- Richard Boys
- Matthew Pocock
- Madhu Bhattacharjee
- Dan Swan
- Phil Lord
- EBI
- Peter Rice
- Tony Burdett
- Manchester
- Robert Stevens
- Andrew Gibson
- Roslin
- Andy Law
- Trevor Paterson
- John Innes Centre
- Jo Dicks
- Rob Davey
- http://www.comparagrid.org
- mailto:comparagrid_at_lists.bbsrc.ac.uk
- http://deanmoor.ncl.ac.uk/blogs
- http://deanmoor.ncl.ac.uk/websvn