Title: Getting the Gist from the Biologist
1Getting the Gist from the Biologist
- Andrew Gibson
- Postdoctoral Research Associate
- The University of Manchester
2Overview
- A Model of Ontology Development
- Factions and interests
- Experiences and Perspectives
- As ontologist for ComparaGRID project
- Also as a biologist / bioinformatician
- Expectations and relationships
3I. Modelling Ontology Space
- Factions
- Domain
- Biology / Bioinformatics
- Computer Science
- Programming, Databases, Software Engineering
- Formal Logic
- Description Logic, Language OWL, Reasoning
- Knowledge Engineering
- Tools (Protégé/Swoop), Design Patterns, Methodologies
4Faction Pyramid
Biology
Computer Science
Formal Logics
Knowledge Engineering
5Ontological Feng Shui I
- Not applicable
- Showpiece framework
- Low / no user uptake
- Well engineered
- Expressive
- Implemented System
Biology
Computer Science
Formal Logics
Knowledge Engineering
6Ontological Feng Shui II
- Practical
- Well engineered
- Implemented System
- Low expressivity
- Reasoning unlikely
Biology
Computer Science
Formal Logics
Knowledge Engineering
7Ontological Feng Shui III
- Practical
- Expressive
- Implemented System
- Problems with
- Reusability
- Structure
- Evaluation
Biology
Computer Science
Formal Logics
Knowledge Engineering
8Ontological Feng Shui IV
- Showpiece knowledge artefact
- No Implementation
- Little practical use
Biology
Computer Science
Formal Logics
Knowledge Engineering
9Modelling Domain Knowledge
Domain: Pizza
Modelling pizzas using Protégé: although formal logic is applied, its application is intrinsic to the use of Protégé. Neither pizzas nor the methodology for engineering the knowledge became the concern of the formal logic faction.
Computer Science
Formal Logics
Knowledge Engineering
10II. Experiences and Perspectives
- ComparaGRID Ontology Development
- Integration of genetic and genomic data
- Model organism databases
- Different schemas / naming conventions in schemas
- Requirements (in proposal)
- A controlled vocabulary
- in OWL, allowing precise class descriptions
- including aim of using a reasoner to make inferences
- Interactions between the factions have been interesting!...
11Approach
Domain: Comparative Genomics
Computer Science
Formal Logics
Knowledge Engineering
12Friends, Romans
- After Caesar's death, another character appears
in the form of Caesar's devotee, Mark Antony,
who, by a rousing speech over the corpse deftly
turns public opinion against the assassins by
speaking to the more personal side of his
position, rather than the public and rational
tactic Brutus uses in his speeches. Antony rouses
the mob to drive them from Rome. - Wikipedia
13Terms
- Mark Antony
- This term is central to my domain, it must be in the ontology
- Brutus
- Fine, so long as you can define it
- Mark Antony
- But why, everyone knows what it means!
- In OWL, define with axioms; terms are mere labels
14... and conditions
- Brutus
- I can make new unambiguous terms and define them instead
- Mark Antony
- What's the use in that, no-one else will understand these silly terms!
- Axioms are for decoding the meaning of a term
- Experienced ontologists can deduce meaning from axioms
- User confusion
- This ontology is for computer, rather than human, interpretation
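The point that terms are mere labels while axioms carry the meaning can be sketched in a few lines of Python. This is only an illustration: the identifiers (`CG_0001` etc.), the property names and the axioms given for "QTL" are hypothetical and are not taken from the ComparaGRID ontology, and a plain set comparison stands in for what an OWL reasoner would do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassDefinition:
    """A class whose meaning is carried by its axioms, not by its label."""
    identifier: str       # opaque ID (hypothetical)
    label: str            # human-readable term, free to change
    axioms: frozenset     # set of (property, filler) conditions


def same_meaning(a: ClassDefinition, b: ClassDefinition) -> bool:
    """Compare two classes by their axioms only; labels are ignored."""
    return a.axioms == b.axioms


# Illustrative axioms only; not the project's real definition of a QTL.
qtl_axioms = frozenset({
    ("is_a", "GenomicRegion"),
    ("associated_with", "QuantitativeTrait"),
})

qtl = ClassDefinition("CG_0001", "QTL", qtl_axioms)
renamed = ClassDefinition("CG_0002", "quantitative trait locus", qtl_axioms)
vague = ClassDefinition("CG_0003", "QTL", frozenset())  # label only, no axioms

print(same_meaning(qtl, renamed))  # different labels, same axioms
print(same_meaning(qtl, vague))    # same label, but nothing defined
```

In this sketch Brutus's position is the first comparison (a new, unfamiliar label changes nothing for the computer), and Mark Antony's is the second (a familiar label with no axioms means something to people but nothing to the machine).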
15Exception overload
- Mark Antony
- Is one of these always one of these?
- Brutus
- Usually!
- Natural for biologists to consider exceptions before rules
- Adds to reluctance to create more specific terms and add in axioms
16... and more conditions
- Brutus
- Well, I need to create axioms for the ontology to be decidable
- Mark Antony
- Yes, but I suspect that these restrictive axioms will introduce problems for me in the future, better to leave it open ended
- Fear of commitment to a particular meaning
- Lack of experience
- Really understanding the ontology should allay fears
17It's our knowledge
- Mark Antony
- Lo, I can make a definition for this troublesome term
- Brutus
- But by doing that you're disregarding other interpretations of that term
- Mark Antony
- No matter! This ontology is for us!
- Duality of intention
- Controlled vocabulary and goals of reusability and sharing in conflict
18... so we're all ontologists!
- Mark Antony
- I do not see the terms or the relationships I expect in your ontology
- Brutus
- The meanings were complex, so I created a complex model
- Mark Antony
- Oh, I would model it like this!
- Seeing complex models as unnecessary
- Overriding (lengthy) knowledge engineering decisions
- Reluctance to invest time in understanding complexity
19Complexity mine
- Mark Antony
- This concept is too complex, I don't think you should model it in OWL
- Brutus
- I think you'll need this concept for a more complete model
- Mark Antony
- You're wasting your time! I can probably use some computer science hack or just an undefined term as a placeholder
- Preferring simpler models because of fear of more complex ones
- Overriding knowledge engineering principles
20Occam's Razor
- Mark Antony
- This model is hard to understand because of its complexity. William of Occam says make it simpler: "Pluralitas non est ponenda sine necessitate", or "plurality should not be posited without necessity"
- Brutus
- Then I could probably reason that all of Occam's chairs had only 3 legs
- Crowd: <applause>
21Minimum Information Problem
- Brutus
- and I can account for your uncertainty with more abstract super-classes
- Mark Antony
- But why, these levels of abstraction don't seem to mean anything, so must be unnecessary!
22III. Expectations and Relationships
- Expectations (of the biologist)
- Ontology development not skilled task
- Tools are freely available
- Existing ontologies designed for human interpretation
- No prominent (widely used) methodologies
- Class hierarchy is fine
- Axioms too restrictive
- Existing ontologies often lightweight
- Reasoners can be applied magically
- Ontologies can be designed to let reasoners do work
- But not in tandem with first two points
23Montagues and Capulets Revisited
Biology
Capulets (Bioinformatics)
Formal Logics
Computer Science
Montagues (Knowledge Representation)
Knowledge Engineering
See C. Goble and C. Wroe, Comp Funct Genom 2004; 5: 623-632
Philosophers
24Bioinformatics
- Goal
- Biological Data Management
- Drive
- Pragmatic
- Here and now
- Established
- Quickly evolving
Biology
PERL
Databases / SQL
OBO Controlled Vocabularies
XML
Web Services
Java / OO
Software Engineering
RDF
Computer Science
25Knowledge Representation
- Goal
- Using expressive formal language to encapsulate knowledge
- Drive
- Research orientated
- Extension favoured over application
- Cutting edge!
- Methodologies?
Formal Logics
FOL
OWL
Protégé
Swoop
Ontoclean
Value Partitions / Lists
Upper Ontology
Knowledge Engineering
26Bio-eye view
- Biology → Computer Science
- Path of least resistance
- Data oriented
- Familiarity with DB / SQL, XML
- Biology → Logic
- (Over)complicated and alien
- OWL looks like RDF, but
- Open world assumption confusing
- Reasoning sounds good though
- Biology → Knowledge Engineering
- Accessible ontology editors: Protégé-OWL / Swoop / OBO-Edit
- Best practice and methodology?
CS
Logic
KE
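Why the open world assumption confuses someone used to databases can be shown with a small Python sketch. It is a deliberate simplification: the fact triples and gene names are made up for illustration, and a real OWL reasoner derives answers from axioms rather than looking statements up in a set.

```python
def closed_world_ask(facts, statement):
    """Database-style query: anything not recorded is taken to be false."""
    return statement in facts


def open_world_ask(facts, negated_facts, statement):
    """Open-world query: True, False, or None when nothing is known."""
    if statement in facts:
        return True
    if statement in negated_facts:
        return False
    return None  # absence of a statement is not evidence that it is false


# Hypothetical example data.
facts = {("gene_A", "expressed_in", "liver")}
negated_facts = {("gene_A", "expressed_in", "muscle")}

question = ("gene_A", "expressed_in", "brain")
print(closed_world_ask(facts, question))               # False
print(open_world_ask(facts, negated_facts, question))  # None
```

The same missing statement gives "no" under the database view and "unknown" under the open world view, which is roughly the surprise a biologist familiar with DB / SQL meets when first using a reasoner.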
27Where's the knowledge?
- Not all ontologies aim to capture knowledge
- Structured controlled vocabulary
- Framework for understanding
- I.e. human interpretation step still required
- Computer is in the dark!
- Reasoners require axioms
- Knowledge needs to be made explicit
- In this sense it exists outside of the person
- Knowledge engineering should provide consistency
- Methodologies need work
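The statement that reasoners require axioms can be made concrete with a toy classifier in Python, reusing the pizza domain from earlier in the talk. This is a sketch under strong assumptions: every class is treated as fully defined by a set of conditions, the condition names are invented, and subset testing stands in for description logic reasoning. It is not how an OWL reasoner is implemented.

```python
# Each class maps to the set of conditions that define it (hypothetical names).
definitions = {
    "Pizza": {"has_base"},
    "CheesyPizza": {"has_base", "has_cheese_topping"},
    "Margherita": {"has_base", "has_cheese_topping", "has_tomato_topping"},
    "MysteryPizza": set(),  # a bare term: a label with no axioms
}


def infer_subclasses(definitions):
    """Infer (subclass, superclass) pairs by comparing defining conditions.

    A class is placed under another when it satisfies all of the other's
    conditions. Classes without conditions cannot take part in any inference.
    """
    inferred = set()
    for sub, sub_conditions in definitions.items():
        for sup, sup_conditions in definitions.items():
            if sub == sup or not sup_conditions:
                continue
            if sup_conditions <= sub_conditions:
                inferred.add((sub, sup))
    return inferred


inferred = infer_subclasses(definitions)
for sub, sup in sorted(inferred):
    print(f"{sub} is inferred to be a kind of {sup}")
```

Nobody asserted that a Margherita is a CheesyPizza; the relationship falls out of the explicit conditions. `MysteryPizza` appears in no inferred pair, because for the computer there is nothing to reason with.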
28Acknowledgements
- Robert Stevens
- Uli Sattler
- Katy Wolstencroft
- Georgina Moulton
- Matthew Horridge
- ComparaGRID BBSRC
- Shakespeare Carole Goble
29Discussion
Biology
Computer Science
Formal Logics
Knowledge Engineering
30ComparaGRID usecase
Chicken
Human
Pig
QTL Map
Expression Analysis
31Montagues and Capulets II
32(Semantic Web)
Biology
"The Semantic Web is an extension of the current Web, in which information is given well-defined meaning" (C. Goble and C. Wroe, 2004)
Formal Logics
Computer Science
Knowledge Engineering