Title: Stuart Aitken
1A Process Ontology for Cell Biology
- Stuart Aitken
- Artificial Intelligence Applications Institute
2Outline
- Rapid Knowledge Formation (RKF) Project
- RKF Project goals and domain
- The Cyc knowledge-based system
- RKF Tools
- Process Ontology
- General approach
- Formalisation
- Example
3Rapid Knowledge Formation
- The RKF project aims to develop tools which will
allow domain experts to enter knowledge directly
into the KBS.
- DARPA-funded, two teams: CYCORP and SRI
- Organised around Challenge Problems: Cell
Biology
4RKF
- Aim: To enable biologists to construct an
ontology/KB from a textbook source
[Figure: Alberts et al., Essential Cell Biology, 1998 -> formalise -> Ontology]
5Rapid Knowledge Formation
- Key techniques
- The KBS has knowledge of the KA process
- Knowledge of salience
- Knowledge of the requirements of an adequate
formalisation
- There is a dialogue between expert and system,
which clarifies the concept being defined.
6Rapid Knowledge Formation
- Evaluation
- After a period of tool development, trials are
organised.
- Both expert performance and KE performance are
measured, and assessed independently.
- The evaluation is extensive, running over a period
of 2 weeks.
7The Cyc KBS
- Cyc (Doug Lenat) is a knowledge-based system,
under development since 1984, aiming to
represent common sense knowledge.
- Cyc uses a large upper-level ontology
- Uses a logical language based on first-order logic
8The Cyc KBS
- Concepts in the Upper Ontology
- Thing, Agent, Event
- TangibleThing, InformationBearingObject
- ... Dog, Book
- subclass (genls), instance-of (isa)
- parts, subevent, role predicates
- 1600 concepts in total in the public release
(1998), a small fraction of Cyc
- Classification
- Stuff-like vs Object-like
- Individual vs Set
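The genls (subclass) and isa (instance-of) relations listed on this slide can be illustrated with a minimal Python sketch. It is not Cyc's inference engine: the dictionary-based store, the helper names all_genls and isa, the individual Rex, and the specific links between concepts are all assumptions made for illustration.

```python
# Hypothetical toy taxonomy. The links below are assumed for illustration
# and are not copied from the Cyc knowledge base.
GENLS = {
    "Agent": {"Thing"},
    "Event": {"Thing"},
    "TangibleThing": {"Thing"},
    "InformationBearingObject": {"Thing"},
    "Dog": {"TangibleThing"},
    "Book": {"TangibleThing", "InformationBearingObject"},
}
ISA = {"Rex": {"Dog"}}  # Rex is a made-up individual


def all_genls(collection):
    """Return the collection plus every superclass reachable via genls."""
    seen = set()
    stack = [collection]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(GENLS.get(current, ()))
    return seen


def isa(individual, collection):
    """True if the individual is an instance of the collection,
    either directly or through the genls hierarchy."""
    return any(collection in all_genls(direct)
               for direct in ISA.get(individual, ()))
```

In this sketch, isa("Rex", "Thing") holds because Dog is linked to TangibleThing and TangibleThing to Thing, which is the kind of inheritance an upper-level ontology is meant to provide.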
9The Cyc KBS
- The upper ontology supports application
development
[Figure: ontology layers below Thing: Upper-level, Intermediate-level, Application-level]
10The Cyc KBS
- Cyc includes
- An inference engine,
- GUI,
- tools for ontology development.
- Until the RKF project, ontology development was
carried out by trained knowledge engineers,
working with domain experts.
11RKF
- New tools in Cyc
- Define a new concept, and place it correctly in
the ontology
- Refine a concept definition
- Define a new predicate
- Assert a new fact
- Define a new rule
- State an analogy
- Construct a new process
12RKF
- User interaction
- Selection of items in the interface
- Choice determined intelligently: the KBS has
knowledge of salience and of the KA process;
this knowledge must be authored
- Browsing of the ontology
- Search
- Natural language dialogue
13Process Models
[Figure: process model of RNA Transcription with BindsTogether and Move events]
14Process Descriptor
- Q: Name the process
- A: RNA Transcription
- Q: Select the type of Process that describes the
category best
- event localised
- creation or destruction event
- say this _ _ _ _ _ _
- Q: Define
- affected object _ _ _ _ _
- location _ _ _ _ _
- actor _ _ _ _ _
15Process Models
- Describing Processes
- Complex expressions at the instance level
- Simpler to describe in terms of types
Upper-level: subevent(Event,Event), doneBy(Event,Agent)
Intermediate-level: ?
Application-level:
ForAll ?E ?F ?G
implies(
(subevent(?E,?G) and isa(?E,BindsTogether) and
subevent(?F,?G) and isa(?F,Move)),
before(startOf(?E),startOf(?F)))
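To make the instance-level rule on this slide concrete, here is a small Python sketch that checks it against a set of event instances. The function name ordering_violations, the instance names (bind1, move1, t1) and the numeric start times are hypothetical; the sketch only mirrors the logical form of the rule and says nothing about how Cyc evaluates it.

```python
from itertools import product


def ordering_violations(isa, subevent, start):
    """Return (e, f, g) triples that break the rule: e and f are both
    subevents of g, e is a BindsTogether, f is a Move, yet the start
    of e is not before the start of f."""
    bad = []
    for (e, g), (f, g2) in product(subevent, repeat=2):
        if (g == g2
                and isa.get(e) == "BindsTogether"
                and isa.get(f) == "Move"
                and not start[e] < start[f]):
            bad.append((e, f, g))
    return bad


# Made-up instances: one transcription event with two subevents.
isa = {"bind1": "BindsTogether", "move1": "Move", "t1": "RNATranscription"}
subevent = [("bind1", "t1"), ("move1", "t1")]

ok = ordering_violations(isa, subevent, {"bind1": 0, "move1": 5})
broken = ordering_violations(isa, subevent, {"bind1": 3, "move1": 3})
```

Every pair of subevents has to be enumerated and typed before the ordering can be checked, which is one way to see why the slide calls instance-level expressions complex.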
16Script Vocabulary
- The Script theory defines the semantics of
Type-Level assertions
(typePlaysRoleInScene RNATranscription
DNAMolecule BindsTogether
objectActedOn)
- Requires rules for identity
- Can require complex reasoning
- Good for user input
- Can be extended to cover pre and postconditions
of actions
17Scripts
[Figure: an RNA Transcription script t with subevents e (BindsTogether) and f (Move), linked by startsAfterStartingOfInScript]
For all subevents f of t, of type Move, and all
subevents e of t, of type BindsTogether,
(startsAfterStartingOf f e), where t is of type
RNATranscription
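The reading of the type-level ordering given on this slide can be sketched as an expansion into instance-level assertions. This is an illustrative Python sketch, not the Script theory's actual rules: the function expand_ordering, its argument order, and the instance names are assumptions.

```python
def expand_ordering(script_type, later_type, earlier_type, isa, subevents):
    """Expand one type-level ordering statement into instance-level
    (startsAfterStartingOf f e) tuples, for every script instance t of
    script_type and every pair of its subevents with the given types."""
    assertions = []
    for t, parts in subevents.items():
        if isa.get(t) != script_type:
            continue
        for f in parts:
            for e in parts:
                if isa.get(f) == later_type and isa.get(e) == earlier_type:
                    assertions.append(("startsAfterStartingOf", f, e))
    return assertions


# Made-up instances: one transcription with one binding and two moves.
isa = {"t1": "RNATranscription", "e1": "BindsTogether",
       "f1": "Move", "f2": "Move"}
subevents = {"t1": ["e1", "f1", "f2"]}

expanded = expand_ordering("RNATranscription", "Move", "BindsTogether",
                           isa, subevents)
```

A single type-level statement yields one instance-level assertion per matching pair, so the user states the ordering once while the system derives the rest.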
18Scripts
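[Figure: at the type level, BindsTogether is linked to Nucleotide by objectActedOn; at the instance level, event e is linked to the set N of Nucleotide instances]
For some n in N, (objectActedOn e n)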
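The existential reading on this slide, "for some n in N", can be sketched in a few lines of Python. The function role_satisfied, the fact encoding as (role, event, object) tuples, and the instance names are hypothetical and chosen only to illustrate the semantics.

```python
def role_satisfied(event, object_type, role, isa, facts):
    """Existential reading of a type-level role assertion: true if some
    instance n of object_type stands in (role event n)."""
    return any((role, event, n) in facts
               for n, n_type in isa.items()
               if n_type == object_type)


# Made-up instances and one instance-level fact.
isa = {"n1": "Nucleotide", "n2": "Nucleotide", "r1": "Ribonucleotide"}
facts = {("objectActedOn", "e1", "n2")}

nucleotide_ok = role_satisfied("e1", "Nucleotide", "objectActedOn", isa, facts)
ribonucleotide_ok = role_satisfied("e1", "Ribonucleotide", "objectActedOn", isa, facts)
```

Only one member of N has to fill the role, so the check succeeds for Nucleotide here even though n1 plays no part in the event.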
19New Script Vocabulary
(preconditionOfScene-negated BindsTogether
touchingDirectly <Ribonucleotide Nucleotide>)
(postconditionOfScene BindsTogether
connectedTo <Ribonucleotide Nucleotide>)
[Figure: BindsTogether, with N and R not touchingDirectly before the event and connectedTo after it]
20New Script Vocabulary
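[Figure: at the type level, BindsTogether has roles filled by Ribonucleotide and Nucleotide; at the instance level, event e relates the sets of instances R and N; an identity link connects the precondition and the postcondition]
Precondition: Some ?n in N, some ?r in R (not (touchingDirectly ?n ?r))
Postcondition: Some ?n in N, some ?r in R (connectedTo ?n ?r)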
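The pre- and postcondition readings on this slide can be checked together in a short Python sketch. It assumes that the "identity" label means the same pair (n, r) must satisfy both conditions, which is an interpretation rather than something the slide states outright; the function name binding_pairs and the fact encoding are also hypothetical.

```python
from itertools import product


def binding_pairs(nucleotides, ribonucleotides, before, after):
    """Return the pairs (n, r) that are not touchingDirectly in the
    state before the event and are connectedTo in the state after it.
    Requiring the same pair in both states is an assumed reading of
    the identity link."""
    return [
        (n, r)
        for n, r in product(nucleotides, ribonucleotides)
        if ("touchingDirectly", n, r) not in before
        and ("connectedTo", n, r) in after
    ]


# Made-up world states before and after one BindsTogether event.
N = ["n1", "n2"]
R = ["r1"]
after = {("connectedTo", "n1", "r1")}

bound = binding_pairs(N, R, before=set(), after=after)
blocked = binding_pairs(N, R, before={("touchingDirectly", "n1", "r1")}, after=after)
```

Without the identity requirement the two existential statements could be satisfied by unrelated pairs, which would not describe a binding event.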
21Script Vocabulary
- The Script vocabulary forms an intermediate
level, which lies behind the Process descriptor
GUI (i.e. the textboxes)
- Not, in itself, a taxonomy of processes, but
allows processes to be described in detail.
- Defining the subclass relation is just one task.
22Vaccinia Virus Life Cycle
- The vaccinia virus life cycle was selected as an
example of a complex model to formalise as a set
of Scripts.
- The model includes actions, decomposition,
ordering, objects-playing-roles and
pre/postconditions
- It is a good test for the Script vocabulary
23Vaccinia Virus Life Cycle
[Figure: three views of the vaccinia virus life cycle script]
Temporal
mRNATranscription-Early
ViralGeneTranslation-Early
MovementOfProtein
Participants
mRNATranscription-Early
Outputs: messengerRNA
ViralGeneTranslation-Early
Inputs: messengerRNA
MovementOfProtein
Conditions
mRNATranscription-Early
Pre: spatiallySubsumes Cell VirusCore
ViralGeneTranslation-Early
Post: spatiallySubsumes CellCytoplasm Vitf2
MovementOfProtein
24Evaluation
- 8 biologists were selected, and trained in the
tools, 4 per team
- The knowledge to be formalised was selected
(chapter 7 in Alberts)
- The knowledge base was allowed to contain
pump-priming knowledge
- The biologists entered knowledge using the
tools, then tested it against a set of questions
- Ontology/KB was revised
25Evaluation
- Results (outline)
- A huge amount of data was collected, but analysis
is complex (IET Inc)
- Domain experts were able to develop ontologies
after light training
- Knowledge engineers out-perform domain experts in
ontology construction
26Summary
- Power Tools for ontology development are being
implemented and tested in the RKF project.
- A Script/Process vocabulary has been developed
and applied to processes in cell biology,
covering
- Temporal order
- Participants
- Pre/postconditions
- Repetition