Title: Ontologies for biological annotation
2. GO terms implicitly refer to other terms
- cysteine biosynthesis
- myoblast fusion
- hydrogen ion transporter activity
- snoRNA catabolism
- wing disc pattern formation
- epidermal cell differentiation
- regulation of flower development
- interleukin-18 receptor complex
- B-cell differentiation
- dorsal ectoderm
4. biosynthesis is_a metabolism
5. cysteine is_a serine family amino acid is_a amino acid is_a amine
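Because is_a is transitive, every link in such a chain is implied for each term below it, which is why a composed term like "cysteine biosynthesis" silently depends on the whole chemical hierarchy. A minimal Python sketch of that closure, assuming a hypothetical in-memory dictionary (`IS_A`) that holds the chains from slides 4 and 5 rather than real GO or ChEBI files:

```python
# Hypothetical toy is_a graph built from the chains on slides 4 and 5.
IS_A = {
    "cysteine": ["serine family amino acid"],
    "serine family amino acid": ["amino acid"],
    "amino acid": ["amine"],
    "biosynthesis": ["metabolism"],
}


def ancestors(term: str, is_a: dict[str, list[str]]) -> set[str]:
    """Return every term reachable from `term` by following is_a links."""
    seen: set[str] = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


print(sorted(ancestors("cysteine", IS_A)))
# ['amine', 'amino acid', 'serine family amino acid']
```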
6. cysteine is_a serine family amino acid is_a amino acid is_a serine
7. Composed terms currently cause problems
- No link to external ontology terms
- Redundancy
- Inconsistency
- Extra work
- Annotation bottleneck
- Tangled DAGs and confusing displays
- we have no way to disentangle
- Solution so far
- fix errors based on results of term name parsing (Obol)
- reactive, not proactive
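Obol itself is a grammar-based parser; the toy Python sketch below only illustrates the idea of recovering a genus and a differentia from a term name, so that the result can be compared with the term's actual DAG placement. The pattern table and the relation names (`outputs`, `input`) are assumptions made for illustration, not Obol's grammar.

```python
import re

# Hypothetical pattern table: (regex over the term name, genus, relation for the captured part).
PATTERNS = [
    (re.compile(r"^(?P<x>.+) biosynthesis$"), "biosynthesis", "outputs"),
    (re.compile(r"^(?P<x>.+) catabolism$"), "catabolism", "input"),
]


def parse_name(name: str):
    """Return (genus, relation, filler) if a pattern matches the name, otherwise None."""
    for pattern, genus, relation in PATTERNS:
        match = pattern.match(name)
        if match:
            return genus, relation, match.group("x")
    return None


for name in ["cysteine biosynthesis", "snoRNA catabolism", "myoblast fusion"]:
    print(name, "->", parse_name(name))
```

Names that match no pattern (here "myoblast fusion") simply yield None, which is one reason this approach stays reactive: it only checks terms after someone has already named and placed them.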
8. Solution: actively manage composed terms
- Explicit pre-coordination
- Composed terms should now/soon be coordinated using the oboedit plugin
- building block terms are recorded in the ontology along with the composite term
- Benefits
- correct DAG structure can be inferred from external ontologies
- e.g. make sure GO and CHEBI align
- placement consistency checking is automated
- additional work can be automated
- synonyms, text definitions
9. How will terms be pre-coordinated by oboedit?
- How do we record a definition for a composite term?
- using a logical definition (computational essence)
- A logical definition consists of
- a generic term (aka genus)
- relationships to other terms which serve to discriminate this specific term from the other is_a children of the generic term (aka differentiae)
- Can be written in natural language as
- A <generic term> which <discriminating characteristics>
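A logical definition is just a genus plus a list of (relation, filler) differentiae, and the natural-language form can be generated from it mechanically. A minimal Python sketch, assuming a hypothetical `LogicalDefinition` class (not OBO-Edit's internal model):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalDefinition:
    """Genus plus differentiae; a hypothetical structure for illustration only."""

    genus: str
    # Each differentia is a (relation, filler) pair, e.g. ("outputs", "cysteine").
    differentiae: tuple[tuple[str, str], ...] = ()

    def to_natural_language(self) -> str:
        """Render 'a <generic term> which <discriminating characteristics>'."""
        if not self.differentiae:
            return f"a {self.genus}"
        clauses = " and ".join(f"{rel} {filler}" for rel, filler in self.differentiae)
        return f"a {self.genus} which {clauses}"


cysteine_biosynthesis = LogicalDefinition("biosynthesis process", (("outputs", "cysteine"),))
print(cysteine_biosynthesis.to_natural_language())
# a biosynthesis process which outputs cysteine
```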
10. Example of pre-coordination
- cysteine biosynthesis
- generic term
- biosynthesis
- discriminating characteristics
- outputs cysteine
- natural language (Aristotelian style)
- a biosynthesis process which outputs cysteine
11. Example in Obo format
[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
is_a: GO:0009070 ! serine family amino acid biosynthesis
is_a: GO:0006534 ! cysteine metabolism
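Reading the genus and differentiae back out of such a stanza is mostly string handling. A minimal Python sketch, assuming a well-formed stanza in which a bare ID after `intersection_of:` is the genus and a "relation ID" pair is a differentia; a real tool would use a full OBO parser instead of this hypothetical `parse_intersections` helper:

```python
STANZA = """\
[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
is_a: GO:0009070 ! serine family amino acid biosynthesis
is_a: GO:0006534 ! cysteine metabolism
"""


def parse_intersections(stanza: str) -> tuple[str, list[tuple[str, str]]]:
    """Split the intersection_of tags of one OBO stanza into genus and differentiae."""
    genus = None
    differentiae = []
    for line in stanza.splitlines():
        tag, sep, value = line.partition(":")
        if tag.strip() != "intersection_of" or not sep:
            continue  # ignore id, name, is_a and the [Term] header
        value = value.split("!", 1)[0].strip()  # drop the trailing "! label" comment
        parts = value.split()
        if len(parts) == 1:
            genus = parts[0]  # a bare ID is the genus
        elif len(parts) == 2:
            differentiae.append((parts[0], parts[1]))  # relation followed by filler ID
        else:
            raise ValueError(f"unexpected intersection_of value: {value!r}")
    if genus is None:
        raise ValueError("stanza has no genus")
    return genus, differentiae


print(parse_intersections(STANZA))
```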
12. Alternate syntax
GO:cysteine_biosynthesis = GO:biosynthesis^outputs(CHEBI:cysteine)
- used in pheno-syntax
- more compact
- similar to OWL abstract syntax
- I use Obo 1.2 format or natural language in the rest of this presentation
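The compact form carries exactly the same information as the intersection_of lines, so converting between the two is straightforward. A minimal Python sketch for flat (non-nested) expressions, assuming the `genus^relation(filler)` notation shown above; `to_compact` and `from_compact` are hypothetical helpers:

```python
import re

# One flat differentia: ^relation(filler), with no nested parentheses.
_DIFF = re.compile(r"\^(\w+)\(([^()]+)\)")


def to_compact(genus: str, differentiae: list[tuple[str, str]]) -> str:
    """Render a genus-differentia definition in the compact genus^rel(filler) style."""
    return genus + "".join(f"^{rel}({filler})" for rel, filler in differentiae)


def from_compact(expr: str) -> tuple[str, list[tuple[str, str]]]:
    """Parse a flat compact expression back into genus and differentiae."""
    genus, caret, rest = expr.partition("^")
    tail = caret + rest
    if not re.fullmatch(r"(?:\^\w+\([^()]+\))*", tail):
        raise ValueError(f"not a flat genus^rel(filler) expression: {expr!r}")
    return genus, _DIFF.findall(tail)


expr = to_compact("GO:biosynthesis", [("outputs", "CHEBI:cysteine")])
print(expr)  # GO:biosynthesis^outputs(CHEBI:cysteine)
print(from_compact(expr))
```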
13. This allows us to dynamically untangle
- Process axis view (primary is_as, via generic term)
- biological_process
- metabolism
- biosynthesis
- cysteine biosynthesis
- Process participant axis view
- amine
- amino acid
- serine family amino acid
- cysteine
- Combined view
- (same as current tangled diamond lattice)
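Each axis view can be computed from the logical definition: the process axis follows the genus, the participant axis follows the filler of the differentia. A minimal Python sketch, assuming hypothetical single-parent hierarchies (`PROCESS_IS_A`, `CHEMICAL_IS_A`) copied from the two lists on this slide:

```python
# Hypothetical single-parent hierarchies taken from the two axes on this slide.
PROCESS_IS_A = {"biosynthesis": "metabolism", "metabolism": "biological_process"}
CHEMICAL_IS_A = {
    "cysteine": "serine family amino acid",
    "serine family amino acid": "amino acid",
    "amino acid": "amine",
}

# cysteine biosynthesis = biosynthesis^outputs(cysteine), stored as (genus, relation, filler).
DEFS = {"cysteine biosynthesis": ("biosynthesis", "outputs", "cysteine")}


def chain(term: str, parent_of: dict[str, str]) -> list[str]:
    """Follow single is_a parents upward from `term`, including `term` itself."""
    path = [term]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


def process_axis(term: str) -> list[str]:
    """Ancestry reached through the genus (the primary is_a)."""
    genus, _relation, _filler = DEFS[term]
    return [term] + chain(genus, PROCESS_IS_A)


def participant_axis(term: str) -> list[str]:
    """Ancestry reached through the participant named in the differentia."""
    _genus, _relation, filler = DEFS[term]
    return [term] + chain(filler, CHEMICAL_IS_A)


print(process_axis("cysteine biosynthesis"))
print(participant_axis("cysteine biosynthesis"))
```

Merging both paths gives back the combined (tangled) view, but each axis can now be shown on its own.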
14. Obol demo
- http://yuri.lbl.gov/amigo/obol
15. Recording the relationship is important
- Why not just a simple cross-product?
- e.g. biosynthesis x cysteine
- Relationships are important for reasoning and querying
- Consider
- cysteine biosynthesis from serine
- mRNA export from nucleus during heat stress
- Without the relations, the logical definition is not specific enough
- the essence is not captured
- Relations should come from RO
- more relations are required
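The loss of information is easy to demonstrate: once relations are dropped, two different definitions can reduce to the same bag of terms. A minimal Python sketch; the second definition is an invented reversed term used only to make the point, and the relation names follow slide 16:

```python
# Hypothetical definitions: a genus plus a set of (relation, filler) differentiae.
# SER_FROM_CYS is an invented reversed term, used only for contrast.
CYS_FROM_SER = ("biosynthesis", frozenset({("outputs", "cysteine"), ("input", "serine")}))
SER_FROM_CYS = ("biosynthesis", frozenset({("outputs", "serine"), ("input", "cysteine")}))


def plain_cross_product(definition):
    """Drop the relations and keep only the genus and the bare filler terms."""
    genus, differentiae = definition
    return frozenset({genus} | {filler for _relation, filler in differentiae})


# Without relations the two definitions collapse into the same set of terms...
print(plain_cross_product(CYS_FROM_SER) == plain_cross_product(SER_FROM_CYS))  # True
# ...but with relations recorded they remain distinct.
print(CYS_FROM_SER == SER_FROM_CYS)  # False
```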
16. Multiple discriminating characteristics are allowed
- Cysteine biosynthesis from serine
- Generic term
- biosynthesis
- Discriminating characteristics
- output cysteine
- input serine
[Term]
name: cysteine biosynthesis from serine
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
intersection_of: input CHEBI:17822 ! serine
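The intersection_of lines are read conjunctively, so adding a differentia can only narrow a class. A deliberately naive Python sketch of that consequence, assuming a hypothetical `subsumes` check that only compares genus and differentiae structurally (a real reasoner would also follow is_a links on the genus and the fillers):

```python
# Hypothetical definitions as (genus, differentiae), differentiae being (relation, filler) pairs.
CYSTEINE_BIOSYNTHESIS = ("GO:0009058", {("outputs", "CHEBI:15356")})
CYSTEINE_FROM_SERINE = ("GO:0009058", {("outputs", "CHEBI:15356"), ("input", "CHEBI:17822")})


def subsumes(general, specific) -> bool:
    """Same genus, and every differentia of `general` also holds for `specific`.

    Structural check only: it does not follow is_a links on the genus or fillers.
    """
    general_genus, general_diffs = general
    specific_genus, specific_diffs = specific
    return general_genus == specific_genus and general_diffs <= specific_diffs


print(subsumes(CYSTEINE_BIOSYNTHESIS, CYSTEINE_FROM_SERINE))  # True
print(subsumes(CYSTEINE_FROM_SERINE, CYSTEINE_BIOSYNTHESIS))  # False
```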
17. Composite terms can be nested
[Term]
id: GO:xxxxxxx
name: regulation of cysteine biosynthesis
intersection_of: GO:0050789 ! regulation of biological process
intersection_of: regulates GO:0019344 ! cysteine biosynthesis

[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine

- regulation^regulates(biosynthesis^outputs(cysteine)) (YES)
- regulation^regulates(biosynthesis)^outputs(cysteine) (NO)
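The YES/NO contrast is purely a matter of bracketing: in the correct reading the filler of `regulates` is itself a composite, while the wrong reading attaches `outputs(cysteine)` to the regulation process. A minimal recursive-descent parser in Python makes the two trees explicit; the `Expr` class and the grammar `expr := name ('^' relation '(' expr ')')*` are assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """A genus with zero or more (relation, nested Expr) differentiae."""

    genus: str
    diffs: tuple = ()


def _name(s: str, pos: int) -> tuple[str, int]:
    """Read a term or relation name up to the next '^', '(' or ')'."""
    start = pos
    while pos < len(s) and s[pos] not in "^()":
        pos += 1
    if pos == start:
        raise ValueError(f"expected a name at position {start}")
    return s[start:pos], pos


def _expr(s: str, pos: int) -> tuple[Expr, int]:
    genus, pos = _name(s, pos)
    diffs = []
    while pos < len(s) and s[pos] == "^":
        relation, pos = _name(s, pos + 1)
        if pos >= len(s) or s[pos] != "(":
            raise ValueError(f"expected '(' at position {pos}")
        filler, pos = _expr(s, pos + 1)  # the filler may itself be a composite
        if pos >= len(s) or s[pos] != ")":
            raise ValueError(f"expected ')' at position {pos}")
        diffs.append((relation, filler))
        pos += 1
    return Expr(genus, tuple(diffs)), pos


def parse(s: str) -> Expr:
    expr, pos = _expr(s, 0)
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return expr


YES = parse("regulation^regulates(biosynthesis^outputs(cysteine))")
NO = parse("regulation^regulates(biosynthesis)^outputs(cysteine)")
print(YES)
print(NO)
```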
18. Composite terms can optionally be manufactured in bulk
- Generic term: metabolism, biosynthesis
- Differentia: has_output serine, cysteine, ...
- With caution
- Sparse vs dense matrices
- not all combinations are types
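Bulk manufacture is a cross product of genera and fillers; the caution about sparse versus dense matrices amounts to filtering that product. A minimal Python sketch, assuming a hypothetical curator-approved list (`APPROVED`) that decides which combinations are real types:

```python
from itertools import product

GENERA = ["metabolism", "biosynthesis"]
FILLERS = ["serine", "cysteine"]


def manufacture(genera, relation, fillers, keep=lambda genus, filler: True):
    """Cross every genus with every filler, keeping only combinations `keep` accepts."""
    return [
        (f"{filler} {genus}", genus, relation, filler)
        for genus, filler in product(genera, fillers)
        if keep(genus, filler)
    ]


# Dense: every combination becomes a term.
dense = manufacture(GENERA, "has_output", FILLERS)
# Sparse: a hypothetical approved list, since not every combination is a type.
APPROVED = {("biosynthesis", "cysteine"), ("biosynthesis", "serine")}
sparse = manufacture(GENERA, "has_output", FILLERS, keep=lambda g, f: (g, f) in APPROVED)

print(len(dense), [name for name, *_ in sparse])
```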
19. On the importance of necessary and sufficient conditions
- Why intersection_of?
- Why not just make normal links in the GO DAG?
- normal relationships are for necessary conditions only
- we want both necessary and sufficient conditions
- captures the essence of the term
20. Normal DAG links only capture necessary conditions, not essence
- macrophage activation is_a immune cell activation
- macrophage activation part_of inflammatory response
- text def: "A change in morphology and behavior of a macrophage resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor"
21. Indistinguishable by DAG
- monocyte activation is_a immune cell activation
- monocyte activation part_of inflammatory response
- text def: "A change in morphology and behavior of a monocyte resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor"
22. essence captured by genus-differentia
- macrophage activation is_a immune cell activation
- macrophage activation part_of inflammatory response
id: GO:macrophage_activation
intersection_of: GO:cell_activation
intersection_of: activates CL:macrophage
23. essence captured by genus-differentia
- immune cell activation is_a cell activation
- macrophage activation is_a immune cell activation
- macrophage activation part_of inflammatory response
- genus of macrophage activation: cell activation
- macrophage activation activates CL:macrophage
id: GO:macrophage_activation
intersection_of: GO:cell_activation
intersection_of: activates CL:macrophage
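Because intersection_of states necessary and sufficient conditions, the is_a link from macrophage activation to immune cell activation can be inferred rather than asserted, and macrophage and monocyte activation become distinguishable. A minimal Python sketch, assuming a toy CL hierarchy (`CELL_IS_A`) and an assumed definition of "immune cell activation" that the slides do not give; real OWL reasoners do far more than this:

```python
# Hypothetical toy CL hierarchy (single parents); an assumption for this sketch.
CELL_IS_A = {"macrophage": "immune cell", "monocyte": "immune cell", "immune cell": "cell"}

# Genus-differentia definitions as (genus, relation, filler).
# The definition of "immune cell activation" is assumed, not taken from the slides.
DEFS = {
    "immune cell activation": ("cell activation", "activates", "immune cell"),
    "macrophage activation": ("cell activation", "activates", "macrophage"),
    "monocyte activation": ("cell activation", "activates", "monocyte"),
}


def reaches(term: str, target: str, parent_of: dict[str, str]) -> bool:
    """True if `target` is `term` or one of its is_a ancestors."""
    while True:
        if term == target:
            return True
        if term not in parent_of:
            return False
        term = parent_of[term]


def infer_is_a(defs, filler_parent):
    """Infer A is_a B when genus and relation match and A's filler is_a* B's filler.

    This is sound only because the definitions are necessary AND sufficient.
    Simplification: the genus must be identical (no is_a reasoning over genera).
    """
    inferred = set()
    for a, (genus_a, rel_a, filler_a) in defs.items():
        for b, (genus_b, rel_b, filler_b) in defs.items():
            if a != b and genus_a == genus_b and rel_a == rel_b and reaches(filler_a, filler_b, filler_parent):
                inferred.add((a, b))
    return inferred


print(sorted(infer_is_a(DEFS, CELL_IS_A)))
```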
24. Current status of pre-coordinated terms
- SO already contains composite terms
- 46 pre-coordinated terms
- A silenced gene is a gene which has the quality of being silenced
- GO-BP/CL integration underway
- retrospectively pre-coordinated terms
- Obol page has pre-coordinated terms from automatic parsing
- http://www.fruitfly.org/cjm/obol
25. Pre- vs post-coordinated
- Pre-coordination
- terms are in the ontology with IDs and computable definitions
- increases complexity of the ontology
- complexity can be managed by tools
- e.g. new oboedit features
- Post-coordination
- terms are combined in the database
- forces more complexity in the database schema and database applications
26. Pre-coordination is useful in moderation
- Commonly used terms should be pre-coordinated
- e.g. cysteine biosynthesis, oocyte differentiation, pectoral fin
- Avoid taking it to extremes
- cf. ICD-9
- Where do we draw the line?
- ontologies should be built around one or a few axes of classification
- term explosion typically becomes severe when multiple axes are combined
- we can change our minds later
- pre- and post-coordination are commensurable
27. Commensurability
- Annotator annotates to
- nucleus^part_of(astrocyte)
- Anatomy editor creates new term
- uses oboedit cross-product plugin
- astrocyte_nucleus = nucleus^part_of(astrocyte)
- Annotation can be dynamically promoted to the new term when answering queries
- various software techniques exist for achieving this
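One simple way to promote annotations is to key every pre-coordinated term by a canonical form of its logical definition and look post-coordinated annotations up in that index at query time. A minimal Python sketch; the gene names, the index `DEFINED`, and the helpers are hypothetical:

```python
def canonical(genus: str, differentiae) -> tuple:
    """Order-independent key for a genus^rel(filler) expression."""
    return (genus, tuple(sorted(differentiae)))


# Hypothetical pre-coordinated term created by the anatomy editor.
DEFINED = {canonical("nucleus", [("part_of", "astrocyte")]): "astrocyte_nucleus"}

# Hypothetical annotations: geneA uses the new term, geneB/geneC were post-coordinated.
ANNOTATIONS = [
    ("geneA", "astrocyte_nucleus"),
    ("geneB", ("nucleus", [("part_of", "astrocyte")])),
    ("geneC", ("nucleus", [("part_of", "neuron")])),
]


def promote(target):
    """Map a post-coordinated (genus, differentiae) target to a defined term if one exists."""
    if isinstance(target, tuple):
        return DEFINED.get(canonical(*target), target)
    return target


def genes_annotated_to(term: str) -> list[str]:
    """Answer a query using both direct and promoted annotations."""
    return [gene for gene, target in ANNOTATIONS if promote(target) == term]


print(genes_annotated_to("astrocyte_nucleus"))  # ['geneA', 'geneB']
```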
28. Post-coordination in GO annotations
- Pre- and post-coordination are compatible and commensurable
- We should extend the annotation format to allow denoting more specific classes
- e.g. cholesterol transport in liver
- advanced applications can query this
- standard applications suffer no loss
- extended annotations can be used to help seed new terms in the ontology
- This is already being done (MGI, Dicty)
- we just want to capture this in an interoperable way
29. Post-composition in gene association files
- New column in GA file format
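The slide does not specify the column's syntax. A minimal Python sketch, assuming a hypothetical column of comma-separated `relation(filler)` pairs and a hypothetical relation name `occurs_in`, shows how a reader could turn a row's GO term plus that column into a post-coordinated expression:

```python
import re

# Hypothetical extension syntax: comma-separated relation(filler) pairs.
PAIR = re.compile(r"(\w+)\(([^()]+)\)")


def parse_extension(column: str) -> list[tuple[str, str]]:
    """Split a value such as 'occurs_in(liver)' into (relation, filler) pairs."""
    pairs = []
    for item in filter(None, (part.strip() for part in column.split(","))):
        match = PAIR.fullmatch(item)
        if match is None:
            raise ValueError(f"malformed extension: {item!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


def post_coordinated(go_term: str, column: str) -> str:
    """Combine the annotated GO term with its extension into genus^rel(filler) form."""
    return go_term + "".join(f"^{rel}({filler})" for rel, filler in parse_extension(column))


print(post_coordinated("cholesterol transport", "occurs_in(liver)"))
# cholesterol transport^occurs_in(liver)
```

An empty column leaves the plain GO term, which is how standard applications would continue to see the annotation.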
30. Database issues
- Chado and GO DB can handle pre- and post-coordination
- in theory anyway
- not yet fully tested
- How does it work?
- an anonymous term is created for each coordinated term
- documentation in chado CVS
- chado/modules/cv/doc/