Ontologies for biological annotation

Provided by: chris1008
1
(No Transcript)
2
GO terms implicitly refer to other terms
  • cysteine biosynthesis
  • myoblast fusion
  • hydrogen ion transporter activity
  • snoRNA catabolism
  • wing disc pattern formation
  • epidermal cell differentiation
  • regulation of flower development
  • interleukin-18 receptor complex
  • B-cell differentiation
  • dorsal ectoderm

3
(No Transcript)
4
biosynthesis is_a metabolism
5
cysteine is_a serine family amino acid is_a amino
acid is_a amine
6
cysteine is_a serine family amino acid is_a amino
acid is_a serine
7
Composed terms currently cause problems
  • No link to external ontology term
  • Redundancy
  • Inconsistency
  • Extra work
  • Annotation bottleneck
  • Tangled DAGs and confusing displays
  • we have no way to disentangle
  • Solution so far
  • fix errors based on results of term name parsing
    (Obol)
  • reactive, not proactive

8
Solution: actively manage composed terms
  • Explicit pre-coordination
  • Composed terms should now/soon be coordinated
    using oboedit plugin
  • building block terms are recorded in ontology
    along with composite term
  • Benefits
  • Correct DAG structure can be inferred from
    external ontologies
  • e.g. make sure GO and CHEBI align
  • placement consistency checking automated
  • additional work can be automated
  • synonyms, text definitions

9
How will terms be pre-coordinated by oboedit?
  • How do we record a definition for a composite
    term?
  • using a logical definition (computational
    essence)
  • A logical definition consists of
  • a generic term (aka genus)
  • relationships to other terms which serve to
    discriminate this specific term from other is_a
    children of the generic term (aka differentiae)
  • Can be written in natural language as
  • A <generic term> which <discriminating
    characteristics>

10
Example of pre-coordination
  • cysteine biosynthesis
  • generic term
  • biosynthesis
  • discriminating characteristics
  • outputs cysteine
  • natural language (Aristotelian style)
  • a biosynthesis process which outputs cysteine
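
The genus-differentia pattern above can be sketched as a small data structure. This is a minimal illustration, not the oboedit implementation: the class names `Differentia` and `LogicalDefinition` and the `to_text` method are hypothetical, and the word "process" in the rendered sentence is hard-coded to match the slide's example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Differentia:
    """One discriminating characteristic: a relation plus the term it points to."""
    relation: str  # e.g. "outputs"
    filler: str    # e.g. "cysteine"


@dataclass(frozen=True)
class LogicalDefinition:
    """A generic term (genus) plus one or more differentiae."""
    genus: str
    differentiae: Tuple[Differentia, ...]

    def to_text(self) -> str:
        # Aristotelian style: "a <generic term> which <discriminating characteristics>"
        clauses = " and ".join(f"{d.relation} {d.filler}" for d in self.differentiae)
        return f"a {self.genus} process which {clauses}"


cysteine_biosynthesis = LogicalDefinition(
    genus="biosynthesis",
    differentiae=(Differentia("outputs", "cysteine"),),
)
print(cysteine_biosynthesis.to_text())
```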

11
Example in Obo format
[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
is_a: GO:0009070 ! serine family amino acid biosynthesis
is_a: GO:0006534 ! cysteine metabolism
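
A stanza like this one can be split into its genus and differentiae with a few lines of code. The sketch below assumes standard OBO 1.2 tag-value lines (`tag: value ! comment`) and embeds the slide's stanza with that punctuation. The function name `parse_stanza` and the returned dictionary layout are hypothetical, and it handles only the tags shown here.

```python
STANZA = """\
[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
is_a: GO:0009070 ! serine family amino acid biosynthesis
is_a: GO:0006534 ! cysteine metabolism
"""


def parse_stanza(text):
    """Parse one [Term] stanza into id, name, genus, differentiae and is_a parents."""
    term = {"id": None, "name": None, "genus": None, "differentiae": [], "is_a": []}
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop the trailing "! comment"
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, value = line.split(":", 1)
        value = value.strip()
        if tag == "intersection_of":
            parts = value.split()
            if len(parts) == 1:
                term["genus"] = parts[0]  # bare ID: the generic term
            else:
                term["differentiae"].append((parts[0], parts[1]))  # relation + ID
        elif tag == "is_a":
            term["is_a"].append(value)
        elif tag in ("id", "name"):
            term[tag] = value
    return term


parsed = parse_stanza(STANZA)
print(parsed["genus"], parsed["differentiae"])
```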
12
Alternate syntax
GO:cysteine_biosynthesis = GO:biosynthesis ^
outputs(CHEBI:cysteine)
  • used in pheno-syntax
  • more compact
  • similar to OWL abstract syntax
  • I use Obo1.2 format or natural language in the
    rest of this presentation

13
This allows us to dynamically untangle
  • Process axis view (primary is_as, via generic
    term)
      • biological_process
      • metabolism
      • biosynthesis
      • cysteine biosynthesis
  • Process participant axis view
      • amine
      • amino acid
      • serine family amino acid
      • cysteine
  • Combined view
      • (same as current tangled diamond lattice)
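
The two axis views can be derived from a logical definition by following is_a links separately in each ontology. A minimal sketch, assuming single-parent chains exactly as listed on the slide; the dictionaries and the function names `lineage`, `process_axis` and `participant_axis` are illustrative only.

```python
# is_a links within each ontology (child -> parent), taken from the slide
GO_IS_A = {"biosynthesis": "metabolism", "metabolism": "biological_process"}
CHEBI_IS_A = {
    "cysteine": "serine family amino acid",
    "serine family amino acid": "amino acid",
    "amino acid": "amine",
}

# logical definition: term -> (generic term, relation, participant)
LOGICAL_DEF = {"cysteine biosynthesis": ("biosynthesis", "outputs", "cysteine")}


def lineage(term, parent_of):
    """Walk child -> parent links and return the path root-first."""
    path = [term]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return list(reversed(path))


def process_axis(term):
    """Primary is_a path, reached via the generic term."""
    genus, _relation, _filler = LOGICAL_DEF[term]
    return lineage(genus, GO_IS_A) + [term]


def participant_axis(term):
    """is_a path of the participant named in the differentia."""
    _genus, _relation, filler = LOGICAL_DEF[term]
    return lineage(filler, CHEBI_IS_A)


print(process_axis("cysteine biosynthesis"))
print(participant_axis("cysteine biosynthesis"))
```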

14
Obol demo
  • http://yuri.lbl.gov/amigo/obol

15
Recording the relationship is important
  • Why not just a simple cross-product?
  • e.g. biosynthesis x cysteine
  • Relationships are important for reasoning and
    querying
  • Consider
  • cysteine biosynthesis from serine
  • mRNA export from nucleus during heat stress
  • Without the relations, the logical definition is
    not specific enough
  • the essence is not captured
  • Relations should come from RO
  • more required

16
Multiple discriminating characteristics are
allowed
  • Cysteine biosynthesis from serine
  • Generic term
  • biosynthesis
  • Discriminating characteristics
  • output cysteine
  • input serine

[Term]
name: cysteine biosynthesis from serine
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
intersection_of: input CHEBI:17822 ! serine
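
With several differentiae recorded, placement in the DAG can be checked mechanically. The sketch below is a deliberately simplified structural test, not a full reasoner: it assumes two terms share the same generic term and compares only their sets of differentiae, ignoring is_a relationships between genera or between participants. The function name `subsumes` is hypothetical.

```python
CYSTEINE_BIOSYNTHESIS = {
    "genus": "GO:0009058",  # biosynthesis
    "differentiae": frozenset({("outputs", "CHEBI:15356")}),  # cysteine
}

CYSTEINE_BIOSYNTHESIS_FROM_SERINE = {
    "genus": "GO:0009058",  # biosynthesis
    "differentiae": frozenset({
        ("outputs", "CHEBI:15356"),  # cysteine
        ("input", "CHEBI:17822"),    # serine
    }),
}


def subsumes(general, specific):
    """True if every condition of `general` is also required by `specific`."""
    return (
        general["genus"] == specific["genus"]
        and general["differentiae"] <= specific["differentiae"]
    )


# The more specific term satisfies everything the general one requires,
# so an is_a link between them can be inferred rather than hand-curated.
print(subsumes(CYSTEINE_BIOSYNTHESIS, CYSTEINE_BIOSYNTHESIS_FROM_SERINE))
print(subsumes(CYSTEINE_BIOSYNTHESIS_FROM_SERINE, CYSTEINE_BIOSYNTHESIS))
```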
17
Composite terms can be nested
[Term]
id: GO:xxxxxxx
name: regulation of cysteine biosynthesis
intersection_of: GO:0050789 ! regulation of biological process
intersection_of: regulates GO:0019344 ! cysteine biosynthesis

[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine

  • YES: regulation^regulates(biosynthesis^outputs(cysteine))
  • NO: regulation^regulates(biosynthesis)^outputs(cysteine)
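
Nesting falls out naturally if the term a differentia points to may itself be a composite. A minimal sketch: the `Composite` class and `render` function are hypothetical, and `^` is assumed here as the intersection operator of the compact syntax.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Composite:
    """A generic term plus one differentia whose target may itself be composite."""
    genus: str
    relation: str
    filler: Union[str, "Composite"]


def render(expr):
    """Render a (possibly nested) composite term in the compact syntax."""
    if isinstance(expr, str):
        return expr
    return f"{expr.genus}^{expr.relation}({render(expr.filler)})"


cysteine_biosynthesis = Composite("biosynthesis", "outputs", "cysteine")
regulation = Composite("regulation", "regulates", cysteine_biosynthesis)

# The "outputs" differentia stays inside the regulated process;
# it is not attached to the regulation process itself.
print(render(regulation))
```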
18
Composite terms can optionally be manufactured in
bulk
  • Generic term: metabolism, biosynthesis
  • Differentia: has_output serine, cysteine, ...
  • With caution
  • Sparse vs dense matrices
  • not all combinations are types
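
Bulk manufacture amounts to a filtered cross-product. The sketch below is illustrative only: the function name `manufacture`, the naming pattern, and above all the `excluded` set are hypothetical. The excluded pair is there to show the mechanism for a sparse matrix and is not a biological claim.

```python
from itertools import product

GENERIC_TERMS = ["metabolism", "biosynthesis"]
PARTICIPANTS = ["serine", "cysteine"]

# Hypothetical curator-supplied exclusions (sparse matrix): combinations
# judged not to be real types. The pair below is a placeholder.
EXCLUDED = {("metabolism", "serine")}


def manufacture(genera, participants, relation="has_output", excluded=frozenset()):
    """Generate composite terms for every allowed genus/participant pair."""
    terms = []
    for genus, participant in product(genera, participants):
        if (genus, participant) in excluded:
            continue  # not all combinations are types
        terms.append({
            "name": f"{participant} {genus}",
            "genus": genus,
            "differentia": (relation, participant),
        })
    return terms


for term in manufacture(GENERIC_TERMS, PARTICIPANTS, excluded=EXCLUDED):
    print(term["name"])
```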

19
On the importance of necessary and sufficient
conditions
  • Why intersection_of?
  • Why not just make normal links in the GO DAG?
  • normal relationships are for necessary conditions
    only
  • we want both necessary and sufficient conditions
  • captures the essence of the term

20
Normal DAG links only capture necessary
conditions, not essence
  • macrophage activation is_a immune cell activation
  • macrophage activation part_of inflammatory response
  • text def: A change in morphology and behavior of a
    macrophage resulting from exposure to a cytokine,
    chemokine, cellular ligand, pathogen, or soluble
    factor
21
Indistinguishable by DAG
  • monocyte activation is_a immune cell activation
  • monocyte activation part_of inflammatory response
  • text def: A change in morphology and behavior of a
    monocyte resulting from exposure to a cytokine,
    chemokine, cellular ligand, pathogen, or soluble
    factor
22
essence captured by genus-differentia
  • macrophage activation is_a immune cell activation
  • macrophage activation part_of inflammatory response
id: GO:macrophage_activation
intersection_of: GO:cell_activation
intersection_of: activates CL:macrophage
23
essence captured by genus-differentia
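  • immune cell activation is_a cell activation
  • macrophage activation is_a immune cell activation
  • macrophage activation part_of inflammatory response
  • macrophage activation genus: cell activation
  • macrophage activation activates CL:macrophage
id: GO:macrophage_activation
intersection_of: GO:cell_activation
intersection_of: activates CL:macrophage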
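
The difference between necessary conditions and necessary-and-sufficient conditions can be shown in a few lines. In this sketch the DAG links are as read off the diagrams on the preceding slides, the monocyte logical definition (`CL:monocyte`) is an assumption made by analogy with the macrophage one, and `classify` is a hypothetical helper.

```python
# Ordinary DAG links: necessary conditions only
DAG_LINKS = {
    "macrophage activation": {
        ("is_a", "immune cell activation"),
        ("part_of", "inflammatory response"),
    },
    "monocyte activation": {
        ("is_a", "immune cell activation"),
        ("part_of", "inflammatory response"),
    },
}

# Logical definitions: necessary and sufficient conditions
LOGICAL_DEFS = {
    "macrophage activation": ("cell activation", frozenset({("activates", "CL:macrophage")})),
    # assumed by analogy, not shown on the slides
    "monocyte activation": ("cell activation", frozenset({("activates", "CL:monocyte")})),
}


def classify(genus, differentiae):
    """Return every term whose definition is fully satisfied by the description."""
    return [
        name
        for name, (term_genus, term_diffs) in LOGICAL_DEFS.items()
        if term_genus == genus and term_diffs <= differentiae
    ]


# The DAG links alone cannot tell the two terms apart...
print(DAG_LINKS["macrophage activation"] == DAG_LINKS["monocyte activation"])
# ...but the logical definitions pick out exactly one term.
print(classify("cell activation", frozenset({("activates", "CL:macrophage")})))
```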
24
Current status of pre-coordinated terms
  • SO already contains composite terms
  • 46 pre-coordinated terms
  • A silenced gene is a gene which has the quality
    of being silenced
  • GO-BP/CL integration underway
  • retrospectively pre-coordinated terms
  • Obol page has pre-coordinated terms from
    automatic parsing
  • http://www.fruitfly.org/cjm/obol

25
Pre- vs post- coordinated
  • Pre-coordination
  • terms are in ontology with IDs and computable
    definitions
  • increases complexity of ontology
  • complexity can be managed by tools
  • e.g. new oboedit features
  • Post-coordination
  • terms are combined in the database
  • forces more complexity in database schema and
    database applications

26
Pre-coordination is useful in moderation
  • Commonly used terms should be pre-coordinated
  • e.g. cysteine biosynthesis, oocyte differentiation,
    pectoral fin
  • Avoid taking to extremes
  • cf. ICD-9
  • Where do we draw the line?
  • ontologies should be built around one or a few
    axes of classification
  • term explosion typically gets large when
    multiple axes are combined
  • we can change our minds later
  • pre- and post- coordination are commensurable

27
Commensurability
  • Annotator annotates to
  • nucleus^part_of(astrocyte)
  • Anatomy editor creates new term
  • uses oboedit cross-product plugin
  • astrocyte_nucleus = nucleus^part_of(astrocyte)
  • Annotation can be dynamically promoted to new
    term in answer to queries
  • various software techniques for achieving this
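
One such technique is a lookup at query time: a post-coordinated annotation is rewritten to the named term once an editor has added an equivalent pre-coordinated definition. A minimal sketch, assuming `^` as the intersection operator and a single differentia per expression; `PRE_COORDINATED`, `parse` and `promote` are hypothetical names.

```python
import re

# Pre-coordinated terms keyed by their logical definition
# (generic term, relation, related term)
PRE_COORDINATED = {("nucleus", "part_of", "astrocyte"): "astrocyte_nucleus"}

EXPRESSION = re.compile(r"(\w+)\^(\w+)\((\w+)\)")


def parse(expression):
    """Split 'genus^relation(filler)' into a (genus, relation, filler) tuple."""
    match = EXPRESSION.fullmatch(expression)
    if match is None:
        raise ValueError(f"not a post-coordinated expression: {expression!r}")
    return match.groups()


def promote(expression):
    """Return the named term if one exists, else the expression unchanged."""
    return PRE_COORDINATED.get(parse(expression), expression)


print(promote("nucleus^part_of(astrocyte)"))
print(promote("nucleus^part_of(neuron)"))
```

Annotations made before the new term existed need no re-curation under this scheme; they resolve to it once the definition is in the lookup table.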

28
Post-coordination in GO annotations
  • Pre- and post- coordination are compatible and
    commensurable
  • We should extend the annotation format to allow
    denoting more specific classes
  • e.g.
  • cholesterol transport in liver
  • advanced applications can query this
  • standard applications suffer no loss
  • extended annotations can be used to help seed new
    terms in the ontology
  • This is already being done (MGI, Dicty)
  • we just want to capture this in an interoperable way

29
Post-composition in gene association files
  • New column in GA file format

30
Database issues
  • Chado and GO DB can handle pre- and post-
    coordination
  • in theory anyway
  • not yet fully tested
  • How does it work?
  • anonymous term created for coordinated term
  • documentation in chado cvs
  • chado/modules/cv/doc/