How to build cross-species interoperable ontologies

1
How to build cross-species interoperable
ontologies
  • Chris Mungall, LBNL
  • Melissa Haendel, OHSU

2
The challenge..
  • There are many fun and interesting issues
    involved in building and using cross-species
    ontologies
  • homology
  • evo-devo
  • reasoning using ontologies
  • connecting genomics databases to phenotypes

3
but
  • Unfortunately, there are many more prosaic issues
    with unsatisfying solutions
  • multiple ontologies already exist
  • limited cooperation between the developers of
    these ontologies
  • they differ widely in every aspect imaginable
  • they are heavily embedded in existing databases
    and applications and slow to change
  • tools and infrastructure support falls short of
    what we need
  • FORTUNATELY, solutions are emerging..

4
Outline
  • Anatomy Ontologies Background
  • Case studies
  • GO: a unified cross-species ontology
  • CL (Cell Ontology): unifying multiple existing
    efforts
  • Building interoperable gross anatomy ontologies
  • (Melissa)

5
Ontologies
  • Computable qualitative representations of some
    part of the world
  • Relationships with computable properties
  • e.g. transitivity
  • languages and formats like OWL and OBO have a
    formal semantics
  • Entities are grouped into classes
  • Relationships are statements about all the
    members of a class
  • the most common form is the all-some statement
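
To make "relationships with computable properties" concrete, here is a minimal Python sketch of transitivity for part_of. The `PART_OF` table and the function name are illustrative assumptions, not taken from any ontology file or library.

```python
# Toy part_of edges (illustrative only; not loaded from a real ontology).
PART_OF = {
    "nuclear pore": {"nuclear envelope"},
    "nuclear envelope": {"nucleus"},
    "nucleus": {"cell"},
}

def part_of_closure(term, edges=PART_OF):
    """Return every class that `term` is part_of, directly or by transitivity."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in edges.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(part_of_closure("nuclear pore")))
```

Because each edge is an all-some statement, the inferred links read the same way: every nuclear pore is part_of some cell.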

6
Ontologies are not smart
  • Deductive Logic is not flexible
  • Example
  • Human knowledge
  • chromosomes are found in the nucleus
  • Naïve ontology encoding
  • every chromosome part_of some nucleus
  • But this is wrong
  • Ontologies don't make exceptions!
  • Solution
  • (1) create location-specific subclasses
  • nuclear chromosome
  • mitochondrial chromosome
  • (2) invert the statement: every nucleus has
    chromosomes
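
The chromosome example can be sketched as a check of an all-some statement against instance data. This is a toy illustration under stated assumptions: the instance records and the `all_some_violations` helper are hypothetical, and a real OWL reasoner works on class axioms rather than on a list of instances.

```python
# Hypothetical instance data: each record lists its classes and what it is part of.
INSTANCES = [
    {"name": "chr1",
     "types": {"chromosome", "nuclear chromosome"},
     "part_of": {"nucleus"}},
    {"name": "mt_chr",
     "types": {"chromosome", "mitochondrial chromosome"},
     "part_of": {"mitochondrion"}},
]

def all_some_violations(subject, target, instances=INSTANCES):
    """Instances of `subject` that are NOT part_of some `target`.

    'every <subject> part_of some <target>' holds only if this list is empty.
    """
    return [
        inst["name"]
        for inst in instances
        if subject in inst["types"] and target not in inst["part_of"]
    ]

# Naive encoding: one counterexample is enough to make the statement wrong.
print(all_some_violations("chromosome", "nucleus"))
# Solution (1): the location-specific subclass has no counterexamples.
print(all_some_violations("nuclear chromosome", "nucleus"))
```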

7
Existing Anatomy Ontologies
  • Human AOs
  • Model Organism AOs
  • Domain specific AOs
  • Cross-species AOs

8
FMA: Foundational Model of Anatomy
  • Domain: adult human
  • no develops_from relationships, few embryonic
    structures
  • Size: large (70k classes)
  • Language: frames
  • Approach
  • formal, strict single inheritance, purely
    structural perspective
  • No computable definitions
  • Heavily pre-coordinated
  • Trunk of communicating branch of zygomatic
    branch of right facial nerve with
    zygomaticofacial branch of right zygomatic nerve
  • Distal epiphysis of distal phalanx of right
    little toe
  • Extensive spatial relationships in selected areas
  • e.g. veins, arteries
  • Uses
  • not designed for one particular use

9
FMA Example
/ FMA:62955 ! Anatomical entity
 is_a FMA:61775 ! Physical anatomical entity
  is_a FMA:67165 ! Material anatomical entity
   is_a FMA:67135 ! Anatomical structure
    is_a FMA:67498 ! Organ
     is_a FMA:55670 ! Solid organ
      is_a FMA:55661 ! Parenchymatous organ
       is_a FMA:55662 ! Lobular organ
        is_a FMA:13889 ! Pituitary gland
        is_a FMA:20020 ! Vestibular gland
        is_a FMA:55533 ! Accessory thyroid gland
        is_a FMA:58090 ! Areolar gland
        is_a FMA:59101 ! Lacrimal gland
        is_a FMA:62088 ! Lactiferous gland
        is_a FMA:7195 ! Lung
        is_a FMA:7197 ! Liver
        is_a FMA:7198 ! Pancreas
        is_a FMA:7210 ! Testis
        is_a FMA:76835 ! Accessory pancreas
        is_a FMA:9597 ! Salivary gland
        is_a FMA:9599 ! Bulbo-urethral gland
        is_a FMA:9600 ! Prostate
        is_a FMA:9603 ! Thyroid gland

10
Model Organism Anatomy Ontologies
  • Typically species-centric
  • FBbt: Drosophila melanogaster
  • WBbt: C. elegans
  • ZFA: Danio rerio
  • XAO: Xenopus
  • MA: adult mouse (no develops_from)
  • EMAP/EMAPA: developing mouse
  • Uses
  • primarily gene expression, also phenotype
    description
  • others: Virtual Fly Brain, Phenoscape
  • Approach
  • use-case driven
  • practicality over formality
  • No computable definitions
  • (exception: FBbt)

11
Other anatomy ontologies
  • Developing human
  • EHDAA2
  • Vectors
  • TGMA: mosquito
  • TADS: tick
  • Upper ontologies
  • CARO
  • AEO
  • Domain-specific anatomy ontologies
  • NIF_Anatomy, NIF_Cell: neuroscience
  • Phylogenetic or multi-taxon AOs
  • HAO: Hymenoptera
  • PO: plant
  • TAO: teleost
  • AAO: amphibian
  • SPD: spider
  • we will return to these later..

12
Problem
  • These AOs are not developed in a coordinated
    fashion
  • use of a shared upper ontology does not buy us
    much
  • even the 3 mammalian AOs are massively different
  • Data annotated using these ontologies effectively
    becomes siloed
  • There is redundancy of effort in areas of shared
    biology
  • Are there lessons from existing ontologies?

13
Building ontologies that are interoperable across
species
  • Case Studies
  • GO
  • Cell Ontology

14
Gene Ontology
  • Covers all kingdoms of life
  • viruses, bacteria, archaea
  • fungi, metazoans, plants
  • Covers biology at different scales
  • Issues
  • terminological confusion (e.g. blood)
  • large, difficult to maintain

15
How does GO deal with taxonomic variation?
  • What GO says
  • every nucleus is part_of some cell
  • What GO does not say
  • every cell has_part some nucleus
  • wrong for bacteria (and mammalian erythrocytes)
  • Take home
  • Logical quantifiers are essential to
    understanding the ontology
  • Saying what something is part of is safer than
    saying what its parts are

16
Principle: avoidance of taxonomic differentia
  • Not in GO
  • vertebrate eye development
  • insect eye development
  • cephalopod eye development
  • In GO
  • eye development
  • camera-type eye development
  • compound eye development
  • Exceptions for usability
  • cell wall
  • fungal-type cell wall (differentia: cross-linked
    glycoproteins and carbohydrates, chitin /
    beta-glucan)
  • plant-type cell wall (differentia: cellulose,
    pectin, ...)
  • no implication of homology

17
The problem of vagueness in GO
  • limb development
  • wing development

18
Adding taxonomic constraints to GO
  • GO now includes two additional relations
  • only_in_taxon
  • never_in_taxon
  • See
  • Kusnierczyk, W. Taxonomy-based partitioning of
    the Gene Ontology. JBI 2008
  • Deegan et al. Formalization of taxon-based
    constraints to detect inconsistencies in
    annotation and ontology development. BMC
    Bioinformatics 2010

19
Examples
  • lactation only_in_taxon Mammalia
    (NCBITaxon:40674)
  • OWL: lactation in_taxon only Mammalia
  • odontogenesis never_in_taxon Aves
    (NCBITaxon:8782)
  • OWL: odontogenesis in_taxon only not Aves
  • chloroplast only_in_taxon (Viridiplantae or
    Euglenozoa) (NCBITaxon:33090 or NCBITaxon:33682)
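
A minimal sketch of how such constraints could be used to flag suspicious annotations. Everything here is an assumption for illustration: the toy taxonomy, the constraint tables and the `check` function are hypothetical, and this is not the implementation used by GO.

```python
# Toy child -> parent taxonomy (a hypothetical slice, not the full NCBI taxonomy).
PARENT = {
    "Gallus gallus": "Aves",
    "Aves": "Vertebrata",
    "Mus musculus": "Mammalia",
    "Mammalia": "Vertebrata",
    "Vertebrata": "Metazoa",
}

# Constraints taken from the examples on this slide.
ONLY_IN = {"lactation": {"Mammalia"}}
NEVER_IN = {"odontogenesis": {"Aves"}}

def lineage(taxon):
    """The taxon itself plus all of its ancestors."""
    result = {taxon}
    while taxon in PARENT:
        taxon = PARENT[taxon]
        result.add(taxon)
    return result

def check(term, taxon):
    """Return the constraints violated by annotating a gene of `taxon` to `term`."""
    ancestors = lineage(taxon)
    problems = []
    allowed = ONLY_IN.get(term)
    if allowed and not (allowed & ancestors):
        problems.append(f"{term} only_in_taxon {' or '.join(sorted(allowed))}")
    for banned in sorted(NEVER_IN.get(term, set()) & ancestors):
        problems.append(f"{term} never_in_taxon {banned}")
    return problems

print(check("lactation", "Gallus gallus"))
print(check("lactation", "Mus musculus"))
print(check("odontogenesis", "Gallus gallus"))
```

A fuller version would also propagate each constraint to the descendants of the constrained term, which this sketch leaves out.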

20
Uses of taxon relationships
  • Clarifying meaning of GO terms
  • Detection of errors in electronic and manual
    annotation
  • Automated reasoners
  • GO previously had chicken genes involved in
    lactation, slime mold genes involved in fin
    regeneration
  • Providing views over GO
  • e.g. subset of GO excluding terms that are never
    in Drosophila
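
The "views over GO" use case can be sketched as a filter over taxon constraints. The taxonomy, the constraint table and `taxon_view` below are illustrative assumptions; the toy table only encodes the constraints shown in the earlier examples, so it is far from complete.

```python
# Toy child -> parent taxonomy (hypothetical slice).
PARENT = {
    "Drosophila melanogaster": "Insecta",
    "Insecta": "Metazoa",
    "Gallus gallus": "Aves",
    "Aves": "Metazoa",
}

# Toy constraints per term; an empty dict means "no taxon constraint".
CONSTRAINTS = {
    "eye development": {},
    "lactation": {"only_in": {"Mammalia"}},
    "chloroplast": {"only_in": {"Viridiplantae", "Euglenozoa"}},
    "odontogenesis": {"never_in": {"Aves"}},
}

def lineage(taxon):
    """The taxon itself plus all of its ancestors."""
    result = {taxon}
    while taxon in PARENT:
        taxon = PARENT[taxon]
        result.add(taxon)
    return result

def taxon_view(taxon, constraints=CONSTRAINTS):
    """Terms that remain applicable to `taxon` under the given constraints."""
    ancestors = lineage(taxon)
    kept = []
    for term, rule in constraints.items():
        only_in = rule.get("only_in")
        if only_in and not (only_in & ancestors):
            continue  # term is restricted to taxa outside this lineage
        if rule.get("never_in", set()) & ancestors:
            continue  # term is explicitly excluded for this lineage
        kept.append(term)
    return sorted(kept)

print(taxon_view("Drosophila melanogaster"))
print(taxon_view("Gallus gallus"))
```

Note that odontogenesis survives in the fly view only because this toy table carries no constraint excluding it; the quality of a view depends on how complete the constraints are.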

21
Scalability of single-ontology approach: GO
  • How does GO cope with wide taxonomic diversity?
  • conservation at molecular level, wide diversity
    of phenotypes at level of gross anatomical
    development, physiology, and organismal behavior
  • GO Development
  • Focused on model systems
  • beak development added only recently
  • GO Behavior
  • Very broad coverage
  • Some specific terms, e.g. Drosophila courtship

22
Proposal: outsource portions of the ontology
23
Ontology Views
  • Ontologies, traditional
  • independent standalone resources
  • Ontologies, new
  • interconnected resources
  • multiple views possible
  • Subsetting
  • Aggregation
  • Subsetting + Aggregation
  • views can be manually specified (e.g. GO slims)
    or automatically constructed
  • Limited re-writing possible
  • e.g. names

24
Views
[Figure: kinds of views over ontologies: slim, subset, aggregate, aggregate + subset, scattered subset, domain/taxon-specific cut]
25
Subset of GO
26
vertebrate subset
27
Outline
  • Case studies
  • GO: a unified cross-species ontology
  • Cell Ontology: unifying multiple existing efforts
  • Gross Anatomy

28
Cell types
  • GO-Cell Component
  • cell parts
  • CL cell ontology
  • Anatomical Ontologies
  • Includes cell types
  • FBbt (Drosophila)
  • WBbt (C. elegans)
  • ZFA, TAO (Danio rerio, Teleost)
  • FMA (Human)
  • PO (Plant)
  • FAO (Fungi)
  • Excludes cell types
  • MA (adult mouse)
  • EMAPA (developing mouse)
  • EHDAA2 (developing human)

29
Overlap (simplified view)
[Figure: overlapping scopes of CL, PO, ZFA, FMA, NIF cell and MA, with example classes: brain, plant spore, alveolar macrophage, lung, neuron]
30
The Problem
  • Duplicated work
  • No unified view
  • Confusion for users
  • Confusion for annotators

31
Alternative proposals
  • LUMP: combine into one monolithic CL ontology
  • SPLIT: taxon-specific cell types in taxon-centric
    anatomy ontologies (tcAOs)
  • Obsolete generic cell types currently in tcAOs
  • -vs-
  • Taxon-specific subclasses of generic cell types

32
LUMP
[Figure: a single "all cells" ontology covering plants, fish, human and mouse, with example classes: plant spore, alveolar macrophage, neuron]
33
CL Lumping proposal
  • Advantages
  • one stop shopping for CL
  • (but this can be done with aggregate views)
  • Disadvantages
  • tcAO IDs well-established
  • Little advantage to lumping plant cells with
    animal cells
  • Harder to manage editorially
  • Cross-granular relationships

34
(Partial) Splitting proposal
  • Advantages
  • Easier to manage
  • Sensible subdivision of labor
  • Common cell types in shared common cell ontology
  • e.g. shared definition of neuron
  • Taxon-specific subtypes in taxon-centric
    ontologies
  • Disadvantages
  • Aggregate view is problematic
  • union of ontologies contains multiple classes
    labeled neuron
  • Can be solved by obsoleting existing generic cell
    classes in tcAOs and replacing by CL IDs
  • problem cross-granular relationships

35
Current solution for CL: split and retain IDs
  • Any cell type shared by two model taxa should be
    in CL
  • tcAOs retain both generic and specific cell type
    classes
  • Formally connected to CL via subclass
    relationships
  • or even stronger taxon-specific equivalent

36
Example: aggregate view
[Figure: aggregate view "CL-metazoa" combining FMA, FBbt and CL. Classes shown: cell, muscle cell, muscle organ, frontal pulsatile organ muscle. Edges are labeled i (is_a) and p (part_of).]
37
Example: aggregate + subset view
[Figure: aggregate + subset view "CL-metazoa" combining FMA, FBbt and CL. Classes shown: cell, muscle cell, frontal pulsatile organ muscle. Edges are labeled i (is_a).]
38
Who maintains the connections and how?
  • How
  • maintained as xrefs for convenience
  • Who
  • either tcAO or CL
  • Synchronization?
  • hard
  • reasoning over aggregate view
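
One way to picture connections "maintained as xrefs" is a small routine that turns xrefs into subclass links for an aggregate view, regardless of which side holds the xref. This is a hedged sketch: the `TERMS` table reuses IDs from the enterocyte example in this talk, while `bridge_links` and the rule "every xref between CL and a tcAO means subclass" are simplifying assumptions.

```python
# Toy term table; xrefs may be held by CL or by the taxon-centric ontology.
TERMS = {
    "CL:0000584": {"name": "enterocyte", "xrefs": ["FMA:62122"]},
    "ZFA:0009269": {"name": "enterocyte",
                    "xrefs": ["CL:0000584", "TAO:0009269"]},
}

def prefix(term_id):
    """Ontology prefix of an ID, e.g. 'CL' for 'CL:0000584'."""
    return term_id.split(":", 1)[0]

def bridge_links(terms, hub="CL"):
    """Derive (subclass, superclass) pairs linking tcAO classes to hub classes."""
    links = set()
    for term_id, term in terms.items():
        for xref in term["xrefs"]:
            if prefix(term_id) == hub and prefix(xref) != hub:
                links.add((xref, term_id))   # xref maintained on the CL side
            elif prefix(xref) == hub and prefix(term_id) != hub:
                links.add((term_id, xref))   # xref maintained on the tcAO side
            # xrefs between two non-hub ontologies are ignored here
    return sorted(links)

for child, parent in bridge_links(TERMS):
    print(f"{child} is_a {parent}")
```

Treating both directions the same way keeps the view independent of who maintains the xref, but it does nothing about synchronization when the two sides disagree.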

39
Who maintains the connections?
cl.obo (CL's responsibility)

[Term]
id: CL:0000584
name: enterocyte
def: "An epithelial cell that has its apical plasma membrane folded into microvilli to provide ample surface for the absorption of nutrients from the intestinal lumen." [SANBI:mhl]
xref: FMA:62122
is_a: CL:0000239 ! brush border epithelial cell

zfa.obo (ZFA's responsibility)

[Term]
id: ZFA:0009269
name: enterocyte
namespace: zebrafish_anatomy
def: "An epithelial cell that has its apical plasma membrane folded into microvilli to provide ample surface for the absorption of nutrients from the intestinal lumen." [SANBI:curator]
synonym: "enterocytes" EXACT PLURAL []
xref: CL:0000584
xref: TAO:0009269
xref: ZFIN:ZDB-ANAT-070308-209
is_a: ZFA:0009143 ! brush border epithelial cell
relationship: end ZFS:0000044 ! Adult
relationship: part_of ZFA:0005124 ! intestinal epithelium
relationship: start ZFS:0000000 ! Unknown

40
Issues with aggregate view
[Figure: aggregate view of FMA, FBbt and CL, annotated with two issues: duplicate names, and lattices / hairballs. Classes shown: cell, muscle cell, frontal pulsatile organ muscle. Edges are labeled i (is_a).]
41
Duplicate names
  • Searching for muscle cell returns
  • CL:0000187 ! muscle cell
  • FBbt:00005074 ! muscle cell
  • FMA:67328 ! muscle cell
  • ZFA:0009114 ! muscle cell
  • NIF_Cell:sao519252327 ! Muscle Cell
  • Proposed solutions
  • rename in source ontology
  • yuck
  • make end-user applications smarter
  • not practical for n applications
  • auto-rename in ontology view
  • best solution
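
A minimal sketch of auto-renaming in an ontology view, assuming a hypothetical mapping from ID prefix to a taxon label. Only names that collide are rewritten, and classes from ontologies without a label (here CL) keep the generic name.

```python
from collections import Counter

# Assumed prefix -> label mapping; not part of any ontology file.
TAXON_LABEL = {"FBbt": "Drosophila", "FMA": "human", "ZFA": "zebrafish"}

def rename_for_view(terms):
    """Map each term ID to a display name that is unique within the view.

    `terms` is a list of (id, name) pairs drawn from several ontologies.
    """
    counts = Counter(name.lower() for _, name in terms)
    display = {}
    for term_id, name in terms:
        label = TAXON_LABEL.get(term_id.split(":", 1)[0])
        if counts[name.lower()] > 1 and label:
            display[term_id] = f"{label} {name}"
        else:
            display[term_id] = name
    return display

hits = [
    ("CL:0000187", "muscle cell"),
    ("FBbt:00005074", "muscle cell"),
    ("FMA:67328", "muscle cell"),
    ("ZFA:0009114", "muscle cell"),
]
for term_id, name in rename_for_view(hits).items():
    print(term_id, "!", name)
```

Doing the rewrite once in the view means the source ontologies keep their labels and no end-user application has to implement its own disambiguation.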

42
Aggregate view
cl-metazoa.obo

[Term]
id: CL:0000584
name: enterocyte
def: "An epithelial cell that has its apical plasma membrane folded into microvilli to provide ample surface for the absorption of nutrients from the intestinal lumen." [SANBI:mhl]
xref: FMA:62122
is_a: CL:0000239 ! brush border epithelial cell

[Term]
id: ZFA:0009269
name: zebrafish enterocyte
def: "An epithelial cell that has its apical plasma membrane folded into microvilli to provide ample surface for the absorption of nutrients from the intestinal lumen." [SANBI:curator]
synonym: "enterocytes" EXACT PLURAL []
xref: CL:0000584
xref: TAO:0009269
xref: ZFIN:ZDB-ANAT-070308-209
is_a: CL:0000584 ! enterocyte
is_a: ZFA:0009143 ! brush border epithelial cell
relationship: end ZFS:0000044 ! Adult
relationship: part_of ZFA:0005124 ! intestinal epithelium
relationship: start ZFS:0000000 ! Unknown

  • name "zebrafish enterocyte": rewritten name (or
    synonym, TBD)
  • is_a CL:0000584: generated from xref
  • the two is_a parents form a lattice
  • FMA class not shown, but it would also subclass
43
Summary: taxon variation in CL
  • Current solution is a compromise
  • Constraints
  • integrate with pre-existing tcAO ontologies
  • these ontologies have links to gross anatomy
  • tcAOs loosely integrated with CL
  • plant cell types should be left to PO
  • Synchronization remains a challenge

44
Lessons for gross anatomy
[Figure: layered ontologies connected by imports and cross-ontology links (sample). Labels, from most general to most specific:
  • caro / all: cell, tissue
  • metazoa: skeleton, nervous system, appendage, gut, gonad, muscle tissue, circulatory system, mesoderm, gland, skeletal tissue, respiratory airway, larva
  • vertebrata, arthropoda, mollusca: trachea, bone, limb, mantle, mushroom body, fin, vertebra, tibia, shell, vertebral column, cuticle, mesonephros, antenna, foot, parietal bone
  • cephalopod, teleost, mammalia, drosophila, amphibia: tentacle, neuron types XYZ, weberian ossicle, mammary gland, tibiafibula, brachial lobe
  • mouse, human, zebrafish: NO pons]
45
Conclusions
  • Historically anatomy ontologies have been
    developed by different groups largely in
    isolation
  • The Phenotype RCN should coordinate these efforts
  • Dynamic Views
  • Explicit taxonomic relationships

46
  • end

47
  • Melissa Here

48
Idealized model (M0)
  • A single ontology for ontology editors and
    consumers
  • Different editors have editing rights to
    different ontology partitions
  • by taxon
  • by domain (e.g. neuroscience, skeletal anatomy)
  • No taxon-specific subtypes
  • use structure, function etc as differentia
  • Users obtain dynamic views according to their
    needs

49
Example M0
[Figure: a single ontology with overlapping views (user/editor view, mammalian view, mollusc view, neuro view, skeletal view) and a small sample of links. Classes shown: cell, tissue, mesoderm, gut, circulatory system, appendage, gonad, larva, muscle tissue, gland, skeletal tissue, respiratory airway, nervous system, ventral nerve cord, trachea, bone, limb, mantle, fin, vertebra, tibia, pons, vertebral column, mushroom body, mollusc foot, parietal bone, metencephalon, mesonephros, antenna, mammary gland, weberian ossicle, tentacle, pupal DN3 period neuron, tibiafibula, brachial lobe]
50
Slightly less idealized model (M1)
  • Maintain series of ontologies at different
    taxonomic levels
  • euk, plant, metazoan, vertebrate, mollusc,
    arthropod, insect, mammal, human, drosophila
  • Each ontology imports/MIREOTs relevant subset of
    ontology above it
  • this is recursive
  • Subtypes are only introduced as needed
  • Work together on commonalities at appropriate
    level above your ontology
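
The recursive import in M1 can be sketched as follows. The `ONTOLOGIES` table, its term lists and the `resolve` function are hypothetical; real ontologies would use OWL imports or MIREOT rather than Python sets.

```python
# Hypothetical ontology stack: each ontology has its own terms, a parent,
# and the subset of the parent's (already resolved) terms it wants to import.
ONTOLOGIES = {
    "metazoan": {
        "parent": None,
        "own": {"nervous system", "gut", "muscle tissue"},
        "wanted": set(),
    },
    "vertebrate": {
        "parent": "metazoan",
        "own": {"bone", "vertebral column"},
        "wanted": {"nervous system", "gut"},
    },
    "mammal": {
        "parent": "vertebrate",
        "own": {"mammary gland"},
        "wanted": {"bone", "nervous system"},
    },
}

def resolve(name, ontologies=ONTOLOGIES):
    """All terms visible in `name`: its own plus the imported subset, recursively."""
    onto = ontologies[name]
    terms = set(onto["own"])
    if onto["parent"] is not None:
        available = resolve(onto["parent"], ontologies)
        terms |= available & onto["wanted"]
    return terms

print(sorted(resolve("mammal")))
```

In this sketch the mammal ontology gets "nervous system" from the metazoan level without declaring it, because the vertebrate ontology already imported it; terms nobody asked for ("muscle tissue", "vertebral column") stay out.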

51
Example M1
[Figure: layered ontologies connected by imports and cross-ontology links (sample). Labels, from most general to most specific:
  • caro / all: cell, tissue
  • metazoa: skeleton, nervous system, appendage, gut, gonad, muscle tissue, circulatory system, mesoderm, gland, skeletal tissue, respiratory airway, larva
  • vertebrata, arthropoda, mollusca: trachea, bone, limb, mantle, mushroom body, fin, vertebra, tibia, shell, vertebral column, cuticle, mesonephros, antenna, foot, parietal bone
  • cephalopod, teleost, mammalia, drosophila, amphibia: tentacle, neuron types XYZ, weberian ossicle, mammary gland, tibiafibula, brachial lobe
  • mouse, human, zebrafish: NO pons]
52
Objections to M1
  • Biological
  • homology vs analogy
  • functional grouping classes
  • e.g. respiratory airway, eye
  • Practical
  • tools
  • what about existing AOs?
  • new AOs should be designed for integration from
    the ground up

53
Protocol for new AOs
  • Collect draft list of terms
  • subdivide roughly into applicability at taxonomic
    levels
  • request new terms from existing AOs above you
  • is a new mid-level AO required?
  • yes: collaborate and create, go to 1.
  • import subset from next AO above
  • Build your ontology

54
Example: the octopus ontology
  • Collect and subdivide terms
  • cephalopod: tentacle, brachial lobe,
    subesophageal mass, beak, visceropericardial
    coelom, swim bladder
  • mollusc: mantle
  • metazoan: nervous system, muscle tissue
  • Mollusc anatomy ontology does not exist
  • either (i) find collaborators and create
  • or (ii) keep mollusc terms in your ontology for
    now, but mark them as possibly migrating upwards
  • Import terms from the mollusc AO if (i), or from
    the metazoan AO if (ii) no mollusc AO

55
How are things organized now?
  • 3 examples
  • PO
  • TAO/ZFA
  • Uberon
  • In Melissa's talk

56
Some AOs are cross-granular
[Figure: FMA spans levels of granularity (subcellular; cell; tissue and gross anatomy). Classes shown: cell, muscle cell, muscle organ, muscle cell protoplasm. Edges are labeled i (is_a) and p (part_of).]
57
Cross-granular relationships
[Figure: FMA classes cell, muscle cell and muscle organ. Edges are labeled i (is_a) and p (part_of).]
58
Cross-granular relationships
[Figure: FMA and CL side by side. Classes shown: cell and muscle cell in both, plus muscle organ. Edges are labeled i (is_a) and p (part_of).]
59
Obsoleting generic classes in tcAOs
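[Figure: FMA and CL side by side, illustrating the obsoletion of generic classes in the tcAO. Classes shown: cell and muscle cell in both, plus muscle organ. Edges are labeled i (is_a) and p (part_of).]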
60
Migrating cross-granular relationships
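[Figure: FMA and CL side by side, illustrating the migration of cross-granular relationships. Classes shown: cell, muscle cell, muscle organ. Edges are labeled i (is_a) and p (part_of).]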
61
true path violations
[Figure: FMA, CL and FBbt side by side. Classes shown: cell, muscle cell, muscle organ, frontal pulsatile organ muscle. Edges are labeled i (is_a) and p (part_of).]
62
fix
[Figure: FMA, CL and FBbt side by side. Classes shown: cell, muscle cell, muscle organ, frontal pulsatile organ muscle. Edges are labeled i (is_a) and p (part_of). Annotation: muscle cell AND part of some human.]
63
PO Plants
  • Single unified ontology for all plants
  • cell types and gross anatomy
  • Generalized from ontology of flowering plants

64
TAO and ZFA
  • Teleost and Zebrafish

65
Uberon
  • Designed to unify existing tcAOs
  • Uses modern ontology development techniques
  • heavily axiomatized: less work for humans, leave
    it to reasoners
  • automated QC
  • automated classification
  • Current size: 5k classes
  • Multiple relationship types
  • Links to and from GO, CL
  • Aggregate views possible using xrefs maintained
    in Uberon

66
Uberon lessons
  • Original Design Goals
  • Unify metazoan tcAOs for cross-species phenotype
    queries
  • Seed initial version from text matching
  • Was this a good idea?
  • metazoans are fairly diverse
  • many original dubious grouping classes have been
    eliminated or split
  • functional grouping classes remain
  • tissues, germ layers, etc less controversial
  • Uberon is really a vertebrate AO in which we've
    added placeholder metazoan terms
  • labels are misleading
  • high false positive and false negative rates
    from text matching
  • starting from textbook comparative anatomy
    knowledge would have been better (given time)