Title: How to build cross-species interoperable ontologies
1 How to build cross-species interoperable ontologies
- Chris Mungall, LBNL
- Melissa Haendel, OHSU
2 The challenge
- There are many fun and interesting issues involved in building and using cross-species ontologies
- homology
- evo-devo
- reasoning using ontologies
- connecting genomics databases to phenotypes
3 But
- Unfortunately, there are many more prosaic issues with unsatisfying solutions
- multiple ontologies already exist
- limited cooperation between the developers of these ontologies
- they differ widely in every aspect imaginable
- they are heavily embedded in existing databases and applications and slow to change
- tools and infrastructure support falls short of what we need
- FORTUNATELY, solutions are emerging
4 Outline
- Anatomy Ontologies: Background
- Case studies
- GO: A unified cross-species ontology
- CL (Cell Ontology): Unifying multiple existing efforts
- Building interoperable gross anatomy ontologies
- (Melissa)
5 Ontologies
- Computable qualitative representations of some part of the world
- Relationships with computable properties
- e.g. transitivity
- languages and formats like OWL and OBO have a formal semantics
- Entities are grouped into classes
- Relationships are statements about all the members of a class
- the most common form is the all-some statement
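As a rough illustration of what a "computable property" such as transitivity buys us, the sketch below derives implied part_of statements from a handful of asserted ones. It is a minimal toy, not how OWL or OBO reasoners are implemented; the helper name `transitive_closure` and the example edges are assumptions made for this example.

```python
from collections import defaultdict

def transitive_closure(edges):
    """Return every (x, z) pair implied by treating `edges` as a transitive relation."""
    direct = defaultdict(set)
    for child, parent in edges:
        direct[child].add(parent)

    closure = set()
    for start in list(direct):
        stack = list(direct[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            # .get() avoids creating new keys in the defaultdict while we iterate
            stack.extend(direct.get(node, ()))
    return closure

# Hypothetical class-level all-some statements: "every X part_of some Y".
PART_OF = [
    ("nucleolus", "nucleus"),
    ("nucleus", "cell"),
]

inferred = transitive_closure(PART_OF)
print(("nucleolus", "cell") in inferred)  # the statement that was never asserted directly
```

Only the two asserted edges are written down by a human; the third statement falls out of the relation's declared property.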
6 Ontologies are not smart
- Deductive logic is not flexible
- Example
- Human knowledge
- chromosomes are found in the nucleus
- Naïve ontology encoding
- every chromosome part_of some nucleus
- But this is wrong
- Ontologies don't make exceptions!
- Solution
- (1) create location-specific subclasses
- nuclear chromosome
- mitochondrial chromosome
- (2) invert the statement: every nucleus has chromosomes
7 Existing Anatomy Ontologies
- Human AOs
- Model Organism AOs
- Domain specific AOs
- Cross-species AOs
8 FMA: Foundational Model of Anatomy
- Domain: adult human
- no develops_from relationships, few embryonic structures
- Size: large (70k classes)
- Language: frames
- Approach
- formal, strict single inheritance, purely structural perspective
- No computable definitions
- Heavily pre-coordinated
- Trunk of communicating branch of zygomatic branch of right facial nerve with zygomaticofacial branch of right zygomatic nerve
- Distal epiphysis of distal phalanx of right little toe
- Extensive spatial relationships in selected areas
- e.g. veins, arteries
- Uses
- not designed for one particular use
9 FMA Example
/ FMA:62955 ! Anatomical entity
 is_a FMA:61775 ! Physical anatomical entity
  is_a FMA:67165 ! Material anatomical entity
   is_a FMA:67135 ! Anatomical structure
    is_a FMA:67498 ! Organ
     is_a FMA:55670 ! Solid organ
      is_a FMA:55661 ! Parenchymatous organ
       is_a FMA:55662 ! Lobular organ
        is_a FMA:13889 ! Pituitary gland
        is_a FMA:20020 ! Vestibular gland
        is_a FMA:55533 ! Accessory thyroid gland
        is_a FMA:58090 ! Areolar gland
        is_a FMA:59101 ! Lacrimal gland
        is_a FMA:62088 ! Lactiferous gland
        is_a FMA:7195 ! Lung
        is_a FMA:7197 ! Liver
        is_a FMA:7198 ! Pancreas
        is_a FMA:7210 ! Testis
        is_a FMA:76835 ! Accessory pancreas
        is_a FMA:9597 ! Salivary gland
        is_a FMA:9599 ! Bulbo-urethral gland
        is_a FMA:9600 ! Prostate
        is_a FMA:9603 ! Thyroid gland
10 Model Organism Anatomy Ontologies
- Typically species-centric
- FBbt: Drosophila melanogaster
- WBbt: C. elegans
- ZFA: Danio rerio
- XAO: Xenopus
- MA: adult mouse (no develops_from)
- EMAP/EMAPA: developing mouse
- Uses
- primarily gene expression, also phenotype description
- others: Virtual Fly Brain, Phenoscape
- Approach
- use-case driven
- practicality over formality
- No computable definitions
- (exception: FBbt)
11 Other anatomy ontologies
- Developing human
- EHDAA2
- Vectors
- TGMA: mosquito
- TADS: tick
- Upper ontologies
- CARO
- AEO
- Domain-specific anatomy ontologies
- NIF_Anatomy, NIF_Cell: neuroscience
- Phylogenetic or multi-taxon AOs
- HAO: Hymenoptera
- PO: plant
- TAO: teleost
- AAO: amphibian
- SPD: spider
- we will return to these later
12 Problem
- These AOs are not developed in a coordinated fashion
- use of a shared upper ontology does not buy us much
- even the 3 mammalian AOs are massively different
- Data annotated using these ontologies effectively becomes siloed
- There is redundancy of effort in areas of shared biology
- Are there lessons from existing ontologies?
13 Building ontologies that are interoperable across species
- Case Studies
- GO
- Cell Ontology
14 Gene Ontology
- Covers all kingdoms of life
- viruses, bacteria, archaea
- fungi, metazoans, plants
- Covers biology at different scales
- Issues
- terminological confusion (e.g. blood)
- large, difficult to maintain
15 How does GO deal with taxonomic variation?
- What GO says
- every nucleus is part_of some cell
- What GO does not say
- every cell has_part some nucleus
- wrong for bacteria (and mammalian erythrocytes)
- Take home
- Logical quantifiers are essential to understanding the ontology
- Saying what something is part of is safer than saying what its parts are
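The asymmetry between the two statements can be made concrete with a toy check of all-some statements against instance data. This is only a sketch under stated assumptions: the instance records, the field names and the helper `all_some_violations` are invented for illustration and do not come from GO.

```python
def all_some_violations(instances, subject_class, relation, object_class):
    """Check 'every subject_class <relation> some object_class' against toy data.

    Returns the names of subject instances that have no such related instance.
    """
    violations = []
    for name, record in instances.items():
        if record["type"] != subject_class:
            continue
        targets = record.get(relation, [])
        if not any(instances[t]["type"] == object_class for t in targets):
            violations.append(name)
    return violations

# Hypothetical instance data, invented for illustration.
INSTANCES = {
    "hepatocyte_1": {"type": "cell", "has_part": ["nucleus_1"]},
    "nucleus_1": {"type": "nucleus", "part_of": ["hepatocyte_1"]},
    "e_coli_1": {"type": "cell", "has_part": []},
    "erythrocyte_1": {"type": "cell", "has_part": []},
}

# "every nucleus part_of some cell": no counterexamples in the toy data.
print(all_some_violations(INSTANCES, "nucleus", "part_of", "cell"))
# "every cell has_part some nucleus": the bacterium and the erythrocyte violate it.
print(all_some_violations(INSTANCES, "cell", "has_part", "nucleus"))
```

A single counterexample is enough to make the has_part statement false for the whole class, which is why the part_of direction is the safer one to assert.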
16 Principle: avoidance of taxonomic differentia
- Not in GO
- vertebrate eye development
- insect eye development
- cephalopod eye development
- In GO
- eye development
- camera-type eye development
- compound eye development
- Exceptions for usability
- cell wall
- fungal-type cell wall (differentia: cross-linked glycoproteins and carbohydrates, chitin / beta-glucan)
- plant-type cell wall (differentia: cellulose, pectin)
- no implication of homology
17 The problem of vagueness in GO
- limb development
- wing development
18 Adding taxonomic constraints to GO
- GO now includes two additional relations
- only_in_taxon
- never_in_taxon
- See
- Kusnierczyk, W. Taxonomy-based partitioning of the Gene Ontology. JBI 2008
- Deegan et al. Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. BMC Bioinformatics 2010
19 Examples
- lactation only_in_taxon Mammalia (NCBITaxon:40674)
- OWL: lactation in_taxon only Mammalia
- odontogenesis never_in_taxon Aves (NCBITaxon:8782)
- OWL: odontogenesis in_taxon only not Aves
- chloroplast only_in_taxon (Viridiplantae or Euglenozoa) (NCBITaxon:33090 or NCBITaxon:33682)
20 Uses of taxon relationships
- Clarifying meaning of GO terms
- Detection of errors in electronic and manual annotation
- Automated reasoners
- GO previously had chicken genes involved in lactation, slime mold genes involved in fin regeneration
- Providing views over GO
- e.g. subset of GO excluding terms that are never in Drosophila
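The error-detection use case can be sketched as a lookup of each annotation's taxon lineage against the two constraint types. The taxonomy below is a heavily simplified toy, the function names are hypothetical, and the sketch ignores any propagation of constraints through the ontology graph; it is not the GO consortium's actual checking pipeline.

```python
# Toy taxonomy (child -> parent), heavily simplified for illustration.
PARENT = {
    "Gallus gallus": "Aves",
    "Mus musculus": "Mammalia",
    "Aves": "Vertebrata",
    "Mammalia": "Vertebrata",
    "Vertebrata": "Metazoa",
}

# Constraints taken from the examples on the slide.
ONLY_IN_TAXON = {"lactation": {"Mammalia"}}
NEVER_IN_TAXON = {"odontogenesis": {"Aves"}}

def lineage(taxon):
    """Return the taxon together with all of its ancestors."""
    result = {taxon}
    while taxon in PARENT:
        taxon = PARENT[taxon]
        result.add(taxon)
    return result

def check_annotation(term, taxon):
    """Return a list of constraint violations for one (term, taxon) annotation."""
    ancestors = lineage(taxon)
    problems = []
    allowed = ONLY_IN_TAXON.get(term)
    if allowed is not None and not (allowed & ancestors):
        problems.append(f"{term} only_in_taxon {sorted(allowed)}, but annotated in {taxon}")
    excluded = NEVER_IN_TAXON.get(term, set())
    for bad in sorted(excluded & ancestors):
        problems.append(f"{term} never_in_taxon {bad}, but annotated in {taxon}")
    return problems

# A chicken gene annotated to lactation is flagged; a mouse gene is not.
print(check_annotation("lactation", "Gallus gallus"))
print(check_annotation("lactation", "Mus musculus"))
```

The same lookup, run over terms instead of annotations, could drive a taxon-specific view by dropping every term whose constraints exclude the taxon of interest.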
21 Scalability of single-ontology approach: GO
- How does GO cope with wide taxonomic diversity?
- conservation at molecular level, wide diversity of phenotypes at level of gross anatomical development, physiology, and organismal behavior
- GO Development
- Focused on model systems
- beak development added only recently
- GO Behavior
- Very broad coverage
- Some specific terms, e.g. Drosophila courtship
22 Proposal: outsource portions of the ontology
23 Ontology Views
- Ontologies, traditional
- independent standalone resources
- Ontologies, new
- interconnected resources
- multiple views possible
- Subsetting
- Aggregation
- Subsetting + Aggregation
- views can be manually specified (e.g. GO slims) or automatically constructed
- Limited re-writing possible
- e.g. names
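Subsetting and aggregation can be sketched as two small operations over a dictionary-based ontology. Everything here is an assumption made for illustration: the `EX:` identifiers, the record layout with a `slim` flag, and the function names `subset_view` and `aggregate_view` are hypothetical and do not correspond to any existing tool.

```python
def subset_view(ontology, keep):
    """Return a view with only the classes for which keep(class_id, record) is true.

    is_a links that pass through dropped classes are re-routed to the nearest
    retained ancestor, so the view stays connected.
    """
    kept = {cid for cid, rec in ontology.items() if keep(cid, rec)}

    def nearest_kept_parents(cid, seen):
        parents = set()
        for parent in ontology[cid]["is_a"]:
            if parent in seen:
                continue
            seen.add(parent)
            if parent in kept:
                parents.add(parent)
            elif parent in ontology:
                parents |= nearest_kept_parents(parent, seen)
        return parents

    return {
        cid: {**ontology[cid], "is_a": sorted(nearest_kept_parents(cid, set()))}
        for cid in kept
    }

def aggregate_view(*ontologies):
    """Return the union of several ontologies, refusing clashing identifiers."""
    merged = {}
    for ontology in ontologies:
        for cid, rec in ontology.items():
            if cid in merged:
                raise ValueError(f"duplicate class id: {cid}")
            merged[cid] = rec
    return merged

# Hypothetical toy ontologies.
GO_LIKE = {
    "EX:1": {"name": "development", "is_a": [], "slim": True},
    "EX:2": {"name": "organ development", "is_a": ["EX:1"], "slim": False},
    "EX:3": {"name": "eye development", "is_a": ["EX:2"], "slim": True},
}
CELL_LIKE = {"EX:10": {"name": "cell", "is_a": [], "slim": True}}

slim = subset_view(GO_LIKE, lambda cid, rec: rec["slim"])
combined = aggregate_view(slim, CELL_LIKE)  # subsetting + aggregation
print(sorted(combined))
```

Because the view is computed rather than edited by hand, the source ontologies stay untouched and the view can be regenerated whenever they change.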
24 Views
[Figure: diagram of view types. Labels: slim, subset, aggregate, aggregate + subset, scattered subset, domain/taxon-specific cut]
25 Subset of GO
26 Vertebrate subset
27 Outline
- Case studies
- GO: A unified cross-species ontology
- Cell Ontology: Unifying multiple existing efforts
- Gross Anatomy
28 Cell types
- GO-Cell Component
- cell parts
- CL cell ontology
- Anatomical Ontologies
- Includes cell types
- FBbt (Drosophila)
- WBbt (C. elegans)
- ZFA, TAO (Danio rerio, Teleost)
- FMA (Human)
- PO (Plant)
- FAO (Fungi)
- Excludes cell types
- MA (adult mouse)
- EMAPA (developing mouse)
- EHDAA2 (developing human)
29 Overlap (simplified view)
[Figure: overlapping scopes of CL, PO, ZFA, FMA, NIF cell and MA. Example terms: plant spore, alveolar macrophage, neuron, lung, brain]
30 The Problem
- Duplicated work
- No unified view
- Confusion for users
- Confusion for annotators
31 Alternative proposals
- LUMP: Combine into one monolithic CL ontology
- SPLIT: Taxon-specific cell types in taxon-centric anatomy ontologies (tcAOs)
- Obsolete generic cell types currently in tcAOs
- -vs-
- Taxon-specific subclasses of generic cell types
32 LUMP
[Figure: a single ontology of all cells covering plants, fish, human and mouse. Example terms: plant spore, alveolar macrophage, neuron]
33 CL: Lumping proposal
- Advantages
- one-stop shopping for CL
- (but this can be done with aggregate views)
- Disadvantages
- tcAO IDs well-established
- Little advantage to lumping plant cells with animal cells
- Harder to manage editorially
- Cross-granular relationships
34 (Partial) Splitting proposal
- Advantages
- Easier to manage
- Sensible subdivision of labor
- Common cell types in shared common cell ontology
- e.g. shared definition of neuron
- Taxon-specific subtypes in taxon-centric ontologies
- Disadvantages
- Aggregate view is problematic
- union of ontologies contains multiple classes labeled neuron
- Can be solved by obsoleting existing generic cell classes in tcAOs and replacing by CL IDs
- problem: cross-granular relationships
35 Current solution for CL: split and retain IDs
- Any cell type shared by two model taxa should be in CL
- tcAOs retain both generic and specific cell type classes
- Formally connected to CL via subclass relationships
- or even stronger: taxon-specific equivalent
36 Example aggregate view
[Figure: aggregate view CL-metazoa combining CL, FMA and FBbt. Node labels: cell, muscle cell, muscle organ, frontal pulsatile organ muscle; edge labels: i (is_a), p (part_of)]
37 Example aggregate + subset view
[Figure: aggregate + subset view CL-metazoa combining CL, FMA and FBbt. Node labels: cell, muscle cell, frontal pulsatile organ muscle; edge labels: i (is_a)]
38 Who maintains the connections and how?
- How
- maintained as xrefs for convenience
- Who
- either tcAO or CL
- Synchronization?
- hard
- reasoning over aggregate view
39 Who maintains the connections?
CL's responsibility (cl.obo):
[Term]
id: CL:0000584
name: enterocyte
def: "An epithelial cell that has its apical plasma membrane folded into microvilli to provide ample surface for the absorption of nutrients from the intestinal lumen." [SANBI:mhl]
xref: FMA:62122
is_a: CL:0000239 ! brush border epithelial cell

ZFA's responsibility (zfa.obo):
[Term]
id: ZFA:0009269
name: enterocyte
namespace: zebrafish_anatomy
def: "An epithelial cell that has its apical plasma membrane folded into microvilli to provide ample surface for the absorption of nutrients from the intestinal lumen." [SANBI:curator]
synonym: "enterocytes" EXACT PLURAL []
xref: CL:0000584
xref: TAO:0009269
xref: ZFIN:ZDB-ANAT-070308-209
is_a: ZFA:0009143 ! brush border epithelial cell
relationship: end ZFS:0000044 ! Adult
relationship: part_of ZFA:0005124 ! intestinal epithelium
relationship: start ZFS:0000000 ! Unknown
40 Issues with aggregate view
[Figure: aggregate of FMA, FBbt and CL annotated with two issues: duplicate names; lattices / hairballs. Node labels: cell, muscle cell, frontal pulsatile organ muscle; edge labels: i (is_a)]
41 Duplicate names
- Searching for muscle cell returns
- CL:0000187 ! muscle cell
- FBbt:00005074 ! muscle cell
- FMA:67328 ! muscle cell
- ZFA:0009114 ! muscle cell
- NIF_Cell:sao519252327 ! Muscle Cell
- Proposed solutions
- rename in source ontology
- yuck
- make end-user applications smarter
- not practical for n applications
- auto-rename in ontology view
- best solution
42 Aggregate view
cl-metazoa.obo:
[Term]
id: CL:0000584
name: enterocyte
def: "An epithelial cell that has its apical plasma membrane folded into microvilli to provide ample surface for the absorption of nutrients from the intestinal lumen." [SANBI:mhl]
xref: FMA:62122
is_a: CL:0000239 ! brush border epithelial cell

[Term]
id: ZFA:0009269
name: zebrafish enterocyte
def: "An epithelial cell that has its apical plasma membrane folded into microvilli to provide ample surface for the absorption of nutrients from the intestinal lumen." [SANBI:curator]
synonym: "enterocytes" EXACT PLURAL []
xref: CL:0000584
xref: TAO:0009269
xref: ZFIN:ZDB-ANAT-070308-209
is_a: CL:0000584 ! enterocyte
is_a: ZFA:0009143 ! brush border epithelial cell
relationship: end ZFS:0000044 ! Adult
relationship: part_of ZFA:0005124 ! intestinal epithelium
relationship: start ZFS:0000000 ! Unknown

Notes on the ZFA stanza:
- name: rewritten name (or synonym, TBD)
- is_a CL:0000584: generated from xref
- FMA class not shown, but it would also subclass the CL class
- lattice
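The two rewrites shown in the aggregate stanza (an is_a link generated from an xref, and a taxon-prefixed name) can be sketched as follows. This is a minimal illustration, assuming a dictionary representation of the stanzas; the function `build_aggregate` and the choice of the prefix "zebrafish" are hypothetical, and the slide leaves open whether the rewritten label should be a name or a synonym.

```python
import copy

def build_aggregate(core, taxon_ontologies):
    """Merge a core ontology with taxon-centric ones.

    For each taxon-centric class, an xref into the core ontology becomes an is_a
    link, and the name is prefixed with the taxon label when it duplicates a
    name already used in the core ontology.
    """
    merged = copy.deepcopy(core)
    core_names = {rec["name"] for rec in core.values()}
    for label, ontology in taxon_ontologies.items():
        for cid, rec in ontology.items():
            rec = copy.deepcopy(rec)  # leave the source ontology untouched
            for xref in rec.get("xref", []):
                if xref in core and xref not in rec["is_a"]:
                    rec["is_a"].insert(0, xref)
            if rec["name"] in core_names:
                rec["name"] = f"{label} {rec['name']}"
            merged[cid] = rec
    return merged

# Trimmed versions of the two enterocyte stanzas from the slides.
CL = {
    "CL:0000584": {"name": "enterocyte", "xref": ["FMA:62122"], "is_a": ["CL:0000239"]},
}
ZFA = {
    "ZFA:0009269": {
        "name": "enterocyte",
        "xref": ["CL:0000584", "TAO:0009269"],
        "is_a": ["ZFA:0009143"],
    },
}

view = build_aggregate(CL, {"zebrafish": ZFA})
print(view["ZFA:0009269"]["name"])
print(view["ZFA:0009269"]["is_a"])
```

Doing the rename in the view rather than in zfa.obo keeps the source ontology's own labels stable for its existing users.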
43 Summary: taxon variation in CL
- Current solution is a compromise
- Constraints
- integrate with pre-existing tcAOs
- these ontologies have links to gross anatomy
- tcAOs loosely integrated with CL
- plant cell types should be left to PO
- Synchronization remains a challenge
44 Lessons for gross anatomy
[Figure: layered ontologies connected by import and by cross-ontology links (sample). Levels and example terms:
- caro / all: cell, tissue
- metazoa: skeleton, nervous system, appendage, gut, gonad, muscle tissue, circulatory system, mesoderm, gland, skeletal tissue, respiratory airway, larva
- vertebrata, arthropoda, mollusca: trachea, bone, limb, mantle, mushroom body, fin, vertebra, tibia, shell, vertebral column, cuticle, mesonephros, antenna, foot, parietal bone
- cephalopod, teleost, mammalia, drosophila, amphibia: tentacle, neuron types XYZ, weberian ossicle, mammary gland, tibiafibula, brachial lobe
- mouse, human, zebrafish: NO pons]
45 Conclusions
- Historically anatomy ontologies have been developed by different groups largely in isolation
- The Phenotype RCN should coordinate these efforts
- Dynamic Views
- Explicit taxonomic relationships
48 Idealized model (M0)
- A single ontology for ontology editors and consumers
- Different editors have editing rights to different ontology partitions
- by taxon
- by domain (e.g. neuroscience, skeletal anatomy)
- No taxon-specific subtypes
- use structure, function etc. as differentia
- Users obtain dynamic views according to their needs
49 Example M0
[Figure: a single ontology with overlapping views (user/editor view, mammalian view, mollusc view, neuro view, skeletal view) and links (small sample). Example terms: cell, tissue, mesoderm, gut, circulatory system, appendage, gonad, larva, muscle tissue, gland, skeletal tissue, respiratory airway, nervous system, ventral nerve cord, trachea, bone, limb, mantle, fin, vertebra, tibia, pons, vertebral column, mushroom body, mollusc foot, parietal bone, metencephalon, mesonephros, antenna, mammary gland, weberian ossicle, tentacle, pupal DN3 period neuron, tibiafibula, brachial lobe]
50 Slightly less idealized model (M1)
- Maintain series of ontologies at different taxonomic levels
- euk, plant, metazoan, vertebrate, mollusc, arthropod, insect, mammal, human, drosophila
- Each ontology imports/MIREOTs relevant subset of ontology above it
- this is recursive
- Subtypes are only introduced as needed
- Work together on commonalities at appropriate level above your ontology
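The recursive import in M1 can be sketched as a walk up a chain of layers, collecting each layer's own classes plus a filtered subset of what lies above. The layer contents below only loosely follow the examples on the slides, and the `LAYERS` structure and `visible_classes` function are assumptions made for illustration, not an implementation of MIREOT.

```python
# Hypothetical layers: each ontology names the layer it imports from and the
# classes it introduces itself.
LAYERS = {
    "all": {"imports": None, "classes": {"cell", "tissue"}},
    "metazoa": {"imports": "all", "classes": {"nervous system", "gut"}},
    "vertebrata": {"imports": "metazoa", "classes": {"bone", "vertebra"}},
    "mammalia": {"imports": "vertebrata", "classes": {"mammary gland"}},
}

def visible_classes(layer, wanted=None):
    """Collect a layer's own classes plus those imported recursively from above.

    `wanted` optionally restricts which inherited classes are pulled in, mimicking
    the import of a relevant subset rather than the whole parent ontology.
    """
    record = LAYERS[layer]
    classes = set(record["classes"])
    parent = record["imports"]
    if parent is not None:
        inherited = visible_classes(parent, wanted)
        if wanted is not None:
            inherited &= wanted
        classes |= inherited
    return classes

# A mammal ontology that only needs 'cell' and 'bone' from the layers above it.
print(sorted(visible_classes("mammalia", wanted={"cell", "bone"})))
```

Each layer only ever names its immediate parent; the recursion is what brings in 'cell' from three levels up.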
51 Example M1
[Figure: layered ontologies connected by import and by cross-ontology links (sample). Levels and example terms:
- caro / all: cell, tissue
- metazoa: skeleton, nervous system, appendage, gut, gonad, muscle tissue, circulatory system, mesoderm, gland, skeletal tissue, respiratory airway, larva
- vertebrata, arthropoda, mollusca: trachea, bone, limb, mantle, mushroom body, fin, vertebra, tibia, shell, vertebral column, cuticle, mesonephros, antenna, foot, parietal bone
- cephalopod, teleost, mammalia, drosophila, amphibia: tentacle, neuron types XYZ, weberian ossicle, mammary gland, tibiafibula, brachial lobe
- mouse, human, zebrafish: NO pons]
52 Objections to M1
- Biological
- homology vs analogy
- functional grouping classes
- e.g. respiratory airway, eye
- Practical
- tools
- what about existing AOs?
- new AOs should be designed for integration from the ground up
53 Protocol for new AOs
- Collect draft list of terms
- subdivide roughly into applicability at taxonomic levels
- request new terms from existing AOs above you
- is a new mid-level AO required?
- yes: collaborate and create, then return to step 1 (collect draft list of terms)
- import subset from next AO above
- Build your ontology
54 Example: the octopus ontology
- Collect and subdivide terms
- cephalopod: tentacle, brachial lobe, subesophageal mass, beak, visceropericardial coelom, swim bladder
- mollusc: mantle
- metazoan: nervous system, muscle tissue
- Mollusc anatomy ontology does not exist
- either (i) find collaborators and create
- or (ii) keep mollusc terms in your ontology for now, but mark them as possibly migrating upwards
- Import terms from the mollusc AO if (i), or from the metazoan AO if (ii) there is no mollusc AO
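Option (ii) of the octopus example can be sketched as a small planning step that decides, per term, whether to import it or to keep it locally with a migration flag. The function `plan_terms`, the wording of the plan entries, and the placeholder name "metazoan AO" are all hypothetical; the term lists are an abbreviated selection from the slide.

```python
# Taxonomic levels from most general to most specific, as in the octopus example.
LEVELS = ["metazoan", "mollusc", "cephalopod"]

def plan_terms(terms_by_level, existing_aos, own_level):
    """Decide, per term, whether to import it from an existing AO or define it locally."""
    plan = {}
    for level in LEVELS:
        for term in terms_by_level.get(level, []):
            if level == own_level:
                plan[term] = "define locally"
            elif level in existing_aos:
                plan[term] = f"import from (or request in) {existing_aos[level]}"
            else:
                # No AO exists at this level yet: keep the term, but mark it.
                plan[term] = f"define locally, flag for migration to a future {level} AO"
    return plan

TERMS = {
    "cephalopod": ["tentacle", "brachial lobe"],
    "mollusc": ["mantle"],
    "metazoan": ["nervous system", "muscle tissue"],
}
EXISTING = {"metazoan": "metazoan AO"}  # placeholder name; no mollusc AO exists

for term, action in plan_terms(TERMS, EXISTING, own_level="cephalopod").items():
    print(f"{term}: {action}")
```

If collaborators later create a mollusc AO (option i), adding it to `EXISTING` turns the flagged terms into imports without touching the term lists.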
55 How are things organized now?
- 3 examples
- PO
- TAO/ZFA
- Uberon
- In Melissa's talk
56 Some AOs are cross-granular
[Figure: FMA spans three levels of granularity (subcellular; cell; tissue and gross anatomy). Node labels: cell, muscle cell, muscle organ, muscle cell protoplasm; edge labels: i (is_a), p (part_of)]
57 Cross-granular relationships
[Figure: FMA. Node labels: cell, muscle cell, muscle organ; edge labels: i (is_a), p (part_of)]
58 Cross-granular relationships
[Figure: FMA and CL side by side. Node labels: cell and muscle cell in each ontology, muscle organ; edge labels: i (is_a), p (part_of)]
59 Obsoleting generic classes in tcAOs
[Figure: FMA and CL side by side. Node labels: cell and muscle cell in each ontology, muscle organ; edge labels: i (is_a), p (part_of)]
60 Migrating cross-granular relationships
[Figure: FMA and CL side by side. Node labels: cell and muscle cell in each ontology, muscle organ; edge labels: i (is_a), p (part_of)]
61 True path violations
[Figure: FMA, CL and FBbt. Node labels: cell, muscle cell, muscle organ, frontal pulsatile organ muscle; edge labels: i (is_a), p (part_of)]
62 Fix
[Figure: FMA, CL and FBbt. Node labels: cell, muscle cell, muscle organ, frontal pulsatile organ muscle; edge labels: i (is_a), p (part_of); annotation: muscle cell AND part of some human]
63 PO: Plants
- Single unified ontology for all plants
- cell types and gross anatomy
- Generalized from ontology of flowering plants
64 TAO and ZFA
65 Uberon
- Designed to unify existing tcAOs
- Uses modern ontology development techniques
- heavily axiomatized: less work for humans, leave it to reasoners
- automated QC
- automated classification
- Current size: 5k classes
- Multiple relationship types
- Links to and from GO, CL
- Aggregate views possible using xrefs maintained in Uberon
66 Uberon lessons
- Original Design Goals
- Unify metazoan tcAOs for cross-species phenotype queries
- Seed initial version from text matching
- Was this a good idea?
- metazoans are fairly diverse
- many original dubious grouping classes have been eliminated or split
- functional grouping classes remain
- tissues, germ layers, etc. less controversial
- Uberon is really a vertebrate AO in which we've added placeholder metazoan terms
- labels are misleading
- high false positive and false negative rates from text matching
- starting from textbook comparative anatomy knowledge would have been better (given time)
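Why label matching misleads can be shown with a toy seeding step. This is not the procedure actually used to seed Uberon; the `seed_by_label` function, the placeholder identifiers and the three tiny ontologies are invented for illustration.

```python
from collections import defaultdict

def seed_by_label(ontologies):
    """Group class IDs from several ontologies by normalized label.

    Only labels shared by at least two classes are returned as merge candidates.
    """
    groups = defaultdict(list)
    for classes in ontologies.values():
        for class_id, label in classes.items():
            groups[label.strip().lower()].append(class_id)
    return {label: ids for label, ids in groups.items() if len(ids) > 1}

# Hypothetical inputs with placeholder identifiers.
TOY = {
    "fly": {"FLY:1": "trachea"},
    "mouse": {"MOUSE:1": "Trachea", "MOUSE:2": "oesophagus"},
    "fish": {"FISH:1": "esophagus"},
}

candidates = seed_by_label(TOY)
print(candidates)
```

The insect and vertebrate tracheae are grouped although they are different structures that happen to share a label (a false positive), while the two spellings of esophagus are missed (a false negative).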