Title: Introduction to Ontology
1 Introduction to Ontology
- Barry Smith
- http://ontology.buffalo.edu/smith
2 Who am I?
- NCBO: National Center for Biomedical Ontology (NIH Roadmap Center)
- Stanford Medical Informatics
- University of San Francisco Medical Center
- Berkeley Drosophila Genome Project
- Cambridge University Department of Genetics
- The Mayo Clinic
- University at Buffalo Department of Philosophy
3 Who am I?
- NYS Center of Excellence in Bioinformatics and Life Sciences, Ontology Research Group
- Buffalo Clinical and Translational Science Institute (CTSI)
4 Who am I?
- Cleveland Clinic Semantic Database
- Gene Ontology
- Ontology for Biomedical Investigations
- Open Biomedical Ontologies Consortium
- Institute for Formal Ontology and Medical Information Science
- BIRN Ontology Task Force
- ...
6 natural language labels
to make the data cognitively accessible to human
beings
and algorithmically tractable to computers
7 compare legends for maps
8 compare legends for maps
common legends allow (cross-border) integration
9 ontologies are legends for data
10 legends
- help human beings use and understand complex representations of reality
- help human beings create useful complex representations of reality
- help computers process complex representations of reality
- help glue data together
11 annotations using common ontologies can yield integration of image data
12 computationally tractable legends
- help human beings find things in very large complex representations of reality
13 where in the body? where in the cell?
what kind of organism?
what kind of disease process?
14 Creating broad-coverage semantic annotation systems for biomedicine
- to yield
- distributed accessibility of the data to humans
- reasoning with the data
- cumulation for purposes of research
- incrementality and evolvability
- integration with clinical data
17 The Gene Ontology
18 The Gene Ontology
21 The Idea of Common Controlled Vocabularies
GlyProt
MouseEcotope
sphingolipid transporter activity
DiabetInGene
GluChem
22 The Idea of Common Controlled Vocabularies
GlyProt
MouseEcotope
Holliday junction helicase complex
DiabetInGene
GluChem
23 Multiple kinds of data in multiple kinds of silos
- Lab / pathology data
- Electronic Health Record data
- Clinical trial data
- Patient histories
- Medical imaging
- Microarray data
- Protein chip data
- Flow cytometry
- Mass spec
- Genotype / SNP data
24 How to find your data?
- How to find other people's data?
- How to reason with data when you find it?
- How to work out what data does not yet exist?
25 Multiple kinds of standardization for data
- Terminologies (SNOMED, UMLS)
- CDEs (Clinical research)
- Information Exchange Standards (HL7 RIM)
- LIMS (LOINC)
- MGED standards for microarray data, etc.
26 how to solve the problem of making such data queryable and re-usable by others to address NIH mandates?
part of the solution must involve standardized terminologies and coding schemes
27 most successful thus far: UMLS
- collection of separate terminologies built by trained experts
- massively useful for information retrieval and information integration
- UMLS Metathesaurus: a system of post hoc mappings between overlapping source vocabularies
28 for UMLS
- local usage respected
- regimentation frowned upon
- cross-framework consistency not important
- no concern to establish consistency with basic science
- different grades of formal rigor, different degrees of completeness, different update policies
29 caBIG approach: BRIDG (top-down imposition)
31 for science
- where do you find scientifically validated information linking gene products and other entities represented in biochemical databases to semantically meaningful terms pertaining to disease, anatomy, development in different model organisms?
A new approach
33 SNOMED
- Ultimately, as data become attached to the samples (e.g., pathology data, genotypes), these will be linked to the patient records.
34 where in the body? where in the cell?
what kind of organism?
what kind of disease process?
35 ontologies: high-quality controlled structured vocabularies for the annotation (description) of data
36 compare legends for diagrams
37 or chemistry diagrams
legends for chemistry diagrams
Prasanna et al., Chemical Compound Navigator: A Web-Based Chem-BLAST, Chemical Taxonomy-Based Search Engine for Browsing Compounds, PROTEINS: Structure, Function, and Bioinformatics 63:907-917 (2006)
38 Ramirez et al., Linking of Digital Images to Phylogenetic Data Matrices Using a Morphological Ontology, Syst. Biol. 56(2):283-294, 2007
39 The Network Effects of Synchronization
GlyProt
MouseEcotope
Holliday junction helicase complex
DiabetInGene
GluChem
40 Five bangs for your GO buck
- based in biological science
- incremental approach (evidence-based evolutionary pathway)
- cross-species data comparability (human, mouse, yeast, fly ...)
- cross-granularity data integration (molecule, cell, organ, organism)
- cumulation of scientific knowledge in algorithmically tractable form, links people to software
41 The methodology of annotations
- Model organism databases employ scientific curators who use the experimental observations reported in the biomedical literature to associate GO terms with entries in gene product and other molecular biology databases
- (4 mill. p.a. NIH funding)
42 How to extend the GO methodology to other domains of clinical and translational medicine?
43 the problem
existing clinical vocabularies are of variable quality and low mutual consistency
current proliferation of tiny ontologies by different groups with urgent annotation needs
44 the solution
- establish common rules governing best practices for creating ontologies in coordinated fashion, with an evidence-based pathway to incremental improvement
45 How to build an ontology
- work with scientists to create an initial top-level classification
- find 50 most commonly used terms corresponding to types in reality
- arrange these terms into an informal is_a hierarchy according to the universality principle
- A is_a B means: every instance of A is an instance of B
- fill in missing terms to give a complete hierarchy
- (leave it to domain scientists to populate the lower levels of the hierarchy)
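The universality principle above can be checked mechanically once instance data is available. The following is a minimal sketch, not the Foundry's own tooling: the term names, instance identifiers, and the helper functions `ancestors` and `universality_violations` are all hypothetical and serve only to illustrate the rule that every instance of A must also be an instance of B whenever A is_a B is asserted.

```python
# Hypothetical toy data: asserted is_a links (child -> parent).
IS_A = {
    "osteoblast": "cell",
    "neuron": "cell",
    "cell": "anatomical structure",
}

# Hypothetical instance data: the particulars recorded as instances of each type.
INSTANCES = {
    "osteoblast": {"cell_001", "cell_002"},
    "neuron": {"cell_003"},
    "cell": {"cell_001", "cell_002", "cell_003", "cell_004"},
    "anatomical structure": {"cell_001", "cell_002", "cell_003",
                             "cell_004", "organ_001"},
}


def ancestors(term, is_a=IS_A):
    """Return the types reached from `term` by following is_a links upward."""
    result = []
    while term in is_a:
        term = is_a[term]
        result.append(term)
    return result


def universality_violations(is_a=IS_A, instances=INSTANCES):
    """List (child, parent, offenders) where an instance of child is not an instance of parent."""
    violations = []
    for child, parent in is_a.items():
        missing = instances.get(child, set()) - instances.get(parent, set())
        if missing:
            violations.append((child, parent, sorted(missing)))
    return violations
```

An empty list from `universality_violations()` means the informal hierarchy is consistent with the recorded instances; any entry points at an is_a link that should be reviewed with the domain scientists.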
46 First step (2003)
- a shared portal for (so far) 58 ontologies
- (low regimentation)
- http://obo.sourceforge.net → NCBO BioPortal
48 OBO now the principal entry point for creation of web-accessible biomedical data
- OBO and OBOEdit low-tech to encourage users
- Simple (web-service-based) tools created to support the work of biologists in creating annotations (data entry)
- OBO → OWL DL converters make OBO Foundry annotated data immediately accessible to Semantic Web data integration projects
49 Second step (2004): reform efforts initiated, e.g. linking GO formally to other ontologies and data sources
GO
Cell type
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
New Definition
50 Third step (2006)
The OBO Foundry http://obofoundry.org/
52 Building out from the original GO
53 initial OBO Foundry coverage
54
- Continuants (aka endurants)
  - have continuous existence in time
  - preserve their identity through change
  - exist in toto whenever they exist at all
- Occurrents (aka processes)
  - have temporal parts
  - unfold themselves in successive phases
  - exist only in their phases
55 You are a continuant
- Your life is an occurrent
- You are 3-dimensional
- Your life is 4-dimensional
56 Dependent entities
- require independent continuants as their bearers
- There is no run without a runner
- There is no grin without a cat
57 Dependent vs. independent continuants
- Independent continuants (organisms, buildings, environments)
- Dependent continuants (quality, shape, role, propensity, function, status, power, right)
58 All occurrents are dependent entities
- They are dependent on those independent continuants which are their participants (agents, patients, media ...)
59 BFO Top-Level Ontology
Continuant
Occurrent (always dependent on one or more independent continuants)
Independent Continuant
Dependent Continuant
60 A representation of top-level types
Continuant
Occurrent
biological process
Independent Continuant
Dependent Continuant
cell component
molecular function
61 Top-Level Ontology
Continuant
Occurrent
Independent Continuant
Dependent Continuant
Side-Effect, Stochastic Process, ...
Functioning
Function
62 Top-Level Ontology
Continuant
Occurrent
Independent Continuant
Dependent Continuant
Functioning
Side-Effect, Stochastic Process, ...
Function
63 Top-Level Ontology
instances (in space and time)
64 CRITERIA
- The ontology is open and available to be used by all.
- The ontology is in, or can be instantiated in, a common formal language.
- The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.
65 CRITERIA
- UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
- ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.
66 ORTHOGONALITY
- communities must work together to ensure consistency → orthogonality → modular development plus additivity of annotations
- if we annotate a database or body of literature with one OBO Foundry ontology, we should be able to add annotations from a second such ontology without conflicts
- ontologies do not need to create tiny theories of anatomy or chemistry within themselves
67 CRITERIA
- IDENTIFIERS: The ontology possesses a unique identifier space within OBO.
- VERSIONING: The ontology provider has procedures for identifying distinct successive versions.
- The ontology includes textual definitions for all terms.
68 CRITERIA
- CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content.
- DOCUMENTATION: The ontology is well-documented.
- USERS: The ontology has a plurality of independent users.
69 CRITERIA
- COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology
70 Consequences
- OBO Foundry is serving as a benchmark for improvements in discipline-focused terminology resources
- yielding calibration of existing terminologies and data resources and alignment of different views
71 Foundry ontologies all work in the same way
- all are built to represent the types existing in a pre-existing domain and the relations between these types in a way which can support reasoning
- we have data
- we need to make this data available for semantic search and algorithmic processing
- we create a consensus-based ontology for annotating the data
- and ensure that it can interoperate with Foundry ontologies for neighboring domains
72 Mature OBO Foundry ontologies (now undergoing reform)
- Cell Ontology (CL)
- Chemical Entities of Biological Interest (ChEBI)
- Foundational Model of Anatomy (FMA)
- Gene Ontology (GO)
- Phenotypic Quality Ontology (PaTO)
- Relation Ontology (RO)
- Sequence Ontology (SO)
73 Ontologies being built to satisfy Foundry principles ab initio
- Ontology for Clinical Investigations (OCI)
- Common Anatomy Reference Ontology (CARO)
- Ontology for Biomedical Investigations (OBI)
- Protein Ontology (PRO)
- RNA Ontology (RnaO)
- Subcellular Anatomy Ontology (SAO)
74 Ontologies in planning phase
- Biobank/Biorepository Ontology (BrO, part of OBI)
- Environment Ontology (EnvO)
- Immunology Ontology (ImmunO)
- Infectious Disease Ontology (IDO)
- Mouse Adult Neurogenesis Ontology (MANGO)
75 OBO Foundry provides a method for handling legacy databases
76 Senselab/NeuronDB
- NeuronDB comprehends three types of neuronal properties
  - voltage gated conductances
  - neurotransmitter receptors
  - neurotransmitter substances
- Many questions immediately arise: what are receptors? Proteins? Protein complexes? The Foundry framework provides an opportunity to evaluate such choices.
http://senselab.med.yale.edu/
77 Senselab/NeuronDB
- The GO Molecular Function (MF) ontology already has classes such as receptor activity (GO_0004872) plus subclasses describing receptor activities already referred to in NeuronDB.
- This provides a roadmap for further development. Review the 130 receptor classes to see if they exist in MF; where not, create subclasses and submit to GO for future inclusion. We can then, e.g., take advantage of GO Annotations to find the proteins that correspond to these receptor classes in different species.
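The review step described above amounts to splitting a local class list into terms already present in MF and terms to propose. A minimal sketch under stated assumptions: the function `review_classes` is hypothetical, matching is by case-insensitive label only (real curation would also compare definitions and synonyms), and apart from "receptor activity" the labels are placeholders, not actual GO or NeuronDB terms.

```python
def review_classes(local_classes, mf_labels):
    """Split local class labels into (already in MF, to propose as new subclasses).

    Matching is by normalized label only; this is an illustrative shortcut,
    not how GO curators actually decide term equivalence.
    """
    normalized = {label.strip().lower() for label in mf_labels}
    existing, to_propose = [], []
    for label in local_classes:
        if label.strip().lower() in normalized:
            existing.append(label)
        else:
            to_propose.append(label)
    return existing, to_propose


# Placeholder labels for illustration only.
mf_labels = {"receptor activity", "placeholder receptor A activity"}
local_classes = [
    "Receptor activity",
    "placeholder receptor A activity",
    "placeholder receptor B activity",
]
existing, to_propose = review_classes(local_classes, mf_labels)
```

Here `to_propose` holds the classes that would be drafted as new subclasses and submitted to GO; `existing` holds those that can be annotated with current MF terms straight away.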
78 OBO Foundry Success Story
- Model organism research seeks results valuable for the understanding of human disease.
- This requires the ability to make reliable cross-species comparisons, and for this anatomy is crucial.
- But different MOD communities have developed their anatomy ontologies in uncoordinated fashion.
79 Multiple axes of classification
Functional: cardiovascular system, nervous system
Spatial: head, trunk, limb
Developmental: endoderm, germ ring, lens placode
Structural: tissue, organ, cell
Stage: developmental staging series
80
- Developmental terms are often lumped together for lack of a way to categorize them
- Stages are represented in a variety of ways. Terms can be children of superstages, stages can be integrated into each term, or stages can be assigned to terms from a separate ontology
81 Ontologies facilitate grouping of annotations
brain: 20
hindbrain: 15
rhombomere: 10
Query "brain" without ontology: 20
Query "brain" with ontology: 45
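The counts on this slide can be reproduced with a small sketch. The parent links below are an assumption (the slide does not say whether the relation is part_of or is_a), and the function names are hypothetical; only the three annotation counts come from the slide.

```python
# Assumed parent links between anatomy terms (relation type not given on the slide).
PARENT = {"hindbrain": "brain", "rhombomere": "hindbrain"}

# Direct annotation counts, taken from the slide.
DIRECT = {"brain": 20, "hindbrain": 15, "rhombomere": 10}


def is_under(term, query, parent=PARENT):
    """True if `term` is `query` itself or lies below it in the hierarchy."""
    while term is not None:
        if term == query:
            return True
        term = parent.get(term)
    return False


def count_without_ontology(query, direct=DIRECT):
    """Only annotations made directly to the query term are found."""
    return direct.get(query, 0)


def count_with_ontology(query, direct=DIRECT, parent=PARENT):
    """Annotations to the query term and to every term below it are found."""
    return sum(n for term, n in direct.items() if is_under(term, query, parent))
```

With these inputs, `count_without_ontology("brain")` gives 20 and `count_with_ontology("brain")` gives 45, matching the two query results on the slide.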
82 CARO: Common Anatomy Reference Ontology
- for the first time provides guidelines for model organism researchers who wish to achieve comparability of annotations
- for the first time provides guidelines for those new to ontology work
- See Haendel et al., CARO: The Common Anatomy Reference Ontology, in Burger (ed.), Anatomy Ontologies for Bioinformatics, Springer, in press.
83 CARO-conformant ontologies already in development
- Fish Multi-Species Anatomy Ontology (NSF funding received)
- Ixodidae and Argasidae (Tick) Anatomy Ontology
- Mosquito Anatomy Ontology (MAO)
- Spider Anatomy Ontology
- Xenopus Anatomy Ontology (XAO)
- undergoing reform: Drosophila and Zebrafish Anatomy Ontologies
84 OBI / OCI
- Ontology for Biomedical Investigations
  - overarching terminology resource for MIBBI Foundry
- Ontology for Clinical Investigations
  - collaboration with EPOCH ontology for clinical trial management
  - and with CDISC (FDA mandated vocabulary for clinical trial reports)
85 next step: repertoire of disease ontologies built out of OBO Foundry elements
86 Scope of Draft Ontology for Multiple Sclerosis