Title: The OBO Foundry
Slide 1
The OBO Foundry
Ontology: A Vision for the Future and Its Realization
- Barry Smith
- University at Buffalo
- http://ontology.buffalo.edu/smith
Slide 2
we are accumulating huge amounts of sequence data, image data, pharma data, ...
- how do we know what data we have?
- how do I know what data you have?
- how do we know what data we don't have?
- how do we make different sorts of data combinable, as we need to do in large domains such as neurodevelopment, immunology, cancer ...?
Slide 3
genomic medicine, molecular medicine, translational medicine, personalized medicine ... need
- methods for data integration to enable reasoning across data at multiple granularities
to identify biomedically relevant relations on the side of the entities themselves
Slide 5
where in the body?
what kind of disease process?
we need semantic annotation of data
we need ontologies
Slide 6
how to create broad-coverage semantic annotation systems for biomedicine?
- Semantic Web, Moby, wikis, etc.
- let a million flowers (and weeds) bloom
- to create integration, rely on (automatically generated?) post hoc mappings
Slide 7
most successful thus far: UMLS
- built by trained experts
- massively useful for information retrieval and information integration
- UMLS Metathesaurus: a system of post hoc mappings between separately built source vocabularies
Slide 9
UMLS-based mappings fall short of creating interoperability
- because local usage is respected
- regimentation frowned upon, no concern for cross-framework consistency
- UMLS terminologies have different grades of formal rigor, different degrees of completeness, different update policies
Slide 10
with UMLS-based annotations
- we can know what data we have (via term searches), but it is noisy
- we can map between data at single granularities (via synonyms), but synonymy information is noisy
- how do we know what data we don't have?
- how do we reason with data (as at the molecular level), when there is no common logical backbone?
Slide 11
for science
what is to be done
- to develop high-quality annotation resources in a collaborative, community effort?
- create an evolutionary path towards improvement of terminologies, of the sort we find elsewhere in science
- find ways to reward early adopters of the results
Slide 12
for science
- science works out from a consensus core, and strives to isolate and resolve inconsistencies as it extends at the fringes
- we need to create a consensus core
- start with what for human beings are trivialities (low hanging fruit) and work out from there
for science, consistency is a sine qua non
Slide 13
[Figure: fragment of the Foundational Model of Anatomy (FMA), with nodes linked by is_a and part_of relations. Node labels: Anatomical Space, Anatomical Structure, Organ Cavity Subdivision, Organ Cavity, Organ, Serous Sac, Organ Component, Serous Sac Cavity, Tissue, Serous Sac Cavity Subdivision, Pleural Sac, Pleura (Wall of Sac), Pleural Cavity, Parietal Pleura, Visceral Pleura, Interlobar recess, Mediastinal Pleura, Mesothelium of Pleura.]
Slide 14
for science
- include ontologies corresponding to the basic biomedical sciences in the core
clinical medicine relies on anatomy and molecular biology to provide integration across medical specialisms
Slide 15
for science
- where do we find scientifically validated information linking gene products and other entities represented in biochemical databases to semantically meaningful terms pertaining to disease, anatomy, development, histology in different model organisms?
but we need more
Slide 17
what makes GO so wildly successful?
Slide 18
The methodology of annotations
- science basis of the GO: trained experts curating peer-reviewed literature
- different model organism databases employ scientific curators who use the experimental observations reported in the biomedical literature to associate GO terms with gene products in a coordinated way
Slide 19
A set of standardized textual descriptions of
- cellular locations
- molecular functions
- biological processes
used to annotate the entities represented in the major biochemical databases,
thereby creating integration across these databases and making them available to semantic search
Slide 20
what cellular component?
what molecular function?
what biological process?
Slide 21
This process
- leads to improvements and extensions of the ontology
- which in turn leads to better annotations
- → a virtuous cycle of improvement in the quality and reach of both future annotations and the ontology itself
- RESULT: a slowly growing computer-interpretable map of biological reality within which major databases are automatically integrated in semantically searchable form
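
The way shared terms integrate separately curated databases (slides 18 to 21) can be pictured with a small sketch. It is only an illustration: the record layout, the database names, the gene product names and the reference placeholders are all invented here, and the two GO identifiers (GO:0005634, nucleus; GO:0003677, DNA binding) serve purely as example labels.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    database: str      # hypothetical source database
    gene_product: str  # hypothetical gene product name
    go_term: str       # GO term identifier used as the shared label
    reference: str     # placeholder for the curated literature reference


# Invented example records; two databases happen to use the same GO term.
ANNOTATIONS = [
    Annotation("DatabaseA", "geneA1", "GO:0005634", "ref-1"),
    Annotation("DatabaseB", "geneB7", "GO:0005634", "ref-2"),
    Annotation("DatabaseB", "geneB9", "GO:0003677", "ref-3"),
]


def index_by_term(annotations):
    """Group (database, gene product) pairs under the term they share."""
    index = defaultdict(list)
    for annotation in annotations:
        index[annotation.go_term].append(
            (annotation.database, annotation.gene_product)
        )
    return dict(index)


def databases_for_term(index, go_term):
    """List the databases that become searchable through one term."""
    return sorted({database for database, _ in index.get(go_term, [])})


index = index_by_term(ANNOTATIONS)
nucleus_hits = index["GO:0005634"]
nucleus_databases = databases_for_term(index, "GO:0005634")
```

A single term search now reaches entries in both invented databases, which is the sense in which common annotation terms make the databases "automatically integrated".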
Slide 22
Five bangs for your GO buck
- science base
- cross-species database integration
- cross-granularity database integration
- through links to the things which are of biomedical relevance
- semantic searchability links people to software
Slide 23
but now
need to improve the quality of GO to support more rigorous logic-based reasoning across the data annotated in its terms
need to extend the GO by engaging ever broader community support for the addition of new terms and for the correction of errors
Slide 24
but also
need to extend the methodology to other domains, including clinical domains
→ need for disease ontology, immunology ontology, symptom (phenotype) ontology, clinical trial ontology ...
Slide 25
the problem
existing clinical vocabularies are of variable quality and low mutual consistency
need for prospective standards to ensure mutual consistency and high quality of clinical counterparts of GO
need to ensure consistency of the new clinical ontologies with the basic biomedical sciences
if we do not start now, the problem will only get worse
Slide 26
the solution
- establish common rules governing best practices for creating ontologies and for using these in annotations
- apply these rules to create a complete suite of orthogonal interoperable biomedical reference ontologies
- this solution is already being implemented
Slide 27
First step (2003)
- a shared portal for (so far) 58 ontologies
- (low regimentation)
- http://obo.sourceforge.net → NCBO BioPortal
Slide 29
Second step (2004): reform efforts initiated, e.g. linking GO to other OBO ontologies to ensure orthogonality
[Figure labels: GO, Cell type, New Definition]
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
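
One way to read the osteoblast example is that the process term no longer describes the cell type itself but points to a term in a separate cell type ontology. The sketch below illustrates that reading under stated assumptions: `CELL_TYPES`, `DifferentiationTerm` and `compose_definition` are hypothetical names, the only cell type definition used is the one quoted on the slide, and the article "an" is hard-coded for this one example.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a cell type ontology: term name -> definition.
# The osteoblast wording is the one quoted on the slide.
CELL_TYPES = {
    "osteoblast": "a bone-forming cell which secretes extracellular matrix",
}


@dataclass(frozen=True)
class DifferentiationTerm:
    name: str
    source_cell_types: tuple  # phrases naming the starting cell types
    result_cell_type: str     # must be a key of the cell type ontology


def compose_definition(term, cell_types):
    """Build the process definition by reusing the cell type definition."""
    if term.result_cell_type not in cell_types:
        raise KeyError(f"unknown cell type: {term.result_cell_type}")
    sources = " or ".join(term.source_cell_types)
    return (
        f"Processes whereby {sources} acquires the specialized features of "
        f"an {term.result_cell_type}, {cell_types[term.result_cell_type]}."
    )


osteoblast_differentiation = DifferentiationTerm(
    name="osteoblast differentiation",
    source_cell_types=("an osteoprogenitor cell", "a cranial neural crest cell"),
    result_cell_type="osteoblast",
)
definition = compose_definition(osteoblast_differentiation, CELL_TYPES)
```

Because the cell type wording lives in one place, a correction to the cell type ontology would flow into every process definition that refers to it.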
Slide 30
Third step (2006)
The OBO Foundry http://obofoundry.org/
Slide 31
The OBO Foundry
- a family of interoperable gold standard biomedical reference ontologies to serve the annotation of, inter alia,
  - scientific literature
  - model organism databases
  - clinical trial data
Slide 32
A prospective standard
- designed to guarantee interoperability of ontologies from the very start (contrast to post hoc mapping)
- established March 2006
- 12 initial candidate OBO ontologies focused primarily on basic science domains
- several being constructed ab initio
- by influential consortia who have the authority to impose their use on large parts of the relevant communities.
Slide 33
- GO: Gene Ontology
- ChEBI: Chemical Ontology
- CL: Cell Ontology
- FMA: Foundational Model of Anatomy
- PaTO: Phenotype Quality Ontology
- SO: Sequence Ontology
- CARO: Common Anatomy Reference Ontology
- CTO: Clinical Trial Ontology
- FuGO: Functional Genomics Investigation Ontology
- PrO: Protein Ontology
- RnaO: RNA Ontology
- RO: Relation Ontology
[slide label: new]
Slide 34
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
Slide 35
GRANULARITY \ RELATION TO TIME | CONTINUANT, INDEPENDENT | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy?) | Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO) | Phenotypic Quality (PaTO) | Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL) | Cellular Component (FMA, GO) | Cellular Function (GO) | Phenotypic Quality (PaTO) | Cellular Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Function (GO) | Molecular Process (GO)
Annotations plus ontologies yield an ever-growing computer-interpretable map of biological reality.
Slide 36
GRANULARITY \ RELATION TO TIME | CONTINUANT, INDEPENDENT | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy?) | Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO) | Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL) | Cellular Component (FMA, GO) | Cellular Function (GO) | Phenotypic Quality (PaTO) | Biological Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Function (GO) | Molecular Process (GO)
Building out from the original GO
Slide 37
Under consideration
- Disease Ontology (DO)
- Biomedical Image Ontology (BIO)
- Upper Biomedical Ontology (OBO UBO)
- Environment Ontology (EnvO)
- Systems Biology Ontology (SBO)
Slide 38
- OBO Foundry: a subset of OBO ontologies, whose developers have agreed in advance to accept a common set of principles reflecting best practice in ontology development designed to ensure
  - tight connection to the biomedical basic sciences
  - compatibility
  - interoperability, common relations
  - formal robustness
  - support for logic-based reasoning
Slide 39
CRITERIA
- The ontology is OPEN and available to be used by all.
- The ontology is in, or can be instantiated in, a COMMON FORMAL LANGUAGE.
- The developers of the ontology agree in advance to COLLABORATE with developers of other OBO Foundry ontologies where domains overlap.
Slide 40
CRITERIA
- UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
- ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.
Slide 41
for science
- if we annotate a database or body of literature with one high-quality biomedical ontology, we should be able to add annotations from a second such ontology without conflicts
orthogonality of ontologies implies additivity of annotations
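
A minimal sketch of what "additivity" could mean in practice, assuming that two ontologies count as orthogonal when no term label is defined in both. The term identifiers (`PROC:1`, `CELL:1`, `ALT:9`), the record names and both helper functions are hypothetical and not part of any OBO tool.

```python
def overlapping_labels(ontology_a, ontology_b):
    """Return labels defined in both ontologies, compared case-insensitively."""
    labels_a = {label.lower() for label in ontology_a.values()}
    labels_b = {label.lower() for label in ontology_b.values()}
    return labels_a & labels_b


def combine_annotations(first, second):
    """Add the second annotation set to the first, record by record."""
    combined = {record: set(terms) for record, terms in first.items()}
    for record, terms in second.items():
        combined.setdefault(record, set()).update(terms)
    return combined


# Invented vocabularies: identifier -> label.
PROCESS_TERMS = {"PROC:1": "osteoblast differentiation"}
CELL_TERMS = {"CELL:1": "osteoblast"}
CLASHING_TERMS = {"ALT:9": "Osteoblast"}  # redefines a label CELL_TERMS owns

process_annotations = {"record-1": {"PROC:1"}}
cell_annotations = {"record-1": {"CELL:1"}, "record-2": {"CELL:1"}}

no_clash = overlapping_labels(PROCESS_TERMS, CELL_TERMS)
clash = overlapping_labels(CELL_TERMS, CLASHING_TERMS)
combined = combine_annotations(process_annotations, cell_annotations)
```

When the overlap is empty, the second set of annotations only adds information to each record; a non-empty overlap flags two vocabularies competing for the same domain.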
Slide 42
CRITERIA
- IDENTIFIERS: The ontology possesses a unique identifier space within OBO.
- VERSIONING: The ontology provider has procedures for identifying distinct successive versions to ensure BACKWARDS COMPATIBILITY with annotation resources already in common use.
- The ontology includes TEXTUAL DEFINITIONS and where possible equivalent formal definitions of its terms.
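
The identifier and versioning criteria lend themselves to simple automated checks. The sketch below assumes identifiers of the form `PREFIX:LOCAL` and reads backwards compatibility as "no identifier from an earlier version disappears; retired terms are kept and flagged obsolete". The `EX` prefix, the term records and the function names are all made up for illustration.

```python
def id_space(term_id):
    """Return the prefix of an identifier written as PREFIX:LOCAL."""
    prefix, separator, local = term_id.partition(":")
    if not separator or not prefix or not local:
        raise ValueError(f"malformed identifier: {term_id!r}")
    return prefix


def foreign_ids(terms, expected_space):
    """Identifiers that fall outside the ontology's own identifier space."""
    return sorted(t for t in terms if id_space(t) != expected_space)


def dropped_ids(old_terms, new_terms):
    """Identifiers present in the old version but missing from the new one."""
    return sorted(set(old_terms) - set(new_terms))


# Invented versions of a hypothetical ontology with identifier space "EX".
VERSION_1 = {
    "EX:0000001": {"name": "alpha", "obsolete": False},
    "EX:0000002": {"name": "beta", "obsolete": False},
}
VERSION_2 = {
    "EX:0000001": {"name": "alpha", "obsolete": False},
    "EX:0000002": {"name": "beta", "obsolete": True},   # retired, not deleted
    "EX:0000003": {"name": "gamma", "obsolete": False},
}
VERSION_2_BROKEN = {
    "EX:0000001": {"name": "alpha", "obsolete": False},
    "OTHER:0000003": {"name": "gamma", "obsolete": False},
}

compatible = dropped_ids(VERSION_1, VERSION_2)
broken = dropped_ids(VERSION_1, VERSION_2_BROKEN)
strays = foreign_ids(VERSION_2_BROKEN, "EX")
```

Existing annotations that cite `EX:0000002` still resolve against `VERSION_2`, whereas `VERSION_2_BROKEN` would leave them dangling.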
Slide 43
CRITERIA
- CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content.
- DOCUMENTATION: The ontology is well-documented.
- USERS: The ontology has a plurality of independent users.
Slide 44
CRITERIA
- COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
- Smith et al., Genome Biology 2005, 6:R46
Slide 45
OBO Relation Ontology
Foundational: is_a, part_of
Spatial: located_in, contained_in, adjacent_to
Temporal: transformation_of, derives_from, preceded_by
Participation: has_participant, has_agent
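
Unambiguously defined relations are what make mechanical inference possible. As a small sketch, the code below treats is_a and part_of as transitive and computes everything a term is related to by following one relation upward. The node labels are taken from the FMA figure on slide 13, but the specific edges are assumptions made for this example, not a transcription of the FMA.

```python
TRANSITIVE_RELATIONS = {"is_a", "part_of"}

# Illustrative edges (child, relation, parent); assumed for this sketch.
EDGES = [
    ("Pleural Sac", "is_a", "Serous Sac"),
    ("Serous Sac", "is_a", "Organ"),
    ("Organ", "is_a", "Anatomical Structure"),
    ("Parietal Pleura", "part_of", "Pleura"),
    ("Pleura", "part_of", "Pleural Sac"),
]


def ancestors(term, relation, edges):
    """Follow one transitive relation upward and collect every term reached."""
    if relation not in TRANSITIVE_RELATIONS:
        raise ValueError(f"{relation} is not treated as transitive here")
    direct = {}
    for child, edge_relation, parent in edges:
        if edge_relation == relation:
            direct.setdefault(child, set()).add(parent)
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in direct.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


sac_kinds = ancestors("Pleural Sac", "is_a", EDGES)
pleura_wholes = ancestors("Parietal Pleura", "part_of", EDGES)
```

A relation such as adjacent_to is deliberately rejected by this helper, since chaining it would produce conclusions the relation does not license.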
Slide 46
IT WILL GET HARDER
- Further criteria will be added over time in light of lessons learned in order to bring about a gradual improvement in the quality of Foundry ontologies
- ALL FOUNDRY ONTOLOGIES WILL BE SUBJECT TO CONSTANT UPDATE IN LIGHT OF SCIENTIFIC ADVANCE
Slide 47
IT WILL GET HARDER
- But not everyone needs to join
- The Foundry is not seeking to serve as a check on flexibility or creativity
- ALL FOUNDRY ONTOLOGIES WILL ENCOURAGE COMMUNITY CRITICISM, CORRECTION AND EXTENSION WITH NEW TERMS
Slide 48
GOALS
- to introduce some of the features of SCIENTIFIC PEER REVIEW into biomedical ontology development
- CREDIT for high quality ontology development work
- KUDOS for early adopters of high quality ontologies / terminologies, e.g. in reporting clinical trial results
Slide 49
GOALS
- to provide a FRAMEWORK OF RULES to counteract the current policy of ad hoc creation of new annotation schemas by each clinical research group
- REUSABILITY: if data schemas are formulated using a single well-integrated framework ontology system in widespread use, then this data will to this degree itself become more widely accessible and usable
Slide 50
GOALS
- to serve as BENCHMARK FOR IMPROVEMENTS in discipline-focused terminology resources
- once a system of interoperable reference ontologies is there, it will make sense to calibrate existing terminologies in its terms in order to achieve more robust alignment and greater domain coverage
- exploit the avenue of EVIDENCE-BASED MEDICINE (NIH CLINICAL RESEARCH NETWORKS) to foster their use by clinicians
Slide 51
the vision is spreading
- June 2006: establishment of MICheck
- reflects growing need for prescriptive checklists specifying the key information to include when reporting experimental results (concerning methods, data, analyses and results).
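
A minimum information checklist can be thought of as a list of sections a report must fill in. The sketch below is one hypothetical way to check a report against such a list; the four section names come from the slide, while the dictionary layout and the function name `missing_sections` are assumptions and do not describe any actual MICheck artefact.

```python
# Section names taken from the slide; the data layout is an assumption.
REQUIRED_SECTIONS = ("methods", "data", "analyses", "results")


def missing_sections(report, required=REQUIRED_SECTIONS):
    """Return required sections that are absent or contain only whitespace."""
    return [
        section
        for section in required
        if not str(report.get(section, "")).strip()
    ]


# Invented example report with one absent and one empty section.
draft_report = {
    "methods": "protocol summary",
    "data": "link to raw files",
    "results": "   ",
}
gaps = missing_sections(draft_report)
```

Returning the gaps in checklist order keeps the feedback to an author predictable, which matters when several checklist modules are combined.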
Slide 52
MICheck Foundry
- MICheck: a common resource for minimum information checklists analogous to OBO / NCBO BioPortal
- MICheck Foundry will create a suite of self-consistent, clearly bounded, orthogonal, integrable checklist modules
- Taylor CF, et al. Nature Biotech, in press
Slide 53
MICheck/Foundry communities
- Transcriptomics (MIAME Working Group)
- Proteomics (Proteomics Standards Initiative)
- Metabolomics (Metabolomics Standards Initiative)
- Genomics and Metagenomics (Genomic Standards Consortium)
- In Situ Hybridization and Immunohistochemistry (MISFISHIE Working Group)
- Phylogenetics (Phylogenetics Community)
- RNA Interference (RNAi Community)
- Toxicogenomics (Toxicogenomics WG)
- Environmental Genomics (Environmental Genomics WG)
- Nutrigenomics (Nutrigenomics WG)
- Flow Cytometry (Flow Cytometry Community)
Slide 54
Fourth Step (the future)
- how to replicate the successes of the GO in clinical medicine?
- choose two or three representative disease domains
- work out reasoning challenges for those domains
- work with specialists to create ontologies interoperable with OBO Foundry basic science ontologies to address these reasoning challenges
- work with leaders of professional associations and of clinical trial initiatives to foster the collection of clinical data annotated in their terms
Slide 55
 | CONTINUANT, INDEPENDENT | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy?) | Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO) | Phenotypic Quality (PaTO) | Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL) | Cellular Component (FMA, GO) | Cellular Function (GO) | Phenotypic Quality (PaTO) | Cellular Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Function (GO) | Molecular Process (GO)
OBO Foundry coverage (canonical ontologies)
Slide 56
INDEPENDENT CONTINUANTS
- organism
- system
- organ
- organ part
- tissue
- cell
- acellular anatomical structure
- biological molecule
- genome
DEPENDENT CONTINUANTS
- physiology (functions)
- pathology: acute stage, progressive stage, resolution stage
Slide 57
Draft Ontology for Acute Respiratory Distress Syndrome
Slide 58
Draft Ontology for Muscular Sclerosis
Slide 59
Draft Ontology for Muscular Sclerosis
to apprehend what is unknown requires a complete demarcation of the relevant space of alternatives
Slide 60
with thanks (inter alia) to
Michael Ashburner, Carol Bean, Judy Blake, Olivier Bodenreider, William Bug, Werner Ceusters, Lindsay Cowell, Louis Goldberg, Frank Hartel, David Hill, Anand Kumar, Suzi Lewis, Jane Lomax, Onard Mejino, Chris Mungall, Mark Musen, NCBO Team, Fabian Neuhaus, Alan Rector, Cornelius Rosse, Karen Skinner, Kent Spackman
University at Buffalo