The OBO Foundry - PowerPoint PPT Presentation

Ontology: A Vision for the Future and Its Realization. The OBO Foundry. Barry Smith, University at Buffalo, http://ontology.buffalo.edu/smith

Transcript and Presenter's Notes


1
The OBO Foundry
Ontology: A Vision for the Future and Its
Realization
  • Barry Smith
  • University at Buffalo
  • http://ontology.buffalo.edu/smith

2
  • how do we know what data we have?
  • how do I know what data you have?
  • how do we make different sorts of data
    combinable, as we need to do in large domains
    such as neurodevelopment, immunology, cancer ...?

we are accumulating huge amounts of sequence
data, image data, pharma data, ...
3
  • genomic medicine, molecular medicine,
    translational medicine ... need
  • methods for data integration to enable reasoning
    across data at multiple granularities

This is not a database problem: biomedically
relevant data need to be identified as focus
4
(No Transcript)
5
we need to know where in the body, where in the
cell
we need to know what kind of disease process
we need semantic annotation of data
we need ontologies
6
  • Semantic Web, Moby, wikis, etc.
  • let a million flowers (weeds) bloom
  • to create integration, rely on (automatically
    generated?) post hoc mappings

how do we create broad-coverage semantic annotation
systems for biomedicine?
7
most successful, thus far: UMLS
  • built by trained experts
  • massively useful for information retrieval and
    information integration
  • MeSH: creates out of literature a semantically
    searchable space
  • UMLS Metathesaurus: a system of post hoc mappings
    between independent source vocabularies

8
for UMLS
  • local usage respected
  • regimentation frowned upon
  • cross-framework consistency not important
  • no concern to establish consistency with basic
    science
  • different grades of formal rigor, different
    degrees of completeness, different update policies

9
with UMLS-based annotations
  • we can know what data we have (via term searches)
  • we can map between data at single granularities
    (via synonyms)
  • how do we combine data across granularities?
  • but how do we resolve logical conflicts?
  • how do we know what data we don't have?
  • how do we reason with data?

10
for science
  • how do we ensure current annotation efforts are
    not wasted because the terminologies used become
    obsolete?
  • develop high quality annotation resources in a
    collaborative, community effort
  • create an evolutionary path towards improvement,
    of the sort we find elsewhere in science

a new approach
11
for science
  • science works out from a consensus core, and
    strives to isolate and resolve inconsistencies as
    it extends its outer fringes
  • we need to create a consensus core
  • starting with what for human beings are
    trivialities (low hanging fruit) and working
    outwards from there

for science, consistency is the sine qua non
12
[Figure: fragment of the Foundational Model of Anatomy (FMA), with nodes linked by is_a and part_of relations. Node labels: Anatomical Space, Anatomical Structure, Organ Cavity Subdivision, Organ Cavity, Organ, Serous Sac, Organ Component, Serous Sac Cavity, Tissue, Serous Sac Cavity Subdivision, Pleural Sac, Pleura (Wall of Sac), Pleural Cavity, Parietal Pleura, Visceral Pleura, Interlobar recess, Mediastinal Pleura, Mesothelium of Pleura]
13
for science
  • where do you find scientifically validated
    information linking gene products and other
    entities represented in biochemical databases to
    semantically meaningful terms pertaining to
    disease, anatomy, development in different model
    organisms?

a new approach
14
(No Transcript)
15
(No Transcript)
16
  • science basis of the GO: trained experts curating
    peer-reviewed literature
  • Model organism databases employ scientific
    curators who use the experimental observations
    reported in the biomedical literature to
    associate GO terms with gene products in a
    regimented way

The methodology of annotations
17
  • cellular locations
  • molecular functions
  • biological processes
  • used to annotate the entities represented in the
    major biochemical databases
  • thereby creating integration across these
    databases

A set of standardized textual descriptions of
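
The integration described on this slide can be sketched in a few lines. This is a minimal illustration only, not the GO Consortium's annotation format: the record fields, the database names (db_A, db_B), the entry identifiers and the term identifiers (EX:...) are all hypothetical placeholders. It shows how entries from independent databases become jointly searchable once they are annotated with the same term.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    database: str   # source database (hypothetical name)
    entry_id: str   # gene product identifier in that database (hypothetical)
    term_id: str    # ontology term identifier (placeholder, not a real GO ID)
    aspect: str     # cellular_component, molecular_function or biological_process
    reference: str  # literature reference supporting the annotation (placeholder)


def index_by_term(annotations):
    """Group database entries by the shared ontology term they are annotated with."""
    index = defaultdict(set)
    for a in annotations:
        index[a.term_id].add((a.database, a.entry_id))
    return index


annotations = [
    Annotation("db_A", "A:001", "EX:0000001", "biological_process", "ref-1"),
    Annotation("db_B", "B:042", "EX:0000001", "biological_process", "ref-2"),
    Annotation("db_B", "B:107", "EX:0000002", "cellular_component", "ref-3"),
]

index = index_by_term(annotations)
# Entries from two independent databases meet at the same term.
print(sorted(index["EX:0000001"]))
```

A search on one term then returns entries from every database annotated with it, which is the sense in which shared annotation creates integration across databases.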
18
what cellular component?
what molecular function?
what biological process?
19
This process
  • leads to improvements and extensions of the
    ontology
  • which in turn leads to better annotations
  • → a virtuous cycle of improvement in the quality
    and reach of both future annotations and the
    ontology itself
  • RESULT: a slowly growing computer-interpretable
    map of biological reality within which major
    databases are automatically integrated in
    semantically searchable form

20
But now
need to improve the quality of the GO to support
more rigorous logic-based reasoning across the
data annotated in its terms
need to improve the extent of the GO by engaging
ever broader community support for the addition
of new terms and for the correction of errors
21
But also
need to extend the methodology to other domains,
including clinical trials, clinical records,
literature of clinical medicine
→ need disease, symptom (phenotype) ontologies
22
the problem
existing clinical vocabularies are of variable
quality and low mutual consistency
need for prospective standards to ensure mutual
consistency and high quality of clinical
counterparts of GO
need to ensure consistency of the new clinical
ontologies with the basic biomedical sciences
if we do not start now, the problem will only get
worse
23
(No Transcript)
24
the solution
  • establish common rules governing best practices
    for creating ontologies and for using these in
    annotations
  • apply these rules to create a complete suite of
    orthogonal interoperable biomedical reference
    ontologies
  • this solution is already being implemented

25
First step (2003)
  • a shared portal for (so far) 58 ontologies
  • (low regimentation)
  • http://obo.sourceforge.net → NCBO BioPortal

26
(No Transcript)
27
Second step (2004): reform efforts initiated,
e.g. linking GO to other OBO ontologies to ensure
orthogonality
GO

Cell type

Osteoblast differentiation: Processes whereby an
osteoprogenitor cell or a cranial neural crest
cell acquires the specialized features of an
osteoblast, a bone-forming cell which secretes
extracellular matrix.
New Definition
28
Third step (2006)
The OBO Foundry http://obofoundry.org/
29
  • a family of interoperable gold standard
    biomedical reference ontologies to serve the
    annotation of
  • scientific literature
  • model organism databases
  • clinical data
  • experimental results

The OBO Foundry
30
  • A subset of OBO ontologies, whose developers have
    agreed in advance to accept a common set of
    principles designed to ensure
  • tight connection to the biomedical basic
    sciences
  • compatibility
  • interoperability
  • formal robustness
  • support for logic-based reasoning

31
A prospective standard
  • designed to guarantee interoperability of
    ontologies from the very start (contrast to post
    hoc mapping)
  • established March 2006
  • 12 initial candidate OBO ontologies focused
    primarily on basic science domains
  • several being constructed ab initio

32
  • GO Gene Ontology
  • ChEBI Chemical Ontology
  • CL Cell Ontology
  • FMA Foundational Model of Anatomy
  • PaTO Phenotype Quality Ontology
  • SO Sequence Ontology
  • CARO Common Anatomy Reference Ontology
  • CTO Clinical Trial Ontology
  • FuGO Functional Genomics Investigation Ontology
  • PrO Protein Ontology
  • RnaO RNA Ontology
  • RO Relation Ontology

new
33
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
34
RELATION TO TIME / GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy) | Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO) | Phenotypic Quality (PaTO) | Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL) | Cellular Component (FMA, GO) | Cellular Function (GO) | Phenotypic Quality (PaTO) | Cellular Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Function (GO) | Molecular Process (GO)
Annotations plus ontologies yield an ever-growing
computer-interpretable map of biological
reality.
35
RELATION TO TIME / GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy) | Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO) | Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL) | Cellular Component (FMA, GO) | Cellular Function (GO) | Phenotypic Quality (PaTO) | Biological Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Function (GO) | Molecular Process (GO)
Building out from the original GO
36
| CONTINUANT, INDEPENDENT | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy) | Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO) | Phenotypic Quality (PaTO) | Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL) | Cellular Component (FMA, GO) | Cellular Function (GO) | Phenotypic Quality (PaTO) | Cellular Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Function (GO) | Molecular Process (GO)
OBO Foundry coverage (canonical ontologies)
37
  • Disease Ontology (DO)
  • Biomedical Image Ontology (BIO)
  • Upper Biomedical Ontology (OBO UBO)
  • Environment Ontology (EnvO)
  • Systems Biology Ontology (SBO)

Under consideration
38
CRITERIA
  • The ontology is open and available to be used by
    all.
  • The ontology is in, or can be instantiated in, a
    common formal language.
  • The developers of the ontology agree in advance
    to collaborate with developers of other OBO
    Foundry ontologies where domains overlap.

CRITERIA
39
  • UPDATE: The developers of each ontology commit to
    its maintenance in light of scientific advance,
    and to soliciting community feedback for its
    improvement.
  • ORTHOGONALITY: They commit to working with other
    Foundry members to ensure that, for any
    particular domain, there is community convergence
    on a single controlled vocabulary.

CRITERIA
40
for science
  • communities must work together to ensure
    consistency → orthogonality → additivity of
    annotation frameworks
  • ADDITIVITY: if we annotate a database or body of
    literature with one high-quality biomedical
    ontology, we should be able to add annotations
    from a second such ontology without conflicts

GLOSS ON ORTHOGONALITY
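
Additivity can be illustrated with a small sketch. Everything here is assumed for illustration: the two toy ontologies, their identifier prefixes (ANAT, PROC), the term labels and the data item names are hypothetical, and label overlap is used only as a crude stand-in for overlapping domains. The point is that when two ontologies share no terms, annotations drawn from the second can simply be added to those from the first.

```python
def shared_labels(onto_a, onto_b):
    """Return term labels that appear in both ontologies (case-insensitive)."""
    labels_a = {label.lower() for label in onto_a.values()}
    labels_b = {label.lower() for label in onto_b.values()}
    return labels_a & labels_b


def merge_annotations(first, second):
    """Union the term sets attached to each data item."""
    merged = {item: set(terms) for item, terms in first.items()}
    for item, terms in second.items():
        merged.setdefault(item, set()).update(terms)
    return merged


# Two toy ontologies covering different domains (hypothetical content).
anatomy = {"ANAT:1": "pleura", "ANAT:2": "lung"}
process = {"PROC:1": "inflammation", "PROC:2": "cell division"}

# No label is claimed by both ontologies, so their annotations can be combined.
overlap = shared_labels(anatomy, process)

by_anatomy = {"image-17": {"ANAT:1"}}
by_process = {"image-17": {"PROC:1"}, "image-18": {"PROC:2"}}
merged = merge_annotations(by_anatomy, by_process)
print(sorted(merged["image-17"]))
```

If shared_labels returned a non-empty set, the two ontologies would be competing for the same domain, and that overlap would need to be resolved before their annotations could be combined safely.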
41
CRITERIA
  • IDENTIFIERS: The ontology possesses a unique
    identifier space within OBO.
  • VERSIONING: The ontology provider has procedures
    for identifying distinct successive versions.
  • The ontology includes textual definitions for all
    terms.

CRITERIA
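
The three criteria on this slide lend themselves to a mechanical check. The sketch below is an assumption-laden illustration, not a Foundry tool: the function name check_ontology, the PREFIX:digits identifier pattern, the version string and the example terms are all hypothetical.

```python
import re


def check_ontology(prefix, version, terms):
    """Report violations of the identifier, versioning and definition criteria."""
    problems = []
    if not version:
        problems.append("missing version label")
    # Assumed identifier shape: PREFIX:digits (an illustration, not a mandated format).
    id_pattern = re.compile(rf"^{re.escape(prefix)}:\d+$")
    for term_id, term in terms.items():
        if not id_pattern.match(term_id):
            problems.append(f"{term_id}: identifier outside the {prefix} ID space")
        if not (term.get("definition") or "").strip():
            problems.append(f"{term_id}: no textual definition")
    return problems


terms = {
    "EX:0000001": {"name": "example process", "definition": "A process used as an example."},
    "EX:0000002": {"name": "undefined term", "definition": ""},
    "OTHER:7": {"name": "borrowed term", "definition": "A term defined elsewhere."},
}

problems = check_ontology("EX", "2006-03-01", terms)
for p in problems:
    print(p)
```

An empty result would mean the ontology passes these three checks. The remaining criteria (documentation, plurality of users, collaboration) are not mechanically checkable in this way.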
42
  • CLEARLY BOUNDED: The ontology has a clearly
    specified and clearly delineated content.
  • DOCUMENTATION: The ontology is well-documented.
  • USERS: The ontology has a plurality of
    independent users.

CRITERIA
43
  • COMMON ARCHITECTURE: The ontology uses relations
    which are unambiguously defined following the
    pattern of definitions laid down in the OBO
    Relation Ontology.
  • Smith et al., Genome Biology 2005, 6:R46

CRITERIA
44
OBO Relation Ontology
Foundational: is_a, part_of
Spatial: located_in, contained_in, adjacent_to
Temporal: transformation_of, derives_from, preceded_by
Participation: has_participant, has_agent
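
Unambiguously defined relations are what make logic-based reasoning possible. The following sketch shows one way the two foundational relations could be used to infer new links. The composition rules (is_a and part_of each treated as transitive, and part_of propagating across is_a) are assumptions of this sketch, and the edges are illustrative only: the node names are taken from the FMA slide, but the transcript does not preserve which nodes are actually connected.

```python
def infer(edges):
    """Close a set of (subject, relation, object) triples under simple composition rules."""
    # Relation obtained when a link of type r1 is followed by a link of type r2.
    compose = {
        ("is_a", "is_a"): "is_a",
        ("part_of", "part_of"): "part_of",
        ("is_a", "part_of"): "part_of",
        ("part_of", "is_a"): "part_of",
    }
    closed = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(closed):
            for (c, r2, d) in list(closed):
                if b == c and (r1, r2) in compose:
                    new = (a, compose[(r1, r2)], d)
                    if new not in closed:
                        closed.add(new)
                        changed = True
    return closed


# Illustrative edges only; the real FMA assertions may differ.
edges = {
    ("Parietal Pleura", "part_of", "Pleura"),
    ("Pleura", "part_of", "Pleural Sac"),
    ("Pleural Sac", "is_a", "Serous Sac"),
    ("Serous Sac", "is_a", "Organ"),
}

closed = infer(edges)
print(("Parietal Pleura", "part_of", "Pleural Sac") in closed)
```

The fixed-point loop is quadratic per pass and is meant for readability on toy inputs. A real reasoner would index edges by subject or delegate to a dedicated reasoning engine.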
45
  • Further criteria will be added over time in light
    of lessons learned in order to bring about a
    gradual improvement in the quality of the
    ontologies in the Foundry
  • ALL ONTOLOGIES WILL BE SUBJECT TO CONSTANT UPDATE
    IN LIGHT OF SCIENTIFIC ADVANCE

IT WILL GET HARDER
46
  • But not everyone needs to join
  • The Foundry is not seeking to serve as a check on
    flexibility or creativity
  • ALL ONTOLOGIES WILL WELCOME CONSTANT COMMUNITY
    CRITICISM, CORRECTION AND EXTENSION

IT WILL GET HARDER
47
  • to introduce some of the features of SCIENTIFIC
    PEER REVIEW into biomedical ontology development
  • REUSABILITY: if data-schemas are formulated using
    a single well-integrated framework ontology
    system in widespread use, then these data will
    to this degree themselves become more widely
    accessible and usable

GOALS
48
  • to create controlled vocabularies for semantic
    annotation of clinical trial records, scientific
    journal articles ...
  • to help in creating better mappings e.g. between
    human and model organism phenotypes
  • Zhang and Bodenreider, AMIA 2005

GOALS
49
  • to aid literature search: http://www.gopubmed.org/
  • to counteract the current policy of ad hoc
    creation of new annotation schemas by each
    clinical research group by providing a common
    shared framework
  • to end the terminology wars by advancing
    community-based regimentation of clinical and
    other vocabularies in a scientific spirit

GOALS
50
  • to serve as a benchmark for improvements in
    discipline-focused terminology resources
  • once interoperable reference ontologies are
    there, it will make sense to calibrate existing
    terminologies in order to achieve greater domain
    coverage and alignment of different but veridical
    views

GOALS
51
  • June 2006: establishment of MICheck
  • reflects growing need for prescriptive
    checklists specifying the key information to
    include when reporting experimental results
    (concerning methods, data, analyses and results).

the vision is spreading
52
  • MICheck: a common resource for minimum
    information checklists, analogous to OBO / NCBO
    BioPortal
  • MICheck Foundry will create a suite of
    self-consistent, clearly bounded, orthogonal,
    integrable checklist modules
  • Taylor CF, et al. Nature Biotech, in press

The vision is spreading
53
  • Transcriptomics (MIAME Working Group)
  • Proteomics (Proteomics Standards Initiative)
  • Metabolomics (Metabolomics Standards Initiative)
  • Genomics and Metagenomics (Genomic Standards
    Consortium)
  • In Situ Hybridization and Immunohistochemistry
    (MISFISHIE Working Group)
  • Phylogenetics (Phylogenetics Community)
  • RNA Interference (RNAi Community)
  • Toxicogenomics (Toxicogenomics WG)
  • Environmental Genomics (Environmental Genomics
    WG)
  • Nutrigenomics (Nutrigenomics WG)
  • Flow Cytometry (Flow Cytometry Community)

MICheck/Foundry communities
54
INDEPENDENT CONTINUANTS
organism
system
organ
organ part
tissue
cell
acellular anatomical structure
biological molecule
genome
DEPENDENT CONTINUANTS
physiology (functions) | pathology: acute stage | pathology: progressive stage | pathology: resolution stage
next step: repertoire of disease ontologies built
out of OBO Foundry elements
55
Ontology for Acute Respiratory Distress Syndrome
56
Ontology for Muscular Sclerosis
57
with thanks (inter alia) to
Michael Ashburner, Carol Bean, Judy Blake, Olivier Bodenreider, William Bug, Werner Ceusters, Lindsay Cowell, Louis J. Goldberg, Frank Hartel, David Hill, Anand Kumar, Suzi Lewis, Jane Lomax, Onard Mejino, Chris Mungall, Mark Musen, NCBO Team, Fabian Neuhaus, Alan Rector, Cornelius Rosse, Karen Skinner, Kent Spackman, University at Buffalo