OBO and OBD: Biomedical ontologies and data

1
OBO and OBD: Biomedical ontologies and data
  • Chris Mungall
  • Howard Hughes Medical Institute, UC Berkeley
  • National Center for Biomedical Ontology

2
Outline
  • A brief history of OBO
  • Overview of the National Center for Biomedical
    Ontology
  • The OBO Foundry
  • OBD: The OBO Database
  • Storing mutant phenotype and disease data in OBD
  • Technology for OBD

3
OBO History
  • 1999: Gene Ontology
  • 2003: Open Bio-Ontologies
  • 2005: National Center for Biomedical Ontology
  • neo-OBO
  • OBD
  • 2006: OBO Foundry

4
The Gene Ontology
  • Application domain
  • annotation of genes and gene products
  • initially model organism focused
  • 3 Orthogonal Ontologies, 20k terms
  • molecular function
  • biological process
  • cellular component
  • Formalism
  • DAG - is_a and part_of relations
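
The DAG formalism can be illustrated with a minimal Python sketch. It assumes a toy fragment of a cellular component hierarchy; the term labels and edges are illustrative only and are not taken from the real GO files. The helper names `build_parents` and `ancestors` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy fragment; labels are illustrative, not real GO identifiers.
EDGES = [
    ("nucleolus", "part_of", "nucleus"),
    ("nucleus", "is_a", "organelle"),
    ("nucleus", "part_of", "cell"),
    ("organelle", "is_a", "cellular component"),
]


def build_parents(edges):
    """Index each term's direct parents as (relation, parent) pairs."""
    parents = defaultdict(list)
    for child, relation, parent in edges:
        parents[child].append((relation, parent))
    return parents


def ancestors(term, parents):
    """Return every term reachable by following is_a or part_of edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


PARENTS = build_parents(EDGES)
print(sorted(ancestors("nucleolus", PARENTS)))
```

Because a term may have several parents, the structure is a graph rather than a tree, which is why the traversal tracks visited terms.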

5
Gene Ontology software and infrastructure
  • Ontologies managed in CVS repository
  • Editor software: OBO-Edit
  • Native file format: OBO format
  • Annotation data managed at distributed sites
  • associations of genes and gene products to GO
    terms
  • Daily uploads into central database
  • GODB schema
  • AmiGO browser
  • Exports
  • GO-RDF, Obo-XML, OWL, MySQL

6
OBO: Open Bio-Ontologies
  • Offshoot of the GO
  • Initial ontologies
  • anatomical
  • cell types
  • fly anatomy
  • mouse anatomy
  • zebrafish anatomy
  • plant anatomy
  • Dictyostelium anatomy
  • human anatomy
  • developmental stages
  • experimental conditions
  • chemical
  • phenotype and disease
  • phenotypic attributes
  • mammalian phenotype
  • plant phenotype
  • human disease
  • relations

7
Obol: integrating GO and OBO
None of these relationships are explicitly
encoded in the ontology
  • GO Biological Process
  • cysteine biosynthesis
  • myoblast fusion
  • snoRNA catabolism
  • wing disc pattern formation
  • epidermal cell differentiation
  • regulation of flower development
  • B-cell differentiation
  • midbrain development
  • Mammalian Phenotype Ontology
  • increased activated B-cell number
  • kidney hypoplasia

We are currently creating logic definitions for
these composite terms
8
Problems with OBO mark 1
  • No funding!
  • Minimal infrastructure
  • Hosted on SourceForge (http://obo.sourceforge.net)
  • CVS
  • Web Page, ontology summaries
  • Periodic downloads and automated format
    conversions
  • No APIs or database
  • Minimal review of incoming ontologies
  • Existing clinical ontologies and terminologies
    not included
  • SNOMED, UMLS, etc

9
National Center for Biomedical Ontology
  • Formed in 2005
  • The goal of the Center is to support biomedical
    researchers in their knowledge-intensive work, by
    providing online tools and a Web portal enabling
    them to access, review, and integrate disparate
    information resources in all aspects of
    biomedical investigation and clinical practice. A
    major focus of our work involves the use of
    biomedical ontologies to aid in the management
    and analysis of data derived from complex
    experiments

10
NCBO: 6 Cores
  • Core 1: Ontologies and metadata
  • Core 2: Biomedical data annotations
  • Core 3: Driving Biological Projects and external
    research collaborations
  • Core 4: Infrastructure
  • Core 5: Education
  • Core 6: Dissemination

11
(No Transcript)
12
Core 1: Ontologies (Stanford, UVic, Mayo)
  • Develop an ontology registry/library
  • Provide ontology services: BioPortal
  • search
  • change management, ontology lifecycle
  • peer review
  • metadata
  • ontology mapping (PROMPT)
  • visualisation
  • Technology
  • Protégé
  • LexGrid

13
Core 2: Data (Berkeley)
  • Biomedical data annotation
  • experimental results
  • clinical trial data
  • Tools to support annotation
  • Phenote
  • OBD
  • Data warehouse for data annotated using OBO
    ontologies

14
Core 3: Driving Biological Projects
  • Linking model organism phenotypes to human
    disease genes
  • Zebrafish
  • (Westerfield, Univ of Oregon)
  • Fruitfly
  • (Ashburner, Univ of Cambridge, UK)
  • Clinical trial data
  • HIV Clinical trials
  • (Sim, UCSF)

15
Cores 5 and 6: Education and dissemination
(University at Buffalo)
  • Principles of ontology design
  • Outreach
  • Organisation of meetings and workshops

16
Part 2: OBD
  • The OBD definition of an ontology
  • OBO Foundry
  • OBD - Annotations database

17
An ontology is
  • A formal representation of some aspect of reality
  • what types of entity exist?
  • what are the relationships between these entities?

[Figure: example types sense organ, eye, eye disc and
ommatidium, linked by is_a, part_of and develops from]
18
Types vs instances
  • Instances
  • what is particular in reality
  • Types (universals, kinds)
  • what is general in reality
  • a potential object of investigation by science

19
Ontologies vs Data
  • Instances
  • what is particular in reality
  • represented in databases
  • electronic health records
  • experimental data
  • Types (universals, kinds)
  • what is general in reality
  • a potential object of investigation by science
  • represented in ontologies

20
Relations
  • Instance level relations
  • often time-variant
  • this particular ommatidium part_of this
    particular compound eye, right now
  • Type level relations
  • specify what is true for all instances of a type
  • time-invariant
  • all instances of ommatidium part_of some instance
    of a compound eye, at all times
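
The difference between the two levels can be sketched in Python. This is a minimal illustration, assuming time-stamped instance-level facts stored as tuples; all instance names, time points and the function `type_level_part_of` are hypothetical and do not come from the Relation Ontology itself.

```python
# Hypothetical instance-level facts: (part, whole, time point).
PART_OF = {
    ("ommatidium_1", "eye_A", 1),
    ("ommatidium_1", "eye_A", 2),
    ("ommatidium_2", "eye_A", 1),
    ("ommatidium_2", "eye_A", 2),
}

# Hypothetical typing of instances, and the times at which each one exists.
INSTANCE_OF = {
    "ommatidium_1": "ommatidium",
    "ommatidium_2": "ommatidium",
    "eye_A": "compound eye",
}
EXISTS_AT = {
    "ommatidium_1": {1, 2},
    "ommatidium_2": {1, 2},
    "eye_A": {1, 2},
}


def type_level_part_of(part_type, whole_type):
    """True if every instance of part_type is, at every time it exists,
    part_of some instance of whole_type."""
    for inst, inst_type in INSTANCE_OF.items():
        if inst_type != part_type:
            continue
        for t in EXISTS_AT[inst]:
            has_whole = any(
                p == inst and when == t and INSTANCE_OF.get(w) == whole_type
                for p, w, when in PART_OF
            )
            if not has_whole:
                return False
    return True


print(type_level_part_of("ommatidium", "compound eye"))  # True
print(type_level_part_of("compound eye", "ommatidium"))  # False
```

Note that the check is vacuously true for a type with no recorded instances, so in this sketch a type-level assertion is only as informative as the instance data behind it.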

21
The OBO Relation Ontology
  • Foundational relations
  • is_a
  • part_of
  • has_participant
  • located_in
  • adjacent_to
  • transformation_of
  • derives_from
  • http://obo.sourceforge.net/relationship
  • Smith B, Ceusters W, Klagges B, Kohler J, Kumar
    A, Lomax J, Mungall CJ, Neuhaus F, Rector A,
    Rosse C (2004) Relations in Biomedical Ontologies.
    Genome Biology, 2005, 6:R46

22
  • We want to encourage ontologies to be
    interoperable and follow certain principles
  • this is vital for OBD
  • We want the neo-OBO to be ecumenical
  • cannot impose standards on outside ontologies

23
The OBO Foundry
  • Collaborative experiment amongst OBO ontology
    developers
  • Establish high-standard orthogonal reference
    ontologies

24
OBO Foundry principles
  • http://www.obofoundry.org/
  • intelligibility to biologist curators,
    annotators, users
  • formal robustness
  • stability
  • compatibility
  • interoperability
  • support for logic-based reasoning

25
OBO Foundry Criteria
  • The ontology uses relations which are
    unambiguously defined following the pattern of
    definitions laid down in the OBO Relation
    Ontology.
  • Assumption: if we are to create ontologies which
    support logical reasoning, then we need to take
    time and instances into account

26
OBD: A Database for OBO
  • About OBD
  • A use case linking genes to disease
  • On the correct representation of phenotypes
  • Representation of instances in a database
  • SQL/Relational DBs
  • Semantic Web DBs
  • Deductive DBs

27
OBD: A Database for OBO
  • OBO is a repository of ontologies
  • OBD is a repository of data annotated using these
    ontologies
  • to integrate data from various sources
  • to allow researchers to retrieve data and perform
    advanced queries using ontologies
  • to help generate and explore hypotheses
  • to allow for reasoning over data

28
OBD and the NCBO Driving Biological Projects
  • OBD is a general purpose data repository
    complementary to OBO
  • generic schema
  • Special care for generating, integrating and
    analysing data from DBPs
  • OBD Foundry
  • mutant phenotypes
  • clinical trials

29
Linking diseases to genes
  • Humans share genes with other organisms
  • orthology
  • Mutations in orthologous genes give rise to
    similar phenotypes
  • Understanding phenotypes helps us understand
    diseases and disorders
  • Example: holoprosencephaly

30
[Figure: holoprosencephaly example comparing SHH-/+ and
SHH-/- with shh-/+ and shh-/-]
31
What is a phenotype?
  • Represented using qualities
  • Qualities are dependent entities
  • Phenotype: a collection of one or more quality
    types inhering in some entity types (both
    continuants and occurrents)
  • An instance of a phenotype
  • One or more quality instances inhering in the
    entity instances that comprise a particular
    organism instance
  • Example: osteoporosis
  • The quality of being low mass inhering in bone

32
Representing phenotypes in OBD
[Figure: entity and quality, each shown as types and as
instances]
Neuhaus F, Smith B, Grenon P. A Formal Theory of
Substances, Qualities and Universals. Proceedings of
FOIS 2004
33
[Figure: top-level types quality, anatomical entity,
organism, genotype and environment; linked to them by
is_a (indirect), the types eye, red, Drosophila
melanogaster, y1 cn1 bw1 sp1 genotype and agar medium;
instances attached by instance_of]
34
What ontologies does OBD require?
  • Continuants
  • Multiple species-centric ontologies (fly, fish,
    mouse, ...)
  • OBO Cell ontology
  • OBO/GO Cellular component
  • Organism type: NCBI Taxonomy
  • Occurrents
  • OBO/GO Biological Process
  • OBO Relation ontology
  • Environment: not yet required
  • Phenotypic Qualities
  • PATO

35
Pre- vs post- composed
  • Pre-composed phenotypes
  • e.g. from Mammalian Phenotype Ontology (MP)
  • osteoporosis
  • syndactyly
  • pink fur
  • Post-composition of phenotypes
  • e.g. anatomical ontology and PATO
  • (loosely) Entity type + Quality type
  • bone, low mass
  • fingers, fused
  • fur, pink
  • OBD must support both
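
One way to support both styles is to normalise pre-composed terms to entity and quality pairs before querying. The Python sketch below assumes a hand-written lookup table built from the three example pairs above; the mutant identifiers, the `Phenotype` type and the helper functions are hypothetical and are not part of OBD.

```python
from typing import NamedTuple


class Phenotype(NamedTuple):
    entity: str   # entity type, e.g. from an anatomical ontology
    quality: str  # quality type, e.g. from PATO


# Hypothetical definitions, taken from the three example pairs above.
DEFINITIONS = {
    "osteoporosis": Phenotype("bone", "low mass"),
    "syndactyly": Phenotype("fingers", "fused"),
    "pink fur": Phenotype("fur", "pink"),
}


def normalise(annotation):
    """Return a Phenotype for either a pre-composed term name or an
    (entity, quality) pair; unknown pre-composed terms yield None."""
    if isinstance(annotation, str):
        return DEFINITIONS.get(annotation)
    return Phenotype(*annotation)


# Hypothetical mutant annotations mixing both styles.
ANNOTATIONS = {
    "mutant_1": "osteoporosis",
    "mutant_2": ("bone", "low mass"),
    "mutant_3": ("fur", "pink"),
}


def mutants_with(entity, quality):
    """Find mutants whose annotation normalises to the given pair."""
    target = Phenotype(entity, quality)
    return sorted(m for m, a in ANNOTATIONS.items() if normalise(a) == target)


print(mutants_with("bone", "low mass"))  # ['mutant_1', 'mutant_2']
```

A real system would derive the definitions from logical definitions in the ontology rather than from a hand-written table.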

36
OBD Schema
  • Minimal generic schema
  • Ontological primitives only
  • type
  • instance
  • relation
  • Usable across biological domains
  • Instance data only as good as the ontologies used
    to type instances
  • Importance of OBO Foundry
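
A schema built only from these three primitives might look like the following sketch, written with Python's built-in sqlite3 module. The table names, column names and the `inheres_in` predicate are assumptions made for illustration; they are not the actual OBD schema.

```python
import sqlite3

# Hypothetical three-table schema: types, typed instances, and relations
# between instances. Not the actual OBD schema.
SCHEMA = """
CREATE TABLE obd_type     (id TEXT PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE obd_instance (id TEXT PRIMARY KEY,
                           type_id TEXT NOT NULL REFERENCES obd_type(id));
CREATE TABLE obd_relation (subject_id TEXT NOT NULL,
                           predicate  TEXT NOT NULL,
                           object_id  TEXT NOT NULL);
"""


def load_example(conn):
    """Create the tables and load one red-eye observation."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO obd_type VALUES (?, ?)",
        [("T:eye", "eye"), ("T:red", "red")],
    )
    conn.executemany(
        "INSERT INTO obd_instance VALUES (?, ?)",
        [("i:eye1", "T:eye"), ("i:red1", "T:red")],
    )
    conn.execute(
        "INSERT INTO obd_relation VALUES (?, ?, ?)",
        ("i:red1", "inheres_in", "i:eye1"),
    )


def qualities_of(conn, entity_label):
    """Labels of quality types with an instance inhering in an instance
    of the named entity type."""
    rows = conn.execute(
        """
        SELECT qt.label
        FROM obd_relation r
        JOIN obd_instance qi ON qi.id = r.subject_id
        JOIN obd_type qt ON qt.id = qi.type_id
        JOIN obd_instance ei ON ei.id = r.object_id
        JOIN obd_type et ON et.id = ei.type_id
        WHERE r.predicate = 'inheres_in' AND et.label = ?
        ORDER BY qt.label
        """,
        (entity_label,),
    ).fetchall()
    return [label for (label,) in rows]


conn = sqlite3.connect(":memory:")
load_example(conn)
print(qualities_of(conn, "eye"))  # ['red']
```

Nothing in the tables is specific to anatomy or phenotypes, which is the sense in which such a schema is usable across biological domains.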

37
OBD and OBO
  • How do we integrate them?
  • Two physical databases, one portal
  • requires mediation layer
  • efficiency concerns?
  • Replicate OBO in OBD
  • Mechanisms for managing change and ontology
    lifecycle

38
Data vs ontologies
  • How do we decide what goes in OBO and what goes
    in OBD?
  • OBO: types
  • OBD: instances
  • ...but most curated scientific data concerns types
    (or at least representative instances)
  • inferences about gene types, protein types,
    genotypes from multiple experiments on multiple
    instances
  • where do we draw the line?

39
Technological framework for OBD
  • Representations must be encoded using some
    technological framework
  • What are the choices?
  • Choice 1: Traditional DBMS
  • Choice 2: Semantic Web
  • representation language: DL
  • RDF triple store database
  • Choice 3: Logic based
  • representation language: FOPL or some fragment
  • deductive database
  • Note: XMDR exploring similar paths?

40
SQL semantics
  • + Scalable, standard, etc
  • + Similarities to FOPL
  • - Fragment of FOPL too limited
  • - Limited deductive capabilities
  • +/- Closed world assumption

41
SQL DBs Prior experience
  • The Gene Ontology Database (GODB)
  • Predecessor to OBD
  • More restricted domain
  • genes and gene products
  • GO
  • Molecular Function
  • Biological Process
  • Cellular Component
  • Technological framework
  • SQL database, semi-generic schema
  • Pre-computed basic inferences
  • Difficult to extend to more general use cases for
    OBD
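
Pre-computing basic inferences in a SQL database usually means materialising the transitive closure of is_a so that queries become simple joins. The sketch below shows the idea with Python's sqlite3 module and a recursive query. The table layout, the gene names and the intermediate term "lymphocyte differentiation" are assumptions for illustration and do not reproduce the GODB schema.

```python
import sqlite3


def build(conn):
    """Create hypothetical tables and load a toy is_a hierarchy."""
    conn.executescript(
        """
        CREATE TABLE term (id TEXT PRIMARY KEY);
        CREATE TABLE is_a (child TEXT, parent TEXT);
        CREATE TABLE annotation (gene TEXT, term TEXT);
        CREATE TABLE is_a_closure (descendant TEXT, ancestor TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO term VALUES (?)",
        [
            ("cell differentiation",),
            ("lymphocyte differentiation",),
            ("B-cell differentiation",),
            ("epidermal cell differentiation",),
        ],
    )
    conn.executemany(
        "INSERT INTO is_a VALUES (?, ?)",
        [
            ("B-cell differentiation", "lymphocyte differentiation"),
            ("lymphocyte differentiation", "cell differentiation"),
            ("epidermal cell differentiation", "cell differentiation"),
        ],
    )
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?)",
        [
            ("gene_a", "B-cell differentiation"),
            ("gene_b", "epidermal cell differentiation"),
            ("gene_c", "cell differentiation"),
        ],
    )


def precompute_closure(conn):
    """Materialise the reflexive transitive closure of is_a."""
    rows = conn.execute(
        """
        WITH RECURSIVE path(descendant, ancestor) AS (
            SELECT id, id FROM term
            UNION
            SELECT path.descendant, is_a.parent
            FROM path JOIN is_a ON is_a.child = path.ancestor
        )
        SELECT descendant, ancestor FROM path
        """
    ).fetchall()
    conn.executemany("INSERT INTO is_a_closure VALUES (?, ?)", rows)


def genes_for(conn, term):
    """Genes annotated to the term or to any of its is_a descendants."""
    rows = conn.execute(
        """
        SELECT DISTINCT a.gene
        FROM annotation a
        JOIN is_a_closure c ON c.descendant = a.term
        WHERE c.ancestor = ?
        ORDER BY a.gene
        """,
        (term,),
    ).fetchall()
    return [gene for (gene,) in rows]


conn = sqlite3.connect(":memory:")
build(conn)
precompute_closure(conn)
print(genes_for(conn, "cell differentiation"))
```

The closure table trades storage and reload time for fast queries, and it has to be rebuilt whenever the ontology changes, which is one reason the approach becomes harder to maintain as more relations are added.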

42
Semantic Web Technology
  • Representation language
  • RDF triple based
  • Some fragment of OWL entailment
  • Query language
  • RDF based - SPARQL, SeRQL
  • Database systems and tools
  • Jena, Sesame, Kowari, SWI-SeRQL engine

43
[Figure: top-level types quality, anatomical entity,
organism, genotype and environment; linked to them by
is_a, the types eye, red, Drosophila melanogaster,
y- cn b genotype and agar medium; instances attached by
instantiation]
44
RDF triple representation
45
OBD-Sesame
  • Trial phenotype dataset from fruitfly and
    zebrafish
  • converted to OWL instance representation
  • Ontologies converted to OWL
  • PATO - qualities (attributes)
  • Anatomical ontologies (fly, fish)
  • Cell ontology
  • Gene ontology
  • function, process, component
  • NCBI Taxonomy
  • OWL loaded into a Sesame DB

46
Example query in SeRQL
Find mutations affecting the morphology of the
wing vein:
SELECT DISTINCT EI, ET, OrgI, QI, QT, QN
FROM {EI} rdf:type {ET} rdfs:label {EN},
     {EI} OBO_REL:part_of {OrgI} rdf:type {Tax} rdfs:label {TaxN},
     {EI} OBO_REL:has_quality {QI} rdf:type {QT} rdfs:label {QN}
WHERE label(EN) = "wing vein" AND label(TaxN) = "Arthropoda"
  AND label(QN) = "morphology"
Results of query on OBD-Sesame: one instance of
wing vein L2, branched, in a fruitfly
47
Example query in SeRQL
Find mutations affecting bone mass:
SELECT DISTINCT EI, ET, OrgI, QI, QT, QN
FROM {EI} rdf:type {ET} rdfs:label {EN},
     {EI} OBO_REL:part_of {OrgI} rdf:type {Tax} rdfs:label {TaxN},
     {EI} OBO_REL:has_quality {QI} rdf:type {QT} rdfs:label {QN}
WHERE label(EN) = "bone" AND label(QN) = "reduced mass"
With OWL entailment, this should return mutants with
phenotypes instantiating osteoporosis (if this
term is defined in OWL)
48
OBD-Sesame preliminary results
  • Dependent upon storage layer
  • memory-based
  • FAST
  • RDBMS-based
  • SLOW
  • Deductive queries implemented inefficiently
  • Slower than GODB (SQL)
  • Benchmark results
  • analysis by Shengqiang Shu
  • http://smi.stanford.edu/projects/cbio/mwiki-internal/index.php/RDF_Sesame_Demo_Benchmark

49
Considerations with RDF/OWL
  • More is needed to ensure interoperation
  • Foundry ontologies can help
  • Foundations in RDF triples
  • 3-ary predicates (i.e. binary relations Rxy)
  • n-ary relations must be reified or constructed
    from binary relations
  • real world instance level relations are
    time-indexed
  • Rxy forces us to use a time-slice representation
  • multiplication of entities
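
The cost of forcing a time-indexed relation into triples can be shown with a small Python sketch of reification. It assumes a made-up vocabulary (`has_subject`, `has_relation`, `has_object`, `holds_at`) and hypothetical statement identifiers; it is not the RDF reification vocabulary and not how OBD-Sesame stored data.

```python
def reify(statement_id, subject, relation, obj, time):
    """Turn the ternary fact relation(subject, obj, time) into binary
    triples hanging off one statement node."""
    return [
        (statement_id, "has_subject", subject),
        (statement_id, "has_relation", relation),
        (statement_id, "has_object", obj),
        (statement_id, "holds_at", time),
    ]


def holds_at(triples, subject, relation, obj, time):
    """Check whether some statement node carries all four properties."""
    wanted = {
        ("has_subject", subject),
        ("has_relation", relation),
        ("has_object", obj),
        ("holds_at", time),
    }
    by_statement = {}
    for node, predicate, value in triples:
        by_statement.setdefault(node, set()).add((predicate, value))
    return any(wanted <= properties for properties in by_statement.values())


# One time-indexed fact becomes four triples plus a new statement node.
TRIPLES = reify("_:s1", "ommatidium_1", "part_of", "eye_A", "t1")
print(len(TRIPLES))  # 4
print(holds_at(TRIPLES, "ommatidium_1", "part_of", "eye_A", "t1"))  # True
print(holds_at(TRIPLES, "ommatidium_1", "part_of", "eye_A", "t2"))  # False
```

Each additional time point needs its own statement node, which is the multiplication of entities the slide refers to.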

50
Alternative: Deductive Databases
  • Provides deductive abilities without RDF
    constraints
  • Can represent n-ary relations and time
  • Various kinds, different tradeoffs
  • SQL + deduction (Datalog)
  • Prolog
  • Disjunctive databases
  • Deductive object-oriented databases (FLORA2)
  • FOL theorem provers
  • Different OBD instantiations for different
    applications?
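
What a deductive database adds can be sketched as naive forward chaining over n-ary facts. The Python example below applies a single Datalog-style rule, part_of(X, Z, T) :- part_of(X, Y, T), part_of(Y, Z, T), to time-indexed facts. The rule, the facts and the function name `infer_part_of` are illustrative assumptions and do not describe any particular OBD implementation.

```python
def infer_part_of(facts):
    """Apply part_of(X, Z, T) :- part_of(X, Y, T), part_of(Y, Z, T)
    repeatedly until no new facts are derived."""
    derived = set(facts)
    while True:
        new = {
            (x, z, t)
            for (x, y, t) in derived
            for (y2, z, t2) in derived
            if y == y2 and t == t2
        } - derived
        if not new:
            return derived
        derived |= new


# Hypothetical facts: (part, whole, time point).
FACTS = {
    ("ommatidium_1", "eye_A", 1),
    ("eye_A", "head_A", 1),
    ("eye_A", "head_A", 2),
}

INFERRED = infer_part_of(FACTS) - FACTS
print(sorted(INFERRED))  # [('ommatidium_1', 'head_A', 1)]
```

Time is simply a third argument of the relation here, so no reification or time-slicing is needed; the trade-off is that such engines are less standardised than SQL or RDF stores.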

51
OBD-Foundry
  • Fundamental tension/conflict
  • Accept a variety of instance data typed by a
    variety of ontologies
  • Ensure interoperability
  • Solution: OBD Foundry
  • parallels OBO Foundry experiment
  • use driving biological projects as model of
    process
  • use less restricted representation language
  • core relations part of language
  • deductive databases

52
Future
  • Integrate OBD with BioPortal
  • Incorporate more clinical terminologies into OBO
  • Collaborations with external groups and other
    National Centers
  • Bring in more Driving Biological Projects