The Joy of Ontology

1
The Joy of Ontology
  • Suzanna Lewis
  • SMI Colloquium
  • April 20th, 2006

2
Sections
  • Why make an ontology
  • What is an ontology
  • How to create an ontology
  • Logically
  • Technically
  • Organizationally
  • National Center for Biomedical Ontology
  • Case study: Phenotypes, our current work on OBD

3
Why make an ontology?
  • What is the motivation?

4
The Problem(s) with data
  • Inaccessibility of widely distributed data
  • Overabundance of information
  • Speed and performance
  • Interpreting the data syntactically
  • Interpreting the data semantically

5
We started the GO
  • To develop a shared language adequate for the
    annotation of molecular characteristics across
    organisms.
  • To agree on a mutual understanding of the
    definition and meaning of any word used, and thus
    to support cross-database queries.
  • To provide database access via these common terms
    to gene product annotations and associated
    sequences.

9
Annotation of Yeast Microarray Clusters Using GO
GENE   PROCESS                    FUNCTION                           CELLULAR COMPONENT
SDH1   tricarboxylic acid cycle   succinate dehydrogenase            mitochondrial inner membrane
NDI1   oxidative phosphorylation  NADH dehydrogenase                 mitochondrial inner membrane
QCR7   electron transport         ubiquinol--cytochrome-c reductase  mitochondrial inner membrane
COX6   oxidative phosphorylation  cytochrome-c oxidase               mitochondrial inner membrane
RIP1   electron transport         Rieske Fe-S protein                mitochondrial inner membrane
COX15  oxidative phosphorylation  cytochrome-c oxidase               mitochondrial inner membrane
CYT1   electron transport         cytochrome-c1                      mitochondrial inner membrane
COR1   electron transport         ubiquinol--cytochrome-c reductase  mitochondrial inner membrane
SDH3   tricarboxylic acid cycle   succinate dehydrogenase subunit    mitochondrial inner membrane
SDH4   tricarboxylic acid cycle   succinate dehydrogenase subunit    mitochondrial inner membrane
SDH2   tricarboxylic acid cycle   succinate dehydrogenase subunit    mitochondrial inner membrane
MDH1   tricarboxylic acid cycle   malate dehydrogenase               mitochondrial matrix
QCR6   electron transport         ubiquinol--cytochrome-c reductase  mitochondrial inner membrane
CYT1   electron transport         cytochrome-c1                      mitochondrial inner membrane
Microarray data from Figure 2K of Eisen et al.
(1998). Cluster analysis and display of
genome-wide expression patterns, Proc. Natl.
Acad. Sci. 95 (25) 14863-14868.
10
The Challenge of Communication
11
Ontologies are essential to make sense of
biomedical data
12
A Portion of the OBO Library
13
Motivation to capture biological reality
  • Inferences and decisions we make are based upon
    what we know of the biological reality.
  • An ontology is a computable representation of
    this underlying biological reality.
  • Enables a computer to reason over the data in
    (some of) the ways that we do.

14
What is an ontology
15
Ontology (as a branch of philosophy)
  • The science of what exists in every area of
    reality
  • The classification of entities: what kinds of
    things exist
  • The relations between these entities
  • Defines a scientific field's vocabulary and the
    canonical formulations of its theories.
  • Seeks to solve problems which arise in these
    domains.

16
A biological ontology is
  • A machine interpretable representation of some
    aspect of biological reality
  • what kinds of things exist?
  • what are the relationships between these things?

Diagram: eye is_a sense organ; eye develops from eye disc;
ommatidium part_of eye
17
Entity: a definition
  • anything which exists, including things and
    processes, functions and qualities, beliefs and
    actions, software and images

18
Representation: a definition
  • An image, idea, map, picture, name, description
    ... which refers to, or is intended to refer to,
    some entity or entities in reality
  • this "or is intended to refer to" should always
    be assumed

19
Ontologies represent types in reality
ontology
reality
20
Scientific data represent instances in reality
21
Two kinds of representational artifact
  • Databases, inventories, images represent what is
    particular in reality: instances
  • Ontologies, terminologies, catalogs represent
    what is general in reality (exists in multiple
    instances): types (universals, kinds)

22
Ontologies are not for representing concepts in
people's heads
ontology
23
The researcher has a cognitive representation of
what is general, based on his knowledge of the
science
Cognitive representation
ontology
24
types
substance
organism
animal
mammal
cat
frog
siamese
instances
25
An ontology is like a scientific text: it is a
representation of types in reality
26
Atomic representational unit: a definition
  • terms, icons, bar codes, alphanumeric identifiers
    ... which
  • refer, or are intended to refer, to entities in
    reality, and
  • are not built out of further sub-representations
  • Representational units are the atoms in the
    domain of representations

27
Modular representational unit: a definition
  • A representation which is built out of other
    representational units, which together form a
    structure that mirrors a corresponding structure
    in reality

28
The Periodic Table
29
Ontology: a definition
  • A modular, representational artifact whose
    representational units are intended to represent
  • types in reality
  • the relations between these types which are true
    universally (i.e. for all instances)
  • lung is_a anatomical structure
  • lobe of lung part_of lung
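
The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming a toy in-memory structure; the `Ontology` class and its method names are hypothetical and not part of any OBO tool. Types are stored as nodes, each relation is asserted between types, and is_a is followed transitively.

```python
from collections import defaultdict


class Ontology:
    """Toy ontology: types plus typed relations between them (illustrative only)."""

    def __init__(self):
        # relation name -> subject type -> set of object types
        self._edges = defaultdict(lambda: defaultdict(set))

    def add(self, subject, relation, obj):
        """Assert 'subject relation obj', e.g. 'lung is_a anatomical structure'."""
        self._edges[relation][subject].add(obj)

    def ancestors(self, term, relation):
        """Follow one relation transitively, e.g. every type that term is_a."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for parent in self._edges[relation][current]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


onto = Ontology()
onto.add("lung", "is_a", "anatomical structure")
onto.add("lobe of lung", "part_of", "lung")
# The type hierarchy from the earlier "types" slide.
onto.add("siamese", "is_a", "cat")
onto.add("cat", "is_a", "mammal")
onto.add("mammal", "is_a", "animal")

print(sorted(onto.ancestors("siamese", "is_a")))
```

Because the relations hold for all instances of a type, anything classified as a siamese can also be retrieved by a query for mammal or animal.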

30
How to create an ontology
  • Part 1
  • The logic and science

31
In computer science, there is an information
handling problem
  • Different groups of data-gatherers develop their
    own idiosyncratic terms in which they represent
    information.
  • To put this information together, methods must be
    found to resolve terminological and conceptual
    incompatibilities.
  • Again, and again, and again

32
The Reality
  • Do not assume that data integration can be
    brought about by somehow mapping incompatible,
    low quality ontologies built for different
    purposes

33
Two flavors of ontology
  1. Application ontology
  2. Reference ontology

34
Application Ontology
  • An application ontology is comparable to an
    engineering artifact such as a software tool. It
    is constructed for specific practical purposes.

35
Reference Ontology
  • A reference ontology is analogous to a scientific
    theory: it seeks to optimize representational
    adequacy to its subject matter

36
Assumptions
  • There are best practices in ontology development
    which should be followed to create stable
    high-quality ontologies
  • Shared high quality ontologies foster
    cross-disciplinary and cross-domain re-use of
    data, and create larger communities

37
Why do we need rules/standards for good ontology?
  • Ontologies must be intelligible both to humans
    (for annotation) and to machines (for reasoning
    and error-checking)
  • Unintuitive rules lead to errors in
    classification
  • Simple, intuitive rules facilitate training of
    curators and annotators
  • Common rules allow alignment with other
    ontologies (and thus cross-domain exploitation of
    data)
  • Logically coherent rules enhance harvesting of
    content through automatic reasoning systems

38
Ontologies built according to common logically
coherent rules
  • Will make entry easier and yield a safer growth
    path
  • You can start small, annotating your data with
    initial fragments of a well-founded ontology,
    confident that the results will still be usable
    when the ontology grows larger and richer

39
The OBO Foundry
  • A subset of OBO ontologies whose developers
    agree in advance to accept a common set of
    principles designed to assure
  • intelligibility to biologist curators,
    annotators, users
  • formal robustness
  • stability
  • compatibility
  • interoperability
  • support for logic-based reasoning

40
The OBO Foundry
  1. The ontology is open and available to be used by
    all.
  2. The developers of the ontology agree in advance
    to collaborate with developers of other OBO
    Foundry ontologies where domains overlap.
  3. The ontology is in, or can be instantiated in, a
    common formal language.
  4. The ontology possesses a unique identifier space
    within OBO.
  5. The ontology provider has procedures for
    identifying distinct successive versions.

41
The OBO Foundry
  1. The ontology has a clearly specified and clearly
    delineated content.
  2. The ontology includes textual definitions for all
    terms.
  3. The ontology is well-documented.
  4. The ontology has a plurality of independent
    users.
  5. The ontology uses relations which are
    unambiguously defined following the pattern of
    definitions laid down in the OBO Relation
    Ontology.

42
Orthogonality
  • Ontology groups who choose to be part of the OBO
    Foundry thereby commit themselves to
    collaborating to resolve disagreements which
    arise where their respective domains overlap

43
agreed on relations
  • The success of ontology alignment demands that
    ontological relations (is_a, part_of, ...) have
    the same meanings in the different ontologies to
    be aligned.
  • See Relations in Biomedical Ontologies, Genome
    Biology, May 2005: Barry Smith, Werner Ceusters,
    Bert Klagges, Jacob Köhler, Anand Kumar, Jane
    Lomax, Chris Mungall, Fabian Neuhaus, Alan L.
    Rector, and Cornelius Rosse

44
Three fundamental dichotomies
  • continuants vs. occurrents
  • dependent vs. independent
  • types vs. instances

ONTOLOGIES ARE REPRESENTATIVES OF TYPES IN REALITY
45
For example in the GO
  • Molecules, cell components, organisms are
    independent continuants which have functions
  • Functions are dependent continuants which become
    realized through special sorts of processes we
    call functionings
  • Processes are occurrents; they include functionings,
    side-effects, stochastic processes

46
Continuants (aka endurants)
  • have continuous existence in time
  • preserve their identity through change
  • exist in toto whenever they exist at all

47
Occurrents (aka processes)
  • have temporal parts
  • unfold themselves in successive phases
  • exist only in their phases

48
Continuants vs. Occurrents
  • Anatomy vs. Physiology
  • Snapshot vs. Video
  • Stocks vs. Flows
  • Commodities vs. Services
  • Products vs. Processes

49
Dependent entities
  • require independent continuants as their bearers
  • There is no grin without a cat

50
Dependent vs. independent continuants
  • Independent continuants (organisms, cells,
    molecules, environments)
  • Dependent continuants (qualities, shapes, roles,
    propensities, functions)
  • E.g. the acidity of this gut

51
All occurrents are dependent entities
  • They are dependent on those independent
    continuants which are their participants (agents,
    patients, media ...)

52
GO's three ontologies
biological process
molecular function
cellular component
53
Pumping blood
To pump blood
heart
54
UBO: Upper Biomedical Ontology
Continuant 3D
Occurrent 4D
Independent Continuant
Dependent Continuant
Functioning
Side-Effect, Stochastic Process, ...
Quality
Function
Spatial Region
instances (in space and time)
55
How to create an ontology
  • Part 2
  • The technical aspects

56
Separate the Database from the Ontology
  • For extensibility
  • For generality
  • For reasonability
  • For interoperability

57
Why ontologies are worth it
  • Minimize database maintenance costs
  • Communication between researchers
  • As well as
  • Better query facilities
  • Ability to draw inferences
  • Detect correlations
  • Facilitate computational interpretation of text
  • And more

58
Before: domain knowledge is embedded in the db
schema
Gene table
Exon table
RNA table
Protein table
59
Embedding domain knowledge in the db schema is
expensive
  • The logical description and the physical database
    description of the biology are co-mingled
  • Therefore new biological knowledge will force:
  • Schema changes, e.g. new tables
  • Query changes that explicitly refer to tables
  • Middleware changes to retrieve and format
  • GUI changes to display

60
After: domain knowledge is embedded in the
ontology
feature table
61
Ontology driven db schema is less expensive to
maintain
  • The logical description and the physical database
    description of the biology are developed
    independently
  • Therefore new biological knowledge will only
    require:
  • Ontology changes, e.g. new terms
  • GUI changes to display
  • No schema changes
  • No query changes
  • No middleware changes
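
The contrast between the two schema styles can be sketched with Python's built-in `sqlite3` module. This is an illustration only: the table names, column names, term IDs, and feature UIDs are all hypothetical. It shows a single generic feature table whose rows point at ontology terms, so that a new kind of feature is added as data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Ontology terms live in data, not in the schema.
    CREATE TABLE term (id TEXT PRIMARY KEY, name TEXT NOT NULL);
    -- One generic table holds every kind of feature.
    CREATE TABLE feature (
        uid TEXT PRIMARY KEY,
        type_id TEXT NOT NULL REFERENCES term(id)
    );
    """
)

# Hypothetical term IDs and feature UIDs, for illustration only.
conn.executemany(
    "INSERT INTO term VALUES (?, ?)",
    [("T:1", "gene"), ("T:2", "exon")],
)
conn.executemany(
    "INSERT INTO feature VALUES (?, ?)",
    [("f1", "T:1"), ("f2", "T:2"), ("f3", "T:2")],
)


def features_of_type(type_name):
    """Same query for any feature type; the type is a parameter, not a table name."""
    rows = conn.execute(
        "SELECT f.uid FROM feature f JOIN term t ON t.id = f.type_id "
        "WHERE t.name = ? ORDER BY f.uid",
        (type_name,),
    )
    return [uid for (uid,) in rows]


# New biological knowledge: add a term and data rows; no schema or query change.
conn.execute("INSERT INTO term VALUES (?, ?)", ("T:3", "ncRNA"))
conn.execute("INSERT INTO feature VALUES (?, ?)", ("f4", "T:3"))

print(features_of_type("exon"))
print(features_of_type("ncRNA"))
```

With one table per biological type, the last two inserts would instead have required a new table and a new query.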

62
Reality: ultimately this is what the ontology
must reflect
63
Step 1: Build an ontology that reflects reality
64
Observations and experimentation
Step 2: Data capture
Database UIDs serving as proxies for instances
  • Patient IDs
  • Sequence accessions
  • Genetic strain IDs

65
Step 1: Build an ontology that reflects reality
Step 2: Data capture
Step 3: Classify data using the ontology
Database UIDs serving as proxies for instances
66
Ontology is a contract between researchers
  • A common language that allows us to share
    knowledge
  • Researchers have a stake in it
  • Every individual will benefit by being able to
    accurately interpret someone else's data
  • No more pre-scrubbing
  • No more time spent translating

67
Rules on types
  • Don't confuse types with instances
  • Don't confuse instances with leaf nodes
  • Don't confuse types with ideas
  • Don't confuse types with ways of getting to know
    types
  • Don't confuse types with ways of talking about
    types
  • Don't confuse types with data about types

68
An astronomy ontology should not include 'Buzz
Aldrin'
69
Rules on terms
  • Terms should be in the singular
  • Avoid abbreviations even when it is clear in
    context what they mean ("breast" for "breast
    tumor")
  • Think of each term "A" in an ontology as
    shorthand for a term of the form
  • "the type A"
70
Rules on Definitions
  • The terms used in a definition should be simpler
    (more intelligible) than the term to be defined;
    otherwise the definition provides no assistance
  • Definitions should be intelligible to both
    machines and humans
  • to human understanding
  • Humans need clarity and modularity
  • to machine processing
  • Machines can cope with the full formal
    representation

71
Confusing non-useful definitions
  • Swimming
  • Swimming is healthy and has 8 letters
  • Poland
  • The name of Poland

72
When defining terms use Aristotelian definitions
  • The definition of A takes the form
  • an A =def. is a B which ...
  • where B is A's parent in the hierarchy
  • A human being =def. an animal which is rational
  • A helicase =def. an enzyme which catalyzes the
    hydrolysis of ATP to unwind the DNA helix

73
Use of Aristotelian definitions
  • Makes defining terms easier
  • Each definition encapsulates in modular form the
    entire parentage of the defined term
  • The entire information content of the FMA's term
    hierarchy and definitions can be translated very
    cleanly into a computer representation
  • Now accepted by GO
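
One way to see why definitions of this form encapsulate a term's parentage is the following Python sketch. It assumes each definition is stored as a (parent, differentia) pair. The `DEFINITIONS` table and both function names are hypothetical, and the "enzyme" entry is a simplified definition that does not come from the slides.

```python
# term -> (parent B, differentia "which ...")
DEFINITIONS = {
    "human being": ("animal", "is rational"),
    "helicase": ("enzyme", "catalyzes the hydrolysis of ATP to unwind the DNA helix"),
    # Hypothetical, simplified entry, added only so the walk has a second level.
    "enzyme": ("protein", "catalyzes a chemical reaction"),
}


def define(term):
    """Render 'A =def. B which ...' for a single term."""
    parent, differentia = DEFINITIONS[term]
    return f"{term} =def. {parent} which {differentia}"


def parentage(term):
    """Walk up the parents named in successive definitions."""
    chain = []
    while term in DEFINITIONS:
        term = DEFINITIONS[term][0]
        chain.append(term)
    return chain


print(define("helicase"))
print(parentage("helicase"))
```

Since every definition names its parent, the is_a hierarchy can be recovered from the definitions alone, which is what makes the translation into a computer representation clean.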

74
Summary: How to build your ontology
  • Keep It Simple
  • lowest possible barrier to entry
  • Technology independence
  • Clarity of definitions and scope
  • With new data, we change our minds
  • An ontology must adapt to reflect current
    understanding of reality
  • Mechanisms to support change

75
How to build an ontology
  • Part 3
  • The sociology and organizational aspects

76
Elements for Success GO
  • A Community with a common vision
  • A pool of talented and motivated
    developers/scientists
  • A mix of academic and commercial
  • An organized, light weight approach to product
    development
  • A leadership structure
  • Communication
  • A well-defined scope, (our business)

Adopted from Open Source Menu for Success
77
Gene Ontology
  • Community annotation production
  • Extreme Programming techniques to distributed
    ontology generation
  • Revision control
  • Nightly conflict resolution
  • Users are integral to the team
  • Rapid iterations

78
[Flowchart]
Steps: Why, Survey
Questions (yes/no): Domain covered? Public? Community? Active? Applied?
Outcomes: Salvage, Develop, Improve, Collaborate and Learn
79
Is your domain covered? Due diligence and
background research
  • Step 1: Learn what is out there
  • The most comprehensive list is on the OBO site.
    http://obo.sourceforge.net
  • Assess ontologies critically and realistically.

80
Is it privately held? Ontologies must be shared
  • Communities form scientific theories
  • that seek to explain all of the existing evidence
  • and can be used for prediction
  • These communities are all directed to the same
    biological reality, but have their own
    perspective
  • The computable representation must be shared
  • Ontology development is inherently collaborative
  • Open ontologies become connected to instance data;
    this feeds back on ontology development

81
Is it active? Pragmatic assessment of an ontology
  • Is there access to help, e.g.
  • help-me@weird.ontology.net ?
  • Does a warm body answer help mail within a
    reasonable time, say 2 working days?

82
Is it applied? Toy ontologies are not useful
  • Every ontology improves when it is applied to
    actual instances of data
  • It improves even more when these data are used to
    answer research questions
  • There will be fewer problems in the ontology and
    more commitment to fixing remaining problems when
    important research data is involved that
    scientists depend upon
  • Be very wary of ontologies that have never been
    applied

83
Work with that community
  • To improve (if you found one)
  • To develop (if you did not)
  • How?

Improve
Collaborate and Learn
84
Community vs. Committee ?
  • Many people have (understandably) reservations
    about collaborative development, because it can
    easily be confused with design-by-committee
    projects.
  • Members of a committee represent themselves.
  • Members of a community represent their community.

85
Design for purpose - not in abstract
  • Who will use it?
  • If no one is interested, then go back to bed
  • What will they use it for?
  • Define the domain
  • Who will maintain it?
  • Be pragmatic and modest

86
Start with a concrete proposal not a blank slate.
  • But do not commit your ego to it.
  • Distribute to a small group you respect
  • With a shared commitment.
  • With broad domain knowledge.
  • Who will engage in vigorous debate without
    engaging their egos (or, at least not too much).
  • Who will do concrete work.

87
Step 1
  • Alpha0: the first proposal, broad in breadth but
    shallow in depth. By one person with broad domain
    knowledge.
  • Distribute to a small group (<6).
  • Get together for two days and engage in vigorous
    discussion. Be open and frank. Argue, but do not
    be dogmatic.
  • Reiterate over a period of months. Do as much as
    possible face-to-face, rather than by
    phone/email. Meet for 2 days every 3 months or so.

88
Step 2
  • Distribute Alpha1 to your group.
  • All now test this Alpha1 in real life
  • By classifying representations of instances with
    these types
  • Do not worry if (at this stage) you do not
    have tools.

89
Step 3
  • Reconvene as a group for two days.
  • Share experiences from implementation
  • Can your Alpha1 be implemented in a useful way ?
  • What are the conceptual problems ?
  • What are the structural problems ?

90
Step 4
  • Establish a mechanism for change.
  • Use CVS or Subversion.
  • Limit the number of editors with write permission
    (ideally to one person).
  • Unique stable identifiers
  • History tracking of changes
  • Release a Beta1.
  • Seriously implement Beta1 in real life.
  • Build the ontology in depth.
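
The two requirements "unique stable identifiers" and "history tracking of changes" can be illustrated with a small Python sketch. This is a toy under stated assumptions, not how GO or any OBO ontology is actually managed; the `TermRegistry` class, the "EX" prefix, and the seven-digit ID format are hypothetical.

```python
class TermRegistry:
    """Toy ID allocator: IDs are never reused, and obsolete terms keep their ID."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._next = 1
        self.terms = {}    # id -> {"name": ..., "obsolete": bool}
        self.history = []  # append-only log of (action, id, detail)

    def add(self, name):
        term_id = f"{self.prefix}:{self._next:07d}"
        self._next += 1
        self.terms[term_id] = {"name": name, "obsolete": False}
        self.history.append(("add", term_id, name))
        return term_id

    def rename(self, term_id, new_name):
        old = self.terms[term_id]["name"]
        self.terms[term_id]["name"] = new_name
        self.history.append(("rename", term_id, f"{old} -> {new_name}"))

    def obsolete(self, term_id):
        # The ID stays resolvable so existing annotations do not dangle.
        self.terms[term_id]["obsolete"] = True
        self.history.append(("obsolete", term_id, ""))


registry = TermRegistry("EX")
first = registry.add("wing")
registry.obsolete(first)
second = registry.add("wing blade")
print(first, second, len(registry.history))
```

In practice the history would live in the revision control system mentioned above; the log here only makes the idea visible.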

91
Step 5
  • After about 6 months reconvene and evaluate.
  • Is the ontology suited to its purpose ?
  • Is it, in practice, usable ?
  • Are we happy about its broad structure and
    content ?

92
Step 6
  • Go public.
  • Release ontology to community.
  • Release the products of the annotations.
  • Invite broad community input and establish a
    mechanism for this (e.g. SourceForge).

93
Step 7
  • Proselytize
  • Publish in a high profile journal
  • Engage new user groups
  • Emphasize openness
  • Write a grant

94
Step 8
  • Iterate
  • Improvements come in two forms
  • It is impossible to get it right the 1st (or 2nd,
    or 3rd, ...) time.
  • What we know about reality is continually growing

95
Step 9
  • Bon appetit

96
Use the power of combination and collaboration
  • as far as possible, don't reinvent
  • ontologies are like telephones: they are valuable
    only to the degree that they are used and
    networked with other ontologies
  • but choose working telephones
  • most telephones were broken when the technology
    was first being developed

97
The National Center for Biomedical Ontology
  • Stanford, Berkeley, Mayo, Victoria, Buffalo,
    UCSF, Oregon, Cambridge

98
Bioinformatics and Computational Biology
99
National Centers for Biomedical Computing 2005
  1. National Center for Integrative Biomedical
    Informatics (Michigan)
  2. National Center for Multi-Scale Study of Cellular
    Networks (Columbia)
  3. National Center for Biomedical Ontology

100
  • Stanford: Tools for ontology alignment,
    indexing, and management (Cores 1, 4-7: Mark
    Musen)
  • Lawrence Berkeley Labs: Tools to use ontologies
    for data annotation (Cores 2, 5-7: Suzanna Lewis)
  • Mayo Clinic: Tools for access to large
    controlled terminologies (Core 1: Chris Chute)
  • Victoria: Tools for ontology and data
    visualization (Cores 1 and 2: Margaret-Anne
    Storey)
  • University at Buffalo: Dissemination of best
    practices for ontology engineering (Core 6: Barry
    Smith)

101
Driving Biological Projects
  • Trial Bank: UCSF, Ida Sim
  • FlyBase: Cambridge, Michael Ashburner
  • ZFIN: Oregon, Monte Westerfield

102
Case study: Phenotypes, our current work on OBD
103
Animal disease models
Animal models
Mutant Gene -> Mutant or missing Protein -> Mutant Phenotype
104
Animal disease models
Animal models: Mutant Gene -> Mutant or missing Protein -> Mutant Phenotype (disease model)
Humans: Mutant Gene -> Mutant or missing Protein -> Mutant Phenotype (disease)
107
SHH-/
SHH-/-
shh-/
shh-/-
112
Phenotype (clinical sign) = entity + quality
  P1 = eye + hypoteloric
  P2 = midface + hypoplastic
  P3 = kidney + hypertrophied
PATO: hypoteloric, hypoplastic, hypertrophied
ZFIN: eye, midface, kidney

113
Phenotype (clinical sign) = entity + quality
Entity: Anatomical ontology, Cell and tissue ontology,
Developmental ontology, Gene ontology (biological
process, cellular component)
Quality: PATO (phenotype and trait ontology)
114
Phenotype (clinical sign) = entity + quality
  P1 = eye + hypoteloric
  P2 = midface + hypoplastic
  P3 = kidney + hypertrophied
Syndrome (disease) = P1 + P2 + P3 = holoprosencephaly
115
Human holoprosencephaly
Zebrafish shh
Zebrafish oep
116
OMIM gene ZFIN gene FlyBase gene FlyBase mut pub ZFIN mut pub mouse rat SNOMED OMIM disease
LAMB1 lamb1 LanB1 5 15 39 -
FECH fech Ferrochelatase 2 5 2 29 Protoporphyria, Erythropoietic
GLI2 gli2a ci 388 41 22 -
SLC4A1 slc4a1 CG8177 7 7 19 Renal Tubular Acidosis, RTADR
MYO7A myo7a ck 84 5 9 3 16 Deafness DFNB2 DFNA11
ALAS2 alas2 Alas 1 7 14 Anemia, Sideroblastic, X-Linked
KCNH2 kcnh2 sei 27 3 12 -
MYH6 myh6 Mhc 166 3 1 12 Cardiomyopathy, Familial Hypertrophic CMH
TP53 tp53 p53 64 3 3 19 11 Breast Cancer
ATP2A1 atp2a1 Ca-P60A 32 6 1 11 Brody Myopathy
EYA1 eya1 eya 251 5 4 6 Branchiootorenal Dysplasia
SOX10 sox10 Sox100B 1 17 4 4 Waardenburg-Shah Syndrome
117
National Center for Biomedical Ontology
Capture and index experimental results
Open Biomedical Ontologies (OBO)
Open Biomedical Data (OBD)
BioPortal
Revise biomedical understanding
Relate experimental data to results from other
sources
118
Phenotype as an observation
context
The class of thing observed
evidence
publication
environment
figure
assay
genetic
sequence ID
ontology
119
Review of proposed EAV EQ model
  • A phenotype is described using an Entity-Quality
    double
  • Entities are drawn from various OBO
    ontologies: cell, anatomies, GO, ...
  • Qualities are drawn from one ontology: PATO
120
Separation of concerns
  • Not phenotypes
  • Genotype
  • Environment
  • Assay, measurement systems
  • Images

schema
Association = Genotype Phenotype Environment Assay
Phenotype = Entity Quality
Entity = OBOClassID
Quality = PATOClassID
121
2003 Pilot study
  • Trial of EAV model on small collection of
    genotypes
  • FlyBase
  • ZFIN
  • Genes were non-orthologous
  • New curations - in progress
  • orthologous genes with clinical relevance
  • Use the same data model and exchange format?

122
Example data records
Genotype  Entity  Value
npo       gut     dysplastic
          gut     small
r210      retina  irregular
          brain   fused
123
ZFIN schema extension stages
Genotype  Stage                    Entity  Quality
npo       Hatching:Pec-fin         gut     dysplastic
          Hatching:Pec-fin         gut     small
r210      Hatching:Long-pec        retina  irregular
          Larval:Protruding-mouth  brain   fused
124
Stages
Association = Genotype Phenotype Environment Assay
Phenotype = Stage Entity Quality
Entity = OBOClassID
Stage = OBOAnatomicalStageClassID
Quality = PATOClassID
means zero or more
125
Monadic and relational qualities
  • Monadic
  • the quality inheres in a single entity
  • Relational
  • the quality inheres in two or more entities
  • sensitivity of an organism to a kind of drug
  • sensitivity of an eye to a wavelength of light
  • can turn relational qualities into cross-product
    monadic qualities
  • e.g. sensitivityToRedLight
  • better to use relational qualities
  • avoids redundancy with existing ontologies

126
Incorporating relational qualities
Association = Genotype Phenotype Environment Assay
Phenotype = Stage Entity Quality Entity
Entity = OBOClassID
Quality = PATOVersion2ClassID
Example data record
Phenotype = organism sensitiveTo puromycin
127
Measurable qualities
  • Some qualities are inexact and implicitly
    relative to a wild-type or normal quality
  • relatively short, relatively long, relatively
    reduced
  • easier than explicitly representing
  • this tail length shorter-than mouse wild-type
    tail length
  • Some qualities are determinable
  • use a measure function
  • unit, value, time
  • this tail has length L
  • measure(L, cm) = 2
  • Keep measurements separate from (but linked to)
    quality ontology

128
Incorporating measurements
Association = Genotype Phenotype Environment Assay
Phenotype = Stage Entity Quality Entity Measurement
Measurement = Unit Value (Time)
Entity = OBOClassID
Quality = PATOVersion2ClassID
Example data record
Phenotype = gut acidic
Measurement = pH 5
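
The schema, with its optional stage, second entity, and measurement, might be modeled as follows. This is a minimal Python sketch: the class and field names are hypothetical, and human-readable labels stand in for the OBO and PATO class IDs a real record would carry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    unit: str
    value: float
    time: Optional[str] = None  # "(Time)" is optional in the schema


@dataclass(frozen=True)
class Phenotype:
    entity: str                           # OBO class ID (label used here)
    quality: str                          # PATO class ID (label used here)
    related_entity: Optional[str] = None  # second bearer of a relational quality
    stage: Optional[str] = None
    measurement: Optional[Measurement] = None

    def is_relational(self):
        """A relational quality inheres in two or more entities."""
        return self.related_entity is not None


# The two example data records from the slides.
drug_response = Phenotype("organism", "sensitiveTo", related_entity="puromycin")
gut_acidity = Phenotype("gut", "acidic", measurement=Measurement(unit="pH", value=5))

print(drug_response.is_relational(), gut_acidity.is_relational())
print(gut_acidity.measurement)
```

Keeping `Measurement` as its own type mirrors the advice to keep measurements separate from, but linked to, the quality ontology.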
129
The Methodology of Annotations
  • Scientific curators use experimental observations
    reported in the biomedical literature to link
    gene products with GO terms in annotations.
  • The gene annotations taken together yield a
    slowly growing computer-interpretable map of
    biological reality.
  • The process of annotating literature also leads
    to improvements and extensions of the ontology
    itself, which institutes a virtuous cycle of
    improvement in the quality and reach of future
    annotations and of future versions of the
    ontology.

130
When we annotate the record of an experiment
  • we use terms representing types to capture what
    we learn about the instances
  • this experiment as a whole (a process)
  • these instances experimented upon
  • the instances are typical
  • they are representatives of a type

131
Ontology
  • A thing of beauty is a joy forever
  • With acknowledgement and thanks to Mark Musen,
    Barry Smith, Sima Misra, Chris Mungall, Daniel
    Rubin, and David Hill

132
Interpretation of the schema
  • How should EAV data records be interpreted by a
    computer?
  • What are the instances?
  • Is the EAV schema just to improve database
    searching?
  • Can it be used for meaningful cross-species
    comparisons?

133
What is the Entity slot used for?
Genotype   Entity                        Quality
npo        gut                           dysplastic
           gut                           small
r210       retina                        irregular
           brain                         fused
tm84       d/v pattern formation         abnormal
           blood islands                 number increased
Bsb2       elongation of arista lateral  arrested
C-alpha1D  adult behaviour               uncoordinated
2003 trial data: FB and ZFIN
134
What is the Entity slot used for?
  • In practical terms
  • An ID from one of the following ontologies
  • GO: CC, BP, and MF
  • Species-specific anatomical ontology
  • OBO Cell
  • Or a cross-product
  • e.g. acidification GO:0045851
  • which has_location (OBOREL) midgut FBbt:00005383
  • example from FBal0062296: "Acidification in the
    midgut of homozygous larvae is often less than in
    wild-type larvae"
  • But what does it mean in the context of an
    annotation?
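
A cross-product entity of the kind shown above can be represented as a base class ID plus one differentiating relation. The sketch below is only an illustration: the `EntityRef` class and its field names are hypothetical, and the IDs are written in the usual prefix:number form.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EntityRef:
    """Either a plain class ID, or a class ID refined by one relation."""

    class_id: str
    # (relation, filler class ID), e.g. ("has_location", "FBbt:00005383")
    differentia: Optional[Tuple[str, str]] = None

    def __str__(self):
        if self.differentia is None:
            return self.class_id
        relation, filler = self.differentia
        return f"{self.class_id} which {relation} {filler}"


# acidification (GO) located in the midgut (FlyBase anatomy)
midgut_acidification = EntityRef("GO:0045851", ("has_location", "FBbt:00005383"))
print(midgut_acidification)
```

Composing the entity at annotation time avoids having to add a dedicated "acidification of midgut" term to either ontology.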

135
Universals and particulars
  • An ontology consists of universals (classes)
  • Fruitfly, wing, flight
  • Experimental data generally concerns particulars
    (instances) that instantiate universals
  • this particular wing of this particular fruitfly
  • this particular fruitfly participating in this
    particular flight from here to there
  • In annotation we often use a class ID as a proxy
    for an (unnamed) instance (or collection of
    instances)
  • It is important to always keep this distinction
    in mind

136
What is the Quality slot used for?
Genotype   Entity                        Quality              Quality
npo        gut                           structure            dysplastic
           gut                           relative size        small
r210       retina                        pattern              irregular
           brain                         structure            fused
tm84       d/v pattern formation         qualitative          abnormal
           blood islands                 relative number      number increased
Bsb2       elongation of arista lateral  process              arrested
C-alpha1D  adult behaviour               behavioral activity  uncoordinated
2003 trial data: FB and ZFIN
137
Qualities
  • Treat as Qualities
  • a dependent entity
  • a quality must have independent entity(s) as
    bearer
  • the quality inheres_in the bearer
  • Examples
  • The particular shape of this ball
  • The particular structure of this wing
  • The particular length of this tail
  • The particular rate of synaptic transmission
    between these two neurons

138
Attribute universals vs Attribute particulars
  • In an EAV annotation, a PATO class ID typically
    serves as a proxy for an unnamed attribute
    instance
  • Universals (classes) must always be defined in
    terms of their instances

A Formal Theory of Substances, Qualities, and
Universals. Fabian Neuhaus, Pierre Grenon, Barry
Smith
139
How is the attribute slot used?
Genotype   Entity                        Attribute            Attribute
npo        gut                           structure            dysplastic
           gut                           relative size        small
r210       retina                        pattern              irregular
           brain                         structure            fused
tm84       d/v pattern formation         qualitative          abnormal
           blood islands                 relative number      number increased
Bsb2       elongation of arista lateral  process              arrested
C-alpha1D  adult behaviour               behavioral activity  uncoordinated
140
Current Model
Association = Genotype Phenotype Environment Assay
Phenotype = Stage Entity Attribute
Entity = OBOClassID
Stage = OBOAnatomicalStageClassID
Attribute = PATOClassID
means zero or more
141
Composite phenotype classes
  • Mammalian phenotype has composite phenotype
    classes
  • e.g. reduced B cell number
  • Compose at annotation time or ontology curation
    time?
  • False dichotomy
  • Core 2 will help map between composite class
    based annotation and EAV annotation

142
Interpreting annotations
  • Annotations are data records
  • typically use class IDs
  • implicitly refer to instances
  • How do we map an annotation to instances?
  • Important for using annotations computationally

143
Interpreting annotations (1)
  • What does an EA (or EAV) annotation mean?
  • Annotation
  • Genotype=FBal00123 E=brain A=fused
  • presumed implied meaning
  • this organism
  • has_part x, where
  • x instance_of brain
  • x has_quality fused
  • or in natural language
  • this organism has a fused brain
  • Various built-in assumptions
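
The presumed mapping from an annotation to instance-level statements can be written out as a small Python function. This is a sketch of the naive interpretation only; the function name and the tuple layout are hypothetical.

```python
def interpret(entity, attribute, organism="this organism", part="x"):
    """Expand an EA annotation into instance-level statements (naive mapping)."""
    return [
        (organism, "has_part", part),
        (part, "instance_of", entity),
        (part, "has_quality", attribute),
    ]


for subject, relation, obj in interpret("brain", "fused"):
    print(subject, relation, obj)
```

The built-in assumptions are visible in the code: the annotation is taken to be about a part of the organism, exactly one unnamed instance `x` is introduced, and the attribute is always treated as a quality of that instance.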

145
Interpreting annotations (II)
  • What does this mean
  • annotation
  • Genotype=FBal00123 E=wing A=absent
  • using same mapping as annotation I
  • fly98 has_part x, where
  • x instance_of wing
  • x has_quality absent
  • or in natural language
  • this fly has a wing which is not there!
  • What we really intend
  • NOT(this organism has_part x, where x instance_of
    wing)

146
Interpreting annotations (II)
  • What does this mean
  • annotation
  • Genotype=FBal00123 E=wing A=absent
  • using same mapping as annotation I
  • this organism has_part x, where
  • x instance_of wing
  • x has_quality absent
  • or in natural language
  • this fly has a wing which is not there!
  • What we really intend
  • this organism has_quality wingless
  • wingless = the property of having
    count(has_part wing) = 0

147
Are our computational representations intended to
capture reality?
148
Does this matter?
  • If we simply use the colloquial expression
    absent
  • What are the consequences?
  • Basic search will be fine
  • e.g. find all wing phenotypes
  • Logical reasoners will compute incorrect results
  • Computers will not be able to reason correctly
  • We must explicitly provide specific rules for
    certain attributes such as absent
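
An explicit rule for "absent" could look like the following Python sketch. The function name and the (negated, statements) return shape are hypothetical, chosen only to make the difference from the naive mapping visible.

```python
def interpret_with_rules(entity, attribute, organism="this organism"):
    """Map an EA annotation to instance-level statements.

    Returns (negated, statements); negated=True means NOT(all statements together).
    """
    existence = [(organism, "has_part", "x"), ("x", "instance_of", entity)]
    if attribute == "absent":
        # Special rule: 'absent' negates existence instead of describing a part.
        return True, existence
    return False, existence + [("x", "has_quality", attribute)]


print(interpret_with_rules("wing", "absent"))
print(interpret_with_rules("brain", "fused"))
```

Without the special case, a reasoner would be told that a wing exists and bears the quality "absent", which is the incorrect result described above.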

149
Interpreting annotations (III)
  • What does this mean
  • annotation
  • E=digit A=supernumerary
  • using same interpretation as annotation I
  • this organism has_part x, where
  • x instance_of digit
  • x has_quality supernumerary
  • or in natural language
  • this organism has a particular finger which is
    supernumerary!!
  • What we really intend
  • this person has_quality supernumerary finger
  • supernumerary finger = the property of having
    count(has_part digit) > wild-type
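
Count-based qualities such as these are properties of the whole organism, not of any one part. A minimal Python sketch follows, assuming the organism's parts are available as a flat list of type labels. The function name, the wild-type count of 5, and the "reduced number" and "normal number" labels are illustrative assumptions.

```python
from collections import Counter


def count_quality(parts, part_type, wild_type_count):
    """Classify a part count relative to wild type.

    parts: the types of the parts one organism actually has (one entry per part).
    """
    n = Counter(parts)[part_type]
    if n == 0:
        return "absent"
    if n > wild_type_count:
        return "supernumerary"
    if n < wild_type_count:
        return "reduced number"  # hypothetical label, not from the slides
    return "normal number"       # hypothetical label, not from the slides


hand = ["digit"] * 6
print(count_quality(hand, "digit", wild_type_count=5))
```

The quality is computed from count(has_part ...) on the bearer, so no individual digit is ever asserted to be "supernumerary".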

150
Interpreting annotations (IV)
  • What does this mean
  • annotation
  • G=tmp001 E=brown fat cell A=increased
    quantity
  • using same mapping as annotation I
  • this organism has_part x, where
  • x instance_of brown fat cell
  • x has_quality increased quantity
  • or in natural language
  • this organism has a particular brown fat cell
    which is increased in quantity
  • What we really intend
  • this organism has_part population_of(brown fat
    cell) which has_quality increased size

151
Other use cases
  • Spermatocyte devoid of asters
  • Homeotic transformations
  • Increased distance between wing veins
  • Some vs. all

152
Alternate perspectives
  • process vs. state
  • regulatory processes
  • acidification of midgut has_quality reduced rate
  • midgut has_quality low acidity
  • development vs. behavior
  • wing development has_quality abnormal
  • flight has_quality intermittent
  • granularity (scale)
  • chemical vs. molecular vs. cell vs. tissue vs.
    anatomical part