Title: The Joy of Ontology
1The Joy of Ontology
- Suzanna Lewis
- SMI Colloquium
- April 20th, 2006
2Sections
- Why make an ontology
- What is an ontology
- How to create an ontology
- Logically
- Technically
- Organizationally
- National Center for Biomedical Ontology
- Case study: Phenotypes, our current work on OBD
3Why make an ontology?
4The Problem(s) with data
- Inaccessibility of widely distributed data
- Overabundance of information
- Speed and performance
- Interpreting the data syntactically
- Interpreting the data semantically
5We started the GO
- To develop a shared language adequate for the
annotation of molecular characteristics across
organisms.
- To agree on a mutual understanding of the
definition and meaning of any word used, and thus
to support cross-database queries.
- To provide database access via these common terms
to gene product annotations and associated
sequences.
9Annotation of Yeast Microarray Clusters Using GO
GENE    PROCESS                     FUNCTION                            CELLULAR COMPONENT
SDH1    tricarboxylic acid cycle    succinate dehydrogenase             mitochondrial inner membrane
NDI1    oxidative phosphorylation   NADH dehydrogenase                  mitochondrial inner membrane
QCR7    electron transport          ubiquinol-cytochrome-c reductase    mitochondrial inner membrane
COX6    oxidative phosphorylation   cytochrome-c oxidase                mitochondrial inner membrane
RIP1    electron transport          Rieske Fe-S protein                 mitochondrial inner membrane
COX15   oxidative phosphorylation   cytochrome-c oxidase                mitochondrial inner membrane
CYT1    electron transport          cytochrome-c1                       mitochondrial inner membrane
COR1    electron transport          ubiquinol-cytochrome-c reductase    mitochondrial inner membrane
SDH3    tricarboxylic acid cycle    succinate dehydrogenase subunit     mitochondrial inner membrane
SDH4    tricarboxylic acid cycle    succinate dehydrogenase subunit     mitochondrial inner membrane
SDH2    tricarboxylic acid cycle    succinate dehydrogenase subunit     mitochondrial inner membrane
MDH1    tricarboxylic acid cycle    malate dehydrogenase                mitochondrial matrix
QCR6    electron transport          ubiquinol-cytochrome-c reductase    mitochondrial inner membrane
CYT1    electron transport          cytochrome-c1                       mitochondrial inner membrane
Microarray data from Figure 2K of Eisen et al.
(1998). Cluster analysis and display of
genome-wide expression patterns, Proc. Natl.
Acad. Sci. 95 (25) 14863-14868.
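A minimal sketch of why shared terms help here: once every gene in the cluster carries terms from the same vocabulary, summarizing the cluster is a simple count. The rows are the process and cellular component annotations from the table on this slide; the helper name summarize is an illustrative assumption, not part of any GO tool.

```python
from collections import Counter

# (gene, process, cellular component) rows copied from the slide's table.
ANNOTATIONS = [
    ("SDH1", "tricarboxylic acid cycle", "mitochondrial inner membrane"),
    ("NDI1", "oxidative phosphorylation", "mitochondrial inner membrane"),
    ("QCR7", "electron transport", "mitochondrial inner membrane"),
    ("COX6", "oxidative phosphorylation", "mitochondrial inner membrane"),
    ("RIP1", "electron transport", "mitochondrial inner membrane"),
    ("COX15", "oxidative phosphorylation", "mitochondrial inner membrane"),
    ("CYT1", "electron transport", "mitochondrial inner membrane"),
    ("COR1", "electron transport", "mitochondrial inner membrane"),
    ("SDH3", "tricarboxylic acid cycle", "mitochondrial inner membrane"),
    ("SDH4", "tricarboxylic acid cycle", "mitochondrial inner membrane"),
    ("SDH2", "tricarboxylic acid cycle", "mitochondrial inner membrane"),
    ("MDH1", "tricarboxylic acid cycle", "mitochondrial matrix"),
    ("QCR6", "electron transport", "mitochondrial inner membrane"),
    ("CYT1", "electron transport", "mitochondrial inner membrane"),
]


def summarize(rows, column):
    """Count how often each term appears in the given column of the rows."""
    return Counter(row[column] for row in rows)


process_counts = summarize(ANNOTATIONS, 1)
component_counts = summarize(ANNOTATIONS, 2)

# The counts make the shared biology of the cluster visible at a glance.
for term, count in component_counts.most_common():
    print(f"{count:2d}  {term}")
for term, count in process_counts.most_common():
    print(f"{count:2d}  {term}")
```

The same count would be much harder if each database described these genes in its own free text.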
10The Challenge of Communication
11Ontologies are essential to make sense of
biomedical data
12A Portion of the OBO Library
13Motivation to capture biological reality
- Inferences and decisions we make are based upon
what we know of the biological reality.
- An ontology is a computable representation of
this underlying biological reality.
- Enables a computer to reason over the data in
(some of) the ways that we do.
14What is an ontology
15Ontology (as a branch of philosophy)
- The science of what exists in every area of
reality
- The classification of entities: what kinds of
things exist
- The relations between these entities
- Defines a scientific field's vocabulary and the
canonical formulations of its theories.
- Seeks to solve problems which arise in these
domains.
16A biological ontology is
- A machine interpretable representation of some
aspect of biological reality
- what kinds of things exist?
sense organ
eye disc
is_a
develops from
- what are the relationships between these things?
eye
part_of
ommatidium
17Entity: a definition
- anything which exists, including things and
processes, functions and qualities, beliefs and
actions, software and images
18Representation: a definition
- An image, idea, map, picture, name, description
... which refers to, or is intended to refer to,
some entity or entities in reality
- this "or is intended to refer to" should always
be assumed
19Ontologies represent types in reality
ontology
reality
20Scientific data represent instances in reality
21Two kinds of representational artifact
- Databases, inventories, images represent what is
particular in reality: instances
- Ontologies, terminologies, catalogs represent
what is general in reality (exists in multiple
instances): types (universals, kinds)
22Ontologies are not for representing concepts in
people's heads
ontology
23The researcher has a cognitive representation of
what is general, based on his knowledge of the
science
Cognitive representation
ontology
24types
substance
organism
animal
mammal
cat
frog
siamese
instances
25An ontology is like a scientific text: it is a
representation of types in reality
26Atomic representational unit: a definition
- terms, icons, bar codes, alphanumeric identifiers
... which
- refer, or are intended to refer, to entities in
reality, and
- are not built out of further sub-representations
- Representational units are the atoms in the
domain of representations
27Modular representational unit: a definition
- A representation which is built out of other
representational units, which together form a
structure that mirrors a corresponding structure
in reality
28Periodic Table
The Periodic Table
29Ontology: a definition
- A modular, representational artifact whose
representational units are intended to represent
- types in reality
- the relations between these types which are true
universally (i.e. for all instances)
- lung is_a anatomical structure
- lobe of lung part_of lung
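As a rough illustration of what "computable representation" can mean, the two assertions above can be stored as subject-relation-object triples and followed transitively. This is only a sketch under the assumption that both is_a and part_of are treated as transitive; the names TRIPLES, build_index and ancestors are invented for the example.

```python
from collections import defaultdict

# Type-level assertions from the slide, one (subject, relation, object) each.
TRIPLES = [
    ("lung", "is_a", "anatomical structure"),
    ("lobe of lung", "part_of", "lung"),
]


def build_index(triples):
    """Index the triples as relation -> subject -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        index[relation][subject].add(obj)
    return index


def ancestors(index, term, relation):
    """Collect every type reachable from term by following relation repeatedly."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in index[relation].get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


index = build_index(TRIPLES)
print(ancestors(index, "lobe of lung", "part_of"))
print(ancestors(index, "lung", "is_a"))
```

Because the relations hold for all instances of a type, a query about any particular lobe of lung can be answered from the type-level triples alone.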
30How to create an ontology
- Part 1
- The logic and science
31In computer science, there is an information
handling problem
- Different groups of data-gatherers develop their
own idiosyncratic terms in which they represent
information.
- To put this information together, methods must be
found to resolve terminological and conceptual
incompatibilities.
- Again, and again, and again
32The Reality
- Do not assume that data integration can be
brought about by somehow mapping incompatible,
low quality ontologies built for different
purposes
33Two flavors of ontology
- Application ontology
- Reference ontology
34Application Ontology
- An application ontology is comparable to an
engineering artifact such as a software tool. It
is constructed for specific practical purposes.
35Reference Ontology
- A reference ontology is analogous to a scientific
theory: it seeks to optimize representational
adequacy to its subject matter
36Assumptions
- There are best practices in ontology development
which should be followed to create stable
high-quality ontologies
- Shared high-quality ontologies foster
cross-disciplinary and cross-domain re-use of
data, and create larger communities
37Why do we need rules/standards for good ontology?
- Ontologies must be intelligible both to humans
(for annotation) and to machines (for reasoning
and error-checking)
- Unintuitive rules lead to errors in
classification
- Simple, intuitive rules facilitate training of
curators and annotators
- Common rules allow alignment with other
ontologies (and thus cross-domain exploitation of
data)
- Logically coherent rules enhance harvesting of
content through automatic reasoning systems
38Ontologies built according to common logically
coherent rules
- Will make entry easier and yield a safer growth
path
- You can start small, annotating your data with
initial fragments of a well-founded ontology,
confident that the results will still be usable
when the ontology grows larger and richer
39OBO Foundry
The OBO Foundry
- A subset of OBO ontologies whose developers
agree in advance to accept a common set of
principles designed to assure
- intelligibility to biologist curators,
annotators, users
- formal robustness
- stability
- compatibility
- interoperability
- support for logic-based reasoning
40The OBO Foundry
- The ontology is open and available to be used by
all.
- The developers of the ontology agree in advance
to collaborate with developers of other OBO
Foundry ontologies where domains overlap.
- The ontology is in, or can be instantiated in, a
common formal language.
- The ontology possesses a unique identifier space
within OBO.
- The ontology provider has procedures for
identifying distinct successive versions.
41The OBO Foundry
- The ontology has a clearly specified and clearly
delineated content.
- The ontology includes textual definitions for all
terms.
- The ontology is well-documented.
- The ontology has a plurality of independent
users.
- The ontology uses relations which are
unambiguously defined following the pattern of
definitions laid down in the OBO Relation
Ontology.
42Orthogonality
Orthogonality
- Ontology groups who choose to be part of the OBO
Foundry thereby commit themselves to
collaborating to resolve disagreements which
arise where their respective domains overlap
43agreed on relations
- The success of ontology alignment demands that
ontological relations (is_a, part_of, ...) have
the same meanings in the different ontologies to
be aligned.
- See: Relations in Biomedical Ontologies, Genome
Biology, May 2005. Barry Smith, Werner Ceusters,
Bert Klagges, Jacob Köhler, Anand Kumar, Jane
Lomax, Chris Mungall, Fabian Neuhaus, Alan L
Rector, and Cornelius Rosse
44Three fundamental dichotomies
- continuants vs. occurrents
- dependent vs. independent
- types vs. instances
ONTOLOGIES ARE REPRESENTATIVES OF TYPES IN REALITY
45For example in the GO
- Molecules, cell components, organisms are
independent continuants which have functions
- Functions are dependent continuants which become
realized through special sorts of processes we
call functionings
- Processes are occurrents; they include functionings,
side-effects, stochastic processes
46Continuants (aka endurants)
- have continuous existence in time
- preserve their identity through change
- exist in toto whenever they exist at all
47Occurrents (aka processes)
- have temporal parts
- unfold themselves in successive phases
- exist only in their phases
48Continuants vs. Occurrents
- Anatomy vs. Physiology
- Snapshot vs. Video
- Stocks vs. Flows
- Commodities vs. Services
- Products vs. Processes
49Dependent entities
- require independent continuants as their bearers
- There is no grin without a cat
50Dependent vs. independent continuants
- Independent continuants (organisms, cells,
molecules, environments)
- Dependent continuants (qualities, shapes, roles,
propensities, functions)
- E.g. the acidity of this gut
- They are dependent on those independent
continuants which are their participants (agents,
patients, media ...)
52GO's three ontologies
biological process
molecular function
cellular component
53Pumping blood
To pump blood
heart
54UBO: Upper Biomedical Ontology
Continuant 3D
Occurrent 4D
Independent Continuant
Dependent Continuant
Functioning
Side-Effect, Stochastic Process, ...
Quality
Function
Spatial Region
instances (in space and time)
55How to create an ontology
- Part 2
- The technical aspects
56Separate the Database from the Ontology
- For extensibility
- For generality
- For reasonability
- For interoperability
57Why ontologies are worth it
- Minimize database maintenance costs
- Communication between researchers
- As well as
- Better query facilities
- Ability to draw inferences
- Detect correlations
- Facilitate computational interpretation of text
- And more
58Before: domain knowledge is embedded in the db
schema
Gene table
Exon table
RNA table
Protein table
59Embedding domain knowledge in the db schema is
expensive
- The logical description and the physical database
description of the biology are co-mingled
- Therefore new biological knowledge will force
- Schema changes, e.g. new tables
- Query changes that explicitly refer to tables
- Middleware changes to retrieve and format
- GUI changes to display
60After: domain knowledge is embedded in the
ontology
feature table
61Ontology driven db schema is less expensive to
maintain
- The logical description and the physical database
description of the biology are developed
independently
- Therefore new biological knowledge will only
require
- Ontology changes, e.g. new terms
- GUI changes to display
- No schema changes
- No query changes
- No middleware changes
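One way to picture the ontology-driven schema is a single generic feature table whose rows point at a term table, so that a new kind of feature is a new row rather than a new table. The sketch below uses Python's built-in sqlite3 with an in-memory database; the table layout, the helper functions, and the feature names (geneA, geneA-exon1, geneB-ps) are all hypothetical and are not taken from any actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE term (
        term_id INTEGER PRIMARY KEY,
        name    TEXT UNIQUE NOT NULL
    );
    CREATE TABLE feature (
        feature_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        type_id    INTEGER NOT NULL REFERENCES term(term_id)
    );
""")


def add_term(name):
    """Register an ontology term; repeated names are ignored."""
    conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (name,))


def add_feature(name, type_name):
    """Store a feature typed by an existing term (inserts nothing if the term is unknown)."""
    conn.execute(
        "INSERT INTO feature (name, type_id) "
        "SELECT ?, term_id FROM term WHERE name = ?",
        (name, type_name),
    )


def features_of_type(type_name):
    """The one query shape that serves every feature type."""
    rows = conn.execute(
        "SELECT f.name FROM feature f JOIN term t ON f.type_id = t.term_id "
        "WHERE t.name = ? ORDER BY f.name",
        (type_name,),
    )
    return [row[0] for row in rows]


for term_name in ("gene", "exon", "RNA", "protein"):
    add_term(term_name)
add_feature("geneA", "gene")
add_feature("geneA-exon1", "exon")

# New biological knowledge arrives: only a term row is added,
# the schema and the query stay exactly as they were.
add_term("pseudogene")
add_feature("geneB-ps", "pseudogene")
print(features_of_type("pseudogene"))
```

With separate gene, exon, RNA and protein tables, the last three lines would instead require a new table plus new queries, middleware and display code.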
62Reality: ultimately this is what the ontology
must reflect
63Step 1: Build an ontology that reflects reality
64Observations and experimentation
Step 2: Data capture
Database UIDs serving as proxies for instances
- Patient IDs
- Sequence accessions
- Genetic strain IDs
65Step 1: Build an ontology that reflects reality
Step 2: Data capture
Step 3: Classify data using the ontology
Database UIDs serving as proxies for instances
66Ontology is a contract between researchers
- A common language that allows us to share
knowledge
- Researchers have a stake in it
- Every individual will benefit by being able to
accurately interpret someone else's data
- No more pre-scrubbing
- No more time spent translating
67Rules on types
- Don't confuse types with instances
- Don't confuse instances with leaf nodes
- Don't confuse types with ideas
- Don't confuse types with ways of getting to know
types
- Don't confuse types with ways of talking about
types
- Don't confuse types with data about types
68An astronomy ontology should not include 'Buzz
Aldrin'
69Rules on terms
- Terms should be in the singular
- Avoid abbreviations even when it is clear in
context what they mean ("breast" for "breast
tumor")
- Think of each term "A" in an ontology as
shorthand for a term of the form "the type A"
70Rules on Definitions
- The terms used in a definition should be simpler
(more intelligible) than the term to be defined;
otherwise the definition provides no assistance
- Definitions should be intelligible to both
machines and humans
- to human understanding
- Humans need clarity and modularity
- to machine processing
- Machines can cope with the full formal
representation
71Confusing non-useful definitions
- Swimming
- Swimming is healthy and has 8 letters
- Poland
- The name of Poland
72When defining terms use Aristotelian definitions
- The definition of A takes the form
- an A =def. a B which ...
- where B is A's parent in the hierarchy
- A human being =def. an animal which is rational
- A helicase =def. an enzyme which catalyzes the
hydrolysis of ATP to unwind the DNA helix
73Use of Aristotelian definitions
- Makes defining terms easier
- Each definition encapsulates in modular form the
entire parentage of the defined term
- The entire information content of the FMA's term
hierarchy and definitions can be translated very
cleanly into a computer representation
- Now accepted by GO
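To make the "modular" claim concrete, a genus-differentia definition can be stored as a pair (parent, distinguishing condition), and the parentage of a term recovered by following the parents. This is a sketch only: the human being and helicase entries come from the slides, while the enzyme entry is an illustrative assumption added to give the chain a second step, and the function names are invented.

```python
from typing import Dict, List, Tuple

# term -> (genus, differentia)
DEFINITIONS: Dict[str, Tuple[str, str]] = {
    "human being": ("animal", "is rational"),
    "helicase": ("enzyme",
                 "catalyzes the hydrolysis of ATP to unwind the DNA helix"),
    # Illustrative only, NOT from the slides: gives helicase a grandparent.
    "enzyme": ("protein", "catalyzes a chemical reaction"),
}


def article(noun: str) -> str:
    """Pick 'a' or 'an' from the first letter (a crude rule, enough here)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def render(term: str) -> str:
    """Write the definition in the 'an A =def. a B which ...' pattern."""
    genus, differentia = DEFINITIONS[term]
    return (f"{article(term)} {term} =def. "
            f"{article(genus)} {genus} which {differentia}")


def parentage(term: str) -> List[str]:
    """Follow genus links upward until a term with no stored definition."""
    chain = [term]
    while chain[-1] in DEFINITIONS:
        genus = DEFINITIONS[chain[-1]][0]
        if genus in chain:
            raise ValueError(f"circular definition involving {genus!r}")
        chain.append(genus)
    return chain


print(render("human being"))
print(parentage("helicase"))
```

Each entry names only its immediate parent, yet the whole lineage of a term can be read off mechanically, which is what lets the hierarchy and the definitions be checked against each other.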
74Summary: How to build your ontology
- Keep It Simple
- lowest possible barrier to entry
- Technology independence
- Clarity of definitions and scope
- With new data, we change our minds
- An ontology must adapt to reflect current
understanding of reality
- Mechanisms to support change
75How to build an ontology
- Part 3
- The sociology and organizational aspects
76Elements for Success: GO
- A community with a common vision
- A pool of talented and motivated
developers/scientists
- A mix of academic and commercial
- An organized, lightweight approach to product
development
- A leadership structure
- Communication
- A well-defined scope (our business)
Adopted from Open Source Menu for Success
77Gene Ontology
- Community annotation production
- Extreme Programming techniques to distributed
ontology generation
- Revision control
- Nightly conflict resolution
- Users are integral to the team
- Rapid iterations
78Why
Survey
Domain covered?
Public?
Community?
Active?
Salvage
Develop
Applied?
Improve
yes
no
Collaborate Learn
79Is your domain covered? Due diligence and
background research
- Step 1: Learn what is out there
- The most comprehensive list is on the OBO site:
http://obo.sourceforge.net
- Assess ontologies critically and realistically.
80Is it privately held? Ontologies must be shared
- Communities form scientific theories
- that seek to explain all of the existing evidence
- and can be used for prediction
- These communities are all directed to the same
biological reality, but have their own
perspective
- The computable representation must be shared
- Ontology development is inherently collaborative
- Open ontologies become connected to instance data;
this feeds back on ontology development
81Is it active? Pragmatic assessment of an ontology
- Is there access to help, e.g.
- help-me_at_weird.ontology.net ?
- Does a warm body answer help mail within a
reasonable time, say 2 working days?
82Is it applied? Toy ontologies are not useful
- Every ontology improves when it is applied to
actual instances of data
- It improves even more when these data are used to
answer research questions
- There will be fewer problems in the ontology and
more commitment to fixing remaining problems when
important research data that scientists depend
upon is involved
- Be very wary of ontologies that have never been
applied
83Work with that community
- To improve (if you found one)
- To develop (if you did not)
- How?
Improve
Collaborate and Learn
84Community vs. Committee ?
- Many people have (understandably) reservations
about collaborative development, because it can
easily be confused with design-by-committee
projects.
- Members of a committee represent themselves.
- Members of a community represent their community.
85Design for purpose - not in abstract
- Who will use it?
- If no one is interested, then go back to bed
- What will they use it for?
- Define the domain
- Who will maintain it?
- Be pragmatic and modest
86Start with a concrete proposal, not a blank slate.
- But do not commit your ego to it.
- Distribute to a small group you respect
- With a shared commitment.
- With broad domain knowledge.
- Who will engage in vigorous debate without
engaging their egos (or, at least not too much).
- Who will do concrete work.
87Step 1
- Alpha0: the first proposal, broad in breadth but
shallow in depth. By one person with broad domain
knowledge.
- Distribute to a small group (fewer than 6).
- Get together for two days and engage in vigorous
discussion. Be open and frank. Argue, but do not
be dogmatic.
- Reiterate over a period of months. Do as much as
possible face-to-face, rather than by
phone/email. Meet for 2 days every 3 months or so.
88Step 2
- Distribute Alpha1 to your group.
- All now test this Alpha1 in real life
- By classifying representations of instances with
these types
- Do not worry if (at this stage) you do not
have tools.
89Step 3
- Reconvene as a group for two days.
- Share experiences from implementation
- Can your Alpha1 be implemented in a useful way ?
- What are the conceptual problems ?
- What are the structural problems ?
90Step 4
- Establish a mechanism for change.
- Use CVS or Subversion.
- Limit the number of editors with write permission
(ideally to one person).
- Unique stable identifiers
- History tracking of changes
- Release a Beta1.
- Seriously implement Beta1 in real life.
- Build the ontology in depth.
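The two requirements "unique stable identifiers" and "history tracking of changes" can be sketched in a few lines: identifiers are issued once and never reused, renaming keeps the identifier, and retired terms are flagged rather than deleted. This is a toy model under stated assumptions, not how any particular ontology editor works; the prefix EX, the seven-digit numbering and all class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    term_id: str
    name: str
    is_obsolete: bool = False


@dataclass
class Ontology:
    prefix: str  # hypothetical identifier space, e.g. "EX"
    terms: Dict[str, Term] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)
    next_number: int = 1

    def add_term(self, name: str) -> str:
        """Issue a fresh identifier; numbers are never handed out twice."""
        term_id = f"{self.prefix}:{self.next_number:07d}"
        self.next_number += 1
        self.terms[term_id] = Term(term_id, name)
        self.history.append(f"ADD {term_id} {name}")
        return term_id

    def rename_term(self, term_id: str, new_name: str) -> None:
        """Change the label only; existing annotations keep pointing at the ID."""
        old_name = self.terms[term_id].name
        self.terms[term_id].name = new_name
        self.history.append(f"RENAME {term_id} {old_name} -> {new_name}")

    def obsolete_term(self, term_id: str) -> None:
        """Flag the term instead of deleting it, so old data stays resolvable."""
        self.terms[term_id].is_obsolete = True
        self.history.append(f"OBSOLETE {term_id}")


onto = Ontology(prefix="EX")
eye = onto.add_term("eye")
onto.rename_term(eye, "compound eye")
onto.obsolete_term(eye)
wing = onto.add_term("wing")
print(eye, wing)
print(onto.history)
```

In practice the history would live in the revision control system mentioned above; the point of the sketch is only that data annotated with an identifier must still resolve after the term changes.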
91Step 5
- After about 6 months reconvene and evaluate.
- Is the ontology suited to its purpose ?
- Is it, in practice, usable ?
- Are we happy about its broad structure and
content ?
92Step 6
- Go public.
- Release ontology to community.
- Release the products of the annotations.
- Invite broad community input and establish a
mechanism for this (e.g. SourceForge).
93Step 7
- Proselytize
- Publish in a high profile journal
- Engage new user groups
- Emphasize openness
- Write a grant
94Step 8
- Iterate
- Improvements come in two forms
- It is impossible to get it right the 1st (or 2nd,
or 3rd, ...) time.
- What we know about reality is continually growing
95Step 9
96Use the power of combination and collaboration
- as far as possible don't reinvent
- ontologies are like telephones: they are valuable
only to the degree that they are used and
networked with other ontologies
- but choose working telephones
- most telephones were broken when the technology
was first being developed
97The National Center for Biomedical Ontology
- Stanford, Berkeley, Mayo, Victoria, Buffalo,
UCSF, Oregon, Cambridge
98Bioinformatics and Computational Biology
99National Centers for Biomedical Computing 2005
- National Center for Integrative Biomedical
Informatics (Michigan)
- National Center for Multi-Scale Study of Cellular
Networks (Columbia)
- National Center for Biomedical Ontology
100
- Stanford: Tools for ontology alignment,
indexing, and management (Cores 1, 4-7: Mark
Musen)
- Lawrence Berkeley Labs: Tools to use ontologies
for data annotation (Cores 2, 5-7: Suzanna Lewis)
- Mayo Clinic: Tools for access to large
controlled terminologies (Core 1: Chris Chute)
- Victoria: Tools for ontology and data
visualization (Cores 1 and 2: Margaret-Anne
Storey)
- University at Buffalo: Dissemination of best
practices for ontology engineering (Core 6: Barry
Smith)
101Driving Biological Projects
- Trial Bank: UCSF, Ida Sim
- FlyBase: Cambridge, Michael Ashburner
- ZFIN: Oregon, Monte Westerfield
102Case study: Phenotypes, our current work on OBD
103Animal disease models
Animal models
Mutant Gene
Mutant or missing Protein
Mutant Phenotype
104Animal disease models
Humans
Animal models
Mutant Gene
Mutant or missing Protein
Mutant Phenotype (disease model)
Mutant Gene
Mutant or missing Protein
Mutant Phenotype (disease)
107SHH-/
SHH-/-
shh-/
shh-/-
108Phenotype (clinical sign)   entity    quality
109Phenotype (clinical sign)   entity    quality
P1                             eye       hypoteloric
110Phenotype (clinical sign)   entity    quality
P1                             eye       hypoteloric
P2                             midface   hypoplastic
111Phenotype (clinical sign)   entity    quality
P1                             eye       hypoteloric
P2                             midface   hypoplastic
P3                             kidney    hypertrophied
112Phenotype (clinical sign)   entity    quality
P1                             eye       hypoteloric
P2                             midface   hypoplastic
P3                             kidney    hypertrophied
PATO: hypoteloric, hypoplastic, hypertrophied
ZFIN: eye, midface, kidney
Anatomical ontology Cell tissue ontology
Developmental ontology Gene ontology
biological process cellular component
PATO (phenotype and trait ontology)
114Phenotype (clinical sign)   entity    quality
P1                             eye       hypoteloric
P2                             midface   hypoplastic
P3                             kidney    hypertrophied
Syndrome (disease): P1, P2, P3
holoprosencephaly
115Human holoprosencephaly
Zebrafish shh
Zebrafish oep
116OMIM gene ZFIN gene FlyBase gene FlyBase mut pub ZFIN mut pub mouse rat SNOMED OMIM disease
LAMB1 lamb1 LanB1 5 15 39 -
FECH fech Ferrochelatase 2 5 2 29 Protoporphyria, Erythropoietic
GLI2 gli2a ci 388 41 22 -
SLC4A1 slc4a1 CG8177 7 7 19 Renal Tubular Acidosis, RTADR
MYO7A myo7a ck 84 5 9 3 16 Deafness DFNB2 DFNA11
ALAS2 alas2 Alas 1 7 14 Anemia, Sideroblastic, X-Linked
KCNH2 kcnh2 sei 27 3 12 -
MYH6 myh6 Mhc 166 3 1 12 Cardiomyopathy, Familial Hypertrophic CMH
TP53 tp53 p53 64 3 3 19 11 Breast Cancer
ATP2A1 atp2a1 Ca-P60A 32 6 1 11 Brody Myopathy
EYA1 eya1 eya 251 5 4 6 Branchiootorenal Dysplasia
SOX10 sox10 Sox100B 1 17 4 4 Waardenburg-Shah Syndrome
117National Center for Biomedical Ontology
Capture and index experimental results
Open Biomedical Ontologies (OBO)
Open Biomedical Data (OBD)
BioPortal
Revise biomedical understanding
Relate experimental data to results from other
sources
118Phenotype as an observation
context
The class of thing observed
evidence
publication
environment
figure
assay
genetic
sequence ID
ontology
119Review of proposed EAV EQ model
- A phenotype is described using an Entity-Quality
double
- Entities are drawn from various OBO
ontologies: cell, anatomies, GO, ...
- Qualities are drawn from one ontology: PATO
120Separation of concerns
- Not phenotypes
- Genotype
- Environment
- Assay, measurement systems
- Images
schema
Association = Genotype Phenotype Environment Assay
Phenotype = Entity Quality
Entity = OBOClassID
Quality = PATOClassID
121 2003 Pilot study
- Trial of EAV model on small collection of
genotypes
- FlyBase
- ZFIN
- Genes were non-orthologous
- New curations - in progress
- orthologous genes with clinical relevance
- Use the same data model and exchange format?
122Example data records
Genotype   Entity   Value
npo        gut      dysplastic
           gut      small
r210       retina   irregular
           brain    fused
123ZFIN schema extension: stages
Genotype   Stage                     Entity   Quality
npo        Hatching:Pec-fin          gut      dysplastic
           Hatching:Pec-fin          gut      small
r210       Hatching:Long-pec         retina   irregular
           Larval:Protruding-mouth   brain    fused
124Stages
Association = Genotype Phenotype Environment Assay
Phenotype = Stage Entity Quality
Entity = OBOClassID
Stage = OBOAnatomicalStageClassID
Quality = PATOClassID
means zero or more
125Monadic and relational qualities
- Monadic
- the quality inheres in a single entity
- Relational
- the quality inheres in two or more entities
- sensitivity of an organism to a kind of drug
- sensitivity of an eye to a wavelength of light
- can turn relational qualities into cross-product
monadic qualities
- e.g. sensitivityToRedLight
- better to use relational qualities
- avoids redundancy with existing ontologies
126Incorporating relational qualities
Association = Genotype Phenotype Environment Assay
Phenotype = Stage Entity Quality Entity
Entity = OBOClassID
Quality = PATOVersion2ClassID
Example data record
Phenotype = organism sensitiveTo puromycin
127Measurable qualities
- Some qualities are inexact and implicitly
relative to a wild-type or normal quality
- relatively short, relatively long, relatively
reduced
- easier than explicitly representing
- this tail length shorter-than mouse wild-type
tail length
- Some qualities are determinable
- use a measure function
- unit, value, time
- this tail has length L
- measure(L, cm) = 2
- Keep measurements separate from (but linked to)
quality ontology
128Incorporating measurements
Association = Genotype Phenotype Environment Assay
Phenotype = Stage Entity Quality Entity Measurement
Measurement = Unit Value (Time)
Entity = OBOClassID
Quality = PATOVersion2ClassID
Example data record
Phenotype = gut acidic
Measurement = pH 5
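The schema on these slides can be read as a small set of record types. The sketch below is one possible rendering as Python dataclasses, assuming that stage, the second entity, and the measurement are optional; plain labels stand in for OBO and PATO class IDs because the slides give none, and the field names and describe method are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    unit: str
    value: float
    time: Optional[str] = None


@dataclass(frozen=True)
class Phenotype:
    entity: str                               # stands in for an OBO class ID
    quality: str                              # stands in for a PATO class ID
    stage: Optional[str] = None               # anatomical stage class
    related_entity: Optional[str] = None      # second bearer of a relational quality
    measurement: Optional[Measurement] = None # kept separate from the quality

    def describe(self) -> str:
        """Render the record as a short phrase for display."""
        parts = [self.entity, self.quality]
        if self.related_entity is not None:
            parts.append(self.related_entity)
        text = " ".join(parts)
        if self.stage is not None:
            text += f" at {self.stage}"
        if self.measurement is not None:
            text += f" ({self.measurement.unit} {self.measurement.value:g})"
        return text


@dataclass(frozen=True)
class Association:
    genotype: str
    phenotype: Phenotype
    environment: Optional[str] = None
    assay: Optional[str] = None


# The example records from the slides.
acidic_gut = Phenotype("gut", "acidic", measurement=Measurement("pH", 5))
drug_sensitivity = Phenotype("organism", "sensitiveTo", related_entity="puromycin")
record = Association("npo", Phenotype("gut", "dysplastic", stage="Hatching:Pec-fin"))

print(acidic_gut.describe())
print(drug_sensitivity.describe())
print(record.genotype, "->", record.phenotype.describe())
```

Keeping Measurement as its own record mirrors the advice to keep measurements separate from, but linked to, the quality ontology.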
129The Methodology of Annotations
- Scientific curators use experimental observations
reported in the biomedical literature to link
gene products with GO terms in annotations.
- The gene annotations taken together yield a
slowly growing computer-interpretable map of
biological reality.
- The process of annotating literature also leads
to improvements and extensions of the ontology
itself, which institutes a virtuous cycle of
improvement in the quality and reach of future
annotations and of future versions of the
ontology.
130When we annotate the record of an experiment
- we use terms representing types to capture what
we learn about the instances
- this experiment as a whole (a process)
- these instances experimented upon
- the instances are typical
- they are representatives of a type
131Ontology
- A thing of beauty is a joy forever
- With acknowledgement and thanks to Mark Musen,
Barry Smith, Sima Misra, Chris Mungall, Daniel
Rubin, and David Hill
132Interpretation of the schema
- How should EAV data records be interpreted by a
computer?
- What are the instances?
- Is the EAV schema just to improve database
searching?
- Can it be used for meaningful cross-species
comparisons?
133What is the Entity slot used for?
Genotype    Entity                         Quality
npo         gut                            dysplastic
            gut                            small
r210        retina                         irregular
            brain                          fused
tm84        d/v pattern formation          abnormal
            blood islands                  number increased
Bsb2        elongation of arista lateral   arrested
C-alpha1D   adult behaviour                uncoordinated
2003 trial data: FB, ZFIN
134What is the Entity slot used for?
- In practical terms
- An ID from one of the following ontologies
- GO: CC, BP, and MF
- Species-specific anatomical ontology
- OBO Cell
- Or a cross-product
- e.g. acidification (GO:0045851)
- which has_location (OBOREL) midgut (FBbt:00005383)
- example from FBal0062296: Acidification in the
midgut of homozygous larvae is often less than in
wild-type larvae
- But what does it mean in the context of an
annotation?
135Universals and particulars
- An ontology consists of universals (classes)
- Fruitfly, wing, flight
- Experimental data generally concerns particulars
(instances) that instantiate universals
- this particular wing of this particular fruitfly
- this particular fruitfly participating in this
particular flight from here to there
- In annotation we often use a class ID as a proxy
for an (unnamed) instance (or collection of
instances)
- It is important to always keep this distinction
in mind
136What is the Quality slot used for?
Genotype    Entity                         Quality               Quality
npo         gut                            structure             dysplastic
            gut                            relative size         small
r210        retina                         pattern               irregular
            brain                          structure             fused
tm84        d/v pattern formation          qualitative           abnormal
            blood islands                  relative number       number increased
Bsb2        elongation of arista lateral   process               arrested
C-alpha1D   adult behaviour                behavioral activity   uncoordinated
2003 trial data: FB, ZFIN
137Qualities
- Treat as Qualities
- a dependent entity
- a quality must have independent entity(s) as
bearer
- the quality inheres_in the bearer
- Examples
- The particular shape of this ball
- The particular structure of this wing
- The particular length of this tail
- The particular rate of synaptic transmission
between these two neurons
138Attribute universals vs Attribute particulars
- In an EAV annotation, a PATO class ID typically
serves as a proxy for an unnamed attribute
instance
- Universals (classes) must always be defined in
terms of their instances
A Formal Theory of Substances, Qualities, and
Universals. Fabian Neuhaus, Pierre Grenon, Barry
Smith
139How is the attribute slot used?
Genotype    Entity                         Attribute             Attribute
npo         gut                            structure             dysplastic
            gut                            relative size         small
r210        retina                         pattern               irregular
            brain                          structure             fused
tm84        d/v pattern formation          qualitative           abnormal
            blood islands                  relative number       number increased
Bsb2        elongation of arista lateral   process               arrested
C-alpha1D   adult behaviour                behavioral activity   uncoordinated
140Current Model
Association = Genotype Phenotype Environment Assay
Phenotype = Stage Entity Attribute
Entity = OBOClassID
Stage = OBOAnatomicalStageClassID
Attribute = PATOClassID
means zero or more
141Composite phenotype classes
- Mammalian phenotype has composite phenotype
classes
- e.g. reduced B cell number
- Compose at annotation time or ontology curation
time?
- False dichotomy
- Core 2 will help map between composite class
based annotation and EAV annotation
142Interpreting annotations
- Annotations are data records
- typically use class IDs
- implicitly refer to instances
- How do we map an annotation to instances?
- Important for using annotations computationally
143Interpreting annotations (1)
- What does an EA (or EAV) annotation mean?
- Annotation
- Genotype=FBal00123 E=brain A=fused
- presumed implied meaning
- this organism
- has_part x, where
- x instance_of brain
- x has_quality fused
- or in natural language
- this organism has a fused brain
- Various built-in assumptions
145Interpreting annotations (II)
- What does this mean
- annotation
- Genotype=FBal00123 E=wing A=absent
- using same mapping as annotation I
- fly98 has_part x, where
- x instance_of wing
- x has_quality absent
- or in natural language
- this fly has a wing which is not there!
- What we really intend
- NOT(this organism has_part x, where x instance_of
wing)
146Interpreting annotations (II)
- What does this mean
- annotation
- Genotype=FBal00123 E=wing A=absent
- using same mapping as annotation I
- this organism has_part x, where
- x instance_of wing
- x has_quality absent
- or in natural language
- this fly has a wing which is not there!
- What we really intend
- this organism has_quality wingless
- wingless = the property of having
count(has_part wing) = 0
147Are our computational representations intended to
capture reality?
148Does this matter?
- If we simply use the colloquial expression
"absent"
- What are the consequences?
- Basic search will be fine
- e.g. find all wing phenotypes
- Logical reasoners will compute incorrect results
- Computers will not be able to reason correctly
- We must explicitly provide specific rules for
certain attributes such as "absent"
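A sketch of what "specific rules for certain attributes" could look like in code: a default reading that follows the has_part / instance_of / has_quality mapping from the earlier slides, plus an override for "absent" that produces the negated statement we actually intend. The rule table and function names are illustrative assumptions, and the output is plain text rather than input for any real reasoner.

```python
from typing import Callable, Dict


def default_reading(entity: str, quality: str) -> str:
    """The naive mapping: some part of the organism bears the quality."""
    return (f"this organism has_part x, where x instance_of {entity}, "
            f"x has_quality {quality}")


def absent_reading(entity: str, quality: str) -> str:
    """'absent' is not a quality of a part; it denies that the part exists."""
    return f"NOT(this organism has_part x, where x instance_of {entity})"


# Qualities that need their own interpretation rule (hypothetical table).
SPECIAL_RULES: Dict[str, Callable[[str, str], str]] = {
    "absent": absent_reading,
}


def interpret(entity: str, quality: str) -> str:
    """Pick the special rule for a quality if one exists, else the default."""
    rule = SPECIAL_RULES.get(quality, default_reading)
    return rule(entity, quality)


print(interpret("brain", "fused"))
print(interpret("wing", "absent"))
```

Without the override, the second call would assert that the organism has a wing bearing the quality "absent", which is exactly the incorrect result a logical reasoner would otherwise compute.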
149Interpreting annotations (III)
- What does this mean
- annotation
- E=digit A=supernumerary
- using same interpretation as annotation I
- this organism has_part x, where
- x instance_of digit
- x has_quality supernumerary
- or in natural language
- this organism has a particular finger which is
supernumerary!!
- What we really intend
- this person has_quality supernumerary finger
- supernumerary finger = the property of having
count(has_part digit) > wild-type
150Interpreting annotations (IV)
- What does this mean
- annotation
- G=tmp001 E=brown fat cell A=increased
quantity
- using same mapping as annotation I
- this organism has_part x, where
- x instance_of brown fat cell
- x has_quality increased quantity
- or in natural language
- this organism has a particular brown fat cell
which is increased in quantity
- What we really intend
- this organism has_part population_of(brown fat
cell) which has_quality increased size
151Other use cases
- Spermatocyte devoid of asters
- Homeotic transformations
- Increased distance between wing veins
- Some vs. all
152Alternate perspectives
- process vs. state
- regulatory processes
- acidification of midgut has_quality reduced rate
- midgut has_quality low acidity
- development vs. behavior
- wing development has_quality abnormal
- flight has_quality intermittent
- granularity (scale)
- chemical vs. molecular vs. cell vs. tissue vs.
anatomical part