Title: VR
1VR
2Formal Principles for Biomedical Ontologies
- Barry Smith
- http://ifomis.de
3Three levels of ontology
- 1) formal (top-level) ontology, dealing with categories employed in every domain
- object, event, whole, part, instance, class
- 2) domain ontology, which applies the top-level system to a particular domain
- cell, gene, drug, disease, therapy
- 3) terminology-based ontology
- large, lower-level system
- Dupuytren's disease of palm, nodules with no contracture
7Compare
- pure mathematics (re-usable theories of structures such as order, set, function, mapping)
- applied mathematics: applications of these theories, re-using the same definitions, theorems, proofs in new application domains
- physical chemistry, biophysics, etc., adding detail
8Three levels of biomedical ontology
?????
- 1) formal (top-level) ontology
- medical ontology has nothing like the technology of re-usable definitions, theorems and proofs provided by pure mathematics
- 2) domain ontology
- UMLS Semantic Network, GALEN CORE
- 3) terminology-based ontology
- UMLS, SNOMED-CT, GALEN, FMA
9Description Logic, Protégé,
- and other tools for supporting automatic reasoning do not fill this gap
- they do not provide theories of classes, functions, processes, etc.
- rather, successful coding in a DL-framework presupposes that such theories have already been applied in the very construction of the terminology-based ontology
10IFOMIS
- Institute for Formal Ontology and Medical Information Science
- mission
- use basic principles of philosophical ontology, traditional theories of classification and definition for quality assurance and alignment of biomedical ontologies
11Strategy
- Part 1: Survey of GO
- Part 2: Provide principles for building biomedical ontologies derived from formal (top-level) ontology, and illustrate how they can help in quality assurance of terminology-based ontologies like GO
- Part 3: Show how it can be done right
12Part One Survey of GO
13GO is three ontologies
- cellular components
- molecular functions
- biological processes
- December 16, 2003
- 1372 component terms
- 7271 function terms
- 8069 process terms
14GO an impressive achievement
- used by over 20 genome databases and many other groups in academia and industry
- successful methodology, much imitated
- now part of OBO (open biological ontologies) consortium
- Here I focus on problems / errors
- GO here is just an example
15Primary aim of GO
- not rigorous definition and principled classification
- but rather providing a practically useful framework for keeping track of the biological annotations that are applied to gene products
16Each of GO's ontologies
- is organized in a graph-theoretical structure involving two sorts of links or edges
- is-a
- (epithelial cell differentiation is-a cell differentiation)
- part-of
- (axonemal microtubule part-of axoneme)
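A minimal sketch of the structure just described: a directed graph whose edges carry one of two labels. The class name `TermGraph` and its methods are hypothetical illustrations, not GO's actual storage format; the two edges are the examples given on the slide.

```python
IS_A = "is-a"
PART_OF = "part-of"


class TermGraph:
    """Directed graph whose edges are labeled is-a or part-of."""

    def __init__(self):
        # child term -> list of (relation, parent term)
        self._edges = {}

    def add_edge(self, child, relation, parent):
        if relation not in (IS_A, PART_OF):
            raise ValueError(f"unknown relation: {relation}")
        self._edges.setdefault(child, []).append((relation, parent))

    def parents(self, term, relation=None):
        """Return the parents of term, optionally restricted to one relation."""
        return [p for r, p in self._edges.get(term, []) if relation in (None, r)]


graph = TermGraph()
graph.add_edge("epithelial cell differentiation", IS_A, "cell differentiation")
graph.add_edge("axonemal microtubule", PART_OF, "axoneme")

print(graph.parents("axonemal microtubule", PART_OF))
```

Keeping the relation label on every edge is what later allows each relation to be checked separately.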
17This graph-theoretic architecture
- is designed to help humans, who can use the graphs to locate the features and attributes they are addressing in their work and thus to determine the designated terms for these features and attributes within GO's controlled vocabulary.
18GO's three ontologies
- When a gene is identified, three important types of questions need to be addressed: Where is it located in the cell? What functions does it have on the molecular level? And to what biological processes do these functions contribute?
19GOs three ontologies
20The Cellular Component Ontology (counterpart of
anatomy)
- consists of terms such as flagellum, chromosome, ferritin, extracellular matrix and virion
- Cellular components are physical and measurable entities. They are, in the terminology of philosophical ontology, objects or things (independent continuants). They endure self-identically through time while undergoing changes of various sorts
- Cellular component embraces also the extracellular environment of cells and cells themselves
21No organisms
- GO does not include terms for specific
organisms, not even for single-celled organisms
22The Molecular Function Ontology
- molecular function: the action characteristic of a gene product.
- Actions such as ice nucleation or protein stabilization do not endure but rather occur.
23The Molecular Function Ontology
- Originally included terms such as anti-coagulant (defined as a substance that retards or prevents coagulation) and enzyme (defined as a substance that catalyzes)
- These refer neither to functions nor to actions but rather to components.
24The Molecular Activity Ontology
- Confusion remedied to a degree by policy change of March 2003: All GO molecular function term names, with the exception of the parent term molecular function and of the whole node binding, are to be appended with the word activity.
25Function Activity
- Thus the term structural molecule, which is
defined as meaning the action of a molecule
that contributes to structural integrity, is
amended to structural molecule activity
26still problems with GO Molecular Function Definitions
- anti-coagulant activity (defined as a substance that retards or prevents coagulation)
- enzyme activity (defined as a substance that catalyzes)
27 and there are still problems with Molecular Function terms
- GO:0005199
- structural constituent of cell wall
28structural constituent of cell wall
- Definition: The action of a molecule that contributes to the structural integrity of a cell wall.
- confuses actions, which GO includes in its function ontology, with constituents, which GO includes in its cellular component ontology
29- extracellular matrix structural constituent
- puparial glue (sensu Diptera)
- structural constituent of bone
- structural constituent of chorion (sensu Insecta)
- structural constituent of chromatin
- structural constituent of cuticle
- structural constituent of cytoskeleton
- structural constituent of epidermis
- structural constituent of eye lens
- structural constituent of muscle
- structural constituent of myelin sheath
- structural constituent of nuclear pore
- structural constituent of peritrophic membrane (sensu Insecta)
- structural constituent of ribosome
- structural constituent of tooth enamel
- structural constituent of vitelline membrane (sensu Insecta)
30The Biological Process Ontology
- biological process: A phenomenon marked by changes that lead to a particular result, mediated by one or more gene products.
- Examples
- glycolysis
- death
- adult walking behavior
- response to blue light
31Occurrents
- Both molecular activity and biological process terms refer to what philosophical ontologists call occurrents
- entities which do not endure through time but rather unfold themselves in successive temporal phases.
- Occurrents can be segmented into parts along the temporal dimension.
- Continuants exist in toto in every instant at which they exist at all.
32Molecular functions and biological processes are closely interrelated
- E.g. the process anti-apoptosis involves the molecular function apoptosis inhibitor activity.
- How can GO express such relations?
33Are they a matter of granularity?
- A biological process is accomplished via one or more ordered assemblies of molecular functions.
- ??? Molecular activities are building blocks of biological processes ???
- So: Functions are parts of processes
- But no
34GO's three ontologies are separate
- biological processes
- molecular functions
- cellular constituents
- No links or edges defined between them
35Question
- How understand granularity
- if not in terms of parthood?
36Molecular functions
- renamed activities,
- because activity, unlike process, connotes agency?
- but molecules are not agents
- hypothesis: the term function was used for the molecular function ontology because the activities in question are functional in relation to the pertinent organism.
37Functions
- A function is functional
- beneficial to the organism
- If an organism-part has a function, this is because the functioning of this organism-part is beneficial to the organism
- The function of the heart is to pump blood
- Not: the function of the hip is to financially support hip-replacement surgeons
38- Some processes are functionings
- E.g. pumping blood
39Two sorts of processes
- Functionings (realizations of functions beneficial to the organism)
- Other processes (e.g. the result of external interventions)
- Cf. difference between physiology and pathology
40GO not clear about this distinction
- transport: The directed movement of substances (such as ions) into, out of, or within a cell
- cell growth and/or maintenance: Any process required for the survival and growth of a cell
- Synonym: cell physiology
- transport is-a cell growth and/or maintenance
- but (GO:0019060) viral intracellular protein transport
- is-a transport
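The consequence of these two is-a links can be made explicit by following them upward. The sketch below is illustrative only: it stores a single parent per term, which is a simplification (GO terms can have several parents), and the function name `ancestors` is hypothetical.

```python
# is-a assertions quoted on the slide (child -> parent).
IS_A = {
    "viral intracellular protein transport": "transport",
    "transport": "cell growth and/or maintenance",
}


def ancestors(term, is_a):
    """Follow is-a links upward and return every ancestor, nearest first."""
    found = []
    while term in is_a:
        term = is_a[term]
        if term in found:  # guard against accidental cycles
            break
        found.append(term)
    return found


chain = ancestors("viral intracellular protein transport", IS_A)
print(chain)
```

Because is-a is transitive, the viral process is inferred to be a case of cell growth and/or maintenance (synonym: cell physiology), which is the blurring of functionings and other processes that the slide points to.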
41Why do these problems arise?
- GO has no clear understanding of the role of temporal relations in organizing an ontology
- (thus also no clear understanding of the difference between a function and the activity which is the realization of a function)
42GO excludes organisms from its scope (they are of the wrong granularity)
- Yet each process or function requires some bearer or bearers which it is the process or function of.
- Processes are dependent on their bearers
- (Theory of dependence vs. independence: part of formal ontology)
- (Theory of continuants vs. occurrents: part of formal ontology)
43Some formal ontology
- Components are independent continuants
- Functions are dependent continuants
- (the function of an object exists continuously in time, just like the object which has the function
- and it exists even when it is not being exercised)
- Processes are (dependent) occurrents
44More generally
- Continuants can be divided into independent (objects, things, components) and dependent (features, attributes, conditions, functions, roles, qualities)
- All occurrents are dependent entities.
- Every occurrent is dependent for its existence on one or more continuants.
- A change is always a change in some continuant object.
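The three categories and the dependence requirement can be written down as a small data model. This is a sketch under stated assumptions: `Category`, `Entity` and `missing_bearer` are hypothetical names, and the sample entities are taken from examples used elsewhere in these slides (the heart pumping blood; adult walking behavior, for which GO has no bearer because organisms are out of scope).

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    INDEPENDENT_CONTINUANT = "independent continuant"  # components
    DEPENDENT_CONTINUANT = "dependent continuant"      # functions
    OCCURRENT = "occurrent"                            # processes


@dataclass
class Entity:
    name: str
    category: Category
    bearers: list = field(default_factory=list)  # entities this one depends on

    def is_dependent(self):
        # All occurrents and all dependent continuants are dependent entities.
        return self.category is not Category.INDEPENDENT_CONTINUANT


def missing_bearer(entities):
    """Names of dependent entities for which no bearer is recorded."""
    return [e.name for e in entities if e.is_dependent() and not e.bearers]


heart = Entity("heart", Category.INDEPENDENT_CONTINUANT)
pump_fn = Entity("function: to pump blood", Category.DEPENDENT_CONTINUANT, [heart])
pumping = Entity("pumping blood", Category.OCCURRENT, [heart])
walking = Entity("adult walking behavior", Category.OCCURRENT)  # no bearer recorded

print(missing_bearer([heart, pump_fn, pumping, walking]))
```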
46Part Two
- Principles of Biomedical Ontologies and their
use in quality assurance of terminology-based
ontologies
47Principle of Temporal Coherence
- An ontology should rigorously distinguish continuants from occurrents.
- (Anatomy is a science of continuants)
48Principle of Dependence
- If an ontology recognizes a dependent entity then it (or a linked ontology) should recognize also the relevant class of bearers
- Part of our aim here is to lay down principles which can support such linkability
49Linking to external ontologies
- can also help to link together GO's own three separate parts
50GO's three ontologies
- biological processes (dependent)
- molecular functions (dependent)
- cellular constituents (independent)
51GO's three ontologies
- organism-level biological processes
- cellular processes
- molecular functions
- cellular constituents
52[Figure: molecular functions, cellular processes, organism-level biological processes; molecule complexes, cellular constituents, organisms; legend: part-of, is dependent on]
56GO must be linked with other, neighboring ontologies
- GO has adult walking behavior but not adult
- GO has eye pigmentation but not eye
- GO has response to blue light but not light (or blue)
- 94% of words used in GO terms are not GO terms
- Part of the solution: Medical FactNet (NLM, 10am tomorrow)
57GO taking steps in this direction
- Linking to a good external ontology of organism types (to solve some of the problems with sensu)
- It needs to link further to a good external ontology of anatomy, to solve the location problem
- and to a good external ontology of coarse-grained reality, to solve the adult walking behavior problem
- Human beings know what walking means
58- Human beings know what adults are: older than embryos
- GO needs to be linked to ontology of development
- and in general to resources for reasoning about time and change
59but such linkages are possible
- only if GO itself has a coherent formal
architecture
60Principle of Univocity
- univocity: terms should have the same meanings (and thus point to the same referents) on every occasion of use
- UMLS-Semantic Network
- organization: body plan (anatomy)
- organization: social organization
61Polysemy of GO's part-of
- membrane part-of cell, intended to mean: a membrane is a part-of any cell
- flagellum part-of cell, intended to mean: a flagellum is part-of some cells
- replication fork part-of cell cycle, intended to mean: a replication fork is part-of the nucleoplasm only during certain times of the cell cycle
62Three meanings of part-of
- part-of: can be part of (flagellum part-of cell)
- part-of: is sometimes part of (replication fork part-of the nucleoplasm)
- part-of: is included as a sublist in
63THE GOAL IS
- not to impose basic principles of classification and definition on biologists
- All the principles presented here should be conceived not as iron requirements but rather as rules of thumb
- deviation from which is often marked by characteristic families of coding errors
64example
- GO:0030430 host cell cytoplasm, defined as: The cytoplasm of a host cell
- GO:0018995 host, defined as: Any organism in which another organism, especially a parasite or symbiont, spends part or all of its life cycle and from which it obtains nourishment and/or protection
65Why is this an error?
- because organisms do not fall within the scope of GO
- An organism is not a cellular component, and it is not a molecular function, and not a biological process, either
66host cell cytoplasm part-of host
- breaks GO's own granularity constraints
67Why univocity?
- 1. humans are good at disambiguating ambiguous expressions, machines not
- 2. quality assurance and ontology maintenance
- 3. GO, SNOMED, etc., are designed to constitute controlled vocabularies
68Quality assurance and ontology maintenance must
be automated
- As GO increases in size and scope it will be increasingly difficult to maintain the semantic consistency we desire without software tools that perform consistency checks and controlled updates.
- The addition of each new term will require the curator to understand the entire structure of GO in order to avoid redundancy and to ensure that all appropriate linkages are made with other terms.
69The purpose of a controlled vocabulary
- to ensure that the same terms are used by different research groups with the same meanings
- this has implications also for the syntax of GO terms (the way terms are compounded together out of other terms)
70Univocity and syntax
71/
- GO:0008608 microtubule/kinetochore interaction
- df: Physical interaction between microtubules and chromatin via proteins making up the kinetochore complex
72/
- GO:0001539 ciliary/flagellar motility
- df: Locomotion due to movement of cilia or flagella.
73/
- GO:0045798 negative regulation of chromatin assembly/disassembly
- df: Any process that stops, prevents or reduces the rate of chromatin assembly and/or disassembly
74/
- GO:0000082 G1/S transition of mitotic cell cycle
- defined as: Progression from G1 phase to S phase of the standard mitotic cell cycle.
75/
- GO:0001559 interpretation of nuclear/cytoplasmic to regulate cell growth
- df: The process where the size of the nucleus with respect to its cytoplasm signals the cell to grow or stop growing.
76/
- GO:0015539 hexuronate (glucuronate/galacturonate) porter activity
- df: Catalysis of the reaction: hexuronate(out) + cation(out) = hexuronate(in) + cation(in)
77Problems with GO's compositionality
- / (slash): 286
- ; (semi-colon): 177
- , (comma): 1206
- and: 180
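The slides do not say how these counts were produced. As a rough sketch of one way to tally such separators, the code below counts, for each operator, the term names that contain it at least once. The function name is hypothetical, and the term list is only a handful of GO term names quoted elsewhere in these slides, so the numbers it prints are not the slide's figures.

```python
from collections import Counter

# Separators tallied on the slide; " and " is matched with surrounding spaces.
OPERATORS = ["/", ";", ",", " and "]

# A few GO term names quoted in these slides (not the full vocabulary).
TERMS = [
    "microtubule/kinetochore interaction",
    "ciliary/flagellar motility",
    "G1/S transition of mitotic cell cycle",
    "cytokinesis, site selection",
    "hydrolase activity, acting on acid anhydrides, "
    "involved in cellular and subcellular movement",
    "cell growth and/or maintenance",
]


def count_terms_with_operator(terms, operators):
    """For each operator, count the terms that contain it at least once."""
    counts = Counter()
    for term in terms:
        for op in operators:
            if op in term:
                counts[op] += 1
    return counts


counts = count_terms_with_operator(TERMS, OPERATORS)
for op in OPERATORS:
    print(repr(op), counts[op])
```

Note that "and/or" is counted under the slash but not under " and ", which is itself a small illustration of how hard these operators are to treat uniformly.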
78comma
- cytokinesis, site selection
79plurals
- biological process
- physiological processes
- cellular process
- cell growth and/or maintenance
80Operators and counts
- specification: 39
- formation, forming: 142
- determination, determinacy: 56
- with: 54
- from: 141
- in: 51
- via: 164
- complex: 563
- regulator, regulatory, regulated, regulation: 1326
- acting on: 146
- constituting: 35
- constituent, constitutive: 29
- dependent: 182
- sensu: 469
81Questions regarding operators
- How does constituent relate to component?
- If A within B then is A part-of B or included-in-the-interior-of B?
- Does via mean by means of or along the path of?
- How is un- related to not (how is unlocalized related to not localized)?
82involved in
- term-forming operator (reflection of GO's limited resources for expressing relations)
- hydrolase activity, acting on acid anhydrides, involved in cellular and subcellular movement
- asymmetric protein localization involved in cell fate commitment
- cell-cell signaling involved in cell fate commitment
- protein secretion involved in cell fate commitment
83involved in
- hydrolase activity, acting on acid anhydrides, involved in cellular and subcellular movement
- This is a term because GO does not have the resources to express is-involved-in as a relation between terms
- note problems with commas
84involved in
- hydrolase activity,
- acting on acid anhydrides,
- involved in cellular and subcellular movement
- is-a hydrolase activity, acting on acid anhydrides
85involved in
- hydrolase activity, acting on acid anhydrides, involved in cellular and subcellular movement is-a hydrolase activity, acting on acid anhydrides
- is ok: hydrolase activity, acting on anhydrides can but need not be involved in cellular and subcellular movement
86involved in
- asymmetric protein localization involved in cell fate commitment is-a cell fate commitment
- should be a part-of relation
- (compare: breathing involved in running is-a running)
87involved in
- cell-cell signaling involved in cell fate commitment is-a cell fate commitment
- ditto: should be a part-of relation
88these, though, are good
- asymmetric protein localization involved in cell fate commitment is-a asymmetric protein localization
- cell-cell signaling involved in cell fate commitment is-a cell-cell signaling
89involved in
- protein secretion involved in cell fate commitment: synonym of protein secretion
- are there instances of protein secretion not involved in cell fate commitment?
- Problems with GO's peculiar use of synonym
90Consequences of inconsistent and/or indeterminate use of operators
- 29.42% of the distinct terms within GO contain one or more polysemous operators
- but these terms receive only 13.96% of the annotations present within GO
- Hypothesis: This lower percentage of annotations reflects the fact that poorly defined operators are not well understood by annotators, who thus avoid the corresponding terms
91Principle of Compositionality
- The meanings of compound terms should be determined
- 1. by the meanings of constituent terms
- together with
- 2. the rules governing the syntactic operators
92Principle of Objectivity
- which classes exist is not a function of our biological knowledge.
- (Terms such as unclassified or unknown ligand or not otherwise classified as peptides do not designate biological natural kinds.)
93GO:0008372 cellular component unknown
- cellular component unknown is-a cellular component
- unlocalized is-a cellular component
- Holliday junction helicase complex is-a unlocalized
94GO's excuse
- unlocalized is used as a placeholder only
- but automatic information retrieval systems cannot distinguish it from other, genuine class names
- formal tools exist which can deal with the addition of knowledge into a classification system without the need to create fake classes
- (Theory of Granular Partitions)
95Principle of Positivity
- Class names should be positive. Logical complements of classes are not themselves classes.
- (Terms such as non-mammal or non-membrane or invertebrate do not designate natural kinds.)
96- Terms such as
- Veterinary proprietary drug AND/OR biological
- do not designate natural kinds. (Which biological classes exist is not a matter of logic.)
- has 2532 children in SNOMED-CT
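Violations of the Objectivity and Positivity principles tend to leave textual traces in class names, so a crude name-based check is possible. The sketch below is a heuristic only: the marker lists are assumptions drawn from the examples on these slides, the function name `flag_term` is hypothetical, and a name such as invertebrate, which is negative in meaning but carries no such marker, would not be caught.

```python
# Hypothetical markers, taken from the examples on the slides.
EPISTEMIC = ("unknown", "unclassified", "unlocalized", "not otherwise classified")
NEGATIVE = ("non-",)
LOGICAL = ("and/or",)


def flag_term(name):
    """Return the kinds of problem a class name appears to show (heuristic)."""
    lowered = name.lower()
    flags = []
    if any(marker in lowered for marker in EPISTEMIC):
        flags.append("objectivity")
    if any(marker in lowered for marker in NEGATIVE):
        flags.append("positivity")
    if any(marker in lowered for marker in LOGICAL):
        flags.append("logical combination")
    return flags


for name in [
    "cellular component unknown",
    "non-membrane",
    "Veterinary proprietary drug AND/OR biological",
    "axoneme",
]:
    print(name, flag_term(name))
```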
97Principle of Explicitness
- if a link between two classes holds only under certain specific restrictions, then this restriction should be made explicit in the statement of the corresponding link-axiom
- cf. GO's sensu
98GO
- can in practice be used only by trained biologists (with know-how)
- whether a GO-term truly stands in the is_a relation depends e.g. on the type of organism involved
- glycosome is part-of cytoplasm
- only for Kinetoplastidae
- Computers have no counterpart of such context-dependent know-how
99Principle of Single Inheritance
- no class in a classificatory hierarchy should
have more than one parent on the immediate higher
level
100Principle of Taxonomic Levels
- the terms in a classificatory hierarchy should be divided into predetermined levels (analogous to the levels of kingdom, phylum, class, order, etc., in traditional biology).
- depth in GO's hierarchies not determinate because of multiple inheritance
101Principle of Partonomic Levels
- Terms in a partonomic hierarchy should be divided into predetermined granularity levels (for example organism, organ, cell, molecule, etc.)
- (GO is about to break physiological process into 'cell physiological process' and 'organism physiological process'.)
- take granularity seriously
102Principle of Exhaustiveness
- the classes on any given level should exhaust
the domain of the classificatory hierarchy.
103Single Inheritance + Exhaustiveness = JEPD
- for Jointly Exhaustive and Pairwise Disjoint
- Exhaustiveness often difficult to satisfy in the realm of biological phenomena but its acceptance as an ideal is presupposed as a goal by every scientist.
- Single inheritance accepted in all traditional (species-genus) classifications, now under threat because multiple inheritance is a computationally useful device (allows one to avoid certain kinds of combinatory explosion).
104Problems with multiple inheritance
- A is-a1 B
- A is-a2 C
- is-a no longer univocal
105GO's is-a is pressed into service to mean a variety of different things
- the resulting ambiguities make the rules for correct coding difficult to communicate to human curators in terms of generally intelligible principles
- they also serve as obstacles to integration with neighboring ontologies
106Problems with multiple inheritance
- [Figure: A with is-a1 link to B and is-a2 link to C; further nodes D and E]
- sibling is no longer determinate
- Principle of levels is violated
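Why multiple inheritance undermines levels can be seen by computing the depth of a term along each of its paths to the root. The hierarchy below is hypothetical (the names `root` and `X` are not on the slides); it is chosen so that A has two parents sitting at different depths.

```python
# Hypothetical is-a edges: term -> list of parents.
PARENTS = {
    "root": [],
    "B": ["root"],
    "X": ["root"],
    "C": ["X"],
    "A": ["B", "C"],
}


def multiple_parents(parents):
    """Terms that violate single inheritance."""
    return sorted(t for t, ps in parents.items() if len(ps) > 1)


def depths(term, parents):
    """Set of possible depths of term, one per path up to a root."""
    if not parents[term]:
        return {0}
    return {d + 1 for p in parents[term] for d in depths(p, parents)}


print(multiple_parents(PARENTS))
print(sorted(depths("A", PARENTS)))
```

A term with more than one possible depth cannot be assigned to a single taxonomic level, and which terms count as its siblings depends on which parent is chosen.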
109A storage vacuole is not a special kind of vacuole
- a box used for storage is not a special kind of
box
111Another term-forming operator
- lytic vacuole within a protein storage vacuole
- lytic vacuole within a protein storage vacuole is-a protein storage vacuole
- time-out within a baseball game is-a baseball game
- embryo within a uterus is-a uterus
112Problems with Location
- is-located-at / is-located-in and similar relations need to be expressed in GO via some combination of is-a and part-of
- is-a unlocalized
- is-a site of
- within
- in
113Problems with location
- extrinsic to membrane part-of membrane
- extrinsic to plasma membrane part-of plasma membrane
- extrinsic to vacuolar membrane part-of vacuolar membrane
114Differentiation and Development
- development cellular process
- cell differentiation
115Cell differentiation is-a development
- But according to GO's own definitions the agent or subject of differentiation is the cell, while the agent or subject of development is the whole organism
- (again GO has problems in keeping track of entities on different levels of granularity)
116cell differentiation is-a development
- but
- hemocyte differentiation part-of hemocyte development
117- GO:0007514 garland cell differentiation
- Definition: Development of garland cells, a small group of nephrocytes which take up waste materials from the hemolymph by endocytosis.
- (Illustrates GO's problems with definitions)
119Part Three
- How to do things right
- so far only scratched the surface
- sensu
- synonyms
- GO's definitions
- GO's logical relationships
120Principles for GO terms
- Temporal coherence
- Dependence
- Univocity
- Compositionality
- Objectivity
- Positivity
- Explicitness
- Taxonomic Levels
- Partonomic Levels
- Single Inheritance
- Exhaustiveness
121Should these principles be satisfied?
- Michael Ashburner
- GO's philosophy from the beginning was just in
time - that is, we made no great attempt to
complete the ontologies . If you try and
complete an ontology, or worse try and get it
right, then you will fail
122Can these principles be satisfied?
- Compare GO with Foundational Model of Anatomy
(FMA)
123 Principle          GO    FMA
Temporal coherence     No    N/A
Dependence             No    N/A
Univocity              No    Yes
Compositionality       No    Yes
Objectivity            No    Yes
Positivity             No    Yes
Explicitness           No    N/A
Taxonomic Levels       No    Yes
Partonomic Levels      No    Yes
Single Inheritance     No    Yes
Exhaustiveness         No    No
125Is GO an ontology?
- GO a controlled vocabulary
- (ramshackle) syntactic regimentation
- but because is-a and part-of are not given
uniform readings, this does NOT mean the sort of
semantic regimentation which would amount to an
ontology in the proper sense of the word
126rules for definitions
- intelligibility: the terms used in a definition should be simpler (more intelligible) than the term to be defined
- definitions: do not confuse definitions with the communication of new knowledge
127Principle of Substitutability
- in all so-called extensional contexts a defined term should be substitutable by its definition in such a way that the result is both grammatically correct and has the same truth-value as the sentence with which we begin
- GO:0015070 toxin activity
- Definition: Acts as to cause injury to other living organisms.
128substitutability
- There is toxin activity here
- There is acts as to cause injury to other living
organisms here
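The substitution test on this slide is mechanical enough to script. The sketch below only performs the replacement (the helper name `substitute` is hypothetical); judging whether the result is grammatical and preserves truth-value is still left to a human reader.

```python
def substitute(sentence, term, definition):
    """Replace a defined term by its definition, as the principle requires."""
    return sentence.replace(term, definition)


# Definition of GO:0015070 as quoted above, lower-cased for use mid-sentence.
definition = "acts as to cause injury to other living organisms"
result = substitute("There is toxin activity here", "toxin activity", definition)
print(result)
```

The output is not a grammatical sentence, which is the sign that the definition does not define the same kind of thing as the term it is attached to.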
129Defining is-a
- A is-a B: every instance of A is an instance of B
- A is-a B: A and B are natural kinds and every instance of A is an instance of B
- A is-a B: A and B are natural kinds and every instance of A is as a matter of necessity an instance of B
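The three candidate readings can be set side by side in logical notation. This is one possible formalization, not notation taken from the slides: inst(x, A) ("x is an instance of A") and kind(A) ("A is a natural kind") are hypothetical predicates, and the box stands for necessity (the symbol needs the amssymb package).

```latex
% inst(x, A): x is an instance of class A   (hypothetical predicate)
% kind(A):    A is a natural kind           (hypothetical predicate)
\begin{align*}
\text{(1)}\quad A \text{ is-a } B &\iff \forall x\,\bigl(\mathrm{inst}(x,A) \rightarrow \mathrm{inst}(x,B)\bigr)\\
\text{(2)}\quad A \text{ is-a } B &\iff \mathrm{kind}(A) \land \mathrm{kind}(B) \land \forall x\,\bigl(\mathrm{inst}(x,A) \rightarrow \mathrm{inst}(x,B)\bigr)\\
\text{(3)}\quad A \text{ is-a } B &\iff \mathrm{kind}(A) \land \mathrm{kind}(B) \land \Box\,\forall x\,\bigl(\mathrm{inst}(x,A) \rightarrow \mathrm{inst}(x,B)\bigr)
\end{align*}
```

Each reading is strictly stronger than the one before it: (3) implies (2), and (2) implies (1).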
130Solutions to these problems
- part_of should mean part_of