Title: Medical Ontologies: An Overview
1 Medical Ontologies: An Overview
- Barry Smith
- http://ifomis.de
- January 2004
2 IFOMIS
- Institute for Formal Ontology and Medical Information Science
- Faculty of Medicine
- University of Leipzig
3 Partners
- Laboratory for Applied Ontology, Trento and Rome
- Language Computing nv, Zonnegem, Belgium
- Ontology Works, Baltimore
- Structural Informatics Group, Department of Biological Structure, University of Washington, Seattle, USA
- Cognitive Science Laboratory, Princeton University
4 Three levels of ontology
- 1) formal (top-level) ontology, dealing with categories employed in every domain: object, event, whole, part, instance, class
- 2) domain ontology, which applies the top-level system to a particular domain: cell, gene, drug, disease, therapy
- 3) terminology-based ontology: a large, lower-level system, e.g. "Dupuytren's disease of palm, nodules with no contracture"
7 IFOMIS
- Institute for Formal Ontology and Medical Information Science, Leipzig
- http://ifomis.de
- philosophers and medical informaticians attempting to build and test a Basic Formal Ontology for applications in biomedical and related domains
8 IFOMIS
- use basic principles of philosophical ontology for quality assurance and alignment of biomedical ontologies
9 Compare
- pure mathematics (theories of structures such as order, set, function, mapping), employed in every domain
- applied mathematics: applications of these theories, re-using the same definitions, theorems and proofs in new application domains
- physical chemistry, biophysics, etc., adding detail
10 Three levels of ontology
- 1) formal (top-level) ontology
- ?????
- medical ontology has nothing like the technology of definitions, theorems and proofs provided by pure mathematics
- 2) domain ontology
- UMLS Semantic Network, GALEN CORE
- 3) terminology-based ontology
- UMLS, SNOMED-CT, GALEN, FMA
11 Strategy
- Part 1: Provide an overview of medical ontologies and of the top-level ontologies which they implicitly define
- Part 2: Show how principles of classification and definition derived from top-level ontology can help in quality assurance of terminology-based ontologies and in ontology alignment
- Part 3: The Gene Ontology
- Part 4: Medical Fact Net
13 UMLS Semantic Network
- top level: entity, event
- entity divides into: physical object, conceptual entity
15 conceptual entity
- Organism Attribute
- Finding
- Idea or Concept
- Occupation or Discipline
- Organization
- Group
- Group Attribute
- Intellectual Product
- Language
16
- conceptual entity
- idea or concept
- functional concept
- body system
17
- entity
- physical object, conceptual entity
- idea or concept
- functional concept
- body system
- confusion of entity and concept
18 Functional Concept
- Body system is_a Functional Concept.
- but concepts do not perform functions or have physical parts.
19 This is not a concept
20 The Hydraulic Equation
- BP = CO × PVR
- arterial blood pressure is directly proportional to the product of blood flow (cardiac output, CO) and peripheral vascular resistance (PVR)
21 Confusion of Ontology and Epistemology
- blood pressure is an Organism Function,
- cardiac output is a Laboratory or Test Result or Diagnostic Procedure
- BP = CO × PVR thus asserts that blood pressure is proportional either to a laboratory or test result or to a diagnostic procedure
22 entities
- independent continuants: ORGANISMS, CELLS, MOLECULES
- dependent continuants: ROLES, FUNCTIONS, CONDITIONS (diseases)
- occurrents (always dependent): PROCESSES, HISTORIES, LIVES (courses of diseases)
23 entities
- independent continuants: ORGANISMS, CELLS, MOLECULES
- dependent continuants: ROLES, FUNCTIONS, CONDITIONS (diseases)
- occurrents (always dependent): PROCESSES, HISTORIES, LIVES (courses of diseases)
- classes
- instances
24 A three-category ontology along these lines accepted by
- DOLCE: first module of the Semantic Web WonderWeb Foundational Ontologies Library
- BFO: IFOMIS Basic Formal Ontology
- LC LinKBase
- UMLS-SN
- Gene Ontology
26 Principles for Building Medical Ontologies
27 Examples
- Don't confuse entities with concepts
- Don't confuse domain entities with logical or computational structures
- Don't confuse ontology with epistemology
- Don't confuse is_a with has_role
28 Further Principles
- univocity: terms should have the same meanings (and thus point to the same referents) on every occasion of use
- UMLS-SN
- organization: body plan
- organization: social organization
29 univocity
- Gene Ontology
- part_of = "can be part of" (flagellum part_of cell)
- part_of = "is sometimes part of" (replication fork part_of the nucleoplasm)
- part_of = "is included as a sublist in"
30 don't forget instances
- part_of as a relation between classes vs. part as a relation between instances
- A part_of B:
- every instance of A is part of some instance of B
- every instance of B has some instance of A as part
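The two instance-level readings of A part_of B listed above can be checked mechanically once instances are available. The following is a minimal sketch in Python, not part of the original slides; the instance names and the contents of the `part` relation are invented toy data.

```python
# Toy data (hypothetical): class name -> set of instance names.
instances = {
    "testis": {"testis_1", "testis_2"},
    "heart": {"heart_1", "heart_2"},
    "human being": {"adam", "eve"},
}

# Instance-level "part" relation as (part, whole) pairs (hypothetical).
part = {
    ("testis_1", "adam"), ("testis_2", "adam"),
    ("heart_1", "adam"), ("heart_2", "eve"),
}

def every_a_part_of_some_b(a, b):
    """Reading 1: every instance of a is part of some instance of b."""
    return all(any((x, y) in part for y in instances[b]) for x in instances[a])

def every_b_has_some_a(a, b):
    """Reading 2: every instance of b has some instance of a as part."""
    return all(any((x, y) in part for x in instances[a]) for y in instances[b])

print(every_a_part_of_some_b("testis", "human being"))  # True
print(every_b_has_some_a("testis", "human being"))      # False
print(every_b_has_some_a("heart", "human being"))       # True
```

In this toy data the two readings come apart for testis but not for heart, which is the kind of difference a class-level part_of assertion leaves open.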
31 Part_of as a relation between classes is more problematic than is standardly supposed
- testis part_of human being ?
- heart part_of human being ?
32 objectivity
- which classes exist is not a function of our biological knowledge.
- (Terms such as "unknown" or "unclassified" or "unlocalized" do not designate biological natural kinds.)
- GO
- aminoadipate-semialdehyde dehydrogenase complex is_a unlocalized
33 rules for definitions
- intelligibility: the terms used in a definition should be simpler (more intelligible) than the term to be defined
- definitions: do not confuse definitions with the communication of new knowledge
34 substitutability
- in all so-called extensional contexts a defined term should be substitutable by its definition in such a way that the result is both grammatically correct and has the same truth-value as the sentence with which we begin
- GO:0015070 toxin activity
- Definition: Acts as to cause injury to other living organisms.
35 substitutability
- There is toxin activity here
- There is acts as to cause injury to other living
organisms here
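The substitution test can be run mechanically to produce candidate sentences for inspection. This is a minimal sketch that only performs the textual substitution; the function name `substitute` is an assumption made for illustration, and judging whether the result is grammatical is left to a human reader.

```python
def substitute(sentence, term, definition):
    """Replace a defined term by its definition, word for word."""
    return sentence.replace(term, definition)

# Term and definition as quoted on the slide (definition lower-cased).
term = "toxin activity"
definition = "acts as to cause injury to other living organisms"
sentence = "There is toxin activity here"

result = substitute(sentence, term, definition)
print(result)
```

Printing the substituted sentence next to the original makes the failure of substitutability easy to spot across a whole list of definitions.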
37 GO: the Gene Ontology
- 3 large telephone directories of standardized designations for gene functions and products
- organized into hierarchies via is_a and part_of
38 GO
- can in practice be used only by trained biologists (with know-how)
- whether a GO-term truly stands in the is_a relation depends e.g. on the type of organism involved
- glycosome is part-of cytoplasm only for Kinetoplastidae
- Computers have no counterpart of such context-dependent know-how
39 GO divided into three disjoint term hierarchies
- the cellular component ontology, e.g. flagellum, chromosome, cell
- the molecular function ontology, e.g. ice nucleation, binding, protein stabilization
- the biological process ontology, e.g. glycolysis, death
40 Primary aim of GO
- not rigorous definition and principled classification
- but rather providing a practically useful framework for keeping track of the biological annotations that are applied to gene products
41 Thesis 1
- With increasing size, GO will be required to
increase the degree to which it is a controlled
vocabulary which satisfies not merely the needs
of human biologists but also the needs of
automatic consistency-checking and updating
systems
42 Thesis 2
- GO can realize its goal more adequately (and
avoid many coding errors) by taking ontology
(especially the logic of classifications and
definitions) seriously
43 GO: the Gene Ontology
- GO divided into 3 separate hierarchies, each organized via is_a and part_of
44 Problems with is_a
- A is_a B: every instance of A is an instance of B
45 Problems with is_a
- Holliday junction helicase complex is_a unlocalized
- protein storage vacuole is_a vacuole (sensu Streptophyta)
- R7 differentiation is_a eye photoreceptor differentiation (sensu Drosophila).
46 Uses of part_of
- membrane part-of cell, intended to mean "a membrane is a part-of any cell"
- flagellum part-of cell, intended to mean "a flagellum is part-of some cells"
- replication fork part-of cell cycle, intended to mean "a replication fork is part-of the nucleoplasm only during certain times of the cell cycle"
- regulation of sleep part-of sleep, should be corrected to "regulation of sleep is co-located with and is causally involved with the sleep process."
47 Problems with part_of
- part_of = "can be part of" (flagellum part_of cell)
- part_of = "is sometimes part of" (replication fork part_of the nucleoplasm)
- part_of = "is included as a sublist in"
48 Problems with GO Molecular Functions
- anti-coagulant activity (defined as "a substance that retards or prevents coagulation")
- enzyme activity (defined as "a substance that catalyzes")
- structural molecule (defined as "the action of a molecule that contributes to structural integrity")
49 GO:0005199 structural constituent of cell wall
- Definition: The action of a molecule that contributes to the structural integrity of a cell wall.
- confuses actions, which GO includes in its function ontology, with constituents, which GO includes in its cellular component ontology
50
- extracellular matrix structural constituent
- puparial glue (sensu Diptera)
- structural constituent of bone
- structural constituent of chorion (sensu Insecta)
- structural constituent of chromatin
- structural constituent of cuticle
- structural constituent of cytoskeleton
- structural constituent of epidermis
- structural constituent of eye lens
- structural constituent of muscle
- structural constituent of myelin sheath
- structural constituent of nuclear pore
- structural constituent of peritrophic membrane (sensu Insecta)
- structural constituent of ribosome
- structural constituent of tooth enamel
- structural constituent of vitelline membrane (sensu Insecta)
51 Why do these problems arise?
- Because GO has no clear formal understanding of the role of temporal relations in organizing an ontology
- (thus also no clear understanding of the difference between a function and the activity which is the realization of a function; GO runs these two together)
52 As GO increases in size and scope
- it will be increasingly difficult to maintain the semantic consistency we desire without software tools that perform consistency checks and controlled updates.
- The addition of each new term will require the curator to understand the entire structure of GO in order to avoid redundancy and to ensure that all appropriate linkages are made with other terms.
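As a rough illustration of what a software consistency check could look like, the sketch below inspects a toy is_a hierarchy for two simple defects: parents that are never declared as terms, and terms that lie on an is_a cycle. The function name `find_problems`, the dictionary layout and the deliberately broken entries are assumptions made for illustration; this is not a description of any tool actually used by GO.

```python
def find_problems(is_a):
    """Return (undeclared_parents, terms_on_cycles) for a child -> parents map."""
    undeclared = sorted({p for ps in is_a.values() for p in ps if p not in is_a})
    on_cycle = []
    for start in is_a:
        # Depth-first walk upwards from start; reaching start again means a cycle.
        stack, seen = list(is_a[start]), set()
        while stack:
            node = stack.pop()
            if node == start:
                on_cycle.append(start)
                break
            if node in seen:
                continue
            seen.add(node)
            stack.extend(is_a.get(node, ()))
    return undeclared, sorted(on_cycle)

# Toy hierarchy (hypothetical); the last three entries are deliberately broken.
toy = {
    "cellular component": [],
    "vacuole": ["cellular component"],
    "protein storage vacuole": ["vacuole"],
    "A": ["B"],                  # A is_a B is_a A: a cycle
    "B": ["A"],
    "flagellum": ["organelle"],  # "organelle" is never declared
}

print(find_problems(toy))  # (['organelle'], ['A', 'B'])
```

Checks of this kind are purely structural; they do not replace the curator's judgment about whether a given is_a link is biologically correct.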
53 Problems with GO's compositionality
- sensu, /, with, from, in, resulting, regulating, regulation of, complex, constituting, constitution
54 /
- GO:0008608 microtubule/kinetochore interaction
- df: Physical interaction between microtubules and chromatin via proteins making up the kinetochore complex.
- GO:0001539 ciliary/flagellar motility
- df: Locomotion due to movement of cilia or flagella.
55 /
- GO:0045798 negative regulation of chromatin assembly/disassembly
- df: Any process that stops, prevents or reduces the rate of chromatin assembly and/or disassembly
- GO:0000082 G1/S transition of mitotic cell cycle
- defined as: Progression from G1 phase to S phase of the standard mitotic cell cycle.
56 /
- GO:0001559 interpretation of nuclear/cytoplasmic to regulate cell growth
- df: The process where the size of the nucleus with respect to its cytoplasm signals the cell to grow or stop growing.
57 /
- GO:0015539 hexuronate (glucuronate/galacturonate) porter activity
- df: Catalysis of the reaction hexuronate(out) + cation(out) = hexuronate(in) + cation(in)
58 Problems with GO's consistency
- GO:0030430 host cell cytoplasm part-of GO:0018995 host
- host cell cytoplasm df: The cytoplasm of a host cell.
- host df: Any organism in which another organism, especially a parasite or symbiont, spends part or all of its life cycle and from which it obtains nourishment and/or protection.
59 Cellular Component
- Another problem with host
- It is not a cellular component (and not a molecular function, and not a biological process, either)
- GO has adult walking behavior
- but not adult or walking
- GO has eye pigmentation but not eye
60 Solution
- Link GO to external ontologies
- of organism types (to solve the sensu problem)
- of anatomy (to solve the eye problem)
- of coarse medical reality (to solve the adult walking behavior problem) (see MFN below)
61 note that such linkages are possible
- only if GO itself has a coherent formal
architecture
63 Medical Fact Net
- Medical Belief Net (MBN): large, heterogeneous, open-source corpus of medical sentences in the English language, expressed in the form of grammatically complete statements and assessed by the degree to which they are understandable and assented to by typical non-expert human subjects.
- Medical Fact Net (MFN): subclass of MBN receiving high marks on the scale of correctness from medical experts
- MFN: intersection of non-expert beliefs about medical phenomena and truths validated by medical experts.
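The relation between MBN and MFN can be pictured as two successive filters over a pool of candidate sentences. The sketch below is only an illustration under stated assumptions: the `Sentence` record, the two score fields, the 0.8 thresholds and the placeholder sentences with their scores are all invented here and are not taken from the project.

```python
from dataclasses import dataclass

@dataclass
class Sentence:
    text: str
    assent: float       # hypothetical: share of non-experts who understand and assent
    correctness: float  # hypothetical: mean expert mark for correctness

def select(corpus, min_assent=0.8, min_correctness=0.8):
    """Return (MBN, MFN); MFN is the expert-validated subset of MBN."""
    mbn = [s for s in corpus if s.assent >= min_assent]
    mfn = [s for s in mbn if s.correctness >= min_correctness]
    return mbn, mfn

# Placeholder sentences with invented scores, for illustration only.
corpus = [
    Sentence("sentence A", assent=0.95, correctness=0.90),
    Sentence("sentence B", assent=0.90, correctness=0.30),
    Sentence("sentence C", assent=0.40, correctness=0.95),
]

mbn, mfn = select(corpus)
print([s.text for s in mbn])  # ['sentence A', 'sentence B']
print([s.text for s in mfn])  # ['sentence A']
```

Sentence B stands for a widely held but incorrect belief (in MBN only), and sentence C for an expert truth that non-experts do not assent to (in neither set).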
64 Medical Word Net
- lexical database extending the Princeton WordNet by all the medical terms encountered in MBN
- First in (US) English
- Then in German
- First for adults, then for children
- First for medicine, then for
65 MBN/MFN/MWN Formal Architecture
- Semi-automatically generated graph-based parsing of each sentence
- formal ontology of all MFN entities and relationships
- mapping into the UMLS Metathesaurus.
66 Evaluation
- MFN will be integrated into an existing term-search-based on-line consumer health portal in such a way that MFN sentences are used to direct users to information sources. We will then measure the degree to which this results in greater user satisfaction by setting up an experiment in which customers of the portal are randomly assigned to one of two groups: one to which access to MFN is offered, and another for which simple term-searching is used.
67 Significance
- Non-expert language of family members, advisors, administrators, nurses, paramedics, lawyers
- Research on differences between everyday language and technical language
68 Mismatches in Doctor-Patient Communication
- Question Text: My seven-year-old son developed a rash today that I believe to be chickenpox. My concern is that a friend of mine had her 10-day-old baby at my home last evening before we were aware of the illness. Is there cause for concern at this point?
- Answer Text: Chickenpox is the common name for varicella infection. ... You are correct in that a person with chickenpox can be contagious for 48 hours before the first vesicle is seen. ...
69 Non-Expert Language in Online Communication
- Need to integrate free text and structured data.
- E-health services need automatic ways to respond
to questions in standard forms, and to provide
internet-accessible medical knowledge that is
both reliable and accessible to the non-expert.
70 Diagnostic decision support
- we might associate collections of utterances stored in MBN describing symptoms sourced to single patients with metadata recording subsequent diagnosis. Trained on this corpus, the system could establish patterns of association between specific sequences of utterances and specific diseases; one could then test the degree to which such associations are sufficiently strong as to produce usable automatic diagnosis on the basis of patient inputs.
71 Medical education/medical literacy
- Use MBN to evaluate the reliability of the medical knowledge of different non-expert communities.
- Use MFN to develop tools to support face-to-face education of lay people in the fields of medicine and health care
- MBN provides opportunities for a new type of research in the field of consumer health.
- e.g. on basic kinds in the medical domain à la Eleanor Rosch
72 Medical Coverage in WordNet 2.0
- WordNet's coverage of domains like medicine, physics, and geology is very limited.
- coverage of medical terms represents a mixture of folk and expert vocabulary.
73 MFN: From Words to Facts
- Do for (non-expert) medicine what Beilstein's Fact Database does for (expert) Biochemistry
- Relation to CYC
- Relation to FrameNet
- Botany Knowledge Base
- DARPA's Rapid Knowledge Formation project.
74 Sources
- Lexical knowledge bases, such as
- a. the relevant general lexical information contained in WordNet
- b. lexical knowledge-bases of lay medical vocabulary
- c. medical dictionaries and large medical terminology and ontology systems such as the UMLS Specialist Lexicon, the Foundational Model of Anatomy
- Statement or fact knowledge bases, such as
- d. open-source linguistic corpora, public health documents, internet resources
- e. the relevant example sentences in the FrameNet and WordNet corpora
- f. free text sources
- g. the results of transforming the content of lexical knowledge bases (especially WordNet) into statements
75 Generation from lexical databases
- treat a database like WordNet or LinKBase as a set of links tLt' between terms (where L ranges over 'is-a', 'part-of', 'is-caused-by', etc.).
- We form the subset of this set by restricting the values of t and t' to those terms which occur in MWN
- Some members of the resulting class of tLt' formulae can then be transformed into English sentences automatically. For example each "t is-a t'" formula can be transformed into a sentence of the form "a t is a type of t'"
- Other tLt' formulae can be converted by hand into English sentences, for example "forearm HAS-PARTIAL-MATERIAL-OVERLAP wrist" can be transformed into "the forearm overlaps with the wrist" and "the wrist overlaps with the forearm".
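The generation procedure above can be sketched as a small pipeline: restrict the links to MWN terms, turn links with a known template into sentences, and set the rest aside for conversion by hand. Everything named here (`TEMPLATES`, `generate`, the toy links and the toy MWN vocabulary) is an assumption for illustration; only the is-a template and the forearm/wrist link come from the slide.

```python
# One automatic template per link type; link types without a template
# are left for manual conversion.
TEMPLATES = {
    "is-a": "a {t} is a type of {t2}",
}

def generate(links, mwn_terms):
    """Split links (t, L, t') over MWN terms into sentences and leftovers."""
    automatic, by_hand = [], []
    for t, link, t2 in links:
        if t not in mwn_terms or t2 not in mwn_terms:
            continue  # restrict t and t' to terms that occur in MWN
        template = TEMPLATES.get(link)
        if template is not None:
            automatic.append(template.format(t=t, t2=t2))
        else:
            by_hand.append((t, link, t2))
    return automatic, by_hand

# Toy links and toy MWN vocabulary (hypothetical).
links = [
    ("wrist", "is-a", "joint"),
    ("forearm", "HAS-PARTIAL-MATERIAL-OVERLAP", "wrist"),
    ("quark", "is-a", "particle"),  # dropped: not in the toy MWN vocabulary
]
mwn_terms = {"wrist", "joint", "forearm"}

automatic, by_hand = generate(links, mwn_terms)
print(automatic)  # ['a wrist is a type of joint']
print(by_hand)    # [('forearm', 'HAS-PARTIAL-MATERIAL-OVERLAP', 'wrist')]
```

The template ignores details such as choosing "a" or "an", which a real generator would need to handle.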
76 Problems to be Addressed
- generic medical knowledge of (non-expert) adults
77 Genericity
- Much generic medical knowledge relates to what
holds for the most part or in most cases or in a
statistically significant fraction of cases
(consider smoking causes cancer).
78 Medical knowledge
- is intertwined with knowledge of other domains
- (things that can be involved in an accident)
79 Knowledge
- Much medical knowledge of experts and non-experts alike takes the form of knowledge of specific cases ("Aunt Mary's arthritis is always worse in the winter").
- MFN should be a repository of medical knowledge that is generic and context-independent, the counterpart of the theoretical knowledge of the sciences.
- Note that lexical knowledge of the sort stored in WordNet, too, is both generic and context-independent.
80 Expertise
- a crisp separation of expert and non-expert sentences is impossible.
- Viagra, anthrax, HIV, Prozac, SARS
- → experimental design needed to avoid artifacts
81 Completeness
- Problem: elementary facts: "People have two eyes." "Babies are born." "Arms move."
- WordNet contains some coverage, particularly of elementary facts of the "A is type/part of B" form, in virtue of its specific formal architecture
- WordNet synsets can be used to generate long lists of elementary facts from single starting points
82 Six
- Transform MWN into a large corpus of generic beliefs by turning WordNet on its side; that is, we transform a relation such as {t1, ..., tn} IS-A {t'1, ..., t'm} into n × m sentences of the form ti IS-A t'k
- and impose filters
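The expansion of one synset-level IS-A link into n × m term-level sentences is a Cartesian product followed by a filter. The sketch below assumes toy synsets and a toy filter; the function name `expand` and the choice of filter are hypothetical and stand in for whatever filters the project would actually impose.

```python
from itertools import product

def expand(hyponyms, hypernyms, keep=lambda t, t2: True):
    """Turn {t1..tn} IS-A {t'1..t'm} into up to n * m 'ti IS-A t'k' sentences."""
    return [f"{t} IS-A {t2}" for t, t2 in product(hyponyms, hypernyms) if keep(t, t2)]

# Toy synsets (hypothetical membership).
hyponyms = ["chickenpox", "varicella"]
hypernyms = ["disease", "illness"]

print(expand(hyponyms, hypernyms))
# ['chickenpox IS-A disease', 'chickenpox IS-A illness',
#  'varicella IS-A disease', 'varicella IS-A illness']

# Example filter (an assumption): keep only sentences built from lay vocabulary.
lay_terms = {"chickenpox", "disease", "illness"}
print(expand(hyponyms, hypernyms, keep=lambda t, t2: t in lay_terms and t2 in lay_terms))
# ['chickenpox IS-A disease', 'chickenpox IS-A illness']
```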
83 A New Kind of Linguistics
- MFN is part and parcel of recent attempts in the biomedical sciences to confront problems of similar scope in the development of large fact-repositories such as KEGG or Swiss-Prot.
- In its final form it should be consistent with the knowledge that is contained also in other fact repositories both at the expert and the non-expert level and serve to integrate them together in a federated database.
84 Adult walking behavior
- will be freed from its lonely status inside GO