Title: VT
1VT
2Ontology and the Integration of Medical Knowledge
3IFOMIS
- Institute for Formal Ontology and
- Medical Information Science
- Faculty of Medicine
- University of Leipzig
4http://ifomis.de
- Institut für formale Ontologie
- und Medizinische Informationswissenschaft
5Ontology as a branch of philosophy
- the science of the kinds and structures of objects, qualities, processes, events, functions and relations in every domain of reality
6Aristotle
The first ontologist
7A biological ontology
8Linnaeus
- 1763: Genera Morborum (nosology, or the ontology of kinds of disease)
9Q: Why ontology in computer science?
- A: The Tower of Babel problem of information systems
10The Tower of Babel problem
- Every hospital information system uses its own systems of terms and categories to organize the information entered into it.
- How can we resolve the incompatibilities that arise when information from different sources is brought together?
- Compare: how can we fuse chemistry and biology?
- How can we fuse anatomy and physiology?
11How do people (e.g. medical students) solve this problem?
- through the encounter with the patient
- The patient, and the processes taking place within him, serve as the crystallization point for a meaningful ordering of otherwise isolated (learned) facts.
- Integration arises through the formation of practical knowledge
- (knowing-that becomes knowing-how)
12Computers are dumb
- Analogously, in medical information systems isolated knowledge artefacts must be integrated into unified and usable knowledge.
- But how?
13The original dream of ontology in computer science
- A single all-encompassing taxonomy of all kinds of objects, which would serve as the central, unified category system for all information systems.
- This dream is over ...
14Current solutions
- Standardized terminologies
- UMLS
- SNOMED
- HL7
- ICD-10
- etc.
15UMLS
- Unified Medical Language System
- National Library of Medicine
- Bethesda, MD
- a compilation of various machine-readable source terminologies
16Example 1 UMLS
- 134 semantic types
- 800,000 concepts
- 10 million interconcept relationships
17Example 2 SNOMED-RT
- Systematized Nomenclature of Medicine
- A Reference Terminology of the College of American Pathologists
18Example 2 SNOMED-RT
- 121,000 concepts, 340,000 relationships
- common reference point for comparison and aggregation of data throughout the entire healthcare process
19Standardized terminologies
- are meant to ease access to the biomedical literature and to factual databases
- a new kind of medical research is thereby to be made possible
20Blood
21Representation of Blood in UMLS
Blood as tissue
22Representation of Blood in MeSH
Blood as body fluid
23Database standardization
- is desperately needed in medicine
- to enable the huge amounts of data
- resulting from clinical trials by different groups working on the same drugs/therapies/diagnostic methods
- to be fused together
24How to make ONE SYSTEM out of this?
- To reap the benefits of standardization we need to resolve such incompatibilities.
25Deficits of traditional coding systems (SNOMED)
- 1: DB-62110 Diabetic nephropathy
- 2: DB-61000 Diabetes mellitus, G-C025 Causing, D7-11000 Nephropathy
- 3: DB-61000 Diabetes mellitus, G-C025 Causing, DF-00000 Disease, G-C006 Located in, T-71000 Kidney
- No formal language
- Source: Medizinische Begriffs- und Dokumentationssysteme, WS 2000/2001, Barbara Heller, IMISE, University of Leipzig, 16.01.2001 / slide 7 of
26Deficits of traditional coding systems (SNOMED)
- DB-62110 Diabetic nephropathy, G-C006 Located in, T-71000 Kidney
- 5: DB-62110 Diabetic nephropathy, G-C006 Located in, T-11000 Bone
- 6: DB-62110 Diabetic nephropathy
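The point of the two SNOMED slides can be made concrete with a small sketch. The codes are the ones quoted on the slides; the tuple representation, the `ALLOWED_SITES` table and the function names are assumptions made for illustration and are not part of SNOMED. The sketch shows that the same finding yields several distinct code strings, and that without a formal language nothing stops a coder from locating a nephropathy in bone.

```python
from typing import NamedTuple

class Code(NamedTuple):
    code: str
    label: str

# Codes as quoted on the slides.
DIABETIC_NEPHROPATHY = Code("DB-62110", "Diabetic nephropathy")
DIABETES_MELLITUS = Code("DB-61000", "Diabetes mellitus")
CAUSING = Code("G-C025", "Causing")
NEPHROPATHY = Code("D7-11000", "Nephropathy")
DISEASE = Code("DF-00000", "Disease")
LOCATED_IN = Code("G-C006", "Located in")
KIDNEY = Code("T-71000", "Kidney")
BONE = Code("T-11000", "Bone")

# Three code sequences a coder might use for one and the same finding.
variants = [
    (DIABETIC_NEPHROPATHY,),
    (DIABETES_MELLITUS, CAUSING, NEPHROPATHY),
    (DIABETES_MELLITUS, CAUSING, DISEASE, LOCATED_IN, KIDNEY),
]

# Hypothetical constraint table (not part of SNOMED): permitted body sites.
ALLOWED_SITES = {DIABETIC_NEPHROPATHY.code: {KIDNEY.code}}

def as_string(expr):
    """Flatten a code sequence into the string a database would store."""
    return " ".join(c.code for c in expr)

def site_violations(expr):
    """Return (disorder, site) pairs that contradict ALLOWED_SITES."""
    problems = []
    for i in range(len(expr) - 2):
        subject, link, site = expr[i], expr[i + 1], expr[i + 2]
        if link == LOCATED_IN and subject.code in ALLOWED_SITES:
            if site.code not in ALLOWED_SITES[subject.code]:
                problems.append((subject.label, site.label))
    return problems

# A plain string comparison sees three unrelated entries.
distinct_strings = {as_string(v) for v in variants}
```

Only an explicit constraint such as the assumed `ALLOWED_SITES` table lets `site_violations` reject the "located in bone" variant; the code strings themselves carry no such knowledge.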
27It will develop medical ontologies
- at different levels of granularity
- cell ontology
- drug ontology
- protein ontology
- gene ontology
- already exists (but in a variety of mutually
incompatible forms)
28and also
- anatomical ontology
- epidemiological ontology
- disease ontology
- therapy ontology
- pathology ontology
29 together with
- physician's ontology
- patient's ontology
- and even
- hospital management ontology
30Presentation overview
- Problem description: patient eligibility for clinical trial
- Meaning theories
- Required technology for natural language understanding
- Implementation of a realist ontology for medical natural language understanding
- Conclusions
- If enough time: a guided tour of LinKFactory
31The Medical Informatics Dogma
- Everything should be structured
- Fact: computers can only deal with structured representations of reality
- structured data
- relational databases, spreadsheets
- structured information
- XML simulates context
- structured knowledge
- rule-based knowledge systems
- Typical conclusion (Dogma?)
- there is a need for structured data, hence
- there is a need for structured data entry
32Structured data entry
- Current technical solutions
- rigid data entry forms
- coding and classification systems
- But
- the description of biological variability requires the flexibility of natural language and it is generally desirable not to interfere with the traditional manner of medical recording (Wiederhold, 1980)
- Initiatives to facilitate the entry of narrative data have focused on the control rather than the ease of data entry (Tanghe, 1997)
33Drawbacks of structured data entry
- Loss of information
- qualitatively
- limited expressiveness and inherent defects of coding and classification systems, controlled vocabularies, and traditional medical terminologies
- use of purpose-oriented systems
- don't use data for another purpose than originally foreseen (J VDL)
- quantitatively
- too time-consuming to code all information manually
- Speech recognition and forms for structured data entry are not best friends
34Areas for application of medical natural language understanding
- Coding patient data
- Structured information extraction from unstructured clinical notes
- Clinical protocols and guidelines
- Assessing patient eligibility for clinical trial entry
- Triggering and alerts
- Linking case descriptions to scientific literature
- Easy access to content
- ... towards a medical semantic web
35Clinical history description
- Mr. Kovács is an 83-year-old man with a past
medical history of hypertension, congestive heart
failure, atrial fibrillation, hypercholesterolemia,
and a history of CVA who presented himself to
Budapest Emergency Room on April 25 with primary
complaint of right-sided chest pain since April
24. The patient was in his usual state of health
until April 24 when he experienced right-sided
chest pain after 10 minutes of bicycling exercise
at the YMCA. He described the chest pain as a
dull ache in the right side of his chest
radiating posteriorly to the right scapular area.
He rated the intensity as 7 out of 10. The chest
pain lasted about 3 minutes and resolved with
rest. That same night, the patient once again
experienced right-sided chest pain while lying in
bed just before he went to sleep. He describes
the pain as right-sided chest pain with same
radiation to posterior at an intensity of 6-7 out
of 10. The chest pain lasted about 10 minutes and
resolved spontaneously.
36Inclusion criteria of the INVEST study
- 1. Male or female
- 2. Age 50 to no upper limit
- 3. a) Hypertension documented according to the 6th report of the Joint National Committee on Detection and Evaluation of the treatment of high BP (JNC VI), and b) the need for drug therapy (previously documented hypertension in patients currently taking antihypertensive agents is acceptable)
- 4. Documented CAD (e.g., classic angina pectoris (stable angina pectoris, Heberden angina pectoris), myocardial infarction three or more months ago, abnormal coronary angiography, or concordant abnormalities on two different types of stress tests)
- 5. Willingness to sign informed consent
37Do they match ?
- Mr. Kovács is an 83-year-old man with past
medical history of hypertension, congestive heart
failure, atrial fibrillation, hypercholesterolemia,
history of CVA who presented to Budapest
Emergency Room on April 25 with chief complaint
of right-sided chest pain since April 24. The
patient was in his usual state of health until
April 24 when he experienced right-sided chest
pain after 10 minutes of bicycling exercise at
YMCA. He described the chest pain as a dull ache
in the right side of his chest radiating
posteriorly to the right scapular area. He rated
the intensity as 7 out of 10. The chest pain
lasted about 3 minutes and resolved with rest.
That same night, the patient once again
experienced right-sided chest pain while lying in
bed right before he went to sleep. He describes
the pain as right-sided chest pain with same
radiation to posterior at an intensity of 6-7 out
of 10. The chest pain lasted about 10 minutes and
resolved spontaneously.
- 1. Male or female
- 2. Age 50 to no upper limit
- 3. Hypertension documented according to the 6th report of the Joint National Committee on Detection and Evaluation of the treatment of high BP (JNC VI) and the need for drug therapy (previously documented hypertension in patients currently taking antihypertensive agents is acceptable)
- 4. Documented CAD (e.g., classic angina pectoris (stable angina pectoris, Heberden angina pectoris), myocardial infarction three or more months ago, abnormal coronary angiography, or concordant abnormalities on two different types of stress tests)
- 5. Willingness to sign informed consent
??
38If the computer is to make this deduction ...
- 1. Male or female
- 2. Age 50 to no upper limit
- 3. Hypertension documented according to the 6th report of the Joint National Committee on Detection and Evaluation of the treatment of high BP (JNC VI) and the need for drug therapy (previously documented hypertension in patients currently taking antihypertensive agents is acceptable)
- 4. Documented CAD (e.g., classic angina pectoris (stable angina pectoris, Heberden angina pectoris), myocardial infarction three or more months ago, abnormal coronary angiography, or concordant abnormalities on two different types of stress tests)
- 5. Willingness to sign informed consent
- Mr. Kovács is an 83-year-old man with past
medical history of hypertension, congestive heart
failure, atrial fibrillation, hypercholesterolemia,
history of CVA who presented to Budapest
Emergency Room on April 25 with chief complaint
of right-sided chest pain since April 24. The
patient was in his usual state of health until
April 24 when he experienced right-sided chest
pain after 10 minutes of bicycling exercise at
YMCA. He described the chest pain as a dull ache
in the right side of his chest radiating
posteriorly to the right scapular area. He rated
the intensity as 7 out of 10. The chest pain
lasted about 3 minutes and resolved with rest.
That same night, the patient once again
experienced right-sided chest pain while lying in
bed right before he went to sleep. He describes
the pain as right-sided chest pain with same
radiation to posterior at an intensity of 6-7 out
of 10. The chest pain lasted about 10 minutes and
resolved spontaneously.
... it must be able to understand !
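What the deduction would look like once the narrative has been understood can be sketched as follows. This is a minimal illustration under stated assumptions: the patient facts are entered by hand (extracting them from the free text is exactly the hard part the slides are about), the `Patient` class, `CAD_EVIDENCE` set and three-valued verdicts are hypothetical, and criterion 3 is simplified to "hypertension on record" without the JNC VI and drug-therapy conditions.

```python
from dataclasses import dataclass, field

@dataclass
class Patient:
    # Facts entered by hand; a real system would have to extract them from text.
    age: int
    sex: str
    history: set = field(default_factory=set)

kovacs = Patient(
    age=83,
    sex="male",
    history={"hypertension", "congestive heart failure",
             "atrial fibrillation", "hypercholesterolemia", "CVA"},
)

# Hypothetical findings accepted as evidence for criterion 4 (documented CAD).
CAD_EVIDENCE = {"angina pectoris", "myocardial infarction",
                "abnormal coronary angiography"}

def documented_cad(p):
    # None means "unknown": silence in the narrative is not proof of absence.
    return True if p.history & CAD_EVIDENCE else None

CRITERIA = {
    "1. male or female": lambda p: p.sex in {"male", "female"},
    "2. age 50 or older": lambda p: p.age >= 50,
    "3. hypertension (simplified)": lambda p: True if "hypertension" in p.history else None,
    "4. documented CAD": documented_cad,
    "5. informed consent": lambda p: None,  # cannot be read off a history
}

def assess(p):
    """Evaluate every criterion; verdicts are True, False or None (unknown)."""
    return {name: check(p) for name, check in CRITERIA.items()}

result = assess(kovacs)
for name, verdict in result.items():
    print(name, "->", {True: "met", False: "not met", None: "unknown"}[verdict])
```

With these hand-entered facts, the age and hypertension criteria come out as met, while documented CAD stays unknown: exertional chest pain in the narrative is suggestive, but deciding whether it counts as documented angina requires the kind of understanding the following slides discuss.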
39What is understanding?
- To understand something is to know what its significance is.
- What 'knowing significance' amounts to may be very different in different contexts: thus understanding a piece of music requires different things of us than understanding a sentence in a language we are learning, for instance. It would be useful, then, for theorists to look at the different kinds of understanding that there are, and examine them in detail and without prejudice, rather than looking for the essence of understanding.
- (Tim Crane, philosopher of mind)
- The significance of a single sentence, too, can vary from context to context.
40The etymology of understanding
- understanding: Latin substare
- literally: to stand under
- Webster's Dictionary (1961): understanding = the power to render experience intelligible by bringing perceived particulars under appropriate concepts.
- particulars: what is NOT SAID of a subject (Aristotle)
- substances: this patient, that tumor, ...
- qualities: the red of that patient's skin, his body temperature, blood pressure, ...
- processes: that incision made by that surgeon, the rise of that patient's temperature, ...
- concepts may be taken in the above definition as Aristotle's universals: what is SAID OF a subject
- Substantial concepts: patient, tumor, ...
- Quality concepts: white, temperature
- ...
41What is natural language understanding?
- NLU is constructing meaning from written language, where the degree of understanding involves a multifaceted meaning-making process that depends on knowledge about language and knowledge about the world.
- (cf. reading comprehension by humans)
- But then: what is meaning?
42Why are concepts not enough?
- Why must our theory also address the referents in reality?
- Because referents are observable fixed points in relation to which we can work out how the concepts used by different communities relate to each other
- Because only by looking at referents can we establish the degree to which concepts are good for their purpose.
43But why then this fixation on normative concepts in Medical Informatics (standards)?
- CEN/TC251 ENV 12264
- This ENV is applicable to the description of the categorial structure of systems of concepts supporting computer-based terminological systems, including coding systems, for health-care.
- concept: unit of thought constituted through abstraction on the basis of properties common to a set of one or more referents
- BUT THEY NEVER IN FACT LOOK AT THE REFERENTS AT ALL!
- ISO/TC215/N142 Health informatics: Vocabulary of terminology
- The purpose of this International Standard is to define a set of basic concepts required to describe and discuss formal representation of concepts and characteristics, for use especially in formal computer based concept representation systems.
- concept: unit of knowledge created by a unique combination of characteristics
- THEY ARE ALREADY TWO LEVELS REMOVED FROM THE REFERENT!
44Why existing ontologies don't match OUR needs for a core ontology
- MeSH: inconsistency in hierarchical relationships
- MedDRA: no difference between concepts and terms
- UMLS: integrates various source terminologies without taking different meanings of terms, different structures, different purposes, etc. into account
- SNOMED: formal system, but lacks sufficient depth of the ontology
- GALEN: very detailed ontology for some parts of healthcare but very poor coverage over healthcare as a whole. The ontology is independent from language as medium of communication (the ontology does not accept language as part of reality)
- ...
Most important: all of them deal with alternative realities or possible worlds and none is focused on the referents in THIS world!
45Based on formal ontology
HAS-PARTIAL-SPATIAL-OVERLAP
46Example joint anatomy
- joint HAS-HOLE joint space
- joint capsule IS-OUTER-LAYER-OF joint
- meniscus
- IS-INCOMPLETE-FILLER-OF joint space
- IS-TOPO-INSIDE joint capsule
- IS-NON-TANGENTIAL-MATERIAL-PART-OF joint
- joint
- IS-CONNECTOR-OF bone X
- IS-CONNECTOR-OF bone Y
- synovia
- IS-INCOMPLETE-FILLER-OF joint space
- synovial membrane IS-BONAFIDE-BOUNDARY-OF joint
space
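The joint example above can be written down as subject-relation-object triples. The sketch below copies the relation names and terms from the slide; the triple-store layout and the two query helpers are assumptions chosen for illustration, not the format used by any particular ontology tool.

```python
# Relations and terms as listed on the slide ("bone X" and "bone Y" are
# placeholders there as well).
TRIPLES = [
    ("joint", "HAS-HOLE", "joint space"),
    ("joint capsule", "IS-OUTER-LAYER-OF", "joint"),
    ("meniscus", "IS-INCOMPLETE-FILLER-OF", "joint space"),
    ("meniscus", "IS-TOPO-INSIDE", "joint capsule"),
    ("meniscus", "IS-NON-TANGENTIAL-MATERIAL-PART-OF", "joint"),
    ("joint", "IS-CONNECTOR-OF", "bone X"),
    ("joint", "IS-CONNECTOR-OF", "bone Y"),
    ("synovia", "IS-INCOMPLETE-FILLER-OF", "joint space"),
    ("synovial membrane", "IS-BONAFIDE-BOUNDARY-OF", "joint space"),
]

def objects(subject, relation):
    """Everything that `subject` stands in `relation` to, in slide order."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]

def subjects(relation, obj):
    """Everything that stands in `relation` to `obj`, in slide order."""
    return [s for s, r, o in TRIPLES if r == relation and o == obj]

# What (incompletely) fills the joint space?
fillers = subjects("IS-INCOMPLETE-FILLER-OF", "joint space")
# What does the joint connect?
connected = objects("joint", "IS-CONNECTOR-OF")
```

Because each relation has a fixed formal meaning, queries such as "what fills the joint space" reduce to lookups rather than to interpretation of free-text labels.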
47Linking external ontologies
48Linguistic and domain ontologies
49From text to meaning
50Mr Kovács analysed syntactically, and features
used to drive mapping.
- The Orth slot of a word gives its surface string.
- The append( ) operator joins together its
arguments as a single string.
51Conclusions
- Understanding a message comes down to identifying what parts of that message correspond to reality, and what parts are expressions of what doesn't exist.
- If a machine has to understand, it must be based on algorithms that use a realist ontology that takes the world, language(s) and the relationship amongst them properly into account.
- The medical informatics community (specifically that part dealing with concept systems) must become aware that most current approaches confuse what is real with what is thought to be real.
52The Reference Ontology Community
- IFOMIS (Leipzig)
- Laboratories for Applied Ontology (Trento/Rome, Turin)
- Foundational Ontology Project (Leeds)
- Ontology Works (Baltimore)
- Ontek Corporation (Buffalo/Leeds)
- Language and Computing (LC) (Belgium/Philadelphia)
53Domains of Current Work
- IFOMIS Leipzig: Medicine, Bioinformatics
- Laboratories for Applied Ontology
- Trento/Rome: Ontology of Cognition/Language
- Turin: Law
- Foundational Ontology Project: Space, Physics
- Ontology Works: Genetics, Molecular Biology
- Ontek Corporation: Biological Systematics
- Language and Computing: Natural Language Understanding
54Testing the BFO/MedO approach
- collaboration with
- Language and Computing nv (www.landcglobal.be)
55- LC's Semantic Indexing for Smart Information Retrieval and Extraction solution allows companies to more efficiently and accurately manage and retrieve documents. LC also offers solutions for information analysis, document mining, information extraction, and coding.
56LC Technology
- FreePharma, LC's natural language analyzer for converting free text (spoken or typed) prescription and pharmacology information into XML.
- FastCode, LC's automated clinical coding product for translation of free text strings into ICD, SNOMED, MedDRA, etc.
- LinKBase, the largest formal medical knowledge base in the world, representing medicine in such a way that it is understandable for a computer.
- LinKFactory, LC's product suite for developing and managing large formal multilingual ontologies.
57LC statistical technology
- unearthed errors in SNOMED via pattern-recognition
of semantic connections
58The Project
- collaborate with LC to show how an ontology
constructed on the basis of philosophical
principles can help in overhauling and validating
the large terminology-based medical ontology
LinKBase used by LC for NLP
59LC
- LinKBase: world's largest terminology-based ontology
- with mappings to UMLS, SNOMED, etc.
- LinKFactory: suite for developing and managing large terminology-based ontologies
60LinKBase
- BFO and MedO designed to add better reasoning capacity
- by tagging LinKBase domain-entities with corresponding BFO/MedO categories
- by constraining links within LinKBase according to the theory of granular partitions
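The idea of tagging domain entities with top-level categories and then constraining links can be sketched in a few lines. Everything named here is hypothetical: the category labels borrow the substance/quality/process distinction from the earlier slide, and the relation names and signatures are invented for illustration and are not taken from BFO, MedO or LinKBase. The sketch does not implement the theory of granular partitions; it only shows the simpler step of category tags ruling out ill-typed links.

```python
# Hypothetical category tags for a handful of domain entities.
CATEGORY = {
    "patient": "substance",
    "tumor": "substance",
    "body temperature": "quality",
    "incision": "process",
}

# Hypothetical link signatures: relation -> (source category, target category).
SIGNATURE = {
    "HAS-QUALITY": ("substance", "quality"),
    "PARTICIPATES-IN": ("substance", "process"),
}

def check_link(source, relation, target):
    """Return None for a well-typed link, otherwise a diagnostic message."""
    expected = SIGNATURE.get(relation)
    if expected is None:
        return f"unknown relation {relation}"
    actual = (CATEGORY.get(source), CATEGORY.get(target))
    if actual != expected:
        return f"{source} {relation} {target}: expected {expected}, got {actual}"
    return None

ok = check_link("patient", "HAS-QUALITY", "body temperature")
bad = check_link("incision", "HAS-QUALITY", "tumor")
```

A check of this kind could be run over every link in a large terminology-based ontology to flag candidates for manual review, which is one way to read "overhauling and validating" in the project description.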
61Three levels of ontology
- 1) formal ontology, seeks the construction of a framework of the categories (object, event, whole, part) employed in every domain,
- 2) domain ontology, a top-level system applying the structure of formal ontology to a particular domain, such as medicine or genetics,
- 3) terminology-based ontology, a very large, lower-level system dealing with the complete terminology of a given domain.
62LC's long-term goal
- Transform the mass of unstructured patient records into a gigantic medical experiment
63IFOMIS's long-term goal
- Build a robust high-level BFO-MedO framework
- THE WORLD'S FIRST INDUSTRIAL-STRENGTH PHILOSOPHY
- which can serve as the basis for an ontologically coherent unification of medical knowledge and terminology
64The solution
- ONTOLOGY!
- But what does ontology mean?
65Example: The Gene Ontology (GO)
- hormone GO:0005179
  - digestive hormone GO:0046659
  - peptide hormone GO:0005180
    - adrenocorticotropin GO:0017043
    - glycopeptide hormone GO:0005181
      - follicle-stimulating hormone GO:0016913
- subsumption (lower term is_a higher term)
66as tree
- hormone
  - digestive hormone
  - peptide hormone
    - adrenocorticotropin
    - glycopeptide hormone
      - follicle-stimulating hormone
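The subsumption hierarchy on these two slides can be sketched as a parent map with a transitive `is_a` test. The nesting is an assumption read off the tree slide, and the single-parent dictionary is a simplification: GO itself is a directed acyclic graph in which a term may have several parents.

```python
# Term -> GO identifier, as listed on the slide.
GO_ID = {
    "hormone": "GO:0005179",
    "digestive hormone": "GO:0046659",
    "peptide hormone": "GO:0005180",
    "adrenocorticotropin": "GO:0017043",
    "glycopeptide hormone": "GO:0005181",
    "follicle-stimulating hormone": "GO:0016913",
}

# Child -> parent (lower term is_a higher term); nesting assumed from the tree.
IS_A = {
    "digestive hormone": "hormone",
    "peptide hormone": "hormone",
    "adrenocorticotropin": "peptide hormone",
    "glycopeptide hormone": "peptide hormone",
    "follicle-stimulating hormone": "glycopeptide hormone",
}

def ancestors(term):
    """All terms that subsume `term`, nearest first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain

def is_a(child, parent):
    """Transitive subsumption test."""
    return parent in ancestors(child)

fsh_lineage = ancestors("follicle-stimulating hormone")
```

The traversal is all the reasoning such a hierarchy supports on its own, which is the sense in which the next slide compares GO to a directory of standardized designations.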
67GO
- is very useful for purposes of standardization in the reporting of genetic information
- but it is not much more than a telephone directory of standardized designations organized into hierarchies
68First Problem
- can in practice be used only by trained biologists (know-how)
- whether a GO-term stands in the subsumption relationship depends on the context in which the term is used
- (for example on the type of organism)
69Second Problem
- GDB: a gene is a DNA fragment that can be transcribed and translated into a protein
- GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype
- GO uses 'gene' in its term hierarchy,
- but it does not tell us which of these definitions is correct
70GO
- has no robust formal organization
- no capability to be aligned with systems which
would have the power to use it to reason with
genetic information
71SNOMED RT (2000)
- already has description logic definitions
- but it also has some bad coding, which derives from failure to pay attention to ontological principles
- e.g.
- both testes is_a testis
72How to resolve such incompatibilities?
- enforce terminological compatibility via standardized term hierarchies, with standardized definitions of terms
73Problem: People are lazy
- Half the pages on Geocities are called "Please title this page"
74Problem: People are stupid
- The vast majority of the Internet's users
- (even those who are native speakers of English)
- cannot spell or punctuate
- Will internet users learn to accurately tag
their information with whatever hierarchy they're
supposed to be using?
75Solutions in the medical domain
- Problem 1: People lie
- Problem 2: People are lazy
- Problem 3: People are stupid
- None of these is true in the world of medical
informatics
76Solutions in the medical domain
- Problem 1: People lie
- Problem 2: People are lazy
- Problem 3: People are stupid
- Achieve quality control via division of labour
77Division of Labour
- 1. Clinical activities
- 2. Structured data representation
- 3. Software coding (e.g. for NLP)
78Division of Labour
- 1. Clinical activities
- 2. Structured data representation
- 3. Software coding
- 4. Ontology building
- Use 4. to constrain 2. and 3.
- to achieve better data processing via quality
control
79Problem: Multiple descriptions
- Requiring everyone to use the same vocabulary to
describe their material is not always medically
practicable
80Clinicians
- often do not use category systems at all: they use unstructured text
- from which usable data has to be extracted in a further step
- Why?
- Because every case is different, much patient data is context-dependent
81David J. Rothwell (one of the two original authors of SNOMED)
- The notion of a standard vocabulary and in
particular a coding system to serve as the answer
to the ills besetting adoption of an Electronic
Medical Record is, in my view, quite wrong.
Traditional coding schemes, SNOMED included, are
a nineteenth century idea, that despite 100 years
of effort have failed. There are narrowly defined
areas where codes function well, but these areas
must be precisely defined. e.g. ICD-O, Drugs,
organisms. It is my belief that natural language
is the "code" that, despite its difficulties, we
must learn to work with to address the issues
encountered in a medical record.
82SARS
- is NOT
- Severe Acute Respiratory Syndrome
- it is THIS collection of instances of
- Severe Acute Respiratory Syndrome
- associated with THIS coronavirus and ITS mutations
83BFO
- ontology: not the standardization or specification of concepts
- (not a branch of knowledge or concept engineering)
- but an inventory of the types of entities existing in reality
84BFO goal
- to remove ontological impedance by constraining
terminology systems with good ontology
85BFO not a computer application
- but a reference ontology
- (not a reference terminology
- in the sense of SNOMED)
86Recall
- GDB: a gene is a DNA fragment that can be transcribed and translated into a protein
- GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype
87Ontology
- 'fragment', 'region', 'name', 'carry', 'trait', 'type'
- ... 'part', 'whole', 'function', 'inhere', 'substance'
- are ontological terms in the sense of traditional (philosophical) ontology
88UMLS Semantic Network
- entity, event
- physical entity, conceptual entity
- idea or concept
89- Idea or Concept
- Functional Concept
- Qualitative Concept
- Quantitative Concept
- Spatial Concept
- Body Location or Region
- Body Space or Junction
- Geographic Area
- Molecular Sequence
- Amino Acid Sequence
- Carbohydrate Sequence
- Nucleotide Sequence
90UMLS has ontological problems, too
- Idea or Concept
- Functional Concept
- Qualitative Concept
- Quantitative Concept
- Spatial Concept
- Body Location or Region
- Body Space or Junction
- Geographic Area
- Molecular Sequence
- Amino Acid Sequence
- Carbohydrate Sequence
- Nucleotide Sequence
91Sachsen-Anhalt
92UMLS has ontological problems, too
- Idea or Concept
- Functional Concept
- Qualitative Concept
- Quantitative Concept
- Spatial Concept
- Body Location or Region
- Body Space or Junction
- Geographic Area
- Molecular Sequence
- Amino Acid Sequence
- Carbohydrate Sequence
- Nucleotide Sequence
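The ontological problem becomes visible once the list is read as a hierarchy. The sketch below assumes the nesting used in the UMLS Semantic Network (the flat slide list does not show indentation) and parses an indented outline into a parent map; the parser and its function names are illustrative only.

```python
# Semantic types from the slide; the indentation is an assumption based on
# the UMLS Semantic Network hierarchy.
OUTLINE = """\
Idea or Concept
  Functional Concept
  Qualitative Concept
  Quantitative Concept
  Spatial Concept
    Body Location or Region
    Body Space or Junction
    Geographic Area
    Molecular Sequence
      Amino Acid Sequence
      Carbohydrate Sequence
      Nucleotide Sequence
"""

def parse_outline(text, indent=2):
    """Turn an indented outline into a child -> parent map."""
    parent = {}
    stack = []  # stack[d] holds the most recent type seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        name = line.strip()
        del stack[depth:]  # drop siblings and deeper levels
        parent[name] = stack[-1] if stack else None
        stack.append(name)
    return parent

PARENT = parse_outline(OUTLINE)

def lineage(name):
    """The type itself followed by all its ancestors, nearest first."""
    chain = [name]
    while PARENT[chain[-1]] is not None:
        chain.append(PARENT[chain[-1]])
    return chain

geo = lineage("Geographic Area")
```

Under this reading, anything classified as a Geographic Area, such as Sachsen-Anhalt on the preceding slide, is by subsumption an Idea or Concept, and the same holds for body locations and nucleotide sequences.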
99END
- http://ontologist.com
- http://ifomis.de