Title: Cornerstone I: Representing Knowledge
1Cornerstone I Representing Knowledge
- From Data to Knowledge Through Concept-Oriented Terminologies
- James J. Cimino
2The first step on the path to knowledge is
getting things by their right names.
3Overview
- What is data to knowledge?
- Knowledge representation choices
- Knowledge-based terminology efforts
- Medical Entities Dictionary
- Proof of concepts
4What is data to knowledge?
- Start with patient data in the medical record
- Enhance knowledge by
- gaining a better understanding of the patient
- learning relevant knowledge
- bringing smart systems to bear to apply knowledge
- discovering new knowledge from health data
5Knowledge Representation
- Terminology for representing symbols
- Format for arranging the symbols
6Knowledge Representation Choices
7Guideline Implementation
- Starren and Xie, SCAMC, 1994
- National Cholesterol Education Panel Guideline
8National Cholesterol Education Panel Guideline
Measure Cholesterol Assess Risk Factors
9Guideline Implementation
- Starren and Xie, SCAMC, 1994
- National Cholesterol Education Panel Guideline
- Three representations
- PROLOG (first-order logic)
10NCEP Guideline in PROLOG
- rule_j(PID) :-
- check_lab(PID,hdl,HDL,_),!,
- HDL > 35,
- total_risk(PID,Risk),!,
- Risk < 2,
- check_lab(PID,cholesterol,C,_),
- C > 200,
- C < 239,
- print_rule_j.
11Guideline Implementation
- Starren and Xie, SCAMC, 1994
- National Cholesterol Education Panel Guideline
- Three representations
- PROLOG (first-order logic)
12NCEP Guideline in CLASSIC
- (CL-DEFINE-CONCEPT C-PATIENT
- (AND
- (ALL CHOL
- (AND INTEGER
- (MIN 200) (MAX 239)))))
- (CL-DEFINE-CONCEPT G-PATIENT
- (AND C-PATIENT LOW-RISK-PATIENT
- (ALL HDL (AND INTEGER (MIN 35)))))
13Guideline Implementation
- Starren and Xie, SCAMC, 1994
- National Cholesterol Education Panel Guideline
- Three representations
- PROLOG (first-order logic)
- CLASSIC (frames)
14NCEP Guideline in CLIPS
- (defrule C2G2J "Rules to reach box J"
- ?f1 <- (calculated-patient (state c)
- (done no) (hdl ?hdl) (name ?name))
- (test (> ?hdl 35))
- =>
- (printout t "Patient " ?name " needs treatment" crlf))
15Guideline Implementation
- Starren and Xie, SCAMC, 1994
- National Cholesterol Education Panel Guideline
- Three representations
- PROLOG (first-order logic)
- CLASSIC (frames)
- CLIPS (production rules)
- All three representations proved adequate for
encoding the guideline
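As a rough illustration of what all three fragments test, the same conditions can be sketched in Python. The `Patient` record, its field names, and the function `reaches_box_j` are hypothetical and do not come from the cited paper. The bounds follow the CLASSIC fragment, which treats 200, 239, and 35 as inclusive limits; that reading is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    name: str
    hdl: int           # HDL cholesterol, mg/dl
    cholesterol: int   # total cholesterol, mg/dl
    risk_factors: int  # number of risk factors


def reaches_box_j(p: Patient) -> bool:
    """Adequate HDL, fewer than two risk factors, cholesterol 200-239."""
    return (
        p.hdl >= 35
        and p.risk_factors < 2
        and 200 <= p.cholesterol <= 239
    )


print(reaches_box_j(Patient("A", hdl=40, cholesterol=220, risk_factors=1)))  # True
print(reaches_box_j(Patient("B", hdl=40, cholesterol=250, risk_factors=1)))  # False
```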
16Knowledge Representation Choices
17Terminology Representation Choices
18Frame-Based Representation
- Serum Glucose Test
- is-a Lab Test
- Measures Glucose
- Specimen Serum
- Units mg/dl
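A frame can be sketched as a named set of slots with an is-a link to a parent frame. The following is a minimal sketch, not a description of any particular frame system. The `Category` slot on Lab Test and the helper `get_slot` are hypothetical, added only to show a slot being inherited.

```python
# Each frame is a dict of slots; "is-a" names the parent frame.
frames = {
    "Lab Test": {
        "is-a": None,
        "Category": "Laboratory",  # hypothetical slot, for illustration
    },
    "Serum Glucose Test": {
        "is-a": "Lab Test",
        "Measures": "Glucose",
        "Specimen": "Serum",
        "Units": "mg/dl",
    },
}


def get_slot(frame_name, slot):
    """Return a slot value, following is-a links when the frame lacks it."""
    while frame_name is not None:
        frame = frames[frame_name]
        if slot in frame:
            return frame[slot]
        frame_name = frame.get("is-a")
    return None


print(get_slot("Serum Glucose Test", "Units"))     # mg/dl
print(get_slot("Serum Glucose Test", "Category"))  # Laboratory (inherited)
```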
19Terminology Representation Choices
20Semantic Network Representation
Serum Glucose Test
21Terminology Representation Choices
- Frame-based
- Semantic network
22Conceptual Graph Representation
- Serum Glucose Test -
- (is-a) -> Lab Test
- (measures) -> Glucose
- (specimen) -> Serum
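The relations in this graph can be held as (concept, relation, concept) triples, which is also one way to store a semantic network. This is a sketch for illustration; the helper names `targets` and `sources` are hypothetical.

```python
# One triple per arc in the graph: (source concept, relation, target concept).
triples = [
    ("Serum Glucose Test", "is-a", "Lab Test"),
    ("Serum Glucose Test", "measures", "Glucose"),
    ("Serum Glucose Test", "specimen", "Serum"),
]


def targets(concept, relation):
    """Concepts reached from `concept` along `relation`."""
    return [o for s, r, o in triples if s == concept and r == relation]


def sources(relation, target):
    """Concepts that point at `target` along `relation`."""
    return [s for s, r, o in triples if r == relation and o == target]


print(targets("Serum Glucose Test", "measures"))  # ['Glucose']
print(sources("measures", "Glucose"))             # ['Serum Glucose Test']
```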
23Terminology Representation Choices
- Frame-based
- Semantic network
- Conceptual graphs
24Knowledge Representation Choices
- Guideline implementation
- Terminologic knowledge
25Knowledge Representation
- Terminology for representing symbols
- Format for arranging the symbols
- Terminology and format for representing
terminologic knowledge
26Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991
27Jochen Bernauer, SCAMC, 1991
- Conceptual graphs to model findings
28Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991
- Rector, Nolan and Glowinski, SCAMC, 1993
29Rector, Nolan and Glowinski, SCAMC, 1993
- GALEN project
- conditions grammatically haveLocation bodyparts
- fractures sensibly haveLocation bones
- femurs sensiblyAndNecessarily haveDivision neck
30Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991
- Rector, Nolan and Glowinski, SCAMC, 1993
- Campbell and Musen, SCAMC, 1993
31Campbell and Musen, SCAMC, 1993
- Conceptual graphs and SNOMED
- Pain Chest Radiation to Left Arm
Pain -
(located in) -> Chest
(radiating to) -> Arm -> (with laterality) -> Left
32Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991
- Rector, Nolan and Glowinski, SCAMC, 1993
- Campbell and Musen, SCAMC, 1993
- Lindberg, Humphreys, McCray, Methods 1993
33Lindberg, Humphreys, McCray, Methods 1993
- Unified Medical Language System
Concept
Lexical group
String
String
34Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991
- Rector, Nolan and Glowinski, SCAMC, 1993
- Campbell and Musen, SCAMC, 1993
- Lindberg, Humphreys, McCray, Methods 1993
- Rocha, Huff, et al., CBM, 1994
35Rocha, Huff, et al., CBM, 1994
- VOSER
- A server architecture for managing terminologic knowledge
36Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991
- Rector, Nolan and Glowinski, SCAMC, 1993
- Campbell and Musen, SCAMC, 1993
- Lindberg, Humphreys, McCray, Methods 1993
- Rocha, Huff, et al., CBM, 1994
- Campbell, Cohn, Chute, et al., SCAMC 1996
37Campbell, Cohn, Chute, et al., SCAMC 1996
- Convergent Medical Terminology
- SNOMED/Kaiser/Mayo
- Galapagos
38Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991
- Rector, Nolan and Glowinski, SCAMC, 1993
- Campbell and Musen, SCAMC, 1993
- Lindberg, Humphreys, McCray, Methods 1993
- Rocha, Huff, et al., CBM, 1994
- Campbell, Cohn, Chute, et al., SCAMC 1996
- Brown, O'Neil and Price, Methods, 1997
39Brown, O'Neil and Price, Methods, 1997
- Read Codes
- Representation with GALEN model
40Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991
- Rector, Nolan and Glowinski, SCAMC, 1993
- Campbell and Musen, SCAMC, 1993
- Lindberg, Humphreys, McCray, Methods 1993
- Rocha, Huff, et al., CBM, 1994
- Campbell, Cohn, Chute, et al., SCAMC 1996
- Brown, O'Neil and Price, Methods, 1997
- Spackman, Campbell, and Côté, SCAMC 1997
41Spackman, Campbell, and Côté, SCAMC 1997
- SNOMED RT (Reference Terminology)
- Convergent Medical Terminology
- Description Logic Format
42Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991
- Rector, Nolan and Glowinski, SCAMC, 1993
- Campbell and Musen, SCAMC, 1993
- Lindberg, Humphreys, McCray, Methods 1993
- Rocha, Huff, et al., CBM, 1994
- Campbell, Cohn, Chute, et al., SCAMC 1996
- Brown, O'Neil and Price, Methods, 1997
- Spackman, Campbell, and Côté, SCAMC 1997
- Huff, Rocha, McDonald, et al., JAMIA 1998
43Huff, Rocha, McDonald, et al., JAMIA 1998
- Logical Observation Identifiers Names and Codes (LOINC)
- 4764-5 GLUCOSE^3H POST 100 G GLUCOSE PO:SCNC:PT:SER/PLAS:QN
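A LOINC name packs several axes into one string. The sketch below assumes the colon-delimited layout of LOINC names (component, property, timing, system, scale, and an optional method); the class `LoincName` and the function `parse_loinc_name` are hypothetical helpers, not part of any LOINC tooling.

```python
from typing import NamedTuple, Optional


class LoincName(NamedTuple):
    component: str
    prop: str       # kind of property measured, e.g. a concentration
    timing: str
    system: str     # specimen or system observed
    scale: str
    method: Optional[str] = None


def parse_loinc_name(name: str) -> LoincName:
    """Split a colon-delimited LOINC name into its axes."""
    parts = name.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts, got {len(parts)}")
    return LoincName(*parts)


parsed = parse_loinc_name("GLUCOSE^3H POST 100 G GLUCOSE PO:SCNC:PT:SER/PLAS:QN")
print(parsed.component)  # GLUCOSE^3H POST 100 G GLUCOSE PO
print(parsed.system)     # SER/PLAS
```

Once the axes are separate fields, a query such as "all quantitative serum/plasma tests" becomes a comparison on `system` and `scale` rather than a string search.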
44Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991
- Rector, Nolan and Glowinski, SCAMC, 1993
- Campbell and Musen, SCAMC, 1993
- Lindberg, Humphreys, McCray, Methods 1993
- Rocha, Huff, et al., CBM, 1994
- Campbell, Cohn, Chute, et al., SCAMC 1996
- Brown, O'Neil and Price, Methods, 1997
- Spackman, Campbell, and Côté, SCAMC 1997
- Huff, Rocha, McDonald, et al., JAMIA 1998
- Pharmacy system knowledge base vendors
45Pharmacy System Knowledge Base Vendors
Country-Specific Packaged Product
Ingredient
Manufactured Components
Composite Trademark Drug
46Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991
- Rector, Nolan and Glowinski, SCAMC, 1993
- Campbell and Musen, SCAMC, 1993
- Lindberg, Humphreys, McCray, Methods 1993
- Rocha, Huff, et al., CBM, 1994
- Campbell, Cohn, Chute, et al., SCAMC 1996
- Brown, O'Neil and Price, Methods, 1997
- Spackman, Campbell, and Côté, SCAMC 1997
- Huff, Rocha, McDonald, et al., JAMIA 1998
- Pharmacy system knowledge base vendors
47Medical Entities Dictionary (MED)
- New York Presbyterian Hospital
- 60,000 concepts (procs, results, drugs, probs)
- 208,242 synonyms
- 84,677 hierarchical links
- 113,906 semantic links
- 238,040 other attributes
- 66,404 translations (ICD9-CM, LOINC, MeSH, UMLS)
48Central Controlled Terminology
49MED Data Structures
50MED Semantic Network
Medical Entity
Plasma Glucose
51MED Data Structures
52MED MUMPS Global
- med(1600) <SERUM GLUCOSE MEASUREMENT>
- med(1600,1) <C0202041>
- . . ,4) <32703,50000>
- . . ,5) <>
- . . ,6) <Serum Glucose Measurement>
- . . ,7) <>
- . . ,8) <1724>
- . . ,12) <GLUC>
- . . ,14) <169>
- . . ,16) <31987>
- . . ,17) <mg/dl>
- . . ,20) <C000006>
- . . ,23) <1178>
- . . ,50) <Serum Glucose>
- . . ,138) <40444,40445,40446,59165>
- . . ,156) <MCNC>
- . . ,161) <QN>
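The shape of this structure, a concept code with numbered attribute slots, some of them holding comma-separated lists, can be mimicked with nested dictionaries. This is only a sketch of the layout, not of MUMPS itself. The slide does not say what each slot number means, so the code leaves them uninterpreted; the helper `slot_values` is hypothetical.

```python
# Concept code -> name plus numbered slots, copied from a few rows above.
med = {
    1600: {
        "name": "SERUM GLUCOSE MEASUREMENT",
        "slots": {
            1: "C0202041",
            4: "32703,50000",
            5: "",
            17: "mg/dl",
            138: "40444,40445,40446,59165",
        },
    },
}


def slot_values(code, slot):
    """Return a slot's comma-separated contents as a list (empty if unset)."""
    raw = med[code]["slots"].get(slot, "")
    return [v for v in raw.split(",") if v]


print(slot_values(1600, 138))  # ['40444', '40445', '40446', '59165']
print(slot_values(1600, 5))    # []
```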
53MED Data Structures
- Semantic network
- MUMPS global
54MED DB2 Tables
55MED Data Structures
- Semantic network
- MUMPS global
- DB2
56MED UNIX Data Structure
- 1600SERUM GLUCOSE MEASUREMENT 1C020241432703
45000012GLUC17mg/dl........
57MED Data Structures
- Semantic network
- MUMPS global
- DB2
- UNIX
58Proof of Concepts
- Merging data and application knowledge
59Merging Data and Application Knowledge
- Class-based, reusable lab summaries
Chem20 Display
Serum Glucose Test
Fingerstick Glucose Test
Plasma Glucose Test
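One way to read "class-based" is that a display names a class of tests rather than listing individual tests, so any test later added under that class shows up without changing the application. The sketch below illustrates that idea under stated assumptions: the parent class "Glucose Test", the sample results, and the function names are all hypothetical.

```python
# Hypothetical class hierarchy: parent class -> direct children.
children = {
    "Glucose Test": [
        "Serum Glucose Test",
        "Fingerstick Glucose Test",
        "Plasma Glucose Test",
    ],
}


def descendants(cls):
    """Return the class itself plus everything below it in the hierarchy."""
    found = {cls}
    for child in children.get(cls, []):
        found |= descendants(child)
    return found


def summary_rows(cls, results):
    """Keep only results whose test belongs to the given class."""
    wanted = descendants(cls)
    return [r for r in results if r[0] in wanted]


# Made-up results: (test name, value in mg/dl).
results = [
    ("Serum Glucose Test", 95),
    ("Serum Cholesterol Test", 210),
    ("Fingerstick Glucose Test", 110),
]
print(summary_rows("Glucose Test", results))
# [('Serum Glucose Test', 95), ('Fingerstick Glucose Test', 110)]
```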
60DOP Summary
61WebCIS Summary
62Merging Data and Application Knowledge
- Class-based, reusable lab summaries
Chem20 Display
Serum Glucose Test
Fingerstick Glucose Test
Plasma Glucose Test
- Expert system for application maintenance
63Proof of Concepts
- Merging data and application knowledge
- Smarter retrievals from the record
64Smarter Retrievals from the Record
- Repository stores events and results
- Clinical problems at a different level of granularity
- Re-use knowledge to map from problems to clinical data
- Produce problem-specific views of the medical record
65Concept-oriented (Heart)
Radiology 2/28/96 Head CT
Lab 12/28/96 Sickle Cell Test
Admission 3/14/96 Stroke
Lab 1/1/99 Blood Type Test
Radiology 2/1/97 Knee X Ray
Admission 2/14/98 Angina
Discharge 1/15/99 CHF
Radiology 2/23/99 Chest X Ray
Lab 1/1/99 Cardiac Enzyme Test
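A problem-specific view can be sketched as a filter over the events listed above, driven by terminology links from a problem to related concepts. The `related_to` mapping below is an assumption made for illustration (the slide does not say which events count as heart-related), and `problem_view` is a hypothetical helper.

```python
# Events as listed on the slide: (source, date, concept).
events = [
    ("Radiology", "2/28/96", "Head CT"),
    ("Lab", "12/28/96", "Sickle Cell Test"),
    ("Admission", "3/14/96", "Stroke"),
    ("Lab", "1/1/99", "Blood Type Test"),
    ("Radiology", "2/1/97", "Knee X Ray"),
    ("Admission", "2/14/98", "Angina"),
    ("Discharge", "1/15/99", "CHF"),
    ("Radiology", "2/23/99", "Chest X Ray"),
    ("Lab", "1/1/99", "Cardiac Enzyme Test"),
]

# Assumed links from a problem area to related concepts.
related_to = {
    "Heart": {"Angina", "CHF", "Chest X Ray", "Cardiac Enzyme Test"},
}


def problem_view(problem):
    """Return the events whose concept is linked to the given problem."""
    wanted = related_to.get(problem, set())
    return [e for e in events if e[2] in wanted]


for source, date, concept in problem_view("Heart"):
    print(source, date, concept)
```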
70Proof of Concepts
- Merging data and application knowledge
- Smarter retrievals from the record
71Just-in-time Education
- Medline button
- Infobuttons
82Just-in-time Education
- Medline button
- Infobuttons
93Just-in-time Education
- Medline button
- Infobuttons
- Text-to-Web
94Proof of Concepts
- Merging data and application knowledge
- Smarter retrievals from the record
- Just-in-Time education
95Expert Systems
- Hripcsak, et al., Ann. Int. Med., 1995
96Hripcsak, et al., Ann. Int. Med., 1995
- Identify chest x-ray reports suspicious for 6 clinical conditions to trigger alerts
- Method               Sens    Spec
- Laypersons           22-47   97-99
- Radiologists         73-98   96-99
- Internists           68-98   97-99
- Keyword              51-79   79-92
- NLP/MED/Rule-based   81      98
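For readers less familiar with the two columns: sensitivity is the fraction of truly positive reports a method flags, and specificity is the fraction of truly negative reports it leaves alone. The counts in this sketch are invented purely to give round numbers; they are not data from the study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual positives that were detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of actual negatives that were correctly left unflagged."""
    return true_neg / (true_neg + false_pos)


# Invented counts: 100 positive reports and 100 negative reports.
print(sensitivity(true_pos=81, false_neg=19))  # 0.81
print(specificity(true_neg=98, false_pos=2))   # 0.98
```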
97Expert Systems
- Hripcsak, et al., Ann. Int. Med., 1995
- Clinical decision support system
98Clinical Decision Support System
- Data monitor runs rules against incoming reports
- Tuberculosis cultures come back 4-8 weeks later
- One day, hundreds of TB alerts came in
99What Happened to the Tuberculosis Alert?
?
Medical Logic Module
No Growth to Date
No Growth
100How We Outsmarted the Lab
?
Medical Logic Module
No Growth to Date
No Growth
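The slides give only the two result terms, so the following is one possible reading rather than the documented fix. If a rule compares against a single literal result string, a newly introduced term such as "No Growth to Date" looks like a positive culture. If the rule instead asks the terminology whether the result belongs to a class, the new term is handled as soon as it is filed under that class. The class name "No Growth Result", the positive result string, and all function names are hypothetical.

```python
# Hypothetical terminology fragment: term -> parent class.
parent = {
    "No Growth": "No Growth Result",
    "No Growth to Date": "No Growth Result",
}


def in_class(term, cls):
    """Walk parent links upward and report whether `cls` is reached."""
    while term is not None:
        if term == cls:
            return True
        term = parent.get(term)
    return False


def literal_rule_alerts(result):
    """Alert on anything that is not the exact string 'No Growth'."""
    return result != "No Growth"


def class_rule_alerts(result):
    """Alert on anything outside the 'No Growth Result' class."""
    return not in_class(result, "No Growth Result")


print(literal_rule_alerts("No Growth to Date"))  # True  (false alarm)
print(class_rule_alerts("No Growth to Date"))    # False
print(class_rule_alerts("Mycobacterium tuberculosis"))  # True
```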
101Expert Systems
- Hripcsak, et al., Ann. Int. Med., 1995
- Clinical decision support system
102DXplain Button
- Elhanan, et al., SCAMC 1997
- Conversion of test results to clinical findings
Serum Cholesterol Test
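Turning a numeric result into a finding can be sketched as a threshold lookup. The cut points below reuse the 200 to 239 range from the NCEP fragments earlier in the deck; the finding names and the function `cholesterol_finding` are hypothetical and are not the terms DXplain uses.

```python
def cholesterol_finding(value_mg_dl):
    """Map a serum cholesterol value to a finding name, or None if unremarkable."""
    if value_mg_dl > 239:
        return "High Serum Cholesterol"
    if value_mg_dl >= 200:
        return "Borderline Serum Cholesterol"
    return None


print(cholesterol_finding(250))  # High Serum Cholesterol
print(cholesterol_finding(220))  # Borderline Serum Cholesterol
print(cholesterol_finding(180))  # None
```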
107Expert Systems
- Hripcsak, et al., Ann. Int. Med., 1995
- Clinical decision support system
- DXplain Button
108Proof of Concepts
- Merging data and application knowledge
- Smarter retrievals from the record
- Just-in-Time education
- Expert systems
109Data Mining
- Wilcox and Hripcsak, SCAMC 1997
110Wilcox and Hripcsak, SCAMC 1997
111Data Mining
- Wilcox and Hripcsak, SCAMC 1997
- Wilcox and Hripcsak, SCAMC 1998
112Wilcox and Hripcsak, SCAMC 1998
- Compare traditional coding methods with NLP to identify conditions in a set of patient records (x-ray reports)
- Method               Sens    Spec
- Laypersons           36      86
- Expert-coded cases   27-37   95-98
- ICD-9-coded cases    12-29   86-90
- Physicians           85      98
- NLP/MED/Rule-based   81      98
113Data Mining
- Wilcox and Hripcsak, SCAMC 1997
- Wilcox and Hripcsak, SCAMC 1998
114Proof of Concepts
- Merging data and application knowledge
- Smarter retrievals from the record
- Just-in-Time education
- Expert systems
- Data mining
- Database maintenance and use
115Database Maintenance and Use
- Tables, columns, events all modeled in the MED
- Allows linkage of data model to controlled terminology
- Terminologies can be reused
- Impact of terminology changes on data model can be tracked
116Proof of Concepts
- Merging data and application knowledge
- Smarter retrievals from the record
- Just-in-Time education
- Expert systems
- Data mining
- Database maintenance and use
- Terminology maintenance and use
117Terminology Maintenance and Use
- Integrating terminologies from merging hospitals
- Automated update of medication terminology
- Detection of errors and inconsistencies
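Because the terminology is itself structured knowledge, some errors can be found mechanically. The sketch below shows two such checks on an is-a hierarchy: parents that are not defined anywhere, and concepts that are their own ancestors. Which checks are actually run against the MED is not stated in the slides; the checks, the sample data, and the function `find_problems` are assumptions for illustration.

```python
def find_problems(parents):
    """parents maps each concept to its list of direct parent concepts.

    Returns (dangling, cyclic): parents that are never defined, and
    concepts that can reach themselves by following parent links.
    """
    dangling = sorted(
        {p for ps in parents.values() for p in ps if p not in parents}
    )
    cyclic = []
    for start in parents:
        seen = set()
        stack = list(parents[start])
        while stack:
            node = stack.pop()
            if node == start:
                cyclic.append(start)
                break
            if node in seen or node not in parents:
                continue
            seen.add(node)
            stack.extend(parents[node])
    return dangling, sorted(cyclic)


# Made-up hierarchy containing one dangling parent and one two-node cycle.
sample = {
    "Medical Entity": [],
    "Lab Test": ["Medical Entity"],
    "Serum Glucose Test": ["Lab Test"],
    "A": ["B"],
    "B": ["A"],
    "Orphan": ["Missing Parent"],
}
print(find_problems(sample))  # (['Missing Parent'], ['A', 'B'])
```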
118Proof of Concepts
- Merging data and application knowledge
- Smarter retrievals from the record
- Just-in-Time education
- Expert systems
- Data mining
- Database maintenance and use
- Terminology maintenance and use
119Is it Worth the Trouble?
- Meed
- noun
- 1 archaic an earned reward or wage
- 2 a fitting return or recompense
- Date before 12th century
- Etymology from Old English
- MED
120Summary
- Putting knowledge in your terminology gets you
- Better ways to get knowledge out of your EMR
- Better ways to get knowledge out of resources
- Better ways to use other knowledge bases
- Better ways to use terminology
- Better ways to manage applications
- Better ways to manage data and terminology
- Representation scheme is less important
- Desiderata for controlled terminology
121Desiderata
- Desirable qualities for terminology
122Desiderata
- Desirable qualities for terminology
Go placidly amid the noise and haste, and
remember what peace there may be in
silence.
I'd rather be sailing