Title: The Neighborhood Auditing Tool
1 The Neighborhood Auditing Tool
- James Geller
- Michael Halper
- Yehoshua Perl
- C. Paul Morrey
2 Research Paper
- C.P. Morrey, J. Geller, M. Halper, Y. Perl. The
Neighborhood Auditing Tool: A hybrid interface
for auditing the UMLS. J Biomed Inform,
42(3):468-89, 2009.
3 Overview
- Goals of an Auditor's Tool for the UMLS
- Principles of Auditing with Neighborhoods
- The Idea of a Hybrid Display
- Current State of the NAT: Serving the Auditor
- Presentation of NAT Features
- Live Audit Session
- Planned State of the NAT: Guiding the Auditor
- Conclusions
- Future Work
4 Auditing the UMLS
- About 150 source vocabularies
- It is natural that inconsistencies will appear
- Over 2.1 million concepts and nearly 9.7 million
terms
- Two-level structure consisting of the Semantic
Network and the Metathesaurus
UMLS Metathesaurus version 2009AA
5 Previous Work on Auditing
- H. Gu, Y. Perl, J. Geller, M. Halper, L. Liu, and
J.J. Cimino. Representing the UMLS as an
Object-oriented Database: Modeling Issues and
Advantages. J Am Med Inform Assoc, 7(1):66-80,
2000.
- J. Geller, H. Gu, Y. Perl, and M. Halper.
Semantic refinement and error correction in large
terminological knowledge bases. Data & Knowledge
Engineering, 45(1):1-32, 2003.
- J.J. Cimino, H. Min, and Y. Perl. Consistency
across the hierarchies of the UMLS Semantic
Network and Metathesaurus. J Biomed Inform,
36(6):450-461, 2003.
- H. Gu, Y. Perl, G. Elhanan, H. Min, L. Zhang, Y.
Peng. Auditing concept categorizations in the
UMLS. Artif Intell Med, 31(1):29-44, 2004.
- Y. Chen, Y. Perl, J. Geller, and J.J. Cimino.
Analysis of a study of the users, uses, and
future agenda of the UMLS. J Am Med Inform
Assoc, 14(2):221-231, 2007.
6 Previous Work on Auditing (cont.)
- H. Gu, G. Hripcsak, Y. Chen, C.P. Morrey, G.
Elhanan, J.J. Cimino, J. Geller, and Y. Perl.
Evaluation of a UMLS auditing process of semantic
type assignments. In J.M. Teich, J. Suermondt,
and G. Hripcsak, editors, Proc AMIA Symp, pages
294-298, Chicago IL, Nov. 2007.
- Y. Chen, H. Gu, Y. Perl, J. Geller, M. Halper.
Structural group auditing of a UMLS semantic
type's extent. J Biomed Inform, 42(1):41-52,
Feb. 2009.
- L. Chen, C.P. Morrey, H. Gu, M. Halper, Y. Perl.
Modeling multi-typed structurally viewed
chemicals with the UMLS Refined Semantic Network.
J Am Med Inform Assoc, 16(1):116-31, 2009.
- Y. Chen, H. Gu, Y. Perl, J. Geller. Structural
group-based auditing of missing hierarchical
relationships in UMLS. J Biomed Inform,
42(3):452-67, Jun. 2009.
- Y. Chen, H. Gu, Y. Perl, M. Halper, and J. Xu.
Expanding the extent of a UMLS Semantic Type via
Group Neighborhood Auditing. J Am Med Inform
Assoc, accepted for publication.
7 How we did it before the NAT: Provide Info as
Paper Form
CPT: C1081844 Antonospora locustae
SRC: NCBI
STY: T004, T009 (Fungus, Invertebrate)
DEF:
SYN: Antonospora locustae, Nosema locustae
PAR: Antonospora (STY: Invertebrate)
CHD:
Data shown for this concept is from the UMLS
Metathesaurus version 2006AC
8 Auditing Results: also Paper Form
- (C1081844) Antonospora locustae
- STY: Fungus, Invertebrate
- No errors
- Semantic Type Error: Fungus
- Semantic Type Error: Invertebrate
- Add Semantic Type: ______________________
- Ambiguity
- Other error: _____________________________
- Comments: _____________________________
______________________________________
9 Goals of an Auditor's Tool for the UMLS
- Display relevant information to the auditor.
- Do not overwhelm the auditor with too much
information.
- Help the auditor focus on areas most likely to
contain errors.
- Algorithms suggest likely erroneous concepts.
- Concepts are reviewed in a neighborhood display.
10 Principles of Auditing with Neighborhoods
- Several years of experience: Auditing is to a
large degree a local activity.
- Concepts have two kinds of knowledge elements.
- Textual Knowledge Elements: Preferred term, CUI,
synonyms, LUI, definition, sources, semantic
types
- Contextual Knowledge Elements: Neighbors
11 Neighborhoods
- Focus concept: The concept presently under review
- Immediate neighborhood: The set of concepts
reachable from the focus concept by stepping one
relationship (up, down, lateral, etc.)
- Extended neighborhood: Includes parents of
parents (grandparents), children of children
(grandchildren) and siblings. No lateral chains.
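The neighborhood definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the concept names, the edge lists and the function names are hypothetical, not part of the NAT or the UMLS.

```python
# Hypothetical toy hierarchy as (child, parent) pairs; names are illustrative only.
PAR_EDGES = [
    ("C", "P1"), ("C", "P2"), ("P1", "G1"), ("S", "P1"),
    ("K1", "C"), ("K2", "C"), ("GK", "K1"),
]
# Hypothetical lateral (non-hierarchical) relationships.
LATERAL_EDGES = [("C", "R1")]

parents, children, lateral = {}, {}, {}
for child, parent in PAR_EDGES:
    parents.setdefault(child, set()).add(parent)
    children.setdefault(parent, set()).add(child)
for a, b in LATERAL_EDGES:
    lateral.setdefault(a, set()).add(b)
    lateral.setdefault(b, set()).add(a)


def neighbors(index, concept):
    """Concepts linked to `concept` in one relationship index (may be empty)."""
    return index.get(concept, set())


def immediate_neighborhood(focus):
    """Concepts exactly one relationship step away from the focus concept."""
    return neighbors(parents, focus) | neighbors(children, focus) | neighbors(lateral, focus)


def extended_neighborhood(focus):
    """Immediate neighborhood plus grandparents, grandchildren and siblings.

    Lateral relationships are followed for one step only (no lateral chains).
    """
    grandparents = {g for p in neighbors(parents, focus) for g in neighbors(parents, p)}
    grandchildren = {g for k in neighbors(children, focus) for g in neighbors(children, k)}
    siblings = {s for p in neighbors(parents, focus) for s in neighbors(children, p)} - {focus}
    return immediate_neighborhood(focus) | grandparents | grandchildren | siblings


print(sorted(immediate_neighborhood("C")))
print(sorted(extended_neighborhood("C")))
```

Siblings are taken here to be the other children of the focus concept's parents, which is one plausible reading of the definition.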
12 References about Neighborhood
- M.S. Tuttle, D.D. Sherertz, N.E. Olson, M.S.
Erlbaum, W.D. Sperzel, L.F. Fuller, et al.
Using META-1, the first version of the UMLS
Metathesaurus. In Proc 14th Annu Symp Comput Appl
Med Care, pages 131-135, Washington, D.C., 1990.
- S.J. Nelson, M.S. Tuttle, W.G. Cole, D.D.
Sherertz, W.D. Sperzel, M.S. Erlbaum, L.L.
Fuller, N.E. Olson. From meaning to term:
semantic locality in the UMLS Metathesaurus. In
Proc Annu Symp Comput Appl Med Care, pages
209-213, Washington, D.C., 1991.
13 Immediate Neighborhood
14 Extended Neighborhood
15 Up-Extended and Down-Extended Neighborhood
- An up-extended neighborhood includes grandparents
and the immediate neighborhood.
- A down-extended neighborhood includes
grandchildren and the immediate neighborhood.
- Give the auditor all s/he needs but not more.
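A minimal sketch of the two one-sided variants, again on a hypothetical toy hierarchy (the `PARENTS` mapping and all function names are assumptions for illustration; lateral relationships are left out to keep it short):

```python
# Hypothetical toy hierarchy: each concept mapped to its set of parents.
PARENTS = {"C": {"P"}, "P": {"G"}, "K": {"C"}, "GK": {"K"}}


def parents_of(concept):
    return PARENTS.get(concept, set())


def children_of(concept):
    return {c for c, ps in PARENTS.items() if concept in ps}


def immediate(focus):
    # Parents and children only; lateral neighbors are omitted in this sketch.
    return parents_of(focus) | children_of(focus)


def up_extended(focus):
    """Immediate neighborhood plus grandparents."""
    grandparents = {g for p in parents_of(focus) for g in parents_of(p)}
    return immediate(focus) | grandparents


def down_extended(focus):
    """Immediate neighborhood plus grandchildren."""
    grandchildren = {g for k in children_of(focus) for g in children_of(k)}
    return immediate(focus) | grandchildren


print(sorted(up_extended("C")))
print(sorted(down_extended("C")))
```

Keeping the two directions as separate functions mirrors the slide's point: the auditor asks for only the direction needed.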
16 Semantic Type Neighborhood
- If we provide the semantic types for every
concept, those also form a neighborhood.
- It is important to keep the information of which
semantic types are assigned to which concepts.
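One way to keep the concept-to-type assignments while still viewing the types as a neighborhood is to index the displayed concepts by semantic type. The sketch below uses the assignments from the paper-form example (UMLS 2006AC); the data structure and the function name are assumptions for illustration.

```python
# Semantic type assignments from the paper-form example in this deck.
STY_OF = {
    "Antonospora locustae": {"Fungus", "Invertebrate"},
    "Antonospora": {"Invertebrate"},
}


def semantic_type_neighborhood(concepts):
    """Map each semantic type to the displayed concepts that carry it."""
    by_type = {}
    for concept in concepts:
        for sty in sorted(STY_OF.get(concept, ())):
            by_type.setdefault(sty, []).append(concept)
    return by_type


view = semantic_type_neighborhood(["Antonospora locustae", "Antonospora"])
for sty, members in sorted(view.items()):
    print(sty, "<-", ", ".join(members))
```

The view shows at a glance that the focus concept carries a type (Fungus) that its parent does not.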
17 The Idea of a Hybrid Display
- Diagrams are wonderful as long as they fit on
one screen.
- Indented text is wonderful as long as there are
no or very few multiple parents.
- But the UMLS does not fit onto one screen and
there are many cases of multiple parents.
18 What makes a diagram wonderful?
- You can follow parent/child paths with your eyes.
- You can get a feeling for everything a concept is
connected to with one look.
- You can see multiple parents and multiple paths
with one look.
- You can see global features (short and bushy
versus tall and sparse, or (gasp!) tall and
bushy).
19 What makes indented text wonderful?
- Indentation expresses parenthood compactly and
elegantly.
- There are no lines crossing.
- You don't need a layout algorithm.
- There is a linear order in which to study text.
20 The Idea of a Hybrid Display (cont.)
- Keep the best features of text and the best
features of diagrams.
- Maintain relative positions between the focus
concept and its children, parents, etc.
- Eliminate clutter of arrows.
21 A Hybrid Diagram/Form Display of a Neighborhood
Parents
Synonyms
Relationships
Focus Concept
Children
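The panel idea can be approximated in plain text: each group of neighbors gets its own labeled block, and relative position (parents above the focus concept, children below) replaces arrows. This is a rough sketch, not the NAT's actual layout; the function name, the panel order for synonyms and relationships, and the bracketed labels are assumptions.

```python
def render_neighborhood(focus, parents, children, synonyms=(), relationships=()):
    """Lay out a neighborhood as stacked text panels (no arrows)."""

    def panel(title, items):
        # One header line, then one indented line per item.
        body = [f"  {item}" for item in items] or ["  (none)"]
        return [f"[{title}]"] + body

    lines = []
    lines += panel("Parents", parents)        # shown above the focus concept
    lines += panel("Synonyms", synonyms)
    lines += [f"[Focus Concept] {focus}"]
    lines += panel("Relationships", relationships)
    lines += panel("Children", children)      # shown below the focus concept
    return "\n".join(lines)


text = render_neighborhood(
    focus="Antonospora locustae",
    parents=["Antonospora"],
    children=[],
    synonyms=["Nosema locustae"],
)
print(text)
```

Because position carries the parent/child meaning, multiple parents simply become multiple lines in the top panel, with no line crossings to manage.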
22 Desirable Information Beyond Neighborhoods
- Concept definition for Focus Concept
- Sources for concepts and relationships
- Assigned Semantic Types of concepts
- Definitions of relevant Semantic Types
- Global view of the Semantic Network
- Indented (better for wide branches)
- Graphical (better for almost everything else)
23 Current State of the NAT: Serving the Auditor
- The Neighborhood Auditing Tool has been
implemented to fully support display of
neighborhoods.
- Navigation to adjacent neighboring concepts is an
easy click.
- Additional features listed before have been
implemented.
24 Demonstration of NAT Features
- Neighborhood
- Grandparents and grandchildren
- Synonyms
- Relationships: Concept, Sibling, Term
- Focus concept definition
- Sources: Concepts, Relationships
- Display CUIs
- Semantic Type display
- Semantic Type definition
- Semantic Network (indented)
- Semantic Network (diagram)
- Navigation
- Search (full, partial)
- Viewing History
- Choice of release
- Choice of sources
25 Audit Example: A Cycle of Three Concepts
- An SQL query found three concepts that
participate in a PAR/CHD cycle.
- We follow an auditor's review of this cycle.
- O. Bodenreider. Circular hierarchical
relationships in the UMLS: etiology, diagnosis,
treatment, complications and prevention. Proc
AMIA Symp, 2001:57-61.
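The slides do not show the query itself. As a hedged sketch, a three-way self-join can find cycles of length three in a parent table; the table `par(child, parent)`, its rows and the variable names below are hypothetical, and the real Metathesaurus relationship table has a different schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table: one row per child -> parent link.
conn.execute("CREATE TABLE par (child TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO par VALUES (?, ?)",
    [
        ("A", "B"), ("B", "C"), ("C", "A"),  # a cycle of length three
        ("D", "A"), ("E", "D"),              # ordinary acyclic links
    ],
)

# Follow three parent links and require that we return to the start.
# The "<" conditions keep one rotation per cycle (smallest concept first).
CYCLE_SQL = """
SELECT r1.child, r2.child, r3.child
FROM par AS r1
JOIN par AS r2 ON r2.child = r1.parent
JOIN par AS r3 ON r3.child = r2.parent
WHERE r3.parent = r1.child
  AND r1.child < r2.child
  AND r1.child < r3.child
  AND r2.child <> r3.child
"""
cycles = conn.execute(CYCLE_SQL).fetchall()
print(cycles)
```

Each returned row lists the three concepts in the order the parent links are followed, which is a convenient starting point for reviewing the cycle in a neighborhood display.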
26 The Cycle of Three Concepts
27 Recommended Modeling
28 Audit Example: Semantic Types
- An algorithm determined that the concept
Antonospora locustae was likely assigned
incorrect semantic types.
- We follow an auditor's review of this concept.
29 Preliminary Evaluation Study with NAT
- Compare paper-based auditing and NAT-based
auditing.
- Counterbalanced groups.
- Recall improves with NAT use. Auditors seem
willing to investigate more concepts.
- Precision stays the same. Auditors' mental
process does not improve.
30 Conclusions
- Preliminary study showed that people are more
successful finding errors with NAT than with
paper sources.
- Recall improved with the NAT, precision did not.
- NAT seems to nicely complement use of the UMLSKS.
31 Future Work
- Integrating algorithms for developing audit
sets with NAT.
- Recording and reporting auditor recommendations.
- Facilitating team auditing where several auditors
review the same sample.
- Managing and reporting work flow of auditor teams.
32 Thank you!
The Neighborhood Auditing Tool is available
online at http://nat.njit.edu
34 Preliminary Evaluation Study
Auditor | Errors            | Recall            | Precision         | F
        | with NAT  w/o NAT | with NAT  w/o NAT | with NAT  w/o NAT | with NAT  w/o NAT
1       | 57        45      | 0.97      0.82    | 0.53      0.51    | 0.86      0.63
2       | 22        20      | 0.43      0.35    | 0.55      0.55    | 0.48      0.43
3       | 39        34      | 0.64      0.58    | 0.46      0.53    | 0.54      0.55
4       | 56        44      | 0.55      0.54    | 0.30      0.34    | 0.39      0.42
Avg.    | 44        36      | 0.65      0.57    | 0.46      0.48    | 0.57      0.51
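The slide does not say how F is computed. Assuming it is the balanced F-measure (the harmonic mean of precision and recall), the sketch below reproduces the two F values reported for Auditor 2; the function name is an assumption.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Auditor 2, using the precision and recall values from the table.
f_with_nat = f_measure(precision=0.55, recall=0.43)
f_without_nat = f_measure(precision=0.55, recall=0.35)
print(round(f_with_nat, 2), round(f_without_nat, 2))
```

With precision unchanged at 0.55, the gain in F for this auditor comes entirely from the higher recall.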
35 Improved Recall
- The auditor finds it easy to search for more
errors in the neighborhood of the suspicious
concept.
- With better recall and the same precision you
still find more errors.
36 Semantic Types Example
- The concept Antonospora locustae was selected for
audit by an algorithm that found it was the only
concept assigned to the intersection of Fungus
and Invertebrate in the UMLS 2007AA.
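The selection algorithm is not spelled out on the slide. One simple reading is to group concepts by their exact combination of semantic types and flag combinations that hold very few concepts. The sketch below follows that reading; apart from Antonospora locustae, the concepts, the types and the function name are hypothetical.

```python
from collections import defaultdict


def small_intersections(sty_of, max_size=1):
    """Return multi-type combinations assigned to at most `max_size` concepts."""
    extents = defaultdict(list)
    for concept, stys in sty_of.items():
        if len(stys) > 1:  # only concepts with two or more semantic types
            extents[frozenset(stys)].append(concept)
    return {types: cs for types, cs in extents.items() if len(cs) <= max_size}


# Toy assignments; X1..X3 and types A, B are hypothetical placeholders.
STY_OF = {
    "Antonospora locustae": {"Fungus", "Invertebrate"},
    "X1": {"A", "B"},
    "X2": {"A", "B"},
    "X3": {"A"},
}

suspects = small_intersections(STY_OF)
for types, concepts in suspects.items():
    print(sorted(types), concepts)
```

A combination with a single member is not necessarily wrong, but it is a cheap signal for which concepts an auditor should look at first.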
51 NAT Features Demonstration
52 Neighborhood
74 Cycle Example
- An SQL query provided us with a list of concepts
in the Metathesaurus that participate in cycles
of length three.
- One of these cycles exists among the concepts
Bipolar Disorder, Mood Disorders, and Affective
Disorders, Psychotic.