The Neighborhood Auditing Tool - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: The Neighborhood Auditing Tool


1
The Neighborhood Auditing Tool
  • James Geller
  • Yehoshua Perl
  • C. Paul Morrey

2
Participating Student Developers
  • Dayanand Sagar
  • Kushal Chopra
  • Sandeep Ramachandran
  • Anisa Vishnani
  • Aditi Dekhane
  • Kandarp Shah
  • Rajesh Gupta
  • Suraj Pal Singh
  • Saurabh Patel
  • Kartik Gopal
  • Yakup Kav
  • Rahul Bhave
  • Sirish Motati
  • Pratik Shah
  • Saurabh Singhi
  • Sirish Motati Reddy
  • Sandeep Pasuparthy
  • Ramya Gokanakonda

3
Overview
  • Goals of an Auditor's Tool for the UMLS
  • Principles of Auditing with Neighborhoods
  • The Idea of a Hybrid Display
  • Current State of the NAT: Serving the Auditor
  • Feature Presentation
  • Live Audit Session
  • Planned State of the NAT: Guiding the Auditor
  • Conclusions and Future Work

4
Auditing the UMLS
  • The UMLS consists of over 100 terminologies.
  • It is natural that inconsistencies will appear.
  • Over 1.5 million concepts and over 7 million
    terms.
  • Two-level structure consisting of the Semantic
    Network and the Metathesaurus.

5
How We Did It Before the NAT: Paper Form
  • CPT: C1081844 Antonospora locustae
  • SRC: NCBI
  • STY: T004, T009 (Fungus, Invertebrate)
  • DEF:
  • SYN: Antonospora locustae; Nosema locustae
  • PAR: Antonospora (STY: Invertebrate)
  • CHD:

6
Previous Work on Auditing
  • H. Gu, Y. Perl, J. Geller, M. Halper, L. Liu, and
    J.J. Cimino. Representing the UMLS as an
    Object-oriented Database: Modeling Issues and
    Advantages. J Am Med Inform Assoc, 7(1):66-80,
    2000.
  • J. Geller, H. Gu, Y. Perl, and M. Halper.
    Semantic refinement and error correction in large
    terminological knowledge bases. Data & Knowledge
    Engineering, 45(1):1-32, 2003.
  • Y. Chen, Y. Perl, J. Geller, and J.J. Cimino.
    Analysis of a study of the users, uses, and
    future agenda of the UMLS. J Am Med Inform
    Assoc, 14(2):221-231, 2007.
  • H. Gu, G. Hripcsak, Y. Chen, C.P. Morrey, G.
    Elhanan, J.J. Cimino, J. Geller, and Y. Perl.
    Evaluation of a UMLS auditing process of semantic
    type assignments. In J.M. Teich, J. Suermondt,
    and G. Hripcsak, editors, Proc AMIA Symp, pages
    294-298, Chicago, IL, Nov. 2007.

7
Auditing Results: Paper Form
  • (C1081844) Antonospora locustae
  • STY: Fungus, Invertebrate
  • No errors
  • Semantic Type Error: Fungus
  • Semantic Type Error: Invertebrate
  • Ambiguity
  • Add Semantic Type______________________
  • Other error_____________________________
  • Comments _____________________________
    ______________________________________

8
Goals of an Auditor's Tool for the UMLS
  • Display relevant information to the auditor.
  • Do not overwhelm the auditor with too much
    information.
  • Help the auditor focus on areas most likely to
    contain errors.
  • Neighborhood display of reviewed concepts
  • Algorithms suggest likely erroneous concepts

9
Principles of Auditing with Neighborhoods
  • Several years of experience show that auditing is,
    to a large degree, a local activity.
  • Concepts have two kinds of knowledge elements:
  • Textual Knowledge Elements: preferred term, CUI,
    synonyms, LUI, definition, sources, semantic
    types
  • CONtextual Knowledge Elements: neighbors

10
Neighborhoods
  • Focus concept: The concept presently under review.
  • Immediate neighborhood: The set of concepts
    reachable from the focus concept by stepping one
    relationship (up, down, lateral, etc.).
  • Extended neighborhood: Includes parents of
    parents (grandparents), children of children
    (grandchildren) and siblings. No lateral chains.
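
The neighborhood definitions above can be sketched in a few lines of Python. This is only an illustration, not the NAT's implementation: the node names are placeholders rather than UMLS concepts, the helper names are made up for this example, and lateral relationships are treated as symmetric for simplicity.

```python
from collections import defaultdict

# Hypothetical toy graph; node names are placeholders, not UMLS concepts.
parents = defaultdict(set)    # concept -> its parents
children = defaultdict(set)   # concept -> its children
lateral = defaultdict(set)    # concept -> laterally related concepts

def add_parent(child, parent):
    parents[child].add(parent)
    children[parent].add(child)

def add_lateral(a, b):
    # Simplifying assumption: lateral links are symmetric.
    lateral[a].add(b)
    lateral[b].add(a)

def immediate_neighborhood(focus):
    """Concepts exactly one relationship step away from the focus."""
    return (parents[focus] | children[focus] | lateral[focus]) - {focus}

def extended_neighborhood(focus):
    """Immediate neighborhood plus grandparents, grandchildren, siblings.

    Lateral neighbors of lateral neighbors are not followed
    ("no lateral chains").
    """
    grandparents = {g for p in parents[focus] for g in parents[p]}
    grandchildren = {g for c in children[focus] for g in children[c]}
    siblings = {s for p in parents[focus] for s in children[p]}
    result = immediate_neighborhood(focus) | grandparents | grandchildren | siblings
    return result - {focus}

add_parent("F", "P")      # P is a parent of the focus F
add_parent("P", "G")      # G is a grandparent
add_parent("S", "P")      # S is a sibling (shares parent P)
add_parent("C", "F")      # C is a child
add_parent("GC", "C")     # GC is a grandchild
add_lateral("F", "L")     # L is a lateral neighbor
add_lateral("L", "LL")    # LL is reachable only through a lateral chain

print(sorted(immediate_neighborhood("F")))
print(sorted(extended_neighborhood("F")))
```

In this toy graph, LL stays outside both neighborhoods because it can only be reached by chaining two lateral relationships.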

11
Immediate Neighborhood
12
Extended Neighborhood
13
Up-Extended and Down-Extended Neighborhood
  • An up-extended neighborhood includes grandparents
    and the immediate neighborhood.
  • A down-extended neighborhood includes
    grandchildren and the immediate neighborhood.
  • Give the auditor all s/he needs, but no more.

14
Semantic Type Neighborhood
  • If we provide the semantic types for every
    concept, those also form a neighborhood.
  • It is important to keep track of which semantic
    types belong to which concepts.

15
References about Neighborhood
  • M.S. Tuttle, D.D. Sherertz, N.E. Olson, M.S.
    Erlbaum, W.D. Sperzel, L.F. Fuller, et al.
    Using META-1, the first version of the UMLS
    Metathesaurus. In Proc 14th Annu Symp Comput Appl
    Med Care, pages 131-135, Washington, D.C., 1990.
  • S.J. Nelson, M.S. Tuttle, W.G. Cole, D.D.
    Sherertz, W.D. Sperzel, M.S. Erlbaum, L.L.
    Fuller, and N.E. Olson. From meaning to term:
    semantic locality in the UMLS Metathesaurus. In
    Proc Annu Symp Comput Appl Med Care, pages
    209-213, Washington, D.C., 1991.
  • J.J. Cimino, H. Min, and Y. Perl. Consistency
    across the hierarchies of the UMLS Semantic
    Network and Metathesaurus. J Biomed Inform,
    36(6):450-461, 2003.

16
Desirable Information Beyond Neighborhoods
  • Concept definition for Focus Concept
  • Concept sources for Focus Concept
  • Assigned Semantic Types of concepts
  • Definitions of relevant Semantic Types
  • Global view of the Semantic Network
  • Indented (better for wide branches)
  • Graphical (better for almost everything else);
    we set the standard on this.

17
The Idea of a Hybrid Display
  • Diagrams are wonderful as long as they fit on
    one screen.
  • Indented text is wonderful as long as there are
    no or very few multiple parents.
  • But the UMLS does not fit onto one screen and
    there are many cases of multiple parents.

18
What Makes a Diagram Wonderful?
  • You can follow parent/child paths with your eyes.
  • You can get a feeling for everything a concept is
    connected to with one look.
  • You can see multiple parents and paths with one
    look.
  • You can see global features (short and bushy
    versus tall and sparse, or (gasp) tall and bushy).

19
What makes Indented Text Wonderful?
  • Indentation expresses parenthood elegantly.
  • There are no lines crossing.
  • You don't need a layout algorithm.
  • There is a linear order in which to study text.
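
As a small illustration of the points above, the following sketch prints a hierarchy as indented text. The hierarchy and the function name are hypothetical. It also shows why indented text works best with few multiple parents: a concept with two parents has to be repeated under each of them.

```python
def indented(children, node, depth=0):
    """Return the subtree under `node` as a list of indented lines."""
    lines = ["  " * depth + node]
    for child in sorted(children.get(node, ())):
        lines.extend(indented(children, child, depth + 1))
    return lines

# Hypothetical hierarchy in which X has two parents, A and B.
children = {"Root": ["A", "B"], "A": ["X"], "B": ["X"]}

lines = indented(children, "Root")
print("\n".join(lines))
# Root
#   A
#     X
#   B
#     X
```

No layout algorithm is involved, and the output has a single reading order, but X appears twice.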

20
The Idea of a Hybrid Display (cont.)
  • Keep the best features of text and the best
    features of diagrams.
  • Maintain relative positions between the focus
    concept and its children, parents, etc.
  • Eliminate clutter of arrows.

21
A Hybrid Diagram/Form Display of a Neighborhood
Parents
Synonyms
Relationships
Focus Concept
Children
22
Important Auditing Principles
  • If a concept C has a combination of semantic
    types assigned, and very few other concepts C1...Cn
    (n < 6) have that same combination assigned, then
    C and C1...Cn are suspicious concepts.
  • We call this a small intersection.
  • Group-based auditing: Audit sets of similar
    concepts.
  • Y. Chen, H. Gu, Y. Perl, J. Geller, and M.
    Halper. Structural group auditing of a UMLS
    semantic type's extent. J Biomed Inform, 2007.
    Accepted for publication.
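
The small-intersection principle can be sketched as a grouping step. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function name and the `max_others` parameter are invented here, a "combination" is taken to mean two or more semantic types, and all concepts except Antonospora locustae are placeholders.

```python
from collections import defaultdict

def small_intersections(assignments, max_others=5):
    """Group concepts by their exact combination of semantic types and
    keep the combinations shared by very few concepts.

    max_others=5 mirrors the slide's "n < 6" other concepts; it is a
    tunable assumption, not a fixed rule.
    """
    groups = defaultdict(set)
    for concept, types in assignments.items():
        groups[frozenset(types)].add(concept)
    return {
        combo: members
        for combo, members in groups.items()
        if len(combo) >= 2 and len(members) - 1 <= max_others
    }

# Toy data: only the first entry comes from the slides.
assignments = {"Antonospora locustae": {"Fungus", "Invertebrate"}}
assignments["concept_8"] = {"Fungus"}
for i in range(1, 8):
    # Seven placeholder concepts sharing one combination: too many to be "small".
    assignments[f"concept_{i}"] = {"TypeA", "TypeB"}

suspicious = small_intersections(assignments)
for combo, members in suspicious.items():
    print(sorted(combo), sorted(members))
```

With this toy data the only combination flagged is Fungus plus Invertebrate, whose single member would be handed to an auditor for review.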

23
Current State of the NAT: Serving the Auditor
  • The Neighborhood Auditing Tool has been
    implemented to fully support display of
    neighborhoods.
  • Navigation to adjacent neighborhoods is easy.
  • Additional features listed before have been
    implemented.

24
Demonstration of NAT Features
  • Neighborhood
  • Relationships
  • Siblings
  • Grandparents and grandchildren
  • Synonyms
  • Focus concept definition
  • Focus concept sources
  • Semantic Type display
  • Semantic Type definition
  • Semantic Network (indented)
  • Semantic Network (diagram)
  • Display Options
  • Navigation
  • Search
  • Viewing History
  • UMLS version

offline version
25
Audit Example
  • An algorithm determined that the concept
    Antonospora locustae was likely assigned
    incorrect semantic types.
  • We follow an auditor's review of this concept
    using the data from 2007AA.

offline version
26
Preliminary Evaluation Study with NAT
  • Compare paper-based auditing and NAT-based
    auditing.
  • Counterbalanced groups.
  • Recall improves with NAT use. Auditors seem
    willing to investigate more concepts.
  • Precision stays the same. Auditors' mental
    process does not improve (?).

27
Planned State of the NAT: Guiding the Auditor by
Finding (i.e., Computing) Audit Sets
  • As noted before, errors are likely in small
    intersections.
  • Planned new version of the NAT will compute and
    display small intersections.
  • Errors are clearly visible in small groups of
    supposedly similar concepts.
  • Planned new version of the NAT will compute small
    groups of supposedly similar concepts.

28
29
Finding Successively Smaller Groups of Concepts
  • Finding audit sets by selecting:
  • 1. Concepts with the same semantic type.
  • 2. Concepts satisfying 1. that have the same root.
  • 3. Concepts satisfying 1. and 2. that have the same
    relationships.
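
The successive refinement described above can be sketched as grouping by a progressively longer key. The `Concept` record, its field names, and the sample data are hypothetical and only serve to show how each added criterion splits the groups further.

```python
from collections import defaultdict
from typing import FrozenSet, NamedTuple

class Concept(NamedTuple):
    # Hypothetical record; field names are assumptions for this sketch.
    name: str
    semantic_types: FrozenSet[str]
    root: str
    relationships: FrozenSet[str]

def group_by(concepts, key):
    """Partition concept names by the value of key(concept)."""
    groups = defaultdict(list)
    for c in concepts:
        groups[key(c)].append(c.name)
    return dict(groups)

T1 = frozenset({"T1"})
concepts = [
    Concept("a", T1, "R1", frozenset({"part_of"})),
    Concept("b", T1, "R1", frozenset({"part_of"})),
    Concept("c", T1, "R1", frozenset({"has_location"})),
    Concept("d", T1, "R2", frozenset({"part_of"})),
]

# Step 1: same semantic type(s).
by_type = group_by(concepts, lambda c: c.semantic_types)
# Step 2: same semantic type(s) and same root.
by_root = group_by(concepts, lambda c: (c.semantic_types, c.root))
# Step 3: additionally the same set of relationships.
by_rels = group_by(concepts, lambda c: (c.semantic_types, c.root, c.relationships))

print(len(by_type), len(by_root), len(by_rels))
```

Each step can only split existing groups, so the audit sets get smaller and more homogeneous as criteria are added.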

30
(No Transcript)
31
Audit Set Examples
  • Example A: A selection of concepts in the
    intersection of Manufactured Object and
    Organization under the root School (environment).
  • Example B: All concepts that are in a
    non-chemical intersection with an extent size
    of less than five.

32
Possible Auditor's Recommendations (see Pg. 7)
  • Mark concept as reviewed and correct.
  • Mark semantic types that should be removed.
  • Mark semantic types that should be added.
  • Mark other kinds of errors.
  • Attach notes to a reviewed concept.

33
34
Conclusions and Future Work
  • Preliminary study showed that people are more
    successful finding errors with NAT than with
    paper sources.
  • Recall improved with the NAT, precision did not.
  • NAT seems to nicely complement use of the UMLSKS.

35
Conclusions and Future Work (cont.)
  • This year, work with more human subjects to
    quantify these observations.
  • Integration of algorithms for finding audit sets
    with NAT.
  • By extent size
  • Using roots, and relationship patterns within
    extents.

36
Thank you!
37
38
Preliminary Evaluation Study
Auditor   Errors              Recall              Precision           F
          with NAT  w/o NAT   with NAT  w/o NAT   with NAT  w/o NAT   with NAT  w/o NAT
1         57        45        0.97      0.82      0.53      0.51      0.86      0.63
2         22        20        0.43      0.35      0.55      0.55      0.48      0.43
3         39        34        0.64      0.58      0.46      0.53      0.54      0.55
4         56        44        0.55      0.54      0.30      0.34      0.39      0.42
Avg.      44        36        0.65      0.57      0.46      0.48      0.57      0.51
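
The slides do not state how F is computed. Assuming it is the balanced F-measure, that is, the harmonic mean of precision and recall, the following sketch reproduces the value reported for auditor 2 with NAT (recall 0.43, precision 0.55). The function name is chosen for this example only.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Auditor 2 with NAT, values taken from the table above.
f_auditor2 = f_measure(precision=0.55, recall=0.43)
print(round(f_auditor2, 2))  # 0.48
```

Because the harmonic mean is dominated by the smaller of the two values, a gain in recall alone raises F only moderately while precision stays flat.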
39
Improved Recall
  • The auditor finds it easy to search for more
    errors in the neighborhood of the suspicious
    concept.
  • With better recall and the same precision you
    still find more errors.

40
Auditing Demonstration
  • The concept Antonospora locustae was selected for
    audit by an algorithm that found it was the only
    concept assigned to the intersection of Fungus
    and Invertebrate in the UMLS 2007AA.

53
NAT Features Demonstration
54
Neighborhood