Information Extraction from Semistructured Patient Records

1
Information Extraction from Semi-structured
Patient Records
  • Davis Zhou
  • College of Information Science and Technology
    Drexel University

2
Agenda
  • Problem Addressed
  • Methods
  • Approach to numeric values
  • Approach to medical terms
  • Approach to categorical values
  • Implementation
  • Evaluation
  • Future Work

3
Problem Addressed
  • Description
  • Automatically extract information from
    semi-structured patient records.
  • Three types of information:
  • Numbers: blood pressure, weight, pulse, etc.
  • Medical terms: past medical history
  • Classification: smoking behavior, alcohol use,
    appearance, etc.
  • Each record consists of multiple sections, each
    beginning with a fixed string. Each section is
    written in natural language.

4
Problem Addressed (cont.)
  • Examples

5
Problem Addressed (cont.)
  • Examples

6
Approach to Numeric Values (1)
  • Number Identification
  • Tokenization
  • Named Entity Recognition
  • Concept Identification
  • String Match
  • Synonym Expansion
  • Association
  • Pattern-based approach
  • Linkage-based approach (our approach)
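The number-identification and concept-identification steps above might look roughly like the following minimal sketch. The slides do not describe how tokenization or synonym expansion was implemented. The regular expressions, the SYNONYMS dictionary, and the function names are therefore hypothetical, and simple exact string matching stands in for a full NER component.

```python
import re

# Hypothetical synonym dictionary (canonical concept -> surface forms);
# a real system would expand synonyms from a curated vocabulary.
SYNONYMS = {
    "blood pressure": ["blood pressure", "bp"],
    "pulse": ["pulse", "heart rate", "hr"],
    "weight": ["weight", "wt"],
}

# A number such as "72" or "98.6", or a pair such as "120/80".
NUMBER = r"\d+(?:\.\d+)?(?:/\d+(?:\.\d+)?)?"
NUMBER_RE = re.compile(NUMBER)


def tokenize(sentence):
    """Lowercase the sentence and split it into number and word tokens."""
    return re.findall(NUMBER + r"|[a-z]+", sentence.lower())


def identify_numbers(tokens):
    """Return (position, token) for every numeric token."""
    return [(i, t) for i, t in enumerate(tokens) if NUMBER_RE.fullmatch(t)]


def identify_concepts(tokens):
    """Exact string match of every synonym, longest first, so that
    "heart rate" is matched as one concept rather than two words."""
    forms = [(s.split(), c) for c, syns in SYNONYMS.items() for s in syns]
    forms.sort(key=lambda f: -len(f[0]))
    found, used = [], set()
    for words, concept in forms:
        n = len(words)
        for i in range(len(tokens) - n + 1):
            span = set(range(i, i + n))
            if tokens[i:i + n] == words and not span & used:
                found.append((i, concept))
                used |= span
    return sorted(found)
```

For "BP 120/80, pulse 72, weight 154." the tokens are bp, 120/80, pulse, 72, weight, 154. The numbers sit at positions 1, 3, and 5, and the concepts blood pressure, pulse, and weight sit at positions 0, 2, and 4. Pairing them is the association step.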

7
Approach to Numeric Values (2)
  • Pattern Approach
  • Examples
  • CONCEPT is NUMBER
  • CONCEPT of NUMBER
  • CONCEPT, NUMBER
  • CONCEPT NUMBER
  • Very simple, but generalizes poorly to phrasings
    that no pattern covers.
  • Linkage-based Association Approach
  • Convert linkage diagram to graph
  • Calculate the shortest distance between every
    concept-number pair in a sentence.
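The pattern approach above can be sketched with one regular expression per surface pattern. This is an illustration only. The regex templates and the function name pattern_extract are hypothetical, and the patterns are tried in the order listed on the slide.

```python
import re

# The four surface patterns from the slide as regex templates.
# {c} is filled with an escaped concept name; the value is captured as "num".
PATTERN_TEMPLATES = [
    r"\b{c}\s+is\s+(?P<num>\d+(?:\.\d+)?)",  # CONCEPT is NUMBER
    r"\b{c}\s+of\s+(?P<num>\d+(?:\.\d+)?)",  # CONCEPT of NUMBER
    r"\b{c},\s*(?P<num>\d+(?:\.\d+)?)",      # CONCEPT, NUMBER
    r"\b{c}\s+(?P<num>\d+(?:\.\d+)?)",       # CONCEPT NUMBER
]


def pattern_extract(sentence, concepts):
    """Return {concept: value} using the first template that matches."""
    text = sentence.lower()
    result = {}
    for concept in concepts:
        for template in PATTERN_TEMPLATES:
            m = re.search(template.format(c=re.escape(concept)), text)
            if m:
                result[concept] = float(m.group("num"))
                break
    return result
```

The sketch extracts both values from "Weight is 154 and pulse of 72." For "Pulse was 72." it returns an empty dict. That is the generalization problem: any phrasing outside the fixed templates is missed.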

8
Approach to Numeric Values (3)
  • Link Grammar Parser
  • Convert each word to a node and each link to a
    (weighted) edge.
  • Assumption: if a number is the value of a certain
    concept, the number's shortest distance to that
    concept is smaller than its distance to any other
    concept in the sentence.
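Under that assumption, association reduces to a shortest-path query from each number to every concept. This minimal sketch runs Dijkstra's algorithm over the linkage graph. The slides do not give the link weights or the parser interface, so the sketch assumes the Link Grammar output has already been converted into (word index, word index, weight) edges. EXAMPLE_LINKS is hand-written for illustration and is not real parser output.

```python
import heapq

# Hypothetical hand-made linkage for "Pulse was 72 and weight was 154".
EXAMPLE_WORDS = ["Pulse", "was", "72", "and", "weight", "was", "154"]
EXAMPLE_LINKS = [(0, 1, 1.0), (1, 2, 1.0), (1, 3, 1.0),
                 (3, 5, 1.0), (4, 5, 1.0), (5, 6, 1.0)]


def shortest_distances(n_words, links, source):
    """Dijkstra over an undirected graph whose nodes are word positions
    and whose edges are parser links with non-negative weights."""
    adj = {i: [] for i in range(n_words)}
    for a, b, w in links:
        adj[a].append((b, w))
        adj[b].append((a, w))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def associate(n_words, links, concepts, numbers):
    """Pair each number position with the concept position at the
    smallest graph distance (unreachable concepts are ignored)."""
    pairs = {}
    for num in numbers:
        dist = shortest_distances(n_words, links, num)
        reachable = [c for c in concepts if c in dist]
        if reachable:
            pairs[num] = min(reachable, key=lambda c: dist[c])
    return pairs
```

In this example, "72" is two word positions from both "Pulse" and "weight", so linear distance cannot decide between them. In the graph, however, "72" lies at distance 2 from "Pulse" and 4 from "weight". The call associate(7, EXAMPLE_LINKS, [0, 4], [2, 6]) therefore pairs 72 with pulse and 154 with weight.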

9
Approach to Medical Terms (1)
  • State of the Art
  • Current NER algorithms don't work well for
    medical term identification.
  • An ontology is important for achieving high
    accuracy in medical term extraction.
  • Looking up every combination of word sequences in
    a sentence in the ontology is not efficient.
  • Solution
  • POS-based Ordered Patterns Search

10
Approach to Medical Terms (2)
  • Flow
  • Part-of-speech tagging
  • Ordered pattern matching, for example:
  • JJ NN NN
  • NN NN
  • JJ NN
  • NN
  • Normalization of the candidate term
  • Search for the candidate term in an ontology
    (e.g., UMLS).
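The flow above can be sketched as a left-to-right scan that tries the POS patterns in the listed order. Only candidates confirmed by the ontology are kept. Several parts are assumptions: the input is pre-tagged by hand, ONTOLOGY is a tiny set standing in for UMLS, and normalization is reduced to lowercasing. A real tagger would also emit tags such as NNS or NNP, which these four patterns do not cover.

```python
# Ordered POS patterns from the slide; longer patterns are tried first.
PATTERNS = [("JJ", "NN", "NN"), ("NN", "NN"), ("JJ", "NN"), ("NN",)]

# Hypothetical stand-in for an ontology such as UMLS.
ONTOLOGY = {"coronary artery disease", "hypertension", "diabetes"}


def normalize(words):
    """Hypothetical normalization: lowercase and join with single spaces."""
    return " ".join(w.lower() for w in words)


def extract_terms(tagged):
    """tagged: list of (word, POS) pairs. At each position, try the patterns
    in order and keep the first candidate found in the ontology."""
    terms, i = [], 0
    while i < len(tagged):
        for pat in PATTERNS:
            window = tagged[i:i + len(pat)]
            if tuple(tag for _, tag in window) == pat:
                cand = normalize(w for w, _ in window)
                if cand in ONTOLOGY:
                    terms.append(cand)
                    i += len(pat)  # skip past the matched term
                    break
        else:
            i += 1  # no pattern produced an ontology term here
    return terms


EXAMPLE_TAGGED = [("History", "NN"), ("of", "IN"), ("coronary", "JJ"),
                  ("artery", "NN"), ("disease", "NN"), ("and", "CC"),
                  ("hypertension", "NN")]
```

Trying the longest pattern first means "coronary artery disease" is found as one term. This avoids a separate lookup for every word subsequence in the sentence.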

11
Approach to Categorical Values (1)
  • Available Methods
  • Analytic approach
  • Machine learning
  • Decision trees are frequently used in natural
    language understanding.
  • Examples
  • Each patient is either current smoker, former
    smoker, or nonsmoker.
  • Texts
  • She quit smoking five years ago (former)
  • She is currently a smoker (current)
  • None (nonsmoker)
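For contrast with machine learning, an analytic (hand-written rule) classifier for the three example texts might look like the following. The slides do not describe the analytic approach in detail. The keyword rules and the name classify_smoking are therefore purely hypothetical.

```python
import re

# Hypothetical keyword rules, checked in order; the first match wins.
RULES = [
    ("former", re.compile(r"\b(quit|quitted|stopped|former)\b")),
    ("current", re.compile(r"\b(currently|smokes|current)\b")),
]


def classify_smoking(text):
    """Return 'former', 'current', or 'nonsmoker' when no rule fires
    (e.g., the section text is just "None")."""
    lowered = (text or "").lower()
    for label, pattern in RULES:
        if pattern.search(lowered):
            return label
    return "nonsmoker"
```

These rules label the three slide examples correctly. However, they would label "She has never quit smoking" as former. That kind of brittleness is one reason to learn the classifier from data instead.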

12
Approach to Categorical Values (2)
  • Word-based Boolean Feature Extraction
  • Choose one or more parts of speech: verb, noun,
    adjective, and adverb.
  • Choose one or more sentence constituents:
    subject, verb, object, and supplement.
  • Head noun or head adjective only. If this option
    is enabled, only the head word of a noun phrase
    or adjective phrase is extracted.
  • Use the lemma (uninflected form) of each word. If
    this option is enabled, "denies", "denied", and
    "deny" are treated as the same feature.
  • ID3-based Decision Tree
  • The feature selection criterion is maximum
    information gain (mutual information).
  • ID3 yields fewer features than other algorithms.
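The four feature-extraction options can be sketched as filters over an already-parsed sentence. The sketch assumes that a parser has supplied each token's lemma, coarse part of speech, constituent role, and head flag. The Token class, its field names, and the label strings are hypothetical.

```python
from dataclasses import dataclass

ALL_POS = frozenset({"verb", "noun", "adj", "adv"})
ALL_ROLES = frozenset({"subject", "verb", "object", "supplement"})


@dataclass
class Token:
    word: str
    lemma: str
    pos: str       # coarse part of speech: "verb", "noun", "adj", "adv", ...
    role: str      # constituent: "subject", "verb", "object", "supplement"
    is_head: bool  # head word of its noun or adjective phrase?


def boolean_features(tokens, parts_of_speech=ALL_POS, roles=ALL_ROLES,
                     head_only=False, use_lemma=False):
    """Return the set of words present; each becomes a True Boolean feature."""
    feats = set()
    for t in tokens:
        if t.pos not in parts_of_speech or t.role not in roles:
            continue
        # Head-only option applies to noun and adjective phrases only.
        if head_only and t.pos in ("noun", "adj") and not t.is_head:
            continue
        feats.add(t.lemma if use_lemma else t.word.lower())
    return feats


def to_vector(feats, vocabulary):
    """Boolean feature vector over a fixed vocabulary."""
    return {f: f in feats for f in vocabulary}


EXAMPLE = [
    Token("She", "she", "pron", "subject", True),
    Token("denies", "deny", "verb", "verb", True),
    Token("tobacco", "tobacco", "noun", "object", False),
    Token("use", "use", "noun", "object", True),
]
```

With default options, "She denies tobacco use" yields the features denies, tobacco, and use. With head_only and use_lemma enabled, it yields only deny and use, so "denies", "denied", and "deny" would all map to one feature.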

13
Approach to Categorical Values (3)
  • Example ID3-based Decision Tree for
    Classification of Smoking Behavior.

14
Implementation
15
Evaluation
  • 50 semi-structured patient records
  • The goal is to extract 24 attributes (in 18
    fields): 4 medical-term, 8 numeric, and 12
    categorical attributes.
  • Measures
  • Precision is the proportion of correctly
    extracted instances among all extracted instances.
  • Recall is the proportion of correctly extracted
    instances among all true instances.
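The two definitions translate directly into code. This sketch assumes that each instance is represented as a hashable (record, attribute, value) tuple. The zero-denominator convention is also an assumption, since the slides do not state one.

```python
def precision_recall(extracted, gold):
    """extracted, gold: sets of (record_id, attribute, value) instances.
    Precision = correct / extracted; recall = correct / all true instances.
    Returns 0.0 for an empty denominator (an assumed convention)."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

For example, suppose two instances are extracted, one of which is correct, out of three true instances. Precision is then 1/2 and recall is 1/3.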

16
Evaluation of Numeric Attributes
  • Precision and recall for all eight numeric
    attributes are 100%.
  • Manual examination of all 50 records shows that
    this very high precision is partly attributable
    to their very consistent writing style.
  • If the data set grows and more diverse writing
    styles are introduced, performance may degrade.

17
Evaluation of Smoking Behavior
  • 45 cases: 5 former smokers, 12 current smokers,
    and 28 nonsmokers.
  • 5-fold cross-validation
  • Experiments run for 10 rounds (the data set is
    randomly shuffled before each round).
  • Average precision (recall) is 92.2%.
  • The number of features used ranges from 4 to 7.
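The protocol above (5-fold cross-validation repeated over 10 reshuffled rounds) can be sketched as follows. When every case receives exactly one predicted label, micro-averaged precision and recall both equal accuracy, so the sketch reports accuracy pooled over all rounds. Several details are assumptions: the folds are not stratified, the seed is fixed, and the train_and_predict callback is hypothetical. In the sketch, majority_baseline stands in for the ID3 classifier.

```python
import random
from collections import Counter


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most 1."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def repeated_cv_accuracy(examples, labels, train_and_predict,
                         k=5, rounds=10, seed=0):
    """Pooled accuracy over `rounds` shuffles of k-fold cross-validation."""
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(rounds):
        order = list(range(len(labels)))
        rng.shuffle(order)  # reshuffle the data set every round
        for fold in k_fold_indices(len(order), k):
            held_out = set(fold)
            train = [order[i] for i in range(len(order)) if i not in held_out]
            test = [order[i] for i in fold]
            preds = train_and_predict([examples[j] for j in train],
                                      [labels[j] for j in train],
                                      [examples[j] for j in test])
            correct += sum(p == labels[j] for p, j in zip(preds, test))
            total += len(test)
    return correct / total


def majority_baseline(train_x, train_y, test_x):
    """Hypothetical stand-in classifier: always predict the majority label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_x)
```

With the reported class distribution (5 former, 12 current, 28 nonsmokers), the majority baseline scores 28/45, or about 62%. This gives a rough reference point for the reported 92.2%.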

18
Evaluation of Medical Terms (1)
  • Each attribute can have multiple values (medical
    terms).
  • Precision and recall are computed from the
    following counts:
  • ETrue_i: number of extracted true terms for the
    i-th subject.
  • ETotal_i: number of extracted terms for the i-th
    subject.
  • TInst_i: total number of true terms for the i-th
    subject.
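The formula combining these counts is not preserved in this transcript. The following sketch shows two common ways to aggregate per-subject counts into precision and recall. Micro-averaging pools the counts, while macro-averaging averages per-subject ratios. Which one the authors used is not known from the slides, and the function names are hypothetical.

```python
def micro_pr(e_true, e_total, t_inst):
    """Pooled: sum(ETrue_i) / sum(ETotal_i) and sum(ETrue_i) / sum(TInst_i)."""
    return sum(e_true) / sum(e_total), sum(e_true) / sum(t_inst)


def macro_pr(e_true, e_total, t_inst):
    """Mean of per-subject ratios; subjects with a zero denominator are
    skipped (an assumed convention)."""
    ps = [t / n for t, n in zip(e_true, e_total) if n]
    rs = [t / n for t, n in zip(e_true, t_inst) if n]
    return sum(ps) / len(ps), sum(rs) / len(rs)
```

As an illustration, take three subjects with ETrue = (2, 1, 0), ETotal = (2, 2, 1), and TInst = (3, 1, 0). Micro-averaging gives precision 0.6 and recall 0.75. Macro-averaging gives precision 0.5 and recall of about 0.83. The choice can therefore noticeably change the reported figures.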

19
Evaluation of Medical Terms (2)
  • Extracted false terms and unextracted true terms
    are mainly caused by the incompleteness of the
    domain ontology.
  • The low recall for predefined past surgical
    history and the low precision for other past
    surgical history are due to failing to recognize
    synonyms of the predefined surgical terms, which
    are then improperly recognized as other surgical
    terms.

20
Future Work
  • Test our work on a larger data set
  • Medical Terms Extraction
  • Ontology selection
  • The use of synonyms
  • Text Classification
  • How to deal with categories containing numeric
    threshold information

21
Questions
  • ?