Knowledge-based Information Management for Biomedical Applications

Wesley Chu, Computer Science Department, University of California, Los Angeles, CA


1
Knowledge-based Information Management for
Biomedical Applications
  • Wesley Chu
  • Computer Science Department
  • University of California
  • Los Angeles, CA
  • wwc_at_cs.ucla.edu
  • www.kmed.cs.ucla.edu

2
Outline
  • Data types
  • Uses of knowledge bases to enhance information
    management
  • Sample systems
  • Structured data
  • Multi-media
  • Free-text
  • Conclusion

3
Information Formats used in Biomedical
Applications
  • Structured Data
  • Multi-media Images
  • Semi-structured Data
  • Free-text

4
Uses of Knowledge Bases to Enhance Information
Management
  • Approximate matching
  • Query conditions
  • Image features
  • Similar conceptual terms

5
Uses of Knowledge Bases to Enhance Information
Management
  • KB query processing
  • Similarity query answering
  • Associative query answering
  • Scenario-specific query answering
  • Sentinel: triggering and alerting

6
Examples of KB Information Systems
  • CoBase (1990-1998), DARPA
  • A database that cooperates with the user for
    structured data
  • KMeD (1991-2000), NSF
  • A Knowledge-based medical multi-media database
  • Medical Digital Library (2001-2005), NIH
  • A knowledge-based digital file room for patient
    care, education, and research.

7
CoBase (www.cobase.cs.ucla.edu)
  • Project leader: Wesley W. Chu
  • Graduate students: K. Chiang, C. Larson, R. Lee,
    M. Merzbacher, M. Minock, Frank Meng, Wenlei Mao,
    Mark Yang, K. Zhang
  • Staff: Q. Chen, Gladys Chow, Hua Yang

8
CoBase Cooperative Databases
  • Conventional query answering
  • Need to know the detailed database schema
  • Cannot get approximate answers
  • Cannot answer conceptual queries
  • Cooperative query answering
  • Derive approximate answers
  • Answer conceptual queries
  • Provide additional relevant answers that the user
    does not (or does not know how to) ask for

9
Cooperative Queries
  • Example: Find a nearby friendly airport that can
    land an F-15
  • Example: Find hospitals with facilities similar to
    St. Johns near LAX
  • Example: Find a seaport with railway facility in
    Los Angeles
  • CoBase provides: Relaxation, Approximation,
    Association, Explanation
  • Components: CoBase Servers, Domain Knowledge,
    Heterogeneous Information Sources

10
Generalization and Specialization
11
Cooperative Querying for Medical Applications
  • Query
  • Find the treatment used for the tumor similar-to
    (loc, size) X1 on 12 year-old Korean males.
  • Relaxed Query
  • Find the treatment used for the tumor Class X on
    preteen Asians.
  • Association
  • The success rate, side effects, and cost of the
    treatment.
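
The relaxation on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the CoBase implementation: the group tables, the age span chosen for "preteen", the sample records, and every function name are assumptions made up for the example.

```python
# Hypothetical abstraction tables; spans and members are illustrative only.
AGE_GROUPS = {"preteen": range(9, 13), "teen": range(13, 20)}
ETHNIC_GROUPS = {"Asian": {"Korean", "Chinese", "Japanese"}}


def generalize_age(age):
    """Return every age in the group that contains `age` (or just `age`)."""
    for span in AGE_GROUPS.values():
        if age in span:
            return set(span)
    return {age}


def generalize_ethnicity(value):
    """Return every member of the group containing `value` (or just `value`)."""
    for members in ETHNIC_GROUPS.values():
        if value in members:
            return set(members)
    return {value}


def find_cases(cases, ages, ethnicities):
    """Exact-match lookup over a list of case records (dicts)."""
    return [c for c in cases if c["age"] in ages and c["ethnicity"] in ethnicities]


def cooperative_query(cases, age, ethnicity):
    """Try the exact query first; relax both conditions only if it is empty."""
    exact = find_cases(cases, {age}, {ethnicity})
    if exact:
        return "exact", exact
    relaxed = find_cases(cases, generalize_age(age), generalize_ethnicity(ethnicity))
    return "relaxed", relaxed


# Made-up records; the treatment labels carry no medical meaning.
cases = [
    {"id": 1, "age": 11, "ethnicity": "Chinese", "treatment": "treatment A"},
    {"id": 2, "age": 45, "ethnicity": "Korean", "treatment": "treatment B"},
]
kind, answers = cooperative_query(cases, 12, "Korean")
print(kind, [c["id"] for c in answers])
```

The exact query finds nothing for a 12-year-old Korean patient, so both conditions are widened to their groups and the 11-year-old case is returned as an approximate answer. An associative step would then attach related attributes (success rate, side effects, cost) to each returned treatment.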

12
Type Abstraction Hierarchies for Medical Domain
13
KB Type Abstraction Hierarchy
  • Uses clustering techniques to group similar:
  • Attribute values
  • Image features
  • Spatial relationships among objects
  • Provides multi-level knowledge (conceptual)
    representation
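
A type abstraction hierarchy over one numeric attribute can be pictured as a tree of nested value ranges. The sketch below is only an illustration under assumptions: the class name, the half-open ranges, and the tumor-size labels and cut points are hypothetical, not taken from KMeD or CoBase.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TAHNode:
    """One node of a type abstraction hierarchy over a numeric attribute."""
    label: str
    low: float   # inclusive lower bound
    high: float  # exclusive upper bound
    children: List["TAHNode"] = field(default_factory=list)
    parent: Optional["TAHNode"] = field(default=None, repr=False, compare=False)

    def add(self, child: "TAHNode") -> "TAHNode":
        """Attach `child` below this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def specialize(node: TAHNode, value: float) -> TAHNode:
    """Descend to the most specific node whose range covers `value`."""
    for child in node.children:
        if child.low <= value < child.high:
            return specialize(child, value)
    return node


def generalize(node: TAHNode, levels: int = 1) -> TAHNode:
    """Climb `levels` steps toward the root to obtain a wider range."""
    for _ in range(levels):
        if node.parent is None:
            break
        node = node.parent
    return node


# Hypothetical hierarchy for tumor size in millimetres.
root = TAHNode("any size", 0, 100)
small = root.add(TAHNode("small", 0, 20))
root.add(TAHNode("large", 20, 100))
small.add(TAHNode("very small", 0, 5))
small.add(TAHNode("moderately small", 5, 20))

leaf = specialize(root, 7)
wider = generalize(leaf)
print(leaf.label, "->", wider.label, (wider.low, wider.high))
```

Moving up one level replaces the condition "size = 7" with the range of the parent node, which is how a conceptual term such as "small" can stand in for an exact value.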

14
Data Mining for TAH for Numerical Attribute Values
  • Clustering metric: relaxation error
  • Difference between the exact value and the
    returned approximate value
  • Relaxation error is weighted by the probability
    of occurrence of each value
  • Can be extended to multiple attributes
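
One way to make the relaxation error concrete is as the expected absolute difference between a requested value and the value returned in its place, with both weighted by frequency of occurrence. The slide gives only the verbal description, so the exact formula, the size-weighted split score, and the function names below are assumptions for illustration.

```python
from collections import Counter


def relaxation_error(values):
    """Expected |x_i - x_j| over a cluster, weighting each value by its
    probability of occurrence (its relative frequency in `values`)."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    return sum(
        (ci / n) * (cj / n) * abs(xi - xj)
        for xi, ci in counts.items()
        for xj, cj in counts.items()
    )


def best_binary_split(values):
    """Return (weighted_error, cut) for the cut that minimizes the
    size-weighted relaxation error of the two sub-clusters.
    Returns None when there are fewer than two distinct values."""
    n = len(values)
    best = None
    for cut in sorted(set(values))[1:]:
        left = [v for v in values if v < cut]
        right = [v for v in values if v >= cut]
        score = (len(left) / n) * relaxation_error(left) \
            + (len(right) / n) * relaxation_error(right)
        if best is None or score < best[0]:
            best = (score, cut)
    return best


sizes = [1, 2, 10, 11]  # made-up attribute values
print(relaxation_error(sizes))
print(best_binary_split(sizes))
```

Applying the split recursively to each sub-cluster would yield a hierarchy of value ranges, with lower relaxation error at deeper levels.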

15
Query Relaxation
16
Summary: CoBase
  • Derive Approximate Answers
  • Answer Conceptual Queries
  • Provide Associative Query Answers

17
KMeD (www.kmed.cs.ucla.edu)
  • PI: Wesley Chu, Ph.D., Computer Science Department
  • Co-PIs: A. Cardenas, Ph.D., Computer Science
    Department; Ricky Taira, Ph.D., School of Medicine
  • Graduate students: Alex Bui, Christina Chu,
    John Dionisio, T. Plattner, D. Johnson, C. Hsu,
    T. Ieong
  • Consultants: Denise Aberle, M.D.; C.M. Breant, Ph.D.

18
KMeD Goal: Retrieval of Images by Features and
Content
  • Features
  • size, shape, texture, density, histology
  • Spatial Relations
  • angle of coverage, shortest distance, overlapping
    ratio, contact ratio, relative direction
  • Evolution of Object Growth
  • fusion, fission

22
Characteristics of Medical Queries
  • Multimedia
  • Temporal
  • Evolutionary
  • Spatial
  • Imprecise

25
Knowledge-Based Image Model
Representation Level (features and content)
26
Queries
Query Analysis and Feature Selection
Knowledge-Based Query Processing
Knowledge-Based Content Matching Via TAHs
Query Relaxation
Query Answers
27
User Model
  • Customizes query answering to each user's
    interests, preferences, needs, and goals
  • e.g. query conditions, relaxation control, etc.
  • User type
  • Default Parameter Values
  • Feature and Content Matching Policies
  • Complete Match
  • Partial Match

28
User Model (cont.)
  • Relaxation Control Policies
  • Relaxation Order
  • Unrelaxable Object
  • Preference List
  • Measure for Ranking
  • Triggering conditions
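
A relaxation control policy of this kind could be held as a small per-user record. The sketch below is an assumption about how such a record might look: the class, its field names, and the sample attribute names are hypothetical and cover only the relaxation order and unrelaxable conditions from the list above.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RelaxationPolicy:
    """Per-user relaxation settings (field names are hypothetical)."""
    relaxation_order: List[str]                          # attributes to relax, first to last
    unrelaxable: Set[str] = field(default_factory=set)   # conditions that must stay exact


def relaxation_plan(policy: RelaxationPolicy, query_attributes: List[str]) -> List[str]:
    """Return the query's attributes in the order they may be relaxed,
    skipping any attribute the user has marked as unrelaxable."""
    present = set(query_attributes)
    return [
        attr for attr in policy.relaxation_order
        if attr in present and attr not in policy.unrelaxable
    ]


# Hypothetical user profile: relax size first, never relax the tumor type.
radiologist = RelaxationPolicy(
    relaxation_order=["tumor_size", "tumor_location", "patient_age", "tumor_type"],
    unrelaxable={"tumor_type"},
)
print(relaxation_plan(radiologist, ["tumor_type", "patient_age", "tumor_size"]))
```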

29
Query Preprocessing
  • Segment and label contours for objects of
    interest
  • Determine relevant features and spatial
    relationships (e.g., location, containment,
    intersection) of the selected objects
  • Organize the features and spatial relationships
    of objects into a feature database
  • Classify the feature database into a Type
    Abstraction Hierarchy (TAH)

31
Similarity Query Answering
  • Determine relevant features based on query input
  • Select TAH based on these features
  • Traverse through the TAH nodes to match all the
    images with similar features in the database
  • Present the images and rank their similarity
    (e.g., by mean square error)
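
The final ranking step can be sketched as follows. The candidate set is assumed to have already been selected by traversing the TAH; the feature vectors, image names, and function names are made up for the example, and features are assumed to be on comparable scales.

```python
def mean_square_error(a, b):
    """Mean of squared differences between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rank_by_similarity(query, candidates, top_k=None):
    """Return candidate names ordered from most to least similar to `query`.

    `candidates` maps an image name to its feature vector."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: mean_square_error(query, item[1]),
    )
    return [name for name, _ in ranked[:top_k]]


# Hypothetical (size, density) features for three stored images.
candidates = {
    "img_a": [10.0, 0.5],
    "img_b": [12.0, 0.5],
    "img_c": [30.0, 0.9],
}
print(rank_by_similarity([10.0, 0.5], candidates, top_k=2))
```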

32
Visual Query Language and Interface
  • Point-click-drag interface
  • Objects may be represented by icons
  • Spatial relationships among objects are
    represented graphically

34
Visual Query Example
Retrieve brain tumor cases where a tumor is
located in the region as indicated in the picture
38
Implementation
  • Sun Sparc 20 workstations (128 MB RAM, 24-bit
    frame buffer)
  • Oracle Database Management System
  • C
  • Mass Storage of Images (9 GB)

42
Summary: KMeD
  • Image retrieval by feature and content
  • Matching images based on features
  • Processing of queries based on spatial
    relationships among objects
  • Answering of imprecise queries
  • Expression of queries via visual query language
  • Integrated view of temporal multimedia data in a
    timeline metaphor

44
Medical Digital Library (www.kmed.cs.ucla.edu)
  • Project leader: Wesley W. Chu
  • Graduate students: Victor Z. Liu, Wenlei Mao,
    Qinghua Zou
  • Consultants: Hooshang Kangarloo, M.D.; Denise
    Aberle, M.D.

45
Data Types Used in a Medical Digital Library
  • Structured data (patient lab data, demographic
    data, ...): CoBase
  • Images (X-rays, MRI, CT scans): KMeD
  • Free-text (patient reports, teaching files,
    literature, news articles): FTRS (Free-Text
    Retrieval System)

46
A Knowledge-based Free-Text Retrieval System (FTRS)
  • Inputs: ad hoc query; patient report for content
    correlation
  • Document sources: news articles, patient reports,
    medical literature, teaching materials
  • Output: query results

47
A Sample Patient Report
  • Tissue Source
  • LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE)
  • FINAL DIAGNOSIS
  • - LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE
    ASPIRATION)
  • - LUNG CANCER, SMALL CELL, STAGE II.

48
Scenario-Specific Retrieval
Tissue Source LUNG (FINE NEEDLE ASPIRATION)
(LEFT LOWER LOBE) FINAL DIAGNOSIS - LUNG
NODULE, LEFT LOWER LOBE (FINE NEEDLE
ASPIRATION) - LUNG CANCER, SMALL CELL, STAGE
II.
49
Challenge I: Indexing for Free-Text
  • Extracting key concepts in the free-text for
    indexing
  • Free-text: Lung cancer, small cell, stage II
  • Concept term in knowledge source: stage II small
    cell lung cancer
  • Conventional methods use NLP
  • Not scalable

50
Challenge II: Mismatch between terms used in
query and documents
  • Example: which documents match the query?
  • Query: lung cancer, ...
  • Document 1: lung carcinoma
  • Document 2: lung neoplasm
  • Document 3: anti-cancer drug combinations

51
Challenge III: Terms used in the query are too
general
  • Expanding the general terms in the query to
    specific terms that are used in the document
  • Original query: lung cancer, diagnosis options
  • Expanded query: lung cancer, chest x-ray,
    bronchography, ...
  • Document: the effectiveness of chest x-ray and
    bronchography on patients with lung cancer

52
A Medical KB: Unified Medical Language System
(UMLS)
  • Metathesaurus: controlled vocabulary (1.6M
    biomedical phrases, representing 800K concepts)
  • Semantic Network: classifies concepts into classes
    (e.g. disease and syndrome, treated by,
    therapeutic procedure, etc.)
  • SPECIALIST Lexicon

53
Using knowledge sources to resolve these
challenges
  • Challenge I: Automatic indexing of free text
  • Challenge II: Mismatch between terms in the
    query and the documents
  • Challenge III: Terms in the query are too general

54
IndexFinder: Extracting domain-specific key
concepts
  • Technique
  • Permute words from text to generate concept
    candidates.
  • Use knowledge base to select the valid
    candidates.
  • Problem
  • Valid candidates may be irrelevant to the
    document.
  • Redundant concepts

55
Filtering out Irrelevant Concepts
  • Syntactic filter
  • Limit permutation of words within a sentence.
  • Semantic filter
  • Use the semantic type (e.g. body part, disease,
    treatment, diagnosis) to filter out irrelevant
    concepts
  • Use ISA relationship to filter out general
    concepts and yield specific concepts.
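
The candidate generation and filtering described on the last two slides can be sketched as below. This is a brute-force illustration under assumptions, not IndexFinder itself: the tiny knowledge base is hand-built, the subset test is only a stand-in for an ISA-based filter, and enumerating word combinations does not reproduce the linear-time behavior reported for the real system.

```python
import re
from itertools import combinations

# Hypothetical knowledge base: an order-free word set mapped to a concept name.
KB = {
    frozenset({"lung", "cancer"}): "lung cancer",
    frozenset({"small", "cell", "lung", "cancer"}): "small cell lung cancer",
    frozenset({"stage", "ii", "small", "cell", "lung", "cancer"}):
        "stage II small cell lung cancer",
    frozenset({"lung", "nodule"}): "lung nodule",
}


def extract_concepts(text, kb=KB, max_words=6):
    """Return the most specific KB concepts whose words co-occur in a sentence."""
    found = {}
    # Syntactic filter: only combine words that appear in the same sentence.
    for sentence in re.split(r"[.;\n]+", text.lower()):
        words = sorted(set(re.findall(r"[a-z0-9]+", sentence)))
        for size in range(1, min(max_words, len(words)) + 1):
            for combo in combinations(words, size):
                key = frozenset(combo)
                if key in kb:  # the knowledge base selects the valid candidates
                    found[key] = kb[key]
    # Redundancy filter: drop a concept when a more specific match contains it.
    specific = [
        name for key, name in found.items()
        if not any(key < other for other in found)
    ]
    return sorted(specific)


report = "Lung nodule, left lower lobe. Lung cancer, small cell, stage II."
print(extract_concepts(report))
```

Because candidates are looked up as word sets, "Lung cancer, small cell, stage II" maps to the concept term "stage II small cell lung cancer" even though the word order differs.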

56
IndexFinder Performance
  • Two orders of magnitude faster than conventional
    approaches
  • No NLP
  • Time complexity is linear with the number of
    distinct words in the text
  • Preliminary Evaluation
  • IndexFinder generates more valid terms than
    NLP-based extraction (using a single noun phrase)
  • Filtering is effective in eliminating irrelevant
    terms

57
Using knowledge sources to resolve these
challenges
  • Challenge I: Automatic indexing of free text
  • Challenge II: Mismatch between terms in the
    query and the documents
  • Challenge III: Terms in the query are too general

58
Phrase-based Vector Space Model (VSM)
  • Query: lung cancer, ...
  • Documents: lung carcinoma; lung neoplasm;
    anti-cancer drug combinations
  • Knowledge source: lung cancer and lung carcinoma
    appear together; lung neoplasm is linked by
    parent_of; the link to anti-cancer drug
    combinations is missing

59
Phrase-based VSM Examples
(C0242379) lung cancer
(C0003393) anti cancer drug combin
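
One plausible reading of "concept and its word stems" is that two phrases match fully when they share a concept identifier and otherwise match in proportion to their shared stems. The sketch below assumes that reading. The scoring rule, the hand-written stems, and the choice to give "lung carcinoma" the same identifier as "lung cancer" are illustrative assumptions; only the two identifiers shown on the slide come from the source.

```python
def phrase_similarity(p, q):
    """Similarity of two phrases, each a dict with a concept id and word stems.

    Same concept id: 1.0 (synonyms match even with no shared words).
    Otherwise: fraction of stems the two phrases have in common."""
    if p["cui"] is not None and p["cui"] == q["cui"]:
        return 1.0
    union = p["stems"] | q["stems"]
    if not union:
        return 0.0
    return len(p["stems"] & q["stems"]) / len(union)


# Stems are written by hand here; a real system would use a stemmer.
query = {"cui": "C0242379", "stems": {"lung", "cancer"}}
doc_carcinoma = {"cui": "C0242379", "stems": {"lung", "carcinoma"}}  # assumed same concept
doc_drugs = {"cui": "C0003393", "stems": {"anti", "cancer", "drug", "combin"}}

print(phrase_similarity(query, doc_carcinoma))
print(phrase_similarity(query, doc_drugs))
```

The concept identifier handles the synonym case, while the stems still give partial credit for "anti-cancer drug combinations", which shares a word with the query but has no link to it in the knowledge source.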
60
Using knowledge sources to resolve these
challenges
  • Challenge I: Automatic indexing of free text
  • Challenge II: Mismatch between terms in the
    query and the documents
  • Challenge III: Terms in the query are too general

61
Query Expansion (QE)
  • Queries in the following form benefit from
    expansion: <key concept> <general supporting
    concept(s)>, e.g. lung cancer, treatment options
  • Expanded form: <key concept> <specific supporting
    concept(s)>, e.g. lung cancer, chemotherapy,
    radiotherapy

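
The expansion pattern can be sketched with a lookup keyed on the key concept and the general supporting concept. The table below is hand-built from the examples on these slides and stands in for relations that would come from a knowledge source such as UMLS; the table and function names are hypothetical.

```python
# Hypothetical table: (key concept, general supporting concept) -> specific concepts.
SUPPORTING_CONCEPTS = {
    ("lung cancer", "treatment options"): ["chemotherapy", "radiotherapy"],
    ("lung cancer", "diagnosis options"): ["chest x-ray", "bronchography"],
}


def expand_query(key_concept, supporting_concept, kb=SUPPORTING_CONCEPTS):
    """Replace a general supporting concept with specific ones scoped to the
    key concept; leave the query unchanged when the KB has no entry."""
    specifics = kb.get((key_concept, supporting_concept), [supporting_concept])
    return [key_concept, *specifics]


print(expand_query("lung cancer", "treatment options"))
print(expand_query("lung cancer", "diagnosis options"))
```

Scoping the lookup to the key concept keeps the expansion scenario-specific: "treatment options" expands differently for a different disease.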
62
Knowledge-based Scenario-specific Expansion
63
Retrieval Effectiveness Comparison (Corpus:
OHSUMED, KB: UMLS)
Overall improvement: 33%, 100 queries vs. 5%, 50
queries
64
FTRS: Scenario-specific Query Answering
  • Sample templates: <disease>, treatment;
    <disease>, diagnosis

65
FTRS: Scenario-specific content correlation
  • IndexFinder extracts key concepts from free-text
    for content correlation

66
Summary: KB Free-text retrieval
  • Technologies
  • IndexFinder: extracts key concepts from the
    free-text
  • Phrase-based VSM: a new document indexing
    paradigm (concept and its word stems) to improve
    retrieval effectiveness
  • Knowledge-based query expansion: matches queries
    with scenario-specific documents
  • Provides scenario-specific free-text retrieval

67
Conclusions
  • Knowledge sources provide
  • Approximate matching
  • Query conditions
  • Image features
  • Query processing
  • Similarity query answering
  • User modeling
  • Associative answering
  • Triggering and alerting
  • Document retrieval
  • Convert ad hoc free-text into controlled
    vocabulary
  • Phrase-based VSM
  • Content correlation
  • Scenario-specific retrieval
  • Increase the capabilities and effectiveness of
    information management

68
Acknowledgement
This research is supported by DARPA, NSF Grant
9619345, and NIC/NIH Grant 4442511-33780