Title: Applications of Machine Learning to Medical Imaging
1Applications of Machine Learning to Medical
Imaging
- Daniela S. Raicu, PhD
- Associate Professor, CDM
- DePaul University
- Email draicu_at_cs.depaul.edu
- Lab URL: http://facweb.cs.depaul.edu/research/vc/
2About me
- BS in Mathematics from University of Bucharest,
Romania
3My dissertation work
- Research areas: Data Mining, Computer Vision
- Dissertation topic: Content-based image retrieval
- Research hypothesis
- "A picture is worth thousands of words"
- There is enough information in the image content to perform image retrieval whose similarity results correspond to human-perceived similarity.
4My dissertation work (cont)
- Research hypothesis
- There is enough information in the image content to perform image retrieval whose similarity results correspond to human-perceived similarity.
- Methodology
- 1) extract color image features, 2) define color-based similarity, 3) cluster images based on color, 4) retrieve similar images
- Output
- Color-based CBIR for general-purpose image datasets
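The methodology above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the coarse RGB histogram, the histogram-intersection similarity, and all function names are assumptions chosen for illustration, and step 3 (clustering) is omitted to keep the example short.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Step 1: normalized RGB histogram. `pixels` is a list of (r, g, b) in 0..255."""
    step = 256 // bins_per_channel
    last = bins_per_channel - 1
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        rb, gb, bb = (min(v // step, last) for v in (r, g, b))
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    return [h / len(pixels) for h in hist]


def histogram_intersection(h1, h2):
    """Step 2: similarity in [0, 1]; 1 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def retrieve(query_pixels, database, top_k=3):
    """Step 4: rank database images by similarity to the query."""
    q = color_histogram(query_pixels)
    scored = [(histogram_intersection(q, color_histogram(px)), name)
              for name, px in database.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scored[:top_k]]


# Toy "images" (hypothetical data): flat lists of pixels.
database = {
    "mostly_red": [(250, 10, 10)] * 9 + [(10, 10, 250)],
    "mostly_blue": [(10, 10, 250)] * 9 + [(250, 10, 10)],
    "green": [(10, 250, 10)] * 10,
}
query = [(240, 20, 20)] * 8 + [(20, 20, 240)] * 2
ranking = retrieve(query, database)
print(ranking)
```

In this toy case the query is mostly red, so the mostly red image ranks first and the green image, which shares no color bins with the query, ranks last.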
5Towards an academic career
- Assistant Professor at DePaul, 2002-2008
- Associate Professor, 2008-present
- Teaching areas and research interests
- data analysis, data mining, image processing, computer vision, and medical informatics
- Co-director of the Intelligent Multimedia Processing and Medical Informatics lab and the NSF REU Program in Medical Informatics
6Outline
- Part I Introduction to Medical Informatics
- Medical Informatics
- Clinical Decision Making
- Imaging Modalities and Medical Imaging
- Basic Concepts in Image Processing
- Part II Advances in Medical Imaging Research
- Computer-Aided Diagnosis
- Computer-Aided Diagnostic Characterization
- Texture-based Classification
- Content-based Image Retrieval
7Medical informatics research
- What is medical informatics?
- "Medical informatics is the application of computers, communications and information technology and systems to all fields of medicine:
- - medical care
- - medical education
- - medical research."
- MF Collen, MEDINFO '80, Tokyo
8What is medical informatics?
- Medical informatics is the branch of science
concerned with the use of computers and
communication technology to acquire, store,
analyze, communicate, and display medical
information and knowledge to facilitate
understanding and improve the accuracy,
timeliness, and reliability of decision-making.
- Warner, Sorenson and Bouhaddou, Knowledge Engineering in Health Informatics, 1997
9Clinical decision making
- Making sound clinical decisions requires
- right information, right time, right format
- Clinicians face a surplus of information
- ambiguous, incomplete, or poorly organized
- Rising tide of information
- Expanding knowledge sources
- 40K new biomedical articles per month
- Publicly accessible online health info
- Hundreds of pictures per scan for one patient
10Clinical decision making: What is the problem?
- Man is an imperfect data processor
- We are sensitive to the quantity and organization of information
- Army officers and pilots commit fatal errors when given too many, too few, or poorly organized data
- The same is true for clinicians who watch for events
- Clinicians are particularly susceptible to errors of omission
11Clinical decision making: What is the problem?
- Humans are non-perfectable data processors
- - Better performance requires more time to process
- - Irony
- Clinicians increasingly face productivity expectations
- Clinicians face increasing administrative tasks
12Subdomains of medical informatics (by Wikipedia)
- imaging informatics
- clinical informatics
- nursing informatics
- consumer health informatics
- public health informatics
- dental informatics
- clinical research informatics
- bioinformatics
- pharmacy informatics
13What is medical imaging (MI)?
The study of medical imaging is concerned with the interaction of all forms of radiation with tissue and the development of appropriate technology to extract clinically useful information (usually displayed in an image format) from observation of this technology.
Sources of Images
- Structural/anatomical information (CT, MRI, US): within each elemental volume, tissue-differentiating properties are measured.
- Information about function (PET, SPECT, fMRI).
14Examples of medical images
15The imaging chain
Figure: the imaging chain. Labeled stages: signal acquisition, raw data, reconstruction, filtering, processing, analysis, quantitative output.
16Image analysis Turning an image into data
- User extracted qualitative features
- User extracted quantitative features
- Semi automated
- Automated
Figure: extracted data laid out as a table, with exam-level features (Feature 1, Feature 2, Feature 3, ...) and finding-level features (Feature 1, Feature 2, ...).
17Major advances in medical imaging
- Image Segmentation
- Image Classification
- Computer-Aided Diagnosis Systems
- Computer-Aided Diagnostic Characterization
- Content-based Image Retrieval
- Image Annotation
- These advances can play a major role in early detection, diagnosis, and computerized treatment planning in cancer radiation therapy.
18Computer-Aided Diagnosis
- Computer-Aided Diagnosis (CAD) is a diagnosis made by a radiologist when the output of computerized image analysis methods has been incorporated into his or her medical decision-making process.
- CAD may be interpreted broadly to incorporate both
- the detection task: finding the abnormality, and
- the classification task: estimating the likelihood that the abnormality represents a malignancy
19Motivation for CAD systems
- The amount of image data acquired during a CT scan is becoming overwhelming for human vision, and the overload of image data for interpretation may result in oversight errors.
- Computer-Aided Diagnosis for
- Breast Cancer
- Lung Cancer
- A thoracic CT scan generates about 240 section images for radiologists to interpret.
- Colon Cancer
- CT colonography (virtual colonoscopy) is being examined as a potential screening device (400-700 images)
20CAD for Breast Cancer
- A mammogram is an X-ray of breast tissue used as a screening tool to search for cancer when there are no symptoms. A mammogram detects lumps, changes in breast tissue, or calcifications when they're too small to be found in a physical exam.
- Abnormal tissue shows up as dense white on mammograms.
- The left scan shows a normal breast while the right one shows malignant calcifications.
21CAD for Lung Cancer
- Identification of lung nodules in thoracic CT scans: the identification is complicated by the blood vessels
- Once a nodule has been detected, it may be quantitatively analyzed as follows:
- The classification of the nodule as benign or malignant
- The evaluation of the temporal change in the nodule size.
22CAD for Colon Cancer
- Virtual colonoscopy (CT colonography) is a
minimally invasive imaging technique that
combines volumetrically acquired helical CT data
with advanced graphical software to create two- and three-dimensional views of the colon.
Figure: three-dimensional endoluminal view of the colon showing the appearance of normal haustral folds and a small rounded polyp.
23Role of Image Analysis and Machine Learning for CAD
- An overall scheme for computer-aided diagnosis systems
24SoC Medical imaging research projects
- 1. Computer-aided characterization for lung nodules
- Goal: establish the link between computer-based image features of lung nodules in CT scans and visual descriptors defined by human experts (semantic concepts) for automatic interpretation of lung nodules
- Example: "This lung nodule has a solid texture and has a sharp margin"
25Why computer-aided characterization?
Reader 1: Lobulation 4; Malignancy 5 (highly suspicious); Sphericity 2
Reader 2: Lobulation 1 (marked); Malignancy 5 (highly suspicious); Sphericity 4
Reader 3: Lobulation 2; Malignancy 5 (highly suspicious); Sphericity 5 (round)
Reader 4: Lobulation 5 (none); Malignancy 5 (highly suspicious); Sphericity 3 (ovoid)
- Ratings and boundaries differ across radiologists!
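To make the disagreement concrete, the sketch below summarizes the spread of the four readers' ratings listed on this slide. The choice of summary statistics (mean, range, population standard deviation) is an assumption made for illustration, not a measure the slides prescribe.

```python
from statistics import mean, pstdev

# Ratings copied from the slide, one value per reader.
ratings = {
    "Lobulation": [4, 1, 2, 5],
    "Malignancy": [5, 5, 5, 5],
    "Sphericity": [2, 4, 5, 3],
}


def spread(scores):
    """Simple disagreement summary for one characteristic."""
    return {
        "mean": mean(scores),
        "range": max(scores) - min(scores),
        "std": pstdev(scores),
    }


summary = {name: spread(scores) for name, scores in ratings.items()}
for name, s in summary.items():
    print(f"{name}: mean={s['mean']:.2f} range={s['range']} std={s['std']:.2f}")
```

The readers agree fully on malignancy (range 0) but span almost the whole 1 to 5 scale on lobulation (range 4), which is the variability a characterization system has to cope with.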
26Computer-aided characterization
- Research Hypothesis
- The working hypothesis is that certain radiologists' assessments can be mapped to the most important low-level image features.
- Methodology
- new semi-supervised probabilistic learning approaches that will deal with both the inter-observer variability and the small set of labeled data (annotated lung nodules).
- Our proposed learning approach will be based on an ensemble of classifiers (instead of a single classifier as with most CAD systems) built to emulate the LIDC ensemble (panel) of radiologists.
27Computer-aided characterization (cont.)
- Expected outcome
- an optimal set of quantitative diagnostic features linked to the visual descriptors (semantic concepts).
- Significance
- The derived mappings can serve to
- show the computer interpretation of the corresponding radiologist rating in terms of a set of standard and objective image features,
- automatically annotate new images,
- and augment the lung nodule retrieval results with their probabilistic diagnostic interpretations.
28Computer-aided characterization
- Preliminary results
- NIH Lung Image Database Consortium (LIDC)
- 149 distinct nodules from about 85 cases/patients
- four radiologists marked the nodules using 9 semantic characteristics on a scale from 1 to 5, except for calcification (1 to 6) and internal structure (1 to 4)
29Computer-aided characterization
- LIDC high-level concepts and ratings
Characteristic Possible Scores
Margin 1. Poorly Defined 2. . 3. . 4. . 5. Sharp
Sphericity 1. Linear 2. . 3. Ovoid 4. . 5. Round
Spiculation 1. Marked 2. . 3. . 4. . 5. None
Subtlety 1. Extremely Subtle 2. Moderately Subtle 3. Fairly Subtle 4. Moderately Obvious 5. Obvious
Texture 1. Non-Solid 2. . 3. Part Solid/(Mixed) 4. . 5. Solid
Characteristic Possible Scores
Calcification 1. Popcorn 2. Laminated 3. Solid 4. Non-central 5. Central 6. Absent
Internal structure 1. Soft Tissue 2. Fluid 3. Fat 4. Air
Lobulation 1. Marked 2. . 3. . 4. . 5. None
Malignancy 1. Highly Unlikely 2. Moderately Unlikely 3. Indeterminate 4. Moderately Suspicious 5. Highly Suspicious
30Computer-aided characterization
- Shape Features: Circularity, Roughness, Elongation, Compactness, Eccentricity, Solidity, Extent, RadialDistanceSD
- Size Features: Area, ConvexArea, Perimeter, ConvexPerimeter, EquivDiameter, MajorAxisLength, MinorAxisLength
- Intensity Features: MinIntensity, MaxIntensity, SDIntensity, MinIntensityBG, MaxIntensityBG, MeanIntensityBG, SDIntensityBG, IntensityDifference
- Texture Features: 11 Haralick features calculated from co-occurrence matrices, 24 Gabor features, 5 Markov Random Field features
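A few of the listed shape, size, and intensity features can be sketched from a binary nodule mask and its intensity image. The formulas are common conventions assumed for illustration, not necessarily the definitions used in this project: perimeter is approximated by counting exposed pixel edges, EquivDiameter is the diameter of a circle with the same area, Circularity is 4*pi*Area/Perimeter^2, and Extent is Area divided by the bounding-box area. The function name and toy data are hypothetical.

```python
from math import pi, sqrt
from statistics import mean, pstdev

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def nodule_features(mask, image):
    """Compute a handful of features from a 2D binary mask and intensity image."""
    rows, cols = len(mask), len(mask[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    inside = [(r, c) for r, c in cells if mask[r][c]]
    if not inside:
        raise ValueError("mask contains no nodule pixels")
    area = len(inside)
    # Approximate perimeter: pixel edges facing background or the image border.
    perimeter = 0
    for r, c in inside:
        for dr, dc in NEIGHBORS:
            rr, cc = r + dr, c + dc
            if not (0 <= rr < rows and 0 <= cc < cols) or not mask[rr][cc]:
                perimeter += 1
    rs = [r for r, _ in inside]
    cs = [c for _, c in inside]
    bbox_area = (max(rs) - min(rs) + 1) * (max(cs) - min(cs) + 1)
    fg = [image[r][c] for r, c in inside]
    bg = [image[r][c] for r, c in cells if not mask[r][c]]
    return {
        "Area": area,
        "Perimeter": perimeter,
        "EquivDiameter": sqrt(4 * area / pi),
        "Circularity": 4 * pi * area / perimeter ** 2,
        "Extent": area / bbox_area,
        "MinIntensity": min(fg),
        "MaxIntensity": max(fg),
        "SDIntensity": pstdev(fg),
        "MeanIntensityBG": mean(bg) if bg else None,
    }


# Toy example: a 3x3 square "nodule" inside a 5x5 image.
mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
image = [[100 if mask[r][c] else 20 for c in range(5)] for r in range(5)]
image[2][2] = 120
features = nodule_features(mask, image)
print(features)
```

Edge counting overestimates the perimeter of curved boundaries, so a production system would likely use a contour-based estimate instead.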
31Computer-aided characterization
Results per characteristic. Columns:
(A) Decision trees
(B) Add instances predicted with high confidence (60)
(C) Add instances predicted with high confidence (60) and instances with low margin (5)

Characteristic   (A)     (B)     (C)
Lobulation       27.44   81.00   69.66
Malignancy       42.22   96.31   96.31
Margin           35.36   98.68   96.83
Sphericity       36.15   91.03   90.24
Spiculation      36.15   63.06   58.84
Subtlety         38.79   93.14   92.88
Texture          53.56   97.10   97.36
Average          38.52   88.62   86.02
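The idea of adding instances predicted with high confidence can be sketched as a generic self-training loop. This is an illustration under stated assumptions, not the method behind the numbers above: a nearest-centroid classifier stands in for the decision trees, the confidence score is a crude distance ratio, and the 0.6 threshold is only a guess at what the slide's "(60)" denotes. All names are hypothetical.

```python
from math import dist


def centroids(samples):
    """Fit a nearest-centroid model. `samples` is a list of (features, label)."""
    groups = {}
    for x, y in samples:
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}


def predict_with_confidence(model, x):
    """Return (label, confidence); confidence is a heuristic in [0, 1]."""
    d = {y: dist(x, c) for y, c in model.items()}
    label = min(d, key=d.get)
    total = sum(d.values())
    conf = 1.0 if total == 0 else 1.0 - d[label] / total
    return label, conf


def self_train(labeled, unlabeled, min_confidence=0.6, max_rounds=10):
    """Repeatedly move confidently predicted unlabeled instances into the training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = centroids(labeled)
        confident, rest = [], []
        for x in pool:
            y, conf = predict_with_confidence(model, x)
            (confident if conf >= min_confidence else rest).append((x, y))
        if not confident:
            break
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return centroids(labeled), pool


# Toy data (hypothetical): two labeled points and three unlabeled ones.
labeled = [((0.0, 0.0), "low"), ((10.0, 10.0), "high")]
unlabeled = [(1.0, 1.0), (9.0, 9.0), (5.0, 5.2)]
model, remaining = self_train(labeled, unlabeled)
print(model, remaining)
```

The two points near a class centroid are absorbed in the first round, while the ambiguous point near the midpoint never reaches the threshold and stays unlabeled.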
32Computer-aided characterization
- Challenges
- Small number of training samples and large number of features (the curse of dimensionality problem)
- Nodule size
- Variation in the nodules' boundaries
- Different types of imaging acquisition parameters
- Clinical evaluation: observer performance studies require collaboration with medical schools or hospitals
33SoC Medical imaging research projects
- 2. Texture-based Pixel Classification
- tissue segmentation
- context-sensitive tools for radiology reporting
Organ Segmentation
34Texture-based Pixel Classification
- Texture feature extraction: consider texture around the pixel of interest.
- Capture texture characteristics based on estimation of the joint conditional probability of pixel pair occurrences Pij(d,θ).
- Pij denotes the normalized co-occurrence matrix specified by displacement vector (d) and angle (θ).
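As a minimal sketch of how a normalized co-occurrence matrix can be computed, the function below counts gray-level pairs separated by a fixed pixel offset. The (dx, dy) offset is an assumed encoding of the displacement and angle, for example (1, 0) for a displacement of 1 at 0 degrees; the function name is hypothetical, and the matrix here is not symmetrized.

```python
def cooccurrence_matrix(image, levels, dx, dy):
    """Normalized gray-level co-occurrence matrix P for the offset (dx, dy).

    P[i][j] is the fraction of pixel pairs whose first pixel has gray level i
    and whose second pixel, dx columns and dy rows away, has gray level j.
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("offset is larger than the image")
    return [[n / total for n in row] for row in counts]


# Toy 3x3 image with two gray levels (hypothetical data).
image = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
P = cooccurrence_matrix(image, levels=2, dx=1, dy=0)
print(P)
```

In practice the matrix is usually computed over a small neighborhood around each pixel of interest and for several offsets, so that each pixel receives its own texture descriptor.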
35Haralick Texture Features
36Haralick Texture Features
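The formulas from these two slides did not survive in the transcript. As a hedged illustration only, the sketch below computes a few descriptors in their commonly used forms from a normalized co-occurrence matrix P; the exact definitions used in the talk may differ, and the function name is hypothetical.

```python
from math import log


def haralick_subset(P):
    """A few co-occurrence texture descriptors from a normalized matrix P."""
    n = len(P)
    cells = [(i, j) for i in range(n) for j in range(n)]
    mu_i = sum(i * P[i][j] for i, j in cells)  # mean row gray level
    mu_j = sum(j * P[i][j] for i, j in cells)  # mean column gray level
    return {
        "energy": sum(P[i][j] ** 2 for i, j in cells),
        "entropy": -sum(P[i][j] * log(P[i][j]) for i, j in cells if P[i][j] > 0),
        "contrast": sum((i - j) ** 2 * P[i][j] for i, j in cells),
        "homogeneity": sum(P[i][j] / (1 + abs(i - j)) for i, j in cells),
        "cluster_tendency": sum((i + j - mu_i - mu_j) ** 2 * P[i][j] for i, j in cells),
    }


# Toy matrix (hypothetical): all four gray-level pairs equally likely.
P_uniform = [[0.25, 0.25], [0.25, 0.25]]
features = haralick_subset(P_uniform)
print(features)
```

Energy is highest when a few gray-level pairs dominate, and entropy is highest when pairs are spread evenly, so the two tend to move in opposite directions.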
37Examples of Texture Images
Texture images: original image, energy, and cluster tendency, respectively. M. Kalinin, D. S. Raicu, J. D. Furst, D. S. Channin, "A Classification Approach for Anatomical Regions Segmentation", The IEEE International Conference on Image Processing (ICIP), Genoa, Italy, September 11-14, 2005.
38Texture Classification of Tissues in CT
Chest/Abdomen
Example of Liver Segmentation (J.D. Furst, R.
Susomboon, and D.S. Raicu, "Single Organ
Segmentation Filters for Multiple Organ
Segmentation", IEEE 2006 International Conference
of the Engineering in Medicine and Biology
Society (EMBS'06))
Figure panels: Region growing at 70; Region growing at 60; Segmentation Result
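Region growing can be sketched as a breadth-first flood fill from a seed pixel. The sketch assumes that the panel labels 70 and 60 refer to thresholds of 70% and 60% on a per-pixel organ probability map; that reading, the 4-connectivity, and all names are assumptions, not details confirmed by the slide.

```python
from collections import deque


def region_grow(prob_map, seed, threshold):
    """Return the set of 4-connected pixels reachable from `seed` whose value
    is at least `threshold`."""
    rows, cols = len(prob_map), len(prob_map[0])
    if prob_map[seed[0]][seed[1]] < threshold:
        return set()
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < rows and 0 <= cc < cols
                    and (rr, cc) not in region
                    and prob_map[rr][cc] >= threshold):
                region.add((rr, cc))
                queue.append((rr, cc))
    return region


# Toy probability map (hypothetical values).
prob_map = [
    [0.90, 0.80, 0.20, 0.90],
    [0.75, 0.65, 0.10, 0.95],
    [0.30, 0.62, 0.10, 0.20],
]
region_70 = region_grow(prob_map, seed=(0, 0), threshold=0.70)
region_60 = region_grow(prob_map, seed=(0, 0), threshold=0.60)
print(len(region_70), len(region_60))
```

Lowering the threshold lets the region expand into less certain pixels, while high-probability pixels that are not connected to the seed stay excluded at either threshold.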
39Classification models challenges
- (a) Optimal selection of an adequate set of textural features is a challenge, especially with the limited data we often have to deal with in clinical problems. Consequently, the effectiveness of any classification system will always be conditional on two things:
- (i) how well the selected features describe the tissues
- (ii) how well the study group reflects the overall target patient population for the corresponding diagnosis
40Classification models challenges
- (b) how other types of information can be incorporated into the classification models
- - metadata
- - image features from other imaging modalities (need for image fusion)
- (c) how stable and general the classification models are
41Content-based medical image retrieval (CBMS)
systems
- Definition of Content-based Image Retrieval: Content-based image retrieval is a technique for retrieving images on the basis of automatically derived image features such as texture and shape.
- Applications of Content-based Image Retrieval
- Teaching
- Research
- Diagnosis
- PACS and Electronic Patient Records
42Diagram of a CBIR
http://viper.unige.ch/muellerh/demoCLEFmed/index.php
43CBIR as a Diagnosis Aid
An image retrieval system can help when the
diagnosis depends strongly on direct visual
properties of images in the context of
evidence-based medicine or case-based reasoning.
44CBIR as a Teaching Tool
An image retrieval system will allow students/teachers to browse available data themselves in an easy and straightforward fashion by clicking on "show me similar images".
Advantages:
- stimulate self-learning and a comparison of similar cases
- find optimal cases for teaching
- Teaching files
- Casimage: http://www.casimage.com
- myPACS: http://www.mypacs.net
45CBIR as a Research Tool
- Image retrieval systems can be used
- to complement text-based retrieval methods
- for visual knowledge management whereby the images and associated textual data can be analyzed together
- multimedia data mining can be applied to learn the unknown links between visual features and diagnosis or other patient information
- for quality control to find images that might have been misclassified
46CBIR as a tool for lookup and reference in CT
chest/abdomen
- Case study: lung nodules retrieval
- Lung Imaging Database Resource for Imaging Research: http://imaging.cancer.gov/programsandresources/InformationSystems/LIDC/page7
- 29 cases, 5,756 DICOM images/slices, 1,143 nodule images
- 4 radiologists annotated the images using 9 nodule characteristics: calcification, internal structure, lobulation, malignancy, margin, sphericity, spiculation, subtlety, and texture
- Goals
- Retrieve nodules based on image features
- Texture, Shape, and Size
- Find the correlations between the image features and the radiologists' annotations
48M. Lam, T. Disney, M. Pham, D. Raicu, J. Furst, "Content-Based Image Retrieval for Pulmonary Computed Tomography Nodule Images", SPIE Medical Imaging Conference, San Diego, CA, February 2007
49Retrieved Images
50CBIR systems challenges
- Type of features
- image features
- - texture features: statistical, structural, model-based, and filter-based
- - shape features
- textual features (such as physician annotations)
- Similarity measures
- - point-based and distribution-based metrics
- Retrieval performance
- precision and recall
- clinical evaluation
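The similarity and evaluation notions above can be sketched briefly. Euclidean distance is used as an example of a point-based metric and the chi-square histogram distance as an example of a distribution-based one; both choices, the function names, and the toy nodule identifiers are assumptions made for illustration.

```python
from math import dist


def euclidean(a, b):
    """Point-based metric: compares two feature vectors element by element."""
    return dist(a, b)


def chi_square(h1, h2):
    """Distribution-based metric: compares two normalized histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall of the top-k retrieved items."""
    top = ranked_ids[:k]
    hits = sum(1 for item in top if item in relevant_ids)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


# Toy retrieval result (hypothetical nodule identifiers).
ranked = ["n3", "n7", "n1", "n9", "n4"]
relevant = {"n3", "n1", "n8"}
p3, r3 = precision_recall_at_k(ranked, relevant, k=3)
p5, r5 = precision_recall_at_k(ranked, relevant, k=5)
print(p3, r3, p5, r5)
```

Retrieving more items lowers precision here from 2/3 to 2/5 while recall stays at 2/3, because the third relevant nodule is never retrieved.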
51Questions?