Title: Integration of Radiologists' Feedback into Computer-Aided Diagnosis Systems
1Integration of Radiologists' Feedback into
Computer-Aided Diagnosis Systems
- Sarah A. Jabon (a)
- Daniela S. Raicu (b)
- Jacob D. Furst (b)
- (a) Rose-Hulman Institute of Technology, Terre Haute, IN 47803
- (b) School of Computing, CDM, DePaul University, Chicago, IL 60604
2Overview
- Introduction
- Related Work
- The Data
- Methodology
  - Simple Distance Metrics
  - Linear Regression
  - Principal Component Analysis
- Results
  - Simple Distance Metrics
  - Linear Regression
  - Principal Component Analysis
- Conclusions
- Future Work
3Introduction
- The 2008 official estimate for lung cancer
  - 215,020 new cases will be diagnosed
  - 161,840 deaths will occur
- Five-year relative survival rate (1996-2004): 15.2%
- Computer-aided diagnosis systems can help improve early detection
4Related Work
- El-Naqa et al.
  - mammography images
  - neural networks and support vector machines
- Muramatsu et al.
  - mammography images
  - three-layered artificial neural network to predict the semantic similarity rating between two nodules
- Park et al.
  - linear distance-weighted K-nearest neighbor algorithm to identify similar images
5Related Work
- ASSERT by Purdue University
  - Content-based features: co-occurrence, shape, Fourier transforms, global gray-level statistics
  - Radiologists also provide features
- BiasMap by Zhou and Huang
  - Relevance feedback, content-based features
  - Analysis: biased discriminant analysis (BDA)
6The Data
- Lung Image Database Consortium
- Reduced 1,989 images down to 149 (one for each nodule)
- Summarized the radiologists' ratings (up to 4) into a single vector
- Each nodule has 7 semantic-based characteristics and 64 content-based characteristics
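The summarization step above can be sketched as follows. This is only an illustration: the slides do not say which summary statistic was used, so the per-characteristic mean is an assumption, and the ratings shown are hypothetical.

```python
from statistics import mean

# The seven semantic characteristics named in the slides.
CHARACTERISTICS = ["lobulation", "malignancy", "margin", "sphericity",
                   "spiculation", "subtlety", "texture"]

def summarize_ratings(ratings):
    """Collapse the rating vectors of 1 to 4 radiologists into one vector.

    ASSUMPTION: the per-characteristic mean is used. The slides do not
    state which summary statistic was actually applied.
    """
    if not 1 <= len(ratings) <= 4:
        raise ValueError("expected ratings from 1 to 4 radiologists")
    if any(len(r) != len(CHARACTERISTICS) for r in ratings):
        raise ValueError("each rating vector needs 7 values")
    # zip(*ratings) groups the values characteristic by characteristic.
    return [mean(column) for column in zip(*ratings)]

# Hypothetical ratings from three radiologists for a single nodule.
example_ratings = [
    [5, 3, 5, 5, 5, 4, 5],
    [5, 3, 4, 5, 5, 5, 5],
    [5, 3, 5, 5, 5, 3, 5],
]
summary = summarize_ratings(example_ratings)
```

A median or mode would fit the same interface if the ordinal nature of the ratings matters more than their average.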
8Methodology
9Methodology: Simple Distance Metrics
Semantic-Based Similarity
Content-Based Similarity
10Simple Distance Metrics
- Content-Based Similarity Values (Euclidean)
- Semantic-Based Similarity Values (1 - Cosine)
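The two metrics named on this slide can be sketched as below. The function names are illustrative, and any feature normalization applied before computing the distances is not described in the slides, so none is assumed here.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two content-based feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """1 - cosine similarity between two semantic rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)
```

For both functions, a value near 0 means the two nodules are close under that metric.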
11Methodology: Linear Regression
12Methodology: Principal Component Analysis
Correlation matrix of the semantic-based characteristics:

| | Lobulation | Malignancy | Margin | Sphericity | Spiculation | Subtlety | Texture |
|---|---|---|---|---|---|---|---|
| Lobulation | 1.000 | .199 | .085 | -.008 | .815 | .065 | .101 |
| Malignancy | .199 | 1.000 | .346 | .187 | .155 | .594 | .351 |
| Margin | .085 | .346 | 1.000 | .391 | .109 | .533 | .717 |
| Sphericity | -.008 | .187 | .391 | 1.000 | .078 | .300 | .230 |
| Spiculation | .815 | .155 | .109 | .078 | 1.000 | .156 | .146 |
| Subtlety | .065 | .594 | .533 | .300 | .156 | 1.000 | .523 |
| Texture | .101 | .351 | .717 | .230 | .146 | .523 | 1.000 |

- Content-Based Features
  - 77 pairs with a correlation > 0.9
  - 136 pairs with a correlation > 0.8 or < -0.8
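Counting strongly correlated feature pairs, as in the bullets above, might look like the following sketch. It treats each feature as a column of values across nodules and counts pairs by the absolute value of Pearson's correlation; the helper names are illustrative and not taken from the slides.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)

def count_correlated_pairs(columns, threshold):
    """Count feature pairs whose |correlation| exceeds the threshold."""
    return sum(1 for a, b in combinations(columns, 2)
               if abs(pearson(a, b)) > threshold)
```

With 64 content-based features there are 2,016 distinct pairs to check, so the counts on the slide cover only a small share of them.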
13Scree Plots: 5-9 Matches
14Methodology: Principal Component Analysis
- PCA on content-based features
  - accounts for 99% of the variance
  - 23 components
- PCA on semantic-based characteristics
  - Method 1
    - accounts for 92% of the variance
    - 4 components
  - Method 2
    - accounts for 98% of the variance
    - 6 components
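The component counts above come from choosing how many principal components are needed to reach a target share of the variance. A minimal sketch of that selection rule, assuming the eigenvalues of the covariance (or correlation) matrix are already available, is given below; the eigenvalues shown are hypothetical and are not the ones from this study.

```python
def components_needed(eigenvalues, target):
    """Smallest number of components whose cumulative share of the
    total variance reaches `target` (a fraction between 0 and 1)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(ordered)

# Hypothetical eigenvalues for seven variables (illustration only).
eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.3, 0.15, 0.05]
k_90 = components_needed(eigenvalues, 0.90)
k_99 = components_needed(eigenvalues, 0.99)
```

The scree plot serves the same purpose visually: the cut-off is placed where additional components stop adding much variance.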
16Results: Simple Distance Metrics

| Matches | Gabor | Markov | Co-Occurrence | Gabor, Markov, and Co-Occurrence | All Features |
|---|---|---|---|---|---|
| 6-10 | 24 | 18 | 31 | 36 | 43 |
| 2-5 | 107 | 104 | 94 | 98 | 93 |
| 0-1 | 18 | 27 | 24 | 15 | 13 |
17Matches: Nodule 117
18Simple Distance Metrics
19 5-9 Matches: PCA and Linear Regression
20Results: Linear Regression

| Data Set | No. of Nodule Pairs (2/3 Set) | Correlation: Euclidean vs. Semantic | R² | Adj. R² | Feature Set | Distance |
|---|---|---|---|---|---|---|
| 6-9 Matches | 166 | -0.016 | 0.948 | 0.871 | 2 | - |
| 6-9 Matches | 166 | -0.016 | 0.802 | 0.679 | 1 | dist3 |
| 5-9 Matches | 218 | -0.006 | 0.927 | 0.850 | 2 | - |
| 5-9 Matches | 218 | -0.006 | 0.733 | 0.624 | 1 | dist3 |
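For readers unfamiliar with the two fit statistics in this table, the sketch below uses the standard textbook definitions of R² and adjusted R². Whether the authors' software used exactly these formulas, and how many predictors each feature set contained, are not stated in the slides.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R² for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)
```

The gap between R² and adjusted R² grows as the number of predictors approaches the number of nodule pairs, which is why both columns are worth reading together on a data set this small.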
21Results: Linear Regression

| Data Set | No. of Nodule Pairs (1/3 Set) | Correlation: Euclidean vs. Semantic | RMSD: Euclidean | Correlation: Predicted vs. Semantic | RMSD: Predicted | Features |
|---|---|---|---|---|---|---|
| 6-9 Matches | 85 | -0.023 | 0.2328 | 0.710 | 0.0242 | 128 |
| 6-9 Matches | 85 | -0.023 | 0.2328 | 0.748 | 0.0181 | 64 |
| 5-9 Matches | 108 | -0.039 | 0.1985 | 0.829 | 0.0136 | 128 |
| 5-9 Matches | 108 | -0.039 | 0.1985 | 0.733 | 0.0155 | 64 |
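The two evaluation measures reported for the held-out third, correlation and root-mean-square deviation (RMSD), can be computed as in this sketch. The function names are illustrative; the inputs would be the semantic similarity values and either the Euclidean or the regression-predicted values for the same nodule pairs.

```python
import math

def pearson(x, y):
    """Pearson correlation between two lists of similarity values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rmsd(x, y):
    """Root-mean-square deviation between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))
```

A high correlation with a low RMSD means the predicted values both rank the pairs like the semantic values and sit close to them numerically.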
22Results: Linear Regression
23Results: Linear Regression
24Results: PCA

| Data Set | No. of Nodule Pairs (1/3 Set) | Correlation: Euclidean vs. Semantic | RMSD: Euclidean | Correlation: Predicted vs. Semantic | RMSD: Predicted | Features |
|---|---|---|---|---|---|---|
| 6-9 Matches | 85 | -0.115 | 0.3043 | 0.787 | 0.0061 | 128 |
| 6-9 Matches | 85 | -0.115 | 0.3043 | 0.393 | 0.0114 | 64 |
| 5-9 Matches | 108 | -0.094 | 0.2664 | 0.570 | 0.0096 | 128 |
| 5-9 Matches | 108 | -0.094 | 0.2664 | 0.136 | 0.0112 | 64 |
25Results: PCA
26Results: PCA
27RMSD Percent of Range

| Data Set | Features | Linear Regression, No PCA: Euclidean | Linear Regression, No PCA: Predicted | Linear Regression, PCA: Euclidean | Linear Regression, PCA: Predicted |
|---|---|---|---|---|---|
| 6-9 Matches | 128 | 23.3 | 17.3 | 30.4 | 6.7 |
| 6-9 Matches | 64 | 23.3 | 12.9 | 30.4 | 12.5 |
| 5-9 Matches | 128 | 19.9 | 9.7 | 26.6 | 10.1 |
| 5-9 Matches | 64 | 19.9 | 11.1 | 26.6 | 11.8 |
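The percentages in this table are presumably the RMSD expressed relative to the range of the values being compared. The formula below is the usual definition; the symbols v_max and v_min are introduced here for illustration, and the slides do not state which ranges were used.

```latex
\[
\text{RMSD}_{\%} \;=\; 100 \times \frac{\text{RMSD}}{v_{\max} - v_{\min}}
\]
```

For the Euclidean columns the reported figures equal 100 times the RMSD from the earlier results tables (for example, 0.2328 gives 23.3), which is consistent with a range of 1. The predicted columns are larger than 100 times their RMSD, so they imply a narrower range.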
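28Example: Nodule 37 and Nodule 38

[Images: Nodule 38 and Nodule 37]

| Euclidean Similarity Value | PCA Similarity Value |
|---|---|
| 0.549066 | 0.004379 |

| Nodule Number | Lobulation | Malignancy | Margin | Sphericity | Spiculation | Subtlety | Texture |
|---|---|---|---|---|---|---|---|
| 37 | 5 | 3 | 5 | 5 | 5 | 4 | 5 |
| 38 | 5 | 3 | 5 | 5 | 5 | 5 | 5 |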
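As a worked check on this example, the sketch below computes 1 - cosine directly on the seven tabulated ratings. It assumes the semantic distance is taken on exactly these rating values, which the slide does not confirm.

```python
import math

# Ratings from the table above, in the order: lobulation, malignancy,
# margin, sphericity, spiculation, subtlety, texture.
nodule_37 = [5, 3, 5, 5, 5, 4, 5]
nodule_38 = [5, 3, 5, 5, 5, 5, 5]

dot = sum(a * b for a, b in zip(nodule_37, nodule_38))   # 154
norm_37 = math.sqrt(sum(a * a for a in nodule_37))       # sqrt(150)
norm_38 = math.sqrt(sum(b * b for b in nodule_38))       # sqrt(159)

# 1 - cosine similarity of the two rating vectors.
semantic_distance = 1.0 - dot / (norm_37 * norm_38)
```

The result is approximately 0.0028: the two nodules differ only in subtlety, so they are semantically almost identical. That is much closer to the PCA similarity value of 0.004379 than to the Euclidean value of 0.549066.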
29Future Work
- Perform the analysis only on nodules on which all three radiologists agree
- To address the small size of the data set, perform the analysis using a leave-one-out technique (instead of 2/3 training and 1/3 testing)
- Incorporate relevance feedback into the system
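The leave-one-out idea in the second bullet can be sketched as follows. To stay short, the example fits a single-predictor least-squares line; this is a stand-in for the multi-feature regression used in the study, and the function names are illustrative.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def leave_one_out_rmsd(xs, ys):
    """Hold out each sample once, train on the rest, and return the
    RMSD of the held-out prediction errors."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        errors.append(ys[i] - (intercept + slope * xs[i]))
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Every nodule pair is used for testing exactly once, so no data is set aside permanently, at the cost of refitting the model once per pair.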
30Questions?