NSF REU Program in Medical Informatics
D. Raicu (1), J. Furst (1), D. Channin (2), S. Armato (3), and K. Suzuki (3)
(1) DePaul University, (2) Northwestern University, and (3) University of Chicago
REU Data
  • Overview
      • Goal: continue promoting interdisciplinary
        studies at the frontier between information
        technology and medicine to undergraduate
        students, especially students from groups
        historically underrepresented in the exact
        sciences
      • Duration: 10 weeks over the summer
  • Example Teaching
      • Interdisciplinary tutorials: image processing,
        machine learning
      • Technology tools tutorials: MATLAB, SPSS
      • Presentations by mentors about projects
  • Example Activities
      • Follow-on activities: bi-weekly group meetings,
        presentations to the entire MedIX group, final
        reports (in conference formats), seminars to
        support student publication
      • Special events: Day in the life of a PhD
        student, Developing a research career, Women in
        science, tours of medical facilities, etc.
  • Unique Site Aspect
      • Multi-institution, multi-disciplinary site at
        the frontier between computer science and
        medicine
  • Statistics (2005-2007)
      • Student demographics: 8 per year
      • Female: 46%; first-generation college: 15%;
        outside of home institutions: 73%
      • Students who had previously given a visual
        (poster) research presentation (31%) or an oral
        research presentation (27%), (co-)authored a
        publication in an academic journal (12%), or
        been involved in any research project in the
        previous two years (42%)
      • Total number of faculty mentors: 4
      • Years of operation: 2005 to 2010
      • Example research topics: see the left side
  • Outcomes (2005-2007)
      • 88% of students had at least one research
        publication
      • Over 23 publications (1 journal paper, 15
        conference papers, 8 extended abstracts)
      • 3 honors theses and senior projects, 4 graduate
        fellowships, and 1 CRA honorable mention for
        outstanding undergraduate research

Content-based versus Semantic-based Similarity
Retrieval: A LIDC Case Study
Sarah Jabon (a), Jacob Furst (b), Daniela Raicu (b)
(a) Rose-Hulman Institute of Technology, Terre
Haute, IN 47803; (b) Intelligent Multimedia
Processing Laboratory, School of Computer Science,
Telecommunications, and Information Systems,
DePaul University, Chicago, IL, USA, 60604
Introduction
Calculating Similarity
Similarity Comparisons
This work thoroughly investigates ways to
predict the results of a semantic-based image
retrieval system by using solely content-based
image features. We extend our previous work [1] by
studying the relationships between the two types
of retrieval, content-based and semantic-based,
with the final goal of integrating them into a
system that will take advantage of both retrieval
approaches. Our results on the Lung Image
Database Consortium (LIDC) dataset show that a
substantial number of nodules identified as
similar based on image features are also
identified as similar based on semantic
characteristics. Furthermore, by integrating the
two types of features, the similarity retrieval
improves with respect to certain nodule
characteristics.
To assess the correlation between the two
similarity measures, we used a round-robin
approach in which we extracted one nodule as a
query and compared it to the remaining 148
nodules. We took the k most similar nodules from
each query's semantic-based similarity ordered
list and content-based similarity ordered list and
counted how many nodules were common to both lists.
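The round-robin comparison can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names and the toy 4-nodule distance matrices are hypothetical. It also assumes that smaller values mean "more similar", which matches the example where the query nodule has a value of 0.

```python
def top_k_neighbors(dist_row, query, k):
    """Return the indices of the k most similar nodules to `query`.

    `dist_row` holds the query's distance to every nodule; the query
    itself is skipped. Smaller distance means more similar (assumption).
    """
    order = sorted(range(len(dist_row)), key=lambda j: dist_row[j])
    return [j for j in order if j != query][:k]


def match_counts(semantic_dist, content_dist, k):
    """For each query nodule, count nodules common to both top-k lists."""
    counts = []
    for q in range(len(semantic_dist)):
        semantic_top = set(top_k_neighbors(semantic_dist[q], q, k))
        content_top = set(top_k_neighbors(content_dist[q], q, k))
        counts.append(len(semantic_top & content_top))
    return counts


# Toy symmetric distance matrices for 4 nodules (hypothetical values,
# not LIDC data).
semantic = [
    [0.0, 0.1, 0.2, 0.3],
    [0.1, 0.0, 0.4, 0.5],
    [0.2, 0.4, 0.0, 0.6],
    [0.3, 0.5, 0.6, 0.0],
]
content = [
    [0.0, 0.15, 0.2, 0.3],
    [0.15, 0.0, 0.4, 0.5],
    [0.2, 0.4, 0.0, 0.1],
    [0.3, 0.5, 0.1, 0.0],
]

counts = match_counts(semantic, content, k=2)
print(counts)  # [2, 2, 1, 1]
```

With the 149 LIDC nodules, each row would have 149 entries and k would be the list length under study (for example, the twenty most similar nodules).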
Here is an example with nodule 117 as the query
nodule. Below are the most similar nodules listed
with their attributes. Notice that the semantic
similarity values have a much smaller range, from
0 to about 0.3, whereas the content-based
similarities range from 0 to 1. Most of the
semantic features are very similar. A ranking of i
signifies that the nodule was the ith most similar
nodule in the list of similar nodules based on the
appropriate feature set.
Semantic-Based Features
Content-Based Features
The cosine similarity measure minimized the
ceiling effect. The similarity value calculation
using the cosine formula is shown below. The
histogram to the right is of the semantic-based
similarity values for all 11,026 nodule pairs.
Although the values do not follow a perfect
normal curve, the ceiling effect was drastically
reduced compared with computing a simple distance
on the seven characteristics.
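The cosine formula itself did not survive in this transcript. The sketch below assumes the reported value is one minus the standard cosine similarity of the two seven-element rating vectors. Under that assumption it reproduces the values listed for nodules 104 and 126 in the nodule 117 example. The function name is hypothetical.

```python
import math


def cosine_distance(u, v):
    """One minus the cosine of the angle between rating vectors u and v.

    Assumption: this is the form of the semantic similarity value, so
    identical vectors give 0 and smaller means more similar.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Semantic feature vectors from the example table, in the order
# Lob, Mal, Mar, Sph, Spic, Sub, Tex.
nodule_117 = [2, 3, 5, 5, 2, 4, 5]
nodule_104 = [2, 3, 4, 4, 2, 3, 4]
nodule_126 = [2, 3, 5, 5, 1, 4, 5]

d_104 = cosine_distance(nodule_117, nodule_104)
d_126 = cosine_distance(nodule_117, nodule_126)
print(f"{d_104:.6f}")  # 0.004452
print(f"{d_126:.6f}")  # 0.004596
```

Because the measure depends on the angle between the vectors rather than on their raw differences, ratings confined to a one-to-five scale still give well-spread values.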
No.   Semantic-Based            Content-Based             Semantic Feature Vector
      Ranking  Similarity Value Ranking  Similarity Value Lob  Mal  Mar  Sph  Spic  Sub  Tex
117   -        0                -        0                2    3    5    5    2     4    5
104   2        0.004452         5        0.415918         2    3    4    4    2     3    4
126   3        0.004596         6        0.421249         2    3    5    5    1     4    5
98    6        0.006817         17       0.505317         2    3    4    5    2     3    5
28    8        0.009119         16       0.504996         1    3    5    5    1     4    5
27    11       0.012752         2        0.380517         1    3    5    5    1     3    5
137   14       0.013072         9        0.430289         1    3    4    5    1     3    4
127   16       0.013606         11       0.474226         2    4    5    4    3     4    5
119   17       0.015268         20       0.538589         3    3    4    4    2     3    5
90    20       0.016383         7        0.425751         1    2    3    4    1     2    4
Methodology


Figure 3: Histogram of Semantic-Based Similarity
At right is a histogram of the content-based
similarity values for all 11,026 nodule pairs.
The similarity values are calculated with the
Euclidean distance, which is defined below, and
then min-max normalization is applied [4]. At
the end of the feature extraction process, each
nodule is represented by a vector as shown below,
where c stands for a semantic concept and f for
an image feature.
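The content-based calculation can be illustrated as follows. This is a sketch under stated assumptions: the Euclidean distance is computed for every unordered pair of feature vectors, and min-max normalization is then applied to the resulting distances. The transcript does not say whether normalization was applied to distances or to individual features. The function names and the toy two-feature vectors are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def normalized_pairwise_distances(features):
    """Euclidean distance for every unordered pair, min-max scaled to [0, 1].

    Assumption: normalization is applied to the pairwise distances.
    """
    raw = {
        (i, j): euclidean(features[i], features[j])
        for i, j in combinations(range(len(features)), 2)
    }
    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    return {pair: ((d - lo) / span if span else 0.0) for pair, d in raw.items()}


# Toy vectors with two image features each (hypothetical values).
features = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dists = normalized_pairwise_distances(features)
print(dists)  # {(0, 1): 0.0, (0, 2): 1.0, (1, 2): 0.0}

# 149 nodules give 149 * 148 / 2 unordered pairs.
n_pairs = len(list(combinations(range(149), 2)))
print(n_pairs)  # 11026
```

The pair count agrees with the 11,026 nodule pairs shown in the histograms.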
Figure 5: Example of Image Retrieval Results
Figure 4: Histogram of Content-Based Similarity
Analysis
Using k Number of Matches: The number of nodules
that had 2-5 matches was relatively consistent
across all image features, but slightly higher
for Gabor and Markov. No combination of image
features had more than 10 matches out of the
twenty most similar. Below is a scatter plot of
the content-based similarity versus the
semantic-based similarity value.
Applying a Threshold: We analyzed the difference
in the scales of similarity by counting how many
matches there were based on thresholds. Below is
a graph of two different thresholds of
similarity: 0.02 and 0.04. These thresholds are
applied to the semantic similarity values. There
were many more matches within these thresholds.
Figure 1: Methodology
Image Data
Matches  Gabor  Markov  Co-Occurrence  Gabor, Markov, and Co-Occurrence  All Features
6-10     24     18      31             36                                43
2-5      107    104     94             98                                93
0-1      18     27      24             15                                13
  • The LIDC contains complete thoracic CT scans for
    85 patients with lesions. Nodules with a diameter
    larger than three millimeters were rated by a
    panel of four radiologists [2].
  • They rated nine characteristics of the masses
    that they considered nodules. Seven of those
    characteristics, all rated on a scale of one to
    five, are useful to our analysis:
      • Lobulation, Malignancy, Margin, Sphericity,
        Spiculation, Subtlety, and Texture
  • For each image, we calculated 64 different
    content-based features [1]:
      • Shape features: circularity, roughness,
        elongation, compactness, eccentricity,
        solidity, extent, and standard deviation of
        radial distance
      • Size features: area, convex area, perimeter,
        convex perimeter, equivalence diameter, major
        axis length, and minor axis length

Figure 6: Match Count in 20 Most Similar Nodules
Rad.        Lob.  Mal.  Marg.  Spher.  Spic.  Subt.  Text.
A           3     4     4      2       4      3      4
B           4     3     4      4       3      5      5
C           4     2     3      4       3      4      5
D           4     3     2      2       4      3      3
Summarized  4     3     4      3       3      3      5
Figure 6: Match Based on All Features and Thresholds
Conclusions
Our preliminary results show that a substantial
number of nodules identified as similar based on
image features are also identified as similar
based on semantic characteristics, which suggests
that the image features capture properties that
radiologists look at when interpreting lung
nodules. There are many similarity metrics that
can be used to try to correlate the two retrieval
systems. We found the Euclidean distance to be
better for the content-based features and the
cosine similarity measure to be best for the
semantic-based characteristics. In our future
work, we will try principal component analysis
and linear regression on the data. Further
research is necessary to investigate the
correlations between the two types of features
and to integrate them into one retrieval system
that will be of clinical use.
Figure 2: Sample CT Scan with Four Radiologists'
Ratings
Figure 7: Content-Based Similarity vs.
Semantic-Based Similarity
References
[1] Lam, M., Disney, T., Pham, M., Raicu, D.,
Furst, J., "Content-Based Image Retrieval for
Pulmonary Computed Tomography Nodule Images," SPIE
Medical Imaging Conference, San Diego, CA,
February 2007.
[2] The National Cancer Institute, "Lung Imaging
Database Consortium (LIDC),"
http://imaging.cancer.gov/programsandresources/InformationSystems/LIDC.
[3] Li, Q., Li, F., Shiraishi, J., Katsuragawa,
S., Sone, S., Doi, K., "Investigation of New
Psychophysical Measures for Evaluation of Similar
Images on Thoracic Computed Tomography for
Distinction between Benign and Malignant
Nodules," Medical Physics 30: 2584-2593, 2003.
[4] Han, J., Kamber, M., Data Mining: Concepts and
Techniques, London: Academic Press, 2001.