Towards the Goal of Searchable Clinical Image Repositories


1
Towards the Goal of Searchable Clinical Image
Repositories

Ulysses J. Balis, M.D.
Director of Clinical Informatics
Co-Director, Division of Pathology Informatics
Department of Pathology
University of Michigan
ulysses_at_umich.edu

2
Learning Objectives

Overview of the salient history of the digital
imaging technology underlying histopathology
repositories
Recognize digital representation of images as the
key transformative element enabling Digital
Microscopy
Familiarity with the topic of dimensional
reduction and its utility in reducing the search
complexity associated with large repositories
Familiarity with some candidate algorithmic /
heuristic approaches to image search /
content-based image retrieval (CBIR)
Followed by representative real-time
demonstrations of clinically-relevant CBIR

6
Some Observations

Moore's Law is equally applicable to Pathology
Informatics as it is to semiconductor scaling
We (Pathologists) must operate as the manifest
stewards of our own data (or be rendered as moot,
in the overall enterprise IT equation)
New modalities are heavily dependent on IT
understanding and support
High-throughput molecular testing platforms
All-digital signout model made possible by whole
slide imaging
Shift from qualitative approaches to quantitative
ones, as we shift from clinical to pre-clinical
diagnostic arenas (e.g. whole-slide analysis and
multispectral analysis)
Unsustainability of the current trajectory of
monolithic EHR architectures.

8
The CCD: the fundamental transformative
technology enabling creation of wide-field
datasets
10
Digital Representation of Images as the Key
Transformative Element Enabling Digital Microscopy

Without the image data in digital format, there
is no cogent question that can be asked, as there
is no dataset available to query.
With the advent of increasingly comprehensive
digital image repositories, we encounter an
entirely different situation, essentially an
embarrassment of riches, as we now have more data
than is easily parsed by conventional linear
programming.
This is a transformative enabling step,
nonetheless.
As a confirming reality check: Radiology has
already firmly entered the realm of investigation
of computer-aided diagnosis (CAD), although it is
cogent to recognize that their current datasets
are much smaller than those now possible with
digital whole-slide imaging.
And as such, the question becomes one of
algorithmic and heuristic development.
Hint: we know this is possible, as the human
brain carries out real-time CBIR with high
sensitivity and specificity.
Caveat: recognizing that the human brain is
massively parallel in construction,
recapitulating this with current computational
technology may be impractical.

11
Some Observations Concerning Slide Data Density

Characteristics
2.5 by 7.5 cm
1/3 used for label
2.5 x 5.0 cm for tissue display
Typical light microscopy is diffraction-limited
to 0.25 microns
Yields an effective required pixel count of 100K
by 200K pixels (2.3 Gb), or a 20K MPixel image
This is the same thing as saying that one would
need to capture 20,000 images with a 1 MPixel
camera to obtain a single slide
Linear programming on datasets of this size is
costly, in terms of time and storage.

Slide: 2.5 cm x 7.5 cm; tissue area: 2.5 cm x 5 cm
(1000 x 25) microns / 0.25 microns = 100,000 linear pixels
(1000 x 50) microns / 0.25 microns = 200,000 linear pixels
This is a 20 GPixel image, vs. a relatively
insignificant 4 MPixel image
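
The slide-density arithmetic can be checked with a short sketch. It assumes, as the slide does, a 2.5 cm x 5.0 cm tissue area sampled at one pixel per 0.25 micron; the function and variable names are illustrative only.

```python
# Worked arithmetic for the slide-density figures.
# Assumptions: tissue area 2.5 cm x 5.0 cm, one pixel per 0.25 micron.

MICRONS_PER_CM = 10_000


def pixels_along(length_cm: float, pixel_size_um: float) -> int:
    """Number of pixels needed to span a length at a given pixel size."""
    return round(length_cm * MICRONS_PER_CM / pixel_size_um)


width_px = pixels_along(2.5, 0.25)   # short edge of the tissue area
height_px = pixels_along(5.0, 0.25)  # long edge of the tissue area
total_px = width_px * height_px      # whole-slide pixel count

print(width_px, height_px)
print(total_px / 1e9, "GPixel")
print(total_px // 1_000_000, "fields from a 1 MPixel camera")
```

Storage cost then depends on bytes per pixel and on compression, neither of which is stated on the slide.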
12
Compelling Use Cases for Image Query

Diagnostic decision support
Longitudinal evaluation
Differential diagnosis generation
Detection of rare events
Teaching
Discovery

13
Current World View of Pathology Imagery
Repositories

Model 1: Relational Database
Image Metadata associated with case-level data
Entire Schema required to carry out discovery
Text-based
Image data is a passive component of the query

Model 2: Metadata-tagged Images
Image Metadata associated with each image
Image becomes a self-contained dataset available
for discovery
Text-based
Image data is a passive component of the query

Entry in master accession table
Associated case and image descriptors
14
Highly Desirable World View of Pathology Imagery
Repositories (Future State)

Model 3: Metadata-tagged surface map
Image Metadata exists at the image level and is
spatially coupled to underlying digital imagery
Discovery can be carried out on the image-space
itself, with retrieved metadata classifiers
available for generating search result sets (e.g.
differential diagnosis generation)
Image-based

Model 4: Surface discovery
Non-metadata-associated digital imagery is
spatially probed for statistical convergence with
an image-based query set
Imagery becomes a self-contained dataset
available for discovery
Image-based

15
Lop Nor
Vector quantization: a forgotten algorithm.
16
Attributes of an ideal search system

Self-training, domain-independent image
segmentation / classification tool.
Allows for at least two novel image search
modalities
Region of interest: query by example (image-space
search, not text-based)
Retrieve diagnostic information associated with
previously classified fields, enabling
dynamically generated differential diagnoses
Useful as a bridge for exploration of stochastics
of multi-dimensional image space data when
queried in tandem with high-dimensionality data
set types (genomics, proteomics, etc.),
i.e. Morphogenomics
Ability to carry out real-time assessment of
regions of interest against Terascale / Petascale
image repositories.

17
On the prospect of analyzing 1000s of Gigabytes
of data in real-time
18
What is an Image Vector?
2 x 2 vector: 256^4 = 4,294,967,296 possible values in a
four-dimensional space
Typically, vectors have ordinality of 8 x 8 or
greater
1.415461031044954789001553027745e9864
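
To make the size of the vector state space concrete, the sketch below counts the distinct values a square kernel can take. It assumes each component is a single 8-bit sample (256 levels); the helper names are hypothetical. The very large figure on the slide appears consistent with 256^4096, that is, a 64 x 64 kernel, although the slide does not say so explicitly.

```python
import math

# Assumption: each vector component is one 8-bit sample (256 levels).
LEVELS = 256


def state_space(side: int) -> int:
    """Number of distinct side x side kernels with 256 levels per pixel."""
    return LEVELS ** (side * side)


def scientific(n: int, digits: int = 5) -> str:
    """Format a huge integer as m.mmmmmeN using its base-10 logarithm."""
    log = math.log10(n)  # math.log10 accepts arbitrarily large integers
    exponent = math.floor(log)
    mantissa = 10 ** (log - exponent)
    return f"{mantissa:.{digits}f}e{exponent}"


two_by_two = state_space(2)
print(two_by_two)  # 256 ** 4

for side in (8, 64):
    print(f"{side} x {side}:", scientific(state_space(side)))
```

The logarithm is used for formatting because recent Python versions limit direct conversion of very long integers to decimal strings.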
19
General Approaches to Image Analysis

Supervised Learning
Algorithm interacts with expert or another
training data source such that features of
interest are actively selected and classified
during the training stage
Time consuming
Potential to converge to a solution with smaller
training sets
Variable robustness of predictive power when
convergence is detected

Unsupervised Learning
Algorithm parses data autonomously, without
user/expert intervention
Faster/ suitable for turnkey automation
Slower convergence (if ever) on a solution set.
Need for higher-dimensional systems
Statistically robust when convergence is
identified

20
General Approaches to Image Analysis

Conventional Image Analysis
Algorithms based upon spatial-, frequency- or
phase-space data present in the image
Length-scale hypothesis in effect: structural
elements are usually the target
Often requires manual length-scale and magnitude
scale optimization to enhance detection accuracy
Some expertise in algorithm operation desirable

Unstructured Classification
Classification of vectors in high-dimensional
space based upon all-comers hypothesis
No tuning required
No expertise required
Approach leads to a plurality of classifiers for
every atomic spatial element, which must then be
annealed to a superclass. (this can require
manual vector sorting)

21
Candidate Algorithmic / Heuristic Approaches to
Image Search / Content-based Image Retrieval
(CBIR)

Principal component analysis (PCA)
Bayesian Belief Networks
Support Vector Engines coupled to multi-parametric
conventional image analysis
Dimensional reduction via manifold projection
techniques, where high-dimensional distinctions
of statistical significance are preserved in the
low-dimensional projection.
Vector Quantization
Galois Field Manifold Basis operators as an
inductive extrapolative technique of probable
(but unspecified) adjacency characteristics of
low dimensional candidate manifolds (manifold
extrapolation)
Many others.
All of the above approaches have strengths and
weaknesses; there is currently no one best
solution.
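
As one illustration of the first item in this list, the following is a minimal sketch of principal component analysis used for dimensional reduction. It assumes NumPy is available and uses randomly generated stand-in data rather than real image kernels; `pca_project` is a hypothetical helper, not part of any toolset described in this talk.

```python
import numpy as np


def pca_project(vectors: np.ndarray, n_components: int) -> np.ndarray:
    """Project row vectors onto their top principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
# Stand-in data: 200 flattened 8 x 8 kernels (64 dimensions each).
kernels = rng.integers(0, 256, size=(200, 64)).astype(float)

reduced = pca_project(kernels, n_components=5)
print(kernels.shape, "->", reduced.shape)
```

Searching in the reduced space is cheaper, at the cost of discarding whatever variation lies outside the retained components.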

22
An Issue of Dimensional Reduction

Problem: With the prospect of a typical 100 x 100
kernel (a 10,000-dimensional space), computational
approaches carried out on raw data sets can take
millions of years to complete, even with our
fastest current supercomputers (bad for
turn-around time).
Fortunately, there are mathematical operations
that can sidestep this computational annoyance.
Support Vector Engines
K-means approaches
Bayesian Networks
Vector Quantization
Galois Field Manifold Projection / Tensor
Integration
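
Of the approaches listed, k-means is the simplest to sketch. The version below is plain Lloyd's algorithm with caller-supplied starting centroids, chosen here so the result is deterministic; the sample points are invented for illustration. Replacing many raw vectors with a handful of centroids is one way such methods shrink the search problem.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_means(points: Sequence[Point], centroids: Sequence[Point],
            iterations: int = 10) -> List[Point]:
    """Lloyd's algorithm starting from caller-supplied centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return centroids


# Invented sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
centers = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
```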

23
Pythagorean Theorem
Figure labels: sides a, b; squares 3 x 3, 4 x 4, 5 x 5
On all PCs and high-end workstations (and most
Macs), 9 + 16 does indeed result in 25
24
Vector Quantization
Original Image
Division of image into local domains
Extraction of Local Domain Composite Vectors
?
VKSLx0y0Order , LxnymOrder
Vectorization of each local kernel
Individual assessment of each vector dimension
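
The first steps on this slide (division into local domains, then vectorization of each local kernel) might look like the sketch below. It assumes a grayscale image stored as nested lists, non-overlapping square domains, and row-major ordering within each domain; none of these choices is specified on the slide.

```python
from typing import List, Tuple

Image = List[List[int]]   # rows of 8-bit grayscale samples (assumed layout)
Vector = Tuple[int, ...]


def local_domain_vectors(image: Image, k: int) -> List[List[Vector]]:
    """Split an image into non-overlapping k x k domains, one flat vector each.

    Trailing rows or columns that do not fill a whole domain are ignored.
    """
    rows, cols = len(image), len(image[0])
    grid = []
    for top in range(0, rows - k + 1, k):
        grid_row = []
        for left in range(0, cols - k + 1, k):
            vec = tuple(
                image[top + dy][left + dx]
                for dy in range(k)
                for dx in range(k)
            )
            grid_row.append(vec)
        grid.append(grid_row)
    return grid


# Invented 4 x 4 image split into 2 x 2 domains.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
vectors = local_domain_vectors(image, k=2)
print(vectors)
```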
25
Vector Quantization
VKSLx0y0Order , LxnymOrder
Established Vocabulary
Query Against library (Vocabulary) of established
Galois Vectors
Novel Vector
Previously Identified Vector
Assignment of a unique serial number and
inclusion into global vocabulary
Assembly of compressed dataset
38857448643
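
The vocabulary step can be sketched as a dictionary lookup: a previously identified vector reuses its serial number, and a novel vector is assigned the next one. This sketch assumes exact-match lookup and sequential serial numbers starting at zero; a practical system would more likely match on similarity, and the slide does not say how serial numbers are issued.

```python
from typing import Dict, List, Tuple

Vector = Tuple[int, ...]


def encode(grid: List[List[Vector]],
           vocabulary: Dict[Vector, int]) -> List[List[int]]:
    """Replace each vector with its serial number, growing the vocabulary.

    Matching is by exact equality (an assumption made for this sketch).
    """
    compressed = []
    for row in grid:
        out_row = []
        for vec in row:
            if vec not in vocabulary:
                # Novel vector: assign the next unused serial number.
                vocabulary[vec] = len(vocabulary)
            out_row.append(vocabulary[vec])
        compressed.append(out_row)
    return compressed


vocabulary: Dict[Vector, int] = {}
# Invented grid of local-domain vectors, laid out as in the image.
grid = [
    [(10, 10, 10, 10), (200, 200, 200, 200)],
    [(10, 10, 10, 10), (10, 10, 10, 10)],
]
compressed = encode(grid, vocabulary)
print(compressed)       # serial numbers in the original spatial layout
print(len(vocabulary))  # distinct vectors seen so far
```

Because the vocabulary is passed in and mutated, encoding further images against the same dictionary extends one shared global vocabulary.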
26
VQ-Based Image Compression: a fantastic
opportunity for automated search
Raw Data
Restored Data
Compressed data (preserved spatial organization
of original data)
Depending on the selected compression ratio,
restored lossy-compression imagery may or may not
be of diagnostic quality.
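
The trade-off between compression ratio and restored quality can be illustrated with a toy nearest-codeword quantizer. The codebooks and vectors below are invented, and squared Euclidean distance is an assumed similarity measure; the point is only that a smaller codebook compresses harder and restores less faithfully.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compress(vectors: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Store each vector as the index of its nearest codeword."""
    return [min(range(len(codebook)),
                key=lambda i: squared_distance(v, codebook[i]))
            for v in vectors]


def restore(indices: Sequence[int], codebook: Sequence[Vector]) -> List[Vector]:
    """Rebuild an approximation of the data from stored indices."""
    return [codebook[i] for i in indices]


def mean_squared_error(original: Sequence[Vector],
                       restored: Sequence[Vector]) -> float:
    total = sum(squared_distance(a, b) for a, b in zip(original, restored))
    return total / len(original)


vectors = [(10.0, 10.0), (12.0, 11.0), (200.0, 198.0), (90.0, 95.0)]
small = [(11.0, 10.5), (200.0, 198.0)]   # fewer codewords, higher ratio
large = small + [(90.0, 95.0)]           # more codewords, lower ratio

mse_small = mean_squared_error(vectors, restore(compress(vectors, small), small))
mse_large = mean_squared_error(vectors, restore(compress(vectors, large), large))
print(mse_small, mse_large)
```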
28
Galois Field Theory
29
A Typical Dimensional Reduction Galois Field
Question

What is the mean densitometrically-weighted
distance of a single test vector to a statistical
manifold of established centroids (thus
establishing similarity or difference)?
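
One possible reading of this question is a weighted mean of the distances from the test vector to each established centroid. The sketch below takes that reading; the per-centroid weights are a stand-in for whatever "densitometric" weighting the talk intends, and Euclidean distance is likewise an assumption.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def weighted_mean_distance(test: Vector,
                           centroids: Sequence[Vector],
                           weights: Sequence[float]) -> float:
    """Weighted mean Euclidean distance from a test vector to centroids.

    The weights are hypothetical; they might, for example, reflect how many
    training vectors each centroid represents.
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * math.dist(test, c) for c, w in zip(centroids, weights))
    return weighted / total_weight


centroids = [(0.0, 0.0), (3.0, 4.0)]
weights = [1.0, 3.0]
score = weighted_mean_distance((0.0, 0.0), centroids, weights)
print(score)
```

A small score would indicate similarity to the established set and a large one difference, with any decision threshold left to the application.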

31
What are the boundary conditions?
32
General Form
What is the integral of the Galois Field?
?
33
Which, after integration by parts, yields
34

Initial n by n sub-region of image:

1,1  1,2  ..  1,n
2,1  2,2  ..  2,n
 .    .        .
n,1  n,2  ..  n,n

Resultant Input Vector Kernel of n x n x 3
dimensionality:

1,1  1,2  2,1  ..  n,n

Canonical V.Q. Tensor
For every location: each location is an RGB triplet; hence, each
vector component is itself a triplet sub-vector.
Galois Field Transform
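
The construction of the n x n x 3 input vector can be sketched as a simple flattening of RGB triplets. Row-major ordering is assumed here for simplicity; the ordering shown on the slide may differ, and the subsequent transform is not specified in enough detail to reproduce.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def kernel_vector(region: List[List[RGB]]) -> List[int]:
    """Flatten an n x n region of RGB triplets into a vector of length n*n*3.

    Pixels are visited in row-major order (an assumption of this sketch).
    """
    n = len(region)
    if any(len(row) != n for row in region):
        raise ValueError("region must be square")
    return [channel for row in region for pixel in row for channel in pixel]


# Invented 2 x 2 region: each location is an RGB triplet.
region = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (128, 128, 128)],
]
vector = kernel_vector(region)
print(len(vector), vector)
```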
35
Typical Galois Field mapped to the even
Jacobian/Chebyshev tensor polynomials manifested
on the edge of the complexity transition

On Galois Fields
Not merely a clustering algorithm
The resulting field is a non-linear N-space
manifold selected for its distinctiveness from
all other modular functions in the Galois set
space
Fields may have local minima and local maxima
Any Galois manifold is exclusive of any other
Galois set
Non-trivial to calculate; trivial to query

36
Local Islands in Galois Field Space of
statistical convergence and near-convergence to
high-probability feature matches using support
vector analysis
37
Convergence with increasing Vocabulary Size
38
Regions of a typical Galois manifold with no
correlation to established vocabulary tensors are
easily recognized as exhibiting chaotic behavior
and are therefore excluded.
39
How does this approach differ from traditional
N-space cluster analysis?

Conventional
Algorithms are custom designed for a narrow
recognition task
Often require customization with expert
programming
Low tolerance to variability in source format

VQ-Galois
General matching algorithm agnostic to input data
format
No end-user customization required
Designed to improve with increased data pool size
(self-training)

42
Some Demonstrations
43
Summary

Increasing availability of whole slide digital
data creates at least the possibility to carry
out CBIR for basic clinical tasks
Similar case retrieval
Differential diagnosis generation
Grading /staging decision support
Rare event identification
Much effort is still required to increase the
speed and accuracy of the current generation of
both supervised and unsupervised approaches; for
the time being, these algorithms should be viewed
as investigational use only, unless otherwise
stated.
Initial reports in this field suggest that the
computational challenge can be solved.
Pilot toolsets will be available for
investigative use via the internet, within the
year if not sooner.