Towards the Goal of Searchable Clinical Image Repositories - PowerPoint PPT Presentation

About This Presentation
Title:

Towards the Goal of Searchable Clinical Image Repositories

Slides: 44
Provided by: ulys5
Transcript and Presenter's Notes

Title: Towards the Goal of Searchable Clinical Image Repositories


1
Towards the Goal of Searchable Clinical Image
Repositories
  • Ulysses J. Balis, M.D.
  • Director of Clinical Informatics
  • Co-Director, Division of Pathology Informatics
  • Department of Pathology
  • University of Michigan
  • ulysses_at_umich.edu

2
Learning Objectives
  • Overview of the salient history of the digital
    imagery technology underlying histopathology
    repositories
  • Recognize digital representation of images as the
    key transformative element enabling Digital
    Microscopy
  • Familiarity with the topic of dimensional
    reduction and its utility in reducing the search
    complexity associated with large repositories
  • Familiarity with some candidate algorithmic /
    heuristic approaches to image search /
    content-based image retrieval (CBIR)
  • Followed by representative real-time
    demonstrations of clinically-relevant CBIR

3
(No Transcript)
4
(No Transcript)
5
(No Transcript)
6
Some Observations
  • Moore's Law is equally applicable to Pathology
    Informatics as it is to semiconductor scaling
  • We (Pathologists) must operate as the manifest
    stewards of our own data (or be rendered moot
    in the overall enterprise IT equation)
  • New modalities are heavily dependent on IT
    understanding and support
  • High-throughput molecular testing platforms
  • All-digital signout model made possible by whole
    slide imaging
  • Shift from qualitative approaches to quantitative
    ones, as we shift from clinical to pre-clinical
    diagnostic arenas (e.g. whole-slide analysis and
    multispectral analysis)
  • Unsustainability of the current trajectory of
    monolithic EHR architectures.

8
The CCD: the fundamental transformative
technology enabling creation of wide-field
datasets
9
(No Transcript)
10
Digital Representation of Images as the Key
Transformative Element Enabling Digital Microscopy
  • Without the image data in digital format, there
    is no cogent question that can be asked, as there
    is no dataset available to query.
  • With the advent of increasingly comprehensive
    digital image repositories, we encounter an
    entirely different situation: essentially an
    embarrassment of riches, as we now have more data
    than is easily parsed by conventional linear
    programming.
  • This is a transformative enabling step,
    nonetheless.
  • As a confirming reality check: Radiology has
    already firmly entered the realm of investigation
    of computer-aided diagnosis (CAD), although it is
    worth recognizing that its current datasets
    are much smaller than those now possible with
    digital whole-slide imaging.
  • As such, the question becomes one of
    algorithmic and heuristic development.
  • Hint: we know this is possible, as the human
    brain carries out real-time CBIR with high
    sensitivity and specificity.
  • Caveat: recognizing that the human brain is
    massively parallel in construction,
    recapitulating this with current computational
    technology may be impractical.

11
Some Observations Concerning Slide Data Density
  • Characteristics
  • 2.5 by 7.5 cm
  • 1/3 used for label
  • 2.5 x 5.0 cm for tissue display
  • Typical light microscopy is diffraction-limited
    to 0.25 microns
  • Yields an effective required pixel count of
    100,000 by 200,000 pixels (2.3 Gb), or a
    20,000 MPixel image
  • This is the same thing as saying that one would
    need to capture 20,000 images with a 1 MPixel
    camera to obtain a single slide
  • Linear programming on datasets of this size is
    costly, in terms of time and storage.

(Figure: slide of 2.5 cm x 7.5 cm, with a 2.5 cm x 5 cm tissue area)
(1000 x 25) microns / 0.25 microns = 100,000 linear pixels
(1000 x 50) microns / 0.25 microns = 200,000 linear pixels
This is a 20 GPixel image, vs. a relatively insignificant 4 MPixel
image
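The arithmetic on this slide can be checked with a short sketch. The tissue area and the 0.25 micron sampling interval come from the slide; the 3 bytes per pixel figure is an assumption (uncompressed 24-bit RGB), since the slide does not state a bit depth or compression scheme, and the function name is hypothetical.

```python
# Back-of-the-envelope slide pixel budget, using the figures quoted on the slide.
TISSUE_MM = (25, 50)        # tissue display area: 2.5 cm x 5.0 cm
RESOLUTION_UM = 0.25        # diffraction-limited sampling interval, in microns
BYTES_PER_PIXEL = 3         # assumption: uncompressed 24-bit RGB


def slide_pixel_budget(tissue_mm=TISSUE_MM, resolution_um=RESOLUTION_UM):
    """Return (width_px, height_px, total_px) for the tissue area."""
    width_px = round(tissue_mm[0] * 1000 / resolution_um)
    height_px = round(tissue_mm[1] * 1000 / resolution_um)
    return width_px, height_px, width_px * height_px


width_px, height_px, total_px = slide_pixel_budget()
print(f"{width_px:,} x {height_px:,} pixels = {total_px / 1e9:.0f} GPixel")
print(f"1 MPixel frames needed: {total_px // 1_000_000:,}")
print(f"Uncompressed RGB size: {total_px * BYTES_PER_PIXEL / 1e9:.0f} GB")
```

The storage figure depends entirely on the assumed bytes per pixel, so it should be read as an order-of-magnitude estimate only.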
12
Compelling Use Cases for Image Query
  • Diagnostic decision support
  • Longitudinal evaluation
  • Differential diagnosis generation
  • Detection of rare events
  • Teaching
  • Discovery

13
Current World View of Pathology Imagery
Repositories
  • Model 1: Relational Database
  • Image Metadata associated with case-level data
  • Entire Schema required to carry out discovery
  • Text-based
  • Image data is a passive component of the query
  • Model 2: Metadata-tagged Images
  • Image Metadata associated with each image
  • Image becomes a self-contained dataset available
    for discovery
  • Text-based
  • Image data is a passive component of the query

Entry in master accession table
Associated case and image descriptors
14
Highly Desirable World View of Pathology Imagery
Repositories (Future State)
  • Model 3: Metadata-tagged surface map
  • Image Metadata exists at the image level and is
    spatially coupled to underlying digital imagery
  • Discovery can be carried out on the image-space
    itself, with retrieved metadata classifiers
    available for generating search result sets (e.g.
    differential diagnosis generation)
  • Image-based
  • Model 4: Surface discovery
  • Non-metadata-associated digital imagery is
    spatially probed for statistical convergence with
    an image-based query set
  • Imagery becomes a self-contained dataset
    available for discovery
  • Image-based

?
?
15
Lop Nor
Vector quantization: a forgotten algorithm.
16
Attributes of an ideal search system
  • Self-training, domain-independent image
    segmentation / classification tool.
  • Allows for at least two novel image search
    modalities
  • Region-of-interest query by example (image-space
    search, not text-based)
  • Retrieve diagnostic information associated with
    previously classified fields, enabling dynamic
    generation of a differential diagnosis
  • Useful as a bridge for exploration of stochastics
    of multi-dimensional image space data when
    queried in tandem with high-dimensionality data
    set types (genomics, proteomics, etc.)
  • i.e. Morphogenomics
  • Ability to carry out real time assessment of
    regions of interest against Terascale / Petascale
    image repositories.

17
On the prospect of analyzing 1000s of Gigabytes
of data in real-time
18
What is an Image Vector?
2 x 2 vector: 256^4 possible values in a
four-dimensional space
256^4 = 4,294,967,296 possible values
Typically, vectors have ordinality of 8 x 8 or
greater
1.415461031044954789001553027745 x 10^9864
General Approaches to Image Analysis
  • Supervised Learning
  • Algorithm interacts with expert or another
    training data source such that features of
    interest are actively selected and classified
    during the training stage
  • Time consuming
  • Potential to converge to a solution with smaller
    training sets
  • Variable robustness of predictive power when
    convergence is detected
  • Unsupervised Learning
  • Algorithm parses data autonomously, without
    user/expert intervention
  • Faster/ suitable for turnkey automation
  • Slower convergence (if ever) on a solution set.
  • Need for higher-dimensional systems
  • Statistically robust when convergence is
    identified

20
General Approaches to Image Analysis
  • Conventional Image Analysis
  • Algorithms based upon spatially-, frequency- or
    phase-space data present in image
  • Length-scale hypothesis in effect: structural
    elements are usually the target
  • Often requires manual length-scale and magnitude
    scale optimization to enhance detection accuracy
  • Some expertise in algorithm operation desirable
  • Unstructured Classification
  • Classification of vectors in high-dimensional
    space based upon all-comers hypothesis
  • No tuning required
  • No expertise required
  • Approach leads to a plurality of classifiers for
    every atomic spatial element, which must then be
    annealed to a superclass. (this can require
    manual vector sorting)

21
Candidate Algorithmic / Heuristic Approaches to
Image Search / Content-based Image Retrieval
(CBIR)
  • Principal component analysis (PCA)
  • Bayesian Belief Networks
  • Support Vector Engines coupled to multi-parametric
    conventional image analysis
  • Dimensional reduction via manifold projection
    techniques, where high-dimensional distinctions
    of statistical significance are preserved in the
    low-dimensional projection.
  • Vector Quantization
  • Galois Field Manifold Basis operators as an
    inductive extrapolative technique of probable
    (but unspecified) adjacency characteristics of
    low dimensional candidate manifolds (manifold
    extrapolation)
  • Many others.
  • All the above approaches have strengths and
    weaknesses; there is currently no one best
    solution.

22
An Issue of Dimensional Reduction
  • Problem: With the prospect of a typical 100x100
    kernel (a 10,000-dimensional space), computational
    approaches carried out on raw data sets can take
    millions of years to complete, even with our
    fastest current supercomputers (bad for
    turn-around time).
  • Fortunately, there are mathematical operations
    that can sidestep this computational annoyance.
  • Support Vector Engines
  • K-means approaches
  • Bayesian Networks
  • Vector Quantization
  • Galois Field Manifold Projection / Tensor
    Integration
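As an illustration of one of the listed approaches, here is a minimal k-means sketch (plain Lloyd's algorithm) in pure Python. It is not the presenter's implementation: the function names, the toy data and the fixed iteration count are all assumptions. The point it shows is that a large set of kernel vectors can be summarized by a small set of centroids, so that later comparisons run against k centroids rather than against every stored vector.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm: return k centroids summarizing the vectors."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        # Assignment step: group each vector with its nearest centroid.
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: squared_distance(v, centroids[i]))
            groups[nearest].append(v)
        # Update step: move each centroid to the mean of its group.
        for i, group in enumerate(groups):
            if group:  # an empty group keeps its previous centroid
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


# Toy data: two well-separated clusters of 4-dimensional (2 x 2) vectors.
data_rng = random.Random(1)
dark = [[data_rng.gauss(20, 2) for _ in range(4)] for _ in range(50)]
bright = [[data_rng.gauss(200, 2) for _ in range(4)] for _ in range(50)]
centroids = sorted(k_means(dark + bright, k=2))
print([[round(c) for c in centroid] for centroid in centroids])
```

A production system would need a convergence test and a better initialization than random sampling; both are omitted here to keep the sketch short.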

23
Pythagorean Theorem
On all PCs and high-end workstations (and most
Macs), 9 + 16 does indeed result in 25
(Figure: right triangle with sides a and b, and squares of
3 x 3, 4 x 4 and 5 x 5 drawn on its sides)
24
Vector Quantization
Original Image
Division of image into local domains
Extraction of Local Domain Composite Vectors
?
$V_{KS} = \{ L^{Order}_{x_0 y_0}, \ldots, L^{Order}_{x_n y_m} \}$
Vectorization of each local kernel
Individual assessment of each vector dimension
25
Vector Quantization
$V_{KS} = \{ L^{Order}_{x_0 y_0}, \ldots, L^{Order}_{x_n y_m} \}$
Established Vocabulary
Query Against library (Vocabulary) of established
Galois Vectors
Novel Vector
Previously Identified Vector
Assignment of a unique serial number and
inclusion into global vocabulary
Assembly of compressed dataset
38857448643
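The vocabulary workflow on this slide (query the library, reuse a previously identified vector or assign a new serial number to a novel one, then assemble the compressed dataset) can be sketched as follows. This is a simplified illustration, not the presenter's algorithm: it uses plain Euclidean distance with a hypothetical `tolerance` threshold in place of the Galois-vector matching named on the slide, and all function names are assumptions.

```python
def quantize(vectors, vocabulary, tolerance):
    """Replace each vector by the serial number of a matching vocabulary entry.

    `vocabulary` is a list of vectors; an entry's list index is its serial
    number. A vector with no entry within `tolerance` (Euclidean distance)
    is treated as novel and appended to the vocabulary.
    """
    codes = []
    for vector in vectors:
        best_index, best_distance = None, None
        for index, entry in enumerate(vocabulary):
            distance = sum((a - b) ** 2 for a, b in zip(vector, entry)) ** 0.5
            if best_distance is None or distance < best_distance:
                best_index, best_distance = index, distance
        if best_index is None or best_distance > tolerance:
            # Novel vector: assign the next serial number.
            vocabulary.append(list(vector))
            best_index = len(vocabulary) - 1
        codes.append(best_index)
    return codes


def restore(codes, vocabulary):
    """Lossy reconstruction: look each serial number back up."""
    return [vocabulary[code] for code in codes]


vocabulary = []
kernels = [[10, 10, 10, 10], [11, 10, 9, 10], [200, 190, 210, 205], [10, 11, 10, 10]]
codes = quantize(kernels, vocabulary, tolerance=5.0)
print(codes, len(vocabulary))
```

Because the codes keep the order of the input kernels, the compressed dataset preserves the spatial organization of the original, which is what makes it searchable.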
26
VQ-Based Image Compression: a fantastic
opportunity for automated search
Raw Data
Restored Data
Compressed data (preserved spatial organization
of original data)
Depending on the selected compression ratio,
restored lossy-compressed imagery may or may not
be of diagnostic quality.
27
(No Transcript)
28
Galois Field Theory
29
A Typical Dimensional Reduction Galois Field
Question
  • What is the mean densitometrically-weighted
    distance of a single test vector to a statistical
    manifold of established centroids (thus
    establishing similarity or difference)?
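A simplified reading of this question is a weighted mean distance from one test vector to a set of centroids. The sketch below assumes Euclidean distance and one hypothetical weight per centroid; the slides do not define "densitometrically-weighted", so the weighting scheme here is an assumption, as is the function name.

```python
import math


def weighted_mean_distance(test_vector, centroids, weights):
    """Weighted mean Euclidean distance from one vector to a set of centroids."""
    if not centroids or len(centroids) != len(weights):
        raise ValueError("need at least one centroid and one weight per centroid")
    total_weight = sum(weights)
    weighted_sum = sum(
        weight * math.dist(test_vector, centroid)
        for centroid, weight in zip(centroids, weights)
    )
    return weighted_sum / total_weight


# Two centroids; the second is weighted three times as heavily (assumed weights).
score = weighted_mean_distance([0, 0], [[0, 0], [3, 4]], [1, 3])
print(score)
```

A small value suggests the test vector lies close to the established centroids (similarity); a large value suggests difference. Where to place the threshold is left open here.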

30
(No Transcript)
31
What are the boundary conditions?
32
General Form
What is the integral of the Galois Field?
?
33
Which, after integration by parts, yields
34


Initial n by n sub-region of image:

1,1  1,2  ..  1,n
2,1  2,2  ..  2,n
 .    .        .
n,1  n,2  ..  n,n

For every location
Each location is an RGB triplet; hence, each
vector component is itself a triplet sub-vector.
Resultant Input Vector Kernel of n x n x 3
dimensionality:
1,1  1,2  ..  n,n
Canonical V.Q. Tensor
Galois Field Transform
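The construction of the input vector kernel can be sketched in a few lines: each n by n sub-region of RGB pixels is flattened into a single vector of n x n x 3 components. The function names are hypothetical, and the use of non-overlapping tiles is an assumption made for brevity; the slide's "for every location" may instead imply a window centered on every pixel.

```python
def kernel_vector(image, top, left, n):
    """Flatten the n x n region at (top, left) into one n*n*3 vector.

    `image` is a list of rows; each pixel is an (R, G, B) tuple.
    """
    vector = []
    for row in image[top:top + n]:
        for red, green, blue in row[left:left + n]:
            vector.extend((red, green, blue))
    return vector


def all_kernel_vectors(image, n):
    """Yield (top, left, vector) for every non-overlapping n x n tile."""
    height, width = len(image), len(image[0])
    for top in range(0, height - n + 1, n):
        for left in range(0, width - n + 1, n):
            yield top, left, kernel_vector(image, top, left, n)


# Tiny 4 x 4 synthetic image whose pixel values encode their position.
image = [[(r, c, r + c) for c in range(4)] for r in range(4)]
tiles = list(all_kernel_vectors(image, n=2))
print(len(tiles), len(tiles[0][2]))
```

Each resulting vector is the kind of input that a vocabulary lookup or centroid comparison would then operate on.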
35
Typical Galois Field mapped to the even
Jacobian/Chebyshev tensor polynomials manifested
on the edge of the complexity transition
  • On Galois Fields
  • Not merely a clustering algorithm
  • The resulting field is a non-linear N-space
    manifold selected for its distinctiveness from
    all other modular functions in the Galois set
    space
  • Fields may have local minima and local extrema
  • Any Galois manifold is exclusive of any other
    Galois set
  • Non-trivial to calculate; trivial to query

36
Local Islands in Galois Field Space of
statistical convergence and near-convergence to
high-probability feature matches using support
vector analysis
37
Convergence with increasing Vocabulary Size
38
Regions of a typical Galois manifold with no
correlation to established vocabulary tensors are
easily recognized as exhibiting chaotic behavior
and are therefore excluded.
39
How does this approach differ from traditional
N-space cluster analysis?
  • Conventional
  • Algorithms are custom designed for a narrow
    recognition task
  • Often requires customization with expert
    programming
  • Low tolerance to variability in source format
  • VQ-Galois
  • General matching algorithm agnostic to input data
    format
  • No end-user customization required
  • Designed to improve with increased data pool size
    (self-training)

40
(No Transcript)
41
(No Transcript)
42
Some Demonstrations
43
Summary
  • Increasing availability of whole slide digital
    data creates at least the possibility to carry
    out CBIR for basic clinical tasks
  • Similar case retrieval
  • Differential diagnosis generation
  • Grading / staging decision support
  • Rare event identification
  • Much effort is still required to increase the
    speed and accuracy of the current generation of
    both supervised and unsupervised approaches. For
    the time being, these algorithms should be viewed
    as investigational use only, unless otherwise
    stated.
  • Initial reports in this field suggest that the
    computational challenge can be solved.
  • Pilot toolsets will be available for
    investigative use via the internet, within the
    year if not sooner.