Effect of Linearization on Normalized Compression Distance

1
Effect of Linearization on Normalized Compression
Distance

Jonathan Mortensen
Julia Wu
DePaul University
July 2009

2
Introduction

Kolmogorov Complexity is an emerging similarity
metric
Transformation Distance
Universal Similarity Measure
Does not require feature identification and
selection
How can it be applied to images?
CBIR, Classification
Investigate its effectiveness
Discovered some fundamentals have been overlooked
thus far

3
Outline

Background
Kolmogorov Complexity and Complearn
Research Topics
Spatial Transformations
Intensity Transformations
Image Groupings
Conclusion
Future Work

4
Background

Li (2004): successful clustering of phylogeny
trees, music, text files
1D to 2D data?
Tran (2007): NCD not a good predictor of visual
indistinguishability
Only one photograph used, one type of
linearization (row-by-row)
Gondra (2008): CBIR using NCD produced
statistically significant measures against H0 of
random retrieval and other similarity measures
Test set of hundreds of images, inconsistent
methods of compression and concatenation,
linearization unclear

5
Kolmogorov Complexity

K(x): the length of the shortest program
that produces x
K(x|y): the length of the shortest binary program
that outputs x given input y
E(x,y) = max{K(x|y), K(y|x)}
Normalized Information Distance
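
For reference, the normalized information distance as defined in Li et al. (2004) divides E(x,y) by the larger of the two individual complexities. Written out in the notation of this slide:

```latex
\mathrm{NID}(x, y) = \frac{\max\{K(x \mid y),\; K(y \mid x)\}}{\max\{K(x),\; K(y)\}}
```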

6
Kolmogorov Complexity

Universal, in that it captures all other
semi-computable normalized distance measures
Therefore also semi-computable
Compression losslessly simplifies strings, and
therefore is used as an approximation, C(x)
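
Replacing K with a real compressor C gives the normalized compression distance, NCD(x,y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}. A minimal sketch follows; the choice of bz2 and the function names are assumptions for illustration, not necessarily the compressor used in these experiments.

```python
import bz2


def compressed_size(data: bytes) -> int:
    """C(x): length in bytes of the bz2-compressed input."""
    return len(bz2.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # C(xy): compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = b"abcabcabc" * 200
    print(ncd(a, a))          # a string compared with itself: small, rarely exactly 0
    print(ncd(a, a[::-1]))    # compared with its reversal
```

Because real compressors carry header and block overhead, NCD(x,x) is usually slightly above 0 and values slightly above 1 can occur.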

"The human brain is incapable of creating
anything which is really complex." (Kolmogorov,
A.N., Statistical Science, 6, p. 314, 1990)

7
CompLearn

Open-source package that approximates K-Complexity
through compression
Developed by Rudi Cilibrasi, Anna Lissa Cruz,
Steven de Rooij, and Maarten Keijzer
Uses basic Linux compression tools to develop the
comparison map
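
The comparison map can be pictured as a pairwise NCD matrix over all inputs. The sketch below is not CompLearn's code or API; zlib stands in for the external compression tools, and the diagonal is set to 0 by convention, as in the NCD tables later in this transcript.

```python
import zlib
from itertools import product


def ncd(x: bytes, y: bytes) -> float:
    """NCD with zlib as a stand-in compressor (an assumption)."""
    size = lambda data: len(zlib.compress(data, 9))
    cx, cy = size(x), size(y)
    return (size(x + y) - min(cx, cy)) / max(cx, cy)


def distance_matrix(items: dict) -> dict:
    """Map every ordered pair of item names to its NCD."""
    names = list(items)
    return {
        (a, b): 0.0 if a == b else ncd(items[a], items[b])
        for a, b in product(names, repeat=2)
    }


if __name__ == "__main__":
    data = {"a": b"aaaa" * 100, "b": b"abcd" * 100}
    for pair, d in distance_matrix(data).items():
        print(pair, round(d, 4))
```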

9
Images from Google Similar Images
10
Initial Questions

Linearization Methods and Alternatives
How to preserve a 2D signal?
How do linearizations affect NCD under spatial
transformations and intensity shifts?
Do additional feature images lower NCD?
CBIR: Can K-Complexity be used with feature
vectors or image semantics?

11
Spatial Transformations

Applied 4 types of linearization to 800 images
(original and 7 transformations)
Found that each linearization type produced
distinctly different NCDs
Certain linearizations result in lower NCDs for
certain transformations

12
Linearization Methods
Row Major
Column Major
Hilbert-Peano SPC: images transformed to 128x128
SCPO: images transformed to 35% of original size
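
A minimal sketch of three of these orderings on an image stored as a list of rows. The Hilbert traversal uses the standard index-to-coordinate conversion and assumes a square image whose side is a power of two (consistent with the 128x128 resizing); it may differ in orientation from the Hilbert-Peano variant used in the experiments. SCPO is content-dependent and is not sketched here.

```python
def row_major(img):
    """Read pixels left to right, top to bottom."""
    return [p for row in img for p in row]


def column_major(img):
    """Read pixels top to bottom, one column at a time."""
    return [img[r][c] for c in range(len(img[0])) for r in range(len(img))]


def hilbert_d2xy(n, d):
    """Convert index d along a Hilbert curve to (x, y) on an n x n grid."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:              # rotate the quadrant when needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert(img):
    """Read pixels along a Hilbert curve (square, power-of-two side)."""
    n = len(img)
    return [img[y][x] for x, y in (hilbert_d2xy(n, d) for d in range(n * n))]
```

Unlike row-major or column-major order, consecutive positions on the Hilbert curve are always neighboring pixels, which is the usual argument for it preserving more of the 2D structure.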
13
Spatial Transformations
Original Image
Down Shift
Left Shift
180° rotation
90° rotation
270° rotation
Reflection Y Axis
Reflection X Axis
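
The seven transformations can be sketched on a list-of-rows image as below. Two details are assumptions, since the slides do not state them: shifts wrap around the image border, and rotations are measured clockwise.

```python
def shift_down(img, k=1):
    """Move every row down by k, wrapping rows around (assumed behavior)."""
    k %= len(img)
    return img[-k:] + img[:-k]


def shift_left(img, k=1):
    """Move every column left by k, wrapping columns around (assumed behavior)."""
    return [row[k % len(row):] + row[:k % len(row)] for row in img]


def rotate90(img):
    """Rotate 90 degrees clockwise."""
    return [list(r) for r in zip(*img[::-1])]


def rotate180(img):
    return [row[::-1] for row in img[::-1]]


def rotate270(img):
    """Rotate 270 degrees clockwise (90 degrees counterclockwise)."""
    return [list(r) for r in zip(*img)][::-1]


def reflect_y(img):
    """Mirror about the vertical (Y) axis: reverse each row."""
    return [row[::-1] for row in img]


def reflect_x(img):
    """Mirror about the horizontal (X) axis: reverse the row order."""
    return img[::-1]
```

Each function returns the same multiset of pixels in a different arrangement, so any change in NCD comes from how the linearization reorders them.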
14
Intensity Transformations

Additive Constant
Three types of noise
Gaussian
Speckle
Salt and Pepper
Least Significant Bit (LSB) Steganography
Contrast Windowing

15
Additive Constant
Image 937.jpg: constants of 32 and 64, respectively

P = Intensity + Constant
Constants: 4, 8, 12, ..., 100
16 bit: 255 (+4) -> 259
Truncation: 255 (+4) -> 255
Wrap: 255 (+4) -> 4
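
The three overflow policies can be sketched as follows; the function name and mode strings are hypothetical. Note that a modulo-256 wrap, assumed here, maps 255 + 4 to 3, whereas the slide shows 4, so the experiments may have used a different wrap convention.

```python
def add_constant(pixels, constant, mode):
    """Add a constant to 8-bit intensities under one of three overflow policies."""
    if mode == "16bit":       # widen the storage, keep the true sum
        return [p + constant for p in pixels]
    if mode == "truncate":    # saturate at the 8-bit maximum
        return [min(p + constant, 255) for p in pixels]
    if mode == "wrap":        # modulo-256 wraparound (assumed convention)
        return [(p + constant) % 256 for p in pixels]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("16bit", "truncate", "wrap"):
        print(mode, add_constant([255, 10], 4, mode))
```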

16
Additive Constant
17
Various Noise

Gaussian (Statistical)
Speckle (Multiplicative)
Salt and Pepper (Drop-off)

0.32 and 0.64 Variance/Noise Density Respectively
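
A rough sketch of the three noise models on a flat list of 8-bit pixels. It assumes the variance refers to intensities scaled to [0, 1] and that speckle noise is zero-mean Gaussian; the slides do not say which tool or conventions generated the noise.

```python
import random


def clip(value):
    """Round and clamp to the 8-bit range."""
    return max(0, min(255, int(round(value))))


def gaussian_noise(pixels, variance, rng):
    """Additive noise: p + n, with n ~ N(0, variance) on the [0, 1] scale."""
    sigma = variance ** 0.5
    return [clip(255 * (p / 255 + rng.gauss(0.0, sigma))) for p in pixels]


def speckle_noise(pixels, variance, rng):
    """Multiplicative noise: p + p * n, with n ~ N(0, variance)."""
    sigma = variance ** 0.5
    return [clip(p * (1.0 + rng.gauss(0.0, sigma))) for p in pixels]


def salt_and_pepper(pixels, density, rng):
    """Replace a fraction `density` of pixels with pure black or white."""
    return [rng.choice((0, 255)) if rng.random() < density else p for p in pixels]


if __name__ == "__main__":
    rng = random.Random(0)
    print(salt_and_pepper([128] * 10, 0.32, rng))
```

Gaussian and speckle noise perturb every pixel, while salt-and-pepper noise changes only the selected fraction and leaves the rest intact.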
18
Noise (cont.)

Gaussian and Speckle Noise don't compress well
Gaussian and Salt & Pepper experience some
posterior decay

19
Least Significant Bit Steganography

Hide4PGP
Scrambles message
Changes pixel bit to most similar color with
opposite bit assignment
Spreads secret data over entire file
True Grayscale: changes two bits per pixel
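
For orientation, plain sequential LSB embedding looks like the sketch below. This is a generic textbook scheme, not Hide4PGP's algorithm: it neither scrambles the message nor spreads it over the file, and it changes one bit per pixel.

```python
def embed_lsb(pixels, message: bytes):
    """Write the message bits into the least significant bit of successive pixels."""
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message does not fit in the image")
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit   # clear the LSB, then set it to the message bit
    return out


def extract_lsb(pixels, n_bytes: int) -> bytes:
    """Read n_bytes back from the least significant bits."""
    bits = [p & 1 for p in pixels[: n_bytes * 8]]
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )


if __name__ == "__main__":
    cover = list(range(100, 200))
    stego = embed_lsb(cover, b"Hi")
    print(extract_lsb(stego, 2))
```

Each pixel changes by at most 1, which is why the stego image is visually indistinguishable from the cover.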

Image with No Text
Image hiding Gettysburg Address
20
LSB Steganography
21
Hamming Distance
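
The slide gives only the title. Assuming Hamming distance here means the number of differing bits between two equal-length images (for example cover and stego), a minimal sketch is:

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


if __name__ == "__main__":
    print(hamming_distance(b"\x00\xff", b"\x01\xfe"))
```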
23
Contrast Windowing

Computed Tomography image enhancement that
increases contrast in certain structures
Brief Medical Exploration

24
Contrast Windowing

Bone Window (300 HU, width 1500 HU)

Lung Window (-200 HU, width 2000 HU)

Patient 5: original image (top left)

Soft Tissue Window (50 HU, width 350 HU)
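
Windowing maps a band of Hounsfield units onto the display range and clamps everything outside it. The sketch assumes the first number in each caption is the window center (level) and that output is 8-bit; the function name is hypothetical.

```python
def apply_window(hu_values, center, width):
    """Map Hounsfield units to 0..255 using a linear window."""
    low = center - width / 2
    high = center + width / 2
    out = []
    for hu in hu_values:
        clamped = min(max(hu, low), high)          # values outside the window saturate
        out.append(round(255 * (clamped - low) / (high - low)))
    return out


if __name__ == "__main__":
    sample = [-1000, -125, 50, 225, 1000]
    print(apply_window(sample, center=50, width=350))     # soft tissue window
    print(apply_window(sample, center=300, width=1500))   # bone window
```

A narrow window such as the soft tissue one sends most of the HU range to pure black or white, so it discards more of the original intensity information than a wide window.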

25
P1, P3, P5

      original  bone      lung      tiss
p1    0         1.028241  1.049258  1.02429
bone  1.028241  0         1.036157  1.011354
lung  1.049258  1.036157  0         1.039524
tiss  1.02429   1.011519  1.039524  0

p3    0         1.02097   1.043942  1.025635
bone  1.020539  0         1.037073  1.014142
lung  1.044137  1.037073  0         1.037244
tiss  1.026016  1.014354  1.037244  0

p5    0         1.020947  1.047888  1.023039
bone  1.020947  0         1.038712  1.019146
lung  1.047888  1.038712  0         1.036131
tiss  1.023039  1.019924  1.036131  0
26
Cross DICOM Comparison

        p1tiss  p1lung  p1bone  p1      p3tiss  p3lung  p3bone  p3      p5tiss  p5lung  p5bone  p5
p1tiss  0.0000  1.0395  1.0115  1.0243  0.9739  1.0390  1.0157  1.0223  0.9813  1.0325  1.0066  1.0234
p1lung  1.0395  0.0000  1.0362  1.0493  1.0362  0.9772  1.0361  1.0485  1.0410  0.9853  1.0412  1.0477
p1bone  1.0114  1.0362  0.0000  1.0282  1.0158  1.0378  0.9642  1.0278  1.0197  1.0365  0.9761  1.0247
p1      1.0243  1.0493  1.0282  0.0000  1.0255  1.0460  1.0258  0.9811  1.0258  1.0455  1.0240  1.0025
p3tiss  0.9741  1.0362  1.0168  1.0255  0.0000  1.0372  1.0144  1.0260  0.9810  1.0328  1.0140  1.0222
p3lung  1.0390  0.9772  1.0378  1.0460  1.0372  0.0000  1.0371  1.0441  1.0434  0.9874  1.0418  1.0513
p3bone  1.0137  1.0361  0.9650  1.0258  1.0141  1.0371  0.0000  1.0205  1.0175  1.0360  0.9728  1.0220
p3      1.0238  1.0485  1.0271  0.9811  1.0256  1.0439  1.0210  0.0000  1.0278  1.0414  1.0218  0.9997
p5tiss  0.9932  1.0410  1.0180  1.0258  0.9821  1.0434  1.0172  1.0278  0.0000  1.0361  1.0199  1.0230
p5lung  1.0325  0.9853  1.0365  1.0455  1.0328  0.9874  1.0360  1.0414  1.0361  0.0000  1.0387  1.0479
p5bone  1.0062  1.0412  0.9757  1.0240  1.0142  1.0418  0.9724  1.0217  1.0191  1.0387  0.0000  1.0209
p5      1.0234  1.0477  1.0247  1.0025  1.0222  1.0513  1.0220  0.9997  1.0230  1.0479  1.0209  0.0000
27
Conclusion "How Many" vs "How Little"

NCD for Ordinal Comparisons
Numerical Redundancy

[Figure. Axis labels: Selective, Entire Picture; Larger NCD, Smaller NCD. Items shown: Gaussian and Speckle Noise, Salt and Pepper Noise, Steganography, Additive Constants, Contrast Windowing]
28
Feature Image Comparison and Grouping

Feature Image: pixel-based values derived from
the original image
3 Main Types of Linearization
Avg NCD inter > Avg NCD intra
The greater inter - intra, the better NCD finds
groupings
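
The inter minus intra criterion can be computed from any pairwise NCD matrix and a list of group labels. A minimal sketch, with hypothetical names and made-up distances used only to exercise the function:

```python
def intra_inter_means(dist, labels):
    """Average distance within groups and between groups, over unordered pairs."""
    intra, inter = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            bucket = intra if labels[i] == labels[j] else inter
            bucket.append(dist[i][j])
    return sum(intra) / len(intra), sum(inter) / len(inter)


if __name__ == "__main__":
    # Toy distances, not experimental data.
    dist = [
        [0.0, 0.2, 0.9, 0.8],
        [0.2, 0.0, 0.7, 0.9],
        [0.9, 0.7, 0.0, 0.3],
        [0.8, 0.9, 0.3, 0.0],
    ]
    intra, inter = intra_inter_means(dist, ["a", "a", "b", "b"])
    print(intra, inter, inter - intra)
```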

29
Feature Image Linearization

Image-At-Once: row-order, one feature image at a
time
Row Concatenation: appends all images, then
performs row-order linearization
Pixel Order: selects value from same pixel of
each feature image in row-order fashion
Gray Row-Major: grayscales an image and follows
row-order on intensities
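
The first three orderings can be sketched for a list of equally sized feature images, each stored as a list of rows. Row Concatenation is read here as placing the images side by side before the row-order pass; that reading is an assumption, since the slide does not say how the images are appended.

```python
def image_at_once(features):
    """Row-major over the first feature image, then the second, and so on."""
    return [p for img in features for row in img for p in row]


def row_concatenation(features):
    """Row r of every image in turn, then row r + 1 (images assumed side by side)."""
    return [p for r in range(len(features[0])) for img in features for p in img[r]]


def pixel_order(features):
    """For each pixel position in row order, emit its value from every feature image."""
    rows, cols = len(features[0]), len(features[0][0])
    return [img[r][c] for r in range(rows) for c in range(cols) for img in features]


if __name__ == "__main__":
    f1 = [[1, 2], [3, 4]]
    f2 = [[5, 6], [7, 8]]
    print(image_at_once([f1, f2]))
    print(row_concatenation([f1, f2]))
    print(pixel_order([f1, f2]))
```

All three accept any number of feature images, and they differ only in how far apart values from the same pixel location end up in the output string.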

31
Data Set and Methods

Corel Image Database with 10 predefined groupings
Linearized by 5 methods
NCDs were computed within a group and then against
the groups to its left and to its right

33
Results

Nearly every linearization produced statistically
different NCDs
Intra Group was always less than Inter Group
Gray provided the greatest Inter - Intra difference
Thought this was due to file size
Triple-concatenated Gray, creating equal file size
Found an even greater difference

37
Conclusion

NCD is a good model for predefined human
groupings and linearization has little impact on
this
Gray-Triple Row-Major may be the best form of
linearization
Direction of concatenation does not matter
Defined a methodology for any number of feature
images

38
Conclusion

Compressor Errors
Numerical Redundancy
Ordinal Variables vs Nominal Variables
Ex.: 195 195 195 195 <> 198 198 198 198
NCD = 0.100000
199 199 199 199 <> 202 202 202 202
NCD = 0.128205
NCD needs refinement
2D image as a 1D string?
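
Both pairs in the example differ by 3 intensity levels, yet their NCDs differ. One plausible reading is that a compressor matches symbols rather than magnitudes: written as decimal text, 195 and 198 share digits while 199 and 202 share none. The sketch below shows that effect; the ASCII serialization and the use of zlib are assumptions, so its NCD values will not reproduce the figures on the slide.

```python
import zlib


def ncd(x: bytes, y: bytes) -> float:
    """NCD with zlib as a stand-in compressor (an assumption)."""
    size = lambda data: len(zlib.compress(data, 9))
    cx, cy = size(x), size(y)
    return (size(x + y) - min(cx, cy)) / max(cx, cy)


def as_text(values) -> bytes:
    """Serialize intensities as space-separated ASCII decimals (assumed format)."""
    return " ".join(str(v) for v in values).encode("ascii")


def shared_digits(a, b) -> set:
    """Digit symbols that occur in both sequences."""
    return set("".join(map(str, a))) & set("".join(map(str, b)))


if __name__ == "__main__":
    for a, b in [((195,) * 4, (198,) * 4), ((199,) * 4, (202,) * 4)]:
        print(a, b, sorted(shared_digits(a, b)), round(ncd(as_text(a), as_text(b)), 6))
```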

39
Future Work

Image Scaling and Normalization
Additional Feature Images
New Forms of Image concatenation
Investigate Compressors (Numeric?)

40
References

A. Itani and D. Manohar. Self-Describing
Context-Based pixel ordering. Lecture Notes in
Computer Science, pages 124-134, 2002.
M. Li, X. Chen, X. Li, B. Ma, and P. M. B. Vitányi.
The similarity metric. IEEE Transactions on
Information Theory, 50(12), 2004.
R. Dafner, D. Cohen-Or, and Y. Matias.
Context-based space filling curves. In Computer
Graphics Forum, volume 19, pages 209-218.
Blackwell Publishers Ltd, 2000.
R. Cilibrasi, Anna L. Cruz, Steven de Rooij, and
Maarten Keijzer. CompLearn home.
http://www.complearn.org/.
R. Cilibrasi, P. Vitányi, and R. de Wolf.
Algorithmic clustering of music. Arxiv preprint
cs.SD/0303025, 2003.
N. Tran. The normalized compression distance and
image distinguishability. Proceedings of SPIE,
6492:64921D, 2007.
I. Gondra and D. R. Heisterkamp. Content-based
image retrieval with the normalized information
distance. Computer Vision and Image
Understanding, 111(2):219-228, 2008.