Title: Effect of Linearization on Normalized Compression Distance
- Jonathan Mortensen
- Julia Wu
- DePaul University
- July 2009
Introduction
- Kolmogorov Complexity underlies an emerging similarity metric
  - Transformation Distance
  - Universal Similarity Measure
  - Does not require feature identification and selection
- How can it be applied to images?
  - CBIR (content-based image retrieval), Classification
- Investigate its effectiveness
- Discovered that some fundamentals have been overlooked thus far
Outline
- Background
  - Kolmogorov Complexity and CompLearn
- Research Topics
  - Spatial Transformations
  - Intensity Transformations
  - Image Groupings
- Conclusion
- Future Work
Background
- Li (2004): successful clustering of phylogeny trees, music, and text files
  - Does this carry over from 1D to 2D data?
- Tran (2007): NCD is not a good predictor of visual indistinguishability
  - Only one photograph used, with one type of linearization (row-by-row)
- Gondra (2008): CBIR using NCD produced statistically significant results against the null hypothesis (H0) of random retrieval and against other similarity measures
  - Test set of hundreds of images; inconsistent methods of compression and concatenation; linearization unclear
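Kolmogorov Complexity
- $K(x)$: the length of the shortest program (binary string) that outputs $x$
- $K(x \mid y)$: the length of the shortest program that outputs $x$ given input $y$
- Information distance: $E(x,y) = \max\{K(x \mid y), K(y \mid x)\}$
- Normalized Information Distance: $\mathrm{NID}(x,y) = \dfrac{\max\{K(x \mid y), K(y \mid x)\}}{\max\{K(x), K(y)\}}$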
Kolmogorov Complexity (Cont.)
- Universal, in that it minorizes (up to a small additive term) all other semi-computable normalized distance measures
- Because it is built on $K$, it cannot be computed exactly; $K(x)$ is only upper semi-computable
- Lossless compression shortens strings, so the compressed length $C(x)$ is used as a computable approximation (an upper bound) of $K(x)$
"The human brain is incapable of creating anything which is really complex." (A. N. Kolmogorov, Statistical Science, 6, p. 314, 1990)
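Substituting the compressed length $C$ for $K$ in the normalized information distance gives the normalized compression distance, $\mathrm{NCD}(x,y) = \dfrac{C(xy) - \min\{C(x), C(y)\}}{\max\{C(x), C(y)\}}$, where $xy$ is the concatenation of $x$ and $y$. The sketch below computes it in Python; using zlib at level 9 as the compressor is an assumption for illustration, not necessarily what CompLearn or the cited studies used.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """C(x): length of zlib-compressed data, a computable upper-bound stand-in for K(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance; xy is the plain concatenation of x and y."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    x = os.urandom(4096)  # incompressible random bytes
    y = os.urandom(4096)  # unrelated to x
    print(f"NCD(x, x) = {ncd(x, x):.3f}")  # near 0: the second copy is almost free
    print(f"NCD(x, y) = {ncd(x, y):.3f}")  # near 1: nothing is shared
```

One practical caveat: zlib (DEFLATE) only looks back 32 KiB, so when $x$ alone is larger than that, the compressor cannot exploit matches between $y$ and the start of $x$, and the NCD is inflated regardless of true similarity.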
CompLearn
- Open-source package that implements NCD, a compression-based approximation of Kolmogorov Complexity
- Developed by Rudi Cilibrasi, Anna Lissa Cruz, Steven de Rooij, and Maarten Keijzer
- Uses standard Linux compression tools to build the comparison map (pairwise distance matrix)
Images from Google Similar Images
Initial Questions
- Linearization Methods and Alternatives
  - How can a 2D signal be preserved?
  - Do linearizations affect NCD under spatial transformations and intensity shifts?
- Do additional feature images lower NCD?
- CBIR: can Kolmogorov Complexity be used with feature vectors or image semantics?
Spatial Transformations
- Applied 4 types of linearization to 800 images (originals and 7 transformations)
- Each linearization type produced distinctly different NCDs
- Certain linearizations result in lower NCDs for certain transformations
Linearization Methods
- Row-Major
- Column-Major
- Hilbert-Peano space-filling curve: images transformed to 128x128
- SCPO (self-describing context-based pixel ordering): images transformed to 35 of original size
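The row-major, column-major, and Hilbert-Peano orderings can be sketched in Python with NumPy as follows. The function names are hypothetical; the Hilbert index-to-coordinate conversion follows the widely used iterative `d2xy` formulation and assumes a square 8-bit grayscale image whose side is a power of two (such as the 128x128 resize above). SCPO depends on image context and is not reproduced here.

```python
import numpy as np


def row_major(img: np.ndarray) -> bytes:
    """Read pixels left to right, top to bottom."""
    return img.astype(np.uint8).tobytes(order="C")


def column_major(img: np.ndarray) -> bytes:
    """Read pixels top to bottom, left to right."""
    return img.astype(np.uint8).tobytes(order="F")


def hilbert_d2xy(n: int, d: int) -> tuple:
    """Map index d along a Hilbert curve to (x, y) on an n x n grid (n a power of two)."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate/flip the sub-quadrant so consecutive sub-curves connect
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(img: np.ndarray) -> bytes:
    """Read pixels in Hilbert-curve order, so spatially close pixels stay close in the string."""
    n = img.shape[0]
    if img.shape != (n, n) or n & (n - 1):
        raise ValueError("need a square 2D image whose side is a power of two")
    coords = (hilbert_d2xy(n, d) for d in range(n * n))
    return bytes(int(img[y, x]) for x, y in coords)
```

The design trade-off: row- and column-major orders break vertical (or horizontal) neighbors apart by a full image width, while the Hilbert curve keeps every consecutive pair of pixels adjacent, at the cost of requiring a power-of-two square image.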
Spatial Transformations (Cont.)
- Original Image
- Down Shift
- Left Shift
- 180° Rotation
- 90° Rotation
- 270° Rotation
- Reflection about the Y Axis
- Reflection about the X Axis
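The eight variants listed above can be generated with NumPy and compared under two linearizations. This is a sketch under stated assumptions: shifts are circular (`np.roll`) by a hypothetical 10 pixels, rotations are counterclockwise, the test image is synthetic, and zlib stands in for the compressor.

```python
import zlib

import numpy as np


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance with zlib (level 9) as the compressor."""
    def c(b: bytes) -> int:
        return len(zlib.compress(b, 9))
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def spatial_variants(img: np.ndarray, shift: int = 10) -> dict:
    """Original image plus seven spatial transformations (shift amount is hypothetical)."""
    return {
        "original": img,
        "down_shift": np.roll(img, shift, axis=0),   # circular: rows move down
        "left_shift": np.roll(img, -shift, axis=1),  # circular: columns move left
        "rot90": np.rot90(img, 1),                   # counterclockwise
        "rot180": np.rot90(img, 2),
        "rot270": np.rot90(img, 3),
        "reflect_y": np.fliplr(img),                 # mirror about the vertical (Y) axis
        "reflect_x": np.flipud(img),                 # mirror about the horizontal (X) axis
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    yy, xx = np.mgrid[0:128, 0:128]
    # Hypothetical test image: diagonal gradient with mild noise
    img = ((xx + yy) // 2 + rng.integers(0, 8, size=(128, 128))).astype(np.uint8)
    for name, variant in spatial_variants(img).items():
        row = ncd(img.tobytes(order="C"), variant.tobytes(order="C"))
        col = ncd(img.tobytes(order="F"), variant.tobytes(order="F"))
        print(f"{name:10s}  row-major {row:.3f}  column-major {col:.3f}")
```

Comparing the two columns for each transformation shows how strongly the choice of linearization alone can move the NCD for the same pair of images.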
Intensity Transformations
- Additive Constant
- Three types of noise
  - Gaussian
  - Speckle
  - Salt and Pepper
- Least Significant Bit (LSB) Steganography
- Contrast Windowing
Additive Constant
Image 937.jpg with constants of 32 and 64, respectively
- P + Intensity Constant
  - Constants 4, 8, 12, ..., 100
- 16-bit
  - 255 (+4) -> 259
- Truncation
  - 255 (+4) -> 255
- Wrap
  - 255 (+4) -> 3
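The three overflow policies can be expressed in a few lines of NumPy. This is a hypothetical helper, not the authors' code; it assumes an 8-bit input and a non-negative constant, and implements "wrap" as modulo-256 arithmetic, the behavior of unsigned 8-bit overflow.

```python
import numpy as np


def add_constant(img: np.ndarray, k: int, mode: str) -> np.ndarray:
    """Add intensity constant k (k >= 0) to an 8-bit image under one overflow policy."""
    wide = img.astype(np.int32) + k              # exact sums, no overflow yet
    if mode == "16bit":
        return wide.astype(np.uint16)            # keep values above 255 (e.g. 259)
    if mode == "truncate":
        return np.clip(wide, 0, 255).astype(np.uint8)  # saturate at 255
    if mode == "wrap":
        return (wide % 256).astype(np.uint8)     # modulo-256 wraparound
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    pixel = np.array([[255]], dtype=np.uint8)
    for mode in ("16bit", "truncate", "wrap"):
        print(mode, add_constant(pixel, 4, mode).item())
    constants = range(4, 101, 4)  # 4, 8, 12, ..., 100
```

Note that the 16-bit result serializes to two bytes per pixel, so its linearized string is twice as long as the 8-bit original, which is itself a difference the compressor sees.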
Various Noise
- Gaussian (Statistical)
- Speckle (Multiplicative)
- Salt and Pepper (Drop-off)
Variance/noise density of 0.32 and 0.64, respectively
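The three noise models can be sketched in NumPy on images scaled to [0, 1]. The parameterization (variance for Gaussian and speckle, density for salt and pepper) is modeled on the conventions of MATLAB's `imnoise`; whether the original images were produced that way is an assumption, and the function names are hypothetical.

```python
import numpy as np


def gaussian_noise(img: np.ndarray, var: float, rng: np.random.Generator) -> np.ndarray:
    """Additive zero-mean Gaussian noise with variance `var`; img is float in [0, 1]."""
    return np.clip(img + rng.normal(0.0, np.sqrt(var), img.shape), 0.0, 1.0)


def speckle_noise(img: np.ndarray, var: float, rng: np.random.Generator) -> np.ndarray:
    """Multiplicative noise img + n * img, with n uniform, zero-mean, variance `var`."""
    a = np.sqrt(3.0 * var)  # uniform on [-a, a] has variance a**2 / 3
    n = rng.uniform(-a, a, img.shape)
    return np.clip(img + n * img, 0.0, 1.0)


def salt_and_pepper(img: np.ndarray, density: float, rng: np.random.Generator) -> np.ndarray:
    """Force a fraction `density` of pixels to 0 (pepper) or 1 (salt), half each on average."""
    out = img.copy()
    u = rng.random(img.shape)
    out[u < density / 2] = 0.0
    out[(u >= density / 2) & (u < density)] = 1.0
    return out


def to_uint8(img: np.ndarray) -> np.ndarray:
    """Scale a [0, 1] float image back to 8-bit intensities before linearization."""
    return np.round(img * 255).astype(np.uint8)
```

Salt and pepper noise replaces only a subset of pixels with two fixed symbols, whereas Gaussian and speckle noise perturb every pixel with fresh random values, which is consistent with the observation that the latter two compress poorly.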
Noise (Cont.)
- Gaussian and speckle noise don't compress well
- Gaussian and salt-and-pepper noise show some posterior decay
Least Significant Bit Steganography
- Hide4PGP
  - Scrambles the message
  - Changes a pixel's bit by choosing the most similar color with the opposite bit assignment
  - Spreads the secret data over the entire file
  - True grayscale: changes two bits per pixel
Image with no text
Image hiding the Gettysburg Address
Hamming Distance
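A generic one-bit LSB substitution and a bit-level Hamming distance can be sketched in NumPy. This is deliberately not Hide4PGP's algorithm (which, as noted above, scrambles the message, picks the most similar color, and changes two bits per pixel in true grayscale); the helper names are hypothetical and the cover is assumed to be an 8-bit grayscale array.

```python
import numpy as np


def embed_lsb(cover: np.ndarray, message: bytes) -> np.ndarray:
    """Write message bits, most significant first, into the LSB of successive pixels."""
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    flat = cover.astype(np.uint8).ravel()  # astype copies, so cover is untouched
    if bits.size > flat.size:
        raise ValueError("message too long for this cover image")
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits
    return flat.reshape(cover.shape)


def extract_lsb(stego: np.ndarray, n_bytes: int) -> bytes:
    """Read n_bytes back from the LSBs in the same row-major order."""
    bits = stego.ravel()[: n_bytes * 8] & 1
    return np.packbits(bits).tobytes()


def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two equally shaped uint8 arrays."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cover = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    stego = embed_lsb(cover, b"Four score and seven years ago")
    print("bits changed:", hamming_distance(cover, stego))
```

The Hamming distance counts how many bits the embedding actually flipped; since roughly half the cover LSBs already match the message, it is typically about half the message length in bits.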
Contrast Windowing
- Computed tomography (CT) image enhancement that increases contrast in certain structures
- Brief medical exploration
Contrast Windowing (Cont.)
- Bone Window (level 300 HU, width 1500 HU)
- Lung Window (level -200 HU, width 2000 HU)
- Soft Tissue Window (level 50 HU, width 350 HU)
- Patient 5 original image: top left
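A display window maps the Hounsfield units (HU) between level - width/2 and level + width/2 onto the gray range and clips everything outside it. The sketch below uses a linear mapping to 8-bit output, which is a common convention and an assumption here; `window_ct` is a hypothetical name, and the (level, width) pairs are the three windows listed above.

```python
import numpy as np


def window_ct(hu: np.ndarray, level: float, width: float) -> np.ndarray:
    """Map Hounsfield units to 8-bit gray levels for one display window."""
    lower = level - width / 2.0
    upper = level + width / 2.0
    scaled = (np.clip(hu, lower, upper) - lower) / (upper - lower)  # 0..1 inside window
    return np.round(scaled * 255).astype(np.uint8)


WINDOWS = {  # (level, width) in HU, as listed on the slide
    "bone": (300, 1500),
    "lung": (-200, 2000),
    "soft_tissue": (50, 350),
}

if __name__ == "__main__":
    hu = np.array([-1000.0, -200.0, 50.0, 300.0, 1000.0])  # air, lung, soft tissue, bone
    for name, (level, width) in WINDOWS.items():
        print(name, window_ct(hu, level, width).tolist())
```

Because each window discards everything outside its HU range, the three windowed images of one patient can carry quite different information, even though they come from the same scan.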
NCD Between Original and Windowed Images

P1
          original  bone      lung      tiss
p1        0         1.028241  1.049258  1.02429
bone      1.028241  0         1.036157  1.011354
lung      1.049258  1.036157  0         1.039524
tiss      1.02429   1.011519  1.039524  0

P3
          original  bone      lung      tiss
p3        0         1.02097   1.043942  1.025635
bone      1.020539  0         1.037073  1.014142
lung      1.044137  1.037073  0         1.037244
tiss      1.026016  1.014354  1.037244  0

P5
          original  bone      lung      tiss
p5        0         1.020947  1.047888  1.023039
bone      1.020947  0         1.038712  1.019146
lung      1.047888  1.038712  0         1.036131
tiss      1.023039  1.019924  1.036131  0
Cross-DICOM Comparison
p1tiss p1lung p1bone p1 p3tiss p3lung p3bone p3 p5tiss p5lung p5bone p5
p1tiss 0.0000 1.0395 1.0115 1.0243 0.9739 1.0390 1.0157 1.0223 0.9813 1.0325 1.0066 1.0234
p1lung 1.0395 0.0000 1.0362 1.0493 1.0362 0.9772 1.0361 1.0485 1.0410 0.9853 1.0412 1.0477
p1bone 1.0114 1.0362 0.0000 1.0282 1.0158 1.0378 0.9642 1.0278 1.0197 1.0365 0.9761 1.0247
p1 1.0243 1.0493 1.0282 0.0000 1.0255 1.0460 1.0258 0.9811 1.0258 1.0455 1.0240 1.0025
p3tiss 0.9741 1.0362 1.0168 1.0255 0.0000 1.0372 1.0144 1.0260 0.9810 1.0328 1.0140 1.0222
p3lung 1.0390 0.9772 1.0378 1.0460 1.0372 0.0000 1.0371 1.0441 1.0434 0.9874 1.0418 1.0513
p3bone 1.0137 1.0361 0.9650 1.0258 1.0141 1.0371 0.0000 1.0205 1.0175 1.0360 0.9728 1.0220
p3 1.0238 1.0485 1.0271 0.9811 1.0256 1.0439 1.0210 0.0000 1.0278 1.0414 1.0218 0.9997
p5tiss 0.9932 1.0410 1.0180 1.0258 0.9821 1.0434 1.0172 1.0278 0.0000 1.0361 1.0199 1.0230
p5lung 1.0325 0.9853 1.0365 1.0455 1.0328 0.9874 1.0360 1.0414 1.0361 0.0000 1.0387 1.0479
p5bone 1.0062 1.0412 0.9757 1.0240 1.0142 1.0418 0.9724 1.0217 1.0191 1.0387 0.0000 1.0209
p5 1.0234 1.0477 1.0247 1.0025 1.0222 1.0513 1.0220 0.9997 1.0230 1.0479 1.0209 0.0000
Conclusion: "How Many" vs. "How Little"
- NCD for Ordinal Comparisons
- Numerical Redundancy
(Figure: transformations arranged from Selective to Entire Picture and from Larger NCD to Smaller NCD: Gaussian/Speckle Noise, Salt and Pepper Noise, Steganography, Additive Constants, Contrast Windowing)
Feature Image Comparison and Grouping
- Feature image: pixel-based values derived from the original image
- 3 main types of linearization
- Average inter-group NCD > average intra-group NCD
- The greater the difference (inter - intra), the better NCD finds the groupings
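The inter - intra criterion can be computed from a pairwise NCD matrix and a group label per image. This is a minimal sketch with a hypothetical function name; it averages over all cross-group pairs, whereas the study's inter-group comparison may have been restricted to neighboring groups.

```python
import numpy as np


def intra_inter_gap(dist: np.ndarray, labels) -> tuple:
    """Return (mean intra-group NCD, mean inter-group NCD, inter - intra).

    Self-distances on the diagonal are excluded from the intra-group mean.
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]      # True where both images share a group
    off_diag = ~np.eye(len(labels), dtype=bool)
    intra = dist[same & off_diag].mean()
    inter = dist[~same].mean()
    return float(intra), float(inter), float(inter - intra)


if __name__ == "__main__":
    d = np.array([[0.00, 0.20, 0.90, 0.80],
                  [0.20, 0.00, 0.85, 0.95],
                  [0.90, 0.85, 0.00, 0.30],
                  [0.80, 0.95, 0.30, 0.00]])
    print(intra_inter_gap(d, ["a", "a", "b", "b"]))
```

A larger positive gap means images within a predefined group are, on average, more compressible together than images drawn from different groups.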
Feature Image Linearization
- Image-At-Once: row-order linearization, one feature image at a time
- Row Concatenation: appends all images, then performs row-order linearization
- Pixel Order: selects the value at the same pixel from each feature image, in row-order fashion
- Gray Row-Major: converts an image to grayscale and follows row order on intensities
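The four orderings can be sketched in NumPy under several assumptions: the feature images are equally sized 8-bit arrays (the R, G, and B channels serve as a hypothetical example), "appends all images" is read as placing them side by side, grayscale uses the ITU-R BT.601 luma weights, and the triple-concatenated gray variant is taken to be three copies of the gray string. Function names are illustrative.

```python
import numpy as np


def image_at_once(features) -> bytes:
    """Row-major linearization of each feature image, one image after another."""
    return b"".join(f.astype(np.uint8).tobytes() for f in features)


def row_concatenation(features) -> bytes:
    """Place the feature images side by side, then read the combined image row by row."""
    return np.hstack(features).astype(np.uint8).tobytes()


def pixel_order(features) -> bytes:
    """For each pixel in row order, emit that pixel's value from every feature image."""
    return np.stack(features, axis=-1).astype(np.uint8).tobytes()


def gray_row_major(rgb: np.ndarray, repeat: int = 1) -> bytes:
    """Grayscale an RGB image (BT.601 weights), read it row by row; repeat=3 for triple gray."""
    gray = np.round(rgb[..., :3] @ np.array([0.299, 0.587, 0.114]))
    return gray.astype(np.uint8).tobytes() * repeat


if __name__ == "__main__":
    r = np.array([[1, 2], [3, 4]], dtype=np.uint8)
    feats = [r, r + 10, r + 20]  # hypothetical R, G, B feature images
    print(list(image_at_once(feats)))
    print(list(row_concatenation(feats)))
    print(list(pixel_order(feats)))
```

Repeating the gray string three times makes its length match the three-feature-image orderings, which isolates file size as a variable when comparing NCD gaps.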
Data Set and Methods
- Corel Image Database with 10 predefined groupings
- Linearized by 5 methods
- NCDs were computed within each group and then against the neighboring groups to the left and to the right
Results
- Nearly every linearization produced statistically different NCDs
- Intra-group NCD was always less than inter-group NCD
- Gray provided the greatest inter - intra difference
  - Suspected this was due to file size
  - Triple-concatenated gray, which creates an equal file size, produced an even greater difference
Conclusion (Cont.)
- NCD is a good model for predefined human groupings, and linearization has little impact on this
- Gray-Triple Row-Major may be the best form of linearization
- Direction of concatenation does not matter
- Defined a methodology for any number of feature images
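Conclusion (Cont.)
- Compressor Errors
- Numerical Redundancy
- Ordinal Variables vs. Nominal Variables
  - Ex.: 195 195 195 195 vs. 198 198 198 198: NCD 0.100000
  - 199 199 199 199 vs. 202 202 202 202: NCD 0.128205
  - Both pairs differ by the same 3 intensity levels, yet their NCDs differ: the compressor treats pixel values as nominal symbols, not ordered numbers
- NCD needs refinement
- 2D image as a 1D string?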
Future Work
- Image Scaling and Normalization
- Additional Feature Images
- New Forms of Image Concatenation
- Investigate Compressors (Numeric?)
References
- A. Itani and D. Manohar. Self-describing context-based pixel ordering. Lecture Notes in Computer Science, pages 124-134, 2002.
- M. Li, X. Chen, X. Li, B. Ma, and P. M. B. Vitányi. The similarity metric. IEEE Transactions on Information Theory, 50(12), 2004.
- R. Dafner, D. Cohen-Or, and Y. Matias. Context-based space filling curves. In Computer Graphics Forum, volume 19, pages 209-218. Blackwell Publishers Ltd, 2000.
- R. Cilibrasi, A. L. Cruz, S. de Rooij, and M. Keijzer. CompLearn home. http://www.complearn.org/.
- R. Cilibrasi, P. Vitányi, and R. de Wolf. Algorithmic clustering of music. arXiv preprint cs.SD/0303025, 2003.
- N. Tran. The normalized compression distance and image distinguishability. Proceedings of SPIE, 6492, 64921D, 2007.
- I. Gondra and D. R. Heisterkamp. Content-based image retrieval with the normalized information distance. Computer Vision and Image Understanding, 111(2):219-228, 2008.
Questions