Title: Entropic graphs: Applications
1 Entropic graphs: Applications
- Alfred O. Hero
- Dept. EECS, Dept. BME, Dept. Statistics
- University of Michigan - Ann Arbor
- hero_at_eecs.umich.edu - http://www.eecs.umich.edu/hero
- Dimension reduction and pattern matching
- Entropic graphs for manifold learning
- Simulation studies
- Applications to face and digit databases
2 1. Dimension Reduction and Pattern Matching
- 128x128 images of faces
- Different poses, illuminations, facial expressions
- The set of all face images evolves on a lower-dimensional embedded manifold in R^16384
3 Face Manifold
4 Classification on Face Manifold
5 Manifold Learning: What is it good for?
- Interpreting high-dimensional data
- Discovery and exploitation of lower-dimensional structure
- Deducing non-linear dependencies between populations
- Improving detection and classification performance
- Improving image compression performance
6 Background on Manifold Learning
- Manifold intrinsic dimension estimation
  - Local KLE, Fukunaga, Olsen (1971)
  - Nearest neighbor algorithm, Pettis, Bailey, Jain, Dubes (1979)
  - Fractal measures, Camastra and Vinciarelli (2002)
  - Packing numbers, Kegl (2002)
- Manifold reconstruction
  - Isomap-MDS, Tenenbaum, de Silva, Langford (2000)
  - Locally Linear Embeddings (LLE), Roweis, Saul (2000)
  - Laplacian eigenmaps (LE), Belkin, Niyogi (2002)
  - Hessian eigenmaps (HE), Grimes, Donoho (2003)
- Characterization of sampling distributions on manifolds
  - Statistics of directional data, Watson (1956), Mardia (1972)
  - Data compression on 3D surfaces, Kolarov, Lynch (1997)
  - Statistics of shape, Kendall (1984), Kent, Mardia (2001)
7 Sampling on a Domain Manifold
[Figure labels: 2-dim manifold; embedding; sampling distribution; domain; sampling; a statistical sample; observed sample]
8 Learning 3D Manifolds
[Figure panels: Swiss Roll and S-Curve; sample sizes N = 400 and N = 800. Refs: Tenenbaum et al. (2000), Roweis et al. (2000)]
- Sampling density f_y = uniform on manifold
9 Sampled S-curve
What is the shortest path between points A and B along the manifold?
The geodesic from A to B is the shortest path.
The Euclidean path is a poor approximation.
10 Geodesic Graph Path Approximation
[Figure: path from A to B along the k-NNG skeleton, k = 4]
11 ISOMAP (PCA) Reconstruction
- Compute k-NN skeleton on observed sample
- Run Dijkstra's shortest path algorithm between all pairs of vertices of k-NN
- Generate geodesic pairwise distance matrix approximation
- Perform MDS on the geodesic distance matrix approximation
- Reconstruct sample in manifold domain
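
The reconstruction steps listed on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the slides: the function name `isomap_sketch`, its parameters, and the choice of classical (eigendecomposition-based) MDS are all assumptions made for this example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path
from scipy.spatial import cKDTree


def isomap_sketch(points, k=4, d=2):
    """Embed `points` (n x D array) into d dimensions, ISOMAP-style (hypothetical helper)."""
    n = points.shape[0]
    # Step 1: k-NN skeleton on the observed sample (column 0 is the point itself).
    dist, idx = cKDTree(points).query(points, k=k + 1)
    rows = np.repeat(np.arange(n), k)
    graph = csr_matrix((dist[:, 1:].ravel(), (rows, idx[:, 1:].ravel())), shape=(n, n))
    # Step 2: Dijkstra between all vertex pairs approximates geodesic distances.
    geo = shortest_path(graph, method="D", directed=False)
    if np.isinf(geo).any():
        raise ValueError("k-NN graph is disconnected; increase k")
    # Step 3: classical MDS on the geodesic distance matrix (double centering).
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (geo ** 2) @ J
    eigval, eigvec = np.linalg.eigh(B)
    top = np.argsort(eigval)[::-1][:d]
    # Step 4: coordinates of the sample in the d-dimensional domain.
    return eigvec[:, top] * np.sqrt(np.maximum(eigval[top], 0.0))
```

For example, calling `isomap_sketch(arc, k=4, d=1)` on points sampled along a semicircle should return coordinates that vary almost linearly with arc length. Note that d must be supplied by the caller, which is exactly the limitation the next slides address.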
12 ISOMAP Convergence
- Convergence holds when the domain mapping is an isometry, the domain is open and convex, and the true domain dimension d is known (de Silva et al. 2001)
- How to estimate d?
- How to estimate attributes of sampling density?
13 How to Estimate d?
[Figure: Landmark-ISOMAP residual curve for the Abilene Netflow data set]
14 2. Entropic Graphs
- Sample of n points in D-dimensional Euclidean space
- Euclidean MST with edge power weighting gamma
- pairwise distance matrix over the sample
- edge length matrix of spanning trees over the sample
- Euclidean k-NNG with edge power weighting gamma
- When the pairwise distance matrix holds geodesic distances, obtain the Geodesic MST
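
The two graph length functionals named on this slide can be sketched in a few lines. The slide's own formulas did not survive extraction, so the definitions below are the standard ones (sum of edge lengths raised to the power gamma) and should be read as an assumption; the function names `mst_length` and `knn_graph_length` are hypothetical. The k-NNG length here sums, for every point, the edges to its k nearest neighbours, so an edge shared by two mutual neighbours is counted twice.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial import cKDTree
from scipy.spatial.distance import pdist, squareform


def mst_length(points, gamma=1.0):
    """Sum of |e|^gamma over the edges of the Euclidean MST of `points`."""
    dist = squareform(pdist(points))      # pairwise distance matrix
    tree = minimum_spanning_tree(dist)    # sparse matrix holding the n-1 MST edges
    return float(np.sum(tree.data ** gamma))


def knn_graph_length(points, k=4, gamma=1.0):
    """Sum of |e|^gamma over each point's edges to its k nearest neighbours."""
    dist, _ = cKDTree(points).query(points, k=k + 1)
    return float(np.sum(dist[:, 1:] ** gamma))  # column 0 is the point itself
```

Replacing the Euclidean matrix from `pdist` with a matrix of graph-based geodesic distances would turn `mst_length` into a geodesic MST length in the sense of this slide.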
15 Example: Uniform Planar Sample
16 Example: MST on Planar Sample
17 Example: k-NNG on Planar Sample
18 Convergence of Euclidean MST
Beardwood, Halton, Hammersley Theorem
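
For reference, the Beardwood-Halton-Hammersley limit in its power-weighted form is usually stated as below. The slide's own formula was lost in extraction, so the notation is assumed: L_gamma is the MST length with edge exponent gamma, X_n is a set of n i.i.d. points with compactly supported density f in R^d, and beta_{d,gamma} is a constant that does not depend on f.

```latex
\lim_{n \to \infty}
  \frac{L_\gamma^{\mathrm{MST}}(\mathcal{X}_n)}{n^{(d-\gamma)/d}}
  = \beta_{d,\gamma} \int f(x)^{(d-\gamma)/d} \, dx
  \quad \text{(a.s.)}, \qquad 0 < \gamma < d .
```

The growth exponent (d - gamma)/d depends only on the dimension, and the limiting constant depends on f only through an integral that determines its alpha-entropy with alpha = (d - gamma)/d.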
19 GMST Convergence Theorem
Ref: Costa, Hero, T-SP (2003)
20 k-NNG Convergence Theorem
21 Shrinkwrap Interpretation
[Figure panels: n = 400, n = 800]
Dimension: shrinkage rate as the number of resampled points on M varies
22 Joint Estimation Algorithm
- Convergence theorem suggests log-linear model
- Use bootstrap resampling to estimate mean graph length and apply LS to jointly estimate slope and intercept from sequence
- Extract d and H from slope and intercept
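
A minimal sketch of the dimension part of this procedure is given below, under several assumptions: it uses the Euclidean MST (adequate for a flat sample; a curved manifold would need geodesic distances), it resamples without replacement to avoid duplicate points, and it assumes the log-linear model log L_n = a log n + b with a = (d - gamma)/d. The names `mst_length` and `estimate_dimension` are hypothetical. Extracting H from the intercept additionally requires the constant beta_{d,gamma}, which is not computed here.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_length(points, gamma=1.0):
    """Power-weighted length of the Euclidean MST of `points`."""
    tree = minimum_spanning_tree(squareform(pdist(points)))
    return float(np.sum(tree.data ** gamma))


def estimate_dimension(points, sizes, n_boot=10, gamma=1.0, seed=0):
    """Return (d_hat, slope, intercept) from an LS fit of log length on log n."""
    rng = np.random.default_rng(seed)
    mean_len = []
    for n in sizes:
        # Resampling step: average the graph length over random subsamples of size n.
        lens = [mst_length(points[rng.choice(len(points), n, replace=False)], gamma)
                for _ in range(n_boot)]
        mean_len.append(np.mean(lens))
    # Assumed model: log L_n = a log n + b, with slope a = (d - gamma) / d.
    slope, intercept = np.polyfit(np.log(sizes), np.log(mean_len), 1)
    d_hat = gamma / (1.0 - slope)  # real-valued; round to get an integer dimension
    return d_hat, slope, intercept
```

On a uniform sample in the unit square, `estimate_dimension(pts, sizes=[200, 300, 400, 500])` should return a real-valued estimate that rounds to 2.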
23 3. Simulation Studies: Swiss Roll
[Figure panels: GMST and kNN graphs, k = 4]
- n = 400, f = uniform on manifold
24 Estimates of GMST Length
Bootstrap SE bar (83% CI)
25 Log-log Linear Fit to GMST Length
26 GMST Dimension and Entropy Estimates
- From LS fit find
- Intrinsic dimension estimate
- Alpha-entropy estimate
- Ground truth
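
The estimator formulas for this slide were lost in extraction. Assuming the log-linear model log L_n = a log n + b implied by the convergence theorem, with a-hat and b-hat the least-squares slope and intercept, the estimates would take roughly the following form. The notation is assumed, and beta_{d,gamma} is the limiting constant of the graph functional, which must be known or approximated separately.

```latex
\hat{d} = \operatorname{round}\!\left\{ \frac{\gamma}{1 - \hat{a}} \right\},
\qquad
\hat{H}_{\alpha} = \frac{\hat{d}}{\gamma}
  \left( \hat{b} - \log \beta_{\hat{d},\gamma} \right),
\qquad
\alpha = \frac{\hat{d} - \gamma}{\hat{d}},
```

where the alpha-entropy is the Rényi entropy

```latex
H_{\alpha}(f) = \frac{1}{1 - \alpha} \log \int f(x)^{\alpha} \, dx .
```

These follow from matching a = (d - gamma)/d and b = log beta_{d,gamma} + (gamma/d) H_alpha term by term.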
27 MST/kNN Comparisons
[Figure panels: MST and kNN, each for n = 400 and n = 800]
28 Entropic Graphs on S2 Sphere in 3D
[Figure panels: GMST and kNN]
- n = 500, f = uniform on manifold
29 k-NNG on Sphere S4 in 5D
- k = 7 for all algorithms
- kNN resampled 5 times
- Length regressed on 10 or 20 samples at end of mean length sequence
- 30 experiments performed
- ISOMAP always estimates d = 5
[Figure: Histogram of resampled d-estimates of k-NNG, N = 1000 points uniformly distributed on S4 (sphere) in 5D]
[Table: relative frequencies of correct d estimate, by n]
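
A test sample like the one described on this slide can be generated by normalizing Gaussian vectors, a standard way to draw points uniformly from a sphere. This is an illustrative sketch; the function name `sample_sphere` and the seed are assumptions, and the slides do not say how their sample was produced.

```python
import numpy as np


def sample_sphere(n, dim, seed=0):
    """Draw n points uniformly from the unit sphere S^(dim-1) embedded in R^dim."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n, dim))  # isotropic Gaussian directions
    return g / np.linalg.norm(g, axis=1, keepdims=True)


points = sample_sphere(1000, 5)  # N = 1000 points on S4 in 5D
```

The intrinsic dimension of this sample is 4 while its extrinsic dimension is 5, which is the gap a dimension estimator has to detect.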
30 kNN/GMST Comparisons
Table of relative frequencies of correct d estimate
True entropy
Estimated entropy (n = 600)
31 kNN/GMST Comparisons for Uniform Hyperplane
[Figure panels: GMST and 4-NN]
32 Improve Performance by Bootstrap Resampling
- Main idea: averaging of weak learners
- Using fewer (N) samples per MST estimate, generate a large number (M) of weak estimates of d and H
- Reduce bias by averaging these estimates (M >> 1, N = 1)
- Better than optimizing estimate of MST length (M = 1, N >> 1)
Illustration of bootstrap resampling method: A, B: N = 1 vs C: M = 1
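
The two regimes contrasted on this slide can be compared with a small sketch. It assumes gamma = 1, a Euclidean MST on a flat sample, and resampling without replacement; the helper names `d_estimate` and `averaged_d_estimate` are hypothetical. Setting M >> 1 and N = 1 averages many weak dimension estimates, while M = 1 and N >> 1 forms a single estimate from a carefully averaged length sequence.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_length(points):
    """Total edge length of the Euclidean MST (gamma = 1)."""
    return float(minimum_spanning_tree(squareform(pdist(points))).data.sum())


def d_estimate(points, sizes, N, rng):
    """One estimate of d from lengths averaged over N subsamples per size."""
    mean_len = [np.mean([mst_length(points[rng.choice(len(points), n, replace=False)])
                         for _ in range(N)]) for n in sizes]
    slope = np.polyfit(np.log(sizes), np.log(mean_len), 1)[0]
    return 1.0 / (1.0 - slope)  # d = gamma / (1 - slope) with gamma = 1


def averaged_d_estimate(points, sizes, M, N, seed=0):
    """Average M estimates of d, each built from N length samples per size."""
    rng = np.random.default_rng(seed)
    return float(np.mean([d_estimate(points, sizes, N, rng) for _ in range(M)]))
```

For a uniform planar sample `pts`, `averaged_d_estimate(pts, sizes, M=20, N=1)` corresponds to the weak-learner regime and `averaged_d_estimate(pts, sizes, M=1, N=20)` to the single-estimate regime. This sketch only shows how the two are set up; it does not by itself demonstrate the bias reduction claimed on the slide.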
33 kNN/GMST Comparisons for Uniform Hyperplane
Table of relative frequencies of correct d estimate using the GMST, with (N = 1) and without (M = 1) bias correction.
34 4. Application: ISOMAP Face Database
- http://isomap.stanford.edu/datasets.html
- Synthesized 3D face surface
- Computer generated images representing 700 different angles and illuminations
- Subsampled to 64 x 64 resolution (D = 4096)
- Disagreement over intrinsic dimensionality
- d = 3 (Tenenbaum) vs d = 4 (Kegl)
35 Application: Yale Face Database
- Description of Yale Face Database B
- Photographic folios of many people's faces
- Each face folio contains images at 585 different illumination/pose conditions
- Subsampled to 64 by 64 pixels (4096 extrinsic dimensions)
- Objective: determine intrinsic dimension and entropy of a typical face folio
36 Samples from Face Database B
37 GMST for 3 Face Folios
38 Dimension Estimator Histograms for Face Database B
Real-valued intrinsic dimension estimates using 3-NN graph for face 1.
Real-valued intrinsic dimension estimates using 3-NN graph for face 2.
39 Remarks on Yale Face Database B
- GMST LS estimation parameters
  - Local geodesic approximation used to generate pairwise distance matrix
  - Estimates based on 25 resamplings over 18 largest folio sizes
- To represent any folio we might hope to attain
  - factor > 600 reduction in degrees of freedom (dim)
  - only 1/10 bit per pixel for compression
  - a practical parameterization/encoder?
40 Application: MNIST Digit Database
Sample MNIST Handwritten Digits
41 MNIST Digit Database
Histogram of intrinsic dimension estimates: GMST (left) and 5-NN (right) (M = 1, N = 10, Q = 15).
[Figure axis: estimated intrinsic dimension]
42 MNIST Digit Database
ISOMAP (k = 6) residual variance plot.
The digits database contains nonlinear transformations, such as width distortions of each digit, that are not adequately modeled by ISOMAP!
43 Conclusions
- Entropic graphs give accurate global and consistent estimators of dimension and entropy
- Manifold learning and model reduction
  - LLE, LE, HE estimate d by finding local linear representation of manifold
  - Entropic graph estimates d from global resampling
  - Initialization of ISOMAP with entropic graph estimator
- Computational considerations
  - GMST, kNN with pairwise distance matrix: O(E log E)
  - GMST with greedy neighborhood search: O(d n log n)
  - kNN with kdb tree partitioning: O(d n log n)
44 References
- A. O. Hero, B. Ma, O. Michel and J. D. Gorman, "Application of entropic graphs," IEEE Signal Processing Magazine, Sept 2002.
- H. Neemuchwala, A. O. Hero and P. Carson, "Entropic graphs for image registration," to appear in European Journal of Signal Processing, 2003.
- J. Costa and A. O. Hero, "Manifold learning with geodesic minimal spanning trees," to appear in IEEE T-SP (Special Issue on Machine Learning), 2004.
- A. O. Hero, J. Costa and B. Ma, "Convergence rates of minimal graphs with random vertices," submitted to IEEE T-IT, March 2001.
- J. Costa, A. O. Hero and C. Vignat, "On solutions to multivariate maximum alpha-entropy problems," in Energy Minimization Methods in Computer Vision and Pattern Recognition (EMM-CVPR), Eds. M. Figueiredo, R. Rangarajan, J. Zerubia, Springer-Verlag, 2003.