Title: Dimensionality Reduction
1 Dimensionality Reduction
- CS 685 Special Topics in Data Mining
- Jinze Liu
2 Overview
- What is Dimensionality Reduction?
- Simplifying complex data
- Using dimensionality reduction as a data mining tool
- Useful for both data modeling and data analysis
- A tool for clustering and regression
- Linear Dimensionality Reduction Methods
- Principal Component Analysis (PCA)
- Multi-Dimensional Scaling (MDS)
- Non-Linear Dimensionality Reduction
3 What is Dimensionality Reduction?
- Given N objects, each with M measurements, find the best D-dimensional parameterization
- Goal: find a compact parameterization or latent-variable representation
- Given N examples x_i in R^M, find corresponding y_i in R^D, where D < M
- Underlying assumptions of DimRedux:
- Measurements over-specify the data, M > D
- The number of measurements exceeds the number of true degrees of freedom in the system
- The measurements capture all of the significant variability
4 Uses for DimRedux
- Build a compact model of the data
- Compression for storage, transmission, retrieval
- Parameters for indexing, exploring, and organizing
- Generate plausible new data
- Answer fundamental questions about the data
- What is its underlying dimensionality? How many degrees of freedom are exhibited? How many latent variables?
- How independent are my measurements?
- Is there a projection of my data set where important relationships stand out?
5 DimRedux in Data Modeling
- Data clustering: continuous to discrete
- The curse of dimensionality: sampling density is proportional to N^(1/p), so a sample of 100 points that is dense in one dimension would need on the order of 100^10 points to be equally dense in 10 dimensions
- Need a mapping to a lower-dimensional space that preserves important relations
- Regression modeling: continuous to continuous
- A functional model that generates the input data
- Useful for interpolation
- Embedding space
6 Today's Focus
- Linear DimRedux methods
- PCA: Pearson (1901), Hotelling (1933)
- MDS: Torgerson (1952), Shepard (1962)
- Linear assumption
- Data is a linear function of the parameters (latent variables)
- Data lies on a linear (affine) subspace: x = M y + b, where the matrix M is m x d
7 PCA: What problem does it solve?
- Minimizes the least-squares (Euclidean) error
- The D-dimensional model provided by PCA has the smallest Euclidean error, sum_i ||x_i - xhat_i||^2, of any D-parameter linear model, where xhat_i is the prediction of the D-dimensional PCA model
- Projects the data so that the variance is maximized
- Finds an optimal orthogonal basis set for describing the given data
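To make these two views concrete, the following is a minimal MATLAB sketch, with illustrative data X and target dimension D = 2 (neither is from the slides): projecting the centered data onto the top D eigenvectors of the covariance matrix gives the best D-parameter linear model, and the residual error is the variance along the discarded directions.

  X  = randn(3, 500);               % illustrative data: M = 3 measurements, N = 500 samples
  D  = 2;                           % target dimensionality
  N  = size(X, 2);
  mu = mean(X, 2);                  % sample mean
  Xc = X - mu * ones(1, N);         % centered data
  [V, S]   = eig(cov(Xc'));         % eigenvectors/eigenvalues of the covariance matrix
  [lam, k] = sort(diag(S), 'descend');
  V  = V(:, k);                     % principal directions, greatest variance first
  Y  = V(:, 1:D)' * Xc;             % D-dimensional parameterization (scores)
  Xhat = mu * ones(1, N) + V(:, 1:D) * Y;   % model prediction xhat_i
  err  = sum(sum((X - Xhat).^2));   % least-squares error of the D-parameter model
  % err / (N - 1) equals sum(lam(D+1:end)), the variance along the discarded directions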
8 Principal Component Analysis
- Also known to engineers as the Karhunen-Loève Transform (KLT)
- Rotate data points to align successive axes with the directions of greatest variance
- Subtract the mean from the data
- Normalize the variance along each direction, and reorder the directions by variance magnitude from high to low
- Each normalized variance direction is a principal component
- These are the eigenvectors of the system's covariance matrix, permuted so that the eigenvalues are in descending order
9 Simple PCA Example
- Simple 3D example
- >> x = rand(2, 500);
- >> z = [1, 0; 0, 1; -1, -1] * x + 0.01 * ones(3, 500);
- >> m = (100 * rand(3, 3)) * z + rand(3, 500);
- >> scatter3(m(1,:), m(2,:), m(3,:), 'filled')
10 Simple PCA Example (cont)
- >> mm = m - mean(m')' * ones(1, 500);
- >> [E, L] = eig(cov(mm'));
- >> E
- E =
-     0.8029   -0.5958    0.0212
-     0.1629    0.2535    0.9535
-     0.5735    0.7621   -0.3006
- >> L
- L =
-   172.2525          0         0
-          0   116.2234         0
-          0          0    0.0837
- >> newm = E' * (m - mean(m')' * ones(1, 500));
- >> scatter3(newm(1,:), newm(2,:), newm(3,:), 'filled')
- >> axis([-50 50 -50 50 -50 50])
11 Simple PCA Example (cont)
12 PCA Applied to Reillumination
- Illumination can be modeled as an additive
linear system.
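- That is, by superposition, the image produced by several lights at once is the pixel-wise sum of the images produced by each light alone, and scaling a light's intensity scales its image by the same factor: I(w1 L1 + w2 L2) = w1 I(L1) + w2 I(L2)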
13 Simulating New Lighting
- We can simulate the appearance of a model under new illumination by combining images taken from a set of basis lights
- We can then capture real-world lighting and use it to modulate our basis lighting functions
14 Problems
- There are too many basis lighting functions
- These have to be stored in order to use them
- The resulting lighting model can be huge, in particular when representing high-frequency lighting
- Lighting differences can be very subtle
- The cost of modulation is excessive
- Every basis image must be scaled and added together
- Each image requires a high dynamic range
- Is there a more compact representation?
- Yes, use PCA.
15 PCA Applied to Illumination
- More than 90% of the variance is captured in the first five principal components
- Generate new illumination by combining only 5 basis images
- (Equation: the relit image is a weighted sum of the basis-light images, w_0 V_0 + w_1 V_1 + ..., summed over the n lights)
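A rough MATLAB sketch of this idea, assuming the basis-light images have been vectorized into the columns of a matrix B; the image size, the number of lights, the captured weights w, and the five-component choice are illustrative stand-ins rather than the setup used in the original work:

  B = rand(100*100, 32);               % stand-in: 32 basis-light images, 100 x 100 pixels each
  w = rand(32, 1);                     % stand-in: captured intensity of each real-world light
  n = size(B, 2);

  mu = mean(B, 2);                     % mean image
  Bc = B - mu * ones(1, n);            % centered basis images
  [U, S, V] = svd(Bc, 'econ');         % PCA via SVD; columns of U are eigen-images
  k  = 5;                              % keep only the first five principal components
  Uk = U(:, 1:k);                      % five eigen-images stored instead of n basis images
  C  = Uk' * Bc;                       % k x n coefficients describing each basis light

  relit = mu * sum(w) + Uk * (C * w);  % approximates B * w using only the five stored images
  img   = reshape(relit, 100, 100);    % back to image form for display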
16 Results Video
17 Results Video
18 Results Video
19 MDS: What problem does it solve?
- Takes as input a dissimilarity matrix M, containing the pairwise dissimilarities between N data points
- Finds the best D-dimensional linear parameterization compatible with M
- (In other words, it outputs a projection of the data into D-dimensional space where the pairwise distances match the original dissimilarities as faithfully as possible)
20 Multidimensional Scaling (MDS)
- Dissimilarities can be metric or non-metric
- Useful when absolute measurements are unavailable; uses relative measurements
- Computation is invariant to the dimensionality of the data
21 An example: map of the US
- Given only the distance between a bunch of cities
22 An example: map of the US
- MDS finds suitable coordinates for the points in a space of the specified dimension.
23 MDS Properties
- The parameterization is not unique: the axes are meaningless
- Not surprising, since Euclidean transformations and reflections preserve distances between points
- Useful for visualizing relationships in high-dimensional data:
- Define a dissimilarity measure
- Map to a lower-dimensional space using MDS
- A common preprocessing step before cluster analysis
- Aids in understanding patterns and relationships in data
- Widely used in marketing and psychometrics
24 Dissimilarities
- Dissimilarities are distance-like quantities that satisfy the following conditions: d(i, j) >= 0, d(i, i) = 0, and d(i, j) = d(j, i)
- A dissimilarity is metric if, in addition, it satisfies the triangle inequality: d(i, k) <= d(i, j) + d(j, k)
25 Relating MDS to PCA
- Special case: when the distances are Euclidean
- PCA: eigendecomposition of the covariance matrix M^T M
- Convert the pairwise distance matrix to the covariance matrix
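A small MATLAB check of this special case on synthetic data (sizes and variable names are illustrative): classical MDS applied to the Euclidean pairwise distances reproduces the 2-D PCA coordinates up to a sign flip of each axis.

  N  = 200;
  X  = randn(6, N);                          % 200 points, 6 measurements each
  Xc = X - mean(X, 2) * ones(1, N);          % centered data

  % PCA scores: project onto the top-2 eigenvectors of the covariance matrix
  [V, S] = eig(cov(Xc'));
  [~, k] = sort(diag(S), 'descend');
  pcaScore = V(:, k(1:2))' * Xc;

  % Classical MDS on the squared Euclidean distances
  sq = sum(Xc.^2, 1);
  D2 = sq' * ones(1, N) + ones(N, 1) * sq - 2 * (Xc' * Xc);   % squared pairwise distances
  H  = eye(N) - ones(N) / N;                 % centering matrix
  B  = -0.5 * H * D2 * H;                    % doubly centered matrix
  [W, L]   = eig((B + B') / 2);
  [lam, j] = sort(diag(L), 'descend');
  mdsScore = (W(:, j(1:2)) * diag(sqrt(lam(1:2))))';

  % The two embeddings agree up to a sign flip of each axis:
  % max(abs(abs(pcaScore(:)) - abs(mdsScore(:)))) is numerically zero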
26 How to get M^T M from Euclidean Pairwise Distances
- Law of cosines: d_ij^2 = d_i^2 + d_j^2 - 2 x_i . x_j, where d_i and d_j are the distances of points i and j from the origin
- Definition of a dot product: b_ij = x_i . x_j = -1/2 (d_ij^2 - d_i^2 - d_j^2)
- Eigendecomposition of B gives B = V S V^T
- V S^(1/2) is the matrix of new coordinates
27 Algebraically
- Writing b_ij = -1/2 (d_ij^2 - mean_k d_ik^2 - mean_k d_kj^2 + mean_kl d_kl^2) removes the row, column, and grand means of the squared distances, so we centered the matrix
28 MDS Mechanics
- Given a dissimilarity matrix D, the MDS model is computed as B = -1/2 H D^(2) H, where D^(2) is the matrix of squared dissimilarities
- H, the so-called centering matrix, is computed as H = I - (1/N) 1 1^T, i.e. the identity matrix minus the constant matrix with entries 1/N
- The MDS coordinates are given by X = V S^(1/2), where V and S hold the eigenvectors and eigenvalues of B, taken in order of decreasing eigenvalue
29 MDS Stress
- The residual variance of B (i.e., the sum of the remaining eigenvalues) indicates the goodness of fit of the selected d-dimensional model
- This term is often called the MDS stress
- Examining the residual variance gives an indication of the inherent dimensionality
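Continuing the sketch above (reusing N and the sorted eigenvalues lam), a rough way to examine this is to plot the residual variance against the candidate dimension and look for an elbow:

  ev    = max(lam, 0);                       % clip small negative eigenvalues
  resid = 1 - cumsum(ev) / sum(ev);          % variance left out by a d-dimensional model
  plot(1:N, resid, '-o');
  xlabel('model dimension d'); ylabel('residual variance (stress)');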
30 Reflectance Modeling Example
The top row of white, grey, and black balls has the same physical reflectance parameters; the bottom row, however, is perceptually more consistent.
- From Pellacini et al., "Toward a Psychophysically-Based Light Reflection Model for Image Synthesis," SIGGRAPH 2000
- Objective: find a perceptually meaningful parameterization for reflectance modeling
31 Reflectance Modeling Example
- User task: subjects were presented with 378 pairs of rendered spheres and asked to rate their difference in glossiness on a scale of 0 (no difference) to 100
- A 27 x 27 dissimilarity matrix was constructed and MDS applied
32 Reflectance Modeling Example
- Parameters of a 2D embedding space were determined
- Two axes of gloss were established
33 Limitations of Linear Methods
- What if the data does not lie within a linear subspace?
- Do all convex combinations of the measurements generate plausible data?
- A low-dimensional nonlinear manifold may be embedded in a higher-dimensional space
- Next time: Nonlinear Dimensionality Reduction
34 Nonlinear Dimensionality Reduction
- Many data sets contain essential nonlinear structures that are invisible to PCA and MDS
- This calls for nonlinear dimensionality reduction approaches
- Kernel methods
- Depend on the choice of kernel
- Most kernels are not data dependent
35 Nonlinear Approaches: Isomap
Josh Tenenbaum, Vin de Silva, John Langford (2000)
- Construct the neighbourhood graph G
- For each pair of points in G, compute shortest-path distances: the geodesic distances
- Use classical MDS with the geodesic distances
- (Figure: Euclidean distance vs. geodesic distance along the manifold)
36 Sample points with Swiss Roll
- Altogether there are 20,000 points in the Swiss
roll data set. We sample 1000 out of 20,000.
37 Construct neighborhood graph G
- K-nearest neighborhood (K = 7)
- DG is the 1000 x 1000 (Euclidean) distance matrix between neighboring points (figure A)
38 Compute all-pairs shortest paths in G
- Now DG is the 1000 x 1000 geodesic distance matrix between arbitrary pairs of points along the manifold (figure B)
39 Use MDS to embed the graph in R^d
Find a d-dimensional Euclidean embedding Y (figure C) that preserves the pairwise distances.
40 The Isomap algorithm
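A minimal MATLAB sketch of the three steps listed on the Isomap slide above (neighborhood graph, shortest paths, classical MDS); K = 7 matches the Swiss-roll walkthrough, while the stand-in data X, the sample size, and the plain Floyd-Warshall loop are illustrative choices, not the reference implementation from the Isomap webpage.

  X = rand(3, 200);                      % stand-in for the sampled manifold points
  N = size(X, 2);  K = 7;  d = 2;

  % Step 1: Euclidean distances and the K-nearest-neighbor graph
  sq = sum(X.^2, 1);
  DE = sqrt(max(sq' * ones(1, N) + ones(N, 1) * sq - 2 * (X' * X), 0));
  DG = inf(N);                           % graph distances: inf = not neighbors
  for i = 1:N
      [~, idx] = sort(DE(i, :));
      nb = idx(2:K+1);                   % K nearest neighbors (skip the point itself)
      DG(i, nb) = DE(i, nb);
      DG(nb, i) = DE(i, nb)';            % keep the graph symmetric
  end
  DG(1:N+1:end) = 0;

  % Step 2: all-pairs shortest paths (Floyd-Warshall) approximate the geodesics
  for k = 1:N
      DG = min(DG, DG(:, k) * ones(1, N) + ones(N, 1) * DG(k, :));
  end

  % Step 3: classical MDS on the geodesic distances
  H = eye(N) - ones(N) / N;
  B = -0.5 * H * (DG.^2) * H;
  [V, L] = eig((B + B') / 2);
  [lam, order] = sort(diag(L), 'descend');
  Y = V(:, order(1:d)) * diag(sqrt(lam(1:d)));   % N x d low-dimensional embedding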
41 PCA, MDS vs. Isomap
42 Isomap Advantages
- Nonlinear
- Globally optimal
- Still produces a globally optimal low-dimensional Euclidean representation even when the input space is highly folded, twisted, or curved
- Guaranteed asymptotically to recover the true dimensionality
43 Isomap Disadvantages
- May not be stable; depends on the topology of the data
- Guaranteed asymptotically to recover the geometric structure of nonlinear manifolds
- As N increases, pairwise distances provide better approximations to geodesics, but cost more computation
- If N is small, geodesic distances will be very inaccurate
44 Applications
- Isomap and Nonparametric Models of Image Deformation
- LLE and Isomap Analysis of Spectra and Colour Images
- Image Spaces and Video Trajectories: Using Isomap to Explore Video Sequences
- Mining the structural knowledge of high-dimensional medical data using Isomap
- Isomap webpage: http://isomap.stanford.edu/
45 Summary
- Linear dimensionality reduction tools are widely used for
- Data analysis
- Data preprocessing
- Data compression
- PCA transforms the measurement data so that successive directions of greatest variance are mapped to orthogonal axis directions (bases)
- A D-dimensional embedding space (parameterization) can be established by modeling the data using only the first D of these basis vectors
- The residual modeling error is the sum of the remaining eigenvalues
46 Summary (cont)
- MDS finds a d-dimensional parameterization that best preserves a given dissimilarity matrix
- The resulting model can be Euclidean-transformed to align the data with a more intuitive parameterization
- A d-dimensional embedding space (parameterization) is established by modeling the data using only the first d coordinates of the scaled eigenvectors
- The residual modeling error (MDS stress) is the sum of the remaining eigenvalues
- If a Euclidean metric dissimilarity matrix is used for MDS, the resulting d-dimensional model will match the PCA weights for the same dimensional model