Title: Dimensionality Reduction
1 Dimensionality Reduction
- CS 685 Special Topics in Data Mining
- Jinze Liu
2 Overview
- What is Dimensionality Reduction?
- Simplifying complex data
- Using dimensionality reduction as a data mining tool
- Useful for both data modeling and data analysis
- A tool for clustering and regression
- Linear Dimensionality Reduction Methods
- Principal Component Analysis (PCA)
- Multi-Dimensional Scaling (MDS)
- Non-Linear Dimensionality Reduction
3 What is Dimensionality Reduction?
- Given N objects, each with M measurements, find the best D-dimensional parameterization
- Goal: find a compact parameterization or latent-variable representation
- Given N examples x_i in R^M, find corresponding y_i in R^D, where D < M
- Underlying assumptions of DimRedux:
- Measurements over-specify the data, M > D
- The number of measurements exceeds the number of true degrees of freedom in the system
- The measurements capture all of the significant variability
4 Uses for DimRedux
- Build a compact model of the data
- Compression for storage, transmission, retrieval
- Parameters for indexing, exploring, and organizing
- Generate plausible new data
- Answer fundamental questions about the data
- What is its underlying dimensionality? How many degrees of freedom are exhibited? How many latent variables?
- How independent are my measurements?
- Is there a projection of my data set where important relationships stand out?
5 DimRedux in Data Modeling
- Data clustering: continuous to discrete
- The curse of dimensionality: sampling density is proportional to N^(1/p), so a sample of 100 points that is dense in one dimension would need on the order of 100^10 points to be equally dense in 10 dimensions
- Need a mapping to a lower-dimensional space that preserves important relations
- Regression modeling: continuous to continuous
- A functional model that generates the input data
- Useful for interpolation
- Embedding space
6 Today's Focus
- Linear DimRedux methods
- PCA: Pearson (1901), Hotelling (1933)
- MDS: Torgerson (1952), Shepard (1962)
- Linear assumption
- Data is a linear function of the parameters (latent variables)
- Data lies on a linear (affine) subspace: x = M y + b, where the matrix M is m x d
7 PCA: What problem does it solve?
- Minimizes the least-squares (Euclidean) error
- The D-dimensional model provided by PCA has the smallest Euclidean error, sum_i ||x_i - xhat_i||^2, of any D-parameter linear model, where xhat_i is the prediction of the D-dimensional PCA model
- Projects the data so that the variance is maximized
- Finds an optimal orthogonal basis set for describing the given data
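To make these two views concrete, the following is a minimal MATLAB sketch, with illustrative data X and target dimension D = 2 (neither is from the slides): projecting the centered data onto the top D eigenvectors of the covariance matrix gives the best D-parameter linear model, and the residual error is the variance along the discarded directions.

  X  = randn(3, 500);               % illustrative data: M = 3 measurements, N = 500 samples
  D  = 2;                           % target dimensionality
  N  = size(X, 2);
  mu = mean(X, 2);                  % sample mean
  Xc = X - mu * ones(1, N);         % centered data
  [V, S]   = eig(cov(Xc'));         % eigenvectors/eigenvalues of the covariance matrix
  [lam, k] = sort(diag(S), 'descend');
  V  = V(:, k);                     % principal directions, greatest variance first
  Y  = V(:, 1:D)' * Xc;             % D-dimensional parameterization (scores)
  Xhat = mu * ones(1, N) + V(:, 1:D) * Y;   % model prediction xhat_i
  err  = sum(sum((X - Xhat).^2));   % least-squares error of the D-parameter model
  % err / (N - 1) equals sum(lam(D+1:end)), the variance along the discarded directions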
8 Principal Component Analysis
- Also known to engineers as the Karhunen-Loève Transform (KLT)
- Rotate data points to align successive axes with the directions of greatest variance
- Subtract the mean from the data
- Normalize the variance along each direction, and reorder the directions by variance magnitude from high to low
- Each normalized variance direction is a principal component
- These are the eigenvectors of the system's covariance matrix, permuted so that the eigenvalues are in descending order
9 Simple PCA Example
- Simple 3D example
- >> x = rand(2, 500);
- >> z = [1, 0; 0, 1; -1, -1] * x + 0.01 * ones(3, 500);
- >> m = (100 * rand(3, 3)) * z + rand(3, 500);
- >> scatter3(m(1,:), m(2,:), m(3,:), 'filled')
10 Simple PCA Example (cont)
- >> mm = m - mean(m')' * ones(1, 500);
- >> [E, L] = eig(cov(mm'));
- >> E
- E =
-     0.8029   -0.5958    0.0212
-     0.1629    0.2535    0.9535
-     0.5735    0.7621   -0.3006
- >> L
- L =
-   172.2525          0         0
-          0   116.2234         0
-          0          0    0.0837
- >> newm = E' * (m - mean(m')' * ones(1, 500));
- >> scatter3(newm(1,:), newm(2,:), newm(3,:), 'filled')
- >> axis([-50 50 -50 50 -50 50])
11 Simple PCA Example (cont)
12 PCA Applied to Reillumination
- Illumination can be modeled as an additive
linear system.
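- That is, by superposition, the image produced by several lights at once is the pixel-wise sum of the images produced by each light alone, and scaling a light's intensity scales its image by the same factor: I(w1 L1 + w2 L2) = w1 I(L1) + w2 I(L2)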
13 Simulating New Lighting
- We can simulate the appearance of a model under new illumination by combining images taken from a set of basis lights
- We can then capture real-world lighting and use it to modulate our basis lighting functions
14 Problems
- There are too many basis lighting functions
- These have to be stored in order to use them
- The resulting lighting model can be huge, in particular when representing high-frequency lighting
- Lighting differences can be very subtle
- The cost of modulation is excessive
- Every basis image must be scaled and added together
- Each image requires a high dynamic range
- Is there a more compact representation?
- Yes, use PCA.
15 PCA Applied to Illumination
- More than 90% of the variance is captured in the first five principal components
- Generate new illumination by combining only 5 basis images
- (Equation: the relit image is a weighted sum of the basis-light images, w_0 V_0 + w_1 V_1 + ..., summed over the n lights)
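A rough MATLAB sketch of this idea, assuming the basis-light images have been vectorized into the columns of a matrix B; the image size, the number of lights, the captured weights w, and the five-component choice are illustrative stand-ins rather than the setup used in the original work:

  B = rand(100*100, 32);               % stand-in: 32 basis-light images, 100 x 100 pixels each
  w = rand(32, 1);                     % stand-in: captured intensity of each real-world light
  n = size(B, 2);

  mu = mean(B, 2);                     % mean image
  Bc = B - mu * ones(1, n);            % centered basis images
  [U, S, V] = svd(Bc, 'econ');         % PCA via SVD; columns of U are eigen-images
  k  = 5;                              % keep only the first five principal components
  Uk = U(:, 1:k);                      % five eigen-images stored instead of n basis images
  C  = Uk' * Bc;                       % k x n coefficients describing each basis light

  relit = mu * sum(w) + Uk * (C * w);  % approximates B * w using only the five stored images
  img   = reshape(relit, 100, 100);    % back to image form for display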
16 Results Video
17 Results Video
18 Results Video
19 MDS: What problem does it solve?
- Takes as input a dissimilarity matrix M, containing the pairwise dissimilarities between N data points
- Finds the best D-dimensional linear parameterization compatible with M
- (In other words, it outputs a projection of the data into D-dimensional space where the pairwise distances match the original dissimilarities as faithfully as possible)
20 Multidimensional Scaling (MDS)
- Dissimilarities can be metric or non-metric
- Useful when absolute measurements are unavailable; uses relative measurements
- Computation is invariant to the dimensionality of the data
21 An example: map of the US
- Given only the distance between a bunch of cities
22 An example: map of the US
- MDS finds suitable coordinates for the points in a space of the specified dimension.
23 MDS Properties
- The parameterization is not unique: the axes are meaningless
- Not surprising, since Euclidean transformations and reflections preserve distances between points
- Useful for visualizing relationships in high-dimensional data:
- Define a dissimilarity measure
- Map to a lower-dimensional space using MDS
- A common preprocessing step before cluster analysis
- Aids in understanding patterns and relationships in data
- Widely used in marketing and psychometrics
24 Dissimilarities
- Dissimilarities are distance-like quantities that satisfy the following conditions: d(i, j) >= 0, d(i, i) = 0, and d(i, j) = d(j, i)
- A dissimilarity is metric if, in addition, it satisfies the triangle inequality: d(i, k) <= d(i, j) + d(j, k)
25 Relating MDS to PCA
- Special case: when the distances are Euclidean
- PCA: eigendecomposition of the covariance matrix M^T M
- Convert the pairwise distance matrix to the covariance matrix
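A small MATLAB check of this special case on synthetic data (sizes and variable names are illustrative): classical MDS applied to the Euclidean pairwise distances reproduces the 2-D PCA coordinates up to a sign flip of each axis.

  N  = 200;
  X  = randn(6, N);                          % 200 points, 6 measurements each
  Xc = X - mean(X, 2) * ones(1, N);          % centered data

  % PCA scores: project onto the top-2 eigenvectors of the covariance matrix
  [V, S] = eig(cov(Xc'));
  [~, k] = sort(diag(S), 'descend');
  pcaScore = V(:, k(1:2))' * Xc;

  % Classical MDS on the squared Euclidean distances
  sq = sum(Xc.^2, 1);
  D2 = sq' * ones(1, N) + ones(N, 1) * sq - 2 * (Xc' * Xc);   % squared pairwise distances
  H  = eye(N) - ones(N) / N;                 % centering matrix
  B  = -0.5 * H * D2 * H;                    % doubly centered matrix
  [W, L]   = eig((B + B') / 2);
  [lam, j] = sort(diag(L), 'descend');
  mdsScore = (W(:, j(1:2)) * diag(sqrt(lam(1:2))))';

  % The two embeddings agree up to a sign flip of each axis:
  % max(abs(abs(pcaScore(:)) - abs(mdsScore(:)))) is numerically zero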
26 How to get M^T M from Euclidean Pairwise Distances
- Law of cosines: d_ij^2 = d_i^2 + d_j^2 - 2 x_i . x_j, where d_i and d_j are the distances of points i and j from the origin
- Definition of a dot product: b_ij = x_i . x_j = -1/2 (d_ij^2 - d_i^2 - d_j^2)
- Eigendecomposition of B gives B = V S V^T
- V S^(1/2) is the matrix of new coordinates
27 Algebraically
- Writing b_ij = -1/2 (d_ij^2 - mean_k d_ik^2 - mean_k d_kj^2 + mean_kl d_kl^2) removes the row, column, and grand means of the squared distances, so we centered the matrix
28 MDS Mechanics
- Given a dissimilarity matrix D, the MDS model is computed as B = -1/2 H D^(2) H, where D^(2) is the matrix of squared dissimilarities
- H, the so-called centering matrix, is computed as H = I - (1/N) 1 1^T, i.e. the identity matrix minus the constant matrix with entries 1/N
- The MDS coordinates are given by X = V S^(1/2), where V and S hold the eigenvectors and eigenvalues of B, taken in order of decreasing eigenvalue
29 MDS Stress
- The residual variance of B (i.e., the sum of the remaining eigenvalues) indicates the goodness of fit of the selected d-dimensional model
- This term is often called the MDS stress
- Examining the residual variance gives an indication of the inherent dimensionality
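Continuing the sketch above (reusing N and the sorted eigenvalues lam), a rough way to examine this is to plot the residual variance against the candidate dimension and look for an elbow:

  ev    = max(lam, 0);                       % clip small negative eigenvalues
  resid = 1 - cumsum(ev) / sum(ev);          % variance left out by a d-dimensional model
  plot(1:N, resid, '-o');
  xlabel('model dimension d'); ylabel('residual variance (stress)');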
30 Reflectance Modeling Example
The top row of white, grey, and black balls has the same physical reflectance parameters; the bottom row, however, is perceptually more consistent.
- From Pellacini et al., "Toward a Psychophysically-Based Light Reflection Model for Image Synthesis," SIGGRAPH 2000
- Objective: find a perceptually meaningful parameterization for reflectance modeling
31 Reflectance Modeling Example
- User task: subjects were presented with 378 pairs of rendered spheres and asked to rate their difference in glossiness on a scale of 0 (no difference) to 100
- A 27 x 27 dissimilarity matrix was constructed and MDS applied
32 Reflectance Modeling Example
- Parameters of a 2D embedding space were determined
- Two axes of gloss were established
33 Limitations of Linear Methods
- What if the data does not lie within a linear subspace?
- Do all convex combinations of the measurements generate plausible data?
- A low-dimensional nonlinear manifold may be embedded in a higher-dimensional space
- Next time: Nonlinear Dimensionality Reduction
34 Nonlinear Dimensionality Reduction
- Many data sets contain essential nonlinear structures that are invisible to PCA and MDS
- This calls for nonlinear dimensionality reduction approaches
- Kernel methods
- Depend on the choice of kernel
- Most kernels are not data dependent
35 Nonlinear Approaches: Isomap
Josh Tenenbaum, Vin de Silva, John Langford (2000)
- Construct the neighbourhood graph G
- For each pair of points in G, compute shortest-path distances: the geodesic distances
- Use classical MDS with the geodesic distances
- (Figure: Euclidean distance vs. geodesic distance along the manifold)
36 Sample points with Swiss Roll
- Altogether there are 20,000 points in the Swiss
roll data set. We sample 1000 out of 20,000.
37 Construct neighborhood graph G
- K-nearest neighborhood (K = 7)
- DG is the 1000 x 1000 (Euclidean) distance matrix between neighboring points (figure A)
38 Compute all-pairs shortest paths in G
- Now DG is the 1000 x 1000 geodesic distance matrix between arbitrary pairs of points along the manifold (figure B)
39 Use MDS to embed the graph in R^d
Find a d-dimensional Euclidean embedding Y (figure C) that preserves the pairwise distances.
40 The Isomap algorithm
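A minimal MATLAB sketch of the three steps listed on the Isomap slide above (neighborhood graph, shortest paths, classical MDS); K = 7 matches the Swiss-roll walkthrough, while the stand-in data X, the sample size, and the plain Floyd-Warshall loop are illustrative choices, not the reference implementation from the Isomap webpage.

  X = rand(3, 200);                      % stand-in for the sampled manifold points
  N = size(X, 2);  K = 7;  d = 2;

  % Step 1: Euclidean distances and the K-nearest-neighbor graph
  sq = sum(X.^2, 1);
  DE = sqrt(max(sq' * ones(1, N) + ones(N, 1) * sq - 2 * (X' * X), 0));
  DG = inf(N);                           % graph distances: inf = not neighbors
  for i = 1:N
      [~, idx] = sort(DE(i, :));
      nb = idx(2:K+1);                   % K nearest neighbors (skip the point itself)
      DG(i, nb) = DE(i, nb);
      DG(nb, i) = DE(i, nb)';            % keep the graph symmetric
  end
  DG(1:N+1:end) = 0;

  % Step 2: all-pairs shortest paths (Floyd-Warshall) approximate the geodesics
  for k = 1:N
      DG = min(DG, DG(:, k) * ones(1, N) + ones(N, 1) * DG(k, :));
  end

  % Step 3: classical MDS on the geodesic distances
  H = eye(N) - ones(N) / N;
  B = -0.5 * H * (DG.^2) * H;
  [V, L] = eig((B + B') / 2);
  [lam, order] = sort(diag(L), 'descend');
  Y = V(:, order(1:d)) * diag(sqrt(lam(1:d)));   % N x d low-dimensional embedding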
41 PCA, MDS vs. Isomap
42 Isomap Advantages
- Nonlinear
- Globally optimal
- Still produces a globally optimal low-dimensional Euclidean representation even when the input space is highly folded, twisted, or curved
- Guaranteed asymptotically to recover the true dimensionality
43 Isomap Disadvantages
- May not be stable; depends on the topology of the data
- Guaranteed asymptotically to recover the geometric structure of nonlinear manifolds
- As N increases, pairwise distances provide better approximations to geodesics, but cost more computation
- If N is small, geodesic distances will be very inaccurate
44 Applications
- Isomap and Nonparametric Models of Image Deformation
- LLE and Isomap Analysis of Spectra and Colour Images
- Image Spaces and Video Trajectories: Using Isomap to Explore Video Sequences
- Mining the structural knowledge of high-dimensional medical data using Isomap
- Isomap webpage: http://isomap.stanford.edu/
45 Summary
- Linear dimensionality reduction tools are widely used for
- Data analysis
- Data preprocessing
- Data compression
- PCA transforms the measurement data so that successive directions of greatest variance are mapped to orthogonal axis directions (bases)
- A D-dimensional embedding space (parameterization) can be established by modeling the data using only the first D of these basis vectors
- The residual modeling error is the sum of the remaining eigenvalues
46 Summary (cont)
- MDS finds a d-dimensional parameterization that best preserves a given dissimilarity matrix
- The resulting model can be Euclidean-transformed to align the data with a more intuitive parameterization
- A d-dimensional embedding space (parameterization) is established by modeling the data using only the first d coordinates of the scaled eigenvectors
- The residual modeling error (MDS stress) is the sum of the remaining eigenvalues
- If a Euclidean metric dissimilarity matrix is used for MDS, the resulting d-dimensional model will match the PCA weights for the same dimensional model