1
Dimensionality Reduction
  • CS 685: Special Topics in Data Mining
  • Jinze Liu

2
Overview
  • What is Dimensionality Reduction?
  • Simplifying complex data
  • Using dimensionality reduction as a Data Mining
    tool
  • Useful for both data modeling and data
    analysis
  • Tool for clustering and regression
  • Linear Dimensionality Reduction Methods
  • Principal Component Analysis (PCA)
  • Multi-Dimensional Scaling (MDS)
  • Non-Linear Dimensionality Reduction

3
What is Dimensionality Reduction?
  • Given N objects, each with M measurements, find
    the best D-dimensional parameterization
  • Goal: Find a compact parameterization or
    latent-variable representation
  • Given N examples x_i ∈ R^M, find y_i ∈ R^D,
    where D < M
  • Underlying assumptions of DimRedux
  • Measurements over-specify the data: M > D
  • The number of measurements exceeds the number of
    true degrees of freedom in the system
  • The measurements capture all of the significant
    variability

4
Uses for DimRedux
  • Build a compact model of the data
  • Compression for storage, transmission,
    retrieval
  • Parameters for indexing, exploring, and
    organizing
  • Generate plausible new data
  • Answer fundamental questions about data
  • What is its underlying dimensionality? How many
    degrees of freedom are exhibited? How many
    latent variables?
  • How independent are my measurements?
  • Is there a projection of my data set where
    important relationships stand out?

5
DimRedux in Data Modeling
  • Data Clustering - Continuous to Discrete
  • The curse of dimensionality: the sampling density
    is proportional to N^(1/p), so matching the density
    of N samples in one dimension requires on the order
    of N^p samples in p dimensions
  • Need a mapping to a lower-dimensional space that
    preserves important relations
  • Regression Modeling - Continuous to Continuous
  • A functional model that generates input data
  • Useful for interpolation
  • Embedding Space

6
Today's Focus
  • Linear DimRedux methods
  • PCA: Pearson (1901), Hotelling (1935)
  • MDS: Torgerson (1952), Shepard (1962)
  • Linear Assumption
  • Data is a linear function of the parameters
    (latent variables)
  • Data lies on a linear (Affine) subspace

x = My + b, where the matrix M is m x d and y holds the d latent variables
7
PCA: What problem does it solve?
  • Minimizes least-squares (Euclidean) error
  • The D-dimensional model provided by PCA has the
    smallest Euclidean error, E = Σ_i ||x_i - x̂_i||²,
    of any D-parameter linear model,
  • where x̂_i is the model predicted by the
    D-dimensional PCA.
  • Projects data s.t. the variance is maximized
  • Find an optimal orthogonal basis set for
    describing the given data

8
Principal Component Analysis
  • Also known to engineers as the Karhunen-Loève
    Transform (KLT)
  • Rotate data points to align successive axes with
    directions of greatest variance
  • Subtract mean from data
  • Normalize variance along each direction, and
    reorder according to the variance magnitude from
    high to low
  • Normalized variance direction = principal
    component
  • Eigenvectors of the system's covariance matrix,
    permuted to order the eigenvectors in descending
    order of eigenvalue (as sketched below)
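A minimal MATLAB sketch of the steps on this slide, assuming the data sit in an M x N matrix X (the variable names here are illustrative, not from the original slides):

    % X is M x N: N observations, each with M measurements
    D  = 2;                                      % illustrative target dimension
    N  = size(X, 2);
    mu = mean(X, 2);                             % M x 1 mean measurement vector
    Xc = X - mu * ones(1, N);                    % subtract the mean from every column
    [V, L] = eig(cov(Xc'));                      % eigenvectors of the covariance matrix
    [lambda, idx] = sort(diag(L), 'descend');    % reorder by variance magnitude
    V  = V(:, idx);                              % principal components, largest variance first
    Y  = V(:, 1:D)' * Xc;                        % D x N coordinates in the principal basis
    residual = sum(lambda(D+1:end));             % error of the D-dimensional model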

9
Simple PCA Example
  • Simple 3D example
  • >> x = rand(2, 500);                                        % 2D latent points
  • >> z = [1, 0; 0, 1; -1, -1] * x + [0; 0; 1] * ones(1, 500); % embed in 3D
  • >> m = (100 * rand(3, 3)) * z + rand(3, 500);               % random linear map plus noise
  • >> scatter3(m(1,:), m(2,:), m(3,:), 'filled')

10
Simple PCA Example (cont)
  • >> mm = (m - mean(m')' * ones(1, 500));       % subtract the mean
  • >> [E, L] = eig(cov(mm'));                    % eigenvectors / eigenvalues
  • >> E
  • E =
  •     0.8029   -0.5958    0.0212
  •     0.1629    0.2535    0.9535
  •     0.5735    0.7621   -0.3006
  • >> L
  • L =
  •   172.2525         0         0
  •          0  116.2234         0
  •          0         0    0.0837
  • >> newm = E' * (m - mean(m')' * ones(1, 500)); % rotate into the principal basis
  • >> scatter3(newm(1,:), newm(2,:), newm(3,:), 'filled')
  • >> axis([-50, 50, -50, 50, -50, 50])

11
Simple PCA Example (cont)
12
PCA Applied to Reillumination
  • Illumination can be modeled as an additive
    linear system.

13
Simulating New Lighting
  • We can simulate the appearance of a model under
    new illumination by combining images taken from a
    set of basis lights
  • We can then capture real-world lighting and use
    it to modulate our basis lighting functions

14
Problems
  • There are too many basis lighting functions
  • These have to be stored in order to use them
  • The resulting lighting model can be huge, in
    particular when representing high frequency
    lighting
  • Lighting differences can be very subtle
  • The cost of modulation is excessive
  • Every basis image must be scaled and added
    together
  • Each image requires high dynamic range
  • Is there a more compact representation?
  • Yes, use PCA.

15
PCA Applied to Illumination
  • More than 90% of the variance is captured in the
    first five principal components
  • Generate new illumination by combining only 5
    basis images

(Figure: new illumination as a weighted combination of the basis light images V0 ... for n lights)
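As a hedged MATLAB illustration of this idea (the matrix layout, the SVD route, and the random weights are assumptions for the sketch, not the original pipeline), the basis light images can be compressed with PCA and new lighting synthesized from only the leading five components:

    % B is numPixels x n: one column per basis light image (illustrative layout)
    n  = size(B, 2);
    mu = mean(B, 2);
    Bc = B - mu * ones(1, n);
    [U, S, ~] = svd(Bc, 'econ');              % principal components of the image set
    U5 = U(:, 1:5);                           % keep only the first five basis images
    C5 = U5' * Bc;                            % 5 x n coefficients of the original lights
    w  = rand(n, 1);                          % weights from captured lighting (placeholder)
    newImage = mu * sum(w) + U5 * (C5 * w);   % combine 5 images instead of n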
16
Results Video
17
Results Video
18
Results Video
19
MDS: What problem does it solve?
  • Takes as input a dissimilarity matrix M,
    containing pairwise dissimilarities between
    N-dimensional data points
  • Finds the best D-dimensional linear
    parameterization compatible with M
  • (in other words, outputs a projection of data in
    D-dimensional space where the pairwise distances
    match the original dissimilarities as faithfully
    as possible)

20
Multidimensional Scaling (MDS)
  • Dissimilarities can be metric or non-metric
  • Useful when absolute measurements are
    unavailable; uses relative measurements
  • Computation is invariant to dimensionality of
    data

21
An example map of the US
  • Given only the distances between a set of cities

22
An example map of the US
  • MDS finds suitable coordinates for the points in
    a space of the specified dimension.

23
MDS Properties
  • Parameterization is not unique; the axes are
    meaningless
  • Not surprising since Euclidean transformations
    and reflections preserve distances between points
  • Useful for visualizing relationships in high
    dimensional data.
  • Define a dissimilarity measure
  • Map to a lower-dimensional space using MDS
  • A common preprocessing step before cluster analysis
  • Aids in understanding patterns and relationships
    in data
  • Widely used in marketing and psychometrics

24
Dissimilarities
  • Dissimilarities are distance-like quantities that
    satisfy the following conditions:
    δ(i, j) ≥ 0, δ(i, i) = 0, and δ(i, j) = δ(j, i)
  • A dissimilarity is metric if, in addition, it
    satisfies the triangle inequality:
    δ(i, k) ≤ δ(i, j) + δ(j, k)

25
Relating MDS to PCA
  • Special case when distances are Euclidean
  • PCA: eigendecomposition of the covariance matrix MᵀM
  • Convert the pair-wise distance matrix to the
    covariance matrix

26
How to get MᵀM from Euclidean Pair-wise Distances
  • Law of cosines / definition of a dot product:
    d²_ij = b_ii + b_jj - 2 b_ij, where b_ij = x_i · x_j
  • Solving for the inner products:
    b_ij = -1/2 (d²_ij - b_ii - b_jj)
  • Eigendecomposition of B gives V S Vᵀ
  • V S^(1/2) = matrix of new coordinates

27
Algebraically
b_ij = -1/2 (d²_ij - (1/N) Σ_k d²_ik - (1/N) Σ_k d²_kj + (1/N²) Σ_k Σ_l d²_kl)
So we centered the matrix of squared distances
(subtracting its row, column, and grand means).
28
MDS Mechanics
  • Given a dissimilarity matrix D, the MDS model is
    computed as B = -1/2 H D⁽²⁾ H, where D⁽²⁾ holds the
    squared dissimilarities
  • H, the so-called centering matrix, is computed from
    the identity matrix as H = I - (1/N) 1 1ᵀ
  • MDS coordinates are given by Y = V S^(1/2), using the
    eigenvectors of B in order of decreasing eigenvalue
    (see the sketch below)
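A minimal MATLAB sketch of this computation, assuming D is an N x N symmetric dissimilarity matrix and d is the target dimension (the names are illustrative):

    N = size(D, 1);
    d = 2;                                    % illustrative target dimension
    H = eye(N) - ones(N) / N;                 % centering matrix H = I - (1/N) * ones
    B = -0.5 * H * (D .^ 2) * H;              % double-centered squared dissimilarities
    [V, S] = eig((B + B') / 2);               % symmetrize to guard against round-off
    [s, idx] = sort(diag(S), 'descend');      % eigenvalues in decreasing order
    Y = V(:, idx(1:d)) * diag(sqrt(s(1:d)));  % N x d MDS coordinates
    stress = sum(s(d+1:end));                 % residual variance of the remaining eigenvalues

Applied to the inter-city distance table from the earlier example with d = 2, this recovers a map of the cities up to rotation and reflection.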

29
MDS Stress
  • The residual variance of B (i.e., the sum of the
    remaining eigenvalues) indicates the goodness of
    fit of the selected d-dimensional model
  • This term is often called MDS stress
  • Examining the residual variance gives an
    indication of the inherent dimensionality

30
Reflectance Modeling Example
The top row of white, grey, and black balls has the
same physical reflectance parameters; however, the
bottom row is perceptually more consistent.
  • From Pellacini et al., "Toward a
    Psychophysically-Based Light Reflection Model for
    Image Synthesis," SIGGRAPH 2000
  • Objective: Find a perceptually meaningful
    parameterization for reflectance modeling

31
Reflectance Modeling Example
  • User Task: Subjects were presented with 378
    pairs of rendered spheres and asked to rate their
    difference in glossiness on a scale of 0 (no
    difference) to 100.
  • A 27 x 27 dissimilarity matrix was constructed
    and MDS applied

32
Reflectance Modeling Example
  • Parameters of a 2D embedding space were
    determined
  • Two axes of gloss were established

33
Limitations of Linear methods
  • What if the data does not lie within a linear
    subspace?
  • Do all convex combinations of the measurements
    generate plausible data?
  • Low-dimensional non-linear manifold embedded in a
    higher-dimensional space
  • Next time: Nonlinear Dimensionality Reduction

34
Nonlinear Dimensionality Reduction
  • Many data sets contain essential nonlinear
    structures that are invisible to PCA and MDS
  • We therefore resort to nonlinear dimensionality
    reduction approaches
  • Kernel methods
  • Depend on the choice of kernel
  • Most kernels are not data-dependent

35
Nonlinear Approaches - Isomap
Josh Tenenbaum, Vin de Silva, John Langford (2000)
  • Construct the neighbourhood graph G
  • For each pair of points in G, compute the
    shortest-path distances, i.e., the geodesic distances
  • Use classical MDS with the geodesic distances
    (see the sketch after this list)
  • (Figure: Euclidean distance vs. geodesic distance
    along the manifold)
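A compact MATLAB sketch of these three steps, assuming X holds one sample per row; pdist and squareform come from the Statistics Toolbox, and the whole sketch is illustrative rather than the authors' reference code:

    K = 7;  d = 2;                            % neighbourhood size and target dimension
    DX = squareform(pdist(X));                % N x N Euclidean distance matrix
    N  = size(DX, 1);

    % Step 1: K-nearest-neighbour graph (inf marks a missing edge)
    DG = inf(N);
    for i = 1:N
        [~, nn] = sort(DX(i, :));
        DG(i, nn(1:K+1)) = DX(i, nn(1:K+1));  % nn(1) is the point itself
    end
    DG = min(DG, DG');                        % symmetrize the graph

    % Step 2: geodesic distances = all-pairs shortest paths (Floyd-Warshall);
    % assumes the neighbourhood graph is connected (no inf left in DG)
    for k = 1:N
        DG = min(DG, repmat(DG(:, k), 1, N) + repmat(DG(k, :), N, 1));
    end

    % Step 3: classical MDS on the geodesic distance matrix
    H = eye(N) - ones(N) / N;
    B = -0.5 * H * (DG .^ 2) * H;
    [V, S] = eig((B + B') / 2);
    [s, idx] = sort(diag(S), 'descend');
    Y = V(:, idx(1:d)) * diag(sqrt(s(1:d)));  % N x d Isomap embedding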

36
Sample points with Swiss Roll
  • Altogether there are 20,000 points in the Swiss
    roll data set. We sample 1000 out of 20,000.

37
Construct neighborhood graph G
  • K-nearest neighborhood (K = 7)
  • DG is the 1000 x 1000 (Euclidean) distance matrix
    between neighbors (figure A)

38
Compute all-pairs shortest paths in G
  • Now DG is the 1000 x 1000 matrix of geodesic
    distances between arbitrary pairs of points along
    the manifold (figure B)

39
Use MDS to embed the graph in R^d
Find a d-dimensional Euclidean space Y (figure C)
that preserves the pairwise distances.
40
The Isomap algorithm
41
PCA, MDS vs. Isomap
42
Isomap Advantages
  • Nonlinear
  • Globally optimal
  • Still produces a globally optimal low-dimensional
    Euclidean representation even when the input space
    is highly folded, twisted, or curved
  • Guaranteed asymptotically to recover the true
    dimensionality

43
Isomap Disadvantages
  • May not be stable; depends on the topology of the data
  • Guaranteed asymptotically to recover geometric
    structure of nonlinear manifolds
  • As N increases, pairwise distances provide better
    approximations to geodesics, but cost more
    computation
  • If N is small, geodesic distances will be very
    inaccurate.

44
Applications
  • Isomap and Nonparametric Models of Image
    Deformation
  • LLE and Isomap Analysis of Spectra and Colour
    Images
  • Image Spaces and Video Trajectories Using Isomap
    to Explore Video Sequences
  • Mining the structural knowledge of
    high-dimensional medical data using isomap

Isomap webpage: http://isomap.stanford.edu/
45
Summary
  • Linear dimensionality reduction tools are widely
    used for
  • Data analysis
  • Data preprocessing
  • Data compression
  • PCA transforms the measurement data s.t.
    successive directions of greatest variance are
    mapped to orthogonal axis directions (bases)
  • A D-dimensional embedding space (parameterization)
    can be established by modeling the data using only
    the first D of these basis vectors
  • Residual modeling error is the sum of the
    remaining eigenvalues

46
Summary (cont)
  • MDS finds a d-dimensional parameterization that
    best preserves a given dissimilarity matrix
  • Resulting model can be Euclidean transformed to
    align data with a more intuitive parameterization
  • A d-dimensional embedding space (parameterization)
    is established by modeling the data using only the
    first d coordinates of the scaled eigenvectors
  • Residual modeling error (MDS stress) is the sum
    of the remaining eigenvalues
  • If a Euclidean (metric) dissimilarity matrix is used
    for MDS, the resulting d-dimensional model will
    match the PCA weights for the same-dimensional
    model