1
ECE 471/571 - Lecture 5
  • Dimensionality Reduction
  • 01/29/09

2
The Curse of Dimensionality 1st Aspect
  • The number of training samples
  • What would the probability density function look
    like if the dimensionality is very high?
  • In a 7-dimensional space where each variable can
    take 20 possible values, the 7-d histogram
    contains 20^7 (about 1.28 billion) cells. Distributing
    a training set of any reasonable size (say, 1000
    samples) among this many cells leaves virtually
    all of the cells empty, as illustrated below
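
A quick back-of-the-envelope check of this claim (a sketch added here, not part of the original slides; the bin count and sample size are the illustrative numbers above):

    # Sketch: how sparsely 1000 samples can populate a d-dimensional
    # histogram with 20 bins per axis.
    bins_per_axis = 20
    n_samples = 1000
    for d in range(1, 8):
        cells = bins_per_axis ** d
        # Each sample occupies at most one cell, so this is an upper
        # bound on the fraction of non-empty cells.
        occupied = min(1.0, n_samples / cells)
        print(f"d={d}: {cells:,} cells, at most {occupied:.4%} occupied")

For d = 7, at most about one cell in a million is non-empty.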

3
Curse of Dimensionality 2nd Aspect
  • Accuracy and overfitting
  • In theory, the higher the dimensionality, the
    lower the error and the better the performance.
    However, in realistic pattern recognition problems,
    the opposite is often true. Why?
  • The assumption that the pdf behaves like a
    Gaussian is only approximately true
  • When increasing the dimensionality, we may be
    overfitting the training set
  • Problem: excellent performance on the training
    set, but poor performance on new data points that
    are in fact very close to the data in the
    training set

4
Curse of Dimensionality - 3rd Aspect
  • Computational complexity

5
Dimensionality Reduction
  • Fisher's linear discriminant
    • Best at discriminating the data
  • Principal component analysis (PCA)
    • Best at representing the data

6
Fisher's Linear Discriminant
  • For the two-class case, projection of the data
    from d dimensions onto a line
  • Principle: we'd like to find a vector w (the
    direction of the line) such that the projected
    data set can be best separated

Projected mean: $\tilde{m}_i = w^T m_i$, where the sample mean of class i is $m_i = \frac{1}{n_i}\sum_{x \in D_i} x$
7
Other Approaches?
  • Solution 1: make the projected means as far apart
    as possible
  • Solution 2: also account for the scatter of the
    projected samples within each class (Fisher's
    criterion, developed on the following slides)

Scatter matrix of class i: $S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^T$
Between-class scatter matrix: $S_B = (m_1 - m_2)(m_1 - m_2)^T$
Within-class scatter matrix: $S_W = S_1 + S_2$
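
As a concrete illustration (a minimal sketch in NumPy, not from the slides; X1 and X2 are assumed to be the two classes' sample matrices, one row per sample):

    import numpy as np

    def fisher_direction(X1, X2):
        # Sample means and per-class scatter matrices
        m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
        S1 = (X1 - m1).T @ (X1 - m1)
        S2 = (X2 - m2).T @ (X2 - m2)
        Sw = S1 + S2                      # within-class scatter
        # Fisher direction: w is proportional to Sw^{-1} (m1 - m2)
        w = np.linalg.solve(Sw, m1 - m2)
        return w / np.linalg.norm(w)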
8
The Generalized Rayleigh Quotient
Canonical variate
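
The criterion on this slide is the standard generalized Rayleigh quotient (reconstructed here because the slide's equation images did not survive extraction):

    J(w) = \frac{w^T S_B w}{w^T S_W w},
    \qquad
    w^* = \arg\max_w J(w) \;\propto\; S_W^{-1}(m_1 - m_2)

The projection $y = w^T x$ along the optimal direction is the canonical variate referred to above.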
9
Some Math Preliminaries
  • Positive definite
    • A matrix S is positive definite if $x^T S x > 0$
      for all $x \in \mathbb{R}^d$ except $x = 0$
    • $x^T S x$ is called a quadratic form
    • The derivative of a quadratic form is
      particularly useful (see the identity after this
      list)
  • Eigenvalue and eigenvector
    • x is called an eigenvector of A iff x is not
      zero and $Ax = \lambda x$
    • $\lambda$ is the eigenvalue associated with x
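
The identity referred to above (a standard matrix-calculus result, written out here because the slide's equation did not extract):

    \frac{\partial}{\partial x}\left(x^T S x\right) = (S + S^T)\,x = 2 S x \quad \text{when } S = S^T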

10
Multiple Discriminant Analysis
  • For a c-class problem, the projection is from the
    d-dimensional space to a (c-1)-dimensional space
    (assume d > c)
  • Sec. 3.8.3
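
For reference, the multi-class criterion of Sec. 3.8.3 generalizes the two-class quotient (reconstructed; not present in the extracted text): the projection matrix W maximizes

    J(W) = \frac{\left| W^T S_B W \right|}{\left| W^T S_W W \right|}

and its columns are the generalized eigenvectors satisfying $S_B w_i = \lambda_i S_W w_i$ for the $c-1$ largest eigenvalues.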

11
Principal Component Analysis or K-L Transform
  • How do we find a new, m-dimensional feature space
    that is adequate to describe the original
    d-dimensional feature space? Suppose m < d

[Figure: original axes x1, x2 and rotated principal axes y1, y2]
12
K-L Transform (1)
  • Describe the vector x in terms of a set of basis
    vectors $b_i$
  • The basis vectors $b_i$ should be linearly
    independent and orthonormal, that is, they satisfy
    the conditions below
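
In symbols (reconstructed from the standard K-L expansion, since the slide's equations did not extract):

    x = \sum_{i=1}^{d} y_i\, b_i,
    \qquad y_i = b_i^T x,
    \qquad b_i^T b_j = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}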

13
K-L Transform (2)
  • Suppose we wish to ignore all but m (m < d)
    components of y and still represent x, although
    with some error. We will thus calculate the first
    m elements of y and replace the others with
    constants

Error
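
The approximation and its error take the standard K-L form (reconstructed; the original equation image is not in the transcript):

    \hat{x} = \sum_{i=1}^{m} y_i\, b_i + \sum_{i=m+1}^{d} a_i\, b_i,
    \qquad
    \Delta x = x - \hat{x} = \sum_{i=m+1}^{d} (y_i - a_i)\, b_i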
14
K-L Transform (3)
  • Use mean-square error to quantify the error

15
K-L Transform (4)
  • Find the optimal $a_i$ to minimize $\bar{\varepsilon}^2$
  • Therefore, the error is now equal to the
    expression below
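
Setting the derivative with respect to each $a_i$ to zero gives $a_i = E[y_i]$, and the minimum error becomes (reconstructed from the standard derivation):

    \bar{\varepsilon}^2(m) = \sum_{i=m+1}^{d} b_i^T\, \Sigma_x\, b_i,
    \qquad
    \Sigma_x = E\!\left[ (x - E[x])(x - E[x])^T \right]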

16
K-L Transform (5)
  • The optimal choice of basis vectors is the
    eigenvectors of $\Sigma_x$
  • The expansion of a random vector in terms of the
    eigenvectors of the covariance matrix is referred
    to as the Karhunen-Loève expansion, or the K-L
    expansion
  • Without loss of generality, we sort the
    eigenvectors $b_i$ by their eigenvalues, that is,
    $\lambda_1 > \lambda_2 > \cdots > \lambda_d$. Then we refer to $b_1$,
    corresponding to $\lambda_1$, as the major eigenvector,
    or principal component

17
Summary
  • Raw data → covariance matrix → eigenvalues →
    eigenvectors → principal components
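
A compact sketch of that pipeline in NumPy (illustrative only, not part of the lecture; X is an n-by-d data matrix and m is the number of components kept):

    import numpy as np

    def pca(X, m):
        mu = X.mean(axis=0)
        cov = np.cov(X - mu, rowvar=False)       # raw data -> covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)   # -> eigenvalues / eigenvectors
        order = np.argsort(eigvals)[::-1]        # sort by decreasing eigenvalue
        B = eigvecs[:, order[:m]]                # principal components b_1..b_m
        return (X - mu) @ B                      # m-dimensional features y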