Title: Matrix Decompositions in Dimension Reduction for Undersampled Clustered Data
1. Matrix Decompositions in Dimension Reduction for Undersampled Clustered Data
- Haesun Park
- 8803: Numerical Methods in CSE
2. Cluster Structure Preserving Dimension Reduction for Feature Extraction
- Algorithms: LDA/GSVD, Orthogonal Centroid, extension to kernel-based nonlinear methods
- Applications: text classification, face recognition, fingerprint classification
- Experimental results: classification with kNN, SVM, and centroid-based classification
3. 2D Visualization
Importance of utilizing cluster structure.
[Figure: 2D representation of 150 x 1000 data with 7 clusters, LDA vs. SVD]
4. Facial Recognition
AT&T (ORL) Face Database
- image size: 92 x 112
- 400 frontal images: 40 persons, 10 images each, with variations in pose and facial expression
- severely undersampled
[Figure: the 1st and the 35th sample subjects]
5. Dimension Reduction
Pipeline:
1. Data preprocessing of the original images/data
2. Form the data matrix A = [a_1, ..., a_n] (m x n), one m x 1 item per column
3. Dimension-reducing transformation: each m x 1 item is mapped to a q x 1 vector, giving a q x n lower-dimensional representation
4. Classification in the lower-dimensional representation
We want a dimension-reducing transformation that can be applied effectively across many application areas.
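To make the shapes concrete, here is a minimal numpy sketch of the pipeline; the sizes and the random orthonormal G are illustrative placeholders, since the real G^T comes from the methods on the following slides:

```python
import numpy as np

m, n, q = 1024, 200, 9                     # illustrative sizes only
A = np.random.rand(m, n)                   # data matrix: one m x 1 item per column
G, _ = np.linalg.qr(np.random.rand(m, q))  # placeholder orthonormal G (m x q)

Y = G.T @ A                                # reduced representation, q x n
y = G.T @ A[:, 0]                          # a single item mapped from m x 1 to q x 1
assert Y.shape == (q, n) and y.shape == (q,)
```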
6. Measure for Cluster Quality
- A = [a_1, ..., a_n] (m x n): clustered data, with N_i the index set of items in class i, |N_i| = n_i, and r classes in total; c_i is the centroid (average of the data items) of class i, and c is the global centroid.
- (1) Within-class scatter matrix: S_w = \sum_{i=1}^{r} \sum_{j \in N_i} (a_j - c_i)(a_j - c_i)^T
- (2) Between-class scatter matrix: S_b = \sum_{i=1}^{r} \sum_{j \in N_i} (c_i - c)(c_i - c)^T
- (3) Total scatter matrix: S_t = \sum_{i=1}^{n} (a_i - c)(a_i - c)^T
7. Trace of Scatter Matrices
trace(S_w) = \sum_{i=1}^{r} \sum_{j \in N_i} \|a_j - c_i\|_2^2
trace(S_b) = \sum_{i=1}^{r} \sum_{j \in N_i} \|c_i - c\|_2^2
trace(S_t) = \sum_{i=1}^{r} \sum_{j \in N_i} \|a_j - c\|_2^2
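The scatter matrices are easy to form directly from the definitions above; the following numpy sketch (my own illustration, with names of my choosing) also checks the identity S_t = S_w + S_b:

```python
import numpy as np

def scatter_matrices(A, labels):
    """A: m x n data matrix (items as columns); labels: length-n class labels."""
    m, n = A.shape
    c = A.mean(axis=1, keepdims=True)              # global centroid
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for k in np.unique(labels):
        Ak = A[:, labels == k]
        ck = Ak.mean(axis=1, keepdims=True)        # class centroid c_i
        D = Ak - ck
        Sw += D @ D.T                              # within-class scatter
        Sb += Ak.shape[1] * (ck - c) @ (ck - c).T  # between-class scatter
    St = (A - c) @ (A - c).T                       # total scatter
    return Sw, Sb, St

A = np.random.rand(5, 30)
labels = np.repeat(np.arange(3), 10)
Sw, Sb, St = scatter_matrices(A, labels)
assert np.allclose(St, Sw + Sb)                    # S_t = S_w + S_b
```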
8. Optimal Dimension Reducing Transformation
G^T (q x m) maps y (m x 1) to G^T y (q x 1).
- High-quality clusters have small trace(S_w) and large trace(S_b).
- Want dimension reduction by G^T s.t. trace(G^T S_w G) is minimized and trace(G^T S_b G) is maximized:
- max trace((G^T S_w G)^{-1} (G^T S_b G)) -> LDA (Fisher 36, Rao 48)
- max trace(G^T S_b G) subject to G^T G = I -> Orthogonal Centroid (Park et al. 03)
- max trace(G^T (S_w + S_b) G) subject to G^T G = I -> PCA (Hotelling 33)
- max trace(G^T A A^T G) subject to G^T G = I -> LSI (Deerwester et al. 90)
A small sketch evaluating these trace criteria follows.
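A hedged sketch of how the four criteria could be evaluated for a candidate G (assuming G^T S_w G is invertible; function and variable names are mine):

```python
import numpy as np

def trace_criteria(G, Sw, Sb, A):
    """Evaluate the four trace criteria for a candidate G (m x q).
    Assumes G^T S_w G is invertible for the LDA criterion."""
    lda = np.trace(np.linalg.solve(G.T @ Sw @ G, G.T @ Sb @ G))
    oc  = np.trace(G.T @ Sb @ G)          # Orthogonal Centroid objective
    pca = np.trace(G.T @ (Sw + Sb) @ G)   # PCA objective (total scatter)
    lsi = np.trace(G.T @ (A @ A.T) @ G)   # LSI objective
    return lda, oc, pca, lsi
```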
9. Classical LDA (Fisher 36, Rao 48)
- max trace((G^T S_w G)^{-1} (G^T S_b G))
- G = leading (r-1) eigenvectors of S_w^{-1} S_b, i.e., solutions of S_b x = \lambda S_w x
- Fails when m > n (undersampled), since S_w is singular
- S_w = H_w H_w^T, where H_w = [a_1 - c_1, a_2 - c_1, ..., a_n - c_r] (m x n)
- S_b = H_b H_b^T, where H_b = [\sqrt{n_1}(c_1 - c), ..., \sqrt{n_r}(c_r - c)] (m x r)
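When S_w is nonsingular, the classical solution is a few lines with scipy (a minimal sketch; eigh solves the symmetric-definite generalized eigenproblem S_b x = \lambda S_w x and returns eigenvalues in ascending order):

```python
import numpy as np
from scipy.linalg import eigh

def classical_lda(Sw, Sb, r):
    """Classical LDA: leading (r-1) generalized eigenvectors of
    S_b x = lambda S_w x. Requires S_w positive definite."""
    w, V = eigh(Sb, Sw)
    return V[:, ::-1][:, :r - 1]   # G: m x (r-1), largest eigenvalues first
```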
10. LDA based on GSVD (LDA/GSVD) (Howland, Jeon & Park 03, SIMAX)
- Works regardless of the singularity of the scatter matrices
- S_w^{-1} S_b x = \lambda x becomes \beta^2 H_b H_b^T x = \alpha^2 H_w H_w^T x, with \lambda = \alpha^2 / \beta^2
- Columns of G are the leading (r-1) generalized singular vectors of the pair (H_b^T, H_w^T):
U^T H_b^T X = [\Sigma_b  0],   V^T H_w^T X = [\Sigma_w  0]
11. Generalized Singular Value Decomposition (Van Loan 76, Paige & Saunders 81)
X^T S_b X = diag(\Sigma_b^2, 0),   X^T S_w X = diag(\Sigma_w^2, 0),
so \beta^2 H_b H_b^T x = \alpha^2 H_w H_w^T x holds for each column x of X = [X_1, X_2, X_3, X_4], where the blocks of X are partitioned according to the generalized singular value pairs (\alpha_i, \beta_i).
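One way to obtain the LDA/GSVD solution without forming the scatter matrices is the two-SVD construction along the lines of Howland, Jeon & Park: stack H_b^T over H_w^T, take an SVD, then an SVD of the top block. The numpy sketch below is my paraphrase of that idea, not the authors' code, and it skips the rank-deficient corner cases:

```python
import numpy as np

def lda_gsvd(A, labels):
    """Sketch of LDA/GSVD: G = leading (r-1) generalized singular vectors
    of the pair (H_b^T, H_w^T), computed via two SVDs."""
    m, n = A.shape
    classes = np.unique(labels)
    r = len(classes)
    c = A.mean(axis=1, keepdims=True)
    Hb_cols, Hw_blocks = [], []
    for k in classes:
        Ak = A[:, labels == k]
        ck = Ak.mean(axis=1, keepdims=True)
        Hb_cols.append(np.sqrt(Ak.shape[1]) * (ck - c))
        Hw_blocks.append(Ak - ck)
    Hb = np.hstack(Hb_cols)                   # m x r
    Hw = np.hstack(Hw_blocks)                 # m x n
    Z = np.vstack([Hb.T, Hw.T])               # (r + n) x m, stacked pair
    P, s, Qt = np.linalg.svd(Z)               # full SVD of the stacked matrix
    t = int((s > s[0] * max(Z.shape) * np.finfo(float).eps).sum())  # rank
    _, _, Wt = np.linalg.svd(P[:r, :t])       # SVD of the top-left block of P
    X1 = Qt.T[:, :t] @ (Wt.T / s[:t, None])   # leading columns of X
    return X1[:, :r - 1]                      # G: m x (r-1)
```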
12. Generalization of LDA for Undersampled Problems
- Regularized LDA (Friedman 89, Zhao et al. 99); see the sketch after this list
- LDA/GSVD: solution G = [X_1 X_2] (Howland, Jeon & Park 03)
- Solutions based on Null(S_w) and Range(S_b) (Chen et al. 00, Yu & Yang 01, Park & Park 03)
- Two-stage methods:
  - Face recognition: PCA + LDA (Swets & Weng 96, Zhao et al. 99)
  - Information retrieval: LSI + LDA (Torkkola 01)
- Mathematical equivalence (Howland & Park 03): LSI + LDA/GSVD = LDA/GSVD; PCA + LDA/GSVD = LDA/GSVD; more efficient: QRD + LDA/GSVD
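Of these generalizations, regularized LDA is the simplest to sketch: perturb S_w to S_w + \lambda I so the classical eigenproblem is well posed (a minimal sketch; \lambda corresponds to the \lambda = 1 setting in the Yale table later):

```python
import numpy as np
from scipy.linalg import eigh

def regularized_lda(Sw, Sb, r, lam=1.0):
    """Regularized LDA: replace S_w by S_w + lam * I so that the
    generalized eigenproblem S_b x = mu (S_w + lam I) x is well posed."""
    m = Sw.shape[0]
    w, V = eigh(Sb, Sw + lam * np.eye(m))
    return V[:, ::-1][:, :r - 1]
```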
13. Orthogonal Centroid (OC) Algorithm (Park, Jeon & Rosen 03, BIT)
- Algorithm:
  1. Form the centroid matrix C = [c_1, ..., c_r] (m x r)
  2. Compute the QRD of C: C = QR, with Q (m x r)
- Dimension reduction by Q^T to the r-dimensional space: y (m x 1) -> Q^T y (r x 1)
- Q solves max trace(G^T S_b G) subject to G^T G = I, and trace(Q^T S_b Q) = trace(S_b)
- Needs only the QRD of C (m x r), vs. the EVD of S_b (m x m) (or the SVD of H_b, m x r)
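A direct numpy implementation of the OC algorithm as stated (function and variable names are mine):

```python
import numpy as np

def orthogonal_centroid(A, labels):
    """Orthogonal Centroid: QR-decompose the centroid matrix and reduce
    each m x 1 item to r x 1 via Q^T."""
    classes = np.unique(labels)
    C = np.column_stack([A[:, labels == k].mean(axis=1) for k in classes])
    Q, _ = np.linalg.qr(C)    # thin QR: Q is m x r with orthonormal columns
    return Q

A = np.random.rand(100, 60)                  # 100-dim data, 60 items
labels = np.repeat(np.arange(3), 20)         # r = 3 classes
Q = orthogonal_centroid(A, labels)
Y = Q.T @ A                                  # reduced data, 3 x 60
```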
14. Text Classification on Medline Data (Kim, Howland & Park 03, JMLR)
[Table: classification accuracy (%), 5 classes; similarity measures: L2 norm and cosine]
15. Text Classification on Reuters Data (Kim, Howland & Park 03, JMLR)
[Table: classification accuracy (%), 90 classes; similarity measures: L2 norm and cosine]
16. Face Recognition on ATT Data
- Orthogonal Centroid: 88 / 96
- LDA/GSVD: 90 / 98
[Figure: query image and the top, second, and third choices retrieved by Orthogonal Centroid and LDA/GSVD]
Classification accuracy (%) using centroid-based classification and kNN (k = 1, 3, 5, 7) with the L2 norm; average of 100 runs with random splits of training and test data.
17. Face Recognition on Yale Data (C. Park and H. Park)

Dim. Red. Method                        Dim    k=1         k=5    k=9
Full Space                              8586   79.4        76.4   72.1
LDA/GSVD                                14     98.8 (90)   98.8   98.8
Regularized LDA (lambda = 1)            14     97.6 (85)   97.6   97.6
Proj. to Null(S_w) (Chen et al. 00)     14     97.6 (84)   97.6   97.6
Transf. to Range(S_b) (Yu & Yang 01)    14     89.7 (82)   94.6   91.5

Prediction accuracy in %, leave-one-out kNN (average over 100 random splits). Yale Face Database: 243 x 320 pixels, full dimension 77760; 11 images/person x 15 people = 165 images. After preprocessing (3 x 3 averaging): 8586 x 165.
18. Nonlinear Dimension Reduction by Kernel Functions
F maps x = (x_1, x_2) to F(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2), so that
k(x, y) = <F(x), F(y)> = <x, y>^2   (a polynomial kernel function)
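A short check (my own illustration) that this explicit feature map reproduces the degree-2 polynomial kernel:

```python
import numpy as np

def F(x):
    """Explicit degree-2 feature map for 2-D inputs."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.isclose(F(x) @ F(y), (x @ y) ** 2)   # <F(x), F(y)> = <x, y>^2
```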
19. Nonlinear Dimension Reduction by Kernel Functions
- If k(x, y) satisfies Mercer's condition, then there is a mapping F to an inner product space such that k(x, y) = <F(x), F(y)>.
- Mercer's condition for A = [a_1, ..., a_n]: the kernel matrix K = [k(a_i, a_j)], 1 <= i, j <= n, is positive semi-definite.
- Ex) RBF kernel function: k(a_i, a_j) = exp(-s \|a_i - a_j\|^2)
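A numpy sketch (names mine) forming the RBF kernel matrix and checking that it is positive semi-definite, as Mercer's condition requires:

```python
import numpy as np

def rbf_kernel_matrix(A, s=1.0):
    """Kernel matrix K with K[i, j] = exp(-s * ||a_i - a_j||^2),
    items as columns of A."""
    sq = np.sum(A * A, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (A.T @ A)   # pairwise squared dists
    return np.exp(-s * np.maximum(d2, 0.0))

A = np.random.rand(5, 20)
K = rbf_kernel_matrix(A)
assert np.all(np.linalg.eigvalsh(K) > -1e-10)          # positive semi-definite
```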
20. Kernel Orthogonal Centroid (KOC) (C. Park and H. Park, PR)
- Apply OC in the feature-mapped space F(A)
- Need the QRD of the centroid matrix C in F(A):
  C = [ (1/n_1) \sum_{i \in N_1} F(a_i), ..., (1/n_r) \sum_{i \in N_r} F(a_i) ] = QR,
  but C is unknown since F is implicit
- C^T C = M^T K M = R^T R, where K = [k(a_i, a_j)], 1 <= i, j <= n
- z = Q^T y = R^{-T} C^T y, i.e.,
  R^T z = C^T y = [ (1/n_1) \sum_{i \in N_1} k(a_i, y); ...; (1/n_r) \sum_{i \in N_r} k(a_i, y) ]
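A numpy sketch of KOC following these formulas; here M is an n x r averaging matrix of my construction, with M[i, j] = 1/n_j when a_i belongs to class j, and the Cholesky factor of M^T K M serves as R (assumes M^T K M is positive definite):

```python
import numpy as np

def koc_fit(K, labels):
    """Kernel Orthogonal Centroid: factor C^T C = M^T K M = R^T R
    without ever forming the implicit centroid matrix C."""
    classes = np.unique(labels)
    n = len(labels)
    M = np.zeros((n, len(classes)))
    for j, k in enumerate(classes):
        idx = labels == k
        M[idx, j] = 1.0 / idx.sum()        # column j averages class j's items
    R = np.linalg.cholesky(M.T @ K @ M).T  # upper-triangular, R^T R = C^T C
    return M, R

def koc_transform(kvec, M, R):
    """Map a new item y to z = R^{-T} C^T y, where C^T y = M^T kvec and
    kvec[i] = k(a_i, y)."""
    return np.linalg.solve(R.T, M.T @ kvec)
```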
21. Experimental Results
Musk data (from UCI): dim 167, 2 classes, 6599 data items.

kNN     OC     KOC    Kernel PCA
k=1     87.2   95.7   87.8
k=15    88.5   96.0   89.2
k=29    88.5   96.1   88.5

Kernel PCA as in (Scholkopf et al., 1999).
22. Fingerprint Classification
Five classes: Left Loop, Right Loop, Whorl, Arch, Tented Arch.
Construction of directional images by DFT (see the sketch below):
1. Compute the directionality in each local neighborhood by FFT
2. Compute the dominant direction
3. Find the core point for unified centering of fingerprints within the same class
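A rough numpy sketch of steps 1-2 for a single image block (my own illustration of the idea, not the authors' pipeline): the strongest non-DC Fourier peak points across the ridges, so the ridge orientation is that angle rotated by 90 degrees.

```python
import numpy as np

def dominant_direction(block):
    """Estimate the dominant ridge orientation of an image block via its
    2-D FFT: locate the strongest non-DC frequency component and rotate
    its angle by 90 degrees (ridges run perpendicular to that frequency)."""
    f = np.fft.fftshift(np.abs(np.fft.fft2(block)))
    h, w = f.shape
    f[h // 2, w // 2] = 0.0                       # suppress the DC term
    ky, kx = np.unravel_index(np.argmax(f), f.shape)
    angle = np.arctan2(ky - h // 2, kx - w // 2)  # direction of the peak
    return (angle + np.pi / 2) % np.pi            # ridge orientation in [0, pi)
```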
23. Fingerprint Classification Results on NIST Fingerprint Database 4 (C. Park and H. Park 03)
KDA/GSVD: nonlinear extension of LDA/GSVD based on kernel functions.

Rejection rate (%)           0      1.8    8.5
KDA/GSVD                     90.7   91.3   92.8
kNN & NN (Jain et al. 99)    -      90.0   91.2
NN & SVM (Yao et al. 03)     -      90.0   92.2

4000 fingerprint images of size 512 x 512. By KDA/GSVD, the dimension is reduced from 105 x 105 to 4.
24. Support Vector Machine (SVM)
For binary classification (hard margin) (Vapnik, Scholkopf, Burges).
SVM constructs the optimal separating hyperplane that maximizes the margin between the two classes. For problems that are not linearly separable, the input data (in input space R^d) are mapped to a feature space H using a kernel function K(x_i, x_j).
[Figure: input space R^d mapped to feature space H; support vectors marked with an extra circle]
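As a usage sketch (assuming scikit-learn, which the slides do not mention), a kernel SVM with an RBF kernel whose support vectors can be inspected after fitting:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                       # 200 items in input space R^2
y = (X[:, 0]**2 + X[:, 1]**2 > 1).astype(int)       # not linearly separable

clf = SVC(kernel="rbf", C=1e6)                      # large C approximates hard margin
clf.fit(X, y)
print(clf.support_vectors_.shape)                   # the identified support vectors
```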