Matrix Decompositions in Dimension Reduction for Undersampled Clustered Data

1
Matrix Decompositions in Dimension Reduction
for Undersampled Clustered Data
  • Haesun Park
  • CSE 8803: Numerical Methods in CSE

2
Cluster Structure Preserving Dimension Reduction
for Feature Extraction
  • Algorithms: LDA/GSVD, Orthogonal Centroid,
    extension to kernel-based nonlinear methods
  • Applications: text classification, face
    recognition, fingerprint classification
  • Experimental results: classification with
    kNN, SVM, and centroid-based classification

3
2D Visualization: Importance of Utilizing Cluster Structure
2D representation of 150 x 1000 data with 7 clusters: LDA vs. SVD
4
Facial Recognition
ATT (ORL) Face Database
  • image size: 92 x 112
  • 400 frontal images: 40 persons, 10 images each,
    with variations in pose and facial expression
  • severely undersampled

(Figure: the 1st and 35th sample images)
5
Dimension Reduction
Original images/data a_1, ..., a_i, ..., a_r
  -> Data preprocessing
  -> Form data matrix A (m x n, each item an m x 1 vector)
  -> Dimension-reducing transformation (each m x 1 item -> q x 1,
     giving a q x n lower-dimensional representation)
  -> Classification
Want a dimension-reducing transformation that can
be effectively applied across many application areas
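As a minimal sketch of this pipeline (all sizes and the reducer G here are hypothetical stand-ins, not the presentation's data), the reduction step is just a matrix product:

```python
import numpy as np

# Hypothetical sizes: m original features, n items, q reduced dimensions.
m, n, q = 10304, 400, 39
A = np.random.rand(m, n)                    # data matrix, one item per column
G, _ = np.linalg.qr(np.random.rand(m, q))   # stand-in reducer with orthonormal columns

Y = G.T @ A             # q x n lower-dimensional representation of all items
y_new = G.T @ A[:, :1]  # reduce a single m x 1 item to q x 1
```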
6
Measure for Cluster Quality
  • A = [a_1, ..., a_n] in R^{m x n}: clustered data with r classes
    N_i = set of item indices in class i, |N_i| = n_i, n_1 + ... + n_r = n
    c_i = centroid (average) of the data items in class i
    c = global centroid
  • (1) Within-class scatter matrix:
    S_w = \sum_{i=1}^{r} \sum_{j \in N_i} (a_j - c_i)(a_j - c_i)^T
    (2) Between-class scatter matrix:
    S_b = \sum_{i=1}^{r} \sum_{j \in N_i} (c_i - c)(c_i - c)^T
    (3) Total scatter matrix:
    S_t = \sum_{i=1}^{n} (a_i - c)(a_i - c)^T
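These definitions translate directly to NumPy; the sketch below (the function name `scatter_matrices` is mine) also checks the identity S_t = S_w + S_b implied by the three formulas:

```python
import numpy as np

def scatter_matrices(A, labels):
    """A: m x n data (items as columns); labels: length-n class labels."""
    m, n = A.shape
    c = A.mean(axis=1, keepdims=True)                # global centroid
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for k in np.unique(labels):
        Ak = A[:, labels == k]
        ck = Ak.mean(axis=1, keepdims=True)          # class centroid c_i
        D = Ak - ck
        Sw += D @ D.T                                # within-class scatter
        Sb += Ak.shape[1] * (ck - c) @ (ck - c).T    # between-class scatter
    St = (A - c) @ (A - c).T                         # total scatter
    return Sw, Sb, St

A = np.random.rand(5, 30)
labels = np.random.randint(0, 3, 30)
Sw, Sb, St = scatter_matrices(A, labels)
assert np.allclose(St, Sw + Sb)                      # S_t = S_w + S_b
```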
7
Trace of Scatter Matrices
trace(S_w) = \sum_{i=1}^{r} \sum_{j \in N_i} \|a_j - c_i\|_2^2
trace(S_b) = \sum_{i=1}^{r} \sum_{j \in N_i} \|c_i - c\|_2^2
trace(S_t) = \sum_{i=1}^{r} \sum_{j \in N_i} \|a_j - c\|_2^2
After a dimension-reducing transformation G^T, the
within-class scatter becomes trace(G^T S_w G) and the
between-class scatter becomes trace(G^T S_b G).
8
Optimal Dimension Reducing Transformation
G^T: q x m, mapping y (m x 1) to G^T y (q x 1)
  • High-quality clusters have
    small trace(S_w) and large trace(S_b)
  • Want dimension reduction by G^T
    s.t. trace(G^T S_w G) is minimized and trace(G^T S_b G) is maximized
  • max trace((G^T S_w G)^{-1} (G^T S_b G)) -> LDA (Fisher '36, Rao '48)
  • max trace(G^T S_b G), G^T G = I -> Orthogonal Centroid (Park et al. '03)
  • max trace(G^T (S_w + S_b) G), G^T G = I -> PCA (Hotelling '33)
  • max trace(G^T A A^T G), G^T G = I -> LSI (Deerwester et al. '90)
9
Classical LDA (Fisher '36, Rao '48)
  • max trace((G^T S_w G)^{-1} (G^T S_b G))
  • G = leading (r-1) eigenvectors of S_w^{-1} S_b,
    i.e., solutions of S_b x = \lambda S_w x
  • Fails when m > n (undersampled): S_w is singular
  • S_w = H_w H_w^T, H_w = [a_1 - c_1, a_2 - c_1, ..., a_n - c_r] (m x n)
  • S_b = H_b H_b^T, H_b = [\sqrt{n_1}(c_1 - c), ..., \sqrt{n_r}(c_r - c)] (m x r)
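A sketch of classical LDA exactly as stated, reusing `scatter_matrices` from above; the generalized eigenproblem on the pencil (S_b, S_w) is ill-posed in precisely the undersampled case the slide names, since S_w is singular there:

```python
import numpy as np
from scipy.linalg import eig

def classical_lda(Sw, Sb, r):
    # Generalized eigenproblem S_b x = lambda S_w x; meaningful only
    # when S_w is nonsingular (i.e., not in the undersampled case m > n).
    evals, evecs = eig(Sb, Sw)
    order = np.argsort(-evals.real)
    return evecs[:, order[:r - 1]].real   # G: leading (r-1) eigenvectors
```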

10
LDA based on GSVD (LDA/GSVD) (Howland, Jeon, Park '03, SIMAX)
  • Works regardless of the singularity of the scatter matrices
  • S_w^{-1} S_b x = \lambda x  <=>  \delta^2 H_b H_b^T x = \beta^2 H_w H_w^T x
  • Columns of G are the leading (r-1) generalized singular
    vectors of the pair (H_b^T, H_w^T):
    U^T H_b^T X = (\Sigma_b, 0),  V^T H_w^T X = (\Sigma_w, 0)
11
Generalized Singular Value Decomposition (Van Loan '76, Paige and Saunders '81)
For the pair (H_b^T, H_w^T) there exist orthogonal U, V and a
nonsingular X with U^T H_b^T X = (\Sigma_b, 0) and
V^T H_w^T X = (\Sigma_w, 0), so that X^T S_b X and X^T S_w X are
diagonal with \Sigma_b^T \Sigma_b + \Sigma_w^T \Sigma_w = I. Each
column x_i satisfies \delta_i^2 H_b H_b^T x_i = \beta_i^2 H_w H_w^T x_i,
and X = [X_1 X_2 X_3 X_4] is partitioned according to which of
\delta_i, \beta_i are zero or nonzero.
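SciPy does not expose a GSVD routine directly, so here is a hedged sketch of LDA/GSVD via an SVD-based construction (my reading of the Howland-Jeon-Park algorithm, not the authors' code): stack K = [H_b^T; H_w^T], take its rank-revealing SVD, then an SVD of the top block of the left factor.

```python
import numpy as np

def lda_gsvd(A, labels):
    """Return G (m x (r-1)); A is m x n with items as columns."""
    classes = np.unique(labels)
    r = len(classes)
    c = A.mean(axis=1, keepdims=True)
    cols_b, blocks_w = [], []
    for k in classes:
        Ak = A[:, labels == k]
        ck = Ak.mean(axis=1, keepdims=True)
        cols_b.append(np.sqrt(Ak.shape[1]) * (ck - c))  # column of H_b
        blocks_w.append(Ak - ck)                        # block of H_w
    Hb, Hw = np.hstack(cols_b), np.hstack(blocks_w)
    K = np.vstack([Hb.T, Hw.T])                         # (r+n) x m
    P, s, Qt = np.linalg.svd(K, full_matrices=False)
    t = int((s > s[0] * max(K.shape) * np.finfo(float).eps).sum())  # rank of K
    _, _, Wt = np.linalg.svd(P[:r, :t])                 # SVD of top block of P
    X = Qt[:t].T @ (np.diag(1.0 / s[:t]) @ Wt.T)        # generalized singular vectors
    return X[:, :r - 1]                                 # leading (r-1) columns
```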
12
Generalization of LDA for Undersampled Problems
  • Regularized LDA (Friedman '89, Zhao et al. '99)
  • LDA/GSVD solution G = [X_1 X_2] (Howland, Jeon, Park '03)
  • Solutions based on Null(S_w) and Range(S_b)
    (Chen et al. '00, Yu & Yang '01, Park & Park '03)
  • Two-stage methods:
    Face recognition: PCA + LDA (Swets & Weng '96, Zhao et al. '99)
    Information retrieval: LSI + LDA (Torkkola '01)
  • Mathematical equivalence (Howland and Park '03):
    LSI + LDA/GSVD = LDA/GSVD, PCA + LDA/GSVD = LDA/GSVD;
    more efficient: QRD + LDA/GSVD
13
Orthogonal Centroid (OC) Algorithm (Park, Jeon, Rosen '03, BIT)
  • Algorithm
    1. Form the centroid matrix C = [c_1, ..., c_r] (m x r)
    2. Compute the QRD of C: C = QR, Q (m x r)
  • Dimension reduction by Q^T to the r-dim. space:
    y (m x 1) -> Q^T y (r x 1)
  • Q solves max trace(G^T S_b G) over G^T G = I, and
    trace(Q^T S_b Q) = trace(S_b)
  • Needs only the QRD of C (m x r), vs. the EVD of S_b (m x m)
    (or the SVD of H_b, m x r)
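The OC algorithm is short enough to sketch in full (under the slide's definitions; not the authors' code):

```python
import numpy as np

def orthogonal_centroid(A, labels):
    """A: m x n data with items as columns; returns Q (m x r)."""
    C = np.column_stack([A[:, labels == k].mean(axis=1)
                         for k in np.unique(labels)])  # centroid matrix, m x r
    Q, _ = np.linalg.qr(C)      # reduced QRD of the m x r centroid matrix
    return Q                    # reduce an item y via Q.T @ y (r x 1)
```

The cost argument from the slide shows up in code as well: a QRD of an m x r matrix, instead of an eigendecomposition of the m x m matrix S_b.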
14
Text Classification on Medline Data
(Kim, Howland, Park '03, JMLR)
Classification accuracy (%), 5 classes.
Similarity measures: L2 norm and cosine.
15
Text Classification on Reuters Data
(Kim, Howland, Park '03, JMLR)
Classification accuracy (%), 90 classes.
Similarity measures: L2 norm and cosine.
16
Face Recognition on ATT Data
  • Orthogonal Centroid: 88, 96
  • LDA/GSVD: 90, 98

(Figure: a query image with the top, second, and third choices
under Orthogonal Centroid and under LDA/GSVD)

Classification accuracy (%) using centroid and kNN (k = 1, 3, 5, 7)
classifiers with the L2 norm; average of 100 runs over random
splits into training and test data.
17
Face Recognition on Yale Data
(C. Park and H. Park)
Prediction accuracy (%), leave-one-out (average of 100 random splits):

Dim. Red. Method                       Dim   kNN k=1    k=5    k=9
Full Space                             8586  79.4       76.4   72.1
LDA/GSVD                               14    98.8 (90)  98.8   98.8
Regularized LDA (lambda=1)             14    97.6 (85)  97.6   97.6
Proj. to null(S_w) (Chen et al. '00)   14    97.6 (84)  97.6   97.6
Transf. to range(S_b) (Yu & Yang '01)  14    89.7 (82)  94.6   91.5

Yale Face Database: 243 x 320 pixels (full dimension 77760),
11 images/person x 15 people = 165 images.
After preprocessing (3x3 averaging): 8586 x 165.
18
Nonlinear Dimension Reduction by Kernel Functions
  • Ex. feature mapping \Phi:
    x = (x_1, x_2) -> \Phi(x) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2)
    k(x, y) = <\Phi(x), \Phi(y)> = <x, y>^2
    (a polynomial kernel function)
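A quick numerical check of this identity (the \sqrt{2} factor in the middle feature is what makes the inner products match):

```python
import numpy as np

def phi(x):   # explicit degree-2 feature map from the slide
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, y = np.random.rand(2), np.random.rand(2)
assert np.isclose(phi(x) @ phi(y), (x @ y) ** 2)   # k(x, y) = <x, y>^2
```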
19
Nonlinear Dimension Reduction by Kernel Functions
  • If k(x, y) satisfies Mercer's condition, then there is a
    mapping \Phi to an inner product space such that
    k(x, y) = <\Phi(x), \Phi(y)>
  • Mercer's condition: for A = [a_1, ..., a_n], the kernel
    matrix K = [k(a_i, a_j)], 1 <= i, j <= n, is positive
    semi-definite.
  • Ex) RBF kernel function: k(a_i, a_j) = exp(-\sigma \|a_i - a_j\|^2)
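Mercer's condition can be spot-checked numerically: the RBF kernel matrix of any finite sample should have no (significantly) negative eigenvalues. A small sketch:

```python
import numpy as np
from scipy.spatial.distance import cdist

X = np.random.rand(50, 8)                        # 50 sample points as rows
sigma = 0.5
K = np.exp(-sigma * cdist(X, X, 'sqeuclidean'))  # K[i, j] = k(a_i, a_j)
eigvals = np.linalg.eigvalsh(K)                  # K is symmetric
assert eigvals.min() > -1e-10                    # positive semi-definite
```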
20
Kernel Orthogonal Centroid (KOC)
(C. Park and H. Park, PR)
  • Apply OC in the feature-mapped space \Phi(A)
  • Need the QRD of the centroid matrix C in \Phi(A), but C is
    not explicitly known:
    C = [(1/n_1) \sum_{i \in N_1} \Phi(a_i), ..., (1/n_r) \sum_{i \in N_r} \Phi(a_i)] = QR
  • C^T C = M^T K M = R^T R, where K = [k(a_i, a_j)], 1 <= i, j <= n,
    and M is the n x r grouping matrix with M_{ij} = 1/n_j for i in N_j
  • z = Q^T \Phi(y) = R^{-T} C^T \Phi(y), i.e.,
    R^T z = C^T \Phi(y) = [(1/n_1) \sum_{i \in N_1} k(a_i, y); ...; (1/n_r) \sum_{i \in N_r} k(a_i, y)]
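A sketch of KOC following these formulas (RBF kernel assumed; `koc_fit` and `koc_reduce` are my names, and the Cholesky step assumes M^T K M is positive definite):

```python
import numpy as np
from scipy.spatial.distance import cdist

def rbf(X, Y, sigma=1.0):
    # kernel matrix k(x_i, y_j) for data with items as columns
    return np.exp(-sigma * cdist(X.T, Y.T, 'sqeuclidean'))

def koc_fit(A, labels, sigma=1.0):
    classes = np.unique(labels)
    # M: n x r grouping matrix, M[i, j] = 1/n_j if item i is in class j
    M = np.column_stack([(labels == k) / (labels == k).sum() for k in classes])
    K = rbf(A, A, sigma)                      # n x n kernel matrix
    R = np.linalg.cholesky(M.T @ K @ M).T     # C^T C = M^T K M = R^T R
    return M, R

def koc_reduce(A, M, R, Y, sigma=1.0):
    # Solve R^T z = C^T Phi(y), whose entries are (1/n_c) sum_{i in N_c} k(a_i, y)
    return np.linalg.solve(R.T, M.T @ rbf(A, Y, sigma))

A = np.random.rand(20, 60)
labels = np.random.randint(0, 3, 60)
M, R = koc_fit(A, labels)
Z = koc_reduce(A, M, R, A)                    # r x n reduced representation
```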
21
Experimental Results
  • Musk dataset (from UCI): dim = 167, 2 classes, 6599 items
  • Classification accuracy (%):

    kNN    OC     KOC    Kernel PCA
    k=1    87.2   95.7   87.8
    k=15   88.5   96.0   89.2
    k=29   88.5   96.1   88.5

  Kernel PCA: (Schölkopf et al., 1999)
22
Fingerprint Classification
Five classes: Left Loop, Right Loop, Whorl, Arch, Tented Arch

Construction of directional images by DFT:
1. Compute directionality in a local neighborhood by FFT
2. Compute the dominant direction
3. Find the core point for unified centering of fingerprints
   within the same class
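Step 1 can be illustrated with NumPy's FFT: in each local block, the peak of the magnitude spectrum gives a dominant spatial frequency, and the ridge orientation is roughly perpendicular to it. This is my illustration of the idea, not the authors' pipeline:

```python
import numpy as np

def dominant_directions(img, block=16):
    """Estimate one dominant orientation per block x block neighborhood."""
    h, w = img.shape
    angles = np.zeros((h // block, w // block))
    for bi in range(h // block):
        for bj in range(w // block):
            patch = img[bi*block:(bi+1)*block, bj*block:(bj+1)*block]
            spec = np.abs(np.fft.fftshift(np.fft.fft2(patch - patch.mean())))
            spec[block//2, block//2] = 0.0     # suppress any residual DC term
            fy, fx = np.unravel_index(np.argmax(spec), spec.shape)
            # ridges run perpendicular to the dominant frequency direction
            angles[bi, bj] = np.arctan2(fy - block//2, fx - block//2) + np.pi / 2
    return angles
```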
23
Fingerprint Classification Results on NIST
Fingerprint Database 4
(C. Park and H. Park '03)
KDA/GSVD: nonlinear extension of LDA/GSVD based on kernel functions

Rejection rate (%)            0      1.8    8.5
KDA/GSVD                      90.7   91.3   92.8
kNN + NN (Jain et al. '99)    -      90.0   91.2
NN + SVM (Yao et al. '03)     -      90.0   92.2

4000 fingerprint images of size 512 x 512.
With KDA/GSVD, the dimension is reduced from 105 x 105 to 4.
24
Support Vector Machine (SVM)
for binary classification (hard margin)
(Vapnik, Schölkopf, Burges)

(Figure: input space R^d mapped to feature space H via a kernel K(x_i, x_j))

SVM constructs the optimal separating hyperplane, which
maximizes the margin between the two classes. For problems
that are not linearly separable, the input data are mapped to a
feature space using a kernel function K(x_i, x_j). In the
figure, support vectors are marked with an extra circle.
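For completeness, a minimal kernel-SVM example using scikit-learn as an off-the-shelf stand-in (not the presentation's implementation; a large C approximates the hard margin):

```python
import numpy as np
from sklearn.svm import SVC

X = np.random.rand(100, 4)                      # 100 points in R^4
y = (X[:, 0] + X[:, 1] ** 2 > 1.0).astype(int)  # a nonlinearly separable labeling

clf = SVC(kernel='rbf', C=1e3)                  # RBF kernel, near-hard margin
clf.fit(X, y)
print(clf.support_vectors_.shape)               # support vectors found by the fit
```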