(Semi-)Supervised Probabilistic Principal Component Analysis

Transcript and Presenter's Notes

1
(Semi-)Supervised Probabilistic Principal
Component Analysis
  • Shipeng Yu
  • University of Munich, Germany
  • Siemens Corporate Technology
  • http://www.dbs.ifi.lmu.de/spyu
  • Joint work with Kai Yu, Volker Tresp,
  • Hans-Peter Kriegel, Mingrui Wu

2
Dimensionality Reduction
  • We are dealing with high-dimensional data
  • Texts: e.g. bag-of-words features
  • Images: color histogram, correlogram, etc.
  • Web pages: texts, linkages, structures, etc.
  • Motivations
  • Noisy features: how to remove or down-weight them?
  • Learnability: curse of dimensionality
  • Inefficiency: high computational cost
  • Visualization
  • A pre-processing step for many data mining tasks

3
Unsupervised versus Supervised
  • Unsupervised Dimensionality Reduction
  • Only the input data are given
  • PCA (principal component analysis)
  • Supervised Dimensionality Reduction
  • Should be biased by the outputs
  • Classification: FDA (Fisher discriminant analysis)
  • Regression: PLS (partial least squares)
  • RVs: CCA (canonical correlation analysis)
  • More general solutions?
  • Semi-Supervised?

4
Our Settings and Notations
  • N data points, M input features, L output labels
  • We aim to derive a mapping f : R^M → R^K such that K << M

[Figure: the same mapping task in the unsupervised, semi-supervised (with unlabeled data), and supervised settings]
5
Outline
  • Principal Component Analysis
  • Probabilistic PCA
  • Supervised Probabilistic PCA
  • Related Work
  • Conclusion

6
PCA Motivation
  • Find the K orthogonal projection directions that capture the most data variance
  • Applications
  • Visualization
  • De-noising
  • Latent semantic indexing
  • Eigenfaces

[Figure: 2D data cloud with the 1st and 2nd principal component directions]
7
PCA Algorithm
  • Basic Algorithm (see the sketch below)
  • Center the data: x_n ← x_n − μ, where μ = (1/N) Σ_n x_n
  • Compute the sample covariance matrix S = (1/N) Σ_n (x_n − μ)(x_n − μ)^T
  • Do eigen-decomposition S = U Λ U^T (sort eigenvalues decreasingly)
  • The PCA directions are given in U_K, the first K columns of U
  • The PCA projection of a test point x is z = U_K^T (x − μ)

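A minimal NumPy sketch of the algorithm on this slide; the function and variable names (pca_fit, pca_project, X, U_K) are illustrative and not from the talk.

    import numpy as np

    def pca_fit(X, K):
        # X: (N, M) data matrix. Returns the mean, top-K directions U_K, and eigenvalues.
        mu = X.mean(axis=0)                        # center the data
        Xc = X - mu
        S = Xc.T @ Xc / X.shape[0]                 # sample covariance matrix (M x M)
        eigvals, eigvecs = np.linalg.eigh(S)       # eigen-decomposition (ascending order)
        order = np.argsort(eigvals)[::-1]          # sort eigenvalues decreasingly
        U_K = eigvecs[:, order[:K]]                # PCA directions = first K columns
        return mu, U_K, eigvals[order[:K]]

    def pca_project(x, mu, U_K):
        # PCA projection of a test point: z = U_K^T (x - mu)
        return U_K.T @ (x - mu)
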
8
Supervised PCA?
  • PCA is unsupervised
  • When output information is available
  • Classification labels: 0/1
  • Regression responses: real values
  • Ranking orders: rank labels / preferences
  • Multi-outputs: output dimension > 1
  • Structured outputs, ...
  • Can PCA be biased by outputs?
  • And how?

9
Outline
  • Principal Component Analysis
  • Probabilistic PCA
  • Supervised Probabilistic PCA
  • Related Work
  • Conclusion

10
Latent Variable Model for PCA
  • Another interpretation of PCA [Pearson 1901]
  • PCA minimizes the reconstruction error of the data (objective written out below)
  • z_n are the latent variables: the PCA projections of x_n
  • W contains the factor loadings: the PCA mapping
  • Equivalent to PCA up to a scaling factor
  • Leads to the idea of PPCA

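Written out explicitly (a reconstruction using the notation above, not copied from the slide), the reconstruction-error objective reads:

    % Pearson's (1901) view of PCA: minimize the squared reconstruction error
    % jointly over the factor loadings W and the latent projections z_n.
    \min_{\mathbf{W},\,\{\mathbf{z}_n\}} \; \sum_{n=1}^{N}
        \bigl\lVert \mathbf{x}_n - \boldsymbol{\mu} - \mathbf{W}\mathbf{z}_n \bigr\rVert^2
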
11
Probabilistic PCA [TipBis99]
  • Latent variable model: x = W z + μ + ε, with z ~ N(0, I_K), ε ~ N(0, σ² I_M), and mean vector μ
  • Conditional independence: the M input dimensions are independent given z
  • If σ² → 0, PPCA leads to the PCA solution (up to a rotation and scaling factor)
  • x is Gaussian distributed: x ~ N(μ, W W^T + σ² I_M) (a closed-form ML fit is sketched below)

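For reference, a minimal NumPy sketch of the standard closed-form maximum-likelihood PPCA fit of Tipping & Bishop (1999), together with the posterior-mean projection; function names are illustrative and the arbitrary rotation R is taken to be the identity.

    import numpy as np

    def ppca_fit(X, K):
        # Closed-form ML estimates of W and sigma^2 for PPCA (assumes K < M).
        N, M = X.shape
        mu = X.mean(axis=0)
        Xc = X - mu
        S = Xc.T @ Xc / N                                     # sample covariance
        eigvals, eigvecs = np.linalg.eigh(S)
        order = np.argsort(eigvals)[::-1]
        lam, U = eigvals[order], eigvecs[:, order]
        sigma2 = lam[K:].mean()                               # noise = average discarded eigenvalue
        W = U[:, :K] @ np.diag(np.sqrt(lam[:K] - sigma2))     # W_ML (rotation R = I)
        return mu, W, sigma2

    def ppca_posterior_mean(x, mu, W, sigma2):
        # Probabilistic projection: E[z | x] = (W^T W + sigma^2 I)^(-1) W^T (x - mu)
        K = W.shape[1]
        return np.linalg.solve(W.T @ W + sigma2 * np.eye(K), W.T @ (x - mu))
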
12
From Unsupervised to Supervised
  • Key insights of PPCA
  • All the M input dimensions are conditionally
    independent given the K latent variables
  • In PCA we are seeking the K latent variables that
    best explain the data covariance
  • When we have outputs y, we believe
  • There are inter-covariances between x and y
  • There are intra-covariances within y if the output dimension L > 1
  • Idea: Let the latent variables explain all of them!

13
Outline
  • Principal Component Analysis
  • Probabilistic PCA
  • Supervised Probabilistic PCA
  • Related Work
  • Conclusion

14
Supervised Probabilistic PCA
  • Supervised latent variable model: x = W_x z + μ_x + ε_x, y = W_y z + μ_y + ε_y, with a shared latent z ~ N(0, I_K)
  • All input and output dimensions are conditionally independent given z
  • (x, y) are jointly Gaussian distributed (see the generative sketch below)

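A small generative sketch of the model as reconstructed above, assuming isotropic Gaussian noise on both x and y; all names and constants (W_x, W_y, sigma_x, ...) are illustrative placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    N, M, L, K = 500, 10, 2, 3                    # points, input dims, output dims, latent dims

    W_x = rng.normal(size=(M, K))                 # input factor loadings (illustrative values)
    W_y = rng.normal(size=(L, K))                 # output factor loadings
    mu_x, mu_y = np.zeros(M), np.zeros(L)         # mean vectors
    sigma_x, sigma_y = 0.1, 0.1                   # noise standard deviations

    Z = rng.normal(size=(N, K))                   # z_n ~ N(0, I_K), shared latent variables
    X = Z @ W_x.T + mu_x + sigma_x * rng.normal(size=(N, M))   # x_n = W_x z_n + mu_x + noise
    Y = Z @ W_y.T + mu_y + sigma_y * rng.normal(size=(N, L))   # y_n = W_y z_n + mu_y + noise
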
15
Semi-Supervised Probabilistic PCA
  • Idea: An SPPCA model with missing (output) data!
  • Likelihood: labeled points contribute the joint density p(x_n, y_n), unlabeled points only the marginal p(x_n) (written out below)

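Written out explicitly (assuming, for notation only, that the first N_1 points are the labeled ones), the likelihood referred to above factorizes as:

    % S2PPCA likelihood: labeled points contribute the joint density p(x_n, y_n),
    % unlabeled points contribute only the marginal p(x_n).
    \mathcal{L}(\Theta) \;=\; \prod_{n=1}^{N_1} p(\mathbf{x}_n, \mathbf{y}_n \mid \Theta)
        \prod_{n=N_1+1}^{N} p(\mathbf{x}_n \mid \Theta)
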
16
S2PPCA EM Learning
  • Model parameters: Θ = (W_x, W_y, μ_x, μ_y, σ_x², σ_y²)
  • EM Learning (a plain-PPCA sketch is given below)
  • E-step: estimate z_n for each data point (a projection problem; this is the inference problem)
  • M-step: maximize the data log-likelihood w.r.t. the parameters
  • An extension of EM learning for the PPCA model
  • Can be kernelized!
  • By-product: an EM learning algorithm for kernel PCA

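As a concrete reference point, here is a minimal EM sketch for the plain PPCA model that this algorithm extends; the S2PPCA version in the talk works on the joint (x, y) likelihood with missing y's. All names are illustrative and convergence checks are omitted.

    import numpy as np

    def ppca_em(X, K, n_iter=200, seed=0):
        rng = np.random.default_rng(seed)
        N, M = X.shape
        mu = X.mean(axis=0)
        Xc = X - mu
        W = rng.normal(size=(M, K))                 # random initialization
        sigma2 = 1.0
        for _ in range(n_iter):
            # E-step: posterior moments of the latent variables z_n (the inference problem)
            Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(K))
            Ez = Xc @ W @ Minv                      # (N, K) posterior means E[z_n | x_n]
            Ezz = N * sigma2 * Minv + Ez.T @ Ez     # sum_n E[z_n z_n^T]
            # M-step: maximize the expected complete-data log-likelihood
            W_new = Xc.T @ Ez @ np.linalg.inv(Ezz)
            sigma2 = (np.sum(Xc ** 2)
                      - 2.0 * np.trace(W_new.T @ Xc.T @ Ez)
                      + np.trace(Ezz @ W_new.T @ W_new)) / (N * M)
            W = W_new
        return mu, W, sigma2
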
17
S2PPCA Toy Problem - Linear
18
S2PPCA Toy Problem - Nonlinear
19
S2PPCA Toy Problem - Nonlinear
20
S2PPCA Properties
  • Semi-supervised projection
  • Takes PCA and kernel PCA as special cases
  • Applicable to large data sets
  • Primal: O(t(M+L)NK) time, O((M+L)N) space
  • Dual: O(tN²K) time, O(N²) space (cheaper than Primal if M > N)
  • A latent variable solution [Yu et al., SIGIR'05]
  • Cannot deal with the semi-supervised setting!
  • Closed-form solutions for SPPCA
  • No closed-form solutions for S2PPCA

21
SPPCA Primal Solution
[Figure: closed-form primal solution, involving an (M+L)×(M+L) matrix]
22
SPPCA Dual Solution
[Figure: closed-form dual solution, based on a new kernel matrix]
23
Experiments
  • Train a Nearest Neighbor classifier after projection (evaluation sketch below)
  • Evaluation metrics
  • Multi-class classification: error rate
  • Multi-label classification: F1-measure, AUC

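A hedged sketch of this evaluation protocol using scikit-learn; `project`, the data splits, and the choice of 1 neighbor are placeholders for whatever projection (PCA / SPPCA / S2PPCA) and data set are being evaluated.

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    def evaluate_projection(project, X_train, y_train, X_test, y_test):
        # `project` maps raw features to the K-dimensional latent space
        Z_train = project(X_train)
        Z_test = project(X_test)
        clf = KNeighborsClassifier(n_neighbors=1)   # nearest-neighbor classifier
        clf.fit(Z_train, y_train)
        error_rate = 1.0 - accuracy_score(y_test, clf.predict(Z_test))
        return error_rate
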
24
Multi-class Classification
  • S2PPCA is almost always better than SPPCA
  • LDA is very good for FACE data
  • S2PPCA is very good on TEXT data
  • S2PPCA has good scalability

25
Multi-label Classification
  • S2PPCA is always better than SPPCA
  • S2PPCA is better in most cases
  • S2PPCA has good scalability

26
Extensions
  • Put priors on factor loading matrices
  • Learn MAP estimates for them
  • Relax Gaussian noise model for outputs
  • A better way to incorporate supervised information
  • This requires more approximations (using, e.g., EP)
  • Directly predict missing outputs (i.e., in a single step)
  • Mixture modeling in latent variable space
  • Achieve local PCA instead of global PCA
  • Robust supervised PCA mapping
  • Replace Gaussian with Student-t
  • Outlier detection in PCA

27
Related Work
  • Fisher discriminant analysis (FDA)
  • Goal: Find directions that maximize the between-class distance while minimizing the within-class distance
  • Only deals with multi-class classification outputs
  • Limited number of projection dimensions
  • Canonical correlation analysis (CCA)
  • Goal: Maximize the correlation between inputs and outputs
  • Ignores the intra-covariance of both inputs and outputs
  • Partial least squares (PLS)
  • Goal: Sequentially find orthogonal directions that maximize the covariance with respect to the outputs
  • A penalized CCA: poor generalization on new output dimensions

28
Conclusion
  • A general framework for (semi-)supervised
    dimensionality reduction
  • We can solve the problem analytically (EIG) or
    via iterative algorithms (EM)
  • Trade-off:
  • EIG: optimization-based, easily extended to other loss functions
  • EM: semi-supervised projection, good scalability
  • Both algorithms can be kernelized
  • PCA and kernel PCA are special cases
  • More applications expected

29
Thank you!
  • Questions?

http://www.dbs.ifi.lmu.de/spyu