Title: An Overview on Semi-Supervised Learning Methods
1. An Overview on Semi-Supervised Learning Methods
- Matthias Seeger, MPI for Biological Cybernetics
- Tuebingen, Germany
2. Overview
- The SSL Problem
- Paradigms for SSL. Examples
- The Importance of Input-Dependent Regularization
- Note: Citations are omitted here (they are given in my literature review)
3. Semi-Supervised Learning
- SSL is Supervised Learning...
- Goal: Estimate P(y|x) from labeled data D_l = {(x_i, y_i)}
- But: An additional source tells us about P(x) (e.g., unlabeled data D_u = {x_j})
The interesting case!
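In symbols (this notation is assumed here rather than spelled out on the slide), the two data sources are

    D_l = \{(x_i, y_i)\}_{i=1}^{n}, \qquad D_u = \{x_j\}_{j=n+1}^{n+m},

and the goal is to estimate P(y|x) using both.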
4. Obvious Baseline Methods
The goal of SSL is to do better than these baselines. Not uniformly and always (no free lunch, and yes, of course, unlabeled data can hurt), but, as always, when our modelling and algorithmic efforts reflect the true problem characteristics.
- Do not use info about P(x) → plain supervised learning
- Fit a mixture model using unsupervised learning, then label up the components using the y_i (a sketch follows below)
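A minimal sketch of the second baseline, assuming scikit-learn's GaussianMixture and integer class labels in y_l; the function and variable names are illustrative, not from the talk:

    # Baseline: unsupervised mixture fit, then label up components with the labeled data.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def mixture_baseline(X_l, y_l, X_u, n_components):
        # Fit the mixture on ALL inputs; the labels are ignored at this stage.
        gmm = GaussianMixture(n_components=n_components, random_state=0)
        gmm.fit(np.vstack([X_l, X_u]))
        # Label up each component by a majority vote of the labeled points it claims.
        comp_of_labeled = gmm.predict(X_l)
        comp_label = {}
        for k in range(n_components):
            members = y_l[comp_of_labeled == k]
            comp_label[k] = np.bincount(members).argmax() if len(members) else -1
        # Predict by mapping a point to its most likely component's label.
        return lambda X: np.array([comp_label[k] for k in gmm.predict(X)])

The majority vote is only one convention for labelling up components; the point is that the mixture itself is fit without looking at the labels.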
5. The Generative Paradigm
- Model the class distributions P(x|y) and the class prior P(y)
- This implies a model for P(y|x) and for P(x), via Bayes' rule (see below)
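The formulas themselves did not survive into this transcript; the standard relations meant here follow from Bayes' rule:

    P(x) = \sum_y P(y)\, P(x \mid y), \qquad
    P(y \mid x) = \frac{P(y)\, P(x \mid y)}{P(x)}.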
6. The Joint Likelihood
- Natural criterion in this context (a common form is sketched below)
- Maximize using EM (an idea as old as EM)
- Early and recent theoretical work on asymptotic variance
- Advantage: Easy to implement for standard mixture model setups
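The criterion is not reproduced in this transcript; a common form of the weighted joint log-likelihood, with the source weighting λ that slide 7 refers to, is

    \sum_{i=1}^{n} \log P(x_i, y_i \mid \theta)
    \;+\; \lambda \sum_{j > n} \log P(x_j \mid \theta),
    \qquad P(x_j \mid \theta) = \sum_{y} P(x_j, y \mid \theta),

maximized over θ by EM, treating the missing labels of the unlabeled points as latent variables.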
7. Drawbacks of Generative SSL
- Choice of the source weighting λ is crucial
  - Cross-validation fails for small n
  - Homotopy continuation (Corduneanu et al.)
- Just like in supervised learning:
  - Model for P(y|x) is specified only indirectly
  - Fitting is not primarily concerned with P(y|x)
- Also have to represent P(x) generally well, not just the aspects which help with P(y|x)
8. The Diagnostic Paradigm
- Model P(y|x, θ) and P(x|μ) directly
- But: Since θ and μ are independent a priori, θ does not depend on μ given the data → knowledge of μ does not influence the P(y|x) prediction in a probabilistic setup! (A short derivation is sketched below.)
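A one-line way to see this, assuming the diagnostic factorization of the likelihood and the independent prior P(θ)P(μ):

    P(\theta, \mu \mid D_l, D_u) \;\propto\;
    \Big[\, P(\theta) \prod_{i} P(y_i \mid x_i, \theta) \Big]
    \Big[\, P(\mu) \prod_{x \in D_l \cup D_u} P(x \mid \mu) \Big],

so the posterior factorizes, and the marginal posterior over θ never sees the unlabeled x_j.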
9. What To Do About It
- Non-probabilistic diagnostic techniques
- Replace the expected loss by a different criterion (Tong, Koller; Chapelle et al.) → very limited effect if n is small
- Some old work (e.g., Anderson)
- Drop the prior independence of θ and μ → input-dependent regularization
10. Input-Dependent Regularization
- Conditional priors P(θ|μ) make the P(y|x) estimation dependent on P(x) (a sketch follows below)
- Now, unlabeled data can really help...
- And can hurt for the same reason!
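Under the same factorized likelihood but with a conditional prior P(θ|μ), marginalizing out μ gives (up to a constant)

    P(\theta \mid D_l, D_u) \;\propto\;
    \prod_{i} P(y_i \mid x_i, \theta)
    \int P(\theta \mid \mu)\, P(\mu \mid \text{all } x)\, d\mu,

so whatever the inputs (labeled and unlabeled) say about μ now flows into the effective prior over θ.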
11. The Cluster Assumption (CA)
- Empirical observation: Clustering of the data x_j w.r.t. a sensible distance / features is often fairly compatible with the class regions
- Weaker: Class regions do not tend to cut high-volume regions of P(x)
- Why? Ask philosophers! My guess: selection bias for features / distance
No matter why: Many SSL methods implement the CA and work fine in practice
12. Examples for IDR Using the CA
- Label propagation, Gaussian random fields: Regularization depends on the graph structure, which is built from all x_j → more smoothness in regions of high connectivity / affinity flows (a sketch follows this list)
- Cluster kernels for SVM (Chapelle et al.)
- Information regularization (Corduneanu, Jaakkola)
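A minimal sketch of the label-propagation idea, using the harmonic solution on an RBF affinity graph as in Zhu et al.'s Gaussian random fields; the affinity choice, the bandwidth sigma, and all names are illustrative:

    # Label propagation: harmonic solution on an RBF affinity graph.
    # X holds all inputs with the n_l labeled points first; y_l in {0, ..., C-1}.
    import numpy as np

    def label_propagation(X, y_l, sigma=1.0):
        n_l = len(y_l)
        # Affinity graph over ALL points -- this is where the P(x) information enters.
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        W = np.exp(-d2 / (2.0 * sigma ** 2))
        np.fill_diagonal(W, 0.0)
        L = np.diag(W.sum(axis=1)) - W          # graph Laplacian
        # Harmonic solution: solve L_uu f_u = W_ul f_l for the unlabeled block.
        f_l = np.eye(y_l.max() + 1)[y_l]        # one-hot encoding of the labels
        f_u = np.linalg.solve(L[n_l:, n_l:], W[n_l:, :n_l] @ f_l)
        return f_u.argmax(axis=1)               # hard labels for the unlabeled points

The unlabeled points enter only through the graph W, which is exactly where the P(x) information, and hence the cluster assumption, gets encoded.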
13. More Examples for IDR
- Some methods do IDR, but implement the CA only in special cases
- Fisher kernels (Jaakkola et al.): Kernel built from Fisher features → automatic feature induction from a P(x) model
- Co-training (Blum, Mitchell): Consistency across different views (feature sets); a minimal sketch follows this list
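A simplified co-training sketch in the spirit of Blum and Mitchell, assuming two feature views and any scikit-learn classifier with predict_proba; the shared-pool bookkeeping and all names are illustrative simplifications, not the original algorithm:

    # Co-training: two classifiers, one per feature view, teach each other
    # by exchanging confident pseudo-labels on a shared unlabeled pool.
    import numpy as np
    from sklearn.base import clone

    def co_train(clf, X1_l, X2_l, y_l, X1_u, X2_u, rounds=10, add_per_round=5):
        c1, c2 = clone(clf), clone(clf)
        X1, X2, y = X1_l.copy(), X2_l.copy(), y_l.copy()
        U1, U2 = X1_u.copy(), X2_u.copy()
        for _ in range(rounds):
            if len(U1) == 0:
                break
            c1.fit(X1, y)
            c2.fit(X2, y)
            p1, p2 = c1.predict_proba(U1), c2.predict_proba(U2)
            # Pick the pool points either view is most confident about...
            top = np.argsort(np.maximum(p1.max(axis=1), p2.max(axis=1)))[-add_per_round:]
            # ...and pseudo-label each with the more confident view's guess.
            pseudo = np.where(p1[top].max(axis=1) >= p2[top].max(axis=1),
                              c1.classes_[p1[top].argmax(axis=1)],
                              c2.classes_[p2[top].argmax(axis=1)])
            X1, X2 = np.vstack([X1, U1[top]]), np.vstack([X2, U2[top]])
            y = np.concatenate([y, pseudo])
            U1, U2 = np.delete(U1, top, axis=0), np.delete(U2, top, axis=0)
        return c1, c2

In the original recipe each classifier adds its own most confident positives and negatives to a shared labeled set; the consistency-across-views idea is the same.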
14. Is SSL Always Generative?
- Wait: We have to model P(x) somehow. Is this not always generative then? ... No!
- Generative: Models P(x|y) fairly directly; the P(y|x) model and the effect of P(x) are implicit
- Diagnostic IDR:
  - Direct model for P(y|x), more flexibility
  - Influence of the P(x) knowledge on the P(y|x) prediction is directly controlled, e.g. through the CA → the model for P(x) can be much less elaborate
15. Conclusions
- Gave a taxonomy for probabilistic approaches to SSL
- Illustrated the paradigms with examples from the literature
- Tried to clarify some points which have led to confusion in the past