Title: Hybrids of generative and discriminative methods for machine learning
1. Hybrids of generative and discriminative methods for machine learning
MSRC Summer School - 30/06/2009
Cambridge UK
2. Motivation
- Generative models
  - incorporate prior knowledge
  - handle missing data, such as missing labels
- Discriminative models
  - perform well at classification
- However, there is no straightforward way to combine them
3. Content
- Generative and discriminative methods
- A principled hybrid framework
- Study of the properties on a toy example
- Influence of the amount of labelled data
5. Generative methods
- Answer "what does a cat look like? and a dog?"
  -> model the joint distribution of data and labels
- x: data, c: label, θ: parameters
6. Generative methods
- Objective function:
  - G(θ) = p(θ) p(X, C | θ)
  - G(θ) = p(θ) ∏_n p(x_n, c_n | θ)
- One reusable model per class; can deal with incomplete data
- Example: GMMs
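As a concrete illustration of this objective, here is a minimal sketch (toy data, all names invented; not code from the talk): fit one Gaussian per class by maximum likelihood, then classify a new point by the larger joint probability p(x, c | θ).

```python
import numpy as np

# Toy sketch of a generative classifier: one isotropic Gaussian per class,
# fitted to maximise the joint likelihood prod_n p(x_n, c_n | theta).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),      # class 0 samples
               rng.normal(3.0, 1.0, (50, 2))])     # class 1 samples
c = np.array([0] * 50 + [1] * 50)

def fit_generative(X, c):
    """ML estimates of class prior, mean and isotropic variance per class."""
    params = {}
    for k in np.unique(c):
        Xk = X[c == k]
        mu = Xk.mean(axis=0)
        var = np.mean((Xk - mu) ** 2)              # shared isotropic variance
        params[k] = (len(Xk) / len(X), mu, var)    # (p(c=k), mean, variance)
    return params

def log_joint(x, k, params):
    """log p(x, c=k | theta) for an isotropic Gaussian class model."""
    pi, mu, var = params[k]
    d = len(x)
    return (np.log(pi)
            - 0.5 * d * np.log(2 * np.pi * var)
            - 0.5 * np.sum((x - mu) ** 2) / var)

params = fit_generative(X, c)
# Classify by the larger joint probability (equivalently, larger posterior).
x_new = np.array([2.5, 2.5])
pred = max(params, key=lambda k: log_joint(x_new, k, params))
```

Because both classes get the same number of samples, the estimated priors are exactly 0.5, and the point (2.5, 2.5) lands closest to the class-1 Gaussian.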
7. Example of a generative model
8. Discriminative methods
- Answer "is it a cat or a dog?"
  -> model the posterior distribution of the labels
- x: data, c: label, θ: parameters
9. Discriminative methods
- The objective function is:
  - D(θ) = p(θ) p(C | X, θ)
  - D(θ) = p(θ) ∏_n p(c_n | x_n, θ)
- Focus on regions of ambiguity; make faster predictions
- Examples: neural networks, SVMs
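The discriminative objective can likewise be sketched with logistic regression, which maximises Σ_n log p(c_n | x_n, θ) directly by gradient ascent (a toy illustration with invented data, not one of the talk's models):

```python
import numpy as np

# Sketch: logistic regression trained on the discriminative objective
# sum_n log p(c_n | x_n, theta), by plain gradient ascent.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),     # class 0
               rng.normal(+1.0, 1.0, (50, 2))])    # class 1
c = np.array([0] * 50 + [1] * 50)
Xb = np.hstack([X, np.ones((100, 1))])             # append a bias feature

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.zeros(3)
for _ in range(500):
    p = sigmoid(Xb @ theta)                        # p(c=1 | x, theta)
    grad = Xb.T @ (c - p)                          # gradient of log-likelihood
    theta += 0.01 * grad                           # ascent step

accuracy = np.mean((sigmoid(Xb @ theta) > 0.5) == c)
```

Note that the model only represents the decision boundary p(c | x); nothing is said about the distribution of x itself, which is exactly the trade-off the slides discuss.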
10. Example of a discriminative model: SVMs / NNs
11. Generative versus discriminative
- The double mode (in the data density) has no effect on the decision boundary
13. Semi-supervised learning
- Few labelled data / lots of unlabelled data
- Discriminative methods overfit; generative models only help classification if they are good models
- Need the modelling power of generative models while performing well at discriminating
  -> hybrid models
14. Discriminative training (Bach et al., ICASSP 05)
- Discriminative objective function:
  - D(θ) = p(θ) ∏_n p(c_n | x_n, θ)
- Using a generative model:
  - D(θ) = p(θ) ∏_n p(x_n, c_n | θ) / p(x_n | θ)
  - D(θ) = p(θ) ∏_n p(x_n, c_n | θ) / Σ_c' p(x_n, c' | θ)
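The key step is that p(c_n | x_n, θ) needs only joint probabilities from the generative model, normalised by their sum over classes. A small sketch (hand-picked numbers, assumed setup; the log-sum-exp trick keeps the normalisation numerically stable):

```python
import numpy as np

# Sketch: turn joint log-probabilities log p(x_n, c | theta) (one column per
# class) into posterior log-probabilities log p(c | x_n, theta) by dividing
# by the data marginal p(x_n | theta) = sum over classes of the joint.
def log_posterior_from_joint(log_joint_per_class):
    m = log_joint_per_class.max(axis=1, keepdims=True)   # log-sum-exp trick
    log_marginal = m + np.log(
        np.exp(log_joint_per_class - m).sum(axis=1, keepdims=True))
    return log_joint_per_class - log_marginal

# Two points, two classes: joints chosen by hand for illustration.
log_joint = np.log(np.array([[0.30, 0.10],    # p(x1, c=0), p(x1, c=1)
                             [0.05, 0.20]]))  # p(x2, c=0), p(x2, c=1)
log_post = log_posterior_from_joint(log_joint)
# For x1: p(c=0 | x1) = 0.30 / (0.30 + 0.10) = 0.75
```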
15. Convex combination (Bouchard et al., COMPSTAT 04)
- Generative objective function:
  - G(θ) = p(θ) ∏_n p(x_n, c_n | θ)
- Discriminative objective function:
  - D(θ) = p(θ) ∏_n p(c_n | x_n, θ)
- Convex combination:
  - log L(θ) = α log D(θ) + (1 − α) log G(θ), with α ∈ [0, 1]
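A minimal sketch of the blended objective (the objective values below are made up for illustration):

```python
# Sketch: convex combination of the two log-objectives,
# log L(theta) = alpha * log D(theta) + (1 - alpha) * log G(theta).
def blended_log_likelihood(log_D, log_G, alpha):
    assert 0.0 <= alpha <= 1.0
    return alpha * log_D + (1.0 - alpha) * log_G

# alpha = 0 recovers the generative objective, alpha = 1 the discriminative one.
log_D, log_G = -120.0, -340.0    # made-up objective values
```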
16. A principled hybrid model
- θ: parameters of the posterior distribution of the labels
- θ~: parameters of the marginal distribution of the data
- θ and θ~ communicate through a prior
- Hybrid objective function:
  - L(θ, θ~) = p(θ, θ~) ∏_n p(c_n | x_n, θ) ∏_n p(x_n | θ~)
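A sketch of this objective under an assumed Gaussian coupling prior N(θ | θ~, σ²I) (the concrete prior form and all function names below are illustrative assumptions, not the talk's code):

```python
import numpy as np

# Sketch: hybrid log-objective coupling a discriminative parameter set theta
# with a generative set theta_tilde through a Gaussian prior (assumed form).
def hybrid_log_objective(log_post_terms, log_marg_terms,
                         theta, theta_tilde, sigma):
    """log L = log p(theta, theta_tilde)
             + sum_n log p(c_n | x_n, theta)     (discriminative part)
             + sum_n log p(x_n | theta_tilde)    (generative part)"""
    diff = theta - theta_tilde
    log_prior = (-0.5 * np.sum(diff ** 2) / sigma ** 2
                 - 0.5 * len(theta) * np.log(2 * np.pi * sigma ** 2))
    return log_prior + np.sum(log_post_terms) + np.sum(log_marg_terms)

# With a tight prior (small sigma), pulling theta away from theta_tilde is
# heavily penalised, so the two parameter sets stay close:
theta = np.array([1.0, 2.0])
tied = hybrid_log_objective([-1.0], [-2.0], theta, theta, 0.1)
apart = hybrid_log_objective([-1.0], [-2.0], theta, theta + 1.0, 0.1)
```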
21. A principled hybrid model
- θ~ = θ  =>  p(θ, θ~) = p(θ) δ(θ~ − θ)
  - L(θ, θ~) = p(θ) δ(θ~ − θ) ∏_n p(c_n | x_n, θ) ∏_n p(x_n | θ~)
  - L(θ) = G(θ): generative case
- θ~ ⊥ θ  =>  p(θ, θ~) = p(θ) p(θ~)
  - L(θ, θ~) = [p(θ) ∏_n p(c_n | x_n, θ)] × [p(θ~) ∏_n p(x_n | θ~)]
  - L(θ, θ~) = D(θ) × f(θ~): discriminative case
22. A principled hybrid model
- Anything in between: hybrid case
- Choice of prior:
  - p(θ, θ~) = p(θ~) N(θ | θ~, σ(α))
  - α -> 0  =>  σ -> 0  =>  θ = θ~ (generative case)
  - α -> 1  =>  σ -> ∞  =>  θ ⊥ θ~ (discriminative case)
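The slides do not give a concrete σ(α); one hypothetical map with the stated limits (σ -> 0 as α -> 0, σ -> ∞ as α -> 1) is sketched below, purely as an illustration of how a single scalar can interpolate between the two regimes:

```python
# Hypothetical coupling schedule (not from the slides): sigma(alpha) grows
# from 0 (parameters tied: generative case) to infinity (parameters
# independent: discriminative case) as alpha goes from 0 to 1.
def sigma(alpha):
    assert 0.0 <= alpha < 1.0
    return alpha / (1.0 - alpha)
```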
23. Why principled?
- Consistent with the likelihood of graphical models
  -> one single way to train a system
- Everything can now be modelled
  -> potential to be Bayesian
- Potential to learn α
24. Learning
- EM / Laplace approximation / MCMC:
  - either intractable or too slow
- Conjugate gradients:
  - flexible, easy to check, BUT sensitive to initialisation and slow
- Variational inference
26. Toy example
- 2 elongated class distributions
- Only spherical Gaussians allowed -> wrong model
- 2 labelled points per class -> strong risk of overfitting
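The model mismatch in this set-up can be demonstrated numerically (a sketch with assumed parameters, not the talk's actual data): fit an elongated Gaussian with a spherical model and with a diagonal-covariance model, and compare the average log-likelihood.

```python
import numpy as np

# Sketch of the toy set-up: elongated data, but the model family is
# restricted to spherical Gaussians (one shared variance for both axes).
rng = np.random.default_rng(2)
cov_elongated = np.array([[9.0, 0.0], [0.0, 0.25]])   # long in x, thin in y
X = rng.multivariate_normal([0.0, 0.0], cov_elongated, size=500)

def avg_log_lik_spherical(X):
    """Best spherical fit: mean plus one shared isotropic variance."""
    mu = X.mean(axis=0)
    var = np.mean((X - mu) ** 2)
    return np.mean(-0.5 * 2 * np.log(2 * np.pi * var)
                   - 0.5 * np.sum((X - mu) ** 2, axis=1) / var)

def avg_log_lik_diagonal(X):
    """Diagonal-covariance fit, able to capture the elongation."""
    mu, var = X.mean(axis=0), X.var(axis=0)
    return np.mean(-0.5 * np.sum(np.log(2 * np.pi * var))
                   - 0.5 * np.sum((X - mu) ** 2 / var, axis=1))

# The restricted (spherical) model fits the elongated data strictly worse.
```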
29. Decision boundaries
31. A real example
- Images are a special case, as each image contains several features
- 2 levels of supervision: at the image level and at the feature level
- Image label only -> weakly labelled
- Image label + segmentation -> fully labelled
32. The underlying generative model
(figure: graphical model with multinomial and Gaussian nodes)
33. The underlying generative model
(figure: weakly vs. fully labelled cases)
34. Experimental set-up
- 3 classes: bikes, cows, sheep
- ≈ 1 Gaussian per class -> poor generative model
- 75 training images for each category
35. HF (hybrid framework)
36. HF versus CC (convex combination)
37. Results
- When increasing the proportion of fully labelled data, the trend is:
  generative -> hybrid -> discriminative
- Weakly labelled data has little influence on the trend
- With sufficient fully labelled data, HF tends to perform better than CC
38. Experimental set-up
- 3 classes: lions, tigers and cheetahs
- ≈ 1 Gaussian per class -> poor generative model
- 75 training images for each category
39. HF framework
40. HF versus CC
41. Results
- Hybrid models consistently perform better
- However, the generative and discriminative models haven't reached saturation
- No clear difference between HF and CC
42. Conclusion
- Principled hybrid framework
- Possibility to learn the best trade-off
- Helps for ambiguous datasets when labelled data is scarce
- Remaining problem: optimisation
43. Future avenues
- Bayesian version (posterior distribution of α) under study
- Replace σ by a diagonal matrix Σ to allow flexibility
  -> need for the Bayesian version
- Choice of priors
44. Thank you!