Title: Hybrids of generative and discriminative methods for machine learning
1. Hybrids of generative and discriminative methods for machine learning
MSRC Summer School - 30/06/2009
Cambridge UK
2. Motivation
- Generative models
  - incorporate prior knowledge
  - handle missing data, such as missing labels
- Discriminative models
  - perform well at classification
- However, there is no straightforward way to combine them
3. Content
- Generative and discriminative methods
- A principled hybrid framework
- Study of the properties on a toy example
- Influence of the amount of labelled data
5. Generative methods
- Answer "what does a cat look like? and a dog?"
  -> model the joint distribution of data and labels
- x: data, c: label, θ: parameters
6. Generative methods
- Objective function:
  - G(θ) = p(θ) p(X, C | θ)
  - G(θ) = p(θ) ∏_n p(x_n, c_n | θ)
- One reusable model per class; can deal with incomplete data
- Example: GMMs
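As a concrete illustration of this objective, here is a minimal sketch (toy data, all names invented; not code from the talk): fit one Gaussian per class by maximum likelihood, then classify a new point by the larger joint probability p(x, c | θ).

```python
import numpy as np

# Toy sketch of a generative classifier: one isotropic Gaussian per class,
# fitted to maximise the joint likelihood prod_n p(x_n, c_n | theta).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),      # class 0 samples
               rng.normal(3.0, 1.0, (50, 2))])     # class 1 samples
c = np.array([0] * 50 + [1] * 50)

def fit_generative(X, c):
    """ML estimates of class prior, mean and isotropic variance per class."""
    params = {}
    for k in np.unique(c):
        Xk = X[c == k]
        mu = Xk.mean(axis=0)
        var = np.mean((Xk - mu) ** 2)              # shared isotropic variance
        params[k] = (len(Xk) / len(X), mu, var)    # (p(c=k), mean, variance)
    return params

def log_joint(x, k, params):
    """log p(x, c=k | theta) for an isotropic Gaussian class model."""
    pi, mu, var = params[k]
    d = len(x)
    return (np.log(pi)
            - 0.5 * d * np.log(2 * np.pi * var)
            - 0.5 * np.sum((x - mu) ** 2) / var)

params = fit_generative(X, c)
# Classify by the larger joint probability (equivalently, larger posterior).
x_new = np.array([2.5, 2.5])
pred = max(params, key=lambda k: log_joint(x_new, k, params))
```

Because both classes get the same number of samples, the estimated priors are exactly 0.5, and the point (2.5, 2.5) lands closest to the class-1 Gaussian.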
7. Example of a generative model
8. Discriminative methods
- Answer "is it a cat or a dog?"
  -> model the posterior distribution of the labels
- x: data, c: label, θ: parameters
9. Discriminative methods
- The objective function is:
  - D(θ) = p(θ) p(C | X, θ)
  - D(θ) = p(θ) ∏_n p(c_n | x_n, θ)
- Focus on regions of ambiguity; make faster predictions
- Examples: neural networks, SVMs
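The discriminative objective can likewise be sketched with logistic regression, which maximises Σ_n log p(c_n | x_n, θ) directly by gradient ascent (a toy illustration with invented data, not one of the talk's models):

```python
import numpy as np

# Sketch: logistic regression trained on the discriminative objective
# sum_n log p(c_n | x_n, theta), by plain gradient ascent.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),     # class 0
               rng.normal(+1.0, 1.0, (50, 2))])    # class 1
c = np.array([0] * 50 + [1] * 50)
Xb = np.hstack([X, np.ones((100, 1))])             # append a bias feature

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.zeros(3)
for _ in range(500):
    p = sigmoid(Xb @ theta)                        # p(c=1 | x, theta)
    grad = Xb.T @ (c - p)                          # gradient of log-likelihood
    theta += 0.01 * grad                           # ascent step

accuracy = np.mean((sigmoid(Xb @ theta) > 0.5) == c)
```

Note that the model only represents the decision boundary p(c | x); nothing is said about the distribution of x itself, which is exactly the trade-off the slides discuss.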
10. Example of a discriminative model: SVMs / NNs
11. Generative versus discriminative
- The double mode (in the data density) has no effect on the decision boundary
13. Semi-supervised learning
- Few labelled data / lots of unlabelled data
- Discriminative methods overfit; generative models only help classification if they are good models
- Need the modelling power of generative models while performing well at discriminating
  -> hybrid models
14. Discriminative training (Bach et al., ICASSP 05)
- Discriminative objective function:
  - D(θ) = p(θ) ∏_n p(c_n | x_n, θ)
- Using a generative model:
  - D(θ) = p(θ) ∏_n p(x_n, c_n | θ) / p(x_n | θ)
  - D(θ) = p(θ) ∏_n p(x_n, c_n | θ) / Σ_c' p(x_n, c' | θ)
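The key step is that p(c_n | x_n, θ) needs only joint probabilities from the generative model, normalised by their sum over classes. A small sketch (hand-picked numbers, assumed setup; the log-sum-exp trick keeps the normalisation numerically stable):

```python
import numpy as np

# Sketch: turn joint log-probabilities log p(x_n, c | theta) (one column per
# class) into posterior log-probabilities log p(c | x_n, theta) by dividing
# by the data marginal p(x_n | theta) = sum over classes of the joint.
def log_posterior_from_joint(log_joint_per_class):
    m = log_joint_per_class.max(axis=1, keepdims=True)   # log-sum-exp trick
    log_marginal = m + np.log(
        np.exp(log_joint_per_class - m).sum(axis=1, keepdims=True))
    return log_joint_per_class - log_marginal

# Two points, two classes: joints chosen by hand for illustration.
log_joint = np.log(np.array([[0.30, 0.10],    # p(x1, c=0), p(x1, c=1)
                             [0.05, 0.20]]))  # p(x2, c=0), p(x2, c=1)
log_post = log_posterior_from_joint(log_joint)
# For x1: p(c=0 | x1) = 0.30 / (0.30 + 0.10) = 0.75
```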
15. Convex combination (Bouchard et al., COMPSTAT 04)
- Generative objective function:
  - G(θ) = p(θ) ∏_n p(x_n, c_n | θ)
- Discriminative objective function:
  - D(θ) = p(θ) ∏_n p(c_n | x_n, θ)
- Convex combination:
  - log L(θ) = α log D(θ) + (1 − α) log G(θ), with α ∈ [0, 1]
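A minimal sketch of the blended objective (the objective values below are made up for illustration):

```python
# Sketch: convex combination of the two log-objectives,
# log L(theta) = alpha * log D(theta) + (1 - alpha) * log G(theta).
def blended_log_likelihood(log_D, log_G, alpha):
    assert 0.0 <= alpha <= 1.0
    return alpha * log_D + (1.0 - alpha) * log_G

# alpha = 0 recovers the generative objective, alpha = 1 the discriminative one.
log_D, log_G = -120.0, -340.0    # made-up objective values
```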
16. A principled hybrid model
- θ: parameters of the posterior distribution of the labels
- θ~: parameters of the marginal distribution of the data
- θ and θ~ communicate through a prior
- Hybrid objective function:
  - L(θ, θ~) = p(θ, θ~) ∏_n p(c_n | x_n, θ) ∏_n p(x_n | θ~)
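A sketch of this objective under an assumed Gaussian coupling prior N(θ | θ~, σ²I) (the concrete prior form and all function names below are illustrative assumptions, not the talk's code):

```python
import numpy as np

# Sketch: hybrid log-objective coupling a discriminative parameter set theta
# with a generative set theta_tilde through a Gaussian prior (assumed form).
def hybrid_log_objective(log_post_terms, log_marg_terms,
                         theta, theta_tilde, sigma):
    """log L = log p(theta, theta_tilde)
             + sum_n log p(c_n | x_n, theta)     (discriminative part)
             + sum_n log p(x_n | theta_tilde)    (generative part)"""
    diff = theta - theta_tilde
    log_prior = (-0.5 * np.sum(diff ** 2) / sigma ** 2
                 - 0.5 * len(theta) * np.log(2 * np.pi * sigma ** 2))
    return log_prior + np.sum(log_post_terms) + np.sum(log_marg_terms)

# With a tight prior (small sigma), pulling theta away from theta_tilde is
# heavily penalised, so the two parameter sets stay close:
theta = np.array([1.0, 2.0])
tied = hybrid_log_objective([-1.0], [-2.0], theta, theta, 0.1)
apart = hybrid_log_objective([-1.0], [-2.0], theta, theta + 1.0, 0.1)
```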
21. A principled hybrid model
- θ~ = θ  =>  p(θ, θ~) = p(θ) δ(θ~ − θ)
  - L(θ, θ~) = p(θ) δ(θ~ − θ) ∏_n p(c_n | x_n, θ) ∏_n p(x_n | θ~)
  - L(θ) = G(θ): generative case
- θ~ ⊥ θ  =>  p(θ, θ~) = p(θ) p(θ~)
  - L(θ, θ~) = [p(θ) ∏_n p(c_n | x_n, θ)] × [p(θ~) ∏_n p(x_n | θ~)]
  - L(θ, θ~) = D(θ) × f(θ~): discriminative case
22. A principled hybrid model
- Anything in between: hybrid case
- Choice of prior:
  - p(θ, θ~) = p(θ~) N(θ | θ~, σ(α))
  - α -> 0  =>  σ -> 0  =>  θ = θ~ (generative case)
  - α -> 1  =>  σ -> ∞  =>  θ ⊥ θ~ (discriminative case)
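The slides do not give a concrete σ(α); one hypothetical map with the stated limits (σ -> 0 as α -> 0, σ -> ∞ as α -> 1) is sketched below, purely as an illustration of how a single scalar can interpolate between the two regimes:

```python
# Hypothetical coupling schedule (not from the slides): sigma(alpha) grows
# from 0 (parameters tied: generative case) to infinity (parameters
# independent: discriminative case) as alpha goes from 0 to 1.
def sigma(alpha):
    assert 0.0 <= alpha < 1.0
    return alpha / (1.0 - alpha)
```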
23. Why principled?
- Consistent with the likelihood of graphical models
  -> one single way to train a system
- Everything can now be modelled
  -> potential to be Bayesian
- Potential to learn α
24. Learning
- EM / Laplace approximation / MCMC:
  - either intractable or too slow
- Conjugate gradients:
  - flexible, easy to check, BUT sensitive to initialisation and slow
- Variational inference
26. Toy example
- 2 elongated class distributions
- Only spherical Gaussians allowed -> wrong model
- 2 labelled points per class -> strong risk of overfitting
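The model mismatch in this set-up can be demonstrated numerically (a sketch with assumed parameters, not the talk's actual data): fit an elongated Gaussian with a spherical model and with a diagonal-covariance model, and compare the average log-likelihood.

```python
import numpy as np

# Sketch of the toy set-up: elongated data, but the model family is
# restricted to spherical Gaussians (one shared variance for both axes).
rng = np.random.default_rng(2)
cov_elongated = np.array([[9.0, 0.0], [0.0, 0.25]])   # long in x, thin in y
X = rng.multivariate_normal([0.0, 0.0], cov_elongated, size=500)

def avg_log_lik_spherical(X):
    """Best spherical fit: mean plus one shared isotropic variance."""
    mu = X.mean(axis=0)
    var = np.mean((X - mu) ** 2)
    return np.mean(-0.5 * 2 * np.log(2 * np.pi * var)
                   - 0.5 * np.sum((X - mu) ** 2, axis=1) / var)

def avg_log_lik_diagonal(X):
    """Diagonal-covariance fit, able to capture the elongation."""
    mu, var = X.mean(axis=0), X.var(axis=0)
    return np.mean(-0.5 * np.sum(np.log(2 * np.pi * var))
                   - 0.5 * np.sum((X - mu) ** 2 / var, axis=1))

# The restricted (spherical) model fits the elongated data strictly worse.
```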
29. Decision boundaries
31. A real example
- Images are a special case, as each image contains several features
- 2 levels of supervision: at the image level and at the feature level
- Image label only -> weakly labelled
- Image label + segmentation -> fully labelled
32. The underlying generative model
(figure: graphical model with multinomial and Gaussian nodes)
33. The underlying generative model
(figure: weakly vs. fully labelled cases)
34. Experimental set-up
- 3 classes: bikes, cows, sheep
- ≈ 1 Gaussian per class -> poor generative model
- 75 training images for each category
35. HF (hybrid framework)
36. HF versus CC (convex combination)
37. Results
- When increasing the proportion of fully labelled data, the trend is:
  generative -> hybrid -> discriminative
- Weakly labelled data has little influence on the trend
- With sufficient fully labelled data, HF tends to perform better than CC
38. Experimental set-up
- 3 classes: lions, tigers and cheetahs
- ≈ 1 Gaussian per class -> poor generative model
- 75 training images for each category
39. HF framework
40. HF versus CC
41. Results
- Hybrid models consistently perform better
- However, the generative and discriminative models haven't reached saturation
- No clear difference between HF and CC
42. Conclusion
- Principled hybrid framework
- Possibility to learn the best trade-off
- Helps for ambiguous datasets when labelled data is scarce
- Remaining problem: optimisation
43. Future avenues
- Bayesian version (posterior distribution of α) under study
- Replace σ by a diagonal matrix Σ to allow flexibility
  -> need for the Bayesian version
- Choice of priors
44. Thank you!