Hybrids of generative and discriminative methods for machine learning

1
Hybrids of generative and discriminative methods
for machine learning
MSRC Summer School - 30/06/2009
Cambridge UK
2
Motivation
  • Generative models
      • incorporate prior knowledge
      • handle missing data such as labels
  • Discriminative models
      • perform well at classification
  • However, there is no straightforward way to combine them

3
Content
  • Generative and discriminative methods
  • A principled hybrid framework
  • Study of the properties on a toy example
  • Influence of the amount of labelled data

4
Content
  • Generative and discriminative methods
  • A principled hybrid framework
  • Study of the properties on a toy example
  • Influence of the amount of labelled data

5
Generative methods
  • Answer "what does a cat look like? and a dog?" ⇒ model the joint
    distribution of data and labels

x: data, c: label, θ: parameters
6
Generative methods
  • Objective function
      $G(\theta) \propto p(\theta)\, p(X, C \mid \theta)$
      $G(\theta) \propto p(\theta) \prod_n p(x_n, c_n \mid \theta)$
  • 1 reusable model per class, can deal with incomplete data
  • Example: GMMs (see the sketch below)
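
A minimal sketch of this generative objective, assuming class-conditional Gaussians; the function name and the flat prior on θ are illustrative, not from the slides:

```python
import numpy as np
from scipy.stats import multivariate_normal

def generative_log_objective(X, c, priors, means, covs):
    """log G(theta) = sum_n log p(x_n, c_n | theta), assuming a flat prior p(theta).

    Here p(x, c | theta) = p(c) * N(x | mu_c, Sigma_c): one reusable model
    per class, which is why incomplete (e.g. unlabelled) data can be handled.
    """
    log_joint = 0.0
    for k in range(len(priors)):
        Xk = X[c == k]                      # points labelled with class k
        if len(Xk) > 0:
            log_joint += np.sum(np.log(priors[k])
                                + multivariate_normal.logpdf(Xk, means[k], covs[k]))
    return log_joint
```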

7
Example of generative model
8
Discriminative methods
  • Answer "is it a cat or a dog?" ⇒ model the posterior distribution of
    the labels

x: data, c: label, θ: parameters
9
Discriminative methods
  • The objective function is
      $D(\theta) \propto p(\theta)\, p(C \mid X, \theta)$
      $D(\theta) \propto p(\theta) \prod_n p(c_n \mid x_n, \theta)$
  • Focus on regions of ambiguity, make faster predictions
  • Example: neural networks, SVMs (see the sketch below)
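
A minimal sketch of the conditional objective for a softmax (multinomial logistic) classifier; the function name and parameter shapes are assumptions for illustration:

```python
import numpy as np

def discriminative_log_objective(X, c, W, b):
    """log D(theta) = sum_n log p(c_n | x_n, theta) for a linear softmax model.

    Only the posterior over labels is modelled, so all capacity goes into
    the decision boundary rather than the distribution of x.
    """
    logits = X @ W + b                               # (N, K) class scores
    log_Z = np.logaddexp.reduce(logits, axis=1)      # log-sum-exp normaliser
    return np.sum(logits[np.arange(len(c)), c] - log_Z)
```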

10
Example of discriminative model
SVMs / NNs
11
Generative versus discriminative
A second mode in the class-conditional distribution has no effect on the
decision boundary
12
Content
  • Generative and discriminative methods
  • A principled hybrid framework
  • Study of the properties on a toy example
  • Influence of the amount of labelled data

13
Semi-supervised learning
  • Few labelled data points / lots of unlabelled data
  • Discriminative methods overfit; generative models only help
    classification if they are good
  • Need the modelling power of generative models while performing as well
    as discriminative ones ⇒ hybrid models

14
Discriminative training (Bach et al., ICASSP 05)
  • Discriminative objective function
      $D(\theta) \propto p(\theta) \prod_n p(c_n \mid x_n, \theta)$
  • Using a generative model (see the sketch below)
      $D(\theta) \propto p(\theta) \prod_n p(x_n, c_n \mid \theta)\, /\, p(x_n \mid \theta)$
      $D(\theta) \propto p(\theta) \prod_n \frac{p(x_n, c_n \mid \theta)}{\sum_{c'} p(x_n, c' \mid \theta)}$
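
A sketch of this construction: the generative joint is normalised over classes to give the posterior that the discriminative objective needs (names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_posterior_from_joint(x, priors, means, covs):
    """log p(c | x, theta) = log p(x, c | theta) - log sum_c' p(x, c' | theta).

    Maximising the sum over n of this w.r.t. theta trains the *generative*
    model discriminatively, as in the objective above.
    """
    log_joint = np.array([np.log(priors[k])
                          + multivariate_normal.logpdf(x, means[k], covs[k])
                          for k in range(len(priors))])
    return log_joint - np.logaddexp.reduce(log_joint)   # normalise in log space
```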

15
Convex combination (Bouchard et al., COMPSTAT 04)
  • Generative objective function
      $G(\theta) \propto p(\theta) \prod_n p(x_n, c_n \mid \theta)$
  • Discriminative objective function
      $D(\theta) \propto p(\theta) \prod_n p(c_n \mid x_n, \theta)$
  • Convex combination (see the sketch below)
      $\log L(\theta) = \alpha \log D(\theta) + (1 - \alpha) \log G(\theta)$
      $\alpha \in [0, 1]$

16
A principled hybrid model
17
A principled hybrid model
18
A principled hybrid model
19
A principled hybrid model
20
A principled hybrid model
  • θ parametrises the posterior distribution of the labels
  • θ̃ parametrises the marginal distribution of the data
  • θ and θ̃ communicate through a prior
  • Hybrid objective function (see the sketch below)

$L(\theta, \tilde\theta) \propto p(\theta, \tilde\theta) \prod_n p(c_n \mid x_n, \theta) \prod_n p(x_n \mid \tilde\theta)$
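
A schematic sketch of the hybrid objective in code, with the conditional, marginal and joint prior passed in as callables (all names are assumptions):

```python
def hybrid_log_objective(X, c, theta, theta_t, log_prior, log_cond, log_marg):
    """log L(theta, theta~) = log p(theta, theta~)
                            + sum_n log p(c_n | x_n, theta)   (discriminative part)
                            + sum_n log p(x_n | theta~)       (generative part)

    The two parameter sets interact only through the joint prior.
    """
    value = log_prior(theta, theta_t)
    for x_n, c_n in zip(X, c):
        value += log_cond(c_n, x_n, theta) + log_marg(x_n, theta_t)
    return value
```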
21
A principled hybrid model
  • θ̃ = θ ⇒ $p(\theta, \tilde\theta) = p(\theta)\, \delta(\tilde\theta - \theta)$
      $L(\theta) \propto p(\theta) \prod_n p(c_n \mid x_n, \theta) \prod_n p(x_n \mid \theta)$
      $L(\theta) = G(\theta)$: generative case
  • θ ⊥ θ̃ ⇒ $p(\theta, \tilde\theta) = p(\theta)\, p(\tilde\theta)$
      $L(\theta, \tilde\theta) \propto \big[p(\theta) \prod_n p(c_n \mid x_n, \theta)\big] \big[p(\tilde\theta) \prod_n p(x_n \mid \tilde\theta)\big]$
      $L(\theta, \tilde\theta) = D(\theta) \times f(\tilde\theta)$: discriminative case

22
A principled hybrid model
  • Anything in between: hybrid case
  • Choice of prior (see the sketch below)
      $p(\theta, \tilde\theta) = p(\tilde\theta)\, \mathcal{N}(\theta \mid \tilde\theta, \sigma(a))$
  • a → 0 ⇒ σ → 0 ⇒ θ = θ̃: generative case
  • a → 1 ⇒ σ → ∞ ⇒ θ ⊥ θ̃: discriminative case
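
A sketch of the coupling prior (up to the p(θ̃) factor), showing how σ interpolates between the two limits; the isotropic Gaussian form follows the slide, the code details are assumptions:

```python
import numpy as np

def log_coupling_prior(theta, theta_t, sigma):
    """log N(theta | theta~, sigma^2 I), omitting the log p(theta~) term.

    sigma -> 0 pins theta to theta~ (generative limit);
    sigma -> inf decouples them (discriminative limit).
    """
    d = np.asarray(theta) - np.asarray(theta_t)
    k = d.size
    return -0.5 * (d @ d) / sigma**2 - 0.5 * k * np.log(2.0 * np.pi * sigma**2)
```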

23
Why principled?
  • Consistent with the likelihood of graphical models
    ⇒ one way to train a system
  • Everything can now be modelled
    ⇒ potential to be Bayesian
  • Potential to learn the trade-off parameter a

24
Learning
  • EM / Laplace approximation / MCMC
      • either intractable or too slow
  • Conjugate gradients (see the sketch below)
      • flexible, easy to check, BUT sensitive to initialisation, slow
  • Variational inference
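
A minimal sketch of the conjugate-gradient option using SciPy, minimising the negative hybrid log-objective over the stacked parameters (the wrapper is illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def fit_hybrid(neg_log_L, theta0, theta_t0):
    """Maximise log L(theta, theta~) with nonlinear conjugate gradients."""
    n = len(theta0)
    z0 = np.concatenate([theta0, theta_t0])
    res = minimize(lambda z: neg_log_L(z[:n], z[n:]), z0, method="CG")
    # CG is flexible and easy to check, but sensitive to initialisation
    # and can be slow, as noted above.
    return res.x[:n], res.x[n:]
```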

25
Content
  • Generative and discriminative methods
  • A principled hybrid framework
  • Study of the properties on a toy example
  • Influence of the amount of labelled data

26
Toy example
27
Toy example
  • 2 elongated distributions
  • Only spherical Gaussians allowed ⇒ wrong model
  • 2 labelled points per class ⇒ strong risk of overfitting (see the
    sketch below)
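
A guess at the set-up in code; the particular means and covariance are illustrative, only the structure (elongated classes, 2 labelled points each) comes from the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two elongated (anisotropic) class distributions: a spherical Gaussian is
# the wrong model for either of them.
cov = np.array([[4.0, 0.0],
                [0.0, 0.1]])                        # elongated along the x-axis
X0 = rng.multivariate_normal([-1.0, -1.0], cov, size=200)
X1 = rng.multivariate_normal([+1.0, +1.0], cov, size=200)

# Only 2 labelled points per class; the rest is unlabelled.
X_lab = np.vstack([X0[:2], X1[:2]])
c_lab = np.array([0, 0, 1, 1])
X_unlab = np.vstack([X0[2:], X1[2:]])
```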

28
Toy example
29
Decision boundaries
30
Content
  • Generative and discriminative methods
  • A principled hybrid framework
  • Study of the properties on a toy example
  • Influence of the amount of labelled data

31
A real example
  • Images are a special case, as they each contain several features
  • 2 levels of supervision: at the image level and at the feature level
  • Image label only ⇒ weakly labelled
  • Image label + segmentation ⇒ fully labelled

32
The underlying generative model
[Figure: graphical model with two multinomial nodes and a Gaussian node]
33
The underlying generative model
[Figure: the model in the weakly and fully labelled cases]
34
Experimental set-up
  • 3 classes: bikes, cows, sheep
  • Only 1 Gaussian per class ⇒ poor generative model
  • 75 training images per category

35
HF framework
36
HF versus CC
37
Results
  • When increasing the proportion of fully labelled data, the trend is
    generative → hybrid → discriminative
  • Weakly labelled data has little influence on the trend
  • With sufficient fully labelled data, HF tends to perform better than CC

38
Experimental set-up
  • 3 classes: lions, tigers and cheetahs
  • Only 1 Gaussian per class ⇒ poor generative model
  • 75 training images per category

39
HF framework
40
HF versus CC
41
Results
  • Hybrid models consistently perform better
  • However, generative and discriminative models haven't reached
    saturation
  • No clear difference between HF and CC

42
Conclusion
  • Principled hybrid framework
  • Possibility to learn the best trade-off
  • Helps for ambiguous datasets when labelled data
    is scarce
  • Problem of optimisation

43
Future avenues
  • Bayesian version (posterior distribution of σ) under study
  • Replace σ by a diagonal matrix Σ to allow more flexibility
    ⇒ need for the Bayesian version
  • Choice of priors

44
Thank you!