Title: Christopher M. Bishop
1 Generative vs. Discriminative Approaches to Object Recognition
Microsoft Research Cambridge
International Workshop on Object Recognition, Sicily, October 2004
2 Collaborator
3 Generative vs. Discriminative Models
- Generative: model the joint distribution of data and classes, then evaluate the posterior probabilities using Bayes' theorem (written out below)
- Discriminative: directly model the posterior probabilities
- In both cases we usually work in a feature space
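In symbols (standard Bayes' theorem, added here for reference rather than transcribed from the slide): the generative route models the class-conditional density p(x | C_k) and the prior p(C_k) for each class, then evaluates

p(C_k | x) = p(x | C_k) p(C_k) / p(x),   where   p(x) = \sum_j p(x | C_j) p(C_j),

whereas the discriminative route models p(C_k | x) directly as a function of x.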
4 Weakly Labelled Images
- Images labelled with object category only
- Patch-based models, no spatial relationships
- Illustrative 2-class experiments: cows vs. sheep
5 Interest Points and Features
- Difference of Gaussians or Harris interest point detectors
- SIFT descriptors, optionally with colour
- Interest point code from Cordelia Schmid
6 A Discriminative Approach
7 A Discriminative Approach
- Patch labels are hidden
- Class labels (not mutually exclusive) for the whole image
- Image carries class label k if at least one patch does, so the posterior probability for the image is obtained by combining the patch-level posteriors, as sketched below
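The expression itself is not in the transcript. A plausible form consistent with the "at least one patch" rule, stated here as an assumption rather than the slide's own formula, is a noisy-OR combination of the patch-level posteriors y_{jk} = P(\tau_{jk} = 1 | x_j) for hidden patch labels \tau_{jk}:

P(t_k = 1 | X) = 1 - \prod_{j=1}^{J} (1 - y_{jk}),

so the image receives label k unless every one of its patches fails to carry that label.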
8 A Discriminative Approach
- Error function given by the negative log likelihood (a sketch of its form follows this list)
- Gradient-based optimization
- Model learns to predict a class for each patch, even though training labels are only given for the whole image
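With image-level binary targets t_{nk} (the classes are not mutually exclusive) and image-level predictions y_{nk} as above, a standard negative log likelihood of this kind, written here as a sketch rather than the slide's exact expression, is

E = - \sum_n \sum_k [ t_{nk} \ln y_{nk} + (1 - t_{nk}) \ln (1 - y_{nk}) ],

whose gradients with respect to the patch-level model parameters follow from the chain rule and can be fed to a gradient-based optimizer.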
9 A Discriminative Approach
- Only patches which help to discriminate between the object classes become labelled with an object category
- All others, including most foreground patches, are classified as background, meaning non-discriminative
- This discriminative approach therefore labels the image, but does not segment the object
13 An Alternative Discriminative Approach
- Cluster feature vectors from all patches in all training images using K-means (K = 100)
- For each image, assign each patch to the closest prototype
- Gives a fixed-length histogram feature vector
- Use the (normalized) histogram as the feature vector for a classifier, as sketched below
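A minimal sketch of this histogram construction, assuming SIFT-style patch descriptors and using scikit-learn's K-means; the function names and the library choice are illustrative, not from the slides:

import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_patch_features, k=100):
    # Cluster descriptors pooled from all training patches into k prototypes.
    return KMeans(n_clusters=k, random_state=0).fit(all_patch_features)

def image_histogram(patch_features, codebook):
    # Assign each patch of one image to its closest prototype and count,
    # then normalize to obtain a fixed-length feature vector for the classifier.
    assignments = codebook.predict(patch_features)
    hist = np.bincount(assignments, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()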
14 Automatic Relevance Determination
- Select relevant features using automatic relevance determination (MacKay and Neal)
- Prior distribution over parameters governed by hyper-parameters
- Optimize the hyper-parameters by maximizing the marginal likelihood
- A high hyper-parameter value implies low relevance for the corresponding feature (see the sketch after this list)
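In the standard MacKay/Neal formulation (stated here for reference, not transcribed from the slide), each weight w_i is given a zero-mean Gaussian prior with its own precision hyper-parameter \alpha_i,

p(w | \alpha) = \prod_i N(w_i | 0, \alpha_i^{-1}),

and the \alpha_i are set by maximizing the marginal likelihood p(t | \alpha) = \int p(t | w) p(w | \alpha) dw. A weight whose \alpha_i is driven to a large value is forced towards zero, so the corresponding feature is effectively switched off.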
22 Preliminary Comparative Results
- 167 images from each class
- 10-fold cross-validation (a protocol sketch follows below)
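A sketch of this evaluation protocol; the fold construction, the scikit-learn dependency, and the fit/predict interface are assumptions, not from the slides:

import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(features, labels, fit, predict, n_folds=10):
    # features: one (normalized) histogram per image; labels: 0 = cow, 1 = sheep.
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    accuracies = []
    for train_idx, test_idx in folds.split(features, labels):
        model = fit(features[train_idx], labels[train_idx])
        correct = predict(model, features[test_idx]) == labels[test_idx]
        accuracies.append(np.mean(correct))
    # Report the mean accuracy over the folds.
    return float(np.mean(accuracies))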
23 A Generative Approach
24 A Generative Approach
- Joint distribution specified by the model (a sketch follows below)
- Determine the parameters by maximum likelihood
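The expression is not in the transcript; a plausible patch-based form, written here as an assumption with t the image-level labels, \tau_j the hidden label of patch j and x_j its feature vector, is

p(X, t) = p(t) \prod_j \sum_{\tau_j} p(\tau_j | t) \, p(x_j | \tau_j),

with the parameters of p(\tau_j | t) and of the class-conditional densities p(x_j | \tau_j) fitted by maximizing the likelihood over the training images (for example with EM, since the patch labels are hidden).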
27 Discussion
- Generative model required good initialization: can we use a discriminative model to do this?
- Generative models can exploit a mix of strongly labelled, weakly labelled and unlabelled data
- It would be much more satisfactory to learn both the interest point detector and the local descriptors