1
The PASCAL Visual Object Classes Challenge 2006
  • Jonathan Huang (jch1@cs.cmu.edu)
  • Tomasz Malisiewicz (tomasz@cmu.edu)
  • April 17, 2006

2
The PASCAL Dataset
  • Images contain different classes of objects,
    often with multiple instances of a class within
    an image at different scales
  • Significant occlusions

3
The PASCAL Dataset
  • Varying lighting conditions and camera parameters
  • Intra-class variability in texture and color;
    highly deformable objects

4
Strategy
  • Compute multiple segmentations of each image
  • Use texton-histogram-based bag-of-words
    representations
  • Learn parameters of a generative model for these
    representations (specifically, Latent Dirichlet
    Allocation)
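As a rough illustration of the texton step this strategy relies on, here is a minimal sketch (not the authors' code): each pixel's filterbank responses are clustered with k-means, and the resulting cluster index is that pixel's texton. The choice of filterbank (Gaussians, Laplacians of Gaussians, and derivatives at a few scales) is an assumption made for illustration.

```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

def pixel_textons(gray, n_textons=64, seed=0):
    """Assign a texton id to every pixel of a grayscale image.

    The filterbank here (Gaussians + Laplacians of Gaussians + first
    derivatives at three scales) is illustrative, not the one from the talk.
    """
    responses = []
    for sigma in (1.0, 2.0, 4.0):
        responses.append(ndimage.gaussian_filter(gray, sigma))
        responses.append(ndimage.gaussian_laplace(gray, sigma))
        responses.append(ndimage.gaussian_filter(gray, sigma, order=(0, 1)))  # x-derivative
        responses.append(ndimage.gaussian_filter(gray, sigma, order=(1, 0)))  # y-derivative
    # One feature vector per pixel: shape (H*W, n_filters)
    feats = np.stack(responses, axis=-1).reshape(-1, len(responses))
    km = KMeans(n_clusters=n_textons, n_init=4, random_state=seed).fit(feats)
    return km.labels_.reshape(gray.shape)  # texton id per pixel
```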

5
Region Finding
  • The Segmentation Soup
  • Multiple Normalized Cuts
  • Varying the number of segments allows us to
    capture features at different scales
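A minimal sketch of building a segmentation soup by running a normalized-cuts-style clustering several times with different segment counts. It uses scikit-learn's SpectralClustering on color + position features of a downsampled image as a stand-in for the authors' normalized-cuts implementation; the segment counts and feature weighting are assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def segmentation_soup(image, segment_counts=(4, 7, 10), stride=4):
    """Return a list of segment-label maps, one per requested segment count.

    `image` is an (H, W, 3) float array; `stride` subsamples pixels to keep
    the affinity graph small. Normalized cuts is approximated here by
    spectral clustering on color + position features (an assumption).
    """
    small = image[::stride, ::stride]
    h, w, _ = small.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Feature per pixel: (r, g, b, y, x); positions are down-weighted so color
    # dominates while spatial coherence is still encouraged.
    feats = np.column_stack([
        small.reshape(-1, 3),
        0.01 * ys.ravel(),
        0.01 * xs.ravel(),
    ])
    soup = []
    for k in segment_counts:
        sc = SpectralClustering(n_clusters=k, affinity="nearest_neighbors",
                                n_neighbors=10, assign_labels="kmeans",
                                random_state=0)
        soup.append(sc.fit_predict(feats).reshape(h, w))
    return soup
```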

6
Region Descriptors
  • Create a histogram of textons for each region
    (S-words)
  • Run k-means on the normalized histograms (we
    normalize for scale invariance)
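A sketch of how the s-word vocabulary could be formed, assuming texton maps and segment maps like those sketched above: build an L1-normalized texton histogram per segment, then cluster the histograms with k-means; a segment's s-word is its cluster index.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_histograms(texton_map, segment_map, n_textons):
    """L1-normalized texton histogram for each segment (one row per segment)."""
    hists = []
    for seg_id in np.unique(segment_map):
        counts = np.bincount(texton_map[segment_map == seg_id],
                             minlength=n_textons).astype(float)
        hists.append(counts / counts.sum())  # normalize for scale invariance
    return np.array(hists)

def build_sword_vocabulary(all_histograms, n_swords=200, seed=0):
    """Cluster normalized histograms; the cluster index is the s-word."""
    km = KMeans(n_clusters=n_swords, n_init=4, random_state=seed)
    km.fit(np.vstack(all_histograms))
    return km  # use km.predict(hist) to map a new segment to its s-word
```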

7
Foreground/Background Labeling
  • Foreground Labels
  • Assign several topics to each object class (one
    per viewpoint)
  • 10 object classes × 5 viewpoints = 50 foreground
    topics
  • Use ground truth labelings in the training set to
    label segments
  • Background Labels
  • Use multiple background topics
  • To cluster background segments, we ran k-means
    (with k = 30) on the segments which fell outside of
    the bounding boxes (see the sketch below)

(Figure: example labeled segments from the MSRC2 dataset: building, sky, mountain, water, boat)
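A sketch of the labeling rule described above, with assumed helper inputs: a segment receives the foreground topic (class × viewpoint) of the ground-truth box it overlaps most, provided enough of its area falls inside that box; everything else is pooled and clustered into 30 background topics with k-means. The overlap threshold and data layout are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def label_training_segments(segments, boxes, overlap_thresh=0.5):
    """Split segments into foreground (topic id, histogram) pairs and
    k-means background labels.

    `segments`: list of dicts with 'mask' (bool H x W) and 'hist' (texton hist).
    `boxes`:    list of dicts with 'mask' (bool H x W) and 'topic'
                (class-viewpoint topic id in 0..49).  Both are assumed inputs.
    """
    fg, bg_hists = [], []
    for seg in segments:
        area = seg["mask"].sum()
        best = max(boxes, key=lambda b: np.logical_and(seg["mask"], b["mask"]).sum())
        inside = np.logical_and(seg["mask"], best["mask"]).sum()
        if inside / area > overlap_thresh:
            fg.append((best["topic"], seg["hist"]))
        else:
            bg_hists.append(seg["hist"])  # fell outside the bounding boxes
    # Cluster the leftover segments into 30 background topics (k = 30 as in the talk)
    bg_km = KMeans(n_clusters=30, n_init=4, random_state=0).fit(np.vstack(bg_hists))
    return fg, bg_km.labels_
```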
8
A Generative Model
(Figure: example image with segments labeled sky, cow, grass, and water)
9
A Generative Model
  • LDA (Latent Dirichlet Allocation) Generative Model
  • For each image:
  • Choose θ ∼ Dirichlet(α)
  • Choose z to be object class i with probability θi
  • Choose w to be s-word j conditioned on z, according
    to a learned distribution P(w | z)
  • CTM (Correlated Topic Model) Generative Model
  • Same generative process, except that the class
    mixture proportions (the θ's) are drawn from a
    logistic-normal distribution, which models covariance
    structure across topics. For example, we might want to
    capture the fact that cows and cars are unlikely
    to appear in the same scene, but cows and grass
    almost always occur together.
  • Instead of using the unsupervised machinery of
    LDA and CTM as in Sivic et al. (2005), we train
    our models in a supervised manner using the
    ground truth labels
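A minimal numpy sketch of the two generative processes for a single image, using the symbols above (α the Dirichlet prior, θ the per-image topic proportions, β the per-topic s-word distributions; μ and Σ are the logistic-normal mean and covariance for the CTM variant). This is an illustration of the sampling story, not the training code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_image_lda(alpha, beta, n_words):
    """LDA: theta ~ Dirichlet(alpha); per word, z ~ Cat(theta), w ~ Cat(beta[z])."""
    theta = rng.dirichlet(alpha)
    z = rng.choice(len(alpha), size=n_words, p=theta)
    return [rng.choice(beta.shape[1], p=beta[t]) for t in z]

def sample_image_ctm(mu, sigma, beta, n_words):
    """CTM: eta ~ N(mu, sigma); theta = softmax(eta) carries topic correlations."""
    eta = rng.multivariate_normal(mu, sigma)
    theta = np.exp(eta) / np.exp(eta).sum()
    z = rng.choice(len(mu), size=n_words, p=theta)
    return [rng.choice(beta.shape[1], p=beta[t]) for t in z]
```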

10
Dirichlet vs. Logistic Normal Priors
Dirichlet Distributions
  • Dirichlet Distribution Advantages
  • Member of the Exponential Family (this greatly
    simplifies computation)
  • Dirichlet Distribution Disadvantages
  • Does not model topic correlations well

Logistic Normal Distributions
  • Logistic-Normal Distribution Advantages
  • Covers a much richer family of distributions and
    can model covariance structure
  • Logistic-Normal Distribution Disadvantages
  • Not a member of the exponential family, which
    makes inference difficult
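To make the contrast concrete, here is a small sketch (illustrative numbers only) that samples topic proportions from both priors and inspects the correlation between two topics. Components of a Dirichlet are never positively correlated, while the logistic-normal's covariance lets you encode, say, that the "cow" and "grass" topics tend to rise and fall together.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Dirichlet prior: pairwise correlations are fixed by alpha and never positive.
dir_samples = rng.dirichlet(alpha=[2.0, 2.0, 2.0], size=n)

# Logistic-normal prior: an explicit positive covariance between topics 0 and 1
# (e.g. "cow" and "grass" co-occurring).
mu = np.zeros(3)
cov = np.array([[1.0, 0.9, 0.0],
                [0.9, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
eta = rng.multivariate_normal(mu, cov, size=n)
ln_samples = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)

print("Dirichlet corr(topic0, topic1):       %.2f"
      % np.corrcoef(dir_samples[:, 0], dir_samples[:, 1])[0, 1])
print("Logistic-normal corr(topic0, topic1): %.2f"
      % np.corrcoef(ln_samples[:, 0], ln_samples[:, 1])[0, 1])
```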

11
Flowchart
  • Training Data
  • Filterbank Responses
  • Segment Soup
  • Per-Pixel Textons
  • Histograms of Textons
  • S-words
  • Label Foreground / Bad / Background Segments
  • sLDA → learned parameters (α, β)
12
Inference on Novel Images
  • What does inference return?
  • First, we obtain an approximate distribution over
    topics

Contracting topics: 5 topics per class → 1 topic per class
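The contraction step can be a simple reshape-and-sum over the viewpoint topics; a sketch assuming a length-50 posterior ordered as 10 classes × 5 viewpoints:

```python
import numpy as np

def contract_topics(topic_post, n_classes=10, n_viewpoints=5):
    """Collapse a (50,) class-viewpoint posterior into a (10,) class posterior."""
    per_class = topic_post.reshape(n_classes, n_viewpoints).sum(axis=1)
    return per_class / per_class.sum()
```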
13
Inference on Novel Images
  • What else?
  • For each word, we get an approximate topic
    distribution
  • For each pixel, we average these topic
    distributions to obtain a topic response image
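A sketch of forming the topic response image, assuming each segment in the soup carries an inferred topic distribution: every pixel averages the distributions of all segments (across all segmentations) that contain it.

```python
import numpy as np

def topic_response_image(segment_masks, segment_topic_dists, n_topics, shape):
    """Average, per pixel, the topic distributions of all segments covering it.

    `segment_masks`: list of boolean (H, W) masks from the segmentation soup.
    `segment_topic_dists`: matching list of (n_topics,) distributions.
    `shape`: the (H, W) image shape.  Returns an (H, W, n_topics) response image.
    """
    response = np.zeros(shape + (n_topics,))
    coverage = np.zeros(shape)
    for mask, dist in zip(segment_masks, segment_topic_dists):
        response[mask] += dist
        coverage[mask] += 1
    coverage = np.maximum(coverage, 1)  # avoid division by zero for uncovered pixels
    return response / coverage[..., None]
```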

14
Object Localization
  • Put a bounding box around the strong responses
  • Find the most responsive topic per pixel
  • Place a bounding box around each connected
    component
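A sketch of this localization step on a topic response image, using scipy's connected-component labeling; the response threshold is an assumed knob, not a number from the talk.

```python
import numpy as np
from scipy import ndimage

def localize(response, topic, thresh=0.5):
    """Bounding boxes for connected components where `topic` is the strongest
    response and its value exceeds `thresh`.

    `response` is an (H, W, n_topics) topic response image.
    Returns boxes as (row_min, col_min, row_max, col_max).
    """
    strongest = response.argmax(axis=-1)
    mask = (strongest == topic) & (response[..., topic] > thresh)
    labeled, n_components = ndimage.label(mask)
    boxes = []
    for sl in ndimage.find_objects(labeled):
        if sl is not None:
            boxes.append((sl[0].start, sl[1].start, sl[0].stop, sl[1].stop))
    return boxes
```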

15
Detection Results (LDA)
16
Detection Results (CTM)
17
Comparison to PASCAL VOC2005
  • For last year's challenge:
  • There were only 4 object categories
  • And some images came with segmentation masks

2005 Results vs. Our Results (per-category detection scores):
  SLDA: .134  .198  .030  .142
  SCTM: .109  .181  .030  .098
  • Our results perform comparably to detection
    results from the 2005 PASCAL challenge despite a
    more challenging dataset with 10 object categories

18
Results
  • Some bad ones (red = mislabeled bounding box)

(Figure: failure cases; predicted labels include bike, bus, horse, and person)
19
Results
  • Some good ones (yellow = ground truth, dotted
    green = our system)

20
Conclusions
  • What worked well
  • Object Localization via multiple segmentations
  • Cows, sheep, ...
  • What did not:
  • Small objects
  • Lengthy preprocessing times (how many times did
    we run k-means??!)
  • Object classes with high appearance variability
    (people, cats)
  • Training set biases (red buses!)

21
References
  • The PASCAL Visual Object Classes Challenge 2006.
    http://www.pascal-network.org/challenges/VOC/voc2006/index.html
  • J. Sivic, B. Russell, A. Efros, A. Zisserman, and
    W. Freeman. Discovering object categories in image
    collections. Proceedings of the International
    Conference on Computer Vision, 2005.
  • D. Blei, A. Ng, and M. Jordan. Latent Dirichlet
    allocation. Journal of Machine Learning Research,
    3:993-1022, January 2003.
  • D. Blei and J. Lafferty. Correlated topic models.
    In Advances in Neural Information Processing
    Systems 18, 2006.
  • T. Minka. Estimating a Dirichlet distribution.
    2000.
  • P. D. Hoff. Nonparametric modeling of
    hierarchically exchangeable data. UW Statistics
    Department Technical Report no. 421, 2003.
  • J. Shi and J. Malik. Normalized cuts and image
    segmentation. IEEE Transactions on Pattern
    Analysis and Machine Intelligence (PAMI), 2000.
  • J. Winn, A. Criminisi, and T. Minka. Object
    Categorization by Learned Universal Visual
    Dictionary. Proc. IEEE Intl. Conf. on Computer
    Vision (ICCV), Beijing, 2005.