1
The PASCAL Visual Object Classes Challenge 2006
  • Jonathan Huang (jch1@cs.cmu.edu)
  • Tomasz Malisiewicz (tomasz@cmu.edu)
  • April 17, 2006

2
The PASCAL Dataset
  • Images contain different classes of objects,
    often with multiple instances of a class within
    an image at different scales
  • Significant occlusions

3
The PASCAL Dataset
  • Varying lighting conditions and camera parameters
  • Intra-class variability in texture and color;
    highly deformable objects

4
Strategy
  • Compute multiple segmentations of each image
  • Use texton-histogram-based bag-of-words
    representations
  • Learn parameters of a generative model for these
    representations (specifically, Latent Dirichlet
    Allocation)
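As a rough illustration of the texton step this strategy relies on, here is a minimal sketch (not the authors' code): each pixel's filterbank responses are clustered with k-means, and the resulting cluster index is that pixel's texton. The choice of filterbank (Gaussians, Laplacians of Gaussians, and derivatives at a few scales) is an assumption made for illustration.

```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

def pixel_textons(gray, n_textons=64, seed=0):
    """Assign a texton id to every pixel of a grayscale image.

    The filterbank here (Gaussians + Laplacians of Gaussians + first
    derivatives at three scales) is illustrative, not the one from the talk.
    """
    responses = []
    for sigma in (1.0, 2.0, 4.0):
        responses.append(ndimage.gaussian_filter(gray, sigma))
        responses.append(ndimage.gaussian_laplace(gray, sigma))
        responses.append(ndimage.gaussian_filter(gray, sigma, order=(0, 1)))  # x-derivative
        responses.append(ndimage.gaussian_filter(gray, sigma, order=(1, 0)))  # y-derivative
    # One feature vector per pixel: shape (H*W, n_filters)
    feats = np.stack(responses, axis=-1).reshape(-1, len(responses))
    km = KMeans(n_clusters=n_textons, n_init=4, random_state=seed).fit(feats)
    return km.labels_.reshape(gray.shape)  # texton id per pixel
```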

5
Region Finding
  • The Segmentation Soup
  • Multiple Normalized Cuts
  • Varying the number of segments allows us to
    capture features at different scales
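A minimal sketch of building a segmentation soup by running a normalized-cuts-style clustering several times with different segment counts. It uses scikit-learn's SpectralClustering on color + position features of a downsampled image as a stand-in for the authors' normalized-cuts implementation; the segment counts and feature weighting are assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def segmentation_soup(image, segment_counts=(4, 7, 10), stride=4):
    """Return a list of segment-label maps, one per requested segment count.

    `image` is an (H, W, 3) float array; `stride` subsamples pixels to keep
    the affinity graph small. Normalized cuts is approximated here by
    spectral clustering on color + position features (an assumption).
    """
    small = image[::stride, ::stride]
    h, w, _ = small.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Feature per pixel: (r, g, b, y, x); positions are down-weighted so color
    # dominates while spatial coherence is still encouraged.
    feats = np.column_stack([
        small.reshape(-1, 3),
        0.01 * ys.ravel(),
        0.01 * xs.ravel(),
    ])
    soup = []
    for k in segment_counts:
        sc = SpectralClustering(n_clusters=k, affinity="nearest_neighbors",
                                n_neighbors=10, assign_labels="kmeans",
                                random_state=0)
        soup.append(sc.fit_predict(feats).reshape(h, w))
    return soup
```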

6
Region Descriptors
  • Create a histogram of textons for each region
    (S-words)
  • Run k-means on the normalized histograms (we
    normalize for scale invariance)
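A sketch of how the s-word vocabulary could be formed, assuming texton maps and segment maps like those sketched above: build an L1-normalized texton histogram per segment, then cluster the histograms with k-means; a segment's s-word is its cluster index.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_histograms(texton_map, segment_map, n_textons):
    """L1-normalized texton histogram for each segment (one row per segment)."""
    hists = []
    for seg_id in np.unique(segment_map):
        counts = np.bincount(texton_map[segment_map == seg_id],
                             minlength=n_textons).astype(float)
        hists.append(counts / counts.sum())  # normalize for scale invariance
    return np.array(hists)

def build_sword_vocabulary(all_histograms, n_swords=200, seed=0):
    """Cluster normalized histograms; the cluster index is the s-word."""
    km = KMeans(n_clusters=n_swords, n_init=4, random_state=seed)
    km.fit(np.vstack(all_histograms))
    return km  # use km.predict(hist) to map a new segment to its s-word
```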

7
Foreground/Background Labeling
  • Foreground Labels
  • Assign several topics to each object class (one
    per viewpoint)
  • 10 object classes × 5 viewpoints = 50 foreground
    topics
  • Use ground truth labelings in the training set to
    label segments
  • Background Labels
  • Use multiple background topics
  • To cluster background segments, we ran k-means
    (with k = 30) on the segments which fell outside of
    the bounding boxes (see the sketch below)

(Figure: example labeled segments from the MSRC2 dataset: building, sky, mountain, water, boat)
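A sketch of the labeling rule described above, with assumed helper inputs: a segment receives the foreground topic (class × viewpoint) of the ground-truth box it overlaps most, provided enough of its area falls inside that box; everything else is pooled and clustered into 30 background topics with k-means. The overlap threshold and data layout are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def label_training_segments(segments, boxes, overlap_thresh=0.5):
    """Split segments into foreground (topic id, histogram) pairs and
    k-means background labels.

    `segments`: list of dicts with 'mask' (bool H x W) and 'hist' (texton hist).
    `boxes`:    list of dicts with 'mask' (bool H x W) and 'topic'
                (class-viewpoint topic id in 0..49).  Both are assumed inputs.
    """
    fg, bg_hists = [], []
    for seg in segments:
        area = seg["mask"].sum()
        best = max(boxes, key=lambda b: np.logical_and(seg["mask"], b["mask"]).sum())
        inside = np.logical_and(seg["mask"], best["mask"]).sum()
        if inside / area > overlap_thresh:
            fg.append((best["topic"], seg["hist"]))
        else:
            bg_hists.append(seg["hist"])  # fell outside the bounding boxes
    # Cluster the leftover segments into 30 background topics (k = 30 as in the talk)
    bg_km = KMeans(n_clusters=30, n_init=4, random_state=0).fit(np.vstack(bg_hists))
    return fg, bg_km.labels_
```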
8
A Generative Model
(Figure: example image with segments labeled sky, cow, grass, and water)
9
A Generative Model
  • LDA (Latent Dirichlet Allocation) Generative Model
  • For each image:
  • Choose θ ∼ Dirichlet(α)
  • Choose z to be object class i with probability θi
  • Choose w to be s-word j conditioned on z, according
    to a learned distribution P(w | z)
  • CTM (Correlated Topic Model) Generative Model
  • Same generative process, except that the class
    mixture proportions (the θ's) are drawn from a
    logistic-normal distribution, which models covariance
    structure across topics. For example, we might want to
    capture the fact that cows and cars are unlikely
    to appear in the same scene, but cows and grass
    almost always occur together.
  • Instead of using the unsupervised machinery of
    LDA and CTM as in Sivic et al. (2005), we train
    our models in a supervised manner using the
    ground truth labels
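A minimal numpy sketch of the two generative processes for a single image, using the symbols above (α the Dirichlet prior, θ the per-image topic proportions, β the per-topic s-word distributions; μ and Σ are the logistic-normal mean and covariance for the CTM variant). This is an illustration of the sampling story, not the training code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_image_lda(alpha, beta, n_words):
    """LDA: theta ~ Dirichlet(alpha); per word, z ~ Cat(theta), w ~ Cat(beta[z])."""
    theta = rng.dirichlet(alpha)
    z = rng.choice(len(alpha), size=n_words, p=theta)
    return [rng.choice(beta.shape[1], p=beta[t]) for t in z]

def sample_image_ctm(mu, sigma, beta, n_words):
    """CTM: eta ~ N(mu, sigma); theta = softmax(eta) carries topic correlations."""
    eta = rng.multivariate_normal(mu, sigma)
    theta = np.exp(eta) / np.exp(eta).sum()
    z = rng.choice(len(mu), size=n_words, p=theta)
    return [rng.choice(beta.shape[1], p=beta[t]) for t in z]
```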

10
Dirichlet vs. Logistic Normal Priors
Dirichlet Distributions
  • Dirichlet Distribution Advantages
  • Member of the Exponential Family (this greatly
    simplifies computation)
  • Dirichlet Distribution Disadvantages
  • Does not model topic correlations well

Logistic Normal Distributions
  • Logistic-Normal Distribution Advantages
  • Covers a much richer family of distributions and
    can model covariance structure
  • Logistic-Normal Distribution Disadvantages
  • Not a member of the exponential family, which
    makes inference difficult
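To make the contrast concrete, here is a small sketch (illustrative numbers only) that samples topic proportions from both priors and inspects the correlation between two topics. Components of a Dirichlet are never positively correlated, while the logistic-normal's covariance lets you encode, say, that the "cow" and "grass" topics tend to rise and fall together.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Dirichlet prior: pairwise correlations are fixed by alpha and never positive.
dir_samples = rng.dirichlet(alpha=[2.0, 2.0, 2.0], size=n)

# Logistic-normal prior: an explicit positive covariance between topics 0 and 1
# (e.g. "cow" and "grass" co-occurring).
mu = np.zeros(3)
cov = np.array([[1.0, 0.9, 0.0],
                [0.9, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
eta = rng.multivariate_normal(mu, cov, size=n)
ln_samples = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)

print("Dirichlet corr(topic0, topic1):       %.2f"
      % np.corrcoef(dir_samples[:, 0], dir_samples[:, 1])[0, 1])
print("Logistic-normal corr(topic0, topic1): %.2f"
      % np.corrcoef(ln_samples[:, 0], ln_samples[:, 1])[0, 1])
```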

11
Flowchart
  • Training Data
  • Filterbank Responses
  • Segment Soup
  • Per-Pixel Textons
  • Histograms of Textons
  • S-words
  • Label Foreground / Bad / Background Segments
  • sLDA → learned parameters (α, β)
12
Inference on Novel Images
  • What does inference return?
  • First, we obtain an approximate distribution over
    topics

Contracting topics: 5 topics per class → 1 topic per class
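The contraction step can be a simple reshape-and-sum over the viewpoint topics; a sketch assuming a length-50 posterior ordered as 10 classes × 5 viewpoints:

```python
import numpy as np

def contract_topics(topic_post, n_classes=10, n_viewpoints=5):
    """Collapse a (50,) class-viewpoint posterior into a (10,) class posterior."""
    per_class = topic_post.reshape(n_classes, n_viewpoints).sum(axis=1)
    return per_class / per_class.sum()
```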
13
Inference on Novel Images
  • What else?
  • For each word, we get an approximate topic
    distribution
  • For each pixel, we average these topic
    distributions to obtain a topic response image
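A sketch of forming the topic response image, assuming each segment in the soup carries an inferred topic distribution: every pixel averages the distributions of all segments (across all segmentations) that contain it.

```python
import numpy as np

def topic_response_image(segment_masks, segment_topic_dists, n_topics, shape):
    """Average, per pixel, the topic distributions of all segments covering it.

    `segment_masks`: list of boolean (H, W) masks from the segmentation soup.
    `segment_topic_dists`: matching list of (n_topics,) distributions.
    `shape`: the (H, W) image shape.  Returns an (H, W, n_topics) response image.
    """
    response = np.zeros(shape + (n_topics,))
    coverage = np.zeros(shape)
    for mask, dist in zip(segment_masks, segment_topic_dists):
        response[mask] += dist
        coverage[mask] += 1
    coverage = np.maximum(coverage, 1)  # avoid division by zero for uncovered pixels
    return response / coverage[..., None]
```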

14
Object Localization
  • Put a bounding box around the strong responses
  • Find the most responsive topic per pixel
  • Place a bounding box around each connected
    component
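A sketch of this localization step on a topic response image, using scipy's connected-component labeling; the response threshold is an assumed knob, not a number from the talk.

```python
import numpy as np
from scipy import ndimage

def localize(response, topic, thresh=0.5):
    """Bounding boxes for connected components where `topic` is the strongest
    response and its value exceeds `thresh`.

    `response` is an (H, W, n_topics) topic response image.
    Returns boxes as (row_min, col_min, row_max, col_max).
    """
    strongest = response.argmax(axis=-1)
    mask = (strongest == topic) & (response[..., topic] > thresh)
    labeled, n_components = ndimage.label(mask)
    boxes = []
    for sl in ndimage.find_objects(labeled):
        if sl is not None:
            boxes.append((sl[0].start, sl[1].start, sl[0].stop, sl[1].stop))
    return boxes
```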

15
Detection Results (LDA)
16
Detection Results (CTM)
17
Comparison to PASCAL VOC2005
  • For last year's challenge:
  • There were only 4 object categories
  • And some images came with segmentation masks

2005 Results vs. Our Results (per-category detection scores):
  SLDA: .134  .198  .030  .142
  SCTM: .109  .181  .030  .098
  • Our results perform comparably to detection
    results from the 2005 PASCAL challenge despite a
    more challenging dataset with 10 object categories

18
Results
  • Some bad ones (red = mislabeled bounding box)

(Figure: failure cases; predicted labels include bike, bus, horse, and person)
19
Results
  • Some good ones (yellow = ground truth, dotted
    green = our system)

20
Conclusions
  • What worked well
  • Object Localization via multiple segmentations
  • Cows, sheep, ...
  • What did not:
  • Small objects
  • Lengthy preprocessing times (how many times did
    we run k-means??!)
  • Object classes with high appearance variability
    (people, cats)
  • Training set biases (red buses!)

21
References
  • The PASCAL Visual Object Classes Challenge 2006.
    http://www.pascal-network.org/challenges/VOC/voc2006/index.html
  • J. Sivic, B. Russell, A. Efros, A. Zisserman, and
    W. Freeman. Discovering object categories in image
    collections. Proceedings of the International
    Conference on Computer Vision, 2005.
  • D. Blei, A. Ng, and M. Jordan. Latent Dirichlet
    allocation. Journal of Machine Learning Research,
    3:993-1022, January 2003.
  • D. Blei and J. Lafferty. Correlated topic models.
    In Advances in Neural Information Processing
    Systems 18, 2006.
  • T. Minka. Estimating a Dirichlet distribution.
    2000.
  • P. D. Hoff. Nonparametric modeling of
    hierarchically exchangeable data. UW Statistics
    Department Technical Report no. 421, 2003.
  • J. Shi and J. Malik. Normalized cuts and image
    segmentation. IEEE Transactions on Pattern
    Analysis and Machine Intelligence (PAMI), 2000.
  • J. Winn, A. Criminisi, and T. Minka. Object
    Categorization by Learned Universal Visual
    Dictionary. Proc. IEEE Intl. Conf. on Computer
    Vision (ICCV), Beijing, 2005.