Title: Agenda
1Agenda
- Introduction
- Bag-of-words model
- Visual words with spatial location
- Part-based models
- Discriminative methods
- Segmentation and recognition
- Recognition-based image retrieval
- Datasets & Conclusions
3Analogy to documents
Of all the sensory impressions proceeding to the
brain, the visual experiences are the dominant
ones. Our perception of the world around us is
based essentially on the messages that reach the
brain from our eyes. For a long time it was
thought that the retinal image was transmitted
point by point to visual centers in the brain;
the cerebral cortex was a movie screen, so to
speak, upon which the image in the eye was
projected. Through the discoveries of Hubel and
Wiesel we now know that behind the origin of the
visual perception in the brain there is a
considerably more complicated course of events.
By following the visual impulses along their path
to the various cell layers of the optical cortex,
Hubel and Wiesel have been able to demonstrate
that the message about the image falling on the
retina undergoes a step-wise analysis in a system
of nerve cells stored in columns. In this system
each cell has its specific function and is
responsible for a specific detail in the pattern
of the retinal image.
4A clarification: definition of BoW
- Independent features
- Histogram representation of image
- Discrete appearance representation
5Representation
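(Figure: pipeline in three steps: 1. feature detection and representation; 2. codewords dictionary formation; 3. image representation)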
61.Feature detection and representation
81.Feature detection and representation
- Regular grid
- - Vogel & Schiele, 2003
- - Fei-Fei & Perona, 2005
- Interest point detector
- - Csurka, et al. 2004
- - Fei-Fei & Perona, 2005
- - Sivic, et al. 2005
91.Feature detection and representation
- Detect patches [Mikolajczyk & Schmid '02], [Matas, Chum, Urban & Pajdla '02], [Sivic & Zisserman '03]
- Normalize patch
- Compute SIFT descriptor [Lowe '99]
Slide credit: Josef Sivic
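The three steps on this slide (detect patches, normalize each patch, compute a descriptor) can be sketched in a few lines. This is only an illustration under stated assumptions: patches are cut on a regular grid instead of using an interest point detector, and the descriptor is a single gradient-orientation histogram, a much-simplified stand-in for SIFT. All function names and parameters are hypothetical.

```python
import numpy as np

def grid_patches(image, patch_size=8, stride=8):
    """Cut square patches on a regular grid (stand-in for an interest point detector)."""
    h, w = image.shape
    patches = [
        image[r:r + patch_size, c:c + patch_size]
        for r in range(0, h - patch_size + 1, stride)
        for c in range(0, w - patch_size + 1, stride)
    ]
    return np.stack(patches).astype(float)

def normalize_patch(patch, eps=1e-8):
    """Reduce brightness and contrast differences: zero mean, unit variance."""
    return (patch - patch.mean()) / (patch.std() + eps)

def orientation_histogram(patch, num_bins=8):
    """Simplified descriptor (NOT full SIFT): magnitude-weighted orientation histogram."""
    gy, gx = np.gradient(patch)                    # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)       # map angles to [0, 2*pi)
    bins = np.minimum((angle / (2 * np.pi) * num_bins).astype(int), num_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=num_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist       # L2-normalize when possible

def describe_image(image, patch_size=8, stride=8, num_bins=8):
    """Return one descriptor per grid patch, shape (num_patches, num_bins)."""
    patches = grid_patches(image, patch_size, stride)
    return np.array([orientation_histogram(normalize_patch(p), num_bins) for p in patches])
```

For a 32 x 32 grayscale array with the default settings, `describe_image` yields 16 descriptors of length 8.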
101.Feature detection and representation
112. Codewords dictionary formation
122. Codewords dictionary formation
Vector quantization
Slide credit: Josef Sivic
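Vector quantization can be illustrated with plain k-means: the cluster centres play the role of codewords, and each descriptor is mapped to the index of its nearest centre. The sketch below assumes descriptors are rows of a NumPy array; the function names, iteration count and random initialization are illustrative choices, not taken from the slides.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Vector quantization: index of the nearest codeword for each descriptor."""
    descriptors = np.asarray(descriptors, dtype=float)
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    return (diffs ** 2).sum(axis=2).argmin(axis=1)

def build_codebook(descriptors, num_words, num_iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); the cluster centres act as codewords."""
    rng = np.random.default_rng(seed)
    descriptors = np.asarray(descriptors, dtype=float)
    start = rng.choice(len(descriptors), size=num_words, replace=False)
    centres = descriptors[start].copy()
    for _ in range(num_iters):
        labels = quantize(descriptors, centres)
        for k in range(num_words):
            members = descriptors[labels == k]
            if len(members) > 0:               # keep the old centre if a cluster empties
                centres[k] = members.mean(axis=0)
    return centres
```

In practice the codebook is built once from descriptors pooled over many training images, and `quantize` is then applied to every image.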
132. Codewords dictionary formation
Fei-Fei et al. 2005
14Image patch examples of codewords
Sivic et al. 2005
153. Image representation
(Figure: histogram of codeword frequencies; x-axis: codewords, y-axis: frequency)
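The image representation is a histogram of codeword frequencies. A minimal sketch, assuming each patch has already been mapped to a codeword index (the function name and the optional normalization are illustrative):

```python
import numpy as np

def bow_histogram(word_indices, num_words, normalize=True):
    """Count how often each codeword occurs in one image."""
    counts = np.bincount(np.asarray(word_indices, dtype=int), minlength=num_words)
    counts = counts.astype(float)
    if normalize and counts.sum() > 0:
        counts /= counts.sum()     # relative frequencies, so images with different patch counts compare
    return counts
```

The ordering and position of the patches are discarded, which is what makes this a bag-of-words representation.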
16Representation
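(Figure: the three-step representation pipeline (1. feature detection and representation; 2. codewords dictionary formation; 3. image representation) feeding category models (and/or) classifiers)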
18Learning and Recognition
- Generative method
- - topic models
- Discriminative method
- - SVM
category models (and/or) classifiers
19Probabilistic Latent Semantic Analysis (pLSA)
- Background: Hofmann, 2001; Blei, Ng & Jordan, 2004 (Latent Dirichlet Allocation)
- Object categorization
- - Sivic et al. 2005; Sudderth et al. 2005
- Natural scene categorization
- - Fei-Fei et al. 2005
- In this case, use it for unsupervised learning from image collections
20Probabilistic Latent Semantic Analysis
- d_j: the jth image in an image collection
- z: latent theme or topic of the patch
- N: number of patches per image
- w_i: visual word of the patch
Sivic et al. ICCV 2005
21Feature detection and representation
(Figure: image collection represented as a table over documents d and words w, with entries P(w_i | d_j))
22The pLSA model
Slide credit: Josef Sivic
23Learning the pLSA parameters
- Observed counts of word i in document j
- Maximize likelihood of data using EM
- M: number of codewords
- N: number of images
Slide credit: Josef Sivic
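The equations for this slide did not survive in the transcript. The sketch below uses the standard pLSA formulation, P(w_i | d_j) = sum over k of P(w_i | z_k) P(z_k | d_j), and fits it by EM to the M x N matrix of observed counts n(w_i, d_j). Function names, the random initialization and the iteration count are assumptions made for illustration.

```python
import numpy as np

def plsa_em(counts, num_topics, num_iters=100, seed=0, eps=1e-12):
    """Fit pLSA by EM. counts[i, j] = n(w_i, d_j), shape (M codewords, N images)."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    M, N = counts.shape
    p_w_z = rng.random((M, num_topics))
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)        # each column is P(w | z_k)
    p_z_d = rng.random((num_topics, N))
    p_z_d /= p_z_d.sum(axis=0, keepdims=True)        # each column is P(z | d_j)
    for _ in range(num_iters):
        # E-step: posterior P(z_k | w_i, d_j), array of shape (M, K, N)
        joint = p_w_z[:, :, None] * p_z_d[None, :, :]
        posterior = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: re-estimate both distributions from expected counts
        expected = counts[:, None, :] * posterior
        p_w_z = expected.sum(axis=2)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True) + eps
        p_z_d = expected.sum(axis=0)
        p_z_d /= p_z_d.sum(axis=0, keepdims=True) + eps
    return p_w_z, p_z_d

def log_likelihood(counts, p_w_z, p_z_d, eps=1e-12):
    """Sum over i, j of n(w_i, d_j) * log P(w_i | d_j)."""
    p_w_d = p_w_z @ p_z_d
    return float((np.asarray(counts, dtype=float) * np.log(p_w_d + eps)).sum())
```

The (M, K, N) posterior array keeps the code close to the equations but is only practical for small problems; a real implementation would avoid materializing it.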
24Recognition using pLSA
Slide credit: Josef Sivic
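The recognition formula is also missing from the transcript. One common approach, assumed here, is "fold-in": keep the learnt P(w | z) fixed, run EM on the new image only to estimate P(z | d_test), and report the most probable topic. The function names and the uniform starting point are hypothetical.

```python
import numpy as np

def fold_in(word_counts, p_w_z, num_iters=50, eps=1e-12):
    """Estimate P(z | d_test) for one new image while keeping P(w | z) fixed."""
    word_counts = np.asarray(word_counts, dtype=float)       # shape (M,)
    p_w_z = np.asarray(p_w_z, dtype=float)                   # shape (M, K)
    num_topics = p_w_z.shape[1]
    p_z_d = np.full(num_topics, 1.0 / num_topics)            # uniform start
    for _ in range(num_iters):
        joint = p_w_z * p_z_d[None, :]                        # P(w | z) P(z | d)
        posterior = joint / (joint.sum(axis=1, keepdims=True) + eps)
        p_z_d = (word_counts[:, None] * posterior).sum(axis=0)
        p_z_d /= p_z_d.sum() + eps
    return p_z_d

def recognize(word_counts, p_w_z):
    """Return the index of the most probable topic and the full P(z | d_test)."""
    p_z_d = fold_in(word_counts, p_w_z)
    return int(np.argmax(p_z_d)), p_z_d
```

Because the model is learnt without labels, mapping a topic index to an object category still requires inspecting the topics or a few labelled examples.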
25Demo
26Task: face detection, no labeling
27Demo: learnt parameters
- Learning the model: do_plsa(config_file_1)
- Evaluate and visualize the model: do_plsa_evaluation(config_file_1)
Codeword distributions per theme (topic)
Theme distributions per image
28Demo: recognition examples
29Learning and Recognition
- Generative method
- - topic models
- Discriminative method
- - SVM
category models (and/or) classifiers
30Discriminative methods based on bag of words representation
- Grauman & Darrell, 2005, 2006
- - SVM w/ Pyramid Match kernels
- Others
- - Csurka, Bray, Dance & Fan, 2004
- - Serre & Poggio, 2005
31Summary: Pyramid match kernel
Approximates the optimal partial matching between sets of features
- Pyramid is in feature space; spatial information not used
- Efficient to compute: linear in the number of features per image
- Satisfies Mercer condition, so can be used as a kernel in an SVM
Grauman & Darrell, 2005. Slide credit: Kristen Grauman
32Pyramid Match (Grauman & Darrell 2005)
Histogram intersection
Slide credit: Kristen Grauman
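Histogram intersection sums, bin by bin, the smaller of the two counts; when both histograms count features, this is the number of features that can be matched within shared bins. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two histograms with identical binning."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    if h1.shape != h2.shape:
        raise ValueError("histograms must use the same bins")
    return float(np.minimum(h1, h2).sum())
```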
34Pyramid match kernel
- Weights inversely proportional to bin size
- Normalize kernel values to avoid favoring large sets
Slide credit: Kristen Grauman
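The two bullets above can be combined into a small sketch of the kernel. It assumes non-negative feature coordinates, bins whose side length doubles at each level (1, 2, 4, ...), a weight of 1 / side for matches first formed at that level, and normalization by the geometric mean of the two self-similarities. These details follow the usual description of the pyramid match but are not spelled out in this transcript; the function names are hypothetical.

```python
from collections import Counter
import numpy as np

def level_histogram(points, side):
    """Count points per grid cell of the given side length (points has shape (n, d))."""
    cells = np.floor(np.asarray(points, dtype=float) / side).astype(int)
    return Counter(map(tuple, cells))

def intersection(h1, h2):
    """Histogram intersection over sparse (cell -> count) histograms."""
    return sum(min(count, h2[cell]) for cell, count in h1.items() if cell in h2)

def pyramid_match(X, Y, num_levels):
    """Weighted sum of the new matches found at each pyramid level."""
    score, previous = 0.0, 0
    for level in range(num_levels):
        side = 2 ** level
        current = intersection(level_histogram(X, side), level_histogram(Y, side))
        score += (current - previous) / side     # new matches, weight 1 / bin size
        previous = current
    return score

def normalized_pyramid_match(X, Y, num_levels):
    """Divide by self-similarities so that large sets are not favored."""
    k_xy = pyramid_match(X, Y, num_levels)
    k_xx = pyramid_match(X, X, num_levels)
    k_yy = pyramid_match(Y, Y, num_levels)
    return k_xy / np.sqrt(k_xx * k_yy)
```

For example, with 1-D features `X = [[0.5], [3.2], [6.7]]` and `Y = [[0.7], [3.9], [5.1], [7.5]]` and three levels, two pairs match at the finest level and one more at the next, giving an unnormalized score of 2 + 1/2 = 2.5.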
35Example pyramid match
Level 0
Slide credit: Kristen Grauman
36Example pyramid match
Level 1
Slide credit: Kristen Grauman
37Example pyramid match
Level 2
Slide credit: Kristen Grauman
38Example pyramid match
(Figure: pyramid match compared with optimal match)
Slide credit: Kristen Grauman
39Object recognition results
- ETH-80 database: 8 object classes
- - (Eichhorn and Chapelle 2004)
- Features
- - Harris detector
- - PCA-SIFT descriptor, d = 10
Kernel                                      Complexity          Recognition rate
Match (Wallraven et al.)                    (not transcribed)   84%
Bhattacharyya affinity (Kondor & Jebara)    (not transcribed)   85%
Pyramid match                               (not transcribed)   84%
d = descriptor dim., m = number of features, L = levels in pyramid
Slide credit: Kristen Grauman