The Visual Recognition Machine - PowerPoint PPT Presentation

Slides: 43
Provided by: sche59

Transcript and Presenter's Notes

1
The Visual Recognition Machine
  • Jitendra Malik
  • University of California at Berkeley

2
From images to objects
Labeled sets: tiger, grass, etc.
3
Recognition
4
Three stages
  • Segmentation: Images → Regions
  • Association: Regions → Super-regions
  • Matching: Super-regions → Prototype views

6
Three stages
  • Segmentation: Images → Regions
  • Association: Regions → Super-regions
  • Matching: Super-regions → Prototype views

7
Boundaries of image regions are defined by a number
of attributes:
  • Brightness/color
  • Texture
  • Motion
  • Stereoscopic depth
  • Familiar configuration

8
Image Segmentation as Graph Partitioning
Build a weighted graph G = (V, E) from the image
  • V: image pixels
  • E: connections between pairs of nearby pixels
Partition the graph so that similarity within groups
is large and similarity between groups is small
-- Normalized Cuts [Shi & Malik 97]
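
The graph construction described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 4-connected neighbourhood, the Gaussian brightness similarity, the `sigma` parameter and the function name `build_pixel_graph` are all assumptions made for the example, since the slide only says that edges connect nearby pixels.

```python
import math

def build_pixel_graph(image, sigma=0.1):
    """Return {(p, q): weight} for 4-connected neighbours.

    Pixels are numbered row by row; every key satisfies p < q.
    The weight is a Gaussian of the brightness difference
    (an assumed similarity measure, chosen for illustration).
    """
    rows, cols = len(image), len(image[0])
    edges = {}
    for r in range(rows):
        for c in range(cols):
            p = r * cols + c
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    q = rr * cols + cc
                    diff = image[r][c] - image[rr][cc]
                    edges[(p, q)] = math.exp(-(diff * diff) / (sigma * sigma))
    return edges

# Tiny 2 x 3 grayscale image: a dark block next to a bright column.
image = [[0.1, 0.1, 0.9],
         [0.1, 0.1, 0.9]]
graph = build_pixel_graph(image)
```

Edges inside the dark block get weight 1, while edges crossing into the bright column get a weight close to 0, so a good partition separates the two.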
9
Some Terminology for Graph Partitioning
  • How do we bipartition a graph?

10
Normalized Cut: a measure of dissimilarity
  • Minimum cut is not appropriate since it favors
    cutting small pieces.
  • Normalized Cut, Ncut
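
The formula on this slide did not survive in the transcript. For reference, the definition given in Shi and Malik's Normalized Cuts paper, for a partition of V into disjoint sets A and B with edge weights w(u, v), is reproduced below. The symbols follow that paper and are not taken from this transcript.

```latex
\begin{align*}
\mathrm{cut}(A,B)   &= \sum_{u \in A,\; v \in B} w(u,v), \\
\mathrm{assoc}(A,V) &= \sum_{u \in A,\; t \in V} w(u,t), \\
\mathrm{Ncut}(A,B)  &= \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)}
                     + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}.
\end{align*}
```

Dividing by the total connection weight of each side penalizes cutting off a small, isolated piece, which is the failure mode of the plain minimum cut.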

11
Solving the Normalized Cut problem
  • Exact discrete solution to Ncut is NP-complete,
    even on a regular grid [Papadimitriou 97]
  • Drawing on spectral graph theory, a good
    approximation can be obtained by solving a
    generalized eigenvalue problem.

12
Normalized Cut as a Generalized Eigenvalue Problem
  • after simplification, we get
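
The equation itself is missing from the transcript. As a reference point, the relaxed problem stated in Shi and Malik's paper is given below, where W is the symmetric weight matrix, D is the diagonal matrix with D(i,i) equal to the sum of row i of W, and y is a real-valued indicator vector. This notation is taken from that paper and is an assumption about what the slide showed.

```latex
\begin{equation*}
\min_{y}\; \frac{y^{T}(D - W)\,y}{y^{T} D\, y}
\quad \text{subject to } y^{T} D\,\mathbf{1} = 0
\qquad\Longrightarrow\qquad
(D - W)\,y = \lambda\, D\, y .
\end{equation*}
```

The eigenvector with the second-smallest eigenvalue is then thresholded to obtain the bipartition.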

13
Computational Aspects
  • Solving for the generalized eigensystem
  • (D-W) is of size N x N, but it is sparse
    with O(N) nonzero entries, where N is the number
    of pixels.
  • Solved using the Lanczos algorithm.
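
To make the role of sparsity concrete, here is a small pure-Python sketch that finds the second-smallest generalized eigenvector of (D - W) y = λ D y, the form used in Shi and Malik's paper. It deliberately swaps in deflated power iteration for the Lanczos algorithm named on the slide, because power iteration fits in a few lines; both rely only on sparse matrix-vector products costing O(N) each. The function name and the edge-dictionary input format are hypothetical.

```python
import math

def second_generalized_eigenvector(weights, n, iters=500):
    """Approximate the 2nd-smallest eigenvector of (D - W) y = lambda D y.

    weights: {(i, j): w} for undirected edges; every node needs an edge.
    Uses power iteration on (I + D^-1/2 W D^-1/2) / 2 with the trivial
    top eigenvector D^1/2 1 projected out (NOT Lanczos).
    """
    nbrs = [[] for _ in range(n)]          # sparse adjacency lists
    for (i, j), w in weights.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    deg = [sum(w for _, w in nbrs[i]) for i in range(n)]
    sqrt_d = [math.sqrt(d) for d in deg]
    norm0 = math.sqrt(sum(deg))
    top = [s / norm0 for s in sqrt_d]      # unit vector along D^1/2 1

    def step(z):
        out = []
        for i in range(n):                 # one sparse mat-vec, O(#edges)
            acc = sum(w * z[j] / (sqrt_d[i] * sqrt_d[j]) for j, w in nbrs[i])
            out.append(0.5 * (z[i] + acc))
        return out

    def deflate_and_normalize(z):
        dot = sum(a * b for a, b in zip(z, top))
        z = [a - dot * b for a, b in zip(z, top)]
        norm = math.sqrt(sum(a * a for a in z))
        return [a / norm for a in z]

    z = deflate_and_normalize([float(i) for i in range(n)])
    for _ in range(iters):
        z = deflate_and_normalize(step(z))
    return [z[i] / sqrt_d[i] for i in range(n)]  # y = D^-1/2 z

# Two tightly connected triangles joined by one weak edge.
weights = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
           (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
           (2, 3): 0.1}
y = second_generalized_eigenvector(weights, 6)
segments = [0 if v > 0 else 1 for v in y]
```

Thresholding `y` at zero puts nodes 0 to 2 in one segment and nodes 3 to 5 in the other. A production system would use a Lanczos-based sparse eigensolver instead, which converges much faster when the eigenvalue gap is small.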

14
Three stages
  • Segmentation: Images → Regions
  • Association: Regions → Super-regions
  • Matching: Super-regions → Prototype views

15
Association
  • Number of super-regions of size k in an image with
    n regions is approximately 4^k n/k
  • For typical images, this ranges between 1000 and
    10000
  • Plausibility ordering could reduce the effective
    number substantially
  • Computing time for this stage is negligible
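
The count estimate can be checked with a few lines of arithmetic. This sketch reads the formula as 4^k · n / k (the exponent appears to have been flattened in the transcript; as literally printed, k would cancel out). The value n = 100 regions is an assumption chosen for illustration, since the slides do not state a typical n.

```python
def approx_super_region_count(n_regions, k):
    """Approximate number of super-regions of size k: 4**k * n / k."""
    return (4 ** k) * n_regions / k

# Assumed example: an image segmented into 100 regions.
counts = {k: round(approx_super_region_count(100, k)) for k in (2, 3, 4)}
for k, count in counts.items():
    print(f"k={k}: about {count} super-regions")
```

Under these assumptions, k = 3 and k = 4 give roughly 2,100 and 6,400 candidates, which falls inside the 1000 to 10000 range quoted on the slide.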

16
Three stages
  • Segmentation: Images → Regions
  • Association: Regions → Super-regions
  • Matching: Super-regions → Prototype views

17
Matching
  • Objects are represented by a set of prototypical
    views (10 per object)
  • For each super-region S, calculate probability
    that it is an instance of view V
  • Determine most probable labeling of image into
    objects

19
Matching super-regions to views
  • Based on color, texture and shape similarity
  • Color and texture matching is relatively well
    understood and fast
  • Shape matching is difficult because the algorithm
    should tolerate pose, illumination and
    intra-category variation
  • Goal: small misclassification error with few
    views.

20
Core idea
  • Find corresponding points on the two shapes and
    use those to deform prototype into alignment
  • Allowing this flexibility reduces number of
    prototype views needed

23
MNIST Handwritten Digits
24
Digit Prototypes
25
Matching with original and deformed prototypes
(Figure columns: Prototype, Test, Error)
26
Deforming prototypes using thin plate splines
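
A thin plate spline warp fitted to point correspondences can be sketched in pure Python as below. The sketch assumes the correspondences between prototype and test shape are already known and fits an exact (unregularized) interpolating spline; how the talk's system finds correspondences or regularizes the warp is not stated in the transcript. The names `fit_tps`, `_solve` and `_U` are hypothetical.

```python
import math

def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def _U(p, q):
    """Thin plate spline kernel r^2 log r, written via r^2."""
    r2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return 0.0 if r2 == 0 else 0.5 * r2 * math.log(r2)

def fit_tps(src, dst):
    """Return warp(p) mapping each src[i] exactly onto dst[i].

    src must contain distinct, non-collinear 2D points.
    """
    n = len(src)
    A = [[0.0] * (n + 3) for _ in range(n + 3)]
    for i in range(n):
        for j in range(n):
            A[i][j] = _U(src[i], src[j])
        A[i][n:] = [1.0, src[i][0], src[i][1]]          # affine part
        A[n][i], A[n + 1][i], A[n + 2][i] = 1.0, src[i][0], src[i][1]
    # One linear solve per output coordinate (x, then y).
    coeffs = [_solve(A, [d[axis] for d in dst] + [0.0] * 3) for axis in (0, 1)]

    def warp(p):
        out = []
        for c in coeffs:
            val = c[n] + c[n + 1] * p[0] + c[n + 2] * p[1]
            val += sum(c[i] * _U(p, src[i]) for i in range(n))
            out.append(val)
        return tuple(out)
    return warp

# Prototype landmarks and where they should land on the test shape.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
dst = [(0.0, 0.0), (1.0, 0.1), (0.0, 1.0), (1.1, 1.0), (0.6, 0.4)]
warp = fit_tps(src, dst)
```

Calling `warp` on every point of the prototype outline deforms it into alignment with the test shape, after which a residual matching error can be measured.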
27
Only 25 deformable templates needed (instead of
60K) to get 5% error
28
COIL Object Database
30
Computing cost on a Pentium PC
  • Segmentation: 5 minutes/image
  • Matching: 0.5 sec/match

31
Cost on a 10^4-node machine
  • Segmentation: 0.03 sec/image, which is 30 Hz
    (video rate)
  • Matching: 20K matches/sec at full resolution
    (100 points/shape)

32
How many prototype views can one match at 1 Hz?
  • 1K candidate super-regions
  • Consider only 1% of matches at full resolution
    (10% pass color/texture filter, 10% of those
    pass low resolution shape filter)
  • If half the time is spent in pruning and half in
    full resolution matching, 1000 prototype views can
    be matched at 1 Hz.
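
The throughput figures on these slides follow from simple arithmetic, sketched below. The sketch assumes perfect linear speedup across 10^4 nodes (reading "104" as 10^4) and that the filter pass rates are 10% each (the percent signs appear to have been lost in the transcript). Function and parameter names are illustrative only.

```python
def segmentation_seconds_on_cluster(minutes_per_image=5, nodes=10**4):
    """Per-image segmentation time, assuming perfect linear speedup."""
    return minutes_per_image * 60 / nodes

def views_matchable_per_second(nodes=10**4, sec_per_match=0.5,
                               candidates=1000,
                               color_pass_pct=10, shape_pass_pct=10,
                               full_res_time_share=0.5):
    """Prototype views that can be matched against all candidates at 1 Hz."""
    matches_per_sec = nodes / sec_per_match                # full-resolution rate
    full_res_budget = matches_per_sec * full_res_time_share
    # Full-resolution matches needed per prototype view after both filters.
    full_res_per_view = candidates * color_pass_pct * shape_pass_pct / 10_000
    return full_res_budget / full_res_per_view

seg_time = segmentation_seconds_on_cluster()
views = views_matchable_per_second()
print(f"segmentation: {seg_time} s/image, views per second: {views}")
```

With the slide's numbers this reproduces 0.03 sec per image, 20K full-resolution matches per second, and 1000 prototype views at 1 Hz.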

33
What can one do with matching 1000 views a second?
  • Worst case: 100 object categories
  • Best case: depends on how well one can exploit
    context, hierarchy and hashing.
  • Cf. humans can recognize 10-100K objects

34
Memory requirements
  • 10K object categories x 10 views/category x 100 x
    100 pixels/view x 1 byte/pixel gives us 1
    Gigabyte.
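
The memory estimate is a straight product of the quantities on the slide; a short check, with illustrative parameter names:

```python
def prototype_memory_bytes(categories=10_000, views_per_category=10,
                           width=100, height=100, bytes_per_pixel=1):
    """Storage for all prototype views, uncompressed."""
    return categories * views_per_category * width * height * bytes_per_pixel

total_bytes = prototype_memory_bytes()
print(total_bytes / 10**9, "GB")
```

The product is 10^9 bytes, which matches the 1 Gigabyte quoted.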

35
Concluding remarks
  • Speech in 1985 was in the same state as vision in
    2000. The adoption of Hidden Markov Models led to a
    decade of research which refined the paradigm for
    continuous speech recognition.
  • The proposed three-stage framework for recognition
    (segmentation, association and matching) could
    provide the same focus and coherence to vision
    research, leading to general purpose object
    recognition in 10 years.
