Explicit and implicit 3D object models
1
6.870 Object Recognition and Scene Understanding
http://people.csail.mit.edu/torralba/courses/6.870/6.870.recognition.htm
  • Lecture 4
  • Explicit and implicit 3D object models

2
Monday
  • Recognition of 3D objects
  • Presenter: Alec Rivers
  • Evaluator

3
2D frontal face detection
Amazing how far they have gotten with so little
4
People have the bad taste of not being rotationally symmetric.
Examples of uncooperative subjects:
5
Objects are not flat
In the old days, some toy makers and a few people working on face detection suggested that flat objects could be a good approximation to real objects.
6
Solution to deal with 3D variations: do not deal with them
Dealing with rotations and pose: train a different model for each view.
The combined detector is invariant to pose variations without an explicit 3D model.
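To make this concrete, here is a minimal Python sketch, assuming a hypothetical list of trained per-view scoring functions (stand-ins, not the detectors from the literature above): score a window with every view-specific model and keep the best response.

```python
import numpy as np

def detect_multiview(window, view_classifiers):
    """Score an image window with one classifier per pose bin.

    view_classifiers: list of callables mapping a window to a real-valued
    score (hypothetical stand-ins for trained per-view detectors).
    Returns the best score and the pose bin that produced it.
    """
    scores = [clf(window) for clf in view_classifiers]
    best_view = int(np.argmax(scores))
    return scores[best_view], best_view
```

Taking the max over views makes the combined detector pose-invariant without any 3D reasoning; the cost, as the next slide points out, is one model per view.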
7
So, how many classifiers?
And why should we stop with pose? Let's do the same with styles, lighting conditions, etc., etc., etc.
We need to detect Nclasses × Nviews × Nstyles, in clutter, with lots of variability within classes and across viewpoints.
8
Depth without objects
  • Random dot stereograms (Bela Julesz)

3D is so important for humans that we decided to grow two eyes in front of the face instead of having one looking to the front and another to the back. (This is not something that Julesz said, but he could have; maybe he did.)
9
Objects: 3D shape priors
by H. Bülthoff, Max-Planck-Institut für biologische Kybernetik, Tübingen. Video taken from http://www.michaelbach.de/ot/fcs_hollow-face/index.html
10
3D drives perception of important object
attributes
by Roger Shepard (Turning the Tables)
Depth processing is automatic, and we cannot shut it down.
11
3D drives perception of important object
attributes
The two Towers of Pisa
Frederick Kingdom, Ali Yoonessi and Elena Gheorghiu of the McGill Vision Research unit.
12
It is not all about objects
The 3D percept is driven by the scene, which imposes its rules on the objects.
13
Class experiment
14
Class experiment
  • Experiment 1: draw a horse (the entire body, not just the head) on a white piece of paper.
  • Do not look at your neighbor! You already know what a horse looks like; no need to cheat.

15
Class experiment
  • Experiment 2: draw a horse (the entire body, not just the head), but this time choose a viewpoint as weird as possible.

16
Anonymous participant
17
3D object categorization
  • Wait: object categorization in humans is not invariant to 3D pose.

18
3D object categorization
Although we can categorize all three pictures as views of a horse, they do not look like equally typical views of horses, and they do not seem to be recognizable with the same ease.
by Greg Robbins
19
Observations about pose invariance in humans
Two main families of effects have been observed
  • Canonical perspective
  • Priming effects

20
Canonical Perspective
Experiment (Palmer, Rosch & Chase, 81): participants are shown views of an object and are asked to rate how much each one looks like the object it depicts (scale: 1 = very much like, 7 = very unlike).
21
Canonical Perspective
Examples of canonical perspective
In a recognition task, reaction time correlated with the ratings: canonical views are recognized faster at the entry level.
Why?
From Vision Science, Palmer
22
Canonical Viewpoint
  • Frequency hypothesis
  • Maximal information hypothesis

23
Canonical Viewpoint
  • Frequency hypothesis: ease of recognition is related to the number of times we have seen the object from each viewpoint.
  • For a computer, using its Google memory, a horse looks like…

It is not a uniform sampling over viewpoints (some artificial datasets might contain non-natural statistics).
24
Canonical Viewpoint
  • Frequency hypothesis: ease of recognition is related to the number of times we have seen the object from each viewpoint.

Can you think of some examples in which this
hypothesis might be wrong?
25
Canonical Viewpoint
  • Maximal information hypothesis: some views provide more information than others about the object.

Best views tend to show multiple sides of the
object.
Can you think of some examples in which this
hypothesis might be wrong?
26
Canonical Viewpoint
  • Maximal information hypothesis

Clocks are preferred in a purely frontal view.
27
Canonical Viewpoint
  • Frequency hypothesis
  • Maximal information hypothesis
  • Probably both are correct.

28
Observations about pose invariance in humans
Two main families of effects have been observed
  • Canonical perspective
  • Priming effects

29
Priming effects
  • Priming paradigm: recognition of an object is faster the second time that you see it.

Biederman & Gerhardstein, 93
30
Priming effects
Same exemplars
Different exemplars
Biederman & Gerhardstein, 93
31
Priming effects
Biederman & Gerhardstein, 93
32
Object representations
  • Explicit 3D models use a volumetric representation: an explicit model of the object's 3D geometry.

Appealing, but hard to get to work.
33
Object representations
  • Implicit 3D models match the input 2D view to view-specific representations.

Not very appealing, but somewhat easy to get to work
(we all know what I mean by "work")
34
Object representations
  • Implicit 3D models match the input 2D view to view-specific representations.

The object is represented as a collection of 2D views (maybe the most frequent views seen in the past). Tarr & Pinker (89) showed that people are faster at recognizing previously seen views, as if they were storing them. People were also able to recognize unseen views, so they also generalize to new views; it is not just template matching.
35
Why do I explain all this?
  • As we build systems and develop algorithms, it is good to:
  • Get inspiration from what others have thought
  • Get intuitions about what can work, and how things can fail.

36
Explicit 3D model
Object Recognition in the Geometric Era: a Retrospective, Joseph L. Mundy
37
Explicit 3D model
  • Not all explicit 3D models were disappointing.
  • For some object classes, with accurate geometric
    and appearance models, it is possible to get
    remarkable results.

38
A Morphable Model for the Synthesis of 3D Faces
Blanz & Vetter, SIGGRAPH 99
40
A Morphable Model for the Synthesis of 3D Faces
Blanz & Vetter, SIGGRAPH 99
41
We have not yet achieved the same level of description for other object classes.
42
Implicit 3D models
43
Aspect Graphs
  • "The nodes of the graph represent object views that are adjacent to each other on the unit sphere of viewing directions but differ in some significant way. The most common view relationship in aspect graphs is based on the topological structure of the view, i.e., edges in the aspect graph arise from transitions in the graph structure relating vertices, edges and faces of the projected object." (Joseph L. Mundy)

44
Aspect Graphs
45
Affine patches
  • Revisit invariants as a local description of 3D objects: "Indeed, although smooth surfaces are almost never planar in the large, they are always planar in the small."

3D Object Modeling and Recognition Using Local
Affine-Invariant Image Descriptors and Multi-View
Spatial Constraints. F. Rothganger, S. Lazebnik,
C. Schmid, and J. Ponce, IJCV 2006
46
Affine patches
  • Two steps:
  • Detection of salient image regions
  • Extraction of a descriptor around the detected locations

47
Affine patches
  • Two steps:
  • Detection of salient image regions (Garding and Lindeberg, 96; Mikolajczyk and Schmid, 02)

(a) An elliptical image region is deformed to maximize the isotropy of the corresponding brightness pattern. (b) Its characteristic scale is determined as a local extremum of the normalized Laplacian in scale space. (c) The Harris (1988) operator is used to refine the position of the ellipse's center. The elliptical region obtained at convergence can be shown to be covariant under affine transformations.
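A minimal sketch of the adaptation loop in step (a), assuming a hypothetical sample_patch(U) helper that resamples the image around the interest point under shape matrix U; this illustrates the idea, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def second_moment_matrix(patch, sigma=1.0):
    """2x2 structure tensor of a grayscale patch, averaged over the patch."""
    Ix = gaussian_filter(patch, sigma, order=(0, 1))  # derivative along x
    Iy = gaussian_filter(patch, sigma, order=(1, 0))  # derivative along y
    return np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                     [np.sum(Ix * Iy), np.sum(Iy * Iy)]]) / patch.size

def sqrtm_2x2(M):
    """Matrix square root of a symmetric positive-definite 2x2 matrix."""
    evals, evecs = np.linalg.eigh(M)
    return evecs @ np.diag(np.sqrt(evals)) @ evecs.T

def affine_adapt(sample_patch, n_iter=10, tol=0.05):
    """Iterate until the local brightness pattern is isotropic."""
    U = np.eye(2)  # current shape matrix of the elliptical region
    for _ in range(n_iter):
        M = second_moment_matrix(sample_patch(U))
        evals = np.linalg.eigvalsh(M)  # ascending order
        if evals[0] / (evals[1] + 1e-12) > 1 - tol:
            break  # eigenvalue ratio near 1: pattern is isotropic
        # deform the region by the inverse square root of the tensor
        U = np.linalg.inv(sqrtm_2x2(M)) @ U
        U /= np.sqrt(np.linalg.det(U))  # keep the ellipse area fixed
    return U
```

Steps (b) and (c), the scale selection with the normalized Laplacian and the Harris-based relocation of the center, would be interleaved with this loop in the full detector.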
48
Affine patches
49
Affine patches
50
Affine patches
51
Affine patches
52
Affine patches
Each region is represented with the SIFT descriptor.
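As a concrete stand-in, descriptors can be computed with OpenCV's built-in SIFT (the paper computes SIFT on the rectified affine patches; the filename here is an assumption):

```python
import cv2

# Load a grayscale image and compute SIFT keypoints and descriptors.
img = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)  # assumed filename
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(descriptors.shape)  # (n_keypoints, 128): one 128-D vector per region
```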
53
Affine patches
A coherent 3D interpretation of all the matches is obtained using a formulation derived from structure-from-motion, with RANSAC to deal with outliers.
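A minimal sketch of the outlier-rejection part of this step, assuming matched keypoint coordinates are already available; fitting a fundamental matrix with RANSAC is a simpler stand-in for the paper's full structure-from-motion formulation.

```python
import cv2
import numpy as np

def geometric_verification(pts1, pts2):
    """Keep only matches consistent with a single rigid 3D interpretation.

    pts1, pts2: (N, 2) float32 arrays of matched point coordinates in
    two views (N >= 8 for the 8-point algorithm inside RANSAC).
    """
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC,
                                     ransacReprojThreshold=3.0,
                                     confidence=0.99)
    inliers = mask.ravel().astype(bool)
    return F, pts1[inliers], pts2[inliers]
```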
54
Affine patches
55
Patch-based single view detector
Vidal-Naquet, Ullman (2003)
Screen model
Car model
56
For a single view
First we collect a set of part templates from a
set of training objects. Vidal-Naquet, Ullman
(2003)

57
Extended fragments
View-Invariant Recognition Using Corresponding Object Fragments. E. Bart, E. Byvatov, S. Ullman
58
Extended fragments
View-Invariant Recognition Using Corresponding Object Fragments. E. Bart, E. Byvatov, S. Ullman
59
Extended fragments
View-Invariant Recognition Using Corresponding Object Fragments. E. Bart, E. Byvatov, S. Ullman
60
Extended fragments
Extended patches are extracted using short sequences, with Lucas-Kanade motion estimation used to track patches across each sequence.
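A minimal sketch of the tracking step with OpenCV's pyramidal Lucas-Kanade routine (the data layout is assumed; the original pipeline's details differ):

```python
import cv2
import numpy as np

def track_patches(frames, points):
    """Track patch centers through a short grayscale sequence.

    frames: list of grayscale images; points: (N, 1, 2) float32 array
    of initial patch centers. Returns the tracked points per frame.
    """
    tracks = [points]
    for prev, curr in zip(frames[:-1], frames[1:]):
        points, status, _err = cv2.calcOpticalFlowPyrLK(prev, curr,
                                                        points, None)
        points = points[status.ravel() == 1]  # drop patches that were lost
        tracks.append(points)
    return tracks
```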
61
Learning
  • Once a large pool of extended fragments is created, there is a training stage to select the most informative fragments.
  • For each fragment F, evaluate the mutual information I(C; F) between the class label C and the fragment's presence/absence F.
  • Select the fragment F1 with maximal I(C; F).
  • In the subsequent rounds, select the fragment that adds the most information given those already chosen, i.e., maximize I(C; F | F1, ..., Fk).

C: class label. F: fragment present/absent.
All these operations are easy to compute. It is just counting.
62
F  C
1  1
0  1
1  1
1  1
0  1
0  0
0  0
0  0
0  0
1  0

P(C=1, F=1) = 3/10
P(C=1, F=0) = 2/10
P(C=0, F=1) = 1/10
P(C=0, F=0) = 4/10
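The counting is easy to reproduce. A minimal sketch using the ten examples above (the row order is a reconstruction; only the counts matter):

```python
import numpy as np

# Fragment presence F and class label C for the ten examples in the table.
F = np.array([1, 0, 1, 1, 0, 0, 0, 0, 0, 1])
C = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

def mutual_information(F, C):
    """I(C; F) for two binary variables, estimated by counting."""
    mi = 0.0
    for f in (0, 1):
        for c in (0, 1):
            p_fc = np.mean((F == f) & (C == c))   # e.g. P(C=1, F=1) = 3/10
            p_f, p_c = np.mean(F == f), np.mean(C == c)
            if p_fc > 0:
                mi += p_fc * np.log2(p_fc / (p_f * p_c))
    return mi

print(mutual_information(F, C))  # the informativeness of this fragment
```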
63
Training without sequences
  • Challenges:
  • We do not know which fragments are in correspondence (we cannot use motion estimation, due to the strong transformations between views).
  • Fragments that are in correspondence will have detections that are correlated across viewpoints.

Bart & Ullman
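A minimal sketch of how that correlation could be used to pair fragments across views (hypothetical binary detection data; not Bart & Ullman's actual procedure):

```python
import numpy as np

def best_corresponding_fragment(det_view1, det_view2):
    """Pair a view-1 fragment with the view-2 fragment whose detections
    co-occur most strongly across the same training objects.

    det_view1: (N,) binary detections of one fragment on N objects.
    det_view2: (M, N) binary detections of M candidate fragments.
    """
    corrs = [np.corrcoef(det_view1, d)[0, 1] for d in det_view2]
    return int(np.argmax(corrs))
```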
64
Shared features for Multi-view object detection
Training does not require having different views of the same object.
View-invariant features
View-specific features
Torralba, Murphy, Freeman. PAMI 07
65
Shared features for Multi-view object detection
Sharing is not a tree; it also depends on 3D symmetries.

Torralba, Murphy, Freeman. PAMI 07
66
Multi-view object detection
Strong learner H response for car as a function of the assumed view angle
Torralba, Murphy, Freeman. PAMI 07
67
Voting schemes
Towards Multi-View Object Class Detection. Alexander Thomas, Vittorio Ferrari, Bastian Leibe, Tinne Tuytelaars, Bernt Schiele, Luc Van Gool
68
Viewpoint-Independent Object Class Detection
using 3D Feature Maps
Training dataset: synthetic objects
Features
Each cluster casts votes for the voting bins of the discrete poses contained in its internal list, as in the sketch below.
Voting scheme and detection
Liebelt, Schmid, Schertler. CVPR 2008
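A minimal sketch of such a voting scheme over discretized poses (the cluster data layout is an assumption, not taken from the paper):

```python
import numpy as np

def vote_for_pose(matched_clusters, n_pose_bins=8):
    """Accumulate votes over discrete pose bins.

    matched_clusters: iterable of dicts; each cluster's "poses" entry is
    the internal list of discrete poses it was seen under in training
    (an assumed layout). Returns the winning bin and the vote histogram.
    """
    votes = np.zeros(n_pose_bins)
    for cluster in matched_clusters:
        for pose_bin in cluster["poses"]:  # each cluster votes its poses
            votes[pose_bin] += 1
    return int(np.argmax(votes)), votes
```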
69
Monday
  • Recognition of 3D objects
  • Presenter: Alec Rivers
  • Evaluator