Title: Algebraic Functions of Views for 3D Object Recognition
1Algebraic Functions of Views for 3D Object
Recognition
- CS773C Advanced Machine Intelligence Applications
- Spring 2008 Object Recognition
2Object Appearance
- The appearance of an object can vary widely
due to:
- Photometric effects
- Scene clutter
- Changes in shape (e.g., non-rigid objects)
- Viewpoint changes
3Algebraic Functions of Views (AFoVs)
- A powerful mathematical foundation for
investigating variations in the geometrical
appearance of an object due to viewpoint changes.
- The variety of 2D views depicting the
geometrical appearance of a 3D object can be
expressed as a combination of a small number of
2D views of the object.
S. Ullman and R. Basri, "Recognition by Linear
Combinations of Models", IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol.
13, no. 10, pp. 992-1006, 1991.
4Orthographic Projection
- Case of 3D rigid transformations
(3 ref. views)
5Orthographic Projection
- Case of 3D linear transformations
(2 ref views)
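A minimal sketch of the two-reference-view case, assuming point correspondences across the views are already known. It relies on the Ullman and Basri result cited earlier: each coordinate of the novel view is written as a1*x1 + a2*y1 + a3*x2 + a4, where (x1, y1) comes from the first reference view and x2 from the second. The helper names (`solve`, `project`), the transformations, and the sample points are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(A)
    M = [list(map(Fraction, row)) + [Fraction(v)] for row, v in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                M[r] = [vr - M[r][col] * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def project(points, A, t):
    """Orthographic view: apply two rows of a 3D linear map, then a 2D shift."""
    return [(sum(a * p for a, p in zip(A[0], P)) + t[0],
             sum(a * p for a, p in zip(A[1], P)) + t[1]) for P in points]

# Hypothetical 3D model points and three views of them.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 2, 3), (2, -1, 1)]
view1 = project(points, [(1, 0, 0), (0, 1, 0)], (0, 0))
view2 = project(points, [(1, 0, 1), (0, 1, 0)], (0, 0))
novel = project(points, [(2, 1, -1), (1, -1, 3)], (5, -2))

# One row (x1, y1, x2, 1) per point; four correspondences fix the coefficients.
basis = [(x1, y1, x2, 1) for (x1, y1), (x2, _) in zip(view1, view2)]
a = solve(basis[:4], [x for x, _ in novel[:4]])
b = solve(basis[:4], [y for _, y in novel[:4]])

# The same coefficients predict every remaining point of the novel view.
predicted = [(sum(c * v for c, v in zip(a, row)),
              sum(c * v for c, v in zip(b, row))) for row in basis]
```

Four correspondences are the minimum here; with more, a least-squares fit would be the natural replacement for the exact solve.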
6More Results
- Perspective projection
- (2 ref. views, obtained under orthographic
projection)
- Objects with smooth surfaces and non-rigid
objects
- More reference views are required.
A. Shashua, "Algebraic Functions for
Recognition", IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 17, no.
8, pp. 779-789, 1995.
7A Word of Caution!
- Only common features in the reference views can
be predicted in a novel view.
[Figure: two reference views and a novel view]
8Recognition Framework Using AFoVs
- Novel 2D views of a 3D object can be recognized
by matching them to combinations of a small
number of known 2D views of the object.
9Representation and Matching using AFoVs
- Representation
- Objects are represented by a small number of
views.
- Each view is represented by some geometric
features (e.g., points).
- Matching
- Predict the geometric appearance of an object in
a novel view by combining a small number of
reference views of the object.
10Advantages of the Method
- No 3D models or camera calibration are required.
- Only a small number of 2D views are required.
- Novel views can be different from the stored
ones.
- Simpler verification scheme.
- More general framework (family of methods).
- Evidence that the human visual system works
similarly.
11Main Challenges
- Which model views to combine to predict a novel
view?
- How to establish the correspondences between
novel and reference views?
- How to find the coefficients of the combination?
- How to handle occlusions?
- How to choose the reference views?
Integrate AFoVs with Indexing!
12Method Overview (G. Bebis, M. Georgiopoulos, M.
Shah, and N. da Vitoria Lobo, "Indexing Based on
Algebraic Functions of Views", Computer Vision
and Image Understanding (CVIU), Vol. 72, No. 3,
pp. 360-378, 1998)
- Preprocessing step
- (1) Extract groups of points from each model.
- (2) Sample the space of appearances of each
group.
- (3) Store information about the groups in an
index table.
- Recognition step
- (1) Extract groups of points from the scene.
- (2) Predict their appearance.
- (3) Verify the predictions.
13Overview of the Method (contd)
14Which Model Groups to Choose?
- Cluster geometric features into higher level
descriptions.
- Consider properties that are unlikely to occur at
random.
- Property used in our work: convexity
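As a rough illustration of the convexity property, the sketch below tests whether an ordered group of 2D points forms a convex polygon by checking that consecutive turns never change direction. It assumes the points are given in contour order and form a simple (non-self-intersecting) polygon; the function name `is_convex` is hypothetical, and the grouping procedure actually used in the cited work may differ.

```python
def is_convex(polygon):
    """Return True if the ordered 2D points turn in one direction only."""
    n = len(polygon)
    if n < 3:
        return False
    sign = 0
    for i in range(n):
        (ax, ay), (bx, by), (cx, cy) = polygon[i], polygon[(i + 1) % n], polygon[(i + 2) % n]
        # z-component of the cross product of edges a->b and b->c
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross != 0:
            if sign and (cross > 0) != (sign > 0):
                return False  # a left turn followed by a right turn (or vice versa)
            sign = cross
    return sign != 0  # all-collinear points are not counted as convex

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
dart = [(0, 0), (2, 0), (1, 1), (2, 2), (0, 2)]
```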
15Which Model Groups to Choose? (contd)
16How to generate the appearances of a group?
- Estimate each parameter's range of values
- Sample the space of parameter values
- Generate a new appearance for each sample of
values
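The three steps above can be sketched as follows, assuming the parameter ranges have already been estimated and that an appearance is the list of x-coordinates a1*x1 + a2*y1 + a3*x2 + a4 over the points of a group. The function names, the uniform grid, and the numeric ranges are illustrative assumptions.

```python
from itertools import product

def sample_range(lo, hi, steps):
    """Evenly spaced samples covering [lo, hi], endpoints included."""
    if steps == 1:
        return [(lo + hi) / 2]
    return [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]

def generate_appearances(basis, ranges, steps):
    """Yield (parameters, x-coordinates) for every sample of the parameter grid."""
    grids = [sample_range(lo, hi, steps) for lo, hi in ranges]
    for params in product(*grids):
        xs = [sum(a * v for a, v in zip(params, row)) for row in basis]
        yield params, xs

# One row (x1, y1, x2, 1) per point of a hypothetical three-point group.
basis = [(0.0, 0.0, 0.0, 1.0), (1.0, 0.0, 1.0, 1.0), (0.0, 1.0, 0.0, 1.0)]
apps = list(generate_appearances(basis, [(-1.0, 1.0)] * 4, steps=3))
```

With four parameters and three samples each, the grid holds 3**4 = 81 appearances, which shows why the sampling step drives the size of the index table.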
17Estimate the Range of Values of the Parameters
[Equations: systems for the AFoV parameters, solved using SVD]
18Estimate the Range of Values of the Parameters
(contd)
- Assume normalized coordinates
- Use Interval Arithmetic (Moore, 1966)
- (note that the solutions
will be identical)
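A small sketch of how interval arithmetic can bound the parameters. It assumes each parameter vector is obtained as a = P⁺p, where P⁺ is a pseudo-inverse of the reference-view matrix (for example computed from its SVD) and every normalized coordinate in p lies in [0, 1]. That formulation, the `Interval` class, and the numeric matrix are assumptions made for illustration; the slides' own equations are not reproduced here.

```python
class Interval:
    """Closed interval [lo, hi] with the two operations needed here."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def scale(self, c):
        """Multiply by a real constant; a negative constant swaps the endpoints."""
        lo, hi = c * self.lo, c * self.hi
        return Interval(min(lo, hi), max(lo, hi))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

def parameter_ranges(pinv, coord=Interval(0.0, 1.0)):
    """Range of each entry of pinv @ p when every coordinate of p lies in `coord`."""
    ranges = []
    for row in pinv:
        total = Interval(0.0, 0.0)
        for c in row:
            total = total + coord.scale(c)
        ranges.append(total)
    return ranges

# Hypothetical pseudo-inverse of a reference-view matrix (values made up).
pinv = [[1.0, -1.0, 0.5, 0.0],
        [0.0, 2.0, -0.5, -0.5]]
ranges = parameter_ranges(pinv)
```

Because each coordinate appears only once in each sum, the bounds are tight: the lower end is the sum of the negative entries of a row and the upper end is the sum of the positive ones.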
19Example
20Preconditioning the Reference Views
- The condition number of P affects the width of
the intervals.
- Transform the original views to new views
such that the resulting matrix has the best
possible condition.
21Preconditioning the Reference Views (contd)
- Choosing
- This implies
- Thus
22 Example (preconditioned views)
23Decouple Image Coordinates
- The same transformation generates the x- and
y-coordinates.
- Represent only the x-coordinates in the index
table.
- For each group, store the following entry
24Hypothesis Generation and Verification
1. Take the intersection of the hypotheses.
2. Apply constraints to reject invalid hypotheses.
25How to Choose the Scene Groups?
- Use convex grouping to extract salient scene
groups.
26Implementation Issues
- Space requirements
- Select salient groups.
- Reject groups giving rise to ill-conditioned
matrices.
- Use coarse sampling of parameters.
- Index computation and table size
27Important Implementation Issues (contd)
- Sampling step (i.e., parameters of AFoVs)
- Noise tolerance: make additional entries in a
neighborhood around the indexed location.
[Figure: actual vs. predicted]
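The neighborhood idea can be sketched with a dictionary-based index table: an appearance is quantized into a bin key, and the entry is also stored under every adjacent key so that a slightly noisy scene group still reaches it. The bin size, the one-bin radius, the function names, and the entry format are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import product

def quantize(coords, bin_size):
    """Map each coordinate to the index of the bin that contains it."""
    return tuple(int(c // bin_size) for c in coords)

def insert_with_neighbors(table, coords, entry, bin_size, radius=1):
    """Store `entry` at the indexed location and at all bins within `radius`."""
    key = quantize(coords, bin_size)
    for offset in product(range(-radius, radius + 1), repeat=len(key)):
        table[tuple(k + o for k, o in zip(key, offset))].append(entry)

def lookup(table, coords, bin_size):
    """Return the entries stored at the bin indexed by `coords`."""
    return table.get(quantize(coords, bin_size), [])

table = defaultdict(list)
insert_with_neighbors(table, (0.12, 0.47, 0.81), ("model-A", "group-3"), bin_size=0.1)

noisy_hit = lookup(table, (0.09, 0.52, 0.79), bin_size=0.1)   # lands in adjacent bins
far_miss = lookup(table, (0.50, 0.50, 0.50), bin_size=0.1)    # outside the neighborhood
```

The cost is storage: a radius of one bin multiplies the number of entries by 3**d for a d-dimensional key, which ties noise tolerance directly to the space requirements listed earlier.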
28Experiments and Results
[Figure: model objects and reference views used in
our experiments]
29Experiments and Results (contd)
[Figure: novel views and their reference views]
30Experiments and Results (contd)
[Figure: novel views and their reference views]
31Experiments and Results (contd)
[Figure: novel views and their reference views]
32Criticism of the Method
- Relies heavily on feature extraction.
- It has high memory requirements.
- The index table might represent unrealistic model
appearances.
- Indexing based on hashing is not very efficient.
- No explicit ranking of hypotheses.
33Improving AFoVs Recognition Framework
- Reject unrealistic appearances
- Reduce storage requirements and improve speed
- Develop a probabilistic hypothesis generation
scheme
- Learn shape appearance
- Rank hypotheses
- Represent object appearance more efficiently
using improved indexing schemes and probabilistic
models.
W. Li, G. Bebis, and N. Bourbakis, "Integrating
Algebraic Functions of Views with Indexing and
Learning for 3D Object Recognition", IEEE
Workshop on Learning in Computer Vision and
Pattern Recognition (in conjunction with CVPR04),
Washington DC, June 28, 2004.
34Combine Indexing with Learning
- Sample the space of appearances sparsely and
represent the samples in a K-d tree.
- Sample the space of views densely and represent
the samples using probabilistic models.
- Given a novel view
- (1) Use the K-d tree to retrieve a small number of
candidate models.
- (2) For each candidate model, compute the
probability that it might have produced the novel
view.
- (3) Verify the most likely hypotheses first.
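Steps (2) and (3) can be sketched as scoring each retrieved candidate with its own mixture model and sorting by the score. For brevity the sketch assumes a single scalar feature and one-dimensional Gaussian mixtures; the model names, mixture parameters, and function names are hypothetical.

```python
import math

def mixture_pdf(x, components):
    """Density of a 1-D Gaussian mixture given (weight, mean, variance) triples."""
    return sum(w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
               for w, m, v in components)

def rank_hypotheses(candidates, models, feature):
    """Order candidate models from most to least likely to have produced `feature`."""
    scored = [(mixture_pdf(feature, models[name]), name) for name in candidates]
    return [name for score, name in sorted(scored, reverse=True)]

# Hypothetical appearance models, one mixture per object.
models = {
    "car": [(1.0, 0.0, 1.0)],
    "plane": [(0.5, 3.0, 1.0), (0.5, 5.0, 1.0)],
    "boat": [(1.0, 10.0, 1.0)],
}
ranking = rank_hypotheses(["boat", "car", "plane"], models, feature=2.9)
```

Explicit verification would then proceed down `ranking` and could stop once a hypothesis is accepted.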
35Combine Indexing with Learning (contd)
- The first stage provides hypothetical matches
fast.
- The second stage evaluates the feasibility of
hypothetical matches fast, without having to
apply verification explicitly.
- Only highly likely hypotheses are verified
explicitly.
36Improved Framework
TRAINING PHASE
- Reference views
- Extract model groups
- Estimate the range of AFoVs parameters (using
SVD and IA)
- Sampling AFoVs parameter space (coarse and dense)
- Validate views
- Coarse samples: K-d Tree
- Dense samples: Random Projection,
low-dimensional representation, manifold learning
using EM
RECOGNITION PHASE
- New image
- Extract image groups
- Access the K-d Tree and retrieve hypothetical
matches
- Rank hypotheses
- Estimate AFoVs parameters
- Verify hypotheses
- Recognition results
37Eliminate Unrealistic Model Appearances
- Under the assumption of linear transformations,
many unrealistic views could be generated.
- Impose rigidity constraints to eliminate them.
- Storage requirements can be reduced
significantly.
- Recognition becomes faster and more efficient.
38Eliminate Unrealistic Model Appearances
[Figure: unrealistic views (without constraints)
and realistic views (with constraints)]
39Indexing Appearances
- Sample the space of views coarsely and
represent the samples in an index table.
- Hashing might not work very well in this case.
- Need an improved indexing scheme.
40Range Search vs Nearest Neighbor Search
- Range search is not appropriate when only a
sparse set of views is stored.
- K-d trees perform a nearest-neighbor search.
[Figure: nearest neighbor search vs. range search]
41K-d Trees for Indexing
- K-d trees perform a nearest-neighbor search.
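A compact K-d tree with nearest-neighbor lookup, as a sketch of the indexing structure named above. The dictionary-based node layout and the function names are illustrative choices, and the stored points are made-up 2D samples rather than real appearance vectors.

```python
def build_kdtree(points, depth=0):
    """Recursively split the points on alternating axes at the median."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(node, query, best=None):
    """Return the stored point closest to `query`."""
    if node is None:
        return best
    if best is None or sq_dist(node["point"], query) < sq_dist(best, query):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    if diff ** 2 < sq_dist(best, query):  # the other side may still hold a closer point
        best = nearest(far, query, best)
    return best

samples = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(samples)
match = nearest(tree, (9, 2))
```

Unlike a hash lookup, the query always returns the closest stored sample, even when no sample falls in the same bin as the query.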
42Learning Geometric Appearance
- We can pre-compute the views that an object can
produce off-line.
- These views form a manifold in a lower-dimensional
space.
- Model object appearance using a pdf.
- Sample the space of appearances.
- Fit a parametric model (e.g., mixtures of
Gaussians using EM).
- Use mutual information theory to choose the
number of components.
- EM has problems when the dimensionality of the
data is high.
- Apply Random Projection first, then run the EM
algorithm.
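To make the EM step concrete, here is a minimal EM fit of a one-dimensional Gaussian mixture. It assumes the samples have already been reduced to one dimension (the Random Projection step is not shown), takes the number of components from the length of the initial means, and uses made-up data; none of these choices come from the slides.

```python
import math

def gaussian(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, iterations=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means."""
    k, n = len(means), len(data)
    means = list(means)
    mu = sum(data) / n
    variances = [max(sum((x - mu) ** 2 for x in data) / n, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample
        resp = []
        for x in data:
            p = [w * gaussian(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance of each component
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)
    return weights, means, variances

data = [-0.2, -0.1, 0.0, 0.1, 0.2, 9.8, 9.9, 10.0, 10.1, 10.2]
weights, means, variances = em_gmm_1d(data, means=[min(data), max(data)])
```

Only the weights, means, and variances need to be kept for each model once the fit is done, rather than the densely sampled views themselves.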
43Manifolds of Real Objects: An Example
- Only a small number of parameters need to be
stored for each model.
44Hypothesis Ranking
- Each hypothesis generated by the K-d tree is
ranked by computing its probability using mixture
models.
- For each test group, we compute two
probabilities, one from the x-coordinates and the
other from the y-coordinates.
- The overall probability for a particular
hypothesis is computed according to the
following equation:
[Equation: overall probability of a hypothesis,
combining the x- and y-coordinate probabilities]
45Reference Views
[Figure: 1st and 2nd reference views]
46Reference Views (contd)
[Figure: 1st and 2nd reference views]
47Test Views
[Figure: test views (a)-(f)]
48Test Views (contd)
[Figure: two test views, each labeled "Hypothesis
rejected"]
49Integrate Geometric Appearance with Intensity
Appearance
- Using geometrical information only does not
provide enough discrimination for objects having
similar geometric appearance but possibly
different intensity appearance.
- Integrate geometric and intensity appearance
during hypothesis verification to improve
discrimination power and robustness.
W. Li, G. Bebis, and N. Bourbakis, "3D Object
Recognition Using 2D Views", IEEE Transactions
on Image Processing (under revision).
50Dense Correspondences
- For each group of corresponding points, apply
triangulation recursively to get denser
correspondences.
- Divide triangles into four sub-triangles by
considering the middle point of each side of each
triangle.
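The midpoint subdivision can be sketched as a short recursive function. Running it with the same recursion depth on a triangle in each view yields vertices in the same order, which is what pairs them up as denser correspondences; this assumes midpoints map to midpoints between the views, and the function names and coordinates are illustrative.

```python
def midpoint(p, q):
    """Middle point of the segment from p to q."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def subdivide(triangle, levels):
    """Split a triangle into four via its edge midpoints, recursively `levels` times."""
    if levels == 0:
        return [triangle]
    a, b, c = triangle
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    result = []
    for t in ((a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)):
        result.extend(subdivide(t, levels - 1))
    return result

# Corresponding triangles in two hypothetical views.
tri_view1 = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
tri_view2 = ((1.0, 1.0), (9.0, 1.0), (1.0, 5.0))
dense1 = subdivide(tri_view1, 2)
dense2 = subdivide(tri_view2, 2)

# Vertex i of triangle j in view 1 corresponds to vertex i of triangle j in view 2.
pairs = [(p, q) for t1, t2 in zip(dense1, dense2) for p, q in zip(t1, t2)]
```

Each level multiplies the number of triangles by four, so two levels turn one triangle into 16.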
51Refine AFoVs parameters
[Figure: results before and after refinement]
52Predict Intensity Appearance - Example
[Figure: reference view 1, reference view 2, test
view, and prediction]
53Predict Intensity Appearance - Example
[Figure: reference view 1, reference view 2, test
view, and prediction]
54Predict Intensity Appearance - Example
[Figure: one example where the hypothesis is
accepted and one where it is rejected]