Title: Introduction to Object Recognition
1 Introduction to Object Recognition
- CS773C Machine Intelligence: Advanced Applications
- Spring 2008
2 Outline
- The Problem of Object Recognition
- Approaches to Object Recognition
- Requirements and Performance Criteria
- Representation Schemes
- Matching Schemes
- Example Systems
- Indexing
- Grouping
- Error Analysis
3 Problem Statement
- Given some knowledge of how certain objects may appear, and an image of a scene possibly containing those objects, report which objects are present in the scene and where.
- Recognition should be (1) invariant to viewpoint changes and object transformations, and (2) robust to noise and occlusion.
4 Challenges
- The appearance of an object can have a large range of variation due to:
- photometric effects
- scene clutter
- changes in shape (e.g., non-rigid objects)
- viewpoint changes
- Different views of the same object can give rise to widely different images!
5 Object Recognition Applications
- Quality control and assembly in industrial plants.
- Robot localization and navigation.
- Monitoring and surveillance.
- Automatic exploration of image databases.
6 Human Visual Recognition
- A spontaneous, natural activity for humans and other biological systems.
- People know about tens of thousands of different objects, yet they can easily distinguish among them.
- People can recognize objects with movable parts or objects that are not rigid.
- People can balance the information provided by different kinds of visual input.
7 Why Is It Difficult?
- Hard mathematical problems arise in understanding the relationship between geometric shapes and their projections into images.
- We must match an image to one of a huge number of possible objects, in any of an infinite number of possible positions (computational complexity).
8 Why Is It Difficult? (contd)
- We do not understand the recognition problem
9 What do we do in practice?
- Impose constraints to simplify the problem.
- Construct useful machines rather than modeling
human performance.
10 Approaches Differ According To
- Knowledge they employ:
- Model-based approach (i.e., based on an explicit model of the object's shape or appearance)
- Context-based approach (i.e., based on the context in which objects may be found)
- Function-based approach (i.e., based on the function that objects may serve)
11 Approaches Differ According To (contd)
- Restrictions on the form of the objects
- 2D or 3D objects
- Simple vs complex objects
- Rigid vs deforming objects
- Representation schemes
- Object-centered
- Viewer-centered
12 Approaches Differ According To (contd)
- Matching scheme
- Geometry-based
- Appearance-based
- Image formation model
- Perspective projection
- Affine transformation (e.g., planar objects)
- Orthographic projection + scale (weak perspective)
13 Requirements
- Viewpoint invariance
- Translation, rotation, scale
- Robustness to
- Noise (i.e., sensor noise)
- Local errors in early processing modules (e.g., edge detection)
- Illumination/shadows
- Partial occlusion (i.e., self-occlusion and occlusion by other objects)
- Intrinsic shape distortions (i.e., non-rigid objects)
14 Performance Criteria
- Scope
- What kinds of objects can be recognized, and in what kinds of scenes?
- Robustness
- Does the method tolerate reasonable amounts of noise and occlusion in the scene?
- Does it degrade gracefully as those tolerances are exceeded?
15 Performance Criteria (contd)
- Efficiency
- How much time and memory are required to search the solution space?
- Accuracy
- Correct recognitions
- False positives (wrong recognitions)
- False negatives (missed recognitions)
16 Representation Schemes
17 Object-centered Representation
- Associates a coordinate system with the object; the object's geometry is expressed in this frame.
- Advantage: every view of the object is available.
- Disadvantage: might not be easy to build (i.e., requires reconstructing 3D from 2D).
18 Object-centered Representation (contd)
- Two different matching approaches:
- (1) Derive a similar object-centered description from the scene and match it with the models (e.g., using shape-from-X methods).
- (2) Apply a model of the image formation process to the candidate model to back-project it onto the scene (camera calibration required).
19 Viewer-centered Representation
- Objects are described by a set of characteristic views or aspects.
- Advantages: (i) easier to build than an object-centered representation; (ii) matching is easier since it involves 2D descriptions.
- Disadvantage: requires a large number of views.
20 Predicting New Views
- There is some evidence that the human visual system uses a viewer-centered representation for object recognition.
- It predicts the appearance of objects in images obtained under novel conditions by generalizing from familiar images of the objects.
21 Predicting New Views (contd)
(Figure: familiar views of an object are used to predict a novel view.)
22 Matching Schemes
- (1) Geometry-based: explore correspondences between model and scene features.
- (2) Appearance-based: represent objects from all possible viewpoints and all possible illumination directions.
23 Geometry-based Matching
- Advantages: efficient at segmenting the object of interest from the scene, and robust in handling occlusion.
- Disadvantages: relies heavily on feature extraction, and performance degrades when imaging conditions give rise to poor segmentations.
24 Appearance-based Matching
- Advantage: circumvents the feature extraction problem by enumerating many possible object appearances in advance.
- Disadvantages: (i) difficulties with segmenting the objects from the background and dealing with occlusions; (ii) too many possible appearances; (iii) how to sample the space of appearances?
25 Model-Based Object Recognition
- The environment is rather constrained, and recognition relies upon the existence of a set of predefined objects.
26 Goals of Matching
- Identify a group of features from an unknown scene which approximately match a set of features from a known view of a model object.
- Recover the geometric transformation that the model object has undergone.
27 Transformation Space
- 2D objects: 4 parameters (2 translation, 1 rotation, 1 scale)
- 3D objects, perspective projection: 6 parameters (3 rotation, 3 translation)
- 3D objects, orthographic projection + scale: essentially 5 parameters plus a constant for depth
28 Matching: Two Steps
- Hypothesis generation: the identities of one or more models are hypothesized.
- Hypothesis verification: tests are performed to check whether a given hypothesis is correct.
29 Hypothesis Generation-Verification Example
30 Efficient Hypothesis Generation
- How to choose the scene groups?
- Do we need to consider every possible group?
- How do we find groups of features that are likely to belong to the same object?
- Use grouping schemes.
- Database organization and searching
- Do we need to search the whole database of models?
- How should we organize the model database to allow for fast and efficient storage and retrieval?
- Use indexing schemes.
31 Interpretation Trees (E. Grimson and T. Lozano-Perez, 1987)
- Nodes of the tree represent match pairs (i.e., a scene feature matched to a model feature).
- Each level of the tree represents all possible matches between an image feature fi and a model feature mj.
- The tree represents the complete search space.
32 Interpretation Trees (contd) (E. Grimson and T. Lozano-Perez, 1987)
- Interpretation: a path through the tree.
- Model features: m1, m2, m3, m4
- Scene features: f1, f2
- Use depth-first search to find a match (or interpretation).
33 Interpretation Trees (contd) (E. Grimson and T. Lozano-Perez, 1987)
- The search space is very large (i.e., an exponential number of matches).
- Find consistent interpretations without exploring all possible ways of matching image and model features.
- Use geometric constraints to prune the tree:
- Unary constraints: properties of individual features (e.g., length/orientation of a line)
- Binary constraints: properties of pairs of features (e.g., distance/angle between two lines)
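The constrained depth-first search described on this slide can be sketched in a few lines. This is a minimal illustrative sketch, not the original implementation: features are hypothetical line segments summarized as (length, midpoint), and the unary/binary constraints and tolerance are simplified stand-ins.

```python
# Sketch of interpretation-tree search with unary/binary constraint pruning.
# Features are hypothetical (length, midpoint) pairs; tolerances are illustrative.
import itertools, math

def consistent(pairing, scene, model, tol=0.5):
    """Unary constraint: matched features have similar lengths.
    Binary constraint: pairs of matches preserve midpoint distances."""
    for si, mi in pairing:
        if abs(scene[si][0] - model[mi][0]) > tol:
            return False
    for (si, mi), (sj, mj) in itertools.combinations(pairing, 2):
        ds = math.dist(scene[si][1], scene[sj][1])   # distance in the scene
        dm = math.dist(model[mi][1], model[mj][1])   # distance in the model
        if abs(ds - dm) > tol:
            return False
    return True

def interpret(scene, model):
    """Depth-first search: each tree level assigns one scene feature
    a model feature; inconsistent branches are pruned immediately."""
    def dfs(level, pairing):
        if level == len(scene):
            return list(pairing)                     # complete interpretation
        for mi in range(len(model)):
            if mi not in [m for _, m in pairing]:
                pairing.append((level, mi))
                if consistent(pairing, scene, model):
                    result = dfs(level + 1, pairing)
                    if result:
                        return result
                pairing.pop()                        # backtrack
        return None
    return dfs(0, [])

# Toy example: the scene contains the model's two segments, reordered.
model = [(2.0, (0.0, 0.0)), (3.0, (4.0, 0.0))]
scene = [(3.0, (4.0, 0.0)), (2.0, (0.0, 0.0))]
print(interpret(scene, model))  # [(0, 1), (1, 0)]
```

The pruning is what makes the tree tractable: a branch that violates a unary or binary constraint is abandoned before any of its exponentially many descendants are visited.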
34 Alignment Approach (Huttenlocher and Ullman, 1990)
- Most approaches search for the largest pairing of model and image features for which there exists a single geometric transformation mapping each model feature to its corresponding image feature.
- The alignment approach seeks to recover the geometric transformation between the model and the scene using a minimum number of correspondences.
35 Alignment Approach (contd) (Huttenlocher and Ullman, 1990)
- Weak perspective model (3 correspondences; O(m^3 n^3) cases):
x' = Π(sRx + b)
- Π: orthographic projection
- s: scale
- R: 3D rotation
- b: translation
- Equivalent to an affine transformation (valid when the object is far from the camera and the object's depth is small relative to its distance from the camera):
x' = Lx + b
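For planar points, the affine form x' = Lx + b has six unknowns, so three model-image correspondences determine it exactly. A minimal sketch of this alignment step, with hypothetical toy points and no library dependencies:

```python
# Sketch of alignment: recover the 2D affine map x' = Lx + b from
# 3 point correspondences by solving two 3x3 linear systems.
def solve3(A, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    def det(m):
        return (m[0][0]*(m[1][1]*m[2][2] - m[1][2]*m[2][1])
              - m[0][1]*(m[1][0]*m[2][2] - m[1][2]*m[2][0])
              + m[0][2]*(m[1][0]*m[2][1] - m[1][1]*m[2][0]))
    d = det(A)
    sol = []
    for j in range(3):
        Aj = [row[:] for row in A]
        for i in range(3):
            Aj[i][j] = rhs[i]               # replace column j with rhs
        sol.append(det(Aj) / d)
    return sol

def affine_from_3(model_pts, image_pts):
    """Each image coordinate is an affine function of the model point:
    u = a*x + b*y + c and v = d*x + e*y + f."""
    A = [[x, y, 1.0] for x, y in model_pts]
    a, b, c = solve3(A, [u for u, _ in image_pts])
    d, e, f = solve3(A, [v for _, v in image_pts])
    return (a, b, c), (d, e, f)

def apply(T, p):
    (a, b, c), (d, e, f) = T
    x, y = p
    return (a*x + b*y + c, d*x + e*y + f)

# Toy check: a transform that scales by 2 and translates by (1, -1).
model = [(0, 0), (1, 0), (0, 1)]
image = [(1, -1), (3, -1), (1, 1)]
T = affine_from_3(model, image)
print(apply(T, (1, 1)))  # (3.0, 1.0)
```

In a full system, the recovered transformation is then used to back-project the remaining model features into the image for hypothesis verification.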
36 Pose Clustering (e.g., Thompson and Mundy, 1987; Ballard, 1981)
- Main idea: if there is a transformation that can bring a large number of features into alignment, then this transformation will receive a large number of votes.
37 Pose Clustering (e.g., Thompson and Mundy, 1987; Ballard, 1981)
- Main steps:
- (1) Quantize the space of possible transformations (usually 4D - 6D).
- (2) For each hypothetical match, solve for the transformation that aligns the matched features.
- (3) Cast a vote in the corresponding transformation-space bin.
- (4) Find the "peak" in transformation space.
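The four steps above can be sketched for the simplest case, a translation-only pose. This is an illustrative reduction (real systems quantize a 4D-6D space of rotation, scale, and translation); the point features and bin size are hypothetical.

```python
# Sketch of pose clustering: every model/scene feature pairing votes for
# the (quantized) translation that would align it; the peak bin wins.
from collections import Counter

def pose_votes(model_pts, scene_pts, bin_size=1.0):
    votes = Counter()
    for mx, my in model_pts:
        for sx, sy in scene_pts:
            tx, ty = sx - mx, sy - my                    # hypothesized pose
            b = (round(tx / bin_size), round(ty / bin_size))
            votes[b] += 1                                # vote in that bin
    return votes

# Toy example: the scene is the model shifted by (5, 2), plus one clutter point.
model = [(0, 0), (1, 0), (0, 1)]
scene = [(5, 2), (6, 2), (5, 3), (9, 9)]
peak, count = pose_votes(model, scene).most_common(1)[0]
print(peak, count)  # (5, 2) 3
```

Wrong pairings scatter single votes across many bins, while the correct pose accumulates one vote per aligned feature, which is why the peak stands out even with clutter.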
38 Pose Clustering (example) (e.g., Thompson and Mundy, 1987; Ballard, 1981)
39 Appearance-based Recognition (e.g., Murase and Nayar, 1995; Turk and Pentland, 1991)
- Represent an object by the set of its possible appearances (i.e., under all possible viewpoints and illumination conditions).
- Identifying an object then amounts to finding the closest stored image.
40 Appearance-based Recognition (e.g., Murase and Nayar, 1995; Turk and Pentland, 1991)
- In practice, a subset of all possible appearances is used.
- Images are highly correlated, so compress them into a low-dimensional space that captures key appearance characteristics (e.g., using Principal Component Analysis (PCA)).
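The PCA compression and nearest-appearance lookup can be sketched as follows. This assumes NumPy is available, and the "images" are tiny hypothetical 4-pixel vectors rather than real photographs; it is a sketch of the idea, not the Murase-Nayar system.

```python
# Sketch of appearance-based recognition: compress stored views with PCA,
# then identify a query by its nearest neighbor in the low-dimensional space.
import numpy as np

def pca_basis(images, k):
    """Return the mean image and the top-k principal directions."""
    X = np.asarray(images, dtype=float)
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]                       # rows of Vt = principal directions

def project(image, mean, basis):
    return basis @ (np.asarray(image, dtype=float) - mean)

def nearest(query, gallery, mean, basis):
    """Index of the stored appearance closest to the query in PCA space."""
    q = project(query, mean, basis)
    coords = [project(g, mean, basis) for g in gallery]
    return min(range(len(gallery)),
               key=lambda i: np.linalg.norm(coords[i] - q))

# Toy gallery of 4-pixel "views"; the query is a noisy copy of view 1.
gallery = [[0, 0, 0, 0], [10, 0, 10, 0], [0, 10, 0, 10]]
mean, basis = pca_basis(gallery, k=2)
print(nearest([9, 1, 9, 1], gallery, mean, basis))  # 1
```

Because the stored views are highly correlated, a small k preserves most of the distance structure while cutting the per-comparison cost from the pixel count down to k.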
41 Indexing-based Recognition
- Preprocessing step: groups of model features are used to index the database, and the indexed locations are filled with entries containing references to the model objects and information that can later be used for pose recovery.
- Recognition step: groups of scene features are used to index the database, and the model objects listed in the indexed locations are collected into a list of candidate models (hypotheses).
42 Indexing-Based Recognition (contd)
- Use a priori stored information about the models to quickly eliminate infeasible matches during recognition.
43 Invariants
- Properties that do not change with object transformations or viewpoint changes.
- Ideally, we would like the index computed from a group of model features to be invariant.
- That way, only one entry per group needs to be stored.
44 Planar (2D) Objects
- The index is computed based on invariant properties.
- Only one entry per group needs to be stored in this case.
- Affine invariants (geometric hashing): Lamdan et al., 1988
45 Geometric Hashing
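The preprocessing and recognition steps of geometric hashing can be sketched with point features. This is a simplified illustration of the scheme (single scene basis, crude quantization, toy models) rather than the Lamdan-Schwartz-Wolfson implementation; the model names and coordinates are hypothetical.

```python
# Sketch of geometric hashing: coordinates of a point in the affine frame of
# an ordered point triple are affine-invariant, so they can index a hash table.
import itertools

def affine_coords(basis, p):
    """Coordinates of p in the frame defined by an ordered point triple."""
    (ox, oy), (ax, ay), (bx, by) = basis
    ux, uy = ax - ox, ay - oy
    vx, vy = bx - ox, by - oy
    det = ux * vy - uy * vx
    px, py = p[0] - ox, p[1] - oy
    return ((px * vy - py * vx) / det, (ux * py - uy * px) / det)

def build_table(models, q=0.25):
    """Preprocessing: index every (basis, point) invariant of every model."""
    table = {}
    for name, pts in models.items():
        for basis in itertools.permutations(pts, 3):
            for p in pts:
                if p in basis:
                    continue
                a, b = affine_coords(basis, p)
                key = (round(a / q), round(b / q))      # quantized invariant
                table.setdefault(key, []).append(name)
    return table

def recognize(table, scene_pts, q=0.25):
    """Recognition: one hypothesized scene basis votes for stored models."""
    votes = {}
    basis = tuple(scene_pts[:3])
    for p in scene_pts[3:]:
        a, b = affine_coords(basis, p)
        key = (round(a / q), round(b / q))
        for name in table.get(key, []):
            votes[name] = votes.get(name, 0) + 1
    return max(votes, key=votes.get) if votes else None

# Toy models of 4 points each; the scene is model "A" under an affine map.
models = {"A": [(0, 0), (1, 0), (0, 1), (1, 1)],
          "B": [(0, 0), (2, 0), (0, 1), (3, 2)]}
table = build_table(models)
scene = [(0, 0), (2, 0), (0, 3), (2, 3)]        # "A" scaled by (2, 3)
print(recognize(table, scene))  # A
```

The expensive permutation loop runs offline, which is the whole point: at recognition time, a scene group retrieves candidate models by table lookup instead of a search over the model database.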
46 Three-Dimensional Objects
- No general-case invariants exist for single views of general 3D objects (Clemens and Jacobs, 1991).
- Special-case and model-based invariants exist (Rothwell et al., 1995; Weinshall, 1993).
47 Indexing for 3D Object Recognition
- One approach might be ...
48 Indexing for 3D Object Recognition (contd)
- Another approach might be ...
49 Grouping
- Grouping is the process that organizes the image into parts, each likely to come from a single object.
- It reduces the number of hypotheses dramatically.
- Non-accidental properties (grouping cues): orientation, collinearity, parallelism, proximity.
- Convex groups (Jacobs, 1996)
50 Error Analysis
- Uncertainty in feature locations.
- It is important to analyze the sensitivity of each algorithm with respect to uncertainty in the locations of the image features.
- Case of indexing: analyze how errors in the locations of the points affect the invariants.
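The indexing case above can be made concrete with a small numerical experiment: perturb one point and measure how much an affine-invariant coordinate moves. The basis geometries and noise magnitude below are illustrative, but they show the key sensitivity effect: the same localization error is amplified when the basis points are nearly coincident.

```python
# Sketch of invariant error analysis: the same 0.02-unit perturbation of a
# point yields a 10x larger invariant error under a nearly degenerate basis.
def affine_coords(o, a, b, p):
    """Affine-invariant coordinates of p in the frame (o, a, b)."""
    ux, uy = a[0] - o[0], a[1] - o[1]
    vx, vy = b[0] - o[0], b[1] - o[1]
    det = ux * vy - uy * vx                  # small det => unstable invariants
    px, py = p[0] - o[0], p[1] - o[1]
    return ((px * vy - py * vx) / det, (ux * py - uy * px) / det)

def invariant_error(basis, p, dp=(0.02, 0.0)):
    """Change in the invariant when p is perturbed by dp."""
    a0, b0 = affine_coords(*basis, p)
    a1, b1 = affine_coords(*basis, (p[0] + dp[0], p[1] + dp[1]))
    return abs(a1 - a0) + abs(b1 - b0)

wide   = ((0, 0), (1.0, 0), (0, 1.0))       # well-separated basis points
narrow = ((0, 0), (0.1, 0), (0, 0.1))       # nearly coincident basis points
print(round(invariant_error(wide, (0.5, 0.5)), 6))    # 0.02
print(round(invariant_error(narrow, (0.5, 0.5)), 6))  # 0.2
```

This is why error analyses of indexing schemes pay attention to basis selection: hash-table bin sizes must absorb the worst-case invariant error, and degenerate bases blow that error up.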
51 Error Analysis (contd)
52 References
- E. Grimson and T. Lozano-Perez, "Localizing overlapping parts by searching the interpretation tree", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, no. 4, pp. 469-482, July 1987.
- D. Huttenlocher and S. Ullman, "Recognizing solid objects by alignment with an image", International Journal of Computer Vision, vol. 5, no. 2, pp. 195-212, 1990.
- Y. Lamdan, J. Schwartz, and H. Wolfson, "Affine invariant model-based object recognition", IEEE Transactions on Robotics and Automation, vol. 6, no. 5, pp. 578-589, October 1990.
- I. Rigoutsos and R. Hummel, "A Bayesian approach to model matching with geometric hashing", CVGIP: Image Understanding, vol. 62, pp. 11-26, 1995.
53 References (contd)
- D. Clemens and D. Jacobs, "Space and time bounds on indexing 3D models from 2D images", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 10, pp. 1007-1017, 1991.
- D. Thompson and J. Mundy, "Three-dimensional model matching from an unconstrained viewpoint", IEEE Conference on Robotics and Automation, pp. 208-220, 1987.
- D. Ballard, "Generalizing the Hough transform to detect arbitrary patterns", Pattern Recognition, vol. 13, no. 2, pp. 111-122, 1981.
- H. Murase and S. Nayar, "Visual learning and recognition of 3D objects from appearance", International Journal of Computer Vision, vol. 14, pp. 5-24, 1995.
54 References (contd)
- M. Turk and A. Pentland, "Eigenfaces for recognition", Journal of Cognitive Neuroscience, vol. 3, pp. 71-86, 1991.
- D. Jacobs, "Robust and efficient detection of salient convex groups", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 1, pp. 23-37, 1996.
- K. Bowyer and C. Dyer, "Aspect graphs: an introduction and survey of recent results", International Journal of Imaging Systems and Technology, vol. 2, pp. 315-328, 1990.