Object Recognition using Local Affine Frames on Maximally Stable Extremal Regions

1
Object Recognition using Local Affine Frames on
Maximally Stable Extremal Regions

Stepan Obdrzalek
Jiri Matas

2
Proposed Algorithm

Identify affine-covariant regions of interest
MSER detector
Construct local affine frames (LAFs)
Invariant to geometry and photometrics
Normalize LAF geometry and color
Generate descriptors of patches
Discrete cosine transformation
Recognition and localization
Establish tentative correspondences
Find a globally consistent subset
Infer presence and location of object

3
Requirement for Region Detectors

Consistent
Discriminative
Invariant (actually covariant)
Appearance is consistent with the transformation
scaling, rotation, shearing
Fixed shape is insufficient
Shape must be covariant to object position
(Sticky)

4
Popular Affine Covariant Detectors

Harris-Affine
Hessian-Affine
Edge
Intensity Extrema
Salient Regions
MSER

5
Harris-affine and Hessian-affine

Detect interest points
Identify corners in image using Harris corner
detector
Determine the characteristic scale
Maximization of Laplacian-of-Gaussians
Determine an elliptical region for each point
Second moment matrix
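
The second moment matrix step can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of rows, central-difference gradients, and a plain box window; an actual Harris-affine detector would use Gaussian derivative and integration scales. The function names and the constant k = 0.04 are illustrative choices, not taken from the slides.

```python
def second_moment_matrix(img, r, c, radius=1):
    """Sum of gradient outer products in a square window around (r, c).

    Assumption: box window and central differences, chosen for brevity.
    The caller must keep the window at least one pixel from the border.
    """
    sxx = sxy = syy = 0.0
    for y in range(r - radius, r + radius + 1):
        for x in range(c - radius, c + radius + 1):
            ix = (img[y][x + 1] - img[y][x - 1]) / 2.0  # horizontal gradient
            iy = (img[y + 1][x] - img[y - 1][x]) / 2.0  # vertical gradient
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    return sxx, sxy, syy


def harris_response(m, k=0.04):
    """Harris corner measure det(M) - k * trace(M)^2 for M = (sxx, sxy, syy)."""
    sxx, sxy, syy = m
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


# Usage: a bright quadrant produces a corner at (3, 3); a vertical step is an edge.
corner = [[1.0 if (y >= 3 and x >= 3) else 0.0 for x in range(7)] for y in range(7)]
edge = [[1.0 if x >= 3 else 0.0 for x in range(7)] for y in range(7)]
corner_score = harris_response(second_moment_matrix(corner, 3, 3))
edge_score = harris_response(second_moment_matrix(edge, 3, 3))
```

The response is positive at the corner and negative on the edge. The same matrix, once estimated at the characteristic scale, defines the elliptical region around the point.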

6
Edge based detector

Edges are stable across view, scale, illumination
Detect interest points
Identify corners in image using Harris corner
detector
Identify edges using the Canny detector
Combine to form a parallelogram
Determine the characteristic scale
Parallelograms where textures hit an extremum

7
Intensity based detector

Detect interest points
Identify local extremum in intensity
Analyze rays projecting radially
Determine the characteristic scale
Best-fit ellipse that passes through ray-points
with large intensity shifts

8
Salient region detector

Based on PDF of intensity values computed over
elliptical region
Detect interest points
Measure the pixel entropy within elliptical
regions
Select regions with high complexity
Determine the characteristic scale
Optimal scale is determined by the identified
region

9
Maximally Stable Extremal Region (MSER)

Connected component of thresholded image
Efficient to implement: O(number of pixels)
Detect interest points
All pixels inside the MSER have higher or lower
intensities than in the surrounding regions
Regions are selected to be stable over intensity
range
Determine the characteristic scale
Optimal scale is determined automatically by the MSER algorithm
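
The stability criterion can be sketched as a brute-force threshold sweep. This is only an illustration, assuming a small grayscale image as a list of rows and a hypothetical seed pixel: it re-runs a flood fill at every threshold, whereas an efficient MSER implementation processes pixels in sorted order with union-find. The names `component_area`, `most_stable_threshold`, and `delta` are illustrative.

```python
from collections import deque


def component_area(img, seed, t):
    """Area of the 4-connected region of pixels <= t that contains seed."""
    h, w = len(img), len(img[0])
    if img[seed[0]][seed[1]] > t:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < h and 0 <= nc < w
                    and (nr, nc) not in seen and img[nr][nc] <= t):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)


def most_stable_threshold(img, seed, delta=2, levels=range(256)):
    """Threshold minimizing q(t) = (area(t+delta) - area(t-delta)) / area(t)."""
    areas = {t: component_area(img, seed, t) for t in levels}
    best_t, best_q = None, float("inf")
    for t in levels:
        if t - delta not in areas or t + delta not in areas:
            continue
        if areas[t - delta] == 0:  # region does not exist yet at t - delta
            continue
        q = (areas[t + delta] - areas[t - delta]) / areas[t]
        if q < best_q:
            best_t, best_q = t, q
    if best_t is None:
        return None
    return best_t, areas[best_t]


# Usage: a dark 2x2 blob (intensity 10) on a bright background (intensity 200).
img = [[200] * 5 for _ in range(5)]
for r in (1, 2):
    for c in (1, 2):
        img[r][c] = 10
result = most_stable_threshold(img, (1, 1))
```

The sweep handles dark regions on a brighter background; bright regions would be found the same way on the inverted image.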

10
Runtime comparison
15
Local Affine Frame (LAF) from Features

Comparing transformed image regions can be
simplified by constructing a viewpoint invariant
coordinate system that is feature-based
Coordinates are based on local features
Coordinates stick to features
Features must describe 6 degrees of freedom
Simple points and ellipses are not sufficient
MSER regions are sufficient
Assumptions
Local planarity
Perspective camera

16
Local Affine Frame (LAF) from Features
17
Local Affine Frame (LAF) from Features

2D affine transformation has 6 degrees of
freedom
6 independent constraints must be found
Correspondence of 3 non-collinear points
Constraints are derived from detected primitives
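
As a worked example of the 6-constraint argument, the sketch below recovers a 2D affine transformation from three point correspondences. Each correspondence contributes two linear equations, so three non-collinear points determine the six unknowns. The helper names are illustrative, and Cramer's rule is used only to keep the example dependency-free.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve the 3x3 system m * x = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("source points are collinear")
    solution = []
    for col in range(3):
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution


def affine_from_points(src, dst):
    """Return ((a, b, tx), (c, d, ty)) with x' = a*x + b*y + tx, y' = c*x + d*y + ty."""
    m = [[x, y, 1.0] for x, y in src]
    row_x = solve3(m, [p[0] for p in dst])
    row_y = solve3(m, [p[1] for p in dst])
    return tuple(row_x), tuple(row_y)


# Usage: scale x by 2 and y by 3, then translate by (2, 3).
row_x, row_y = affine_from_points([(0, 0), (1, 0), (0, 1)],
                                  [(2, 3), (4, 3), (2, 6)])
```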

18
Local Affine Frame (LAF) from Features

Region shape constructions
Center of gravity
2 constraints: resolves translation
2x2 covariance matrix Σ
3 constraints: together with the center of gravity, fixes the affine
transformation up to an unknown rotation
Concavities
4 constraints: line and point tangent to line
Don't require detection of whole region
Curvature inflection points
From concave to convex
Straight line segments of boundary
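
The center-of-gravity and covariance constructions can be sketched as follows, assuming a region is given as a list of (x, y) pixel coordinates. Subtracting the center of gravity removes translation, and applying the inverse Cholesky factor of the covariance matrix maps the region to unit covariance, which leaves only a rotation undetermined. The function names are illustrative, and the Cholesky factor is one of several valid choices of normalizing matrix.

```python
import math


def region_moments(pixels):
    """Center of gravity and covariance (sxx, sxy, syy) of a pixel set."""
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    return (mx, my), (sxx, sxy, syy)


def whitening_matrix(cov):
    """Inverse of the lower-triangular Cholesky factor L, where cov = L * L^T."""
    sxx, sxy, syy = cov
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(syy - l21 * l21)
    return (1.0 / l11, 0.0), (-l21 / (l11 * l22), 1.0 / l22)


def normalize_region(pixels):
    """Map pixels to a frame with zero mean and identity covariance."""
    (mx, my), cov = region_moments(pixels)
    (w11, w12), (w21, w22) = whitening_matrix(cov)
    return [(w11 * (x - mx) + w12 * (y - my),
             w21 * (x - mx) + w22 * (y - my)) for x, y in pixels]


# Usage: a sheared rectangle becomes a point set with unit covariance.
sheared = [(x + y, y) for x in range(6) for y in range(3)]
normalized = normalize_region(sheared)
```

The remaining rotation has to come from another primitive, such as a concavity or a dominant gradient direction.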

19
Local Affine Frame (LAF) from Features

Intensity constructions: pixels inside a region
Orientations of gradients
Rotation
Direction of dominant texture periodicity
Rotation
Extrema of RGB or any scalar function
2 constraints

20
Local Affine Frame (LAF) from Features

Topology of regions: mutual configuration of
regions
Nested regions
Neighboring regions
Holes
Incident regions

21
LAF Construction

Construction of primitives covering 6 degrees of
freedom

22
Geometric Normalization

Translate between canonical / image frame
Origin (0,0)^T, basis vectors (1,0)^T, (0,1)^T
Measurement Region (MR)
Image region used to determine local
correspondences
(-2,3) x (-2,3)
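
A minimal sketch of the geometric normalization, assuming a frame is represented by its origin and two basis vectors in image coordinates, and that the measurement region is the square from -2 to 3 along both canonical axes. The grid size, nearest-neighbor lookup, and zero fill outside the image are simplifications chosen for the example.

```python
def canonical_to_image(frame, u, v):
    """Map canonical coordinates (u, v) to image coordinates (x, y)."""
    (ox, oy), (e1x, e1y), (e2x, e2y) = frame
    return ox + u * e1x + v * e2x, oy + u * e1y + v * e2y


def sample_measurement_region(img, frame, n=6, lo=-2.0, hi=3.0):
    """Resample the measurement region into an n x n canonical patch."""
    h, w = len(img), len(img[0])
    step = (hi - lo) / (n - 1)
    patch = []
    for j in range(n):
        row = []
        for i in range(n):
            x, y = canonical_to_image(frame, lo + i * step, lo + j * step)
            c, r = int(round(x)), int(round(y))
            # Assumption: samples falling outside the image are filled with 0.
            row.append(img[r][c] if 0 <= r < h and 0 <= c < w else 0)
        patch.append(row)
    return patch


# Usage: pixel value encodes its position as 10 * row + column.
img = [[10 * r + c for c in range(10)] for r in range(10)]
upright = ((4, 4), (1, 0), (0, 1))    # origin, e1, e2
rotated = ((4, 4), (0, 1), (-1, 0))   # same origin, basis rotated by 90 degrees
patch_upright = sample_measurement_region(img, upright)
patch_rotated = sample_measurement_region(img, rotated)
```

Two views of the same surface patch, each sampled through its own frame, should produce approximately the same canonical patch.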

23
Photometric Normalization

Translate between canonical / image frame
Reflections and shadows are ignored
Illumination, gain, aperture, etc. are modeled by
affine transformations of color channels
Transformation between two patches I and I' is, per color channel c in {R, G, B}: I'_c = m_c I_c + n_c
Requires 6 additional normalization parameters (scale m_c and offset n_c per channel)
Intensities are affinely transformed to have
zero mean
unit variance
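
The photometric normalization can be sketched as below, assuming a patch is a list of (r, g, b) tuples and that each channel differs between views by a positive scale and an offset. Each channel is shifted to zero mean and scaled to unit variance, and the removed scale and offset are returned so they can be stored with the descriptor. The function names are illustrative.

```python
import math


def normalize_channel(values):
    """Return zero-mean, unit-variance values plus the removed (scale, offset)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        std = 1.0  # flat channel: avoid division by zero
    return [(v - mean) / std for v in values], (std, mean)


def normalize_patch(pixels):
    """Normalize R, G, B independently; returns 3 channels and 6 parameters."""
    channels, params = [], []
    for ch in range(3):
        values, (scale, offset) = normalize_channel([p[ch] for p in pixels])
        channels.append(values)
        params.extend((scale, offset))
    return channels, params


# Usage: the second patch is an affine photometric change of the first.
patch = [(10, 20, 30), (20, 40, 10), (40, 10, 20), (30, 30, 60)]
changed = [(2 * r + 5, 0.5 * g + 1, 3 * b) for r, g, b in patch]
norm_a, params_a = normalize_patch(patch)
norm_b, params_b = normalize_patch(changed)
```

Both patches normalize to the same values, so the descriptor computed afterwards does not depend on the per-channel scale and offset.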

24
Normalization of Local Representation

Translate between canonical / image frame
12 normalization parameters stored with the
descriptor
Coverage

25
Descriptors

Desirable properties
Distinguish between large number of regions
Maximize ratio of similarities between match and
mismatch
Robust or invariant to localization errors and
transformations
Efficient on memory and speed
Discrete Cosine Transformation (JPEG
compression)
Algorithms require O(n lg n)
Hardware implementations
Robust to misalignment
Same discrimination as SIFT
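
A DCT-based descriptor can be sketched as follows, assuming a square normalized patch given as a list of rows. The transform here is the direct orthonormal 2D DCT-II, written for clarity rather than speed (it is not the fast O(n lg n) algorithm), and the choice of keeping the low-frequency coefficients with u + v < k is an assumption of this example, not a detail taken from the slides.

```python
import math


def dct2(patch):
    """Orthonormal 2D DCT-II of an n x n patch (direct evaluation)."""
    n = len(patch)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (patch[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def dct_descriptor(patch, k=3):
    """Keep the low-frequency coefficients with u + v < k."""
    coeffs = dct2(patch)
    return [coeffs[u][v] for u in range(k) for v in range(k - u)]


# Usage: a constant patch has all of its energy in the DC coefficient.
constant = [[1.0] * 4 for _ in range(4)]
descriptor = dct_descriptor(constant)
```

Because the patch is photometrically normalized to zero mean before this step, the DC coefficient of a real descriptor would be close to zero and carries no information.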

26
Matching detected frames with query frames

Comparison
Compute similarities between all detected and
query frames
Matching
Select most likely matches
Verification
Consistency check that incorporates geometric
constraints

27
Comparison

Determine the probability that a transformation
can take place
Based on training experience
If probability is below a threshold, the pair is rejected (∞
dissimilarity)
Otherwise, determined by descriptor similarity

28
Matching

Nearest Match
Most common
For each detected frame, find closest query
frame
Mutually Nearest Match
For symmetric matching (e.g. stereo)
For each detected, find closest query
For each query, find closest detected
Match if (closest query = closest detected) or (difference
< threshold)
All (or N most) similar
Repetitive structures (many ambiguous
correspondences)
Keep all correspondences, resolution left to
verification
High number of false correspondences
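
The first two matching strategies can be sketched as below, assuming descriptors are equal-length numeric sequences compared by squared Euclidean distance. The distance measure and function names are assumptions of this example, and the threshold alternative in the mutual rule is left out for brevity.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(descs, others):
    """For each descriptor in descs, the index of its closest entry in others."""
    return [min(range(len(others)), key=lambda j: sq_dist(d, others[j]))
            for d in descs]


def nearest_matches(detected, query):
    """Nearest match: every detected frame is paired with its closest query frame."""
    return list(enumerate(nearest(detected, query)))


def mutual_nearest_matches(detected, query):
    """Mutually nearest match: keep (i, j) only if each is the other's closest."""
    d2q = nearest(detected, query)
    q2d = nearest(query, detected)
    return [(i, j) for i, j in enumerate(d2q) if q2d[j] == i]


# Usage: detected frame 2 is close to query 0, but query 0 prefers detected 0.
detected = [(0.0, 0.0), (10.0, 10.0), (0.4, 0.0)]
query = [(0.1, 0.0), (9.0, 9.0)]
one_way = nearest_matches(detected, query)
mutual = mutual_nearest_matches(detected, query)
```

The one-way rule returns a pair for every detected frame, including the ambiguous one, while the mutual rule drops it.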

29
Verification

All matches should be consistent with the same model
3D models would only be effective if visible
parts of the image are very large (building
interiors)
Sufficient to model as planar surfaces
If 2 tentative correspondences are part of the
same plane
Similar geometric transformation
Similar photometric transformation
Set of all correspondences is decomposed into
subsets of consistent correspondences
Each subset represents a single plane in the
scene
Small sets are rejected
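
The decomposition into consistent subsets can be illustrated with a simple greedy grouping. This is a stand-in for illustration, not the authors' procedure: it assumes each tentative correspondence carries the six parameters of the affine transformation between its two frames, and it compares raw parameters against a single tolerance, which a real system would replace with a properly scaled geometric and photometric test. The names `tol` and `min_size` are hypothetical.

```python
def group_consistent(transforms, tol=0.1, min_size=3):
    """Greedily group correspondences whose transformations are similar.

    transforms: list of 6-tuples (a, b, tx, c, d, ty), one per correspondence.
    Returns lists of correspondence indices; groups smaller than min_size
    are rejected.
    """
    groups = []  # the first index in each group acts as its representative
    for i, t in enumerate(transforms):
        for g in groups:
            ref = transforms[g[0]]
            if max(abs(a - b) for a, b in zip(t, ref)) <= tol:
                g.append(i)
                break
        else:
            groups.append([i])
    return [g for g in groups if len(g) >= min_size]


# Usage: two planes with different transformations and one false correspondence.
transforms = [
    (1.00, 0.00, 0.00, 0.00, 1.00, 0.00),
    (1.02, 0.01, 0.03, 0.00, 0.99, 0.02),
    (0.98, 0.00, 0.05, 0.01, 1.01, 0.00),
    (0.30, 0.90, 7.00, -0.90, 0.30, 2.00),  # outlier
    (2.00, 0.00, 5.00, 0.00, 2.00, 5.00),
    (2.03, 0.02, 5.05, 0.00, 1.98, 4.97),
    (1.99, 0.00, 4.96, 0.01, 2.02, 5.01),
]
planes = group_consistent(transforms)
```

Each surviving group plays the role of one planar surface in the scene, and the isolated correspondence is discarded as a small set.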

30
Experimental Validation COIL-100

100 objects
72 images of each object
5° pose intervals
Controlled lighting

31
Experimental Validation ZuBuD

201 buildings
5 pictures each

32
Experimental Validation FOCUS

Product logos
Logos occupy small image portion
360 color images

33
Conclusion

Object recognition based on local measurements
Affine invariance achieved by expressing local
appearance in terms of affine covariant
coordinates
Promising results
Problems
Speed is the primary issue
All query frames are compared to all database frames
Speed can be improved using hashing, possibly at the cost of
accuracy
Planar surface assumption
Rigid objects
Shadow, etc.