Title: Object Recognition using Local Affine Frames on Maximally Stable Extremal Regions
1Object Recognition using Local Affine Frames on
Maximally Stable Extremal Regions
- Stepan Obdrzalek
- Jiri Matas
2Proposed Algorithm
- Identify affine-covariant regions of interest
- MSER detector
- Construct local affine frames (LAFs)
- Invariant to geometry and photometrics
- Normalize LAF geometry and color
- Generate descriptors of patches
- Discrete cosine transformation
- Recognition and Localization
- Establish tentative correspondences
- Find a globally consistent subset
- Infer presence and location of object
3Requirement for Region Detectors
- Consistent
- Discriminative
- Invariant (actually covariant)
- Appearance is consistent with the transformation
- scaling, rotation, shearing
- Fixed shape is insufficient
- Shape must be covariant to object position
(Sticky)
4Popular Affine Covariant Detectors
- Harris-Affine
- Hessian-Affine
- Edge
- Intensity Extrema
- Salient Regions
- MSER
5Harris-affine and Hessian-affine
- Detect interest points
- Identify corners in image using Harris corner detector
- Determine the characteristic scale
- Maximization of Laplacian-of-Gaussians
- Determine an elliptical region for each point
- Second moment matrix
6Edge based detector
- Edges are stable across view, scale, illumination
- Detect interest points
- Identify corners in image using Harris corner detector
- Identify edges using Canny
- Combine to form a parallelogram
- Determine the characteristic scale
- Parallelograms where textures hit an extremum
7Intensity based detector
- Detect interest points
- Identify local extremum in intensity
- Analyze rays projecting radially
- Determine the characteristic scale
- Best-fit ellipse that passes through ray-points
with large intensity shifts
8Salient region detector
- Based on PDF of intensity values computed over elliptical region
- Detect interest points
- Measure the pixel entropy within elliptical regions
- Select regions with high complexity
- Determine the characteristic scale
- Optimal scale is determined by the identified
region
9Maximally Stable Extremal Region (MSER)
- Connected component of thresholded image
- Efficient to implement: O(number of pixels)
- Detect interest points
- All pixels inside the MSER are either brighter or darker than all pixels on its outer boundary
- Regions are selected to be stable over a range of intensity thresholds
- Determine the characteristic scale
- Optimal scale is automatic to MSER algorithm
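The stability criterion above can be illustrated with a brute-force sketch. This is not the linear-time algorithm the slide refers to: it simply re-thresholds the image at each level and flood-fills from a seed pixel. The function names, the `delta` parameter, and the choice of dark regions (pixels at or below the threshold) are assumptions made for illustration.

```python
from collections import deque

def component_area(img, seed, t):
    """Area of the 4-connected component of {pixel <= t} that contains seed."""
    h, w = len(img), len(img[0])
    if img[seed[0]][seed[1]] > t:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < h and 0 <= nc < w
                    and (nr, nc) not in seen and img[nr][nc] <= t):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)

def most_stable_threshold(img, seed, thresholds, delta=1):
    """Return (growth, threshold, area) where relative area change is smallest."""
    areas = [component_area(img, seed, t) for t in thresholds]
    best = None
    for i in range(delta, len(thresholds) - delta):
        if areas[i] == 0:
            continue  # seed is not inside any region at this threshold
        growth = (areas[i + delta] - areas[i - delta]) / areas[i]
        if best is None or growth < best[0]:
            best = (growth, thresholds[i], areas[i])
    return best

# Dark 3x3 blob (value 10) on a bright 5x5 background (value 200).
img = [[200] * 5 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 4):
        img[r][c] = 10
best = most_stable_threshold(img, (2, 2), [0, 50, 100, 150, 200])
```

The blob keeps the same 9-pixel area across the middle thresholds, so that is where the growth rate is lowest and the region would be reported as maximally stable.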
10Runtime comparison
15Local Affine Frame (LAF) from Features
- Comparing transformed image regions can be simplified by constructing a viewpoint-invariant, feature-based coordinate system
- Coordinates are based on local features
- Coordinates stick to features
- Features must describe 6 degrees of freedom
- Simple points and ellipses are not sufficient
- MSER regions are sufficient
- Assumptions
- Local planarity
- Perspective camera
16Local Affine Frame (LAF) from Features
17Local Affine Frame (LAF) from Features
- 2D affine transformation has 6 degrees of freedom
- 6 independent constraints must be found
- Correspondence of 3 non-collinear points
- Constraints are derived from detected primitives
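As a worked illustration of why 3 non-collinear point correspondences give exactly the 6 constraints needed, the sketch below solves for the affine transformation by Cramer's rule. The function names and the row layout of the result are assumptions for this example only.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def affine_from_3_points(src, dst):
    """Return rows ((a, b, tx), (c, d, ty)) mapping each src point to its dst point."""
    m = [[x, y, 1.0] for x, y in src]
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("source points are collinear: fewer than 6 constraints")
    rows = []
    for k in range(2):  # k = 0 solves for x', k = 1 solves for y'
        rhs = [p[k] for p in dst]
        coeffs = []
        for col in range(3):  # Cramer's rule: swap one column for the right-hand side
            mc = [row[:] for row in m]
            for r in range(3):
                mc[r][col] = rhs[r]
            coeffs.append(det3(mc) / d)
        rows.append(tuple(coeffs))
    return tuple(rows)

def apply_affine(T, p):
    """Apply the affine transformation T to point p = (x, y)."""
    (a, b, tx), (c, d, ty) = T
    return (a * p[0] + b * p[1] + tx, c * p[0] + d * p[1] + ty)

# Unit frame mapped to a frame scaled by (2, 3) and shifted by (2, 3).
T = affine_from_3_points([(0, 0), (1, 0), (0, 1)], [(2, 3), (4, 3), (2, 6)])
```

Each point pair contributes two equations (one for x', one for y'), so three pairs determine the six unknowns; collinear points make the system singular, which the determinant check catches.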
18Local Affine Frame (LAF) from Features
- Region shape constructions
- Center of gravity
- 2 constraints: resolves translation
- 2x2 covariance matrix Σ
- 3 constraints: together with COG, fixes affine transformation up to unknown rotation
- Concavities
- 4 constraints: line and point tangent to line
- Don't require detection of whole region
- Curvature inflection points
- From concave to convex
- Straight line segments of boundary
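The first two shape constructions can be sketched as follows: the centre of gravity supplies 2 constraints and the covariance matrix 3 more. Normalizing by the inverse Cholesky factor of the covariance is one possible choice (an assumption here, not necessarily the authors' implementation); any rotation of the result is equally valid, which is the remaining unknown rotation.

```python
import math

def region_moments(pixels):
    """Centre of gravity and 2x2 covariance of a list of (x, y) region pixels."""
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    return (mx, my), ((sxx, sxy), (sxy, syy))

def whitening_matrix(cov):
    """Lower-triangular W with W * cov * W^T = I (inverse Cholesky factor)."""
    (sxx, sxy), (_, syy) = cov
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(syy - l21 * l21)
    # Inverse of the Cholesky factor [[l11, 0], [l21, l22]].
    return ((1 / l11, 0.0), (-l21 / (l11 * l22), 1 / l22))

# A 4x2 axis-aligned rectangle of pixels.
rect = [(x, y) for x in range(4) for y in range(2)]
cog, cov = region_moments(rect)
W = whitening_matrix(cov)
```

Subtracting `cog` and multiplying by `W` maps the region to one with identity covariance; a further primitive (for example a concavity or a gradient orientation) is then needed to fix the rotation.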
19Local Affine Frame (LAF) from Features
- Intensity constructions: pixels inside a region
- Orientations of gradients
- Rotation
- Direction of dominant texture periodicity
- Rotation
- Extrema of RGB or any scalar function
- 2 constraints
20Local Affine Frame (LAF) from Features
- Topology of regions: mutual configuration of regions
- Nested regions
- Neighboring regions
- Holes
- Incident regions
21LAF Construction
- Construction of primitives covering 6 degrees of
freedom
22Geometric Normalization
- Translate between canonical / image frame
- Origin (0,0)^T, basis vectors (1,0)^T, (0,1)^T
- Measurement Region (MR)
- Image region used to determine local correspondences
- (-2,3) x (-2,3)
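A minimal sketch of the geometric normalization, assuming a frame is stored as an image-space origin plus two basis vectors: the measurement region (-2,3) x (-2,3) in canonical coordinates is resampled into a fixed-size patch. Nearest-neighbour sampling, the 6x6 patch size, and the zero fill outside the image are simplifications chosen for this example.

```python
import math

def sample_measurement_region(img, origin, e1, e2, size=6, lo=-2.0, hi=3.0):
    """Resample the MR [lo, hi] x [lo, hi] (canonical coords) to a size x size patch."""
    h, w = len(img), len(img[0])
    patch = []
    for i in range(size):
        v = lo + (hi - lo) * i / (size - 1)
        row = []
        for j in range(size):
            u = lo + (hi - lo) * j / (size - 1)
            # Canonical (u, v) -> image (x, y): origin + u * e1 + v * e2.
            x = origin[0] + u * e1[0] + v * e2[0]
            y = origin[1] + u * e1[1] + v * e2[1]
            c, r = math.floor(x + 0.5), math.floor(y + 0.5)  # nearest pixel
            row.append(img[r][c] if 0 <= r < h and 0 <= c < w else 0)
        patch.append(row)
    return patch

# Test image whose value encodes its position: img[r][c] = 10 * r + c.
img = [[10 * r + c for c in range(10)] for r in range(10)]
upright = sample_measurement_region(img, (4, 4), (1, 0), (0, 1))
rotated = sample_measurement_region(img, (4, 4), (0, 1), (-1, 0))
```

Two views of the same surface patch, each sampled through its own frame, should produce approximately the same canonical patch, which is what makes direct comparison possible.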
23Photometric Normalization
- Translate between canonical / image frame
- Reflections and shadows are ignored
- Illumination, gain, aperture, etc. are modeled by affine transformations of color channels
- Transformation between two patches I and I' is a per-channel scale and offset
- Requires 6 additional normalization parameters
- Intensities are affinely transformed to have
- zero mean
- unit variance
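The photometric normalization can be sketched per channel as below. The function names are assumptions; the returned mean and standard deviation for each of the three channels are the 6 parameters the slide mentions.

```python
import math

def normalize_channel(values):
    """Affinely map intensities to zero mean and unit variance.

    Returns the normalized values and the (mean, std) needed to undo the mapping.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var) or 1.0  # flat patch: avoid division by zero
    return [(v - mean) / std for v in values], (mean, std)

def normalize_patch(rgb_pixels):
    """rgb_pixels: list of (r, g, b). Returns 3 normalized channels and 6 parameters."""
    out, params = [], []
    for channel in zip(*rgb_pixels):
        norm, (mean, std) = normalize_channel(channel)
        out.append(norm)
        params.extend((mean, std))
    return out, params

# The same patch seen under a different per-channel gain and offset.
first = [(10, 20, 30), (20, 40, 60), (30, 60, 90), (40, 80, 120)]
second = [(2 * r + 5, 3 * g + 1, 0.5 * b + 7) for r, g, b in first]
norm_a, params_a = normalize_patch(first)
norm_b, params_b = normalize_patch(second)
```

Because any positive gain and any offset are removed by the normalization, `norm_a` and `norm_b` agree up to rounding even though the raw intensities differ.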
24Normalization of Local Representation
- Translate between canonical / image frame
- 12 normalization parameters (6 geometric + 6 photometric) stored with the descriptor
- Coverage
25Descriptors
- Desirable properties
- Distinguish between large number of regions
- Maximize ratio of similarities between match and mismatch
- Robust or invariant to localization errors and transformations
- Efficient on memory and speed
- Discrete Cosine Transformation (as in JPEG compression)
- Algorithms require O(n lg n)
- Hardware implementations
- Robust to misalignment
- Same discrimination as SIFT
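A DCT-based descriptor can be sketched as below. This is a naive direct evaluation of the 2D DCT-II, not one of the fast O(n lg n) algorithms mentioned above. Keeping only coefficients with u + v < k and dropping the DC term (which is near zero after photometric normalization) are assumptions made for illustration; the slides do not say which coefficients are kept.

```python
import math

def dct_2d(patch):
    """Naive orthonormal 2D DCT-II of a square patch (direct summation)."""
    n = len(patch)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (patch[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def low_frequency_descriptor(patch, k=3):
    """Keep coefficients with 0 < u + v < k, i.e. low frequencies without DC."""
    coeffs = dct_2d(patch)
    return [coeffs[u][v] for u in range(k) for v in range(k) if 0 < u + v < k]

# A constant patch has all its energy in the DC coefficient.
flat = [[2.0] * 4 for _ in range(4)]
flat_coeffs = dct_2d(flat)
flat_desc = low_frequency_descriptor(flat)
```

Low-frequency coefficients change slowly under small shifts of the patch, which is one way to read the "robust to misalignment" property.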
26Matching detected frames with query frames
- Comparison
- Compute similarities between all detected and query frames
- Matching
- Select most likely matches
- Verification
- Consistency check that incorporates geometric
constraints
27Comparison
- Determine the probability that a transformation can take place
- Based on training experience
- If probability is below a threshold, similarity is −∞
- Otherwise, determined by descriptor similarity
28Matching
- Nearest Match
- Most common
- For each detected frame, find closest query frame
- Mutually Nearest Match
- For symmetric matching (e.g. stereo)
- For each detected, find closest query
- For each query, find closest detected
- Match if (closest query = closest detected) or (diff < threshold)
- All (or N most) similar
- Repetitive structures (many ambiguous correspondences)
- Keep all correspondences, resolution left to verification
- High number of false correspondences
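The first two matching strategies can be sketched as follows, assuming descriptors are plain tuples of numbers compared by squared Euclidean distance (the slides do not specify the distance measure). This is the exhaustive all-against-all comparison, with no threshold test.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(src, dst):
    """For each descriptor in src, the index of the closest descriptor in dst."""
    return [min(range(len(dst)), key=lambda j: sq_dist(s, dst[j])) for s in src]

def nearest_matches(detected, query):
    """Nearest match: every detected frame is paired with its closest query frame."""
    return list(enumerate(nearest(detected, query)))

def mutually_nearest_matches(detected, query):
    """Keep (i, j) only if j is closest to i and i is closest to j."""
    d2q = nearest(detected, query)
    q2d = nearest(query, detected)
    return [(i, j) for i, j in enumerate(d2q) if q2d[j] == i]

detected = [(0, 0), (1, 1), (5, 5)]
query = [(0.1, 0), (5, 5.2)]
one_way = nearest_matches(detected, query)
mutual = mutually_nearest_matches(detected, query)
```

In this example the second detected frame is paired with the first query frame by the one-way rule but is dropped by the mutual rule, because that query frame prefers the first detected frame.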
29Verification
- All matches should be consistent with same model
- 3D models would only be effective if visible parts of the image are very large (building interiors)
- Sufficient to model as planar surfaces
- If 2 tentative correspondences are part of the same plane
- Similar geometric transformation
- Similar photometric transformation
- Set of all correspondences is decomposed into subsets of consistent correspondences
- Each subset represents a single plane in the scene
- Small sets are rejected
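One simple way to decompose correspondences into consistent subsets is greedy grouping, sketched below. This is a stand-in for illustration and not the authors' verification procedure: each correspondence is assumed to carry its 6 affine parameters as a tuple, and `tol` and `min_size` are hypothetical settings.

```python
def group_consistent(transforms, tol=0.2, min_size=2):
    """Greedily group correspondences whose transformation parameters agree.

    transforms: list of equal-length parameter tuples, one per correspondence.
    Returns lists of indices; groups smaller than min_size are rejected.
    """
    groups = []
    for idx, t in enumerate(transforms):
        for g in groups:
            ref = transforms[g[0]]  # compare against the group's first member
            if max(abs(a - b) for a, b in zip(t, ref)) <= tol:
                g.append(idx)
                break
        else:
            groups.append([idx])  # no compatible group: start a new one
    return [g for g in groups if len(g) >= min_size]

# Three correspondences agree on roughly the same transformation; one is an outlier.
transforms = [
    (1.0, 0.0, 0.0, 1.0, 10.0, 5.0),
    (1.05, 0.0, 0.0, 0.98, 10.1, 5.1),
    (2.0, 0.0, 0.0, 2.0, -3.0, 4.0),
    (1.0, 0.02, 0.0, 1.0, 9.9, 5.0),
]
planes = group_consistent(transforms)
```

The outlier forms a group of size one and is rejected, mirroring the "small sets are rejected" rule. A fuller version would also compare the photometric parameters and scale the tolerance per parameter.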
30Experimental Validation COIL-100
- 100 objects
- 72 images of each object
- 5° pose intervals
- Controlled lighting
31Experimental Validation ZuBuD
- 201 buildings
- 5 pictures each
32Experimental Validation FOCUS
- Product logos
- Logos occupy small image portion
- 360 color images
33Conclusion
- Object recognition based on local measurements
- Affine invariance achieved by expressing local appearance in terms of affine covariant coordinates
- Promising results
- Problems
- Speed is the primary issue
- All query frames compared to all database frames
- Speed can be improved using hashing, possibly at the cost of accuracy
- Planar surface assumption
- Rigid objects
- Shadow, etc.