Title: Local feature detector and descriptors
1Local feature detector and descriptors
2Contents
- Local feature detector
- Location/shape of the feature
- Point detectors and region detectors
- Evaluation
- Difference between covariant and invariant
- Descriptors
- Local image pattern computed within the region
- Evaluation
3Main reference
- Evaluation of Interest Point Detectors, C. Schmid et al., IJCV 2000
- A Comparison of Affine Region Detectors, K. Mikolajczyk et al., IJCV 2005
- A Performance Evaluation of Local Descriptors, PAMI 2005
4Interest Point Detectors
- Contour based methods
- Junctions, ends, etc.
- Intensity based methods
- Auto-correlation matrix
- Parametric-model based method
- L-corner
5Harris Detector Intuition
Flat region: no change in all directions
Edge: no change along the edge direction
Corner: significant change in all directions
6Harris Detector Mathematics
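Change of intensity for the shift (u,v):
$$E(u,v) = \sum_{x,y} w(x,y)\,\bigl[I(x+u,\,y+v) - I(x,y)\bigr]^2$$
where w(x,y) is the window function and I is the image intensity.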
7Harris Detector Mathematics
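For small shifts (u,v) we have a bilinear approximation:
$$E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$$
where M is a 2×2 matrix computed from image derivatives:
$$M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$
Derivatives are computed using the mask [-2 -1 0 1 2].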
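The derivative mask above can be applied as a 1-D correlation along rows (for I_x) and along columns (for I_y). The following is a minimal sketch in plain Python, not the detector's reference implementation: the function names `derivative_x` and `derivative_y` are illustrative, border pixels are simply left at zero, and the response is left unnormalized (on a linear ramp the mask returns 10 times the slope).

```python
MASK = [-2, -1, 0, 1, 2]  # derivative mask from the slide


def derivative_x(img):
    """Correlate every row of img (list of lists) with MASK; borders stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(2, w - 2):
            out[y][x] = sum(m * img[y][x + k] for m, k in zip(MASK, range(-2, 3)))
    return out


def derivative_y(img):
    """Same mask along columns, done by transposing, filtering, transposing back."""
    transposed = [list(col) for col in zip(*img)]
    dy_t = derivative_x(transposed)
    return [list(row) for row in zip(*dy_t)]


# Horizontal ramp with slope 3: I_x is 10 * 3 in the interior, I_y is 0.
ramp = [[3 * x for x in range(7)] for _ in range(5)]
ix = derivative_x(ramp)
iy = derivative_y(ramp)
print(ix[2][3], iy[2][3])
```

Dividing the output by 10 would give the slope in intensity units per pixel; for corner detection the constant factor only rescales M.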
8Harris Detector Mathematics
Intensity change in shifting window: eigenvalue analysis
λ1, λ2: eigenvalues of M
Ellipse E(u,v) = const
Direction of the fastest change: semi-axis of length (λmax)^(-1/2)
Direction of the slowest change: semi-axis of length (λmin)^(-1/2)
9Harris Detector Mathematics
Classification of image points using eigenvalues of M (diagram axes: λ1 and λ2)
Corner: λ1 and λ2 are large, λ1 ~ λ2; E increases in all directions
Edge: λ1 >> λ2 or λ2 >> λ1
Flat region: λ1 and λ2 are small; E is almost constant in all directions
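The eigenvalue classification can be tried on small synthetic patches. This is a sketch under stated assumptions: M is accumulated with a uniform window and central differences (not the mask or Gaussian used in the evaluated detectors), and the thresholds `small` and `ratio` are arbitrary illustrative values, not values from the source.

```python
import math


def structure_tensor(img):
    """Entries (a, b, c) of M = [[a, b], [b, c]] summed over the patch interior."""
    h, w = len(img), len(img[0])
    a = b = c = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (img[y][x + 1] - img[y][x - 1]) / 2.0  # central difference in x
            iy = (img[y + 1][x] - img[y - 1][x]) / 2.0  # central difference in y
            a += ix * ix
            b += ix * iy
            c += iy * iy
    return a, b, c


def eigenvalues(a, b, c):
    """Closed-form eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    root = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean + root, mean - root  # (lambda_max, lambda_min)


def classify(img, small=1.0, ratio=10.0):
    """Label a patch as 'flat', 'edge' or 'corner' (thresholds are assumptions)."""
    lmax, lmin = eigenvalues(*structure_tensor(img))
    if lmax < small:
        return "flat"      # both eigenvalues small
    if lmin < small or lmax / lmin > ratio:
        return "edge"      # one eigenvalue dominates
    return "corner"        # both large and of similar size


flat = [[5] * 7 for _ in range(7)]
edge = [[0 if x < 3 else 10 for x in range(7)] for _ in range(7)]
corner = [[10 if (x >= 3 and y >= 3) else 0 for x in range(7)] for y in range(7)]
print(classify(flat), classify(edge), classify(corner))
```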
10Other methods
- ImpHarris
- Replacing the [-2 -1 0 1 2] mask with the derivative of a Gaussian
- Cottier94
- Apply the Harris detector only to contour points; contours are extracted using the Canny edge detector
- Horaud90
- Intersection between neighboring lines
- Heitger92
- Gabor-like response
11Evaluation
- Ground truth generation
- Homography (planar surface)
12Criterion
- Repeatability
- The number of points repeated between two images with respect to the total number of detected points
- Rotation, scaling, illumination variation, viewing angle change, camera noise
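With a ground-truth homography, repeatability can be estimated by projecting the points of the first image into the second and counting those that land near a detected point. The sketch below makes several assumptions that are not in the slides: the tolerance `eps`, greedy one-to-one matching, and normalization by the smaller of the two point counts.

```python
import math


def apply_homography(H, pt):
    """Map a 2-D point through a 3x3 homography given as nested lists."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def repeatability(pts1, pts2, H, eps=1.5):
    """Fraction of points repeated within eps pixels (greedy one-to-one matching)."""
    if not pts1 or not pts2:
        return 0.0
    remaining = list(pts2)
    repeated = 0
    for p in pts1:
        px, py = apply_homography(H, p)
        for q in remaining:
            if math.hypot(px - q[0], py - q[1]) <= eps:
                repeated += 1
                remaining.remove(q)  # each point in image 2 is used at most once
                break
    return repeated / min(len(pts1), len(pts2))


# Hypothetical example: image 2 is image 1 shifted by 10 pixels in x.
H = [[1, 0, 10], [0, 1, 0], [0, 0, 1]]
pts1 = [(0, 0), (5, 5), (20, 20)]
pts2 = [(10, 0), (15.5, 5), (50, 50)]
rate = repeatability(pts1, pts2, H)
print(rate)
```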
15Illumination change
Viewing angle change
16Conclusion for point detectors
- Rotation: ImpHarris
- Scaling: ImpHarris and Cottier
- Illumination: ImpHarris and Heitger
- Viewing angle: ImpHarris
17Affine covariant region detectors
- To accommodate viewing angle change
- A fixed shape cannot cope with the geometric deformations caused by the viewpoint change
18Harris-Affine and Hessian-Affine
- Interest point
- Harris detector or Hessian matrix
The eigenvalues of the second moment matrix (Harris) represent the two principal signal changes in a neighbourhood of the point.
Local maxima of the determinant of the Hessian matrix indicate the presence of a blob structure.
19Scale selection
- Select the characteristic scale of a local structure, for which a given function attains an extremum over scales
- Laplacian operator
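Scale selection can be illustrated on an ideal Gaussian blob, where the scale-normalized Laplacian response at the blob center has a closed form. The sketch assumes a unit-mass Gaussian blob of standard deviation `blob_sigma`; smoothing it with a Gaussian of scale sigma gives the response sigma^2 / (pi * (blob_sigma^2 + sigma^2)^2) in magnitude, which peaks at sigma = blob_sigma. The function names are illustrative.

```python
import math


def normalized_laplacian_response(sigma, blob_sigma):
    """|sigma^2 * Laplacian| at the center of a smoothed unit-mass Gaussian blob."""
    t, t0 = sigma ** 2, blob_sigma ** 2
    return t / (math.pi * (t0 + t) ** 2)


def characteristic_scale(blob_sigma, scales):
    """Scale at which the normalized Laplacian response attains its maximum."""
    return max(scales, key=lambda s: normalized_laplacian_response(s, blob_sigma))


scales = [0.5 * k for k in range(1, 41)]  # candidate scales 0.5, 1.0, ..., 20.0
print(characteristic_scale(4.0, scales))
```

On real images the response is computed by filtering at each scale instead of using a closed form, but the selection step (take the extremum over scales) is the same.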
20Shape estimation
21Edge-based region detector
- Rationale: edges are rather stable features
- Starts from Harris corners and Canny edges
22Intensity extrema based regions
- Starts from points of local intensity extrema
23Maximally Stable Extremal region detector
- The word extremal refers to the property that all pixels inside the MSER have either higher (bright extremal regions) or lower (dark extremal regions) intensity than all the pixels on its outer boundary
- Simple thresholding
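The thresholding idea can be sketched for a single dark region: grow the connected component around a seed at each threshold and watch how its area changes. This is a simplified illustration, not the full MSER algorithm: the seed, the 4-connectivity, the step `delta` and the stability measure (relative area change between thresh - delta and thresh + delta) are assumptions made here.

```python
def region_area(img, seed, thresh):
    """Area of the 4-connected region of pixels <= thresh that contains seed."""
    h, w = len(img), len(img[0])
    sy, sx = seed
    if img[sy][sx] > thresh:
        return 0
    seen = {(sy, sx)}
    stack = [(sy, sx)]
    while stack:
        y, x = stack.pop()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in seen
                    and img[ny][nx] <= thresh):
                seen.add((ny, nx))
                stack.append((ny, nx))
    return len(seen)


def stability(img, seed, thresh, delta=5):
    """Relative area change around thresh; small values mean a stable region."""
    area = region_area(img, seed, thresh)
    if area == 0:
        return float("inf")
    grown = region_area(img, seed, thresh + delta)
    shrunk = region_area(img, seed, thresh - delta)
    return (grown - shrunk) / area


# Dark 3x3 blob (intensity 10) on a bright background (intensity 200).
img = [[10 if 2 <= x <= 4 and 2 <= y <= 4 else 200 for x in range(7)]
       for y in range(7)]
print(region_area(img, (3, 3), 100), stability(img, (3, 3), 100))
```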
24Salient region detector
- Based on information content (intensity distribution)
- Saliency
25Run time evaluation
26Viewing angle and scale
27Blur and illumination
28Conclusions
- In many cases the highest score is obtained by the MSER detector, followed by Hessian-Affine
- MSER performs well on images containing homogeneous regions with distinctive boundaries
- EBR is suitable for scenes containing intersections of edges
29Point Descriptors
- We know how to detect points
- Next question
- How to match them?
- Point descriptor should be
- Invariant
- Distinctive
30Descriptors
- Distribution based descriptors
- Histogram, spin-image, SIFT, Shape context, etc
- Spatial-frequency techniques
- Gabor filters, wavelet, etc
- Differential descriptors
- Steerable filters, complex filters, etc.
31Normalization
- spatial normalization
- Size
- orientation
- Illumination normalization
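One common way to normalize illumination is to shift and scale the patch intensities to zero mean and unit standard deviation, which cancels an affine change a*I + b with a > 0. The slides do not say which normalization was used, so this is an assumed, illustrative choice; `normalize_illumination` is a hypothetical helper name.

```python
import math


def normalize_illumination(patch):
    """Return the patch with zero mean and unit standard deviation."""
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [[0.0] * len(row) for row in patch]  # constant patch
    return [[(v - mean) / std for v in row] for row in patch]


patch = [[1.0, 2.0], [3.0, 4.0]]
brighter = [[2.0 * v + 10.0 for v in row] for row in patch]  # affine change
print(normalize_illumination(patch))
print(normalize_illumination(brighter))  # same values up to rounding
```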
32Normalization
33SIFT
- SIFT, PCA-SIFT
- 3D histogram of gradient location and orientation, 128 dimensions
- Gradient location-orientation histogram (GLOH)
34SIFT vector formation
- Thresholded image gradients are sampled over a 16x16 array of locations in scale space
- Create an array of orientation histograms
- 8 orientations x 4x4 histogram array = 128 dimensions
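The 4x4x8 layout can be sketched as follows. This is a simplified, SIFT-like illustration and not the full descriptor: it takes precomputed 16x16 gradient arrays `gx` and `gy`, uses hard binning with no Gaussian weighting or interpolation, and the clipping value 0.2 applied after normalization is an assumption.

```python
import math


def sift_like_descriptor(gx, gy, clip=0.2):
    """128-D vector: 4x4 spatial cells, 8 orientation bins per cell."""
    hist = [0.0] * 128
    for y in range(16):
        for x in range(16):
            mag = math.hypot(gx[y][x], gy[y][x])
            angle = math.atan2(gy[y][x], gx[y][x]) % (2 * math.pi)
            o = int(angle / (2 * math.pi) * 8) % 8   # orientation bin 0..7
            cell = (y // 4) * 4 + (x // 4)           # spatial cell 0..15
            hist[cell * 8 + o] += mag

    def normalize(v):
        n = math.sqrt(sum(a * a for a in v))
        return [a / n for a in v] if n > 0 else v

    hist = normalize(hist)
    hist = [min(a, clip) for a in hist]  # threshold large gradient entries
    return normalize(hist)


# Patch with a constant gradient pointing in +x: one bin per cell is filled.
gx = [[1.0] * 16 for _ in range(16)]
gy = [[0.0] * 16 for _ in range(16)]
desc = sift_like_descriptor(gx, gy)
print(len(desc))
```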
35Shape Context
Key idea: represent an image in terms of descriptors at certain locations that describe the edges relative to those locations.
The shape context of a point is the histogram of the relative positions of all other points in the image.
Use bins that are uniform in log-polar space to emphasize close-by, local structure.
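A log-polar histogram of relative positions can be sketched as below. The bin counts (5 radial, 12 angular), the radial range and the normalization of distances by their mean from the reference point are assumptions chosen for illustration; the slides do not give these values.

```python
import math


def shape_context(points, index, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of the positions of all other points relative to one."""
    px, py = points[index]
    others = [(x - px, y - py) for i, (x, y) in enumerate(points) if i != index]
    dists = [math.hypot(dx, dy) for dx, dy in others]
    mean_d = sum(dists) / len(dists) if dists else 0.0
    hist = [[0] * n_theta for _ in range(n_r)]
    if mean_d == 0:
        return hist
    log_lo, log_hi = math.log(r_min), math.log(r_max)
    for (dx, dy), d in zip(others, dists):
        r = d / mean_d                       # scale-normalized distance
        if r <= 0 or r >= r_max:
            continue                         # coincident or too far away
        # Radial bins are uniform in log(r); distances below r_min go to bin 0.
        rb = int((math.log(max(r, r_min)) - log_lo) / (log_hi - log_lo) * n_r)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        tb = int(theta / (2 * math.pi) * n_theta) % n_theta
        hist[rb][tb] += 1
    return hist


points = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
sc = shape_context(points, 0)
print(sum(sum(row) for row in sc))
```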
36Image Domain Spin Image
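- A spin image (SP) is a 2-D (soft) histogram of image brightness values in the neighborhood of a particular reference (center) point.
The contribution of a pixel located at x to the bin indexed by (d, i) is given by
$$\exp\left(-\frac{(\lVert x - x_0 \rVert - d)^2}{2\alpha^2} - \frac{(I(x) - i)^2}{2\beta^2}\right)$$
where x_0 is the center point and α, β set the soft width of the distance and intensity bins.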
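A soft histogram of this kind can be sketched by letting every pixel vote for every (distance, intensity) bin with a Gaussian weight on its distance from the center and on its intensity. The weight form, the bin counts, `radius`, `alpha` and `beta` are assumptions for illustration, and intensities are assumed to lie in [0, 1].

```python
import math


def spin_image(img, center, n_d=4, n_i=4, radius=4.0, alpha=0.5, beta=0.15):
    """Soft 2-D histogram over (distance from center, intensity)."""
    cy, cx = center
    d_centers = [(k + 0.5) * radius / n_d for k in range(n_d)]  # distance bins
    i_centers = [(k + 0.5) / n_i for k in range(n_i)]           # intensity bins
    hist = [[0.0] * n_i for _ in range(n_d)]
    for y, row in enumerate(img):
        for x, val in enumerate(row):
            dist = math.hypot(x - cx, y - cy)
            if dist > radius:
                continue  # pixel outside the support region
            for a, d in enumerate(d_centers):
                for b, i in enumerate(i_centers):
                    hist[a][b] += math.exp(-(dist - d) ** 2 / (2 * alpha ** 2)
                                           - (val - i) ** 2 / (2 * beta ** 2))
    total = sum(sum(r) for r in hist)
    return [[v / total for v in r] for r in hist] if total > 0 else hist


# Constant patch of intensity 0.875: the mass concentrates in the last intensity bin.
img = [[0.875] * 9 for _ in range(9)]
sp = spin_image(img, (4, 4))
print([max(range(4), key=lambda b: row[b]) for row in sp])
```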
37Other descriptors
- Steerable filters
- Complex filters
- Moment invariants
38Distance measure
- Mahalanobis distance
- Steerable filters, differential invariants, moment invariants and complex filters
- Euclidean distance
- Histogram-based descriptors, SIFT, GLOH, PCA-SIFT, shape context and spin images
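The two distances differ only in the weighting of the difference vector. A minimal sketch, assuming the inverse covariance matrix `inv_cov` has already been estimated from training descriptors (how it is estimated is not covered here):

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mahalanobis(a, b, inv_cov):
    """Mahalanobis distance sqrt((a-b)^T S^-1 (a-b)) with S^-1 given as nested lists."""
    diff = [x - y for x, y in zip(a, b)]
    n = len(diff)
    return math.sqrt(sum(diff[i] * inv_cov[i][j] * diff[j]
                         for i in range(n) for j in range(n)))


identity = [[1.0, 0.0], [0.0, 1.0]]
scaled = [[0.25, 0.0], [0.0, 1.0]]  # first component has variance 4
print(euclidean([0, 0], [3, 4]))
print(mahalanobis([2, 0], [0, 0], scaled))
```

With the identity as inverse covariance the Mahalanobis distance reduces to the Euclidean one; a larger variance in a component makes differences in that component count less.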
39Dataset
- Rotation
- Camera rotation around the optical axis (30-45 deg)
- Scaling
- Camera zoom and focus (scale factor 2-2.5)
- Viewpoint
- Camera viewpoint change (50-60 deg)
- Lighting
- Varying the camera aperture
- JPEG compression
40Performance
- Ground truth: homography
- Evaluation criterion
- Precision-recall curve
Recall is the number of correctly matched regions with respect to the number of corresponding regions between two images of the same scene.
The number of false matches relative to the total number of matches is represented by 1-precision.
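The two quantities can be computed by sweeping the matching threshold over the sorted match distances. In this sketch each match is a hypothetical `(distance, is_correct)` pair, where correctness would come from the ground-truth homography; the example numbers are made up for illustration.

```python
def pr_curve(matches, n_correspondences):
    """List of (1 - precision, recall) points, one per accepted match."""
    curve = []
    correct = false = 0
    for _dist, is_correct in sorted(matches):  # accept matches by increasing distance
        if is_correct:
            correct += 1
        else:
            false += 1
        one_minus_precision = false / (correct + false)
        recall = correct / n_correspondences
        curve.append((one_minus_precision, recall))
    return curve


# Hypothetical matches; 4 ground-truth correspondences exist between the images.
matches = [(0.1, True), (0.2, True), (0.3, False), (0.4, True)]
curve = pr_curve(matches, n_correspondences=4)
print(curve[-1])
```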
41Dataset
42Matching criterion
- Thresholding, nearest neighbor, distance ratio
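The three matching criteria can be sketched side by side. This is an illustrative implementation using Euclidean distance on small made-up descriptors; the function names and the threshold values are assumptions.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_threshold(desc_a, desc_b, t):
    """Thresholding: every pair closer than t is a match (one-to-many allowed)."""
    return [(i, j) for i, a in enumerate(desc_a)
            for j, b in enumerate(desc_b) if dist(a, b) < t]


def match_nearest(desc_a, desc_b, t):
    """Nearest neighbor: only the closest descriptor, if it is closer than t."""
    matches = []
    for i, a in enumerate(desc_a):
        j = min(range(len(desc_b)), key=lambda k: dist(a, desc_b[k]))
        if dist(a, desc_b[j]) < t:
            matches.append((i, j))
    return matches


def match_ratio(desc_a, desc_b, t):
    """Distance ratio: closest / second-closest distance must be below t."""
    matches = []
    for i, a in enumerate(desc_a):
        order = sorted(range(len(desc_b)), key=lambda k: dist(a, desc_b[k]))
        if len(order) < 2:
            continue
        d1, d2 = dist(a, desc_b[order[0]]), dist(a, desc_b[order[1]])
        if d2 > 0 and d1 / d2 < t:
            matches.append((i, order[0]))
    return matches


A = [(0, 0), (10, 10)]
B = [(0, 1), (0, 2), (10, 10.5)]
print(match_threshold(A, B, 2.5))
print(match_nearest(A, B, 2.5))
print(match_ratio(A, B, 0.4))
```

In this example the first descriptor of `A` has two similar candidates in `B`, so thresholding returns both, nearest neighbor keeps one, and the ratio criterion rejects it as ambiguous.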
43Dimensionality
44Different scene type (viewpoint)
Structured scene
Textured scene
45Scaling and Rotation
Hessian Affine regions
Harris-Laplacian regions
Scale factor 2-2.5, rotation 30-45 deg
46Image blur
Textured scene Harris Affine
Structured scene Hessian Affine
47Conclusion
- In most of the tests GLOH obtains the best results, closely followed by the SIFT descriptor
- Shape context also shows high performance, but it is not reliable for textured scenes
- The best low-dimensional descriptors are gradient moments and steerable filters
- Hessian regions are slightly better than Harris regions