Title: Object Recognition from Local Scale-Invariant Features
1 Object Recognition from Local Scale-Invariant
Features
- A paper by David G. Lowe
University of British Columbia, Vancouver, B.C., Canada
2 Overview
- SIFT - Scale Invariant Feature Transform
- Object Recognition using SIFT KEYS
3 Introduction
- Definition of object recognition
- The visual perception of familiar objects
- Given an image containing unknown objects, the
problem of object recognition is to find a match
between these objects and a set of known objects
that are available in an appropriate
representation. The problem includes the question
of the objects' poses in the image.
- Representations of objects can exist as
- 3D models - model-based recognition
- images - appearance-based recognition
4 Introduction
- Applications of object recognition for HMI
- Content based image retrieval (web search)
- Interactions with robots
- Vision substitution for blind people
- Personal assistance systems
5 Introduction
- Template matching - an early approach
- Given a template matrix T of the object we are
looking for, we can use the following approach to
detect the presence of the object in a search
image I:
- We move T pixel by pixel over I.
- We create a new image matrix R, in which every
pixel (u,v) is the result of the
cross-correlation between the matrix T and the
matrix I, where T is centered at pixel (u,v) of I.
- Maxima in R indicate the presence of the searched
object.
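The sliding-window procedure above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the paper; the function name and the use of *normalized* cross-correlation (subtracting means and dividing by the norms) are my choices:

```python
import numpy as np

def match_template(image, template):
    """Slide the template over the image and return a response map R,
    where R[u, v] is the normalized cross-correlation between the
    template and the image patch whose top-left corner is (u, v)."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    out = np.zeros((ih - th + 1, iw - tw + 1))
    for u in range(out.shape[0]):
        for v in range(out.shape[1]):
            patch = image[u:u + th, v:v + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p * p).sum() * (t * t).sum())
            out[u, v] = (p * t).sum() / denom if denom > 0 else 0.0
    return out

# A bright square hidden in a noisy background; the template is an
# exact copy of it, so the response map peaks at its location.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
img[10:14, 20:24] += 2.0
tmpl = img[10:14, 20:24].copy()
r = match_template(img, tmpl)
u, v = np.unravel_index(r.argmax(), r.shape)
print(int(u), int(v))  # -> 10 20
```

The double loop also makes the "computationally expensive" objection on the next slides concrete: cost grows with image size times template size, per known object.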
6 Introduction
- Template matching - an early approach
7 Introduction
- Template matching - problems
- Templates are images of the whole object → no
possibility to deal with occlusion / background
clutter
- Only invariant to translation
- Computationally expensive
- You must know the objects you are searching for;
otherwise you have to do template matching for
every known object in your database →
computation gets really expensive!
8 Overview
- SIFT - Scale Invariant Feature Transform
- Object Recognition using SIFT KEYS
9 SIFT KEYS
- What is a SIFT KEY?
- A SIFT KEY is an image feature vector that is
fully invariant to image translation, rotation
and scaling. In addition it is partially
invariant to changes in illumination and camera
viewpoint.
- Image regions from which SIFT KEYS are created
10 SIFT KEYS
- Properties of SIFT KEYS
- Locality: features are local, so robust to
occlusion and clutter.
- Distinctiveness: individual features can be
matched to a large database of features.
- Quantity: many features can be generated for even
small objects.
- Efficiency: generation is close to real-time
performance.
- Extensibility: can easily be extended to a wide
range of differing feature types, with each
adding robustness.
- SIFT KEYS provide a good basis for object
recognition
11 Overview
- SIFT - Scale Invariant Feature Transform
- Object Recognition using SIFT KEYS
12 Scale-Space
- Motivation
- Objects are perceived by humans as meaningful
entities at a certain range of scales.
- A discussion of bees and ponds doesn't make
sense at other scales.
- Images taken from "A question of scale - Quarks
to Quasars"; the whole image series is located at
http://www.wordwizz.com/pwrsof10.htm
13 Scale-Space
- Motivation
- The fact that objects have a different appearance
depending on the observation scale has important
implications if one tries to describe them.
- In order to describe objects, information must be
gathered about them. Humans and computers do
this by analysing signals resulting from real
world measurements.
- Analysing signals is done by applying certain
operators to them. The relationship between these
operators' size (resolution) and the size of the
actual structures in the data has a great influence
on the information that can be derived.
- If the size or type of the operator isn't
appropriately chosen, then it can be hard to
interpret the information derived from signal
analysis. Unfortunately there is no obvious way
to determine the scales at which the desired
structures are hidden in the signal.
14 Scale-Space
- Solution
- We represent the signal at all sensible scales!
Types of multi-scale representations are
- wavelets
- image pyramids
- scale-space representation
- The last one is a framework that has been
developed by the computer vision community to
represent image data and its multi-scale nature
at the earliest stages in the chain of visual
processing performed by vision systems.
- Scale-space theory states that the natural
operations to be performed in the visual
front-end are convolutions with Gaussian kernels
and their derivatives.
15 Scale-Space
- What is scale-space representation?
- The scale-space representation of a signal is an
embedding of the original signal into a
one-parameter family of derived signals,
constructed by convolution with a one-parameter
family of Gaussian kernels of increasing width
(the GKF).
- The scale-space representation is created by
convolving the original signal with the
members of the GKF.
- The scale parameter indexes the members of the
GKF and the resulting derived signals.
- Scale-space representation
16 Scale-Space
- Example of scale-space representation
- Increasing level of blur →
17 Overview
- SIFT - Scale Invariant Feature Transform
- Object Recognition using SIFT KEYS
18 SIFT - Scale Invariant Feature Transform
- 1) Detection of scale-space extrema
- The image is searched over all scales and
locations to identify potential interest points
that are invariant to scale and orientation.
- 2) Accurate keypoint localization
- At each candidate location, a detailed model is
fit to determine location and scale. Keypoints
are selected based on measures of their stability.
- 3) Orientation assignment
- One or more orientations are assigned to each
keypoint based on the local image gradient
directions.
- 4) Generation of the image descriptor
- Based on the selected scale, the local image
gradients are transformed into a representation
that allows for significant levels of shape
distortion and illumination change.
19 SIFT - Scale Invariant Feature Transform
- Detection of scale-space extrema
- In order to detect locations that are invariant
to scale changes of the image, an image pyramid is
computed containing the scale-space
representation L of the image I.
- Afterwards the DoG (difference of Gaussian)
pyramid is computed from the scale-space pyramid.
This is done for two reasons:
- DoG is an approximation of LoG (Laplacian of
Gaussian). Mikolajczyk showed that the maxima
and minima of the LoG function produce the most
stable image features.
- The DoG pyramid is very efficient to compute: it
can be computed by subtracting two nearby scales
separated by a constant factor.
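One octave of this construction can be sketched as follows. This is an illustrative NumPy version (my own helper names; the slide does not give code), using s + 3 Gaussian images per octave with adjacent scales separated by the constant factor k = 2^(1/s), as in the paper:

```python
import numpy as np

def blur(img, sigma):
    # Separable Gaussian blur; kernel truncated at ~3 sigma
    r = max(1, int(3 * sigma + 0.5))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    tmp = np.apply_along_axis(np.convolve, 0, img, k, mode='same')
    return np.apply_along_axis(np.convolve, 1, tmp, k, mode='same')

def dog_octave(img, s=3, sigma0=1.6):
    """One octave of the pyramids: s + 3 Gaussian images at scales
    sigma0 * k**i with k = 2**(1/s), and the s + 2 DoG images
    obtained by subtracting adjacent pairs."""
    k = 2.0 ** (1.0 / s)
    gauss = [blur(img, sigma0 * k**i) for i in range(s + 3)]
    dogs = [g2 - g1 for g1, g2 in zip(gauss, gauss[1:])]
    return gauss, dogs

img = np.zeros((48, 48))
img[20:28, 20:28] = 1.0            # a bright square
gauss, dogs = dog_octave(img)
print(len(gauss), len(dogs))       # -> 6 5
```

Each DoG image is just a subtraction of two already-computed blurred images, which is why this pyramid is so cheap compared with evaluating the LoG directly.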
20 SIFT - Scale Invariant Feature Transform
- Detection of scale-space extrema
21 SIFT - Scale Invariant Feature Transform
- Detection of scale-space extrema
- When building the image pyramid, the sampling
frequency in scale and in the image domain must
be properly determined.
- The parameter s regulates the number of
scales per octave.
- σ regulates the amount of initial smoothing
before creating the first octave and therefore
the resolution in the image domain.
22 SIFT - Scale Invariant Feature Transform
- Detection of scale-space extrema
- The detection of the extrema in the DoG pyramid
is achieved by comparing each sample point with
all its neighbours in the current scale and the
scales above and below. It is selected only if it
is smaller than all of them or greater than all
of them.
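The 26-neighbour test can be written compactly. A sketch (my own function name; `dog` is assumed to be a list of same-sized DoG images from one octave):

```python
import numpy as np

def is_extremum(dog, i, y, x):
    """True if DoG sample (i, y, x) is strictly larger or strictly
    smaller than all 26 neighbours: 8 in its own level and 9 in
    each of the levels above and below."""
    val = dog[i][y, x]
    cube = np.stack([lvl[y-1:y+2, x-1:x+2]
                     for lvl in (dog[i-1], dog[i], dog[i+1])])
    others = np.delete(cube.ravel(), 13)   # drop the centre sample itself
    return bool(val > others.max() or val < others.min())

# A single bump in the middle DoG level is detected as a maximum;
# a flat neighbour of it is not.
lo, mid, hi = np.zeros((5, 5)), np.zeros((5, 5)), np.zeros((5, 5))
mid[2, 2] = 1.0
print(is_extremum([lo, mid, hi], 1, 2, 2))  # -> True
print(is_extremum([lo, mid, hi], 1, 2, 1))  # -> False
```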
23 SIFT - Scale Invariant Feature Transform
- Accurate key point localization
- The exact location of a maximum is determined by
fitting a 3D quadratic function to the local
sample points that were detected in step 1. This
is done using a Taylor expansion of the DoG
scale-space function.
- The new extremum is obtained by taking the first
derivative of the quadratic function with respect
to x and setting it to zero.
- D(x, y, σ) and its derivatives are evaluated at the
sample point; x = (x, y, σ)^T is the offset from it.
- D(x) is the quadratic function.
24 SIFT - Scale Invariant Feature Transform
- Accurate key point localization
- If the deviation between the sampled keypoint and
the interpolated keypoint is larger than 0.5 in
any dimension, then the sample point is changed
and the computation is repeated at this point.
Otherwise the deviation is added to the location
of the sample point to get the interpolated
estimate of the extremum.
- After the extrema have been accurately localized,
the following 2 operations are performed at this
stage:
- Unstable extrema that have low contrast are
discarded.
- The DoG function finds edges. Keypoints that
belong to edges aren't well localized along the
edge. This makes them very unstable to small
amounts of noise, and therefore this type of
keypoint is discarded too.
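Setting the derivative of the quadratic model to zero gives a small linear system: for D(x) ≈ D + gᵀx + ½ xᵀHx, the offset of the extremum solves H x̂ = −g. A sketch with an illustrative 1-D check (function name and example are mine):

```python
import numpy as np

def refine_extremum(grad, hessian):
    """Offset x_hat of the extremum of the local quadratic model
    D(x) ~ D + g.T x + 0.5 x.T H x, obtained by setting its
    derivative to zero: solve H x_hat = -g."""
    return np.linalg.solve(np.asarray(hessian, float),
                           -np.asarray(grad, float))

# 1-D check: D(x) = -(x - 0.3)**2 sampled at x = 0 has gradient
# dD/dx = -2(x - 0.3) = 0.6 and second derivative -2, so the
# interpolated extremum lies at offset 0.3 from the sample.
offset = refine_extremum([0.6], [[-2.0]])
print(round(float(offset[0]), 3))  # -> 0.3
```

In SIFT the same solve is done in three dimensions (x, y, σ), with the gradient and Hessian estimated by finite differences of neighbouring DoG samples; an offset above 0.5 in any dimension triggers the re-sampling described above.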
25 SIFT - Scale Invariant Feature Transform
- Accurate key point localization
26 SIFT - Scale Invariant Feature Transform
- Orientation assignment
- Select the image of the Gaussian pyramid L that's
closest to the scale of the selected keypoint.
- For each image sample in that scale, precompute
the gradient magnitude and orientation.
- Build an orientation histogram with 36 bins from
the region around the keypoint in the following
manner: every gradient that is added to the
corresponding bin is weighted by its magnitude
and by a circular Gaussian window with σ = 1.5
times the σ of the corresponding scale.
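The histogram construction can be sketched as follows, assuming precomputed magnitude and orientation patches centred on the keypoint (the function name and the toy input are mine, not from the paper):

```python
import numpy as np

def orientation_histogram(mag, ori, sigma):
    """36-bin (10 degrees each) orientation histogram over a patch
    centred on the keypoint; each gradient votes with its magnitude
    times a circular Gaussian weight of the given sigma."""
    h, w = mag.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    weight = np.exp(-((yy - cy)**2 + (xx - cx)**2) / (2 * sigma**2))
    bins = (np.degrees(ori) % 360 // 10).astype(int) % 36
    hist = np.zeros(36)
    np.add.at(hist, bins.ravel(), (mag * weight).ravel())
    return hist

mag = np.ones((16, 16))
ori = np.full((16, 16), np.radians(83.0))  # every gradient points at 83 deg
hist = orientation_histogram(mag, ori, 1.5 * 1.6)
print(int(hist.argmax()) * 10)             # -> 80 (the 80-90 deg bin)
```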
27 SIFT - Scale Invariant Feature Transform
- Orientation assignment
- The keypoint's orientation is chosen to be the
peak of the histogram. If there are other local
peaks within 80% of the highest peak → create
additional keypoints.
- As in accurate keypoint localization, a parabola
is fit to the histogram peaks to get a more
precise estimate of the dominant gradient
direction.
28 SIFT - Scale Invariant Feature Transform
- At this stage every keypoint has been assigned an
image location, scale and orientation. These
parameters impose a repeatable local 2D
coordinate system in which the local image
regions are described. The generated descriptors
are invariant to these transformations.
29 SIFT - Scale Invariant Feature Transform
- Generation of the image descriptor
- We want to compute a descriptor that is invariant
to the remaining variations (illumination,
viewpoint)!
- In the primary visual cortex one can find complex
neurons that respond to a gradient at a
particular direction and spatial frequency, but
the location of the gradient on the retina is
allowed to shift over a small receptive field.
- Edelman et al.'s hypothesis: the function of these
neurons allows matching and recognizing 3D objects
from a range of viewpoints. Experiments showed
that matching gradients while allowing for shifts
in their position indeed improves classification
under 3D rotation.
- This is the key idea that is used in descriptor
generation.
30 SIFT - Scale Invariant Feature Transform
- Generation of the image descriptor
- Image gradient magnitudes and orientations at all
levels of the pyramid have been precomputed
(orientation step).
- The gradients are sampled in a small window
around every keypoint with respect to the scale
the keypoint belongs to. → scale invariance
- The gradients are rotated relative to the
keypoint's orientation. → rotation invariance
- Gradient magnitudes are weighted with a Gaussian
weighting function located at the center of the
window, with σ = window size / 2.
→ Avoids sudden changes in the descriptor when the
window position is shifted (samples at the borders
now have a smaller influence).
31 SIFT - Scale Invariant Feature Transform
- Generation of the image descriptor
- Now angle-discretised gradient orientation
histograms are built. The value of an entry in
a histogram is calculated as the sum of all
gradient magnitudes from the corresponding
subwindow whose orientations match the direction
of the entry.
32 SIFT - Scale Invariant Feature Transform
- Generation of the image descriptor - affine
invariance
- The histograms are invariant to positional shifts
of the gradients, as long as the gradients don't
cross the bounds of the window subregions.
- To minimize the effects of crossing between
subregions and discretised angles, the assignment
of a particular gradient magnitude is done by
trilinear interpolation, so affine invariance is
improved.
- The image descriptor is a vector that contains
the values of the gradient orientation histogram
entries.
- In the paper they sample a 16x16 region that is
divided into 4x4 subregions. The gradient
orientation is discretised to angles of 45°, so
each histogram has 8 entries.
→ the resulting image descriptor is a 128-element
vector.
33 SIFT - Scale Invariant Feature Transform
- Generation of the image descriptor - illumination
invariance
- What remains is the question of illumination
invariance:
- A change in image contrast means multiplication
of the gradients by a constant → this is canceled
by vector normalization.
- A brightness change means addition of a constant
value to each pixel → gradient computation is not
affected.
- Non-linear illumination change can strongly
influence the magnitude of certain gradients, but
has almost no influence on their orientations.
Therefore D. Lowe puts a threshold of 0.2 on the
entries of the unit feature vector and then
renormalizes the vector afterwards. The threshold
of 0.2 was experimentally evaluated.
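The normalize-clip-renormalize step is small enough to show directly. A sketch (the function name is mine; the 0.2 threshold is the value from the paper):

```python
import numpy as np

def normalize_descriptor(vec, clip=0.2):
    """Unit-normalize (cancels contrast changes), clip every entry
    at 0.2 (damps gradients inflated by non-linear illumination
    change), then renormalize to unit length."""
    v = np.asarray(vec, float)
    v = v / np.linalg.norm(v)
    v = np.minimum(v, clip)
    return v / np.linalg.norm(v)

d = np.zeros(128)
d[0] = 10.0                    # one gradient dominates the vector
d[1:] = 0.1
out = normalize_descriptor(d)
print(round(float(np.linalg.norm(out)), 6))             # -> 1.0
print(bool(out[0] < d[0] / np.linalg.norm(d)))          # -> True
```

After clipping, the dominant entry's share of the vector shrinks, so no single (possibly illumination-inflated) gradient can dominate the distance between two descriptors.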
34 SIFT - Scale Invariant Feature Transform
- Image descriptor - sensitivity to affine change
35 SIFT - Scale Invariant Feature Transform
- Image descriptor - distinctiveness
36 Overview
- SIFT - Scale invariant feature transform
- Object Recognition using SIFT KEYS
37 Object Recognition using SIFT KEYS
- 1) Keypoint matching
- The keypoints of an image are matched to the
database of keypoints retrieved from training
images using a nearest neighbour algorithm.
- 2) Clustering of matched keys
- Clusters of at least 3 matched features are
identified that agree on an object and its pose.
These are interpreted as the occurrence of an
object.
- 3) Fitting a geometric model
- Each cluster is checked by performing a detailed
geometric fit to the model. The quality of the
fit is used to accept or reject the
interpretation.
- OBJECT RECOGNIZED IN IMAGE
38 Object Recognition using SIFT KEYS
- Keypoint matching - quality of matching
- A keypoint match is defined as the nearest
neighbour (NN) found in the database. The nearest
neighbour is the keypoint whose descriptor has
the minimum Euclidean distance.
- However, an image may contain features that won't
have any correct match in the training database:
- the feature may result from background clutter
- the feature wasn't detected in the training phase
- The second-closest neighbour (SCN) is defined as
the closest feature in the database that is known
to belong to a different object.
- We must find a way to discard these features.
39 Object Recognition using SIFT KEYS
- Keypoint matching - quality of matching
- The quality of a keypoint match is defined by the
ratio between the NN and SCN distances. This
measure performs well because correct matches
must have the NN significantly closer than the
SCN in order to achieve reliable matching.
- All matches with a ratio > 0.8 are discarded.
40 Object Recognition using SIFT KEYS
- Keypoint matching - NN search
- Finding the nearest neighbours in a database is
done by searching. If the database is large,
linear search is not applicable.
- A better approach for searching in
high-dimensional spaces are k-d trees. But k-d
trees too lose their advantage at dimensions > 10.
- Therefore an approximate algorithm from Beis &
Lowe is used, called BBF (Best-Bin-First), which
returns the NN with high probability. BBF is
similar to k-d tree NN search.
- BBF enforces an upper limit on how many bins are
inspected.
- Standard NN search parses the tree according to
the structure that is inherent to the tree after
it has been built. BBF parses the tree in an
order that inspects first the leaf nodes with the
least distance to the query point.
41 Object Recognition using SIFT KEYS
- K-d trees - the data structure for NN search
- The following recursive procedure creates a k-d
tree from a set of k-dimensional points
P = {p1, ..., pn} ⊂ IR^k that are bounded by a
hypercuboid H:
- find the dimension i where P exhibits the
greatest variance
- find the point pm ∈ P whose i-th entry mi is the
median in dimension i
- create a new tree element with (i, mi)
- divide P into P(i < mi) and P(i > mi)
- repeat the procedure with P(i < mi) and P(i > mi)
- This way H is divided recursively into smaller
hypercuboids. The hypercuboids represented by the
leaves of the k-d tree contain the points that
are included in their volume. Therefore they are
now called bins.
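The recursion above maps almost line for line onto code. A sketch using nested dictionaries as tree nodes (my own representation; a real implementation would store points only in the leaves' bins, as here):

```python
import numpy as np

def build_kdtree(points):
    """Recursive k-d tree: split on the dimension of greatest
    variance at the median point; leaves ('bins') keep the points."""
    pts = np.asarray(points, float)
    if len(pts) <= 1:
        return {'leaf': pts}
    i = int(pts.var(axis=0).argmax())   # dimension of greatest variance
    order = pts[:, i].argsort()
    m = len(pts) // 2
    split = float(pts[order[m], i])     # the median entry m_i
    return {'dim': i, 'split': split,
            'left': build_kdtree(pts[order[:m]]),
            'right': build_kdtree(pts[order[m:]])}

tree = build_kdtree([[2, 3], [5, 4], [9, 6], [4, 7], [8, 1], [7, 2]])
print(tree['dim'], tree['split'])  # -> 0 7.0
```

For this point set the x-coordinates vary more than the y-coordinates, so the root splits on dimension 0 at the median x-value 7.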
42 Object Recognition using SIFT KEYS
- K-d trees - Example of a 2-d tree
43 Object Recognition using SIFT KEYS
- K-d trees - Example of a 2-d tree
44 Object Recognition using SIFT KEYS
- K-d trees - Example of a 2-d tree
45 Object Recognition using SIFT KEYS
- K-d trees - Example of a 2-d tree
46 Object Recognition using SIFT KEYS
- K-d trees - Example of a 3-d tree
47 Object Recognition using SIFT KEYS
- Clustering with the Hough transform
- Test images may contain multiple objects that the
system has learned (they can be different ones or
the same object in different poses).
- The ratio between NN and SCN is a good criterion
for discarding false matches arising from
background clutter, but doesn't solve the problem
of matched keypoints that belong to other valid
objects.
- Therefore we need to identify clusters of
features with a consistent interpretation in
terms of an object and its pose.
- The probability of the interpretation represented
by such a cluster is higher, the more features
belong to the cluster.
- Clustering is done with the Hough transform.
48 Object Recognition using SIFT KEYS
- Clustering with the Hough transform
- Imagine that we trained the system with the
images of these strange creatures. The SIFT KEYS
were created and stored in the database. As
mentioned earlier, SIFT KEYS contain the local
coordinate system that underlay the creation of
the image descriptor. Furthermore, for every SIFT
KEY it is known to which object(s) it belongs.
49 Object Recognition using SIFT KEYS
- Clustering with the Hough transform
- Now we want to recognize the creatures in the
following test image.
- Suppose the NN matching has the following
results. We have one false match from the fish!
50 Object Recognition using SIFT KEYS
- Clustering with the Hough transform
- We can do the Hough transform using a hash table,
as we know the coordinate systems of the SIFT
KEYS in the DB as well as in the image.
- Every key votes for the interpretation of an
image region as a known object at a certain
location, scale and orientation (a
transformation).
51 Object Recognition using SIFT KEYS
- Clustering with the Hough transform
- All clusters that collected at least 3 votes
advance to the geometric fitting step.
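The hash-based voting can be sketched as follows. The bin widths (32 pixels, one octave of scale, 30° of orientation) are illustrative values of my choosing, and each match here already carries the pose it predicts; the slide's point is only the mechanism of binning and counting:

```python
import math
from collections import defaultdict

def cluster_votes(matches, loc_bin=32.0, ori_bin=30.0, min_votes=3):
    """Coarse Hough transform over pose space using a hash table.
    Each matched key votes for (object, binned location, binned
    log2-scale, binned orientation); bins with at least min_votes
    survive as pose clusters."""
    bins = defaultdict(list)
    n_ori = int(360 // ori_bin)
    for m in matches:
        key = (m['object'],
               int(m['x'] // loc_bin), int(m['y'] // loc_bin),
               int(round(math.log2(m['scale']))),
               int(m['orientation'] // ori_bin) % n_ori)
        bins[key].append(m)
    return {k: v for k, v in bins.items() if len(v) >= min_votes}

# Four consistent fish votes and one stray crab vote:
matches = ([{'object': 'fish', 'x': 40 + i, 'y': 60 + i,
             'scale': 1.0, 'orientation': 10.0} for i in range(4)]
           + [{'object': 'crab', 'x': 300, 'y': 12,
               'scale': 2.1, 'orientation': 200.0}])
clusters = cluster_votes(matches)
print(len(clusters))  # -> 1 (the lone crab vote is discarded)
```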
52 Object Recognition using SIFT KEYS
- Fitting a geometric model - least-squares
solution
- Each cluster of SIFT KEYS with at least 3 entries
is subject to a verification procedure. With a
least-squares solution we try to find the best
affine parameters that relate the model image in
the DB to the test image.
- The affine transformation of a model point (x,y)^T
to an image point (u,v)^T can be written as
- An affine transformation accounts correctly for
3D rotation of planar surfaces under orthographic
projection. For general 3D objects this is not
the case.
53 Object Recognition using SIFT KEYS
- Fitting a geometric model
- The equation can be reformulated to the form
Ax = b, a linear system whose least-squares
solution can be computed via the normal
equations: x = [A^T A]^(-1) A^T b.
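The stacking of the system and its solution can be sketched directly. The parameter layout (u = m1·x + m2·y + tx, v = m3·x + m4·y + ty) follows the affine model above; the function name and the synthetic check are mine, and `lstsq` is used in place of forming the normal equations explicitly:

```python
import numpy as np

def fit_affine(model_pts, image_pts):
    """Least-squares affine transform [[m1 m2],[m3 m4]] plus
    translation t mapping model points (x, y) to image points
    (u, v): stack two rows of A per correspondence, solve A p = b."""
    A, b = [], []
    for (x, y), (u, v) in zip(model_pts, image_pts):
        A.append([x, y, 0, 0, 1, 0])
        A.append([0, 0, x, y, 0, 1])
        b.extend([u, v])
    p, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float),
                            rcond=None)
    return p[:4].reshape(2, 2), p[4:]

# Recover a known transform: 30 deg rotation, scale 2, shift (5, -3)
th = np.radians(30)
M = 2 * np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
src = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], float)
dst = src @ M.T + np.array([5.0, -3.0])
m, t = fit_affine(src, dst)
print(np.allclose(m, M), np.allclose(t, [5.0, -3.0]))  # -> True True
```

Three correspondences give six equations for the six unknowns; with more than three the system is overdetermined, which is exactly why a least-squares solution is used.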
54 Object Recognition using SIFT KEYS
- Fitting a geometric model - iterative process
- Outliers can be removed by checking for agreement
between each image feature and the fitted model.
- If fewer than 3 features remain after discarding
outliers, the match is rejected (the
interpretation associated with the cluster is
considered to be false).
- As outliers are discarded, the least-squares
solution is re-solved with the remaining
features. This process (the three steps above) is
repeated in an iterative manner.
55 Overview
- SIFT - Scale invariant feature transform
- Object Recognition using SIFT KEYS
56 Object Recognition using SIFT KEYS
- Results
- The training images are shown on the left. The
keypoints used for recognition are shown as
squares, with an extra line indicating
orientation. The size of a square indicates the
image region that was used for the construction
of the descriptor.
57 Object Recognition using SIFT KEYS
- An example image where the background is strongly
cluttered. This one may be difficult to recognize
for humans too!
- The viewpoint was rotated by an angle of 30°
compared to the image from which the training
samples were taken.
58 Object Recognition using SIFT KEYS
- Results
- The original size of the image from the first
recognition example is 600x480 pixels; the size
of the second one is 640x315.
- In both cases the time required for the
recognition of all objects is less than 0.3 s on
a 2 GHz Pentium 4 processor.
- In general, textured planar surfaces can be
reliably detected over a rotation in depth of
about 50° in any direction and under almost any
illumination condition (sufficient light must be
provided, no glare).
- For general 3D objects, the range of rotation in
depth diminishes to 30° and illumination change
is more disruptive.
59 Object Recognition using SIFT KEYS
60 Object Recognition using SIFT KEYS
61 Object Recognition using SIFT KEYS
- Further research
- Systematic tests with databases that contain
images representing multiple views /
multiple illuminations
- Extension to color descriptors (Brown & Lowe,
2002)
- Incorporation of feature types other than
gradients, e.g. texture measurements
- Learning features that are suited to recognizing
whole object categories
62 Overview
- SIFT - Scale invariant feature transform
- Object Recognition using SIFT KEYS
63 Literature list
- Lowe, D.G. (2004). Distinctive Image Features
from Scale-Invariant Keypoints. International
Journal of Computer Vision, 60, 2 (2004), pp.
91-110. http://www.cs.ubc.ca/lowe/papers/ijcv04-abs.html
- Lowe, D.G. (1999). Object recognition from local
scale-invariant features. In International
Conference on Computer Vision, Corfu, Greece, pp.
1150-1157. http://www.cs.ubc.ca/spider/lowe/papers/iccv99-abs.html
- Lindeberg, T. (1994). Scale-space theory: A basic
tool for analysing structures at different
scales. Journal of Applied Statistics, 21(2):
224-270. http://www.nada.kth.se/tony/abstracts/Lin94-SI-abstract.html
64 Literature list
- Beis, J. and Lowe, D.G. (1997). Shape indexing
using approximate nearest-neighbour search in
high-dimensional spaces. In Conference on
Computer Vision and Pattern Recognition, Puerto
Rico, pp. 1000-1006. http://www.cs.ubc.ca/spider/lowe/papers/cvpr97-abs.html
- Sample, N., Haines, M., Arnold, M. and Purcell, T.
(2001). Optimizing Search Strategies in k-d
Trees. http://graphics.stanford.edu/tpurcell/pubs/search.pdf
65 The End - Thank you for your attention