Object Recognition from Local Scale-Invariant Features - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Object Recognition from Local Scale-Invariant
Features
  • a piece of scientific work by David G. Lowe, University of British Columbia, Vancouver, B.C., Canada

2
Overview
  • Introduction
  • SIFT KEYS
  • Scale-Space
  • SIFT - Scale Invariant Feature Transform
  • Object Recognition using SIFT KEYS
  • Results & Further research
  • Literature list & The End

3
Introduction
  • Definition of object recognition
  • The visual perception of familiar objects
  • Given an image containing unknown objects, the problem of object recognition is to find a match between these objects and a set of known objects that are available in an appropriate representation. The problem also includes determining the objects' poses in the image.
  • Representations of objects can exist as:
  • 3D models - model-based recognition
  • images - appearance-based recognition

4
Introduction
  • Applications of object recognition for HMI
  • Content-based image retrieval (web search)
  • Interaction with robots
  • Vision substitution for blind people
  • Personal assistance systems

5
Introduction
  • Template matching - an early approach
  • Given a template matrix T of the object we are looking for, we can use the following approach to detect the presence of the object in a search image I:
  • We move T pixel by pixel over I.
  • We create a new image matrix R, in which every pixel (u,v) is the result of the cross-correlation between the matrix T and the matrix I, where T is centered at pixel (u,v) of I.
  • Maxima in R indicate the presence of the searched object.
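The loop described above can be sketched in plain NumPy. The function name `match_template`, the toy image and the normalized form of the cross-correlation are my choices, not from the slides:

```python
import numpy as np

def match_template(I, T):
    """Slide template T pixel by pixel over image I and return the
    response map R of normalized cross-correlation scores."""
    th, tw = T.shape
    H, W = I.shape
    R = np.full((H - th + 1, W - tw + 1), -1.0)
    Tn = T - T.mean()
    for u in range(R.shape[0]):
        for v in range(R.shape[1]):
            patch = I[u:u + th, v:v + tw]
            Pn = patch - patch.mean()
            denom = np.sqrt((Pn ** 2).sum() * (Tn ** 2).sum())
            if denom > 0:  # skip flat (zero-variance) patches
                R[u, v] = (Pn * Tn).sum() / denom
    return R

# Toy example: paste the template into an empty image and find it again.
I = np.zeros((8, 8))
I[2:5, 3:6] = np.arange(9).reshape(3, 3)
T = np.arange(9, dtype=float).reshape(3, 3)
R = match_template(I, T)
peak = np.unravel_index(np.argmax(R), R.shape)  # top-left corner of the best match
```

A score of 1.0 in R marks an exact occurrence of T; practical systems replace this O(H·W·th·tw) loop with FFT-based correlation.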

6
Introduction
  • Template matching - an early approach
  • (Figure: template T slid over search image I)

7
Introduction
  • Template Matching - problems
  • Templates are images of the whole object -> no possibility to deal with occlusion / background clutter
  • Only invariant to translation
  • Computationally expensive
  • You must know the objects you are searching for; otherwise you have to do template matching for every known object in your database -> computation gets really expensive!

8
Overview
  • Introduction
  • SIFT KEYS
  • Scale-Space
  • SIFT - Scale Invariant Feature Transform
  • Object Recognition using SIFT KEYS
  • Results & Further research
  • Literature list & The End

9
SIFT KEYS
  • What is a SIFT KEY?
  • A SIFT KEY is an image feature vector that is fully invariant to image translation, rotation and scaling. In addition, it is partially invariant to change in illumination and camera viewpoint.
  • Image regions from which SIFT KEYS are created

10
SIFT KEYS
  • Properties of SIFT KEYS
  • Locality: features are local, so robust to occlusion and clutter.
  • Distinctiveness: individual features can be matched to a large database of features.
  • Quantity: many features can be generated for even small objects.
  • Efficiency: generation close to real-time performance.
  • Extensibility: can easily be extended to a wide range of differing feature types, with each adding robustness.
  • SIFT KEYS provide a good basis for object recognition.

11
Overview
  • Introduction
  • SIFT KEYS
  • Scale-Space
  • SIFT - Scale Invariant Feature Transform
  • Object Recognition using SIFT KEYS
  • Results & Further research
  • Literature list & The End

12
Scale - Space
  • Motivation
  • Objects are perceived by humans as meaningful entities at a certain range of scales.
  • A discussion of bees and ponds doesn't make sense at other scales, e.g. the atomic or cosmic scale.
  • (Figure: bee and pond scales vs. the atomic and cosmic scale; images taken from "A question of scale: Quarks to Quasars", the whole image series is located at http://www.wordwizz.com/pwrsof10.htm)

13
Scale - Space
  • Motivation
  • The fact that objects have a different appearance depending on the observation scale has important implications if one tries to describe them.
  • In order to describe objects, information must be gathered about them. Humans and computers do this by analysing signals resulting from real-world measurements.
  • Analysing signals is done by applying certain operators to them. The relationship between these operators' size (resolution) and the size of actual structures in the data has a great influence on the information that can be derived.
  • If the size and type of the operator aren't appropriately chosen, it can be hard to interpret the information derived from signal analysis. Unfortunately, there is no obvious way to determine the scales at which the desired structures are hidden in the signal.

14
Scale - Space
  • Solution
  • We represent the signal at all sensible scales! Types of multi-scale representations are:
  • wavelets
  • image pyramids
  • scale-space representation
  • The last one is a framework that has been developed by the computer vision community to represent image data and its multi-scale nature at the earliest stages in the chain of visual processing performed by vision systems.
  • Scale-space theory states that the natural operations to perform in the visual front-end are convolutions with Gaussian kernels and their derivatives.

15
Scale - Space
  • What is scale-space representation?
  • The scale-space representation of a signal is an embedding of the original signal into a one-parameter family of derived signals, constructed by convolution with a one-parameter family of Gaussian kernels of increasing width: L(x; σ) = g(x; σ) * f(x), where f is the signal and g the Gaussian kernel family.
  • The scale parameter σ indexes the members of the Gaussian kernel family and the resulting derived signals.
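The convolution family can be sketched with a hand-rolled separable Gaussian (all function names are mine; a real implementation would use an optimized filter):

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian, truncated at 3 sigma."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    """2-D Gaussian convolution, done separably (rows, then columns)."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    p = np.pad(img, r, mode='reflect')
    rows = np.apply_along_axis(lambda m: np.convolve(m, k, mode='valid'), 1, p)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode='valid'), 0, rows)

def scale_space(f, sigmas):
    """The one-parameter family L(x; sigma): the signal f convolved with
    Gaussian kernels of increasing width."""
    return [blur(np.asarray(f, dtype=float), s) for s in sigmas]
```

Each element of the returned list is one derived signal of the family, indexed by its σ.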

16
Scale - Space
  • Example of scale-space representation
  • (Figure: with increasing distance the level of blur increases and features become coarser)

17
Overview
  • Introduction
  • SIFT KEYS
  • Scale-Space
  • SIFT - Scale Invariant Feature Transform
  • Object Recognition using SIFT KEYS
  • Results & Further research
  • Literature list & The End

18
SIFT - Scale Invariant Feature Transform
  • Overview
  • 1) Detection of scale-space extrema: the image is searched over all scales and locations to identify potential interest points that are invariant to scale and orientation.
  • 2) Keypoint localization: at each candidate location, a detailed model is fit to determine location and scale. Keypoints are selected based on measures of their stability.
  • 3) Orientation assignment: one or more orientations are assigned to each keypoint based on the local image gradient directions.
  • 4) Keypoint descriptor: based on the selected scale, the local image gradients are transformed into a representation that allows for significant levels of shape distortion and illumination change.
  • -> SIFT KEYS

19
SIFT - Scale Invariant Feature Transform
  • Detection of scale-space extrema
  • In order to detect locations that are invariant to scale change of the image, an image pyramid is computed containing the scale-space representation L of the image I.
  • Afterwards the DoG (difference of Gaussian) pyramid is computed from the scale-space pyramid. This is done for two reasons:
  • DoG is an approximation of LoG (Laplacian of Gaussian). Mikolajczyk showed that the maxima and minima of the LoG function produce the most stable image features.
  • The DoG pyramid is very efficient to compute.
  • The DoG function D(x, y, σ) = L(x, y, kσ) - L(x, y, σ) can be computed by subtracting two nearby scales separated by a constant factor k.
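In code, the DoG pyramid is just pairwise subtraction of adjacent scale-space levels (a minimal sketch; the octave bookkeeping of the full pyramid is omitted):

```python
import numpy as np

def dog_pyramid(gaussian_levels):
    """D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma): subtract each
    scale-space level from the next one up."""
    return [b - a for a, b in zip(gaussian_levels, gaussian_levels[1:])]
```

N Gaussian levels yield N-1 DoG levels, in which the extrema are then searched.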

20
SIFT - Scale Invariant Feature Transform
  • Detection of scale-space extrema

21
SIFT - Scale Invariant Feature Transform
  • Detection of scale-space extrema
  • When building the image pyramid, the sampling frequency in scale and in the image domain must be properly determined.
  • The parameter s regulates the number of scales per octave.
  • The initial smoothing σ applied before creating the first octave regulates the resolution in the image domain.

22
SIFT - Scale Invariant Feature Transform
  • Detection of scale-space extrema
  • The detection of extrema in the DoG pyramid is achieved by comparing each sample point with all its neighbours in the current scale and in the scales above and below. It is selected if it is smaller or greater than all of these.
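The 26-neighbour comparison can be sketched as a brute-force loop over a (scale, y, x) DoG stack (real implementations vectorize this):

```python
import numpy as np

def local_extrema(dog):
    """dog: 3-D array (scale, y, x). A sample is selected when it is
    strictly greater (or smaller) than all 26 neighbours in its own
    scale and in the scales directly above and below."""
    S, H, W = dog.shape
    keys = []
    for s in range(1, S - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                v = dog[s, y, x]
                others = np.delete(cube.ravel(), 13)  # drop the centre sample
                if v > others.max() or v < others.min():
                    keys.append((s, y, x))
    return keys
```

Each returned triple is a candidate keypoint at a discrete scale and position.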

23
SIFT - Scale Invariant Feature Transform
  • Accurate keypoint localization
  • The exact location of each extremum is determined by fitting a 3D quadratic function to the local sample points detected in step 1. This is done using the Taylor expansion of the DoG scale-space function: D(x) = D + (∂D/∂x)^T x + 1/2 x^T (∂²D/∂x²) x
  • D and its derivatives are evaluated at the sample point; x = (x, y, σ)^T is the offset from it, and D(x) is the quadratic function.
  • The new extremum is obtained by taking the derivative of the quadratic function with respect to x and setting it to zero: x̂ = -(∂²D/∂x²)^{-1} ∂D/∂x
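Assuming the gradient and Hessian are estimated by central finite differences on the 3x3x3 sample cube (axes ordered scale, y, x), the sub-sample offset can be sketched as:

```python
import numpy as np

def interpolation_offset(cube):
    """cube: 3x3x3 DoG samples around a detected extremum.
    Returns x_hat = -H^{-1} grad D, the offset of the interpolated
    extremum from the centre sample."""
    c = cube[1, 1, 1]
    # central differences for the gradient
    g = np.array([(cube[2, 1, 1] - cube[0, 1, 1]) / 2,
                  (cube[1, 2, 1] - cube[1, 0, 1]) / 2,
                  (cube[1, 1, 2] - cube[1, 1, 0]) / 2])
    # central differences for the Hessian
    H = np.empty((3, 3))
    H[0, 0] = cube[2, 1, 1] - 2 * c + cube[0, 1, 1]
    H[1, 1] = cube[1, 2, 1] - 2 * c + cube[1, 0, 1]
    H[2, 2] = cube[1, 1, 2] - 2 * c + cube[1, 1, 0]
    H[0, 1] = H[1, 0] = (cube[2, 2, 1] - cube[2, 0, 1] - cube[0, 2, 1] + cube[0, 0, 1]) / 4
    H[0, 2] = H[2, 0] = (cube[2, 1, 2] - cube[2, 1, 0] - cube[0, 1, 2] + cube[0, 1, 0]) / 4
    H[1, 2] = H[2, 1] = (cube[1, 2, 2] - cube[1, 2, 0] - cube[1, 0, 2] + cube[1, 0, 0]) / 4
    return -np.linalg.solve(H, g)
```

If any component of the offset exceeds 0.5, the neighbouring sample is taken as the new centre and the fit is repeated, as described on the next slide.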

24
SIFT - Scale Invariant Feature Transform
  • Accurate key point localization
  • If there is a deviation between the sampled
    keypoint and the interpolated key point larger
    than 0.5 in any dimension, then the sample point
    is changed and the computation is repeated at
    this point. Otherwise the deviation is added to
    the location of its sample point to get the
    interpolated estimate of the extrema.
  • After the extrema have been accurate localized,
    the following 2 operations are performed at this
    stage
  • Unstable Extrema, that have low contrast are
    discarded.
  • The DoG function finds edges. Key points that
    belong to edges arent well localized along the
    edge. This makes them very unstable to small
    amounts of noise and therefore this type of key
    points will be discarded too.
  • 22

25
SIFT - Scale Invariant Feature Transform
  • Accurate key point localization

26
SIFT - Scale Invariant Feature Transform
  • Orientation assignment
  • Select the image of the Gaussian pyramid L thats
    closest to the selected keypoint.
  • For each image sample in that scale precompute
    the gradient magnitude and orientation.
  • Build orientation histogram with 36 bins from the
    region around the keypoint in the following
    manner Every gradient that is added to the
    corresponding bin is weighted by its magnitude
    and a Gaussian circular window with s1.5 x the s
    of the corresponding scale.
  • 24
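A sketch of the 36-bin histogram just described (the function name and the degree-valued orientation array are my conventions):

```python
import numpy as np

def orientation_histogram(mag, ori, center, sigma, bins=36):
    """36-bin gradient orientation histogram around `center` = (y, x).
    Each gradient votes with its magnitude times a circular Gaussian
    weight of width 1.5 * sigma of the keypoint's scale.
    mag, ori: per-pixel gradient magnitude and orientation (degrees)."""
    h = np.zeros(bins)
    cy, cx = center
    ys, xs = np.indices(mag.shape)
    w = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * (1.5 * sigma) ** 2))
    b = ((ori % 360.0) / 360.0 * bins).astype(int) % bins
    np.add.at(h, b.ravel(), (mag * w).ravel())
    return h
```

The dominant orientation is the histogram peak; as the next slide says, local peaks above 80 % of the maximum spawn additional keypoints.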

27
SIFT - Scale Invariant Feature Transform
  • Orientation assignment
  • The keypoint's orientation is chosen to be the peak in the histogram. If there are other local peaks > 80 % of the highest peak -> create additional keypoints.
  • As in accurate keypoint localization, a parabola is fit to the histogram peaks to get a more precise estimate of the dominant gradient direction.

28
SIFT - Scale Invariant Feature Transform
  • Orientation assignment
  • At this stage every keypoint has been assigned an image location, scale and orientation. These parameters impose a repeatable local 2D coordinate system in which the local image regions are described. The generated descriptors are invariant to these transformations.

29
SIFT - Scale Invariant Feature Transform
  • Generation of the image descriptor
  • We want to compute a descriptor that is invariant to the remaining variations (illumination, viewpoint)!
  • In the primary visual cortex one can find complex neurons that respond to a gradient at a particular direction and spatial frequency, but the location of the gradient on the retina is allowed to shift over a small receptive field.
  • Edelman et al.'s hypothesis: the function of these neurons allows matching and recognizing 3D objects from a range of viewpoints. Experiments showed that matching gradients while allowing for shifts in their position indeed improves classification under 3D rotation.
  • This is the key idea that is used in descriptor generation.

30
SIFT - Scale Invariant Feature Transform
  • Generation of the image descriptor
  • Image gradient magnitudes and orientations at all levels of the pyramid have been precomputed (orientation step).
  • The gradients are sampled in a small window around every keypoint with respect to the scale the keypoint belongs to. -> scale invariance
  • The gradients are rotated relative to the keypoint's orientation. -> rotation invariance
  • Gradient magnitudes are weighted with a Gaussian weighting function located at the center of the window, with σ = window size / 2. -> avoidance of sudden changes in the descriptor when the window position is shifted (samples at the bounds now have a smaller influence).

31
SIFT - Scale Invariant Feature Transform
  • Generation of the image descriptor
  • Now angle-discretised gradient orientation histograms are built. The value of an entry in a histogram is calculated as the sum of all gradient magnitudes from the corresponding subwindow whose orientations accord with the direction of the entry.

32
SIFT - Scale Invariant Feature Transform
  • Generation of the image descriptor - affine
    invariance
  • The histograms are invariant to positional shifts of the gradients, as long as the gradients don't cross the bounds of the window subregions.
  • To minimize the effects of crossing between subregions and discretised angles, the assignment of a particular gradient magnitude is done by trilinear interpolation, so affine invariance is improved.
  • The image descriptor is a vector that contains the values of the gradient orientation histogram entries.
  • In the paper they sample a 16x16 region that is divided into 4x4 subregions. The gradient orientation is discretised to angles of 45°, so each histogram has 8 entries. -> the resulting image descriptor is a 128-element vector.

33
SIFT - Scale Invariant Feature Transform
  • Generation of the image descriptor - illumination invariance
  • What remains is the question of illumination invariance:
  • A change in image contrast means multiplication of gradients by a constant -> cancelled by vector normalization.
  • A brightness change means addition of a constant value to each pixel -> gradient computation not affected.
  • Non-linear illumination change can strongly influence the magnitude of certain gradients, but has almost no influence on their orientations. Therefore D. Lowe puts a threshold of 0.2 on the feature vector entries and then renormalizes the vector afterwards. The threshold of 0.2 was experimentally evaluated.
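The normalize-clamp-renormalize scheme is compact enough to state directly (a minimal sketch of the scheme described above):

```python
import numpy as np

def normalize_descriptor(v, thresh=0.2):
    """Normalize the descriptor to unit length (contrast invariance),
    clamp entries at 0.2 to damp non-linear illumination effects,
    then renormalize to unit length."""
    v = v / np.linalg.norm(v)
    v = np.minimum(v, thresh)
    return v / np.linalg.norm(v)
```

After clamping, no single gradient magnitude can dominate the 128-element vector.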

34
SIFT - Scale Invariant Feature Transform
  • Image descriptor - sensitivity to affine change

35
SIFT - Scale Invariant Feature Transform
  • Image descriptor - distinctiveness

36
Overview
  • Introduction
  • SIFT KEYS
  • Scale-Space
  • SIFT - Scale Invariant Feature Transform
  • Object Recognition using SIFT KEYS
  • Results & Further research
  • Literature list & The End

37
Object Recognition using SIFT KEYS
  • Overview
  • 1) SIFT KEY matching: keypoints of an image are matched to the database of keypoints retrieved from training images using a nearest-neighbour algorithm.
  • 2) Clustering of matched keys: clusters of at least 3 matched features are identified that agree on an object and its pose. These are interpreted as the occurrence of an object.
  • 3) Fitting a geometric model: each cluster is checked by performing a detailed geometric fit to the model. The quality of the fit is used to accept or reject the interpretation.
  • -> OBJECT RECOGNIZED IN IMAGE

38
Object Recognition using SIFT KEYS
  • Keypoint matching - quality of matching
  • A keypoint match is defined as the nearest neighbour found in the database. The nearest neighbour is the keypoint whose descriptor has minimum Euclidean distance.
  • However, an image may contain features that won't have any correct match in the training database:
  • the feature may result from background clutter
  • the feature wasn't detected in the training phase
  • The second-closest neighbour is defined as the closest feature in the database that is known to belong to a different object.
  • We must find a way to discard these features.

39
Object Recognition using SIFT KEYS
  • Keypoint matching - quality of matching
  • The quality of a keypoint match is defined by the ratio between the NN and SCN distances. This measure performs well because correct matches must have the NN significantly closer than the SCN in order to achieve reliable matching.
  • All matches with a ratio > 0.8 are discarded.
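A sketch of the ratio test (here the SCN is simply the second-closest descriptor; the refinement of taking the closest key known to belong to a *different* object is omitted):

```python
import numpy as np

def ratio_match(query, db, max_ratio=0.8):
    """Return the index of the nearest database descriptor, or None
    when the distance ratio NN/SCN exceeds 0.8 (unreliable match)."""
    d = np.linalg.norm(db - query, axis=1)  # Euclidean distance to every descriptor
    order = np.argsort(d)
    nn, scn = d[order[0]], d[order[1]]
    if scn == 0 or nn / scn > max_ratio:
        return None
    return int(order[0])
```

A query far from everything has NN ≈ SCN (ratio near 1) and is rejected; a distinctive match has a much closer NN and passes.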

40
Object Recognition using SIFT KEYS
  • Keypoint matching - NN search
  • Finding the nearest neighbours in a database is done by searching. If databases are large, linear search is not applicable.
  • A better approach for searching in high-dimensional spaces are k-d trees. But k-d trees also lose their advantage at dimensions > 10.
  • Therefore an approximate algorithm from Beis & Lowe is used, called BBF (Best-Bin-First), which returns the NN with high probability. BBF is similar to k-d tree NN search.
  • BBF enforces an upper limit on how many bins are inspected.
  • Standard NN search parses the tree according to the structure that is immanent to the tree after it has been built. BBF parses the tree in an order that inspects first the leaf nodes with the least distance to the query point.

41
Object Recognition using SIFT KEYS
  • K-d trees - the datastructure for NN search
  • The following recursive procedure creates a k-d tree from a set of k-dimensional points P = {p1, ..., pn}, P ⊂ IR^k, that are bounded by a hypercuboid H:
  • find the dimension i where P exhibits the greatest variance
  • find the point pm ∈ P whose ith entry mi is the median in dimension i
  • create a new tree element with (i, mi)
  • divide P into P(i ≤ mi) and P(i > mi)
  • repeat the procedure with P(i ≤ mi) and P(i > mi)
  • This way H is divided recursively into smaller hypercuboids. The hypercuboids represented by the leaves of the k-d tree contain the points included in their volume; therefore they are now called bins.
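The recursive construction can be sketched as follows (the dict-based node layout is my representation; the split keeps points with entry ≤ median on the left):

```python
import numpy as np

def build_kdtree(points):
    """Recursively split on the dimension of greatest variance at the
    median; leaves ('bins') hold the points inside their hypercuboid."""
    if len(points) <= 1:
        return {'leaf': True, 'points': points}
    P = np.asarray(points, dtype=float)
    i = int(np.argmax(P.var(axis=0)))   # dimension of greatest variance
    m = float(np.median(P[:, i]))       # median in that dimension
    left = [p for p in points if p[i] <= m]
    right = [p for p in points if p[i] > m]
    if not left or not right:           # degenerate split (duplicates) -> stop
        return {'leaf': True, 'points': points}
    return {'leaf': False, 'dim': i, 'median': m,
            'left': build_kdtree(left), 'right': build_kdtree(right)}
```

Each inner node stores (i, mi); descending the tree with a query point lands in the bin whose hypercuboid contains it.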

42
Object Recognition using SIFT KEYS
  • K-d trees - Example of a 2-d tree

43
Object Recognition using SIFT KEYS
  • K-d trees - Example of a 2-d tree

44
Object Recognition using SIFT KEYS
  • K-d trees - Example of a 2-d tree

45
Object Recognition using SIFT KEYS
  • K-d trees - Example of a 2-d tree

46
Object Recognition using SIFT KEYS
  • K-d trees - Example of a 3-d tree

47
Object Recognition using SIFT KEYS
  • Clustering with the Hough transform
  • Test images may contain multiple objects that the system has learned (they can be different ones, or the same object in different poses).
  • The ratio between NN and SCN is a good criterion for discarding false matches arising from background clutter, but it doesn't solve the problem of matched keypoints that belong to other valid objects.
  • Therefore we need to identify clusters of features with a consistent interpretation in terms of an object and its pose.
  • The probability of the interpretation represented by such a cluster is higher the more features belong to the cluster.
  • Clustering is done with the Hough transform.

48
Object Recognition using SIFT KEYS
  • Clustering with the Hough transform
  • Imagine that we trained the system with the images of these strange creatures (Fish, Glub and Creep). The SIFT KEYS were created and stored in the database. As mentioned earlier, SIFT KEYS contain the local coordinate system that underlay the creation of the image descriptor. Furthermore, for every SIFT KEY it is known to which object(s) it belongs.

49
Object Recognition using SIFT KEYS
  • Clustering with the Hough transform
  • Now we want to recognize the creatures in the following test image.
  • Suppose the NN matching gives the following results. We have one false match from the fish!

50
Object Recognition using SIFT KEYS
  • Clustering with the Hough transform
  • We can do the Hough transform using a hash table, as we know the coordinate system of the SIFT KEYS in the DB as well as in the image.
  • Every key votes for the interpretation of an image region as a known object at a certain location, scale and orientation (a transformation).
  • (Table: object and transformation -> number of votes)
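The voting itself reduces to a hash over quantized (object, pose) bins (a minimal sketch; the quantization of location, scale and orientation into pose bins is assumed to have happened already):

```python
from collections import defaultdict

def hough_clusters(matches, min_votes=3):
    """Each matched key votes for a (object_id, pose_bin) interpretation;
    bins with at least `min_votes` votes (clusters of at least 3 matched
    features, per the slides) advance to geometric verification.
    `matches` is a list of (object_id, pose_bin) tuples."""
    votes = defaultdict(int)
    for obj, pose in matches:
        votes[(obj, pose)] += 1
    return {k: v for k, v in votes.items() if v >= min_votes}
```

A lone false match (like the stray fish key above) never accumulates enough votes to form a cluster.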

51
Object Recognition using SIFT KEYS
  • Clustering with the Hough transform
  • All clusters that collected more than 3 votes will advance to the geometric fitting step.
  • (Table: object and transformation -> number of votes)

52
Object Recognition using SIFT KEYS
  • Fitting a geometric model - least-squares solution
  • Each cluster of SIFT KEYS with more than 3 entries is subject to a verification procedure. With a least-squares solution we try to find the best affine parameters that relate the model image in the DB to the test image.
  • The affine transformation of a model point (x, y)^T to an image point (u, v)^T can be written as: [u, v]^T = [[m1, m2], [m3, m4]] [x, y]^T + [tx, ty]^T
  • An affine transformation accounts correctly for 3D rotation of planar surfaces under orthographic projection. For general 3D objects this is not the case.

53
Object Recognition using SIFT KEYS
  • Fitting a geometric model
  • The equation can be reformulated to
  • This is a linear system of the type Axb. The
    least-squares solution of such a linear system
    can be computed by
  • 45
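The least-squares fit for the affine parameters can be sketched with np.linalg.lstsq, which computes the same x = (A^T A)^{-1} A^T b solution in a numerically stabler way (function name is mine):

```python
import numpy as np

def fit_affine(model_pts, image_pts):
    """Least-squares affine transform [u, v]^T = M [x, y]^T + t relating
    model points to matched image points. Builds the stacked system
    Ax = b with unknowns [m1, m2, m3, m4, tx, ty]."""
    A, b = [], []
    for (x, y), (u, v) in zip(model_pts, image_pts):
        A.append([x, y, 0, 0, 1, 0])
        A.append([0, 0, x, y, 0, 1])
        b.extend([u, v])
    A, b = np.array(A, float), np.array(b, float)
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    m1, m2, m3, m4, tx, ty = params
    return np.array([[m1, m2], [m3, m4]]), np.array([tx, ty])
```

Three or more point matches over-determine the six parameters, which is why clusters need more than 3 entries before verification.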

54
Object Recognition using SIFT KEYS
  • Fitting a geometric model - iterative process
  • Outliers can be removed by checking for agreement
    between image features and the model image.
  • If fewer than 3 features remain after discarding
    outliers, the match is rejected (The
    interpretation associated with the cluster is
    considered to be false).
  • As outliers are discarded, the least-squares
    solution is resolved. This process (step 1-3) is
    repeated in iterative manner.
  • 46

55
Overview
  • Introduction
  • SIFT KEYS
  • Scale-Space
  • SIFT - Scale Invariant Feature Transform
  • Object Recognition using SIFT KEYS
  • Results & Further research
  • Literature list & The End

56
Object Recognition using SIFT KEYS
  • Results
  • The training images are shown on the left. The keypoints used for recognition are shown as squares with an extra line indicating orientation. The size of the squares indicates the image region that was used for the construction of the descriptor.

57
Object Recognition using SIFT KEYS
  • Results
  • Example image where the background is strongly cluttered. This one may be difficult to recognize for humans, too!
  • The viewpoint was rotated by an angle of 30° compared to the image from which the training samples were taken.

58
Object Recognition using SIFT KEYS
  • Results
  • The original size of the image from the first recognition example is 600x480, the size of the latter one 640x315.
  • In both cases the time required for recognition of all objects is less than 0.3 s on a 2 GHz Pentium 4 processor.
  • In general, textured planar surfaces can be reliably detected over a rotation in depth of about 50° in any direction and under almost any illumination condition (sufficient light must be provided, no glare).
  • For general 3D objects, the range of rotation in depth diminishes to 30° and illumination change is more disruptive.

59
Object Recognition using SIFT KEYS
  • Results

60
Object Recognition using SIFT KEYS
  • Results

61
Object Recognition using SIFT KEYS
  • Further research
  • Systematic tests with databases that contain images representing multiple views / illuminations.
  • Extension to color descriptors (Brown & Lowe 2002)
  • Incorporation of other feature types than gradients, e.g. texture measurements
  • Learning features that are suited to recognize whole object categories

62
Overview
  • Introduction
  • SIFT KEYS
  • Scale-Space
  • SIFT - Scale Invariant Feature Transform
  • Object Recognition using SIFT KEYS
  • Results & Further research
  • Literature list & The End

63
Literature list
  • Lowe, D.G. (2004). Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60, 2 (2004), pp. 91-110. http://www.cs.ubc.ca/lowe/papers/ijcv04-abs.html
  • Lowe, D.G. (1999). Object recognition from local scale-invariant features. In International Conference on Computer Vision, Corfu, Greece, pp. 1150-1157. http://www.cs.ubc.ca/spider/lowe/papers/iccv99-abs.html
  • Lindeberg, T. (1994). Scale-space theory: A basic tool for analysing structures at different scales. Journal of Applied Statistics, 21(2), 224-270. http://www.nada.kth.se/tony/abstracts/Lin94-SI-abstract.html

64
Literature list
  • Beis, J. and Lowe, D.G. (1997). Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In Conference on Computer Vision and Pattern Recognition, Puerto Rico, pp. 1000-1006. http://www.cs.ubc.ca/spider/lowe/papers/cvpr97-abs.html
  • Sample, N., Haines, M., Arnold, M. and Purcell, T. (2001). Optimizing Search Strategies in k-d Trees. http://graphics.stanford.edu/tpurcell/pubs/search.pdf

65
The End - Thank you for paying attention
  • Salvador Dali - Tiger