Transcript and Presenter's Notes

Title: Machine Learning


1
Machine Learning & Category Representation
  • Non-linear classification through the kernel
    trick

2
Linear discriminant
(Figure: linear decision boundary defined by the weight vector w)
3
Motivation of generalized linear functions
  • Linear classifiers are nice because of their
    simplicity
  • Computationally efficient
  • Some methods, such as the SVM and logistic discriminant,
    do not suffer from an objective function with
    multiple local optima.
  • The fact that only linear functions can be
    implemented limits their use to discriminating
    (approximately) linearly separable classes.
  • An easy way to extend their applicability: extend or
    replace the original set of features with several
    non-linear functions of the original ones.

4
The main idea
  • The data vector x is replaced by a vector φ(x) of m function
    responses.

5
Example of alternative feature space
  • Suppose that one class encloses the other
    class
  • A linear classifier does not work very well ...
  • Let's map our features to a new space spanned by (x1², √2·x1·x2, x2²)
  • A circle in the original space is now described
    by a plane in the new space

6
The kernel function of our example
  • Let's compute the inner product in the new
    feature space we defined:
    φ(x)·φ(y) = x1²y1² + 2·x1x2·y1y2 + x2²y2² = (x·y)²
  • Thus, we simply square the standard inner
    product, and do not need to explicitly map the
    points to the new feature space !
  • This becomes useful if the new feature space has
    very many features.
  • The kernel function is a shortcut to compute
    inner products in feature space, without
    explicitly mapping the data to that space.
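
A quick numeric check of this shortcut (a minimal NumPy sketch, not from the slides; it assumes the 2-D map φ(x) = (x1², √2·x1·x2, x2²) from the circle example above):

  import numpy as np

  # Explicit feature map for 2-D inputs: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).
  def phi(x):
      return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

  # Kernel shortcut: k(x, y) = (x . y)^2 gives the same inner product
  # without ever constructing phi(x) or phi(y).
  def k(x, y):
      return np.dot(x, y) ** 2

  x = np.array([1.0, 2.0])
  y = np.array([3.0, -1.0])
  print(np.dot(phi(x), phi(y)))  # inner product in the new feature space
  print(k(x, y))                 # same value from the kernel shortcut

Both prints give the same value, so the kernel evaluates the feature-space inner product without building the mapped vectors.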

7
Example: non-linear support vector machine
  • Example where classes are separable, but not
    linearly.
  • Gaussian kernel used

(Figure: decision surface of the Gaussian-kernel SVM)
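
To make this example concrete, a minimal sketch using scikit-learn (an assumed dependency, not part of the slides): two concentric classes, one enclosing the other, are generated and separated with a Gaussian (RBF) kernel SVM.

  from sklearn.datasets import make_circles
  from sklearn.svm import SVC

  # Two classes that are separable, but not linearly (one ring inside the other).
  X, y = make_circles(n_samples=200, factor=0.4, noise=0.05, random_state=0)

  # Non-linear SVM with a Gaussian (RBF) kernel; gamma controls the kernel width.
  clf = SVC(kernel="rbf", gamma=2.0, C=1.0)
  clf.fit(X, y)
  print("training accuracy:", clf.score(X, y))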
8
Nonlinear support vector machines
9
Classification
10
The kernel function
  • A kernel function k(x,y) computes the inner product
    between x and y after mapping them to some
    alternative representation.
  • Starting with a new representation in some
    feature space, we can find the corresponding
    kernel function by the following two-step program:
  • Map x and y to the new feature space.
  • Compute the inner product between the mapped
    points.
  • A function k is positive definite if evaluating
    k(x,y) on all N² pairs of N arbitrary points
    always yields an N×N kernel matrix K that is
    positive definite
  • If the kernel computes inner products, then the
    kernel matrix K is positive definite
  • Mercer's Theorem: if k is positive definite, then
    there exists a feature space in which k computes
    the inner products.
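
An empirical illustration of this connection (a NumPy sketch, not from the slides): build the N×N Gaussian kernel matrix on N arbitrary points and check that its eigenvalues are non-negative, up to rounding error.

  import numpy as np

  def gaussian_kernel_matrix(X, gamma=1.0):
      # K[i, j] = exp(-gamma * ||x_i - x_j||^2)
      sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
      return np.exp(-gamma * sq_dists)

  # N arbitrary points; the resulting kernel matrix should be positive
  # (semi-)definite, i.e. no eigenvalue should fall noticeably below zero.
  X = np.random.default_rng(0).normal(size=(50, 3))
  K = gaussian_kernel_matrix(X)
  print("smallest eigenvalue:", np.linalg.eigvalsh(K).min())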

11
The kernel trick has many applications
  • We used kernels to compute inner-products to find
    non-linear SVMs.
  • The main advantage of the kernel trick is that
    we can use feature spaces with a vast number of
    derived non-linear features without being
    confronted with the computational burden of
    explicitly mapping the data points to this space.
  • Some kernels even have an associated feature
    space with infinitely many dimensions, such as the
    Gaussian kernel
  • The same trick can also be used for many other
    inner-product based methods for classification,
    regression, clustering, and dimension reduction.

12
Example: Kernel Logistic Discriminant
  • Recall the logistic discriminant
  • The weight vector may be written as a linear
    combination of the data points: w = Σ_i α_i x_i
  • The derivative of the log-likelihood can now be written
    in terms of the coefficients α and the kernel matrix
    (see the sketch below)

As before, the empirical expectation should equal
the model expectations.
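A minimal sketch of this kernelized update (assumptions not taken from the slides: a Gaussian kernel, labels y in {0,1}, plain gradient ascent): writing the weight vector as a combination of the data points turns the decision function into f(x) = Σ_j α_j k(x_j, x), and the log-likelihood gradient with respect to α becomes K(y − p), i.e. empirical minus model expectations.

  import numpy as np

  def gaussian_kernel_matrix(X, Z, gamma=1.0):
      # K[i, j] = exp(-gamma * ||x_i - z_j||^2)
      sq = np.sum((X[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
      return np.exp(-gamma * sq)

  def fit_kernel_logistic(K, y, lr=0.01, n_iter=500):
      # Gradient ascent on the log-likelihood, parameterized by alpha,
      # where f(x_i) = (K @ alpha)[i] replaces the linear score w'x_i.
      alpha = np.zeros(len(y))
      for _ in range(n_iter):
          p = 1.0 / (1.0 + np.exp(-K @ alpha))  # model probabilities
          alpha += lr * K @ (y - p)             # empirical minus model expectation
      return alpha

  # usage: labels that depend non-linearly on the inputs
  rng = np.random.default_rng(0)
  X = rng.normal(size=(40, 2))
  y = (np.sum(X ** 2, axis=1) > 1.0).astype(float)
  K = gaussian_kernel_matrix(X, X)
  alpha = fit_kernel_logistic(K, y)
  p = 1.0 / (1.0 + np.exp(-K @ alpha))
  print("training accuracy:", np.mean((p > 0.5) == y))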
13
Image categorization with Bags-of-Words
  • A range of kernel functions are used in
    categorization tasks
  • If an interesting image similarity measure can be
    shown to be a positive definite kernel, then it
    can be plugged into SVMs or the logistic discriminant
  • Kernels can be combined in several ways (see the sketch after this list):
  • Summation
  • Products
  • Exponentiation
  • Many other ways
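
Schematically, for two precomputed kernel matrices K1 and K2 (e.g. computed from different feature types on the same images), these combinations look as follows; each operation again yields a positive definite kernel. The function and parameter names are illustrative only.

  import numpy as np

  def combine_kernels(K1, K2, beta=1.0):
      K_sum = K1 + K2            # summation of kernels
      K_prod = K1 * K2           # element-wise product of kernels
      K_exp = np.exp(beta * K1)  # exponentiation of a kernel
      return K_sum, K_prod, K_exp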

14
Chi-square kernel
  • One of the most popular and effective kernels for
    image categorization
  • Similarity of bag-of-word histograms x and y
  • See Zhang, Marszalek, Lazebnik & Schmid, Int.
    Journal of Computer Vision, 2007
  • Performance further improved by combining
    Chi-square kernel over different types of
    features
  • Descriptors: SIFT, HOG, colour, ...
  • Different spatial decompositions of image
  • Different ways to detect patches: interest
    points, dense extraction

K(x,y) = exp(-γ·χ²(x,y)),  where χ²(x,y) = Σ_i (x_i − y_i)² / (x_i + y_i)
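
A direct NumPy sketch of this kernel (the small eps in the denominator only guards against empty bins and is not part of the definition):

  import numpy as np

  def chi2_kernel(x, y, gamma=1.0, eps=1e-10):
      # K(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i))
      chi2 = np.sum((x - y) ** 2 / (x + y + eps))
      return np.exp(-gamma * chi2)

  # usage on two normalized bag-of-words histograms
  x = np.array([0.2, 0.5, 0.3, 0.0])
  y = np.array([0.1, 0.4, 0.3, 0.2])
  print(chi2_kernel(x, y, gamma=1.0))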
15
Pyramid match kernel (Grauman & Darrell, 2005)
Approximates the optimal partial matching between sets of features:
K(X,Y) = Σ_i w_i · N_i, where N_i is the number of new matches at level i
and the weight w_i accounts for the difficulty of a match at level i.
Slide credit: Kristen Grauman
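
A rough 1-D sketch of the idea (assumptions not from the slide: scalar features in [0, 1), bin sizes that double from level to level, weights 1/2^i; the published kernel also normalizes by self-matches, which is omitted here):

  import numpy as np

  def pyramid_match_kernel(X, Y, num_levels=4):
      # Level i uses 2**(num_levels - i) bins, so bins grow (matches get
      # easier) as i increases; new matches at level i are weighted by 1/2**i.
      K = 0.0
      prev_matches = 0.0
      for i in range(num_levels + 1):
          n_bins = 2 ** (num_levels - i)
          hx, _ = np.histogram(X, bins=n_bins, range=(0.0, 1.0))
          hy, _ = np.histogram(Y, bins=n_bins, range=(0.0, 1.0))
          matches = np.minimum(hx, hy).sum()       # histogram intersection
          K += (matches - prev_matches) / 2 ** i   # count only the new matches
          prev_matches = matches
      return K

  # usage: two unordered sets of features, possibly of different sizes
  X = np.array([0.11, 0.12, 0.55, 0.90])
  Y = np.array([0.10, 0.52, 0.95])
  print(pyramid_match_kernel(X, Y))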
16
Spatial Pyramid Kernel
  • Histograms of visual words in each cell
  • Lazebnik, Schmid & Ponce, CVPR 2006
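
A sketch of how such a representation can be built (the function and parameter names are illustrative; the per-level weighting used by Lazebnik et al. is omitted): visual-word histograms are computed per grid cell at every pyramid level and concatenated, after which any histogram kernel, e.g. intersection or chi-square, can compare two images.

  import numpy as np

  def spatial_pyramid_histogram(points, words, vocab_size, num_levels=2):
      # points: (x, y) keypoint coordinates normalized to [0, 1)
      # words:  visual-word index of each keypoint
      # Level l splits the image into 2**l x 2**l cells; each cell gets
      # its own word histogram, and all histograms are concatenated.
      feats = []
      for level in range(num_levels + 1):
          n = 2 ** level
          for cx in range(n):
              for cy in range(n):
                  in_cell = ((points[:, 0] * n).astype(int) == cx) & \
                            ((points[:, 1] * n).astype(int) == cy)
                  feats.append(np.bincount(words[in_cell], minlength=vocab_size))
      return np.concatenate(feats)

  # usage: 100 random keypoints assigned to a 10-word vocabulary
  rng = np.random.default_rng(0)
  points = rng.random((100, 2))
  words = rng.integers(0, 10, size=100)
  print(spatial_pyramid_histogram(points, words, vocab_size=10).shape)  # (210,)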

17
Spatial Pyramid Kernel
  • Histograms of gradient orientations
  • Bosch, Zisserman & Munoz, Int. Conf. on Image and
    Video Retrieval, 2007