Title: Machine Learning
1 Machine Learning: Category Representation
- Non-linear classification through the kernel trick
2 Linear discriminant
- Figure: linear decision boundary defined by the weight vector w
3 Motivation of generalized linear functions
- Linear classifiers are attractive because of their simplicity.
- They are computationally efficient.
- Some methods, such as the SVM and the logistic discriminant, do not suffer from an objective function with multiple local optima.
- However, the fact that only linear functions can be implemented limits their use to discriminating (approximately) linearly separable classes.
- An easy way to extend their applicability: extend or replace the original set of features with several non-linear functions of the original ones.
4 The main idea
- The data vector x is replaced by a vector φ(x) of m non-linear function responses.
5 Example of alternative feature space
- Suppose that one class encapsulates the other class.
- A linear classifier does not work very well here.
- Let's map our features to a new space spanned by (x1², √2·x1x2, x2²).
- A circle in the original space is now described by a plane in the new space (see the sketch below).
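A minimal sketch of this idea, assuming the quadratic map φ(x) = (x1², √2·x1x2, x2²) and synthetic "one class inside the other" data (the radii and classifier settings are illustrative, not from the slide):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Two concentric classes: inner disk vs. surrounding ring (synthetic data).
rng = np.random.default_rng(0)
radii = np.concatenate([rng.uniform(0.0, 1.0, 100), rng.uniform(1.5, 2.5, 100)])
angles = rng.uniform(0, 2 * np.pi, 200)
X = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)
y = np.concatenate([np.zeros(100), np.ones(100)])

def phi(X):
    """Quadratic feature map: (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.stack([X[:, 0] ** 2,
                     np.sqrt(2) * X[:, 0] * X[:, 1],
                     X[:, 1] ** 2], axis=1)

# In the mapped space the circular boundary x1^2 + x2^2 = r^2 is a plane,
# so a plain linear classifier separates the two classes.
clf = LinearSVC(C=1.0).fit(phi(X), y)
print("training accuracy in mapped space:", clf.score(phi(X), y))
```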
6 The kernel function of our example
- Let's compute the inner product in the new feature space we defined: φ(x)·φ(y) = x1²y1² + 2·x1x2·y1y2 + x2²y2² = (x1y1 + x2y2)² = (xᵀy)².
- Thus, we simply square the standard inner product, and do not need to explicitly map the points to the new feature space!
- This becomes useful if the new feature space has very many features.
- The kernel function is a shortcut to compute inner products in feature space, without explicitly mapping the data to that space (a sketch verifying this identity follows).
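A small check of the identity above, comparing the explicit map with the kernel shortcut on two arbitrary points:

```python
import numpy as np

def phi(x):
    """Explicit quadratic map for 2-D x: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, y):
    """Kernel shortcut: the squared standard inner product."""
    return np.dot(x, y) ** 2

x = np.array([0.3, -1.2])
y = np.array([2.0, 0.7])

# Both routes give the same number, but k(x, y) never builds phi explicitly.
print(np.dot(phi(x), phi(y)))  # inner product after explicit mapping
print(k(x, y))                 # same value via the kernel
```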
7 Example: non-linear support vector machine
- Example where the classes are separable, but not linearly.
- A Gaussian kernel is used.
- Figure: decision surface of the resulting classifier (a sketch of such a model follows).
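A minimal sketch of a non-linear SVM with a Gaussian (RBF) kernel; the two-moons data set and the parameter values are assumptions for illustration, not the slide's own example:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# kernel='rbf' corresponds to k(x, y) = exp(-gamma * ||x - y||^2).
clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```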
8 Nonlinear support vector machines
9 Classification
10 The kernel function
- A kernel function k(x,y) computes the inner product between x and y after mapping them to some alternative representation.
- Starting from a new representation in some feature space, we can find the corresponding kernel function as the program:
  - Map x and y to the new feature space.
  - Compute the inner product between the mapped points.
- A function k is positive definite if evaluating k(x,y) on all N² pairs of N arbitrary points always yields an N×N kernel matrix K that is positive definite (see the check sketched below).
- If the kernel computes inner products, then the kernel matrix K is positive definite.
- Mercer's theorem: if k is positive definite, then there exists a feature space in which k computes the inner products.
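A minimal sketch of the positive-definiteness check described above: build the N×N kernel matrix on arbitrary points and inspect its eigenvalues. The Gaussian kernel and the bandwidth value are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

rng = np.random.default_rng(0)
points = rng.normal(size=(20, 3))          # N = 20 arbitrary points
K = np.array([[gaussian_kernel(x, y) for y in points] for x in points])

eigvals = np.linalg.eigvalsh(K)            # K is symmetric, so eigvalsh applies
print("smallest eigenvalue:", eigvals.min())  # non-negative for a valid kernel
```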
11 The kernel trick has many applications
- We used kernels to compute inner products to find non-linear SVMs.
- The main advantage of the kernel trick is that we can use feature spaces with a vast number of derived non-linear features without being confronted with the computational burden of explicitly mapping the data points to this space.
- Some kernels even have an associated feature space with infinitely many dimensions, such as the Gaussian kernel.
- The same trick can also be used for many other inner-product based methods for classification, regression, clustering, and dimension reduction (see the sketch below).
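A brief sketch of the same trick in regression and dimension reduction, using scikit-learn's kernelized estimators; the toy data and parameter values are assumptions:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

# Kernelized regression and kernelized dimension reduction, same trick as for SVMs.
reg = KernelRidge(kernel="rbf", gamma=0.5, alpha=1e-2).fit(X, y)
emb = KernelPCA(n_components=2, kernel="rbf", gamma=0.5).fit_transform(X)
print("regression R^2:", reg.score(X, y), "embedding shape:", emb.shape)
```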
12 Example: Kernel Logistic Discriminant
- Recall the logistic discriminant: the class probability is the sigmoid of a linear score wᵀφ(x).
- The weight vector may be written as a linear combination of the data points, w = Σᵢ αᵢ φ(xᵢ), so the score becomes Σᵢ αᵢ k(xᵢ, x).
- The derivative of the log-likelihood is now taken with respect to the coefficients αᵢ and involves the kernel matrix.
- As before, the empirical expectation should equal the model expectation (a training sketch follows).
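A minimal sketch of a kernel logistic discriminant trained by gradient ascent on the log-likelihood. The Gaussian kernel, the learning rate, and the toy data are assumptions; the slide's own derivation (and any regularisation) is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(float)   # non-linearly separable labels

def kernel_matrix(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

K = kernel_matrix(X, X)
alpha = np.zeros(len(X))                  # w = sum_i alpha_i * phi(x_i)

for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-K @ alpha))  # model probabilities for the score K @ alpha
    # Gradient of the log-likelihood w.r.t. alpha: kernel-weighted difference
    # between empirical labels and model probabilities.
    alpha += 1e-3 * K @ (y - p)

pred = (K @ alpha > 0).astype(float)
print("training accuracy:", (pred == y).mean())
```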
13 Image categorization with Bags-of-Words
- A range of kernel functions are used in categorization tasks.
- If an interesting image similarity measure can be shown to be a positive definite kernel, then it can be plugged into SVMs or the logistic discriminant.
- Kernels can be combined (see the sketch after this list) by:
  - Summation
  - Products
  - Exponentiation
  - Many other ways
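A small sketch of kernel combination: sums and products of positive definite kernels are again positive definite, so the combinations below remain valid kernels (the base kernels and weights are illustrative):

```python
import numpy as np

def linear_kernel(x, y):
    return np.dot(x, y)

def gaussian_kernel(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def sum_kernel(x, y):
    return 0.5 * linear_kernel(x, y) + 0.5 * gaussian_kernel(x, y)

def product_kernel(x, y):
    return linear_kernel(x, y) * gaussian_kernel(x, y)

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(sum_kernel(x, y), product_kernel(x, y))
```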
14 Chi-square kernel
- One of the most popular and effective kernels for image categorization.
- Measures the similarity of bag-of-words histograms x and y (implementation sketch below).
- See Zhang, Marszalek, Lazebnik & Schmid, Int. Journal of Computer Vision, 2007.
- Performance is further improved by combining chi-square kernels over different types of features:
  - Descriptors: SIFT, HOG, colour, ...
  - Different spatial decompositions of the image
  - Different ways to detect patches: interest points, dense extraction
K(x, y) = exp(−γ χ²(x, y)),  where  χ²(x, y) = Σᵢ (xᵢ − yᵢ)² / (xᵢ + yᵢ)
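A minimal sketch of this kernel on toy histograms; the value of γ and the small ε guarding against empty bins are assumptions. scikit-learn ships a comparable implementation as sklearn.metrics.pairwise.chi2_kernel.

```python
import numpy as np

def chi2_kernel(x, y, gamma=1.0, eps=1e-10):
    """k(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i))."""
    chi2 = np.sum((x - y) ** 2 / (x + y + eps))
    return np.exp(-gamma * chi2)

# Two toy visual-word histograms (normalised to sum to 1).
x = np.array([0.2, 0.5, 0.3, 0.0])
y = np.array([0.1, 0.4, 0.4, 0.1])
print(chi2_kernel(x, y))
```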
15 Pyramid match kernel (Grauman & Darrell, 2005)
- Approximates the optimal partial matching between sets of features.
- The kernel weights the number of new matches at each level i by the difficulty of a match at that level (simplified sketch below).
- Slide credit: Kristen Grauman
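The actual pyramid match kernel builds multi-resolution histograms over feature space; the following is only a simplified 1-D illustration of the level-wise counting and weighting described above, with bin ranges and weights chosen as assumptions:

```python
import numpy as np

def pyramid_match(a, b, levels=4, lo=0.0, hi=1.0):
    """Toy pyramid match score between two sets of 1-D features in [lo, hi]."""
    score, prev = 0.0, 0.0
    for i in range(levels):
        bins = 2 ** (levels - i)                 # level 0 = finest grid
        ha, _ = np.histogram(a, bins=bins, range=(lo, hi))
        hb, _ = np.histogram(b, bins=bins, range=(lo, hi))
        inter = np.minimum(ha, hb).sum()         # matches found at this level
        new = inter - prev                       # matches not already counted
        score += new / (2 ** i)                  # coarser (harder) matches weigh less
        prev = inter
    return score

rng = np.random.default_rng(0)
print(pyramid_match(rng.uniform(size=30), rng.uniform(size=30)))
```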
16 Spatial Pyramid Kernel
- Histograms of visual words in each cell (sketch below).
- Lazebnik, Schmid & Ponce, CVPR 2006
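A sketch of a spatial pyramid representation: the image is split into grids of 1×1, 2×2 and 4×4 cells and a visual-word histogram is built per cell. The `word_map` input (an H×W array of visual-word indices) and the grid levels are assumptions for illustration.

```python
import numpy as np

def spatial_pyramid(word_map, n_words, levels=(1, 2, 4)):
    H, W = word_map.shape
    feats = []
    for g in levels:
        for r in range(g):
            for c in range(g):
                cell = word_map[r * H // g:(r + 1) * H // g,
                                c * W // g:(c + 1) * W // g]
                hist = np.bincount(cell.ravel(), minlength=n_words)
                feats.append(hist / max(hist.sum(), 1))   # per-cell normalisation
    return np.concatenate(feats)

word_map = np.random.default_rng(0).integers(0, 100, size=(64, 64))
print(spatial_pyramid(word_map, n_words=100).shape)   # (1 + 4 + 16) * 100 dimensions
```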
17 Spatial Pyramid Kernel
- Histograms of gradient orientations
- Bosch, Zisserman & Munoz, Int. Conf. on Image and Video Retrieval, 2007