1
Pattern Recognition and Machine Learning
Kernel Methods

2
Overview
  • Many linear parametric models can be recast into
    an equivalent dual representation in which the
    predictions are based on linear combinations of a
    kernel function evaluated at the training data
    points
  • Kernel: k(x, x') = φ(x)^T φ(x')
  • φ(x) is a fixed nonlinear feature space mapping
  • The kernel is a symmetric function of its
    arguments, i.e. k(x, x') = k(x', x)

3
Overview
  • The kernel trick, or kernel substitution, is the
    general idea that if we have an algorithm
    formulated in such a way that the input vector x
    enters only in the form of scalar products, then
    we can replace those scalar products with some
    other choice of kernel
  • Stationary kernels are invariant to translations
    in input space
  • k(x, x') = k(x − x')
  • Homogeneous kernels (radial basis functions)
    depend only on the magnitude of the distance
    between the arguments
  • k(x, x') = k(‖x − x'‖)

4
Dual Representations
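As a concrete illustration of the dual representation, here is a minimal sketch of regularized least squares (kernel ridge regression) in its dual form, where the dual variables are a = (K + λI_N)^{-1} t, K is the Gram matrix with K_nm = k(x_n, x_m), and predictions take the form y(x) = k(x)^T a. The Gaussian kernel, toy data, and value of λ below are illustrative choices, not part of the original slides.

import numpy as np

def rbf_kernel(X1, X2, gamma=10.0):
    # Gaussian kernel: k(x, x') = exp(-gamma * ||x - x'||^2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Toy 1-D regression data (illustrative)
X = np.linspace(0, 1, 20)[:, None]
t = np.sin(2 * np.pi * X[:, 0]) + 0.1 * np.random.randn(20)

lam = 1e-2                                    # regularization coefficient
K = rbf_kernel(X, X)                          # Gram matrix K_nm = k(x_n, x_m)
a = np.linalg.solve(K + lam * np.eye(20), t)  # dual variables a = (K + lam*I)^{-1} t

x_new = np.array([[0.5]])
y_new = rbf_kernel(x_new, X) @ a              # prediction y(x) = k(x)^T a
print(y_new)

Note that the prediction depends on the training inputs only through kernel evaluations, which is what makes kernel substitution possible.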
5
Constructing Kernels
  • Approach 1: choose a feature space mapping φ(x)
    and then use it to find the corresponding kernel,
    as in the sketch below
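For example, with a simple (hypothetical) polynomial feature map, the kernel follows directly as the inner product of the mapped inputs:

import numpy as np

def phi(x):
    # Hypothetical feature space mapping: phi(x) = (1, x, x^2)
    return np.array([1.0, x, x ** 2])

def kernel(x, x_prime):
    # k(x, x') = phi(x)^T phi(x')
    return phi(x) @ phi(x_prime)

print(kernel(0.5, 2.0))  # scalar product in the feature space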

6
Constructing Kernels
  • Approach 2: construct kernel functions directly,
    making sure that each corresponds to a scalar
    product in some feature space, as in the worked
    example below
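A classic worked example: in two dimensions, the directly constructed kernel k(x, z) = (x^T z)^2 corresponds to the scalar product of the feature map φ(x) = (x_1^2, √2 x_1 x_2, x_2^2). The sketch below verifies this numerically:

import numpy as np

def k_direct(x, z):
    # Directly constructed kernel: k(x, z) = (x^T z)^2
    return (x @ z) ** 2

def phi(x):
    # Implied feature map for 2-D inputs:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.isclose(k_direct(x, z), phi(x) @ phi(z))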

7
Constructing Kernels
  • A simpler way to test without having to construct
    φ(x)
  • Use the necessary and sufficient condition that,
    for a function k(x, x') to be a valid kernel, the
    Gram matrix K, whose elements are given by
    k(x_n, x_m), should be positive semidefinite for
    all possible choices of the set {x_n}
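The condition can be checked numerically on a sample of points; this is a sanity check rather than a proof, since the condition must hold for every possible choice of {x_n}. The Gaussian kernel below is an illustrative choice.

import numpy as np

def gram_matrix(k, X):
    # Gram matrix K with K_nm = k(x_n, x_m)
    return np.array([[k(xn, xm) for xm in X] for xn in X])

k = lambda x, z: np.exp(-np.sum((x - z) ** 2))  # Gaussian kernel
X = [np.random.randn(2) for _ in range(10)]
K = gram_matrix(k, X)

eigvals = np.linalg.eigvalsh(K)    # K is symmetric, so use eigvalsh
print(np.all(eigvals >= -1e-10))   # PSD up to numerical tolerance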

8
Constructing Kernels
  • Another powerful technique is to build new kernels
    out of simpler kernels, as in the sketch below
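A sketch of a few of the standard closure rules: given valid kernels k1 and k2, positive scalings, sums, products, and exponentials of them are also valid kernels. The particular base kernels below are illustrative choices.

import numpy as np

k1 = lambda x, z: x @ z                      # linear kernel
k2 = lambda x, z: (x @ z + 1.0) ** 2         # polynomial kernel

k_scaled = lambda x, z: 3.0 * k1(x, z)       # c * k1 with c > 0 is valid
k_sum    = lambda x, z: k1(x, z) + k2(x, z)  # k1 + k2 is valid
k_prod   = lambda x, z: k1(x, z) * k2(x, z)  # k1 * k2 is valid
k_exp    = lambda x, z: np.exp(k1(x, z))     # exp(k1) is valid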

9
Radial Basis Functions
  • Historically introduced for the purpose of exact
    function interpolation
  • The values of the coefficients are found by least
    squares
  • Since there are the same number of coefficients as
    constraints, the result is a function that fits
    every target value exactly (see the sketch below)
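A minimal sketch of exact interpolation with one Gaussian basis function centred on each data point; the data and basis width are illustrative. Because the linear system is square, the solution reproduces every target exactly:

import numpy as np

X = np.linspace(0, 1, 10)                 # inputs x_n
t = np.sin(2 * np.pi * X)                 # targets t_n

def h(r, s=0.1):
    # Gaussian radial basis function of the distance r
    return np.exp(-r ** 2 / (2 * s ** 2))

Phi = h(np.abs(X[:, None] - X[None, :]))  # Phi_nm = h(|x_n - x_m|)
w = np.linalg.solve(Phi, t)               # as many coefficients as constraints

y = Phi @ w                               # interpolant at the training points
assert np.allclose(y, t)                  # fits every target exactly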

10
Radial Basis Functions
  • Suppose the noise on the input variable x is
    described by a variable ξ having a distribution
    ν(ξ); the sum-of-squares error function is then
    E = (1/2) Σ_n ∫ {y(x_n + ξ) − t_n}² ν(ξ) dξ
  • Minimizing it yields a basis function centred on
    every data point
  • The result is the Nadaraya-Watson model

11
Nadaraya-Watson model

12
Nadaraya-Watson model

13
Nadaraya-Watson model
  • The model can also be derived from kernel density
    estimation
  • Start from a Parzen estimate of the joint density,
    p(x, t) = (1/N) Σ_n f(x − x_n, t − t_n),
    where f(x, t) is the component density function
    and there is one such component centred on each
    data point
  • We then find an expression for the regression
    function y(x), corresponding to the conditional
    average of the target variable conditioned on the
    input variable

14
Nadaraya-Watson model
15
Nadaraya-Watson model
  • This model is also known as kernel regression
  • For a localized kernel function, it has the
    property of giving more weight to data points
    that are close to x, as in the sketch below
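A minimal sketch of Nadaraya-Watson kernel regression with a Gaussian kernel; the data and bandwidth are illustrative. The prediction is a weighted average of the training targets, with localized weights emphasizing points close to x:

import numpy as np

def nadaraya_watson(x, X, t, h=0.1):
    # Weights g(x - x_n) from a Gaussian kernel of bandwidth h
    g = np.exp(-((x - X) ** 2) / (2 * h ** 2))
    # y(x) = sum_n g(x - x_n) t_n / sum_m g(x - x_m)
    return (g * t).sum() / g.sum()

X = np.linspace(0, 1, 30)                        # training inputs
t = np.sin(2 * np.pi * X) + 0.1 * np.random.randn(30)

print(nadaraya_watson(0.5, X, t))                # weighted local average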