1
Pattern Recognition and Machine Learning
Kernel Methods

2
Overview
  • Many linear parametric models can be recast into
    an equivalent dual representation in which the
    predictions are based on linear combinations of a
    kernel function evaluated at the training data
    points
  • Kernel: k(x, x') = φ(x)^T φ(x')
  • φ(x) is a fixed nonlinear feature space mapping
  • The kernel is a symmetric function of its
    arguments, i.e. k(x, x') = k(x', x)

3
Overview
  • The kernel trick, or kernel substitution, is the
    general idea that if we have an algorithm
    formulated in such a way that the input vector x
    enters only in the form of scalar products, then
    we can replace those scalar products with some
    other choice of kernel
  • Stationary kernels are invariant to translations
    in input space
  • k(x, x') = k(x − x')
  • Homogeneous kernels (radial basis functions)
    depend only on the magnitude of the distance
    between the arguments
  • k(x, x') = k(‖x − x'‖)

4
Dual Representations
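As a concrete illustration of the dual representation, here is a minimal sketch of regularized least squares (kernel ridge regression) in its dual form, where the dual variables are a = (K + λI_N)^{-1} t, K is the Gram matrix with K_nm = k(x_n, x_m), and predictions take the form y(x) = k(x)^T a. The Gaussian kernel, toy data, and value of λ below are illustrative choices, not part of the original slides.

import numpy as np

def rbf_kernel(X1, X2, gamma=10.0):
    # Gaussian kernel: k(x, x') = exp(-gamma * ||x - x'||^2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Toy 1-D regression data (illustrative)
X = np.linspace(0, 1, 20)[:, None]
t = np.sin(2 * np.pi * X[:, 0]) + 0.1 * np.random.randn(20)

lam = 1e-2                                    # regularization coefficient
K = rbf_kernel(X, X)                          # Gram matrix K_nm = k(x_n, x_m)
a = np.linalg.solve(K + lam * np.eye(20), t)  # dual variables a = (K + lam*I)^{-1} t

x_new = np.array([[0.5]])
y_new = rbf_kernel(x_new, X) @ a              # prediction y(x) = k(x)^T a
print(y_new)

Note that the prediction depends on the training inputs only through kernel evaluations, which is what makes kernel substitution possible.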
5
Constructing Kernels
  • Approach 1: choose a feature space mapping φ(x)
    and then use it to find the corresponding kernel,
    as in the sketch below
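For example, with a simple (hypothetical) polynomial feature map, the kernel follows directly as the inner product of the mapped inputs:

import numpy as np

def phi(x):
    # Hypothetical feature space mapping: phi(x) = (1, x, x^2)
    return np.array([1.0, x, x ** 2])

def kernel(x, x_prime):
    # k(x, x') = phi(x)^T phi(x')
    return phi(x) @ phi(x_prime)

print(kernel(0.5, 2.0))  # scalar product in the feature space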

6
Constructing Kernels
  • Approach 2: construct kernel functions directly,
    making sure that each corresponds to a scalar
    product in some feature space, as in the worked
    example below
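A classic worked example: in two dimensions, the directly constructed kernel k(x, z) = (x^T z)^2 corresponds to the scalar product of the feature map φ(x) = (x_1^2, √2 x_1 x_2, x_2^2). The sketch below verifies this numerically:

import numpy as np

def k_direct(x, z):
    # Directly constructed kernel: k(x, z) = (x^T z)^2
    return (x @ z) ** 2

def phi(x):
    # Implied feature map for 2-D inputs:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.isclose(k_direct(x, z), phi(x) @ phi(z))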

7
Constructing Kernels
  • A simpler way to test without having to construct
    φ(x)
  • Use the necessary and sufficient condition that,
    for a function k(x, x') to be a valid kernel, the
    Gram matrix K, whose elements are given by
    k(x_n, x_m), should be positive semidefinite for
    all possible choices of the set {x_n}
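The condition can be checked numerically on a sample of points; this is a sanity check rather than a proof, since the condition must hold for every possible choice of {x_n}. The Gaussian kernel below is an illustrative choice.

import numpy as np

def gram_matrix(k, X):
    # Gram matrix K with K_nm = k(x_n, x_m)
    return np.array([[k(xn, xm) for xm in X] for xn in X])

k = lambda x, z: np.exp(-np.sum((x - z) ** 2))  # Gaussian kernel
X = [np.random.randn(2) for _ in range(10)]
K = gram_matrix(k, X)

eigvals = np.linalg.eigvalsh(K)    # K is symmetric, so use eigvalsh
print(np.all(eigvals >= -1e-10))   # PSD up to numerical tolerance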

8
Constructing Kernels
  • Another powerful technique is to build new kernels
    out of simpler kernels, as in the sketch below
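A sketch of a few of the standard closure rules: given valid kernels k1 and k2, positive scalings, sums, products, and exponentials of them are also valid kernels. The particular base kernels below are illustrative choices.

import numpy as np

k1 = lambda x, z: x @ z                      # linear kernel
k2 = lambda x, z: (x @ z + 1.0) ** 2         # polynomial kernel

k_scaled = lambda x, z: 3.0 * k1(x, z)       # c * k1 with c > 0 is valid
k_sum    = lambda x, z: k1(x, z) + k2(x, z)  # k1 + k2 is valid
k_prod   = lambda x, z: k1(x, z) * k2(x, z)  # k1 * k2 is valid
k_exp    = lambda x, z: np.exp(k1(x, z))     # exp(k1) is valid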

9
Radial Basis Functions
  • Historically introduced for the purpose of exact
    function interpolation
  • The values of the coefficients are found by least
    squares
  • Since there are the same number of coefficients as
    constraints, the result is a function that fits
    every target value exactly (see the sketch below)
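A minimal sketch of exact interpolation with one Gaussian basis function centred on each data point; the data and basis width are illustrative. Because the linear system is square, the solution reproduces every target exactly:

import numpy as np

X = np.linspace(0, 1, 10)                 # inputs x_n
t = np.sin(2 * np.pi * X)                 # targets t_n

def h(r, s=0.1):
    # Gaussian radial basis function of the distance r
    return np.exp(-r ** 2 / (2 * s ** 2))

Phi = h(np.abs(X[:, None] - X[None, :]))  # Phi_nm = h(|x_n - x_m|)
w = np.linalg.solve(Phi, t)               # as many coefficients as constraints

y = Phi @ w                               # interpolant at the training points
assert np.allclose(y, t)                  # fits every target exactly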

10
Radial Basis Functions
  • Suppose the noise on the input variable x is
    described by a variable ξ having a distribution
    ν(ξ); the sum-of-squares error function is then
    E = (1/2) Σ_n ∫ {y(x_n + ξ) − t_n}² ν(ξ) dξ
  • Minimizing it yields a basis function centred on
    every data point
  • The result is the Nadaraya-Watson model

11
Nadaraya-Watson model

12
Nadaraya-Watson model

13
Nadaraya-Watson model
  • The model can also be derived from kernel density
    estimation
  • Start from a Parzen estimate of the joint density,
    p(x, t) = (1/N) Σ_n f(x − x_n, t − t_n),
    where f(x, t) is the component density function
    and there is one such component centred on each
    data point
  • We then find an expression for the regression
    function y(x), corresponding to the conditional
    average of the target variable conditioned on the
    input variable

14
Nadaraya-Watson model
15
Nadaraya-Watson model
  • This model is also known as kernel regression
  • For a localized kernel function, it has the
    property of giving more weight to data points
    that are close to x, as in the sketch below
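A minimal sketch of Nadaraya-Watson kernel regression with a Gaussian kernel; the data and bandwidth are illustrative. The prediction is a weighted average of the training targets, with localized weights emphasizing points close to x:

import numpy as np

def nadaraya_watson(x, X, t, h=0.1):
    # Weights g(x - x_n) from a Gaussian kernel of bandwidth h
    g = np.exp(-((x - X) ** 2) / (2 * h ** 2))
    # y(x) = sum_n g(x - x_n) t_n / sum_m g(x - x_m)
    return (g * t).sum() / g.sum()

X = np.linspace(0, 1, 30)                        # training inputs
t = np.sin(2 * np.pi * X) + 0.1 * np.random.randn(30)

print(nadaraya_watson(0.5, X, t))                # weighted local average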