Title: Kernel methods overview
1. Kernel methods - overview
- Kernel smoothers
- Local regression
- Kernel density estimation
- Radial basis functions
2. Introduction
- Kernel methods are regression techniques used to estimate a response function from noisy data
- Properties
- A different model is fitted at each query point, and only those observations close to that point are used to fit the model
- The resulting function is smooth
- The models require only a minimum of training
3. A simple one-dimensional kernel smoother
4. Kernel methods, splines and ordinary least squares regression (OLS)
- OLS: a single model is fitted to all the data
- Splines: different models are fitted to different subintervals (cuboids) of the input domain
- Kernel methods: different models are fitted at each query point
5. Kernel-weighted averages and moving averages
- The Nadaraya-Watson kernel-weighted average: f̂(x0) = Σi Kλ(x0, xi) yi / Σi Kλ(x0, xi), with Kλ(x0, x) = D(|x − x0| / λ)
- where λ indicates the window size and the function D shows how the weights change with distance within this window
- The estimated function is smooth!
- K-nearest neighbours
- The estimated function is piecewise constant!
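The kernel-weighted average above can be sketched in a few lines of Python; the Epanechnikov weight function, the bandwidth value, and the test function are illustrative choices, not part of the slides:

```python
import numpy as np

def nadaraya_watson(x0, x, y, lam):
    """Kernel-weighted average at query point x0 with window size lam."""
    # Epanechnikov weights: D(t) = 3/4 (1 - t^2) for |t| <= 1, else 0
    t = np.abs(x - x0) / lam
    w = np.where(t <= 1, 0.75 * (1 - t**2), 0.0)
    return np.sum(w * y) / np.sum(w)

# Noisy observations of a smooth function (illustrative data)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(4 * x) + rng.normal(0, 0.3, 100)

# Evaluate the smoother on a grid of interior query points
grid = np.linspace(0.1, 0.9, 50)
fhat = np.array([nadaraya_watson(x0, x, y, lam=0.2) for x0 in grid])
```

Because the weights vary continuously with x0, the resulting curve is smooth; replacing the metric window by the k nearest neighbours would make it piecewise constant.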
6. Examples of one-dimensional kernel smoothers
- Epanechnikov kernel
- Tri-cube kernel
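Both kernels have compact support on [−1, 1]; a minimal sketch:

```python
import numpy as np

def epanechnikov(t):
    # D(t) = 3/4 (1 - t^2) for |t| <= 1, zero outside
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def tricube(t):
    # D(t) = (1 - |t|^3)^3 for |t| <= 1, zero outside; flatter top than Epanechnikov
    return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

# At the centre: D(0) = 0.75 for Epanechnikov, 1.0 for tri-cube
```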
7. Issues in kernel smoothing
- The smoothing parameter λ has to be defined
- When there are ties at xi: compute an average y value and introduce weights representing the number of points
- Boundary issues
- Varying density of observations
- the bias is constant
- the variance is inversely proportional to the density
8. Boundary effects of one-dimensional kernel smoothers
- Locally-weighted averages can be badly biased at the boundaries if the response function has a significant slope → apply local linear regression
9. Local linear regression
- Find the intercept and slope parameters solving min over α(x0), β(x0) of Σi Kλ(x0, xi) [yi − α(x0) − β(x0) xi]²
- The solution is a linear combination of the yi
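The weighted least-squares fit above can be sketched directly; the Epanechnikov weights and the normal-equations solve are illustrative choices. On exactly linear data the boundary estimate is unbiased, which is the point of the method:

```python
import numpy as np

def local_linear(x0, x, y, lam):
    """Fit intercept and slope at x0 by kernel-weighted least squares."""
    t = np.abs(x - x0) / lam
    w = np.where(t <= 1, 0.75 * (1 - t**2), 0.0)   # Epanechnikov weights
    B = np.column_stack([np.ones_like(x), x - x0])  # local design matrix
    # Weighted normal equations: (B^T W B) beta = B^T W y
    beta = np.linalg.solve(B.T @ (w[:, None] * B), B.T @ (w * y))
    return beta[0]  # the fitted value at x0 is the local intercept

x = np.linspace(0, 1, 50)
y = 2 * x + 1                            # exactly linear, no noise
f0 = local_linear(0.0, x, y, lam=0.3)    # boundary point, recovered without bias
```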
10. Kernel smoothing vs local linear regression
- Kernel smoothing
- Solve the minimization problem min over θ of Σi Kλ(x0, xi) (yi − θ)²
- Local linear regression
- Solve the minimization problem min over α, β of Σi Kλ(x0, xi) (yi − α − β xi)²
11. Properties of local linear regression
- Automatically modifies the kernel weights to correct for bias
- The bias depends only on the terms of order higher than one in the expansion of f
12. Local polynomial regression
- Fitting polynomials instead of straight lines
- Behavior of estimated response function
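Local polynomial regression is the same weighted least-squares idea with a higher-degree local design matrix; a sketch, with tri-cube weights as an illustrative choice. A local quadratic fit reproduces a quadratic response exactly:

```python
import numpy as np

def local_poly(x0, x, y, lam, degree=2):
    """Local polynomial fit of the given degree at query point x0."""
    t = np.abs(x - x0) / lam
    w = np.where(t <= 1, (1 - t**3)**3, 0.0)       # tri-cube weights
    # Columns 1, (x - x0), ..., (x - x0)^degree; the fit at x0 is the intercept
    B = np.vander(x - x0, degree + 1, increasing=True)
    sw = np.sqrt(w)                                # weighted LS via row scaling
    beta, *_ = np.linalg.lstsq(sw[:, None] * B, sw * y, rcond=None)
    return beta[0]

x = np.linspace(0, 1, 50)
y = x**2                                           # exactly quadratic, no noise
f0 = local_poly(0.5, x, y, lam=0.3, degree=2)      # recovers 0.25 up to rounding
```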
13. Polynomial vs local linear regression
- Advantages
- Reduces the trimming of hills and filling of valleys
- Disadvantages
- Higher variance (the tails are more wiggly)
14. Selecting the width of the kernel
- Bias-variance tradeoff
- Selecting a narrow window leads to high variance and low bias, whilst selecting a wide window leads to high bias and low variance
15. Selecting the width of the kernel
- Automatic selection (cross-validation)
- Fixing the degrees of freedom
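Automatic bandwidth selection can be sketched with leave-one-out cross-validation on the Nadaraya-Watson smoother; the candidate grid, weight function, and data here are illustrative:

```python
import numpy as np

def loocv_score(lam, x, y):
    """Leave-one-out CV error of a Nadaraya-Watson smoother."""
    n = len(x)
    err = 0.0
    for i in range(n):
        mask = np.arange(n) != i                   # drop observation i
        t = np.abs(x[mask] - x[i]) / lam
        w = np.where(t <= 1, 0.75 * (1 - t**2), 0.0)
        if w.sum() == 0:                           # window too narrow: no neighbours
            return np.inf
        err += (y[i] - np.sum(w * y[mask]) / w.sum())**2
    return err / n

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 80))
y = np.sin(4 * x) + rng.normal(0, 0.3, 80)

# Pick the bandwidth with the smallest LOOCV error over a small grid
lams = [0.05, 0.1, 0.2, 0.4]
best = min(lams, key=lambda l: loocv_score(l, x, y))
```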
16. Local regression in R^p
- The one-dimensional approach is easily extended to p dimensions by
- using the Euclidean norm as a measure of distance in the kernel
- modifying the polynomial
17. Local regression in R^p
- The curse of dimensionality
- The fraction of points close to the boundary of the input domain increases with its dimension
- Observed data do not cover the whole input domain
18. Structured local regression models
- Structured kernels (standardize each variable)
- Note: A is positive semidefinite
19. Structured local regression models
- Structured regression functions
- ANOVA decompositions (e.g., additive models)
- Backfitting algorithms can be used
- Varying coefficient models (partition X)
- INSERT FORMULA 6.17
20. Structured local regression models
- Varying coefficient models (example)
21. Local methods
- Assumption: the model is locally linear → maximize the log-likelihood locally at x0
- Autoregressive time series: yt = β0 + β1 yt−1 + … + βk yt−k + εt
- With zt = (1, yt−1, …, yt−k), this becomes yt = ztᵀβ + εt; fit by local least squares with kernel K(z0, zt)
22. Kernel density estimation
- Straightforward estimates of the density are bumpy
- Instead, Parzen's smooth estimate is preferred: f̂(x0) = (1 / (N λ)) Σi Kλ(x0, xi)
- Normally, Gaussian kernels are used
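With a Gaussian kernel, the Parzen estimate is an average of normal densities centred at the observations; a sketch (sample, bandwidth, and grid are illustrative):

```python
import numpy as np

def parzen_kde(grid, x, lam):
    """Parzen estimate (1/(N*lam)) * sum_i phi((x0 - xi)/lam) on a grid of x0."""
    u = (grid[None, :] - x[:, None]) / lam           # scaled distances, N x G
    phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # standard normal density
    return phi.mean(axis=0) / lam                    # average of kernel bumps

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 500)
grid = np.linspace(-5, 5, 1001)
f = parzen_kde(grid, x, lam=0.3)
```

Each observation contributes one smooth bump, so the estimate is smooth by construction, unlike a histogram.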
23. Radial basis functions and kernels
- Using the idea of basis expansion, we treat kernel functions as basis functions: f(x) = Σj Kλj(ξj, x) βj
- where ξj is a prototype parameter and λj a scale parameter
24. Radial basis functions and kernels
- Choosing the parameters
- Estimate ξj, λj separately from βj (often by using the distribution of X alone) and solve least squares
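The two-stage recipe above can be sketched as follows: prototypes ξj are taken as quantiles of X and the scale λ is fixed (both illustrative choices based on the distribution of X alone), then the coefficients βj are found by ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(4 * x) + rng.normal(0, 0.2, 200)       # illustrative data

xi = np.quantile(x, np.linspace(0.1, 0.9, 7))     # prototype parameters xi_j
lam = 0.15                                        # common scale parameter

# Basis matrix of Gaussian radial functions K_lam(xi_j, x)
H = np.exp(-0.5 * ((x[:, None] - xi[None, :]) / lam)**2)
beta, *_ = np.linalg.lstsq(H, y, rcond=None)      # least squares for beta_j
fhat = H @ beta                                   # fitted RBF expansion
```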