An Overview of Kernel-Based Learning Methods - PowerPoint PPT Presentation

About This Presentation
Title: An Overview of Kernel-Based Learning Methods
Description: Original title: A Comparative Study of Kernel Methods for Text Classification. Author: yanliu. Created: 9/23/2003.
Slides: 26
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

1
An Overview of Kernel-Based Learning Methods
  • Yan Liu
  • Nov 18, 2003

2
Outline
  • Introduction
  • Theory basis
  • Reproducing Kernel Hilbert Space (RKHS),
    Mercer's theorem, Representer theorem,
    regularization
  • Kernel-based learning algorithms
  • Supervised learning: support vector machines
    (SVMs), kernel Fisher discriminant (KFD)
  • Unsupervised learning: one-class SVM, kernel PCA
  • Kernel design
  • Standard kernels
  • Making kernels from kernels
  • Application-oriented kernels: Fisher kernel

3
Introduction
  • Example
  • Idea: map the problem into a higher-dimensional
    space.
  • Let F be a (potentially much higher dimensional)
    feature space, and let φ: X → F, x ↦ φ(x).
  • The learning problem now works with the samples
    (φ(x_1), y_1), . . . , (φ(x_N), y_N) in F × Y.
  • Key question: can this mapped problem be
    classified in a simple way?
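As a concrete illustration of the mapping idea (a sketch with made-up inputs, not part of the original slides): the degree-2 polynomial kernel k(x, z) = ⟨x, z⟩² corresponds to an explicit feature map φ into 3-D space, so the inner product computed in F equals the kernel evaluated directly in X.

```python
import math

def phi(x):
    # explicit degree-2 feature map for 2-D input:
    # (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def k(x, z):
    # polynomial kernel k(x, z) = (x . z)^2, computed without mapping
    return (x[0] * z[0] + x[1] * z[1]) ** 2

x, z = (1.0, 2.0), (3.0, 0.5)
fx, fz = phi(x), phi(z)
dot_in_F = sum(a * b for a, b in zip(fx, fz))
# inner product in the feature space F equals the kernel in input space X
assert abs(dot_in_F - k(x, z)) < 1e-9
```

This is the "kernel trick": working with k avoids ever constructing φ(x) explicitly, which matters when F is very high- or infinite-dimensional.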

4
Exploring Theory Roadmap
5
Reproducing Kernel Hilbert Space -1
  • Inner product space
  • Hilbert space
  • A Hilbert space is a complete inner product
    space.

6
Reproducing Kernel Hilbert Space -2
  • Reproducing Kernel Hilbert Space (RKHS)
  • Gram matrix
  • Given a kernel k(x, y), define the Gram matrix to
    be K_ij = k(x_i, x_j)
  • We say the kernel is positive definite when the
    corresponding Gram matrix is positive definite
  • Definition of RKHS
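A quick numerical sketch of the Gram-matrix definition (the sample points and the kernel width `gamma` are arbitrary choices for illustration): build K_ij = k(x_i, x_j) for a Gaussian RBF kernel and confirm it is positive semi-definite by inspecting its eigenvalues.

```python
import numpy as np

def rbf(x, z, gamma=0.5):
    # Gaussian RBF kernel; gamma is an assumed illustrative value
    return np.exp(-gamma * np.sum((x - z) ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.5, 1.5]])
# Gram matrix K_ij = k(x_i, x_j)
K = np.array([[rbf(xi, xj) for xj in X] for xi in X])

# K is symmetric, so eigvalsh applies; PSD means all eigenvalues >= 0
eigvals = np.linalg.eigvalsh(K)
assert np.all(eigvals > -1e-10)
```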

7
Reproducing Kernel Hilbert Space -3
  • Reproducing properties
  • Comment
  • RKHS is a bounded Hilbert space
  • RKHS is a smoothed Hilbert space

8
Mercer's Theorem-1
  • Mercer's Theorem
  • For the discrete case, assume A is the Gram
    matrix. If A is positive semi-definite, then it
    admits a factorization A = Φ^T Φ, i.e., each
    entry k(x_i, x_j) is an inner product
    ⟨φ(x_i), φ(x_j)⟩ in some feature space.

9
Mercer's Theorem-2
  • Comment
  • Mercer's theorem provides a concrete way to
    construct the basis for an RKHS
  • Mercer's condition is the only constraint on a
    kernel: for a function to be a valid kernel, the
    corresponding Gram matrix must be positive
    definite

10
Representer Theorem-1
11
Representer Theorem-2
  • Comment
  • The representer theorem is a powerful result: it
    shows that although we search for the optimal
    solution in an infinite-dimensional feature
    space, adding the regularization term reduces
    the problem to a finite-dimensional one, spanned
    by the training examples
  • In this sense, regularization and the RKHS
    formulation are equivalent views of the same
    constraint.
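The representer theorem can be made concrete with kernel ridge regression (a sketch on synthetic data; the kernel width `gamma` and regularization strength `lam` are assumed values): the regularized optimum is a finite expansion f(x) = Σ_i α_i k(x_i, x) over the N training points, with α found from an N×N linear system.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

gamma, lam = 0.5, 1e-2  # assumed kernel width and regularization strength
K = np.exp(-gamma * (X - X.T) ** 2)  # RBF Gram matrix for 1-D inputs

# closed-form regularized solution: alpha = (K + lam*I)^{-1} y
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

def f(x):
    # the optimal f lives in the span of the training points
    # (representer theorem): f(x) = sum_i alpha_i k(x_i, x)
    return np.exp(-gamma * (x - X[:, 0]) ** 2) @ alpha
```

Despite the RKHS being infinite-dimensional, only the 30 coefficients α are ever optimized.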

12
Exploring Theory Roadmap
13
Outline
  • Introduction
  • Theory Basis
  • Reproducing Kernel Hilbert space(RKHS), Mercers
    theorem, Representer theorem, regularization
  • Kernel based learning algorithm
  • Supervised learning support vector
    machines(SVMs), kernel fisher discriminant (KFD)
  • Unsupervised learning one class SVM , kernel PCA
  • Kernel design
  • Standard kernels
  • Making kernels from kernels
  • Application oriented kernels Fisher kernel

14
Support Vector Machines-1: Quick overview
15
Support Vector Machines-2: Quick overview
16
Support Vector Machines-3
  • Parameter sparsity
  • Most a_i are zero
  • C: the regularization constant
  • Slack variables

17
Support Vector Machines-4: Optimization techniques
  • Chunking
  • Each step solves the problem containing all
    non-zero a_i plus some of the a_i violating the
    KKT conditions
  • Decomposition methods: SVM_light
  • The size of the subproblem is fixed; add and
    remove one sample in each iteration
  • Sequential minimal optimization (SMO)
  • Each iteration solves a quadratic problem of size
    two
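To make the size-2 subproblem concrete, here is a sketch of a single SMO update on a hypothetical two-point problem with a linear kernel (the data, and names like `eta`, `L`, `H`, follow the usual SMO derivation and are not from the slides). With only two multipliers, one clipped update solves the whole dual.

```python
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1.0, -1.0]
C = 1.0

def k(a, b):
    # linear kernel
    return a[0] * b[0] + a[1] * b[1]

alpha = [0.0, 0.0]
E = [-y[0], -y[1]]  # initial errors f(x_i) - y_i, with f = 0
# curvature of the two-variable quadratic subproblem
eta = k(X[0], X[0]) + k(X[1], X[1]) - 2 * k(X[0], X[1])

a2_old = alpha[1]
a2_new = a2_old + y[1] * (E[0] - E[1]) / eta  # unconstrained optimum
# clip to the feasible segment implied by 0 <= alpha_i <= C and the
# equality constraint (here y_1 != y_2)
L = max(0.0, a2_old - alpha[0])
H = min(C, C + a2_old - alpha[0])
alpha[1] = min(H, max(L, a2_new))
alpha[0] += y[0] * y[1] * (a2_old - alpha[1])  # keep sum(alpha_i*y_i) fixed

# recover the primal weight vector w = sum_i alpha_i y_i x_i
w = (sum(a * yi * x[0] for a, yi, x in zip(alpha, y, X)),
     sum(a * yi * x[1] for a, yi, x in zip(alpha, y, X)))
```

Here both multipliers land at 0.5 and w = (1, 0), the maximum-margin separator for these two points.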

18
Kernel Fisher Discriminant-1: Overview of LDA
  • Fisher's discriminant (or LDA): find the linear
    projection with the most discriminative direction
  • Maximize the Rayleigh coefficient
    J(w) = (w^T S_B w) / (w^T S_W w)
  • where S_W is the within-class variance and S_B
    is the between-class variance.
  • Comparison with PCA
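A small numerical sketch of Fisher's discriminant on toy two-class data (all data and parameters below are illustrative assumptions): the direction maximizing the Rayleigh coefficient has the closed form w = S_W^{-1}(m_2 - m_1).

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 2))                          # class 1 near (0, 0)
B = rng.standard_normal((50, 2)) + np.array([3.0, 1.0])   # class 2 near (3, 1)

mA, mB = A.mean(axis=0), B.mean(axis=0)
# within-class scatter S_W (sum of per-class scatter matrices)
S_W = (A - mA).T @ (A - mA) + (B - mB).T @ (B - mB)
# closed-form maximizer of the Rayleigh coefficient
w = np.linalg.solve(S_W, mB - mA)

def rayleigh(u):
    # J(u) = (u^T S_B u) / (u^T S_W u), with S_B = (mB-mA)(mB-mA)^T
    return (u @ (mB - mA)) ** 2 / (u @ S_W @ u)
```

Projecting onto w separates the class means as far as possible relative to the within-class spread; any other direction gives a smaller J.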

19
Kernel Fisher Discriminant-2
  • KFD solves the problem of Fishers linear
    discriminant to get a nonlinear discriminant in
    input space.
  • One can express w in terms of mapped training
    patterns
  • The optimization problem for the KFD can be
    written as

20
Kernel PCA -1
  • The basic idea of PCA find a set of orthogonal
    directions that capture most of the variance in
    the data.
  • However, sometimes the clusters are more
  • than N (N is the number of dimensions)
  • Kernel PCA tries to map the data into a higher
    dimensional space and perform standard PCA. Using
    the kernel trick, we can do all our calculations
    in a lower dimension.

21
Kernel PCA -2
  • Covariance matrix
  • By definition
  • Then we have
  • Define the gram matrix
  • At last we have
  • Therefore we simply have to solve an eigenvalue
    problem on the Gram matrix.
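The derivation above can be sketched end-to-end (synthetic two-ring data and the kernel width are illustrative assumptions; the feature-space centering step, standard in kernel PCA, is included even though the slide omits it):

```python
import numpy as np

rng = np.random.default_rng(1)
# two concentric rings: structure that linear PCA cannot untangle
t = rng.uniform(0.0, 2.0 * np.pi, 100)
r = np.r_[np.ones(50), 3.0 * np.ones(50)]
X = np.c_[r * np.cos(t), r * np.sin(t)]

# RBF Gram matrix (gamma = 0.5 is an assumed value)
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq)

# center the mapped data in feature space: Kc = (I - 1/N) K (I - 1/N)
N = len(X)
J = np.eye(N) - np.ones((N, N)) / N
Kc = J @ K @ J

# kernel PCA reduces to an eigenvalue problem on the centered Gram matrix
eigvals, eigvecs = np.linalg.eigh(Kc)
# projections of the data onto the first kernel principal component
proj1 = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
```

Note that everything is computed from the N×N matrix Kc; the feature map itself never appears.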

22
Outline
  • Introduction
  • Theory Basis
  • Reproducing Kernel Hilbert space(RKHS), Mercers
    theorem, Representer theorem, regularization
  • Kernel based learning algorithm
  • Supervised learning support vector
    machines(SVMs), kernel fisher discriminant (KFD)
  • Unsupervised learning one class SVM , kernel PCA
  • Kernel design
  • Standard kernels
  • Making kernels from kernels
  • Application oriented kernels Fisher kernel

23
Standard Kernels
24
Making Kernels out of Kernels
  • Theorem: if K1 and K2 are kernels, then so are
  • K(x, z) = K1(x, z) + K2(x, z)
  • K(x, z) = aK1(x, z), for a > 0
  • K(x, z) = K1(x, z) K2(x, z)
  • K(x, z) = f(x) f(z), for any function f
  • K(x, z) = K3(φ(x), φ(z))
  • Kernel selection
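The closure properties can be checked numerically (a sketch with arbitrary random data): the sum, positive scaling, and pointwise product of valid Gram matrices each remain positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 2))

K1 = X @ X.T               # linear kernel Gram matrix
K2 = (1.0 + X @ X.T) ** 2  # inhomogeneous polynomial kernel, degree 2

def is_psd(K):
    # a symmetric matrix is PSD iff all eigenvalues are >= 0
    return bool(np.all(np.linalg.eigvalsh(K) > -1e-8))

# sum, positive scaling, and elementwise (Schur) product of Gram matrices:
# each corresponds to one closure rule from the theorem above
combos = [K1 + K2, 3.0 * K1, K1 * K2]
assert all(is_psd(K) for K in combos)
```

The product case corresponds to the elementwise (Schur) product of the Gram matrices, which the Schur product theorem guarantees is PSD.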

25
Fisher Kernel
  • Jaakkola and Haussler proposed using a generative
    model inside a discriminative
    (non-probabilistic) kernel classifier.
  • Build an HMM model for each family
  • Compute the Fisher scores for each parameter of
    the HMM
  • Use the scores as features and predict with an
    SVM using an RBF kernel
  • Good performance for protein family
    classification
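To illustrate what a Fisher score is, here is a sketch using a one-dimensional Gaussian as the generative model (a deliberately simple stand-in for the HMM of the slides): the score is the gradient of the log-likelihood with respect to the model parameters, and those gradient vectors become the features fed to the SVM.

```python
import numpy as np

def fisher_score(x, mu=0.0, sigma=1.0):
    # Gradient of log N(x; mu, sigma^2) with respect to (mu, sigma).
    # For an HMM the same construction differentiates the log-likelihood
    # with respect to the transition/emission parameters instead.
    d_mu = (x - mu) / sigma**2
    d_sigma = ((x - mu) ** 2 - sigma**2) / sigma**3
    return np.array([d_mu, d_sigma])

# Fisher-kernel feature vectors for a few observations; an SVM with an
# RBF kernel would then be trained on these score vectors
features = np.stack([fisher_score(x) for x in (-1.0, 0.2, 2.5)])
```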