Title: Support Vector Machines (SVM): A Tool for Machine Learning
1. Support Vector Machines (SVM): A Tool for Machine Learning
- Yixin Chen
- Ph.D. Candidate, CSE
- 1/10/2002
2. Presentation Outline
- Introduction
- Linear Learning Machines
- Support Vector Machines (SVM)
- Examples
- Conclusions
3. Introduction
- Building machines capable of learning from experience.
- Experience is usually specified by a finite amount of training data.
- The goal is to achieve high generalization performance via learning from the training set.
- The construction of a good learning machine is a compromise between the accuracy attained on a particular training set and the capacity of the machine.
- SVMs have large learning capacity and can have excellent generalization performance.
4. Linear Learning Machines
- Binary classification uses a linear function g(x) = w^t x + w_0.
- x is the feature vector, w is the weight vector, and w_0 is the bias or threshold weight.
- A two-category classifier implements the decision rule: decide class 1 if g(x) > 0 and class −1 if g(x) < 0 (see the sketch below).
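A minimal sketch of this decision rule, assuming NumPy; the weight vector, bias, and test points are arbitrary values chosen for illustration.

```python
import numpy as np

def linear_classifier(x, w, w0):
    """Decide class 1 if g(x) = w^t x + w_0 > 0, otherwise class -1."""
    g = np.dot(w, x) + w0
    return 1 if g > 0 else -1

# Hypothetical 2-D weight vector and bias.
w = np.array([2.0, -1.0])
w0 = 0.5

print(linear_classifier(np.array([1.0, 0.5]), w, w0))   # g = 2.0  -> class 1
print(linear_classifier(np.array([-1.0, 1.0]), w, w0))  # g = -2.5 -> class -1
```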
5. A Simple Linear Classifier
6. Some Properties of Linear Learning Machines
- Decision surface is a hyperplane.
- The feature space is divided into two half-spaces.
7. Several Questions
- Does there exist a hyperplane which separates the training set?
- If yes, how do we compute it?
- Is it unique?
- If it is not unique, can we find an optimal one, and how?
- What can we do if no separating hyperplane exists?
8. Facts
- If the training set is linearly separable, then there exist infinitely many separating hyperplanes for the given training set.
- If the training set is linearly inseparable, then there does not exist any separating hyperplane for the given training set.
9. Support Vector Machines
10. Support Vector Machines
H1: w^t x − w_0 = 1,  H: w^t x − w_0 = 0,  H2: w^t x − w_0 = −1
11. Support Vector Machines
- Maximize the margin ⇔ minimize ‖w‖²/2
12. Support Vector Machines
- Quadratic Program (Maximal Margin)
- min_{w,w_0} ‖w‖²/2,
- s.t. w^t x_i − w_0 ≥ 1 for y_i = 1, and w^t x_i − w_0 ≤ −1 for y_i = −1
- (or equivalently y_i(w^t x_i − w_0) ≥ 1)
- Dual QP (Maximal Margin)
- min_α 0.5 Σ_{i=1,…,m} Σ_{j=1,…,m} y_i y_j α_i α_j x_i^t x_j − Σ_{i=1,…,m} α_i
- s.t. Σ_{i=1,…,m} y_i α_i = 0, α_i ≥ 0, i = 1,…,m
- Support Vectors: the training points x_i with α_i > 0.
- w is a linear combination of the support vectors (see the sketch below).
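A minimal sketch, assuming scikit-learn, of the maximal-margin classifier on a tiny hypothetical data set; it recovers w from the dual solution to show that w is a linear combination of the support vectors. The very large C is an assumption used to approximate the hard-margin formulation.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable toy set (hypothetical data, for illustration only).
X = np.array([[1.0, 1.0], [2.0, 2.5], [2.5, 1.5],
              [-1.0, -1.0], [-2.0, -2.5], [-1.5, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin (maximal margin) formulation.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Recover w from the dual solution: w = sum_i alpha_i * y_i * x_i.
# sklearn's dual_coef_ stores alpha_i * y_i for the support vectors only.
w_from_dual = clf.dual_coef_ @ clf.support_vectors_
print("support vectors:\n", clf.support_vectors_)
print("w from dual:   ", w_from_dual)
print("w from sklearn:", clf.coef_)   # identical to the dual reconstruction
```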
13. Support Vector Machines
14. Support Vector Machines
- Maximize Margin and Minimize Error (Soft Margin)
- min_{w,w_0,z} ‖w‖²/2 + C Σ_{i=1,…,m} z_i,
- s.t. y_i(w^t x_i − w_0) + z_i ≥ 1,
- z_i ≥ 0, i = 1,…,m.
- (z_i is a slack or error variable)
- Dual QP (Soft Margin)
- min_α 0.5 Σ_{i=1,…,m} Σ_{j=1,…,m} y_i y_j α_i α_j x_i^t x_j − Σ_{i=1,…,m} α_i
- s.t. Σ_{i=1,…,m} y_i α_i = 0
- C ≥ α_i ≥ 0, i = 1,…,m
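A minimal sketch, assuming scikit-learn, of how the soft-margin parameter C trades margin width against training errors; the overlapping Gaussian blobs and the particular C values are illustrative assumptions, not data from the slides.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs: not linearly separable (hypothetical data).
X = np.vstack([rng.normal(loc=[-1.0, 0.0], scale=1.0, size=(50, 2)),
               rng.normal(loc=[+1.0, 0.0], scale=1.0, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in (0.01, 1.0, 1000.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>7}: {clf.n_support_.sum():3d} support vectors, "
          f"training accuracy = {clf.score(X, y):.2f}")

# Small C tolerates more slack (wider margin, more support vectors);
# large C penalizes violations heavily (narrower margin).
```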
15. Support Vector Machines
- Nonlinear Mappings via Kernels
- Idea: map the original features into a higher-dimensional feature space, x → φ(x), and design the classifier in the new feature space. The classifier is nonlinear in the original feature space but linear in the new feature space. (With an appropriate nonlinear mapping to a sufficiently high dimension, data from two categories can always be separated by a hyperplane.)
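A minimal sketch of this idea with an explicit (and assumed, purely illustrative) mapping φ(x) = (x, x²): 1-D points labeled by whether |x| > 1 are not linearly separable on the line, but become separable by a hyperplane in the mapped space.

```python
import numpy as np

x = np.array([-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0])
y = np.where(np.abs(x) > 1, 1, -1)            # class +1 outside [-1, 1], -1 inside

phi = np.column_stack([x, x ** 2])            # explicit feature map phi(x) = (x, x^2)

# In the mapped space the linear rule w^t phi(x) + w_0 with w = (0, 1), w_0 = -1
# separates the two classes perfectly, although no threshold on x alone could.
w, w0 = np.array([0.0, 1.0]), -1.0
pred = np.where(phi @ w + w0 > 0, 1, -1)
print("labels:     ", y)
print("predictions:", pred)                    # matches the labels exactly
```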
16. Support Vector Machines
- Maximal Margin
- min_α 0.5 Σ_{i=1,…,m} Σ_{j=1,…,m} y_i y_j α_i α_j φ(x_i)^t φ(x_j) − Σ_{i=1,…,m} α_i
- s.t. Σ_{i=1,…,m} y_i α_i = 0, α_i ≥ 0, i = 1,…,m
- Soft Margin
- min_α 0.5 Σ_{i=1,…,m} Σ_{j=1,…,m} y_i y_j α_i α_j φ(x_i)^t φ(x_j) − Σ_{i=1,…,m} α_i
- s.t. Σ_{i=1,…,m} y_i α_i = 0, C ≥ α_i ≥ 0, i = 1,…,m
17. Support Vector Machines
- Role of Kernels
- Simplify the computation of the inner product in the new feature space:
- K(x, y) = φ(x)^t φ(y).
- Some Popular Kernels
- Polynomial: K(x, y) = (x^t y + 1)^p
- Gaussian: K(x, y) = e^(−‖x−y‖²/(2σ²))
- Sigmoid: K(x, y) = tanh(κ x^t y − δ)
- Maximal Margin and Soft Margin
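A minimal sketch, assuming NumPy, of the three kernels listed above; the parameter values p, σ, κ, and δ are arbitrary illustrative choices.

```python
import numpy as np

def polynomial_kernel(x, y, p=2):
    """Polynomial kernel K(x, y) = (x^t y + 1)^p."""
    return (np.dot(x, y) + 1.0) ** p

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=0.5, delta=0.0):
    """Sigmoid kernel K(x, y) = tanh(kappa x^t y - delta)."""
    return np.tanh(kappa * np.dot(x, y) - delta)

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(polynomial_kernel(x, y), gaussian_kernel(x, y), sigmoid_kernel(x, y))
```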
18. Support Vector Machines
- Maximal Margin
- min_α 0.5 Σ_{i=1,…,m} Σ_{j=1,…,m} y_i y_j α_i α_j K(x_i, x_j) − Σ_{i=1,…,m} α_i
- s.t. Σ_{i=1,…,m} y_i α_i = 0, α_i ≥ 0, i = 1,…,m
- Soft Margin
- min_α 0.5 Σ_{i=1,…,m} Σ_{j=1,…,m} y_i y_j α_i α_j K(x_i, x_j) − Σ_{i=1,…,m} α_i
- s.t. Σ_{i=1,…,m} y_i α_i = 0, C ≥ α_i ≥ 0, i = 1,…,m
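A minimal sketch, assuming the cvxopt package, that solves the kernelized soft-margin dual above as a quadratic program on a tiny hypothetical XOR-like data set; the Gaussian kernel width and the value of C are arbitrary choices.

```python
import numpy as np
from cvxopt import matrix, solvers

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

# Hypothetical XOR-like toy data (not linearly separable in the original space).
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m, C = len(y), 1000.0

K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])
P = matrix(np.outer(y, y) * K)                   # P_ij = y_i y_j K(x_i, x_j)
q = matrix(-np.ones(m))                          # objective: 0.5 a'Pa - sum_i a_i
G = matrix(np.vstack([-np.eye(m), np.eye(m)]))   # -a_i <= 0  and  a_i <= C
h = matrix(np.hstack([np.zeros(m), C * np.ones(m)]))
A = matrix(y.reshape(1, -1))                     # equality: sum_i y_i a_i = 0
b = matrix(0.0)

solvers.options["show_progress"] = False
alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])
print("alpha:", np.round(alpha, 3))              # nonzero alphas mark support vectors
```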
19. Examples
20. Checker-Board Problem
169 training samples, Gaussian kernel, soft margin, C = 1000
21. Checker-Board Problem
169 training samples, Gaussian kernel, soft margin, C = 1000
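A minimal sketch, assuming scikit-learn, of a checker-board experiment in the same spirit: 169 random points on a 4×4 board, Gaussian (RBF) kernel, soft margin with C = 1000. The board layout, random seed, and kernel width are assumptions, not the slides' original setup.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.uniform(0, 4, size=(169, 2))              # 169 random points on a 4x4 board
y = np.where((np.floor(X[:, 0]) + np.floor(X[:, 1])) % 2 == 0, 1, -1)

# Gaussian (RBF) kernel, soft margin with C = 1000; gamma is an assumed kernel width.
clf = SVC(kernel="rbf", gamma=2.0, C=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors:", clf.n_support_.sum(), "of", len(y))
```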
22. Examples
23. Two-Spiral Problem
154 training samples, Gaussian kernel, soft margin, C = 1000
24. Two-Spiral Problem
154 training samples, Gaussian kernel, soft margin, C = 1000
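A minimal sketch, assuming scikit-learn, of a two-spiral experiment in the same spirit: 154 points (77 per spiral), Gaussian (RBF) kernel, soft margin with C = 1000. The spiral generator and the kernel width are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

n = 77                                            # 77 points per spiral -> 154 in total
t = np.linspace(np.pi / 16, 4 * np.pi, n)
spiral1 = np.column_stack([t * np.cos(t), t * np.sin(t)])
spiral2 = -spiral1                                # second spiral, rotated by 180 degrees
X = np.vstack([spiral1, spiral2])
y = np.array([1] * n + [-1] * n)

# Gaussian (RBF) kernel, soft margin with C = 1000; gamma is an assumed kernel width.
clf = SVC(kernel="rbf", gamma=0.5, C=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
```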
25. Conclusions
- Advantages
- Training always finds a global minimum, since the optimization problem is a convex QP.
- Simple and clear geometric interpretation.
- Limitations
- The choice of kernel (and its parameters) is left to the user.
- Training a multi-class SVM in one step is difficult; the basic formulation is binary.
26. References
- N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
- R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, Inc., 2001.
- C. J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 2(2), 121-167, 1998.
- K. P. Bennett and C. Campbell, Support Vector Machines: Hype or Hallelujah?, SIGKDD Explorations, 2(2), 1-13, 2000.
- SVMLight, http://svmlight.joachims.org/