Title: Support Vector Machines (SVM): A Tool for Machine Learning
1. Support Vector Machines (SVM): A Tool for Machine Learning
- Yixin Chen
- Ph.D. Candidate, CSE
- 1/10/2002
2. Presentation Outline
- Introduction
- Linear Learning Machines
- Support Vector Machines (SVM)
- Examples
- Conclusions
3. Introduction
- Building machines capable of learning from experience.
- Experience is usually specified by a finite amount of training data.
- The goal is to achieve high generalization performance via learning from the training set.
- The construction of a good learning machine is a compromise between the accuracy attained on a particular training set and the capacity of the machine.
- SVMs have large learning capacity and can have excellent generalization performance.
4. Linear Learning Machines
- Binary classification uses a linear function g(x) = w^t x + w_0.
- x is the feature vector, w is the weight vector, and w_0 is the bias or threshold weight.
- A two-category classifier implements the decision rule: decide class 1 if g(x) > 0 and class −1 if g(x) < 0 (see the sketch below).
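A minimal sketch of this decision rule, assuming NumPy; the weight vector, bias, and test points are arbitrary values chosen for illustration.

```python
import numpy as np

def linear_classifier(x, w, w0):
    """Decide class 1 if g(x) = w^t x + w_0 > 0, otherwise class -1."""
    g = np.dot(w, x) + w0
    return 1 if g > 0 else -1

# Hypothetical 2-D weight vector and bias.
w = np.array([2.0, -1.0])
w0 = 0.5

print(linear_classifier(np.array([1.0, 0.5]), w, w0))   # g = 2.0  -> class 1
print(linear_classifier(np.array([-1.0, 1.0]), w, w0))  # g = -2.5 -> class -1
```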
5. A Simple Linear Classifier
6. Some Properties of Linear Learning Machines
- Decision surface is a hyperplane.
- The feature space is divided into two half-spaces.
7. Several Questions
- Does there exist a hyperplane which separates the training set?
- If yes, how do we compute it?
- Is it unique?
- If it is not unique, can we find an optimal one, and how?
- What can we do if no separating hyperplane exists?
8. Facts
- If the training set is linearly separable, then there exist infinitely many separating hyperplanes for the given training set.
- If the training set is linearly inseparable, then there does not exist any separating hyperplane for the given training set.
9. Support Vector Machines
10. Support Vector Machines
H1: w^t x − w_0 = 1,  H: w^t x − w_0 = 0,  H2: w^t x − w_0 = −1
11. Support Vector Machines
- Maximize the margin ⇔ minimize ‖w‖²/2
12. Support Vector Machines
- Quadratic Program (Maximal Margin)
- min_{w,w_0} ‖w‖²/2,
- s.t. w^t x_i − w_0 ≥ 1 for y_i = 1, and w^t x_i − w_0 ≤ −1 for y_i = −1
- (or equivalently y_i(w^t x_i − w_0) ≥ 1)
- Dual QP (Maximal Margin)
- min_α 0.5 Σ_{i=1,…,m} Σ_{j=1,…,m} y_i y_j α_i α_j x_i^t x_j − Σ_{i=1,…,m} α_i
- s.t. Σ_{i=1,…,m} y_i α_i = 0, α_i ≥ 0, i = 1,…,m
- Support Vectors: the training points x_i with α_i > 0.
- w is a linear combination of the support vectors (see the sketch below).
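A minimal sketch, assuming scikit-learn, of the maximal-margin classifier on a tiny hypothetical data set; it recovers w from the dual solution to show that w is a linear combination of the support vectors. The very large C is an assumption used to approximate the hard-margin formulation.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable toy set (hypothetical data, for illustration only).
X = np.array([[1.0, 1.0], [2.0, 2.5], [2.5, 1.5],
              [-1.0, -1.0], [-2.0, -2.5], [-1.5, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin (maximal margin) formulation.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Recover w from the dual solution: w = sum_i alpha_i * y_i * x_i.
# sklearn's dual_coef_ stores alpha_i * y_i for the support vectors only.
w_from_dual = clf.dual_coef_ @ clf.support_vectors_
print("support vectors:\n", clf.support_vectors_)
print("w from dual:   ", w_from_dual)
print("w from sklearn:", clf.coef_)   # identical to the dual reconstruction
```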
13. Support Vector Machines
14. Support Vector Machines
- Maximize Margin and Minimize Error (Soft Margin)
- min_{w,w_0,z} ‖w‖²/2 + C Σ_{i=1,…,m} z_i,
- s.t. y_i(w^t x_i − w_0) + z_i ≥ 1,
- z_i ≥ 0, i = 1,…,m.
- (z_i is a slack or error variable)
- Dual QP (Soft Margin)
- min_α 0.5 Σ_{i=1,…,m} Σ_{j=1,…,m} y_i y_j α_i α_j x_i^t x_j − Σ_{i=1,…,m} α_i
- s.t. Σ_{i=1,…,m} y_i α_i = 0
- C ≥ α_i ≥ 0, i = 1,…,m
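A minimal sketch, assuming scikit-learn, of how the soft-margin parameter C trades margin width against training errors; the overlapping Gaussian blobs and the particular C values are illustrative assumptions, not data from the slides.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs: not linearly separable (hypothetical data).
X = np.vstack([rng.normal(loc=[-1.0, 0.0], scale=1.0, size=(50, 2)),
               rng.normal(loc=[+1.0, 0.0], scale=1.0, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in (0.01, 1.0, 1000.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>7}: {clf.n_support_.sum():3d} support vectors, "
          f"training accuracy = {clf.score(X, y):.2f}")

# Small C tolerates more slack (wider margin, more support vectors);
# large C penalizes violations heavily (narrower margin).
```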
15. Support Vector Machines
- Nonlinear Mappings via Kernels
- Idea: map the original features into a higher-dimensional feature space, x → φ(x), and design the classifier in the new feature space. The classifier is nonlinear in the original feature space but linear in the new feature space. (With an appropriate nonlinear mapping to a sufficiently high dimension, data from two categories can always be separated by a hyperplane.)
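A minimal sketch of this idea with an explicit (and assumed, purely illustrative) mapping φ(x) = (x, x²): 1-D points labeled by whether |x| > 1 are not linearly separable on the line, but become separable by a hyperplane in the mapped space.

```python
import numpy as np

x = np.array([-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0])
y = np.where(np.abs(x) > 1, 1, -1)            # class +1 outside [-1, 1], -1 inside

phi = np.column_stack([x, x ** 2])            # explicit feature map phi(x) = (x, x^2)

# In the mapped space the linear rule w^t phi(x) + w_0 with w = (0, 1), w_0 = -1
# separates the two classes perfectly, although no threshold on x alone could.
w, w0 = np.array([0.0, 1.0]), -1.0
pred = np.where(phi @ w + w0 > 0, 1, -1)
print("labels:     ", y)
print("predictions:", pred)                    # matches the labels exactly
```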
16. Support Vector Machines
- Maximal Margin
- min_α 0.5 Σ_{i=1,…,m} Σ_{j=1,…,m} y_i y_j α_i α_j φ(x_i)^t φ(x_j) − Σ_{i=1,…,m} α_i
- s.t. Σ_{i=1,…,m} y_i α_i = 0, α_i ≥ 0, i = 1,…,m
- Soft Margin
- min_α 0.5 Σ_{i=1,…,m} Σ_{j=1,…,m} y_i y_j α_i α_j φ(x_i)^t φ(x_j) − Σ_{i=1,…,m} α_i
- s.t. Σ_{i=1,…,m} y_i α_i = 0, C ≥ α_i ≥ 0, i = 1,…,m
17. Support Vector Machines
- Role of Kernels
- Simplify the computation of the inner product in the new feature space:
- K(x, y) = φ(x)^t φ(y).
- Some Popular Kernels
- Polynomial: K(x, y) = (x^t y + 1)^p
- Gaussian: K(x, y) = e^(−‖x−y‖²/(2σ²))
- Sigmoid: K(x, y) = tanh(κ x^t y − δ)
- Maximal Margin and Soft Margin
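A minimal sketch, assuming NumPy, of the three kernels listed above; the parameter values p, σ, κ, and δ are arbitrary illustrative choices.

```python
import numpy as np

def polynomial_kernel(x, y, p=2):
    """Polynomial kernel K(x, y) = (x^t y + 1)^p."""
    return (np.dot(x, y) + 1.0) ** p

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=0.5, delta=0.0):
    """Sigmoid kernel K(x, y) = tanh(kappa x^t y - delta)."""
    return np.tanh(kappa * np.dot(x, y) - delta)

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(polynomial_kernel(x, y), gaussian_kernel(x, y), sigmoid_kernel(x, y))
```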
18. Support Vector Machines
- Maximal Margin
- min_α 0.5 Σ_{i=1,…,m} Σ_{j=1,…,m} y_i y_j α_i α_j K(x_i, x_j) − Σ_{i=1,…,m} α_i
- s.t. Σ_{i=1,…,m} y_i α_i = 0, α_i ≥ 0, i = 1,…,m
- Soft Margin
- min_α 0.5 Σ_{i=1,…,m} Σ_{j=1,…,m} y_i y_j α_i α_j K(x_i, x_j) − Σ_{i=1,…,m} α_i
- s.t. Σ_{i=1,…,m} y_i α_i = 0, C ≥ α_i ≥ 0, i = 1,…,m
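A minimal sketch, assuming the cvxopt package, that solves the kernelized soft-margin dual above as a quadratic program on a tiny hypothetical XOR-like data set; the Gaussian kernel width and the value of C are arbitrary choices.

```python
import numpy as np
from cvxopt import matrix, solvers

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

# Hypothetical XOR-like toy data (not linearly separable in the original space).
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m, C = len(y), 1000.0

K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])
P = matrix(np.outer(y, y) * K)                   # P_ij = y_i y_j K(x_i, x_j)
q = matrix(-np.ones(m))                          # objective: 0.5 a'Pa - sum_i a_i
G = matrix(np.vstack([-np.eye(m), np.eye(m)]))   # -a_i <= 0  and  a_i <= C
h = matrix(np.hstack([np.zeros(m), C * np.ones(m)]))
A = matrix(y.reshape(1, -1))                     # equality: sum_i y_i a_i = 0
b = matrix(0.0)

solvers.options["show_progress"] = False
alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])
print("alpha:", np.round(alpha, 3))              # nonzero alphas mark support vectors
```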
19. Examples
20. Checker-Board Problem
169 training samples, Gaussian kernel, soft margin, C = 1000
21. Checker-Board Problem
169 training samples, Gaussian kernel, soft margin, C = 1000
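A minimal sketch, assuming scikit-learn, of a checker-board experiment in the same spirit: 169 random points on a 4×4 board, Gaussian (RBF) kernel, soft margin with C = 1000. The board layout, random seed, and kernel width are assumptions, not the slides' original setup.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.uniform(0, 4, size=(169, 2))              # 169 random points on a 4x4 board
y = np.where((np.floor(X[:, 0]) + np.floor(X[:, 1])) % 2 == 0, 1, -1)

# Gaussian (RBF) kernel, soft margin with C = 1000; gamma is an assumed kernel width.
clf = SVC(kernel="rbf", gamma=2.0, C=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors:", clf.n_support_.sum(), "of", len(y))
```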
22. Examples
23. Two-Spiral Problem
154 training samples, Gaussian kernel, soft margin, C = 1000
24. Two-Spiral Problem
154 training samples, Gaussian kernel, soft margin, C = 1000
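A minimal sketch, assuming scikit-learn, of a two-spiral experiment in the same spirit: 154 points (77 per spiral), Gaussian (RBF) kernel, soft margin with C = 1000. The spiral generator and the kernel width are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

n = 77                                            # 77 points per spiral -> 154 in total
t = np.linspace(np.pi / 16, 4 * np.pi, n)
spiral1 = np.column_stack([t * np.cos(t), t * np.sin(t)])
spiral2 = -spiral1                                # second spiral, rotated by 180 degrees
X = np.vstack([spiral1, spiral2])
y = np.array([1] * n + [-1] * n)

# Gaussian (RBF) kernel, soft margin with C = 1000; gamma is an assumed kernel width.
clf = SVC(kernel="rbf", gamma=0.5, C=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
```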
25. Conclusions
- Advantages
- Training always finds a global minimum, since the optimization problem is a convex QP.
- Simple and clear geometric interpretation.
- Limitations
- The choice of kernel (and its parameters) is left to the user.
- Training a multi-class SVM in one step is difficult; the basic formulation is binary.
26. References
- N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
- R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, Inc., 2001.
- C. J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 2(2), 121-167, 1998.
- K. P. Bennett and C. Campbell, Support Vector Machines: Hype or Hallelujah?, SIGKDD Explorations, 2(2), 1-13, 2000.
- SVMLight, http://svmlight.joachims.org/