Support Vector Machines (SVM): A Tool for Machine Learning

1
Support Vector Machines (SVM): A Tool for Machine Learning
  • Yixin Chen
  • Ph.D. Candidate, CSE
  • 1/10/2002

2
Presentation Outline
  • Introduction
  • Linear Learning Machines
  • Support Vector Machines (SVM)
  • Examples
  • Conclusions

3
Introduction
  • Building machines capable of learning from
    experience.
  • The experience is usually specified by a finite
    amount of training data.
  • The goal is to achieve high generalization
    performance via learning from the training set.
  • The construction of a good learning machine is a
    compromise between the accuracy attained on a
    particular training set and the capacity of the
    machine.
  • SVMs have a large learning capacity and can
    achieve excellent generalization performance.

4
Linear Learning Machines
  • Binary classification uses a linear function
    g(x) = wᵗx + w₀.
  • x is the feature vector, w is the weight vector,
    and w₀ is the bias or threshold weight.
  • A two-category classifier implements the decision
    rule: decide class 1 if g(x) > 0 and class -1 if
    g(x) < 0 (a minimal sketch of this rule follows
    below).
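As a concrete illustration (not part of the original slides), here is a minimal sketch of this decision rule in Python/NumPy. The weight vector and bias are made-up values for a 2-D feature space:

```python
import numpy as np

# Hypothetical weight vector and bias for a 2-D feature space.
w = np.array([2.0, -1.0])   # weight vector
w0 = 0.5                    # bias / threshold weight

def g(x):
    """Linear discriminant g(x) = w^t x + w0."""
    return w @ x + w0

def classify(x):
    """Decide class +1 if g(x) > 0, class -1 if g(x) < 0."""
    return 1 if g(x) > 0 else -1

print(classify(np.array([1.0, 0.0])))   # g = 2.5 > 0  -> class  1
print(classify(np.array([0.0, 2.0])))   # g = -1.5 < 0 -> class -1
```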

5
A Simple Linear Classifier
6
Some Properties of Linear Learning Machines
  • Decision surface is a hyperplane.
  • The feature space is divided into two half-spaces.

7
Several Questions
  • Does there exist a hyperplane which separates the
    training set?
  • If yes, how to compute it?
  • Is it unique?
  • If not unique, can we and how can we find an
    optimal one?
  • What can we do if there doesnt exist one?

8
Facts
  • If the training set is linearly separable, then
    there exist infinitely many separating
    hyperplanes for the given training set.
  • If the training set is linearly inseparable, then
    there does not exist any separating hyperplane
    for the given training set.

9
Support Vector Machines
  • Linearly Separable

10
Support Vector Machines
  • Margin = 2/‖w‖

H1: wᵗx - w₀ = 1,   H: wᵗx - w₀ = 0,   H2: wᵗx - w₀ = -1
11
Support Vector Machines
  • Maximize the margin ⇔ minimize ‖w‖²/2

12
Support Vector Machines
  • Quadratic Program (Maximal Margin)
  • min_(w,w₀) ‖w‖²/2,
  • s.t. wᵗxᵢ - w₀ ≥ 1 for yᵢ = 1, and
    wᵗxᵢ - w₀ ≤ -1 for yᵢ = -1
  • (or equivalently yᵢ(wᵗxᵢ - w₀) ≥ 1)
  • Dual QP (Maximal Margin)
  • min_α 0.5 Σᵢ Σⱼ yᵢyⱼαᵢαⱼ xᵢᵗxⱼ - Σᵢ αᵢ
    (sums over i, j = 1,…,m)
  • s.t. Σᵢ yᵢαᵢ = 0, αᵢ ≥ 0, i = 1,…,m
  • Support Vectors
  • w is a linear combination of the support vectors
    (the training points xᵢ with αᵢ > 0); a solver
    sketch for this dual follows below.
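One way to see the dual QP in action is to hand it to a generic QP solver. The sketch below uses the cvxopt package (my choice of tool; the talk does not name a solver) on made-up, linearly separable toy data, then recovers w from the nonzero multipliers αᵢ:

```python
import numpy as np
from cvxopt import matrix, solvers

# Toy linearly separable data (made up for illustration): two points per class.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m = len(y)

# Dual QP:  min_a 0.5 a^t Q a - 1^t a   s.t.  y^t a = 0,  a >= 0,
# where Q[i, j] = y_i y_j x_i^t x_j.
Yx = y[:, None] * X
P = matrix(Yx @ Yx.T)
q = matrix(-np.ones((m, 1)))
G = matrix(-np.eye(m))           # encodes a_i >= 0 as -a_i <= 0
h = matrix(np.zeros((m, 1)))
A = matrix(y.reshape(1, -1))     # equality constraint y^t a = 0
b = matrix(0.0)

solvers.options['show_progress'] = False
alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])

# w is a linear combination of the support vectors (points with alpha_i > 0).
w = (alpha * y) @ X
support = alpha > 1e-6
print("support vectors:\n", X[support])
print("w =", w)
```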

13
Support Vector Machines
  • Linearly Inseparable

14
Support Vector Machines
  • Maximize Margin and Minimize Error (Soft Margin)
  • min_(w,w₀,z) ‖w‖²/2 + C Σᵢ zᵢ,
  • s.t. yᵢ(wᵗxᵢ - w₀) + zᵢ ≥ 1,
  • zᵢ ≥ 0, i = 1,…,m.
  • (zᵢ is a slack or error variable)
  • Dual QP (Soft Margin)
  • min_α 0.5 Σᵢ Σⱼ yᵢyⱼαᵢαⱼ xᵢᵗxⱼ - Σᵢ αᵢ
    (sums over i, j = 1,…,m)
  • s.t. Σᵢ yᵢαᵢ = 0
  • C ≥ αᵢ ≥ 0, i = 1,…,m (a library sketch follows
    below)
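In practice the soft-margin problem is rarely coded by hand; a library call suffices. A minimal sketch with scikit-learn (my choice of tool, not the talk's), where C is exactly the penalty on the slack variables zᵢ and the toy data are made up:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 1-D data: the class -1 point at 2.5 sits inside the class +1 region,
# so the data are not linearly separable and slack variables are needed.
X = np.array([[0.0], [1.0], [2.0], [3.0], [2.5]])
y = np.array([-1, -1, 1, 1, -1])

# Linear soft-margin SVM; larger C penalizes slack (training error) more.
clf = SVC(kernel='linear', C=10.0)
clf.fit(X, y)

print("support vectors:", clf.support_vectors_.ravel())
print("prediction for x = 0.5:", clf.predict(np.array([[0.5]])))
```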

15
Support Vector Machines
  • Nonlinear Mappings via Kernels
  • Idea: map the original features into a
    higher-dimensional feature space, x → φ(x), and
    design the classifier in the new feature space.
    The classifier is nonlinear in the original
    feature space but linear in the new feature
    space. (With an appropriate nonlinear mapping to
    a sufficiently high dimension, data from two
    categories can always be separated by a
    hyperplane.) A classic example of such a mapping
    is sketched below.
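A classic illustration of this idea (my example, not from the slides): XOR-labelled points are not linearly separable in 2-D, but after the hypothetical map φ(x₁, x₂) = (x₁, x₂, x₁x₂) they are, because the product coordinate alone separates the two classes:

```python
import numpy as np

# XOR-style data: opposite corners share a label; no line in 2-D separates them.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])

def phi(x):
    """Map (x1, x2) -> (x1, x2, x1*x2): nonlinear in the original space."""
    return np.array([x[0], x[1], x[0] * x[1]])

Z = np.array([phi(x) for x in X])
# In the new space the third coordinate already separates the classes:
# z3 = +1 for class +1 and z3 = -1 for class -1, so the hyperplane z3 = 0 works.
print(Z[:, 2] * y)   # all entries positive -> linearly separable after the mapping
```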

16
Support Vector Machines
  • Maximal Margin
  • min_α 0.5 Σᵢ Σⱼ yᵢyⱼαᵢαⱼ φ(xᵢ)ᵗφ(xⱼ) - Σᵢ αᵢ
    (sums over i, j = 1,…,m)
  • s.t. Σᵢ yᵢαᵢ = 0, αᵢ ≥ 0, i = 1,…,m
  • Soft Margin
  • min_α 0.5 Σᵢ Σⱼ yᵢyⱼαᵢαⱼ φ(xᵢ)ᵗφ(xⱼ) - Σᵢ αᵢ
  • s.t. Σᵢ yᵢαᵢ = 0, C ≥ αᵢ ≥ 0, i = 1,…,m

17
Support Vector Machines
  • Role of Kernels
  • Simplify the computation of the inner product in
    the new feature space:
  • K(x, y) = φ(x)ᵗφ(y).
  • Some Popular Kernels (written out in code below)
  • Polynomial: K(x, y) = (xᵗy + 1)ᵖ
  • Gaussian: K(x, y) = exp(-‖x - y‖² / 2σ²)
  • Sigmoid: K(x, y) = tanh(κ xᵗy - δ)
  • Maximal Margin and Soft Margin
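For concreteness, the three kernels above written out in NumPy (a sketch; the parameters p, σ, κ, δ are free and the values here are arbitrary):

```python
import numpy as np

def poly_kernel(x, y, p=2):
    """Polynomial kernel K(x, y) = (x^t y + 1)^p."""
    return (x @ y + 1.0) ** p

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, delta=0.0):
    """Sigmoid kernel K(x, y) = tanh(kappa * x^t y - delta)."""
    return np.tanh(kappa * (x @ y) - delta)

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(poly_kernel(x, y), gaussian_kernel(x, y), sigmoid_kernel(x, y))
```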

18
Support Vector Machines
  • Maximal Margin
  • min_α 0.5 Σᵢ Σⱼ yᵢyⱼαᵢαⱼ K(xᵢ, xⱼ) - Σᵢ αᵢ
    (sums over i, j = 1,…,m)
  • s.t. Σᵢ yᵢαᵢ = 0, αᵢ ≥ 0, i = 1,…,m
  • Soft Margin
  • min_α 0.5 Σᵢ Σⱼ yᵢyⱼαᵢαⱼ K(xᵢ, xⱼ) - Σᵢ αᵢ
  • s.t. Σᵢ yᵢαᵢ = 0, C ≥ αᵢ ≥ 0, i = 1,…,m
    (a kernel-matrix sketch follows below)
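These kernelized duals are what an off-the-shelf solver optimizes when it is given the Gram matrix K(xᵢ, xⱼ). A minimal sketch using scikit-learn's precomputed-kernel interface (the tool, the data, and σ are my assumptions, not from the talk):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)      # XOR-like labels, nonlinear boundary

def gram(A, B, sigma=1.0):
    """Gaussian Gram matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

clf = SVC(kernel='precomputed', C=1000.0)
clf.fit(gram(X, X), y)                           # train on the Gram matrix

X_test = rng.normal(size=(5, 2))
print(clf.predict(gram(X_test, X)))              # kernel between test and training points
```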

19
Examples
  • Checker-Board Problem

20
Checker-Board Problem
169 training samples, Gaussian kernel, soft margin, C = 1000
21
Checker-Board Problem
169 training samples, Gaussian kernel, soft margin, C = 1000
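The exact checker-board data set from the slides is not reproduced here, but the experiment can be sketched as follows. Only the 169 samples, the Gaussian kernel, the soft margin, and C = 1000 come from the slide; the board size, gamma, and the random sampling are my assumptions:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Sample 169 points uniformly on [0, 4) x [0, 4); the label is decided by
# which cell of a 4x4 checker board the point falls into.
X = rng.uniform(0.0, 4.0, size=(169, 2))
y = np.where((np.floor(X[:, 0]) + np.floor(X[:, 1])) % 2 == 0, 1, -1)

clf = SVC(kernel='rbf', gamma=2.0, C=1000.0)     # Gaussian kernel, soft margin
clf.fit(X, y)

print("training accuracy:", clf.score(X, y))
print("number of support vectors:", clf.support_vectors_.shape[0])
```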
22
Examples
  • Two-Spiral Problem

23
Two-Spiral Problem
154 training samples, Gaussian kernel, soft margin, C = 1000
24
Two-Spiral Problem
154 training samples, Gaussian kernel, soft margin, C = 1000
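Likewise, a hedged sketch of a two-spiral experiment. The spiral parametrization and the 77 points per arm are my guesses at the setup; only the 154 samples, the Gaussian kernel, the soft margin, and C = 1000 come from the slide:

```python
import numpy as np
from sklearn.svm import SVC

# Two interleaved Archimedean spirals, 77 points each (154 samples in total).
t = np.linspace(np.pi / 4, 4 * np.pi, 77)
spiral1 = np.c_[t * np.cos(t), t * np.sin(t)]
spiral2 = -spiral1                              # the second arm, rotated by 180 degrees
X = np.vstack([spiral1, spiral2])
y = np.r_[np.ones(77), -np.ones(77)]

clf = SVC(kernel='rbf', gamma=0.5, C=1000.0)    # Gaussian kernel, soft margin
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```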
25
Conclusions
  • Advantages
  • Training solves a convex QP, so it always finds a
    global minimum.
  • Simple and clear geometric interpretation.
  • Limitations
  • The choice of kernel (and its parameters).
  • Training a multi-class SVM in one step.

26
References
  • N. Cristianini and J. Shawe-Taylor, An
    Introduction to Support Vector Machines and Other
    Kernel-Based Learning Methods, Cambridge
    University Press, 2000.
  • R. O. Duda, P. E. Hart, and D. G. Stork, Pattern
    Classification, John Wiley & Sons, Inc., 2001.
  • C. J. C. Burges, A Tutorial on Support Vector
    Machines for Pattern Recognition, Data Mining and
    Knowledge Discovery, 2(2), 121-167, 1998.
  • K. P. Bennett and C. Campbell, Support Vector
    Machines: Hype or Hallelujah?, SIGKDD
    Explorations, 2(2), 1-13, 2000.
  • SVMLight, http://svmlight.joachims.org/
