1
Introduction to Support Vector Machines
  Raluca Paiu - PHAROS Reading Group
2
SVMs - Usage
  • SVMs are a set of supervised learning methods
  • Classification
    • Handwritten digit recognition
    • Object recognition
    • Speaker identification
    • Face detection in images
    • Text categorization
  • Regression
    • Curve fitting
    • Forecasting

3
SVMs - Basic Idea
  • Data classification
  • E.g. data represented as points in R^2
  • Can we find a 1-dimensional line that separates
    them?
  • Constraint: we want to find the line that
    maximizes the separation (margin) between the
    two classes of points.
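As an illustration of this idea, here is a minimal sketch using scikit-learn's SVC with a linear kernel on a small 2-D toy set (the points and the library call are assumptions for illustration, not part of the slides):

  import numpy as np
  from sklearn.svm import SVC

  # Two small clusters of 2-D points, one per class (toy data, assumed)
  X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                [4.0, 4.0], [4.5, 5.0], [5.0, 4.5]])
  y = np.array([-1, -1, -1, 1, 1, 1])

  # A linear SVM looks for the separating line with the largest margin
  clf = SVC(kernel="linear", C=1e6)  # very large C approximates the hard-margin case
  clf.fit(X, y)

  print(clf.coef_, clf.intercept_)   # the w and offset of the learned separating line
  print(clf.support_vectors_)        # the points that define the margin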

4
SVMs - Generalization
  • Data is represented as vectors in R^n
  • Build an (n-1)-dimensional hyperplane separating
    the data
  • Two parallel hyperplanes are created, one on each
    side of this hyperplane.
  • Find the hyperplane that maximizes the distance
    between the two parallel hyperplanes.

5
SVMs - Formalization 1
  • Points (x1, c1), (x2, c2), ..., (xn, cn)
  • xi: n-dimensional vectors (of scaled values,
    usually in [0, 1] or [-1, 1])
  • ci: a constant in {-1, 1}, the class of xi
  • The separating hyperplane:
  • w · x - b = 0
  • Problem: find w and b

6
SVMs Formalization 2
  • xi w - b 1, for ci 1 (1)
  • xi w - b -1, for ci -1 (2)
  • Distance between planes
  • 2 / w
  • Goal maximize (2 / w) ? minimize w
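For completeness, the width 2 / ||w|| follows from the standard point-to-plane distance formula (this derivation is not spelled out on the slide):

  For a point $x_0$ with $x_0 \cdot w - b = -1$, its distance to the plane
  $x \cdot w - b = 1$ is
  \[
    \frac{|x_0 \cdot w - b - 1|}{\|w\|} = \frac{|-1 - 1|}{\|w\|} = \frac{2}{\|w\|}.
  \]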

7
SVMs - Formalization 3
  • From eq. (1) and (2):
  • ci (xi · w - b) ≥ 1, for 1 ≤ i ≤ n
  • Primal Form:
  • minimize (1/2) ||w||^2, subject to
    ci (xi · w - b) ≥ 1

8
SVMs - Formalization 4
  • Introduce Lagrange multipliers ai and minimize LP
  • ai ≥ 0
  • For computing b: solve ci (xi · w - b) = 1 for each
    support vector and take the mean value for b
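LP here presumably denotes the primal Lagrangian of the problem above; a standard way to write it, with multipliers $a_i \ge 0$, is

  \[
    L_P = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} a_i \bigl[c_i (x_i \cdot w - b) - 1\bigr],
  \]

which is minimized with respect to $w$ and $b$ and maximized with respect to the $a_i$. Setting the derivatives to zero gives $w = \sum_i a_i c_i x_i$ and $\sum_i a_i c_i = 0$.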

9
SVMs - Formalization 5
  • Dual Form (based on the KKT conditions)
  • maximize LD
  • ai ≥ 0
  • All points in the training set for which ai > 0
    are called support vectors
  • The optimization problem is sometimes simpler in
    this form
  • The constraints for this version are simpler than
    the original constraints
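LD presumably denotes the dual Lagrangian; substituting $w = \sum_i a_i c_i x_i$ and $\sum_i a_i c_i = 0$ into $L_P$ gives the standard dual form

  \[
    L_D = \sum_{i=1}^{n} a_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}
          a_i a_j c_i c_j \,(x_i \cdot x_j),
  \]

maximized subject to $a_i \ge 0$ and $\sum_i a_i c_i = 0$. The same expression appears as $Q(a)$ on the Linear SVMs overview slide below.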

10
The Non-Separable Case 1
  • Soft Margin Method:
  • choose a hyperplane that splits the examples
    as cleanly as possible, while still maximizing
    the distance to the nearest cleanly split
    examples.
  • ci (xi · w - b) ≥ 1 - ξi, for 1 ≤ i ≤ n
  • ξi ≥ 0 (slack variables)
  • The error should be as small as possible

11
The Non-Separable Case 2
  • Minimize
  • (1/2) ||w||^2 + C Σ ξi, such that
  • ci (xi · w - b) ≥ 1 - ξi, for 1 ≤ i ≤ n
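In practice this trade-off is exposed as the C parameter of most SVM implementations. A small sketch with scikit-learn (the overlapping toy data and the C values are assumptions for illustration):

  import numpy as np
  from sklearn.svm import SVC

  rng = np.random.default_rng(0)
  # Overlapping 2-D classes, so no hyperplane separates them perfectly
  X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(1.5, 1, (50, 2))])
  y = np.array([-1] * 50 + [1] * 50)

  # Small C tolerates more slack (wider margin, more violations);
  # large C penalizes slack heavily (narrower margin, fewer violations)
  for C in (0.01, 1.0, 100.0):
      clf = SVC(kernel="linear", C=C).fit(X, y)
      print(C, clf.n_support_)   # number of support vectors per class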

12
Linear SVMs - Overview
  • The classifier is a separating hyperplane.
  • The most important training points are the support
    vectors; they define the hyperplane.
  • Quadratic optimization algorithms can identify
    which training points xi are support vectors,
    i.e. those with non-zero Lagrangian multipliers ai.
  • Both in the dual formulation of the problem and
    in the solution, training points appear only
    inside dot products.
Find a1, ..., aN such that
  Q(a) = Σ ai - ½ ΣΣ ai aj ci cj (xi · xj)
is maximized, subject to
  (1) Σ ai ci = 0
  (2) 0 ≤ ai ≤ C for all ai
The classifying function is then f(x) = Σ ai ci (xi · x) + b
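The quantities in this box map directly onto attributes of a fitted scikit-learn SVC (a hedged sketch; the toy data is an assumption):

  import numpy as np
  from sklearn.svm import SVC

  X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0],
                [4.0, 4.0], [5.0, 5.0], [4.0, 6.0]])
  y = np.array([-1, -1, -1, 1, 1, 1])

  clf = SVC(kernel="linear", C=10.0).fit(X, y)

  # dual_coef_ holds ai * ci for the support vectors (only non-zero ai are stored)
  print(clf.dual_coef_)
  print(clf.support_vectors_)
  # Decision value = sum over support vectors of (ai ci)(xi . x) plus the offset
  print(clf.decision_function([[3.0, 3.0]]))
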
13
Non-linear SVMs
  • Datasets that are linearly separable (with some
    noise) work out great
  • But what are we going to do if the dataset is
    just too hard?
  • How about mapping the data to a higher-dimensional
    space? (see the sketch below)

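A minimal sketch of the idea, assuming a toy 1-D dataset where one class sits between the other; the data and the feature map x -> (x, x^2) are illustrative assumptions:

  import numpy as np
  from sklearn.svm import SVC

  # 1-D data: class +1 in the middle, class -1 outside -> not separable on the line
  x = np.array([-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0])
  y = np.array([-1, -1, -1, 1, 1, 1, -1, -1, -1])

  # Map each point to (x, x^2); in this 2-D space a horizontal line separates the classes
  X_mapped = np.column_stack([x, x ** 2])
  clf = SVC(kernel="linear").fit(X_mapped, y)
  print(clf.score(X_mapped, y))   # expected: 1.0 on this toy example
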
14
Non-linear Classification
  • Create non-linear classifiers by applying the
    kernel trick
  • The resulting algorithm is similar, except that
  • all dot products (xi · xj) are replaced by a
    non-linear kernel function

15
Kernel Trick 1
  • Instead of the inner product <xi, xj>, use a
    non-linear function K(xi, xj)
  • The boundary between the classes is
  • K(x, w) - b = 0    (*)
  • K(xi, xj) = Φ(xi) · Φ(xj)
  • Φ : R^d -> H
  • The set of points x ∈ R^d on the boundary (*)
    becomes a curved surface
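A numerical sanity check of K(xi, xj) = Φ(xi) · Φ(xj) for the homogeneous degree-2 polynomial kernel in R^2; the explicit feature map used here is an assumption for illustration:

  import numpy as np

  def poly2_kernel(a, b):
      # Homogeneous polynomial kernel of degree 2: K(a, b) = (a . b)^2
      return np.dot(a, b) ** 2

  def phi(v):
      # Explicit feature map for that kernel in R^2: (x1^2, x2^2, sqrt(2) x1 x2)
      x1, x2 = v
      return np.array([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

  a = np.array([1.0, 2.0])
  b = np.array([3.0, -1.0])
  print(poly2_kernel(a, b), np.dot(phi(a), phi(b)))  # both should print the same value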

16
Kernel Trick 2
  • A kernel function is a function that corresponds
    to an inner product in some expanded feature
    space.
  • For some functions K(xi, xj), checking that
  • K(xi, xj) = Φ(xi) · Φ(xj) can be
    cumbersome.
  • Mercer's theorem:
  • Every positive semi-definite symmetric function
    is a kernel
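On a finite sample, the Mercer condition can be checked empirically by verifying that the Gram matrix is symmetric positive semi-definite. A hedged sketch (the RBF kernel, gamma value and random data are assumptions):

  import numpy as np

  def rbf_kernel(a, b, gamma=0.5):
      # K(a, b) = exp(-gamma * ||a - b||^2)
      return np.exp(-gamma * np.sum((a - b) ** 2))

  rng = np.random.default_rng(0)
  X = rng.normal(size=(20, 3))

  # Gram matrix K[i, j] = K(xi, xj) on the sample
  K = np.array([[rbf_kernel(a, b) for b in X] for a in X])

  # Symmetric with all eigenvalues >= 0 (up to numerical tolerance) => PSD on this sample
  eigvals = np.linalg.eigvalsh(K)
  print(np.allclose(K, K.T), eigvals.min() >= -1e-10)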

17
Kernel Trick 3
  • Common kernel functions:
  • Polynomial (homogeneous):
    K(xi, xj) = (xi · xj)^d
  • Polynomial (inhomogeneous):
    K(xi, xj) = (xi · xj + 1)^d
  • Radial basis function:
    K(xi, xj) = exp(-γ ||xi - xj||^2), for γ > 0
  • Gaussian radial basis function:
    K(xi, xj) = exp(-||xi - xj||^2 / (2σ^2))
  • Sigmoid:
    K(xi, xj) = tanh(κ xi · xj + c),
    for some (not every) κ > 0 and c < 0
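The same kernels written out as plain NumPy functions (a sketch; the default parameter values are arbitrary assumptions). scikit-learn's SVC exposes equivalent choices via kernel='poly', 'rbf', and 'sigmoid'.

  import numpy as np

  def poly_homogeneous(a, b, d=2):
      return np.dot(a, b) ** d

  def poly_inhomogeneous(a, b, d=2):
      return (np.dot(a, b) + 1) ** d

  def rbf(a, b, gamma=0.5):
      return np.exp(-gamma * np.sum((a - b) ** 2))

  def gaussian_rbf(a, b, sigma=1.0):
      return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

  def sigmoid(a, b, kappa=1.0, c=-1.0):
      return np.tanh(kappa * np.dot(a, b) + c)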

18
Multi-Class Classification
  • SVMs can only handle two-class outputs (i.e. a
    categorical output variable with arity 2).
  • What can be done?
  • Answer: with output arity N, learn N SVMs
  • SVM 1 learns "Output = 1" vs "Output ≠ 1"
  • SVM 2 learns "Output = 2" vs "Output ≠ 2"
  • SVM N learns "Output = N" vs "Output ≠ N"
  • Then, to predict the output for a new input,
    predict with each SVM and find out which one puts
    the prediction the furthest into the positive
    region (see the sketch below).
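A minimal sketch of this one-vs-rest scheme with scikit-learn (the blob dataset is an assumption; OneVsRestClassifier follows the same recipe, picking the class whose SVM gives the largest decision value):

  import numpy as np
  from sklearn.datasets import make_blobs
  from sklearn.multiclass import OneVsRestClassifier
  from sklearn.svm import SVC

  # Three well-separated clusters as a stand-in for an N = 3 class problem
  X, y = make_blobs(n_samples=90, centers=3, random_state=0)

  # One linear SVM per class: "class k" vs "not class k"
  ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

  x_new = X[:3]
  print(ovr.decision_function(x_new))  # one column of decision values per class
  print(ovr.predict(x_new))            # the class with the largest decision value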