Title: Support Vector Machine (Chapters 5 and 6)

1. Support Vector Machine (Chapters 5 and 6)
- Maximum margin classifier (Chapter 6)
- Optimisation Theory (Chapter 5)
- Soft Margin Hyperplane (Chapter 6)
- Support Vector Regression (Chapter 6)
2. Simple Classification Problem: the Linearly Separable Case
- Many decision boundaries can separate these two classes.
- Which one should we choose?
[Figure: points of Class 1 and Class 2 with several candidate separating boundaries.]
3. Separating Hyperplane
- Linearly separable data.
- The hyperplane w·x + b = 0 separates the classes: w·x + b > 0 on the Class 2 side and w·x + b < 0 on the Class 1 side.
- Canonical hyperplane: rescale w and b so that the training points nearest the boundary satisfy w·x + b = +1 and w·x + b = -1.
[Figure: the two classes with the decision boundary w·x + b = 0 and the canonical hyperplanes w·x + b = 1 and w·x + b = -1.]
4. Margins
- Support vectors: the training points lying exactly on the hyperplanes w·x + b = 1 and w·x + b = -1.
- Functional margin: the margin measured from the output of the function, i.e. from the value of w·x + b (definitions written out below).
[Figure: the two classes with w·x + b = 0 and the margin hyperplanes w·x + b = 1 and w·x + b = -1; the support vectors sit on the margin hyperplanes.]
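For reference, the two margin notions used in the following slides, written out explicitly (standard definitions; the slide shows them only in the figure):

    \hat{\gamma}_i = y_i\bigl(\langle w, x_i\rangle + b\bigr)                                   % functional margin of (x_i, y_i)
    \gamma_i = y_i\Bigl(\bigl\langle \tfrac{w}{\|w\|}, x_i\bigr\rangle + \tfrac{b}{\|w\|}\Bigr)  % geometric margin: signed distance to the hyperplane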
5Importance of margin
Given a training point Suppose test points
Hyperplane correctly classify all test points when
6Error bound
Maximal margin hyperplane error bounded by
Any distribution D on X -1,1 ,with probability
1-d over l random examples. d is the number of
support vectors.
7Maximum margin Minimum norm
- x and x- are the nearest positive and negative
data - Computing the geometric margin (to be maximised)
- And here are the constraints
-
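The algebra behind that value, filled in for completeness (a standard derivation from the canonical constraints):

    \langle w, x^{+}\rangle + b = +1, \quad \langle w, x^{-}\rangle + b = -1
    \;\Rightarrow\; \langle w, x^{+} - x^{-}\rangle = 2
    \;\Rightarrow\; \gamma = \tfrac{1}{2}\Bigl\langle \tfrac{w}{\|w\|},\, x^{+} - x^{-}\Bigr\rangle = \tfrac{1}{\|w\|}.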
8. Maximum Margin: Summing Up
- Given a linearly separable training set (xi, yi), i = 1, 2, ..., l, with yi ∈ {+1, -1}:
- Minimise ½ ||w||²
- Subject to yi(w·xi + b) ≥ 1 for all i.
- This is a quadratic programming problem with linear inequality constraints (see the sketch below).
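As an illustration only (my own toy example; the slides do not prescribe a solver), such a QP can be handed to an off-the-shelf SVM implementation. A very large C in scikit-learn's soft-margin SVC approximates the hard-margin problem:

    import numpy as np
    from sklearn.svm import SVC

    # Toy linearly separable data: one class above the line x1 + x2 = 0, the other below.
    X = np.array([[2.0, 2.0], [1.5, 2.5], [3.0, 1.0],
                  [-2.0, -2.0], [-1.0, -3.0], [-2.5, -0.5]])
    y = np.array([1, 1, 1, -1, -1, -1])

    # A very large C makes the soft-margin solver behave (almost) like the hard-margin QP.
    clf = SVC(kernel="linear", C=1e6).fit(X, y)

    w, b = clf.coef_[0], clf.intercept_[0]
    print("w =", w, "b =", b)
    print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
    print("support vectors:", clf.support_vectors_)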
9. Optimisation Theory
- Primal optimisation problem:
- minimise f(w)   (the objective function)
- subject to gi(w) ≤ 0, i = 1, ..., k   (the inequality constraints)
10. Convexity
11Primal to Dual
- difficult to be solved directly by primal
Lagrangian with inequality constraints. - transform from primal to dual problem, which is
obtained by introducing Lagrange Multipliers - Construct minimise Primal Lagrangian
Lagrange Multiplier
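Written out, the primal Lagrangian referred to here is the standard one (reconstructed; the slide shows it as an image):

    L(w, b, \alpha) = \tfrac{1}{2}\|w\|^{2} - \sum_{i=1}^{l} \alpha_i \bigl[\, y_i(\langle w, x_i\rangle + b) - 1 \,\bigr], \qquad \alpha_i \ge 0.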
12Primal to Dual (2)
- Find minimum with respect to
w and b by taking derivatives of them and equate
them to 0
- Plug them back into the Lagrangian to obtain the
dual formulation
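The two stationarity conditions and the dual objective they produce (standard algebra, filled in here):

    \frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{l} \alpha_i y_i x_i,
    \qquad
    \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{l} \alpha_i y_i = 0,

    W(\alpha) = \sum_{i=1}^{l} \alpha_i - \tfrac{1}{2} \sum_{i=1}^{l}\sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \langle x_i, x_j\rangle.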
13. Primal to Dual (3)
- Maximise the dual W(α) with respect to α, subject to αi ≥ 0 and Σi αi yi = 0; the optimal α can be found by solving this quadratic program (a solver sketch follows below).
- The data enter only in the form of dot products, so we can use kernels.
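A minimal sketch of solving this dual as a quadratic program, assuming the cvxopt package and a user-supplied kernel (the helper svm_dual_fit is my own, not from the slides):

    import numpy as np
    from cvxopt import matrix, solvers

    def svm_dual_fit(X, y, kernel=lambda a, b: a @ b):
        """Solve max_a sum(a) - 1/2 a^T (yy^T * K) a  s.t. a_i >= 0, y^T a = 0."""
        l = X.shape[0]
        K = np.array([[kernel(X[i], X[j]) for j in range(l)] for i in range(l)])
        P = matrix(np.outer(y, y) * K)              # quadratic term: y_i y_j K(x_i, x_j)
        q = matrix(-np.ones(l))                     # maximising sum(a) = minimising -1^T a
        G = matrix(-np.eye(l))                      # -a_i <= 0, i.e. a_i >= 0
        h = matrix(np.zeros(l))
        A = matrix(y.reshape(1, -1).astype(float))  # equality constraint sum_i a_i y_i = 0
        b = matrix(0.0)
        solvers.options["show_progress"] = False
        return np.ravel(solvers.qp(P, q, G, h, A, b)["x"])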
14Why Primal and Dual are Equal ?
- Assume (w, b) is an optimal solution of the
primal with the optimal objective value g
- Thus, all (w, b) satisfies
- There is agt0, that for all (w, b),
15Solving
- In addition, putting (w, b) into
- With agt0,
-
Karush-Kuhn-Tucker condition
- only training points whose margin 1 will
- have non-zero ?, they are support vectors.
- The decision boundary is determined only by the
SV.
Important !
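Collected in one place, the KKT conditions for the hard-margin problem (standard statements of what the slide alludes to):

    w = \sum_i \alpha_i y_i x_i, \qquad \sum_i \alpha_i y_i = 0, \qquad \alpha_i \ge 0,
    \qquad y_i(\langle w, x_i\rangle + b) \ge 1,
    \qquad \alpha_i \bigl[\, y_i(\langle w, x_i\rangle + b) - 1 \,\bigr] = 0.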
16. A Geometrical Interpretation
- The multiplier αi measures how important a given training point is in forming the final solution.
[Figure: the two classes with w·x + b = 0 and the margin hyperplanes w·x + b = 1 and w·x + b = -1; each point is labelled with its multiplier. Only the points on the margin have non-zero values (α1 = 0.8, α6 = 1.4, α8 = 0.6), while α2, α3, α4, α5, α7, α9 and α10 are all 0.]
17Solving
- parameters are expressed as linear combination
of training points. -
- except an abnormal situation where all optimal
a are zero, b can be solved using KKT.
- for testing with a new data z, compute
-
- and classify z as class 1 if the sum is
positive, - class 2 otherwise
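Continuing the earlier cvxopt sketch (svm_dual_fit, the kernel and the tolerance below are my assumptions, not from the slides), b and the decision value can be computed like this:

    import numpy as np

    def svm_decision_value(X, y, alpha, kernel, z, tol=1e-6):
        """Evaluate sum_i alpha_i y_i K(x_i, z) + b for a new point z."""
        sv = np.where(alpha > tol)[0]               # indices of the support vectors
        # Recover b from any support vector s via the KKT condition y_s (w . x_s + b) = 1.
        s = sv[0]
        b = y[s] - sum(alpha[i] * y[i] * kernel(X[i], X[s]) for i in sv)
        return sum(alpha[i] * y[i] * kernel(X[i], z) for i in sv) + b

    # Usage, with alpha from svm_dual_fit and a linear kernel:
    #   value = svm_decision_value(X, y, alpha, lambda a, c: a @ c, z)
    #   predicted_label = +1 if value > 0 else -1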
18. What if the Data are Not Linearly Separable?
- We allow an error ξi in the classification of each point.
[Figure: the two classes with w·x + b = 0 and the margin hyperplanes w·x + b = 1 and w·x + b = -1; some points fall inside the margin or on the wrong side.]
19. Soft Margin Hyperplane
- The ξi are just slack variables in optimisation theory.
- We want to minimise ½ ||w||² + C Σi ξi (an illustration follows below).
- C is a tradeoff parameter between error and margin.
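A quick illustration of the C tradeoff (my own toy example, using scikit-learn's soft-margin SVC; the data and numbers are not from the slides):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    # Two overlapping Gaussian blobs, so no hyperplane separates them perfectly.
    X = np.vstack([rng.normal(loc=-1.0, size=(50, 2)), rng.normal(loc=+1.0, size=(50, 2))])
    y = np.array([-1] * 50 + [+1] * 50)

    for C in (0.01, 1.0, 100.0):
        clf = SVC(kernel="linear", C=C).fit(X, y)
        width = 2 / np.linalg.norm(clf.coef_[0])
        print(f"C={C:>6}: {clf.n_support_.sum()} support vectors, margin width {width:.2f}")
    # Small C tolerates more slack (wider margin, more support vectors);
    # large C penalises margin violations heavily (narrower margin).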
20. 1-Norm Soft Margin: the Box Constraint
- The optimisation problem becomes: minimise ½ ||w||² + C Σi ξi subject to yi(w·xi + b) ≥ 1 - ξi and ξi ≥ 0.
- Incorporating kernels and rewriting it in terms of the Lagrange multipliers leads to the dual problem.
- The only difference from the linearly separable case is the upper bound C on the αi (the box constraint 0 ≤ αi ≤ C).
- The influence of the individual patterns (which could be outliers) is thereby limited.
211-Norm Soft Margin the Box Constraint (2)
- The related KKT condition is
- This implies that non-zero slack variables can
only occur when ai C.
wxb1
wxb-1
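The resulting cases, written out (standard consequences of these conditions):

    \alpha_i = 0      \;\Rightarrow\; y_i(\langle w, x_i\rangle + b) \ge 1   \quad (\xi_i = 0)
    0 < \alpha_i < C  \;\Rightarrow\; y_i(\langle w, x_i\rangle + b) = 1     \quad (\xi_i = 0,\ \text{support vector on the margin})
    \alpha_i = C      \;\Rightarrow\; y_i(\langle w, x_i\rangle + b) \le 1   \quad (\xi_i \ge 0,\ \text{possible margin violation})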
22. Support Vector Regression
- ε-Insensitive Loss Regression
- Kernel Ridge Regression
23. ε-Insensitive Loss Regression
[Figure: the ε-insensitive loss L as a function of the residual y - f(x); it is zero inside the tube of width ε and increases linearly outside it.]
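For reference, the loss shown here is the standard ε-insensitive loss (the quadratic variant on the next slide squares the same quantity):

    L_{\varepsilon}\bigl(y, f(x)\bigr) = \max\bigl(0,\; |y - f(x)| - \varepsilon\bigr).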
24. Quadratic ε-Insensitive Loss
25. Primal Function
- Minimise the norm of w plus the penalised slack terms,
- subject to each training target yi deviating from the prediction by at most ε plus its slack variable.
26. Lagrangian Function
27. Dual Form
- Maximise the dual objective over the Lagrange multipliers,
- subject to the dual constraints.
- KKT optimality conditions.
28. Another Form
- If the pairs of multipliers are combined into single variables, the dual can be rewritten more compactly,
- subject to the corresponding constraint.
29. Solving, and Generalising to the Nonlinear Case
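As a nonlinear illustration (my own example, using scikit-learn's ε-SVR with an RBF kernel; the data and parameters are not from the slides):

    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0.0, 2 * np.pi, size=(80, 1)), axis=0)
    y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)   # noisy sine curve

    # epsilon sets the width of the insensitive tube; C trades flatness against tube violations.
    svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
    print("support vectors used:", len(svr.support_))
    print("prediction at x = pi/2:", svr.predict([[np.pi / 2]])[0])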
30. Kernel Ridge Regression
- Minimise the regularised squared-error objective under constraints tying each training residual to a slack variable.
- Form the Lagrangian.
- Differentiating in w and b, we obtain the stationarity conditions.
31. Dual Form of Kernel Ridge Regression
- Substituting back gives the dual form,
- under its constraint,
- and from its solution the regression function.
32. Vector Form of Kernel Ridge Regression
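In vector form the standard solution is α = (K + λI)⁻¹ y with f(z) = Σi αi K(xi, z); here is a minimal sketch under that convention (my own notation and regularisation constant, which may differ by a scaling from the slides):

    import numpy as np

    def kernel_ridge_fit(X, y, kernel, lam=1.0):
        """Dual coefficients alpha = (K + lam * I)^{-1} y."""
        l = len(X)
        K = np.array([[kernel(X[i], X[j]) for j in range(l)] for i in range(l)])
        return np.linalg.solve(K + lam * np.eye(l), y)

    def kernel_ridge_predict(X, alpha, kernel, z):
        """Regression function f(z) = sum_i alpha_i K(x_i, z)."""
        return sum(a * kernel(x, z) for a, x in zip(alpha, X))

    # Toy example with an RBF kernel.
    rbf = lambda a, b, gamma=1.0: np.exp(-gamma * np.sum((a - b) ** 2))
    X = np.linspace(0.0, 3.0, 20).reshape(-1, 1)
    y = np.sin(2 * X).ravel()
    alpha = kernel_ridge_fit(X, y, rbf, lam=0.1)
    print(kernel_ridge_predict(X, alpha, rbf, np.array([1.5])))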