Title: Support Vector Machine (Chapters 5 and 6)

1. Support Vector Machine (Chapters 5 and 6)
- Maximum margin classifier (Chapter 6)
- Optimisation Theory (Chapter 5)
- Soft Margin Hyperplane (Chapter 6)
- Support Vector Regression (Chapter 6)
2. Simple Classification Problem: the Linearly Separable Case
- Many decision boundaries can separate these two classes.
- Which one should we choose?
[Figure: points of Class 1 and Class 2 with several candidate separating boundaries.]
3. Separating Hyperplane
- Linearly separable data.
- The hyperplane w·x + b = 0 separates the classes: w·x + b > 0 on the Class 2 side and w·x + b < 0 on the Class 1 side.
- Canonical hyperplane: rescale w and b so that the training points nearest the boundary satisfy w·x + b = +1 and w·x + b = -1.
[Figure: the two classes with the decision boundary w·x + b = 0 and the canonical hyperplanes w·x + b = 1 and w·x + b = -1.]
4. Margins
- Support vectors: the training points lying exactly on the hyperplanes w·x + b = 1 and w·x + b = -1.
- Functional margin: the margin measured from the output of the function, i.e. from the value of w·x + b (definitions written out below).
[Figure: the two classes with w·x + b = 0 and the margin hyperplanes w·x + b = 1 and w·x + b = -1; the support vectors sit on the margin hyperplanes.]
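For reference, the two margin notions used in the following slides, written out explicitly (standard definitions; the slide shows them only in the figure):

    \hat{\gamma}_i = y_i\bigl(\langle w, x_i\rangle + b\bigr)                                   % functional margin of (x_i, y_i)
    \gamma_i = y_i\Bigl(\bigl\langle \tfrac{w}{\|w\|}, x_i\bigr\rangle + \tfrac{b}{\|w\|}\Bigr)  % geometric margin: signed distance to the hyperplane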
5Importance of margin
Given a training point Suppose test points
Hyperplane correctly classify all test points when
6Error bound
Maximal margin hyperplane error bounded by
Any distribution D on X -1,1 ,with probability
1-d over l random examples. d is the number of
support vectors.
7Maximum margin Minimum norm
- x and x- are the nearest positive and negative
data - Computing the geometric margin (to be maximised)
- And here are the constraints
-
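The algebra behind that value, filled in for completeness (a standard derivation from the canonical constraints):

    \langle w, x^{+}\rangle + b = +1, \quad \langle w, x^{-}\rangle + b = -1
    \;\Rightarrow\; \langle w, x^{+} - x^{-}\rangle = 2
    \;\Rightarrow\; \gamma = \tfrac{1}{2}\Bigl\langle \tfrac{w}{\|w\|},\, x^{+} - x^{-}\Bigr\rangle = \tfrac{1}{\|w\|}.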
8. Maximum Margin: Summing Up
- Given a linearly separable training set (xi, yi), i = 1, 2, ..., l, with yi ∈ {+1, -1}:
- Minimise ½ ||w||²
- Subject to yi(w·xi + b) ≥ 1 for all i.
- This is a quadratic programming problem with linear inequality constraints (see the sketch below).
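As an illustration only (my own toy example; the slides do not prescribe a solver), such a QP can be handed to an off-the-shelf SVM implementation. A very large C in scikit-learn's soft-margin SVC approximates the hard-margin problem:

    import numpy as np
    from sklearn.svm import SVC

    # Toy linearly separable data: one class above the line x1 + x2 = 0, the other below.
    X = np.array([[2.0, 2.0], [1.5, 2.5], [3.0, 1.0],
                  [-2.0, -2.0], [-1.0, -3.0], [-2.5, -0.5]])
    y = np.array([1, 1, 1, -1, -1, -1])

    # A very large C makes the soft-margin solver behave (almost) like the hard-margin QP.
    clf = SVC(kernel="linear", C=1e6).fit(X, y)

    w, b = clf.coef_[0], clf.intercept_[0]
    print("w =", w, "b =", b)
    print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
    print("support vectors:", clf.support_vectors_)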
9. Optimisation Theory
- Primal optimisation problem:
- minimise f(w)   (the objective function)
- subject to gi(w) ≤ 0, i = 1, ..., k   (the inequality constraints)
10. Convexity
11Primal to Dual
- difficult to be solved directly by primal
Lagrangian with inequality constraints. - transform from primal to dual problem, which is
obtained by introducing Lagrange Multipliers - Construct minimise Primal Lagrangian
Lagrange Multiplier
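Written out, the primal Lagrangian referred to here is the standard one (reconstructed; the slide shows it as an image):

    L(w, b, \alpha) = \tfrac{1}{2}\|w\|^{2} - \sum_{i=1}^{l} \alpha_i \bigl[\, y_i(\langle w, x_i\rangle + b) - 1 \,\bigr], \qquad \alpha_i \ge 0.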
12Primal to Dual (2)
- Find minimum with respect to
w and b by taking derivatives of them and equate
them to 0
- Plug them back into the Lagrangian to obtain the
dual formulation
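The two stationarity conditions and the dual objective they produce (standard algebra, filled in here):

    \frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{l} \alpha_i y_i x_i,
    \qquad
    \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{l} \alpha_i y_i = 0,

    W(\alpha) = \sum_{i=1}^{l} \alpha_i - \tfrac{1}{2} \sum_{i=1}^{l}\sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \langle x_i, x_j\rangle.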
13. Primal to Dual (3)
- Maximise the dual W(α) with respect to α, subject to αi ≥ 0 and Σi αi yi = 0; the optimal α can be found by solving this quadratic program (a solver sketch follows below).
- The data enter only in the form of dot products, so we can use kernels.
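A minimal sketch of solving this dual as a quadratic program, assuming the cvxopt package and a user-supplied kernel (the helper svm_dual_fit is my own, not from the slides):

    import numpy as np
    from cvxopt import matrix, solvers

    def svm_dual_fit(X, y, kernel=lambda a, b: a @ b):
        """Solve max_a sum(a) - 1/2 a^T (yy^T * K) a  s.t. a_i >= 0, y^T a = 0."""
        l = X.shape[0]
        K = np.array([[kernel(X[i], X[j]) for j in range(l)] for i in range(l)])
        P = matrix(np.outer(y, y) * K)              # quadratic term: y_i y_j K(x_i, x_j)
        q = matrix(-np.ones(l))                     # maximising sum(a) = minimising -1^T a
        G = matrix(-np.eye(l))                      # -a_i <= 0, i.e. a_i >= 0
        h = matrix(np.zeros(l))
        A = matrix(y.reshape(1, -1).astype(float))  # equality constraint sum_i a_i y_i = 0
        b = matrix(0.0)
        solvers.options["show_progress"] = False
        return np.ravel(solvers.qp(P, q, G, h, A, b)["x"])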
14Why Primal and Dual are Equal ?
- Assume (w, b) is an optimal solution of the
primal with the optimal objective value g
- Thus, all (w, b) satisfies
- There is agt0, that for all (w, b),
15Solving
- In addition, putting (w, b) into
- With agt0,
-
Karush-Kuhn-Tucker condition
- only training points whose margin 1 will
- have non-zero ?, they are support vectors.
- The decision boundary is determined only by the
SV.
Important !
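Collected in one place, the KKT conditions for the hard-margin problem (standard statements of what the slide alludes to):

    w = \sum_i \alpha_i y_i x_i, \qquad \sum_i \alpha_i y_i = 0, \qquad \alpha_i \ge 0,
    \qquad y_i(\langle w, x_i\rangle + b) \ge 1,
    \qquad \alpha_i \bigl[\, y_i(\langle w, x_i\rangle + b) - 1 \,\bigr] = 0.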
16. A Geometrical Interpretation
- The multiplier αi measures how important a given training point is in forming the final solution.
[Figure: the two classes with w·x + b = 0 and the margin hyperplanes w·x + b = 1 and w·x + b = -1; each point is labelled with its multiplier. Only the points on the margin have non-zero values (α1 = 0.8, α6 = 1.4, α8 = 0.6), while α2, α3, α4, α5, α7, α9 and α10 are all 0.]
17Solving
- parameters are expressed as linear combination
of training points. -
- except an abnormal situation where all optimal
a are zero, b can be solved using KKT.
- for testing with a new data z, compute
-
- and classify z as class 1 if the sum is
positive, - class 2 otherwise
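Continuing the earlier cvxopt sketch (svm_dual_fit, the kernel and the tolerance below are my assumptions, not from the slides), b and the decision value can be computed like this:

    import numpy as np

    def svm_decision_value(X, y, alpha, kernel, z, tol=1e-6):
        """Evaluate sum_i alpha_i y_i K(x_i, z) + b for a new point z."""
        sv = np.where(alpha > tol)[0]               # indices of the support vectors
        # Recover b from any support vector s via the KKT condition y_s (w . x_s + b) = 1.
        s = sv[0]
        b = y[s] - sum(alpha[i] * y[i] * kernel(X[i], X[s]) for i in sv)
        return sum(alpha[i] * y[i] * kernel(X[i], z) for i in sv) + b

    # Usage, with alpha from svm_dual_fit and a linear kernel:
    #   value = svm_decision_value(X, y, alpha, lambda a, c: a @ c, z)
    #   predicted_label = +1 if value > 0 else -1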
18. What if the Data are Not Linearly Separable?
- We allow an error ξi in the classification of each point.
[Figure: the two classes with w·x + b = 0 and the margin hyperplanes w·x + b = 1 and w·x + b = -1; some points fall inside the margin or on the wrong side.]
19. Soft Margin Hyperplane
- The ξi are just slack variables in optimisation theory.
- We want to minimise ½ ||w||² + C Σi ξi (an illustration follows below).
- C is a tradeoff parameter between error and margin.
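A quick illustration of the C tradeoff (my own toy example, using scikit-learn's soft-margin SVC; the data and numbers are not from the slides):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    # Two overlapping Gaussian blobs, so no hyperplane separates them perfectly.
    X = np.vstack([rng.normal(loc=-1.0, size=(50, 2)), rng.normal(loc=+1.0, size=(50, 2))])
    y = np.array([-1] * 50 + [+1] * 50)

    for C in (0.01, 1.0, 100.0):
        clf = SVC(kernel="linear", C=C).fit(X, y)
        width = 2 / np.linalg.norm(clf.coef_[0])
        print(f"C={C:>6}: {clf.n_support_.sum()} support vectors, margin width {width:.2f}")
    # Small C tolerates more slack (wider margin, more support vectors);
    # large C penalises margin violations heavily (narrower margin).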
20. 1-Norm Soft Margin: the Box Constraint
- The optimisation problem becomes: minimise ½ ||w||² + C Σi ξi subject to yi(w·xi + b) ≥ 1 - ξi and ξi ≥ 0.
- Incorporating kernels and rewriting it in terms of the Lagrange multipliers leads to the dual problem.
- The only difference from the linearly separable case is the upper bound C on the αi (the box constraint 0 ≤ αi ≤ C).
- The influence of the individual patterns (which could be outliers) is thereby limited.
211-Norm Soft Margin the Box Constraint (2)
- The related KKT condition is
- This implies that non-zero slack variables can
only occur when ai C.
wxb1
wxb-1
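The resulting cases, written out (standard consequences of these conditions):

    \alpha_i = 0      \;\Rightarrow\; y_i(\langle w, x_i\rangle + b) \ge 1   \quad (\xi_i = 0)
    0 < \alpha_i < C  \;\Rightarrow\; y_i(\langle w, x_i\rangle + b) = 1     \quad (\xi_i = 0,\ \text{support vector on the margin})
    \alpha_i = C      \;\Rightarrow\; y_i(\langle w, x_i\rangle + b) \le 1   \quad (\xi_i \ge 0,\ \text{possible margin violation})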
22. Support Vector Regression
- ε-Insensitive Loss Regression
- Kernel Ridge Regression
23. ε-Insensitive Loss Regression
[Figure: the ε-insensitive loss L as a function of the residual y - f(x); it is zero inside the tube of width ε and increases linearly outside it.]
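For reference, the loss shown here is the standard ε-insensitive loss (the quadratic variant on the next slide squares the same quantity):

    L_{\varepsilon}\bigl(y, f(x)\bigr) = \max\bigl(0,\; |y - f(x)| - \varepsilon\bigr).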
24. Quadratic ε-Insensitive Loss
25. Primal Function
- Minimise the norm of w plus the penalised slack terms,
- subject to each training target yi deviating from the prediction by at most ε plus its slack variable.
26. Lagrangian Function
27. Dual Form
- Maximise the dual objective over the Lagrange multipliers,
- subject to the dual constraints.
- KKT optimality conditions.
28. Another Form
- If the pairs of multipliers are combined into single variables, the dual can be rewritten more compactly,
- subject to the corresponding constraint.
29. Solving, and Generalising to the Nonlinear Case
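As a nonlinear illustration (my own example, using scikit-learn's ε-SVR with an RBF kernel; the data and parameters are not from the slides):

    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0.0, 2 * np.pi, size=(80, 1)), axis=0)
    y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)   # noisy sine curve

    # epsilon sets the width of the insensitive tube; C trades flatness against tube violations.
    svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
    print("support vectors used:", len(svr.support_))
    print("prediction at x = pi/2:", svr.predict([[np.pi / 2]])[0])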
30. Kernel Ridge Regression
- Minimise the regularised squared-error objective under constraints tying each training residual to a slack variable.
- Form the Lagrangian.
- Differentiating in w and b, we obtain the stationarity conditions.
31. Dual Form of Kernel Ridge Regression
- Substituting back gives the dual form,
- under its constraint,
- and from its solution the regression function.
32. Vector Form of Kernel Ridge Regression
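In vector form the standard solution is α = (K + λI)⁻¹ y with f(z) = Σi αi K(xi, z); here is a minimal sketch under that convention (my own notation and regularisation constant, which may differ by a scaling from the slides):

    import numpy as np

    def kernel_ridge_fit(X, y, kernel, lam=1.0):
        """Dual coefficients alpha = (K + lam * I)^{-1} y."""
        l = len(X)
        K = np.array([[kernel(X[i], X[j]) for j in range(l)] for i in range(l)])
        return np.linalg.solve(K + lam * np.eye(l), y)

    def kernel_ridge_predict(X, alpha, kernel, z):
        """Regression function f(z) = sum_i alpha_i K(x_i, z)."""
        return sum(a * kernel(x, z) for a, x in zip(alpha, X))

    # Toy example with an RBF kernel.
    rbf = lambda a, b, gamma=1.0: np.exp(-gamma * np.sum((a - b) ** 2))
    X = np.linspace(0.0, 3.0, 20).reshape(-1, 1)
    y = np.sin(2 * X).ravel()
    alpha = kernel_ridge_fit(X, y, rbf, lam=0.1)
    print(kernel_ridge_predict(X, alpha, rbf, np.array([1.5])))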