Title: Support Vector Machines: Hype or Hallelujah?
Slide 1: Support Vector Machines: Hype or Hallelujah?
- Kristin Bennett
- Math Sciences Dept
- Rensselaer Polytechnic Inst.
- http://www.rpi.edu/bennek
Slide 2: Outline
- Support Vector Machines for Classification
- Linear Discrimination
- Nonlinear Discrimination
- Extensions
- Hallelujah
- Hype
Slide 3: Binary Classification
- Example: medical diagnosis
- Is it benign or malignant?
Slide 4: Linear Classification Model
- Given training data $(x_1, y_1), \ldots, (x_\ell, y_\ell)$, with $x_i \in \mathbb{R}^n$ and labels $y_i \in \{-1, +1\}$
- Linear model: find $w \in \mathbb{R}^n$ and $b \in \mathbb{R}$
- such that $\operatorname{sign}(w \cdot x_i - b) = y_i$ for all $i$
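A minimal sketch of this decision rule in Python (the weight vector and threshold below are invented placeholder values, not fitted parameters):

```python
import numpy as np

# Hypothetical model parameters for two features (assumed values)
w = np.array([0.8, -0.3])   # weight vector
b = 0.5                     # threshold

def classify(x):
    """Linear decision rule: sign(w . x - b), returning a label in {-1, +1}."""
    return 1 if np.dot(w, x) - b > 0 else -1

print(classify(np.array([2.0, 1.0])))   # prints 1 for this w, b
```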
Slides 5-9: Best Linear Separator?
[figures: the same two-class data shown with a sequence of candidate separating lines, motivating the question of which one is best]
Slide 10: Find Closest Points in Convex Hulls
[figure: the convex hull of each class, with the closest points between the hulls labeled c and d]
Slide 11: Plane Bisects Closest Points
The separating plane is the perpendicular bisector of the segment joining the closest points: $w = c - d$ and $b = (\|c\|^2 - \|d\|^2)/2$.
[figure: the bisecting plane between c and d]
Slide 12: Find c and d Using a Quadratic Program
$$\min_{\alpha}\ \tfrac{1}{2}\Big\|\sum_{y_i = 1} \alpha_i x_i - \sum_{y_i = -1} \alpha_i x_i\Big\|^2 \quad \text{s.t.}\quad \sum_{y_i = 1} \alpha_i = 1,\ \sum_{y_i = -1} \alpha_i = 1,\ \alpha \ge 0$$
Many existing and new solvers handle this QP.
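A minimal sketch of this closest-points QP using the cvxpy modeling library (the toy points are invented for illustration):

```python
import numpy as np
import cvxpy as cp

# Toy 2-D data: one array per class (invented points)
A = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.5]])   # y = +1
B = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.5]])   # y = -1

u = cp.Variable(len(A), nonneg=True)   # convex-hull weights for class +1
v = cp.Variable(len(B), nonneg=True)   # convex-hull weights for class -1
c, d = A.T @ u, B.T @ v                # points in each convex hull

# Minimize the squared distance between the hulls
prob = cp.Problem(cp.Minimize(cp.sum_squares(c - d)),
                  [cp.sum(u) == 1, cp.sum(v) == 1])
prob.solve()

# Bisecting plane w.x = b from Slide 11
w = c.value - d.value
b = (c.value @ c.value - d.value @ d.value) / 2.0
print(w, b)
```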
Slide 13: Best Linear Separator: Supporting Plane Method
Maximize the distance between two parallel supporting planes, $w \cdot x = b + 1$ and $w \cdot x = b - 1$.
Distance = margin = $2/\|w\|$.
Slide 14: Maximize Margin Using a Quadratic Program
Maximizing the margin $2/\|w\|$ is equivalent to solving:
$$\min_{w, b}\ \tfrac{1}{2}\|w\|^2 \quad \text{s.t.}\quad y_i (w \cdot x_i - b) \ge 1,\ i = 1, \ldots, \ell$$
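The same toy data can be run through this supporting-plane QP; a minimal cvxpy sketch (data invented, as above):

```python
import numpy as np
import cvxpy as cp

# Same invented toy data, stacked: rows of X with labels y in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.5],
              [0.0, 0.0], [1.0, 0.5], [0.5, 1.5]])
y = np.array([1, 1, 1, -1, -1, -1])

w = cp.Variable(2)
b = cp.Variable()

# min (1/2)||w||^2  s.t.  y_i (w . x_i - b) >= 1
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                  [cp.multiply(y, X @ w - b) >= 1])
prob.solve()
print(w.value, b.value)   # normal is parallel to c - d from the closest-points QP
```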
Slide 15: Dual of Closest-Points Method is Supporting-Plane Method
The two formulations are Lagrangian duals of one another:
$$\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} y_i y_j \alpha_i \alpha_j\, x_i \cdot x_j \quad \text{s.t.}\quad \sum_i y_i \alpha_i = 0,\ \alpha \ge 0,$$
with $w = \sum_i y_i \alpha_i x_i$. The solution only depends on the support vectors: the points with $\alpha_i > 0$.
Slide 16: Support Vector Machines (SVM)
A methodology for inference based on Vapnik's Statistical Learning Theory.
- Key Ideas
- Maximize Margins
- Do the Dual
- Construct Kernels
Slide 17: Statistical Learning Theory
- Misclassification error and function complexity bound the generalization error.
- Maximizing margins minimizes complexity.
- Eliminates overfitting.
- The solution depends only on the support vectors, not on the number of attributes.
Slide 18: Margins and Complexity
A skinny margin is more flexible, and thus more complex.
Slide 19: Margins and Complexity
A fat margin is less complex.
Slide 20: Linearly Inseparable Case
The convex hulls intersect! The same argument won't work.
Slide 21: Reduced Convex Hulls Don't Intersect
Shrink each hull by adding an upper bound $D < 1$ on the multipliers: points of the form $\sum \alpha_i x_i$ with $\sum \alpha_i = 1$ and $0 \le \alpha_i \le D$.
Slide 22: Find Closest Points, Then Bisect
No change to the QP except for the bound $D$. $D$ determines the number of support vectors.
Slide 23: Linearly Inseparable Case: Supporting Plane Method
Just add a non-negative error (slack) vector $z$:
$$\min_{w, b, z}\ \tfrac{1}{2}\|w\|^2 + C \sum_i z_i \quad \text{s.t.}\quad y_i (w \cdot x_i - b) + z_i \ge 1,\ z \ge 0$$
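Extending the earlier cvxpy sketch, the soft-margin version only needs the slack vector z and a penalty weight (the value of C and the overlapping toy points are assumptions for illustration):

```python
import numpy as np
import cvxpy as cp

C = 1.0   # assumed penalty value
# Invented toy data where one point from each class crosses over
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.5, 1.0],
              [0.0, 0.0], [1.0, 0.5], [2.0, 1.5]])
y = np.array([1, 1, 1, -1, -1, -1])

w, b = cp.Variable(2), cp.Variable()
z = cp.Variable(len(y), nonneg=True)   # error (slack) vector z

# min (1/2)||w||^2 + C * sum(z)  s.t.  y_i (w . x_i - b) + z_i >= 1
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(z)),
                  [cp.multiply(y, X @ w - b) + z >= 1])
prob.solve()
print(w.value, b.value, z.value.round(3))   # nonzero z_i mark margin violations
```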
Slide 24: Dual of Closest-Points Method is Supporting-Plane Method
The dual is unchanged except that the multipliers are now bounded above (by $C$ here, or equivalently by $D$ in the reduced-hull view):
$$\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} y_i y_j \alpha_i \alpha_j\, x_i \cdot x_j \quad \text{s.t.}\quad \sum_i y_i \alpha_i = 0,\ 0 \le \alpha_i \le C$$
The solution still only depends on the support vectors.
Slide 25: Nonlinear Classification
Slide 26: Nonlinear Classification: Map to Higher Dimensional Space
IDEA: Map each point to a higher dimensional feature space via $\Phi$ and construct the linear discriminant in that space.
The dual SVM becomes:
$$\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} y_i y_j \alpha_i \alpha_j\, \Phi(x_i) \cdot \Phi(x_j) \quad \text{s.t.}\quad \sum_i y_i \alpha_i = 0,\ 0 \le \alpha_i \le C$$
Slide 27: Generalized Inner Product
By Hilbert-Schmidt kernels (Courant and Hilbert, 1953),
$$\Phi(x_i) \cdot \Phi(x_j) = K(x_i, x_j)$$
for certain $\Phi$ and $K$, e.g. the polynomial kernel $K(x_i, x_j) = (x_i \cdot x_j + 1)^d$ or the Gaussian kernel $K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$.
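A short Python sketch of these two kernels (the hyperparameter values d and sigma are arbitrary assumptions):

```python
import numpy as np

def polynomial_kernel(x, z, d=3):
    """Polynomial kernel: (x . z + 1)^d."""
    return (np.dot(x, z) + 1.0) ** d

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

# Kernel (Gram) matrix over a toy dataset: each entry plays the role
# of an inner product in the implicit feature space
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
K = np.array([[gaussian_kernel(a, c) for c in X] for a in X])
print(K.round(3))   # symmetric and positive semidefinite
```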
Slide 28: Final Classification via Kernels
The dual SVM becomes:
$$\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} y_i y_j \alpha_i \alpha_j\, K(x_i, x_j) \quad \text{s.t.}\quad \sum_i y_i \alpha_i = 0,\ 0 \le \alpha_i \le C$$
Slide 30: Final SVM Algorithm
- Solve the dual SVM QP for $\alpha$
- Recover the primal threshold $b$ (e.g. from any support vector with $0 < \alpha_i < C$)
- Classify a new $x$: $f(x) = \operatorname{sign}\big(\sum_i y_i \alpha_i K(x_i, x) - b\big)$
The solution only depends on the support vectors (a worked sketch follows below).
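A minimal end-to-end sketch with an off-the-shelf solver, scikit-learn's SVC (the data, kernel choice, and the values of C and gamma are arbitrary assumptions, not recommendations from the talk):

```python
import numpy as np
from sklearn.svm import SVC

# Invented two-class data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# Gaussian-kernel SVM; C and gamma are assumed values
clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)

print("number of support vectors:", clf.support_vectors_.shape[0])
print("prediction for a new x:", clf.predict([[1.5, 1.5]]))
```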
Slide 31: Support Vector Machines (SVM)
- Key Formulation Ideas
- Maximize Margins
- Do the Dual
- Construct Kernels
- Generalization Error Bounds
- Practical Algorithms
Slide 32: Hallelujah!
- Generalization theory and practice meet
- General methodology for many types of problems
- Same program + new kernel = new method
- No problems with local minima
- Few model parameters. Selects capacity.
- Robust optimization methods.
- Successful Applications
BUT
Slide 33: HYPE?
- Will SVMs beat my best hand-tuned method Z on problem X?
- Do SVMs scale to massive datasets?
- How to choose C and the kernel? (a model-selection sketch follows this list)
- What is the effect of attribute scaling?
- How to handle categorical variables?
- How to incorporate domain knowledge?
- How to interpret results?
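One common answer to the C-and-kernel question is cross-validated grid search; a minimal sketch with scikit-learn's GridSearchCV (the data and the parameter grid are arbitrary assumptions):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Invented two-class data
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y = np.array([-1] * 30 + [1] * 30)

# Search over C and kernel settings by 5-fold cross-validation
grid = {"C": [0.1, 1, 10], "gamma": [0.1, 1], "kernel": ["rbf", "poly"]}
search = GridSearchCV(SVC(), grid, cv=5).fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```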
Slide 34: Support Vector Machine Resources
- http://www.support-vector.net/
- http://www.kernel-machines.org/
- Links off my web page:
- http://www.rpi.edu/bennek