Title: Support Vector Machines
Support Vector Machines
- H. Clara Pong
- Julie Horrocks (1), Marianne Van den Heuvel (2), Francis Tekpetey (3), B. Anne Croy (4)
- (1) Mathematics & Statistics, University of Guelph
- (2) Biomedical Sciences, University of Guelph
- (3) Obstetrics and Gynecology, University of Western Ontario
- (4) Anatomy & Cell Biology, Queen's University
Outline
- Background
- Separating Hyper-plane
- Basis Expansion
- Support Vector Machines
- Simulations
- Remarks
Background
- Motivation: the IVF (in-vitro fertilization) project
- 18 infertile women, each undergoing IVF treatment
- Outcome (output, Y): binary (pregnancy)
- Predictors (inputs, X): longitudinal data (adhesion)
Background
- Classification methods
- Relatively new method: Support Vector Machines (SVM)
- First proposed by V. Vapnik in 1979
- Maps the input space into a high-dimensional feature space
- Constructs a linear classifier in the new feature space
- Traditional method: Discriminant Analysis
- Introduced by R.A. Fisher in 1936
- Classifies according to the values of the discriminant functions
- Assumption: the predictors X in a given class have a multivariate normal distribution
Separating Hyper-plane
- Suppose there are 2 classes (A, B)
- y = 1 for group A, y = -1 for group B
- Let a hyper-plane be defined as f(X) = β₀ + βᵀX = 0
- Then f(X) = 0 is the decision boundary that separates the two groups:
- f(X) = β₀ + βᵀX > 0 for X ∈ A
- f(X) = β₀ + βᵀX < 0 for X ∈ B
- Given X₀ ∈ A, it is misclassified when f(X₀) < 0; given X₀ ∈ B, it is misclassified when f(X₀) > 0
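To make the decision rule concrete, here is a minimal R sketch (R matches the e1071 references at the end of the deck); the coefficients beta0 and beta are hypothetical placeholders, not estimates from the IVF data:

    # Hypothetical coefficients of a 2-D hyper-plane (illustration only)
    beta0 <- -1
    beta  <- c(2, -3)

    # f(X) = beta0 + beta' X; classify by the sign of f(X)
    f <- function(x) beta0 + sum(beta * x)
    classify <- function(x) if (f(x) > 0) "A" else "B"

    classify(c(1, 0))   # f =  1 > 0, so "A"
    classify(c(0, 1))   # f = -4 < 0, so "B"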
Separating Hyper-plane
- The perceptron learning algorithm searches for a hyper-plane that minimizes the distance of misclassified points to the decision boundary. However, this does not provide a unique solution.
Optimal Separating Hyper-plane
- Let C be the distance from the closest points of the two groups to the hyper-plane (the margin).
- The optimal separating hyper-plane is the unique separating hyper-plane f(X) = β₀ + βᵀX = 0 where (β₀, β) maximizes C.
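In the notation of The Elements of Statistical Learning (see references), this maximization is the constrained problem below; the second line is the standard equivalent quadratic program, obtained by dropping the norm constraint and setting ‖β‖ = 1/C:

    \max_{\beta_0,\, \beta,\, \|\beta\| = 1} C
    \quad \text{subject to} \quad y_i (x_i^T \beta + \beta_0) \ge C, \quad i = 1, \dots, N

    \min_{\beta_0,\, \beta} \tfrac{1}{2} \|\beta\|^2
    \quad \text{subject to} \quad y_i (x_i^T \beta + \beta_0) \ge 1, \quad i = 1, \dots, N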
Optimal Separating Hyper-plane
Subject to:
1. αᵢ [yᵢ(xᵢᵀβ + β₀) - 1] = 0 for all i = 1, …, N
2. αᵢ ≥ 0 for all i = 1, …, N
3. β = Σᵢ αᵢ yᵢ xᵢ (sum over i = 1, …, N)
4. Σᵢ αᵢ yᵢ = 0
5. By these Karush-Kuhn-Tucker conditions, f(X) depends only on the xᵢ with αᵢ ≠ 0 (the support vectors)
Optimal Separating Hyper-plane
[Figure: the optimal separating hyper-plane and its margin]
Basis Expansion
- Suppose there are p inputs, X = (x₁, …, xₚ)
- Let hₖ(X) be a transformation that maps X from Rᵖ to R
- hₖ(X) is called a basis function
- H = {h₁(X), …, hₘ(X)} is the basis of a new feature space (dim = m)
Example: X = (x₁, x₂), H = {h₁(X), h₂(X), h₃(X)} with
h₁(X) = h₁(x₁, x₂) = x₁, h₂(X) = h₂(x₁, x₂) = x₂, h₃(X) = h₃(x₁, x₂) = x₁x₂
X_new = H(X) = (x₁, x₂, x₁x₂)
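A minimal R sketch of this particular expansion (the function name expand_basis is illustrative):

    # Map X = (x1, x2) in R^2 to the 3-D feature space (x1, x2, x1*x2)
    expand_basis <- function(X) {
      cbind(h1 = X[, 1], h2 = X[, 2], h3 = X[, 1] * X[, 2])
    }

    X <- matrix(c(1, 2,
                  3, 4), ncol = 2, byrow = TRUE)
    expand_basis(X)   # rows: (1, 2, 2) and (3, 4, 12)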
Support Vector Machines
- The optimal hyper-plane is {X : f(X) = β₀ + βᵀX = 0}
- f(X) = β₀ + βᵀX is called the support vector classifier
Support Vector Machines
Non-separable case: the training data cannot be separated by a hyper-plane.
- Hyper-plane: {X : f(X) = β₀ + βᵀX = 0}
- Xᵢ crosses the margin of its group when C - yᵢ f(Xᵢ) > 0
- Let Sᵢ = C - yᵢ f(Xᵢ) when Xᵢ crosses the margin, and Sᵢ = 0 when Xᵢ is outside the margin
- Let ξᵢC = Sᵢ; ξᵢ is the proportion of C by which the prediction has crossed the margin
- Misclassification occurs when Sᵢ > C (ξᵢ > 1)
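In the standard soft-margin formulation (The Elements of Statistical Learning, see references), these slack variables ξᵢ enter the margin maximization as constraints:

    \max_{\beta_0,\, \beta,\, \|\beta\| = 1} C
    \quad \text{subject to} \quad
    y_i (x_i^T \beta + \beta_0) \ge C (1 - \xi_i), \quad \xi_i \ge 0, \quad \sum_{i=1}^{N} \xi_i \le d

where d is the bound on the total slack described on the next slide.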
Support Vector Machines
The total margin violation Σᵢ ξᵢ (which bounds the number of misclassifications) is itself bounded by a constant d: Σᵢ ξᵢ ≤ d.
Support Vector Machines
- SVM searches for an optimal hyper-plane in a new feature space where the data are more separable.
- Suppose H = {h₁(X), …, hₘ(X)} is the basis of the new feature space F. Every element of F is a linear basis expansion of X, so the classifier becomes f(X) = β₀ + Σₘ βₘ hₘ(X).
Support Vector Machines
The kernel K(X, X′) = ⟨H(X), H(X′)⟩ and the basis transformation H define one another.
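A standard concrete example: for X ∈ R², the degree-2 polynomial kernel corresponds to an explicit 6-dimensional basis expansion (expanding the square verifies K(X, X′) = ⟨H(X), H(X′)⟩):

    K(X, X') = (1 + \langle X, X' \rangle)^2
    \quad \Longleftrightarrow \quad
    H(X) = (1,\ \sqrt{2}\, x_1,\ \sqrt{2}\, x_2,\ x_1^2,\ x_2^2,\ \sqrt{2}\, x_1 x_2)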
Support Vector Machines
Since f(X) can be written entirely in terms of inner products ⟨H(X), H(X′)⟩, the basis transformation in SVM does not need to be defined explicitly; only the kernel is needed.
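In practice the kernel is just an argument to the fitting routine. A minimal sketch with the e1071 package cited in the references (a data frame train with columns x1, x2 and a factor y, plus a test set test, are assumed):

    library(e1071)   # install.packages("e1071") if needed

    # Fit an SVM with a radial (Gaussian) kernel; the basis expansion
    # implied by the kernel is never constructed explicitly.
    fit <- svm(y ~ x1 + x2, data = train, kernel = "radial", cost = 1)

    # Other built-in kernels: "linear", "polynomial", "sigmoid"
    pred <- predict(fit, newdata = test)
    table(pred, test$y)   # confusion matrix on the test set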
Simulations
- 3 cases
- 100 simulations per case
- Each simulation consists of 200 points (100 from each group)
- Input space: 2-dimensional
- Output: 0 or 1 (2 groups)
- Half of the points are randomly selected as the training set
X = (x₁, x₂), Y ∈ {0, 1}
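One simulation run for Case 1 might look like the following R sketch; the means and shared covariance matrix are illustrative placeholders, since the slides do not give the actual parameter values:

    library(MASS)    # mvrnorm, lda
    library(e1071)   # svm

    set.seed(1)
    Sigma <- diag(2)                                     # shared covariance (assumed)
    g0 <- mvrnorm(100, mu = c(0, 0),     Sigma = Sigma)  # group 0
    g1 <- mvrnorm(100, mu = c(1.5, 1.5), Sigma = Sigma)  # group 1
    dat <- data.frame(rbind(g0, g1), y = factor(rep(c(0, 1), each = 100)))
    names(dat)[1:2] <- c("x1", "x2")

    idx   <- sample(200, 100)        # half of the points for training
    train <- dat[idx, ]
    test  <- dat[-idx, ]

    lda_fit <- lda(y ~ x1 + x2, data = train)
    svm_fit <- svm(y ~ x1 + x2, data = train, kernel = "radial")

    # test-set misclassification counts for each method
    sum(predict(lda_fit, test)$class != test$y)
    sum(predict(svm_fit, test) != test$y)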
Simulations
- Case 1: normal inputs with the same covariance matrix in both groups
Simulations
Misclassifications (in 100 simulations):

             Training        Testing
             Mean    Sd      Mean    Sd
    LDA      7.85    2.65    8.07    2.51
    SVM      6.98    2.33    8.48    2.81
Simulations
- Case 2: normal inputs with unequal covariance matrices
Simulations
Misclassifications (in 100 simulations):

             Training        Testing
             Mean    Sd      Mean    Sd
    QDA      15.5    3.75    16.84   3.48
    SVM      13.6    4.03    18.8    4.01
Simulations
- Case 3: inputs that violate the normality assumption
Simulations
Misclassifications (in 100 simulations):

             Training        Testing
             Mean    Sd      Mean    Sd
    QDA      14      3.79    16.8    3.63
    SVM      9.34    3.46    14.8    3.21
Simulations
- Paired t-test for differences in misclassifications (sketched in R below)
- H₀: mean difference = 0; Hₐ: mean difference ≠ 0
- Case 1
- mean difference (LDA - SVM) = -0.41, se = 0.3877
- t = -1.057, p-value = 0.29 (not significant)
- Case 2
- mean difference (QDA - SVM) = -1.96, se = 0.4170
- t = -4.70, p-value = 8.42e-06 (significant)
- Case 3
- mean difference (QDA - SVM) = 2, se = 0.4218
- t = 4.74, p-value = 7.13e-06 (significant)
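A minimal sketch of the paired test in R; err_qda and err_svm stand in for the per-simulation test misclassification counts, which are not given in the slides:

    # Hypothetical per-simulation test misclassification counts
    set.seed(2)
    err_qda <- rpois(100, 16)   # stand-in data for illustration
    err_svm <- rpois(100, 18)   # stand-in data for illustration

    # Paired t-test: H0 mean difference = 0 vs Ha mean difference != 0
    t.test(err_qda, err_svm, paired = TRUE)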
Remarks
- Support Vector Machines
- Map the original input space onto a feature space of higher dimension
- Make no assumption about the distribution of the Xs
- Performance
- Discriminant Analysis and SVM perform similarly when (X|Y) has a normal distribution and the groups share the same covariance matrix Σ
- Discriminant Analysis performs better when the covariance matrices of the two groups are different
- SVM performs better when the inputs X violate the distributional assumption
References
- N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. New York: Cambridge University Press, 2000.
- J. Friedman, T. Hastie, and R. Tibshirani. The Elements of Statistical Learning. New York: Springer, 2001.
- D. Meyer, C. Chang, and C. Lin. R Documentation: Support Vector Machines. http://www.maths.lth.se/help/R/.R/library/e1071/html/svm.html (last updated March 2006).
- H. Planatscher and J. Dietzsch. SVM-Tutorial using R (e1071-package). http://www.potschi.de/svmtut/svmtut.htm
- M. Van Den Heuvel, J. Horrocks, S. Bashar, S. Taylor, S. Burke, K. Hatta, E. Lewis, and A. Croy. Menstrual Cycle Hormones Induce Changes in Functional Interactions Between Lymphocytes and Endothelial Cells. Journal of Clinical Endocrinology and Metabolism, 2005.
Thank You!