Title: Knowledge-Based Support Vector Machine Classifiers
1 Knowledge-Based Support Vector Machine Classifiers
NIPS 2002, Vancouver, December 9-14, 2002
- Glenn Fung
- Olvi Mangasarian
- Jude Shavlik
University of Wisconsin-Madison
2 Outline of Talk
- Support Vector Machine (SVM) Classifiers
- Standard Quadratic Programming formulation
- Linear Programming formulation: 1-norm linear SVM
- Polyhedral Knowledge Sets
- Incorporating knowledge sets into a classifier
- Wisconsin breast cancer prognosis dataset
3 Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: the two point sets, labeled A and A-, with the bounding planes and the margin between them]
5 Algebra of the Classification Problem: 2-Category Linearly Separable Case
- Given m points in n dimensional space
- Represented by an m-by-n matrix A
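One common way to write the algebra for this setting is sketched below. Only m, n, and A come from the slide; the symbols D (diagonal matrix of class labels), e (vector of ones), w (normal to the separating plane), and γ (plane offset) are assumptions borrowed from the usual SVM notation.

```latex
\begin{align*}
A &\in \mathbb{R}^{m \times n}, \qquad
D = \operatorname{diag}(d_1, \dots, d_m), \quad d_i \in \{+1, -1\} \\
A_i w &\ge \gamma + 1 \ \text{ if } d_i = +1, \qquad
A_i w \le \gamma - 1 \ \text{ if } d_i = -1 \\
&\Longleftrightarrow \quad D(Aw - e\gamma) \ge e
\end{align*}
```

Under these assumptions the bounding planes are x'w = γ + 1 and x'w = γ - 1, and the distance between them is 2/||w||_2.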
6 Support Vector Machines: Quadratic Programming Formulation
- Solve the following quadratic program
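A sketch of the standard soft-margin quadratic program, written under assumed notation: A is the m-by-n data matrix, D the diagonal matrix of ±1 labels, e a vector of ones, y a vector of slack variables, and ν > 0 a weight trading off error against margin. None of these symbols beyond A is fixed by the slide text.

```latex
\begin{align*}
\min_{w,\,\gamma,\,y} \quad & \nu\, e^{\top} y + \tfrac{1}{2}\, w^{\top} w \\
\text{s.t.} \quad & D(Aw - e\gamma) + y \ge e, \\
& y \ge 0
\end{align*}
```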
7 Support Vector Machines: Linear Programming Formulation
- Use the 1-norm instead of the 2-norm
- This is equivalent to the following linear
program
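A minimal sketch of a 1-norm linear SVM solved as a linear program, assuming the usual form: minimize ν e'y + e's subject to D(Aw - eγ) + y ≥ e, -s ≤ w ≤ s, y ≥ 0. The function name `one_norm_svm`, the parameter `nu`, and the variable layout are illustrative choices, not taken from the talk; the solver is SciPy's `linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def one_norm_svm(A, d, nu=1.0):
    """Fit a linear 1-norm SVM by LP. Variables: z = [w (n), gamma, y (m), s (n)]."""
    m, n = A.shape
    DA = d[:, None] * A  # rows of A scaled by their +1/-1 labels

    # Objective: nu * sum(y) + sum(s), where s bounds |w| componentwise.
    c = np.concatenate([np.zeros(n + 1), nu * np.ones(m), np.ones(n)])

    # D(Aw - e*gamma) + y >= e, rewritten as -DA w + d*gamma - y <= -e.
    margin = np.hstack([-DA, d[:, None], -np.eye(m), np.zeros((m, n))])
    # w - s <= 0 and -w - s <= 0 together give |w| <= s.
    upper = np.hstack([np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
    lower = np.hstack([-np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])

    A_ub = np.vstack([margin, upper, lower])
    b_ub = np.concatenate([-np.ones(m), np.zeros(2 * n)])
    # w and gamma are free; y and s are nonnegative.
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], res.x[n]


# Made-up, linearly separable toy data (not from the talk).
A_toy = np.array([[2.0, 2.0], [3.0, 3.0], [3.0, 1.0],
                  [0.0, 0.0], [-1.0, 0.0], [0.0, -1.0]])
d_toy = np.array([1, 1, 1, -1, -1, -1])
w_toy, gamma_toy = one_norm_svm(A_toy, d_toy, nu=1.0)
print("w =", w_toy, "gamma =", gamma_toy)
```

A new point x would then be classified by the sign of x'w - γ. The 1-norm objective tends to drive components of w to zero, which is one reason to prefer the LP form.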
8 Knowledge-Based SVM via Polyhedral Knowledge Sets
9 Incorporating Knowledge Sets Into an SVM Classifier
- Will show that this implication is equivalent to
a set of constraints that can be imposed on the
classification problem.
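For orientation, a sketch of how such an implication is usually written, assuming a polyhedral knowledge set {x | Bx ≤ b} whose points should fall in class +1, and a second set {x | Cx ≤ c} for class -1. The symbols B, b, C, c, w, and γ are assumed notation, not taken from the slide text.

```latex
\begin{align*}
Bx \le b \ &\Longrightarrow\ x^{\top} w \ge \gamma + 1 \qquad \text{(knowledge set for class } +1\text{)} \\
Cx \le c \ &\Longrightarrow\ x^{\top} w \le \gamma - 1 \qquad \text{(knowledge set for class } -1\text{)}
\end{align*}
```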
10 Knowledge Set Equivalence Theorem
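A sketch of the statement in its usual form, under assumed notation (B is a k-by-n matrix, b a k-vector, u a k-vector of multipliers). It is given here for a class +1 knowledge set.

```latex
\text{Assume } \{x \mid Bx \le b\} \ne \emptyset. \text{ Then, for a given } (w, \gamma):
\begin{align*}
& Bx \le b \ \Longrightarrow\ x^{\top} w \ge \gamma + 1 \\
\Longleftrightarrow\quad & \exists\, u \ge 0 :\quad B^{\top} u + w = 0, \qquad b^{\top} u + \gamma + 1 \le 0
\end{align*}
```

The right-hand side is linear in (w, γ, u), which is what allows it to be added as constraints to a linear program.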
11 Proof of Equivalence Theorem (via Nonhomogeneous Farkas or LP Duality)
Proof by LP Duality
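The duality argument can be illustrated numerically. The sketch below assumes the implication Bx ≤ b ⟹ x'w ≥ γ + 1 and checks it two ways: directly, by minimizing x'w over the knowledge set, and through the dual system B'u + w = 0, b'u + γ + 1 ≤ 0, u ≥ 0. The function names and the box-shaped knowledge set are made up for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def implication_holds(B, b, w, gamma):
    """Primal check: min x'w over {x | Bx <= b} is at least gamma + 1."""
    res = linprog(w, A_ub=B, b_ub=b,
                  bounds=[(None, None)] * len(w), method="highs")
    # A non-optimal status (e.g. unbounded below) means the implication fails.
    return res.status == 0 and res.fun >= gamma + 1 - 1e-9


def certificate_exists(B, b, w, gamma):
    """Dual check: is there u >= 0 with B'u + w = 0 and b'u + gamma + 1 <= 0?"""
    k = B.shape[0]
    res = linprog(np.zeros(k),                      # pure feasibility problem
                  A_ub=b[None, :], b_ub=[-(gamma + 1)],
                  A_eq=B.T, b_eq=-w,
                  bounds=[(0, None)] * k, method="highs")
    return res.status == 0


# Knowledge set: the box 1 <= x1 <= 2, 1 <= x2 <= 2, written as Bx <= b.
B_box = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b_box = np.array([2.0, -1.0, 2.0, -1.0])
w_box = np.array([1.0, 1.0])

for gamma in (0.5, 1.5):
    print(gamma,
          implication_holds(B_box, b_box, w_box, gamma),
          certificate_exists(B_box, b_box, w_box, gamma))
```

On this box the minimum of x'w is 2, so the two checks agree: both succeed when γ + 1 ≤ 2 and both fail otherwise.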
12 Knowledge-Based SVM Classification
13 Knowledge-Based SVM Classification (continued)
14 Knowledge-Based LP with Slack Variables: Minimize Error in Knowledge Set Constraints Satisfaction
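One plausible form of such an LP is sketched below. It assumes each knowledge set is given as (B, b, t), meaning Bx ≤ b implies class t (+1 or -1), and relaxes the equivalent conditions with slacks r and ρ: -r ≤ B'u + t·w ≤ r and b'u + t·γ + 1 ≤ ρ, with u, r, ρ ≥ 0. The function name, the weights `nu` and `mu`, and the variable layout are assumptions for illustration, not the authors' code.

```python
import numpy as np
from scipy.optimize import linprog


def knowledge_based_lp(A, d, knowledge, nu=1.0, mu=1.0):
    """knowledge: list of (B, b, t), read as  Bx <= b  =>  class t (+1 or -1)."""
    m, n = A.shape
    sizes = [B.shape[0] for B, _, _ in knowledge]
    # Layout: w (n) | gamma (1) | y (m) | s (n) | per set: u (k_i), r (n), rho (1)
    base = 2 * n + 1 + m
    total = base + sum(k + n + 1 for k in sizes)
    iw, ig = slice(0, n), n
    iy, is_ = slice(n + 1, n + 1 + m), slice(n + 1 + m, base)

    c = np.zeros(total)
    c[iy] = nu      # data error
    c[is_] = 1.0    # 1-norm of w via s >= |w|
    rows, rhs = [], []

    # Data constraints: D(Aw - e*gamma) + y >= e
    G = np.zeros((m, total))
    G[:, iw] = -(d[:, None] * A)
    G[:, ig] = d
    G[:, iy] = -np.eye(m)
    rows.append(G)
    rhs.append(-np.ones(m))

    # |w| <= s
    for sign in (1.0, -1.0):
        G = np.zeros((n, total))
        G[:, iw] = sign * np.eye(n)
        G[:, is_] = -np.eye(n)
        rows.append(G)
        rhs.append(np.zeros(n))

    offset = base
    for (B, b, t), k in zip(knowledge, sizes):
        iu = slice(offset, offset + k)
        ir = slice(offset + k, offset + k + n)
        irho = offset + k + n
        offset += k + n + 1
        c[ir] = mu      # slack on B'u + t*w = 0
        c[irho] = mu    # slack on b'u + t*gamma + 1 <= 0
        # -r <= B'u + t*w <= r
        for sign in (1.0, -1.0):
            G = np.zeros((n, total))
            G[:, iu] = sign * B.T
            G[:, iw] = sign * t * np.eye(n)
            G[:, ir] = -np.eye(n)
            rows.append(G)
            rhs.append(np.zeros(n))
        # b'u + t*gamma + 1 <= rho
        g = np.zeros((1, total))
        g[0, iu] = b
        g[0, ig] = t
        g[0, irho] = -1.0
        rows.append(g)
        rhs.append(np.array([-1.0]))

    # w and gamma are free; every other variable is nonnegative.
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (total - n - 1)
    res = linprog(c, A_ub=np.vstack(rows), b_ub=np.concatenate(rhs),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[iw], res.x[ig]


# Made-up example with knowledge only and no labeled data:
# the box [1, 2]^2 is class +1 and the box [-2, -1]^2 is class -1.
B_box = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
sets = [(B_box, np.array([2.0, -1.0, 2.0, -1.0]), +1),
        (B_box, np.array([-1.0, 2.0, -1.0, 2.0]), -1)]
w_k, gamma_k = knowledge_based_lp(np.empty((0, 2)), np.empty(0), sets, mu=10.0)
print("w =", w_k, "gamma =", gamma_k)
```

Passing an empty data matrix, as in the example, yields a classifier built from the knowledge sets alone; passing labeled rows as well lets the data and the rules share the same objective.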
15 Knowledge-Based SVM via Polyhedral Knowledge Sets
16 Empirical Evaluation: The Promoter Recognition Dataset
- Promoter: a short DNA sequence that precedes a gene sequence.
- A promoter consists of 57 consecutive DNA nucleotides belonging to {A, G, C, T}.
- Important to distinguish between promoters and nonpromoters.
- This distinction identifies starting locations of genes in long uncharacterized DNA sequences.
17 The Promoter Recognition Dataset: Numerical Representation
- Using 1-of-4 representation, 57 nominal values become 57 x 4 = 228 binary values
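A minimal sketch of the 1-of-4 encoding described above. The column order A, G, C, T and the function name `one_of_four` are arbitrary choices for illustration, and the example sequence is made up rather than drawn from the dataset.

```python
NUCLEOTIDES = "AGCT"  # column order is an arbitrary choice for this sketch


def one_of_four(sequence):
    """Map each nucleotide to four binary values, exactly one of which is 1."""
    bits = []
    for base in sequence.upper():
        position = NUCLEOTIDES.index(base)  # raises ValueError on other symbols
        bits.extend(1 if i == position else 0 for i in range(4))
    return bits


example = "ACGT" * 14 + "A"  # made-up string of 57 nucleotides
encoded = one_of_four(example)
print(len(example), "nominal values ->", len(encoded), "binary values")
```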
18 Promoter Recognition Dataset: Prior Knowledge Rules
- Prior knowledge consists of the following 64 rules R_i
19 Promoter Recognition Dataset: Sample Rules
20 The Promoter Recognition Dataset: Comparative Test Results
21 Wisconsin Breast Cancer Prognosis Dataset: Description of the Data
- 110 instances corresponding to 41 patients whose cancer had recurred and 69 patients whose cancer had not recurred
- 32 numerical features
- The domain theory: two simple rules used by doctors
22 Wisconsin Breast Cancer Prognosis Dataset: Numerical Testing Results
- Doctors' rules are applicable to only 32 out of 110 patients.
- Only 22 of those 32 patients are classified correctly by the rules.
- KSVM linear classifier is applicable to all patients, with correctness of 66.4%.
- Correctness is comparable to the best available results using conventional SVMs.
- KSVM can get classifiers based on knowledge without using any data.
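The counts above can be put on a common scale with a little arithmetic. The sketch below reads the reported correctness of 66.4 as a percentage of all 110 patients, which is an assumption; the variable names are illustrative.

```python
patients = 110
rule_applicable = 32   # patients the doctors' rules apply to
rule_correct = 22      # of those, classified correctly

coverage = rule_applicable / patients           # share of patients the rules cover
rule_accuracy = rule_correct / rule_applicable  # accuracy where the rules apply
ksvm_correct = round(0.664 * patients)          # patients implied by 66.4 percent

print(f"rule coverage: {coverage:.1%}")
print(f"rule accuracy on covered patients: {rule_accuracy:.1%}")
print(f"KSVM correct patients (approx.): {ksvm_correct} of {patients}")
```

The rules reach a similar accuracy but only on the patients they cover, whereas the linear classifier produces a decision for every patient.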
23 Conclusion
- Prior knowledge easily incorporated into classifiers through polyhedral knowledge sets.
- Resulting problem is a simple linear program.
- Knowledge sets can be used with or without conventional labeled data.
- In either case, KSVM is better than most classifiers tested.
24 Future Research
- Generate classifiers based on prior expert knowledge in various fields
  - Diagnostic rules for various diseases
  - Financial investment rules
  - Intrusion detection rules
- Extend knowledge sets to general convex sets
- Nonlinear kernel classifiers. Challenges:
  - Express prior knowledge nonlinearly
  - Extend equivalence theorem to general convex sets
25 Web Pages