Classification via Mathematical Programming Based Support Vector Machines

1
Classification via Mathematical Programming
Based Support Vector Machines
  • Glenn M. Fung

November 26, 2002
Computer Sciences Dept. University of Wisconsin -
Madison
2
Outline of Talk
  • (Standard) Support vector machines (SVM)
  • Classify by halfspaces
  • Proximal support vector machines (PSVM)
  • Classify by proximity to planes
  • Numerical experiments
  • Incremental PSVM classifiers
  • Synthetic dataset consisting of 1 billion points
    in 10-dimensional input space,
    classified in less than 2 hours and 26 minutes
  • Knowledge based linear SVMs
  • Incorporating knowledge sets into a classifier
  • Numerical experiments

3
Support Vector Machines: Maximizing the Margin
between Bounding Planes
[Figure: point sets A+ and A- separated by two bounding planes with the margin between them]
4
Standard Support Vector Machine: Algebra of the
2-Category Linearly Separable Case
5
Standard Support Vector Machine Formulation
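The formulation itself appears only as an image in the slides; a sketch of the standard 2-norm soft-margin SVM in the notation used throughout the talk (A the point matrix, D the diagonal matrix of +1/-1 labels, e a vector of ones, y the slack, nu the trade-off parameter), offered as an assumption consistent with the later PSVM slides:

\min_{w,\gamma,y}\ \nu\, e^\top y + \tfrac{1}{2}\, w^\top w
\quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\quad y \ge 0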
6
Proximal Support Vector Machines (KDD 2002): Fitting the
Data Using Two Parallel Bounding Planes
[Figure: point sets A+ and A- fitted by two parallel proximal planes]
7
PSVM Formulation
We start from the QP SVM formulation; the proximal modification is sketched below.
This simple but critical modification changes
the nature of the optimization problem
tremendously!
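The modified formulation is shown only as an image; a hedged sketch of the proximal modification (replace the inequality constraints by equalities, penalize the slack in the squared 2-norm, and add gamma^2 to the regularizer), using the same notation as above:

\min_{w,\gamma,y}\ \tfrac{\nu}{2}\,\|y\|^2 + \tfrac{1}{2}\left(w^\top w + \gamma^2\right)
\quad \text{s.t.}\quad D(Aw - e\gamma) + y = e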
8
Advantages of New Formulation
  • Objective function remains strongly convex
  • An explicit exact solution can be written in
    terms of the problem data
  • PSVM classifier is obtained by solving a single
    system of linear equations in the usually small
    dimensional input space
  • Exact leave-one-out correctness can be obtained
    in terms of the problem data

9
Linear PSVM
  • Setting the gradient equal to zero gives a
    nonsingular system of linear equations (sketched below)
  • Solution of the system gives the desired PSVM
    classifier
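A sketch of this step, consistent with the MATLAB code on slide 16 (the substitution of y from the equality constraint and the notation H = [A -e] are assumptions here, since the slide's formulas are images): substituting y = e - D(Aw - e\gamma) gives the unconstrained problem

\min_{w,\gamma}\ \tfrac{\nu}{2}\,\| D(Aw - e\gamma) - e \|^2 + \tfrac{1}{2}\left(w^\top w + \gamma^2\right),

and setting the gradient to zero yields the linear system

\left(\tfrac{I}{\nu} + H^\top H\right)\begin{bmatrix} w \\ \gamma \end{bmatrix} = H^\top D e,
\qquad H = [A\ \ {-e}].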

10
Linear PSVM Solution
11
Linear Proximal SVM Algorithm
12
Nonlinear PSVM Formulation
13
The Nonlinear Classifier
  • Where K is a nonlinear kernel, e.g.
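The example kernel on this slide is an image; as an illustration, the Gaussian (radial basis) kernel K(A,B)_ij = exp(-mu*||A(i,:)' - B(:,j)||^2), commonly used with PSVM, can be computed with the following MATLAB sketch (the function name and the parameter mu are assumptions; for a reduced kernel as in RSVM, B would be the transpose of a small random subset of the rows of A):

function K = gaussian_kernel(A,B,mu)
% K(i,j) = exp(-mu*||A(i,:)' - B(:,j)||^2); A is m x n, B is n x k
m = size(A,1); k = size(B,2);
sqA = sum(A.^2,2);                      % m x 1, squared norms of the rows of A
sqB = sum(B.^2,1);                      % 1 x k, squared norms of the columns of B
K = exp(-mu*(sqA*ones(1,k) + ones(m,1)*sqB - 2*A*B));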

14
Nonlinear PSVM
However, reduced kernel techniques (RSVM) can be used
to reduce dimensionality.
15
Nonlinear Proximal SVM Algorithm
(same as the linear algorithm, with the solve step replaced by its kernel version)
16
Linear and Nonlinear PSVM MATLAB Code
function [w, gamma] = psvm(A,d,nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d = diag(D), nu.  OUTPUT: w, gamma
% [w, gamma] = psvm(A,d,nu);
[m,n] = size(A); e = ones(m,1); H = [A -e];
v = (d'*H)';                       % v = H'*D*e
r = (speye(n+1)/nu + H'*H)\v;      % solve (I/nu + H'*H)r = v
w = r(1:n); gamma = r(n+1);        % getting w, gamma from r
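A minimal usage sketch of the function above on a small synthetic two-class problem (the data, labels, and the value of nu are illustrative only, not from the talk):

% two Gaussian clusters in 2 dimensions with labels +1 / -1
A = [randn(50,2)+1; randn(50,2)-1];
d = [ones(50,1); -ones(50,1)];
[w, gamma] = psvm(A, d, 1);          % nu = 1, illustrative value
pred = sign(A*w - gamma);            % classify each point by sign(w'*x - gamma)
trainingCorrectness = mean(pred == d)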
17
Linear PSVM Comparisons with Other SVMs:
Much Faster, Comparable Correctness
18
Linear PSVM vs. LSVM on a 2-Million-Point Dataset:
Over 30 Times Faster
19
Nonlinear PSVM: Spiral Dataset,
94 Red Dots and 94 White Dots
20
Nonlinear PSVM Comparisons
A rectangular kernel of size 8124 x 215 was used
21
Conclusion
  • PSVM is an extremely simple procedure for
    generating linear and nonlinear classifiers
  • PSVM classifier is obtained by solving a single
    system of linear equations in the usually
    small dimensional input space for a linear
    classifier
  • Comparable test set correctness to standard SVM
  • Much faster than standard SVMs: typically an
    order of magnitude less time

22
Incremental PSVM Classification (Second SIAM Data
Mining Conference)
23
Linear Incremental Proximal SVM Algorithm
24
Linear Incremental Proximal SVM: Adding and Retiring Data
  • Capable of modifying an existing linear classifier by both adding and retiring data
  • Option of retiring old data is similar to adding new data (a block-update sketch follows after this list)
  • Financial data: old data is obsolete
  • Option of keeping old data and merging it with the new data
  • Medical data: old data does not become obsolete
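A minimal MATLAB sketch of the block update idea referred to above, consistent with the linear PSVM solve on slide 16 (the readBlock loader, the loop structure, nu, and numBlocks are assumptions; the slides show the algorithm only pictorially):

% accumulate the (n+1)x(n+1) and (n+1)x1 sufficient statistics block by block
HTH = 0; HTDe = 0;
for i = 1:numBlocks
    [Ai, di] = readBlock(i);                 % hypothetical loader for block i
    Hi = [Ai -ones(size(Ai,1),1)];
    HTH  = HTH  + Hi'*Hi;                    % adding a new block
    HTDe = HTDe + Hi'*di;                    % Hi'*Di*ei = Hi'*di
end
% retiring an old block (Aold, dold) simply subtracts its contribution:
%   HTH = HTH - Hold'*Hold;  HTDe = HTDe - Hold'*dold;
r = (speye(size(HTH,1))/nu + HTH)\HTDe;      % same solve as the non-incremental PSVM
w = r(1:end-1); gamma = r(end);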

25
Numerical Experiments: One-Billion-Point Two-Class Dataset
  • Synthetic dataset consisting of 1 billion points in 10-dimensional input space
  • Generated by the NDC (Normally Distributed Clustered) dataset generator
  • Dataset divided into 500 blocks of 2 million points each
  • Solution obtained in less than 2 hours and 26 minutes
  • About 30% of the time was spent reading data from disk
  • Testing set correctness: 90.79%

26
Numerical Experiments: Simulation of a Two-Month,
60-Million-Point Dataset
  • Synthetic dataset consisting of 60 million points (1 million per day) in 10-dimensional input space
  • Generated using NDC
  • At the beginning, we only have data corresponding to the first month
  • Every day:
  • The oldest block of data is retired (1 million points)
  • A new block is added (1 million points)
  • A new linear classifier is calculated daily
  • Only an 11 x 11 matrix is kept in memory at the end of each day. All other data is purged.

27
Numerical Experiments: Separator Changing Through Time
28
Numerical Experiments: Normals to the Separating
Hyperplanes Corresponding to 5-Day Intervals
29
Conclusion
  • Proposed algorithm is an extremely simple
    procedure for generating linear classifiers in an
    incremental fashion for huge datasets.
  • The linear classifier is obtained by solving a
    single system of linear equations in the
    small dimensional input space.
  • The proposed algorithm has the ability to retire
    old data and add new data in a very simple
    manner.
  • Only a matrix of the size of the input space is
    kept in memory at any time

30
Support Vector Machines: Linear Programming
Formulation
  • Use the 1-norm instead of the 2-norm
  • This is equivalent to the following linear
    program
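The linear program itself appears only as an image; a sketch of the standard 1-norm SVM and its LP form, in the same notation as the earlier slides and offered as an assumption:

\min_{w,\gamma,y}\ \nu\, e^\top y + \|w\|_1
\quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\quad y \ge 0,

which, writing w = p - q with p, q \ge 0, becomes the linear program

\min_{p,q,\gamma,y}\ \nu\, e^\top y + e^\top (p + q)
\quad \text{s.t.}\quad D(A(p - q) - e\gamma) + y \ge e,\quad p,\, q,\, y \ge 0.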

31
Conventional Data-Based SVM
32
Knowledge-Based SVM via Polyhedral Knowledge
Sets (NIPS 2002)
33
Incorporating Knowledge Sets into an SVM
Classifier
  • Will show that this implication is equivalent to
    a set of constraints that can be imposed on the
    classification problem.
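The implication and its equivalent constraints appear only as images; a sketch of the equivalence in the usual knowledge-based SVM form (the notation is assumed here): the statement that the knowledge set {x : Bx <= b} lies in the halfspace {x : x'w >= gamma + 1} is, provided the knowledge set is nonempty, equivalent to

\exists\, u \ge 0:\quad B^\top u + w = 0,\qquad b^\top u + \gamma + 1 \le 0,

and these linear constraints in (u, w, \gamma) can be appended to the classification LP.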

34
Knowledge Set Equivalence Theorem
35
Proof of Equivalence Theorem (via Nonhomogeneous
Farkas or LP Duality)
Proof: by LP duality
36
Knowledge-Based SVM Classification
37
Knowledge-Based SVM Classification
38
Parametrized Knowledge-Based LP
39
Numerical Testing: The Promoter Recognition Dataset
  • Promoter: a short DNA sequence that precedes a gene sequence
  • A promoter consists of 57 consecutive DNA nucleotides belonging to {A, G, C, T}
  • Important to distinguish between promoters and nonpromoters
  • This distinction identifies starting locations of genes in long uncharacterized DNA sequences

40
The Promoter Recognition Dataset: Numerical Representation
  • Simple 1-of-N mapping scheme for converting nominal attributes into a real-valued representation
  • Not the most economical representation, but commonly used

41
The Promoter Recognition Dataset: Numerical Representation
  • Feature space mapped from the 57-dimensional nominal space to a real-valued 57 x 4 = 228 dimensional space

57 nominal values → 57 x 4 = 228 binary values
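A minimal MATLAB sketch of this 1-of-N encoding, mapping a 57-character sequence over A, G, C, T to a 57 x 4 = 228 binary vector (the function name and the ordering of the four nucleotides are assumptions):

function x = encode_promoter(seq)
% seq: 1 x 57 char array over 'A','G','C','T'
% x:   1 x 228 binary row vector, one 1-of-4 code per position
alphabet = 'AGCT';
x = zeros(1, 4*length(seq));
for i = 1:length(seq)
    j = find(alphabet == seq(i));    % index of this nucleotide in the alphabet
    x(4*(i-1) + j) = 1;              % set the corresponding indicator bit
end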
42
Promoter Recognition Dataset: Prior Knowledge Rules
  • Prior knowledge consists of the following 64 rules

43
Promoter Recognition Dataset: Sample Rules
44
The Promoter Recognition Dataset: Comparative Algorithms
  • KBANN: knowledge-based artificial neural network (Shavlik et al.)
  • BP: standard backpropagation for neural networks (Rumelhart et al.)
  • O'Neill's Method: empirical method suggested by the biologist O'Neill (O'Neill)
  • NN: nearest neighbor with k = 3 (Cost et al.)
  • ID3: Quinlan's decision tree builder (Quinlan)
  • SVM1: standard 1-norm SVM (Bradley et al.)

45
The Promoter Recognition Dataset: Comparative Test Results
46
Wisconsin Breast Cancer Prognosis Dataset:
Description of the Data
  • 110 instances corresponding to 41 patients whose cancer had recurred and 69 patients whose cancer had not recurred
  • 32 numerical features
  • The domain theory: two simple rules used by doctors

47
Wisconsin Breast Cancer Prognosis Dataset:
Numerical Testing Results
  • Doctors' rules are applicable to only 32 of the 110 patients
  • Only 22 of those 32 patients are classified correctly by this rule (20% correctness)
  • The KSVM linear classifier is applicable to all patients, with correctness of 66.4%
  • Correctness comparable to the best available results using conventional SVMs
  • KSVM can obtain classifiers based on knowledge alone, without using any data

48
Conclusion
  • Prior knowledge easily incorporated into
    classifiers through polyhedral knowledge sets.
  • Resulting problem is a simple LP.
  • Knowledge sets can be used with or without
    conventional labeled data.
  • In either case, KSVM is better than most
    knowledge-based classifiers

49
Breast Cancer Treatment Response: Joint Work with
ExonHit (French Biotech)
  • 35 patients treated by a drug cocktail
  • 9 partial responders, 26 nonresponders
  • 25 gene expression measurements made on each patient
  • 1-norm SVM classifier selected 12 out of the 25 genes
  • Combinatorially selected 6 genes out of the 12
  • Separating plane obtained: 2.7915 T11 + 0.13436 S24 - 1.0269 U23 - 2.8108 Z23 - 1.8668 A19 - 1.5177 X05 + 2899.1 = 0
  • Leave-one-out error: 1 out of 35 (97.1% correctness)
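As an illustration only, the separating plane above can be applied to a new patient's six selected gene-expression values as follows (the example values and the mapping of sign to responder class are assumptions; only the coefficients come from the slide):

coef = [2.7915 0.13436 -1.0269 -2.8108 -1.8668 -1.5177];   % T11 S24 U23 Z23 A19 X05
x = [10 20 30 40 50 60];                                   % hypothetical expression values
f = coef*x' + 2899.1;                                      % value of the separating function
predictedClass = sign(f)                                   % assumed: +1 partial responder, -1 nonresponder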

50
Other papers
  • A Fast and Global Two-Point Low-Storage Optimization Technique for Tracing Rays in 2D and 3D Isotropic Media (Journal of Applied Geophysics)
  • Semi-Supervised Support Vector Machines for Unlabeled Data Classification (Optimization Methods and Software)
  • Select a small subset of an unlabeled dataset to be labeled by an oracle or expert
  • Use the new labeled data and the remaining unlabeled data to train an SVM classifier

51
Other papers
  • Multicategory Proximal SVM Classifiers
  • Fast multicategory algorithm based on PSVM
  • Newton refinement step proposed
  • Data Selection for SVM Classifiers (KDD 2000)
  • Reduce the number of support vectors of a linear
    SVM
  • Minimal Kernel Classifiers (JMLR)
  • Use a concave minimization formulation to reduce
    the SVM model complexity.
  • Useful for online testing where testing time is
    an issue.

52
Other papers
  • A Feature Selection Newton Method for SVM
    Classification
  • LP SVM solved using a Newton method
  • Very sparse solutions are obtained
  • Finite Newton method for Lagrangian SVM
    Classifiers (Neurocomputing Journal)
  • Very fast performance, especially when n > m