Title: Extreme Re-balancing for SVMs and other classifiers
Slide 1: Extreme Re-balancing for SVMs and other classifiers
Authors: Bhavani Raskutti, Adam Kowalczyk
Telstra Corporation, Victoria, Australia
- Presenter: Cui, Shuoyang
- 2005/03/02
Slide 2
- Imbalance makes the minority-class samples farther from the true boundary than the majority-class samples.
- Majority-class samples dominate the penalty introduced by the soft margin.
Slide 3
- Data Balancing
  - up/down sampling
  - No convincing evidence for how the balanced data should be sampled
- Imbalance-free algorithm design
  - The objective function should no longer be accuracy
- Reference
  - Machine Learning from Imbalanced Data Sets 101
  - http://pages.stern.nyu.edu/~fprovost/Papers/skew.PDF
Slide 4
- In this paper
  - Explores the characteristics of two-class learning and analyses situations with supervised learning.
  - In the experiments presented later, compares one-class learning with two-class learning and lists different forms of imbalance compensation.
Slide 5
- Two-class discrimination
  - Take examples from these two classes and generate a model for discriminating between them.
  - For many machine learning algorithms, the training data should include examples from both classes.
Slide 6
When the data has heavily unbalanced representation of these two classes:
- design re-balancing
- ignore the large pool of negative examples
- learn from positive examples only (sketched in code below)
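A minimal sketch of the "learn from positive examples only" idea, using scikit-learn's OneClassSVM as an illustrative stand-in for the paper's 1-class learner; the kernel choice and nu value are assumptions, not the authors' setup.

```python
# Fit a one-class model on the positive examples alone, ignoring all negatives.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_pos = rng.random((30, 5))              # the small pool of positive examples

oc = OneClassSVM(kernel="linear", nu=0.1).fit(X_pos)

# Score unseen points: larger decision values mean "more like the positives".
X_test = rng.random((10, 5))
print(oc.decision_function(X_test))
```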
Slide 7: Why extreme re-balancing
- Extreme imbalance in very high dimensional input spaces
- Minority class consisting of 1-3% of the total data
- Learning sample size is much below the dimensionality of the input space
- Data sets have more than 10,000 features
Slide 8: The kernel machine
- The kernel machine is solved iteratively using the conjugate gradient method.
- Designing a kernel machine means taking a standard algorithm and massaging it so that all references to the original data vectors x appear only in dot products (x_i · x_j); a sketch follows below.
- Given a training sequence (x_i, y_i) of binary n-vectors and bipolar labels y_i ∈ {-1, +1}.
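To make the "dot products only" point concrete, here is a minimal sketch in which the learner touches the data solely through the Gram matrix K[i, j] = x_i · x_j. The use of scikit-learn's precomputed-kernel SVC is an illustrative assumption, not the paper's conjugate-gradient solver.

```python
# Kernelisation sketch: every reference to the data goes through dot products.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = (rng.random((20, 6)) > 0.5).astype(float)   # binary n-vectors
y = np.where(np.arange(20) < 4, 1, -1)          # bipolar, imbalanced labels

K = X @ X.T                                     # Gram matrix of dot products x_i . x_j
clf = SVC(kernel="precomputed").fit(K, y)

# New points also enter only via their dot products with the training vectors.
X_new = (rng.random((3, 6)) > 0.5).astype(float)
print(clf.predict(X_new @ X.T))
```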
Slide 9: Two different cases of kernel machines used here
Slide 10: Two forms of imbalance compensation
- Sample balancing
- Weight balancing
Slide 11: Sample balancing (the three cases are sketched in code below)
- 10: the case of a 1-class learner using all of the negative examples
- 11: the case of a 2-class learner using all training examples
- 01: the case of a 1-class learner using all of the positive examples
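A hedged sketch of the three sample-balancing codes above; the helper name select_case and the label convention y ∈ {-1, +1} are assumptions for illustration.

```python
import numpy as np

def select_case(X, y, code):
    """Return the training subset for a sample-balancing code:
    '11' keeps all examples (2-class learner), '01' only the positives,
    '10' only the negatives (each feeding a 1-class learner)."""
    if code == "11":
        mask = np.ones(len(y), dtype=bool)
    elif code == "01":
        mask = (y == +1)
    elif code == "10":
        mask = (y == -1)
    else:
        raise ValueError(f"unknown code: {code}")
    return X[mask], y[mask]
```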
Slide 12: Weight balancing
- Using different values of the regularisation constants for the minority and majority class data (see the sketch below)
- B is a parameter called the balance factor
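A minimal sketch of weight balancing via per-class regularisation constants; mapping the balance factor B directly onto scikit-learn's class_weight is an assumption about the formulation, not the paper's exact equation.

```python
# Per-class soft-margin penalties: the positive (minority) class gets weight B,
# the negative (majority) class 1 - B, so B -> 1 progressively ignores the
# majority class. The linear kernel and C value are illustrative choices.
from sklearn.svm import SVC

B = 0.7   # balance factor, assumed to lie in [0, 1]
clf = SVC(kernel="linear", C=1.0, class_weight={+1: B, -1: 1.0 - B})
```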
Slide 13: Experiments on real-world data collections
Slide 14: AHR data
- Combined training and test data set
- Each training instance is labelled with control, change, or nc
- All of the information from the different files is converted to a sparse matrix containing 18330 features (a sketch follows below)
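A hedged sketch of collapsing per-instance information from several files into one sparse feature matrix; DictVectorizer and the feature names are purely hypothetical stand-ins for the paper's preprocessing, which produced 18330 features.

```python
# Each instance's features, gathered from the different source files, become
# one row of a sparse matrix; names like "loc_nuclear" are made up here.
from sklearn.feature_extraction import DictVectorizer

instances = [
    {"gene_f1017": 1.0, "loc_nuclear": 1.0},
    {"gene_f2203": 1.0, "loc_cytoplasm": 1.0},
]
vec = DictVectorizer(sparse=True)
X = vec.fit_transform(instances)      # scipy.sparse matrix, one row per instance
print(X.shape, X.nnz)
```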
Slide 15: Reuters data
- A collection of 12902 documents
- Each document has been converted to a vector in a 20197-dimensional word-presence feature space (sketched below)
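A minimal sketch of the word-presence representation; CountVectorizer with binary=True is an illustrative choice, not necessarily the preprocessing used for the Reuters collection.

```python
# Binary bag-of-words: a feature is 1 if the word occurs in the document.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["wheat prices rose sharply", "corn and wheat exports fell"]
vec = CountVectorizer(binary=True)
X = vec.fit_transform(docs)           # sparse document-by-word presence matrix
print(X.toarray())
print(vec.get_feature_names_out())
```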
Slide 16: AROC is used as the performance measure
- AROC is the area under the ROC curve (computed in the sketch below)
- Receiver operating characteristic (ROC) curves are used to describe and compare the performance of diagnostic technology and diagnostic algorithms.
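AROC can be computed directly from classifier scores; this sketch uses scikit-learn's roc_auc_score as an illustrative tool, not the authors' implementation.

```python
from sklearn.metrics import roc_auc_score

y_true = [+1, -1, -1, +1, -1]          # bipolar labels
scores = [0.9, 0.3, 0.4, 0.6, 0.2]     # decision values from a classifier
print(roc_auc_score(y_true, scores))   # 1.0 = perfect ranking, 0.5 = chance
```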
Slide 17: Experiments with real-world data
- Impact of the regularisation constant
- Experiment with sample balancing
- Experiments with weight balancing
Slide 18: Impact of the regularisation constant
Slide 19: Experiment with sample balancing
- The AROC with 2-class learners is close to 1 for all categories, indicating that this categorization problem is easy to learn
Slide 20: Experiments with weight balancing
Slide 21: Experiments with weight balancing
- 2. Test on Reuters
  - To observe the performance of 1-class and 2-class SVMs as the most relevant features are removed
Slide 22: Characteristics of the test on Reuters
- The accuracy of all classifiers is very high
- SVM models start degenerating; the drop in performance for the 2-class SVM is larger
- 1-class SVM models start outperforming 2-class models
- Similar trends
- AROC is always bigger than 0.5