Title: Extreme Re-balancing for SVMs and other classifiers
Slide 1: Extreme Re-balancing for SVMs and other classifiers
Authors: Bhavani Raskutti, Adam Kowalczyk
Telstra Corporation, Victoria, Australia
- Presenter: Cui, Shuoyang
- 2005/03/02
Slide 2
- Imbalance makes the minority-class samples farther from the true boundary than the majority-class samples.
- Majority-class samples dominate the penalty introduced by the soft margin.
Slide 3
- Data Balancing
  - up/down sampling
  - No convincing evidence for how the balanced data should be sampled
- Imbalance-free algorithm design
  - The objective function should no longer be accuracy
- Reference
  - Machine Learning from Imbalanced Data Sets 101
  - http://pages.stern.nyu.edu/~fprovost/Papers/skew.PDF
Slide 4
- In this paper
  - Explores the characteristics of two-class learning and analyses situations with supervised learning.
  - In the experiments presented later, compares one-class learning with two-class learning and lists different forms of imbalance compensation.
Slide 5
- Two-class discrimination
  - Take examples from these two classes and generate a model for discriminating between them.
  - For many machine learning algorithms, the training data should include examples from both classes.
Slide 6
When the data has heavily unbalanced representation of these two classes:
- design re-balancing
- ignore the large pool of negative examples
- learn from positive examples only (sketched in code below)
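A minimal sketch of the "learn from positive examples only" idea, using scikit-learn's OneClassSVM as an illustrative stand-in for the paper's 1-class learner; the kernel choice and nu value are assumptions, not the authors' setup.

```python
# Fit a one-class model on the positive examples alone, ignoring all negatives.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_pos = rng.random((30, 5))              # the small pool of positive examples

oc = OneClassSVM(kernel="linear", nu=0.1).fit(X_pos)

# Score unseen points: larger decision values mean "more like the positives".
X_test = rng.random((10, 5))
print(oc.decision_function(X_test))
```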
Slide 7: Why extreme re-balancing
- Extreme imbalance in very high dimensional input spaces
- Minority class consisting of 1-3% of the total data
- Learning sample size is much below the dimensionality of the input space
- Data sets have more than 10,000 features
Slide 8: The kernel machine
- The kernel machine is solved iteratively using the conjugate gradient method.
- Designing a kernel machine means taking a standard algorithm and massaging it so that all references to the original data vectors x appear only in dot products (x_i · x_j); a sketch follows below.
- Given a training sequence (x_i, y_i) of binary n-vectors and bipolar labels y_i ∈ {-1, +1}.
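To make the "dot products only" point concrete, here is a minimal sketch in which the learner touches the data solely through the Gram matrix K[i, j] = x_i · x_j. The use of scikit-learn's precomputed-kernel SVC is an illustrative assumption, not the paper's conjugate-gradient solver.

```python
# Kernelisation sketch: every reference to the data goes through dot products.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = (rng.random((20, 6)) > 0.5).astype(float)   # binary n-vectors
y = np.where(np.arange(20) < 4, 1, -1)          # bipolar, imbalanced labels

K = X @ X.T                                     # Gram matrix of dot products x_i . x_j
clf = SVC(kernel="precomputed").fit(K, y)

# New points also enter only via their dot products with the training vectors.
X_new = (rng.random((3, 6)) > 0.5).astype(float)
print(clf.predict(X_new @ X.T))
```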
Slide 9: Two different cases of kernel machines used here
Slide 10: Two forms of imbalance compensation
- Sample balancing
- Weight balancing
Slide 11: Sample balancing (the three cases are sketched in code below)
- 10: the case of a 1-class learner using all of the negative examples
- 11: the case of a 2-class learner using all training examples
- 01: the case of a 1-class learner using all of the positive examples
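A hedged sketch of the three sample-balancing codes above; the helper name select_case and the label convention y ∈ {-1, +1} are assumptions for illustration.

```python
import numpy as np

def select_case(X, y, code):
    """Return the training subset for a sample-balancing code:
    '11' keeps all examples (2-class learner), '01' only the positives,
    '10' only the negatives (each feeding a 1-class learner)."""
    if code == "11":
        mask = np.ones(len(y), dtype=bool)
    elif code == "01":
        mask = (y == +1)
    elif code == "10":
        mask = (y == -1)
    else:
        raise ValueError(f"unknown code: {code}")
    return X[mask], y[mask]
```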
Slide 12: Weight balancing
- Using different values of the regularisation constants for the minority and majority class data (see the sketch below)
- B is a parameter called the balance factor
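A minimal sketch of weight balancing via per-class regularisation constants; mapping the balance factor B directly onto scikit-learn's class_weight is an assumption about the formulation, not the paper's exact equation.

```python
# Per-class soft-margin penalties: the positive (minority) class gets weight B,
# the negative (majority) class 1 - B, so B -> 1 progressively ignores the
# majority class. The linear kernel and C value are illustrative choices.
from sklearn.svm import SVC

B = 0.7   # balance factor, assumed to lie in [0, 1]
clf = SVC(kernel="linear", C=1.0, class_weight={+1: B, -1: 1.0 - B})
```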
Slide 13: Experiments on real-world data collections
Slide 14: AHR data
- Combined training and test data set
- Each training instance is labelled with control, change, or nc
- All of the information from the different files is converted to a sparse matrix containing 18330 features (a sketch follows below)
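A hedged sketch of collapsing per-instance information from several files into one sparse feature matrix; DictVectorizer and the feature names are purely hypothetical stand-ins for the paper's preprocessing, which produced 18330 features.

```python
# Each instance's features, gathered from the different source files, become
# one row of a sparse matrix; names like "loc_nuclear" are made up here.
from sklearn.feature_extraction import DictVectorizer

instances = [
    {"gene_f1017": 1.0, "loc_nuclear": 1.0},
    {"gene_f2203": 1.0, "loc_cytoplasm": 1.0},
]
vec = DictVectorizer(sparse=True)
X = vec.fit_transform(instances)      # scipy.sparse matrix, one row per instance
print(X.shape, X.nnz)
```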
Slide 15: Reuters data
- A collection of 12902 documents
- Each document has been converted to a vector in a 20197-dimensional word-presence feature space (sketched below)
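A minimal sketch of the word-presence representation; CountVectorizer with binary=True is an illustrative choice, not necessarily the preprocessing used for the Reuters collection.

```python
# Binary bag-of-words: a feature is 1 if the word occurs in the document.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["wheat prices rose sharply", "corn and wheat exports fell"]
vec = CountVectorizer(binary=True)
X = vec.fit_transform(docs)           # sparse document-by-word presence matrix
print(X.toarray())
print(vec.get_feature_names_out())
```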
Slide 16: AROC is used as the performance measure
- AROC is the area under the ROC curve (computed in the sketch below)
- Receiver operating characteristic (ROC) curves are used to describe and compare the performance of diagnostic technology and diagnostic algorithms.
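AROC can be computed directly from classifier scores; this sketch uses scikit-learn's roc_auc_score as an illustrative tool, not the authors' implementation.

```python
from sklearn.metrics import roc_auc_score

y_true = [+1, -1, -1, +1, -1]          # bipolar labels
scores = [0.9, 0.3, 0.4, 0.6, 0.2]     # decision values from a classifier
print(roc_auc_score(y_true, scores))   # 1.0 = perfect ranking, 0.5 = chance
```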
Slide 17: Experiments with real-world data
- Impact of the regularisation constant
- Experiment with sample balancing
- Experiments with weight balancing
Slide 18: Impact of the regularisation constant
Slide 19: Experiment with sample balancing
- The AROC with 2-class learners is close to 1 for all categories, indicating that this categorization problem is easy to learn
Slide 20: Experiments with weight balancing
Slide 21: Experiments with weight balancing
- 2. Test on Reuters
  - To observe the performance of 1-class and 2-class SVMs as the most relevant features are removed
Slide 22: Characteristics of the test on Reuters
- The accuracy of all classifiers is very high
- SVM models start degenerating; the drop in performance for the 2-class SVM is larger
- 1-class SVM models start outperforming 2-class models
- Similar trends
- AROC is always bigger than 0.5