1
Extreme Re-balancing for SVMs and other
classifiers
Authors: Bhavani Raskutti, Adam Kowalczyk
Telstra Corporation, Victoria, Australia
  • Presenter: Cui, Shuoyang
  • 2005/03/02

2
  • Imbalance places the minority-class samples
    farther from the true boundary than the
    majority-class samples.
  • Majority-class samples dominate the penalty
    introduced by the soft margin.

3
  • Data Balancing
  • Up/down sampling (see the sketch after this
    list)
  • No convincing evidence on how the balanced data
    should be sampled
  • Imbalance-free algorithm design
  • The objective function should no longer be
    accuracy
  • Reference
  • Machine Learning from Imbalanced Data Sets 101
  • http://pages.stern.nyu.edu/~fprovost/Papers/skew.PDF
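As a concrete illustration, here is a minimal sketch of naive up/down sampling in Python. This is illustrative only; the paper does not prescribe a particular sampling scheme, and the function names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def downsample_majority(X_maj, n_minority):
    # Keep a random subset of the majority class, matching the minority size.
    idx = rng.choice(len(X_maj), size=n_minority, replace=False)
    return X_maj[idx]

def upsample_minority(X_min, n_majority):
    # Replicate minority examples (sampling with replacement) up to the majority size.
    idx = rng.choice(len(X_min), size=n_majority, replace=True)
    return X_min[idx]
```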

4
  • In this paper
  • Exploring the characteristics of two-class
    learning and analysing situations arising in
    supervised learning.
  • In the experiments presented later, one-class
    learning is compared with two-class learning,
    and different forms of imbalance compensation
    are listed.

5
  • Two-class discrimination
  • Take examples from these two classes and
    generate a model for discriminating them
  • For many machine learning algorithms, the
    training data must include examples from both
    classes.

6
When the data has heavily unbalanced
representation of the two classes:
  • design re-balancing
  • ignore the large pool of negative examples
  • learn from positive examples only

7
Why extreme re-balancing?
  • Extreme imbalance in very high dimensional input
    spaces
  • Minority class consisting of 1-3% of the total
    data
  • Learning sample size is much below the
    dimensionality of the input space
  • Data set has more than 10,000 features

8
The kernel machine
  • The kernel machine is solved iteratively using
    the conjugate gradient method (see the sketch
    below).
  • Designing a kernel machine means taking a
    standard algorithm and massaging it so that all
    references to the original data vectors x appear
    only in dot products (xi · xj).
  • Given a training sequence (xi, yi) of binary
    n-vectors xi and bipolar labels yi ∈ {-1, +1}
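A minimal sketch of such a kernel machine, assuming a regularised least-squares formulation (the paper's exact variant may differ): the data enter only through the Gram matrix of dot products, and the linear system is solved iteratively with conjugate gradient.

```python
import numpy as np
from scipy.sparse.linalg import cg

def fit(X, y, ridge=1.0):
    # Gram matrix: every reference to the data is a dot product <x_i, x_j>.
    K = X @ X.T
    # Solve (K + ridge * I) alpha = y iteratively with conjugate gradient.
    alpha, info = cg(K + ridge * np.eye(len(y)), y.astype(float))
    return alpha

def decision(alpha, X_train, x_new):
    # f(x) = sum_i alpha_i <x_i, x>: prediction also uses only dot products.
    return alpha @ (X_train @ x_new)
```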

9
Two different cases of kernel machines used here
10
Two forms of imbalance compensation
  • Sample balancing
  • Weight balancing

11
Sample balancing (see the sketch below)
  • 10: the case of a 1-class learner using all of
    the negative examples
  • 11: the case of a 2-class learner using all
    training examples
  • 01: the case of a 1-class learner using all of
    the positive examples
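A minimal sklearn-based sketch of the three settings (an assumed mapping onto standard SVC/OneClassSVM, not the authors' own kernel machines); X_pos and X_neg are hypothetical minority- and majority-class feature matrices:

```python
import numpy as np
from sklearn.svm import SVC, OneClassSVM

def train_11(X_pos, X_neg):
    # "11": 2-class learner using all training examples.
    X = np.vstack([X_pos, X_neg])
    y = np.r_[np.ones(len(X_pos)), -np.ones(len(X_neg))]
    return SVC(kernel="linear").fit(X, y)

def train_01(X_pos):
    # "01": 1-class learner using all of the positive examples.
    return OneClassSVM(kernel="linear").fit(X_pos)

def train_10(X_neg):
    # "10": 1-class learner using all of the negative examples.
    return OneClassSVM(kernel="linear").fit(X_neg)
```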
12
Weight balancing
  • Using different values of the regularisation
    constants for the minority and majority class
    data

B is a parameter called the balance factor (see
the sketch below)
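A minimal sketch of weight balancing via per-class regularisation constants; the mapping from B to the two constants is an illustrative assumption, not the paper's exact formula:

```python
from sklearn.svm import SVC

def weight_balanced_svm(B, C=1.0):
    # Assumed scheme: minority (+1) class gets constant B * C, majority (-1)
    # class gets (1 - B) * C; sklearn multiplies C by the class_weight entry.
    return SVC(kernel="linear", C=C, class_weight={1: B, -1: 1.0 - B})
```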
13
Experiments: Real-world data collections
  • AHR-data
  • Reuters data

14
AHR-data
  • Combined training and test data set
  • Each training instance labeled with
    control, change or nc
  • All of the information from different files is
    converted to a sparse matrix containing 18,330
    features

15
Reuters data
  • A collection of 12,902 documents
  • Each document has been converted to a vector in
    a 20,197-dimensional word-presence feature space

16
AROC is used as the performance measure
  • AROC is the area under the ROC curve (see the
    sketch below)
  • Receiver operating characteristic (ROC) curves
    are used to describe and compare the performance
    of diagnostic technologies and diagnostic
    algorithms.
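A quick illustration of AROC on hypothetical decision scores (1.0 is a perfect ranking of positives above negatives, 0.5 is chance level):

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]                    # hypothetical labels
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]  # hypothetical decision values
print(roc_auc_score(y_true, scores))           # 8 of 9 pairs ranked correctly, ~0.89
```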

17
Experiments with Real World Data
  • Impact of regularisation constant
  • Experiments with sample balancing
  • Experiments with weight balancing

18
Impact of regularisation constant
19
Experiments with sample balancing
  • The AROC with 2-class learners is close to 1 for
    all categories, indicating that this
    categorization problem is easy to learn

20
Experiments with weight balancing
  • 1. Test on AHR data

21
Experiments with weight balancing
  • 2. Test on Reuters
  • To observe the performance of 1-class and
    2-class SVMs when most of the features are
    removed

22
Characteristics of the test on Reuters
  • The accuracy of all classifiers is very high
  • SVM models start degenerating; the drop in
    performance for the 2-class SVM is larger.
  • 1-class SVM models start outperforming 2-class
    models
  • Similar trends
  • AROC is always greater than 0.5