Combining Bagging and Random Subspaces to Create Better Ensembles
1
Combining Bagging and Random Subspaces to Create
Better Ensembles
  • Panče Panov, Sašo Džeroski
  • Jožef Stefan Institute

2
Outline
  • Motivation
  • Overview of randomization methods for
    constructing ensembles (bagging, random subspace
    method, random forests)
  • Combining Bagging and Random Subspaces
  • Experiments and results
  • Summary and further work

3
Motivation
  • Random Forests is one of the best performing
    ensemble methods
  • It uses random subsamples of the training data
  • It uses a randomized base-level algorithm
  • Our proposal is to use a similar approach
  • A combination of bagging and the random subspace
    method achieves a similar effect
  • Advantages
  • The method is applicable to any base-level
    algorithm
  • There is no need to randomize the base-level
    algorithm

4
Randomization methods for constructing ensembles
  • Find a set of base-level classifiers that are
    diverse in their decisions and complement each
    other
  • Different possibilities
  • bootstrap sampling
  • random subsets of features
  • randomized versions of the base-level algorithms

5
Bagging
  • Introduced by Breiman in 1996
  • Based on bootstrap sampling with replacement
  • Useful with unstable algorithms (e.g.
    decision trees)

[Diagram: training set S → bootstrap samples S1, S2, …, Sb →
learning algorithm → classifiers C1, C2, …, Cb → ensemble]
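To make the construction concrete, here is a minimal scikit-learn sketch (our own illustration; the experiments later in this talk were run in WEKA): each ensemble member is trained on a bootstrap replicate of the training rows.

```python
# Bagging sketch (illustrative, not the WEKA setup used in the talk):
# each of the b members is trained on a bootstrap sample of the rows,
# drawn with replacement.
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # an unstable base-level algorithm
    n_estimators=10,           # b bootstrap replicates
    bootstrap=True,            # sample training rows with replacement
)
```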
6
Random Subspace Method
  • Introduced by Ho in 1998
  • The training data is modified in the feature
    space
  • Useful with high-dimensional data

[Diagram: training set S → random feature subsets S1, S2, …, Sb →
learning algorithm → classifiers C1, C2, …, Cb → ensemble]
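A comparable sketch (again an assumption on our part, not the authors' code): the random subspace method keeps every training row but gives each member a random subset of the features.

```python
# Random subspace method sketch: all rows, random feature subsets.
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

rsm = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=10,
    bootstrap=False,    # keep every training row (no bootstrap)
    max_features=0.5,   # each member sees a random 50% of the features
)
```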
7
Random Forest
  • Introduced by Breiman in 2001
  • A particular implementation of bagging where the
    base-level algorithm is a random tree

[Diagram: training set S → bootstrap samples S1, S2, …, Sb →
random tree learner → classifiers C1, C2, …, Cb → ensemble]
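In scikit-learn terms this combination is packaged directly (a sketch, not Breiman's original implementation): bagged trees that consider a random feature subset at every split.

```python
# Random forest sketch: bagging where the base learner is a randomized
# tree drawing a random feature subset at each split.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,      # number of bagged random trees
    max_features="sqrt",   # random feature subset per split
)
```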
8
Combining Bagging and Random Subspaces
  • Training sets are generated on the basis of
    bagging and random subspaces
  • First we perform bootstrap sampling with
    replacement,
  • then we perform random feature subset selection
    on the bootstrap samples
  • The new algorithm is named SubBag
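A minimal Python sketch of this procedure follows (illustrative only: the function names and the subspace fraction are our assumptions, and the authors' implementation is in WEKA). Each member is trained on a bootstrap replicate projected onto a random feature subset, and predictions are combined by plurality vote.

```python
# SubBag sketch (illustrative, not the authors' WEKA implementation):
# bootstrap sampling with replacement, then random feature subset
# selection on each bootstrap sample.
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def fit_subbag(X, y, base=DecisionTreeClassifier(), b=10,
               subspace_frac=0.75, seed=0):
    """Train b classifiers, each on a bootstrap replicate of (X, y)
    projected onto a random subset of P' < P features."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    p_prime = max(1, int(subspace_frac * p))
    members = []
    for _ in range(b):
        rows = rng.integers(0, n, size=n)                  # bootstrap, with replacement
        cols = rng.choice(p, size=p_prime, replace=False)  # random subspace
        members.append((clone(base).fit(X[rows][:, cols], y[rows]), cols))
    return members

def predict_subbag(members, X):
    """Plurality vote over the member predictions."""
    votes = np.array([clf.predict(X[:, cols]) for clf, cols in members])
    def majority(column):
        values, counts = np.unique(column, return_counts=True)
        return values[counts.argmax()]
    return np.apply_along_axis(majority, 0, votes)
```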

9-14
SubBag construction (diagram, built up over slides 9-14)

[Diagram: training set S with P features → bootstrap sampling with
replacement produces b bootstrap replicates S1, S2, …, Sb → random
subspace selection keeps P' features (P' < P) of each replicate,
giving S1', S2', …, Sb' → a learning algorithm is trained on each,
producing classifiers C1, C2, …, Cb → ensemble]
15
Experiments
  • 19 datasets from the UCI Repository
  • The WEKA environment was used for the experiments
  • Comparison of SubBag (the proposed method) to
  • Random Subspace Method
  • Bagging
  • Random Forest
  • Three different base-level algorithms were used
  • J48 decision tree
  • JRip rule learner
  • IBk nearest neighbor
  • 10-fold cross-validation was performed
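For readers without WEKA, the protocol can be approximated in scikit-learn (a sketch with assumed analogues: J48 ≈ DecisionTreeClassifier, IBk ≈ KNeighborsClassifier; the dataset below is a stand-in, so scores will not reproduce the reported numbers). Note that BaggingClassifier with both bootstrap=True and max_features < 1.0 reproduces the SubBag construction.

```python
# Evaluation sketch: 10-fold cross-validation of each ensemble on one
# UCI dataset (breast cancer, standing in for the 19 datasets used).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
ensembles = {
    "Bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=10),
    "RSM":     BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                                 bootstrap=False, max_features=0.5),
    "SubBag":  BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                                 bootstrap=True, max_features=0.75),
    "RF":      RandomForestClassifier(n_estimators=10),
}
for name, clf in ensembles.items():
    scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```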

16
Results
[Results table not included in the transcript]
17
Results
[Results table not included in the transcript]
18
Results
[Results table not included in the transcript]
19
Results: Wilcoxon test
  • Predictive performance using J48 as base-level
    classifier
  • Predictive performance using JRip as base-level
    classifier
  • Predictive performance using IBk as base-level
    classifier

20
Summary
  • SubBag is comparable to Random Forests in the case
    of J48 as the base-level algorithm, and better than
    Bagging and Random Subspaces
  • SubBag is comparable to Bagging and better than
    Random Subspaces in the case of JRip
  • SubBag is better than Bagging and Random
    Subspaces in the case of IBk

21
Further work
  • Investigate the diversity of the ensembles and
    compare it with other methods
  • Use different combinations of bagging and random
    subspaces (e.g. bags of RSM ensembles and RSM
    ensembles of bags)
  • Compare with bagged ensembles of randomized
    algorithms (e.g. rule learners)