Combining Bagging and Random Subspaces to Create Better Ensembles
1
Combining Bagging and Random Subspaces to Create
Better Ensembles
  • Panče Panov, Sašo Džeroski
  • Jožef Stefan Institute

2
Outline
  • Motivation
  • Overview of randomization methods for
    constructing ensembles (bagging, random subspace
    method, random forests)
  • Combining Bagging and Random Subspaces
  • Experiments and results
  • Summary and further work

3
Motivation
  • Random Forests is one of the best performing
    ensemble methods
  • It uses random subsamples of the training data
  • It uses a randomized base-level algorithm
  • Our proposal is to use a similar approach
  • A combination of bagging and the random subspace
    method achieves a similar effect
  • Advantages
  • The method is applicable to any base-level
    algorithm
  • There is no need to randomize the base-level
    algorithm

4
Randomization methods for constructing ensembles
  • Find a set of base-level classifiers that are
    diverse in their decisions and complement each
    other
  • Different possibilities
  • bootstrap sampling
  • random subsets of features
  • randomized versions of the base-level algorithms

5
Bagging
  • Introduced by Breiman in 1996
  • Based on bootstrap sampling with replacement
  • Useful with unstable algorithms (e.g.
    decision trees)

[Diagram: training set S → bootstrap samples S1, S2, …, Sb →
learning algorithm → classifiers C1, C2, …, Cb → ensemble]
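To make the construction concrete, here is a minimal scikit-learn sketch (our own illustration; the experiments later in this talk were run in WEKA): each ensemble member is trained on a bootstrap replicate of the training rows.

```python
# Bagging sketch (illustrative, not the WEKA setup used in the talk):
# each of the b members is trained on a bootstrap sample of the rows,
# drawn with replacement.
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # an unstable base-level algorithm
    n_estimators=10,           # b bootstrap replicates
    bootstrap=True,            # sample training rows with replacement
)
```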
6
Random Subspace Method
  • Introduced by Ho in 1998
  • The training data is modified in the feature
    space
  • Useful with high-dimensional data

[Diagram: training set S → random feature subsets S1, S2, …, Sb →
learning algorithm → classifiers C1, C2, …, Cb → ensemble]
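A comparable sketch (again an assumption on our part, not the authors' code): the random subspace method keeps every training row but gives each member a random subset of the features.

```python
# Random subspace method sketch: all rows, random feature subsets.
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

rsm = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=10,
    bootstrap=False,    # keep every training row (no bootstrap)
    max_features=0.5,   # each member sees a random 50% of the features
)
```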
7
Random Forest
  • Introduced by Breiman in 2001
  • A particular implementation of bagging where the
    base-level algorithm is a random tree

[Diagram: training set S → bootstrap samples S1, S2, …, Sb →
random tree learner → classifiers C1, C2, …, Cb → ensemble]
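In scikit-learn terms this combination is packaged directly (a sketch, not Breiman's original implementation): bagged trees that consider a random feature subset at every split.

```python
# Random forest sketch: bagging where the base learner is a randomized
# tree drawing a random feature subset at each split.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,      # number of bagged random trees
    max_features="sqrt",   # random feature subset per split
)
```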
8
Combining Bagging and Random Subspaces
  • Training sets are generated on the basis of
    bagging and random subspaces
  • First we perform bootstrap sampling with
    replacement,
  • then we perform random feature subset selection
    on the bootstrap samples
  • The new algorithm is named SubBag
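A minimal Python sketch of this procedure follows (illustrative only: the function names and the subspace fraction are our assumptions, and the authors' implementation is in WEKA). Each member is trained on a bootstrap replicate projected onto a random feature subset, and predictions are combined by plurality vote.

```python
# SubBag sketch (illustrative, not the authors' WEKA implementation):
# bootstrap sampling with replacement, then random feature subset
# selection on each bootstrap sample.
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def fit_subbag(X, y, base=DecisionTreeClassifier(), b=10,
               subspace_frac=0.75, seed=0):
    """Train b classifiers, each on a bootstrap replicate of (X, y)
    projected onto a random subset of P' < P features."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    p_prime = max(1, int(subspace_frac * p))
    members = []
    for _ in range(b):
        rows = rng.integers(0, n, size=n)                  # bootstrap, with replacement
        cols = rng.choice(p, size=p_prime, replace=False)  # random subspace
        members.append((clone(base).fit(X[rows][:, cols], y[rows]), cols))
    return members

def predict_subbag(members, X):
    """Plurality vote over the member predictions."""
    votes = np.array([clf.predict(X[:, cols]) for clf, cols in members])
    def majority(column):
        values, counts = np.unique(column, return_counts=True)
        return values[counts.argmax()]
    return np.apply_along_axis(majority, 0, votes)
```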

9-14
SubBag construction (diagram, built up over slides 9-14)

[Diagram: training set S with P features → bootstrap sampling with
replacement produces b bootstrap replicates S1, S2, …, Sb → random
subspace selection keeps P' features (P' < P) of each replicate,
giving S1', S2', …, Sb' → a learning algorithm is trained on each,
producing classifiers C1, C2, …, Cb → ensemble]
15
Experiments
  • 19 datasets from the UCI Repository
  • The WEKA environment was used for the experiments
  • Comparison of SubBag (the proposed method) to
  • Random Subspace Method
  • Bagging
  • Random Forest
  • Three different base-level algorithms were used
  • J48 decision tree
  • JRip rule learner
  • IBk nearest neighbor
  • 10-fold cross-validation was performed
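For readers without WEKA, the protocol can be approximated in scikit-learn (a sketch with assumed analogues: J48 ≈ DecisionTreeClassifier, IBk ≈ KNeighborsClassifier; the dataset below is a stand-in, so scores will not reproduce the reported numbers). Note that BaggingClassifier with both bootstrap=True and max_features < 1.0 reproduces the SubBag construction.

```python
# Evaluation sketch: 10-fold cross-validation of each ensemble on one
# UCI dataset (breast cancer, standing in for the 19 datasets used).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
ensembles = {
    "Bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=10),
    "RSM":     BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                                 bootstrap=False, max_features=0.5),
    "SubBag":  BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                                 bootstrap=True, max_features=0.75),
    "RF":      RandomForestClassifier(n_estimators=10),
}
for name, clf in ensembles.items():
    scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```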

16
Results
[Results table not included in the transcript]
17
Results
[Results table not included in the transcript]
18
Results
[Results table not included in the transcript]
19
Results: Wilcoxon test
  • Predictive performance using J48 as base-level
    classifier
  • Predictive performance using JRip as base-level
    classifier
  • Predictive performance using IBk as base-level
    classifier

20
Summary
  • SubBag is comparable to Random Forests in the case
    of J48 as the base-level algorithm, and better than
    Bagging and Random Subspaces
  • SubBag is comparable to Bagging and better than
    Random Subspaces in the case of JRip
  • SubBag is better than Bagging and Random
    Subspaces in the case of IBk

21
Further work
  • Investigate the diversity of the ensembles and
    compare it with other methods
  • Use different combinations of bagging and random
    subspaces (e.g. bags of RSM ensembles and RSM
    ensembles of bags)
  • Compare with bagged ensembles of randomized
    algorithms (e.g. rule learners)