Title: Bagging and Boosting Classifiers
1. Bagging and Boosting Classifiers
2. The Challenge
- A U.S. wireless telecom company
- Goal: to discover whether a customer will churn or not during the next few months
- Churn is the response dummy variable such that
  - Churn = 1 if the customer is churning
  - Churn = -1 if the customer is not churning
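A minimal illustration of this coding in Python; the table and the column name `churned` are hypothetical, only the -1/+1 coding comes from the slide:

```python
import numpy as np
import pandas as pd

# Hypothetical customer table with a yes/no churn flag.
df = pd.DataFrame({"churned": ["yes", "no", "no", "yes"]})

# Code the response as +1 (churner) / -1 (non-churner), as above.
y = np.where(df["churned"] == "yes", 1, -1)
```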
3. Opportunity for this Challenge
- Bagging and Boosting
- Aggregating classifiers
(Diagram: individual classifiers aggregated into a FINAL RULE)
- Breiman (1996) found gains in accuracy by aggregating predictors built from reweighted versions of the learning set
4. Bagging and Boosting: Aggregating Classifiers
5. Bagging
- Bagging = Bootstrap Aggregating
- Reweighting of the learning set is done by drawing at random with replacement from the learning set
- Predictors are aggregated by plurality voting
6. The Bagging Algorithm
- Draw B bootstrap samples from the learning set
- From which we derive B predictors
7. Weighting
8. Aggregation
- Final rule: the sign of the sum of the B individual predictions (with the -1/+1 churn coding, this is plurality voting)
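A minimal sketch of the bagging algorithm in Python, assuming numpy arrays, scikit-learn decision trees as the base predictors, and the -1/+1 churn coding above; the function names are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, B=100, random_state=0):
    """Fit B trees, each on a bootstrap sample drawn with replacement from the learning set."""
    rng = np.random.default_rng(random_state)
    n = len(y)
    trees = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)              # bootstrap sample: n draws with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def bagging_predict(trees, X):
    """Aggregate by plurality voting: with -1/+1 labels this is the sign of the summed votes (ties give 0)."""
    votes = sum(tree.predict(X) for tree in trees)
    return np.sign(votes)
```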
9. Boosting
- Freund and Schapire (1997), Breiman (1998)
- Data adaptively resampled
10. AdaBoost
- Initialize the weights
- Fit a classifier with these weights
- Give predicted probabilities to the observations according to this classifier
- Compute pseudo probabilities
- Get new weights
- Normalize them (i.e., rescale so that they sum to 1)
- Combine the pseudo probabilities (a sketch of these steps follows the aggregation slide below)
11. Weighting
...
12. Aggregation
- Final rule: the sign of the combined pseudo probabilities
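A minimal sketch of these steps in Python, in the Real AdaBoost form of Friedman, Hastie and Tibshirani (2000); it assumes numpy arrays, the -1/+1 churn coding, and scikit-learn stumps as the weak classifier, and the function names are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=100, eps=1e-12):
    """y must be coded -1/+1 as above."""
    n = len(y)
    w = np.full(n, 1.0 / n)                                      # initialize the weights
    stumps = []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)              # weak classifier: a stump
        stump.fit(X, y, sample_weight=w)                         # fit with the current weights
        p = np.clip(stump.predict_proba(X)[:, 1], eps, 1 - eps)  # predicted probabilities P(churn)
        f = 0.5 * np.log(p / (1 - p))                            # pseudo probabilities (half log-odds)
        w *= np.exp(-y * f)                                      # new weights: misclassified cases gain weight
        w /= w.sum()                                             # normalize so the weights sum to 1
        stumps.append(stump)
    return stumps

def adaboost_predict(stumps, X, eps=1e-12):
    """Combine the pseudo probabilities over all rounds and take the sign."""
    F = np.zeros(len(X))
    for stump in stumps:
        p = np.clip(stump.predict_proba(X)[:, 1], eps, 1 - eps)
        F += 0.5 * np.log(p / (1 - p))
    return np.sign(F)
```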
13. Advantages of Bagging and Boosting
- Easy to implement, without additional information
- For Bagging: variance reduction
- For Boosting: variance and bias reduction
- For Boosting: no overfitting
14. Choose an Unstable Classifier for Bagging
- Changes in the dataset produce big changes in the predictor
- E.g. neural networks, classification and regression trees
- E.g. of stable classifiers: K-nearest neighbours
Reference: Breiman, L. (1996), "Bagging Predictors", Machine Learning, 24(2), 123-140
15. Choose a Weak Classifier for Boosting
- A classifier that performs slightly better than random guessing
- Classifiers that are too weak do not provide good results
- E.g. classification into 2 classes
  - Random guessing: error rate of 50%
  - Weak classifier: error rate close to 50% (e.g. 45%)
- Stumps are appropriate weak classifiers (binary trees with 2 terminal nodes)
- "AdaBoost with trees is the best off-the-shelf classifier in the world" (Breiman, 1996)
Reference: Friedman, J., Hastie, T. & Tibshirani, R. (2000), "Additive Logistic Regression: A Statistical View of Boosting", The Annals of Statistics, 28(2), 337-407
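As an off-the-shelf alternative to the from-scratch sketch above, scikit-learn's AdaBoostClassifier can be combined with a stump; this is only an illustrative setup, not necessarily the one used in the case study:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Decision stump (a binary tree with 2 terminal nodes) as the weak classifier.
stump = DecisionTreeClassifier(max_depth=1)

# In scikit-learn >= 1.2 the argument is `estimator`; older versions call it `base_estimator`.
model = AdaBoostClassifier(estimator=stump, n_estimators=200)
# model.fit(X_train, y_train); model.predict(X_test)
```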
16. The Churn Problematic
- Company
  - A major U.S. wireless telecom company
  - Churn rate: 1.8% per month
- Industry: highly competitive
  - Consolidation into a few major players
  - Growth is slowing down
  - Competition based on price
  - Customer strategy: to offer new services
17. The Churn Problematic
- Churn rates
  - Annual churn rate in the telecom industry: 20-25%
  - (in recent years: 25-46%)
  - Monthly churn rate: 2%
- Reasons for churn
  - Increased competition
  - Similarities in offerings
  - Portability
18. The Churn Problematic
19. Selection of the Variables: the Procedure
- Descriptive analysis
- Rejection of the variables with more than 30% missing values (see the sketch after this list)
- Theoretical background in Marketing
- Principal Components Analysis for each category
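A minimal pandas sketch of the missing-value filter mentioned above; the 30% threshold comes from the slide, the function name is illustrative:

```python
import pandas as pd

def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.30) -> pd.DataFrame:
    """Reject the variables with more than 30% missing values."""
    keep = df.columns[df.isna().mean() <= max_missing]
    return df[keep]
```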
20. Selection of the Variables
- Available predictors for churn
21. Considered Variables
22. Assessment of the Performance
- Training set (80%) and test set (20%) (see the split sketch after this list)
- Misclassification rate
- Gini Index
- Top Decile Index
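A minimal sketch of the 80/20 split, assuming scikit-learn and arrays X, y built from the selected predictors; stratification on churn is an added assumption, used so both sets keep the same low churn proportion:

```python
from sklearn.model_selection import train_test_split

# 80% training set, 20% test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0
)
```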
23. Top Decile
- Customers are sorted from the predicted most likely to churn to the predicted least
- Take only the top 10%
24. Top Decile
- Customers are sorted from the predicted most likely to churn to the predicted least
- Take only the top 10% (see the sketch below)
(Figure: N = 100 customers, 10 churners)
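A minimal sketch of the top-decile measure, assuming numpy arrays, churn scores from the fitted classifier, and the -1/+1 coding; the names are illustrative:

```python
import numpy as np

def top_decile_rate(y_true, churn_score):
    """Share of actual churners among the 10% of customers predicted most likely to churn."""
    order = np.argsort(-churn_score)             # sort from most to least likely to churn
    top = order[: max(1, len(order) // 10)]      # keep only the top 10%
    return np.mean(y_true[top] == 1)             # churners are coded +1
```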
25. Gini Index
(Figure: risk to churn, 10%)
26. Results: Bagging a Decision Tree
27. Gini Index
28. Gini Index
29. Gini Index
30. Top Decile
31. Top Decile
32. Top Decile
33. Misclassification Rate
34. Misclassification Rate
35. Misclassification Rate
36. Results: Boosting a Decision Stump
- Gini: 36% improvement
- Top Decile: 23% improvement
- Misclassification Rate: 7% improvement
37. Comparison of Bagging and Boosting
38. Conclusions
- Bagging and Boosting are easy to implement
- They give convincing results without any additional information
- Results depend heavily on the particular classification problem
- Many competing versions of Boosting (e.g. TreeNet, Stochastic Gradient Boosting)
- Still many open issues
39. (No transcript)
40. Gini Index
(Figure: over-optimism)
41. Top Decile
(Figure: over-optimism)
42. Misclassification Rate
(Figure: over-optimism)