Bagging and Boosting Classification Trees to Predict Churn

1
Bagging and Boosting Classification Trees to
Predict Churn
  • Insights from the US Telecom Industry

2
Motivation: Reducing Churn
  • The 2002 Teradata Churn Tournament (Neslin et al.
    2004)
  • Database: an anonymous U.S. wireless telecom
    company
  • Aim: predicting churn (i.e. a customer defecting
    from one provider to another)
  • Monthly churn rate of approximately 2.6% →
    an important issue
  • Cost of churn: $300 to $700, the cost of
    replacing a lost customer in terms of sales
    support, marketing, advertising, etc.

Predicting churn supports the design of targeted
retention strategies (Bolton et al. 2000; Ganesh
et al. 2000; Shaffer and Zhang 2002)
3
Formulation of the Churn Problem
  • Churn as a classification problem
  • Classify a customer i characterized by K
    variables $x_i = (x_{i1}, x_{i2}, \ldots, x_{iK})$ as
  • Churner: $y_i = 1$
  • Non-churner: $y_i = -1$
  • Churn is the response dummy variable to predict:
    $y_i = f(x_i)$

Choice of the binary choice model $f(\cdot)$?
4
Classification Models in Marketing
  • Simple binary logit choice model (e.g. Andrews et
    al. 2002)
  • Models allowing for heterogeneity in
    consumers' response
  • Finite mixture model (e.g. Wedel and Kamakura
    2000)
  • Hierarchical Bayes model (e.g. Yang and Allenby
    2003)
  • Non-parametric choice models
  • Decision trees, neural nets (e.g. Thieme et al.
    2000; West et al. 1997)

5
Classification Models in Marketing (2)
  • Other non-parametric choice models originating
    from the statistical machine learning literature
  • Bagging (Breiman 1996)
  • Boosting (Freund and Schapire 1996)
  • Stochastic gradient boosting (Friedman 2002)
  • Mostly ignored in the marketing literature
  • S.G. Boosting was the winner of the Teradata
    Tournament
  • Cardell (Salford) at the 2003 INFORMS Marketing
    Science Conference

6
Calibration sample $Z = \{(x_i, y_i)\}$, $i = 1, \ldots, N$
7
Aggregation

8
Bagging
  • Let the calibration sample be $Z = \{(x_1, y_1), \ldots,
    (x_i, y_i), \ldots, (x_N, y_N)\}$
  • Draw B bootstrap samples $Z^{*b}$, $b = 1, \ldots, B$
  • From each $Z^{*b}$, a base classifier (e.g. a tree) is
    estimated, giving B score functions $\hat{f}^{b}(\cdot)$
  • The final classifier is obtained by averaging the
    scores: $\hat{f}_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{b}(x)$
  • The classification rule is carried out via
    $\hat{y} = \mathrm{sign}\big(\hat{f}_{\mathrm{bag}}(x)\big)$ (see the sketch below)

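A minimal sketch of this procedure, assuming scikit-learn is available and that X_cal, y_cal are NumPy arrays holding the calibration sample with labels in {-1, 1}; the names and B = 100 are illustrative, not the paper's exact settings.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bag_trees(X_cal, y_cal, B=100, seed=0):
    """Estimate one tree per bootstrap sample of the calibration data."""
    rng = np.random.default_rng(seed)
    n = len(y_cal)
    trees = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)  # bootstrap sample Z*b (with replacement)
        trees.append(DecisionTreeClassifier().fit(X_cal[idx], y_cal[idx]))
    return trees

def bagged_classify(trees, X):
    """Average the B score functions; classify via the sign of the mean score."""
    score = np.mean([t.predict(X) for t in trees], axis=0)  # values in [-1, 1]
    return np.where(score >= 0, 1, -1), score
```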
9
Stochastic Gradient Boosting
  • Winner of the Teradata Churn Modeling Tournament
    (Cardell, Golovnya and Steinberg, Salford
    Systems).
  • Data adaptively resampled

Previously misclassified observations → increased weights
Previously well-classified observations → decreased weights
(a sketch follows below)
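Off the shelf, this can be approximated with scikit-learn's gradient boosting, where subsample < 1.0 gives the stochastic resampling of Friedman (2002). This is not the tournament-winning TreeNet implementation, and the hyperparameters below are illustrative; X_cal, y_cal are as in the earlier sketch and X_val is an assumed hold-out predictor matrix.

```python
from sklearn.ensemble import GradientBoostingClassifier

# Each boosting stage is fit on a random half of the calibration data
# (subsample=0.5), which is what makes the gradient boosting "stochastic".
sgb = GradientBoostingClassifier(
    n_estimators=500,    # number of boosting stages (illustrative)
    learning_rate=0.05,  # shrinkage applied to each stage
    max_depth=3,         # depth of each base tree
    subsample=0.5,       # random fraction of data used per stage
)
sgb.fit(X_cal, y_cal)
scores_val = sgb.decision_function(X_val)  # higher score = higher churn risk
```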
10
Data
[Figure: data setup. For each customer i, 46 predictors
$x_i = (x_{i1}, \ldots, x_{i46})$ and the churn label $y_i \in \{-1, 1\}$
are observed, split over time into a calibration sample and a hold-out
validation sample. Two calibration schemes are used: a balanced sample
(N = 51,306) and a proportional sample (N = 100,462). A sketch of
balanced sampling follows below.]
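One common way to build a balanced calibration sample like the one above is to keep all churners and undersample non-churners to a 50/50 mix; a minimal sketch, where the 50/50 target and the undersampling approach are assumptions rather than details from the slides.

```python
import numpy as np

def balanced_sample(X, y, seed=0):
    """Keep all churners (y = 1); undersample non-churners (y = -1) to match."""
    rng = np.random.default_rng(seed)
    churners = np.flatnonzero(y == 1)
    non_churners = np.flatnonzero(y == -1)
    kept = rng.choice(non_churners, size=churners.size, replace=False)
    idx = np.concatenate([churners, kept])
    rng.shuffle(idx)  # mix the two classes (in-place shuffle)
    return X[idx], y[idx]
```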
11
Research Questions
  • Do bagging and boosting provide better results
    than other benchmarks?
  • What are the most relevant churn drivers or
    triggers that marketers could watch for?
  • How to correct estimated scores obtained from a
    balanced calibration sample, when predicting rare
    events like churn?

12
Comparing Error Rates
Model estimated on the balanced calibration
sample; error rates computed on the hold-out
proportional validation sample
13
Bias due to Balanced Sampling
  • Overestimation of the number of churners
  • Several bias correction methods exist (see e.g.
    Cosslett 1993; Donkers et al. 2003; Franses and
    Paap 2001, pp. 73-75; Imbens and Lancaster 1996;
    King and Zeng 2001a,b; Scott and Wild 1997).
  • However, most are dedicated to traditional models
    (e.g. logit). We discuss two corrections for
    bagging and boosting.

14
The Bias Correction Methods
  • The weighting correction
  • Based on marketers' prior beliefs about the
    churn rate, i.e. the proportion of churners among
    their customers, we attach weights to the
    observations of the balanced calibration sample.
  • The intercept correction
  • Take a non-zero cut-off value $t_B$ such that the
    proportion of predicted churners in the
    calibration sample equals the actual a priori
    proportion of churners (see the sketch after this
    list).

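Both corrections are straightforward to sketch; a minimal, hypothetical illustration in which the helper names and the 50/50 balanced-sample assumption are mine, not the paper's.

```python
import numpy as np

def intercept_correction(scores, pi):
    """Pick the cut-off t_B so that a fraction pi of observations is
    predicted to churn, then classify with that cut-off."""
    t_B = np.quantile(scores, 1.0 - pi)  # top pi share flagged as churners
    return np.where(scores > t_B, 1, -1), t_B

def weighting_correction(y_balanced, pi):
    """Case weights that reweight a 50/50 balanced sample toward the prior
    class shares: pi for churners, 1 - pi for non-churners."""
    return np.where(y_balanced == 1, pi / 0.5, (1.0 - pi) / 0.5)
```

With the 2.6% monthly churn rate from slide 2 as the prior, weighting_correction(y_cal, 0.026) could be passed as sample_weight when fitting each base tree, while intercept_correction is applied after estimation and leaves the fitted model untouched.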
15
Bagging
  • Let the calibration sample be $Z = \{(x_1, y_1), \ldots,
    (x_i, y_i), \ldots, (x_N, y_N)\}$
  • Draw B bootstrap samples $Z^{*b}$, $b = 1, \ldots, B$
  • From each $Z^{*b}$, a base classifier (e.g. a tree) is
    estimated, giving B score functions $\hat{f}^{b}(\cdot)$
  • The final classifier is obtained by averaging the
    scores: $\hat{f}_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{b}(x)$
  • The classification is carried out via
    $\hat{y} = \mathrm{sign}\big(\hat{f}_{\mathrm{bag}}(x)\big)$

16
Assessing the Best Bias Correction
Model estimated on the balanced calibration
sample; error rates computed on the hold-out
proportional validation sample
17
The Top-Decile Lift
  • Focuses on the most critical group of customers
    regarding their churn risk: the ideal segment for
    targeting a retention marketing campaign
  • The top 10% riskiest customers
  • Top-decile lift $= \hat{\pi}_{10\%} / \hat{\pi}$, with
    $\hat{\pi}_{10\%}$ the proportion of churners in this
    risky segment and $\hat{\pi}$ the proportion of
    churners in the whole validation set
  • Top-decile lift is related directly to
    profitability (Neslin et al. 2004; Gupta et al.
    2004); a sketch of the computation follows below

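The lift computation itself is short; a minimal sketch, with names assumed and labels in {-1, 1}.

```python
import numpy as np

def top_decile_lift(scores, y_true):
    """Churn rate among the 10% highest-scored customers, divided by the
    churn rate over the whole validation sample."""
    cutoff = np.quantile(scores, 0.90)  # score threshold for the top decile
    in_top = scores >= cutoff
    return np.mean(y_true[in_top] == 1) / np.mean(y_true == 1)
```

A lift of 2.0, for instance, means the targeted decile contains twice the churn rate of the validation sample as a whole.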
18
Top-Decile Lift with Intercept Correction
Model estimated on the balanced sample, and
lift computed on the validation sample.
19
Validated Top-Decile Lift
Model estimated on the balanced calibration
sample; top-decile lift computed on the hold-out
proportional validation sample
20
Most Important Churn Triggers
[Figure: variable-importance ranking from the bagging model]
21
Partial Dependence Plots
[Figure: partial dependence plots from the bagging model]
22
Partial Dependence Plot
[Figure: partial dependence plot; y-axis: probability to churn,
approximately 49-51%. A sketch of how such plots are produced
follows below.]
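Plots of this kind can be reproduced with scikit-learn's inspection module; a sketch assuming the fitted sgb model and the X_val matrix from the earlier sketches, with feature index 0 as a placeholder for the predictor of interest.

```python
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# Average predicted churn response as one predictor varies,
# marginalizing over the remaining 45 variables.
PartialDependenceDisplay.from_estimator(sgb, X_val, features=[0])
plt.ylabel("Probability to churn")
plt.show()
```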
23
Conclusions: Main Findings
  • Bagging and S.G. boosting are substantially
    better classifiers than the binary logit choice
    model:
  • improvement of 26% for the top-decile lift,
  • good diagnostic measures offering face validity,
  • interesting insights about potential churn
    drivers.
  • Bagging is conceptually simple and easy to
    implement.
  • Intercept correction constitutes an appropriate
    bias correction for bagging when using a balanced
    sampling scheme.

24
Thanks for your attention