Title: Bagging and Boosting Classification Trees to Predict Churn
1. Bagging and Boosting Classification Trees to Predict Churn
- Insights from the U.S. Telecom Industry
2. Motivation: Reducing Churn
- The 2002 Teradata Churn Tournament (Neslin et al. 2004)
- Database: an anonymous U.S. wireless telecom company
- Aim: predicting churn (i.e. a customer defecting from one provider to another)
- Monthly churn rates of approximately 2.6% make churn an important issue
- Cost of churn: $300 to $700, the cost of replacing a lost customer in terms of sales support, marketing, advertising, etc.
- Predicting churn helps in designing targeted retention strategies (Bolton et al. 2000; Ganesh et al. 2000; Shaffer and Zhang 2002)
3. Formulation of the Churn Problem
- Churn as a classification issue
- Classify a customer i, characterized by K variables x_i = (x_i1, x_i2, ..., x_iK), as
  - Churner: y_i = +1
  - Non-churner: y_i = -1
- Churn is the response dummy variable to predict: y_i = f(x_i)
- Choice of the binary choice model f(.)?
4. Classification Models in Marketing
- Simple binary logit choice model (e.g. Andrews et al. 2002)
- Models allowing for heterogeneity in consumers' response
  - Finite mixture model (e.g. Wedel and Kamakura 2000)
  - Hierarchical Bayes model (e.g. Yang and Allenby 2003)
- Non-parametric choice models
  - Decision trees, neural nets (e.g. Thieme et al. 2000; West et al. 1997)
5. Classification Models in Marketing (2)
- Other non-parametric choice models originating from the statistical machine learning literature
  - Bagging (Breiman 1996)
  - Boosting (Freund and Schapire 1996)
  - Stochastic gradient boosting (Friedman 2002)
- Mostly ignored in the marketing literature
- Stochastic gradient boosting was the winner of the Teradata Tournament
  - Cardell (Salford Systems) at the 2003 INFORMS Marketing Science Conference
6. Calibration Sample
Z = {(x_i, y_i), i = 1, ..., N}
7. Aggregation
8. Bagging
- Let the calibration sample be Z = {(x_1, y_1), ..., (x_i, y_i), ..., (x_N, y_N)}
- Draw B bootstrap samples from Z
- From each bootstrap sample, a base classifier (e.g. a tree) is estimated, giving B score functions
- The final classifier is obtained by averaging the B scores
- The classification rule is carried out via the sign of the averaged score (see the sketch below)
9. Stochastic Gradient Boosting
- Winner of the Teradata Churn Modeling Tournament (Cardell, Golovnya and Steinberg, Salford Systems)
- Data adaptively resampled:
  - Previously misclassified observations get higher weights
  - Previously well-classified observations get lower weights
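As an illustration only (not the Salford Systems tournament entry), Friedman's stochastic variant is available in scikit-learn's GradientBoostingClassifier by setting subsample below 1.0; X, y and X_new are the assumed arrays from the bagging sketch, and the hyperparameter values are placeholders:

```python
# Stochastic gradient boosting sketch; settings are illustrative.
from sklearn.ensemble import GradientBoostingClassifier

sgb = GradientBoostingClassifier(
    n_estimators=500,    # number of boosting stages
    learning_rate=0.05,  # shrinkage applied to each stage
    subsample=0.5,       # < 1.0 resamples the data at each stage (Friedman 2002)
    max_depth=3,         # shallow base trees
)
sgb.fit(X, y)
churn_scores = sgb.predict_proba(X_new)[:, 1]  # estimated probability of y = +1
```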
10. Data
- Balanced calibration sample: N = 51,306 customers, with predictors x_i = (x_1, ..., x_46) and response y_i in {-1, +1}
- Proportional validation hold-out sample: N = 100,462 customers, with the same 46 predictors and response
- Samples ordered in time: calibration first, validation hold-out later
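A minimal sketch of how such a balanced calibration sample could be built by down-sampling non-churners (pandas; calibration_df and the churn column are hypothetical names):

```python
# Down-sample non-churners (y = -1) to match the churners (y = +1),
# yielding a roughly 50/50 balanced calibration sample.
import pandas as pd

def balance(df, label="churn", seed=0):
    churners = df[df[label] == 1]
    non_churners = df[df[label] == -1].sample(n=len(churners), random_state=seed)
    return pd.concat([churners, non_churners]).sample(frac=1, random_state=seed)

# balanced = balance(calibration_df)
# The validation hold-out sample keeps the original, proportional churn rate.
```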
11. Research Questions
- Do bagging and boosting provide better results than other benchmarks?
- What are the most relevant churn drivers or triggers that marketers could watch for?
- How should estimated scores obtained from a balanced calibration sample be corrected when predicting rare events like churn?
12. Comparing Error Rates
Models are estimated on the balanced calibration sample; error rates are computed on the hold-out proportional validation sample.
13. Bias due to Balanced Sampling
- Balanced sampling leads to an overestimation of the number of churners
- Several bias correction methods exist (see e.g. Cosslett 1993; Donkers et al. 2003; Franses and Paap 2001, pp. 73-75; Imbens and Lancaster 1996; King and Zeng 2001a,b; Scott and Wild 1997)
- However, most are dedicated to traditional models (e.g. logit); we discuss two corrections for bagging and boosting
14. The Bias Correction Methods
- The weighting correction
  - Based on marketers' prior beliefs about the churn rate, i.e. the proportion of churners among their customers, we attach weights to the observations of the balanced calibration sample.
- The intercept correction
  - Take a non-zero cut-off value t_B such that the proportion of predicted churners in the calibration sample equals the actual a priori proportion of churners (see the sketch below).
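A minimal sketch of the intercept correction, assuming scores holds the averaged bagging scores on the calibration sample and the a priori churn rate is known (e.g. about 2.6% per month):

```python
# Pick the cut-off t_B so that exactly a prior_rate fraction of
# observations is classified as churners; illustrative sketch only.
import numpy as np

def intercept_correction(scores, prior_rate):
    t_B = np.quantile(scores, 1.0 - prior_rate)  # (1 - prior)-quantile of the scores
    y_hat = np.where(scores > t_B, 1, -1)        # churner iff score exceeds t_B
    return y_hat, t_B

# y_hat, t_B = intercept_correction(bagged, prior_rate=0.026)
```

The weighting correction would instead attach observation weights reflecting the prior churn rate when estimating the base classifiers (e.g. via the sample_weight argument of scikit-learn's fit methods).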
15. Bagging (Reminder)
- Recap of Slide 8: draw B bootstrap samples, estimate one base tree per sample, and average the B scores to classify.
16. Assessing the Best Bias Correction
Models are estimated on the balanced calibration sample; error rates are computed on the hold-out proportional validation sample.
17. The Top-Decile Lift
- Focuses on the most critical group of customers regarding their churn risk: the ideal segment for targeting a retention marketing campaign
- Computed over the top 10% riskiest customers, as the ratio of
  - the proportion of churners in this risky segment
  - to the proportion of churners in the whole validation set (see the sketch below)
- The top-decile lift is related directly to profitability (Neslin et al. 2004; Gupta et al. 2004)
18. Top-Decile Lift with Intercept Correction
Model estimated on the balanced sample; lift computed on the validation sample.
19. Validated Top-Decile Lift
Model estimated on the balanced calibration sample; lift computed on the hold-out proportional validation sample.
20. Most Important Churn Triggers
[Figure: variable importance of the churn drivers under bagging]
21. Partial Dependence Plots
[Figure: partial dependence plots under bagging]
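A minimal sketch of how such a plot can be produced, assuming a fitted scikit-learn model (e.g. the sgb sketch above) and scikit-learn >= 1.0; the feature index 0 is illustrative:

```python
# Partial dependence of the churn score on a single predictor.
from sklearn.inspection import PartialDependenceDisplay
import matplotlib.pyplot as plt

PartialDependenceDisplay.from_estimator(sgb, X, features=[0])  # first predictor
plt.show()
```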
22. Partial Dependence Plot
[Figure: partial dependence of the probability to churn (%) on a single driver; y-axis ticks 49, 50, 51]
23. Conclusions: Main Findings
- Bagging and stochastic gradient boosting are substantially better classifiers than the binary logit choice model
  - improvement of 26% in the top-decile lift
- Good diagnostic measures offering face validity
- Interesting insights about potential churn drivers
- Bagging is conceptually simple and easy to implement
- Intercept correction constitutes an appropriate bias correction for bagging when using a balanced sampling scheme
24. Thanks for your attention