Bagging and Boosting Classification Trees to Predict Churn

1
Bagging and Boosting Classification Trees to
Predict Churn
  • Insights from the US Telecom Industry

2
Motivation: Reducing Churn
  • The 2002 Teradata Churn Tournament (Neslin et al.
    2004)
  • Database: an anonymous U.S. wireless telecom
    company
  • Aim: predicting churn (i.e. a customer defecting
    from one provider to another)
  • Monthly churn rate of approximately 2.6% →
    an important issue
  • Cost of churn: $300 to $700, the cost of
    replacing a lost customer in terms of sales
    support, marketing, advertising, etc.

Predicting churn supports the design of targeted
retention strategies (Bolton et al. 2000; Ganesh
et al. 2000; Shaffer and Zhang 2002)
3
Formulation of the Churn Problem
  • Churn as a classification problem
  • Classify a customer i characterized by K
    variables $x_i = (x_{i1}, x_{i2}, \ldots, x_{iK})$ as
  • Churner: $y_i = 1$
  • Non-churner: $y_i = -1$
  • Churn is the response dummy variable to predict:
    $y_i = f(x_i)$

Choice of the binary choice model $f(\cdot)$?
4
Classification Models in Marketing
  • Simple binary logit choice model (e.g. Andrews et
    al. 2002)
  • Models allowing for heterogeneity in
    consumers' response
  • Finite mixture model (e.g. Wedel and Kamakura
    2000)
  • Hierarchical Bayes model (e.g. Yang and Allenby
    2003)
  • Non-parametric choice models
  • Decision trees, neural nets (e.g. Thieme et al.
    2000; West et al. 1997)

5
Classification Models in Marketing (2)
  • Other non-parametric choice models originating
    from the statistical machine learning literature
  • Bagging (Breiman 1996)
  • Boosting (Freund and Schapire 1996)
  • Stochastic gradient boosting (Friedman 2002)
  • Mostly ignored in the marketing literature
  • S.G. Boosting was the winner of the Teradata
    Tournament
  • Cardell (Salford) at the 2003 INFORMS Marketing
    Science Conference

6
Calibration sample $Z = \{(x_i, y_i)\}$, $i = 1, \ldots, N$
7
Aggregation

8
Bagging
  • Let the calibration sample be $Z = \{(x_1, y_1), \ldots,
    (x_i, y_i), \ldots, (x_N, y_N)\}$
  • Draw B bootstrap samples $Z^{*b}$, $b = 1, \ldots, B$
  • From each $Z^{*b}$, a base classifier (e.g. a tree) is
    estimated, giving B score functions $\hat{f}^{b}(\cdot)$
  • The final classifier is obtained by averaging the
    scores: $\hat{f}_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{b}(x)$
  • The classification rule is carried out via
    $\hat{y} = \mathrm{sign}\big(\hat{f}_{\mathrm{bag}}(x)\big)$ (see the sketch below)

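A minimal sketch of this procedure, assuming scikit-learn is available and that X_cal, y_cal are NumPy arrays holding the calibration sample with labels in {-1, 1}; the names and B = 100 are illustrative, not the paper's exact settings.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bag_trees(X_cal, y_cal, B=100, seed=0):
    """Estimate one tree per bootstrap sample of the calibration data."""
    rng = np.random.default_rng(seed)
    n = len(y_cal)
    trees = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)  # bootstrap sample Z*b (with replacement)
        trees.append(DecisionTreeClassifier().fit(X_cal[idx], y_cal[idx]))
    return trees

def bagged_classify(trees, X):
    """Average the B score functions; classify via the sign of the mean score."""
    score = np.mean([t.predict(X) for t in trees], axis=0)  # values in [-1, 1]
    return np.where(score >= 0, 1, -1), score
```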
9
Stochastic Gradient Boosting
  • Winner of the Teradata Churn Modeling Tournament
    (Cardell, Golovnya and Steinberg, Salford
    Systems).
  • Data adaptively resampled

Previously misclassified observations → increased weights
Previously well-classified observations → decreased weights
(a sketch follows below)
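Off the shelf, this can be approximated with scikit-learn's gradient boosting, where subsample < 1.0 gives the stochastic resampling of Friedman (2002). This is not the tournament-winning TreeNet implementation, and the hyperparameters below are illustrative; X_cal, y_cal are as in the earlier sketch and X_val is an assumed hold-out predictor matrix.

```python
from sklearn.ensemble import GradientBoostingClassifier

# Each boosting stage is fit on a random half of the calibration data
# (subsample=0.5), which is what makes the gradient boosting "stochastic".
sgb = GradientBoostingClassifier(
    n_estimators=500,    # number of boosting stages (illustrative)
    learning_rate=0.05,  # shrinkage applied to each stage
    max_depth=3,         # depth of each base tree
    subsample=0.5,       # random fraction of data used per stage
)
sgb.fit(X_cal, y_cal)
scores_val = sgb.decision_function(X_val)  # higher score = higher churn risk
```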
10
Data
[Figure: data setup. For each customer i, 46 predictors
$x_i = (x_{i1}, \ldots, x_{i46})$ and the churn label $y_i \in \{-1, 1\}$
are observed, split over time into a calibration sample and a hold-out
validation sample. Two calibration schemes are used: a balanced sample
(N = 51,306) and a proportional sample (N = 100,462). A sketch of
balanced sampling follows below.]
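One common way to build a balanced calibration sample like the one above is to keep all churners and undersample non-churners to a 50/50 mix; a minimal sketch, where the 50/50 target and the undersampling approach are assumptions rather than details from the slides.

```python
import numpy as np

def balanced_sample(X, y, seed=0):
    """Keep all churners (y = 1); undersample non-churners (y = -1) to match."""
    rng = np.random.default_rng(seed)
    churners = np.flatnonzero(y == 1)
    non_churners = np.flatnonzero(y == -1)
    kept = rng.choice(non_churners, size=churners.size, replace=False)
    idx = np.concatenate([churners, kept])
    rng.shuffle(idx)  # mix the two classes (in-place shuffle)
    return X[idx], y[idx]
```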
11
Research Questions
  • Do bagging and boosting provide better results
    than other benchmarks?
  • What are the most relevant churn drivers or
    triggers that marketers could watch for?
  • How to correct estimated scores obtained from a
    balanced calibration sample, when predicting rare
    events like churn?

12
Comparing Error Rates
Model estimated on the balanced calibration
sample; error rates computed on the hold-out
proportional validation sample
13
Bias due to Balanced Sampling
  • Overestimation of the number of churners
  • Several bias correction methods exist (see e.g.
    Cosslett 1993; Donkers et al. 2003; Franses and
    Paap 2001, pp. 73-75; Imbens and Lancaster 1996;
    King and Zeng 2001a,b; Scott and Wild 1997).
  • However, most are dedicated to traditional models
    (e.g. logit). We discuss two corrections for
    bagging and boosting.

14
The Bias Correction Methods
  • The weighting correction
  • Based on marketers' prior beliefs about the
    churn rate, i.e. the proportion of churners among
    their customers, we attach weights to the
    observations of the balanced calibration sample.
  • The intercept correction
  • Take a non-zero cut-off value $t_B$ such that the
    proportion of predicted churners in the
    calibration sample equals the actual a priori
    proportion of churners (see the sketch after this
    list).

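Both corrections are straightforward to sketch; a minimal, hypothetical illustration in which the helper names and the 50/50 balanced-sample assumption are mine, not the paper's.

```python
import numpy as np

def intercept_correction(scores, pi):
    """Pick the cut-off t_B so that a fraction pi of observations is
    predicted to churn, then classify with that cut-off."""
    t_B = np.quantile(scores, 1.0 - pi)  # top pi share flagged as churners
    return np.where(scores > t_B, 1, -1), t_B

def weighting_correction(y_balanced, pi):
    """Case weights that reweight a 50/50 balanced sample toward the prior
    class shares: pi for churners, 1 - pi for non-churners."""
    return np.where(y_balanced == 1, pi / 0.5, (1.0 - pi) / 0.5)
```

With the 2.6% monthly churn rate from slide 2 as the prior, weighting_correction(y_cal, 0.026) could be passed as sample_weight when fitting each base tree, while intercept_correction is applied after estimation and leaves the fitted model untouched.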
15
Bagging
  • Let the calibration sample be $Z = \{(x_1, y_1), \ldots,
    (x_i, y_i), \ldots, (x_N, y_N)\}$
  • Draw B bootstrap samples $Z^{*b}$, $b = 1, \ldots, B$
  • From each $Z^{*b}$, a base classifier (e.g. a tree) is
    estimated, giving B score functions $\hat{f}^{b}(\cdot)$
  • The final classifier is obtained by averaging the
    scores: $\hat{f}_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{b}(x)$
  • The classification is carried out via
    $\hat{y} = \mathrm{sign}\big(\hat{f}_{\mathrm{bag}}(x)\big)$

16
Assessing the Best Bias Correction
Model estimated on the balanced calibration
sample; error rates computed on the hold-out
proportional validation sample
17
The Top-Decile Lift
  • Focuses on the most critical group of customers
    regarding their churn risk: the ideal segment for
    targeting a retention marketing campaign
  • The top 10% riskiest customers
  • Top-decile lift $= \hat{\pi}_{10\%} / \hat{\pi}$, with
    $\hat{\pi}_{10\%}$ the proportion of churners in this
    risky segment and $\hat{\pi}$ the proportion of
    churners in the whole validation set
  • Top-decile lift is related directly to
    profitability (Neslin et al. 2004; Gupta et al.
    2004); a sketch of the computation follows below

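The lift computation itself is short; a minimal sketch, with names assumed and labels in {-1, 1}.

```python
import numpy as np

def top_decile_lift(scores, y_true):
    """Churn rate among the 10% highest-scored customers, divided by the
    churn rate over the whole validation sample."""
    cutoff = np.quantile(scores, 0.90)  # score threshold for the top decile
    in_top = scores >= cutoff
    return np.mean(y_true[in_top] == 1) / np.mean(y_true == 1)
```

A lift of 2.0, for instance, means the targeted decile contains twice the churn rate of the validation sample as a whole.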
18
Top-Decile Lift with Intercept Correction
Model estimated on the balanced sample, and
lift computed on the validation sample.
19
Validated Top-Decile Lift
Model estimated on the balanced calibration
sample; top-decile lift computed on the hold-out
proportional validation sample
20
Most Important Churn Triggers
[Figure: variable-importance ranking from the bagging model]
21
Partial Dependence Plots
[Figure: partial dependence plots from the bagging model]
22
Partial Dependence Plot
[Figure: partial dependence plot; y-axis: probability to churn,
approximately 49-51%. A sketch of how such plots are produced
follows below.]
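Plots of this kind can be reproduced with scikit-learn's inspection module; a sketch assuming the fitted sgb model and the X_val matrix from the earlier sketches, with feature index 0 as a placeholder for the predictor of interest.

```python
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# Average predicted churn response as one predictor varies,
# marginalizing over the remaining 45 variables.
PartialDependenceDisplay.from_estimator(sgb, X_val, features=[0])
plt.ylabel("Probability to churn")
plt.show()
```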
23
Conclusions: Main Findings
  • Bagging and S.G. boosting are substantially
    better classifiers than the binary logit choice
    model:
  • improvement of 26% for the top-decile lift,
  • good diagnostic measures offering face validity,
  • interesting insights about potential churn
    drivers.
  • Bagging is conceptually simple and easy to
    implement.
  • Intercept correction constitutes an appropriate
    bias correction for bagging when using a balanced
    sampling scheme.

24
Thanks for your attention