Title: Customer Relationship Management: A Databased Approach
1. Customer Relationship Management: A Databased Approach
- V. Kumar
- Werner J. Reinartz
- Instructor's Presentation Slides
2. Chapter Ten
3. Topics Discussed
- Applications of Data Mining
- Involvement of the three main groups participating in a data-mining project
- Overview of the Data Mining Process
- CRM at Work: Credite Est and Yapi Kredi
4. Applications of Data Mining
- Reducing churn with the help of predictive models, which enable early identification of those customers likely to stop doing business with the company
- Increasing customer profitability by identifying customers with a high growth potential
- Reducing marketing costs through more selective targeting
5. Overview of the Data Mining Process
[Figure: process cycle with the steps (Re)define Business Objectives, Get Raw Data, Identify Relevant Variables, Gain Customer Insight, Act, and Learn]
- (Re)define business objectives: define objectives and expectations; define measurement of success
- Get raw data: extract descriptive and transactional data
- Identify relevant variables: create analytical variables; select relevant variables
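6. Timeframe of Data Mining Methodology
[Figure: timeframe of the data mining methodology across the phases (Re)define business objectives, Get raw data, Identify relevant variables, and Gain customer insight]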
7. Extent of Involvement of the Three Main Groups Participating in a Data-Mining Project
8. Involvement of Business, Data Mining and IT Resources in a Typical Data Mining Project
- Data mining group
  - Understands the business objectives and supports the business group in refining, and sometimes correcting, the scope and expectations
  - Most active during the variable selection and modeling phase
  - Shares the obtained customer insights with the business group
- IT resources
  - Required for sourcing and extracting the data used for modeling
- Business group
  - Involved in checking the plausibility and soundness of the solution in business terms
  - Takes the lead in deploying the new insights into corporate action, such as a call center or direct mail campaign
9. Manipulations to Data Set
- Column manipulations
  - Transformation
  - Derivation
  - Elimination
- Row manipulations
  - Aggregation
  - Change detection
  - Missing value detection
  - Outlier detection
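As a rough illustration of a few of these manipulations, the sketch below uses only the Python standard library on a made-up customer table. The field names (`age`, `monthly_spend`, `annual_spend`), the values, and the 1.5 x IQR outlier rule are assumptions chosen for the example; the slides do not prescribe any particular rule.

```python
import statistics

# Hypothetical customer rows; None marks a missing value.
rows = [
    {"id": 1, "age": 34, "monthly_spend": 120.0},
    {"id": 2, "age": None, "monthly_spend": 95.0},
    {"id": 3, "age": 51, "monthly_spend": 110.0},
    {"id": 4, "age": 45, "monthly_spend": 2500.0},
    {"id": 5, "age": 29, "monthly_spend": 105.0},
    {"id": 6, "age": 62, "monthly_spend": 100.0},
    {"id": 7, "age": 38, "monthly_spend": 115.0},
    {"id": 8, "age": 47, "monthly_spend": 130.0},
]

# Row manipulation: missing value detection.
missing_age_ids = [r["id"] for r in rows if r["age"] is None]

# Row manipulation: outlier detection with the (assumed) 1.5 * IQR rule.
spend = [r["monthly_spend"] for r in rows]
q1, _, q3 = statistics.quantiles(spend, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outlier_ids = [r["id"] for r in rows if not low <= r["monthly_spend"] <= high]

# Column manipulation: derivation of a new variable from an existing one.
for r in rows:
    r["annual_spend"] = 12 * r["monthly_spend"]

# Column manipulation: elimination of the now-redundant source column.
reduced_rows = [{k: v for k, v in r.items() if k != "monthly_spend"} for r in rows]

print("missing age:", missing_age_ids)
print("spend outliers:", outlier_ids)
```

Flagged rows would normally be reviewed rather than dropped automatically, since an extreme value can be a genuine high-value customer.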
10. Data Preparation
- For modeling, incoming data is sampled and split into several streams:
  - Train set: used to build the models
  - Test set: used for out-of-sample tests of the model quality and to select the final model candidate
  - Scoring data: used for model-based prediction; large compared to the other data sets
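The split into streams can be sketched as follows. This is a minimal example under stated assumptions: the 70/30 ratio, the fixed seed, and the record layout (`customer_id`, `purchased`) are illustrative choices, not values given in the slides.

```python
import random

def split_train_test(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the labelled records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labelled history: the outcome (purchased or not) is known.
labelled = [{"customer_id": i, "purchased": i % 4 == 0} for i in range(100)]
train_set, test_set = split_train_test(labelled)

# Hypothetical scoring data: current customers whose outcome is not yet known.
# It is typically much larger than the train and test sets.
scoring_data = [{"customer_id": i} for i in range(100, 1100)]

print(len(train_set), len(test_set), len(scoring_data))
```

Keeping the test set strictly separate from the train set is what makes the later out-of-sample comparison of model candidates meaningful.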
11. Define Business Objectives
- Modeling of expected customer potential, in order to target acquisition of customers who will be profitable over the whole lifetime of the business relationship
- Distinguish between customers with a target variable equal to zero and customers with a target variable equal to one
- Establish likelihood threshold levels above which the business group thinks a prospect should be included in the marketing campaign
12. Define Business Objectives (contd.)
- Define the set of business or selection rules for the campaign (e.g., the customers that should be excluded from or included in the target groups)
- Define the details of project execution, specifying the start and delivery dates of the data mining process and the responsible resources for each task
- Define the chosen experimental setup for the campaign
- Define a cost/revenue matrix describing how the business mechanics will work in the supported campaign and how it will impact the data mining process
- Establish the criteria for evaluating the success of the campaign
- Find a benchmark to compare against: results obtained in the past for the same or similar campaign setups using traditional targeting methods rather than predictive models
13. Cost/Revenue Matrix
- Will have an impact on the choice of model parameters, such as the cut-off point for the selected model scores
- It will also give business users an immediately interpretable table
14. Cost/Revenue Matrix
Cost/revenue matrix | In reality, prospect did not purchase | In reality, prospect did purchase
Model predicts prospect will not purchase (not contacted) | Cost: 0; 1st year revenue: 0; Total: 0 | Lost business opportunity of 895
Model predicts prospect will purchase (contacted) | Cost: -5; 1st year revenue: 0; Total: -5 | Cost: -5 - 100; 1st year revenue: 1000; Total: 895
- Assuming the average cost per call is 5, each positive responder (purchaser) will generate additional cost due to
  - administration work required to register him as a new customer
  - the cost of the delivered phone handset (say, 100)
- Customers who respond positively will generate average revenue of 1000 per year
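The arithmetic behind the matrix can be checked with a short sketch. The cost and revenue figures come from the slide; the function names and the break-even calculation are illustrative additions, and the break-even assumes that only first-year revenue counts and that the administration work carries no cost beyond the handset.

```python
CALL_COST = 5              # average cost per call (from the slide)
HANDSET_COST = 100         # cost of the delivered phone handset (from the slide)
FIRST_YEAR_REVENUE = 1000  # average first-year revenue of a purchaser (from the slide)

def payoff(contacted, purchased):
    """First-year payoff for one prospect in a given cell of the matrix."""
    if not contacted:
        # No cost and no revenue; a would-be purchaser is a lost opportunity.
        return 0
    if purchased:
        return FIRST_YEAR_REVENUE - CALL_COST - HANDSET_COST
    return -CALL_COST

def expected_payoff_if_contacted(p_purchase):
    """Expected payoff of contacting a prospect with purchase probability p."""
    return p_purchase * payoff(True, True) + (1 - p_purchase) * payoff(True, False)

# Break-even purchase probability: contacting pays off on average above this value.
break_even = CALL_COST / (payoff(True, True) + CALL_COST)

print(payoff(True, True))    # 895
print(payoff(True, False))   # -5
print(round(break_even, 4))  # 0.0056
```

Under these assumptions a prospect is worth calling once the modeled purchase probability exceeds roughly 0.56%, which shows how the matrix feeds directly into the choice of the score cut-off point.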
15. Get Raw Data
- Identify, extract and consolidate raw data in a database (often called an Analytical Data Mart)
- Check the quality of the analytical raw data: technical checks, as well as ensuring that the data makes sense in the given business context
16. Get Raw Data (contd.)
- Step 1: Looking for Data Sources
  - Mixed top-down and bottom-up process, driven by business requirements (top) and technical restrictions (bottom)
- Step 2: Loading the Data
  - Define how the data will be imported into the data mining environment
- Step 3: Checking Data Quality
  - Technical aspects of the data: primary keys, duplicate records, missing values
  - Business context: realistic data
17. Step 1: Looking for Data Sources
- Data warehouse infrastructures with advanced data cleansing processes can help ensure that you are working with high-quality data
- Build a (simple) relational data model onto which the source data will be mapped
18. Step 2: Loading the Data
- Define further query restrictions, prepared by IT teams, for execution at pre-defined time windows in batch mode
- Deliver extracted data to the data mining environment in a pre-defined format
- Further process the data and use it to fill the previously defined data model in the data mining environment, as part of the ETL (Extract-Transform-Load) process
19. Step 3: Checking Data Quality
- Assess and understand the limitations of the data resulting from its inherent (good or bad) quality aspects
- Create an analytical database as the basis for subsequent analyses
- Carry out a preliminary data quality assessment
  - To assure an acceptable level of quality of the delivered data
  - To ensure that the data mining team has a clear understanding of how to interpret the data in business terms
- Data miners have to carry out some basic data interpretation and aggregation exercises
20. Identify Relevant Predictive Variables
- Step 1: Create Analytical Customer View (Flattening the Data)
- Step 2: Create Analytical Variables
- Step 3: Select Predictive Variables
21. Step 1: Create Analytical Customer View (Flattening the Data)
- The individual customer constitutes the observational unit for data analysis and predictive modeling
- All data pertaining to an individual customer is contained in one observation (row, record)
- Individual columns (variables, fields) represent the conditions at specific points in time or a summary over a whole period
- Definition of the target or dependent variable: values should be generated for all customers and added to the existing data tables
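A minimal sketch of flattening, assuming a hypothetical transaction table with `customer_id`, `month`, and `amount` fields and a hypothetical set of known purchasers used to define the 0/1 target. The summary columns chosen here are examples only.

```python
from collections import defaultdict

# Hypothetical transaction-level data: several rows per customer.
transactions = [
    {"customer_id": "C1", "month": "2024-01", "amount": 40.0},
    {"customer_id": "C1", "month": "2024-02", "amount": 60.0},
    {"customer_id": "C2", "month": "2024-01", "amount": 15.0},
]
purchasers = {"C1"}  # hypothetical outcome used to define the target variable

def flatten(transactions, purchasers):
    """Aggregate transactions into one row per customer and add the target."""
    view = defaultdict(
        lambda: {"n_transactions": 0, "total_amount": 0.0, "last_month": ""}
    )
    for t in transactions:
        row = view[t["customer_id"]]
        row["n_transactions"] += 1                              # summary over the period
        row["total_amount"] += t["amount"]                      # summary over the period
        row["last_month"] = max(row["last_month"], t["month"])  # point in time
    for customer_id, row in view.items():
        row["target"] = 1 if customer_id in purchasers else 0
    return dict(view)

customer_view = flatten(transactions, purchasers)
for customer_id, row in customer_view.items():
    print(customer_id, row)
```

Each customer now occupies exactly one record, which is the shape most predictive modeling tools expect as input.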
22. Step 2: Create Analytical Variables
- Introduce additional variables derived from the original ones
- When needed, transform variables to get new and more predictive variables
- Increase normality of variable distributions to help the predictive model training process
- Missing value management is key for enhancing the quality of the analytical data set
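One common way to act on these points is a log transform for a right-skewed variable combined with median imputation and a missing-value indicator. The sketch below is one possible approach on made-up balances, not the specific method used in the book; the variable names and the choice of `log1p` are assumptions.

```python
import math
import statistics

balances = [120.0, 80.0, None, 15000.0, 300.0, 50.0]  # hypothetical; None = missing

# Missing value management: impute the median and keep an indicator variable.
observed = [b for b in balances if b is not None]
median_balance = statistics.median(observed)
raw_filled = [median_balance if b is None else b for b in balances]

analytical = [
    {
        "balance_missing": int(b is None),  # derived indicator variable
        "log_balance": math.log1p(filled),  # transformed variable
    }
    for b, filled in zip(balances, raw_filled)
]
log_values = [row["log_balance"] for row in analytical]

def skewness(xs):
    """Population skewness: 0 for a symmetric distribution."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mean) / sd) ** 3 for x in xs) / len(xs)

print("skewness before:", round(skewness(raw_filled), 2))
print("skewness after: ", round(skewness(log_values), 2))
```

The transformed variable is less skewed than the raw one, which is the sense in which the transformation increases normality; the indicator preserves the information that a value was originally missing.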
23. Step 3: Select Predictive Variables
- Inspect the descriptive statistics of the univariate distributions of all available variables
- Exclude those variables
  - which take on only one value (i.e., the variable is a constant)
  - with mostly missing values
  - directly or indirectly identifying an individual customer
  - showing collinearities
  - showing very little correlation with the target variable
  - containing personal identifiers
- Define a threshold missing value count level above which the field would be excluded from further analysis (e.g., more than 95% missing values)
- Check if all variables have been mapped to the appropriate data types
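Some of these exclusion rules can be sketched as a simple screening function. The thresholds (`max_missing`, `min_abs_corr`), the column names, and the data are assumptions made for the example; collinearity checks and the removal of personal identifiers are not covered here.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def select_variables(columns, target, max_missing=0.95, min_abs_corr=0.1):
    """Return the names of the columns that survive the screening rules."""
    kept = []
    for name, values in columns.items():
        present = [(v, t) for v, t in zip(values, target) if v is not None]
        if 1 - len(present) / len(values) > max_missing:
            continue  # mostly missing values
        xs = [v for v, _ in present]
        ys = [t for _, t in present]
        if len(set(xs)) <= 1 or len(set(ys)) <= 1:
            continue  # constant: carries no information
        if abs(pearson(xs, ys)) < min_abs_corr:
            continue  # very little correlation with the target
        kept.append(name)
    return kept

target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {  # hypothetical candidate variables
    "constant_flag": [1, 1, 1, 1, 1, 1, 1, 1],
    "mostly_missing": [None, None, None, None, None, 3, 1, 2],
    "n_products": [1, 1, 2, 1, 3, 4, 3, 4],
    "noise": [1, 2, 1, 2, 1, 2, 1, 2],
}
# A stricter missing-value threshold than 95% is used because the sample is tiny.
selected = select_variables(columns, target, max_missing=0.5)
print(selected)
```

Correlation is computed only on rows where the variable is present, so a variable with many gaps is judged on the data it actually has.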
24. Gain Customer Insight
- Step 1: Preparing Data Samples
- Step 2: Predictive Modeling
- Step 3: Select Model
25. Step 1: Preparing Data Samples
- Analyze if sufficient data is available to obtain statistically significant results
- If enough data is available, split the data into two samples
  - the train set, to fit the models
  - the test set, to check the model's performance on observations that have not been used to build it
26. Step 2: Predictive Modeling
- Two steps
  - The rules (or linear/non-linear analytical models) are built based on a training set
  - These rules are then applied to a new dataset to generate the answers needed for the campaign
- Guidelines
  - Distinguish between the different types of predictive models obtained through different modeling paradigms: supervised and unsupervised modeling
  - Find the right relationships between the variables describing the customers to predict their respective group membership likelihood (purchaser or non-purchaser), referred to as scoring (e.g., between 0 and 1)
  - Apply unsupervised modeling where group membership is not known beforehand
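The two steps (build on a training set, then apply to new data) can be illustrated with a small supervised example. This is a sketch only: it fits a logistic regression by plain batch gradient descent on made-up data with a single hypothetical feature (number of products held); a real project would use a tested modeling library and many more variables.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights (bias first) by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def score(w, x):
    """Purchase likelihood between 0 and 1 for one customer."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Step 1: build the model on a training set (hypothetical data).
X_train = [[0], [1], [1], [2], [3], [3], [4], [5]]  # products held
y_train = [0, 0, 1, 0, 0, 1, 1, 1]                  # 1 = purchaser
weights = fit_logistic(X_train, y_train)

# Step 2: apply the model to new customers to obtain their scores.
new_customers = [[0], [2], [5]]
new_scores = [score(weights, x) for x in new_customers]
print([round(s, 2) for s in new_scores])
```

The scores are likelihoods of group membership, so they can be compared directly with the threshold levels agreed with the business group.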
27. Step 3: Select Model
- Compare the relative quality of prediction by comparing the respective misclassification rates obtained on the test set
[Figure: example of a misclassification error rate or confusion matrix; panel title: Input Node - Classification Neural Network (10)]
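A confusion matrix and the resulting misclassification rate can be computed as in the sketch below. The two candidate models and their test-set predictions are invented for illustration.

```python
def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs for a binary 0/1 target."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts

def misclassification_rate(actual, predicted):
    """Share of test observations assigned to the wrong class."""
    cm = confusion_matrix(actual, predicted)
    return (cm[(0, 1)] + cm[(1, 0)]) / len(actual)

# Hypothetical test-set outcomes and the predictions of two candidate models.
actual = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
candidates = {
    "model_a": [1, 0, 0, 0, 0, 1, 1, 0, 1, 0],
    "model_b": [1, 1, 0, 0, 0, 0, 1, 0, 1, 0],
}

rates = {name: misclassification_rate(actual, pred) for name, pred in candidates.items()}
best = min(rates, key=rates.get)
print(rates)  # {'model_a': 0.2, 'model_b': 0.4}
print(best)   # model_a
```

The misclassification rate treats both error types alike; where their costs differ, as in the cost/revenue matrix, the individual cells of the confusion matrix matter more than the overall rate.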
28. Act
- Step 1: Deliver Results to Operational Systems
- Step 2: Archive Results
- Step 3: Learn
29. Step 1: Deliver Results to Operational Systems
- Apply the selected model to the entire customer base
- Prepare a score data set containing the most recent information for each customer with the variables required by the model
- The obtained score value for each customer and the defined threshold value will determine whether the corresponding customer qualifies to participate in the campaign
- When delivering results to the operational systems, provide the necessary customer identifiers to unambiguously link the model's score information to the correct customer
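The selection step can be sketched as a small function that keeps the customer identifier attached to every score. The identifiers, scores, and the threshold of 0.4 are hypothetical; the actual threshold would come from the business objectives and the cost/revenue matrix.

```python
def build_target_list(scored_customers, threshold):
    """Return (customer_id, score) pairs at or above the threshold, best first."""
    selected = [
        (customer_id, s) for customer_id, s in scored_customers.items() if s >= threshold
    ]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)

# Hypothetical model scores keyed by a unique customer identifier.
scores = {"C001": 0.82, "C002": 0.07, "C003": 0.41, "C004": 0.65}

target_list = build_target_list(scores, threshold=0.4)
for customer_id, s in target_list:
    print(customer_id, s)
```

Keying the scores by a unique identifier is what allows the operational system to link each score unambiguously to the right customer record.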
30. Step 2: Archive Results
- Each data mining project will produce a huge amount of information, including
  - raw data used
  - transformations for each variable
  - formulas for creating derived variables
  - train, test and score data sets
  - target variable calculation
  - models and their parameterizations
  - score threshold levels
  - final customer target selections
- Useful to preserve, especially if the same model is used to score different data sets obtained at different times
31. Step 3: Learn
- Referred to as "closing the loop"
- Obtain the facts describing the performance of the data mining project and its business impact
- These are obtained by monitoring campaign performance while it is running and from the final campaign performance analysis after the campaign has ended
- Detect when a model has to be re-trained
32. CRM at Work: Credite Est
- Regional mid-tier bank in France; use of data mining in marketing
- Uses a segmentation scheme based on behavioral characteristics (e.g., product ownership), and an activity-based-costing system to identify individual customer-level contribution margin
- Project
  - Business goal: to acquire new prospects
  - Objective: to identify the characteristics of profitable customers in Credite Est's mass-market segment in order to efficiently target similar profiles in the prospect pool
33. Credite Est (contd.)
- Get Raw Data
  - Response variable for current customers is customer contribution margin
  - Customers sorted by operating contribution, and profile of the top 20% of customers noted
  - Transaction information on prospects purchased and then appended to individual records of existing customers
- Identify relevant variables
  - To find the profile that best characterizes high-value clients, which is subsequently applied to prospects' information
  - Model attempts to predict customer operating margin as the dependent variable, with geodemographic information as independent variables
  - Credite Est appended a total of 65 variables to existing customer records
34. Credite Est (contd.)
- Select Predictive Variables
  - All variables that were appended had almost 50% missing data
  - Assessing whether any of the missing data could be meaningfully replaced improved the overall rate of missing values from 42% to 21%
  - Investigation of univariate statistics (means, standard deviations, frequencies, outliers) for all variables reduced the number of variables from 65 to 54
  - Calculation of all bivariate correlations (or mean analyses in the case of categorical variables) of the existing independent variables with the dependent variable, customer value
  - Data evaluation process resulted in a total of 17 variables that had a reasonable correlation with the dependent variable. These were retained for the next step, the response model
35. Credite Est (contd.)
- Gain Customer Insight
  - Use logistic regression to classify the dependent variable as 0/1, the goal being to either target or not target a certain individual in the prospect pool
  - Theory-based elimination of variables that are highly collinear
  - The model's rate of correct classification was 75.5% in the estimation sample and 69.8% in the holdout sample, roughly 20% higher than based on chance alone
  - Result was deemed successful, and it was decided to utilize this model for a prospecting campaign
36. Credite Est (contd.)
- Act
  - Final model was rolled out in sequential fashion to the target prospect audience
  - Credite Est purchased addresses from list brokers that had non-missing values for at least 3 of the 5 variables in the final model
  - The prospects were scored with the model and then ranked by likelihood of being a high-value customer
  - Objective was to assess the receptivity of the two samples of customers for the respective products
  - Result: both target mailings were significantly more successful than the baseline scenario
37. CRM at Work: Yapi Kredi, Predictive Model Based Cross-Sell Campaign
- Challenge: to continue YAPI KREDI's development as the fastest growing retail bank in Turkey
- Capabilities required
  - Advanced analytical customer segmentation
  - Segment-specific offering of product bundles
  - Conversion of customers to more profitable segments via targeted campaigns using advanced CRM tools such as predictive modeling
- Project plan
  - To carry out a set of pilot projects for cross-selling of consumer banking products
  - A reduced selection of target customers with a high propensity to respond positively would be included in a multi-channel, two-step campaign
38. Yapi Kredi - Define Business Objectives
- YAPI KREDI's B-type mutual funds, characterized by
  - Being low-risk investment instruments based on fixed income securities
  - Easily purchased via the ATM, Web, and Telephone channels
- Offer to two customer groups
  - Customers who have already invested in B-type mutual funds, to stimulate an increase of the assets
  - Customers not yet owning any B-type fund, to help increase product ratio and attract new money
39. Yapi Kredi - Define Business Objectives (contd.)
- Communication channels: two-channel approach
- Campaign sizing: contact 3000 customers by branch-based outbound calls and active marketing during customer branch visits
- Campaign: two-step
  - Customers were first contacted with the B-type mutual fund offer
  - Positive responders received a follow-up call if they had not purchased within one week of their initial positive response
- Evaluation of results: based on response and purchase rates by contact channel (branch or call center)
40. Yapi Kredi - Get Raw Data and Identify Relevant Variables
- Get Raw Data
  - Data mart with data extracted from more than 50 source system tables
  - About 20 database tables were produced, with 30 gigabytes of disk space for the initial project phase
- Identify Relevant Variables: customer attributes describing
  - Demographics
  - Product ownership
  - Product usage
  - Channel usage
  - Assets
  - Liabilities
  - Profitability
41. Yapi Kredi - Gain Customer Insight
- Based on six months of historical customer data, five different predictive models were developed
- Best model: logistic regression
  - Yielding a lift value of 2.9 and a cumulative response rate of 14% for the top customer percentile
  - Reaches 2.9 times more responders for the top customer percentile than a random selection of the same size
- A set of 4200 customers with the highest propensity to purchase was selected as the target group for the pilot campaign
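Lift compares the response rate among the best-scored customers with the response rate of the whole group, which is what "2.9 times more responders than a random selection of the same size" expresses. The sketch below computes it on invented scores and responses. The last lines read the slide's cumulative response rate as 14% and derive the baseline response rate it would imply; that baseline is a back-of-the-envelope inference, not a figure from the slides.

```python
def lift_at(scores, responded, top_fraction):
    """Response rate in the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    top_rate = sum(r for _, r in ranked[:n_top]) / n_top
    overall_rate = sum(responded) / len(responded)
    return top_rate / overall_rate

# Hypothetical scored customers: 20 customers, 4 responders overall (20%).
scores = [i / 20 for i in range(20, 0, -1)]  # 1.00, 0.95, ..., 0.05
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1] + [0] * 10

# Three of the top four customers responded (75%), against 20% overall.
example_lift = lift_at(scores, responded, top_fraction=0.2)
print(round(example_lift, 2))

# Baseline implied by the slide's figures, assuming a 14% response rate at lift 2.9.
implied_baseline = 0.14 / 2.9
print(round(implied_baseline, 3))
```

A lift of 1.0 would mean the model selects no better than chance, so values well above 1.0 in the top segment are what justify contacting only the highest-scored customers.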
42. Yapi Kredi - Act
- A subset of 3000 customers was assigned to the 16 branches holding the responsibility for the respective relationships
- The remaining 1200 customers were assigned to the call center
- The target list with the corresponding channel assignment was made available to the campaign management system
43. Yapi Kredi - Result
- Result
  - Impressive response rates of 6.5% and 12.2% were obtained with the branch-based part of the campaign and the call-center-based part of the campaign, respectively
  - The pilot campaign acquired more than 1 million into B-type mutual funds
44. Summary
- Data mining can assist in selecting the right target customers or in identifying previously unknown customers with similar behavior and needs
- A good target list is likely to increase purchase rates and have a positive impact on revenue
- In the context of CRM, the individual customer is often the central object analyzed by means of data mining methods
- A complete data mining process comprises assessing and specifying the business objectives, data sourcing, transformation and creation of analytical variables, building analytical models using techniques such as logistic regression and neural networks, scoring customers, and obtaining feedback from the field
- Learning and refining the data mining process is the key to success