Title: Based on the book
1Data Mining Applications for CRM
- Based on the book Building Data Mining Applications for CRM
- By
- Alex Berson
- Stephen Smith
- Kurt Thearling
2Data Mining Applications for CRM
- Summary of Topics
- 1. Customer Relationship Management: Framework and Architecture
- 2. Reinforcing CRM with Data Mining
- 3. Data Mining: An Overview
- 4. Key Terms
- 5. Data Mining Methodology
- 6. Classical Techniques: Statistics, Neighborhoods, and Clustering
- 7. Next Generation Techniques: Trees, Networks, and Rules
- 8. CRM: The Business Perspective
- 9. Deploying Data Mining for CRM
- 10. Data Quality
- 11. Next Generation of Information Mining and Knowledge Discovery for Effective CRM
- 12. CRM in the e-Business World
3Topic 1 Customer Relationship Management-Framework and Architecture
- CRM is an enterprise approach to customer service that uses meaningful communication to understand and influence consumer behavior. The purpose of the process is twofold:
- a) To impact all aspects of the consumer relationship (e.g., improve customer satisfaction, enhance customer loyalty, and increase profitability), and
- b) To ensure that employees within an organization are using CRM tools. The need for greater profitability requires an organization to proactively pursue its relationships with customers.
4Customer Relationship Management-Framework and
Architecture
- Which customers are most profitable to me? Why?
- What promotions are most effective? For which customers?
- What kind of customers will be interested in my new product?
- Which customers are at risk of defecting to my competitor?
- How do I identify prospects with the greatest profit potential?
- Customer information is rapidly becoming a company's most important asset for answering these questions. However, answering these questions in broad generalities is not enough. Each customer must be analyzed and potentially treated uniquely. Customer Relationship Management provides the framework for analyzing customer profitability and improving marketing effectiveness.
5Customer Relationship Management -Framework and
Architecture
- Many organizations have collected and stored a wealth of data about their customers, suppliers, and business partners. However, the inability to discover valuable information hidden in the data prevents these organizations from transforming this data into knowledge. The business desire is, therefore, to extract valid, previously unknown, and comprehensible information from large databases and use it for profit. To fulfill these goals, organizations need to follow these steps:
- - Capture and integrate both the internal and external data into a comprehensive view that encompasses the whole organization.
- - Mine the integrated data for information.
- - Organize and present the information with knowledge for decision-making.
6Data, Information, and Decision
- Data Resource Management (DRM)
- MIS (OLTP) OOAD
- KM (Knowledge Mgt), KWS (Knowledge Work Systems)
- DSS ESS, EIS (Executive Information Systems)
- Data Warehousing/Data Mart/Data Mining/OLAP (Executive, Collaborative and individual levels)
- Business Intelligence
- Data
- Information (Data Process)
- Knowledge/Business Intelligence
- Decision (Information
- Knowledge)
- Data/Information/Decision /Business Intelligence
7Customer Relationship Management --Framework and
Architecture
- From the architecture point of view, the entire CRM framework can be classified into three key components:
- Operational CRM: The automation of horizontally integrated business processes, including customer touch-points, channels, and front-back office integration.
- Analytical CRM: The analysis of data created by the Operational CRM.
- Collaborative CRM: Applications of collaborative services including e-mail, personalized publishing, e-communities, and similar vehicles designed to facilitate interactions between customers and organizations.
8CRM Architecture
[Figure: CRM architecture diagram. Labels: Business Rules and Metadata Management; Data Sources (Contact History, Transaction History, Customer Profile and Account, External Data, Other); ETL Tools; Market Data Store; Analytics Data Mart; Marketing Data Marts; Reporting Data Mart; Decision Support Applications; Data Mining Analytics; Campaign Mgt; Contact Mgt; Workflow Management; Communication Channels (Direct Mails, Call Center, Customer Service Center, Internet, E-mail).]
9Campaign MGT Software-Managing Campaigns
- Accommodation of many new touch points besides direct mail, for example, the Web, direct TV ads, hard-copy advertising, customer services, street brochure dispatch, and signage.
- Focus on profitability (not only on which customer was most profitable, but also on what was the most profitable promotion that could be sent, e.g., send the .025 postcard rather than the 25 rebate if both have the same effect).
- Optimization of the sequence of promotion delivery.
- Tools for constructing experiments that allow marketing professionals to test the effectiveness of new promotions and new segmentation techniques, for example, using different contents and timing for signage advertising.
- Accommodation by the system of predictive modeling from data mining, which provides insights into future customer behavior and future customer profitability.
10Web-Enabled Information Delivery
[Figure: Web-enabled information delivery. Labels: Structured Content; Unstructured Content; Query Engine, Analytics, Drill Down, Agents; SQL; CGI; Web Server; HTML; Web Browser.]
What about the Web log, or blog, which has become a popular source for information acquisition?
11Topic 2 Reinforcing CRM with Data Mining
- Companies worldwide are beginning to realize that
surviving an intensively competitive and global
marketplace requires closer relationships with
customers. In turn, enhanced customer
relationships can boost profitability in three ways:
a) by reducing costs by attracting more suitable
customers, b) by generating profits through
cross-selling and up-selling activities, and c)
by extending profits through customer retention.
Slightly expanded explanations of these
activities follow.
12Reinforcing CRM with Data Mining
- Attracting more suitable customers: Data mining can help firms understand which customers are most likely to purchase specific products and services, thus enabling businesses to develop targeted marketing programs for higher response rates and better returns on investment.
- Better cross-selling and up-selling: Businesses can increase their value proposition by offering additional products and services that are actually desired by customers, thereby raising satisfaction levels and reinforcing purchasing habits.
- Better retention: Data-mining techniques can identify which customers are more likely to defect and why. A company can use this information to generate ideas that allow it to retain these customers.
13DW Technologies and Tools-An Overview
[Figure: data warehouse technologies and tools. Labels: Data Acquisition; Data Storage; Information Delivery; Source Systems; Data Modeling; Extraction; Transformation; Staging Area; Quality Assurance; Load Image Creation; Data Loading; DW/Data Marts; OLAP; Report Writer; Alert Systems; Data Mining.]
14DW Information Flow
15Data Warehouse Database
- The central data warehouse database is a cornerstone of the data warehousing environment. On the architecture diagram, the database is almost always implemented on relational database management system (RDBMS) technology. Newer approaches include the following:
- Multidimensional databases (MDDBs): These are tightly coupled with the online analytical processing (OLAP) tools that act as clients to the multidimensional data stores.
- An innovative approach that speeds up a traditional RDBMS by using new index structures to bypass relational table scans.
- Parallel relational database designs that require a parallel computing platform, for example, symmetric multiprocessors (SMPs), massively parallel processors (MPPs), and/or clusters of uni- or multiprocessors.
16Information Delivery Tool Taxonomy
- Tools are generally divided into five main groups
- Data query and reporting tools.
- Application development tools.
- Executive Information System (EIS) tools.
- Online analytical processing tools.
- Data mining tools.
17Topic 3 Data Mining An Overview
- Data mining can help reduce information overload
and improve decision making. This is achieved by
extracting and refining useful knowledge through
a process of searching for relationships and
patterns from the extensive data collected by
organizations. The extracted information is used
to predict, classify, model, and summarize the
data being mined. Data-mining technologies, such
as rule induction, neural networks, genetic
algorithms, fuzzy logic, and rough sets, are used
for classification and pattern recognition in
many industries.
18Data Mining An Overview
- A supermarket organizes its merchandise stock based on shoppers' purchase patterns.
- An airline reservation system uses customers' travel patterns and trends to increase seat utilization.
- Web pages alter their organizational structure or visual appearance based on information about the person who is requesting the pages.
- Individuals perform a Web-based query to find the median income of households in Iowa.
19Data Mining An Overview
- Data mining builds models of customer
behavior by using established statistical and
machine-learning techniques. The basic objective
is to construct a model for one situation in
which the answer or output is known and then
apply that model to another situation in which
the answer or output is sought. The best
applications of the above techniques are
integrated with data warehouses and other
interactive, flexible business analysis tools.
The analytic data warehouse can thus improve
business processes across the organization in
areas such as campaign management, new product
rollout, and fraud detection.
20Data Mining An Overview
- Data mining integrates different
technologies to populate, organize, and manage
the data store. Because quality data is crucial
to accurate results, data-mining tools must be
able to clean the data, making it consistent,
uniform, and compatible with the data store. Data
mining employs several techniques to extract
important information. Operations are the actions
that can be performed on accumulated data,
including predictive modeling, database
segmentation, link analysis, and deviation
detection.
21Taxonomy of Data Mining Tools
- We can divide the entire data mining tool market into three main groups: general-purpose tools, integrated DSS/OLAP/data mining tools, and rapidly growing, application-specific tools.
- The general-purpose tools, which occupy the larger and more mature segment of the market, include the following:
- SAS Enterprise Miner
- IBM Intelligent Miner
- Unica PRW
- SPSS Clementine
- SGI Mineset
- Oracle Darwin
- Angoss KnowledgeSeeker
22Taxonomy of Data Mining Tools
- The integrated data mining tool segment addresses a very real and compelling business requirement of having a single multi-function, decision-support tool that can provide management reporting, online analytical processing, and data mining capabilities within a common framework. Examples of these integrated tools include Cognos Scenario and Business Objects.
- The application-specific tools segment is rapidly gaining momentum. Among these tools are the following:
- KDI (focuses on retail)
- Options Choices (focuses on insurance industries)
- HNC (focuses on fraud detection)
- Unica Model 1 (focuses on marketing)
23Database Mining Workstation (HNC)
- HNC is one of the most successful data mining companies. Its Database Mining Workstation (DMW) is a neural network tool that is widely accepted for credit card fraud analysis applications. DMW consists of Windows-based software applications and a custom processing board. Other HNC products include the Falcon and ProfitMax processing applications for financial services, and the Advanced Telecommunications Abuse Control System (ATACS) fraud-detection solution that HNC plans to deploy in the telecommunications industry.
24Taxonomy of Data Mining Tools
- There are specific tools, for example, for the following applications:
- Financial data analysis: Neural networks have been used in forecasting stock prices, option trading, rating bonds, portfolio management, commodity-price prediction, and mergers and acquisitions analysis. Using IBM Intelligent Miner, Mellon Bank developed a credit card-attrition model to predict which customers will stop using Mellon's credit card in the next few months.
- Telecommunications industry: The hyper-competitive nature of the industry has created a need to understand customers, to keep them, and to model effective ways to market new products.
25Taxonomy of Data Mining Tools
- Retail industry: Retail data mining can help identify customer-buying behaviors and discover consumer-shopping patterns and trends.
- Healthcare and biomedical research: The analysis of large quantities of time-stamped data will provide doctors with important information regarding the progress of the disease. For example, NeuroMedicalSystems used neural networks to build a pap smear diagnostic aid.
- Science and engineering: To improve its manufacturing process, Boeing has successfully applied machine-learning algorithms to the discovery of informative and useful rules from its plant data.
26Data Mining vs. Data Warehouse
- A major challenge in exploiting data mining is identifying suitable data to mine.
- Data mining requires a single, separate, clean, integrated, and self-consistent source of data.
- A data warehouse is well equipped for providing data for mining.
- Data quality and consistency are prerequisites for mining, to ensure the accuracy of the predictive models. Data warehouses are populated with clean, consistent data.
27Data Mining vs. Data Warehouse
- Data Mining does not require that a Data Warehouse be built. Often, data can be downloaded from the operational files to flat files that contain the data ready for the data mining analysis.
- Data Mining can be implemented rapidly on existing software and hardware platforms. Data Mining tools can analyze massive databases to deliver answers to questions such as, "Which customers are most likely to respond to my next promotional mailing, and why?"
28Data Mining vs. Data Warehouse
- It is advantageous to mine data from multiple sources to discover as many interrelationships as possible. Data warehouses contain data from a number of sources.
- Selecting relevant subsets of records and fields for data mining requires the query capabilities of the data warehouse.
- Results of a data mining study are useful if there is some way to further investigate the uncovered patterns. Data warehouses provide the capability to go back to the data source.
29Data Mining vs. OLAP
- They are two separate breeds of analysis with entirely different objectives, not to mention tools, skill sets, and implementation methods.
30Data Mining vs. OLAP
- With canned reports, ad hoc querying, and OLAP, the end user defines a hypothesis and determines which data to examine. With data mining, the tool identifies the hypothesis, and it actually tells the user where in the data to start the exploration process.
31Data Mining vs. OLAP
- Rather than using SQL to filter out values and methodically reduce the data into a concise answer set, data mining uses algorithms that exhaustively review the relationships among data elements to determine if any patterns exist. The whole purpose of data mining is to yield new business information that a business person can act on.
32OLAP vs. Data Mining Tools
OLAP Tools
- Are ad hoc, shrink-wrapped tools that provide an interface to data
- Are used when you have specific known questions
- Look and feel like a spreadsheet that allows rotation, slicing, and graphics
- Can be deployed to a large number of users
Data Mining Tools
- Methods for analyzing multiple data types
- -- Regression Trees
- -- Neural networks
- -- Genetic algorithms
- Are used when you don't know what the questions are
- Usually textual in nature
- Usually deployed to a small number of analysts
33Topic 4 Key Terms
- Application Service Providers
- Offer outsourcing solutions that supply, develop, and manage application-specific software and hardware so that customers' internal information technology resources can be freed up.
- Business Intelligence
- The type of detailed information that business managers need for analyzing sales trends, customers' purchasing habits, and other key performance metrics in the company.
34Key Terms
- Categorical Data
- Fits into a small number of distinct categories of a discrete nature, in contrast to continuous data, and can be ordered (ordinal), for example, high, medium, or low temperatures, or nonordered (nominal), for example, gender or city.
- Classification
- The distribution of things into classes or categories of the same type, or the prediction of the category of data by building a model based on some predictor variables.
35Key Terms
- Clustering
- Groups of items that are similar as identified by algorithms. For example, an insurance company can use clustering to group customers by income, age, policy types, and prior claims. The goal is to divide a data set into groups such that records within a group are as homogeneous as possible and groups are as heterogeneous as possible. When the categories are unspecified, this may be called unsupervised learning.
- Genetic Algorithm
- Optimization techniques based on evolutionary concepts that employ processes such as genetic combination, mutation, and natural selection in a design.
36Key Terms
- Online Profiling
- The process of collecting and analyzing data from Web site visits, which can be used to personalize a customer's subsequent experiences on the Web site. Network advertisers, for example, can use online profiles to track a user's visits and activities across multiple Web sites, although such a practice is controversial and may be subject to various forms of regulation.
- Rough Sets
- A mathematical approach to extract knowledge from imprecise and uncertain data.
37Key Terms
- Rule Induction
- The extraction of valid and useful if-then-else rules from data based on their statistical significance levels, which are integrated with commercial data warehouse and OLAP platforms.
- Visualization
- Graphically displayed data, from simple scatter plots to complex multidimensional representations, to facilitate better understanding.
38Topic 5 Data Mining Methodology
- The methodology used today in data mining, when it is well thought out and well executed, consists of just a few very important concepts.
- Finding a pattern in the data and building a model. In general, a pattern means any sequence or pattern of data that occurs more often than one would expect if it were a random event.
- Sampling, or not having to use all of the data in order to draw significant conclusions about what might be happening with other parts of the data.
- Validating the predictive models that arise out of data mining algorithms.
- Finally, finding the pattern or model that is the best.
- The four parts of data mining technology: patterns, sampling, validation, and choosing the model.
39Pattern and Model
- Pattern: An event or combination of events in a database that occurs more often than expected. Typically, this means that its actual occurrence is significantly different from what would be expected by random chance (for example, 121212?).
- Model: A description that adequately explains and predicts relevant data but is generally much smaller than the data itself. For real-world applications, a model can be anything from a mathematical equation, to a set of rules that describes customer segments, to the computer representation of a complex neural network architecture, which translates to several sets of mathematical equations.
- Predictive model: A model created or used to perform prediction, in contrast to models created solely for pattern detection, exploration, or general organization of the data.
40Types Of Models
- Descriptive (Operational, OLTP): The dealer sold 200 cars last month.
- Explanatory (Traditional DW, OLAP): For every increase of 1 in the interest, auto sales decrease by 5.
- Predictive (Data Mining): Predictions about future buyer behavior.
41A high-level View of Modeling Process
[Figure: modeling process. Labels: Historical Data; Model Building; Model; Record; Prediction.]
42The Needs for Sampling
- Containing costs
- Speeding up the data gathering
- Improving effectiveness
- Reducing bias
43Sampling Design
- Four steps
- Determine the data to be collected or described
- Determine the population to be sampled
- Choose the type of sample
- Decide on the sample size
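The last two steps (type of sample and sample size) can be sketched in a few lines. This is a minimal illustration, assuming a simple random sample without replacement; the population of customer IDs, the sample size of 50, and the function name are all hypothetical.

```python
import random


def simple_random_sample(records, sample_size, seed=0):
    """Draw a simple random sample without replacement."""
    if sample_size > len(records):
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.sample(records, sample_size)


# Hypothetical population: 1,000 customer IDs.
population = list(range(1000))
sample = simple_random_sample(population, sample_size=50)
print(len(sample), "records sampled from", len(population))
```

Other sample types (stratified, systematic, cluster) would change only how the records are chosen, not the overall shape of the step.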
44Two Types of Data Mining Modeling- Verification
and Discovery
- The verification model utilizes a process that looks in a database to detect trends and patterns in data that will help answer some specific questions about the business.
- In this mode, the user generates a hypothesis about the data, issues a query against the data, and examines the results of the query, looking for verification of the hypothesis; otherwise the user decides that the hypothesis is not valid.
45Verification Model
- In this model, very little information is created in the extraction process: either the hypothesis is verified or it is not.
- Common tools used in this mode are queries, multidimensional analysis, and visualization. What all have in common is that the user is essentially guiding the exploration of the data being inspected.
46Discovery Model
- A more popular model is the Discovery Model that
utilizes a process that looks in a database to
discover and/or predict future patterns. The
discovery model is divided into two modes
Descriptive and Predictive.
47Discovery Model- Descriptive Mode
- The Descriptive mode finds hidden patterns
without a predetermined idea or hypothesis about
what the patterns may be. In other words, the
Data Mining software or program takes the
initiative in finding what the interesting
patterns are, without the user thinking of the
relevant questions first. In this mode,
information is created about the data with very
little or no guidance from the user. The exploration
of the data is done in such a way as to yield as
large a number of useful facts about the data as
possible in the shortest amount of time.
48Discovery Model- Predictive Mode
- In the Predictive mode, patterns discovered from the database are used to predict future patterns or trends. Predictive modeling allows the user to submit records with some unknown field values, and the system will guess the unknown values based on previous patterns discovered from the database.
- In comparing the two models, one can state that Verification can be very inefficient, time-consuming, and costly, whereas Discovery modeling can be very efficient and cost-effective, is less dependent on user input, and increases modeling accuracy.
49Predictive Modelling
- Similar to the human learning experience:
- Uses observations to form a model of the important characteristics of some phenomenon.
- Uses generalizations of the real world and the ability to fit new data into a general framework.
- Can analyze a database to determine essential characteristics (model) about the data set.
50Predictive Modelling
- Model is developed using a supervised learning approach, which has two phases: training and testing.
- Training builds a model using a large sample of historical data called a training set.
- Testing involves trying out the model on new, previously unseen data to determine its accuracy and physical performance characteristics.
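The two phases can be sketched as follows. This is an illustrative outline only: the records are made up, the 70/30 split is an assumption, and the "model" is a deliberately trivial majority-class predictor standing in for whatever technique is actually used.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle the historical records and hold back a fraction for testing."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train_majority_model(training_set):
    """Training phase: learn the most common label in the training set."""
    counts = {}
    for _, label in training_set:
        counts[label] = counts.get(label, 0) + 1
    majority = max(counts, key=counts.get)
    return lambda features: majority


def accuracy(model, test_set):
    """Testing phase: share of unseen records the model labels correctly."""
    correct = sum(1 for features, label in test_set if model(features) == label)
    return correct / len(test_set)


# Hypothetical historical data: (customer age, responded to promotion?).
records = [(25, "no"), (31, "no"), (47, "yes"), (52, "yes"), (38, "no"),
           (29, "no"), (60, "yes"), (44, "no"), (35, "no"), (58, "yes")]

training_set, test_set = train_test_split(records)
model = train_majority_model(training_set)
print("test accuracy:", accuracy(model, test_set))
```

The important property is that the test records play no part in building the model, so the measured accuracy reflects behavior on previously unseen data.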
51Predictive Modelling
- Applications of predictive modelling include customer retention management, credit approval, cross selling, and direct marketing.
- Two techniques are associated with predictive modelling, distinguished by the nature of the variable being predicted:
- A. classification
- B. value prediction
52Predictive Modelling - Classification
- Used to establish a specific predetermined class for each record in a database from a finite set of possible class values.
- Two specializations of classification: tree induction and neural induction.
53Example of Classification using Tree Induction
54Example of Classification using Tree Induction
Customer renting property > 2 years?
  No: Rent property
  Yes: Customer age > 45?
    No: Rent property
    Yes: Buy property
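A decision tree of this kind translates directly into nested conditions. The sketch below assumes the branch layout read from the slide's diagram (a "No" at either test leads to "Rent property"); the function and parameter names are illustrative.

```python
def classify_customer(years_renting, age):
    """Walk the two-level tree and return the predicted class."""
    if years_renting > 2:          # root test: renting for more than 2 years?
        if age > 45:               # second test: older than 45?
            return "Buy property"
        return "Rent property"
    return "Rent property"


for years, age in [(3, 50), (3, 30), (1, 60)]:
    print(years, age, classify_customer(years, age))
```

In practice a tree-induction tool derives the tests and thresholds from the training data; here they are simply written out by hand.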
55Example of Classification using Neural Induction
56Example of Classification Using Neural Induction
- Each processing unit (circle) in one layer is connected to each processing unit in the next layer by a weighted value, expressing the strength of the relationship. The network attempts to mirror the way the human brain works in recognizing patterns by arithmetically combining all the variables with a given data point.
- In this way, it is possible to develop nonlinear predictive models that learn by studying combinations of variables and how different combinations of variables affect different data sets.
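The weighted combination described here can be illustrated with a forward pass through a tiny network. This is a sketch under stated assumptions: two inputs, two hidden units, one output unit, a sigmoid activation, and hand-picked weights. None of these values come from the slides, and a real network would learn its weights from training data.

```python
import math


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Each unit sums all inputs times its weights, adds a bias, applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


# Hypothetical weights: one row per unit, one weight per incoming connection.
hidden_weights = [[0.5, -0.2], [0.3, 0.8]]
hidden_biases = [0.0, -0.1]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

inputs = [0.6, 0.4]  # e.g. two scaled customer attributes
hidden = layer_forward(inputs, hidden_weights, hidden_biases)
output = layer_forward(hidden, output_weights, output_biases)
print("network output:", output[0])
```

The nonlinearity comes from the activation function applied at each unit; without it, stacked layers would collapse into a single linear combination.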
57Predictive Modelling - Value Prediction
- Used to estimate a continuous numeric value that is associated with a database record.
- Uses the traditional statistical techniques of linear regression and non-linear regression.
- Relatively easy to use and understand.
58Predictive Modelling - Value Prediction
- Linear regression attempts to fit a straight line through a plot of the data, such that the line is the best representation of the average of all observations at that point in the plot.
- The problem is that the technique only works well with linear data and is sensitive to the presence of outliers (i.e., data values that do not conform to the expected norm).
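The sensitivity to outliers is easy to see with a small worked example. The sketch below fits a line by ordinary least squares (an assumption about the fitting method; the slides do not name one) to made-up data, then refits after replacing a single value with an outlier.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]                 # perfectly linear: y = 2x
slope, intercept = fit_line(xs, ys)

ys_outlier = [2, 4, 6, 8, 40]         # same data with one outlier
slope_out, intercept_out = fit_line(xs, ys_outlier)

print("clean data:   slope", slope, "intercept", intercept)
print("with outlier: slope", slope_out, "intercept", intercept_out)
```

On the clean data the fit recovers a slope of 2 and an intercept of 0. With the single outlier the slope becomes 8 and the intercept -12, so one unusual value is enough to change the fitted line substantially.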
59Predictive Modelling - Value Prediction
- Although non-linear regression avoids the main problems of linear regression, it is still not flexible enough to handle all possible shapes of the data plot.
- Statistical measurements are fine for building linear models that describe predictable data points; however, most data is not linear in nature.
60Predictive Modelling - Value Prediction
- Data mining requires statistical methods that can accommodate non-linearity, outliers, and non-numeric data.
- Applications of value prediction include credit card fraud detection or target mailing list identification.
61Database Segmentation
- Aim is to partition a database into an unknown number of segments, or clusters, of similar records.
- Uses unsupervised learning to discover homogeneous sub-populations in a database to improve the accuracy of the profiles.
62Database Segmentation
- Less precise than other operations, thus less sensitive to redundant and irrelevant features.
- Sensitivity can be reduced by ignoring a subset of the attributes that describe each instance or by assigning a weighting factor to each variable.
- Applications of database segmentation include
customer profiling, direct marketing, and cross
selling.
63Example of Database Segmentation using a Scatter
plot
64Database Segmentation
- Associated with demographic or neural clustering techniques, distinguished by:
- Allowable data inputs
- Methods used to calculate the distance between records
- Presentation of the resulting segments for analysis.
65Example of Database Segmentation using a
Visualization
66Link Analysis
- Aims to establish links (associations) between records, or sets of records, in a database.
- There are three specializations:
- Associations discovery
- Sequential pattern discovery
- Similar time sequence discovery
- Applications include product affinity analysis,
direct marketing, and stock price movement.
67Link Analysis - Associations Discovery
- Finds items that imply the presence of other items in the same event.
- Affinities between items are represented by association rules.
- e.g. When a customer rents property for more than 2 years and is more than 25 years old, in 40% of cases the customer will buy a property. The association happens in 35% of all customers who rent properties.
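The two figures attached to an association rule are usually called confidence and support. The sketch below computes both for a rule of the same shape on a small made-up table. It assumes support means the share of all records where both the condition and the outcome hold, and confidence means the share of records meeting the condition where the outcome also holds. The records and field names are hypothetical and do not reproduce the percentages quoted above.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) for the rule: antecedent -> consequent."""
    n_antecedent = sum(1 for r in records if antecedent(r))
    n_both = sum(1 for r in records if antecedent(r) and consequent(r))
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical renters: (years renting, age, later bought a property?).
rows = [(3, 30, True), (4, 40, True), (5, 28, False), (3, 50, False),
        (6, 35, False), (1, 30, False), (2, 45, True), (1, 22, False),
        (4, 24, False), (0, 60, False)]
records = [{"years": y, "age": a, "bought": b} for y, a, b in rows]

support, confidence = rule_metrics(
    records,
    antecedent=lambda r: r["years"] > 2 and r["age"] > 25,
    consequent=lambda r: r["bought"],
)
print("support:", support, "confidence:", confidence)
```

Here five of the ten renters meet the condition and two of those five bought, giving a confidence of 0.4 and a support of 0.2.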
68Link Analysis - Sequential Pattern Discovery
- Finds patterns between events such that the presence of one set of items is followed by another set of items in a database of events over a period of time.
- e.g. Used to understand long term customer buying behaviour.
69Link Analysis - Similar Time Sequence Discovery
- Finds links between two sets of data that are time-dependent, and is based on the degree of similarity between the patterns that both time series demonstrate.
- e.g. Within three months of buying property, new home owners will purchase goods such as cookers, freezers, and washing machines.
70Deviation Detection
- Relatively new operation in terms of commercially available data mining tools.
- Often a source of true discovery because it identifies outliers, which express deviation from some previously known expectation and norm.
71Deviation Detection
- Can be performed using statistics and visualization techniques or as a by-product of data mining.
- Applications include fraud detection in the use of credit cards and insurance claims, quality control, and defects tracing.
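One simple statistical way to flag deviations is a z-score test: values lying more than a chosen number of standard deviations from the mean are treated as outliers. The sketch below is only one possible approach; the transaction amounts and the threshold of 2.0 are assumptions chosen for illustration.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds threshold std devs."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []                      # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical card transaction amounts with one suspicious charge.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 500]
print("flagged:", find_outliers(amounts))
```

Because the mean and standard deviation are themselves pulled by extreme values, robust alternatives (for example, rules based on the median) are often preferred when several outliers are expected.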
72A Summary Data-Driven Techniques
- Data Visualization
- Decision Trees
- Clustering
- Factor Analysis
- Neural Network
- Association Rules
- Rule Induction
- Based on Sakhr Youness's book Professional Data Warehousing with SQL Server 7.0 and OLAP Services
73Data Visualization
A pie chart showing the sales of a product by
region is sometimes much more effective than
presenting the same data in a text or tabular
form.
[Figure: pie chart of product sales by region. Regions: Northeast, South, North, West, East. Slice values: 9, 11, 39, 21, 20.]
74Decision Tree
75Cluster Analysis
First segment (high income > 8,000)
Have children
Second segment (8,000 > middle income > 3,000)
Married
Last car is a used one
Third segment (low income < 3,000)
Own car
76Factor Analysis
- Unlike cluster analysis, factor analysis builds a model from data. The technique finds underlying factors, also called latent variables, and provides models for these factors based on variables in the data. For example, a software company is considering a survey to find out the nine most perceived attributes of one of its products. It might group these attributes into categories such as service for technical support, availability for training, and a help system.
- Factor analysis is used for grouping together products based on a similarity of buying patterns so that vendors may bundle several products and sell them together at a lower price than the sum of their individual prices.
77Neural Networks
78Association Rules
- Association models are models that examine the
extent to which values of one field depend on, or
are produced by, values of another field. These
models are often referred to as Market Basket
Analysis when they are applied to retail
industries to study the buying patterns of these
customers, especially in grocery and retail
stores that issue their own credit cards.
Charging against these cards gives the store the
chance to associate the purchases of customers
with their identities, which allows them to study
associations among other things.
79Rules Induction
- This is a powerful technique that involves a large number of rules using a set of if...then statements in the pursuit of all possible patterns in the dataset. For example, if the customer is male, is between 30 and 40 years of age, and has an income of less than 50,000 and more than 20,000, then he is likely to be driving a car that was bought new.
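An induced rule of this kind is just a conjunction of conditions, which can be written as a predicate. The sketch below encodes the example rule by hand; the function and parameter names are illustrative, and treating the age bounds as inclusive is an assumption. A rule-induction tool would generate many such rules from the data rather than have them written manually.

```python
def likely_bought_car_new(gender, age, income):
    """Example rule: male, aged 30 to 40, income between 20,000 and 50,000."""
    return (
        gender == "male"
        and 30 <= age <= 40            # assumed inclusive bounds
        and 20_000 < income < 50_000
    )


customers = [("male", 35, 30_000), ("male", 45, 30_000), ("female", 35, 30_000)]
for gender, age, income in customers:
    print(gender, age, income, likely_bought_car_new(gender, age, income))
```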
80A Summary Theory-Driven Techniques
- Correlations
- T-Tests
- Analysis of Variance
- Linear Regression
- Logistic Regression
- Discriminant Analysis
- Forecasting Methods
81Validating Picking the Model
- Validating any model that comes out of a data
mining tool is the most important thing that you can do. The
validation required for data mining is that, after you build the model on some
historical data, you apply the model to similar historical data from which
the model was not built. Because the data is historical, you already know
the outcome, so the accuracy of the predictive model can be measured.
- One of the most important things to do when you are
building a predictive model is to make sure that
you have picked up the essential patterns in the data that will hold
true the next time you apply your model.
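The validation step described on this slide can be sketched as a simple holdout test. This is a minimal illustration, not the book's procedure: the data, the 30 percent holdout fraction, and the one-variable threshold "model" are all assumptions made for the example.

```python
import random

def split_holdout(records, holdout_fraction=0.3, seed=0):
    """Shuffle historical records and set aside a portion the model never sees."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(threshold, records):
    """Fraction of records where (x >= threshold) matches the known outcome."""
    hits = sum((x >= threshold) == y for x, y in records)
    return hits / len(records)

def fit_threshold(train):
    """Toy model: pick the cutoff on x that classifies the training data best."""
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=lambda t: accuracy(t, train))

# Hypothetical historical data: (monthly_spend, responded_to_offer)
history = [(x, x >= 50) for x in range(0, 100, 2)]
train, holdout = split_holdout(history)
model = fit_threshold(train)

# Because the holdout outcomes are already known, accuracy can be measured.
holdout_accuracy = accuracy(model, holdout)
print(f"threshold={model}, holdout accuracy={holdout_accuracy:.2f}")
```

The point of the sketch is the split: the threshold is chosen using only `train`, and its accuracy is reported on `holdout`, which played no part in building it.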
82Three Additional Ways in Which Data mining
Supports CRM Initiatives.
- 1. Database marketing
- 2. Customer acquisition
- 3. Campaign optimization
83Database Marketing
- Data mining helps database marketers develop
campaigns that are closer to the targeted needs,
desires, and attitudes of their customers. If the
necessary information resides in a database, data
mining can model a wide range of customer
activities. The key objective is to identify
patterns that are relevant to current business
problems. For example, data mining can help
answer questions such as "Which customers are
most likely to cancel their cable TV service?"
and "What is the probability that a customer will
spend over $120 at a given store?" Answering
these types of questions can boost customer
retention and campaign response rates, which
ultimately increases sales and returns on
investment.
84Database Marketing
- Database marketing software enables companies to
send customers and prospective customers timely
and relevant messages and value propositions.
Modern campaign management software also monitors
and manages customer communications on multiple
channels including direct mail, telemarketing,
e-mail, the Internet, point of sale, and customer
service. Furthermore, this software can be used
to automate and unify diverse marketing campaigns
at their various stages of planning, execution,
assessment, and refinement. The software can also
launch campaigns in response to specific customer
behaviors, such as the opening of a new account.
85Database Marketing
- Generally, better business results are obtained
when data mining and campaign management work
closely together. For example, campaign
management software can apply the data-mining
model's scores to sharpen the definition of
targeted customers, thereby raising response
rates and campaign effectiveness. Furthermore,
data mining may help to resolve the problems that
traditional campaign management processes and
software typically do not adequately address,
such as scheduling, resource assignment, and so
forth. Although finding patterns in data is
useful, data mining's main contribution is
providing relevant information that enables
better decision making. In other words, it is a
tool that can be used along with other tools
(e.g., knowledge, experience, creativity,
judgment, etc.) to obtain better results. A
data-mining system manages the technical details,
thus enabling decision makers to focus on
critical business questions such as "Which
current customers are likely to be interested in
our new product?" and "Which market segment is
best for the launch of our new product?"
86Customer Acquisition
- The growth strategy of businesses depends
heavily on acquiring new customers, which may
require finding people who have been unaware of
various products and services, who have just
entered specific product categories (for example,
new parents and the diaper category), or who have
purchased from competitors. Although experienced
marketers often can select the right set of
demographic criteria, the process increases in
difficulty with the volume, pattern complexity,
and granularity of customer data. The explosive
growth in consumer databases has highlighted
the challenges of customer segmentation.
Data mining offers multiple
segmentation solutions that could increase the
response rate for a customer acquisition
campaign. Marketers need to use creativity and
experience to tailor new and interesting offers
for customers identified through data-mining
initiatives.
87Campaign Optimization
- Many marketing organizations have a variety of
methods to interact with current and prospective
customers. The process of optimizing a marketing
campaign establishes a mapping between the
organization's set of offers and a given set of
customers that satisfies the campaign's
characteristics and constraints, defines the
marketing channels to be used, and specifies the
relevant time parameters. Data mining can elevate
the effectiveness of campaign optimization
processes by modeling customers' channel-specific
responses to marketing offers.
88Topic 6 Classical Techniques Statistics,
Neighborhoods, and Clustering
- Statistics can help to answer several important
questions about the data:
- What patterns are there in my database?
- What is the chance that an event will occur?
- What patterns are significant?
- What is a high-level summary of the data that
gives me some idea of what is contained in my
database?
89Statistics --Histogram
- The first step in understanding statistics is to
understand how the data is collected into a higher-level form. One of
the most notable ways of doing this is with the histogram.
[Axis label: number of customers or amount of sales]
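As a rough illustration of collecting data into a higher-level form, the sketch below counts customers per age bin. The ages and the bin width of 10 are made up for the example and do not come from the slides.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical customer ages, not taken from the slides.
ages = [23, 27, 31, 35, 38, 41, 44, 47, 52, 58, 63, 71]
bins = histogram(ages, bin_width=10)

# Print one row of '#' marks per bin as a text histogram.
for start, count in bins.items():
    print(f"{start:>2}-{start + 9:<2} {'#' * count}")
```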
90Histogram
[Figure: histogram of the number of customers (axis ticks from 500 to 3000) by age (axis ticks from 1 to 81).]
91Linear Regression Is Similar to the Task of
Finding the Line that Minimizes the Total
Distance to a Set of Data.
[Figure: chart with axes Prediction (average consumer bank balance) and Predictor (consumer annual income).]
92Linear Regression
- The predictive model is the line shown in the
previous chart. The line will take a given value for a predictor and map
it into a given value for a prediction. The actual equation would look
something like Prediction = a + b × Predictor. This is just the
equation for a line, Y = a + bX. As an example for a bank, the predicted
average consumer bank balance might equal 1,000 + 0.01 × the
customer's annual income.
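A minimal sketch of fitting such a line is shown below. It uses ordinary least squares, which minimizes squared vertical distances; that is one common reading of "minimizes the total distance" and is an assumption here. The income values are hypothetical, and the balances are generated from the slide's example line so that the fit recovers a = 1,000 and b = 0.01.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (a, b) in y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical incomes; balances follow the slide's example line exactly.
incomes = [20_000, 40_000, 60_000, 80_000]
balances = [1_000 + 0.01 * income for income in incomes]

a, b = fit_line(incomes, balances)
predicted = a + b * 50_000  # predicted balance for a 50,000 income
print(f"a={a:.2f}, b={b:.4f}, predicted={predicted:.2f}")
```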
93Linear Regression
- Linear regression attempts to fit a straight line
through a plot of the data, such that the line is
the best representation of the average of all
observations at that point in the plot.
- The problem is that the technique only works well
with linear data and is sensitive to the presence
of outliers (i.e., data values that do not
conform to the expected norm).
94Linear Regression
- Although non-linear regression avoids the main
problems of linear regression, it is still not flexible
enough to handle all possible shapes of the data
plot.
- Statistical measurements are fine for building
linear models that describe predictable data
points; however, most data is not linear in
nature.
95Linear Regression
- Data mining requires statistical methods that can
accommodate non-linearity, outliers, and
non-numeric data.
- Applications of value prediction include credit
card fraud detection or target mailing list
identification.
96The Nearest Neighbor Prediction
- One of the classic areas in which nearest neighbor
has been used for prediction is text retrieval. The end
user marks a document of interest
(for example, a Wall Street Journal article), and documents
whose characteristics are nearest to those of the marked documents
are more likely to be retrieved.
- Another good example is that supermarkets
tend to put similar produce in the same area; for example, an apple
is placed closer to an orange than to a tomato. Thus, if you know the prediction
value of one of the objects, you can predict it for its nearest
neighbors.
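The idea of predicting from nearby records can be sketched as follows. This is an illustrative k-nearest-neighbor vote with Euclidean distance; the customer records, the choice of k = 3, and the distance measure are assumptions, not details from the slides.

```python
import math

def nearest_neighbor_predict(query, labeled_points, k=3):
    """Predict by majority vote among the k labeled points closest to query."""
    by_distance = sorted(labeled_points, key=lambda p: math.dist(query, p[0]))
    votes = [label for _, label in by_distance[:k]]
    return max(set(votes), key=votes.count)

# Hypothetical records: ((age, income in thousands), responded to offer)
customers = [
    ((25, 30), "no"), ((28, 35), "no"), ((31, 32), "no"),
    ((45, 80), "yes"), ((50, 90), "yes"), ((48, 85), "yes"),
]

# A new customer is predicted to behave like the most similar known customers.
prediction = nearest_neighbor_predict((47, 82), customers)
print(prediction)
```

In practice the fields would usually be rescaled first so that no single field (such as income) dominates the distance.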
97Data Clustering
- Clustering analysis is an important means of
processing multimedia data. It is basically the organization of a
collection of patterns into clusters of similar objects. Patterns within
a valid cluster are more similar to each other than they are to a pattern
in a different cluster.
98Data Clustering
- Clustering can allow us to carry out the
following activities that can help in query processing:
- Representing patterns in the data so that we can
reduce the size of the media
- Defining a way of measuring the proximity of
different patterns in the data so that we can
find the instances that match our example
- Clustering or grouping the data in preparation
for matching
- Data abstraction, particularly of features that
we can store as metadata
- Assessing the output by estimating how good the
selection is
99Clustering and Nearest Neighbor
- A simple example of clustering would be the
clustering that most people perform when they do the laundry: grouping
the permanent press, dry cleaning, whites, and brightly colored
clothes is important because the items in each group have similar characteristics.
- A simple example of the nearest neighbor
prediction algorithm is when you look at the people in your neighborhood.
You may notice that, in general, you all have somewhat similar
incomes.
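To make the grouping idea concrete, here is a small sketch using k-means, one common clustering algorithm. The slides do not name a specific algorithm, so k-means is a swapped-in choice, and the household data and starting centroids are hypothetical.

```python
import math

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical households: (age, income in thousands)
households = [(25, 30), (27, 34), (29, 32), (52, 88), (55, 92), (58, 90)]
centroids, clusters = k_means(households, centroids=[(25, 30), (58, 90)])
print(centroids)
```

Each household ends up in the cluster whose members it most resembles, which mirrors the slide's point that patterns within a cluster are more similar to each other than to patterns in another cluster.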
100Statistical Analysis of Actual Sales (dollars and
quantities) relative to these Signage Variables-a
predictive modeling example.
- Content
- Frequency
- Depth
- Focus
- Depth
- Scale
- Length
- Location
- Statistical Analysis: Correlation, Regression,
Experiment Design, Optimization. Now it goes into real-time
analysis.
101Signage
102Signage
103Topic 7 Next Generation Techniques Decision
Trees, Networks, and Rules
A Decision Tree
[Figure: decision tree]
Customer renting property > 2 years?
  No: Rent property
  Yes: Customer age > 45?
    No: Rent property
    Yes: Buy property
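The tree on this slide can be read as a pair of nested tests. The sketch below assumes the branch layout implied by the order of the labels (the "Yes" branch of the tenure test leads to the age test); the function and parameter names are hypothetical.

```python
def predict_tenure(years_renting, age):
    """Walk the two-level tree: the renting-duration test first, then the age test."""
    if years_renting <= 2:
        return "Rent property"   # "No" branch of the first test
    if age <= 45:
        return "Rent property"   # "No" branch of the age test
    return "Buy property"        # renting > 2 years and age > 45

print(predict_tenure(years_renting=3, age=50))
```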
104A Decision Tree
105CART and CHAID
- CART, which stands for Classification and
Regression Trees, is a data exploration and prediction algorithm developed by
Leo Breiman, Jerome Friedman, Richard Olshen, and Charles
Stone. It is nicely detailed in their 1984 book, Classification and
Regression Trees (Breiman, Friedman, Olshen, and Stone, 1984). These
researchers from Stanford University and the University of
California at Berkeley showed how this new algorithm could be used on a
variety of different problems, such as the detection of chlorine from the data
contained in a mass spectrum. One of the great advantages of CART
is that the algorithm has the validation of the
model and the discovery of the optimally general model built deeply into
the algorithm.
- Another popular decision tree technology is CHAID
(Chi-Square Automatic Interaction Detector). CHAID is similar
to CART in that it builds a decision tree, but it differs in the
way that it chooses its splits.
106B Neural Networks
- A neural network is loosely based on the way some
people believe that the human brain is organized and how it
learns. There are two main structures of consequence in the neural
network:
- The node, which loosely corresponds to the neuron
in the human brain
- The link, which loosely corresponds to the
connections between neurons (axons, dendrites,
and synapses) in the human brain
107Neural Networks
When a customer rents property for more than 2
years and is more than 25 years old, in 40% of
cases the customer will buy a property. This association
occurs in 35% of all customers who rent
properties.
108Example of Classification using Neural Induction
- Each processing unit (circle) in one layer is
connected to each processing unit in the next
layer by a weighted value, expressing the
strength of the relationship. The network
attempts to mirror the way the human brain works
in recognizing patterns by arithmetically
combining all the variables with a given data
point.
- In this way, it is possible to develop nonlinear
predictive models that learn by studying
combinations of variables and how different
combinations of variables affect different data
sets.
109How Does a Neural Induction Make a prediction?
- The age of 47 is normalized to fall between
0.0 and 1.0, giving the value 0.47, and the
income is normalized to the value 0.65. This
simplified neural network makes the prediction of
no default for a 47-year-old making 65,000. The
links are weighted at 0.7 and 0.1, and the
resulting value, after multiplying the node
values by the link weights and adding the products, is 0.39.
[Figure: two input nodes, Age (0.47, link weight 0.7) and Income (0.65, link weight 0.1), feeding one output node, default (0.39).]
0.47(0.7) + 0.65(0.1) = 0.39 (rounded)
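The arithmetic on this slide can be checked with a few lines of code. This sketch only reproduces the weighted sum; the scaling used to normalize age and income (dividing by 100 and by 100,000) is an assumption chosen to match the slide's values of 0.47 and 0.65.

```python
def node_output(inputs, weights):
    """Weighted sum of input node values, as in the slide's single output node."""
    return sum(value * weight for value, weight in zip(inputs, weights))

age = 47 / 100             # assumed scaling that yields 0.47
income = 65_000 / 100_000  # assumed scaling that yields 0.65

score = node_output([age, income], weights=[0.7, 0.1])
print(round(score, 2))  # 0.329 + 0.065 = 0.394, which rounds to 0.39
```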
110C Rule Induction
- This is a powerful technique that generates a
large number of rules, expressed as if...then
statements, in the pursuit of all possible
patterns in the dataset. For example, if the customer
is male, is between 30 and 40 years
of age, and has an income of more than 20,000 and
less than 50,000, he is likely to be driving a
car that was bought new.
111What Is A Rule?
Rule                                                   Accuracy  Coverage
If breakfast cereal purchased, then milk               85%       20%
  is purchased.
If bread purchased, then Swiss cheese                  15%       6%
  will be purchased.
If 42 years old and purchased pretzels                 95%       0.01%
  and dry roasted peanuts, then beer will
  be purchased.
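The two columns in the rule table can be computed from transaction data. The sketch below assumes the usual reading of the terms: coverage is the share of all records to which the "if" part applies, and accuracy is the share of those records for which the "then" part also holds. The baskets are hypothetical.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (accuracy, coverage) for the rule 'if antecedent then consequent'."""
    antecedent, consequent = set(antecedent), set(consequent)
    applies = [t for t in transactions if antecedent <= t]   # "if" part matches
    correct = [t for t in applies if consequent <= t]        # "then" part also holds
    coverage = len(applies) / len(transactions)
    accuracy = len(correct) / len(applies) if applies else 0.0
    return accuracy, coverage

# Hypothetical market baskets, not taken from the slides.
baskets = [
    {"cereal", "milk"}, {"cereal", "milk", "bread"}, {"cereal", "bread"},
    {"bread", "butter"}, {"milk"},
]
accuracy, coverage = rule_stats(baskets, {"cereal"}, {"milk"})
print(f"accuracy={accuracy:.0%}, coverage={coverage:.0%}")
```

A rule can score high on one measure and low on the other, as the third row of the table shows: it is almost always right but applies to very few records.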
112Topic 8 CRM -The Business Perspective
- Tools and technologies will be applied to real
business problems across a variety of industries. They are:
- Customer Profitability: provides a blueprint for
how to define and use customer profitability as
the bedrock for your CRM processes.
- Customer Acquisition: shows how to use data
mining to acquire new customers in the most
profitable way possible.
- Customer Cross-selling: details how the
technology architecture can be used to increase
the value of existing customers by selling more
to them.
- Customer Retention: uses a case study from the
telecommunications industry to show how to
execute successful CRM systems to retain your
profitable customers.
- Customer Segmentation: provides the business
methodology of how to segment and manage your
customers in a consistent and repeatable way
across the enterprise.
113The Business-Centric View of Data Mining Process
[Figure: flow diagram with boxes labeled Business Problem, Data, Understand, Define Value, Data Definition, ROI Definition, Data Mining, Predictive Model, Predicted ROI, Application, Display, and ROI.]
114Customer Profitability
- Customer profitability is the bedrock of data
mining. Data mining earns its keep by helping you to understand and
improve customer profitability. How does the organization define
what a profitable customer is versus an unprofitable customer?
Keeping a customer loyal can have profound effects on per-customer
profitability. The compounding effect of customer loyalty on
customer profitability also increases because sales costs are lower and
revenue generally has increased. Data mining can be used to predict
customer profitability under a variety of different marketing campaigns.
115A Customer Value Matrix Showing Recommended
Service Level
Segment  Current  Lifetime  Potential  Potential       Customer       Best
         Value    Value     Value      Lifetime Value  Service Level  Service Level
1        High     High      High       High            Gold           Gold
2        High     Low       High       High            Gold           Gold
3        High     Low       High       Low             Gold           Bronze
4        Low      Low       Low        High            Bronze         Gold
5        Low      Low       High       High            Bronze         Gold
6        Low      Low       Low        Low             Bronze         Bronze
116A Customer Value Matrix
- This should be one of the first things that we
do with data mining.
- Segment 1 is your best customers. They will remain
your best customers through their lives, and their
current value matches their potential.
- Segment 2 is similar, except that they are likely
to have low lifetime value, despite their high
value today, probably because they are not loyal
and are likely to switch to a competitor at some time in
their customer life.
- Segments 4 and 5 represent customers who, with
the right care and service, can be transitioned
to high-value customers, either short-term or
long-term.
- Segment 6 represents your low-value customers
that you will treat with some of your least
expensive services.
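One pattern that is consistent with all six rows of the customer value matrix is that the current service level follows current value, while the best service level follows potential lifetime value. The slides do not state this rule, so the sketch below is an inference from the table, checked against its rows.

```python
def service_levels(current_value, potential_lifetime_value):
    """Inferred rule: current level tracks current value, best level tracks
    potential lifetime value. This is an assumption drawn from the table."""
    customer_level = "Gold" if current_value == "High" else "Bronze"
    best_level = "Gold" if potential_lifetime_value == "High" else "Bronze"
    return customer_level, best_level

# (segment, current value, potential lifetime value, levels shown in the matrix)
matrix = [
    (1, "High", "High", ("Gold", "Gold")),
    (2, "High", "High", ("Gold", "Gold")),
    (3, "High", "Low", ("Gold", "Bronze")),
    (4, "Low", "High", ("Bronze", "Gold")),
    (5, "Low", "High", ("Bronze", "Gold")),
    (6, "Low", "Low", ("Bronze", "Bronze")),
]
matches = [service_levels(cur, plv) == expected for _, cur, plv, expected in matrix]
print(all(matches))
```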
117Customer Acquisition
- The traditional approach to customer acquisition
involved a marketing manager developing a combination of
mass marketing (magazine advertisements, billboards, etc.) and
direct marketing (telemarketing, mail, etc.) campaigns based on
their knowledge of the particular customer base that was being targeted.
- A marketing manager selects the demographics
(age, gender, interest in particular subjects, etc.) and then
works with a data vendor (sometimes known as a service bureau) to
obtain lists of customers who meet those characteristics.
- Although a marketer with a wealth of experience
can often choose relevant demographic selection criteria, the process
becomes more difficult as the amount of data increases.
- Data mining can help this process.
118Defining Some Key Customer Acquisition Concepts
- The responses that come in as a result of a
marketing campaign are called response behaviors. Binary response behaviors
(either a yes or no) are the simplest kind of response.
- Beyond binary response behaviors are
categorical response behaviors, which allow for multiple behaviors to
be defined. The rules that define the behaviors are based on the kind of
business you are involved in.
- There are usually several different kinds of
positive response behaviors that can be associated with an acquisition marketing
campaign. They are:
- Customer inquiry
- Purchase of the offered product or products
- Purchase of a product different from the one
offered
119Response Analysis Broken Down By Behaviors
Behavior    Measure         12/1/05  12/5/05  12/7/05  12/9/05  Total
Inquiry     # of Responses  1,556    1,340    328      352      3,576
Purchase A  # of Responses  210      599      128      167      1,104
Purchase B  # of Responses  739      476      164      97       1,476
Purchase C  # of Responses  639      647      113      105      1,504
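The totals in the response analysis table can be reproduced by summing each behavior across the four dates. The counts below are copied from the table; the combined purchase figure is an extra illustration that does not appear on the slide.

```python
# Response counts per behavior for 12/1/05, 12/5/05, 12/7/05, and 12/9/05.
responses = {
    "Inquiry":    [1556, 1340, 328, 352],
    "Purchase A": [210, 599, 128, 167],
    "Purchase B": [739, 476, 164, 97],
    "Purchase C": [639, 647, 113, 105],
}

# Row totals, matching the table's Total column.
totals = {behavior: sum(counts) for behavior, counts in responses.items()}

# All purchase behaviors combined (not shown in the table).
purchases = sum(total for behavior, total in totals.items() if behavior != "Inquiry")
print(totals, purchases)
```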
120Cross-Selling
- Cross-selling is the process by which you offer
your existing customers new products and services. Customers who purchase
baby diapers might also be interested in hearing about your other baby
products.
- One form of cross-selling, sometimes called
up-selling, takes place when the new offer is related to existing purchases by the
customer. For example, an up-sell opportunity might exist for a telephone
company to market a premium long-distance service to existing long-distance
customers who currently have the standard service.
121How Cross-Selling Works
- Assume that you are a marketing manager for a
mid-size bank. You have the following products available for your
customers:
- Value checking account
- Standard checking account
- Gold credit card
- Platinum credit card
- Primary mortgage
- Secondary mortgage
- Of these products, you're responsible for
marketing the mortgage products to your customers. Your goal is to find out which
customers might be interested in a mortgage offering at least 60 days before
they would apply for the loan.
- It is important that any predictions are made
with sufficient lead time (in this case, two months), so that any interactions with
the customers take place before they are committed to a relationship with
your competition.
122How Cross-Selling Works
- You have already done some thinking about your
customers and their motivations in this area and came up with several
scenarios, which you presented to your boss when pitching this new
campaign:
- Customer preparing to buy a new home. These
customers might be building up cash reserves in
their checking and/or savings account in order to
put together a down payment.
- Customer preparing to refinance an existing home.
These customers might be paying off credit card
debt (thus making them more acceptable from a
risk point of view), and hold a mortgage whose
interest rate is higher than the current interest
rate.
- Customer preparing to add a second mortgage.
These customers might have increasing credit card
debt, an on-time payment history for their credit
cards and existing mortgage (which means that
they are a good risk), and enough equity in their
house to cover the outstanding credit card
balance.
123Data Mining Process for Cross-Selling
- The actual data mining process for cross-selling contains three
distinct steps:
- Modeling of individual behaviors
- Scoring data with predictive models
- Optimization of the scoring matrices
- Model: A description that adequately explains and
predicts relevant data but is generally much smaller than the data
itself. For real-world applications, a model can be anything from a
mathematical equation, to a set of rules that describes customer segments, to the
computer representation of a complex neural network architecture, which
translates to several sets of mathematical equations.
- Predictive model: A model created or used to
perform prediction, in contrast to models created solely for pattern detection,
exploration, or general organization of the data.
124Customer Retention
- As industries become more competitive and the
cost of acquiring new customers increases, the value of retaining
current customers also increases.
- For instance, in the cellular phone industry, it
is estimated that the cost of attracting and signing up a new customer is $300
or more when the costs of discounted hardware and sales commissions are
included. The cost of retaining a current customer, however, can be as
low as the price of a phone call or as high as the cost of updating their cellular phone
to the latest technology offering. Although the latter is expensive, it is still
significantly cheaper than signing up a wholly new customer.
125A Case Study- Cellular Phone Industry
- Customer churn is the term used in the cellular
telephone industry to denote the movement of cellular telephone customers from
one provider to another.
- In many industries, this is called customer
attrition, but because of the highly