Title: Knowledge Discovery and Data Mining
1 Knowledge Discovery and Data Mining
3 Customer Relationship Management (CRM)
4 [Korean-language news headline; text lost to encoding]
http://news.media.daum.net/economic/industry/200611/16/joins/v14743974.html?_RIGHT_COMMR4
5 Customer Attrition: Case Study
- Situation: Attrition rate for mobile phone customers is around 25-30% a year!
- With this in mind, what is our task?
- Assume we have customer information for the past N months.
6 Customer Attrition: Case Study
- Task:
- Predict who is likely to attrite next month.
- Estimate customer value and what is the cost-effective offer to be made to this customer.
7 Customer Attrition: Results
- Verizon Wireless built a customer data warehouse
- Identified potential attriters
- Developed multiple, regional models
- Targeted customers with high propensity to accept the offer
- Reduced attrition rate from over 2%/month to under 1.5%/month (huge impact, with >30 M subscribers)
- (Reported in 2003)
8 Data Mining: An Example
- You are a marketing manager for a brokerage company
- Problem: Churn is too high (also known as attrition)
- Turnover (after the six-month introductory period ends) is 40%
- Customers receive incentives (average cost $160) when account is opened
- Giving new incentives to everyone who might leave is very expensive (as well as wasteful)
- Bringing back a customer after they leave is both difficult and costly
9 A Solution
- One month before the end of the introductory period, predict which customers will leave
- If you want to keep a customer that is predicted to churn, offer them something based on their predicted value
- The ones that are not predicted to churn need no attention
- If you don't want to keep the customer, do nothing
- How can you predict future behavior?
- Build models
- Test models
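The decision rule on this slide can be sketched in a few lines of Python. This is only an illustration: the `Customer` fields, the 0.5 churn threshold, and the rule "offer 10% of predicted value" are all assumptions made for the example, not figures from the case study.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    churn_probability: float  # hypothetical model output in [0, 1]
    predicted_value: float    # hypothetical estimate of future customer value


def retention_offer(customer, churn_threshold=0.5, offer_fraction=0.1):
    """Return the size of the offer to make, or 0.0 for "do nothing"."""
    # Customers not predicted to churn need no attention.
    if customer.churn_probability < churn_threshold:
        return 0.0
    # Customers we do not want to keep also get nothing.
    if customer.predicted_value <= 0:
        return 0.0
    # Otherwise scale the offer with predicted value (assumed rule).
    return round(offer_fraction * customer.predicted_value, 2)


customers = [
    Customer("A", 0.8, 1200.0),  # likely to churn, valuable
    Customer("B", 0.2, 3000.0),  # unlikely to churn
    Customer("C", 0.9, -50.0),   # likely to churn, not worth keeping
]
offers = {c.name: retention_offer(c) for c in customers}
print(offers)
```

Only customer A receives an offer here; in practice both the threshold and the offer size would be tuned against the cost of incentives and the cost of winning a customer back.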
10 Convergence of Three Technologies
11 Why Now? 1. Increasing Computing Power
- Moore's law: computing power doubles every 18 months
- Powerful workstations became common
- Cost-effective servers (SMPs) provide parallel processing to the mass market
12 2. Improved Data Collection
- Data Collection -> Access -> Navigation -> Mining
- The more data the better (usually)
13 Mining Large Data Sets: Motivation
- There is often information hidden in the data that is not readily evident
- Human analysts may take weeks to discover useful information
- Much of the data is never analyzed at all
14 Largest databases in 2007
- Commercial databases
- AT&T: 312 TB
- World Data Centre for Climate: 220 TB
- YouTube: 45 TB of videos
- Amazon: 42 TB (250,000 full textbooks)
- Central Intelligence Agency (CIA): ?
15 3. Improved Algorithms (AI Data Base)
- Techniques have often been waiting for computing technology to catch up
- Statisticians were already doing manual data mining
- Good machine learning is the intelligent application of statistical processes
- A lot of data mining research focused on tweaking existing techniques to get small percentage gains
16 Definition: Predictive Model
- A black box that makes predictions about the future based on information from the past and present
- Large number of inputs usually available
17 How are Models Built and Used?
18 The Data Mining Process
19 What the Real World Looks Like
20 Why Mine Data?
Motivation: "Necessity is the Mother of Invention"
- Data explosion problem
- Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories
- We are drowning in data, but starving for knowledge!
- Solution: Data warehousing and data mining
- Data warehousing and on-line analytical processing
- Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
21 Predictive Models are
- Decision Trees
- Nearest Neighbor Classification
- Neural Networks
- Rule Induction
- K-means Clustering
22 Why Data Mining? Potential Applications
- Database analysis and decision support
- Market analysis and management
- target marketing, customer relation management, market basket analysis, cross selling, market segmentation
- Risk analysis and management
- Forecasting, customer retention, improved underwriting, quality control, competitive analysis
- Fraud detection and management
- Other Applications
- Text mining (news group, email, documents) and Web analysis.
- Intelligent query answering
23 Data Mining is Not ...
- Data warehousing
- SQL / Ad Hoc Queries / Reporting
- Software Agents
- Online Analytical Processing (OLAP)
- Data Visualization
24 Common Uses of Data Mining
- Marketing
- Direct mail marketing
- Web site personalization
- Fraud Detection
- Credit card fraud detection
- Science
- Bioinformatics
- Gene analysis
- Web Text analysis
- Google
25 Corporate Analysis and Risk Management
- Finance planning and asset evaluation
- cash flow analysis and prediction
- contingent claim analysis to evaluate assets
- trend analysis, etc.
- Resource planning
- summarize and compare the resources and spending
- Competition
- monitor competitors and market directions
- group customers into classes and a class-based pricing procedure
- set pricing strategy in a highly competitive market
26 Fraud Detection and Management (1)
- Applications
- widely used in health care, retail, credit card services, telecommunications (phone card fraud), etc.
- Approach
- use historical data to build models of fraudulent behavior and use data mining to help identify similar instances
- Examples
- auto insurance: detect a group of people who stage accidents to collect on insurance
- money laundering: detect suspicious money transactions (US Treasury's Financial Crimes Enforcement Network)
- medical insurance: detect professional patients and rings of doctors and rings of references
27 Fraud Detection and Management (2)
- Detecting inappropriate medical treatment
- Australian Health Insurance Commission identified that in many cases blanket screening tests were requested (saving Australian $1m/yr).
- Detecting telephone fraud
- Telephone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm.
- British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion dollar fraud.
- Retail
- Analysts estimate that 38% of retail shrink is due to dishonest employees.
28 Scientific Viewpoint
- Data collected and stored at enormous speeds (GB/hour)
- remote sensors on a satellite
- telescopes scanning the skies
- microarrays generating gene expression data
- scientific simulations generating terabytes of data
- Traditional techniques infeasible for raw data
- Data mining may help scientists
- in classifying and segmenting data
- in hypothesis formation
29 Other Applications
- Sports
- IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain competitive advantage for New York Knicks and Miami Heat
- Astronomy
- JPL and the Palomar Observatory discovered 22 quasars with the help of data mining
- Internet Web Surf-Aid
- IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preference and behavior, analyze the effectiveness of Web marketing, improve Web site organization, etc.
30 What is Data Mining?
- Many Definitions
- Non-trivial extraction of implicit, previously unknown and potentially useful information from data
- Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
31 What is (not) Data Mining?
- What is Data Mining?
- Certain names are more prevalent in certain US locations (O'Brien, O'Rourke, O'Reilly in Boston area)
- Group together similar documents returned by search engine according to their context (e.g. Amazon rainforest, Amazon.com)
- What is not Data Mining?
- Look up phone number in phone directory
- Query a Web search engine for information about Amazon
32 Origins of Data Mining
- Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
- Traditional techniques may be unsuitable due to
- Enormity of data
- High dimensionality of data
- Heterogeneous, distributed nature of data
33 Data Mining Tasks
- Prediction Methods
- Use some variables to predict unknown or future values of other variables.
- Description Methods
- Find human-interpretable patterns that describe the data.
From Fayyad et al., Advances in Knowledge Discovery and Data Mining, 1996
34 Data Mining Tasks...
- Exploratory Data Analysis
- Classification (Predictive)
- Clustering (Descriptive)
- Association Rule Discovery (Descriptive)
- Sequential Pattern Discovery (Descriptive)
- Regression (Predictive)
- Deviation Detection (Predictive)
35 Exploratory Data Analysis
- Exploratory Data Analysis (EDA)
- Explore the data without any clear ideas of what we are looking for
- EDA techniques are interactive and visual
- Many effective visualization techniques for small and low-dimensional data
- High dimensionality => difficult visualization => requires dimensionality reduction and projection techniques
- Examples of visualization techniques: pie charts, histograms, scatterplots, contour plots
36 Predictive Data Mining
- Predictive Modeling: Classification and Regression
- Goal: Build a model that will predict the value of one variable from the known values of other variables
- Classification: the variable to be predicted is categorical (i.e. its values belong to a pre-specified, finite set of possibilities)
- Regression: the variable to be predicted is numeric
- called supervised learning in Machine Learning
37 Classification: Definition
- Given a collection of records (training set)
- Each record contains a set of attributes; one of the attributes is the class.
- Find a model for the class attribute as a function of the values of other attributes.
- Goal: previously unseen records should be assigned a class as accurately as possible.
- A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
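The train/test procedure described above can be sketched as follows. The classifier is a 1-nearest-neighbor rule chosen only because it is short; the toy records, the 30% test fraction, and the function names are assumptions for illustration.

```python
import math
import random


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training, test)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def predict_1nn(training_set, attributes):
    """Assign the class of the closest training record (Euclidean distance)."""
    nearest = min(training_set, key=lambda rec: math.dist(rec[0], attributes))
    return nearest[1]


def accuracy(training_set, test_set):
    """Fraction of test records whose predicted class matches the true class."""
    correct = sum(predict_1nn(training_set, x) == y for x, y in test_set)
    return correct / len(test_set)


# Toy records of the form ((attribute_1, attribute_2), class).
records = [((float(x), x + 1.0), "low") for x in range(10)] + \
          [((x + 100.0, x + 101.0), "high") for x in range(10)]

train, test = train_test_split(records)
acc = accuracy(train, test)
print(len(train), len(test), acc)
```

The test records play no part in building the model, so the reported accuracy estimates how the model behaves on previously unseen records.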
38 Classification Example
[Figure: a training set with two categorical attributes, one continuous attribute, and a class attribute is used to learn a classifier]
39 Classification: Application 1
- Direct Marketing
- Goal: Reduce cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.
- Approach:
- Use the data for a similar product introduced before.
- We know which customers decided to buy and which decided otherwise. This {buy, don't buy} decision forms the class attribute.
- Collect various demographic, lifestyle, and company-interaction related information about all such customers.
- Type of business, where they stay, how much they earn, etc.
- Use this information as input attributes to learn a classifier model.
From Berry & Linoff, Data Mining Techniques, 1997
40
- Ex. 1: Credit card purchase authorization
- Credit card companies must determine whether to authorize credit card purchases based on past transactions. 4 classes have been identified:
- authorize
- ask for further identification before authorization
- do not authorize
- do not authorize and call police
- Ex. 2: Credit card application approval
- Predict whether to accept or deny credit card applications
- Historic data
41 Classification: Application 2
- Fraud Detection
- Goal: Predict fraudulent cases in credit card transactions.
- Approach:
- Use credit card transactions and the information on the account holder as attributes.
- When does a customer buy, what does he buy, how often he pays on time, etc.
- Label past transactions as fraud or fair transactions. This forms the class attribute.
- Learn a model for the class of the transactions.
- Use this model to detect fraud by observing credit card transactions on an account.
42 Classification: Application 3
- Customer Attrition/Churn
- Goal: To predict whether a customer is likely to be lost to a competitor.
- Approach:
- Use detailed records of transactions with each of the past and present customers to find attributes.
- How often the customer calls, where he calls, what time of the day he calls most, his financial status, marital status, etc.
- Label the customers as loyal or disloyal.
- Find a model for loyalty.
From Berry & Linoff, Data Mining Techniques, 1997
43 Classification: Application 4
- Sky Survey Cataloging
- Goal: To predict class (star or galaxy) of sky objects, especially visually faint ones, based on the telescopic survey images (from Palomar Observatory).
- 3000 images with 23,040 x 23,040 pixels per image.
- Approach:
- Segment the image.
- Measure image attributes (features): 40 of them per object.
- Model the class based on these features.
- Success Story: Could find 16 new high red-shift quasars, some of the farthest objects that are difficult to find!
From Fayyad et al., Advances in Knowledge Discovery and Data Mining, 1996
44 Classifying Galaxies
Courtesy: http://aps.umn.edu
- Attributes:
- Image features,
- Characteristics of light waves received, etc.
- Class:
- Stages of Formation (Early, Intermediate, Late)
- Data Size:
- 72 million stars, 20 million galaxies
- Object Catalog: 9 GB
- Image Database: 150 GB
45 Descriptive Data Mining
- Goal: Describe all of the data (or the process that generated the data)
- Density estimation: what is the probability distribution?
- Dependency modeling: what are the relationships between variables?
- Clustering (segmentation): find groups of data objects that are
- similar to one another within the same group (cluster)
- dissimilar to the objects in other clusters
- called unsupervised learning in Machine Learning
46 Clustering: More Examples
- Ex. 3: Re-design of uniforms for female soldiers in the US Army
- Goal: reduce the number of uniform sizes to be kept in inventory while still providing good fit
- Researchers from Cornell University used clustering and designed a new set of sizes
- Traditional clothing size system: ordered set of graduated sizes where all dimensions increase together
- The new system: sizes that fit body types
- e.g. one size for short-legged, small-waisted women with wide and long torsos, average arms, broad shoulders, and skinny necks
47 Clustering: Definition
- Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that
- Data points in one cluster are more similar to one another.
- Data points in separate clusters are less similar to one another.
- Similarity Measures:
- Euclidean distance if attributes are continuous.
- Other problem-specific measures.
48 Illustrating Clustering
- Euclidean distance based clustering in 3-D space.
[Figure: intracluster distances are minimized; intercluster distances are maximized]
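One way to obtain clusters with small intracluster distances is k-means, which the deck names earlier but does not spell out. The sketch below is a minimal version under stated assumptions: the six 3-D points are made up, the number of clusters is fixed at two, and the starting centroids are picked by hand rather than at random.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Two visibly separated groups of points in 3-D space (toy data).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0),
          (10.0, 10.0, 10.0), (11.0, 10.0, 9.0), (10.0, 11.0, 11.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

With real data the result depends on the starting centroids and on the choice of the number of clusters, so several runs are usually compared.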
49 Clustering: Application 1
- Market Segmentation
- Goal: subdivide a market into distinct subsets of customers where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.
- Approach:
- Collect different attributes of customers based on their geographical and lifestyle related information.
- Find clusters of similar customers.
- Measure the clustering quality by observing buying patterns of customers in the same cluster vs. those from different clusters.
50 Clustering: Application 2
- Document Clustering
- Goal: To find groups of documents that are similar to each other based on the important terms appearing in them.
- Approach: To identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster.
- Gain: Information Retrieval can utilize the clusters to relate a new document or search term to clustered documents.
52 Associative DM
- Goal: Find relationships among data
- market-basket analysis: find combinations of items that typically occur together
- sequential analysis: find sequential patterns in data
- Market-basket analysis
- Uses the information about what customers buy to give us insight into who they are and why they make certain purchases
- Ex. 1: A grocery store retailer is trying to decide whether to put bread on sale.
- He generates association rules and finds what other products are typically purchased with bread. A particular type of cheese is sold 60% of the time the bread is sold and a jelly is sold 70% of the time.
- Based on these findings, he decides
- 1) to place some cheese and jelly at the end of the aisle where the bread is placed and
- 2) not to place any of these 3 items on sale at the same time.
53 Market-Basket Analysis: More Examples
- Where should strawberries be placed to maximize their sale?
- Services purchased together by telecommunication customers (e.g. broadband Internet, call forwarding, etc.) help determine how to bundle these services together to maximize revenue.
- Unusual combinations of insurance claims can be a sign of fraud
- Medical histories can give indications of complications based on combinations of treatments
- Sport: analyzing game statistics (shots blocked, assists, and fouls) to gain competitive advantage
- When player X is on the floor, player Y's shot accuracy decreases from 75% to 30%
- Bhandari et al. (1997). Advanced Scout: data mining and knowledge discovery in NBA data, Data Mining and Knowledge Discovery, 1(1), pp. 121-125
54 Association Rule Discovery: Definition
- Given a set of records, each of which contains some number of items from a given collection
- Produce dependency rules which will predict occurrence of an item based on occurrences of other items.
Rules Discovered: {Milk} --> {Coke}; {Diaper, Milk} --> {Beer}
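How strongly a rule such as {Milk} --> {Coke} holds is commonly measured by its support (how often all the items occur together) and its confidence (how often the consequent appears when the antecedent does). The slides do not name these measures, and the five transactions below are toy data invented for this sketch.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)


def support(transactions, itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)


# Toy market baskets, one set of items per transaction.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

milk_coke_support = support(transactions, {"Milk", "Coke"})
milk_coke_confidence = confidence(transactions, {"Milk"}, {"Coke"})
diaper_milk_beer_confidence = confidence(transactions, {"Diaper", "Milk"}, {"Beer"})
print(milk_coke_support, milk_coke_confidence, diaper_milk_beer_confidence)
```

A rule-discovery algorithm would search over many candidate itemsets and keep only the rules whose support and confidence exceed chosen thresholds; this sketch only scores rules that are already given.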
55 Association Rule Discovery: Application 1
- Marketing and Sales Promotion
- Let the rule discovered be
- {Bagels, ...} --> {Potato Chips}
- Potato Chips as consequent => Can be used to determine what should be done to boost its sales.
- Bagels in the antecedent => Can be used to see which products would be affected if the store discontinues selling bagels.
- Bagels in antecedent and Potato Chips in consequent => Can be used to see what products should be sold with Bagels to promote sale of Potato Chips!
56 Association Rule Discovery: Application 2
- Supermarket shelf management.
- Goal: To identify items that are bought together by sufficiently many customers.
- Approach: Process the point-of-sale data collected with barcode scanners to find dependencies among items.
- A classic rule:
- If a customer buys diapers and milk, then he is very likely to buy beer.
- So, don't be surprised if you find six-packs stacked next to diapers!
57 Association Rule Discovery: Application 3
- Inventory Management
- Goal: A consumer appliance repair company wants to anticipate the nature of repairs on its consumer products and keep the service vehicles equipped with the right parts to reduce the number of visits to consumer households.
- Approach: Process the data on tools and parts required in previous repairs at different consumer locations and discover the co-occurrence patterns.
58 Sequential Pattern Discovery: Definition
- Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.
- Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.
59 Sequential Analysis
- Finds sequential patterns in data
- These patterns are similar to market-basket analysis but the relationship is based on time
- Ex. 1: Most people who purchase CD players purchase CDs within 3 days.
- Ex. 2: The webmaster at company X periodically analyses the web log data to determine how the users of X browse the site. He finds that 70% of the users of page A follow one of the following patterns:
- A->B->C
- A->D->B->C
- A->E->B->C
- He then decides to add a link from page A to C
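The webmaster's analysis can be sketched as a count over browsing sessions. The session list below is invented so that the shares match the example (70% of sessions that visit A go on to reach C); the function name and the session format, an ordered list of page names, are likewise assumptions.

```python
from collections import Counter


def path_pattern_shares(sessions, start, end):
    """Among sessions that visit `start`, return the share following each
    path from `start` to the first later visit of `end`, plus the number
    of sessions that visited `start`."""
    patterns = Counter()
    visits = 0
    for pages in sessions:
        if start not in pages:
            continue
        visits += 1
        i = pages.index(start)
        if end in pages[i + 1:]:
            j = pages.index(end, i + 1)
            patterns["->".join(pages[i:j + 1])] += 1
    shares = {pattern: n / visits for pattern, n in patterns.items()}
    return shares, visits


# Toy web-log sessions, each an ordered list of visited pages.
sessions = [
    ["A", "B", "C"],
    ["A", "D", "B", "C"],
    ["A", "E", "B", "C"],
    ["A", "B", "C"],
    ["A", "F"],
    ["G", "H"],
    ["A", "B", "C", "D"],
    ["A", "D", "B", "C"],
    ["A", "E"],
    ["A", "G"],
    ["A", "E", "B", "C"],
]

shares, visits = path_pattern_shares(sessions, "A", "C")
reach_c = sum(shares.values())
print(shares, visits, reach_c)
```

A high combined share for indirect paths from A to C is the kind of evidence that would justify adding a direct link between the two pages.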
60 Deviation/Anomaly Detection
- Detect significant deviations from normal behavior
- Applications:
- Credit Card Fraud Detection
- Network Intrusion Detection
Typical network traffic at university level may reach over 100 million connections per day
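A very simple form of deviation detection flags values that lie far from the mean of past observations. The sketch below uses a three-standard-deviation rule; that rule, the purchase amounts, and the function name are assumptions for illustration, and real fraud or intrusion detectors use much richer models.

```python
import statistics


def find_deviations(history, new_values, threshold=3.0):
    """Return the new values lying more than `threshold` sample standard
    deviations away from the mean of the historical values."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return [v for v in new_values if abs(v - mean) > threshold * stdev]


# Hypothetical past purchase amounts on one account.
history = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 18.0, 21.0]
flagged = find_deviations(history, [19.5, 24.0, 250.0])
print(flagged)
```

Only the 250.0 purchase is flagged; the other two new values fall within the account's normal range.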
61 Challenges of Data Mining
- Scalability
- Dimensionality
- Complex and Heterogeneous Data
- Data Quality
- Data Ownership and Distribution
- Privacy Preservation
- Streaming Data
62 Data Mining: A KDD Process
- Data mining: the core of the knowledge discovery process.
[Figure: Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation -> Knowledge]
63 Steps of a KDD Process
- Learning the application domain
- relevant prior knowledge and goals of application
- Creating a target data set: data selection
- Data cleaning and preprocessing (may take 60% of effort!)
- Data reduction and transformation
- Find useful features, dimensionality/variable reduction, invariant representation.
- Choosing functions of data mining
- summarization, classification, regression, association, clustering.
- Choosing the mining algorithm(s)
- Data mining: search for patterns of interest
- Pattern evaluation and knowledge presentation
- visualization, transformation, removing redundant patterns, etc.
- Use of discovered knowledge
64 Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining at the center, drawing on Database Technology, Statistics, Machine Learning, Visualization, Information Science, and Other Disciplines]
65 Components of DM Algorithms
- DM algorithms have 3 main components
- Model (structure): DM algorithms attempt to fit a model to data
- a tree in Decision Trees (DT)
- layers of non-linear transformations of weighted sums of the inputs in backpropagation Neural Networks (NNs)
- Preference (score function): preference criteria used to prefer one model over another
- Number of misclassifications in DTs
- Mean squared error in NNs
- Search method: how the data is searched by the algorithm
- Greedy search over structure in DTs
- Gradient descent over parameters in NNs
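The three components can be seen together in a deliberately small example. As a simplifying assumption, the model here is a single linear unit (a weighted sum with no non-linearity) rather than a full neural network; the score function is mean squared error and the search method is gradient descent over the two parameters. The data, learning rate, and step count are made up for the sketch.

```python
def mean_squared_error(w, b, data):
    """Score function: average squared gap between prediction and target."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def gradient_descent(data, learning_rate=0.05, steps=2000):
    """Search method: repeatedly step the parameters against the gradient
    of the mean squared error."""
    w, b = 0.0, 0.0  # model: prediction = w * x + b
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = gradient_descent(data)
final_error = mean_squared_error(w, b, data)
print(w, b, final_error)
```

Swapping any one component gives a different algorithm: a decision tree keeps the idea of a score but replaces the model with a tree and the search with greedy splitting.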