Knowledge Discovery and Data Mining presentation

About This Presentation

Title:

Knowledge Discovery and Data Mining

Slides: 66

Provided by: qiang5

Transcript and Presenter's Notes

1
Knowledge Discovery and Data Mining

Soongsil University

2
(No Transcript)
3
Customer Relationship Management (CRM)
4
(Korean news headline; the original text was lost to character encoding)
http://news.media.daum.net/economic/industry/200611/16/joins/v14743974.html?_RIGHT_COMMR4
5
Customer Attrition Case Study

Situation: The attrition rate for mobile phone
customers is around 25-30% a year!
With this in mind, what is our task?
Assume we have customer information for the past
N months.

6
Customer Attrition Case Study

Task:
Predict who is likely to attrite next month.
Estimate each customer's value and the most
cost-effective offer to make to that customer.

7
Customer Attrition Results

Verizon Wireless built a customer data warehouse
Identified potential attriters
Developed multiple, regional models
Targeted customers with high propensity to accept
the offer
Reduced attrition rate from over 2%/month to
under 1.5%/month (huge impact, with >30 M
subscribers)
(Reported in 2003)

8
Data Mining: An Example

You are a marketing manager for a brokerage
company
Problem: Churn (also known as attrition) is too
high
Turnover (after the six-month introductory period
ends) is 40%
Customers receive incentives (average cost: $160)
when an account is opened
Giving new incentives to everyone who might leave
is very expensive (as well as wasteful)
Bringing back a customer after they leave is both
difficult and costly

9
A Solution

One month before the introductory period ends,
predict which customers will leave
If you want to keep a customer who is predicted
to churn, offer them something based on their
predicted value
The ones that are not predicted to churn need no
attention
If you don't want to keep the customer, do
nothing
How can you predict future behavior?
Build models
Test models
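
The decision rule on this slide can be sketched in a few lines of Python. This is only an illustration: the churn probability and predicted value are assumed to come from a model built elsewhere, and the names Customer, retention_offer, churn_threshold and offer_fraction are hypothetical, not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    churn_probability: float  # output of a churn model (assumed given here)
    predicted_value: float    # predicted future value if the customer stays


def retention_offer(customer, churn_threshold=0.5, offer_fraction=0.2):
    """Return the offer budget for one customer, or 0.0 for no action.

    churn_threshold and offer_fraction are illustrative assumptions,
    not values from the presentation.
    """
    if customer.churn_probability < churn_threshold:
        return 0.0  # not predicted to churn: needs no attention
    if customer.predicted_value <= 0:
        return 0.0  # not worth keeping: do nothing
    # Predicted to churn and worth keeping: scale the offer to predicted value.
    return round(customer.predicted_value * offer_fraction, 2)


customers = [
    Customer("A", churn_probability=0.8, predicted_value=1000.0),
    Customer("B", churn_probability=0.1, predicted_value=5000.0),
    Customer("C", churn_probability=0.9, predicted_value=-50.0),
]
offers = {c.name: retention_offer(c) for c in customers}
print(offers)
```

Only customer A receives an offer here: B is not predicted to churn, and C is predicted to churn but is not worth keeping.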

10
Convergence of Three Technologies
11
Why Now? 1. Increasing Computing Power

Moore's law: computing power doubles every 18
months
Powerful workstations became common
Cost-effective servers (SMPs) provide parallel
processing to the mass market

12
2. Improved Data Collection

Data Collection -> Access -> Navigation -> Mining
The more data the better (usually)

13
Mining Large Data Sets - Motivation

There is often information hidden in the data
that is not readily evident
Human analysts may take weeks to discover useful
information
Much of the data is never analyzed at all

14
Largest databases in 2007

Commercial databases
AT&T: 312 TB
World Data Centre for Climate: 220 TB
YouTube: 45 TB of videos
Amazon: 42 TB (250,000 full textbooks)
Central Intelligence Agency (CIA): ?

15
3. Improved Algorithms (AI and Databases)

Techniques have often been waiting for computing
technology to catch up
Statisticians were already doing manual data mining
Good machine learning is the intelligent application
of statistical processes
A lot of data mining research has focused on tweaking
existing techniques to get small percentage gains

16
Definition: Predictive Model

A black box that makes predictions about the
future based on information from the past and
present
Large number of inputs usually available

17
How are Models Built and Used?

View from 20,000 feet

18
The Data Mining Process
19
What the Real World Looks Like
20
Why Mine Data?
Motivation: "Necessity is the Mother of
Invention"

Data explosion problem:
Automated data collection tools and mature
database technology lead to tremendous amounts of
data stored in databases, data warehouses and
other information repositories
We are drowning in data, but starving for
knowledge!
Solution: Data warehousing and data mining
Data warehousing and on-line analytical
processing
Extraction of interesting knowledge (rules,
regularities, patterns, constraints) from data
in large databases

21
Predictive Models are

Decision Trees
Nearest Neighbor Classification
Neural Networks
Rule Induction
K-means Clustering

22
Why Data Mining? Potential Applications

Database analysis and decision support
Market analysis and management
target marketing, customer relationship management,
market basket analysis, cross-selling, market
segmentation
Risk analysis and management
Forecasting, customer retention, improved
underwriting, quality control, competitive
analysis
Fraud detection and management
Other Applications
Text mining (news group, email, documents) and
Web analysis.
Intelligent query answering

23
Data Mining is Not ...

Data warehousing
SQL / Ad Hoc Queries / Reporting
Software Agents
Online Analytical Processing (OLAP)
Data Visualization

24
Common Uses of Data Mining

Marketing
Direct mail marketing
Web site personalization
Fraud Detection
Credit card fraud detection
Science
Bioinformatics
Gene analysis
Web and text analysis
Google

25
Corporate Analysis and Risk Management

Finance planning and asset evaluation
cash flow analysis and prediction
contingent claim analysis to evaluate assets
trend analysis, etc.
Resource planning
summarize and compare the resources and spending
Competition
monitor competitors and market directions
group customers into classes and set up a
class-based pricing procedure
set pricing strategy in a highly competitive
market

26
Fraud Detection and Management (1)

Applications
widely used in health care, retail, credit card
services, telecommunications (phone card fraud),
etc.
Approach
use historical data to build models of fraudulent
behavior and use data mining to help identify
similar instances
Examples
auto insurance: detect groups of people who
stage accidents to collect on insurance
money laundering: detect suspicious money
transactions (US Treasury's Financial Crimes
Enforcement Network)
medical insurance: detect professional patients
and rings of doctors and rings of references

27
Fraud Detection and Management (2)

Detecting inappropriate medical treatment
Australian Health Insurance Commission identified
that in many cases blanket screening tests were
requested (saving Australian $1m/yr).
Detecting telephone fraud
Telephone call model: destination of the call,
duration, time of day or week. Analyze patterns
that deviate from an expected norm.
British Telecom identified discrete groups of
callers with frequent intra-group calls,
especially mobile phones, and broke a
multimillion-dollar fraud.
Retail
Analysts estimate that 38% of retail shrink is
due to dishonest employees.

28
Scientific Viewpoint

Data collected and stored at enormous speeds
(GB/hour)
remote sensors on a satellite
telescopes scanning the skies
microarrays generating gene expression data
scientific simulations generating terabytes of
data
Traditional techniques infeasible for raw data
Data mining may help scientists
in classifying and segmenting data
in Hypothesis Formation

29
Other Applications

Sports
IBM Advanced Scout analyzed NBA game statistics
(shots blocked, assists, and fouls) to gain
competitive advantage for New York Knicks and
Miami Heat
Astronomy
JPL and the Palomar Observatory discovered 22
quasars with the help of data mining
Internet Web Surf-Aid
IBM Surf-Aid applies data mining algorithms to
Web access logs for market-related pages to
discover customer preference and behavior pages,
analyzing effectiveness of Web marketing,
improving Web site organization, etc.

30
What is Data Mining?

Many Definitions
Non-trivial extraction of implicit, previously
unknown and potentially useful information from
data
Exploration and analysis, by automatic or
semi-automatic means, of large quantities of
data in order to discover meaningful patterns

31
What is (not) Data Mining?

What is Data Mining?
Certain names are more prevalent in certain US
locations (O'Brien, O'Rourke, O'Reilly in the
Boston area)
Group together similar documents returned by a
search engine according to their context (e.g.
Amazon rainforest, Amazon.com)

What is not Data Mining?
Look up phone number in phone directory
Query a Web search engine for information about
Amazon

32
Origins of Data Mining

Draws ideas from machine learning/AI, pattern
recognition, statistics, and database systems
Traditional techniques may be unsuitable due to
Enormity of data
High dimensionality of data
Heterogeneous, distributed nature of data

33
Data Mining Tasks

Prediction Methods
Use some variables to predict unknown or future
values of other variables.
Description Methods
Find human-interpretable patterns that describe
the data.

From Fayyad et al., Advances in Knowledge
Discovery and Data Mining, 1996
34
Data Mining Tasks...

Exploratory Data Analysis
Classification [Predictive]
Clustering [Descriptive]
Association Rule Discovery [Descriptive]
Sequential Pattern Discovery [Descriptive]
Regression [Predictive]
Deviation Detection [Predictive]

35
Exploratory Data Analysis

Exploratory Data Analysis (EDA)
Explore the data without any clear ideas of what
we are looking for
EDA techniques are interactive and visual
Many effective visualization techniques for small
and low dimensional data
High dimensionality => difficult visualization =>
requires dimensionality reduction and projection
techniques
Examples of visualization techniques: pie charts,
histograms, scatterplots, contour plots

36
Predictive Data Mining

Predictive Modeling: Classification and
Regression
Goal: Build a model that will predict the value
of one variable from the known values of other
variables
- Classification: the variable to be predicted is
categorical (i.e. its values belong to a
pre-specified, finite set of possibilities)
- Regression: the variable to be predicted is
numeric
Called supervised learning in machine learning

37
Classification: Definition

Given a collection of records (training set)
Each record contains a set of attributes; one of
the attributes is the class.
Find a model for the class attribute as a function
of the values of the other attributes.
Goal: previously unseen records should be
assigned a class as accurately as possible.
A test set is used to determine the accuracy of
the model. Usually, the given data set is divided
into training and test sets, with training set
used to build the model and test set used to
validate it.
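
As a rough illustration of the train/test procedure described above, the sketch below divides a toy data set into training and test sets, uses a 1-nearest-neighbor rule as the model, and measures accuracy on the test set. The data, the 70/30 split, and the choice of nearest neighbor are all assumptions made for the example, not part of the presentation.

```python
import math
import random


def nearest_neighbor_predict(training_set, attributes):
    """Predict the class of `attributes` as the class of the closest training record."""
    closest = min(training_set, key=lambda record: math.dist(record[0], attributes))
    return closest[1]


def accuracy(training_set, test_set):
    """Fraction of test records whose predicted class matches the true class."""
    correct = sum(
        1 for attributes, label in test_set
        if nearest_neighbor_predict(training_set, attributes) == label
    )
    return correct / len(test_set)


# Toy records: (attributes, class). Two well-separated groups (made-up data).
random.seed(0)
records = [((random.gauss(0, 1), random.gauss(0, 1)), "no") for _ in range(50)]
records += [((random.gauss(10, 1), random.gauss(10, 1)), "yes") for _ in range(50)]

# Divide the given data into training and test sets (70/30 is an arbitrary choice).
random.shuffle(records)
split = int(0.7 * len(records))
training_set, test_set = records[:split], records[split:]

print(f"test accuracy: {accuracy(training_set, test_set):.2f}")
```

The test records play no part in building the model, so the reported accuracy estimates how well previously unseen records would be classified.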

38
Classification Example
(Figure: a training set with two categorical
attributes, one continuous attribute, and a class
label is used to learn a classifier.)
39
Classification: Application 1

Direct Marketing
Goal: Reduce cost of mailing by targeting a set
of consumers likely to buy a new cell-phone
product.
Approach:
Use the data for a similar product introduced
before.
We know which customers decided to buy and which
decided otherwise. This {buy, don't buy} decision
forms the class attribute.
Collect various demographic, lifestyle, and
company-interaction related information about all
such customers.
Type of business, where they stay, how much they
earn, etc.
Use this information as input attributes to learn
a classifier model.

From Berry & Linoff, Data Mining Techniques, 1997
40

Ex. 1: Credit card purchase authorization
- Credit card companies must determine
whether to authorize credit card purchases based
on past transactions. 4 classes have been
identified:
authorize
ask for further identification before
authorization
do not authorize
do not authorize and call police
Ex. 2: Credit card application approval
- Predict whether to accept or deny credit card
applications
Historic data

41
Classification: Application 2

Fraud Detection
Goal: Predict fraudulent cases in credit card
transactions.
Approach:
Use credit card transactions and the information
on the account holder as attributes.
When does a customer buy, what does he buy, how
often does he pay on time, etc.
Label past transactions as fraud or fair
transactions. This forms the class attribute.
Learn a model for the class of the transactions.
Use this model to detect fraud by observing
credit card transactions on an account.

42
Classification: Application 3

Customer Attrition/Churn
Goal: To predict whether a customer is likely to
be lost to a competitor.
Approach:
Use detailed records of transactions with each of
the past and present customers to find
attributes.
How often the customer calls, where he calls,
what time of day he calls most, his financial
status, marital status, etc.
Label the customers as loyal or disloyal.
Find a model for loyalty.

From Berry & Linoff, Data Mining Techniques, 1997
43
Classification: Application 4

Sky Survey Cataloging
Goal: To predict class (star or galaxy) of sky
objects, especially visually faint ones, based on
the telescopic survey images (from Palomar
Observatory).
3000 images with 23,040 x 23,040 pixels per
image.
Approach:
Segment the image.
Measure image attributes (features): 40 of them
per object.
Model the class based on these features.
Success Story: Could find 16 new high red-shift
quasars, some of the farthest objects that are
difficult to find!

From Fayyad et al., Advances in Knowledge
Discovery and Data Mining, 1996
44
Classifying Galaxies
Courtesy: http://aps.umn.edu

Attributes:
Image features,
Characteristics of light waves received, etc.

Class:
Stages of formation (Early, Intermediate, Late)

Data Size:
72 million stars, 20 million galaxies
Object Catalog: 9 GB
Image Database: 150 GB

45
Descriptive Data Mining
Goal: Describe all of the data (or the process
that generated the data)
Density estimation: what is the probability
distribution?
Dependency modeling: what are the relationships
between variables?
Clustering (segmentation): find groups of data
objects that are
- similar to one another within the same
group (cluster)
- dissimilar to the objects in other clusters
Called unsupervised learning in machine learning
46
Clustering: More Examples

Ex. 3: Redesign of uniforms for female soldiers
in the US Army
Goal: reduce the number of uniform sizes to be
kept in inventory while still providing a good fit
Researchers from Cornell University used
clustering and designed a new set of sizes
- Traditional clothing size system: an ordered
set of graduated sizes where all dimensions
increase together
- The new system: sizes that fit body types,
e.g. one size for short-legged, small-waisted
women with wide and long torsos, average arms,
broad shoulders, and skinny necks

47
Clustering: Definition

Given a set of data points, each having a set of
attributes, and a similarity measure among them,
find clusters such that
Data points in one cluster are more similar to
one another.
Data points in separate clusters are less similar
to one another.
Similarity Measures:
Euclidean distance if attributes are continuous.
Other problem-specific measures.
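
One common way to find such clusters with Euclidean distance is k-means. The sketch below is a minimal version under stated assumptions: the points, the number of clusters, the starting centroids, and the fixed iteration count are all made up for illustration, and the definition above does not prescribe any particular algorithm.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Six hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(clusters)
```

Each pass reduces the distance between points and the centroid of their own cluster, which is the "more similar within a cluster" half of the definition.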

48
Illustrating Clustering

Euclidean Distance Based Clustering in 3-D space.

Intracluster distances are minimized
Intercluster distances are maximized
49
Clustering: Application 1

Market Segmentation
Goal: Subdivide a market into distinct subsets of
customers where any subset may conceivably be
selected as a market target to be reached with a
distinct marketing mix.
Approach
Collect different attributes of customers based
on their geographical and lifestyle related
information.
Find clusters of similar customers.
Measure the clustering quality by observing
buying patterns of customers in same cluster vs.
those from different clusters.

50
Clustering: Application 2

Document Clustering
Goal: To find groups of documents that are
similar to each other based on the important
terms appearing in them.
Approach: Identify frequently occurring terms
in each document. Form a similarity measure based
on the frequencies of different terms. Use it to
cluster.
Gain: Information retrieval can utilize the
clusters to relate a new document or search term
to clustered documents.
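
A similarity measure based on term frequencies might look like the sketch below. Cosine similarity is one common choice and is an assumption here, since the slide does not name a specific measure; the three toy documents are also made up.

```python
import math
from collections import Counter


def term_frequencies(document):
    """Count how often each lowercase word occurs in the document."""
    return Counter(document.lower().split())


def cosine_similarity(freq_a, freq_b):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    dot = sum(freq_a[term] * freq_b[term] for term in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(count * count for count in freq_a.values()))
    norm_b = math.sqrt(sum(count * count for count in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical documents: two about the rainforest, one about the retailer.
docs = {
    "rainforest": "amazon rainforest river trees rainforest",
    "jungle": "rainforest trees river wildlife",
    "retail": "amazon books shipping orders books",
}
freqs = {name: term_frequencies(text) for name, text in docs.items()}
print(cosine_similarity(freqs["rainforest"], freqs["jungle"]))
print(cosine_similarity(freqs["rainforest"], freqs["retail"]))
```

The two rainforest documents score much higher with each other than either does with the retail document, even though "amazon" appears in both contexts, so a clustering step using this measure would tend to separate them.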

51
(No Transcript)
52
Associative DM

Goal: Find relationships among data
market-basket analysis: find combinations
of items that typically occur together
sequential analysis: find sequential
patterns in data
Market-basket analysis
Uses the information about what customers
buy to give us insight into who they are and
why they make certain purchases
Ex. 1: A grocery store retailer is trying to
decide whether to put bread on sale.
He generates association rules and finds
what other products are typically purchased
with bread. A particular type of cheese is sold
60% of the time the bread is sold and a
jelly is sold 70% of the time.
Based on these findings, he decides
1) to place some cheese and jelly at the end
of the aisle where the bread is placed and
2) not to place any of these 3 items on
sale at the same time.

53
Market-Basket Analysis: More Examples

Where should strawberries be placed to maximize
their sales?
Services purchased together by telecommunication
customers (e.g. broadband Internet, call
forwarding, etc.) help determine how to
bundle these services together to maximize
revenue.
Unusual combinations of insurance claims can be a
sign of fraud
Medical histories can give indications of
complications based on combinations of treatments
Sport: analyzing game statistics (shots blocked,
assists, and fouls) to gain competitive advantage
- When player X is on the floor, player Y's
shot accuracy decreases from 75% to 30%
- Bhandari et al. (1997). Advanced Scout: data
mining and knowledge discovery in NBA data. Data
Mining and Knowledge Discovery, 1(1), pp. 121-125

54
Association Rule Discovery: Definition

Given a set of records, each of which contains some
number of items from a given collection
Produce dependency rules which will predict
occurrence of an item based on occurrences of
other items.

Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
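
Rules like these are usually scored by support and confidence. The sketch below computes both for the two rules; the transaction table that accompanied this slide did not survive in the transcript, so the five transactions used here are an illustrative assumption.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also
    contain the consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical point-of-sale records (not taken from the presentation).
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

for antecedent, consequent in [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]:
    s = support(transactions, antecedent | consequent)
    c = confidence(transactions, antecedent, consequent)
    print(f"{sorted(antecedent)} --> {sorted(consequent)}: support={s:.2f}, confidence={c:.2f}")
```

On this toy data, Milk appears in four of five transactions and Coke accompanies it in three of those, so the rule {Milk} --> {Coke} has support 0.6 and confidence 0.75.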
55
Association Rule Discovery: Application 1

Marketing and Sales Promotion
Let the rule discovered be
{Bagels, ...} --> {Potato Chips}
Potato Chips as consequent => Can be used to
determine what should be done to boost its sales.
Bagels in the antecedent => Can be used to see
which products would be affected if the store
discontinues selling bagels.
Bagels in antecedent and Potato Chips in
consequent => Can be used to see what products
should be sold with Bagels to promote sale of
Potato Chips!

56
Association Rule Discovery: Application 2

Supermarket shelf management.
Goal: To identify items that are bought together
by sufficiently many customers.
Approach: Process the point-of-sale data
collected with barcode scanners to find
dependencies among items.
A classic rule:
If a customer buys diapers and milk, then he is
very likely to buy beer.
So, don't be surprised if you find six-packs
stacked next to diapers!

57
Association Rule Discovery: Application 3

Inventory Management
Goal: A consumer appliance repair company wants
to anticipate the nature of repairs on its
consumer products and keep the service vehicles
equipped with the right parts to reduce the number
of visits to consumer households.
Approach: Process the data on tools and parts
required in previous repairs at different
consumer locations and discover the co-occurrence
patterns.

58
Sequential Pattern Discovery: Definition

Given a set of objects, with each object
associated with its own timeline of events, find
rules that predict strong sequential dependencies
among different events.
Rules are formed by first discovering patterns.
Event occurrences in the patterns are governed by
timing constraints.

59
Sequential Analysis

Finds sequential patterns in data
- These patterns are similar to market-basket
analysis, but the relationship is based on time
Ex. 1: Most people who purchase CD players
purchase CDs within 3 days.
Ex. 2: The webmaster at company X periodically
analyzes the web log data to determine how the
users of X browse its pages. He finds that 70% of
the users of page A follow one of the following
patterns:
- A -> B -> C
- A -> D -> B -> C
- A -> E -> B -> C
He then decides to add a link from page A to C
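
The webmaster's analysis can be approximated by counting, among sessions that visit page A, how many later reach page C. The sessions below are hypothetical and the helper name follows is an assumption; a real web log would first have to be grouped into per-user page sequences.

```python
def follows(session, first, later):
    """True if page `later` is visited at some point after page `first`."""
    if first not in session:
        return False
    return later in session[session.index(first) + 1:]


# Hypothetical sessions: each is the ordered list of pages one user visited.
sessions = [
    ["A", "B", "C"],
    ["A", "D", "B", "C"],
    ["A", "E", "B", "C"],
    ["A", "B", "C"],
    ["A", "F"],
    ["B", "C"],
    ["A", "D", "B", "C"],
    ["A", "E", "B", "C"],
    ["A", "G"],
    ["A", "B", "C"],
    ["A", "H"],
]
visited_a = [s for s in sessions if "A" in s]
reached_c = [s for s in visited_a if follows(s, "A", "C")]
share = len(reached_c) / len(visited_a)
print(f"{share:.0%} of users of page A later reach page C")
```

A high share like this is the kind of evidence that would justify a direct link from A to C.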

60
Deviation/Anomaly Detection

Detect significant deviations from normal
behavior
Applications
Credit Card Fraud Detection
Network Intrusion Detection

Typical network traffic at University
level may reach over 100 million connections per
day
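
A very simple form of deviation detection flags values that lie far from the mean, measured in standard deviations. The sketch below makes several assumptions: the transaction amounts are invented, the threshold of 3 standard deviations is a conventional but arbitrary choice, and practical fraud or intrusion detectors use much richer models of normal behavior.

```python
import statistics


def find_deviations(values, threshold=3.0):
    """Return (index, value) pairs more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical: nothing deviates
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / spread > threshold]


# Hypothetical daily transaction amounts on one credit card account.
amounts = [20, 25, 22, 19, 24, 21, 23, 20, 22, 25, 18, 24, 21, 23, 950]
print(find_deviations(amounts))
```

Only the final amount is flagged; the everyday purchases all sit well within one standard deviation of the mean.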
61
Challenges of Data Mining

Scalability
Dimensionality
Complex and Heterogeneous Data
Data Quality
Data Ownership and Distribution
Privacy Preservation
Streaming Data

62
Data Mining: A KDD Process

Data mining: the core of the knowledge discovery
process.

(Figure: Databases -> Data Cleaning and Data
Integration -> Data Warehouse -> Selection ->
Task-relevant Data -> Data Mining -> Pattern
Evaluation -> Knowledge)
63
Steps of a KDD Process

Learning the application domain:
relevant prior knowledge and goals of application
Creating a target data set: data selection
Data cleaning and preprocessing (may take 60% of
effort!)
Data reduction and transformation:
Find useful features, dimensionality/variable
reduction, invariant representation.
Choosing functions of data mining:
summarization, classification, regression,
association, clustering.
Choosing the mining algorithm(s)
Data mining: search for patterns of interest
Pattern evaluation and knowledge presentation:
visualization, transformation, removing redundant
patterns, etc.
Use of discovered knowledge

64
Data Mining: Confluence of Multiple Disciplines
(Figure: Data Mining at the center, drawing on
Database Technology, Statistics, Machine Learning,
Visualization, Information Science, and Other
Disciplines)
65
Components of DM Algorithms