Title: "Data Mining"
1 "Data Mining"
By Jung, hae-sun
2 Contents
- 1. Introduction
- 2. Definition
- 3. Data Mining Applications
- 4. Data Mining Tasks
- 5. Overview of the System
- 6. Data Mining Analysis
- 7. Application
- 8. References
3 1. Introduction
- Data mining is related to
- - Data warehousing
- - Online analytical processing (OLAP)
- - Data visualization
- Data mining needs a data warehouse for effective
mining. The aims of OLAP and data mining are
similar, but only data mining involves looking for
unknown patterns. Finally, data mining requires
data visualization for the presentation of results.
4 2. Definition
- A technique using software tools geared toward the
user who typically does not know exactly what
they are searching for, but is looking for particular
patterns or trends. Data mining is the process of
sifting through large amounts of data to uncover
relationships in the data content. This is also known as
data surfing.
5 3. Data Mining Applications
- Applications in financial, telecom, insurance and
retail companies for
- - market segmentation
- - fraud detection
- - better marketing
- - trend analysis
- - market basket analysis
- - customer churn
6 4. Data Mining Tasks
- Class description
- Association
- Sequential Patterns
- Time-Series analysis
- Prediction
- Classification
- Clustering
7 5. Overview of the System - Recommender System
[Diagram: data flow of the recommender system. Box labels recovered from the figure:]
- Product Database
- Customer Purchase Database
- Normalized customer vectors
- Data Mining Clustering
- Cluster assignments
- Cluster-specific product lists
- Product list for the target customer's cluster
- Products eligible for recommendation
- Data Mining Associations
- Product affinities
- Target Customer
- Vector for target customer
- Matching Algorithm
- Personalized Recommendation List
8 6. Data Mining Analysis (1)
- Clustering
- - Neural Clustering Algorithm
- - Demographic Clustering Algorithm
- Association Rule
- - Apriori Algorithm
- - AprioriAll Algorithm
- - AprioriTid Algorithm
- - DynamicSome Algorithm
- - FP-Growth
9 6. Data Mining Analysis (2)
- Association Rule - Concept
- - Search for interesting relationships among
items in a given data set.
- Association Rule - Procedure
- - Step 1: Find all frequent itemsets. Each of these
itemsets must occur at least as frequently as a
predetermined minimum support.
- - Step 2: Generate strong association rules from the
frequent itemsets. These rules must satisfy
minimum support and minimum confidence.
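10 6. Data Mining Analysis (3)
- Association Rule - Measure
- - Support
$$\mathrm{support}(A \Rightarrow B) = \frac{\text{number of transactions containing both } A \text{ and } B}{\text{total number of transactions}} = P(A \cap B)$$
- - Confidence
$$\mathrm{confidence}(A \Rightarrow B) = \frac{\text{number of transactions containing both } A \text{ and } B}{\text{number of transactions containing } A} = \frac{P(A \cap B)}{P(A)} = P(B \mid A)$$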
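The two measures above can be computed directly from a list of transactions. The following is a minimal sketch, assuming each transaction is represented as a Python set of item names; the function names and the `baskets` data are hypothetical and chosen only for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`, i.e. P(A ∩ B)."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """P(B | A) = support(A and B together) / support(A).

    Raises ZeroDivisionError if no transaction contains the antecedent.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical toy data: four shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# bread and milk appear together in 2 of 4 baskets.
print(support(baskets, {"bread", "milk"}))
# bread appears in 3 baskets, 2 of which also contain milk.
print(confidence(baskets, {"bread"}, {"milk"}))
```

Note that confidence is not symmetric: in this toy data, milk => bread and bread => milk share the same support but are divided by different antecedent counts.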
11 6. Data Mining Analysis (4)
- Association Rule - Example
Purchased products (1 = purchased, 0 = not purchased)
             A  B  C  D  E  F
Customer 1   1  0  0  0  0  1
Customer 2   1  1  0  1  0  1
Customer 3   1  0  1  1  0  1
Customer 4   1  0  0  1  0  1
Customer 5   1  1  0  0  1  0
Support of {A, D} = 3/5 = 0.6
Support of {A, F} = 4/5 = 0.8
Support of {A, E} = 1/5 = 0.2
Step 1: Find all frequent itemsets.
Minimum support = 60%
Large itemset   Number of transactions   Support (%)
A               5                        100
D               3                        60
F               4                        80
A,D             3                        60
A,F             4                        80
D,F             3                        60
A,D,F           3                        60
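Step 1 can be reproduced with a short level-wise search in the spirit of Apriori. This is a minimal sketch under stated assumptions, not the implementation used for the slides: the function name `frequent_itemsets` and the percent-based threshold parameter are illustrative choices, and the transactions are transcribed from the purchase table above.

```python
from itertools import combinations

# Purchase table from the slide: one set of products per customer.
transactions = [
    {"A", "F"},            # Customer 1
    {"A", "B", "D", "F"},  # Customer 2
    {"A", "C", "D", "F"},  # Customer 3
    {"A", "D", "F"},       # Customer 4
    {"A", "B", "E"},       # Customer 5
]


def frequent_itemsets(transactions, min_support_pct):
    """Return {itemset: transaction count} for all itemsets meeting the minimum support.

    Candidates of size k are built only from frequent itemsets of size k - 1.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    result = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Integer comparison avoids floating-point rounding at the threshold.
        frequent = {c: cnt for c, cnt in counts.items() if cnt * 100 >= min_support_pct * n}
        result.update(frequent)
        k += 1
        # Join step: unions of two frequent itemsets that have exactly k items.
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k - 1)-subsets are frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        ]
    return result


found = frequent_itemsets(transactions, 60)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(",".join(sorted(itemset)), count, f"{100 * count // len(transactions)}%")
```

With a minimum support of 60%, the search yields the same seven itemsets as the table above; B, C and E are dropped at the first level, so no larger candidate containing them is ever counted.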
12 6. Data Mining Analysis (5)
Step 2: Generate strong association rules from the
frequent itemsets.
Rule        Support P(A ∩ B) (%)   Prob. of condition (%)   Confidence
A => F      80                     100                      0.8
A => D      60                     100                      0.6
D => F      60                     60                       1
D,F => A    60                     60                       1
Confidence of A => D = 60/100 = 0.6
Confidence of D => F = 60/60 = 1
Minimum confidence = 90%
Strong association rules: D => F, etc.
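Step 2 can be sketched in a few lines once the transaction counts of the frequent itemsets are known. The sketch below assumes those counts are held in a dictionary keyed by `frozenset` (values copied from the Step 1 table); `strong_rules` is a hypothetical helper name, and every non-empty proper subset of a frequent itemset is tried as the rule's left-hand side.

```python
from itertools import combinations

# Transaction counts of the frequent itemsets from Step 1 (5 customers in total).
counts = {
    frozenset("A"): 5,
    frozenset("D"): 3,
    frozenset("F"): 4,
    frozenset("AD"): 3,
    frozenset("AF"): 4,
    frozenset("DF"): 3,
    frozenset("ADF"): 3,
}


def strong_rules(counts, min_confidence_pct):
    """Return (lhs, rhs, confidence) for every rule meeting the minimum confidence."""
    rules = []
    for itemset, both in counts.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), size)):
                # confidence(lhs => rhs) = count(lhs and rhs together) / count(lhs).
                # counts[lhs] always exists: every subset of a frequent itemset is frequent.
                if both * 100 >= min_confidence_pct * counts[lhs]:
                    rules.append((lhs, itemset - lhs, both / counts[lhs]))
    return rules


for lhs, rhs, conf in strong_rules(counts, 90):
    print(",".join(sorted(lhs)), "=>", ",".join(sorted(rhs)), f"confidence={conf:.2f}")
```

At a 90% minimum confidence this keeps D => F and D,F => A from the table, along with the other confidence-1 rules (D => A, F => A, D => A,F and A,D => F), while A => F (0.8) and A => D (0.6) are rejected.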
13 7. Application (1) - Safeway Stores
- Data Collection
- - Duration: 7 months
- - Number of customers: 200
- - Recommended products per customer: 10-20
14 7. Application (2) - Safeway Stores
- Safeway product taxonomy
[Diagram: three-level product hierarchy. Labels recovered from the figure:]
- - Product classes (99), e.g. Tea, Petfoods, Soft Drinks
- - Product subclasses (2302), e.g. Dried Cat Food, Dried Dog Food, Canned Dog Food, Canned Cat Food
- - Products (30000), e.g. Friskies Liver (250g)
15 7. Application (3) - Safeway Stores
- Results
- - 1957 products were recommended. Of these,
120 (6.1%) were chosen.
- - (It is important to recall that the
recommendation list contains no products
previously purchased by the customer.)
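The two points in the results can be illustrated with a small sketch: excluding products the customer has already purchased, and computing the share of recommended products that were chosen. The helper names, the `limit` parameter and the product names are hypothetical; only the figures 120 and 1957 come from the slide.

```python
def recommend(candidates, already_purchased, limit):
    """Keep the candidate order, drop products already purchased, cap the list length."""
    return [p for p in candidates if p not in already_purchased][:limit]


def hit_rate(chosen, recommended):
    """Share of recommended products that the customers actually chose."""
    return chosen / recommended


# Hypothetical candidate list for one customer, ranked by some matching score.
ranked = ["tea", "cat food", "cola", "dog food"]
print(recommend(ranked, already_purchased={"cola"}, limit=3))

# Figures from the slide: 120 of the 1957 recommended products were chosen.
print(f"{hit_rate(120, 1957):.1%}")
```

Because previously purchased products are filtered out, every hit counted here is a product that is new to that customer, which is worth keeping in mind when reading the 6.1% figure.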
16 8. References
- Agrawal, R. and Srikant, R., Fast Algorithms for Mining Association Rules, In Proc. of the VLDB Conf., 1994
- Two Crows, Data Mining Glossary, http://www.twocrows.com/glossary.htm
- Data Mining, http://www.mis.postech.ac.kr/topic/dm_e.html
- http://wwwmaths.anu.edu.au/steve/pdcn.pdf
17 End