Business Intelligence Technologies

1
Business Intelligence Technologies: Data Mining
  • Market Basket Analysis, Association Rules
  • Dr. Oualid (Walid) Ben Ali

2
Agenda
  • Market basket analysis and association rules
  • Software demo
  • Exercise

4
Barbie → Candy
  • Put them closer together in the store.
  • Put them far apart in the store.
  • Package candy bars with the dolls.
  • Package Barbie, candy, and poorly selling items together.
  • Raise the price on one, lower it on the other.
  • Offer Barbie accessories for proofs of purchase.
  • Do not advertise candy and Barbie together.
  • Offer candies in the shape of a Barbie doll.

5
Market Basket Analysis (MBA)
  • MBA in a retail setting
    • Find out which items are bought together
    • Cross-selling
    • Optimize shelf layout
    • Product bundling
    • Timing promotions
    • Discount planning (avoid double discounts)
    • Product selection under limited space
    • Targeted advertisement, personalized coupons,
      item recommendations
  • Usage beyond market baskets
    • Medical (one symptom after another)
    • Financial (customers with a mortgage account also
      have a savings account)

8
What the data contains
Transaction No.  Item 1     Item 2     Item 3     Item 4
100              Beer       Diaper     Chocolate  Cheese
101              Milk       Chocolate  Shampoo
102              Beer       Wine       Vodka
103              Beer       Cheese     Diaper     Chocolate
104              Ice Cream  Diaper     Beer

Customer No.  Age    Income  Saving_acct  Children  Mortgage
100           >50    High    Yes          Yes       Yes
101           35-50  Mid     No           No        No
102           <35    High    Yes          No        Yes
103           >50    Mid     Yes          No        Yes
104           <35    Low     No           Yes       No

9
Rules Discovered from MBA
  • Actionable rules
    • Wal-Mart customers who purchase Barbie dolls have
      a 60% likelihood of also purchasing one of three
      types of candy bars
  • Trivial rules
    • Customers who purchase large appliances are very
      likely to purchase maintenance agreements
  • Inexplicable rules
    • When a new hardware store opens, one of the most
      commonly sold items is toilet bowl cleaner

10
Learning Frequent Itemsets and Association Rules
from Data
A descriptive approach for discovering relevant
and valid associations among items in the data.
If buy {Diaper} Then buy {Beer}
  • The itemset corresponding to this rule is
    {Diaper, Beer}
  • Itemset: a collection of items.
  • Frequent itemset: an itemset that occurs often in
    the data.
  • Oftentimes, finding frequent itemsets is enough.

11
Market Basket Analysis
Transaction No.  Item 1     Item 2     Item 3     Item 4
100              Beer       Diaper     Chocolate  Cheese
101              Milk       Chocolate  Shampoo
102              Beer       Wine       Vodka
103              Beer       Cheese     Diaper     Chocolate
104              Ice Cream  Diaper     Beer

Examples
Shoppers who buy Diaper are very likely to buy
Beer.
If buy {Diaper} Then buy {Beer}
Shoppers who buy Beer and Diaper are likely to
buy Cheese and Chocolate.
If buy {Beer, Diaper} Then buy {Cheese, Chocolate}
12
Association Rules
  • Rule format
  • If {set of items} → Then {set of items}
  • LHS implies RHS

If {Diaper, Baby Food} (LHS) Then {Beer, Wine} (RHS)
13
Evaluation of Association Rules
  • What rules should be considered valid?
  • An association rule is valid if it satisfies some
    evaluation measures

If {Diaper} (LHS) Then {Beer} (RHS)
14
Rule Evaluation
  • Milk and Wine co-occur
  • But only 2 out of 200K transactions contain these
    items

Transaction No.  Item 1     Item 2     Item 3
100              Beer       Diaper     Chocolate
101              Milk       Chocolate  Wine
102              Beer       Wine       Vodka
103              Beer       Cheese     Diaper
104              Ice Cream  Diaper     Beer
...
15
Rule Evaluation - Support
  • Support
  • The frequency with which the items in LHS and RHS
    co-occur.
  • E.g., the support of the {Diaper} → {Beer} rule
    is 3/5
  • 60% of the transactions contain both items.

Support = (No. of transactions containing items in LHS and RHS)
          / (Total no. of transactions in the dataset)

Transaction No.  Item 1     Item 2     Item 3
100              Beer       Diaper     Chocolate
101              Milk       Chocolate  Shampoo
102              Beer       Wine       Vodka
103              Beer       Cheese     Diaper
104              Ice Cream  Diaper     Beer
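
The support calculation above can be sketched in a few lines of Python. This is only an illustration: the `transactions` list re-types the five baskets from the table, and the helper name `support` is an assumption, not something from the slides.

```python
# The five baskets from the table, one set of items per transaction.
transactions = [
    {"Beer", "Diaper", "Chocolate"},
    {"Milk", "Chocolate", "Shampoo"},
    {"Beer", "Wine", "Vodka"},
    {"Beer", "Cheese", "Diaper"},
    {"Ice Cream", "Diaper", "Beer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


# Support of the rule {Diaper} -> {Beer} is the support of {Diaper, Beer}.
print(support({"Diaper", "Beer"}, transactions))  # 0.6
```

Support is symmetric: {Diaper} → {Beer} and {Beer} → {Diaper} share the same value because both count the baskets containing {Diaper, Beer}.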
16
Support evaluation is not enough?
  • My friend Bill, an 85-year-old man, told me a
    joke at a party last Friday:
  • An old man is celebrating his 103rd birthday.
  • "I will hold my 104th birthday party next year.
    You are all welcome to join me," he announces to
    his guests proudly.
  • "How do you know you will still be alive then?"
    one of his guests asks.
  • "Because very few people die between the ages of
    103 and 104," he replies.
  • Explain the logic of the old man and provide your
    comments.

17
Rule Evaluation - Confidence
  • Is Beer leading to Diaper purchase, or Diaper
    leading to Beer purchase?
  • Among the transactions with Diaper, 100% have
    Beer.
  • Among the transactions with Beer, 75% have
    Diaper.

Transaction No.  Item 1     Item 2     Item 3
100              Beer       Diaper     Chocolate
101              Milk       Chocolate  Shampoo
102              Beer       Wine       Vodka
103              Beer       Cheese     Diaper
104              Ice Cream  Diaper     Beer

Confidence = (No. of transactions containing both LHS and RHS)
             / (No. of transactions containing LHS)

  • Confidence for {Diaper} → {Beer} = 3/3
  • When Diaper is purchased, the likelihood of Beer
    purchase is 100%
  • Confidence for {Beer} → {Diaper} = 3/4
  • When Beer is purchased, the likelihood of Diaper
    purchase is 75%
  • So, {Diaper} → {Beer} is a more important rule
    according to confidence.
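
As a rough sketch of the confidence formula, the snippet below recomputes both directions of the Diaper/Beer rule. The function names `count` and `confidence` are illustrative choices, and the data is re-typed from the table on this slide.

```python
# The five baskets from the table on this slide.
transactions = [
    {"Beer", "Diaper", "Chocolate"},
    {"Milk", "Chocolate", "Shampoo"},
    {"Beer", "Wine", "Vodka"},
    {"Beer", "Cheese", "Diaper"},
    {"Ice Cream", "Diaper", "Beer"},
]


def count(itemset, transactions):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for basket in transactions if set(itemset) <= basket)


def confidence(lhs, rhs, transactions):
    """count(LHS and RHS together) / count(LHS); assumes LHS occurs at least once."""
    return count(set(lhs) | set(rhs), transactions) / count(lhs, transactions)


print(confidence({"Diaper"}, {"Beer"}, transactions))  # 1.0
print(confidence({"Beer"}, {"Diaper"}, transactions))  # 0.75
```

Unlike support, confidence depends on direction, because the denominator counts only the baskets that contain the LHS.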
18
Rule Evaluation - Lift
Transaction No.  Item 1  Item 2     Item 3     Item 4
100              Beer    Diaper     Chocolate
101              Milk    Chocolate  Shampoo
102              Beer    Milk       Vodka      Chocolate
103              Beer    Milk       Diaper     Chocolate
104              Milk    Diaper     Beer
What are the support and confidence for the rule
{Chocolate} → {Milk}?
Support = 3/5
Confidence = 3/4
Very high support and confidence. Does Chocolate
really lead to Milk purchase?
No! Milk occurs in 4 out of 5 transactions, so
Chocolate is even decreasing the chance of a Milk
purchase (3/4 < 4/5).
Lift = (3/4)/(4/5) = 0.9375 < 1
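
The lift figure can be checked with a short sketch. It uses `fractions.Fraction` so the result stays exact; the helper names `count` and `lift` are assumptions made for illustration, and the data is the table from this slide.

```python
from fractions import Fraction

# The five baskets from the lift table on this slide.
transactions = [
    {"Beer", "Diaper", "Chocolate"},
    {"Milk", "Chocolate", "Shampoo"},
    {"Beer", "Milk", "Vodka", "Chocolate"},
    {"Beer", "Milk", "Diaper", "Chocolate"},
    {"Milk", "Diaper", "Beer"},
]


def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for basket in transactions if set(itemset) <= basket)


def lift(lhs, rhs):
    """Confidence of LHS -> RHS divided by the overall frequency of RHS."""
    confidence = Fraction(count(set(lhs) | set(rhs)), count(lhs))
    rhs_frequency = Fraction(count(rhs), len(transactions))
    return confidence / rhs_frequency


value = lift({"Chocolate"}, {"Milk"})
print(value, float(value))  # 15/16 0.9375
```

A value below 1 means the RHS is less likely among baskets containing the LHS than among all baskets.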
19
Rule Evaluation - Lift (cont.)
  • Measures how much more likely the RHS is given
    the LHS than the RHS alone
  • Lift = confidence of the rule / frequency of the
    RHS
  • Example: {Diaper} → {Beer}
  • Total number of customers in database: 1000
  • No. of customers buying Diaper: 200
  • No. of customers buying Beer: 50
  • No. of customers buying Diaper and Beer: 20
  • Frequency of Beer = 50/1000 (5%)
  • Confidence = 20/200 (10%)
  • Lift = 10%/5% = 2
  • Lift higher than 1 implies people have a higher
    chance of buying Beer when they buy Diaper. Lift
    lower than 1 implies people have a lower chance of
    buying Milk when they buy Chocolate.

20
Rule Evaluation - Practical Impact
  • Most methods for extracting association rules
    find too many trivial rules. Most are either
    obvious or uninteresting.
  • Example: If Maternity Ward → then patient is a
    woman. Confidence 100%, support 100%
  • Need to screen for rules that are of particular
    interest and significance.
  • Actionable: keep only rules that can be acted
    upon.
  • Interestingness: various measures of how
    surprising or unexpected a rule is.
  • Example: a rule is interesting if it contradicts
    what is currently known (e.g., it contradicts a
    rule that was previously discovered).

21
Algorithm to Extract Association Rules (1)
  • Given a set of transactions T, the goal of
    association rule mining is to find all rules
    having
    • support ≥ minsup threshold
    • confidence ≥ minconf threshold
  • Brute-force approach
    • List all possible association rules
    • Compute the support and confidence for each rule
    • Prune rules that fail the minsup and minconf
      thresholds
  • ⇒ Computationally prohibitive!
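
To give a feel for why listing every rule is prohibitive, here is a small sketch that counts the possible rules over d items, first by enumeration and then with the closed form 3^d - 2^(d+1) + 1. The closed form is not on the slide; it follows from assigning each item to the LHS, the RHS, or neither, and discarding assignments with an empty side. The function names are illustrative.

```python
from itertools import combinations


def count_rules_by_enumeration(items):
    """Count rules X -> Y with X and Y non-empty and disjoint, by listing them."""
    items = list(items)
    total = 0
    for k in range(2, len(items) + 1):
        for itemset in combinations(items, k):
            # Each non-empty proper subset of the itemset can be the LHS;
            # the remaining items form the RHS.
            for lhs_size in range(1, k):
                total += sum(1 for _ in combinations(itemset, lhs_size))
    return total


def count_rules_closed_form(d):
    """Same count without enumeration: 3^d - 2^(d+1) + 1."""
    return 3**d - 2 ** (d + 1) + 1


small = ["Beer", "Diaper", "Milk", "Chocolate"]
print(count_rules_by_enumeration(small), count_rules_closed_form(4))  # 50 50
print(count_rules_closed_form(9))  # 18660
```

Even a store with only 9 distinct items already has 18,660 possible rules, and the count roughly triples with each added item.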

22
Frequent Itemset Generation
  • Brute-force approach
    • Each itemset in the lattice is a candidate
      frequent itemset
    • Count the support of each candidate by scanning
      the database
    • Match each transaction against every candidate
    • Complexity ~ O(NMw) ⇒ expensive, since M = 2^d
      (N transactions, M candidate itemsets, maximum
      transaction width w, d distinct items)
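
A minimal sketch of the brute-force approach, assuming the five-transaction table used elsewhere in this deck and a 40% minimum support (both are choices made here for illustration). It builds every non-empty itemset and matches each one against every transaction.

```python
from itertools import combinations

# Assumed data: the five-basket table used in the support and confidence slides.
transactions = [
    {"Beer", "Diaper", "Chocolate"},
    {"Milk", "Chocolate", "Shampoo"},
    {"Beer", "Wine", "Vodka"},
    {"Beer", "Cheese", "Diaper"},
    {"Ice Cream", "Diaper", "Beer"},
]
min_support = 0.4  # assumed threshold

items = sorted(set().union(*transactions))  # the d distinct items
candidates = [
    frozenset(combo)
    for k in range(1, len(items) + 1)
    for combo in combinations(items, k)
]  # all 2^d - 1 non-empty itemsets

frequent = {}
for candidate in candidates:  # M candidates ...
    hits = sum(1 for basket in transactions if candidate <= basket)  # ... x N scans
    if hits / len(transactions) >= min_support:
        frequent[candidate] = hits

print(len(items), len(candidates))  # 9 511
for itemset, hits in frequent.items():
    print(sorted(itemset), hits)
```

Only 4 of the 511 candidates turn out to be frequent here, which is the waste the brute-force approach cannot avoid.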

23
Mining Association Rules
Example of rules:
{Milk, Diaper} → {Beer} (s=0.4, c=0.67)
{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)
  • Observations
  • All the above rules are binary partitions of the
    same itemset {Milk, Diaper, Beer}
  • Rules originating from the same itemset have
    identical support but can have different
    confidence
  • Thus, we may decouple the support and confidence
    requirements

24
Mining Association Rules
  • Two-step approach
  • 1) Frequent itemset generation
    • Generate all itemsets whose support ≥ minsup
  • 2) Rule generation
    • Generate high-confidence rules from each frequent
      itemset, where each rule is a binary partitioning
      of a frequent itemset
  • Frequent itemset generation is still
    computationally expensive

25
Algorithm to Extract Association Rules (2)
  • The standard algorithm: Apriori
  • Rakesh Agrawal, Ramakrishnan Srikant: Fast
    Algorithms for Mining Association Rules in Large
    Databases. VLDB 1994: 487-499
  • The association rules problem was defined as:
    • Generate all association rules that have
    • support greater than the user-specified minimum
      support
    • and confidence greater than the user-specified
      minimum confidence
  • The base algorithm uses support and confidence,
    but we can also use lift to rank the rules
    discovered by Apriori.
  • The algorithm performs an efficient search over
    the data to find all such rules.

26
Finding Association Rules from Data
  • The association rule discovery problem is
    decomposed into two sub-problems:
  • 1) Find all sets of items (itemsets) whose support
    is above minimum support, called frequent
    itemsets or large itemsets
  • 2) From each frequent itemset, generate rules whose
    confidence is above minimum confidence.
    • Given a large itemset Y, and X is a subset of Y
    • Calculate the confidence of the rule X → (Y - X)
    • If its confidence is above the minimum
      confidence, then X → (Y - X) is an association
      rule we are looking for.

27
Example
Transaction No.  Item 1     Item 2     Item 3
100              Beer       Diaper     Chocolate
101              Milk       Chocolate  Shampoo
102              Beer       Wine       Vodka
103              Beer       Cheese     Diaper
104              Ice Cream  Diaper     Beer
  • A data set with 5 transactions
  • Minimum support 40%, minimum confidence 80%
  • Phase 1: Find all frequent itemsets
    • {Beer} (support = 80%)
    • {Diaper} (60%)
    • {Chocolate} (40%)
    • {Beer, Diaper} (60%)

Phase 2
{Beer} → {Diaper} (conf. = 60% / 80% = 75%)
{Diaper} → {Beer} (conf. = 60% / 60% = 100%)
Only {Diaper} → {Beer} meets the 80% minimum confidence.
28
Phase 1: Finding all frequent itemsets
How to perform an efficient search of all frequent
itemsets?
  • Note: frequent itemsets of size n contain
    itemsets of size n-1 that also must be frequent
  • Example: if {diaper, beer} is frequent then
    {diaper} and {beer} are each frequent as well
  • This means that
  • If an itemset is not frequent (e.g., {wine}) then
    no itemset that includes wine can be frequent
    either, such as {wine, beer}.
  • We therefore first find all itemsets of size 1
    that are frequent.
  • Then try to expand these by counting the
    frequency of all itemsets of size 2 that include
    frequent itemsets of size 1.
  • Example
  • If {wine} is not frequent we need not try to
    find out whether {wine, beer} is frequent. But if
    both {wine} and {beer} were frequent then it is
    possible (though not guaranteed) that {wine,
    beer} is also frequent.
  • Then take only itemsets of size 2 that are
    frequent, and try to expand those, etc.

29
Phase 2: Generating Association Rules
  • Assume {Milk, Bread, Butter} is a frequent
    itemset.
  • Using items contained in the itemset, list all
    possible rules
    • {Milk} → {Bread, Butter}
    • {Bread} → {Milk, Butter}
    • {Butter} → {Milk, Bread}
    • {Milk, Bread} → {Butter}
    • {Milk, Butter} → {Bread}
    • {Bread, Butter} → {Milk}
  • Calculate the confidence of each rule
  • Pick the rules with confidence above the minimum
    confidence

Confidence of {Milk} → {Bread, Butter}
  = Support{Milk, Bread, Butter} / Support{Milk}
  = (No. of transactions that support {Milk, Bread, Butter})
    / (No. of transactions that support {Milk})
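
The rule-generation step might be sketched as follows. The deck gives no transactions for {Milk, Bread, Butter}, so the sketch only enumerates that itemset's six candidate rules, then filters by confidence on the five-transaction Beer/Diaper table from the earlier example. The function names and the 0.8 threshold are illustrative assumptions.

```python
from itertools import combinations


def binary_partitions(itemset):
    """Yield every rule (lhs, rhs) that splits `itemset` into two non-empty parts."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for lhs in combinations(items, size):
            rhs = tuple(item for item in items if item not in lhs)
            yield lhs, rhs


def confident_rules(itemset, transactions, min_conf):
    """Keep partitions with confidence >= min_conf (assumes the itemset occurs)."""

    def count(items):
        return sum(1 for basket in transactions if set(items) <= basket)

    kept = []
    for lhs, rhs in binary_partitions(itemset):
        conf = count(lhs + rhs) / count(lhs)
        if conf >= min_conf:
            kept.append((lhs, rhs, conf))
    return kept


# The six candidate rules for the itemset on this slide.
for lhs, rhs in binary_partitions({"Milk", "Bread", "Butter"}):
    print(lhs, "->", rhs)

# Confidence needs data, so reuse the Beer/Diaper table from the earlier example.
transactions = [
    {"Beer", "Diaper", "Chocolate"},
    {"Milk", "Chocolate", "Shampoo"},
    {"Beer", "Wine", "Vodka"},
    {"Beer", "Cheese", "Diaper"},
    {"Ice Cream", "Diaper", "Beer"},
]
print(confident_rules({"Beer", "Diaper"}, transactions, 0.8))
# [(('Diaper',), ('Beer',), 1.0)]
```

An itemset with k items yields 2^k - 2 candidate rules, so this step stays cheap as long as the frequent itemsets are small.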

30
Association
  • If the rule {Bread, Butter} → {Yogurt} is found
    to have minimum confidence,
  • does it mean the rule
  • {Yogurt} → {Bread, Butter} also has minimum
    confidence?
  • No.
  • Example
    • Support of {Yogurt} is 20%,
    • {Bread, Butter} is 50%,
    • {Yogurt, Bread, Butter} is 10%
  • Confidence of {Bread, Butter} → {Yogurt} is
    10% / 50% = 20%
  • Confidence of {Yogurt} → {Bread, Butter} is
    10% / 20% = 50%
31
Agrawal (94)'s Apriori Algorithm: An Example

Transactions
T-ID  Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

C1 (after 1st scan)
Itemset  sup
{A}      2
{B}      3
{C}      3
{D}      1
{E}      3

L1
Itemset  sup
{A}      2
{B}      3
{C}      3
{E}      3

C2 (candidates)
Itemset
{A, B}
{A, C}
{A, E}
{B, C}
{B, E}
{C, E}

C2 (after 2nd scan)
Itemset  sup
{A, B}   1
{A, C}   2
{A, E}   1
{B, C}   2
{B, E}   3
{C, E}   2

L2
Itemset  sup
{A, C}   2
{B, C}   2
{B, E}   3
{C, E}   2

C3 (candidates)
Itemset
{B, C, E}

L3 (after 3rd scan)
Itemset    sup
{B, C, E}  2

{A, B, C}?
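
The level-wise search in this example can be reproduced with a compact sketch. Two assumptions are made here: the minimum support count of 2 is inferred from the tables (D, with count 1, is dropped), and candidates are generated with a simple union-based join rather than the prefix-based join of the original Agrawal and Srikant paper. The function name `apriori` is illustrative.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every itemset with count >= min_count."""

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    singles = {frozenset([item]) for item in set().union(*transactions)}
    level = {c: n for c, n in count(singles).items() if n >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: a candidate survives only if all its (k-1)-subsets are frequent.
        candidates = {
            c
            for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        # One database scan per level to count the surviving candidates.
        level = {c: n for c, n in count(candidates).items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent


transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]
result = apriori(transactions, min_count=2)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

In this run {A, B, C} is pruned before the third scan because its subset {A, B} is not in L2, so only {B, C, E} is counted at size 3.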
32
Sequential Patterns
  • Instead of finding associations between items in a
    single transaction, find associations between
    items across related transactions over time.

Customer ID  Transaction Date  Item 1                 Item 2
AA           2/2/2001          Laptop                 Case
AA           1/13/2002         Wireless network card  Router
BB           4/5/2002          Laptop                 iPaq
BB           8/10/2002         Wireless network card  Router
  • Sequence: {Laptop}, {Wireless Card, Router}
  • A sequence has to satisfy some predetermined
    minimum support
33
Examples of Sequence Data
Sequence Database | Sequence | Element (Transaction) | Event (Item)
Customer | Purchase history of a given customer | A set of items bought by a customer at time t | Books, dairy products, CDs, etc.
Web data | Browsing activity of a particular Web visitor | A collection of files viewed by a Web visitor after a single mouse click | Home page, index page, contact info, etc.
Event data | History of events generated by a given sensor | Events triggered by a sensor at time t | Types of alarms generated by sensors
Genome sequences | DNA sequence of a particular species | An element of the DNA sequence | Bases A, T, G, C

[Figure: a sequence drawn as an ordered series of elements (transactions), {E1, E2}, {E1, E3}, {E2}, {E3, E4}, {E2}, each made up of events (items)]
34
Examples of Sequence
  • Web sequence: < {Homepage} {Electronics}
    {Digital Cameras} {Canon Digital Camera}
    {Shopping Cart} {Order Confirmation} {Return to
    Shopping} >
  • Sequence of books checked out at a library:
  • < {Fellowship of the Ring} {The Two Towers}
    {Return of the King} >

35
Applications of Association Rules
  • Market-basket analysis
  • e.g., product assortment optimization (see next
    slide)
  • Recommendations: determine which books are
    frequently purchased together and recommend
    associated books or products to people who
    express interest in an item.
  • Healthcare: by studying the side effects in patients
    with multiple prescriptions, we can discover
    previously unknown interactions and warn patients
    about them.
  • Fraud detection: finding in insurance data that a
    certain doctor often works with a certain lawyer
    may indicate potential fraudulent activity.
    (virtual items)
  • Sequence discovery: looks for associations
    between items bought over time. E.g., we may
    notice that people who buy chili tend to buy
    antacid within a month. Knowledge like this can
    be used to plan inventory levels.

36
Product Assortment Optimization
Graphs of expected sales (e.g., derived from
association rules) and costs (e.g., of purchasing
and holding inventory) can allow us to optimize
the number and selection (choice) of items in a
product category.
[Figure: two plots of Dollars against Products in Category. The first shows the Revenues, Costs, and Margin curves; the second shows Margin = Revenues - Costs, with the Max Profit point marked.]
37
Agenda
  • Market basket analysis and association rules
  • Software demo
  • Exercise

39
Exercise
Transaction No.  Item 1  Item 2     Item 3   Item 4
100              Beer    Diaper     Chocolate
101              Milk    Chocolate  Shampoo
102              Beer    Soap       Vodka
103              Beer    Cheese     Wine
104              Milk    Diaper     Beer     Chocolate
  • Given the above list of transactions, do the
    following:
  • 1) Find all the frequent itemsets (minimum
    support 40%)
  • 2) Find all the association rules (minimum
    confidence 70%)
  • 3) For the discovered association rules,
    calculate the lift
