Mining Association Rules with Constraints - PowerPoint PPT Presentation

1
Mining Association Rules with Constraints
  • Wei Ning
  • Joon Wong
  • COSC 6412 Presentation

2
Outline
  • Introduction
  • Summary of Approach
  • Algorithm CAP
  • Performance Analysis
  • Conclusion
  • References

4
Introduction
  • Recall mining association rules
  • Association rules mining finds interesting
    association or correlation relationships among a
    large set of data items.

5
Some problems we meet when mining association
rules
  • Overwhelming number of rules?
  • Not the rules you want?
  • Waiting too long?
  • Lack of focus

6
Introduction (cont.)
  • Example at Walmart
  • Suppose a manager wants to find which shoes are
    the most popular in winter.

8
Mining frequent itemsets vs. mining association
rules
  • Mining frequent itemsets is almost the same as
    mining association rules: once the frequent
    itemsets and their supports are known, the rules
    follow from them directly.
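To illustrate why the two tasks are nearly the same, here is a minimal Python sketch that derives rules X ⇒ Y from frequent itemsets alone, using conf(X ⇒ Y) = sup(X ∪ Y) / sup(X). It assumes the support counts of the frequent itemsets from the Apriori example later in these slides (database with Tids 10 to 40, min support count 2). The names `FREQ`, `MIN_CONF`, and `rules`, and the confidence threshold of 1.0, are hypothetical choices for illustration.

```python
from itertools import combinations

# Support counts of the frequent itemsets in the Apriori example's TDB.
FREQ = {
    frozenset("A"): 2, frozenset("B"): 3, frozenset("C"): 3, frozenset("E"): 3,
    frozenset("AC"): 2, frozenset("BC"): 2, frozenset("BE"): 3, frozenset("CE"): 2,
    frozenset("BCE"): 2,
}
MIN_CONF = 1.0  # hypothetical confidence threshold


def rules(freq, min_conf):
    """Derive rules X => Y from frequent itemsets; supports are looked up, not recounted."""
    out = []
    for itemset, sup in freq.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                conf = sup / freq[lhs]  # every subset of a frequent set is frequent
                if conf >= min_conf:
                    out.append((lhs, itemset - lhs, conf))
    return out


for lhs, rhs, conf in rules(FREQ, MIN_CONF):
    print(sorted(lhs), "=>", sorted(rhs), f"conf={conf:.2f}")
```

No database scan is needed here: every left-hand side is a subset of a frequent itemset, so its support is already known.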

9
Constrained Mining
  • A naive solution
  • First find all frequent sets, and then test them
    for constraint satisfaction
  • Our approach
  • Analyze the properties of constraints
    comprehensively
  • Push them as deeply as possible inside the
    frequent pattern computation.

10
Frequent Itemsets and Constraints
  • Given a transaction database
  • Frequent itemset: a set of items that frequently
    appears in transactions, e.g., {a, c}
  • Constraint: a predicate over itemsets
  • $C(I) \equiv sum(I) > 50$
  • $C(\{a, b, d\}) = \text{true}$, since
    $40 + 10 + 10 = 60 > 50$

TDB (min_sup = 2)
TID  Transaction
10   a, b, c
20   b, c, d, f
30   a, c

Item  Value
a     40
b     10
c     -20
d     10
e     -30
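A brute-force Python sketch of this slide's example, for illustration only (it is not one of the algorithms discussed later): it enumerates every itemset over the TDB, keeps those with support at least min_sup, and then tests C. The names `TDB`, `VALUE`, `support`, and `C` are hypothetical. Item f has no value in the table, but it never appears in a frequent itemset here, so the sketch never needs one.

```python
from itertools import combinations

# Transaction database and item values from the slide (min_sup = 2).
TDB = {10: {"a", "b", "c"}, 20: {"b", "c", "d", "f"}, 30: {"a", "c"}}
VALUE = {"a": 40, "b": 10, "c": -20, "d": 10, "e": -30}
MIN_SUP = 2


def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for items in TDB.values() if set(itemset) <= items)


def C(itemset):
    """Example constraint C(I): sum(I) > 50."""
    return sum(VALUE[i] for i in itemset) > 50


# Enumerate every non-empty itemset over the items that occur in the TDB.
all_items = sorted(set().union(*TDB.values()))
frequent = [
    frozenset(combo)
    for k in range(1, len(all_items) + 1)
    for combo in combinations(all_items, k)
    if support(combo) >= MIN_SUP
]
# Only frequent itemsets are tested, so item f (no value) is never looked up.
answer = [s for s in frequent if C(s)]

print("frequent:", [sorted(s) for s in frequent])
print("C({a, b, d}) =", C({"a", "b", "d"}))
print("frequent and satisfying C:", answer)
```

On this tiny database no frequent itemset satisfies C, even though the infrequent set {a, b, d} does: both conditions must hold for an itemset to be reported.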
11
Mining Frequent Itemsets With Constraints
  • Given
  • A transaction database TDB
  • A support threshold min_sup
  • A constraint C
  • Find the complete set of frequent itemsets
    satisfying the constraint
  • Use constraints to
  • Express the user's focus
  • Improve both effectiveness and efficiency

12
Classification of Constraints
  • We have the following classification of
    constraints
  • Anti-monotone
  • Monotone
  • Succinct
  • Convertible
    • Convertible anti-monotone
    • Convertible monotone
    • Strongly convertible
  • Inconvertible

13
Anti-Monotone
  • Definition 1 (Anti-Monotone): A 1-var constraint
    C is anti-monotone if for all sets S, S':
  • $S \supseteq S' \wedge S \text{ satisfies } C \;\Rightarrow\; S' \text{ satisfies } C$
  • Simply put, when an itemset S violates the
    constraint, so does every superset of S.

14
Is $\min(S) \ge v$ anti-monotone?
  • Item values {5, 10, 14}, v = 7
  • C: $\min(S) \ge 7$
  • {5} violates it.
  • Supersets of {5}: {5, 10}, {5, 14}, {5, 10, 14}
  • They all violate it too.
  • $\min(S) \ge v$ is anti-monotone
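A brute-force Python check of this example, assuming the three item values on the slide and v = 7. It tests the contrapositive form of Definition 1 (every superset of a violating set also violates) over all non-empty subsets, and contrasts $\min(S) \ge v$ with $\min(S) \le v$. The helper names are hypothetical, and checking one small example illustrates the property rather than proving it.

```python
from itertools import combinations

ITEMS = [5, 10, 14]
V = 7


def nonempty_subsets(items):
    return [frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k)]


def is_anti_monotone(constraint, items):
    """Brute force: every superset of a violating set must also violate."""
    sets = nonempty_subsets(items)
    return all(not constraint(big)
               for small in sets if not constraint(small)
               for big in sets if small <= big)


def is_monotone(constraint, items):
    """Brute force: every superset of a satisfying set must also satisfy."""
    sets = nonempty_subsets(items)
    return all(constraint(big)
               for small in sets if constraint(small)
               for big in sets if small <= big)


def min_ge(s):
    return min(s) >= V  # min(S) >= v


def min_le(s):
    return min(s) <= V  # min(S) <= v


violators = [sorted(s) for s in nonempty_subsets(ITEMS) if not min_ge(s)]
print(violators)                        # every set containing 5
print(is_anti_monotone(min_ge, ITEMS))  # True
print(is_anti_monotone(min_le, ITEMS))  # False: {10} fails, {5, 10} satisfies
print(is_monotone(min_le, ITEMS))       # True
```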

15
Succinct
  • Definition 2 (Succinct)
  • $I \subseteq Item$ is a succinct set if it can be
    expressed as $\sigma_p(Item)$ for some selection
    predicate p.
  • $SP \subseteq 2^{Item}$ is a succinct powerset if
    there is a fixed number of succinct sets
    $Item_1, \ldots, Item_k \subseteq Item$ such that SP
    can be expressed in terms of the strict powersets
    of $Item_1, \ldots, Item_k$, using union and minus.
  • Finally, a 1-var constraint C is succinct
    provided $SAT_C(Item)$, the set of all itemsets
    satisfying C, is a succinct powerset.

16
Succinct
  • General idea: we can enumerate all and only
    those sets that are guaranteed to satisfy the
    constraint.
  • If a constraint is succinct, we can directly
    generate precisely the sets that satisfy it.

17
Succinct example
  • Itemsets containing a or b
  • Itemsets containing some item with value greater
    than 30

18
Succinct example
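  • $C_1 \equiv Item.Price \ge 100$ (every item in the
    set has price at least 100)
  • $Item_1 = \sigma_{Item.Price \ge 100}(Item) = \{a, b\}$
  • $2^{Item_1} = \{\{a\}, \{b\}, \{a, b\}\}$ (strict
    powerset, without the empty set)
  • $SAT_{C_1} = \{\{a\}, \{b\}, \{a, b\}\}$
  • $SAT_{C_1} = 2^{Item_1}$
  • $C_1$ is succinct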
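A minimal Python sketch of why succinctness helps, under an assumed price table: the slide does not give prices, so `PRICE` below is hypothetical, chosen so that only a and b cost at least 100. The satisfying sets are produced directly as the non-empty subsets of $Item_1$, and the result matches a generate-and-test pass over all itemsets.

```python
from itertools import combinations

# Hypothetical price table: only a and b cost at least 100.
PRICE = {"a": 150, "b": 120, "c": 40, "d": 80}


def nonempty_powerset(items):
    """All non-empty subsets of `items` (the strict powerset on the slide)."""
    items = sorted(items)
    return {frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k)}


def c1(itemset):
    """C1: every item in the set has price >= 100."""
    return all(PRICE[i] >= 100 for i in itemset)


# Succinct: select Item1 once, then enumerate exactly the satisfying sets.
item1 = {i for i, p in PRICE.items() if p >= 100}  # sigma_{Price >= 100}(Item)
sat_direct = nonempty_powerset(item1)

# Generate-and-test over every itemset, for comparison.
sat_tested = {s for s in nonempty_powerset(PRICE) if c1(s)}

print(sorted(item1), sat_direct == sat_tested)
```

The direct enumeration never builds or tests an itemset containing c or d, which is the saving a succinct constraint makes possible.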

19
Convertible
  • Convert tough constraints into anti-monotone or
    monotone ones by ordering the items properly

20
Convertible
  • Definition
  • R is an order of items
  • Convertible anti-monotone
  • Itemset X satisfies the constraint $\Rightarrow$ so
    does every prefix of X w.r.t. R

21
Convertible example
  • Constraint C: $avg(X) \ge 25$
  • Order items in value-descending order
  • $\langle a, f, g, d, b, h, c, e \rangle$
  • Itemset afd satisfies C
    ($avg = (40 + 30 + 10)/3 \approx 26.7$)
  • So do its prefixes a (40) and af (35)
  • Thus, it becomes anti-monotone!

Item  Value
a     40
b     0
c     -20
d     10
e     -30
f     30
g     20
h     -10

Items in value-descending order
Item  Value
a     40
f     30
g     20
d     10
b     0
h     -10
c     -20
e     -30
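A Python sketch of the conversion, using the item values from the tables above. It checks by brute force that, with items in value-descending order R, every itemset satisfying $avg(X) \ge 25$ has only satisfying prefixes, and then grows itemsets depth-first in R order, never extending a prefix that fails C. The function names are hypothetical; averages are compared as sum(X) ≥ 25·|X| to avoid floating-point division.

```python
from itertools import combinations

VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}
R = sorted(VALUE, key=VALUE.get, reverse=True)  # value-descending order R
V = 25


def satisfies(items):
    """C: avg(X) >= 25, compared exactly as sum(X) >= 25 * |X|."""
    return sum(VALUE[i] for i in items) >= V * len(items)


def prefixes_ok():
    """Check: if an R-ordered itemset satisfies C, so does each of its prefixes."""
    for k in range(1, len(R) + 1):
        for combo in combinations(R, k):  # tuples keep R's order
            if satisfies(combo) and not all(satisfies(combo[:j]) for j in range(1, k)):
                return False
    return True


def grow(prefix=(), start=0, out=None):
    """Depth-first growth in R order; a failing prefix is never extended."""
    out = [] if out is None else out
    for idx in range(start, len(R)):
        cand = prefix + (R[idx],)
        if satisfies(cand):
            out.append(cand)
            grow(cand, idx + 1, out)
    return out


brute = {combo for k in range(1, len(R) + 1)
         for combo in combinations(R, k) if satisfies(combo)}
print(R)
print(prefixes_ok())
print(set(grow()) == brute)
```

The pruned search returns exactly the same sets as brute force, which is what treating $avg(X) \ge 25$ as anti-monotone w.r.t. R permits.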
22
Commonly Used Constraints: A General Picture
Constraint                                        Anti-monotone  Monotone     Succinct
$v \in S$                                         no             yes          yes
$S \supseteq V$                                   no             yes          yes
$S \subseteq V$                                   yes            no           yes
$\min(S) \le v$                                   no             yes          yes
$\min(S) \ge v$                                   yes            no           yes
$\max(S) \le v$                                   yes            no           yes
$\max(S) \ge v$                                   no             yes          yes
$count(S) \le v$                                  yes            no           weakly
$count(S) \ge v$                                  no             yes          weakly
$sum(S) \le v\ (\forall a \in S,\ a \ge 0)$       yes            no           no
$sum(S) \ge v\ (\forall a \in S,\ a \ge 0)$       no             yes          no
$range(S) \le v$                                  yes            no           no
$range(S) \ge v$                                  no             yes          no
$avg(S)\ \theta\ v,\ \theta \in \{=, \le, \ge\}$  convertible    convertible  no
$support(S) \ge \xi$                              yes            no           no
$support(S) \le \xi$                              no             yes          no
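The condition $\forall a \in S,\ a \ge 0$ on the sum rows is essential. This brute-force Python check (helper names and value sets are hypothetical) shows $sum(S) \le 25$ behaving anti-monotonically on non-negative values, and failing to once a negative value is present.

```python
from itertools import combinations


def nonempty_subsets(values):
    return [frozenset(c) for k in range(1, len(values) + 1)
            for c in combinations(values, k)]


def is_anti_monotone(constraint, values):
    """Brute force: every superset of a violating set must also violate."""
    sets = nonempty_subsets(values)
    return all(not constraint(big)
               for small in sets if not constraint(small)
               for big in sets if small <= big)


def sum_le_25(s):
    return sum(s) <= 25  # sum(S) <= v with v = 25


print(is_anti_monotone(sum_le_25, [10, 20, 30]))  # True: all values >= 0
print(is_anti_monotone(sum_le_25, [40, -30]))     # False: {40} fails, {40, -30} sums to 10
```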
23
Optional Proof that $\min(S) \ge v$ Is Anti-monotone
  • According to the table, $\min(S) \ge v$ is both
    anti-monotone and succinct.
  • I only prove anti-monotonicity here due to time
    limitations.
  • Something special
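A short derivation of the anti-monotonicity claim, written in LaTeX as a reconstruction of the standard argument for a non-empty subset $S'$ of $S$:

```latex
\begin{aligned}
&S' \subseteq S \;\Rightarrow\; \min(S') \ge \min(S)
  && \text{(every element of } S' \text{ is also in } S\text{)}\\
&\min(S) \ge v \;\Rightarrow\; \min(S') \ge \min(S) \ge v
  && \text{(so } S' \text{ satisfies } C\text{)}
\end{aligned}
```

This is exactly Definition 1 with $C \equiv \min(S) \ge v$.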

24
Constraint Classification
[Figure: diagram relating the constraint classes Monotone, Anti-monotone, Strongly convertible, Succinct, Convertible anti-monotone, Convertible monotone, and Inconvertible]
25
Summary of Approach: Recapitulation
  • Basic idea of mining frequent itemsets with
    constraints
  • Introduced several important classes of
    constraints

27
Algorithms
  • There are many algorithms for constraint-based
    association rule mining:
  • Algorithm Direct
  • Algorithm MultiJoins
  • Algorithm Reorder
  • Algorithm Apriori
  • Algorithm Hybrid(m)
  • Algorithm CAP (main focus)

28
Design of Algorithms
  • Sound
  • An algorithm is sound provided it only finds
    frequent sets that satisfy the given constraints.
  • Complete
  • An algorithm is complete provided all frequent
    sets satisfying the given constraints are found.

29
Algorithm Apriori
  • Main idea: use the Apriori algorithm to find the
    frequent itemsets, then apply the constraints to
    the itemsets found.
  • Step 1) Run Apriori with $C_{freq}$ (the support
    constraint)
  • Step 2) Apply $C - C_{freq}$ to get the final
    answer Ans

30
Algorithm Apriori (Pseudocode)
  • 1. $C_1$ consists of the sets of size 1; k = 1;
    Ans = $\emptyset$
  • 2. While ($C_k$ not empty):
  • 2.1 conduct a db scan to form $L_k$ from $C_k$
  • 2.2 form $C_{k+1}$ from $L_k$ based on
    $C_{freq}$; k = k + 1
  • 3. For each set S in some $L_k$:
  • add S to Ans if S satisfies ($C - C_{freq}$).
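A runnable Python sketch of Algorithm Apriori as given in the pseudocode, assuming the database TDB and minimum support count 2 from the example that follows, and the constraint used there (every item of the itemset is among A, C, E). Candidate generation uses the usual Apriori join-and-prune step; the function names are hypothetical, and the database is held in memory, so each "db scan" is a pass over a Python dict.

```python
from itertools import combinations

# Database TDB from the example slides; minimum support count 2.
TDB = {10: {"A", "C", "D"}, 20: {"B", "C", "E"}, 30: {"A", "B", "C", "E"}, 40: {"B", "E"}}
MIN_SUP = 2


def count(candidates):
    """One database scan: the support count of each candidate itemset."""
    return {c: sum(1 for t in TDB.values() if c <= t) for c in candidates}


def apriori_gen(level, k):
    """Join frequent k-itemsets into (k+1)-candidates; drop any with an infrequent k-subset."""
    joined = {a | b for a in level for b in level if len(a | b) == k + 1}
    return {c for c in joined if all(frozenset(s) in level for s in combinations(c, k))}


def apriori():
    """Steps 1-2: frequent itemsets under C_freq only, as {itemset: support}."""
    candidates = {frozenset([i]) for t in TDB.values() for i in t}  # C_1
    k, frequent = 1, {}
    while candidates:
        counts = count(candidates)                                   # db scan
        level = {c: n for c, n in counts.items() if n >= MIN_SUP}    # L_k
        frequent.update(level)
        candidates = apriori_gen(set(level), k)                      # C_{k+1}
        k += 1
    return frequent


def algorithm_apriori(constraint):
    """Step 3: keep the frequent sets that satisfy C - C_freq."""
    return {s: n for s, n in apriori().items() if constraint(s)}


def in_ace(s):
    """Example constraint from the slides: every item of S is among A, C, E."""
    return s <= {"A", "C", "E"}


ans = algorithm_apriori(in_ace)
print(sorted("".join(sorted(s)) for s in ans))
```

All nine frequent itemsets are counted before the constraint is ever consulted, which is the inefficiency CAP addresses.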

31
The Apriori Algorithm: An Example

Database TDB
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

1st scan: C1
Itemset  sup
{A}      2
{B}      3
{C}      3
{D}      1
{E}      3

L1
Itemset  sup
{A}      2
{B}      3
{C}      3
{E}      3

C2 (candidates generated from L1)
{A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}

2nd scan: C2
Itemset  sup
{A, B}   1
{A, C}   2
{A, E}   1
{B, C}   2
{B, E}   3
{C, E}   2

L2
Itemset  sup
{A, C}   2
{B, C}   2
{B, E}   3
{C, E}   2

C3
{B, C, E}

3rd scan: L3
Itemset    sup
{B, C, E}  2
32
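The Apriori Algorithm: An Example (cont.)
Constraint: $\{A, C, E\} \supseteq T.Item$ (every item
of the itemset is among A, C, E)

Database TDB
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

Frequent itemsets found by Apriori
L1: {A} 2, {B} 3, {C} 3, {E} 3
L2: {A, C} 2, {B, C} 2, {B, E} 3, {C, E} 2
L3: {B, C, E} 2

Ans (frequent itemsets satisfying the constraint)
{A}, {C}, {E}, {A, C}, {C, E}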
33
Algorithm CAP
  • Succinct and anti-monotone
  • Strategy I: Replace $C_1$ in the Apriori algorithm
    by $C_1^C$, the size-1 candidates that satisfy C.
  • Anti-monotone but non-succinct
  • Strategy II: Define $C_k$ as in the Apriori
    algorithm. Drop a set $S \in C_k$ from counting if
    S fails C, i.e., constraint satisfaction is tested
    before counting is done.
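A Python sketch of Strategies I and II combined for an anti-monotone constraint, on the example database and constraint from the Apriori slides. The function name `cap_anti_monotone` is hypothetical, and the sketch covers only the anti-monotone cases, not Strategies III or IV. The constraint is applied to size-1 candidates before the first scan and to every later candidate before it is counted.

```python
from itertools import combinations

TDB = {10: {"A", "C", "D"}, 20: {"B", "C", "E"}, 30: {"A", "B", "C", "E"}, 40: {"B", "E"}}
MIN_SUP = 2


def cap_anti_monotone(constraint):
    """CAP Strategies I and II for an anti-monotone constraint (sketch).

    Strategy I: only size-1 candidates that satisfy the constraint form C_1.
    Strategy II: each later candidate is tested against the constraint before
    it is counted. Returns ({itemset: support}, number of candidates counted).
    """
    items = {i for t in TDB.values() for i in t}
    candidates = {frozenset([i]) for i in items if constraint(frozenset([i]))}
    k, answer, counted = 1, {}, 0
    while candidates:
        counted += len(candidates)
        counts = {c: sum(1 for t in TDB.values() if c <= t) for c in candidates}
        level = {c for c, n in counts.items() if n >= MIN_SUP}  # L_k
        answer.update({c: counts[c] for c in level})
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        candidates = {c for c in joined
                      if constraint(c)  # Strategy II: test before counting
                      and all(frozenset(s) in level for s in combinations(c, k))}
        k += 1
    return answer, counted


def in_ace(s):
    """Every item of S is among A, C, E (succinct and anti-monotone)."""
    return s <= {"A", "C", "E"}


ans, counted = cap_anti_monotone(in_ace)
print(sorted("".join(sorted(s)) for s in ans), counted)
```

On this data the sketch returns the same Ans as Algorithm Apriori while counting 6 candidates, against 12 (5 + 6 + 1) in the plain Apriori run on the example slides. For a succinct anti-monotone constraint such as this one, the later test is redundant but harmless.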

34
Algorithm CAP (cont.)
  • Succinct but non-anti-monotone
  • Strategy III: too complicated; discussed later
  • Non-succinct and non-anti-monotone
  • Strategy IV: Induce any weaker constraint C1 from
    C. Depending on whether C1 is anti-monotone
    and/or succinct, use one of Strategies I-III
    above to generate the frequent sets.
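A Python sketch of Strategy IV's key requirement, that the induced constraint be weaker than C. The pairing here is an illustrative assumption, not taken from the slides: C is avg(S) ≤ 0 and the induced C1 is min(S) ≤ 0, which is implied because min(S) ≤ avg(S). C1 is succinct, so Strategy III would apply to it. The item values are reused from the convertible example; the threshold 0 and the function names are hypothetical.

```python
from itertools import combinations

# Item values reused from the convertible example; v = 0 is a hypothetical threshold.
VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}
V = 0


def c_avg(s):
    """Original constraint C: avg(S) <= v (not anti-monotone, not succinct)."""
    return sum(VALUE[i] for i in s) <= V * len(s)


def c_min(s):
    """Induced weaker constraint C1: min(S) <= v (succinct)."""
    return min(VALUE[i] for i in s) <= V


subsets = [frozenset(c) for k in range(1, len(VALUE) + 1)
           for c in combinations(sorted(VALUE), k)]
sat_c = {s for s in subsets if c_avg(s)}
sat_c1 = {s for s in subsets if c_min(s)}

# C1 is weaker: everything satisfying C also satisfies C1 ...
print(sat_c <= sat_c1)
# ... so mining with C1 and then testing C loses nothing.
print({s for s in sat_c1 if c_avg(s)} == sat_c)
# C1 is strictly weaker, e.g. {a, b}: min = 0 <= 0 but avg = 20 > 0.
print(c_min({"a", "b"}), c_avg({"a", "b"}))
```

The final test against the full constraint C corresponds to the last step of CAP, where remaining sets are checked against the constraints no strategy could push.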

35
Algorithm CAP (Pseudocode)
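  • 1. If $C_{sam} \cup C_{suc} \cup C_{none}$ is
    non-empty, prepare $C_1$ as indicated in Strategies
    I, III, and IV; k = 1
  • 2. If $C_{suc}$ is non-empty:
  • 2.1 conduct a db scan to form $L_1$ as indicated in
    Strategy III
  • 2.2 form $C_2$ as indicated in Strategy III; k = 2
  • 3. While ($C_k$ not empty):
  • 3.1 conduct a db scan to form $L_k$ from $C_k$
  • 3.2 form $C_{k+1}$ from $L_k$ based on Strategy III
    if $C_{suc}$ is non-empty, and on Strategy II for
    constraints in $C_{am}$; k = k + 1
  • 4. If $C_{none}$ is empty, $Ans = \bigcup_k L_k$.
    Otherwise, for each set S in some $L_k$, add S to
    Ans iff S satisfies $C_{none}$.
  • Here $C_{sam}$, $C_{suc}$, $C_{am}$, and $C_{none}$
    hold the succinct anti-monotone, succinct
    non-anti-monotone, anti-monotone non-succinct, and
    remaining constraints, respectively.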

36
The Algorithm CAP: An Example
Constraint: $\{A, C, E\} \supseteq T.Item$; min support
count = 2
Question: which strategy should we apply?

Database TDB
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E
37
The Algorithm CAP: An Example (cont.)
Apply Strategy I!

Database TDB
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

1st scan: C1
Itemset  sup
{A}      2
{C}      3
{E}      3

L1
Itemset  sup
{A}      2
{C}      3
{E}      3

C2
{A, C}, {A, E}, {C, E}

2nd scan: C2
Itemset  sup
{A, C}   2
{A, E}   1
{C, E}   2

L2
Itemset  sup
{A, C}   2
{C, E}   2

C3
(empty, because {A, E} was pruned earlier)

Ans
{A}, {C}, {E}, {A, C}, {C, E}
38
Case 3: Succinct but not anti-monotone (revisited)
[Figure: itemsets {1}, {2}, {3}, {4}, {1, 2}, {2, 3}, {3, 4}, {1, 2, 3, 4}]

Some possibly frequent sets may be lost, e.g.,
{1, 8}, {1, 2, 10}.
Information extracted from a past presentation.
39
Case 3: Succinct but not anti-monotone (cont.)
  • Algorithm Direct
  • Idea: play it safe. Generate $C^c_{k+1}$ from
    $L^c_k \times F$, where F is the set of all
    frequent items.
  • Algorithm MultiJoins
  • Algorithm Reorder
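A Python sketch of Algorithm Direct's candidate generation on the example database from the CAP slides, assuming the succinct, non-anti-monotone constraint "S contains A or B" (modeled on the earlier succinct example; `ITEM1` and `direct` are hypothetical names). Level 1 keeps the frequent items that satisfy the constraint, and each later level extends every surviving set by one frequent item, so no satisfying frequent set is missed.

```python
from itertools import combinations

TDB = {10: {"A", "C", "D"}, 20: {"B", "C", "E"}, 30: {"A", "B", "C", "E"}, 40: {"B", "E"}}
MIN_SUP = 2
ITEM1 = {"A", "B"}  # hypothetical succinct constraint: S contains A or B


def support(s):
    return sum(1 for t in TDB.values() if s <= t)


def direct():
    """Sketch of Algorithm Direct: C^c_{k+1} = L^c_k x F, F = all frequent items."""
    items = {i for t in TDB.values() for i in t}
    F = {i for i in items if support(frozenset([i])) >= MIN_SUP}
    level = {frozenset([i]) for i in F & ITEM1}  # L^c_1
    answer = {s: support(s) for s in level}
    while level:
        candidates = {s | {i} for s in level for i in F if i not in s}  # L^c_k x F
        level = {c for c in candidates if support(c) >= MIN_SUP}        # db scan
        answer.update({s: support(s) for s in level})
    return answer


# Brute force for comparison: all frequent itemsets that contain A or B.
all_items = sorted({i for t in TDB.values() for i in t})
brute = {frozenset(c) for k in range(1, len(all_items) + 1)
         for c in combinations(all_items, k)
         if support(frozenset(c)) >= MIN_SUP and set(c) & ITEM1}

print(sorted("".join(sorted(s)) for s in direct()))
print(set(direct()) == brute)
```

Every candidate still contains A or B, so constraint checking is never needed after level 1; the price is that candidates such as {A, B, C} are counted even though {A, B} is infrequent.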

41
Performance Analysis (Specification)
  • Programs written in C
  • Transactional databases generated with the
    program from the IBM Almaden Research Center
  • 100,000 records, domain of 1,000 items
  • Page size 4KB
  • SPARC-10 environment

42
Performance Analysis (Terminology)
  • Speedup
  • Ratio of the execution times of two algorithms.
  • Item selectivity
  • x% of the items satisfy the constraints.
  • Support threshold
  • A lower support threshold means more frequent
    sets to process.

43
Performance Analysis
  • Note: support threshold set at 0.5.
  • For 10% selectivity, CAP runs 80 times faster
    than Apriori!
  • For 30% selectivity, the speedup is about 10
    times.

44
Performance Analysis
  • Note: item selectivity fixed at 30%.
  • As the support threshold goes up, the number of
    frequent itemsets goes down and Apriori improves.
  • CAP is still at least 8 times faster.

45
Performance Analysis
Support  L1       L2      L3       L4      L5     L6     L7     L8
0.2      174/582  79/969  29/1140  8/1250  1/934  0/451  0/132  0/20
0.6      98/313   1/12    0/1      0       0      0      0      0
  • Each entry is of the form a/b, where
  • a is the number of frequent sets satisfying the
    constraint, and
  • b is the total number of frequent sets.
  • For L4 with a support of 0.2, Apriori finds 1250
    frequent sets, 8 of which are found by CAP.

46
Conclusion
  • The ideas of anti-monotonicity, succinctness, and
    convertibility are introduced in the papers.
  • Sound, complete, and efficient algorithms are
    introduced for constraint-based association rule
    mining.

47
References
  • R. Srikant, Q. Vu, and R. Agrawal. Mining
    association rules with item constraints. KDD'97.
  • R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang.
    Exploratory mining and pruning optimizations of
    constrained association rules. SIGMOD'98.
  • J. Pei and J. Han. Can we push more constraints
    into frequent pattern mining? KDD'00.