Title: Mining Association Rules with Constraints
1Mining Association Rules with Constraints
- Wei Ning
- Joon Wong
- COSC 6412 Presentation
2Outline
- Introduction
- Summary of Approach
- Algorithm CAP
- Performance Analysis
- Conclusion
- References
4Introduction
- Recall: mining association rules
- Association rule mining finds interesting association or correlation relationships among a large set of data items.
5Some problems we met while mining association rules
- Overwhelming?
- Not what you want?
- Waiting too long?
- Lack of focus
6Introduction (cont.)
- Example at Walmart
- Suppose a manager wants to find out which shoes are the most popular in winter.
8Mining frequent itemsets vs. Mining association
rules
- Mining frequent itemsets is almost the same as mining association rules, because the rules are generated from the frequent itemsets
9Constrained Mining
- A naive solution
- First find all frequent sets, and then test them
for constraint satisfaction
- Our approach
- Analyze the properties of constraints comprehensively
- Push them as deeply as possible inside the frequent pattern computation.
10Frequent Itemsets and Constraints
- Given a transaction database
- Frequent itemset: a subset of items that frequently appear together in transactions, e.g. {a, c}
- Constraint: a predicate over itemsets
- C(I): sum(I) > 50
- C(abd) = true
TDB (min_sup = 2)
TID   Transaction
10    a, b, c
20    b, c, d, f
30    a, c
Item  Value
a     40
b     10
c     -20
d     10
e     -30
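The two definitions on this slide can be checked with a few lines of Python. This is only a minimal sketch: the function names `support` and `constraint` are chosen for illustration, and the data is copied from the tables above.

```python
# Transaction database and item values copied from the slide.
TDB = {10: {"a", "b", "c"}, 20: {"b", "c", "d", "f"}, 30: {"a", "c"}}
VALUE = {"a": 40, "b": 10, "c": -20, "d": 10, "e": -30}
MIN_SUP = 2


def support(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for items in TDB.values() if set(itemset) <= items)


def constraint(itemset):
    """C(I): the sum of the item values in I is greater than 50."""
    return sum(VALUE[i] for i in itemset) > 50


print(support("ac") >= MIN_SUP)  # {a, c} occurs in transactions 10 and 30
print(constraint("abd"))         # 40 + 10 + 10 = 60 > 50
```

Note that {a, b, d} satisfies C but is not frequent in this TDB, so the mining task has to check both conditions.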
11Mining Frequent Itemsets With Constraints
- Given
- A transaction database TDB
- A support threshold min_sup
- A constraint C
- Find the complete set of frequent itemsets satisfying the constraint
- Use the constraint to
- Express the user's focus
- Improve both effectiveness and efficiency
12Classification of Constraints
- We have the following classification of constraints:
- Anti-monotone
- Monotone
- Succinct
- Convertible
- Convertible anti-monotone
- Convertible monotone
- Strongly convertible
- Inconvertible
13Anti-Monotone
- Definition 1 (Anti-Monotone): A 1-var constraint C is anti-monotone if for all sets S, S': S' ⊆ S and S satisfies C ⇒ S' satisfies C.
- Simply put, when an itemset S violates the constraint, so does every superset of S.
14Is min(S) ≥ v anti-monotone?
- Suppose {5} violates it (that is, v > 5).
- Supersets of {5}: {5, 10}, {5, 14}, {5, 10, 14}
- So do {5, 10}, {5, 14}, {5, 10, 14}, since each still has minimum 5
- min(S) ≥ v is anti-monotone
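The example can be replayed by brute force. The sketch below checks anti-monotonicity only over a small finite universe, so it illustrates the property rather than proving it; the helper names and the threshold `V = 7` are assumptions made for the illustration (any v > 5 makes {5} a violating set).

```python
from itertools import combinations


def nonempty_subsets(items):
    """All non-empty subsets of items, as frozensets."""
    items = sorted(items)
    return [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def is_anti_monotone(constraint, items):
    """True if no violating set has a superset that satisfies the constraint."""
    all_sets = nonempty_subsets(items)
    for s in all_sets:
        if not constraint(s):
            # s < t means "s is a proper subset of t"
            if any(constraint(t) for t in all_sets if s < t):
                return False
    return True


V = 7  # hypothetical threshold
print(is_anti_monotone(lambda s: min(s) >= V, {5, 10, 14}))
print(is_anti_monotone(lambda s: min(s) <= V, {5, 10, 14}))
```

The second call uses the opposite comparison, which fails the check because {10} violates min(S) ≤ 7 while its superset {5, 10} satisfies it.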
15Succinct
- Definition 2 (Succinct)
- I ⊆ Item is a succinct set if it can be expressed as σ_p(Item) for some selection predicate p.
- SP ⊆ 2^Item is a succinct powerset if there is a fixed number of succinct sets Item_1, ..., Item_k ⊆ Item such that SP can be expressed in terms of the strict powersets of Item_1, ..., Item_k, using union and minus.
- Finally, a 1-var constraint C is succinct provided SAT_C(Item) is a succinct powerset.
16Succinct
- General idea: we can enumerate all and only those sets that are guaranteed to satisfy the constraint.
- If a constraint is succinct, we can directly generate precisely the sets that satisfy it.
17Succinct example
- Itemset containing a or b
- Itemset containing some item with value more than
30
18Succinct example
- C1 ≡ Item.Price ≥ 100
- Item_1 = σ_{Item.Price ≥ 100}(Item) = {a, b}
- 2^Item_1 = {{a}, {b}, {a, b}}
- SAT_C1 = {{a}, {b}, {a, b}}
- SAT_C1 = 2^Item_1
- C1 is succinct
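A small sketch of what "directly generate" means here: select the qualifying items first, then enumerate their strict powerset, without testing any other itemset. The price list is hypothetical (the slide gives no prices), and the sketch assumes the constraint reads "price of every item is at least 100", so that only a and b qualify.

```python
from itertools import combinations

# Hypothetical prices, chosen so that only a and b reach the threshold.
PRICE = {"a": 120, "b": 150, "c": 60, "d": 30}


def select(predicate):
    """sigma_p(Item): the items whose price satisfies the selection predicate."""
    return {item for item, price in PRICE.items() if predicate(price)}


def strict_powerset(items):
    """All non-empty subsets of items."""
    items = sorted(items)
    return [set(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


item1 = select(lambda price: price >= 100)
sat_c1 = strict_powerset(item1)
print(item1, sat_c1)
```

Every set in `sat_c1` satisfies the constraint by construction, and no satisfying set is missing, which is the property the slide calls succinctness.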
19Convertible
- Convert tough constraints into anti-monotone or monotone ones by properly ordering the items
20Convertible
- Definition
- R is an order of items
- Convertible anti-monotone
- Itemset X satisfies the constraint ⇒ so does every prefix of X w.r.t. R
21Convertible example
- Constraint C: avg(X) ≥ 25
- Order items in value-descending order
- <a, f, g, d, b, h, c, e>
- Itemset afd satisfies C
- So do prefixes a and af
- Thus, it becomes
- Anti-monotone!
Items in original order
Item  Value
a     40
b     0
c     -20
d     10
e     -30
f     30
g     20
h     -10
Items in value-descending order
Item  Value
a     40
f     30
g     20
d     10
b     0
h     -10
c     -20
e     -30
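The prefix argument can be replayed on the slide's values. The sketch below assumes the constraint is avg(X) ≥ 25 (consistent with afd, af and a all passing); the helper names are illustrative only.

```python
# Item values copied from the slide.
VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

# Order R: items sorted by descending value.
R = sorted(VALUE, key=VALUE.get, reverse=True)


def avg_at_least(itemset, threshold=25):
    """C: the average value of the items is at least the threshold."""
    return sum(VALUE[i] for i in itemset) / len(itemset) >= threshold


def prefixes(itemset):
    """Prefixes of the itemset once its items are arranged according to R."""
    ordered = sorted(itemset, key=R.index)
    return [ordered[:k] for k in range(1, len(ordered) + 1)]


for prefix in prefixes("afd"):
    print(prefix, avg_at_least(prefix))

# Without the order, C is not anti-monotone: {d} fails but {a, d} passes.
print(avg_at_least("d"), avg_at_least("ad"))
```

Under R, adding the next item can only lower the running average, which is why a failing prefix can be pruned together with everything that extends it.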
22Commonly Used Constraints: A General Picture
Constraint                      Antimonotone  Monotone     Succinct
v ∈ S                           no            yes          yes
S ⊇ V                           no            yes          yes
S ⊆ V                           yes           no           yes
min(S) ≤ v                      no            yes          yes
min(S) ≥ v                      yes           no           yes
max(S) ≤ v                      yes           no           yes
max(S) ≥ v                      no            yes          yes
count(S) ≤ v                    yes           no           weakly
count(S) ≥ v                    no            yes          weakly
sum(S) ≤ v (∀a ∈ S, a ≥ 0)      yes           no           no
sum(S) ≥ v (∀a ∈ S, a ≥ 0)      no            yes          no
range(S) ≤ v                    yes           no           no
range(S) ≥ v                    no            yes          no
avg(S) θ v, θ ∈ {=, ≤, ≥}       convertible   convertible  no
support(S) ≥ ξ                  yes           no           no
support(S) ≤ ξ                  no            yes          no
23Optional: Proof that min(S) ≥ v is Anti-monotone
- According to the table, min(S) ≥ v is both anti-monotone and succinct.
- I only prove anti-monotonicity here due to time limitations.
- Something special
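The slide does not record the proof itself. One short argument, offered here as a reconstruction under the definition from the Anti-Monotone slide and not necessarily the presenters' own proof, is the following, for any non-empty S' ⊆ S:

```latex
S' \subseteq S \;\Rightarrow\; \min(S') \ge \min(S),
\qquad\text{hence}\qquad
\min(S) \ge v \;\Rightarrow\; \min(S') \ge \min(S) \ge v .
```

So every non-empty subset of a satisfying set also satisfies the constraint, which is exactly the anti-monotone condition.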
24Constraint Classification
[Figure: diagram relating the constraint classes. Labels: Monotone, Antimonotone, Strongly convertible, Succinct, Convertible anti-monotone, Convertible monotone, Inconvertible]
25Summary of Approach: Recapitulation
- Basic idea of mining frequent itemsets with constraints.
- Introduced several important classes of constraints.
27Algorithms
- There are many algorithms for constraint-based association rule mining.
- Algorithm Direct
- Algorithm MultiJoins
- Algorithm Reorder
- Algorithm Apriori
- Algorithm Hybrid(m)
- Algorithm CAP (Main Focus)
28Design of Algorithm
- Sound
- An algorithm is sound provided it only finds frequent sets that satisfy the given constraints.
- Complete
- An algorithm is complete provided all frequent sets satisfying the given constraints are found.
29Algorithm Apriori
- Main idea: Use the Apriori algorithm to get the frequent itemsets. Then apply the constraints to the itemsets found.
- Step 1) Apriori with Cfreq
- Step 2) Apply C − Cfreq to get the final Ans
30Algorithm Apriori (Pseudocode)
- 1. C1 consists of sets of size 1; k = 1; Ans = ∅
- 2. While (Ck not empty)
- 2.1 conduct db scan to form Lk from Ck
- 2.2 form Ck+1 from Lk based on Cfreq; k = k + 1
- 3. For each set S in some Lk
- Add S to Ans if S satisfies (C − Cfreq).
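The pseudocode above can be sketched in Python as follows. This is a minimal, unoptimized illustration rather than the presenters' implementation: the function names are made up, the database is the four-transaction TDB used in the example slides, and the non-frequency constraint is passed in as an ordinary predicate.

```python
from itertools import combinations


def apriori(tdb, min_sup):
    """Steps 1-2: return {itemset: support} for every frequent itemset."""
    candidates = {frozenset([i]) for items in tdb.values() for i in items}  # C1
    frequent, k = {}, 1
    while candidates:
        # 2.1: one database scan counts the candidates and forms Lk
        counts = {c: sum(1 for items in tdb.values() if c <= items)
                  for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(level)
        # 2.2: join Lk with itself; keep (k+1)-sets whose k-subsets are all in Lk
        candidates = {a | b for a in level for b in level
                      if len(a | b) == k + 1
                      and all(frozenset(s) in level
                              for s in combinations(a | b, k))}
        k += 1
    return frequent


def apriori_then_filter(tdb, min_sup, constraint):
    """Step 3: keep only the frequent sets that satisfy the other constraints."""
    return {s: n for s, n in apriori(tdb, min_sup).items() if constraint(s)}


TDB = {10: {"A", "C", "D"}, 20: {"B", "C", "E"},
       30: {"A", "B", "C", "E"}, 40: {"B", "E"}}
ans = apriori_then_filter(TDB, 2, lambda s: s <= {"A", "C", "E"})
print(sorted("".join(sorted(s)) for s in ans))
```

The constraint is applied only after all frequent sets have been counted, which is the inefficiency that CAP is designed to avoid.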
31The Apriori Algorithm: An Example
Database TDB
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E
C1 (after 1st scan)
Itemset  sup
{A}      2
{B}      3
{C}      3
{D}      1
{E}      3
L1
Itemset  sup
{A}      2
{B}      3
{C}      3
{E}      3
C2
Itemset
{A, B}
{A, C}
{A, E}
{B, C}
{B, E}
{C, E}
C2 (after 2nd scan)
Itemset  sup
{A, B}   1
{A, C}   2
{A, E}   1
{B, C}   2
{B, E}   3
{C, E}   2
L2
Itemset  sup
{A, C}   2
{B, C}   2
{B, E}   3
{C, E}   2
C3
Itemset
{B, C, E}
L3 (after 3rd scan)
Itemset    sup
{B, C, E}  2
32The Apriori Algorithm: An Example (cont.)
Constraint: {A, C, E} ⊇ T.Item
Database TDB
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E
L1
Itemset  sup
{A}      2
{B}      3
{C}      3
{E}      3
L2
Itemset  sup
{A, C}   2
{B, C}   2
{B, E}   3
{C, E}   2
L3
Itemset    sup
{B, C, E}  2
Ans
{A}
{C}
{E}
{A, C}
{C, E}
33Algorithm CAP
- Succinct and anti-monotone
- Strategy I: Replace C1 in the Apriori algorithm by C1^C.
- Anti-monotone but non-succinct
- Strategy II: Define Ck as in the Apriori algorithm. Drop a set S ∈ Ck from counting if S fails C, i.e., constraint satisfaction is tested before counting is done.
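Strategy II can be sketched as a filter that runs before the database scan. The example below is an illustration under stated assumptions: it reuses the four-transaction TDB from the Apriori example, and the prices and the budget of 50 are hypothetical, chosen so that sum(S) ≤ 50 over non-negative prices is an anti-monotone, non-succinct constraint.

```python
def count_with_pruning(candidates, tdb, min_sup, constraint):
    """Strategy II sketch: test the constraint first, then count only survivors."""
    survivors = [c for c in candidates if constraint(c)]  # no database access
    counts = {c: sum(1 for items in tdb.values() if c <= items)
              for c in survivors}
    frequent = {c: n for c, n in counts.items() if n >= min_sup}
    return frequent, len(survivors)


TDB = {10: {"A", "C", "D"}, 20: {"B", "C", "E"},
       30: {"A", "B", "C", "E"}, 40: {"B", "E"}}
PRICE = {"A": 10, "B": 40, "C": 20, "E": 30}  # hypothetical prices

C2 = [frozenset(p) for p in ("AB", "AC", "AE", "BC", "BE", "CE")]
L2, counted = count_with_pruning(
    C2, TDB, 2, lambda s: sum(PRICE[i] for i in s) <= 50)
print(counted, sorted("".join(sorted(s)) for s in L2))
```

Here {B, C} and {B, E} exceed the budget and are dropped before the scan, so only four of the six candidates are counted.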
34Algorithm CAP (cont.)
- Succinct but non-anti-monotone
- Strategy III: Too complicated. To be discussed later.
- Non-succinct and non-anti-monotone
- Strategy IV: Induce any weaker constraint C1 from C. Depending on whether C1 is anti-monotone and/or succinct, use one of the strategies I-III above for the generation of frequent sets.
35Algorithm CAP (Pseudocode)
- 1. if Csam ∪ Csuc ∪ Cnone is non-empty, prepare C1 as indicated in Strategies I, III, and IV; k = 1
- 2. if Csuc is non-empty
- 2.1 conduct db scan to form L1 as indicated in Strategy III
- 2.2 form C2 as indicated in Strategy III; k = 2
- 3. while (Ck not empty)
- 3.1 conduct db scan to form Lk from Ck
- 3.2 form Ck+1 from Lk based on Strategy III if Csuc is non-empty, and Strategy II for constraints in Cam
- 4. if Cnone is empty, Ans = ∪ Lk. Otherwise, for each set S in some Lk, add S to Ans iff S satisfies Cnone.
36The Algorithm CAP: An Example
Constraint: {A, C, E} ⊇ T.Item
Min support count: 2
Question: Which strategy should we apply?
Database TDB
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E
37The Algorithm CAP: An Example (Cont.)
Apply Strategy I!
Database TDB
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E
C1 (after 1st scan)
Itemset  sup
{A}      2
{C}      3
{E}      3
L1
Itemset  sup
{A}      2
{C}      3
{E}      3
C2
Itemset
{A, C}
{A, E}
{C, E}
C2 (after 2nd scan)
Itemset  sup
{A, C}   2
{A, E}   1
{C, E}   2
L2
Itemset  sup
{A, C}   2
{C, E}   2
C3
Itemset
(empty, because {A, E} was pruned earlier)
Ans
{A}
{C}
{E}
{A, C}
{C, E}
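The run above can be reproduced with a short sketch of Strategy I: the only change from plain level-wise mining is that C1 starts from the items allowed by the constraint. The function name and the `scans` bookkeeping are illustrative additions, not part of the original algorithm description.

```python
from itertools import combinations


def cap_strategy_one(tdb, min_sup, allowed_items):
    """Level-wise mining where C1 holds only items that satisfy the constraint."""
    candidates = {frozenset([i]) for items in tdb.values()
                  for i in items if i in allowed_items}
    answer, scans, k = {}, [], 1
    while candidates:
        scans.append(len(candidates))  # candidates counted in this db scan
        counts = {c: sum(1 for items in tdb.values() if c <= items)
                  for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_sup}
        answer.update(level)
        # Next candidates: (k+1)-sets whose k-subsets are all frequent
        candidates = {a | b for a in level for b in level
                      if len(a | b) == k + 1
                      and all(frozenset(s) in level
                              for s in combinations(a | b, k))}
        k += 1
    return answer, scans


TDB = {10: {"A", "C", "D"}, 20: {"B", "C", "E"},
       30: {"A", "B", "C", "E"}, 40: {"B", "E"}}
ans, scans = cap_strategy_one(TDB, 2, {"A", "C", "E"})
print(sorted("".join(sorted(s)) for s in ans), scans)
```

On this database the sketch counts 3 + 3 candidates over two scans, whereas the Apriori example counted 5 + 6 + 1 candidates over three scans before filtering, and both arrive at the same Ans.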
38Case 3: Succinct but not anti-monotone. Revisit
{1} {2} {3} {4} {1,2} {2,3} {3,4} {1,2,3,4}
Some possible frequent sets may be lost, e.g. {1,8}, {1,2,10}
Information extracted from past presentation.
39Case 3: Succinct but not anti-monotone. Continue
- Algorithm Direct
- Idea: Play it safe. Generate C^c_(k+1) by using L^c_k × F, where F is the set of all frequent items.
- Algorithm MultiJoins
- Algorithm Reorder
41Performance Analysis (Specification)
- Programs written in C
- Transactional databases generated using the program from IBM Almaden Research Center
- 100,000 records, domain of 1,000 items
- Page size: 4 KB
- SPARC-10 environment
42Performance Analysis (Terminology)
- Speedup
- Comparison of the execution times of two algorithms.
- Item selectivity
- x% of the items satisfy the constraints.
- Support threshold
- A low support threshold means more frequent sets to process.
43Performance Analysis
- Note: support threshold set at 0.5%.
- For 10% selectivity, CAP runs 80 times faster than Apriori!
- For 30% selectivity, the speedup is about 10 times.
44Performance Analysis
- Note: item selectivity fixed at 30%.
- As the support threshold goes up, the number of frequent itemsets goes down and Apriori improves.
- CAP is still at least 8 times faster.
45Performance Analysis
Support  L1       L2      L3       L4      L5     L6     L7     L8
0.2%     174/582  79/969  29/1140  8/1250  1/934  0/451  0/132  0/20
0.6%     98/313   1/12    0/1      0       0      0      0      0
- Each entry is of the form a/b
- a is the number of frequent sets satisfying the constraint.
- b is the total number of frequent sets.
- For L4 with a support of 0.2%, Apriori finds 1250 frequent sets, of which only 8 satisfy the constraint and are found by CAP.
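To see how much counting work the a/b entries represent, the first row of the table can be summed up. This is plain arithmetic on the figures above; the variable names are illustrative.

```python
# First row of the table (lowest support threshold), entries in a/b form.
row = "174/582 79/969 29/1140 8/1250 1/934 0/451 0/132 0/20"

pairs = [tuple(int(x) for x in entry.split("/")) for entry in row.split()]
satisfying = sum(a for a, _ in pairs)  # frequent sets that satisfy the constraint
total = sum(b for _, b in pairs)       # all frequent sets

print(satisfying, total, round(100 * satisfying / total, 1))
```

Only a small fraction of the frequent sets in that row satisfy the constraint, which is consistent with the large speedups reported for CAP.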
46Conclusion
- The ideas of anti-monotonicity, succinctness, and convertibility are introduced in the papers.
- Sound, complete, and efficient algorithms are introduced for constraint-based association rule mining.
47References
- R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. KDD'97.
- R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. SIGMOD'98.
- J. Pei and J. Han. Can we push more constraints into frequent pattern mining? KDD'00.