Title: Top Down FP-Growth for Association Rule Mining
Slide 2: Introduction

- Classically, for a rule A → B:
  - support is computed as count(AB)
  - the rule is frequent if its support passes the minimum support threshold
  - confidence is computed as count(AB) / count(A)
  - the rule is confident if its confidence passes the minimum confidence threshold
- How to mine association rules?
  - find all frequent patterns
  - generate rules from the frequent patterns
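The two-phase recipe above can be sketched with a deliberately naive miner (brute-force enumeration, not the algorithm proposed in this work); the function name `mine_rules` and the list-of-lists transaction encoding are illustrative assumptions:

```python
from itertools import combinations

def mine_rules(transactions, minsup, minconf):
    """Naive two-phase miner: find all frequent itemsets, then emit
    confident rules A -> B.  Exponential; for illustration only."""
    items = sorted({i for t in transactions for i in t})
    count = {}
    # Phase 1: count every candidate itemset by brute force.
    for r in range(1, len(items) + 1):
        for cand in combinations(items, r):
            c = sum(1 for t in transactions if set(cand) <= set(t))
            if c >= minsup:
                count[frozenset(cand)] = c
    # Phase 2: split each frequent itemset into A -> B and keep
    # the rules whose confidence count(AB)/count(A) passes minconf.
    rules = []
    for itemset, c in count.items():
        for r in range(1, len(itemset)):
            for a in combinations(sorted(itemset), r):
                A = frozenset(a)
                if A in count and c / count[A] >= minconf:
                    rules.append((set(A), set(itemset - A)))
    return count, rules
```

On the toy database used later in the deck, phase 1 finds 9 frequent itemsets and phase 2 includes rules such as b → e (confidence 3/3).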
Slide 3: Introduction

- Limitations of current research
  - uses a uniform minimum support threshold
  - uses only support as the pruning measure
- Our contributions
  - improve efficiency
  - adopt multiple minimum supports
  - introduce confidence pruning
Slide 4: Related work -- frequent pattern mining

- Apriori algorithm
  - method: use the anti-monotone property of support for pruning, i.e. if a length-k pattern is infrequent, its length-(k+1) super-patterns can never be frequent
- FP-growth algorithm -- better than Apriori
  - method:
    - build an FP-tree to store the database compactly
    - mine the FP-tree in bottom-up order
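Apriori's anti-monotone pruning can be sketched as follows; this is a minimal illustrative implementation (the function name and the set-based encoding are assumptions, not code from this work):

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise Apriori: a length-(k+1) candidate survives only if
    all of its length-k subsets are frequent (anti-monotone pruning)."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    # L1: frequent single items
    freq = {frozenset([i]) for i in items
            if sum(1 for t in transactions if i in t) >= minsup}
    all_freq = set(freq)
    k = 1
    while freq:
        # join step: union pairs of frequent k-itemsets into (k+1)-candidates
        cands = {a | b for a in freq for b in freq if len(a | b) == k + 1}
        # prune step: every k-subset must itself be frequent
        cands = {c for c in cands
                 if all(frozenset(s) in freq for s in combinations(c, k))}
        # count step: scan the database only for surviving candidates
        freq = {c for c in cands
                if sum(1 for t in transactions if c <= t) >= minsup}
        all_freq |= freq
        k += 1
    return all_freq
```

For example, the candidate {a, b, c} is pruned without a database scan because its subset {a, b} is infrequent in the toy database used later in the deck.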
Slide 5: Related work -- association rule mining

- Fast algorithms that try to guarantee completeness of the frequent patterns
- Parallel algorithms; association-rule-based query languages
- Various association rule mining problems
  - multi-level, multi-dimensional rules
  - constraints on specific items
Slide 6: TD-FP-Growth for frequent pattern mining

- Similar tree structure to FP-growth
  - a compressed tree stores the database
  - nodes on each path of the tree are globally ordered
- Different mining method vs. FP-growth
  - FP-growth: bottom-up tree mining
  - TD-FP-Growth: top-down tree mining
Slide 7: TD-FP-Growth for frequent pattern mining

Example database (minsup = 2): {b, e}, {a, b, c, e}, {b, c, e}, {a, c, d}, {a}

Construct an FP-tree, with header table:

  entry value | count | side-link
  a           | 3     |
  b           | 3     |
  c           | 3     |
  e           | 3     |
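The tree construction on this slide can be sketched roughly as below, assuming a frequency-descending global item order and a side-link chain per header entry; the class `Node` and function `build_tree` are hypothetical names, not from this work:

```python
class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 1
        self.children = {}   # item -> Node
        self.link = None     # side-link to the next node for the same item

def build_tree(transactions, minsup):
    """Insert each transaction along one path, keeping only frequent
    items in a fixed global order; the header table records each
    frequent item's count and the head of its side-link chain."""
    counts = {}
    for t in transactions:
        for i in t:
            counts[i] = counts.get(i, 0) + 1
    order = sorted((i for i in counts if counts[i] >= minsup),
                   key=lambda i: (-counts[i], i))
    root = Node(None, None)
    header = {i: {'count': counts[i], 'side_link': None} for i in order}
    for t in transactions:
        node = root
        for i in (x for x in order if x in t):  # frequent items, globally ordered
            if i not in node.children:
                child = Node(i, node)
                # thread the new node onto the front of i's side-link chain
                child.link = header[i]['side_link']
                header[i]['side_link'] = child
                node.children[i] = child
            else:
                node.children[i].count += 1
            node = node.children[i]
    return root, header
```

On the example database, the infrequent item d is dropped and the header table comes out as a:3, b:3, c:3, e:3, matching the slide.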
Slide 8: TD-FP-Growth for frequent pattern mining

Same database (minsup = 2): {b, e}, {a, b, c, e}, {b, c, e}, {a, c, d}, {a}

FP-growth: bottom-up mining
- mining order: e, c, b, a
- the header table links each item (a, b, c, e) to its nodes in the tree through the head of its node-link
Slide 9: TD-FP-Growth for frequent pattern mining

- FP-growth bottom-up mining
  - mining item e first requires building e's conditional sub-tree, with its own header table over items b and c
  - drawback: extra conditional databases and sub-trees must be built!
Slide 10: TD-FP-Growth for frequent pattern mining

- TD-FP-Growth adopts a top-down mining strategy
  - motivation: avoid building the extra databases and sub-trees that FP-growth builds
  - method: process nodes on an upper level before those on a lower level
  - result: modifications to upper-level nodes never affect lower-level nodes
- See the example →
Slide 11: TD-FP-Growth for frequent pattern mining

Same database (minsup = 2): {b, e}, {a, b, c, e}, {b, c, e}, {a, c, d}, {a}

Mining order: a, b, c, e

CT-tree and header table H:

  entry value | count | side-link
  a           | 3     |
  b           | 3     |
  c           | 3     |
  e           | 3     |
Slide 12: CT-tree for frequent pattern mining

Same database (minsup = 2)

Sub-header table built while mining item c (over c's prefix items):

  entry value | count | side-link
  a           | 2     |
  b           | 2     |

Global header table H:

  entry value | count | side-link
  a           | 3     |
  b           | 3     |
  c           | 3     |
  e           | 3     |
Slide 13: CT-tree for frequent pattern mining

- Completeness
  - for each entry i in H, we mine exactly the frequent patterns that end with item i -- no more and no less
- Complete set of frequent patterns:
  - {a}
  - {b}
  - {c}, {b, c}, {a, c}
  - {e}, {b, e}, {c, e}, {b, c, e}
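The completeness claim is easy to check by brute force on the toy database: enumerating all frequent itemsets and grouping each under the entry it ends with (in the global order a, b, c, e) reproduces the list above. The variable names below are illustrative:

```python
from itertools import combinations

transactions = [{'b','e'}, {'a','b','c','e'}, {'b','c','e'}, {'a','c','d'}, {'a'}]
minsup, order = 2, 'abce'  # 'abce' is the global order over the frequent items

# enumerate every itemset over the frequent items; keep those with support >= minsup
frequent = []
for r in range(1, len(order) + 1):
    for cand in combinations(order, r):
        if sum(1 for t in transactions if set(cand) <= t) >= minsup:
            frequent.append(cand)

# bucket each frequent pattern under the entry it "ends with" in the global order
by_end = {i: [] for i in order}
for pat in frequent:
    by_end[pat[-1]].append(set(pat))
```

This yields 9 frequent patterns in total, with e's bucket holding {e}, {b, e}, {c, e}, {b, c, e}, exactly as listed on the slide.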
Slide 14: TD-FP-Growth for frequent pattern mining

- Compared to FP-growth, TD-FP-Growth is
  - space-saving
    - only one tree and a few header tables
    - no extra databases or sub-trees
  - time-saving
    - does not build extra databases and sub-trees
    - walks up each path only once to update count information for nodes on the tree and to build sub-header-tables
Slide 15: TD-FP-Growth for association rule mining

- Assumptions
  - there is a class attribute in the database
  - items in the class attribute are called class-items; the others are non-class-items
  - each transaction is associated with a class-item
  - only a class-item appears on the right-hand side of a rule

  Transaction ID | non-class attribute | class attribute
  1              | a, b                | C1
  2              | d                   | C2
  3              | e, d, f             | C3

Example rule: a, b → Ci
Slide 16: TD-FP-Growth for association rule mining -- multiple minimum supports

- Why?
  - with a uniform minimum support, the computation of counts considers only the number of appearances
  - a uniform minimum support is unfair to items that appear less often but are worth more
  - e.g. responders vs. non-responders
- How?
  - use a different support threshold for each class
Slide 17: TD-FP-Growth for association rule mining -- multiple minimum supports

- Multiple vs. uniform minimum supports
  - example database (class counts: C1 = 4, C2 = 2):
    {c, f, C1}, {b, e, C2}, {b, e, f, C1}, {a, c, f, C1}, {c, e, C2}, {b, c, d, C1}
  - mine rules with a relative minsup of 50%, proportional to each class (the multiplier used in the performance study)
  - uniform minimum support: absolute minsup = 1
    - 11-node tree, 23 rules
  - multiple minimum supports: absolute minsup1 = 2, absolute minsup2 = 1
    - 7-node tree, 9 rules
  - more effective and space-saving
  - time-saving -- shown in the performance study
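The per-class absolute thresholds in this example follow from the single relative minsup. A minimal sketch (the helper name `class_thresholds` and storing the class-item last in each tuple are assumptions):

```python
import math
from collections import Counter

def class_thresholds(transactions, rel_minsup):
    """Convert one relative minsup (a fraction) into a per-class
    absolute threshold proportional to each class's size."""
    sizes = Counter(t[-1] for t in transactions)  # class-item stored last
    return {c: math.ceil(n * rel_minsup) for c, n in sizes.items()}
```

With the slide's database (4 transactions in C1, 2 in C2) and a relative minsup of 50%, this gives minsup1 = 2 and minsup2 = 1, as stated above.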
Slide 18: TD-FP-Growth for association rule mining -- confidence pruning

- Motivation
  - use the other association rule constraint, confidence, to speed up mining
- Method
  - confidence itself is not anti-monotone
  - introduce an acting constraint of confidence, which is anti-monotone
  - push it inside the mining process
Slide 19: TD-FP-Growth for association rule mining -- confidence pruning

conf(A → B) = count(AB) / count(A) ≥ minconf
⇔ count(AB) ≥ count(A) × minconf
⇒ count(AB) ≥ minsup × minconf   (anti-monotone, but weaker)

--- the last condition is the acting constraint of confidence for the original confidence constraint of rule A → B

- the support of the rule is computed as count(A)
- count(AB) is the class-count of itemset A with respect to class B
Slide 20: TD-FP-Growth for association rule mining -- confidence pruning

Database (minsup = 2, minconf = 60%):
{c, f, C1}, {b, e, C2}, {b, e, f, C1}, {a, c, f, C1}, {a, c, d, C2}

Header table H:

  entry value i | count(i) | count(i, C1) | count(i, C2) | side-link
  a             | 2        | 1            | 1            |
  b             | 2        | 1            | 1            |
  c             | 3        | 2            | 1            |
  e             | 2        | 1            | 1            |
  f             | 3        | 3            | 0            |

count(e) ≥ minsup; however, both count(e, C1) and count(e, C2) < minsup × minconf ⇒ terminate mining for e!

Sub-header table for e (never needs to be built):

  entry value i | count(i) | count(i, C1) | count(i, C2) | side-link
  b             | 2        | 1            | 1            |
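The pruning decision for e can be reproduced with a few counters; `prune` is a hypothetical helper implementing the acting constraint, not code from this work:

```python
from collections import Counter

transactions = [({'c','f'}, 'C1'), ({'b','e'}, 'C2'), ({'b','e','f'}, 'C1'),
                ({'a','c','f'}, 'C1'), ({'a','c','d'}, 'C2')]
minsup, minconf = 2, 0.6

count = Counter()        # count(i): transactions containing item i
class_count = Counter()  # count(i, C): those transactions labeled with class C
for items, cls in transactions:
    for i in items:
        count[i] += 1
        class_count[(i, cls)] += 1

def prune(item):
    """Acting constraint: stop growing patterns ending with `item` when
    no class-count reaches minsup * minconf (the anti-monotone bound)."""
    bound = minsup * minconf
    return all(class_count[(item, c)] < bound for c in ('C1', 'C2'))
```

Here minsup × minconf = 1.2, so e (class-counts 1 and 1) is pruned even though count(e) = 2 passes minsup, while f (count(f, C1) = 3) is not.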
Slide 21: Performance

- Datasets chosen from the UC Irvine Machine Learning Repository
  (http://www.ics.uci.edu/~mlearn/MLRepository.html)

  dataset   | # of transactions | # of items per transaction | class distribution (%)                     | # of distinct items
  Dna-train | 2000              | 61                         | 23.2, 24.25, 52.55                         | 240
  Connect-4 | 67557             | 43                         | 9.55, 24.62, 65.83                         | 126
  Forest    | 581012            | 13                         | 0.47, 1.63, 2.99, 3.53, 6.15, 36.36, 48.76 | 15916
Slide 22: Performance -- frequent pattern mining
Slide 23: Performance -- mining rules with multiple minimum supports

- FP-growth is for frequent pattern mining only
- relative minsup, proportional to each class
Slide 24: Performance -- mining rules with confidence pruning
Slide 25: Conclusions and future work

- Conclusions about the TD-FP-Growth algorithm
  - more efficient at finding both frequent patterns and association rules
  - more effective at mining rules through multiple minimum supports
  - introduces a new pruning method, confidence pruning, and pushes it inside the mining process, further speeding up mining
Slide 26: Conclusions and future work

- Future work
  - explore other constraint-based association rule mining methods
  - mine association rules with an item concept hierarchy
  - apply TD-FP-Growth to applications based on association rule mining
    - clustering
    - classification
Slide 27: References

- (1) R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. Proc. 1993 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'93), pages 207-216, Washington, D.C., May 1993.
- (2) U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (eds.). Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996.
- (3) H. Toivonen. Sampling large databases for association rules. Proc. 1996 Int. Conf. Very Large Data Bases (VLDB'96), pages 134-145, Bombay, India, September 1996.
- (4) R. Agrawal and S. Srikant. Mining sequential patterns. Proc. 1995 Int. Conf. Data Engineering (ICDE'95), pages 3-14, Taipei, Taiwan, March 1995.
- (5) J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. Proc. 2000 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'00), pages 1-12, Dallas, TX, May 2000.
- (6) J. Han, J. Pei, G. Dong, and K. Wang. Efficient computation of iceberg cubes with complex measures. Proc. 2001 ACM-SIGMOD Int. Conf., Santa Barbara, CA, May 2001.
- And more!