Title: Bottleneck of Frequent-pattern Mining
1. Bottleneck of Frequent-pattern Mining
- Multiple database scans are costly
- Mining long patterns (i.e. long frequent itemsets)
  - needs many passes of scanning
  - generates lots of candidates
- To find the frequent itemset i1 i2 ... i100
  - # of scans: 100
  - # of candidates: $\binom{100}{1} + \binom{100}{2} + \cdots + \binom{100}{100} = 2^{100} - 1 \approx 1.27 \times 10^{30}$ !
- Bottleneck: candidate generation and test
- Can we avoid candidate generation?
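The candidate count above can be checked directly. The following is a minimal sketch; the function name `candidate_count` is chosen here for illustration and does not come from the slides.

```python
from math import comb

def candidate_count(n: int) -> int:
    """Number of non-empty sub-itemsets of an n-item frequent itemset."""
    return sum(comb(n, k) for k in range(1, n + 1))

total = candidate_count(100)
# The binomial sum collapses to 2^n - 1 (all subsets minus the empty set).
assert total == 2**100 - 1
print(f"{total:.2e}")  # 1.27e+30
```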
2. FP-growth Algorithm
- Use a compressed representation of the database: an FP-tree
- Once the FP-tree has been constructed, use a recursive divide-and-conquer approach to mine the frequent itemsets
- Database → 1 scan to get the frequent 1-itemsets → sort transactions based on the f-list → 1 scan to create the FP-tree
  → For every frequent item p, create p's conditional pattern base (bottom up)
  → For every frequent item p, create p's conditional FP-tree
  → Generate all frequent itemsets ending in p
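The recursive divide-and-conquer idea can be sketched compactly. This is a simplification under stated assumptions: it recurses on conditional pattern bases held as weighted lists instead of building a compressed FP-tree at each level, so it shows the pattern-growth recursion but not the tree's space savings. The function name `fp_growth` and the `(items, count)` input format are illustrative choices.

```python
from collections import Counter

def fp_growth(weighted_db, min_support, suffix=()):
    """Yield (itemset, support) pairs. weighted_db is a list of (items, count)."""
    # Scan: support of every item in this (conditional) database.
    counts = Counter()
    for items, n in weighted_db:
        for item in set(items):
            counts[item] += n
    # f-list: frequent items in descending support (ties broken by name here).
    f_list = sorted((i for i in counts if counts[i] >= min_support),
                    key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(f_list)}
    # Keep only frequent items and order each transaction by the f-list.
    ordered = [(sorted((i for i in set(items) if i in rank), key=rank.get), n)
               for items, n in weighted_db]
    for item in reversed(f_list):  # bottom up
        pattern = (item,) + suffix
        yield frozenset(pattern), counts[item]
        # Conditional pattern base of `item`: the prefix in front of it.
        base = [(t[:t.index(item)], n) for t, n in ordered if item in t]
        base = [(prefix, n) for prefix, n in base if prefix]
        # Recurse: all frequent itemsets ending in `pattern`.
        yield from fp_growth(base, min_support, pattern)

# Transaction database from the example slide, min_support = 3.
db = ["facdgimp", "abcflmo", "bfhjow", "bcksp", "afcelpmn"]
patterns = dict(fp_growth([(list(t), 1) for t in db], min_support=3))
```

On this database the sketch reports, for instance, support 3 for {f, c, a, m} and for {c, p}.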
3. Construct FP-tree from a Transaction Database
F-list = f-c-a-b-m-p
TID   Items bought                (Ordered) frequent items
100   f, a, c, d, g, i, m, p      f, c, a, m, p
200   a, b, c, f, l, m, o         f, c, a, b, m
300   b, f, h, j, o, w            f, b
400   b, c, k, s, p               c, b, p
500   a, f, c, e, l, p, m, n      f, c, a, m, p
min_support = 3
- Scan DB once, find frequent 1-itemsets (single item patterns)
- Sort frequent items in frequency descending order: the f-list
- Scan DB again, construct FP-tree
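The two scans above might be sketched as follows. The class `FPNode` and the functions `frequent_items` and `build_fp_tree` are names assumed for this illustration. The f-list is passed in explicitly because the slide's order f-c-a-b-m-p breaks ties between equally frequent items (f/c and a/b/m/p) in a way that a plain sort by count would not necessarily reproduce.

```python
from collections import Counter, defaultdict

class FPNode:
    """One FP-tree node: an item, its count, a parent link and its children."""
    def __init__(self, item=None, parent=None):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def frequent_items(transactions, min_support):
    """First scan: support of every item that meets min_support."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= min_support}

def build_fp_tree(transactions, f_list):
    """Second scan: insert each transaction, filtered and ordered by f_list."""
    rank = {item: i for i, item in enumerate(f_list)}
    root = FPNode()
    header = defaultdict(list)  # header table: item -> nodes carrying it
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:  # start a new branch
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1   # shared prefixes only bump the count
            node = child
    return root, header

db = ["facdgimp", "abcflmo", "bfhjow", "bcksp", "afcelpmn"]
support = frequent_items(db, min_support=3)
f_list = list("fcabmp")  # order taken from the slide
root, header = build_fp_tree(db, f_list)
```

With this input the root has two children, f (count 4) and c (count 1), and item b appears in three separate nodes.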
4. Creating conditional pattern bases
- Start at the frequent-item header table in the FP-tree (bottom up)
- Traverse the FP-tree by following the link of each frequent item p
- Construct p's conditional pattern base (a sub-database which consists of the set of prefix paths to p)
Conditional pattern bases
item   cond. pattern base
p      fcam:2, cb:1
m      fca:2, fcab:1
b      fca:1, f:1, c:1
a      fc:3
c      f:3
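The table above can be reproduced with a short sketch. As a simplifying assumption, it does not follow node links in an FP-tree; it collects, for each item, the prefix in front of that item in every f-list-ordered transaction, which yields the same prefix paths and counts. The function name `conditional_pattern_bases` is illustrative.

```python
from collections import Counter, defaultdict

def conditional_pattern_bases(ordered_transactions):
    """Map each item to a Counter of its prefix paths (as tuples)."""
    bases = defaultdict(Counter)
    for t in ordered_transactions:
        for k, item in enumerate(t):
            if k > 0:  # the first item of a transaction has no prefix
                bases[item][tuple(t[:k])] += 1
    return bases

# Transactions already filtered and ordered by the f-list f-c-a-b-m-p.
ordered_db = [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]
bases = conditional_pattern_bases(ordered_db)

for item in "pmbac":
    row = ", ".join(f"{''.join(p)}:{n}" for p, n in bases[item].items())
    print(item, row)
```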
5. From Conditional Pattern-bases to Conditional FP-trees
- For each pattern-base
  - Accumulate the count for each item in the base
  - Construct the FP-tree for the frequent items of the pattern base
m-conditional pattern base: fca:2, fcab:1
All frequent patterns relating to m: m(3), fm(3), cm(3), am(3), fcm(3), fam(3), cam(3), fcam(3)
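The m example can be verified with a small sketch. It accumulates item counts over the pattern base, drops infrequent items, and then enumerates combinations of the remaining items by brute force instead of building and recursing on a conditional FP-tree; for a base this small the result is the same. The function name `patterns_for_suffix` and its parameters are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def patterns_for_suffix(base, suffix, suffix_support, min_support):
    """base maps each prefix path (a string of items) to its count."""
    # Accumulate the count of each item in the base.
    counts = Counter()
    for prefix, n in base.items():
        for item in prefix:
            counts[item] += n
    # Keep the frequent items, in the order they occur in the prefixes.
    frequent = [i for i in counts if counts[i] >= min_support]
    patterns = {suffix: suffix_support}
    for size in range(1, len(frequent) + 1):
        for combo in combinations(frequent, size):
            # Support: total count of prefix paths containing every item.
            s = sum(n for prefix, n in base.items() if set(combo) <= set(prefix))
            if s >= min_support:
                patterns["".join(combo) + suffix] = s
    return patterns

m_base = {"fca": 2, "fcab": 1}
m_patterns = patterns_for_suffix(m_base, "m", suffix_support=3, min_support=3)
print(m_patterns)
```

Item b occurs only once in the base, so it is dropped, and the eight patterns listed above remain, each with support 3.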
6. Example 2 (minsupp = 2)