COMP5331 - PowerPoint PPT Presentation

1
COMP5331
FP-Tree
Prepared by Raymond Wong
Presented by Raymond Wong
raywong_at_cse
2
Large Itemset Mining
  • Frequent Itemset Mining

Problem: find all large (or frequent) itemsets,
i.e., itemsets whose support is at least a given
threshold (here, itemsets with support >= 3)
TID Items Bought
100 a, b, c, d, e, f, g, h
200 a, f, g
300 b, d, e, f, j
400 a, b, d, i, k
500 a, b, e, g
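
To make the problem statement concrete, here is a minimal brute-force sketch in Python that counts the support of every possible itemset over the five transactions above. The names `TRANSACTIONS`, `support` and `frequent_itemsets_bruteforce` are illustrative choices, not part of the slides, and exhaustive enumeration is only workable for a toy example like this one.

```python
from itertools import combinations

# Transactions copied from the table above (TID -> items bought).
TRANSACTIONS = {
    100: {"a", "b", "c", "d", "e", "f", "g", "h"},
    200: {"a", "f", "g"},
    300: {"b", "d", "e", "f", "j"},
    400: {"a", "b", "d", "i", "k"},
    500: {"a", "b", "e", "g"},
}
THRESHOLD = 3  # minimum support used throughout the slides


def support(itemset, transactions=TRANSACTIONS):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for bought in transactions.values() if items <= bought)


def frequent_itemsets_bruteforce(transactions=TRANSACTIONS, threshold=THRESHOLD):
    """Enumerate every non-empty itemset and keep those with support >= threshold."""
    all_items = sorted(set().union(*transactions.values()))
    result = {}
    for size in range(1, len(all_items) + 1):
        for candidate in combinations(all_items, size):
            count = support(candidate, transactions)
            if count >= threshold:
                result[candidate] = count
    return result
```

Calling `frequent_itemsets_bruteforce()` gives a reference answer against which the FP-tree result can be checked.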
3
Apriori
  1. Join Step
  2. Prune Step

Disadvantage 1: It is costly to handle a large
number of candidate sets
Disadvantage 2: It is tedious to repeatedly scan
the database and check the candidate patterns
Counting Step
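
The join, prune and counting steps named on this slide can be sketched as follows. This is a simplified illustration, assuming itemsets are represented as sorted tuples; the function names `apriori_candidates` and `count_candidates` are hypothetical.

```python
from itertools import combinations


def apriori_candidates(frequent_prev):
    """Join + prune: build size-(k+1) candidates from frequent size-k itemsets.

    `frequent_prev` is a collection of sorted tuples, all of length k.
    """
    prev = set(frequent_prev)
    candidates = set()
    for x in prev:
        for y in prev:
            # Join step: merge two itemsets that share their first k-1 items.
            if x[:-1] == y[:-1] and x[-1] < y[-1]:
                cand = x + (y[-1],)
                # Prune step: every size-k subset must itself be frequent.
                if all(sub in prev for sub in combinations(cand, len(x))):
                    candidates.add(cand)
    return candidates


def count_candidates(candidates, transactions):
    """Counting step: one scan of the database for each candidate size."""
    counts = {c: 0 for c in candidates}
    for bought in transactions:
        for c in candidates:
            if set(c) <= set(bought):
                counts[c] += 1
    return counts
```

Both disadvantages are visible here: the candidate set can grow quickly, and `count_candidates` rescans every transaction for each candidate size.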
4
FP-tree
  • Scan the database (two passes: one to count item
    frequencies, one to insert the transactions) to
    store all essential information in a data
    structure called the FP-tree (Frequent Pattern Tree)
  • The FP-tree is concise and is used to generate
    large itemsets directly

5
FP-tree
Step 1: Deduce the ordered frequent items. For items with the same frequency, the order is alphabetical.
Step 2: Construct the FP-tree from the above data.
Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset).
Step 4: Determine the frequent patterns.
6
FP-tree
  • Frequent Itemset Mining

Problem: find all large (or frequent) itemsets,
i.e., itemsets whose support is at least a given
threshold (here, itemsets with support >= 3)
TID Items Bought
100 a, b, c, d, e, f, g, h
200 a, f, g
300 b, d, e, f, j
400 a, b, d, i, k
500 a, b, e, g
9
Threshold = 3

TID  Items Bought            (Ordered) Frequent Items
100  a, b, c, d, e, f, g, h
200  a, f, g
300  b, d, e, f, j
400  a, b, d, i, k
500  a, b, e, g

Item       a  b  c  d  e  f  g  h  i  j  k
Frequency  4
(b through k not yet counted)
10
Threshold = 3

TID  Items Bought            (Ordered) Frequent Items
100  a, b, c, d, e, f, g, h
200  a, f, g
300  b, d, e, f, j
400  a, b, d, i, k
500  a, b, e, g

Item       a  b  c  d  e  f  g  h  i  j  k
Frequency  4  4
(c through k not yet counted)
11
Threshold = 3

TID  Items Bought            (Ordered) Frequent Items
100  a, b, c, d, e, f, g, h
200  a, f, g
300  b, d, e, f, j
400  a, b, d, i, k
500  a, b, e, g

All items:
Item       a  b  c  d  e  f  g  h  i  j  k
Frequency  4  4  1  3  3  3  3  1  1  1  1

Frequent items (frequency >= 3):
Item       a  b  d  e  f  g
Frequency  4  4  3  3  3  3
12
Threshold = 3

TID  Items Bought            (Ordered) Frequent Items
100  a, b, c, d, e, f, g, h  a, b, d, e, f, g
200  a, f, g                 a, f, g
300  b, d, e, f, j           b, d, e, f
400  a, b, d, i, k           a, b, d
500  a, b, e, g              a, b, e, g

All items:
Item       a  b  c  d  e  f  g  h  i  j  k
Frequency  4  4  1  3  3  3  3  1  1  1  1

Frequent items (frequency >= 3):
Item       a  b  d  e  f  g
Frequency  4  4  3  3  3  3
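
Step 1 can be sketched in a few lines of Python: count each item, drop the items below the threshold, and sort what remains in each transaction by descending frequency with ties broken alphabetically. The function name `ordered_frequent_items` is an illustrative choice, not something defined in the slides.

```python
from collections import Counter

# Transactions from the slide (TID -> items bought).
TRANSACTIONS = {
    100: ["a", "b", "c", "d", "e", "f", "g", "h"],
    200: ["a", "f", "g"],
    300: ["b", "d", "e", "f", "j"],
    400: ["a", "b", "d", "i", "k"],
    500: ["a", "b", "e", "g"],
}


def ordered_frequent_items(transactions, threshold):
    """Return (frequent item counts, ordered frequent items per TID)."""
    counts = Counter(item for items in transactions.values() for item in items)
    frequent = {item: n for item, n in counts.items() if n >= threshold}
    ordered = {
        tid: sorted(
            (item for item in items if item in frequent),
            # Descending frequency first, alphabetical order for ties.
            key=lambda item: (-frequent[item], item),
        )
        for tid, items in transactions.items()
    }
    return frequent, ordered
```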
13
FP-tree
Step 1: Deduce the ordered frequent items. For items with the same frequency, the order is alphabetical.
Step 2: Construct the FP-tree from the above data.
Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset).
Step 4: Determine the frequent patterns.
14
Threshold = 3

TID  Items Bought            (Ordered) Frequent Items
100  a, b, c, d, e, f, g, h  a, b, d, e, f, g
200  a, f, g                 a, f, g
300  b, d, e, f, j           b, d, e, f
400  a, b, d, i, k           a, b, d
500  a, b, e, g              a, b, e, g

Header table (Item: head of node-link): a, b, d, e, f, g

FP-tree (empty, before any transaction is inserted):
root
15
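Threshold = 3

TID  Items Bought            (Ordered) Frequent Items
100  a, b, c, d, e, f, g, h  a, b, d, e, f, g
200  a, f, g                 a, f, g
300  b, d, e, f, j           b, d, e, f
400  a, b, d, i, k           a, b, d
500  a, b, e, g              a, b, e, g

Header table (Item: head of node-link): a, b, d, e, f, g

FP-tree after inserting transaction 100 (a, b, d, e, f, g):
root
  a:1 - b:1 - d:1 - e:1 - f:1 - g:1
Inserting transaction 200 (a, f, g) then raises the count of a from 1 to 2.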
16
Threshold = 3

TID  Items Bought            (Ordered) Frequent Items
100  a, b, c, d, e, f, g, h  a, b, d, e, f, g
200  a, f, g                 a, f, g
300  b, d, e, f, j           b, d, e, f
400  a, b, d, i, k           a, b, d
500  a, b, e, g              a, b, e, g

Header table (Item: head of node-link): a, b, d, e, f, g

FP-tree after inserting transactions 100 and 200:
root
  a:2
    b:1 - d:1 - e:1 - f:1 - g:1
    f:1 - g:1
17
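Threshold = 3

TID  Items Bought            (Ordered) Frequent Items
100  a, b, c, d, e, f, g, h  a, b, d, e, f, g
200  a, f, g                 a, f, g
300  b, d, e, f, j           b, d, e, f
400  a, b, d, i, k           a, b, d
500  a, b, e, g              a, b, e, g

Header table (Item: head of node-link): a, b, d, e, f, g

FP-tree after inserting transactions 100 to 400
(transaction 400 raises a from 2 to 3, and b and d under a from 1 to 2):
root
  a:3
    b:2 - d:2 - e:1 - f:1 - g:1
    f:1 - g:1
  b:1 - d:1 - e:1 - f:1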
18
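Threshold = 3

TID  Items Bought            (Ordered) Frequent Items
100  a, b, c, d, e, f, g, h  a, b, d, e, f, g
200  a, f, g                 a, f, g
300  b, d, e, f, j           b, d, e, f
400  a, b, d, i, k           a, b, d
500  a, b, e, g              a, b, e, g

Header table (Item: head of node-link): a, b, d, e, f, g

FP-tree while inserting transaction 500 (a, b, e, g):
a goes from 3 to 4 and b (under a) from 2 to 3; the new e and g nodes are not yet added.
root
  a:4
    b:3 - d:2 - e:1 - f:1 - g:1
    f:1 - g:1
  b:1 - d:1 - e:1 - f:1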
19
Threshold = 3

TID  Items Bought            (Ordered) Frequent Items
100  a, b, c, d, e, f, g, h  a, b, d, e, f, g
200  a, f, g                 a, f, g
300  b, d, e, f, j           b, d, e, f
400  a, b, d, i, k           a, b, d
500  a, b, e, g              a, b, e, g

Header table (Item: head of node-link): a, b, d, e, f, g

Complete FP-tree after inserting all five transactions:
root
  a:4
    b:3
      d:2 - e:1 - f:1 - g:1
      e:1 - g:1
    f:1 - g:1
  b:1 - d:1 - e:1 - f:1
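
The insertion procedure walked through in Step 2 can be sketched as below. This is a minimal illustration, not the presenter's code: the class `FPNode` and the helpers `build_fp_tree`, `render` and `node_link_chain` are hypothetical names, and the header table is modelled as a dictionary from item to the first node of its node-link chain.

```python
class FPNode:
    """One tree node: an item, its count, tree links and a node-link."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}   # item -> child FPNode
        self.next = None     # node-link to the next node holding the same item


# Ordered frequent items of the five transactions (result of Step 1).
ORDERED = [
    ["a", "b", "d", "e", "f", "g"],
    ["a", "f", "g"],
    ["b", "d", "e", "f"],
    ["a", "b", "d"],
    ["a", "b", "e", "g"],
]


def build_fp_tree(ordered_transactions):
    """Insert each ordered transaction; return (root, header table)."""
    root = FPNode(None)
    header = {}  # item -> head of node-link chain
    tails = {}   # item -> last node in the chain, for O(1) appends
    for items in ordered_transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                # No shared prefix from here on: create a new node and
                # append it to the item's node-link chain.
                child = FPNode(item, parent=node)
                node.children[item] = child
                if item in tails:
                    tails[item].next = child
                else:
                    header[item] = child
                tails[item] = child
            child.count += 1
            node = child
    return root, header


def render(node, depth=0):
    """Indented 'item:count' lines, one per node, in insertion order."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(render(child, depth + 1))
    return lines


def node_link_chain(header, item):
    """Follow the node-links from the header table entry for `item`."""
    nodes, node = [], header.get(item)
    while node is not None:
        nodes.append(node)
        node = node.next
    return nodes
```

Printing `"\n".join(render(build_fp_tree(ORDERED)[0]))` shows the same tree as on the slide, one node per line.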
20
FP-tree
Step 1: Deduce the ordered frequent items. For items with the same frequency, the order is alphabetical.
Step 2: Construct the FP-tree from the above data.
Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset).
Step 4: Determine the frequent patterns.
22
Header table (Item: head of node-link): a, b, d, e, f, g

Complete FP-tree:
root
  a:4
    b:3
      d:2 - e:1 - f:1 - g:1
      e:1 - g:1
    f:1 - g:1
  b:1 - d:1 - e:1 - f:1
23
Threshold = 3
[Figure: the complete FP-tree with its header table (a, b, d, e, f, g)]

Cond. FP-tree on g
Paths ending in g collected so far:
(a:1, b:1, d:1, e:1, f:1, g:1)
24
Threshold = 3
[Figure: the complete FP-tree with its header table (a, b, d, e, f, g)]

Cond. FP-tree on g
Paths ending in g collected so far:
(a:1, b:1, d:1, e:1, f:1, g:1)
(a:1, b:1, e:1, g:1)
(a1, b1, e1, g1),
25
Threshold = 3
[Figure: the complete FP-tree with its header table (a, b, d, e, f, g)]

Cond. FP-tree on g
Paths ending in g:
(a:1, b:1, d:1, e:1, f:1, g:1)
(a:1, b:1, e:1, g:1)
(a:1, f:1, g:1)

Item       a  b  d  e  f  g
Frequency  3
(b through g not yet counted)
26
Threshold = 3
[Figure: the complete FP-tree with its header table (a, b, d, e, f, g)]

Cond. FP-tree on g (support of g: 3)

Conditional pattern base of g:
(a:1, b:1, d:1, e:1, f:1, g:1)
(a:1, b:1, e:1, g:1)
(a:1, f:1, g:1)

Item       a  b  d  e  f  g
Frequency  3  2  1  2  2  3

After removing the infrequent items (frequency < 3), each path becomes (a:1, g:1):
Item       a  g
Frequency  3  3

Conditional FP-tree on g (header table: a):
root
  a:3
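
The conditional pattern base can also be computed directly from the ordered transactions, which gives the same prefixes as following the node-links upward in the tree. The sketch below takes that shortcut for brevity; it is an assumption of this example, not the procedure shown on the slides, and unlike the slide notation it leaves the conditioning item itself out of each path. The function names are hypothetical.

```python
from collections import Counter

# Ordered frequent items of the five transactions (result of Step 1).
ORDERED = [
    ["a", "b", "d", "e", "f", "g"],
    ["a", "f", "g"],
    ["b", "d", "e", "f"],
    ["a", "b", "d"],
    ["a", "b", "e", "g"],
]
THRESHOLD = 3


def conditional_pattern_base(item, ordered_transactions=ORDERED):
    """Map each prefix path that precedes `item` to the number of times it occurs."""
    base = Counter()
    for items in ordered_transactions:
        if item in items:
            base[tuple(items[: items.index(item)])] += 1
    return base


def conditional_frequent_items(item, ordered_transactions=ORDERED, threshold=THRESHOLD):
    """Items that stay frequent inside the conditional pattern base of `item`."""
    counts = Counter()
    for prefix, n in conditional_pattern_base(item, ordered_transactions).items():
        for other in prefix:
            counts[other] += n
    return {other: n for other, n in counts.items() if n >= threshold}
```

For g, only a survives with count 3, which is why the conditional FP-tree on g consists of the single node a:3.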
27
Threshold = 3
[Figure: the complete FP-tree with its header table (a, b, d, e, f, g)]

Cond. FP-tree on f
Paths ending in f collected so far:
(a:1, b:1, d:1, e:1, f:1)
28
Threshold = 3
[Figure: the complete FP-tree with its header table (a, b, d, e, f, g)]

Cond. FP-tree on f
Paths ending in f collected so far:
(a:1, b:1, d:1, e:1, f:1)
(a:1, f:1)
29
Threshold = 3
[Figure: the complete FP-tree with its header table (a, b, d, e, f, g)]

Cond. FP-tree on f
Paths ending in f:
(a:1, b:1, d:1, e:1, f:1)
(a:1, f:1)
(b:1, d:1, e:1, f:1)

Item       a  b  d  e  f  g
Frequency  2
(b through g not yet counted)
30
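Threshold = 3
[Figure: the complete FP-tree with its header table (a, b, d, e, f, g)]

Cond. FP-tree on f (support of f: 3)

Conditional pattern base of f:
(a:1, b:1, d:1, e:1, f:1)
(a:1, f:1)
(b:1, d:1, e:1, f:1)

Item       a  b  d  e  f  g
Frequency  2  2  2  2  3  0

After removing the infrequent items (frequency < 3), each path becomes (f:1):
Item       f
Frequency  3

Conditional FP-tree on f (no items in the header table):
root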
31
Threshold = 3
[Figure: the complete FP-tree with its header table (a, b, d, e, f, g)]

Cond. FP-tree on e
Item frequency table for a, b, d, e, f, g (not yet filled in)
32
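Threshold = 3
[Figure: the complete FP-tree with its header table (a, b, d, e, f, g)]

Cond. FP-tree on e (support of e: 3)

Conditional pattern base of e:
(a:1, b:1, d:1, e:1)
(a:1, b:1, e:1)
(b:1, d:1, e:1)

Item       a  b  d  e  f  g
Frequency  2  3  2  3  0  0

After removing the infrequent items (frequency < 3), each path becomes (b:1, e:1):
Item       b  e
Frequency  3  3

Conditional FP-tree on e (header table: b):
root
  b:3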
33
Threshold = 3
[Figure: the complete FP-tree with its header table (a, b, d, e, f, g)]

Cond. FP-tree on d
Item frequency table for a, b, d, e, f, g (not yet filled in)
34
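Threshold = 3
[Figure: the complete FP-tree with its header table (a, b, d, e, f, g)]

Cond. FP-tree on d (support of d: 3)

Conditional pattern base of d:
(a:2, b:2, d:2)
(b:1, d:1)

Item       a  b  d  e  f  g
Frequency  2  3  3  0  0  0

After removing the infrequent items (frequency < 3), the paths become (b:2, d:2) and (b:1, d:1):
Item       b  d
Frequency  3  3

Conditional FP-tree on d (header table: b):
root
  b:3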
35
Threshold = 3
[Figure: the complete FP-tree with its header table (a, b, d, e, f, g)]

Cond. FP-tree on b
Item frequency table for a, b, d, e, f, g (not yet filled in)
36
Threshold = 3
[Figure: the complete FP-tree with its header table (a, b, d, e, f, g)]

Cond. FP-tree on b (support of b: 4)

Conditional pattern base of b:
(a:3, b:3)
(b:1)

Item       a  b  d  e  f  g
Frequency  3  4  0  0  0  0

Both a and b meet the threshold, so no item is removed:
Item       a  b
Frequency  3  4

Conditional FP-tree on b (header table: a):
root
  a:3
37
Threshold = 3
[Figure: the complete FP-tree with its header table (a, b, d, e, f, g)]

Cond. FP-tree on a
Path ending in a:
(a:4)

Item frequency table for a, b, d, e, f, g (not yet filled in)
38
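Threshold = 3
[Figure: the complete FP-tree with its header table (a, b, d, e, f, g)]

Cond. FP-tree on a (support of a: 4)

Conditional pattern base of a:
(a:4)

Item       a  b  d  e  f  g
Frequency  4  0  0  0  0  0

Only a itself is frequent:
Item       a
Frequency  4

Conditional FP-tree on a (no items in the header table):
root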
39
FP-tree
Step 1: Deduce the ordered frequent items. For items with the same frequency, the order is alphabetical.
Step 2: Construct the FP-tree from the above data.
Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset).
Step 4: Determine the frequent patterns.
40
(No Transcript)
41
[Figure: two conditional FP-trees shown so far: one with the single node a:3 (header table: a) and one consisting of the root only]
42
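[Figure: three conditional FP-trees shown so far: one with the single node a:3 (header table: a), one with the single node b:3 (header table: b), and one consisting of the root only]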
43
[Figure: four conditional FP-trees shown so far: two with the single node b:3 (header table: b), one with the single node a:3 (header table: a), and one consisting of the root only]
44
[Figure: all six conditional FP-trees: two with the single node b:3 (header table: b), two with the single node a:3 (header table: a), and two consisting of the root only]
45
Cond. FP-tree on g (support of g: 3)
  Tree: root - a:3 (header table: a)
  1. Before generating this cond. tree, we generate {g} (support = 3)
  2. After generating this cond. tree, we generate {a, g} (support = 3)

Cond. FP-tree on f (support of f: 3)
  Tree: root only
  1. Before generating this cond. tree, we generate {f} (support = 3)
  2. After generating this cond. tree, we do not generate any itemset.

Cond. FP-tree on e (support of e: 3)
  Tree: root - b:3 (header table: b)
  1. Before generating this cond. tree, we generate {e} (support = 3)
  2. After generating this cond. tree, we generate {b, e} (support = 3)

Cond. FP-tree on d (support of d: 3)
  Tree: root - b:3 (header table: b)
  1. Before generating this cond. tree, we generate {d} (support = 3)
  2. After generating this cond. tree, we generate {b, d} (support = 3)

Cond. FP-tree on b (support of b: 4)
  Tree: root - a:3 (header table: a)
  1. Before generating this cond. tree, we generate {b} (support = 4)
  2. After generating this cond. tree, we generate {a, b} (support = 3)

Cond. FP-tree on a (support of a: 4)
  Tree: root only
  1. Before generating this cond. tree, we generate {a} (support = 4)
  2. After generating this cond. tree, we do not generate any itemset.
46
Complexity
  • Complexity in building FP-tree
    • Two scans of the transactions DB
      • Collect frequent items
      • Construct the FP-tree
    • Cost to insert one transaction
      • Number of frequent items in this transaction

47
Size of the FP-tree
  • The size of the FP-tree is bounded by the overall
    occurrences of the frequent items in the database

48
Height of the Tree
  • The height of the tree is bounded by the maximum
    number of frequent items in any transaction in
    the database

49
Compression
  • With respect to the total number of items stored,
    is the FP-tree more compressed than the
    original database?
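
The size bound, the height bound and the compression question can all be checked on the running example. The sketch below stores the tree as nested dictionaries and ignores counts and node-links, so it compares item occurrences only; `build_trie`, `node_count` and `height` are hypothetical helper names.

```python
# Original transactions and their ordered frequent items (threshold 3).
RAW = [
    ["a", "b", "c", "d", "e", "f", "g", "h"],
    ["a", "f", "g"],
    ["b", "d", "e", "f", "j"],
    ["a", "b", "d", "i", "k"],
    ["a", "b", "e", "g"],
]
ORDERED = [
    ["a", "b", "d", "e", "f", "g"],
    ["a", "f", "g"],
    ["b", "d", "e", "f"],
    ["a", "b", "d"],
    ["a", "b", "e", "g"],
]


def build_trie(ordered_transactions):
    """Prefix tree with the same shape as the FP-tree (counts omitted)."""
    root = {}
    for items in ordered_transactions:
        node = root
        for item in items:
            node = node.setdefault(item, {})
    return root


def node_count(tree):
    """Number of item nodes, not counting the root."""
    return sum(1 + node_count(child) for child in tree.values())


def height(tree):
    """Longest root-to-leaf path, measured in item nodes."""
    return max((1 + height(child) for child in tree.values()), default=0)


tree = build_trie(ORDERED)
items_in_database = sum(len(t) for t in RAW)   # items stored in the original DB
size_bound = sum(len(t) for t in ORDERED)      # occurrences of frequent items
height_bound = max(len(t) for t in ORDERED)    # most frequent items in one transaction
```

Here the tree has 14 nodes against a bound of 20 frequent-item occurrences and 25 items in the original database, and its height of 6 equals the height bound. Each node also carries a count and pointers, so the saving in bytes is smaller than the saving in item occurrences.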

50
Details of the Algorithm
  • Procedure FP-growth(Tree, α)
    • if Tree contains a single path P
      • for each combination (denoted by β) of the nodes
        in the path P do
        • generate pattern β ∪ α with support = minimum
          support of nodes in β
    • else
      • for each ai in the header table of Tree do
        • generate pattern β = ai ∪ α with support =
          ai.support
        • construct β's conditional pattern base and then
          β's conditional FP-tree Tree_β
        • if Tree_β ≠ ∅
          • call FP-growth(Tree_β, β)
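
The procedure above can be turned into a compact Python sketch. It follows the same two branches (single path, otherwise one conditional tree per header item), but several details are assumptions of this example rather than part of the slides: the header table keeps a plain list of nodes per item instead of a node-link chain, and `Node`, `build_tree`, `single_path`, `fp_growth` and `mine` are illustrative names.

```python
from collections import defaultdict
from itertools import combinations


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted_patterns, threshold):
    """Build an FP-tree from (items, count) pairs; return (root, header)."""
    freq = defaultdict(int)
    for items, count in weighted_patterns:
        for item in items:
            freq[item] += count
    keep = {item for item, n in freq.items() if n >= threshold}
    root, header = Node(None, None), defaultdict(list)
    for items, count in weighted_patterns:
        # Keep frequent items only, ordered by descending frequency, then name.
        ordered = sorted((i for i in items if i in keep), key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return root, header


def single_path(root):
    """Return the nodes of the tree if it is one chain, otherwise None."""
    path, node = [], root
    while node.children:
        if len(node.children) > 1:
            return None
        node = next(iter(node.children.values()))
        path.append(node)
    return path


def fp_growth(root, header, alpha, threshold, out):
    """Record every frequent pattern that extends `alpha` in `out`."""
    path = single_path(root)
    if path is not None:
        # Single path: every combination beta of its nodes is a pattern.
        for size in range(1, len(path) + 1):
            for beta in combinations(path, size):
                pattern = frozenset(n.item for n in beta) | alpha
                out[pattern] = min(n.count for n in beta)
        return
    for item, nodes in header.items():
        beta = alpha | {item}
        out[beta] = sum(n.count for n in nodes)
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for n in nodes:
            prefix, p = [], n.parent
            while p.item is not None:
                prefix.append(p.item)
                p = p.parent
            if prefix:
                base.append((prefix, n.count))
        sub_root, sub_header = build_tree(base, threshold)
        if sub_root.children:  # conditional tree is not empty
            fp_growth(sub_root, sub_header, beta, threshold, out)


def mine(transactions, threshold):
    """Frequent itemsets (as frozensets) mapped to their support."""
    root, header = build_tree([(t, 1) for t in transactions], threshold)
    out = {}
    fp_growth(root, header, frozenset(), threshold, out)
    return out
```

Running `mine` on the five example transactions with threshold 3 yields the ten itemsets listed on the summary slide: the six single items plus {a, b}, {a, g}, {b, d} and {b, e}.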