Mining Frequent Patterns without Candidate Generation

Slides: 19
Provided by: present421
Learn more at: http://alumni.cs.ucr.edu

1
Mining Frequent Patterns without Candidate Generation
  • Jiawei Han, Jian Pei, and Yiwen Yin

Abdullah Mueen
2
Problem: Mining Frequent Patterns
  • I = {a1, a2, ..., am} is a set of items.
  • DB = {T1, T2, ..., Tn} is the database of
    transactions, where each transaction is a
    non-empty subset of I.
  • A pattern is also a subset of I.
  • A pattern is frequent if it is contained in
    (supported by) at least a fixed number (ξ) of
    transactions.
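
The definitions above can be illustrated with a minimal Python sketch. The toy database and the names `support` and `is_frequent` are hypothetical, and the sketch assumes the threshold is inclusive (a pattern whose support equals the minimum support counts as frequent), which is how the later example with minimum support 3 behaves.

```python
# Hypothetical toy database: each transaction is a non-empty set of items.
DB = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]


def support(pattern, db):
    """Count the transactions that contain every item of `pattern`."""
    return sum(1 for transaction in db if pattern <= transaction)


def is_frequent(pattern, db, min_support):
    """Assumption: a pattern is frequent when support >= min_support."""
    return support(pattern, db) >= min_support


print(support({"a", "c"}, DB))         # transactions 1, 2 and 4 contain both items
print(is_frequent({"a", "c"}, DB, 3))
print(is_frequent({"b", "d"}, DB, 2))
```

Counting support this way needs one pass over the database per pattern, which is the cost that the rest of the presentation tries to avoid.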

3
Previous Work: Apriori
  • It may need to generate a huge number of
    candidate itemsets. To discover a frequent
    pattern of size k, it needs to generate more than
    2^k candidates in total.
  • It may need to scan the database repeatedly to
    check the frequencies of the candidates.

4
FP-growth
  • FP-growth mines frequent patterns without
    generating candidate sets. It grows
    patterns from fragments.
  • It builds an extended prefix tree (FP-tree) for
    the transaction database. This tree is a
    compressed representation of the database, so
    mining it avoids repeated scans of the database.

5
FP-tree
TID | Items Bought    | Frequent Items (ordered)
100 | f,a,c,d,g,i,m,p | f,c,a,m,p
200 | a,b,c,f,l,m,o   | f,c,a,b,m
300 | b,f,h,j,o       | f,b
400 | b,c,k,s,p       | c,b,p
500 | a,f,c,e,l,p,m,n | f,c,a,m,p
Minimum support (ξ) = 3
Frequent items are sorted in descending order of frequency.
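
The "Frequent Items" column can be reproduced with a short Python sketch. It counts item frequencies, keeps the items that reach the minimum support of 3, and lists each transaction's frequent items in descending order of frequency. Several items tie (f and c at 4; a, b, m, p at 3), so the `TIE_BREAK` string is an assumption copied from the order used in the table; another tie order would give an equally valid layout.

```python
from collections import Counter

# Transactions from the table above, keyed by TID.
DB = {
    100: "f a c d g i m p".split(),
    200: "a b c f l m o".split(),
    300: "b f h j o".split(),
    400: "b c k s p".split(),
    500: "a f c e l p m n".split(),
}
MIN_SUPPORT = 3
TIE_BREAK = "fcabmp"  # assumption: tie order taken from the table

# First scan: count how many transactions contain each item.
counts = Counter(item for items in DB.values() for item in items)

# Keep frequent items and sort them by descending count.
frequent = [item for item in counts if counts[item] >= MIN_SUPPORT]
order = sorted(frequent, key=lambda i: (-counts[i], TIE_BREAK.index(i)))

# Second scan: project every transaction onto the ordered frequent items.
ordered = {tid: [i for i in order if i in items] for tid, items in DB.items()}

for tid, items in ordered.items():
    print(tid, ",".join(items))
```

Inserting the ordered lists into a prefix tree, with shared prefixes merged and counted, yields the FP-tree.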
6
Conditional FP-tree of p
Minimum support (ξ) = 3
Conditional pattern base for p:
Items Bought | Frequent Items
f,c,a,m      | c
f,c,a,m      | c
c,b          | c
Conditional FP-tree of p: a single node c with count 3.
The set of frequent patterns containing p is {p, cp}.
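
The projection behind a conditional pattern base can be sketched in Python. This is a simplification under stated assumptions: it works directly on the ordered frequent-item lists instead of walking an actual FP-tree, and the helper names are hypothetical.

```python
from collections import Counter

# Ordered frequent items of the five transactions (one character per item).
ORDERED = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
MIN_SUPPORT = 3


def conditional_pattern_base(item, ordered_db):
    """Return the prefix that precedes `item` in each transaction containing it."""
    return [t[: t.index(item)] for t in ordered_db if item in t]


def frequent_in_base(base, min_support):
    """Count items inside the base and keep those reaching min_support."""
    counts = Counter(i for prefix in base for i in prefix)
    return {i: n for i, n in counts.items() if n >= min_support}


base_p = conditional_pattern_base("p", ORDERED)
print(base_p)                                 # prefixes of p
print(frequent_in_base(base_p, MIN_SUPPORT))  # items that survive in p's conditional tree

base_m = conditional_pattern_base("m", ORDERED)
print(frequent_in_base(base_m, MIN_SUPPORT))
```

Only c survives in the base of p, so the frequent patterns containing p are p itself and cp.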
7
Frequent patterns containing m
Conditional pattern base for m:
Items Bought | Frequent Items
f,c,a        | f,c,a
f,c,a        | f,c,a
f,c,a,b      | f,c,a
Conditional FP-tree of m: single path f:3, c:3, a:3

Conditional pattern base for am:
Items | Freq. Items
f,c   | f,c
f,c   | f,c
f,c   | f,c
Conditional FP-tree of am: single path f:3, c:3

Conditional pattern base for cam:
Items | Freq. Items
f     | f
f     | f
f     | f
Conditional FP-tree of cam: single node f:3

The set of frequent patterns containing m is
{m, am, cam, fcam, fam, cm, fcm, fm}.
8
Complete Frequent Pattern Set
Generated by the conditional FP-tree of m, which is a
single path
  • A single path generates every combination of its
    nodes as a frequent pattern.
  • The support of a pattern is equal to the minimum
    support among the nodes in it.
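
The single-path rule can be sketched in Python. The function name and argument layout are hypothetical. The path f:3, c:3, a:3 is the conditional FP-tree of m from the example, and the sketch also reports the suffix m on its own, with its own support, so that the result covers every frequent pattern containing m.

```python
from itertools import combinations


def patterns_from_single_path(path, suffix, suffix_support):
    """Enumerate the patterns generated by a single-path conditional FP-tree.

    path: list of (item, count) pairs from the root down to the leaf.
    suffix: set of items the tree is conditioned on.
    """
    result = {frozenset(suffix): suffix_support}
    for size in range(1, len(path) + 1):
        for combo in combinations(path, size):
            pattern = frozenset(item for item, _ in combo) | frozenset(suffix)
            # Support is the minimum count among the chosen nodes.
            result[pattern] = min(count for _, count in combo)
    return result


m_patterns = patterns_from_single_path([("f", 3), ("c", 3), ("a", 3)], {"m"}, 3)
for pattern, sup in sorted(m_patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(pattern)), sup)
```

A path with n nodes yields 2^n - 1 non-empty combinations, so the three-node path gives seven patterns plus m itself.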

9
Pseudocode
Procedure FP-growth(Tree, α)
  if Tree contains a single path P
    for each combination β of the nodes in P
      generate pattern β ∪ α with support = minimum
      support of the nodes in β
  else
    for each ai in the header of Tree do
      generate pattern β = ai ∪ α with support =
      ai.support
      construct β's conditional pattern base and
      β's conditional FP-tree Tree_β
      if Tree_β ≠ ∅
        call FP-growth(Tree_β, β)
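
A compact runnable version of the procedure might look like the following Python sketch. It is not the authors' implementation: the class and function names are hypothetical, ties in item frequency are broken alphabetically, and the single-path shortcut is left out, so every tree goes through the general branch. That shortcut is an optimization, so the set of patterns produced is the same.

```python
from collections import defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs and return (root, header)."""
    counts = defaultdict(int)
    for items, n in weighted_transactions:
        for item in set(items):
            counts[item] += n
    frequent = {i for i, c in counts.items() if c >= min_support}

    def rank(item):  # descending frequency, alphabetical on ties
        return (-counts[item], item)

    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for items, n in weighted_transactions:
        node = root
        for item in sorted(frequent & set(items), key=rank):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.count += n
            node = child
    return root, header


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Yield (pattern, support) for every frequent pattern extending `suffix`."""
    _, header = build_tree(weighted_transactions, min_support)
    for item, nodes in header.items():
        pattern = suffix | {item}
        yield pattern, sum(n.count for n in nodes)
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for n in nodes:
            path, p = [], n.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, n.count))
        if base:
            yield from fp_growth(base, min_support, pattern)


# The five transactions from the FP-tree example, each with weight 1.
DB = [(list(t), 1) for t in ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]]
result = dict(fp_growth(DB, 3))
for pattern, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(pattern)), sup)
```

Rebuilding a small tree for each conditional pattern base keeps the sketch short; a production implementation would add node links and the single-path case to avoid redundant work.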

10
Implementation issues
  • Different support thresholds (ξ) yield
    different FP-trees. We may choose ξ = 20 if 98% of
    the queries have ξ ≥ 20.
  • Updating the FP-tree after each new transaction
    may be costly. We may instead count the occurrence
    frequency of every item and update the tree only
    when the relative frequency of an item changes
    substantially.

11
New Challenges
  • FP-growth may output a large number of frequent
    patterns for a small ξ and very few frequent
    patterns for a large ξ. We may not know
    the right ξ for our purpose.
  • Which frequent patterns are good candidates for
    generating interesting association rules?

12
Top-K frequent closed patterns
  • A closed pattern is a pattern whose support is
    strictly larger than that of every proper
    super-pattern.

TID | Items Bought    | Frequent Items
100 | f,a,c,d,g,i,m,p | f,c,a,m,p
200 | a,b,c,f,l,m,o   | f,c,a,b,m
300 | b,f,h,j,o       | f,b
400 | b,c,k,s,p       | f,c,b,p
500 | a,f,c,e,l,p,m,n | f,c,a,m,p
  • We can also specify the minimum length of the
    patterns.
  • The top-2 frequent closed patterns with length ≥ 2
    are fc and fcam.
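
The notions of closed pattern and top-k selection with a minimum length can be illustrated with a brute-force Python sketch. It is not the FP-tree-based method that the presentation describes: it enumerates every itemset, which is only feasible for tiny inputs. The database is a hypothetical toy example, not the table above, and the function names are assumptions.

```python
from itertools import combinations

# Hypothetical toy database (one character per item).
DB = [set("abc"), set("abc"), set("ab"), set("ad"), set("bcd")]


def closed_patterns(db):
    """Map every closed pattern with non-zero support to its support."""
    items = sorted(set().union(*db))
    closed = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            covering = [t for t in db if pattern <= t]
            # Closed: no extra item occurs in every covering transaction,
            # so every proper super-pattern has strictly smaller support.
            if covering and frozenset(set.intersection(*covering)) == pattern:
                closed[pattern] = len(covering)
    return closed


def top_k_closed(db, k, min_length):
    """Return the k closed patterns of length >= min_length with highest support."""
    candidates = [(p, s) for p, s in closed_patterns(db).items() if len(p) >= min_length]
    candidates.sort(key=lambda ps: (-ps[1], sorted(ps[0])))  # ties: alphabetical
    return candidates[:k]


for pattern, sup in top_k_closed(DB, k=2, min_length=2):
    print("".join(sorted(pattern)), sup)
```

In this toy database the item c is not closed, because every transaction containing c also contains b, so bc has the same support as c.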

13
Mining Top-K closed FP
  • The algorithm starts with an FP-tree built with a
    support threshold of 0.
  • While building the tree, it prunes the short
    patterns with length < min_length.
  • After the tree is built, it prunes the relatively
    infrequent patterns by raising the support
    threshold.
  • Mining is performed on the final pruned FP-tree.

14
Compressed Frequent Pattern
  • FP-growth may end up with a large set of
    patterns.
  • We can compress the set of frequent patterns by
    grouping it into as few clusters as possible and
    selecting a representative pattern from each
    cluster.

fcam, cam, ap, b
15
Clustering Criterion
  • For each cluster there must be a representative
    pattern Pr.
  • D(P, Pr) ≤ δ for all patterns P inside the cluster
    of Pr.
  • D(P1, P2) = 1 - |T(P1) ∩ T(P2)| / |T(P1) ∪ T(P2)|
  • T(P) is the set of transactions that support P.
  • D is a valid distance metric on closed patterns.
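
The distance D(P1, P2) = 1 - |T(P1) ∩ T(P2)| / |T(P1) ∪ T(P2)| can be computed directly from the transaction sets, as in this minimal Python sketch. The function names are hypothetical, the data is the five-transaction example with the items as bought, and the sketch assumes at least one of the two patterns has non-zero support.

```python
# Transactions from the FP-tree example, keyed by TID (one character per item).
DB = {
    100: set("facdgimp"),
    200: set("abcflmo"),
    300: set("bfhjo"),
    400: set("bcksp"),
    500: set("afcelpmn"),
}


def supporting_transactions(pattern, db):
    """T(P): the TIDs of the transactions that contain every item of P."""
    return {tid for tid, items in db.items() if set(pattern) <= items}


def distance(p1, p2, db):
    """Jaccard distance between the supporting transaction sets of p1 and p2."""
    t1 = supporting_transactions(p1, db)
    t2 = supporting_transactions(p2, db)
    union = t1 | t2
    if not union:
        raise ValueError("neither pattern is supported by any transaction")
    return 1 - len(t1 & t2) / len(union)


print(distance("fcam", "cam", DB))  # identical transaction sets
print(distance("c", "fc", DB))      # fc misses transaction 400
print(distance("p", "m", DB))
```

Two patterns at distance 0 are supported by exactly the same transactions, so one can represent the other without losing support information.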

16
Summary
  • The FP-tree is an extended prefix tree that
    summarizes the database in a compressed form.
  • FP-growth is an algorithm for mining frequent
    patterns using the FP-tree.
  • The FP-tree can also be used to mine top-k frequent
    closed patterns and compressed frequent patterns.

17
References
  • Jiawei Han, Jian Pei, and Yiwen Yin. Mining
    Frequent Patterns without Candidate Generation.
  • Jiawei Han, Jianyong Wang, Ying Lu, and Petre
    Tzvetkov. Mining Top-K Frequent Closed Patterns
    without Minimum Support.
  • Dong Xin, Jiawei Han, Xifeng Yan, and Hong Cheng.
    Mining Compressed Frequent-Pattern Sets.

18
Thank You