Efficient Algorithms for Mining Share-Frequent Itemsets

1
Efficient Algorithms for Mining Share-Frequent Itemsets

Authors: Y. C. Li, J. S. Yeh and C. C. Chang
Speaker: Yu-Chiang Li
Date: July 28, 2005

2
Outline

Introduction
Related Work
Enhanced Fast Share Measure (EFSM) Algorithm
Support-Counted Fast Share Measure (SuFSM) Algorithm
Share-Counted Fast Share Measure (ShFSM) Algorithm
Experimental Results
Conclusions

3
Introduction (1/2)

Goal: discovering the buying patterns of customers
Itemset: a group of items (products) bought together in a transaction
Support: the ratio of the number of transactions containing the itemset to the total number of transactions (gives limited information, since purchased quantities are ignored)
Share: the ratio of the total count of the items in the itemset to the total count of items in the database
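
To make the contrast between the two measures concrete, here is a minimal Python sketch. The four transactions, their quantities and the helper names `support` and `share` are illustrative assumptions, not data or code from the presentation; share is counted over the transactions that contain the whole itemset.

```python
# Hypothetical transactions: item -> quantity purchased (illustrative data only).
DB = [
    {"A": 1, "B": 1},
    {"A": 5, "B": 4},
    {"C": 1, "D": 1},
    {"C": 1, "D": 1},
]

def support(itemset, db):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in db if set(itemset).issubset(t)) / len(db)

def share(itemset, db):
    """Quantity of the itemset's items in the transactions containing it,
    divided by the total quantity of all items in the database."""
    total = sum(sum(t.values()) for t in db)
    local = sum(t[i] for t in db if set(itemset).issubset(t) for i in itemset)
    return local / total

for name in ("AB", "CD"):
    print(name, support(name, DB), round(share(name, DB), 3))
```

Both itemsets have support 0.5, but AB accounts for 11 of the 15 purchased units while CD accounts for only 4, which is the information that support alone cannot convey.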

4
Introduction (2/2)

Share-confidence framework: provides useful information about the numerical values associated with transaction items (Carter et al., 1997)
A share-frequent (SH-frequent) itemset usually includes some infrequent subsets
The Fast Share Measure (FSM) algorithm discovers share-frequent itemsets efficiently on small datasets
This study proposes Enhanced FSM (EFSM), SuFSM and ShFSM to discover share-frequent itemsets more efficiently than FSM

5
Related Work

Support-Confidence Framework (Agrawal et al., 1993)
Each item is a binary variable denoting whether the item was purchased
Apriori (Agrawal & Srikant, 1994) and Apriori-like algorithms
Pattern-growth algorithms (Han et al., 2000; Han et al., 2004)
Share-Confidence Framework (Carter et al., 1997)
The support-confidence framework does not analyze the exact number of products purchased
The support count method does not measure the profit or cost of an itemset
Exhaustive search algorithm (Carter et al., 2000)
FSM algorithm (Li et al., 2005)

6
Related Work

Apriori algorithm (Agrawal and Srikant, 1994)
minSup = 40%

7
Share-Confidence Framework

Measure value: mv(i_p, T_q)
mv(D, T01) = 1
mv(C, T03) = 3
Transaction measure value: tmv(T_q)
tmv(T02) = 9
Total measure value: Tmv(DB)
Tmv(DB) = 44
Itemset measure value: imv(X, T_q)
imv({A, E}, T02) = 4
Local measure value: lmv(X)
lmv(BC) = 2 + 4 + 5 = 11

Itemset share: SH(X)
SH(BC) = 11/44 = 25%
SH-frequent: if SH(X) ≥ minShare, X is a share-frequent (SH-frequent) itemset

minShare = 30%
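
The measures defined above can be sketched in a few lines of Python. The function names follow the slide notation (`tmv`, `Tmv`, `imv`, `lmv`, `SH`), but the database below is a hypothetical one made up for illustration; the database table shown on the original slides is not part of this transcript.

```python
# Hypothetical database (not the one on the slides): TID -> {item: measure value}.
DB = {
    "T01": {"A": 1, "B": 1, "C": 2},
    "T02": {"B": 2, "C": 2, "D": 1},
    "T03": {"A": 3, "D": 1},
    "T04": {"B": 1, "C": 3, "D": 2, "E": 1},
}

def tmv(tid):
    """Transaction measure value: sum of all measure values in one transaction."""
    return sum(DB[tid].values())

def Tmv():
    """Total measure value of the whole database."""
    return sum(tmv(tid) for tid in DB)

def imv(X, tid):
    """Itemset measure value: 0 unless the transaction contains every item of X."""
    t = DB[tid]
    return sum(t[i] for i in X) if all(i in t for i in X) else 0

def lmv(X):
    """Local measure value: imv(X, T) summed over all transactions."""
    return sum(imv(X, tid) for tid in DB)

def SH(X):
    """Itemset share: lmv(X) / Tmv(DB)."""
    return lmv(X) / Tmv()

min_share = 0.30  # assumed threshold for this toy example
print(lmv("BC"), Tmv(), SH("BC") >= min_share)
```

In this toy database lmv(BC) = 3 + 4 + 4 = 11 and Tmv(DB) = 20, so SH(BC) = 55% and BC is SH-frequent at a 30% threshold.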
9
Existing algorithms

ZP (Zero Pruning), ZSP (Zero Subset Pruning)
Variants of exhaustive search
Prune the candidate itemsets whose local measure values are exactly zero
FSM (Fast Share Measure) (Li et al., 2005)
Fast on a small dataset
Generates too many candidates
Existing algorithms are inefficient on large datasets

10
ZP Algorithm
11
ZSP Algorithm
12
FSM: Fast Share Measure Algorithm

ML: maximum transaction length in DB
MV: maximum measure value in DB
Let min_lmv = ⌈minShare × Tmv⌉
Let CF_FSM(X) = lmv(X) + (lmv(X)/k) × MV × (ML - k)
If CF_FSM(X) < min_lmv, all supersets of X are infrequent

13
FSM: Fast Share Measure Algorithm

minShare = 30%, ML = 6, MV = 3, Tmv = 44
min_lmv = 14
Prune X if CF_FSM(X) < min_lmv
Let X = {A, B, C}
CF_FSM(X) = 3 + (3/3) × 3 × (6 - 3) = 12 < 14 = min_lmv
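
The worked example on this slide can be checked with a short Python sketch. The function names `min_lmv` and `cf_fsm` are hypothetical, and rounding min_lmv up to the next integer is an assumption inferred from the slide values (30% of 44 is 13.2, while the slide gives 14). Exact fractions are used so that the threshold is not affected by floating-point rounding.

```python
import math
from fractions import Fraction

def min_lmv(min_share, total_mv):
    """Smallest integer local measure value that reaches minShare.
    The ceiling is an assumption inferred from the slide (0.30 * 44 = 13.2 -> 14)."""
    return math.ceil(Fraction(min_share) * total_mv)

def cf_fsm(lmv_x, k, mv_max, ml):
    """FSM critical function: lmv(X) + (lmv(X)/k) * MV * (ML - k)."""
    return lmv_x + Fraction(lmv_x, k) * mv_max * (ml - k)

# Values from the slide: minShare = 30%, Tmv = 44, X = {A, B, C}, lmv(X) = 3.
threshold = min_lmv("0.30", 44)
cf = cf_fsm(lmv_x=3, k=3, mv_max=3, ml=6)
prune = cf < threshold
print(threshold, cf, prune)  # 14 12 True
```

Since 12 < 14, X = {A, B, C} and all of its supersets can be discarded without counting them.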

14
Enhanced FSM (EFSM) Algorithm

EFSM: instead of joining two arbitrary itemsets in RC_(k-1), EFSM joins each itemset of RC_(k-1) with a single item in RC_1 to generate C_k efficiently
Reduces the time complexity of the join step from O(n^(2k-2)) to O(n^k)
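
A minimal sketch of this join step, assuming itemsets are stored as frozensets; the function name `efsm_join` and the sample sets are hypothetical, and details of the real algorithm (such as how the RC sets are maintained or ordered) are not given on the slide.

```python
def efsm_join(rc_prev, rc1):
    """Extend each (k-1)-itemset in rc_prev with one single item from rc1
    that it does not already contain, yielding candidate k-itemsets."""
    candidates = set()
    for itemset in rc_prev:
        for item in rc1:
            if item not in itemset:
                candidates.add(itemset | {item})
    return candidates

rc1 = ["A", "B", "C", "D"]                 # single items (assumed RC_1)
rc2 = [frozenset("AB"), frozenset("BC")]   # 2-itemsets (assumed RC_2)
c3 = efsm_join(rc2, rc1)
print(sorted("".join(sorted(c)) for c in c3))  # ['ABC', 'ABD', 'BCD']
```

Each of the |RC_(k-1)| itemsets is paired with at most |RC_1| items, rather than with every other itemset in RC_(k-1), which is where the reduction in join cost comes from.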

15
SuFSM (Support-counted FSM)

X^(k+1): an arbitrary superset of X with length k+1 in DB
S(X^(k+1)): the set containing all X^(k+1) in DB
dbS(X^(k+1)): the set of transactions each of which contains at least one X^(k+1)
SuFSM and ShFSM are derived from EFSM and prune the candidates more efficiently than FSM
SuFSM (Support-counted FSM)
Theorem 1. If lmv(X) + Sup(S(X^(k+1))) × MV × (ML - k) < min_lmv, all supersets of X are infrequent

16
SuFSM (Support-counted FSM)

lmv(X)/k ≥ Sup(X) ≥ Sup(S(X^(k+1)))
Ex. lmv(BCD)/k = 15/3 = 5, Sup(BCD) = 3, Sup(S(BCD^(k+1))) = 2
No superset of X is an SH-frequent itemset if any of the following three inequalities holds
lmv(X) + (lmv(X)/k) × MV × (ML - k) < min_lmv
lmv(X) + Sup(X) × MV × (ML - k) < min_lmv
lmv(X) + Sup(S(X^(k+1))) × MV × (ML - k) < min_lmv
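
The three quantities compared on this slide can be computed as follows. This is a sketch on a hypothetical four-transaction database (not the one used in the presentation), with Sup counted as a number of transactions and Sup(S(X^(k+1))) taken as the number of transactions that contain X plus at least one further item.

```python
# Hypothetical database: each transaction maps item -> measure value.
DB = [
    {"A": 1, "B": 1, "C": 1, "D": 1},
    {"B": 2, "C": 1, "D": 3},
    {"B": 1, "C": 2, "D": 1, "E": 1},
    {"A": 2, "E": 1},
]
X = {"B", "C", "D"}
k = len(X)

ML = max(len(t) for t in DB)                  # maximum transaction length
MV = max(v for t in DB for v in t.values())   # maximum measure value

containing = [t for t in DB if X.issubset(t)]
lmv_x = sum(t[i] for t in containing for i in X)
sup_x = len(containing)                                 # Sup(X)
sup_super = sum(1 for t in containing if len(t) > k)    # Sup(S(X^(k+1)))

# Critical functions of FSM and SuFSM (multiplying before dividing keeps this exact here).
cf_fsm = lmv_x + lmv_x * MV * (ML - k) / k
cf_sufsm = lmv_x + sup_super * MV * (ML - k)

print(lmv_x / k, sup_x, sup_super)
print(cf_fsm, cf_sufsm)
```

Here lmv(X) = 13, Sup(X) = 3 and Sup(S(X^(k+1))) = 2, so the SuFSM bound (19) is tighter than the FSM bound (26) and prunes at least as many candidates for any given min_lmv.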

17
ShFSM (Share-counted FSM)

dbS(X^(k+1)): the set of transactions each of which contains at least one X^(k+1)
ShFSM (Share-counted FSM)
Theorem 2. If Tmv(dbS(X^(k+1))) < min_lmv, all supersets of X are infrequent
FSM: lmv(X) + (lmv(X)/k) × MV × (ML - k) < min_lmv
SuFSM: lmv(X) + Sup(S(X^(k+1))) × MV × (ML - k) < min_lmv
ShFSM: Tmv(dbS(X^(k+1))) < min_lmv
CF_FSM(X) ≥ CF_SuFSM(X) ≥ CF_ShFSM(X)

Ex. X = BCD
CF_FSM(X) = 9 + (9/3) × 3 × (6 - 3) = 36
CF(X)SuFSM 923(6-3)18
CF_ShFSM(X) = 6 + 8 = 14

19
ShFSM (Share-counted FSM)

Ex. X = AB
Tmv(dbS(X^(k+1))) = tmv(T01) + tmv(T05) = 6 + 6 = 12 < 14 = min_lmv
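
The ShFSM bound used in this example can be sketched as below. The database, the 30% threshold and the function names are assumptions chosen for illustration (the slide's own database is not reproduced in this transcript), and min_lmv is again rounded up to an integer as an assumption.

```python
import math
from fractions import Fraction

# Hypothetical database: TID -> {item: measure value}.
DB = {
    "T01": {"A": 1, "B": 1, "C": 1, "D": 1},
    "T02": {"A": 2, "B": 1},
    "T03": {"B": 2, "C": 1, "D": 3},
    "T04": {"A": 1, "B": 2, "E": 1},
    "T05": {"C": 3, "E": 2},
}

def tmv(t):
    """Transaction measure value of one transaction."""
    return sum(t.values())

def cf_shfsm(X, db):
    """Tmv(dbS(X^(k+1))): total measure value of the transactions that
    contain X together with at least one additional item."""
    k = len(X)
    return sum(tmv(t) for t in db.values()
               if set(X).issubset(t) and len(t) > k)

total = sum(tmv(t) for t in DB.values())
min_lmv = math.ceil(Fraction("0.30") * total)  # assumed minShare = 30%

for X in ("AB", "AE"):
    bound = cf_shfsm(X, DB)
    print(X, bound, "prune" if bound < min_lmv else "keep")
```

With Tmv = 22 and min_lmv = 7, the bound for AB is 4 + 4 = 8, so its supersets must still be examined, while the bound for AE is 4, so every superset of AE can be pruned. The bound is valid because any proper superset of X can only occur in transactions of dbS(X^(k+1)), and its local measure value cannot exceed their total measure value.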

20
Experimental Results (1/3)

PC: Pentium IV 1.5 GHz, 1.5 GB SDRAM, running Windows XP Professional
All algorithms were coded in VC++ 6.0

Figure 1
Figure 2
21
Experimental Results (2/3)
minShare = 0.1
Figure 3
Figure 4
22
Experimental Results (3/3)

T6.I4.D100k.N200.S10
minShare = 0.1
ML = 20, MV = 10
Tmv = 2,302,443

23
Conclusions

This study proposes the Enhanced FSM (EFSM) algorithm, which reduces the time complexity of the join step
We have also developed SuFSM and ShFSM from EFSM
SuFSM and ShFSM prune the candidates efficiently and significantly improve performance
The experimental results indicate that ShFSM has the best performance
In the future, we plan to develop more advanced algorithms to accelerate the identification of all share-frequent itemsets

24
Thank You
