Title: Efficient Algorithms for Mining Share-Frequent Itemsets
- Authors: Y. C. Li, J. S. Yeh, and C. C. Chang
- Speaker: Yu-Chiang Li
- Date: July 28, 2005
Outline
- Introduction
- Related Work
- Enhanced Fast Share Measure (EFSM) Algorithm
- Support-Counted Fast Share Measure (SuFSM) Algorithm
- Share-Counted Fast Share Measure (ShFSM) Algorithm
- Experimental Results
- Conclusions
Introduction (1/2)
- Goal: discovering the buying patterns of customers
- Itemset: a group of items (products) bought together in a transaction
- Support: the ratio of the number of transactions containing the itemset to the total number of transactions (gives limited informative feedback)
- Share: the ratio of the total count (e.g. quantity) of the itemset's items, summed over the transactions that contain it, to the total count of all items in the database
Introduction (2/2)
- Share-confidence framework: provides useful information about the numerical values associated with transaction items (Carter et al., 1997)
- A share-frequent (SH-frequent) itemset usually includes some infrequent subsets, so the downward-closure property used by Apriori-style pruning does not hold
- The Fast Share Measure (FSM) algorithm discovers share-frequent itemsets efficiently on small datasets
- This study proposes Enhanced FSM (EFSM), SuFSM, and ShFSM to discover share-frequent itemsets more efficiently than FSM
Related Work
- Support-Confidence Framework (Agrawal et al., 1993)
  - Each item is a binary variable denoting whether the item was purchased
  - Apriori (Agrawal and Srikant, 1994) and Apriori-like algorithms
  - Pattern-growth algorithms (Han et al., 2000; Han et al., 2004)
- Share-Confidence Framework (Carter et al., 1997)
  - The support-confidence framework does not analyze the exact number of products purchased
  - The support count method does not measure the profit or cost of an itemset
  - Exhaustive search algorithm (Carter et al., 2000)
  - FSM algorithm (Li et al., 2005)
Related Work
Apriori algorithm (Agrawal and Srikant, 1994)
minSup = 40%
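Share-Confidence Framework
- Measure value $mv(i_p, T_q)$: the value (e.g. quantity) of item $i_p$ in transaction $T_q$
  - $mv(D, T_{01}) = 1$
  - $mv(C, T_{03}) = 3$
- Transaction measure value $tmv(T_q) = \sum_{i_p \in T_q} mv(i_p, T_q)$
  - $tmv(T_{02}) = 9$
- Total measure value $Tmv(DB) = \sum_{T_q \in DB} tmv(T_q)$
  - $Tmv(DB) = 44$
- Itemset measure value $imv(X, T_q) = \sum_{i_p \in X} mv(i_p, T_q)$, for $X \subseteq T_q$
  - $imv(\{A, E\}, T_{02}) = 4$
- Local measure value $lmv(X) = \sum_{T_q \in db_X} imv(X, T_q)$, where $db_X$ is the set of transactions containing X
  - $lmv(\{B, C\}) = 2 + 4 + 5 = 11$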
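These measures can be made concrete in a few lines of Python. This is a minimal sketch on a hypothetical four-transaction database (not the table behind the slide's numbers), assuming each transaction is stored as a dict from item to measure value (e.g. purchased quantity); the function names simply mirror the notation above.

```python
# Hypothetical toy database (NOT the table used in the slides):
# each transaction maps an item to its measure value, e.g. a purchased quantity.
DB = {
    "T1": {"A": 1, "B": 2, "C": 1},
    "T2": {"B": 3, "D": 1},
    "T3": {"A": 2, "C": 2, "D": 1},
    "T4": {"B": 1, "C": 1},
}


def mv(item, tid, db=DB):
    """Measure value of `item` in transaction `tid` (0 if the item is absent)."""
    return db[tid].get(item, 0)


def tmv(tid, db=DB):
    """Transaction measure value: sum of all measure values in one transaction."""
    return sum(db[tid].values())


def total_mv(db=DB):
    """Total measure value Tmv(DB) over all transactions."""
    return sum(tmv(tid, db) for tid in db)


def imv(itemset, tid, db=DB):
    """Itemset measure value imv(X, Tq); only defined when Tq contains X."""
    t = db[tid]
    if not set(itemset).issubset(t):
        raise ValueError(f"{tid} does not contain {sorted(itemset)}")
    return sum(t[i] for i in itemset)


def lmv(itemset, db=DB):
    """Local measure value: sum of imv(X, Tq) over the transactions containing X."""
    x = set(itemset)
    return sum(imv(x, tid, db) for tid, t in db.items() if x.issubset(t))


print(mv("B", "T2"), tmv("T3"), total_mv(), imv({"A", "C"}, "T3"), lmv({"B", "C"}))
# → 3 5 15 4 5
```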
- Itemset share $SH(X) = lmv(X) / Tmv(DB)$
  - $SH(\{B, C\}) = 11/44 = 25\%$
- SH-frequent: if $SH(X) \ge minShare$, X is a share-frequent (SH-frequent) itemset
minShare = 30%
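A small sketch of the share test, on a hypothetical database chosen for illustration: {A, B} reaches minShare = 30% while its subset {A} does not, which shows concretely why share-frequency lacks the downward-closure property that Apriori-style pruning relies on. Exact fractions are used (our choice) so that values sitting exactly on the threshold are not distorted by floating-point rounding.

```python
from fractions import Fraction

# Hypothetical toy database (not from the slides); values are purchased quantities.
DB = {
    "T1": {"A": 1, "B": 4},
    "T2": {"A": 1, "B": 3, "C": 1},
    "T3": {"C": 2, "D": 3},
    "T4": {"D": 5},
}


def lmv(itemset, db):
    """Local measure value of X: X's measure values summed over transactions containing X."""
    x = set(itemset)
    return sum(sum(t[i] for i in x) for t in db.values() if x.issubset(t))


def share(itemset, db):
    """SH(X) = lmv(X) / Tmv(DB), kept as an exact fraction."""
    total = sum(sum(t.values()) for t in db.values())
    return Fraction(lmv(itemset, db), total)


def is_sh_frequent(itemset, db, min_share):
    """X is SH-frequent when SH(X) >= minShare."""
    return share(itemset, db) >= min_share


MIN_SHARE = Fraction(30, 100)
for x in ({"A"}, {"B"}, {"A", "B"}):
    print(sorted(x), share(x, DB), is_sh_frequent(x, DB, MIN_SHARE))
# → ['A'] 1/10 False
#   ['B'] 7/20 True
#   ['A', 'B'] 9/20 True
```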
Existing algorithms
- ZP (Zero Pruning) and ZSP (Zero Subset Pruning)
  - Variants of exhaustive search
  - Prune the candidate itemsets whose local measure values are exactly zero
- FSM (Fast Share Measure) (Li et al., 2005)
  - Fast on small datasets
  - Generates too many candidates
- Existing algorithms are inefficient on large datasets
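The zero-pruning idea can be sketched as a level-wise search. This is a hypothetical illustration, not the original ZP/ZSP procedures: the prefix join and the subset check are assumptions made for the sketch. Dropping itemsets with lmv(X) = 0 is safe because any superset of an itemset that occurs in no transaction also occurs in none; still, every itemset that does occur is evaluated, which is why such exhaustive variants scale poorly.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy database (not from the slides); values are purchased quantities.
DB = {
    "T1": {"A": 1, "B": 4},
    "T2": {"A": 1, "B": 3, "C": 1},
    "T3": {"C": 2, "D": 3},
    "T4": {"D": 5},
}


def lmv(itemset, db):
    """Local measure value of X over the transactions that contain X."""
    x = set(itemset)
    return sum(sum(t[i] for i in x) for t in db.values() if x.issubset(t))


def zero_pruning_search(db, min_share):
    """Level-wise search that never extends itemsets with lmv(X) == 0.

    Sketch only: the prefix join and the subset check are illustrative
    assumptions, not the original ZP/ZSP code.
    """
    total = sum(sum(t.values()) for t in db.values())
    items = sorted({i for t in db.values() for i in t})
    level = [(i,) for i in items if lmv((i,), db) > 0]
    frequent, k = [], 1
    while level:
        frequent += [x for x in level if Fraction(lmv(x, db), total) >= min_share]
        alive = set(level)
        k += 1
        # Join two surviving (k-1)-itemsets that share their first k-2 items.
        joined = {a + (b[-1],) for a in level for b in level
                  if a[:-1] == b[:-1] and a[-1] < b[-1]}
        # Zero-subset pruning: skip candidates with a zero-lmv (k-1)-subset.
        candidates = [c for c in sorted(joined)
                      if all(s in alive for s in combinations(c, k - 1))]
        # Zero pruning: keep only candidates that occur in some transaction.
        level = [c for c in candidates if lmv(c, db) > 0]
    return frequent


print(zero_pruning_search(DB, Fraction(30, 100)))
# → [('B',), ('D',), ('A', 'B')]
```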
ZP Algorithm
ZSP Algorithm
FSM: Fast Share Measure Algorithm
- ML: maximum transaction length in DB
- MV: maximum measure value in DB
- Let $min\_lmv = minShare \times Tmv(DB)$
- Let $CF_{FSM}(X) = lmv(X) + (lmv(X)/k) \times MV \times (ML - k)$, where $k = |X|$
- If $CF_{FSM}(X) < min\_lmv$, all supersets of X are infrequent
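FSM: Fast Share Measure Algorithm
- minShare = 30%, ML = 6, MV = 3, Tmv = 44
- $min\_lmv = 14$ ($0.3 \times 44 = 13.2$, rounded up because lmv values are integers)
- Prune X if $CF_{FSM}(X) < min\_lmv$
- Let X = {A, B, C}
- $CF_{FSM}(X) = 3 + (3/3) \times 3 \times (6 - 3) = 12 < 14 = min\_lmv$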
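The FSM pruning test for this example can be checked numerically. The helper names are hypothetical, and rounding min_lmv up to an integer is our reading of the slide (0.3 × 44 = 13.2 versus min_lmv = 14); it is safe when lmv values are integers, because an integer lmv reaches 13.2 exactly when it reaches 14.

```python
import math
from fractions import Fraction


def min_lmv(min_share, total_mv):
    """Smallest integer lmv that reaches minShare: ceil(minShare * Tmv)."""
    return math.ceil(min_share * total_mv)


def cf_fsm(lmv_x, k, mv_max, ml):
    """FSM bound CF_FSM(X) = lmv(X) + (lmv(X)/k) * MV * (ML - k)."""
    return lmv_x + Fraction(lmv_x, k) * mv_max * (ml - k)


def fsm_prunes(lmv_x, k, mv_max, ml, threshold):
    """True if CF_FSM(X) < min_lmv, i.e. no superset of X can be SH-frequent."""
    return cf_fsm(lmv_x, k, mv_max, ml) < threshold


# Slide example: minShare = 30%, Tmv = 44, ML = 6, MV = 3,
# X = {A, B, C} with k = 3 and lmv(X) = 3.
threshold = min_lmv(Fraction(30, 100), 44)
print(threshold, cf_fsm(3, 3, 3, 6), fsm_prunes(3, 3, 3, 6, threshold))
# → 14 12 True
```

Using `Fraction` keeps `lmv(X)/k` exact, so a bound such as 13.5 is compared against 14 without rounding surprises.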
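Enhanced FSM (EFSM) Algorithm
- Instead of joining two arbitrary itemsets in $RC_{k-1}$, EFSM joins an arbitrary itemset of $RC_{k-1}$ with a single item in $RC_1$ to generate $C_k$ efficiently
- Reduces the time complexity of the join step from $O(n^{2k-2})$ to $O(n^k)$: with n items, $|RC_{k-1}| = O(n^{k-1})$, so joining pairs costs $O(n^{k-1}) \times O(n^{k-1})$, while joining with $RC_1$ costs $O(n^{k-1}) \times O(n)$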
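The EFSM join step might be sketched as follows, assuming $RC_{k-1}$ is a list of sorted tuples and $RC_1$ a list of items. To generate each candidate once, this sketch extends an itemset only with items larger than its last item; that ordering rule is our assumption, not something stated on the slide. It loses nothing under FSM-style pruning: if a candidate's (k−1)-prefix was pruned, every superset of that prefix is already known to be infrequent. The double loop performs $|RC_{k-1}| \times |RC_1|$ steps.

```python
def efsm_join(rc_prev, rc1):
    """Generate C_k by extending each itemset of RC_{k-1} with one item of RC_1.

    Assumptions for this sketch: itemsets are sorted tuples, and an itemset is
    extended only with items larger than its last item, so every candidate is
    produced once. The double loop costs |RC_{k-1}| * |RC_1| steps.
    """
    items = sorted(rc1)
    return [x + (i,) for x in rc_prev for i in items if i > x[-1]]


rc2 = [("A", "B"), ("A", "C"), ("B", "C")]  # hypothetical surviving 2-itemsets
rc1 = ["A", "B", "C", "D"]                  # hypothetical surviving items
print(efsm_join(rc2, rc1))
# → [('A', 'B', 'C'), ('A', 'B', 'D'), ('A', 'C', 'D'), ('B', 'C', 'D')]
```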
SuFSM (Support-counted FSM)
- $X^{k+1}$: an arbitrary superset of X with length k+1 in DB
- $S(X^{k+1})$: the set that contains all $X^{k+1}$ in DB
- $dbS(X^{k+1})$: the set of transactions each of which contains at least one $X^{k+1}$
- SuFSM and ShFSM are derived from EFSM and prune candidates more efficiently than FSM
- SuFSM (Support-counted FSM)
  - Theorem 1. If $lmv(X) + Sup(S(X^{k+1})) \times MV \times (ML - k) < min\_lmv$, all supersets of X are infrequent
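SuFSM (Support-counted FSM)
- $lmv(X)/k \ge Sup(X) \ge Sup(S(X^{k+1}))$, since every measure value is a positive integer and every transaction in $dbS(X^{k+1})$ contains X
  - Ex.: $lmv(BCD)/k = 15/3 = 5$, $Sup(BCD) = 3$, $Sup(S(BCD^{k+1})) = 2$
- Each of the following three inequalities implies that no superset of X is an SH-frequent itemset; by the chain above, the first implies the second and the second implies the third, so the third prunes the most candidates
  - $lmv(X) + (lmv(X)/k) \times MV \times (ML - k) < min\_lmv$
  - $lmv(X) + Sup(X) \times MV \times (ML - k) < min\_lmv$
  - $lmv(X) + Sup(S(X^{k+1})) \times MV \times (ML - k) < min\_lmv$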
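To see the chain $lmv(X)/k \ge Sup(X) \ge Sup(S(X^{k+1}))$ and the three left-hand sides on data, here is a sketch on a hypothetical five-transaction database. It assumes a transaction contains some (k+1)-superset of X exactly when it contains X and has more than k items, so $Sup(S(X^{k+1}))$ is computed as the number of such transactions; the function names are illustrative.

```python
from fractions import Fraction

# Hypothetical toy database (not from the slides).
DB = {
    "T1": {"A": 1, "B": 2, "C": 1},
    "T2": {"A": 2, "B": 2},
    "T3": {"A": 1, "B": 1, "D": 3},
    "T4": {"C": 2, "D": 1},
    "T5": {"A": 3, "B": 1},
}
ML = max(len(t) for t in DB.values())                 # maximum transaction length
MV = max(v for t in DB.values() for v in t.values())  # maximum measure value


def containing(x, db):
    """Transactions (as dicts) that contain every item of x."""
    return [t for t in db.values() if set(x).issubset(t)]


def suf_bounds(x, db):
    """Return the chain (lmv/k, Sup(X), Sup(S(X^{k+1}))) and the three left-hand sides."""
    k = len(x)
    rows = containing(x, db)
    lmv_x = sum(sum(t[i] for i in x) for t in rows)
    sup_x = len(rows)                           # Sup(X)
    # Assumption: a transaction contains some (k+1)-superset of X exactly when
    # it contains X and has more than k items.
    sup_s = sum(1 for t in rows if len(t) > k)  # Sup(S(X^{k+1}))
    chain = (Fraction(lmv_x, k), sup_x, sup_s)
    tail = MV * (ML - k)  # at most ML - k further items, each worth at most MV
    lhs = tuple(lmv_x + c * tail for c in chain)
    return chain, lhs


print(suf_bounds(("A", "B"), DB))
# → ((Fraction(13, 2), 4, 2), (Fraction(65, 2), 25, 19))
```

The SuFSM value (19) is the smallest of the three, so it triggers pruning most often.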
ShFSM (Share-counted FSM)
- $dbS(X^{k+1})$: the set of transactions each of which contains at least one $X^{k+1}$
- ShFSM (Share-counted FSM)
  - Theorem 2. If $Tmv(dbS(X^{k+1})) < min\_lmv$, all supersets of X are infrequent
- FSM: $lmv(X) + (lmv(X)/k) \times MV \times (ML - k) < min\_lmv$
- SuFSM: $lmv(X) + Sup(S(X^{k+1})) \times MV \times (ML - k) < min\_lmv$
- ShFSM: $Tmv(dbS(X^{k+1})) < min\_lmv$
- $CF_{FSM}(X) \ge CF_{SuFSM}(X) \ge CF_{ShFSM}(X)$
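The ordering of the three pruning values can be checked on a hypothetical five-transaction database. The sketch computes all three for X = {A, B} and applies them with an illustrative minShare = 60% (min_lmv = 12, our choice); only the ShFSM value falls below min_lmv. It assumes, as a simplification, that $dbS(X^{k+1})$ is the set of transactions containing X plus at least one more item. Note that the ShFSM test concerns proper supersets: X itself has lmv(X) = 13 and stays SH-frequent, while its supersets {A, B, C} and {A, B, D} have lmv values 4 and 5.

```python
import math
from fractions import Fraction

# Hypothetical toy database (not from the slides).
DB = {
    "T1": {"A": 1, "B": 2, "C": 1},
    "T2": {"A": 2, "B": 2},
    "T3": {"A": 1, "B": 1, "D": 3},
    "T4": {"C": 2, "D": 1},
    "T5": {"A": 3, "B": 1},
}
ML = max(len(t) for t in DB.values())                 # 3
MV = max(v for t in DB.values() for v in t.values())  # 3


def pruning_values(x, db):
    """Return (CF_FSM, CF_SuFSM, CF_ShFSM) for itemset x."""
    k = len(x)
    rows = [t for t in db.values() if set(x).issubset(t)]  # transactions containing X
    lmv_x = sum(sum(t[i] for i in x) for t in rows)
    # Assumed dbS(X^{k+1}): transactions containing X plus at least one more item.
    db_s = [t for t in rows if len(t) > k]
    tail = MV * (ML - k)
    cf_fsm = lmv_x + Fraction(lmv_x, k) * tail
    cf_sufsm = lmv_x + len(db_s) * tail
    cf_shfsm = sum(sum(t.values()) for t in db_s)  # Tmv(dbS(X^{k+1}))
    return cf_fsm, cf_sufsm, cf_shfsm


total = sum(sum(t.values()) for t in DB.values())  # Tmv(DB) = 20
min_lmv = math.ceil(Fraction(60, 100) * total)     # illustrative minShare = 60% -> 12
cfs = pruning_values(("A", "B"), DB)
print(cfs, [cf < min_lmv for cf in cfs])
# → (Fraction(65, 2), 19, 9) [False, False, True]
```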
- FSM: $lmv(X) + (lmv(X)/k) \times MV \times (ML - k) < min\_lmv$
- SuFSM: $lmv(X) + Sup(S(X^{k+1})) \times MV \times (ML - k) < min\_lmv$
- ShFSM: $Tmv(dbS(X^{k+1})) < min\_lmv$
- Ex.: X = {B, C, D}
- $CF_{FSM}(X) = 9 + (9/3) \times 3 \times (6 - 3) = 36$
- $CF_{SuFSM}(X) = 9 + 2 \times 3 \times (6 - 3) = 27$
- $CF_{ShFSM}(X) = 6 + 8 = 14$
ShFSM (Share-counted FSM)
- Ex.: X = {A, B}
- $Tmv(dbS(X^{k+1})) = tmv(T_{01}) + tmv(T_{05}) = 6 + 6 = 12 < 14 = min\_lmv$
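The two ShFSM examples can be checked with a one-line test (the function name is hypothetical). The {B, C, D} case lands exactly on min_lmv = 14, so under the strict "<" in Theorem 2 its supersets are not pruned, whereas those of {A, B} (12 < 14) are.

```python
def shfsm_prunes(dbs_tmvs, min_lmv):
    """Theorem 2 test: True if Tmv(dbS(X^{k+1})) < min_lmv, so supersets of X can be pruned.

    `dbs_tmvs` holds tmv(T) for each transaction T in dbS(X^{k+1}).
    """
    return sum(dbs_tmvs) < min_lmv


MIN_LMV = 14  # from the slide example: minShare = 30% of Tmv = 44, rounded up
print(shfsm_prunes([6, 6], MIN_LMV))  # X = {A, B}: tmv(T01) + tmv(T05) = 12
# → True
print(shfsm_prunes([6, 8], MIN_LMV))  # X = {B, C, D}: 6 + 8 = 14
# → False
```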
Experimental Results (1/3)
- PC: Pentium IV 1.5 GHz, 1.5 GB SDRAM, running Windows XP Professional
- All algorithms were coded in Visual C++ 6.0
Figure 1
Figure 2
Experimental Results (2/3)
minShare = 0.1
Figure 3
Figure 4
Experimental Results (3/3)
- Dataset: T6.I4.D100k.N200.S10
- minShare = 0.1
- ML = 20, MV = 10
- Tmv = 2,302,443
Conclusions
- This study proposes the Enhanced FSM (EFSM) algorithm, which reduces the time complexity of the join step
- We have also developed SuFSM and ShFSM from EFSM
- SuFSM and ShFSM prune candidates efficiently and significantly improve performance
- The experimental results indicate that ShFSM performs best
- In future work, we plan to develop more advanced algorithms to accelerate the identification of all share-frequent itemsets