Efficient Algorithms for Mining Share-Frequent Itemsets


1
Efficient Algorithms for Mining Share-Frequent
Itemsets
  • Authors: Y. C. Li, J. S. Yeh, and C. C. Chang
  • Speaker: Yu-Chiang Li
  • Date: July 28, 2005

2
Outline
  • Introduction
  • Related Work
  • Enhanced Fast Share Measure (EFSM) Algorithm
  • Support-Counted Fast Share Measure (SuFSM)
    Algorithm
  • Share-Counted Fast Share Measure (ShFSM)
    Algorithm
  • Experimental Results
  • Conclusions

3
Introduction (1/2)
  • Goal: discovering the buying patterns of
    customers
  • Itemset: a group of items (products) bought
    together in a transaction
  • Support: the ratio of the number of transactions
    containing the itemset to the total number of
    transactions (conveys only limited information)
  • Share: the ratio of the total count of items in
    the itemset to the total count of items in the
    database
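
The difference between support and share can be illustrated with a minimal sketch. The database below is hypothetical (it is not the example used in the talk), and the function names `support` and `share` are chosen here for illustration only.

```python
# Hypothetical toy database: each transaction maps an item to the
# quantity purchased. This is NOT the example database from the slides.
DB = {
    "T1": {"A": 1, "B": 4},
    "T2": {"B": 2, "C": 3},
    "T3": {"A": 1, "C": 1},
    "T4": {"B": 5},
}


def support(itemset, db):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in db.values() if itemset.issubset(t))
    return hits / len(db)


def share(itemset, db):
    """Quantity of the itemset's items, counted only in transactions that
    contain the whole itemset, divided by the total quantity in the database."""
    total = sum(sum(t.values()) for t in db.values())
    local = sum(
        sum(t[item] for item in itemset)
        for t in db.values()
        if itemset.issubset(t)
    )
    return local / total


for item in ("A", "C"):
    print(item, support({item}, DB), share({item}, DB))
```

In this toy data, A and C have the same support (2 of 4 transactions) but different shares (2/17 versus 4/17), which is the kind of information the support measure alone does not convey.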

4
Introduction (2/2)
  • Share-confidence framework: provides useful
    information about numerical values associated
    with transaction items (Carter et al., 1997)
  • A share-frequent (SH-frequent) itemset usually
    includes some infrequent subsets
  • The Fast Share Measure (FSM) algorithm discovers
    share-frequent itemsets efficiently on small
    datasets
  • This study proposes Enhanced FSM (EFSM), SuFSM,
    and ShFSM to discover share-frequent itemsets
    more efficiently than FSM

5
Related Work
  • Support-Confidence Framework (Agrawal et al.,
    1993)
      • Each item is a binary variable denoting
        whether the item was purchased
      • Apriori (Agrawal & Srikant, 1994) and
        Apriori-like algorithms
      • Pattern-growth algorithms (Han et al., 2000;
        Han et al., 2004)
  • Share-Confidence Framework (Carter et al., 1997)
      • The support-confidence framework does not
        analyze the exact number of products purchased
      • The support count method does not measure the
        profit or cost of an itemset
      • Exhaustive search algorithm (Carter et al., 2000)
      • FSM algorithm (Li et al., 2005)

6
Related Work
Apriori algorithm (Agrawal and Srikant, 1994)
minSup = 40%
7
Share-Confidence Framework
  • Measure value: mv(i_p, T_q)
      • mv(D, T01) = 1
      • mv(C, T03) = 3
  • Transaction measure value: tmv(T_q)
      • tmv(T02) = 9
  • Total measure value: Tmv(DB)
      • Tmv(DB) = 44
  • Itemset measure value: imv(X, T_q)
      • imv({A, E}, T02) = 4
  • Local measure value: lmv(X)
      • lmv(BC) = 2 + 4 + 5 = 11
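
The measure values listed above can be sketched in a few lines of Python. The example table from the talk is not part of this transcript, so the database below is hypothetical, and the function names simply mirror the slide notation (`total_mv` stands in for Tmv).

```python
# Hypothetical database (not the slides' example): item -> purchased quantity.
DB = {
    "T01": {"A": 1, "B": 2, "D": 1},
    "T02": {"B": 1, "C": 3},
    "T03": {"A": 2, "C": 1, "D": 2},
}


def mv(item, tid, db):
    """Measure value of one item in one transaction (0 if the item is absent)."""
    return db[tid].get(item, 0)


def tmv(tid, db):
    """Transaction measure value: sum of all measure values in the transaction."""
    return sum(db[tid].values())


def total_mv(db):
    """Total measure value Tmv(DB): sum of tmv over all transactions."""
    return sum(tmv(tid, db) for tid in db)


def imv(itemset, tid, db):
    """Itemset measure value: sum of the itemset's measure values in a
    transaction that contains the whole itemset, otherwise 0."""
    if not set(itemset).issubset(db[tid]):
        return 0
    return sum(mv(item, tid, db) for item in itemset)


def lmv(itemset, db):
    """Local measure value: sum of imv over all transactions."""
    return sum(imv(itemset, tid, db) for tid in db)


print(tmv("T02", DB), total_mv(DB), imv({"A", "D"}, "T01", DB), lmv({"A", "D"}, DB))
# prints: 4 13 2 6
```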

8
  • Itemset share: SH(X) = lmv(X) / Tmv(DB)
      • SH(BC) = 11/44 = 25%
  • SH-frequent: if SH(X) ≥ minShare, X is a
    share-frequent (SH-frequent) itemset

minShare = 30%
9
Existing algorithms
  • ZP (Zero Pruning) and ZSP (Zero Subset Pruning)
      • Variants of exhaustive search
      • Prune the candidate itemsets whose local
        measure values are exactly zero
  • FSM (Fast Share Measure) (Li et al., 2005)
      • Fast on small datasets
      • Generates too many candidates
  • Existing algorithms are inefficient on large
    datasets

10
ZP Algorithm
11
ZSP Algorithm
12
FSM: Fast Share Measure Algorithm
  • ML: maximum transaction length in DB
  • MV: maximum measure value in DB
  • Let min_lmv = ⌈minShare × Tmv⌉
  • Let CF_FSM(X) = lmv(X) + (lmv(X)/k) × MV × (ML - k),
    where k is the length of X
  • If CF_FSM(X) < min_lmv, all supersets of X are
    infrequent

13
FSM: Fast Share Measure Algorithm
  • minShare = 30%, ML = 6, MV = 3, Tmv = 44
  • min_lmv = 14
  • Prune X if CF_FSM(X) < min_lmv
  • Let X = {A, B, C}
  • CF_FSM(X) = 3 + (3/3) × 3 × (6 - 3) = 12 < 14 = min_lmv
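
The worked example above can be checked with a short sketch. The function names `min_lmv` and `cf_fsm` are illustrative, and rounding min_lmv up to an integer is an assumption inferred from the slide's values (0.30 × 44 = 13.2, while the slide gives 14).

```python
import math


def min_lmv(min_share, total_mv):
    """Minimum local measure value; rounding up is an assumption
    inferred from the slide's numbers (0.30 * 44 = 13.2 -> 14)."""
    return math.ceil(min_share * total_mv)


def cf_fsm(lmv_x, k, mv_max, ml):
    """FSM critical function: lmv(X) + (lmv(X)/k) * MV * (ML - k)."""
    return lmv_x + (lmv_x / k) * mv_max * (ml - k)


threshold = min_lmv(0.30, 44)
bound = cf_fsm(lmv_x=3, k=3, mv_max=3, ml=6)

# X = {A, B, C} and all of its supersets are pruned when the bound
# falls below the threshold.
print(bound, threshold, bound < threshold)
# prints: 12.0 14 True
```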

14
Enhanced FSM (EFSM) Algorithm
  • EFSM: instead of joining two arbitrary itemsets
    in RC_(k-1), EFSM joins each itemset of RC_(k-1)
    with a single item in RC_1 to generate C_k
    efficiently
  • Reduces the time complexity from O(n^(2k-2)) to
    O(n^k)
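
The join described above might be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `efsm_join` and the use of frozensets are assumptions, and the subsequent pruning of candidates is omitted.

```python
def efsm_join(rc_prev, rc1):
    """Generate candidate k-itemsets by extending each (k-1)-itemset in
    rc_prev with one single item from rc1 that it does not already contain."""
    candidates = set()
    for itemset in rc_prev:
        for item in rc1:
            if item not in itemset:
                # frozenset | set yields a frozenset, so it can be stored in a set
                candidates.add(itemset | {item})
    return candidates


# Hypothetical inputs: two retained 2-itemsets and four retained single items.
rc2 = {frozenset("AB"), frozenset("BC")}
rc1 = {"A", "B", "C", "D"}
c3 = efsm_join(rc2, rc1)
print(sorted("".join(sorted(c)) for c in c3))
# prints: ['ABC', 'ABD', 'BCD']
```

The loop examines |RC_(k-1)| × |RC_1| pairs, rather than every pair of itemsets in RC_(k-1).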

15
SuFSM (Support-counted FSM)
  • X^(k+1): an arbitrary superset of X with length
    k+1 in DB
  • S(X^(k+1)): the set that contains all X^(k+1) in DB
  • dbS(X^(k+1)): the set of transactions, each of
    which contains at least one X^(k+1)
  • SuFSM and ShFSM are derived from EFSM and prune
    candidates more efficiently than FSM
  • SuFSM (Support-counted FSM)
  • Theorem 1. If lmv(X) + Sup(S(X^(k+1))) × MV × (ML - k) <
    min_lmv, all supersets of X are infrequent

16
SuFSM (Support-counted FSM)
  • lmv(X)/k ≥ Sup(X) ≥ Sup(S(X^(k+1)))
  • Ex. lmv(BCD)/k = 15/3 = 5, Sup(BCD) = 3,
    Sup(S(BCD^(k+1))) = 2
  • If no superset of X is an SH-frequent itemset,
    then the following three inequalities hold
      • lmv(X) + (lmv(X)/k) × MV × (ML - k) < min_lmv
      • lmv(X) + Sup(X) × MV × (ML - k) < min_lmv
      • lmv(X) + Sup(S(X^(k+1))) × MV × (ML - k) < min_lmv
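
The ordering of the three quantities on this slide (5, 3, and 2 in the example) can be justified in two steps. The derivation below is a sketch under two assumptions that the slides do not state explicitly: every purchased quantity is at least 1, and Sup denotes a transaction count rather than a ratio.

```latex
% Assumption: mv(i_p, T_q) \ge 1 whenever i_p \in T_q; Sup(\cdot) is a count.
\begin{align*}
lmv(X) &= \sum_{T_q \supseteq X} \; \sum_{i_p \in X} mv(i_p, T_q)
        \;\ge\; \sum_{T_q \supseteq X} k \;=\; k \cdot Sup(X)
        &&\Longrightarrow\quad \frac{lmv(X)}{k} \ge Sup(X), \\
dbS(X^{k+1}) &\subseteq \{\, T_q : X \subseteq T_q \,\}
        &&\Longrightarrow\quad Sup(X) \ge Sup\bigl(S(X^{k+1})\bigr).
\end{align*}
```

Because the three critical functions differ only in this middle factor, a smaller factor gives a smaller bound and therefore prunes more candidates.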

17
ShFSM (Share-counted FSM)
  • dbS(X^(k+1)): the set of transactions, each of
    which contains at least one X^(k+1)
  • ShFSM (Share-counted FSM)
  • Theorem 2. If Tmv(dbS(X^(k+1))) < min_lmv, all
    supersets of X are infrequent
  • FSM: lmv(X) + (lmv(X)/k) × MV × (ML - k) < min_lmv
  • SuFSM: lmv(X) + Sup(S(X^(k+1))) × MV × (ML - k) < min_lmv
  • ShFSM: Tmv(dbS(X^(k+1))) < min_lmv
  • CF_FSM(X) ≥ CF_SuFSM(X) ≥ CF_ShFSM(X)
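
The three critical functions can be compared side by side in a minimal sketch. The database is hypothetical (not the example from the talk), and the helper names are illustrative. The sketch reads dbS(X^(k+1)) as the transactions that contain X plus at least one further item, and Sup(S(X^(k+1))) as the number of such transactions, which is one interpretation of the definitions above.

```python
# Hypothetical database: item -> purchased quantity per transaction.
DB = {
    "T1": {"A": 1, "B": 1, "C": 2},
    "T2": {"A": 2, "B": 1},
    "T3": {"A": 1, "B": 3, "D": 1},
    "T4": {"C": 3, "D": 2},
}


def critical_values(x, db):
    """Return (CF_FSM, CF_SuFSM, CF_ShFSM) for itemset x."""
    k = len(x)
    ml = max(len(t) for t in db.values())                      # ML
    mv_max = max(q for t in db.values() for q in t.values())   # MV
    with_x = [t for t in db.values() if x.issubset(t)]
    lmv_x = sum(t[item] for t in with_x for item in x)
    # Transactions holding X and at least one more item, i.e. some X^(k+1).
    db_s = [t for t in with_x if len(t) > k]

    cf_fsm = lmv_x + (lmv_x / k) * mv_max * (ml - k)
    cf_sufsm = lmv_x + len(db_s) * mv_max * (ml - k)
    cf_shfsm = sum(sum(t.values()) for t in db_s)              # Tmv(dbS)
    return cf_fsm, cf_sufsm, cf_shfsm


print(critical_values({"A", "B"}, DB))
# prints: (22.5, 15, 9)
```

For X = {A, B} in this toy data the bounds tighten from 22.5 to 15 to 9, matching the ordering FSM ≥ SuFSM ≥ ShFSM.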

18
  • FSM: lmv(X) + (lmv(X)/k) × MV × (ML - k) < min_lmv
  • SuFSM: lmv(X) + Sup(S(X^(k+1))) × MV × (ML - k) < min_lmv
  • ShFSM: Tmv(dbS(X^(k+1))) < min_lmv
  • Ex. X = BCD
  • CF_FSM(X) = 9 + (9/3) × 3 × (6 - 3) = 36
  • CF_SuFSM(X) = 9 + 2 × 3 × (6 - 3) = 27
  • CF_ShFSM(X) = 6 + 8 = 14

19
ShFSM (Share-counted FSM)
  • Ex. X = AB
  • Tmv(dbS(X^(k+1))) = tmv(T01) + tmv(T05) = 6 + 6 = 12 < 14 =
    min_lmv

20
Experimental Results (1/3)
  • PC: Pentium IV 1.5 GHz, 1.5 GB SDRAM, running
    Windows XP Professional
  • All algorithms were coded in Visual C++ 6.0

Figure 1
Figure 2
21
Experimental Results (2/3)
minShare = 0.1
Figure 3
Figure 4
22
Experimental Results (3/3)
  • T6.I4.D100k.N200.S10
  • minShare = 0.1
  • ML = 20, MV = 10
  • Tmv = 2,302,443

23
Conclusions
  • This study proposes the Enhanced FSM (EFSM)
    algorithm to efficiently reduce the time
    complexity of the join step
  • We have also developed SuFSM and ShFSM from EFSM
  • SuFSM and ShFSM can efficiently prune the
    candidates, and significantly improve the
    performance
  • The experimental results have indicated that
    ShFSM has the best performance
  • In the future, we plan to develop even more
    advanced algorithms to accelerate the process of
    identifying all share-frequent itemsets

24
Thank You