Fast Frequent Itemset Mining using Compressed Data Representation - PowerPoint PPT Presentation

1
Fast Frequent Itemset Mining using Compressed
Data Representation
  • Raj P. Gopalan, Yudho Giri Sucahyo
  • School of Computing
  • Curtin University of Technology
  • Bentley, Western Australia 6102
  • {raj, sucahyoy}@computing.edu.au

2
Outline
  • Introduction
  • Data Structure
  • Running Examples
  • Performance Study
  • Comparison with other algorithms
  • Further Work
  • Conclusion

3
Introduction
  • Association rule mining
    • Finds interesting patterns or relationships among
      items in a given data set.
  • Two steps
    • Step 1: Find the frequent itemsets.
    • Step 2: Use the result of Step 1 to generate
      association rules.
  • Step 1 is computationally very expensive.
  • It has therefore been the focus of significant
    research effort.
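Step 2 is comparatively cheap once Step 1 has produced the frequent itemsets and their supports. The sketch below assumes that output is a Python dictionary from frozenset itemsets to support fractions; the function name `generate_rules`, the minimum-confidence parameter and the sample supports are all hypothetical.

```python
from itertools import combinations


def generate_rules(supports, min_conf):
    """Step 2: derive rules lhs -> rhs from frequent itemsets and their supports.

    `supports` maps frozenset itemsets to support (fraction of transactions).
    Every subset of a frequent itemset is also frequent, so the support of
    each antecedent is assumed to be present in the mapping.
    """
    rules = []
    for itemset, supp in supports.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset of the itemset as the antecedent.
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                lhs = frozenset(antecedent)
                rhs = itemset - lhs
                conf = supp / supports[lhs]
                if conf >= min_conf:
                    rules.append((lhs, rhs, supp, conf))
    return rules


# Hypothetical Step 1 output: supports as fractions of all transactions.
frequent = {
    frozenset({"A"}): 0.125,
    frozenset({"B"}): 0.25,
    frozenset({"A", "B"}): 0.10,
}
for lhs, rhs, supp, conf in generate_rules(frequent, min_conf=0.5):
    print(sorted(lhs), "->", sorted(rhs), f"support={supp:.0%}", f"confidence={conf:.0%}")
```

Here only A ⇒ B passes the 50% confidence threshold; B ⇒ A has a confidence of 40%.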

4
Finding Frequent Itemsets
  • Two general approaches
    • Candidate generation-and-test
      • Apriori (Agrawal et al., SIGMOD93) and its
        variants
    • Pattern growth approach
      • FP-Growth (Han et al., SIGMOD00)
      • H-Mine (Pei et al., ICDM01)
  • Data structures
    • Array-based H-struct in H-Mine
    • Tree-based FP-Tree in FP-Growth
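To contrast with candidate generation-and-test, the pattern growth idea can be sketched with simple recursive database projection: count the items in the current (projected) database, then, for each frequent item, recurse on the transactions that contain it. This is a hypothetical illustration, not FP-Growth or H-Mine themselves, which work on the FP-Tree and the H-struct respectively; it assumes transactions are lists of mutually comparable items and a minimum support given as an absolute count.

```python
from collections import Counter


def pattern_growth(transactions, min_count, prefix=()):
    """Yield (itemset, count) for every frequent itemset extending `prefix`.

    No candidate itemsets are generated: each call only counts the items
    that actually occur in its projected database.
    """
    counts = Counter(item for t in transactions for item in set(t))
    # Only frequent items can extend the current prefix.
    frequent = sorted(i for i, c in counts.items() if c >= min_count)
    for item in frequent:
        itemset = prefix + (item,)
        yield itemset, counts[item]
        # Project: keep transactions containing `item`, restricted to items
        # that sort after it, so each itemset is enumerated exactly once.
        projected = [[x for x in t if x > item] for t in transactions if item in t]
        yield from pattern_growth(projected, min_count, itemset)


db = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"]]
for itemset, count in pattern_growth(db, min_count=2):
    print(itemset, count)
```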

5
Contributions
  • Propose a new algorithm, CT-Mine, for mining the
    complete set of frequent itemsets directly from
    the compressed prefix tree.
  • Compare its performance with the Apriori, Eclat
    (Zaki 2000) and FP-Growth algorithms.

6
Association Rules
  • Given a database of transactions containing
    various items, association rules are statements
    of the form
    • A ⇒ B (10%, 80%)
  • 80% of the transactions that purchase A also
    purchase B (the rule's confidence), and 10% of all
    transactions contain both of them (its support).
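Both percentages can be computed directly from a transaction list. In this minimal sketch the helper names `support` and `confidence` and the 40 sample transactions are hypothetical, chosen so that the rule A ⇒ B comes out at exactly (10%, 80%).

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Fraction of the transactions containing `lhs` that also contain `rhs`."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


# 40 hypothetical transactions: 4 contain A and B, 1 contains only A.
db = [["A", "B"]] * 4 + [["A"]] + [["C"]] * 35
supp = support(db, {"A", "B"})
conf = confidence(db, {"A"}, {"B"})
print(f"support = {supp:.0%}, confidence = {conf:.0%}")
```

Confidence is simply the support of A ∪ B divided by the support of A: 4/40 divided by 5/40.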

7
Transaction Tree

8
CT-Mine Data Structure
  • ITEMTABLE
    • Stores every item with its support and a pointer
      to the root of that item's subtree.
  • COMPRESSED TRANSACTION TREE
    • Stores all transactions of the database that
      contain frequent items; only the frequent items
      are stored in the tree.

9
CT-Mine Algorithm
  • Three steps
    • Identify frequent items and initialize the ItemTable
    • Construct the Compressed Transaction Tree
    • Mining
  • Algorithm details are given in the paper.
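Steps 1 and 2 can be sketched with an ItemTable and a prefix tree. This is a plain, uncompressed prefix tree: CT-Mine's scheme for combining identical subtrees is described in the paper and is not reproduced here, and Step 3 (mining) is omitted. Ordering items by descending support, and pre-creating one top-level node per frequent item to serve as the root of "its subtree", are assumptions; `Node` and `build_tree` are hypothetical names.

```python
from collections import Counter


class Node:
    """Prefix-tree node: an item, a count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_tree(transactions, min_count):
    """Return (item_table, root) for the given transactions."""
    # Step 1: count supports and keep only the frequent items,
    # ordered by descending support (ties broken by item).
    supports = Counter(i for t in transactions for i in set(t))
    order = sorted((i for i, c in supports.items() if c >= min_count),
                   key=lambda i: (-supports[i], i))
    rank = {item: r for r, item in enumerate(order)}

    # ItemTable: item -> (support, root node of that item's subtree).
    root = Node()
    item_table = {}
    for item in order:
        root.children[item] = Node(item)
        item_table[item] = (supports[item], root.children[item])

    # Step 2: insert each transaction's frequent items in ItemTable order,
    # sharing nodes with earlier transactions that have the same prefix.
    for t in transactions:
        path = sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        node = root
        for item in path:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return item_table, root


db = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"]]
item_table, root = build_tree(db, min_count=2)
for item, (supp, subtree) in item_table.items():
    print(item, supp, subtree.count)
```

Item d is dropped in Step 1 because it occurs in only one transaction. A top-level node only counts transactions whose first frequent item is that item, which is why the root of b's subtree keeps a count of zero in this small example.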

10
Example

11
Example
  • 5 (5), 45 (4), 345 (3), 2345 (1), 12345 (0)

12
Performance Study on Connect-4
  • Connect-4
    • 67,557 transactions
    • Number of items: 129
    • Avg. 43 items/transaction

13
Performance Study on Pumsb
  • Pumsb
    • 49,046 transactions
    • Number of items: 2,087
    • Avg. 50 items/transaction

14
Performance Study on Chess
  • Chess
    • 3,196 transactions
    • Number of items: 75
    • Avg. 37 items/transaction
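One way to characterise these three data sets is their density: the average transaction length as a share of the number of distinct items. The slides do not report this figure; the short calculation below only derives it from the statistics listed above.

```python
# (transactions, distinct items, average items per transaction) from the slides.
datasets = {
    "Connect-4": (67_557, 129, 43),
    "Pumsb": (49_046, 2_087, 50),
    "Chess": (3_196, 75, 37),
}

# Density: fraction of all distinct items that an average transaction contains.
densities = {name: avg / n_items for name, (_, n_items, avg) in datasets.items()}

for name, density in densities.items():
    print(f"{name}: {density:.1%}")
```

Connect-4 and Chess come out at roughly 33% and 49%, while Pumsb, with far more distinct items, is at about 2.4%.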

15
Comparing with Apriori
  • Apriori 4.04
    • Also uses a prefix tree.
    • Suffers from poor performance because it has to
      traverse the database many times to test the
      support of candidate itemsets.
  • CT-Mine
    • Follows the pattern growth approach.
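The repeated database traversals are visible in a textbook level-wise Apriori. This sketch is not the Apriori 4.04 implementation (which stores candidates in a prefix tree); it keeps candidates in Python sets, assumes an absolute minimum support count, and makes one full counting pass over the database per candidate level. The function name `apriori` is hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise candidate generation and test; returns (frequent, passes)."""
    transactions = [frozenset(t) for t in transactions]
    frequent = {}
    passes = 0
    # Level-1 candidates: every distinct item.
    candidates = {frozenset([i]) for t in transactions for i in t}
    k = 1
    while candidates:
        # One full pass over the database to count every size-k candidate.
        passes += 1
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate with an infrequent k-subset (the Apriori property).
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent, passes


db = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"]]
frequent, passes = apriori(db, min_count=2)
print(passes, sorted(sorted(s) for s in frequent))
```

On these four transactions the only 3-item candidate, {a, b, c}, is pruned without a third pass because its subset {a, b} is infrequent; on dense data sets with long frequent itemsets the number of passes grows with the itemset length.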

16
Comparing with Eclat
  • Eclat
    • Vertical representation
    • Uses tid-intersection in its mining process.
    • Has no compression scheme.
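Eclat's vertical representation stores, for each item, the set of identifiers (tids) of the transactions that contain it; the support of a longer itemset is the size of the intersection of its items' tid-sets. The depth-first sketch below illustrates that idea, assuming an absolute minimum support count; it is not Zaki's implementation, and `eclat` is a hypothetical name.

```python
def eclat(transactions, min_count):
    """Depth-first search over vertical tid-sets; returns {itemset: count}."""
    # Vertical representation: item -> set of ids of transactions containing it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    frequent = {}

    def extend(prefix, items):
        # `items` holds (item, tidset) pairs, each frequent together with `prefix`.
        for idx, (item, tids) in enumerate(items):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Support of a longer itemset = size of the tid-set intersection.
            suffix = []
            for other, other_tids in items[idx + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    suffix.append((other, common))
            if suffix:
                extend(itemset, suffix)

    start = sorted(
        ((i, tids) for i, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


db = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"]]
result = eclat(db, min_count=2)
print(sorted((sorted(s), n) for s, n in result.items()))
```

Each tid-set is held in full, uncompressed, which is the contrast the slide draws with CT-Mine.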

17
Comparing with FP-Growth
  • FP-Growth uses the FP-Tree to group transactions.
  • The FP-Tree has no compression scheme to combine
    identical subtrees, as CT-Mine does.
  • CT-Mine's tree can have up to 50% fewer nodes
    than the FP-Tree.
  • CT-Mine does not need to build conditional
    FP-Trees.

18
Further Work
  • Extend CT-Mine for very large databases.
  • Integrate constraints into CT-Mine.

19
Conclusion
  • A new algorithm (CT-Mine) for the efficient discovery
    of frequent itemsets has been presented.
  • A compressed prefix tree is used for efficient
    memory usage.
  • The performance of CT-Mine against Apriori, Eclat
    and FP-Growth on various data sets has been
    presented. The results show that CT-Mine
    outperformed all the others at most of the support
    levels commonly used in mining.

20
Thank you!
http://www.cs.curtin.edu.au/sucahyoy