TreeITL-MINE: Mining Frequent Itemsets Using Pattern Growth, Tid Intersection and Prefix Tree

1
TreeITL-MINE: Mining Frequent Itemsets Using
Pattern Growth, Tid Intersection and Prefix Tree
  • Raj P. Gopalan, Yudho Giri Sucahyo
  • School of Computing
  • Curtin University of Technology
  • Bentley, Western Australia 6102
  • {raj, sucahyoy}@computing.edu.au

2
Introduction
  • Association rule mining
    • Finds interesting patterns or relationships among
      items in a given data set.
  • Two steps
    • Step 1: Find the frequent itemsets.
    • Step 2: Use the result of Step 1 to generate
      association rules.
  • Step 1 is computationally very expensive, so it
    has been the focus of significant research effort.
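Step 2 needs no further database scans once Step 1 has recorded the support count of every frequent itemset: each rule A ⇒ B comes from splitting a frequent itemset into a non-empty antecedent A and consequent B, and its confidence is support(A ∪ B) / support(A). A minimal Python sketch, assuming a hypothetical input format (a dict from frozenset itemsets to support counts that contains every subset of each frequent itemset, as Step 1's output does); `generate_rules` is an illustrative name, not from the paper:

```python
from itertools import combinations


def generate_rules(frequent, min_conf):
    """Derive rules A => B from frequent itemsets and their support counts.

    `frequent` maps frozenset itemsets to support counts and must contain
    every subset of each itemset it contains (true of Step 1's output).
    Returns a list of (A, B, confidence) tuples.
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                conf = count / frequent[a]  # support(A u B) / support(A)
                if conf >= min_conf:
                    rules.append((a, itemset - a, conf))
    return rules


# Hypothetical Step 1 output: support counts of the frequent itemsets.
frequent = {frozenset("A"): 5, frozenset("B"): 4, frozenset("AB"): 4}
for a, b, conf in generate_rules(frequent, min_conf=0.8):
    print(sorted(a), "=>", sorted(b), f"confidence={conf:.2f}")
# ['A'] => ['B'] confidence=0.80
# ['B'] => ['A'] confidence=1.00
```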

3
Finding Frequent Itemsets
  • Two general approaches
    • Candidate generation-and-test
      • Apriori (Agrawal et al., 1993) and its variants
    • Pattern growth
      • FP-Growth (Han et al., 2000), H-Mine (Pei et al.,
        2001)

4
Contributions
  • Present a new data structure called
    Item-TransLink (ITL).
  • Propose a more efficient algorithm, TreeITL-Mine,
    based on the pattern-growth approach and using a
    prefix tree.
  • Compare the performance of TreeITL-Mine with the
    Apriori, H-Mine, and ITL-Mine algorithms.

5
Association Rules
  • Given a database of transactions containing
    various items, association rules are statements
    of the form
    • A ⇒ B (10%, 80%)
  • This rule means that 80% of the transactions that
    purchase A also purchase B (the confidence), and
    10% of all transactions contain both A and B (the
    support).
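Both percentages follow directly from their definitions: support is the fraction of all transactions containing both A and B, and confidence is the fraction of transactions containing A that also contain B. A small Python sketch on ten made-up transactions, in which A ⇒ B happens to have 40% support and 80% confidence (the toy data set is too small to reproduce the slide's 10%); the function names are illustrative:

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, a, b):
    """Fraction of all transactions that contain both A and B."""
    return count(transactions, a | b) / len(transactions)


def confidence(transactions, a, b):
    """Fraction of the transactions containing A that also contain B."""
    return count(transactions, a | b) / count(transactions, a)


# Ten made-up transactions, each a set of items.
transactions = [
    {"A", "B"}, {"A", "B", "C"}, {"A", "B"}, {"A", "B", "D"}, {"A", "C"},
    {"B"}, {"C"}, {"C", "D"}, {"D"}, {"B", "C"},
]
a, b = {"A"}, {"B"}
print(f"support={support(transactions, a, b):.0%}, "
      f"confidence={confidence(transactions, a, b):.0%}")
# support=40%, confidence=80%
```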

6
Binary Representations of Transactions
[Figure: binary representation of the example transactions, support 2]

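The slide's figure is not preserved in this transcript. As a generic illustration of the idea, the Python sketch below writes a few made-up transactions as a 0/1 matrix (one row per transaction, one column per item), packs each column into an integer bitmap, and counts an itemset's support as the number of set bits in the AND of its columns. The transactions and function names are hypothetical, not taken from the slide.

```python
def to_binary(transactions, items):
    """One row per transaction, one 0/1 column per item."""
    return [[1 if item in t else 0 for item in items] for t in transactions]


def column_bitmaps(matrix):
    """Pack each column into an int whose bit k is set if row k has a 1."""
    bitmaps = [0] * len(matrix[0])
    for k, row in enumerate(matrix):
        for j, bit in enumerate(row):
            if bit:
                bitmaps[j] |= 1 << k
    return bitmaps


def support_count(bitmaps, columns):
    """Support of an itemset = number of rows with a 1 in all its columns."""
    acc = bitmaps[columns[0]]
    for j in columns[1:]:
        acc &= bitmaps[j]
    return bin(acc).count("1")


items = [1, 2, 3, 4, 5]
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
matrix = to_binary(transactions, items)
bitmaps = column_bitmaps(matrix)
for row in matrix:
    print(row)
print(support_count(bitmaps, [items.index(2), items.index(5)]))  # {2, 5} -> 3
```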
7
ITL Data Structure
  • Existing algorithms use various data representations:
    • Horizontal (Apriori)
    • Vertical (Eclat, Zaki 2000)
    • Combination of both (FP-Growth, H-Mine)
  • ITL is based on two observations:
    • Item identifiers may be mapped to a range of
      integers.
    • Transaction identifiers can be ignored provided
      the items of each transaction are linked together.
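A minimal Python sketch of the first observation, assuming (for illustration only; the slide does not say how ITL assigns the integers) that infrequent items are dropped and the remaining item identifiers are numbered 0 to n-1 in sorted order. Each transaction then becomes a sorted list of small integers that can index an array-based table directly. The function name `map_items` is hypothetical.

```python
from collections import Counter


def map_items(transactions, min_support):
    """Map frequent item identifiers to 0..n-1 and rewrite the transactions.

    Returns (frequent_items, mapped): frequent_items[k] is the original
    identifier of integer k; each mapped transaction is a sorted int list.
    """
    counts = Counter(item for t in transactions for item in set(t))
    frequent_items = sorted(i for i, c in counts.items() if c >= min_support)
    index = {item: k for k, item in enumerate(frequent_items)}
    mapped = [sorted(index[i] for i in set(t) if i in index)
              for t in transactions]
    return frequent_items, mapped


transactions = [["milk", "bread"], ["bread", "eggs"],
                ["milk", "bread", "jam"], ["eggs"]]
frequent_items, mapped = map_items(transactions, min_support=2)
print(frequent_items)  # ['bread', 'eggs', 'milk']
print(mapped)          # [[0, 2], [0, 1], [0, 2], [1]]
```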

8
ITL Data Structure
  • ItemTable
    • Contains every item, with its support and a link
      to its first occurrence in TransLink.
  • TransLink
    • Contains every transaction in the database, with
      its items in sorted order.
    • Each item occurrence has a link to the next
      occurrence of the same item.
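A minimal Python sketch of the two structures as described on this slide. The names `Cell`, `ItemEntry`, `build_itl` and `occurrences` are hypothetical; a Python list stands in for the links between the items of one transaction, and the paper's actual memory layout may differ. Following an item's links from its ItemTable entry visits every transaction containing it, which gives the vertical view on top of the horizontal rows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    """One item occurrence inside a TransLink row."""
    item: int
    next: Optional["Cell"] = None  # next occurrence of the same item


@dataclass
class ItemEntry:
    """One ItemTable row: an item, its support, and its first occurrence."""
    item: int
    support: int = 0
    first: Optional[Cell] = None


def build_itl(transactions):
    """Build a hypothetical ItemTable (dict) and TransLink (list of rows)."""
    item_table = {}
    trans_link = []
    last = {}  # item -> most recent Cell, used to append the vertical link
    for t in transactions:
        row = [Cell(item) for item in sorted(t)]  # items kept in sorted order
        for cell in row:
            entry = item_table.setdefault(cell.item, ItemEntry(cell.item))
            entry.support += 1
            if entry.first is None:
                entry.first = cell
            else:
                last[cell.item].next = cell
            last[cell.item] = cell
        trans_link.append(row)  # row position stands in for a transaction id
    return item_table, trans_link


def occurrences(item_table, item):
    """Follow the links from the ItemTable through every occurrence of item."""
    cell = item_table[item].first
    while cell is not None:
        yield cell
        cell = cell.next


item_table, trans_link = build_itl([[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]])
print(item_table[3].support)                       # 3
print(sum(1 for _ in occurrences(item_table, 5)))  # 3
```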

9
Transaction Tree
[Figure: transaction tree for the example transactions, starting at Root]

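The figure is not preserved here. As a general illustration of the prefix-tree idea behind it, the Python sketch below inserts each transaction's items in sorted order and keeps a count per node, so transactions with a common prefix share nodes and identical transactions collapse into one path. This is a plain prefix tree under those assumptions; the paper's modified tree may store additional information, and the names `Node`, `build_tree`, `show` and `size` are hypothetical.

```python
class Node:
    """Prefix-tree node: one item and the number of transactions through it."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}  # item -> Node


def build_tree(transactions):
    """Insert each transaction's sorted items so shared prefixes share nodes."""
    root = Node()
    for t in transactions:
        node = root
        for item in sorted(t):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item)
            child.count += 1
            node = child
    return root


def show(node, depth=0):
    """Print the tree depth first, one 'item:count' per line."""
    for item in sorted(node.children):
        child = node.children[item]
        print("  " * depth + f"{item}:{child.count}")
        show(child, depth + 1)


def size(node):
    """Number of non-root nodes."""
    return sum(1 + size(child) for child in node.children.values())


transactions = [[1, 3, 4], [1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
root = build_tree(transactions)
show(root)
print(size(root), "nodes for", sum(map(len, transactions)), "item occurrences")
# 10 nodes for 15 item occurrences
```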
10
TreeITL-Mine Algorithm
  • Four steps
    • Step 1: Identify frequent items and initialize
      the ItemTable.
    • Step 2: Construct the transaction tree.
    • Step 3: Construct TransLink and attach it to the
      ItemTable.
    • Step 4: Mine frequent itemsets of two or more
      items.
  • Algorithm details are given in the paper.
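The paper gives the full algorithm. The Python sketch below only illustrates how the four steps can fit together, under simplifying assumptions: the ItemTable and TransLink are reduced to per-item dictionaries, each group is a set of transactions with identical frequent items (one path end in the tree), and Step 4 extends itemsets depth first by intersecting tid-count lists. The function and variable names are hypothetical, and this is not the authors' implementation.

```python
from collections import Counter


class Node:
    def __init__(self):
        self.children = {}  # item -> Node
        self.ends = 0       # transactions whose frequent items end at this node


def tree_itl_mine_sketch(transactions, min_support):
    """Return {itemset tuple: support} for frequent itemsets of 2+ items."""
    # Step 1: count items and keep the frequent ones (ItemTable contents).
    counts = Counter(i for t in transactions for i in set(t))
    frequent = sorted(i for i, c in counts.items() if c >= min_support)
    keep = set(frequent)

    # Step 2: build a prefix tree over each transaction's sorted frequent items.
    root = Node()
    for t in transactions:
        node = root
        for item in sorted(keep.intersection(t)):
            node = node.children.setdefault(item, Node())
        node.ends += 1

    # Step 3: each path end becomes one group (tid) with its transaction
    # count, recorded in the tid-count list of every item on the path.
    tidcounts = {i: {} for i in frequent}  # item -> {group tid: count}
    stack, next_tid = [(root, ())], 0
    while stack:
        node, path = stack.pop()
        if node.ends and path:
            for item in path:
                tidcounts[item][next_tid] = node.ends
            next_tid += 1
        for item, child in node.children.items():
            stack.append((child, path + (item,)))

    # Step 4: extend itemsets depth first by tid-count intersection.
    result = {}

    def extend(prefix, tids, candidates):
        for k, item in enumerate(candidates):
            common = {g: c for g, c in tids.items() if g in tidcounts[item]}
            support = sum(common.values())
            if support >= min_support:
                itemset = prefix + (item,)
                result[itemset] = support
                extend(itemset, common, candidates[k + 1:])

    for k, item in enumerate(frequent):
        extend((item,), tidcounts[item], frequent[k + 1:])
    return result


print(tree_itl_mine_sketch([[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]], 2))
# {(1, 3): 2, (2, 3): 2, (2, 3, 5): 2, (2, 5): 3, (3, 5): 2}
```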

11
Example
12
Example
13
Performance Study on Connect-4
  • Connect-4
    • 67,557 transactions
    • Number of items: 129
    • Max: 43 items/transaction

14
Performance Study on Pumsb
  • Pumsb
    • 49,046 transactions
    • Number of items: 2,087
    • Max: 50 items/transaction

15
Performance Study on BMS-Web-View-1
  • BMS-Web-View-1
    • 59,602 transactions
    • Number of items: 497
    • Average: 2.5 items/transaction

16
Performance Study on Chess
  • CHESS
    • 3,196 transactions
    • Number of items: 75
    • Max: 37 items/transaction

17
Performance Study on Mushroom
  • MUSHROOM
    • 8,124 transactions
    • Number of items: 119
    • Max: 23 items/transaction

18
Comparing with FP-Growth
  • FP-Growth builds an FP-Tree based on the prefix
    tree concept and uses it during the entire mining
    process.
  • TreeITL-Mine uses a modified prefix tree to group
    transactions, since this is faster than sorting
    the transactions by comparing their items
    lexicographically. The tree is then mapped to ITL
    for the mining process.

19
Comparing with Eclat
  • Tid-intersection in Eclat uses, for each item, a
    tid-list of all transactions in which the item
    occurs.
  • In TreeITL-Mine, each tid in the tid-list
    represents a group of transactions and is stored
    together with the count of that group.
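To illustrate the difference, the Python sketch below builds both layouts for made-up data with many repeated transactions. For simplicity it groups only identical transactions (TreeITL-Mine derives its groups from the prefix tree), and the function names are hypothetical. Both layouts give the same support counts, but the grouped tid-lists are shorter, so each intersection touches fewer entries.

```python
from collections import Counter


def eclat_tidlists(transactions):
    """Eclat-style vertical layout: item -> set of transaction ids."""
    tidlists = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def grouped_tidlists(transactions):
    """Grouped layout: item -> {group id: number of transactions in group}."""
    groups = Counter(frozenset(t) for t in transactions)
    tidlists = {}
    for gid, (items, count) in enumerate(groups.items()):
        for item in items:
            tidlists.setdefault(item, {})[gid] = count
    return tidlists


def eclat_support(tidlists, itemset):
    """Support = size of the intersection of the items' tid-lists."""
    return len(set.intersection(*(tidlists[i] for i in itemset)))


def grouped_support(tidlists, itemset):
    """Support = sum of group counts over the intersection of group ids."""
    first, *rest = itemset
    common = set(tidlists[first]).intersection(*(tidlists[i] for i in rest))
    return sum(tidlists[first][g] for g in common)


transactions = [[1, 2, 3]] * 4 + [[2, 3]] * 3 + [[1, 3]]
plain, grouped = eclat_tidlists(transactions), grouped_tidlists(transactions)
print(len(plain[3]), len(grouped[3]))  # 8 3
print(eclat_support(plain, [1, 3]), grouped_support(grouped, [1, 3]))  # 5 5
```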

20
Comparing with H-Mine
  • ITL remains unchanged during mining, whereas the
    H-struct in H-Mine is continually re-adjusted.
  • TreeITL-Mine builds a TempList and uses
    tid-intersection; H-Mine builds a series of
    header tables linked to the H-struct.
  • TreeITL-Mine uses tid-count intersection to
    extend frequent itemsets; H-Mine must traverse
    from the beginning of each transaction to check
    for the existence of a pattern.
  • Depending on the characteristics of the dataset,
    grouping transactions using a prefix tree can
    make the number of ITL entries much smaller than
    the number of H-struct entries.

21
Conclusion
  • A new algorithm (TreeITL-Mine) for discovering
    frequent itemsets was presented.
  • The ITL data structure, which combines the
    features of both horizontal and vertical data
    layouts, was described.
  • The performance of TreeITL-Mine was compared with
    that of ITL-Mine, Apriori and H-Mine on various
    data sets. The results show that, for a number of
    typical datasets and commonly used support levels,
    TreeITL-Mine outperforms the other algorithms.

22
Further Work
  • Extend TreeITL-Mine for very large databases.
  • Integrate constraints into TreeITL-Mine.

23
Thank you!