TreeITL-MINE: Mining Frequent Itemsets Using Pattern Growth, Tid Intersection and Prefix Tree

1

Raj P. Gopalan, Yudho Giri Sucahyo
School of Computing
Curtin University of Technology
Bentley, Western Australia 6102
raj, sucahyoy_at_computing.edu.au

2
Introduction

Association rule mining
Finds interesting patterns or relationships among items in a given data set.
Two steps:
1. Find the frequent itemsets.
2. Use the result of Step 1 to generate association rules.
Step 1 is computationally very expensive, so it has been the focus of significant research effort.
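Step 2 is cheap once Step 1 has produced the frequent itemsets and their counts. The Python sketch below assumes that Step 1 returns a dict mapping each frequent itemset (as a `frozenset`) to its support count. The function name `generate_rules` and the example counts are hypothetical.

```python
from itertools import combinations

def generate_rules(freq, min_conf):
    """Derive rules X => Y from frequent itemsets.

    freq maps frozenset(itemset) -> support count. Every subset of a
    frequent itemset is itself frequent, so freq[lhs] is always present.
    """
    rules = []
    for itemset, count in freq.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset as the left-hand side.
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                conf = count / freq[lhs]  # confidence = sup(X u Y) / sup(X)
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules

# Hypothetical Step 1 output: support counts.
freq = {
    frozenset({"A"}): 5,
    frozenset({"B"}): 4,
    frozenset({"A", "B"}): 4,
}

for lhs, rhs, conf in generate_rules(freq, min_conf=0.8):
    print(sorted(lhs), "=>", sorted(rhs), f"confidence={conf:.2f}")
```

Only Step 1 has to look at the database. Step 2 works entirely from the counts that Step 1 returns.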

3
Finding Frequent Itemsets

Two general approaches
Candidate generation-and-test
Apriori (Agrawal et al., 1993) and its variants
Pattern Growth Approach
FP-Growth (Han et al., 2000), H-Mine (Pei et al., 2001)
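The candidate generation-and-test idea can be illustrated with a minimal level-wise sketch. It assumes that transactions are sets of hashable items and that `min_sup` is an absolute count. It follows the Apriori join-and-prune pattern in simplified form: candidates are built by unioning pairs of frequent (k-1)-itemsets rather than with Apriori's ordered prefix join. The function `apriori` and the sample database are illustrative only.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Level-wise candidate generation-and-test (simplified Apriori-style sketch)."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(level)
    k = 2
    while level:
        # Generate: join frequent (k-1)-itemsets; prune if any (k-1)-subset is infrequent.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Test: one scan of the database counts every candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(level)
        k += 1
    return frequent

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]  # hypothetical database
for itemset, sup in sorted(apriori(db, 2).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), sup)
```

Each level costs a full database scan. This repeated scanning is the expense that pattern-growth methods such as FP-Growth and H-Mine try to avoid.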

4
Contributions

Present a new data structure called
Item-TransLink (ITL).
Propose a more efficient algorithm (called
TreeITL-Mine) based on the pattern growth
approach using prefix tree.
Compare the performance with Apriori, H-Mine, and
ITL-Mine algorithms.

5
Association Rules

Given a database of transactions containing various items, association rules are statements of the form
A ⇒ B (10%, 80%)
meaning that 80% of the transactions that purchase A also purchase B (confidence), and 10% of all transactions contain both of them (support).
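Both percentages in the rule can be computed directly by counting. The sketch below assumes that transactions are given as a list of item sets. The 40-transaction database is hypothetical and was constructed so that the rule A ⇒ B gets exactly (10%, 80%).

```python
def support_confidence(transactions, lhs, rhs):
    """Return (support, confidence) of lhs => rhs as fractions in [0, 1]."""
    lhs, rhs = set(lhs), set(rhs)
    n_lhs = sum(1 for t in transactions if lhs <= set(t))
    n_both = sum(1 for t in transactions if (lhs | rhs) <= set(t))
    support = n_both / len(transactions)           # share of all transactions
    confidence = n_both / n_lhs if n_lhs else 0.0  # share of transactions containing lhs
    return support, confidence

# Hypothetical database: 4 contain A and B, 1 contains only A, 35 contain neither.
db = [{"A", "B"}] * 4 + [{"A"}] + [{"C"}] * 35
s, c = support_confidence(db, {"A"}, {"B"})
print(f"A => B ({s:.0%}, {c:.0%})")  # A => B (10%, 80%)
```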

6
Binary Representations of Transactions
[Figure: binary representations of the example transactions, Support 2]

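In a binary representation, each transaction is a row of 0/1 flags with one flag per item. The sketch below stores each item's column as a Python integer bitmap. The support of an itemset is then the number of set bits in the AND of its columns. The four-transaction database is hypothetical. Reading "Support 2" as a minimum support count of 2 is an assumption.

```python
def to_bitmaps(transactions, items):
    """Column bitmaps: bit `tid` of bitmaps[item] is 1 iff transaction `tid` has item."""
    bitmaps = {item: 0 for item in items}
    for tid, t in enumerate(transactions):
        for item in t:
            bitmaps[item] |= 1 << tid
    return bitmaps

def support(bitmaps, itemset):
    """Number of transactions containing every item of a non-empty itemset."""
    items = list(itemset)
    acc = bitmaps[items[0]]
    for item in items[1:]:
        acc &= bitmaps[item]
    return bin(acc).count("1")

items = [1, 2, 3, 4, 5]
db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]  # hypothetical example

# Print the binary matrix, one row per transaction.
for t in db:
    print("".join("1" if i in t else "0" for i in items))

bitmaps = to_bitmaps(db, items)
min_sup = 2  # assumed meaning of "Support 2"
print([i for i in items if support(bitmaps, {i}) >= min_sup])  # [1, 2, 3, 5]
```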
7
ITL Data Structure

Various data representations:
Horizontal (Apriori)
Vertical (Eclat; Zaki, 2000)
Combination (FP-Growth, H-Mine)
ITL is based on the following observations:
Item identifiers may be mapped to a range of integers.
Transaction identifiers can be ignored, provided the items of each transaction are linked together.

8
ITL Data Structure

ITEMTABLE
Every item, with its support and a link to its first occurrence in TransLink.

TRANSLINK
Every transaction in the database, with its items in sorted order.
Each item has a link to the next occurrence of the same item.
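The ItemTable/TransLink pair can be laid out as follows. This is an illustrative Python sketch, not the authors' implementation. The class names `ItemEntry` and `Cell`, the `(row, position)` link encoding and the helper `occurrences` are all assumptions. Items are first mapped to the integer range 0..n-1, following the ITL observations above, so the ItemTable can be a plain list indexed by item. The paper's ItemTable holds only frequent items; the sketch skips that filtering.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Link = Optional[Tuple[int, int]]  # (TransLink row, position in row), or None

@dataclass
class ItemEntry:
    support: int = 0
    first: Link = None  # first occurrence of the item in TransLink

@dataclass
class Cell:
    item: int
    next: Link = None   # next occurrence of the same item

def build_itl(transactions, n_items):
    item_table = [ItemEntry() for _ in range(n_items)]
    trans_link: List[List[Cell]] = []
    last: List[Link] = [None] * n_items  # latest occurrence seen per item
    for t in transactions:
        row = len(trans_link)
        cells = [Cell(i) for i in sorted(set(t))]  # items kept in sorted order
        trans_link.append(cells)
        for pos, cell in enumerate(cells):
            entry = item_table[cell.item]
            entry.support += 1
            if last[cell.item] is None:
                entry.first = (row, pos)
            else:
                r, p = last[cell.item]
                trans_link[r][p].next = (row, pos)  # chain from previous occurrence
            last[cell.item] = (row, pos)
    return item_table, trans_link

def occurrences(item_table, trans_link, item):
    """Follow the links of one item; yield each TransLink row that contains it."""
    link = item_table[item].first
    while link is not None:
        row, pos = link
        yield row
        link = trans_link[row][pos].next

# Map original item identifiers to the integer range 0..n-1 (hypothetical data).
raw = [{"a", "c", "d"}, {"b", "c", "e"}, {"a", "b", "c", "e"}, {"b", "e"}]
ids = {item: k for k, item in enumerate(sorted(set().union(*raw)))}
item_table, trans_link = build_itl([{ids[i] for i in t} for t in raw], len(ids))
print(list(occurrences(item_table, trans_link, ids["b"])))  # [1, 2, 3]
```

Following an item's links visits only the transactions that contain it, without storing any transaction identifiers. This is the point of the second observation above.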

9
Transaction Tree
[Figure: transaction tree (prefix tree) rooted at Root, with nodes labelled by items 1 to 5]
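A prefix tree of this kind can be built with a short sketch, assuming that each transaction's items are inserted in ascending order. Each node keeps the usual per-node count. It also records how many transactions end at that node (`ends`, an assumption of this sketch). A depth-first walk then recovers each distinct transaction once, together with its count. This is one way to obtain the transaction groups that are later mapped to TransLink. The class and function names and the sample data are hypothetical.

```python
class Node:
    def __init__(self, item=None):
        self.item = item
        self.count = 0      # transactions whose path passes through this node
        self.ends = 0       # transactions whose path ends at this node
        self.children = {}  # item -> child Node

def build_tree(transactions):
    root = Node()
    for t in transactions:
        node = root
        for item in sorted(t):
            node = node.children.setdefault(item, Node(item))
            node.count += 1
        node.ends += 1
    return root

def groups(node, prefix=()):
    """Depth-first walk yielding (distinct transaction, number of copies)."""
    if node.ends:
        yield prefix, node.ends
    for item in sorted(node.children):
        yield from groups(node.children[item], prefix + (item,))

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {2, 3, 5}]  # hypothetical
root = build_tree(db)
for path, n in groups(root):
    print(path, n)
```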
10
TreeITL-Mine Algorithm

Four steps:
1. Identify frequent items and initialize the ItemTable.
2. Construct the transaction tree.
3. Construct TransLink and attach it to the ItemTable.
4. Mine frequent itemsets of 2 or more items.
Algorithm details are given in the paper.
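The four steps can be strung together in a compact Python sketch. It is a simplification under stated assumptions, not the algorithm from the paper:

- A `Counter` of sorted tuples stands in for the transaction tree.
- Per-item lists of `(group id, count)` pairs stand in for TransLink.
- Mining is a plain depth-first extension by tid-count intersection, with no TempList.

The function `tree_itl_sketch` and the sample data are hypothetical.

```python
from collections import Counter

def tree_itl_sketch(transactions, min_sup):
    # Step 1: count items and keep the frequent ones (the ItemTable).
    item_sup = Counter(i for t in transactions for i in set(t))
    frequent = sorted(i for i, c in item_sup.items() if c >= min_sup)
    keep = set(frequent)

    # Step 2: group identical filtered transactions (stand-in for the tree).
    grouped = Counter(tuple(sorted(keep.intersection(t))) for t in transactions)
    groups = [(g, c) for g, c in grouped.items() if g]

    # Step 3: per-item (group id, count) lists (stand-in for TransLink).
    tidcounts = {i: [] for i in frequent}
    for gid, (g, c) in enumerate(groups):
        for i in g:
            tidcounts[i].append((gid, c))

    # Step 4: depth-first extension by tid-count intersection.
    result = {(i,): item_sup[i] for i in frequent}

    def extend(prefix, plist, candidates):
        for j, item in enumerate(candidates):
            ids = {g for g, _ in tidcounts[item]}
            new = [(g, c) for g, c in plist if g in ids]
            sup = sum(c for _, c in new)  # support = sum of group counts
            if sup >= min_sup:
                itemset = prefix + (item,)
                result[itemset] = sup
                extend(itemset, new, candidates[j + 1:])

    for k, i in enumerate(frequent):
        extend((i,), tidcounts[i], frequent[k + 1:])
    return result

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]  # hypothetical
for itemset, sup in sorted(tree_itl_sketch(db, min_sup=2).items()):
    print(itemset, sup)
```

On this small database the sketch finds the same frequent itemsets as a level-wise search. Unlike the level-wise search, it never rescans the transactions after Step 2.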

11
Example
12
Example
13
Performance Study on Connect-4

Connect-4
67,557 transactions
# of items: 129
Max: 43 items/trans

14
Performance Study on Pumsb

Pumsb
49,046 transactions
# of items: 2,087
Max: 50 items/trans

15
Performance Study on BMS-Web-View-1

BMS-Web-View-1
59,602 transactions
# of items: 497
Avg: 2.5 items/trans

16
Performance Study on Chess

Chess
3,196 transactions
# of items: 75
Max: 37 items/trans

17
Performance Study on Mushroom

Mushroom
8,124 transactions
# of items: 119
Max: 23 items/trans

18
Comparing with FP-Growth

FP-Growth builds an FP-Tree based on the prefix
tree concept and uses it during the entire mining
process.
TreeITL-Mine uses a modified prefix tree to group transactions, since this is faster than sorting the transactions by comparing their items lexicographically. The tree is then mapped to ITL for the mining process.
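For contrast, here is a sketch of the sorting alternative that the slide mentions. It sorts the transactions, each as a sorted tuple so that tuple comparison is lexicographic over items, and then counts runs of equal rows. This only illustrates the baseline. The claim that the prefix tree is faster comes from the paper and is not demonstrated here. The function name and data are hypothetical.

```python
from itertools import groupby

def group_by_sorting(transactions):
    """Group identical transactions by lexicographic sorting."""
    rows = sorted(tuple(sorted(t)) for t in transactions)
    # Equal rows are now adjacent; count each run.
    return [(row, sum(1 for _ in run)) for row, run in groupby(rows)]

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {2, 3, 5}]  # hypothetical
print(group_by_sorting(db))
# [((1, 2, 3, 5), 1), ((1, 3, 4), 1), ((2, 3, 5), 2), ((2, 5), 1)]
```

The groups are the same ones a depth-first walk of a prefix tree would produce. The difference lies in how they are found: sorting compares whole transactions item by item, while the tree merges shared prefixes as transactions are inserted.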

19
Comparing with Eclat

Tid-intersection in Eclat creates a tid-list of
all transactions in which an item occurs.
In TreeITL-Mine, each tid in the tid-list represents a group of transactions and is stored together with the count of that group.
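The two kinds of list give the same support but differ in length. The sketch below illustrates this on a hypothetical database in which three transactions are identical. With plain tid-lists, Eclat-style support is the size of the intersection. With grouped lists, support is the sum of the counts that survive the intersection. The function names and the hand-built grouped lists are assumptions.

```python
def support_plain(tidlists, itemset):
    """Eclat-style: support is the length of the intersected tid-list."""
    items = list(itemset)
    tids = set(tidlists[items[0]])
    for item in items[1:]:
        tids &= set(tidlists[item])
    return len(tids)

def support_grouped(tidcounts, itemset):
    """Grouped variant: entries are (group id, count); support sums the counts."""
    items = list(itemset)
    common = dict(tidcounts[items[0]])
    for item in items[1:]:
        ids = {g for g, _ in tidcounts[item]}
        common = {g: c for g, c in common.items() if g in ids}
    return sum(common.values())

db = [{2, 3, 5}, {2, 3, 5}, {2, 3, 5}, {1, 3}]  # hypothetical

# Plain tid-lists: one tid per transaction.
tidlists = {}
for tid, t in enumerate(db):
    for i in t:
        tidlists.setdefault(i, []).append(tid)

# Grouped lists: group 0 = {2, 3, 5} (3 copies), group 1 = {1, 3} (1 copy).
tidcounts = {1: [(1, 1)], 2: [(0, 3)], 3: [(0, 3), (1, 1)], 5: [(0, 3)]}

print(support_plain(tidlists, {2, 3, 5}), support_grouped(tidcounts, {2, 3, 5}))  # 3 3
print(len(tidlists[3]), len(tidcounts[3]))  # 4 2
```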

20
Comparing with H-Mine

ITL remains unchanged during mining, whereas the H-struct in H-Mine is continually re-adjusted.
TreeITL-Mine builds a TempList and uses
tid-intersection. H-Mine builds a series of
header tables linked to H-struct.
TreeITL-Mine uses tid-count intersection for
extending frequent itemsets. H-Mine needs to
traverse from the beginning of each transaction
to check the existence of a pattern.
Depending on the characteristics of the dataset, grouping transactions using a prefix tree can make the number of ITL entries much smaller than the number of H-struct entries.

21
Conclusion

A new algorithm (TreeITL-Mine) for discovering frequent itemsets was presented.
The ITL data structure, which combines features of both horizontal and vertical data layouts, was described.
The performance of TreeITL-Mine was compared against ITL-Mine, Apriori and H-Mine on various datasets. The results show that, for a number of typical datasets and commonly used support levels, TreeITL-Mine outperforms the other algorithms.

22
Further Work