Parallel Mining of Maximal Frequent Itemsets from Databases

1
Parallel Mining of Maximal Frequent Itemsets from
Databases

Soon M. Chung and Congnan Luo
Proceedings of the 15th IEEE International
Conference on Tools with Artificial Intelligence
(ICTAI'03)

2
Outline

3
Introduction (1)

In mining association rules, the most
time-consuming job is finding all frequent
itemsets in a large database with respect to a
given minimum support.
In Apriori, the subset-infrequency based pruning
step prevents many candidate k-itemsets from
being counted in each pass k.
However, in Apriori-like algorithms, if there is a
frequent itemset of length l, then all 2^l of its
subsets are generated and counted.
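
To make the 2^l cost concrete, the following minimal Python sketch enumerates every subset of a single itemset (counting the empty set). The function name all_subsets is ours, chosen only for illustration.

```python
from itertools import combinations

def all_subsets(itemset):
    """Return every subset of `itemset`, including the empty set."""
    items = sorted(itemset)
    return [frozenset(combo)
            for size in range(len(items) + 1)
            for combo in combinations(items, size)]

# A frequent itemset of length l = 4 has 2**4 subsets.
print(len(all_subsets({1, 2, 3, 4})))
```

The count doubles with every extra item, which is why a long frequent itemset is expensive for a level-wise algorithm that visits each subset.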

4
Introduction (2)

Our basic idea is that if we find a large
frequent itemset early, we can avoid counting all
its subsets, because they are all frequent.
We propose a parallel algorithm, named Parallel
Max-Miner (PMM), for mining maximal frequent
itemsets.
PMM requires multiple passes over the
database, like the Count Distribution algorithm,
and needs synchronization between nodes at the end
of every pass.

5
Max-Miner algorithm

Unlike Apriori, the Max-Miner algorithm extracts
only the maximal frequent itemsets.
It uses superset-frequency based pruning.
Max-Miner always attempts to look ahead in order
to identify large frequent itemsets early,
so that all subsets of these discovered frequent
itemsets can be pruned from the search space.
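
Superset-frequency based pruning can be sketched as a simple subset test against the frequent itemsets found so far. This is a minimal illustration, not the authors' code; the name is_pruned and the list known_frequent are hypothetical.

```python
def is_pruned(candidate, known_frequent):
    """True if `candidate` is contained in an already discovered frequent itemset.

    Every subset of a frequent itemset is frequent, so such a candidate
    does not need to be counted.
    """
    return any(candidate <= found for found in known_frequent)

known_frequent = [frozenset({1, 2, 3, 4})]
print(is_pruned(frozenset({2, 3}), known_frequent))  # contained in {1, 2, 3, 4}
print(is_pruned(frozenset({2, 5}), known_frequent))  # item 5 is not covered
```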

6
Set-enumeration tree of Max-Miner (1)

7
Set-enumeration tree of Max-Miner (2)

Each node in the tree is called a candidate group.
A candidate group g consists of two components,
which are actually two itemsets.
The first itemset is called the head of the group
and is denoted by h(g).
The second itemset is called the tail of the
group and is denoted by t(g).
t(g) is an ordered set that contains all the items
not in h(g) that can potentially appear in any
subnode derived from node g.
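
A candidate group can be represented as a small record holding the head and the ordered tail. The sketch below is an assumption-laden illustration: the class and function names are ours, and the subnode rule (the subnode for tail item i gets head h(g) plus i, and the tail items that follow i) is the usual set-enumeration tree convention rather than something spelled out on the slide.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class CandidateGroup:
    head: FrozenSet[int]      # h(g)
    tail: Tuple[int, ...]     # t(g), kept in a fixed order

    def full_itemset(self):
        """h(g) united with t(g): the largest itemset reachable below g."""
        return self.head | frozenset(self.tail)

def subnodes(g):
    """Derive one subnode per tail item (assumed set-enumeration rule)."""
    return [CandidateGroup(g.head | {item}, g.tail[pos + 1:])
            for pos, item in enumerate(g.tail)]

root = CandidateGroup(frozenset(), (1, 2, 3, 4))
for child in subnodes(root):
    print(sorted(child.head), child.tail)
```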

8
The main procedure of Max-Miner (1)

From the root of the tree at level 0, count the
support of the 1-itemsets.
Only the 1-itemsets which are frequent can be
enumerated at level 1.
Four nodes are generated at level 1 if {1}, {2},
{3}, and {4} are all frequent 1-itemsets.
For the node g1, we need to count the support of
h(g1) ∪ t(g1) = {1, 2, 3, 4}.
If the support of h(g1) ∪ t(g1) is greater than
or equal to minsup, then we do not need to
expand the tree from the node g1 anymore.

9
The main procedure of Max-Miner (2)

At any node g, if h(g) ∪ t(g) is not frequent,
then for each item i in t(g) we check whether
h(g) ∪ {i} is frequent.
If h(g) ∪ {i} is frequent, a corresponding
subnode is generated.
We notice that for a candidate group node g, if
an item appears last in the ordering of the tail
of g, it will appear in most descendants of
the node g.
To discover the maximal frequent itemsets early,
we should order the subnodes of each node in
ascending order of their support.
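
The procedure described on these two slides can be sketched as a sequential, level-wise loop. This is a simplified illustration under our own assumptions, not the authors' implementation: support is recounted by scanning the transactions for every itemset instead of batching counts per database pass, minsup is an absolute count, and the names support and max_miner are hypothetical.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def max_miner(transactions, minsup):
    """Return the set of maximal frequent itemsets (simplified sketch)."""
    items = sorted({i for t in transactions for i in t})
    freq = [i for i in items if support(frozenset([i]), transactions) >= minsup]
    # Ascending support, so the most frequent items come last in every tail.
    freq.sort(key=lambda i: support(frozenset([i]), transactions))
    groups = [(frozenset([i]), tuple(freq[p + 1:])) for p, i in enumerate(freq)]
    found = set()
    while groups:                      # one iteration per tree level
        next_groups = []
        for head, tail in groups:
            full = head | frozenset(tail)
            if any(full <= f for f in found):
                continue               # superset-frequency pruning
            if support(full, transactions) >= minsup:
                found.add(full)        # no need to expand this node
                continue
            # Keep only tail items i for which head + {i} is frequent.
            new_tail = [i for i in tail
                        if support(head | {i}, transactions) >= minsup]
            if not new_tail:
                found.add(head)        # head cannot be extended
            for p, i in enumerate(new_tail):
                next_groups.append((head | {i}, tuple(new_tail[p + 1:])))
        groups = next_groups
    # Drop any itemset that is a proper subset of another one found.
    return {f for f in found if not any(f < g for g in found)}

db = [{1, 2, 3}, {1, 2, 3}, {1, 2}, {3, 4}, {3, 4}, {4}]
print(sorted(sorted(s) for s in max_miner(db, minsup=2)))
```

On this toy database with minsup = 2, the frequent itemsets {1, 2, 3} and {3, 4} are both reported after at most two levels, and their subsets such as {1, 3} are pruned without being counted.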

10
Parallel Max-Miner (PMM) algorithm

The database is evenly divided into N partitions
D0, D1, D2, ..., DN-1, one for each of the N
nodes P0, P1, P2, ..., PN-1.
Each node has the same number of transactions
allocated.
PMM requires multiple passes over the database.
For each pass k, all the nodes have exactly the
same set of candidate groups, Ck.
Each node counts the support of Ck in its local
database, independently.
At the end of each pass, all nodes exchange the
count information so that they can generate the
same set Ck+1 for the next pass.
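
The count exchange at the end of a pass can be simulated in a single process. In this minimal sketch a plain loop stands in for the N nodes and for the exchange between them; in a real shared-nothing system the summation would be done by message passing. The function names partition, local_counts, and global_counts are hypothetical.

```python
from collections import Counter

def partition(transactions, n_nodes):
    """Split the database into n_nodes contiguous, nearly equal partitions."""
    size, extra = divmod(len(transactions), n_nodes)
    parts, start = [], 0
    for rank in range(n_nodes):
        end = start + size + (1 if rank < extra else 0)
        parts.append(transactions[start:end])
        start = end
    return parts

def local_counts(candidates, local_db):
    """Support of each candidate itemset within one node's partition."""
    return Counter({c: sum(1 for t in local_db if c <= t) for c in candidates})

def global_counts(candidates, partitions):
    """Sum the local counts of all nodes (stands in for the exchange step)."""
    total = Counter()
    for part in partitions:
        total.update(local_counts(candidates, part))
    return total

db = [{1, 2, 3}, {1, 2, 3}, {1, 2}, {3, 4}, {3, 4}, {4}]
candidates = [frozenset({1, 2}), frozenset({3, 4})]
totals = global_counts(candidates, partition(db, 3))
print({tuple(sorted(c)): n for c, n in totals.items()})
```

Because every node ends the pass with the same global counts, each one can derive the next set of candidate groups locally without any further communication.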

11
Performance Evaluation

Speedup of PMM
Sizeup of PMM

12
Conclusion

We proposed a parallel maximal frequent itemset
mining algorithm, Parallel Max-Miner, for
shared-nothing multiprocessor systems.
Drawback: it requires synchronization between nodes
to exchange the count information at the end of
every pass.
