Mining Compressed Frequent-Pattern Sets

Provided by: DBL94

1
Mining Compressed Frequent-Pattern Sets

VLDB2005
2

3
Introduction

4
Two major approaches

5
A Motivating Example

Expression of P1
Support of P1
6
A Motivating Example

Our compression framework
Cluster frequent patterns by pattern similarity
Pick a representative pattern for each cluster
Three problems
How to measure the similarity of patterns
How to define quality-guaranteed clusters, in which a representative pattern best describes the whole cluster
How to efficiently discover these clusters

8
Problem statement

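Distance measure: Let P1 and P2 be two closed patterns, and let T(P) denote the set of transactions that contain P. The distance between P1 and P2 is defined as
D(P1, P2) = 1 - |T(P1) ∩ T(P2)| / |T(P1) ∪ T(P2)|
Example: Let T(P1) = {t1, t2, t3, t4, t5} and T(P2) = {t1, t2, t3, t4, t6}. Then
D(P1, P2) = 1 - 4/6 = 1/3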
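
The worked example can be checked with a few lines of Python. This is a minimal sketch that assumes the distance is the Jaccard distance on the supporting transaction sets, which agrees with the 1 - 4/6 arithmetic on the slide. The function name pattern_distance and the use of Fraction for exact arithmetic are illustrative choices, not part of the presentation.

```python
from fractions import Fraction

def pattern_distance(t1, t2):
    """Jaccard distance between two sets of supporting transaction ids."""
    t1, t2 = set(t1), set(t2)
    union = t1 | t2
    if not union:
        # Assumption: two patterns with no supporting transactions are identical.
        return Fraction(0)
    return 1 - Fraction(len(t1 & t2), len(union))

# Transaction sets from the slide's example.
T_P1 = {"t1", "t2", "t3", "t4", "t5"}
T_P2 = {"t1", "t2", "t3", "t4", "t6"}

print(pattern_distance(T_P1, T_P2))  # exact fraction, equal to 1/3
```

Four transactions are shared and six appear in the union, so the result matches the slide.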

9
Clustering criterion

A pattern P is d-covered by another pattern P' if P can be expressed by P' and D(P, P') ≤ d.
A set of patterns forms a d-cluster if there exists a representative pattern Pr such that each pattern P in the set is d-covered by Pr.
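
The clustering criterion can be sketched as two small predicates. The sketch reads "can be expressed by" as itemset containment (the covered pattern is a subset of the representative), which is an assumption about the slide's wording. The toy patterns, their transaction ids, and the names is_covered and is_cluster are hypothetical.

```python
from fractions import Fraction

def distance(tp, tq):
    """Jaccard distance on supporting transaction sets (assumed measure)."""
    union = tp | tq
    return 1 - Fraction(len(tp & tq), len(union)) if union else Fraction(0)

def is_covered(p, rep, tids, d):
    """True if p is d-covered by rep: p is a subset of rep and D(p, rep) <= d."""
    return p <= rep and distance(tids[p], tids[rep]) <= d

def is_cluster(patterns, rep, tids, d):
    """True if rep d-covers every pattern in the set."""
    return all(is_covered(p, rep, tids, d) for p in patterns)

# Hypothetical toy data: pattern -> ids of the transactions that contain it.
a, ab, abc = frozenset("a"), frozenset("ab"), frozenset("abc")
tids = {
    a: {1, 2, 3, 4, 5, 6, 7, 8},
    ab: {1, 2, 3, 4, 5},
    abc: {1, 2, 3, 4},
}
d = Fraction(1, 4)

print(is_covered(ab, abc, tids, d))          # distance is 1/5, within d
print(is_covered(a, abc, tids, d))           # distance is 1/2, exceeds d
print(is_cluster([ab, abc], abc, tids, d))
```

A representative always covers itself, since its distance to itself is 0.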

10
Pattern Compressing Problem

Given a transaction database, a minimum support threshold min_sup M, and the cluster quality measure d, the pattern compression problem is to find a set of representative patterns R such that
For each frequent pattern P, there is a representative pattern Pr ∈ R which covers P
The size of R, |R|, is minimized.

11
Discovering Representative Patterns

12
To collect the complete coverage information

To find the set of representative patterns
13

14
RPlocal

Algorithm
Follow a depth-first search in the pattern space
Remember all previously discovered representative patterns
For each pattern P that is
not covered yet, and
being visited for the second time, when the traversal comes back from its sons,
select a representative pattern using the local method (with P as the new probe pattern)

15
Pattern P
P's son
Visited patterns covering P
16
Efficient Implementation

(c,a) 111010
f does not belong to (c,a), yet the support of (c,a) is the same as the support of (f,c,a). Therefore (c,a) is not closed.
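
The closedness argument can be illustrated with bitmasks. This is a sketch under stated assumptions: each item gets a mask with one bit per transaction, and a pattern is not closed when some item outside it occurs in every transaction that contains the pattern. The slide does not say what the vector 111010 encodes, so the masks here come from a hypothetical five-transaction database rather than from the slide, and the function names are illustrative.

```python
from functools import reduce

# Hypothetical transactions, not taken from the presentation.
transactions = [
    {"f", "c", "a"},
    {"f", "c", "a", "b"},
    {"f", "b"},
    {"c", "b"},
    {"f", "c", "a"},
]

def item_masks(transactions):
    """Map each item to an int whose bit i is set if transaction i contains it."""
    masks = {}
    for i, t in enumerate(transactions):
        for item in t:
            masks[item] = masks.get(item, 0) | (1 << i)
    return masks

def pattern_mask(pattern, masks, n):
    """AND the item masks: bit i is set if transaction i contains the whole pattern."""
    full = (1 << n) - 1
    return reduce(lambda acc, item: acc & masks.get(item, 0), pattern, full)

def missing_items(pattern, masks, n):
    """Items outside the pattern that appear in all of its transactions."""
    pm = pattern_mask(pattern, masks, n)
    return sorted(item for item, m in masks.items()
                  if item not in pattern and (pm & m) == pm)

masks = item_masks(transactions)
n = len(transactions)

print(missing_items({"c", "a"}, masks, n))       # non-empty, so (c,a) is not closed
print(missing_items({"f", "c", "a"}, masks, n))  # empty, so (f,c,a) is closed
```

In this toy data, f occurs in every transaction that contains both c and a, so (c,a) and (f,c,a) have the same support and (c,a) is not closed.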
17
18
Performance study

Comparing algorithms
FPclose: an efficient algorithm to generate all closed itemsets, winner of the FIMI workshop 2003
RPglobal: first uses FPclose to generate closed itemsets, then uses a global greedy method to find representative patterns
RPlocal: directly uses the local method to find representative patterns from raw data
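
The global greedy step can be illustrated with the textbook greedy set-cover heuristic. This is a sketch, not the presenters' RPglobal implementation: it assumes the coverage information (which patterns each candidate representative covers) has already been collected, and the coverage table and the name greedy_representatives are hypothetical.

```python
def greedy_representatives(coverage):
    """Greedy set cover.

    coverage maps each candidate representative to the set of patterns it covers.
    Repeatedly pick the candidate that covers the most still-uncovered patterns.
    """
    uncovered = set().union(*coverage.values())
    chosen = []
    while uncovered:
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(coverage), key=lambda r: len(coverage[r] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen

# Hypothetical coverage table (patterns written as strings for brevity).
coverage = {
    "abc": {"a", "ab", "abc"},
    "ab": {"a", "ab"},
    "de": {"d", "de"},
    "d": {"d"},
}

print(greedy_representatives(coverage))
```

The loop always terminates because every uncovered pattern belongs to at least one candidate's coverage set, so each round covers at least one more pattern. The greedy choice gives an approximation to the minimum set of representatives, not necessarily the optimum.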

19
Performance Study