Mining Frequent Patterns in Data Streams at Multiple Time Granularities



Provided by: awi88


1
Mining Frequent Patterns in Data Streams at
Multiple Time Granularities

C. Giannella, J. Han, J. Pei,
X. Yan and P. Yu

Pele Williams
2
Presentation Outline

Objective and Definition of Key Terms
Problem Definition & Research Applications
The FP-Stream Structure
Pruning Techniques
Algorithm
Experiments, Results & Conclusion

3
Objective

An extension of frequent pattern mining to Data
Streams
Proposes the FP-stream structure, which enables
the dynamic mining of time-sensitive
frequent patterns

4
Terms: Frequent Pattern Mining

An Association Rule Mining technique that is
used to obtain all frequent itemsets (from
items in transactions) for a given minimum
support threshold.

5
Example: Frequent Pattern Mining
Database:
01 s c r a
02 s r n
03 n r c a
04 r s
05 r s c
Support: 0.6
Header Table (Item: Count): r: 5, s: 4, c: 3
Frequent Pattern Tree: r:5 -> s:4 -> c:2, and r:5 -> c:1
Pattern Base: c: r s (2), r (1); s: r (4)
Pattern Tree: (r:3) | c; (r:4) | s
Rules: c → r, s → r
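The counts in this example can be checked with a few lines of Python. This is a minimal brute-force sketch, not FP-growth itself: it enumerates candidate itemsets directly instead of building the FP-tree, which is only practical for a toy database. The names `DATABASE`, `MIN_SUPPORT` and `frequent_itemsets` are illustrative and do not come from the slides.

```python
from itertools import combinations

# The five transactions from the slide (IDs 01 to 05).
DATABASE = [
    {"s", "c", "r", "a"},
    {"s", "r", "n"},
    {"n", "r", "c", "a"},
    {"r", "s"},
    {"r", "s", "c"},
]
MIN_SUPPORT = 0.6  # fraction of transactions


def frequent_itemsets(database, min_support):
    """Return {itemset: count} for every itemset meeting min_support.

    Brute-force enumeration (a stand-in for FP-growth, which mines a
    compressed prefix tree instead of testing every candidate).
    """
    items = sorted(set().union(*database))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            count = sum(1 for t in database if set(candidate) <= t)
            if count / len(database) >= min_support:
                result[frozenset(candidate)] = count
                found_any = True
        if not found_any:
            break  # no frequent itemset of this size, so none larger
    return result


for itemset, count in frequent_itemsets(DATABASE, MIN_SUPPORT).items():
    print(sorted(itemset), count)
```

With a support of 0.6 (3 of 5 transactions), the frequent itemsets are r, s, c, {r, s} and {c, r}, which agrees with the header table and the conditional pattern trees on the slide.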
6
Problem Definition & Solutions (1 of 2)

How can Frequent Pattern Mining be applied to
Data Streams?
- Mine in batches (windows) and use a
single-pass algorithm
How can the completeness of frequent patterns in
data streams be ensured?
- Store a bit more information than the
currently frequent items

7
Problem Definition & Solutions (2 of 2)

How can the entire stream be mined within limited
time and memory constraints?
- Retain the counts of frequent and sub-frequent
patterns and prune regularly
How can time-sensitive Frequent Patterns be
mined?
- Use a tilted-time window

8
Applicable Queries

What is the frequent pattern set over the period
between t2 and t3?
What are the periods when (a, b) is frequent?
Does the support of {a} change dramatically in the
period from t3 to t0?

9
Research Applications

Mining Alarming Incidents (MAIDS)
Network Traffic Analysis (Intrusion Detection)
Shopping trends
Dynamic Tracing of Stock Fluctuation
Sensor Network Data Analysis
Web Click Stream mining

10
The FP-Stream Structure
This object is copied from the original paper
11
Tilted-Time Windows
This object is copied from the original paper
12
Time Granularity Error

For a time window request consisting of h
batches, answered with h' batches:
Time Granularity Error: $|h - h'| \le h/2$
Example: Last 24 batches (6 hours)
Gives the last 31 batches: $31 - 24 = 7 \le 24/2$
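The example can be reproduced under two assumptions that the slide does not state: that the tilted-time windows cover 1, 2, 4, 8 and 16 batches (most recent first), and that a request is answered from the nearest window boundary. The function name `answered_span` is hypothetical.

```python
from itertools import accumulate


def answered_span(h, window_sizes=(1, 2, 4, 8, 16)):
    """Number of batches actually covered when h batches are requested.

    Assumption: the answer uses the window boundary closest to h,
    preferring the larger span on a tie.
    """
    boundaries = list(accumulate(window_sizes))  # [1, 3, 7, 15, 31]
    return min(boundaries, key=lambda b: (abs(b - h), -b))


h = 24
covered = answered_span(h)
print(covered, covered - h, h / 2)
```

Under these assumptions a request for the last 24 batches is served with 31 batches, an error of 7, which is within 24/2 = 12.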

13
Intermediate Buffer Windows

A temporary location to store batch counts at
different granularity levels. E.g.
B 8 → f(8, 8), f(7, 7), f(6, 5), f(4, 1)
B 9 → f(9, 9), f(8, 8), f(7, 7), f(6, 5), f(4, 1)
B 10 → f(10, 10), f(9, 9), f(8, 7), f(6, 5), f(4, 1)

14
Intermediate Buffer Windows

B 11 → f(11, 11), f(10, 10), f(9, 9), f(8, 7), f(6, 5), f(4, 1)
B 12 → f(12, 12), f(11, 11), f(10, 9), f(8, 5), f(4, 1)
The size of the tilted-time window can grow no
larger than $2\lceil \log_2 N \rceil + 2$
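The shifting behaviour in the B 8 to B 12 listings can be sketched as follows. This is one possible reading of the slides, not the authors' code: each granularity level keeps a main slot and an intermediate buffer slot, and two older entries are merged and pushed to the next coarser level only when the buffer is already occupied. The class name `LogTiltedWindow` and the tuple layout `(newest_batch, oldest_batch, count)` are assumptions made for illustration.

```python
class LogTiltedWindow:
    """Logarithmic tilted-time window with one buffer slot per level."""

    def __init__(self):
        self.current = None  # most recent batch: (newest, oldest, count)
        self.levels = []     # coarser levels, each a [main, buffer] pair

    def add(self, batch_id, count):
        # The previous "current" entry is shifted towards older levels.
        carry, self.current = self.current, (batch_id, batch_id, count)
        level = 0
        while carry is not None:
            if level == len(self.levels):
                self.levels.append([None, None])
            main, buffer = self.levels[level]
            self.levels[level][0] = carry
            if main is None:
                carry = None                   # free slot, nothing to shift
            elif buffer is None:
                self.levels[level][1] = main   # park old entry in the buffer
                carry = None
            else:
                # Buffer occupied: merge the two older entries and shift
                # the merged entry to the next coarser level.
                self.levels[level][1] = None
                carry = (main[0], buffer[1], main[2] + buffer[2])
            level += 1

    def entries(self):
        """All stored entries, most recent first."""
        out = [self.current] if self.current else []
        for main, buffer in self.levels:
            out.extend(e for e in (main, buffer) if e is not None)
        return out


window = LogTiltedWindow()
for batch in range(1, 13):
    window.add(batch, count=1)
    if batch >= 8:
        print(batch, [(new, old) for new, old, _ in window.entries()])
```

Feeding batches 1 to 12 prints the same batch ranges as the listings for B 8 through B 12, and the total count across entries always equals the number of batches seen, since merging only adds counts together.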

15
Frequent Pattern Pruning

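Tail Pruning: Drop the counts of the oldest
windows ($t_m$ to $t_n$) if the sum of their
frequencies is less than $\varepsilon$ times the number
of transactions they cover. The approximate
frequency $\hat{f}_I(T)$ over a period $T$ covering $W$
transactions then satisfies
$f_I(T) - \varepsilon W \le \hat{f}_I(T) \le f_I(T)$
Type I Pruning: Supersets of an infrequent
pattern are also infrequent
Type II Pruning: Drop the tails (or the entire
pattern) of all supersets if the subset has
been dropped.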

This equation is copied from the original paper
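Tail pruning can be illustrated with a simplified sketch. It is not the exact condition from the paper (which appears only as an image on this slide): here a tail is dropped when every running sum of its counts, taken from the start of the tail, stays below the maximum support error times the matching running sum of window sizes. The function name `tail_prune`, the list layout (index 0 is the most recent window) and the sample numbers are assumptions.

```python
def tail_prune(freqs, sizes, epsilon):
    """Return the counts that survive tail pruning.

    freqs[i]: count of the pattern in window i (0 = most recent).
    sizes[i]: number of transactions in window i.
    Simplified rule: drop the longest tail whose running sums of counts
    all stay below epsilon times the running sums of window sizes.
    """
    n = len(freqs)
    for m in range(n):  # try the longest tail first
        f_sum = w_sum = 0
        droppable = True
        for i in range(m, n):
            f_sum += freqs[i]
            w_sum += sizes[i]
            if f_sum >= epsilon * w_sum:
                droppable = False
                break
        if droppable:
            return freqs[:m]
    return list(freqs)


freqs = [60, 30, 2, 1, 0]
sizes = [100, 100, 200, 400, 800]
kept = tail_prune(freqs, sizes, epsilon=0.05)

# The estimate can only undercount, and by less than epsilon * W.
true_total, estimate, W = sum(freqs), sum(kept), sum(sizes)
assert true_total - 0.05 * W <= estimate <= true_total
print(kept)
```

In this made-up example the three oldest windows are dropped, so the stored total is 90 instead of 93, well inside the allowed error of 0.05 × 1600 = 80 transactions.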
16
Algorithm ( 1 of 3)

Batch 1
Compute the frequencies of all items
Create f_list: frequency-ordered items
Create FP-Tree with items with $f_I \ge \varepsilon |B_1|$
Mine all patterns off the tree
Create FP-Stream Structure
(Pattern Tree + Tilted-Time Windows)

17
Algorithm ( 2 of 3)

Batch $B_i$, for $i \ge 2$
Start with an empty FP-Tree
Sort each transaction, t, according to f_list
Insert into FP-Tree without pruning items
Mine each pattern, I, off the tree
If I is in FP-Stream Structure
- Add $f_I(B)$ to tilted-time window for I
- Do Tail Pruning
- Do Type II Pruning

18
Algorithm ( 3 of 3)

If I is not in FP-Stream Structure
- If $f_I(B) \ge \varepsilon |B|$
- Insert I into FP-Stream Structure
- Else Do Type I Pruning
Scan FP-Stream structure
If I was not updated in batch B, insert 0 in I's
tilted-time window
Do Tail Pruning
Drop leaf nodes with empty tilted-time windows
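The per-batch bookkeeping in the three algorithm slides can be sketched in a heavily simplified form. The assumptions are substantial: a plain dictionary stands in for the pattern tree, each pattern keeps a flat list of per-batch counts instead of a tilted-time window, brute-force counting replaces FP-tree mining, and tail pruning, Type II pruning and leaf dropping are left out. The names `mine_batch` and `update` and the `max_len` cap are hypothetical.

```python
from itertools import combinations


def mine_batch(batch, max_len=3):
    """Count every itemset of up to max_len items in one batch.

    Brute-force stand-in for mining the batch's FP-tree.
    """
    counts = {}
    for transaction in batch:
        for k in range(1, max_len + 1):
            for pattern in combinations(sorted(transaction), k):
                counts[pattern] = counts.get(pattern, 0) + 1
    return counts


def update(store, batch, epsilon):
    """Fold one batch into store: {pattern: [newest count, ..., oldest]}."""
    counts = mine_batch(batch)
    threshold = epsilon * len(batch)
    for pattern, count in counts.items():
        if pattern in store:
            store[pattern].insert(0, count)   # already tracked: append count
        elif count >= threshold:
            store[pattern] = [count]          # new (sub)frequent pattern
        # else: below epsilon * |B|, so the pattern is not inserted
    for pattern, history in store.items():
        if pattern not in counts:
            history.insert(0, 0)              # tracked but absent in batch


store = {}
update(store, [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}], epsilon=0.5)
update(store, [{"a"}, {"c"}, {"c"}, {"c"}], epsilon=0.5)
print(store)
```

After the second batch, patterns that did not occur (here b and {a, b}) receive a 0 entry, while c is inserted only once its count in a batch reaches the threshold.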

19
Experiments ( 1 of 2)

IBM Synthetic Market-Basket Data Generator
3M transactions using 1K distinct items
Each batch was 50K transactions
$\sigma$ = 0.5 and 0.75
$\varepsilon = 0.1\sigma$
Randomly permutes 200 table entries every five
batches
3 data sets: Length 3, 5 & 7

20
Experiments ( 2 of 2)

Statistics: At the end of each batch
Time: Number of seconds per batch
Size: Size of FP-Stream structure
Num Itemset: Total number of itemsets
Ave Len: Average length of an itemset

21
Results ( 1 of 4)

Time: Average of 1000 trans/sec

This object is copied from the original paper
22
Results ( 2 of 4)

Size: Approx. 350k/batch

This object is copied from the original paper
23
Results ( 3 of 4)

Ave Length: Length of approx. 3

This object is copied from the original paper
24
Results ( 4 of 4)

Num Itemsets: 500 to 25,000 itemsets

This object is copied from the original paper
25
Discussion

Introduce a fading factor for older transactions
Compress the FP-Stream structure further
(parent-child)
Reordering of (sub)frequent items
Limit the length of the tilted-time window

26
Related Work

Stream Data Classification: Domingos &
Hulten (2000)
Stream Clustering: Guha (2000), O'Callaghan
(2002)
Mining Frequent Counts in Streams: Manku (2002)
Landmark model: mines frequent patterns in data
streams by assuming patterns are measured from
the start of the stream up to the current moment.

27
Conclusions

A novel approach to mine time-sensitive frequent
patterns in data streams
Overall space requirements can easily fit into
main memory (less than 3M)
The algorithm does not fall behind the stream
(2000 to 180 transactions per second)

28
Opinion

The use of tilted-time windows is a novel and key
idea to the mining of data streams
Choice of the maximum support error, $\varepsilon$, at which
the loss of support will be insignificant

Questions?

30
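Frequent, Sub-Frequent & Infrequent Patterns

For a period T, with
$\sigma$ = minimum support
$\varepsilon$ = maximum support error ($0.1\sigma$)
Frequent: support $\ge \sigma$
Subfrequent: $\sigma >$ support $\ge \varepsilon$
Infrequent: support $< \varepsilon$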
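The three categories translate directly into a small function. This is a minimal sketch that assumes support, minimum support and maximum support error are all expressed as fractions of the transactions in the period; the function name `classify` and the sample values are illustrative.

```python
def classify(support, sigma, epsilon):
    """Label a pattern by its support over a period T.

    sigma: minimum support; epsilon: maximum support error (epsilon < sigma).
    """
    if support >= sigma:
        return "frequent"
    if support >= epsilon:
        return "subfrequent"
    return "infrequent"


sigma = 0.5
epsilon = 0.1 * sigma
for support in (0.6, 0.2, 0.01):
    print(support, classify(support, sigma, epsilon))
```

Keeping subfrequent patterns is what allows a pattern that is not yet frequent to be tracked before it crosses the minimum support.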

31
Natural tilted-time window
This object is copied from the original paper
One month: 4 + 24 + 31 = 59 units (4 quarters, 24 hours, 31 days)
32
Logarithmic tilted-time window
One year: $\log_2(365 \times 24 \times 4) + 1 \approx 17$
units, rather than $366 \times 24 \times 4 = 35{,}136$ units
This object is copied from the original paper
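The unit counts for the natural and logarithmic tilted-time windows are plain arithmetic and can be checked quickly. The variable names are illustrative, and rounding the logarithmic figure up to a whole unit is an assumption about how the slide arrives at 17.

```python
import math

# Natural tilted-time window for one month:
# 4 quarters + 24 hours + 31 days.
natural_month_units = 4 + 24 + 31

# Logarithmic tilted-time window for one year at quarter-hour precision.
quarters_in_year = 365 * 24 * 4              # 35,040 quarter-hours
log_units = math.ceil(math.log2(quarters_in_year) + 1)

# Storing every quarter-hour separately (the slide uses a 366-day year).
flat_units = 366 * 24 * 4

print(natural_month_units, log_units, flat_units)
```

The logarithmic window therefore needs about 17 slots where a flat window would need 35,136, a reduction of roughly three orders of magnitude.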
33
Frequent Patterns for Tilted-Time Windows
This object is copied from the original paper
