Title: Mining Frequent Patterns in Data Streams at Multiple Time Granularities
1Mining Frequent Patterns in Data Streams at Multiple Time Granularities
- C. Giannella, J. Han, J. Pei, X. Yan and P. Yu
Pele Williams
2Presentation Outline
- Objective and Definition of Key terms
- Problem Definition and Research Applications
- The FP-Stream Structure
- Pruning Techniques
- Algorithm
- Experiments, Results and Conclusion
3Objective
- An extension of frequent pattern mining to data streams
- Propose the FP-stream structure, which enables the dynamic mining of time-sensitive frequent patterns
4Terms: Frequent Pattern Mining
- An association rule mining technique used to obtain all frequent itemsets (sets of items that occur together in transactions) for a given minimum support threshold.
5Example: Frequent Pattern Mining
- Database
- - 01: s c r a
- - 02: s r n
- - 03: n r c a
- - 04: r s
- - 05: r s c
- Support: 0.6
- Header Table (item: count): r: 5, s: 4, c: 3
- Frequent Pattern Tree: r:5 has children s:4 and c:1; s:4 has child c:2
- Pattern Base
- - c: r s (2), r (1)
- - s: r (4)
- Pattern Tree
- - c: (r:3)
- - s: (r:4)
- Rules: c → r, s → r
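The example above can be checked with a few lines of Python. This is a minimal brute-force sketch, not the FP-tree method the slide illustrates: it simply counts every candidate itemset against the five transactions. The names `database`, `MIN_SUPPORT` and `frequent_itemsets` are illustrative and do not come from the paper.

```python
from itertools import combinations

# The five transactions from the slide.
database = {
    "01": ["s", "c", "r", "a"],
    "02": ["s", "r", "n"],
    "03": ["n", "r", "c", "a"],
    "04": ["r", "s"],
    "05": ["r", "s", "c"],
}
MIN_SUPPORT = 0.6  # fraction of transactions, as on the slide


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset meeting min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= set(t))
            if count / n >= min_support:
                result[candidate] = count
                found = True
        if not found:
            break  # no frequent itemset of this size, so none larger either
    return result


patterns = frequent_itemsets(list(database.values()), MIN_SUPPORT)
for itemset, count in patterns.items():
    print(itemset, count)
```

The single items r, s and c match the header table, and the two frequent pairs (c, r) and (r, s) correspond to the two rules listed on the slide.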
6Problem Definition and Solutions (1 of 2)
- How can Frequent Pattern Mining be applied to Data Streams?
- - Mine in batches (windows) and use a single-pass algorithm
- How can the completeness of frequent patterns in data streams be ensured?
- - Store a bit more information than the currently frequent items
7Problem Definition and Solutions (2 of 2)
- How can the entire stream be mined within limited time and memory constraints?
- - Retain the counts of frequent and sub-frequent patterns and prune regularly
- How can time-sensitive Frequent Patterns be mined?
- - Use a tilted-time window
8Applicable Queries
- What is the frequent pattern set over the period between t2 and t3?
- What are the periods when (a, b) is frequent?
- Does the support of (a) change dramatically in the period from t3 to t0?
9Research Applications
- Mining Alarming Incidents (MAIDS)
- Network Traffic Analysis (Intrusion Detection)
- Shopping trends
- Dynamic Tracing of Stock Fluctuation
- Sensor Network Data Analysis
- Web Click Stream mining
10The FP-Stream Structure
This object is copied from the original paper
11Tilted-Time Windows
This object is copied from the original paper
12Time Granularity Error
- For a time window request consisting of h batches, the answer covers h' batches
- Time Granularity Error = h' - h ≤ h/2
- Example: a request for the last 24 batches (6 hours)
- - Gives the last 31 batches; error = 7 ≤ 24/2
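The arithmetic of the example can be reproduced with a small sketch. It assumes the stored windows cover 1, 2, 4, 8, 16, ... batches, newest first. That layout is an assumption chosen because it yields the 31 batches quoted on the slide; it is not a statement of the exact window state in the paper. The function name `covered_batches` is illustrative.

```python
def covered_batches(h, window_sizes=(1, 2, 4, 8, 16, 32)):
    """Return the smallest cumulative window span that covers the last h batches."""
    covered = 0
    for size in window_sizes:  # newest (finest) window first
        covered += size
        if covered >= h:
            return covered
    raise ValueError("request is longer than the stored history")


h = 24                       # the last 24 batches (6 hours)
h_prime = covered_batches(h)  # what the windows can actually deliver
error = h_prime - h
print(h_prime, error, error <= h / 2)  # prints: 31 7 True
```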
13Intermediate Buffer Windows
- A temporary location to store batch counts at different granularity levels. E.g., with f(i, j) the count over batches j through i:
- B = 8: f(8, 8), f(7, 7), f(6, 5), f(4, 1)
- B = 9: f(9, 9), f(8, 8), f(7, 7), f(6, 5), f(4, 1)
- B = 10: f(10, 10), f(9, 9), f(8, 7), f(6, 5), f(4, 1)
14Intermediate Buffer Windows
- B = 11: f(11, 11), f(10, 10), f(9, 9), f(8, 7), f(6, 5), f(4, 1)
- B = 12: f(12, 12), f(11, 11), f(10, 9), f(8, 5), f(4, 1)
- The size of the tilted-time window can grow no larger than 2⌈log2(N)⌉ + 2
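One way to picture the buffer mechanism is the sketch below. It is one possible reading, not the authors' code: each level keeps a main window plus one intermediate buffer, and two windows of the same level are merged and pushed one level up as soon as a third arrives. The class name `LogTiltedWindow` and its methods are hypothetical. Under this eager merging the sketch reproduces the slide's listings for B = 8, 10 and 12; for B = 9 and B = 11 it has already merged the two older single-batch windows, so its listing is shorter than the one on the slide.

```python
import math


class LogTiltedWindow:
    """Logarithmic tilted-time window with one intermediate buffer per level."""

    def __init__(self):
        # levels[k] = [main, buffer]; each entry is (count, newest, oldest) or None
        self.levels = []

    def add(self, count, batch_id):
        carry = (count, batch_id, batch_id)
        level = 0
        while carry is not None:
            if level == len(self.levels):
                self.levels.append([None, None])
            main, buffer = self.levels[level]
            self.levels[level][0] = carry          # newest window takes the main slot
            if main is None:
                carry = None                        # level was empty, nothing to shift
            elif buffer is None:
                self.levels[level][1] = main        # park the old window in the buffer
                carry = None
            else:
                # Buffer already full: merge both old windows and push them up a level.
                carry = (main[0] + buffer[0], main[1], buffer[2])
                self.levels[level][1] = None
            level += 1

    def spans(self):
        """List (newest, oldest) batch ids of every stored window, newest first."""
        return [(w[1], w[2]) for lvl in self.levels for w in lvl if w is not None]


window = LogTiltedWindow()
snapshots = {}
for b in range(1, 13):
    window.add(1, b)          # one unit of count per batch, for illustration
    snapshots[b] = window.spans()

print(snapshots[8])           # windows after batch 8
print(snapshots[12])          # windows after batch 12
print(len(snapshots[12]), 2 * math.ceil(math.log2(12)) + 2)  # stored windows vs. bound
```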
15Frequent Pattern Pruning
- Tail Pruning: Drop the counts of the last windows (m to n) if the sum of their frequencies is < ε times the number of transactions they cover
- - f_I(T) - εW ≤ f̂_I(T) ≤ f_I(T), where f̂_I(T) is the approximate frequency kept after pruning and W is the number of transactions in T
- Type I Pruning: Supersets of an infrequent pattern are also infrequent
- Type II Pruning: Drop the tails (or the entire pattern) of all supersets if the subset has been dropped.
This equation is copied from the original paper
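Tail pruning can be sketched as follows. This is a simplified reading of the rule, not the paper's exact condition: the oldest windows of a pattern are dropped when each of them is individually below σ times its width and every running sum over the dropped tail stays below ε times the transactions covered. The function name `tail_prune` and the (count, width) layout are assumptions made for the example.

```python
def tail_prune(windows, sigma, eps):
    """Drop the oldest windows of one pattern when their counts are negligible.

    windows: list of (count, width) pairs, newest first, where width is the
    number of transactions the window covers.
    """
    for m in range(len(windows)):
        tail = windows[m:]
        # Every window in the dropped tail must itself be infrequent.
        if any(count >= sigma * width for count, width in tail):
            continue
        total_count = total_width = 0
        negligible = True
        for count, width in tail:
            total_count += count
            total_width += width
            if total_count >= eps * total_width:
                negligible = False
                break
        if negligible:
            return windows[:m]   # keep only the windows newer than the tail
    return windows


history = [(60, 100), (5, 100), (1, 200), (0, 400)]
print(tail_prune(history, sigma=0.1, eps=0.01))  # prints: [(60, 100), (5, 100)]
```

An empty result means the whole pattern can be dropped, which is the case Type II pruning then propagates to its supersets.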
16Algorithm (1 of 3)
- Batch 1
- Compute the frequencies of all items
- Create f_list: frequency-ordered items
- Create FP-Tree with items with F_I ≥ ε|B1|
- Mine all patterns off the tree
- Create FP-Stream Structure (Pattern Tree + Tilted-Time Windows)
17Algorithm (2 of 3)
- Batch i, for i ≥ 2
- Empty FP-Tree
- Sort each transaction, t, according to f_list
- Insert into FP-Tree without pruning items
- Mine each pattern, I, off the tree
- If I is in FP-Stream Structure
- - Add F_I(B) to tilted-time window for I
- - Do Tail Pruning
- - Do Type II Pruning
18Algorithm (3 of 3)
- If I is not in FP-Stream Structure
- - If F_I(B) ≥ ε|B|
- - Insert I into FP-Stream Structure
- - Else Do Type I Pruning
- Scan FP-Stream structure
- If I was not updated in batch B, insert 0 in I's tilted-time window
- Do Tail Pruning
- Drop leaf nodes with empty tilted-time windows
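The per-batch update can be sketched in a heavily simplified form. The assumptions are substantial: a plain dict stands in for the pattern tree, a flat list of (batch, count) pairs stands in for the tilted-time window, itemsets are counted by brute force rather than mined from an FP-tree, and tail pruning and Type II pruning are left out. The names `count_itemsets`, `process_batch` and `store` are illustrative only.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(batch, max_len=3):
    """Count every itemset of up to max_len items in one batch (brute force)."""
    counts = Counter()
    for transaction in batch:
        items = sorted(set(transaction))
        for k in range(1, min(max_len, len(items)) + 1):
            counts.update(combinations(items, k))
    return counts


def process_batch(store, batch_id, batch, eps, max_len=3):
    """Fold one batch into `store`, a dict: itemset -> list of (batch_id, count)."""
    counts = count_itemsets(batch, max_len)
    threshold = eps * len(batch)
    # Visit short itemsets first so a superset can see whether its subsets were kept.
    for itemset in sorted(counts, key=len):
        count = counts[itemset]
        if itemset in store:
            store[itemset].append((batch_id, count))
        elif count >= threshold:
            subsets = combinations(itemset, len(itemset) - 1)
            # Type I pruning (emulated): skip supersets of a pattern that was not kept.
            if len(itemset) == 1 or all(s in store for s in subsets):
                store[itemset] = [(batch_id, count)]
    # Patterns already stored but absent from this batch get an explicit zero.
    for history in store.values():
        if history[-1][0] != batch_id:
            history.append((batch_id, 0))
    return store


store = {}
process_batch(store, 1, [["r", "s", "c"], ["r", "s"], ["r", "n"], ["r", "s", "c"]], eps=0.5)
process_batch(store, 2, [["r", "n"], ["r", "n"], ["s"]], eps=0.5)
for itemset, history in store.items():
    print(itemset, history)
```

In a fuller version each history list would be replaced by a tilted-time window and pruned after every batch.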
19Experiments (1 of 2)
- IBM Synthetic Market-Basket Data Generator
- 3M transactions using 1K distinct items
- Each batch was 50K transactions
- σ = 0.5 and 0.75
- ε = 0.1σ
- Randomly permutes 200 table entries every five batches
- 3 data sets: Length 3, 5 and 7
20Experiments (2 of 2)
- Statistics: collected at the end of each batch
- Time: number of seconds per batch
- Size: size of the FP-Stream structure
- Num Itemset: total number of itemsets
- Ave Len: average length of an itemset
21Results ( 1 of 4)
- Time: average of 1000 transactions/sec
This object is copied from the original paper
22Results ( 2 of 4)
This object is copied from the original paper
23Results ( 3 of 4)
- Ave Length: approx. 3
This object is copied from the original paper
24Results ( 4 of 4)
- Num Itemsets: 500 to 25,000 itemsets
This object is copied from the original paper
25Discussion
- Introduce a fading factor for older transactions
- Compress the FP-Stream structure further (parent-child)
- Reordering of (sub)frequent items
- Limit the length of the tilted-time window
26Related Work
- Stream Data Classification: Domingos and Hulten (2000)
- Stream Clustering: Guha (2000), O'Callaghan (2002)
- Mining Frequent Counts in Streams: Manku (2002)
- - Landmark model: mines frequent patterns in data streams by assuming patterns are measured from the start of the stream up to the current moment.
27Conclusions
- A novel approach to mine time-sensitive frequent patterns in data streams
- Overall space requirements can easily fit into main memory (less than 3M)
- The algorithm does not fall behind the stream (2000 to 180 transactions per second)
28Opinion
- The use of tilted-time windows is a novel and key idea to the mining of data streams
- Choice of the maximum support error, ε, at which the loss of support will be insignificant
30Frequent, Sub-Frequent and Infrequent Patterns
- For a period T, with the support of a pattern measured over T:
- σ = minimum support
- ε = maximum support error (0.1σ)
- Frequent: support ≥ σ
- Subfrequent: σ > support ≥ ε
- Infrequent: support < ε
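The three categories translate directly into a small helper. This is a sketch for illustration: the function name `classify` and the example thresholds are assumptions, with ε set to 0.1σ as on the slide.

```python
def classify(count, num_transactions, sigma, eps):
    """Label a pattern by its support over a period of num_transactions."""
    support = count / num_transactions
    if support >= sigma:
        return "frequent"
    if support >= eps:
        return "subfrequent"
    return "infrequent"


SIGMA = 0.5          # minimum support (example value)
EPS = 0.1 * SIGMA    # maximum support error
print(classify(80, 100, SIGMA, EPS))  # frequent
print(classify(20, 100, SIGMA, EPS))  # subfrequent
print(classify(1, 100, SIGMA, EPS))   # infrequent
```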
31Natural tilted-time window
This object is copied from the original paper
One month: 4 + 24 + 31 = 59 units
32Logarithmic tilted-time window
One year: log2(365 × 24 × 4) + 1 ≈ 17 units, rather than 366 × 24 × 4 = 35,136 units
This object is copied from the original paper
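The unit counts for the natural and logarithmic tilted-time windows can be verified with a few lines. The sketch assumes the finest unit is a quarter of an hour (4 per hour), as the 4 and 24 factors suggest; the variable names are illustrative.

```python
import math

QUARTERS_PER_HOUR, HOURS_PER_DAY = 4, 24

# Natural tilted-time window for one month: 4 quarters + 24 hours + 31 days.
natural_units = QUARTERS_PER_HOUR + HOURS_PER_DAY + 31

# Logarithmic tilted-time window for one year of quarter-hour batches.
log_units = math.ceil(math.log2(365 * HOURS_PER_DAY * QUARTERS_PER_HOUR) + 1)

# Keeping every quarter-hour of a 366-day year separately.
flat_units = 366 * HOURS_PER_DAY * QUARTERS_PER_HOUR

print(natural_units, log_units, flat_units)  # prints: 59 17 35136
```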
33Frequent Patterns for Tilted-Time Windows
This object is copied from the original paper