Mining Frequent Patterns in Data Streams at Multiple Time Granularities

Provided by: awi88

1
Mining Frequent Patterns in Data Streams at
Multiple Time Granularities
  • C. Giannella, J. Han, J. Pei, X. Yan and P. Yu

Pele Williams
2
Presentation Outline
  • Objective and Definition of Key terms
  • Problem Definition and Research Applications
  • The FP-Stream Structure
  • Pruning Techniques
  • Algorithm
  • Experiments, Results and Conclusion

3
Objective
  • An extension of frequent pattern mining to Data
    Streams
  • Propose the FP-stream structure which enables
    the dynamic mining of time-sensitive
    frequent patterns

4
Terms: Frequent Pattern Mining
  • An association rule mining technique used to
    obtain all frequent itemsets (sets of items that
    occur together in transactions) for a given
    minimum support threshold
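To make the definition concrete, here is a minimal brute-force sketch (deliberately not FP-growth) that counts the support of every itemset in the five-transaction example database of slide 5 and keeps those meeting a minimum support of 0.6. The function name `frequent_itemsets` is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(db: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], int]:
    """Brute-force frequent itemset mining: count every k-itemset for k = 1, 2, ..."""
    items = sorted(set().union(*db))
    result: Dict[FrozenSet[str], int] = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            count = sum(1 for t in db if set(combo) <= t)
            if count / len(db) >= min_support:  # support = fraction of transactions
                result[frozenset(combo)] = count
                found = True
        if not found:
            break  # no frequent k-itemset means no frequent (k+1)-itemset either
    return result


# Example database from slide 5 (transaction IDs 01 to 05)
db = [
    {"s", "c", "r", "a"},
    {"s", "r", "n"},
    {"n", "r", "c", "a"},
    {"r", "s"},
    {"r", "s", "c"},
]
for itemset, count in sorted(frequent_itemsets(db, 0.6).items(), key=lambda kv: (len(kv[0]), -kv[1])):
    print(sorted(itemset), count)
```

This prints r: 5, s: 4, c: 3, {r, s}: 4 and {c, r}: 3. Enumerating all combinations is exponential in the number of items, which is exactly what the FP-tree on the next slide avoids.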

5
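Example: Frequent Pattern Mining
  • Database (minimum support 0.6, i.e. at least 3 of
    the 5 transactions):
    01: s, c, r, a
    02: s, r, n
    03: n, r, c, a
    04: r, s
    05: r, s, c
  • Header table (item: count): r: 5, s: 4, c: 3
  • Frequent pattern tree: root → r:5, with children
    s:4 (→ c:2) and c:1
  • Conditional pattern bases: c: {r, s}(2), {r}(1);
    s: {r}(4)
  • Conditional pattern trees: c: r:3; s: r:4
  • Rules: c → r, s → r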
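The FP-tree and conditional pattern bases of this example can be reproduced with a short sketch. It assumes ties in the f_list are broken alphabetically and uses hypothetical names (`FPNode`, `build_fp_tree`, `conditional_pattern_base`); it builds the tree and extracts prefix paths but does not implement the full recursive FP-growth mining.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


class FPNode:
    def __init__(self, item: Optional[str], parent: Optional["FPNode"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}


def build_fp_tree(db: List[Set[str]], min_count: int):
    counts = Counter(item for t in db for item in t)
    # f_list: frequent items by descending count (ties broken alphabetically)
    f_list = sorted((i for i, c in counts.items() if c >= min_count), key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(f_list)}
    root = FPNode(None, None)
    node_links: Dict[str, List[FPNode]] = {item: [] for item in f_list}
    for t in db:
        node = root
        # keep only frequent items, in f_list order, so shared prefixes share nodes
        for item in sorted((i for i in t if i in rank), key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                node_links[item].append(child)
            child.count += 1
            node = child
    return root, f_list, node_links


def conditional_pattern_base(item: str, node_links) -> List[Tuple[List[str], int]]:
    """Prefix paths leading to each node of `item`, paired with that node's count."""
    base = []
    for node in node_links[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((path[::-1], node.count))
    return base


db = [{"s", "c", "r", "a"}, {"s", "r", "n"}, {"n", "r", "c", "a"}, {"r", "s"}, {"r", "s", "c"}]
root, f_list, links = build_fp_tree(db, min_count=3)  # 0.6 * 5 transactions
print(f_list)                                # ['r', 's', 'c']
print(conditional_pattern_base("c", links))  # [(['r', 's'], 2), (['r'], 1)]
print(conditional_pattern_base("s", links))  # [(['r'], 4)]
```

Sorting each transaction by the f_list is what lets transactions 01, 02, 04 and 05 share the r → s path, giving the r:5, s:4 counts of the slide.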
6
Problem Definition and Solutions (1 of 2)
  • How can Frequent Pattern Mining be applied to
    Data Streams?
  • - Mine in batches (windows) and use a
    single-pass algorithm
  • How can the completeness of frequent patterns in
    data streams be ensured?
  • - Store a bit more information than the
    currently frequent items

7
Problem Definition and Solutions (2 of 2)
  • How can the entire stream be mined within limited
    time and memory constraints?
  • - Retain the counts of frequent and sub-frequent
    patterns and prune regularly
  • How can time-sensitive frequent patterns be
    mined?
  • - Use a tilted-time window

8
Applicable Queries
  • What is the frequent pattern set over the period
    between t2 and t3?
  • What are the periods when (a, b) is frequent?
  • Does the support of item a change dramatically in
    the period from t3 to t0?

9
Research Applications
  • Mining Alarming Incidents (MAIDS)
  • Network Traffic Analysis (Intrusion Detection)
  • Shopping trends
  • Dynamic Tracing of Stock Fluctuation
  • Sensor Network Data Analysis
  • Web Click Stream mining

10
The FP-Stream Structure
This object is copied from the original paper
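The figure itself is not reproduced here. As a rough illustration of the structure named on the algorithm slides (a pattern tree whose nodes each carry a tilted-time window), here is a hypothetical sketch; `PatternNode`, `record` and `find` are illustrative names, and each window is simplified to a plain list of per-batch counts (newest first) rather than a logarithmic tilted-time window.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class PatternNode:
    item: str = ""
    # Simplified tilted-time window: one count per batch, newest first
    window: List[int] = field(default_factory=list)
    children: Dict[str, "PatternNode"] = field(default_factory=dict)


def record(root: PatternNode, pattern: Sequence[str], count: int) -> None:
    """Add this batch's count for `pattern` (items already in f_list order)."""
    node = root
    for item in pattern:
        if item not in node.children:
            node.children[item] = PatternNode(item)
        node = node.children[item]
    node.window.insert(0, count)


def find(root: PatternNode, pattern: Sequence[str]) -> Optional[PatternNode]:
    """Return the node for `pattern`, or None if it is not stored."""
    node = root
    for item in pattern:
        node = node.children.get(item)
        if node is None:
            return None
    return node


root = PatternNode()
# Frequent patterns of the slide 5 example, listed in f_list order (r, s, c)
for pattern, count in [(["r"], 5), (["s"], 4), (["c"], 3), (["r", "s"], 4), (["r", "c"], 3)]:
    record(root, pattern, count)
print(sorted(root.children["r"].children))  # ['c', 's']: {r, s} and {r, c} share the r prefix
```

Storing patterns in f_list order lets patterns with a common prefix share nodes, much as transactions do in an FP-tree.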
11
Tilted-Time Windows
This object is copied from the original paper
12
Time Granularity Error
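  • For a time window request consisting of h
    batches, the tilted-time window answers using
    h' ≥ h batches made of whole frames
  • Time granularity error: $h' - h \le h/2$
  • Example: a request for the last 24 batches (6 hours)
  • returns the last 31 batches, and 31 - 24 = 7 ≤ 24/2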
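The example can be checked with a short sketch. It assumes the window currently holds frames of 1, 1, 1, 2, 2, 4, 4, 8 and 8 batches (newest first, 31 batches in total) and that a query can only use whole frames; `covered_batches` is a hypothetical helper.

```python
from itertools import accumulate
from typing import List


def covered_batches(frame_sizes: List[int], h: int) -> int:
    """Batches h' actually covered when the last h batches are requested.

    frame_sizes lists frame lengths in batches, newest first. Because a query can
    only use whole frames, it takes the shortest prefix covering at least h batches.
    """
    for covered in accumulate(frame_sizes):
        if covered >= h:
            return covered
    raise ValueError("the window holds fewer than h batches")


sizes = [1, 1, 1, 2, 2, 4, 4, 8, 8]  # assumed frame layout (31 batches)
h = 24
h_prime = covered_batches(sizes, h)
print(h_prime, h_prime - h, h_prime - h <= h / 2)  # 31 7 True
```

The cumulative frame boundaries are 1, 2, 3, 5, 7, 11, 15, 23, 31, so a 24-batch request cannot stop at 23 and must extend to 31.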

13
Intermediate Buffer Windows
  • A temporary location to store batch counts at
    different granularity levels, where f(i, j) is the
    frequency over batches j through i. E.g.
  • After B8: f(8, 8) f(7, 7) f(6, 5) f(4, 1)
  • After B9: f(9, 9) f(8, 8) f(7, 7) f(6, 5) f(4, 1)
  • After B10: f(10, 10) f(9, 9) f(8, 7) f(6, 5) f(4, 1)

14
Intermediate Buffer Windows
  • After B11: f(11, 11) f(10, 10) f(9, 9) f(8, 7)
    f(6, 5) f(4, 1)
  • After B12: f(12, 12) f(11, 11) f(10, 9) f(8, 5)
    f(4, 1)
  • The size of the tilted-time window can grow no
    larger than $2\lceil \log_2 N \rceil + 2$ frames after N batches
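The B8 to B12 sequence on these two slides can be simulated with a small sketch. The merge rule is an assumption chosen because it reproduces exactly these states: level 0 keeps at most three frames (two regular plus the intermediate buffer), every higher level keeps at most two, and on overflow the two oldest frames of a level are merged and pushed up. `add_batch` and `frames_newest_first` are hypothetical names.

```python
from typing import List, Tuple

Frame = Tuple[int, int, int]  # (newest batch i, oldest batch j, count f(i, j))


def add_batch(levels: List[List[Frame]], batch: int, count: int) -> None:
    """Insert one batch count into a logarithmic tilted-time window.

    Assumed rule: level 0 holds at most 3 frames, higher levels at most 2.
    On overflow the two oldest frames are merged (counts summed) and carried up.
    """
    carry: Frame = (batch, batch, count)
    level = 0
    while True:
        if level == len(levels):
            levels.append([])
        frames = levels[level]
        frames.insert(0, carry)  # newest frame first
        if len(frames) <= (3 if level == 0 else 2):
            return
        oldest = frames.pop()
        second = frames.pop()
        carry = (second[0], oldest[1], second[2] + oldest[2])
        level += 1


def frames_newest_first(levels: List[List[Frame]]) -> List[Frame]:
    return [frame for frames in levels for frame in frames]


levels: List[List[Frame]] = []
for b in range(1, 13):
    add_batch(levels, b, count=1)  # count 1 per batch, so f(i, j) = i - j + 1
    if b >= 8:
        print(f"B{b}:", " ".join(f"f({i}, {j})" for i, j, _ in frames_newest_first(levels)))
```

Under this rule the printed lines match the five states listed on the slides; the paper's exact merging procedure may be described differently.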

15
Frequent Pattern Pruning
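  • Tail Pruning: drop the counts of the oldest windows
    t_m through t_n of a pattern I when I is infrequent
    in each of them and their cumulative frequency
    stays below the maximum support error ε:
    $f_I(t_i) < s\,W_i$ for all $m \le i \le n$, and
    $\sum_{i=m}^{l} f_I(t_i) < \epsilon \sum_{i=m}^{l} W_i$ for all $m \le l \le n$,
    where $W_i$ is the number of transactions in window $t_i$
  • Type I Pruning: supersets of an infrequent
    pattern are also infrequent, so they need not be mined
  • Type II Pruning: if the tail (or the entire window)
    of a pattern has been dropped, drop the same tail
    (or the entire pattern) from all of its supersets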

This equation is copied from the original paper
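A sketch of tail pruning, assuming this condition: the windows t_m to t_n may be dropped when the pattern is below s·W_i in each of them and every cumulative sum of its frequency from t_m up to t_l stays below ε times the matching cumulative transaction count. Windows are indexed from the most recent (index 0) to the oldest; `tail_prune` and the sample numbers are hypothetical.

```python
from typing import List


def tail_prune(freqs: List[int], sizes: List[int], s: float, eps: float) -> List[int]:
    """Drop the oldest windows of one pattern I under the tail-pruning condition.

    freqs[i] is f_I(t_i) and sizes[i] is W_i; index 0 is the most recent window.
    The smallest m is chosen such that windows m..n-1 all satisfy f < s * W and
    every partial sum from m to l satisfies sum(f) < eps * sum(W).
    """
    n = len(freqs)
    for m in range(n):
        if any(freqs[i] >= s * sizes[i] for i in range(m, n)):
            continue  # I is frequent in some window of this tail
        f_sum = w_sum = 0
        for l in range(m, n):
            f_sum += freqs[l]
            w_sum += sizes[l]
            if f_sum >= eps * w_sum:
                break  # cumulative support reaches eps: this tail must be kept
        else:
            return freqs[:m]  # every partial sum stayed below eps
    return list(freqs)


freqs = [60, 40, 3, 1, 2]               # hypothetical counts, newest first
sizes = [1000, 1000, 2000, 4000, 8000]  # hypothetical window sizes W_i
print(tail_prune(freqs, sizes, s=0.05, eps=0.005))  # [60, 40]
```

Starting the tail at index 1 fails because 40 ≥ 0.005 × 1000, so the three oldest windows (3, 1 and 2) are the longest tail that can be dropped.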
16
Algorithm (1 of 3)
  • Batch B1
  • Compute the frequencies of all items
  • Create f_list: items ordered by descending frequency
  • Create an FP-tree from the items with f_I ≥ ε|B1|
  • Mine all patterns from the tree
  • Create the FP-Stream structure
  • (pattern tree + tilted-time windows)

17
Algorithm (2 of 3)
  • Batch Bi, for i ≥ 2
  • Empty the FP-tree
  • Sort each transaction t according to f_list
  • Insert t into the FP-tree without pruning any items
  • Mine each pattern I from the tree
  • If I is in the FP-Stream structure:
  • - Add f_I(B) to the tilted-time window of I
  • - Do tail pruning
  • - Do Type II pruning

18
Algorithm (3 of 3)
  • If I is not in the FP-Stream structure:
  • - If f_I(B) ≥ ε|B|, insert I into the FP-Stream structure
  • - Else do Type I pruning
  • Scan the FP-Stream structure
  • If I was not updated in batch B, insert 0 into I's
    tilted-time window
  • Do tail pruning
  • Drop leaf nodes with empty tilted-time windows
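The per-batch loop on these three slides can be sketched under strong simplifying assumptions: patterns are enumerated level by level with an Apriori-style join instead of building and mining an FP-tree, each pattern's window is a plain list of per-batch counts (newest first) rather than a logarithmic tilted-time window, the structure is a flat dictionary instead of a pattern tree, and s and ε are fractions of the batch size. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Pattern = FrozenSet[str]


def tail_index(freqs: List[int], sizes: List[int], s: float, eps: float) -> int:
    """Smallest m such that windows m..n-1 may be dropped (n if none may)."""
    n = len(freqs)
    for m in range(n):
        if any(freqs[i] >= s * sizes[i] for i in range(m, n)):
            continue
        f_sum = w_sum = 0
        for l in range(m, n):
            f_sum += freqs[l]
            w_sum += sizes[l]
            if f_sum >= eps * w_sum:
                break
        else:
            return m
    return n


def process_batch(stream: Dict[Pattern, List[int]], sizes: List[int],
                  batch: List[Set[str]], s: float, eps: float) -> None:
    sizes.insert(0, len(batch))  # |B| for the new batch, newest first
    updated: Set[Pattern] = set()
    level = [frozenset([i]) for i in sorted({i for t in batch for i in t})]
    while level:
        survivors: List[Pattern] = []
        for p in level:
            f = sum(1 for t in batch if p <= t)
            if p in stream:
                window = stream[p]
                window.insert(0, f)  # add f_I(B) to I's window
                updated.add(p)
                del window[tail_index(window, sizes, s, eps):]  # tail pruning
                if window:
                    survivors.append(p)
                # else (window empty): supersets are not mined, a simplified Type II pruning
            elif f >= eps * len(batch):
                stream[p] = [f]  # insert a new (sub-)frequent pattern
                updated.add(p)
                survivors.append(p)
            # else: Type I pruning, p is not inserted and its supersets are not mined
        alive = set(survivors)
        joined = {a | b for a, b in combinations(survivors, 2) if len(a | b) == len(a) + 1}
        level = [c for c in joined if all(c - {x} in alive for x in c)]
    for p in list(stream):
        window = stream[p]
        if p not in updated:
            window.insert(0, 0)  # I did not occur in this batch
            del window[tail_index(window, sizes, s, eps):]
        if not window:
            del stream[p]  # drop patterns whose windows are empty


stream: Dict[Pattern, List[int]] = {}
sizes: List[int] = []
process_batch(stream, sizes, [{"a", "b"}, {"a"}, {"a", "b"}, {"c"}], s=0.5, eps=0.2)
process_batch(stream, sizes, [{"a"}, {"a"}, {"a"}, {"a"}], s=0.5, eps=0.2)
for p, w in sorted(stream.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(p), w)
```

After the second batch, c has a zero and a single occurrence, both below ε, so its whole window is tail-pruned and it leaves the structure, while b and {a, b} keep their windows because they were frequent in the first batch.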

19
Experiments (1 of 2)
  • IBM Synthetic Market-Basket Data Generator
  • 3M transactions using 1K distinct items
  • Each batch was 50K transactions
  • s = 0.5 and 0.75
  • ε = 0.1s
  • Randomly permutes 200 table entries every five
    batches
  • 3 data sets: lengths 3, 5 and 7

20
Experiments (2 of 2)
  • Statistics recorded at the end of each batch:
  • Time: number of seconds per batch
  • Size: size of the FP-Stream structure
  • Num Itemset: total number of itemsets
  • Ave Len: average length of an itemset

21
Results (1 of 4)
  • Time: average of 1,000 transactions per second

This object is copied from the original paper
22
Results (2 of 4)
  • Size: approx. 350K per batch

This object is copied from the original paper
23
Results (3 of 4)
  • Ave Len: approx. 3

This object is copied from the original paper
24
Results (4 of 4)
  • Num Itemsets: 500 to 25,000 itemsets

This object is copied from the original paper
25
Discussion
  • Introduce a fading factor for older transactions
  • Compress the FP-Stream structure further
    (parent-child)
  • Reordering of (sub)frequent items
  • Limit the length of the tilted-time window

26
Related Work
  • Stream data classification: Domingos and Hulten
    (2000)
  • Stream clustering: Guha (2000), O'Callaghan
    (2002)
  • Mining frequent counts in streams: Manku (2002).
    Landmark model: mines frequent patterns in data
    streams assuming patterns are measured from the
    start of the stream up to the current moment.

27
Conclusions
  • A novel approach to mine time-sensitive frequent
    patterns in data streams
  • Overall space requirements easily fit into main
    memory (less than 3M)
  • The algorithm does not fall behind the stream
    (180 to 2,000 transactions per second)

28
Opinion
  • The use of tilted-time windows is a novel and key
    idea to the mining of data streams
  • The choice of the maximum support error, ε, at
    which the loss of support remains insignificant

29
  • Questions?

30
Frequent, Sub-Frequent and Infrequent Patterns
  • For a period T, let
  • s = minimum support
  • ε = maximum support error (0.1s)
  • Frequent: support ≥ s
  • Sub-frequent: s > support ≥ ε
  • Infrequent: support < ε
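The three classes translate directly into threshold checks. This minimal sketch assumes support over T is the pattern's count divided by the number of transactions in T; `classify` and the value s = 0.5 are hypothetical.

```python
def classify(count: int, total: int, s: float, eps: float) -> str:
    """Classify a pattern over a period T from its count and the transactions in T."""
    support = count / total
    if support >= s:
        return "frequent"
    if support >= eps:
        return "sub-frequent"
    return "infrequent"


s = 0.5        # hypothetical minimum support
eps = 0.1 * s  # maximum support error, as on this slide
for count in (60, 20, 3):
    print(count, classify(count, 100, s, eps))  # frequent, sub-frequent, infrequent
```

Only sub-frequent and frequent patterns are kept in the FP-Stream structure; infrequent ones are candidates for pruning.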

31
Natural tilted-time window
This object is copied from the original paper
One month: 4 quarters + 24 hours + 31 days = 59 units
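A natural tilted-time window along these lines can be sketched with bounded deques, assuming one count per 15-minute quarter and fixed-size slots for the last 4 quarters, 24 hours and 31 days. `NaturalTiltedWindow` is a hypothetical name, and coarser levels (months, years) are omitted.

```python
from collections import deque


class NaturalTiltedWindow:
    """Last 4 quarters, last 24 hours and last 31 days of counts (newest first)."""

    def __init__(self) -> None:
        self.quarters: deque = deque(maxlen=4)
        self.hours: deque = deque(maxlen=24)
        self.days: deque = deque(maxlen=31)
        self._hour_sum = self._hour_parts = 0
        self._day_sum = self._day_parts = 0

    def add_quarter(self, count: int) -> None:
        self.quarters.appendleft(count)
        self._hour_sum += count
        self._hour_parts += 1
        if self._hour_parts == 4:  # a full hour has completed
            self.hours.appendleft(self._hour_sum)
            self._day_sum += self._hour_sum
            self._day_parts += 1
            self._hour_sum = self._hour_parts = 0
            if self._day_parts == 24:  # a full day has completed
                self.days.appendleft(self._day_sum)
                self._day_sum = self._day_parts = 0

    def slots(self) -> int:
        return len(self.quarters) + len(self.hours) + len(self.days)


w = NaturalTiltedWindow()
for _ in range(31 * 24 * 4):  # one month of quarters, count 1 each
    w.add_quarter(1)
print(w.slots(), w.days[0], w.hours[0])  # 59 96 4
```

`deque(maxlen=...)` discards the oldest entry automatically, so the window never holds more than 4 + 24 + 31 = 59 units.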
32
Logarithmic tilted-time window
One year: $\lceil \log_2(365 \times 24 \times 4) \rceil + 1 = 17$
units, rather than 366 × 24 × 4 = 35,136 units
This object is copied from the original paper
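The unit counts can be checked with a few lines of Python; this simply re-does the slide's arithmetic, assuming one batch per 15-minute quarter.

```python
import math

quarters_per_year = 365 * 24 * 4                         # quarter-hour batches in a year
log_units = math.ceil(math.log2(quarters_per_year)) + 1  # logarithmic tilted-time window
flat_units = 366 * 24 * 4                                # one slot per quarter (leap year)
print(quarters_per_year, log_units, flat_units)          # 35040 17 35136
```

Since 2^15 = 32,768 < 35,040 < 2^16, the ceiling of the logarithm is 16, giving 17 units.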
33
Frequent Patterns for Tilted-Time Windows
This object is copied from the original paper