Title: Online Pattern Discovery Applications in Data Streams
1 Online Pattern Discovery Applications in Data Streams
- Sensor-less: pairs trading in stock trading (find highly correlated pairs in n log n time)
- Sensor-full: gamma ray detection in astrophysics (burst detection over a large number of window sizes in almost linear time)
- Dennis Shasha (joint work with Yunyue Zhu)
- yunyue,shasha_at_cs.nyu.edu
2 Application 1: Pairs Trading
- Stock price streams
- The New York Stock Exchange (NYSE)
- 50,000 securities (streams); 100,000 ticks (trade and quote)
- Pairs Trading, a.k.a. Correlation Trading
- Query: which pairs of stocks were correlated with a value of over 0.9 for the last three hours?
"XYZ and ABC have been correlated with a correlation of 0.95 for the last three hours. Now XYZ and ABC become less correlated as XYZ goes up and ABC goes down. They should converge back later. I will sell XYZ and buy ABC."
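The query above can be answered directly, if slowly, by computing the Pearson correlation of every pair over the most recent window. The sketch below is only that exact baseline, not the method from the talk; the function names (`pearson`, `correlated_pairs`) and the sample prices are made up for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a flat series; treated as 0 here
    return cov / math.sqrt(vx * vy)

def correlated_pairs(streams, window, threshold=0.9):
    """Return (name_a, name_b, corr) for every pair whose correlation over
    the last `window` points exceeds `threshold`. Cost: O(N^2 * window)."""
    recent = {name: values[-window:] for name, values in streams.items()}
    result = []
    for a, b in combinations(sorted(recent), 2):
        c = pearson(recent[a], recent[b])
        if c > threshold:
            result.append((a, b, c))
    return result

# Hypothetical price streams: ABC roughly tracks XYZ, QRS does not.
prices = {
    "XYZ": [10.0, 10.5, 11.0, 10.8, 11.5, 12.0],
    "ABC": [20.0, 21.1, 22.0, 21.5, 23.1, 24.0],
    "QRS": [5.0, 4.8, 5.1, 4.7, 5.2, 4.6],
}
pairs = correlated_pairs(prices, window=6)
```

With these sample values only the ABC/XYZ pair passes the 0.9 threshold. Every pair is recomputed from scratch on each call, which is what makes this baseline too slow for tens of thousands of streams.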
3 Online Detection of High Correlation
- Given tens of thousands of high-speed time series data streams, detect high-value correlation, both synchronized and time-lagged, over sliding windows in real time.
- Real time
- high update frequency of the data stream
- fixed response time, online
4 StatStream Algorithm
- Naive algorithm
- N: number of streams
- w: size of sliding window
- space O(N) and time O(N^2 w) vs. space O(N^2) and time O(N^2)
- Suppose that the streams are updated every second.
- With a Pentium 4 PC, the exact method can monitor only 700 streams with a delay of 2 minutes.
- Our approach
- Discrete Fourier Transform to approximate correlation
- grid structure to filter out unlikely pairs
- Our approach can monitor 10,000 streams with a delay of 2 minutes.
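One way to see why a few DFT coefficients can stand in for the full correlation: after each series is shifted to zero mean and scaled to unit norm, the correlation equals 1 - d^2/2, where d is the Euclidean distance between the two normalized series. A unitary DFT preserves that distance, so a distance computed from only the first k coefficients can only underestimate d, which gives an upper bound on the correlation. The sketch below illustrates that bound only. It is not the StatStream implementation, it omits the grid structure, and the function names and sample data are assumptions. It also assumes non-constant series and k < n/2.

```python
import cmath
import math

def normalize(x):
    """Shift to zero mean and scale to unit Euclidean norm (x must not be constant)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]

def dft_prefix(x, k):
    """Coefficients 1..k of the unitary DFT, evaluated directly in O(n*k).
    Coefficient 0 is skipped because it is zero for a zero-mean series."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) / math.sqrt(n)
        for f in range(1, k + 1)
    ]

def exact_correlation(x, y):
    """Pearson correlation as the dot product of the normalized series."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))

def correlation_upper_bound(x, y, k):
    """Upper bound on the correlation from the first k DFT coefficients (k < n/2)."""
    X = dft_prefix(normalize(x), k)
    Y = dft_prefix(normalize(y), k)
    # For real input, coefficient n-f is the conjugate of coefficient f, so each
    # retained coefficient accounts for twice its squared difference.
    d2 = 2 * sum(abs(a - b) ** 2 for a, b in zip(X, Y))
    return 1 - d2 / 2

# Hypothetical windows of three streams.
x = [10.0, 10.5, 11.0, 10.8, 11.5, 12.0, 11.7, 12.4]
y = [20.0, 21.1, 22.0, 21.5, 23.1, 24.0, 23.2, 25.0]
z = [5.0, 4.8, 5.1, 4.7, 5.2, 4.6, 5.3, 4.5]
bound_xy = correlation_upper_bound(x, y, 2)
bound_xz = correlation_upper_bound(x, z, 2)
```

Because the bound never falls below the true correlation, a pair whose bound is under the query threshold (0.9 in the pairs-trading query) can be skipped without computing the exact value. Pairs that survive still need an exact check.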
5 StatStream: Stream synoptic data structure
- Three-level time interval hierarchy
- Time point, Basic window, Sliding window
- Basic window (the key to our technique)
- The computation for basic window i must finish by the end of basic window i+1
- The basic window time is the system response time.
- Digests
- Basic window digests: sum, DFT coefs
- Sliding window digests: sum, DFT coefs
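The role of the basic window can be illustrated with the simplest digest, the sum. In the sketch below, the sliding-window sum is updated only when a basic window closes, by adding the new basic-window digest and dropping the oldest one, so raw time points older than the open basic window never need to be kept. The class name `SlidingSum` and its parameters are hypothetical, and DFT-coefficient digests are not shown.

```python
from collections import deque

class SlidingSum:
    """Maintain the sum over a sliding window built from basic windows."""

    def __init__(self, basic_size, num_basic):
        self.basic_size = basic_size  # time points per basic window
        self.num_basic = num_basic    # basic windows per sliding window
        self.digests = deque()        # one sum per completed basic window
        self.current = []             # time points of the still-open basic window
        self.window_sum = 0.0         # sum over the retained basic windows

    def add(self, value):
        """Add one time point; return True when a basic window has just closed."""
        self.current.append(value)
        if len(self.current) < self.basic_size:
            return False
        digest = sum(self.current)
        self.current = []
        self.digests.append(digest)
        self.window_sum += digest
        if len(self.digests) > self.num_basic:
            # The oldest basic window slides out of the sliding window.
            self.window_sum -= self.digests.popleft()
        return True

# Sliding window of 6 time points made of 2 basic windows of 3 points each.
s = SlidingSum(basic_size=3, num_basic=2)
for v in range(1, 10):  # time points 1, 2, ..., 9
    s.add(v)
```

After the ninth point, the retained digests are 15 (points 4 to 6) and 24 (points 7 to 9), so `s.window_sum` is 39. Results become available once per basic window, which matches the slide's remark that the basic window time is the system response time.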
6 Application 2: elastic burst detection
- Discover time intervals with an unusually large number of events.
- In astrophysics, the sky is constantly observed for high-energy particles. When a particular astrophysical event happens, a shower of high-energy particles arrives in addition to the background noise.
- In finance, stocks with unusually high trading volumes should attract the notice of traders (or perhaps regulators).
- Challenge: to discover the time and duration of a burst, which may vary
- In astrophysics, a burst of high-energy particles associated with a special event might last for a few milliseconds or a few hours or even a few days
- NB: Similar idea may apply to spatial burst detection.
7 Application 2: burst detection
8 Burst Detection: Problem Statement
- Problem: Given a time series of positive numbers x_1, x_2, ..., x_n and a threshold function f(w), w = 1, 2, ..., n, find the subsequences of any size such that their sums are above the thresholds
- all 0 < w < n, 0 < m < n-w, such that x_m + x_{m+1} + ... + x_{m+w-1} > f(w)
- Brute force search: O(n^2) time
- Our shift wavelet tree (SWT): O(n+k) time.
- k is the size of the output, i.e. the number of windows with bursts
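The brute-force search mentioned above can be sketched with prefix sums, so that each window sum costs O(1) and the whole scan is O(n^2). The function name, the 0-based (start, size) convention, and the sample counts and threshold are illustrative assumptions.

```python
def brute_force_bursts(x, f):
    """Return every (start, size) with sum(x[start:start+size]) > f(size)."""
    n = len(x)
    prefix = [0]
    for v in x:
        prefix.append(prefix[-1] + v)  # prefix[i] = x[0] + ... + x[i-1]
    bursts = []
    for w in range(1, n + 1):          # every window size
        threshold = f(w)
        for m in range(0, n - w + 1):  # every start position for that size
            if prefix[m + w] - prefix[m] > threshold:
                bursts.append((m, w))
    return bursts

# Hypothetical event counts per time unit and a threshold that grows with size.
counts = [1, 2, 1, 9, 8, 1, 2, 1]
bursts = brute_force_bursts(counts, lambda w: 6 * w)
# bursts == [(3, 1), (4, 1), (3, 2)]
```

Here the two large single counts and the size-2 window covering both exceed their thresholds. No longer window does, because the surrounding low counts dilute the sum.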
9 Burst Detection: Data Structure and Algorithm
- Lemma 1: any subsequence s is included by one window w in the SWT.
- Lemma 2: if Sum(s) > threshold, then Sum(w) > threshold (no false negatives).
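The two lemmas suggest a filter-then-verify scheme, sketched below under stated assumptions. It assumes a layout in which level i holds windows of size 2^i that start every 2^(i-1) positions, so any subsequence of size at most 2^(i-1) + 1 lies inside some level-i window. It also assumes non-negative data and a non-decreasing threshold function f. A level-i window raises an alarm when its sum exceeds the smallest threshold among the sizes that level covers, and only alarmed windows are searched in detail. The function name `swt_bursts` and the simple detailed search are illustrative and do not reproduce the original implementation or its running-time guarantee.

```python
def swt_bursts(x, f):
    """Filter-then-verify burst search; assumes x >= 0 and f non-decreasing."""
    n = len(x)
    prefix = [0]
    for v in x:
        prefix.append(prefix[-1] + v)

    # Size-1 windows are checked directly.
    bursts = {(m, 1) for m in range(n) if x[m] > f(1)}

    level, lo = 1, 2                    # lo = smallest size handled at this level
    while lo <= n:
        half = 2 ** (level - 1)         # shift between consecutive windows
        size = 2 * half                 # window size at this level
        hi = min(half + 1, n)           # largest size guaranteed to be covered
        alarm = f(lo)                   # smallest threshold among sizes lo..hi
        for start in range(0, n, half):
            end = min(start + size, n)  # windows are truncated at the series end
            if prefix[end] - prefix[start] <= alarm:
                continue                # no size in lo..hi can burst in this window
            # Detailed search inside the alarmed window only.
            for w in range(lo, hi + 1):
                for m in range(start, end - w + 1):
                    if prefix[m + w] - prefix[m] > f(w):
                        bursts.add((m, w))
        lo, level = hi + 1, level + 1
    return sorted(bursts)

# Hypothetical event counts and threshold function.
counts = [1, 2, 1, 9, 8, 1, 2, 1]
result = swt_bursts(counts, lambda w: 6 * w)
# result == [(3, 1), (3, 2), (4, 1)]
```

An alarm can fire without any real burst inside the window (here the size-8 window sums to 25, above the alarm level 24, yet no size-4 or size-5 window exceeds its threshold), which is why the detailed search is needed. The reverse cannot happen under the stated assumptions: a subsequence that exceeds its threshold always makes its covering window exceed the alarm level.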