Title: Capturing Sensor-Generated Time Series with Quality Guarantees
1. Capturing Sensor-Generated Time Series with Quality Guarantees
- Iosif Lazaridis
- Sharad Mehrotra
- University of California, Irvine
- ICDE 2003, Bangalore, India
2. Talk Outline
- Sensors and QUASAR
- Archival Vs. On-line applications
- Poor Man's Compression
- Using Prediction
- Experiments
- Conclusions
3. Sensors
- Sensors are becoming smaller, cheaper, and more configurable
- Systems incorporating large numbers of them will be feasible
- Main problems: limited energy, limited bandwidth
- Goal: limit communication
- Benefits the data recipient as well
4. Quality Aware Sensing Architecture (QUASAR) at UC Irvine
- Tradeoff between data/application accuracy and system performance
- This paper: how to capture streaming time series with a given error tolerance
5. Archival Vs. Online Applications (I)
- Online applications discard the time series history (e.g., intrusion detection)
- The archival aspect is important because
- Data may be precious (e.g., once-in-a-lifetime events)
- There may be unforeseen uses of the data (e.g., new mining applications)
- Roll-back feature (e.g., what caused an accident?)
6. Archival Vs. Online Applications (II)
- Online applications have
- timing requirements
- time-variable needs (e.g., sensor S1 may become more or less popular with time)
- Archival applications have
- less important timing
- constant needs (the archived time series must meet some overall quality criteria)
7. Preliminaries
- Let $P = \langle p_1, p_2, \ldots, p_n \rangle$ be a sequence of environmental measurements (time series) generated by the producer, where $n = \text{now}$
- Let $S = \langle s_1, s_2, \ldots, s_n \rangle$ be the server-side representation of the sequence
- A within-$\varepsilon$ quality data collection protocol guarantees that
- for all $i$: $\mathrm{error}(p_i, s_i) \le \varepsilon$
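The within-$\varepsilon$ guarantee can be checked mechanically. Below is a minimal Python sketch. It assumes absolute difference as the `error` function, since the slides leave the metric open. `is_within_eps` is a hypothetical helper name.

```python
from typing import Sequence


def error(p: float, s: float) -> float:
    # Assumed error metric: absolute difference between true and stored value.
    return abs(p - s)


def is_within_eps(P: Sequence[float], S: Sequence[float], eps: float) -> bool:
    """True if the server-side series S is within eps of the producer's series P at every index i."""
    if len(P) != len(S):
        raise ValueError("P and S must both have length n")
    return all(error(p, s) <= eps for p, s in zip(P, S))


# Usage: the per-index errors are 0.5, 0.0 and 0.5
print(is_within_eps([1.0, 2.0, 3.0], [1.5, 2.0, 2.5], eps=0.5))  # True
print(is_within_eps([1.0, 2.0, 3.0], [1.5, 2.0, 2.5], eps=0.4))  # False
```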
8. Piecewise Constant Approximation (PCA)
- Given a time series $S_n = s_1, \ldots, s_n$, a piecewise constant approximation of it is a sequence
- $\mathrm{PCA}(S_n) = \langle (c_i, e_i) \rangle$
- that allows us to estimate $s_j$ as
- $\hat{s}_j = c_i$ if $j \in [e_{i-1}+1, e_i]$
- $\hat{s}_j = c_1$ if $j \le e_1$
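The estimation rule above amounts to finding the first segment whose end $e_i$ is at least $j$. Below is a minimal sketch using binary search. It assumes 1-based indices and strictly increasing $e_i$. `pca_estimate` and the example segments are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def pca_estimate(pca: List[Tuple[float, int]], j: int) -> float:
    """Estimate s_j from PCA(S_n) = <(c_i, e_i)>.

    Indices are 1-based and the segment ends satisfy e_1 < e_2 < ... < e_K = n.
    """
    if j < 1 or not pca:
        raise IndexError("need j >= 1 and a non-empty representation")
    ends = [e for _, e in pca]  # O(K) per call; a real system would cache this list
    i = bisect_left(ends, j)    # first segment with e_i >= j, i.e. j in [e_{i-1}+1, e_i]
    if i == len(pca):
        raise IndexError(f"j={j} lies past the last segment end {ends[-1]}")
    return pca[i][0]


pca = [(2.0, 3), (5.0, 7)]  # hypothetical: s_1..s_3 ~ 2.0, s_4..s_7 ~ 5.0
print([pca_estimate(pca, j) for j in range(1, 8)])
# [2.0, 2.0, 2.0, 5.0, 5.0, 5.0, 5.0]
```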
9. Poor Man's Compression
- Goal: given a time series of sensor values, generate a within-$\varepsilon_{capt}$ PCA representation of it
- Poor Man's Compression - Midrange (PMC-MR)
- Maintain $m$, $M$ as the minimum/maximum values of the samples observed since the last segment
- On processing $p_n$, update $m$ and $M$ if needed
- If $M - m > 2\varepsilon_{capt}$, output a segment $((m + M)/2, n - 1)$ using $m$, $M$ from before $p_n$ was added, and start a new segment with $m = M = p_n$
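The PMC-MR steps above can be sketched in a few lines of Python. The sketch uses 1-based segment ends to match the slides. `pmc_mr` is a hypothetical name, and the final flush of the open segment is an assumption for offline use.

```python
from typing import List, Sequence, Tuple


def pmc_mr(samples: Sequence[float], eps_capt: float) -> List[Tuple[float, int]]:
    """Poor Man's Compression - Midrange: within-eps_capt PCA as (c_i, e_i) pairs, 1-based e_i."""
    segments: List[Tuple[float, int]] = []
    if not samples:
        return segments
    m = M = samples[0]  # min/max of the samples in the open segment
    for n, p in enumerate(samples[1:], start=2):  # n is the 1-based index of p_n
        if max(M, p) - min(m, p) > 2 * eps_capt:
            # p_n does not fit: close the segment at n-1 with the midrange of the old m, M
            segments.append(((m + M) / 2, n - 1))
            m = M = p  # p_n opens a new segment
        else:
            m, M = min(m, p), max(M, p)
    # Offline flush of the still-open segment; an online sensor would keep it open.
    segments.append(((m + M) / 2, len(samples)))
    return segments


print(pmc_mr([0.0, 1.0, 2.0, 5.0, 6.0], eps_capt=1.0))
# [(1.0, 3), (5.5, 5)]
```

Within each segment $M - m \le 2\varepsilon_{capt}$, so the midrange $(m + M)/2$ is within $\varepsilon_{capt}$ of every sample it represents.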
- Variant PMC-Mean: uses the mean of the observed points
[Figure: PMC-MR example, value vs. time for samples 1 to 5, with $\varepsilon_{capt} = 1.6$]
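The slide describes PMC-Mean in a single line. Below is one plausible reading, not necessarily the paper's exact rule. Each segment is represented by the mean of its points. A segment keeps growing while that running mean stays within $\varepsilon_{capt}$ of both the segment's minimum and its maximum. `pmc_mean` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def pmc_mean(samples: Sequence[float], eps_capt: float) -> List[Tuple[float, int]]:
    """PMC-Mean sketch: (mean_i, e_i) segments, each mean within eps_capt of its points."""
    segments: List[Tuple[float, int]] = []
    if not samples:
        return segments
    m = M = total = samples[0]  # min, max and sum of the open segment
    count = 1
    for n, p in enumerate(samples[1:], start=2):  # n is the 1-based index of p_n
        new_m, new_M = min(m, p), max(M, p)
        new_total, new_count = total + p, count + 1
        mean = new_total / new_count
        if new_M - mean > eps_capt or mean - new_m > eps_capt:
            # The mean can no longer represent every point: close the segment at n-1
            segments.append((total / count, n - 1))
            m = M = total = p
            count = 1
        else:
            m, M, total, count = new_m, new_M, new_total, new_count
    segments.append((total / count, len(samples)))  # offline flush of the open segment
    return segments


print(pmc_mean([0.0, 1.0, 2.0, 5.0, 6.0], eps_capt=1.0))
# [(1.0, 3), (5.5, 5)]
```

Unlike PMC-MR, the slides make no optimality claim for this variant.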
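10. Optimality of PMC-Midrange
CLAIM: any PCA representation BETTER with $K' < K$ segments (where $K$ is the number of PMC-MR segments) violates $\varepsilon_{capt}$
Since $K' < K$, some segment of BETTER must contain an interval $[e_{k-1}+1, e_k+1]$ for some $k$. Avoiding this would require a boundary inside each of the first $K-1$ PMC-MR segments, i.e., at least $K$ segments.
Since PMC-MR output a new segment after $e_k$ $\Rightarrow$ the range of values in $[e_{k-1}+1, e_k+1]$ is $> 2\varepsilon_{capt}$
$\Rightarrow$ That segment of BETTER violates the $\varepsilon_{capt}$ tolerance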
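The claim can be spot-checked empirically. A set of samples can share one constant within $\varepsilon$ exactly when its range is at most $2\varepsilon$. That makes the true minimum segment count computable by an $O(n^2)$ dynamic program, which can then be compared with PMC-MR on random inputs. This is a sanity check under that observation, not the paper's proof. The function names are hypothetical.

```python
import random
from typing import List, Sequence


def pmc_mr_count(samples: Sequence[float], eps: float) -> int:
    """Segments produced by PMC-MR: extend greedily while max - min <= 2*eps."""
    count, m, M = 1, samples[0], samples[0]
    for p in samples[1:]:
        if max(M, p) - min(m, p) > 2 * eps:
            count, m, M = count + 1, p, p
        else:
            m, M = min(m, p), max(M, p)
    return count


def optimal_count(samples: Sequence[float], eps: float) -> int:
    """Fewest within-eps PCA segments, by dynamic programming over the last segment."""
    n = len(samples)
    best: List[float] = [0] + [float("inf")] * n  # best[j]: min segments covering samples[:j]
    for j in range(1, n + 1):
        lo = hi = samples[j - 1]
        for i in range(j - 1, -1, -1):  # candidate last segment samples[i:j]
            lo, hi = min(lo, samples[i]), max(hi, samples[i])
            if hi - lo > 2 * eps:
                break  # extending further left can only widen the range
            best[j] = min(best[j], best[i] + 1)
    return int(best[n])


rng = random.Random(0)
for _ in range(200):
    xs = [rng.uniform(-5, 5) for _ in range(rng.randint(1, 30))]
    eps = rng.uniform(0.1, 3.0)
    assert pmc_mr_count(xs, eps) == optimal_count(xs, eps)
print("PMC-MR matched the optimum on all random cases")
```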
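11. Why Predict?
[Figure: time axis divided at $n - n_{lag}$ and $n$ (now)]
- History (compressed data in archive): before $n - n_{lag}$
- Recent past (precise data in sensor): from $n - n_{lag}$ to $n$
- Future (no data): after $n$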
12. Who Should Fit the Predictive Model?
- Server
- Long-haul models possible
- No extra communication
- Con: no prediction accuracy guarantee
- Con: the server only sees the approximate time series
- Sensor (Data Producer)
- Short-term adaptation (but …)
- The sensor sees the precise time series
- Prediction accuracy guarantee $\varepsilon_{pred}$
13. Producer-Side Prediction
- $M$ is the predictive model, and $\theta$ its parameters
- The producer sends $(M, \theta)$ to the server
- The server estimates time series value $s^{pred}_j$ using $(M, \theta)$
- The producer generates a new $(M, \theta)$ when
- $\mathrm{error}(p_j, s^{pred}_j) > \varepsilon_{pred}$
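The protocol above can be simulated end to end. The sketch below makes several assumptions. It uses a single hypothetical model ("the value stays at the latest reading") and absolute error. Each refit must reproduce the current sample exactly, which this model does, so the server's estimate at the refit index is also within $\varepsilon_{pred}$. Adaptive model selection is not shown. All function names are illustrative.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Theta = Tuple[float, ...]


def fit_constant(history: Sequence[float]) -> Theta:
    # Hypothetical model M: "the value stays at the latest precise reading".
    return (history[-1],)


def predict_constant(theta: Theta, j: int) -> float:
    return theta[0]


def producer_updates(P: Sequence[float], eps_pred: float,
                     fit: Callable[[Sequence[float]], Theta],
                     predict: Callable[[Theta, int], float]) -> List[Tuple[int, Theta]]:
    """Return the (j, theta) messages sent; j is the 1-based index where theta takes effect."""
    sent: List[Tuple[int, Theta]] = []
    theta: Optional[Theta] = None
    for j, p in enumerate(P, start=1):
        if theta is None or abs(p - predict(theta, j)) > eps_pred:
            theta = fit(P[:j])  # refit on the precise data held at the sensor
            sent.append((j, theta))
    return sent


def server_estimates(sent: List[Tuple[int, Theta]], n: int,
                     predict: Callable[[Theta, int], float]) -> List[float]:
    """Server side: s_pred_j comes from the latest theta received at or before j."""
    est: List[float] = []
    k = 0
    for j in range(1, n + 1):
        while k + 1 < len(sent) and sent[k + 1][0] <= j:
            k += 1
        est.append(predict(sent[k][1], j))
    return est


P = [0.0, 0.2, 0.4, 1.5, 1.6, 1.4]
msgs = producer_updates(P, 0.5, fit_constant, predict_constant)
print(msgs)                                              # [(1, (0.0,)), (4, (1.5,))]
print(server_estimates(msgs, len(P), predict_constant))  # [0.0, 0.0, 0.0, 1.5, 1.5, 1.5]
```

Two messages replace six raw samples here. A better-matched model $M$ would reduce the message count further.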
14. Issues in Prediction
- Choice of $M$ is domain-specific
- Goal is to minimize communication
- Prediction accuracy: $\mathrm{error}(p_j, s^{pred}_j) < \varepsilon_{pred}$
- Parameter size $|\theta|$
- Adaptive model selection
- E.g., constant velocity or constant acceleration, for predicting a moving object's location
15. Combining Prediction with Compression
- Imagine that $\varepsilon_{pred} < \varepsilon_{capt}$
- No need to do extra work for archival
- The sequence of $(M, \theta)$ suffices
- In general, compression should exploit the information given in the sequence of $(M, \theta)$
- Solution:
- $d_j = p_j - s^{pred}_j$ (error time series)
- Compress $d_1, \ldots, d_n$ within-$\varepsilon_{capt}$
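The solution above can be sketched as follows. The producer compresses the error series $d_j$ with PMC-MR. The archive then reconstructs $s_j = s^{pred}_j + \hat{d}_j$, so $|p_j - s_j| = |d_j - \hat{d}_j| \le \varepsilon_{capt}$. PMC-MR is repeated so the block is self-contained. The predictions in the usage example are hypothetical, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple

Segments = List[Tuple[float, int]]


def pmc_mr(samples: Sequence[float], eps: float) -> Segments:
    """PMC-MR: segments (midrange, 1-based end index) with each range <= 2*eps."""
    segs: Segments = []
    if not samples:
        return segs
    m = M = samples[0]
    for n, p in enumerate(samples[1:], start=2):
        if max(M, p) - min(m, p) > 2 * eps:
            segs.append(((m + M) / 2, n - 1))
            m = M = p
        else:
            m, M = min(m, p), max(M, p)
    segs.append(((m + M) / 2, len(samples)))
    return segs


def expand(segs: Segments) -> List[float]:
    """Turn (c_i, e_i) segments back into one value per index."""
    out: List[float] = []
    for c, e in segs:
        out.extend([c] * (e - len(out)))
    return out


def archive(P: Sequence[float], S_pred: Sequence[float], eps_capt: float) -> Segments:
    d = [p - s for p, s in zip(P, S_pred)]  # error time series d_j = p_j - s_pred_j
    return pmc_mr(d, eps_capt)


def reconstruct(segs: Segments, S_pred: Sequence[float]) -> List[float]:
    # s_j = s_pred_j + d_hat_j, hence |p_j - s_j| = |d_j - d_hat_j| <= eps_capt
    return [s + dh for s, dh in zip(S_pred, expand(segs))]


P = [1.0, 2.0, 3.0, 4.0, 5.0]
S_pred = [1.0, 2.0, 3.0, 4.0, 4.0]  # hypothetical predictions that miss only the last sample
segs = archive(P, S_pred, 0.5)
print(len(pmc_mr(P, 0.5)), len(segs))  # 3 1
print(reconstruct(segs, S_pred))       # [1.5, 2.5, 3.5, 4.5, 4.5]
```

In this toy case, compressing the raw series needs three segments. The error series needs only one, because the predictions already carry most of the information.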
16. Combining Prediction with Compression (Example)
17. Experiments
- Data sets
- Synthetic Random-Walk
- $x_1 = 0$ and $x_i = x_{i-1} + s_i$, where $s_i$ is drawn uniformly from $[-1, 1]$
- Oceanographic Buoy Data
- Environmental attributes (temperature, salinity, wind speed, etc.) sampled at 10-minute intervals from a buoy in the Pacific Ocean (Tropical Atmosphere Ocean Project, Pacific Marine Environmental Laboratory)
- GPS data collected using iPAQs (not in paper)
- Experiments to test
- Compression Performance of PMC
- Benefits of Model Selection
- Query Accuracy over Compressed Data
- Benefits of Prediction/Compression Combination
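The synthetic random-walk data set and the $K/n$ metric used in the experiments can be reproduced in outline. The sketch below uses Python's `random.Random.uniform` for $s_i$ and counts PMC-MR segments. The series length, seed, and tolerances are arbitrary choices, not the paper's settings, and no results are claimed.

```python
import random
from typing import List, Optional, Sequence


def random_walk(n: int, seed: Optional[int] = None) -> List[float]:
    """x_1 = 0 and x_i = x_{i-1} + s_i with s_i drawn uniformly from [-1, 1]."""
    if n <= 0:
        return []
    rng = random.Random(seed)
    xs = [0.0]
    for _ in range(n - 1):
        xs.append(xs[-1] + rng.uniform(-1.0, 1.0))
    return xs


def pmc_mr_count(samples: Sequence[float], eps: float) -> int:
    """Number of segments K that PMC-MR produces."""
    k, m, M = 1, samples[0], samples[0]
    for p in samples[1:]:
        if max(M, p) - min(m, p) > 2 * eps:
            k, m, M = k + 1, p, p
        else:
            m, M = min(m, p), max(M, p)
    return k


walk = random_walk(10_000, seed=42)
for eps in (0.5, 1.0, 2.0, 4.0):
    print(eps, pmc_mr_count(walk, eps) / len(walk))  # K/n ratio at this tolerance
```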
18. Compression Performance
K/n ratio = number of segments / number of samples
19. Query Performance Over Compressed Data
How many sensors have values $> v$? (Mean selectivity 50%)
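The slides do not say how such queries were evaluated over compressed data. One illustrative approach follows from the within-$\varepsilon$ guarantee alone. Each true value lies in $[s - \varepsilon, s + \varepsilon]$ around its stored estimate $s$, so the answer count can be bracketed. `count_greater` is a hypothetical helper, not the paper's method.

```python
from typing import Sequence, Tuple


def count_greater(estimates: Sequence[float], v: float, eps: float) -> Tuple[int, int, int]:
    """Bound 'how many sensors have value > v?' when every true value is within eps of its estimate.

    Returns (certain, estimate, possible): sensors that surely exceed v,
    the count computed from the stored values as-is, and sensors that might exceed v.
    """
    certain = sum(1 for s in estimates if s - eps > v)
    estimate = sum(1 for s in estimates if s > v)
    possible = sum(1 for s in estimates if s + eps > v)
    return certain, estimate, possible


print(count_greater([1.0, 2.0, 3.0, 4.0], v=2.5, eps=0.75))  # (1, 2, 3)
```

Sensors whose estimates lie within $\varepsilon$ of the threshold $v$ are the ones that make the answer uncertain. A tighter tolerance narrows the gap between the two bounds.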
20. Impact of Model Selection
- Objects moved at approximately constant speed (plus measurement noise)
- Three models used
- $loc_n = c$
- $loc_n = c + vt$
- $loc_n = c + vt + 0.5at^2$
- Parameters $v$, $a$ were estimated at the sensor over a moving window of 5 samples
K/n ratio = number of segments / number of samples
$\varepsilon_{pred}$ is the localization tolerance in meters
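The slide does not specify how $v$ and $a$ are estimated from the 5-sample window. The sketch below assumes finite differences, unit spacing between samples, and time $t$ counted from the newest sample. For a window lying exactly on a quadratic, the acceleration model then recovers the curve. `fit_models` and `predict_location` are hypothetical names.

```python
from typing import Dict, Sequence, Tuple

WINDOW = 5  # moving-window length from the slide

Params = Tuple[float, float, float]  # (c, v, a)


def fit_models(samples: Sequence[float]) -> Dict[str, Params]:
    """Estimate (c, v, a) for the three location models from the last WINDOW samples.

    Assumes unit sample spacing, with t = 0 at the newest sample.
    """
    w = list(samples)[-WINDOW:]
    if len(w) < 3:
        raise ValueError("need at least 3 samples to estimate acceleration")
    c = w[-1]
    d1 = [y - x for x, y in zip(w, w[1:])]    # first differences (velocity)
    d2 = [y - x for x, y in zip(d1, d1[1:])]  # second differences (acceleration)
    v_const = sum(d1) / len(d1)               # average velocity over the window
    a = sum(d2) / len(d2)
    # For x(t) = c + v t + 0.5 a t^2, the last first difference equals v - 0.5 a.
    v_accel = d1[-1] + 0.5 * a
    return {
        "loc = c": (c, 0.0, 0.0),
        "loc = c + vt": (c, v_const, 0.0),
        "loc = c + vt + 0.5at^2": (c, v_accel, a),
    }


def predict_location(params: Params, t: float) -> float:
    c, v, a = params
    return c + v * t + 0.5 * a * t * t


# Hypothetical window sampled from x(t) = 10 + 2t + 0.5t^2 at t = -4..0
models = fit_models([10.0, 8.5, 8.0, 8.5, 10.0])
for name, params in models.items():
    print(name, predict_location(params, 2))
```

On this curved window the constant-velocity model averages the velocity to zero and predicts 10.0 at $t = 2$. The acceleration model predicts 16.0, the true value. This is the kind of mismatch that motivates model selection.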
21. Combining Prediction with Compression
K/n ratio = number of segments / number of samples
22. GPS Mobility Data from Mobile Clients (iPAQs)
[Figure panels: QUASAR client time series; latitude time series (1800 samples); compressed time series (PMC-MR) at an accuracy of 100 m (130 segments)]
23. Conclusions and Future Work
- We motivated the need for real-time applications to co-exist with data archival
- We showed how compression can be used to reduce the size of the archived time series optimally (PMC-MR uses the minimum number of segments)
- We investigated how prediction can be used to limit communication by allowing the database to estimate values ahead of time
- We noted the interplay between prediction and compression and showed how they can be combined
- In the future:
- Adaptive algorithms for model selection
- Exploiting inter-sensor correlation to further reduce communication