Title: Quality Aware Sensor Infrastructure QUASAR Project
1Quality Aware Sensor Infrastructure (QUASAR)
Project
- Team
- Faculty: Sharad Mehrotra, Nalini Venkatasubramanian
- Postdoc: Dmitri Kalashnikov
- Students: Qi Han, Iosif Lazaridis, Xingbo Yu
Supported in part by a collaborative NSF ITR grant entitled "Real-time data capture, analysis, and querying of dynamic spatio-temporal events", in collaboration with UCLA, U. Maryland, U. Chicago
2Ubiquitous Sensor Environments
- Generational advances to computing infrastructure
- sensors will be everywhere
- Continuous monitoring and recording of physical world and its phenomena
- Limitless possibilities
- New challenges
- limited bandwidth and energy
- highly dynamic systems
- System architectures are due for an overhaul
- at all levels of the system: networks, OS, middleware, databases, applications
Battlefield Monitoring
Sensor Networks
Earthquake Monitoring
Video Surveillance
Traffic Congestion Detection
Target Tracking Detection
Intrusion Detection
3Taxonomy of Applications (1)
- Data Access needs of applications
- Historical data
- Analysis to better understand the physical world
- Current data
- Monitoring and control to optimize the processes that drive the physical world
- Future data
- Forecasting trends in data for decision making
4Taxonomy of Applications (2)
- Predictability of Data access
- Fixed
- data access needs of applications known a-priori
- Unpredictable (ad-hoc)
- Data access needs of applications not known at any instant of time
- Predictable (continuous)
- Data access needs of applications can be predicted for some time in the future with high probability
5Application Landscape
Temporal property of data accessed
the future
Predict noise levels around the airport if runway
2 becomes operational
Each evening at 8pm predict the temperature for
the next 5 days
I'm going surfing on Sep. 30! Will it be windy?
Visualize current humidity with Mrs. Doe's new interpolation scheme.
Notify me immediately when there is a forest fire
How much snow is there in Aspen?
the present
Every month, calculate the average humidity in
California for the last 30 days
Is Mr. Doe's newly proposed weather model accurate for 1996-2000?
Did the temperature rise above 40°C in the last year?
the past
no knowledge some knowledge full
knowledge
Predictability of data access
6Sensor Data Management Infrastructure
- A data collection and management middleware infrastructure that
- provides seamless access to data dispersed across a hierarchy of sensors, servers, and archives
- supports multiple concurrent applications of diverse types
- adapts to changing application needs
- Fundamental Issues
- Where to store data?
- do not store, at the producers, at the servers
- Where to compute?
- At the client, server, data producers
7Existing DBMS technologies
data producers
- Traditional data management
- client-server architecture
- efficient approaches to data storage and querying
- query shipping versus data shipping
- data changes with explicit update
- Limitations
- Sensors generate continuously changing data
- Producers must be considered as first class entities
- Does not exploit the storage, processing, and communication capabilities of sensors
8Stream Data Management
synopsis in memory
continuous queries
data streams
stream processing engine
(Approximate) Answer
- Data streams through the server but is not stored
- Continuous queries evaluated against streaming data
- Deals with problems due to dynamic data on the server side
- But
- Does not conserve sensor resources (e.g., power)
- Does not exploit the storage and processing capabilities of sensors
- Geared towards continuous monitoring and not archival applications
9Quasar Architecture
- Hierarchical architecture
- data flows from producers to server to clients periodically
- queries flow the other way
- If client cache does not suffice, then
- query routed to appropriate server
- If server cache does not suffice, then access current data at producer
- This is a logical architecture
- producers could also be clients
- A server may be a base station or a (more) powerful sensor node
- Servers might themselves be hierarchically organized
- The hierarchy might evolve over time
DATA FLOW
QUERY FLOW
client cache
server
server cache and archive
Producer and its cache
10Quasar Observations and Approach
- Applications can tolerate errors in sensor data
- applications may not require exact answers
- small errors in location during tracking or error in answer to query result may be OK
- data cannot be precise due to measurement errors, transmission delays, etc.
- Communication is the dominant cost
- limited wireless bandwidth, source of major energy drain
- Quasar Approach
- exploit application error tolerance to reduce communication between producer and server and/or to conserve energy
- Two approaches
- Minimize resource usage given quality constraints
- Maximize quality given resource constraints
11Quasar Issues
- Mapping application quality requirement to data quality requirements
- Examples
- Target tracking: quality of track --> accuracy of data
- Aggregation queries: accuracy of results --> accuracy of data
- Strategy should adapt to expected application load
- Quality-based data collection
- Minimize sensor resource consumption while guaranteeing required data quality
- Quality-cognizant query processing
- imprecise data representation
- Optimal execution of queries over imprecise data
12Quasar Progress
- Mapping application quality requirement to data quality requirements
- Target tracking using acoustic sensors [MW 03]
- Spatial range queries [DEXA 03]
- Quality-based data collection
- General framework [DS Online 03]
- To support monitoring queries over current data [Qi03]
- For sensor data archival [ICDE 03]
- With real-time constraints [RTSS 03]
- With support for in-network aggregation [Yu03]
- Quality-cognizant query processing
- Aggregation queries [SIGMOD 01]
- Selection queries [ICDE 04]
13Quality Aware Data Collection Problem
Sensor time series: p_n, p_(n-1), ..., p_1
- Let P = <p_1, p_2, ..., p_n> be a sequence of environmental measurements (time series) generated by the producer, where n = now
- Let S = <s_1, s_2, ..., s_n> be the server-side representation of the sequence
- A within-ε quality data collection protocol guarantees that
- for all i, error(p_i, s_i) < ε
- ε is derived from application quality tolerance
14Simple Data Collection Protocol
Sensor time series: p_n, p_(n-1), ..., p_1
- Sensor logic (at time step n)
- Let p = last value sent to server
- if error(p_n, p) > ε or on timeout τ
- send p_n to server (the sensor switches its radio on, if need be)
- Server logic (at time step n)
- If new update p_n received at step n
- s_n = p_n
- Else
- s_n = last update sent by sensor
- Guarantees maximum error at server less than or equal to ε
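The sensor and server logic on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: error() is taken to be the absolute difference, the timeout is counted in time steps, and the names Sensor, Server and run_simple_protocol are hypothetical, not part of QUASAR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sensor:
    eps: float                      # error tolerance (assumed absolute error)
    timeout: int                    # assumed: max time steps between updates
    last_sent: Optional[float] = None
    steps_since_send: int = 0

    def step(self, p_n: float) -> Optional[float]:
        """Return the value to transmit at this step, or None to stay silent."""
        self.steps_since_send += 1
        must_send = (
            self.last_sent is None
            or abs(p_n - self.last_sent) > self.eps
            or self.steps_since_send >= self.timeout
        )
        if must_send:
            self.last_sent = p_n
            self.steps_since_send = 0
            return p_n
        return None


class Server:
    def __init__(self) -> None:
        self.s: List[float] = []    # server-side series s_1 .. s_n

    def step(self, update: Optional[float]) -> None:
        # s_n = p_n on an update, otherwise repeat the last update received.
        self.s.append(update if update is not None else self.s[-1])


def run_simple_protocol(readings, eps, timeout):
    """Simulate both sides; return the server series and the message count."""
    sensor, server = Sensor(eps, timeout), Server()
    messages = 0
    for p_n in readings:
        update = sensor.step(p_n)
        if update is not None:
            messages += 1
        server.step(update)
    return server.s, messages
```

The first reading is always transmitted, so the server never has to repeat a value it does not yet hold.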
15Exploiting Prediction Models
- Producer and server agree upon a prediction model (M, θ)
- Let spred_i be the predicted value at time i based on (M, θ)
- Sensor logic (at time step n)
- if error(p_n, spred_n) > ε
- send p_n to server
- Server logic (at time step n)
- If new update p_n received at step n
- s_n = p_n
- Else
- s_n = spred_n based on model (M, θ)
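A minimal sketch of the prediction-based variant, assuming a linear model whose only parameter is a fixed slope and whose anchor is the last transmitted value. The model choice, the absolute-error measure and the name run_predictive_protocol are illustrative assumptions. Both sides are simulated in one loop because they share the same anchor and therefore compute the same prediction.

```python
def run_predictive_protocol(readings, eps, slope):
    """Simulate sensor and server sharing a linear prediction model.

    Assumed model: spred_n = anchor_value + slope * (n - anchor_step),
    where the anchor is the most recent value the sensor transmitted.
    Returns the server-side series and the number of messages sent.
    """
    anchor_value = None
    anchor_step = 0
    server_series = []
    messages = 0
    for n, p_n in enumerate(readings):
        if anchor_value is None:
            violated = True                     # nothing sent yet
        else:
            s_pred = anchor_value + slope * (n - anchor_step)
            violated = abs(p_n - s_pred) > eps  # sensor-side check
        if violated:
            anchor_value, anchor_step = p_n, n  # both sides re-anchor
            messages += 1
            server_series.append(p_n)           # s_n = p_n
        else:
            server_series.append(s_pred)        # s_n = predicted value
    return server_series, messages
```

With slope 0 this degenerates to the simple protocol without a timeout, which makes it easy to compare message counts for the same readings.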
16Challenges in Prediction
- Simple versus complex models?
- Complex and more accurate models require more parameters (that will need to be transmitted).
- Goal is to minimize cost, not necessarily to obtain the best prediction
- How is a model M generated?
- static -- one out of a fixed set of models
- dynamic -- dynamically learn a model from data
- When should a model M or parameters θ be changed?
- immediately on model violation
- too aggressive: violation may be a temporary phenomenon
- never changed
- too conservative: data rarely follows a single model
17Challenges in Prediction (cont.)
- Who updates the model?
- Server
- long-haul prediction models possible, since server maintains history
- might not predict recent behavior well, since server does not know the exact sequence (it has only samples)
- extra communication to inform the producer
- Producer
- better knowledge of recent history
- long-haul models not feasible, since producer does not have history
- producers share computation load
- Both
- server looks for new models, sensor performs parameter fitting given existing models.
18Answering Queries
Probe result
- If query quality tolerance satisfied at server (more than ε)
- Answer query at the server
- Else
- Probe the sensor
- Sensor guaranteed to respond within a bounded time τ
- Approach guarantees quality tolerance of queries
Imprecise data representation
19The Challenge
- How should sensor state be managed to minimize energy consumption in maintaining data at required quality?
- Sensor state: error precision, power states
- Power consumption of sensors
20Sensor State Models
Active-Listening-Sleeping Model (ALS)
- States: active, listening, sleeping
- active -> listening: Ta after processing last sensor-initiated update or probe
- listening -> active: upon first sensor-initiated update or probe
- listening -> sleeping: after Tl without traffic
- sleeping -> active: upon first sensor-initiated update or after Ts
- Other models
- Always-Active (AA): Ta is infinite
- Active-Listening (AL): Tl is infinite
- Active-Sleeping (AS): Tl is 0
21Issues in Energy Efficient Data Collection
- Issues
- How to maintain the precision range ε for each sensor
- Larger ε increases possibility of expensive probes
- Small ε wastes communication due to sensor-initiated updates
- When to transition between sensor states (i.e., set Ta, Tl, Ts)
- Powering down might not be optimal if we have to power up immediately
- Powering down may increase query response time
- Objective
- set values for Ta, Tl, Ts, and ε that minimize energy cost
normalized energy cost = energy consumed at each state + state transition energy
22Our Approaches to Energy Efficient Sensor Data
Collection
- We solve the energy optimization problem by solving two sub-problems
- Optimize energy consumption by adjusting range size under the assumption that the state transition is fixed
- I.e., Ta, Tl, and Ts have been optimally set
- Optimize energy consumption by adapting sensor states while assuming that the precision range for sensor is fixed
23Range size Adjustment for the AA/AL Model
- Optimal precision range ε that minimizes E occurs at a particular ratio of sensor-initiated update probability to probe probability
- Optimal range can be realized by maintaining this probability ratio
- Can be done at the sensor
- Assuming that ρ is the ratio of sensor-initiated update probability to probe probability
- for sensor-initiated update
- with probability min{ρ, 1}, set ε = ε(1 + α)
- for probe
- with probability min{1/ρ, 1}, set ε = ε/(1 + α)
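The probabilistic grow/shrink rule on this slide can be sketched as follows. The slide's symbols were lost in this transcript, so the parameter names rho (the ratio) and alpha (the adjustment factor) are assumptions, as is the function name adjust_range. The sketch takes the rule at face value: grow the range after a sensor-initiated update, shrink it after a probe.

```python
import random


def adjust_range(eps, event, rho, alpha, rng=random):
    """Return the new precision range after one event.

    event is "update" (sensor-initiated update) or "probe".
    rho and alpha are assumed names for the slide's lost symbols.
    """
    if event == "update":
        # The range was too tight: widen it with probability min(rho, 1).
        if rng.random() < min(rho, 1.0):
            return eps * (1.0 + alpha)
    elif event == "probe":
        # The range was too loose: narrow it with probability min(1/rho, 1).
        if rng.random() < min(1.0 / rho, 1.0):
            return eps / (1.0 + alpha)
    else:
        raise ValueError(f"unknown event: {event!r}")
    return eps
```

Because only one of the two probabilities is ever below 1, the rule adjusts eagerly in one direction and lazily in the other, which is what steers the update-to-probe ratio toward a target.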
24Range Size Adjustment for the AS/ALS Model
- Sensor side
- Keep track of the number of state transitions of the last k updates
- Piggyback the probability of state transitions with the k-th update
- Server side
- Keep track of the number of sensor-initiated updates and probes of the last k updates
- Upon receiving the k-th update from the sensor
- Compute the optimal precision range ε
- Inform the sensor about the new ε
25Adaptive Range Setting for the AS/ALS Model
26Adaptive State Management
- Consider the AS model for derivation of optimal Ta to minimize energy consumption
- Assuming φ(t) is the probability of receiving a request at time instant t, the expected energy consumption E for a single silent period can be written as a function of Ta
- E is minimized when Ta = 0 if requests are uniformly distributed in the interval [0, Ta + Ts].
- In practice, learn φ(t) at runtime and select Ta adaptively
- Choose a window size w in advance
- Keep track of the last w silent period lengths and summarize this information in a histogram
- Periodically use the histogram to generate a new Ta
27Adaptive State Management (Cont)
- c_i = the number of silent periods for bin i among the last w silent periods
- estimate the request distribution by the one which generates a silent period of length t_i with probability c_i/w
- Ta is chosen to be the value t_m that minimizes the energy consumption
[Figure: histogram of silent period lengths with bins 0 to n-1, counts c_0 to c_(n-1), and bin boundaries t_0, t_1, t_2, t_3, ..., t_(n-1), t_n = Ta + Ts]
28Performance Study
- Simulation environment
- Modeling sensor
- Power consumption parameters: Berkeley motes
- Sensor values
- drawn uniformly from the range [-150, 150]
- perform a random walk in one dimension: every second, the value either increases or decreases by an amount sampled uniformly from [0.5, 1.5].
- Modeling queries
- query arrival times at the server are Poisson distributed
- mean inter-arrival time = 2 seconds.
- each query is accompanied by an accuracy constraint A
- A = uniform(Aavg(1 - Avar), Aavg(1 + Avar))
- Aavg = 20 (average accuracy constraint)
- Avar = 1 (accuracy constraint variation)
29System Performance Comparison
30Impact of Ta adaptation on System Performance
31Impact of Range Size Adaptation on System
Performance
32Fusing Energy Efficient Data Collection and
In-network Aggregation
access point
access point
- Issues
- Hierarchical precision range ε adjustment
- Cluster forming and dynamic maintenance
33Quasar Progress
- Mapping application quality requirement to data quality requirements
- Target tracking using acoustic sensors [MW 03]
- Spatial range queries [DEXA 03]
- Quality-based data collection
- General framework [DS Online 03]
- To support monitoring queries over current data [Qi03]
- For sensor data archival [ICDE 03]
- With real-time constraints [RTSS 03]
- With support for in-network aggregation [Yu03]
- Quality-cognizant query processing
- Aggregation queries [SIGMOD 01]
- Selection queries [ICDE 04]
34Archiving Sensor Data
- Often sensor-based applications are built with only the real-time utility of time series data in mind.
- Values at time instants << n are discarded.
- Archiving such data consists of maintaining the entire S sequence, or an approximation thereof.
- Importance of archiving
- Discovering large-scale patterns
- Once-only phenomena, e.g., earthquakes
- Discovering events detected post facto by rewinding the time series
- Future usage of data which may not be known while it is being collected
35Problem Formulation
- Let P = <p_1, p_2, ..., p_n> be the sensor time series
- Let S = <s_1, s_2, ..., s_n> be the server-side representation
- A within-ε_archive quality data archival protocol guarantees that
- error(p_i, s_i) < ε_archive
- Trivial solution: modify collection protocol to collect data at quality guarantee of min(ε_archive, ε_collect)
- then the data collection protocol described earlier will provide a within-ε_archive quality data stream that can be archived.
- Better solutions possible since
- archived data not needed for immediate access by real-time or forecasting applications (such as monitoring, tracking)
- compression can be used to reduce data transfer
36Data Archival Protocol
Sensor updates for data collection
Compressed representation for archiving
pn, pn-1, ..
compress
processing at sensor exploited to reduce
communication cost and hence battery drain
Sensor memory buffer
- Sensor compresses the observed time series p[1:n] and sends a lossy compression to the server
- At time n
- p[1 : n-nlag] is at the server in compressed form s[1 : n-nlag], within-ε_archive
- s[n-nlag+1 : n] is estimated via a predictive model (M, θ)
- collection protocol guarantees that this remains within-ε_collect
- s[n+1 : ∞] can be predicted but its quality is not guaranteed
- it is in the future and thus the sensor has not observed these values
37Piecewise Constant Approximation (PCA)
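- Given a time series S_n = s[1:n], a piecewise constant approximation of it is a sequence
- PCA(S_n) = <(c_i, e_i)>
- that allows us to estimate s_j as
- s_capt[j] = c_i if j in [e_(i-1) + 1, e_i]
- s_capt[j] = c_1 if j <= e_1
[Figure: step function with segment values c_1 to c_4 on the Value axis and segment endpoints e_1 to e_4 on the Time axis]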
38Online Compression using PCA
- Goal: given a stream of sensor values, generate a within-ε_archive PCA representation of the time series
- Approach (PMC-midrange)
- Maintain m, M as the minimum/maximum values of observed samples since last segment
- On processing p_n, update m and M if needed
- if M - m > 2 ε_archive, output a segment ((m + M)/2, n)
[Figure: example with ε_archive = 1.5; Value plotted against Time for time steps 1 to 5]
39Online Compression using PCA
- PMC-MR
- guarantees that each segment compresses the corresponding time series segment to within-ε_archive
- requires O(1) storage
- is instance optimal
- no other PCA representation with fewer segments can meet the within-ε_archive constraint
- Variant of PMC-MR
- PMC-MEAN, which takes the mean of the samples seen thus far instead of the midrange.
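A minimal Python sketch of midrange-based online compression in the spirit of PMC-MR, together with the reconstruction of a PCA sequence. One detail is an assumption of this sketch rather than something stated on the slides: when a new sample would push the range past twice the tolerance, the segment is closed at the previous time step using the minimum and maximum seen before that sample, so that every compressed sample stays within the tolerance. The names pmc_mr and reconstruct are illustrative.

```python
def pmc_mr(samples, eps):
    """Compress samples into (value, end_index) segments, 1-based indices.

    Only the running min/max of the open segment is kept, i.e. O(1) state.
    """
    segments = []
    lo = hi = None
    for n, p in enumerate(samples, start=1):
        if lo is None:
            lo = hi = p
            continue
        new_lo, new_hi = min(lo, p), max(hi, p)
        if new_hi - new_lo > 2 * eps:
            # p does not fit: close the segment at n-1 with the old midrange.
            segments.append(((lo + hi) / 2, n - 1))
            lo = hi = p
        else:
            lo, hi = new_lo, new_hi
    if lo is not None:
        segments.append(((lo + hi) / 2, len(samples)))  # flush last segment
    return segments


def reconstruct(segments):
    """Expand (value, end_index) segments back into a full series."""
    series, start = [], 1
    for value, end in segments:
        series.extend([value] * (end - start + 1))
        start = end + 1
    return series
```

For example, `pmc_mr([1.0, 2.0, 3.0, 6.0, 7.0, 6.5], 1.5)` yields two segments, and every reconstructed value lies within 1.5 of the sample it replaces.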
40Improving PMC using Prediction
- Observation
- Prediction models guarantee a within-ε_collect version of the time series at server even before the compressed time series arrives from the producer.
- Can the prediction model be exploited to reduce the overhead of compression?
- If ε_archive > ε_collect, no additional effort is required for archival --> simply archive the predicted model.
- Approach
- Define an error time series E_i = p_i - spred_i
- Compress E[1:n] to within-ε_archive instead of compressing p[1:n]
- The archive contains the prediction parameters and the compressed error time series
- The within-ε_archive version of E, together with (M, θ), can be used to reconstruct a within-ε_archive version of p
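The idea of compressing the prediction error instead of the raw series can be sketched as below. The compressor is a compact midrange-based one written for this example, and the predictions are passed in as a plain list; the function names and the way predictions are supplied are assumptions of this sketch.

```python
def compress(values, eps):
    """Midrange piecewise-constant compression: (value, end_index) pairs."""
    segments, lo, hi = [], None, None
    for n, v in enumerate(values, start=1):
        if lo is not None and max(hi, v) - min(lo, v) > 2 * eps:
            segments.append(((lo + hi) / 2, n - 1))  # close before v
            lo = hi = None
        lo = v if lo is None else min(lo, v)
        hi = v if hi is None else max(hi, v)
    if lo is not None:
        segments.append(((lo + hi) / 2, len(values)))
    return segments


def expand(segments):
    """Inverse of compress, up to the allowed error."""
    out, start = [], 1
    for value, end in segments:
        out.extend([value] * (end - start + 1))
        start = end + 1
    return out


def archive_errors(readings, predictions, eps_archive):
    """Compress E_i = p_i - spred_i rather than p_i itself."""
    errors = [p - s for p, s in zip(readings, predictions)]
    return compress(errors, eps_archive)


def restore(predictions, error_segments):
    """Rebuild an approximation of p from predictions plus compressed errors."""
    return [s + e for s, e in zip(predictions, expand(error_segments))]
```

When the model tracks the data well, the error series is nearly flat and collapses into very few segments, while the raw series would need a new segment each time it drifts by more than twice the tolerance.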
41Combining Compression and Prediction (Example)
42Estimating Time Series Values
- Historical samples (before n-nlag) are maintained at the server within-ε_archive
- Recent samples (between n-nlag+1 and n) are maintained by the sensor and predicted at the server.
- If an application requires ε_q precision, then
- if ε_q ≥ ε_collect then it must wait for τ time in case a parameter refresh is en route
- if ε_q ≥ ε_archive but ε_q < ε_collect then it may probe the sensor or wait for a compressed segment
- Otherwise only probing meets precision
- For future samples (after n) immediate probing is not available as an option
43Experiments
- Data sets
- Synthetic random walk
- x_1 = 0 and x_i = x_(i-1) + s_i, where s_i is drawn uniformly from [-1, 1]
- Oceanographic buoy data
- Environmental attributes (temperature, salinity, wind speed, etc.) sampled at 10-minute intervals from a buoy in the Pacific Ocean (Tropical Atmosphere Ocean Project, Pacific Marine Environmental Laboratory)
- GPS data collected using iPAQs
- Experiments to test
- Compression Performance of PMC
- Benefits of Model Selection
- Query Accuracy over Compressed Data
- Benefits of Prediction/Compression Combination
44Compression Performance
K/n ratio = number of segments / number of samples
45Query Performance Over Compressed Data
How many sensors have values > v? (Mean selectivity = 50%)
46Impact of Model Selection
- Objects moved at approximately constant speed (plus measurement noise)
- Three models used
- loc_n = c
- loc_n = c + vt
- loc_n = c + vt + 0.5at^2
- Parameters v, a were estimated at sensor over a moving window of 5 samples
K/n ratio = number of segments / number of samples.
ε_pred is the localization tolerance in meters
47Combining Prediction with Compression
K/n ratio = number of segments / number of samples
48GPS Mobility Data from Mobile Clients (iPAQs)
QUASAR Client Time Series
Latitude time series: 1800 samples
Compressed time series (PMC-MR, ICDE 2003): accuracy of 100 m, 130 segments
49Quasar Progress
- Mapping application quality requirement to data quality requirements
- Target tracking using acoustic sensors [MW 03]
- Spatial range queries [DEXA 03]
- Quality-based data collection
- General framework [DS Online 03]
- To support monitoring queries over current data [Qi03]
- For sensor data archival [ICDE 03]
- With real-time constraints [RTSS 03]
- With support for in-network aggregation [Yu03]
- Quality-cognizant query processing
- Aggregation queries [SIGMOD 01]
- Selection queries [ICDE 04]
50Problem Definition
- There is a collection T of imprecise objects
- E.g., {[1,3], [2,5], [4,9]} represents {2, 3, 5}
- The query is: retrieve objects from T which satisfy predicate σ
- The query specifies quality requirements
- The system must return some approximate result that meets the quality requirements with minimum overall cost.
51Impact of Data Imprecision
[Figure: selection range σ with imprecise objects a to f drawn as intervals]
- Imprecise object o
- Precise object: the exact value of o, which can be retrieved with a probe
- Objects are classified as
- a is a NO object
- b, f are MAYBE objects
- c, d, e are YES objects
- The exact set is E = {b, c, d, e}, taken as precise objects
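The YES / MAYBE / NO classification can be illustrated with interval-valued objects and a range selection. Both the interval representation and the form of the predicate (a closed range) are assumptions made for this sketch; the slide does not fix them.

```python
def classify(obj, sel):
    """Classify an imprecise object against a range selection.

    obj and sel are (low, high) tuples with closed bounds.
    YES:   every possible value of obj satisfies the selection.
    NO:    no possible value does.
    MAYBE: only a probe for the precise value can decide.
    """
    lo, hi = obj
    sel_lo, sel_hi = sel
    if sel_lo <= lo and hi <= sel_hi:
        return "YES"
    if hi < sel_lo or lo > sel_hi:
        return "NO"
    return "MAYBE"


# Hypothetical objects, loosely mirroring the slide's a..f example.
objects = {"a": (0, 2), "b": (2, 5), "c": (4, 6), "f": (9, 12)}
labels = {name: classify(iv, (3, 10)) for name, iv in objects.items()}
```

A MAYBE object such as b may or may not belong to the exact answer; probing it replaces the interval by a single value, after which it classifies as YES or NO.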
52Defining Quality
[Figure: selection range σ with imprecise objects a to f, as on the previous slide]
- Measures the accuracy of an approximate answer A
- Set-based quality
- Precision p = |A ∩ E| / |A|
- Recall r = |A ∩ E| / |E|
- Value-based quality
- Laxity of an object is l(o). E.g., l([2,3]) = 3 - 2 = 1
- Laxity of A is lmax = max over x in A of l(x)
- Query specifies quality bounds pq, rq, lqmax
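The three quality measures can be written down directly. In this sketch, answers and the exact set are Python sets of object identifiers and imprecise objects are (low, high) intervals; these representations, and the conventions chosen for empty sets, are assumptions for illustration.

```python
def precision(answer, exact):
    """|A ∩ E| / |A|; an empty answer is treated as perfectly precise."""
    return len(answer & exact) / len(answer) if answer else 1.0


def recall(answer, exact):
    """|A ∩ E| / |E|; an empty exact set is treated as fully recalled."""
    return len(answer & exact) / len(exact) if exact else 1.0


def laxity(interval):
    """Width of an imprecise object, e.g. laxity((2, 3)) == 1."""
    low, high = interval
    return high - low


def max_laxity(intervals):
    """Laxity of an answer: the largest laxity among its objects."""
    return max((laxity(iv) for iv in intervals), default=0)
```

For instance, with A = {b, c, d, f} and E = {b, c, d, e}, three of the four returned objects are correct and three of the four correct objects are returned, so precision and recall are both 0.75.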
53Evaluating QaQ Selection Operator
- Another possibility is to store the object and deal with it later
- Might be good under certain situations based on available memory at the server
54State of QaQ Selection in the middle of execution
- In the beginning: T (total input)
- At some point of operator evaluation, objects are classified as
- Yes (Y)
- No
- Maybe (M = Ms ∪ Mns)
- Answer set A contains some seen YES and MAYBE objects: Y ∩ A and Ms ∩ A
55Answer Quality Bounds
- guaranteed precision, guaranteed recall, and guaranteed laxity at any stage of the execution
- Precision p ≥ pG = |Y ∩ A| / |A|
- Recall r ≥ rG = |Y ∩ A| / (|Y| + |Mns| + |Ms - A|)
- Laxity lmax = max over x in A of l(x)
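The guaranteed bounds follow from a worst-case argument: every MAYBE object placed in the answer might turn out not to satisfy the selection, and every MAYBE object left out might turn out to satisfy it. The sketch below encodes that reasoning with plain sets; treating all MAYBE objects as one set (rather than splitting it into Ms and Mns) and the name guaranteed_bounds are simplifying assumptions.

```python
def guaranteed_bounds(yes, maybe, answer):
    """Return (p_g, r_g), worst-case precision and recall of `answer`.

    yes:    objects known to satisfy the selection
    maybe:  objects whose membership is undecided
    answer: objects placed in the answer set so far
    """
    sure_hits = len(yes & answer)
    # Worst case for precision: no MAYBE object in the answer qualifies.
    p_g = sure_hits / len(answer) if answer else 1.0
    # Worst case for recall: every MAYBE object outside the answer qualifies.
    denominator = len(yes) + len(maybe - answer)
    r_g = sure_hits / denominator if denominator else 1.0
    return p_g, r_g
```

With yes = {c, d, e}, maybe = {b, f} and answer = {b, c, d}, the guaranteed precision is 2/3 and the guaranteed recall is 2/4, whatever the probes would later reveal about b and f.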
56The Decision Problem
- How should the QaQ selection operator decide
- When to probe
- When to forward
- When to ignore
- Objective
- Meet query quality requirement
- Minimize cost
57Cost Model Combined Data Access Probe Cost
Cost
Read Probe Write
58Impact of Probe, Forward, Ignore actions to
quality
- increase, - decrease, remains the same
59Constraints on the Decision
- Some decisions are fixed -- we have no choice!
- No objects with l(o) greater than the query tolerance lqmax may be forwarded
- The precision guarantee pG must never be less than the query tolerance pq
- Otherwise, if no new YES objects are seen, this might lead to a pq violation
- If |A ∩ Y| / (|Y| + |Ms - A|) is less than the query tolerance rq, you can't ignore an object
- This might lead to an rq violation if no new YES objects are seen
60The decision Plane
No Maybe Yes
Probe with probability ppy or ignore
1
2
3
6
Probe
Ignore
Laxity l(o)
s5
lqmax
s3
4
5
7
Forward with probability pfm or ignore
Probe
Forward
s(o) = 0 | 0 < s(o) < 1 | s(o) = 1
s(o) = probability of a MAYBE object satisfying the selection
61The Optimization Problem
- Free parameters: ppy, s3, s5, pfm
- Estimate
- Number of YES/MAYBE/NO objects
- Number of YES/MAYBE objects exceeding the lqmax threshold
- Distribution of s(o)
- Minimize cost W in parameter space (ppy, s3, s5, pfm) subject to Precision, Recall, Laxity guarantees
62How it works
- Get selectivity estimates of the input set T
- Solve the 4-parameter optimization problem and obtain optimal values for ppy, s3, s5, pfm
- Read one object at a time and handle it according to the decision plane, instantiated with ppy, s3, s5, pfm
- Finish when quality requirements are met
63Performance Study
- Size of input: |T| = 10,000
- Laxity ranges in [0, 100]
- Probe cost = 100x read/write unit cost.
- We vary
- Precision, Recall, Laxity requirement
- Query selectivity
- Input uncertainty (ratio of YES/MAYBE objects)
- Costs are normalized by dividing by |T|
64Competing Algorithms
- We devised two simple heuristics
- STINGY avoids probes: it ignores MAYBE objects and objects exceeding the lqmax threshold.
- STINGY is conservative, but sometimes it is forced to probe to meet the quality guarantees.
- GREEDY forwards all MAYBE objects and probes all objects that exceed the lqmax threshold.
- GREEDY tries to produce the result quickly by not ignoring objects, but sometimes it uses too many probes and forwards too many objects
65Varying Laxity
- Input has 20% YES, 20% MAYBE objects
- 90% Precision and 50% Recall are requested
- As the laxity requirement becomes looser, the cost is reduced since imprecise objects can be forwarded without a probe
66Varying Precision
- Input has 20% YES, 20% MAYBE objects
- 50% Recall and laxity = 50 are requested
- Cost increases as Precision requirement increases, as objects can't be forwarded unprobed
67Varying Recall
- Input has 20% YES, 20% MAYBE objects
- 90% Precision and laxity = 50 are requested
- Cost increases as Recall requirement increases
- When Recall requirement is low, only part of the input needs to be read
- As Recall requirement tends to 100%, all the input must be read and no objects can be ignored
68Varying Selectivity
- Input has 20% YES, 20% MAYBE objects
- 90% Precision, 50% Recall, and laxity = 50 are requested
- Cost increases as selectivity increases, since more objects need to be output
69Varying Input Uncertainty
- Input has 20% YES, 20% MAYBE objects
- 90% Precision, 50% Recall, and laxity = 50 are requested
- When MAYBE objects are few, no probe cost needs to be paid: the few MAYBE objects can be ignored
- When MAYBE objects are many, they cannot be ignored (Recall might be violated), or forwarded (Precision violated). Hence, they are probed, increasing the cost
70Quasar Future Work
- Mapping application quality to data quality
- Other notions of quality (probabilistic, spatial and temporal resolution)
- Quality aware data collection
- Incorporating new notions of quality
- Fault tolerance
- Co-optimizing data collection and network routing
- Quality aware query processing
- More general class of SQL queries
71Plug-in (for UCI-ICS).
- Newly established school with following departments
- Computer Science, Informatics, Statistics
- 50 plus faculty currently, many open positions
- Many new developments
- Lot of new funding
- Cal-IT2 building well under way
- New ICS building just getting started
- Endowed chair search
- Approx. $2 million from anonymous donor
72Database Research @ UCI
- Folks: C. Li, S. Mehrotra, P. Smyth, G. Tsudik, N. Venkatasubramanian
- Core Technologies
- Indexing, query processing, transactions, distributed systems, grids
- Service Model
- Exploring the privacy, performance, and algorithmic challenges in providing databases as an internet service
- Customizable search and data analysis
- Customizing/personalizing search, flexible similarity retrieval over structured, semi-structured and unstructured data
- Event-entity data management
- Extracting, representing, querying and analyzing a web of entities and events from multimodal data
- Data Cleansing
- Entity resolution, event resolution
- Information Integration
- Middleware for querying multiple heterogeneous databases
- Peer to Peer Systems
- Search, resource discovery, data integration
- Data Mining
- Discovering pattern/user models from digital traces, customizing systems based on models
- Sensor Databases
- Management of data in highly dynamic, resource constrained environments
73Urban Crisis Response Center @ Cal-IT2
- NSF Large Information Technology Research Grant
- $12.5 Million Over Five Years
- Research collaboration led by UCI and UCSD
- UCI: Sharad Mehrotra (PI); Co-PIs C. Butts, N. Venkatasubramanian, R. Eguchi (ImageCat), M. Winslett (Univ. of Illinois)
- UCSD: Ramesh Rao (PI); Co-PIs B. Rao, M. Trivedi
- Research Team
- Government Partners
- City of LA, County of LA, City of Irvine, City of
San Diego, State of California
Now hiring postdocs, researchers, programmers,
students
www.ucrec.net -- coming soon!
74Multidisciplinary Research Agenda of UCREC
- Information Technology
- right information
- right person
- right time
- Social and Organizational Science
- The right context
- the distinctive nature of dynamic virtual organizations
- their information needs
- the social and cultural aspects of information sharing across organizations and individuals