Title: Model-driven Data Acquisition in Sensor Networks
1. Model-driven Data Acquisition in Sensor Networks
- Amol Deshpande (1,4), Carlos Guestrin (4,2), Sam Madden (4,3), Joe Hellerstein (1,4), Wei Hong (4)
- Affiliations: 1 UC Berkeley, 2 Carnegie Mellon University, 3 MIT, 4 Intel Research Berkeley
2. Sensor networks and distributed systems
- A collection of devices that can sense, actuate,
and communicate over a wireless network
- Sensors for temperature, humidity, pressure,
sound, magnetic fields, acceleration, visible and
ultraviolet light, etc.
- Available resources
- 4 MHz, 8 bit CPU
- 40 Kbps wireless
- 3V battery (lasts days or months)
- Analogous issues in other distributed systems,
including streams and the Internet
3. Real deployments
4. Example: Intel Berkeley Lab deployment
5. Analogy: Sensor net as a database
- Declarative interface (TinyDB, SQL-style queries)
- Sensor nets are not just for PhDs
- Decrease deployment time
- Data aggregation
- Can reduce communication
[Figure: TinyDB receives an SQL-style query and collects data from the network at every time step]
6. Limitations of existing approach
- Data collection
- Every node must wake up at every time step
- Data loss is ignored
- No quality guarantees
- Data-inefficient: correlations are ignored
- Query distribution
- Every node must receive the query
[Figure: TinyDB: each SQL-style query (including every new query) is flooded to all nodes and answered at every time step]
7. Sensor net data is correlated
- Data is not i.i.d. ⇒ shouldn't ignore missing data
- Observing one sensor ⇒ information about other sensors (and future values)
- Observing one attribute ⇒ information about other attributes
8. Model-driven data acquisition overview
- Strengths of model-based data acquisition
- Observe fewer attributes
- Exploit correlations
- Reuse information between queries
- Directly deal with missing data
- Answer more complex (probabilistic) queries
[Figure: SQL-style query with desired confidence → probabilistic model → posterior belief; new queries reuse the model]
9. Probabilistic models and queries
- User's perspective
- Query: SELECT nodeId, temp ± 0.5°C, conf(.95) FROM sensors WHERE nodeId in [1..8]
- System selects and observes a subset of nodes; observed nodes: {3, 6, 8}
- Query result:
  Node:       1     2     3     4     5     6     7     8
  Temp. (°C): 17.3  18.1  17.4  16.1  19.2  21.3  17.5  16.3
  Conf. (%):  98    95    100   99    95    100   98    100
10. Probabilistic models and queries
- Joint distribution P(X1, ..., Xn)
- Learn from historical data
[Figure: posterior density over a queried attribute: if the probability mass inside the query range is below 1-δ, more observations are needed; with higher probability mass, the query can be answered]
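To make the two bullets above concrete: for a multivariate-Gaussian model (the kind BBQ uses), conditioning the learned joint distribution on whatever readings were acquired is a few lines of linear algebra. The sketch below is a minimal illustration of that step; the function name and argument layout are illustrative, not part of the system.

```python
import numpy as np

def condition_gaussian(mu, Sigma, obs_idx, obs_vals):
    """Posterior over the unobserved attributes given the acquired readings.

    mu, Sigma describe the joint Gaussian P(X1, ..., Xn) learned from
    historical data; obs_idx / obs_vals are the indices and values of the
    sensors that were actually read."""
    all_idx = np.arange(len(mu))
    hid_idx = np.setdiff1d(all_idx, obs_idx)

    mu_h, mu_o = mu[hid_idx], mu[obs_idx]
    S_hh = Sigma[np.ix_(hid_idx, hid_idx)]
    S_ho = Sigma[np.ix_(hid_idx, obs_idx)]
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]

    gain = S_ho @ np.linalg.inv(S_oo)                    # regression of hidden on observed
    post_mu = mu_h + gain @ (np.asarray(obs_vals) - mu_o)
    post_Sigma = S_hh - gain @ S_ho.T
    return hid_idx, post_mu, post_Sigma
```

A table like the one on slide 9 would come from this kind of posterior: the observed nodes report their measured values, and the remaining nodes report posterior means together with the probability mass within ±0.5°C.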
11. Dynamic models: filtering
- Joint distribution at time t
- Example: Kalman filter
- Learn from historical data
- Fewer observations needed in future queries
[Figure: observe attributes, e.g. observe X1 = 18, then propagate the belief to the next time step]
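When the model is dynamic, answering a query at time t+1 also involves a prediction step that pushes the time-t belief through the learned transition before conditioning on new readings. A minimal sketch of the standard Kalman predict/update pair, assuming a learned linear transition matrix A with process noise Q and sensor-noise covariance R (all names illustrative):

```python
import numpy as np

def kalman_predict(mu, Sigma, A, Q):
    """Prediction step: push the time-t belief through a learned linear
    transition X_{t+1} = A X_t + w, with w ~ N(0, Q)."""
    return A @ mu, A @ Sigma @ A.T + Q

def kalman_update(mu, Sigma, obs_idx, obs_vals, R):
    """Measurement update for a partial, noisy observation of the
    attributes listed in obs_idx (sensor-noise covariance R)."""
    n = len(mu)
    H = np.eye(n)[obs_idx]                  # selects the observed attributes
    S = H @ Sigma @ H.T + R                 # innovation covariance
    K = Sigma @ H.T @ np.linalg.inv(S)      # Kalman gain
    mu_post = mu + K @ (np.asarray(obs_vals) - H @ mu)
    Sigma_post = (np.eye(n) - K @ H) @ Sigma
    return mu_post, Sigma_post
```

Because the predicted belief already carries information from earlier observations, later queries typically need fewer fresh readings, which is the "fewer observations in future queries" point above.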
12. Supported queries
- Value query
- Xi ± ε with prob. at least 1-δ
- SELECT and Range query
- Xi ∈ [a,b] with prob. at least 1-δ
- Which sensors have temperature greater than 25°C?
- Aggregation
- Average ± ε of a subset of attributes with prob. at least 1-δ
- Combine aggregation and selection
- Probability that more than 10 sensors have temperature greater than 25°C?
- Queries require solution to integrals
- Many queries computed in closed form
- Some require numerical integration/sampling
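Under a Gaussian posterior, the value and range queries above reduce to evaluating normal CDFs on the relevant marginal, while the combined aggregation-and-selection query is convenient to estimate by sampling the posterior. A rough sketch with hypothetical helper names, assuming the posterior mean and covariance come from a conditioning step like the one sketched earlier:

```python
import numpy as np
from scipy.stats import norm

def range_prob(mu_i, var_i, a, b):
    """P(X_i in [a, b]) under the Gaussian posterior marginal N(mu_i, var_i)."""
    sd = np.sqrt(var_i)
    return norm.cdf(b, loc=mu_i, scale=sd) - norm.cdf(a, loc=mu_i, scale=sd)

def value_query_ok(mu_i, var_i, eps, delta):
    """Value query: report mu_i if P(|X_i - mu_i| <= eps) >= 1 - delta."""
    return range_prob(mu_i, var_i, mu_i - eps, mu_i + eps) >= 1 - delta

def count_above_prob(post_mu, post_Sigma, threshold, min_count,
                     n_samples=10_000, seed=0):
    """Combined aggregation + selection, e.g. 'probability that more than
    min_count sensors exceed threshold', estimated by sampling the posterior."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(post_mu, post_Sigma, size=n_samples)
    counts = (samples > threshold).sum(axis=1)
    return float(np.mean(counts > min_count))
```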
13. Model-driven data acquisition overview
- Which sensors do we observe? How do we collect the observations?
[Figure: SQL-style query with desired confidence → probabilistic model → data gathering plan → observations → posterior belief → answer]
14. Acquisition costs
- Attributes have different acquisition costs
- Exploit correlation through probabilistic model
- Must consider networking cost
15. Network model and plan format
[Figure: multi-hop network of numbered sensor nodes rooted at the base station; a plan is a traversal that visits the chosen nodes]
- Goal: find the subset S that is sufficient to answer the query at minimum cost C(S)
- Cost of collecting the subset S of sensor values: C(S) = Ca(S) + Ct(S)
- Assume a known (quasi-static) network topology
- Define the traversal using a (1.5-approximate) TSP
- Ct(S) is the expected cost of the TSP traversal (lossy communication)
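As a concrete reading of the cost model above, a plan's cost adds the per-attribute acquisition charges Ca(S) to the traversal cost Ct(S) of visiting the chosen nodes. The sketch below only illustrates that bookkeeping: it substitutes a nearest-neighbour tour for the 1.5-approximate TSP named on the slide and ignores the expectation over lossy links, so the helper names and the tour heuristic are assumptions, not the paper's planner.

```python
import numpy as np

def acquisition_cost(subset, c_acquire):
    """Ca(S): total per-attribute sensing/acquisition cost of subset S."""
    return sum(c_acquire[i] for i in subset)

def traversal_cost(subset, positions, base=0):
    """Ct(S): length of a walk from the base station through every node in S
    and back.  Stand-in nearest-neighbour tour; the slide's planner uses a
    1.5-approximate TSP and takes the expectation over lossy links."""
    todo = set(subset) - {base}
    tour, here = [base], base
    while todo:
        nxt = min(todo, key=lambda j: np.linalg.norm(positions[here] - positions[j]))
        tour.append(nxt)
        todo.remove(nxt)
        here = nxt
    tour.append(base)
    return sum(np.linalg.norm(positions[u] - positions[v])
               for u, v in zip(tour, tour[1:]))

def plan_cost(subset, c_acquire, positions):
    """C(S) = Ca(S) + Ct(S)."""
    return acquisition_cost(subset, c_acquire) + traversal_cost(subset, positions)
```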
16. Choosing the observation plan
- Is a subset S sufficient? That is, after observing S, does Xi ∈ [a,b] hold with prob. > 1-δ?
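One useful fact behind this check: for a Gaussian model the posterior variance depends only on which attributes are observed, not on the values they return, so sufficiency of a candidate subset can be evaluated before any data is collected. The greedy search below is a simplified sketch of the exhaustive/greedy planning mentioned on the next slide: it adds the sensor with the best confidence gain per unit of acquisition cost until the range-query target is met, evaluating confidence at the prior mean and ignoring the traversal term Ct(S); all names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def posterior_var(Sigma, i, obs_idx):
    """Variance of X_i after observing the attributes in obs_idx.
    For a Gaussian this depends only on *which* sensors are read,
    not on the values they will return."""
    if not obs_idx:
        return Sigma[i, i]
    o = list(obs_idx)
    S_io = Sigma[i, o]
    S_oo = Sigma[np.ix_(o, o)]
    return Sigma[i, i] - S_io @ np.linalg.solve(S_oo, S_io)

def greedy_plan(mu, Sigma, i, a, b, delta, c_acquire):
    """Greedily grow an observation set until P(X_i in [a, b]) >= 1 - delta.
    Confidence is evaluated with the mean fixed at the prior mean, and a
    candidate is charged only its acquisition cost (a full planner would
    also account for the change in expected traversal cost Ct(S))."""
    def conf(obs):
        sd = max(np.sqrt(posterior_var(Sigma, i, obs)), 1e-9)
        return norm.cdf(b, loc=mu[i], scale=sd) - norm.cdf(a, loc=mu[i], scale=sd)

    plan = []
    while conf(plan) < 1 - delta:
        candidates = [j for j in range(len(mu)) if j not in plan]
        if not candidates:
            break  # every sensor already in the plan; target unreachable here
        best = max(candidates,
                   key=lambda j: (conf(plan + [j]) - conf(plan)) / c_acquire[j])
        plan.append(best)
    return plan
```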
17. BBQ system
- Probabilistic model: multivariate Gaussians, learned from historical data
- Planning: exhaustive or greedy search, factor-1.5 TSP approximation
- Inference: equivalent to a Kalman filter, simple matrix operations
[Figure: SQL-style query with desired confidence → probabilistic model → data gathering plan → observations → posterior belief]
18. Experimental results
- Redwood trees and Intel Lab datasets
- Learned models from data
- Static model
- Dynamic model: Kalman filter with time-indexed transition probabilities
- Evaluated on a wide range of queries
19. Cost versus confidence level
20. Obtaining approximate values
- Query: true temperature value ± ε with 95% confidence
21. Approximate range queries
- Query: temperature in [T1, T2] with 95% confidence
22. Comparison to other methods
23. Intel Lab traversals
24. BBQ system
- Probabilistic model: multivariate Gaussians, learned from historical data
- Planning: exhaustive or greedy search, factor-1.5 TSP approximation
- Inference: equivalent to a Kalman filter, simple matrix operations
- Extensions
- More complex queries
- Other probabilistic models
- More advanced planning
- Outlier detection
- Dynamic networks
- Continuous queries
[Figure: SQL-style query with desired confidence → probabilistic model → data gathering plan → observations → posterior belief]
25. Conclusions
- Model-driven data acquisition
- Observe fewer attributes
- Exploit correlations
- Reuse information between queries
- Directly deal with missing data
- Answer more complex (probabilistic) queries
- Basis for future sensor network systems