Title: Scalable data handling in sensor networks

1. Scalable data handling in sensor networks
- Deepak Ganesan
- Collaborators: Ben Greenstein, Denis Perelyubskiy, Deborah Estrin (UCLA), John Heidemann, Ramesh Govindan (USC/ISI)
2. Outline
- Data challenges in high-bandwidth sensor networks
  - Instance: wireless structural monitoring
- Transitioning from data acquisition systems to distributed storage and search
  - Generation I: economical wireless data acquisition systems using motes (under preparation)
    - Performance analysis over structural vibration data
  - Generation II: long-lived, distributed storage and search systems (SenSys 2003)
    - Performance analysis over geo-spatial data
- Other research directions
  - Optimal node placement and transmission structure under distortion bounds (IPSN 2004)
3. Scaling high-bandwidth wireless sensor network deployments
- We have made a good start at building scalable, long-term sensor network deployments for low-data-rate applications.
- Notable examples
  - Micro-climate monitoring at James Reserve (CENS-UCLA); bird monitoring at Great Duck Island (Intel-U.C. Berkeley)
- Characteristics
  - Low data rate (a few samples/minute), medium-scale (100s of nodes) deployments
- Scaling techniques
  - Duty-cycling (low-power listen/transmit), simple aggregation schemes (TinyDiffusion/TinyDB)
- We have very little understanding of how to scale high-bandwidth applications (involving vibration, acoustic, or image sensors), where significant data rates can be expected.
- How do we deal with applications that have predominantly relied on data collection?
4. Challenges in wireless structural monitoring
- High data rates
  - 100 Hz, 16-bit samples, 15-minute shaking events
- Resource-constrained motes
  - 6 MHz processor, 4 KB RAM, 4 MB flash memory (40 minutes of vibration data)
- Diverse user requirements
  - Collection of interesting event signatures from vibration events
  - Analysis of data over different time scales (long-term and short-term patterns)
- State of the art: expensive wireless data acquisition systems using 802.11
5. Transitioning from centralized to distributed storage and search
- Method: wired/wireless data acquisition systems
- Advantage: centralized, persistent storage and unconstrained search
- Disadvantage: expensive, cumbersome, highly power-inefficient

[Diagram: spectrum from current data acquisition systems, to multi-hop wireless data acquisition using motes, to distributed in-network storage and search]
6. Transitioning from centralized to distributed storage and search (contd.)
- Method: mote-based multi-hop data acquisition systems
- Advantage: cheap, easy to use, centralized storage, more scalable
- Disadvantage: power-inefficient
7. Transitioning from centralized to distributed storage and search (contd.)
- Method: distributed storage and search
- Advantage: power-efficient, flexible use
- Disadvantage: non-persistent, restricted storage and search
8. Building multi-hop wireless data acquisition systems using motes
- Goals
  - Near real-time monitoring
  - Reliable, synchronized data transfer
- Challenge
  - Limited network bandwidth, hence high latency
  - How can we build a low-latency data acquisition system?

15 minutes of vibration-event data (100 KB per node after Huffman coding) from a 20-node multi-hop wireless network takes 4-8 hours to collect centrally!
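The 4-8 hour figure can be sanity-checked with simple arithmetic. The effective throughput value below is an assumption for illustration; the slides do not state it.

```python
# Back-of-envelope check of the collection latency quoted above.
# ASSUMPTION: the effective end-to-end throughput of a multi-hop mote
# network is on the order of 0.5-1 kbit/s; this is not stated on the slide.
def collection_hours(num_nodes, kbytes_per_node, throughput_kbps):
    """Hours to drain every node's event data through the base station."""
    total_kbits = num_nodes * kbytes_per_node * 8
    return total_kbits / throughput_kbps / 3600.0

# 20 nodes x 100 KB at 0.5-1 kbit/s aggregate -> roughly 4.4-8.9 hours,
# consistent with the slide's 4-8 hour figure.
print(round(collection_hours(20, 100, 1.0), 1))  # 4.4
```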
9. Progressive, on-demand data collection
- Progressive data acquisition
  - Each node stores its data locally and transmits low-resolution summaries to the base station immediately after an event.
  - The user analyzes the low-resolution data to determine the nodes from which higher-resolution data is required.
  - Lossless data is collected from a subset of nodes on demand, within a window of time (before being phased out of the nodes' local storage).
- What did we achieve?
  - Low-latency lossy data acquisition
  - Lossless data acquisition on demand

Low-resolution data for 15 minutes of vibration-event data can be collected within 15-30 minutes of event occurrence.
10. Performance evaluation
- Choice of compression scheme
  - Appropriateness for structural vibration data
  - Performance metrics: compression ratio, error (RMS, PSNR)
- Efficient implementation on resource-constrained devices (motes)
  - Power, memory, and processing time
- Study performed on structural vibration data from shaker-table tests
  - CUREE-Kajima Joint Research Program, UCLA (Thomas Kang, John Wallace)
11. Why wavelets?
- Most of the signal energy is concentrated in the lower-frequency subbands.
- This decomposition suggests that structural vibration data is highly amenable to wavelet compression.
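The energy-compaction argument can be sketched with a one-level Haar transform on a synthetic low-frequency signal. Both the Haar filter and the test signal are illustrative choices, not the codec or data used on the motes.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar wavelet transform:
    returns (approximation, detail) coefficient lists."""
    approx = [(signal[i] + signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal), 2)]
    return approx, detail

def energy(coeffs):
    return sum(c * c for c in coeffs)

# A smooth, slowly varying signal (illustrative, not real shaker-table data).
signal = [math.sin(2 * math.pi * i / 64) for i in range(256)]
approx, detail = haar_step(signal)

# Almost all energy lands in the low-frequency (approximation) subband,
# which is why a wavelet coder can spend its bits there.
print(energy(approx) / (energy(approx) + energy(detail)))  # close to 1.0
```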
12. Mote implementation of the wavelet codec
13. Compression ratio and error for the mote implementation
- 17-fold reduction in data size with an RMS error of 3.1 (PSNR ~30 dB)
- Good compression ratios can be achieved with low error.
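The two error metrics used above have standard definitions, sketched below. The slide's pairing of RMS 3.1 with ~30 dB implies a particular peak normalization that the slides do not state, so the example uses hypothetical signals and an explicit `peak` parameter.

```python
import math

def rms_error(original, reconstructed):
    """Root-mean-square error between a signal and its reconstruction."""
    n = len(original)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / n)

def psnr(original, reconstructed, peak):
    """PSNR in dB relative to an explicit peak signal value.
    The normalization used on the slide is not stated, so peak is a parameter."""
    return 20 * math.log10(peak / rms_error(original, reconstructed))

# Hypothetical 4-sample signal and a reconstruction off by 1 everywhere.
original = [10.0, 20.0, 30.0, 40.0]
reconstructed = [11.0, 19.0, 31.0, 39.0]
print(rms_error(original, reconstructed))           # 1.0
print(round(psnr(original, reconstructed, 40.0), 1))  # 32.0
```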
14. Transitioning towards long-term deployments
- We achieved low-latency wireless data acquisition, but our deployment lifetimes were still short.
  - Mote-based data acquisition systems can last only a few weeks.
- How do our system objectives change for a long-term deployment?
  - We need a smooth transition for researchers who have depended on data collection systems: the system should retain the ability to collect new event signatures on demand.
  - We need very low energy usage for a long lifetime: the system focus has to shift from data collection to in-network data storage and search.
- Goal: build a networked storage and search system.
15. Can existing storage and search systems satisfy the design goals?
16. Approach: provide gracefully degrading storage
- A distributed sensor network is a collection of nodes sensing spatio-temporally correlated data, with a comparatively large aggregate storage capacity.
- A gracefully degrading storage model provides two benefits:
  - It retains the ability to gather data on demand.
  - It offers a tradeoff between resolution and query accuracy: lower-resolution data yields lower query quality but incurs less storage overhead, and vice versa.
- Questions
  - How do we build a gracefully degrading networked store?
  - Can we efficiently query the distributed data store?
17. Related work
- Data storage in sensor networks
  - Event storage: DCS (Ratnasamy, HotNets 2002)
  - Indexing schemes: DIM (Li, SenSys 2003), DIFS (Greenstein, SNPA 2003)
- Multi-resolution computation
  - Beyond average (Hellerstein, IPSN 2003)
  - Edge detection (Nowak, IPSN 2003)
- Wavelet-based compression
  - Structural health monitoring (Lynch, 2003)
- Sensor network databases
  - Directed Diffusion (Heidemann, Estrin), TinyDB (Madden), Cougar (Bonnet)
18. Key design ideas
- Construct a distributed, load-balanced quad-tree hierarchy of lossy wavelet-compressed summaries at different resolutions and spatio-temporal scales.
- Queries drill down from the root of the hierarchy to focus the search on small portions of the network.
- Progressively age summaries for long-term storage and graceful degradation of query quality over time.

[Figure: three-level hierarchy (Level 0 to Level 2), progressively lossier toward the root and progressively aged over time]
19. Constructing the hierarchy
Initially, nodes fill up their own storage with raw sampled data.
20. Constructing the hierarchy (contd.)
- Tessellate the network space into grids; hash within each grid to determine the location of its clusterhead (cf. DCS).
- Send a wavelet-compressed local time series to the clusterhead.
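The hash-to-clusterhead step can be sketched as below. The hash function, cell addressing, and epoch parameter are illustrative assumptions in the spirit of geographic hashing (DCS), not the paper's exact construction; the epoch argument anticipates the load-distribution idea of re-hashing over time.

```python
import hashlib

def clusterhead(grid_x, grid_y, level, epoch):
    """Hash a grid cell (plus hierarchy level and time epoch) to an (x, y)
    location inside that cell; the node nearest this point serves as
    clusterhead. Any node can compute this without coordination."""
    digest = hashlib.sha1(f"{grid_x}:{grid_y}:{level}:{epoch}".encode()).digest()
    fx = digest[0] / 255.0  # fractional offset within the unit grid cell
    fy = digest[1] / 255.0
    return grid_x + fx, grid_y + fy

# Same cell, level, and epoch -> every node derives the same location;
# advancing the epoch re-hashes, which spreads storage load over time.
a = clusterhead(3, 5, level=1, epoch=0)
b = clusterhead(3, 5, level=1, epoch=0)
assert a == b
```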
21. Processing at each level
1. Receive compressed summaries from children.
2. Decode them.
3. Store the incoming summaries locally for future search.
4. Re-encode at lower resolution and forward to the parent.

[Figure: wavelet encoder/decoder operating on an x-y-time data cube]
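The per-level pipeline can be sketched in miniature: merge the children's summaries, then re-encode at a coarser resolution before forwarding. Keeping only a Haar approximation subband stands in for the paper's 3D wavelet codec, which this is not.

```python
import math

def haar_analyze(signal):
    """One orthonormal Haar level: (approximation, detail) coefficients."""
    approx = [(signal[i] + signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal), 2)]
    return approx, detail

def forward_to_parent(child_summaries):
    """Sketch of the per-level step: concatenate the children's summaries,
    then re-encode at lower resolution by keeping only the approximation
    subband. Illustrative only; the real system uses a 3D wavelet codec."""
    merged = [x for summary in child_summaries for x in summary]
    coarser, _detail = haar_analyze(merged)
    return coarser

children = [[1.0, 1.0, 2.0, 2.0], [3.0, 3.0, 4.0, 4.0]]
summary = forward_to_parent(children)
print(len(summary))  # 4: half the samples of the merged input
```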
22. Constructing the hierarchy (contd.)
Recursively send data to higher levels of the hierarchy.
23. Distributing storage load
Hash to different locations over time to distribute load among the nodes in the network.
24. Drill-down query processing
The user hashes to the location of the root. The drill-down query is routed down the hierarchy until it reaches the base level.
25. Designing an aging policy for summaries
- Eventually all available storage fills up, and we have to decide when and how to drop summaries.

[Figure: a node's local storage capacity partitioned among summaries at resolutions 1-3]

How do we allocate storage at each node to summaries at different resolutions to provide gracefully degrading storage and search capability?
26. Match system performance to user requirements

[Figure: query accuracy vs. time, from past to present; the user-desired quality decays smoothly (from roughly 95 toward 50) while the system-provided quality is a step function, with the gap labeled "quality difference"]

- Objective: minimize the worst-case difference between the user-desired query quality (the smooth curve) and the query quality that the system can provide (the step function).
27. How do we determine the step function?
- Height: what is the dip in query accuracy when resolution i becomes unavailable?
  - What types of queries are being posed (T)?
  - For each query q, what is the expected query error when drill-down queries terminate at level i, Error(q, i)?
- Width: how long is resolution i stored within the network before being aged?
  - Storage allocated to resolution i at each node (s_i)
  - Total number of nodes in the network (N)
  - Rate assigned to resolution i (R_i)
28. Storage allocation as a constrained-optimization problem
- Objective: find s_i, i = 1..log4(N), that minimize the worst-case quality difference.
- Constraints
  - Storage constraint: no node can store more than its storage limit.
  - Drill-down constraint: it is not useful to store finer-resolution data if coarser resolutions of the same data are not present.
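The greedy fallback (used when no training data is available) can be sketched as a weighted split of each node's storage. The weight value and the exponential weighting form are assumptions for illustration, not parameters from the paper.

```python
def greedy_allocation(capacity_bytes, num_levels, weight=2.0):
    """Greedy aging sketch: split a node's storage across resolution levels
    with weights weight**k for k = 0 (coarsest) .. num_levels-1 (finest),
    so coarser summaries get more space and therefore age out later. This
    keeps the drill-down constraint satisfied: fine-resolution data is
    useless without its coarse parent. ASSUMPTION: weight=2.0 is an
    illustrative knob, not a value from the paper."""
    raw = [weight ** (num_levels - 1 - k) for k in range(num_levels)]
    total = sum(raw)
    return [capacity_bytes * r / total for r in raw]

# 4 MB of mote flash split across 3 resolution levels.
alloc = greedy_allocation(capacity_bytes=4_000_000, num_levels=3)
print([round(a) for a in alloc])  # coarsest level gets the biggest share
```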
29. Determining rates and drill-down query error
- How do we determine communication rates?
  - Assume rates are fixed a priori by communication/network-lifetime constraints.
- How do we determine the drill-down query error when prior information about the deployment and data is limited?
30Prior information about sampled data
full a priori information
Omniscient Strategy Infeasible. Use all data to
decide optimal allocation.
Solve Constraint Optimization
Training Strategy (can be used when small
training dataset from initial deployment).
1 2 4
Greedy Strategy (when no data is available, use a
simple weighted allocation to summaries).
Finer
Finest
Coarse
No a priori information
31. Distributed trace-driven implementation
- Linux implementation for iPAQ-class nodes
  - Uses EmStar, a Linux-based emulator/simulator for sensor networks
  - 3D wavelet codec based on freeware by Geoff Davis, available at http://www.geoffdavis.net
  - Query processing in Matlab
- Geo-spatial precipitation dataset
  - 15x12 grid (50 km edge) of precipitation data from 1949-1994, from the Pacific Northwest (caveat: not real sensor data)
- System parameters
  - Compression ratios: 6, 12, 24, 48
  - Training set: 6% of the total dataset

M. Widmann and C. Bretherton. 50 km resolution daily precipitation for the Pacific Northwest, 1949-94.
32. Queries posed over the precipitation data
- Use queries at different spatio-temporal scales to evaluate the performance of the schemes.
- Chosen query set
  - GlobalYearlyEdge: look for a spatio-temporal feature (the edge between high- and low-precipitation areas)
  - LocalYearlyMean: fine spatial, coarse temporal granularity
  - GlobalDailyMax: coarse spatial, fine temporal granularity
  - GlobalYearlyMax: coarse spatio-temporal granularity
33. How efficient is search?
Search is very efficient (less than 5% of the network queried) and accurate for the different queries studied.
34. Comparing aging schemes
Training performs within 1% of optimal. Results with the greedy algorithm are sensitive to the choice of weights.
35. Summary
- Provide a smooth transition from current data acquisition systems to fully distributed storage and search systems.
  - Progressive-transmission wireless data acquisition systems serve as the intermediate step.
- Support long-term storage and querying in resource-constrained sensor network deployments.
  - Summarization and in-network storage of data
  - Training-based optimization to determine system parameters
36. Power-efficient sensor placement and transmission structure for data gathering under distortion constraints
- Collaborators: Razvan Cristescu, Baltasar Beferull-Lozano (EPFL, Switzerland)
- To appear at IPSN 2004
37. Problem motivation and description
- Motivation
  - The vision of thousands of $10 nodes is unrealistic in the near (10-year) term, due to economies of scale and the cost of sensors.
  - We need to add a limited-node-count constraint to the optimization.
- A user has a bag of N nodes and needs to place them in a region A such that the sensed field can be reconstructed with:
  - maximum distortion at any point in A less than Dmax, and
  - average distortion over the entire region less than Davg.
- How does the user place the nodes and construct their communication structure for data gathering to a sink so that the total multi-hop communication power is minimized?
38. Complexity of the problem
- Interplay of two difficult problems:
  - Find feasible placements that satisfy the distortion bounds.
  - Find the most energy-efficient transmission structure for each placement (NP-complete).
- Simple example: given configurations I and II, which would you choose?
  - Node B is closer to the base station, and hence transmits its data over a shorter distance.
  - Node B is close to A, and therefore better correlated; A can jointly compress their data, resulting in lower energy overhead.
- The optimal solution involves finding the most power-efficient transmission structure among all feasible placements and possible transmission structures.
39. Model and assumptions
- Sensing model
  - Jointly Gaussian model for spatial data with an exponentially decaying covariance function.
- Data aggregation model
  - Each node on the tree jointly compresses data from its entire sub-tree (e.g., Huffman/arithmetic coding).
- Sink data reconstruction model
  - Nearest-neighbor reconstruction is used to reconstruct the field from a set of sampled points.
- Communication model
  - Power-per-bit varies super-linearly with the separation between transmitter and receiver.
40. Model and assumptions (contd.)
- Data correlation model: jointly Gaussian model for the spatial data X measured at the nodes, i.e., an N-dimensional multivariate normal distribution N(µ, K) with covariance matrix K.
- Sink data reconstruction model: nearest-neighbor reconstruction of the field from the sampled points.
- Data aggregation model: each node on the tree jointly compresses data from its entire sub-tree.
- Communication model: path-loss model.
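The covariance matrix K can be sketched as below for a 1-D layout (the paper's setting is spatial; 1-D positions keep the example short). The decay parameter `theta` and `variance` are illustrative values, not numbers from the slides.

```python
import math

def covariance_matrix(positions, theta=1.0, variance=1.0):
    """Covariance K for the jointly Gaussian field: entries decay
    exponentially with inter-node distance,
        K_ij = variance * exp(-theta * |x_i - x_j|).
    ASSUMPTION: theta and variance are illustrative parameters; the
    slides give the model form but no numeric values."""
    n = len(positions)
    return [[variance * math.exp(-theta * abs(positions[i] - positions[j]))
             for j in range(n)] for i in range(n)]

K = covariance_matrix([0.0, 1.0, 3.0])
# Diagonal entries equal the variance; correlation falls off with distance,
# which is why nearby nodes compress better jointly.
print(K[0][0], round(K[0][1], 3), round(K[0][2], 3))
```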
43. Model and assumptions (contd.)
- Communication model: path-loss model. Power-per-bit grows as d^α with distance d, where α ≈ 2 in free space and typically 2 < α < 4.
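The path-loss model above is a one-liner in code; the constant `c` and the default `alpha` are deployment-dependent assumptions, not values from the slides.

```python
def power_per_bit(d, alpha=2.0, c=1.0):
    """Path-loss sketch: energy per bit grows as c * d**alpha, with
    alpha ~ 2 in free space and typically between 2 and 4.
    ASSUMPTION: c and the exact alpha are deployment-dependent."""
    return c * d ** alpha

# Super-linear growth is why one long hop can cost more than two short
# hops covering the same distance: 2^2 = 4 > 2 * 1^2 = 2.
print(power_per_bit(2.0), 2 * power_per_bit(1.0))
```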
44. Optimization problem: the 1-D case
- Minimize total power
- Subject to:
  - a maximum distortion constraint,
  - an average distortion constraint, and
  - a total area coverage constraint.
- Solve using Lagrangian relaxation and a numerical constrained-optimization solver.
45. Extending the results to the 2-D case
- Construct a wheel, with the nodes on each radial spoke placed optimally using our 1-D placement solution.
- Additional constraints
  - Given N nodes, how do we decide the number of nodes per spoke and the number of spokes?
  - How do we ensure that the Voronoi cells satisfy the average and maximum distortion bounds?
46. Performance gains over uniformly random placement and shortest-path trees
- One-dimensional placement
  - 1- to 3-fold reduction in power consumption for 10-20 node linear placements
- Two-dimensional placement of 100-200 nodes
  - Typically an order of magnitude reduction in total power consumption
  - Two orders of magnitude reduction in bottleneck energy consumption (i.e., for the node nearest the sink)!
- Other interesting observations
  - The network "implodes": with this placement, the nodes farthest from the base station die first and the nodes nearest the sink die last.
  - This is the behavior we want, since nodes near the sink are the most important for routing.
47. The End