Scalable data handling in sensor networks

Slides: 67

Provided by: deepakg

1
Scalable data handling in sensor networks

Deepak Ganesan
Collaborators: Ben Greenstein, Denis
Perelyubskiy, Deborah Estrin (UCLA); John
Heidemann, Ramesh Govindan (USC/ISI)

2
Outline

Data challenges in high-bandwidth sensor networks
Transitioning from data acquisition systems to
distributed storage and search
Economical wireless data acquisition systems
using motes.
Long-lived, distributed storage and search
systems
Other notable research directions

3
Outline

Data challenges in high-bandwidth sensor networks
Instance: Wireless structural monitoring
Transitioning from data acquisition systems to
distributed storage and search
Generation I: Wireless data acquisition systems
Goal
Proposed solution: Progressive, on-demand data
acquisition
Performance analysis over structural vibration
data
Generation II: Long-lived, query-response systems
New scaling challenge: Storage
In-network storage and processing techniques
Performance analysis
Other research directions
Optimizing node placement and transmission
structure for data gathering
Complexity at scale: Large-scale network
measurement

4
Scaling high-bandwidth sensor network deployments

We have made a good start at building scalable,
long-term wireless sensor network deployments
that deal with low-bandwidth, low-duty rate
applications.
Examples: micro-climate monitoring system at James
Reserve (CENS-UCLA), bird monitoring at Great Duck
Island (Intel-U.C. Berkeley).
Low data rate (a few samples/minute), medium-scale
(100s of nodes) deployments.
Duty-cycling, low-power listen/transmit, simple
aggregation schemes (TinyDiffusion/TinyDB).
We have very little understanding of how to scale
high-bandwidth sensor network applications
(involving vibration/acoustic/image sensors)
where significant data rates can be expected.

5
Challenges in Wireless Structural Monitoring

High data rates
100Hz sampling, 16-bit samples, 15-minute shaking
events.
Resource-constrained motes
6MHz processor, 4KB RAM, 4MB flash memory (40
minutes of vibration data)
Diverse user requirements
Collection of interesting event signatures
of vibration events.
Analysis of data over different time-scales
(long-term and short-term patterns)
State of the art: expensive wireless data
acquisition systems using 802.11
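
As a rough check on the numbers above, the raw size of one shaking event can be worked out from the sampling parameters. This is a minimal sketch that assumes a single sensor channel per node; the function name and the `channels` parameter are illustrative, not taken from the slides.

```python
# Parameters quoted on the slide
SAMPLE_RATE_HZ = 100
BITS_PER_SAMPLE = 16
EVENT_SECONDS = 15 * 60

def raw_event_bytes(rate_hz, bits_per_sample, seconds, channels=1):
    """Uncompressed size in bytes of one recorded event.

    channels=1 is an assumption; the slides do not state the channel count.
    """
    return rate_hz * (bits_per_sample // 8) * seconds * channels

size = raw_event_bytes(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, EVENT_SECONDS)
print(size)  # 180000 bytes, about 180 KB for a single channel
```

Under that single-channel assumption, the result is of the same order as the roughly 200KB per event quoted later in the transcript.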

6
Transitioning from centralized to distributed
storage and search.
Multi-hop Wireless Data Acquisition using Motes
Distributed In-network storage and search
Current Data Acquisition Systems
9
Building Multi-hop Wireless Data Acquisition
Systems Using Motes

Scaling challenges
Data collection from medium scale networks of
motes incurs high latency.
Data collection incurs large energy cost since
not all the data may be necessary for the user.
Lossless compression (Huffman coding) is
insufficient: only a 2-fold reduction in data size.

Collecting 15 minutes of vibration event data
(200KB each) from a 20-node multi-hop wireless
network takes 4-8 hours centrally!
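
The collection figures above imply a very low end-to-end throughput. The sketch below simply divides the total data volume by the collection time. It assumes 1 KB = 1000 bytes and ignores protocol overhead; the function name is illustrative.

```python
def effective_throughput(nodes, bytes_per_node, hours):
    """Average end-to-end rate in bytes/second for a full collection."""
    return nodes * bytes_per_node / (hours * 3600)

fast = effective_throughput(20, 200_000, 4)   # best case: 4 hours
slow = effective_throughput(20, 200_000, 8)   # worst case: 8 hours
print(f"{slow:.0f} to {fast:.0f} bytes/second")  # 139 to 278 bytes/second
```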
10
Progressive, On-demand Data Collection

Low-latency low-resolution data acquisition
immediately after vibration activity.
User can analyze low-resolution data to determine
nodes from which higher resolution data is
required.
Lossless data is collected from a subset of nodes
on-demand before data is phased out from local
stores

Low-resolution data for 15 minutes of vibration
event data can be collected within 15-30 minutes
of event occurrence
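
The 15-30 minute figure is consistent with simple scaling. The sketch assumes collection time shrinks in proportion to the data volume (an assumption, not a claim from the slides) and applies the 17-fold reduction reported later in this transcript for the mote codec to the 4-8 hour lossless collection time.

```python
def scaled_latency_minutes(lossless_hours, reduction_factor):
    """Collection time in minutes if latency scales linearly with data size."""
    return lossless_hours * 60 / reduction_factor

low = scaled_latency_minutes(4, 17)
high = scaled_latency_minutes(8, 17)
print(f"{low:.1f} to {high:.1f} minutes")  # 14.1 to 28.2 minutes
```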
11
Wavelets for lossy compression

Why Wavelets?
Wavelets preserve spatio-temporal features
(edges, discontinuities) while providing good
approximation of long-term trends in data
Significant work on seismic data compression has
obtained good performance with wavelet
compression.
Two questions
Is wavelet compression appropriate for structural
vibration data?
Performance metrics: compression ratio, error
(RMS, PSNR)
Can we build an efficient implementation on
resource-constrained devices (motes)?
Study performed on structural vibration data from
shaker table tests
CUREE-Kajima Joint Research Program, UCLA -
Thomas Kang, John Wallace

12
Feasibility Analysis

Signal decomposition suggests that the vibration
data is highly appropriate for wavelet compression.

13
Mote implementation of Wavelet Codec
14
Progressive Transmission: Detection
An event detection module executes a simple
threshold filter to detect vibration events.
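
A threshold filter of this kind might look like the sketch below. The `min_quiet` parameter (how many quiet samples end an event) and the function name are assumptions added for illustration; the slide only says that a simple threshold is used.

```python
def detect_events(samples, threshold, min_quiet=3):
    """Return (start, end) index pairs, end exclusive, where |sample| > threshold.

    An event ends once min_quiet consecutive samples stay at or below the
    threshold (hypothetical rule, not taken from the slides).
    """
    events = []
    start = None
    quiet = 0
    for i, x in enumerate(samples):
        if abs(x) > threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_quiet:
                events.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Signal ended while an event was still open
        events.append((start, len(samples) - quiet))
    return events

print(detect_events([0, 1, 0, 9, -8, 7, 0, 0, 0, 1, 0], threshold=5))  # [(3, 6)]
```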
15
Progressive Transmission: Decomposition
A detected event sequence is decomposed using an
efficient integer wavelet transform.
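
The slides do not say which integer wavelet is used. As an illustration only, the sketch below uses the integer Haar transform (the S-transform) written as lifting steps, which needs only additions and shifts and is exactly invertible.

```python
def haar_lift_forward(x):
    """One level of the integer Haar (S) transform; len(x) must be even."""
    approx, detail = [], []
    for a, b in zip(x[0::2], x[1::2]):
        d = b - a          # detail: difference of the pair
        s = a + (d >> 1)   # approximation: floor of the pair average
        approx.append(s)
        detail.append(d)
    return approx, detail

def haar_lift_inverse(approx, detail):
    """Exactly undo haar_lift_forward using the same integer steps."""
    x = []
    for s, d in zip(approx, detail):
        a = s - (d >> 1)
        x.extend([a, a + d])
    return x

def decompose(x, levels):
    """Multi-level transform: [coarsest approx, coarsest detail, ..., finest detail]."""
    bands = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_lift_forward(approx)
        bands.insert(0, detail)
    return [approx] + bands

signal = [10, 12, -3, 5, 7, 7, 100, -100]
print(decompose(signal, 2))  # [[6, 3], [-10, -7], [2, 8, 0, -200]]
```

Because every step is an integer add or shift, this style of transform suits processors without floating-point hardware.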
16
Progressive Transmission: Thresholding and
Quantization
The decomposed signal is thresholded and
quantized to reduce number of bits per sample.
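
A minimal sketch of this step, assuming a hard threshold followed by uniform quantization. The threshold and step values are illustrative; the slides do not give the parameters actually used.

```python
def threshold_and_quantize(coeffs, threshold, step):
    """Zero small coefficients, then map the rest to integer bin indices."""
    out = []
    for c in coeffs:
        if abs(c) < threshold:
            out.append(0)
        else:
            q = int(abs(c) / step + 0.5)   # round magnitude to nearest bin
            out.append(q if c >= 0 else -q)
    return out

def dequantize(bins, step):
    """Approximate reconstruction of the coefficients from bin indices."""
    return [b * step for b in bins]

bins = threshold_and_quantize([2, 8, 0, -200, -3, 17], threshold=4, step=8)
print(bins)                 # [0, 1, 0, -25, 0, 2]
print(dequantize(bins, 8))  # [0, 8, 0, -200, 0, 16]
```

Small bin indices need fewer bits than the original 16-bit samples, and the zeros created by thresholding form runs for the next stage to exploit.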
17
Progressive Transmission: Reconstruction
A run-length encoder exploits runs of zeros and
sends a packed bitstream to the base-station
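
The run-length idea can be sketched as follows. The token format is hypothetical and the final bit-packing is omitted; only the handling of zero runs is shown.

```python
def rle_zeros_encode(values):
    """Encode as tokens: ("Z", n) for a run of n zeros, ("V", v) for a nonzero value."""
    tokens = []
    run = 0
    for v in values:
        if v == 0:
            run += 1
        else:
            if run:
                tokens.append(("Z", run))
                run = 0
            tokens.append(("V", v))
    if run:
        tokens.append(("Z", run))
    return tokens

def rle_zeros_decode(tokens):
    """Expand the tokens back into the original sequence."""
    out = []
    for kind, n in tokens:
        if kind == "Z":
            out.extend([0] * n)
        else:
            out.append(n)
    return out

data = [0, 0, 0, 5, 0, -2, 0, 0]
encoded = rle_zeros_encode(data)
print(encoded)  # [('Z', 3), ('V', 5), ('Z', 1), ('V', -2), ('Z', 2)]
```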
18
Compression Ratio and Error for Mote
Implementation
17-fold reduction in data size with an RMS error
of 3.1 (PSNR 30dB)

Good compression ratios can be achieved with low
error
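
For reference, the three metrics named on this slide can be computed as below. This uses the common definition PSNR = 20·log10(peak / RMS error); the `peak` value is chosen by the caller, and the slides do not state which peak was used.

```python
import math

def rms_error(original, reconstructed):
    """Root-mean-square difference between two equal-length signals."""
    n = len(original)
    return math.sqrt(sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / n)

def psnr_db(original, reconstructed, peak):
    """Peak signal-to-noise ratio in dB for a caller-supplied peak value."""
    rms = rms_error(original, reconstructed)
    if rms == 0:
        return math.inf
    return 20 * math.log10(peak / rms)

def compression_ratio(raw_bytes, compressed_bytes):
    """How many times smaller the compressed data is."""
    return raw_bytes / compressed_bytes

orig = [0, 0, 0, 0]
recon = [3, -3, 3, -3]
print(rms_error(orig, recon))             # 3.0
print(compression_ratio(180000, 10000))   # 18.0
```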

19
Transitioning towards long-term deployments

We achieved low-latency wireless data
acquisition, but our deployment lifetimes
were still short.
Data acquisition systems with motes can last for
a few weeks.
How do our system objectives change for a
long-term deployment?
Need to achieve very low energy usage for long
lifetime:
system focus has to shift from data collection to
in-network data storage and search.
Need a smooth transition for researchers who have
depended on data collection systems:
system should retain the ability to collect new
event signatures on demand.
Goal: Build a gracefully degrading storage system
with an efficient drill-down search facility.

20
Why is gracefully degrading in-network storage
the right paradigm?

Support for on-demand data acquisition.
Intuitive approach for dealing with

21
Can existing storage and search systems satisfy
design goals?
22
How can we achieve gracefully degrading long-term
storage?

Exploit spatio-temporal correlation to reduce
data.
Exploit distributed storage capacity of sensor
network.
large distributed storage, although limited local
storage.
Store data at multiple resolutions to trade off
query quality against storage requirement.
Lower resolution data offers lower query quality
but incurs less storage overhead, and vice versa.
Exploit low cost of drill-down query processing.
Allow approximate query processing that obtains
sufficiently accurate responses.

23
Related Work

Data Storage
Event Storage DCS (Ratnasamy Hotnets 2000)
Indexing schemes DIMS (Li Sensys 2003), DIFS
(Greenstein SPNA 2002)
Multi-resolution computation
Beyond Average (Hellerstein IPSN 2003)
Edge detection (Nowak IPSN 2003)
Sensor network databases
Directed Diffusion (Heidemann, Estrin), TinyDB
(Madden), Cougar (Bonnet)

24
DIMENSIONS Design Key Ideas

Construct distributed load-balanced quad-tree
hierarchy of lossy wavelet-compressed summaries
corresponding to different resolutions and
spatio-temporal scales.
Queries drill-down from root of hierarchy to
focus search on small portions of the network.
Progressively age summaries for long-term storage
and graceful degradation of query quality over
time.
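
To make the drill-down idea concrete, here is a minimal sketch of a "maximum value" query over a two-level quad-tree. The `Summary` class and helper names are hypothetical. The summaries here are exact maxima for simplicity, whereas the system described in the slides stores lossy wavelet-compressed summaries, so a real drill-down would be approximate.

```python
class Summary:
    """A clusterhead's summary of its region (hypothetical structure)."""
    def __init__(self, approx_max, children=(), node_id=None):
        self.approx_max = approx_max
        self.children = list(children)
        self.node_id = node_id   # set only for leaf sensor nodes

def drill_down_max(root):
    """Follow the most promising child at each level.

    Returns the leaf reached and the number of nodes contacted.
    """
    node, contacted = root, 1
    while node.children:
        node = max(node.children, key=lambda c: c.approx_max)
        contacted += 1
    return node, contacted

def build_two_level_tree(leaf_values):
    """Group leaves four at a time under clusterheads, then add a root."""
    leaves = [Summary(v, node_id=i) for i, v in enumerate(leaf_values)]
    quads = [Summary(max(l.approx_max for l in leaves[i:i + 4]), leaves[i:i + 4])
             for i in range(0, len(leaves), 4)]
    return Summary(max(q.approx_max for q in quads), quads)

root = build_two_level_tree([3, 11, 4, 1, 5, 15, 2, 6, 0, 13, 10, 8, 9, 7, 14, 12])
leaf, contacted = drill_down_max(root)
print(leaf.node_id, contacted)  # 5 3
```

Only 3 of the 21 nodes in this toy hierarchy are contacted, which is the sense in which drill-down focuses the search on a small part of the network.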

[Figure: hierarchy with Level 0, Level 1 and Level 2, with the labels PROGRESSIVELY AGE and PROGRESSIVELY LOSSY]
25
Constructing the hierarchy
Initially, nodes fill up their own storage with
raw sampled data.
26
Constructing the hierarchy

Tessellate the network space into grids, and hash
in each to determine location of clusterhead
(ref DCS).
Send wavelet-compressed local time-series to
clusterhead.
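
One way to picture the grid-and-hash step is sketched below: each grid cell is hashed to a point inside the cell, and the node nearest that point acts as clusterhead. The use of SHA-256, the key format and all function names are assumptions for illustration, not the scheme in DCS or in the slides. The sketch also assumes each cell contains at least one node.

```python
import hashlib
import math

def cell_of(x, y, cell_size):
    """Index of the grid cell containing point (x, y)."""
    return int(x // cell_size), int(y // cell_size)

def hashed_point(cell, cell_size, level):
    """Deterministically map (level, cell) to a point inside the cell."""
    digest = hashlib.sha256(f"{level}:{cell[0]}:{cell[1]}".encode()).digest()
    fx = int.from_bytes(digest[:4], "big") / 2**32    # fraction in [0, 1)
    fy = int.from_bytes(digest[4:8], "big") / 2**32
    return (cell[0] + fx) * cell_size, (cell[1] + fy) * cell_size

def pick_clusterhead(nodes, cell, cell_size, level):
    """nodes maps node_id -> (x, y); pick the in-cell node nearest the hashed point."""
    target = hashed_point(cell, cell_size, level)
    in_cell = {n: p for n, p in nodes.items()
               if cell_of(p[0], p[1], cell_size) == cell}
    return min(in_cell, key=lambda n: math.dist(in_cell[n], target))

nodes = {1: (0.5, 0.5), 2: (1.5, 0.2), 3: (2.5, 2.5), 4: (3.1, 2.2)}
head = pick_clusterhead(nodes, cell=(0, 0), cell_size=2.0, level=1)
```

Because every node can compute the same hash locally, no election messages are needed to agree on the clusterhead.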

27
Processing at each level
Get compressed summaries from children.
Store incoming summaries locally for future
search.
Decode.
Re-encode at lower resolution and forward to
parent.
[Figure: wavelet encoder/decoder over the x, y and time axes]
28
Constructing the hierarchy
Recursively send data to higher levels of the
hierarchy.
29
Distributing storage load
Hash to different locations over time to
distribute load among nodes in the network.
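
A simplified sketch of load rotation: the hash input includes an epoch counter so that the clusterhead role moves among the nodes of a cell over time. Hashing directly to a member index (rather than to a location, as the slide describes) is a simplification, and the names are illustrative.

```python
import hashlib

def clusterhead_for_epoch(cell_id, member_ids, epoch):
    """Pick one member of the cell as clusterhead for the given epoch."""
    key = f"{cell_id}:{epoch}".encode()
    index = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(member_ids)
    return sorted(member_ids)[index]

members = [11, 12, 13, 14]
heads = [clusterhead_for_epoch("cell-3", members, e) for e in range(50)]
# Count how often each member served as clusterhead
load = {m: heads.count(m) for m in members}
```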
30
What happens when storage fills up?

Eventually, all available storage gets filled,
and we have to decide when and how to drop
summaries.
Allocate storage to each resolution and use each
allocated storage block as a circular buffer.
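
The circular-buffer policy can be sketched with one bounded queue per resolution. Capacities are counted in summaries here, which assumes fixed-size summaries; the class and method names are illustrative.

```python
from collections import deque

class ResolutionStore:
    """Per-resolution circular buffers (hypothetical structure)."""
    def __init__(self, capacity_per_level):
        # deque(maxlen=n) silently drops the oldest entry once n items are held
        self.buffers = {lvl: deque(maxlen=cap)
                        for lvl, cap in capacity_per_level.items()}

    def store(self, level, timestamp, summary):
        self.buffers[level].append((timestamp, summary))

    def oldest(self, level):
        """Timestamp of the oldest summary still held at this level, or None."""
        buf = self.buffers[level]
        return buf[0][0] if buf else None

store = ResolutionStore({0: 2, 1: 4})   # level 0 keeps 2 summaries, level 1 keeps 4
for t in range(6):
    store.store(0, t, f"fine-{t}")
    store.store(1, t, f"coarse-{t}")
print(store.oldest(0), store.oldest(1))  # 4 2
```

The level given more slots retains older data, which is what lets the allocation control how quickly each resolution ages out.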

31
Tradeoff between Age and Storage requirements for
summary

Graceful Query Degradation Provide more accurate
responses to queries on recent data and less
accurate responses to queries on older data.

How do we allocate storage at each node to
summaries at different resolutions to provide
gracefully degrading storage and search
capability?
32
Match system performance to user requirements
[Figure: query accuracy (axis ticks 50 and 95) versus time, from past to present; the gap between the two curves is labeled "Quality Difference"]

Objective: Minimize the worst-case difference
between user-desired query quality (blue curve)
and the query quality that the system can provide
(red step function).
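
The objective can be illustrated numerically. In the sketch below, the system's quality is a step function that drops each time a resolution ages out, and the worst-case gap to a user-desired curve is found by scanning over time. All retention periods, quality values and the linear user curve are made-up illustrative numbers.

```python
def system_quality(t, retention, quality):
    """Quality of the best resolution still stored for data of age t."""
    available = [quality[k] for k in retention if t <= retention[k]]
    return max(available) if available else 0.0

def worst_case_gap(user_quality, retention, quality, horizon, step=1):
    """Largest shortfall of system quality below the user's curve."""
    worst = 0.0
    t = 0
    while t <= horizon:
        gap = user_quality(t) - system_quality(t, retention, quality)
        worst = max(worst, gap)
        t += step
    return worst

# Hypothetical example: three resolutions kept for 10, 40 and 100 time units
retention = {"finest": 10, "finer": 40, "coarse": 100}
quality = {"finest": 0.95, "finer": 0.80, "coarse": 0.50}

def user(t):
    """User tolerates a linear decline from 0.95 to 0.50 over 100 units."""
    return 0.95 - 0.45 * t / 100

gap = worst_case_gap(user, retention, quality, horizon=100)
print(round(gap, 4))  # 0.2655, reached just after the "finer" data ages out
```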

33
What do we know?

Given:
N sensor nodes.
Each node has storage capacity S.
Data is generated at resolution i at rate Ri.
Quser: user-desired quality degradation.
We might also be provided:
T: a set of typical queries supplied by the
user.
D(q,k): query error when drill-down for query q
terminates at level k.

34
Determining Query Quality from multiple queries
Max query: Find the node which has the maximum
precipitation in January.
Edge query: Find nodes along a boundary between
high and low precipitation areas.
[Figure: error (axis ticks 5 and 50) for each query, from "only coarsest summary is queried" to "all resolutions (from coarsest to finest) are queried"]
We need to translate the performance of different
drill-down queries into a single query quality
metric.
35
Definition Query Quality

Given:
T: set of typical queries.
D(q,k): query error when drill-down for query q
in set T terminates at resolution k.
The query quality for queries that refer to data
at time t in the past, Qsystem(t), if k is the
finest available resolution is
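
The formula that followed this sentence did not survive in the transcript. Purely as an illustrative assumption, one simple way to collapse the per-query errors D(q,k) into a single number is an optionally weighted mean over the query set T. The slides may define Qsystem(t) differently.

```python
def query_quality(errors_at_k, weights=None):
    """Aggregate per-query drill-down errors D(q, k) at a fixed resolution k.

    errors_at_k maps query name -> error. Equal weights are assumed
    unless given. Returns the mean error, so lower is better.
    """
    if weights is None:
        weights = {q: 1.0 for q in errors_at_k}
    total_w = sum(weights[q] for q in errors_at_k)
    return sum(weights[q] * errors_at_k[q] for q in errors_at_k) / total_w

errors = {"max": 0.05, "edge": 0.15}   # hypothetical errors at some resolution k
mean_error = query_quality(errors)
weighted = query_quality(errors, weights={"max": 3.0, "edge": 1.0})
```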

36
How many levels of resolution, k, are available at
time t?

Given:
Ri: total transmitted data rate from level i
clusterheads to level i+1 clusterheads.
Define si: storage allocated to each node for
summaries at resolution i.

[Figure: level i clusterheads transmitting to level i+1 clusterheads]
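
One way to connect these quantities, stated here as an assumption rather than as the slides' formula: if load is spread evenly over the network, the N nodes together hold N·si bytes for resolution i, so data arriving at total rate Ri is retained for about N·si / Ri. The level numbering (0 = finest) and the example numbers are illustrative.

```python
def retention_time(n_nodes, s_i, r_i):
    """Approximate time a resolution is retained, assuming perfect load balance."""
    return n_nodes * s_i / r_i

def finest_available_level(t, n_nodes, storage, rates):
    """Finest level (0 = finest) whose retention still covers data of age t."""
    for level in sorted(storage):
        if t <= retention_time(n_nodes, storage[level], rates[level]):
            return level
    return None

# Hypothetical numbers: storage in KB per node, rates in KB per day, 16 nodes
storage = {0: 100, 1: 50, 2: 50}
rates = {0: 1600, 1: 200, 2: 50}
print([retention_time(16, storage[l], rates[l]) for l in (0, 1, 2)])  # [1.0, 4.0, 16.0]
print(finest_available_level(3, 16, storage, rates))                  # 1
```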
37
Storage Allocation: Constraint-Optimization
problem

Objective: Find si, i = 1..log4 N, that minimize
the worst-case difference between user-desired
and system-provided query quality.
Subject to constraints:
Storage constraint: Each node cannot store more
than its storage limit.
Drill-down constraint: It is not useful to store
finer resolution data if coarser resolutions of
the same data are not present.
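
The two constraints can be checked mechanically for a candidate allocation. The sketch assumes retention time N·si / Ri under even load balancing and numbers levels from 0 (finest) upward; both are assumptions for illustration, as are the example values.

```python
def is_feasible(storage, rates, n_nodes, capacity):
    """Check a candidate per-node allocation against both constraints.

    storage[i] and rates[i] are keyed by level, with 0 as the finest level.
    """
    # Storage constraint: per-node allocations must fit in the node's capacity
    if sum(storage.values()) > capacity:
        return False
    # Drill-down constraint: coarser data must live at least as long as finer data
    levels = sorted(storage)
    ages = [n_nodes * storage[l] / rates[l] for l in levels]
    return all(ages[i] <= ages[i + 1] for i in range(len(ages) - 1))

rates = {0: 1600, 1: 200, 2: 50}
print(is_feasible({0: 100, 1: 50, 2: 50}, rates, n_nodes=16, capacity=200))  # True
print(is_feasible({0: 100, 1: 50, 2: 1}, rates, n_nodes=16, capacity=200))   # False
```

The second allocation fails because the coarsest summaries would expire before the finer ones that a drill-down needs them to reach.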

38
Determining Rate and Drilldown query error
How do we determine communication rates to, say,
bound query error?

Assume: Rates are fixed a priori by communication
constraints.

How do we determine the drill-down query error
when prior information about deployment and data
is limited?
39
Prior information about sampled data
Strategies, ordered from full a priori information
to no a priori information:
Omniscient strategy (baseline): Use all data to
decide the optimal allocation.
Solve constraint optimization.
Training strategy: Can be used when a small
training dataset from the initial deployment is
available.
Greedy strategy: When no data is available, use a
simple weighted allocation to summaries.
[Figure: greedy weights 1 2 4; summary labels Finer, Finest, Coarse]
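
A weighted allocation of this kind is easy to sketch: split each node's storage in proportion to fixed weights. Which weight in "1 2 4" belongs to which resolution is not recoverable from the transcript, so the mapping below is an assumption, as is the 700-unit capacity.

```python
def greedy_allocation(capacity, weights):
    """Split per-node storage across resolutions in proportion to the weights."""
    total = sum(weights.values())
    return {level: capacity * w / total for level, w in weights.items()}

# Assumed mapping of the 1:2:4 weights to resolutions (not confirmed by the slides)
weights = {"finest": 1, "finer": 2, "coarse": 4}
allocation = greedy_allocation(700, weights)
print(allocation)  # {'finest': 100.0, 'finer': 200.0, 'coarse': 400.0}
```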
40
Distributed trace-driven implementation

Linux implementation for iPAQ-class nodes
Uses EmStar (J. Elson et al.), a Linux-based
emulator/simulator for sensor networks.
3D wavelet codec based on freeware by Geoff Davis,
available at http://www.geoffdavis.net.
Query processing in Matlab.
Geo-spatial precipitation dataset
15x12 grid (50km edge) of precipitation data from
1949-1994, from the Pacific Northwest. (Caveat:
not real sensor data.)
System parameters
Compression ratios: 6:12:24:48.
Training set: 6% of total dataset.

M. Widmann and C. Bretherton. 50 km resolution
daily precipitation for the Pacific Northwest,
1949-94.
41
How efficient is search?
Search is very efficient (<5% of network queried)
and accurate for the different queries studied.
42
Comparing Aging strategies
Training performs within 1% of optimal. Careful
selection of parameters for the greedy algorithm
can provide surprisingly good results (within
2-5% of optimal).
43
Other Research Directions
44
Coupling with real world makes spatial
irregularity inevitable

Terrain and deployment practicalities:
deployments will be biased closer to power
sources, GPS access, communication access, etc.
Resource limitations: given a limited number of
nodes, more nodes will be deployed in regions
where greater sensing variability is expected.
There will be built environments where spatial
regularity is feasible (building monitoring), but
outdoor placements will be largely irregular.

Node placement at James Reserve
45
Spatial aggregate: Multi-resolution views using
wavelets

Communicate wavelet-compressed low-resolution
view of data to the user.
Nearest neighbor re-sampling
Efficient energy-wise.
Can introduce artifacts and be generally
ineffective in highly irregular settings.
Interpolated Wavelet Lifting to handle highly
irregular settings

[Figure: error (axis ticks 5 and 50) versus resolution queried, from coarsest to finest, for the regular case and the highly irregular case]
Ganesan et al.: An evaluation of multi-resolution
storage for sensor networks, Sensys 2003
Daubechies, Guskov, Schröder, Sweldens: Wavelets
on irregular point sets
46
Summary

Provide smooth transition from current data
acquisition systems to fully distributed storage
and search systems.
Progressive aging of summaries can be used to
support long-term spatio-temporal queries in
resource-constrained sensor network deployments.
We describe two algorithms: a training-based
algorithm that relies on the availability of
training datasets, and a greedy algorithm that can
be used in the absence of such data.
Our results show that
training performs close to optimal for the
dataset that we study.

47
Optimizing node placement and transmission
structure for data gathering

A user has a bag of n nodes. He/she needs to
place the nodes in a region R such that the
sensed field can be reconstructed with:
maximum distortion at any point in R less
than ?max
average distortion over the entire region less
than ?avg

Problem How does he/she place the nodes and
construct their communication structure for data
gathering to a sink such that the total multi-hop
communication power is minimized?

48
Complexity
49
Complexity at scale Large-scale network
measurement
50
New challenges in sensor networks

Other systems have a combination of different
constraints: massive data storage, distributed
search,
BUT
Not low power.
Insufficient local storage capacity for
high-bandwidth applications; latency/availability
may not be the dominant concern.
Correlated sensor data can be exploited.
the greedy algorithm performs well for a
well-chosen summary weighting parameter.

51
Why is search and storage in sensor networks
different?

Wide-area distributed storage mechanisms
(OceanStore, Chord, CAN) ensure persistent
storage of massive data, and ensure low latency,
high availability, robustness and load-balancing.
BUT
Some important constraints are different:
low power utilization, insufficient local storage
capacity for high-bandwidth applications;
latency/availability may not be the dominant concern.
New aspects of system and data can be exploited:
correlated sensor data (environmental
monitoring), deployment for a limited set of tasks.

52
Data challenges in structural monitoring
Example: Seismic network. Sampling rate: 100Hz.
Event data rate: 100s of KB per day per node.
A. Store data centrally
B. Store data locally
C. Multi-resolution storage
54
Centralized Data Collection
Method: Transmit all data out of network and
store in a centralized database.
Advantage: Centralized, persistent storage and
unconstrained search.
Disadvantage: Power-inefficient, high
latency due to bandwidth constraints.
Example: First-generation multi-hop wireless data
acquisition systems.
55
Local storage with distributed indexing
Method: Store data locally, and construct
distributed index structures to reduce search
cost. DCS (Ratnasamy Hotnets 2000), DIMS (Li
Sensys 2003), DIFS (Greenstein SPNA 2002).
Advantage: Low communication overhead,
efficient search.
Disadvantage: Short-term use
when local storage capacity is limited.
56
Distributed Indexing Explained
What is the maximum precipitation between
Sept-Dec 2002?
Direct query to quadrant that best matches
query
Caveat: Assumes that nodes have sufficient local
storage capacity
59
What is the maximum precipitation between
Sept-Dec 2002?
Direct query to quadrant that best matches
query
raw data
60
Progressive Local Data Storage

Apply progressive coding strategy to local
storage.
Store the data at multiple resolutions locally
and phase out data at different resolutions at
different rates.

How much scaling does this provide?
How to determine the aging periods of different
resolutions?

61
How much scaling can be achieved by progressively
lossy local storage?

Experiments tbd

62
Distributed storage and indexing
Method: Provide gracefully degrading storage and
query quality over time.
Advantage: Long-term
storage in storage-constrained networks,
efficient search.
Disadvantage: More
communication overhead than (B).
63
Storage challenges in seismic monitoring
Method: Transmit all data out of network and
store in a centralized database.
Advantage: Centralized, persistent storage and
unconstrained search. Excellent initial deployment
and debugging tool. CENS deployments are
currently primarily data-gathering based.
Disadvantage: Power-inefficient, high
latency due to bandwidth constraints.
66
Can existing storage and search systems satisfy
design goals?
[Figure: existing systems (WSN, geo-spatial data mining, P2P systems, web caches, centralized data gathering) compared by exploited data correlation, decentralization and storage utilization]
