Title: Scalable data handling in sensor networks
1Scalable data handling in sensor networks
- Deepak Ganesan
- Collaborators: Ben Greenstein, Denis Perelyubskiy, Deborah Estrin (UCLA); John Heidemann, Ramesh Govindan (USC/ISI)
2Outline
- Data challenges in high-bandwidth sensor networks
- Transitioning from data acquisition systems to distributed storage and search
  - Economical wireless data acquisition systems using motes.
  - Long-lived, distributed storage and search systems
- Other notable research directions
3Outline
- Data challenges in high-bandwidth sensor networks
  - Instance: Wireless structural monitoring
- Transitioning from data acquisition systems to distributed storage and search
  - Generation I: Wireless data acquisition systems
    - Goal
    - Proposed solution: Progressive, on-demand data acquisition
    - Performance analysis over structural vibration data.
  - Generation II: Long-lived, query-response systems
    - New scaling challenge: Storage
    - In-network storage and processing techniques
    - Performance analysis
- Other research directions
  - Optimizing node placement and transmission structure for data gathering
  - Complexity at scale: Large-scale network measurement
4Scaling high-bandwidth sensor network deployments
- We have made a good start at building scalable, long-term wireless sensor network deployments that deal with low-bandwidth, low-duty rate applications.
  - Micro-climate monitoring system at James Reserve (CENS-UCLA), bird monitoring at Great Duck Island (Intel-U.C. Berkeley)
  - Low data rate (few samples/minute), medium-scale (100s of nodes) deployments.
  - Duty-cycling, low-power listen/transmit, simple aggregation schemes (TinyDiffusion/TinyDB).
- We have very little understanding of how to scale high-bandwidth sensor network applications (involving vibration/acoustic/image sensors) where significant data rates can be expected.
5Challenges in Wireless Structural Monitoring
- High data rates
  - 100 Hz, 16-bit samples, 15 min shaking events.
- Resource-constrained motes
  - 6 MHz processor, 4 KB RAM, 4 MB flash memory (40 mins of vibration data)
- Diverse user requirements
  - Data collection of interesting event signatures of vibration events.
  - Analysis of data over different time-scales (long-term and short-term patterns)
- State of the art: Expensive wireless data acquisition systems using 802.11
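The data-rate figures above can be turned into a rough per-event size. The sketch below assumes a single sensing channel (the `channels` parameter is hypothetical and not stated on the slide); under that assumption the result is in the same range as the roughly 200 KB per event quoted later in the deck.

```python
# Back-of-the-envelope size of one vibration event on one node.
SAMPLE_RATE_HZ = 100   # from the slide
BITS_PER_SAMPLE = 16   # from the slide
EVENT_MINUTES = 15     # from the slide


def event_bytes(rate_hz, bits_per_sample, minutes, channels=1):
    """Raw size in bytes of one event; `channels` is an assumption."""
    samples = rate_hz * minutes * 60 * channels
    return samples * bits_per_sample // 8


raw = event_bytes(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, EVENT_MINUTES)
print(raw, "bytes per event")                  # 180000
print(round(raw / 1024, 1), "KiB per event")   # 175.8
```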
6Transitioning from centralized to distributed
storage and search.
Multi-hop Wireless Data Acquisition using Motes
Distributed In-network storage and search
Current Data Acquisition Systems
9Building Multi-hop Wireless Data Acquisition
Systems Using Motes
- Scaling challenges
  - Data collection from medium-scale networks of motes incurs high latency.
  - Data collection incurs large energy cost since not all the data may be necessary for the user.
  - Lossless compression (Huffman coding) is insufficient: only 2-fold reduction in data size
15 minutes of vibration event data (200 KB each) from a 20-node multi-hop wireless network takes 4-8 hours to collect centrally!
10Progressive, On-demand Data Collection
- Low-latency, low-resolution data acquisition immediately after vibration activity.
- User can analyze low-resolution data to determine nodes from which higher-resolution data is required.
- Lossless data is collected from a subset of nodes on demand before data is phased out from local stores.
Low-resolution data for 15 minutes of vibration event data can be collected within 15-30 minutes of event occurrence.
11 Wavelets for lossy compression
- Why wavelets?
  - Wavelets preserve spatio-temporal features (edges, discontinuities) while providing good approximation of long-term trends in data
  - Significant work on seismic data compression has obtained good performance with wavelet compression.
- Two questions
  - Is wavelet compression appropriate for structural vibration data?
    - Performance metrics: compression ratio, error (RMS, PSNR)
  - Can we build an efficient implementation on resource-constrained devices (motes)?
- Study performed on structural vibration data from shaker table tests
  - CUREE-Kajima Joint Research Program, UCLA - Thomas Kang, John Wallace
12Feasibility Analysis
- Signal decomposition suggests that the structural vibration data is highly appropriate for wavelet compression.
13Mote implementation of Wavelet Codec
14Progressive Transmission Detection
An event detection module executes a simple
threshold filter to detect vibration events.
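A minimal sketch of such a threshold filter follows. The function name, the `min_quiet` hold-off parameter, and the sample values are illustrative assumptions; the slide does not specify how the end of an event is decided.

```python
def detect_events(samples, threshold, min_quiet=3):
    """Return (start, end) index pairs (end exclusive) where |sample| > threshold.

    An event is closed after `min_quiet` consecutive quiet samples
    (a hypothetical hold-off rule, not taken from the slides).
    """
    events, start, quiet = [], None, 0
    for i, x in enumerate(samples):
        if abs(x) > threshold:
            if start is None:
                start = i          # first loud sample opens an event
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_quiet:
                events.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:          # event still open at end of input
        events.append((start, len(samples) - quiet))
    return events


signal = [0, 1, 0, 9, -12, 1, 8, 0, 0, 0, 0, 11, 0]
print(detect_events(signal, threshold=5))   # [(3, 7), (11, 12)]
```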
15Progressive Transmission Decomposition
A detected event sequence is decomposed using an
efficient integer wavelet transform.
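The slide does not name the wavelet used on the motes. As an illustration only, the sketch below uses the integer Haar (S) transform implemented with lifting, which needs only additions and shifts and is exactly invertible; the function names are hypothetical.

```python
def haar_forward(x):
    """One level of the integer Haar (S) transform; len(x) must be even."""
    approx, detail = [], []
    for a, b in zip(x[0::2], x[1::2]):
        d = b - a              # predict step: difference of the pair
        s = a + (d >> 1)       # update step: floor((a + b) / 2)
        approx.append(s)
        detail.append(d)
    return approx, detail


def haar_inverse(approx, detail):
    """Exact inverse of haar_forward."""
    x = []
    for s, d in zip(approx, detail):
        a = s - (d >> 1)
        x.extend([a, a + d])
    return x


def decompose(x, levels):
    """Multi-level transform; returns (approx, details) with coarsest detail first."""
    details = []
    for _ in range(levels):
        x, d = haar_forward(x)
        details.insert(0, d)
    return x, details


def reconstruct(approx, details):
    for d in details:          # coarsest detail band first
        approx = haar_inverse(approx, d)
    return approx


signal = [3, 7, -2, 4, 10, 10, 5, -9]
approx, details = decompose(signal, levels=2)
print(approx, details)   # [3, 4] [[-4, -12], [4, 6, 0, -14]]
```

Because every step is integer arithmetic, `reconstruct(approx, details)` returns the original samples exactly; loss is introduced only by the later thresholding and quantization stage.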
16Progressive Transmission Thresholding and
Quantization
The decomposed signal is thresholded and
quantized to reduce number of bits per sample.
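A small sketch of this stage, assuming hard thresholding followed by a uniform quantizer. The threshold and step values are made up for illustration; the deck does not give the parameters used on the motes.

```python
def threshold_and_quantize(coeffs, threshold, step):
    """Zero out small coefficients, then map the rest to integer bin indices."""
    out = []
    for c in coeffs:
        if abs(c) < threshold:
            out.append(0)                        # hard threshold
        else:
            q = (abs(c) + step // 2) // step     # round to nearest bin, integer-only
            out.append(q if c >= 0 else -q)
    return out


def dequantize(indices, step):
    """Map bin indices back to approximate coefficient values."""
    return [q * step for q in indices]


coeffs = [4, -1, 0, -14, 2, 9]
indices = threshold_and_quantize(coeffs, threshold=3, step=4)
print(indices)                   # [1, 0, 0, -4, 0, 2]
print(dequantize(indices, 4))    # [4, 0, 0, -16, 0, 8]
```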
17Progressive Transmission Reconstruction
A run-length encoder exploits runs of zeros and
sends a packed bitstream to the base-station
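The zero-run idea can be sketched as below. The (run, value) pair format is an assumption chosen for readability; the bit-packing step mentioned on the slide is not shown.

```python
def rle_zeros(values):
    """Encode as (zero_run_length, nonzero_value) pairs.

    Trailing zeros are emitted as (run, None).
    """
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append((run, None))
    return pairs


def rle_decode(pairs):
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        if v is not None:
            out.append(v)
    return out


quantized = [1, 0, 0, -4, 0, 2, 0, 0, 0]
encoded = rle_zeros(quantized)
print(encoded)   # [(0, 1), (2, -4), (1, 2), (3, None)]
```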
18Compression Ratio and Error for Mote
Implementation
17-fold reduction in data size with an RMS error
of 3.1 (PSNR 30dB)
- Good compression ratios can be achieved with low
error
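For reference, the metrics named here can be computed as in the sketch below. The `peak` value passed to PSNR is an assumption the caller must supply (the deck does not state which peak was used), and the sample numbers are illustrative only.

```python
import math


def rms_error(original, reconstructed):
    """Root-mean-square error between two equal-length sequences."""
    n = len(original)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / n)


def psnr_db(original, reconstructed, peak):
    """Peak signal-to-noise ratio in dB for a caller-supplied peak amplitude."""
    rms = rms_error(original, reconstructed)
    return float("inf") if rms == 0 else 20 * math.log10(peak / rms)


def compression_ratio(raw_bytes, compressed_bytes):
    return raw_bytes / compressed_bytes


orig = [0, 0, 0, 0]
recon = [3, -3, 3, -3]
print(rms_error(orig, recon))           # 3.0
print(psnr_db(orig, recon, peak=30))    # 20.0
```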
19Transitioning towards long-term deployments
- We achieved low-latency wireless data acquisition, but our deployment lifetimes were still short.
  - Data acquisition systems with motes can last for a few weeks.
- How do our system objectives change for a long-term deployment?
  - Need to achieve very low energy usage for long lifetime
    - System focus has to shift from data collection to in-network data storage and search.
  - Need smooth transition for researchers who have depended on data collection systems
    - System should retain ability to collect new event signatures on demand.
- Goal: Build gracefully degrading storage system with efficient drill-down search facility.
20Why is gracefully degrading in-network storage
the right paradigm?
- Support for on-demand data acquisition.
- Intuitive approach for dealing with
21Can existing storage and search systems satisfy
design goals?
22How can we achieve gracefully degrading long-term
storage?
- Exploit spatio-temporal correlation to reduce data.
- Exploit distributed storage capacity of sensor network.
  - Large distributed storage, although limited local storage.
- Store data at multiple resolutions to trade off query quality for storage requirement.
  - Lower-resolution data offers lower query quality but incurs less storage overhead, and vice versa.
- Exploit low cost of drill-down query processing.
  - Allow approximate query processing that obtains sufficiently accurate responses.
23Related Work
- Data storage
  - Event storage: DCS (Ratnasamy Hotnets 2000)
  - Indexing schemes: DIMS (Li Sensys 2003), DIFS (Greenstein SPNA 2002)
- Multi-resolution computation
  - Beyond Average (Hellerstein IPSN 2003)
  - Edge detection (Nowak IPSN 2003)
- Sensor network databases
  - Directed Diffusion (Heidemann, Estrin), TinyDB (Madden), Cougar (Bonnet)
24DIMENSIONS Design Key Ideas
- Construct distributed load-balanced quad-tree hierarchy of lossy wavelet-compressed summaries corresponding to different resolutions and spatio-temporal scales.
- Queries drill down from root of hierarchy to focus search on small portions of the network.
- Progressively age summaries for long-term storage and graceful degradation of query quality over time.
[Figure: hierarchy with Level 0, Level 1 and Level 2, annotated "progressively age" and "progressively lossy".]
25Constructing the hierarchy
Initially, nodes fill up their own storage with
raw sampled data.
26Constructing the hierarchy
- Tessellate the network space into grids, and hash in each to determine location of clusterhead (ref: DCS).
- Send wavelet-compressed local time-series to clusterhead.
27Processing at each level
- Get compressed summaries from children.
- Store incoming summaries locally for future search.
- Decode.
- Re-encode at lower resolution and forward to parent.
[Figure: wavelet encoder/decoder operating on data with axes x, y and time.]
28Constructing the hierarchy
Recursively send data to higher levels of the
hierarchy.
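The recursive construction can be illustrated with a toy spatial example. The sketch below is a stand-in, not the wavelet codec from the talk: it replaces wavelet re-encoding with plain 2x2 averaging, ignores the time dimension, and assumes a square grid whose side is a power of two. All names are hypothetical.

```python
def downsample(grid):
    """Halve resolution by averaging 2x2 blocks (stand-in for wavelet re-encoding)."""
    rows, cols = len(grid), len(grid[0])
    return [
        [(grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]


def build_hierarchy(level0):
    """Return grids from finest (index 0) to coarsest (a single cell)."""
    levels = [level0]
    while len(levels[-1]) > 1:
        levels.append(downsample(levels[-1]))
    return levels


finest = [[0, 1, 2, 3],
          [4, 5, 6, 7],
          [8, 9, 10, 11],
          [12, 13, 14, 15]]
levels = build_hierarchy(finest)
print(levels[1])   # [[2.5, 4.5], [10.5, 12.5]]
print(levels[2])   # [[7.5]]
```

Each coarser level covers four times the area per cell, which mirrors the quad-tree structure: one parent summarizes four children.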
29Distributing storage load
Hash to different locations over time to
distribute load among nodes in the network.
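One way to realize hashing to a clusterhead location that changes over time is sketched below. The key layout, the use of SHA-256, the `epoch` counter, and the rule that cell size doubles per level are all assumptions for illustration; the node geographically closest to the hashed point would then act as clusterhead, in the spirit of DCS.

```python
import hashlib


def clusterhead_location(cell_x, cell_y, level, epoch, base_cell=1.0):
    """Hash (level, cell, epoch) to a point inside the grid cell.

    Changing `epoch` moves the point, spreading storage load over time.
    """
    size = base_cell * (2 ** level)          # assumed: cells double per level
    key = f"{level}:{cell_x}:{cell_y}:{epoch}".encode()
    digest = hashlib.sha256(key).digest()
    fx = int.from_bytes(digest[:4], "big") / 2 ** 32    # fraction in [0, 1)
    fy = int.from_bytes(digest[4:8], "big") / 2 ** 32
    return (cell_x + fx) * size, (cell_y + fy) * size


p0 = clusterhead_location(cell_x=2, cell_y=1, level=1, epoch=0)
p1 = clusterhead_location(cell_x=2, cell_y=1, level=1, epoch=1)
print(p0, p1)   # two different points inside the same level-1 cell
```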
30What happens when storage fills up?
- Eventually, all available storage gets filled, and we have to decide when and how to drop summaries.
- Allocate storage to each resolution and use each allocated storage block as a circular buffer.
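A minimal sketch of per-resolution circular buffers, assuming each allocation is expressed as a number of summary slots (the class and parameter names are hypothetical):

```python
from collections import deque


class ResolutionStore:
    """One fixed-size circular buffer per resolution level."""

    def __init__(self, slots_per_level):
        # slots_per_level: {level: number of summaries that fit in its allocation}
        self.buffers = {lvl: deque(maxlen=n) for lvl, n in slots_per_level.items()}

    def add(self, level, timestamp, summary):
        # When the buffer is full, the oldest summary is dropped automatically.
        self.buffers[level].append((timestamp, summary))

    def oldest(self, level):
        buf = self.buffers[level]
        return buf[0][0] if buf else None


store = ResolutionStore({0: 2, 1: 4})
for t in range(5):
    store.add(0, t, f"fine-{t}")
    store.add(1, t, f"coarse-{t}")
print(store.oldest(0), store.oldest(1))   # 3 1
```

With equal arrival rates, the level given more slots retains older data, which is the lever the storage allocation problem tunes.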
31Tradeoff between Age and Storage requirements for
summary
- Graceful query degradation: Provide more accurate responses to queries on recent data and less accurate responses to queries on older data.
How do we allocate storage at each node to
summaries at different resolutions to provide
gracefully degrading storage and search
capability?
32Match system performance to user requirements
[Figure: query accuracy (axis marks at 95 and 50) versus time, from past to present; the gap between the two curves is labeled "quality difference".]
- Objective: Minimize worst-case difference between user-desired query quality (blue curve) and query quality that the system can provide (red step function).
33What do we know?
- Given
  - N sensor nodes.
  - Each node has storage capacity S.
  - Data is generated at resolution i at rate Ri.
  - Quser: User-desired quality degradation.
- We might be provided
  - A set of typical queries, T, that the user provides.
  - D(q,k): Query error when drill-down for query q terminates at level k.
34Determining Query Quality from multiple queries
- Max query: Find the node which has the maximum precipitation in January.
- Edge query: Find nodes along a boundary between high and low precipitation areas.
[Figure: query error (axis marks at 50 and 5) as drill-down goes from "only coarsest summary is queried" to "all resolutions (from coarsest to finest) are queried".]
We need to translate the performance of different drill-down queries to a single query quality metric.
35Definition Query Quality
- Given
  - T: set of typical queries.
  - D(q,k): Query error when drill-down for query q in set T terminates at resolution k.
- The query quality for queries that refer to data at time t in the past, Qsystem(t), if k is the finest available resolution is
36How many levels of resolution, k are available at
time t ?
- Given
  - Ri: Total transmitted data rate from level i clusterheads to level i+1 clusterheads.
- Define si: Storage allocated to each node for summaries at resolution i.
[Figure: level i clusterheads forwarding to level i+1.]
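Under one plausible reading of these definitions, the retention period of each resolution follows from a fill-rate argument: if load is balanced across N nodes, resolution i has N * si of storage network-wide, filling at rate Ri. The sketch below encodes that assumption (it is not taken verbatim from the slides) and looks up the finest resolution still available for data of a given age. Index 0 is taken to be the finest resolution, and the numbers are illustrative.

```python
def aging_periods(n_nodes, storage_per_node, rates):
    """Assumed retention period per resolution: age_i = N * s_i / R_i."""
    return [n_nodes * s / r for s, r in zip(storage_per_node, rates)]


def finest_available(ages, t):
    """Finest level (lowest index) still holding data that is t time units old.

    Assumes coarser levels are retained at least as long as finer ones.
    """
    for level, age in enumerate(ages):
        if t <= age:
            return level
    return None   # data of this age has been aged out everywhere


# Illustrative numbers only: 64 nodes, 100 KB per level per node, KB/day rates.
ages = aging_periods(64, [100, 100, 100], [6400, 1600, 400])
print(ages)                         # [1.0, 4.0, 16.0]
print(finest_available(ages, 3))    # 1
```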
37Storage Allocation Constraint-Optimization
problem
- Objective: Find si, i = 1..log4(N), that
- Given constraints
  - Storage constraint: Each node cannot store more than its storage limit.
  - Drill-down constraint: It is not useful to store finer-resolution data if coarser resolutions of the same data are not present.
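One way to write the allocation problem, consistent with the objective stated earlier in the deck (minimize the worst-case gap between Quser and Qsystem), is sketched below. The symbol Age_i (how long resolution-i summaries are retained), the horizon T_max, and the convention that larger i means coarser resolution are notational assumptions, not taken from the slides.

```latex
\begin{aligned}
\min_{s_1,\dots,s_{\log_4 N}} \quad
  & \max_{0 \le t \le T_{\max}}
    \bigl( Q_{\mathrm{user}}(t) - Q_{\mathrm{system}}(t) \bigr) \\
\text{subject to} \quad
  & \sum_{i=1}^{\log_4 N} s_i \le S
    && \text{(storage constraint)} \\
  & \mathrm{Age}_{i+1} \ge \mathrm{Age}_i, \quad 1 \le i < \log_4 N
    && \text{(drill-down constraint)}
\end{aligned}
```

The drill-down constraint says coarser summaries must outlive finer ones, so a query that descends the hierarchy never finds a fine summary without its coarser ancestors.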
38Determining Rate and Drilldown query error
How do we determine communication rates to, say,
bound query error?
- Assume: Rates are fixed a priori by communication constraints.
How do we determine the drill-down query error
when prior information about deployment and data
is limited?
39Prior information about sampled data
Strategies, from full a priori information to no a priori information:
- Omniscient strategy (baseline): Use all data to decide optimal allocation (solve constraint optimization).
- Training strategy: Can be used when a small training dataset from the initial deployment is available (solve constraint optimization).
- Greedy strategy: When no data is available, use a simple weighted allocation to summaries (figure labels: weights 1, 2, 4; resolutions Coarse, Finer, Finest).
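A weighted allocation of this kind can be sketched in a few lines. Which resolution receives which weight is not recoverable from the slide, so the weights are simply passed in order; the capacity value is illustrative.

```python
def greedy_allocation(capacity, weights):
    """Split per-node storage among resolutions in proportion to `weights`."""
    total = sum(weights)
    return [capacity * w / total for w in weights]


# Illustrative: 700 units of per-node storage split with weights 1, 2, 4.
print(greedy_allocation(700, [1, 2, 4]))   # [100.0, 200.0, 400.0]
```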
40Distributed trace-driven implementation
- Linux implementation for iPAQ-class nodes
  - Uses EmStar (J. Elson et al.), a Linux-based emulator/simulator for sensor networks.
  - 3D wavelet codec based on freeware by Geoff Davis, available at http://www.geoffdavis.net.
  - Query processing in Matlab.
- Geo-spatial precipitation dataset
  - 15x12 grid (50 km edge) of precipitation data from 1949-1994, from the Pacific Northwest. (Caveat: not real sensor data.)
- System parameters
  - Compression ratio: 6:12:24:48.
  - Training set: 6% of total dataset.
M. Widmann and C. Bretherton. 50 km resolution daily precipitation for the Pacific Northwest, 1949-94.
41How efficient is search?
Search is very efficient (<5% of network queried) and accurate for the different queries studied.
42Comparing Aging strategies
Training performs within 1% of optimal. Careful selection of parameters for the greedy algorithm can provide surprisingly good results (within 2-5% of optimal).
43Other Research Directions
44Coupling with real world makes spatial
irregularity inevitable
- Terrain and deployment practicalities: Deployments will be biased closer to power sources, GPS access, communication access, etc.
- Resource limitations: Given a limited number of nodes, more nodes will be deployed in regions where greater sensing variability is expected.
- There will be built environments where spatial regularity is feasible (building monitoring), but outdoor placements will be largely irregular.
[Figure: Node placement at James Reserve]
45Spatial aggregate Multi-resolution views using
wavelets
- Communicate wavelet-compressed low-resolution view of data to the user.
- Nearest-neighbor re-sampling
  - Efficient energy-wise.
  - Can introduce artifacts and be generally ineffective in highly irregular settings.
- Interpolated wavelet lifting to handle highly irregular settings
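Nearest-neighbor re-sampling onto a regular grid can be sketched as follows. The brute-force search, the cell-center convention, and the sample coordinates are illustrative assumptions.

```python
def nearest_neighbor_resample(points, width, height, cell):
    """Map irregular (x, y, value) readings onto a regular grid.

    Each grid cell takes the value of the reading nearest to its center.
    """
    grid = []
    for r in range(height):
        row = []
        for c in range(width):
            cx, cy = (c + 0.5) * cell, (r + 0.5) * cell
            nearest = min(points, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
            row.append(nearest[2])
        grid.append(row)
    return grid


readings = [(0.2, 0.3, 10), (1.8, 0.1, 20), (1.1, 1.9, 30)]
print(nearest_neighbor_resample(readings, width=2, height=2, cell=1.0))
# [[10, 20], [30, 30]]
```

In the example one reading fills two cells, a small instance of the artifacts that appear when node placement is irregular.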
[Figure: error (axis marks at 50 and 5) versus resolution queried (coarsest to finest), for the regular case and the highly irregular case.]
Ganesan et al. An evaluation of multi-resolution storage for sensor networks. Sensys 2003.
Daubechies, Guskov, Schröder, Sweldens. Wavelets on irregular point sets.
46Summary
- Provide smooth transition from current data acquisition systems to fully distributed storage and search systems.
- Progressive aging of summaries can be used to support long-term spatio-temporal queries in resource-constrained sensor network deployments.
- We describe two algorithms: a training-based algorithm that relies on the availability of training datasets, and a greedy algorithm that can be used in the absence of such data.
- Our results show that
  - training performs close to optimal for the dataset that we study.
47Optimizing node placement and transmission
structure for data gathering
- A user has a bag of n nodes. He/she needs to place the nodes in a region R such that the sensed field can be reconstructed with
  - maximum distortion for any point in R less than ?max
  - average distortion over the entire region less than ?avg
- Problem: How does he/she place the nodes and construct their communication structure for data gathering to a sink such that the total multi-hop communication power is minimized?
48Complexity
49Complexity at scale Large-scale network
measurement
50New challenges in sensor networks
- Other systems have a combination of different constraints: massive data storage, distributed search,
- BUT
  - Not low power,
  - Insufficient capacity local storage for high-bandwidth applications, latency/availability may not be dominant concern.
- Correlated sensor data can be exploited
- the greedy algorithm performs well for a
well-chosen summary weighting parameter.
51Why is search and storage in sensor networks
different?
- Wide-area distributed storage mechanisms
(Oceanstore, Chord, CAN) ensure persistent
storage of massive data, and ensure low-latency,
high availability, robustness and load-balancing.
- BUT
- Some important constraints are different
  - Low power utilization, insufficient capacity local storage for high-bandwidth applications, latency/availability may not be dominant concern.
- New aspects of system and data can be exploited
  - Correlated sensor data (environmental monitoring), deployed for limited set of tasks.
52Data challenges in structural monitoring
Example: Seismic network. Sampling rate: 100 Hz. Event data rate: 100s of KB per day/node.
A. Store data centrally
B. Store data locally
C. Multi-resolution storage
54Centralized Data Collection
Method: Transmit all data out of network and store in a centralized database.
Advantage: Centralized, persistent storage and unconstrained search.
Disadvantage: Power-inefficient, high latency due to bandwidth constraints.
Example: First-generation multi-hop wireless data acquisition systems.
A. Store data centrally
B. Store data locally
C. Multi-resolution storage
55Local storage with distributed indexing
Method: Store data locally, and construct distributed index structures to reduce search cost. DCS (Ratnasamy Hotnets 2000), DIMS (Li Sensys 2003), DIFS (Greenstein SPNA 2002).
Advantage: Low communication overhead, efficient search.
Disadvantage: Short-term use when local storage capacity is limited.
A. Store data centrally
B. Store data locally
C. Multi-resolution storage
56Distributed Indexing Explained
What is the maximum precipitation between
Sept-Dec 2002?
Direct query to quadrant that best matches
query
Caveat: Assumes that nodes have sufficient local storage capacity.
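Directing a query to the best-matching quadrant can be sketched as a greedy descent over a quad-tree of summaries. The `Node` class, the `summary_max` field, and the sample values are hypothetical; because coarse summaries are approximate, a greedy descent like this returns an approximate answer rather than a guaranteed maximum.

```python
class Node:
    def __init__(self, summary_max, children=()):
        self.summary_max = summary_max    # (approximate) maximum in this region
        self.children = list(children)    # up to four quadrants


def drill_down_max(root):
    """Follow the quadrant whose summary reports the largest value.

    Returns (value at the finest node reached, number of nodes visited).
    """
    node, visited = root, 1
    while node.children:
        node = max(node.children, key=lambda ch: ch.summary_max)
        visited += 1
    return node.summary_max, visited


quad_a = Node(8.5, [Node(3), Node(9), Node(4), Node(1)])
quad_b = Node(6.0, [Node(6), Node(2), Node(5), Node(0)])
root = Node(8.0, [quad_a, quad_b, Node(2.0), Node(1.5)])
print(drill_down_max(root))   # (9, 3)
```

Only one node per level is visited, which is why drill-down queries touch a small fraction of the network.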
59What is the maximum precipitation between
Sept-Dec 2002?
Direct query to quadrant that best matches
query
raw data
60Progressive Local Data Storage
- Apply progressive coding strategy to local storage.
  - Store the data at multiple resolutions locally and phase out data at different resolutions at different rates.
- How much scaling does this provide?
- How to determine the aging periods of different
resolutions?
61How much scaling can be achieved by progressively
lossy local storage?
62Distributed storage and indexing
Method: Provide gracefully degrading storage and query quality over time.
Advantage: Long-term storage in storage-constrained networks, efficient search.
Disadvantage: More communication overhead than (B).
A. Store data centrally
B. Store data locally
C. Multi-resolution storage
63Storage challenges in seismic monitoring
Method: Transmit all data out of network and store in a centralized database.
Advantage: Centralized, persistent storage and unconstrained search. Excellent initial deployment and debugging tool. CENS deployments are currently primarily data-gathering based.
Disadvantage: Power-inefficient, high latency due to bandwidth constraints.
A. Store data centrally
B. Store data locally
C. Multi-resolution storage
64Storage challenges in seismic monitoring
Method: Store data locally, and construct distributed index structures to reduce search cost. DCS (Ratnasamy Hotnets 2000), DIMS (Li Sensys 2003), DIFS (Greenstein SPNA 2002).
Advantage: Low communication overhead, efficient search.
Disadvantage: Short-term use when local storage capacity is limited.
A. Store data centrally
B. Store data locally
C. Multi-resolution storage
65Storage challenges in seismic monitoring
Method: Provide gracefully degrading storage and query quality over time.
Advantage: Long-term storage in storage-constrained networks, efficient search.
Disadvantage: More communication overhead than (B).
A. Store data centrally
B. Store data locally
C. Multi-resolution storage
66Can existing storage and search systems satisfy
design goals?
[Figure: design space with axes "exploited data correlation", "decentralization" and "storage utilization", locating WSN, geo-spatial data mining, P2P systems, web caches and centralized data gathering.]