1
Scalable data handling in sensor networks
  • Deepak Ganesan
  • Collaborators: Ben Greenstein, Denis
    Perelyubskiy, Deborah Estrin (UCLA), John
    Heidemann, Ramesh Govindan (USC/ISI)

2
Outline
  • Data challenges in high-bandwidth sensor networks
  • Transitioning from data acquisition systems to
    distributed storage and search
  • Economical wireless data acquisition systems
    using motes.
  • Long-lived, distributed storage and search
    systems
  • Other notable research directions

3
Outline
  • Data challenges in high-bandwidth sensor networks
  • Instance: Wireless structural monitoring
  • Transitioning from data acquisition systems to
    distributed storage and search
  • Generation I: Wireless data acquisition systems
  • Goal
  • Proposed solution: Progressive, on-demand data
    acquisition
  • Performance analysis over structural vibration
    data
  • Generation II: Long-lived, query-response systems
  • New scaling challenge: Storage
  • In-network storage and processing techniques
  • Performance analysis
  • Other research directions
  • Optimizing node placement and transmission
    structure for data gathering
  • Complexity at scale: Large-scale network
    measurement

4
Scaling high-bandwidth sensor network deployments
  • We have made a good start at building scalable,
    long-term wireless sensor network deployments
    for low-bandwidth, low-duty-cycle applications.
  • Micro-climate monitoring system at James Reserve
    (CENS-UCLA), bird monitoring at Great Duck Island
    (Intel-U.C. Berkeley)
  • Low data rate (a few samples/minute), medium-scale
    (100s of nodes) deployments.
  • Duty cycling with low-power listen/transmit, simple
    aggregation schemes (TinyDiffusion/TinyDB).
  • We have very little understanding of how to scale
    high-bandwidth sensor network applications
    (involving vibration/acoustic/image sensors)
    where significant data rates can be expected.

5
Challenges in Wireless Structural Monitoring
  • High Data Rates
  • 100 Hz sampling, 16-bit samples, 15-minute shaking
    events.
  • Resource-constrained motes
  • 6 MHz processor, 4 KB RAM, 4 MB flash memory (40
    minutes of vibration data)
  • Diverse user requirements
  • Collection of interesting vibration event
    signatures.
  • Analysis of data over different time-scales
    (long-term and short-term patterns)
  • State of the art: expensive wireless data
    acquisition systems using 802.11
6
Transitioning from centralized to distributed
storage and search.
Multi-hop Wireless Data Acquisition using Motes
Distributed In-network storage and search
Current Data Acquisition Systems
9
Building Multi-hop Wireless Data Acquisition
Systems Using Motes
  • Scaling challenges
  • Data collection from medium-scale networks of
    motes incurs high latency.
  • Data collection incurs a large energy cost, even
    though not all the data may be necessary for the user.
  • Lossless compression (Huffman coding) is
    insufficient: only a 2-fold reduction in data size.

15 minutes of vibration event data (200 KB each)
from a 20-node multi-hop wireless network takes
4-8 hours to collect centrally!
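
A rough check of the figures quoted on this slide, offered only as a sketch. It assumes one 16-bit channel per node sampled at 100 Hz (the slides do not state the channel count) and reads "200 KB" as 200 × 1024 bytes.

```python
# Back-of-the-envelope arithmetic for the collection latency quoted above.
# Assumption (not stated on the slides): one 16-bit channel per node.
SAMPLE_RATE_HZ = 100
BYTES_PER_SAMPLE = 2
EVENT_SECONDS = 15 * 60

# Raw size of one 15-minute event on one channel: 180,000 bytes,
# the same order as the quoted 200 KB per node.
raw_event_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * EVENT_SECONDS

NODES = 20
EVENT_BYTES_QUOTED = 200 * 1024  # "200 KB each", read here as KiB
total_bytes = NODES * EVENT_BYTES_QUOTED


def effective_throughput(num_bytes: int, hours: float) -> float:
    """Network-wide goodput in bytes per second for a given collection time."""
    return num_bytes / (hours * 3600)


fast = effective_throughput(total_bytes, 4)  # roughly 284 bytes/s
slow = effective_throughput(total_bytes, 8)  # roughly 142 bytes/s
```

Under these assumptions the whole network delivers only a few hundred bytes per second to the base station, which is why shipping every raw sample is impractical.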
10
Progressive, On-demand Data Collection
  • Low-latency low-resolution data acquisition
    immediately after vibration activity.
  • User can analyze low-resolution data to determine
    nodes from which higher resolution data is
    required.
  • Lossless data is collected from a subset of nodes
    on-demand before data is phased out from local
    stores

Low-resolution data for 15 minutes of vibration
event data can be collected within 15-30 minutes
of event occurrence
11
Wavelets for lossy compression
  • Why Wavelets?
  • Wavelets preserve spatio-temporal features
    (edges, discontinuities) while providing good
    approximation of long-term trends in data
  • Significant work on seismic data compression has
    obtained good performance with wavelet
    compression.
  • Two Questions
  • Is wavelet compression appropriate for structural
    vibration data?
  • Performance metrics: compression ratio, error
    (RMS, PSNR)
  • Can we build an efficient implementation on
    resource-constrained devices (motes)?
  • Study performed on structural vibration data from
    shaker table tests
  • CUREE-Kajima Joint Research Program, UCLA -
    Thomas Kang, John Wallace
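
The metrics named above can be computed as in the following minimal sketch. The function names are illustrative, and the PSNR peak is assumed here to be the largest absolute sample of the original signal; the slides do not say which peak convention was used.

```python
import math


def rms_error(original, reconstructed):
    """Root-mean-square difference between two equal-length sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    return math.sqrt(total / len(original))


def psnr_db(original, reconstructed):
    """Peak signal-to-noise ratio in dB.

    Assumption: the peak is the largest absolute value in the original.
    """
    err = rms_error(original, reconstructed)
    if err == 0:
        return math.inf
    peak = max(abs(x) for x in original)
    return 20 * math.log10(peak / err)


def compression_ratio(raw_bytes, compressed_bytes):
    """How many times smaller the compressed data is than the raw data."""
    return raw_bytes / compressed_bytes
```

For example, `psnr_db([100, -100], [99, -99])` has an RMS error of 1 against a peak of 100, which gives 40 dB.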

12
Feasibility Analysis
  • Signal decomposition suggests that structural
    vibration data is well suited to wavelet compression.

13
Mote implementation of Wavelet Codec
14
Progressive Transmission: Detection
An event detection module executes a simple
threshold filter to detect vibration events.
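
A simple threshold filter of this kind might look like the sketch below. The `min_quiet` hangover parameter is a hypothetical addition (not from the slides) that keeps one event from being split by brief dips below the threshold.

```python
def detect_events(samples, threshold, min_quiet=3):
    """Return (start, end) index pairs, end exclusive, for detected events.

    A sample is "loud" when its magnitude reaches the threshold. An event
    ends once `min_quiet` consecutive quiet samples have been seen
    (min_quiet is an illustrative parameter, not from the slides).
    """
    events = []
    start = None
    quiet = 0
    for i, x in enumerate(samples):
        if abs(x) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_quiet:
                # The event ended just after the last loud sample.
                events.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # The signal ended while an event was still open.
        events.append((start, len(samples) - quiet))
    return events
```

With `samples = [0, 0, 5, 6, 0, 7, 0, 0, 0, 0, 9, 0]` and `threshold=4`, the single quiet sample at index 4 does not split the first event.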
15
Progressive Transmission: Decomposition
A detected event sequence is decomposed using an
efficient integer wavelet transform.
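
The slides do not say which integer wavelet is used. As an illustration only, the sketch below uses the integer Haar transform (the S-transform) implemented by lifting, which needs only additions and shifts and is exactly invertible.

```python
def haar_forward(signal):
    """One level of the integer Haar (S-transform) lifting step."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx, detail = [], []
    for i in range(0, len(signal), 2):
        a, b = signal[i], signal[i + 1]
        d = b - a            # detail: difference of the pair
        s = a + (d >> 1)     # approximation: floor of the pair's mean
        approx.append(s)
        detail.append(d)
    return approx, detail


def haar_inverse(approx, detail):
    """Exactly undo haar_forward."""
    out = []
    for s, d in zip(approx, detail):
        a = s - (d >> 1)
        out.extend((a, a + d))
    return out


def decompose(signal, levels):
    """Apply haar_forward repeatedly to the approximation band."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_forward(approx)
        details.append(d)
    return approx, details


def reconstruct(approx, details):
    """Invert decompose, coarsest level first."""
    for d in reversed(details):
        approx = haar_inverse(approx, d)
    return approx
```

Because every step stays in integers, a two-level decomposition of `[3, 7, -2, 5, 10, 10, 0, -8]` reconstructs to the original sequence exactly.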
16
Progressive Transmission: Thresholding and
Quantization
The decomposed signal is thresholded and
quantized to reduce the number of bits per sample.
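
One way to picture this step is a hard threshold followed by a uniform quantizer, sketched below with integer arithmetic only. The `threshold` and `step` values are illustrative; the slides do not give the actual settings.

```python
def threshold_and_quantize(coeffs, threshold, step):
    """Zero out small coefficients, then map the rest to integer bin indices."""
    if step <= 0:
        raise ValueError("step must be positive")
    out = []
    for c in coeffs:
        if abs(c) < threshold:
            out.append(0)
        else:
            # Round the magnitude to the nearest multiple of step, keep the sign.
            q = (abs(c) + step // 2) // step
            out.append(q if c >= 0 else -q)
    return out


def dequantize(indices, step):
    """Approximate the original coefficients from their bin indices."""
    return [q * step for q in indices]
```

For instance, `threshold_and_quantize([1, -2, 9, -17, 40], threshold=3, step=8)` yields `[0, 0, 1, -2, 5]`: small coefficients become zeros and the rest need far fewer bits than the 16-bit originals.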
17
Progressive Transmission: Reconstruction
A run-length encoder exploits runs of zeros and
sends a packed bitstream to the base-station
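
A zero-run encoder can be sketched as follows. It emits (zero-run length, value) pairs rather than a packed bitstream, so it illustrates only the run-length idea, not the mote codec's actual wire format.

```python
def rle_encode(values):
    """Encode as (zero_run_length, nonzero_value) pairs plus a trailing zero count."""
    pairs = []
    run = 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs, run


def rle_decode(pairs, trailing_zeros):
    """Rebuild the original sequence from rle_encode's output."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        out.append(v)
    out.extend([0] * trailing_zeros)
    return out
```

After thresholding, most wavelet coefficients are zero, so long runs collapse into single pairs.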
18
Compression Ratio and Error for Mote
Implementation
17-fold reduction in data size with an RMS error
of 3.1 (PSNR 30dB)
  • Good compression ratios can be achieved with low
    error

19
Transitioning towards long-term deployments
  • We achieved low-latency wireless data
    acquisition, but our deployment lifetimes
    were still short.
  • Data acquisition systems with motes can last for
    a few weeks.
  • How do our system objectives change for a
    long-term deployment?
  • Need to achieve very low energy usage for long
    lifetime.
  • System focus has to shift from data collection to
    in-network data storage and search.
  • Need a smooth transition for researchers who have
    depended on data collection systems.
  • System should retain the ability to collect new
    event signatures on demand.
  • Goal: Build a gracefully degrading storage system
    with an efficient drill-down search facility.

20
Why is gracefully degrading in-network storage
the right paradigm?
  • Support for on-demand data acquisition.
  • Intuitive approach for dealing with

21
Can existing storage and search systems satisfy
design goals?
22
How can we achieve gracefully degrading long-term
storage?
  • Exploit spatio-temporal correlation to reduce
    data.
  • Exploit distributed storage capacity of sensor
    network.
  • Large distributed storage, although limited local
    storage.
  • Store data at multiple resolutions to trade off
    query quality against storage requirement.
  • Lower-resolution data offers lower query quality
    but incurs less storage overhead, and vice versa.
  • Exploit low cost of drill-down query processing.
  • Allow approximate query processing that obtains
    sufficiently accurate responses.

23
Related Work
  • Data Storage
  • Event storage: DCS (Ratnasamy, Hotnets 2000)
  • Indexing schemes: DIMS (Li, Sensys 2003), DIFS
    (Greenstein, SPNA 2002)
  • Multi-resolution computation
  • Beyond Average (Hellerstein, IPSN 2003)
  • Edge detection (Nowak, IPSN 2003)
  • Sensor network databases
  • Directed Diffusion (Heidemann, Estrin), TinyDB
    (Madden), Cougar (Bonnet)

24
DIMENSIONS Design: Key Ideas
  • Construct distributed load-balanced quad-tree
    hierarchy of lossy wavelet-compressed summaries
    corresponding to different resolutions and
    spatio-temporal scales.
  • Queries drill down from the root of the hierarchy
    to focus search on small portions of the network.
  • Progressively age summaries for long-term storage
    and graceful degradation of query quality over
    time.

[Figure: hierarchy levels 0, 1, and 2, annotated "progressively age" and "progressively lossy"]
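
The drill-down idea can be illustrated with a toy quad-tree. This sketch makes two simplifying assumptions that are not from the slides: the field is a square grid whose side is a power of two, and each summary is the exact maximum of its region rather than a lossy wavelet summary.

```python
def build_max_tree(grid):
    """levels[0] is the grid itself; each later level halves the side length
    and stores the maximum of every 2x2 block of the level below."""
    levels = [grid]
    while len(levels[-1]) > 1:
        fine = levels[-1]
        n = len(fine) // 2
        coarse = [[max(fine[2 * r][2 * c], fine[2 * r][2 * c + 1],
                       fine[2 * r + 1][2 * c], fine[2 * r + 1][2 * c + 1])
                   for c in range(n)] for r in range(n)]
        levels.append(coarse)
    return levels


def drill_down_max(levels):
    """Follow the largest summary from the root down to one finest-level cell.

    Returns (row, col, cells_read), where cells_read counts every summary
    cell inspected along the way.
    """
    r = c = 0
    cells_read = 1  # the single root summary
    for level in range(len(levels) - 2, -1, -1):
        children = [(2 * r + dr, 2 * c + dc) for dr in (0, 1) for dc in (0, 1)]
        cells_read += len(children)
        r, c = max(children, key=lambda rc: levels[level][rc[0]][rc[1]])
    return r, c, cells_read
```

On a 4x4 grid the query inspects 9 summary cells instead of reading all 16 raw cells, and the gap widens as the grid grows because only one branch is followed per level.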
25
Constructing the hierarchy
Initially, nodes fill up their own storage with
raw sampled data.
26
Constructing the hierarchy
  • Tessellate the network space into grids, and hash
    in each to determine the location of the clusterhead
    (ref: DCS).
  • Send wavelet-compressed local time-series to
    clusterhead.
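
A minimal sketch of the grid-and-hash step follows. The choice of SHA-256, the key format, and the rule that the node nearest the hashed point acts as clusterhead are all assumptions made for illustration.

```python
import hashlib


def cell_index(x, y, cell_edge):
    """Grid cell (col, row) that contains the point (x, y)."""
    return int(x // cell_edge), int(y // cell_edge)


def clusterhead_location(col, row, level, cell_edge):
    """Hash (level, col, row) to a deterministic point inside the cell."""
    digest = hashlib.sha256(f"{level}:{col}:{row}".encode()).digest()
    fx = int.from_bytes(digest[:4], "big") / 2**32   # fraction in [0, 1)
    fy = int.from_bytes(digest[4:8], "big") / 2**32
    return (col + fx) * cell_edge, (row + fy) * cell_edge


def nearest_node(point, nodes):
    """Pick the node closest to the hashed point (assumed clusterhead rule)."""
    return min(nodes, key=lambda n: (n[0] - point[0]) ** 2 + (n[1] - point[1]) ** 2)
```

Every node in a cell computes the same hashed point without any coordination, so all of them agree on where to send their summaries.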

27
Processing at each level
  • Get compressed summaries from children.
  • Store incoming summaries locally for future
    search.
  • Decode.
  • Re-encode at lower resolution and forward to
    parent.

[Figure: wavelet encoder/decoder applied to summaries over the x, y, and time axes]
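
The per-level processing can be sketched as below. To stay short, this stand-in replaces the wavelet decode and re-encode with plain 2x2 averaging, ignores the time axis, and treats each child summary as a small square grid; none of these simplifications comes from the slides.

```python
def stitch(nw, ne, sw, se):
    """Place four equally sized child grids into one grid twice as wide."""
    top = [left + right for left, right in zip(nw, ne)]
    bottom = [left + right for left, right in zip(sw, se)]
    return top + bottom


def downsample(grid):
    """Halve the resolution by averaging each 2x2 block."""
    n = len(grid) // 2
    return [[(grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
              + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]) / 4
             for c in range(n)] for r in range(n)]


def process_at_clusterhead(children, store):
    """Store the incoming summaries, then return a coarser one for the parent.

    children: (nw, ne, sw, se) grids; store: list acting as local storage.
    """
    store.append(children)           # kept locally for later drill-down
    merged = stitch(*children)       # stands in for decoding
    return downsample(merged)        # stands in for re-encoding coarser
```

The summary forwarded upward covers four times the area at the same size, which is what keeps the data volume bounded at every level of the hierarchy.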
28
Constructing the hierarchy
Recursively send data to higher levels of the
hierarchy.
29
Distributing storage load
Hash to different locations over time to
distribute load among nodes in the network.
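
Rotating the clusterhead over time can be sketched by adding a time epoch to the hash key. For brevity this illustration hashes directly to a node ID rather than to a location, and the key format is an assumption.

```python
import hashlib
from collections import Counter


def clusterhead_for(cell_id, epoch, node_ids):
    """Deterministically pick a clusterhead for a cell in a given epoch."""
    digest = hashlib.sha256(f"{cell_id}:{epoch}".encode()).digest()
    return node_ids[int.from_bytes(digest[:8], "big") % len(node_ids)]


nodes = list(range(16))
# Count how often each node serves as clusterhead over many epochs.
load = Counter(clusterhead_for("cell-0", epoch, nodes) for epoch in range(1600))
```

Because the hash output is close to uniform, the storage duty spreads over all the nodes in the cell instead of exhausting one node's flash.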
30
What happens when storage fills up?
  • Eventually, all available storage gets filled,
    and we have to decide when and how to drop
    summaries.
  • Allocate storage to each resolution and use each
    allocated storage block as a circular buffer.
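
A per-resolution circular buffer can be sketched with `collections.deque`, which drops the oldest entry when a bounded buffer is full. The class name and the slot-based allocation are illustrative.

```python
from collections import deque


class ResolutionStore:
    """One bounded circular buffer of summaries per resolution."""

    def __init__(self, allocation):
        # allocation maps a resolution name to its number of summary slots.
        self.buffers = {res: deque(maxlen=slots)
                        for res, slots in allocation.items()}

    def add(self, resolution, timestamp, summary):
        # When the buffer is full, the oldest summary is dropped automatically.
        self.buffers[resolution].append((timestamp, summary))

    def oldest(self, resolution):
        """Timestamp of the oldest summary still held, or None if empty."""
        buf = self.buffers[resolution]
        return buf[0][0] if buf else None
```

Giving the coarse resolution more slots than the fine one means coarse summaries survive longer, which is the aging behavior described above.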

31
Tradeoff between Age and Storage requirements for
summary
  • Graceful query degradation: Provide more accurate
    responses to queries on recent data and less
    accurate responses to queries on older data.

How do we allocate storage at each node to
summaries at different resolutions to provide
gracefully degrading storage and search
capability?
32
Match system performance to user requirements
[Figure: query accuracy (axis ticks 50 and 95) versus time, from past to present, with the quality difference between the two curves marked]
  • Objective: Minimize the worst-case difference between
    user-desired query quality (blue curve) and the query
    quality that the system can provide (red step
    function).

33
What do we know?
  • Given
  • N sensor nodes.
  • Each node has storage capacity S.
  • Data is generated at resolution i at rate R_i.
  • Q_user: user-desired quality degradation.
  • We might be provided
  • A set of typical queries, T, that the user
    provides.
  • D(q,k): query error when drill-down for query q
    terminates at level k.

34
Determining Query Quality from multiple queries
Max query: Find the node which has the maximum
precipitation in January.
Edge query: Find nodes along a boundary between
high and low precipitation areas.
[Figure: query error (axis ticks 5 and 50) versus resolutions queried, from only the coarsest summary to all resolutions (coarsest to finest)]
We need to translate the performance of different
drill-down queries into a single query quality
metric.
35
Definition: Query Quality
  • Given
  • T: set of typical queries.
  • D(q,k): query error when drill-down for query q
    in set T terminates at resolution k.
  • The query quality for queries that refer to data
    at time t in the past, Q_system(t), if k is the
    finest available resolution, is
    [Equation: Q_system(t) in terms of D(q,k) for q in T]

36
How many levels of resolution, k, are available at
time t?
  • Given
  • R_i: total transmitted data rate from level i
    clusterheads to level i+1 clusterheads.
  • Define s_i: storage allocated to each node for
    summaries at resolution i.

[Figure: level i clusterheads transmitting to level i+1 clusterheads]
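
One way to reason about this question, offered as a sketch rather than the authors' formula: if summaries at resolution i are spread evenly over all N nodes, each node receives R_i / N per unit time, so an allocation of s_i lasts about N * s_i / R_i. The even-spread assumption and both function names are illustrative.

```python
def aging_period(storage_per_node, total_rate, num_nodes):
    """Time a resolution's summaries survive before being overwritten.

    Assumes the total rate is spread evenly over all nodes.
    """
    per_node_rate = total_rate / num_nodes
    return storage_per_node / per_node_rate


def finest_available(age, periods):
    """Index of the finest resolution still stored for data of a given age.

    periods lists aging periods from the coarsest resolution (index 0) to
    the finest. Returns None if even the coarsest summary has been aged out.
    """
    finest = None
    for index, period in enumerate(periods):
        if age <= period:
            finest = index
        else:
            break  # finer data is not useful once a coarser level is gone
    return finest
```

With aging periods of 400, 100, and 25 days from coarsest to finest, data that is 50 days old can be answered only down to the middle resolution.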
37
Storage Allocation: Constraint-Optimization
problem
  • Objective: Find s_i, i = 1..log_4 N, that
    minimize the worst-case difference between
    user-desired and system-provided query quality.
  • Given constraints
  • Storage constraint: No node can store more
    than its storage limit.
  • Drill-down constraint: It is not useful to store
    finer-resolution data if coarser resolutions of
    the same data are not present.

38
Determining Rate and Drilldown query error
How do we determine communication rates to, say,
bound query error?
  • Assume: Rates are fixed a priori by communication
    constraints.

How do we determine the drill-down query error
when prior information about deployment and data
is limited?
39
Prior information about sampled data
  • Full a priori information: Omniscient strategy
    (baseline). Use all data to decide the optimal
    allocation.
  • Training strategy: can be used when a small
    training dataset from the initial deployment is
    available.
  • No a priori information: Greedy strategy. When no
    data is available, use a simple weighted
    allocation to summaries.

[Figure: strategies ordered from full to no a priori information, with a "Solve constraint optimization" step; example weights 1, 2, 4; summary labels Coarse, Finer, Finest]
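
A simple weighted allocation might look like the sketch below. Which resolution receives which of the weights 1, 2, and 4 is not recoverable from the transcript, so the mapping in the usage note is purely an assumption.

```python
def greedy_allocation(total_storage, weights):
    """Split a node's storage among resolutions in proportion to fixed weights.

    weights maps a resolution name to a positive integer weight. Integer
    division is used, so a few units of storage may be left unallocated.
    """
    total_weight = sum(weights.values())
    return {res: total_storage * w // total_weight
            for res, w in weights.items()}
```

For example, `greedy_allocation(7000, {"coarse": 1, "finer": 2, "finest": 4})` gives 1000, 2000, and 4000 units respectively under that assumed mapping.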
40
Distributed trace-driven implementation
  • Linux implementation for iPAQ-class nodes
  • Uses EmStar (J. Elson et al.), a Linux-based
    emulator/simulator for sensor networks.
  • 3D wavelet codec based on freeware by Geoff Davis,
    available at http://www.geoffdavis.net.
  • Query processing in Matlab.
  • Geo-spatial precipitation dataset
  • 15x12 grid (50 km edge) of precipitation data from
    1949-1994, from the Pacific Northwest. (Caveat: not
    real sensor data.)
  • System parameters
  • Compression ratios: 6:12:24:48.
  • Training set: 6% of total dataset.

M. Widmann and C. Bretherton. 50 km resolution
daily precipitation for the Pacific Northwest,
1949-94.
41
How efficient is search?
Search is very efficient (<5% of network queried)
and accurate for the different queries studied.
42
Comparing Aging strategies
Training performs within 1% of optimal. Careful
selection of parameters for the greedy algorithm
can provide surprisingly good results (within
2-5% of optimal).
43
Other Research Directions
44
Coupling with real world makes spatial
irregularity inevitable
  • Terrain and deployment practicalities:
    Deployments will be biased closer to power
    sources, GPS access, communication access, etc.
  • Resource limitations: Given a limited number of
    nodes, more nodes will be deployed in regions
    where greater sensing variability is expected.
  • There will be built environments where spatial
    regularity is feasible (building monitoring), but
    outdoor placements will be largely irregular.

[Figure: node placement at James Reserve]
45
Spatial aggregate: Multi-resolution views using
wavelets
  • Communicate wavelet-compressed low-resolution
    view of data to the user.
  • Nearest-neighbor re-sampling
  • Efficient energy-wise.
  • Can introduce artifacts and be generally
    ineffective in highly irregular settings.
  • Interpolated wavelet lifting to handle highly
    irregular settings

[Figure: error (axis ticks 5 and 50) versus resolution queried, coarsest to finest, for the regular case and the highly irregular case]
Ganesan et al., An evaluation of multi-resolution
storage for sensor networks, SenSys 2003
Daubechies, Guskov, Schröder, Sweldens, Wavelets
on irregular point sets
46
Summary
  • Provide smooth transition from current data
    acquisition systems to fully distributed storage
    and search systems.
  • Progressive aging of summaries can be used to
    support long-term spatio-temporal queries in
    resource-constrained sensor network deployments.
  • We describe two algorithms: a training-based
    algorithm that relies on the availability of
    training datasets, and a greedy algorithm that can
    be used in the absence of such data.
  • Our results show that
  • training performs close to optimal for the
    dataset that we study.

47
Optimizing node placement and transmission
structure for data gathering
  • A user has a bag of n nodes and needs to
    place them in a region R such that the
    sensed field can be reconstructed with
  • maximum distortion for any point in R less
    than ?max
  • average distortion over the entire region less
    than ?avg
  • Problem: How should the user place the nodes and
    construct their communication structure for data
    gathering to a sink such that the total multi-hop
    communication power is minimized?

48
Complexity
49
Complexity at scale Large-scale network
measurement
50
New challenges in sensor networks
  • Other systems have a combination of different
    constraints: massive data storage, distributed
    search,
  • BUT
  • Not low power.
  • Insufficient local storage capacity for
    high-bandwidth applications; latency/availability
    may not be the dominant concern.
  • Correlated sensor data can be exploited.
  • The greedy algorithm performs well for a
    well-chosen summary weighting parameter.

51
Why is search and storage in sensor networks
different?
  • Wide-area distributed storage mechanisms
    (Oceanstore, Chord, CAN) ensure persistent
    storage of massive data, and ensure low-latency,
    high availability, robustness and load-balancing.
  • BUT
  • Some important constraints are different
  • Low power utilization, Insufficient capacity
    local storage for high-bandwidth applications,
    latency/availability may not be dominant concern.
  • New aspects of system and data can be exploited
  • Correlated sensor data (environmental
    monitoring), Deployed for limited set of tasks.

52
Data challenges in structural monitoring
Example: Seismic network
Sampling rate: 100 Hz
Event data rate: 100s of KB per day per node
A. Store data centrally
B. Store data locally
C. Multi-resolution storage
54
Centralized Data Collection
Method: Transmit all data out of the network and
store it in a centralized database.
Advantage: Centralized, persistent storage and
unconstrained search.
Disadvantage: Power-inefficient, high
latency due to bandwidth constraints.
Example: First-generation multi-hop wireless data
acquisition systems.
A. Store data centrally
B. Store data locally
C. Multi-resolution storage
55
Local storage with distributed indexing
Method: Store data locally, and construct
distributed index structures to reduce search
cost. DCS (Ratnasamy, Hotnets 2000), DIMS (Li,
Sensys 2003), DIFS (Greenstein, SPNA 2002).
Advantage: Low communication overhead,
efficient search.
Disadvantage: Short-term use only
when local storage capacity is limited.
A. Store data centrally
B. Store data locally
C. Multi-resolution storage
56
Distributed Indexing Explained
What is the maximum precipitation between
Sept-Dec 2002?
Direct query to quadrant that best matches
query
Caveat: Assumes that nodes have sufficient local
storage capacity.
59
What is the maximum precipitation between
Sept-Dec 2002?
Direct query to quadrant that best matches
query
raw data
60
Progressive Local Data Storage
  • Apply progressive coding strategy to local
    storage.
  • Store the data at multiple resolutions locally
    and phase out data at different resolutions at
    different rates.
  • How much scaling does this provide?
  • How to determine the aging periods of different
    resolutions?

61
How much scaling can be achieved by progressively
lossy local storage?
  • Experiments tbd

62
Distributed storage and indexing
Method: Provide gracefully degrading storage and
query quality over time.
Advantage: Long-term storage in storage-constrained
networks, efficient search.
Disadvantage: More communication overhead than (B).
A. Store data centrally
B. Store data locally
C. Multi-resolution storage
63
Storage challenges in seismic monitoring
Method: Transmit all data out of the network and
store it in a centralized database.
Advantage: Centralized, persistent storage and
unconstrained search. Excellent initial deployment
and debugging tool. CENS deployments are
currently primarily data-gathering based.
Disadvantage: Power-inefficient, high
latency due to bandwidth constraints.
A. Store data centrally
B. Store data locally
C. Multi-resolution storage
64
Storage challenges in seismic monitoring
Method: Store data locally, and construct
distributed index structures to reduce search
cost. DCS (Ratnasamy, Hotnets 2000), DIMS (Li,
Sensys 2003), DIFS (Greenstein, SPNA 2002).
Advantage: Low communication overhead,
efficient search.
Disadvantage: Short-term use only
when local storage capacity is limited.
A. Store data centrally
B. Store data locally
C. Multi-resolution storage
65
Storage challenges in seismic monitoring
Method: Provide gracefully degrading storage and
query quality over time.
Advantage: Long-term storage in storage-constrained
networks, efficient search.
Disadvantage: More communication overhead than (B).
A. Store data centrally
B. Store data locally
C. Multi-resolution storage
66
Can existing storage and search systems satisfy
design goals?
[Figure: design space with axes Exploited Data Correlation, Decentralization, and Storage Utilization; systems shown: WSN, Geo-spatial Data Mining, P2P systems, Web Caches, Centralized Data Gathering]