1
CMPE 259
  • Sensor Networks
  • Katia Obraczka
  • Winter 2005
  • Storage and Querying II

2
Announcements
  • HW3 is up.
  • Exams.
  • Sign up for project presentations.
  • Schedule.
  • Course evaluation: Mon, 03.14.
  • Need a volunteer.

3
Today
  • Storage.
  • Querying.

4
Data-Centric Storage
5
DCS
  • Data dissemination for sensor networks.
  • Naming-based storage.

6
Background
  • Sensornet
    – A distributed sensing network comprised of a
      large number of small sensing devices, each
      equipped with a processor, memory, and a radio.
    – Produces a large volume of data.
  • Data dissemination algorithm requirements
    – Scalable.
    – Self-organizing.
    – Energy efficient.

7
Some definitions
  • Observation
    – Low-level output from sensors.
    – E.g., detailed temperature and pressure
      readings.
  • Event
    – Constellations of low-level observations.
    – E.g., elephant sighting, fire, intruder.
  • Query
    – Used to elicit event information from the
      sensornet.
    – E.g., is there an intruder? Where is the fire?

8
Data dissemination schemes
  • External Storage (ES)
  • Local Storage (LS)
  • Data-Centric Storage (DCS)

9
External Storage (ES)
10
Local Storage (LS)
11
Local Storage (LS)
12
Data-Centric Storage (DCS)
  • Events are named with keys.
  • DCS stores (key, value) pairs.
  • DCS supports two operations
    – Put(k, v) stores v (the observed data)
      according to the key k, the name of the data.
    – Get(k) retrieves whatever value is stored
      associated with the key k.
  • Hash function
    – Hashes a key k into geographic coordinates.
    – Put() and Get() operations on the same key k
      hash k to the same location (see the sketch
      below).
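A minimal sketch of the put/get interface, assuming a toy hash that maps an event name to coordinates inside a known field and a "home node" chosen as the node geographically closest to that point. The field size, node layout, and md5-based hash are illustrative assumptions, not the GHT implementation.

```python
import hashlib

FIELD = (100.0, 100.0)  # assumed field dimensions

def geo_hash(key: str, field=FIELD):
    """Hash an event name to (x, y) coordinates inside the field."""
    digest = hashlib.md5(key.encode()).digest()
    x = int.from_bytes(digest[:4], "big") / 2**32 * field[0]
    y = int.from_bytes(digest[4:8], "big") / 2**32 * field[1]
    return (x, y)

class GeographicHashTable:
    """Toy DCS store: the node nearest the hashed location keeps the data."""
    def __init__(self, nodes):
        self.nodes = nodes   # {node_id: (x, y)}
        self.store = {}      # {node_id: {key: [values]}}

    def _home_node(self, key):
        x, y = geo_hash(key)
        return min(self.nodes,
                   key=lambda n: (self.nodes[n][0] - x) ** 2 + (self.nodes[n][1] - y) ** 2)

    def put(self, key, value):
        # Put(k, v): route v toward Hash(k); the nearest node stores it.
        self.store.setdefault(self._home_node(key), {}).setdefault(key, []).append(value)

    def get(self, key):
        # Get(k): route the query to the same location and read back values.
        return self.store.get(self._home_node(key), {}).get(key, [])

# Usage: both operations on "elephant" hash to the same location.
ght = GeographicHashTable({1: (10, 20), 2: (80, 75), 3: (40, 95)})
ght.put("elephant", {"time": 1712, "node": 7})
print(ght.get("elephant"))
```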

13
DCS Example
(figure: the key "elephant" hashes to location
(11, 28); the event is stored at the node nearest
that point)
14
DCS Example
(figure: Get(elephant) hashes to the same location
(11, 28) and retrieves the stored event)
15
DCS Example contd..
(figure: events of different types, "elephant" and
"fire", are stored at the distinct locations their
names hash to)
16
Geographic Hash Table (GHT)
  • Builds on
    – Peer-to-peer lookup systems.
    – Greedy Perimeter Stateless Routing (GPSR).

(diagram: GHT layered on top of GPSR and
peer-to-peer lookup)
17
Comparison study
  • Metrics
    – Total messages: total packets sent in the
      sensor network.
    – Hotspot messages: maximal number of packets
      sent by any particular node.

18
Comparison study contd..
  • Assume n is the number of nodes.
  • Asymptotic costs are O(n) messages for a flood
    and O(n^(1/2)) for point-to-point routing.

19
Comparison Study contd..
  • Dtotal: the total number of events detected.
  • Q: the number of event types queried for.
  • Dq: the number of detected events of the queried
    event types.
  • Assume no more than one query per event type, so
    there are Q queries in total.
  • Assume the hotspot occurs on packets sent to the
    access point.

20
Comparison Study contd..
  • DCS is preferable if
    – the sensor network is large, and
    – Dtotal >> max(Dq, Q) (see the cost sketch
      below).
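A hedged back-of-the-envelope sketch of the comparison, assuming a flood costs roughly n messages and point-to-point routing roughly sqrt(n) hops. The cost expressions are approximate reconstructions of the asymptotic model summarized on these slides, not exact formulas from the paper.

```python
from math import sqrt

def message_costs(n, d_total, q, d_q):
    """Rough total/hotspot message counts for ES, LS, and DCS.
    Assumes a flood costs ~n messages and point-to-point routing ~sqrt(n)."""
    total = {
        "ES":  d_total * sqrt(n),                        # ship every event to the external store
        "LS":  q * n + d_q * sqrt(n),                    # flood each query, return matching events
        "DCS": d_total * sqrt(n) + (q + d_q) * sqrt(n),  # store events at Hash(k), route queries/replies
    }
    hotspot = {
        "ES":  d_total,   # the access point receives every event
        "LS":  q + d_q,   # a node sees each query plus the events it returns
        "DCS": q + d_q,   # the access point sees queries and returned events
    }
    return total, hotspot

# DCS-friendly regime: large network, many detections, few event types queried.
total, hotspot = message_costs(n=10_000, d_total=5_000, q=10, d_q=50)
print(total, hotspot, sep="\n")
```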

21
Summary
  • In DCS, relevant data are stored by name at nodes
    within the sensornet.
  • GHT hashes a key k into geographic coordinates;
    the key-value pair is stored at a node in the
    vicinity of the location to which the key hashes.
  • To ensure robustness and scalability, DCS uses
    the Perimeter Refresh Protocol (PRP) and
    Structured Replication (SR).
  • Compared with ES and LS, DCS is preferable in
    large sensornets.

22
Multi-Resolution Storage
23
Goals
  • Provide storage and search for raw sensor data in
    data-intensive scientific applications.
  • Previous work
    – Aggregation and querying.
    – Focuses on applications whose interests are
      known a priori.

24
Approach
  • Lossy, progressively degrading storage.

25
Constructing the hierarchy
Initially, nodes fill up their own storage with
raw sampled data.
26
Constructing the hierarchy
  • Organize the network into grids, and hash within
    each grid to determine the location of the
    clusterhead (cf. DCS); see the sketch below.
  • Send the compressed local time series to the
    clusterhead.
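A small sketch of how a node might locate its level-l clusterhead, assuming square grid cells that double in side per level and a hash salted with the cell index and a time epoch (the epoch anticipates the load-distribution slide below). The cell geometry, base cell size, and hash are illustrative assumptions.

```python
import hashlib

def clusterhead_location(pos, level, epoch, base_cell=10.0):
    """Hash a node's grid cell at `level` (cell side doubles per level)
    to a point inside that cell; vary with `epoch` to spread load over time."""
    side = base_cell * (2 ** level)
    cx, cy = int(pos[0] // side), int(pos[1] // side)   # which cell the node is in
    digest = hashlib.md5(f"{level}:{cx}:{cy}:{epoch}".encode()).digest()
    fx = int.from_bytes(digest[:4], "big") / 2**32
    fy = int.from_bytes(digest[4:8], "big") / 2**32
    return (cx * side + fx * side, cy * side + fy * side)

# Nodes in the same level-2 cell agree on the same clusterhead location,
# and the location changes when the epoch changes.
print(clusterhead_location((37.0, 12.0), level=2, epoch=0))
print(clusterhead_location((37.0, 12.0), level=2, epoch=1))
```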

27
Processing at each level
  • Get compressed summaries from children.
  • Decode.
  • Store incoming summaries locally for future
    search.
  • Re-encode at lower resolution and forward to the
    parent.
(figure: summary tiles drawn along x, y, and time axes)
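A minimal numpy stand-in for this per-level processing: decode the children's summaries, keep them locally, tile them, and re-encode at half the resolution before forwarding. Block averaging stands in for the paper's 3D wavelet codec, and the 2x2-children layout is an assumption.

```python
import numpy as np

def decode(summary):
    """Stand-in decoder: summaries here are just stored arrays."""
    return np.asarray(summary, dtype=float)

def reencode_coarser(tile):
    """Halve spatial resolution by 2x2 block averaging
    (a crude stand-in for keeping fewer wavelet coefficients)."""
    h, w = tile.shape
    return tile[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def process_level(child_summaries):
    """Store incoming summaries locally, then combine and coarsen for the parent."""
    local_store = [decode(s) for s in child_summaries]       # kept for future drill-down queries
    combined = np.block([[local_store[0], local_store[1]],
                         [local_store[2], local_store[3]]])  # assume 4 children in a 2x2 grid
    return local_store, reencode_coarser(combined)

children = [np.random.rand(4, 4) for _ in range(4)]
stored, to_parent = process_level(children)
print(to_parent.shape)  # (4, 4): the combined 8x8 tile re-encoded at half resolution
```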
28
Constructing the hierarchy
Recursively send data to higher levels of the
hierarchy.
29
Distributing storage load
Hash to different locations over time to
distribute load among nodes in the network.
30
What happens when storage fills up?
  • Eventually, all available storage gets filled,
    and we have to decide when and how to drop
    summaries.
  • Allocate storage to each resolution and use each
    allocated storage block as a circular buffer.

(figure: local storage capacity at a node
partitioned into blocks for resolutions 1 through 4)
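A sketch of per-resolution circular buffers, assuming each resolution gets a fixed share of a node's storage and overwrites its oldest summary when its share fills up. The share sizes are illustrative assumptions.

```python
from collections import deque

class ResolutionStore:
    """Fixed storage split across resolutions; each slice is a circular buffer."""
    def __init__(self, slots_per_resolution):
        # e.g. {1: 8, 2: 4, 3: 2, 4: 1} summaries kept per resolution (assumed split)
        self.buffers = {res: deque(maxlen=slots)
                        for res, slots in slots_per_resolution.items()}

    def add_summary(self, resolution, summary):
        # a deque with maxlen silently drops the oldest summary once the slice is full
        self.buffers[resolution].append(summary)

    def query(self, resolution):
        return list(self.buffers[resolution])

store = ResolutionStore({1: 8, 2: 4, 3: 2, 4: 1})
for t in range(20):
    store.add_summary(1, f"fine-summary@{t}")
print(store.query(1))  # only the 8 most recent fine summaries survive
```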
31
Tradeoff between storage requirements and query
quality
  • Graceful query degradation: provide more accurate
    responses to queries on recent data and less
    accurate responses to queries on older data.

How to allocate storage at each node to summaries
at different resolutions to provide gracefully
degrading storage and search capability?
32
Match system performance to user requirements
(figure: query accuracy vs. time, from past to
present, with accuracy ticks at 95 and 50; the gap
between the user-desired curve and what the system
provides is the quality difference)
  • Objective: minimize the worst-case difference
    between the user-desired query quality (blue
    curve) and the query quality that the system can
    provide (red step function).

33
For how long should summaries be stored?
  • Goal: achieve the desired query quality given
    system constraints.
  • Given
    – N sensor nodes.
    – Each node has storage capacity S.
    – Users ask a set of typical queries, T.
    – Data is generated at resolution i at rate Ri.
    – D(q, k): query error when the drill-down for
      query q terminates at level k.
    – Quser: user-desired quality degradation.
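One possible way to write the aging decision as a constraint optimization, using only the variables on this slide. This is a hedged sketch: the decision variables A_i (how long resolution-i summaries are retained) and the exact form of the storage constraint are assumptions, not the paper's formulation.

```latex
% Decision variables: A_i = age for which summaries at resolution i are kept.
\begin{align*}
\min_{A_1,\dots,A_k}\;\max_{q \in T}\;
    & \bigl( Q_{\mathrm{user}}(q) - Q_{\mathrm{sys}}(q) \bigr) \\
\text{subject to}\quad
    & \sum_{i} R_i \, A_i \;\le\; S \cdot N
      && \text{(retained summaries fit in aggregate storage)} \\
    & Q_{\mathrm{sys}}(q) \;=\; 1 - D(q, k_q),
      && k_q = \text{finest level still stored for the age } q \text{ targets.}
\end{align*}
```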

34
Aging strategy with limited information
  • Omniscient strategy (full a priori information):
    baseline when the entire dataset is available;
    solve the constraint optimization directly.
  • Training strategy: used when a small training
    dataset from an initial deployment is available.
  • Greedy strategy (no a priori information): when
    no data is available, use a simple weighted
    allocation to summaries; see the sketch below.
(figure: summaries aged across resolutions, from
finest through finer to coarse)
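A rough sketch of the greedy idea as stated here: with no training data, split a node's storage across resolutions using a simple weight. The geometric form and the value of the weighting parameter are assumptions; the slide only says "a simple weighted allocation".

```python
def greedy_allocation(total_slots, num_resolutions, w=0.5):
    """Split storage slots across resolutions 1..k in proportion to w**i.
    w < 1 favors finer resolutions, w > 1 favors coarser ones; both the form
    and the value of w are assumptions for illustration."""
    raw = [w ** i for i in range(1, num_resolutions + 1)]
    total = sum(raw)
    slots = [max(1, round(total_slots * r / total)) for r in raw]
    return dict(enumerate(slots, start=1))

print(greedy_allocation(total_slots=32, num_resolutions=4, w=0.5))
```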
35
Distributed trace-driven implementation
  • Linux implementation.
  • Uses Emstar (J. Elson et al.), a Linux-based
    emulator/simulator for sensor networks.
  • 3D wavelet codec.
  • Query processing.
  • Geo-spatial precipitation dataset
    – 15x12 grid (50 km edge) of precipitation data
      from 1949-1994, from the Pacific Northwest.
  • System parameters
    – Compression ratios: 6:1, 12:1, 24:1, 48:1.
    – Training set: 6% of the total dataset.

36
How efficient is search?
Search is very efficient (<5% of the network
queried) and accurate for the different queries
studied.
37
Comparing aging strategies
Training performs within 1% of optimal. Careful
selection of parameters for the greedy algorithm
can provide surprisingly good results (within
2-5% of optimal).
38
Summary
  • Progressive aging of summaries can be used to
    support long-term spatio-temporal queries in
    resource-constrained sensor network deployments.
  • We describe two algorithms: a training-based
    algorithm that relies on the availability of
    training datasets, and a greedy algorithm that
    can be used in the absence of such data.
  • Our results show that
    – training performs close to optimal for the
      dataset that we study;
    – the greedy algorithm performs well for a
      well-chosen summary weighting parameter.

39
Continuously Adaptive Continuous Queries (CACQ)
40
CACQ Introduction
  • Proposed continuous query (CQ) systems are based
    on static plans.
  • But CQs are long running.
  • Initially valid assumptions become less valid
    over time.
  • CACQ insight: apply continuous adaptivity.
  • Dynamic operator ordering.
  • Process multiple queries simultaneously.
  • Enables sharing of work and storage.

41
Outline
  • Background
  • Motivation
  • Continuous Queries
  • Eddies
  • CACQ
  • Contributions
  • Example-driven explanation
  • Results and experiments

42
Motivating applications
  • Building monitoring.
  • Variety of sensors (e.g., light, temperature,
    vibration, strain, etc.).
  • Variety of users with different interests (e.g.,
    structural engineers, building managers, building
    users, etc.).

43
Continuous queries
  • Long-running, standing queries.
  • From various users.
  • Over a number of sensor streams.
  • Installed and continuously produce results until
    removed.
  • Lots of queries over the same data sources:
    opportunity for work sharing.

44
Eddies adaptivity
  • Eddies (Avnur & Hellerstein, SIGMOD 2000):
    continuous adaptivity.
  • No static ordering of operators.
  • A routing policy dynamically orders operators on
    a per-tuple basis.
  • done and ready bits encode where a tuple has been
    and where it can go.

45
CACQ contributions
  • Adaptivity.
  • Tuple lineage
    – In addition to ready and done bits, encode the
      path a tuple takes through the operators.
    – Enables sharing of work and state across
      queries.
  • Grouped filter
    – Efficiently compute selections over multiple
      queries (see the sketch below).
  • Join sharing through State Modules (SteMs).
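A sketch of the grouped-filter idea for range predicates over a single attribute: keep all the queries' thresholds in one sorted list so that one binary search per tuple value finds every satisfied predicate, instead of evaluating each query separately. The class and method names are illustrative, not the CACQ implementation.

```python
import bisect

class GroupedGTFilter:
    """Grouped filter for predicates of the form attr > threshold."""
    def __init__(self):
        self.thresholds = []   # sorted threshold values
        self.query_ids = []    # query id aligned with each threshold

    def add_query(self, query_id, threshold):
        pos = bisect.bisect_left(self.thresholds, threshold)
        self.thresholds.insert(pos, threshold)
        self.query_ids.insert(pos, query_id)

    def matching_queries(self, value):
        # every query whose threshold is strictly below `value` is satisfied
        pos = bisect.bisect_left(self.thresholds, value)
        return self.query_ids[:pos]

f = GroupedGTFilter()
f.add_query("Q1", 10)   # Q1: attr > 10
f.add_query("Q2", 15)   # Q2: attr > 15
f.add_query("Q3", 7)    # Q3: attr > 7
print(f.matching_queries(12))   # ['Q3', 'Q1']: one search answers all three queries
```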

46
Eddies & CACQ: Single Query, Single Source
SELECT * FROM R WHERE R.a > 10 AND R.b < 15
  • Use ready bits to track what to do next
    – all 1s in the single-source case.
  • Use done bits to track what has been done
    – a tuple can be output when all bits are set.
  • The routing policy dynamically orders tuples.

(figure: tuples R1 and R2 circulate between the
eddy and the operators; each tuple's ready/done bit
vector is updated as operators are applied, e.g.
1 1 0 0 on entry and 1 1 1 1 when ready for output)
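A toy sketch of the per-tuple routing that the bit vectors above encode, assuming a single source and the two selection operators of the example query. The routing policy here is a naive "first not-yet-done operator"; CACQ's actual policies are adaptive, and the names below are illustrative.

```python
# Example query from the slide: SELECT * FROM R WHERE R.a > 10 AND R.b < 15
OPERATORS = [("a_gt_10", lambda t: t["a"] > 10),
             ("b_lt_15", lambda t: t["b"] < 15)]

def eddy_route(tuples):
    """Route each tuple through the operators, tracking done bits per tuple.
    With a single source every ready bit is 1, so only done bits are tracked."""
    for t in tuples:
        done = [False] * len(OPERATORS)
        alive = True
        while alive and not all(done):
            # naive routing policy: pick the first operator not yet applied
            i = done.index(False)
            name, pred = OPERATORS[i]
            done[i] = True
            if not pred(t):
                alive = False          # tuple fails a predicate: drop it
        if alive and all(done):
            print("output:", t, "done bits:", [int(d) for d in done])

eddy_route([{"a": 12, "b": 9}, {"a": 5, "b": 9}, {"a": 20, "b": 30}])
```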
47
Evaluation
  • Real Java implementation on top of the Telegraph
    QP.
  • 4,000 new lines of code in a 75,000-line
    codebase.
  • Server platform
    – Linux 2.4.10.
    – Pentium III 733 MHz, 756 MB RAM.
  • Queries posed from a separate workstation.
  • Output suppressed.
  • Lots of experiments in the paper; just a few
    here.

48
CACQ vs. NiagaraCQ Graph
49
Conclusion
  • CACQ: sharing and adaptivity for high-performance
    monitoring queries over data streams.
  • Features
    – Adaptivity: adapt to a changing query workload
      without costly multi-query reoptimization.
    – Work sharing via tuple lineage, without
      constraining the available plans.
    – Computation sharing via grouped filters.
    – Storage sharing via SteMs.