Title: Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks
1 Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks
- Samuel Madden
- UC Berkeley
With Robert Szewczyk, Michael Franklin, and David Culler
WMCSA, June 21, 2002
2 Motivation: Sensor Nets and In-Network Query Processing
- Many sensor network applications are data oriented
- Queries: a natural and efficient data processing mechanism
- Easy (unlike embedded C code)
- Enable optimizations through abstraction
- Aggregates: the common case
- E.g., which rooms are in use?
- In-network processing a must
- Sensor networks are power and bandwidth constrained
- Communication dominates power cost
- Not subject to Moore's law!
3 Overview
- Background
- Sensor Networks
- Our Approach: Tiny Aggregation (TAG)
- Overview
- Expressiveness
- Illustration
- Optimizations
- Grouping
- Current Status & Future Work
5 Background: Sensor Networks
- A collection of small, radio-equipped, battery-powered, networked microprocessors
- Typically ad-hoc, multihop networks
- Single devices unreliable
- Very low power: tiny batteries must supply power for months
- Apps: environment monitoring, personal nets, object tracking
- Data processing plays a key role!
6 Berkeley Mica Motes & TinyOS
- TinyOS operating system (services)
- 4 MHz processor
- 4 KB RAM, 512 KB EEPROM, 128 KB code space
- Single-channel CSMA half-duplex radio @ 40 kbits/s
- Lossy: 20% loss @ 5 ft in Ganesan et al.
- Communication very expensive: 800 instrs/bit
7 Overview
- Background
- Sensor Networks
- Our Approach: Tiny Aggregation (TAG)
- Overview
- Expressiveness
- Illustration
- Optimizations
- Grouping
- Current Status & Future Work
8 The Tiny Aggregation (TAG) Approach
- Push declarative queries into network
- Impose a hierarchical routing tree onto the network
- Divide time into epochs
- Every epoch, sensors evaluate query over local sensor data and data from children
- Aggregate local and child data
- Each node transmits just once per epoch
- Pipelined approach increases throughput
- Depending on aggregate function, various optimizations can be applied
9 SQL Primer
SELECT AVG(light) FROM sensors WHERE sound < 100
GROUP BY roomNo HAVING AVG(light) < 50
- SQL is an established declarative language; we are not wedded to it
- Some extensions clearly necessary, e.g., for sample rates
- We adopt a basic subset
- sensors relation (table) has
- One column for each reading type, or attribute
- One row for each externalized value
- May represent an aggregation of several individual readings
SELECT agg_n(attr_n), attrs FROM sensors
WHERE selPreds GROUP BY attrs
HAVING havingPreds EPOCH DURATION s
10 Aggregation Functions
- Standard SQL supports the basic 5
- MIN, MAX, SUM, AVERAGE, and COUNT
- We support any function conforming to
Agg_n = {f_merge, f_init, f_evaluate}
f_merge(<a1>, <a2>) → <a12>
f_init(a0) → <a0>
f_evaluate(<a1>) → aggregate value
(Merge is associative and commutative!)
Each <a> is a partial aggregate
Example: Average
AVG_merge(<S1, C1>, <S2, C2>) → <S1 + S2, C1 + C2>
AVG_init(v) → <v, 1>
AVG_evaluate(<S1, C1>) → S1/C1
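The merge / init / evaluate decomposition for AVERAGE can be sketched in a few lines of Python. This is only an illustration of the three-function interface on the slide; the function names and the sample readings are assumptions, not part of the TAG implementation.

```python
from functools import reduce

def avg_init(v):
    """f_init: turn one reading into a partial aggregate <sum, count>."""
    return (v, 1)

def avg_merge(a, b):
    """f_merge: combine two partial aggregates component-wise."""
    return (a[0] + b[0], a[1] + b[1])

def avg_evaluate(a):
    """f_evaluate: turn the final partial aggregate into the answer."""
    return a[0] / a[1]

readings = [10, 20, 30, 40]          # hypothetical sensor readings
partials = [avg_init(v) for v in readings]

# Merging left to right ...
left_to_right = reduce(avg_merge, partials)
# ... gives the same partial aggregate as merging in a different order and
# grouping, which is why merge must be associative and commutative.
regrouped = avg_merge(avg_merge(partials[3], partials[1]),
                      avg_merge(partials[0], partials[2]))

print(left_to_right, avg_evaluate(left_to_right))
```

Because the partial aggregate is a fixed-size pair, a node forwards the same amount of data no matter how many descendants it has.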
11 Query Propagation
- TAG is propagation agnostic
- Any algorithm that can
- Deliver the query to all sensors
- Provide all sensors with one or more duplicate-free routes to some root
- Paper describes simple flooding approach
- Query introduced at a root; rebroadcast by all sensors until it reaches leaves
- Sensors pick parent and level when they hear query
- Reselect parent after k silent epochs
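[Figure: query flooded from the root. Each node is labeled with its parent (P) and level (L): node 1: P0, L1; nodes 2 and 3: P1, L2; node 4: P2, L3; node 6: P3, L3; node 5: P4, L4]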
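The flooding step, in which each sensor adopts the first sender it hears as its parent, can be sketched as a breadth-first traversal. This is a minimal sketch and not the paper's code: the radio connectivity in `neighbors` is a hypothetical topology, and treating "first sender heard" as breadth-first order is an assumption.

```python
from collections import deque

def flood_query(neighbors, root):
    """Return (parent, level) maps built by flooding a query from root."""
    parent = {root: 0}   # 0 marks "no parent", matching the P0 label
    level = {root: 1}
    queue = deque([root])
    while queue:
        sender = queue.popleft()
        for node in sorted(neighbors[sender]):
            if node not in parent:        # first time this node hears the query
                parent[node] = sender
                level[node] = level[sender] + 1
                queue.append(node)        # it rebroadcasts in turn
    return parent, level

# Hypothetical radio connectivity (symmetric links).
neighbors = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 6},
             4: {2, 5, 6}, 5: {4}, 6: {3, 4}}
parent, level = flood_query(neighbors, root=1)
print(parent, level)
```

With this topology the result agrees with the parent and level labels listed for the slide's figure (for example node 4 gets P2, L3).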
12 Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
Depth = d
13 Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
Epoch 1: root outputs 1
[Figure: routing tree annotated with each sensor's partial count for this epoch]
        Sensor
Epoch   1  2  3  4  5
1       1  1  1  1  1
14 Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
Epoch 2: root outputs 3
[Figure: routing tree annotated with each sensor's partial count for this epoch]
        Sensor
Epoch   1  2  3  4  5
1       1  1  1  1  1
2       3  1  2  2  1
15 Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
Epoch 3: root outputs 4
[Figure: routing tree annotated with each sensor's partial count for this epoch]
        Sensor
Epoch   1  2  3  4  5
1       1  1  1  1  1
2       3  1  2  2  1
3       4  1  3  2  1
16 Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
Epoch 4: root outputs 5
[Figure: routing tree annotated with each sensor's partial count for this epoch]
        Sensor
Epoch   1  2  3  4  5
1       1  1  1  1  1
2       3  1  2  2  1
3       4  1  3  2  1
4       5  1  3  2  1
17 Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
Epoch 5: root outputs 5
[Figure: routing tree annotated with each sensor's partial count for this epoch]
        Sensor
Epoch   1  2  3  4  5
1       1  1  1  1  1
2       3  1  2  2  1
3       4  1  3  2  1
4       5  1  3  2  1
5       5  1  3  2  1
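The per-epoch counts in these illustration slides can be reproduced with a short simulation. This is a sketch under an assumption: the routing tree (sensor 1 is the root with children 2 and 3, sensor 3 has child 4, sensor 4 has child 5) is inferred from the numbers on the slides, since the figure itself is not in the transcript. Each epoch, every sensor transmits 1 for itself plus whatever its children transmitted in the previous epoch.

```python
def pipelined_count(parent, epochs):
    """Simulate pipelined COUNT; return one row of per-sensor values per epoch."""
    nodes = sorted(set(parent) | set(parent.values()))
    children = {n: [] for n in nodes}
    for child, par in parent.items():
        children[par].append(child)

    prev = {n: 0 for n in nodes}      # nothing heard before the first epoch
    history = []
    for _ in range(epochs):
        # Own reading counts as 1; child values are one epoch old.
        cur = {n: 1 + sum(prev[c] for c in children[n]) for n in nodes}
        history.append([cur[n] for n in nodes])
        prev = cur
    return history

# Assumed tree: child -> parent.
tree = {2: 1, 3: 1, 4: 3, 5: 4}
for epoch, row in enumerate(pipelined_count(tree, 5), start=1):
    print(epoch, row)
```

The root's value stabilizes at 5 once the deepest sensor's reading has traveled up the tree, one level per epoch.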
18 Discussion
- Result is a stream of values
- Ideal for monitoring scenarios
- One communication / node / epoch
- Symmetric power consumption, even at root
- New value on every epoch
- After d-1 epochs, complete aggregation
- Given a single loss, network will recover after at most d-1 epochs
- With time synchronization, nodes can sleep between epochs, except during a small communication window
19 Simulation Result
- Simulation Results
- 2500 Nodes
- 50x50 Grid
- Depth 10
- Neighbors 20
Some aggregates require dramatically more state!
20 Optimization: Channel Sharing
- Insight: shared channel enables optimizations
- Suppress messages that won't affect aggregate
- E.g., in a MAX query, a sensor with value v hears a neighbor with value ≥ v, so it doesn't report
- Applies to all such exemplary aggregates
- Learn about query advertisements it missed
- If a sensor shows up in a new environment, it can learn about queries by looking at neighbors' messages
- Root doesn't have to explicitly rebroadcast query!
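The suppression rule for a MAX query can be written as a one-line predicate. This is an illustrative sketch only; the function name and the idea of passing overheard values as a list are assumptions.

```python
def should_report_max(own_value, overheard_values):
    """Report only if no overheard neighbor value already matches or beats ours."""
    return all(own_value > heard for heard in overheard_values)

print(should_report_max(70, [40, 65]))   # larger than everything overheard
print(should_report_max(50, [50]))       # a neighbor already reported 50
```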
21 Optimization: Hypothesis Testing
- Insight: root can provide information that will suppress readings that cannot affect the final aggregate value
- E.g., tell all the nodes that the MIN is definitely < 50; nodes with value ≥ 50 need not participate
- Works for any linear aggregate function
- How is hypothesis computed?
- Blind guess
- Statistically informed guess
- Observation over first few levels of tree / rounds of aggregate
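For the MIN example, the effect of a hypothesis on participation can be sketched as follows. The readings, the function name, and the handling of a wrong guess (returning no result) are assumptions made for illustration.

```python
def min_with_hypothesis(readings, hypothesis):
    """Return (MIN over participating sensors, number of sensors that transmit).

    Sensors whose value is not below the hypothesis stay silent.
    """
    participating = [v for v in readings.values() if v < hypothesis]
    if not participating:
        return None, 0        # the guess was too low; the root would have to retry
    return min(participating), len(participating)

readings = {1: 72, 2: 48, 3: 55, 4: 31, 5: 90}   # hypothetical sensor values
print(min_with_hypothesis(readings, 50))
```

Here only two of the five sensors transmit, and the answer is unchanged because the true minimum is below the hypothesis.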
22 Optimization: Use Multiple Parents
- For duplicate-insensitive (e.g., MAX) or partitionable (e.g., COUNT) aggregates
- Send (part of) aggregate to all parents
- Decreases variance
- Dramatically, when there are lots of parents
- No extra cost, since all messages broadcast
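The variance claim can be checked numerically for a partitionable aggregate such as COUNT. The sketch below assumes a simple model that is not stated on the slide: each parent link independently drops a message with probability `loss`, and the child splits its count evenly across its parents. It enumerates every delivery outcome and computes the exact mean and variance of the count that gets through.

```python
from itertools import product

def delivered_stats(count, parents, loss):
    """Exact mean and variance of the delivered count under independent link loss."""
    share = count / parents
    mean = 0.0
    second_moment = 0.0
    for outcome in product([0, 1], repeat=parents):   # 1 = message delivered
        prob = 1.0
        for ok in outcome:
            prob *= (1 - loss) if ok else loss
        total = share * sum(outcome)
        mean += prob * total
        second_moment += prob * total * total
    return mean, second_moment - mean * mean

for k in (1, 2, 4):
    print(k, delivered_stats(10, k, 0.2))
```

Under this model the expected delivered count stays the same while the variance shrinks in proportion to the number of parents.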
23 Grouping
- Value-based, complete partitioning of records
- If query is grouped, sensors apply predicate to local readings on each epoch
- Aggregate records tagged with group
- When a child record (with group) is received
- If it belongs to a stored group, merge with existing record for that group
- If not, just store it
- At the end of each epoch, transmit one record per group
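The merge-or-store rule for incoming child records can be sketched with a dictionary keyed by group. The group identifiers, the COUNT merge function, and the helper name are hypothetical and chosen only for illustration.

```python
def receive_child_record(store, group, partial, merge):
    """Merge a child's record into the stored record for its group, or store it."""
    if group in store:
        store[group] = merge(store[group], partial)
    else:
        store[group] = partial

def count_merge(a, b):
    """Merge function for COUNT partial aggregates."""
    return a + b

store = {"room1": 2}                       # this sensor's own grouped record
receive_child_record(store, "room1", 3, count_merge)   # known group: merge
receive_child_record(store, "room2", 1, count_merge)   # new group: store
to_transmit = sorted(store.items())        # one record per group at epoch end
print(to_transmit)
```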
24 Overview
- Background
- Sensor Networks
- Our Approach: Tiny Aggregation (TAG)
- Overview
- Expressiveness
- Illustration
- Optimizations
- Grouping
- Current Status & Future Work
25 Status & Future Work
- Status
- Simple simulator
- Complete set of experiments, including behavior of algorithms in the face of loss
- Generalization of algorithms beyond complete pipelining
- Taxonomy of aggregates to allow optimizations on functional properties
- Basic implementation (shown in demo)
- Future work
- Expressiveness issues
- Aggregates over temporal data
- Nested queries, e.g., MAX(AVG(1000 readings) @ each node)
- Correctness issues in the face of loss
- How does the user know which nodes are and are not included in an aggregate?
26 Summary
- Declarative queries for aggregates
- Straightforward, familiar interface
- Enables optimizations
- Snooping techniques for exemplary aggregates
- Multiple parents for partitionable aggregates
- Pipelined, epoch based algorithm
- Streaming Results
- Symmetric communication
- Low-power friendly
27 Questions?
28 Grouping
- GROUP BY expr
- expr is an expression over one or more attributes
- Evaluation of expr yields a group number
- Each reading is a member of exactly one group
- Example: SELECT max(light) FROM sensors GROUP BY TRUNC(temp/10)
Sensor ID  Light  Temp  Group
1          45     25    2
2          27     28    2
3          66     34    3
4          68     37    3
Result
Group  max(light)
2      45
3      68
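The example can be traced in Python using the four rows shown on the slide. The helper name `group_of` is an assumption; `math.trunc` stands in for SQL's TRUNC.

```python
import math

def group_of(temp):
    """Group number for GROUP BY TRUNC(temp/10)."""
    return math.trunc(temp / 10)

# (sensor id, light, temp) rows from the slide.
rows = [(1, 45, 25), (2, 27, 28), (3, 66, 34), (4, 68, 37)]

result = {}
for _sensor_id, light, temp in rows:
    group = group_of(temp)
    # Keep the largest light value seen so far in this group.
    result[group] = max(result.get(group, light), light)

print(result)
```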
29 Having
- HAVING preds
- preds filters out groups that do not satisfy predicate
- versus WHERE, which filters out tuples that do not satisfy predicate
- Example
- SELECT max(temp) FROM sensors GROUP BY light HAVING max(temp) < 100
- Yields all groups whose maximum temperature is under 100
30 Group Eviction
- Problem: number of groups in any one iteration may exceed available storage on sensor
- Solution: evict!
- Choose one or more groups to forward up tree
- Rely on nodes further up tree, or root, to recombine groups properly
- What policy to choose?
- Intuitively: least popular group, since we don't want to evict a group that will receive more values this epoch
- Experiments suggest
- Policy matters very little
- Evicting as many groups as will fit into a single message is good
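One way to picture the eviction step is a bounded group store that, when full, forwards its least-popular groups up the tree. The sketch below is an assumption-laden illustration: `capacity`, `per_message`, the hit counter used to measure popularity, and the tie-break by group number are all hypothetical choices, not the policy evaluated in the experiments.

```python
def insert_with_eviction(store, hits, group, partial, merge, capacity, per_message):
    """Merge or store a group record; return the records evicted up the tree."""
    evicted = []
    if group in store:
        store[group] = merge(store[group], partial)
        hits[group] += 1
        return evicted
    if len(store) >= capacity:
        # Least popular first; ties broken by group number for determinism.
        victims = sorted(store, key=lambda g: (hits[g], g))[:per_message]
        for g in victims:
            evicted.append((g, store.pop(g)))
            del hits[g]
    store[group] = partial
    hits[group] = 1
    return evicted

store, hits = {}, {}
sent_up = []
for group, value in [(1, 10), (1, 5), (2, 7), (3, 9)]:
    sent_up += insert_with_eviction(store, hits, group, value, max,
                                    capacity=2, per_message=1)
print(store, sent_up)
```

Group 2 is forwarded because it has received the fewest values; a node further up the tree would merge it with any other record for group 2.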
31 Simulation Environment
- Java-based simulation & visualization for validating algorithms, collecting data
- Coarse-grained, event-based simulation
- Sensors arranged on a grid, radio connectivity by Euclidean distance
- Communication model
- Lossless: all neighbors hear all messages
- Lossy: messages lost with probability that increases with distance
- Symmetric links
- No collisions, hidden terminals, etc.
32 Simulation Screenshot
33 Experimental Results
- Experiments with simulator
- Performance of basic TAG
- Benefits of hypothesis testing
- Effect of loss
- Most experiments in terms of bytes or messages sent, since message transmission is the dominant cost
- Depends on radio being turned off between epochs and aggregation functions being cheap
34 Experiment: Basic TAG
- Dense Packing, Ideal Communication
35 Experiment: Hypothesis Testing
- Uniform Value Distribution, Dense Packing, Ideal Communication
36 Experiment: Effects of Loss
37 Experiment: Benefit of Cache
38 Pipelined Aggregates
- After query propagates, during each epoch
- Each sensor samples local sensors once
- Combines them with PSRs (partial state records) from children
- Outputs PSR representing aggregate state in the previous epoch
- After (d-1) epochs, PSR for the whole tree output at root
- d = depth of the routing tree
- If desired, partial state from top k levels could be output in kth epoch
- To avoid combining PSRs from different epochs, sensors must cache values from children
Value from 2 produced at time t arrives at 1 at time (t+1)
Value from 5 produced at time t arrives at 1 at time (t+3)
39 Pipelining Example
[Figure: three empty tables, each with columns SID, Epoch, Agg.]
40 Pipelining Example
Epoch 0
Messages sent <SID, Epoch, Agg.>: <4,0,1>, <5,0,1>
Table 1 (SID, Epoch, Agg.): (1,0,1)
Table 2 (SID, Epoch, Agg.): (2,0,1), (4,0,1)
Table 3 (SID, Epoch, Agg.): (3,0,1), (5,0,1)
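41 Pipelining Example
Epoch 1
Messages sent <SID, Epoch, Agg.>: <2,0,2>, <4,1,1>, <3,0,2>, <5,1,1>
Table 1 (SID, Epoch, Agg.): (1,0,1), (1,1,1), (2,0,2)
Table 2 (SID, Epoch, Agg.): (2,0,1), (4,0,1), (2,1,1), (4,1,1), (3,0,2)
Table 3 (SID, Epoch, Agg.): (3,0,1), (5,0,1), (3,1,1), (5,1,1)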
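42 Pipelining Example
Epoch 2
Messages sent <SID, Epoch, Agg.>: <1,0,3>, <2,0,4>, <4,2,1>, <3,1,2>, <5,2,1>
Table 1 (SID, Epoch, Agg.): (1,0,1), (1,1,1), (2,0,2), (1,2,1), (2,0,4)
Table 2 (SID, Epoch, Agg.): (2,0,1), (4,0,1), (2,1,1), (4,1,1), (3,0,2), (2,2,1), (4,2,1), (3,1,2)
Table 3 (SID, Epoch, Agg.): (3,0,1), (5,0,1), (3,1,1), (5,1,1), (3,2,1), (5,2,1)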
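43 Pipelining Example
Epoch 3
Messages sent <SID, Epoch, Agg.>: <1,0,5>, <2,1,4>, <4,3,1>, <3,2,2>, <5,3,1>
Table 1 (SID, Epoch, Agg.): (1,0,1), (1,1,1), (2,0,2), (1,2,1), (2,0,4)
Table 2 (SID, Epoch, Agg.): (2,0,1), (4,0,1), (2,1,1), (4,1,1), (3,0,2), (2,2,1), (4,2,1), (3,1,2)
Table 3 (SID, Epoch, Agg.): (3,0,1), (5,0,1), (3,1,1), (5,1,1), (3,2,1), (5,2,1)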
44 Pipelining Example
Epoch 4
Messages sent <SID, Epoch, Agg.>: <1,1,5>, <2,2,4>, <4,4,1>, <3,3,2>, <5,4,1>
45 Optimization: Delta Compression
- If a sensor's reading is unchanged from previous epoch, it need not transmit
- Parents assume value is unchanged
- Leverage child value cache
- Periodic heartbeats to handle disconnection
- Extension: if a sensor's reading has not changed by more than some threshold, it need not transmit
- Similar to hypothesis testing with AVERAGE
- Really future work: see C. Olston, Best-Effort Cache Synchronization, SIGMOD 2002
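The child-side filter and the parent-side cache can be sketched together. This is a hedged illustration: the readings and function names are made up, the threshold is compared against the last transmitted value (one possible reading of the extension), and heartbeats are left out.

```python
def delta_filter(readings, threshold=0):
    """Child side: list of (epoch, value) pairs that actually get transmitted."""
    sent = []
    last_sent = None
    for epoch, value in enumerate(readings):
        if last_sent is None or abs(value - last_sent) > threshold:
            sent.append((epoch, value))
            last_sent = value
    return sent

def parent_view(sent, epochs):
    """Parent side: reuse the cached child value when nothing was received."""
    updates = dict(sent)
    view, cached = [], None
    for epoch in range(epochs):
        cached = updates.get(epoch, cached)
        view.append(cached)
    return view

readings = [20, 20, 21, 25, 25, 24]        # hypothetical readings, one per epoch
exact = delta_filter(readings)             # transmit on any change
approx = delta_filter(readings, threshold=2)
print(exact, approx, parent_view(approx, len(readings)))
```

With a threshold of 0 the parent's view equals the true readings; with a threshold of 2 only two messages are sent and the view is off by at most the threshold.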
46 Taxonomy of Aggregates
- TAG insight: classifying aggregates according to various functional properties
- Yields a general set of optimizations that can automatically be applied
Property | Examples | Affects
Partial State | MEDIAN: unbounded; MAX: 1 record | Effectiveness of TAG
Duplicate Sensitivity | MIN: dup. insensitive; AVG: dup. sensitive | Routing Redundancy
Exemplary vs. Summary | MAX: exemplary; COUNT: summary | Applicability of Sampling, Effect of Loss
Monotonic | COUNT: monotonic; AVG: non-monotonic | Hypothesis Testing, Snooping