Title: Querying Sensor Networks
1Querying Sensor Networks
- Sam Madden
- UC Berkeley
- October 2, 2002 @ UCLA
2Introduction
- Programming Sensor Networks Is Hard
- Especially if you want to build a real application
- Declarative Queries Are Easy
- And can be faster and more robust than most applications!
3Overview
- Overview of Declarative Systems
- TinyDB
- Features
- Demo
- Challenges & Research Issues
- Language
- Optimizations
- The Next Step
5Declarative Queries: SQL
- SQL is the traditional declarative language used in databases
- SELECT sel-list
- FROM tables
- WHERE pred
- GROUP BY pred
- HAVING pred
SELECT dept.name, AVG(emp.salary)
FROM emp, dept
WHERE emp.dno = dept.dno
AND (dept.name = 'Accounting' OR dept.name = 'Marketing')
GROUP BY dept.name
6Declarative Queries for Sensor Networks
- Examples
- SELECT nodeid, light
- FROM sensors
- WHERE light 400
- SAMPLE PERIOD 1s
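One way to read the query above: once per sample period, every node takes a reading, applies the WHERE predicate, and emits the selected fields. The sketch below mimics that over in-memory data. The function name `run_continuous_query`, the reading dictionaries, and the `light > 400` comparison are assumptions made for illustration (the comparison operator is not legible on the slide); none of this is TinyDB's API.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Reading = Dict[str, int]


def run_continuous_query(
    epochs: Iterable[List[Reading]],
    fields: Tuple[str, ...],
    predicate: Callable[[Reading], bool],
) -> Iterator[List[Tuple[int, ...]]]:
    """Yield one result set per sample period (epoch)."""
    for readings in epochs:
        yield [
            tuple(r[f] for f in fields)  # SELECT list
            for r in readings            # FROM sensors
            if predicate(r)              # WHERE clause
        ]


# Two sample periods of hypothetical readings from three nodes.
epochs = [
    [{"nodeid": 1, "light": 380}, {"nodeid": 2, "light": 450}, {"nodeid": 3, "light": 410}],
    [{"nodeid": 1, "light": 420}, {"nodeid": 2, "light": 390}, {"nodeid": 3, "light": 400}],
]

results = list(
    run_continuous_query(epochs, ("nodeid", "light"), lambda r: r["light"] > 400)
)
```

The caller says only what it wants (fields and a predicate); which nodes answer in each period falls out of the data.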
7General Declarative Advantages
- Data Independence
- Not required to specify how or where, just what.
- Of course, can specify specific addresses when needed
- Transparent Optimization
- System is free to explore different algorithms, locations, orders for operations
8Data Independence In Sensor Networks
- Vastly simplifies execution for large networks
- Since locations are described by predicates
- Operations are over groups
- Enables tolerance to faults
- Since system is free to choose where and when
operations happen
9Optimization In Sensor Networks
- Optimization Goal: Power!
- Where to process data
- In network
- Outside network
- Hybrid
- How to process data
- Predicate & Join Ordering
- Index Selection
- How to route data
- Semantically Driven Routing
10Overview
- Overview of Declarative Systems
- TinyDB
- Features
- Demo
- Challenges & Research Issues
- Language
- Optimizations
- The Next Step
11TinyDB
- A distributed query processor for networks of Mica motes
- Available today!
- Goal: Eliminate the need to write C code for most TinyOS users
- Features
- Declarative queries
- Temporal & spatial operations
- Multihop routing
- In-network storage
12TinyDB @ 10,000 Ft
(Almost) All Queries are Continuous and Periodic
- Written in SQL
- With Extensions For
- Sample rate
- Offline delivery
- Temporal Aggregation
13TinyDB Demo
14Applications & Early Adopters
- Some demo apps
- Network monitoring
- Vehicle tracking
- Real future deployments
- Environmental monitoring @ GDI (and James Reserve?)
- Generic Sensor Kit
- Parking Lot Monitor
Demo!
15TinyDB Architecture (Per node)
[Diagram labels: SelOperator, AggOperator, TupleRouter, Network, Radio Stack, Schema, TinyAlloc]
- TupleRouter
- Fetches readings (for ready queries)
- Builds tuples
- Applies operators
- Delivers results (up tree)
- AggOperator
- Combines local & neighbor readings
- SelOperator
- Filters readings
- Schema
- Catalog of commands & attributes (more later)
- TinyAlloc
- Reusable memory allocator!
16TinyAlloc
- Handle Based Compacting Memory Allocator
- For Catalog, Queries
Handle h;
call MemAlloc.alloc(&h, 10);
(*h)[0] = "Sam";
call MemAlloc.lock(h);
tweakString(*h);
call MemAlloc.unlock(h);
call MemAlloc.free(h);
[Diagram labels: User Program, Compaction]
17Schema
- Attribute & Command IF
- At INIT(), components register attributes and commands they support
- Commands implemented via wiring
- Attributes fetched via accessor command
- Catalog API allows local and remote queries over known attributes / commands.
- Demo of adding an attribute, executing a command.
18Overview
- Overview of Declarative Systems
- TinyDB
- Features
- Demo
- Challenges & Research Issues
- Language
- Optimizations
- Quality
19Three Questions
- Is this approach expressive enough?
- Can this approach be efficient enough?
- Are the answers this approach gives good enough?
20Q1: Expressiveness
- Simple data collection satisfies most users
- How much of what people want to do is just simple aggregates?
- Anecdotally, most of it
- EE people want filters & simple statistics (unless they can have signal processing)
- However, we'd like to satisfy everyone!
21Query Language
- New Features
- Joins
- Event-based triggers
- Via extensible catalog
- In-network nested queries
- Split-phase (offline) delivery
- Via buffers
22Sample Query 1
- Bird counter
- CREATE BUFFER birds(uint16 cnt)
- SIZE 1
-
- ON EVENT bird-enter()
- SELECT b.cnt + 1
- FROM birds AS b
- OUTPUT INTO b
- ONCE
23Sample Query 2
- Birds that entered and left within time t of each other
- ON EVENT bird-leave AND bird-enter WITHIN t
- SELECT bird-leave.time, bird-leave.nest
- WHERE bird-leave.nest = bird-enter.nest
- ONCE
24Sample Query 3
- Delta compression
- SELECT light
- FROM buf, sensors AS s
- WHERE |s.light - buf.light| > t
- OUTPUT INTO buf
- SAMPLE PERIOD 1s
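The delta-compression query can be read as: store a new light sample only when it differs from the last stored one by more than t. A minimal sketch of that behavior follows. The function name `delta_compress`, the sample values, and the choice to always store the first sample (an empty buffer has nothing to compare against) are assumptions for illustration.

```python
from typing import Iterable, List, Optional


def delta_compress(samples: Iterable[int], threshold: int) -> List[int]:
    """Return the samples that would be written to the buffer."""
    reported: List[int] = []
    last: Optional[int] = None  # plays the role of buf.light
    for light in samples:
        # Store only when the change since the last stored value exceeds t.
        if last is None or abs(light - last) > threshold:
            reported.append(light)
            last = light
    return reported


reported = delta_compress([100, 102, 101, 110, 111, 95], threshold=5)
```

Here six samples shrink to three stored values, and every dropped sample is within t of the value stored before it.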
25Sample Query 4
- Offline Delivery & Event Chaining
- CREATE BUFFER equake_data(uint16 loc, uint16 xAccel, uint16 yAccel)
- SIZE 1000
- PARTITION BY NODE
- SELECT xAccel, yAccel
- FROM SENSORS
- WHERE xAccel > t OR yAccel > t
- SIGNAL shake_start()
- SAMPLE PERIOD 1s
- ON EVENT shake_start()
- SELECT loc, xAccel, yAccel
- FROM sensors
- OUTPUT INTO BUFFER equake_data(loc, xAccel, yAccel)
- SAMPLE PERIOD 10ms
26Event Based Processing
- Enables internal and chained actions
- Language Semantics
- Events are inter-node
- Buffers can be global
- Implementation plan
- Events and buffers must be local
- Since n-to-n communication not (well) supported
- Next: operator expressiveness
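27Operator Expressiveness: Aggregate Framework
- Standard SQL supports the basic 5
- MIN, MAX, SUM, AVERAGE, and COUNT
- We support any function conforming to
$Agg_n = \{f_{merge}, f_{init}, f_{evaluate}\}$
$f_{merge}\{\langle a_1 \rangle, \langle a_2 \rangle\} \rightarrow \langle a_{12} \rangle$
$f_{init}\{a_0\} \rightarrow \langle a_0 \rangle$
$f_{evaluate}\{\langle a_1 \rangle\} \rightarrow$ aggregate value
(Merge associative, commutative!)
Partial Aggregate: the bracketed state $\langle a \rangle$
Example: Average
$AVG_{merge}\{\langle S_1, C_1 \rangle, \langle S_2, C_2 \rangle\} \rightarrow \langle S_1 + S_2, C_1 + C_2 \rangle$
$AVG_{init}\{v\} \rightarrow \langle v, 1 \rangle$
$AVG_{evaluate}\{\langle S_1, C_1 \rangle\} \rightarrow S_1 / C_1$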
From Tiny AGgregation (TAG), Madden, Franklin,
Hellerstein, Hong. OSDI 2002 (to appear).
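The three-function framework above maps directly onto code. The sketch below expresses AVERAGE and MAX as (init, merge, evaluate) triples and merges partial aggregates from two subtrees. The `Aggregate` class and the `partial_state` helper are hypothetical names chosen for illustration, not TinyDB interfaces.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

S = TypeVar("S")  # type of the partial aggregate (partial state)


@dataclass(frozen=True)
class Aggregate(Generic[S]):
    init: Callable[[float], S]      # reading -> partial state
    merge: Callable[[S, S], S]      # must be associative and commutative
    evaluate: Callable[[S], float]  # partial state -> final value


# AVERAGE carries <sum, count> as its partial state.
AVG = Aggregate(
    init=lambda v: (v, 1),
    merge=lambda a, b: (a[0] + b[0], a[1] + b[1]),
    evaluate=lambda s: s[0] / s[1],
)

# MAX needs only the running maximum.
MAX = Aggregate(init=lambda v: v, merge=max, evaluate=lambda s: s)


def partial_state(agg: Aggregate, readings: Iterable[float]):
    """Fold a non-empty set of readings into one partial aggregate."""
    return reduce(agg.merge, (agg.init(v) for v in readings))


left = partial_state(AVG, [10.0, 20.0])   # subtree with two sensors
right = partial_state(AVG, [30.0])        # subtree with one sensor
root = AVG.merge(left, right)
average = AVG.evaluate(root)
```

Because merge is associative and commutative, the root gets the same answer no matter how the routing tree groups the sensors.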
28Isobar Finding
29Temporal Aggregates
- TAG was about spatial aggregates
- Inter-node, at the same time
- Want to be able to aggregate across time as well
- Two types
- Windowed: AGG(size,slide,attr)
- Decaying: AGG(comb_func, attr)
- Demo!
[Figure: windows with size = 4 and slide = 2 over readings R1 R2 R3 R4 R5 R6]
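A windowed temporal aggregate with size 4 and slide 2 over six readings produces two windows: R1..R4 and R3..R6. The sketch below shows one way to compute it. The function name `windowed` and the numeric readings are assumptions for illustration, and only complete windows are emitted.

```python
from typing import Callable, List, Sequence


def windowed(
    agg: Callable[[Sequence[float]], float],
    size: int,
    slide: int,
    readings: Sequence[float],
) -> List[float]:
    """Apply agg to each complete window of `size` readings, advancing by `slide`."""
    out: List[float] = []
    for start in range(0, len(readings) - size + 1, slide):
        out.append(agg(readings[start:start + size]))
    return out


readings = [1, 2, 3, 4, 5, 6]  # stand-ins for R1..R6
window_sums = windowed(sum, 4, 2, readings)
window_maxes = windowed(max, 4, 2, readings)
```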
30Expressiveness Review
- Internal nested queries
- With logging of results for offline delivery
- Event based processing
- Extensible aggregates
- Spatial & temporal
- On to Question 2: What about efficiency?
31Q2: Efficiency
- Metric: power consumption
- Goal: reduce communication, which dominates cost
- 800 instrs/bit!
- Standard approach in-network processing,
sleeping whenever you can
32But that's not good enough
- What else can we do to bring down costs?
- Sleep Even More?
- Events are key
- Apply automatic optimization!
- Semantically driven routing
- and topology construction
- Operator placement & ordering
- Adaptive data delivery
33TAG
- In-network processing
- Reduces costs depending on type of aggregates
- Exploitation of operator semantics
Tiny AGgregation (TAG), Madden, Franklin,
Hellerstein, Hong. OSDI 2002 (to appear).
34Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
[Figure: routing tree, Depth = d]
35Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
[Figure: Epoch 1; sensor-by-epoch table with partial counts 1, 1, 1, 1, 1]
36Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
[Figure: Epoch 2; sensor-by-epoch table with partial counts 3, 1, 2, 2, 1]
37Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
[Figure: Epoch 3; sensor-by-epoch table with partial counts 4, 1, 3, 2, 1]
38Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
[Figure: Epoch 4; sensor-by-epoch table with partial counts 5, 1, 3, 2, 1]
39Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
[Figure: Epoch 5; sensor-by-epoch table with partial counts 5, 1, 3, 2, 1]
40Simulation Result
- Simulation Results
- 2500 Nodes
- 50x50 Grid
- Depth 10
- Neighbors 20
Some aggregates require dramatically more state!
41Taxonomy of Aggregates
- TAG insight: classify aggregates according to various functional properties
- Yields a general set of optimizations that can automatically be applied
42Optimization Channel Sharing
- Insight: Shared channel enables optimizations
- Suppress messages that won't affect aggregate
- E.g., in a MAX query, sensor with value v hears a neighbor with value ≥ v, so it doesn't report
- Applies to all exemplary, monotonic aggregates
- Learn about query advertisements it missed
- If a sensor shows up in a new environment, it can learn about queries by looking at neighbors' messages.
- Root doesn't have to explicitly rebroadcast query!
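The message-suppression idea for a MAX query can be sketched as follows. This assumes a single lossless radio neighborhood where every node overhears every earlier report; that is a simplification for illustration, not how a real multihop network behaves. The function name `reports_for_max` and the node values are hypothetical.

```python
from typing import List, Optional, Tuple


def reports_for_max(nodes: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Nodes transmit in list order; each one has overheard all earlier reports."""
    best_heard: Optional[int] = None
    reports: List[Tuple[str, int]] = []
    for name, value in nodes:
        if best_heard is not None and best_heard >= value:
            continue  # a neighbor already reported a value >= ours: stay silent
        reports.append((name, value))
        best_heard = value
    return reports


nodes = [("a", 12), ("b", 9), ("c", 12), ("d", 20)]
reports = reports_for_max(nodes)
answer = max(v for _, v in reports)
```

Two of the four nodes stay silent, and the MAX computed from the remaining reports is unchanged.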
43Optimization Hypothesis Testing
- Insight: Root can provide information that will suppress readings that cannot affect the final aggregate value.
- E.g. Tell all the nodes that the MIN is definitely below some bound, so nodes whose readings are at or above that bound need not participate.
- Depends on monotonicity
- How is hypothesis computed?
- Blind guess
- Statistically informed guess
- Observation over first few levels of tree /
rounds of aggregate
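A minimal sketch of hypothesis testing for a MIN query: the root announces a guessed upper bound on the answer, and only nodes with readings below it respond. The function name, the readings, the guessed bounds, and the fallback to a full round when the guess turns out too low are all assumptions for illustration.

```python
from typing import Dict, Tuple


def min_with_hypothesis(readings: Dict[int, int], hypothesis: int) -> Tuple[int, int]:
    """Return (MIN, number of nodes that had to report)."""
    participants = {n: v for n, v in readings.items() if v < hypothesis}
    if not participants:
        # The guess was too low and nobody answered: run an ordinary full round.
        return min(readings.values()), len(readings)
    return min(participants.values()), len(participants)


readings = {1: 70, 2: 42, 3: 55, 4: 90}
good_guess = min_with_hypothesis(readings, hypothesis=50)
bad_guess = min_with_hypothesis(readings, hypothesis=30)
```

A well-chosen bound cuts the reports from four to one. A bound that is too low saves nothing, which is why the quality of the guess matters.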
44Optimization Use Multiple Parents
- For duplicate insensitive aggregates
- Or aggregates that can be expressed as a linear combination of parts
- Send (part of) aggregate to all parents
- Decreases variance
- Dramatically, when there are lots of parents
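To see why splitting an aggregate across parents decreases variance, consider a partial COUNT sent over links that each succeed independently with the same probability. The sketch below enumerates the outcomes exactly. The function name `delivered_stats`, the count of 10, and the 0.8 success probability are assumptions for illustration.

```python
from math import comb
from typing import Tuple


def delivered_stats(count: float, parents: int, p_success: float) -> Tuple[float, float]:
    """Mean and variance of the count that arrives when `count` is split
    evenly across `parents` independent links."""
    share = count / parents
    mean = 0.0
    second_moment = 0.0
    for k in range(parents + 1):  # k of the links succeed
        prob = comb(parents, k) * p_success**k * (1 - p_success) ** (parents - k)
        delivered = k * share
        mean += prob * delivered
        second_moment += prob * delivered**2
    return mean, second_moment - mean**2


one_parent = delivered_stats(10, 1, 0.8)
two_parents = delivered_stats(10, 2, 0.8)
```

Under these assumptions the expected delivered count is the same in both cases, while the variance with k parents is 1/k of the single-parent variance.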
45TAG Summary
- In-network query processing: a win for many aggregate functions
- By exploiting general functional properties of operators, many optimizations are possible
- Requires new aggregates to be tagged with their properties
- Up next: non-aggregate query processing optimizations, a flavor of things to come!
46Attribute Driven Topology Selection
- Observation: internal queries often over local area
- Or some other subset of the network
- E.g. regions with light value in [10,20]
- Idea: build topology for those queries based on values of range-selected attributes
- Requires range attributes, connectivity to be relatively static
Heidemann et al., Building Efficient Wireless Sensor Networks With Low-Level Naming. SOSP, 2001.
47Attribute Driven Query Propagation
SELECT ... WHERE a > 5 AND a < ...
[Figure: Precomputed intervals form a query dissemination index; nodes 4 and 1, 2, 3, with intervals [1,10], [20,40], [7,15]]
48Attribute Driven Parent Selection
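Even without intervals, expect that sending to parent with closest value will help
[Figure: nodes 1, 2, 3 with intervals [1,10], [20,40], [7,15]; node 4 with interval [3,6]]
$[3,6] \cap [1,10] = [3,6]$
$[3,7] \cap [7,15] = \emptyset$
$[3,7] \cap [20,40] = \emptyset$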
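The interval idea can be sketched as a small lookup: each child advertises the range of attribute values found beneath it, and a range query is forwarded only to children whose interval overlaps the query. The names `intersect` and `children_to_query`, the use of closed integer intervals, and the pairing of node ids with intervals are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # closed interval [lo, hi]


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the overlap of two closed intervals, or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def children_to_query(query: Interval, index: Dict[int, Interval]) -> List[int]:
    """Children whose advertised interval overlaps the query range."""
    return [
        child
        for child, covered in index.items()
        if intersect(query, covered) is not None
    ]


index = {1: (1, 10), 2: (20, 40), 3: (7, 15)}
targets = children_to_query((3, 6), index)
```

With these intervals a query over [3,6] is sent to one child instead of three.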
49Hot off the press
50Operator Placement & Ordering
- Observation: Nested queries, triggers, and joins can often be re-ordered
- Ordering can dramatically affect the amount of work you do
- Lots of standard database tricks here
51Operator Ordering Example 1
- SELECT light, mag
- FROM sensors
- WHERE pred1(mag)
- AND pred2(light)
- SAMPLE INTERVAL 1s
- Cost (in J) of sampling mag >> cost of sampling light
- Correct ordering (unless pred1 is very selective)
- 1. Sample light
- 2. Apply pred2
- 3. Sample mag
- 4. Apply pred1
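The ordering argument can be checked with a small expected-cost calculation, assuming sampling mag is costlier than sampling light and the two predicates are independent. The per-sample costs and selectivities below are made-up numbers in arbitrary energy units, and `expected_cost` is a hypothetical helper.

```python
from typing import Dict, Sequence


def expected_cost(
    order: Sequence[str],
    cost: Dict[str, float],
    selectivity: Dict[str, float],
) -> float:
    """Expected energy per epoch when attributes are sampled in `order` and
    sampling stops as soon as a predicate rejects the tuple."""
    total = 0.0
    pass_prob = 1.0  # probability that all earlier predicates passed
    for attr in order:
        total += pass_prob * cost[attr]
        pass_prob *= selectivity[attr]  # fraction of tuples passing this predicate
    return total


cost = {"light": 1.0, "mag": 50.0}  # hypothetical cost of one sample
typical = {"light": 0.5, "mag": 0.5}
light_first = expected_cost(["light", "mag"], cost, typical)
mag_first = expected_cost(["mag", "light"], cost, typical)

# Extreme case: pred1 rejects everything and pred2 accepts everything.
extreme = {"light": 1.0, "mag": 0.0}
light_first_extreme = expected_cost(["light", "mag"], cost, extreme)
mag_first_extreme = expected_cost(["mag", "light"], cost, extreme)
```

With these numbers, sampling light first roughly halves the expected cost, and the order only flips in the extreme case where pred1 rejects nearly every tuple while pred2 filters almost nothing.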
52Operator Ordering Example 2
Every time an event occurs that satisfies pred1,
sample lights once every 5 seconds for 30 seconds
and report the samples that satisfy pred2
- ON EVENT bird-enter()
- WHERE pred1(event)
- SELECT light
- WHERE pred2(light)
- FROM sensors
- SAMPLE INTERVAL 5s
- FOR 30s
Note makes all samples in phase in sample window
Sample light once every 5 seconds. For every
sample that satisfies pred2, check and see if any
events that satisfy pred1 have occurred in the
last 30 seconds.
SELECT s.light
FROM bird-enter-events[30s] AS e, sensors AS s
WHERE e.time < s.time AND pred1(e) AND pred2(s.light)
SAMPLE INTERVAL 5s
53Adaptivity For Contention
- Observation: Under high contention, radios deliver fewer total packets than under low contention.
- Insight: Don't allow radios to be highly contested. Drop or aggregate instead.
- Higher throughput
- Choice over what gets lost
- Based on semantics!
54Adaptivity for Power Conservation
- For many applications, exact sample rate doesn't matter
- But network lifetime does!
- Idea: adaptively adjust sample rate & extent of aggregation based on lifetime goal and observed power consumption
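One simple way to turn a lifetime goal into a sample rate is to spread the remaining energy evenly over the remaining time, then recompute as the observed cost per sample changes. The sketch below does only that. The function name, the parameters, and all numbers are assumptions for illustration, not a description of TinyDB's policy.

```python
def next_sample_period(
    remaining_energy_j: float,
    remaining_lifetime_s: float,
    energy_per_sample_j: float,
    min_period_s: float = 1.0,
) -> float:
    """Longest-lived schedule: spend the remaining energy evenly over the goal."""
    if remaining_energy_j <= 0 or energy_per_sample_j <= 0:
        raise ValueError("energy values must be positive")
    affordable_samples = remaining_energy_j / energy_per_sample_j
    period = remaining_lifetime_s / affordable_samples
    return max(period, min_period_s)  # never sample faster than the query asked


# Hypothetical node: 1000 J left, 100000 s still to go.
planned = next_sample_period(1000.0, 100_000.0, energy_per_sample_j=0.05)
# Observed cost per sample turns out to be twice the estimate: slow down.
adjusted = next_sample_period(1000.0, 100_000.0, energy_per_sample_j=0.10)
```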
55Efficiency Summary
- Power is the important metric
- TAG
- In-network processing
- Exploit semantics of network and operators
- Channel sharing
- Hypothesis testing
- Using multiple parents
- Indexing for dissemination & collection of data
- Placement and Operator Ordering
- Adaptive Sampling
56Q3: Answer Quality
- Lots of possibilities for improving quality
- Multi-path routing
- When applicable
- Transactional delivery
- a.k.a. custody transfer
- Link-layer retransmission
- Caching
- Failure still possible in all modes
- Open question: what's the right quality metric?
57Diffusion as TinyDB Foundation?
- Claim: diffusion is an infrastructure upon which TinyDB could be built
- Via declarative language, TinyDB is able to provide semantic guarantees and transparent optimization
- Operators can be reordered
- Any tuple can be routed to any operator
- No (important) duplicates will be produced
- At what cost? Diffusion can
- Adjust better to loss
- Exploit well-connected networks
- Provide n-m routing, instead of n-1 routing
- Might allow global buffers, events, etc.
58Summary
- Declarative queries are the right interface for data collection in sensor nets!
- In-network processing and optimization make approach viable
- Big query language improvements coming soon
- Event driven & internal queries
- Adaptive sampling & query indexes for performance!
- TinyDB Available Today
- http://telegraph.cs.berkeley.edu/tinydb
59Questions?
60Grouping
- GROUP BY expr
- expr is an expression over one or more attributes
- Evaluation of expr yields a group number
- Each reading is a member of exactly one group
- Example SELECT max(light) FROM sensors
- GROUP BY TRUNC(temp/10)
Result
61Having
- HAVING preds
- preds filters out groups that do not satisfy predicate
- versus WHERE, which filters out tuples that do not satisfy predicate
- Example
- SELECT max(temp) FROM sensors
- GROUP BY light
- HAVING max(temp) < 100
- Yields all groups with temperature under 100
62Group Eviction
- Problem: Number of groups in any one iteration may exceed available storage on sensor
- Solution: Evict!
- Choose one or more groups to forward up tree
- Rely on nodes further up tree, or root, to recombine groups properly
- What policy to choose?
- Intuitively: least popular group, since don't want to evict a group that will receive more values this epoch.
- Experiments suggest
- Policy matters very little
- Evicting as many groups as will fit into a single message is good
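Group eviction can be sketched with a fixed-capacity table of per-group state for a grouped MAX query. When a new group arrives and the table is full, the least popular group (fewest values so far) is forwarded up the tree, and a node further up recombines whatever it receives. The `GroupTable` class, the `recombine` helper, the capacity of 2, and the readings are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple


class GroupTable:
    """Fixed-capacity table of per-group MAX state with least-popular eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.state: Dict[int, Tuple[int, int]] = {}  # group -> (max so far, values seen)
        self.evicted: List[Tuple[int, int]] = []     # (group, partial max) sent up the tree

    def add(self, group: int, value: int) -> None:
        if group in self.state:
            cur, n = self.state[group]
            self.state[group] = (max(cur, value), n + 1)
            return
        if len(self.state) >= self.capacity:
            # Evict the group that has received the fewest values.
            victim = min(self.state, key=lambda g: self.state[g][1])
            self.evicted.append((victim, self.state.pop(victim)[0]))
        self.state[group] = (value, 1)


def recombine(messages: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """What a node further up the tree does: merge partial MAX values per group."""
    out: Dict[int, int] = {}
    for group, value in messages:
        out[group] = max(value, out.get(group, value))
    return out


# (temp, light) readings, grouped by temp // 10 in the spirit of TRUNC(temp/10).
table = GroupTable(capacity=2)
for temp, light in [(21, 300), (25, 320), (33, 280), (47, 500), (22, 310)]:
    table.add(temp // 10, light)

remaining = [(g, s[0]) for g, s in table.state.items()]
result = recombine(table.evicted + remaining)
```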
63Simulation Environment
- Java-based simulation & visualization for validating algorithms, collecting data.
- Coarse grained event based simulation
- Sensors arranged on a grid, radio connectivity by Euclidean distance
- Communication model
- Lossless: All neighbors hear all messages
- Lossy: Messages lost with probability that increases with distance
- Symmetric links
- No collisions, hidden terminals, etc.
64Simulation Screenshot
65Experiment Basic TAG
- Dense Packing, Ideal Communication
66Experiment Hypothesis Testing
- Uniform Value Distribution, Dense Packing, Ideal
Communication
67Experiment Effects of Loss
68Experiment Benefit of Cache
69Pipelined Aggregates
- After query propagates, during each epoch
- Each sensor samples local sensors once
- Combines them with PSRs (partial state records) from children
- Outputs PSR representing aggregate state in the previous epoch.
- After (d-1) epochs, PSR for the whole tree output at root
- d = Depth of the routing tree
- If desired, partial state from top k levels could be output in kth epoch
- To avoid combining PSRs from different epochs, sensors must cache values from children
Value from 2 produced at time t arrives at 1 at time (t+1)
Value from 5 produced at time t arrives at 1 at time (t+3)
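The pipelining described above can be simulated in a few lines for COUNT. The five-node tree below is an assumption chosen to match the two timing notes (sensor 2 is one hop from root 1, sensor 5 is three hops away). The function name `pipelined_count` is hypothetical. In each epoch a node outputs 1 for its own reading plus whatever its children output in the previous epoch.

```python
from typing import Dict, List


def pipelined_count(parent: Dict[int, int], root: int, epochs: int) -> List[int]:
    """Return the COUNT the root outputs in each epoch."""
    children: Dict[int, List[int]] = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    nodes = set(parent) | {root}

    prev = {n: 0 for n in nodes}  # nothing has been heard before epoch 1
    history: List[int] = []
    for _ in range(epochs):
        # Own reading (1) plus the children's outputs from the previous epoch.
        cur = {n: 1 + sum(prev[c] for c in children.get(n, [])) for n in nodes}
        history.append(cur[root])
        prev = cur
    return history


# Assumed topology: 2 and 3 report to 1, 4 reports to 3, 5 reports to 4.
parent = {2: 1, 3: 1, 4: 3, 5: 4}
root_counts = pipelined_count(parent, root=1, epochs=5)
```

On this tree the root's output climbs over the first four epochs and then stays at the full count of 5, since the deepest sensor's value needs three epochs after the first to reach the root.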
70Pipelining Example
71Pipelining Example
Epoch 0
72Pipelining Example
Epoch 1
73Pipelining Example
Epoch 2
74Pipelining Example
Epoch 3
75Pipelining Example
Epoch 4
76Our Stream Semantics
- One stream, sensors
- We control data rates
- Joins between that stream and buffers are allowed
- Joins are always landmark, forward in time, one tuple at a time
- Result of queries over sensors either a single tuple (at time of query) or a stream
- Easy to interface to more sophisticated systems
- Temporal aggregates enable fancy window operations
77Formal Spec.
- ON EVENT ... WITHIN
SELECT agg() temporalagg()
FROM sensors events
WHERE
GROUP BY
HAVING
ACTION WHERE BUFFER SIGNAL () (SELECT ...) INTO BUFFER
SAMPLE PERIOD FOR INTERPOLATE COMBINE temporal_agg()
ONCE
78Buffer Commands
- AT
- CREATE BUFFER ()
- PARTITION BY
- SIZE ,
- AS SELECT ...
- SAMPLE PERIOD
- DROP BUFFER