Title: Queries Over Streaming Sensor Data
1. Queries Over Streaming Sensor Data
- Samuel Madden
- Qualifying Exam
- University of California, Berkeley
- May 14th, 2002
2. Introduction
- Sensor networks are here
- Berkeley on the cutting edge
- Data collection and monitoring are a driving application
- My research
- Query processing for sensor networks
- Server (DBMS) side issues
- In-network issues
- Goal: Understand how to pose, distribute, and process queries over streaming, lossy, wireless, and power-constrained data sources such as sensor networks.
3. Overview
- Introduction
- Sensor Networks & TinyOS
- Research Goals
- Completed Research: Sensor Network QP
- Central Query Processor
- In Network, on Sensors
- Research Plan
- Future Implementation & Research Efforts
- Timeline
- Related Work
5. Sensor Networks & TinyOS
- A collection of small, radio-equipped, battery-powered, networked microprocessors
- Typically ad-hoc, multihop networks
- Single devices unreliable
- Very low power: tiny batteries or solar cells power them for months
- Berkeley's version: Mica Motes
- TinyOS operating system (services)
- 4K RAM, 512K EEPROM, 128K code space
- Lossy: 20% loss @ 5m in Ganesan et al. experiments
- Communication very expensive
- 800 instrs/bit transmitted
- Apps: Environment Monitoring, Personal Nets, Object Tracking
- Data processing plays a key role!
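To make the 800 instructions/bit figure concrete, a back-of-the-envelope helper can convert one radio message into instruction-equivalents. Only the 800 instrs/bit ratio comes from the slide; the 30-byte message size is a hypothetical example, not a measured TinyOS packet size.

```java
/** Back-of-the-envelope cost of radio transmission, using the slide's 800 instructions/bit ratio. */
final class CommCost {
    // Ratio quoted on the slide: sending one bit costs roughly as much energy as 800 instructions.
    static final long INSTRS_PER_BIT = 800;

    /** Instruction-equivalents spent to transmit a message of the given size. */
    static long instructionEquivalent(int messageBytes) {
        return (long) messageBytes * 8 * INSTRS_PER_BIT;
    }

    public static void main(String[] args) {
        int bytes = 30; // hypothetical message size, for illustration only
        System.out.println(bytes + "-byte message ~ " + instructionEquivalent(bytes) + " instructions");
        // prints "30-byte message ~ 192000 instructions"
    }
}
```

Under this ratio, thousands of instructions of local filtering or aggregation cost less than one avoided message, which is why in-network data processing pays off.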
7. Motivation
- Why apply a database approach to sensor network data processing?
- Declarative Queries
- Data independence
- Optimization opportunities
- Hide low-level complexities
- Familiar interface
- Work sharing
- Adaptivity
- Proper interfaces can leverage existing database systems
- TeleTiny architecture offers all of these
- Suitable for a variety of lossy, streaming environments (not just TinyOS!)
- Sharing & Adaptivity are Themes
8. Architecture
[Figure: architecture diagram; labels include Telegraph and "Fjords: handle push-based data"]
Lots of help!
- Fjords: ICDE 2002, with Franklin
- CACQ: SIGMOD 2002, with Shah, Hellerstein, Raman, Franklin
- TAG: WMCSA 2002, with Szewczyk, Culler, Franklin, Hellerstein, Hong
- Catalog: with Hong
10. Sensor Network Query Processing Challenges
- Query processor must be able to:
- Tolerate lossy data delivery
- Handle failure of individual data sources
- Conserve power on devices whenever possible
- Perhaps by using on-board processing
- E.g. applying selection predicates in-network
- Or by sharing work wherever possible
- Handle push-based data
- Handle streaming data
11. Server-Side Sensor QP
- Mechanisms
- Continuous Queries
- Sensor Proxies
- Fjord Query Plan Architecture
- Stream Sensitive Operators
12. Continuous Queries (CQ)
- Long-running queries
- User installs
- Continuously receive answers until deinstallation
- Common in streaming domain
- Instantaneous snapshots don't tell you much; may not be interested in history
- Monitoring Queries
- Examine light levels and locate rooms that are in use
- Monitor the temperature in my workspace and adjust it to be in the range (x, y)
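A minimal sketch of the install / receive / deinstall life cycle, assuming a hypothetical in-memory engine (`CqEngine`) that evaluates every installed query on each arriving reading. It mirrors the monitoring example of flagging temperatures outside (x, y); it is not the Telegraph or CACQ API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoublePredicate;

/** Hypothetical continuous-query engine: queries stay installed and see every new reading. */
final class CqEngine {
    private final Map<String, DoublePredicate> installed = new LinkedHashMap<>();
    private final Map<String, List<Double>> answers = new LinkedHashMap<>();

    void install(String queryId, DoublePredicate predicate) {
        installed.put(queryId, predicate);
        answers.putIfAbsent(queryId, new ArrayList<>());
    }

    void deinstall(String queryId) {
        installed.remove(queryId); // answers already produced are kept; no new ones arrive
    }

    /** Called once per streaming reading; every installed query is evaluated on it. */
    void onReading(double value) {
        for (Map.Entry<String, DoublePredicate> q : installed.entrySet()) {
            if (q.getValue().test(value)) {
                answers.get(q.getKey()).add(value);
            }
        }
    }

    List<Double> answersFor(String queryId) {
        return answers.getOrDefault(queryId, List.of());
    }

    public static void main(String[] args) {
        double x = 18.0, y = 24.0; // hypothetical comfort range (x, y)
        CqEngine engine = new CqEngine();
        engine.install("outOfRange", t -> t <= x || t >= y);
        for (double t : new double[] {20.5, 25.1, 17.2}) engine.onReading(t);
        engine.deinstall("outOfRange");
        engine.onReading(30.0); // arrives after deinstallation, so it is not reported
        System.out.println(engine.answersFor("outOfRange")); // prints [25.1, 17.2]
    }
}
```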
13. Continuously Adaptive Continuous Queries (CACQ)
- Given user queries over current sensor data
- Expect that many queries will be over the same data sources (e.g. traffic sensors)
- Queries over current data are always looking at the same tuples
- Those queries can share
- Current tuples
- Work (e.g. selections)
- Sharing reduces computation and communication
- Continuously Adaptive
- When sharing work, queries come and go
- Over long periods of time, selectivities change
- Assumptions that were valid at the start of the query are no longer valid
14. CACQ Overview
[Figure: CACQ overview diagram; labels show tuples R1, R2 and S1 through S6]
15. Work Sharing via Tuple Lineage
Q1: SELECT * FROM s WHERE A, B, C
Q2: SELECT * FROM s WHERE A, B, D
[Figure: "Conventional Queries" with separate plans for Query 1 and Query 2, versus a shared plan over Data Stream S; tuple labels go from s(C,D,B,A) through s(C,D,B), s(C,D), and s(C) to s()]
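The sharing idea can be sketched with a tuple that carries the set of queries it can still satisfy: each predicate runs at most once per tuple, however many queries reference it, and a failed predicate removes every query that needs it. The predicate definitions, names, and fixed routing order below are assumptions for illustration, not CACQ's actual lineage bitmaps or eddy routing policy.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.IntPredicate;

/** Sketch: shared evaluation of Q1 = {A,B,C} and Q2 = {A,B,D} over one stream. */
final class LineageDemo {
    static int evaluations = 0; // total predicate invocations, to compare with per-query plans

    /** Routes one tuple; returns the queries it satisfies. Each predicate runs at most once. */
    static Set<String> route(int value, Map<String, IntPredicate> preds,
                             Map<String, Set<String>> queries) {
        Set<String> alive = new TreeSet<>(queries.keySet()); // lineage: queries still possible
        for (Map.Entry<String, IntPredicate> p : preds.entrySet()) { // fixed order; an eddy adapts it
            String name = p.getKey();
            // Visit a predicate only if some still-live query requires it.
            boolean needed = alive.stream().anyMatch(q -> queries.get(q).contains(name));
            if (!needed) continue;
            evaluations++;
            if (!p.getValue().test(value)) {
                alive.removeIf(q -> queries.get(q).contains(name)); // drop all queries needing it
            }
        }
        return alive;
    }

    static Map<String, IntPredicate> predicates() {
        Map<String, IntPredicate> preds = new LinkedHashMap<>(); // hypothetical predicates A..D
        preds.put("A", v -> v > 0);
        preds.put("B", v -> v < 100);
        preds.put("C", v -> v % 2 == 0);
        preds.put("D", v -> v % 3 == 0);
        return preds;
    }

    static Map<String, Set<String>> queries() {
        Map<String, Set<String>> qs = new LinkedHashMap<>();
        qs.put("Q1", Set.of("A", "B", "C"));
        qs.put("Q2", Set.of("A", "B", "D"));
        return qs;
    }

    public static void main(String[] args) {
        Set<String> result = route(6, predicates(), queries());
        System.out.println(result + " after " + evaluations + " evaluations"); // [Q1, Q2] after 4 evaluations
    }
}
```

Two separate plans would run six predicate evaluations per tuple (three per query); sharing A and B brings that to at most four.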
16. CACQ Contributions
- Continuous adaptivity (operator reordering) via eddies
- All queries within the same eddy
- Routing policies to enable that reordering
- Explicit Tuple Lineage
- Within each tuple, store where it has been and where it must go
- Maximizes sharing of tuples between queries
- Grouped Filter
- Predicate index that applies range & equality selections for multiple queries at the same time
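A grouped filter can be sketched with sorted maps keyed by each query's constant, so one range lookup per arriving value finds every satisfied query instead of testing the queries one by one. This is a simplified illustration over a single integer attribute with `>`, `<`, and `=` predicates; the class and method names are hypothetical, not CACQ's actual data structure.

```java
import java.util.*;

/** Sketch of a grouped filter on one attribute: many queries' predicates indexed together. */
final class GroupedFilter {
    // constant -> queries with predicate "attr > constant"
    private final TreeMap<Integer, List<String>> greater = new TreeMap<>();
    // constant -> queries with predicate "attr < constant"
    private final TreeMap<Integer, List<String>> less = new TreeMap<>();
    // constant -> queries with predicate "attr = constant"
    private final Map<Integer, List<String>> equal = new HashMap<>();

    void addGreater(String q, int c) { greater.computeIfAbsent(c, k -> new ArrayList<>()).add(q); }
    void addLess(String q, int c)    { less.computeIfAbsent(c, k -> new ArrayList<>()).add(q); }
    void addEqual(String q, int c)   { equal.computeIfAbsent(c, k -> new ArrayList<>()).add(q); }

    /** Returns every query whose predicate the value satisfies. */
    Set<String> match(int v) {
        Set<String> out = new TreeSet<>();
        greater.headMap(v, false).values().forEach(out::addAll); // constants c < v satisfy v > c
        less.tailMap(v, false).values().forEach(out::addAll);    // constants c > v satisfy v < c
        out.addAll(equal.getOrDefault(v, List.of()));
        return out;
    }

    public static void main(String[] args) {
        GroupedFilter f = new GroupedFilter();
        f.addGreater("Q1", 10);
        f.addGreater("Q2", 50);
        f.addLess("Q3", 20);
        f.addEqual("Q4", 30);
        System.out.println(f.match(30)); // prints [Q1, Q4]
    }
}
```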
17. CACQ vs. NiagaraCQ
- Performance comparable for one experiment in the NCQ paper
- Example where CACQ destroys NCQ:
SELECT stocks.sym, articles.text
FROM stocks, articles
WHERE stocks.sym = articles.sym
AND UDF(stocks)
[Figure annotations: "Expensive" (the UDF); "result > stocks"]
18. CACQ vs. NiagaraCQ (2)
[Figure: plan diagrams; labels S, A, and SA]
19. CACQ vs. NiagaraCQ Graph
20. CACQ Review
- Many Queries, One Eddy
- Fine-Grained Adaptivity
- Grouped Filter: Predicate Index
- Tuple Lineage
21. Sensor Proxy
- CQ is a query processing mechanism; we still need to get data from the sensors
- Mediate between sensors and query processor
- Push operators out to sensors
- Hide query processing and knowledge of multiple queries from sensors
- Hide details of sensors from query processor
- Enable power-sensitivity
[Figure: diagram labeled "Query Processor"]
22. Fjording the Stream
- Sensors, even through the proxy, deliver data in unusual ways
- Query plan implementation
- Useful for streams and distributed environments
- Combine push (streaming) data and pull (static) data
- E.g. traffic sensors with CHP accident reports
23. Summary of Server-Side QP
- CACQ (SIGMOD)
- Enables sharing of work between long-running queries
- Enables adaptivity for long-running queries
- Sensor Proxy
- Hides QP complexity from sensors, and power issues from QP
- Fjords (ICDE)
- Enable combination of push and pull data
- Non-blocking processing integral to the query processor
24. Sensor-Side Sensor QP
- Research thus far allows the central QP to play nice with sensors
- Doesn't address how sensors can help with QP
- Use their processors to process queries
- Advertise their capabilities and data sources
- Control data delivery rates
- Detect, report, and mitigate errors and failures
- Two pieces thus far
- Tiny Aggregation (TAG): WMCSA paper, resubmission in progress
- Catalog
- Lots of work in progress!
25. Catalog
- Problem: Given a heterogeneous environment full of motes, how do I know what data they can provide or process?
- Solution: Store a small catalog on each device describing its capabilities
- Mirror that catalog centrally to avoid overloading sensors
- Enables data independence
- Catalog Content
- For each attribute
- Name, Type, Size
- Units (e.g. Fahrenheit)
- Resolution (e.g. 10 bits)
- Calibration Information
- Accessor functions
- Cost information
- Power, time, maximum sample rate
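One way to picture a catalog entry, assuming a hypothetical in-memory representation: the slide's fields (name, type, size, units, resolution, calibration, accessor, cost) map onto the fields below, but the linear calibration model, the cost units, and all names are illustrative assumptions, not TeleTiny's schema API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntSupplier;

/** Sketch of a small per-mote catalog; the fields follow the slide, all types are assumptions. */
final class MoteCatalog {
    static final class Attribute {
        final String name, type, units;
        final int sizeBytes, resolutionBits;
        final double calScale, calOffset; // hypothetical linear calibration: raw * scale + offset
        final IntSupplier accessor;       // accessor function that samples the sensor
        final double powerCost;           // cost information, in made-up units
        final int sampleTimeMs, maxSampleRateHz;

        Attribute(String name, String type, int sizeBytes, String units, int resolutionBits,
                  double calScale, double calOffset, IntSupplier accessor,
                  double powerCost, int sampleTimeMs, int maxSampleRateHz) {
            this.name = name; this.type = type; this.sizeBytes = sizeBytes; this.units = units;
            this.resolutionBits = resolutionBits; this.calScale = calScale; this.calOffset = calOffset;
            this.accessor = accessor; this.powerCost = powerCost;
            this.sampleTimeMs = sampleTimeMs; this.maxSampleRateHz = maxSampleRateHz;
        }
    }

    private final Map<String, Attribute> attrs = new LinkedHashMap<>();

    void register(Attribute a) { attrs.put(a.name, a); }

    boolean has(String name) { return attrs.containsKey(name); }

    /** Samples the attribute through its accessor and applies its calibration. */
    double read(String name) {
        Attribute a = attrs.get(name);
        return a.accessor.getAsInt() * a.calScale + a.calOffset;
    }

    /** Copy of the metadata that a central server could cache instead of querying the mote. */
    Map<String, Attribute> mirror() { return new LinkedHashMap<>(attrs); }
}
```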
26. Tiny Aggregation (TAG)
- How can sensors be leveraged in query processing?
- Insight: aggregate queries are the common case!
- Users want summaries of information across hundreds or thousands of nodes
- Information from individual nodes is
- Often uninteresting
- Could be expensive to retrieve at fine granularity
- Take advantage of tree-based multihop routing
- A common way to collect data at a centralized location
- Combine data at each level to compute aggregates in-network
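A minimal sketch of the idea, assuming the six-node tree from the query propagation example later in this deck (node 1 is the root) and ignoring loss and epochs: with in-network combining, each node sends one partial COUNT to its parent, whereas centralized collection relays every reading hop by hop. The message-counting model is a simplification, not TAG's measured cost.

```java
import java.util.Map;

/** Sketch: COUNT over a routing tree, combined in-network vs. relaying every reading to the root. */
final class TreeCount {
    static final int ROOT = 1;
    // child -> parent, taken from the query propagation figure (node 1 is the root)
    static final Map<Integer, Integer> PARENT = Map.of(2, 1, 3, 1, 4, 2, 6, 3, 5, 4);

    /** In-network: a node's partial COUNT is 1 (itself) plus its children's partial counts. */
    static int subtreeCount(int node) {
        int c = 1;
        for (Map.Entry<Integer, Integer> e : PARENT.entrySet()) {
            if (e.getValue() == node) c += subtreeCount(e.getKey());
        }
        return c;
    }

    /** In-network cost: every non-root node sends exactly one partial count to its parent. */
    static int tagMessages() { return PARENT.size(); }

    /** Centralized cost: each reading is relayed hop by hop, one message per hop to the root. */
    static int centralMessages() {
        int msgs = 0;
        for (int node : PARENT.keySet()) {
            for (int cur = node; cur != ROOT; cur = PARENT.get(cur)) msgs++;
        }
        return msgs;
    }
}
```

Even on this tiny tree, combining saves messages (5 vs. 9), and the gap grows with depth because the upper levels no longer relay every reading.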
27. Advantages of TAG
- Order of magnitude decrease in communication for some aggregates
- Streaming results
- Converge after transient errors
- Successive results in half the messages of the initial result
- Reduces the burden on the upper levels of the routing tree
- Declarative queries enable
- Optimizations based on a classification of aggregate properties
- Very simple to deploy, use
28. TAG Example
SELECT COUNT(*) FROM sensors
29. TAG Example
SELECT COUNT(*) FROM sensors
Epoch 0: (4, 0, 1), (6, 0, 1), (5, 0, 1)
30. TAG Example
SELECT COUNT(*) FROM sensors
Epoch 1: (3, 0, 2), (2, 0, 2), (4, 1, 1), (6, 1, 1), (5, 1, 1)
31. TAG Example
SELECT COUNT(*) FROM sensors
Epoch 2: (3, 1, 2), (2, 1, 3), (4, 2, 1), (6, 2, 1), (5, 2, 1)
At root: (1, 0, 6)
32. TAG Example
- Value at root is (d-1) epochs old
- New value every epoch
- Nodes must cache old values
SELECT COUNT(*) FROM sensors
Epoch 3: (3, 2, 2), (2, 2, 3), (4, 3, 1), (6, 3, 1), (5, 3, 1)
At root: (1, 1, 6)
33. TAG Optimizations & Loss Tolerance
- Optimizations to decrease message overhead
- When computing a MAX, nodes can suppress their own transmissions if they hear neighbors with greater values
- Or, the root can propagate down a hypothesis
- Suppress values that don't change between epochs
- Techniques to handle lossiness of the network
- Cache child results
- Send results up multiple paths in the routing tree
- Grouping
- Techniques for handling too many groups (aka group eviction)
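A minimal sketch of the first MAX optimization, assuming every node in a group can overhear every other and that they transmit one at a time in a fixed order (both simplifications). Ties are suppressed too, which does not change the result.

```java
/** Sketch of MAX with overhearing: nodes in one radio neighborhood transmit in a fixed order. */
final class MaxSuppression {
    /**
     * A node stays silent if it has already overheard a value at least as large as its own;
     * the maximum still reaches the parent. Returns {max, messagesSent}.
     */
    static int[] run(int[] valuesInSendOrder) {
        int bestHeard = Integer.MIN_VALUE; // nothing overheard yet
        int sent = 0;
        for (int v : valuesInSendOrder) {
            if (v > bestHeard) { // only a new maximum is worth a transmission
                bestHeard = v;
                sent++;
            }
        }
        return new int[] {bestHeard, sent};
    }
}
```

Seeding `bestHeard` with a value pushed down from the root gives the "hypothesis" variant; in that case, if nobody transmits, the root learns only that the MAX is at most its hypothesis and must fall back to an unsuppressed round.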
34. Experiment: Basic TAG
- Dense packing, ideal communication
35. Sensor QP Summary
- In-sensor query processing consists of
- TAG, for in-network aggregation
- Order of magnitude reduction in communication costs for simple aggregates
- Techniques for grouping, loss tolerance, and further reduction in costs
- Catalog, for tracking queryable attributes of sensors
- In upcoming implementation
- Selection predicates
- Multiplexing multiple queries over the network
37. What's Left?
- Development Tasks
- TeleTiny Implementation
- Sensor Proxy Policies & Implementation
- Telegraph (or some adaptive QP) Interface
- Research Tasks
- Publish / Follow-on to TAG
- Query Semantics
- Real-world Deployment Study
- Techniques for Reporting & Managing Resources & Loss
38. TeleTiny Implementation
- In progress (goal: ready for SIGMOD '02 demo)
- In TinyOS, for Mica Motes, with Wei Hong & JMH
- Features
- SELECT and aggregate queries processed in-network
- Ability to query arbitrary attributes
- Including power, signal strength, etc.
- Flexible architecture that can be extended with additional operators
- Multiple simultaneous queries
- UDFs / UDAs via VM
- Status
- Aggregation & Selection engine built
- No UDFs
- Primitive routing
- No optimizations
- Catalog interface designed, stub implementation
- 20kb of code space!
39. Sensor Proxy
- Sensor Proxy Issues
- How to choose what runs centrally and what runs on the motes?
- Some operators are obvious (e.g. join?)
- Storage or computation demands preclude running in-network
- For other operators, there is a choice
- Limited resources mean motes will not have capacity for all pushable operators
- So which subset of operators to push?
40. Sensor Proxy (cont.)
- Cost-based query optimization problem: what to optimize?
- Power load on network
- Central CPU costs
- Basic approach
- Push down as much as possible
- Push high-update-rate, low-state aggregate queries first
- These benefit most from TAG
- Satisfy other queries by sampling at the minimum rate that can satisfy all queries, processing centrally
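The basic approach can be sketched as a greedy planner. The single per-mote memory budget, the field names, and the rule that only aggregates are pushable are all hypothetical simplifications; the sketch also assumes that a query requesting rate r can be served by subsampling any faster stream, so the central sampling rate is the largest rate among the centrally processed queries.

```java
import java.util.*;

/** Sketch of the slide's greedy push-down heuristic under a single, hypothetical memory budget. */
final class ProxyPlanner {
    static final class Query {
        final String name;
        final double rateHz;     // requested update rate
        final int stateBytes;    // in-network state the query would need on each mote
        final boolean aggregate; // only aggregates are candidates for TAG push-down here
        Query(String name, double rateHz, int stateBytes, boolean aggregate) {
            this.name = name; this.rateHz = rateHz; this.stateBytes = stateBytes; this.aggregate = aggregate;
        }
    }

    final List<String> pushed = new ArrayList<>();
    final List<String> central = new ArrayList<>();
    double centralSampleRateHz = 0;

    ProxyPlanner(List<Query> queries, int moteBudgetBytes) {
        List<Query> order = new ArrayList<>(queries);
        // High update rate first, then low state: these benefit most from in-network aggregation.
        order.sort(Comparator.comparingDouble((Query q) -> -q.rateHz).thenComparingInt(q -> q.stateBytes));
        int used = 0;
        for (Query q : order) {
            if (q.aggregate && used + q.stateBytes <= moteBudgetBytes) {
                used += q.stateBytes;
                pushed.add(q.name);
            } else {
                central.add(q.name);
                // One shared sample stream must be fast enough for every centrally processed query.
                centralSampleRateHz = Math.max(centralSampleRateHz, q.rateHz);
            }
        }
    }
}
```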
41. Research: Real-World Study
- Goal: Characterize performance of TeleTiny on a building monitoring network running in the Intel Research Lab in the PowerBar building
- To:
- Demonstrate effectiveness of our approach
- Derive a number of important workload and real-world parameters that we can only speculate about
- Be cool.
- Also, Telegraph integration, which should offer
- CACQ over real sensors
- Historical data interface
- Queries that combine historical data and streaming sensor data
- Fancy adaptive / interactive features
- E.g. adjust sample rates on user demand
42. Real-World Study (Cont.)
- Measurements to obtain
- Types of queries
- Snapshot vs. continuous
- Loss & Failure Characteristics
- Lost messages, frequency of disconnection
- Power Characteristics
- Amount of Storage
- Server Load
- Variability in Data Rates
- Is adaptivity really needed?
- Lifetime of Queries
43. Research: Reporting & Mitigating Resource Consumption & Loss
- Resource scarcity & loss are endemic to the domain
- Problem: What techniques can be used to
- Accommodate the desired workload despite limited resources?
- Mitigate losses and inform users of them?
- Key issue because
- Dramatically affects usability of system
- Otherwise users will roll their own
- Dramatically affects quality of system
- Results are poor without some additional techniques
- Within themes of my research
- Sharing of resources
- Adaptivity to losses
44. Some Resource & Loss Tolerance Techniques
- Identify locations of loss
- E.g. annotate reported values with information about lost children
- Provide user with tradeoffs for smoothing loss
- TAG
- Cache results: temporal smearing
- Send to multiple parents: more messages, less variance
- Or, as in the STREAM project, compute lossy summaries of streams
- Offer user alternatives to unanswerable queries
- E.g. ask if a lower sample rate would be OK?
- Or if a nearby set of sensors would suffice?
- Educate. (Lower expectations!)
- Employ admission control, leases
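The "send to multiple parents" tradeoff can be made concrete with a simple model, which is an assumption for illustration rather than a result from these slides: a node with partial COUNT c loses each message independently with probability p. Sending all of c to one parent, versus splitting it evenly across k parents, gives the same expected value but a k-fold smaller variance, at the cost of k messages.

```latex
\begin{aligned}
\text{One parent:}\quad & X = c\,B,\quad B \sim \mathrm{Bernoulli}(1-p)
  && \mathbb{E}[X] = c(1-p), \quad \mathrm{Var}[X] = c^2 p(1-p) \\
\text{$k$ parents:}\quad & X = \sum_{i=1}^{k} \frac{c}{k}\,B_i,\quad B_i \overset{\text{iid}}{\sim} \mathrm{Bernoulli}(1-p)
  && \mathbb{E}[X] = c(1-p), \quad \mathrm{Var}[X] = k\left(\frac{c}{k}\right)^2 p(1-p) = \frac{c^2 p(1-p)}{k}
\end{aligned}
```

For example, with k = 2 the variance halves while the number of messages doubles, matching "more messages, less variance".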
45. Timeline
- May - June 2002
- Complete sensor-side software
- Schema API
- Catalog Server
- UDFs
- SIGMOD Demo
- ICDE Paper on stream semantics
- Resubmit TAG (to OSDI, hopefully)
- June - August 2002
- Telegraph Integration
- Sensor proxy implementation
- Instrument & Deploy Lab Monitoring, Begin Data Collection
46. Timeline (cont.)
- August - November 2002
- Telegraph historical results integration / implementation
- SIGMOD paper on Lab Monitoring deployment
- August - January 2003
- Explore and implement mechanisms for handling resource constraints & faults
- February 2003
- VLDB Paper on Resource Constraints
- February - June 2003
- Complete Dissertation
48. Related Work
- Database Research
- Cougar (Cornell)
- Sequences & Streams
- SEQ (Wisconsin) & Temporal Database Systems
- Stanford STREAM
- Architecture similar to CACQ
- State management
- Query semantics
- Continuous Queries
- NiagaraCQ (Wisconsin)
- PSoup (Chandrasekaran & Franklin)
- X/YFilter (Altinel & Franklin, Diao & Franklin)
- Adaptive / Interactive Query Processing
- CONTROL (Hellerstein et al.)
- Eddies (Avnur & Hellerstein)
- XJoin / Volcano (Urhan & Franklin, Graefe)
49. Related Work (Cont.)
- Sensor / Networking Research
- UCLA / ISI / USC (Estrin, Heidemann, et al.)
- Diffusion: sensor fusion & routing
- Low-level naming: mechanisms for data collection, joins?
- Application-specific aggregation
- Impact of Network Density on Data Aggregation
- Aka greedy aggregation, or how to choose a good topology
- Network measurements (Ganesan et al.)
- MIT (Balakrishnan, Morris, et al.)
- Fancy routing protocols (LEACH / Span)
- Insights into data delivery scheduling for power efficiency
- Intentional Naming System (INS)
- Berkeley / Intel
- TinyOS (Hill et al.), lots of discussion & ideas
50. Summary
- Query processing is a key feature for improving the usability of sensor networks
- The TeleTiny solution brings
- On the query processor
- Ability to combine & query data as it streams in
- Adaptivity and performance
- In the sensor network
- Power efficiency via in-network evaluation
- Catalog
- Upcoming research work
- Real-world deployment study
- Evaluation of techniques for resource usage & loss mitigation
- TAG resubmission
- Graduation, Summer 2003!
51. That's all, folks!
52. Sensor Networks
- A collection of small, radio-equipped, battery-powered, networked microprocessors
- Typically ad-hoc
- No predefined network routes
- Multihop
- Routes span at least one intermediate node
- Deployed in hundreds or thousands
- Little concern for reliability of a single device
- Very low power, such that tiny batteries or solar cells can keep them powered for months
- Popular in the research community
- Berkeley Motes
- USC / UCLA / Sensoria WINS Platform
- MIT Cricket
53. TinyOS Motes
- TinyOS Project
- Goal: build an operating system for sensor networks
- Prototype on simple devices built from off-the-shelf components (2cm x 3cm)
- Current generation devices (Mica Motes)
- 50kbit radios, 100ft range
- 4K RAM, 512K EEPROM
- Sensors: light, temperature, acceleration, sound, humidity, magnetic field
- Radio loss rate: 20% @ 5m range
- Communication dominates power cost: 800 instrs/bit transmitted
See Jason Hill, Robert Szewczyk, Alec Woo, Seth Hollar, David Culler, Kristofer Pister. System Architecture Directions for Networked Sensors. ASPLOS 2000.
54. TinyOS
- Lightweight OS for sensors
- Event-driven
- Software-based radio stack
- Linked into user programs written in C
- Features
- Network reprogramming
- Time synchronization
- Localization
- Simple VM
- Simulator
55. Sensor Network Applications
- Applications
- Environmental Monitoring
- Power, light, temp, movement in buildings
- Activity, weather outside
- Structural
- Moving Object Tracking
- Personal Networks
- Data processing plays a key role!
56. Fjords
- Operators (e.g. select, join) are data-direction agnostic
- Different modes of data delivery & consumption
- Implemented via queues (connectors)
- Synchronous / asynchronous result production
- Blocking / non-blocking result consumption
- Sensors: asynchronous production, non-blocking consumption
- Contrast with
- Iterator model (synchronous production, blocking consumption)
- Exchange operator (Graefe) (asynchronous production, blocking consumption)
57. Pull Example
- Operator
    Queue q;
    Tuple process() {
        Tuple t = q.get(), outt = null;
        if (t != null) outt = <process t>;
        else { /* do something else */ }
        return outt;
    }
- Pull Queue
    class PullQueue {
        Operator parent, child;
        Tuple get() {
            Tuple t = null;
            while (t == null) t = child.process();
            return t;
        }
    }
- Notice
- Iterator semantics by making get() blocking
- get() can return null
- process() can return null
[Figure: Pull Connection fed by a Scan operator]
58. Push Example
- Operator
    Queue q;
    Tuple process() {
        Tuple t = q.get(), outt = null;
        if (t != null) outt = <process t>;
        else { /* do something else */ }
        return outt;
    }
- Thread
    while (true) {
        Tuple t = op.process();
        if (t != null) op.outq.enqueue(t);
    }
- Push Queue
    class PushQueue {
        Operator parent, child;
        Vector<Tuple> v = new Vector<>();
        Tuple get() {
            if (v.size() > 0) return v.remove(0);
            else return null;
        }
        void enqueue(Tuple t) { v.add(t); }
    }
[Figure: Push Connection fed by a Scan operator]
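The pull and push pseudocode can be turned into a small runnable sketch. Tuples are simplified to `Integer`, `ArrayDeque` stands in for the slide's `Vector`, and the driver thread's loop body is exposed as a single `step` call so it can run deterministically; these names and choices are assumptions, not the Telegraph Fjords code.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntPredicate;

/** Runnable sketch of Fjord-style connectors; Select is unaware of which connector feeds it. */
final class Fjords {
    /** A connector between operators; get() may return null when no tuple is available. */
    interface Connector { Integer get(); }

    /** An operator; process() may return null when it produced nothing this call. */
    interface Operator { Integer process(); }

    /** Pull connection: get() keeps calling the child until it yields a tuple (iterator-like). */
    static final class PullQueue implements Connector {
        private final Operator child;
        PullQueue(Operator child) { this.child = child; }
        public Integer get() {
            Integer t = null;
            // Spins until the child produces; a real system needs an end-of-stream marker,
            // otherwise this never returns once a finite child is exhausted.
            while (t == null) t = child.process();
            return t;
        }
    }

    /** Push connection: a producer enqueues; get() never blocks and returns null when empty. */
    static final class PushQueue implements Connector {
        private final ArrayDeque<Integer> buf = new ArrayDeque<>();
        public Integer get() { return buf.poll(); }
        void enqueue(Integer t) { buf.add(t); }
    }

    /** Selection operator, written once and usable behind either connector. */
    static final class Select implements Operator {
        private final Connector in;
        private final IntPredicate pred;
        Select(Connector in, IntPredicate pred) { this.in = in; this.pred = pred; }
        public Integer process() {
            Integer t = in.get();
            if (t != null && pred.test(t)) return t;
            return null; // no input ready, or tuple filtered out: the caller can do something else
        }
    }

    /** Scan over a finite list; returns null once exhausted. */
    static Operator scan(List<Integer> data) {
        Iterator<Integer> it = data.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** One iteration of the slide's driver-thread loop: run op once, push any output downstream. */
    static void step(Operator op, PushQueue out) {
        Integer t = op.process();
        if (t != null) out.enqueue(t);
    }
}
```

The same `Select` works in both modes: behind a `PullQueue` it behaves like an iterator, and behind a `PushQueue` it returns immediately when no sensor data has arrived.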
59. Relational Operators and Streams
- In addition to the query plan mechanism, we need new operators
- Selection and Projection Apply Naturally
- Non-Blocking Operators
- Sorts and aggregates over the entire stream
- Nested loops and sort-merge join
- Windowed Operators
- Sorts, aggregates, etc.
- Online, Interactive QP Techniques
- In-memory symmetric hash join (Wilschut & Apers)
- Alternatives
- Ripple join (Haas & Hellerstein)
- XJoin (Urhan & Franklin), etc.
- Partial Results (Raman & Hellerstein)
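The symmetric hash join is the standard non-blocking join: each input keeps its own hash table, and every arriving tuple is inserted into its own side's table and then probed against the other side's, so results stream out without waiting for either input to end. A minimal sketch, assuming integer keys, string payloads, and unbounded memory (no windowing or spilling, unlike XJoin); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of an in-memory symmetric hash join that emits matches as tuples arrive. */
final class SymmetricHashJoin {
    private final Map<Integer, List<String>> leftTable = new HashMap<>();
    private final Map<Integer, List<String>> rightTable = new HashMap<>();

    /** A left tuple arrives: build into the left table, probe the right table. */
    List<String> onLeft(int key, String payload) {
        leftTable.computeIfAbsent(key, k -> new ArrayList<>()).add(payload);
        List<String> out = new ArrayList<>();
        for (String r : rightTable.getOrDefault(key, List.of())) out.add(payload + "|" + r);
        return out;
    }

    /** A right tuple arrives: build into the right table, probe the left table. */
    List<String> onRight(int key, String payload) {
        rightTable.computeIfAbsent(key, k -> new ArrayList<>()).add(payload);
        List<String> out = new ArrayList<>();
        for (String l : leftTable.getOrDefault(key, List.of())) out.add(l + "|" + payload);
        return out;
    }
}
```

Each matching pair is produced exactly once: by whichever of its two tuples arrives second.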
60. CACQ Architecture
61. Grouped Filter
62. Per-Tuple State
63-69. Tuple Lineage (animation over seven slides)
[Figure: tuples routed through SteMs on T.c, R.a, and S.b, shared by Query 1, Query 2, and Query 3; R tuples move through slides 63 to 67 and S tuples through slides 68 and 69]
70. Continuous Adaptivity Result
- Attributes uniformly distributed over (0, 100)
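71. Database-Style Aggregation
- SELECT agg_n(attr_n), attrs
- FROM sensors
- WHERE selPreds
- GROUP BY expr
- HAVING havingPreds
- EPOCH DURATION I
$\mathrm{Agg}_n = \langle f_{merge}, f_{init}, f_{evaluate} \rangle$
$f_{merge}(\langle a_1 \rangle, \langle a_2 \rangle) \rightarrow \langle a_{12} \rangle$
$f_{init}(v) \rightarrow \langle a_0 \rangle$
$f_{evaluate}(\langle a_1 \rangle) \rightarrow$ aggregate value
Example: Average
$\mathrm{AVG}_{merge}(\langle S_1, C_1 \rangle, \langle S_2, C_2 \rangle) \rightarrow \langle S_1 + S_2, C_1 + C_2 \rangle$
$\mathrm{AVG}_{init}(v) \rightarrow \langle v, 1 \rangle$
$\mathrm{AVG}_{evaluate}(\langle S_1, C_1 \rangle) \rightarrow S_1 / C_1$
Each $\langle \ldots \rangle$ tuple is a Partial State Record (PSR), representing the combination of local values and child aggregates at a particular node.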
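The three functions for AVG translate almost directly into code. In this sketch (names and types are assumptions) the PSR is a sum/count pair: a node would `init` its own reading, `merge` in each child's PSR, and only the root would call `evaluate`.

```java
/** Sketch of the init/merge/evaluate decomposition for AVG, with the PSR as a (sum, count) pair. */
final class AvgAgg {
    static final class Psr {
        final double sum;
        final long count;
        Psr(double sum, long count) { this.sum = sum; this.count = count; }
    }

    /** AVG_init(v) -> <v, 1> */
    static Psr init(double v) { return new Psr(v, 1); }

    /** AVG_merge(<S1, C1>, <S2, C2>) -> <S1 + S2, C1 + C2> */
    static Psr merge(Psr a, Psr b) { return new Psr(a.sum + b.sum, a.count + b.count); }

    /** AVG_evaluate(<S, C>) -> S / C */
    static double evaluate(Psr p) { return p.sum / p.count; }
}
```

Because merge is associative and commutative, children's PSRs can be combined in any order as they arrive. Keeping only a running average instead of the pair would not merge correctly, which is why the PSR carries both the sum and the count.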
72. Query Propagation
- TAG is propagation-agnostic
- Any algorithm that can
- Deliver the query to all sensors
- Provide all sensors with one or more duplicate-free routes to some root
- One approach: flood
- Query introduced at a root, rebroadcast by all sensors until it reaches the leaves
- Sensors pick a parent and level when they hear the query
- Reselect parent after k silent epochs
[Figure: flooding a query; node 1: P0, L1; node 2: P1, L2; node 3: P1, L2; node 4: P2, L3; node 6: P3, L3; node 5: P4, L4 (P = parent, L = level)]
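Flooding-based tree formation can be sketched as a breadth-first traversal in which each node adopts the first sender it hears as its parent. The sketch assumes lossless broadcasts delivered in waves (one level per wave) and a hypothetical radio connectivity map; parent reselection after k silent epochs is not modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of flood-based tree formation: each node adopts the first sender it hears as parent. */
final class FloodTree {
    /** Returns node -> {parent, level}; the root gets parent 0 and level 1, as in the figure. */
    static Map<Integer, int[]> build(Map<Integer, List<Integer>> radioNeighbors, int root) {
        Map<Integer, int[]> assigned = new LinkedHashMap<>();
        assigned.put(root, new int[] {0, 1});
        Deque<Integer> rebroadcast = new ArrayDeque<>(List.of(root));
        while (!rebroadcast.isEmpty()) {
            int sender = rebroadcast.poll();
            int level = assigned.get(sender)[1];
            for (int n : radioNeighbors.getOrDefault(sender, List.of())) {
                if (!assigned.containsKey(n)) {             // first time n hears the query
                    assigned.put(n, new int[] {sender, level + 1});
                    rebroadcast.add(n);                     // n rebroadcasts to its own neighbors
                }
            }
        }
        return assigned;
    }
}
```

With hypothetical links 1-2, 1-3, 2-4, 3-6, 4-5, and 4-6, this reproduces the figure's assignment: node 6 hears node 3 before node 4, so it takes P3 at level 3, and node 5 ends up at level 4 under node 4.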
73. Pipelined Aggregates
- After the query propagates, during each epoch
- Each sensor samples its local sensors once
- Combines them with PSRs from its children
- Outputs a PSR representing the aggregate state in the previous epoch
- After (d-1) epochs, the PSR for the whole tree is output at the root
- d = depth of the routing tree
- If desired, partial state from the top k levels could be output in the kth epoch
- Complication: may need to avoid combining PSRs from different epochs
- Conceptually, stall the pipeline
- Solutions
- Introduce delays or adjust delivery rates (requires a schedule)
- In the paper, use a cache
[Figure annotations: a value from node 2 produced at time t arrives at node 1 at time t+1; a value from node 5 produced at time t arrives at node 1 at time t+3]
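A small simulation of the pipeline, assuming the six-node tree from the query propagation slide (depth d = 4), no message loss, and COUNT as the aggregate. In every epoch each node outputs 1 for its own sample plus whatever its children sent in the previous epoch. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of pipelined COUNT: each epoch a node merges its own sample with children's last PSRs. */
final class PipelinedCount {
    // child -> parent for the six-node tree (root = 1, depth d = 4)
    static final Map<Integer, Integer> PARENT = Map.of(2, 1, 3, 1, 4, 2, 6, 3, 5, 4);
    static final List<Integer> NODES = List.of(1, 2, 3, 4, 5, 6);

    /** Returns the value output by the root in each of the first `epochs` epochs. */
    static List<Integer> rootOutputs(int epochs) {
        Map<Integer, Integer> prev = new HashMap<>(); // PSRs sent last epoch (none before epoch 0)
        List<Integer> out = new ArrayList<>();
        for (int e = 0; e < epochs; e++) {
            Map<Integer, Integer> cur = new HashMap<>();
            for (int n : NODES) {
                int psr = 1; // init: this node's own sample counts once
                for (Map.Entry<Integer, Integer> c : PARENT.entrySet()) {
                    if (c.getValue() == n) psr += prev.getOrDefault(c.getKey(), 0); // merge child PSR
                }
                cur.put(n, psr);
            }
            out.add(cur.get(1));
            prev = cur;
        }
        return out;
    }
}
```

The root outputs 1, 3, 5, 6, 6, ...: the first complete COUNT appears at epoch 3 = d - 1, and a new value follows every epoch. This naive version mixes epochs (node 5's epoch-0 sample is combined with node 1's epoch-3 sample), which is harmless for a fixed COUNT but is exactly the complication the slide's delay stages or caches address when readings change.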
74. Pipelining Example
Epoch 0
Introduce delay stages
75. Pipelining Example
Epoch 0
SELECT COUNT(*) FROM sensors
[Figure, in slide order: Delay Stage | Delay Stage | Delay Stage | (5,0,1) (3,0,1) (2,0,1) (4,0,1) (1,0,1) | 1]
76. Pipelining Example
Epoch 1
SELECT COUNT(*) FROM sensors
[Figure, in slide order: Delay Stage | Delay Stage | (1,0,1) (2,0,1) (4,0,1) (3,0,2) | Delay Stage | (5,1,1) (3,1,1) (2,1,1) (4,1,1) (1,1,1) | 1]
77. Pipelining Example
Epoch 2
SELECT COUNT(*) FROM sensors
[Figure, in slide order: Delay Stage | (2,0,4) (1,0,1) | Delay Stage | (1,1,1) (2,1,1) (4,1,1) (3,1,2) | Delay Stage | (5,2,1) (3,2,1) (2,2,1) (4,2,1) (1,2,1) | 1]
78. Pipelining Example
Epoch 3
SELECT COUNT(*) FROM sensors
At root: (1,0,5)
[Figure, in slide order: Delay Stage | (2,1,4) (1,1,1) | Delay Stage | (1,2,1) (2,2,1) (4,2,1) (3,2,2) | Delay Stage | (5,3,1) (3,3,1) (2,3,1) (4,3,1) (1,3,1) | 1]
79. Pipelining Example
Epoch 4
SELECT COUNT(*) FROM sensors
At root: (1,1,5)
[Figure, in slide order: Delay Stage | (2,2,4) (1,2,1) | Delay Stage | (3,3,2) (1,3,1) (2,3,1) (4,3,1) | Delay Stage | (5,4,1) (3,4,1) (2,4,1) (4,4,1) (1,4,1) | 1]
80. Discussion
- Result of query is a stream of values
- After a transient error, converges in at most d epochs
- Value at root is d-1 epochs old
- New value every epoch
- Versus d epochs for the first complete value, or to collect a snapshot
- Delay stages conceptually represent
- Local caches
- Adjusted delivery rates
81. Visualizations
- Not published
- Motivate ideas
- E.g. need to combine streaming and static data
- Illustrate algorithms
- E.g. TAG
- Debug algorithms
- E.g. TAG
82. Traffic Visualization
83. Sensor Network Visualization
84. TeleTiny Overview
- Major components
- TINY_ALLOC: Memory allocator
- TUPLE_ROUTER: Query processor
- AGG_OPERATOR: Aggregator
- TINYDB_NETWORK: Network interface
- SCHEMA: Catalog (aka introspection) interface
[Figure: component diagram; labels AGG_OPERATOR, SELECT_OPERATOR, TUPLE_ROUTER, TINYDB_NETWORK, Radio Stack, Schema, TinyAlloc]
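To show how the components hand tuples to one another, here is a hypothetical Java stand-in for one mote's per-epoch path: sampled tuples go through a selection step and then into an aggregate. It answers a query like SELECT MAX(light) WHERE temp > 20 over a few sampled tuples. This is an illustration under stated assumptions, not TeleTiny's TinyOS components or their interfaces.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical pipeline inspired by the component list: sample, SELECT, then aggregate. */
final class TupleRouterSketch {
    /** SELECT_OPERATOR stand-in: keeps a tuple only if the WHERE predicate holds. */
    static Map<String, Integer> select(Map<String, Integer> t, Predicate<Map<String, Integer>> where) {
        return where.test(t) ? t : null;
    }

    /** AGG_OPERATOR stand-in: running MAX over one attribute of the tuples that passed selection. */
    static final class MaxAgg {
        Integer max = null;
        void add(Map<String, Integer> t, String attr) {
            int v = t.get(attr);
            max = (max == null) ? v : Math.max(max, v);
        }
    }

    /** TUPLE_ROUTER stand-in: routes each sampled tuple through selection and into the aggregate. */
    static Integer route(List<Map<String, Integer>> samples,
                         Predicate<Map<String, Integer>> where, String aggAttr) {
        MaxAgg agg = new MaxAgg();
        for (Map<String, Integer> t : samples) {
            Map<String, Integer> kept = select(t, where);
            if (kept != null) agg.add(kept, aggAttr);
        }
        return agg.max; // null if no tuple passed the selection
    }
}
```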