Title: TinyDB
1TinyDB
- Slides modified from
- Sam Madden, Wei Hong
2TinyDB Architecture
TinyDB GUI
JDBC
TinyDB Client API
DBMS
PC side
0
Mote side
0
TinyDB query processor
2
1
3
8
4
5
6
Sensor network
7
3Part 1
4Query Language (TinySQL)
- SELECT ltaggregatesgt, ltattributesgt
- FROM sensors ltbuffergt
- WHERE ltpredicatesgt
- GROUP BY ltexprsgt
- SAMPLE PERIOD ltconstgt ONCE
- INTO ltbuffergt
- TRIGGER ACTION ltcommandgt
5Comparison with SQL
- Single table in FROM clause
- Only conjunctive comparison predicates in WHERE
and HAVING - No subqueries
- No column alias in SELECT clause
- Arithmetic expressions limited to column op
constant - Only fundamental difference SAMPLE PERIOD clause
6TinySQL Examples
Find the sensors in bright nests.
Sensors
- SELECT nodeid, nestNo, light
- FROM sensors
- WHERE light gt 400
- EPOCH DURATION 1s
1
7TinySQL Examples (cont.)
Count the number occupied nests in each loud
region of the island.
8Data Model
- Entire sensor network as one single,
infinitely-long logical table sensors - Columns consist of all the attributes defined in
the network - Typical attributes
- Sensor readings
- Meta-data node id, location, etc.
- Internal states routing tree parent, timestamp,
queue length, etc. - Nodes return NULL for unknown attributes
- On server, all attributes are defined in
catalog.xml
9Query over Stored Data
- Named buffers in Flash memory
- Store query results in buffers
- Query over named buffers
- Analogous to materialized views
- Example
- CREATE BUFFER name SIZE x (field1 type1, field2
type2, ) - SELECT a1, a2 FROM sensors SAMPLE PERIOD d INTO
name - SELECT field1, field2, FROM name SAMPLE PERIOD d
10Event-based Queries
- ON event SELECT
- Run query only when interesting events happens
- Event examples
- Button pushed
- Message arrival
- Bird enters nest
- Analogous to triggers but events are user-defined
11Inside TinyDB
Multihop Network
Query Processor
10,000 Lines Embedded C Code 5,000 Lines
(PC-Side) Java 3200 Bytes RAM (w/ 768 byte
heap) 58 kB compiled code (3x larger than 2nd
largest TinyOS Program)
Filterlight gt 400
Schema
TinyOS
TinyDB
12Part 2
- Routing Structure in TinyDB
13Tree-based Routing
- Tree-based routing
- Used in
- Query delivery
- Data collection
- In-network aggregation
- Relationship to indexing?
14Power Management Approach
- Coarse-grained app-controlled communication
scheduling
Epoch (10s -100s of seconds)
Mote ID
1
zzz
zzz
2
3
4
5
time
2-4s Waking Period
15Illustration Aggregation
SELECT COUNT() FROM sensors
Interval 4
Sensor
Epoch
Interval
1
16Illustration Aggregation
SELECT COUNT() FROM sensors
Interval 3
Sensor
2
Interval
17Illustration Aggregation
SELECT COUNT() FROM sensors
Interval 2
Sensor
1
3
Interval
18Illustration Aggregation
SELECT COUNT() FROM sensors
Interval 1
5
Sensor
Interval
19Illustration Aggregation
SELECT COUNT() FROM sensors
Interval 4
Sensor
Interval
1
20Time Synchronization
- All messages include a 5 byte time stamp
indicating system time in ms - Synchronize (e.g. set system time to timestamp)
with - Any message from parent
- Any new query message (even if not from parent)
- Punt on multiple queries
- Timestamps written just after preamble is xmitted
- All nodes agree that the waking period begins
when (system time epoch dur 0) - And lasts for WAKING_PERIOD ms
- Adjustment of clock happens by changing duration
of sleep cycle, not wake cycle.
21Part 3
- Aggregation Support in TinyDB
22Topics
- In-network aggregation
- Acquisitional Query Processing
- Heterogeneity
- Intermittent Connectivity
- In-network Storage
- Statistics-based summarization and sampling
- In-network Joins
- Adaptivity and Sensor Networks
- Multiple Queries
23Topics
- In-network aggregation
- Acquisitional Query Processing
- Heterogeneity
- Intermittent Connectivity
- In-network Storage
- Statistics-based summarization and sampling
- In-network Joins
- Adaptivity and Sensor Networks
- Multiple Queries
24Tiny Aggregation (TAG)
- Aggregate data along the tree
- Classify aggregates according to various
functional properties
Madden, Franklin, Hellerstein, Hong. Tiny
AGgregation (TAG), OSDI 2002.
25Aggregation Framework
- As in extensible databases, TinyDB supports any
aggregation function conforming to
Aggnfinit, fmerge, fevaluate Finit a0 ?
lta0gt Fmerge lta1gt,lta2gt ? lta12gt Fevaluate lta1gt
? aggregate value
Partial State Record (PSR)
Example Average AVGinit v ?
ltv,1gt AVGmerge ltS1, C1gt, ltS2, C2gt ? lt S1
S2 , C1 C2gt AVGevaluateltS, Cgt ? S/C
Restriction Merge associative, commutative
26Part 1
- Designing an Acquisitional Query Processor
Madden, Franklin, Hellerstein, and Hong. The
Design of An Acqusitional Query Processor.
SIGMOD, 2003.
27Acquisitional Query Processing (ACQP)
- Traditional databases assume that data is
provided a priori - Sensing and sensor data processing incurs
overhead as well! - An acqusitional query processor controls
- When should samples for a particular query be
taken? - In what order should samples for query be taken
and should sampling be interleaved with
processing? - Is it worth expending computational power to
process and relay a sample
28ACQP Whats Different?
- How should the query be processed?
- Sampling as a first class operation
- How does the user control acquisition?
- Rates or lifetimes
- Event-based triggers
- Which nodes have relevant data?
- Index-like data structures
- Which samples should be transmitted?
- Prioritization, summary, and rate control
29Operator Ordering Interleave Sampling Selection
At 1 sample / sec, total power savings could be
as much as 3.5mW ? Comparable to processor!
- SELECT light, mag
- FROM sensors
- WHERE pred1(mag)
- AND pred2(light)
- EPOCH DURATION 1s
- E(sampling mag) gtgt E(sampling light)
- 1500 uJ vs. 90 uJ
30Exemplary Aggregate Pushdown
- SELECT WINMAX(light,8s,8s)
- FROM sensors
- WHERE mag gt x
- EPOCH DURATION 1s
- Novel, general pushdown technique
- Mag sampling is the most expensive operation!
31Part 3
- Open Database Research Issues in Sensor Networks
32Topics
- In-network aggregation
- Acquisitional Query Processing
- Heterogeneity
- Intermittent Connectivity
- In-network Storage
- Statistics-based summarization and sampling
- In-network Joins
- Adaptivity and Sensor Networks
- Multiple Queries
33Heterogeneous Sensor Networks
- Leverage small numbers of high-end nodes to
benefit large numbers of inexpensive nodes - Still must be transparent and ad-hoc
- Key to scalability of sensor networks
- Interesting heterogeneities
- Energy battery vs. outlet power
- Link bandwidth Chipcon vs. 802.11x
- Computing and storage ATMega128 vs. Xscale
- Pre-computed results
- Sensing nodes vs. QP nodes
34Occasionally Connected Sensornets
internet
GTWY
Mobile GTWY
Mobile GTWY
Mobile GTWY
GTWY
35Distributed In-network Storage
- Collectively, sensornets have large amounts of
in-network storage - Good for in-network consumption or caching
- Challenges
- Distributed indexing for fast query dissemination
- Resilience to node or link failures
- Graceful adaptation to data skews
- Minimizing index insertion/maintenance cost
36Statistical Techniques
- Approximations, summaries, and sampling based on
statistics - Applications
- Limited bandwidth and large number of nodes -gt
data reduction - Lossiness -gt predictive modeling
- Uncertainty -gt tracking correlations and changes
over time
37In-Net Regression
- Linear regression simple way to predict future
values, identify outliers
- Regression can be across local or remote values,
multiple dimensions, or with high degree
polynomials - E.g., node A readings vs. node Bs
- Or, location (X,Y), versus temperature
- E.g., over many nodes
Guestrin, Thibaux, Bodik, Paskin, Madden.
Distributed Regression an Efficient Framework
for Modeling Sensor Network Data . Under
submission.
38In-Net Regression (Continued)
- Problem may require data from all sensors to
build model - Solution partition sensors into overlapping
kernels that influence each other - Run regression in each kernel
- Requiring just local communication
- Blend data between kernels
- Requires some clever matrix manipulation
- End result regressed model at every node
- Useful in failure detection, missing value
estimation
39Correlated Attributes
- Data in sensor networks is correlated e.g.,
- Temperature and voltage
- Temperature and light
- Temperature and humidity
- Temperature and time of day
- etc.
40Exploiting Correlations in Query Processing
- Simple idea
- Given predicate P(A) over expensive attribute A
- Replace it with P over cheap attribute A such
that P evaluates to P - Problem unless A and A are perfectly
correlated, P ? P for all time - So we could incorrectly accept or reject some
readings - Alternative use correlations to improve
selectivity estimates in query optimization - Construct conditional plans that vary predicate
order based on prior observations
41Exploiting Correlations (Cont.)
- Insight by observing a (cheap and correlated)
variable not involved in the query, it may be
possible to improve query performance - Improves estimates of selectivities
- Use conditional plans
- Example
42In-Network Join Strategies
- Types of joins
- non-sensor -gt sensor
- sensor -gt sensor
- Optimization questions
- Should the join be pushed down?
- If so, where should it be placed?
- What if a join table exceeds the memory available
on one node?
43Choosing Where to Place Operators
- Idea choose a join node to run the operator
- Over time, explore other candidate placements
- Nodes advertise data rates to their neighbors
- Neighbors compute expected cost of running the
join based on these rates - Neighbors advertise costs
- Current join node selects a new, lower cost node
Bonfils Bonnet, Adaptive and Decentralized
Operator Placement for In-Network QueryProcessing
IPSN 2003.
44Adaptivity In Sensor Networks
- Queries are long running
- Selectivities change
- E.g. night vs day
- Network load and available energy vary
- All suggest that some adaptivity is needed
- Of data rates or granularity of aggregation when
optimizing for lifetimes - Of operator orderings or placements when
selectivities change (c.f., conditional plans for
correlations) - As far as we know, this is an open problem!
45Multiple Queries and Work Sharing
- As sensornets evolve, users will run many queries
simultaneously - E.g., traffic monitoring
- Likely that queries will be similar
- But have different end points, parameters, etc
- Would like to share processing, routing as much
as possible - But how? Again, an open problem.