Title: Database Middleware for Sensor Networks
1Database Middleware for Sensor Networks
Sam Madden Assistant Professor,
MIT madden_at_csail.mit.edu
Slides prepared with Wei Hong
2Motivation
- Sensor networks (aka sensor webs, emnets) are here
- Several widely deployed HW/SW platforms
- Low power radio, small processor, RAM/Flash
- Variety of (novel) applications: scientific, industrial, commercial
- Great platform for mobile and ubicomp experimentation
- Real, hard research problems to be solved
- Networking, systems, languages, databases
- Central problem: ease of access, appropriate programming abstractions
- I will summarize:
- Low-level sensornet issues
- A particular middleware architecture
- TinyDB and TASK
- Current and future research middleware ideas
3Some Sensornet Apps
smart cooling in data centers
redwood forest microclimate monitoring
http://www.hpl.hp.com/research/dca/smart_cooling/
And More
condition-based maintenance
- Homeland security
- Container monitoring
- Mobile environmental apps
- Bird tracking
- Zebranet
- Home automation
- Etc!
structural integrity
4Architectural Overview
Internet
Directed Diffusion, COUGAR
Middleware issues:
- APIs for current and historical access?
- Which data, when?
- How to act on data?
- Network and node status?
5Declarative Queries
- Programming Apps is Hard
- Limited power budget
- Lossy, low bandwidth communication
- Require long-lived, zero admin deployments
- Distributed Algorithms
- Limited tools, debugging interfaces
- Queries abstract away much of the complexity
- Burden on the database developers
- Users get
- Safe, optimizable programs
- Freedom to think about apps instead of details
6TinyDB Declarative Query Interface to Sensornets
- Platform: Berkeley Motes and TinyOS
- Continuous variant of SQL: TinySQL
- Power and data-acquisition based in-network optimization framework
- Extensible interface for aggregates, new types of sensors
7Agenda
- Part 1: Sensor Networks (40 mins)
- TinyOS
- NesC
- Part 2: TinyDB and TASK (50 mins)
- Data Model and Query Language
- Software Architecture
- 30 minute break
- Part 3: Alternative Middleware Architectures and Research Directions (130 mins)
- Finish around 12
8Part 1
- Sensornet Background
- Motes Mote Hardware
- TinyOS
- Programming Model NesC
- TinyOS Architecture
- Major Software Subsystems
- Networking Services
9Sensor Networks a hot topic
- New university courses
- New conferences
- ACM SenSys, IEEE IPSN, etc.
- New industrial research lab projects
- Intel, PARC, MSR, HP, Accenture, etc.
- Startup companies
- Crossbow, Dust, Ember, Sensicast, Moteiv, etc.
- Media Buzz
- Over 30 news articles since July 2002 covering Intel-Berkeley/UC Berkeley sensor network activities
- "One of 10 emerging technologies that will change the world" (MIT Technology Review)
10A Brief History of Sensornets
- People have used sensors for a long time
- Recent CS History
- (1998) Pottie and Kaiser: Radio based networks of sensors
- (1998) Pister et al.: Smart Dust
- Initial focus on optical communication
- By 1999, radio based networks, COTS Dust, Motes
- (1999) Estrin and Govindan
- Ad-hoc networks of sensors
- (2000) Culler/Hill et al.: TinyOS and Motes
- (2000) Bonnet/Seshadri: Device Database Systems
- (2002) Madden/Franklin/Hellerstein/Hong: TinyDB
- (2002) Hill / Dust: SPEC, mm^3 scale computing
- UCLA / USC / Berkeley Continue to Lead Research
- Many other players now
- TinyOS/Motes as most common platform
- Emerging commercial space
- Crossbow, Ember, Dust, Sensicast, Moteiv, Intel
11Why Now?
- Commoditization of radio hardware
- Cellular and cordless phones, wireless communication
- Low cost -> many/tiny -> new applications!
- Real application for ad-hoc network research from the late 90s
- Coming together of EE and CS communities
12Motes
13History of Motes
- Initial research goal wasn't hardware
- Has since become more of a priority with emerging hardware needs, e.g.
- Power consumption
- (Ultrasonic) ranging and localization
- MIT Cricket, NEST Project
- Connectivity with diverse sensors
- UCLA sensor board
- Even so, now on the 5th generation of devices
- Costs down to $50/node (Moteiv, Dust)
- Greatly improved radio quality
- Multitude of interfaces USB, Ethernet, CF, etc.
- Variety of form factors, packages
14Motes vs. Traditional Computing
- Embedded OS
- Lossy, Adhoc Radio Communication
- Sensing Hardware
- Severe Power Constraints
15NesC/TinyOS
- NesC: a C dialect for embedded programming
- Components, wired together
- Quick commands and asynch events
- TinyOS: a set of NesC components
- hardware components
- ad-hoc network formation and maintenance
- time synchronization
Think of the pair as a programming environment
16Radio Communication
- Low Bandwidth Shared Radio Channel
- 40 kbits/s on motes
- Much less in practice
- Encoding, Contention for Media Access (MAC)
- Very lossy: 30% base loss rate
- Argues against TCP-like end-to-end retransmission
- And for link-layer retries
- Generally, not well behaved
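A rough back-of-the-envelope sketch of why a 30% per-link loss rate favors link-layer retries over end-to-end retransmission. Only the 0.7 per-link success probability comes from the slide; the hop count, the retry limit, and the assumption that losses are independent are illustrative choices, and acknowledgement traffic is not counted.

```python
def end_to_end_delivery(p_link: float, hops: int) -> float:
    """Probability a packet crosses every hop with no retries (independent losses assumed)."""
    return p_link ** hops


def link_retry_delivery(p_link: float, hops: int, max_tries: int) -> float:
    """Probability of delivery when each hop may attempt the packet up to max_tries times."""
    per_hop = 1.0 - (1.0 - p_link) ** max_tries
    return per_hop ** hops


def expected_tx_end_to_end(p_link: float, hops: int) -> float:
    """Expected link transmissions per delivered packet when the source
    restarts from the first hop after any loss."""
    # In one attempt, hop i+1 is only used if the first i hops succeeded.
    per_attempt = sum(p_link ** i for i in range(hops))
    # The number of attempts is geometric with success probability p_link ** hops.
    return per_attempt / end_to_end_delivery(p_link, hops)


def expected_tx_link_retry(p_link: float, hops: int) -> float:
    """Expected link transmissions with unlimited per-hop retries."""
    return hops / p_link


if __name__ == "__main__":
    p, h = 0.7, 5  # 30% loss from the slide; 5 hops is an assumed path length
    print(f"no retries:         {end_to_end_delivery(p, h):.3f} delivery probability")
    print(f"3 tries per hop:    {link_retry_delivery(p, h, 3):.3f} delivery probability")
    print(f"end-to-end resend:  {expected_tx_end_to_end(p, h):.1f} transmissions per packet")
    print(f"link-layer retries: {expected_tx_link_retry(p, h):.1f} transmissions per packet")
```

Under these assumptions a five-hop path delivers fewer than one packet in five without retries, and recovering at the source costs roughly twice as many transmissions as retrying at each hop.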
17Types of Sensors
- Sensors attach via daughtercard
- Weather
- Temperature
- Light x 2 (high intensity PAR, low intensity, full spectrum)
- Air Pressure
- Humidity
- Vibration
- 2 or 3 axis accelerometers
- Tracking
- Microphone (for ranging and acoustic signatures)
- Magnetometer
- GPS
- RFID Reader
18Non-Volatile Storage
- EEPROM
- 512K off chip, 32K on chip
- Writes at disk speeds, reads at RAM speeds
- Interface: random access, read/write 256 byte pages
- Maximum throughput: 10 Kbytes / second
- MatchBox Filing System
- Provides a Unix-like file I/O interface
- Single, flat directory
- Only one file being read/written at a time
19Power Consumption and Lifetime
- Power typically supplied by a small battery
- 1000-2000 mAh
- 1 mAh = 1 milliamp current for 1 hour
- Typically at optimum voltage, current drain rates
- Power: Watts (W) = Amps (A) x Volts (V)
- Energy: Joules (J) = W x time
- Lifetime, power consumption varies by application
- Processor: 5 mA active, 1 mA idle, 5 uA sleeping
- Radio: 5 mA listen, 10 mA xmit/receive, 20 ms / packet
- Sensors: 1 uA -> 100s of mA, 1 us -> 1 s / sample
20Energy Usage in A Typical Data Collection Scenario
- Each mote collects 1 sample of (light, humidity) data every 10 seconds, forwards it
- Each mote can hear 10 other motes
- Process:
- Wake up, collect samples (1 second)
- Listen to radio for messages to forward (1 second)
- Forward data
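The scenario above can be turned into a small lifetime estimate. This is a sketch under stated assumptions: the current draws, the 20 ms packet time, and the 10 second period are the figures quoted in these slides, while the number of packets forwarded per period, the zero sensor current, and the simple "charge divided by average current" battery model are illustrative placeholders.

```python
def average_current_ma(period_s=10.0, sample_s=1.0, listen_s=1.0,
                       packets_forwarded=1, packet_s=0.020,
                       cpu_active_ma=5.0, cpu_sleep_ma=0.005,
                       radio_listen_ma=5.0, radio_xmit_ma=10.0,
                       sensor_ma=0.0):
    """Average current (mA) over one sample period of the wake/listen/forward cycle."""
    xmit_s = packets_forwarded * packet_s
    awake_s = sample_s + listen_s + xmit_s
    if awake_s > period_s:
        raise ValueError("awake time exceeds the sample period")
    # Charge in mA*s used by each phase of the cycle.
    charge = (
        sample_s * (cpu_active_ma + sensor_ma)          # sampling
        + listen_s * (cpu_active_ma + radio_listen_ma)  # listening for traffic
        + xmit_s * (cpu_active_ma + radio_xmit_ma)      # forwarding packets
        + (period_s - awake_s) * cpu_sleep_ma           # sleeping
    )
    return charge / period_s


def lifetime_days(battery_mah, avg_current_ma):
    """Idealized lifetime: ignores self-discharge and voltage droop."""
    return battery_mah / avg_current_ma / 24.0


if __name__ == "__main__":
    duty_cycled = average_current_ma()
    print(f"average current: {duty_cycled:.3f} mA")
    print(f"duty cycled:     {lifetime_days(2000, duty_cycled):.0f} days on 2000 mAh")
    print(f"always on:       {lifetime_days(2000, 15.0):.1f} days on 2000 mAh")
```

With these placeholder numbers the awake phases dominate the budget, which is why shortening the listening window matters far more than reducing sleep current.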
21Sensors Slow, Power Hungry, Noisy
22TinyOS Getting Started
- The TinyOS home page
- http://webs.cs.berkeley.edu/tinyos
- Start with the tutorials!
- The CVS repository
- http://sf.net/projects/tinyos
- The NesC Project Page
- http://sf.net/projects/nescc
- Crossbow motes (hardware)
- http://www.xbow.com
- Intel Imote
- www.intel.com/research/exploratory/motes.htm
23Part 2
- The Design and Implementation of TinyDB
24Part 2 Outline
- TinyDB Overview
- Data Model and Query Language
- TinyDB Java API and Scripting
- Demo with TinyDB GUI
- TinyDB Internals
- Extending TinyDB
- TinyDB Status and Roadmap
25TinyDB Revisited
SELECT MAX(mag) FROM sensors WHERE mag > thresh SAMPLE PERIOD 64ms
- High level abstraction
- Data centric programming
- Interact with sensor network as a whole
- Extensible framework
- Under the hood
- Intelligent query processing: query optimization, power efficient execution
- Fault mitigation: automatically introduce redundancy, avoid problem areas
App
Query, Trigger
Data
TinyDB
26Feature Overview
- Declarative SQL-like query interface
- Metadata catalog management
- Multiple concurrent queries
- Network monitoring (via queries)
- In-network, distributed query processing
- Extensible framework for attributes, commands and aggregates
- In-network, persistent storage
27Architecture
[Figure: system architecture. PC side: TinyDB GUI, JDBC, TinyDB Client API, DBMS. Mote side: TinyDB query processor running on the nodes of the sensor network (nodes 0 through 8 in the diagram).]
28Data Model
- Entire sensor network as one single, infinitely-long logical table: sensors
- Columns consist of all the attributes defined in the network
- Typical attributes:
- Sensor readings
- Meta-data: node id, location, etc.
- Internal states: routing tree parent, timestamp, queue length, etc.
- Nodes return NULL for unknown attributes
- On server, all attributes are defined in catalog.xml
- Discussion: other alternative data models?
29Query Language (TinySQL)
- SELECT <aggregates>, <attributes>
- FROM sensors | <buffer>
- WHERE <predicates>
- GROUP BY <exprs>
- SAMPLE PERIOD <const> | ONCE
- INTO <buffer>
- TRIGGER ACTION <command>
30Comparison with SQL
- Single table in FROM clause
- Only conjunctive comparison predicates in WHERE and HAVING
- No subqueries
- No column alias in SELECT clause
- Arithmetic expressions limited to column op constant
- Only fundamental difference: SAMPLE PERIOD clause
31TinySQL Examples
Find the sensors in bright nests.
- SELECT nodeid, nestNo, light
- FROM sensors
- WHERE light > 400
- EPOCH DURATION 1s
Sensors
Epoch  Nodeid  nestNo  Light
0      1       17      455
0      2       25      389
1      1       17      422
1      2       25      405
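To make the query semantics concrete, here is a minimal sketch that applies the bright-nest filter to the four rows listed above. It assumes the table shows the sensors relation before the WHERE clause is applied; the `Reading` type and `bright_nests` helper are hypothetical names for illustration, not part of TinyDB.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Reading(NamedTuple):
    epoch: int
    nodeid: int
    nest_no: int
    light: int


# The four rows from the slide, one per (epoch, node).
SENSORS = [
    Reading(0, 1, 17, 455),
    Reading(0, 2, 25, 389),
    Reading(1, 1, 17, 422),
    Reading(1, 2, 25, 405),
]


def bright_nests(readings: Iterable[Reading],
                 threshold: int = 400) -> Iterator[Tuple[int, int, int, int]]:
    """SELECT nodeid, nestNo, light FROM sensors WHERE light > threshold,
    tagged with the epoch in which each result tuple was produced."""
    for r in readings:
        if r.light > threshold:
            yield (r.epoch, r.nodeid, r.nest_no, r.light)


if __name__ == "__main__":
    for row in bright_nests(SENSORS):
        print(row)
```

Node 2 produces no result in epoch 0 (389 is below the threshold) and starts reporting in epoch 1, which is the behavior a continuous query is meant to capture.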
32TinySQL Examples (cont.)
Count the number of occupied nests in each loud region of the island.
Epoch  region  CNT(...)  AVG(...)
0      North   3         360
0      South   3         520
1      North   3         370
1      South   3         520
33Event-based Queries
- ON event SELECT ...
- Run query only when interesting events happen
- Event examples
- Button pushed
- Message arrival
- Bird enters nest
- Analogous to triggers but events are user-defined
34Query over Stored Data
- Named buffers in Flash memory
- Store query results in buffers
- Query over named buffers
- Analogous to materialized views
- Example
- CREATE BUFFER name SIZE x (field1 type1, field2 type2, ...)
- SELECT a1, a2 FROM sensors SAMPLE PERIOD d INTO name
- SELECT field1, field2, ... FROM name SAMPLE PERIOD d
35Using the Java API
- SensorQueryer
- translateQuery() converts TinySQL string into TinyDBQuery object
- Static query optimization
- TinyDBNetwork
- sendQuery() injects query into network
- abortQuery() stops a running query
- addResultListener() adds a ResultListener that is invoked for every QueryResult received
- removeResultListener()
- QueryResult
- A complete result tuple, or
- A partial aggregate result; call mergeQueryResult() to combine partial results
- Key difference from JDBC: push vs. pull
36Writing Scripts with TinyDB
- TinyDB's text interface
- java net.tinyos.tinydb.TinyDBMain run select
- Query results printed out to the console
- All motes get reset each time new query is posed
- Handy for writing scripts with shell, perl, etc.
37Using the GUI Tools
38Inside TinyDB
Multihop Network
Query Processor
- 10,000 lines embedded C code
- 5,000 lines (PC-side) Java
- 3200 bytes RAM (w/ 768 byte heap)
- 58 kB compiled code (3x larger than 2nd largest TinyOS program)
Filter: light > 400
Schema
TinyOS
TinyDB
39Tree-based Routing
- Tree-based routing
- Used in
- Query delivery
- Data collection
- In-network aggregation
- Relationship to indexing?
40Power Consumption and Lifetime
- Power typically supplied by a small battery
- At full power, device will last 2-3 days -> critical constraint
- Lifetime, power consumption varies by application
- Scales with duty cycle: amount of time on
- Low data rate (< 1 sample / 30 secs): > 6 months possible from AA batteries
Must synchronize!
Fundamental challenge: distributed coordination with low power!
41Time Synchronization
- All messages include a 5 byte time stamp indicating system time in ms
- Synchronize (e.g. set system time to timestamp) with
- Any message from parent
- Any new query message (even if not from parent)
- Punt on multiple queries
- Timestamps written just after preamble is xmitted
- All nodes agree that the waking period begins when (system time % epoch dur = 0)
- And lasts for WAKING_PERIOD ms
- Adjustment of clock happens by changing duration of sleep cycle, not wake cycle.
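A minimal sketch of the wake-up rule described above: nodes wake whenever the shared system time is a multiple of the epoch duration, stay awake for a fixed waking period, and absorb clock corrections by changing how long they sleep. The function names and the example durations are hypothetical; the actual TinyDB implementation is in nesC and is not reproduced here.

```python
from typing import Tuple


def ms_until_next_wake(system_time_ms: int, epoch_dur_ms: int) -> int:
    """Sleep time until the next instant where system_time % epoch_dur == 0."""
    remainder = system_time_ms % epoch_dur_ms
    return 0 if remainder == 0 else epoch_dur_ms - remainder


def is_awake(system_time_ms: int, epoch_dur_ms: int, waking_period_ms: int) -> bool:
    """True while the node is inside the waking period of the current epoch."""
    return system_time_ms % epoch_dur_ms < waking_period_ms


def resync(local_time_ms: int, timestamp_ms: int, epoch_dur_ms: int) -> Tuple[int, int]:
    """Adopt the timestamp carried by a parent's message.

    Returns (skew_ms, sleep_ms): the correction applied to the local clock and
    the adjusted sleep duration until the next wake-up. Only the sleep length
    changes; the waking period itself keeps its fixed duration.
    """
    skew_ms = timestamp_ms - local_time_ms
    return skew_ms, ms_until_next_wake(timestamp_ms, epoch_dur_ms)


if __name__ == "__main__":
    EPOCH_MS, WAKING_MS = 2048, 128  # assumed values for illustration
    print(ms_until_next_wake(10300, EPOCH_MS))
    print(is_awake(10300, EPOCH_MS, WAKING_MS))
    print(resync(10300, 10250, EPOCH_MS))
```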
42Extending TinyDB
- Why extend TinyDB?
- New sensors -> attributes
- New control/actuation -> commands
- New data processing logic -> aggregates
- New events
- Analogous to concepts in object-relational
databases
43Adding Attributes
- Types of attributes
- Sensor attributes: raw or cooked sensor readings
- Introspective attributes: parent, voltage, RAM usage, etc.
- Constant attributes: constant values that can be statically or dynamically assigned to a mote, e.g., nodeid, location, etc.
44Adding Attributes (cont)
- Interfaces provided by Attr component
- StdControl: init, start, stop
- AttrRegister
- command registerAttr(name, type, len)
- event getAttr(name, resultBuf, errorPtr)
- event setAttr(name, val)
- command getAttrDone(name, resultBuf, error)
- AttrUse
- command startAttr(attr)
- event startAttrDone(attr)
- command getAttrValue(name, resultBuf, errorPtr)
- event getAttrDone(name, resultBuf, error)
- command setAttrValue(name, val)
45Adding Attributes (cont)
- Steps to adding attributes to TinyDB:
- Create attribute nesC components
- Wire new attribute components to TinyDBAttr configuration
- Reprogram TinyDB motes
- Add new attribute entries to catalog.xml
- Constant attributes can be added on the fly through TinyDB GUI
46Adding Aggregates
- Step 1: wire new nesC components
47Adding Aggregates (cont)
- Step 2: add entry to catalog.xml
- <aggregate>
- <name>AVG</name>
- <id>5</id>
- <temporal>false</temporal>
- <readerClass>net.tinyos.tinydb.AverageClass</readerClass>
- </aggregate>
- Step 3 (optional): implement reader class in Java
- A reader class interprets and finalizes aggregate state received from the mote network, returns final result as a string for display.
48TinyDB Status
- Latest release with TinyOS 1.1 (9/03)
- Install the task-tinydb package in TinyOS 1.1 distribution
- First release in TinyOS 1.0 (9/02)
- Widely used by research groups as well as industry pilot projects
- Successful deployments in Intel Berkeley Lab and redwood trees at UC Botanical Garden
- Largest deployment: 80 weather station nodes
- Network longevity: 4-5 months
49The Redwood Tree Deployment
- Redwood Grove in UC Botanical Garden, Berkeley
- Collect dense sensor readings to monitor climatic variations across
- altitudes,
- angles,
- time,
- forest locations, etc.
- Versus sporadic monitoring points with 30lb loggers!
- Current focus: study how dense sensor data affect predictions of conventional tree-growth models
50Data from Redwoods
36m
33m 111
32m 110
30m 109,108,107
20m 106,105,104
10m 103, 102, 101
51TASK
52A SensorNet Dilemma
- Sensors still packaged like HeathKits
- Pretty hard to cope with out of the box
- Bare metal encourages one-off applications
- Inhibits reuse
- Deployment not intuitive
- No configuration/monitoring tools
- SensorNet PhD Factor
- Today: 2.5 PhDs needed to deploy a SensorNet
- Needs to be Zero
53TASK Design Requirements
- Ease of S/W Installation
- Deployment tools
- Reconfigurability
- Health/Mgmt Monitoring
- Network Reliability Guarantee
- Interpretable Sensor Results
- Tool Integration
- Audit Trails
- Lifetime estimates
For Developers
- Familiar API
- Extensibility of S/W
- Modular services
54Tiny Application Sensor Kit
TASK Client Tools
External Tools
TaskView
Internet
TASK Field Tools
SensorNet Appliance
TASK Server
- Simplicity vs. Functionality
- Modularity
- Remote control
- Fault Tolerant
TinyDB Sensor Network
55SensorNet Appliance
SNA
- Intelligent Gateway
- Proxy for the sensornet
- Distributes query
- Stages results
- Manages configuration
- Components
- TASK Server
- TinyDB Client (Java)
- DBMS (PostgreSQL)
- WebServer (Apache)
http, other
TASKServer
DBMS
ODBC
TinyDB Client
SensorNet
56Tools
- Field Tool
- In-situ diagnostics
- TaskView
- Integrated tool for management and monitoring
57For more information
- http://triplerock.cs.berkeley.edu/tinydb
58Part 3
- Middleware Architecture and Research Topics
59Architectural Overview
Internet
60What's Left?
- TinyDB and TinyOS provide a reasonable low-level substrate
- TASK sufficient for many data collection apps
- But there are other architecture issues
- Efficiency concerns
- Currently transmit readings from all sensors on each epoch
- Variable, context sensitive rates
- Data quality issues
- Missing and faulty sensors?
- Architectural issues
- Actuation / closed-loop issues
- Disconnection, etc.
61Sensor Network Research
- Very active research area
- Can't summarize it all
- Focus: database-relevant research topics
- Some outside of Berkeley
- Other topics that are itching to be scratched
- But, some bias towards work that we find compelling
62Topics
- Improving TinyDB Efficiency
- In-network aggregation
- Acquisitional Query Processing
- Alternative Architectures
- Statistical Techniques
- Heterogeneity
- Intermittent Connectivity
- New features
- In-network storage
- Closing the loop
- Integration with traditional databases
64Tiny Aggregation (TAG)
- In-network processing of aggregates
- Common data analysis operation
- Aka gather operation or reduction in programming
- Communication reducing
- Operator dependent benefit
- Across nodes during same epoch
- Exploit query semantics to improve efficiency!
Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG), OSDI 2002.
65Basic Aggregation
- In each epoch
- Each node samples local sensors once
- Generates partial state record (PSR)
- local readings
- readings from children
- Outputs PSR during assigned comm. interval
- Interval assigned based on depth in tree
- At end of epoch, PSR for whole network output at root
- New result on each successive epoch
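The per-epoch procedure above can be simulated in a few lines. This is an illustrative sketch, not the TinyDB scheduler: the five-node routing tree, the `run_epoch` helper, and the convention that a node at depth d transmits in interval d are assumptions chosen to keep the example small.

```python
from typing import Callable, Dict, List, Optional, Tuple


def depths(parent: Dict[int, Optional[int]]) -> Dict[int, int]:
    """Depth of every node in the routing tree; the root (parent None) has depth 1."""
    def depth(node: int) -> int:
        d = 1
        while parent[node] is not None:
            node = parent[node]
            d += 1
        return d
    return {node: depth(node) for node in parent}


def run_epoch(parent: Dict[int, Optional[int]],
              local_psr: Dict[int, int],
              merge: Callable[[int, int], int]) -> Tuple[int, List[Tuple[int, int, int]]]:
    """Simulate one epoch of in-network aggregation.

    Returns (root_psr, schedule), where schedule lists (interval, node, psr_sent)
    in transmission order. Deeper nodes transmit in earlier (higher-numbered)
    intervals so parents hold their children's state before their own turn.
    """
    depth = depths(parent)
    psr = dict(local_psr)  # each node starts with the PSR of its own sample
    schedule = []
    for node in sorted(parent, key=lambda n: (-depth[n], n)):
        schedule.append((depth[node], node, psr[node]))
        up = parent[node]
        if up is not None:
            psr[up] = merge(psr[up], psr[node])
    root = next(n for n in parent if parent[n] is None)
    return psr[root], schedule


if __name__ == "__main__":
    # Assumed topology: node 1 is the root; 2 and 3 are its children; 3 -> 4 -> 5.
    tree = {1: None, 2: 1, 3: 1, 4: 3, 5: 4}
    count, sched = run_epoch(tree, {n: 1 for n in tree}, lambda a, b: a + b)
    for interval, node, sent in sched:
        print(f"interval {interval}: node {node} outputs COUNT = {sent}")
    print("root result:", count)
```

For a COUNT query each node's local PSR is 1 and merging is addition, so the root reports the number of nodes while every node sends exactly one message per epoch.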
66Illustration In-Network Aggregation
SELECT COUNT(*) FROM sensors
[Figure: animation (slides 66-71) of one sample period of in-network aggregation over five sensors. Rows are communication intervals 4 down to 1, columns are sensors 1-5. The partial counts sent are 1 in interval 4, 2 in interval 3, 1 and 3 in interval 2, and 5 in interval 1; nodes that are not communicating sleep (zzz). The next epoch begins again at interval 4.]
72Aggregation Framework
- As in extensible databases, TinyDB supports any aggregation function conforming to:
Agg_n = {f_init, f_merge, f_evaluate}
f_init{a0} -> <a0>
f_merge{<a1>, <a2>} -> <a12>
f_evaluate{<a1>} -> aggregate value
(<a> denotes a Partial State Record (PSR))
Example: Average
AVG_init{v} -> <v, 1>
AVG_merge{<S1, C1>, <S2, C2>} -> <S1 + S2, C1 + C2>
AVG_evaluate{<S, C>} -> S/C
Restriction: Merge associative, commutative
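The init / merge / evaluate decomposition can be sketched as a small data structure. The `Aggregate` class and `aggregate` helper below are hypothetical illustrations of the interface described on this slide, not the TinyDB nesC API; AVG follows the slide's definition, and MAX is added only as a second example.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class Aggregate:
    init: Callable[[float], Any]      # reading -> partial state record (PSR)
    merge: Callable[[Any, Any], Any]  # PSR x PSR -> PSR; must be associative and commutative
    evaluate: Callable[[Any], float]  # PSR -> final aggregate value


# AVG keeps a <sum, count> pair so partial averages can be combined correctly.
AVG = Aggregate(
    init=lambda v: (v, 1),
    merge=lambda a, b: (a[0] + b[0], a[1] + b[1]),
    evaluate=lambda s: s[0] / s[1],
)

# MAX needs only a single value as its partial state.
MAX = Aggregate(init=lambda v: v, merge=max, evaluate=lambda s: s)


def aggregate(agg: Aggregate, readings: Iterable[float]) -> float:
    """Fold readings through init and merge, then evaluate the final PSR."""
    return agg.evaluate(reduce(agg.merge, (agg.init(v) for v in readings)))


if __name__ == "__main__":
    print(aggregate(AVG, [10, 20, 30, 40]))
    print(aggregate(MAX, [3, 9, 4]))
```

Because merge is associative and commutative, the result does not depend on the shape of the routing tree or on the order in which children report.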
73Taxonomy of Aggregates
- TAG insight: classify aggregates according to various functional properties
- Yields a general set of optimizations that can automatically be applied
Drives an API!
Property | Examples | Affects
Partial State | MEDIAN: unbounded, MAX: 1 record | Effectiveness of TAG
Monotonicity | COUNT: monotonic, AVG: non-monotonic | Hypothesis Testing, Snooping
Exemplary vs. Summary | MAX: exemplary, COUNT: summary | Applicability of Sampling, Effect of Loss
Duplicate Sensitivity | MIN: dup. insensitive, AVG: dup. sensitive | Routing Redundancy
74Use Multiple Parents
- Use graph structure
- Increase delivery probability with no communication overhead
- For duplicate insensitive aggregates, or
- Aggs expressible as sum of parts
- Send (part of) aggregate to all parents
- In just one message, via multicast
- Assuming independence, decreases variance
SELECT COUNT(*)
With a single parent:
P(link xmit successful) = p
P(success from A->R) = p^2
E(cnt) = c * p^2
Var(cnt) = c^2 * p^2 * (1 - p^2) = V
With # of parents = n:
E(cnt) = n * (c/n * p^2)
Var(cnt) = n * (c/n)^2 * p^2 * (1 - p^2) = V/n
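A short numerical check of the variance argument, under the slide's assumptions: a count c sits two hops from the root, each link succeeds independently with probability p, and with n parents the count is split into n equal parts sent along n independent two-hop paths. The function names are hypothetical; `enumerate_moments` derives the mean and variance directly from the binomial distribution of successful paths rather than from the closed-form expressions.

```python
from math import comb
from typing import Tuple


def single_parent(c: float, p: float) -> Tuple[float, float]:
    """Mean and variance of the count received at the root via one parent."""
    p2 = p * p
    return c * p2, c ** 2 * p2 * (1 - p2)


def split_over_parents(c: float, p: float, n: int) -> Tuple[float, float]:
    """Closed-form mean and variance when c/n is sent through each of n parents."""
    p2 = p * p
    return n * (c / n) * p2, n * (c / n) ** 2 * p2 * (1 - p2)


def enumerate_moments(c: float, p: float, n: int) -> Tuple[float, float]:
    """Same quantities, computed by enumerating how many of the n paths succeed."""
    p2 = p * p
    share = c / n
    mean = second = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * p2 ** k * (1 - p2) ** (n - k)
        mean += prob * share * k
        second += prob * (share * k) ** 2
    return mean, second - mean ** 2


if __name__ == "__main__":
    c, p = 10, 0.9  # illustrative values
    print("1 parent :", single_parent(c, p))
    print("2 parents:", split_over_parents(c, p, 2))
    print("check    :", enumerate_moments(c, p, 2))
```

The expected count is unchanged by splitting, while the variance shrinks by a factor of n, as long as the independence assumption holds.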
75Multiple Parents Results
- Better than the previous analysis predicted!
- Losses aren't independent!
- Insight: spreads data over many links
76Acquisitional Query Processing (ACQP)
- TinyDB acquires AND processes data
- Could generate an infinite number of samples
- An acquisitional query processor controls
- when,
- where,
- and with what frequency data is collected!
- Versus traditional systems where data is provided a priori
Madden, Franklin, Hellerstein, and Hong. The Design of an Acquisitional Query Processor. SIGMOD, 2003.
77ACQP: What's Different?
- How should the query be processed?
- Sampling as a first class operation
- How does the user control acquisition?
- Rates or lifetimes
- Event-based triggers
- Which nodes have relevant data?
- Index-like data structures
- Which samples should be transmitted?
- Prioritization, summary, and rate control
78Operator Ordering: Interleave Sampling and Selection
At 1 sample / sec, total power savings could be as much as 3.5 mW -> comparable to processor!
- SELECT light, mag
- FROM sensors
- WHERE pred1(mag)
- AND pred2(light)
- EPOCH DURATION 1s
- E(sampling mag) >> E(sampling light)
- 1500 uJ vs. 90 uJ
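The benefit of interleaving sampling with selection can be estimated with a simple expected-cost model. The two sampling energies are the figures quoted on this slide; the selectivities, the plan names, and the rule that the second sensor is sampled only when the first predicate passes are illustrative assumptions, not the optimizer's actual cost model.

```python
from typing import Dict

E_MAG_UJ = 1500.0   # energy to sample the magnetometer, from the slide
E_LIGHT_UJ = 90.0   # energy to sample the light sensor, from the slide


def interleaved_cost_uj(first_cost: float, second_cost: float,
                        first_selectivity: float) -> float:
    """Expected energy per epoch: always sample the first attribute, and sample
    the second only when the first predicate passes."""
    return first_cost + first_selectivity * second_cost


def plan_costs(sel_mag: float, sel_light: float) -> Dict[str, float]:
    """Expected per-epoch sampling energy (uJ) of three candidate plans."""
    return {
        "sample both, then filter": E_MAG_UJ + E_LIGHT_UJ,
        "mag first": interleaved_cost_uj(E_MAG_UJ, E_LIGHT_UJ, sel_mag),
        "light first": interleaved_cost_uj(E_LIGHT_UJ, E_MAG_UJ, sel_light),
    }


def best_plan(sel_mag: float, sel_light: float) -> str:
    costs = plan_costs(sel_mag, sel_light)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    for name, cost in plan_costs(sel_mag=0.5, sel_light=0.5).items():
        print(f"{name:26s} {cost:7.1f} uJ")
    print("best:", best_plan(0.5, 0.5))
```

With both predicates passing half the time, sampling the cheap light sensor first roughly halves the expected energy, because the expensive magnetometer is read only when it can still affect the answer.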
79Exemplary Aggregate Pushdown
- SELECT WINMAX(light,8s,8s)
- FROM sensors
- WHERE mag > x
- EPOCH DURATION 1s
- Novel, general pushdown technique
- Mag sampling is the most expensive operation!
81Statistical Techniques
- Approximations, summaries, and sampling based on statistics and statistical models
- Applications:
- Limited bandwidth and large number of nodes -> data reduction
- Lossiness -> predictive modeling
- Uncertainty -> tracking correlations and changes over time
- Physical models -> improved query answering
82TinyDB Retrospective
- Data aggregation
- Can reduce communication
TinyDB
Query
SQL-style query
- Declarative interface
- Sensor nets are not just for PhDs
- Decrease deployment time
Every time step
83Limitations of TinyDB approach
TinyDB
Query
SQL-style query
- Data collection
- Every node must wake up at every time step
- Data loss ignored
- No quality guarantees
- Wastes resources by ignoring correlations
- Query distribution
- Every node must receive query
Every time step
84Sensor net data is correlated
- Data is not i.i.d. -> shouldn't ignore missing data
- Observing one sensor -> information about other sensors (and future values)
- Observing one type of reading -> information about other local readings
85BBQ Model-driven data acquisition
posterior belief
Probabilistic Model
Example model Multidimensional Gaussian
Query
Middleware Layer
SQL-style query with desired confidence
- Strengths of model-based data acquisition
- Observe fewer attributes
- Exploit correlations
- Reuse information between queries
- Directly deal with missing data
- Answer more complex (probabilistic) queries
86Probabilistic models and queries
User's perspective
Query: SELECT nodeId, temp ± 0.5°C, conf(.95) FROM sensors WHERE nodeId in {1..8}
System selects and observes subset of nodes
Observed nodes: {3,6,8}
Query result:
Node   1     2     3     4     5     6     7     8
Temp.  17.3  18.1  17.4  16.1  19.2  21.3  17.5  16.3
Conf.  98%   95%   100%  99%   95%   100%  98%   100%
87Supported queries
- Value query
- Xi ± ε with prob. at least 1-δ
- SELECT and Range query
- Xi ∈ [a,b] with prob. at least 1-δ
- Which sensors have temperature greater than 25°C?
- Aggregation
- Average ± ε of subset of attribs. with prob. > 1-δ
- Combine aggregation and selection
- Probability that > 10 sensors have temperature greater than 25°C?
- Queries require solution to integrals
- Many queries computed in closed-form
- Some require numerical integration/sampling
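For a single attribute with a Gaussian belief, the value and range queries above reduce to closed-form integrals of the normal density. The sketch below is a one-dimensional illustration only: the function names, the example means and standard deviations, and the rule "observe the sensor when the model alone is not confident enough" are assumptions, and the multivariate conditioning that lets one observation sharpen beliefs about other sensors is not shown.

```python
from math import erf, sqrt
from typing import Tuple


def normal_cdf(x: float, mean: float, std: float) -> float:
    """Cumulative distribution function of a Gaussian with the given mean and std."""
    return 0.5 * (1.0 + erf((x - mean) / (std * sqrt(2.0))))


def prob_in_range(mean: float, std: float, a: float, b: float) -> float:
    """Range query: probability that the attribute lies in [a, b]."""
    return normal_cdf(b, mean, std) - normal_cdf(a, mean, std)


def value_query(mean: float, std: float, eps: float, delta: float) -> Tuple[float, float, bool]:
    """Value query: report the mean if it is within eps of the truth with
    probability at least 1 - delta.

    Returns (answer, confidence, needs_observation).
    """
    confidence = prob_in_range(mean, std, mean - eps, mean + eps)
    return mean, confidence, confidence < 1.0 - delta


if __name__ == "__main__":
    # A tight belief can be answered from the model; a loose one needs a reading.
    print(value_query(mean=17.3, std=0.2, eps=0.5, delta=0.05))
    print(value_query(mean=17.3, std=1.0, eps=0.5, delta=0.05))
    print(f"P(temp > 25C) = {1.0 - normal_cdf(25.0, 24.0, 1.0):.3f}")
```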
88Experimental results
- Redwood trees and Intel Lab datasets
- Learned models from data
- Static model
- Dynamic model: Kalman filter, time-indexed transition probabilities
- Evaluated on a wide range of queries
89Cost versus Confidence level
90Obtaining approximate values
Query: true temperature value ± epsilon with confidence 95%
91Next Step Outliers and Unusual Events
- Once we have a model of the expected behavior, we can
- Detect unusual (low probability) events
- Predict missing values
- Often, there are several expected behavior modes, which we want to differentiate between
- E.g., if we can characterize failure modes, we can discard them
- Applying well known probabilistic techniques to allow TinyDB to deal with such issues.
92IDSQ
- Similar idea: suppose you want to, e.g., localize a vehicle in a field of sensors
- Idea: task sensors in order of best improvement to estimate of some value
- Choose leader(s)
- Suppress subordinates
- Task subordinates, one at a time
- Until some measure of goodness (error bound) is met
See Scalable Information-Driven Sensor Querying and Routing for ad hoc Heterogeneous Sensor Networks. Chu, Haussecker and Zhao. Xerox TR P2001-10113. May, 2001.
93Model location estimate as a point with
2-dimensional Gaussian uncertainty.
Graphical Representation
Principal Axis
94Lots of Other Work of This Flavor
- Precision / Energy Tradeoff -- Want nodes to sleep except when their data is needed
- Olston et al. Approximate Caching. SIGMOD '03.
- Cheng et al. Kalman Filters. SIGMOD '04.
- Lazaridis and Mehrotra. Approximate Selection Queries over Imprecise Data. ICDE 2004.
- UCI Quasar Project
- Timeliness: Real Time Constraints
- John A. Stankovic et al. Real Time Communication and Coordination in Sensor Networks. Proceedings of the IEEE, 91(7), July 2003.
- Tian He et al. SPEED: a stateless protocol (ICDCS '03)
95In-Net Regression
- Linear regression: simple way to predict future values, identify outliers
- Regression can be across local or remote values, multiple dimensions, or with high degree polynomials
- E.g., node A readings vs. node B's
- Or, location (X,Y), versus temperature
- E.g., over many nodes
Guestrin, Thibaux, Bodik, Paskin, Madden. Distributed Regression: an Efficient Framework for Modeling Sensor Network Data. Under submission.
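As a small illustration of the first bullet, the sketch below fits a line by ordinary least squares to paired readings (for example node A's readings against node B's) and flags readings that fall far from the fit. It runs on a single machine with made-up data and hypothetical helper names; the distributed, kernel-based algorithm of the cited paper is not reproduced here.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def outliers(xs: Sequence[float], ys: Sequence[float],
             slope: float, intercept: float, tol: float) -> List[int]:
    """Indices of readings whose residual from the fitted line exceeds tol."""
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if abs(y - (slope * x + intercept)) > tol]


if __name__ == "__main__":
    node_a = [0.0, 1.0, 2.0, 3.0]  # made-up training readings
    node_b = [1.0, 3.0, 5.0, 7.0]
    slope, intercept = fit_line(node_a, node_b)
    print("model: B =", slope, "* A +", intercept)
    # A later batch with one faulty reading at index 2.
    print("outliers:", outliers(node_a, [1.0, 3.0, 9.0, 7.0], slope, intercept, tol=1.0))
```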
96In-Net Regression (Continued)
- Problem: may require data from all sensors to build model
- Solution: partition sensors into overlapping kernels that influence each other
- Run regression in each kernel
- Requiring just local communication
- Blend data between kernels
- Requires some clever matrix manipulation
- End result: regressed model at every node
- Useful in failure detection, missing value estimation
98Heterogeneous Sensor Networks
- Leverage small numbers of high-end nodes to benefit large numbers of inexpensive nodes
- Still must be transparent and ad-hoc
- Key to scalability of sensor networks
- Interesting heterogeneities
- Energy: battery vs. outlet power
- Link bandwidth: Chipcon vs. 802.11x
- Computing and storage: ATMega128 vs. XScale
- Pre-computed results
- Sensing nodes vs. QP nodes
99Computing Heterogeneity with TinyDB
- Separate query processing from sensing
- Provide query processing on a small number of nodes
- Attract packets to query processors based on service value
- Compare the total energy consumption of the network
- No aggregation
- All aggregation
- Opportunistic aggregation
- HSN proactive aggregation
Mark Yarvis and York Liu, Intel's Heterogeneous Sensor Network Project, ftp://download.intel.com/research/people/HSN_IR_Day_Poster_03.pdf.
1005x7 TinyDB/HSN Mica2 Testbed
101Data Packet Saving
- How many aggregators are desired?
- Does placement matter?
103Occasionally Connected Sensornets
internet
GTWY
Mobile GTWY
Mobile GTWY
Mobile GTWY
GTWY
104Occasionally Connected Sensornets Challenges
- Networking support
- Tradeoff between reliability, power consumption and delay
- Data custody transfer: duplicates?
- Load shedding
- Routing of mobile gateways
- Query processing
- Operation placement: in-network vs. on mobile gateways
- Proactive pre-computation and data movement
- Tight interaction between networking and QP
Fall, Hong and Madden, Custody Transfer for Reliable Delivery in Delay Tolerant Networks, http://www.intel-research.net/Publications/Berkeley/081220030852_157.pdf.
105Other Occasionally Connected Work
- Kevin Fall. Delay Tolerant Networks. SIGCOMM 2003.
- Juang et al. Energy efficient computing for wildlife tracking. ASPLOS 2002.
- Li et al. Sending messages to mobile users in disconnected ad-hoc wireless networks. MOBICOM 2000.
- Shah et al. Data Mules. SNPA 2003.
107Distributed In-network Storage
- Collectively, sensornets have large amounts of in-network storage
- Good for in-network consumption or caching
- Challenges
- Distributed indexing for fast query dissemination
- Resilience to node or link failures
- Graceful adaptation to data skews
- Minimizing index insertion/maintenance cost
108Example DIM
- Functionality
- Efficient range query for multidimensional data.
- Approaches
- Divide sensor field into bins.
- Locality preserving mapping from m-d space to geographic locations.
- Use geographic routing such as GPSR.
- Assumptions
- Nodes know their locations and network boundary
- No node mobility
Xin Li, Young Jin Kim, Ramesh Govindan and Wei Hong, Distributed Index for Multi-dimensional Data (DIM) in Sensor Networks, SenSys 2003.
110Closing the Loop
- Challenge: want more than data collection
- Condition-based sensing, rate adjustment
- Condition-based actuation
- E.g.,
- Kansal et al. Sensor Uncertainty Reduction Using Low Complexity Actuation. IPSN 2004.
- Work from Qiong Luo (HKUST) et al. in CIDR.
- Various process control systems: ladder logic, SCADA, etc.
- Questions
- Appropriate languages
- Resource contention on actuators
- Closed-loop safety concerns
112Alternative Middleware Integration into an
Existing DBMS
113Concluding Remarks
- Sensor networks are an exciting emerging technology, with a wide variety of applications
- Many research challenges in all areas of computer science
- Database community included
- Some agreement that a declarative interface is right
- TinyDB and other early work are an important first step
- But there's lots more to be done!
- Real challenge is building appropriate middleware abstractions
114Questions?
http://db.lcs.mit.edu/madden/middleware_tutorial.ppt
115In-Network Join Strategies
- Types of joins
- non-sensor -> sensor
- sensor -> sensor
- Optimization questions
- Should the join be pushed down?
- If so, where should it be placed?
- What if a join table exceeds the memory available
on one node?
116Choosing Where to Place Operators
- Idea: choose a join node to run the operator
- Over time, explore other candidate placements
- Nodes advertise data rates to their neighbors
- Neighbors compute expected cost of running the join based on these rates
- Neighbors advertise costs
- Current join node selects a new, lower cost node
Bonfils and Bonnet, Adaptive and Decentralized Operator Placement for In-Network Query Processing, IPSN 2003.
117Topics
- In-network aggregation
- Acquisitional Query Processing
- Heterogeneity
- Intermittent Connectivity
- In-network Storage
- Statistics-based summarization and sampling
- In-network Joins
- Adaptivity and Sensor Networks
- Multiple Queries
118Adaptivity In Sensor Networks
- Queries are long running
- Selectivities change
- E.g. night vs day
- Network load and available energy vary
- All suggest that some adaptivity is needed
- Of data rates or granularity of aggregation when optimizing for lifetimes
- Of operator orderings or placements when selectivities change (cf. conditional plans for correlations)
- As far as we know, this is an open problem!
119Multiple Queries and Work Sharing
- As sensornets evolve, users will run many queries simultaneously
- E.g., traffic monitoring
- Likely that queries will be similar
- But have different end points, parameters, etc.
- Would like to share processing, routing as much as possible
- But how? Again, an open problem.