Title: Programming Sensor Networks
1. Programming Sensor Networks
- An amalgamation of slides from Indranil Gupta,
Robbert van Renesse, Kenneth Birman, Harold
Abelson, Don Allen, Daniel Coore, Chris Hanson,
George Homsy, Thomas F. Knight, Jr., Radhika
Nagpal, Erik Rauch, Gerald Jay Sussman, Ron
Weiss, Samuel Madden, Robert Szewczyk, Michael J.
Franklin, David Culler, Philippe Bonnet, Johannes
Gehrke, Praveen Seshadri
2. Outline
- High Level
- What can we learn from database research?
- How does it relate to sensor networks?
- Sensor Database Overview
- Distributed Computing Perspective
- Data-Centric Storage Approach
- Amorphous Computing
- My Research
3. Why databases?
- Sensor networks should be able to
- Accept queries for data
- Respond with results
- Users will need
- An abstraction that guarantees reliable results
- Largely autonomous, long lived network
4. Why databases?
- Sensor networks are capable of producing massive amounts of data
- Efficient organization of nodes and data will extend network lifetime
- Database techniques already exist for efficient data storage and access
5. Differences between databases and sensor networks
- Database
- Static data
- Centralized
- Failure is not an option
- Plentiful resources
- Administered
- Sensor Network
- Streaming data
- Large number of nodes
- Multi-hop network
- No global knowledge about the network
- Frequent node failure
- Energy is the scarce resource
- Limited memory
- Autonomous
6. Bridging the Gap
- What is needed to be able to treat a sensor network like a database?
- How should sensors be modeled?
- How should queries be formulated?
7. Sensor Database Overview
8. Traditional Approach: Warehousing
- Data is extracted from sensors and stored on a front-end server
- Query processing takes place on the front-end.
(Diagram: sensor nodes feed data to a front-end, which stores it in a warehouse.)
9. What We'd Like to Do: Sensor Database System
- Sensor Database System supports distributed query
processing over a sensor network
(Diagram: a SensorDB instance runs on each sensor node, alongside a front-end.)
10. Sensor Database System
- Characteristics of a sensor network: streams of data, uncertain data, large number of nodes, multi-hop network, no global knowledge about the network, failure is the rule, energy is the scarce resource, limited memory, no administration
- Can existing database techniques be reused in this new context? What are their limitations?
- What are the new problems? What are the new solutions?
11. Issues
- Representing sensor data
- Representing sensor queries
- Processing query fragments on sensor nodes
- Distributing query fragments
- Adapting to changing network conditions
- Dealing with site and communication failures
- Deploying and Managing a sensor database system
12. Performance Metrics
- High accuracy
- Distance between ideal answer and actual answer?
- Ratio of sensors participating in the answer?
- Low latency
- Time between when data is generated on the sensors and when the answer is returned
- Limited resource usage
- Energy consumption
13. Representing Sensor Data and Sensor Queries
- Sensor Data
- Output of signal processing functions
- Time-stamped values produced over a given duration
- Inherently distributed
- Sensor Queries
- Conditions on time and space
- Location-dependent queries
- Constraints on time stamps or aggregates over time windows
- Event notification
14. Early Work in Sensor Databases
- "Towards Sensor Database Systems"
- "Querying the Physical World"
- Philippe Bonnet, Johannes Gehrke, Praveen Seshadri
15. Fjording the Stream: An Architecture for Queries over Streaming Sensor Data
- How can existing database querying methods be applied to streaming data?
- How can we combine real-time sensor data with stored historical data?
- What architecture is appropriate for supporting simultaneous queries?
- How can we lower sensor power consumption, while still supporting a wide range of query types?
16. Traditional Database Operators
- Are implemented using pull mechanisms.
- Block on incoming data.
- Most require all of the data to be read first (e.g., sort, average).
- Optimized for classic I/O.
- Usually implemented as separate threads.
17. Hardware Architecture
- Centralized data processing.
- Sensor proxies read and configure sensors.
- Query processor interacts with proxies to request and get sensor data.
- Sensor proxies support multiple simultaneous queries, multiplexing the data.
18. Operators
- Implemented as state machines.
- Support a transition(state) method, which causes the operator to optionally read from its input queues, write to its output queue, and change state.
- Multiple operators run per thread, called by a scheduler (round-robin in the experiments).
- Allows fine-grained tuning of the processing time allocated to each operator (see the sketch below).
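- A minimal sketch of this operator model in Python (the class, method, and scheduler names here are illustrative assumptions, not the paper's actual API):

    from collections import deque

    class FilterOp:
        """State-machine operator: passes through tuples that satisfy a predicate."""
        def __init__(self, predicate, in_q, out_q):
            self.predicate, self.in_q, self.out_q = predicate, in_q, out_q
            self.state = "running"

        def transition(self):
            # Non-blocking: do a small unit of work, then return to the scheduler.
            if self.in_q:
                t = self.in_q.popleft()
                if self.predicate(t):
                    self.out_q.append(t)

    def run_round_robin(operators, rounds):
        # The scheduler controls how much processing time each operator receives.
        for _ in range(rounds):
            for op in operators:
                op.transition()

    # Sensors push readings into in_q; filtered tuples accumulate in out_q.
    in_q, out_q = deque([{"speed": 42}, {"speed": 71}]), deque()
    run_round_robin([FilterOp(lambda t: t["speed"] > 55, in_q, out_q)], rounds=2)
    print(list(out_q))  # [{'speed': 71}]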
19. Sensor-Sensitive Operators
- Certain operations (sum, average, sort) are impossible to perform on unbounded data streams.
- They can be performed on partial data windows.
- Joins can be implemented by hashing tuples (sketched below).
- Can provide aggregation based on current data, with continuous updates to parent operators.
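- One way to read the hashing idea is a symmetric hash join (a sketch under assumed names, not the Fjords implementation): each stream keeps a hash table on the join key, and every arriving tuple probes the other side's table, so results are emitted continuously.

    from collections import defaultdict

    class StreamHashJoin:
        """Symmetric hash join over two unbounded streams, joined on one key."""
        def __init__(self, key):
            self.key = key
            self.tables = (defaultdict(list), defaultdict(list))

        def insert(self, side, tup):
            # side is 0 or 1; returns the join results this tuple produces now.
            k = tup[self.key]
            self.tables[side][k].append(tup)
            return [(tup, other) if side == 0 else (other, tup)
                    for other in self.tables[1 - side][k]]

    j = StreamHashJoin(key="segment")
    j.insert(0, {"segment": 7, "speed": 63})
    print(j.insert(1, {"segment": 7, "limit": 65}))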
20. Sensor Proxy
- Responsible for configuring the sensors that belong to it: setting sensing frequency, aggregation policies, etc.
- To save power, each sensor only listens for commands from its proxy during short intervals.
- Handles incoming data from sensors and pushes it into the appropriate queues.
- Stores a copy to disk for historical queries.
- Performs data filtering, which it can sometimes offload to the sensors.
21. Building a Fjord
- For all sensor data sources, locate the proxy for the sensor and install a query on it to deliver tuples at a certain rate to a push queue.
- For non-sensor data sources, set up a pull queue to scan for data.
- Pipe the data through the operators specified by the query.
22. Query
- Find average car speeds during a time window (w), for all segments the user is interested in (knownSegments); see the sketch below.
- More complicated queries are possible, with joins of streaming sensor data and historical data stored in normal database fashion.
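- A rough Python rendering of that query (the field names segment, time, and speed are assumptions for illustration):

    from collections import defaultdict

    def avg_speed_per_segment(readings, known_segments, t_now, w):
        """Average speed per segment over the window [t_now - w, t_now]."""
        sums, counts = defaultdict(float), defaultdict(int)
        for r in readings:  # each r: {"segment": ..., "time": ..., "speed": ...}
            if r["segment"] in known_segments and t_now - w <= r["time"] <= t_now:
                sums[r["segment"]] += r["speed"]
                counts[r["segment"]] += 1
        return {s: sums[s] / counts[s] for s in counts}

    readings = [{"segment": 1, "time": 10, "speed": 60},
                {"segment": 1, "time": 12, "speed": 70},
                {"segment": 2, "time": 11, "speed": 30}]
    print(avg_speed_per_segment(readings, known_segments={1}, t_now=12, w=5))  # {1: 65.0}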
23. Dataflow for the Query
- Data is pushed from the sensors to the user, through the filter operator set up by the query.
- Multiple similar queries can be added to an existing fjord, instead of creating one fjord per query.
24. Experiment Sensors
- 32 CalTrans inductive loop sensors equipped with wireless radio links.
- The sensors consist of sixteen pairs (referred to as "upstream" and "downstream"), with one pair on either side of the freeway on eight distinct segments of I-80.
- Collect data at 60 Hz and relay it back to a server, where it is distributed to various database sources, such as the implemented Fjords.
25. A Traffic Application
- Traffic engineers want to know the speed and length of cars on a freeway.
- Two sensors are placed less than one car length apart.
- The pair of sensors will perform the computation together.
26. Cont'd
- Four time measurements are taken
- The speed and length of the car are deduced by the two sensors
- The results are relayed back to the proxy
27. Cont'd
- To measure a car's length to within 1 foot, assuming a maximum speed of 60 mph, sensors are sampled at 180 Hz
- Sensors collaborate locally to find the car's speed and length (see the sketch below)
- Results are sent to the base station
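- A sketch of the standard dual-loop arithmetic (the spacing, timestamps, and formulas here are illustrative assumptions, not taken from the paper):

    def speed_and_length(spacing_ft, up_on, up_off, down_on, down_off):
        """Four timestamps: when the car arrives at and leaves each loop."""
        speed = spacing_ft / (down_on - up_on)   # feet per second
        length = speed * (up_off - up_on)        # the car covers its own length
        return speed, length

    # Loops 20 ft apart; the car reaches the upstream loop at t = 0.00 s.
    speed, length = speed_and_length(20.0, 0.00, 0.17, 0.25, 0.42)
    print(round(speed, 1), "ft/s,", round(length, 1), "ft")  # 80.0 ft/s, 13.6 ft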
28. Power Usage
29. Conclusion
- Fjords allow sensors to be treated as database sources for querying, with little change in the overall architecture.
- Proxies can optimize the energy consumption of individual sensors based on user queries, and multiplex data from sensors to multiple queries.
- Processing is centralized, but can sometimes be offloaded to the sensors to lower the energy consumed by radio transmissions.
30. Aggregation
31. A Look at Aggregation
- "Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks" - Samuel Madden, Robert Szewczyk, Michael J. Franklin, David Culler
- Explores aggregation techniques that are application-independent:
- Count
- Min
- Max
- Sum
- Average
32. At a Glance
- Trying to minimize the number of messages sent
- All aggregation is done by building a tree
33. Tricks of the Trade
- How do you ensure an aggregate is correct?
- Compute it multiple times.
- How do you reduce the message overhead of redistributing queries?
- Piggyback the query along with data messages.
- Is there any way to further reduce the messaging overhead?
- Child nodes only report their aggregates if they've changed.
- Nodes can take advantage of multiple parents for redundancy reasons (see the sketch below).
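- A simplified illustration of tree-based in-network aggregation for Average, using (sum, count) partial states merged up a routing tree (a sketch of the general technique, not the authors' implementation):

    def aggregate(node, children, reading):
        """Return the (sum, count) partial state for the subtree rooted at node."""
        s, c = reading[node], 1
        for child in children.get(node, []):
            cs, cc = aggregate(child, children, reading)
            s, c = s + cs, c + cc
        return s, c  # one partial state record is forwarded to the parent

    children = {"root": ["a", "b"], "a": ["c"]}        # routing tree
    reading = {"root": 10, "a": 20, "b": 30, "c": 40}  # local sensor values
    s, c = aggregate("root", children, reading)
    print(s / c)  # 25.0 -- one message per node instead of flooding raw readings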
34. A Different Perspective on Aggregation
- "Scalable Fault-Tolerant Aggregation in Large Process Groups" - Indranil Gupta, Robbert van Renesse, Ken Birman
- Large process groups inherently need to communicate to accomplish a higher-level task
- Higher-level tasks are usually driven by aggregation
35. Goal
- Develop a protocol that allows accurate estimation of a global aggregate function
- Each group member should be able to calculate the global aggregate
36. Assumptions
- Asynchronous communication medium
- Unreliable message delivery
- Globally unique identifiers
- A routing layer capable of point-to-point communication
- The protocol is initiated at all members simultaneously
- No energy constraints
37. Metrics
- Protocol message complexity
- Protocol time complexity
- Completeness of the final result
38. A Note on Composable Functions
- If f is a composable global function, then
- f(W1 ∪ W2) = g( f(W1), f(W2) )
- where W1 and W2 are disjoint sets and g is a known function
- Example: let f and g be Max
- Max(W1 ∪ W2) = Max( Max(W1), Max(W2) )
39. Straw Man 1: Fully Distributed Solution
- Each member sends its vote to every other member
- O(N^2) message complexity
- O(N) time complexity
- Completeness of the final result will depend highly on the medium's loss rate.
40. Straw Man 2: Centralized Solution
- Each member sends its vote to a single leader, which calculates the aggregate and disseminates the result
- O(N) message complexity
- O(N) time complexity
- Additional overhead for election of leaders and coordination between them
41. Straw Man 3: Hierarchical Solution
- Grid Box Hierarchy
- Divide members into N/K grid boxes
- Assign each grid box a unique base-K identifier
- Grid boxes whose identifiers match in the first i base-K digits form a subtree of height i
42. Straw Man 3: Hierarchical Solution (continued)
- Global Aggregate Computation
- Performed bottom-up
- Requires log_K N phases
- Possible due to the composable nature of the global aggregate function
43. How is the Grid Box Hierarchy Built?
- Using a hash function
- Each member's ID is mapped into [0, 1)
- A member M belongs to grid box floor( H(M) * (N/K) ), written in base K
- Any member can calculate the grid box of another member (see the sketch below)
- The hash function can mirror the geographical/network topology
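- A sketch of that mapping (the specific hash function is an assumption; the scheme only requires one that every member can evaluate):

    import hashlib, math

    def grid_box(member_id, n, k):
        """Return the member's grid box identifier as a base-K digit string."""
        h = int(hashlib.sha1(member_id.encode()).hexdigest(), 16)
        frac = h / 16 ** 40            # SHA-1 has 40 hex digits, so frac is in [0, 1)
        boxes = n // k                 # roughly N/K grid boxes
        box = int(frac * boxes)
        digits = max(1, math.ceil(math.log(boxes, k)))
        out = []
        for _ in range(digits):        # write the box number in base K
            out.append(str(box % k))
            box //= k
        return "".join(reversed(out))

    # Any member can compute any other member's grid box from its ID alone;
    # members whose identifiers share the first i digits form a height-i subtree.
    print(grid_box("member-42", n=200, k=4))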
44. Hierarchy Approach with Leader Election
- Leader election occurs at all the internal nodes of the tree
- Leaders calculate the global aggregate for their subtree (recursively)
- The root then disseminates the result to all nodes
45. Hierarchy Approach with Leader Election (continued)
- Message complexity: O(N)
- Time complexity: O(log N)
- Completeness: this method is not fault-tolerant
46. The Gossiping Approach
- Adds fault tolerance to the Hierarchical Approach
- Gossiping is used to aggregate data instead of leader election
- The algorithm is started simultaneously at all members
- The algorithm requires log_K N phases
47. The Gossiping Approach (continued)
- Phase 1
- Every member M randomly selects members in its own grid box once per gossip round
- M then sends each selected member one randomly selected vote
- After K log N gossip rounds, M applies the aggregate function and moves to Phase 2
48. The Gossiping Approach (continued)
- Phase 2
- For i from 2 to (log_K N) + 1:
- Each member M randomly selects some members belonging to the same subtree of height i
- M then sends these selected members a randomly selected aggregate from an (i-1)-subtree
- After collecting enough of the (i-1)-subtree aggregates (or after a timeout), the loop continues
49. The Gossiping Approach (continued)
- Phase 3
- Each member M should now have an estimate of the global aggregate function (a single phase is sketched below)
- Time complexity: O(log^2 N)
- Message complexity: O(N log^2 N)
- Completeness: the probability that a random member's vote is included in the final aggregate is lower-bounded by (1 - 1/N)
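- A toy, single-grid-box illustration of one gossip phase with lossy delivery (the parameters and structure are simplified assumptions, not the full protocol):

    import random, math

    def gossip_phase(votes, rounds, fanout=2, loss_rate=0.25):
        """Members of one grid box push random known votes to random peers."""
        n = len(votes)
        known = [{i: votes[i]} for i in range(n)]   # each member knows its own vote
        for _ in range(rounds):
            for m in range(n):
                for target in random.sample(range(n), fanout):
                    if random.random() < loss_rate:
                        continue                    # unreliable message delivery
                    k = random.choice(list(known[m]))
                    known[target][k] = known[m][k]  # forward one random vote
        return [max(k.values()) for k in known]     # each member's local Max estimate

    votes = [random.randint(0, 100) for _ in range(50)]
    rounds = 2 * math.ceil(math.log2(len(votes)))
    print(max(votes), gossip_phase(votes, rounds)[:5])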
50. Simulation Results
- Scalability and fault-tolerance of the protocol
- Default parameters
- N = 200 members, K = 4
- 2 gossip targets per gossip round
- floor(log2 N) gossip rounds per phase
- 25% message loss rate
- 0.1% member failure rate per gossip round
- Metric
- Incompleteness = 1 - Completeness
- The risk of excluding a member's vote from the final aggregate estimate
(Slides 51-53: simulation result plots.)
54. Conclusion
- Aggregation of global properties in large process groups
- Time and message complexity, completeness
- Traditional solutions don't scale
- Hierarchical gossiping approach
- Scalability
- Good fault-tolerance
55. Data-Centric Storage in Sensornets (S. Ratnasamy, D. Estrin, R. Govindan, B. Karp, S. Shenker)
- Motivation for Data-Centric Storage
- In data-rich networks, data-centric algorithms seem to be energy efficient
- Data-centric routing has been shown to be energy efficient
- Data-centric storage could act as a companion to data-centric routing to save even more energy
56. Data-Centric Storage: Applicability
- Assumptions
- Ad-hoc deployment over a known area
- Nodes can communicate with several neighbors via
short range radio - Nodes know their own location
- Energy is scarce (Gasp!)
- Data enters/leaves the sensornet via access point(s)
- Network and communication topology is largely static
57. Data-Centric Storage: Applicability
- Definitions
- Observations: low-level readings from basic sensors
- e.g., temperature, light, humidity, et al.
- Event: an interesting collection of low-level observations
- May combine several modalities
- Event notifications contain the location of the event, making the observations available
58. Data-Centric Storage: Applicability
- More definitions
- Task: what a user specifies the sensornet to do
- Action: what a node should do upon observing an event
- Query: how a user specifies the data of interest
59. Data-Centric Storage: Applicability
- Three types of actions
- External Store
- Data is sent out of the network for processing
- Message Cost O( sqrt(n) )
- Local Store
- Data is stored at the event source
- Query Cost O( n )
- Response Cost O( sqrt(n) )
- Data-Centric Store
- Data is sent to a specific node
- Storage Cost O( sqrt(n) )
- Query / Response Cost O( sqrt(n) )
60. Data-Centric Storage: Applicability
- The Scenario
- Event locations are not known in advance
- Event locations are random
- Tasks are long-lived
- Only one access point
- Detecting events requires much more energy than ongoing monitoring of data
- Users may only be interested in event summaries
61. Data-Centric Storage: Applicability
- Scenario parameters
- n = number of nodes in the network
- T = number of unique event types
- Dtotal = total number of events detected
- Q = number of event types (out of T) for which queries are issued
- Dq = number of events detected for the queried event types
62. Data-Centric Storage: Applicability
- Costs (restated in code below)
- External Storage
- Total: Dtotal * sqrt(n)
- Hotspot: Dtotal
- Local Storage
- Total: Q * n + Dq * sqrt(n)
- Hotspot: Q + Dq
- Data-Centric Storage
- Total (list): Q * sqrt(n) + Dtotal * sqrt(n) + Dq * sqrt(n)
- Total (summary): Q * sqrt(n) + Dtotal * sqrt(n) + Q * sqrt(n)
- Hotspot (list): Q + Dq
- Hotspot (summary): 2 * Q
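- The cost expressions above, restated as code for quick comparison (asymptotic constants are ignored; the example numbers are arbitrary):

    from math import sqrt

    def total_messages(n, q, d_total, d_q):
        """Total message counts for the three storage schemes."""
        return {"external": d_total * sqrt(n),
                "local": q * n + d_q * sqrt(n),
                "dcs_list": (q + d_total + d_q) * sqrt(n),
                "dcs_summary": (q + d_total + q) * sqrt(n)}

    def hotspot_load(q, d_total, d_q):
        """Messages through the most loaded node (access point or home node)."""
        return {"external": d_total, "local": q + d_q,
                "dcs_list": q + d_q, "dcs_summary": 2 * q}

    # Example: many detected events, relatively few queries.
    print(total_messages(n=10_000, q=50, d_total=1_000, d_q=500))
    print(hotspot_load(q=50, d_total=1_000, d_q=500))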
63. Data-Centric Storage: Applicability
- Observations
- As n gets large, local storage costs the most
- External storage always incurs a lower total message count
- With summarized events, data-centric storage has the smallest load
- With listed events, local and data-centric storage have significantly lower access loads compared to external storage
64. Data-Centric Storage: Mechanisms
- Distributed hash-table
- Put( key, value )
- Get( key )
- Implementation details are left to one of several P2P computing schemes
65. Data-Centric Storage: Mechanisms
- Greedy Perimeter Stateless Routing (almost)
- In GPSR, a message is dropped if no node exists at the specified location
- Data-centric storage instead routes a message to the node closest to the specified location
- To find an event, the tuples that describe it are used as inputs to the hash function
- The query is then routed to the node corresponding to the hash function's output (see the sketch below)
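- A sketch of the put/get idea: hash a key to a location, then store or look up at the node closest to that location. The hash function and the brute-force closest-node search here are assumptions standing in for GPSR routing to the home node.

    import hashlib

    def hash_to_location(key, width, height):
        """Hash an event name to an (x, y) point in the sensor field."""
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return (h % 1000) / 1000 * width, (h // 1000 % 1000) / 1000 * height

    def closest_node(nodes, point):
        return min(nodes, key=lambda p: (p[0] - point[0]) ** 2 + (p[1] - point[1]) ** 2)

    storage = {}  # per-node key/value store, keyed by the node's coordinates

    def put(nodes, key, value):
        home = closest_node(nodes, hash_to_location(key, 100, 100))
        storage.setdefault(home, {})[key] = value

    def get(nodes, key):
        home = closest_node(nodes, hash_to_location(key, 100, 100))
        return storage.get(home, {}).get(key)

    nodes = [(10, 10), (90, 20), (50, 80), (30, 60)]
    put(nodes, "elephant-sighting", {"loc": (42, 17)})
    print(get(nodes, "elephant-sighting"))  # the query hashes to the same home node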
66. Data-Centric Storage: Mechanisms
- Robustness
- Refresh: periodically, the data cache sends a refresh to the event source
- If a node closer to the key receives a refresh, it becomes the new data cache
- Local replication
- Any node hearing a refresh caches the associated data
67. Data-Centric Storage: Mechanisms
- Scalability
- Structured Replication
- Events are stored at the closest mirror
- Reduces storage cost by a factor of 2^d
- d is dependent upon the number of mirrors
- Queries must be routed to all mirrors
68. The Future of Sensor Networks?
- Amorphous Computing
- Draws heavily from biological and physical metaphors
- The Setup
- Vast number of unreliable components
- Asynchronous
- Irregularly placed, but very dense
- Interconnects are unknown and/or unreliable
- The Goal
- How can we engineer coherent behavior?
69. Amorphous Computing
- Programming Paradigms
- Nodes are all identical
- Same program
- Can store local state
- Can generate random numbers
- No knowledge of position or orientation
- Can communicate with all nodes within a radius R
70. Amorphous Computing
- Wave Propagation
- Simulates chemical diffusion amongst cells
- Chemicals alter the state of nodes
- Growing-Point Language
71. Amorphous Computing Example
- A growing point diffuses pheromone
- Pheromone is specified to diffuse for H hops
72. Amorphous Computing Example
- A growing point diffuses pheromone
- Pheromone is specified to diffuse for H hops
- Because of dense deployment, a circle of radius R * H is created (simulated in the sketch below)
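- A toy simulation of that claim: flood a hop-count "pheromone" from a central growing point over densely, randomly placed nodes and check how far it reaches (all parameters here are assumptions):

    import random, math
    from collections import deque

    random.seed(1)
    R, H = 0.1, 5                                  # radio radius and hop budget
    nodes = [(random.random(), random.random()) for _ in range(3000)]
    source = min(range(len(nodes)),                # growing point near the center
                 key=lambda i: (nodes[i][0] - 0.5) ** 2 + (nodes[i][1] - 0.5) ** 2)

    hops = {source: 0}
    frontier = deque([source])
    while frontier:                                # breadth-first flood, hop by hop
        i = frontier.popleft()
        if hops[i] == H:
            continue
        for j in range(len(nodes)):
            if j not in hops and math.dist(nodes[i], nodes[j]) <= R:
                hops[j] = hops[i] + 1
                frontier.append(j)

    reached = [math.dist(nodes[i], nodes[source]) for i in hops]
    print(len(hops), "nodes reached; max distance ~", round(max(reached), 2), "vs R*H =", R * H)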
73. Wave Propagation Example
- Growing a line
- One GP diffuses a pheromone (blue)
74. Wave Propagation Example
- Growing a line
- One GP diffuses a pheromone (blue)
- The green GP diffuses a pheromone that is accepted only by nodes that have a higher red pheromone concentration than the previous hop.
75. Amorphous Computing
- Proven to be able to produce any planar graph.
- Global behavior emerges from local interaction
- Proposes models for using biological components
as computational elements
76. Amorphous Computing
- Fault Tolerance
- Redundancy?
- Abstractly structuring systems to produce the
right answer with high probability
77. A Little Bit About My Work
- I was born in Santa Monica, CA
- The ultimate goal is Complex Tasking of Sensor Networks
- Currently
- Efficient, in-network algorithms for identifying
contours, gradients, and regions of interest
78. Contour/Gradient/Region Finding
- A first stab at in-network processing
- Useful to many applications
- Topology
- Marine biology
- Geology
- Chemical Concentrations
- and much much more!
79. In the Future
- Sensor networks should be autonomous
- Questions
- What sort of infrastructure makes pattern finding (and in-network processing) more efficient?
- Goal
- To program or task the system efficiently
80. My Class Project
- Using the Mica testbench to collect real sensornet data
- With this data I plan to perform simulations with the goal of algorithmic development
81. Acknowledgements
- DARPA SensIT Program
- http://www.darpa.mil/ito/research/sensit/
- Many thanks to Steve Beck, Richard Brooks, Jason
Hill, Bill Kaiser, Donald Kossman, Sri Kumar,
Tobias Mayr, Kris Pister, Joe Paradiso
82. Acknowledgements
- "Scalable Fault-Tolerant Aggregation in Large Process Groups" - Indranil Gupta, Robbert van Renesse, Kenneth Birman
- "Fjording the Stream: An Architecture for Queries over Streaming Sensor Data" - Samuel Madden, Michael J. Franklin
- "Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks" - Samuel Madden, Robert Szewczyk, Michael J. Franklin, David Culler
- "Amorphous Computing" - Harold Abelson, Don Allen, Daniel Coore, Chris Hanson, George Homsy, Thomas F. Knight, Jr., Radhika Nagpal, Erik Rauch, Gerald Jay Sussman, Ron Weiss