Title: Diagnosing and Debugging Wireless Sensor Networks
1. Diagnosing and Debugging Wireless Sensor Networks
- Eric Osterweil
- Nithya Ramanathan
2. Contents
- Introduction
- Network Management
- Parallel Processing
- Distributed Fault Tolerance
- WSNs
- Calibration / Model Based
- Conclusion
3. What do apples, oranges, and peaches have in common?
- Well, they are all fruits, they all grow in groves of trees, etc.
- However, grapes are also fruits, but they grow on vines! :)
4. Defining the Problem
- Debugging: an iterative process of detecting and discovering the root cause of faults
- Distinct debugging phases:
- Pre-deployment
- During deployment
- Post-deployment
- Ongoing maintenance / performance analysis
- How is this different from debugging?
5. Characteristic Failures [1,2]
- Pre-deployment
- Bugs characteristic of wireless, embedded, and distributed platforms
- During deployment
- Not receiving data at the sink
- Neighbor density (or lack thereof)
- Badly placed nodes
- Flaky/variable link connectivity

[1] R. Szewczyk, J. Polastre, A. Mainwaring, D. Culler. "Lessons from a Sensor Network Expedition." In EWSN, 2004.
[2] A. Mainwaring, J. Polastre, R. Szewczyk, D. Culler. "Wireless Sensor Networks for Habitat Monitoring." In ACM International Workshop on Wireless Sensor Networks and Applications.
6. Characteristic Failures (continued)
- Post-Deployment
- Failed/rebooted nodes
- Funny nodes/sensors
- Batteries with low voltage levels
- Uncalibrated sensors
- Ongoing Maintenance / Performance
- Low bandwidth / dropped data from certain regions
- High power consumption
- Poor load-balancing, or high re-transmission rate
7. Scenarios
- You have just deployed a sensor network in the forest and are not getting data from any node. What do you do?
- You are getting wildly fluctuating averages from a region. Is this caused by:
- Actual environmental fluctuations?
- Bad sensors?
- Randomly dropped data?
- Calculation / algorithmic errors?
- Tampered nodes?
8. Challenges
- Existing tools fall short for sensor networks:
- Limited visibility
- Resource-constrained nodes (can't run gdb)
- Bugs characteristic of embedded, distributed, and wireless platforms
- Can't always use existing Internet fault-tolerance techniques (e.g., rebooting)
- Extracting debugging information:
- With minimal disturbance to the network
- Identifying information used to infer internal state
- Minimizing central processing
- Minimizing resource consumption
9. Challenges (continued)
- Applications behave differently in the field
- Testing configuration changes
- Can't easily log on to nodes
- Identifying performance-blocking bugs
- Can't continually monitor the network manually (often physically impossible, depending on the deployment environment)
10. Contents
- Introduction
- Network Management
- Parallel Processing
- Distributed Fault Tolerance
- WSNs
- Calibration / Model Based
- Conclusion
11. What is Network Management?
- I don't have to know anything about my neighbor to count on them
12. Network Management
- Observing and tracking nodes
- Routers
- Switches
- Hosts
- Ensuring that nodes are providing connectivity
- i.e., doing their jobs
13. Problem
- Connectivity failures versus device failures
- Correlating outages with their cause(s)
14. Outage Example
15. Approach
- Polling
- ICMP
- SNMP
- Downstream event suppression (see the sketch after this list)
- If routing has failed, ignore events about downstream nodes
- Modeling
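To make downstream event suppression concrete, here is a minimal sketch in Python, assuming a simple parent-pointer topology; the node names and data structures are hypothetical, not drawn from any particular management product.

```python
# Sketch: suppress alarms for nodes sitting behind a failed device, so the
# operator only sees the root cause. Topology and names are hypothetical.

parents = {                # child -> next hop toward the monitoring station
    "host-a": "switch-1",
    "host-b": "switch-1",
    "switch-1": "router-1",
    "router-1": None,      # directly reachable from the monitoring station
}

failed = {"router-1"}      # devices that failed a poll (e.g., ICMP timeout)

def masked_by_upstream_failure(node: str) -> bool:
    """True if any hop between node and the monitoring station has failed."""
    hop = parents.get(node)
    while hop is not None:
        if hop in failed:
            return True
        hop = parents.get(hop)
    return False

def filter_alarms(alarms: list[str]) -> list[str]:
    """Keep only alarms that are not explained by an upstream outage."""
    return [a for a in alarms if not masked_by_upstream_failure(a)]

# router-1's own alarm survives; everything behind it is suppressed.
print(filter_alarms(["router-1", "switch-1", "host-a", "host-b"]))
```

The point is that one upstream failure explains many downstream symptoms, so only the first failed hop is worth reporting.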
16. Outage Example (2)
17. How does this area differ from WSNs?
18. Applied to WSNs
- Similarities
- Similar topologies
- Intersecting operations
- Network forwarding, routing, etc.
- Connectivity vs. device failures
- Differences
- Network links
- Topology dynamism
19. Contents
- Introduction
- Network Management
- Parallel Processing
- Distributed Fault Tolerance
- WSNs
- Calibration / Model Based
- Conclusion
20. What is Parallel Processing?
- If one car is fast, are 1,000 cars 1,000 times
faster?
21. Parallel Processing
- Coordinating large sets of nodes
- Cluster sizes can range to the order of 10^4 nodes
- Knowing nodes' states
- Efficient resource allocation
- Low communication overhead
22. Problem
- Detecting faults
- Recovering from faults
- Reducing communication overhead
- Maintenance
- Software distribution, upgrades, etc.
23. Approach
- Low-overhead state checks (see the sketch after this list)
- ICMP
- UDP-based protocols and topology sensitivity
- Ganglia
- Process recovery
- Process checkpoints
- Condor
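Below is a minimal sketch of the low-overhead state-check idea, in the spirit of Ganglia's periodic UDP announcements; it is not Ganglia's actual wire protocol, and the JSON message format is invented.

```python
# Sketch: UDP heartbeats as a low-overhead cluster state check. In the
# spirit of Ganglia's periodic announcements, but NOT its wire protocol;
# the JSON message format here is invented.
import json
import socket
import time

PORT = 8649  # Ganglia's default gmond port, reused here for flavor only

def send_heartbeat(sock: socket.socket, node_id: str, dest: str) -> None:
    """One small datagram announces liveness plus a timestamp."""
    msg = json.dumps({"node": node_id, "ts": time.time()}).encode()
    sock.sendto(msg, (dest, PORT))

def stale_nodes(last_seen: dict[str, float], timeout: float = 30.0) -> list[str]:
    """Nodes whose latest heartbeat is older than `timeout` seconds."""
    now = time.time()
    return [n for n, ts in last_seen.items() if now - ts > timeout]

# Example: node-3 has not been heard from for a minute -> flagged.
print(stale_nodes({"node-1": time.time(), "node-3": time.time() - 60}))
```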
24. How does this area differ from WSNs?
25. Applied to WSNs
- Similarities
- Potentially large sets of nodes
- Relatively difficult to track state (due to resource constraints)
- Communication overheads are limiting
26. Applied to WSNs (continued)
- Differences
- Topology is more dynamic in WSNs
- Communications are more constrained
- Deployment is not structured around computation
- Energy is limiting, rather than computation overhead
- WSNs are much less latency-sensitive
27. Contents
- Introduction
- Network Management
- Parallel Processing
- Distributed Fault Tolerance
- WSNs
- Calibration / Model Based
- Conclusion
28. What is Distributed Fault Tolerance?
- Put me in, coach... PUT ME IN!
29. Distributed Fault Tolerance
- High Availability is a broad category
- Hot backups (failover)
- Load balancing
- etc.
30. Problem(s)
- HA
- Track status of nodes
- Keeping access to critical resources available as much as possible
- Sacrificing hardware for low latency
- Load balancing
- Track status of nodes
- Keeping load even
31. Approach
- HA
- High-frequency / low-latency heartbeats
- Failover techniques
- Virtual interfaces
- Shared volume mounting
- Load balancing
- Metric selection (round robin, least connections, etc.; see the sketch below)
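The load-balancing metrics can be sketched in a few lines; this is an illustrative model only (a real balancer tracks connection counts from live state), and the backend names are hypothetical.

```python
# Sketch: two common load-balancing metrics. Illustrative only; a real
# balancer updates connection counts from live state. Names are invented.
from itertools import cycle

backends = ["node-a", "node-b", "node-c"]

rr = cycle(backends)                  # round robin: rotate through the pool
def pick_round_robin() -> str:
    return next(rr)

open_conns = {"node-a": 12, "node-b": 3, "node-c": 7}
def pick_least_connections() -> str:
    return min(open_conns, key=open_conns.get)  # fewest open connections wins

print(pick_round_robin())         # node-a
print(pick_least_connections())   # node-b
```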
32. How does this area differ from WSNs?
33. Applied to WSNs
- HA / load balancing
- Similarities
- Redundant resources
- Differences
- Where to begin? MANY
34. Contents
- Introduction
- Network Management
- Parallel Processing
- Distributed Fault Tolerance
- WSNs
- Calibration / Model Based
- Conclusion
35. What are WSNs?
- Warning: any semblance of an orderly system is purely coincidental
36. BluSH [1]
- Shell interface for Intel's iMotes
- Enables interactive debugging: one can walk up to a mote and access internal state

[1] Tom Schoellhammer
37. Sympathy [1,2]
- Aids in debugging
- Pre-, during, and post-deployment
- Nodes collect metrics, periodically broadcast to the sink (see the sketch below)
- Sink ensures "good qualities" specified by the programmer
- Based on metrics and other gathered information
- Faults are identified and categorized by metrics and tests
- Spatio-temporal correlation of distributed events to root-cause failures
- Test injection
- Proactively injects network probes to validate a fault hypothesis
- Triggers self-tests (internal actuation)

[1] N. Ramanathan, E. Kohler, D. Estrin. "Towards a Debugging System for Sensor Networks." International Journal of Network Management, 2005.
[2] N. Ramanathan, E. Kohler, L. Girod, D. Estrin. "Sympathy: A Debugging System for Sensor Networks." In Proceedings of the First IEEE Workshop on Embedded Networked Sensors, Tampa, Florida, USA, November 16, 2004.
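To illustrate the metric-driven flow, here is a minimal sketch of a sink-side check; the metric names, thresholds, and quality predicates below are invented for illustration and are not Sympathy's actual metric set.

```python
# Sketch of a Sympathy-style check at the sink: nodes report metrics,
# and the sink flags violations of programmer-specified "good qualities".
# Metric names and thresholds are invented, not Sympathy's actual set.

reports = {
    "node-7":  {"pkts_heard_at_sink": 48, "neighbors": 4, "uptime_s": 86400},
    "node-12": {"pkts_heard_at_sink": 0,  "neighbors": 0, "uptime_s": 55},
}

qualities = {
    "pkts_heard_at_sink": lambda v: v > 0,    # sink should be hearing the node
    "neighbors":          lambda v: v >= 1,   # node should have neighbors
    "uptime_s":           lambda v: v > 300,  # a recent reboot is suspicious
}

def violated(node: str) -> list[str]:
    """Names of the qualities this node's metrics fail."""
    return [q for q, ok in qualities.items() if not ok(reports[node][q])]

for node in reports:
    if violated(node):
        print(f"{node}: candidate fault, violates {violated(node)}")
```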
38. SNMS [1]
- Enables interactive health monitoring of a WSN in the field
- Three pieces:
- Parallel dissemination and collection
- Query system for exported attributes (see the sketch below)
- Logging system for asynchronous events
- Small footprint / low overhead
- Introduces overhead only when a human queries

[1] Gilman Tolle, David Culler. "Design of an Application-Cooperative Management System for WSN." In Second EWSN, Istanbul, Turkey, January 31 - February 2, 2005.
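A rough model of the exported-attribute query idea, for illustration only: SNMS itself runs on TinyOS motes, so this Python sketch with invented attribute names just shows the shape of the mechanism, including that cost is incurred only when someone asks.

```python
# Sketch of an SNMS-style attribute query. SNMS runs on TinyOS motes; this
# Python model (with invented attribute names) only shows the idea: a node
# exports named attributes and pays the read cost only when queried.

attributes = {
    "battery_mv": lambda: 2731,   # stand-in for an on-demand ADC read
    "parent_id":  lambda: 4,      # current routing parent
    "radio_rssi": lambda: -71,    # last observed signal strength
}

def query(names: list[str]) -> dict[str, int]:
    """Resolve only the requested attributes -- no overhead otherwise."""
    return {n: attributes[n]() for n in names if n in attributes}

print(query(["battery_mv", "parent_id"]))  # {'battery_mv': 2731, 'parent_id': 4}
```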
39. Contents
- Introduction
- Network Management
- Parallel Processing
- Distributed Fault Tolerance
- WSNs
- Calibration / Model Based
- Conclusion
40. What is Calibration and Modeling?
- Hey, if you and I both think the answer is true, then who's to say we're wrong? :)
41. Modeling [1,2,3]
- Root-cause localization in large-scale systems
- The process of identifying the source of problems in a system using purely external observations
- Identify anomalous behavior based on externally observed metrics
- Statistical analysis and Bayesian networks are used to identify faults (see the sketch below)

[1] E. Kiciman, A. Fox. "Detecting Application-Level Failures in Component-Based Internet Services." IEEE Transactions on Neural Networks, Spring 2004.
[2] A. Fox, E. Kiciman, D. Patterson, M. Jordan, R. Katz. "Combining Statistical Monitoring and Predictable Recovery for Self-Management." In Proc. of the Workshop on Self-Managed Systems, Oct 2004.
[3] E. Kiciman, L. Subramanian. "Root Cause Localization in Large Scale Systems."
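As a toy stand-in for the statistical side (a plain z-score test; the cited papers use richer statistics and Bayesian networks), the sketch below flags components whose externally observed metric deviates sharply from the rest. The component names and latency values are invented.

```python
# Sketch: root-cause candidates from external observations alone, via a
# simple z-score test. The cited work uses richer statistics and Bayesian
# networks; component names and latencies here are invented.
import statistics

latency_ms = {
    "frontend": 21.0, "auth": 19.5, "catalog": 22.3,
    "checkout": 98.0, "search": 20.1,
}

def anomalies(metrics: dict[str, float], threshold: float = 1.5) -> list[str]:
    """Components more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(metrics.values())
    stdev = statistics.stdev(metrics.values())
    return [c for c, v in metrics.items() if abs(v - mean) / stdev > threshold]

print(anomalies(latency_ms))  # ['checkout']
```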
42. Calibration [1,2]
- Model physical phenomena in order to predict which sensors are faulty
- The model can be based on:
- The environment that is monitored, e.g., assume that the majority of sensors are providing correct data, and then identify sensors that make this model inconsistent [1]
- Assumptions about the environment, e.g., in a densely sampled area, values of neighboring sensors should be similar [2] (see the sketch below)
- Debugging can be viewed as sensor network system calibration
- Use system metrics instead of sensor data
- Based on a model of what metrics should look like in a properly behaving system, faulty behavior can be identified from inconsistent metrics
- Locating and using ground truth:
- In situ deployments
- Low communication/energy budgets
- Bias
- Noise

[1] Jessica Feng, S. Megerian, M. Potkonjak. "Model-Based Calibration for Sensor Networks." IEEE International Conference on Sensors, Oct 2003.
[2] Vladimir Bychkovskiy, Seapahn Megerian, et al. "A Collaborative Approach to In-Place Sensor Calibration."
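A minimal sketch of the neighbor-similarity assumption (illustrative only; not the algorithm of the cited calibration papers): compare each sensor's reading with the median of its neighbors' readings and flag large disagreements. The readings, adjacency, and threshold are all invented.

```python
# Sketch: under dense sampling, a sensor that disagrees sharply with the
# median of its neighbors is suspect. Illustrative only -- not the
# algorithm of the cited papers; all values here are invented.
import statistics

readings = {"s1": 19.8, "s2": 20.1, "s3": 34.6, "s4": 20.3}
neighbors = {                     # hypothetical adjacency from placement
    "s1": ["s2", "s4"],
    "s2": ["s1", "s3", "s4"],
    "s3": ["s1", "s2", "s4"],
    "s4": ["s1", "s2", "s3"],
}

def suspect_sensors(max_diff: float = 5.0) -> list[str]:
    """Sensors deviating from their neighbors' median by more than max_diff."""
    out = []
    for s, nbrs in neighbors.items():
        med = statistics.median(readings[n] for n in nbrs)
        if abs(readings[s] - med) > max_diff:
            out.append(s)
    return out

print(suspect_sensors())  # ['s3']
```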
43. Contents
- Introduction
- Network Management
- Parallel Processing
- Distributed Fault Tolerance
- WSNs
- Calibration / Model Based
- Conclusion
44. Promising Ideas
- Management by Delegation
- Naturally supports heterogeneous architectures by distributing control over the network
- Dynamically tasks/empowers less-capable nodes using mobile code
- AINs
- A node can monitor its own behavior and detect, diagnose, and repair issues
- Model-based fault detection
- Models of the physical environment
- Bayesian inference engines
45. Comparison
- Network Management
- Close, but includes some inflexible assumptions
- Parallel Processing
- Many similar, but divergent constraints
- Distributed Fault Tolerance
- Almost totally different
- WSNs
- New techniques emerging
- Calibration
- WSN related work becoming available
46. Conclusion
- Distributed debugging is as distributed debugging does [1]
- WSNs are a particular class of distributed system
- There are numerous techniques for distributed debugging
- Different conditions warrant different approaches
- OR different spins on existing techniques

[1] F. Gump et al.
47. References
- Todd Tannenbaum, Derek Wright, Karen Miller, and Miron Livny. "Condor - A Distributed Job Scheduler." In Thomas Sterling, editor, Beowulf Cluster Computing with Linux, The MIT Press, 2002. ISBN 0-262-69274-0.
- http://www.open.com/pdfs/alarmsuppression.pdf
- http://www.top500.org/
- D.E. Culler and J.P. Singh. Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, 1999. ISBN 1-55860-343-3.
- Matthew L. Massie, Brent N. Chun, and David E. Culler. "The Ganglia Distributed Monitoring System: Design, Implementation, and Experience." Parallel Computing, Vol. 30, Issue 7, July 2004.
- "HA-OSCAR Release 1.0 Beta: Unleashing HA-Beowulf." 2nd Annual OSCAR Symposium, Winnipeg, Manitoba, Canada, May 2004.
48. Questions?