1
Diagnosing and Debugging Wireless Sensor Networks
  • Eric Osterweil
  • Nithya Ramanathan

2
Contents
  • Introduction
  • Network Management
  • Parallel Processing
  • Distributed Fault Tolerance
  • WSNs
  • Calibration / Model Based
  • Conclusion

3
What do apples, oranges, and peaches have in common?
  • Well, they are all fruits, they all grow in
    groves of trees, etc.

However, grapes are also fruits, but they grow
on vines! :)
4
Defining the Problem
  • Debugging: an iterative process of detecting
    faults and discovering their root causes
  • Distinct debugging phases
  • Pre-deployment
  • During deployment
  • Post-deployment
  • Ongoing maintenance / performance analysis: how
    is this different from debugging?

5
Characteristic Failures [1,2]
  • Pre-Deployment
  • Bugs characteristic of wireless, embedded, and
    distributed platforms
  • During Deployment
  • Not receiving data at the sink
  • Neighbor density (or lack thereof)
  • Badly placed nodes
  • Flaky/variable link connectivity

[1] R. Szewczyk, J. Polastre, A. Mainwaring, D. Culler.
"Lessons from a Sensor Network Expedition." In EWSN, 2004.
[2] A. Mainwaring, J. Polastre, R. Szewczyk, D. Culler.
"Wireless Sensor Networks for Habitat Monitoring." In ACM
International Workshop on Wireless Sensor Networks and
Applications (WSNA), 2002.
6
Characteristic Failures (continued)
  • Post-Deployment
  • Failed/rebooted nodes
  • Erratic ("funny") nodes/sensors
  • Batteries with low voltage levels
  • Uncalibrated sensors
  • Ongoing Maintenance / Performance
  • Low bandwidth / dropped data from certain regions
  • High power consumption
  • Poor load balancing, or high retransmission rate

7
Scenarios
  • You have just deployed a sensor network in the
    forest, and are not getting data from any node.
    What do you do?
  • You are getting wildly fluctuating averages from
    a region. Is this caused by:
  • Actual environmental fluctuations?
  • Bad sensors?
  • Randomly dropped data?
  • Calculation / algorithmic errors?
  • Tampered nodes?

8
Challenges
  • Existing tools fall short for sensor networks
  • Limited visibility
  • Resource-constrained nodes (can't run gdb)
  • Bugs characteristic of embedded, distributed, and
    wireless platforms
  • Can't always use existing Internet
    fault-tolerance techniques (e.g. rebooting)
  • Extracting Debugging Information
  • With minimal disturbance to the network
  • Identifying information used to infer internal
    state
  • Minimizing central processing
  • Minimizing resource consumption

9
Challenges (continued)
  • Applications behave differently in the field
  • Testing configuration changes
  • Can't easily log on to nodes
  • Identifying performance-blocking bugs
  • Can't continually monitor the network by hand
    (often physically impossible, depending on the
    deployment environment)

10
Contents
  • Introduction
  • Network Management
  • Parallel Processing
  • Distributed Fault Tolerance
  • WSNs
  • Calibration / Model Based
  • Conclusion

11
What is Network Management?
  • I don't have to know anything about my neighbors
    to count on them

12
Network Management
  • Observing and tracking nodes
  • Routers
  • Switches
  • Hosts
  • Ensuring that nodes are providing connectivity
  • i.e. doing their jobs

13
Problem
  • Connectivity failures versus device failures
  • Correlating outages with their cause(s)

14
Outage Example
15
Approach
  • Polling
  • ICMP
  • SNMP
  • Downstream event suppression (see the sketch
    below)
  • If routing has failed, ignore events about
    downstream nodes
  • Modeling
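
The following is a minimal sketch of downstream event
suppression; the topology, node names, and parent-pointer
representation are illustrative assumptions, not taken
from any particular management product.

```python
# Sketch: suppress outage alarms for nodes whose path to the
# management station passes through a node that is already down,
# so only likely root causes are reported. Topology and node
# names below are illustrative assumptions.

parent = {            # child -> next hop toward the management station
    "leaf1": "switchA",
    "leaf2": "switchA",
    "switchA": "routerX",
    "routerX": None,  # directly attached to the management station
}

def is_downstream_of_failure(node, failed):
    """Walk toward the root; True if any upstream hop already failed."""
    hop = parent.get(node)
    while hop is not None:
        if hop in failed:
            return True
        hop = parent.get(hop)
    return False

def filter_alarms(unreachable):
    """Keep only outages not explained by an upstream failure."""
    failed = set(unreachable)
    return [n for n in unreachable if not is_downstream_of_failure(n, failed)]

print(filter_alarms(["leaf1", "leaf2", "switchA"]))  # -> ['switchA']
```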

16
Outage Example (2)
17
How does this area differ from WSNs?
18
Applied to WSNs
  • Similarities
  • Similar topologies
  • Intersecting operations
  • Network forwarding, routing, etc.
  • Connectivity vs. device failures
  • Differences
  • Network links
  • Topology dynamism

19
Contents
  • Introduction
  • Network Management
  • Parallel Processing
  • Distributed Fault Tolerance
  • WSNs
  • Calibration / Model Based
  • Conclusion

20
What is Parallel Processing?
  • If one car is fast, are 1,000 cars 1,000 times
    faster?

21
Parallel Processing
  • Coordinating large sets of nodes
  • Cluster sizes can reach the order of 10^4 nodes
  • Knowing nodes' states
  • Efficient resource allocation
  • Low communication overhead

22
Problem
  • Detecting faults
  • Recovering from faults
  • Reducing communication overhead
  • Maintenance
  • Software distributions, upgrades, etc.

23
Approach
  • Low-overhead state checks (see the heartbeat
    sketch below)
  • ICMP
  • UDP-based protocols and topology sensitivity
  • Ganglia
  • Process recovery
  • Process checkpoints
  • Condor
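
Below is a minimal sketch of soft-state liveness tracking
in the spirit of Ganglia-style heartbeats; the timeout
value, node names, and timestamps are illustrative
assumptions, not Ganglia's actual protocol.

```python
import time

# Sketch: each node periodically announces itself; the monitor
# declares a node failed once its last announcement is older than
# a timeout. In a real system the announcements would arrive as
# UDP datagrams; here they are simulated with explicit timestamps.

HEARTBEAT_TIMEOUT = 20.0  # seconds of silence before declaring failure
last_seen = {}            # node id -> timestamp of most recent heartbeat

def on_heartbeat(node_id, now=None):
    """Record a heartbeat from a node."""
    last_seen[node_id] = now if now is not None else time.time()

def failed_nodes(now=None):
    """List nodes that have been silent longer than the timeout."""
    now = now if now is not None else time.time()
    return [n for n, t in last_seen.items() if now - t > HEARTBEAT_TIMEOUT]

on_heartbeat("node-01", now=100.0)
on_heartbeat("node-02", now=115.0)
print(failed_nodes(now=125.0))  # -> ['node-01'] (silent for 25 s)
```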

24
How does this area differ from WSNs?
25
Applied to WSNs
  • Similarities
  • Potentially large sets of nodes
  • Tracking state is relatively difficult (due to
    resource constraints)
  • Communication overheads are limiting

26
Applied to WSNs (continued)
  • Differences
  • Topology is more dynamic in WSNs
  • Communications are more constrained
  • Deployment is not structured around computation
  • Energy, rather than computation overhead, is the
    limiting factor
  • WSNs are much less latency-sensitive

27
Contents
  • Introduction
  • Network Management
  • Parallel Processing
  • Distributed Fault Tolerance
  • WSNs
  • Calibration / Model Based
  • Conclusion

28
What is Distributed Fault Tolerance?
  • Put me in, coach! PUT ME IN!

29
Distributed Fault Tolerance
  • High Availability is a broad category
  • Hot backups (failover)
  • Load balancing
  • etc.

30
Problem(s)
  • HA
  • Track status of nodes
  • Keeping critical resources available as much as
    possible
  • Sacrificing hardware for low latency
  • Load balancing
  • Track status of nodes
  • Keeping load even

31
Approach
  • HA
  • High frequency/low latency heartbeats
  • Failover techniques
  • Virtual interfaces
  • Shared volume mounting
  • Load balancing (see the sketch below)
  • Metric (Round robin, least connections, etc.)
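
The sketch below illustrates the two balancing metrics
named above; the backend names and connection counts are
illustrative assumptions.

```python
import itertools

# Sketch of two load-balancing metrics: round robin cycles through
# backends regardless of load; least connections picks the backend
# with the fewest active connections.

backends = ["web1", "web2", "web3"]

rr = itertools.cycle(backends)
def pick_round_robin():
    return next(rr)

active = {"web1": 12, "web2": 3, "web3": 7}  # current connection counts
def pick_least_connections():
    return min(active, key=active.get)

print([pick_round_robin() for _ in range(4)])  # ['web1', 'web2', 'web3', 'web1']
print(pick_least_connections())                # 'web2'
```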

32
How does this area differ from WSNs?
33
Applied to WSNs
  • HA / Load balancing
  • Similarities
  • Redundant resources
  • Differences
  • Where to begin? There are MANY

34
Contents
  • Introduction
  • Network Management
  • Parallel Processing
  • Distributed Fault Tolerance
  • WSNs
  • Calibration / Model Based
  • Conclusion

35
What are WSNs?
  • Warning: any semblance of an orderly system is
    purely coincidental

36
BluSH [1]
  • Shell interface for Intel's iMotes
  • Enables interactive debugging: one can walk up to
    a mote and access its internal state

[1] Tom Schoellhammer

37
Sympathy [1,2]
  • Aids in debugging
  • pre-, during, and post-deployment
  • Nodes collect metrics and periodically send them
    to the sink
  • The sink checks for good qualities specified by
    the programmer
  • based on metrics and other gathered information
  • Faults are identified and categorized by metrics
    and tests (a sketch of the sink-side idea follows
    the references below)
  • Spatio-temporal correlation of distributed
    events to root-cause failures
  • Test injection
  • Proactively injects network probes to validate a
    fault hypothesis
  • Triggers self-tests (internal actuation)

[1] N. Ramanathan, E. Kohler, D. Estrin. "Towards a
Debugging System for Sensor Networks." International
Journal of Network Management, 2005.
[2] N. Ramanathan, E. Kohler, L. Girod, D. Estrin.
"Sympathy: A Debugging System for Sensor Networks." In
Proceedings of the First IEEE Workshop on Embedded
Networked Sensors, Tampa, Florida, USA, November 16, 2004.
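
As a hedged illustration of the sink-side idea (not
Sympathy's actual rule set), the sketch below maps a
node's latest metric report to coarse fault hypotheses;
the metric names, epoch, and thresholds are assumptions.

```python
# Sketch: the sink flags violations of programmer-specified
# "good qualities" based on periodically reported metrics.
# Metric names, epoch, and thresholds are illustrative.

EPOCH = 60.0  # assumed reporting period, in seconds

def classify(metrics, now):
    """Map one node's latest report to coarse fault hypotheses."""
    faults = []
    if now - metrics["last_report"] > 3 * EPOCH:
        faults.append("no data at sink (node dead or route broken)")
    if metrics.get("neighbors", 0) == 0:
        faults.append("no neighbors (isolated or badly placed node)")
    if metrics.get("battery_v", 3.0) < 2.4:
        faults.append("low battery voltage")
    return faults or ["ok"]

report = {"last_report": 900.0, "neighbors": 0, "battery_v": 2.2}
print(classify(report, now=1000.0))
# -> ['no neighbors (isolated or badly placed node)', 'low battery voltage']
```
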
38
SNMS [1]
  • Enables interactive health monitoring of a WSN in
    the field
  • Three pieces:
  • Parallel dissemination and collection
  • Query system for exported attributes (see the
    sketch after the reference below)
  • Logging system for asynchronous events
  • Small footprint / low overhead
  • Introduces overhead only when a human queries

[1] G. Tolle, D. Culler. "Design of an
Application-Cooperative Management System for Wireless
Sensor Networks." In Second EWSN, Istanbul, Turkey,
January 31 - February 2, 2005.
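
The following is a minimal sketch of the pull-based query
model: attribute values are computed only when an operator
asks for them, so overhead appears only under human use.
The attribute names and values are illustrative
assumptions, not SNMS's actual attribute set.

```python
# Sketch: nodes export named attributes; a query evaluates only
# the requested attributes, on demand rather than periodically.

attributes = {}  # attribute name -> zero-argument getter

def export(name):
    """Register a function as a queryable attribute."""
    def register(fn):
        attributes[name] = fn
        return fn
    return register

@export("uptime_s")
def uptime():
    return 4242  # placeholder; a real mote would read its clock

@export("parent")
def parent():
    return 17    # placeholder routing-parent node id

def query(names):
    """Evaluate only the requested attributes."""
    return {n: attributes[n]() for n in names if n in attributes}

print(query(["uptime_s", "parent"]))  # -> {'uptime_s': 4242, 'parent': 17}
```
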
39
Contents
  • Introduction
  • Network Management
  • Parallel Processing
  • Distributed Fault Tolerance
  • WSNs
  • Calibration / Model Based
  • Conclusion

40
What is Calibration and Modeling?
  • Hey, if you and I both think the answer is true,
    then who's to say we're wrong? :)

41
Modeling [1,2,3]
  • Root-cause localization in large-scale systems
  • The process of identifying the source of problems
    in a system using purely external observations
  • Identify anomalous behavior based on externally
    observed metrics (see the sketch after the
    references below)
  • Statistical analysis and Bayesian networks are
    used to identify faults

[1] E. Kiciman, A. Fox. "Detecting Application-Level
Failures in Component-Based Internet Services." In IEEE
Transactions on Neural Networks, Spring 2004.
[2] A. Fox, E. Kiciman, D. Patterson, M. Jordan, R. Katz.
"Combining Statistical Monitoring and Predictable Recovery
for Self-Management." In Proceedings of the Workshop on
Self-Managed Systems, October 2004.
[3] E. Kiciman, L. Subramanian. "Root Cause Localization
in Large-Scale Systems."
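
As a toy illustration of fault identification from purely
external observations (far simpler than the statistical
and Bayesian-network machinery in the cited work), the
sketch below flags components whose metric deviates
strongly from the population; the metric values and
threshold are assumptions.

```python
import statistics

# Sketch: flag components whose externally observed metric lies
# more than k standard deviations from the population mean.

latency_ms = {"compA": 21.0, "compB": 19.5, "compC": 20.4, "compD": 88.0}

def anomalies(observations, k=1.5):
    """Return names whose value deviates more than k sigma from the mean."""
    values = list(observations.values())
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [name for name, v in observations.items()
            if sigma > 0 and abs(v - mu) > k * sigma]

print(anomalies(latency_ms))  # -> ['compD']
```
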
42
Calibration [1,2]
  • Model physical phenomena in order to predict
    which sensors are faulty
  • The model can be based on:
  • The monitored environment: e.g., assume that the
    majority of sensors are providing correct data,
    then identify sensors that make this model
    inconsistent [1]
  • Assumptions about the environment: e.g., in a
    densely sampled area, values of neighboring
    sensors should be similar [2] (see the sketch
    after the references below)
  • Debugging can be viewed as sensor network system
    calibration
  • Use system metrics instead of sensor data
  • Given a model of what the metrics of a properly
    behaving system should look like, faulty behavior
    can be identified from inconsistent metrics
  • Locating and using ground truth
  • In situ deployments
  • Low communication/energy budgets
  • Bias
  • Noise

[1] J. Feng, S. Megerian, M. Potkonjak. "Model-Based
Calibration for Sensor Networks." In IEEE International
Conference on Sensors, October 2003.
[2] V. Bychkovskiy, S. Megerian, et al. "A Collaborative
Approach to In-Place Sensor Calibration."
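
Below is a minimal sketch of the second model: in a
densely sampled field, a sensor that disagrees with its
neighborhood average beyond a tolerance is flagged as
suspect. The readings, neighbor lists, and tolerance are
illustrative assumptions, not from the cited papers.

```python
# Sketch: flag sensors whose reading differs from the mean of
# their neighbors' readings by more than a tolerance, under the
# assumption that nearby sensors should report similar values.

readings = {1: 20.1, 2: 20.4, 3: 19.8, 4: 26.0, 5: 20.2}
neighbors = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 5], 4: [2, 5], 5: [3, 4]}
TOLERANCE = 3.0  # allowed disagreement with the local neighborhood mean

def suspect_sensors():
    """Return node ids inconsistent with their neighborhoods."""
    flagged = []
    for node, reading in readings.items():
        nbr_vals = [readings[n] for n in neighbors[node]]
        local_mean = sum(nbr_vals) / len(nbr_vals)
        if abs(reading - local_mean) > TOLERANCE:
            flagged.append(node)
    return flagged

print(suspect_sensors())  # -> [4]
```
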
43
Contents
  • Introduction
  • Network Management
  • Parallel Processing
  • Distributed Fault Tolerance
  • WSNs
  • Calibration / Model Based
  • Conclusion

44
Promising Ideas
  • Management by delegation
  • Naturally supports heterogeneous architectures by
    distributing control over the network
  • Dynamically tasks/empowers less-capable nodes
    using mobile code
  • AINs
  • A node can monitor its own behavior, then detect,
    diagnose, and repair issues
  • Model-based fault detection
  • Models of the physical environment
  • Bayesian inference engines (see the sketch below)
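
A toy sketch of the last idea follows: a node maintains a
belief that its sensor is faulty and updates it with
Bayes' rule each time a reading agrees or disagrees with
an environmental model. All probabilities here are
illustrative assumptions.

```python
# Sketch: one-variable Bayesian update of P(sensor faulty) from
# whether each reading deviates from a model of the environment.

P_FAULTY = 0.05            # prior probability the sensor is faulty
P_DEV_GIVEN_FAULTY = 0.90  # faulty sensors usually deviate from the model
P_DEV_GIVEN_OK = 0.10      # healthy sensors rarely deviate

def update(prior, deviates):
    """One Bayes step: posterior P(faulty | this observation)."""
    if deviates:
        num = P_DEV_GIVEN_FAULTY * prior
        den = num + P_DEV_GIVEN_OK * (1 - prior)
    else:
        num = (1 - P_DEV_GIVEN_FAULTY) * prior
        den = num + (1 - P_DEV_GIVEN_OK) * (1 - prior)
    return num / den

belief = P_FAULTY
for deviates in [True, True, True]:  # three off-model readings in a row
    belief = update(belief, deviates)
print(round(belief, 3))  # -> 0.975: strong evidence of a fault
```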

45
Comparison
  • Network Management
  • Close, but includes some inflexible assumptions
  • Parallel Processing
  • Many similar, but divergent constraints
  • Distributed Fault Tolerance
  • Almost totally different
  • WSNs
  • New techniques emerging
  • Calibration
  • WSN related work becoming available

46
Conclusion
  • Distributed debugging is as distributed debugging
    does [1]
  • WSNs are a particular class of distributed system
  • There are numerous techniques for distributed
    debugging
  • Different conditions warrant different approaches
  • OR different spins on existing techniques

[1] F. Gump et al.
47
References
  • Todd Tannenbaum, Derek Wright, Karen Miller, and
    Miron Livny. "Condor - A Distributed Job
    Scheduler." In Thomas Sterling, editor, Beowulf
    Cluster Computing with Linux, The MIT Press,
    2002. ISBN 0-262-69274-0.
  • http://www.open.com/pdfs/alarmsuppression.pdf
  • http://www.top500.org/
  • D. E. Culler and J. P. Singh. Parallel Computer
    Architecture: A Hardware/Software Approach.
    Morgan Kaufmann Publishers Inc., San Francisco,
    CA, 1999. ISBN 1-55860-343-3.
  • Matthew L. Massie, Brent N. Chun, and David E.
    Culler. "The Ganglia Distributed Monitoring
    System: Design, Implementation, and Experience."
    Parallel Computing, Vol. 30, Issue 7, July 2004.
  • "HA-OSCAR Release 1.0 Beta: Unleashing
    HA-Beowulf." 2nd Annual OSCAR Symposium,
    Winnipeg, Manitoba, Canada, May 2004.

48
Questions?
  • No? Great! :)