Title: Probabilistic Fault Diagnosis in Networked Systems
1 Probabilistic Fault Diagnosis in Networked Systems
Sumanth Kidambi
Advisor: Dr. C.N. Hadjicostis
July 30th, 2005
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
2 Outline
- Introduction
- Definitions and Notation
- Maximum Likelihood Schemes
- Threshold Analysis
- Algorithm VertexDiag
- Algorithm EdgeDiag
- Results
- Future work
- Summary and Contribution
3 Introduction
- Use of networks
- Safety-critical applications (defense, medicine)
- Commercial infrastructure
- Evolution of networks
- Increasing size and complexity
- Greater likelihood of a fault propagating throughout the entire network
- A single fault affects a large number of users
- Need for advanced fault management and correlation techniques
4 Fault Identification Process
[Figure: fault identification process, highlighting Fault Diagnosis]
- Fault diagnosis process
- Correlate observed failure indications
- Propose hypotheses to explain the alarm set
(Katzela and Schwartz, 1995)
5 Categories of Algorithms
- Deterministic
- Guarantees that the entire fault set is uniquely identified given a syndrome
- Requires certain assumptions about the structure of the network and the behavior of faulty and non-faulty systems
- Friedman and Simoncini (1980), Kime (1986)
- Probabilistic
- Attempts to diagnose faulty processing elements with high probability
- No restrictive assumptions
- Work by Blough, Dahbura, Lee, Pelc
6 System Model
- Random graph G(V,E) with N vertices
- Vertices represent nodes
- Two vertices connected with probability c
- Nodes fail independently of each other with probability f
- Each node's fault state is a Bernoulli RV with parameter f, so the number of faulty nodes is binomial with E[NumFaulty] = Nf
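The system model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_system`, the adjacency-dictionary representation, and the example values of N, c, and f are hypothetical and do not come from the slides.

```python
import random


def sample_system(n, c, f, seed=None):
    """Sample a random graph G(V, E) and an independent fault state per node."""
    rng = random.Random(seed)
    # Each unordered pair of vertices is connected with probability c.
    neighbors = {u: set() for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < c:
                neighbors[u].add(v)
                neighbors[v].add(u)
    # Each node is faulty independently of the others with probability f.
    faulty = {u for u in range(n) if rng.random() < f}
    return neighbors, faulty


# Illustrative values only; the expected number of faulty nodes is N * f = 20.
neighbors, faulty = sample_system(n=200, c=0.05, f=0.1, seed=1)
print(len(faulty))
```

Averaged over many samples, `len(faulty)` should approach Nf, matching the expectation stated on the slide.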
7 Testing Model
- Each node tests its neighbors according to probability parameters defined a priori
- 0-information tester model (Lee and Shin, 1993)
[Figure: test-outcome diagram with branch probabilities labeled 1 - r, e, 1 - e, and t]
8 Algorithm VertexDiag
[Flowchart: Threshold, then Greedy Approach, then Seclude faulty nodes; iterate]
9 Threshold Analysis
- Given local syndrome information
- T1-1(ui): nodes that diagnose ui as faulty
- T0-1(ui): nodes that diagnose ui as fault-free
[Equation: threshold test (a > comparison) on these sets]
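As a rough illustration of the local syndrome information used here, the sketch below splits the testers of a node by the outcome they reported. It assumes a hypothetical syndrome encoding, a dictionary mapping (tester, tested) pairs to 1 for "diagnosed as faulty" and 0 for "diagnosed as fault-free", and it assumes the second set collects the fault-free verdicts. The threshold itself is not reproduced because it is not recoverable from the slide text.

```python
def incoming_test_sets(syndrome, node):
    """Return (testers reporting `node` faulty, testers reporting it fault-free)."""
    says_faulty = {
        u for (u, v), outcome in syndrome.items() if v == node and outcome == 1
    }
    says_fault_free = {
        u for (u, v), outcome in syndrome.items() if v == node and outcome == 0
    }
    return says_faulty, says_fault_free


# Hypothetical syndrome: nodes 0 and 1 report node 2 as faulty, node 3 does not.
syndrome = {(0, 2): 1, (1, 2): 1, (3, 2): 0, (2, 0): 0}
t1, t0 = incoming_test_sets(syndrome, 2)
print(t1, t0)
```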
10 Threshold Analysis
[Figure: plot with axis N]
11 VertexDiag - Finding max P_diag
- For each node ui, evaluate HF(ui)
- Find Set M such that
- Isolate tests by elements in Set M
- Repeat until SteadyState or max_iterations is reached
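The loop structure of these steps can be sketched as follows. The sketch shows the control flow only: the scoring function HF(ui) and the condition defining Set M are not recoverable from the slide text, so both are passed in by the caller. `negative_votes` and `pick_top` are toy stand-ins invented for this example, not the slide's definitions. "Isolate tests" is interpreted here as discarding the tests performed by nodes in M, which is also an assumption.

```python
def greedy_diagnose(syndrome, nodes, score, select, max_iterations=100):
    """Greedy labeling loop; `score` and `select` are supplied by the caller."""
    labeled_faulty = set()
    active = dict(syndrome)  # tests still taken into account
    for _ in range(max_iterations):
        candidates = [u for u in nodes if u not in labeled_faulty]
        scores = {u: score(u, active) for u in candidates}
        m = set(select(scores))
        if not m:
            break  # steady state: no new node was labeled
        labeled_faulty |= m
        # Isolate (discard) the tests performed by the nodes in M.
        active = {(u, v): o for (u, v), o in active.items() if u not in m}
    return labeled_faulty


def negative_votes(node, active):
    """Toy score: number of active tests that report `node` as faulty."""
    return sum(1 for (u, v), o in active.items() if v == node and o == 1)


def pick_top(scores, minimum=2):
    """Toy rule: nodes tied for the highest score, if it reaches `minimum`."""
    if not scores:
        return set()
    best = max(scores.values())
    return {u for u, s in scores.items() if s == best and s >= minimum}


syndrome = {(0, 2): 1, (1, 2): 1, (2, 0): 1, (2, 1): 1, (0, 1): 0, (1, 0): 0}
result = greedy_diagnose(syndrome, [0, 1, 2], negative_votes, pick_top)
print(result)
```

In this toy run node 2 is labeled first. Once its outgoing tests are discarded, no remaining node reaches the minimum score, so the loop stops in a steady state.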
12 VertexDiag - Results
[Figure: fault coverage]
13 Algorithm EdgeDiag
[Flowchart: Scenario Evaluation, then Greedy Approach, then Seclude faulty nodes; iterate]
14 EdgeDiag - Unaccounted Negative Tests
- Get syndrome
- For each negative test (occurs when a node diagnoses another to be faulty)
- Determine if the negative test is accounted for
- A negative test is accounted for if
- The test is incident on a node that has been labeled as faulty
- The test is efferent from a node that has been labeled as faulty
[Figure: examples of an unaccounted and an accounted negative test between nodes 1 and 2]
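The accounting rule above translates directly into a filter over the syndrome. The sketch assumes a hypothetical encoding in which the syndrome maps (tester, tested) pairs to 1 for a negative test (the tester diagnoses the tested node as faulty) and 0 otherwise. The function name is illustrative.

```python
def unaccounted_negative_tests(syndrome, labeled_faulty):
    """Negative tests whose tester and tested node are both still unlabeled."""
    return [
        (u, v)
        for (u, v), outcome in syndrome.items()
        if outcome == 1               # negative test: u diagnoses v as faulty
        and v not in labeled_faulty   # not incident on a labeled-faulty node
        and u not in labeled_faulty   # not efferent from a labeled-faulty node
    ]


syndrome = {(1, 2): 1, (2, 3): 1, (3, 1): 0, (4, 5): 1}
remaining = unaccounted_negative_tests(syndrome, labeled_faulty={2})
print(remaining)  # [(4, 5)]
```

With node 2 labeled faulty, the test (1, 2) is incident on it and the test (2, 3) is efferent from it, so only (4, 5) remains unaccounted.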
15 EdgeDiag - Evaluating Most Likely Scenario
- For each unaccounted test, evaluate J(ui)
P(Scenario | Syndrome)
1. u1 and u2 are faulty
2. u1 and u2 are non-faulty
3. u1 is non-faulty and u2 is faulty
4. u1 is faulty and u2 is non-faulty
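One way to read P(Scenario | Syndrome) is as a Bayes update over the four scenarios. The priors follow from the system model, in which nodes fail independently with probability f. The likelihood of the observed syndrome under each scenario depends on the testing-model parameters, which are not recoverable from the slide text, so the sketch below takes those likelihoods as caller-supplied inputs. The function name and the scenario keys are illustrative assumptions, and this is not claimed to be the slide's J(ui).

```python
def scenario_posteriors(f, likelihoods):
    """Posterior over the four (u1, u2) fault scenarios via Bayes' rule."""
    # Priors from independent node failures with probability f.
    priors = {
        ("faulty", "faulty"): f * f,
        ("non-faulty", "non-faulty"): (1 - f) * (1 - f),
        ("non-faulty", "faulty"): (1 - f) * f,
        ("faulty", "non-faulty"): f * (1 - f),
    }
    joint = {s: priors[s] * likelihoods[s] for s in priors}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("syndrome has zero probability under every scenario")
    return {s: p / total for s, p in joint.items()}


# With uninformative (equal) likelihoods the posterior equals the prior.
flat = {
    ("faulty", "faulty"): 1.0,
    ("non-faulty", "non-faulty"): 1.0,
    ("non-faulty", "faulty"): 1.0,
    ("faulty", "non-faulty"): 1.0,
}
posterior = scenario_posteriors(0.1, flat)
most_likely = max(posterior, key=posterior.get)
print(most_likely)
```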
16 EdgeDiag - Greedy Approach
- For each node ui, evaluate JF(ui)
- Find Set M such that
- Isolate tests by elements in Set M
- Repeat until SteadyState or max_iterations is reached
17 EdgeDiag - Results
[Figure: fault coverage]
18 Algorithm CombDiag
- Run VertexDiag
- Periodically check for fluctuations in diagnosis state of VertexDiag
- If fluctuating, then return
- Run EdgeDiag to detect residual faulty nodes
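The control flow of CombDiag can be sketched as a small driver. Everything specific here is an assumption made for illustration: `vertex_step` stands for one VertexDiag iteration mapping the current set of labeled nodes to the next, `edge_diag` stands for a full EdgeDiag run, and "fluctuating" is interpreted as the diagnosis state returning to a previously seen state, checked every `check_every` iterations.

```python
def comb_diag(vertex_step, edge_diag, max_iterations=100, check_every=2):
    """Run VertexDiag-style iterations, stop on fluctuation, then run EdgeDiag."""
    state = frozenset()
    history = [state]
    for i in range(1, max_iterations + 1):
        nxt = frozenset(vertex_step(state))
        if nxt == state:
            break  # steady state reached
        state = nxt
        # Periodic check: a revisited state means the diagnosis is fluctuating.
        if i % check_every == 0 and state in history:
            break
        history.append(state)
    # EdgeDiag looks for residual faulty nodes starting from this state.
    return frozenset(edge_diag(state))


# Toy stand-ins: a VertexDiag step that converges, and one that oscillates.
converging = lambda s: s | {1}
oscillating = lambda s: set() if s else {1}
add_residual = lambda s: s | {3}

print(comb_diag(converging, add_residual))
print(comb_diag(oscillating, add_residual))
```

The first call converges to {1} and the toy EdgeDiag stage adds node 3. The second call oscillates between the empty set and {1}, so the periodic check ends the loop early and EdgeDiag starts from the empty state.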
19 CombDiag - Results
[Figure: fault coverage]
20 Possibilities for future extensions
- Parameter modelling
- Currently assigned a priori
- Based on empirical data
- Possibility for a heuristic scheme to refresh probability parameters based on some belief revision scheme
- Use of syndrome information
- Algorithm currently requires entire syndrome information (Category 1 probabilistic diagnosis algorithm; Lee and Shin, 1993)
21 Contribution
- A generalized algorithm
- No assumptions on structure of the network (Pelc)
- No limit on the number of faulty nodes
- Does not assume complete fault coverage
- High diagnostic accuracy
22 Acknowledgements
- Dr. Christoforos Hadjicostis
- Vodafone
- Prof. Swenson
23 Thank you. Questions?