System-Level Diagnosis: A Review

1
System-Level Diagnosis: A Review
Computer Science Department, University of Pisa

Seminars for the PhD in Computer Science
Stefano Chessa

2
Faults, Errors and Failures

Fault: abnormal physical condition
caused by temperature, cosmic rays, design errors, age, ...
Classified by
Duration: Transient, Intermittent, Permanent
Nature: Logical, ...
Extent
Error: caused by a fault affecting information
Failure: system component unable to work, caused by an error

3
Redundancy Management

Fault-Detection / Masking
Information redundancy
Parity bit, codes, ...
Hardware redundancy
Duplication, n-modular redundancy, ...
Fault-Diagnosis
Computation redundancy
Tests
Diagnosis algorithms
Repair and Reconfiguration
Replacement / Repair
Graceful degradation
Recovery
Set the system in a consistent state
Backward and Forward recovery

4
System-Level Diagnosis: The PMC Model

Introduced in 1967 by Preparata, Metze and Chien
Consider a set V of units
The units are connected by an interconnection structure
This defines the system graph G(V, L)
Units may be Faulty or Fault-Free
Permanent faults
Units perform mutual tests exploiting the system interconnections
Tests have binary outcomes
The Syndrome is the collection of all the test outcomes
This defines the test assignment and the diagnostic graph DG(V, E)

[Figure: a system graph and a diagnostic graph]

5
System-Level Diagnosis: The Tests

The test of unit v performed by unit u consists of three steps
u sends a test input sequence to v
v performs a computation on the test sequence and returns the output to u
u compares the output of v with the expected results
The test outcome is binary (0 = pass, 1 = fail)
The test requires a bidirectional connection between u and v
The outcome $\gamma$ of the test performed by unit u on unit v (denoted $u \xrightarrow{\gamma} v$) is defined according to the PMC model: a fault-free tester reports 0 for a fault-free unit and 1 for a faulty unit, while the outcome reported by a faulty tester is arbitrary
$u \leftrightarrow v$ denotes tests performed in both directions, with outcomes $\delta$ and $\gamma$ respectively
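
The PMC rule for test outcomes can be sketched in Python as follows. This is an illustrative sketch, not code from the presentation: the names `pmc_outcome` and `make_syndrome` are hypothetical, and modelling the arbitrary outcome of a faulty tester as a random bit is an assumption.

```python
import random


def pmc_outcome(tester_faulty, tested_faulty, rng=random):
    """Outcome of one test under the PMC model (0 = pass, 1 = fail)."""
    if tester_faulty:
        # A faulty tester is unreliable: its outcome is arbitrary.
        # Modelling "arbitrary" as a random bit is an assumption of this sketch.
        return rng.randint(0, 1)
    # A fault-free tester always reports the true state of the tested unit.
    return 1 if tested_faulty else 0


def make_syndrome(tests, faulty, rng=random):
    """Build a syndrome {(u, v): outcome} for the test assignment `tests`."""
    return {(u, v): pmc_outcome(u in faulty, v in faulty, rng) for (u, v) in tests}


# Hypothetical example: three units testing each other in a ring, unit 1 faulty.
ring = [(0, 1), (1, 2), (2, 0)]
syndrome = make_syndrome(ring, faulty={1})
```

Only the outcomes produced by fault-free testers are determined by the fault set; here `syndrome[(1, 2)]` may be either 0 or 1.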

6
System-Level Diagnosis: Some Definitions

Centralized diagnosis
An external, reliable diagnoser collects the syndrome and decodes it with a diagnosis algorithm
Distributed diagnosis
The syndrome is decoded by a distributed algorithm
Centralized Diagnosis
Given a syndrome $\sigma$, a consistent fault set (CFS) $V_f \subseteq V$ is such that, for every test in the diagnostic graph,
For each $u \in V_f$, $v \in V - V_f$: $v \xrightarrow{1} u$
For each $u, v \in V - V_f$: $v \xrightarrow{0} u$
The goal of the diagnosis is to identify a CFS of minimum cardinality
However, in general there are many CFSs with that property
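
A minimal Python sketch of the consistent fault set definition follows. It is a brute-force illustration under stated assumptions, not the diagnosis algorithm of any cited paper: the syndrome is assumed to be a dictionary mapping `(tester, tested)` pairs to outcomes, and the names `is_consistent` and `minimum_cfs` are hypothetical.

```python
from itertools import combinations


def is_consistent(fault_set, syndrome):
    """True if `fault_set` is a consistent fault set for `syndrome`."""
    for (tester, tested), outcome in syndrome.items():
        if tester in fault_set:
            continue  # outcomes of faulty testers are arbitrary, so unconstrained
        expected = 1 if tested in fault_set else 0
        if outcome != expected:
            return False
    return True


def minimum_cfs(units, syndrome):
    """All consistent fault sets of minimum cardinality (exponential search)."""
    units = sorted(units)
    for size in range(len(units) + 1):
        found = [set(c) for c in combinations(units, size)
                 if is_consistent(set(c), syndrome)]
        if found:
            return found
    return []


# Hypothetical syndrome: a ring of three units in which unit 1 is faulty
# and happens to report 0 when testing unit 2.
syndrome = {(0, 1): 1, (1, 2): 0, (2, 0): 0}
print(minimum_cfs({0, 1, 2}, syndrome))  # [{1}]
```

The search always terminates with a result because the full set V is trivially consistent: it leaves no fault-free tester to contradict.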

[Figure: example syndrome with two consistent fault sets, $V_1 = \{1, 2, 3\}$ and $V_2 = \{3, 4, 5\}$]

7
System-Level Diagnosis: Some Definitions

The diagnosis algorithm outputs three sets K, F and S
K: units declared fault-free
F: units declared faulty
S: units declared suspect
The diagnosis is correct if $K \subseteq V - V_f$ and $F \subseteq V_f$
The diagnosis is complete if $S = \emptyset$ (that is, $K \cup F = V$)
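
The two definitions translate directly into set operations. The following is a small sketch using Python sets; the function names and the example values are hypothetical.

```python
def is_correct(K, F, units, actual_faulty):
    """Correct: K contains only fault-free units and F contains only faulty units."""
    return K <= (units - actual_faulty) and F <= actual_faulty


def is_complete(K, F, S, units):
    """Complete: no unit is left suspect, so K and F together cover all units."""
    return not S and (K | F) == units


# Hypothetical outcome of a diagnosis on five units, where only unit 3 is faulty.
units = {1, 2, 3, 4, 5}
K, F, S = {1, 2}, {3}, {4, 5}
print(is_correct(K, F, units, actual_faulty={3}))  # True
print(is_complete(K, F, S, units))                 # False
```

This example is correct but incomplete: nothing is misclassified, yet units 4 and 5 remain suspect.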

8
System-Level Diagnosis: Some Definitions

One-step t-diagnosable systems
Correct and complete diagnosis for any $V_f$ with $|V_f| \le t$
t is the one-step diagnosability of the system
For any syndrome, either
There exists a unique consistent fault set of cardinality at most t, OR
The minimum cardinality of the consistent fault sets exceeds t
Sequentially s-diagnosable systems
Correct diagnosis for any $V_f$ with $|V_f| \le s$
s is the sequential diagnosability of the system
For any syndrome, either
The consistent fault sets of cardinality $\le s$ have a non-empty intersection, OR
The minimum cardinality of the consistent fault sets exceeds s
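
The difference between the two notions can be illustrated on a single syndrome. The sketch below is a brute-force illustration only: the syndrome format (a dictionary from `(tester, tested)` to outcome), the function names and the example ring are assumptions, and the search is exponential in the bound.

```python
from itertools import combinations


def is_consistent(fault_set, syndrome):
    """A fault set is consistent if every fault-free tester reports the truth."""
    return all(
        outcome == (1 if tested in fault_set else 0)
        for (tester, tested), outcome in syndrome.items()
        if tester not in fault_set
    )


def small_fault_sets(units, syndrome, bound):
    """All consistent fault sets of cardinality at most `bound`."""
    units = sorted(units)
    return [
        set(c)
        for size in range(bound + 1)
        for c in combinations(units, size)
        if is_consistent(set(c), syndrome)
    ]


def one_step_diagnosis(units, syndrome, t):
    """Return the unique CFS of cardinality <= t, or None if there is no unique one."""
    candidates = small_fault_sets(units, syndrome, t)
    return candidates[0] if len(candidates) == 1 else None


def sequential_step(units, syndrome, s):
    """Units belonging to every CFS of cardinality <= s (safe to replace first)."""
    candidates = small_fault_sets(units, syndrome, s)
    return set.intersection(*candidates) if candidates else set()


# Hypothetical ring of five units (i tests i+1). Units 1 and 2 are faulty and
# both happen to report 0, so {1} and {1, 2} are both consistent.
syndrome = {(0, 1): 1, (1, 2): 0, (2, 3): 0, (3, 4): 0, (4, 0): 0}
print(one_step_diagnosis(range(5), syndrome, 2))  # None
print(sequential_step(range(5), syndrome, 2))     # {1}
```

With a bound of 2 this syndrome cannot be decoded in one step, but unit 1 lies in every candidate fault set, so it can be replaced and the tests repeated.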

9
System-Level Diagnosis: Three Problems

Characterization problem
Finding necessary and sufficient conditions to achieve the desired diagnosability in a system
Diagnosability problem
Given a test assignment for a system, determine its one-step and sequential diagnosability
Diagnosis problem
Given a system, a test assignment and a syndrome, determine a consistent fault set of minimum cardinality

10
The Characterization Problem

Let $n = |V|$, and let d be the minimum indegree of the diagnostic graph
Necessary conditions for one-step t-diagnosability [PMC67]
$n \ge 2t + 1$
$d \ge t$
These conditions are also sufficient if no two units test each other [HA74]
A general characterization of one-step t-diagnosable systems is also given in [HA74]
$n \ge 2t + 1$
$d \ge t$
For each $X \subset V$ with $|X| = n - 2t + p$, $0 \le p < t$, the units of X test at least $p + 1$ units in $V - X$
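
The two necessary conditions, and the "no two units test each other" case in which they become sufficient, are easy to check on a given test assignment. The sketch below assumes the test assignment is given as a set of `(tester, tested)` pairs and takes d to be the minimum indegree; the function names and the example assignment are hypothetical.

```python
def satisfies_necessary_conditions(units, tests, t):
    """Check n >= 2t + 1 and d >= t, with d the minimum indegree."""
    indegree = {v: 0 for v in units}
    for _tester, tested in tests:
        indegree[tested] += 1
    n = len(units)
    d = min(indegree.values())
    return n >= 2 * t + 1 and d >= t


def has_no_mutual_tests(tests):
    """True if no two units test each other."""
    tests = set(tests)
    return not any((v, u) in tests for (u, v) in tests)


# Hypothetical assignment: five units, each testing the next two on a ring.
units = range(5)
tests = {(i, (i + k) % 5) for i in units for k in (1, 2)}
print(satisfies_necessary_conditions(units, tests, 2))  # True
print(has_no_mutual_tests(tests))                       # True
print(satisfies_necessary_conditions(units, tests, 3))  # False
```

In this example both checks pass for t = 2, so by the sufficiency result quoted above the assignment is one-step 2-diagnosable; for t = 3 the condition $n \ge 2t + 1$ already fails.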

11
The Characterization Problem

Sequentially diagnosable systems
$n \ge 2t + 1$ is a necessary condition [PMC67]
There exists a general characterization [HX95]

[Figure: a sequentially 3-diagnosable system and a one-step 2-diagnosable system]
12
The Diagnosability Problem

One-step Diagnosability
First solved by Sullivan [Sul84]
The best algorithm determines the one-step diagnosability in $O(n t^{2.5})$ [RT91a]
Sequential Diagnosability
The problem is co-NP-complete [RT91b]
The sequential diagnosability can be determined for several classes of graphs

13
The Diagnosis Problem

One-Step Diagnosis
With no restrictions, the problem is NP-complete [MH76]
For t-diagnosable systems the problem can be solved in $O(n^{2.5})$ [DM84]
Step 1: Construct the L-graph
$V_f$ is a minimum vertex cover of the L-graph (finding a minimum vertex cover is NP-hard for general graphs)

14
The Diagnosis Problem

Step 2: Find a maximum matching in the L-graph
Each matched pair consists of a faulty and a fault-free unit
Step 3: Visit the L-graph starting from a unit not included in the matching
This unit is fault-free
Sequential Diagnosis
With no restrictions, sequential diagnosis is co-NP-complete [FK78]
A general $O(|E|)$ heuristic has been proposed in [Man80]
Many algorithms exist for several classes of graphs

15
The BGM model

Permanent faults
Faulty units never produce the same (faulty)
outcomes
Sequential Diagnosis
Diagnosis is trivial
Sequential diagnosability is co-NP-complete [RT91]
One-Step Diagnosis
Necessary condition: $t \le n - 2$ [BGM76]
Sufficient conditions are also given [BGM76]
One-step diagnosability: $O(n t^2 / \log t)$ [RT91]

16
The Comparison Models

Tests are performed by comparing the outputs of adjacent units
The comparator can be either internal to the units or external
Models with reliable comparators
Two faulty units never produce the same outcomes [Mal80]
Two faulty units may produce the same outcomes [CH81]
[MM81]: the comparator is an external unit subject to faults

17
Distributed Diagnosis

Removes the assumption of a centralized and reliable diagnoser
First introduced in [KR80]
The diagnosis is performed by a distributed algorithm
Units perform tests on adjacent units
Units exchange diagnostic information with their neighbors
Fault-free units accept diagnostic information only from fault-free neighbors
Same invalidation rule as the PMC model
Faults do not occur during diagnosis
This assumption was removed in [KR81]: when a unit u receives new diagnostic information from v, it tests v and accepts the information only if v is fault-free
An algorithm that is optimal in terms of number of tests, messages and diagnosis latency is given in [RDZ95]
Other models will be presented in the forthcoming
seminars
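
The rule that a fault-free unit accepts diagnostic information only from neighbors it has found fault-free can be illustrated with a small simulation. This is a simplified sketch and not the algorithm of [KR80], [KR81] or [RDZ95]: it assumes permanent faults, no faults arising during diagnosis, and PMC-style tests in which a fault-free tester always reports the true state. The name `local_view` and the data layout (`tests_of[u]` lists the units tested by u) are hypothetical.

```python
from collections import deque


def local_view(start, tests_of, faulty):
    """Diagnosis reached by the fault-free unit `start` through trusted units only.

    Returns {unit: is_faulty} for every unit `start` learns about.
    """
    known = {start: False}
    queue = deque([start])
    while queue:
        u = queue.popleft()  # u has been diagnosed fault-free, so it is trusted
        for v in tests_of.get(u, ()):
            if v in known:
                continue
            # A fault-free tester reports the true state of v.
            known[v] = v in faulty
            if not known[v]:
                # Information is relayed only by units that passed a test.
                queue.append(v)
    return known


# Hypothetical system: five units, each testing the next two on a ring; unit 1 faulty.
tests_of = {i: [(i + 1) % 5, (i + 2) % 5] for i in range(5)}
print(local_view(0, tests_of, faulty={1}))
```

Unit 0 never relies on anything reported by unit 1: it learns about units 3 and 4 through unit 2, which it tested itself and found fault-free.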

18
Probabilistic Diagnosis

Introduced in [MH76]
Will be presented in the next seminar by Paolo Santi

19
Applications

Diagnosis of massively parallel systems
A very large number of processing elements
Regular interconnection structures
Generally used for huge computations
MTTF can be very small even if the single
components are reliable
Wafer-Scale Self-Test
A large number of ICs
ICs arranged in a regular pattern on the wafer
A large number of faulty ICs
Up to 50%
With the current test technology, testing costs are increasing rapidly
In the near future testing costs will exceed manufacturing costs!
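
The remark that the MTTF of a massively parallel system can be very small even with reliable components can be made concrete with a short calculation. It rests on assumptions not stated in the slides: component failures are independent with exponentially distributed lifetimes, and the system fails as soon as any one component fails. Under these assumptions the failure rates add up, so the system MTTF is the component MTTF divided by the number of components. The figures below are hypothetical.

```python
def system_mttf(component_mttf_hours, n_components):
    """MTTF of a system that fails at the first component failure.

    Assumes independent components with exponentially distributed lifetimes,
    so the system failure rate is n_components / component_mttf_hours.
    """
    return component_mttf_hours / n_components


# Hypothetical figures: 10,000 processing elements, each with an MTTF of 10^6 hours.
print(system_mttf(1_000_000, 10_000))  # 100.0 hours, roughly four days
```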

20
Wafer Test: State of the Art

ICs tested by a test tool
A probe station and a controlling computer
The pads of each IC are probed by the probe
station
The probe station supplies the IC with power,
ground and a test sequence
The IC output sequence is delivered to the
controlling computer
The controlling computer compares the output
sequence with the expected output sequence
Alternatively, it compares the IC outputs with the outputs of a golden unit
Drawbacks
ICs generally cannot be tested at full speed
The test computer generally does not match the
actual speed of ICs
The test is generally not accurate
Limited fault coverage, tests mainly electrical
properties
Time required to test an entire wafer
ICs tested sequentially
After each test the probing station should
position the probes on the next IC
The time to test an IC increases with its
complexity

21
Wafer Test: State of the Art

22
Wafer-Scale Self-Test

The ICs perform mutual tests
The tests may proceed in parallel
The tests produce binary outcomes
The test tool collects the syndrome and executes
the diagnosis algorithm
Advantages
The ICs undergo an intensive test before they are
cut and packaged
saves the cost of packaging faulty ICs
IC tests are executed at the operating speed of the ICs (or at a comparable speed)
improves test accuracy
ICs tested in parallel
reduces the time needed to complete the test

23
Wafer-Scale Self-Test

24
Wafer-Scale Self-Test: Implementation Issues

Requirements
ICs Interconnection
Comparators to perform comparisons
number, placement,...
Test vectors, clock, power and ground supply to
all ICs
Syndrome collection and diagnosis algorithm
The diagnosis algorithm should be able to
diagnose a large fraction of ICs under realistic
fault situations
A naive implementation
Large number of bus interconnections across the entire wafer
The links would cross the IC boundaries
Wafer-level synchronization
Very complex and expensive design
Therefore, a feasible implementation needs to
reduce the design complexity by minimizing the interconnections
and
remove the assumption of wafer-level synchronization

25
Conclusions

The classical results of system-level diagnosis are unsuitable for diagnosing large, regularly interconnected systems
The most promising applications require regular
interconnection structures
For this reason
Research has recently focused on diagnosis of
regular systems
The forthcoming seminars will address these topics
