System-Level Diagnosis: A Review

1
System-Level Diagnosis: A Review
Computer Science Department, University of Pisa
  • Seminars for the PhD in Computer Science
  • Stefano Chessa

2
Faults, Errors and Failures
  • Fault: abnormal physical condition
  • caused by temperature, cosmic rays, design
    errors, age, ...
  • Classified by
  • Duration
  • Transient, Intermittent, Permanent
  • Nature
  • Logical, ...
  • Extent
  • Error: caused by a fault affecting information
  • Failure: a system component is unable to work
  • caused by an error

3
Redundancy Management
  • Fault-Detection / Masking
  • Information redundancy
  • Parity bit, codes, ...
  • Hardware redundancy
  • Duplication, n-modular redundancy, ...
  • Fault-Diagnosis
  • Computation redundancy
  • Tests
  • Diagnosis algorithms
  • Repair and Reconfiguration
  • Replacement / Repair
  • Graceful degradation
  • Recovery
  • Set the system in a consistent state
  • Backward and Forward recovery

4
System-Level Diagnosis: The PMC Model
  • Introduced in 1967 by Preparata, Metze and Chien
  • Consider a set V of units
  • The units are connected by an interconnection
    structure
  • This defines the system graph G(V,L)
  • Units may be Faulty or Fault-Free
  • Permanent faults
  • Units perform mutual tests exploiting the system
    interconnections
  • Tests have binary outcomes
  • The Syndrome is the collection of all the test
    outcomes
  • This defines the test assignment and the
    diagnostic graph DG(V, E)

[Figure: a system graph and a diagnostic graph]

5
System-Level Diagnosis: The Tests
  • The test of unit v performed by unit u consists
    of three steps
  • u sends a test input sequence to v
  • v performs a computation on the test sequence and
    returns the output to u
  • Unit u compares the output of v with the expected
    results
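  • The outcome is binary (0 = pass, 1 = fail)
  • Requires a bidirectional connection
  • Outcome g of the test performed by unit u on unit
    v (denoted as u → v) is defined according to the
    PMC model: a fault-free u reports 0 if v is
    fault-free and 1 if v is faulty; the outcome
    reported by a faulty u is unreliable
  • u ↔ v: tests performed in both directions,
    with outcomes d and g respectively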
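
The PMC test rule can be illustrated with a minimal Python sketch. It assumes the standard PMC reading: a fault-free tester reports the true state of the tested unit, while a faulty tester reports an arbitrary outcome. The function names, the 5-unit ring and the choice of faulty unit are hypothetical, introduced only for illustration.

```python
import random

def pmc_outcome(tester_faulty, tested_faulty, rng=random):
    """Outcome of a test u -> v under the PMC model (0 = pass, 1 = fail)."""
    if not tester_faulty:
        # A fault-free tester reports the true state of the tested unit.
        return 1 if tested_faulty else 0
    # A faulty tester is unreliable: its outcome is arbitrary.
    return rng.randint(0, 1)

def generate_syndrome(tests, faulty, rng=random):
    """Collect the outcome of every test (u, v) in the test assignment."""
    return {(u, v): pmc_outcome(u in faulty, v in faulty, rng)
            for (u, v) in tests}

# Hypothetical example: a ring of 5 units, each testing its successor,
# with unit 2 faulty.
ring_tests = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
ring_syndrome = generate_syndrome(ring_tests, faulty={2})
```

Only the outcome of the test performed by the faulty unit, (2, 3), varies between runs; all other outcomes are determined by the fault set.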

6
System-Level Diagnosis: Some Definitions
  • Centralized diagnosis
  • An external, reliable diagnoser
  • Collects and decodes the syndrome by a diagnosis
    algorithm
  • Distributed diagnosis
  • The syndrome is decoded by a distributed
    algorithm
  • Centralized Diagnosis
  • Given a syndrome s, a consistent fault set (CFS)
    Vf is such that
  • For each u ∈ Vf, v ∈ V − Vf: every test v → u
    has outcome 1
  • For each u, v ∈ V − Vf: every test v → u
    has outcome 0
  • The goal of the diagnosis is to identify a CFS
    of minimum cardinality
  • However, in general there are many CFSs with
    that property
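
A consistency check for a candidate fault set can be sketched as follows, assuming the PMC model: only tests performed by units outside the candidate set constrain it, and each such test must report 1 exactly when the tested unit is in the set. The function name and the example syndrome (a 5-unit ring in which each unit tests its successor) are hypothetical.

```python
def is_consistent(fault_set, syndrome):
    """Return True if fault_set is a consistent fault set for the syndrome.

    syndrome maps each test (tester, tested) to its outcome (0 or 1).
    """
    for (tester, tested), outcome in syndrome.items():
        if tester in fault_set:
            continue  # outcomes reported by faulty testers are unconstrained
        expected = 1 if tested in fault_set else 0
        if outcome != expected:
            return False
    return True

# Hypothetical syndrome on a ring of 5 units (unit i tests unit i+1 mod 5).
example_syndrome = {(0, 1): 0, (1, 2): 1, (2, 3): 0, (3, 4): 0, (4, 0): 0}
```

For this syndrome both {2} and {2, 3} are consistent, which shows why a bound on the number of faults is needed to single out one fault set.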

[Figure: example with two sets, V1 = {1, 2, 3} and V2 = {3, 4, 5}]

7
System-Level Diagnosis: Some Definitions
  • The diagnosis algorithm outputs sets K, F and S
  • K units declared fault-free
  • F units declared faulty
  • S units declared suspect
  • The diagnosis is correct if K ⊆ V − Vf and F ⊆ Vf
  • The diagnosis is complete if S = ∅ (K ∪ F = V)

8
System-Level Diagnosis: Some Definitions
  • One-Step t-diagnosable systems
  • Correct and complete diagnosis for any Vf, with
    |Vf| ≤ t
  • t is the one-step diagnosability of the system
  • For any syndrome, either
  • There exists a unique consistent fault set of
    cardinality at most t OR
  • The minimum cardinality of the consistent fault
    set exceeds t
  • Sequentially s-diagnosable systems
  • Correct diagnosis for any Vf, with |Vf| ≤ s
  • s is the sequential diagnosability of the system
  • For any syndrome, either
  • The consistent fault sets of cardinality ≤ s have
    a non-empty intersection OR
  • The minimum cardinality of the consistent fault
    set exceeds s
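
The two conditions above can be illustrated by decoding a single syndrome by brute force. This is a sketch under the PMC model with exponential cost, meant only to make the definitions concrete; it is not one of the algorithms cited in this review. All names and the example syndrome are hypothetical.

```python
from itertools import combinations

def is_consistent(fault_set, syndrome):
    """PMC consistency: tests by units outside fault_set must be truthful."""
    return all(outcome == (1 if tested in fault_set else 0)
               for (tester, tested), outcome in syndrome.items()
               if tester not in fault_set)

def consistent_fault_sets(units, syndrome, bound):
    """Enumerate every consistent fault set of cardinality at most bound."""
    return [set(c)
            for k in range(bound + 1)
            for c in combinations(units, k)
            if is_consistent(set(c), syndrome)]

def one_step_decode(units, syndrome, t):
    """Return the unique CFS of size <= t, or None if it is not unique."""
    cfs = consistent_fault_sets(units, syndrome, t)
    return cfs[0] if len(cfs) == 1 else None

def sequential_decode(units, syndrome, s):
    """Return the units common to all CFSs of size <= s (certainly faulty)."""
    cfs = consistent_fault_sets(units, syndrome, s)
    return set.intersection(*cfs) if cfs else set()

# Hypothetical syndrome on a ring of 5 units (unit i tests unit i+1 mod 5).
units = range(5)
syndrome = {(0, 1): 0, (1, 2): 1, (2, 3): 0, (3, 4): 0, (4, 0): 0}
```

With a bound of 1 the syndrome has the unique consistent fault set {2}. With a bound of 2 both {2} and {2, 3} are consistent, so one-step decoding fails, but their intersection still identifies unit 2 as faulty, which is what sequential diagnosis needs to make progress.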

9
System-Level Diagnosis: Three Problems
  • Characterization problem
  • Finding necessary and sufficient conditions in
    order to achieve the desired diagnosability in a
    system
  • Diagnosability problem
  • Given a test assignment for a system, determine
    its one-step and sequential diagnosability
  • Diagnosis problem
  • Given a system, a test assignment and a syndrome,
    determine a consistent fault set of minimum
    cardinality

10
The Characterization Problem
  • Let n = |V|, and let d be the minimum in-degree
    of the diagnostic graph
  • Necessary conditions for one-step
    t-diagnosability [PMC67]
  • n ≥ 2t + 1
  • d ≥ t
  • These conditions are sufficient if no two units
    test each other [HA74]
  • A general characterization for one-step
    t-diagnosable systems is also given [HA74]
  • n ≥ 2t + 1
  • d ≥ t
  • For each p with 0 ≤ p < t and each X ⊂ V with
    |X| = n − 2t + p, X tests at least p + 1 units
    in V − X
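
The two necessary conditions, together with the "no two units test each other" case in which they are also sufficient, can be checked directly on a test assignment. The sketch below assumes the diagnostic graph is given as a list of directed tests (tester, tested); the function names and the ring example are hypothetical.

```python
def satisfies_necessary_conditions(units, tests, t):
    """Check n >= 2t + 1 and minimum in-degree >= t on the diagnostic graph."""
    n = len(units)
    in_degree = {u: 0 for u in units}
    for _tester, tested in tests:
        in_degree[tested] += 1
    return n >= 2 * t + 1 and min(in_degree.values()) >= t

def has_mutual_tests(tests):
    """True if some pair of units test each other."""
    pairs = set(tests)
    return any((v, u) in pairs for (u, v) in pairs)

# Hypothetical example: a ring of 5 units, each testing its successor.
ring_units = [0, 1, 2, 3, 4]
ring_tests = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
```

In the ring no two units test each other, so the conditions are sufficient: the ring is one-step 1-diagnosable, and it cannot be 2-diagnosable because every unit is tested by only one other unit.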

11
The Characterization Problem
  • Sequential Diagnosable Systems
  • n ≥ 2t + 1 is a necessary condition [PMC67]
  • There exists a general characterization [HX95]

[Figure: a sequentially 3-diagnosable system]
[Figure: a one-step 2-diagnosable system]

12
The Diagnosability Problem
  • One-step Diagnosability
  • First solved by Sullivan [Sul84]
  • The best algorithm determines the one-step
    diagnosability in O(n t^2.5) [RT91a]
  • Sequential Diagnosability
  • The problem is co-NP-complete [RT91b]
  • The sequential diagnosability can be determined
    for several classes of graphs

13
The Diagnosis Problem
  • One-Step Diagnosis
  • (with no restrictions) the problem is NP-complete
    [MH76]
  • The problem is O(n^2.5) for t-diagnosable systems
    [DM84]
  • Step 1: constructs the L-graph
  • Vf is a minimum vertex cover of the L-graph
    (finding a minimum vertex cover is NP-hard for
    general graphs)

14
The Diagnosis Problem
  • Step 2: finds a maximum matching in the L-graph
  • Each matched edge pairs a faulty unit with a
    fault-free unit
  • Step 3: visits the L-graph starting from a unit
    not included in the matching
  • Such a unit is fault-free
  • Sequential Diagnosis
  • (with no restrictions) sequential diagnosis is
    co-NP-complete [FK78]
  • A general O(|E|) heuristic has been proposed in
    [Man80]
  • Many algorithms for several classes of graphs

15
The BGM model
  • Permanent faults
  • Faulty units never produce the same (faulty)
    outcomes
  • Sequential Diagnosis
  • Diagnosis is trivial
  • Sequential diagnosability is co-NP-complete
    [RT91]
  • One-Step Diagnosis
  • Necessary condition: t ≤ n − 2 [BGM76]
  • Sufficient conditions are also given [BGM76]
  • One-step diagnosability: O(n t^2 / log t) [RT91]

16
The Comparison Models
  • Tests performed by comparison of the output of
    adjacent units
  • The comparator can be either internal to units or
    external
  • Models with reliable comparators
  • Two faulty units never produce the same outcomes
    [Mal80]
  • Two faulty units may produce the same outcomes
    [CH81]
  • [MM81]: the comparator is an external unit
    subject to faults

17
Distributed Diagnosis
  • Drops the hypothesis of a centralized and
    reliable diagnoser
  • First introduced in [KR80]
  • The diagnosis is performed by a distributed
    algorithm
  • Units perform tests on adjacent units
  • Units exchange diagnostic information with the
    neighbors
  • Fault-free units accept diagnostic information
    only from fault-free neighbors
  • Same invalidation rule as the PMC model
  • Faults do not occur during diagnosis
  • This hypothesis was dropped in [KR81]: when a
    unit u receives new diagnostic information from
    v, it tests v and accepts the information only if
    v is found fault-free
  • Optimal diagnosis algorithm in terms of number of
    tests, messages and diagnosis latency [RDZ95]
  • Other models will be presented in the forthcoming
    seminars

18
Probabilistic Diagnosis
  • Introduced in [MH76]
  • Will be presented in the next seminar by Paolo
    Santi

19
Applications
  • Diagnosis of massively parallel systems
  • A very large number of processing elements
  • Regular interconnection structures
  • Generally used for huge computations
  • MTTF can be very small even if the single
    components are reliable
  • Wafer-Scale Self-Test
  • A large number of ICs
  • ICs arranged in a regular pattern on the wafer
  • A large number of faulty ICs
  • Up to 50%
  • With the current test technology, testing costs
    are increasing rapidly
  • In the near future testing costs will
    exceed manufacturing costs!!!
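
The remark that the MTTF of a massively parallel system can be very small even with reliable components can be made concrete with a short calculation. The sketch assumes independent components with exponentially distributed lifetimes and a system that fails as soon as any component fails, so the failure rates add and the system MTTF is the component MTTF divided by the number of components. The figures used are hypothetical.

```python
def system_mttf(component_mttf_hours, n_components):
    """MTTF of a system that fails when any of n independent components fails.

    Assumes exponentially distributed component lifetimes, so the system
    failure rate is n times the component failure rate.
    """
    return component_mttf_hours / n_components

# Hypothetical figures: 10,000 processing elements, each with an MTTF of
# one million hours (more than a century).
mttf_hours = system_mttf(1_000_000, 10_000)  # 100.0 hours
```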

20
Wafer Test: State of the Art
  • ICs tested by a test tool
  • A probe station and a controlling computer
  • The pads of each IC are probed by the probe
    station
  • The probe station supplies the IC with power,
    ground and a test sequence
  • The IC output sequence is delivered to the
    controlling computer
  • The controlling computer compares the output
    sequence with the expected output sequence
  • Alternatively, it compares the IC outputs with the
    outputs of a golden unit
  • Drawbacks
  • ICs generally cannot be tested at full speed
  • The test computer generally does not match the
    actual speed of ICs
  • The test is generally not accurate
  • Limited fault coverage, tests mainly electrical
    properties
  • Time required to test an entire wafer
  • ICs tested sequentially
  • After each test the probe station must
    reposition the probes on the next IC
  • The time to test an IC increases with its
    complexity

21
Wafer Test: State of the Art

22
Wafer-Scale Self-Test
  • The ICs perform mutual tests
  • The tests may proceed in parallel
  • The tests produce binary outcomes
  • The test tool collects the syndrome and executes
    the diagnosis algorithm
  • Advantages
  • The ICs undergo an intensive test before they are
    cut and packaged
  • saves the cost of packaging faulty ICs
  • IC tests are executed at the operating speed of
    the ICs (or at a comparable speed)
  • improves test accuracy
  • ICs tested in parallel
  • reduces the time needed to complete the test

23
Wafer-Scale Self-Test

24
Wafer-Scale Self-Test: Implementation Issues
  • Requirements
  • ICs Interconnection
  • Comparators to perform comparisons
  • number, placement,...
  • Test vectors, clock, power and ground supply to
    all ICs
  • Syndrome collection and diagnosis algorithm
  • The diagnosis algorithm should be able to
    diagnose a large fraction of ICs under realistic
    fault situations
  • A naive implementation
  • Large number of bus interconnections across the
    entire wafer
  • The links would cross the IC boundaries
  • Wafer-level synchronization
  • Very complex and expensive design
  • Therefore, for a feasible implementation we need
    to
  • reduce the design complexity by minimizing the
    interconnections, and
  • drop the hypothesis of wafer-level
    synchronization

25
Conclusions
  • The classical results of System-Level Diagnosis
    are unsuitable to diagnose large, regularly
    interconnected systems
  • The most promising applications require regular
    interconnection structures
  • For this reason
  • Research has recently focused on diagnosis of
    regular systems
  • The forthcoming seminars will address these topics