Implementation of a FailSafe system - PowerPoint PPT Presentation

About This Presentation
Title:

Implementation of a FailSafe system


Provided by: kajarighos

Transcript and Presenter's Notes

Title: Implementation of a FailSafe system


1
Implementation of a Fail-Safe system
  • Kajari Ghosh Dastidar

2
What is a Fail Safe system?
  • A system that can fail, but only to a safe state,
    so that nothing bad happens
  • A type of fault tolerance

3
Outline of the talk
  • What is a Fault Tolerant system?
  • Some Terminologies
  • What is fault?
  • Why have Fault Tolerance?
  • How is fault tolerance obtained?
  • Motivation
  • Related Work
  • Contribution
  • Future work and Conclusion

4
Outline of the talk
  • What is Fault Tolerance?
  • Some Terminologies
  • What is fault?
  • Why have Fault Tolerance?
  • How is fault tolerance obtained?
  • Motivation
  • Related Work
  • Contribution
  • Future work and Conclusion

5
What is Fault Tolerance?
  • The ability of a system to continue its desired
    computation in the presence of system errors.
  • Making a computer system fault tolerant is one of
    the most essential steps in making the system
    dependable.

6
Outline of the talk
  • What is Fault Tolerance?
  • Some Terminologies
  • What is fault?
  • Why have Fault Tolerance?
  • How is fault tolerance obtained?
  • Motivation
  • Related Work
  • Contribution
  • Future work and Conclusion

7
Terminologies
  • Computation

[Figure: Computation. A diagram of states s0 to s6 with the label "Action enabled".]
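
One common way to read this slide is that a computation is a sequence of states in which every step executes an action that is enabled in the current state. The sketch below assumes that guarded-action reading; the integer state and the names `Action`, `run`, and `increment` are illustrative and do not come from the slides.

```python
from typing import Callable, List, NamedTuple


class Action(NamedTuple):
    guard: Callable[[int], bool]   # the action is enabled when guard(state) is true
    effect: Callable[[int], int]   # next state produced by executing the action


def run(state: int, actions: List[Action], max_steps: int) -> List[int]:
    """Return the computation, i.e. the sequence of states visited."""
    trace = [state]
    for _ in range(max_steps):
        enabled = [a for a in actions if a.guard(state)]
        if not enabled:            # no action is enabled: the computation ends
            break
        state = enabled[0].effect(state)
        trace.append(state)
    return trace


# Hypothetical system: count upward until the state reaches 3.
increment = Action(guard=lambda s: s < 3, effect=lambda s: s + 1)
trace = run(0, [increment], max_steps=10)
```

Here `trace` holds the visited states in order, starting from the initial state.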
8
Terminologies
  • A system consists of components which
  • are connected together
  • facilitate the flow of information
  • interact
  • compute a method or an algorithm to achieve a
    goal

9
Terminologies
  • Error

An incorrect response from a system, indicating
that a fault is present, which will lead to system
failure if no fault tolerance is present.
10
Outline of the talk
  • What is Fault Tolerance?
  • Some Terminologies
  • What is fault?
  • Why have Fault Tolerance?
  • How is fault tolerance obtained?
  • Motivation
  • Related Work
  • Contribution
  • Future work and Conclusion

11
What is fault?
  • Fault

Flaw in hardware or software which can result in
a failure.
12
Causes of fault
  • Physical factors from wear out
  • External disturbances
  • Design flaws or defects in hardware
  • Defects in software

13
Types of fault
  • Transient: disappears without repair
  • Intermittent: effect not always present
  • Permanent: effect always present
  • S/W and H/W design errors
  • Operator errors
  • Externally induced damages

14
Types of fault
  • How a system should be made fault tolerant
    depends on
  • the type of fault we want to tolerate
  • the number of faults we want to tolerate

15
Outline of the talk
  • What is Fault Tolerance?
  • Some Terminologies
  • What is fault?
  • Why have Fault Tolerance?
  • How is fault tolerance obtained?
  • Motivation
  • Related Work
  • Contribution
  • Future work and Conclusion

16
Why have fault tolerance?
  • Novice users
  • Increasing repair costs
  • Larger systems
  • Digital systems more prevalent
  • More users, who are more dependent on digital systems

17
Why have fault tolerance?
  • Some Related Terms
  • Fault latency: a fault can go undetected (it does
    not cause an error)
  • Fault avoidance: high quality components and
    careful design are used to avoid the occurrence
    of faults
  • Graceful degradation: the system performs with
    degraded but correct performance after the
    occurrence of faults

18
Outline of the talk
  • What is Fault Tolerance?
  • Some Terminologies
  • What is fault?
  • Why have Fault Tolerance?
  • How is fault tolerance obtained?
  • Motivation
  • Related Work
  • Contribution
  • Future work and Conclusion

19
How is fault tolerance obtained?
  • Masking tolerance
  • Non-Masking tolerance
  • Fail Safe

20
How is fault tolerance obtained?
  • Masking

A structural redundancy technique that completely
masks faults within a set of redundant modules,
e.g. Triple Modular Redundancy, Hybrid
Redundancy.
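
Triple Modular Redundancy can be sketched as three copies of a module feeding a majority voter, so that a single faulty copy is outvoted. This is a minimal illustration, not the presenter's implementation; the names `tmr_vote`, `run_tmr`, `good`, and `faulty` are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def tmr_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Return the value produced by a majority of the redundant modules."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        # No majority: more modules disagree than the voter can mask.
        raise RuntimeError("no majority among redundant modules")
    return value


def run_tmr(modules: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every redundant module on the same input and vote on the result."""
    return tmr_vote([module(x) for module in modules])


def good(x: int) -> int:
    return x * x


def faulty(x: int) -> int:
    return -1                      # hypothetical module stuck at a wrong value


result = run_tmr([good, faulty, good], 4)   # the single fault is masked
```

The voter masks one faulty module out of three. If two modules fail in different ways, no majority exists and the sketch raises an error instead of guessing.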
21
How is fault tolerance obtained?
  • Non Masking Tolerance

After a failure, the system goes to an unstable
state before it returns to stability, e.g.
Forward Recovery Systems, Backward
Recovery Systems.
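
Backward recovery is often realized with checkpoints and rollback: the system briefly holds an erroneous state, detects it, and returns to the last saved good state. The sketch below assumes that approach; the class `CheckpointedCounter` and its "total must not be negative" invariant are hypothetical stand-ins for a real system and its error detector.

```python
import copy


class CheckpointedCounter:
    """Toy system that recovers by rolling back to the last checkpoint."""

    def __init__(self) -> None:
        self.state = {"total": 0}
        self._saved = copy.deepcopy(self.state)

    def checkpoint(self) -> None:
        self._saved = copy.deepcopy(self.state)

    def rollback(self) -> None:
        self.state = copy.deepcopy(self._saved)

    def add(self, amount: int) -> bool:
        self.state["total"] += amount      # the state may be erroneous here
        if self.state["total"] < 0:        # error detected: invariant violated
            self.rollback()                # return to the last good state
            return False
        self.checkpoint()                  # new good state becomes the checkpoint
        return True


counter = CheckpointedCounter()
first = counter.add(5)
second = counter.add(-10)   # drives the state negative, then rolls back
```

The fault is not masked: between the update and the rollback the state is wrong, which is what distinguishes this from masking tolerance.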
22
How is fault tolerance obtained?
  • Fail safe

The system can fail, but only to a safe state, to
avoid catastrophes. Liveness is compromised. A
common fail-safe system is the pilot-light sensor
in most gas furnaces. If the pilot light is cold,
a mechanical arrangement disengages the gas
valve, so that the house cannot fill with
unburned gas.
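
The pilot-light example can be sketched as a controller that treats "valve closed" as the safe state and falls back to it whenever the flame is not positively confirmed. The names `Valve` and `valve_command`, and the use of `None` for a missing sensor reading, are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class Valve(Enum):
    OPEN = "open"
    CLOSED = "closed"      # the safe state


def valve_command(pilot_lit: Optional[bool], heat_requested: bool) -> Valve:
    """Decide the gas valve position.

    pilot_lit is None when the sensor reading is missing or invalid.
    """
    if pilot_lit is not True:
        # Flame not confirmed: fail to the safe state, whatever was requested.
        return Valve.CLOSED
    return Valve.OPEN if heat_requested else Valve.CLOSED
```

Safety is preserved in every case, but liveness is given up: with a cold pilot light or a broken sensor the furnace never heats, even when heat is requested.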
23
How is fault tolerance obtained?
[Figure: diagram relating masking tolerance, non-masking tolerance, and fail-safe tolerance to regions labeled perfect, safe, and unsafe.]
24
Outline of the talk
  • What is Fault Tolerance?
  • Some Terminologies
  • What is fault?
  • Why have Fault Tolerance?
  • How is fault tolerance obtained?
  • Motivation
  • Related Work
  • Contribution
  • Future work and Conclusion

25
Motivation
Effectiveness and Drawbacks of the existing models
26
Motivation
Terminologies
  • Reliability
  • Availability
  • Maintainability
  • Mean-time-to-failure (MTTF)
  • Mean-time-to-repair (MTTR)
  • Mean-time-between-failure (MTBF)
  • Fault Detection
  • Fault Containment
  • Fault Diagnosis
  • Recovery
  • Safety Critical Systems
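
The three mean-time measures in the list above are related. The slides do not define them, so the sketch below assumes one common convention for repairable systems: MTBF is the sum of MTTF and MTTR, and steady-state availability is the fraction of time the system is up. The function names and the sample figures are illustrative.

```python
def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures, taken here as MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: uptime divided by uptime plus repair time."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical system: runs 990 hours on average, then needs 10 hours of repair.
cycle = mtbf(990.0, 10.0)
uptime_fraction = availability(990.0, 10.0)
```

Shortening repair time raises availability even when the failure rate stays the same, which is why MTTR appears alongside MTTF in the list.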

27
Motivation
Effectiveness and Drawbacks of the existing models
Achieving masking tolerance sometimes proves to
be very expensive. Its effectiveness depends on
MTBF and the life of the system. Non-masking
tolerance does not prevent the system from
functioning erroneously. The above temporary
solutions make the system internally weaker. To
keep the system safe in a fail-safe system, fault
detection should be efficient. What happens when
even fail-safe can't be guaranteed?
28
Outline of the talk
  • What is Fault Tolerance?
  • Some Terminologies
  • What is fault?
  • Why have Fault Tolerance?
  • How is fault tolerance obtained?
  • Motivation
  • Related Work
  • Contribution
  • Future work and Conclusion

29
Related Work
Dijkstra proposed a self-stabilizing system,
which laid the foundation of non-masking
fault-tolerant systems. Later, quite a few papers
were written improving these types of systems.
Arora and Kulkarni have developed a generalized
theoretical model for detecting and correcting
faults. Ghosh has proposed a new fail-safe
model which points to the limitations of the
previous two fault-tolerant models.
30
Outline of the talk
  • What is Fault Tolerance?
  • Some Terminologies
  • What is fault?
  • Why have Fault Tolerance?
  • How is fault tolerance obtained?
  • Motivation
  • Related Work
  • Contribution
  • Future work and Conclusion

31
Contribution
  • Present a formalization of the system
  • Present methodologies to implement the proposed
    fail-safe system
  • Try to employ distributed fault detectors
    wherever possible
  • Argue that fail-safe can't be guaranteed in
    every situation
  • Where the above tolerance can't be guaranteed,
    we propose a weaker, cheaper alternative

32
Contribution
Formalization: Safety Margin
[Figure: diagram with regions labeled No failures, Safety Margin, and Failure.]
33
Contribution
Implementation
Whenever a fault occurs in the system that does
not perturb its normal functioning, instead of
masking it, we can raise an alarm before
stopping the system. What type of fault is to
be detected? What should be the degree of
tolerance?
34
Contribution
Why distributed detector?
We can't rely on a single, central detector:
what if it fails? More than one detector is
desired. At least one node should detect the
fault to raise the alarm. The alarm will then
propagate through the system. MTBF plays an
important role here.
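
The propagation step can be sketched as a flood over the network: every node that detects the fault raises the alarm, and each alarmed node passes it to its neighbors. The slides do not specify a protocol, so the breadth-first flooding, the adjacency-list representation, and the names `propagate_alarm` and `ring` are assumptions.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def propagate_alarm(neighbors: Dict[int, List[int]],
                    detecting: Iterable[int]) -> Set[int]:
    """Return the set of nodes that the alarm eventually reaches."""
    alarmed: Set[int] = set(detecting)
    queue = deque(alarmed)
    while queue:
        node = queue.popleft()
        for peer in neighbors[node]:
            if peer not in alarmed:    # forward the alarm once per node
                alarmed.add(peer)
                queue.append(peer)
    return alarmed


# Hypothetical four-node ring in which only node 2 detects the fault.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
alarmed = propagate_alarm(ring, {2})
```

One detecting node is enough to alarm every node it is connected to. If no node detects the fault, or the detecting node is cut off from the rest, the alarm does not spread.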
35
Contribution
Safety Margin is Null
Crash failures
Certain networks with sparse connectivity
36
Contribution
Weaker (Cheaper) Alternative
Dijkstra's first algorithm (ring topology)
  • There should be exactly one privilege at any
    instant
  • The number of system states (K) is always greater
    than the number of nodes
  • The system stabilizes after a certain period
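
The properties above can be illustrated with a small simulation of Dijkstra's K-state ring as it is commonly described: machine 0 is privileged when its state equals that of the last machine and moves by incrementing modulo K, and every other machine is privileged when its state differs from its left neighbor and moves by copying it. The function names, the seeded random scheduler, and the starting configuration are illustrative assumptions.

```python
import random
from typing import List


def privileged(states: List[int]) -> List[int]:
    """Return the indices of the machines that currently hold a privilege."""
    n = len(states)
    holders = []
    if states[0] == states[n - 1]:         # rule for the distinguished machine 0
        holders.append(0)
    for i in range(1, n):
        if states[i] != states[i - 1]:     # rule for every other machine
            holders.append(i)
    return holders


def move(states: List[int], i: int, k: int) -> None:
    """Let privileged machine i make its move."""
    if i == 0:
        states[0] = (states[0] + 1) % k
    else:
        states[i] = states[i - 1]


def stabilize(states: List[int], k: int, steps: int, seed: int = 0) -> List[int]:
    """Repeatedly let one arbitrarily chosen privileged machine move."""
    rng = random.Random(seed)
    for _ in range(steps):
        move(states, rng.choice(privileged(states)), k)
    return states


k = 6                       # more states than the five nodes in the ring
ring = [3, 0, 5, 1, 1]      # arbitrary, possibly faulty, starting configuration
stabilize(ring, k, steps=500)
holders = privileged(ring)
```

At least one machine is always privileged, so the scheduler never runs out of moves. After enough moves exactly one privilege remains and circulates around the ring.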

37
Contribution
Weaker (Cheaper) Alternative
  • The system deadlocks (or goes to a
    pre-defined safe state)
  • Eventually deadlocking: the effect of failure
    can't be avoided
  • Only two system states are needed
  • A marker is introduced

38
Outline of the talk
  • What is Fault Tolerance?
  • Some Terminologies
  • What is fault?
  • Why have Fault Tolerance?
  • How is fault tolerance obtained?
  • Motivation
  • Related Work
  • Contribution
  • Future work and Conclusion

39
Future Work
Implementation of the different types of tolerant models
Making the methods more effective
Idea of replication
Reachability
40
Questions?