A survey of dependability patterns - PowerPoint PPT Presentation

1
A survey of dependability patterns
  • Ingrid Buckley and Eduardo B. Fernandez
  • Dept. of Computer Science and Engineering
    Florida Atlantic University, Boca Raton, FL, USA
  • January 18, 2007

2
Introduction
  • Dependability is the property of a system that
    allows one to rely on its service.
  • Dependability is of the utmost importance for
    critical systems in business and in critical
    infrastructures such as hospitals, airports, and
    a country's electricity grid.
  • Dependability comprises several aspects:
    • Fault Tolerance
    • Safety
    • Availability
    • Reliability

3
Introduction contd
  • Fault tolerance, as it relates to systems,
    software, and hardware, is the ability to remain
    operable in the presence of faults.
  • Safety is the prevention of catastrophic effects
    on the environment or on the users of the system.
  • Availability is the ability of a system to
    perform its functions when needed.
  • Reliability measures the success with which the
    system conforms to its specification.
  • We use the Unified Modeling Language (UML) to
    represent fault tolerance patterns.
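Availability and reliability are often quantified. The sketch below uses two common textbook formulas that are not part of these slides: steady-state availability A = MTTF / (MTTF + MTTR), and, assuming a constant failure rate λ, reliability R(t) = e^(-λt). The function names and the figures in the example are hypothetical.

```python
import math


def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system is operational: MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """Probability of running `hours` without failure, assuming a constant failure rate."""
    return math.exp(-failure_rate_per_hour * hours)


if __name__ == "__main__":
    # Hypothetical figures: fails on average every 999 h, takes 1 h to repair.
    print(f"{steady_state_availability(999.0, 1.0):.3f}")  # 0.999
    # Hypothetical failure rate of 0.001 per hour over a 1000 h mission.
    print(f"{reliability(0.001, 1000.0):.3f}")  # 0.368
```

The two measures answer different questions: a system can be highly available because it is repaired quickly, even if it fails often.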

4
Objectives
  • Classify software and hardware fault tolerance
    patterns according to their objectives.
  • Analyze and evaluate the classified fault
    tolerance patterns.
  • Determine how to improve upon existing patterns.
  • Design new fault tolerance patterns for
    unsupported areas within critical systems.

5
Background
  • A pattern is an encapsulated solution to a
    recurrent problem that solves a specific problem
    in a given context and can be tailored to fit
    different situations.
  • A fault is a defect in a component or in the
    design of a system. An error is the
    manifestation of a fault: a defective value in
    the state of the system.
  • A system failure occurs when there is a deviation
    from the system's specification. A failure is the
    manifestation of an error.
  • The System Development Life Cycle (SDLC) is the
    entire process of formal, logical steps taken to
    develop software.

6
Fault Tolerance
  • A system that can mask the effects of a fault and
    continue operating correctly is said to be fault
    tolerant.
  • Fault tolerance requires redundancy and
    diversity, which are directly linked to
    reliability and support the availability of a
    system.
  • Diversity in this sense means having different
    versions of a function or system that all
    provide the same functionality.
  • Integrating hardware and software fault
    tolerance to cope with the various kinds of
    faults that can appear in a software system is a
    good foundation for achieving a fault tolerant
    system.
  • Several fault tolerance patterns have already
    been written, and they support different levels
    of the system architecture. Our aim is to focus
    on hardware and software fault tolerance
    patterns.
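Design diversity is commonly put to work through N-version programming: independently written versions of the same function run on the same input, and a majority voter masks a fault confined to a minority of versions. The following is a minimal Python sketch under that assumption; the three versions, the injected off-by-one fault, and `majority_vote` are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence


# Three hypothetical, independently written versions of the same function
# (sum of 1..n). All are meant to provide the same functionality.
def sum_loop(n: int) -> int:
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_formula(n: int) -> int:
    return n * (n + 1) // 2


def sum_faulty(n: int) -> int:
    return sum(range(1, n))  # injected design fault: omits n itself


def majority_vote(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every version on the same input and return the majority result.

    A fault in a minority of versions is masked; without a strict majority
    the error is reported instead of being masked.
    """
    results = [version(x) for version in versions]
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError(f"no majority among results {results}")
    return value


if __name__ == "__main__":
    print(majority_vote([sum_loop, sum_formula, sum_faulty], 10))  # 55
```

Diversity matters here because identical copies of `sum_faulty` would all agree on the wrong answer; only differently designed versions make the design fault visible to the voter.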

7
Fault Tolerance Contd
  • Fault tolerance patterns are a fairly new area
    in association with critical systems. The need
    for them has increased with the need to secure
    systems against failures caused accidentally, or
    intentionally by attackers.
  • Due to the diversity of attacks on different
    types of systems, it is highly important to have
    effective fault tolerance techniques to mitigate
    faults that may lead to a failure in a critical
    system.
  • To prevent failures, the following are required:
    • Detection: detecting the occurrence of errors.
    • Diagnosis: locating the unit or component
      where the error has occurred.
    • Masking: masking errors to prevent the system
      from malfunctioning if a fault occurs.
    • Containment: confining or delimiting the
      effects of the error.
    • Recovery: reconfiguring the system to remove
      the faulty unit and erase the effects of the
      error.
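The five steps above can be illustrated on a set of replicas. This minimal Python sketch assumes majority voting over replica outputs as the detection and masking mechanism (the slides do not prescribe one): a disagreement is detected, the dissenting replica is diagnosed as faulty, its output is masked and contained, and the unit recovers by removing it from the pool. `ReplicatedUnit` and the replica functions are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict


class ReplicatedUnit:
    """Hypothetical redundant unit that walks through the five steps."""

    def __init__(self, replicas: Dict[str, Callable[[int], int]]) -> None:
        if not replicas:
            raise ValueError("at least one replica is required")
        self.replicas = dict(replicas)

    def process(self, x: int) -> int:
        outputs = {name: fn(x) for name, fn in self.replicas.items()}
        value, count = Counter(outputs.values()).most_common(1)[0]
        # Detection: any disagreement among the replicas signals an error.
        if count < len(outputs):
            if count <= len(outputs) // 2:
                # No strict majority: the error is detected but cannot be masked.
                raise RuntimeError(f"unmaskable error, outputs={outputs}")
            # Diagnosis: replicas disagreeing with the majority are located as faulty.
            faulty = [name for name, out in outputs.items() if out != value]
            # Recovery: reconfigure the unit by removing the faulty replicas.
            for name in faulty:
                del self.replicas[name]
        # Masking and containment: only the majority value leaves the unit,
        # so a faulty replica's output never propagates to the caller.
        return value


if __name__ == "__main__":
    unit = ReplicatedUnit({
        "a": lambda x: 2 * x,
        "b": lambda x: 2 * x,
        "c": lambda x: 2 * x + 1,  # injected fault
    })
    print(unit.process(5))        # 10
    print(sorted(unit.replicas))  # ['a', 'b']
```

Removing the faulty replica keeps it from outvoting correct ones later, at the cost of less redundancy: with two replicas left, a further disagreement can still be detected but no longer masked.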

8
Hardware Fault Tolerant Patterns
  • Hardware fault tolerance applies hardware
    replication to enhance system availability and
    reliability in the presence of hardware faults.
  • Hardware fault tolerance patterns:
    • Watchdog: the Watchdog pattern primarily
      provides protection against time-based faults
      by raising an alarm whenever liveness messages
      are not received within a given time frame.
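A minimal Python sketch of the Watchdog idea follows. `Watchdog`, `stroke`, and `check` are hypothetical names; time is passed in explicitly so the behavior is deterministic, whereas a real watchdog would rely on a hardware timer or a monotonic clock such as Python's `time.monotonic()`.

```python
from typing import List


class Watchdog:
    """Sketch of the Watchdog pattern with an explicit, injected clock.

    The monitored component sends periodic liveness messages ("strokes");
    if none arrives within `timeout` time units, the watchdog raises an alarm.
    """

    def __init__(self, timeout: float, start: float = 0.0) -> None:
        self.timeout = timeout
        self.last_stroke = start
        self.alarms: List[float] = []

    def stroke(self, now: float) -> None:
        """Record a liveness message received at time `now`."""
        self.last_stroke = now

    def check(self, now: float) -> bool:
        """Return True, and record an alarm, if the component is overdue."""
        if now - self.last_stroke > self.timeout:
            self.alarms.append(now)
            return True
        return False


if __name__ == "__main__":
    dog = Watchdog(timeout=2.0)
    dog.stroke(1.0)
    print(dog.check(2.5))  # False: last stroke was 1.5 units ago
    print(dog.check(3.5))  # True: 2.5 units have passed without a stroke
    print(dog.alarms)      # [3.5]
```

Note that the watchdog only observes liveness: a component that keeps stroking while computing wrong results goes unnoticed.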

9
Hardware Fault Tolerant Patterns Contd
  • Fail-Stop Processor: the Fail-Stop Processor
    pattern mainly aims at transforming errors that
    would lead to Byzantine (arbitrary) failures
    into fail-stop failures. It is based on
    redundancy and on comparing the output from all
    replicas to reach an agreement.
  • Acknowledgement: the Acknowledgement pattern
    detects crash failures and is based on
    acknowledging the reception of input within a
    given time interval.
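Both patterns can be sketched in a few lines of Python. The Fail-Stop Processor part assumes deterministic replicas whose outputs can be compared for equality, and halts on the first mismatch; the Acknowledgement part reduces the pattern to its timeout check. All names (`FailStopProcessor`, `FailStopError`, `acknowledged_in_time`) are hypothetical.

```python
from typing import Callable, List, Optional


class FailStopError(Exception):
    """Signals that the replicas disagreed and the processor has halted."""


class FailStopProcessor:
    """Runs identical replicas on each input and compares their outputs.

    Any disagreement, however arbitrary the underlying error, is turned into
    a clean stop: after the first mismatch the processor produces no output.
    """

    def __init__(self, replicas: List[Callable[[int], int]]) -> None:
        if len(replicas) < 2:
            raise ValueError("at least two replicas are needed for comparison")
        self.replicas = replicas
        self.stopped = False

    def process(self, x: int) -> int:
        if self.stopped:
            raise FailStopError("processor has stopped")
        outputs = [replica(x) for replica in self.replicas]
        if any(out != outputs[0] for out in outputs[1:]):
            self.stopped = True
            raise FailStopError(f"replicas disagree: {outputs}")
        return outputs[0]


def acknowledged_in_time(sent_at: float, ack_at: Optional[float], timeout: float) -> bool:
    """Acknowledgement check: the monitored system is presumed to have crashed
    unless an acknowledgement arrives within `timeout` of sending the input.

    `timeout` should cover the input's travel time to the monitored system
    plus the acknowledgement's travel time back.
    """
    return ack_at is not None and ack_at - sent_at <= timeout


if __name__ == "__main__":
    fsp = FailStopProcessor([lambda x: x + 1, lambda x: x + 1])
    print(fsp.process(1))                        # 2
    print(acknowledged_in_time(0.0, 0.3, 0.5))   # True
    print(acknowledged_in_time(0.0, None, 0.5))  # False: no ack, presumed crash
```

The acknowledgement check can only fire after an input has been sent, so a crash that happens while the system is idle stays undetected until the next input.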

10
Software Fault Tolerant Patterns
  • Software fault tolerance applies software
    redundancy by means of design diversity to
    tolerate software faults that can occur in the
    design, programming, or maintenance phases of
    the software development life cycle.
  • Software fault tolerance patterns:
    • Roll Forward: the Roll Forward pattern is a
      failure recovery pattern that detects and
      recovers from a fault by monitoring two
      replicas for errors.
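A minimal sketch of the Roll Forward idea in Python, under stated assumptions: one active replica processes inputs while the other holds a copy of its state, errors are detected as exceptions, and the input during which an error occurs is dropped rather than re-executed. `Replica`, `RollForward`, and the `add`/`add_faulty` step functions are hypothetical.

```python
import copy
from typing import Callable, Dict, List, Optional

State = Dict[str, int]
Step = Callable[[State, int], None]


class Replica:
    def __init__(self, name: str, step: Step) -> None:
        self.name = name
        self.step = step  # hypothetical application logic: updates state for one input
        self.state: State = {"total": 0}


class RollForward:
    """One replica processes inputs while the other holds a copy of its state."""

    def __init__(self, active: Replica, standby: Replica) -> None:
        self.active = active
        self.standby: Optional[Replica] = standby
        self.discarded: List[str] = []

    def handle(self, x: int) -> None:
        try:
            # Error detection in this sketch: the replica raises an exception.
            self.active.step(self.active.state, x)
        except Exception:
            # Discard the failed replica. The standby already holds the last
            # error-free state, so it takes over and processes subsequent
            # inputs; the failing input itself is not re-executed here.
            self.discarded.append(self.active.name)
            if self.standby is None:
                raise RuntimeError("no replica left to continue") from None
            self.active, self.standby = self.standby, None
            return
        # Error-free path: copy the new state to the other replica before the
        # next input is accepted; this copy is the pattern's main time overhead.
        if self.standby is not None:
            self.standby.state = copy.deepcopy(self.active.state)


def add(state: State, x: int) -> None:
    state["total"] += x


def add_faulty(state: State, x: int) -> None:
    if x < 0:
        raise ValueError("injected fault on negative input")
    state["total"] += x


if __name__ == "__main__":
    rf = RollForward(Replica("A", add_faulty), Replica("B", add))
    for value in [1, 2, -1, 4]:
        rf.handle(value)
    print(rf.active.name, rf.active.state, rf.discarded)  # B {'total': 7} ['A']
```

The state copy on every error-free input is what makes the pattern expensive when nothing goes wrong, while the failover itself is only a reference swap.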

11
Software Fault Tolerant Patterns Cont
  • Input Guard: the Input Guard pattern stops
    erroneous input from propagating errors into a
    component. A guard is placed at every access
    point of the component to check the validity of
    its input.
  • Fault Container: the Fault Container pattern
    provides the same benefits as the combination of
    the Input Guard and Output Guard patterns,
    because it prevents an error from being
    propagated into or out of a given component.
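Both guards can be sketched as Python decorators. Here a component specification is modeled as a hypothetical predicate on values, a violation raises `GuardError`, and the Fault Container is composed from an input guard wrapped around an output guard, mirroring the combination described above. The component `buggy_half` and its ranges are made up for illustration.

```python
from typing import Callable

Component = Callable[[int], int]
Check = Callable[[int], bool]


class GuardError(Exception):
    """A guard rejected a value that violates the component's specification."""


def input_guard(is_valid: Check) -> Callable[[Component], Component]:
    """Place a guard at the component's access point to check every input."""
    def wrap(component: Component) -> Component:
        def guarded(x: int) -> int:
            if not is_valid(x):
                raise GuardError(f"input {x!r} rejected")
            return component(x)
        return guarded
    return wrap


def output_guard(is_valid: Check) -> Callable[[Component], Component]:
    """Check every output before it leaves the component."""
    def wrap(component: Component) -> Component:
        def guarded(x: int) -> int:
            y = component(x)
            if not is_valid(y):
                raise GuardError(f"output {y!r} rejected")
            return y
        return guarded
    return wrap


def fault_container(input_ok: Check, output_ok: Check) -> Callable[[Component], Component]:
    """Fault Container sketched as an Input Guard wrapped around an Output Guard."""
    def wrap(component: Component) -> Component:
        return input_guard(input_ok)(output_guard(output_ok)(component))
    return wrap


# Hypothetical specification: inputs in 0..100, outputs in 0..50.
@fault_container(lambda x: 0 <= x <= 100, lambda y: 0 <= y <= 50)
def buggy_half(x: int) -> int:
    return -1 if x == 42 else x // 2  # injected fault for x == 42


if __name__ == "__main__":
    print(buggy_half(10))  # 5
    for bad in (200, 42):
        try:
            buggy_half(bad)
        except GuardError as err:
            print(err)  # input 200 rejected / output -1 rejected
```

A rejected value produces no result at all, which is the send- or receive-omission behavior noted in the analysis of the patterns; and an erroneous value that still satisfies the predicates (say, 20 instead of 21) passes both guards unchanged.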

12
Hardware/Software Fault Tolerance Pattern
  • The Software Redundancy Pattern deals with
    hardware, software and environmental faults at
    the same time.

13
Patterns diagram for the fault tolerance domain

14
Analysis of Patterns
  • Watchdog
    Advantage: Can be used to improve deadlock
    detection, since strokes can be keyed or can
    contain data that identifies strokes from
    different computational steps.
    Disadvantage: Does not actually check that the
    internal computation is processed correctly.
  • Acknowledgement
    Advantage: The design complexity introduced by
    the pattern is very low, and it does not
    introduce any space overhead.
    Disadvantage: Does not provide a means to
    tolerate faults in a system; rather, it provides
    a means to detect errors. An error in the
    monitored system is detected only after some
    input has been issued to it, and the timeout must
    be set based on the time it takes for the input
    to reach the monitored system plus the time it
    takes for the acknowledgement to reach the
    monitoring system.
  • Fail-Stop Processor
    Advantage: Introduces low time overhead, since
    the processors function in parallel. The
    processors are replicas of the original system
    on which the pattern is applied, without any
    additional functionality, so in practice they
    can be replicas of a legacy system that cannot
    undergo internal changes such as those that would
    be needed if the processors required additional
    functionality.
    Disadvantage: Introduces relatively high space
    overhead that is proportional to the number of
    simultaneous errors it can deal with.

15
Analysis of Patterns Contd
  • Roll Forward
    Advantage: The time overhead imposed by this
    pattern is low when errors occur: the failed
    replica is discarded, and the unaffected replica
    processes the subsequent inputs.
    Disadvantage: The time overhead imposed by this
    pattern in the absence of errors is high: before
    the replica is able to receive and process new
    input, it must copy its new state to the other
    replica.
  • Input Guard
    Advantage: It prevents the guarded component from
    being contaminated by erroneous input that does
    not conform to the component's specification.
    The pattern can be implemented in various ways,
    each offering different benefits with respect to
    the time or space overhead introduced by the
    guard.
    Disadvantage: Cannot prevent the propagation of
    errors that do conform to the specification of
    the guarded component, and has significant time
    and space overhead.
  • Fault Container
    Advantage: It stops errors, expressed as input
    and output content or timing that does not
    conform to a component's specification, from
    entering or exiting that component. Because the
    container's behavior in the presence of errors is
    left undefined, it can be combined with error
    detection and error masking patterns.
    Disadvantage: Cannot prevent the propagation of
    errors that do conform to the specification of
    the contained component. Unless combined with
    some error detection and system recovery
    mechanisms, the pattern results in send- or
    receive-omission failures (i.e., failure to send
    output from, or receive input into, the
    contained component).

16
Conclusion
  • Based on our analysis, there is a need to
    improve upon current fault tolerance patterns.
  • New fault tolerance patterns are necessary to
    provide dependability in distributed systems,
    because many of the existing fault tolerance
    patterns are very similar and do not provide
    comprehensive support for errors that can lead
    to failure.

17
Future Work
  • Safety, availability, and reliability patterns
    are being researched.
  • Defining areas of need where current Fault
    Tolerance Patterns are lacking or require
    improvement.
  • Designing new Fault Tolerance Patterns.

18
Recommendations and Questions
  • Feedback