1
CRITICAL SYSTEMS PROPERTIES
  • Survey and Taxonomy

2
OVERVIEW
  • What is a critical system?
  • What are its properties?
  • How can the properties be classified?
  • How compatible are the different properties and
    their techniques?

3
Critical system
  • There is a lot of disagreement concerning the
    definition of critical system.
  • There are FOUR different views on the subject.
  • There are differing views on the relationship
    between properties and the compatibility of
    techniques from these approaches.

4
The Different Approaches
  • Dependability Approach
  • Safety Approach
  • Security Approach
  • Real time systems Approach

5
Dependability Approach
  • Introduced by Jean-Claude Laprie.
  • A dependable system is one on which reliance
    may justifiably be placed for both the
    correctness and the continuity of the service it
    delivers.
  • Correctness pertains to conformity with
    requirements, specifications, etc.
  • Dependability encapsulates the technical
    meaning of terms such as reliability, safety,
    survivability, security, and fault tolerance.

6
Failure
  • Failure is defined as the inability of a system
    to provide a required service because of faults.
  • Failure is a property of the external behavior of
    a system.
  • Failures can be either
  • a) Benign, or
  • b) Catastrophic.

7
  • Benign Failure
  • The consequences of failure are comparable to the
    benefits provided by normal operation.
  • Catastrophic Failure
  • The consequences of failure greatly exceed (>>)
    the benefits provided by normal operation.

8
Failure and internal states of a system
  • Suppose a system progresses through a set of
    states s1, s2, s3, ..., sN.

[Diagram: state sequence S1 -> S2 -> S3 -> S4 -> ... -> SN,
marking the point where the fault is activated, the latent
error period that follows, and the point where the failure
occurs.]
9
Fault Tolerant system
  • An error is latent from the state in which the
    fault is activated until it manifests itself in
    the effective error state.
  • A fault-tolerant system attempts to detect and
    correct such latent errors before they become
    effective.

10
Steps in Fault Tolerance
  • Error detection: detect the latent error before
    it reaches the effective error state.
  • This can be done by internal consistency checks
    or by comparison with redundant computations.
  • Internal tests are often referred to as BIST
    (built-in self-tests).
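
As a minimal sketch of the comparison technique (all names here are illustrative, not from the presentation), detection by redundant computation looks like this:

```python
def detect_error(primary, redundant, x):
    # Run the same computation along two independent paths and
    # compare the results; a mismatch reveals a latent error
    # before it can reach the effective error state.
    a, b = primary(x), redundant(x)
    if a != b:
        raise RuntimeError(f"latent error detected for input {x!r}")
    return a

# Hypothetical example: squaring computed two different ways.
result = detect_error(lambda x: x * x, lambda x: x ** 2, 7)
```

Note this detects an error but does not say which path was wrong; recovery or masking (next slides) is a separate step.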

11
  • Error recovery: the erroneous state is replaced
    by an acceptable valid state.
  • Error recovery can be either forward error
    recovery or backward error recovery.
  • e.g., exception handling is an example of
    forward recovery.
  • An alternative to error recovery is FAULT
    MASKING. This is done through modular redundancy,
    wherein several components perform each
    computation independently and the final result is
    selected by majority voting.
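
Fault masking by modular redundancy can be sketched as follows (a minimal illustration, not the presentation's own code):

```python
from collections import Counter

def vote(*replica_results):
    # Each replica computes the result independently; the final
    # result is the value a majority of replicas agree on, so a
    # minority of faulty replicas is masked without any explicit
    # error-recovery step.
    value, count = Counter(replica_results).most_common(1)[0]
    if count <= len(replica_results) // 2:
        raise RuntimeError("no majority: fault cannot be masked")
    return value

# Triple modular redundancy: the single faulty result 41 is outvoted.
assert vote(42, 42, 41) == 42
```

With 2f+1 replicas, up to f simultaneously faulty replicas can be masked this way.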

12
Failure Semantics
  • This defines the behavior that a component may
    exhibit when it fails to provide its standard
    correct behavior.
  • Omission failure: fails to respond to an input.
  • Timing failure: a correct response is delivered,
    but outside the specified real-time interval.
  • Response failure: the component performs an
    incorrect state change.
  • Crash failure: the component suffers an
    omission failure and thereafter performs no other
    action until restarted.
  • Arbitrary (Byzantine) failures: totally
    uncontrolled, and can display different symptoms
    to different observers.

13
Faults
  • Faults can be classified according to the
    semantics of the failures they may induce.
  • Faults can be due to
  • Design faults
  • Component faults
  • Improper operation
  • Environmental anomalies, e.g., electromagnetic
    disturbances.

14
Fault tolerant system models
  • A fault-tolerant system that covers many
    different fault modes may provide a different
    recovery mechanism for each; the resulting
    complexity can itself introduce faults.
  • Designing for arbitrary failures covers all
    possible faults, but it is expensive and requires
    greater redundancy.
  • So the designer has to trade off the number of
    different kinds of faults that can be tolerated.

15
  • Hybrid fault model: a model that can tolerate
    faults of several different kinds.
  • The trade-off between difficulty and the number
    of different kinds of faults that can be
    tolerated is made at run time, with respect to
    the faults that have actually arrived.

16
Transient Faults - common faults
  • These are temporary faults, generally due to
    electromagnetic disturbances. They go away
    immediately, leaving the devices running
    normally, but may corrupt the state data.
  • Self-stabilization is a uniform mechanism for
    recovering from a variety of transient faults:
    the system automatically recovers to a stable
    state.

17
Coordination
  • Problem: different components simultaneously
    try to access some global data.
  • Solution: encapsulate the different activities
    within transactions and run a distributed
    concurrency control algorithm.
  • Transactions also provide FAILURE ATOMICITY: if
    a transaction fails, then any actions it may have
    performed are undone.
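
Failure atomicity can be sketched with a deferred-update transaction (the names and structure are illustrative assumptions, not taken from the presentation):

```python
class Transaction:
    # Deferred-update sketch of failure atomicity: writes are
    # buffered and reach the shared store only on commit, so an
    # aborted (failed) transaction leaves nothing to undo.
    def __init__(self, store):
        self.store = store
        self.pending = {}

    def write(self, key, value):
        self.pending[key] = value

    def commit(self):
        self.store.update(self.pending)
        self.pending.clear()

    def abort(self):
        self.pending.clear()

store = {"balance": 100}
t = Transaction(store)
t.write("balance", 50)
t.abort()                      # failure: the write is undone
assert store["balance"] == 100
t2 = Transaction(store)
t2.write("balance", 50)
t2.commit()                    # success: the write takes effect
assert store["balance"] == 50
```

A real system would pair this with a concurrency control algorithm (e.g., locking) so concurrent transactions do not interleave on the same data.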

18
  • The mechanisms and techniques associated with
    the dependability approach tend to focus on
    reliability and fault tolerance, and place less
    stress on other properties.

19
Safety Engineering Approach
  • The terms reliability and safety are often
    confused.
  • Reliability is concerned with the incidence of
    failures.
  • Safety is concerned with the occurrence of
    accidents and mishaps.
  • The basic idea of this approach is to focus on
    the consequences that must be avoided rather than
    on the requirements of the system itself, since
    the requirements may themselves be the cause of
    the undesired consequence.

20
  • Hazards can be prevented by making sure that the
    states of the system that can lead to mishaps are
    avoided.
  • e.g., air traffic control prevents the root
    cause of a possible mid-air collision by making
    sure that the planes do not get too near each
    other.

21
Definitions
  • Damage: a measure of the loss in a mishap.
  • Severity of a hazard: an assessment of the worst
    possible damage that could result.
  • Danger: the probability of a hazard leading to a
    mishap.
  • Risk: the combination of hazard severity and
    danger.
  • Hazard analysis: identifying hazards,
    categorizing them by severity, and exploring the
    probability of their occurrence.
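
If severity is put on a numeric scale and danger is a probability, the combination can be computed as a product (the scales here are illustrative assumptions):

```python
def risk(severity, danger):
    # Risk as the combination of hazard severity and danger,
    # modeled here as their product. The 1-10 severity scale is
    # an illustrative assumption; danger is a probability.
    return severity * danger

# A severe but unlikely hazard can carry comparable risk to a
# mild but near-certain one.
severe_rare = risk(8, 0.1)   # about 0.8
mild_common = risk(1, 0.9)   # about 0.9
```

This is why hazard analysis must weigh both factors: ranking by severity alone would miss the mild-but-frequent hazard.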

22
SFTA (Software Fault Tree Analysis)
  • This is one application of hazard analysis to
    software.
  • Goal: show that the logic contained in the
    software design will not cause mishaps.
  • Also used to determine the environmental
    conditions that could lead to a mishap.

23
Dependability vs. Safety
  • The safety approach focuses on the elimination of
    the undesired event, while the dependability
    approach is more concerned with providing the
    expected service.
  • Dependability tries to maximize the extent to
    which the system works well, while the safety
    approach tries to minimize the extent to which it
    can fail badly.

24
Secure Systems Approach
  • Trusted to keep secrets.
  • Safeguard privacy.
  • Prevent unauthorized disclosure of information.
  • Access control model: one in which the hardware
    can control read and write access to data.
  • Integrity levels are assigned to processes and to
    data; processes are allowed to read data only of
    equal or higher integrity level, and to write
    data only of equal or lower integrity level.
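
The read/write rules above can be stated directly in code (a minimal sketch; the numeric levels are an assumption, with larger numbers meaning higher integrity):

```python
def may_read(process_level, data_level):
    # A process may read data only of equal or higher
    # integrity level than its own.
    return data_level >= process_level

def may_write(process_level, data_level):
    # A process may write data only of equal or lower
    # integrity level than its own.
    return data_level <= process_level

# A low-integrity process cannot write high-integrity data,
# so corrupt information never flows upward.
assert may_read(1, 2) and not may_write(1, 2)
assert may_write(2, 1) and not may_read(2, 1)
```

Note the asymmetry with secrecy models: integrity rules stop bad data flowing up, whereas secrecy rules stop secret data flowing down.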

25
Kernelization
  • A distinctive feature of the secure systems
    approach: it ensures the absence of unauthorized
    disclosure of information by a single mechanism
    called the security kernel, also known as the
    reference monitor.
  • The security kernel can be thought of as a
    stripped-down OS that manages the protection
    facilities provided by the hardware and must
    satisfy three requirements.

26
Reference monitor
  • The reference monitor is required to be
  • CORRECT: enforces the security policy.
  • COMPLETE: mediates all accesses between
    subjects and objects.
  • TAMPERPROOF: protects itself from
    unauthorized modification.

27
  • Fault tolerance is a mechanism for ensuring
    normal or acceptably degraded service despite the
    occurrence of faults, while kernelization is a
    mechanism for avoiding certain kinds of failure
    and does little to ensure normal service.

28
Real time systems Approach
  • Real-time systems are subject to both deadline
    and jitter (variability) constraints.
  • Hard real time: meeting the deadline is critical.
  • Soft real time: deadlines have a certain degree
    of flexibility.
  • The problems in developing a real-time system are
    1) deriving the timing constraints, and 2)
    constructing a scheduling algorithm.

29
Organizing a real time system
  • Cyclic execution: a fixed schedule of tasks is
    executed cyclically at a fixed rate.
  • Problem: can lead to low CPU utilization, and the
    schedule is fragile.
  • Priority driven: each task has a certain
    priority, and the executive always runs the
    highest-priority ready task.
  • Problem: priority inversion, which can be solved
    by priority inheritance.
  • Expensive context switching is required.
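
The core dispatch rule of a priority-driven executive can be sketched as follows (the task representation is an illustrative assumption):

```python
import heapq

def dispatch_order(ready_tasks):
    # Always run the highest-priority ready task first.
    # ready_tasks: (priority, name) pairs, larger number =
    # higher priority; a max-heap is simulated by negating
    # priorities, since heapq is a min-heap.
    heap = [(-priority, name) for priority, name in ready_tasks]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order

order = dispatch_order([(1, "logger"), (3, "control_loop"), (2, "sensor_poll")])
# control_loop runs first, the low-priority logger last
```

Priority inversion arises when, say, logger holds a lock that control_loop needs; priority inheritance temporarily raises logger's priority so it cannot be preempted by sensor_poll while holding the lock.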

30
  • In the Alpha system, scheduling is done according
    to a model of benefit accrual.
  • The benefit function associated with each task
    indicates the overall benefit b(t) of completing
    the task at time t.
  • The Alpha system attempts to schedule activities
    in such a way that the maximum overall benefit
    accrues --- best-effort scheduling.
  • This method is used in soft real-time systems
    only.
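
Picking a task by benefit accrual can be sketched as follows (the task format and benefit functions are illustrative assumptions, not Alpha's actual algorithm):

```python
def best_effort_pick(now, ready_tasks):
    # Choose the ready task whose benefit function b(t) yields
    # the greatest benefit when the task completes, i.e. at
    # time now + duration.
    def benefit(task):
        _, duration, b = task
        return b(now + duration)
    return max(ready_tasks, key=benefit)

tasks = [
    # (name, duration, benefit function b(t))
    ("video_frame", 2, lambda t: max(0, 10 - t)),  # benefit decays with time
    ("housekeeping", 1, lambda t: 3),              # small constant benefit
]
chosen = best_effort_pick(0, tasks)  # video_frame: b(2) = 8 > 3
```

Because a late task simply accrues less benefit rather than violating a hard constraint, this fits soft real-time systems, as the slide notes.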

31
Formal Models
  • The early models assumed a system composed of
    active subjects (programs operating on behalf of
    users) and passive objects (repositories of
    information).
  • Subjects could read and write according to
    restrictions imposed by an access control
    mechanism.
  • Problems: they allowed covert channels, in which
    information is conveyed from a highly classified
    object to a more lowly cleared subject. Secondly,
    there was no semantic characterization of read
    and write.

32
  • Noninterference model: the behavior perceived by
    a lowly cleared user should be independent of the
    actions of highly cleared users.
  • Inputs from lowly and highly cleared users are
    interleaved arbitrarily.
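
One way to make this concrete is the "purge" formulation: the low-level view of a run is the run with high-level events removed, and a system is noninterfering if low outputs depend only on that purged run. A minimal sketch (the event representation is an assumption):

```python
def purge(trace):
    # Remove high-level events; what remains is all a lowly
    # cleared observer is permitted to perceive.
    return [event for event in trace if event[0] == "low"]

# Two runs differing only in high-level activity must look
# identical to the low-cleared observer.
run_a = [("low", "read"), ("high", "write_secret"), ("low", "ack")]
run_b = [("low", "read"), ("low", "ack")]
assert purge(run_a) == purge(run_b)
```

A covert channel is precisely a mechanism by which two such runs would produce different low-visible behavior.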

33
Assurance Methods
  • Critical systems must not only satisfy their
    critical properties, they must be seen to do so.
    Therefore rigorous methods of assurance are
    applied.
  • Critical systems can tolerate only extremely
    small probabilities of failure.
  • Highly reliable systems may tolerate failure
    rates of 10^-3 to 10^-6 per hour.
  • Critical or ultra-critical systems: 10^-7 to
    10^-12 per hour.

34
Estimating Failure rates
  • Measure directly in a test environment.
  • Calculate from known or measured failure rates of
    the system's components, plus knowledge of its
    design or structure.
  • N-version programming uses two or more
    independently developed software versions in
    conjunction with comparison to avoid system
    failures.
  • Several studies indicate that reducing the
    programming errors per thousand lines of source
    code reduces the density of faults discovered in
    operation.
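
Calculating a system rate from component rates can be illustrated for the simplest structure, a series system, where any component failure fails the whole system; for small, independent per-hour rates the system rate is approximately the sum of the component rates (an assumption of this sketch):

```python
def series_failure_rate(component_rates_per_hour):
    # Series system: any component failure is a system failure.
    # For small, independent rates, the system failure rate is
    # approximately the sum of the component rates.
    return sum(component_rates_per_hour)

# Three components at 1e-5 failures/hour give about 3e-5/hour,
# far above the 10^-7 to 10^-12 per hour demanded of critical
# and ultra-critical systems -- which is why redundant
# structures, not series ones, are used there.
rate = series_failure_rate([1e-5, 1e-5, 1e-5])
```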

35
Taxonomy
  • Aims to identify combinations of critical system
    properties that are compatible.
  • Based on two attributes: interaction and
    coupling.
  • Interaction: the extent to which the behavior of
    one component in a system can affect the behavior
    of other components. This can vary from linear to
    complex.
  • Linear interaction: components affect only those
    components that are functionally downstream of
    them.

36
  • Complex interaction: a component may participate
    in different sequences of interactions with many
    other components.
  • e.g., computer systems that maintain global
    notions of coordination and consistency, such as
    distributed databases, are considered to have
    complex interactions.
  • These interactions promote accidents and are
    hard to predict.

37
  • Coupling: the extent of flexibility or slack in
    the system.
  • Loosely coupled systems are usually less time
    constrained than tightly coupled systems.
  • e.g., a telephone switching network is loosely
    coupled.
  • Hard real-time systems are tightly coupled; they
    generally participate in complex interactions
    with the environment.

39
Conclusion
  • A critical computer system is one whose
    malfunction could lead to unacceptable
    consequences.
  • The determination of whether a system is critical
    is made by hazard analysis.
  • Many fields have critical systems with their own
    individual approaches; the author hopes to bring
    about some cross-fertilization of ideas.

40
Acknowledgement
  • Critical System Properties
  • Survey and Taxonomy
  • John Rushby
  • Computer Science Laboratory
  • SRI International, CA