Safety - PowerPoint PPT Presentation

About This Presentation

Title:

Safety

Slides: 70

Provided by: marinal1

Transcript and Presenter's Notes

Title: Safety

1
Lecture 12 Safety
2

Safety-Related Faults
Safe systems
Random and systematic faults
Single-point failures
Common mode failures
Latent faults
Fail-safe state
Achieving safety

3
Safe systems
4
Faults and Failures

Failures are events occurring at specific times.
Errors are more static and are inherent
characteristics of the system, as in design
errors.
A fault is an unsatisfactory system condition or
state.
Failures and errors are different kinds of faults.

Many systems present hazards, which the system
can identify and address or ignore.
However, safety is a system issue.
The system as a whole is either safe or it is not: not the
software, not the electronics, not the mechanics alone.

6
Random and systematic faults
Errors are systematic faults: they are intrinsic in the
design or implementation. If the system does the wrong
thing, it will always do the wrong thing under
identical circumstances.
Failures are random faults that occur when a component
breaks in the field. Something that once functioned properly
no longer does so. The design is fine, but the
system no longer meets its design characteristics.
7

Hardware faults may be systematic or random.
A hardware design may contain an error, which is a
systematic fault. Random faults occur only in physical
entities such as mechanical or electronic
components. Random faults cannot be designed
away. It is possible to add redundancy for fault
detection, but no one has ever made a CPU that
cannot fail.
Software faults are always systematic, because
software neither breaks down nor wears out.
Run-time checking provides the only means
within a system to identify systematic faults
that occur in rare circumstances.

8
Single-point failures
10
A safe system continues to be safe in the event of
any single-point failure.
11
Common mode failures
Example: if a medical system uses a primary CPU
to control a radiation dose, none of the safety
mechanisms can reside on that CPU. If the primary
CPU fails, the safety CPU prevents an overdose.
14
Latent faults
If a system cannot routinely validate a safety
measure, then the safety measure cannot be relied
on.
16
Fail-safe state
17

Achieving safety
Firewall pattern: the most fundamental safety
design concept is the separation of safety
channels and non-safety channels.
Channel: a static path of data and control. Failure
of any component of the channel causes a failure
of the entire channel.
Isolation of all non-safety-related software and
hardware components from those with safety
responsibility.
Separating the safety-critical components
simplifies the design and makes it cheaper.

18
Redundancy can be homogeneous or diverse.
Homogeneous: uses exact replicas of channels and
protects only against random failures.
Diverse: uses different means to perform the same
function and protects against both random and
systematic faults.
Example from data storage:
Homogeneous: store the data 3 times and compare
the copies before use.
Diverse: store a second copy as the one's
complement of the data.
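The two storage schemes on this slide can be sketched roughly as follows. This is a minimal illustration, not taken from the lecture: the class names, the `CorruptionError` exception, and the 8-bit cell width are all assumptions.

```python
MASK = 0xFF  # assumed 8-bit storage cell, for illustration only


class CorruptionError(Exception):
    """Raised when redundant copies of a value no longer agree."""


class TripleStore:
    """Homogeneous redundancy: three identical copies, compared before use."""

    def __init__(self, value):
        self.copies = [value & MASK] * 3

    def read(self):
        a, b, c = self.copies
        if a == b == c:
            return a
        raise CorruptionError("copies disagree")


class ComplementStore:
    """Second copy held as the one's complement (bit-by-bit inversion)."""

    def __init__(self, value):
        self.value = value & MASK
        self.inverse = ~value & MASK

    def read(self):
        # An intact pair always XORs to all ones.
        if self.value ^ self.inverse != MASK:
            raise CorruptionError("value and complement do not match")
        return self.value
```

One reason to prefer the complemented copy: a fault that forces both cells to zero would leave two plain copies equal and so go unnoticed, while the value/complement pair no longer XORs to all ones.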
19

Many mechanisms identify data corruption
Parity: simple 1-bit parity identifies a 1-bit
error.
Hamming codes: multiple parity bits to identify
n-bit errors and repair (n-1)-bit errors.
Checksums: sum the data within a block.
Cyclic redundancy check (CRC): a more complex
function on the data within a block.
Homogeneous multiple storage: store the data in
multiple locations and compare prior to use.
Complement multiple storage: store a second copy as
the one's complement (simple bit-by-bit inversion) and
compare before use.
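To make the difference in detection strength concrete, here is a small sketch of parity, an additive checksum, and a CRC over the same block. The helper names and the sample block are assumptions for illustration; `zlib.crc32` is the standard-library CRC-32.

```python
import zlib


def parity_bit(data: bytes) -> int:
    """Even-parity bit: 1 if the block holds an odd number of set bits."""
    return sum(bin(b).count("1") for b in data) % 2


def checksum8(data: bytes) -> int:
    """Simple additive checksum: sum of all bytes, modulo 256."""
    return sum(data) % 256


def crc32(data: bytes) -> int:
    """CRC-32 of the block, via the standard library."""
    return zlib.crc32(data)


def flip_bit(data: bytes, byte_index: int, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (a seeded fault)."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit_index
    return bytes(damaged)


block = b"dose=042"
one_bit = flip_bit(block, 0, 0)        # parity changes: detected
two_bits = flip_bit(one_bit, 1, 3)     # parity is back to its old value: missed
swapped = b"odse=042"                  # same byte sum: checksum misses it, CRC does not
```

Parity catches the single flipped bit but not the double flip, and the additive checksum cannot see reordered bytes, which is why the CRC is listed as the stronger (and more complex) check.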

Many mechanisms identify data corruption
Redundancy, feedback error detection: identifies the
fault but doesn't attempt to correct it; it may
redo the processing step that was in error, or
may terminate the processing by moving to a safe
shutdown state.
Redundancy, feed-forward error detection:
identifies the fault, tries to correct it, and keeps
processing.
Used when a system doesn't have a safe shutdown
state. A common implementation is the reconstruction
of correct data from partially corrupted values.
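The two strategies above might look like this in outline. It is a sketch under assumptions: `feedback_read`, `majority_vote`, the retry count, and the `SafeShutdown` exception are hypothetical names, and bitwise majority voting over three copies is just one possible way to reconstruct data.

```python
class SafeShutdown(Exception):
    """Signals that the system should move to its safe shutdown state."""


def feedback_read(read_step, is_valid, retries=2):
    """Feedback: detect the fault, redo the step, otherwise shut down safely."""
    for _ in range(retries + 1):
        value = read_step()
        if is_valid(value):
            return value
    raise SafeShutdown("step kept failing; entering safe state")


def majority_vote(a: int, b: int, c: int) -> int:
    """Feed-forward: rebuild each bit from the majority of three copies."""
    return (a & b) | (a & c) | (b & c)


# Two copies are damaged in different bit positions, yet every bit still
# has a two-out-of-three majority, so processing can continue.
restored = majority_vote(0b1011, 0b1010, 0b0010)   # original was 0b1010
```

The feedback variant only works when stopping is acceptable; the voting variant never stops, which matches the slide's point about systems with no safe shutdown state.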

Safety Architectures
Single Channel Protected Design (SCPD)
Multi-Channel Voting Pattern
Homogeneous Redundancy Pattern
Diverse Redundancy Pattern
Monitor-Actuator Pattern
Watchdog Pattern
Safety Executive Pattern

Pattern: a general solution for a common problem.
A pattern consists of 3 aspects:
Problem: the context that the pattern addresses
Solution: the pattern itself
Consequences: the implications of using the
pattern

23
Single Channel Protected Design (SCPD)
Example: a train-braking system
24

Although any step in the chain could lead to an
accident, the channel can be made safe by applying
sufficient hazard control within the channel.
For example, a periodic life tick ensures bus
integrity. In case of failure, the brake engages
itself and the engine disengages itself
automatically.
The computer watchdog activates the brake if it
is not updated in the proper way and at the
proper interval.
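A watchdog of the kind described here could be sketched as below. This is illustrative only: the `Watchdog` class, the key value, the 100 ms timeout, and the use of integer milliseconds passed in by the caller are assumptions, not details of the train system.

```python
class Watchdog:
    """Hypothetical software watchdog; all times are integer milliseconds."""

    def __init__(self, timeout_ms, key, on_expire):
        self.timeout_ms = timeout_ms
        self.key = key                # "proper way": the expected service key
        self.on_expire = on_expire    # e.g. engage the brake
        self.last_kick_ms = 0
        self.tripped = False

    def kick(self, now_ms, key):
        """Service the watchdog; a wrong key is treated as a fault."""
        if key != self.key:
            self._trip()
        else:
            self.last_kick_ms = now_ms

    def poll(self, now_ms):
        """Called periodically from an independent timer."""
        if now_ms - self.last_kick_ms > self.timeout_ms:
            self._trip()              # "proper interval" was missed

    def _trip(self):
        if not self.tripped:          # fire the safe action only once
            self.tripped = True
            self.on_expire()


events = []
dog = Watchdog(timeout_ms=100, key=0xA5,
               on_expire=lambda: events.append("brake engaged"))
dog.kick(50, 0xA5)    # correct key, on time
dog.poll(120)         # 70 ms since the last kick: still healthy
dog.poll(200)         # 150 ms since the last kick: watchdog trips
```

Passing the time in as an argument keeps the sketch deterministic; a real watchdog would normally be driven by separate hardware so that it does not share a failure mode with the computer it supervises.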

26
Multi-Channel Voting Pattern
(homogeneous or diverse)
27

Homogeneous Redundancy Pattern

30

Diverse Redundancy Pattern

33

Monitor-Actuator Pattern

36

Watchdog Pattern

39

Safety Executive Pattern

41

8 steps to build safety systems
Identify the hazards
Determine the risks
Define the safety measures
Create safe requirements
Create safe design
Implement safety
Assure the safety process
Test

42
Identify the hazards
52

Disadvantages of fault tree analysis (FTA)
It requires that all components be identified and
characterized prior to analysis.
The tree becomes very complex in a non-trivial
design. Components can typically fail in many
ways.

53
Determine the risks
57

Define the safety measures
A safety measure is a behavior added to a system
to handle a hazard.
There are many ways to handle a hazard:
Obviation: the hazard can be made physically
impossible.
Education: educating the users so they won't
create hazardous conditions through equipment
misuse.
Alarming: announcing the hazard to the user, so they
can take appropriate action.

Interlocks: the hazard can be removed by using
secondary devices or logic to intercede when a
hazard presents itself.
Internal checking: ensuring that the system can
detect its own malfunction before an accident occurs.
Safety equipment
Restriction of access
Labeling

Many considerations arise when controlling a
hazard:
Tolerance time
Risk level
Supervision of the device: constant, occasional, or
unattended
Skill level of the user
Environment
Likelihood of the fault that causes a hazard

60
Create safe requirements
Specifying a safe system often means specifying
negations, for example: the system must not
energize the laser when the safety interlock is active.
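A negative requirement like this one translates naturally into a guard plus an invariant that can be checked at any time. The sketch below is an assumption-laden illustration: the `Laser` class and its method names are hypothetical.

```python
class Laser:
    """Hypothetical laser controller guarded by a safety interlock."""

    def __init__(self):
        self.interlock_active = True   # power up in the safe state
        self.energized = False

    def set_interlock(self, active: bool):
        self.interlock_active = active
        if active:
            self.energized = False     # interlock always wins

    def energize(self) -> bool:
        """Refuse the request while the interlock is active."""
        if self.interlock_active:
            return False
        self.energized = True
        return True

    def invariant_holds(self) -> bool:
        """The requirement itself: never energized while interlocked."""
        return not (self.interlock_active and self.energized)
```

Writing the negation as a standalone predicate (`invariant_holds`) makes it directly testable, independent of how the guard in `energize` is implemented.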
61

Create safe design
Work from safe requirements
Adopt a fundamentally safe architecture
Periodically, during design, revisit the hazard
analysis to add hazards specific to the design.
Select programming-in-the-small measures that provide
the appropriate level of detection and
correction.
Ensure that independent channels lack common mode
failures.
Adopt a consistent set of strategies to handle
faults.
Build in power-on and periodic run-time tests to
identify latent faults.

62
POST: power-on self test
BIT: built-in test (periodic or continuous tests)
POST must test all the system features used to
detect faults. BIT tests those features that
periodically detect faults. BIT typically
identifies only hard failures and does not identify
systematic or software failures.
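As a rough illustration of the split between POST and BIT, the sketch below runs a power-on check of the fault detectors themselves and a lighter periodic check. Function names, the RAM pattern test, and the CRC-protected configuration block are assumptions chosen for the example.

```python
import zlib


def crc_detector_works() -> bool:
    """POST item: confirm the CRC check flags a deliberately damaged block."""
    good = b"\x00\x01\x02\x03"
    bad = b"\x00\x01\x02\x07"          # one seeded bit error
    return zlib.crc32(good) != zlib.crc32(bad)


def ram_pattern_ok(memory: bytearray) -> bool:
    """POST item: write alternating bit patterns and read them back."""
    for pattern in (0x55, 0xAA):
        for i in range(len(memory)):
            memory[i] = pattern
        if any(cell != pattern for cell in memory):
            return False
    return True


def power_on_self_test(memory: bytearray) -> list:
    """Run every POST item once; return the names of the failed items."""
    results = {
        "crc_detector": crc_detector_works(),
        "ram": ram_pattern_ok(memory),
    }
    return [name for name, ok in results.items() if not ok]


def built_in_test(config: bytes, expected_crc: int) -> bool:
    """BIT item, run periodically: has the stored configuration changed?"""
    return zlib.crc32(config) == expected_crc
```

Note that none of these checks would reveal a design error in the application logic, which is consistent with the slide's remark that BIT finds hard failures rather than systematic ones.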
63

Implement safety
Implementation decisions affect system safety
Language choices
-Compile-time checking
-Run-time checking
Exceptions vs. error codes
Safe language subsets, for example avoiding
void.

C treats you like a consenting adult. Pascal
treats you like a naughty child. Ada treats you
like a criminal.
Pascal totally insulates programmers from the
most common C programming errors, but its run-time
system is larger and programs run slower.
The following language features improve
system safety:
Strong compile-time checking
Strong run-time checking
Support for encapsulation and abstraction
Exception handling

65
Strongly typed languages perform compile-time checks
for violations of grammar rules and also execute
run-time assertions when the software breaks
those rules during run-time; these are
called program-invariant assertions. For
example, it is unacceptable not to check
array index validity or subrange value bounds
at run-time.
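In a language that does not enforce subranges itself, the two checks named above can be written by hand. This is a minimal sketch; `Subrange` and `checked_get` are hypothetical helpers.

```python
class Subrange:
    """Hypothetical subrange type: values must lie in [low, high]."""

    def __init__(self, low: int, high: int):
        self.low = low
        self.high = high

    def check(self, value: int) -> int:
        if not self.low <= value <= self.high:
            raise ValueError(
                f"{value} outside subrange {self.low}..{self.high}")
        return value


def checked_get(seq, index: int):
    """Array access with an explicit index check.

    Python would silently accept a negative index and wrap around,
    so the check rejects it instead of returning the wrong element.
    """
    if not 0 <= index < len(seq):
        raise IndexError(f"index {index} outside 0..{len(seq) - 1}")
    return seq[index]


percent = Subrange(0, 100)
samples = [10, 20, 30]
```

Languages such as Ada or Pascal generate equivalent checks automatically; the hand-written version only shows what those run-time assertions amount to.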
66

The fundamental rules for safe programming are:
Make it right before you make it fast
Verify that it remains right during program
execution
Explicitly check pre-conditional invariants.
Explicitly check post-conditional invariants.
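Explicit pre- and post-condition checks might look like the following. The function, its parameters, and the choice of exceptions are assumptions made for illustration.

```python
def ramp_setpoint(current: int, target: int, max_step: int) -> int:
    """Move a setpoint toward its target by at most max_step per call."""
    # Precondition: the caller must supply a usable step size.
    if max_step <= 0:
        raise ValueError("precondition violated: max_step must be positive")

    delta = max(-max_step, min(max_step, target - current))
    result = current + delta

    # Postcondition: the output never jumps by more than max_step.
    if abs(result - current) > max_step:
        raise RuntimeError("postcondition violated: step too large")
    return result
```

The postcondition looks redundant next to the clamp, and that is the point: it stays correct even if the computation above it is later changed incorrectly.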

Assure the safety process
Many standards exist to ensure a quality
process, in which the quality of a product is
stable and repeatable from product to product.
Key process activities to improve product safety:
Continuously track hazard analysis
Use peer reviews to assure quality
Verify design adherence.
Verify coding standard adherence.
Identify how each hazard is handled.

68
Testing
Fault seeding: simulating all faults that affect
safety to verify that the system acts in the safe,
correct manner when a fault occurs.
Unit testing: white-box; the tester must understand
the detailed inner workings of the system. Fault
seeding during unit tests means violating system
invariants by passing out-of-range values or
corrupted data.
Integration testing: gray-box; after each component
has passed unit testing, integration tests
combine the components. These tests simulate
missing components and exercise the
interfaces between the components.
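Fault seeding at the unit level can be as simple as feeding a unit values that violate its invariants and recording any that slip through. The harness and the sample unit below are hypothetical.

```python
def seed_faults(unit, faulty_inputs, expected_error):
    """Return the seeded inputs that the unit failed to reject."""
    escaped = []
    for bad in faulty_inputs:
        try:
            unit(bad)
        except expected_error:
            continue            # fault was detected, as required
        escaped.append(bad)     # fault went unnoticed
    return escaped


def set_dose(mg):
    """Sample unit under test: accepts only whole doses from 0 to 50 mg."""
    if not isinstance(mg, int) or not 0 <= mg <= 50:
        raise ValueError(f"invalid dose: {mg!r}")
    return mg


# Out-of-range values and corrupted (wrongly typed) data.
seeded = [-1, 51, 10**6, None, "12"]
escaped = seed_faults(set_dose, seeded, ValueError)
```

An empty `escaped` list means every seeded fault was caught; running the same harness against a unit with no checks returns the whole list.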
69

Fault seeding during integration tests may be
done by:
Breaking the communication bus
Forcing component failure
Sending corrupted messages
Removing power unexpectedly
Flooding the bus with messages (valid or invalid)
Validation testing: black-box testing of
end-user requirements and environmental conditions.
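Two of the integration-level faults listed above, a broken bus and a corrupted message, can be simulated with a small fault-injecting bus. Everything here is an assumed sketch: the framing format (payload plus a 4-byte CRC-32), the `FaultyBus` class, and the convention that `None` tells the caller to enter its safe state.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a CRC-32 so the receiver can detect corruption."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def receive(raw):
    """Return the payload, or None if the message is missing or damaged."""
    if raw is None or len(raw) < 4:
        return None
    payload, crc = raw[:-4], int.from_bytes(raw[-4:], "big")
    return payload if zlib.crc32(payload) == crc else None


class FaultyBus:
    """Test double that can drop or corrupt messages on demand."""

    def __init__(self, drop=False, corrupt=False):
        self.drop = drop
        self.corrupt = corrupt

    def transfer(self, raw: bytes):
        if self.drop:
            return None                  # broken communication bus
        if self.corrupt:
            damaged = bytearray(raw)
            damaged[0] ^= 0x01           # single seeded bit error
            return bytes(damaged)
        return raw


message = frame(b"brake=off")
healthy = receive(FaultyBus().transfer(message))
dropped = receive(FaultyBus(drop=True).transfer(message))
corrupted = receive(FaultyBus(corrupt=True).transfer(message))
```

Only the healthy path delivers the payload; in both fault cases the receiver gets `None`, and the test would then go on to check that the component reacts by moving to its fail-safe state.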
