SafetyCritical Systems 2 - PowerPoint PPT Presentation

Slides: 27

Provided by: CT5

Transcript and Presenter's Notes

1
Safety-Critical Systems 2

T 79.232
Risk analysis and design for safety
Ilkka Herttua

2
V - Lifecycle model
3
Overall safety lifecycle

4
Risk Analysis

Risk is a combination of the severity (class) and
frequency (probability) of the hazardous event.
Risk Analysis is a process of evaluating the
probability of hazardous events.
The value of life?
The value of a life is estimated at between 0.75M and 2M
GBP.
US figures are higher.

5
Risk Analysis

Classes
- Catastrophic: multiple deaths (>10)
- Critical: a death or severe injuries
- Marginal: a severe injury
- Insignificant: a minor injury
Frequency categories (events/year)
- Frequent: 0.1
- Probable: 0.01
- Occasional: 0.001
- Remote: 0.0001
- Improbable: 0.00001
- Incredible: 0.000001
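
The classification above can be sketched as a small lookup. This is only an illustration: the slide does not say whether each figure is a lower bound, an upper bound or a typical value, so treating it as the lower bound of its band is an assumption, and the names frequency_category and classify_risk are hypothetical.

```python
# Frequency categories from the slide, in events per year.
# ASSUMPTION: each figure is read as the lower bound of its band.
FREQUENCY_BANDS = [
    ("Frequent", 1e-1),
    ("Probable", 1e-2),
    ("Occasional", 1e-3),
    ("Remote", 1e-4),
    ("Improbable", 1e-5),
    ("Incredible", 1e-6),
]

# Severity classes from the slide, most severe first.
SEVERITY_CLASSES = ("Catastrophic", "Critical", "Marginal", "Insignificant")


def frequency_category(events_per_year: float) -> str:
    """Return the first (most frequent) band whose figure the rate reaches."""
    if events_per_year < 0:
        raise ValueError("rate must be non-negative")
    for name, lower_bound in FREQUENCY_BANDS:
        if events_per_year >= lower_bound:
            return name
    # Anything rarer than the last figure is still reported as Incredible.
    return "Incredible"


def classify_risk(severity: str, events_per_year: float) -> tuple:
    """Pair a severity class with a frequency category (risk = both together)."""
    if severity not in SEVERITY_CLASSES:
        raise ValueError(f"unknown severity class: {severity}")
    return severity, frequency_category(events_per_year)


if __name__ == "__main__":
    print(classify_risk("Critical", 0.002))
```

How a (severity, frequency) pair maps to an acceptable or unacceptable risk is a separate decision that the slide does not specify, so it is left out here.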

6
Hazard Analysis

A hazard is a situation in which there is actual or
potential danger to people or to the environment.
Analytical techniques
- Failure modes and effects analysis (FMEA)
- Failure modes, effects and criticality
analysis (FMECA)
- Hazard and operability studies (HAZOP)
- Event tree analysis (ETA)
- Fault tree analysis (FTA)

7
Fault Tree Analysis 1
The diagram shows a heater controller for a tank
of toxic liquid. The computer controls the heater
using a power switch on the basis of information
obtained from a temperature sensor. The sensor is
connected to the computer via an electronic
interface that supplies a binary signal
indicating when the liquid is up to its required
temperature. The top event of the fault tree is
the liquid being heated above its required
temperature.

8
Fault event not fully traced to its source
Basic event, input
Fault event resulting from other events
OR connection
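
A minimal sketch of how an OR connection can be evaluated numerically, assuming the input events are statistically independent. The event names and probabilities below are hypothetical placeholders for the heater example; the actual tree from the diagram is not reproduced in this transcript.

```python
from math import prod


def or_gate(probabilities):
    """Probability that at least one input event occurs.

    ASSUMPTION: the input events are independent.
    """
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic-event probabilities (per demand), for illustration only.
p_sensor_fault = 1e-3
p_interface_fault = 1e-4
p_computer_fault = 2e-4
p_switch_stuck_on = 5e-4

# Intermediate fault event: the computer is told to keep heating.
p_wrong_command = or_gate([p_sensor_fault, p_interface_fault, p_computer_fault])

# Top event: the liquid is heated above its required temperature.
p_top = or_gate([p_wrong_command, p_switch_stuck_on])

if __name__ == "__main__":
    print(f"P(top event) = {p_top:.6e}")
```

For small probabilities the result is close to the plain sum of the inputs, which is why OR gates are often approximated by addition.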
9
Risk acceptability

National/international decision: the level of
acceptable loss (ethical, political and
economic)
Risk analysis evaluation
- ALARP: as low as reasonably practicable (UK, USA)
Societal risk has to be examined when there is
a possibility of a catastrophe involving a large
number of casualties
- GAMAB: Globalement Au Moins Aussi Bon, i.e. risk not
greater than before (France)
All new systems must offer a level of risk
globally at least as good as the one offered by
any equivalent existing system
- MEM: minimum endogenous mortality
The hazard due to a new system must not
significantly increase the minimum
endogenous mortality figure for an individual

10
Risk acceptability

Tolerable hazard rate (THR): a hazard rate which
guarantees that the resulting risk does not
exceed a target individual risk
SIL 4: 10^-9 < THR < 10^-8 per hour
and per function
SIL 3: 10^-8 < THR < 10^-7
SIL 2: 10^-7 < THR < 10^-6
SIL 1: 10^-6 < THR < 10^-5
Potential Loss of Life (PLL): expected number of
casualties per year
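
The THR bands above can be sketched as a lookup. The slide gives strict inequalities on both sides, so a value exactly on a boundary is not covered; treating the lower bound as inclusive is an assumption made here, and sil_for_thr is a hypothetical name.

```python
# THR bands per hour and per function, as listed on the slide.
# ASSUMPTION: the lower bound of each band is inclusive.
SIL_BANDS = [
    (4, 1e-9, 1e-8),
    (3, 1e-8, 1e-7),
    (2, 1e-7, 1e-6),
    (1, 1e-6, 1e-5),
]


def sil_for_thr(thr_per_hour: float):
    """Return the SIL whose band contains the THR, or None if no band does."""
    for sil, low, high in SIL_BANDS:
        if low <= thr_per_hour < high:
            return sil
    # Outside the table: either below 1e-9 or at or above 1e-5 per hour.
    return None


if __name__ == "__main__":
    for thr in (5e-9, 5e-8, 5e-7, 5e-6, 1e-4):
        print(thr, sil_for_thr(thr))
```

Returning None for out-of-table values keeps the sketch from implying a SIL that the slide does not define.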

11
Current situation / critical systems

Based on the data on recent failures of critical
systems, the following can be concluded:
- Failures are becoming more and more distributed and
are often nation-wide (e.g. commercial systems such as
credit card denial of authorisation).
- The source of failure is more rarely in hardware
(physical faults), and more frequently in system
design or end-user operation / interaction
(software).
- The harm caused by failures is mostly economic,
but sometimes health and safety concerns are also
involved.
- Failures can impact many different aspects of
dependability (dependability: the ability to deliver
service that can justifiably be trusted).

12
Examples of computer failures in critical systems
13
Driving force: federation

Safety-related systems have traditionally been
based on the idea of federation. This means that a
failure of any piece of equipment should be confined and
should not cause the collapse of the entire
system.
When computers were introduced to safety-critical
systems, the principle of federation was in most
cases kept in force.
Applying federation means that the Boeing 757 / 767
flight management control system has 80 distinct
microprocessors (300, if redundancy is taken into
account). Although having this number of
microprocessors is no longer too expensive, there
are other problems caused by the principle of
federation.

14
Designing for Safety

Fault groups
- requirement/specification errors
- random component failures
- systematic faults in design (software)
Approaches to tackle problems
- right system architecture (fault-tolerant)
- reliability engineering (component, system)
- quality management (design and production
processes)

15
Designing for Safety

Hierarchical design
- simple modules, encapsulated functionality
- separate safety kernel for safety-critical
functions
Maintainability
- preventive versus corrective maintenance
- scheduled maintenance routines for the whole
lifecycle
- faults easy to find and repair: short MTTR
(mean time to repair)
Human error
- proper HMI

16
Hardware Faults

Intermittent faults
- Fault occurs and recurs over time (loose
connector)
Transient faults
- Fault occurs and may not recur (lightning)
- Electromagnetic interference
Permanent faults
- Fault persists: physical processor failure
(design fault, overcurrent)

17

Fault Tolerance

Fault tolerance in hardware
- Achieved mainly by redundancy
Redundancy
- Adds cost, weight, power consumption, complexity
Other means
- Improved maintenance, single system with better materials
(higher MTBF)

18
Redundancy types

Active redundancy
- Redundant units are always operating.
Dynamic redundancy (standby)
- Failure has to be detected
- Changeover to another module

19
Hardware redundancy techniques

Active techniques
- Parallel (k of N)
- Voting (majority/simple)
Standby
- Operating: hot standby
- Non-operating: cold standby
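
The two active techniques listed above can be illustrated with a short sketch. It assumes identical units that fail independently, each with the same reliability r, and an ideal voter that never fails; none of these assumptions comes from the slide, and the function names are hypothetical.

```python
from collections import Counter
from math import comb


def k_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n units work.

    ASSUMPTION: units are identical and fail independently.
    """
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


def majority_vote(outputs):
    """Return the value produced by a strict majority of units, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


if __name__ == "__main__":
    # 2-of-3 arrangement (triple modular redundancy) with r = 0.9 per unit.
    print(k_of_n_reliability(2, 3, 0.9))
    # One faulty unit is outvoted by the two that agree.
    print(majority_vote([1, 1, 0]))
```

The independence assumption is the weak point in practice: a shared design or power fault can take out several units at once, and the formula does not account for that.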

20
Reliability prediction

Electronic components
- Based on probability and statistics
- MIL-HDBK-217: experimental data on actual
device behaviour
- Manufacturer information and allocated circuit
types
- Bathtub curve: burn-in, useful life, wear-out
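
A minimal sketch of a constant-failure-rate prediction, which applies only to the flat useful-life part of the bathtub curve. Summing part failure rates assumes a series system in which any part failure fails the whole; the part list and rates below are made-up numbers, not handbook data, and no quality or environment factors are applied.

```python
from math import exp


def series_failure_rate(parts):
    """Sum failure rates for a series system.

    parts: iterable of (quantity, failure_rate_per_hour) pairs.
    ASSUMPTION: any single part failure fails the system.
    """
    return sum(quantity * rate for quantity, rate in parts)


def mtbf_hours(failure_rate_per_hour: float) -> float:
    """Mean time between failures for a constant failure rate."""
    return 1.0 / failure_rate_per_hour


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """Probability of surviving `hours` with a constant failure rate."""
    return exp(-failure_rate_per_hour * hours)


if __name__ == "__main__":
    # Hypothetical bill of materials: (quantity, failures per hour).
    parts = [(10, 1e-7), (2, 5e-7)]
    lam = series_failure_rate(parts)
    print(f"failure rate = {lam:.2e} per hour")
    print(f"MTBF = {mtbf_hours(lam):.0f} hours")
    print(f"R(1 year) = {reliability(lam, 8760):.4f}")
```

During burn-in and wear-out the failure rate is not constant, so the exponential model would misstate the reliability there.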

21
Safety-Critical Hardware

Fault detection
- Routines to check that the hardware works
- Signal comparisons
- Information redundancy: parity check, etc.
- Watchdog timers
- Bus monitoring: check that the processor is alive
- Power monitoring
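
As one example of information redundancy from the list above, here is a sketch of an even-parity check on a data word. The word layout (parity stored in the least significant bit) and the function names are assumptions made for illustration.

```python
def even_parity_bit(data: int) -> int:
    """Return the bit that makes the total number of 1s even."""
    if data < 0:
        raise ValueError("data must be non-negative")
    return bin(data).count("1") % 2


def encode(data: int) -> int:
    """Append the parity bit as the least significant bit (assumed layout)."""
    return (data << 1) | even_parity_bit(data)


def parity_ok(word: int) -> bool:
    """True if the word, parity bit included, has an even number of 1s."""
    return bin(word).count("1") % 2 == 0


if __name__ == "__main__":
    word = encode(0b1011)
    print(parity_ok(word))           # stored word passes the check
    print(parity_ok(word ^ 0b100))   # a single flipped bit is detected
    print(parity_ok(word ^ 0b110))   # two flipped bits go unnoticed
```

A single parity bit detects any odd number of bit errors but misses an even number, which is why it is usually combined with the other checks in the list.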

22
Safety-Critical Hardware

Possible hardware
COTS microprocessors
- No safety firmware, least assurance
- Redundancy improves matters, but common-cause failures
remain possible
- Fabrication failures, microcode and
documentation errors
- Use components which have a service history and
failure statistics.

23
Safety-Critical Hardware

Special microprocessors
- Collins Avionics/Rockwell AAMP2
- Used in Boeing 747-400 (30 units)
- High cost: bench testing, documentation, formal
verification
- Other models: SparcV7, TSC695E, ERC32 (ESA
radiation-tolerant), 68HC908GP32 (airbag)

24
Safety-Critical Hardware