Title: System Safety: A Systematic Process
1 System Safety: A Systematic Process
2 Risk Assessment: An evaluation of threats in terms of severity and probability
3 MISSION FOCUS (HAZARD VERSUS RISK)
HAZARD (Hazard ID and Analysis)
Identifying and analyzing an existing or potential condition that can impair mission accomplishment (no discussion of mission significance)
RISK (Risk Assessment and Management)
A hazard for which we have estimated the severity, probability, and scope with which it can impact our mission, and accepted it
4 Hazard Identification and Analysis during the Life Cycle of a System
Concept -> Concept Definition -> Development -> Production -> Deployment -> Termination
5 Threat Assessment Process
Q/Q Assess Probability
Q/Q Assess Severity
6 THE RISK ASSESSMENT MATRIX
Probability (columns): A Frequent, B Likely, C Occasional, D Seldom, E Unlikely
Severity (rows): I Catastrophic, II Critical, III Moderate, IV Negligible
Risk Levels (cells): Extremely High, High, Medium, Low
7 A thorough risk assessment process might help you better understand a hazard you have been exposed to many times before without incident.
(No beavers were assaulted in production of this slide.)
8 Hazard Severity
- What impact will this threat have on people?
- What impact on environment, equipment, or facilities?
- What impact on mission?
9 Severity Categories
- A key factor in establishing a common understanding of a safety program's goal
- MIL-STD-882 uses four categories
  - Cat 1 Catastrophic
  - Cat 2 Critical
  - Cat 3 Marginal
  - Cat 4 Negligible
10 Severity Qualified
- CATASTROPHIC: Complete mission failure, death, or loss of system
- CRITICAL: Major mission degradation, severe injury, occupational illness, or major system damage
- MODERATE: Minor mission degradation, injury, minor occupational illness, or minor system damage
- NEGLIGIBLE: Less than minor mission degradation, injury, occupational illness, or minor system damage
11 Severity Quantified
- CATASTROPHIC: Complete mission failure, death, or loss of system and/or costs exceeding 1B
- CRITICAL: Major mission degradation, severe injury, occupational illness, or major system damage and/or costs exceeding 1M
- MODERATE: Minor mission degradation, injury, minor occupational illness, or minor system damage and/or costs exceeding 100,000
- NEGLIGIBLE: Less than minor mission degradation, injury, occupational illness, or minor system damage and/or costs exceeding 10,000
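The cost figures on the Severity Quantified slide can be read as a simple threshold lookup. The sketch below is illustrative only: the function name `severity_from_cost` is hypothetical, and treating each figure as a strict lower bound (and returning nothing below 10,000) is an assumption, since the slide does not say how boundary values are handled.

```python
# Cost thresholds copied from the "Severity Quantified" slide, highest first.
# Assumption: a cost must strictly exceed a threshold to land in that category.
COST_THRESHOLDS = [
    (1_000_000_000, "CATASTROPHIC"),
    (1_000_000, "CRITICAL"),
    (100_000, "MODERATE"),
    (10_000, "NEGLIGIBLE"),
]


def severity_from_cost(cost):
    """Return the most severe category whose cost threshold is exceeded, or None."""
    for threshold, category in COST_THRESHOLDS:
        if cost > threshold:
            return category
    return None  # below every threshold listed on the slide


print(severity_from_cost(2_500_000))  # CRITICAL
```

Cost is only one of the criteria on the slide; a mishap with low cost but a severe injury would still be rated on the injury.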
12 Probability: Expressed in terms of time, occurrence, proximity, etc.
- Use data to substantiate your assessment
- Use descriptive or quantitative terms
- Use the cumulative probability of all factors
- Examine experientially derived or anecdotal information from operators
- Acknowledge uncertainty: there are no guarantees
13 THE RISK ASSESSMENT MATRIX
Probability (columns): A Frequent, B Likely, C Occasional, D Seldom, E Unlikely
Severity (rows): I Catastrophic, II Critical, III Moderate, IV Negligible
Risk Levels (cells): Extremely High, High, Medium, Low
14 Qualified Probability Categories
- FREQUENT
  - Individual piece of equipment: Occurs often in the life of the system
  - Individual: Occurs often in a career
  - Fleet or Inventory: Continuously experienced
  - All Personnel exposed: Continuously experienced
- LIKELY
  - Individual piece of equipment: Occurs several times in the life of the system
  - Individual: Occurs several times in a career
  - Fleet or Inventory: Occurs often
  - All Personnel exposed: Occurs often
- OCCASIONAL
  - Individual piece of equipment: Will occur in the life of the system
  - Individual: Will occur in a career
  - Fleet or Inventory: Occurs several times in the life of the system
  - All Personnel exposed: Occurs sporadically
15 Qualified Probability (cont)
- SELDOM
  - Individual piece of equipment: Could occur in the life of the system
  - Individual person: Could occur in a career
  - Fleet or Inventory: Can be expected to occur in the life of the system
  - All Personnel exposed: Seldom occurs
- UNLIKELY
  - Individual piece of equipment: You assume it will not occur in the system lifecycle
  - Individual person: So unlikely you assume it will not occur in a career
  - Fleet or Inventory: Unlikely but could occur in the life of the system
  - All Personnel exposed: Occurs very rarely
16 Probabilities Quantified (in terms of failure or exposure rates)
- Unlikely: 1 failure in 1,000,000,000 events, instead of "assuming it will not occur"
- Seldom: 1 failure in 500 million exposures, instead of "it could occur"
- Occasional: 1 failure in 1 million exposures, instead of "it will occur"
- Likely: 1 failure in 500,000 exposures, instead of "it occurs several times"
- Frequent: 1 failure in 100,000 events, instead of "it occurs often"
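As a rough illustration of the quantified scale above, an observed failure rate can be mapped to a probability category. This is a sketch under stated assumptions: the helper `probability_category` is hypothetical, each rate on the slide is treated as the floor of its category, and anything rarer than the Seldom rate is called Unlikely. The slide gives only point values, so the banding is a guess.

```python
# Rates (failures per exposure) taken from the "Probabilities Quantified" slide.
# Assumption: each rate is the lower edge of its category.
RATE_FLOORS = [
    (1 / 100_000, "Frequent"),
    (1 / 500_000, "Likely"),
    (1 / 1_000_000, "Occasional"),
    (1 / 500_000_000, "Seldom"),
]


def probability_category(failures, exposures):
    """Map an observed failure count and exposure count to a probability category."""
    if exposures <= 0:
        raise ValueError("exposures must be positive")
    rate = failures / exposures
    for floor, category in RATE_FLOORS:
        if rate >= floor:
            return category
    return "Unlikely"  # rarer than the Seldom rate


print(probability_category(3, 1_000_000))  # Likely
```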
17 Qualitative Assessment (AC 25.1309-1A)
- Design Appraisal
- Installation Appraisal
- Failure Modes and Effects Analysis
- Fault Tree Analysis
- Probability Assessment
18 Quantitative Assessment (AC 25.1309-1A)
- Probability Analysis (PRA)
- Quantitative Probability Terms (QRA)
19 FAA Fail-Safe Design Concept (AC 25.1309-1A)
- The fail-safe design concept considers the effects of failures and combinations of failures in defining a safe design
- The following basic objectives apply
  - In any system or subsystem, the failure of any single element, component, or connection during any one flight should be assumed. Such single failure should not prevent continued safe flight and landing
  - Subsequent failures during the same flight, whether detected or latent, should also be assumed unless their joint probability with the first failure is demonstrated to be extremely improbable
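The joint-probability test in the second objective can be sketched numerically. Everything here is an assumption for illustration: the failures are treated as statistically independent (which latent or common-cause failures often are not), and the 1e-9 threshold for "extremely improbable" is assumed for this sketch rather than stated on the slides. The function names are hypothetical.

```python
import math

# Assumed threshold for "extremely improbable"; not stated in the slides.
EXTREMELY_IMPROBABLE = 1e-9


def joint_probability(*failure_probs):
    """Joint probability of all failures occurring, assuming independence."""
    return math.prod(failure_probs)


def is_extremely_improbable(*failure_probs, threshold=EXTREMELY_IMPROBABLE):
    """True if the combined failure is at or below the assumed threshold."""
    return joint_probability(*failure_probs) <= threshold


# Two independent failures at 1e-5 each combine to roughly 1e-10.
print(is_extremely_improbable(1e-5, 1e-5))  # True
# A 1e-3 failure followed by a 1e-4 failure combines to roughly 1e-7.
print(is_extremely_improbable(1e-3, 1e-4))  # False
```

If the failures share a cause, the product understates the real joint probability, which is one reason the slide asks for the figure to be demonstrated rather than assumed.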
20 Fail-Safe Design Concept
- Fail-safe designs use the following design principles. A combination of two or more is usually needed to provide a fail-safe design
- Redundant or backup systems
- Isolation of systems, components and elements
- Demonstrated reliability / Periodic inspection
- Failure warning and indication
- Flight crew procedures
- Designed failure effect limits
- Designed failure path
- Increased margins or factors of safety
- Error-tolerant design
21 Operational and Maintenance Considerations (AC 25.1309-1A)
- Flight crew action
- Ground crew action
- Certification check requirements
- Flight with inoperative equipment
22 Quantifying or Qualifying Risk?
- Remember Murphy's Law for Management: Technology is dominated by those who manage what they don't understand
23 Risk Acceptance Codes
- RAC 1: Unacceptable
- RAC 2: Undesirable
- RAC 3: Acceptable with controls
- RAC 4: Acceptable
24 Risk Assessment Shortcomings
- Deficiencies in RACs represent one of the major problems facing the system safety effort
- Quantitative severity and probability scales in most RAC matrices are too subjective
- The RAC is a main driver of system safety efforts
- This code prioritizes the management emphasis given to a particular problem
25 THE ENHANCED RISK ASSESSMENT MATRIX
- Numeric Code is used to prioritize hazards and determine their acceptability using a quantitative methodology
Risk Levels by Severity (rows) and Probability (columns):
Severity           A Frequent   B Likely   C Occasional   D Seldom   E Unlikely
I Catastrophic     1            2          6              8          12
II Critical        3            4          7              11         15
III Moderate       5            9          10             14         16
IV Negligible      13           17         18             19         20
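The enhanced matrix amounts to a lookup from a severity row and a probability column to a numeric code. A minimal sketch follows, assuming columns A through E run from Frequent to Unlikely and rows I through IV from Catastrophic to Negligible, with the numeric codes taken from the slide; the function name `risk_code` is hypothetical.

```python
# Numeric codes from the enhanced matrix slide, one row per severity category.
# Assumption: list positions correspond to probability columns A..E.
RISK_CODES = {
    "I": [1, 2, 6, 8, 12],       # Catastrophic
    "II": [3, 4, 7, 11, 15],     # Critical
    "III": [5, 9, 10, 14, 16],   # Moderate
    "IV": [13, 17, 18, 19, 20],  # Negligible
}
PROBABILITY_COLUMNS = "ABCDE"  # Frequent, Likely, Occasional, Seldom, Unlikely


def risk_code(severity, probability):
    """Return the numeric risk code for a severity row and probability column."""
    return RISK_CODES[severity][PROBABILITY_COLUMNS.index(probability)]


print(risk_code("II", "C"))  # 7
```

A lower code means a higher priority, so a Critical/Occasional hazard (7) outranks a Catastrophic/Seldom one (8).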
26 THE RISK PRIORITY LIST
Highest Risk
By ranking the hazards, we address them on a worst-first basis. Safety-dedicated resources are always limited and should be directed at the highest risk.
Lowest Risk Warranting Action
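Worst-first ranking can be sketched as a sort on the numeric risk code, cut off at the lowest risk that still warrants action. The hazard names, their codes, and the cutoff value below are all hypothetical; the sketch also assumes that a lower numeric code means a higher risk.

```python
def priority_list(hazards, action_cutoff):
    """Rank (name, code) pairs worst-first and drop those beyond the cutoff.

    Assumption: lower numeric code = higher risk.
    """
    ranked = sorted(hazards, key=lambda hazard: hazard[1])
    return [hazard for hazard in ranked if hazard[1] <= action_cutoff]


# Hypothetical hazards with hypothetical numeric risk codes.
hazards = [("Hydraulic leak", 7), ("Bird strike", 2), ("Worn tread", 17)]

print(priority_list(hazards, action_cutoff=10))
# [('Bird strike', 2), ('Hydraulic leak', 7)]
```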
27 ASSESSMENT CHALLENGES
- Over optimism
- Over pessimism
- Misrepresentation/Misunderstanding
- Alarmism / Accident du Jour
- Indiscrimination
- Bias
- Inaccuracy
28 Total Risk Exposure Codes (TREC)
- Expanded scale
- Probability expressed in Exposure
- Severity expressed in Cost
- Combined determination expressed in quantifiable terms (now you are talking a language the bean counters understand)
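One plausible way to combine exposure-based probability with cost-based severity is an expected-loss figure. This is not the deck's TREC scale, which the slides do not define; it is a sketch assuming a constant failure rate per exposure and a single cost per mishap, and the function name `expected_loss` is hypothetical.

```python
def expected_loss(failure_rate, exposures, cost_per_mishap):
    """Expected cost over a number of exposures.

    Assumptions: constant failure rate per exposure, one fixed cost per mishap.
    """
    expected_mishaps = failure_rate * exposures
    return expected_mishaps * cost_per_mishap


# Hypothetical figures: 1 failure per 500,000 exposures, 2,000,000 exposures,
# and a cost of 1,000,000 per mishap, for about 4 expected mishaps.
loss = expected_loss(1 / 500_000, 2_000_000, 1_000_000)
print(round(loss))
```

Expressing the result in cost terms lets two hazards with different severity and exposure profiles be compared on a single scale.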
29 Verification and Validation
- Quality of data establishes process credibility
- Avoid GIGO syndrome
- Verify and validate initial estimates with updated data
  - Failure rates
  - Exposure rates
  - Project lifecycle changes
  - Number of units in the system
30 THE PRIORITY LIST: What does it accomplish?
Traditional Risk Management: Personnel can't name or prioritize hazards; they can only identify general threats
ORM: Personnel can name and prioritize RISKS that impact them and their mission
In a mature NORMal world, every individual personally benefits by adapting the knowledge of prioritized hazards that exist in their life. (Due diligence is demonstrated when managers see that their subordinates possess this knowledge.)
31 System Safety Precedence: A systematic approach to Hazard ID, Risk Assessment, and Control
- Design to minimize hazards
  - Robust, redundant systems, assemblies, components, etc.
- Install physical barriers
  - Isolate known threatening conditions or environments
- Use warning devices
  - Alerts to prevent or reduce unwanted event
- Develop procedures and training
  - Most commonly used and abused hazard control
32 Risk Analysis
33 Assessing Risk Controls
34 Two Major Risk Control Approaches
- Employ Macro Risk Control Option(s)
  - Reject, Avoid, Delay, Transfer, Spread, Compensate, Reduce
- Implement System Safety Precedence Control Option(s)
  - Engineer, Guard, Improve Design, Limit Exposure, Personnel Selection, Train, Warn, Motivate, Reduce Effect, Rehabilitate
35 Swiss Cheese Model of Defenses
Hazards
The ideal
The reality
Potential losses (people and assets)
James Reason, Managing the Risks of Organizational Accidents
36 Swiss Cheese Model of Defenses
Some holes due to active failures
Defenses in depth
Other holes due to latent conditions
James Reason, Managing the Risks of Organizational Accidents
37 Macro Options
- REJECT
  - Risk outweighs benefit
- AVOID
  - Go around the risk, do it in a different way
- DELAY
  - Maybe the problem will be resolved by time
  - If delay is an acceptable option, consider if operation is needed at all
- TRANSFER
  - Better qualified system, i.e., "Pros From Dover"
38 Macro Options (cont)
- SPREAD
  - Modular or separate hazardous operations
- COMPENSATE
  - Design parallel and redundant systems
- REDUCE
  - Design for minimum risk
  - Incorporate safety devices
  - Provide warning devices
  - Develop SOPs and train
39 The Risk Control Macro Option List
- Reject
- Avoid
- Delay
- Transfer
- Spread
- Compensate
- Reduce
- QUESTION: Why isn't "eliminate" on this list?
40 Determine Risk Control Effects
- How will this affect probability?
- How will this affect severity?
- How will this impact other sub-systems?
  - Some controls support other sub-systems
  - Some controls may hinder other sub-systems
- What are the costs vs. benefits?
  - Direct costs
  - Indirect costs
41 Direct vs. Indirect Costs
- As a rule of thumb, it is generally acceptable to calculate indirect costs of a mishap to be 7 times greater than those costs which can directly be accounted for in the incident or accident
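The rule of thumb above reduces to simple arithmetic. In this sketch, "7 times greater" is read as indirect costs equal to 7 times the direct costs, which is an interpretation; the function name `total_mishap_cost` and the example figure are hypothetical.

```python
# Rule-of-thumb multiplier from the slide: indirect costs ~ 7 x direct costs.
INDIRECT_MULTIPLIER = 7


def total_mishap_cost(direct_cost, multiplier=INDIRECT_MULTIPLIER):
    """Estimate total mishap cost as direct cost plus rule-of-thumb indirect cost."""
    indirect_cost = direct_cost * multiplier
    return direct_cost + indirect_cost


# A mishap with 50,000 in direct costs: 350,000 indirect, 400,000 total.
print(total_mishap_cost(50_000))  # 400000
```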
42 Risk Control Rules of Thumb (ROTs)
- Use the System Safety Precedence order
- Choose the most mission supportive combinations
- Use Integrated Product Teams
- Look for synergistic enhancements
- Man, Machine, Medium, Mission, Management
43 Use the 5 M model as you look for systemic issues
- Man
  - Doesn't know
  - Doesn't care
  - Can't physically accomplish
- Machine
  - Poor design
  - Faulty maintenance
  - SOPs
44 5 M systemic issues (cont)
- Medium
  - Weak design considerations
  - Lack of provisions for natural phenomena
- Management
  - Inadequate procedures
  - Inadequate policy
  - Inadequate standards and controls
- Mission
  - Poorly thought out
  - Poorly executed
  - Weak understanding
  - Incompatibilities
45 Providing Management Risk Control Options
- Program Manager looking for optimum combinations
- Mission supportive
- Some Risk Controls are incompatible
- Evaluate full cost versus full benefit
- Be prepared for numbers game
- Some Controls reinforce one another
- Win-Win option
- Redundancy and Robustness
- Is it needed?
- Can you afford it? i.e., , s, real estate
46 Aid to Decision Making
- Be prepared to assist decisions at the right time
- Don't rush. Make them as late as possible without negative impact on timeline
- Ensure decisions are made at the right level
- It should be established who makes the tough calls
- Use RAC or TREC to quantify who, what, when
- Provide mission supportive options
- Use the Macro Option list as a starting point
- Be prepared to offer sound advice
47 Don't be one who says, "Data or information was not available and our department could not prove it was unsafe to allow the operation."