Title: System Accidents
Accidents happen because
- Systems fail in unexpected ways
- Failed parts can't be isolated
- In systems, parts have complex interactions that can't be anticipated
- Safety systems can make systems more dangerous
A Story
- In the morning your coffee pot boils dry and the pot cracks. You dig up a spare pot and make another cup of coffee. (You are very much in need of your coffee.) You've left enough time for class, but barely. You rush to class only to realise that you've forgotten your apartment keys. Unfortunately, you have a co-op interview that afternoon and need to go home to change. Your roommate is in Arts, and you won't be able to find him or her until he/she wanders home at 11 pm tonight. Good thing one of your classmates has a spare suit in the 4th-year study room. You talk to your classmate, only to learn that he/she took it to the cleaners that morning: there was a sale on, and they didn't have an interview for the next couple of days. You show up for your interview late and in jeans, and it goes really badly. You apologise to the interviewer, explaining...
- Adapted from Perrow, 1999.
The cause of your bad interview is:
- the mechanical failure of your coffee pot
- your human error in deciding to make more coffee and forgetting your keys
- external factors in the environment (your roommate's course schedule and the dry-cleaning sale)
- the poor design of your apartment door lock
- the procedures you used (having coffee at the start of your day, not allowing enough time, etc.)
Three Mile Island
- Babcock and Wilcox (builders of the equipment) blamed the operators: human error
- Metropolitan Edison (the utility) blamed the equipment: mechanical failure
- The NRC (Nuclear Regulatory Commission) blamed the design of the system
- The operators blamed the procedures
- The President's Commission blamed everyone.
Why Systems Have Accidents
- All systems have events.
- An event is an occasion when something does not work according to design, due to:
  - failure
  - ageing
  - maintenance
How Events Become Accidents
- Events become accidents when:
  - they go unnoticed
  - the system is tightly coupled and the event causes other events
- The more complex and the more tightly coupled the system is, the more sensitive it is to events.
Examples
- Virginia Electric Power, 1980
- While a worker was cleaning the floor in an auxiliary building, his shirt caught on a 3-inch handle of a circuit breaker. Pulling it free, he activated the breaker, which shut off current to the control rods in the reactor. The reactor shut down automatically, and it took 4 days to bring it back up, costing hundreds of thousands of dollars.
- Changing a light bulb in California, 1978
- A worker changing a light bulb on a control panel in the control room dropped the bulb. The dropped bulb created a short circuit in some sensors and controls. The reactor automatically shut down. The loss of the sensors meant the operators could not monitor the plant. The shutdown caused the core to cool too rapidly. The operators came very close to cracking the reactor vessel and causing a major meltdown.
What were the 3 fundamental causes of the August 14, 2003 power blackout?
- Inadequate situation awareness at FirstEnergy Corporation, Ohio
- Inadequate diagnostic support
- INADEQUATE TREE TRIMMING
- Source: U.S.-Canada Power System Outage Task Force, Interim Report
Levels of System Analysis
- Part: e.g., a valve
- Unit: a functionally related collection of parts, e.g., a steam generator
- Subsystem: an array of units, e.g., a steam generator and its water units
- Plant: the collection of subsystems
- Environment: everything beyond the plant
Incident or Accident
- An event is any abnormal operation.
- An incident is a failure of a part or unit.
- An accident is a failure that results in the loss of a subsystem or the plant, or that has an impact on the environment.
- A component failure accident is a failure caused by one component.
- A system accident is caused by the interaction of multiple components.
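The definitions above amount to a small classification rule. The sketch below is a minimal Python illustration, assuming the definitions reduce to "the widest level of system analysis that failed" and "how many components interacted". The names Level, classify, and accident_kind are hypothetical and not from the course text.

```python
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    """Levels of system analysis, ordered from narrowest to widest scope."""
    PART = 1
    UNIT = 2
    SUBSYSTEM = 3
    PLANT = 4
    ENVIRONMENT = 5


def classify(widest_failure: Optional[Level]) -> str:
    """Classify an abnormal operation by the widest level that failed.

    Assumption: None means something operated abnormally but nothing failed.
    """
    if widest_failure is None:
        return "event"
    if widest_failure <= Level.UNIT:
        return "incident"  # failure confined to a part or unit
    return "accident"  # loss of a subsystem or the plant, or impact on the environment


def accident_kind(interacting_components: int) -> str:
    """Distinguish a component failure accident from a system accident."""
    if interacting_components < 1:
        raise ValueError("an accident involves at least one component")
    if interacting_components == 1:
        return "component failure accident"
    return "system accident"


print(classify(None))               # event
print(classify(Level.UNIT))         # incident
print(classify(Level.ENVIRONMENT))  # accident
print(accident_kind(3))             # system accident
```

Using an ordered enum keeps the levels comparable, so "part or unit" becomes a single `<=` test rather than a list of special cases.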
Food Chain of Accidents
- 15-30 system accidents
- 300 incidents
- 3000 events
Food Chain of Accidents
- Accident: grid failure, eastern North America
- Incident: power line down
- Event: tree trimming
Quantifying Victims
- 1st-party victims: the operators and people running the plant
- 2nd-party victims: non-operating personnel (e.g., passengers on a ship)
- 3rd-party victims: innocent bystanders
- 4th-party victims: future generations of humanity
Quantifying Victims
- 1st-party victims
- 2nd-party victims
- 3rd-party victims: e.g., power grid failure
- 4th-party victims: e.g., Chernobyl
Accident-Causing Characteristics of Systems
- Linear interactions: a process is carried out in a sequence of steps, so one failure affects the entire system downstream.
- Common-mode interactions: one component serves two or more parts, so its failure is a common-mode failure.
Accident-Causing Characteristics of Systems
- Nonlinear interactions: unexpected, complex interactions, e.g., proximity interactions.
[Figure: leak]
Power Grid Events
- Context: August afternoon, moderate loads due to air conditioning, 2 units down
- 1:31 pm: another unit goes down; loads are high
- 3:05 pm: another line goes down (the buffer was now gone)
- 2:14-3:59 pm: computer failures in the FirstEnergy (FE) control room (loss of situation awareness)
- 3:15-3:39 pm: 3 more lines go down due to tree contact
- 3:39-3:58 pm: 7 more lines trip due to overloading
- 4:05-4:08 pm: 4 major lines trip due to overloading, cascading across eastern North America
Power Grid Failure
- Linear interaction: 1 minor line trips, and then a connected major line trips
- Common-mode interaction: 1 major line trips and cascades to 2 or more minor lines
- Nonlinear interactions: computer failures cause a loss of situation awareness (SA), so operators don't react to line losses in time and the problem cascades
Accident-Causing Characteristics of Systems
- complexities in the system
- tight spacing of equipment
- proximate production steps
- many common mode connections
- limited ability to isolate failed components
- unintended feedback loops
Accident-Causing Characteristics of Systems
- Tight coupling: systems that respond very quickly and sensitively to perturbations
- Contrast with loose coupling: systems that incorporate buffers or slack
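To make the role of slack concrete, here is a deliberately simplified toy model in Python, loosely inspired by the grid cascade. It is a hypothetical illustration, not a power-flow model: it assumes lines share the total load equally, each line trips when its share exceeds its capacity, and the capacities and loads are made-up numbers. The function name simulate_cascade is likewise hypothetical.

```python
from __future__ import annotations


def simulate_cascade(capacities: list[float], total_load: float,
                     initial_trip: int) -> tuple[list[list[int]], list[int]]:
    """Toy cascade: trip one line, then let the remaining lines share the load.

    Returns (lines tripped in each round, lines still in service at the end).
    """
    in_service = set(range(len(capacities))) - {initial_trip}
    rounds: list[list[int]] = []
    while in_service:
        share = total_load / len(in_service)  # assumption: load splits equally
        overloaded = {i for i in in_service if share > capacities[i]}
        if not overloaded:
            break  # the slack absorbed the disturbance
        rounds.append(sorted(overloaded))
        in_service -= overloaded
    return rounds, sorted(in_service)


CAPACITIES = [12.0, 12.0, 11.0, 11.0, 10.0, 10.0]  # hypothetical line ratings

# Tight: 60 units over 6 lines = 10 per line, so the weakest lines have no slack.
print(simulate_cascade(CAPACITIES, total_load=60.0, initial_trip=0))
# -> ([[2, 3, 4, 5], [1]], [])
# Loose: 45 units = 7.5 per line, leaving at least 2.5 units of slack on every line.
print(simulate_cascade(CAPACITIES, total_load=45.0, initial_trip=0))
# -> ([], [1, 2, 3, 4, 5])
```

The same single trip (think of one line lost to tree contact) takes down every line when the system runs without a buffer, but is absorbed when each line has some slack.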
Recognizing Tight Coupling
- more time-dependent processes
- invariant sequences (X must come before Y)
- the process only works in one way (no alternative paths)
- little slack: precise quantities and timing are required
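Coupling and Complexity
- Tight coupling, linear interactions: dams, rail transport
- Tight coupling, complex interactions: nuclear plants, chemical plants, space missions
- Loose coupling, linear interactions: assembly lines, post office
- Loose coupling, complex interactions: mining, universities
- Perrow, 1999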
Power Grid Failure System Characteristics
- Tight coupling
  - A tree branch can cause grid failure
  - No buffer (after 3:05 pm that day)
  - Very precise timing for shutting down lines/reactors
- Many common-mode connections
- Close-proximity spots (tree to line)
- Difficult to isolate areas on short notice
- Many feedback loops
Cost of Accidents (US)
- Motor vehicle accidents: $722 billion/year
- Workplace accidents: $8.5 billion/year
- Home accidents: $18.2 billion/year
- Public accidents: $12.5 billion/year
- Source: Wickens, p. 352, Table 14.1
Workers' Compensation
- provide income and medical benefits to victims and their dependents
- reduce court costs and litigation
- encourage accident prevention
- study causes of accidents
Factors Contributing to Accidents
[Figure (Wickens, p. 357): natural factors; human, job, equipment, physical environment, and social environment factors; hazard or operator error; management or design error; system characteristics; accident]
Employee Factors
- age
- ability
- experience
- drugs/alcohol
- stress
- fatigue
- motivation
- Source: Wickens, p. 358
Job Factors
- arousal (boredom) level
- physical workload
- mental workload
- work/rest/shifts
- timing
- ergonomic hazards
Equipment Factors
- Controls and displays
- electrical/mechanical/thermal/pressure hazards
- toxic substances
- explosive hazards
Physical Environment Factors
- Illumination
- Noise
- Vibration
- Temperature
- Humidity
- Airborne pollutants
- Fire and radiation
Social/Psychological Factors
- Management practices
- social norms
- morale
- training
- incentives (motivation)
Assessing Hazards
- Criticality is a function of severity and probability.
- Table 14.3, p. 431, shows a criticality scale built out of frequency and severity.
Table 14.3: criticality rank by frequency (rows) and severity (columns); 1 = most critical, 20 = least critical
Frequency     Catastrophic  Critical  Marginal  Negligible
Frequent      1             3         7         13
Probable      2             5         9         16
Occasional    4             6         11        18
Remote        8             10        14        19
Improbable    12            15        17        20
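Reading the table is a two-key lookup. The Python sketch below transcribes Table 14.3 into a dictionary; the function name criticality is a hypothetical helper, and it assumes, as the table layout suggests, that a lower rank means a more critical hazard.

```python
# Criticality ranks transcribed from Table 14.3 (rows: frequency, columns: severity).
SEVERITIES = ("catastrophic", "critical", "marginal", "negligible")
CRITICALITY = {
    "frequent":   (1, 3, 7, 13),
    "probable":   (2, 5, 9, 16),
    "occasional": (4, 6, 11, 18),
    "remote":     (8, 10, 14, 19),
    "improbable": (12, 15, 17, 20),
}


def criticality(severity: str, frequency: str) -> int:
    """Return the criticality rank for a hazard; 1 is the most critical."""
    row = CRITICALITY[frequency.lower()]
    return row[SEVERITIES.index(severity.lower())]


print(criticality("Catastrophic", "Occasional"))  # 4
print(criticality("Marginal", "Frequent"))        # 7
```

The two example lookups show that the scale weighs severity heavily: an occasional catastrophic hazard (rank 4) outranks a frequent marginal one (rank 7).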
FMECA
- Failure mode and effects criticality analysis
- A priori analysis to anticipate failures
- Example from Table 14.4, p. 372 (columns: component, failure mode, component effect, subsystem effect, criticality, comments):
  - Blade, comes loose: damages housing, other parts; criticality 6
  - Blade, fractures: others loosen, uneven cut; criticality 4
FMECA
- The idea is to try to predict accidents before they happen
- Estimate the criticality of each one
- Use the results to inform redesign or operation of the device
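An FMECA worksheet can be treated as a list of rows that is filtered and sorted to decide what to redesign first. The Python sketch below uses the two blade rows from Table 14.4; it assumes, hypothetically, that the Criticality column uses the same direction as Table 14.3 (a lower number is more critical). FailureMode and redesign_priorities are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    component: str
    mode: str
    effects: str
    criticality: int  # assumption: 1 = most critical, as in Table 14.3


def redesign_priorities(rows: list[FailureMode], threshold: int) -> list[FailureMode]:
    """Return failure modes ranked at least as critical as threshold, worst first."""
    flagged = [r for r in rows if r.criticality <= threshold]
    return sorted(flagged, key=lambda r: r.criticality)


worksheet = [
    FailureMode("Blade", "comes loose", "damages housing, other parts", 6),
    FailureMode("Blade", "fractures", "others loosen, uneven cut", 4),
]

for row in redesign_priorities(worksheet, threshold=10):
    print(row.criticality, row.component, row.mode)
# 4 Blade fractures
# 6 Blade comes loose
```

The threshold stands in for whatever cut-off a design team would pick; tightening it (e.g. to 5) leaves only the fracture mode on the redesign list.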
Fault Tree Analysis
- Analyze the events of an accident
- Modified from the text example: stick to events!

Operator fails to detect alarm
  OR
  - Audible warning fails
  - Visual warning fails
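A fault tree can be evaluated bottom-up: an OR gate's event occurs if any child occurs, and an AND gate's event occurs only if all children occur. The Python sketch below encodes the alarm example as drawn; Gate and occurs are hypothetical names for illustration, not an established FTA library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Gate:
    """An AND/OR gate in a fault tree; children are gates or basic-event names."""
    name: str
    kind: str  # "AND" or "OR"
    children: list[Gate | str]


def occurs(node: Gate | str, basic_events: set[str]) -> bool:
    """Return True if this node's event occurs, given the basic events that happened."""
    if isinstance(node, str):
        return node in basic_events
    results = [occurs(child, basic_events) for child in node.children]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate type: {node.kind}")


alarm_tree = Gate("Operator fails to detect alarm", "OR",
                  ["Audible warning fails", "Visual warning fails"])

print(occurs(alarm_tree, {"Visual warning fails"}))  # True: one failure suffices under OR
print(occurs(alarm_tree, set()))                     # False
```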
Hints on Doing Fault Tree Analysis
- Past accident: use AND
- Anticipated accident: AND/OR
- Start at the top
- The top is the last event
- Follow chronologically
Exercise: FTA
- Within a group of four, use fault tree analysis to explain the coffee pot incident from the start of class
Bad Interview
[Figure: fault tree for the bad interview. Top event: bad interview, an AND of "late" and "jeans". Lower nodes, joined by AND gates: no suit, spare at cleaners, can't find roommate, suit in apt, no keys, door locks, coffee pot etc.]
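Because a past accident is analyzed with AND gates only, removing any single basic event breaks the chain. The Python sketch below checks this on one plausible reading of the bad-interview tree. The nesting below the top AND gate is a hypothetical reconstruction, not the original slide, and Gate, occurs, and single_fixes are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Gate:
    name: str
    kind: str  # "AND" or "OR"
    children: list[Gate | str]


def occurs(node: Gate | str, basic_events: set[str]) -> bool:
    if isinstance(node, str):
        return node in basic_events
    results = [occurs(child, basic_events) for child in node.children]
    return all(results) if node.kind == "AND" else any(results)


# Hypothetical structure; "Coffee pot etc." feeds two branches (a common-mode cause).
interview_tree = Gate("Bad interview", "AND", [
    Gate("Late", "AND", ["Coffee pot etc."]),
    Gate("Jeans", "AND", [
        Gate("No suit", "AND", [
            "Suit in apt",
            Gate("Can't get into apt", "AND", [
                Gate("No keys", "AND", ["Coffee pot etc."]),
                "Door locks",
                "Can't find roommate",
            ]),
        ]),
        "Spare at cleaners",
    ]),
])

what_happened = {"Coffee pot etc.", "Suit in apt", "Door locks",
                 "Can't find roommate", "Spare at cleaners"}


def single_fixes(tree: Gate, events: set[str]) -> list[str]:
    """Basic events whose removal alone would have prevented the top event."""
    return sorted(e for e in events if not occurs(tree, events - {e}))


print(occurs(interview_tree, what_happened))  # True
print(single_fixes(interview_tree, what_happened))
```

Every basic event comes back as a single fix, which is why each node (for example, the door lock) is a candidate area for improvement through design.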
Bad Interview
[Figure: the same fault tree, annotated "Areas for improvement through design"]