Title: Part 2 Session 1 Breakout
1 Part 2 Session 1 Breakout 1
Old Lessons Apply in the New World
2 Recap from Panel Discussion
- There has to be an optimum balance among
technical performance, time schedule and cost.
- Dr. Eberhard Rees
- If eternal vigilance is the price of liberty,
then chronic unease is the price of safety.
- Professor James Reason (2005, p 37) (substitute
quality for safety)
- Quality and System Safety are both instrumental
in the prevention process
3 What is Quality Engineering?
- Juran
- customer satisfaction, or simply "fitness for
use" (p 20)
- Ishikawa
- the practice of developing, designing, producing,
and servicing a quality product that is most
economical, useful, and satisfactory to the
customer (p 64)
- Crosby
- conformance to requirements (p 21)
- Deming
- a predictable degree of uniformity and
dependability that is suited to the market at low
cost. In other words, quality is meeting customer
needs and wants (p 61)
ASQ, 2001
4 Quality Evolution
- Babylonian, Egyptian, Greek, Roman weights and
measures for trade
- Trade and craft guild standards (experts)
- Mass production and machinery (low level
training)
- Supervisor quality monitors
- Inspectors (à la quality control)
- Deming Plan-Do-Check-Act cycle
- Juran, Feigenbaum, Ishikawa: TQM
- Quality assurance
- designed in, not inspected in
- (James Reason, pp 46/7)
5 What is System Safety Engineering?
- System Safety Engineering (SSE): A subset of the
safety engineering discipline that provides
direct support to programs and projects to
achieve acceptable mishap risk through a
systematic approach of hazard analysis, risk
assessment, and risk management.
- (J.R. Goodin/NASA/KSC (retired), 2004)
- System Safety is the application of engineering
and management principles, criteria, and
techniques to optimize all aspects of safety
within the constraints of operational
effectiveness, time, and cost throughout all
phases of the system life cycle
- (Air Force Safety Agency, 2000, p vii)
- System safety is
- A management doctrine, and
- A family of analytical approaches that support
that doctrine (Mohr, Jacobs Sverdrup, 2002)
6 Some Analysis Types
- Preliminary Hazard Analysis (PHA)
- System Hazard Analysis (SHA)
- Subsystem Hazard Analysis (SSHA)
- Occupational Health Hazard Assessment (OHHA)
- Software Hazard Analysis
- SSE Analyses consider system limits and risks
Mohr, 2002
7 Some Analytical Techniques
- Preliminary Hazard Analysis
- Failure Modes and Effects Analysis
- Fault Tree Analysis
- Event Tree Analysis
- Cause-Consequence Analysis
- Sneak Circuit Analysis
- Probabilistic Risk Assessment
- Digraph Analysis
- Hazard and Operability Study (HAZOP)
- Management Oversight and Risk Tree Analysis
(MORT)
- SSE requires a toolbox of techniques; there is no
one-size-fits-all tool
Mohr, 2002
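As an illustration of one technique from the list above, Fault Tree Analysis, the sketch below computes a top-event probability from AND and OR gates. It assumes all basic events are statistically independent and appear only once in the tree. The tree itself, the event names, and the probabilities are hypothetical and are not taken from the presentation or from Mohr.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Union


@dataclass
class BasicEvent:
    name: str
    probability: float  # hypothetical per-mission probability of occurrence


@dataclass
class Gate:
    name: str
    kind: str  # "AND" or "OR"
    inputs: List[Union["Gate", BasicEvent]]


def probability(node: Union[Gate, BasicEvent]) -> float:
    """Probability that a node occurs, assuming independent inputs."""
    if isinstance(node, BasicEvent):
        return node.probability
    p = [probability(child) for child in node.inputs]
    if node.kind == "AND":
        # All inputs must occur.
        return prod(p)
    if node.kind == "OR":
        # At least one input occurs: complement of "none occur".
        return 1.0 - prod(1.0 - x for x in p)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical tree: heating is lost if both heaters fail, or if power is lost.
top = Gate("loss of heating", "OR", [
    Gate("both heaters fail", "AND", [
        BasicEvent("heater A fails", 0.01),
        BasicEvent("heater B fails", 0.01),
    ]),
    BasicEvent("power bus fails", 0.001),
])
top_probability = probability(top)
```

In this made-up tree the single-point power failure dominates the redundant heater pair. If the same basic event fed more than one branch, the independence assumption would no longer hold and a minimal cut set method would be needed instead.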
8 Why System Safety Engineering?
- Support management risk decisions relative to
system hazards
- Avoid fly-fail-fix-fly and pilot error
mentalities
- Manage safety in the same manner as any other
design or operational parameter
- Prevent accidents, not react to them
- Consider impacts to workers, the public,
product quality, productivity, environment,
facilities and equipment
Shivers, 2005
9 Effective System Safety Program Attributes
- Management Commitment
- Safety Culture
- Independent Safety Organization
- Communication
- Qualified/Educated Personnel
- Well-Defined Roles, Processes and Tools,
including
- Use of Technical Standards, Capture/Use of
Lessons Learned,
- Audits and Reviews, Stop Work Authority
- Sufficient Resources
- (Kiessling, Shivers, and Tippet, 2004)
10 Systems Thinking
- Learn to view connected events as a system (Peter
Senge, The Fifth Discipline)
- Seeing wholes: the big picture, unintended
consequences, cause and effect (including delay),
long term views, etc.
- Our jobs don't exist in isolation
- Deal with root causes, not symptoms
Senge
11 Who Should Implement SSE?
- SSE is the responsibility of all technical and
management personnel on a project team
- Chief engineers, systems engineers, design
engineers, project managers all must include SSE
thinking as a minimum in their work and
understand what SSE is and does
- SSE practitioners generally come from the safety
and mission assurance organizations, but must be
planned for and included in the team activities
Shivers, 2005
12 SSE Thinking
- SSE thinking is focused on identifying and
controlling potential failure, while design
engineering thinking might be more focused on
successful operation
- Together, the two thought modes are complementary
and lead to a better chance of success, which is
the goal of each
- Both thought modes need to be within the realm of
Systems Thinking in general to consider all
impacts of decisions made
Shivers, 2005
13 When is SSE Implemented?
- SSE considerations must be included in the up
front conceptualization so that pertinent
information can be used in trade studies and
requirements development
- SSE is applied throughout the life cycle with
appropriate tools and analyses brought to bear as
warranted
- The system safety process can be applied at any
point in the system life cycle, but the greatest
advantages are achieved when it is used early in
the acquisition life cycle
- The system safety process is normally repeated as
the system evolves or changes and as problem areas
are identified (Air Force Safety Agency, 2000, p
14)
- Decisions made under cost and schedule pressure
can lead to hazards (Stroup and Naylor, 2001)
14 SSE and the Life Cycle
- Early in the life cycle SSE considers hazards
that may occur any time in the life cycle
- Early identification usually results in less
expensive corrections
- Analysis can be and is done at any time in the
life cycle
Shivers, 2005
15 System Safety Program Objectives
- a. Safety, consistent with mission requirements,
is designed into the system in a timely,
cost-effective manner
- b. Hazards are identified, evaluated, and
eliminated, or the associated risk reduced to a
level acceptable to the managing activity (MA)
throughout the entire life cycle of a system
- c. Historical safety data, including lessons
learned from other systems, are considered and
used
- d. Minimum risk is sought in accepting and using
new designs, materials, and production and test
techniques
- e. Actions taken to eliminate hazards or reduce
risk to a level acceptable to the MA are
documented
- f. Retrofit actions are minimized
- g. Changes in design, configuration, or mission
requirements are accomplished in a manner that
maintains a risk level acceptable to the MA
- h. Consideration is given to safety, ease of
disposal, and demilitarization of any hazardous
materials associated with the system
- i. Significant safety data are documented as
lessons learned and are submitted to data
banks, design handbooks, or specifications
- j. Hazards identified after production are
minimized consistent with program restraints
Air Force Safety Agency, 2000, p 1
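Objective b above, reducing risk "to a level acceptable to the MA", is commonly supported with a severity-by-likelihood risk matrix. The sketch below is one minimal way to express that idea. The category names follow common system safety usage, but the numeric scores, the thresholds, and the function names are assumptions made up for this example and do not come from the Air Force handbook.

```python
# Illustrative ordinal scales; the numbers are assumptions, not a standard.
SEVERITY = {"catastrophic": 4, "critical": 3, "marginal": 2, "negligible": 1}
LIKELIHOOD = {"frequent": 5, "probable": 4, "occasional": 3,
              "remote": 2, "improbable": 1}
LEVELS = ["low", "medium", "high"]


def risk_level(severity: str, likelihood: str) -> str:
    """Map a severity/likelihood pair to a coarse risk level."""
    score = SEVERITY[severity] * LIKELIHOOD[likelihood]
    if score >= 12:   # hypothetical threshold
        return "high"
    if score >= 6:    # hypothetical threshold
        return "medium"
    return "low"


def acceptable(severity: str, likelihood: str, max_level: str = "medium") -> bool:
    """True if the risk is at or below the level the MA has agreed to accept."""
    level = risk_level(severity, likelihood)
    return LEVELS.index(level) <= LEVELS.index(max_level)


# A catastrophic hazard judged "occasional" is not acceptable as-is; a control
# that makes it "remote" brings it within the assumed acceptance level.
before = acceptable("catastrophic", "occasional")
after = acceptable("catastrophic", "remote")
```

In practice the managing activity defines the scales and the acceptance level; the point of the sketch is only that a hazard control is credited by moving the hazard to a different cell of the matrix.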
16 Some Concept Phase SSE Tasks
- Concept Trade Studies
- Concept alternative studies include quantitative
and qualitative SSE analysis input and criteria
- Concept Definition
- Requirements management, risk management
planning, feasibility and design trades, and safety
technical requirements generation include results
from SSE analysis
Shivers, 2005
17 Some Development Phase SSE Tasks
- Development of contract requirements in the
Statement of Work and for the contract data
requirements (analyses reports)
- Example analyses requirements
- System Safety Plan
- Preliminary Hazard List
- Preliminary Hazard Analysis
- Operating and Support Hazard Analyses
- System Hazard Analyses
- Fault Tree Analyses (FTA)
- Probabilistic Risk Assessment (PRA)
- Design and Development
- SSE input into specification development and
verification planning
Shivers, 2005
18 Some Production Phase SSE Tasks
- Fabrication, integration, test and evaluation
- SSE input into ground activities and verification
- Test planning to validate safety features
- Conducting tests safely
Shivers, 2005
19 Some Operations Phase SSE Tasks
- Operations
- SSE input into operations and performance
validation (must be considered early as well)
- Operating and Support Hazard Analyses
- Analyses from the Human Factors Program
Shivers, 2005
20 Some Close Out SSE Tasks
- Decommissioning, disposal, recycling
- SSE inputs into process decisions
Shivers, 2005
21 NASA SMA Roles
- SMA provides
- SSE practitioners
- Assurance that requirements are set and met
- Development of disciplines and tools
- SMA in-line engineering has a review, evaluation
and concurrence role
- The SMA assurance supports engineering,
validation and verification, policy and planning,
and independent assessments
22 System Safety Effort Throughout Project Lifecycle
- Proposal Support
- Requirements Definition
- Design Assessment
- Identification of Hazards
- Recommended Hazard Controls
- Assessment of Risk
- Verification of Hazard Controls
- Development of Safety Data Packages
- Interface with KSC Range Safety
- Safety Support during I&T Activities
- Track Closure of Verification Items
- Safety Certification
- Prelaunch Safety Support
Goddard Space Flight Center, 2006
23 SUMMARY
- System safety is involved throughout entire
project lifecycle
- Hazards to personnel or mission success are
identified, eliminated or controlled to an
acceptable level of risk
- Effectiveness of hazard controls must be verified
- Hazard analysis results and verification results
are documented
Goddard Space Flight Center, 2006
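The summary above (identify hazards, control them, verify the controls, document the results) maps naturally onto a simple hazard log. The sketch below is a hypothetical data structure, not a Goddard tool or format: a hazard counts as closed only when it has at least one control and every control has been verified. The hazard titles and control descriptions are invented.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Control:
    description: str
    verified: bool = False  # set True once verification evidence is recorded


@dataclass
class Hazard:
    title: str
    controls: List[Control] = field(default_factory=list)

    def is_closed(self) -> bool:
        # A hazard with no controls is never closed, even if nothing is pending.
        return bool(self.controls) and all(c.verified for c in self.controls)


def open_verifications(log: List[Hazard]) -> List[Tuple[str, str]]:
    """List (hazard, control) pairs whose verification is still outstanding."""
    return [(h.title, c.description)
            for h in log for c in h.controls if not c.verified]


# Hypothetical log entries for illustration only.
log = [
    Hazard("battery overpressure", [
        Control("relief valve installed", verified=True),
        Control("charge current limit tested", verified=False),
    ]),
    Hazard("sharp edges on bracket", [
        Control("edges deburred and inspected", verified=True),
    ]),
]
pending = open_verifications(log)
```

A list like `pending` is what the "Track Closure of Verification Items" activity on the previous slide would work down before safety certification.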
24 Organizational Accidents
- Rare, sometimes catastrophic, events that occur
within complex modern technologies
- Have multiple causes
- Have devastating effects on uninvolved
populations and things
- Contrast with individual accidents that involve a
person as often the victim and agent of the event
- Difficult to understand and control
- (James Reason, p 1)
25 Generic Cause of Organizational Accidents
- All organizational accidents entail the
breaching of the barriers and safeguards that
separate damaging and injurious hazards from
vulnerable people or assets, collectively termed
losses
- In individual accidents such defenses are often
either inadequate or lacking
- Three factors of breaching defenses
- Human, technical, organizational
- Governed by production and protection
- (James Reason, p 2)
26 Unintended Consequences
- conflicts between production and protection
pressures tend to be resolved in favour of the
former at least until a bad accident occurs.
- efficient methods for work arise naturally
- safety adds restrictions to procedures
- rules become more restrictive over time
- the scope of allowable actions is reduced
- violation of procedure becomes necessary to
accomplish the job
- (James Reason, p 49)
27 Maintenance Can Seriously Damage Your System
- it is often latent conditions created by
maintenance lapses that either set the accident
sequence in motion or thwart its recovery.
- of the various possible error types associated
with the reassembly, installation or restoration
of components, omissions (the failure to carry
out necessary steps in the task) comprise the
largest single error type.
- (James Reason, pp 85/6)
28 Some Well-known Accidents
- USS Thresher sinking, 1963
- QC of brazing, etc. Quality problem becomes
safety problem
- Poor design, overhaul followed by severe test
- Quality - to prevent, not learn from catastrophe
- Design, manufacturing, identify safety critical
elements, test and verification, test planning
- X-31 crash, 1995
- Faulty Configuration Management
- Pitot tube heaters not present in design
- Failure to follow procedure, find process
escapes, identify critical failures, verification
- Idaho Falls nuclear reactor explosion, 1961
- Poor maintenance procedures, on the fly process
modifications, design flaws, QE supervision of
work
NASA, 2006
29 Project and Systems Management
- Were developed to manage in an emerging new
environment
- A multitude of government agencies, industrial
firms and other organizations, sometimes on an
international basis
- Funds in the multimillion to billion dollar
category
- Complex technology sometimes reaching beyond the
state of the art
- Large forces of scientists, engineers,
technicians and administrative personnel
- Construction of extensive and highly specialized
facilities
Rees
30 Apollo Program Characteristics
- Program and systems management perspective
- Technical risk trades with cost and schedule
- Planning
- Visibility
- Management review
- Configuration control
- Penetration
- Communication
- Contracting philosophies
- Organization
- Authority, roles and responsibilities
- Innovation
- Goal focus
- Continuous study and application of systems
engineering
- Relate actions to schedule and budget
31 Systems Aspects
- Such projects of great magnitude and complexity
had to be considered under the overall systems
point of view
- The Apollo Program had shortcomings, setbacks,
and deficiencies during its execution, all of
which challenged the management
- To assure success, minimize technical risks or
actually mission risks
- Keep closely to the time schedule
- Wherever possible must engage in parallel rather
than consecutive developments
Rees
32 Tight Budget Control and Highest Economy in Expenditure
- Budget Controls
- Subordinate to technical needs and the demands of
the time schedule
- There is a trade-off between acceptable technical
risks or product quality, time schedule and
project cost.
- To eliminate the technical risk problem,
frequently undue quality control or over-testing
of hardware is applied which delays schedules and
makes costs skyrocket.
Rees
33 Solid Planning
- Master plans on hardware, software, and
overall systems
- Technical approaches
- Resources such as facilities, manpower and funds
- Schedules
- Detailed breakdowns of the overall job and the
system into subsystems
Rees
34 Visibility
- Management at all levels should know almost in
real time what is going on in the program
- technical occurrences
- schedule progress or delays
- financial status
- From the outset of the program, proper and
effective channels and ways of communication have
to be established on the government side between
upper and lower echelons of management
- Prime contractors must provide equally effective
channels down to their respective subcontractors
Rees
35 Significance of Visibility
- Enable management on all levels to predict
trends in the progression of the program
- Vital for taking corrective steps before the
program runs into impediments
- The capability of management to foretell trouble
and thus avoid it by appropriate actions was one
of the major cornerstones of the Apollo success.
- Dr. Eberhard Rees
36 Review Milestones
Schedule review between government and prime
contractors. Apollo reviews, for instance, in a
chronological sequence
- Program Requirements Review (PRR)
- Preliminary Design Review (PDR)
- Critical Design Review (CDR)
- Design Certification Review (DCR)
- Pre-Delivery Turn-Over Review (PDTR)
- Flight Readiness Review (FRR)
- Countdown Demonstration Test and its Review (CDDT)
Rees
37 Significance of Reviews
- Critically examine and assess the project status
- Affirm the quality of the product and its
reliability
- Assure systems safety
- Every review resulted in protocolled action
items
- Resolve problems
- Authorized go ahead with the next increment of
the overall plan.
Rees
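The role of reviews as gates, each one authorizing "the next increment of the overall plan", can be sketched as a small sequencing check. The review acronyms and their order are taken from the Review Milestones slide. The rule that open action items block the go-ahead is an assumption made for this example, and the function name is hypothetical.

```python
from typing import List, Optional

# Review sequence as listed on the Review Milestones slide.
REVIEW_SEQUENCE = ["PRR", "PDR", "CDR", "DCR", "PDTR", "FRR", "CDDT"]


def next_review(completed: List[str], open_action_items: int = 0) -> Optional[str]:
    """Return the next review to hold, or None when the sequence is finished.

    Assumption for this sketch: go-ahead is withheld while action items
    from the most recent review remain open.
    """
    if completed != REVIEW_SEQUENCE[:len(completed)]:
        raise ValueError("reviews must be completed in sequence")
    if open_action_items > 0:
        raise RuntimeError("resolve open action items before proceeding")
    if len(completed) == len(REVIEW_SEQUENCE):
        return None
    return REVIEW_SEQUENCE[len(completed)]


upcoming = next_review(["PRR", "PDR"])
```

Keeping the sequence in one list makes it easy to see which increment of the plan a project is authorized to start.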
38 Configuration Control
- The contractor followed acceptable drawing room
practice as to procedure and discipline
- Design intentions were carried through
manufacturing
- Only mandatory changes were approved
- The exact configuration, known down to the most
minute detail, was delivered to the launching site
- Failures or unsuitable hardware or material could
be traced down to the point of origin (Apollo
management called this traceability)
- Configuration control carried out in a strict
sense is very expensive. It is, therefore, vital
that these controls not be overdone and that they
are wisely introduced to prime contractors and
subcontractors.
Rees
39 Application of the Penetration Principle
- Dr. Eberhard Rees on the Penetration Principle
- It permeated through the contractor
organization to the subcontractor structure.
Spawned by this approach, improved failure
analysis appeared throughout the system;
in-process inspection was maintained at a high
level and receiving inspection techniques and
effectiveness were improved, among other
benefits.
40 Significance of Penetration
- Improved Communication Channels
- Created close interaction of highly dedicated,
competent technical and scientific personnel, all
motivated by the impressive challenge of a huge
complex program, no matter whether they are
government or contractor employees
- Most instrumental in this government-contractor
relationship was the establishment of resident
personnel in the prime contractor plants
Rees
41 Contracting Principles
- Cost-plus-fixed-fee contracts
- Used because of the uncertainties of effective,
close pricing in such a program with its many
unknowns
- Incentive fee contracts
- A base fee of modest proportions
- Plus a scaled or incentive segment awarded to a
contractor for success in meeting program product
requirements for performance, cost, and time
schedule
- Lends itself well to hardware contracts with
reasonable, well-determined milestones, cost
levels and schedule.
- Award Fee contracts
- Used where parameters are not easily
distinguished in advance
- Support service or engineering service contracts
- Motivational in nature
Rees
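The incentive fee structure described above, a modest base fee plus a scaled segment tied to performance, cost, and schedule, can be written as simple arithmetic. The sketch below is only an illustration: the weights, the 0-to-1 scoring scale, the dollar figures, and the function name are all assumptions and do not describe actual Apollo contract terms.

```python
from typing import Dict


def incentive_fee(base_fee: float, incentive_pool: float,
                  scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Base fee plus the earned share of the incentive pool.

    scores: achievement per criterion, each between 0.0 and 1.0 (assumed scale).
    weights: relative importance per criterion; must sum to 1.0.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    for name in weights:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"score for {name} must be between 0 and 1")
    earned_fraction = sum(weights[name] * scores[name] for name in weights)
    return base_fee + incentive_pool * earned_fraction


# Hypothetical contract: performance fully met, cost half met, schedule mostly met.
fee = incentive_fee(
    base_fee=100_000.0,
    incentive_pool=400_000.0,
    scores={"performance": 1.0, "cost": 0.5, "schedule": 0.75},
    weights={"performance": 0.5, "cost": 0.25, "schedule": 0.25},
)
```

With these made-up numbers the contractor earns 81.25 percent of the incentive pool, for a total fee of 425,000. The calculation only works when the criteria can be scored against milestones fixed in advance, which is why the slide ties this contract type to hardware work with well-determined milestones.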
42 Other Pertinent Principles
- Organize and motivate to achieve effective high
morale in the workforce
- Delegate authority clearly, concisely and
positively to achieve timely decisions
- Apply innovative concepts and techniques
courageously
- Keep objectives pointed toward the goal
- Require continuing study and application of the
systems engineering approach
- Relate actions to schedule and to budget
continuously
Rees
43 The Apollo Management System
- Our management system evolved after some
painful experiences in the early days of Apollo.
In fact, at the beginning of the program in 1961,
there was no common system in existence within
the rather young National Aeronautics and Space
Administration. Then as the program gathered
headway and matured, the management system became
better defined, changing as necessary to keep
pace with unfolding events. Early it was learned
that in the environment of a big development
project, there can be no static system. Change
and evolution are inevitable.
Dr. Eberhard Rees
44 Program Integration
- Three categories of concern
- First, there are the hardware, systems and
subsystems specialists who devote attention to
the delivery of items that are technically
adequate and qualified for mission performance
- Second, there are the specialists who approach
the project from the point of view of controlling
costs and schedules.
- As the third organizational element in the
grouping, there is the on-site resident
management office, to assure that project
management interests were advanced and that
decisions were made and implemented within the
designated scope of authority of the resident
group.
Rees
45 Resident Management Offices
- This resident element proved to be a most
important link between government and contractor
activities
- To expedite decisions, the resident manager
required functional support, which was provided
by specialized, on-site contract administration
and technical engineering staff
- assigned from parent functional organizations of
the responsible Center
- could make decisions on the spot or commit the
parent office or function at the Center (within
well-established limits)
Rees
46 Significance of the Resident Management Office
- Speed the project management process
- Provide a dynamic interface with the contractor
on a continuing day-to-day basis
- Integrate technical and managerial personnel
- The technical functions tend to strive primarily
toward perfection to a degree that possibly
inhibits adequate attention to manufacturing and
launch schedules or cost. The contractor could
well be oriented toward schedule, costs and
profits, whereas the project manager might weigh
concern more heavily on schedule and costs.
Through the office of the resident manager, an
automatic system of checks and balances developed
to the end that each consideration received its
appropriate share of attention.
Rees
47 Contractor Penetration
- Contractor penetration is necessary to
obtain visibility
- There is an understandably strong desire on
the part of industry to take the control and the
funding and to do the job with but minor
government intervention. The restiveness that
stemmed from such close control gradually
dissipated early in the Apollo Program as the
benefits accruing from the industry-government
team approach were revealed. The manager must
have control of competent technical and
administrative staff in order to conduct
activities efficiently.
Rees
48 Program Management
- While centralized program management has many
values, of prime importance is the assignment of
all responsibility to single organizational
management structures, pyramiding into a single
strong personality. Of course with the
responsibility, the manager must have
commensurate authority to resolve technical,
financial, production and other problems that
otherwise require coordination and approval in
separate channels at different echelons. And the
manager must have clear, concise communications
flowing in all directions.
- Dr. Eberhard Rees
49 Conclusion
- System Safety and Quality
- necessary components of good program and systems
management
- very similar in their objectives, but with quite
different tools and techniques
- Must be applied early in the life cycle
- Must be implemented religiously throughout
program execution
- Must be continuously examined and improved
- Are complementary for safety and mission success
50 Acknowledgements (1 of 2)
- A Brief Overview of Selected System Safety
Analytical Approaches, R. R. Mohr, Jacobs
Engineering, 2002.
- Air Force System Safety Handbook, Air Force
Safety Agency, July 2000.
- Cost and Schedule: The Overlooked Hazards, Ron
Stroup and Warren Naylor, Proceedings of the 19th
International System Safety Conference, 2001.
- Improving Performance of the System Safety
Function at the Marshall Space Flight Center, Ed
Kiessling and Herb Shivers, NASA Marshall Space
Flight Center and Donald D. Tippett, The
University of Alabama in Huntsville, Proceedings
of the American Society for Engineering
Management Conference, 9/2004.
- Human Factors: A Personal Perspective, James
Reason, Human Factors Seminar, Helsinki, 2006.
- Managing the Risks of Organizational Accidents,
James Reason, Ashgate, 1997 (9th reprint, 2005).
- Quality 101, American Society for Quality, 2001.
51 Acknowledgements (2 of 2)
- Safety and Mission Success, Technical Managers
Training, Goddard Space Flight Center, 10/2006.
- Some general SSE information in this presentation
was taken from works of Pat Clemens/APT Research,
Huntsville, AL, and Ronnie Goodin/KSC, retired.
- System Safety Engineering Awareness Training for
NASA Managers and Engineers (not yet released),
2006.
- System Safety Engineering Technical Warrant,
Herb Shivers, presented to the NASA Technical
Authority Conference, June 2005.
- The Fifth Discipline: The Art and Practice of the
Learning Organization, Peter Senge, Currency
Doubleday, 1990 (1st edition), 1994 (paperback
edition).
- System Failure Case Studies, NASA, Office of
Safety and Mission Assurance, Review and
Assessment Division, 2006.