Title: AE 6362 Safety By Design
1 AE 6362 Safety By Design
- Dr. Daniel P. Schrage
- Professor, School of A.E.
- Georgia Tech
2 Presentation Outline
- Introduction to System Approach to Flight Systems Reliability, Safety and Risk Management
- Basic Reliability Relationships
- Basic Flight Safety Considerations
- Basic Risk Management Considerations
- Certification Considerations for Highly-Integrated or Complex Aircraft Systems
3 The Systems Approach
- Based on General Systems Theory and the Georgia Tech Integrated Product/Process Development (IPPD) Methodology, which combines systems engineering and quality engineering through a top-down design decision support process
- A system is defined as a whole composed of parts in an orderly arrangement according to some scheme or plan
- Systems can be treated as interacting or interdependent sets of components forming a network for the purpose of fulfilling some safety objective
- Safety and reliability determinations need to encompass the measurement and integration of these separate components of the system
4 Hazard and Risk Assessment
- A system and its components completing various operations are best viewed in terms of probabilities of successful completion rather than simply as success or failure
- Assessment begins with an understanding of the various components of the system and a description of its function and goals
- From this description, safety and reliability practitioners can identify the potential sources of hazards, assess the associated risks, and use the techniques of quantified risk assessment (QRA) or probabilistic risk assessment (PRA) in this process
5 Risk Management
- Risk management is a technique increasingly used in organizations and by public bodies to increase safety and reliability and minimize losses. It involves the identification, evaluation and control of risks
- Risk identification may be achieved by a multiplicity of techniques
- Risk evaluation encompasses the measurement and assessment of risk. Implicit in the process is the need for sound decision making on the nature of potential socio-technical systems and their predicted reliability
- Risk control strategies may be classified into four main areas: Risk Avoidance, Risk Retention, Risk Transfer, Risk Reduction
6 Reliability Basics
- The steps taken to improve reliability involve all stages in design, construction and operation and are used for mechanical and electrical equipment as well as for electronic equipment
- Reliability is defined as the probability that an item will perform its function under stated conditions for a stated period of time
- Reliability (R) is a probability and, as such, may be anywhere in the numerical range from $R = 1$ (complete reliability and zero probability of failure) to $R = 0$ (complete unreliability and 100% probability of failure). Unreliability (F) is then $F = 1 - R$
- In order to take into account the periods during which repair or replacement follows breakdown, it is also necessary to define availability
- Availability (A) is the probability that an item will be available at any instant of time. Thus, irrespective of the frequency of breakdown, $A = 1$ if repair or replacement is instantaneous, admittedly an unlikely situation
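The limiting case in the last bullet can be illustrated numerically. The sketch below is a minimal example that assumes the common steady-state relation A = MTBF / (MTBF + MTTR); that relation and the names `mtbf_hours` and `mttr_hours` are assumptions introduced here for illustration and do not appear on the slide.

```python
def unreliability(reliability: float) -> float:
    """Return F = 1 - R for a reliability R in the range 0..1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return 1.0 - reliability


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Assumed model (not from the slide): A = MTBF / (MTBF + MTTR).

    mtbf_hours: mean time between failures (hypothetical input)
    mttr_hours: mean time to repair or replace (hypothetical input)
    """
    if mtbf_hours <= 0.0 or mttr_hours < 0.0:
        raise ValueError("mtbf_hours must be > 0 and mttr_hours must be >= 0")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Instantaneous repair gives A = 1 however often the item breaks down.
    print(steady_state_availability(mtbf_hours=10.0, mttr_hours=0.0))
    # A non-zero repair time pulls availability below 1.
    print(steady_state_availability(mtbf_hours=90.0, mttr_hours=10.0))
```

Under this assumed model, availability depends on the ratio of repair time to time between failures, which is why a frequently failing item can still be highly available if it is restored quickly.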
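7 Reliability in Series (Lusser's Law)
[Diagram: components A, B, C, D connected in series]
$R = R_A \times R_B \times R_C \times R_D \times \cdots \times R_n$ (assumes the reliabilities of the series components are mutually independent)
Assume all components have the same reliability, $R_i = 0.99$: for 100 components $R \approx 0.37$; for 300 components $R \approx 0.05$
[Diagram: components A and B connected in parallel]
Reliability in Parallel
$F = F_1 \times F_2 \times F_3 \times \cdots \times F_n$
Assume all component unreliabilities are equal, $F_i = 0.4$: for two in parallel $F = 0.16$, $R = 0.84$; for three in parallel $F = 0.064$, $R = 0.936$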
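The series and parallel figures on this slide can be reproduced with a few lines of code. This is a minimal sketch that assumes independent components, as the slide does; the function names are chosen here for illustration.

```python
import math


def series_reliability(reliabilities):
    """Lusser's law: a series system works only if every component works."""
    return math.prod(reliabilities)


def parallel_reliability(reliabilities):
    """A parallel system fails only if every component fails: R = 1 - prod(F_i)."""
    system_unreliability = math.prod(1.0 - r for r in reliabilities)
    return 1.0 - system_unreliability


if __name__ == "__main__":
    # Series: identical components with R_i = 0.99.
    print(round(series_reliability([0.99] * 100), 2))  # 0.37
    print(round(series_reliability([0.99] * 300), 2))  # 0.05
    # Parallel: identical components with F_i = 0.4, i.e. R_i = 0.6.
    print(round(parallel_reliability([0.6] * 2), 3))   # 0.84
    print(round(parallel_reliability([0.6] * 3), 3))   # 0.936
```

The contrast is the point of the slide: adding components in series erodes reliability quickly, while adding them in parallel improves it.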
8 The Time Variation of Reliability
- The Normal Lifetime Distribution: light bulb wear-out example
- The Failure Density Function: represents the overall failure rate relative to the number existing at the start
- The Hazard Rate: represents the failure rate relative to the number remaining at any subsequent time
- The Exponential Lifetime Distribution: transmitter tubes example
- Multiple Failure Modes: bus engines example
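The difference between the failure density function and the hazard rate is easiest to see for the exponential lifetime distribution. The sketch below assumes the standard forms R(t) = exp(-λt), f(t) = λ exp(-λt) and h(t) = f(t) / R(t); the failure rate value used is hypothetical and is not taken from the transmitter tube example.

```python
import math


def reliability(t: float, failure_rate: float) -> float:
    """Fraction of the original population still working at time t."""
    return math.exp(-failure_rate * t)


def failure_density(t: float, failure_rate: float) -> float:
    """Failures per unit time, relative to the number existing at the start."""
    return failure_rate * math.exp(-failure_rate * t)


def hazard_rate(t: float, failure_rate: float) -> float:
    """Failures per unit time, relative to the number remaining at time t."""
    return failure_density(t, failure_rate) / reliability(t, failure_rate)


if __name__ == "__main__":
    lam = 0.001  # hypothetical failure rate per hour
    for t in (0.0, 500.0, 1000.0):
        print(t, failure_density(t, lam), hazard_rate(t, lam))
```

For this distribution the failure density falls with time because fewer items remain to fail, while the hazard rate stays constant at the failure rate.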
9 The Time Variation of Reliability
- Replacement Policy
  - Scheduled maintenance would improve availability if it were designed to replace components towards the end of the random failure period, before wear-out failure had set in to a significant extent
  - However, if replacement took place too soon, costs would rise but availability would not be significantly improved
  - Correct selection of replacement time obviously requires knowledge of the failure density function for the components involved
- Early Failure
  - Hazard rate decreases with time
  - Often due to poor quality control during manufacture
  - Can be the result of incorrect installation procedures and even poor maintenance
- The bathtub curve and other distributions
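The bathtub shape can be sketched by adding three hazard rates: one decreasing (early failure), one constant (random failure) and one increasing (wear-out). The example below uses the Weibull hazard function as one convenient choice; the Weibull form and every shape and scale value are assumptions for illustration only and are not taken from the slides.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1).

    shape < 1 gives a decreasing hazard, shape == 1 a constant hazard,
    and shape > 1 an increasing hazard.
    """
    if t <= 0.0 or shape <= 0.0 or scale <= 0.0:
        raise ValueError("t, shape and scale must all be positive")
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def bathtub_hazard(t: float) -> float:
    """Sum of three independent failure modes (hypothetical parameters)."""
    early = weibull_hazard(t, shape=0.5, scale=2000.0)     # early failure
    random_ = weibull_hazard(t, shape=1.0, scale=5000.0)   # random failure
    wear_out = weibull_hazard(t, shape=5.0, scale=8000.0)  # wear-out
    return early + random_ + wear_out


if __name__ == "__main__":
    for hours in (10.0, 3000.0, 9000.0):
        print(hours, bathtub_hazard(hours))
```

With these assumed parameters the hazard is high at first, lowest in mid-life and rising again late in life, which is the region a replacement policy aims to stay ahead of.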
10 Greater Reliability and Enhancement
- High-integrity systems: where human safety dictates high-reliability standards; examples are commercial air transportation and emergency shutdown systems used in chemical and nuclear reactors
- Parallel Redundancy: the single most important technique for generating high-integrity systems is the employment of components in parallel configurations
- Standby Redundancy
- Fail-safe Design
- Voting Procedure
- Fractional Dead Time: in emergency shutdown equipment, the fraction of the time during which the shutdown equipment is inoperative
- Complex Configurations
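Two of the techniques listed above lend themselves to a short numerical sketch. The first function treats a voting procedure as a k-out-of-n system of identical, independent channels. The second uses the common approximation that a channel whose failures are revealed only at periodic proof tests has a mean fractional dead time of about λT/2 when λT is small. Both formulas and all names are assumptions for illustration; the slide gives no formulas.

```python
from math import comb


def k_out_of_n_reliability(k: int, n: int, channel_reliability: float) -> float:
    """Probability that at least k of n identical, independent channels work."""
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    r = channel_reliability
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


def fractional_dead_time(failure_rate: float, proof_test_interval: float) -> float:
    """Assumed approximation: mean FDT = failure_rate * interval / 2.

    Intended only for failure_rate * proof_test_interval much less than 1.
    """
    return failure_rate * proof_test_interval / 2.0


if __name__ == "__main__":
    # Two-out-of-three voting with a hypothetical channel reliability of 0.9.
    print(round(k_out_of_n_reliability(2, 3, 0.9), 3))  # 0.972
    # Hypothetical shutdown channel: 1e-4 failures per hour, tested every 1000 hours.
    print(fractional_dead_time(1e-4, 1000.0))
```

Setting k = 1 recovers plain parallel redundancy, and k = n recovers the series case.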
11 Limits Attainable On Reliability
- The parallel reliability calculation is only valid if the component reliabilities are strictly independent of one another. Two very common effects can compromise this independence: common cause (or common mode) failure and cascade failure
- Common mode failure results when a single factor (for example, a loss of electrical power or a mechanical failure) simultaneously causes failure in two or more redundant components. Its presence can be modeled by assuming it to be in series with the redundant components
- Cascade failure takes place when the failure of one component puts extra strain on other components, which then successively fail as a result. A potentially very serious example of cascade failure is occasionally experienced when a surge on the electrical mains supply network trips a circuit breaker, etc.
- Diversity is a very important technique that can be used to counter common mode and cascade failure. Frequently, redundant channels can be based on completely different physical principles
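The series model for common mode failure described above can be sketched as follows. The parameter names and the numerical values are hypothetical; the only modeling step taken from the slide is placing the common mode element in series with the redundant group.

```python
def redundant_with_common_mode(
    channel_reliability: float,
    n_channels: int,
    common_mode_reliability: float,
) -> float:
    """Reliability of n parallel channels in series with a common mode element."""
    if n_channels < 1:
        raise ValueError("n_channels must be at least 1")
    # Independent part: the group fails only if every channel fails.
    parallel_group = 1.0 - (1.0 - channel_reliability) ** n_channels
    # Common mode part: a single shared factor can fail all channels at once.
    return common_mode_reliability * parallel_group


if __name__ == "__main__":
    # Hypothetical values: channels with R = 0.6, common mode element with R = 0.99.
    for n in (1, 2, 3, 10):
        print(n, redundant_with_common_mode(0.6, n, 0.99))
```

However many channels are added, the result never exceeds the reliability of the common mode element, which is the limit on attainable reliability that this slide refers to.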
12 FAA AC 25.1309-1A System Design and Analysis
- Purpose: Describes various acceptable means for showing compliance with the requirements of FAR 25.1309 (b), (c), and (d)
- Applicability
  - FAR 25.1309 (b) provides general requirements for a logical and acceptable inverse relationship between the probability and the severity of each failure condition
  - FAR 25.1309 (c) provides general requirements for system monitoring, failure warning, and capability for appropriate corrective crew action
  - FAR 25.1309 (d) requires that compliance be shown primarily by analysis
  - While the above requirements do not apply to the performance, flight characteristics, and structural loads and strength requirements, they do apply to any system on which compliance with any of those requirements is based
- Background
  - The Part 25 airworthiness standards are based on the fail-safe design concept that has evolved over the years
  - The fail-safe design concept considers the effects of failures and combinations of failures in defining a safe design
13 FAA AC 25.1309-1A System Design and Analysis
- The FAA Fail-Safe Design Concept
  - The fail-safe design concept considers the effects of failures and combinations of failures in defining a safe design
  - a. The following basic objectives pertaining to failures apply:
  - (1) In any system or subsystem, the failure of any single element, component, or connection during any one flight (brake release through ground deceleration to stop) should be assumed, regardless of its probability. Such single failures should not prevent continued safe flight and landing, or significantly reduce the capability of the airplane or the ability of the crew to cope with the resulting failure conditions
  - (2) Subsequent failures during the same flight, whether detected or latent, and combinations thereof, should also be assumed, unless their joint probability with the first failure is shown to be extremely improbable
14 FAA AC 25.1309-1A System Design and Analysis
- b. The fail-safe design concept uses the following design principles or techniques in order to ensure a safe design. A combination of two or more is usually needed to provide a fail-safe design, i.e., to ensure that major failure conditions are improbable and that catastrophic failure conditions are extremely improbable
- (1) Designed Integrity and Quality
- (2) Redundancy or Backup Systems
- (3) Isolation of Systems, Components, and Elements
- (4) Proven Reliability
- (5) Failure Warning and Indication
- (6) Flightcrew Procedures
- (7) Checkability
- (8) Designed Failure Effect Limits
- (9) Designed Failure Path
- (10) Margins/Factors of Safety
- (11) Error-Tolerance
15 FAA AC 25.1309-1A System Design and Analysis
- Qualitative Assessment
  - Design Appraisal
  - Installation Appraisal
  - Failure Modes and Effects Analysis
  - Fault Tree or Reliability Block Diagram Analysis
  - Qualitative Probability Terms
- Quantitative Assessment
  - Probability Analysis
  - Quantitative Probability Terms
- Operational and Maintenance Considerations
  - Flightcrew Action
  - Groundcrew Action
  - Certification Check Requirements
  - Flight with Equipment or Functions Inoperative
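The quantitative probability terms can be sketched as a simple classifier. The slides do not give the numerical boundaries; the values below (on the order of 1e-5 and 1e-9) are the ones commonly quoted from AC 25.1309-1A and should be checked against the AC itself. Treating them as sharp thresholds is a simplification, and all names are hypothetical.

```python
# Assumed boundaries, commonly quoted from AC 25.1309-1A; verify against the AC.
PROBABLE_LOWER_BOUND = 1e-5
EXTREMELY_IMPROBABLE_UPPER_BOUND = 1e-9

# Larger rank means a less likely failure condition.
_RANK = {"probable": 0, "improbable": 1, "extremely improbable": 2}

# Objectives for major and catastrophic failure conditions under the fail-safe design concept.
REQUIRED_TERM = {"major": "improbable", "catastrophic": "extremely improbable"}


def classify_probability(probability: float) -> str:
    """Map a failure condition probability to a quantitative probability term."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability > PROBABLE_LOWER_BOUND:
        return "probable"
    if probability > EXTREMELY_IMPROBABLE_UPPER_BOUND:
        return "improbable"
    return "extremely improbable"


def meets_objective(severity: str, probability: float) -> bool:
    """Check the inverse relationship between severity and probability."""
    required = REQUIRED_TERM[severity.lower()]
    return _RANK[classify_probability(probability)] >= _RANK[required]


if __name__ == "__main__":
    print(classify_probability(1e-7))             # improbable
    print(meets_objective("major", 1e-7))         # True
    print(meets_objective("catastrophic", 1e-7))  # False
```

A real assessment rests on the analysis methods listed above; the classifier only shows how a computed probability is compared with the objective for a given severity.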
16 FAA AC 25.1309-1A System Design and Analysis
- Step-By-Step Guide
  - a. Define the system and its interfaces and identify the functions that the system is to perform
  - b. Identify and classify the significant failure conditions
  - c. Choose the means to be used to determine compliance with 25.1309 (b), (c), and (d)
  - d. Implement the design and produce the data which are agreed with the certificating office as being acceptable to show compliance
17 FAA AC 23.1309-1B Equipment, Systems, and Installations in Part 23 Airplanes
- Somewhat similar to FAA AC 25.1309-1A
- Isn't as explicit on applying the fail-safe design approach for all systems
- Includes a more descriptive section on Software Qualification for Airborne Systems and Applications
- Software Assessment: failure conditions and corresponding software levels
  - Catastrophic: Level A
  - Severe-Major: Level B
  - Major: Level C
  - Minor: Level D
  - No Effect: Level E
- Relates to the use of RTCA/DO-178B, Software Considerations in Airborne Systems and Equipment Certification
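The failure condition to software level mapping above can be written as a small lookup. The rule that a piece of software takes the most stringent level among the failure conditions it can contribute to is an assumption added here for illustration; the function and variable names are hypothetical.

```python
# Mapping taken from the slide: failure condition category -> software level.
SOFTWARE_LEVEL_BY_FAILURE_CONDITION = {
    "catastrophic": "A",
    "severe-major": "B",
    "major": "C",
    "minor": "D",
    "no effect": "E",
}


def required_software_level(failure_conditions):
    """Return the most stringent level for a list of failure condition categories.

    Assumption: level A is the most stringent and E the least, so the
    alphabetically smallest letter wins.
    """
    if not failure_conditions:
        raise ValueError("at least one failure condition is required")
    levels = [
        SOFTWARE_LEVEL_BY_FAILURE_CONDITION[condition.strip().lower()]
        for condition in failure_conditions
    ]
    return min(levels)


if __name__ == "__main__":
    print(required_software_level(["Minor", "Major"]))  # C
    print(required_software_level(["No Effect"]))       # E
```

An unknown category raises a `KeyError`, which is deliberate in this sketch: a misspelled failure condition should not silently map to a lenient level.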
18 AIR 5022 Reliability and Safety Process Integration
- Scope
  - Describes several of the commonly performed Reliability and Safety (R&S) analysis tasks, with emphasis on their inter-relationships and common data elements
  - Describes how the R&S process can be integrated, reducing duplicate work effort and providing more accurate, comprehensive, and standardized analysis results
  - To illustrate how this integration can be accomplished, several specific reliability and safety tasks are performed on a subsystem of an example product
19 ARP 4754 Certification Considerations For Highly Complex Aircraft Systems
- Scope
  - Discusses the certification aspects of highly-integrated or complex systems installed on aircraft, taking into account the overall aircraft operating environment and functions
  - Addresses the total life cycle for systems that implement aircraft-level functions
  - Intended as a guide for both the certification authorities and applicants for certification of highly-integrated or complex systems, particularly those with significant software elements
  - Focus is toward ensuring that safety is adequately assured through the development process and substantiating the safety of the implemented system
  - Specific guidance on how to do the substantiation work is beyond the scope of this document, though references are provided where applicable