Safety, Reliability, and Robust Design

About This Presentation

Description:

Safety, Reliability, and Robust Design in Embedded Systems (PowerPoint PPT presentation)

Slides: 51

Provided by: andrel1

Learn more at: https://eecs.ceas.uc.edu

Transcript and Presenter's Notes

Title: Safety, Reliability, and Robust Design

1

Safety, Reliability, and Robust Design
in Embedded Systems

2
Risk analysis: managing uncertainty
GOAL: be prepared for whatever happens
Risk analysis should be done for ALL PHASES of a project:
--planning phase
--development phase
--the product itself
Identify risks:
--What could you have done during the planning stage to manage each of these risks?
--How likely is it (what is the probability) that each one will occur?
--How likely is it (what is the probability) that more than one will occur?
--What actions will best manage the risk if it occurs?
3
Risk management: identify, plan for risks
During planning, a Risk Table can be generated.
Table columns: Risks | Type | Probability | Impact | Plan (Pointer)
Example risks:
--System not available
--Hardware failure
--Color printer unavailable
--Personnel absent (one meeting)
--Personnel unavailable (several meetings)
--Personnel have left project
Type:
--Performance (product won't meet requirements)
--Cost (budget overruns)
--Support (project can't be maintained as planned)
--Schedule (project will fall behind)
Probability: probability of this risk occurring
Impact: e.g., catastrophic, critical, marginal, negligible
4
Risk management: identify, plan for risks
The table is then sorted by probability and impact, and a cutoff line is defined. Everything above this line must be managed (with a management plan pointed to in the last column).
Useful reference: Embedded Syst. Prog., Nov. 2000 (examples), http://www.embedded.com/2000/0011/0011feat1.htm
Additional interesting reference: H. Petroski, To Engineer is Human: The Role of Failure in Successful Design, Vintage, 1992.
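
The sort-and-cutoff step can be sketched in C. This is only an illustration under stated assumptions: the struct layout, the names risk_entry_t and sort_and_cut, and the choice to express the cutoff as a probability threshold are all hypothetical, since the slides do not say how the cutoff line is chosen.

```c
#include <stdlib.h>

/* Impact levels named on the slide, ordered so that a larger value is worse. */
typedef enum { NEGLIGIBLE = 0, MARGINAL, CRITICAL, CATASTROPHIC } impact_t;

/* One row of the risk table (field names are assumptions). */
typedef struct {
    const char *risk;        /* short description of the risk */
    double      probability; /* estimated probability, 0.0 to 1.0 */
    impact_t    impact;
    const char *plan;        /* pointer to a management plan */
} risk_entry_t;

/* qsort comparator: higher probability first, ties broken by higher impact. */
static int compare_risks(const void *a, const void *b)
{
    const risk_entry_t *ra = a;
    const risk_entry_t *rb = b;
    if (ra->probability > rb->probability) return -1;
    if (ra->probability < rb->probability) return 1;
    return (int)rb->impact - (int)ra->impact;
}

/* Sorts the table in place and returns the number of entries above the
 * cutoff, i.e. the risks that must have a management plan. */
size_t sort_and_cut(risk_entry_t *table, size_t n, double cutoff_probability)
{
    size_t managed = 0;
    qsort(table, n, sizeof table[0], compare_risks);
    while (managed < n && table[managed].probability >= cutoff_probability)
        managed++;
    return managed;
}
```

After the call, entries 0 to (return value - 1) are the ones above the cutoff line. A real project might use a combined probability and impact score for the cutoff instead.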
5
Professional risk analysis is proactive, not reactive.
6
Important concepts for embedded systems:
Risk = (Probability of failure) × Severity
Increased risk --> decreased safety
Safety failures, possible causes:
--incorrect or incomplete specification
--bad design
--improper implementation
--faulty component
--improper use
RELIABILITY: what is the probability of failure?
7
Some ways to determine reliability:
--product performs consistently as expected
--MTBF (mean time between failures) is long
--system behavior is DETERMINISTIC
--system responds to, or FAILS GRACEFULLY under, out-of-bounds or unexpected conditions and recovers if possible
8
Definitions
Fault: incorrect or unacceptable state or condition
Fault duration and frequency determine classification:
--transient: from unexpected external condition (soft)
--intermittent: unstable hardware or marginal design; periodic / aperiodic
--permanent: failed component (hard)
Error: static, inherent characteristic of system
Failure: dynamic, occurs at specific time
Possible fault consequences:
--inappropriate action
--timing: event occurs too early or too late
--sequence of events incorrect
--quantity: wrong amount of energy or substance used
9

Achieving reliability:
--safe design
--fault detection
--fault management
--fault tolerant: system recovers, fault not detected (e.g., packet transfers)
Definition of reliability for an embedded system: the probability that a failure is detected by the user is less than a specified threshold

10
Examples: section 8.5 (read these carefully!)
--Ariane 5 rocket: register overflow, 64-bit word assigned to 16-bit register in a reused subsystem
--Mars Pathfinder mission, 1997: lower priority tasks were allowed to hog resources, higher priority tasks could not execute
--2004 Mars mission: file management problems
Many more examples in articles at embedded.com
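
The Ariane 5 item describes a wide value assigned to a narrow register. A minimal C sketch of guarding such a narrowing assignment follows. The function names are hypothetical, and this is not the actual flight code, which the slide does not show. Two possible policies are illustrated: reject the value, or saturate it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Policy 1: store `wide` into *narrow only if it fits in 16 bits.
 * On overflow, *narrow is left untouched and false is returned so the
 * caller can run its fault-management path. */
bool narrow_to_i16(int64_t wide, int16_t *narrow)
{
    if (wide < INT16_MIN || wide > INT16_MAX)
        return false;
    *narrow = (int16_t)wide;
    return true;
}

/* Policy 2: clamp to the nearest representable 16-bit value. */
int16_t saturate_to_i16(int64_t wide)
{
    if (wide > INT16_MAX) return INT16_MAX;
    if (wide < INT16_MIN) return INT16_MIN;
    return (int16_t)wide;
}
```

Which policy is safe depends on the system. The point is that the out-of-range case is handled explicitly rather than left to an implicit conversion.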
11
How do we define safety?
One criterion:
--single-point failure: failure of a single component will not lead to an unsafe condition
--common-mode failure: failure of multiple components due to a single failure event will not lead to an unsafe condition
Safety must be considered THROUGHOUT the project
12
fig_08_00
Embedded system design: project components
Development process (waterfall model)
Alternative process models
Need risk analysis AT EACH INCREMENT
(A = analysis, D = design, I = implement, T = test, M = maintenance)
Basic waterfall model: A-->D-->I-->T-->M
Prototyping: A-->D-->I-->T-->M
Incremental: A-->D-->I-->T-->M-->A-->D-->I-->T-->...-->M
Component based: A-->D-->Library-->Integrate-->T-->M
13

Specifications
Identify hazards
Calculate risk
Define safety measures
Specification document should include the safety standards and guidelines with which the system complies,
e.g., Underwriters Laboratories, FCC, FDA, FAA, AEC, NASA, ISO, NHTSA, etc.
Some industry standards / procedures:
FAA: DO-178B (and newer DO-178C)
Medical device industry: ISO 14971
Nuclear power industry (and other areas): IEC 61508, "Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems (E/E/PE, or E/E/PES)"

14
Methods

--Process and Tool Chain evaluation (this is the
main focus of DO-178B)
--Probability-based models
--Formal methods
--Traditional methods for code testing, e.g.,
basis path testing
--Standard code-checking tools (e.g., avoiding
inclusion of redundant code)

15
fig_08_01
Design and review process steps
16
fig_08_02

Coding
Trade-off: traditional efficiency (speed/space) vs. better reliability
Some examples:
Array declarations: const may not be required but is preferred, e.g.
const int size = 5; int myarray[size];
Make sure initialization is explicit, do not depend on compiler, e.g.
int tot = 0; for (int j = 0; j < 10; j++) tot = tot + j;
Do not depend on lazy evaluation, e.g.
if ((a != 0) && (b/a < 0)) --> if (a != 0) if (b/a < 0)
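
The last two guidelines can be put into small, testable functions. The sketch below uses hypothetical function names. Note that C does guarantee left-to-right short-circuit evaluation of &&, so the nested form is a defensive-style and readability choice rather than a correctness fix.

```c
#include <stdbool.h>

/* Sums 0 .. n-1 with an explicitly initialized accumulator. */
int sum_below(int n)
{
    int tot = 0;                 /* explicit: never rely on a default value */
    for (int j = 0; j < n; j++)
        tot = tot + j;
    return tot;
}

/* Returns true when a is nonzero and b/a is negative.
 * The division is only reached inside the a != 0 guard.
 * The INT_MIN / -1 overflow case is not handled in this sketch. */
bool quotient_is_negative(int a, int b)
{
    if (a != 0) {
        if (b / a < 0)
            return true;
    }
    return false;
}
```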

17
fig_08_02
Primitive C error-handling: may not be sufficient for embedded system
Assert
18
fig_08_03
Example: good for debugging stage, allows controlled crash. Not robust enough for final code.
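
One common alternative to assert in final code is to return a status the caller can act on. The following is a minimal sketch of that idea, not the example from the figure. The names buf_status_t and checked_read are hypothetical.

```c
#include <stddef.h>

/* Status codes the caller can test and recover from. */
typedef enum { BUF_OK = 0, BUF_NULL, BUF_RANGE } buf_status_t;

/* Reads buf[index] into *out.
 * Where a debug build might use assert(index < len) and crash, this
 * version reports the problem and leaves *out unchanged. */
buf_status_t checked_read(const int *buf, size_t len, size_t index, int *out)
{
    if (buf == NULL || out == NULL)
        return BUF_NULL;
    if (index >= len)
        return BUF_RANGE;
    *out = buf[index];
    return BUF_OK;
}
```

The caller decides what a graceful reaction is, for example retrying, using a default value, or entering a reduced-capability mode.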
19
fig_08_04
Jump statements: consequences may not be acceptable
20
fig_08_05
Example. Better: high compiler warning level, variable typing, e.g.
21
fig_08_06
Example system:
--Control
--Memory
--Data / comm
--Power / reset
--Peripherals
--Clock
22
fig_08_07
Basic method: redundancy (triple)
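
Triple redundancy is usually paired with a majority voter. A minimal software sketch follows, assuming three 32-bit channel outputs. The function names are hypothetical and the figure may show a hardware voter instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bitwise 2-of-3 majority: each output bit takes the value that at
 * least two of the three inputs agree on. */
uint32_t tmr_vote(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (a & c) | (b & c);
}

/* Whole-value vote: succeeds only if at least two channels hold the
 * same value, and reports total disagreement to the caller. */
bool tmr_vote_value(uint32_t a, uint32_t b, uint32_t c, uint32_t *out)
{
    if (a == b || a == c) { *out = a; return true; }
    if (b == c)           { *out = b; return true; }
    return false;
}
```

The bitwise form always produces a result. The whole-value form lets the system detect the case where all three channels disagree.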
23
fig_08_08
Higher redundancy
24
fig_08_09
Reduced capability in case of failure / error
25
fig_08_10
Alternative: monitor only
26
fig_08_11
Bussing: interconnection architectures
27
fig_08_12
Sequential: still can fail at one point
28
fig_08_13
Better: ring
29
fig_08_14
Even better: ring with redundancy
30
fig_08_15
Signal values: magnitude, duration; ignore, detect / warn, react
31
fig_08_16
Data errors: detect / correct. Example: errors in 3 bits
32
fig_08_17
Error detection example
33
fig_08_18
Hamming code (review)
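
As a refresher, here is a minimal sketch of a Hamming(7,4) code in C. The bit layout (parity bits at positions 1, 2 and 4, code word bit i stored in bit i of a byte) and the function names are assumptions, and the figure may use a different arrangement.

```c
#include <stdint.h>

/* Returns code word bit at position pos (1..7); bit 0 of the byte is unused. */
static unsigned bit(uint8_t w, unsigned pos) { return (w >> pos) & 1u; }

/* Encodes the low 4 bits of data into a 7-bit code word. */
uint8_t hamming74_encode(uint8_t data)
{
    unsigned d1 = data & 1u, d2 = (data >> 1) & 1u;
    unsigned d3 = (data >> 2) & 1u, d4 = (data >> 3) & 1u;
    unsigned p1 = d1 ^ d2 ^ d4;   /* covers positions 1, 3, 5, 7 */
    unsigned p2 = d1 ^ d3 ^ d4;   /* covers positions 2, 3, 6, 7 */
    unsigned p4 = d2 ^ d3 ^ d4;   /* covers positions 4, 5, 6, 7 */
    return (uint8_t)((p1 << 1) | (p2 << 2) | (d1 << 3) | (p4 << 4) |
                     (d2 << 5) | (d3 << 6) | (d4 << 7));
}

/* Corrects up to one flipped bit and returns the 4 data bits.
 * *syndrome_out (if not null) receives the position of the corrected
 * bit, or 0 when no error was seen. */
uint8_t hamming74_decode(uint8_t code, unsigned *syndrome_out)
{
    unsigned s1 = bit(code, 1) ^ bit(code, 3) ^ bit(code, 5) ^ bit(code, 7);
    unsigned s2 = bit(code, 2) ^ bit(code, 3) ^ bit(code, 6) ^ bit(code, 7);
    unsigned s4 = bit(code, 4) ^ bit(code, 5) ^ bit(code, 6) ^ bit(code, 7);
    unsigned syndrome = s1 | (s2 << 1) | (s4 << 2);
    if (syndrome != 0)
        code = (uint8_t)(code ^ (1u << syndrome));  /* flip the bad bit back */
    if (syndrome_out)
        *syndrome_out = syndrome;
    return (uint8_t)(bit(code, 3) | (bit(code, 5) << 1) |
                     (bit(code, 6) << 2) | (bit(code, 7) << 3));
}
```

This code corrects any single-bit error. A double-bit error produces a nonzero syndrome that points at the wrong bit, so it is miscorrected rather than detected.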
34
fig_08_22
Block codes example: lateral and longitudinal parity
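
Lateral and longitudinal parity can be sketched in C as follows. The assumptions here are mine: a block of four bytes (BLOCK_ROWS is a hypothetical size), even parity, and parity information that itself arrives intact. Lateral parity is one bit per byte, and longitudinal parity is the XOR of all bytes, i.e. one parity bit per bit column.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_ROWS 4   /* hypothetical block size */

typedef struct {
    uint8_t row_parity[BLOCK_ROWS]; /* lateral: even-parity bit per byte */
    uint8_t col_parity;             /* longitudinal: XOR of all bytes */
} block_parity_t;

/* Returns 1 if x has an odd number of set bits, else 0. */
static uint8_t parity8(uint8_t x)
{
    x = (uint8_t)(x ^ (x >> 4));
    x = (uint8_t)(x ^ (x >> 2));
    x = (uint8_t)(x ^ (x >> 1));
    return (uint8_t)(x & 1u);
}

block_parity_t compute_parity(const uint8_t data[BLOCK_ROWS])
{
    block_parity_t p = { {0}, 0 };
    for (size_t r = 0; r < BLOCK_ROWS; r++) {
        p.row_parity[r] = parity8(data[r]);
        p.col_parity = (uint8_t)(p.col_parity ^ data[r]);
    }
    return p;
}

/* Returns 0 if the block matches the sent parity, 1 if a single-bit
 * error was located and corrected in place, -1 if an error was
 * detected but could not be located. */
int check_and_correct(uint8_t data[BLOCK_ROWS], const block_parity_t *sent)
{
    block_parity_t now = compute_parity(data);
    uint8_t col_diff = (uint8_t)(now.col_parity ^ sent->col_parity);
    size_t bad_rows = 0, bad_row = 0;
    for (size_t r = 0; r < BLOCK_ROWS; r++) {
        if (now.row_parity[r] != sent->row_parity[r]) {
            bad_rows++;
            bad_row = r;
        }
    }
    if (bad_rows == 0 && col_diff == 0)
        return 0;
    /* One bad row and exactly one bad column pinpoint a single bit. */
    if (bad_rows == 1 && col_diff != 0 && (col_diff & (col_diff - 1)) == 0) {
        data[bad_row] = (uint8_t)(data[bad_row] ^ col_diff);
        return 1;
    }
    return -1;
}
```

The row check gives the byte and the column check gives the bit within it, which is what allows correction rather than detection only.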
35
fig_08_23
36
fig_08_24
More complex codes: use the field Z2
37
fig_08_25
Shift register for encoding, decoding
38
fig_08_26
Checking data
39
fig_08_27
Syndrome calculator
40
fig_08_28
Encoding
41
fig_08_29
Some polynomials: must choose correct one
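
The shift-register encoding, data checking, and syndrome calculation of the preceding slides can be mimicked in software. The transcript does not name a specific code or polynomial, so this sketch assumes an 8-bit CRC with generator x^8 + x^2 + x + 1 (0x07), zero initial register, and no bit reflection. The name crc8 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC8_POLY 0x07u   /* x^8 + x^2 + x + 1 (assumed generator) */

/* Bit-serial CRC over Z2, modelling an 8-bit shift register with XOR
 * feedback: shift left one bit at a time and, whenever a 1 falls out
 * of the top, XOR the generator into the register. */
uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t reg = 0;
    for (size_t i = 0; i < len; i++) {
        reg = (uint8_t)(reg ^ data[i]);
        for (int k = 0; k < 8; k++) {
            if (reg & 0x80u)
                reg = (uint8_t)((reg << 1) ^ CRC8_POLY);
            else
                reg = (uint8_t)(reg << 1);
        }
    }
    return reg;
}
```

Encoding appends crc8(message) to the message. Checking runs crc8 over message plus check byte: the result acts as the syndrome, which is zero for an intact frame and nonzero for any single-bit error.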
42
fig_08_30
Power system
43
fig_08_31
Redundancy and power monitoring
44
fig_08_32
Potential actions
45
fig_08_33
Using backups
46
fig_08_34
Backups: short-term fix
47
fig_08_35
Bus faults: buffering
48
fig_08_36
Bus testing
49
fig_08_37
Interface system monitoring and testing
50
table_08_00
Example: common fault analysis
