Title: Software Reliability: The Physics of Failure
- SJSU ISE 297
- Donald Kerns
- 7/31/00
Thesis
The field of software engineering is a very
complex mix of technology, management and human
psychology. Glib usage of the phrase "software
reliability" implies gross simplifications that
make the measurement useless in all but the most
closely defined situations. A significant
increase in the sophistication of the general
field of software engineering will be necessary
before true measures of software reliability
become meaningful.
Software is not monolithic, yet the literature
treats it as such. Different types of software
have different failure modes and consequences.
Embedded software
- Does your Furby have bugs?
- Microwave? Car?
- Well defined applications
- Little data
- Tightly constrained resources in the delivered
product drive high technical complexity
Batch/Database Driven Software
- Industrial-scale applications
- Highly data intensive
- Usually a low percentage of functionality is user
interaction
- Once running, most apparent software defects
are actually defective data or business rules
User interactive
- Usually event driven
- Defects include
- broken functionality
- behavior different from user expectations
- lack of interoperability with other software
Usage of the word "reliability" implies defects
and failure, but the community (much less the
literature) has yet to settle on what exactly
constitutes a software failure. Most consider only
the first type.
- Catastrophic failures
- Functional failures
- Poor performance
- Wrong answers
- Does not conform to user expectations
All are failures, yet the standard reliability
measures are at a loss to evaluate their different
consequences.
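A minimal C sketch of the problem, assuming mean time between failures (MTBF) as the conventional count-based measure and using invented severity weights (they come from no standard): two failure logs with the same number of failures over the same operating hours get identical MTBF, even though one contains only catastrophic failures and the other only unmet expectations.

```c
#include <assert.h>
#include <stddef.h>

/* The five failure types listed above. */
enum failure_type {
    CATASTROPHIC,
    FUNCTIONAL,
    POOR_PERFORMANCE,
    WRONG_ANSWER,
    UNMET_EXPECTATION
};

/* Conventional MTBF: operating hours divided by failure count.
   Every failure counts the same, whatever its type. */
static double mtbf(double operating_hours, size_t n_failures)
{
    assert(n_failures > 0);
    return operating_hours / (double)n_failures;
}

/* Hypothetical severity weights, chosen only for illustration. */
static double severity(enum failure_type t)
{
    switch (t) {
    case CATASTROPHIC:      return 100.0;
    case FUNCTIONAL:        return 10.0;
    case WRONG_ANSWER:      return 10.0;
    case POOR_PERFORMANCE:  return 1.0;
    case UNMET_EXPECTATION: return 1.0;
    }
    return 0.0;
}

/* Sum of severity weights over a failure log. */
static double weighted_cost(const enum failure_type *log, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += severity(log[i]);
    return total;
}
```

Two failures in 1000 hours gives an MTBF of 500 hours for both logs, while the weighted costs are 200 and 2. The weights are arbitrary; the point is only that a single failure count hides the difference in consequences.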
Frequently, software is just the most visible
element of a complex system. Almost all system
defects first appear to be software errors.
Example: Fire Bay 1
- Normal configuration: 2 satellite bus
- Cost-saving configuration: bigger satellite bus
The separation failure was a software defect only if
the software had been modified to fire bay 1
through the bay 2 wiring and didn't.
More examples
- "The system isn't getting the signal, we must have
a software defect!"
  - Is the system configured to scan that part of the
spectrum?
  - Is the system configured to report the signal
during that portion of the target identification?
  - Is the system configured to report signals of
that priority?
- "The Built-In Test software is reporting a failed
component, we must have a software defect!"
  - No, by reporting a failed component the software
is functioning CORRECTLY.
  - It is the component that has a defect.
Does software age?
No, but the behavior of software depends on the
environment in which it executes, and that
environment may degrade.
- Changes in environment may reveal failure modes
that have lain dormant for the life of the
software.

      if (strcmp(compiler, "Visual C") == 0)
          do_compile_things();
      else if (strcmp(compiler, "Borland C") == 0)
          crash_in_flames();

- Common environment changes
  - Change in system configuration (OS, hardware,
applications).
  - Increased processor loading due to the above.
  - Decreased available memory due to the above.
  - Increased network traffic due to growth.
  - Intentional or unintentional self-modifying
code.
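One of the environment changes listed above, decreased available memory, can be sketched in C. In this minimal, hypothetical example, `env_malloc` and `largest_free_block` are invented stand-ins for the real system. The code never changes, yet the branch that handles an allocation failure never runs until the environment shrinks; a version without that check would dereference a null pointer only after the environment changed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the environment: the largest block the
   system can currently supply. It shrinks as other applications are
   installed or as load grows. */
static size_t largest_free_block = 1024;

static void *env_malloc(size_t n)
{
    if (n > largest_free_block)
        return NULL;    /* the environment can no longer satisfy the request */
    return malloc(n);
}

/* Copies a record into a newly allocated buffer. Returns 0 on success
   and -1 if memory could not be obtained. The NULL check is a code path
   that can lie dormant until the environment degrades. */
static int copy_record(const char *record, char **out)
{
    assert(record != NULL && out != NULL);
    size_t len = strlen(record) + 1;
    char *buf = env_malloc(len);
    if (buf == NULL)
        return -1;
    memcpy(buf, record, len);
    *out = buf;
    return 0;
}
```

Lowering `largest_free_block` to 4 simulates the degraded environment: the same call that succeeded before now returns -1.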
"Does not meet customer expectations" is
considered a software defect; however, there is
almost always a mismatch between customer
expectations and the economics of the situation.
- Windows normally ships with tens of thousands of
defects. Would you pay 10x as much for 10x fewer
defects?
- Heretical thought
  - The methods for producing 80-90% defect-free
software have been known since the late 1960s
(inspections, formal requirements, design and
test).
  - Why aren't they being used?
  - The field of software engineering is a very
complex mix of technology, management and human
psychology.
Finally, even if customer expectations are
clearly documented at the beginning of a software
development effort and properly met during
implementation, the installation of that software
system is a significant change to the environment
that produced those expectations. This yields
new expectations.
- "Well, since that data is now on the computer, we
should be able to..."
  - Share it with our other systems.
  - Work on it with spreadsheets.
  - Put it on the web.
  - Share it with our Aunt Sally.
- "What do you mean that costs more? The software
is defective. Fix it!"
The software AND customer communities will need
to address all of these issues in a formal,
comprehensive, and consistent manner before the
phrase "software reliability" has meaning.
SEI S/W Capability Maturity Model
- 1) Initial. The software process is characterized
as ad hoc, and occasionally even chaotic. Few
processes are defined, and success depends on
individual effort and heroics.
- 2) Repeatable. Basic project management processes
are established to track cost, schedule, and
functionality. The necessary process discipline
is in place to repeat earlier successes on
projects with similar applications.
- 3) Defined. The software process for both
management and engineering activities is
documented, standardized, and integrated into a
standard software process for the organization.
All projects use an approved, tailored version of
the organization's standard software process for
developing and maintaining software.
- 4) Managed. Detailed measures of the software
process and product quality are collected. Both
the software process and products are
quantitatively understood and controlled.
- 5) Optimizing. Continuous process improvement is
enabled by quantitative feedback from the process
and from piloting innovative ideas and
technologies.
Software maturity
Four years of consistent effort...