Software Reliability

Provided by: Department518
Learn more at: http://wwwx.cs.unc.edu

1
Software Reliability
  • 25 September 2006

2
About the Evening Lectures
  • Viewing is required
  • All lectures will be recorded and shown during a
    regular class period
  • Working on getting them posted on the web so that
    you can download them at other times as well
  • Sign-in sheet at lecture
  • Assignment: two-paragraph summary of what you
    learned
  • Dinner lottery

3
About the Midterm
  • Use of Blackboard
  • http://help.unc.edu/?id=4735&trail=4781
  • Installing SecureExam (see Guidelines on home
    page)
  • Later this week, I will post a dummy exam that
    you are all to take BEFORE the midterm to assure
    that everything is working properly

4
Simplified Model of a Computer
  • Processor
  • Control Unit: retrieves the instruction, directs
    data movement
  • Arithmetic Logic Unit: performs the operations
  • Memory
  • Instructions: define an algorithm
  • Data: the information that it works on

5
Points to Remember
  • Computers access information by its location;
    they do not know its value
  • Computers store numbers in fixed-size packets,
    which means that numbers cannot grow indefinitely
  • Computers do not distinguish between different
    types of data (e.g., instructions or text or
    numbers)
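
The last two points can be illustrated with a short Python sketch. It is not part of the original slides; the byte values and the helper name fits_in_32_bits are made up for illustration. The same four bytes are read as text, as an integer and as a floating-point number, and a value that is too large for a 32-bit packet is rejected.

```python
import struct

# Four bytes in memory. Nothing in the bytes says what type they are.
packet = bytes([0x42, 0x75, 0x67, 0x21])

as_text = packet.decode("ascii")           # read as four characters
as_int = struct.unpack(">I", packet)[0]    # read as a 32-bit unsigned integer
as_float = struct.unpack(">f", packet)[0]  # read as a 32-bit float


def fits_in_32_bits(n):
    """Return True if n can be stored in a 32-bit unsigned packet."""
    try:
        struct.pack(">I", n)
        return True
    except struct.error:
        # The packet has a fixed size, so the number cannot grow past it.
        return False


print(as_text, as_int, as_float)
print(fits_in_32_bits(2**32 - 1), fits_in_32_bits(2**32))
```

Which interpretation is "right" depends entirely on what the program does with the bytes, not on the bytes themselves.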

6
Review: Computerized Systems
  • Finance: banking, stock market, commerce
  • Medical: diagnostics, life support, medical
    devices
  • Communications: television, radio, news, networks
  • Transportation: traffic signals, air traffic
    control, aircraft, spacecraft, trains, cars
  • Military: weapons systems, intelligence gathering
  • Energy: power plants, toxic chemical plants, oil,
    gas
  • Water and sewer
  • Buildings: HVAC, security, lights
  • Personal: household items

7
What is a Bug?
8
Bug
  • Problems in code that cause it to behave in an
    unintended, unanticipated or unpredictable manner
  • Origin
  • Grace Hopper (1906-1992), 1947: moth in a relay
  • "First actual case of bug being found."
  • Thomas Edison used the term in 1878:
  • "'Bugs,' as such little faults and difficulties
    are called"

9
First Computer Bug
10
Why are bugs hard to find?
  • The error can appear in another program
  • Device drivers, memory management
  • The error may only occur occasionally
  • May require multiple conditions to occur

11
Classes of Problems
  • Poorly designed software
  • Poorly understood requirements
  • Poorly designed user interfaces
  • Improper use
  • Data entry problems
  • Simple coding errors

12
80% of software projects fail
  • 50% challenged
  • 2x budget
  • 2x completion time
  • 2/3 planned function
  • 30% impaired
  • Scrapped
  • Standish Group, 1995

13
Sources of Risk
  • Top management commitment
  • User commitment
  • Misunderstood requirements
  • Inadequate user involvement
  • Mismanaged user expectations
  • Scope creep
  • Lack of knowledge or skill
  • Keil et al., "A Framework for Identifying
    Software Project Risks," CACM 41(11), November
    1998.

14
Can't We Test Out the Problems?
  • To establish that the probability of software
    failure is less than 10^-9 over 10 hours, more
    than 1 million years of testing on one computer
    is required
  • Butler and Finelli, "The Infeasibility of
    Experimental Quantification of Life-Critical
    Software Reliability"
  • NIST estimates the cost to the US economy from
    inadequate software testing at > $59 billion/yr.
  • NIST Planning Report 02-3
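
The "1 million years" figure can be checked with rough arithmetic. The sketch below is not from the slides, and the function name is invented. It assumes, as a simplification, that demonstrating a failure rate takes at least about one mean time between failures of failure-free testing; the full statistical argument in the Butler and Finelli paper is more careful.

```python
HOURS_PER_YEAR = 24 * 365.25


def years_of_testing(failure_probability, mission_hours):
    """Rough test time needed to observe the stated failure rate once."""
    failures_per_hour = failure_probability / mission_hours
    hours_needed = 1 / failures_per_hour  # mean time between failures
    return hours_needed / HOURS_PER_YEAR


# 10^-9 probability of failure over a 10-hour period
years = years_of_testing(1e-9, 10)
print(f"{years:,.0f} years on one computer")
```

Under this assumption the result is a little over 1.1 million years, consistent with the figure quoted above.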

15
Simple Problems
  • Tampa couple was billed $4,062,599.57 for a
    month's electricity
  • Correct bill was $146.76
  • Input error that clearly was not checked well
    enough: check for reasonable values
  • High School freshman banned from football because
    of drug use in middle school
  • Actual offense was chewing gum and being tardy
  • Different codes not properly translated - systems
    are only as good as their weakest links
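
A check for reasonable values, as called for in the electricity bill example, might look like the following sketch. Nothing here is the utility's actual logic: the function name, the bill history and the factor of 10 are all assumptions chosen for illustration.

```python
def is_reasonable_bill(amount, recent_bills, max_ratio=10.0):
    """Flag bills that are negative or far above the customer's typical bill.

    recent_bills is a list of earlier monthly amounts (hypothetical data).
    max_ratio is an arbitrary threshold, not an industry standard.
    """
    if amount < 0:
        return False
    if not recent_bills:
        return True  # nothing to compare against
    typical = sorted(recent_bills)[len(recent_bills) // 2]  # median-like value
    return amount <= typical * max_ratio


history = [130.00, 150.00, 160.00]  # made-up earlier bills
print(is_reasonable_bill(146.76, history))
print(is_reasonable_bill(4_062_599.57, history))
```

A bill that fails the check would be held for a person to review rather than mailed automatically.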

16
User Interface Bug
  • Usability Issue
  • Afghanistan War (December 2001)
  • Friendly fire kills 3, injures 20 when a
    satellite-guided bomb landed on a battalion
    command post
  • Use of GPS receiver to determine coordinates
  • Change battery
  • What should come up?
  • www.washingtonpost.com/ac2/wp-dyn/A8853-2002Mar23

17
Denver Airport Baggage System (1995)
  • 4 years in development at a cost of $193M
  • The promise:
  • Baggage delivered in < 10 minutes to any part of
    the airport!
  • Massively complex system:
  • 4000 cars
  • 21 miles of track
  • scanners
  • photocells
  • 300 computers
  • What happened:
  • Cars misrouted and crashed, baggage lost and
    damaged
  • Delayed opening cost $1.1M/day
  • When the airport opened a year late, only one
    airline used the system
  • www.cis.gsu.edu/mmoore/CIS3300/handouts/SciAmSept1994.html

18
Denver Airport Baggage System (1995)
  • 4 years in development at a cost of $193M
  • Massively complex system
  • 4000 cars, 21 miles of track, scanners,
    photocells, 300 computers
  • Cars misrouted and crashed, baggage lost and
    damaged
  • Delayed opening cost $1.1M/day
  • When the airport opened a year late, only one
    airline used it
  • www.cis.gsu.edu/mmoore/CIS3300/handouts/SciAmSept1994.html

19
Denver Airport System
  • Examples of bugs
  • Photocell could not detect bags on the belt and
    therefore didn't stop the system
  • System lost track of the state of carts during
    jams
  • Timing between conveyor belts and carts not
    properly synchronized
  • Overall
  • Not just software glitches
  • Very complex, poorly engineered system

20
Ariane 5 (1996)
  • Software error: integer overflow

21
External view
  • Only about 40 seconds after initiation of the
    flight sequence, at an altitude of about 3700 m,
    the launcher veered off its flight path, broke up
    and exploded

22
External view
23
Cost
  • Development cost: $7 billion
  • Delay of more than one year
  • One set of four identical, uninsured scientific
    satellites
  • One rocket
  • $500,000,000

24
What Happened?
  • Overflow: tried to put too big a number into too
    small a space
  • Even worse: the feature that caused the problem
    wasn't needed in flight! It was only needed to
    set up the launch!
  • archive.eiffel.com/doc/manuals/technology/contract/ariane/page.html
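
"Too big a number into too small a space" can be sketched in Python. This is an illustration only, not the Ariane flight code: the function names and the sample value 40000.0 are assumptions, and a 16-bit signed integer is used as the "small space". The sketch shows only the arithmetic; how the flight software actually responded is described in the report linked above.

```python
INT16_MIN, INT16_MAX = -2**15, 2**15 - 1


def to_int16_unchecked(value):
    """Keep only the low 16 bits, the way a raw conversion would."""
    n = int(value) & 0xFFFF
    return n - 0x10000 if n > INT16_MAX else n


def to_int16_checked(value):
    """Refuse values that do not fit instead of silently corrupting them."""
    n = int(value)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return n


reading = 40000.0  # hypothetical sensor-derived value, larger than 32767
print(to_int16_unchecked(reading))  # a meaningless negative number
try:
    to_int16_checked(reading)
except OverflowError as err:
    print("rejected:", err)
```

Detecting the overflow is only half of the design question; the program also needs a sensible plan for what to do once it is detected.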

25
Bank of New York: November 20, 1985
  • BoNY: nation's largest clearer of government
    securities.
  • Software to track Federal securities transactions
    wrote new information on top of old.
  • The Fed debited the bank for each transaction, but
    the bank did not know who owed it how much.
  • 90 minutes => $32 billion overdraft!

26
Cost of Bug
  • Bank had to borrow $24 billion from the Federal
    Reserve. Interest paid: $5 million for 1 day.
    (Annual earnings of bank: $120 million)
  • BoNY share prices dropped by 25%
  • Federal funds rate dropped from 8.4% to 5.5%
  • System down for 28 hours.
  • Fear of financial crisis caused increase in price
    of platinum!

27
Cause of bug
  • Message buffer counter in the BoNY system was 16
    bits long.
  • Counters at the Fed (and other banks): 32 bits.
  • More than 32,000 transactions that morning!
    => Counter overflow
  • Securities database corrupted.
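
The mismatch between a 16-bit and a 32-bit counter can be simulated as follows. This is a sketch under assumptions, not the bank's software: the helper names are invented, and the counter is assumed to be a signed 16-bit value, which the slides do not state but which matches the "more than 32,000" threshold (the largest signed 16-bit value is 32,767).

```python
def wrap_signed(value, bits):
    """Wrap value into the range of a signed integer with the given width."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half


def count_transactions(n, bits):
    """Count n transactions using a counter that is `bits` wide."""
    counter = 0
    for _ in range(n):
        counter = wrap_signed(counter + 1, bits)
    return counter


print(count_transactions(32_767, 16))  # last value that still fits
print(count_transactions(32_768, 16))  # one more transaction: counter wraps
print(count_transactions(32_768, 32))  # a 32-bit counter keeps counting
```

Once the two sides disagree about the count, every record indexed by that counter is suspect.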

28
The Drama continues
  • Trying to correct it, they copied corrupted data
    over the backup.
  • Lost a few hours because of this.
  • Reference: Wiener, Digital Woes, 1993

29
Therac-25
  • Landmark case of how things can go terribly wrong
  • Medical linear accelerator: radiation therapy for
    cancer patients
  • Used to zap tumors with high energy beams
  • Electron beams for shallow tissue
  • X-ray photons for deeper tissue
  • Eleven Therac-25s were installed
  • Six in Canada
  • Five in the United States
  • Developed by Atomic Energy of Canada Limited
    (AECL).

30
Therac-25
  • Improvements over Therac-20
  • Uses new double pass technique to accelerate
    electrons.
  • Machine itself takes up less space.
  • Other differences from the Therac-20
  • Software now coupled to the rest of the system
    and responsible for safety checks.
  • Hardware safety interlocks removed.
  • Easier to use.

31
Therac-25 Turntable
32
1985-1987: Six known accidents
  • June 1985: Patient in Marietta, GA received
    overdose
  • July 1985: Hamilton, Ontario patient severely
    burned, died that November.
  • December 1985: Patient in Yakima, WA: overdose

33
Vernon Kidd
  • Early March 1986, Tyler, TX
  • Receives dose > 100 times too high
  • Complained he felt burned.
  • Engineer: "It's not possible for Therac-25 to
    give an overdose."
  • Engineering firm: "Machine does not appear
    capable of giving a patient an electrical
    shock..."
  • Died 5 months later
  • Machine put back in use in late March

34
What Went Wrong?
  • User Interface
  • Operator entered code for high energy rather than
    low energy
  • Malfunction message
  • Operator entered "Proceed" because the system was
    known to give quirky errors
  • Result
  • Turntable was in the wrong position

35
3 Weeks Later: Ray Cox
  • Second accident in Tyler, TX
  • Same operator
  • Patient died 1 month later
  • This time they were able to reproduce the problem
36
What would cause that to happen?
  • Race conditions.
  • Several different race condition bugs.
  • Overflow error.
  • The turntable position was not checked on every
    256th increment of the Class3 variable.
  • No hardware safety interlocks.
  • Wrong information on the console.
  • Non-descriptive error messages.
  • Malfunction 54
  • H-tilt
  • User-override-able error modes.
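
The overflow bullet above can be simulated in a few lines. This is a simplified sketch, not the Therac-25 code: the function name and loop are invented, and it assumes that Class3 is a one-byte variable in which any nonzero value means "check the turntable position", so that incrementing it wraps to zero on every 256th pass.

```python
def count_skipped_checks(passes, fixed=False):
    """Count setup passes in which the position check would be skipped."""
    class3 = 0
    skipped = 0
    for _ in range(passes):
        if fixed:
            class3 = 1  # set a constant nonzero flag instead of incrementing
        else:
            class3 = (class3 + 1) % 256  # one byte: 255 + 1 wraps to 0
        if class3 == 0:
            skipped += 1  # zero reads as "nothing to check"
    return skipped


print(count_skipped_checks(1024))
print(count_skipped_checks(1024, fixed=True))
```

Using a counter where a simple true/false flag was intended is what opens the window; assigning a fixed nonzero value closes it.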

37
Source of the Bug
  • Incompetent engineering.
  • Safety analysis excluded the software!
  • No usability testing.

38
Sources
  • Leveson, N., Turner, C. S., "An Investigation of
    the Therac-25 Accidents," IEEE Computer, Vol. 26,
    No. 7, July 1993, pp. 18-41.
    http://courses.cs.vt.edu/cs3604/lib/Therac_25/Therac_1.html
  • "Information for this article was largely obtained
    from primary sources including official FDA
    documents and internal memos, lawsuit
    depositions, letters, and various other sources
    that are not publicly available." (The authors)

39
Lots more stories
  • Links will be added to the references section of
    the web site
  • http://www5.in.tum.de/huckle/bugse.html
  • http://www.baddesigns.com/

40
Final Discussion
  • Should Microsoft be held responsible for the
    business problems and viruses caused by security
    holes in their software?