Title: Why do Projects Fail?
1. Why do Projects Fail?
- Lecture 1: The Problem
- Martyn Thomas
- Visiting Professor, OUCL
2. A vision for dependable software
"The vision is that, well before the decade has run to completion, we shall be able to design and implement the kind of systems that are now straining our programming ability, at the expense of only a few percent in man-years of what they cost us now, and that besides that, these systems will be virtually free of bugs."
E. W. Dijkstra, 1972 Turing Award Lecture
3. Software in the 21st Century
- Software engineering is fifty years old, yet still immature.
- We are planning drive-by-wire cars that guide themselves on intelligent roads.
- We are dreaming if we believe we can build such real-world systems safely with today's attitudes to software engineering.
- We have still not achieved Dijkstra's vision of thirty years ago!
4. Thirty years after Dijkstra's vision, most computing system projects fail
- Project cancellation
- Major cost or time overrun
- Much less functionality than planned
- Security inadequate
- Major usability problems
- Excessive maintenance / upgrade costs
- Serious in-service failure
5. Most software projects fail
- Cancelled before delivery: 31%
- Exceeded timescales and costs, or delivered greatly reduced functionality: 53%
- On time and on budget: 16%
- Mean time overrun: 190%
- Mean cost overrun: 222%
- Mean functionality delivered: 60%
- Large companies fare much worse than smaller ones.
- Recent figures are better, but still poor.
- Source: The Chaos Report (1995), http://www.standishgroup.com
6. Most UK computing projects fail
- Of 1,027 projects, 130 (12.7%) succeeded.
- Of those 130:
  - 2.3% were development projects (that's only THREE projects!)
  - 18.2% were maintenance projects
  - 79.5% were data-conversion projects
- Of the 500 development projects in the sample, THREE (0.6%) succeeded.
- Source: BCS Review 2001, page 62.
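The percentages on this slide follow directly from the counts it quotes. As a minimal sketch (the package `Survey`, the function `Percent` and the procedure `Show_Survey` are hypothetical names of our own, not from the BCS report), the arithmetic can be checked in Ada 2012; the fence holds two compilation units, which GNAT users can split into files with `gnatchop`:

```ada
--  Hypothetical helper: Part as a percentage of Whole.
--  The precondition rules out division by zero.
package Survey is
   function Percent (Part, Whole : Natural) return Float is
     (100.0 * Float (Part) / Float (Whole))
     with Pre => Whole > 0;
end Survey;

with Ada.Text_IO;       use Ada.Text_IO;
with Ada.Float_Text_IO; use Ada.Float_Text_IO;
with Survey;            use Survey;
procedure Show_Survey is
begin
   --  130 successes out of 1,027 projects.
   Put ("All projects succeeding (%):");
   Put (Percent (130, 1_027), Fore => 3, Aft => 1, Exp => 0);
   New_Line;
   --  3 development projects among the 130 successes.
   Put ("Successes that were development projects (%):");
   Put (Percent (3, 130), Fore => 3, Aft => 1, Exp => 0);
   New_Line;
   --  3 successes among the 500 development projects.
   Put ("Development projects succeeding (%):");
   Put (Percent (3, 500), Fore => 3, Aft => 1, Exp => 0);
   New_Line;
end Show_Survey;
```

Rounded to one decimal place, these reproduce the slide's 12.7%, 2.3% and 0.6%.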
8. Why does it happen?
- Because:
  - scale matters: small processes don't scale up;
  - process matters: most developers lack discipline;
  - rigour matters: most developers are afraid of mathematics;
  - engineering is conservative, whereas the software industry is ruled by fashion (the CAA licensing system; C vs Ada at Lockheed Martin; eXtreme this, Agile that ...).
- Who can make things better? You can!
9. Scale
- How many valid paths are there through a 200-line module?
  - We have found around 750,000.
- How big are modern systems?
  - Windows is 100M LoC.
  - Oracle talk about a gigaLoC code base.
- How many paths is that?
- How many do you think they have tested?
- What proportion will ever be executed?
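To see why path counts explode, consider the simplest case. A minimal sketch, assuming a loop-free module whose decisions are independent and follow one another in sequence: each decision multiplies the number of paths by its number of arms, so twenty if-then-else statements already give 2^20 = 1,048,576 paths. Real modules differ (correlated conditions make some paths infeasible, and loops make the count unbounded), so this illustrates the growth rather than reproducing the 750,000 figure. The names `Path_Count`, `Paths` and `Show_Paths` are hypothetical; the code is Ada 2012, and the fence holds several compilation units that `gnatchop` can split into files.

```ada
package Path_Count is
   --  Branch_Counts (I) is the number of arms of the I-th decision:
   --  2 for an if-then-else, 3 for a three-way case, and so on.
   type Count_Array is array (Positive range <>) of Positive;

   --  Paths through a loop-free sequence of independent decisions.
   function Paths (Branch_Counts : Count_Array) return Long_Long_Integer;
end Path_Count;

package body Path_Count is
   function Paths (Branch_Counts : Count_Array) return Long_Long_Integer is
      Total : Long_Long_Integer := 1;
   begin
      --  Every decision multiplies the number of distinct paths.
      for B of Branch_Counts loop
         Total := Total * Long_Long_Integer (B);
      end loop;
      return Total;
   end Paths;
end Path_Count;

with Ada.Text_IO; use Ada.Text_IO;
with Path_Count;  use Path_Count;
procedure Show_Paths is
begin
   --  Twenty if-then-else statements in sequence.
   Put_Line (Long_Long_Integer'Image (Paths ((1 .. 20 => 2))));
end Show_Paths;
```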
10. A medium-scale system: En Route ATC at Swanwick
11. LACC Control Room
12. A medium-sized system
- 114 controller workstations
- 20 supervisory/management positions
- 10 engineering positions
- 48-workstation simulator
- Two 15-workstation test systems
- 2.5 million lines of software
- More than 500 processors
13. Challenges for the future
- Current ATC safety depends on the controllers' ability to clear their sector using radio alone.
- Future traffic growth will require more than 10 aircraft on frequency; controllers would be overloaded.
- So future ATC will depend on automatic systems, which must not fail.
- Target? Probably the avionics standard: 10^-9 pfh (probability of failure per hour) with 99% confidence.
- No current air traffic management systems are built to such standards. This could be your job in a few years' time.
14. How can we be sure a system works?
- Assurance: showing that a system works
  - Much harder than just developing a system that works
  - You need to generate evidence that it works
  - What evidence is sufficient?
- How safe or reliable is a system that has never failed?
- What evidence does testing provide?
- How can we do better?
15. What evidence does testing provide?
- We cannot test every path.
- Testing individual operations or boundary conditions may find faults, but such tests provide no evidence of failure probability.
- Statistical testing under operational conditions provides evidence of pfh.
  - But it takes a very long time.

"Testing shows the presence, not the absence, of bugs."
E. W. Dijkstra
16. Finding faults by testing?

    type Alert is (Warning, Caution, Advisory);

    function RingBell (Event : Alert) return Boolean
    -- return True for Event = Warning or Event = Caution,
    -- return False for Event = Advisory
    is
       Result : Boolean;
    begin
       if Event = Warning then
          Result := True;
       elsif Event = Advisory then
          Result := False;
       end if;
       return Result;
    end RingBell;

    -- C130J code: for Caution, returns an uninitialised value
    -- (usually TRUE, as required).
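Ada itself offers a defence against exactly this fault. A `case` statement with no `when others` arm must cover every value of `Alert`, so leaving out `Caution` is rejected at compile time instead of silently reading an uninitialised `Result`. A minimal sketch of a corrected version (the package name `Alerts` is our own choice, and this is not the actual C130J code); the fence holds a spec and a body, which `gnatchop` can split into files:

```ada
package Alerts is
   type Alert is (Warning, Caution, Advisory);

   --  Returns True for Warning or Caution, False for Advisory.
   function RingBell (Event : Alert) return Boolean;
end Alerts;

package body Alerts is
   function RingBell (Event : Alert) return Boolean is
   begin
      --  No "when others": the compiler checks that every Alert
      --  value is handled, so a forgotten Caution will not compile.
      case Event is
         when Warning | Caution => return True;
         when Advisory          => return False;
      end case;
   end RingBell;
end Alerts;
```

The design choice is to avoid both the local variable and the catch-all branch, so that adding a new `Alert` value later also forces every `case` over it to be revisited.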
17. How safe is a system that has never failed?
- If it has run for n hours without failure, and if the operating conditions remain much the same, the best estimate for the probability of failure in the next n hours is 0.5.
- To show that a system has a pfh of 10^-9 with even 50% confidence, we need over 100,000 years of fault-free testing. (10,000 hours is 13.89 months.)
18. Statistical testing
- To show an MTBF of n hours with 99% confidence takes around 10n hours of testing with no faults found. So avionics (10^-9 pfh) would need around 10^10 hours (more than 1,000,000 years).
- With good prior evidence, e.g. from a strong process, a Bayesian approach may reduce this to 100,000 years.
- Actual testing is trivially short by comparison.
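The durations on the last two slides are unit conversions, which a short sketch can check. It assumes a Julian year of 365.25 days and, for the "13.89 months" figure, 30-day months; the factor of 10 is the slide's rule of thumb, taken as given rather than derived here. All names (`Test_Time`, `To_Years`, `To_Months_30`, `Hours_For_MTBF_99`, `Show_Test_Time`) are hypothetical. The code is Ada 2012, and `gnatchop` can split the two units into files.

```ada
package Test_Time is
   --  Assumed conventions: Julian years and 30-day months.
   Hours_Per_Year     : constant Long_Float := 24.0 * 365.25;
   Hours_Per_Month_30 : constant Long_Float := 24.0 * 30.0;

   function To_Years (Hours : Long_Float) return Long_Float is
     (Hours / Hours_Per_Year);

   function To_Months_30 (Hours : Long_Float) return Long_Float is
     (Hours / Hours_Per_Month_30);

   --  Rule of thumb quoted on the slide (not derived here): an MTBF of
   --  N hours at 99% confidence needs about 10 * N failure-free hours.
   function Hours_For_MTBF_99 (MTBF_Hours : Long_Float) return Long_Float is
     (10.0 * MTBF_Hours);
end Test_Time;

with Ada.Text_IO;            use Ada.Text_IO;
with Ada.Long_Float_Text_IO; use Ada.Long_Float_Text_IO;
with Test_Time;              use Test_Time;
procedure Show_Test_Time is
   --  A pfh of 10^-9 corresponds to an MTBF of 10^9 hours.
   MTBF : constant Long_Float := 1.0E9;
begin
   Put ("10^9 hours in years:");
   Put (To_Years (MTBF), Fore => 1, Aft => 1, Exp => 0);
   New_Line;
   Put ("Years of testing for 10^-9 pfh at 99%:");
   Put (To_Years (Hours_For_MTBF_99 (MTBF)), Fore => 1, Aft => 1, Exp => 0);
   New_Line;
   Put ("10,000 hours in 30-day months:");
   Put (To_Months_30 (10_000.0), Fore => 1, Aft => 2, Exp => 0);
   New_Line;
end Show_Test_Time;
```

Under these conventions 10^9 hours is roughly 114,000 years and 10^10 hours roughly 1.14 million years, consistent with the "over 100,000 years" and "more than 1,000,000 years" figures.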
19. How can we do better?
- STRONG SOFTWARE ENGINEERING
  - Science-based formal methods
  - Mature engineering processes for managing risk and quality
  - Sound languages, supported by deep static analysis
  - Proof that the implementation has the required safety properties
- The target is ZERO DEFECTS
  - while recognising that there will still be faults, so we must design and program defensively.
20. Summary
- Developing dependable software is difficult because of the size and complexity of real-life systems.
- The software industry is amateurish and immature. Most significant projects overrun dramatically (and unnecessarily) or fail totally.
- In next week's lecture, I will explore why some failures have occurred, show you what a real-life zero-defect project looks like, and talk about what you need to know if you are to become a professional amongst all these amateurs.