Michael Hogarth, MD
1. Building Software: An Artful Science
2. Software development is risky

"To err is human; to really foul things up requires a computer."

- IBM Consulting Group survey:
  - 55% of the software developed cost more than projected
  - 68% took longer to complete than predicted
  - 88% had to be substantially redesigned
- Standish Group study of 8,380 software projects (1996):
  - 31% of software projects were canceled before they were completed
  - 53% of those that were completed cost an average of 189% of their original estimates
  - 42% of completed projects had their original set of proposed features and functions
  - 9% were completed on time and on budget
3. Standish Group report, 2006

- 19% of projects were outright failures
- 35% could be categorized as successes (better than 1996, but not great)
- 46% of projects were challenged (either cost overruns or delays, or both)
4. McDonald's gets McFried

- McDonald's "Innovate" project
- $500 million spent for nothing....
- Objective:
  - McDonald's planned to spend $1 billion over five years to tie all its operations into a real-time digital network. Eventually, executives at company headquarters would have been able to see how the soda dispensers and frying machines in every store were performing.
- Why was it scrubbed?
  - Information systems don't scrub toilets and they don't fry potatoes.

Barrett, 2003. http://www.baselinemag.com/c/a/Projects-Supply-Chain/McDonalds-McBusted/
5. FBI's Virtual Case File

- 2003 - Virtual Case File, a networked system for tracking criminal cases
- SAIC spent months writing over 730,000 lines of computer code
- Found to have hundreds of software problems during testing
- The $170 million project was cancelled -- SAIC reaped more than $100 million
- Problems:
  - Delayed by over a year. In 2004, the system had 1/10th of the intended functionality and was thus largely unusable after $170 million spent.
  - SAIC delivered what the FBI requested, but the requesting was flawed, poorly planned, and not tied to scheduled deliverables.
- Now what?
  - Lockheed Martin given a contract for $305 million, tied to benchmarks

http://www.washingtonpost.com/wp-dyn/content/article/2006/08/17/AR2006081701485_pf.html
6. Causes of the VCF failure

- Changing requirements (conceived before 9/11; after 9/11, requirements were altered significantly)
- 14 different managers over the project lifetime (2 years)
- Poor oversight by the primary owner of the project (the FBI), which did not oversee construction closely
- Did not pay attention to new, better commercial products -- kept its head in the sand because the system had to be built fast
- Hardware was purchased first, waiting on software (a common problem) -- if software is delayed, the hardware quickly becomes legacy

http://www.inf.ed.ac.uk/teaching/courses/seoc2/2004_2005/slides/failures.pdf
7. Washington State Licensing Dept

- 1990 - Washington State License Application Mitigation Project
- $41.8 million over 5 years to automate the State's vehicle registration and license renewal process
- 1993 - after $51 million, the original design and requirements were expected to be obsolete by the time the system was finally built
- 1997 - the Washington legislature pulled the plug -- $40 million wasted
- Causes:
  - too ambitious
  - lack of early deliverables
  - development split between in-house staff and a contractor
8. J Sainsbury IT failure

"To err is human; to really foul up requires a root password." - anonymous

- UK food retailer J. Sainsbury invested in an automated supply-chain management system
- The system did not perform the functions as needed
- As a result, merchandise was stuck in company warehouses and not getting to the stores
- The company added 3,000 additional clerks to stock the shelves manually
- They killed the project after spending $526 million.....
9. Other IT nightmares

- 1999 - the $125 million NASA Mars Climate Orbiter was lost in space due to a data conversion error...
- Feb 2003 - the U.S. Treasury Dept. mailed 50,000 Social Security checks without beneficiary names. The checks had to be cancelled and reissued...
- 2004-2005 - UK Inland Revenue (the UK's IRS) software errors contributed to a $3.45 billion tax-credit overpayment
- May 2005 - Toyota had to install a software fix on 20,000 hybrid Prius vehicles due to problems with invalid engine warning lights. It is estimated that the automobile industry spends $2-3 billion per year fixing software problems.
- Sept 2006 - a U.S. Government student loan service software error made public the personal data of 21,000 borrowers on its web site
- 2008 - at the new Terminal 5 at Heathrow Airport, a new automated baggage routing system led to over 20,000 bags being put in temporary storage...
10. Does it really matter?
11. Software bugs can kill...

http://www.wired.com/software/coolapps/news/2005/11/69355
12. When users inadvertently cause disaster

http://www.wired.com/software/coolapps/news/2005/11/69355?currentPage=2
13. How does this happen?

- Many of the runaway projects are overly ambitious -- a major issue (senior management has unrealistic expectations of what can be done)
- Most projects failed because of multiple problems/issues, not one.
- Most problems/issues were management related.
- In spite of obvious signs of a runaway software project (72% of project members are aware), only 19% of senior management is aware.
- Risk management, an important part of identifying trouble and managing it, was NOT done in any fashion in 55% of major runaway projects.
14. Causes of failure

- Project objectives not fully specified -- 51%
- Bad planning and estimating -- 48%
- Technology new to the organization -- 45%
- Inadequate/no project management methods -- 42%
- Insufficient senior staff on the team -- 42%
- Poor performance by suppliers of software/hardware (contractors) -- 42%

http://members.cox.net/johnsuzuki/softfail.htm
15. The cost of IT failures

- 2006 - $1 trillion spent on IT hardware, software, and services worldwide...
- 18% of all IT projects will be abandoned before delivery (18% of $1 trillion ≈ $180 billion?)
- 53% will be delivered late or have cost overruns
- 1995 - Standish estimated the U.S. spent $81 billion on cancelled software projects.....
16. Conclusions

- IT projects are more likely to be unsuccessful than successful
- Only 1 in 5 software projects brings full satisfaction (succeeds)
- The larger the project, the more likely the failure

http://www.it-cortex.com/Stat_Failure_Rate.htm - The Robbins-Gioia Survey (2001)
17. Software as engineering

- Software has been viewed more as art than engineering
  - this has led to a lack of structured methods and organization for building software systems
- Why is a software development methodology important?
  - programmers are expensive
  - many software system failures can be traced to poor software development practices
  - requirements gathering is incomplete or not well organized
  - requirements are not communicated effectively to the software programmers
  - testing is inadequate (because testers don't understand the requirements)
18. Software Development Lifecycle
- Domain Analysis
- Software Analysis
- Requirements Analysis
- Specification Development
- Programming (software coding)
- Testing
- Deployment
- Documentation
- Training and Support
- Maintenance
19. Software Facts and Figures

- Maintenance consumes 40-80% of software costs over the lifetime of a software system -- it is the most important part of the lifecycle
- Error correction accounts for 17% of software maintenance costs
- Enhancement is responsible for 60% of software maintenance costs -- most of the cost is adding new capability to old software, NOT fixing it
- Relative time spent on phases of the lifecycle:
  - Development -- defining requirements (15%), design (20%), programming (20%), testing and error removal (40%), documentation (5%)
  - Maintenance -- defining the change (15%), documentation review (5%), tracing logic (25%), implementing the change (20%), testing (30%), updating documentation (5%)

RL Glass. Facts and Fallacies of Software Engineering. 2003.
20. Software development models

- Waterfall model
  - specification -> development -> testing -> deployment
  - Although many still use it, it is flawed and at the root of much of the waste in software development today.
- Evolutionary development -- interleaves the activities of specification, development, and validation (testing)
21. Evolutionary development

- Exploratory development
  - Work with the customers/users to explore their requirements and deliver a final system. Development starts with the parts of the system that are understood; new features are added in an evolutionary fashion.
- Throw-away prototyping
  - Create a prototype (not a formal system) that allows the customer's/users' requirements to be understood. Then one builds the real thing.

Sommerville, Software Engineering, 2004
22. Spiral Model

- Spiral model - a process that goes through all steps of the software development lifecycle repeatedly, with each cycle ending in a prototype for the user to see -- it is just for getting the requirements right; the prototypes are discarded after each iteration
23. Challenges with Evolutionary Development

- The process is not visible to management -- managers often need regular deliverables to measure progress.
  - This causes a disconnect: managers want evidence of progress, yet the evolutionary process is fast and dynamic, making deliverables not cost-effective to produce (they change often).
- The system can end up with poor structure.
  - Continual change can create poor system structure.
  - Incorporating changes becomes more and more difficult.

Sommerville, Software Engineering, 2004
24. Agile software development

- Refers to a group of software development methods that promote iterative development, open collaboration, and adaptable processes
- Key characteristics:
  - minimize risk by developing software in multiple repetitions ("timeboxes"); iterations last 2-4 weeks
  - each iteration passes through a full software development lifecycle: planning, requirements gathering, design, writing unit tests, then coding until the unit tests pass, and acceptance testing by end-users (a minimal test-first sketch follows this list)
  - emphasizes face-to-face communication over written communication
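A minimal sketch of the "write the unit tests, then code until they pass" step, using Python's built-in unittest module. The shipping-cost rule, function name, and threshold are all invented for illustration; in practice the tests are written first, fail, and the function is coded just until they pass.

```python
import unittest

# Hypothetical requirement for this iteration: orders of $50 or more
# ship free; all others cost a flat $5. The tests encode the requirement.

def shipping_cost(order_total):
    """Implementation written after the tests, just until they pass."""
    return 0.0 if order_total >= 50 else 5.0

class TestShippingCost(unittest.TestCase):
    def test_free_shipping_at_threshold(self):
        self.assertEqual(shipping_cost(50.00), 0.0)

    def test_flat_rate_below_threshold(self):
        self.assertEqual(shipping_cost(49.99), 5.0)

if __name__ == "__main__":
    unittest.main()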
25. Agile software methods
- Scrum
- Crystal Clear
- Extreme Programming
- Adaptive Software Development
- Feature Driven Development
- Test Driven Development
- Dynamic Systems Development
26. Scrum

- A type of Agile methodology
- Composed of "sprints" that run anywhere from 15-30 days, during which the team creates an increment of potentially shippable software.
- The features that go into a sprint come from the "product backlog," a prioritized set of high-level requirements for work to be done.
- During a backlog meeting, the product owner tells the team which items in the backlog they want completed.
- The team decides how much can be completed in the next sprint.
- Requirements are frozen for a sprint -- no wandering or scope shifting...

http://en.wikipedia.org/wiki/Scrum_(development)
27. Scrum and useable software...

- A key feature of Scrum is the idea that one creates useable software with each iteration
- It forces the team to architect the real thing from the start -- not a prototype that is developed only for demonstration purposes
- For example, a system would start by using the planned architecture (a web-based application using Java 2 Enterprise Edition, an Oracle database, etc...)
- It helps uncover many potential problems with the architecture, particularly one that requires a number of integrated components (drivers that don't work, connections between machines, software compatibility with the operating system, digital certificate compatibility or usability, etc...)
- It allows users and management to actually use the software as it is being built.... invaluable!
28. Scrum team roles

- Pigs and Chickens -- think scrambled eggs and bacon: the chicken is supportive, but the pig is committed.
- Scrum "pigs" are committed to building the software regularly and frequently.
  - Scrum Master -- the one who acts as a project manager and removes impediments to the team delivering the sprint goal. Not the leader of the team, but a buffer between the team and any chickens or distracting influences.
  - Product owner -- the person who has commissioned the project/software; also known as the sponsor of the project.
- Scrum "chickens" are everyone else:
  - users, stakeholders (customers, vendors), and other managers
29. Adaptive project management

- Scrum general practices:
  - customers become part of the development team (you have to have interested users...)
  - Scrum is meant to deliver working software after each sprint, and users should interact with this software and provide feedback
  - transparency in planning and development -- everyone should know who is accountable for what and by when
  - stakeholder meetings to monitor progress
  - no problems are swept under the carpet -- nobody is penalized for uncovering a problem

http://en.wikipedia.org/wiki/Scrum_(development)
30. Typical Scrum Artifacts

- Sprint Burn Down Chart
  - a chart showing the features for that sprint and the daily progress in completing them (see the sketch after this list)
- Product Backlog
  - a list of the high-level requirements (in plain user language)
- Sprint Backlog
  - a list of tasks to be completed during the sprint
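As a rough illustration of what a sprint burn down chart plots, the sketch below compares a daily remaining-work series against the ideal straight-line burn down. All numbers are invented.

```python
# Hypothetical sprint backlog: remaining hours of work recorded at the
# end of each day of a 10-day sprint. A burn down chart plots these
# points against the ideal straight line from total work down to zero.
remaining = [80, 74, 70, 66, 55, 48, 40, 30, 18, 6]
total, days = remaining[0], len(remaining)

for day, left in enumerate(remaining, start=1):
    ideal = total - total * day / days          # ideal linear burn down
    status = "behind" if left > ideal else "on track"
    print(f"Day {day:2d}: {left:3d}h remaining (ideal {ideal:5.1f}h) - {status}")
```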
31. Agile methods and systems

- Agile works well for small to medium-sized projects (around 50,000-100,000 lines of source code)
- Difficult to implement in large, complex system development with hundreds of developers in multiple teams
- Requires that each team be given chunks of work it can develop independently
- Integration is key -- teams need to use standard components and standards for coding, interconnection, and data modeling, so that each team does not create its own naming conventions and interfaces to its components.
32. Quality assurance

- The MOST IMPORTANT ASPECT of software development
- Quality assurance does not start with testing
- Quality assurance starts at the requirements gathering stage
- Software faults -- when the software does not perform as the user intended
- Bugs:
  - the requirements are good/accurate, but the programming causes a crash or other unexpected abnormal state
  - the requirements were wrong and the programming was correct -- still a bug from the user's perspective
33. Some facts about bugs

- Bugs in the form of poor requirements gathering or poor communication with programmers are by far the most expensive in a software development effort
- Bugs caught at the requirements or design stage are cheap to fix
- Bugs caught in the testing phase are expensive to fix
- Bugs not caught are VERY EXPENSIVE in many ways:
  - loss of customer/user trust
  - the need to fix it quickly -- which lends itself to yet more problems, because everyone is panicking to get it fixed ASAP
34. Software testing

- System Testing
  - black box testing
  - white box testing
- Regression Testing
35. Black box testing

- Treats the software as a black box, without knowledge of its internal workings
- Focuses simply on testing the functionality according to the requirements
- The tester inputs data and observes the output from the process (a sketch follows)
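A minimal black-box test in Python: the test knows only the stated requirement (inputs and expected outputs), not how the function is implemented. The leap-year function and its cases are hypothetical stand-ins for the unit under test.

```python
# Black-box testing: we only know the requirement ("years divisible by 4
# are leap years, except centuries, unless divisible by 400"), not the
# internals of is_leap_year().

def is_leap_year(year):          # the unit under test (stand-in)
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def test_black_box():
    cases = [(2000, True), (1900, False), (2004, True), (2001, False)]
    for year, expected in cases:
        assert is_leap_year(year) == expected, f"failed for {year}"

test_black_box()
print("all black-box cases passed")
```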
36. White box testing

- The tester has knowledge of the internal data structures and algorithms
- Types of white box testing:
  - Code coverage - the tester creates tests to cause all statements in the program to be executed at least once (a sketch follows this list)
  - Mutation testing - code is generated that modifies the software slightly to emulate typical programmer mistakes (using the wrong operator or variable name); meant to test whether the existing tests detect the change
  - Fault injection - introduce faults into the system on purpose to test error handling. Makes sure the error occurs as expected and the system handles it rather than crashing or producing an incorrect state or response.
  - Static testing - primarily syntax checking and manual reading of the code to check for errors (code inspections, walkthroughs, code reviews)
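A small sketch of the code-coverage idea: having read the implementation, the tester writes one test per branch so that every statement executes at least once. The classify function and its thresholds are invented for illustration.

```python
# White-box (code coverage) testing: knowing classify() has three
# branches, we write tests that force each one to execute.

def classify(bmi):
    if bmi < 18.5:
        return "underweight"
    elif bmi < 25:
        return "normal"
    else:
        return "overweight"

# One test per branch gives 100% statement coverage of classify().
assert classify(17.0) == "underweight"   # covers the first branch
assert classify(22.0) == "normal"        # covers the second branch
assert classify(30.0) == "overweight"    # covers the final branch
print("all statements in classify() executed at least once")
```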
37. Test Plan

- Outlines the ways in which tests will be developed, and the naming and classification of the various failed tests (critical, show-stopper, minor, etc.)
- Outlines the features to be tested, the approach to be used, and suspension criteria (the conditions under which testing is suspended)
- Describes the environment -- the test environment, including hardware, networking, databases, software, operating system, etc.
- Schedule -- lays out a schedule for the testing
- Acceptance criteria - an objective quality standard that the software must meet in order to be considered ready for release (maximum defect count and severity levels, minimum test coverage, etc...)
- Roles and responsibilities -- who does what in the testing process
38. Test cases

- A description of a specific test or interaction that tests a single behavior or function in the software
- Similar to use cases in that they outline a scenario of interaction -- however, one can have many tests for a single use case
- Example -- login is a use case: you need a test for successful login, one for unsuccessful login, and ones to test password expiration, lockout, how many tries before lockout, etc.
39. Components of a test case

- Name and number for the test case
- The requirement(s) or feature(s) the test case exercises
- Preconditions -- what must be in place for the test to take place
  - For example, to test whether one can register a death certificate, one must have a death certificate that has been filled out, has passed validations, and has been submitted to the local registrar...
- Steps -- a list of steps describing how to perform the test (log in, select patient A, select the medication list, pick Amoxicillin, click "submit to pharmacy", etc.)
- Expected results - describe the expected results up front so the tester knows whether the test passed or failed (a structured sketch follows this list)
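The components above could be captured in a structured form like the Python sketch below; the field names, the TC-042 identifier, and the lockout scenario are hypothetical, chosen to echo the login example from the previous slide.

```python
from dataclasses import dataclass, field

# Hypothetical structure mirroring the test-case components above.
@dataclass
class TestCase:
    number: str
    name: str
    requirement: str
    preconditions: list = field(default_factory=list)
    steps: list = field(default_factory=list)
    expected_result: str = ""

lockout_case = TestCase(
    number="TC-042",
    name="Account lockout after repeated failures",
    requirement="REQ-AUTH-3: lock account after 3 bad passwords",
    preconditions=["User account 'jsmith' exists and is active"],
    steps=[
        "Attempt login as 'jsmith' with a wrong password, 3 times",
        "Attempt login as 'jsmith' with the correct password",
    ],
    expected_result="Login is refused and an 'account locked' message appears",
)
print(lockout_case.name, "exercises", lockout_case.requirement)
```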
41. Regression testing

- Designed to find software regressions -- when previously working functionality no longer works because of changes made in other parts of the system
- As software is versioned, this is the most common type of bug or fault
- The list of regression tests grows:
  - a test for the functions in all previous versions
  - a test for any previously found bug -- create a test for that scenario (a sketch follows this list)
- Manual vs. automated:
  - mostly done manually, but can be automated -- we have automated 500 tests
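A sketch of an automated regression test in Python: it pins down a previously found (invented) rounding bug with the exact inputs that exposed it, so the bug cannot silently reappear in a later version.

```python
# Regression test: an earlier version once mis-rounded negative amounts
# (round-half-away-from-zero was lost). This test now runs against
# every new build so the old bug cannot quietly return.
from decimal import Decimal, ROUND_HALF_UP

def round_currency(amount):
    """Round to cents, half away from zero (the fixed behavior)."""
    return Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def test_regression_negative_rounding():
    # The exact inputs that exposed the original bug.
    assert round_currency(-2.675) == Decimal("-2.68")
    assert round_currency(2.675) == Decimal("2.68")

test_regression_negative_rounding()
print("regression suite passed")
```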
42. Risk is good.... huh?

- There is no worthwhile project that has no risk -- risk is part of the game
- Those who run away from risk and focus on what they know never advance the standard, and they leave the field open to their competitors
- Example: Merrill Lynch ignored online trading at first, allowing other brokerage firms to create a new market - eTrade, Fidelity, Schwab. Merrill Lynch eventually entered 10 years later.
- Staying still (avoiding risk) means you are moving backwards
- Bob Charette's "risk escalator" -- everyone is on an escalator that is moving against you; you have to walk to stay put and run to get ahead. If you stop, you start moving backwards.

DeMarco and Lister. Waltzing with Bears: Managing Risk on Software Projects. 2003.
43. But don't be blind to risk

- Big risk takers sometimes have a tendency to emphasize positive thinking while ignoring the consequences of the risks they are taking.
- If there are things that could go wrong, don't be blind to them -- they exist and you need to recognize them.
- If you don't think of a risk, you could be blind-sided by it.

DeMarco and Lister. Waltzing with Bears: Managing Risk on Software Projects. 2003.
44. Examples of risks

"Risk management often gives you more reality than you want." -- Mike Evans, Senior VP, ASC Corporation

- BCT.org -- a dependency on externally built and maintained software (caMATCH)
- BCT.org -- a need to hit a hard launch date
- eCareNet -- a dependency on complex software understood only by a small group of gurus (the Tolven system)
- TRANSCEND -- integration of system components that have never been integrated before (this is common -- first-time integration)
- TRANSCEND -- clinical input to the CRF process has never been done before
- TRANSCEND -- involves multiple sites not under our control; user input will be difficult to obtain because everyone is busy, training will be difficult because everyone is busy, and there are likely detractors already and we have no voice in their venue
45. Managing risks

- What is a risk? -- a possible future event that will lead to an undesirable outcome
- Not all risks are the same:
  - they have different probabilities of occurring
  - they have different consequences -- high impact, low impact
  - some may or may not have alternative actions to avoid or mitigate the risk if it comes to pass -- is there a feasible plan B?
- Problem -- a risk is a problem that has yet to occur; a problem is a risk that has occurred
- Risk transition -- when a risk becomes a problem; it is then said the risk has "materialized"
- Transition indicator -- something that suggests the risk may transition to a problem. Example -- Russia masses troops on the Georgian border...

DeMarco and Lister. Waltzing with Bears: Managing Risk on Software Projects. 2003.
46. Managing risks

- Mitigation - steps you take before the transition, or after it to make corrections (if possible) or to minimize the impact of the now-problem.
- Steps in risk management:
  - risk discovery
  - exposure analysis (impact analysis) -- a small sketch follows this list
  - contingency planning -- creating plan B, plan C, etc. as options to engage if the risk materializes
  - mitigation -- steps taken before the transition to make contingency actions possible
  - transition monitoring -- tracking managed risks, looking for transitions and materializations (risk management meetings)

DeMarco and Lister. Waltzing with Bears: Managing Risk on Software Projects. 2003.
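A small sketch of the exposure-analysis step: score each discovered risk as probability times impact and rank them, so contingency planning starts with the largest exposures. The risks and all numbers below are invented, loosely echoing slide 44.

```python
# Exposure analysis: exposure = probability of transition x cost if the
# risk materializes. Invented example risks and dollar impacts.
risks = [
    ("External dependency (caMATCH) slips", 0.30, 400_000),
    ("Key-guru turnover on Tolven system",  0.10, 900_000),
    ("First-time integration fails",        0.25, 250_000),
]

# Rank by exposure, largest first, to prioritize contingency planning.
for name, prob, impact in sorted(risks, key=lambda r: r[1] * r[2], reverse=True):
    print(f"{name:40s} exposure = ${prob * impact:>10,.0f}")
```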
47. Common software project risks

- Schedule flaw - almost always due to neglecting or minimizing work that is necessary
- Scope creep (requirements inflation) or scope shifting (because of market conditions or changes in business requirements) -- inevitable -- don't believe you can keep scope frozen for very long
  - recognize it, create a mitigation strategy, recognize the transition, and create a contingency
  - for example, if requirements need to be added or changed, make sure management is aware of the consequences and that adjustments are made in capacity, expectations, timeline, and budget
- It is not bad to change scope -- it is bad to change scope and believe nothing else needs to change

DeMarco and Lister. Waltzing with Bears: Managing Risk on Software Projects. 2003.
48. Post mortem evaluations

- No project is 100% successful -- they all have problems; some have fewer than others, and some have fatal problems.
- It is critical to evaluate projects after they are completed, to characterize common risks/problems and establish methods of mitigation before the next project.
49. Capability Maturity Model (CMM)

- A measure of an organization's maturity in how it approaches projects
- Originally developed as a tool for assessing the ability of government contractors' processes to perform a contracted software project (can they do it?)
- Maturity levels run from 1 to 5. Level 5 is where a process is optimized by continuous process improvement.
50. CMM in detail

- Level 1 - Ad hoc -- processes are undocumented and in a state of dynamic change; everything is ad hoc
- Level 2 - Repeatable -- some processes are repeatable, with possibly consistent results
- Level 3 - Defined -- a set of defined and documented standard processes, subject to improvement over time
- Level 4 - Managed -- process metrics are used to control the process; management can identify ways to adjust and adapt the process
- Level 5 - Optimized -- process improvement objectives are established (post mortem evaluation...), and process improvements are developed to address common causes of process variation
51. Why medical software is hard...

Courtesy of Dr. Andy Coren, Health Information Technology: A Clinician's View. 2008
52. Healthcare IT failures

- Hard to discover -- nobody airs dirty laundry
- West Virginia -- a system had to be removed a week after implementation
- Mt Sinai -- 6 weeks after implementation, the system was rolled back due to staff complaints