Title: SE 468 Software Measurement/Project Estimation
1SE 468 Software Measurement/Project Estimation
- Dennis Mumaugh, Instructor
- dmumaugh@cdm.depaul.edu
- Office: Loop, Room CDM 430, x26770
- Office Hours: Monday, 4:00-5:30
2Administrivia
3Assignment 2
- Due October 12, 2009.
- Questions on Function Points and effort
estimation (aka COCOMO). - Using the Estimate tool from Construx, estimate
the two projects given
4SE 468 Class 3
- Topics Estimating Size and Effort
- Measuring Software Size
- Estimating Software Size
- Size Effort Estimation
- COCOMO
- Reading
- Kan pp. 56, 88-91, 93-96, 456
- Articles on the Class Page and Reading List
5Thought for the Day
- All measurements are made in order to inform a
decision. - A valid measurement is an indication that will
lead a knowledgeable observer to an appropriate
decision.
6Software Metrics
7Software Metrics
- To compare systems, measurements of those systems need to be taken.
- Metrics allow measurements to be made of system attributes such as reliability, robustness, size, complexity, or maintainability.
- Metrics measure some property of a system; to be valid, there must be a relationship between that property and the system behavior being measured.
8Software Metrics
- Metrics Strategy
- Gather Historical Data (from source code, project
schedules, RFC, reports etc) - Record metrics.
- Use current metrics within the context of
historical data. Compare effort required on
similar projects etc.
9Commonly Used Metrics
- Schedule Metrics (55%)
- Tasks completed / late / rescheduled.
- Lines of Code (46%)
- KLOC, Function Points for scheduling and costing.
- Schedule, Quality, Cost Tradeoffs (38%)
- % of tasks completed on schedule / late / rescheduled.
- Requirements Metrics (37%)
- Number of changed / new requirements (Formal RFC).
- Test Coverage (36%)
- Fraction of lines of code covered (50%? 60%? 90%?).
10Commonly Used Metrics
- Overall Project Risk (36%)
- Level of confidence in achieving a schedule date.
- Fault Density
- Unresolved faults (e.g., release at 0.25 faults/KNCSS).
- Fault arrival and close rates
- Determine readiness to deploy. Faults are easier to find than to solve.
11Measuring Software Size
12Software Size
- One of the basic measures of a system is its size.
- This is usually used to estimate the build effort.
- Several measures of size:
- Number of modules (compilable units)
- Number of functions
- Number of classes
- Number of methods per class
- Amount of memory used
- And the one used most often:
- Length in Lines of Code (LOC or KLOC)
- Problem: the variation in developers' code compactness, which can be around 5:1.
- Some standards alleviate this problem.
13Estimating Size
- Why are we interested in size?
- Cost is a function of effort, which is a function of size.
- Actually we calculate duration and derive cost by multiplying salary by duration.
- In more detail:
- Duration = LOC / (Productivity x Staff)
- where productivity is in LOC per day.
- Example: a programmer averages 20 lines of code per day.
- A project with 20,000 lines of code and 10 programmers would take
- 20,000 / (20 x 10) = 100 days (see the sketch after this list).
- Management understands size (so it thinks).
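A minimal sketch of the arithmetic above in Python, assuming productivity is expressed in LOC per programmer-day (the 20 LOC/day and 10-programmer figures are the slide's; the function name is just for illustration):

```python
def duration_days(total_loc, loc_per_day_per_programmer, staff):
    """Rough duration: size divided by the team's daily LOC throughput."""
    return total_loc / (loc_per_day_per_programmer * staff)

# Example from the slide: 20,000 LOC, 20 LOC/day per programmer, 10 programmers.
print(duration_days(20_000, 20, 10))  # -> 100.0 days
```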
14Lines of Code
- Lines of Code (LOC)
- The basis of the LOC measure is that program length can be used as a predictor of program characteristics such as effort and ease of maintenance.
- The LOC measure is used to measure the size of the software.
- One version:
- Only source lines that are DELIVERED as part of the product are included; test drivers and other support software are excluded.
- SOURCE lines are created by the project staff; code created by application generators is excluded.
- One INSTRUCTION is one line of code or card image.
- Declarations are counted as instructions.
- Comments are not counted as instructions.
15Lines of Code
- Problems
- Only source lines that are DELIVERED as part of the product are included; test drivers and other support software are excluded.
- Not useful in estimating effort: testing may take as much code as the delivered product.
- One INSTRUCTION is one line of code or card image.
- Does not consider:
- Multi-line statements
- Several statements on a line
- Comments
- Space and punctuation (braces)
- Cannot measure specifications.
- Does not consider functionality or complexity.
16Lines of Code
- Lines of Code (LOC)
- Advantages
- An artifact of ALL software development projects.
- Easily countable.
- Scope for automation of counting: since a line of code is a physical entity, manual counting effort can easily be eliminated by automating the counting process.
- Well used (many models).
- An intuitive metric: lines of code serve as an intuitive metric for measuring the size of software because the code can be seen and its effect visualized.
- A Function Point is a more abstract metric that cannot be imagined as a physical entity; it exists only in the logical space.
- In this way, LOC comes in handy for expressing the size of software, even among programmers with low levels of experience.
17Lines of Code
- Disadvantages
- "Measuring programming progress by lines of code is like measuring aircraft building progress by weight." (Bill Gates)
- Lack of accountability: the lines-of-code measure suffers from some fundamental problems. Some think it isn't useful to measure the productivity of a project using only results from the coding phase, which usually accounts for only 30% to 35% of the overall effort.
- Lack of cohesion with functionality: estimates based on lines of code can very possibly go badly wrong.
- Developer's experience: the implementation of a specific piece of logic differs with the level of experience of the developer.
- Penalizes well-designed, shorter programs.
- What about complexity?
- Is a simple numerical calculation equivalent to an SQL query?
- Yet they may be the same number of lines of code.
18Lines of Code
- Disadvantages
- Problems with non-procedural languages.
- Advent of GUI tools: huge variations in productivity and other metrics with respect to different languages.
- Problems with multiple languages.
- Programming-language dependent.
- Differences between languages: consider C vs. COBOL.
- Lack of counting standards:
- What is a line of code? Use "statement" instead.
- Count comments and blank lines? Declarations?
- The level of detail required is not known early in the project.
- Psychology: a programmer whose productivity is being measured in lines of code will be rewarded for generating more lines of code, even though he could write the same functionality with fewer lines. "What you measure is what you get."
19Lines of Code
- From this data we can develop:
- Errors per KLOC (thousand lines of code)
- Defects per KLOC
- $ per LOC
- Pages of documentation per KLOC
- Errors per person-month
- LOC per person-month
- $ per page of documentation
20Estimating Software Size
21Accuracy
- Accuracy of a software project estimate is predicated on:
- A correct estimate of the size and complexity of the product to be built.
- The ability to translate size and complexity into human effort.
- The degree to which the project plan reflects the abilities of the software team.
- The stability of product requirements.
- The maturity of the software engineering environment.
22Conventional Methods LOC/FP Approach
- Compute LOC/FP using estimates of information domain values.
- Lines of Code (LOC), aka non-commented source lines or non-commented source statements.
- Function Points (FP): a formula using inputs, outputs, and computation.
- Use historical data to build estimates for the project.
23Productivity
- Measured in terms of work effort per unit of time.
- Lines of Code per unit time or Function Points per unit time.
- Huge variations in productivity and quality among individuals and even teams, as much as 10:1.
- Sackman, Erikson, and Grant found:
- Coding time 20:1
- Debugging 25:1
- Program size 5:1
- Program execution speed 10:1
- Productivity increases due to new development methods.
- Some programmers can do things few others can. They may be 100 times as productive.
- Productivity vs. project size:
- See Brooks, The Mythical Man-Month.
24A Case Study
- Computer Aided Design (CAD) for mechanical components.
- The system is to execute on an engineering workstation.
- Interfaces with various computer graphics peripherals, including a mouse, digitizer, high-resolution color display, and laser printer.
- Accepts two- and three-dimensional geometric data from an engineer.
- The engineer interacts with and controls CAD through a user interface.
- All geometric data and supporting data will be maintained in a CAD database.
- Required output will display on a variety of graphics devices.
Assume the following major software functions are identified.
25Estimation of LOC
- CAD program to represent mechanical parts.
- Estimated LOC = (Optimistic + 4 x Likely + Pessimistic) / 6
- Three-point estimation formula (see lecture 4); a short sketch follows.
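A quick sketch of the three-point formula; the optimistic/likely/pessimistic values below are illustrative inputs for a single function, not figures from the slide:

```python
def three_point_loc(optimistic, likely, pessimistic):
    """Expected LOC = (optimistic + 4 * likely + pessimistic) / 6."""
    return (optimistic + 4 * likely + pessimistic) / 6

# Illustrative estimates for one CAD function.
print(three_point_loc(4_600, 6_900, 8_600))  # -> 6800.0 LOC
```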
26Example LOC Approach
- Average productivity for systems of this type: 620 LOC/pm.
- At a burdened labor rate of $8,000 per month, the cost per line of code is approximately $13.
- Burdened labor is usually 1 to 1.5 times average salary.
- Based on the LOC estimate and the historical productivity data, the total estimated project cost is $431,000 and the estimated effort is 54 person-months (see the sketch below).
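The slide's figures can be reproduced with a few lines of Python; the total-LOC input of roughly 33,200 is an assumption, since the summed LOC estimate is not shown here:

```python
def loc_cost_model(estimated_loc, loc_per_pm, burdened_rate_per_month):
    """Derive cost per LOC, total cost, and effort from size and productivity."""
    cost_per_loc = burdened_rate_per_month / loc_per_pm
    total_cost = estimated_loc * cost_per_loc
    effort_pm = estimated_loc / loc_per_pm
    return cost_per_loc, total_cost, effort_pm

cost_per_loc, total_cost, effort = loc_cost_model(33_200, 620, 8_000)
print(round(cost_per_loc, 2), round(total_cost), round(effort, 1))
# -> 12.9 428387 53.5  (close to the slide's $431,000 and 54 person-months)
```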
27Function Points
- Function points are a measure of the size of
computer applications and the projects that build
them. - The size is measured from a functional, or user,
point of view. - It is independent of the computer language,
development methodology, technology or capability
of the project team used to develop the
application. - Can be subjective
- Can be estimated EARLY in the software
development life cycle.
28Function Points
- They were devised by Albrecht (1979) and are language independent. A function point is
- an external input, an external output, a user interaction, an external interface, or a file used.
- Each FP is then weighted by a complexity factor, so that the
- Unadjusted FP Count (UFC) is Σ (count x weight) over the function types.
- The UFC is then adjusted by system attributes such as distributed processing, re-use, performance, etc.
- There are 14 factors, each with a weight of 0 to 5,
- to get the Adjusted Function Point Count (AFC).
29Function Types
- The approach is to identify and count a number of unique function types:
- External inputs (e.g., file names)
- External outputs (e.g., reports, messages)
- Queries (interactive inputs needing a response)
- Internal files (invisible outside the system)
- External files or interfaces (files shared with other software systems)
- Each of these is then individually assessed for complexity and given a weighting value, which varies from 3 (for simple external inputs) to 15 (for complex internal files).
- Function Type: Low / Average / High
- External Input: x3 / x4 / x6
- External Output: x4 / x5 / x7
- External Inquiry: x3 / x4 / x6
- Logical Internal File: x7 / x10 / x15
- External Interface File: x5 / x7 / x10
30Adjusted FP
- In order to find the adjusted FP, the UFP is multiplied by the technical complexity factor (TCF), which can be calculated by the formula
- TCF = 0.65 + 0.01 x (sum of factors)
- There are 14 technical complexity factors. Each complexity factor is rated on the basis of its degree of influence, from no influence to very influential (0-5):
- Data communications
- Performance
- Heavily used configuration
- Transaction rate
- Online data entry
- End-user efficiency
- Online update
- Complex processing
- Reusability
- Installation ease
- Operations ease
- Multiple sites
- Facilitate change
- Distributed functions
Then FP = UFP x TCF
31Function Points
- Advantages of FP
- It is not restricted to code
- Language independent
- The necessary data is available early in a
project. We need only a detailed specification. - More accurate than estimated LOC
- Disadvantages of FP
- Subjective counting
- Hard to automate and difficult to compute
- Ignores quality of output
- Oriented to traditional data processing
applications - Effort prediction using the unadjusted function
count is often no worse than when the TCF is added
32Computing Function Points
- External Inputs: 5 x 3 = 15
- External Outputs: 8 x 4 = 32
- External Inquiries: 10 x 4 = 40
- Internal Logical Files: 8 x 10 = 80
- External Interface Files: 2 x 5 = 10
- Count total (Unadjusted Function Points): 177 (reproduced in the sketch below)
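A small sketch that reproduces the unadjusted count; the pairing of counts with function types follows the reconstruction above:

```python
# (count, weight) per function type, as on the slide.
FUNCTION_COUNTS = {
    "external inputs": (5, 3),
    "external outputs": (8, 4),
    "external inquiries": (10, 4),
    "internal logical files": (8, 10),
    "external interface files": (2, 5),
}

ufp = sum(count * weight for count, weight in FUNCTION_COUNTS.values())
print(ufp)  # -> 177
```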
33Calculate Degree of Influence (DI)
- Does the system require reliable backup and recovery? 3
- Are data communications required? 4
- Are there distributed processing functions? 1
- Is performance critical? 3
- Will the system run in an existing, heavily utilized operational environment? 2
- Does the system require on-line data entry? 4
- Does the on-line data entry require the input transaction to be built over multiple screens or operations? 3
- Are the master files updated on-line? 3
- Are the inputs, outputs, files, or inquiries complex? 2
- Is the internal processing complex? 1
- Is the code designed to be reusable? 3
- Are conversion and installation included in the design? 5
- Is the system designed for multiple installations in different organizations? 1
- Is the application designed to facilitate change and ease of use by the user? 1
- DI (sum of the ratings) = 36
34The FP Calculation
- Inputs include:
- Count Total (Unadjusted Function Points) = 177
- DI = Σ Fi (i.e., the sum of the adjustment factors F1 .. F14)
- Calculate Function Points using the following formula: FP = UFP x (0.65 + 0.01 x Σ Fi)
- In this example:
- FP = 177 x (0.65 + 0.01 x (3+4+1+3+2+4+3+3+2+1+3+5+1+1))
- FP = 177 x (0.65 + 0.01 x 36)
- FP = 177 x (0.65 + 0.36)
- FP = 177 x 1.01
- FP = 178.77
- TCF = Technical Complexity Factor (see the sketch below)
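The same calculation as a short sketch, using the 14 ratings from the previous slide (which sum to 36):

```python
def adjusted_fp(ufp, ratings):
    """Adjusted FP = UFP * (0.65 + 0.01 * sum of the 14 ratings)."""
    tcf = 0.65 + 0.01 * sum(ratings)
    return ufp * tcf

ratings = [3, 4, 1, 3, 2, 4, 3, 3, 2, 1, 3, 5, 1, 1]  # DI = 36
print(round(adjusted_fp(177, ratings), 2))  # -> 178.77
```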
35Using FP to estimate effort
- If for a certain project:
- FP(estimated) = 372
- The organization's average productivity for systems of this type is 6.5 FP/person-month.
- Burdened labor rate of $8,000 per month.
- Cost per FP:
- $8,000 / 6.5 ≈ $1,230
- Total project cost:
- 372 x $1,230 ≈ $457,650
- 372 / 6.5 ≈ 57.2 person-months
- Based on the FP estimate and the historical productivity data, the total estimated project cost is $457,650 and the estimated effort is 58 person-months (see the sketch below).
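And the same arithmetic as a sketch:

```python
fp_estimated = 372
fp_per_pm = 6.5         # organizational average productivity
burdened_rate = 8_000   # dollars per person-month

cost_per_fp = burdened_rate / fp_per_pm   # ~$1,230 per FP
total_cost = fp_estimated * cost_per_fp   # close to the slide's $457,650
effort_pm = fp_estimated / fp_per_pm      # ~57.2 person-months (rounds to ~58)

print(round(cost_per_fp), round(total_cost), round(effort_pm, 1))
# -> 1231 457846 57.2
```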
363D Function point index
37AVC
- A full specification is needed to estimate function points, and critics note that FP counts can vary by up to 2,000% depending on how one attributes weights.
- Function points can be used to estimate the final code size by using historical data on the average lines of code per function point (AVC).
- Code size = AVC x Number of function points
- AVC ≈ 200-300 LOC per FP in assembler
- AVC ≈ 2-40 LOC per FP in a 4GL
38Reconciling FP and LOC
39Size Effort Estimation
40Estimation for OO Projects - I
- Develop estimates using effort decomposition, FP analysis, and any other method that is applicable for conventional applications.
- Using object-oriented analysis modeling, develop use-cases and determine a count.
- From the analysis model, determine the number of key classes (called analysis classes).
- Categorize the type of interface for the application and develop a multiplier for support classes:
- Interface type: Multiplier
- No GUI: 2.0
- Text-based user interface: 2.25
- GUI: 2.5
- Complex GUI: 3.0
41Estimation for OO Projects - II
- Multiply the number of key classes (step 3) by the multiplier to obtain an estimate for the number of support classes.
- Multiply the total number of classes (key + support) by the average number of work-units per class. Lorenz and Kidd suggest 15 to 20 person-days per class.
- Cross-check the class-based estimate by multiplying the average number of work-units per use-case. (A short sketch of this procedure follows.)
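A sketch of the class-based procedure just described; the example inputs (20 key classes, a standard GUI, 18 person-days per class) are illustrative assumptions:

```python
# Interface-type multipliers for estimating support classes (from the slide).
SUPPORT_CLASS_MULTIPLIER = {
    "no gui": 2.0,
    "text ui": 2.25,
    "gui": 2.5,
    "complex gui": 3.0,
}

def oo_effort_person_days(key_classes, interface_type, person_days_per_class=18):
    """Effort = (key classes + support classes) * work-units per class."""
    support_classes = key_classes * SUPPORT_CLASS_MULTIPLIER[interface_type]
    return (key_classes + support_classes) * person_days_per_class

# Illustrative example: 20 key classes behind a standard GUI.
print(oo_effort_person_days(20, "gui"))  # -> 1260.0 person-days
```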
42Estimation with Use-Cases
Using 620 LOC/pm as the average productivity for systems of this type and a burdened labor rate of $8,000 per month, the cost per line of code is approximately $13. Based on the use-case estimate and the historical productivity data, the total estimated project cost is $554,000 and the estimated effort is 68 person-months.
43Estimation for Agile Projects
- Each user scenario (a mini-use-case) is
considered separately for estimation purposes. - The scenario is decomposed into the set of
software engineering tasks that will be required
to develop it. - Each task is estimated separately. Note
estimation can be based on historical data, an
empirical model, or experience. - Alternatively, the volume of the scenario can
be estimated in LOC, FP or some other
volume-oriented measure (e.g., use-case count). - Estimates for each task are summed to create an
estimate for the scenario. - Alternatively, the volume estimate for the
scenario is translated into effort using
historical data. - The effort estimates for all scenarios that are
to be implemented for a given software increment
are summed to develop the effort estimate for the
increment. - Also consider Project Velocity
44Project Velocity (PV)
- Don't bother to consider the number of programmers or their skill level. This is a rough estimate.
- Project velocity tells you how many story points you can allocate to the next iteration.
- The customer gets to pick stories that add up to the project velocity. (A small sketch follows.)
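A minimal sketch of loading an iteration against a velocity budget; the story names and point values are invented for illustration:

```python
def pick_stories(candidate_stories, velocity):
    """Select stories in customer-priority order until the velocity is used up."""
    chosen, remaining = [], velocity
    for name, points in candidate_stories:
        if points <= remaining:
            chosen.append(name)
            remaining -= points
    return chosen

stories = [("login", 3), ("report export", 5), ("search", 8), ("audit log", 2)]
print(pick_stories(stories, velocity=10))  # -> ['login', 'report export', 'audit log']
```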
45Empirical Estimation Models
- Most of the work in the cost estimation field has focused on algorithmic cost modeling.
- In this process, costs are analyzed using mathematical formulas linking costs or inputs with metrics to produce an estimated output.
- The formulas used in a formal model arise from the analysis of historical data.
- The accuracy of the model can be improved by calibrating the model to your specific development environment, which basically involves adjusting the weightings of the metrics.
- Generally there is great inconsistency among estimates. Kemerer conducted a study indicating that estimates varied by as much as 85% to 610% between predicted and actual values. Calibration of the model can improve these figures.
- However, models still produce errors of 50% to 100%.
46Empirical Estimation Models
- The effort equation is based on a single variable, usually a measure of size.
- There are several possible variations:
- Effort = A x size + C
- Effort = A x size^B
- Effort = A x size^B + C
- where A, B, and C are constants determined by regression analysis on historical data (a fitting sketch follows this slide).
- Effort may be measured in:
- Staff hours, weeks, months, years . . .
- Size may be measured in:
- Lines of code, modules, I/O formats . . .
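To make the regression step concrete, here is a sketch that fits Effort = A x size^B to historical data by least squares in log-log space; the project data points are invented for illustration:

```python
import math

def fit_power_law(sizes_kloc, efforts_pm):
    """Fit Effort = A * size^B via least squares on log(effort) = log(A) + B*log(size)."""
    xs = [math.log(s) for s in sizes_kloc]
    ys = [math.log(e) for e in efforts_pm]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Invented historical projects: (size in KLOC, effort in person-months).
history = [(10, 26), (25, 70), (50, 146), (120, 380)]
A, B = fit_power_law([s for s, _ in history], [e for _, e in history])
print(round(A, 2), round(B, 2))                 # fitted constants A and B
print(round(A * 40 ** B, 1), "PM for 40 KLOC")  # prediction for a new project
```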
47Empirical Estimation Models
- The empirical data supporting most empirical models is derived from a limited sample of projects.
- NO estimation model is suitable for all classes of software projects.
- USE the results judiciously.
- General model:
- E = A + B x (ev)^C
- where A, B, and C are empirically derived constants, E is effort in person-months, and ev is the estimation variable (either LOC or FP).
48LOC-Oriented Estimation Models
49COCOMO
50COCOMO
- "Cost does not scale linearly with size", is
perhaps the most important principle in
estimation. Barry Boehm used a wide range of
project data and came up the following
relationship of effort versus size - effort C x sizeM
- This is known as the Constructive Cost Model
(COCOMO). C and M are always greater than 1, but
their exact values vary depending upon the
organization and type of project. Typical values
for real-time projects utilizing very best
practices are - C3.6, M1.2.
- Poor software practices can push the value of M
above 1.5. - One fall out of the COCOMO model is that it is
more cost effective to partition a project into
several independent sub-projects each with its
own autonomous team. This "cheats" the
exponential term in the COCOMO model.
51COCOMO Static Adjusted Baseline
- A static single-variable effort equation acts as a baseline equation,
- e.g., effort = A x size^b
- This provides a basic estimate of effort.
- The initial estimate is adjusted by a set of multipliers that attempt to incorporate the effect of important product and process attributes.
- E.g., if the initial estimate is E = 100 staff-months and the complexity of the job is rated higher than normal, a multiplier of 1.1 is associated with it, yielding an adjusted estimate of 110 staff-months.
52The COCOMO Model
- A hierarchy of estimation models:
- Model 1, Basic: computes software development effort (and cost) as a function of size expressed in estimated lines of code.
- Model 2, Intermediate: computes effort as a function of program size and a set of 15 cost drivers that include subjective assessments of product, hardware, personnel, and project attributes.
- Model 3, Advanced: includes all aspects of the intermediate model, with an assessment of the cost drivers' impact on each step (analysis, design, etc.) of the software engineering process.
53Three classes of software projects
- Organic: relatively small and simple. Teams with good application experience work to a set of less rigid requirements.
- Semi-detached: intermediate in terms of size and complexity. Teams with mixed experience levels meet a mix of rigid and less rigid requirements. (Ex: a transaction processing system)
- Embedded: a software project that must be developed within a set of tight hardware, software, and operational constraints. (Ex: flight control software for an aircraft)
54COCOMO Model
- The basic COCOMO model follows the general layout of effort estimation models:
- E = a(S)^b
- and
- TDEV = c(E)^d
- where
- E represents effort in person-months
- TDEV represents project duration in calendar months
- S is the size of the software development in KLOC
- a, b, c, and d are values, derived from past project data, dependent on the development mode
- The a, b, c, and d values are (see the sketch after this slide):
- Organic development mode: a = 2.4, b = 1.05, c = 2.5, d = 0.38
- Semi-detached development mode: a = 3.0, b = 1.12, c = 2.5, d = 0.35
- Embedded development mode: a = 3.6, b = 1.20, c = 2.5, d = 0.32
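A small sketch of the basic model using the constants above; the 32-KLOC example input is arbitrary:

```python
# Basic COCOMO constants per development mode: (a, b, c, d).
BASIC_COCOMO = {
    "organic":       (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded":      (3.6, 1.20, 2.5, 0.32),
}

def basic_cocomo(size_kloc, mode):
    """Return (effort in person-months, schedule in calendar months)."""
    a, b, c, d = BASIC_COCOMO[mode]
    effort = a * size_kloc ** b   # E = a * S^b
    tdev = c * effort ** d        # TDEV = c * E^d
    return effort, tdev

effort, tdev = basic_cocomo(32, "organic")
print(round(effort, 1), round(tdev, 1))  # -> roughly 91 PM and 14 months
```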
55The COCOMO Model
- The intermediate COCOMO is an extension of the basic COCOMO model, which used only one predictor variable, the size (KLOC) variable.
- The intermediate COCOMO uses 15 more predictor variables, called cost drivers. The manager assigns a value to each cost driver from the range:
- Very low
- Low
- Nominal
- High
- Very high
- Extra high
- Each of these ratings corresponds to a numerical value, which varies with the cost driver.
56The COCOMO Model
- The manager assigns a value to each cost driver according to the characteristics of the specific software project.
- The numerical values that correspond to the manager-assigned ratings for the 15 cost drivers are multiplied together.
- The resulting value I is the multiplier that we use in the intermediate COCOMO formulas for obtaining the effort estimates.
- Thus:
- I = RELY x DATA x CPLX x TIME x STOR x VIRT x TURN x ACAP x AEXP x PCAP x VEXP x LEXP x MODP x TOOL x SCED
- Note that although the effort estimation formulas for the intermediate model are different from those used for the basic model, the schedule estimation formulas are the same.
57The COCOMO Model
- The required effort to develop the software system (E) as a function of the nominal effort (Enom), where E and Enom are expressed in person-months and S in KLOC, is
- E = Enom x I,
- where
- INTERMEDIATE COCOMO MODEL
- MODE: EFFORT
- Organic: Enom = 3.2 (S^1.05)
- Semi-detached: Enom = 3.0 (S^1.12) (see note section)
- Embedded: Enom = 2.8 (S^1.20)
- Note: the intermediate constants differ from the basic model.
58The COCOMO Model
- The number of months estimated for software development (TDEV), where TDEV is expressed in calendar months and E in person-months:
- INTERMEDIATE COCOMO MODEL
- MODE: SCHEDULE
- Organic: TDEV = 2.5 (E^0.38)
- Semi-detached: TDEV = 2.5 (E^0.35)
- Embedded: TDEV = 2.5 (E^0.32)
- Note: the intermediate constants are the same as the basic model.
59COCOMO Cost Drivers
60COCOMO Cost Drivers
61The COCOMO Model
- Source Code Size Used in the COCOMO Model
- The source size (S) is expressed in KLOC, i.e., thousands of delivered lines of code, i.e., the source size of the delivered software (which does not include the size of test drivers or other temporary code).
- If code is reused, then the following formula should be used for determining the equivalent software source size Se, for use in the COCOMO model:
- Se = Sn + (a/100) x Su
- where Sn is the source size of the new code, Su is the source size of the reused code, and a is determined by the formula
- a = 0.4 D + 0.3 C + 0.3 I
- based on the percentage of effort required to adapt the reused design (D) and code (C), as well as the percentage of effort required to integrate the modified code (I). (A short sketch follows this slide.)
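A sketch of the equivalent-size adjustment; the example percentages and sizes are arbitrary:

```python
def equivalent_size(new_kloc, reused_kloc, pct_design_mod, pct_code_mod, pct_integration):
    """Se = Sn + (a/100) * Su, with a = 0.4*D + 0.3*C + 0.3*I."""
    a = 0.4 * pct_design_mod + 0.3 * pct_code_mod + 0.3 * pct_integration
    return new_kloc + (a / 100.0) * reused_kloc

# Example: 20 KLOC new code, 40 KLOC reused code with 10% design change,
# 20% code change, and 30% integration effort.
print(round(equivalent_size(20, 40, 10, 20, 30), 1))  # -> 27.6 KLOC equivalent
```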
62The COCOMO Model
- Other Parameters Used in the COCOMO Model
- TDEV starts when the project enters the product
design phase (successful completion of a software
requirements review) and ends at the end of
software testing (successful completion of a
software acceptance review) - E covers management and documentation efforts,
but not activities such as training, installation
planning, etc. - COCOMO assumes that the requirements
specification is not substantially changed after
the end of the requirements phase - Person-months can be transformed to person-days
by multiplying by 19, and to person-hours by
multiplying by 152
63The COCOMO Model
- Advantages of COCOMO'81
- COCOMO is transparent; you can see how it works, unlike other models such as SLIM.
- Drivers are particularly helpful to the estimator in understanding the impact of the different factors that affect project costs.
- Drawbacks of COCOMO'81
- It is hard to accurately estimate KDSI early in the project, when most effort estimates are required.
- KDSI, actually, is not a size measure; it is a length measure.
- Extremely vulnerable to mis-classification of the development mode.
- Success depends largely on tuning the model to the needs of the organization, using historical data, which is not always available.
64Example
- Mode is organic
- Size = 200 KDSI
- Cost drivers:
- Low reliability => 0.88
- High product complexity => 1.15
- Low application experience => 1.13
- High programming language experience => 0.95
- Other cost drivers assumed to be nominal => 1.00
- I = 0.88 x 1.15 x 1.13 x 0.95 ≈ 1.086
- Effort = 3.2 x (200^1.05) x 1.086 ≈ 906 PM
- Development time = 2.5 x 906^0.38 ≈ 33.24 months (checked in the sketch below)
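The worked example can be checked with a short script using the intermediate organic constants (3.2, 1.05) and the schedule equation from slide 58; the multiplier values are the ones listed above:

```python
import math

def intermediate_cocomo_organic(size_kdsi, cost_driver_multipliers):
    """E = 3.2 * S^1.05 * I and TDEV = 2.5 * E^0.38 (organic mode)."""
    i = math.prod(cost_driver_multipliers)   # effort adjustment multiplier I
    effort = 3.2 * size_kdsi ** 1.05 * i
    tdev = 2.5 * effort ** 0.38
    return i, effort, tdev

i, effort, tdev = intermediate_cocomo_organic(200, [0.88, 1.15, 1.13, 0.95])
print(round(i, 3), round(effort), round(tdev, 2))  # -> 1.086 906 33.24
```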
65COCOMO-II
- There is a new modeling capability, COCOMO II.
- It has the equation form E = A(Size)^B
- where A is calibrated for the local environment
- and B is based upon a smaller set of variables.
- It is done during post-architecture and early design.
- Also, size may be measured in various ways, including function points.
66COCOMO-II
- Constructive Cost Model or COCOMO II is actually
a hierarchy of estimation models that address the
following areas - Application composition model. Used during the
early stages of software engineering, when
prototyping of user interfaces, consideration of
software and system interaction, assessment of
performance, and evaluation of technology
maturity are paramount. - Early design stage model. Used once requirements
have been stabilized and basic software
architecture has been established. - Post-architecture-stage model. Used during the
construction of the software.
67For more information
- The text Software Engineering: Theory and Practice, Shari Lawrence Pfleeger, Chapter 3.3, Effort Estimation, pp. 98-109, has more information on the subject.
68Summary
- Models should be an aid to software development management and engineering, and to evolving the discipline.
- First do a back-of-the-envelope prediction.
- Apply the local model and examine the results.
- Apply the general model, e.g., COCOMO.
- Examine the range of predictions offered by the models.
- Compare the results:
- What do they say about the project and my environment?
- What assumptions did I make, and do I believe them?
- Am I satisfied with the prediction? When should I re-predict?
69Summary
- No size or effort estimation model is appropriate
for all software development environments,
development processes, or application types. - Models must be customized (parameters in the
formula must be altered) so that results from the
model agree with the data from the particular
software development environment.
- An effort estimate is only ever an estimate. Management should treat it with caution.
- To make empirical models as useful as possible,
as much data as possible should be collected from
projects and used to customize (refine) any model
used - The different estimating methods used should be
documented, and all underlying assumptions should
be recorded.
70Next Class
- Topic
- Project Estimation: Project Schedule Estimation; Resource Schedule Estimation; Overly Optimistic Schedules; The Time Value of Money
- Reading
- Kan, chapter 12.2, pp. 343-347
- Articles on the Class Page
- Term Paper Proposal
- Due Monday, October 5, 2009
- Assignment 2
- Due Monday, October 12, 2009
71Journal Exercises
- Read the paper Programmer Productivity: The "Tenfinity Factor" <http://www.devtopics.com/programmer-productivity-the-tenfinity-factor/>
- Comment.
- What about the impact on estimating?
- Also, think about programmer style and lines-of-code measurements.