Title: Planning III Estimating Software Size
1Topic 5
- Planning III Estimating Software Size
2Lecture Overview
- 5.1 Background
- 5.2 Popular Estimating Methods
- 5.3 Proxy-based Estimating
- 5.4 The PROBE Size Estimating Method
- 5.5 Object Categories
- 5.6 Estimating Considerations
- 5.7 Summary
- 5.8 Exercises
35.1 Background
45.1 Background 1
- Why Estimate Size?
- Size estimates allow you to make better plans.
- Plan better for the resources, tasks, and schedule.
- The quality of a plan depends on that of the size estimate.
- The more details you have, the more accurate the estimation will be.
- For a large software job, divide it into separate elements.
- Size estimates assist you in tracking progress.
- You can
- judge when the job scope changes.
- better measure the work.
5Background 2
- Estimating models in other fields
- have a large historical base.
- are widely used.
- generate detailed planning data.
- require a size estimate as input.
6Project Planning Framework
- The framework is shown in Figure 5.1.
- Size estimating compares the design elements with historical size data to produce estimates.
- Accurate size estimation leads to accurate resource estimation.
7Fig. 5.1 Project Planning Framework
8Estimating Experience
- Studies show that
- size estimation errors may be as high as 100% or even more.
- Only 22% of professionals make estimates.
- People usually have a larger estimation error (up to 400%) in the early phases; the error then declines (to 25% or less) in later phases.
- Serious size estimation errors result in poor resource estimates and unrealistic project schedules.
- Fortunately, size estimation skill can be learned and improved.
9Size Estimating Criteria
- A widely used method should be
- structured and trainable.
- usable during all development and maintenance phases.
- usable for all types of software product elements.
- suitable for statistical analysis.
- adaptable to the types of your future work.
- able to judge the accuracy of the estimates.
105.2 Popular Size Estimating Methods
115.2 Popular Size Estimating Methods
- Four popular methods
- Wideband-Delphi
- Fuzzy-Logic
- Standard-Component
- Function-Point
- Their concepts form the foundation of the PROBE
(Proxy-based Estimating) method used with the PSP.
12Wideband-Delphi Method 1
- Originated by the Rand Corporation and refined by Boehm.
- Based on the Delphi process, which usually runs for several cycles.
- Several experts (estimators) and a moderator are involved in the process.
- In each cycle, each expert makes an independent estimate and submits it to the moderator.
- The moderator coordinates the estimation process; each expert's estimates remain anonymous to all the others.
- The process runs until the experts' estimates converge on a consensus result.
13Wideband-Delphi Method 2
- The method's process is as follows:
- A group of experts is each given the program's specifications and an estimation form.
- They meet to discuss project goals, assumptions, and estimation issues.
- They then each anonymously list the project tasks and estimate the size.
- The estimates are given to the moderator, who tabulates the results and returns them to the experts, as illustrated in Figure 5.2.
14Fig. 5.2 The Chart Used in Wideband-Delphi
15Wideband-Delphi Method 3
- Only each expert's own estimate is identified; all others are anonymous.
- The experts meet to discuss the results. They each review the tasks they have defined but not their size estimates.
- The cycle continues from step 3 (anonymous task listing and size estimation) until the estimates converge to within an acceptable range.
16Wideband-Delphi Method 4
- Advantages
- can produce accurate results
- uses the organization's skills
- works for products of any size
- Disadvantages
- relies on a few experts
- time consuming
- subject to common biases
17Fuzzy-Logic Method 1
- Estimators compare the planned program with prior programs and select the most appropriate size category.
- Preparing for this method
- gather sufficient size data on previously developed programs
- subdivide these data into size categories and subcategories
- provide a meaningful number of program examples in each size category and subcategory
- extend the size ranges up or down as you get further data
- Table 5.1 shows an example.
18Table 5.1 Example for Fuzzy-Logic Method
19Fuzzy-Logic Method 2
- Advantages
- based on relevant historical data
- easy to use
- requires no special tools or training
- provides reasonably good estimates in cases where
new work is like prior experience
20Fuzzy-Logic Method 3
- Disadvantages
- requires a lot of data
- requires that the estimators be familiar with the historically developed programs
- provides only a crude sizing
- not useful for new program types
- not useful for programs that are much larger or smaller than the historical data
21Standard-Component Method 1
- Use the organization's historical data to determine the typical size of various types of components
- subsystems, modules, screens, reports, etc.
- For a new project
- Judge how many components of each type the project will likely contain.
- Also determine the maximum and minimum numbers of each type of component you can imagine.
22Standard-Component Method 2
- Calculate the estimated number of each type with
- (4 × likely number + max. number + min. number) / 6
- The estimated size of each type is
- its estimated number × its typical size from historical data
- Sum up the estimated sizes of all types to get the total size.
23Table 5.2 Example of Component Estimating
For modules, the estimated size = 932 × 17.5 = 16,310.
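The standard-component calculation can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, the counts (minimum 10, likely 18, maximum 23) are hypothetical values chosen to give 17.5 modules, and 932 is read as the typical LOC per module from the example.

```python
def estimated_count(minimum, likely, maximum):
    """Weighted estimate of how many components of one type are needed."""
    return (minimum + 4 * likely + maximum) / 6


def estimated_size(minimum, likely, maximum, typical_loc):
    """Estimated LOC for one component type: estimated count x typical size."""
    return estimated_count(minimum, likely, maximum) * typical_loc


# Hypothetical counts: (10 + 4 * 18 + 23) / 6 = 17.5 modules.
# 932 is assumed to be the typical LOC per module from historical data.
modules_loc = estimated_size(10, 18, 23, typical_loc=932)
print(modules_loc)  # 16310.0
```

Repeating the calculation for each component type and summing the results gives the total size estimate.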
24Standard-Component Method 3
- Advantages
- based on relevant historical data
- easy to use
- requires no special tools or training
- provides a rough estimate range
- Disadvantages
- must use large components early in a project
- limited data on large components
- hard to visualize component counts early in a
project
25Function-Point Method 1
- Count the numbers of the 5 types of basic functions that the commercial application program will likely need, by reviewing its requirements.
- Inputs: screens or forms through which new data is added or existing data is updated in the application
- Outputs: screens or reports that the application produces
- Inquiries: screens used to query the application or to ask for assistance or information
- Data files: logical collections of records that the application updates
- Interfaces: e.g., files shared with other applications, shared databases, parameter lists
26Function-Point Method 2
- Each type has a given weight.
- Calculate the function points of the application
- multiply the number of each type by its weight.
- sum them up to get the total (unadjusted).
- Use historical data on development cost and time
per function point to make the estimates.
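As a rough sketch of the unadjusted count, the snippet below multiplies each function type's count by a weight and sums the products. The weights are an assumption: they are the commonly cited average-complexity values, used here because Table 5.3 is not reproduced in this transcript. The counts are hypothetical.

```python
# Assumed weights (commonly cited average-complexity values); replace them
# with the values from Table 5.3 when available.
WEIGHTS = {
    "inputs": 4,
    "outputs": 5,
    "inquiries": 4,
    "data_files": 10,
    "interfaces": 7,
}


def unadjusted_function_points(counts, weights=WEIGHTS):
    """Sum of (count x weight) over the five basic function types."""
    return sum(counts[name] * weight for name, weight in weights.items())


# Hypothetical counts taken from a requirements review.
counts = {"inputs": 8, "outputs": 12, "inquiries": 4, "data_files": 2, "interfaces": 1}
print(unadjusted_function_points(counts))  # 135
```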
27Table 5.3 Function-Point Categories
28Table 5.4 Function-Point Category Example
29Function-Point Method 3
- To improve the estimate, adjust the function points by 14 influence factors.
- The estimator judges the value of each influence factor, from 0 (very simple) to 5 (very complex).
- Sum up these values.
- Calculate the complexity multiplier
- 0.65 + 0.01 × (sum of influence factor values)
- Calculate adjusted function points
- complexity multiplier × unadjusted function points
- The adjustment ranges between −35% and +35%.
- Table 5.5 shows an example.
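The adjustment step can be illustrated with a short sketch. The function names and the sample factor values are hypothetical; the formula is the one stated above, 0.65 + 0.01 × (sum of the 14 influence factor values).

```python
def complexity_multiplier(influence_factors):
    """Return 0.65 + 0.01 * sum of the 14 influence factor values (each 0..5)."""
    if len(influence_factors) != 14:
        raise ValueError("expected exactly 14 influence factors")
    if any(not 0 <= value <= 5 for value in influence_factors):
        raise ValueError("each influence factor must be between 0 and 5")
    return 0.65 + 0.01 * sum(influence_factors)


def adjusted_function_points(unadjusted, influence_factors):
    """Scale the unadjusted function points by the complexity multiplier."""
    return complexity_multiplier(influence_factors) * unadjusted


# Hypothetical example: every factor rated 3, and 100 unadjusted points.
print(round(adjusted_function_points(100, [3] * 14), 2))  # 107.0
```

With every factor at 0 the multiplier is 0.65, and with every factor at 5 it is 1.35, which is where the ±35% range comes from.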
30Table 5.5 An Example of Function-Point Influence
Factors
31Function-Point Method 4
- Advantages
- a well-documented method
- usable in the earliest requirements phase
- independent of programming languages, product designs, and development styles
- has a large body of historical data
- has an active user group
32Function-Point Method 5
- Disadvantages
- cannot directly count an existing product's function-point content
- without historical data, it is difficult to improve estimating skill
- does not reflect language, design, and style differences
- designed for estimating commercial data processing applications
335.3 Proxy-based Estimating (PROBE)
345.3 Proxy-Based Estimating (PROBE)
- Instead of estimating size directly, estimators use a proxy: a substitute that helps them judge product size.
- Proxies can be, for example, objects, screens, files, scripts, document chapters, or function points.
35Criteria for a Good Proxy 1
- The proxy size measurement (or estimate) should closely relate to the effort required to develop the product.
- Use the correlation method (see Topic 15) to determine which proxy is a better predictor of product size or development effort.
36Criteria for a Good Proxy 2
- The proxy content of a product should be automatically countable.
- A large amount of proxy data, extracted from historical data, is needed to make new estimates.
- The proxy must be a physical entity that can be precisely defined and algorithmically identified.
37Criteria for a Good Proxy 3
- The proxy should be easy to visualize at the beginning of a project.
- The proxy should be customizable to the special needs of the organizations that use it.
- Different product types may use different kinds of proxies.
- The proxy should be sensitive to any implementation variations that affect development cost or effort.
- for example, programming languages, design styles, application types
38Objects as Proxies 1
- Objects in OO programming languages are good candidates for proxies because they closely relate to development effort or resources.
- Figures 5.3 and 5.4 show close correlation between estimated object LOC (Σ object size) and actual program LOC.
- Linear regression formula (see Topic 15)
- actual program LOC = β0 + β1 × (estimated object LOC)
- Figures 5.5 and 5.6 show high correlation and significance between estimated object LOC and actual development hours.
- Linear regression formula
- actual development hours = β0 + β1 × (estimated object LOC)
39Objects as Proxies 2
- The PROBE method uses objects as proxies.
- The PROBE method requires that estimators have
historical data on the sizes of objects they have
developed and that these data be divided into
categories.
40Fig. 5.3 Estimated Object LOC vs. Actual Program
LOC (10 Pascal Programs)
41Fig. 5.4 Estimated Object LOC vs. Actual Program
LOC (25 C Programs)
42Fig. 5.5 Estimated Pascal Object LOC vs. Actual Development Hours (correlation 0.934, significance < 0.005)
43Fig. 5.6 Estimated C Object LOC vs. Actual Development Hours (correlation 0.980, significance < 0.005)
44Estimating Object Size with Fuzzy-Logic Method
- Judge the size of each object on a per-method basis with the fuzzy-logic method.
- decide the category of the object
- judge how many methods it will likely contain
- decide the size range it falls into
- estimated object size = (number of methods) × (LOC per method of the size range)
- Table 5.6 shows object category sizes in LOC per method.
- For example, a medium-sized Pascal text object with 4 methods has about 66 (16.48 × 4) LOC.
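A minimal sketch of this lookup follows. Only the medium Pascal text value (16.48 LOC per method) comes from the example above; the dictionary layout and function name are illustrative assumptions, and a real table would hold every category and size range from Table 5.6.

```python
# LOC per method, keyed by (object category, size range).
# Only this one entry is taken from the worked example; the rest of
# Table 5.6 would be filled in from your own historical data.
LOC_PER_METHOD = {
    ("text", "medium"): 16.48,
}


def estimate_object_loc(category, size_range, num_methods, table=LOC_PER_METHOD):
    """Estimated object size = number of methods x LOC per method."""
    return num_methods * table[(category, size_range)]


print(round(estimate_object_loc("text", "medium", 4)))  # 66
```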
45Table 5.6 Object Category Sizes in LOC per Method
465.4 The PROBE Size Estimating Method
475.4 The PROBE Size Estimating Method
- The PROBE size estimating procedure is shown in Figure 5.7.
- The procedure uses a size estimating template, shown in Table 5.7.
- The template guides you through estimating program size with the PROBE method and linear regression.
48Fig. 5.7 The Procedure of PROBE Method
49Table 5.7 Size Estimating Template and example 1
50Table 5.7 Size Estimating Template and example 2
51Size Estimating Template 1
- Base Program: the existing program that you will enhance or change
- Base Size (B): the LOC of the base program
- LOC Deleted (D): the LOC to be deleted from the base program
- LOC Modified (M): the LOC to be modified in the base program
52Size Estimating Template 2
- Projected LOC (P): the sum of total base additions LOC (BA) and total new objects LOC (NO)
- Base Additions: the new functions to be added to the base program
- New Objects: the new objects to be added to the base program
- New Reused Objects: new objects that you plan to develop and that are general enough to put into the reuse library
- (Note: mark the new reused objects with an asterisk)
- Reused Objects (R): the objects taken from the reuse library
53Size Estimating Template 3
- Calculations
- Use actual new and changed LOC and estimated object LOC (i.e., P + M) from historical data to obtain the regression parameters β0 and β1.
- Estimated New and Changed LOC (N): calculated from the regression parameters
- N = β0 + β1 × (P + M)
- Estimated Total LOC (T): the estimated size of the final program
- T = N + B − D − M + R
54Size Estimating Template 4
- Prediction Range: calculated using the t distribution (see Section 15.5.2), a desired prediction interval percent, and the data used for the linear regression
- Lower/Upper Prediction Interval: the interval (N ± Range) within which the actual new and changed LOC is likely to fall
- Prediction Interval Percent: the probability, as a percentage, that the actual new and changed LOC will fall within the interval
55PROBE Step 1 Conceptual Design
- The conceptual design establishes a preliminary design approach and names the expected product objects and their functions.
- For an accurate estimate, estimators must refine the conceptual design to the level of objects.
56PROBE Step 2 Identify Objects
- Determine object type and size
- if it is a new object, use the fuzzy-logic method (see Section 5.3) to determine
- number of methods (judging size on a per-method basis)
- object type
- relative size
- LOC per method
- estimated object size = (number of methods) × (LOC per method)
- (Note: mark the new reused objects with an asterisk)
- if the object is taken from the reuse library, determine
- reused object category
- object LOC
57PROBE Step 3 Calculate Projected and Modified
LOC
- Projected LOC (P) = Total Base Additions LOC (BA) + Total New Objects LOC (NO)
- Use actual new and changed LOC and estimated object LOC (i.e., P + M) from historical data to obtain the regression parameters β0 and β1.
- Estimated New and Changed LOC (N) = β0 + β1 × (P + M)
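The regression step can be sketched as an ordinary least-squares fit. The function names and the historical data below are hypothetical; the fit itself is the standard formula for a straight line y = β0 + β1 × x, where x is the estimated object LOC (P + M) and y is the actual new and changed LOC.

```python
def regression_parameters(x, y):
    """Least-squares fit of y = beta0 + beta1 * x; returns (beta0, beta1)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three (x, y) pairs")
    x_avg = sum(x) / n
    y_avg = sum(y) / n
    sxx = sum((xi - x_avg) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - x_avg) * (yi - y_avg) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    beta0 = y_avg - beta1 * x_avg
    return beta0, beta1


def estimate_new_and_changed(projected, modified, beta0, beta1):
    """N = beta0 + beta1 * (P + M)."""
    return beta0 + beta1 * (projected + modified)


# Hypothetical history: estimated object LOC vs. actual new and changed LOC.
history_x = [100, 200, 300, 400]
history_y = [130, 250, 370, 490]
b0, b1 = regression_parameters(history_x, history_y)
print(round(b0, 2), round(b1, 2))                           # 10.0 1.2
print(round(estimate_new_and_changed(250, 50, b0, b1), 1))  # 370.0
```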
58PROBE Step 4 Estimate Program Size
- Estimated program size = B + N + R − D − M
- B: Base Size
- N: Estimated New and Changed LOC
- R: Reused Total LOC
- D: LOC Deleted
- M: LOC Modified
- LOC Modified (M) is subtracted because it is counted twice: once in Base Size (B) and once in Estimated New and Changed LOC (N).
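As a small illustration of Step 4, the sketch below adds the base, new and changed, and reused LOC and subtracts the deleted and modified LOC. The function name and the sample numbers are hypothetical.

```python
def estimated_total_loc(base, new_and_changed, reused, deleted, modified):
    """T = B + N + R - D - M.

    Modified LOC is subtracted because it is already counted once in the
    base size and once in the estimated new and changed LOC.
    """
    return base + new_and_changed + reused - deleted - modified


# Hypothetical values: B = 500, N = 370, R = 100, D = 20, M = 50.
print(estimated_total_loc(base=500, new_and_changed=370, reused=100,
                          deleted=20, modified=50))  # 900
```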
59PROBE Step 5 Calculate Prediction Interval 1
- The purpose is to assess the quality of the estimate.
- Use the t distribution, a desired prediction interval percent, and the data used for the linear regression to get
- Prediction Range
- Prediction Interval [LPI, UPI]
- The actual new and changed LOC is likely to fall within the prediction interval with the selected probability.
60PROBE Step 5 Calculate Prediction Interval 2
- Prediction Interval [LPI, UPI]
- Lower Prediction Interval (LPI) = Estimated New and Changed LOC (N) − Range
- Upper Prediction Interval (UPI) = Estimated New and Changed LOC (N) + Range
61PROBE Step 5 Calculate Prediction Interval 3
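- Prediction Range: calculated from the data used for the linear regression in Step 3
- $\text{Range} = t(\alpha/2,\, n-2)\,\sigma\,\sqrt{1 + \frac{1}{n} + \frac{(x_k - x_{avg})^2}{\sum_{i=1}^{n}(x_i - x_{avg})^2}}$
- $\sigma^2 = \frac{1}{n-2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2$
- Where
- $x_i$ is the estimated object LOC (i.e., P + M) in historical data
- $x_{avg}$ is the average of the $x_i$
- $x_k$ is the estimated object LOC (i.e., P + M) in the new program
- $y_i$ is the actual new and changed LOC in historical data
- $n$ is the number of historical data points, and $t(\alpha/2, n-2)$ is the two-sided t distribution value for the desired prediction interval percent with $n-2$ degrees of freedom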
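The sketch below shows one way to compute the prediction range, assuming the standard prediction-interval formula for simple linear regression: Range = t × σ × sqrt(1 + 1/n + (x_k − x_avg)² / Σ(x_i − x_avg)²), with σ² = Σ(y_i − β0 − β1 x_i)² / (n − 2). The function names and sample data are hypothetical, and the t value is passed in as an input (looked up from a t table) rather than computed.

```python
import math


def prediction_range(x, y, x_k, t_value):
    """Half-width of the prediction interval around a regression estimate.

    x, y    : historical estimated object LOC and actual new and changed LOC
    x_k     : estimated object LOC (P + M) for the new program
    t_value : two-sided t value for the chosen percent and n - 2 degrees of
              freedom, taken from a t table (an input, not computed here)
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three (x, y) pairs")
    x_avg = sum(x) / n
    y_avg = sum(y) / n
    sxx = sum((xi - x_avg) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    beta1 = sum((xi - x_avg) * (yi - y_avg) for xi, yi in zip(x, y)) / sxx
    beta0 = y_avg - beta1 * x_avg
    # Standard deviation of the residuals around the regression line.
    residual_ss = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma = math.sqrt(residual_ss / (n - 2))
    return t_value * sigma * math.sqrt(1 + 1 / n + (x_k - x_avg) ** 2 / sxx)


def prediction_interval(n_estimate, range_):
    """Return (LPI, UPI) = (N - Range, N + Range)."""
    return n_estimate - range_, n_estimate + range_


# Hypothetical history; 1.386 is an illustrative table value (roughly the
# two-sided 70% t value for 2 degrees of freedom).
hist_x = [100, 200, 300, 400]
hist_y = [140, 230, 390, 480]
rng = prediction_range(hist_x, hist_y, x_k=300, t_value=1.386)
print(prediction_interval(370, rng))
```

A wide interval relative to N signals that the historical data are scattered and the estimate should be treated with caution.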
625.5 Object Categories
635.5 Object Categories
- The PROBE method uses objects as proxies. It needs object categories and their sizes to judge the size of new objects.
- Table 5.6 shows an example of the result we want to produce.
- for C and Object Pascal
- object sizes per method are divided into 5 size ranges: very small, small, medium, large, and very large.
64Object Size Distribution
- Assume historical object data are normally distributed.
- Figure 5.8 shows the relation between the normal distribution and the size ranges.
65Fig. 5.8 Normal Distribution with Size Ranges
66Approaches
- Two approaches
- If the size data are close to a normal distribution, use the normal distribution in the calculation.
- Otherwise, use the normal distribution of the natural log of the data (that is, treat the data as log-normally distributed).
67Approach I Procedure with Original Data
- Use historical data on objects
- divide them into categories
- calculate the size range midpoints for each category
- use statistical techniques of the normal distribution in the calculation
- The procedure is as follows.
68Step 1 Classification
- Divide objects in historical data into categories.
- Basic object categories
- logic, control
- I/O, files, display
- data, text, calculation
- set-up, error handling
69Step 2 Calculating LOC per Method
- For each category, calculate the LOC per method of each object.
- Table 5.8 shows an example for 13 objects belonging to the object category text.
70Table 5.8 Pascal Text Object LOC per Method
71Step 3 Calculating Standard
Deviation
- For each category, calculate the variance and standard deviation of the values from Step 2.
- Variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - x_{avg})^2$
- Standard deviation: $\sigma = \sqrt{\sigma^2}$
- where $x_i$ is the LOC per method of each object in one specific category and $x_{avg}$ denotes their average.
- The following table shows an example.
72Table 5.9 Pascal Text Object Standard Deviation
73Step 4 Calculating Size Range
Midpoints
- For each category, use the average $x_{avg}$ and standard deviation $\sigma$ to calculate the size range midpoints
- VL = $x_{avg} + 2\sigma$
- L = $x_{avg} + \sigma$
- M = $x_{avg}$
- S = $x_{avg} - \sigma$
- VS = $x_{avg} - 2\sigma$
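Steps 3 and 4 of Approach I can be sketched as follows. The function name and the sample data are hypothetical (the data are not the values from Table 5.8); the variance uses the 1/n form given above. The skewed sample is chosen to show how the VS midpoint can come out negative.

```python
import math


def size_range_midpoints(loc_per_method):
    """Midpoints VS..VL at avg - 2*sd, avg - sd, avg, avg + sd, avg + 2*sd."""
    n = len(loc_per_method)
    if n == 0:
        raise ValueError("need at least one value")
    avg = sum(loc_per_method) / n
    variance = sum((x - avg) ** 2 for x in loc_per_method) / n
    sd = math.sqrt(variance)
    return {
        "VS": avg - 2 * sd,
        "S": avg - sd,
        "M": avg,
        "L": avg + sd,
        "VL": avg + 2 * sd,
    }


# Hypothetical, strongly skewed LOC-per-method data for 13 objects.
hypothetical_loc_per_method = [3, 4, 5, 6, 8, 10, 12, 15, 20, 30, 45, 60, 90]
midpoints = size_range_midpoints(hypothetical_loc_per_method)
print({name: round(value, 1) for name, value in midpoints.items()})
print(midpoints["VS"] < 0)  # True: a negative midpoint is not a usable size
```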
74Negative Size Range Midpoints
- Table 5.10 shows an example of the size range midpoints and the 13 object sizes per method.
- However, in this example the VS midpoint is a negative number, which is not meaningful for a size and indicates that the object data are not normally distributed.
75Table 5.10 Pascal Text Object and Size Ranges
76Approach II Procedure with Natural Log of
Original Data
- The trick for handling the negative size range midpoint is to work with the natural log of the data.
- The procedure follows; its first 2 steps are the same as in the previous procedure.
77Step 3 Calculating the Natural Log
and Average
- For each category, calculate the natural log (ln) of the LOC per method of each object, and the average, avgln, of these log values.
- Table 5.11 is an example for the object category text.
78Table 5.11 Pascal Text Object ln (LOC per Method)
79Step 4 Calculating Standard
Deviation
- For each category, calculate the variance and standard deviation of these log values.
- Variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - x_{avg})^2$
- Standard deviation: $\sigma = \sqrt{\sigma^2}$
- where $x_i$ = ln(LOC per method) of each object of a category and $x_{avg}$ = avgln.
80Step 5 Calculating the Log of Size
Range Midpoints
- For each category, use the average of the log values and their standard deviation to calculate the log of the size range midpoints
- ln(VL) = avgln + $2\sigma$
- ln(L) = avgln + $\sigma$
- ln(M) = avgln
- ln(S) = avgln − $\sigma$
- ln(VS) = avgln − $2\sigma$
81Step 6 Calculating the Antilog
- For each category, calculate the antilog to get the midpoints of the size ranges
- VL = $e^{\ln(VL)}$
- L = $e^{\ln(L)}$
- M = $e^{\ln(M)}$
- S = $e^{\ln(S)}$
- VS = $e^{\ln(VS)}$
- Table 5.12 shows an example of the size range midpoints and the 13 object sizes per method.
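Approach II can be sketched by running the same midpoint calculation on the natural logs and then taking the antilog. The function name and the sample data are hypothetical (not the values from the tables); because the exponential is always positive, every midpoint comes out greater than zero.

```python
import math


def lognormal_size_range_midpoints(loc_per_method):
    """Midpoints VS..VL computed on ln(LOC per method), then exponentiated."""
    if not loc_per_method or any(x <= 0 for x in loc_per_method):
        raise ValueError("all values must be positive")
    logs = [math.log(x) for x in loc_per_method]
    n = len(logs)
    avg_ln = sum(logs) / n
    sd = math.sqrt(sum((v - avg_ln) ** 2 for v in logs) / n)
    steps = (("VS", -2), ("S", -1), ("M", 0), ("L", 1), ("VL", 2))
    # ln(midpoint) = avg_ln + k * sd; the antilog gives the midpoint itself.
    return {name: math.exp(avg_ln + k * sd) for name, k in steps}


# Hypothetical, strongly skewed LOC-per-method data for 13 objects.
hypothetical_loc_per_method = [3, 4, 5, 6, 8, 10, 12, 15, 20, 30, 45, 60, 90]
midpoints = lognormal_size_range_midpoints(hypothetical_loc_per_method)
print({name: round(value, 1) for name, value in midpoints.items()})
print(all(value > 0 for value in midpoints.values()))  # True
```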
82Table 5.12 Pascal Text Objects and Size Ranges
835.6 Estimating Considerations
845.6 Estimating Considerations 1
- The PSP estimating objective is to improve your estimating ability by tracking and analyzing your estimates.
- If your estimating process is stable, the linear regression method, which is based on historical data gathered from consistently biased estimates, can accurately adjust for that bias.
- As you gain experience and your process evolves, adjust your statistical calculations to include only the newer, more representative data and drop the older data.
85Estimating Considerations 2
- Whenever your linear regression parameters appear unreasonable (e.g., β1 far from 1.0, or a large β0), use the averaging method to estimate.
- This method uses a ratio to adjust size or time based on historical averages.
- estimated program LOC = estimated object LOC × ratio
- where ratio = historical average = Σ(final program LOC) / Σ(estimated object LOC)
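The averaging method amounts to a single ratio, as the sketch below shows. The function names and the historical values are hypothetical.

```python
def averaging_ratio(final_program_loc, estimated_object_loc):
    """Ratio = sum(final program LOC) / sum(estimated object LOC)."""
    total_estimated = sum(estimated_object_loc)
    if not final_program_loc or total_estimated <= 0:
        raise ValueError("need at least one data point with a positive estimate")
    return sum(final_program_loc) / total_estimated


def estimate_by_average(new_estimated_object_loc, final_program_loc, estimated_object_loc):
    """Estimated program LOC = estimated object LOC x historical ratio."""
    return new_estimated_object_loc * averaging_ratio(final_program_loc, estimated_object_loc)


# Hypothetical history for three programs.
final_loc = [120, 260, 330]
estimated_loc = [100, 200, 300]
print(round(estimate_by_average(250, final_loc, estimated_loc), 1))  # 295.8
```

Because the method has no intercept term, it implicitly assumes there is no fixed overhead.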
86Estimating Considerations 3
- To use the linear regression method, you must have at least three programs for which you have made object LOC estimates.
- If you have at least three programs but not enough historical estimate data,
- obtain β0 and β1 from your actual size data
- If you do not have enough historical data but have at least one program,
- use the averaging method
87Estimating Considerations 4
- You can learn to reduce estimating bias by comparing, during the postmortem, the estimates made at each phase with the actual sizes.
- To estimate unprecedented products, the best answer is to resist making firm estimates until you have completed a feasibility study and built some prototypes.
885.7 Summary
895.7 Summary 1
- Accurate size estimates help you make better development plans.
- Size estimating skill improves with practice.
- A defined and measured process provides a repeatable basis for improvement.
90Summary 2
- With PROBE, estimates are based on one of the following four methods
- Method A: regression with estimated object LOC
- Method B: regression with estimated new and changed LOC
- Method C: size or time adjustment based on historical averages (the averaging method)
- Method D: engineering judgment
91Summary 3
- Method A: regression with estimated object LOC
- use the relationship between estimated object LOC and
- actual new and changed LOC
- actual development time
- The criteria for using this method are
- 3 or more data points that are correlated (r² > 0.5)
- reasonable regression parameters
92Summary 4
- Method B: regression with estimated new and changed LOC
- use the relationship between estimated new and changed LOC and
- actual new and changed LOC
- actual development time
- The criteria for using this method are
- 3 or more data points that are correlated (r² > 0.5)
- reasonable regression parameters
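The data-point and correlation criteria for methods A and B can be checked with a short sketch like the one below. The function names are hypothetical, and the check covers only the point count and r² > 0.5; whether the regression parameters are "reasonable" is left to the estimator's judgment.

```python
def r_squared(x, y):
    """Square of the correlation coefficient between x and y."""
    n = len(x)
    x_avg = sum(x) / n
    y_avg = sum(y) / n
    sxx = sum((xi - x_avg) ** 2 for xi in x)
    syy = sum((yi - y_avg) ** 2 for yi in y)
    sxy = sum((xi - x_avg) * (yi - y_avg) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one variable: treat as uncorrelated
    return sxy ** 2 / (sxx * syy)


def meets_regression_criteria(x, y, min_points=3, min_r2=0.5):
    """True if there are enough data points and they are correlated."""
    if len(x) != len(y) or len(x) < min_points:
        return False
    return r_squared(x, y) > min_r2


# Hypothetical history: estimated LOC vs. actual new and changed LOC.
print(meets_regression_criteria([100, 200, 300, 400], [130, 250, 370, 490]))  # True
print(meets_regression_criteria([100, 200], [130, 250]))                      # False
```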
93Summary 5
- Method C: adjusting size or time based on historical averages
- the averaging method
- easy to use
- used when the linear regression parameters (β0, β1) appear unreasonable (that is, methods A and B cannot be used)
- assumes that there is no fixed overhead
- The criteria for using this method are
- at least one data point is required.
94Summary 6
- Method D: engineering judgment with estimated object LOC
- used in the absence of historical data
- uses judgment to estimate, from estimated object LOC,
- actual new and changed LOC
- actual development time
- used when methods A, B, and C cannot be used
955.8 Exercises
96Program 3A 1
- Use PSP0.1 (see Appendix in Topic 4) to write program 3A.
- Program 3A Requirements
- Write a program to count
- the total program LOC
- the total LOC in each object
- the number of methods in each object
- If the input is not an object-oriented program, count
- the total program LOC
- the total LOC in each procedure or function
97Program 3A 2
- Use the counting standard produced in report exercise R1.
- It is acceptable to enhance program 2A or to reuse some of its methods, procedures, or functions in developing this program.
- Program 3A Testing
- Thoroughly test the program.
- At a minimum, test it with programs 1A, 2A, and 3A as input.
- Prepare a test report that includes a table in the format of Table 5.11.
98Table 5.11 Test Result Format: Program 3A
a) Format for object-oriented program designs
b) Format for non object-oriented program designs
99Program 4A 1
- Use PSP1 (see Appendix in this topic) to write program 4A.
- Program 4A Requirements
- Write a program to calculate the linear regression size-estimating parameters.
100Program 4A 2
- Use the historical data of a set of n programs: object LOC, $x_i$, and new and changed LOC, $y_i$.
- Enhance the linked list of program 1A to hold the n data records, where each record holds 2 real numbers.
101Program 4A 3
- Program 4A Testing
- Thoroughly test the program.
- At a minimum, test it with the 3 cases shown in Table 5.12
- use the data for estimated object LOC and actual new and changed LOC.
- use the data for estimated new and changed LOC and actual new and changed LOC.
- use the data (estimated new and changed LOC and actual new and changed LOC) that you have gathered for programs 2A, 3A, and 4A.
- Prepare a test report that includes a table in the format of Table 5.13.
102Table 5.12 Size Estimating Regression Data
103Table 5.13 Test Result Format: Program 4A
104Appendix PSP 1
105PSP1 Process Script 1
106PSP1 Process Script 2
107PSP1 Planning Script 1
108PSP1 Planning Script 2
109PSP1 Development Script 1
110PSP1 Development Script 2
111PSP1 Postmortem Script 1
112PSP1 Postmortem Script 2
113Size Estimating Template 1
114Size Estimating Template 2
115Test Report Template
116PSP1 Project Plan Summary 1
117PSP1 Project Plan Summary 2