Software Engineering Process I: Estimating Software Size

1
Software Engineering Process I: Estimating
Software Size
  • INFO 636
  • Glenn Booker

2
Why Plan?
  • As emphasized earlier, we need a good estimate of
    the amount of work to be performed, in order to
    predict effort and time accurately (per Boehm)
  • Estimation is one of the most challenging aspects
    of managing software development, hence our
    substantial focus on it here

3
Estimation Example
  • Other fields have well established formulas for
    estimating work
  • Construction knows the cost per square foot of
    various types of construction
  • More complex projects look at the linear amount
    of walls, and the areas of various parts (walls,
    ceilings, etc.) to develop good estimates

4
Size Estimation Process
  • The framework, or process, for planning a project
    was covered last lecture
  • Define system requirements
  • Product conceptual design
  • Estimate product size
  • Estimate resources and schedule
  • Develop the product
  • Refine basis for later estimates

5
Estimation Tools
  • Most software estimation tools have been
    calibrated to use software size as an input, and
    produce effort and schedule as outputs
  • COCOMO, SLIM, PRICE S, and McConnell's tables in
    Rapid Development
  • Often start at fairly large project sizes, e.g.
    10,000 LOC and up

6
Estimation Tools
  • We need a basis for estimation which works for an
    individual (programmer)
  • Most organizations use either no estimation
    methods, or use terribly unreliable ones
  • 100% error is far too common

7
Desired Estimation Goals
  • Criteria for a good estimation method include
  • Use structured and trainable methods
  • Should apply to both development and maintenance
  • Should be able to handle all aspects of
    development, not just code

8
Desired Estimation Goals
  • It should be suitable for statistical analysis
  • It should be adaptable to future types of work
  • It should be possible to judge the accuracy of
    your work (and hence refine the model)
  • We'll briefly cover four estimation methods, then
    explain the proxy-based PROBE approach

9
Estimation Methods
  • Wideband-Delphi Method
  • Fuzzy Logic Method
  • Standard Component Method
  • Function Point Method
  • Proxy-based Estimating

10
Wideband-Delphi Method
  • This method was developed by Rand Corporation
  • It uses several people to estimate the same task,
    then applies a Delphi method to get a consensus
    estimate
  • The process is
  • Discuss the problem

11
Wideband-Delphi Method
  • Get anonymous estimates, and hand them to a
    moderator
  • Find the median estimate, and show everyone the
    set of estimates
  • Discuss the results, to uncover different views
    of the project scope
  • Repeat the process until estimates converge to
    within a predefined range
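
One round of the process above can be sketched in a few lines. This is only an illustration: the function name, the sample estimates, and the 20% convergence tolerance are assumptions, since the slides leave the "predefined range" up to the team.

```python
from statistics import median

def delphi_round(estimates, tolerance=0.2):
    """Summarize one round of anonymous estimates.

    Returns the median and whether every estimate lies within
    `tolerance` (an assumed fraction of the median) of that median.
    """
    mid = median(estimates)
    converged = all(abs(e - mid) <= tolerance * mid for e in estimates)
    return mid, converged

# Hypothetical LOC estimates from two successive rounds
round1 = [400, 900, 650, 1200]
round2 = [700, 800, 750, 780]
print(delphi_round(round1))  # (775.0, False): discuss and estimate again
print(delphi_round(round2))  # (765.0, True): estimates have converged
```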

12
Fuzzy Logic Method
  • This approach uses historic data to arrive at
    some meaningful estimates based on qualitative
    descriptions
  • Size categories such as Very Small, Small,
    Medium, Large, and Very Large
  • How data are divided into these categories
    depends on the type of data

13
Fuzzy Logic Method
  • Data with a small range (say, a factor of five
    from very small to very large) can use linear
    divisions
  • Data with a large range can use a base 10
    logarithmic division (as shown in the text)

14
Fuzzy Logic Method
  • Linear division breaks up sizes into evenly
    divided pieces
  • Here's an example for the N track
  • If your work to read the text involves chapters
    from 23 to 75 pages long (I made those numbers
    up), then the range of sizes is 75 - 23 = 52 pages
  • Divide that range into five pieces by dividing by
    four: 52/4 = 13

15
Fuzzy Logic Method
  • The midpoints of each size are just the lowest
    size, then add the 13 four times
  • Very Small midpoint = 23 pages
  • Small midpoint = 23 + 13 = 36 pages
  • Medium midpoint = 23 + 13×2 = 49 pages
  • Large midpoint = 23 + 13×3 = 62 pages
  • Very Large midpoint = 23 + 13×4 = 75 pages (which
    equals the largest chapter size)

16
Fuzzy Logic Method
  • Use half of 13, or 6.5, to find the ranges for
    each size
  • Very Small range is up to 23 + 6.5 = 29.5 pages
  • Small range is 29.5 to 36 + 6.5 = 42.5 pages
  • Medium range is 42.5 to 49 + 6.5 = 55.5 pages
  • Large range is 55.5 to 62 + 6.5 = 68.5 pages
  • Very Large range is 68.5 pages and up
  • Notice each category's range is also 13 pages,
    since we have linear divisions
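
The linear division can be written as a short function. This is a sketch using the made-up 23-to-75-page example; the function name and the use of None for the open-ended outer bounds are illustrative choices, not part of the text.

```python
def linear_size_ranges(smallest, largest,
                       names=("Very Small", "Small", "Medium",
                              "Large", "Very Large")):
    """Return (name, midpoint, lower bound, upper bound) per category.

    The outermost bounds are None because the first and last
    categories are open-ended ("up to ..." and "... and up").
    """
    step = (largest - smallest) / (len(names) - 1)
    rows = []
    for i, name in enumerate(names):
        midpoint = smallest + i * step
        lower = None if i == 0 else midpoint - step / 2
        upper = None if i == len(names) - 1 else midpoint + step / 2
        rows.append((name, midpoint, lower, upper))
    return rows

for row in linear_size_ranges(23, 75):
    print(row)
```

With these inputs the step is 13 pages, the midpoints are 23, 36, 49, 62, and 75, and the boundaries fall at 29.5, 42.5, 55.5, and 68.5 pages.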

17
Fuzzy Logic Method
  • The logarithmic version is messier, since we have
    to
  • Convert the sizes to their logarithms
  • Follow the linear approach using the logarithms
  • Take everything to the power of 10 to convert it
    back to the original units

18
Fuzzy Logic Method
  • The example in the book has LOC ranging from 173
    to 10,341 LOC
  • The log10 of 173 is 2.238
  • The log10 of 10,341 is 4.014
  • The difference is 4.014 - 2.238 = 1.776
  • Divide the difference by four to get the interval
    1.776/4 = 0.444
  • Mimic slide 15 to find the midpoints

19
Fuzzy Logic Method
  • The midpoints of each size are just the lowest
    size, then add the 0.444 four times
  • Very Small midpoint = 2.238
  • Small midpoint = 2.238 + 0.444 = 2.682
  • Medium midpoint = 2.238 + 0.444×2 = 3.126
  • Large midpoint = 2.238 + 0.444×3 = 3.570
  • Very Large midpoint = 2.238 + 0.444×4 = 4.014
    (which equals the largest code size)
  • Mimic slide 16 to find the ranges of each size
    category

20
Fuzzy Logic Method
  • Use half of 0.444, or 0.222, to find the ranges
    for the first size (then just keep adding 0.444
    to each range boundary)
  • Very Small range is up to 2.238 + 0.222 = 2.460
  • Small range is 2.460 to 2.460 + 0.444 = 2.904
  • Medium range is 2.904 to 2.904 + 0.444 = 3.348
  • Large range is 3.348 to 3.348 + 0.444 = 3.792
  • Very Large range is 3.792 and up

21
Fuzzy Logic Method
  • Now take 10 to the power of the logarithms to
    find the actual LOC
  • Very Small range is up to 10^2.460 = 288 LOC
  • Small range is 288 to 10^2.904 = 802 LOC
  • Medium range is 802 to 10^3.348 = 2228 LOC
  • Large range is 2228 to 10^3.792 = 6194 LOC
  • Very Large range is 6194 LOC and up
  • This is the basis for the poorly labeled table at
    the bottom of page 104 in the text
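
The three logarithmic steps (take logs, divide linearly, raise 10 to the result) can be sketched as follows. The function name is illustrative. Because the code keeps full precision instead of rounding the logarithms to three decimals, the upper boundaries come out slightly different from the hand calculation (by a few LOC out of several thousand).

```python
import math

def log_size_ranges(smallest, largest, categories=5):
    """Upper boundary of every category except the last, in original units."""
    log_low = math.log10(smallest)
    step = (math.log10(largest) - log_low) / (categories - 1)
    # The first boundary sits half a step above the smallest size;
    # each later boundary is one full step further on the log scale.
    return [10 ** (log_low + step / 2 + i * step)
            for i in range(categories - 1)]

bounds = log_size_ranges(173, 10341)
print([round(b) for b in bounds])
```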

22
Fuzzy Logic Method
  • An aside: Table 5.2 in the text divides each of
    the five basic categories (Very Small, etc.) into
    five more subranges
  • This follows the same approach, just adding more
    detail to each category
  • It's unlikely you'll have enough data to worry
    about subranges

23
Standard Component Method
  • The Standard Component Method, by Putnam,
    assumes you have a substantial database from
    which to make your estimates
  • Make a realistic estimate of how many screens you
    think will be in your system
  • Estimate the lowest and highest possible numbers
    of screens you could imagine will be in your
    system

24
Standard Component Method
  • For actual estimation, use
    n = (lowest number + highest number +
    4 × realistic number)/6
  • The idea is to try to account for possible error
    in your estimate
  • Repeat this process for each type of component in
    your system
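
As a quick check of the weighting, here is the formula as a function. The screen counts are hypothetical, and the function name is only for illustration.

```python
def standard_component_estimate(lowest, realistic, highest):
    """Weighted average that counts the realistic guess four times."""
    return (lowest + 4 * realistic + highest) / 6

# Hypothetical guesses for the number of screens
print(standard_component_estimate(lowest=8, realistic=12, highest=22))  # 13.0
```

The result leans toward the realistic guess but shifts a little in the direction of whichever extreme is farther away.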

25
Function Point Method
  • The function point approach uses function
    points as a proxy for the complexity of the
    system, independent of the programming language
    used
  • See ISYS 420, lecture 8 for details of this
    approach

26
Function Point Method
  • Each input or output function, interface, file,
    and inquiry is judged on a fixed complexity scale
    of small to large (not shown in the Humphrey
    text), and assigned some number of function
    points
  • The total number of function points is adjusted
    for 14 influence factors, such as the
    developers' expertise, business environment, etc.

27
Function Point Method
  • While a great language-independent method for
    judging the complexity of a program, it isn't as
    reliable for estimating development effort
  • See IFPUG for more details

28
Proxy-based Estimating
  • We are trying to predict the final size of a
    software product
  • Measuring or estimating that directly is tricky
    at best, so we use proxies to help get there
  • A proxy is an intermediate concept or substitute
    for what we really want to predict

29
Proxy-based Estimating
  • The overall process is like this
  • We want to take the conceptual design, and break
    it into parts which correspond to the proxies
    available
  • Estimate each part of the system, based on the
    proxies
  • Add them up to get the overall product size

30
Choosing a Proxy
  • The proxy size should correspond to the
    development effort size
  • Proxy content should be countable and easy to
    visualize
  • Proxy must be customizable
  • The proxy should be sensitive to the same factors
    which affect development

31
Possible Proxies
  • In a manner similar to function points, any
    characteristic of the system could be a proxy
  • Input screens, output reports, data files
  • Objects or classes
  • The fuzzy logic and function point concepts are
    essentially blended to produce the PROBE approach

32
PROBE Method
  • PROxy-Based Estimation (PROBE) uses objects as
    proxies
  • See also Appendix C, Tables C36 and C40
  • First choose appropriate proxy categories (e.g.
    Table 5.7, p. 117)
  • For code, calculation, data, I/O, control, print,
    etc. might be suitable proxies
  • Reading, discussion, homework, (N track)

33
PROBE Method
  • Choose reasonable size options for the proxies
  • For class, you might only have enough data for
    three sizes instead of five
  • Analyze your historic data to determine
    approximate sizes (LOC) for each proxy
  • For N track, the amount of effort needed

34
PROBE Method
  • Now start using your method for a given
    assignment
  • Develop a conceptual design for the solution
  • Use your proxies to estimate the amount of code
    or effort needed to develop them
  • The example on page 120 is the first use of form
    C39 (p. 683)
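
A minimal sketch of turning a conceptual design into a size estimate, assuming your historic data has been boiled down to an average LOC per method for each proxy type and relative size. Every number and category name below is hypothetical.

```python
# Hypothetical historic LOC per method, by proxy type and relative size
LOC_PER_METHOD = {
    ("calculation", "small"): 6.0,
    ("calculation", "medium"): 11.0,
    ("data", "medium"): 9.0,
    ("io", "large"): 25.0,
}

def estimate_objects(objects):
    """Sum estimated LOC over (type, relative size, method count) entries."""
    return sum(LOC_PER_METHOD[(kind, size)] * methods
               for kind, size, methods in objects)

# Hypothetical conceptual design: three objects
conceptual_design = [
    ("calculation", "medium", 4),
    ("data", "medium", 3),
    ("io", "large", 2),
]
print(estimate_objects(conceptual_design))  # 121.0
```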

35
A Course Note
  • P track students will use the estimating pretty
    much as written in the text
  • Our forms are slightly different
  • N track students will develop their own proxies
    to correspond to their weekly activities, and
    create a custom form N39 to follow a similar
    process

36
PROBE Method
  • The BASE PROGRAM section of C39 is a summary of
    the expected changes to the preexisting code
  • Base Size (B) is the amount of code already
    present
  • LOC Deleted (D) is how much existing code you
    plan to remove
  • LOC Modified (M) is how much existing code you
    expect to change

37
PROBE Method
  • The PROJECTED LOC section contains
  • Base Additions (BA) are planned additions to
    existing code (new lines within existing modules)
  • New Objects (NO) are new modules or classes which
    will need to be implemented
  • Your proxy structure is used to describe the
    Type, Methods, and Relative Size of the changes
    to BA and NO

38
PROBE Method
  • The REUSED OBJECTS (R) section of C39 is used to
    describe
  • Code you'll reuse from another preexisting
    source
  • Code you'll create during this assignment which
    will be reusable
  • These tend to be rare during the course

39
PROBE Method
  • Now comes the number crunching part
  • The Projected LOC (P) is the total amount of new
    development for this assignment: P = BA + NO
  • The terms β0 (hereafter beta0) and β1 (beta1)
    are linear regression parameters from your work
    history
  • By now you have a history of planned LOC or
    effort, and actual values

40
PROBE Method
  • What the flock are beta0 and beta1?
  • The classic equation for a line is y = mx + b
  • m is the slope, which corresponds to beta1
  • b is the y-intercept, which is beta0
  • Here the x axis is the planned LOC or effort,
    and the y axis has actual values

41
PROBE Method
42
PROBE Method
  • See regression handout for an example of
    calculating beta0 and beta1
  • Note that Σxi^2 means Σ(xi^2), not (Σxi)^2
  • When you use this, make sure the formulas are
    correct
  • n changes each week as new data is created
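
For reference, here is a sketch of the usual least-squares calculation of beta0 and beta1. It is the standard textbook formula rather than a copy of the regression handout, and the planned/actual history is hypothetical.

```python
def regression_parameters(planned, actual):
    """Least-squares fit of actual = beta0 + beta1 * planned."""
    n = len(planned)
    x_avg = sum(planned) / n
    y_avg = sum(actual) / n
    sum_xy = sum(x * y for x, y in zip(planned, actual))
    # Sum of the squares, not the square of the sum
    sum_x2 = sum(x * x for x in planned)
    beta1 = (sum_xy - n * x_avg * y_avg) / (sum_x2 - n * x_avg ** 2)
    beta0 = y_avg - beta1 * x_avg
    return beta0, beta1

# Hypothetical history: actual is always 10 + 1.2 * planned
planned = [100, 200, 300, 400]
actual = [130, 250, 370, 490]
beta0, beta1 = regression_parameters(planned, actual)
print(round(beta0, 3), round(beta1, 3))  # 10.0 1.2
```

Rerun the fit each time a new planned/actual pair is added, since n and all the sums change.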

43
PROBE Method
  • Incidentally, if your estimates are always
    perfect, you'd have beta1 = 1 and beta0 = 0
    (why?)
  • Once you have beta0 and beta1, find
  • New and Changed LOC (N) = beta0 + beta1 × (P + M)
  • It's critical to note that later calculations
    for prediction interval use N, not P

44
PROBE Method
  • The expected size of the application after this
    project is
  • Total LOC (T) = N + B - D - M + R
  • The Total New Reused is the sum of code flagged
    (with a ) in the New Objects section which are
    being reused
  • You don't need to use this very often
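
The two size formulas can be sketched as functions, taking N as beta0 + beta1 × (P + M) and T as N + B - D - M + R. All input values below are hypothetical.

```python
def new_and_changed_loc(beta0, beta1, projected, modified):
    """N = beta0 + beta1 * (P + M)"""
    return beta0 + beta1 * (projected + modified)

def total_loc(n, base, deleted, modified, reused):
    """T = N + B - D - M + R"""
    return n + base - deleted - modified + reused

# Hypothetical values for one assignment
n = new_and_changed_loc(beta0=10.0, beta1=1.2, projected=150, modified=25)
t = total_loc(n, base=400, deleted=30, modified=25, reused=60)
print(round(n, 1), round(t, 1))  # 220.0 625.0
```

Modified lines are subtracted in T because they are already counted inside both N and the base size B.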

45
PROBE Method
  • Then we get to the Range calculation
  • We have a refined estimate of the size of the
    system, but want to establish a prediction
    interval in which the real outcome is likely to
    fall
  • See the PSP_Calculation_Example.xls spreadsheet

46
PROBE Method
  • To find the Range, we start with a parameter from
    the t distribution
  • Called t(α/2, n-2), where
  • α/2 is the width of the prediction interval,
    generally 70% or 90%
  • n-2 is the number of degrees of freedom; again,
    n is the number of data pairs
  • In Excel, use TINV(1 - α/2, n - 2)

47
PROBE Method
  • Next we need the standard deviation, σ
  • That's why column G adds up
    (Yi - beta0 - beta1×Xi)^2
  • σ = sqrt( Σ(Yi - beta0 - beta1×Xi)^2 / (n-2) )
  • Now there's a new term, x_k (xk)
  • xk = P + M
  • This is the same term used in the N formula: the
    projected and modified LOC

48
PROBE Method
  • Now use this to plug into formula 5.3 on page 124
  • I'm not going to copy it here
  • Notice in the spreadsheet the column H
    calculation of (Xi - Xavg)^2, which is also used
    to find the Range

49
PROBE Method
  • Finally, find the Upper and Lower Prediction
    Intervals (UPI and LPI)
  • UPI = N + Range
  • LPI = N - Range
  • The Prediction Interval Percent is either 70 or
    90, the value used to find t
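
Putting the pieces together, here is a sketch of the whole Range calculation. It assumes the standard prediction-interval form for simple linear regression, Range = t × σ × sqrt(1 + 1/n + (xk - Xavg)^2 / Σ(Xi - Xavg)^2), and takes the t value as an input looked up elsewhere (for example from a t table or a spreadsheet). The history data is hypothetical.

```python
import math

def prediction_interval(planned, actual, beta0, beta1, x_k, t_value):
    """Return (N, Range, LPI, UPI) for a new estimate x_k = P + M."""
    n = len(planned)
    x_avg = sum(planned) / n
    # Sum of squared residuals around the regression line
    residual_sq = sum((y - beta0 - beta1 * x) ** 2
                      for x, y in zip(planned, actual))
    sigma = math.sqrt(residual_sq / (n - 2))
    # Spread of the historic x values around their average
    spread = sum((x - x_avg) ** 2 for x in planned)
    rng = t_value * sigma * math.sqrt(
        1 + 1 / n + (x_k - x_avg) ** 2 / spread)
    estimate = beta0 + beta1 * x_k
    return estimate, rng, estimate - rng, estimate + rng

# Hypothetical history; beta0 = 10 and beta1 = 1.2 are its least-squares fit
planned = [100, 200, 300, 400]
actual = [140, 240, 360, 500]
# 1.386 is the two-sided 70% t value for n - 2 = 2 degrees of freedom
n_est, rng, lpi, upi = prediction_interval(
    planned, actual, 10.0, 1.2, x_k=250, t_value=1.386)
print(round(n_est, 1), round(rng, 1), round(lpi, 1), round(upi, 1))
```

With only four data pairs the t value is large, which is one reason the Range stays wide until more history accumulates.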

50
PROBE Method
  • If Range is comparable to N in magnitude
  • Choose a Prediction Interval Percent of 70 to
    keep Range smaller, and/or
  • Look for data fliers which can have a strong
    influence on sigma (σ)
  • E.g. data points with relatively large value of
    (Yi - beta0 - beta1×Xi)^2

51
Object Size Ranges
  • The fuzzy logic method (starting on slide 12)
    summarizes the two most likely approaches for
    defining size ranges based on your historic data
  • A Linear approach, generally best if the range of
    the data is well under a factor of 10
  • A logarithmic approach for wider range data

52
Object Size Ranges
  • If your work is following a true normal
    distribution, then your objects should have
  • 6.68% each in Very Small and Very Large
    categories
  • 24.17% each in Small and Large categories
  • 38.30% in the Medium category
  • It's good to see if this holds
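
These percentages can be reproduced from the normal distribution if the category boundaries are taken to lie at ±0.5 and ±1.5 standard deviations from the mean. That boundary placement is an assumption here, chosen because it gives the figures above to within rounding.

```python
import math

def normal_cdf(z):
    """Cumulative probability of a standard normal variable."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Assumed category boundaries, in standard deviations from the mean
boundaries = [-1.5, -0.5, 0.5, 1.5]
cdf = [0.0] + [normal_cdf(z) for z in boundaries] + [1.0]
names = ["Very Small", "Small", "Medium", "Large", "Very Large"]
shares = {name: 100 * (hi - lo)
          for name, lo, hi in zip(names, cdf, cdf[1:])}
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```

Comparing these shares with the fraction of your own objects in each category is a simple way to check whether the size distribution is skewed.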

53
Object Size Ranges
  • If your object size distribution is really
    skewed, you could
  • Reconsider the size categories
  • Look for better proxies
  • See if your design approach is leaning toward
    very large or very small objects, or very
    inconsistent object sizes

54
N Track Notes
  • You'll use most of the preceding discussion
  • You'll have different proxies instead of the
    Base Program, Projected LOC, and Reused
    Objects
  • You'll have some equivalent of P and N, and
    still find beta0, beta1, and Range
  • Your P and N will measure time instead of LOC
  • You'll still find prediction intervals UPI, LPI

55
Improving Estimation
  • We tend to try to estimate many small things for
    a large task
  • The estimation errors tend to cancel each other
    somewhat
  • The PSP allows you to know what your estimation
    errors have been, and hence improve later
    estimates
  • Though that's hard to see during the term

56
Improving Estimation
  • As you follow this consistently, your values for
    beta0 and beta1 will tend to stabilize
  • Then you don't have to keep recalculating them!
  • If you get really weird beta0 and beta1, or have
    no history yet, look at other options for
    refining your estimate, on page 679 (Table C35)

57
Improving Estimation
  • On large projects, look for a consistent, and
    fairly low, level of abstraction
  • The conceptual design might need to be refined to
    provide enough detail for a good estimate
  • If a single object performs the work of many
    kinds of proxies, then it probably needs to be
    broken down

58
Improving Estimation
  • Estimating products which have no precedent is
    really tough
  • Make sure the level of uncertainty is clear to
    your customer
  • Avoid overcompensating for your own history of
    errors
  • Make small changes in your approach and try them
    for a while