Prioritizing Test Cases for Regression Testing - PowerPoint PPT Presentation

About This Presentation
Title:

Prioritizing Test Cases for Regression Testing

Description:

1) techniques that order test cases based on their total coverage of code components, ... After complete coverage has been achieved the remaining test cases are ... – PowerPoint PPT presentation

Number of Views:2256
Avg rating:3.0/5.0
Slides: 48
Provided by: martinl
Category:

less

Transcript and Presenter's Notes

Title: Prioritizing Test Cases for Regression Testing


1
Prioritizing Test Cases for Regression Testing
  • Article By Rothermel, et al.
  • Presentation by
  • Martin, Otto, and Prashanth

2
  • Test case prioritization techniques - schedule
    test cases for execution in an order that
    attempts to increase their effectiveness at
    meeting some performance goal.
  • One goal is the rate of fault detection - a
    measure of how quickly faults are detected within
    the testing process
  • An improved rate of fault detection during
    testing can provide faster feedback on the system
    under test and let software engineers begin
    correcting faults earlier than might otherwise be
    possible.
  • One application of prioritization techniques
    involves regression testing

3
  • This paper describes several techniques for using
    test execution information to prioritize test
    cases for regression testing, including
  • 1) techniques that order test cases based on
    their total coverage of code components,
  • 2) techniques that order test cases based on
    their coverage of code components not previously
    covered, and
  • 3) techniques that order test cases based on
    their estimated ability to reveal faults in the
    code components that they cover.

4
  • When the time required to re-execute an entire
    test suite is short, test case prioritization may
    not be cost-effective-it may be sufficient simply
    to schedule test cases in any order.
  • When the time required to execute an entire test
    suite is sufficiently long, however, test-case
    prioritization may be beneficial because, in this
    case, meeting testing goals earlier can yield
    meaningful benefits.
  • In general test case prioritization, given
    program P and test suite T, we prioritize the
    test cases in T with the intent of finding an
    ordering of test cases that will be useful over a
    succession of subsequent modified versions of P.
  • In the case of regression testing, prioritization
    techniques can use information gathered in
    previous runs of existing test cases to help
    prioritize the test cases for subsequent runs.

5
  • This paper considers 9 different test case
    prioritization techniques.
  • The first three techniques serve as experimental
    controls
  • The last six techniques represent heuristics that
    could be implemented using software tools
  • A source of motivation for these approaches is
    the conjecture that the availability of test
    execution data can be an asset.
  • This assumes that past test execution data can be
    used to predict, with sufficient accuracy,
    subsequent execution behavior.

6
(No Transcript)
7
  • Definition 1. The Test Case Prioritization
    Problem
  • Given T, a test suite, PT, the set of
    permutations of T, and f, a function from PT to
    the real numbers.
  • PT represents the set of all possible
    prioritizations (orderings) of T
  • f is a function that, applied to any such
    ordering, yields an award value for that
    ordering.

8
  • A challenge care must be taken to keep the cost
    of performing the prioritization from excessively
    delaying the very regression testing activities
    it is intended to facilitate.

9
  • M3 Optimal prioritization.
  • Given program P and a set of known faults for P,
    if we can determine, for test suite T, which test
    cases in T expose which faults in P, then we can
    determine an optimal ordering of the test cases
    in T for maximizing T's rate of fault detection
    for that set of faults.
  • This is not a practical technique, as it requires
    a priori knowledge of the existence of faults and
    of which test cases expose which faults.
  • However, by using this technique in the empirical
    studies, we can gain insight into the success of
    other practical heuristics, by comparing their
    solutions to optimal solutions.

10
  • M4 Total statement coverage prioritization.
  • By instrumenting a program, we can determine, for
    any test case, which statements in that program
    were exercised (covered) by that test case.
  • We can then prioritize test cases in terms of the
    total number of statements they cover by counting
    the number of statements covered by each test
    case and then sorting the test cases in
    descending order of that number.

11
  • M5 Additional statement coverage prioritization.
  • Total statement coverage prioritization schedules
    test cases in the order of total coverage
    achieved however, having executed a test case
    and covered certain statements, more may be
    gained in subsequent testing by executing
    statements that have not yet been covered.
  • Additional statement coverage prioritization
    iteratively selects a test case that yields the
    greatest statement coverage, then adjusts the
    coverage information on all remaining test cases
    to indicate their coverage of statements not yet
    covered and repeats this process until all
    statements covered by at least one test case.
  • We may reach a point where each statement has
    been covered by at least one test case, and the
    remaining unprioritized test cases cannot add
    additional statement coverage. We could order
    these remaining test cases using any
    prioritization technique.

12
(No Transcript)
13
  • M6 Total branch coverage prioritization.
  • Total branch coverage prioritization is the same
    as total statement coverage prioritization,
    except that it uses test coverage measured in
    terms of program branches rather than statements.
  • In this context, we define branch coverage as
    coverage of each possible overall outcome of a
    (possibly compound) condition in a predicate.
    Thus, for example, each if or while statement
    must be exercised such that it evaluates at least
    once to true and at least once to false.

14
  • M7 Additional branch coverage prioritization.
  • Additional branch coverage prioritization is the
    same as additional statement coverage
    prioritization, except that it uses test coverage
    measured in terms of program branches rather than
    statements.
  • After complete coverage has been achieved the
    remaining test cases are prioritized by resetting
    coverage vectors to their initial values and
    reapplying additional branch coverage
    prioritization to the remaining test cases.

15
  • M8 Total fault-exposing-potential (FEP)
    prioritization.
  • Some faults are more easily exposed than other
    faults, and some test cases are more adept at
    revealing particular faults than other test
    cases.
  • The ability of a test case to expose a fault-that
    test case's fault exposing potential
    (FEP)-depends not only on whether the test case
    covers (executes) a faulty statement, but also on
    the probability that a fault in that statement
    will cause a failure for that test case
  • Three probabilities that could be used in
    determining FEP
  • 1) the probability that a statement s is executed
    (execution probability),
  • 2) the probability that a change in s can cause a
    change in program state (infection probability),
    and
  • 3) the probability that a change in state
    propagates to output (propagation probability).

16
  • This paper adopts an approach that uses mutation
    analysis, to produce a combined estimate of
    propagation-and-infection that does not
    incorporate independent execution probabilities.
  • Mutation analysis creates a large number of
    faulty versions (mutants) of a program by
    altering program statements, and uses these to
    assess the quality of test suites by measuring
    whether those test suites can detect those faults
    (kill those mutants).
  • Given program P and test suite T, we first create
    a set of mutants N n1 n2 . . . nm for P,
    noting which statement sj in P contains each
    mutant. Next, for each test case ti in T, we
    execute each mutant version nk of P on ti, noting
    whether ti kills that mutant.
  • Having collected this information for every test
    case and mutant, we consider each test case ti
    and each statement sj in P, and calculate the
    fault-exposing potential FEP(s, t) of ti on sj as
    the ratio of mutants of sj killed by ti to the
    total number of mutants of sj.

17
  • To perform total FEP prioritization, given these
    FEP(s t) values, we next calculate, for each
    test case ti in T, an award value, by summing the
    FEP(sj ti) values for all statements sj in P.
  • Given these award values, we then prioritize test
    cases by sorting them in order of descending
    award value.

18
(No Transcript)
19
  • M9 Additional fault-exposing-potential (FEP)
    prioritization.
  • This lets us account for the fact that additional
    executions of a statement may be less valuable
    than initial executions.
  • We require a mechanism for measuring the value of
    an execution of a statement, that can be related
    to FEP values.
  • For this, we use the term confidence. We say that
    the confidence in statement s, C(s), is an
    estimate of the probability that s is correct.
  • If we execute a test case t that exercises s and
    does not reveal a fault in s, C(s) should
    increase.

20
(No Transcript)
21
  • Research Questions
  • Can test case prioritization improve the rate of
    fault detection in test suites?
  • How do the various test case prioritization
    techniques discussed earlier compare to one
    another in terms of effects on rate of fault
    detection?
  • Effectiveness Measures
  • Use a weighted Average of the Percentage of
    Faults Detected (APFD)
  • Ranges from 0..100
  • Higher numbers means faster detection
  • Problems with APFD
  • Doesnt measure cost of prioritization
  • Cost is normally amortized because test suites
    are created after the release of a version of the
    software

22
Effectiveness Example
23
  • Programs used
  • Aristotle program analysis system for test
    coverage and control graph information
  • Proteum mutation system to obtain mutation
    scores.
  • Used 8 C programs as subjects
  • First 7 were created at Siemens, the eighth is a
    European Space Agency program

24
(No Transcript)
25
  • Siemens Programs - Description
  • 7 programs used by Siemens in a study that
    observed the fault detecting effectiveness of
    coverage criteria
  • Created faulty versions of these programs by
    manual seeding them with single errors creating
    the number of versions column
  • Using single line faults only allows researchers
    to determine whether a test case discovers the
    error or not
  • For each of the seven programs, a test case suite
    was created by Siemens. First via a black box
    method, they then completed the suite using white
    box testing, so that each executable statement,
    edge, and definition use pair was exercised by
    at least 30 test cases.
  • Kept faulty programs whose errors were detectable
    by between 3 and 350 test cases
  • Test suites were created by the researchers by
    random selection until a branch coverage adequate
    test suite was created
  • Proteum was used to create mutants of the seven
    programs

26
  • Space Program Description
  • 33 versions of space with only one fault in each
    were created by the ESA, 2 more were created by
    the research team
  • Initial pool of 10 000 test cases were obtained
    from Vokolos and Frankl
  • Used these as a base and added cases until each
    statement and edge was exercised by at least 30
    test cases
  • Created a branch coverage adequate test suite in
    the same way as the Siemens program
  • Also created mutants via Proteum

27
  • Empirical Studies and Results
  • 4 different studies using the 8 programs
  • Siemens programs with APFD measured relative to
    Siemens faults
  • Siemens programs with APFD measured relative to
    mutants
  • Space with APFD measured relative to actual
    faults
  • Space with APFD measure relative to mutants

28
  • Siemens programs with APFD measured relative to
    Siemens faults Study Format
  • M2 to M9 were applied to each of the 1000 test
    suites, resulting in 8000 prioritized test suites
  • The original 1000 were used as M1
  • Calculated the APFD relative to the faults
    provided by the program

29
Example boxplot
30
  • Study 1 - Overall observations
  • M3 is markedly better than all of the others (as
    expected)
  • The test case prioritization techniques offered
    appear to have some improvement, but more
    statistics needed to be done to confirm
  • Upon completion of these statistics, more results
    were revealed
  • Branch based coverage did as well or better than
    statement coverage
  • All except one indicates that total branch
    coverage did as well or better than additional
    branch coverage
  • All total statement coverage did as well or
    better than additional statement coverage
  • In 5 of 7 programs, even randomly prioritized
    test suites did better than untreated test suites

31
Example Groupings
32
  • Siemens programs with APFD measured relative to
    mutants Study Format
  • Same format as the first study, 9000 test suites
    used, 1000 for each prioritization technique
  • But rather than run those test cases on the small
    subset of known errors, they were applied to
    mutated programs that were created to form a
    larger bed of programs to test against
  • Results
  • Additional and Total FEP prioritization
    outperformed all others (except optimal)
  • Branch almost always outperformed statement
  • Total statement outperformed additional
  • But additional branch coverage outperformed total
    branch coverage
  • However, in this study random did not outperform
    the control

33
  • Space with APFD measured relative to Actual
    Faults
  • M2 M9 were applied to each of the 50 test
    suites, resulting in 400 test suites, plus the
    original 50 resulting in 450 total test suites
  • Additional FEP outperformed all others, but there
    was no significant difference among the rest
  • Also random is no better than the control

34
Study 3 Groupings
35
  • Space with APFD measured relative to mutants
  • Same technique as other space study, only using
    132,163 mutant version of the software
  • Additional FEP outperformed all others
  • Branch and statement are indistinguishable
  • But additional coverage always outperforms its
    total counterpart

36
Study 4 Groupings
37
  • Threats to Validity
  • Construct Validity You are measuring what you
    say you are measuring (and not something else)
  • Internal Validity Ability to say that the
    causal relationship is true
  • External Validity Ability to generalize results
    across the field

38
  • Construct Validity
  • APFD is highly accurate, but it is not the only
    method of measuring fault detection, could also
    measure percentage of test suite that must be run
    before all errors are found
  • No value to later tests that detect the same
    error
  • FEP based calculations Other estimates may more
    accurately capture the probability of a test case
    finding a fault
  • Effectiveness is measured without cost

39
  • Internal Validity
  • Instrumentation bias can bias results especially
    in APFD and prioritization measurement tools
  • Performed code revision
  • Also limit problems by running prioritization
    algorithm on each test suite and each subject
    program

40
  • External Validity
  • The Siemens programs are non-trivial but not
    representative of real world programs. The space
    program is, but is only one program
  • Faults in Siemens programs were seeded (not like
    those in the real world)
  • Faults in space were found during development,
    but these may differ from those found later in
    the development process. Plus they are only one
    set of faults found by one set of programmers
  • Single faults version programs are also not
    representative of the real world
  • The test suites were created with only a single
    method, other real world methods exist
  • These threats can only be answered by more
    studies with different test suites, programs, and
    errors

41
Additional Discussion And Practical Implications
42
  • Test case prioritization can substantially
    improve rate of fault detection of test suites.
  • Additional FEP prioritization techniques do not
    always justify the additional expenses incurred,
    as is gathered from cases where specific coverage
    based techniques outperformed them and also in
    cases where the total gain in APFD, when the
    additional FEP techniques did perform the best,
    was not large enough.
  • Branch-coverage-based techniques almost always
    performed as well if not better than
    statement-coverage-based techniques. Thus if the
    two techniques incur similar costs,
    branch-coverage-techniques are advocated.

43
  • Total statement and branch coverage techniques
    perform almost at par with the additional branch
    and statement coverage techniques, entitling its
    use due to its lower complexity.
  • However, this does not apply for space (Study 4)
    program where the additional branch and statement
    coverage techniques outperformed the total
    statement and branch coverage techniques by a
    huge margin.
  • Randomly prioritized test suites typically
    outperform untreated test suites.

44
Conclusion
45
  • Any one of the prioritization techniques offer
    some amount of improved fault detection
    capabilities.
  • These studies are of interest only to research
    groups, due to the high expense that they incur.
    However, code coverage based techniques have
    immediate practical implications.

46
Future Work
47
  • Additional studies to be performed using wider
    range of programs, faults and test suites.
  • The gap between optimal prioritization and FEP
    prioritization techniques is yet to be bridged.
  • Determining which prioritization technique is
    warranted by particular types of programs and
    test suites.
  • Other prioritization objectives have to be
    investigated.
  • Version specific techniques
  • Techniques may not only be applied to regression
    testing but also during the initial testing of
    the software.
Write a Comment
User Comments (0)
About PowerShow.com