Software Testing Part III: Test Assessment and Improvement

1
Software Testing Part III: Test Assessment and
Improvement

Aditya P. Mathur
Purdue University
July 20-24, 1998
at Raytheon Technical Services
Indianapolis.

Last update July 19, 1998
2
Course Organization

Part I Preliminaries
Part II Functional testing
Part III Test assessment and improvement
Part IV Special topics

3
Learning Objectives

To understand the relevance and importance of
test assessment.
To learn the fundamental principle underlying
test assessment.
To learn various methods and tools for test
assessment.

4
Learning objectives

To understand the relative strengths/weaknesses
of test assessment methods.
To learn how to improve tests based on a test
assessment procedure.

5
What is test assessment?

Once a test set T, a collection of test inputs,
has been developed, we ask:
How good is T?
The measurement of the goodness of T is known as
test assessment.
Test assessment is carried out based on one or
more criteria.

6
Test assessment-continued

These criteria are known as test adequacy
criteria.
Test assessment is also known as test adequacy
assessment.

7
Test assessment-continued

Test assessment provides the following
information
A metric, also known as the adequacy score or
coverage, usually between 0 and 1.
A list of all the weaknesses found in T, which
when removed, will raise the score to 1.
The weaknesses depend on the criteria used for
assessment.
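
As a minimal sketch of the adequacy score, assume the coverage domain is represented as an array of flags, one per coverage element, set when some test in T covers that element. The names adequacy_score, weaknesses, and covered are illustrative and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of coverage elements exercised by a test set.
   covered[i] is nonzero if element i of the domain was covered. */
double adequacy_score(const int covered[], size_t n)
{
    size_t hit = 0;
    assert(n > 0);   /* the coverage domain is assumed finite and non-empty */
    for (size_t i = 0; i < n; i++)
        if (covered[i])
            hit++;
    return (double)hit / (double)n;
}

/* Writes the indices of the uncovered elements (the weaknesses) into out[]
   and returns how many there are. out must have room for n entries. */
size_t weaknesses(const int covered[], size_t n, size_t out[])
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (!covered[i])
            out[k++] = i;
    return k;
}
```

Covering every element returned by weaknesses raises the score to 1, which is the sense in which the weaknesses are "removed".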

8
Test assessment-continued

Once the coverage has been computed, and the
weaknesses identified, one can improve T.
Improvement of T is done by examining one or more
weaknesses and constructing new test requirements
designed to overcome the weakness(es).
The new test requirements lead to new test
specifications and to further testing of the
program.

9
Test assessment-continued

This is continued until all weaknesses are
overcome, i.e. the adequacy criterion is
satisfied (coverage = 1).
In some instances it may not be possible to
satisfy the adequacy criteria for one or more of
the following reasons:
Lack of sufficient manpower.
Weaknesses that cannot be removed because they
are infeasible.

10
Test assessment-continued

The cost of removing the weaknesses is not
justified.
While improving T by removing its weaknesses, one
usually tests the program more thoroughly than it
has been tested so far.
This additional testing is likely to result in
the discovery of remaining errors.

11
Test assessment-continued

Hence we say that test assessment and improvement
helps in the improvement of software reliability.
Test assessment and improvement is applicable
throughout the testing process and during all
stages of software development.

12
Test assessment-summary procedure
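0. Develop T.
1. Select an adequacy criterion C.
2. Measure adequacy of T w.r.t. C.
3. Is T adequate? If yes, go to step 5; if no, go to step 4.
4. Improve T, then return to step 2.
5. Is more testing warranted? If yes, return to step 1; if no, go to step 6.
6. Done.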
13
Principle underlying test assessment

There is a uniform principle that underlies test
assessment throughout the testing process.
This principle is known as the coverage
principle.
It has come about as a result of intensive
research at Purdue and other research groups in
software testing.

14
The coverage principle

To formulate and understand the coverage
principle, we need to understand:
coverage domains
coverage elements
A coverage domain is a finite domain, related to
the program under test, that we want to cover.
Coverage elements are the individual elements of
this domain.

15
The coverage principle-continued
[Figure: coverage domains and their coverage elements. Examples shown: requirements, classes, functions, interface mutations, exceptions.]
16
The coverage principle-continued

Measuring test adequacy and improving a test set
against a sequence of well defined, increasingly
strong, coverage domains leads to improved
confidence in the reliability of the system under
test.

17
The coverage principle-continued

Note the following properties of a coverage
domain
It is related to the program under test.
It is finite.
It may come from program requirements, related to
the inputs and outputs.

18
The coverage principle-continued

It may come from program code. Can you think of a
coverage domain that comes from the program code?
It aids in measuring test adequacy as well as the
progress made in testing. How?

19
The coverage principle-continued

Example
It is required to write a program that takes in
the name of a person as a string and searches for
the name in a file of names. The program must
output the record ID which matches the given
name. In case of no match a -1 is returned.
What coverage domains can be identified from this
requirement?

20
The coverage principle-continued

As we learned earlier, improving coverage
improves our confidence in the correct
functioning of the program under test.
Given a program P and a test T suppose that T is
adequate w.r.t. a coverage criterion C.
Does this mean that P is error free?
Obviously???

21
Test effort

There are several measures of test effort.
One measure is the size of T. By this measure a
test set with a larger number of test cases
corresponds to higher effort than one with a
lesser number of test cases.

22
Error detection effectiveness

Each coverage criterion has its error detection
ability. This is also known as the error
detection effectiveness or simply effectiveness
of the criterion.
One measure of the effectiveness of criterion C
is the fraction of faults guaranteed to be
revealed by a test T that satisfies C.

23
Effectiveness-continued

Another measure is the probability that at least
fraction f of the faults in P will be revealed by
test T that satisfies C.
Unfortunately there is no absolute measure of the
effectiveness of any given coverage criterion for
a general class of programs and for arbitrary
test sets.

24
Effectiveness-continued

One coverage criterion results in an exception to
this rule. What is it?
Empirical studies conducted by researchers give
us an idea of the relative goodness of various
coverage criteria.
Thus, for a variety of criteria we can make a
statement like "Criterion C1 is definitely better
than criterion C2."

25
Effectiveness-continued

In some cases we may be able to say "Criterion C1
is probably better than criterion C2."
Such information allows us to construct a
hierarchy of coverage criteria.
This hierarchy is helpful in organizing and
managing testing. How?

26
Strength of a coverage criterion

The effectiveness of a coverage criterion is also
referred to as its strength.
Strength is a measure of the criterion's ability
to reveal faults in a program.
Criterion C1 is considered stronger than
criterion C2 if C1 is capable of revealing
more faults than C2.

27
The Saturation Effect

The rate at which new faults are discovered
reduces as test adequacy with respect to a finite
coverage domain increases; it reduces to zero
when the coverage domain has been exhausted.

[Figure: the horizontal axis is coverage, running from 0 to 1.]
28
Saturation Effect: Fault View
[Figure: remaining faults (axis marks N, M, 0) plotted against testing effort (axis marks tfs, tfe, tds, tdfe, tme); the labeled region is Functional.]
29
Saturation Effect: Reliability View
[Figure: reliability plotted against testing effort for functional, decision, dataflow, and mutation testing. Reliability axis marks: Rf, Rd, Rdf, Rm. Effort axis marks: tfs, tfe, tds, tde, tdfs, tdfe, tms, tfe.]
Functional, decision, dataflow, and mutation
coverage provide various test evaluation criteria.
30
Coverage principle-discussion

Discuss
How will you use the knowledge of coverage
principle and the saturation effect in organizing
and managing testing?
Can you think of any other uses of the coverage
principle and the saturation effect?

31
Control flow graph

The control flow graph (CFG) of a program is a
representation of the flow of execution within
the program.
It is useful in program analysis such as that
required during test assessment and improvement.
More formally, a CFG G is

32
Control flow graph

G = (N, A)
where N is a set of nodes and A is a set of arcs.
There is a unique entry node en in N.
There is a unique exit node ex in N. A node
represents a single statement or a block.
A block is a single-entry-single-exit sequence of
instructions that are always executed in a
sequence without any diversion of path except at
the end of the block.

33
Control flow graph-continued

Every statement in a block, except possibly the
first one, has exactly one predecessor.
Similarly, every statement in the block, except
possibly the last one, has exactly one successor.
An arc a in A is a pair (n,m) of nodes from N
which represent transfer of control from node n
to node m.
A path of length k in G is an ordered sequence of
arcs a_1, a_2, ..., a_k from A such that

34
Control flow graph-continued

The first node in a_1 is en.
The last node in a_k is ex.
For any two adjacent arcs a_i = (n,m) and
a_{i+1} = (p,q), m = p.
A path is considered executable or feasible if
there exists a test case which causes this path
to be traversed during program execution,
otherwise the path is unexecutable or infeasible.
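
The path conditions above can be checked mechanically. The following is a sketch only, assuming nodes are numbered with integers and arcs are stored as pairs; the type Arc and the function is_path are illustrative names. It checks that a sequence of arcs is a path in G, not that the path is feasible, which depends on the program's data.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int from, to; } Arc;

/* Returns 1 if arc a belongs to the arc set A of the graph. */
static int arc_in_graph(Arc a, const Arc A[], size_t num_arcs)
{
    for (size_t i = 0; i < num_arcs; i++)
        if (A[i].from == a.from && A[i].to == a.to)
            return 1;
    return 0;
}

/* Returns 1 if path[0..k-1] is a path in G = (N, A):
   it starts at the entry node en, ends at the exit node ex,
   uses only arcs of A, and adjacent arcs share a node. */
int is_path(const Arc path[], size_t k,
            const Arc A[], size_t num_arcs, int en, int ex)
{
    assert(k > 0);
    if (path[0].from != en || path[k - 1].to != ex)
        return 0;
    for (size_t i = 0; i < k; i++) {
        if (!arc_in_graph(path[i], A, num_arcs))
            return 0;
        if (i + 1 < k && path[i].to != path[i + 1].from)
            return 0;
    }
    return 1;
}
```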

35
Control flow graph-example

Class exercise
Draw a CFG for the following program

1. scanf(x,y); if (y<0)
2.   pow = 0-y;
3. else pow = y;
4. z = 1.0;
5. while (pow != 0)
6.   { z = z*x; pow = pow-1; }
7. if (y<0)
8.   z = 1.0/z;
9. printf(z);
What does the above program compute?
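
For experimenting with the exercises, here is a runnable sketch of the program above. It assumes the operators lost in the transcript are y < 0 and pow != 0, takes x and y as parameters instead of reading them with scanf, and returns z instead of printing it. The function name power and the local name pw (for the slide's pow) are illustrative.

```c
#include <assert.h>
#include <limits.h>

/* Statement numbers in the comments refer to the numbered program. */
double power(double x, int y)
{
    int pw;
    double z;

    assert(y != INT_MIN);      /* 0 - y would overflow for INT_MIN */
    if (y < 0)                 /* 1 */
        pw = 0 - y;            /* 2 */
    else
        pw = y;                /* 3 */
    z = 1.0;                   /* 4 */
    while (pw != 0) {          /* 5 */
        z = z * x;             /* 6 */
        pw = pw - 1;
    }
    if (y < 0)                 /* 7 */
        z = 1.0 / z;           /* 8 */
    return z;                  /* 9: the slide prints z here */
}
```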
36
Control flow graph-example

Class exercise
For the CFG you have drawn, list all paths of
length at most 10.
Are there more paths than what you have listed?

37
Control flow graph-second example

Class exercise
Examine the program on page 348 of Korel and
Laski's paper. Look at the control flow graph
given there and try understanding it.

38
Structure-based test adequacy

Based on the CFG of a program several test
adequacy criteria can be defined.
Some are
statement coverage criterion
branch coverage criterion
condition coverage criterion
path coverage criterion

39
Statement coverage

The coverage domain consists of all statements in
the program. Restated, in terms of the control
flow graph, it is the set of all nodes in G.
A test T satisfies the statement coverage
criterion if upon execution of P on each element
of T, each statement of P has been executed at
least once.

40
Statement coverage-continued

Restated in terms of G, T is adequate w.r.t. the
statement coverage criterion if each node in N is
on at least one of the paths traversed when P is
executed on each element of T.

41
Statement coverage-continued

Class exercise
For the program for which you have drawn the
control flow graph, develop a test set that
satisfies the statement coverage criterion.
Follow the procedure for test assessment and
improvement suggested earlier.

42
Statement coverage-weakness

Consider the following program
int abs(x)
int x;
{
  if (x>=0) x = 0-x;
  return x;
}

43
Statement coverage-weakness

Suppose that T = {(x=0)}.
Clearly, T satisfies the statement coverage
criterion.
But is the program correct and is the error
revealed by T which is adequate w.r.t. the
statement coverage criterion?
What do you suggest we do to improve T?
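
The weakness can be tried out with the sketch below. It assumes the garbled condition in abs reads x >= 0, which is the reading under which the single test x = 0 executes every statement. The names abs_faulty, abs_expected, and reveals_error are illustrative; abs_expected plays the role of a test oracle.

```c
#include <assert.h>
#include <limits.h>

/* The slide's program, error included: the condition should be x < 0. */
int abs_faulty(int x)
{
    if (x >= 0) x = 0 - x;
    return x;
}

/* Intended behaviour, used here as the oracle. */
int abs_expected(int x)
{
    assert(x != INT_MIN);   /* -INT_MIN is not representable */
    return x < 0 ? -x : x;
}

/* Returns 1 if test input x reveals the error. */
int reveals_error(int x)
{
    return abs_faulty(x) != abs_expected(x);
}
```

The test x = 0 is statement-adequate but does not reveal the error, because negating 0 leaves it unchanged. Any nonzero input, positive or negative, does reveal it.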

44
Branch (or edge) coverage

In G there may be nodes which correspond to
conditions in P. Such nodes, also called
condition nodes, contain branches in P.
Each such node is considered covered if during
some execution of P the condition evaluates to
true and during some execution to false; these
executions of P need not be the same.

45
Branch coverage

The coverage domain consists of all branches in
G. Restated, in terms of the control flow graph,
it is the set of all arcs exiting the condition
nodes.
A test T satisfies the branch coverage criterion
if upon execution of P on each element of T, each
branch of P has been executed at least once.

46
Branch coverage

Class exercise
Identify all condition nodes in the flow graph
you have drawn earlier.
Does T = {(x=0)} satisfy the branch coverage
criterion?
If not, then improve it so that it does.

47
Branch coverage-weakness

Consider the following program which is supposed
to check that the input data item is in the range
0 to 100, inclusive:
int check(x)
int x;
{
  if ((x>=0) && (x<=200))
    check = true;
  else check = false;
}

48
Branch coverage-weakness

Class exercise
Do you notice the error in this program?
Find a test set T which is adequate w.r.t.
statement coverage and does not reveal the error.
Improve T so that it is adequate w.r.t. branch
coverage and does not reveal the error.
What do you conclude about the weakness of the
branch coverage criterion?
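
A sketch for trying out this exercise follows. It assumes the condition in check reads (x >= 0) && (x <= 200), records which branch outcomes have been seen, and compares against the intended range 0 to 100. All names (check_faulty, check_expected, run_tests) are illustrative.

```c
#include <stddef.h>

/* Flags recording which outcomes of the condition have been observed. */
static int seen_true, seen_false;

/* The slide's check function with its error (200 instead of 100). */
int check_faulty(int x)
{
    if ((x >= 0) && (x <= 200)) {
        seen_true = 1;
        return 1;
    } else {
        seen_false = 1;
        return 0;
    }
}

/* Intended behaviour, used as the oracle. */
int check_expected(int x)
{
    return (x >= 0) && (x <= 100);
}

/* Runs every test in T. Returns the number of tests that reveal the error
   and sets *branch_adequate to 1 if both branch outcomes were exercised. */
int run_tests(const int T[], size_t n, int *branch_adequate)
{
    int revealing = 0;
    seen_true = seen_false = 0;
    for (size_t i = 0; i < n; i++)
        if (check_faulty(T[i]) != check_expected(T[i]))
            revealing++;
    *branch_adequate = seen_true && seen_false;
    return revealing;
}
```

For example, T = {50, -1} exercises both branches without revealing the error, while adding a value between 101 and 200 reveals it.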

49
Condition coverage

Condition nodes in G might have compound
conditions.
For example, in the check program the condition
node contains the condition
((x>=0) && (x<=200))
This is a compound condition which consists of
the elementary conditions x>=0 and x<=200.
50
Condition coverage-continued

A compound condition is considered covered if
each of its constituent elementary conditions
evaluates to true during some execution of P and
to false during some execution of P.
A test set T is adequate w.r.t. condition
coverage if all conditions in P are covered when
P is executed on elements of T.

51
Condition coverage-continued

Class exercise
Improve T from the previous exercise so that it
is adequate w.r.t. the condition coverage
criterion for the check function and does not
reveal the error.
Do you find the above possible?

52
Branch coverage-weakness, continued

Consider the following program

0. int set_z(x,y)
1. int x,y;
2. if (x!=0)
3.   y = 5;
4. else z = z-x;
5. if (z>1)
6.   z = z/x;
7. else
8.   z = y;
What might happen here?
53
Branch coverage-weakness

Class exercise
Construct T for set_z such that (a) T is adequate
w.r.t. the branch coverage criterion and (b) does
not reveal the error.
What do you conclude about the effectiveness of
the branch and condition coverage criteria?

54
Path coverage

As mentioned before, a path through a program is
a sequence of statements such that the entry node
of the program CFG is the first node on the path
and the exit node is the last one on the path.
Is this definition equivalent to the one given
earlier?

55
Path coverage-continued

A test set T is considered adequate w.r.t. the
path coverage criterion if all paths in P are
executed at least once upon execution on each
element of T.
Class exercise
Construct T for set_z such that T is adequate
w.r.t. the path coverage criterion and does not
reveal the error.
Is the above possible?

56
Path coverage-weakness

The number of paths in a program is usually very
large.
How many paths in set_z?
How many paths in check?
How many in the program that computes

57
Path coverage-weaknesses

It is the infinite or prohibitively large
number of paths that prevents the use of this
criterion in practice.
Suppose that a test set T covers all paths. Will
it guarantee that all errors in P are revealed?
Is obtaining 100% path coverage equivalent to
exhaustive testing?

58
Variants of path coverage

As path coverage is usually impossible to attain,
other heuristics have been proposed.
Loop coverage
Make sure that each loop is executed 0, 1, and 2
times.
Try several combinations of if and switch
statements. The combinations must come from
requirements.
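
As a sketch of the loop coverage heuristic, the code below checks whether a set of exponents drives the while loop of the exponentiation program through zero, one, and two iterations. It assumes, as in that program, that the loop runs |y| times; loop_iterations and loop_adequate are illustrative names.

```c
#include <stddef.h>

/* Number of iterations of the while loop for exponent y.
   Assumes y is not INT_MIN, for which 0 - y would overflow. */
static int loop_iterations(int y)
{
    int pw = (y < 0) ? 0 - y : y;
    int count = 0;
    while (pw != 0) {
        pw = pw - 1;
        count++;
    }
    return count;
}

/* Returns 1 if the exponents in T execute the loop 0, 1, and 2 times. */
int loop_adequate(const int T[], size_t n)
{
    int seen[3] = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        int k = loop_iterations(T[i]);
        if (k <= 2)
            seen[k] = 1;
    }
    return seen[0] && seen[1] && seen[2];
}
```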

59
Hierarchy in Control flow criteria
Path coverage
Condition coverage
Branch coverage
Statement coverage
60
Exercise

Develop a test set T that is adequate w.r.t. the
statement, condition, and the loop coverage
criteria for the exponentiation program.

61
Testing technique or strategy

One can develop a testing strategy based on any
of the criteria discussed.
Example
A testing strategy based on the statement
coverage criterion will begin by evaluating a
test set T against this criterion. Then new tests
will be added to T until all the statements are
covered, i.e. T satisfies the criterion.

62
Definitions

Error-sensitive path: a path whose execution
might lead to eventual detection of an error.
Error-revealing path: a path whose execution will
always cause the program to fail and the error to
be detected.

63
Definitions

Reliable: A testing technique is reliable for an
error if it guarantees that the error will always
be detected.
This implies that a reliable testing technique
must lead to the exercising of at least one
error-revealing path.

64
Definitions

Weakly reliable: A testing technique is weakly
reliable if it forces the execution of at least
one error-sensitive path.

65
Example: error detection

Let us go over the example in Korel and Laski's
paper.
It is a sorting program which uses the bubble
sort algorithm.
It sorts an array a[0:N] in descending order.
There are two nested loops in the program.
The inner loop, from i6-i10, finds the largest
element of a[R1:N].

66
Example: error detection

The largest element is saved in R0 and R3 points
to the location of R0 in a.
The outer loop swaps a(R1) with a(R3).
The completion of one iteration of the outer loop
ensures that the sub-array a[0:R1-1] has been
sorted and that a[R1-1] is greater than or equal
to any element of a[R1:N].

67
Example: error detection

There is a missing re-initialization of R3 to R1
at the beginning of the inner loop.
In some cases this will cause the program to
fail.
What are these cases?
We will get back to this error later!

68
Class exercise

Is the path testing strategy reliable for the
sort program and for the missing initialization
error in it ?
Is it viable ?
What about the branch testing strategy?
What about loop testing?

69
Data flow graph

It represents the flow of data in a program.
The graph is constructed from the control flow
graph (CFG) of the program.
A statement that occurs within a node of the CFG
might contain variable occurrences.
Each variable occurrence is classified as a def
or a use.

70
defs and uses

A def represents the definition of a variable.
Here are some sample defs of variable x:
x = y*x
scanf(x,y)
int x
x[i-1] = y*x
A use represents the use of a variable in a
statement. Here are a few examples of uses of
variable x:

(On the original slide, all defs of x are italicized.)
71
def-use-continued

x = x+1
printf("x is %d, y is %d", x, y)
cout << x << endl << y
z = x[i+1]
if (x<y)
Uses of a variable in output and assignments are
classified as c-uses. Those in conditions are
classified as p-uses.

(On the original slide, all uses of x are italicized.)
72
def-use-continued

c-use stands for computational use and p-use for
predicate use.
Both c- and p-uses affect the flow of control:
p-uses directly, as their values are used in
evaluating conditions, and c-uses indirectly, as
their values are used to compute other variables
which in turn affect the outcome of condition
evaluation.

73
def-use-continued

A path from node i to node j is said to be
def-clear w.r.t. a variable x if there is no def
of x in the nodes along the path from node i to
node j. Nodes i and j may have a def of x.
A def-clear path from node i to edge (j,k) is one
in which no node on the path has a def of x.
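
The node-to-node definition can be sketched as follows, assuming a path is given as an array of node numbers and that a per-node flag array says which nodes contain a def of the variable of interest. The names is_def_clear and has_def are illustrative.

```c
#include <stddef.h>

/* has_def[n] is nonzero if node n contains a def of the variable of interest.
   path[] lists node numbers from node i (path[0]) to node j (path[len-1]).
   Only the nodes strictly between i and j are inspected, since the end
   nodes i and j may themselves contain a def. */
int is_def_clear(const int path[], size_t len, const int has_def[])
{
    for (size_t k = 1; k + 1 < len; k++)
        if (has_def[path[k]])
            return 0;
    return 1;
}
```

For the exponentiation program and variable z, which is defined in nodes 4, 6, and 8, the path 4, 5, 7, 9 is def-clear while 4, 5, 6, 5, 7, 9 is not.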

74
global-def

A def of a variable x is considered global to its
block if it is the last def of x within that
block.
A c-use of x in a block is considered global
c-use if there is no def of x preceding this
c-use within this block.

75
def-use graph definitions

def(i): the set of all variables for which there
is a global def in node i.
c-use(i): the set of all variables that have a
global c-use in node i.
p-use(i,j): the set of all variables for which
there is a p-use for the edge (i,j).
dcu(x,i): the set of all nodes j such that x is in
c-use(j), x is in def(i), and there is a def-clear
path w.r.t. x from node i to node j.

76
def-use graph definitions

dpu(x,i): the set of all edges (j,k) such that x
is in p-use(j,k), x is in def(i), and there is a
def-clear path w.r.t. x from node i to edge (j,k).
The def-use graph of program P is constructed by
associating def, c-use, and p-use sets with the
nodes and edges of a flow graph.
The next example is from Jalote's text,
pp. 425-428.

77
def-use graph-continued
Sample program
1. scanf(x,y); if (y<0)
2.   pow = 0-y;
3. else pow = y;
4. z = 1.0;
5. while (pow != 0)
6.   { z = z*x; pow = pow-1; }
7. if (y<0)
8.   z = 1.0/z;
9. printf(z);
78
def-use graph-continued
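Unlabeled edges imply empty p-use set.
Node 1: def = {x,y}, c-use = {}
Node 2: def = {pow}, c-use = {y}
Node 3: def = {pow}, c-use = {y}
Node 4: def = {z}, c-use = {}
Node 5: def = {}, c-use = {}
Node 6: def = {z,pow}, c-use = {z,x,pow}
Node 7: def = {}, c-use = {}
Node 8: def = {z}, c-use = {z}
Node 9: def = {}, c-use = {z}
Edges (1,2) and (1,3): p-use = {y}
Edges (5,6) and (5,7): p-use = {pow}
Edges (7,8) and (7,9): p-use = {y}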
79
def-use graph-class exercise
Draw a def-use graph for the following program.
0. int set_z(x,y)
1. int x,y;
2. if (x!=0)
3.   y = 5;
4. else z = z-x;
5. if (z>1)
6.   z = z/x;
7. else
8.   z = y;
80
def-use graph-continued

Traverse the graph to determine dcu and dpu sets.

81
Test generation

Class exercises
For the above graph generate a test set that
satisfies:
the branch coverage criterion
the all-defs criterion: for each definition of
each variable, at least one use (c- or p-use)
must be exercised.
the all-uses criterion: all p-uses and all c-uses
of all variable definitions must be covered.
Develop the tests incrementally, i.e. by
modifying the previous test set!

82
Data flow testing tool

We will use χSUDS, a data flow testing tool
developed at Bellcore and available commercially
from IBM.
The acronym χSUDS stands for Software
Understanding and Debugging System.
χSUDS is a collection of tools of which χATAC is
the one that measures control flow and data flow
coverage.

83
χATAC processing phase I
[Figure: P, the program under test, is preprocessed, compiled and instrumented, which generates .atac files and an instrumented version of P (executable). The test set provides the input; upon execution, the instrumented program generates a .trace file and the program output.]
84
χATAC processing phase II
[Figure: the coverage analyzer produces control flow and data flow coverage values.]
85
χATAC demo

Open DOS window.
Go to /Program Files/bellcore/xSUDS/tutorial
Type
ataccl /Fedemo main.c wc.c
Type
xsuds .atac
You may now view program complexity statistics in
the χSUDS window.

86
χATAC demo-continued

Go back to the DOS window and type
demo -c input1
Go to the xSUDS window and examine various
coverage values.
Go back to the DOS window and type
demo -c input2
Go to the xSUDS window and examine how various
coverage values have changed.

87
χATAC demo-continued

Repeat the above steps of executing demo on
several test inputs. Analyze coverage values and
observe how they change with new test data.
Other tools in χSUDS will be discussed in the
laboratory.

88
Mutation testing

What is mutation testing?
Mutation testing is a code-based test assessment
and improvement technique.
It relies on the competent programmer hypothesis
which is the following assumption
Given a specification a programmer develops a
program that is either correct or differs from
the correct program by a combination of simple
errors.

89
Mutation testing-continued

The process of program development is considered
as iterative whereby an initial version of the
program is refined by making simple, or a
combination of simple changes, towards the final
version.

90
Mutation testing-definitions

Given a program P, a mutant of P is obtained by
making a simple change in P.

Program
1. int x,y;
2. if (x!=0)
3.   y = 5;
4. else z = z-x;
5. if (z>1)
6.   z = z/x;
7. else
8.   z = y;
Mutant
1. int x,y;
2. if (x!=0)
3.   y = 5;
4. else z = z-x;
5. if (z>1)
6.   z = z/zpush(x);
7. else
8.   z = y;
What is zpush?
91
Another mutant
Program
1. int x,y;
2. if (x!=0)
3.   y = 5;
4. else z = z-x;
5. if (z>1)
6.   z = z/x;
7. else
8.   z = y;
Mutant
1. int x,y;
2. if (x!=0)
3.   y = 5;
4. else z = z-x;
5. if (z<1)
6.   z = z/x;
7. else
8.   z = y;
92
Mutant

A mutant M is considered distinguished by a test
case t ∈ T iff
P(t) ≠ M(t)
where P(t) and M(t) denote, respectively, the
observed behavior of P and M when executed on
test input t.
A mutant M is considered equivalent to P iff
P(t) = M(t) for every possible test input t.

93
Mutation score

During testing a mutant is considered live if it
has not been distinguished or proven equivalent.
Suppose that a total of M mutants are generated
for program P.
The mutation score of a test set T, designed to
test P, is computed as
number of distinguished mutants/(M - number of
equivalent mutants)
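
As a small sketch of the computation, the function below counts distinguished mutants in the numerator, so that a score of 1 means every non-equivalent mutant has been distinguished. The name mutation_score and its parameter names are illustrative.

```c
#include <assert.h>

/* distinguished: mutants distinguished by at least one test in T
   total:         M, the number of mutants generated
   equivalent:    mutants shown to be equivalent to P */
double mutation_score(int distinguished, int total, int equivalent)
{
    assert(equivalent >= 0 && total > equivalent);
    assert(distinguished >= 0 && distinguished <= total - equivalent);
    return (double)distinguished / (double)(total - equivalent);
}
```

For example, with 5 mutants of which 1 is equivalent, distinguishing 3 gives a score of 0.75, and distinguishing all 4 non-equivalent mutants gives 1.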

94
Test adequacy criterion

A test T is considered adequate w.r.t. the
mutation criterion if its mutation score is 1.
The number of mutants generated depends on P and
the mutant operators applied on P.
A mutant operator is a rule that when applied to
the program under test generates zero or more
mutants.

95
Mutant operators

Consider the following program
int abs(x)
int x;
{
  if (x>=0) x = 0-x;
  return x;
}

96
Mutation operator

Consider the following rule:
Replace each relational operator in P by all
possible relational operators excluding the one
that is being replaced.
Assuming the set of relational operators to be
{<, >, <=, >=, ==, !=}, the above mutant
operator will generate a total of 5 mutants of P.
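
The rule can be illustrated on the abs program by making the relational operator a parameter. This is a sketch under the assumption that the original condition is x >= 0; it simulates the five mutants rather than generating source code, and the names RelOp, abs_with_op, and count_distinguished are illustrative.

```c
#include <stddef.h>

typedef enum { LT, GT, LE, GE, EQ, NE, NUM_OPS } RelOp;

/* Evaluates "a op b" for the chosen relational operator. */
static int compare(RelOp op, int a, int b)
{
    switch (op) {
    case LT: return a < b;
    case GT: return a > b;
    case LE: return a <= b;
    case GE: return a >= b;
    case EQ: return a == b;
    default: return a != b;   /* NE */
    }
}

/* abs from the slide with its relational operator as a parameter:
   op == GE is the original program, any other op is a mutant. */
int abs_with_op(RelOp op, int x)
{
    if (compare(op, x, 0)) x = 0 - x;
    return x;
}

/* Counts the mutants distinguished from the original program
   by at least one test input in T. */
int count_distinguished(RelOp original, const int T[], size_t n)
{
    int distinguished = 0;
    for (int op = 0; op < NUM_OPS; op++) {
        if ((RelOp)op == original)
            continue;
        for (size_t i = 0; i < n; i++) {
            if (abs_with_op((RelOp)op, T[i]) != abs_with_op(original, T[i])) {
                distinguished++;
                break;
            }
        }
    }
    return distinguished;
}
```

With T = {0} no mutant is distinguished. With T = {0, 1, -1} four of the five are; the remaining mutant, which uses >, behaves like >= on every input because negating 0 leaves it unchanged.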

97
Mutation operators

Mutation operators are language dependent.
For Fortran a total of 22 operators were
proposed.
For C a total of 77 operators were proposed. None
have been proposed for C++, though most of the
operators for C are applicable to C++ programs.

98
Equivalent mutant

Consider the following program P:
int x,y,z;
scanf(x,y);
if (x>0)
  { x = x+1; z = x*(y-1); }
else
  { x = x-1; z = x*(y-1); }
Here z is considered the output of P.

99
Equivalent mutant-continued

Now suppose that a mutant of P is obtained by
changing x = x+1 to x = abs(x)+1.
This mutant is equivalent to P as no test case
can distinguish it from P.

100
Mutation testing procedure
Given P and a test set T
1. Generate mutants
2. Compile P and the mutants
3. Execute P and the mutants on each test case.
4. Determine equivalent mutants.
5. Determine mutation score.
6. If mutation score is not 1 then improve the
test set and repeat from step 3.
101
Mutation testing procedure

In practice the above procedure is implemented
incrementally.
One applies a few selected mutant operators to P
and computes the mutation score w.r.t. the
mutants generated.
Once these mutants have been distinguished or
proven equivalent, another set of mutant
operators is applied.

102
Mutation testing procedure

This procedure is repeated until either all the
mutants have been exhausted or some external
condition forces testing to stop.
We will not discuss the details of practical
application of mutation testing.

103
Tools for mutation testing

Mothra for Fortran, developed at Purdue, 1990.
Proteum for C, developed at the University of
São Paulo at São Carlos in Brazil.

104
Uses of Mutation testing

Mutation testing is useful during integration
testing to check for integration errors.
Only the variables that are in the interfaces of
the components being integrated are mutated. This
reduces the complexity of mutation testing.

105
Summary