Advanced Topics Combining S. M. T
Mutation Testing Tutorial Answers
1. What is an equivalent mutant, and why are equivalent mutants a problem?
2. For the program fragment x := x + y, give five examples of mutants which are equivalent and five which are not equivalent.
3. Give examples of test cases which kill your five non-equivalent mutants.
4. Now make up some simple program fragments and try to think of some more stubborn mutants of these fragments, that is, mutants which are hard to kill but which are not equivalent.
Equivalent mutant
- An equivalent mutant is one which cannot be killed by any test case, because it gives identical output to the original program for every possible input.
- That is, it is possible to create a syntactic variant of a program (a mutant) which is syntactically different to the original, but which behaves identically to it.
- This is a problem because one cannot tell whether the failure of test data to kill a mutant stems from inadequate test data or from the fact that the mutant is equivalent.
Equivalent mutants
x := x + y is equivalent to:
- x := y + x
- x := -z + x + y + z
- x := x + ((1 == 1) ? y : z)
- x := 2 * x + y - x
- x := x + y + 0
Non-Equivalent mutants
x := x + y is not equivalent to:
- x := y * x
- x := x + z
- x := x - y
- x := x + 1
- x := 2 + y
Killing the Non-Equivalent mutants
- x := y * x is killed by any input other than x = y = 2 (or x = y = 0, where x * y = x + y also holds)
- x := x + z is killed by any unequal values of y and z
- x := x - y is killed by any non-zero value of y
- x := x + 1 is killed by any value of y other than 1
- x := 2 + y is killed by any value of x other than 2
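These answers can be checked mechanically. The sketch below reads the fragment as x := x + y (the operators are lost in the source text, so the mutant expressions are an assumption), uses Python integers, and tries every input in an arbitrary bounded domain of -10..10, reporting the first input on which a mutant's result differs from the original. Finding no killer in a bounded domain is evidence of equivalence, not proof, which is exactly why equivalent mutants are a problem.

```python
from itertools import product

def original(x, y, z):
    return x + y  # new value of x after x := x + y

# Mutants of x := x + y, each written as a function of (x, y, z).
EQUIVALENT = {
    "x := y + x": lambda x, y, z: y + x,
    "x := -z + x + y + z": lambda x, y, z: -z + x + y + z,
    "x := x + ((1 == 1) ? y : z)": lambda x, y, z: x + (y if 1 == 1 else z),
    "x := 2 * x + y - x": lambda x, y, z: 2 * x + y - x,
    "x := x + y + 0": lambda x, y, z: x + y + 0,
}
NON_EQUIVALENT = {
    "x := y * x": lambda x, y, z: y * x,
    "x := x + z": lambda x, y, z: x + z,
    "x := x - y": lambda x, y, z: x - y,
    "x := x + 1": lambda x, y, z: x + 1,
    "x := 2 + y": lambda x, y, z: 2 + y,
}

def find_killer(mutant, domain=range(-10, 11)):
    """Return the first (x, y, z) on which the mutant differs from the original, or None."""
    for x, y, z in product(domain, repeat=3):
        if mutant(x, y, z) != original(x, y, z):
            return (x, y, z)
    return None

if __name__ == "__main__":
    for name, m in {**EQUIVALENT, **NON_EQUIVALENT}.items():
        print(f"{name:30} killed by {find_killer(m)}")
```

Every equivalent mutant reports `None`; every non-equivalent one reports a concrete killing input.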
Stubborn mutant
- Consider
- if (a == b) then z := 1 else z := 2
- Suppose we mutate this to
- if (a == b + 1) then z := 1 else z := 2
- This is only killed by test cases where a and b are equal or where a is exactly one more than b.
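To see how stubborn such a mutant is, one can count the fraction of the input space that kills it. This sketch assumes the predicate is `a == b` mutated to `a == b + 1` (the operators are missing from the source text) and uses an arbitrary domain of -50..49 for both variables.

```python
from itertools import product

def original(a, b):
    return 1 if a == b else 2      # if (a == b) then z := 1 else z := 2

def mutant(a, b):
    return 1 if a == b + 1 else 2  # if (a == b + 1) then z := 1 else z := 2

def killing_inputs(domain):
    """All (a, b) pairs in the domain on which original and mutant disagree."""
    return [(a, b) for a, b in product(domain, repeat=2)
            if original(a, b) != mutant(a, b)]

if __name__ == "__main__":
    domain = range(-50, 50)
    kills = killing_inputs(domain)
    print(f"{len(kills)} of {len(domain) ** 2} inputs kill the mutant")
```

Only 199 of the 10,000 pairs kill it (the 100 pairs with a == b and the 99 with a == b + 1), so randomly chosen test data will rarely do so.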
An Empirical Study of Predicate Dependence Levels and Trends
ICSE 2003
- David Binkley
- Mark Harman
Or How to use slicing to measure testability
overview
- evolutionary testing
- variable dependence
- empirical study programs
- results
- implications
DaimlerChrysler approach to test data generation
The starting point for the study was evolutionary test data generation.
Target
[Figures omitted: diagrams marking the target]
approximation level
[Figure omitted: diagram marking the target]
- if A == B, local distance = |A - B|
- fitness = approximation level + local distance
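The slide gives the local distance only for an equality test. A common way to extend it to the other relational operators in search-based testing is sketched below; the penalty constant K = 1 and the exact formulas for the other operators are assumptions, not taken from the slides. A distance of 0 means the desired branch is taken.

```python
K = 1  # assumed penalty added when a strict or non-strict relation is missed

def local_distance(op: str, a: int, b: int) -> int:
    """Distance from making `a op b` true; 0 means the desired branch is taken."""
    if op == "==":
        return abs(a - b)
    if op == "!=":
        return 0 if a != b else K
    if op == "<":
        return 0 if a < b else (a - b) + K
    if op == "<=":
        return 0 if a <= b else (a - b) + K
    if op == ">":
        return 0 if a > b else (b - a) + K
    if op == ">=":
        return 0 if a >= b else (b - a) + K
    raise ValueError(f"unsupported operator {op!r}")

print(local_distance("==", 7, 3), local_distance("<", 5, 3))
```

The closer the operands are to satisfying the condition, the smaller the distance, which is what gives the search a gradient to follow.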
search space reduction
- so c is killed
- x, y and z seem to be killed
- search space reduction is exponential
- which variables matter?
- of the original 7 variables, only 3 matter
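A quick calculation shows why the reduction is exponential. The slide does not give variable types, so this sketch assumes each variable is a 32-bit integer; dropping from 7 relevant variables to 3 then removes a factor of (2^32)^4 = 2^128 from the search space.

```python
def search_space(num_vars: int, bits_per_var: int = 32) -> int:
    """Number of candidate inputs when each variable ranges over 2**bits_per_var values."""
    return (2 ** bits_per_var) ** num_vars

full = search_space(7)     # all 7 original variables
reduced = search_space(3)  # only the 3 variables that matter
print(f"reduction factor: 2**{(full // reduced).bit_length() - 1}")  # reduction factor: 2**128
```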
int g1, ..., gk;              (as k increases?)
foo(int x1, ..., int xn)      (as n increases?)
    int a1, ..., am;
    if p(a1, x1, a3, g4, x4) ...
    if q(a1, x5) ...
an empirical study
the question
- for a typical predicate, how much of the input space is relevant?
why care?
- how easy is it to understand the decision?
- how much do the inputs affect the decision?
- how cohesive is the code?
- reducing the search space
- implications for slicing: slice size for the predicate is closely related
inputs
- formal parameters
- globals in scope
- du-globals: globals transitively defined or used in the predicate's procedure
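Computing du-globals amounts to a transitive closure over the call graph. The sketch below is illustrative only: the tables `direct` (globals a procedure defines or uses itself) and `calls` (its callees) are hypothetical inputs, whereas the study obtained this information from CodeSurfer.

```python
from typing import Dict, Set

def du_globals(proc: str,
               direct: Dict[str, Set[str]],
               calls: Dict[str, Set[str]]) -> Set[str]:
    """Globals defined or used by `proc` or by anything it transitively calls."""
    seen: Set[str] = set()
    result: Set[str] = set()
    stack = [proc]
    while stack:
        p = stack.pop()
        if p in seen:
            continue  # already visited; also stops infinite loops on recursion
        seen.add(p)
        result |= direct.get(p, set())
        stack.extend(calls.get(p, set()))
    return result

# Hypothetical program: three globals, but foo only reaches two of them.
direct = {"foo": {"g1"}, "helper": {"g4"}, "unrelated": {"g2"}}
calls = {"foo": {"helper"}, "helper": {"foo"}}
print(sorted(du_globals("foo", direct, calls)))  # ['g1', 'g4']
```

The du-globals of a procedure can therefore be far fewer than the globals in scope.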
the programs studied
data collection
- modified HRB slicing algorithm
- implemented using CodeSurfer
- formals and du-globals become parameters

thanks to GrammaTech for CodeSurfer
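Once formals and du-globals are treated as parameters, the parameters a predicate depends upon are those reachable backwards from the predicate in a dependence graph. The sketch below is a plain intraprocedural reachability over a hypothetical, hand-written dependence graph; it is not the HRB algorithm, which works interprocedurally on a system dependence graph and respects calling context.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

def backward_reachable(deps: Dict[str, List[str]], start: str) -> Set[str]:
    """All nodes that `start` transitively depends on (data or control)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for pred in deps.get(node, []):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return seen

def parameters_used(deps: Dict[str, List[str]], predicate: str,
                    parameters: Iterable[str]) -> Set[str]:
    return backward_reachable(deps, predicate) & set(parameters)

# Hypothetical dependence edges: node -> nodes it depends on.
deps = {
    "if p(a1,x1,a3,g4,x4)": ["a1 := x2 + 1", "x1", "a3 := 0", "g4", "x4"],
    "a1 := x2 + 1": ["x2"],
    "if q(a1,x5)": ["a1 := x2 + 1", "x5"],
}
params = ["x1", "x2", "x3", "x4", "x5", "g4"]  # formals plus one du-global
print(sorted(parameters_used(deps, "if p(a1,x1,a3,g4,x4)", params)))
```

Here predicate p uses 4 of the 6 available parameters and q uses 2.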
terminology
- max parameters: the maximum number of parameters a predicate could depend upon
- parameters used: the number of parameters a predicate actually depends upon
result data
- three results to summarise in one data point: max parameters, parameters used, and the number of predicates summarised

We adopted two diagrammatic techniques.
dependence skyline diagram
[Figure: skyline diagrams for replace and prepro]
- notice the trend for the dependent proportion to drop
- skylines give a feeling for size reduction, but precise readings are lost
dependence bubble chart
[Figure: formal parameter bubble charts for prepro and replace, plotting max parameters against parameters used, with a trend line]
- x-axis: max formals
- y-axis: formals used
- a bubble at (x, y) of size s means there are s predicates which have x max formals available to them, and these s predicates depend on average on y formals
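The bubble chart aggregation can be expressed directly. This sketch takes hypothetical (max formals, formals used) pairs, one per predicate, and groups them into bubbles; it also computes the dependent proportion, which we assume here means formals used divided by max formals.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

def bubble_data(predicates: List[Tuple[int, int]]) -> Dict[int, Tuple[int, float]]:
    """Map max formals x -> (bubble size s, average formals used y)."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for max_formals, used in predicates:
        groups[max_formals].append(used)
    return {x: (len(us), mean(us)) for x, us in sorted(groups.items())}

def dependent_proportion(max_formals: int, used: float) -> float:
    """Assumed definition: share of the available formals actually depended upon."""
    return used / max_formals if max_formals else 0.0

# Hypothetical (max formals, formals used) pairs, one per predicate.
sample = [(2, 1), (2, 2), (4, 1), (4, 2), (4, 3), (8, 2)]
for x, (s, y) in bubble_data(sample).items():
    print(f"x={x} s={s} y={y:.2f} proportion={dependent_proportion(x, y):.2f}")
```

In this made-up sample the proportion falls from 0.75 to 0.25 as max formals grows, the kind of trend the charts are designed to expose.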
over all predicates in all programs
good news
- as max formals increases
- the dependent proportion falls
- good news for evolutionary testing
- and also for all the other applications
well almost
perhaps it is not good for cohesion
declining dependent proportion
[Figure: all predicates where max formals < 11]
The number of formals depended upon, as a function of the max formals available, is piecewise linear.
[Figure: all predicates where max formals > 10]
declining dependent proportion
[Figure panels: max formals < 11; max formals > 10]
bad news
- no such correlation for globals
- globals could entail untestability when using search
- also bad for other applications
- is this more evidence that globals are bad?
dependence trends
implications
- formals
- as the problem gets worse
- the solution gets better
implications
- globals
- large global variable lists
- present problems:
- hard to generate test data
- hard to understand
- high levels of dependence
implications
- cohesion
- functions with
- large numbers of formals
- may not be so cohesive
algorithm performance
- the algorithm used has complexity O(PS), where P is the number of predicates and S is the size of the procedure
- machine: PII 450 with 386Mb memory
- analysis time per procedure: 0.017 × P × S ms
- average analysis time: 7.6 ms per predicate
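The fitted timing model is easy to apply. In this sketch the procedure sizes are hypothetical and S is in whatever size unit the study used (the slide does not state it); note that dividing the per-procedure time by P gives 0.017 × S ms per predicate.

```python
def analysis_time_ms(predicates: int, size: int, k: float = 0.017) -> float:
    """Estimated per-procedure analysis time from the fitted model k * P * S ms."""
    return k * predicates * size

def time_per_predicate_ms(size: int, k: float = 0.017) -> float:
    """The same model divided by P: cost grows only with procedure size."""
    return k * size

# Hypothetical procedure with 10 predicates and size 200.
print(analysis_time_ms(10, 200), time_per_predicate_ms(200))
```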
some possible interpretations
what do you read into these diagrams ?
flex evolution of du globals
[Figure: du-global dependence in flex version 2.4.7 and version 2.5.4]
profiles
[Figure: du-global profiles for sendmail and findutils]
conclusions
- the falling formal dependent proportion is an invariant
- global dependent proportion analysis is worth performing
- the diagrams may be an aid to understanding, evaluation and monitoring evolution
Testability Transformation
Overview, or How to use slicing and transformation to improve testing
- Test Data Generation
- Evolutionary Test Data Generation
- The Flag Problem
- Flag Removal Algorithm
- Initial Results
- Testability Transformation
- Other non-meaning-preserving transformations (if time permits)
Automatic Test Generation
- We know that generating good quality test data is hard
- and knowing what good quality means is hard
- I do not propose to answer that question today
- Starting point: a structural test adequacy criterion
- Specifically, that some branch is to be covered
- and that we are going to use evolutionary testing
Distance-Based Fitness
1. Approximation level: relevant branching statements can lead to a miss of the desired target
2. Local distance: calculated in the branching statement where the undesired branch was taken
[Figures omitted: diagrams marking the target]
- Fitness = Approximation_Level + Local_Distance
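The combination of approximation level and local distance can be sketched as follows. The `Execution` record is hypothetical, and the slide adds the raw local distance; here it is normalised with d / (d + 1), an assumed scheme chosen so that a large local distance can never outweigh a whole approximation-level step.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Execution:
    # Index (0-based, outermost first) of the critical branching statement at
    # which execution took the undesired branch, or None if the target was reached.
    diverged_at: Optional[int]
    # Local distance measured at that statement, e.g. |A - B| for A == B.
    local_distance: float

def normalise(d: float) -> float:
    """Map a raw local distance in [0, inf) into [0, 1) (assumed scheme)."""
    return d / (d + 1)

def fitness(run: Execution, critical_nodes: int) -> float:
    """Approximation level plus normalised local distance; lower is better, 0 = target reached."""
    if run.diverged_at is None:
        return 0.0
    # Critical branching statements still standing between the divergence point and the target.
    approximation_level = critical_nodes - run.diverged_at - 1
    return approximation_level + normalise(run.local_distance)

# Target nested inside three ifs: diverging deeper is always better.
print(fitness(Execution(0, 5.0), 3), fitness(Execution(1, 100.0), 3), fitness(Execution(2, 1.0), 3))
```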
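The Flag Problem
- with a flag: flag := (A == 0); ... if (flag) ...
- without the flag: if (A == 0) ...
- Suppose we want to make this true
- without the flag, the fitness can be Max - |A - 0|, for example 10 - |A - 0|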
transform the program to transform the fitness function, and so transform the landscape
[Figure: the flag landscape, a large plateau of low fitness, versus the better landscape of the transformed program]
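The two landscapes can be tabulated directly. This sketch assumes fitness is maximised with Max = 10, and models the flag version as all-or-nothing, since the search only observes whether `flag` is true; that is a simplification of how a real generator scores the flag branch.

```python
MAX = 10  # assumed ceiling, matching the "10 - |A - 0|" example

def flag_fitness(a: int) -> int:
    flag = (a == 0)              # flag := (A == 0)
    return MAX if flag else 0    # if (flag): only true/false is visible

def transformed_fitness(a: int) -> int:
    return MAX - abs(a - 0)      # if (A == 0): fitness rises as A approaches 0

for a in range(-3, 4):
    print(a, flag_fitness(a), transformed_fitness(a))
```

Away from A = 0 the flag version is a flat plateau, so the search has nothing to climb; the transformed version slopes towards the target.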
Informally
- A transformation is a partial function on programs which preserves meaning, of some kind or other.
- We need to pair the program and the test adequacy criterion; call this the test pair.
- A testability transformation is a partial function on test pairs such that...
Testability Transformation
Test data which is adequate for the transformed test pair is adequate for the original test pair.
Testability Transformation Paradox
We are testing to cover structure, but the structure is the problem. So we transform the program, and this alters the structure.
So we need to be careful: are we still testing according to the same criterion?
Our transformations will preserve coverage of
- statements
- branches
- MC/DC
Future work: define a semantics to verify this.
This is not abstract interpretation
- To preserve branch coverage,
    if (e) skip else skip
  cannot be transformed to
    skip
- But the program
    if (e) x := 1 else x := 2
  can be transformed to
    if (e) skip else skip
Flag TT
- Transform the program to remove flags
- Not always possible
- but worth doing where possible
- Our approach uses
- Simple transformations
- Simple amorphous slicing
- Substitute flag use with definition
Brief overview of amorphous slicing...
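Flag Removal Transformation
Suppose n is an unsigned integer.
    flag := n < 4;
    if (n == 20) flag := 0;
    ...
    if (a[i] != 0 && flag) ...
What initial values of n will make this condition true?
Convert the flag definition to a conditional assignment:
    flag := (n == 20) ? 0 : (n < 4)
Add a temporary variable:
    n' := n; flag := (n' == 20) ? 0 : (n' < 4)
We can keep the original flag assignment code and substitute the definition for the use:
    if (a[i] != 0 && ((n' == 20) ? 0 : (n' < 4))) ...
Claim: adding the new flag assignment leaves adequacy criteria invariant.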
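A small check of this example, assuming the reconstruction `flag := n < 4; if (n == 20) flag := 0` (the comparison operators are lost in the source text) and restricting the unsigned n to 16 bits to keep the loop fast. It confirms that the substituted expression agrees with the flag on every input, so both branches of the use are taken by exactly the same values of n, and it lists those values.

```python
def original_flag(n: int) -> bool:
    flag = n < 4          # flag := n < 4
    if n == 20:           # if (n == 20) flag := 0
        flag = False
    return flag

def transformed_flag(n: int) -> bool:
    n_prime = n  # n' := n, presumably so later changes to n cannot alter the substituted expression
    return False if n_prime == 20 else n_prime < 4  # (n' == 20) ? 0 : (n' < 4)

domain = range(0, 2 ** 16)  # assumed 16-bit unsigned n
assert all(original_flag(n) == transformed_flag(n) for n in domain)
print([n for n in domain if transformed_flag(n)])  # [0, 1, 2, 3]
```

Only 4 of the 65,536 values make the flag true, which is why a search guided only by the flag struggles; the substituted expression exposes `n < 4` to the fitness function instead.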
Simple Flag Removal Algorithm
- For loop free flag definition code
- Bush
- Blossom
- Slice leaf sequences
- Convert to conditional assignment
- Add temporary variables
- Substitute definition for use
Bushing
Produces a binary tree
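One way to realise bushing on loop-free code is to copy every statement that follows a conditional into both of its branches, so that each conditional ends its sequence and the code forms a binary tree. The sketch below uses a hypothetical toy representation (`Assign`, `If`); it is an illustration under that reading of the step, not the published implementation.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Assign:
    var: str
    expr: str

@dataclass
class If:
    cond: str
    then: List["Stmt"] = field(default_factory=list)
    orelse: List["Stmt"] = field(default_factory=list)

Stmt = Union[Assign, If]

def bush(seq: List[Stmt]) -> List[Stmt]:
    """Push every statement that follows an If into both of its branches (loop-free code only)."""
    for i, stmt in enumerate(seq):
        if isinstance(stmt, If):
            rest = seq[i + 1:]
            return seq[:i] + [If(stmt.cond,
                                 bush(stmt.then + rest),
                                 bush(stmt.orelse + rest))]
    return list(seq)  # no conditional left: this sequence is a leaf

program = [If("c", [Assign("x", "1")], [Assign("x", "2")]), Assign("y", "x")]
print(bush(program))
```

After bushing, `y := x` appears at the end of both branches, so every path from the root is a straight-line sequence.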
Blossoming
- Moves all actions to the leaves
- All internal nodes will be predicates
- Original predicates may be altered
Blossoming is repeated for all internal action nodes
- Now all internal nodes are predicates
- Some leaf assignments are not to the flag variable; these can be removed using slicing
Amorphous slicing gives a single assignment at the leaves
- Sometimes syntax-preserving slicing does this too
- But sometimes syntax-preserving slicing leaves several assignments
- Some predicates change several times during blossoming
- We assumed freedom from side effects
- Fortunately, we have a side effect removal transformation
Now all leaves are single assignments, and they all assign to the flag variable.
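Once every internal node is a predicate and every leaf is a single assignment to the flag, the tree collapses into one conditional assignment. This sketch uses a hypothetical toy tree representation (`Assign`, `If`) and produces the expression text in the slides' `? :` notation.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Assign:
    var: str
    expr: str

@dataclass
class If:
    cond: str
    then: List["Node"]
    orelse: List["Node"]

Node = Union[Assign, If]

def to_conditional(branch: List[Node], flag: str) -> str:
    """Turn a tree whose leaves are single assignments to `flag` into one expression."""
    if len(branch) != 1:
        raise ValueError("each branch must hold exactly one statement")
    node = branch[0]
    if isinstance(node, If):
        return (f"({node.cond}) ? {to_conditional(node.then, flag)}"
                f" : {to_conditional(node.orelse, flag)}")
    if node.var != flag:
        raise ValueError(f"leaf assigns {node.var}, not {flag}; slice it away first")
    return node.expr

tree = [If("n == 20", [Assign("flag", "0")], [Assign("flag", "(n < 4)")])]
print(f"flag := {to_conditional(tree, 'flag')}")  # flag := (n == 20) ? 0 : (n < 4)
```

The checks make the preconditions explicit: a leaf with several assignments, or one assigning another variable, means blossoming or slicing has not finished.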
Initial Empirical Analysis
- Daimler ran their Evolutionary Test Data Generator on both versions
- Collected coverage information for 6 runs
Special Values
With flags:
    /* date correction for september 1752 */
    if (special_days)
        result := "Day did not exist."
    else if (leapflag && is_september && day > 13)
        result := dayName((addMonths(month, year) (--day)firstJanuary(year)1 0)7)
    else ...
Remove flags:
    /* date correction for september 1752 */
    if (year == 1752 && month == 9 && day >= 3 && day <= 13)
        result := "Day did not exist."
    else if (year == 1752 && month == 9 && day > 13)
        result := dayName((addMonths(month,year) (--dayfirstJanuary(year)10)7)
    else ...
[Coverage plots, annotated: "With flags we never even got up here" and "With flags it took longer to get anywhere"]
Nothing special flag
    returnflag := (a == 0 || b == 0 || c == 0) || a > 10000 || b > 10000 || c > 10000 || (c > a + b) || (a > b + c) || (b > a + c);
    ...
    if (returnflag) return
- Is it hard to find values which make this flag true?
- Is it hard to find values which make this flag false?
Remove flags:
    ... if (a == 0 || b == 0 || c == 0 || a > 10000 || b > 10000 || c > 10000 || (c > a + b) || (a > b + c) || (b > a + c)) return
- To have a problem, we need a flag for which few inputs make it true (or false)
- There is very little difference
- Let's look at a bad flag problem ...
Special Value Flag
    returnflag := (a == 99999 || b == 99999 || c == 99999); ... if (returnflag) return
- There are relatively few values which make the flag true
Remove flags:
    ... if (a == 99999 || b == 99999 || c == 99999) return
- So we get better coverage, at less cost
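The difference can be reproduced with a simple local search. The sketch below is a hill climber in the spirit of the alternating variable method, not DaimlerChrysler's evolutionary generator, and it assumes the disjunction `a == 99999 || b == 99999 || c == 99999`, scored after flag removal as the minimum of the three disjuncts' distances. Guided only by the flag, it stops immediately on the plateau; guided by the transformed predicate, it reaches a special value.

```python
from typing import Callable, Tuple

TARGET = 99999
Vec = Tuple[int, ...]

def flag_fitness(v: Vec) -> int:
    """With the flag, the search only sees whether returnflag is true (0) or not (1)."""
    a, b, c = v
    return 0 if (a == TARGET or b == TARGET or c == TARGET) else 1

def transformed_fitness(v: Vec) -> int:
    """After flag removal, each disjunct yields a distance; a disjunction takes the smallest."""
    a, b, c = v
    return min(abs(a - TARGET), abs(b - TARGET), abs(c - TARGET))

def hill_climb(fitness: Callable[[Vec], int], start: Vec,
               max_evals: int = 100_000) -> Tuple[Vec, int, int]:
    """Probe +/-1 on each variable in turn, doubling the step while it keeps improving.
    Returns (best input, its fitness, number of fitness evaluations)."""
    v = tuple(start)
    best = fitness(v)
    evals = 1
    improved = True
    while best > 0 and improved and evals < max_evals:
        improved = False
        for i in range(len(v)):
            for direction in (-1, 1):
                step = direction
                while evals < max_evals:
                    cand = v[:i] + (v[i] + step,) + v[i + 1:]
                    f = fitness(cand)
                    evals += 1
                    if f >= best:
                        break  # no improvement in this direction
                    v, best, improved = cand, f, True
                    step *= 2  # accelerate while moves keep paying off
    return v, best, evals

print(hill_climb(flag_fitness, (0, 0, 0)))         # ((0, 0, 0), 1, 7): stuck on the plateau
print(hill_climb(transformed_fitness, (0, 0, 0)))  # a reaches 99999
```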
Disposable Transformations
- We generate test data using the transformed program, because it is easier
- then throw away the transformed program
- Transformation as a means to an end, not an end in itself
- Do the transformations even need to preserve meaning?
Conclusion
- Test data generation is hard
- anything which helps is good
- Test data generation can be impeded by structure
- so transform the structure
- We have to be sure to preserve branch coverage
- but not traditional meaning
- This suggests a new kind of transformation
- Testability Transformation
References
- David Binkley and Mark Harman. Analysis and Visualization of Predicate Dependence on Formal Parameters and Global Variables. IEEE Transactions on Software Engineering (journal version of the ICSE 2003 paper, to appear).
- Mark Harman, Lin Hu, Rob Hierons, Joachim Wegener, Harmen Sthamer, Andre Baresel and Marc Roper. Testability Transformation. IEEE Transactions on Software Engineering, 30(1):3-16, 2004.
- David Binkley and Mark Harman. An Empirical Study of Predicate Dependence Levels and Trends. 25th IEEE/ACM International Conference on Software Engineering (ICSE 2003), Portland, Oregon, USA, 3-10 May 2003, pages 330-339.

Electronic copies of these papers are available on my website.