Title: An Experimental Evaluation of Data Flow and Mutation Testing
1. An Experimental Evaluation of Data Flow and Mutation Testing
2. Introduction
- Both data flow and mutation testing are unit-testing techniques.
- Both are supported by automated testing tools (Mothra for mutation, ATAC for data flow).
- Both are white-box in nature and require a large amount of computational and human resources.
3. Introduction (cont.): Basic Terms
- Adequacy criteria
- Data flow testing
- Mutation testing
4. Adequacy Criteria
- An adequacy criterion is a stopping rule for testing.
- Ideal: for every fault in the program being tested, there is a test case in the test set that detects that fault.
5. Data Flow Testing
- A program unit P is considered to be an individual program.
- A subprogram is decomposed into a set of basic blocks.
- A subprogram is represented by a control flow graph (CFG):
  - nodes: basic blocks.
  - edges: possible flow of control between basic blocks.
6. Data Flow Testing (cont.)
- Data definition: a location where a value is stored into memory (assignment, input, etc.).
- Data use: a location where the value of a variable is accessed.
  - C-use (computation use): associated with nodes in the CFG.
  - P-use (predicate use): associated with edges in the CFG.
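The definition/use distinction can be made concrete with a small hypothetical Python function (not taken from the experiment, which used Fortran-77 and C programs); the comments mark where x is defined and where it has c-uses and p-uses:

```python
# Hypothetical example (not from the experiment): defs and uses of x.
def scaled_abs(a):
    x = a           # definition of x: a value is stored into memory
    if x < 0:       # p-use of x: x appears in a predicate (a CFG edge)
        x = -x      # c-use of x (computation) and a new definition of x
    return x * 2    # c-use of x: its value feeds a computation
```

For instance, scaled_abs(-3) exercises the true branch and returns 6, while scaled_abs(2) exercises the false branch and returns 4.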
7. Data Flow Testing (cont.)
- Definition-clear subpath: a sequence of nodes along which the variable is not redefined.
- Unexecutable subpath: the CFG is a static representation of the program, so some subpaths cannot be executed by any input.
8. Data Flow Testing (cont.): All-Uses Criterion
- The criterion used in this experiment.
- Effective and low in cost.
- All-uses: for each definition of a variable X in P, the set of paths executed by the test set T contains a definition-clear subpath from the definition to every reachable c-use and p-use of X.
9. Data Flow Testing (cont.): All-Uses Criterion
- DU pair: a definition together with a use that is reachable from it.
- Goal of all-uses data flow testing: satisfy all-uses by covering every DU pair.
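The covering check can be sketched in a few lines of Python. This is a toy illustration with an assumed node numbering, not ATAC's actual algorithm: a DU pair is covered when some executed path contains a definition-clear subpath from the definition to the use.

```python
# Toy all-uses check (illustrative only; node numbering is assumed).
# CFG nodes: 1: x = a;  2: if x < 0;  3: x = -x;  4: return x
defs = {1: "x", 3: "x"}                      # nodes that define x
du_pairs = [(1, 2), (1, 3), (1, 4), (3, 4)]  # definition/use pairs for x

def covers(path, d, u):
    """True if path has a definition-clear subpath from node d to node u."""
    for i, n in enumerate(path):
        if n != d:
            continue
        for m in path[i + 1:]:
            if m == u:
                return True
            if defs.get(m) == defs[d]:
                break  # x is redefined before reaching the use
    return False

paths = [[1, 2, 3, 4], [1, 2, 4]]  # paths executed by two test cases
satisfied = [p for p in du_pairs if any(covers(t, *p) for t in paths)]
```

Here both paths together satisfy all four DU pairs; with only the first path, the pair (1, 4) would stay uncovered because x is redefined at node 3 before the use at node 4.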
10. Mutation Testing
- A fault-based testing technique.
- Simple faults are introduced by mutation operators.
- Each change made by a mutation operator is encoded in a mutant program.
11. Mutation Testing (cont.)
- A mutant is killed by a test case that causes it to produce incorrect output.
- Equivalent mutants can never be killed; they are analogous to unexecutable paths in all-uses data flow.
- Goal of mutation testing: find test cases that kill all non-equivalent mutants.
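A minimal Python sketch of these ideas (the function and operator choices are hypothetical; Mothra's operators target Fortran-77): one mutant is killable, the other is equivalent.

```python
# Illustrative mutants (hypothetical; not Mothra's operator set).
def max2(a, b):
    return a if a > b else b      # original program

def mutant_ror(a, b):
    return a if a < b else b      # '>' mutated to '<': killable

def mutant_equiv(a, b):
    return a if a >= b else b     # '>' mutated to '>=': equivalent,
                                  # since a == b makes both branches agree

def kills(mutant, test):
    """A test case kills a mutant if its output differs from the original."""
    return mutant(*test) != max2(*test)
```

The test case (1, 2) kills mutant_ror (it returns 1 instead of 2), but no test case can kill mutant_equiv, just as no test can execute an unexecutable subpath.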
12. The Experiment: Goals
- Compare the two testing techniques: which one is better?
- Find a way to test software that provides the advantages of both techniques.
13. Comparison of Testing Criteria
- An empirical comparison.
- ProbBetter: a testing criterion C1 is ProbBetter than C2 for a program P if a randomly selected test set T that satisfies C1 is more likely to detect a failure than a randomly selected test set that satisfies C2.
- ProbBetter is defined with respect to the fault-detection capability of test sets.
14. Comparison of Testing Criteria (cont.)
- ProbSubsumes: a testing criterion C1 ProbSubsumes C2 for a program P if a test set T that is adequate with respect to C1 is likely to be adequate with respect to C2. If C1 ProbSubsumes C2, C1 is said to be more difficult to satisfy than C2.
- ProbSubsumes is defined with respect to the difficulty of satisfying one criterion in terms of another.
15. Experimental Hypotheses and Conduct
- The two techniques are compared in three different ways:
  - Whether test sets for one criterion also satisfy the other (ProbSubsumes).
  - Whether test sets created for each technique actually find faults in programs.
  - Cost, measured as test set size.
16. Experimental Hypotheses and Conduct (cont.)
- Four hypotheses were formulated for the comparison:
  1. Mutation testing ProbSubsumes all-uses data flow.
  2. All-uses data flow testing ProbSubsumes mutation.
  3. All-uses data flow testing is ProbBetter than mutation.
  4. Mutation testing is ProbBetter than all-uses data flow.
17. Experimental Hypotheses and Conduct: Experimental Programs
18. Experimental Hypotheses and Conduct (cont.)
- Two tools were used for the experiment:
  - Mothra (with the Godzilla test-data generator) for mutation testing.
  - ATAC for data flow testing.
- Since Mothra tests Fortran-77 programs and ATAC tests C programs, each program was translated into both languages, taking care not to alter the CFG, DU pairs, etc.
19. Experimental Hypotheses and Conduct (cont.)
- Test requirements: killing mutants for mutation testing, executing DU pairs for data flow testing.
- 10 test sets per program: 5 mutation-adequate and 5 data flow-adequate.
- Minimum test set: the smallest number of test cases that satisfies the criterion.
- Minimal test set: if any test case were removed, the set would no longer satisfy the criterion.
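The minimal-set notion can be sketched as a simple reduction pass (a toy sketch; names and coverage data are assumed for illustration). Note that finding a true minimum set is a set-cover problem and much harder.

```python
# Toy reduction to a *minimal* test set: after this pass, removing any
# remaining test breaks adequacy. (Data below is hypothetical.)
def minimize(tests, requirements, satisfies):
    kept = list(tests)
    for t in list(kept):
        trial = [x for x in kept if x != t]
        if all(any(satisfies(x, r) for x in trial) for r in requirements):
            kept = trial  # t was redundant: every requirement stays covered
    return kept

# Hypothetical coverage data: which requirements each test satisfies.
coverage = {"t1": {1, 2}, "t2": {2}, "t3": {3}}
minimal = minimize(["t1", "t2", "t3"], [1, 2, 3],
                   lambda t, r: r in coverage[t])
```

Here t2 is dropped because t1 already satisfies requirement 2, leaving the minimal set {t1, t3}.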
20. Experiments and Analysis: Coverage Measurement Experiment
- Coverage (formal definition):
  - P: a program.
  - T: a test set.
  - A, B: two adequacy criteria.
  - FA(T, P), FB(T, P): functions that measure whether a test set T for a program P is adequate for the respective criterion.
21. Experiments and Analysis: Coverage Measurement Experiment (cont.)
- Coverage:
  - TA: a set of test data that is adequate with respect to criterion A.
  - TB: a set of test data that is adequate with respect to criterion B.
  - FA(TB, P): coverage of criterion A by criterion B.
  - FB(TA, P): coverage of criterion B by criterion A.
22. Experiments and Analysis: Coverage Measurement Experiment (cont.)
- Mutation score: the coverage measure for mutation.

  MS(T) = (number of mutants killed by test set T) /
          (total number of mutants generated for the program - number of equivalent mutants)
23. Experiments and Analysis: Coverage Measurement Experiment (cont.)
- Data flow score: the coverage measure for data flow.

  DFS(T) = (number of DU pairs satisfied by test set T) /
           (total number of DU pairs in the program - number of DU pairs that can never be satisfied)
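The two scores defined on this and the previous slide can be written directly as functions; the numbers in the usage example are assumed, not from the experiment.

```python
# Coverage scores, expressed as percentages (toy inputs are assumed).
def mutation_score(killed, total_mutants, equivalent):
    return 100.0 * killed / (total_mutants - equivalent)

def data_flow_score(satisfied, total_pairs, infeasible):
    return 100.0 * satisfied / (total_pairs - infeasible)
```

For example, killing 85 of 100 generated mutants when 5 are equivalent gives a mutation score of about 89.47, and satisfying 90 of 100 DU pairs when 10 are infeasible gives a data flow score of 100.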
24. Experiments and Analysis: Coverage Measurement Experiment (cont.)
- FM(TD, P): computes the mutation score for the data flow-adequate test sets (coverage of mutation by data flow).
- FD(TM, P): computes the data flow score for the mutation-adequate test sets (coverage of data flow by mutation).
25. Experiments and Analysis: FM(TD), Mutation Scores of Data Flow-Adequate Test Sets
- Average: 88.86
26. Experiments and Analysis: FD(TM), Data Flow Scores of Mutation-Adequate Test Sets
- Average: 98.99
27. Experiments and Analysis: Coverage Measurement Experiment (cont.)
- It does appear that by satisfying mutation, we come close, in some sense, to satisfying data flow.
- No pattern was found among the mutants not killed by the data flow-adequate test sets.
- The coverage experiment supports hypothesis 1 rather than hypothesis 2.
28. Experiments and Analysis: Fault Detection Experiment
- Several faults were inserted into each of the programs, under the following constraints:
  - Faults must not be equivalent to mutants.
  - Faults should not be N-order mutants.
  - Faults should not have a high failure rate.
29. Experiments and Analysis: Fault Detection Experiment (cont.)
- Samples of the faults that were inserted:
  - Multiple related transpositions of variables (exchanging the uses of two variables).
  - Modified arithmetic or relational operators.
  - Changed precedence of operations.
  - Changed conditional expressions by adding extra conditions.
- To gather the results, each fault was inserted separately, so there were N incorrect versions of each program.
30. Experiments and Analysis: Fault Detection Experiment (cont.)
31. Experiments and Analysis: Fault Detection Experiment (cont.)
- Our data support hypothesis 4, not hypothesis 3:
  mutation testing is ProbBetter than all-uses data flow.
32. Experiments and Analysis: Test Set Size
- Mean number of test cases per set.
33. Experiments and Analysis: Test Set Size (cont.)
- In most cases, mutation requires many more test cases than data flow does.
- With the ability to generate test data automatically, this cost is somewhat less important during initial testing.
- The number of test cases remains important during regression testing.
34. Conclusion and Summary
- This presentation showed a comparison between data flow and mutation testing:
  - Compared the two criteria on the basis of cross scoring.
  - Measured the fault detection of test data generated for each criterion.
  - Compared the two techniques on the basis of the number of test cases generated to satisfy them.
35. Conclusion and Summary (cont.)
- The mutation scores for the data flow-adequate test sets are reasonably high: an average coverage of 88.86.
- The mutation-adequate test sets come very close to covering the data flow criterion: an average coverage of 98.99.
- Conclusion: mutation ProbSubsumes all-uses data flow.
36. Conclusion and Summary (cont.)
- These conclusions are supported by the faults that the test sets detected.
- The mutation-adequate test sets detected an average of 16% more faults than the data flow-adequate test sets.
- The difference was as high as 60% for one program (Insert).
- Conclusion: mutation is ProbBetter than all-uses data flow.
37. Conclusion and Summary (cont.)
- Mutation required more test cases than data flow testing in almost every case.
- Mutation offers more coverage, but at a higher cost: a tradeoff that practical testers must consider when choosing a test methodology.
38. THE END