Title: Dependable Software Systems
1. Dependable Software Systems
Topics in Data-Flow Testing
Material drawn from Beizer, Mancoridis, Vokolos
2. Data-Flow Testing
- Data-flow testing uses the control flowgraph to explore the
unreasonable things that can happen to data (i.e., anomalies).
- Consideration of data-flow anomalies leads to test path selection
strategies that fill the gaps between complete path testing and
branch or statement testing.
3. Data-Flow Testing (Contd)
- Data-flow testing is the name given to a family of test strategies
based on selecting paths through the program's control flow in order
to explore sequences of events related to the status of data objects.
- E.g., pick enough paths to assure that:
- Every data object has been initialized prior to its use.
- All defined objects have been used at least once.
4. Data Object Categories
- (d) Defined, Created, Initialized
- (k) Killed, Undefined, Released
- (u) Used
- (c) Used in a calculation
- (p) Used in a predicate
5. (d) Defined Objects
- An object (e.g., variable) is defined when it
- appears in a data declaration
- is assigned a new value
- is a file that has been opened
- is dynamically allocated
- ...
6. (k) Killed Objects
- An object is killed when it is
- released (e.g., free) or otherwise made unavailable (e.g., out of scope)
- a loop control variable when the loop exits
- a file that has been closed
- ...
7. (u) Used Objects
- An object is used when it is part of a computation or a predicate.
- A variable is used for a computation (c) when it appears on the RHS
(sometimes even the LHS in case of array indices) of an assignment
statement.
- A variable is used in a predicate (p) when it appears directly in
that predicate.
8. Example Definition and Uses
What are the definitions and uses for the program below?
1. read (x, y)
2. z = x + 2
3. if (z < y)
4.     w = x + 1
   else
5.     y = y + 1
6. print (x, y, w, z)
9. Example Definition and Uses
1. read (x, y)
2. z = x + 2
3. if (z < y)
4.     w = x + 1
   else
5.     y = y + 1
6. print (x, y, w, z)
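The definitions and uses of this program can be tabulated mechanically. The following is a minimal sketch, not part of the original slides: it assumes the six statements are transcribed into Python syntax (the `PROGRAM` table, the `read()` call, and the `+` operators are illustrative assumptions) and uses the standard `ast` module to separate names that are stored (definitions) from names that are loaded (uses). Statement 3 is a predicate, so its uses are p-uses; all other uses are c-uses.

```python
import ast

def defs_and_uses(stmt):
    """Return (defined names, used names) for one statement written in Python syntax."""
    tree = ast.parse(stmt)
    # Names that are merely being called (read, print) are not data objects.
    callees = {id(n.func) for n in ast.walk(tree) if isinstance(n, ast.Call)}
    defs, uses = set(), set()
    for n in ast.walk(tree):
        if isinstance(n, ast.Name) and id(n) not in callees:
            if isinstance(n.ctx, ast.Store):
                defs.add(n.id)
            else:
                uses.add(n.id)
    return defs, uses

# Hypothetical Python transcription of the six numbered statements.
PROGRAM = {
    1: "x, y = read()",
    2: "z = x + 2",
    3: "z < y",            # predicate: its uses are p-uses
    4: "w = x + 1",
    5: "y = y + 1",
    6: "print(x, y, w, z)",
}

for number, stmt in PROGRAM.items():
    d, u = defs_and_uses(stmt)
    kind = "p-use" if number == 3 else "c-use"
    print(number, "def:", sorted(d), kind + ":", sorted(u))
```

Statement 5 shows up as both a use and a definition of y, which is the `ud` (reassignment) pattern.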
10. Data-Flow Anomalies
- A data-flow anomaly is denoted by a two-character sequence of
actions. E.g.,
- ku: Means that an object is killed and then used.
- dd: Means that an object is defined twice without an intervening
usage.
11. Example
- E.g., of a valid (not anomalous) scenario where variable A is a dpd:
A = C + D
if (A > 0) X = 1 else X = -1
A = B + C
12. Two Letter Combinations for d, k, u
- dd: Probably harmless, but suspicious.
- dk: Probably a bug.
- du: Normal situation.
- kd: Normal situation.
- kk: Harmless, but probably a bug.
- ku: Definitely a bug.
- ud: Normal situation (reassignment).
- uk: Normal situation.
- uu: Normal situation.
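The two-letter table lends itself to a simple scan over the sequence of actions applied to one variable along one path. The sketch below is illustrative only: the function name `suspicious_pairs` and the choice to flag exactly dd, dk, kk, and ku are assumptions made for this example, and c and p uses are both written as u.

```python
# Verdicts copied from the two-letter table.
VERDICT = {
    "dd": "probably harmless, but suspicious",
    "dk": "probably a bug",
    "du": "normal",
    "kd": "normal",
    "kk": "harmless, but probably a bug",
    "ku": "definitely a bug",
    "ud": "normal (reassignment)",
    "uk": "normal",
    "uu": "normal",
}

# Pairs the table does not call a normal situation.
SUSPICIOUS = {"dd", "dk", "kk", "ku"}

def suspicious_pairs(actions):
    """Return (index, pair) for every adjacent pair of actions that the table flags.

    `actions` is a string over the alphabet d, k, u describing what happens to
    one variable along one path, for example "dudkku".
    """
    found = []
    for i in range(len(actions) - 1):
        pair = actions[i:i + 2]
        if pair in SUSPICIOUS:
            found.append((i, pair))
    return found

for index, pair in suspicious_pairs("dudkku"):
    print(index, pair, "->", VERDICT[pair])
```

A sequence such as "duud" produces no findings, since du, uu, and ud are all normal.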
13. Single Letter Situations
- A leading dash means that nothing of interest (d, k, u) occurs prior
to the action noted along the entry-exit path of interest.
- A trailing dash means that nothing of interest happens after the
point of action until the exit.
14. Single Letter Situations
- -k: Possibly anomalous
- Killing a variable that does not exist.
- Killing a variable that is global.
- -d: Normal situation.
- -u: Possibly anomalous, unless variable is global.
- k-: Normal situation.
- d-: Possibly anomalous, unless variable is global.
- u-: Normal situation.
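15. Data-Flow Anomaly State Graph
[Figure: state graph over the states K, D, U, and A (anomalous state).
Legend: state of variable, action. Edge labels appearing in the
figure: k; u,k; d; d; d,k; u; u; k,u,d.]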
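The state graph can be read as a small finite-state machine. The sketch below is one plausible reading, not a transcription of the drawing: it assumes that the edge labels attach to the states in the way that agrees with the two-letter table (from K, d leads to D while u and k are anomalous; from D, u leads to U while d and k are anomalous; from U, u stays in U, d leads to D, and k leads to K; A is never left). The names `TRANSITIONS` and `final_state` are hypothetical.

```python
# Assumed transition table: state -> action -> next state.
# K = killed/undefined, D = defined but not yet used, U = used, A = anomalous.
TRANSITIONS = {
    "K": {"d": "D", "u": "A", "k": "A"},
    "D": {"u": "U", "d": "A", "k": "A"},
    "U": {"u": "U", "d": "D", "k": "K"},
    "A": {"d": "A", "u": "A", "k": "A"},  # no way out of the anomalous state
}

def final_state(actions, start="K"):
    """Run a string of d/k/u actions through the machine and return the end state."""
    state = start
    for action in actions:
        state = TRANSITIONS[state][action]
    return state

print(final_state("duuk"))  # define, use twice, kill
print(final_state("ku"))    # kill, then use
```

Because A has no outgoing edge to another state under these assumptions, a single anomaly anywhere on the path determines the outcome.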
16. Data-Flow Anomaly State Graph with Variable Redemption
[Figure: state graph over the states K, KU, D, U, DK, and DD, with
edges labeled d, k, and u.]
17. Static vs Dynamic Anomaly Detection
- Static Analysis is analysis done on source code without actually
executing it.
- E.g., Syntax errors are caught by static analysis.
18. Static vs Dynamic Anomaly Detection (Contd)
- Dynamic Analysis is analysis done as a program is executing and is
based on intermediate values that result from the program's execution.
- E.g., A division by 0 error is caught by dynamic analysis.
- If a data-flow anomaly can be detected by static analysis, then the
anomaly does not concern testing. (It should be handled by the
compiler.)
19. Anomaly Detection Using Compilers
- Compilers are able to detect several data-flow anomalies using
static analysis.
- E.g., By forcing declaration before use, a compiler can detect
anomalies such as
- -u
- ku
- Optimizing compilers are able to detect some dead variables.
20. Is Static Analysis Sufficient?
- Questions:
- Why isn't static analysis enough?
- Why is testing required?
- Could a good compiler detect all data-flow anomalies?
- Answer: No. Detecting all data-flow anomalies is provably
unsolvable.
21. Static Analysis Deficiencies
- Current static analysis methods are inadequate for:
- Dead Variables: Detecting unreachable variables is unsolvable in the
general case.
- Arrays: Dynamically allocated arrays contain garbage unless they are
initialized explicitly. (-u anomalies are possible)
22. Static Analysis Deficiencies (Contd)
- Pointers: Impossible to verify pointer values at compile time.
- False Anomalies: Even an obvious bug (e.g., ku) may not be a bug if
the path along which the anomaly exists is unachievable. (Determining
whether a path is or is not achievable is unsolvable.)
23. Data-Flow Modeling
- Data-flow modeling is based on the control flowgraph.
- Each link is annotated with
- symbols (e.g., d, k, u, c, p)
- sequences of symbols (e.g., dd, du, ddd)
- that denote the sequence of data operations on that link with
respect to the variable of interest.
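24. Control Flowgraph Annotated for X and Y Data Flows
1   INPUT X,Y
    Z = X ? Y
    Y = X-Y
3   IF Z > 0 GOTO SAM
4   JOE: Z = Z-1
5   SAM: Z = Z ? V
    U = 0
6   LOOP B(U),Q(V) = (Z ? V) ? U
7   IF B(U) = 0 GOTO JOE
    Z = Z-1
8   IF Z = 0 GOTO ELL
    U = U ? 1
9   UNTIL U = Z
    B(U-1) = B(U ? 1) ? Q(V-1)
10  ELL: B(U ? Q(V)) = U ? V
11  IF U = V GOTO JOE
12  IF U > V THEN U = Z
13  YYZU
2   END
[Figure: control flowgraph of the program with nodes 1 to 13 and the
labels JOE, SAM, LOOP, ELL, and END. Predicate nodes are marked Z?,
B(U)?, U,Z?, and U,V?. One link carries the annotation dcc.]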
25. Control Flowgraph Annotated for Z Data Flows
[Figure: the same program listing and control flowgraph as on the
previous slide (nodes 1 to 13; labels JOE, SAM, LOOP, ELL, END), with
the links annotated for Z using d, p, c, and cd.]
26. Definition-Clear Path Segments
- A Definition-clear Path Segment (w.r.t. variable
X) is a connected sequence of links such that X
is defined on the first link and not redefined or
killed on any subsequent link of that path
segment.
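The definition above can be checked mechanically once each link of a path segment carries its annotation for the variable of interest. This is a minimal sketch under stated assumptions: the function name `is_definition_clear` is hypothetical, each link annotation is a string over d, k, c, p (or empty), and the sample annotation sequences are made up for illustration rather than read off the flowgraph.

```python
def is_definition_clear(link_actions):
    """Check a path segment against the definition-clear rule for one variable.

    `link_actions` lists the annotation of each link, in path order,
    e.g. ["d", "p", "c"]. The segment qualifies when the variable is defined
    on the first link and is neither redefined nor killed on any later link.
    """
    if not link_actions or "d" not in link_actions[0]:
        return False
    return all("d" not in a and "k" not in a for a in link_actions[1:])

print(is_definition_clear(["d", "p", "c"]))   # defined, then only used
print(is_definition_clear(["d", "p", "cd"]))  # redefined on the last link
print(is_definition_clear(["p", "c"]))        # no definition on the first link
```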
27. Definition-Clear Path Segments for Variable Z (Contd)
[Figure: the Z-annotated control flowgraph (nodes 1 to 13; labels JOE,
SAM, LOOP, ELL, END) with definition-clear path segments for Z marked.]
28. Non Definition-Clear Path Segments for Variable Z (Contd)
[Figure: the Z-annotated control flowgraph (nodes 1 to 13; labels JOE,
SAM, LOOP, ELL, END) with path segments that are not definition-clear
for Z marked.]
29. Simple Path Segments
- A Simple Path Segment is a path segment in which at most one node is
visited twice.
- E.g., (7,4,5,6,7) is simple.
- Therefore, a simple path may or may not be loop-free.
30. Loop-free Path Segments
- A Loop-free Path Segment is a path segment for which every node is
visited at most once.
- E.g., (4,5,6,7,8,10) is loop-free.
- The path (10,11,4,5,6,7,8,10,11,12) is not loop-free because nodes 10
and 11 are visited twice.
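Both path-segment properties reduce to counting node visits. The sketch below is illustrative (the function names are assumptions) and uses the three example paths from the slides.

```python
from collections import Counter

def is_loop_free(path):
    """Every node is visited at most once."""
    return len(set(path)) == len(path)

def is_simple(path):
    """At most one node is visited twice, and no node more than twice."""
    counts = Counter(path)
    repeated = [node for node, visits in counts.items() if visits > 1]
    return len(repeated) <= 1 and all(counts[node] == 2 for node in repeated)

for path in [(7, 4, 5, 6, 7),
             (4, 5, 6, 7, 8, 10),
             (10, 11, 4, 5, 6, 7, 8, 10, 11, 12)]:
    print(path, "simple:", is_simple(path), "loop-free:", is_loop_free(path))
```

Every loop-free segment is also simple, while (7,4,5,6,7) is simple without being loop-free.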
31. du Path Segments
- A du Path is a path segment such that if the last
link has a use of X, then the path is simple and
definition clear.
32. def-use Associations
- A def-use association is a triple (x, d, u), where x is a variable,
d is a node containing a definition of x, u is either a statement or
predicate node containing a use of x, and there is a sub-path in the
flow graph from d to u with no other definition of x between d and u.
33. Example Def-Use Associations
[Figure: flowgraph of the example program]
1: read (x, y)
2: z = x + 2
3: z < y (T leads to node 4, F leads to node 5)
4: w = x + 1
5: y = y + 1
6: print (x, y, w, z)
Some Def-Use Associations: (x, 1, 2), (x, 1, 4), (y, 1, (3,t)),
(y, 1, (3,f)), (y, 1, 5), (z, 2, (3,t)), ...
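The full set of def-use associations for this flowgraph can be enumerated by walking forward from each definition and stopping wherever the variable is redefined. The sketch below is one way to do that, not the method of the slides: the dictionaries `EDGES`, `DEFS`, `C_USES`, `P_USES`, and `BRANCH` are a hand-made encoding of the six-node graph, and a p-use is reported as (node, branch) in the same style as (3,t) and (3,f).

```python
# Hand-encoded flowgraph of the six-statement example (an assumption of this sketch).
EDGES = {1: [2], 2: [3], 3: [4, 5], 4: [6], 5: [6], 6: []}
DEFS = {1: {"x", "y"}, 2: {"z"}, 4: {"w"}, 5: {"y"}}
C_USES = {2: {"x"}, 4: {"x"}, 5: {"y"}, 6: {"x", "y", "w", "z"}}
P_USES = {3: {"z", "y"}}
BRANCH = {(3, 4): "t", (3, 5): "f"}

def def_use_associations():
    """Return every (variable, def node, use) reachable along a definition-clear sub-path."""
    result = set()
    for d, variables in DEFS.items():
        for var in variables:
            stack, seen = list(EDGES[d]), set()
            while stack:
                n = stack.pop()
                if n in seen:
                    continue
                seen.add(n)
                if var in C_USES.get(n, ()):
                    result.add((var, d, n))
                if var in P_USES.get(n, ()):
                    for successor in EDGES[n]:
                        result.add((var, d, (n, BRANCH[(n, successor)])))
                # A use in node n happens before any definition in n, so the
                # walk only stops after recording the use.
                if var not in DEFS.get(n, ()):
                    stack.extend(EDGES[n])
    return result

for assoc in sorted(def_use_associations(), key=repr):
    print(assoc)
```

Under this encoding the walk finds twelve associations, including the six listed on the slide.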
34. Example Def-Use Associations
- What are all the def-use associations for the program below?
read (z)
x = 0
y = 0
if (z >= 0)
{
    x = sqrt (z)
    if (0 <= x && x <= 5)
        y = f (x)
    else
        y = h (z)
}
y = g (x, y)
print (y)
35. Example Def-Use Associations
[Figure: the program from the previous slide, marked with the def-use
associations for variable z.]
36. Example Def-Use Associations
[Figure: the same program, marked with the def-use associations for
variable x.]
37. Example Def-Use Associations
[Figure: the same program, marked with the def-use associations for
variable y.]
38. Definition-Clear Paths
- A path (i, n1, ..., nm, j) is called a definition-clear path with
respect to x from node i to node j if it contains no definitions of
variable x in nodes (n1, ..., nm, j).
- The family of data-flow criteria requires that the test data execute
definition-clear paths from each node containing a definition of a
variable to specified nodes containing c-uses and edges containing
p-uses of that variable.
39. Data-Flow Testing Strategies
- All du Paths (ADUP)
- All Uses (AU)
- All p-uses/some c-uses (APUC)
- All c-uses/some p-uses (ACUP)
- All Definitions (AD)
- All p-uses (APU)
- All c-uses (ACU)
40. All du Paths Strategy (ADUP)
- ADUP is one of the strongest data-flow testing strategies.
- ADUP requires that every du path from every definition of every
variable to every use of that definition be exercised under some test.
41. An example: All-du-paths
- What are all the du-paths in the following program?
read (x, y)
for (i = 1; i < 2; i++)
    print ("hello")
Sa
if (y < 0)
    Sb
else
    print (x)
42. An example: All-du-paths
[Figure: flowgraph of the program from the previous slide, with nodes
numbered 1 to 9. Node contents: read (x, y), i = 1; i < 2 (T/F);
print(hello), i = i + 1; Sa; y < 0 (T/F); Sb; print x.]
43. Example: pow(x,y)
[Figure: flowgraph of pow(x,y) with nodes 1, 5, 8, 9, 14, 16, and 17
and edges labeled a through i.]
44. Example: pow(x,y), du-Path for Variable x
/* pow(x,y)
   This program computes x to the power of y,
   where x and y are integers.
   INPUT: The x and y values.
   OUTPUT: x raised to the power of y is printed
   to stdout.
*/
 1  void pow (int x, y)
 2  {
 3    float z;
 4    int p;
 5    if (y < 0)
 6      p = 0 - y;
 7    else p = y;
 8    z = 1.0;
 9    while (p != 0)
10    {
11      z = z * x;
12      p = p - 1;
13    }
14    if (y < 0)
15      z = 1.0 / z;
16    printf(z);
17  }
[Figure: flowgraph of pow(x,y) with nodes 1, 5, 8, 9, 14, 16, and 17
and edges a through i, with a du-path for x marked.]
45. Example: pow(x,y), du-Path for Variable x
[Figure: the pow(x,y) listing and flowgraph from the previous slide,
with a further du-path for x marked.]
46. Example: pow(x,y), du-Path for Variable y
[Figure: the pow(x,y) listing and flowgraph (nodes 1, 5, 8, 9, 14, 16,
17; edges a through i), with a du-path for y marked.]
47. Example: pow(x,y), du-Path for Variable y
[Figure: the pow(x,y) listing and flowgraph (nodes 1, 5, 8, 9, 14, 16,
17; edges a through i), with a second du-path for y marked.]
48. Example: pow(x,y), du-Path for Variable y
[Figure: the pow(x,y) listing and flowgraph (nodes 1, 5, 8, 9, 14, 16,
17; edges a through i), with a third du-path for y marked.]
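To connect the du-paths for y to concrete test inputs, the following sketch ports pow to Python so it can be run directly. It is an illustrative port, not the original program: the name `pow_port` is hypothetical and the function returns z instead of printing it. The definition of y at line 1 reaches the predicate uses at lines 5 and 14 and the computational uses at lines 6 and 7, so inputs with y < 0, y = 0, and y > 0 together drive both outcomes of each predicate and both assignments to p.

```python
def pow_port(x, y):
    """Python port of the pow listing; comments give the original line numbers."""
    if y < 0:          # line 5: p-use of y
        p = 0 - y      # line 6: c-use of y
    else:
        p = y          # line 7: c-use of y
    z = 1.0            # line 8
    while p != 0:      # line 9
        z = z * x      # line 11: c-use of x
        p = p - 1      # line 12
    if y < 0:          # line 14: p-use of y
        z = 1.0 / z    # line 15
    return z           # line 16 prints z in the original

# One input per region of y: positive, zero (loop skipped), negative.
for x, y in [(2, 3), (5, 0), (2, -2)]:
    print(x, y, pow_port(x, y))
```

The y = 0 case is worth keeping because it is the only one that skips the loop body, so the definition of z at line 8 reaches line 16 without passing through line 11.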
49. Example: Using du-Path Testing to Test Program COUNT
- Consider the following program:
/* COUNT
   This program counts the number of characters and lines in a text file.
   INPUT: Text File
   OUTPUT: Number of characters and number of lines.
*/
 1  main(int argc, char *argv[])
 2  {
 3    int numChars = 0;
 4    int numLines = 0;
 5    char chr;
 6    FILE *fp = NULL;
 7
50. Program COUNT (Contd)
 8    if (argc < 2)
 9    {
10      printf("\nUsage: %s <filename>", argv[0]);
11      return (-1);
12    }
13    fp = fopen(argv[1], "r");
14    if (fp == NULL)
15    {
16      perror(argv[1]);   /* display error message */
17      return (-2);
18    }
51. Program COUNT (Contd)
19    while (!feof(fp))
20    {
21      chr = getc(fp);    /* read character */
22      if (chr == '\n')   /* if carriage return */
23        ++numLines;
24      else
25        ++numChars;
26    }
27    printf("\nNumber of characters = %d", numChars);
28    printf("\nNumber of lines = %d", numLines);
29  }
52. The Flowgraph for COUNT
- The junctions at lines 12 and 18 are not needed, because if you are
at these lines then you must also be at lines 14 and 19, respectively.
53-67. du-Paths for the Variables of COUNT
[Figures: du-paths marked on the COUNT flowgraph, one slide per path:
slides 53-54 for argc, 55-56 for argv, 57-59 for numChars, 60-62 for
numLines, 63-64 for chr, and 65-67 for fp.]
68. All Uses Strategy (AU)
- AU requires that at least one path from every definition of every
variable to every use of that definition be exercised under some test.
- Hence, at least one definition-clear path from every definition of
every variable to every use of that definition must be exercised under
some test.
- Clearly, AU < ADUP.
69. All p-uses/Some c-uses Strategy (APUC)
- APUC requires that, for every variable and every definition of that
variable, the tests include at least one definition-free path from the
definition to every predicate use.
- If there are definitions of the variable that are not covered by the
above prescription, then add computational-use test cases to cover
every definition.
70. All c-uses/Some p-uses Strategy (ACUP)
- ACUP requires that, for every variable and every definition of that
variable, the tests include at least one definition-free path from the
definition to every computational use.
- If there are definitions of the variable that are not covered by the
above prescription, then add predicate-use test cases to cover every
definition.
71. All Definitions Strategy (AD)
- AD requires that, for every variable and every definition of that
variable, the tests include at least one definition-free path from the
definition to a computational or predicate use.
- AD < ACUP and AD < APUC.
72. All p-uses (APU), All c-uses (ACU)
- APU is the same as APUC without the C requirement.
- APU < APUC.
- ACU is the same as ACUP without the P requirement.
- ACU < ACUP.
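The strategies differ only in which def-use associations they oblige the tests to cover. The sketch below makes that concrete for the twelve associations of the earlier six-statement example (read, z, if, w, y, print). It is an illustration under assumptions: the `ASSOCS` set is hand-copied from that example, the function name `required` is hypothetical, p-uses are written as (node, branch) tuples, and where a strategy asks for "some" use the code simply picks the first candidate in sorted order, which is only one of several valid choices.

```python
# (variable, definition node, use); a tuple-valued use is a p-use on a branch.
ASSOCS = {
    ("x", 1, 2), ("x", 1, 4), ("x", 1, 6),
    ("y", 1, (3, "t")), ("y", 1, (3, "f")), ("y", 1, 5), ("y", 1, 6),
    ("z", 2, (3, "t")), ("z", 2, (3, "f")), ("z", 2, 6),
    ("w", 4, 6), ("y", 5, 6),
}

def required(assocs, strategy):
    """Return the associations that the given strategy obliges the tests to cover."""
    p_uses = {a for a in assocs if isinstance(a[2], tuple)}
    c_uses = set(assocs) - p_uses
    all_defs = {(var, d) for var, d, _ in assocs}

    def defs_of(subset):
        return {(var, d) for var, d, _ in subset}

    def one_per_def(pool, definitions):
        # Arbitrary but deterministic pick of one use for each definition.
        picks = set()
        for var, d in definitions:
            candidates = sorted((a for a in pool if a[:2] == (var, d)), key=repr)
            picks.update(candidates[:1])
        return picks

    if strategy == "AU":
        return p_uses | c_uses
    if strategy == "APU":
        return p_uses
    if strategy == "ACU":
        return c_uses
    if strategy == "APUC":
        return p_uses | one_per_def(c_uses, all_defs - defs_of(p_uses))
    if strategy == "ACUP":
        return c_uses | one_per_def(p_uses, all_defs - defs_of(c_uses))
    if strategy == "AD":
        return one_per_def(p_uses | c_uses, all_defs)
    raise ValueError(strategy)

for name in ["AD", "APU", "ACU", "APUC", "ACUP", "AU"]:
    print(name, len(required(ASSOCS, name)))
```

For this program, APUC adds one computational use each for the definitions of x at node 1, w at node 4, and y at node 5, because none of them reaches a predicate.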
73. Relationship among DF criteria
[Figure: hierarchy diagram relating the criteria ALL-PATHS,
ALL-DU-PATHS, ALL-USES, ALL-P-USES/SOME-C-USES, ALL-C-USES/SOME-P-USES,
ALL-P-USES, ALL-C-USES, ALL-DEFS, ALL-EDGES, and ALL-NODES.]
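The relationship among the criteria can be encoded as a directed graph in which an edge from A to B means that A is at least as strong as B. The sketch below is an assumption-laden illustration: the edges AU to ADUP, AD to ACUP and APUC, APU to APUC, and ACU to ACUP come from the preceding slides, while the remaining edges (ALL-PATHS above ALL-DU-PATHS, ALL-USES above the two mixed strategies, and ALL-P-USES above ALL-EDGES above ALL-NODES) are taken from the commonly cited hierarchy of data-flow criteria rather than from the slide text. The names `SUBSUMES` and `subsumes` are hypothetical.

```python
# Edge A -> B: any test set satisfying A also satisfies B (assumed hierarchy).
SUBSUMES = {
    "ALL-PATHS": ["ALL-DU-PATHS"],
    "ALL-DU-PATHS": ["ALL-USES"],
    "ALL-USES": ["ALL-C-USES/SOME-P-USES", "ALL-P-USES/SOME-C-USES"],
    "ALL-C-USES/SOME-P-USES": ["ALL-C-USES", "ALL-DEFS"],
    "ALL-P-USES/SOME-C-USES": ["ALL-P-USES", "ALL-DEFS"],
    "ALL-P-USES": ["ALL-EDGES"],
    "ALL-EDGES": ["ALL-NODES"],
}

def subsumes(stronger, weaker):
    """True when `weaker` is reachable from `stronger` in the hierarchy."""
    stack, seen = [stronger], set()
    while stack:
        criterion = stack.pop()
        if criterion == weaker:
            return True
        if criterion not in seen:
            seen.add(criterion)
            stack.extend(SUBSUMES.get(criterion, []))
    return False

print(subsumes("ALL-USES", "ALL-NODES"))
print(subsumes("ALL-C-USES", "ALL-EDGES"))
```

Under these edges ALL-C-USES and ALL-DEFS are incomparable with ALL-EDGES, which is why neither alone guarantees branch coverage.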
74. Feasible Data Flow Criteria
- What happens if we eliminate all un-executable associations from
consideration?
- If we eliminate all un-executable associations from consideration,
then there are significant differences between the Data Flow criteria
and the Feasible Data Flow criteria.
- For a large class of well-behaved programs, the Feasible DF criteria
All-p-uses, All-p-uses/some-c-uses, and All-uses bridge the gap between
All-edges and All-paths.
- However, for certain programs with anomalies there are tests which
satisfy All-p-uses without satisfying All-edges.
75. An Example
[Figure: flowgraph with nodes numbered 1 to 9. Recoverable node
contents: read (x); x = sqrt(x); x < 0 (T/F); read (y); y ? 0 (T/F).]
- Node 4 is un-executable; y is un-initialized.
- Essentially, the only def-use of concern is (x, 2, 3).
- Outcome of node 6 can be either T or F.
- All-p-uses satisfied, but not all-edges.
76. Relationship among Feasible DF criteria
[Figure: hierarchy diagram relating the feasible criteria ALL-PATHS,
ALL-DU-PATHS, ALL-EDGES, ALL-USES, ALL-NODES, ALL-C-USES/SOME-P-USES,
ALL-P-USES/SOME-C-USES, ALL-DEFS, ALL-C-USES, and ALL-P-USES.]
77. Effectiveness of Strategies
- Ntafos compared Random, Branch, and All Uses testing strategies on 14
Kernighan and Plauger programs.
- Kernighan and Plauger programs are a set of mathematical programs
with known bugs that are often used to evaluate test strategies.
- Ntafos conducted two experiments:
78. Results of 2 of the 14 Ntafos Experiments

Strategy    Mean Number of Test Cases    Percentage of Bugs Found
Random      35                           93.7
Branch      3.8                          91.6
All Uses    11.3                         96.3

Strategy    Mean Number of Test Cases    Percentage of Bugs Found
Random      100                          79.5
Branch      34                           85.5
All Uses    84                           90.0
79. Data-Flow Testing Tips
- Resolve all data-flow anomalies.
- Try to do all data-flow operations on a variable within the same
routine (i.e., avoid integration problems).
- Use strong typing and user-defined types when possible.
80. Data-Flow Testing Tips (Contd)
- Use explicit (rather than implicit) declarations of data when
possible.
- Put data declarations at the top of the routine and return data
objects at the bottom of the routine.
81. Summary
- Data are as important as code.
- Define what you consider to be a data-flow anomaly.
- Data-flow testing strategies span the gap between all paths and
branch testing.
82. Summary
- AU has the best payoff for the money. It seems to require no more
than twice the number of test cases needed for branch testing, but the
results are much better.
- Path testing with Branch Coverage and Data-flow testing with AU is a
very good combination.