Title: Simplifying and Isolating Failure-Inducing Input
1Simplifying and Isolating Failure-Inducing Input
- Presented by Nir Peer
- University of Maryland
2Introduction
3Overview
- Given some test case, a program fails.
- What is the minimal test case that still produces the failure?
- Also, what is the difference between a passing and a failing test case?
- Or, in other words...
4How do we go from this
<td align=left valign=top>
<SELECT NAME="op_sys" MULTIPLE SIZE=7>
<OPTION VALUE="All">All<OPTION VALUE="Windows 3.1">Windows 3.1
<OPTION VALUE="Windows 95">Windows 95<OPTION VALUE="Windows 98">Windows 98
<OPTION VALUE="Windows ME">Windows ME<OPTION VALUE="Windows 2000">Windows 2000
<OPTION VALUE="Windows NT">Windows NT<OPTION VALUE="Mac System 7">Mac System 7
<OPTION VALUE="Mac System 7.5">Mac System 7.5
<OPTION VALUE="Mac System 7.6.1">Mac System 7.6.1
<OPTION VALUE="Mac System 8.0">Mac System 8.0
<OPTION VALUE="Mac System 8.5">Mac System 8.5
<OPTION VALUE="Mac System 8.6">Mac System 8.6
<OPTION VALUE="Mac System 9.x">Mac System 9.x
<OPTION VALUE="MacOS X">MacOS X<OPTION VALUE="Linux">Linux
<OPTION VALUE="BSDI">BSDI<OPTION VALUE="FreeBSD">FreeBSD
<OPTION VALUE="NetBSD">NetBSD<OPTION VALUE="OpenBSD">OpenBSD
<OPTION VALUE="AIX">AIX<OPTION VALUE="BeOS">BeOS
<OPTION VALUE="HP-UX">HP-UX<OPTION VALUE="IRIX">IRIX
<OPTION VALUE="Neutrino">Neutrino<OPTION VALUE="OpenVMS">OpenVMS
<OPTION VALUE="OS/2">OS/2<OPTION VALUE="OSF/1">OSF/1
<OPTION VALUE="Solaris">Solaris<OPTION VALUE="SunOS">SunOS
<OPTION VALUE="other">other</SELECT></td>
<td align=left valign=top>
<SELECT NAME="priority" MULTIPLE SIZE=7>
<OPTION VALUE="--">--<OPTION VALUE="P1">P1<OPTION VALUE="P2">P2
<OPTION VALUE="P3">P3<OPTION VALUE="P4">P4<OPTION VALUE="P5">P5</SELECT></td>
<td align=left valign=top>
<SELECT NAME="bug_severity" MULTIPLE SIZE=7>
<OPTION VALUE="blocker">blocker<OPTION VALUE="critical">critical
<OPTION VALUE="major">major<OPTION VALUE="normal">normal
<OPTION VALUE="minor">minor<OPTION VALUE="trivial">trivial
<OPTION VALUE="enhancement">enhancement</SELECT></tr></table>
File → Print → Segmentation Fault
5into this
<SELECT>
File → Print → Segmentation Fault
6Motivation
- The Mozilla open-source web browser project receives several dozen bug reports a day.
- Each bug report has to be simplified
- Eliminate all details irrelevant to producing the failure
- To facilitate debugging
- To make sure it does not replicate a similar bug report
- In July 1999, Bugzilla listed more than 370 open bug reports for Mozilla.
- These were not even simplified
- Mozilla engineers were overwhelmed with work
- They created the Mozilla BugAThon: a call for volunteers to process bug reports
7Motivation
- Simplifying meant turning bug reports into minimal test cases
- where every part of the input would be significant in reproducing the failure
- What we want is the simplest HTML page that still produces the fault.
- Decomposing specific bug reports into simple test cases is of general interest
- Let's automate this task!
8Simplification of test cases
- The minimizing delta debugging algorithm ddmin
- Takes a failing test case
- Simplifies it by successive testing
- Stops when a minimal test case is reached
- where removing any single input entity will cause
the failure to disappear
9How to minimize a test case?
- Test subsets with removed characters (shown in grey)
- A given test case
  - Fails ($\times$) if Mozilla crashes on it
  - Passes ($\checkmark$) otherwise
10How to minimize a test case?
Original failing input
Try removing half. Now everything passes, we've lost the error-inducing input!
Try removing a quarter: OK, found something!
11How to minimize a test case?
Try removing a quarter instead
OK, we've got something! So keep it, and continue
Good, carry on
Lost it! Try removing an eighth instead
12How to minimize a test case?
Removing an eighth
Good, keep it!
Lost it! Try removing a sixteenth instead
Great! We're making progress
OK, now let's see if removing single characters helps us reduce it even more
13How to minimize a test case?
Removing a single character
Reached a minimal test case!
Therefore, this should be our test case
14Formalization
15Testing for Change
- The execution of a program is determined by a number of circumstances
- The program code
- Data from storage or input devices
- The program's environment
- The specific hardware
- and so on
- We're only interested in the changeable circumstances
- Those whose change may cause a different program behavior
16The change that Causes a Failure
- Denote the set of possible configurations of circumstances by $R$.
- Each $r \in R$ determines a specific program run.
- This $r$ could be
  - a failing run, denoted by $r_\times$
  - a passing run, denoted by $r_\checkmark$
- Given a specific $r_\times$
- We focus on the difference between $r_\times$ and some $r_\checkmark \in R$ that works
- This difference is the change which causes the failure
- The smaller this change, the better it qualifies as a failure cause
17The change that Causes a Failure
- Formally, the difference between $r_\checkmark$ and $r_\times$ is expressed as a mapping $\delta$ which changes the circumstances of a program run
- The exact definition of $\delta$ is problem specific
- In the Mozilla example, applying $\delta$ means to expand a trivial (empty) HTML input to the full failure-inducing HTML page.
Definition 1 (Change). A change $\delta$ is a mapping $\delta: R \to R$. The set of changes is $C = R \to R$. The relevant change between two runs $r_\checkmark, r_\times \in R$ is a change $\delta \in C$ s.t. $\delta(r_\checkmark) = r_\times$.
18Decomposing Changes
- We assume that the relevant change $\delta$ can be decomposed into a number of elementary changes $\delta_1, \dots, \delta_n$.
- In general, this can be an atomic decomposition
- Changes that cannot be decomposed any further
Definition 2 (Composition of changes). The change composition $\circ: C \times C \to C$ is defined as $(\delta_i \circ \delta_j)(r) = \delta_i(\delta_j(r))$.
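Definition 2 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a run is a dictionary of circumstances and that a change is a function from run to run. The names `compose`, `set_input`, and `add_flag` are hypothetical and do not come from the slides.

```python
from functools import reduce

def compose(*changes):
    """Compose changes so that the rightmost one is applied first,
    i.e. compose(d1, d2)(r) == d1(d2(r))."""
    def composed(run):
        return reduce(lambda r, change: change(r), reversed(changes), run)
    return composed

# Hypothetical elementary changes: each returns a new run and leaves its argument intact.
def set_input(run):
    return {**run, "input": "<SELECT>"}

def add_flag(run):
    return {**run, "flags": run["flags"] + ("-O",)}

r_pass = {"input": "", "flags": ()}               # assumed passing run
r_changed = compose(set_input, add_flag)(r_pass)  # same as set_input(add_flag(r_pass))
```

Composing no changes at all yields the identity mapping, which corresponds to leaving the passing run untouched.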
19Test Cases and Tests
- According to the POSIX 1003.3 standard for testing frameworks, we distinguish three test outcomes
- The test succeeds (PASS, written here as $\checkmark$)
- The test has produced the failure it was intended to capture (FAIL, written here as $\times$)
- The test produced indeterminate results (UNRESOLVED, written as $?$)
Definition 3 (rtest). The function $rtest: R \to \{\times, \checkmark, ?\}$ determines for a program run $r \in R$ whether some specific failure occurs ($\times$) or not ($\checkmark$) or whether the test is unresolved ($?$).
Axiom 4 (Passing and failing run). $rtest(r_\checkmark) = \checkmark$ and $rtest(r_\times) = \times$ hold.
20Test Cases and Tests
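- We identify each run by the set of changes being applied to $r_\checkmark$
- We define $c_\checkmark$ as the empty set $\emptyset$, which identifies $r_\checkmark$ (no changes applied)
- The set of all changes $c_\times = \{\delta_1, \delta_2, \dots, \delta_n\}$ identifies $r_\times = (\delta_1 \circ \delta_2 \circ \dots \circ \delta_n)(r_\checkmark)$
Definition 5 (Test case). A subset $c \subseteq c_\times$ is called a test case.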
21Test Cases and Tests
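Definition 6 (test). The function $test: 2^{c_\times} \to \{\times, \checkmark, ?\}$ is defined as follows: Let $c \subseteq c_\times$ be a test case with $c = \{\delta_1, \delta_2, \dots, \delta_n\}$. Then $test(c) = rtest((\delta_1 \circ \delta_2 \circ \dots \circ \delta_n)(r_\checkmark))$ holds.
Corollary 7 (Passing and failing test cases). The following holds:
$test(c_\checkmark) = test(\emptyset) = \checkmark$ (passing test case)
$test(c_\times) = test(\{\delta_1, \delta_2, \dots, \delta_n\}) = \times$ (failing test case)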
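Definitions 3 to 6 can be made concrete with a small Python sketch. Everything here is an illustrative assumption: the failing input string is made up, change number i is modeled as "insert character i", and the simulated `rtest` simply fails whenever the input contains `<SELECT`. The function `run_test` plays the role of test from Definition 6.

```python
from enum import Enum

class Outcome(Enum):
    PASS = "pass"              # the failure does not occur
    FAIL = "fail"              # the failure occurs
    UNRESOLVED = "unresolved"  # indeterminate result (unused in this simulation)

FAILING_INPUT = "<td><SELECT NAME=x>"   # hypothetical failure-inducing input

# A test case is a set of character indices; the empty set is the passing test case.
C_FAIL = frozenset(range(len(FAILING_INPUT)))
C_PASS = frozenset()

def apply_changes(c):
    """Build the run identified by test case c, starting from the empty input."""
    return "".join(FAILING_INPUT[i] for i in sorted(c))

def rtest(run):
    """Simulated program run: assumed to fail whenever the input contains '<SELECT'."""
    return Outcome.FAIL if "<SELECT" in run else Outcome.PASS

def run_test(c):
    """test(c) = rtest(run obtained by applying the changes in c to the passing run)."""
    return rtest(apply_changes(c))
```

With this model, `run_test(C_PASS)` and `run_test(C_FAIL)` give the two outcomes required by Corollary 7.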
22Minimizing Test Cases
23Minimal Test Cases
- If a test case $c \subseteq c_\times$ is a minimum, no other smaller subset of $c_\times$ causes a failure
- But we don't want to have to test all $2^{|c_\times|}$ subsets of $c_\times$
- So we'll settle for a local minimum
- A test case is minimal if none of its subsets causes a failure
Definition 8 (Global minimum). A set $c \subseteq c_\times$ is called the global minimum of $c_\times$ if $\forall c' \subseteq c_\times \cdot (|c'| < |c| \Rightarrow test(c') \neq \times)$ holds.
Definition 9 (Local minimum). A test case $c \subseteq c_\times$ is a local minimum of $c_\times$ or minimal if $\forall c' \subset c \cdot (test(c') \neq \times)$ holds.
24Minimal Test Cases
- Thus, if a test case $c$ is minimal
- It is not necessarily the smallest test case (there may be a different global minimum)
- But each element of $c$ is relevant in producing the failure
- Nothing can be removed without making the failure disappear
- However, determining that $c$ is minimal still requires $2^{|c|}$ tests
- We can use an approximation instead
- It is possible that removing several changes at once might make a test case smaller
- But we'll only check if this is so when we remove up to $n$ changes
25Minimal Test Cases
- We define n-minimality: removing any combination of up to $n$ changes causes the failure to disappear
- We're actually most interested in 1-minimal test cases
- When removing any single change causes the failure to disappear
- Removing two or more changes at once may result in an even smaller, still failing test case
- But every single change on its own is significant in reproducing the failure
Definition 10 (n-minimal test case). A test case $c \subseteq c_\times$ is n-minimal if $\forall c' \subset c \cdot (|c| - |c'| \leq n \Rightarrow test(c') \neq \times)$ holds. Consequently, $c$ is 1-minimal if $\forall \delta_i \in c \cdot (test(c - \{\delta_i\}) \neq \times)$ holds.
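Definition 10 translates almost directly into a brute-force check. The sketch below is only an illustration: outcomes are plain strings, and the two simulated test functions are assumptions chosen to show the difference between 1-minimality and 2-minimality.

```python
from itertools import combinations

def is_n_minimal(c, test, n=1):
    """Return True if removing any combination of 1..n changes from c
    yields a test case that no longer fails."""
    c = list(c)
    for k in range(1, min(n, len(c)) + 1):
        for removed in combinations(range(len(c)), k):
            smaller = [x for i, x in enumerate(c) if i not in removed]
            if test(smaller) == "FAIL":
                return False
    return True

# Assumption: the failure needs changes 1, 7 and 8 together.
def needs_all_three(c):
    return "FAIL" if {1, 7, 8} <= set(c) else "PASS"

# Assumption: fails on the full set {1, 2, 3} and also on {3} alone.
def fails_on_full_or_single(c):
    return "FAIL" if set(c) in ({1, 2, 3}, {3}) else "PASS"
```

Here `[1, 2, 3]` is 1-minimal under `fails_on_full_or_single` (removing any single change makes the failure disappear) but not 2-minimal, because removing changes 1 and 2 at once leaves the still failing `[3]`. The check needs one test per removed combination, which is why only small values of n are practical.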
26The Delta Debugging Algorithm
- We partition a given test case $c_\times$ into subsets
- Suppose we have $n$ subsets $\Delta_1, \dots, \Delta_n$
- We test
  - each $\Delta_i$ and
  - its complement $\nabla_i = c_\times - \Delta_i$
27The Delta Debugging Algorithm
- Testing each $\Delta_i$ and its complement, we have four possible outcomes
- Reduce to subset
  - If testing any $\Delta_i$ fails, it will be a smaller test case
  - Continue reducing $\Delta_i$ with $n = 2$ subsets
- Reduce to complement
  - If testing any $\nabla_i = c_\times - \Delta_i$ fails, it will be a smaller test case
  - Continue reducing $\nabla_i$ with $n - 1$ subsets
  - Why $n - 1$ subsets and not $n = 2$ subsets? (Maintain granularity!)
- Double the granularity (if no reduction applies and the subsets can still be split further)
- Done (otherwise)
28The Delta Debugging Algorithm
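The four outcomes can be sketched as a loop. This is a minimal Python sketch, not the authors' implementation. It assumes that a test case is a list of changes, that `test` returns one of the strings "PASS", "FAIL" or "UNRESOLVED", that the full input fails, and that the empty test case passes. The helper name `split` is hypothetical.

```python
def split(c, n):
    """Partition list c into n consecutive, non-empty, roughly equal subsets
    (requires n <= len(c))."""
    subsets, start = [], 0
    for i in range(n):
        end = start + (len(c) - start) // (n - i)
        subsets.append(c[start:end])
        start = end
    return subsets

def ddmin(c_fail, test):
    """Reduce the failing test case c_fail to a 1-minimal failing test case."""
    c, n = list(c_fail), 2
    while len(c) >= 2:
        subsets = split(c, n)
        complements = [
            [x for j, sub in enumerate(subsets) if j != i for x in sub]
            for i in range(len(subsets))
        ]
        failing_subset = next((s for s in subsets if test(s) == "FAIL"), None)
        if failing_subset is not None:
            c, n = failing_subset, 2                  # reduce to subset
            continue
        failing_compl = next((s for s in complements if test(s) == "FAIL"), None)
        if failing_compl is not None:
            c, n = failing_compl, max(n - 1, 2)       # reduce to complement
            continue
        if n < len(c):
            n = min(len(c), 2 * n)                    # double the granularity
        else:
            break                                     # done: c is 1-minimal
    return c
```

The sketch re-runs tests it has already seen (for example, with two subsets each complement equals the other subset). A real test driver would probably cache outcomes, since every test may mean a full program run.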
29Example
- Consider the following minimal test case which consists of the changes $\delta_1$, $\delta_7$, and $\delta_8$
- Any test case that includes only a subset of these changes results in an unresolved test outcome
- A test case that includes none of these changes passes the test
- We first partition the set of changes in two halves
- neither of them passes the test
30Example
- We continue with granularity increased to four subsets
- When testing the complements, the set $\nabla_2$ fails, thus removing changes $\delta_3$ and $\delta_4$
- We continue with splitting $\nabla_2$ into three subsets
31Example
- Steps 9 to 11 have already been carried out and need not be repeated (marked on the slide)
- When testing $\nabla_2$, changes $\delta_5$ and $\delta_6$ can be eliminated
- We reduce to $\nabla_2$ and continue with two subsets
32Example
- We increase granularity to four subsets and test each
- Testing the complements shows that we can eliminate $\delta_2$
33Example
- The next steps show that none of the remaining changes $\delta_1$, $\delta_7$, and $\delta_8$ can be eliminated
- To minimize this test case, a total of 19 different tests was required
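The test outcomes used in this example can be reproduced with a small simulated test function. The sketch below assumes eight changes numbered 1 to 8 and encodes the rule stated on the slides: all of changes 1, 7 and 8 present means failure, some of them means unresolved, none of them means pass. The names are illustrative only.

```python
NEEDED = {1, 7, 8}   # changes that must all be present for the failure

def example_test(case):
    present = NEEDED & set(case)
    if present == NEEDED:
        return "FAIL"
    return "UNRESOLVED" if present else "PASS"

changes = list(range(1, 9))                              # delta_1 .. delta_8
halves = [changes[:4], changes[4:]]
quarters = [changes[i:i + 2] for i in range(0, 8, 2)]
nabla_2 = [d for d in changes if d not in quarters[1]]   # drops delta_3 and delta_4

for name, case in [("halves[0]", halves[0]), ("halves[1]", halves[1]),
                   ("nabla_2", nabla_2), ("minimal", [1, 7, 8])]:
    print(name, case, example_test(case))
```

Both halves come out unresolved, which is why the granularity has to be increased before the first reduction (to `nabla_2`) becomes possible.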
34Case Studies
35The GNU C Compiler
#define SIZE 20

double mult(double z[], int n)
{
    int i, j;

    i = 0;
    for (j = 0; j < n; j++) {
        i = i + j + 1;
        z[i] = z[i] * (z[0] + 1.0);
    }
    return z[n];
}

void copy(double to[], double from[], int count)
{
    int n = (count + 7) / 8;
    switch (count % 8) do {
        case 0: *to++ = *from++;
        case 7: *to++ = *from++;
        case 6: *to++ = *from++;
        case 5: *to++ = *from++;
        case 4: *to++ = *from++;
        case 3: *to++ = *from++;
        case 2: *to++ = *from++;
        case 1: *to++ = *from++;
    } while (--n > 0);
    return mult(to, 2);
}

int main(int argc, char *argv[])
{
    double x[SIZE], y[SIZE];
    double *px = x;

    while (px < x + SIZE)
        *px++ = (px - x) * (SIZE + 1.0);
    return copy(y, x, SIZE);
}
- This program (bug.c) causes GCC 2.95.2 to crash when optimization is enabled
- We would like to minimize this program in order to file a bug report
- In the case of GCC, a passing program run is the empty input
- For the sake of simplicity, we model change as the insertion of a single character
- $r_\checkmark$ is running GCC with an empty input
- $r_\times$ means running GCC with bug.c
- each change $\delta_i$ inserts the $i$th character of bug.c
36The GNU C Compiler
- The test procedure would
  - create the appropriate subset of bug.c
  - feed it to GCC
  - return $\times$ iff GCC had crashed, and $\checkmark$ otherwise
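The three steps of the test procedure could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the setup used in the study: a "crash" is assumed to show up either as a negative return code (process killed by a signal on POSIX systems) or as the text "internal compiler error" on stderr, and the names `compile_outcome` and `make_gcc_test` are hypothetical.

```python
import os
import subprocess
import tempfile

def compile_outcome(source, compiler_cmd):
    """Write `source` to a scratch file, run the compiler on it, classify the result."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "candidate.c")
        with open(path, "w") as handle:
            handle.write(source)
        proc = subprocess.run(list(compiler_cmd) + [path], cwd=tmp,
                              capture_output=True, text=True, errors="replace")
    # Assumed crash signature; ordinary compile errors count as PASS.
    crashed = proc.returncode < 0 or "internal compiler error" in proc.stderr.lower()
    return "FAIL" if crashed else "PASS"

def make_gcc_test(full_source, compiler_cmd):
    """Return test(c), where c is a collection of character indices of full_source."""
    def test(indices):
        candidate = "".join(full_source[i] for i in sorted(indices))
        return compile_outcome(candidate, compiler_cmd)
    return test
```

A possible call would be `make_gcc_test(open("bug.c").read(), ["gcc", "-O", "-c"])`. Most candidate subsets are not valid C and are simply rejected by the compiler, which this sketch treats as a passing run, in line with the slide.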
[Figure: test case size in characters during minimization: 755, 377, 188, 77]
37The GNU C Compiler
- The minimized code is
  t(double z[],int n){int i,j;for(;;){i=i+j+1;z[i]=z[i]*(z[0]+0);}return z[n];}
- The test case is 1-minimal
- No single character can be removed without removing the failure
- Even every superfluous whitespace has been removed
- The function name has shrunk from mult to a single t
- This program actually has a semantic error (infinite loop), but GCC still isn't supposed to crash
- So where could the bug be?
- We already know it is related to optimization
- If we remove the -O option to turn off optimization, the failure disappears
38The GNU C Compiler
- The GCC documentation lists 31 options to control optimization on Linux
- It turns out that applying all of these options causes the failure to disappear
- Some option(s) prevent the failure
-ffloat-store -fno-default-inline -fno-defer-pop -fforce-mem -fforce-addr
-fomit-frame-pointer -fno-inline -finline-functions -fkeep-inline-functions
-fkeep-static-consts -fno-function-cse -ffast-math -fstrength-reduce
-fthread-jumps -fcse-follow-jumps -fcse-skip-blocks -frerun-cse-after-loop
-frerun-loop-opt -fgcse -fexpensive-optimizations -fschedule-insns
-fschedule-insns2 -ffunction-sections -fdata-sections -fcaller-saves
-funroll-loops -funroll-all-loops -fmove-all-movables -freduce-all-givs
-fno-peephole -fstrict-aliasing
39The GNU C Compiler
- We can use test case minimization in order to find the preventing option(s)
- Each $\delta_i$ stands for removing a GCC option
- Having all $\delta_i$ applied means to run GCC with no option (failing)
- Having no $\delta_i$ applied means to run GCC with all options (passing)
- After seven tests, the single option -ffast-math is found which prevents the failure
- Unfortunately, it is a bad candidate for a workaround because it may alter the semantics of the program
- Thus, we remove -ffast-math from the list of options and make another run
- Again after seven tests, it turns out that -fforce-addr also prevents the failure
- Further examination shows that no other option prevents the failure
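Because a change here removes an option rather than inserting a character, the mapping from a test case to a command line is inverted. The following Python sketch shows that mapping only. The option list is a short excerpt, and the base command (`gcc -O -c`) and the function name `gcc_command` are assumptions made for illustration.

```python
# Excerpt of the option list from the slide; the full list has 31 entries.
ALL_OPTIONS = ["-ffloat-store", "-fforce-addr", "-ffast-math", "-funroll-loops"]

def gcc_command(removed, source="bug.c", base=("gcc", "-O", "-c")):
    """Command line for a test case, given as the set of options that were removed."""
    kept = [opt for opt in ALL_OPTIONS if opt not in removed]
    return [*base, *kept, source]

passing_side = gcc_command(set())               # no change applied: all options present
failing_side = gcc_command(set(ALL_OPTIONS))    # all changes applied: no extra options
```

A test function for the minimization would run the returned command and classify the outcome in the same way as for the character-level test.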
40The GNU C Compiler
- So, this is what we can send to the GCC maintainers
- The minimal test case
- The failure only occurs with optimization
- -ffast-math and -fforce-addr prevent the failure
41Minimizing Fuzz
- In a classical experiment by Miller et al., several UNIX utilities were fed with fuzz input: a large number of random characters
- The studies showed that in the worst case 40% of the basic programs crashed or went into infinite loops
- We would like to use the ddmin algorithm to minimize the fuzz input sequences
- We examine the following six UNIX utilities
- NROFF (format documents for display)
- TROFF (format documents for typesetter)
- FLEX (fast lexical analyzer generator)
- CRTPLOT (graphics filter for various plotters)
- UL (underlining filter)
- UNITS (convert quantities)
42Minimizing Fuzz
- The following table summarizes the
characteristics of the different fuzz inputs used
$\times_S$: segmentation fault, $\times_A$: arithmetic exception
43Minimizing Fuzz
- Out of the $6 \times 16 = 96$ test runs, the utilities crashed 42 times (43%)
- We apply our algorithm to minimize the failure-inducing fuzz input
- Our test function returns $\times$ if the input made the program crash
- or $\checkmark$ otherwise
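A fuzz generator and a crash-detecting test function might be sketched as follows. This is a hedged illustration, not the setup of the original study: the function names are hypothetical, a crash is assumed to mean "killed by a signal" (negative return code on POSIX systems), and treating a timeout as unresolved is an added assumption.

```python
import random
import subprocess

def fuzz(length, seed, printable_only=False):
    """Generate `length` random bytes, reproducibly from `seed`."""
    rng = random.Random(seed)
    low, high = (32, 126) if printable_only else (0, 255)
    return bytes(rng.randint(low, high) for _ in range(length))

def crash_test(command, data, timeout=10):
    """Feed `data` to `command` on stdin and classify the outcome."""
    try:
        proc = subprocess.run(command, input=data,
                              capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "UNRESOLVED"   # assumption: a hang is neither a crash nor a pass
    return "FAIL" if proc.returncode < 0 else "PASS"
```

To minimize a failing input, the test handed to the minimization would rebuild the byte string from the selected positions and call `crash_test` with the utility under test, for example `["ul"]`.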