Title: Scalable Statistical Bug Isolation
1Scalable Statistical Bug Isolation
- Ben Liblit, Mayur Naik,
- Alice X. Zheng, Alex Aiken,
- Michael I. Jordan
2Statistical Debugging
- Dynamic analysis for large programs with multiple bugs
- Program instrumentation
- Tests predicates as program runs
- Random sampling (incomplete, lightweight)
- Feedback reports
- Predicate P observed to be true during run R
- Run R succeeded/failed
- Bug cause isolation algorithm
- Predicate P is a bug predictor for bug B
- Complementary to static analysis
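The instrumentation and feedback-report bullets above can be pictured with a small Python sketch. It is only an illustration: the `FeedbackReport` class, the `check` helper, and the 1-in-100 sampling rate are assumptions made for this example and do not come from the slides.

```python
import random


class FeedbackReport:
    """Hypothetical per-run report: predicates observed, predicates seen true, outcome."""

    def __init__(self):
        self.observed = set()   # predicates whose site was sampled at least once
        self.true = set()       # predicates sampled and found true at least once
        self.failed = False     # set by the harness when the run crashes


def check(report, name, value, rate, rng):
    """Record predicate `name` only when this execution of the site is sampled."""
    if rng.random() < rate:
        report.observed.add(name)
        if value:
            report.true.add(name)


# One simulated run in which the site "f == NULL" executes 1000 times.
rng = random.Random(0)          # fixed seed so the sketch is repeatable
report = FeedbackReport()
f = None
for _ in range(1000):
    check(report, "f == NULL", f is None, rate=0.01, rng=rng)
```

Because sampling is random, a predicate that was true during the run may be missing from the report; this is the "incomplete, lightweight" trade-off named in the bullet.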
3Non-Scalable Statistical Debugging
- Regularized logistic regression
- Outputs too many redundant predicates (100,000s for large apps)
- Many predicates predict multiple bugs (e.g. long list of command line flags)
- Different bugs occur at rates that differ by orders of magnitude
4Cause Isolation Algorithm
What predicates to test?
[Diagram: instrumented program -> feedback reports -> cause isolation algorithm -> bug predictors. The instrumented program tests predicates at various program points.]
5What instrumentation to add?
- At branches: Are the true and false branches ever taken?
- At returns: Is the return value ever < 0, <= 0, > 0, >= 0, == 0, or != 0?
- At scalar assignments (x = ...): For each in-scope variable and each constant expression, is the value ever < x, <= x, > x, >= x, == x, or != x?
- For heap structures: Future work
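As a rough illustration of the returns scheme, the following Python sketch evaluates the six sign predicates on one observed return value. The function name `return_predicates` and the dictionary keys are hypothetical and chosen only for this example; real instrumentation is inserted into the program under test.

```python
def return_predicates(value):
    """Evaluate the six return-value predicates for one observed value."""
    return {
        "< 0": value < 0,
        "<= 0": value <= 0,
        "> 0": value > 0,
        ">= 0": value >= 0,
        "== 0": value == 0,
        "!= 0": value != 0,
    }


# A call that returned -1 makes three of the six predicates true.
observed = return_predicates(-1)
true_predicates = sorted(name for name, holds in observed.items() if holds)
```

The scalar-assignment scheme follows the same pattern, with the constant 0 replaced by each in-scope variable or constant expression.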
6Cause Isolation Algorithm
Mimic human debugging
[Diagram: instrumented program -> feedback reports -> cause isolation algorithm -> bug predictors. Feedback reports state "P was observed to be true during run R" and "Run R succeeded/failed"; a bug predictor states "P probably corresponds to a bug".]
7Cause Isolation Algorithm
- Identify the most important bug B.
- Eliminate predicates that have no predictive power
- Rank the surviving predicates by importance
- Fix B, and repeat.
8A Simple Bug
Bug is deterministic w.r.t. f == NULL (if f == NULL is true, then the program crashes)
9A Simple Bug
Bug is non-deterministic w.r.t. f == NULL (it is possible that f == NULL is true and the program terminates normally)
    f = ...;
    if (f == NULL)
        x = 0;
    if (...)
        f = ... some valid pointer ...;
    *f;
10Failure(P)
- Even if P is the cause of a bug,
- when P is true, the program may succeed
- when P is never observed to be true, the program may fail
- Probability that P being true implies failure
- Failure(P) = Pr(Crash | P observed to be true)
- Failure(P) = (# failing runs where P is true) / (# runs where P is true)
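A minimal Python sketch of the Failure(P) ratio, assuming each feedback report is a dictionary with a `failed` flag and a `true` set of predicate names. That representation and the sample runs are assumptions of this example, not the authors' data format.

```python
def failure(pred, runs):
    """Fraction of the runs in which `pred` was true that failed.

    Returns None when `pred` was never observed to be true, since the
    ratio is undefined in that case.
    """
    true_runs = [r for r in runs if pred in r["true"]]
    if not true_runs:
        return None
    return sum(r["failed"] for r in true_runs) / len(true_runs)


# Made-up reports: "f == NULL" was true in three runs, two of which failed.
runs = [
    {"failed": True, "true": {"f == NULL"}},
    {"failed": True, "true": {"f == NULL"}},
    {"failed": False, "true": {"f == NULL"}},
    {"failed": False, "true": set()},
]
score = failure("f == NULL", runs)
```

A score below 1 corresponds to the non-deterministic case: the predicate was true in a run that still succeeded.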
11Simple Bug Revisited
Failure(f == NULL) = 1
Failure(x == 0) = 1
The predicate x == 0 is checked on a path where the program is already doomed.
12Context(P)
- Probability that P being observed implies failure
- Context(P) = Pr(Crash | P observed)
- Context(P) = (# failing runs where P is observed) / (# runs where P is observed)
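The Context(P) ratio can be sketched in Python in the same way. The report layout (a `failed` flag plus an `observed` set) and the four sample runs are assumptions made for this illustration; the runs loosely model the simple bug, where the x == 0 site is reached only when f is NULL.

```python
def context(pred, runs):
    """Fraction of the runs in which `pred` was observed (true or false) that failed."""
    observed_runs = [r for r in runs if pred in r["observed"]]
    if not observed_runs:
        return None  # predicate site never reached or never sampled
    return sum(r["failed"] for r in observed_runs) / len(observed_runs)


runs = [
    # f was NULL: both predicate sites were reached and the run crashed.
    {"failed": True, "observed": {"f == NULL", "x == 0"}},
    # f was valid: only the f == NULL check was reached.
    {"failed": False, "observed": {"f == NULL"}},
    {"failed": False, "observed": {"f == NULL"}},
    {"failed": False, "observed": {"f == NULL"}},
]
ctx_f = context("f == NULL", runs)  # 1 failing run out of 4 observing runs
ctx_x = context("x == 0", runs)     # 1 failing run out of 1 observing run
```

In this data, merely reaching the x == 0 site already implies failure, which is what "already doomed" means.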
13Increase(P)
- Increase(P) = Failure(P) - Context(P)
- How much does P being true increase the probability of failure over P being observed?
- Discard any predicate with Increase(P) <= 0
- P has no predictive power
- Tends to localize bugs at their source, not at the crash site.
- Unlike stack traces
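The pruning step can be sketched as follows, assuming reports carry `failed`, `observed`, and `true` fields (an assumed layout, with made-up runs). The sketch uses the raw point estimate of Increase(P) and applies no statistical significance test.

```python
def increase(pred, runs):
    """Failure(pred) - Context(pred), or None if pred was never true."""
    true_runs = [r for r in runs if pred in r["true"]]
    # A predicate seen true in a run was necessarily observed in that run.
    obs_runs = [r for r in runs if pred in r["observed"] or pred in r["true"]]
    if not true_runs:
        return None
    fail = sum(r["failed"] for r in true_runs) / len(true_runs)
    ctx = sum(r["failed"] for r in obs_runs) / len(obs_runs)
    return fail - ctx


both = {"f == NULL", "x == 0"}
runs = [
    {"failed": True, "observed": both, "true": both},
    {"failed": False, "observed": {"f == NULL"}, "true": set()},
    {"failed": False, "observed": {"f == NULL"}, "true": set()},
    {"failed": False, "observed": {"f == NULL"}, "true": set()},
]

# Keep only predicates whose Increase is strictly positive.
scores = {p: increase(p, runs) for p in sorted(both)}
survivors = [p for p, inc in scores.items() if inc is not None and inc > 0]
```

Here x == 0 is discarded even though its Failure score is 1, because its Context score is also 1; f == NULL survives and points at the source of the bug.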
14Cause Isolation Algorithm
- Idea: mimic human debugging
- Identify the most important bug B.
- Eliminate predicates that have no predictive power
- Rank the surviving predicates by importance
- Fix B, and repeat.
15Ranking Predicates
- Importance(P) = harmonic mean of Increase(P) and log(F(P)) / log(Numf)
- F(P): number of failing runs where P is true
- Numf: total number of failing runs
- Chosen to balance
- high sensitivity: P accounts for many failed runs
- high specificity: P does not mis-predict failure in many successful runs
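A short Python sketch of the Importance score as the harmonic mean of Increase(P) and log(F(P)) / log(Numf). The function and parameter names are hypothetical, and returning 0.0 for the degenerate cases (non-positive Increase, or a logarithm equal to zero) is a choice made for this sketch rather than something stated on the slide.

```python
import math


def importance(increase, f_p, num_f):
    """Harmonic mean of Increase(P) and log(F(P)) / log(Numf).

    increase: Increase(P) for the predicate
    f_p:      number of failing runs in which the predicate was true
    num_f:    total number of failing runs
    """
    # Guard the cases where a term is zero or undefined (sketch-specific choice).
    if increase <= 0 or f_p <= 1 or num_f <= 1:
        return 0.0
    sensitivity = math.log(f_p) / math.log(num_f)
    return 2.0 / (1.0 / increase + 1.0 / sensitivity)


# A predicate with Increase 0.5 that covers 10 of 100 failing runs.
balanced = importance(0.5, 10, 100)
# A predicate with a high Increase that covers a single failing run.
narrow = importance(0.9, 1, 100)
```

The harmonic mean is low whenever either term is low, so a predicate must be both specific and account for many failures to rank highly.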
16Cause Isolation Algorithm
- Idea: mimic human debugging
- Identify the most important bug B.
- Eliminate predicates that have no predictive power
- Rank the surviving predicates by importance
- Fix B, and repeat.
- Fix B by removing the top-ranked predicate P and discarding all runs R where R(P) = 1 (P was true in R), then apply the algorithm recursively
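The loop described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: reports are dictionaries with `failed`, `observed`, and `true` fields, the helper names `score` and `isolate` are hypothetical, and the recursion is written as an iteration.

```python
import math


def score(pred, runs):
    """Importance of pred over the given runs; 0.0 if it has no predictive power."""
    true_runs = [r for r in runs if pred in r["true"]]
    obs_runs = [r for r in runs if pred in r["observed"] or pred in r["true"]]
    num_f = sum(r["failed"] for r in runs)        # total failing runs
    f_p = sum(r["failed"] for r in true_runs)     # failing runs where pred is true
    if f_p <= 1 or num_f <= 1:
        return 0.0                                # log term is zero or undefined
    inc = f_p / len(true_runs) - sum(r["failed"] for r in obs_runs) / len(obs_runs)
    if inc <= 0:
        return 0.0                                # pruned: Increase(P) <= 0
    sensitivity = math.log(f_p) / math.log(num_f)
    return 2.0 / (1.0 / inc + 1.0 / sensitivity)


def isolate(runs, preds):
    """Repeatedly pick the top predicate and discard the runs where it was true."""
    runs = list(runs)
    remaining = set(preds)
    ranked = []
    while remaining and any(r["failed"] for r in runs):
        best = max(sorted(remaining), key=lambda p: score(p, runs))
        if score(best, runs) <= 0.0:
            break                                 # nothing left with predictive power
        ranked.append(best)
        remaining.discard(best)
        runs = [r for r in runs if best not in r["true"]]
    return ranked


# Made-up data: "a" predicts a common bug, "b" a rarer one, "c" is always true.
seen = {"a", "b", "c"}
runs = (
    [{"failed": True, "observed": seen, "true": {"a", "c"}} for _ in range(4)]
    + [{"failed": True, "observed": seen, "true": {"b", "c"}} for _ in range(2)]
    + [{"failed": False, "observed": seen, "true": {"c"}} for _ in range(6)]
)
predictors = isolate(runs, seen)
```

In this sample, "a" is selected first; once its runs are discarded, "b" accounts for every remaining failure and its score rises, which illustrates how rarer bugs surface after more common ones are removed. The always-true predicate "c" is never selected.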
17Experiment Summary
Dramatic reduction in number of predicates
Apps contained both known and previously unknown bugs
18Controlled Experiment with Moss
9 seeded errors
- Buffer overruns
- Null file pointer dereference
- Missing end-of-list check
- Missing out-of-memory check
- Violation of a subtle invariant in a data structure
- Incorrect output
19Questions?