Scalable Statistical Bug Isolation - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Scalable Statistical Bug Isolation


1
Scalable Statistical Bug Isolation
  • Ben Liblit, Mayur Naik,
  • Alice X. Zheng, Alex Aiken,
  • Michael I. Jordan

2
Statistical Debugging
  • Dynamic analysis for large programs with multiple
    bugs
  • Program instrumentation
      • Tests predicates as the program runs
      • Random sampling (incomplete, lightweight)
  • Feedback reports
      • Predicate P observed to be true during run R
      • Run R succeeded/failed
  • Bug cause isolation algorithm
      • Predicate P is a bug predictor for bug B
  • Complementary to static analysis
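
As a rough illustration of the sampling and feedback reports listed above, the following Python sketch models one run's report. The names `FeedbackReport` and `observe`, and the `rate` parameter, are hypothetical and not taken from the slides; the real instrumentation is compiled into the program under test.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FeedbackReport:
    """One run's report (hypothetical layout, not from the slides)."""
    observed: set = field(default_factory=set)       # predicates whose check was sampled
    observed_true: set = field(default_factory=set)  # predicates seen true at least once
    failed: bool = False                             # did the run fail?


def observe(report, name, value, rate, rng):
    """Record predicate `name` only if this check is sampled (probability `rate`)."""
    if rng.random() < rate:
        report.observed.add(name)
        if value:
            report.observed_true.add(name)


rng = random.Random(0)
report = FeedbackReport()
for f in [None, "ptr", None]:
    # rate=1.0 samples every check; a deployed program would use a much lower rate
    observe(report, "f == NULL", f is None, rate=1.0, rng=rng)
report.failed = True
```

Because sampling is random, a predicate can be true during a run without ever being recorded, which is why the reports are described as incomplete.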

3
Non-Scalable Statistical Debugging
  • Regularized logistic regression
  • Outputs too many redundant predicates (100,000s
    for large apps)
  • Many predicates predict multiple bugs (e.g. long
    list of command line flags)
  • Different bugs occur at rates that differ by
    orders of magnitude

4
Cause Isolation Algorithm
[Diagram: instrumented program → feedback reports → cause isolation algorithm → bug predictors]
What predicates to test? The instrumented program tests predicates at various program points.

5
What instrumentation to add?
  • At branches: are the true and false branches ever
    taken?
  • At returns: is the return value ever < 0, <= 0, > 0,
    >= 0, == 0, or != 0?
  • At scalar assignments (x = ...): for each in-scope
    variable and each constant expression, is the
    value ever < x, <= x, > x, >= x, == x, or != x?
  • For heap structures: future work
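
To make the six-way comparison concrete, here is a small Python sketch that evaluates the predicates for one return value and one scalar assignment. The function names and the predicate label format are illustrative assumptions; the slides do not show how the instrumentation is implemented.

```python
import operator

# The six relations tested at each return and scalar-assignment site.
RELATIONS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def return_predicates(value):
    """Compare a function's return value against 0 in all six ways."""
    return {f"ret {sym} 0": rel(value, 0) for sym, rel in RELATIONS.items()}


def assignment_predicates(x, in_scope):
    """Compare each in-scope variable or constant against the newly assigned x."""
    return {
        f"{name} {sym} x": rel(other, x)
        for name, other in in_scope.items()
        for sym, rel in RELATIONS.items()
    }


ret = return_predicates(-1)
assign = assignment_predicates(5, {"y": 5, "10": 10})
```

Each site therefore contributes six predicates per comparison partner, which is one reason the total predicate count grows so large.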

6
Cause Isolation Algorithm
Mimic human debugging
[Diagram: instrumented program → feedback reports → cause isolation algorithm → bug predictors]
Feedback reports: P was observed to be true during run R; run R succeeded/failed
Bug predictors: P probably corresponds to a bug

7
Cause Isolation Algorithm
  • Identify the most important bug B.
  • Eliminate predicates that have no predictive
    power
  • Rank the surviving predicates by importance
  • Fix B, and repeat.

8
A Simple Bug
  • f = ...;
  • if (f == NULL) {
  •   x = 0;
  •   *f;
  • }

Bug is deterministic w.r.t. f == NULL (if f ==
NULL is true, then the program crashes)

9
A Simple Bug
Bug is non-deterministic w.r.t. f == NULL (it is
possible that f == NULL is true and the program
terminates normally)
  • f = ...;
  • if (f == NULL) {
  •   x = 0;
  •   if (...)
  •     f = ... some valid pointer ...;
  •   *f;
  • }

10
Failure(P)
  • Even if P is the cause of a bug,
      • when P is true, the program may succeed
      • when P is never observed to be true, the program
        may fail
  • Probability that P being true implies failure
  • Failure(P) = Pr(Crash | P observed to be true)
  • Failure(P) = (number of failing runs where P is true)
    / (number of runs where P is true)
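
The ratio above can be sketched in a few lines of Python. The run records here are made-up data and the dictionary layout is an assumption for illustration only.

```python
def failure(runs, pred):
    """Fraction of the runs where `pred` was observed true that failed.

    Returns None when `pred` was never observed true, since the ratio is undefined.
    """
    true_runs = [r for r in runs if pred in r["true"]]
    if not true_runs:
        return None
    return sum(r["failed"] for r in true_runs) / len(true_runs)


# Made-up reports: P is true in three runs, one of which still succeeds,
# and one run fails without P ever being observed true.
runs = [
    {"true": {"P"}, "failed": True},
    {"true": {"P"}, "failed": True},
    {"true": {"P"}, "failed": False},
    {"true": set(), "failed": True},
]
failure_p = failure(runs, "P")  # 2 failing runs out of 3 where P is true
```

The fourth run shows why Failure(P) only conditions on runs where P is true: failures with other causes do not enter the ratio.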

11
Simple Bug Revisited
  • f = ...;
  • if (f == NULL) {
  •   x = 0;
  •   *f;
  • }

Failure(f == NULL) = 1
Failure(x == 0) = 1
The predicate x == 0 is checked on a path where
the program is already doomed.

12
Context(P)
  • Probability that P being observed implies
    failure
  • Context(P) = Pr(Crash | P observed)
  • Context(P) = (number of failing runs where P is observed)
    / (number of runs where P is observed)
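
A minimal Python sketch of Context(P), applied to made-up reports for the simple bug: f == NULL is checked on every run, while x == 0 is only reached inside the failing branch. The record layout is hypothetical.

```python
def context(runs, pred):
    """Fraction of the runs where `pred` was observed (true or false) that failed."""
    seen = [r for r in runs if pred in r["observed"]]
    if not seen:
        return None
    return sum(r["failed"] for r in seen) / len(seen)


# Two failing runs reach the doomed branch; three successful runs never do.
runs = (
    [{"observed": {"f == NULL", "x == 0"}, "failed": True}] * 2
    + [{"observed": {"f == NULL"}, "failed": False}] * 3
)
ctx_f = context(runs, "f == NULL")  # observed in all 5 runs, 2 of them fail
ctx_x = context(runs, "x == 0")     # observed only in the 2 failing runs
```

Merely reaching the check for x == 0 already implies failure in this data, regardless of whether the predicate turns out true.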

13
Increase(P)
  • Increase(P) = Failure(P) - Context(P)
  • How much does P being true increase the
    probability of failure over P being observed?
  • Discard any predicate with Increase(P) <= 0
      • P has no predictive power
  • Tends to localize bugs at their source, not at
    the crash site
      • Unlike stack traces
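
Putting the two ratios together on the simple bug, the sketch below uses made-up reports and a hypothetical record layout to show the filter discarding x == 0 while keeping f == NULL.

```python
def ratio(runs, pred, key):
    """Fraction of failing runs among those with `pred` in the set r[key]."""
    hits = [r for r in runs if pred in r[key]]
    return sum(r["failed"] for r in hits) / len(hits)


def increase(runs, pred):
    """Increase(P) = Failure(P) - Context(P)."""
    return ratio(runs, pred, "true") - ratio(runs, pred, "observed")


both = {"f == NULL", "x == 0"}
runs = (
    # Failing runs: f is NULL, so both predicates are observed and true.
    [{"observed": both, "true": both, "failed": True}] * 2
    # Successful runs: only f == NULL is checked, and it is false.
    + [{"observed": {"f == NULL"}, "true": set(), "failed": False}] * 3
)
scores = {p: increase(runs, p) for p in sorted(both)}
survivors = [p for p, inc in scores.items() if inc > 0]
```

Here f == NULL scores 1 - 0.4 = 0.6, while x == 0 scores 1 - 1 = 0 and is dropped, even though both have Failure(P) = 1.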

14
Cause Isolation Algorithm
  • Idea: mimic human debugging
  • Identify the most important bug B.
  • Eliminate predicates that have no predictive
    power
  • Rank the surviving predicates by importance
  • Fix B, and repeat.

15
Ranking Predicates
  • Importance(P) = harmonic mean of Increase(P) and
    log(F(P)) / log(NumF)
      • NumF: total number of failing runs
  • Chosen to balance
      • high sensitivity: P accounts for many failed
        runs
      • high specificity: P does not mis-predict failure
        in many successful runs
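
The harmonic mean can be sketched as follows. This assumes F(P) is the number of failing runs in which P was observed true, which the slides do not spell out; returning 0 for the degenerate cases is also a choice made for this sketch.

```python
import math


def importance(increase, f_p, num_f):
    """Harmonic mean of Increase(P) and log(F(P)) / log(NumF).

    Returns 0.0 when either term is zero or undefined (assumption of this sketch).
    """
    if increase <= 0 or f_p <= 1 or num_f <= 1:
        return 0.0
    sensitivity = math.log(f_p) / math.log(num_f)
    return 2.0 / (1.0 / increase + 1.0 / sensitivity)


# A perfectly specific predicate that covers only 2 of 1000 failures...
narrow = importance(1.0, 2, 1000)
# ...ranks below a less specific one that covers 500 of them.
broad = importance(0.5, 500, 1000)
```

Because a harmonic mean is dominated by its smaller term, a predicate has to do reasonably well on both criteria to rank highly.
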
16
Cause Isolation Algorithm
  • Idea: mimic human debugging
  • Identify the most important bug B.
  • Eliminate predicates that have no predictive
    power
  • Rank the surviving predicates by importance
  • Fix B, and repeat.
  • Fix B by removing the top-ranked predicate P
    and discarding all runs R where R(P) = 1, and
    apply the algorithm recursively
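
The whole loop can be sketched in Python as below. This is a simplified reading of the steps above on made-up data, not the authors' implementation: `score` and `isolate` are hypothetical names, F(P) is assumed to be the number of failing runs where P is true, and ties are broken alphabetically.

```python
import math


def score(runs, pred):
    """Importance of `pred` over `runs`; 0.0 if it has no predictive power."""
    true_runs = [r for r in runs if pred in r["true"]]
    seen_runs = [r for r in runs if pred in r["observed"]]
    num_f = sum(r["failed"] for r in runs)
    f_p = sum(r["failed"] for r in true_runs)
    if not true_runs or not seen_runs or f_p <= 1 or num_f <= 1:
        return 0.0
    failure = f_p / len(true_runs)
    context = sum(r["failed"] for r in seen_runs) / len(seen_runs)
    increase = failure - context
    if increase <= 0:
        return 0.0
    sensitivity = math.log(f_p) / math.log(num_f)
    return 2.0 / (1.0 / increase + 1.0 / sensitivity)


def isolate(runs, preds):
    """Repeatedly pick the top predicate, then discard the runs it explains."""
    runs, preds, ranked = list(runs), set(preds), []
    while preds and any(r["failed"] for r in runs):
        best = max(sorted(preds), key=lambda p: score(runs, p))
        if score(runs, best) <= 0:
            break  # no surviving predicate has predictive power
        ranked.append(best)
        preds.remove(best)
        # Simulate fixing the bug: drop every run where the predictor was true.
        runs = [r for r in runs if best not in r["true"]]
    return ranked


def run(true, failed):
    return {"observed": {"A", "A2", "B"}, "true": set(true), "failed": failed}


# Two bugs: A (4 failures, with redundant predictor A2 on 3 of them) and B (2 failures).
runs = (
    [run({"A", "A2"}, True)] * 3
    + [run({"A"}, True)]
    + [run({"B"}, True)] * 2
    + [run(set(), False)] * 6
)
predictors = isolate(runs, {"A", "A2", "B"})
```

On this data the redundant predicate A2 is never reported: once A is chosen and its runs are discarded, A2 is true in no remaining run, so only B is left with predictive power.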

17
Experiment Summary
Dramatic reduction in number of predicates
Apps contained both known and previously unknown
bugs
18
Controlled Experiment with Moss
9 seeded errors
  • Buffer overruns
  • Null file pointer dereference
  • Missing end-of-list check
  • Missing out-of-memory check
  • Violation of a subtle invariant in a data
    structure
  • Incorrect output

19
Questions?