Title: Analysis of Alternative Caching Methods for jFuzz
1 Analysis of Alternative Caching Methods for jFuzz
- David Harvison
- Adam Kiezun
2 Summary
- Problem
- jFuzz makes many redundant calls to the constraint solver.
- Approach
- Measure the average hit rates of different caching strategies.
- Results
- Global caching should reduce the number of calls to the constraint solver.
3 NASA Java PathFinder
- Dynamic analysis framework for Java implemented as a JVM
- Features
- Backtracking
- Execute all thread interleavings
- Execute a program on all possible inputs
- Assign attributes to variables
4 jFuzz Architecture
Subject and Input
- Runs JPF many times on the subject program and input files
- Each run
- Collects the Path Condition (PC)
- Negates each constraint, reduces, and solves
- Uses new PCs to generate new input files
- Keeps track of inputs which caused exceptions to be thrown
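The loop described above can be sketched roughly as follows. This is a minimal Python illustration, not jFuzz's code: `run_subject` and `solve` are hypothetical stand-ins for JPF and the constraint solver, constraints are modeled as `(text, outcome)` pairs, and keeping the prefix before each negated constraint is an assumption borrowed from common concolic-testing practice. The reduction and caching steps are left out.

```python
from collections import deque

def negate(constraint):
    """Flip a constraint modeled as a (text, outcome) pair."""
    text, outcome = constraint
    return (text, not outcome)

def candidate_pcs(path_condition):
    """One new PC per constraint: the prefix before it plus its negation.
    Keeping the prefix is an assumption, not something the slides state."""
    return [path_condition[:i] + [negate(path_condition[i])]
            for i in range(len(path_condition))]

def fuzz(run_subject, solve, first_input, max_runs=10):
    """Hypothetical driver: run_subject stands in for JPF, solve for the solver."""
    pending = deque([first_input])
    crashing_inputs = []
    runs = 0
    while pending and runs < max_runs:
        current = pending.popleft()          # oldest input first
        runs += 1
        path_condition, crashed = run_subject(current)
        if crashed:
            crashing_inputs.append(current)
        for candidate in candidate_pcs(path_condition):
            new_input = solve(candidate)     # None means unsatisfiable
            if new_input is not None:
                pending.append(new_input)
    return crashing_inputs
```

Every call to `solve` in the inner loop is a place where a cache lookup could avoid the solver.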
5 jFuzz Architecture
[Diagram. Labels: Subject and Input, jFuzz, JPF, PC, Subject and Original Input, Negated PC, Cache, Solver, New Input, Inputs which cause crashes]
6 Levels of Caching
- Local Caching
- Each run of JPF has a cache
- Global Caching
- Persistent cache throughout all runs of JPF
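The difference between the two levels can be illustrated by replaying the same sequence of solver queries against a cache that is either recreated for each run or shared across all runs. The sketch below is in Python and is only an illustration: `SolverCache`, `replay`, and the string keys are hypothetical names, and the "solver" is a placeholder lambda.

```python
class SolverCache:
    """Maps a PC key to a solver result and counts hits and misses."""

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, key, solve):
        if key in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries[key] = solve(key)   # only misses reach the solver
        return self.entries[key]

def replay(runs, global_cache):
    """Replay runs (each a list of PC keys) and return the overall hit rate."""
    shared = SolverCache()
    hits = misses = 0
    for queries in runs:
        # Global: one cache for all runs. Local: a fresh cache per run.
        cache = shared if global_cache else SolverCache()
        hits_before, misses_before = cache.hits, cache.misses
        for key in queries:
            cache.lookup(key, solve=lambda k: "solution for " + k)
        hits += cache.hits - hits_before
        misses += cache.misses - misses_before
    total = hits + misses
    return hits / total if total else 0.0
```

With `runs = [["p", "q"], ["p", "q"], ["p", "r"]]`, the local variant never hits because no key repeats within a run, while the global variant answers three of the six queries from the cache.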
7 Hash Properties
- Sound
- Two objects have the same hash if they are interchangeable.
- Sound and Complete
- Two objects have the same hash if and only if they are interchangeable.
8 Ideal Cache
Path Condition 1
- 1. x = 3
Path Condition 2
- 1. y < 4
- 2. 2 * y >= 6
- These two PCs are equivalent
- Calculating this would be too much work for large PCs.
- Hash functions need to be fast.
9 Caching Trade-offs
- Hit rate
- The percentage of the time that the data being asked for is in the cache.
- Speed of hashing
- Inversely related to the hit rate.
10 Types of Caching
- Identity Hash
- Every PC has a unique value.
- Hit only if the exact PC is seen again.
- Name Dependent Hash
- Unique value for structurally different PCs.
- This includes variable names.
- Name Independent Hash
- Same as name dependent except variable names are factored out.
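As a rough illustration of the first two variants, the Python sketch below models a PC as a list of constraint strings. It assumes that "identity" means object identity, in the spirit of Java's `System.identityHashCode`; the function names and example constraints are hypothetical. The name independent variant would build the same structural key after first renaming the variables.

```python
def identity_key(pc):
    """Identity hash: only the very same PC object produces this key."""
    return id(pc)

def name_dependent_key(pc):
    """Structural key: equal for PCs with the same constraints and names."""
    return tuple(pc)

pc_first = ["a + b < 10", "b > 6"]
pc_again = ["a + b < 10", "b > 6"]   # same structure, separate object
pc_other = ["x + y < 10", "y > 6"]   # same shape, different variable names
```

Here `pc_first` and `pc_again` get different identity keys but the same name dependent key, while `pc_other` misses under both because its variable names differ.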
12 Motivation
Path Condition 1
- 1. a + b < 10
- 2. b > 6
- 3. c < 15
- 4. a < 3
- 5. c + d > 7
- 6. e != 1
- 7. c + e = 5
- 8. a = 2
Path Condition 2
- 1. x + y < 10
- 2. z < 15
- 3. y > 6
- 4. x < 3
- 5. z + w > 7
- 6. w != 1
- 7. x = 2
13 Motivation
Path Condition 1
- 1. a + b < 10
- 2. b > 6
- 3. c < 15
- 4. a < 3
- 5. c + d > 7
- 6. e != 1
- 7. c + e = 5
- 8. a != 2
Path Condition 2
- 1. x + y < 10
- 2. z < 15
- 3. y > 6
- 4. x < 3
- 5. z + w > 7
- 6. w != 1
- 7. x != 2
14 Motivation
Path Condition 1
- 1. a + b < 10
- 2. b > 6
- 4. a < 3
- 8. a != 2
Path Condition 2
- 1. x + y < 10
- 3. y > 6
- 4. x < 3
- 7. x != 2
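Going from the full PCs to the shorter ones above looks like the "reduces" step: only constraints connected to the negated constraint through shared variables are kept. The Python sketch below shows one way this could work. It is an interpretation of the slides rather than jFuzz's actual code, `reduce_pc` is a hypothetical name, and the constraints are assumed to read `a + b < 10`, `a != 2`, and so on.

```python
import re

VARIABLE = re.compile(r"[A-Za-z_]\w*")

def variables(constraint):
    """Return the set of variable names appearing in a constraint string."""
    return set(VARIABLE.findall(constraint))

def reduce_pc(pc, target):
    """Keep constraints that share variables, directly or transitively,
    with the target (negated) constraint."""
    related = variables(target)
    changed = True
    while changed:                       # grow the variable set to a fixpoint
        changed = False
        for constraint in pc:
            found = variables(constraint)
            if found & related and not found <= related:
                related |= found
                changed = True
    return [c for c in pc if variables(c) & related]
```

For Path Condition 1 with target `a != 2`, this keeps `a + b < 10`, `b > 6`, `a < 3`, and `a != 2`: the constraint on `b` survives because `b` is tied to `a` through the first constraint, while everything involving only `c`, `d`, and `e` is dropped.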
15 Motivation
Path Condition 1
- 1. a + b < 10
- 2. b > 6
- 3. a < 3
- 4. a != 2
Path Condition 2
- 1. x + y < 10
- 2. y > 6
- 3. x < 3
- 4. x != 2
- Name dependent caching will pass both of these to the solver.
- Name independent caching will recognize that these are the same.
16 Removing Name Dependence
For each conjunct in the PC:
- Locate the variables.
- If a variable has been seen before, use the previously used name.
- Otherwise, replace the variable name with a name that will be consistent between runs.
17 Removing Name Dependence
Path Condition 1
- 1. a + b < 10
- 2. b > 6
- 3. a < 3
- 4. a != 2
Path Condition 2
- 1. x + y < 10
- 2. y > 6
- 3. x < 3
- 4. x != 2
18 Removing Name Dependence
Path Condition 1
- 1. var1 + var2 < 10
- 2. var2 > 6
- 3. var1 < 3
- 4. var1 != 2
Path Condition 2
- 1. var1 + var2 < 10
- 2. var2 > 6
- 3. var1 < 3
- 4. var1 != 2
- The PCs are now name independent.
- This can reduce the number of times the solver is called.
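The renaming procedure can be sketched in a few lines of Python. This is an illustration under assumptions, not the jFuzz implementation: conjuncts are modeled as strings, variables are found with a simple identifier regex, and `normalize` is a hypothetical name. The `var1`, `var2` naming follows the example above.

```python
import re

VARIABLE = re.compile(r"[A-Za-z_]\w*")

def normalize(pc):
    """Rename variables to var1, var2, ... in order of first appearance."""
    seen = {}

    def canonical(match):
        name = match.group(0)
        if name not in seen:                 # first sighting: assign next name
            seen[name] = "var%d" % (len(seen) + 1)
        return seen[name]                    # otherwise reuse the earlier name

    return [VARIABLE.sub(canonical, conjunct) for conjunct in pc]
```

Applied to the two example PCs, `normalize` returns identical lists, so a structural key built from the result would hit in the cache. Because names are assigned in order of first appearance, two PCs whose conjuncts appear in a different order may still normalize differently.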
19 Case Study
- Subject: Sat4J
- SAT solver written in Java.
- Takes input as DIMACS files.
- 10 KLOC.
- Goals
- Compare global vs. local caching
- Compare name dependent and name independent caching
test1.dimacs
    c test 3 single clauses
    c and 2 binary clauses
    p cnf 4 5
    1 0
    2 0
    3 0
    -2 4 0
    -3 4 0
PC size: 250 constraints
20 Local Caching
- Name dependent caching does nothing.
- This is by design.
- Name independent caching is sporadic.
- High hit rates on runs with more input creation.
21 Global Caching
- Name dependent caching plateaus between 70% and 80%.
- Name independent caching quickly approaches a 99% average hit rate.
22 Results
- Global caching quickly achieves a higher hit rate than local caching.
- Name independent caching is better in both cases.
23 Conclusions
- Name independent caching is better than name dependent caching.
- Global caching has a much higher hit rate than local caching.
- The gains from implementing global caching instead of local caching should be higher than the gains from providing name independence.
24 Parallelization
- Current bottleneck is waiting for JPF to finish.
- Should execute on multiple input files simultaneously.
- Distribute work over multiple computers.
25 Selecting the Next Input
- One input produces many more inputs.
- Currently the oldest is selected.
- Oldest input is closest to the current input.
- Test different heuristics for picking the next input.