Title: Random Interpretation
1Random Interpretation
UC-Berkeley
2Program Analysis
- Applications in all aspects of software development, e.g.
- Program correctness
- Compiler optimizations
- Translation validation
- Parameters
- Completeness (precision, no false positives)
- Computational complexity
- Ease of implementation
- What if we allow probabilistic soundness?
- We obtain a new class of analyses: random interpretation
3Random Interpretation
- Random interpretation = random testing + abstract interpretation
- Almost as simple as random testing, but with better soundness guarantees.
- Almost as sound as abstract interpretation, but more precise, efficient, and simple.
4Example 1
Program (figure; both conditionals are non-deterministic):
  if (*) then a := 0; b := i  else a := i - 2; b := 2
  if (*) then c := b - a; d := i - 2b  else c := 2a + b; d := b - 2i
  assert(c + d = 0); assert(c = a + i)
- Random testing needs to execute all 4 paths to verify the assertions.
- Abstract interpretation analyzes each statement once but uses complicated operations.
- Random interpretation executes the program once, but in a way that captures the effect of all paths.
5Outline
- Random Interpretation
- Linear arithmetic (POPL 2003)
- Uninterpreted functions (POPL 2004)
- Inter-procedural analysis (POPL 2005)
- Other applications
6Problem: Linear relationships in linear programs
- The restriction to linear programs does not mean inapplicability to real programs
- Abstract other program statements as non-deterministic assignments (standard practice in program analysis)
- Linear relationships are useful for
- Program correctness, e.g. buffer overflows
- Compiler optimizations, e.g. constant propagation, copy propagation, common subexpression elimination, induction variable elimination
7Basic idea in random interpretation
- Generic algorithm
- Choose random values for input variables.
- Execute both branches of a conditional.
- Combine the values of variables at join points.
- Test the assertion.
8Idea 1: The Affine Join operation
- Affine join of v1 and v2 w.r.t. weight w:
- $\phi_w(v_1, v_2) = w \cdot v_1 + (1 - w) \cdot v_2$
- Affine join preserves common linear relationships (e.g. a + b = 5).
- It does not introduce false relationships w.h.p.
- Unfortunately, non-linear relationships are not preserved (e.g. a × (1 + b) = 8).
Figure: example join with weight w = 7.
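The affine join can be tried out on concrete numbers. The following is a minimal sketch, not the authors' implementation: the two input states are an assumption (they are the states shown on the geometric-interpretation slide, which happen to satisfy both a + b = 5 and a × (1 + b) = 8), and `affine_join` is a hypothetical helper name.

```python
def affine_join(w, state1, state2):
    """Return w*state1 + (1 - w)*state2, computed variable by variable."""
    return {v: w * state1[v] + (1 - w) * state2[v] for v in state1}

# Two states that both satisfy a + b = 5 and a * (1 + b) = 8.
s1 = {"a": 2, "b": 3}
s2 = {"a": 4, "b": 1}

joined = affine_join(7, s1, s2)  # weight w = 7
linear_holds = joined["a"] + joined["b"] == 5            # linear relationship
nonlinear_holds = joined["a"] * (1 + joined["b"]) == 8   # non-linear relationship
print(joined, linear_holds, nonlinear_holds)
```

With w = 7 the joined state is a = -10, b = 15: the linear relationship survives the join, while the non-linear one is lost.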
9Geometric Interpretation of Affine Join
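- The joined state satisfies all the affine relationships that are satisfied by both input states (e.g. a + b = 5).
- Given any relationship that is not satisfied by at least one of the input states (e.g. b = 2), the joined state also does not satisfy it with high probability.
Figure: state before the join and state after the join, drawn in the (a, b) plane. The input states (a = 2, b = 3) and (a = 4, b = 1) lie on the line a + b = 5; the line b = 2 is shown for comparison.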
10Example 1
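- Choose a random weight for each join independently.
- All choices of random weights verify the first assertion.
- Almost all choices contradict the second assertion.
Run with i = 3, w1 = 5, w2 = 2 (figure):
  Branch a := 0; b := i gives i = 3, a = 0, b = 3
  Branch a := i - 2; b := 2 gives i = 3, a = 1, b = 2
  Join with w1 = 5: i = 3, a = -4, b = 7
  Branch c := b - a; d := i - 2b gives c = 11, d = -11
  Branch c := 2a + b; d := b - 2i gives c = -1, d = 1
  Join with w2 = 2: i = 3, a = -4, b = 7, c = 23, d = -23
  assert(c + d = 0) holds; assert(c = a + i) fails, since 23 is not equal to -1.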
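The run above can be reproduced in a few lines. This is a minimal sketch under simplifying assumptions: it works over plain integers rather than a finite field, and the names `join` and `run_example1` are hypothetical.

```python
import random

def join(w, s1, s2):
    """Affine join of two states (dicts over the same variables) with weight w."""
    return {v: w * s1[v] + (1 - w) * s2[v] for v in s1}

def run_example1(i, w1, w2):
    """Execute both branches of each conditional once and join the results."""
    # First conditional: both branches start from the same input i.
    t = {"i": i, "a": 0, "b": i}
    f = {"i": i, "a": i - 2, "b": 2}
    s = join(w1, t, f)
    # Second conditional: both branches run on the joined state s.
    t = dict(s, c=s["b"] - s["a"], d=s["i"] - 2 * s["b"])
    f = dict(s, c=2 * s["a"] + s["b"], d=s["b"] - 2 * s["i"])
    return join(w2, t, f)

final = run_example1(i=3, w1=5, w2=2)
print(final)

# The first assertion holds for every choice of input and weights.
rng = random.Random(0)
trials = [run_example1(rng.randrange(1000), rng.randrange(1000), rng.randrange(1000))
          for _ in range(100)]
first_ok = all(s["c"] + s["d"] == 0 for s in trials)
```

With i = 3, w1 = 5, w2 = 2 the final state matches the slide: c + d = 0 holds, while c = 23 differs from a + i = -1.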
11Example 2
- We need to make use of the conditional x = y on the true branch to prove the assertion.
Program (figure):
  a := x + y
  if (x = y) then b := a else b := 2x
  assert(b = 2x)
12Idea 2: The Adjust Operation
- Execute multiple runs of the program in parallel.
- Sample = collection of states at a program point.
- Combine states in the sample before a conditional s.t.
- The equality conditional is satisfied.
- Original relationships are preserved.
- Use the adjusted sample on the true branch.
13Geometric Interpretation of Adjust
Algorithm to obtain S' = Adjust(S, e = 0)
Figure: sample states S1, S2, S3, S4 and the adjusted states S'1, S'2, S'3 on the hyperplane e = 0.
- Program states = points
- Adjust = projection onto the hyperplane
- S' satisfies e = 0 and all relationships satisfied by S
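The slide does not spell out the Adjust algorithm, so the following is only a sketch of one construction that matches the figure: pick a pivot state and replace every other state by the affine combination with the pivot that lands on the hyperplane e = 0. The pivot rule, the use of exact fractions instead of a finite field, and the sample states (taken to satisfy a = x + y from Example 2) are all assumptions.

```python
from fractions import Fraction

def adjust(sample, e):
    """Return states on the hyperplane e = 0 built from affine combinations of `sample`.

    `sample` is a list of dicts; `e` maps a state to the value of an affine expression.
    """
    values = [Fraction(e(s)) for s in sample]
    # Pivot: a state whose e-value differs from all others (assumed to exist).
    pivot = next(j for j, vj in enumerate(values)
                 if all(vj != vi for i, vi in enumerate(values) if i != j))
    sp, vp = sample[pivot], values[pivot]
    adjusted = []
    for i, (s, vi) in enumerate(zip(sample, values)):
        if i == pivot:
            continue  # the pivot itself is dropped, so one state is lost
        w = vp / (vp - vi)  # chosen so that w*vi + (1 - w)*vp = 0
        adjusted.append({x: w * s[x] + (1 - w) * sp[x] for x in s})
    return adjusted

# Three runs of a := x + y, adjusted for the conditional x = y (e is x - y).
sample = [{"x": 1, "y": 4, "a": 5}, {"x": 3, "y": 0, "a": 3}, {"x": 2, "y": 7, "a": 9}]
adjusted = adjust(sample, lambda s: s["x"] - s["y"])
```

Because every adjusted state is an affine combination of states in S, it keeps the relationship a = x + y while also satisfying x = y, which is what the true branch of Example 2 needs.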
14Correctness of Random Interpreter R
- Completeness: If $e_1 = e_2$, then $R \Rightarrow e_1 = e_2$
- assuming non-det conditionals
- Soundness: If $e_1 \neq e_2$, then $R \not\Rightarrow e_1 = e_2$
- error prob.
- b = number of branches
- j = number of joins
- d = size of the field
- k = number of points in the sample
- If j = b = 10, k = 15, $d \approx 2^{32}$, then error
15Outline
- Random Interpretation
- Linear arithmetic (POPL 2003)
- Uninterpreted functions (POPL 2004)
- Inter-procedural analysis (POPL 2005)
- Other applications
16Problem: Global value numbering
- Goal: Detect expression equivalence in programs that have been abstracted using uninterpreted functions.
- Axiom of the theory of uninterpreted functions:
- If x = y, then F(x) = F(y)
- Applications
- Compiler optimizations
- Translation validation
17Example
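Program (figure; the conditional is non-deterministic):
  if (*) then x := a; y := a; z := F(a) else x := b; y := b; z := F(b)
  assert(x = y); assert(z = F(y))
After the join: $x = \phi(a, b)$, $y = \phi(a, b)$, $z = \phi(F(a), F(b))$, $F(y) = F(\phi(a, b))$
- Typical algorithms treat $\phi$ as uninterpreted.
- Hence they cannot verify the second assertion.
- The randomized algorithm interprets $\phi$ as the affine join operation $\phi_w$.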
18How to execute uninterpreted functions
- e ::= y | F(e1, e2)
- Choose a random interpretation for F
- Non-linear interpretation
- E.g. $F(e_1, e_2) = r_1 e_1^2 + r_2 e_2^2$
- Preserves all equivalences in straight-line code
- But not across join points
- Let's try linear interpretation
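The contrast between a non-linear and a linear interpretation can be checked on the preceding example. This is a minimal sketch under stated assumptions: F is taken to be unary (as in that example), the coefficient r = 7 and the values a = 2, b = 3, w = 5 are arbitrary stand-ins for random choices, and `phi` and `check` are hypothetical names.

```python
def phi(w, v1, v2):
    """Affine join of two values with weight w."""
    return w * v1 + (1 - w) * v2

def check(F, a, b, w):
    """Join both branches of the example and test assert(z = F(y))."""
    y = phi(w, a, b)          # y := a on one branch, y := b on the other
    z = phi(w, F(a), F(b))    # z := F(a) on one branch, z := F(b) on the other
    return z == F(y)

r = 7  # stands in for a randomly chosen coefficient
linear_ok = check(lambda e: r * e, a=2, b=3, w=5)
quadratic_ok = check(lambda e: r * e * e, a=2, b=3, w=5)
print(linear_ok, quadratic_ok)
```

The linear interpretation commutes with the affine join, so the assertion is verified; the quadratic one does not, so the equivalence is lost at the join point.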
19Random Linear Interpretation
- Encode $F(e_1, e_2) = r_1 e_1 + r_2 e_2$
- Preserves all equivalences across a join point
- Introduces false equivalences in straight-line code.
- E.g. e and e' have the same encodings even though e ≠ e'
Figure: expression trees e = F(F(a, b), F(c, d)) and e' = F(F(a, c), F(b, d)).
Encodings:
  $e = r_1(r_1 a + r_2 b) + r_2(r_1 c + r_2 d) = r_1^2 a + r_1 r_2 b + r_2 r_1 c + r_2^2 d$
  $e' = r_1^2 a + r_1 r_2 c + r_2 r_1 b + r_2^2 d$
- Problem: Scalar multiplication is commutative.
- Solution: Evaluate expressions to vectors and choose r1 and r2 to be random matrices.
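The effect of commutativity can be seen by encoding the two expressions e = F(F(a, b), F(c, d)) and e' = F(F(a, c), F(b, d)) both ways. This is a sketch under stated assumptions, not the published algorithm: vectors have length 2, the modulus `P` is an illustrative prime, and small fixed coefficients stand in for random ones so the result is reproducible. `mat_vec` and `encode` are hypothetical helper names.

```python
P = 2_147_483_647  # a prime modulus (2**31 - 1), chosen for illustration

def mat_vec(m, v):
    """Multiply matrix m by vector v modulo P."""
    return tuple(sum(row[j] * v[j] for j in range(len(v))) % P for row in m)

def encode(expr, leaves, r1, r2):
    """Encode a leaf name or a tuple ("F", left, right) as a vector."""
    if isinstance(expr, str):
        return leaves[expr]
    _, left, right = expr
    u = mat_vec(r1, encode(left, leaves, r1, r2))
    v = mat_vec(r2, encode(right, leaves, r1, r2))
    return tuple((x + y) % P for x, y in zip(u, v))

e1 = ("F", ("F", "a", "b"), ("F", "c", "d"))
e2 = ("F", ("F", "a", "c"), ("F", "b", "d"))

# Scalar coefficients (1x1 matrices) commute, so e1 and e2 collide.
scalars = {"a": (2,), "b": (7,), "c": (11,), "d": (13,)}
scalar_collision = encode(e1, scalars, [[3]], [[5]]) == encode(e2, scalars, [[3]], [[5]])

# 2x2 matrices that do not commute keep the two encodings apart.
vectors = {"a": (2, 1), "b": (7, 1), "c": (11, 2), "d": (13, 5)}
r1, r2 = [[1, 1], [0, 1]], [[1, 0], [1, 1]]
matrix_collision = encode(e1, vectors, r1, r2) == encode(e2, vectors, r1, r2)
print(scalar_collision, matrix_collision)
```

With scalar coefficients both expressions encode to the same value; with the non-commuting matrices the encodings are (45, 55) and (49, 54), so the false equivalence disappears.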
20Outline
- Random Interpretation
- Linear arithmetic (POPL 2003)
- Uninterpreted functions (POPL 2004)
- Inter-procedural analysis (POPL 2005)
- Other applications
21Example
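Program (figure; both conditionals are non-deterministic):
  if (*) then a := 0; b := i  else a := i - 2; b := 2
  if (*) then c := b - a; d := i - 2b  else c := 2a + b; d := b - 2i
  assert(c + d = 0); assert(c = a + i)
- The second assertion is true in the context i = 2.
- Interprocedural analysis requires computing procedure summaries.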
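22Idea 1: Keep input variables symbolic
- Do not choose random values for input variables (to later instantiate by any context).
- Resulting program state at the end is a random procedure summary.
Run with symbolic i, w1 = 5, w2 = 2 (figure):
  Branch a := 0; b := i gives a = 0, b = i
  Branch a := i - 2; b := 2 gives a = i - 2, b = 2
  Join with w1 = 5: a = 8 - 4i, b = 5i - 8
  Branch c := b - a; d := i - 2b gives c = 9i - 16, d = 16 - 9i
  Branch c := 2a + b; d := b - 2i gives c = 8 - 3i, d = 3i - 8
  Join with w2 = 2: a = 8 - 4i, b = 5i - 8, c = 21i - 40, d = 40 - 21i
  Instantiating i = 2: a = 0, b = 2, c = 2, d = -2
  assert(c + d = 0); assert(c = a + i)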
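The symbolic run can be reproduced by computing with linear polynomials in i instead of numbers. This is a minimal sketch under the assumption that each value is stored as a pair (const, coeff) meaning const + coeff·i; all helper names are hypothetical.

```python
def lin(const, coeff=0):
    """Linear polynomial const + coeff*i, stored as a pair."""
    return (const, coeff)

def add(p, q):
    return (p[0] + q[0], p[1] + q[1])

def scale(k, p):
    return (k * p[0], k * p[1])

def join(w, p, q):
    """Affine join w*p + (1 - w)*q of two polynomials."""
    return add(scale(w, p), scale(1 - w, q))

def summary(w1, w2):
    """Random summary of the example procedure, symbolic in the input i."""
    i = lin(0, 1)
    a = join(w1, lin(0), add(i, lin(-2)))
    b = join(w1, i, lin(2))
    c = join(w2, add(b, scale(-1, a)), add(scale(2, a), b))
    d = join(w2, add(i, scale(-2, b)), add(b, scale(-2, i)))
    return {"a": a, "b": b, "c": c, "d": d}

def instantiate(state, i):
    """Plug a concrete context value for i into every polynomial."""
    return {v: const + coeff * i for v, (const, coeff) in state.items()}

s = summary(w1=5, w2=2)
at_2 = instantiate(s, 2)
print(s, at_2)
```

The summary is computed once; instantiating it with i = 2 gives a = 0, b = 2, c = 2, d = -2, so the second assertion c = a + i holds in that context.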
23Idea 2: Generate fresh summaries
Procedure P (figure):
  Input: i
  if (*) then x := i + 1 else x := 3
  return x
  Summary with join weight w = 5: x = 5i - 7
Procedure Q (figure):
  u := P(2); v := P(1); w := P(1)
  Using the same summary for every call: u = 5·2 - 7 = 3, v = 5·1 - 7 = -2, w = 5·1 - 7 = -2
  Assert(u = 3); Assert(v = w)
- Plugging the same summary twice is unsound.
- Fresh summaries can be generated by random affine combination of a few independent summaries!
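The difference between reusing one summary and generating fresh ones can be sketched as follows. The assumptions are mine: summaries are stored as (const, coeff) pairs meaning const + coeff·i, two base summaries are used purely for illustration, and the combination weights 3, -1, 4 stand in for random choices. All function names are hypothetical.

```python
def summary_P(w):
    """Summary of P for join weight w: w*(i + 1) + (1 - w)*3 as (const, coeff)."""
    return (w * 1 + (1 - w) * 3, w)

def combine(alpha, s1, s2):
    """Affine combination alpha*s1 + (1 - alpha)*s2 of two summaries."""
    return tuple(alpha * p + (1 - alpha) * q for p, q in zip(s1, s2))

def call(summary, arg):
    """Instantiate a summary with a concrete argument."""
    const, coeff = summary
    return const + coeff * arg

base1, base2 = summary_P(5), summary_P(2)  # two independently generated summaries

# Reusing base1 for both calls P(1) makes v and w agree by construction.
same_v, same_w = call(base1, 1), call(base1, 1)

# A fresh summary per call site: affine combinations of the base summaries.
fresh = [combine(alpha, base1, base2) for alpha in (3, -1, 4)]
u, v, w = (call(s, arg) for s, arg in zip(fresh, (2, 1, 1)))
print(u, v, w)
```

Every fresh summary still gives u = 3, because both branches of P return 3 when i = 2, while v and w now differ, so the invalid assertion v = w is no longer verified.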
24Experiments
25Experiments
- Randomized algorithm discovers 10-70% more facts.
- Randomized algorithm is slower by a factor of 2.
26Experimental measure of error
- The % of incorrect relationships decreases with increase in
- S = size of set from which random values are chosen.
- N = # of random summaries used.
Table: % of incorrect relationships, by N (rows) and S (columns)

  N \ S   $10^3$   $10^5$   $10^8$
  2       95.5     95.5     95.5
  3       64.3     3.2      0
  4       0.2      0        0
  5       0        0        0
  6       0        0        0

The experimental results are better than what is predicted by theory.
27Outline
- Random Interpretation
- Linear arithmetic (POPL 2003)
- Uninterpreted functions (POPL 2004)
- Inter-procedural analysis (POPL 2005)
- Other applications
28Other applications of random interpretation
- Model Checking
- Randomized equivalence testing algorithm for FCEDs, which represent conditional linear expressions and are a generalization of BDDs. (SAS 04)
- Theorem Proving
- Randomized decision procedure for linear arithmetic and uninterpreted functions. This runs an order of magnitude faster than the deterministic algorithm. (CADE 03)
- Ideas for deterministic algorithms
- PTIME algorithm for global value numbering, thereby solving a 30-year-old open problem. (SAS 04)
29Future Work and Limitations
- Future Work
- Random interpreters for other theories
- E.g. data-structures
- Combining random interpreters
- E.g. random interpreter for the combined theory of linear arithmetic and uninterpreted functions.
- Limitations
- Does not discover "never equal" information
- Only detects "always equal" information
30Summary
Table: key idea and complexity, random interpretation vs. abstract interpretation

  Problem               Key idea                 Random interpretation   Abstract interpretation
  Linear Arithmetic     Affine Join              $O(n^2)$                $O(n^4)$
  Uninterpreted Fns.    Vectors                  $O(n^3)$                $O(n^4)$
  Interproc. Analysis   Symbolic i/p variables   Poly blowup             ?

- Lessons Learned
- Randomization buys efficiency and simplicity at the cost of probabilistic soundness.
- Randomization suggests ideas for deterministic algorithms.
- Combining randomized techniques with symbolic ones is powerful.