Title: Pointer analysis
1 Pointer analysis
2 Flow insensitive loss of precision
S1: l := new Cons
p := l
S2: t := new Cons
*p := t
p := t
[Figure: points-to graphs over l, p, t, S1 and S2 after each statement, comparing the flow-sensitive solution with the flow-insensitive solution (Andersen)]
3 Flow insensitive loss of precision
- Flow insensitive analysis leads to loss of precision!
main() { x := &y; ... x := &z; }
Flow insensitive analysis tells us that x may point to z at the "..." point, before x := &z has executed!
- However, flow insensitive analysis
  - uses less memory (memory can be a big bottleneck to running on large programs)
  - runs faster
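To make the trade-off concrete, here is a minimal Python sketch. It assumes the two assignments on the slide are x := &y and x := &z, and it encodes each statement as a hypothetical (lhs, target) pair; neither function is taken from a real analysis framework.

```python
# Each statement is a (lhs, target) pair meaning "lhs := &target".
# This encoding is an assumption made for illustration only.
program = [("x", "y"), ("x", "z")]


def flow_insensitive(stmts):
    """Compute one points-to map for the whole program, ignoring statement order."""
    pts = {}
    for lhs, target in stmts:
        pts.setdefault(lhs, set()).add(target)
    return pts


def flow_sensitive(stmts):
    """Compute one points-to map per program point (after each statement).

    An assignment overwrites the previous targets of its left-hand side.
    """
    pts = {}
    after = []
    for lhs, target in stmts:
        pts = {**pts, lhs: {target}}  # fresh map per program point
        after.append(pts)
    return after


whole_program = flow_insensitive(program)
per_point = flow_sensitive(program)
print(whole_program)   # a single map, valid at every program point
print(per_point)       # one map per program point
```

The flow-insensitive result keeps one map in memory but reports that x may point to z everywhere, including between the two assignments. The flow-sensitive result is exact at that point but stores a map per program point.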
4 Worst case complexity of Andersen
*x := y
[Figure: x points to a, b, c and y points to d, e, f; after *x := y, each of a, b, c points to each of d, e, f]
- Worst case N^2 per statement, so at least N^3 for the whole program. Andersen is in fact O(N^3)
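The quadratic cost of a single statement can be seen in a small subset-based solver. The sketch below is a naive fixpoint iteration in the style of Andersen's analysis, not a tuned implementation. The statement encoding ("addr", "copy", "load", "store") is hypothetical, and the example assumes the slide's statement is *x := y.

```python
from collections import defaultdict


def andersen(stmts):
    """Naive subset-based points-to analysis.

    stmts is a list of (kind, x, y) triples:
      ("addr",  x, y)  x := &y
      ("copy",  x, y)  x := y
      ("load",  x, y)  x := *y
      ("store", x, y)  *x := y
    """
    pts = defaultdict(set)

    def add(dst, values):
        """Add values to pts[dst]; report whether anything new was added."""
        before = len(pts[dst])
        pts[dst] |= values
        return len(pts[dst]) != before

    changed = True
    while changed:  # iterate until no points-to set grows
        changed = False
        for kind, x, y in stmts:
            if kind == "addr":
                changed |= add(x, {y})
            elif kind == "copy":
                changed |= add(x, set(pts[y]))
            elif kind == "load":
                for v in list(pts[y]):
                    changed |= add(x, set(pts[v]))
            elif kind == "store":
                for v in list(pts[x]):
                    changed |= add(v, set(pts[y]))
    return {k: v for k, v in pts.items() if v}


# x points to a, b, c; y points to d, e, f; then *x := y.
example = (
    [("addr", "x", v) for v in "abc"]
    + [("addr", "y", v) for v in "def"]
    + [("store", "x", "y")]
)
result = andersen(example)
```

The single store adds an edge from each of a, b, c to each of d, e, f: nine new edges for three targets on each side, which is where the N^2 per statement comes from.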
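5 New idea: one successor per node
- Make each node have only one successor.
- This is an invariant that we want to maintain.
*x := y
[Figure: x points to the merged node a,b,c and y points to the merged node d,e,f; after *x := y, the node a,b,c points to the node d,e,f]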
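6 More general case for *x := y
[Figure: points-to graph before and after *x := y]
8 Handling: x := *y
[Figure: points-to graph before and after x := *y]
10 Handling: x := y (what about y := x?)
[Figure: points-to graph before and after x := y; we get the same for y := x]
Handling: x := &y
[Figure: points-to graph before and after x := &y]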
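The one-successor invariant can be sketched with a union-find structure. The class below is an illustrative Steensgaard-style analysis, not the implementation behind these slides. It assumes the four statement forms are x := &y, x := y, x := *y and *x := y, and the method names (addr, copy, load, store) and the "$tmp" placeholder nodes are hypothetical.

```python
import itertools


class Steensgaard:
    """Unification-based points-to analysis: every node has at most one successor."""

    def __init__(self):
        self.parent = {}   # union-find parent links
        self.succ = {}     # representative -> the single node it points to
        self._fresh = itertools.count()

    def find(self, n):
        """Return the representative of n's equivalence class."""
        self.parent.setdefault(n, n)
        while self.parent[n] != n:
            self.parent[n] = self.parent[self.parent[n]]  # path halving
            n = self.parent[n]
        return n

    def target(self, n):
        """Return the class n points to, creating a placeholder if needed."""
        r = self.find(n)
        if r not in self.succ:
            self.succ[r] = f"$tmp{next(self._fresh)}"
        return self.find(self.succ[r])

    def join(self, a, b):
        """Merge two classes, then merge their successors to keep the invariant."""
        a, b = self.find(a), self.find(b)
        if a == b:
            return
        self.parent[b] = a
        sa, sb = self.succ.get(a), self.succ.pop(b, None)
        if sa is None and sb is not None:
            self.succ[a] = sb
        elif sa is not None and sb is not None:
            self.join(sa, sb)

    def addr(self, x, y):    # x := &y
        self.join(self.target(x), y)

    def copy(self, x, y):    # x := y (symmetric, so y := x gives the same result)
        self.join(self.target(x), self.target(y))

    def load(self, x, y):    # x := *y
        self.join(self.target(x), self.target(self.target(y)))

    def store(self, x, y):   # *x := y
        self.join(self.target(self.target(x)), self.target(y))

    def points_to(self, x, names):
        """Return the members of names that x may point to."""
        r = self.find(x)
        if r not in self.succ:
            return set()
        t = self.find(self.succ[r])
        return {n for n in names if self.find(n) == t}


s = Steensgaard()
s.addr("l", "S1")    # S1: l := new Cons
s.copy("p", "l")     # p := l
s.addr("t", "S2")    # S2: t := new Cons
s.store("p", "t")    # *p := t
s.copy("p", "t")     # p := t
merged = s.points_to("p", ["S1", "S2"])
```

Each statement does a constant number of find and join operations, so the whole analysis runs in nearly linear time. The price is precision: the last copy merges S1 and S2 into one node.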
12 Our favorite example, once more!
1: S1: l := new Cons
2: p := l
3: S2: t := new Cons
4: *p := t
5: p := t
[Figure: unification-based points-to graph over l, p, t, S1 and S2 after each of steps 1 to 5; after step 5, S1 and S2 are merged into a single node S1,S2]
14 Flow insensitive loss of precision
S1: l := new Cons
p := l
S2: t := new Cons
*p := t
p := t
[Figure: points-to graphs over l, p, t, S1 and S2 for three analyses: flow-sensitive subset-based, flow-insensitive subset-based, and flow-insensitive unification-based (where S1 and S2 are merged into a single node S1,S2)]
15 Another example
bar() {
  i := &a;
  j := &b;
  foo(&i);
  foo(&j);
  // i pnts to what?
  *i := ...;
}
void foo(int* p) { printf("%d", *p); }
[Figure: unification-based points-to graph after each of steps 1 to 4: i points to a; j points to b; p points to i; finally p points to the merged node i,j, which points to the merged node a,b]
17 Steensgaard & beyond
- A well engineered implementation of Steensgaard ran on Word97 (2.1 MLOC) in 1 minute.
- One Level Flow (Das, PLDI 00) is an extension to Steensgaard that gets more precision and runs in 2 minutes on Word97.
18 Correctness
19 Compilers have many bugs
Searched for "incorrect" and "wrong" in the gcc-bugs mailing list. Some of the results:
- Bug middle-end/19650 New miscompilation of correct code
- Bug c/19731 arguments incorrectly named in static member specialization
- Bug rtl-optimization/13300 Variable incorrectly identified as a biv
- Bug rtl-optimization/16052 strength reduction produces wrong code
- Bug tree-optimization/19633 local address incorrectly thought to escape
- Bug target/19683 New MIPS wrong-code for 64-bit multiply
- Bug c/19605 Wrong member offset in inherited classes
- Bug java/19295 4.0 regression Incorrect bytecode produced for bitwise AND
Total of 545 matches. And this is only for one month! On a mature compiler!
20 Compiler bugs cause problems
[Figure: Compiler producing Exec]
- They lead to buggy executables
- They rule out having strong guarantees about executables
21 The focus: compiler optimizations
- A key part of any optimizing compiler
- Hard to get optimizations right
  - Lots of infrastructure-dependent details
  - There are many corner cases in each optimization
  - There are many optimizations and they interact in unexpected ways
  - It is hard to test all these corner cases and all these interactions
23 Goals
- Make it easier to write compiler optimizations
  - student in an undergrad compiler course should be able to write optimizations
- Provide strong guarantees about the correctness of optimizations
  - automatically (no user intervention at all)
  - statically (before the opts are even run once)
- Expressive enough for realistic optimizations
24 The Rhodium work
- A domain-specific language for writing optimizations: Rhodium
- A correctness checker for Rhodium optimizations
- An execution engine for Rhodium optimizations
- Implemented and checked the correctness of a variety of realistic optimizations
25 Broader implications
- Many other kinds of program manipulators: code refactoring tools, static checkers
  - Rhodium work is about program analyses and transformations, the core of any program manipulator
- Enables safe extensible program manipulators
  - Allow end programmers to easily and safely extend program manipulators
  - Improve programmer productivity
26 Outline
- Introduction
- Overview of the Rhodium system
- Writing Rhodium optimizations
- Checking Rhodium optimizations
- Discussion
27 Rhodium system overview
[Figure: system diagram. Written by the Rhodium team: the Rhodium execution engine and the checker. Written by programmer: Rhodium optimizations (Rdm Opt). The compiler contains the Rhodium execution engine, which runs the Rdm Opts, and produces the Exec]
31 The technical problem
- Tension between
  - Expressiveness
  - Automated correctness checking
- Challenge: develop techniques
  - that will go a long way in terms of expressiveness
  - that allow correctness to be checked
32 Solution: three techniques
[Figure: the checker takes a Rdm Opt and discharges a verification task: show that for any original program, behavior of original program = behavior of optimized program]
- Rhodium is declarative
  - declare intent using rules
  - execution engine takes care of the rest
- Factor out heuristics
  - legal transformations (the part that must be reasoned about)
  - vs. profitable transformations (heuristics not affecting correctness)
- Split verification task
  - opt-dependent
  - vs. opt-independent
- Result
  - Expressive language
  - Automated correctness checking
43 Outline
- Introduction
- Overview of the Rhodium system
- Writing Rhodium optimizations
- Checking Rhodium optimizations
- Discussion
44 MustPointTo analysis
a := &b
c := a
d := *c        (can be optimized to d := b)
45 MustPointTo info in Rhodium
define fact mustPointTo(X:Var,Y:Var) with meaning X == &Y
Fact correct on edge if: whenever program execution reaches edge, meaning of fact evaluates to true in the program state
[Figure: the statements a := &b; c := a; d := *c, with mustPointTo facts attached to the edges between them]
48 Propagating facts
define fact mustPointTo(X:Var,Y:Var) with meaning X == &Y
if currStmt == [X := &Y] then mustPointTo(X,Y)@out
if mustPointTo(X,Y)@in ∧ currStmt == [Z := X] then mustPointTo(Z,Y)@out
Example:
a := &b        (out: mustPointTo(a,b))
c := a         (in: mustPointTo(a,b); out: mustPointTo(c,b))
d := *c
53 Transformations
define fact mustPointTo(X:Var,Y:Var) with meaning X == &Y
if currStmt == [X := &Y] then mustPointTo(X,Y)@out
if mustPointTo(X,Y)@in ∧ currStmt == [Z := X] then mustPointTo(Z,Y)@out
if mustPointTo(X,Y)@in ∧ currStmt == [Z := *X] then transform to Z := Y
Example:
a := &b
c := a         (in: mustPointTo(a,b); out: mustPointTo(c,b))
d := *c        (in: mustPointTo(c,b); transformed to d := b)
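The propagation and transformation rules can be mimicked on straight-line code with a few lines of Python. This is only a sketch of what the rules express, not the Rhodium execution engine. It assumes the three statement forms X := &Y, X := Y and Z := *X, encodes them as hypothetical (kind, lhs, rhs) triples, and adds one assumption that the slides do not show: facts about a variable are dropped when that variable is reassigned.

```python
def propagate(facts_in, stmt):
    """Return the mustPointTo facts on the outgoing edge of stmt.

    A fact is a pair (x, y) meaning x == &y.
    """
    kind, lhs, rhs = stmt
    # Assumption (not on the slides): reassigning lhs invalidates facts about lhs.
    out = {(x, y) for (x, y) in facts_in if x != lhs}
    if kind == "addr":                      # X := &Y  gives  mustPointTo(X, Y)
        out.add((lhs, rhs))
    elif kind == "copy":                    # mustPointTo(X, Y) and Z := X  gives  mustPointTo(Z, Y)
        out |= {(lhs, y) for (x, y) in facts_in if x == rhs}
    return out


def transform(facts_in, stmt):
    """Rewrite Z := *X to Z := Y when mustPointTo(X, Y) holds on the incoming edge."""
    kind, lhs, rhs = stmt
    if kind == "load":
        targets = [y for (x, y) in facts_in if x == rhs]
        if targets:
            return ("copy", lhs, targets[0])
    return stmt


def optimize(program):
    """Walk straight-line code once, propagating facts and applying the rewrite."""
    facts, result = set(), []
    for stmt in program:
        result.append(transform(facts, stmt))
        facts = propagate(facts, stmt)
    return result


program = [
    ("addr", "a", "b"),   # a := &b
    ("copy", "c", "a"),   # c := a
    ("load", "d", "c"),   # d := *c
]
optimized = optimize(program)
```

On this program the last statement becomes ("copy", "d", "b"), i.e. d := b. A real engine would iterate to a fixpoint over a control flow graph and would leave the decision to apply the rewrite to the profitability heuristics.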
55 Profitability heuristics
Legal transformations (identified by the Rhodium rules) -> Profitability heuristics -> Subset of legal transformations (actually performed)
56 Profitability heuristic example 1
- Inlining
- Many heuristics to determine when to inline a function
  - compute function sizes, estimate code-size increase, estimate performance benefit
  - maybe even use AI techniques to make the decision
- However, these heuristics do not affect the correctness of inlining
- They are just used to choose which of the correct set of transformations to perform
57 Profitability heuristic example 2
- Partial redundancy elimination (PRE)
a := ...; b := ...;
if (...) { a := ...; x := a + b; } else { ... }
x := a + b;
58 Profitability heuristic example 2
- PRE as code duplication followed by CSE
  - Code duplication: a copy of x := a + b is placed at the end of the else branch
  - CSE: the final x := a + b becomes x := x
  - self-assignment removal: x := x is deleted
61 Profitability heuristic example 2
a := ...; b := ...;
if (...) { a := ...; x := a + b; } else { ...; x := a + b; }
[Figure: the code above, marking the legal placements of x := a + b and the profitable placement]
62 Semantics of a Rhodium opt
- Run propagation rules in a loop until there are no more changes (optimistic iterative analysis)
- Then run transformation rules to identify the set of legal transformations
- Then run profitability heuristics to determine set of transformations to perform
63 More facts
define fact mustNotPointTo(X:Var,Y:Var) with meaning X ≠ &Y
define fact doesNotPointIntoHeap(X:Var) with meaning X == null ∨ ∃ Y:Var . X == &Y
define fact hasConstantValue(X:Var,C:Const) with meaning X == C
64 More rules
if currStmt == [X := *A] ∧ mustNotPointToHeap(A)@in ∧ ∀ B:Var . mayPointTo(A,B)@in ⇒ mustNotPointTo(B,Y) then mustNotPointTo(X,Y)@out
if currStmt == [Y := I + BE] ∧ varEqualArray(X,A,J)@in ∧ equalsPlus(J,I,BE)@in ∧ ¬mayDef(X) ∧ ¬mayDefArray(A) ∧ unchanged(BE) then varEqualArray(X,A,Y)@out
65 More in Rhodium
- More powerful pointer analyses
  - Heap summaries
- Analyses across procedures
  - Interprocedural analyses
- Analyses that don't care about the order of statements
  - Flow-insensitive analyses
66 Outline
- Introduction
- Overview of the Rhodium system
- Writing Rhodium optimizations
- Checking Rhodium optimizations
- Discussion
67 Rhodium correctness checker
[Figure: a Rhodium optimization (fact definitions "define fact ...", propagation rules "if ... then ...", transformation rules "if ... then transform ...", and profitability heuristics) is given to the checker. The verification task is split into an opt-independent part and an opt-dependent part: VCGen generates local verification conditions (LocalVC) from the rules, and an automatic theorem prover discharges them]
73 Local verification conditions
define fact mustPointTo(X,Y) with meaning X == &Y
Local VCs (generated and proven automatically)
74 Local correctness of prop. rules
if mustPointTo(X,Y)@in ∧ currStmt == [Z := X] then mustPointTo(Z,Y)@out
Local VC (generated and proven automatically):
- Assume: all incoming facts are correct
- Show: propagated fact is correct
[Figure: the incoming state, the statement Z := X, and the outgoing state]
76 Local correctness of trans. rules
if mustPointTo(X,Y)@in ∧ currStmt == [Z := *X] then transform to Z := Y
Local VC (generated and proven automatically)
[Figure: starting from the same incoming state, Z := *X and Z := Y each produce an outgoing state; the question is whether the two outgoing states are the same]
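The two local conditions can be illustrated by brute force on a tiny model. The Rhodium checker hands these conditions to an automatic theorem prover; the Python sketch below instead enumerates every state over three variables, which is only feasible because the model is so small. The state encoding, the function names, and the reading of the statements as Z := X and Z := *X are all assumptions made for illustration.

```python
from itertools import product

VARS = ("x", "y", "z")
# A value is either None (null) or the address of a variable, named by that variable.
VALUES = (None,) + VARS


def states():
    """Enumerate every assignment of values to the variables."""
    for vals in product(VALUES, repeat=len(VARS)):
        yield dict(zip(VARS, vals))


def must_point_to(state, x, y):
    """Meaning of mustPointTo(X, Y): X == &Y in the given state."""
    return state[x] == y


def step_copy(state, z, x):
    """Execute Z := X."""
    out = dict(state)
    out[z] = state[x]
    return out


def step_load(state, z, x):
    """Execute Z := *X (X must hold the address of a variable)."""
    out = dict(state)
    out[z] = state[state[x]]
    return out


def propagation_is_locally_sound(conclusion):
    """Assume mustPointTo(X, Y) holds before Z := X; check the conclusion after it."""
    for X, Y, Z in product(VARS, repeat=3):
        for s_in in states():
            if must_point_to(s_in, X, Y):
                if not conclusion(step_copy(s_in, Z, X), X, Y, Z):
                    return False
    return True


def transformation_is_locally_sound():
    """Assume mustPointTo(X, Y); check that Z := *X and Z := Y give the same state."""
    for X, Y, Z in product(VARS, repeat=3):
        for s_in in states():
            if must_point_to(s_in, X, Y):
                if step_load(s_in, Z, X) != step_copy(s_in, Z, Y):
                    return False
    return True


def propagated_fact(state, X, Y, Z):
    return must_point_to(state, Z, Y)   # the rule from the slide


def wrong_fact(state, X, Y, Z):
    return must_point_to(state, Y, Z)   # a deliberately wrong conclusion


print(propagation_is_locally_sound(propagated_fact))
print(propagation_is_locally_sound(wrong_fact))
print(transformation_is_locally_sound())
```

Each check looks only at one statement and the states around it, which is what makes the condition local. The deliberately wrong conclusion is rejected because some state satisfies the premise but not the conclusion.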
78 Outline
- Introduction
- Overview of the Rhodium system
- Writing Rhodium optimizations
- Checking Rhodium optimizations
- Discussion
79 Topics of Discussion
- Correctness guarantees
- Usefulness of the checker
- Expressiveness
80 Correctness guarantees
- Once checked, optimizations are guaranteed to be correct
- Caveat: trusted computing base
  - execution engine
  - checker implementation
  - proofs done by hand once
- Adding a new optimization does not increase the size of the trusted computing base
81 Usefulness of the checker
- Found subtle bugs in my initial implementation of various optimizations
define fact equals(X:Var, E:Expr) with meaning X == E
First attempt (incorrect: x := x + 1 would produce equals(x, x + 1)):
if currStmt == [X := E] then equals(X,E)@out
Second attempt (still incorrect: x := *y + 1 would produce equals(x, *y + 1), which is wrong when y points to x):
if currStmt == [X := E] ∧ "X does not appear in E" then equals(X,E)@out
Corrected rule:
if currStmt == [X := E] ∧ "E does not use X" then equals(X,E)@out
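The two counterexamples can be replayed with a small interpreter. This is a sketch under assumptions: it reads the garbled statements as x := x + 1 and x := *y + 1, represents expressions as hypothetical nested tuples, and stores the address of a variable as that variable's name.

```python
def evaluate(expr, state):
    """Evaluate an expression; a pointer value is the name of the variable it points to."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return state[expr[1]]
    if kind == "deref":                       # *y: read the variable that y points to
        return state[state[expr[1]]]
    if kind == "plus":
        return evaluate(expr[1], state) + evaluate(expr[2], state)
    raise ValueError(f"unknown expression kind: {kind}")


def assign(state, x, expr):
    """Execute x := expr and return the new state."""
    out = dict(state)
    out[x] = evaluate(expr, state)
    return out


def equals_holds(state, x, expr):
    """Meaning of the fact equals(X, E): X == E in the given state."""
    return state[x] == evaluate(expr, state)


x_plus_1 = ("plus", ("var", "x"), ("const", 1))
deref_y_plus_1 = ("plus", ("deref", "y"), ("const", 1))

aliased = {"x": 5, "y": "x"}                  # y points to x
unaliased = {"x": 5, "y": "w", "w": 10}       # y points to some other variable w

# First attempt: x := x + 1 does not establish equals(x, x + 1).
after_increment = assign(aliased, "x", x_plus_1)

# Second attempt: x does not appear in *y + 1, yet the fact fails when y points to x.
after_aliased = assign(aliased, "x", deref_y_plus_1)

# The fact does hold when the expression really does not use x.
after_unaliased = assign(unaliased, "x", deref_y_plus_1)
```

The syntactic test "X does not appear in E" misses uses of X through a pointer, which is why the corrected rule asks the semantic question "E does not use X".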
84 Rhodium expressiveness
- Traditional optimizations
  - const prop and folding, branch folding, dead assignment elim, common sub-expression elim, partial redundancy elim, partial dead assignment elim, arithmetic invariant detection, and integer range analysis.
- Pointer analyses
  - must-point-to analysis, Andersen's may-point-to analysis with heap summaries
- Loop opts
  - loop-induction-variable strength reduction, code hoisting, code sinking
- Array opts
  - constant propagation through array elements, redundant array load elimination
85 Expressiveness limitations
- May not be able to express your optimization in Rhodium
  - opts that build complicated data structures
  - opts that perform complicated many-to-many transformations (e.g. loop fusion, loop unrolling)
- A correct Rhodium optimization may be rejected by the correctness checker
  - limitations of the theorem prover
  - limitations of first-order logic
86 Lessons learned (discussion)
87 Lessons learned (my answers)
- Capture structure of problem
  - Rhodium: flow functions, rewrite rules, prof. heuristics
  - Restricts the programmer, but can lead to better reasoning abilities
- Split correctness-critical code from rest
- Split verification task
  - meta-level vs. per-verification
  - between analysis tool and theorem prover
  - between human and theorem prover
- DSL design is an iterative process
  - Hard to see best design without trying something first
  - Previous version of Rhodium was called Cobalt
  - Cobalt was based on temporal logic
  - Stepping stone towards Rhodium
- One of the gotchas is efficient execution
  - easier to reason about automatically does not always mean easier to execute efficiently
  - can possibly recover efficiency with hints from users
  - how can you trust a complex execution engine?
- Rely on annotations?
  - meanings in Rhodium
  - May be ok, especially if annotations simply state what the programmer is already thinking
90 Conclusion
- Rhodium system
  - makes it easier to write optimizations
  - provides correctness guarantees
  - is expressive enough for realistic optimizations
- Rhodium is an example of using a DSL to allow more precise reasoning