Title: A Roadmap
1. A Roadmap
- Traditional Static Program Analysis
- Theory
- Compiler Optimizations, Control Flow Graphs
- Data-flow Analysis, Data-flow frameworks (today's class)
- Classic analyses and applications
- Software Testing
- Formal Static Program Analysis
2. Outline
- Data-flow frameworks
- Lattice theoretic foundations
- Monotone frameworks
- The Maximal Fixed Point (MFP) solution
- The Meet Over all Paths (MOP) solution
- Reading: Compilers: Principles, Techniques, and Tools, by Aho, Lam, Sethi and Ullman, Chapter 9.3
3. Four Classical Dataflow Problems: Similarities
- There is a finite set U of dataflow facts
  - Reaching Definitions: the set of all definitions in the program
  - Available Expressions and Very Busy Expressions: the set of all expressions in the program
- The solution at a program point i (i.e., in(i), out(i)) is a subset of U (e.g., each definition either reaches program point i or does not).
4. Similarities, continued
- Dataflow equations are of the form
  - out(i) = f_i(in(i)) = (in(i) - kill(i)) ∪ gen(i) = (in(i) ∩ pres(i)) ∪ gen(i)
- Also, for all four classical data-flow problems, the sets pres(i) and gen(i) have constant values, i.e., they do not depend on in(i). This is not true in general.
- Set union and set intersection can be implemented as logical OR and AND respectively when the sets are represented as bit vectors
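The bit-vector remark can be made concrete with a minimal sketch. The three-definition universe and the helper names `transfer` and `pres_from_kill` are assumptions chosen for illustration; they do not come from the slides.

```python
# Hypothetical universe of three definitions, one bit per definition.
X1, X4, A3 = 0b001, 0b010, 0b100
ALL = X1 | X4 | A3

def transfer(in_bits, pres_bits, gen_bits):
    """out(i) = (in(i) AND pres(i)) OR gen(i), on integer bit vectors."""
    return (in_bits & pres_bits) | gen_bits

def pres_from_kill(kill_bits, universe=ALL):
    """pres(i) is the complement of kill(i) within the universe."""
    return universe & ~kill_bits

# A statement that defines x kills both definitions of x and generates (x,4).
pres4 = pres_from_kill(X1 | X4)
out4 = transfer(X1 | A3, pres4, X4)
```

Because pres(i) and gen(i) are constants, each transfer function costs one AND and one OR per machine word.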
5. Lattice Theory
- Partial ordering (denoted by ≤ or ⊆)
  - Relation between pairs of elements
  - Reflexive: x ≤ x
  - Anti-symmetric: x ≤ y, y ≤ x implies x = y
  - Transitive: x ≤ y, y ≤ z implies x ≤ z
- Poset (set S, ≤)
- 0 Element: 0 ≤ x, for every x in S
- 1 Element: x ≤ 1, for every x in S
We don't necessarily need 0 and 1 elements.
6. Poset Example
U = {a,b,c}. The poset is 2^U; ≤ is set inclusion.
[Hasse diagram, top to bottom: {a,b,c}; {a,b}, {b,c}, {a,c}; {a}, {b}, {c}]
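As a rough check of the three partial-order axioms on this example, the sketch below enumerates the powerset of {a, b, c} and tests set inclusion by brute force. The function names are illustrative assumptions.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_partial_order(elems, leq):
    """Brute-force check of reflexivity, anti-symmetry and transitivity."""
    reflexive = all(leq(x, x) for x in elems)
    antisymmetric = all(x == y for x in elems for y in elems
                        if leq(x, y) and leq(y, x))
    transitive = all(leq(x, z) for x in elems for y in elems for z in elems
                     if leq(x, y) and leq(y, z))
    return reflexive and antisymmetric and transitive

S = powerset({"a", "b", "c"})

def subset(x, y):
    return x <= y

def by_size(x, y):
    # Comparing by cardinality is NOT anti-symmetric: |{a}| = |{b}| but {a} != {b}.
    return len(x) <= len(y)
```

Here `is_partial_order(S, subset)` holds, while `is_partial_order(S, by_size)` fails on anti-symmetry.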
7. Lattice Theory
- Greatest lower bound (glb) g of elements l1, l2 is an element in S such that
  - (1) g ≤ l1, (2) g ≤ l2
  - (3) for any b in S, b ≤ l1, b ≤ l2 implies b ≤ g
- If glb exists, it is unique. Why? It is called the meet (denoted by ∧ or ⊓) of l1 and l2.
- Least upper bound (lub) l of elements l1, l2 is an element in S such that
  - (1) l ≥ l1, (2) l ≥ l2
  - (3) for any d in S, d ≥ l1, d ≥ l2 implies d ≥ l
- If lub exists, it is unique. It is called the join (denoted by ∨ or ⊔) of l1 and l2.
8. Definition of a Lattice
- A lattice, L, is a poset under ≤ such that every pair of elements has a glb (meet) and lub (join)
- Not every poset is a lattice
- A lattice need not contain a 0 or 1 element
- A finite lattice must contain 0 and 1 elements
- If a ≤ x for every x in L, then a is the 0 element of L
- If x ≤ a for every x in L, then a is the 1 element of L
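For a finite poset the definition can be checked directly. The sketch below computes glb and lub by brute force and returns `None` when they do not exist. The two small divisibility posets are assumptions chosen for illustration.

```python
def glb(elems, leq, a, b):
    """Greatest lower bound of a and b in a finite poset, or None."""
    lower = [g for g in elems if leq(g, a) and leq(g, b)]
    greatest = [g for g in lower if all(leq(c, g) for c in lower)]
    return greatest[0] if greatest else None

def lub(elems, leq, a, b):
    """Least upper bound of a and b in a finite poset, or None."""
    upper = [u for u in elems if leq(a, u) and leq(b, u)]
    least = [u for u in upper if all(leq(u, d) for d in upper)]
    return least[0] if least else None

def is_lattice(elems, leq):
    """Every pair of elements must have both a glb and a lub."""
    return all(glb(elems, leq, a, b) is not None and
               lub(elems, leq, a, b) is not None
               for a in elems for b in elems)

def divides(x, y):
    return y % x == 0

with_top = [1, 2, 3, 6]   # 2 and 3 have the upper bound 6
without_top = [1, 2, 3]   # 2 and 3 have no common upper bound
```

Under divisibility `with_top` is a lattice, while `without_top` is a poset but not a lattice because lub(2, 3) does not exist.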
9. A poset but not a lattice
[Hasse diagram over the elements e4, e3, e1, e2, and 0]
There is no lub(e3,e4) in this poset, so it is not a lattice. Even if we add a lub(e3,e4), is it going to be a lattice?
10. Examples of Lattices
- H = (2^U, ∩, ∪) where U is a finite set
  - Partial order is the subset relation
  - glb(s1,s2) = s1 ∧ s2 = s1 ∩ s2
  - lub(s1,s2) = s1 ∨ s2 = s1 ∪ s2
- J = (N1, gcd, lcm)
  - Partial order is integer divide on N1
  - glb(n1,n2) = n1 ∧ n2 = gcd(n1,n2)
  - lub(n1,n2) = n1 ∨ n2 = lcm(n1,n2)
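The claim about J can be spot-checked on a finite piece of it. The sketch below restricts attention to the divisors of 30 (an assumption made to keep the check finite) and verifies that gcd and lcm satisfy the glb and lub conditions under divisibility.

```python
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def divides(x, y):
    return y % x == 0

DIV30 = [d for d in range(1, 31) if 30 % d == 0]

def is_glb(g, a, b, elems):
    """g is a lower bound of a, b and every other lower bound divides g."""
    return (divides(g, a) and divides(g, b) and
            all(divides(c, g) for c in elems
                if divides(c, a) and divides(c, b)))

def is_lub(l, a, b, elems):
    """l is an upper bound of a, b and l divides every other upper bound."""
    return (divides(a, l) and divides(b, l) and
            all(divides(l, d) for d in elems
                if divides(a, d) and divides(b, d)))

all_pairs_ok = all(is_glb(gcd(a, b), a, b, DIV30) and
                   is_lub(lcm(a, b), a, b, DIV30)
                   for a in DIV30 for b in DIV30)
```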
11. Chain
- A poset C where for every pair of elements c1, c2 in C, either c1 ≤ c2 or c2 ≤ c1.
  - E.g., {a} ⊆ {a,b} ⊆ {a,b,c}
  - And from the lattice J as shown here,
    - 1 ≤ 2 ≤ 6 ≤ 30
    - 1 ≤ 3 ≤ 15 ≤ 30
[Hasse diagram of the divisors of 30 under divisibility, top to bottom: 30; 6, 15, 10; 2, 5, 3; 1]
Lattices are used in dataflow analysis to reason about the solution obtainable through fixed-point iteration.
12. Dataflow Lattices: Reaching Definitions
U = all definitions = {(x,1),(x,4),(a,3)}. The poset is 2^U; ≤ is the subset relation.
Program:
1. x = a+b
2. if y < a+b
3. a = a+1
4. x = a+b
5. goto 3
[Lattice diagram, top to bottom: 1 = {(x,1),(x,4),(a,3)}; {(x,1),(x,4)}, {(x,4),(a,3)}, {(x,1),(a,3)}; {(x,1)}, {(x,4)}, {(a,3)}; 0 = Ø]
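13. Dataflow Lattices: Available Expressions
U = all expressions = {a+b, a+1, y+z}. The poset is 2^U; ≤ is the superset relation.
Program:
1. x = a+b
2. if y+z < a+b
3. a = a+1
4. x = a+b
5. goto 2
[Lattice diagram, top to bottom: 1 = Ø; {a+b}, {y+z}, {a+1}; {a+b,y+z}, {a+b,a+1}, {a+1,y+z}; 0 = {a+b,a+1,y+z}]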
14. Monotone Dataflow Frameworks
- Generic data-flow equations
  - in(i) = ∨ out(m), taken over all m in pred(i)
  - out(i) = f_i(in(i))
- Parameters
  - Property space: in(i), out(i) are elements of a property space
  - Combination operator ∨: ∪ for may problems and ∩ for must problems
  - Transfer functions: f_i is associated with node i
- If we instantiate these parameters in a certain way, then our analysis is an instance of the monotone dataflow framework
15. Monotone Frameworks: Requirements
- The property space
  - Is a complete lattice L under partial order ≤
  - where L satisfies the Ascending Chain Condition
  - (i.e., all ascending chains are finite)
- The combination operator ∨
  - Is the join (∨, pronounced "vee") of L
- Reaching Definitions: Property space? Combination operator?
- Available Expressions: Property space? Combination operator?
16. Monotone Frameworks: Requirements
- The transfer functions f_i: L → L
- Formally, there is a space F such that
  - F contains all f_i
  - F contains the identity function id(x) = x
  - F is closed under composition
  - Each f_i is monotone
17. Monotonicity
- It is defined as
  - (1) a ≤ b implies f(a) ≤ f(b)
- An equivalent definition is (2) f(x) ∨ f(y) ≤ f(x ∨ y)
- Lemma: The two definitions are equivalent.
  - First, we show that (1) implies (2).
  - Second, we show that (2) implies (1).
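One way to fill in the two directions of the lemma is sketched below. It uses only the definition of the join as a least upper bound and the fact that a ≤ b implies a ∨ b = b.

```latex
\textbf{(1) $\Rightarrow$ (2).} Since $x \le x \vee y$ and $y \le x \vee y$, (1) gives
$f(x) \le f(x \vee y)$ and $f(y) \le f(x \vee y)$. So $f(x \vee y)$ is an upper bound of
$f(x)$ and $f(y)$, and therefore the least upper bound satisfies
\[ f(x) \vee f(y) \le f(x \vee y). \]

\textbf{(2) $\Rightarrow$ (1).} If $a \le b$ then $a \vee b = b$, so by (2)
\[ f(a) \le f(a) \vee f(b) \le f(a \vee b) = f(b). \]
```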
18. The four classical dataflow problems
Let Def denote all definitions in the program. Let 2^Def denote the powerset of Def.
Let AExp denote all expressions in the program. Let 2^AExp denote the powerset of AExp.
Reaching Definitions: property space 2^Def, ≤ is ⊆, ∨ is ∪
Available Expressions: property space 2^AExp, ≤ is ⊇, ∨ is ∩
19. Distributivity
- A distributive framework: a monotone framework with distributive transfer functions, i.e., f(x) ∨ f(y) = f(x ∨ y).
21. Distributivity
- Each of the four problems is an instance of a distributive framework.
- First, prove monotonicity
  - if in(i) ⊆ in'(i) then out(i) ⊆ out'(i)
  - Have to show:
  - if in(i) ⊆ in'(i) then
  - (in(i) ∩ pres(i)) ∪ gen(i) ⊆ (in'(i) ∩ pres(i)) ∪ gen(i)
- Second, prove distributivity
  - ((in(i) ∪ in'(i)) ∩ pres(i)) ∪ gen(i)
  - = ((in(i) ∩ pres(i)) ∪ gen(i)) ∪ ((in'(i) ∩ pres(i)) ∪ gen(i))
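The two proof obligations can also be checked exhaustively on a small universe. The sketch below is a sanity check rather than a proof: it assumes a hypothetical three-element universe and tries every choice of pres and gen with the union join.

```python
from itertools import combinations

U = frozenset({"d1", "d2", "d3"})  # hypothetical dataflow facts

def subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def make_transfer(pres, gen):
    """f(x) = (x ∩ pres) ∪ gen, with constant pres and gen."""
    return lambda x: (x & pres) | gen

def is_monotone(f, elems):
    return all(f(a) <= f(b) for a in elems for b in elems if a <= b)

def is_distributive(f, elems):
    return all(f(a | b) == f(a) | f(b) for a in elems for b in elems)

ALL = subsets(U)
results = [(is_monotone(f, ALL), is_distributive(f, ALL))
           for pres in ALL for gen in ALL
           for f in [make_transfer(pres, gen)]]
```

All 64 combinations of pres and gen pass both checks, which matches the algebraic argument on the slide.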
22. Points-to Analysis: Monotone, Non-distributive Analysis
- Lattice: The set of all points-to graphs Pt
  - ≤ is inclusion, Pt1 ≤ Pt2 if Pt1 is a subgraph of Pt2
  - ∨ is union, Pt1 ∨ Pt2 = Pt1 ∪ Pt2
- Transfer functions are defined on four kinds of statements
  - (1) f(p = &q): kill all points-to edges from p, and generate a new points-to edge from p to q
  - (2) f(p = q): kill all points-to edges from p, and generate new points-to edges from p to every x such that q points to x
  - (3) f(p = *q): kill all points-to edges from p, and generate new points-to edges from p to every x such that there exists y where q points to y and y points to x
  - (4) f(*p = q): Do not perform kill. Can you think of a reason why? Generate new points-to edges from every y to every x, such that p points to y and q points to x.
23. Monotone Non-distributive Analysis
- First, we show that the framework is monotone,
  - I.e., for each of the four transfer functions we have to show that if Pt1 ≤ Pt2, then f(Pt1) ≤ f(Pt2)
- Second, we show that the framework is not distributive
  - It is easy to show f(Pt1 ∨ Pt2) ≠ f(Pt1) ∨ f(Pt2)
- Another example is constant propagation
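24. Non-distributivity of Points-to Analysis
Branch 1: p = &x; q = &y (gives Pt1: p → x, q → y)
Branch 2: p = &z; q = &w (gives Pt2: p → z, q → w)
Pt1 ∨ Pt2: p → x, p → z, q → y, q → w
Statement after the merge: *p = q
What f does: It adds edges from each variable that p points to (i.e., x and z) to each variable that q points to (i.e., y and w). So f(Pt1 ∨ Pt2) has 4 new edges: from x to y and w, and from z to y and w. In contrast, f(Pt1) ∨ f(Pt2) has only 2 new edges: from x to y and from z to w.
[Figure: the points-to graphs f(Pt1) ∨ f(Pt2) and f(Pt1 ∨ Pt2)]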
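The counterexample can be replayed in a few lines. In the sketch below a points-to graph is assumed to be a set of (source, target) pairs, and `store_through` is a hypothetical name for the transfer function of the store statement (4), written here as *p = q.

```python
def store_through(pt):
    """Transfer function for *p = q: no kill; add y -> x whenever
    p points to y and q points to x."""
    p_targets = {y for (s, y) in pt if s == "p"}
    q_targets = {x for (s, x) in pt if s == "q"}
    return frozenset(pt) | {(y, x) for y in p_targets for x in q_targets}

pt1 = frozenset({("p", "x"), ("q", "y")})
pt2 = frozenset({("p", "z"), ("q", "w")})

join_of_results = store_through(pt1) | store_through(pt2)  # f(Pt1) V f(Pt2)
result_of_join = store_through(pt1 | pt2)                  # f(Pt1 V Pt2)
spurious = result_of_join - join_of_results
```

The difference `spurious` contains exactly the edges x → w and z → y, which exist on no single path through the program.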
25. The Maximal Fixed Point (MFP) [1]
/* Initialize to initial values */
in(1) = InitialValue                  /* RD: in(1) = UNDEF */
for m = 2 to n do in(m) = 0           /* RD: in(m) = Ø */
W = {1, 2, ..., n}                    /* put every node on the worklist */
while W ≠ Ø do
  remove i from W
  out(i) = f_i(in(i))                 /* RD: outRD(i) = (inRD(i) ∩ pres(i)) ∪ gen(i) */
  for j in successors(i)
    if out(i) ≰ in(j) then            /* RD: if outRD(i) is not a subset of inRD(j) */
      in(j) = out(i) ∨ in(j)          /* RD: inRD(j) = outRD(i) ∪ inRD(j) */
      if j not in W do add j to W
[1] The Least Fixed Point (LFP), actually.
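A minimal Python sketch of the worklist algorithm follows, instantiated for Reaching Definitions on the five-statement program from the reaching-definitions lattice slide. Several things are assumptions made for illustration: the function name `mfp`, the control-flow edges 1→2, 2→3, 3→4, 4→5, 5→3, and the empty initial value at the entry (standing in for UNDEF).

```python
def mfp(nodes, succ, transfer, entry, init, bottom, join, leq):
    """Worklist iteration: returns the in and out solution per node."""
    in_ = {n: bottom for n in nodes}
    in_[entry] = init
    worklist = list(nodes)            # put every node on the worklist
    while worklist:
        i = worklist.pop()
        out_i = transfer[i](in_[i])
        for j in succ.get(i, ()):
            if not leq(out_i, in_[j]):
                in_[j] = join(out_i, in_[j])
                if j not in worklist:
                    worklist.append(j)
    out = {n: transfer[n](in_[n]) for n in nodes}
    return in_, out

def gen_kill(gen, kill):
    return lambda s: (s - kill) | gen

def identity(s):
    return s

X1, X4, A3 = ("x", 1), ("x", 4), ("a", 3)
x_defs = frozenset({X1, X4})
transfer = {
    1: gen_kill(frozenset({X1}), x_defs),            # defines x
    2: identity,                                     # the test defines nothing
    3: gen_kill(frozenset({A3}), frozenset({A3})),   # defines a
    4: gen_kill(frozenset({X4}), x_defs),            # defines x
    5: identity,                                     # goto
}
succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [3]}      # assumed CFG edges

in_rd, out_rd = mfp([1, 2, 3, 4, 5], succ, transfer, entry=1,
                    init=frozenset(), bottom=frozenset(),
                    join=frozenset.union, leq=frozenset.issubset)
```

Passing `join` and `leq` as parameters keeps the loop generic: a must problem would supply intersection and superset instead, with the universal set as `bottom`.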
26. Properties of the Algorithm
- Lemma 1: The algorithm terminates.
- Sketch of the proof:
  - We have in_k(j) ≤ in_{k+1}(j), and since L has ACC, in(j) changes at most O(h) times. Thus, each j is put on W at most O(h) times (h is the height of the lattice L).
- Complexity: At each iteration, the analysis examines e(j)_out edges. Thus, the number of basic operations is bounded by h * (e(1)_out + ... + e(N)_out) = O(h * E).
- We can do better on reducible graphs.
27. Properties of the Algorithm
- Lemma 2: The algorithm computes the least solution of the dataflow equations.
  - For every node i, MFP computes a solution MFP(i) = (in(i), out(i)), such that every other solution (in'(i), out'(i)) of the dataflow equations is larger than the MFP
- Lemma 3: The algorithm computes a correct (safe) solution.
28. Example
Program: 1. z = x+y   2. if (z > 500)   3. skip

Equation                                  Solution 1   Solution 2
inAE(1)  = Ø                              Ø            Ø
outAE(1) = (inAE(1) - E_z) ∪ {x+y}        {x+y}        {x+y}
inAE(2)  = outAE(1) ∨ outAE(3)            {x+y}        Ø
outAE(2) = inAE(2)                        {x+y}        Ø
inAE(3)  = outAE(2)                       {x+y}        Ø
outAE(3) = inAE(3)                        {x+y}        Ø

The equation for inAE(2) is equivalent to inAE(2) = {x+y} ∨ inAE(2); recall that ∨ is ∩ here (i.e., set intersection). That is why we needed to initialize inAE(2) and the other initial values to the universal set of expressions (0 of the Available Expressions lattice), rather than to the more intuitive empty set.
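The effect of the initial value can be reproduced with a small iteration over the six equations of this example. This is a sketch under assumptions: the single expression is written as the string "x+y" (the operator is assumed), E_z is empty because no expression in the universe mentions z, and `solve` is a hypothetical helper name.

```python
XY = "x+y"              # the only expression; the operator is an assumption
U = frozenset({XY})     # universal set = 0 of the Available Expressions lattice
EMPTY = frozenset()
E_Z = frozenset()       # expressions that mention z: none in this universe

def solve(initial):
    """Iterate the equations until nothing changes; in(1) is fixed to Ø."""
    in_ = {1: EMPTY, 2: initial, 3: initial}
    out = {1: initial, 2: initial, 3: initial}
    changed = True
    while changed:
        new_out = {1: (in_[1] - E_Z) | U, 2: in_[2], 3: in_[3]}
        new_in = {1: EMPTY,
                  2: new_out[1] & new_out[3],   # join is set intersection
                  3: new_out[2]}
        changed = (new_in, new_out) != (in_, out)
        in_, out = new_in, new_out
    return in_, out

in_from_universe, _ = solve(U)
in_from_empty, _ = solve(EMPTY)
```

Starting from the universal set yields Solution 1, where x+y is available at statement 2. Starting from the empty set gets stuck at Solution 2, which also satisfies the equations but is less precise.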
29. Meet Over All Paths (MOP) Solution
- Desired dataflow information at n is obtained by traversing ALL PATHS from the entry node ρ to n. For every path p = (ρ, n1, n2, ..., nk) we compute f_nk(...f_n2(f_n1(init(ρ))))
- The MOP at entry of n is ∨ f_nk(...f_n2(f_n1(init(ρ)))), taken over all paths p from ρ to n
- The MOP is the best summary of dataflow facts possible to compute with this static analysis
[Figure: a path ρ, n1, n2, ..., nk leading to node n]
30. MOP vs. MFP
- For distributive functions the dataflow analysis can merge paths (p1, p2) without loss of precision!
  - E.g., f_p1(0) need not be calculated explicitly
  - MFP = MOP
- Due to [Kam and Ullman, 1976, 1977]. This is not true for monotone functions.
- Lemma 3: The MFP approximates the MOP for general monotone functions: MFP ≥ MOP
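Both cases can be compared on a small diamond-shaped program. The sketch below assumes two branches (p = &x; q = &y and p = &z; q = &w) that merge before one final statement. The names `address_of`, `store_through`, `mop_after` and `mfp_after` are hypothetical helpers. MOP applies the final transfer function per path and then joins. MFP joins at the merge point first and then applies it.

```python
from functools import reduce

def address_of(p, q):
    """Transfer function for p = &q: kill edges from p, add p -> q."""
    return lambda pt: frozenset(e for e in pt if e[0] != p) | {(p, q)}

def store_through(pt):
    """Transfer function for *p = q: no kill; add y -> x whenever
    p points to y and q points to x."""
    p_targets = {y for (s, y) in pt if s == "p"}
    q_targets = {x for (s, x) in pt if s == "q"}
    return frozenset(pt) | {(y, x) for y in p_targets for x in q_targets}

def run_path(functions, init):
    """Compose the transfer functions along one path."""
    return reduce(lambda fact, f: f(fact), functions, init)

EMPTY = frozenset()
left = [address_of("p", "x"), address_of("q", "y")]
right = [address_of("p", "z"), address_of("q", "w")]

def mop_after(f):
    return f(run_path(left, EMPTY)) | f(run_path(right, EMPTY))

def mfp_after(f):
    return f(run_path(left, EMPTY) | run_path(right, EMPTY))

distributive_stmt = address_of("r", "x")   # a kill/gen style statement
```

For `distributive_stmt` the two results coincide. For `store_through` the MFP result strictly contains the MOP result, so it is safe but less precise.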