Title: Program Analysis
1Program Analysis
- Mooly Sagiv
- http://www.math.tau.ac.il/sagiv/courses/pa01.html
- Tel Aviv University
- 640-6706
- Textbook: Principles of Program Analysis
- Chapter 1.5-8 (modified)
2Outline
- Mathematical Background
- Abstract Interpretation
- Type systems
- Conclusions
3Mathematical Background
- Declaratively define
- The result of the analysis
- The exact solution
- Allow comparison
4Posets
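- A partial ordering is a binary relation $\sqsubseteq\ : L \times L \to \{\mathit{false}, \mathit{true}\}$ such that
- For all $l \in L$: $l \sqsubseteq l$ (Reflexive)
- For all $l_1, l_2, l_3 \in L$: $l_1 \sqsubseteq l_2 \wedge l_2 \sqsubseteq l_3 \Rightarrow l_1 \sqsubseteq l_3$ (Transitive)
- For all $l_1, l_2 \in L$: $l_1 \sqsubseteq l_2 \wedge l_2 \sqsubseteq l_1 \Rightarrow l_1 = l_2$ (Anti-Symmetric)
- Denoted by $(L, \sqsubseteq)$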
- In program analysis
- $l_1 \sqsubseteq l_2$ $\Leftrightarrow$ $l_1$ is more precise than $l_2$ $\Leftrightarrow$ $l_1$ represents fewer concrete states than $l_2$
- Examples
- Total orders $(\mathbb{N}, \leq)$
- Powersets $(\mathcal{P}(S), \subseteq)$
- Powersets $(\mathcal{P}(S), \supseteq)$
- More notations
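- $l_1 \sqsupseteq l_2 \Leftrightarrow l_2 \sqsubseteq l_1$
- $l_1 \sqsubset l_2 \Leftrightarrow l_1 \sqsubseteq l_2 \wedge l_1 \neq l_2$
- $l_1 \sqsupset l_2 \Leftrightarrow l_2 \sqsubset l_1$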
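The three poset axioms can be checked by brute force on a small finite example. The sketch below is only an illustration: the helper names `powerset` and `is_partial_order` are hypothetical, and the carrier set {a, b, c} is chosen arbitrarily.

```python
from itertools import combinations

def powerset(s):
    """All subsets of s, as frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_partial_order(elems, leq):
    """Check reflexivity, transitivity and anti-symmetry exhaustively."""
    reflexive = all(leq(a, a) for a in elems)
    transitive = all(leq(a, c)
                     for a in elems for b in elems for c in elems
                     if leq(a, b) and leq(b, c))
    antisymmetric = all(a == b
                        for a in elems for b in elems
                        if leq(a, b) and leq(b, a))
    return reflexive and transitive and antisymmetric

subsets = powerset({"a", "b", "c"})
# Subset inclusion is a partial order on the powerset.
print(is_partial_order(subsets, lambda x, y: x <= y))            # True
# Comparing by size is reflexive and transitive but not anti-symmetric.
print(is_partial_order(subsets, lambda x, y: len(x) <= len(y)))  # False
```

The second call fails because two different singletons have the same size, so each is "below" the other without being equal.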
5Upper and Lower Bounds
- Consider a poset $(L, \sqsubseteq)$
- A subset $L' \subseteq L$ has a lower bound $l \in L$ if for all $l' \in L'$: $l \sqsubseteq l'$
- A subset $L' \subseteq L$ has an upper bound $u \in L$ if for all $l' \in L'$: $l' \sqsubseteq u$
- A greatest lower bound of a subset $L' \subseteq L$ is a lower bound $l_0 \in L$ such that $l \sqsubseteq l_0$ for any lower bound $l$ of $L'$
- A least upper bound of a subset $L' \subseteq L$ is an upper bound $u_0 \in L$ such that $u_0 \sqsubseteq u$ for any upper bound $u$ of $L'$
- For every subset $L' \subseteq L$
- The greatest lower bound of $L'$ is unique if it exists: $\sqcap L'$ (meet), $a \sqcap b$
- The least upper bound of $L'$ is unique if it exists: $\sqcup L'$ (join), $a \sqcup b$
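As a small illustration of these definitions, the following sketch computes greatest lower bounds and least upper bounds by exhaustive search. The divisibility order on the divisors of 12 is an example chosen here, not taken from the slides, and all function names are hypothetical.

```python
def lower_bounds(elems, leq, subset):
    """Elements of the poset that are below every member of subset."""
    return [l for l in elems if all(leq(l, x) for x in subset)]

def upper_bounds(elems, leq, subset):
    """Elements of the poset that are above every member of subset."""
    return [u for u in elems if all(leq(x, u) for x in subset)]

def greatest_lower_bound(elems, leq, subset):
    """The lower bound that is above all other lower bounds, or None."""
    lbs = lower_bounds(elems, leq, subset)
    greatest = [l0 for l0 in lbs if all(leq(l, l0) for l in lbs)]
    return greatest[0] if greatest else None

def least_upper_bound(elems, leq, subset):
    """The upper bound that is below all other upper bounds, or None."""
    ubs = upper_bounds(elems, leq, subset)
    least = [u0 for u0 in ubs if all(leq(u0, u) for u in ubs)]
    return least[0] if least else None

def divides(a, b):
    """a is below b when a divides b."""
    return b % a == 0

DIVISORS_OF_12 = [1, 2, 3, 4, 6, 12]
print(greatest_lower_bound(DIVISORS_OF_12, divides, {4, 6}))  # 2
print(least_upper_bound(DIVISORS_OF_12, divides, {4, 6}))     # 12
# Without 12 the subset {4, 6} has no upper bound at all.
print(least_upper_bound([1, 2, 3, 4, 6], divides, {4, 6}))    # None
```

Anti-symmetry guarantees that the candidate lists contain at most one element, which is why returning the first entry is enough.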
6Complete Lattices
- A poset $(L, \sqsubseteq)$ is a complete lattice if every subset has a least upper bound and a greatest lower bound
- $L = (L, \sqsubseteq) = (L, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$
- $\bot = \sqcup \emptyset = \sqcap L$
- $\top = \sqcup L = \sqcap \emptyset$
- Lemma: For every poset $(L, \sqsubseteq)$ the following conditions are equivalent
- L is a complete lattice
- Every subset of L has a least upper bound
- Every subset of L has a greatest lower bound
7Cartesian Products
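- A complete lattice $(L_1, \sqsubseteq_1) = (L_1, \sqsubseteq_1, \sqcup_1, \sqcap_1, \bot_1, \top_1)$
- A complete lattice $(L_2, \sqsubseteq_2) = (L_2, \sqsubseteq_2, \sqcup_2, \sqcap_2, \bot_2, \top_2)$
- Define a poset $L = (L_1 \times L_2, \sqsubseteq)$ where
- $(x_1, x_2) \sqsubseteq (y_1, y_2)$ if
- $x_1 \sqsubseteq_1 y_1$ and
- $x_2 \sqsubseteq_2 y_2$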
- L is a complete lattice
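The componentwise ordering can be sketched as follows. The two component lattices (a powerset under inclusion and the integers 0..3 under the usual order) are assumptions made for illustration, as are the helper names `product_leq` and `product_join`.

```python
def product_leq(leq1, leq2):
    """Order on pairs: compare first components by leq1, second by leq2."""
    return lambda x, y: leq1(x[0], y[0]) and leq2(x[1], y[1])

def product_join(join1, join2):
    """Join on pairs, computed component by component."""
    return lambda x, y: (join1(x[0], y[0]), join2(x[1], y[1]))

# L1: subsets of {a, b} ordered by inclusion; L2: integers 0..3 ordered by <=.
leq = product_leq(lambda a, b: a <= b, lambda m, n: m <= n)
join = product_join(lambda a, b: a | b, max)

p = (frozenset({"a"}), 2)
q = (frozenset({"b"}), 1)

print(leq(p, q), leq(q, p))                 # False False: p and q are incomparable
upper = join(p, q)
print(leq(p, upper) and leq(q, upper))      # True: the join is above both
```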
8Chains
- A subset $Y \subseteq L$ in a poset $(L, \sqsubseteq)$ is a chain if every two elements in Y are ordered
- For all $l_1, l_2 \in Y$: $l_1 \sqsubseteq l_2$ or $l_2 \sqsubseteq l_1$
- An ascending chain is a sequence of values
- $l_1 \sqsubseteq l_2 \sqsubseteq l_3 \sqsubseteq \ldots$
- A strictly ascending chain is a sequence of values
- $l_1 \sqsubset l_2 \sqsubset l_3 \sqsubset \ldots$
- A descending chain is a sequence of values
- $l_1 \sqsupseteq l_2 \sqsupseteq l_3 \sqsupseteq \ldots$
- A strictly descending chain is a sequence of values
- $l_1 \sqsupset l_2 \sqsupset l_3 \sqsupset \ldots$
- L has a finite height if every chain in L is finite
- Lemma: A poset $(L, \sqsubseteq)$ has finite height if and only if all strictly decreasing and strictly increasing chains are finite
9Monotone Functions
- A poset $(L, \sqsubseteq)$
- A function $f: L \to L$ is monotone if for every $l_1, l_2 \in L$
- $l_1 \sqsubseteq l_2 \Rightarrow f(l_1) \sqsubseteq f(l_2)$
10Fixed Points
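- A monotone function $f: L \to L$ where $(L, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$ is a complete lattice
- $\mathit{Fix}(f) = \{ l \mid l \in L,\ f(l) = l \}$
- $\mathit{Red}(f) = \{ l \mid l \in L,\ f(l) \sqsubseteq l \}$
- $\mathit{Ext}(f) = \{ l \mid l \in L,\ l \sqsubseteq f(l) \}$
- $l_1 \sqsubseteq l_2 \Rightarrow f(l_1) \sqsubseteq f(l_2)$
- Tarski's Theorem (1955): if f is monotone then
- $\mathit{lfp}(f) = \sqcap \mathit{Fix}(f) = \sqcap \mathit{Red}(f) \in \mathit{Fix}(f)$
- $\mathit{gfp}(f) = \sqcup \mathit{Fix}(f) = \sqcup \mathit{Ext}(f) \in \mathit{Fix}(f)$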
[Figure: lattice diagram marking gfp(f) and lfp(f)]
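The statement of Tarski's theorem can be checked on a tiny powerset lattice. This is only a sketch: the function `f` below is an arbitrary monotone example invented for illustration, and meet and join are set intersection and union.

```python
from functools import reduce
from itertools import combinations

UNIVERSE = frozenset({"a", "b", "c"})

def powerset(s):
    """All subsets of s, as frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

L = powerset(UNIVERSE)

def f(x):
    """An arbitrary monotone function on the powerset lattice (illustrative)."""
    return frozenset({"a"}) | (x & frozenset({"a", "b"}))

def meet(sets):
    """Greatest lower bound of a family of subsets: their intersection."""
    return reduce(frozenset.intersection, sets, UNIVERSE)

def join(sets):
    """Least upper bound of a family of subsets: their union."""
    return reduce(frozenset.union, sets, frozenset())

is_monotone = all(f(a) <= f(b) for a in L for b in L if a <= b)
fix = [x for x in L if f(x) == x]   # Fix(f)
red = [x for x in L if f(x) <= x]   # Red(f)
ext = [x for x in L if x <= f(x)]   # Ext(f)

lfp = meet(red)
gfp = join(ext)
print(is_monotone)               # True
print(sorted(lfp), sorted(gfp))  # ['a'] ['a', 'b']
```

Here the meet of Red(f) and the meet of Fix(f) coincide, and the result is itself a fixed point, as the theorem states.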
11Chaotic Iterations
- A lattice $L = (L, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$ with finite strictly increasing chains
- $L^n = L \times L \times \ldots \times L$
- A monotone function $f: L^n \to L^n$
- Compute lfp(f)
- The simultaneous least fixed point of the system $\{ x_i = f_i(x) : 1 \leq i \leq n \}$

for i := 1 to n do $x_i := \bot$
WL := {1, 2, ..., n}
while (WL $\neq \emptyset$) do
    select and remove an element $i \in$ WL
    new := $f_i(x)$
    if (new $\neq x_i$) then
        $x_i$ := new
        add all the indexes that directly depend on i to WL

x := $(\bot, \bot, \ldots, \bot)$
while ($f(x) \neq x$) do
    x := $f(x)$
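Both loops above can be sketched in Python. The three-equation system over sets is a made-up example, and the names `chaotic_iteration`, `kleene_iteration`, `fs` and `deps` are hypothetical. `deps[i]` is assumed to list the indexes whose equations read component i.

```python
def chaotic_iteration(fs, deps, bottom):
    """Worklist solver: fs[i] computes component i from the whole vector x."""
    x = [bottom] * len(fs)
    worklist = set(range(len(fs)))
    while worklist:
        i = worklist.pop()             # select and remove an element
        new = fs[i](x)
        if new != x[i]:
            x[i] = new
            worklist.update(deps[i])   # re-examine equations that read x[i]
    return tuple(x)

def kleene_iteration(f, bottom_vector):
    """Naive solver: apply f to the whole vector until nothing changes."""
    x = bottom_vector
    while f(x) != x:
        x = f(x)
    return x

# Example system over the powerset lattice (bottom is the empty set):
#   x0 = {s}      x1 = x0 U x2      x2 = x1 U {t}
fs = [
    lambda x: frozenset({"s"}),
    lambda x: x[0] | x[2],
    lambda x: x[1] | frozenset({"t"}),
]
deps = {0: [1], 1: [2], 2: [1]}
bottom = frozenset()

by_worklist = chaotic_iteration(fs, deps, bottom)
by_kleene = kleene_iteration(lambda x: tuple(fi(x) for fi in fs),
                             (bottom, bottom, bottom))
print(by_worklist == by_kleene)  # True
```

The worklist version re-evaluates only the equations whose inputs changed, which is why the order of selection does not affect the result for monotone equations.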
12The Abstract Interpretation Technique
- The foundation of program analysis
- Goals
- Establish soundness of (find faults in) a given program analysis algorithm
- Design new program analysis algorithms
- The main ideas
- Relate each step in the algorithm to a step in a structural semantics
- Establish global correctness using a general theorem
- Not limited to a particular form of analysis
13Soundness in Reaching Definitions
- Every reachable definition is detected
- May include more definitions
- Fewer constants may be identified
- Not all the loop invariant code will be identified
- May warn against uninitialized variables that are in fact initialized
- At every elementary block l, RDentry(l) includes all the possible definitions reaching l
- At every elementary block l, RDentry(l) represents all the possible concrete states arising when the structural operational semantics reaches l
14Proof of Soundness
- Define an appropriate structural operational semantics
- Define collecting structural operational semantics
- Establish a Galois connection between collecting states and reaching definitions
- (Local correctness) Show that the abstract interpretation of every atomic statement is sound w.r.t. the collecting semantics
- (Global correctness) Conclude that the analysis is sound [CC1976]
15Structural Operational Semantics to justify Reaching Definitions
- Normal states $(\mathrm{Var} \to \mathbb{Z})$ are not enough
- Instrumented states $(\mathrm{Var} \to \mathbb{Z}) \times (\mathrm{Var} \to \mathrm{Lab})$
- For an instrumented state (s, def) and variable x, def(x) holds the last definition of x
16Instrumented Structural Semantics for While
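Axioms
- [ass_sos] $\langle [x := a]^{l}, (s, d) \rangle \to (s[x \mapsto \mathcal{A}[\![a]\!]s],\ d[x \mapsto l])$
- [skip_sos] $\langle [\mathtt{skip}]^{l}, (s, d) \rangle \to (s, d)$
Rules
[Inference rules not preserved in the extracted text]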
17Instrumented Structural Semantics if construct
18Instrumented Structural Semantics while construct
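- [while_sos] $\langle \mathtt{while}\ [b]^{l}\ \mathtt{do}\ S, (s, d) \rangle \to \langle \mathtt{if}\ [b]^{l}\ \mathtt{then}\ (S;\ \mathtt{while}\ [b]^{l}\ \mathtt{do}\ S)\ \mathtt{else}\ \mathtt{skip}, (s, d) \rangle$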
19The Factorial Program
[y := x]^1; [z := 1]^2; while [y > 1]^3 do ([z := z * y]^4; [y := y - 1]^5); [y := 0]^6
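To see what the instrumented state records, the factorial program can be transcribed by hand into Python with an explicit pair (s, d). This is a sketch under stated assumptions: `run_factorial` is a hypothetical name, the initial values of y and z are set to 0 for concreteness, and the label "?" stands for "not yet assigned", as in the later definition of d?.

```python
def run_factorial(x0):
    """Run the labelled factorial program on an instrumented state (s, d)."""
    s = {"x": x0, "y": 0, "z": 0}          # values; initial y, z are an assumption
    d = {"x": "?", "y": "?", "z": "?"}     # label of the last definition of each variable

    s["y"] = s["x"]; d["y"] = 1            # [y := x]^1
    s["z"] = 1; d["z"] = 2                 # [z := 1]^2
    while s["y"] > 1:                      # [y > 1]^3, a test leaves (s, d) unchanged
        s["z"] = s["z"] * s["y"]; d["z"] = 4   # [z := z * y]^4
        s["y"] = s["y"] - 1; d["y"] = 5        # [y := y - 1]^5
    s["y"] = 0; d["y"] = 6                 # [y := 0]^6
    return s, d

s, d = run_factorial(4)
print(s["z"], d)   # 24 {'x': '?', 'y': 6, 'z': 4}
s, d = run_factorial(1)
print(s["z"], d)   # 1 {'x': '?', 'y': 6, 'z': 2}
```

When the loop body never runs, z is still defined at label 2, which is why both (z, 2) and (z, 4) must be considered possible at the end of the program.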
20Code Instrumentation
- Alternative instrumentation
- Generate an equivalent program which maintains more information
- Use standard structural operational semantics
21Other Consumers of Instrumentation
- Specialized interpreters
- Code Instrumentation
- Performance analysis (qpt)
- Count the number of executions of basic blocks or the number of calls to a function
- Profiling tools
- Find hot paths (paths that are executed often) by remembering which edge in the control flow graph was executed
- Cleanness tools (Purify, Insure)
- Identify uninitialized objects
22Collecting (Instrumented) Semantics
- The input state is not known at compile-time
- Collect all the (instrumented) states for all possible inputs to the program
- No loss of precision
23Flow Information for While
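- Associate labels with program statements describing when statements begin and end
- $\mathit{init}: \mathrm{Stm} \to \mathrm{Lab}$
- $\mathit{init}([x := a]^{l}) = l$
- $\mathit{init}([\mathtt{skip}]^{l}) = l$
- $\mathit{init}(S_1; S_2) = \mathit{init}(S_1)$
- $\mathit{init}(\mathtt{if}\ [b]^{l}\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2) = l$
- $\mathit{init}(\mathtt{while}\ [b]^{l}\ \mathtt{do}\ S) = l$
- $\mathit{final}: \mathrm{Stm} \to \mathcal{P}(\mathrm{Lab})$
- $\mathit{final}([x := a]^{l}) = \{l\}$
- $\mathit{final}([\mathtt{skip}]^{l}) = \{l\}$
- $\mathit{final}(S_1; S_2) = \mathit{final}(S_2)$
- $\mathit{final}(\mathtt{if}\ [b]^{l}\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2) = \mathit{final}(S_1) \cup \mathit{final}(S_2)$
- $\mathit{final}(\mathtt{while}\ [b]^{l}\ \mathtt{do}\ S) = \{l\}$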
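The two functions translate directly into a recursive traversal of the syntax tree. The AST classes below are a hypothetical encoding of While chosen for this sketch; expressions and conditions are kept as plain strings because init and final never inspect them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assign:
    var: str
    expr: str
    label: int

@dataclass(frozen=True)
class Skip:
    label: int

@dataclass(frozen=True)
class Seq:
    first: object
    second: object

@dataclass(frozen=True)
class If:
    cond: str
    label: int
    then: object
    orelse: object

@dataclass(frozen=True)
class While:
    cond: str
    label: int
    body: object

def init(stmt):
    """Label at which execution of stmt begins."""
    if isinstance(stmt, (Assign, Skip, If, While)):
        return stmt.label
    if isinstance(stmt, Seq):
        return init(stmt.first)
    raise TypeError(f"unknown statement: {stmt!r}")

def final(stmt):
    """Set of labels at which execution of stmt may end."""
    if isinstance(stmt, (Assign, Skip, While)):
        return {stmt.label}
    if isinstance(stmt, Seq):
        return final(stmt.second)
    if isinstance(stmt, If):
        return final(stmt.then) | final(stmt.orelse)
    raise TypeError(f"unknown statement: {stmt!r}")

# The factorial program with labels 1..6.
loop = While("y > 1", 3, Seq(Assign("z", "z * y", 4), Assign("y", "y - 1", 5)))
factorial = Seq(Assign("y", "x", 1),
                Seq(Assign("z", "1", 2),
                    Seq(loop, Assign("y", "0", 6))))

print(init(factorial), final(factorial))  # 1 {6}
print(init(loop), final(loop))            # 3 {3}
```

A while loop ends at its own test label because the loop is left only when the condition evaluates to false.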
24Collecting (Instrumented) Semantics(Cont)
- The input state is not known at compile-time
- Collect all the (instrumented) states for all possible inputs to the program
- Define $d_? \in \mathrm{Var} \to \mathrm{Lab}$ by $d_?(x) = {?}$ for every variable x
- $CS_{entry}(l) = \{ (s, d) \mid \exists s_0: \langle P, (s_0, d_?) \rangle \to^{*} \langle S, (s, d) \rangle,\ \mathit{init}(S) = l \}$
- Soundness w.r.t. operational semantics: for all (s, d) in $CS_{entry}(l)$ and for all variables x: $(x, d(x)) \in RD_{entry}(l)$
- Optimality w.r.t. operational semantics
25The Factorial Program
[y := x]^1; [z := 1]^2; while [y > 1]^3 do ([z := z * y]^4; [y := y - 1]^5); [y := 0]^6
26An Iterative Definition
- Generate a system of monotonic equations
- The least solution is well-defined
- The least solution is the collecting
interpretation
27Equations Generated for Collecting Interpretation
- Equations for elementary statements
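- $[\mathtt{skip}]^{l}$: $CS_{exit}(l) = CS_{entry}(l)$
- $[b]^{l}$: $CS_{exit}(l) = CS_{entry}(l)$
- $[x := a]^{l}$: $CS_{exit}(l) = \{ (s[x \mapsto \mathcal{A}[\![a]\!]s],\ d[x \mapsto l]) \mid (s, d) \in CS_{entry}(l) \}$
- Equations for control flow constructs: $CS_{entry}(l) = \bigcup \{ CS_{exit}(l') \mid l'$ immediately precedes $l$ in the control flow graph $\}$
- An equation for the entry: $CS_{entry}(1) = \{ (s_0, d_?) \mid s_0 \in \mathrm{Var} \to \mathbb{Z} \}$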
28The Least Solution
- 12 sets of equations: $CS_{entry}(1), \ldots, CS_{exit}(6)$
- Can be written in vectorial form
- The least solution $\mathit{lfp}(F_{cs})$ is well-defined
- Every component is minimal
- Since $F_{cs}$ is monotonic such a solution always exists
- $CS_{entry}(l) = \{ (s, d) \mid \exists s_0: \langle P, (s_0, d_?) \rangle \to^{*} \langle S, (s, d) \rangle,\ \mathit{init}(S) = l \}$
- Simplify the soundness criteria
29Abstract (Conservative) interpretation
[Figure: abstract representation]
30The Abstraction Function
- Map collecting states into reaching definitions
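- The abstraction of an individual state: $\beta: (\mathrm{Var} \to \mathbb{Z}) \times (\mathrm{Var} \to \mathrm{Lab}) \to \mathcal{P}(\mathrm{Var} \times \mathrm{Lab})$, $\beta(s, d) = \{ (x, d(x)) \mid x \in \mathrm{Var} \}$
- The abstraction of a set of states: $\alpha: \mathcal{P}((\mathrm{Var} \to \mathbb{Z}) \times (\mathrm{Var} \to \mathrm{Lab})) \to \mathcal{P}(\mathrm{Var} \times \mathrm{Lab})$, $\alpha(CS) = \bigcup_{(s, d) \in CS} \beta(s, d) = \{ (x, d(x)) \mid (s, d) \in CS,\ x \in \mathrm{Var} \}$
- Soundness: $\alpha(CS_{entry}(l)) \subseteq RD_{entry}(l)$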
- Optimality
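The abstraction of states into reaching definitions can be sketched in a few lines. The representation of an instrumented state as a pair of dictionaries is an assumption of this example, and the two sample states are picked by hand as states that can arise at the entry of label 3 of the factorial program when x is 3.

```python
def beta(state):
    """Abstract one instrumented state (s, d): keep only (variable, label) pairs."""
    _s, d = state
    return {(x, d[x]) for x in d}

def alpha(states):
    """Abstract a collection of states: union of the individual abstractions."""
    result = set()
    for state in states:
        result |= beta(state)
    return result

# Two states at the loop test: on first arrival and after one iteration.
first_arrival = ({"x": 3, "y": 3, "z": 1}, {"x": "?", "y": 1, "z": 2})
after_one_iteration = ({"x": 3, "y": 2, "z": 3}, {"x": "?", "y": 5, "z": 4})

rd = alpha([first_arrival, after_one_iteration])
print(("y", 1) in rd, ("y", 5) in rd, ("y", 6) in rd)  # True True False
```

The values in s play no part in the result, which is exactly the information that the reaching definitions abstraction discards.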
31The Concretization Function
- Map reaching definitions into collecting states
- The formal meaning of reaching definitions
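- The concretization: $\gamma: \mathcal{P}(\mathrm{Var} \times \mathrm{Lab}) \to \mathcal{P}((\mathrm{Var} \to \mathbb{Z}) \times (\mathrm{Var} \to \mathrm{Lab}))$, $\gamma(RD) = \{ (s, d) \mid \forall x \in \mathrm{Var}: (x, d(x)) \in RD \} = \{ (s, d) \mid \beta(s, d) \subseteq RD \}$
- Soundness: $CS_{entry}(l) \subseteq \gamma(RD_{entry}(l))$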
- Optimality
32Galois Connections
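- The pair of functions $(\alpha, \gamma)$ forms a Galois connection if $\forall CS \in \mathcal{P}((\mathrm{Var} \to \mathbb{Z}) \times (\mathrm{Var} \to \mathrm{Lab}))$, $\forall RD \in \mathcal{P}(\mathrm{Var} \times \mathrm{Lab})$: $\alpha(CS) \subseteq RD$ iff $CS \subseteq \gamma(RD)$
- Alternatively: $\forall CS \in \mathcal{P}((\mathrm{Var} \to \mathbb{Z}) \times (\mathrm{Var} \to \mathrm{Lab}))$, $\forall RD \in \mathcal{P}(\mathrm{Var} \times \mathrm{Lab})$: $\alpha(\gamma(RD)) \subseteq RD$ and $CS \subseteq \gamma(\alpha(CS))$
- $\alpha$ and $\gamma$ uniquely determine each other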
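Both formulations of the Galois connection can be tested exhaustively on a deliberately tiny universe. Everything about the universe is an assumption made to keep the search finite: two variables, two labels, values restricted to 0 and 1, and only collecting sets with at most two states.

```python
from itertools import combinations, product

VARS = ("x", "y")
LABS = ("?", 1)
VALS = (0, 1)

def all_maps(keys, values):
    """Every total map from keys to values, encoded as a tuple of pairs."""
    return [tuple(zip(keys, combo)) for combo in product(values, repeat=len(keys))]

STATES = [(s, d) for s in all_maps(VARS, VALS) for d in all_maps(VARS, LABS)]
PAIRS = [(x, l) for x in VARS for l in LABS]

def beta(state):
    """Abstraction of one state: the (variable, label) pairs of d."""
    _s, d = state
    return frozenset(d)

def alpha(cs):
    """Abstraction of a set of states."""
    return frozenset().union(*(beta(state) for state in cs))

def gamma(rd):
    """Concretization: all states whose abstraction is contained in rd."""
    return frozenset(state for state in STATES if beta(state) <= rd)

def subsets(items, max_size):
    """All subsets of items with at most max_size elements."""
    return [frozenset(c) for r in range(max_size + 1)
            for c in combinations(items, r)]

RDS = subsets(PAIRS, len(PAIRS))   # every RD over the tiny universe
CSS = subsets(STATES, 2)           # collecting sets with at most two states

adjunction = all((alpha(cs) <= rd) == (cs <= gamma(rd)) for cs in CSS for rd in RDS)
reductive = all(alpha(gamma(rd)) <= rd for rd in RDS)
extensive = all(cs <= gamma(alpha(cs)) for cs in CSS)
print(adjunction, reductive, extensive)  # True True True
```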
33Local Concrete Semantics
- For every atomic statement S
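- $[\![S]\!]: (\mathrm{Var} \to \mathbb{Z}) \times (\mathrm{Var} \to \mathrm{Lab}) \to (\mathrm{Var} \to \mathbb{Z}) \times (\mathrm{Var} \to \mathrm{Lab})$
- $[\![ [x := a]^{l} ]\!](s, d) = (s[x \mapsto \mathcal{A}[\![a]\!]s],\ d[x \mapsto l])$
- $[\![ [\mathtt{skip}]^{l} ]\!](s, d) = (s, d)$
- $[\![ [b]^{l} ]\!](s, d) = (s, d)$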
34Local Abstract Semantics
- For every atomic statement S
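- $[\![S]\!]^{\#}: \mathcal{P}(\mathrm{Var} \times \mathrm{Lab}) \to \mathcal{P}(\mathrm{Var} \times \mathrm{Lab})$
- $[\![ [x := a]^{l} ]\!]^{\#}(RD) = (RD - \{ (x, l') \mid l' \in \mathrm{Lab} \}) \cup \{ (x, l) \}$
- $[\![ [\mathtt{skip}]^{l} ]\!]^{\#}(RD) = RD$
- $[\![ [b]^{l} ]\!]^{\#}(RD) = RD$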
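Combining these abstract transfer functions with the control flow of the factorial program gives a complete reaching definitions analysis. The sketch below hard-codes the flow graph and the assigned variable of each label, both written out by hand from the program text, and uses a simple round-robin iteration rather than a worklist.

```python
ASSIGNS = {1: "y", 2: "z", 4: "z", 5: "y", 6: "y"}   # label -> assigned variable (3 is a test)
FLOW = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 3), (3, 6)]
LABELS = range(1, 7)
VARS = ("x", "y", "z")

def transfer(label, rd):
    """Abstract semantics of the elementary block at label."""
    x = ASSIGNS.get(label)
    if x is None:                       # tests and skip leave RD unchanged
        return rd
    return frozenset(p for p in rd if p[0] != x) | {(x, label)}

def reaching_definitions():
    """Iterate the RD equations from the empty sets until they stabilise."""
    entry = {l: frozenset() for l in LABELS}
    exit_ = {l: frozenset() for l in LABELS}
    changed = True
    while changed:
        changed = False
        for l in LABELS:
            if l == 1:
                new_entry = frozenset((x, "?") for x in VARS)
            else:
                new_entry = frozenset().union(
                    *(exit_[p] for p, q in FLOW if q == l))
            new_exit = transfer(l, new_entry)
            if (new_entry, new_exit) != (entry[l], exit_[l]):
                entry[l], exit_[l] = new_entry, new_exit
                changed = True
    return entry, exit_

entry, exit_ = reaching_definitions()
print(sorted(entry[3], key=str))
# [('x', '?'), ('y', 1), ('y', 5), ('z', 2), ('z', 4)]
```

Starting from empty sets and only growing them yields the least solution, in line with the earlier fixed point slides.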
35Local Soundness
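- For every atomic statement S show one of the following
- $\alpha(\{ [\![S]\!](s, d) \mid (s, d) \in CS \}) \subseteq [\![S]\!]^{\#}(\alpha(CS))$
- $\{ [\![S]\!](s, d) \mid (s, d) \in \gamma(RD) \} \subseteq \gamma([\![S]\!]^{\#}(RD))$
- $\alpha(\{ [\![S]\!](s, d) \mid (s, d) \in \gamma(RD) \}) \subseteq [\![S]\!]^{\#}(RD)$
- The above condition implies global soundness [Cousot & Cousot 1976]: $\alpha(CS_{entry}(l)) \subseteq RD_{entry}(l)$ and $CS_{entry}(l) \subseteq \gamma(RD_{entry}(l))$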
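The first of the three conditions can be checked on a concrete instance. This sketch tests it for the single statement [z := z * y]^4 on two hand-picked states. It is not a proof, and the function names and the use of a Python callable for the expression are assumptions of the example.

```python
def concrete_assign(x, expr, label, state):
    """Concrete semantics of [x := a]^label; expr maps the store s to a value."""
    s, d = state
    return ({**s, x: expr(s)}, {**d, x: label})

def abstract_assign(x, label, rd):
    """Abstract semantics of [x := a]^label on a set of reaching definitions."""
    return {(v, l) for (v, l) in rd if v != x} | {(x, label)}

def alpha(states):
    """Abstraction of a collection of instrumented states."""
    return {(v, d[v]) for (_s, d) in states for v in d}

# Two states that may arise at the entry of label 4 when x is 3.
cs = [
    ({"x": 3, "y": 3, "z": 1}, {"x": "?", "y": 1, "z": 2}),
    ({"x": 3, "y": 2, "z": 3}, {"x": "?", "y": 5, "z": 4}),
]

# Left side: run the statement concretely, then abstract.
lhs = alpha([concrete_assign("z", lambda s: s["z"] * s["y"], 4, st) for st in cs])
# Right side: abstract first, then run the abstract statement.
rhs = abstract_assign("z", 4, alpha(cs))

print(lhs <= rhs)  # True
```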
36Proof of Soundness (Summary)
- Define an appropriate structural operational semantics
- Define collecting structural operational semantics
- Establish a Galois connection between collecting states and reaching definitions
- (Local correctness) Show that the abstract interpretation of every atomic statement is sound w.r.t. the collecting semantics
- (Global correctness) Conclude that the analysis is sound
37Induced Analysis (Relatively Optimal)
- It is sometimes possible to show that a given analysis is not only sound but optimal w.r.t. the chosen abstraction (but not necessarily the most precise analysis overall)
- Define $[\![S]\!]^{\#}(RD) = \alpha(\{ [\![S]\!](s, d) \mid (s, d) \in \gamma(RD) \})$
- But this $[\![S]\!]^{\#}$ may not be computable
- Derive (at compiler-generation time) an alternative form for $[\![S]\!]^{\#}$
- A useful measure to decide if the abstraction must lead to overly imprecise results
38Type and Effect Systems
- The type of a program expression at a given program point provides a conservative estimate of its value in all the execution paths
- A type system provides syntax-directed rules for annotating expressions with types
- Simple type inference algorithms are linear
- But in Ada, ML, ABC
- But types can also include implementation
information such as reaching definitions
39Annotated Type Base for Reaching Definitions
- $S : RD_1 \to RD_2$ if, when S is executed with reaching definitions $RD_1$, it produces reaching definitions $RD_2$
- Similar to the constraint based approach
40Annotated Type Base for Reaching Definitions
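Axioms
- [ass] $[x := a]^{l} : RD \to (RD - \{ (x, l') \mid l' \in \mathrm{Lab} \}) \cup \{ (x, l) \}$
- [skip] $[\mathtt{skip}]^{l} : RD \to RD$
Rules
[Inference rules not preserved in the extracted text]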
41Annotated Type Base For While while construct
42Annotated Type Base For While subsumption rule
43Not Covered
- Effect Systems
- Transformations
44Conclusions
- Three similar techniques
- Dataflow analysis
- Constraint based approach (a generalization)
- Type and effect system (directly deals with the syntax)
- Abstract interpretation can be used to show soundness of these methods
- But more convenient in the dataflow setting
- We are ready for more sophisticated analyses