Title: Static Program Analysis
1. Static Program Analysis
2. Outline
- Spectrum of program analysis
- Dynamic analysis
- Model checking
- Static analysis
- Data flow analysis
3. Spectrum of Program Analysis
- Dynamic analysis
  - Reasoning about specific executions.
- Model checking
  - Reasoning about all feasible executions.
  - Both symbolic and explicit-state model checkers have that capability.
- Static program analysis
  - Reasoning about all possible executions (including infeasible ones).
- 1. x = 1
- 2. read(y)
- 3. if (y 10)
- 4. if (y
- 5. x--
- 6.
- 7. z = 10 / x
- 8. z = 10 / y
4. Static Program Analysis
- Flow-sensitive analyses
  - The order of statements matters
  - Need a control flow graph
- Flow-insensitive analyses
  - The order of statements doesn't matter
  - Analysis is the same regardless of statement order
5. Example of Flow-Insensitive Analysis
- What variables does a program modify?
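As a rough illustration of a flow-insensitive analysis, the sketch below collects every variable that a program assigns, ignoring statement order entirely. It uses Python's standard `ast` module on a small Python program; the sample program and the function name `modified_variables` are assumptions made for this example, not part of the slides.

```python
import ast

def modified_variables(source):
    """Return the set of names assigned anywhere in `source`.

    No control flow graph is built: the syntax tree is walked in
    arbitrary order, which is what makes this flow-insensitive.
    """
    tree = ast.parse(source)
    return {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    }

# Hypothetical sample program, written in Python for convenience.
program = """
x = 1
if y > 10:
    x -= 1
z = 10 / x
"""

print(sorted(modified_variables(program)))  # ['x', 'z']
```

Reordering the statements of `program` would not change the answer, which is why a single global result is enough.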
6. Example of Flow-Sensitive Analysis
- 1. x = 1
- 2. read(y)
- 3. if (y 10)
- 4. if (y
- 5. x--
- 6.
- 7. z = 10 / x
- 8. z = 10 / y
7. The Advantage
- Flow-sensitive analyses require a model of program state at each program point
  - E.g., liveness analysis, reaching definitions
- Flow-insensitive analyses require only a single global state
  - E.g., for G, the set of all variables modified
8. Notes on Flow Sensitivity
- Flow-insensitive analyses seem weak, but
- Flow-sensitive analyses are hard to scale to very large programs
  - Additional cost: state size × number of program points
- Beyond 1000s of lines of code, only flow-insensitive analyses have been shown to scale (Alex Aiken, 2001)
9. The Essence of Flow-Sensitive Analysis
- The goal is to achieve results that are equivalent to enumerating all possible execution paths and combining the results.
- Problems with doing so
  - Path explosion
  - Loops/recursion
- The most popular solution
  - Data flow analysis
10. The Essence of Data Flow Analysis (1): Merging
- Reaching definitions: find the set of definitions that can reach each program point. A definition is a statement that defines a variable.
  - RD is very useful
- A naïve way is to enumerate all the paths.
1. if (p1)
2. x
3. y
4. if (x)
7.
8.
$RD_{in}(4) = \{2, 3\}$, $RD_{out}(5) = \{3, 5\}$, $RD_{in}(8) = \{2, 3, 5\}$
- In DFA, merging is used to achieve efficiency. Analysis results are merged at control flow join points, and the merged results are used in further analysis (like dynamic programming).
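To make the contrast between path enumeration and merging concrete, here is a minimal sketch that computes reaching definitions both ways on a small acyclic control flow graph. The CFG (two diamonds in a row) and all names are assumptions chosen for illustration; it is not the example from the slide.

```python
# Hypothetical CFG: node -> variable it defines (None for branches and joins),
# and node -> successor nodes. Nodes 2/3 define x, nodes 5/6 define y.
DEFS = {1: None, 2: "x", 3: "x", 4: None, 5: "y", 6: "y", 7: None}
SUCC = {1: [2, 3], 2: [4], 3: [4], 4: [5, 6], 5: [7], 6: [7], 7: []}

def transfer(node, rd_in):
    """Kill older definitions of the variable defined here, then add this node."""
    var = DEFS[node]
    if var is None:
        return set(rd_in)
    return {d for d in rd_in if DEFS[d] != var} | {node}

def all_paths(node, goal):
    if node == goal:
        yield [node]
        return
    for nxt in SUCC[node]:
        for rest in all_paths(nxt, goal):
            yield [node] + rest

def rd_by_enumeration(entry, goal):
    """RD_in(goal) as the union of the facts computed along every path."""
    result, path_count = set(), 0
    for path in all_paths(entry, goal):
        path_count += 1
        facts = set()
        for node in path[:-1]:          # every node before the goal
            facts = transfer(node, facts)
        result |= facts
    return result, path_count

def rd_by_merging(order, goal):
    """RD_in(goal) with one visit per node, merging at join points."""
    rd_in, rd_out = {}, {}
    for node in order:                  # `order` must be topological
        preds = [p for p in order if node in SUCC[p]]
        rd_in[node] = set().union(*(rd_out[p] for p in preds))
        rd_out[node] = transfer(node, rd_in[node])
    return rd_in[goal]

print(rd_by_enumeration(1, 7))                  # ({2, 3, 5, 6}, 4)
print(rd_by_merging([1, 2, 3, 4, 5, 6, 7], 7))  # {2, 3, 5, 6}
```

Enumeration walks 4 paths here and the count doubles with every additional diamond, while merging visits each node once.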
11. The Conservativeness of Merging
- 1. if (P)
- 2. x = -1
- 3. else
- 4. x = 1
- 5. y = x * x
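One way to see the precision loss is with constant propagation, which is an assumption here since the slide does not name the analysis. Taking the fragment above to be `x = -1` in one branch, `x = 1` in the other, followed by `y = x * x`, the sketch compares analyzing each path separately with merging at the join first. `TOP` is a hypothetical abstract value meaning "not a single known constant".

```python
TOP = "unknown"   # abstract value: not a single known constant

def merge(a, b):
    """Join two abstract values at a control flow join point."""
    return a if a == b else TOP

def multiply(a, b):
    """Abstract multiplication: only defined precisely on known constants."""
    if a == TOP or b == TOP:
        return TOP
    return a * b

# Path-by-path: evaluate y on each branch, then combine the results.
y_then = multiply(-1, -1)
y_else = multiply(1, 1)
y_per_path = merge(y_then, y_else)

# Merging: combine x at the join, then evaluate y once.
x_merged = merge(-1, 1)
y_merged = multiply(x_merged, x_merged)

print(y_per_path)  # 1
print(y_merged)    # unknown
```

Both answers are sound, but the merged one is less precise: the analysis no longer knows that `y` is always 1.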
12. The Essence of DFA (2): Termination
- In the presence of loops and recursion, termination is a problem.
  - Assume we want to compute reaching definitions for the following code.
- In DFA, the analysis designer has to promise
  - A finite domain
  - Monotonic computation
- What if we change the analysis a little bit?
13. Reaching definitions generalized
- Computed information at a program point is a set of $var \to stmt$ bindings
  - e.g., $\{x \to s_1,\ x \to s_2,\ y \to s_3\}$
- How do we get the previous info we wanted?
  - If a variable x is used in a statement whose incoming info is $in$, then the definitions reaching that use are $\{s \mid (x \to s) \in in\}$
- This is a common pattern
  - generalize the problem to define what information should be computed at each program point
  - use the computed information at the program points to get the original info we wanted
14.
1. if (p1)
2. x
3. y
4. if (x)
7.
8.
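15. Constraints for reaching definitions
- For an assignment $s: x := \ldots$ with information $in$ before it and $out$ after it:
  $out = (in - \{x \to s' \mid s' \in stmts\}) \cup \{x \to s\}$
- For a store through a pointer $s: *p := \ldots$:
  $out = (in - \{x \to s' \mid x \in \text{must-point-to}(p) \land s' \in stmts\}) \cup \{x \to s \mid x \in \text{may-point-to}(p)\}$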
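The two constraints above, for an assignment to a variable and for a store through a pointer, can be sketched as functions over sets of (variable, statement) bindings. The function names and the sample bindings are assumptions for illustration; the points-to sets are simply passed in rather than computed.

```python
def assign(rd_in, var, stmt):
    """Flow function for `stmt: var := ...`: kill every binding of var, add var -> stmt."""
    return {(v, s) for (v, s) in rd_in if v != var} | {(var, stmt)}

def store_through_pointer(rd_in, stmt, must_point_to, may_point_to):
    """Flow function for `stmt: *p := ...`.

    Only variables p definitely points to are killed; every variable
    p might point to receives a new binding to stmt.
    """
    kept = {(v, s) for (v, s) in rd_in if v not in must_point_to}
    return kept | {(v, stmt) for v in may_point_to}

def reaching_defs(rd_in, var):
    """Recover the original answer: which statements' definitions of var reach here."""
    return {s for (v, s) in rd_in if v == var}

rd_in = {("x", "s1"), ("x", "s2"), ("y", "s3")}

after_assign = assign(rd_in, "x", "s4")
# p may point to x or y, but nothing is certain, so nothing is killed.
after_store = store_through_pointer(rd_in, "s5", must_point_to=set(), may_point_to={"x", "y"})

print(sorted(reaching_defs(after_assign, "x")))  # ['s4']
print(sorted(reaching_defs(after_store, "x")))   # ['s1', 's2', 's5']
```

When `must_point_to` and `may_point_to` are both `{"x"}`, the pointer store behaves exactly like the direct assignment.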
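16. Constraints for reaching definitions
- For a branch $s: \text{if } (\ldots)$ with incoming information $in$ and outgoing edges $out[0]$, $out[1]$:
  $out[0] = in \land out[1] = in$
  - more generally: $\forall i.\ out[i] = in$
- For a merge with incoming edges $in[0]$, $in[1]$ and outgoing information $out$:
  $out = in[0] \cup in[1]$
  - more generally: $out = \bigcup_i in[i]$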
17. Flow functions
- The constraint for a statement kind s often has the form $out = F_s(in)$
- $F_s$ is called a flow function
  - other names for it: dataflow function, transfer function
- Given information $in$ before statement s, $F_s(in)$ returns information after statement s
- Distributivity: $F_s(a \cup b) = F_s(a) \cup F_s(b)$
  - If so, no precision loss by merging.
- Example, for $s: x := \ldots$: $out = (in - \{x \to s' \mid s' \in stmts\}) \cup \{x \to s\}$
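Distributivity of a flow function, meaning $F_s(a \cup b) = F_s(a) \cup F_s(b)$, can be checked by brute force on a small domain. The sketch below does this for the reaching-definitions flow function of an assignment to x; the universe of bindings and the statement labels are assumptions picked to keep the search tiny.

```python
from itertools import combinations

# Hypothetical universe of (variable, statement) bindings.
BINDINGS = [(v, s) for v in ("x", "y") for s in ("s1", "s2", "s3")]

def flow_assign_x(rd_in):
    """F_s for a hypothetical statement s3: x := ...  (kill all x bindings, add x -> s3)."""
    return frozenset(b for b in rd_in if b[0] != "x") | {("x", "s3")}

def all_subsets(items):
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def is_distributive(flow):
    """Check flow(a U b) == flow(a) U flow(b) for every pair of subsets."""
    sets = all_subsets(BINDINGS)
    return all(flow(a | b) == flow(a) | flow(b) for a in sets for b in sets)

print(is_distributive(flow_assign_x))  # True
```

The check succeeds for any flow function of the "remove a fixed set, add a fixed set" shape, which is why merging loses no precision for reaching definitions.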
18. The Problem of Loops
- If there is no loop, the transfer functions of statements can be evaluated in topological order.
- What if there are loops?
19. Solution: iterate!
- Initialize all sets to the empty set
- Put all nodes onto a worklist
- while the worklist is not empty
  - remove a node n from the worklist
  - apply the flow function for node n
  - update the appropriate set, and add nodes whose inputs have changed back onto the worklist
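The steps above can be sketched as a small worklist solver for reaching definitions. The CFG below, a single loop, and every name in the code are assumptions made for illustration.

```python
from collections import deque

# Hypothetical CFG with a loop:
#   1: x = ...    2: while (...)    3: x = ...  (body, jumps back to 2)    4: exit
DEFS = {1: "x", 2: None, 3: "x", 4: None}
SUCC = {1: [2], 2: [3, 4], 3: [2], 4: []}
PRED = {n: [p for p in SUCC if n in SUCC[p]] for n in SUCC}

def flow(node, rd_in):
    """Flow function: kill older definitions of the defined variable, add this node."""
    var = DEFS[node]
    if var is None:
        return set(rd_in)
    return {d for d in rd_in if DEFS[d] != var} | {node}

def reaching_definitions():
    rd_in = {n: set() for n in SUCC}     # initialize all sets to empty
    rd_out = {n: set() for n in SUCC}
    worklist = deque(SUCC)               # start with every node
    while worklist:
        n = worklist.popleft()
        rd_in[n] = set().union(*(rd_out[p] for p in PRED[n]))   # merge predecessors
        new_out = flow(n, rd_in[n])
        if new_out != rd_out[n]:         # output changed: successors must be revisited
            rd_out[n] = new_out
            for s in SUCC[n]:
                if s not in worklist:
                    worklist.append(s)
    return rd_in, rd_out

rd_in, rd_out = reaching_definitions()
print(rd_in[2])   # {1, 3}
print(rd_out[3])  # {3}
```

Node 2 is processed twice: once before the loop body has produced anything, and again after the definition at node 3 flows around the back edge.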
20. Termination
- How do we know the algorithm terminates?
- Because
  - results change monotonically
  - the domain is finite
21. Monotonicity
- Operation f is monotonic if
  - $X \subseteq Y \Rightarrow f(X) \subseteq f(Y)$
- We require that all operations be monotonic
  - Easy to check for the set operations (merging)
  - Easy to check for all transfer functions; recall, for $s: x := \ldots$: $out = (in - \{x \to s' \mid s' \in stmts\}) \cup \{x \to s\}$
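For a small finite domain, monotonicity can be checked exhaustively. The sketch below tests a kill/gen style transfer function and, for contrast, set complement, which is not monotonic. The three-element universe and both function names are assumptions for illustration.

```python
from itertools import combinations

UNIVERSE = ["d1", "d2", "d3"]   # hypothetical definitions

def all_subsets(items):
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def is_monotonic(f):
    """Check that a <= b implies f(a) <= f(b), where <= is set inclusion."""
    sets = all_subsets(UNIVERSE)
    return all(f(a) <= f(b) for a in sets for b in sets if a <= b)

def kill_gen(s):
    """A transfer function of the usual shape: out = (in - {d1}) U {d2}."""
    return (s - {"d1"}) | {"d2"}

def complement(s):
    """Not a valid transfer function: growing the input shrinks the output."""
    return frozenset(UNIVERSE) - s

print(is_monotonic(kill_gen))    # True
print(is_monotonic(complement))  # False
```

An analysis built on something like `complement` could flip between two results forever, which is what the monotonicity requirement rules out.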
22. Termination again
- To see that the algorithm terminates
  - start with empty sets
  - sets can only grow with each update
  - sets can only grow to a maximum finite size
- Together, these imply termination
- Partial order and lattice
23. Where is Dataflow Analysis Useful?
- Best for flow-sensitive, intraprocedural, distributive problems on small pieces of code
  - E.g., the examples we've seen and many others
- Extremely efficient algorithms are known
  - They use a different representation than the control-flow graph, but are not fundamentally different
24. Where is Dataflow Analysis Weak?
25. Data Structures
- Not good at analyzing data structures
- Works well for atomic values
  - Labels, constants, variable names
- Not easily extended to arrays, lists, trees, etc.
26. The Heap
- Good at analyzing flow of values in local variables
- No notion of the heap in traditional dataflow applications
- Aliasing
27. Beyond Procedures
- Standard dataflow techniques for handling interprocedural analyses don't scale well
int x
main ()
  1. A( )
  2. B( )
A ()
  11. x = 10
  12. F( )
  13.
B ()
  21. x = 20
  22. F( )
F ()
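One possible reading of this example, offered as an assumption rather than as the slide's own argument: A sets x = 10 and B sets x = 20 before each calls F. If F is analyzed once with the information from both call sites merged, each caller gets back both values; keeping the call sites apart is more precise but analyzes F once per context. The body of F is not shown in the slides, so the sketch assumes it leaves x unchanged.

```python
# Possible values of the global x just before each call to F,
# keyed by the line number of the call site.
CALL_SITES = {12: {10}, 22: {20}}

def analyze_F(x_values):
    """Stand-in for analyzing F's body; assumed here to leave x unchanged."""
    return set(x_values)

def context_insensitive():
    """Analyze F once on the merged input and return that result to every caller."""
    merged = set().union(*CALL_SITES.values())
    result = analyze_F(merged)
    return {site: result for site in CALL_SITES}

def context_sensitive():
    """Analyze F separately for each call site."""
    return {site: analyze_F(values) for site, values in CALL_SITES.items()}

print(context_insensitive())  # {12: {10, 20}, 22: {10, 20}}
print(context_sensitive())    # {12: {10}, 22: {20}}
```

With nested calls, the number of distinct contexts can grow with every level of the call chain, which is one source of the scaling problem.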
28. Flow Sensitivity (Beyond Procedures)
- Flow-sensitive analyses are standard for analyzing single procedures
- Not used (or not aware of uses) for whole programs (2001)
  - Too expensive
29. The Call Graph
- Dataflow analysis requires a call graph
  - Or something close
- Inadequate for higher-order programs
  - First-class functions
  - Object-oriented languages with dynamic dispatch
- The call graph hinders algorithmic efficiency
30. Coming Back: The Essence of Static Analysis
- Examine the program text (no execution)
- Build a model of the program state
  - An abstraction of the run-time state
- Reason over the possible behaviors.
  - E.g., run the program over the abstract state
- The property an analysis needs to promise is that it TERMINATES
- Slogan of many researchers: Finite Lattices + Monotonic Functions = Static Program Analysis
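As a small illustration of running a program over an abstract state, the sketch below analyzes the loop `x = 1; while (...) { x = x + 1 }` using a sign lattice. The choice of the sign lattice, the sample loop, and all names are assumptions for this example. The concrete loop may run forever, yet the abstract run stops as soon as the value at the loop head no longer changes.

```python
# Sign lattice: BOT below NEG, ZERO, POS, which are all below TOP.
BOT, NEG, ZERO, POS, TOP = "bot", "neg", "zero", "pos", "top"

def join(a, b):
    """Least upper bound of two signs."""
    if a == b or b == BOT:
        return a
    if a == BOT:
        return b
    return TOP

def abstract_add(a, b):
    """Sign of a + b, given only the signs of a and b."""
    if BOT in (a, b):
        return BOT
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    return a if a == b else TOP   # pos + pos = pos, neg + neg = neg, otherwise unknown

def analyze_loop():
    """Return the sign of x at the loop head and the number of iterations used."""
    x_head = BOT
    steps = 0
    while True:
        steps += 1
        x_after_body = abstract_add(x_head, POS)            # x = x + 1
        new_head = join(join(POS, x_after_body), x_head)    # entry value (x = 1) joined with back edge
        if new_head == x_head:                              # fixpoint reached
            return x_head, steps
        x_head = new_head

print(analyze_loop())  # ('pos', 2)
```

Because the lattice has finite height and `join` and `abstract_add` only move values upward, the iteration cannot continue indefinitely.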