Provided by: hpcu59

Title: Static Program Analysis

1
Static Program Analysis

Xiangyu Zhang

2
Outline

Spectrum of program analysis
Dynamic analysis
Model checking
Static analysis
Data flow analysis

3
Spectrum of Program Analysis

Dynamic analysis
Reasoning about specific executions.
Model checking
Reasoning about all feasible executions.
Both symbolic and explicit-state model checkers
have that capability
Static program analysis
Reasoning about all possible executions
(including infeasible ones)

1. x = 1
2. read (y)
3. if (y … 10)
If (y …
x--
6.
7. z = 10/x
8. z = 10/y

4
Static Program Analysis

Flow sensitive analyses
The order of statements matters
Need a control flow graph
Flow insensitive analyses
The order of statements doesn't matter
Analysis is the same regardless of statement order

5
Example of Flow Insensitive Analysis

What variables does a program modify?

Note: $G(s_1; s_2) = G(s_2; s_1)$
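To make G concrete, here is a minimal Python sketch over a hypothetical toy statement language (the classes Assign, Seq, If, While and the function modified are illustrative names, not from the slides). G only takes unions of the results for sub-statements, so it cannot distinguish s1; s2 from s2; s1:

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical toy statement language; node names are illustrative only.
@dataclass(frozen=True)
class Assign:
    var: str
    expr: str          # right-hand side kept as opaque text


@dataclass(frozen=True)
class Seq:
    first: "Stmt"
    second: "Stmt"


@dataclass(frozen=True)
class If:
    cond: str
    then: "Stmt"
    orelse: "Stmt"


@dataclass(frozen=True)
class While:
    cond: str
    body: "Stmt"


Stmt = Union[Assign, Seq, If, While]


def modified(s: Stmt) -> frozenset:
    """G(s): variables that s may modify. Only unions are used, so order is ignored."""
    if isinstance(s, Assign):
        return frozenset({s.var})
    if isinstance(s, Seq):
        return modified(s.first) | modified(s.second)
    if isinstance(s, If):
        return modified(s.then) | modified(s.orelse)
    if isinstance(s, While):
        return modified(s.body)
    raise TypeError(f"unknown statement: {s!r}")


s1 = Assign("x", "1")
s2 = While("y > 0", Assign("y", "y - 1"))
print(sorted(modified(Seq(s1, s2))))                    # ['x', 'y']
print(modified(Seq(s1, s2)) == modified(Seq(s2, s1)))   # True
```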

6
Example of Flow Sensitive Analysis

Division by zero problem

1. x = 1
2. read (y)
3. if (y … 10)
If (y …
x--
6.
7. z = 10/x
8. z = 10/y
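Part of this listing was lost in extraction, so the sketch below uses a hypothetical three-statement program instead (x = 0; x = 5; z = 10 / x) to show why statement order matters for division by zero. The flow-sensitive version keeps a state per program point; the flow-insensitive version keeps one global set of values per variable and reports a false alarm. The statement encoding and the function names are assumptions made for illustration:

```python
# Hypothetical straight-line program: x = 0 ; x = 5 ; z = 10 / x
# Each statement is (target, kind, operand).
PROGRAM = [
    ("x", "const", 0),
    ("x", "const", 5),
    ("z", "div10", "x"),   # z = 10 / operand
]


def flow_sensitive_alarms(program):
    """One state per program point: the latest assignment wins."""
    env = {}                                   # var -> possible values, None = unknown
    alarms = []
    for i, (target, kind, operand) in enumerate(program, start=1):
        if kind == "const":
            env[target] = {operand}
        elif kind == "div10":
            divisor = env.get(operand)
            if divisor is None or 0 in divisor:
                alarms.append(i)               # possible division by zero
            env[target] = None                 # quotient not tracked in this sketch
    return alarms


def flow_insensitive_alarms(program):
    """One global state: every value a variable is assigned anywhere."""
    values = {}
    for target, kind, operand in program:
        if kind == "const":
            values.setdefault(target, set()).add(operand)
    return [i for i, (_, kind, operand) in enumerate(program, start=1)
            if kind == "div10" and (operand not in values or 0 in values[operand])]


print(flow_sensitive_alarms(PROGRAM))     # []
print(flow_insensitive_alarms(PROGRAM))   # [3]
```

Swapping the first two statements makes the flow-sensitive version report statement 3 as well, while the flow-insensitive result does not change.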

7
The Advantage

Flow-sensitive analyses require a model of
program state at each program point
E.g., liveness analysis and reaching definitions
Flow-insensitive analyses require only a single
global state
E.g., for G, the set of all variables modified

8
Notes on Flow Sensitivity

Flow-insensitive analyses seem weak, but
flow-sensitive analyses are hard to scale to very
large programs
Additional cost: state size × # of program points
Beyond 1000s of lines of code, only flow-insensitive
analyses have been shown to scale (Alex Aiken, 2001)
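As an illustration of the extra cost, assume reaching definitions is stored as one bit vector per program point, with hypothetical sizes of 10,000 definitions (|D|) and 10,000 program points (|P|); the numbers are chosen only to show the arithmetic:

```latex
\begin{aligned}
\text{flow-sensitive:}\quad   & |D| \times |P| = 10^4 \times 10^4 = 10^8 \text{ bits} = 1.25 \times 10^7 \text{ bytes} \approx 12.5\ \text{MB} \\
\text{flow-insensitive:}\quad & |D| = 10^4 \text{ bits} = 1.25 \times 10^3 \text{ bytes} \approx 1.25\ \text{KB}
\end{aligned}
```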

9
The Essence of Flow Sensitive Analysis

The goal is to achieve results equivalent to
enumerating all possible execution paths and
combining the results.
Problems of doing so
Path explosion
Loops/recursions
The most popular solution
Data flow analysis
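A small illustration of path explosion, assuming a hypothetical CFG made of n if-then-else statements in sequence: every branch doubles the number of paths, while the number of nodes grows only linearly. Counting the paths is cheap here only because of memoization, which reuses results at join points much as merging does:

```python
from functools import lru_cache


def diamond_chain(n):
    """Hypothetical CFG: n if-then-else statements in sequence.
    Node 3k is the k-th branch, 3k+1 and 3k+2 its arms, node 3n the exit."""
    succ = {}
    for k in range(n):
        b = 3 * k
        succ[b] = [b + 1, b + 2]
        succ[b + 1] = [b + 3]
        succ[b + 2] = [b + 3]
    succ[3 * n] = []
    return succ


def count_paths(succ, entry, exit_):
    @lru_cache(maxsize=None)
    def paths_from(node):
        if node == exit_:
            return 1
        return sum(paths_from(m) for m in succ[node])
    return paths_from(entry)


for n in (1, 10, 30):
    g = diamond_chain(n)
    print(n, len(g), count_paths(g, 0, 3 * n))
# 1 4 2
# 10 31 1024
# 30 91 1073741824
```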

10
The Essence of Data Flow Analysis (1) - Merging

Reaching definitions: find the set of
definitions that can reach each program point. A
definition is a statement that defines a
variable.
RD is very useful, e.g., for building def-use chains
A naïve way is to enumerate all the paths.

1. If (p1)
$\mathit{RD}_{in}(4) = \{2, 3\}$, $\mathit{RD}_{out}(5) = \{3, 5\}$, $\mathit{RD}_{in}(8) = \{2, 3, 5\}$
2. x
3. y
4. if (x)

7.
8.

In DFA, merging is used to achieve efficiency.
Analysis results are merged at control-flow join
points, and the merged results are used in further
analysis (like dynamic programming).
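A sketch of reaching definitions with merging on an acyclic CFG. The CFG shape below is an assumption (statements 6 and 7 of the slide are not recoverable and are omitted), chosen so that the computed sets match the RD values quoted on the slide. Nodes are visited in topological order, and results are unioned at join points instead of enumerating paths:

```python
# Hypothetical acyclic CFG (assumed shape):
#   1: if (p1)   -> 2, 3
#   2: x = ...   -> 3
#   3: y = ...   -> 4
#   4: if (x)    -> 5, 8
#   5: x = ...   -> 8
#   8: (exit)
SUCC = {1: [2, 3], 2: [3], 3: [4], 4: [5, 8], 5: [8], 8: []}
DEFINES = {2: "x", 3: "y", 5: "x"}          # statement -> variable it defines


def reaching_definitions(succ, defines, order):
    preds = {n: [] for n in succ}
    for n, ms in succ.items():
        for m in ms:
            preds[m].append(n)
    rd_in, rd_out = {}, {}
    for n in order:                          # topological order: preds are done first
        # Merge at join points: union of all predecessors' results.
        rd_in[n] = set().union(*(rd_out[p] for p in preds[n]))
        if n in defines:                     # kill other defs of the same var, gen n
            var = defines[n]
            rd_out[n] = {d for d in rd_in[n] if defines[d] != var} | {n}
        else:
            rd_out[n] = set(rd_in[n])
    return rd_in, rd_out


rd_in, rd_out = reaching_definitions(SUCC, DEFINES, order=[1, 2, 3, 4, 5, 8])
print(sorted(rd_in[4]), sorted(rd_out[5]), sorted(rd_in[8]))  # [2, 3] [3, 5] [2, 3, 5]
```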

11
The Conservativeness of Merging

Merging loses accuracy

1. If (P)
2.    x = -1
3. else
4.    x = 1
5. y = x * x
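The slide does not name the analysis being merged. Assuming constant propagation (NAC = "not a constant"), this sketch shows the loss: analysing each path separately gives y = 1 on both paths, while merging x at the join first leaves no constant for y. It is also an example of a flow function (y = x * x) that does not distribute over the merge. The sketch omits an "undefined" bottom value for brevity:

```python
NAC = "NAC"                    # "not a constant"


def join(a, b):
    """Merge two abstract values at a control-flow join."""
    return a if a == b else NAC


def square(v):
    """Abstract flow function for y = x * x."""
    return NAC if v == NAC else v * v


# Path by path: analyse line 5 on each branch, then merge y.
y_then = square(-1)            # path through line 2
y_else = square(1)             # path through line 4
per_path = join(y_then, y_else)

# DFA style: merge x at the join before line 5, then analyse line 5 once.
x_merged = join(-1, 1)
merged = square(x_merged)

print(per_path, merged)        # 1 NAC
```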

12
The Essence of DFA (2): Termination

In the presence of loops and recursion,
termination is a problem.
Assume we want to compute reaching definitions for
the following code.

1. x = 1
2. while ()
   x = x + 1
5.

In DFA, the analysis designer has to guarantee
a finite domain
monotonic computation.
What if we change the analysis a little bit?
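A sketch of the termination issue on this loop, under two assumptions: the loop body x = x + 1 is labelled 3, and "changing the analysis" means tracking the concrete values of x instead of its reaching definitions. The reaching-definitions equation reaches a fixed point after two rounds; the value-tracking equation keeps growing, so the sketch stops it with an arbitrary cap (max_rounds):

```python
def iterate(equation, init, max_rounds=50):
    """Re-apply the loop-head equation until the state stops changing."""
    state = init
    for rounds in range(1, max_rounds + 1):
        new = equation(state)
        if new == state:
            return state, rounds
        state = new
    return state, None                          # no fixed point within max_rounds


X_DEFS = frozenset({1, 3})   # definitions of x: line 1 and the loop body (label 3 assumed)


def rd_at_head(s):
    """Reaching definitions at the loop head (finite domain: subsets of X_DEFS)."""
    body_out = (s - X_DEFS) | {3}               # x = x + 1 kills every def of x, gens 3
    return frozenset({1}) | body_out            # merge entry edge with back edge


def values_at_head(s):
    """Changed analysis: concrete values of x at the loop head (infinite domain)."""
    return frozenset({1}) | frozenset(v + 1 for v in s)


rd, rd_rounds = iterate(rd_at_head, frozenset())
vals, vals_rounds = iterate(values_at_head, frozenset())
print(sorted(rd), rd_rounds)      # [1, 3] 2
print(vals_rounds, len(vals))     # None 50
```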

13
Reaching definitions generalized

Computed information at a program point is a set
of $\mathit{var} \mapsto \mathit{stmt}$ bindings
e.g., $\{x \mapsto s_1,\ x \mapsto s_2,\ y \mapsto s_3\}$
How do we get the previous info we wanted?
If $\mathit{in}$ is the incoming info of a statement,
the definitions of x that reach it are $\{ s \mid (x \mapsto s) \in \mathit{in} \}$
This is a common pattern:
generalize the problem to define what information
should be computed at each program point
use the computed information at the program
points to get the original info we wanted
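A minimal sketch of recovering the original question from the generalized information, with bindings represented as hypothetical (var, stmt) tuples:

```python
# Information at one program point: a set of x |-> s bindings as (var, stmt) tuples.
in_info = {("x", "s1"), ("x", "s2"), ("y", "s3")}


def defs_reaching(in_info, var):
    """Recover the original question: which definitions of var reach this point?"""
    return {s for (v, s) in in_info if v == var}


print(sorted(defs_reaching(in_info, "x")))   # ['s1', 's2']
print(sorted(defs_reaching(in_info, "y")))   # ['s3']
```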

14
1. If (p1)
2. x
3. y
4. if (x)

7.
8.
15
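Constraints for reaching definitions
Assignment, s: x = ...
$\mathit{out} = (\mathit{in} - \{x \mapsto s' \mid s' \in \mathit{stmts}\}) \cup \{x \mapsto s\}$
Store through a pointer, s: *p = ...
$\mathit{out} = (\mathit{in} - \{x \mapsto s' \mid x \in \textit{must-point-to}(p),\ s' \in \mathit{stmts}\}) \cup \{x \mapsto s \mid x \in \textit{may-point-to}(p)\}$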
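A sketch of the pointer-store constraint as a Python function over (var, stmt) tuples. The parameters must_pt and may_pt are hypothetical stand-ins for must-point-to(p) and may-point-to(p), which would come from a separate pointer analysis. Only must-point-to targets are killed (a strong update); every may-point-to target gets a new binding:

```python
def store_flow(in_info, s, must_pt, may_pt):
    """Flow function for s: *p = ... over a set of (var, stmt) bindings.

    must_pt / may_pt stand in for must-point-to(p) and may-point-to(p);
    here they are plain sets supplied by the caller.
    """
    kept = frozenset(b for b in in_info if b[0] not in must_pt)   # kill only must-targets
    return kept | {(x, s) for x in may_pt}                       # gen for every may-target


before = frozenset({("a", "s1"), ("b", "s2")})
weak = store_flow(before, "s3", must_pt=set(), may_pt={"a", "b"})    # p may point to a or b
strong = store_flow(before, "s3", must_pt={"a"}, may_pt={"a"})       # p surely points to a
print(sorted(weak))    # [('a', 's1'), ('a', 's3'), ('b', 's2'), ('b', 's3')]
print(sorted(strong))  # [('a', 's3'), ('b', 's2')]
```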
16
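Constraints for reaching definitions
Branch, s: if (...), with outgoing edges $\mathit{out}_0$ and $\mathit{out}_1$:
$\mathit{out}_0 = \mathit{in}$, $\mathit{out}_1 = \mathit{in}$
more generally: $\forall i.\ \mathit{out}_i = \mathit{in}$
Merge, with incoming edges $\mathit{in}_0$ and $\mathit{in}_1$:
$\mathit{out} = \mathit{in}_0 \cup \mathit{in}_1$
more generally: $\mathit{out} = \bigcup_i \mathit{in}_i$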
17
Flow functions

The constraint for a statement kind s often has
the form $\mathit{out} = F_s(\mathit{in})$
$F_s$ is called a flow function
other names for it: dataflow function, transfer function
Given information in before statement s, $F_s(\mathit{in})$
returns information after statement s
Distributivity: $F_s(a \cup b) = F_s(a) \cup F_s(b)$
If so, no precision loss by merging.

For s: x = ...
$F_s(\mathit{in}) = (\mathit{in} - \{x \mapsto s' \mid s' \in \mathit{stmts}\}) \cup \{x \mapsto s\}$
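A brute-force sanity check, assuming a small hypothetical universe of bindings and a hypothetical statement s4: x = .... The reaching-definitions flow function for an assignment distributes over union, because removing a fixed kill set and adding a fixed gen set both commute with ∪:

```python
from itertools import combinations


def flow_assign(in_info, s, x):
    """F_s for s: x = ... : drop every x |-> s' binding, then add x |-> s."""
    return frozenset(b for b in in_info if b[0] != x) | {(x, s)}


universe = [("x", "s1"), ("x", "s2"), ("y", "s3")]
subsets = [frozenset(c) for r in range(len(universe) + 1)
           for c in combinations(universe, r)]


def F(info):
    return flow_assign(info, "s4", "x")    # hypothetical statement s4: x = ...


# Distributivity over the merge operator: F(a ∪ b) == F(a) ∪ F(b) for all pairs.
ok = all(F(a | b) == F(a) | F(b) for a in subsets for b in subsets)
print(ok, len(subsets) ** 2)   # True 64
```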
18
The Problem of Loops

If there are no loops, the transfer functions of
statements can be evaluated in topological order.
What if there are loops?

19
Solution: iterate!

Initialize all sets to the empty set
Put all nodes onto a worklist
while the worklist is not empty
remove node n from the worklist
apply the flow function for node n
update the appropriate set, and add nodes whose
inputs have changed back onto the worklist

1. x = 1
2. while ()
   x = x + 1
5.
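A sketch of the worklist algorithm for reaching definitions on this loop. The CFG is an assumption, since the slide's line numbering is partly lost: the body x = x + 1 is labelled 3 with a back edge to the loop head 2, and 5 is the loop exit. Each node's in-set is the union of its predecessors' out-sets, and only successors of nodes whose out-set changed are re-queued:

```python
from collections import deque

# Hypothetical CFG for:  1: x = 1 ; 2: while (...) { 3: x = x + 1 } ; 5: ...
SUCC = {1: [2], 2: [3, 5], 3: [2], 5: []}
DEFINES = {1: "x", 3: "x"}                      # node -> variable it defines


def reaching_definitions(succ, defines):
    preds = {n: [] for n in succ}
    for n, ms in succ.items():
        for m in ms:
            preds[m].append(n)
    rd_in = {n: frozenset() for n in succ}      # initialize all sets to empty
    rd_out = {n: frozenset() for n in succ}
    worklist = deque(succ)                      # all nodes start on the worklist
    while worklist:
        n = worklist.popleft()
        rd_in[n] = frozenset().union(*(rd_out[p] for p in preds[n]))
        if n in defines:                        # flow function: kill + gen
            out = frozenset(d for d in rd_in[n] if defines[d] != defines[n]) | {n}
        else:
            out = rd_in[n]
        if out != rd_out[n]:                    # changed: successors must be revisited
            rd_out[n] = out
            worklist.extend(m for m in succ[n] if m not in worklist)
    return rd_in, rd_out


rd_in, rd_out = reaching_definitions(SUCC, DEFINES)
print(sorted(rd_in[2]), sorted(rd_in[3]), sorted(rd_in[5]))  # [1, 3] [1, 3] [1, 3]
```

Both definitions of x reach the loop head, the body and the exit; the second visit to node 3 produces no change, which is what lets the loop finish.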

20
Termination

How do we know the algorithm terminates?
Because
results change monotonically
the domain is finite

21
Monotonicity

Operation f is monotonic if
$X \subseteq Y \Rightarrow f(X) \subseteq f(Y)$
We require that all operations be monotonic
Easy to check for the set operations (merging)
Easy to check for all transfer functions; recall:

For s: x = ...
$F_s(\mathit{in}) = (\mathit{in} - \{x \mapsto s' \mid s' \in \mathit{stmts}\}) \cup \{x \mapsto s\}$
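Monotonicity can also be checked by brute force on a small finite domain. This sketch, over a hypothetical universe of three bindings and a hypothetical statement s4: x = ..., confirms the assignment flow function is monotonic and, for contrast, that set complement is not:

```python
from itertools import combinations


def is_monotonic(f, domain):
    """Brute-force check of: X ⊆ Y implies f(X) ⊆ f(Y), over a finite domain."""
    return all(f(X) <= f(Y) for X in domain for Y in domain if X <= Y)


universe = frozenset({("x", "s1"), ("x", "s2"), ("y", "s3")})
subsets = [frozenset(c) for r in range(len(universe) + 1)
           for c in combinations(sorted(universe), r)]


def rd_transfer(info):
    """Reaching-definitions flow function for a hypothetical s4: x = ..."""
    return frozenset(b for b in info if b[0] != "x") | {("x", "s4")}


def complement(info):
    """A deliberately non-monotonic function, for contrast."""
    return universe - info


print(is_monotonic(rd_transfer, subsets))   # True
print(is_monotonic(complement, subsets))    # False
```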
22
Termination again

To see that the algorithm terminates:
start with empty sets
sets increase with each update
Sets can only grow to a max finite size
Together, these imply termination
Partial order and lattice
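The argument can be made quantitative with the usual lattice-height bound. Assuming the worklist algorithm from slide 19, with D the set of all definitions, N the CFG nodes and E the CFG edges: each out-set starts at ∅ and can only grow strictly inside the powerset of D, whose height is |D|, and each change re-queues at most the node's successors:

```latex
\begin{aligned}
&\emptyset = \mathit{out}_0(n) \subsetneq \mathit{out}_1(n) \subsetneq \cdots \subseteq D
  \;\Rightarrow\; \mathit{out}(n) \text{ changes at most } |D| \text{ times} \\
&\#\text{worklist removals} \;\le\; |N| + \sum_{n \in N} |D| \cdot |\mathit{succ}(n)| \;=\; |N| + |D| \cdot |E|
\end{aligned}
```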

23
Where is Dataflow Analysis Useful?

Best for flow-sensitive, intraprocedural,
distributive problems on small pieces of code
E.g., the examples we've seen and many others
Extremely efficient algorithms are known
They use a different representation than the control-flow
graph, but are not fundamentally different

24
Where is Dataflow Analysis Weak?