Static Program Analysis

1
Static Program Analysis
  • Xiangyu Zhang

2
Outline
  • Spectrum of program analysis
  • Dynamic analysis
  • Model checking
  • Static analysis
  • Data flow analysis

3
Spectrum of Program Analysis
  • Dynamic analysis
  • Reasoning about specific executions.
  • Model checking
  • Reasoning about all feasible executions.
  • Both symbolic and explicit-state model checkers
    have that capability
  • Static program analysis
  • Reasoning about all possible executions
    (including infeasible ones)
  • 1. x = 1
  • 2. read (y)
  • 3. if (y ... 10)
  • 4. if (y ...
  • 5. x--
  • 6.
  • 7. z = 10/x
  • 8. z = 10/y

4
Static Program Analysis
  • Flow sensitive analyses
  • The order of statements matters
  • Need a control flow graph
  • Flow insensitive analyses
  • The order of statements doesn't matter
  • Analysis is the same regardless of statement
    order

5
Example of Flow Insensitive Analysis
  • What variables does a program modify?
  • Note: G(s1; s2) = G(s2; s1)
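
A minimal sketch of the "modified variables" analysis, assuming a hypothetical mini-language in which a program is a list of (target, expression) assignments. The function name modified_vars and the sample statements are illustrative only. Because the result is a single global set, swapping the two halves of the program changes nothing.

```python
def modified_vars(stmts):
    """G(program): the set of variables assigned anywhere in the program."""
    return {target for target, _ in stmts}

# Two hypothetical statement sequences, each a list of (target, expression).
s1 = [("x", "1"), ("y", "x + 1")]
s2 = [("z", "y * 2"), ("x", "0")]

forward = modified_vars(s1 + s2)   # G(s1; s2)
backward = modified_vars(s2 + s1)  # G(s2; s1)

# Order is irrelevant, and G composes by plain set union.
print(forward == backward)         # True
print(sorted(forward))             # ['x', 'y', 'z']
```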

6
Example of Flow Sensitive Analysis
  • Division by zero problem
  • 1. x = 1
  • 2. read (y)
  • 3. if (y ... 10)
  • 4. if (y ...
  • 5. x--
  • 6.
  • 7. z = 10/x
  • 8. z = 10/y

7
The Advantage
  • Flow-sensitive analyses require a model of
    program state at each program point
  • E.g., liveness analysis, reaching definitions
  • Flow-insensitive analyses require only a single
    global state
  • E.g., for G, the set of all variables modified

8
Notes on Flow Sensitivity
  • Flow insensitive analyses seem weak, but
  • Flow sensitive analyses are hard to scale to very
    large programs
  • Additional cost: state size × # of program points
  • Beyond 1000s of lines of code, only flow
    insensitive analyses have been shown to scale (by
    Alex Aiken, 2001)

9
The Essence of Flow Sensitive Analysis
  • The goal is to achieve results that are
    equivalent to enumerating all possible execution
    paths and combining the results.
  • Problems of doing so
  • Path explosion
  • Loops/recursions
  • The most popular solution
  • Data flow analysis

10
The Essence of Data Flow Analysis (1) - Merging
  • Reaching definitions: find the set of
    definitions that can reach each program point. A
    definition is a statement that defines a
    variable.
  • RD is very useful
  • A naïve way is to enumerate all the paths.

1. if (p1)
2. x = ...
3. y = ...
4. if (x)
5. x = ...
7.
8.
RDin(4) = {2, 3}   RDout(5) = {3, 5}   RDin(8) = {2, 3, 5}
  • In DFA, merging is used to achieve efficiency.
    Analysis results are merged at control-flow join
    points and the merged results are used in further
    analysis (like dynamic programming).

11
The Conservativeness of Merging
  • Merging loses accuracy
  • 1. if (P)
  • 2. x = -1
  • 3. else
  • 4. x = 1
  • 5. y = x * x
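
One way to see the loss is with constant propagation, which is an assumption here since the slide does not name the analysis. In this sketch the abstract value of a variable is either a known constant or a marker TOP meaning "not a constant"; the names merge, square and TOP are illustrative.

```python
TOP = "not-a-constant"  # hypothetical marker for "value unknown"

def merge(a, b):
    """Join two abstract values: keep a constant only if both paths agree."""
    return a if a == b else TOP

def square(v):
    """Abstract version of y = x * x."""
    return TOP if v == TOP else v * v

# Path by path: evaluate y = x * x on each branch, then combine the results.
per_path = merge(square(-1), square(1))

# Dataflow style: merge x at the join point first, then evaluate y = x * x.
merged_first = square(merge(-1, 1))

print(per_path)      # 1
print(merged_first)  # not-a-constant
```

Enumerating paths finds that y is 1 on both branches, while merging x first forgets that fact. The result is still safe, only less precise.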

12
The Essence of DFA (2) - Termination
  • In the presence of loops and recursion,
    termination is a problem.
  • Assume we want to compute reaching definition for
    the following code.
  • 1. x = 1
  • 2. while (...)
  • x = x + 1
  • 5.
  • In DFA, the analysis designer has to promise:
  • Finite domain
  • Monotonic computation.
  • What if we change the analysis a little bit?

13
Reaching definitions generalized
  • Computed information at a program point is a set
    of var → stmt bindings
  • e.g., { x → s1, x → s2, y → s3 }
  • How do we get the previous info we wanted?
  • for a stmt whose incoming info is in, the
    definitions of x that reach it are { s | (x → s) ∈ in }
  • This is a common pattern
  • generalize the problem to define what information
    should be computed at each program point
  • use the computed information at the program
    points to get the original info we wanted

14
1. if (p1)
2. x = ...
3. y = ...
4. if (x)
5. x = ...
7.
8.

15
Constraints for reaching definitions
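  • Assignment s: x = ... (incoming info in, outgoing info out)
    out = (in - { x → s' | s' ∈ stmts }) ∪ { x → s }
  • Store through a pointer s: *p = ... (incoming info in, outgoing info out)
    out = (in - { x → s' | x ∈ must-point-to(p) ∧ s' ∈ stmts })
          ∪ { x → s | x ∈ may-point-to(p) }
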
16
Constraints for reaching definitions
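  • Branch s: if (...) (incoming info in, outgoing info out[0] and out[1])
    out[0] = in ∧ out[1] = in
    more generally: for all i, out[i] = in
  • Merge (incoming info in[0] and in[1], outgoing info out)
    out = in[0] ∪ in[1]
    more generally: out = ∪ i . in[i]
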
17
Flow functions
  • The constraint for a statement kind s often has
    the form out = F_s(in)
  • F_s is called a flow function
  • other names for it: dataflow function, transfer
    function
  • Given information in before statement s, F_s(in)
    returns information after statement s
  • Distributivity: F_s(a ∪ b) = F_s(a) ∪ F_s(b)
  • If the flow functions are distributive, merging
    loses no precision.

s: x = ... (incoming info in, outgoing info out)
out = (in - { x → s' | s' ∈ stmts }) ∪ { x → s }
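
The flow function for an assignment can be sketched as follows, assuming bindings are represented as (var, stmt) pairs in a frozenset. The name flow_assign and the statement labels s1, s2, s3 are illustrative, not taken from the slides. The last lines check distributivity on one pair of inputs, which is a spot check rather than a proof.

```python
def flow_assign(stmt, var, all_stmts, in_facts):
    """F_s for `s: x = ...`: out = (in - {x -> s' | s' in stmts}) | {x -> s}."""
    kill = {(var, s) for s in all_stmts}   # every earlier binding of x
    gen = {(var, stmt)}                    # the new binding x -> s
    return frozenset((in_facts - kill) | gen)

stmts = {"s1", "s2", "s3"}
in_facts = frozenset({("x", "s1"), ("y", "s2")})

# Statement s3 assigns x, so x -> s1 is replaced by x -> s3.
out_facts = flow_assign("s3", "x", stmts, in_facts)
print(sorted(out_facts))  # [('x', 's3'), ('y', 's2')]

# Spot check of distributivity: F(a | b) == F(a) | F(b).
a = frozenset({("x", "s1")})
b = frozenset({("y", "s2")})
distributes = (flow_assign("s3", "x", stmts, a | b)
               == flow_assign("s3", "x", stmts, a) | flow_assign("s3", "x", stmts, b))
print(distributes)  # True
```
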
18
The Problem of Loops
  • If there is no loop, the topological order can be
    adopted to evaluate transfer functions of
    statements.
  • What if loops?

19
Solution: iterate!
  • Initialize all sets to the empty set
  • Store all nodes onto a worklist
  • while worklist is not empty
  • remove node n from worklist
  • apply flow function for node n
  • update the appropriate set, and add nodes whose
    inputs have changed back onto worklist
  • 1. x = 1
  • 2. while (...)
  • x = x + 1
  • 5.
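
The steps above can be sketched for the loop example as follows. The control-flow graph, the node names (init, loop, body, exit) and the helper names are assumptions made for illustration; the slide's own statement numbers are not reused because one of them did not survive in the transcript.

```python
from collections import deque

# Hypothetical CFG for: x = 1; while (...) { x = x + 1 }; <exit>
succs = {"init": ["loop"], "loop": ["body", "exit"], "body": ["loop"], "exit": []}
preds = {n: [p for p in succs if n in succs[p]] for n in succs}
defines = {"init": "x", "body": "x"}  # node -> variable it assigns

def flow(node, in_facts):
    """Flow function: an assignment kills old bindings of its variable."""
    var = defines.get(node)
    if var is None:
        return in_facts
    return frozenset({(v, s) for v, s in in_facts if v != var} | {(var, node)})

def reaching_definitions():
    out = {n: frozenset() for n in succs}   # initialize all sets to empty
    worklist = deque(succs)                 # store all nodes on the worklist
    while worklist:
        n = worklist.popleft()
        # Merge: union of the predecessors' outgoing sets.
        in_facts = frozenset().union(*(out[p] for p in preds[n]))
        new_out = flow(n, in_facts)
        if new_out != out[n]:
            out[n] = new_out
            # Successors' inputs changed, so revisit them.
            worklist.extend(s for s in succs[n] if s not in worklist)
    return out

result = reaching_definitions()
print(sorted(result["exit"]))  # [('x', 'body'), ('x', 'init')]
```

Both definitions of x reach the exit, because the loop body may run zero or more times. The loop header is revisited once after the body's set changes, and then nothing changes any more.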

20
Termination
  • How do we know the algorithm terminates?
  • Because
  • results change monotonically
  • the domain is finite

21
Monotonicity
  • Operation f is monotonic if
  • x ≤ y ⇒ f(x) ≤ f(y)
  • We require that all operations be monotonic
  • Easy to check for the set operations (merging)
  • Easy to check for all transfer functions; recall:

s: x = ... (incoming info in, outgoing info out)
out = (in - { x → s' | s' ∈ stmts }) ∪ { x → s }
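
For a small finite domain, monotonicity of this flow function can be checked by brute force. The sketch below assumes the ordering is set inclusion and uses an illustrative three-element domain; flow_assign and powerset are hypothetical helper names.

```python
from itertools import combinations

def flow_assign(stmt, var, in_facts):
    """F_s for `s: x = ...`: drop every binding of x, then add x -> s."""
    return {(v, s) for v, s in in_facts if v != var} | {(var, stmt)}

def powerset(items):
    """All subsets of a finite collection, as a list of sets."""
    items = list(items)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

# Small finite domain of var -> stmt bindings (names are illustrative).
domain = {("x", "s1"), ("x", "s2"), ("y", "s3")}
subsets = powerset(domain)

# Monotonic: whenever a is a subset of b, F(a) is a subset of F(b).
is_monotonic = all(
    flow_assign("s2", "x", a) <= flow_assign("s2", "x", b)
    for a in subsets
    for b in subsets
    if a <= b
)
print(is_monotonic)  # True
```
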
22
Termination again
  • To see the algorithm terminates
  • start with empty sets
  • sets increase with each update
  • Sets can only grow to a max finite size
  • Together, these imply termination
  • Partial order and lattice

23
Where is Dataflow Analysis Useful?
  • Best for flow-sensitive, intraprocedural,
    distributive problems on small pieces of code
  • E.g., the examples we've seen and many others
  • Extremely efficient algorithms are known
  • Use different representation than control-flow
    graph, but not fundamentally different

24
Where is Dataflow Analysis Weak?
  • Lots of places

25
Data Structures
  • Not good at analyzing data structures
  • Works well for atomic values
  • Labels, constants, variable names
  • Not easily extended to arrays, lists, trees, etc.

26
The Heap
  • Good at analyzing flow of values in local
    variables
  • No notion of the heap in traditional dataflow
    applications
  • Aliasing

27
Beyond Procedures
  • Standard dataflow techniques for handling
    interprocedural analyses don't scale well

int x
main ()  1. A( )  2. B( )
A ()     11. x = 10  12. F( )  13.
B ()     21. x = 20  22. F( )
F ()

28
Flow Sensitivity (Beyond Procedures)
  • Flow sensitive analyses are standard for
    analyzing single procedures
  • Not used (or not aware of uses) for whole
    programs (2001)
  • Too expensive

29
The Call Graph
  • Dataflow analysis requires a call graph
  • Or something close
  • Inadequate for higher-order programs
  • First class functions
  • Object-oriented languages with dynamic dispatch
  • Call-graph hinders algorithmic efficiency

30
Coming Back: The Essence of Static Analysis
  • Examine the program text (no execution)
  • Build a model of the program state
  • An abstraction of the run-time state
  • Reason over the possible behaviors.
  • E.g. run the program over the abstract state
  • The property an analysis needs to promise is that
    it TERMINATES
  • Slogan of many researchers

Finite Lattices + Monotonic Functions = Static
Program Analysis