Title: Program Analysis via Graph Reachability
1Program Analysis via Graph Reachability
- Thomas Reps
- University of Wisconsin
Joint work with S. Horwitz, M. Sagiv, G. Rosay,
and D. Melski
2[Figure: timeline marked 1987, 1993, 1994, 1995, 1996, 1997, 1998]
3More Recently
- Flow-insensitive points-to analysis
- An undecidability result
- context-sensitive structure-transmitted data-dependence analysis
- Model checking of recursive hierarchical finite-state machines
- infinite-state systems
- CFL-reachability/circularity queries
- linear-, quadratic-, and cubic-time algorithms
4Other Applications of CFL-Reachability
- Analysis of attribute grammars
- CFL-recognition
- w ∈ L(G)?
- 2DPDA- and 2NPDA-recognition
- w ∈ L(M)?
- String-matching problems
- Ping-pong protocols in distributed systems [Dolev, Even, Karp 83]
5Outline
- Interprocedural slicing
- Interprocedural dataflow analysis
- Demand-driven analysis
- (Model-checking of recursive HFSMs)
6Program Slicing
- The backward slice w.r.t. variable v at program point p
- The program subset that may influence the value of variable v at point p.
- The forward slice w.r.t. variable v at program point p
- The program subset that may be influenced by the value of variable v at point p.
7Backward Slice
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
8Backward Slice
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
Backward slice with respect to statement printf("%d\n", i)
9Forward Slice
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
10Forward Slice
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
Forward slice with respect to statement sum = 0
11Who Cares About Slices?
- Understanding programs
- Restructuring Programs
- Program Specialization and Reuse
- Program Differencing
- Testing (and Retesting)
- Year 2000 Problem
- Automatic Differentiation
12What Are Slices Useful For?
- Understanding Programs
- What is affected by what?
- Restructuring Programs
- Isolation of separate computational threads
- Program Specialization and Reuse
- Slices = specialized programs
- Only reuse needed slices
- Program Differencing
- Compare slices to identify changes
- Testing
- What new test cases would improve coverage?
- What regression tests must be rerun after a change?
13Line-Character-Count Program
void line_char_count(FILE *f) {
  int lines = 0;
  int chars;
  BOOL eof_flag = FALSE;
  int n;
  extern void scan_line(FILE *f, BOOL *bptr, int *iptr);
  scan_line(f, &eof_flag, &n);
  chars = n;
  while (eof_flag == FALSE) {
    lines = lines + 1;
    scan_line(f, &eof_flag, &n);
    chars = chars + n;
  }
  printf("lines = %d\n", lines);
  printf("chars = %d\n", chars);
}
14Character-Count Program
void char_count(FILE *f) {
  int lines = 0;
  int chars;
  BOOL eof_flag = FALSE;
  int n;
  extern void scan_line(FILE *f, BOOL *bptr, int *iptr);
  scan_line(f, &eof_flag, &n);
  chars = n;
  while (eof_flag == FALSE) {
    lines = lines + 1;
    scan_line(f, &eof_flag, &n);
    chars = chars + n;
  }
  printf("lines = %d\n", lines);
  printf("chars = %d\n", chars);
}
15Line-Character-Count Program
void line_char_count(FILE *f) {
  int lines = 0;
  int chars;
  BOOL eof_flag = FALSE;
  int n;
  extern void scan_line(FILE *f, BOOL *bptr, int *iptr);
  scan_line(f, &eof_flag, &n);
  chars = n;
  while (eof_flag == FALSE) {
    lines = lines + 1;
    scan_line(f, &eof_flag, &n);
    chars = chars + n;
  }
  printf("lines = %d\n", lines);
  printf("chars = %d\n", chars);
}
16Line-Count Program
void line_count(FILE *f) {
  int lines = 0;
  int chars;
  BOOL eof_flag = FALSE;
  int n;
  extern void scan_line2(FILE *f, BOOL *bptr, int *iptr);
  scan_line2(f, &eof_flag, &n);
  chars = n;
  while (eof_flag == FALSE) {
    lines = lines + 1;
    scan_line2(f, &eof_flag, &n);
    chars = chars + n;
  }
  printf("lines = %d\n", lines);
  printf("chars = %d\n", chars);
}
17How are Slices Computed?
- Reachability in a Dependence Graph
- Program Dependence Graph (PDG)
- Dependences within one procedure
- Intraprocedural slicing is reachability in one PDG
- System Dependence Graph (SDG)
- Dependences within entire system
- Interprocedural slicing is reachability in the SDG
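The reachability view can be made concrete with a small sketch. It assumes a hand-written PDG for the sum program used on the earlier slides; the vertex numbering, the edge list, and the name backward_slice are illustrative and not taken from the talk. A backward slice is then the set of vertices from which the slicing criterion can be reached along dependence edges.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Vertices of the sum program's PDG (hypothetical numbering). */
enum { ENTER, SUM0, I1, WHILE, SUM_ADD, I_INC, PRINT_SUM, PRINT_I, NV };

/* Dependence edge src -> dst: dst is control- or flow-dependent on src. */
struct edge { int src, dst; };

/* Edge list written out by hand (an assumption, not computed from a CFG). */
static const struct edge pdg[] = {
    /* control dependences */
    {ENTER, SUM0}, {ENTER, I1}, {ENTER, WHILE}, {ENTER, PRINT_SUM},
    {ENTER, PRINT_I}, {WHILE, WHILE}, {WHILE, SUM_ADD}, {WHILE, I_INC},
    /* flow dependences */
    {SUM0, SUM_ADD}, {SUM0, PRINT_SUM},
    {I1, WHILE}, {I1, SUM_ADD}, {I1, I_INC}, {I1, PRINT_I},
    {SUM_ADD, SUM_ADD}, {SUM_ADD, PRINT_SUM},
    {I_INC, WHILE}, {I_INC, SUM_ADD}, {I_INC, I_INC}, {I_INC, PRINT_I},
};
enum { N_PDG_EDGES = sizeof pdg / sizeof pdg[0] };

/* Sets in_slice[v] for every vertex v from which `criterion` is reachable. */
void backward_slice(const struct edge *edges, int n_edges, int criterion,
                    bool in_slice[NV]) {
    int worklist[NV];          /* each vertex is pushed at most once */
    int top = 0;
    memset(in_slice, 0, NV * sizeof(bool));
    in_slice[criterion] = true;
    worklist[top++] = criterion;
    while (top > 0) {
        int v = worklist[--top];
        /* Follow every dependence edge backwards from v. */
        for (int e = 0; e < n_edges; e++) {
            if (edges[e].dst == v && !in_slice[edges[e].src]) {
                in_slice[edges[e].src] = true;
                worklist[top++] = edges[e].src;
            }
        }
    }
}
```

Calling backward_slice(pdg, N_PDG_EDGES, PRINT_I, s) marks Enter, i = 1, the while test, i = i + 1, and printf(i), and leaves the three sum-related vertices unmarked. A forward slice would follow the same edges in the other direction.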
18How is a PDG Created?
- Control Flow Graph (CFG)
- PDG is union of
- Control Dependence Graph
- Flow Dependence Graph
- computed from CFG
19Control Flow Graph
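int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
[Figure: control-flow graph with nodes Enter, sum = 0, i = 1, while (i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i); the while node has T and F out-edges]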
20Control Dependence Graph
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
Control dependence: q is reached from p if condition p is true (T), not otherwise (drawn as an edge from p to q labeled T). Similar for false (F).
[Figure: control dependence graph over Enter, sum = 0, i = 1, while (i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i); every edge is labeled T]
21Flow Dependence Graph
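int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
Flow dependence: Value of variable assigned at p may be used at q (drawn as an edge from p to q).
[Figure: flow dependence graph over Enter, i = 1, sum = 0, while (i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)]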
22Program Dependence Graph (PDG)
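int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
[Figure: program dependence graph combining control-dependence edges (labeled T) and flow-dependence edges over Enter, sum = 0, i = 1, while (i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)]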
23Program Dependence Graph (PDG)
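int main() {
  int i = 1;
  int sum = 0;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
[Figure: program dependence graph over Enter, sum = 0, i = 1, while (i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i), with control-dependence edges labeled T]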
24Backward Slice
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
[Figure: PDG of the program, first step of the backward-slice traversal]
25Backward Slice (2)
[Figure: program and PDG as on slide 24, second step of the backward-slice traversal]
26Backward Slice (3)
[Figure: program and PDG as on slide 24, third step of the backward-slice traversal]
27Backward Slice (4)
[Figure: program and PDG as on slide 24, fourth step of the backward-slice traversal]
28Slice Extraction
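int main() {
  int i = 1;
  while (i < 11) {
    i = i + 1;
  }
  printf("%d\n", i);
}
[Figure: PDG of the extracted slice over Enter, i = 1, while (i < 11), i = i + 1, printf(i), with control-dependence edges labeled T]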
29CodeSurfer
31CodeSurfer
36Interprocedural Slice
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = add(sum, i);
    i = add(i, 1);
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}

int add(int x, int y) {
  return x + y;
}
37Interprocedural Slice
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = add(sum, i);
    i = add(i, 1);
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}

int add(int x, int y) {
  return x + y;
}
Backward slice with respect to statement printf("%d\n", i)
38Interprocedural Slice
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = add(sum, i);
    i = add(i, 1);
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}

int add(int x, int y) {
  return x + y;
}
Superfluous components included by Weiser's slicing algorithm [TSE 84]
Left out by algorithm of Horwitz, Reps, Binkley [PLDI 88; TOPLAS 90]
39How is an SDG Created?
- Each PDG has nodes for
- entry point
- procedure parameters and function result
- Each call site has nodes for
- call
- arguments and function result
- Appropriate edges
- entry node to parameters
- call node to arguments
- call node to entry node
- arguments to parameters
40System Dependence Graph (SDG)
[Figure: SDG with Enter main, two Call p vertices, and Enter p]
41SDG for the Sum Program
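[Figure: SDG for the sum program, with the following vertices]
- main: Enter main, sum = 0, i = 1, while (i < 11), printf(sum), printf(i)
- Call add (first call site): x_in = sum, y_in = i, sum = x_out
- Call add (second call site): x_in = i, y_in = 1, i = x_out
- add: Enter add, x = x_in, y = y_in, x = x + y, x_out = x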
42Interprocedural Backward Slice
[Figure: SDG with Enter main, two Call p vertices, and Enter p]
43Interprocedural Backward Slice (2)
[Figure: SDG with Enter main, two Call p vertices, and Enter p]
44Interprocedural Backward Slice (3)
[Figure: SDG with Enter main, two Call p vertices, and Enter p]
45Interprocedural Backward Slice (4)
[Figure: SDG with Enter main, two Call p vertices, and Enter p]
46Interprocedural Backward Slice (5)
[Figure: SDG with Enter main, two Call p vertices, and Enter p]
47Interprocedural Backward Slice (6)
[Figure: SDG with Enter main, two Call p vertices, and Enter p]
48Matched-Parenthesis Path
49Interprocedural Backward Slice (6)
[Figure: SDG with Enter main, two Call p vertices, and Enter p]
50Interprocedural Backward Slice (7)
[Figure: SDG with Enter main, two Call p vertices, and Enter p]
51Slice Extraction
[Figure: extracted slice with Enter main, one Call p vertex, and Enter p]
52Slice of the Sum Program
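[Figure: SDG of the slice of the sum program, with the following vertices]
- main: Enter main, i = 1, while (i < 11), printf(i)
- Call add: x_in = i, y_in = 1, i = x_out
- add: Enter add, x = x_in, y = y_in, x = x + y, x_out = x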
53CFL-Reachability [Yannakakis 90]
- G: Graph
- L: A context-free language
- L-path from s to t iff there is a path from s to t whose edge labels spell a word in L
- Running time: O(N^3)
55Degenerate Case: CFL-Recognition
56CFL-Reachability via Dynamic Programming
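[Figure: a grammar production with right-hand side B C, shown next to the graph step it induces: a B-edge followed by a C-edge yields a new edge for the left-hand side]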
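The dynamic-programming idea can be sketched as a closure computation. The code below is a naive fixpoint iteration, not the cubic-time worklist algorithm cited on the earlier slide; the type names, the size limits, and the encoding of productions (rhs1 = -1 for an epsilon production, rhs2 = -1 for a unary one) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 16 /* maximum number of graph nodes (assumed limit) */
#define MAXS 8  /* maximum number of grammar symbols (assumed limit) */

/* Production lhs -> rhs1 rhs2, in a normal form with at most two
 * right-hand-side symbols. rhs1 == -1: epsilon; rhs2 == -1: unary. */
struct prod { int lhs, rhs1, rhs2; };

/* reach[s][i][j] means: there is an s-labeled edge (or derived s-path)
 * from node i to node j. Terminal edges are filled in by the caller;
 * the function adds every edge the productions imply. */
void cfl_closure(bool reach[MAXS][MAXN][MAXN], int n,
                 const struct prod *prods, int n_prods) {
    /* A -> epsilon gives an A self-loop at every node. */
    for (int p = 0; p < n_prods; p++)
        if (prods[p].rhs1 < 0)
            for (int i = 0; i < n; i++)
                reach[prods[p].lhs][i][i] = true;

    bool changed = true;
    while (changed) { /* iterate until no production adds an edge */
        changed = false;
        for (int p = 0; p < n_prods; p++) {
            int a = prods[p].lhs, b = prods[p].rhs1, c = prods[p].rhs2;
            if (b < 0) continue;
            for (int i = 0; i < n; i++) {
                for (int k = 0; k < n; k++) {
                    if (!reach[b][i][k]) continue;
                    if (c < 0) { /* A -> B: copy the B-edge */
                        if (!reach[a][i][k]) {
                            reach[a][i][k] = true;
                            changed = true;
                        }
                        continue;
                    }
                    /* A -> B C: B-edge i->k, C-edge k->j gives A-edge i->j */
                    for (int j = 0; j < n; j++) {
                        if (reach[c][k][j] && !reach[a][i][j]) {
                            reach[a][i][j] = true;
                            changed = true;
                        }
                    }
                }
            }
        }
    }
}
```

With a matched-parenthesis grammar such as M -> epsilon, M -> ( X, X -> M ), M -> M M and a chain graph labeled ( ( ) ), the closure derives an M-edge from the first node to the last one but none that stops after the first closing parenthesis.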
57Interprocedural Slicing via CFL-Reachability
- Graph: System dependence graph
- L: L(matched)
- Node m is in the slice w.r.t. n iff there is an L(matched)-path from m to n
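To illustrate what membership in L(matched) means for a single path, here is a minimal sketch. It assumes a hypothetical encoding in which each edge on the path carries an integer label: +c for the open parenthesis of call site c, -c for the matching close parenthesis, and 0 for an edge inside one procedure. The function only checks one given label sequence; it does not search the graph.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 64 /* assumed bound on call nesting in this sketch */

/* Returns true iff the label sequence is fully matched: every close
 * parenthesis -c pairs with the most recent unmatched open +c, and
 * nothing is left open at the end. */
bool is_matched(const int *labels, int n) {
    int stack[MAX_DEPTH];
    int depth = 0;
    for (int k = 0; k < n; k++) {
        int l = labels[k];
        if (l > 0) {                      /* open parenthesis: call */
            if (depth == MAX_DEPTH) return false; /* sketch: give up */
            stack[depth++] = l;
        } else if (l < 0) {               /* close parenthesis: return */
            if (depth == 0 || stack[depth - 1] != -l) return false;
            depth--;
        }                                 /* l == 0: intraprocedural edge */
    }
    return depth == 0;
}
```

A path that enters a procedure from call site 1 and returns to call site 2 has labels such as {1, 0, -2} and is rejected, which is how the language filters out the infeasible paths a context-insensitive traversal would follow.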
58Asymptotic Running Time [Reps, Horwitz, Sagiv, Rosay 94]
- CFL-reachability
- System dependence graph: N nodes, E edges
- Running time: O(N^3)
- System dependence graph: Special structure
Running time: O(E + CallSites × MaxParams^3)
59pointer analysis? points-to analysis? shape
analysis? alias analysis?
60Cross-Cutting Issues
- Context-sensitive/context-insensitive analysis
- interprocedural slicing
- interprocedural dataflow analysis
- Pointers and heap-allocated storage
- Flow-sensitive/flow-insensitive analysis
- Andersen's points-to analysis
- Scalability
61The Need for Pointer Analysis
int main() {
  int sum = 0;
  int i = 1;
  int *p = &sum;
  int *q = &i;
  int (*f)(int,int) = add;
  while (*q < 11) {
    *p = (*f)(*p, *q);
    *q = (*f)(*q, 1);
  }
  printf("%d\n", *p);
  printf("%d\n", *q);
}

int add(int x, int y) {
  return x + y;
}
62The Need for Pointer Analysis
int main() {
  int sum = 0;
  int i = 1;
  int *p = &sum;
  int *q = &i;
  int (*f)(int,int) = add;
  while (i < 11) {
    sum = add(sum, i);
    i = add(i, 1);
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}

int add(int x, int y) {
  return x + y;
}
63Flow-Sensitive Points-To Analysis
p = &q;
p = q;
p = *q;
*p = q;
64Flow-Sensitive → Flow-Insensitive
65Flow-Insensitive Points-To Analysis [Andersen 94, Shapiro & Horwitz 97]
p = &q;
p = q;
p = *q;
*p = q;
66Flow-Insensitive Points-To Analysis
a = &e; b = a; c = &f; *b = c; d = *a;
[Figure: points-to graph over the variables a, b, c, d, e, f]
67Flow-Insensitive Points-To Analysis
- Andersen [Thesis 94]
- Formulated using set constraints
- Cubic-time algorithm
- Shapiro & Horwitz [1995; POPL 97]
- Re-formulated as a graph-grammar problem
- Reps [1995; unpublished]
- Re-formulated as a Horn-clause program
- Melski [1996; see Reps, IST 98]
- Re-formulated via CFL-reachability
68CFL-Reachability via Dynamic Programming
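[Figure: a grammar production with right-hand side B C, shown next to the graph step it induces: a B-edge followed by a C-edge yields a new edge for the left-hand side]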
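69CFL-Reachability = Chain Programs
[Figure: a grammar production with right-hand side B C, the graph step it induces, and the corresponding chain rule]
a(X,Z) :- b(X,Y), c(Y,Z).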
70Base Facts for Points-To Analysis
p = &q;
assignAddr(p,q).
p = q;
assign(p,q).
p = *q;
assignStar(p,q).
*p = q;
starAssign(p,q).
71Rules for Points-To Analysis (I)
pointsTo(P,Q) :- assignAddr(P,Q).
pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R).
72Rules for Points-To Analysis (II)
pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S).
pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S).
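The four rules can be evaluated bottom-up by applying them until no new pointsTo fact appears. The sketch below does this with a naive fixpoint over a boolean matrix; it is an illustration of the rules, not the chain-program or CFL-reachability formulation, and the names points_to, struct fact, and MAXV are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXV 16 /* assumed limit on the number of variables */

/* The four base-fact kinds: p = &q, p = q, p = *q, *p = q. */
enum kind { ASSIGN_ADDR, ASSIGN, ASSIGN_STAR, STAR_ASSIGN };
struct fact { enum kind kind; int p, q; };

/* Adds pointsTo(p,q); returns true iff the fact is new. */
static bool add(bool pts[MAXV][MAXV], int p, int q) {
    if (pts[p][q]) return false;
    pts[p][q] = true;
    return true;
}

/* Flow-insensitive analysis: the order of the facts does not matter.
 * pts must be all-false on entry. */
void points_to(const struct fact *facts, int n_facts, int n_vars,
               bool pts[MAXV][MAXV]) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (int f = 0; f < n_facts; f++) {
            int p = facts[f].p, q = facts[f].q;
            switch (facts[f].kind) {
            case ASSIGN_ADDR: /* pointsTo(P,Q) :- assignAddr(P,Q). */
                changed |= add(pts, p, q);
                break;
            case ASSIGN:      /* pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R). */
                for (int r = 0; r < n_vars; r++)
                    if (pts[q][r]) changed |= add(pts, p, r);
                break;
            case ASSIGN_STAR: /* pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S). */
                for (int r = 0; r < n_vars; r++)
                    if (pts[q][r])
                        for (int s = 0; s < n_vars; s++)
                            if (pts[r][s]) changed |= add(pts, p, s);
                break;
            case STAR_ASSIGN: /* pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S). */
                for (int r = 0; r < n_vars; r++)
                    if (pts[p][r])
                        for (int s = 0; s < n_vars; s++)
                            if (pts[q][s]) changed |= add(pts, r, s);
                break;
            }
        }
    }
}
```

For a hypothetical five-statement program a = &e; b = a; c = &f; *b = c; d = *a; the fixpoint contains pointsTo(a,e), pointsTo(b,e), pointsTo(c,f), pointsTo(e,f), and pointsTo(d,f).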
73Rules for Points-To Analysis (II)
pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S).
pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S).
pointsTo(R,S) :- pointsTo(P,R), starAssign(P,Q), pointsTo(Q,S).
74Creating a Chain Program
pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S).
pointsTo(R,S) :- pointsTo(P,R), starAssign(P,Q), pointsTo(Q,S).
75Base Facts for Points-To Analysis
p = &q;
assignAddr(p,q).
p = q;
assign(p,q).
p = *q;
assignStar(p,q).
*p = q;
starAssign(p,q).
76Creating a Chain Program
pointsTo(P,Q) :- assignAddr(P,Q).
pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R).
pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S).
77. . . and now to CFL-Reachability
78Points-To Analysis as CFL-Reachability
Consequences
- Points-to analysis solvable in time cubic in the number of variables
- Known previously [Andersen 94]
- Demand algorithms
- What does variable p point to?
- Issue query: ?- pointsTo(p, Q).
- Solve single-source L(pointsTo)-reachability problem
- What variables point to q?
- Issue query: ?- pointsTo(P, q).
- Solve single-target L(pointsTo)-reachability problem
79[Figure: timeline marked 1987, 1993, 1994, 1995, 1996, 1997, 1998]
80Structure-Transmitted Dependences [Reps 1995]
McCarthy's equations: car(cons(x,y)) = x, cdr(cons(x,y)) = y
w = cons(x,y); v = car(w);
81Set Constraints
Semantics of Set Constraints
82CFL-Reachability versus Set Constraints
- Lazy languages: CFL-reachability is more natural
- car(cons(X,Y)) = X
- Strict languages: Set constraints are more natural
- car(cons(X,Y)) = X, provided I(Y) ≠ ∅
- But . . . SC and CFL-reachability are equivalent!
- [Melski & Reps 97]
83Solving Set Constraints
84Simulating Inhabited
W
85Simulating Inhabited
86Simulating "Provided I(Y) ≠ ∅"
87SC = CFL-Reachability: Consequences
- Demand algorithm for SC
- SC is log-space complete for PTIME
- Limitations on ability to parallelize algorithms
for solving set-constraint problems
88Outline
- Interprocedural slicing
- Interprocedural dataflow analysis
- Demand-driven analysis
- (Model-checking of recursive HFSMs)
89[Figure: timeline marked 1987, 1993, 1994, 1995, 1996, 1997, 1998]
90Dataflow Analysis
- Goal: For each point in the program, determine a superset of the facts that could possibly hold during execution
- Examples
- Constant propagation
- Reaching definitions
- Live variables
- Possibly uninitialized variables
91Useful For . . .
- Optimizing compilers
- Parallelizing compilers
- Tools that detect possible logical errors
- Tools that show the effects of a proposed
modification
92Possibly Uninitialized Variables
[Figure: program annotated with sets of possibly uninitialized variables: {w,x,y}, {w,y}, {w,y}, {w,y}, {w}, {w,y}]
93Precise Intraprocedural Analysis
[Figure: path diagram with label C]
94if . . .
95Precise Interprocedural Analysis
[Figure: interprocedural path diagram with labels start, n, C, ret]
[Sharir & Pnueli 81]
96Representing Dataflow Functions
[Figure: graph representations, over the facts a, b, c, of the Identity Function and of a Constant Function]
97Representing Dataflow Functions
[Figure: graph representations, over the facts a, b, c, of a Gen/Kill Function and of a Non-Gen/Kill Function]
98if . . .
99Composing Dataflow Functions
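A minimal sketch of representing and composing such functions, assuming the usual encoding in which a distributive function over D facts becomes a relation on D + 1 nodes, with node 0 (called Lambda in the comments) standing for the empty set. The type rel_t, the helper names, and the two example functions are illustrative assumptions rather than material from the slides.

```c
#include <assert.h>
#include <stdint.h>

#define ND 4 /* node 0 = Lambda, nodes 1..3 = facts a, b, c */

/* row[x] has bit y set iff the relation has an edge x -> y. */
typedef struct { uint32_t row[ND]; } rel_t;

/* Applies the represented function to a set of facts (bits 1..3).
 * Lambda is always "present" on input and dropped from the output. */
uint32_t rel_apply(const rel_t *r, uint32_t facts) {
    uint32_t in = facts | 1u;
    uint32_t out = 0;
    for (int x = 0; x < ND; x++)
        if (in & (1u << x)) out |= r->row[x];
    return out & ~1u;
}

/* Relation for "first f, then g": x -> z iff x -> y in f and
 * y -> z in g for some y. This is plain relational composition. */
rel_t rel_compose(const rel_t *f, const rel_t *g) {
    rel_t h;
    for (int x = 0; x < ND; x++) {
        h.row[x] = 0;
        for (int y = 0; y < ND; y++)
            if (f->row[x] & (1u << y)) h.row[x] |= g->row[y];
    }
    return h;
}
```

As an example, the gen/kill function S -> (S - {b}) ∪ {c} has edges Lambda -> Lambda, Lambda -> c, and a -> a, while the non-gen/kill function "if a ∈ S then S ∪ {b} else S - {b}" has edges Lambda -> Lambda, a -> a, a -> b, and c -> c. Composing the two relations gives the same answers as applying the two functions in sequence.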
100[Figure: diagram with labels x, y, a, b, and if . . .]
101Interprocedural Dataflow Analysis via CFL-Reachability
- Graph: Exploded control-flow graph
- L: L(matched)
- Fact d holds at n iff there is an L(matched)-path
from
102Asymptotic Running Time [Reps, Horwitz, Sagiv 95]
- CFL-reachability
- Exploded control-flow graph: ND nodes
- Running time: O(N^3 D^3)
- Exploded control-flow graph: Special structure
Running time: O(E D^3)
Typically E ≈ N
Gen/kill problems: O(ED)
103Why Bother? "We're only interested in million-line programs"
- Know thy enemy!
- Any algorithm must do these operations
- Avoid pitfalls (e.g., claiming quadratic-time algorithm)
- Special cases
- Gen/kill problems: O(ED)
- Compression techniques
- Basic blocks
- SSA form
- Demand algorithms
104Outline
- Interprocedural slicing
- Interprocedural dataflow analysis
- Demand-driven analysis
- (Model-checking of recursive HFSMs)
105Exhaustive Versus Demand Analysis
- Exhaustive analysis: All facts at all points
- Optimization: Concentrate on inner loops
- Program-understanding tools: Only some facts are of interest
- Demand analysis
- Does a given fact hold at a given point?
- Which facts hold at a given point?
- At which points does a given fact hold?
106Exhaustive Versus Demand Analysis
- Exhaustive analysis: All facts at all points
- Optimization: Concentrate on inner loops
- Program-understanding tools: Only some facts are of interest
107Demand Analysis and LP Queries (I)
- Flow-insensitive analysis
- Does variable x point to y?
- ?- pointsTo(x, y).
- What does variable x point to?
- ?- pointsTo(x, Y).
- What variables point to y?
- ?- pointsTo(X, y).
108Demand Analysis and LP Queries (II)
- Flow-sensitive analysis
- Does a given fact f hold at a given point p?
- ?- dfFact(p, f).
- Which facts hold at a given point p?
- ?- dfFact(p, F).
- At which points does a given fact f hold?
- ?- dfFact(P, f).
- E.g., flow-sensitive points-to analysis
- ?- dfFact(p, pointsTo(x, Y)).
- ?- dfFact(P, pointsTo(x, y)).
- etc.
109if . . .
110Experimental Results [Horwitz, Reps, Sagiv 1995]
- 53 C programs (200-6,700 lines)
- For a single fact of interest
- Demand algorithm always better than exhaustive algorithm
- All appropriate demands beats exhaustive when percentage of "yes" answers is high
- Live variables
- Truly live variables
- Constant predicates
- . . .
111Path Problems
- Static analysis
- context-free reachability
- Path profiling
- path counting
- Model checking
- reachability
- cyclicity
- Testing
- identifying non-executable paths
112Outline
- Interprocedural slicing
- Interprocedural dataflow analysis
- Demand-driven analysis
- (Model-checking of recursive HFSMs)
113Model-Checking of Recursive HFSMs [Benedikt, Godefroid, Reps (in prep.)]
- Non-recursive HFSMs [Alur & Yannakakis 98]
- Ordinary FSMs
- T-reachability/circularity queries
- Recursive HFSMs
- Matched-parenthesis T-reachability/circularity
- Key observation: Linear-time algorithms for matched-parenthesis T-reachability/cyclicity
- Single-entry/multi-exit or multi-entry/single-exit
- Deterministic, multi-entry/multi-exit
114Recursive HFSMs Data Complexity
115Recursive HFSMs Data Complexity
116But . . . ?
- Model checking
- Huge graphs (10^100 reachable states)
- Reachability/circularity queries
- Represent implicitly (OBDDs)
- Dataflow analysis
- Large graphs
- e.g., Stmts × Vars (≈ 10^11)
- CFL-reachability queries [Reps, Horwitz, Sagiv 95]
- OBDDs blew up [Siff & Reps 95 (unpub.)]
- . . . yes, we tried the usual tricks . . .
117CFL-Reachability Scope of Applicability
- Static analysis
- Slicing, DFA, structure-transmitted dep., points-to analysis
- Formal-language theory
- CF-, 2DPDA-, 2NPDA-recognition
- Attribute-grammar analysis
- Verification
- Model-checking recursive HFSMs
- Ping-pong protocols [Dolev, Even, Karp 83]
118CFL-Reachability Benefits
- Algorithms
- Demand exhaustive
- Complexity
- Linear-, quadratic-, cubic-time algorithms
- PTIME-completeness
- Variants that are undecidable
- Complementary to
- Equations
- Set constraints
- Types
- . . .
119Most Significant Contributions 1987-2000
- Asymptotically fastest algorithms
- Interprocedural slicing
- Interprocedural dataflow analysis
- Demand algorithms
- All appropriate demands may beat exhaustive
- Tool for slicing and browsing ANSI C
- Slices programs as large as 60,000 lines
- University research distribution
- Commercial product CodeSurfer (GrammaTech, Inc.)
- CFL-reachability as unifying conceptual model
- [Kou 77], [Holley & Rosen 81], [Cooper & Kennedy 88], [Callahan 88], [Horwitz, Reps, Binkley 88], . . .
- Identifies fundamental bottlenecks (e.g., cubic-time barrier)
120Path Problems
- Static analysis
- context-free reachability
- Path profiling
- path counting
- Model checking
- reachability
- cyclicity
- Testing
- identifying non-executable paths
121Ball-Larus Intraprocedural Path Profiling
NumPathsToExit(Exit) = 1
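Only the base case survives on this slide. As a hedged sketch of the standard Ball-Larus numbering it belongs to, the code below counts paths to the exit of an acyclic control-flow graph and assigns each edge a value so that the edge values along any entry-to-exit path sum to a unique path number. It assumes vertices are numbered in topological order with the exit last; the names ball_larus and struct dag_edge are illustrative.

```c
#include <assert.h>

#define MAXV 16 /* assumed limit on the number of vertices */

struct dag_edge { int src, dst; long val; };

/* Assumes every edge goes from a lower-numbered to a higher-numbered
 * vertex and that vertex n-1 is the exit. Fills num_paths[v] with the
 * number of paths from v to the exit and edges[e].val with the
 * increment to add when edge e is taken. */
void ball_larus(struct dag_edge *edges, int n_edges, int n,
                long num_paths[MAXV]) {
    for (int v = n - 1; v >= 0; v--) {
        if (v == n - 1) {       /* base case: NumPathsToExit(Exit) = 1 */
            num_paths[v] = 1;
            continue;
        }
        num_paths[v] = 0;
        for (int e = 0; e < n_edges; e++) {
            if (edges[e].src != v) continue;
            /* Paths through earlier successors take the lower numbers. */
            edges[e].val = num_paths[v];
            num_paths[v] += num_paths[edges[e].dst];
        }
    }
}
```

On a six-vertex graph with edges A->B, A->C, B->C, B->D, C->D, D->E, D->F, E->F, the entry has six paths to the exit, and summing the edge values along each path yields the numbers 0 through 5 exactly once.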
122Melski-Reps Interprocedural Path Profiling
123Automatic Differentiation
124Automatic Differentiation
double F(double x) {
  int i;
  double ans = 1.0;
  for (i = 1; i <= n; i++) {
    ans = ans * f_i(x);
  }
  return ans;
}

double delta = . . .; /* small constant */
double F'(double x) {
  return (F(x + delta) - F(x)) / delta;
}
125Automatic Differentiation
double F(double x) {
  int i;
  double ans = 1.0;
  for (i = 1; i <= n; i++) {
    ans = ans * f_i(x);
  }
  return ans;
}
126Automatic Differentiation
double F'(double x) {
  int i;
  double ans' = 0.0;
  double ans = 1.0;
  for (i = 1; i <= n; i++) {
    ans' = ans * f_i'(x) + ans' * f_i(x);
    ans = ans * f_i(x);
  }
  return ans';
}
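The slide notation uses primes, which are not legal in C identifiers. The following compilable sketch renames them (F_prime, ans_prime) and picks hypothetical factors f_i(x) = x + i with i = 1..3, chosen only so that the result is easy to check by hand. It places the derivative code next to the finite-difference approximation from the earlier slide.

```c
#include <assert.h>

#define N_FACTORS 3 /* plays the role of n; value assumed for the example */

/* Hypothetical factors f_i(x) = x + i and derivatives f_i'(x) = 1. */
static double f(int i, double x) { return x + i; }
static double f_prime(int i, double x) { (void)i; (void)x; return 1.0; }

/* F(x) = f_1(x) * f_2(x) * ... * f_n(x) */
double F(double x) {
    double ans = 1.0;
    for (int i = 1; i <= N_FACTORS; i++)
        ans = ans * f(i, x);
    return ans;
}

/* Derivative code: each statement of F is paired with a statement that
 * updates the derivative, using the product rule. The derivative must
 * be updated before ans, because it needs the old value of ans. */
double F_prime(double x) {
    double ans_prime = 0.0;
    double ans = 1.0;
    for (int i = 1; i <= N_FACTORS; i++) {
        ans_prime = ans * f_prime(i, x) + ans_prime * f(i, x);
        ans = ans * f(i, x);
    }
    return ans_prime;
}

/* Finite-difference approximation, for comparison. */
double F_finite_diff(double x, double delta) {
    return (F(x + delta) - F(x)) / delta;
}
```

Here F(x) = (x + 1)(x + 2)(x + 3), so F(0) = 6 and F'(0) = 2·3 + 1·3 + 1·2 = 11. F_prime returns that value with no dependence on a step size, while F_finite_diff only approaches it as delta shrinks.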
127Automatic Differentiation
[Figure: diagram with labels x1, y1, xi, yj1, xm, yn]