Title: Interprocedural Dataflow Analysis in the Presence of Large Libraries
- Atanas (Nasko) Rountev, Scott Kagan (Ohio State University)
- Thomas Marlowe (Seton Hall University)
2. Uses of Interprocedural Dataflow Analysis
- Performance optimizations in compilers
- Software understanding and transformation
  - e.g., dependence analysis for program slicing, change impact analysis, refactoring, etc.
- Software testing
  - e.g., dataflow-based testing; testing of object interactions in OO software
- Software checking
  - e.g., object protocols such as open (read | write)* close
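A protocol such as open (read | write)* close can be checked by tracking the state of a small automaton. The sketch below is a minimal illustration, assuming string event names and hypothetical helpers `step` and `check`; it runs over a single trace, whereas a static checker would propagate sets of these automaton states as dataflow facts along the program's control-flow graph.

```python
# Typestate automaton for the protocol open (read | write)* close.
# States: "init" (not yet opened), "opened", "closed"; None marks a violation.
TRANSITIONS = {
    ("init", "open"): "opened",
    ("opened", "read"): "opened",
    ("opened", "write"): "opened",
    ("opened", "close"): "closed",
}


def step(state, event):
    """Advance the automaton by one event; undefined transitions are errors."""
    if state is None:
        return None
    return TRANSITIONS.get((state, event))


def check(events):
    """Return True if the event sequence is a complete, legal protocol run."""
    state = "init"
    for event in events:
        state = step(state, event)
    return state == "closed"
```

For example, `check(["open", "read", "write", "close"])` is true, while `check(["read", "close"])` is false because reading before opening has no transition.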
3. Model for Interprocedural Whole-Program Analysis
[Figure: code for C1, code for C2, …, code for Cn → Engine for Whole-Program Dataflow Analysis → dataflow solution for C1, C2, …, Cn]
- Components C1, C2, …, Cn form a complete program
- Assumption: it is possible and desirable to analyze the source code of the entire program
4. A Specific Case: Main + Lib
[Figure: code for Main and code for Lib → Engine for Whole-Program Dataflow Analysis → dataflow solution for Main + Lib]
- Main + Lib form a complete program
- What if we are using large libraries that need to be re-analyzed from scratch?
  - e.g., the standard Java libraries contain about 10,000 classes and 80,000 methods
  - these must be re-analyzed with every new Main component
5. Example Methods in Java Programs
6. A Specific Case: Main + Lib
[Figure: code for Lib → Summary Generation Analysis → summary for Lib; code for Main and summary for Lib → Engine for Whole-Program Dataflow Analysis → dataflow solution for Main]
- Goal: the solution for Main should be as good as the solution that would have been computed by a whole-program analysis (no loss of precision)
7. Functional Approach to Whole-Program Analysis
- Sharir and Pnueli, 1981
- Dataflow lattice $L$
- Edge function $f: L \to L$ for the effects of a statement
- Path function $f = f_n \circ f_{n-1} \circ \cdots \circ f_2 \circ f_1$
- Phase 1: summary functions $\phi_n: L \to L$
  - the solution at node $n$ as a function of the solution at the entry of $n$'s procedure
- Phase 2: solutions at the start nodes of procedures
- Phase 3: solutions at the remaining nodes
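One way to make Phase 1 concrete is to pick a class of edge functions that is closed under composition and merge. The sketch below assumes gen/kill functions f(x) = (x - kill) ∪ gen over sets of facts, as in a "may" problem such as reaching definitions, where merging two paths means taking the union of their results. The names `GenKill`, `path_function`, and `phase1` are illustrative, and the procedure is assumed acyclic and call-free so that one pass in topological order suffices; Phases 2 and 3 are not shown.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class GenKill:
    """Edge function f(x) = (x - kill) | gen over sets of dataflow facts."""
    gen: frozenset = frozenset()
    kill: frozenset = frozenset()

    def __call__(self, facts):
        return (frozenset(facts) - self.kill) | self.gen

    def then(self, g):
        """Return g ∘ self (apply self first, then g) in closed form."""
        return GenKill((self.gen - g.kill) | g.gen, self.kill | g.kill)

    def merge(self, other):
        """Merge of two paths for a may problem: x -> self(x) | other(x)."""
        return GenKill(self.gen | other.gen, self.kill & other.kill)


IDENTITY = GenKill()


def path_function(edge_functions):
    """f = f_n ∘ ... ∘ f_1 for edge functions listed in path order f_1, ..., f_n."""
    return reduce(lambda acc, f: acc.then(f), edge_functions, IDENTITY)


def phase1(nodes, edges, start):
    """Summary functions phi_n relative to the procedure's start node.

    nodes: node ids in topological order; edges: dict (m, n) -> GenKill.
    Assumes an acyclic procedure without calls (a simplification)."""
    phi = {start: IDENTITY}
    for n in nodes:
        incoming = [phi[m].then(f) for (m, t), f in edges.items()
                    if t == n and m in phi]
        if incoming:
            phi[n] = reduce(GenKill.merge, incoming)
    return phi
```

Because `then` and `merge` have closed forms that stay within gen/kill functions, each $\phi_n$ is itself just a pair of sets, which keeps Phase 1 cheap for this problem class.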
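8. Example: Functional Approach
$\phi_6 = \phi_{13} \circ f_1 \circ f_0$
$\phi_{28} = f_8 \circ f_7$
$\phi_{21} = (f_4 \circ f_5) \sqcap (\phi_{28} \circ f_6)$
$\phi_{13} = (\phi_{21} \circ f_2) \sqcap (\phi_{21} \circ f_3)$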
9. Callbacks
- Callbacks
  - e.g., function pointers in C
  - e.g., virtual dispatch in C++ and Java
- We can no longer determine $\phi_{21}$ and $\phi_{13}$ without the code for ext
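The sketch below, loosely modeled on Java's `java.util.Comparator`, shows why a callback blocks summary generation: the library function calls `compare` on an object supplied by the main component, so the library alone does not determine the call's target. `Comparator`, `library_max`, and `ByLength` are hypothetical names used only for illustration.

```python
from abc import ABC, abstractmethod


class Comparator(ABC):
    """Library-side interface; the implementations live in the main component."""

    @abstractmethod
    def compare(self, a, b) -> int:
        ...


def library_max(items, comparator):
    """Library code. The call comparator.compare(...) is dispatched to code the
    library analysis never sees, so it is a non-fixed call: its effect cannot be
    folded into a precomputed summary of library_max."""
    best = items[0]
    for item in items[1:]:
        if comparator.compare(item, best) > 0:
            best = item
    return best


class ByLength(Comparator):
    """Main component: supplies the callback target."""

    def compare(self, a, b):
        return len(a) - len(b)
```

With `ByLength`, `library_max(["aa", "b", "ccc"], ByLength())` returns `"ccc"`; a different main component could pass a comparator with arbitrary side effects, which is the situation described above for ext.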
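10. Library Summary
- Idea: run pieces of Phase 1
- Compute functions for sets of library-local paths
  - $\phi_{14,16} = \mathit{id}$, $\phi_{14,21} = f_8 \circ f_7 \circ f_6$, $\phi_{17,21} = f_4 \circ f_5$, $\phi_{7,11} = f_2 \sqcap f_3$, $\phi_{12,13} = \mathit{id}$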
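Computing $\phi_{k,n}$ for library-local paths can be sketched as a worklist run of Phase 1 that starts at each context k and never crosses a non-fixed call. This sketch assumes gen/kill edge functions for a may problem (merge is set union), represents a fixed call to a fixed procedure as a single edge labeled with the callee's summary, and omits non-fixed call-to-return edges entirely; `GenKill` and `library_local_summaries` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenKill:
    """Edge function f(x) = (x - kill) | gen over sets of dataflow facts."""
    gen: frozenset = frozenset()
    kill: frozenset = frozenset()

    def then(self, g):
        """g ∘ self: apply self first, then g."""
        return GenKill((self.gen - g.kill) | g.gen, self.kill | g.kill)

    def merge(self, other):
        """Merge of two paths for a may problem (union of results)."""
        return GenKill(self.gen | other.gen, self.kill & other.kill)


IDENTITY = GenKill()


def library_local_summaries(edges, contexts):
    """Return phi[(k, n)] for every n reachable from a context k.

    edges: dict (m, n) -> GenKill for library-local edges only. Non-fixed
    call-to-return pairs are absent, so paths stop at them; the return node
    of such a call should itself appear in contexts."""
    succ = {}
    for (m, n), f in edges.items():
        succ.setdefault(m, []).append((n, f))
    phi = {}
    for k in contexts:
        local = {k: IDENTITY}
        work = [k]
        while work:  # terminates: merges only grow gen and shrink kill
            m = work.pop()
            for n, f in succ.get(m, []):
                new = local[m].then(f)
                if n in local:
                    new = local[n].merge(new)
                if local.get(n) != new:
                    local[n] = new
                    work.append(n)
        for n, fn in local.items():
            phi[(k, n)] = fn
    return phi
```

On a hypothetical graph shaped like the one above (14 → 16, a cut non-fixed call 16-17, 17 → 21 via $f_5$ then $f_4$, and 14 → 21 via $f_6$ and the callee summary $f_8 \circ f_7$), this yields $\phi_{14,16} = \mathit{id}$, $\phi_{14,21} = f_8 \circ f_7 \circ f_6$, and $\phi_{17,21} = f_4 \circ f_5$.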
11. Library Summary Generation
- Fixed call in the library
  - always invokes the same library procedure, independent of the code for the main component
- Fixed procedure in the library
  - makes no calls, or
  - makes only fixed calls, to fixed procedures
  - the standard functional approach can be applied
- For any other procedure, compute $\phi_{k,n}$, where
  - $k$ is the start node, or
  - $k$ is a return from a non-fixed call, or
  - $k$ is a return from a fixed call to a non-fixed procedure
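The definition of a fixed procedure refers to other fixed procedures, so it can be computed by iteration. The sketch below assumes a hypothetical encoding in which each procedure maps to the callees of its call sites, with `None` standing for a non-fixed call; it starts from "everything is fixed" and removes procedures until nothing changes. Treating a cycle of fixed calls among otherwise-fixed procedures as fixed (a greatest fixed point) is an assumption of this sketch, chosen because the functional approach itself handles recursion.

```python
def fixed_procedures(calls):
    """Return the set of fixed library procedures.

    calls: dict procedure -> list with one entry per call site; an entry is
    the callee's name for a fixed call, or None for a non-fixed call
    (e.g., a callback whose target depends on the main component)."""
    fixed = set(calls)
    changed = True
    while changed:
        changed = False
        for proc in sorted(fixed):
            # A procedure stops being fixed if any call is non-fixed or
            # targets a procedure that is not (or is no longer) fixed.
            if any(callee is None or callee not in fixed for callee in calls[proc]):
                fixed.discard(proc)
                changed = True
    return fixed
```

With call sites encoded as p1 → [p2] and p2 → [None, p3] (one reading of the example on the next slide), only p3 comes out fixed.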
12. Example: Library Summary Generation
- Fixed calls
  - 11-12 and 23-24
- Non-fixed calls
  - 16-17
- Fixed procedures
  - p3
- Non-fixed procedures
  - p1 and p2
- Contexts $k$ for $\phi_{k,n}$
  - 7 and 14: start nodes
  - 17: return from a non-fixed call
  - 12: return from a fixed call to a non-fixed procedure
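Given the set of fixed procedures, the contexts k follow directly from the three rules. The sketch below encodes the example under assumptions the slide does not state: the call 11-12 sits in p1 and targets p2, the calls 16-17 and 23-24 sit in p2 (the latter targeting p3), and p3's start node number 25 is made up. `summary_contexts` and `EXAMPLE` are hypothetical names.

```python
def summary_contexts(procedures, fixed):
    """Return, for each non-fixed procedure, the context nodes k for phi_{k,n}.

    procedures: dict name -> (start_node, [(return_node, callee), ...]),
    where callee is None for a non-fixed call. fixed: set of fixed procedures."""
    contexts = {}
    for name, (start, sites) in procedures.items():
        if name in fixed:
            continue  # summarized with the standard functional approach
        ks = {start}  # rule 1: the start node
        for ret, callee in sites:
            # rule 2: return from a non-fixed call;
            # rule 3: return from a fixed call to a non-fixed procedure
            if callee is None or callee not in fixed:
                ks.add(ret)
        contexts[name] = ks
    return contexts


# Hypothetical encoding of the example; p3's start node (25) is made up.
EXAMPLE = {
    "p1": (7, [(12, "p2")]),
    "p2": (14, [(17, None), (24, "p3")]),
    "p3": (25, []),
}
```

Calling `summary_contexts(EXAMPLE, {"p3"})` gives {7, 12} for p1 and {14, 17} for p2, matching the contexts listed on the slide; node 24 is not a context because the call 23-24 is a fixed call to the fixed procedure p3.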
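13. The Condensed Graph
[Figure: condensed library graph whose edges are labeled with the summary functions $\phi_{14,16} = \mathit{id}$, $\phi_{14,21} = f_8 \circ f_7 \circ f_6$, $\phi_{17,21} = f_4 \circ f_5$, $\phi_{7,11} = f_2 \sqcap f_3$, $\phi_{12,13} = \mathit{id}$]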
14. Analysis of a Main Component
- Create a fake graph for the whole program: the graph for Main plus the condensed library graph
- Run a whole-program analysis engine on this graph
- Safe solutions for non-library nodes
  - precise for distributive problems
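To see why the result is precise for distributive problems, the sketch below runs the same worklist solver on a small original graph and on its condensed version, in which all paths from node 0 to node 4 are replaced by one edge labeled with their summary. It assumes gen/kill functions for a may problem (merge is set union), which are distributive; the graph and the names `GenKill`, `solve`, `original`, and `condensed` are illustrative, and interprocedural call/return matching is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenKill:
    """Edge function f(x) = (x - kill) | gen over sets of dataflow facts."""
    gen: frozenset = frozenset()
    kill: frozenset = frozenset()

    def __call__(self, facts):
        return (frozenset(facts) - self.kill) | self.gen

    def then(self, g):
        """g ∘ self: apply self first, then g."""
        return GenKill((self.gen - g.kill) | g.gen, self.kill | g.kill)

    def merge(self, other):
        """Merge of two paths for a may problem (union of results)."""
        return GenKill(self.gen | other.gen, self.kill & other.kill)


IDENTITY = GenKill()


def solve(edges, entry, entry_facts):
    """Forward may-analysis: facts[n] is the union of f(facts[m]) over edges (m, n)."""
    succ = {}
    for (m, n), f in edges.items():
        succ.setdefault(m, []).append((n, f))
    facts = {entry: frozenset(entry_facts)}
    work = [entry]
    while work:
        m = work.pop()
        for n, f in succ.get(m, []):
            new = f(facts[m]) | facts.get(n, frozenset())
            if new != facts.get(n):
                facts[n] = new
                work.append(n)
    return facts


f1 = GenKill(frozenset({"a"}), frozenset({"x"}))
f2 = GenKill(frozenset({"b"}), frozenset({"a"}))
h = GenKill(frozenset({"c"}), frozenset({"b"}))
# Two branches (f1, f2) that rejoin at node 3, followed by h.
original = {(0, 1): f1, (0, 2): f2, (1, 3): IDENTITY, (2, 3): IDENTITY, (3, 4): h}
# The same paths collapsed into one edge labeled h ∘ (f1 merged with f2).
condensed = {(0, 4): f1.merge(f2).then(h)}
```

For the entry facts `{"x"}`, both `solve(original, 0, {"x"})[4]` and `solve(condensed, 0, {"x"})[4]` are `frozenset({"a", "c", "x"})`: because gen/kill functions distribute over union, merging before applying `h` loses nothing. For a non-distributive problem, the condensed edge would only be guaranteed safe, not exact.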
15. Original vs. Condensed Library CFGs: Number of Nodes
16. Original vs. Condensed Library CFGs: Number of Edges
17. Discussion
- Flow and context insensitivity
- Cost reduction: time and memory
- Compact representation of functions
  - IFDS, IDE
- Use assumptions about the callback methods?
  - e.g., assume callback methods are "good"
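A compact representation used in IFDS-style analyses encodes a distributive function over sets of facts as a relation on D ∪ {0}, where the special fact 0 always holds and a pair (0, d) means d is generated. The sketch below, with hypothetical names and string facts, builds such relations from gen/kill sets and composes them as relations; IDE's value-labeled edges are not shown.

```python
ZERO = "0"  # the special fact that always holds; must not be a real fact name


def apply_relation(rel, facts):
    """Apply the function encoded by rel, a set of (d, d2) pairs, to a fact set."""
    sources = set(facts) | {ZERO}
    return {d2 for (d1, d2) in rel if d1 in sources} - {ZERO}


def compose(g, f):
    """Relation encoding g ∘ f (apply f first, then g)."""
    return {(a, c) for (a, b) in f for (b2, c) in g if b == b2}


def from_gen_kill(domain, gen, kill):
    """Encode x -> (x - kill) | gen over the given finite fact domain."""
    rel = {(ZERO, ZERO)}                             # keeps 0 alive through compositions
    rel |= {(ZERO, d) for d in gen}                  # generated facts
    rel |= {(d, d) for d in domain if d not in kill} # facts that pass through
    return rel
```

The composed relation never has more than (|D| + 1)² pairs, however many paths it summarizes, which is one reason such representations are attractive for storing summaries.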