Title: Abstract Interpretation and Future Program Analysis Problems
1 Abstract Interpretation and Future Program Analysis Problems
- Martin Rinard
- Alexandru Salcianu
- Laboratory for Computer Science
- Massachusetts Institute of Technology
2 Abstract Interpretation: The Early Years
- Formal Connection Between
  - Sound analysis of program
  - Execution of program
- Broader Impact
  - Insight that analysis is execution
  - Reduced need to think of analysis as reasoning about all possible executions!
- Good fit with analysis problems of that era
  - Properties of local variables
  - Within single procedure
3 How Is Abstract Interpretation Holding Up?
- Technical result as relevant as ever
- Moore's Law effects
  - Much more computing power for analysis
  - More complex programs
- Ambitious analyses
  - Heap properties
  - Multiple threads
  - Interprocedural partial program analyses
- Stretch intuitive vision of analysis as execution
4 Outline
- Combined pointer and escape analysis
  - Rationale behind design decisions
  - Alternative choices in design space
- Challenges and Predictions
- Bigger Picture
5 Goal of Pointer Analysis
- Characterize objects to which pointers point
- Synthesize finite set of object representatives
- Derive representative(s) each pointer points to
[Figure: points-to graph with variables p and r and field f. Caption: p.f points to a particular object, so after the execution of r = p.f, r may point to that object, but not to any of the other objects shown.]
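A minimal Python sketch of the r = p.f step, assuming a points-to graph stored as two maps: variables to sets of nodes, and (node, field) pairs to sets of nodes. The node names n1 to n4 and the function load are hypothetical illustrations, not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical representation (assumption, not from the slides):
# variables map to sets of abstract nodes (object representatives), and
# heap edges map (node, field) pairs to sets of abstract nodes.
var_points_to = {"p": {"n1"}}
heap = defaultdict(set)
heap[("n1", "f")].add("n2")   # p.f points to representative n2
heap[("n3", "f")].add("n4")   # an unrelated object and its f field


def load(dst, src, field):
    """Transfer function for dst = src.field: dst may point to every node
    reached by one `field` edge from any node that src may point to."""
    targets = set()
    for node in var_points_to.get(src, set()):
        targets |= heap[(node, field)]
    var_points_to[dst] = targets   # local variable: overwrite old value


load("r", "p", "f")
print(sorted(var_points_to["r"]))   # ['n2']
```

The finite set of nodes is what keeps the answer computable: r is described by representatives (here just n2), never by individual run-time objects.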
6 Our Pointer Analysis Goals
- Accurate for multithreaded programs
- Compositional, partial program analysis
  - Analyze each procedure once
  - Independently of callers
  - May skip analysis of invoked procedures
- Why?
  - Parts of program unavailable (different language, not written yet)
  - Parts may be irrelevant for desired result
7 Analysis Abstraction
- Basic abstraction is points-to graph
  - Nodes represent objects in heap
  - Edges represent references in heap
[Figure: example points-to graph; labels p, q, u, f]
8 Two Kinds of Edges
- Inside edges (solid) represent references created inside analyzed part of program
- Outside edges (dashed) represent references created outside analyzed part of program
[Figure: example points-to graph with solid and dashed edges; labels p, q, u, f]
9 Two Kinds of Nodes
- Inside nodes (solid) represent objects created inside analyzed part of program
- Outside nodes (dashed) represent objects
  - Created outside analyzed part of program, or
  - Accessed via edges created outside analyzed part of program
[Figure: example points-to graph with solid and dashed nodes; labels p, q, u, f]
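A minimal sketch of how the inside/outside distinction for nodes and edges might be recorded, assuming one flag per node and per edge. The class names (Kind, Node, Edge, PointsToGraph) are hypothetical, not the authors' data structures.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    INSIDE = "inside"    # created inside the analyzed part of the program
    OUTSIDE = "outside"  # created (or only reached) outside the analyzed part


@dataclass(frozen=True)
class Node:
    name: str
    kind: Kind


@dataclass(frozen=True)
class Edge:
    src: Node
    label: str           # field name, e.g. "f"
    dst: Node
    kind: Kind


@dataclass
class PointsToGraph:
    edges: set = field(default_factory=set)

    def edges_of_kind(self, kind):
        return {e for e in self.edges if e.kind is kind}


# Parameter p refers to an object created by the caller (outside node);
# executing p.f = new C() inside the procedure adds an inside node and an
# inside edge.
param_p = Node("P", Kind.OUTSIDE)
alloc = Node("R", Kind.INSIDE)
g = PointsToGraph({Edge(param_p, "f", alloc, Kind.INSIDE)})
print(len(g.edges_of_kind(Kind.INSIDE)), len(g.edges_of_kind(Kind.OUTSIDE)))  # 1 0
```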
10 Key Question
- What does the heap look like when the procedure begins its execution?
- Previous algorithms analyzed callers before callees, so a model of the heap was always available
- Unfortunately, this approach requires analysis of the entire program in top-down fashion
- Our solution: use the code to reconstruct what the (accessed part of the) heap must look like
11 Analysis In Example
m(p, q)
  r = new C()
  p.f = r
  s = q
  do
    s = s.f
  until (s == null)
  t = new C()
  s.f = t
  u = new C()
[Figure: initial points-to graph, with nodes for parameters p and q]
12 Analysis In Example
[Figure: code of m(p, q) as on slide 11; points-to graph after r = new C(), with labels p, r, q]
13 Analysis In Example
[Figure: code of m(p, q) as on slide 11; points-to graph after p.f = r, with labels p, r, q and an f edge]
14 Analysis In Example
[Figure: code of m(p, q) as on slide 11; points-to graph after s = q, with labels p, r, q, s]
15 Analysis In Example
[Figure: code of m(p, q) as on slide 11; points-to graph after the first s = s.f, with labels p, r, q, s and f edges]
16 Analysis In Example
[Figure: code of m(p, q) as on slide 11; points-to graph after a further s = s.f, grown by another f edge]
One option: continue to expand the graph. But the analysis may never terminate.
17 Analysis In Example
[Figure: code of m(p, q) as on slide 11; points-to graph in which the objects loaded by s = s.f share a single load node]
Instead, have one outside node per load statement. It represents all objects loaded at that statement, which bounds the graph and guarantees termination.
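A minimal Python sketch of the "one outside node per load statement" rule, applied to the loop s = q; do s = s.f until (s == null). The names (load, analyze_loop, Pq, L1) are hypothetical, and, as a simplification not taken from the slides, every node is treated as reachable from unanalyzed code, so every load records an outside edge.

```python
# Simplifying assumption (not from the slides): every node may be reachable
# from unanalyzed code, so every load also records an outside edge.


def load(pts, inside, outside, src, fld, label):
    """Abstract effect of `x = src.fld` at the load statement `label`.

    Returns the nodes x may point to. All objects loaded at this statement
    from unanalyzed code share ONE load node, so the graph stays finite."""
    load_node = f"L{label}"
    result = set()
    for n in pts[src]:
        # objects stored into n.fld by the analyzed code (inside edges)
        result |= {d for (s, f, d) in inside if s == n and f == fld}
        # objects stored by unanalyzed code: one outside edge to the load node
        outside.add((n, fld, load_node))
        result.add(load_node)
    return result


def analyze_loop():
    """s = q; do s = s.f until (s == null), iterated to a fixed point."""
    pts = {"q": {"Pq"}}            # Pq: parameter node for q
    inside, outside = set(), set()
    head = set(pts["q"])           # state of s on loop entry (after s = q)
    iterations = 0
    while True:
        iterations += 1
        before = (frozenset(head), frozenset(outside))
        pts["s"] = set(head)
        after_body = load(pts, inside, outside, "s", "f", label=1)
        head = head | after_body   # join loop entry with the back edge
        if (frozenset(head), frozenset(outside)) == before:
            break
    pts["s"] = after_body          # the loop exits right after the body
    return pts, outside, iterations


pts, outside, iterations = analyze_loop()
print(sorted(pts["s"]), sorted(outside), iterations)
# ['L1'] [('L1', 'f', 'L1'), ('Pq', 'f', 'L1')] 3
```

The self-loop L1 -f-> L1 is what the single load node buys: one node stands for every object loaded at that statement, so the iteration reaches a fixed point instead of expanding the graph forever.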
18 Consequences of This Decision
- Multiple objects represented by single node (load node in loop)
- But can also have single object represented by multiple nodes in graph (!!) (object loaded at multiple statements)
do a = q.f until (a == null)
do b = q.f until (b == null)
[Figure: points-to graph in which the object q.f is represented by one load node per load statement; labels q, f]
19 Consequences of This Decision
- Form of points-to graph depends on program
- Programs with identical behavior but different graphs
do s = s.f until (s == null)
s = s.f; while (s != null) s = s.f
[Figure: side-by-side points-to graphs for the two loops; labels p, r, q, s, f]
20 Analysis In Example
[Figure: code of m(p, q) as on slide 11; points-to graph at loop exit, with labels p, r, q, s and f edges]
21 Analysis In Example
[Figure: code of m(p, q) as on slide 11; points-to graph after t = new C(), with labels p, r, q, s, t]
22 Analysis In Example
[Figure: code of m(p, q) as on slide 11; points-to graph after s.f = t, which adds an f edge]
23 Analysis In Example
[Figure: code of m(p, q) as on slide 11; final points-to graph after u = new C(), with labels p, r, q, s, t, u and f edges]
24 What Does Result Tell Us?
- Nodes (outside)
  - Created outside analyzed part of program
  - Incomplete information
- Nodes (inside, escaped)
  - Created inside analyzed part of program
  - But reachable from unanalyzed part of program
  - Incomplete information
- Nodes (inside, captured)
  - Created inside analyzed part of program
  - Unreachable from unanalyzed part of program
  - Complete information about referencing relationships!
[Figure: final points-to graph for m(p, q); labels p, q, r, s, t, u, f]
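A minimal sketch of the escaped/captured classification, assuming the simplified rule that a node escapes exactly when it is reachable from an outside node (the full analysis also has to account for, for example, objects passed to unanalyzed code or accessed by parallel threads). The node names are a hypothetical encoding of the final graph for m(p, q): parameter nodes Pp and Pq, load node L1, and inside nodes R, T, U for the three allocations.

```python
from collections import deque


def escaped_nodes(edges, outside_nodes):
    """Return every node reachable from an outside node (simplified escape
    rule). `edges` is an iterable of (src, field, dst) triples."""
    succ = {}
    for src, _fld, dst in edges:
        succ.setdefault(src, set()).add(dst)
    seen = set(outside_nodes)
    work = deque(outside_nodes)
    while work:                       # breadth-first reachability
        n = work.popleft()
        for m in succ.get(n, ()):
            if m not in seen:
                seen.add(m)
                work.append(m)
    return seen


# Hypothetical encoding: p.f = r gives Pp -f-> R, the loop gives the outside
# edges Pq -f-> L1 and L1 -f-> L1, and s.f = t gives L1 -f-> T.
edges = {("Pp", "f", "R"), ("Pq", "f", "L1"), ("L1", "f", "L1"), ("L1", "f", "T")}
nodes = {"Pp", "Pq", "L1", "R", "T", "U"}
escaped = escaped_nodes(edges, {"Pp", "Pq", "L1"})
captured = nodes - escaped
print(sorted(captured))   # ['U']
```

Only U, the object from u = new C(), is unreachable from the unanalyzed part of the program, so it is the one node for which the graph gives complete referencing information.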
25 Crucial Distinction
- Escaped vs. Captured
  - Enables analysis to identify regions of heap where it has complete information
  - Crucial for both
    - Accuracy of analysis
    - Effective use of analysis results
[Figure: final points-to graph for m(p, q); labels p, q, r, s, t, u, f]
26 Multiple Calling Contexts
- Two Key Assumptions
  - p and q refer to different objects
  - Parallel threads may access objects
[Figure: code of m(p, q) as on slide 11 and its points-to graph under these assumptions; labels p, q, r, s, t, f]
27 Multiple Calling Contexts
What if p and q refer to the same object (i.e., p and q are aliased)?
[Figure: code of m(p, q) as on slide 11 and its points-to graph when p and q are aliased; labels p, q, r, s, t, f]
28 Multiple Calling Contexts
What if p and q refer to the same object and there are no parallel threads?
[Figure: code of m(p, q) as on slide 11, with two points-to graphs for comparison; labels p, q, r, s, t, f]
29 Multiple Calling Contexts
What if p and q refer to the same object and there are no parallel threads?
[Figure: code of m(p, q) as on slide 11 and the resulting points-to graph; labels p, q, r, s, t, f]
30 Issues
- Substantially different results for different calling contexts
- But caller is unavailable at analysis time
- New analysis for each possible context?
  - Lots of contexts
  - Most of which probably won't be needed
31 Our Solution
- Analyze assuming
  - Distinct parameters
  - Parallel threads
- Aliased parameters at caller? Merge nodes
- No parallel threads? Remove outside edges and nodes
[Figure: the points-to graph for m computed under these assumptions, next to its specialized version; labels p, q, r, s, t, f]
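A minimal sketch of the "merge nodes" specialization for a caller that passes the same object as p and q, assuming graphs encoded as (src, field, dst) triples. The function merge_nodes and the node names (Pp, Pq, L1, R, T) are hypothetical; only the renaming step is shown, and the removal of outside edges and nodes for the no-parallel-threads case is not modeled.

```python
def merge_nodes(edges, keep, drop):
    """Specialize a points-to graph for a caller in which the nodes `keep`
    and `drop` denote the same object: every occurrence of `drop` is
    replaced by `keep`. `edges` holds (src, field, dst) triples."""
    def rename(n):
        return keep if n == drop else n

    return {(rename(s), f, rename(d)) for (s, f, d) in edges}


# Hypothetical general graph for m(p, q), computed assuming that the
# parameter nodes Pp and Pq denote distinct objects.
general = {("Pp", "f", "R"), ("Pq", "f", "L1"), ("L1", "f", "L1"), ("L1", "f", "T")}
aliased = merge_nodes(general, keep="Pp", drop="Pq")
print(sorted(aliased))
# [('L1', 'f', 'L1'), ('L1', 'f', 'T'), ('Pp', 'f', 'L1'), ('Pp', 'f', 'R')]
```

After the merge, the single parameter node carries both the inside edge to R (from p.f = r) and the outside edge to L1 (from the load through q), which is why aliasing changes what the loop may read.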
32 Solution Is Not Perfect
- Specialization can lose precision: there can be two procedures that
  - Give the same analysis result when analyzed with distinct parameters
  - Give different analysis results when analyzed with aliased parameters
- Conceptually complex analysis
  - Think about all contexts during analysis
  - Start to lose intuition of analysis as execution
  - Difficult time applying abstract interpretation framework
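33 Abstract Interpretation and Analysis
Abstract interpretation is a parameterized framework:
- V: concrete values
- A: abstract values
- α: abstraction function
- γ: concretization function
[Figure: an abstract step t_a from a1 to a2 drawn above a concrete step t_v from v1 to v2, with α mapping concrete values to abstract values and γ mapping abstract values back to concrete values]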
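One standard way to write down the relationship in the figure, assuming α is applied to sets of concrete values and ⊑ is the order on A: α and γ form a Galois connection, and the abstract step t_a is sound for the concrete step t_v when every concrete value described by a1 steps to a value described by t_a(a1) = a2.

```latex
\alpha(S) \sqsubseteq a \iff S \subseteq \gamma(a)
  \qquad \text{for all } S \subseteq V,\ a \in A

v_1 \in \gamma(a_1) \implies t_v(v_1) \in \gamma\bigl(t_a(a_1)\bigr)
```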
34 Applying Framework
- A: points-to graphs
- V: concrete heaps
- α: points-to graph for a given heap
  - Points-to graph depends on program
  - Need to augment heap with access history
- γ: all heaps that correspond to points-to graph
  - OK, I give up
35 Correctness Proof
- Inductively construct a relation between
  - Objects in heap
  - Nodes that represent objects
- Invariants that characterize the relation
- Transfer function
  - Takes points-to graph and relation
  - Gives new points-to graph and relation
- Prove that transfer functions preserve invariants
36 Threads and Abstract Interpretation
- Philosophy of Abstract Interpretation
  - Come up with a decent abstraction
  - Execute program on that abstraction
- Problem with threads
  - Execution usually modeled as interleaving
  - Too many interleavings!
37 Our Solution
- Points-to graphs explicitly represent all possible interactions between parallel threads
- Basic Analysis Approach
  - Analyze each thread in isolation
  - To compute combined effect of multiple threads
    - Retrieve result for each thread
    - Compute interactions that may occur
- Outside edges: interactions in which one thread reads a reference created by a parallel thread
- Inside edges: interactions in which one thread creates a reference read by a parallel thread
38 Interthread Analysis
n(p, q) and m(p, q), running in parallel threads
39 Interthread Analysis
n(p, q) and m(p, q)
[Figure: the points-to graph of each thread; labels p, q]
Retrieve points-to graph from analysis of each thread
40 Interthread Analysis
n(p, q) and m(p, q)
[Figure: the two threads' points-to graphs; labels p, q]
Establish correspondence between nodes, starting with parameter nodes
41 Interthread Analysis
n(p, q) and m(p, q)
[Figure: the two threads' points-to graphs; labels p, q]
- Compute Interactions Between Threads
  - Match inside and outside edges
  - For each outside node, compute nodes in other graph that it represents
42 Interthread Analysis
n(p, q) and m(p, q)
[Figure: further matching of inside and outside edges between the two threads' graphs; labels p, q]
43 Interthread Analysis
n(p, q) and m(p, q)
[Figure: combined points-to graph for both threads; labels p, q]
- Use computed representation relationship to
  - Combine graphs and
  - Obtain single graph for the execution of both threads
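A minimal sketch of the edge-matching step, assuming each thread's graph is given as sets of (src, field, dst) triples and that the shared parameter nodes seed the correspondence. The function match_edges and all node names are hypothetical; the sketch computes only which nodes of thread B each outside (load) node of thread A may represent, not the symmetric direction or the final combination of the two graphs.

```python
def match_edges(outside_a, inside_b, initial):
    """For each node of thread A, compute the nodes of thread B it may represent.

    outside_a: (src, field, load_node) outside edges, i.e. references thread A reads
    inside_b:  (src, field, dst) inside edges, i.e. references thread B creates
    initial:   starting correspondence, e.g. shared parameter nodes
    """
    mu = {n: set(ms) for n, ms in initial.items()}
    changed = True
    while changed:               # iterate until no new correspondence appears
        changed = False
        for a, f, load_node in outside_a:
            for b, g, dst in inside_b:
                # thread A's read of a.f may see thread B's write of b.f
                if f == g and b in mu.get(a, ()):
                    targets = mu.setdefault(load_node, set())
                    if dst not in targets:
                        targets.add(dst)
                        changed = True
    return mu


# Hypothetical graphs. Thread A: x = p.f; y = x.g (load nodes L1, L2).
# Thread B: t = new C(); p.f = t; t.g = new C() (inside nodes B1, B2).
mu = match_edges(
    outside_a={("P", "f", "L1"), ("L1", "g", "L2")},
    inside_b={("P", "f", "B1"), ("B1", "g", "B2")},
    initial={"P": {"P"}},
)
print(sorted(mu["L1"]), sorted(mu["L2"]))   # ['B1'] ['B2']
```

Because the loop ignores the order of statements within each thread, it matches edges regardless of when they were created, in line with the flow-insensitive treatment of interactions between threads.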
44 Property of Analysis
- Flow-sensitive within each thread (if you reorder statements, you get a different result)
- Flow-insensitive between threads
  - Assumes interactions can happen
    - Any number of times
    - In any order
  - Analysis models interactions that can't actually happen in any interleaved execution
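45 Imprecision Due To Flow Insensitivity
n(a, b, c): (1) p = b.f; p.f = a; (2) a.f = b
m(a, c): (3) q = a.f; (4) q.f = c
[Figure, left: interthread analysis result over nodes a, b, and c, with edges numbered 1 to 4. Right: execution order required to produce the blue edge.]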
46 Weak Memory Consistency Models
47 Initially y = 1, x = 0
Thread 1: y = 0; x = 1
Thread 2: z = x + y
What is the value of z?
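48 Initially y = 1, x = 0
Three interleavings of Thread 1 (y = 0; x = 1) and Thread 2 (z = x + y):
- z = x + y; y = 0; x = 1 gives z = 1
- y = 0; z = x + y; x = 1 gives z = 0
- y = 0; x = 1; z = x + y gives z = 1
What is the value of z?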
49 Initially y = 1, x = 0
The three interleavings of Thread 1 (y = 0; x = 1) and Thread 2 (z = x + y) give z = 1, z = 0, and z = 1.
What is the value of z? z can be 0 or 1.
50 Initially y = 1, x = 0
The three interleavings of Thread 1 (y = 0; x = 1) and Thread 2 (z = x + y) give z = 1, z = 0, and z = 1, so z can be 0 or 1.
INCORRECT REASONING!
51 Initially y = 1, x = 0
Thread 1: y = 0; x = 1
Thread 2: z = x + y
Memory system can reorder writes as long as it preserves the illusion of sequential execution within each thread!
Different threads can observe different orders!
What is the value of z? z can be 0 or 1 OR 2!
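A toy model of the z = x + y example, under stated simplifications: Thread 2's z = x + y reads both variables at one instant, and the relaxed case simply lets Thread 2 observe Thread 1's two writes in either order. The names (INITIAL, THREAD1, possible_z) are hypothetical, and this is not a model of any real memory consistency specification.

```python
from itertools import permutations

INITIAL = {"x": 0, "y": 1}
THREAD1 = [("y", 0), ("x", 1)]   # Thread 1 in program order: y = 0; x = 1


def possible_z(write_orders):
    """Values Thread 2's z = x + y can compute (toy model).

    write_orders: orders in which Thread 1's writes may become visible to
    Thread 2. Assumption: z = x + y reads both variables at one instant."""
    results = set()
    for order in write_orders:
        for k in range(len(order) + 1):   # z = x + y runs after k writes
            mem = dict(INITIAL)
            for var, val in order[:k]:
                mem[var] = val
            results.add(mem["x"] + mem["y"])
    return results


# Interleaving semantics: Thread 2 sees the writes in program order.
print(sorted(possible_z([THREAD1])))                     # [0, 1]
# Hypothetical relaxed model: the writes may become visible in either order.
print(sorted(possible_z(list(permutations(THREAD1)))))   # [0, 1, 2]
```

The extra value 2 comes from Thread 2 seeing x = 1 before y = 0, which no interleaving of the statements in program order can produce.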
52 Implications for Example
n(a, b, c): (1) p = b.f; p.f = a; (2) a.f = b
m(a, c): (3) q = a.f; (4) q.f = c
[Figure: interthread analysis result over nodes a, b, and c, with edges numbered 1 to 4]
Blue edge can actually occur in some execution!
Can't reason about program by interleaving statements.
53 Implications for Analysis of Multithreaded Programs
- Analyzing all statement interleavings is unsound
- We believe that our flow-insensitive analysis is sound even for weak consistency models
- But formal semantics of weak memory consistency models still under development
  - Maessen, Arvind, Shen, OOPSLA 2000
  - Manson, Pugh, Java Grande/ISCOPE 2001
- Unclear how to prove ANY analysis sound
54 Challenges and Predictions
55 Need To Analyze Partial Programs
- Fact of life: whole program may be
  - Unavailable,
  - Infeasible to analyze, or
  - Unnecessary to analyze
- Challenges
  - What is the starting context (or contexts) for analysis?
  - What is the effect of invoked but unanalyzed parts of the program?
  - Especially difficult for linked data structures
56 Need To Analyze Partial Programs
- Predictions
  - Future analyses will not use presented technique
    - Care about more sophisticated properties
    - Need more information about calling context
    - Many potential calling contexts never used
  - Analysis will instead start with specification
    - Provided by programmer
    - Automatically guessed by unsound static analysis heuristic or dynamic analysis
  - Then automatically verify specification
57 Multithreaded Programs
- Challenge: too many potential executions
- Prediction: more two-phase analyses
  - Phase one
    - Analyze each thread in isolation
    - Represent potential interactions between analyzed thread and other threads
  - Phase two
    - Collect results from parallel threads
    - Compute interactions between threads
58 Multithreaded Programs
- Prediction
  - Language will enforce more structured model
    - Enhanced type system
    - Force threads to interact only at explicit synchronization points
  - Development of structured analyses
    - Analyze single thread in isolation between synchronization points
    - Apply potential interaction effects only at synchronization points
59 Weak Memory Consistency Models
- Challenges
  - Lack of good formal semantics
  - Explosion in possible program behaviors
- Short-Term Prediction
  - Development of formal semantics
  - Flow-insensitive analyses proved sound
- Long-Term Prediction
  - Structured model will force threads to interact only at synchronization points
  - Eliminate visibility of weak models
60 Trends
- More sophisticated properties
- Harsher analysis environments
  - Partial programs
  - Threads with weak consistency models
- Role of abstract interpretation
  - Intuition of analysis as execution breaking down as analyses become more ambitious
  - Analyses starting to look like verifications
    - Synthesis of loop invariants
    - Synthesizing global view of computation
61 Bigger Picture
[Figure: program verification, abstract interpretation, dynamic analyses, and unsound static analyses placed on two axes: from "No idea what program should do" to "Can write full formal specification for program", and from "Don't care if program works reliably or not" to "Correctness crucial"]