Abstract Interpretation and Future Program Analysis Problems - PowerPoint PPT Presentation


Slides: 62

Provided by: martin49

Learn more at: https://people.csail.mit.edu


Transcript and Presenter's Notes

1
Abstract Interpretation and Future Program
Analysis Problems

Martin Rinard
Alexandru Salcianu
Laboratory for Computer Science
Massachusetts Institute of Technology

2
Abstract Interpretation: The Early Years

Formal Connection Between
Sound analysis of program
Execution of program
Broader Impact
Insight that analysis is execution
Reduced need to think of analysis as reasoning
about all possible executions!
Good fit with analysis problems of that era
Properties of local variables
Within single procedure

3
How Is Abstract Interpretation Holding Up?

Technical result as relevant as ever
Moore's Law effects
Much more computing power for analysis
More complex programs
Ambitious analyses
Heap properties
Multiple threads
Interprocedural partial program analyses
Stretch intuitive vision of analysis as execution

4
Outline

Combined pointer and escape analysis
Rationale behind design decisions
Alternative choices in design space
Challenges and Predictions
Bigger Picture

5
Goal of Pointer Analysis

Characterize objects to which pointers point
Synthesize finite set of object representatives
Derive representative(s) each pointer points to

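r = p.f

[Figure: the f edge from the object p points to, and the variable r]
p.f points to one particular object, so after the
execution of r = p.f, r may point to that object,
but not to any of the other objects in the figure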
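The effect of the load r = p.f on points-to information can be sketched in a few lines of Java. This is a minimal illustration under our own assumptions, not the analysis presented in the talk: the representatives o1 to o4 are hypothetical stand-ins for the object symbols on the slide, and points-to sets are plain string sets keyed by variable name or by `representative.field`.

```java
import java.util.*;

class LoadTransfer {
    // Points-to sets of local variables: variable name -> object representatives.
    final Map<String, Set<String>> vars = new HashMap<>();
    // Points-to sets of fields: "representative.field" -> object representatives.
    final Map<String, Set<String>> fields = new HashMap<>();

    // Transfer function for the load r = p.f: afterwards r may point to exactly the
    // representatives that field f of some representative of p may point to.
    void load(String r, String p, String f) {
        Set<String> result = new TreeSet<>();
        for (String node : vars.getOrDefault(p, Set.of())) {
            result.addAll(fields.getOrDefault(node + "." + f, Set.of()));
        }
        vars.put(r, result); // a local variable can be overwritten outright
    }

    public static void main(String[] args) {
        LoadTransfer g = new LoadTransfer();
        g.vars.put("p", Set.of("o1"));      // p points to o1
        g.fields.put("o1.f", Set.of("o2")); // o1.f points to o2
        g.fields.put("o3.f", Set.of("o4")); // unrelated objects
        g.load("r", "p", "f");
        System.out.println("r may point to " + g.vars.get("r")); // r may point to [o2]
    }
}
```

The unrelated pair o3/o4 never reaches r, which is the "but not to any of the other objects" part of the slide.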
6
Our Pointer Analysis Goals

Accurate for multithreaded programs
Compositional, partial program analysis
Analyze each procedure once
Independently of callers
May skip analysis of invoked procedures
Why?
Parts of program unavailable (different
language, not written yet)
Parts may be irrelevant for desired result

7
Analysis Abstraction

Basic abstraction is the points-to graph
Nodes represent objects in heap
Edges represent references in heap

[Figure: example points-to graph: variables p, q, u; edges labeled f]
8
Two Kinds of Edges

Inside edges (solid) represent references
created inside analyzed part of program
Outside edges (dashed) represent references
created outside analyzed part of program

[Figure: example points-to graph: variables p, q, u; edges labeled f]
9
Two Kinds of Nodes

Inside nodes (solid) represent objects created
inside analyzed part of program
Outside nodes (dashed) represent objects
Created outside analyzed part of program, or
Accessed via edges created outside analyzed part
of program

[Figure: example points-to graph: variables p, q, u; edges labeled f]
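A points-to graph with the two kinds of nodes and the two kinds of edges might be represented as below. The class and method names (`PointsToGraph`, `node`, `edge`, `targets`) are our own, and the sample nodes only loosely mirror the example that follows; this is a sketch, not the authors' data structure.

```java
import java.util.*;

class PointsToGraph {
    // INSIDE: created by the analyzed code.
    // OUTSIDE: created outside it, or reached through outside edges.
    enum Kind { INSIDE, OUTSIDE }

    // A node stands for a set of heap objects.
    record Node(String name, Kind kind) {}

    // An edge src --field--> dst stands for heap references; its kind records who created them.
    record Edge(Node src, String field, Node dst, Kind kind) {}

    final Set<Node> nodes = new LinkedHashSet<>();
    final Set<Edge> edges = new LinkedHashSet<>();

    Node node(String name, Kind kind) {
        Node n = new Node(name, kind);
        nodes.add(n);
        return n;
    }

    void edge(Node src, String field, Node dst, Kind kind) {
        edges.add(new Edge(src, field, dst, kind));
    }

    // Targets of the `field` edges of one kind that leave `src`.
    Set<Node> targets(Node src, String field, Kind kind) {
        Set<Node> result = new LinkedHashSet<>();
        for (Edge e : edges) {
            if (e.src().equals(src) && e.field().equals(field) && e.kind() == kind) {
                result.add(e.dst());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PointsToGraph g = new PointsToGraph();
        Node p = g.node("p", Kind.OUTSIDE); // parameter object, created by some caller
        Node q = g.node("q", Kind.OUTSIDE); // parameter object, created by some caller
        Node r = g.node("r", Kind.INSIDE);  // allocated by `new` in the analyzed code
        Node l = g.node("l", Kind.OUTSIDE); // object read through q.f
        g.edge(p, "f", r, Kind.INSIDE);     // p.f = r, executed by the analyzed code
        g.edge(q, "f", l, Kind.OUTSIDE);    // q.f was set outside the analyzed code
        System.out.println(g.targets(p, "f", Kind.INSIDE) + " " + g.targets(q, "f", Kind.OUTSIDE));
    }
}
```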
10
Key Question

What does the heap look like when the procedure
begins its execution?
Previous algorithms analyzed callers before
callees, so model of heap always available
Unfortunately, this approach requires analysis of
entire program in top-down fashion
Our solution: use the code to reconstruct what the
(accessed part of the) heap must look like

11
Analysis In Example
m(p, q) {
  r = new C();
  p.f = r;
  s = q;
  do
    s = s.f;
  until (s == null);
  t = new C();
  s.f = t;
  u = new C();
}
[Figure: initial points-to graph with nodes for the parameters p and q]
12
Analysis In Example
[Figure: code of m(p, q) with the points-to graph so far: variables p, r, q]
13
Analysis In Example
[Figure: code of m(p, q) with the points-to graph so far: variables p, r, q; one f edge]
14
Analysis In Example
[Figure: code of m(p, q) with the points-to graph so far: variables p, r, q, s; one f edge]
15
Analysis In Example
[Figure: code of m(p, q) with the points-to graph so far: variables p, r, q, s; two f edges]
16
Analysis In Example
[Figure: code of m(p, q) with the points-to graph so far: variables p, r, q, s; three f edges]
One option: continue to expand the graph. But the
analysis may never terminate
17
Analysis In Example
[Figure: code of m(p, q) with the points-to graph so far: variables p, r, q, s; three f edges]
Instead: have one outside node per load
statement. It represents all objects loaded at that
statement, which bounds the graph and guarantees termination
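The one-node-per-load-statement rule can be sketched as a fixed-point computation over the loop `s = q; do s = s.f; until (s == null);`. The assumptions are all ours: the statement label `S1`, the node names `q` and `load@S1`, and the simplification that every node s may point to is escaped, so each load adds an outside edge to the statement's load node. Because the same load node is reused on every iteration, the set of nodes s may point to stops growing and the iteration ends.

```java
import java.util.*;

class LoadNodeFixpoint {
    final Set<String> nodes = new TreeSet<>();
    // Field edges of the points-to graph: "node.field" -> target nodes.
    final Map<String, Set<String>> edges = new TreeMap<>();

    // Transfer function for a load `x = y.f` at the statement labeled `label`, given the nodes
    // y may point to; returns the nodes x may point to afterwards.
    // Simplifying assumption: every node y may point to is escaped, so the load may read a
    // reference created outside the analyzed code. All such reads at this statement, in every
    // loop iteration, are represented by ONE outside node named after the label.
    Set<String> load(String label, Set<String> yTargets, String f) {
        String loadNode = "load@" + label;
        nodes.add(loadNode);
        Set<String> result = new TreeSet<>();
        for (String n : yTargets) {
            Set<String> targets = edges.computeIfAbsent(n + "." + f, k -> new TreeSet<>());
            targets.add(loadNode);  // outside edge n --f--> loadNode
            result.addAll(targets); // plus any targets already recorded for n.f
        }
        return result;
    }

    // Analyzes: s = q; do s = s.f; until (s == null);
    static LoadNodeFixpoint analyzeLoop() {
        LoadNodeFixpoint g = new LoadNodeFixpoint();
        g.nodes.add("q");                        // parameter node for q
        Set<String> entry = Set.of("q");         // s = q
        Set<String> head = new TreeSet<>(entry); // nodes s may point to at the top of the body
        while (true) {
            Set<String> afterBody = g.load("S1", head, "f"); // s = s.f
            Set<String> newHead = new TreeSet<>(entry);
            newHead.addAll(afterBody);           // join the loop entry with the back edge
            if (newHead.equals(head)) return g;  // fixed point: further iterations add nothing
            head = newHead;
        }
    }

    public static void main(String[] args) {
        LoadNodeFixpoint g = analyzeLoop();
        System.out.println(g.nodes + " " + g.edges);
    }
}
```

Giving a second load statement its own label would create a second load node, even if both statements read the same objects.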
18
Consequences of This Decision

Multiple objects represented by single node (load
node in loop)
But can also have single object represented by
multiple nodes in graph (!!)
(object loaded at multiple statements)

do a = q.f; until (a == null);
do b = q.f; until (b == null);
[Figure: points-to graph for this code: variable q; f edges]
19
Consequences of This Decision

Form of points-to graph depends on program
Programs with identical behavior but different
graphs

[Figure: the points-to graphs for the two loops below: variables p, q, r, s; f edges]
do s = s.f; until (s == null);
s = s.f; while (s != null) s = s.f;
20
Analysis In Example
[Figure: code of m(p, q) with the points-to graph so far: variables p, r, q, s; three f edges]
21
Analysis In Example
[Figure: code of m(p, q) with the points-to graph so far: variables p, r, q, s, t; three f edges]
22
Analysis In Example
[Figure: code of m(p, q) with the points-to graph so far: variables p, r, q, s, t; four f edges]
23
Analysis In Example
[Figure: code of m(p, q) with the points-to graph so far: variables p, r, q, s, t, u; four f edges]
24
What Does Result Tell Us?

Nodes (outside)
Created outside analyzed part of program
Incomplete information
Nodes (inside, escaped)
Created inside analyzed part of program
But reachable from unanalyzed part of program
Incomplete information

[Figure: final points-to graph for m(p, q): variables p, q, r, s, t, u; four f edges]

Nodes (inside, captured)
Created inside analyzed part of program
Unreachable from unanalyzed part of program
Complete information about referencing
relationships!
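The escaped/captured distinction amounts to a reachability computation. The sketch below simplifies by treating only outside nodes as the unanalyzed code's entry points into the heap. Node names are hypothetical: P and Q are the parameter nodes, L is the loop's load node, and R, T, U are the objects created by the three `new C()` statements; the edge list is our reading of the final graph for m(p, q).

```java
import java.util.*;

class EscapeInfo {
    record Edge(String src, String field, String dst) {}

    // All nodes reachable from `roots` by following edges (roots included).
    static Set<String> reachable(Collection<String> roots, List<Edge> edges) {
        Set<String> seen = new HashSet<>(roots);
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String n = work.pop();
            for (Edge e : edges) {
                if (e.src().equals(n) && seen.add(e.dst())) work.push(e.dst());
            }
        }
        return seen;
    }

    // Captured nodes: inside nodes that the unanalyzed code cannot reach.
    // Simplification: only outside nodes serve as roots.
    static Set<String> captured(Set<String> insideNodes, Set<String> outsideNodes, List<Edge> edges) {
        Set<String> result = new TreeSet<>(insideNodes);
        result.removeAll(reachable(outsideNodes, edges));
        return result;
    }

    // Hypothetical final graph for m(p, q).
    static Set<String> example() {
        Set<String> outside = Set.of("P", "Q", "L");
        Set<String> inside = Set.of("R", "T", "U");
        List<Edge> edges = List.of(
            new Edge("P", "f", "R"),  // p.f = r (inside edge)
            new Edge("Q", "f", "L"),  // loads of s.f (outside edges)
            new Edge("L", "f", "L"),
            new Edge("L", "f", "T")); // s.f = t (inside edge)
        return captured(inside, outside, edges);
    }

    public static void main(String[] args) {
        System.out.println("captured: " + example()); // captured: [U]
    }
}
```

R and T are escaped because the caller's objects lead to them; only U, referenced solely by a local variable, is captured.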

25
Crucial Distinction

Escaped vs. Captured
Enables analysis to identify regions of heap
where it has complete information
Crucial for both
Accuracy of analysis
Effective use of analysis results

[Figure: final points-to graph for m(p, q): variables p, q, r, s, t, u; four f edges]
26
Multiple Calling Contexts

Two Key Assumptions
p and q refer to different objects
Parallel threads may access objects

[Figure: code of m(p, q) and the resulting points-to graph: variables p, q, r, s, t; f edges]
27
Multiple Calling Contexts
What if p and q refer to the same object (i.e., p
and q are aliased)?
[Figure: code of m(p, q) and the corresponding points-to graph: variables p, q, r, s, t; f edges]
28
Multiple Calling Contexts
What if p and q refer to the same object and
there are no parallel threads?
[Figure: code of m(p, q) and two points-to graphs: variables p, q, r, s, t; f edges]
29
Multiple Calling Contexts
What if p and q refer to the same object and
there are no parallel threads?
[Figure: code of m(p, q) and the corresponding points-to graph: variables p, q, r, s, t; f edges]
30
Issues

Substantially different results for different
calling contexts
But caller is unavailable at analysis time
New analysis for each possible context?
Lots of contexts
Most of which probably won't be needed

31
Our Solution

Analyze assuming
Distinct parameters
Parallel threads
Aliased parameters at caller? Merge nodes
No parallel threads? Remove outside edges and
nodes

[Figure: two points-to graphs for m(p, q): variables p, q, r, s, t; f edges]
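The first specialization step, merging nodes for aliased parameters, can be sketched as a renaming over the graph's edges. This is a hypothetical sketch: the node names P, Q, L, R, T and the edge set are our reading of the example graph, and it covers neither how the analysis then reconciles the merged node's inside and outside f edges nor the removal of outside edges and nodes when there are no parallel threads.

```java
import java.util.*;

class MergeAliased {
    // An edge src --field--> dst; `inside` is true for inside edges, false for outside edges.
    record Edge(String src, String field, String dst, boolean inside) {}

    // Specializes a graph computed under the "distinct parameters" assumption to a context
    // where the caller passes the same object for both parameters: node `from` is renamed
    // to `into` in every edge, so the two parameter nodes become one.
    static Set<Edge> merge(Set<Edge> edges, String from, String into) {
        Set<Edge> result = new LinkedHashSet<>();
        for (Edge e : edges) {
            String src = e.src().equals(from) ? into : e.src();
            String dst = e.dst().equals(from) ? into : e.dst();
            result.add(new Edge(src, e.field(), dst, e.inside()));
        }
        return result;
    }

    // Hypothetical general result for m(p, q): P, Q parameter nodes, L load node, R, T inside nodes.
    static Set<Edge> general() {
        return new LinkedHashSet<>(List.of(
            new Edge("P", "f", "R", true),
            new Edge("Q", "f", "L", false),
            new Edge("L", "f", "L", false),
            new Edge("L", "f", "T", true)));
    }

    public static void main(String[] args) {
        merge(general(), "Q", "P").forEach(System.out::println);
    }
}
```

After the merge, P carries both an inside f edge (to R) and an outside f edge (to L), which is one way to see why the aliased context needs extra work.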
32
Solution Is Not Perfect

Specialization can lose precision: there can be two
procedures that, when analyzed with
Distinct parameters, produce the same analysis result
Aliased parameters, produce different analysis results
Conceptually complex analysis
Think about all contexts during analysis
Start to lose intuition of analysis as execution
Difficult time applying abstract interpretation
framework

33
Abstract Interpretation and Analysis
Abstract interpretation is a parameterized framework

V: concrete values
A: abstract values

α: abstraction function
γ: concretization function

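[Figure: the concrete transfer function tv maps v1 to v2 and the abstract transfer function ta maps a1 to a2; α and γ relate concrete and abstract values]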
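A standard small instance of this framework is the sign domain, sketched below. It is not the points-to domain of this talk: abstract values are signs, α maps a set of integers to the most precise sign describing it, γ is expressed as a membership test because γ(TOP) is infinite, and the abstract multiplication plays the role of ta. The soundness condition checked in `main` is that each concrete result of tv (here, squaring each value) lies in γ of the abstract result.

```java
import java.util.*;

class SignDomain {
    // Abstract values: BOTTOM (no integers), NEG, ZERO, POS, TOP (any integer).
    enum Sign { BOTTOM, NEG, ZERO, POS, TOP }

    // Least upper bound in the sign lattice.
    static Sign join(Sign x, Sign y) {
        if (x == Sign.BOTTOM) return y;
        if (y == Sign.BOTTOM) return x;
        return x == y ? x : Sign.TOP;
    }

    // alpha: the most precise sign describing every integer in the set.
    static Sign alpha(Set<Integer> values) {
        Sign result = Sign.BOTTOM;
        for (int v : values) {
            result = join(result, v < 0 ? Sign.NEG : v == 0 ? Sign.ZERO : Sign.POS);
        }
        return result;
    }

    // gamma, expressed as a membership test because gamma(TOP) is infinite.
    static boolean gammaContains(Sign s, int v) {
        return switch (s) {
            case BOTTOM -> false;
            case NEG -> v < 0;
            case ZERO -> v == 0;
            case POS -> v > 0;
            case TOP -> true;
        };
    }

    // Abstract transfer function for multiplication.
    static Sign mul(Sign x, Sign y) {
        if (x == Sign.BOTTOM || y == Sign.BOTTOM) return Sign.BOTTOM;
        if (x == Sign.ZERO || y == Sign.ZERO) return Sign.ZERO;
        if (x == Sign.TOP || y == Sign.TOP) return Sign.TOP;
        return x == y ? Sign.POS : Sign.NEG;
    }

    public static void main(String[] args) {
        Set<Integer> v1 = Set.of(-3, -1);
        Sign a1 = alpha(v1);   // NEG
        Sign a2 = mul(a1, a1); // abstract execution of squaring: POS
        for (int v : v1) {
            // Soundness: each concrete result of squaring must lie in gamma(a2).
            System.out.println(v * v + " in gamma(" + a2 + "): " + gammaContains(a2, v * v));
        }
    }
}
```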
34
Applying Framework

A: points-to graphs
V: concrete heaps
α: points-to graph for a given heap
Points-to graph depends on program
Need to augment heap with access history
γ: all heaps that correspond to a points-to graph
OK, I give up

35
Correctness Proof

Inductively construct a relation between
Objects in heap
Nodes that represent objects
Invariants that characterize the relation
Transfer function
Takes points-to graph and relation
Gives new points-to graph and relation
Prove that transfer functions preserve invariants

36
Threads and Abstract Interpretation

Philosophy of Abstract Interpretation
Come up with a decent abstraction
Execute program on that abstraction
Problem with threads
Execution usually modeled as interleaving
Too many interleavings!
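How quickly the interleavings multiply can be made concrete. Assuming (our simplification) that every statement is atomic, two threads with n1 and n2 statements have C(n1 + n2, n2) = (n1 + n2)! / (n1! n2!) interleavings that respect each thread's program order. The sketch computes this exactly with `BigInteger`.

```java
import java.math.BigInteger;

class Interleavings {
    // Number of interleavings of two threads with n1 and n2 atomic statements that keep
    // each thread's statements in program order: C(n1 + n2, n2).
    static BigInteger count(int n1, int n2) {
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= n2; i++) {
            // After step i, result == C(n1 + i, i), so the division is exact.
            result = result.multiply(BigInteger.valueOf(n1 + i)).divide(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 5, 10, 50}) {
            System.out.println(n + " statements per thread: " + count(n, n) + " interleavings");
        }
    }
}
```

With only 10 statements per thread there are already 184,756 interleavings, before even considering loops or more threads.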

37
Our Solution

Points-to graphs explicitly represent all
possible interactions between parallel threads
Basic Analysis Approach
Analyze each thread in isolation
To compute combined effect of multiple threads
Retrieve result for each thread
Compute interactions that may occur

Outside edges: interactions in which one thread
reads a reference created by a parallel thread
Inside edges: interactions in which one thread
creates a reference read by a parallel thread
38
Interthread Analysis
Threads n(p,q) and m(p,q) execute in parallel
39
Interthread Analysis
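Threads n(p,q) and m(p,q) execute in parallel
[Figure: the two threads' points-to graphs, with parameter nodes p and q]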
Retrieve points-to graph from analysis of each
thread
40
Interthread Analysis
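Threads n(p,q) and m(p,q) execute in parallel
[Figure: the two threads' points-to graphs, with parameter nodes p and q]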
Establish correspondence between nodes
Start with parameter nodes
41
Interthread Analysis
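Threads n(p,q) and m(p,q) execute in parallel
[Figure: the two threads' points-to graphs, with parameter nodes p and q]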

Compute Interactions Between Threads
Match inside and outside edges
For each outside node, compute nodes in other
graph that it represents
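The matching step can be sketched as a fixed-point computation of the representation relationship, here called `mu` (our name). Assumptions: graphs are reduced to edge lists, only one direction is shown (thread A's outside edges against thread B's inside edges; a full analysis would presumably also match in the other direction), and the node names in `example()` are hypothetical. Iterating until nothing changes mirrors the flow-insensitive treatment of interactions between threads.

```java
import java.util.*;

class InterthreadMatch {
    record Edge(String src, String field, String dst) {}

    // mu maps each node of thread A's graph to the nodes of thread B's graph it may represent.
    // Start from the parameter correspondence, then apply the matching rule until nothing
    // changes: if A has an outside edge n1 --f--> n2 (A reads a reference it did not create)
    // and B has an inside edge n3 --f--> n4 (B creates such a reference) with n3 in mu(n1),
    // then n4 is added to mu(n2).
    static Map<String, Set<String>> match(List<Edge> outsideA, List<Edge> insideB,
                                          Map<String, String> params) {
        Map<String, Set<String>> mu = new TreeMap<>();
        params.forEach((a, b) -> mu.computeIfAbsent(a, k -> new TreeSet<>()).add(b));
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Edge out : outsideA) {
                for (Edge in : insideB) {
                    if (out.field().equals(in.field())
                            && mu.getOrDefault(out.src(), Set.of()).contains(in.src())) {
                        changed |= mu.computeIfAbsent(out.dst(), k -> new TreeSet<>()).add(in.dst());
                    }
                }
            }
        }
        return mu;
    }

    // Hypothetical graphs: thread A reads p.f (node L) and then L.f (node M);
    // thread B stores a new object N into p.f and stores q into N.f.
    static Map<String, Set<String>> example() {
        List<Edge> outsideA = List.of(new Edge("p", "f", "L"), new Edge("L", "f", "M"));
        List<Edge> insideB = List.of(new Edge("p", "f", "N"), new Edge("N", "f", "q"));
        return match(outsideA, insideB, Map.of("p", "p", "q", "q"));
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

In the example, L comes to represent N, and only then can M be matched to q; the second step depends on the first, which is why the rule is iterated.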

42
Interthread Analysis
Threads n(p,q) and m(p,q) execute in parallel
[Figure: the two threads' points-to graphs, with parameter nodes p and q]

43
Interthread Analysis
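Threads n(p,q) and m(p,q) execute in parallel
[Figure: the two threads' points-to graphs, with parameter nodes p and q]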

Use the computed representation relationship to
combine the graphs and
obtain a single graph for the execution of both
threads
44
Property of Analysis

Flow-sensitive within each thread (reordering
statements can change the result)
Flow-insensitive between threads
Assumes interactions can happen
Any number of times
In any order
Analysis models interactions that can't actually
happen in any interleaved execution

45
Imprecision Due To Flow Insensitivity
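n(a,b,c) { 1: p = b.f; p.f = a; 2: a.f = b; }
m(a,c) { 3: q = a.f; 4: q.f = c; }

Interthread Analysis Result
Execution Order Required to Produce Blue Edge
[Figure: interthread analysis result over nodes a, b, c, with the order of statements 1 to 4 required to produce the blue edge]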
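Under sequential consistency the claim can be checked by brute force. The sketch assumes that thread n executes `p = b.f; p.f = a; a.f = b` and thread m executes `q = a.f; q.f = c` (our reading of the slide's statements 1 to 4), treats each statement as atomic, skips a store through a null local, and reads the blue edge as c.f → a, which is our interpretation of the figure. It enumerates all 10 order-preserving interleavings and collects the f edges present in each final heap.

```java
import java.util.*;

class InterleavingCheck {
    // Shared heap (the f field of objects a, b, c) plus the locals p (thread n) and q (thread m).
    static final class State {
        final Map<String, String> f = new HashMap<>(); // object -> target of its f field
        String p, q;
    }

    // Executes statement idx of thread 'n' or 'm'; a store through a null local is skipped.
    static void step(char thread, int idx, State s) {
        if (thread == 'n') {
            switch (idx) {
                case 0 -> s.p = s.f.get("b");                    // 1: p = b.f
                case 1 -> { if (s.p != null) s.f.put(s.p, "a"); } //    p.f = a
                case 2 -> s.f.put("a", "b");                      // 2: a.f = b
                default -> throw new IllegalArgumentException("bad index " + idx);
            }
        } else {
            switch (idx) {
                case 0 -> s.q = s.f.get("a");                    // 3: q = a.f
                case 1 -> { if (s.q != null) s.f.put(s.q, "c"); } // 4: q.f = c
                default -> throw new IllegalArgumentException("bad index " + idx);
            }
        }
    }

    // Enumerates every interleaving that preserves both threads' program order.
    static void explore(List<Character> order, int doneN, int doneM, Set<String> edges) {
        if (doneN == 3 && doneM == 2) {
            State s = new State();
            int ni = 0, mi = 0;
            for (char t : order) step(t, t == 'n' ? ni++ : mi++, s);
            s.f.forEach((obj, target) -> edges.add(obj + ".f=" + target));
            return;
        }
        if (doneN < 3) { order.add('n'); explore(order, doneN + 1, doneM, edges); order.remove(order.size() - 1); }
        if (doneM < 2) { order.add('m'); explore(order, doneN, doneM + 1, edges); order.remove(order.size() - 1); }
    }

    static Set<String> reachableEdges() {
        Set<String> edges = new TreeSet<>();
        explore(new ArrayList<>(), 0, 0, edges);
        return edges;
    }

    public static void main(String[] args) {
        System.out.println(reachableEdges()); // [a.f=b, b.f=c]
    }
}
```

Producing c.f → a would require statement 2 to run before statement 1, which thread n's program order forbids, so no interleaving produces it.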
46
Weak Memory Consistency Models
47
Initially y = 1, x = 0
Thread 1: y = 0; x = 1
Thread 2: z = x + y
What is the value of z?
48
Initially y = 1, x = 0
Thread 1: y = 0; x = 1
Thread 2: z = x + y
Three interleavings:
z = x + y; y = 0; x = 1 gives z = 1
y = 0; z = x + y; x = 1 gives z = 0
y = 0; x = 1; z = x + y gives z = 1
What is the value of z?
49
Initially y = 1, x = 0
The three interleavings give z = 1, z = 0, and z = 1
What is the value of z?
z can be 0 or 1
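The interleaving argument can be mechanized directly. The sketch assumes sequential consistency, atomic statements, and (as the interleavings indicate) that one thread executes `y = 0; x = 1` while the other executes `z = x + y`; it runs the reading statement before, between, or after the two writes and collects the resulting values of z.

```java
import java.util.*;

class ScOutcomes {
    // All values of z under sequential consistency for
    //   writing thread: y = 0; x = 1;
    //   reading thread: z = x + y;
    // starting from x = 0, y = 1, with each statement atomic.
    static Set<Integer> outcomes() {
        Set<Integer> zs = new TreeSet<>();
        // z = x + y runs after the first `done` statements of the writing thread (0, 1, or 2).
        for (int done = 0; done <= 2; done++) {
            int x = 0, y = 1;
            if (done >= 1) y = 0; // y = 0 has executed
            if (done >= 2) x = 1; // x = 1 has executed
            zs.add(x + y);        // later writes cannot change z
        }
        return zs;
    }

    public static void main(String[] args) {
        System.out.println("z can be " + outcomes()); // z can be [0, 1]
    }
}
```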
50
Initially y = 1, x = 0
The three interleavings give z = 1, z = 0, and z = 1
What is the value of z?
z can be 0 or 1
INCORRECT REASONING!
51
Initially y = 1, x = 0
Memory system can reorder writes as long as it
preserves illusion of sequential execution within
each thread!
Thread 1: y = 0; x = 1
Thread 2: z = x + y
What is the value of z?
Different threads can observe different orders!
z can be 0 or 1 OR 2!
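One way to model the reordering is to let the thread computing z observe each of the other thread's two writes independently, either before or after the write takes effect. That permissive visibility model is our assumption for illustration, not the Java memory model or any specific hardware model. Enumerating the four combinations yields the extra outcome z = 2 (x = 1 observed while y = 0 is not yet visible).

```java
import java.util.*;

class RelaxedOutcomes {
    // Permissive visibility model (an assumption for illustration): when the reading thread
    // evaluates z = x + y, each of the writes y = 0 and x = 1 may or may not be visible to it,
    // independently of the order in which the writing thread issued them.
    static Set<Integer> outcomes() {
        Set<Integer> zs = new TreeSet<>();
        for (boolean seesY0 : new boolean[] {false, true}) {
            for (boolean seesX1 : new boolean[] {false, true}) {
                int x = seesX1 ? 1 : 0; // initially x = 0
                int y = seesY0 ? 0 : 1; // initially y = 1
                zs.add(x + y);
            }
        }
        return zs;
    }

    public static void main(String[] args) {
        System.out.println("z can be " + outcomes()); // z can be [0, 1, 2]
    }
}
```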
52
Implications for Example
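n(a,b,c) { 1: p = b.f; p.f = a; 2: a.f = b; }
m(a,c) { 3: q = a.f; 4: q.f = c; }

Interthread Analysis Result
Blue Edge Can Actually Occur in Some Execution!
Can't reason about program by interleaving
statements
[Figure: interthread analysis result over nodes a, b, c, with statements 1 to 4]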
53
Implications for Analysis of Multithreaded
Programs

Analyzing all statement interleavings is unsound
We believe that our flow-insensitive analysis is
sound even for weak consistency models
But formal semantics of weak memory consistency
models still under development
Maessen, Arvind, Shen OOPSLA 2000
Manson, Pugh Java Grande/ISCOPE 2001
Unclear how to prove ANY analysis sound

54
Challenges and Predictions
55
Need To Analyze Partial Programs

Fact of life - whole program may be either
Unavailable,
Infeasible to analyze, or
Unnecessary to analyze
Challenges
What is starting context(s) for analysis?
What is effect of invoked but unanalyzed parts of
program?
Especially difficult for linked data structures

56
Need To Analyze Partial Programs

Predictions
Future analyses will not use presented technique
Care about more sophisticated properties
Need more information about calling context
Many potential calling contexts never used
Analysis will instead start with specification
Provided by programmer
Automatically guessed by unsound static analysis
heuristic or dynamic analysis
Then automatically verify specification

57
Multithreaded Programs

Challenge: too many potential executions
Prediction: more two-phase analyses
Phase One
Analyze each thread in isolation
Represent potential interactions between analyzed
thread and other threads
Phase Two
Collect results from parallel threads
Compute interactions between threads

58
Multithreaded Programs

Prediction
Language will enforce more structured model
Enhanced type system
Force threads to interact only at explicit
synchronization points
Development of structured analyses
Analyze single thread in isolation between
synchronization points
Apply potential interaction effects only at
synchronization points

59
Weak Memory Consistency Models

Challenges
Lack of good formal semantics
Explosion in possible program behaviors
Short Term Prediction
Development of formal semantics
Flow-insensitive analyses proved sound
Long Term Prediction
Structured model will force threads to interact
only at synchronization points
Eliminate visibility of weak models

60
Trends

More sophisticated properties
Harsher analysis environments
Partial programs
Threads with weak consistency models
Role of abstract interpretation
Intuition of analysis as execution breaking down
as analyses become more ambitious
Analyses starting to look like verifications
Synthesis of loop invariants
Synthesizing global view of computation

61
Bigger Picture
[Figure: chart positioning Program verification, Abstract Interpretation, Dynamic Analyses, and Unsound Static Analyses along two axes: from "No idea what program should do" to "Can write full formal specification for program", and from "Don't care if program works reliably or not" to "Correctness Crucial"]
