Title: Pointer Analysis
Slide 1: Pointer Analysis
- G. Ramalingam
- Microsoft Research, India
Slide 2: A Constant Propagation Example
x = 3; y = 4; z = x + 5;
- x is always 3 here
- can replace x by 3
- and replace x + 5 by 8
- and so on
Slide 3: A Constant Propagation Example With Pointers
x = 3; *p = 4; z = x + 5;
Slide 4: A Constant Propagation Example With Pointers
- p = &y; x = 3; *p = 4; z = x + 5;  (x is always 3)
- p = &x; x = 3; *p = 4; z = x + 5;  (x is always 4)
- if (?) p = &x; else p = &y; x = 3; *p = 4; z = x + 5;  (x may be 3 or 4, i.e., x is unknown in our lattice)
Pointers affect most program analyses.
Slide 5: A Constant Propagation Example With Pointers
- p = &y; x = 3; *p = 4; z = x + 5;  (p always points-to y)
- p = &x; x = 3; *p = 4; z = x + 5;  (p always points-to x)
- if (?) p = &x; else p = &y; x = 3; *p = 4; z = x + 5;  (p may point-to x or y)
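To make the dependence on p concrete, here is a small Python sketch that simply executes the three programs. It is our own illustration, not part of the slides: memory is a dict, a pointer value is the name of the variable it points to, and the unknown condition `if (?)` is handled by running both branches. The names `run` and `possible_z` are hypothetical.

```python
def run(p_target: str) -> int:
    """Execute: p = &p_target; x = 3; *p = 4; z = x + 5; and return z."""
    mem = {"x": 0, "y": 0, "z": 0, "p": ""}
    mem["p"] = p_target        # p = &y  or  p = &x
    mem["x"] = 3               # x = 3
    mem[mem["p"]] = 4          # *p = 4 writes whichever variable p names
    mem["z"] = mem["x"] + 5    # z = x + 5
    return mem["z"]

# if (?) p = &x; else p = &y;  explore both outcomes of the unknown condition
possible_z = {run(target) for target in ("x", "y")}

if __name__ == "__main__":
    print(run("y"))            # p points to y: x stays 3, so z is 8
    print(run("x"))            # p points to x: *p = 4 overwrites x, so z is 9
    print(sorted(possible_z))  # both values are possible at z = x + 5
```

A constant propagation that ignores pointers would wrongly fold z to 8 in the second and third programs.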
Slide 6: Points-to Analysis
- Determine the set of targets a pointer variable could point to (at different points in the program)
- "p points-to x" means "*p has l-value x"
- targets could be variables (p = &x;) or locations in the heap created by dynamic memory allocation (p = new Foo(); or p = malloc();)
- must-point-to vs. may-point-to
Slide 7: A Constant Propagation Example With Pointers
Can *p denote the same location as *q?
*q = 3; *p = 4; z = *q + 5;
What values can z take?
Slide 8: More Terminology
- *p and *q are said to be aliases (in a given concrete state) if they represent the same location
- Alias analysis: determine whether a given pair of references could be aliases at a given program point
- *p may-alias *q
- *p must-alias *q
Slide 9: Pointer Analysis
- Points-To Analysis
- may-point-to
- must-point-to
- Alias Analysis
- may-alias
- must-alias
Slide 10: Points-To Analysis: A Simple Example
p = &x; q = &y; if (?) { q = p; } x = &a; y = &b; z = *q;
Slide 12: Points-To Analysis
x = &a; y = &b; if (?) { p = &x; } else { p = &y; } *x = &c; *p = &c;
How should we handle this statement?
Strong update
Weak update
Weak update
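The distinction between the two kinds of update can be sketched on a may-points-to map. The Python below is our own illustration, under the assumptions that every abstract target is a single program variable (no heap summaries) and that `*ptr = &target` is handled directly even though it is not a primitive statement of the later syntax; `store_address` is a hypothetical name. After the `if`, x points to a, y to b, and p to x or y.

```python
from typing import Dict, Set

PointsTo = Dict[str, Set[str]]    # may-points-to map: variable -> possible targets

def store_address(s: PointsTo, ptr: str, target: str) -> PointsTo:
    """Abstract effect of `*ptr = &target` (a sketch, not the slides' definition).

    Strong update: if ptr may point to exactly one variable (and not null),
    that variable is definitely the one written, so its old targets are killed.
    Weak update: otherwise any of several variables may be written, so each
    keeps its old targets and also gains `target`.
    """
    result = {v: set(ts) for v, ts in s.items()}   # copy; do not mutate s
    pointees = s.get(ptr, set())
    if len(pointees) == 1 and "null" not in pointees:
        (only,) = pointees
        result[only] = {target}                              # strong update
    else:
        for v in pointees - {"null"}:
            result[v] = result.get(v, set()) | {target}      # weak update
    return result

after_if = {"x": {"a"}, "y": {"b"}, "p": {"x", "y"}}
after_star_x = store_address(after_if, "x", "c")      # *x = &c: strong, a -> {c}
after_star_p = store_address(after_star_x, "p", "c")  # *p = &c: weak on x and y
```

The strong update is only safe because x definitely names one location; with two candidate targets for p, killing either one's old targets would be unsound.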
Slide 13: Questions
- When is it correct to use a strong update? A weak update?
- Is this points-to analysis precise?
- What does it mean to say
- p must-point-to x at program point u
- p may-point-to x at program point u
- p must-not-point-to x at u
- p may-not-point-to x at u
Slide 14: Points-To Analysis, Formally
- We must formally define what we want to compute
before we can answer many such questions
Slide 15: Static Program Analysis
- A static program analysis computes approximate information about the runtime behavior of a given program
- 0. The set of valid programs is defined by the programming language syntax
- The runtime behavior of a given program is defined by the programming language semantics
- The analysis problem defines what information is desired
- The analysis algorithm determines what approximation to make
Slide 16: Programming Language: Syntax
- A program consists of
- a set of variables Var
- a directed graph (V, E, entry) with a distinguished entry vertex, with every edge labelled by a primitive statement
- A primitive statement is of the form
- x = null
- x = &y
- x = y
- x = *y
- *x = y
- skip
- (where x and y are variables in Var)
- Omitted (for now)
- Dynamic memory allocation
- Pointer arithmetic
- Structures and fields
- Procedures
Slide 17: Example Program
Vars x, y, p, a, b, c
x = &a; y = &b; if (?) { p = &x; } else { p = &y; } *x = &c; *p = &c;
Slide 18: Programming Language: Operational Semantics
- Operational semantics = an interpreter (defined mathematically)
- State
- Data-State: $Var \to (Var \cup \{null\})$
- PC: $V$ (the vertex set of the CFG)
- Program-State: $PC \times \text{Data-State}$
- Initial-state
- $(entry, \lambda x.\, null)$
Slide 19: Example States
Vars x,y,p,a,b,c
Slide 20: Programming Language: Operational Semantics
- Meaning of primitive statements
- $CS[stmt] : \text{Data-State} \to \text{Data-State}$
- $CS[x = y]\, s = s[x \mapsto s(y)]$
- $CS[x = *y]\, s = s[x \mapsto s(s(y))]$
- $CS[*x = y]\, s = s[s(x) \mapsto s(y)]$
- $CS[x = null]\, s = s[x \mapsto null]$
- $CS[x = \&y]\, s = s[x \mapsto y]$
We must say what happens if null is dereferenced.
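The concrete transformers can be transcribed almost literally. The sketch below is a hypothetical Python rendering, not the slides' own artifact: a data-state is a dict from variable names to variable names with None standing for null, and statements are tuples such as ("load", "x", "y") for x = *y. Because the slide leaves null dereference open, the sketch assumes it raises an exception (`NullDereference` is an invented name).

```python
from typing import Dict, Optional, Tuple

DataState = Dict[str, Optional[str]]   # Var -> (Var ∪ {null}); None plays null
Stmt = Tuple[str, ...]                 # e.g. ("load", "x", "y") encodes x = *y

class NullDereference(Exception):
    """Raised when a statement dereferences null (one possible choice)."""

def deref(s: DataState, v: str) -> str:
    """Return s(v), refusing to follow a null pointer."""
    target = s[v]
    if target is None:
        raise NullDereference(v)
    return target

def cs(stmt: Stmt, s: DataState) -> DataState:
    """Concrete semantics CS[stmt]; returns a new state and leaves s unchanged."""
    kind = stmt[0]
    t = dict(s)
    if kind == "null":            # x = null   : s[x ↦ null]
        t[stmt[1]] = None
    elif kind == "addr":          # x = &y     : s[x ↦ y]
        t[stmt[1]] = stmt[2]
    elif kind == "copy":          # x = y      : s[x ↦ s(y)]
        t[stmt[1]] = s[stmt[2]]
    elif kind == "load":          # x = *y     : s[x ↦ s(s(y))]
        t[stmt[1]] = s[deref(s, stmt[2])]
    elif kind == "store":         # *x = y     : s[s(x) ↦ s(y)]
        t[deref(s, stmt[1])] = s[stmt[2]]
    elif kind != "skip":          # skip leaves the state unchanged
        raise ValueError(f"unknown statement {stmt!r}")
    return t

if __name__ == "__main__":
    s = {"x": None, "y": None, "p": None}
    s = cs(("addr", "p", "x"), s)    # p = &x
    s = cs(("addr", "y", "p"), s)    # y = &p
    s = cs(("store", "p", "y"), s)   # *p = y writes x
    print(s)
```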
Slide 21: Programming Language: Operational Semantics
- Meaning of program
- a transition relation $\Rightarrow$ on program-states
- $\Rightarrow \; \subseteq \text{Program-State} \times \text{Program-State}$
- $state_1 \Rightarrow state_2$ means that the execution of some edge in the program can transform $state_1$ into $state_2$
- Defining $\Rightarrow$
- $(u, s) \Rightarrow (v, s')$ iff the program contains a control-flow edge $u \to v$ labelled with a statement stmt such that $CS[stmt]\, s = s'$
Slide 22: Programming Language: Operational Semantics
- A sequence of states $s_1 s_2 \ldots s_n$ is said to be an execution (of the program) iff
- $s_1$ is the Initial-State
- $s_i \Rightarrow s_{i+1}$ for $1 \le i < n$
- A state $s$ is said to be a reachable state iff there exists some execution $s_1 s_2 \ldots s_n$ such that $s_n = s$.
- Define $RS(u) = \{\, s \mid (u, s) \text{ is reachable} \,\}$
All of this formalism for this one definition
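Because Var is finite, the reachable states of a small program can be enumerated directly. The sketch below is our own Python illustration: it encodes our reading of the Slide 17 program as a hypothetical edge list (vertex numbers are invented, and `*x = &c` is given its own statement kind since it is not primitive) and explores transitions breadth-first from (entry, λx. null).

```python
from collections import deque
from typing import Dict, Optional, Set, Tuple

VARS = ("x", "y", "p", "a", "b", "c")
State = Tuple[Optional[str], ...]   # value of each variable in VARS order; None = null
Stmt = Tuple[str, str, str]

# Hypothetical CFG for the Slide 17 program, as edges (u, stmt, v).
# ("addr", x, y) encodes x = &y and ("store_addr", x, y) encodes *x = &y.
EDGES = [
    (0, ("addr", "x", "a"), 1),
    (1, ("addr", "y", "b"), 2),
    (2, ("addr", "p", "x"), 3),         # then-branch of if (?)
    (2, ("addr", "p", "y"), 3),         # else-branch
    (3, ("store_addr", "x", "c"), 4),   # *x = &c
    (4, ("store_addr", "p", "c"), 5),   # *p = &c
]

def step(stmt: Stmt, s: State) -> State:
    """Concrete effect of one edge label on a data-state."""
    env = dict(zip(VARS, s))
    kind, lhs, rhs = stmt
    if kind == "addr":
        env[lhs] = rhs
    elif kind == "store_addr":
        target = env[lhs]
        if target is None:
            raise RuntimeError(f"null dereference of {lhs}")
        env[target] = rhs
    else:
        raise ValueError(f"unsupported statement {stmt!r}")
    return tuple(env[v] for v in VARS)

def reachable_states(entry: int = 0) -> Dict[int, Set[State]]:
    """RS(u) for every vertex u, by exploring all executions from the initial state."""
    init: State = tuple(None for _ in VARS)          # (entry, λx. null)
    rs: Dict[int, Set[State]] = {entry: {init}}
    work = deque([(entry, init)])
    while work:
        u, s = work.popleft()
        for src, stmt, v in EDGES:
            if src != u:
                continue
            t = step(stmt, s)
            seen = rs.setdefault(v, set())
            if t not in seen:                        # new reachable (v, t)
                seen.add(t)
                work.append((v, t))
    return rs
```

Exploration terminates here because there are finitely many data-states; for real programs RS(u) is generally infinite or far too large, which is why the analysis approximates it.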
Slide 24: Ideal Points-To Analysis: Formal Definition
- Let u denote a vertex in the CFG
- Define IdealMustPT(u) to be
- $\{\, (p, x) \mid \forall s \in RS(u).\ s(p) = x \,\}$
- Define IdealMayPT(u) to be
- $\{\, (p, x) \mid \exists s \in RS(u).\ s(p) = x \,\}$
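Given RS(u) as an explicit set of data-states, both ideal definitions are one-line set comprehensions. The Python below is an illustrative sketch with hypothetical names; states are dicts with None for null, so pairs with a null target also appear. The usage states are the two that, under our reading of the Slide 17 program, reach its final vertex.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

DataState = Dict[str, Optional[str]]       # Var -> Var or None (null)
Pair = Tuple[str, Optional[str]]

def ideal_may_pt(rs_u: Iterable[DataState]) -> Set[Pair]:
    """{(p, x) | exists s in RS(u). s(p) = x}"""
    return {(p, x) for s in rs_u for p, x in s.items()}

def ideal_must_pt(rs_u: Iterable[DataState], variables: Iterable[str]) -> Set[Pair]:
    """{(p, x) | forall s in RS(u). s(p) = x}; every pair holds vacuously if RS(u) is empty."""
    states = list(rs_u)
    names = list(variables)
    candidates = names + [None]
    return {(p, x) for p in names for x in candidates
            if all(s[p] == x for s in states)}

VARS = ["x", "y", "p", "a", "b", "c"]
s1 = {"x": "c", "y": "b", "p": "x", "a": "c", "b": None, "c": None}
s2 = {"x": "a", "y": "c", "p": "y", "a": "c", "b": None, "c": None}
```

Note the asymmetry at an unreachable vertex: IdealMayPT is empty while IdealMustPT contains every pair.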
Slide 26: May-Point-To Analysis: Formal Requirement Specification
Compute $R : V \to 2^{Var \times Var}$ such that $R(u) \supseteq IdealMayPT(u)$
- An algorithm is said to be correct if the solution R it computes satisfies $\forall u \in V.\ R(u) \supseteq IdealMayPT(u)$
- An algorithm is said to be precise if the solution R it computes satisfies $\forall u \in V.\ R(u) = IdealMayPT(u)$
- An algorithm that computes a solution $R_1$ is said to be more precise than one that computes a solution $R_2$ if $\forall u \in V.\ R_1(u) \subseteq R_2(u)$
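The three definitions translate directly into checks over per-vertex sets of (pointer, target) pairs. This Python sketch is illustrative only; the function names are hypothetical and a vertex missing from a map is treated as having the empty set.

```python
from typing import Dict, Hashable, Set, Tuple

Pair = Tuple[str, str]
Solution = Dict[Hashable, Set[Pair]]   # vertex -> set of (pointer, target) pairs

def is_correct(r: Solution, ideal: Solution) -> bool:
    """forall u. R(u) ⊇ IdealMayPT(u)"""
    return all(r.get(u, set()) >= pts for u, pts in ideal.items())

def is_precise(r: Solution, ideal: Solution) -> bool:
    """forall u. R(u) = IdealMayPT(u)"""
    vertices = set(r) | set(ideal)
    return all(r.get(u, set()) == ideal.get(u, set()) for u in vertices)

def more_precise(r1: Solution, r2: Solution) -> bool:
    """forall u. R1(u) ⊆ R2(u)"""
    vertices = set(r1) | set(r2)
    return all(r1.get(u, set()) <= r2.get(u, set()) for u in vertices)

ideal = {1: {("p", "x")}}
over = {1: {("p", "x"), ("p", "y")}}   # correct but imprecise
under = {1: set()}                      # misses a real points-to fact
```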
Slide 27: Back To Our May-Point-To Algorithm
p = &x; q = &y; if (?) { q = p; } x = &a; y = &b; z = *q;
Slide 28: (May-Point-To Analysis) Algorithm A
- Is this algorithm correct?
- Is this algorithm precise?
- Let's first completely and formally define the algorithm.
Slide 29: Algorithm A: A Formal Definition (The Data Flow Analysis Recipe)
- Define semi-lattice of abstract-values
- $AbsDataState = Var \to 2^{Var \cup \{null\}}$
- $f_1 \sqcup f_2 = \lambda x.\, (f_1(x) \cup f_2(x))$
- $\bot = \lambda x.\, \{\}$
- Define initial abstract-value
- $InitialAbsState = \lambda x.\, \{null\}$
- Define transformers for primitive statements
- $AS[stmt] : AbsDataState \to AbsDataState$
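The semi-lattice above is easy to represent directly. In this Python sketch (our own; the names and the string "null" for the null target are assumptions), an abstract data-state is a dict from variables to frozensets, and the ordering `leq` is the pointwise inclusion induced by the join.

```python
from typing import Dict, FrozenSet, Iterable

NULL = "null"                              # assumed spelling of null as a target
AbsDataState = Dict[str, FrozenSet[str]]   # Var -> 2^(Var ∪ {null})

def bottom(variables: Iterable[str]) -> AbsDataState:
    """⊥ = λx. {}"""
    return {v: frozenset() for v in variables}

def initial_abs_state(variables: Iterable[str]) -> AbsDataState:
    """InitialAbsState = λx. {null}"""
    return {v: frozenset({NULL}) for v in variables}

def join(f1: AbsDataState, f2: AbsDataState) -> AbsDataState:
    """(f1 ⊔ f2)(x) = f1(x) ∪ f2(x); a variable missing from a map counts as {}."""
    return {x: f1.get(x, frozenset()) | f2.get(x, frozenset())
            for x in f1.keys() | f2.keys()}

def leq(f1: AbsDataState, f2: AbsDataState) -> bool:
    """f1 ⊑ f2 iff f1(x) ⊆ f2(x) for every x."""
    return all(ts <= f2.get(x, frozenset()) for x, ts in f1.items())
```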
Slide 30: Algorithm A: A Formal Definition (The Data Flow Analysis Recipe)
- Let st(v, u) denote the stmt on edge v -> u
- Compute the least-fixed-point of the following dataflow equations
- $x(entry) = InitialAbsState$
- $x(u) = \bigsqcup_{v \to u} AS(st(v,u))\, x(v)$
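The least fixed point of these equations can be computed by a standard worklist iteration. The sketch below is a generic Python version, not taken from the slides: it is parameterised by the join and transfer functions, assumes lattice values are immutable and of finite height, assumes transfer is monotone, and assumes the entry vertex has no incoming edges (its equation is fixed to the initial value). `solve_lfp` is a hypothetical name.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, Tuple, TypeVar

L = TypeVar("L")
Edge = Tuple[Hashable, object, Hashable]   # (v, stmt, u) for the CFG edge v -> u

def solve_lfp(
    edges: Iterable[Edge],
    entry: Hashable,
    init: L,
    bottom: L,
    join: Callable[[L, L], L],
    transfer: Callable[[object, L], L],
) -> Dict[Hashable, L]:
    """LFP of x(entry) = init, x(u) = join over v->u of transfer(st(v,u), x(v))."""
    edges = list(edges)
    vertices = {entry} | {v for v, _, _ in edges} | {u for _, _, u in edges}
    x = {w: bottom for w in vertices}      # start every vertex at bottom
    x[entry] = init
    work = deque(vertices)                 # evaluate every equation at least once
    while work:
        v = work.popleft()
        for src, stmt, u in edges:
            if src != v or u == entry:
                continue
            new = join(x[u], transfer(stmt, x[v]))
            if new != x[u]:                # x[u] grew: its successors must be revisited
                x[u] = new
                work.append(u)
    return x
```

For example, with sets of integers as the lattice, union as join, and a transfer that adds the edge label, a loop 1 -> 2 -> 1 makes both loop vertices collect every label on the cycle and the edge into it.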
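Slide 31: Algorithm A: The Transformers
- Abstract transformers for primitive statements
- $AS[stmt] : AbsDataState \to AbsDataState$
- $AS[x = y]\, s = s[x \mapsto s(y)]$
- $AS[x = null]\, s = s[x \mapsto \{null\}]$
- $AS[x = \&y]\, s = s[x \mapsto \{y\}]$
- $AS[x = *y]\, s = s[x \mapsto s^*(s(y))]$
- where $s^*(\{v_1, \ldots, v_n\}) = s(v_1) \cup \ldots \cup s(v_n)$
- $AS[*x = y]\, s = ???$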
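The transformers above, plus the dataflow equations, are enough to run Algorithm A end to end. In this Python sketch (our own illustration) the slides' open case `*x = y` is filled in with a weak update for every non-null target; that choice is an assumption, picked because it is always safe for a may-analysis. Null targets are skipped when dereferencing. The edge list is a hypothetical encoding of the Slide 27 program, with `skip` modelling the path that bypasses `q = p`.

```python
from typing import Dict, FrozenSet, List, Tuple

NULL = "null"
AbsState = Dict[str, FrozenSet[str]]     # Var -> 2^(Var ∪ {null})
Stmt = Tuple[str, ...]                   # ("addr", "x", "y") encodes x = &y, etc.
Edge = Tuple[int, Stmt, int]

def as_transfer(stmt: Stmt, s: AbsState) -> AbsState:
    """Abstract transformer AS[stmt]; returns a new map and leaves s unchanged."""
    kind, t = stmt[0], dict(s)
    if kind == "null":                   # x = null
        t[stmt[1]] = frozenset({NULL})
    elif kind == "addr":                 # x = &y
        t[stmt[1]] = frozenset({stmt[2]})
    elif kind == "copy":                 # x = y
        t[stmt[1]] = s[stmt[2]]
    elif kind == "load":                 # x = *y: s*(s(y)), skipping null
        t[stmt[1]] = frozenset().union(*(s[v] for v in s[stmt[2]] if v != NULL))
    elif kind == "store":                # *x = y: ASSUMED weak update of every target
        for v in s[stmt[1]] - {NULL}:
            t[v] = t[v] | s[stmt[2]]
    elif kind != "skip":
        raise ValueError(f"unknown statement {stmt!r}")
    return t

def algorithm_a(edges: List[Edge], entry: int, variables: List[str]) -> Dict[int, AbsState]:
    """Round-robin iteration of x(u) = ⊔_{v->u} AS(st(v,u)) x(v) until nothing changes."""
    vertices = {entry} | {v for v, _, _ in edges} | {u for _, _, u in edges}
    sol = {w: {x: frozenset() for x in variables} for w in vertices}   # bottom
    sol[entry] = {x: frozenset({NULL}) for x in variables}             # InitialAbsState
    changed = True
    while changed:
        changed = False
        for v, stmt, u in edges:
            if u == entry:
                continue
            out = as_transfer(stmt, sol[v])
            joined = {x: sol[u][x] | out[x] for x in variables}
            if joined != sol[u]:
                sol[u], changed = joined, True
    return sol

VARIABLES = ["p", "q", "x", "y", "a", "b", "z"]
EXAMPLE: List[Edge] = [
    (0, ("addr", "p", "x"), 1),   # p = &x
    (1, ("addr", "q", "y"), 2),   # q = &y
    (2, ("copy", "q", "p"), 3),   # if (?) { q = p; }
    (2, ("skip",), 3),            # ... or skip it
    (3, ("addr", "x", "a"), 4),   # x = &a
    (4, ("addr", "y", "b"), 5),   # y = &b
    (5, ("load", "z", "q"), 6),   # z = *q
]
```

On this example the result at the last vertex gives q the targets {x, y} and z the targets {a, b}, which matches what the two concrete executions can actually produce.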
Slide 32: Correctness & Precision
- We have a complete formal definition of the problem.
- We have a complete formal definition of a proposed solution.
- How do we reason about the correctness & precision of the proposed solution?
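Slide 33: Enter The French Recipe (Abstract Interpretation)
- $IdealMayPT(u) = \{\, (p, x) \mid \exists s \in RS(u).\ s(p) = x \,\}$
- $= \alpha(RS(u))$ where
- $\alpha(Y) = \{\, (p, x) \mid \exists s \in Y.\ s(p) = x \,\}$
[Diagram: $\alpha$ maps $RS(u) \in 2^{\text{Data-State}}$ to $IdealMayPT(u) \in 2^{Var \times Var}$]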
Slide 34: MayPT as a least-fixed-point
- Let st(v, u) denote the stmt on edge v -> u
- MayPT is the LFP of the following equations
- $MayPT(entry) = InitialAbsState$
- $MayPT(u) = \bigsqcup_{v \to u} AS(st(v,u))\, MayPT(v)$
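Slide 35: RS as a least-fixed-point
- Let st(v, u) denote the stmt on edge v -> u
- RS is the LFP of the following equations
- $RS(entry) = \{InitialDataState\}$
- $RS(u) = \bigcup_{v \to u} \{\, CS(st(v,u))\, s \mid s \in RS(v) \,\}$
- $= \bigcup_{v \to u} CS(st(v,u))\, RS(v)$ (with CS lifted to sets of states)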
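Slide 36: Approximating LFPs
- $RS(entry) = \{\text{initial-data-state}\}$
- $RS(u) = \bigcup_{v \to u} CS(st(v,u))\, RS(v)$
- $MayPT(entry) = \text{initial-abs-state}$
- $MayPT(u) = \bigsqcup_{v \to u} AS(st(v,u))\, MayPT(v)$
[Diagram: $\alpha$ maps $RS(u) \in 2^{\text{Data-State}}$ into $2^{Var \times Var}$]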
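Slide 37: Approximating LFPs (Lifting to the whole-program solution)
- $RS(entry) = \{\text{initial-data-state}\}$
- $RS(u) = \bigcup_{v \to u} CS(st(v,u))\, RS(v)$
- $MayPT(entry) = \text{initial-abs-state}$
- $MayPT(u) = \bigsqcup_{v \to u} AS(st(v,u))\, MayPT(v)$
[Diagram: $\alpha$ maps $RS \in V \to 2^{\text{Data-State}}$ into $V \to 2^{Var \times Var}$]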
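Slide 38: Approximating LFPs
[Diagram: $lfp(f)$ in the concrete domain C is correctly approximated by $lfp(f^\#)$ in the abstract domain A]
$c \in C$ is said to be correctly approximated by $a \in A$ iff $\alpha(c) \sqsubseteq a$.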
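For points-to analysis, "correctly approximated" can be tested directly on a finite set of concrete states. This Python sketch is our own: it uses the Var -> set-of-targets representation of Algorithm A (equivalent to a set of pairs), so here $\alpha$ is the per-variable union of targets and $\sqsubseteq$ is pointwise inclusion. All names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional

NULL = "null"
DataState = Dict[str, Optional[str]]        # concrete: Var -> Var or None (null)
AbsDataState = Dict[str, FrozenSet[str]]    # abstract: Var -> set of targets

def alpha(states: Iterable[DataState], variables: Iterable[str]) -> AbsDataState:
    """Per-variable union of the targets seen in the concrete states."""
    result = {v: set() for v in variables}
    for s in states:
        for v, target in s.items():
            result[v].add(NULL if target is None else target)
    return {v: frozenset(ts) for v, ts in result.items()}

def leq(a1: AbsDataState, a2: AbsDataState) -> bool:
    """a1 ⊑ a2 iff a1(v) ⊆ a2(v) for every v."""
    return all(ts <= a2.get(v, frozenset()) for v, ts in a1.items())

def correctly_approximated(states: List[DataState], a: AbsDataState,
                           variables: List[str]) -> bool:
    """c is correctly approximated by a iff alpha(c) ⊑ a."""
    return leq(alpha(states, variables), a)

VS = ["p", "q"]
C = [{"p": "x", "q": None}, {"p": "y", "q": None}]
```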
Slide 39: Approximating Transformers: Correctness Criterion
[Diagram: if $c \in C$ is correctly approximated by $a \in A$, then $f(c)$ is correctly approximated by $f^\#(a)$]
Slide 40: Enter The French Recipe (Abstract Interpretation)
- Concrete Domain
- Concrete states C
- Semantics: for every statement st,
- $CS[st] : C \to C$
- Abstract Domain
- A semi-lattice $(A, \sqsubseteq)$
- Transfer Functions
- For every statement st,
- $AS[st] : A \to A$
- Concrete (Collecting) Domain
- A semi-lattice $(2^C, \subseteq)$
- Transfer Functions
- For every statement st,
- $CS[st] : 2^C \to 2^C$
[Diagram: abstraction $\alpha$ from $2^{\text{Data-State}}$ to $2^{Var \times Var}$ and concretization $\gamma$ in the opposite direction]
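Slide 41: Points-To Analysis (Abstract Interpretation)
[Diagram: $\alpha$ maps $RS(u) \in 2^{\text{Data-State}}$ to $IdealMayPT(u) \in 2^{Var \times Var}$; requirement $\alpha(RS(u)) \subseteq MayPT(u)$]
- $\alpha(Y) = \{\, (p, x) \mid \exists s \in Y.\ s(p) = x \,\}$
- $IdealMayPT(u) = \alpha(RS(u))$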
Slide 42: The French Recipe (for CFG-based programs)
- Define concrete domain $(C, \sqsubseteq)$
- Define abstract domain $(A, \sqsubseteq)$
- Define abstraction function $\alpha : C \to A$
- Define concretization function $\gamma : A \to C$
- (forming a Galois connection)
- For every statement st define
- $AS[st] : A \to A$
- that correctly approximates
- $CS[st] : C \to C$
Slide 43: Approximating Transformers: Correctness Criterion
$c \in C$ is said to be correctly approximated by $a \in A$ iff $\alpha(c) \sqsubseteq a$.
[Diagram: if $c$ is correctly approximated by $a$, then $f(c)$ must be correctly approximated by $f^\#(a)$]
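Slide 44: Approximating Transformers: Correctness Criterion
[Diagram: concretization $\gamma : A \to C$ and abstraction $\alpha : C \to A$ between the concrete domain C and the abstract domain A]
Requirement: $f^\#(a_1) \sqsupseteq \alpha(f(\gamma(a_1)))$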
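The requirement $f^\#(a_1) \sqsupseteq \alpha(f(\gamma(a_1)))$ can be checked by brute force on small examples. The Python sketch below (our own illustration; `gamma`, `alpha`, `cs_load`, `as_load` and `sound_on` are hypothetical names) does this for x = *y. It represents an abstract state as a map from variables to target sets, takes $\gamma(a)$ to be every concrete state whose variables each take an allowed value, and treats a null dereference as halting execution, which is an assumption since the slides leave that case open.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Optional

NULL = "null"
AbsDataState = Dict[str, FrozenSet[str]]
DataState = Dict[str, Optional[str]]

def gamma(a: AbsDataState) -> List[DataState]:
    """All concrete states in which every variable takes a value allowed by a."""
    names = sorted(a)
    choices = [sorted(a[v]) for v in names]
    return [{v: (None if t == NULL else t) for v, t in zip(names, combo)}
            for combo in product(*choices)]

def alpha(states: List[DataState], names) -> AbsDataState:
    out = {v: set() for v in names}
    for s in states:
        for v, t in s.items():
            out[v].add(NULL if t is None else t)
    return {v: frozenset(ts) for v, ts in out.items()}

def cs_load(x: str, y: str, s: DataState) -> Optional[DataState]:
    """Concrete x = *y; None means execution halts on a null dereference."""
    if s[y] is None:
        return None
    t = dict(s)
    t[x] = s[s[y]]
    return t

def as_load(x: str, y: str, a: AbsDataState) -> AbsDataState:
    """Abstract x = *y: a[x ↦ union of a(v) for non-null v in a(y)]."""
    t = dict(a)
    t[x] = frozenset().union(*(a[v] for v in a[y] if v != NULL))
    return t

def sound_on(a: AbsDataState, x: str, y: str,
             abstract: Callable[[str, str, AbsDataState], AbsDataState] = as_load) -> bool:
    """Check abstract(a) ⊒ α(CS(γ(a))) by enumerating γ(a)."""
    post = [cs_load(x, y, s) for s in gamma(a)]
    post = [r for r in post if r is not None]
    abstract_post = abstract(x, y, a)
    return all(ts <= abstract_post[v] for v, ts in alpha(post, a).items())

A1 = {"p": frozenset({"x", "y"}), "q": frozenset({NULL}),
      "x": frozenset({NULL, "y"}), "y": frozenset({"x"})}
```

A transformer that drops any target a concrete execution can produce fails this check, which is exactly what the correctness criterion rules out.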
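Slide 45: Concrete Transformers
- $CS[stmt] : \text{Data-State} \to \text{Data-State}$
- $CS[x = y]\, s = s[x \mapsto s(y)]$
- $CS[x = *y]\, s = s[x \mapsto s(s(y))]$
- $CS[*x = y]\, s = s[s(x) \mapsto s(y)]$
- $CS[x = null]\, s = s[x \mapsto null]$
- $CS[stmt] : 2^{\text{Data-State}} \to 2^{\text{Data-State}}$
- $CS[st]\, X = \{\, CS[st]\, s \mid s \in X \,\}$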
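Slide 46: Abstract Transformers
- $AS[stmt] : AbsDataState \to AbsDataState$
- $AS[x = y]\, s = s[x \mapsto s(y)]$
- $AS[x = null]\, s = s[x \mapsto \{null\}]$
- $AS[x = *y]\, s = s[x \mapsto s^*(s(y))]$
- where $s^*(\{v_1, \ldots, v_n\}) = s(v_1) \cup \ldots \cup s(v_n)$
- $AS[*x = y]\, s = ???$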
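Slide 47: Algorithm A Transformers: Weak/Strong Update
[Diagram: $\gamma$, the concrete transformer $f$, the abstract transformer $f^\#$, and $\alpha$ relating concrete and abstract states; recoverable labels: y, b]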
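Slide 48: Algorithm A Transformers: Weak/Strong Update
[Diagram: $\gamma$, the concrete transformer $f$, the abstract transformer $f^\#$, and $\alpha$ relating concrete and abstract states; recoverable labels: x, b]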
Slide 49: Algorithm A Transformers: Weak/Strong Update