Title: Program Representations
Xiangyu Zhang
Program Representations
- Static program representations
- Abstract syntax tree
- Control flow graph
- Program dependence graph
- Call graph
- Points-to relations.
- Dynamic program representations
- Control flow trace, address trace and value trace
- Dynamic dependence graph
- Whole execution trace
(1) Abstract syntax tree
- An abstract syntax tree (AST) is a finite,
labeled, directed tree, where the internal nodes
are labeled by operators, and the leaf nodes
represent the operands of the operators.
Program chipping.
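A minimal sketch of the AST definition above, in Python (used for all examples here, since the slides have no language of their own). The `Node` class is a hypothetical illustration: internal nodes carry operators, leaves carry operands, and the tree for `sum = sum + i` is built by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """AST node: internal nodes hold an operator, leaves hold an operand."""
    label: str
    children: list["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def operators(node: Node) -> list[str]:
    """Labels of the internal nodes, in pre-order."""
    if node.is_leaf():
        return []
    return [node.label] + [op for child in node.children for op in operators(child)]


def operands(node: Node) -> list[str]:
    """Labels of the leaf nodes, left to right."""
    if node.is_leaf():
        return [node.label]
    return [leaf for child in node.children for leaf in operands(child)]


# AST for the statement  sum = sum + i  (assignment treated as the root operator)
tree = Node("=", [Node("sum"), Node("+", [Node("sum"), Node("i")])])

print(operators(tree))  # ['=', '+']
print(operands(tree))   # ['sum', 'sum', 'i']
```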
(2) Control Flow Graph (CFG)
- Consists of basic blocks and edges
- Basic block (BB): a maximal sequence of consecutive instructions such that inside the basic block an execution can only proceed from one instruction to the next (SESE: single entry, single exit).
- Edges represent potential flow of control between BBs.
- Program path: a path through the CFG, i.e., a sequence of BBs connected by edges.
- CFG = <V, E, Entry, Exit>
- V: vertices, nodes (BBs)
- E: edges, potential flow of control, E ⊆ V × V
- Entry, Exit ∈ V: unique entry and exit
[Figure: example CFG with basic blocks B1, B2, B3, B4]
(2) An Example of CFG
1: sum = 0
2: i = 1
3: while ( i … )
4:     i = i + 1
5:     sum = sum + i
   endwhile
6: print(sum)
[Figure: CFG with basic blocks {1, 2}, {3}, {4, 5}, {6}]
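One common way to form basic blocks is the leader algorithm: the first instruction, every branch target, and every instruction following a branch start a new block. The sketch below applies it to a hypothetical lowering of the sum program (the loop test at 3 branches to 6 when it fails, and 5 ends with a jump back to 3); the tuple encoding and all names are assumptions for illustration.

```python
# Hypothetical lowering of the sum program: (label, branch targets, falls through?)
PROGRAM = [
    (1, [], True),    # sum = 0
    (2, [], True),    # i = 1
    (3, [6], True),   # while ( i ... ): branch to 6 when the test fails, else fall into 4
    (4, [], True),    # i = i + 1
    (5, [3], False),  # sum = sum + i, then jump back to the loop test
    (6, [], False),   # print(sum); end of program
]


def basic_blocks(instrs):
    """Leader-based partitioning of an instruction list into basic blocks."""
    labels = [lab for lab, _, _ in instrs]
    leaders = {labels[0]}                        # the first instruction
    for idx, (_, targets, falls) in enumerate(instrs):
        leaders.update(targets)                  # every branch target
        if (targets or not falls) and idx + 1 < len(instrs):
            leaders.add(labels[idx + 1])         # the instruction after a branch
    blocks, current = [], []
    for lab in labels:
        if lab in leaders and current:
            blocks.append(current)
            current = []
        current.append(lab)
    blocks.append(current)
    return blocks


def cfg_edges(instrs, blocks):
    """Edges between blocks; each block is named by its first instruction."""
    block_of = {lab: b[0] for b in blocks for lab in b}
    info = {lab: (targets, falls) for lab, targets, falls in instrs}
    order = [lab for lab, _, _ in instrs]
    edges = set()
    for b in blocks:
        targets, falls = info[b[-1]]             # only the last instruction transfers control
        edges.update((b[0], block_of[t]) for t in targets)
        nxt = order.index(b[-1]) + 1
        if falls and nxt < len(order):
            edges.add((b[0], block_of[order[nxt]]))
    return edges


blocks = basic_blocks(PROGRAM)
print(blocks)                               # [[1, 2], [3], [4, 5], [6]]
print(sorted(cfg_edges(PROGRAM, blocks)))   # [(1, 3), (3, 4), (3, 6), (4, 3)]
```

This yields the blocks {1, 2}, {3}, {4, 5}, {6}, with the back edge 4 -> 3 closing the loop.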
(3) Program Dependence Graph (PDG): Data Dependence
- S is data dependent on T if there exists a control flow path from T to S such that a variable is defined at T, used at S, and not redefined in between.
[Figure: example data dependence graph over statements 1 to 10]
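A minimal reaching-definitions sketch that computes data dependences, applied to the sum program from the CFG example (statement-level nodes; the successor lists and def/use sets are written by hand). A definition at T reaches S only along paths that do not redefine the variable.

```python
# Statement-level CFG of the sum program and its def/use sets (hand-written)
SUCC = {1: [2], 2: [3], 3: [4, 6], 4: [5], 5: [3], 6: []}
DEFS = {1: {"sum"}, 2: {"i"}, 4: {"i"}, 5: {"sum"}}
USES = {3: {"i"}, 4: {"i"}, 5: {"sum", "i"}, 6: {"sum"}}


def reaching_definitions(succ, defs):
    """IN[n]: set of (defining statement, variable) pairs reaching the entry of n."""
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)
    out = {n: set() for n in succ}
    changed = True
    while changed:                                   # iterate to a fixed point
        changed = False
        for n in succ:
            inn = set().union(*(out[p] for p in pred[n]))
            defined_here = defs.get(n, set())
            new_out = {(n, v) for v in defined_here} | {
                (d, v) for d, v in inn if v not in defined_here   # kill redefined variables
            }
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return {n: set().union(*(out[p] for p in pred[n])) for n in succ}


def data_dependences(succ, defs, uses):
    """Triples (T, S, v): S uses v and the definition of v at T reaches S."""
    reach_in = reaching_definitions(succ, defs)
    return sorted((d, s, v) for s, used in uses.items() for d, v in reach_in[s] if v in used)


for dep in data_dependences(SUCC, DEFS, USES):
    print(dep)
```

For example, the use of i at 5 depends only on 4, because every path from 2 to 5 passes through the redefinition at 4.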
(3) PDG Control Dependence
- X dominates Y if every possible program path from the entry to Y has to pass X.
- Strict dominance, dominator, immediate dominator: X strictly dominates Y if X dominates Y and X ≠ Y; the immediate dominator of Y is the strict dominator of Y closest to Y.
- Example, for the sum program of the CFG example: DOM(6) = {1, 2, 3, 6}, IDOM(6) = 3
(3) PDG Control Dependence
- X post-dominates Y if every possible program path from Y to EXIT has to pass X.
- Strict post-dominance, post-dominator, immediate post-dominator: X strictly post-dominates Y if X post-dominates Y and X ≠ Y; the immediate post-dominator of Y is the strict post-dominator of Y closest to Y.
- Example, for the sum program of the CFG example: PDOM(5) = {3, 5, 6}, IPDOM(5) = 3
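A sketch of the standard iterative dominator computation, DOM(n) = {n} ∪ ⋂ DOM(p) over all predecessors p, applied to the statement-level CFG of the sum program (edges written by hand: 1->2->3, 3->4, 3->6, 4->5, 5->3). Post-dominators come from running the same computation on the reversed graph, starting at EXIT (statement 6).

```python
SUCC = {1: [2], 2: [3], 3: [4, 6], 4: [5], 5: [3], 6: []}


def dominators(succ, entry):
    """Iterative data flow: DOM(n) = {n} | intersection of DOM(p) over predecessors p."""
    pred = {n: set() for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    dom = {n: set(succ) for n in succ}     # start from "everything dominates n"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == entry or not pred[n]:
                continue
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def reverse(succ):
    """Flip every edge: post-dominators are dominators of the reversed CFG."""
    rev = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            rev[t].append(n)
    return rev


def immediate(dom, n):
    """The strict (post-)dominator of n closest to n: the one with the largest set."""
    strict = dom[n] - {n}
    return max(strict, key=lambda d: len(dom[d])) if strict else None


dom = dominators(SUCC, entry=1)
pdom = dominators(reverse(SUCC), entry=6)
print(sorted(dom[6]), immediate(dom, 6))    # [1, 2, 3, 6] 3
print(sorted(pdom[5]), immediate(pdom, 5))  # [3, 5, 6] 3
```

Picking the strict dominator with the largest set works because the dominators of a node form a chain.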
(3) PDG Control Dependence
- Intuitively, Y is control dependent on X iff X directly determines whether Y executes (statements inside one branch of a predicate are usually control dependent on the predicate).
- Formally: there exists a path from X to Y such that every node in the path other than X and Y is post-dominated by Y, and X is not strictly post-dominated by Y.
[Figure: a predicate X with a path to Y]
Sorin Lerner
(3) PDG Control Dependence
- A node (basic block) Y is control dependent on another node X iff X directly determines whether Y executes:
- there exists a path from X to Y such that every node in the path other than X and Y is post-dominated by Y, and
- X is not strictly post-dominated by Y.
- Example, for the sum program of the CFG example: CD(5) = 3
- CD(3) = 3, which is tricky: the path 3 -> 4 -> 5 -> 3 satisfies both conditions, so the loop predicate is control dependent on itself.
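The definition can be checked mechanically with the equivalent edge-based formulation of Ferrante, Ottenstein and Warren: for each CFG edge X -> Z, every Y that post-dominates Z but does not strictly post-dominate X is control dependent on X. A sketch on the same hand-written statement-level CFG of the sum program:

```python
def dominators(succ, entry):
    """Iterative data flow: DOM(n) = {n} | intersection of DOM(p) over predecessors p."""
    pred = {n: set() for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    dom = {n: set(succ) for n in succ}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == entry or not pred[n]:
                continue
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def reverse(succ):
    rev = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            rev[t].append(n)
    return rev


def control_dependences(succ, exit_node):
    """cd[Y] = {X | some edge X -> Z where Y post-dominates Z but not strictly X}."""
    pdom = dominators(reverse(succ), exit_node)
    cd = {n: set() for n in succ}
    for x, targets in succ.items():
        strict_pdom_x = pdom[x] - {x}
        for z in targets:
            for y in pdom[z] - strict_pdom_x:
                cd[y].add(x)
    return cd


SUCC = {1: [2], 2: [3], 3: [4, 6], 4: [5], 5: [3], 6: []}
cd = control_dependences(SUCC, exit_node=6)
print({n: sorted(xs) for n, xs in cd.items()})
# {1: [], 2: [], 3: [3], 4: [3], 5: [3], 6: []}
```

Only the edge 3 -> 4 contributes, which gives CD(3) = CD(4) = CD(5) = 3.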
(3) PDG Control Dependence is not Syntactically Explicit
1: sum = 0
2: i = 1
3: while ( i … )
4:     i = i + 1
5:     if ( i % 2 == 0 )
6:         continue
7:     sum = sum + i
   endwhile
8: print(sum)
[Figure: CFG of the program]
- Statement 7 is control dependent on the if at 5, even though it is not syntactically nested inside it.
(3) PDG Control Dependence is Tricky!
- Can one statement be control dependent on two predicates?
1: if ( p1 || p2 )
2:     s1;
3: s2;
[Figure: CFG in which statement 1 is split into two predicate nodes, p1 and p2]
- What if the condition is ( p1 && p2 )?
- Other tricky cases: interprocedural CD, CD in the presence of exceptions.
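Splitting the compound predicate into its short-circuit tests answers the question. The sketch below assumes the connective is a short-circuit `||` (with `&&` for contrast), hand-codes both CFGs with nodes p1, p2, s1, s2, and reuses the edge-based control dependence computation.

```python
def dominators(succ, entry):
    pred = {n: set() for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    dom = {n: set(succ) for n in succ}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == entry or not pred[n]:
                continue
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def reverse(succ):
    rev = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            rev[t].append(n)
    return rev


def control_dependences(succ, exit_node):
    """Edge-based CD: Y post-dominates the edge target but not strictly the source X."""
    pdom = dominators(reverse(succ), exit_node)
    cd = {n: set() for n in succ}
    for x, targets in succ.items():
        for z in targets:
            for y in pdom[z] - (pdom[x] - {x}):
                cd[y].add(x)
    return cd


# Short-circuit CFGs (assumed): true edge listed first for each predicate
OR_CFG = {"p1": ["s1", "p2"], "p2": ["s1", "s2"], "s1": ["s2"], "s2": []}
AND_CFG = {"p1": ["p2", "s2"], "p2": ["s1", "s2"], "s1": ["s2"], "s2": []}

for name, cfg in [("||", OR_CFG), ("&&", AND_CFG)]:
    cd = control_dependences(cfg, exit_node="s2")
    print(name, {n: sorted(xs) for n, xs in cd.items()})
# || {'p1': [], 'p2': ['p1'], 's1': ['p1', 'p2'], 's2': []}
# && {'p1': [], 'p2': ['p1'], 's1': ['p2'], 's2': []}
```

With `||`, s1 is control dependent on both p1 and p2; with `&&`, s1 depends directly only on p2, which in turn depends on p1.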
(3) PDG
- A program dependence graph consists of a control dependence graph and a data dependence graph.
- Why is it so important to software reliability?
- In debugging: what could possibly induce the failure?
- In security:
p = getpassword( );
if (p == "zhang") send(m);
- send(m) never reads p, yet it is control dependent on a predicate that is data dependent on the password.
(4) Points-to Graph
- Aliases: two expressions that denote the same memory location.
- Aliases are introduced by
- pointers
- call-by-reference
- array indexing
- C unions
(4) Why Do We Need Points-to Graphs
x.lock(); ... y.unlock();   // same object as x?
F(x, y) { x.f = password; print(y.f); }
F(a, a);   // disaster!
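Python object references behave like the references in this example, so the aliasing problem can be reproduced directly. `Obj` and the explicit `password` parameter are illustrative assumptions; `F` returns what `print(y.f)` would print.

```python
class Obj:
    """Hypothetical object with a single field f."""

    def __init__(self):
        self.f = None


def F(x, y, password):
    x.f = password   # write through x
    return y.f       # stands in for print(y.f)


a, b = Obj(), Obj()
print(F(a, b, "secret"))   # None: x and y are different objects

c = Obj()
print(F(c, c, "secret"))   # secret: x and y alias, so the password leaks
```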
(4) Points-to Graph
- Points-to graph
- At a program point, compute a set of pairs of the form p -> x, where p MAY/MUST point to x.
m(p) {
    r = new C();
    p->f = r;
    t = new C();
    if (…)
        q = p;
    r->f = t;
}
(4) Points-to Graph
- After the last statement of m(p): r and p->f point to the first C object, whose field f points to the second C object, as t does; q (if the branch is taken) points to the same object as p.
- Hence p->f->f and t are aliases.
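A flow-insensitive, subset-based (Andersen-style) MAY points-to sketch for m(p). Objects are named by allocation site (`o1`, `o2`), and `p_obj` is a hypothetical name for whatever p points to on entry; because the analysis is flow-insensitive, the guarded `q = p` is simply treated as possibly executed.

```python
def add(dst, srcs):
    """Union srcs into dst; report whether dst grew."""
    before = len(dst)
    dst |= srcs
    return len(dst) != before


def points_to(stmts, init):
    var = {v: set(objs) for v, objs in init.items()}   # variable -> abstract objects
    heap = {}                                          # (object, field) -> abstract objects
    changed = True
    while changed:                                     # iterate until no set grows
        changed = False
        for kind, *args in stmts:
            if kind == "new":                          # x = new C()
                x, site = args
                changed |= add(var.setdefault(x, set()), {site})
            elif kind == "copy":                       # x = y
                x, y = args
                changed |= add(var.setdefault(x, set()), var.get(y, set()))
            elif kind == "store":                      # x->f = y
                x, f, y = args
                for o in list(var.get(x, set())):
                    changed |= add(heap.setdefault((o, f), set()), var.get(y, set()))
            elif kind == "load":                       # x = y->f
                x, y, f = args
                for o in list(var.get(y, set())):
                    changed |= add(var.setdefault(x, set()), heap.get((o, f), set()))
    return var, heap


def deref(var, heap, v, *fields):
    """Objects reachable from variable v through the given field path."""
    objs = set(var.get(v, set()))
    for f in fields:
        objs = set().union(*(heap.get((o, f), set()) for o in objs))
    return objs


M = [
    ("new", "r", "o1"),          # r = new C()
    ("store", "p", "f", "r"),    # p->f = r
    ("new", "t", "o2"),          # t = new C()
    ("copy", "q", "p"),          # if (...) q = p
    ("store", "r", "f", "t"),    # r->f = t
]
var, heap = points_to(M, init={"p": {"p_obj"}})
print(deref(var, heap, "p", "f", "f"), deref(var, heap, "t"))   # {'o2'} {'o2'}
```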
(5) Call Graph
- Call graph
- nodes are procedures
- edges are calls
- Hard cases for building a call graph
- calls through function pointers
Can the password acquired at A be leaked at G?
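For calls through function pointers, a simple conservative strategy is to let each indirect call target every address-taken function; a points-to analysis can narrow that set. The program facts below (function names, which calls are indirect) are hypothetical.

```python
# Hypothetical program facts
DIRECT_CALLS = {
    "main": ["init", "dispatch"],
    "init": [],
    "dispatch": [],              # calls only through a function pointer
    "log_msg": [],
    "send_msg": ["net_write"],
    "net_write": [],
}
INDIRECT_CALLERS = {"dispatch"}             # functions containing an indirect call
ADDRESS_TAKEN = {"log_msg", "send_msg"}     # functions whose address is stored somewhere


def build_call_graph(direct_calls, indirect_callers, address_taken):
    graph = {f: set(callees) for f, callees in direct_calls.items()}
    for f in indirect_callers:
        graph[f] |= address_taken           # conservative: any address-taken function
    return graph


def reachable(graph, start):
    """Procedures that may be (transitively) called from start."""
    seen, stack = set(), [start]
    while stack:
        f = stack.pop()
        if f not in seen:
            seen.add(f)
            stack.extend(graph.get(f, ()))
    return seen


cg = build_call_graph(DIRECT_CALLS, INDIRECT_CALLERS, ADDRESS_TAKEN)
print(sorted(reachable(cg, "main")))
# ['dispatch', 'init', 'log_msg', 'main', 'net_write', 'send_msg']
```

The conservative resolution is sound but imprecise: it may add edges that no execution ever takes.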
How to acquire and use these representations?
- Will be covered by later lectures.
Program Representations
- Static program representations
- Abstract syntax tree
- Control flow graph
- Program dependence graph
- Call graph
- Points-to relations.
- Dynamic program representations
- Control flow trace
- Address trace, Value trace
- Dynamic dependence graph
- Whole execution trace
(1) Control Flow Trace
Input: N = 2
1_1: sum = 0
2_1: i = 1
3_1: while ( i … )
4_1: i = i + 1
5_1: sum = sum + i
3_2: while ( i … )
4_2: i = i + 1
5_2: sum = sum + i
3_3: while ( i … )
6_1: print(sum)
[Figure: the sum program and its CFG]
x is a program point; x_i is an execution point (the i-th execution of x).
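A sketch that records execution points x_i while running the sum program. The loop test is truncated on the slides; this code assumes it is `i <= N`, the test consistent with statement 3 executing three times for N = 2.

```python
def run_sum(n):
    """Execute the sum program, recording the control flow trace as x_i labels."""
    counts, trace = {}, []

    def visit(stmt):
        counts[stmt] = counts.get(stmt, 0) + 1
        trace.append(f"{stmt}_{counts[stmt]}")

    visit(1)
    total = 0                 # 1: sum = 0
    visit(2)
    i = 1                     # 2: i = 1
    while True:
        visit(3)              # 3: loop test, ASSUMED to be i <= N
        if i > n:
            break
        visit(4)
        i = i + 1             # 4: i = i + 1
        visit(5)
        total = total + i     # 5: sum = sum + i
    visit(6)                  # 6: print(sum)
    return trace, total


trace, total = run_sum(2)
print(" ".join(trace))   # 1_1 2_1 3_1 4_1 5_1 3_2 4_2 5_2 3_3 6_1
```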
(1) Control Flow Trace
Input: N = 2
1_1: sum = 0; i = 1
3_1: while ( i … )
4_1: i = i + 1; sum = sum + i
3_2: while ( i … )
4_2: i = i + 1; sum = sum + i
3_3: while ( i … )
6_1: print(sum)
A more compact CFT: one entry per executed basic block instead of one per statement.
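Because control can enter a basic block only at its first statement and then runs to its end, recording block entries loses nothing. A minimal sketch using the blocks {1, 2}, {3}, {4, 5}, {6} of the sum program, keyed by their first statement:

```python
BLOCKS = {1: [1, 2], 3: [3], 4: [4, 5], 6: [6]}   # first statement -> statements in block


def compress(stmt_trace):
    """Keep only block entries: a statement-level trace becomes a block-level one."""
    return [s for s in stmt_trace if s in BLOCKS]


def expand(block_trace):
    """Recover the statement-level trace; each block always runs start to end."""
    return [s for b in block_trace for s in BLOCKS[b]]


full = [1, 2, 3, 4, 5, 3, 4, 5, 3, 6]   # statement trace for N = 2
compact = compress(full)
print(compact)                           # [1, 3, 4, 3, 4, 3, 6]
print(expand(compact) == full)           # True
```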
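(2) Dynamic Dependence Graph (DDG)
Input: N = 2
1: z = 0
2: a = 0
3: b = 2
4: p = &b
5: for i = 1 to N do
6:     if ( i % 2 == 0 ) then
7:         p = &a
       endif
8:     a = a + 1
9:     z = 2 * (*p)
   endfor
10: print(z)
Executed instances so far:
1_1: z = 0
2_1: a = 0
3_1: b = 2
4_1: p = &b
5_1: for i = 1 to N do
6_1: if ( i % 2 == 0 ) then
8_1: a = a + 1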
(2) Dynamic Dependence Graph (DDG)
Input: N = 2 (same program as on the previous slide)
- One use has only one definition at runtime.
- One statement instance is control dependent on only one predicate instance.
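A sketch of building dynamic data dependences while executing the DDG example for N = 2: a table maps each location to the statement instance that last wrote it, so every use resolves to exactly one definition. Modeling the pointer by storing a variable name in `p`, and treating the loop header 5 as only defining i, are simplifying assumptions.

```python
def run_with_ddg(n):
    env = {}           # concrete store; a pointer holds the name of the variable it targets
    last_def = {}      # location -> statement instance that last wrote it
    counts = {}        # statement -> executions so far
    trace, deps = [], []

    def execute(stmt, reads, writes):
        counts[stmt] = counts.get(stmt, 0) + 1
        inst = f"{stmt}_{counts[stmt]}"
        trace.append(inst)
        for loc in reads:                       # exactly one reaching definition per use
            deps.append((inst, loc, last_def[loc]))
        for loc in writes:
            last_def[loc] = inst

    execute(1, [], ["z"])
    env["z"] = 0
    execute(2, [], ["a"])
    env["a"] = 0
    execute(3, [], ["b"])
    env["b"] = 2
    execute(4, [], ["p"])
    env["p"] = "b"                              # p = &b
    for i in range(1, n + 1):
        execute(5, [], ["i"])                   # loop header defines i
        env["i"] = i
        execute(6, ["i"], [])                   # if (i % 2 == 0)
        if env["i"] % 2 == 0:
            execute(7, [], ["p"])
            env["p"] = "a"                      # p = &a
        execute(8, ["a"], ["a"])
        env["a"] = env["a"] + 1
        execute(9, ["p", env["p"]], ["z"])      # z = 2 * (*p): reads p and what p targets
        env["z"] = 2 * env[env["p"]]
    execute(10, ["z"], [])                      # print(z)
    return trace, deps


trace, deps = run_with_ddg(2)
for dep in deps:
    print(dep)
```

Statically, `*p` at 9 may read a or b; at runtime 9_1 depends on b (3_1) and 9_2 depends on a (8_2).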
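(3) Whole Execution Trace
Input: N = 2
T = 1: 1_1 z = 0
T = 2: 2_1 a = 0
T = 3: 3_1 b = 2
T = 4: 4_1 p = &b
T = 5: 5_1 for i = 1 to N do
T = 6: 6_1 if ( i % 2 == 0 ) then
T = 7: 8_1 a = a + 1
T = 8: 9_1 z = 2 * (*p)
T = 9: 5_2 for i = 1 to N do
T = 10: 6_2 if ( i % 2 == 0 ) then
T = 11: 7_1 p = &a
T = 12: 8_2 a = a + 1
T = 13: 9_2 z = 2 * (*p)
T = 14: 10_1 print(z)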
(3) Whole Execution Trace
Multiple streams of numbers.
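A minimal sketch of one kind of stream in a whole execution trace: each statement is labeled with the timestamps at which it executed, taken from the trace T for N = 2. Other streams (values, dependences) are omitted, and this dictionary representation is an assumption for illustration.

```python
TRACE = [1, 2, 3, 4, 5, 6, 8, 9, 5, 6, 7, 8, 9, 10]   # statement executed at T = 1..14


def timestamp_streams(trace):
    """Statement -> list of timestamps at which it executed."""
    streams = {}
    for t, stmt in enumerate(trace, start=1):
        streams.setdefault(stmt, []).append(t)
    return streams


def replay(streams):
    """The streams are lossless: sorting all (timestamp, statement) pairs restores T."""
    return [stmt for _, stmt in sorted((t, s) for s, ts in streams.items() for t in ts)]


streams = timestamp_streams(TRACE)
print(streams[5], streams[9], streams[10])   # [5, 9] [8, 13] [14]
print(replay(streams) == TRACE)              # True
```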
What is a slice?
- S . f (v)
- Slice of v at S is the set of statements involved in computing v's value at S. (Mark Weiser, 1982)
- Data dependence
- Control dependence
void main ( ) {
    int I = 0;
    int sum = 0;
    while (I … ) {
        …
        I = add(I, 1);
    }
    printf("sum=%d\n", sum);
    printf("I=%d\n", I);
}
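Static backward slicing is reachability over the dependence graph. The sketch assumes a version of the example above whose loop is `while (I < N)` with body `sum = add(sum, I); I = add(I, 1);`, numbered 1 to 7 top to bottom (1: `int I = 0`, 2: `int sum = 0`, 3: loop test, 4: `sum = ...`, 5: `I = ...`, 6 and 7: the printfs). The dependence edges are derived by hand for that assumed program.

```python
# Assumed program's dependences: statement -> statements it depends on (data or control)
DEPS = {
    1: set(),
    2: set(),
    3: {1, 5, 3},          # reads I; the loop test also controls its own re-execution
    4: {2, 4, 1, 5, 3},    # sum = add(sum, I), inside the loop
    5: {1, 5, 3},          # I = add(I, 1), inside the loop
    6: {2, 4},             # printf sum
    7: {1, 5},             # printf I
}


def backward_slice(deps, criterion):
    """All statements the criterion transitively depends on, including itself."""
    result, work = set(), [criterion]
    while work:
        s = work.pop()
        if s not in result:
            result.add(s)
            work.extend(deps[s])
    return result


print(sorted(backward_slice(DEPS, 7)))   # [1, 3, 5, 7]
print(sorted(backward_slice(DEPS, 6)))   # [1, 2, 3, 4, 5, 6]
```

Slicing on I at the last printf keeps only 1, 3, 5 and 7; the statements that compute sum drop out. Each printf uses a single variable here, so the statement alone serves as the slicing criterion.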
How to do slicing?
- Static analysis
- Input insensitive
- May analysis
- Dependence Graph
- Characteristics
- Very fast
- Very imprecise
Why is a static slice imprecise?
- All possible program paths are considered:
S1: x = …
S2: x = …
L1: … = x
- Use of pointers: static alias analysis is very imprecise.
S1: a = …
S2: b = …
L1: … = *p
- Use of function pointers: it is hard to know which function is called, and conservative assumptions result in imprecision.
Dynamic Slicing
- Korel and Laski, 1988
- Dynamic slicing makes use of all information about a particular execution of a program and computes the slice based on an execution history (trace).
- The trace consists of a control flow trace and a memory reference trace.
- A dynamic slice query is a triple: …
- Smaller, more precise, more helpful to the user
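Dynamic Slicing Example: Background
1: b = 0
2: a = 2
3: for i = 1 to N do
4:     if ( i % 2 == 1 ) then
5:         a = a + 1
       else
6:         b = a * 2
       endif
   done
7: z = a + b
8: print(z)
For input N = 2:
1_1: b = 0                        (b = 0)
2_1: a = 2                        (a = 2)
3_1: for i = 1 to N do            (i = 1)
4_1: if ( i % 2 == 1 ) then       (i = 1)
5_1: a = a + 1                    (a = 3)
3_2: for i = 1 to N do            (i = 2)
4_2: if ( i % 2 == 1 ) then       (i = 2)
6_1: b = a * 2                    (b = 6)
7_1: z = a + b                    (z = 9)
8_1: print(z)                     (z = 9)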
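A sketch of computing the dynamic slice of z at 8_1 by backward traversal of dynamic dependences built from the N = 2 trace. The use/def sets and the static controlling predicates (4 and the loop header 3 under 3; 5 and 6 under 4) are written by hand, and each instance's dynamic control dependence is taken to be the latest earlier instance of its controlling predicate, a simplification that holds for structured code like this example.

```python
# (statement, variables used, variables defined) per executed instance, N = 2
TRACE = [
    (1, [], ["b"]),          # 1_1  b = 0
    (2, [], ["a"]),          # 2_1  a = 2
    (3, [], ["i"]),          # 3_1  for: i = 1
    (4, ["i"], []),          # 4_1  if (i % 2 == 1)
    (5, ["a"], ["a"]),       # 5_1  a = a + 1
    (3, ["i"], ["i"]),       # 3_2  for: i = i + 1
    (4, ["i"], []),          # 4_2  if (i % 2 == 1)
    (6, ["a"], ["b"]),       # 6_1  b = a * 2
    (7, ["a", "b"], ["z"]),  # 7_1  z = a + b
    (8, ["z"], []),          # 8_1  print(z)
]
CONTROLLER = {3: 3, 4: 3, 5: 4, 6: 4}   # static control dependence (hand-derived)


def dynamic_slice(trace, controller, criterion_index):
    last_def, last_exec, deps = {}, {}, []
    for k, (stmt, uses, defs) in enumerate(trace):
        d = {last_def[v] for v in uses}            # dynamic data dependences
        c = controller.get(stmt)
        if c is not None and c in last_exec:
            d.add(last_exec[c])                    # dynamic control dependence
        deps.append(d)
        for v in defs:
            last_def[v] = k
        last_exec[stmt] = k
    seen, work = set(), [criterion_index]
    while work:                                    # backward traversal of the DDG
        k = work.pop()
        if k not in seen:
            seen.add(k)
            work.extend(deps[k])
    return sorted({trace[k][0] for k in seen})


print(dynamic_slice(TRACE, CONTROLLER, len(TRACE) - 1))   # [2, 3, 4, 5, 6, 7, 8]
```

Statement 1 (b = 0) is excluded because 6_1 overwrote b before 7_1 read it; a static slice would keep it, since with N = 1 the value b = 0 does reach 7.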
Issues about Dynamic Slicing
- Precision: perfect
- Running history: very big (GB)
- Algorithm to compute a dynamic slice: slow, with very high space requirements
Backward vs. Forward
1  main( )
2  {
3      int i, sum;
4      sum = 0;
5      i = 1;
6      while(i …
7      {
8          sum = sum + 1;
9          i++;
10     }
11     cout …
12     cout …
13 }
- An example program and its forward slice w.r.t. …
Comments
- Want to know more?
- Frank Tip's survey paper (1995)
- Static slicing is very useful for static analysis
- Code transformation, program understanding, etc.
- Points-to analysis is the key challenge
- Not as useful in reliability as dynamic slicing
- Dynamic slicing
- Precise
- Good for defect analysis
- Solution space is much larger
- There exist hybrid techniques
Efficiency
- How are dynamic slices computed?
- Execution traces
- Control flow trace: yields dynamic control dependences
- Memory reference trace: yields dynamic data dependences
- Construct a dynamic dependence graph
- Traverse the dynamic dependence graph to compute slices
How to Detect Dynamic Dependence
- Dynamic data dependence
- Shadow space (SS)
- Addr -> Abstract State
[Figure: virtual space and its shadow space]
s1_x: ST r1, r2     SS(r2) = s1_x
s2_y: LD r1, r2     s2_y depends on SS(r1) = s1_x
Dynamic control dependence is more tricky!
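A sketch of the shadow-space idea, with a dictionary standing in for shadow memory. It assumes `ST r1, r2` stores r1 to the address held in r2 and `LD r1, r2` loads from the address held in r1, which is how the SS updates above read; the register values are made up for illustration.

```python
class ShadowMemory:
    """Hypothetical shadow space: address -> instruction instance that last stored to it."""

    def __init__(self):
        self.last_store = {}
        self.deps = []                       # (load instance, store instance) pairs

    def store(self, inst, addr):
        self.last_store[addr] = inst         # ST: SS(addr) = inst

    def load(self, inst, addr):
        writer = self.last_store.get(addr)   # LD: inst depends on SS(addr)
        if writer is not None:
            self.deps.append((inst, writer))
        return writer


ss = ShadowMemory()
regs = {"r1": 7, "r2": 0x1000}
ss.store("s1_x", regs["r2"])                 # s1_x: ST r1, r2  ->  SS(0x1000) = s1_x
regs = {"r1": 0x1000, "r2": 0}               # later, r1 holds the same address
print(ss.load("s2_y", regs["r1"]))           # s1_x: s2_y depends on s1_x
```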