Title: Dependence Modeling
1. Dependence Modeling & Parallelization, Part 2
2. Review: Dependence Model
- Dependence
  - Flow (true), anti, output
- Loop dependence
  - Distance vector
  - Direction vector
- Reordering transformation
  - Should preserve every dependence in the program
  - The leftmost non-"=" element of the direction vector must be "<": the source occurs before the sink
3. Today
- Coarse-grained parallelization
  - Loop interchange
  - Privatization, loop alignment, loop distribution, loop fusion
- Vectorization
- Locality
- Program Dependence Graphs
4. Loop Parallelization
- It is always valid to convert a sequential loop to a parallel loop if the loop carries no dependence: assign each iteration to a different processor.
- Loop-carried dependences are excluded.
- Loop-independent dependences are fine: parallelizing a loop does not change the order of statements within a loop body.
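The claim above can be simulated in a minimal Python sketch (data values are made up): when a loop carries no dependence, any iteration schedule produces the same result, so each iteration can be handed to a different processor.

```python
# Sketch: a loop with no loop-carried dependence produces the same
# result under any iteration schedule, so each iteration can safely
# be assigned to a different processor. Data values are made up.
def sequential(a, b):
    out = [0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]        # no iteration reads another's output
    return out

def scheduled(a, b, order):
    out = [0] * len(a)
    for i in order:                 # any permutation of the iterations
        out[i] = a[i] + b[i]
    return out

a, b = [1, 2, 3, 4], [10, 20, 30, 40]
assert sequential(a, b) == scheduled(a, b, [3, 1, 0, 2])
```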
5. Coarse-Grained Parallelism
- Target machine: symmetric multiprocessor
  - Multiple processors with a shared memory
- Parallelism is employed by creating and executing a process on each processor (PARALLEL DO)
  - Expensive overhead: process initiation and synchronization (barrier)
- Parallelism is the concern for high performance
  - Find and package parallelism with a granularity large enough to compensate for the overhead
  - Delicate trade-off between overhead minimization and load balancing
- Simply stated goal: identify independent loop iterations to execute on different processors
6. Loop Transformations
- To promote coarse-grained parallelism
- Perfect loop-nest methods
- Loop interchange
- Single-loop methods
- Privatization
- Loop alignment
- Loop distribution
- Loop fusion
7. Loop-Carried Dependences
- The level of a loop-carried dependence is the index of the leftmost non-"=" entry of the direction vector D(i,j) for the dependence.
- The outermost loop is critical to the execution order.
- Direction vector (=, =, <)
  - Level 3
  - Denotation of this dependence: S δ3 S

  DO I = 1, N
    DO J = 1, M
      DO K = 1, L
  S     A(I, J, K+1) = A(I, J, K) + C
      ENDDO
    ENDDO
  ENDDO

- The dependence is carried only by the innermost (K) loop, so the I and J loops can run across N x M processors.
8. Example
- D(i,j,k) = (=, <, <)

  DO I = 1, 10
    DO J = 1, 10
      DO K = 1, 10
  S     A(I, J+2, K+3) = A(I, J, K) + C
      ENDDO
    ENDDO
  ENDDO

- The dependence is carried by the J loop; the I and K loops can each run across 10 processors.
9. Loop Interchange: Example
- After interchanging the I-loop and the J-loop:

  PARALLEL DO J = 1, N
    DO I = 1, N
      A(I+1, J) = A(I, J) + B(I, J)
    ENDDO
  END PARALLEL DO

- DV(J,I) = (=, <)
- Only one barrier is needed.
10. Loop Interchange: Profitability
- It is not always possible to move a parallel loop outward and have it remain free of dependence.
- Example:

  DO J = 1, N
    DO I = 1, N
      A(I+1, J+1) = A(I, J) + B(I, J)
    ENDDO
  ENDDO

- DV(J,I) = (<, <)
- The best we can do is parallelize the inner loop.
11. Loop Interchange: Safety
- Loop interchange: switching the nesting order of two loops in a perfect nest
  - A reordering transformation
- Not all loop interchanges are legal:

  DO J = 1, M
    DO I = 1, N
      A(I, J+1) = A(I+1, J) + B
    ENDDO
  ENDDO

- DV(J,I) = (<, >)
- Theorem: let D be a direction vector for a dependence in a perfect loop nest. The direction vector of the same dependence after any loop permutation is determined by applying the same permutation to the elements of D.
12. Direction Matrix
- Loop interchange might affect all dependences involved in a loop nest.
- We need a way to describe the original and updated groups of dependences.
- We define the direction matrix for a nest of loops:
  - A matrix in which each row is a direction vector for some dependence of the nest
  - Every dependence in the loop has its direction vector represented by a row in the matrix

  DO I = 1, N
    DO J = 1, M
      DO K = 1, L
        A(I+1, J+1, K) = A(I, J, K) + A(I, J+1, K+1)
      ENDDO
    ENDDO
  ENDDO

- Here the rows are (<, <, =) and (<, =, >), one for each dependence on A.
13. Loop Interchange: Safety
- The direction matrix after loop interchange can be computed by applying the same permutation of loops to its columns.
- Theorem: a permutation of the loops in a perfect nest is legal if and only if the updated direction matrix has no ">" direction as the leftmost non-"=" direction in any row.
14. Loop Interchange: Profitability
- Theorem 6.3: In a perfect nest of loops, a particular loop can be parallelized at the outermost level if and only if the column of the direction matrix for that loop contains only "=" entries.
15. Loop Selection and Interchange

  DO I = 1, N
    PARALLEL DO J = 1, M
      DO K = 1, L
  S1     A(I+1, J, K)     = A(I, J, K) + X1
  S2     B(I, J, K+1)     = B(I, J, K) + X2
  S3     C(I+1, J+1, K+1) = C(I, J, K) + X3
      ENDDO
    END PARALLEL DO
  ENDDO

- The general case of loop selection is NP-complete.
16. Privatization

  DO I = 1, N
  S1  T = A(I)
  S2  A(I) = B(I)
  S3  B(I) = T
  ENDDO

- Privatization
  - Determine that T is privatizable: it is assigned within the loop, and each value assigned is used only in the same iteration in which it is assigned.
  - Replicate T across different iterations.
- After privatization:

  PARALLEL DO I = 1, N PRIVATE(t)
  S1  t = A(I)
  S2  A(I) = B(I)
  S3  B(I) = t
  END PARALLEL DO
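A minimal Python sketch of the transformation (the reversed iteration order stands in for a parallel schedule; array values are made up):

```python
# Sketch of privatization: the scalar T never carries a value between
# iterations, so each (conceptual) processor can use its own copy.
def loop_with_shared_t(a, b):
    a, b = list(a), list(b)
    for i in range(len(a)):
        t = a[i]          # S1: T is written...
        a[i] = b[i]       # S2
        b[i] = t          # S3: ...and read only in the same iteration
    return a, b

def loop_privatized(a, b):
    a2, b2 = list(a), list(b)
    for i in reversed(range(len(a2))):   # any iteration order is now legal
        t = a2[i]                        # private copy of T per iteration
        a2[i] = b2[i]
        b2[i] = t
    return a2, b2

assert loop_with_shared_t([1, 2], [3, 4]) == loop_privatized([1, 2], [3, 4])
```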
17. Loop Alignment
- Example:

  DO I = 2, N
  S1  A(I) = B(I) + C(I)
  S2  D(I) = A(I-1) * 2.0
  ENDDO

- Before alignment: S1 δ1 S2 (loop-carried)
- Goal: turn it into S1 δ∞ S2 (loop-independent)
18. Loop Alignment
- Loop alignment works by
  - increasing the number of iterations, and
  - executing statements on slightly different subsets of those iterations,
- changing a loop-carried dependence into a loop-independent dependence:

  DO I = 1, N
    IF (I .GT. 1) A(I) = B(I) + C(I)
    IF (I .LT. N) D(I+1) = A(I) * 2.0
  ENDDO
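A minimal Python sketch of the alignment above (1-based arrays are emulated with a padding slot at index 0; data values are made up):

```python
# Sketch of loop alignment: S2 is shifted one iteration later via
# guards, so its read of A happens in the same iteration as S1's write.
def original(b, c, n):
    a = [0.0] * (n + 1)
    d = [0.0] * (n + 1)
    for i in range(2, n + 1):
        a[i] = b[i] + c[i]          # S1
        d[i] = a[i - 1] * 2.0       # S2 reads A from the previous iteration
    return d

def aligned(b, c, n):
    a = [0.0] * (n + 1)
    d = [0.0] * (n + 1)
    for i in range(1, n + 1):
        if i > 1:
            a[i] = b[i] + c[i]      # S1 still runs for I = 2..N
        if i < n:
            d[i + 1] = a[i] * 2.0   # S2 now reads the A written this iteration
    return d

b = [0, 1, 2, 3, 4, 5]
c = [0, 5, 4, 3, 2, 1]
assert original(b, c, 5) == aligned(b, c, 5)
```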
19. Loop Distribution
- Break one loop into multiple smaller loops.
- May not be desirable for coarse-grained parallelism:
  - Barriers are needed between the distributed loops.
- May facilitate other transformations.
20. Loop Distribution
- Example:

  DO I = 1, N
  S1  A(I) = B(I) + 1
  S2  C(I) = A(I) + C(I-1)
  S3  D(I) = A(I) * X
  ENDDO

- Only one loop-carried dependence (the recurrence on C in S2).
21. Loop Fusion
- After distribution, the parallel loops L1 and L3 can be fused; the recurrence L2 stays sequential:

  L1+L3: PARALLEL DO I = 1, N
           A(I) = B(I) + 1
           D(I) = A(I) * X
         END PARALLEL DO
  L2:    DO I = 1, N
           C(I) = A(I) + C(I-1)
         ENDDO

- Merge small loops into a single loop with a larger loop body.
- The parallelization overhead might then be compensated.
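The distribute-then-fuse result can be sketched in Python (x is a made-up scalar; c0 is the incoming value of C(0)):

```python
# Sketch: distribute the original loop, then fuse the two parallel
# statements (S1 and S3); the recurrence (S2) stays in its own
# sequential loop.
def original(b, c0, x, n):
    a = [0.0] * n; c = [0.0] * n; d = [0.0] * n
    prev = c0
    for i in range(n):
        a[i] = b[i] + 1             # S1
        c[i] = a[i] + prev          # S2: recurrence on C
        prev = c[i]
        d[i] = a[i] * x             # S3
    return a, c, d

def distributed_and_fused(b, c0, x, n):
    a = [0.0] * n; d = [0.0] * n
    for i in range(n):              # fused parallel loop (S1 + S3)
        a[i] = b[i] + 1
        d[i] = a[i] * x
    c = [0.0] * n
    prev = c0
    for i in range(n):              # sequential loop carrying C(I-1)
        c[i] = a[i] + prev
        prev = c[i]
    return a, c, d

assert original([1, 2, 3], 0.0, 2.0, 3) == distributed_and_fused([1, 2, 3], 0.0, 2.0, 3)
```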
22. Today
- Coarse-grained parallelization
- Vectorization
  - Transformations to increase parallelism
    - Loop interchange
  - Transformations to break recurrences
    - Scalar expansion, scalar/array renaming, node splitting
- Locality
23. Loop Vectorization
- Loops that carry no dependence can also be vectorized:

  DO I = 1, N
  S  A(I) = A(I) + C
  ENDDO

  becomes

  A(1:N) = A(1:N) + C

- How about loops that do carry some dependence?

  DO I = 1, N
  S1  A(I+1) = B(I) + C
  S2  D(I) = A(I) + E
  ENDDO
24. Loop Vectorization

  DO I = 1, N
  S1  A(I+1) = B(I) + C
  S2  D(I) = A(I) + E
  ENDDO

- S1 δ1 S2
- We can split (distribute) the loop into two loops; the loop-carried dependence is transformed into a loop-independent dependence S1 δ∞ S2.
25. More Examples

  DO I = 1, N
  S1  D(I) = A(I) + E
  S2  A(I+1) = B(I) + C
  ENDDO

- S2 δ1 S1
- First interchange S1 and S2, then vectorize the loop.

  DO I = 1, N
  S1  B(I) = A(I) + E
  S2  A(I+1) = B(I) + C
  ENDDO

- S2 δ1 S1 and S1 δ∞ S2
- Here the loop-independent dependence prevents us from interchanging S1 and S2, so the loop cannot be vectorized.
26. Loop Vectorization
- Loop distribution and vectorization cannot work if there is a cycle of dependences:

  DO I = 1, N
  S1  B(I+1) = A(I) + E
  S2  A(I+1) = B(I) + C
  ENDDO

- S2 δ1 S1 and S1 δ1 S2
- This leads us to the first vectorization algorithm.
27. Simple Vectorization Algorithm
- Idea:
  - Construct the dependence graph.
  - Reduce the strongly connected subgraphs to single nodes.
  - Topologically sort the reduced graph.
  - Generate code for each node in the sorted graph:
    - Single statement: vectorize.
    - Strongly connected subgraph: keep the sequential loop.

  DO I = 1, N
  S1  D(I) = A(I) + E
  S2  A(I+1) = B(I) + C
  ENDDO
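The four steps can be sketched in Python (the statement names and dependence edges are made up; Tarjan's algorithm stands in for whatever SCC routine a real vectorizer uses):

```python
# Simple vectorization sketch: build the statement dependence graph,
# collapse strongly connected components (Tarjan), emit components in
# topological order; singleton components without self-loops vectorize,
# everything else stays a sequential loop.
from collections import defaultdict

def sccs(nodes, edges):
    # Tarjan's algorithm; components come out in reverse topological order.
    index, low, stack, on_stack, comps = {}, {}, [], set(), []
    counter = [0]
    def visit(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in edges[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = []
            while True:
                w = stack.pop(); on_stack.discard(w); comp.append(w)
                if w == v:
                    break
            comps.append(comp)
    for v in nodes:
        if v not in index:
            visit(v)
    return comps

def simple_vectorize(nodes, edges):
    plan = []
    for comp in reversed(sccs(nodes, edges)):        # topological order
        members = sorted(comp)
        if len(members) == 1 and members[0] not in edges[members[0]]:
            plan.append(("vector", members))          # single statement
        else:
            plan.append(("sequential loop", members)) # dependence cycle
    return plan

# The two-statement loop above: the only edge is S1 -> S2.
deps = defaultdict(list, {"S1": ["S2"]})
assert simple_vectorize(["S1", "S2"], deps) == [
    ("vector", ["S1"]), ("vector", ["S2"])]
```

With a cyclic graph (S1 -> S2 -> S1, as on the previous slide) the single component comes back marked "sequential loop".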
28. Problems with Simple Vectorization
- Some opportunities are missed.
- S is contained in a dependence cycle (on itself), so the simple vectorization algorithm cannot parallelize the loop.
- Direction vector: (<, =)
- By Theorem 2.4, as long as we keep the I-loop sequential, the dependence is preserved:

  DO I = 1, N
    DO J = 1, M
  S   A(I+1, J) = A(I, J) + B
    ENDDO
  ENDDO

  DO I = 1, N
    A(I+1, 1:M) = A(I, 1:M) + B
  ENDDO
29. Advanced Vectorization Algorithm
- Idea:
  - Construct the dependence graph.
  - Process the loop nest level by level, starting from the outermost loop.
  - For each level:
    - Find the strongly connected subgraphs and perform the topological sort.
    - Vectorize the single statements of the sorted graph.
    - Otherwise, generate a sequential loop for the current level, update the dependence graph, and recursively process the next loop level.
30. Example

  DO I = 1, 100
  S1  X(I) = Y(I) + 10
    DO J = 1, 100
  S2    B(J) = A(J, N)
      DO K = 1, 100
  S3      A(J+1, K) = B(J) + C(J, K)
      ENDDO
  S4    Y(I+J) = A(J+1, N)
    ENDDO
  ENDDO

- After the advanced algorithm:

  DO I = 1, 100
    DO J = 1, 100
      B(J) = A(J, N)                      ! S2
      A(J+1, 1:100) = B(J) + C(J, 1:100)  ! S3 vectorized
    ENDDO
    Y(I+1:I+100) = A(2:101, N)            ! S4 vectorized
  ENDDO
  X(1:100) = Y(1:100) + 10                ! S1 vectorized
31. Vectorization: Other Issues
- Basic idea: find all the possible parallelism by loop distribution and statement reordering.
- This may not work if cyclic dependences exist, especially when they are carried by the inner loops.
- Move the cycle outwards:
  - Loop interchange
- Break the cycle:
  - Scalar expansion
  - Scalar/array renaming
  - Node splitting
32. Example

  DO I = 1, N
    DO J = 1, M
  S   A(I, J+1) = A(I, J) + B
    ENDDO
  ENDDO

- Direction vector (I,J) = (=, <)
- The dependence is carried by the inner loop.
- Codegen() would generate two sequential loops.
- Is this loop nest really not vectorizable?
33. Loop Interchange & Vectorization
- Motivation
  - If we deal with loops containing cyclic dependences early on in the loop nest, we can potentially vectorize more loops.
  - Shift loops that carry no dependence inward.
- Theorem: In a perfect loop nest, loops that carry no dependence can legally be shifted inward, and they will not carry any dependence in their new position.
- Note: we are moving the dependences in the opposite direction than we did for parallelization.
34. Loop Interchange & Vectorization

  DO I = 1, N
    DO J = 1, M
  S   A(I, J+1) = A(I, J) + B
    ENDDO
  ENDDO

- Direction vector (I,J) = (=, <)
- After interchanging the I-loop and the J-loop:

  DO J = 1, M
    DO I = 1, N
  S   A(I, J+1) = A(I, J) + B
    ENDDO
  ENDDO

- Direction vector (J,I) = (<, =)
- Vectorization:

  DO J = 1, M
  S  A(1:N, J+1) = A(1:N, J) + B
  ENDDO
35. Scalar Expansion

  DO I = 1, N
  S1  T = A(I)
  S2  A(I) = B(I)
  S3  B(I) = T
  ENDDO

- Scalar expansion replaces T with a compiler-generated temporary array T$ that has a location for each loop iteration:

  DO I = 1, N
  S1  T$(I) = A(I)
  S2  A(I) = B(I)
  S3  B(I) = T$(I)
  ENDDO
  T = T$(N)

- Vectorization:

  S1  T$(1:N) = A(1:N)
  S2  A(1:N) = B(1:N)
  S3  B(1:N) = T$(1:N)
      T = T$(N)
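A minimal Python sketch of the expanded-and-vectorized form (whole-list copies stand in for vector operations; data values are made up):

```python
# Sketch of scalar expansion: T becomes a per-iteration temporary
# array t_, breaking the anti- and output dependences on T so each
# statement can run as a whole-array (vector) operation.
def original(a, b):
    a, b = list(a), list(b)
    t = 0
    for i in range(len(a)):
        t = a[i]          # S1
        a[i] = b[i]       # S2
        b[i] = t          # S3
    return a, b, t

def expanded_vectorized(a, b):
    n = len(a)
    t_ = a[:]                      # S1 vectorized: T$(1:N) = A(1:N)
    a2 = b[:]                      # S2 vectorized: A(1:N) = B(1:N)
    b2 = t_[:]                     # S3 vectorized: B(1:N) = T$(1:N)
    return a2, b2, t_[n - 1]       # T = T$(N) restores the live-out scalar

assert original([1, 2, 3], [4, 5, 6]) == expanded_vectorized([1, 2, 3], [4, 5, 6])
```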
36. Scalar Expansion
- Always safe to apply.
- May need SSA form to decide how to rewrite the code.
- Not always profitable:

  DO I = 1, N
    T = T + A(I) + A(I-1)
    A(I) = T
  ENDDO

- After scalar expansion:

  T$(0) = T
  DO I = 1, N
  S1  T$(I) = T$(I-1) + A(I) + A(I-1)
  S2  A(I) = T$(I)
  ENDDO
  T = T$(N)
37. Scalar Expansion: Profitability
- Dependences due to reuse of a memory location vs. reuse of values:
  - Dependences due to reuse of values must be preserved.
  - Dependences due to reuse of a memory location can be deleted by expansion.
- Overhead:
  - Increased memory consumption
38. Scalar Renaming

  DO I = 1, 100
  S1  T = A(I) + B(I)
  S2  C(I) = T + T
  S3  T = D(I) - B(I)
  S4  A(I+1) = T * T
  ENDDO

- After renaming T:

  DO I = 1, 100
  S1  T1 = A(I) + B(I)
  S2  C(I) = T1 + T1
  S3  T2 = D(I) - B(I)
  S4  A(I+1) = T2 * T2
  ENDDO

- Vectorization (after expansion of T1 and T2):

  S3  T2(1:100) = D(1:100) - B(1:100)
  S4  A(2:101) = T2(1:100) * T2(1:100)
  S1  T1(1:100) = A(1:100) + B(1:100)
  S2  C(1:100) = T1(1:100) + T1(1:100)
      T = T2(100)
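The renamed-and-vectorized ordering can be checked with a small Python sketch (array contents are made up; list comprehensions stand in for vector statements):

```python
# Sketch of scalar renaming: the two value chains through T (S1/S2 and
# S3/S4) get separate names t1 and t2, after which the statement groups
# can be reordered and vectorized: S3, S4, then S1, S2.
def original(a, b, d):
    a = list(a); c = [0] * len(b)
    for i in range(len(b)):
        t = a[i] + b[i]       # S1
        c[i] = t + t          # S2
        t = d[i] - b[i]       # S3 reuses the *name* T, not the value
        a[i + 1] = t * t      # S4
    return a, c, t

def renamed_vectorized(a, b, d):
    n = len(b)
    a = list(a)
    t2 = [d[i] - b[i] for i in range(n)]            # S3: T2(1:N)
    a[1:n + 1] = [t2[i] * t2[i] for i in range(n)]  # S4: A(2:N+1)
    t1 = [a[i] + b[i] for i in range(n)]            # S1: T1(1:N)
    c = [t1[i] + t1[i] for i in range(n)]           # S2: C(1:N)
    return a, c, t2[n - 1]                          # live-out T = T2(N)

assert original([1, 0, 0, 0], [1, 2, 3], [4, 5, 6]) == \
       renamed_vectorized([1, 0, 0, 0], [1, 2, 3], [4, 5, 6])
```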
39. Array Renaming
- Original:

  DO I = 1, N
  S1  A(I) = A(I-1) + X
  S2  Y(I) = A(I) + Z
  S3  A(I) = B(I) + C
  ENDDO

- After renaming A in S1 and S2 to A$:

  DO I = 1, N
  S1  A$(I) = A(I-1) + X
  S2  Y(I) = A$(I) + Z
  S3  A(I) = B(I) + C
  ENDDO

- Vectorization:

  S3  A(1:N) = B(1:N) + C
  S1  A$(1:N) = A(0:N-1) + X
  S2  Y(1:N) = A$(1:N) + Z
40. Node Splitting

  DO I = 1, N
  S1  A(I) = X(I+1) + X(I)
  S2  X(I+1) = B(I) + 10
  ENDDO

- Renaming does not work, because the two dependences share one single access to X(I+1).
- Renaming would try to give both name spaces the original array name.
- Solution: create a copy of the node from which the critical anti-dependence emanates.
41. Node Splitting: Example
- Original:

  DO I = 1, N
  S1  A(I) = X(I+1) + X(I)
  S2  X(I+1) = B(I) + 10
  ENDDO

- After node splitting:

  DO I = 1, N
  S1' X$(I) = X(I+1)
  S1  A(I) = X$(I) + X(I)
  S2  X(I+1) = B(I) + 10
  ENDDO

- Vectorization:

  S1' X$(1:N) = X(2:N+1)
  S2  X(2:N+1) = B(1:N) + 10
  S1  A(1:N) = X$(1:N) + X(1:N)
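The split-and-vectorized ordering can be verified with a small Python sketch (array contents are made up; `xs` plays the role of X$):

```python
# Sketch of node splitting: copy X(I+1) into a fresh temporary xs so
# the anti-dependence from S1's read to S2's write no longer shares
# an access, letting every statement become a vector operation.
def original(x, b):
    x = list(x); a = [0] * len(b)
    for i in range(len(b)):
        a[i] = x[i + 1] + x[i]    # S1 reads X(I+1) before S2 overwrites it
        x[i + 1] = b[i] + 10      # S2
    return a, x

def split_vectorized(x, b):
    n = len(b)
    x = list(x)
    xs = x[1:n + 1]                               # S1': X$(1:N) = X(2:N+1)
    x[1:n + 1] = [b[i] + 10 for i in range(n)]    # S2:  X(2:N+1) = B(1:N) + 10
    a = [xs[i] + x[i] for i in range(n)]          # S1:  A(1:N) = X$(1:N) + X(1:N)
    return a, x

assert original([1, 2, 3, 4], [10, 20, 30]) == split_vectorized([1, 2, 3, 4], [10, 20, 30])
```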
42. Today
- Coarse-grained parallelization
- Vectorization
- Locality
  - Loop transformations that may improve locality
    - Loop interchange
    - Loop blocking/tiling
  - Loop transformations that enable further optimizations
    - Loop fusion
    - Loop skewing
43. Motivation
- Cache is important to a computer system:
  - A fast buffer between CPU and memory
  - Stores the most recently/most frequently accessed data
  - Affects system performance, cost, and energy consumption
- Locality
  - Cache effectiveness depends on the program's reuse pattern.
  - Temporal locality
    - LRU replacement policy
  - Spatial locality
    - Cache blocks (lines)
44. CPU-Memory Speed Gap
- From Patterson & Hennessy, Computer Architecture, Morgan Kaufmann Publishers.
45. Motivation Example: Spatial Reuse

  DO I = 1, N
    DO J = 1, M
  S   A(I, J) = A(I, J) + B(I, J)
    ENDDO
  ENDDO

- Array storage
  - Fortran style: column-major
- Access pattern
  - J-loop: iterates over a row of A(I,J) with I fixed
  - I-loop: iterates over different rows
  - Potential spatial reuse
- Cache misses
  - Could be N*M for A(I,J) if M is large enough
46. Motivation Example: Spatial Reuse

  DO J = 1, M
    DO I = 1, N
  S   A(I, J) = A(I, J) + B(I, J)
    ENDDO
  ENDDO

- Interchanging the I-loop and the J-loop
- Access pattern
  - I-loop: iterates over a column of A(I,J) with J fixed
  - Spatial locality is exploited: N/b misses per column, given a cache line length of b words
- Cache misses
  - Always N*M/b for A(I,J), assuming perfect alignment
  - Similar result for B(I,J)
47. Motivation Example: Temporal Reuse

  DO I = 1, N
    DO J = 1, M
  S   A(I) = A(I) + B(J)
    ENDDO
  ENDDO

- Assume block size b = 1.
- Access pattern
  - A(I) is reused across the J-loop iterations: N misses
  - B(J) reuse is limited by the cache size
    - When M is large: N*M misses under an LRU replacement policy
- Strip-mine-and-interchange
  - Divide the large array into small sections (strips of size S)
  - B(jj): M misses
  - A(I): N*M/S misses
48. Profitability of Loop Interchange
- Choosing the right loop to put innermost is critical.
- Spatial reuse: consecutive loop iterations access adjacent memory locations.
- Temporal reuse: consecutive loop iterations access the same set of memory locations.
- It is not always clear-cut:

  DO I = 1, N
    DO J = 1, M
  S   D(I) = D(I) + B(I, J)   ! assume column-major storage
    ENDDO
  ENDDO

- Cache misses: N*M for B, N/b for D
- Misses after interchange: N*M/b for B, N*M/b for D
- When should we interchange?
  - N*M + N/b - 2*N*M/b > 0, i.e. M*(b-2) + 1 > 0
49. A Heuristic Approach
- Carr, McKinley, Tseng, "Compiler Optimizations for Improving Data Locality", ASPLOS '94.
- For each loop l, attach a cost to each reference in the loop nest as if l were the innermost loop.
- Rank the loops using the attached loop cost.
- Reorder the loops from lowest cost to highest.
- Place the loop with the lowest cost in the innermost position, if the direction matrix shows it can legally be placed there.
50. Cost Assignment
- Attach a cost to each reference as if the loop being considered were the innermost loop:
  - Cost = 1 if the reference does not depend on the loop induction variable
  - Cost = N if the reference is non-consecutive (the induction variable strides over a non-contiguous dimension)
  - Cost = N*s/b if the reference is consecutive in small steps of size s (the induction variable strides over a contiguous dimension)
- Multiply the cost by the trip count of each outer loop.
- Intuitively, the cost approximates the total number of cache lines accessed, i.e. the total number of cache misses.

  DO I = 1, N      ! A(J): cost 1 (invariant in I)
    ... A(J) ...
  ENDDO
  DO J = 1, N      ! A(I,J): cost N (non-consecutive, column-major)
    ... A(I,J) ...
  ENDDO
  DO I = 1, N, s   ! A(I,J): cost N*s/b (consecutive with stride s)
    ... A(I,J) ...
  ENDDO
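The cost model can be sketched in Python and applied to the D(I) = D(I) + B(I,J) example from the previous slide (the cache-line size b = 8 and N = M = 100 are made-up parameters):

```python
# Sketch of the Carr/McKinley/Tseng-style cost assignment: cost each
# reference as if the candidate loop were innermost, then multiply by
# the trip count of the remaining outer loop.
def ref_cost(kind, trip, b=8, s=1):
    if kind == "invariant":        # no use of the induction variable
        return 1
    if kind == "nonconsecutive":   # strides over a non-contiguous dimension
        return trip
    return trip * s / b            # "consecutive": one miss per cache line

N = M = 100
# Candidate innermost = I: D(I) and B(I,J) are both stride-1 (column-major).
cost_I = (ref_cost("consecutive", N) + ref_cost("consecutive", N)) * M
# Candidate innermost = J: D(I) is invariant, B(I,J) strides across rows.
cost_J = (ref_cost("invariant", M) + ref_cost("nonconsecutive", M)) * N
assert cost_I < cost_J   # so the heuristic places the I loop innermost
```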
51. Example: Matrix Multiplication

  DO I = 1, N
    DO J = 1, N
      DO K = 1, N
  S     C(I, J) = C(I, J) + A(I, K) * B(K, J)
      ENDDO
    ENDDO
  ENDDO

- Ideal loop order: I innermost, J outermost.
- Direction matrix: (=, =, <) — the only dependence (on C) is carried by the K loop, so any loop permutation is legal.
52. Example: Matrix Multiplication

  DO I = 1, N
    DO J = 1, N
      DO K = 1, N
  S     C(I, J) = C(I, J) + A(I, K) * B(K, J)
      ENDDO
    ENDDO
  ENDDO

- After reordering:

  DO J = 1, N
    DO K = 1, N
      DO I = 1, N
  S     C(I, J) = C(I, J) + A(I, K) * B(K, J)
      ENDDO
    ENDDO
  ENDDO
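The legality of the reordering can be demonstrated with a Python sketch (Python lists don't model column-major layout, so this only shows that both orders compute the same result; the locality benefit applies to Fortran's storage order):

```python
# Sketch: the direction matrix permits any permutation of the matmul
# loops, so IJK and JKI orders compute identical results; JKI puts the
# stride-1 I loop innermost for column-major arrays.
def matmul_ijk(A, B, n):
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_jki(A, B, n):
    C = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            for i in range(n):        # innermost: C(I,J), A(I,K) stride-1
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert matmul_ijk(A, B, 2) == matmul_jki(A, B, 2)
```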
53. Loop Fusion
- Takes multiple compatible loop nests and combines their bodies into one loop nest.
- Legal if no data dependences are reversed.
- Improves locality directly by merging accesses to the same cache line into one loop iteration.
- Also enables further loop interchange: it generates perfect loop nests.

  DO I = 2, N
    DO K = 1, N
      X(I, K) = X(I, K) - X(I-1, K) * A(I, K) / B(I-1, K)
    ENDDO
    DO K = 1, N
      B(I, K) = B(I, K) - A(I, K) / B(I-1, K)
    ENDDO
  ENDDO
54. Loop Fusion
- After loop fusion:

  DO I = 2, N
    DO K = 1, N
      X(I, K) = X(I, K) - X(I-1, K) * A(I, K) / B(I-1, K)
      B(I, K) = B(I, K) - A(I, K) / B(I-1, K)
    ENDDO
  ENDDO

- After fusion + interchange:

  DO K = 1, N
    DO I = 2, N
      X(I, K) = X(I, K) - X(I-1, K) * A(I, K) / B(I-1, K)
      B(I, K) = B(I, K) - A(I, K) / B(I-1, K)
    ENDDO
  ENDDO
55. Loop Blocking
- Example revisited:

  DO J = 1, M
    DO I = 1, N
  S   D(I) = D(I) + B(I, J)
    ENDDO
  ENDDO

- Spatial locality of B(I,J) is exploited.
- How about D(I)?
  - Long-term reuse, separated by N I-loop iterations
- What if we reduce the number of intervening iterations?
56. Strip-Mine-and-Interchange

  DO J = 1, M
    DO I = 1, N
  S   D(I) = D(I) + B(I, J)
    ENDDO
  ENDDO

- Iterate on smaller strips of the I-dimension.
- Misses drop from 2*N*M/b to N/b + N*M/b = (1 + 1/M) * N*M/b.
57. Loop Blocking
- Splitting a loop into smaller strips is always legal.
- Interchanging the by-strip loop to the outside of some containing loop is not always legal.
  - Condition: after interchange, no direction vector has ">" as its leftmost non-"=" direction.
  - This can be overly conservative.
- Blocking is profitable if there is reuse between iterations of a loop that is not the innermost loop.

  DO I = 1, N, S
    DO J = 1, M
      DO ii = I, MIN(I+S-1, N)
  S     . . .
      ENDDO
    ENDDO
  ENDDO
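A minimal Python sketch of strip-mine-and-interchange on the running example (strip size S = 2 and the data are made up; the point is only that the blocked schedule computes the same result):

```python
# Sketch: split the I dimension into strips of size S and move the
# by-strip loop outside the J loop, so D(I) stays cached within a strip.
def original(d, b, n, m):
    d = list(d)
    for j in range(m):
        for i in range(n):
            d[i] += b[i][j]
    return d

def strip_mined(d, b, n, m, S=2):
    d = list(d)
    for i0 in range(0, n, S):              # by-strip loop, now outermost
        for j in range(m):
            for i in range(i0, min(i0 + S, n)):
                d[i] += b[i][j]
    return d

b = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
assert original([0, 0, 0], b, 3, 3) == strip_mined([0, 0, 0], b, 3, 3)
```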
58. References
- Padua and Wolfe, "Advanced Compiler Optimizations for Supercomputers", Communications of the ACM, 1986 (a survey of parallelization transformations)
- Allen and Kennedy, "Automatic Loop Interchange", SIGPLAN Symposium on Compiler Construction, 1984 (loop interchange)
- Kennedy and McKinley, "Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution", LCPC 1993 (loop distribution and fusion)
- Carr, McKinley, Tseng, "Compiler Optimizations for Improving Data Locality", ASPLOS, 1994 (loop interchange and loop fusion for locality)
- Wolf and Lam, "A Data Locality Optimizing Algorithm", PLDI, 1991 (sophisticated transformations for locality)
- Allen and Kennedy, Optimizing Compilers for Modern Architectures, Ch. 5, 6, 9
59. Program Dependence Graph
- For a procedure P:
  - Nodes: program statements of P
    - For each variable v used before being defined in P, there is an added node initial(v).
    - For each variable v named in P's end statement, there is a final(v) node.
    - An additional distinguished entry vertex
  - Edges: program dependences
    - Data (output, flow, anti)
    - Control
- J. Ferrante, K. Ottenstein, J. Warren, "The Program Dependence Graph and its Use in Optimization", ACM Transactions on Programming Languages and Systems, Vol. 9, No. 3, July 1987, pp. 319-349.
60. Example PDG: Output Dependences

  main()
    sum = 0
    i = 1
    while i < 11 do
      sum = sum + i
      i = i + 1
    od
  end(sum, i)

[PDG figure: nodes ENTRY, sum = 0, i = 1, while i < 11, sum = sum + i, i = i + 1, final(sum), final(i); output-dependence edges highlighted]

61. Example PDG: Loop-Independent Flow
[Same PDG; loop-independent flow edges highlighted]

62. Example PDG: Loop-Carried Flow
[Same PDG; loop-carried flow edges highlighted]

63. Example PDG: All Data Dependences
[Same PDG; edge kinds: loop-carried flow, loop-independent flow, output]
64. Control Dependence (δc)
- V1 δc V2: node V2 is control dependent on V1 if, during execution, whenever V1 evaluates to c (where c is either true or false), V2 must eventually execute.
- If V2 is control dependent on V1, then V1 must have two exits:
  - On exit 1, V2 must execute.
  - On exit 2, there is a path on which V2 will not execute.
65. Control Dependence
- V1 δc V2: V2 is control dependent on V1 if
  - there exists a path from V1 to V2 such that V2 post-dominates every vertex p on that path (p ≠ V1, V2), and
  - V2 does not strictly post-dominate V1.
- p PDOM v if every path from v to the exit node includes p.

[CFG and post-dominator tree figure with nodes A, B, C, D, exit; A δc B and A δc C]

66. Control Dependence
- Same definition, second example.

[Another CFG and post-dominator tree over nodes A, B, C, D; again A δc B and A δc C]
67. Example PDG: Control Dependences
[PDG control-dependence figure for the sum example: ENTRY has true-labeled edges to sum = 0, i = 1, while i < 11, final(sum), final(i); the while predicate has true-labeled edges to sum = sum + i and i = i + 1]

68. Complete PDG for the Example
[Same PDG with all edge kinds shown: control, loop-carried flow, loop-independent flow, output]
69. Using PDGs
- Constant propagation and folding
  - Via graph walking
- Code motion
- Slicing
- Basis for system-level analysis
70. Using the PDG for Code Motion

  do i = 1, 100
    k = i * (n*2)
    do j = i, 100
      a(i,j) = 100*n + 10*k + j
    end
  end

[CFG of the lowered code: i = 1; test i < 100; body computes t1 = n*2, k = i*t1, j = i; inner test j < 100; inner body computes t2 = 100*n, t3 = 10*k, t4 = t2 + t3, t5 = t4 + j, j = j + 1; i = i + 1 on inner-loop exit]
71. PDG

  i = 1
  while i < 100 do
    k = i * (n*2)
    j = i
    while j < 100 do
      a(i,j) = 100*n + 10*k + j
      j = j + 1
    end
    i = i + 1
  end

[PDG figure: ENTRY controls i = 1, the outer while, final(i), final(j); the outer while controls t1 = n*2, k = i*t1, j = i, the inner while, i = i+1; the inner while controls t2 = 100*n, t3 = 10*k, t4 = t2+t3, t5 = t4+j, j = j+1; edge kinds: control, loop-carried flow, loop-independent flow, output]

72. PDG
- Goal: move the invariant statements (t1 = n*2, t2 = 100*n) out of the loops.

73. PDG
- Loop-carried flow edges: fine.

74. PDG
- Output dependences: fine.

75. PDG
- Loop-independent flow edges (dotted lines): unaffected.

76. PDG After the Move
[Same PDG, but t1 = n*2 and t2 = 100*n are now control dependent only on ENTRY, i.e. hoisted out of both loops]
- Loop-independent flow: fine.

77. New PDG
- Control edges: some issues. We need a conditional to handle the null (zero-trip) case (which actually isn't an issue here, but ...).
78. Next Lecture
- Topic
  - Interprocedural analysis and optimization
- References
  - Dragon Book, Ch. 12
79. Computing Control Dependence from a CFG
- Add a slicing edge (entry → exit).
- Choose S = the set of edges (A, B) where B does not post-dominate A.
- For each edge (A, B) in S: traverse the post-dominator tree from B until reaching A's parent; all nodes visited (before A's parent) are control dependent on A.
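The three steps can be sketched in Python. The tiny diamond CFG and the naive iterative post-dominator solver are made up for illustration (a production compiler would use a faster algorithm); the entry→exit slicing edge is omitted because the example's entry is an ordinary branch node, and every non-exit node is assumed to reach the exit.

```python
# Sketch of control-dependence computation: solve post-dominators by
# iteration, then for each CFG edge (a, b) where b does not
# post-dominate a, walk the post-dominator tree from b up to a's
# parent, marking every visited node as control dependent on a.
def postdominators(succ, exit_node):
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom

def control_dependences(succ, exit_node):
    pdom = postdominators(succ, exit_node)

    def ipdom(n):                  # parent of n in the post-dominator tree
        strict = pdom[n] - {n}
        for c in strict:
            if all(o in pdom[c] for o in strict):
                return c
        return None

    cd = {}
    for a in succ:
        for b in succ[a]:
            if b not in pdom[a]:   # b does not post-dominate a
                stop = ipdom(a)    # walk the tree from b up to a's parent
                x = b
                while x is not None and x != stop:
                    cd.setdefault(x, set()).add(a)
                    x = ipdom(x)
    return cd

# if/else diamond: 1 branches to 2 and 3, which merge at 4.
cfg = {1: [2, 3], 2: [4], 3: [4], 4: ["exit"], "exit": []}
assert control_dependences(cfg, "exit") == {2: {1}, 3: {1}}
```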
80. Example 1
[CFG figure: entry → start → 1, with conditional branches at nodes 1, 2, and 3, merging through nodes 5 and 6 into 7 → exit; true branches labeled (the other branch is false). Shown with its post-dominator tree.]

81. Computing Control Dependence from a CFG
- Add a slicing edge (entry → exit).
- Choose S = the set of edges (A, B) where B does not post-dominate A.
- For each edge (A, B) in S: traverse the post-dominator tree from B until reaching A's parent; all nodes visited (before A's parent) are control dependent on A.

82. Example 1
- S = the set of edges (A, B) where B does not post-dominate A:
  S = {(entry, start), (1, 2), (1, 3), (2, 4), (2, 5), (3, 5)}
[CFG and post-dominator tree figure]

83. Determining Control Dependence
- Apply the algorithm above edge by edge to S.

84. Example 1 (post-dominator tree)
- S = {(entry, start), (1, 2), (1, 3), (2, 4), (2, 5), (3, 5)}
[Post-dominator tree figure: exit at the root, with 7, 6, and the remaining nodes beneath]

85. Example 1 (control dependences)
- S = {(entry, start), (1, 2), (1, 3), (2, 4), (2, 5), (3, 5)}
[Resulting control-dependence graph, with edges labeled T or F]

86. Region Nodes
- Used to summarize conditions shared by several control-dependent nodes.
[Figure: the control-dependence graph of Example 1 rewritten with region nodes R1-R6 factoring out common (predicate, branch) conditions]

87. Example 1
[CFG figure repeated]

88. Example 2
[CFG figure: nodes entry and A through H, true branches labeled; shown with its post-dominator tree]

89. Example 2 (continued)
- S = {(entry, A), (B, C), (C, D), (C, E)}
- The control dependences follow by walking the post-dominator tree for each edge in S.
[Control-dependence graph figure]