Title: Dependence Modeling
1. Dependence Modeling & Parallelization, Part 2
2. Review: Dependence Model
- Dependence
  - Flow (true), anti, output
- Loop dependence
  - Distance vector
  - Direction vector
- Reordering transformation
  - Should preserve every dependence in the program
  - The leftmost non-"=" element of the direction vector must be "<": the source occurs before the sink
3. Today
- Coarse-grained parallelization
  - Loop interchange
  - Privatization, loop alignment, loop distribution, loop fusion
- Vectorization
- Locality
- Program Dependence Graphs
4. Loop Parallelization
- It is always valid to convert a sequential loop to a parallel loop if the loop carries no dependence: assign each iteration to a different processor.
- Loop-carried dependences are excluded.
- Loop-independent dependences are fine: parallelizing a loop does not change the order of statements within a loop body.
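The claim above can be simulated in a minimal Python sketch (data values are made up): when a loop carries no dependence, any iteration schedule produces the same result, so each iteration can be handed to a different processor.

```python
# Sketch: a loop with no loop-carried dependence produces the same
# result under any iteration schedule, so each iteration can safely
# be assigned to a different processor. Data values are made up.
def sequential(a, b):
    out = [0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]        # no iteration reads another's output
    return out

def scheduled(a, b, order):
    out = [0] * len(a)
    for i in order:                 # any permutation of the iterations
        out[i] = a[i] + b[i]
    return out

a, b = [1, 2, 3, 4], [10, 20, 30, 40]
assert sequential(a, b) == scheduled(a, b, [3, 1, 0, 2])
```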
5. Coarse-Grained Parallelism
- Target machine: symmetric multiprocessor
  - Multiple processors with a shared memory
- Parallelism is employed by creating and executing a process on each processor (PARALLEL DO)
  - Expensive overhead: process initiation and synchronization (barrier)
- Parallelism is the concern for high performance
  - Find and package parallelism with a granularity large enough to compensate for the overhead
  - Delicate trade-off between overhead minimization and load balancing
- Simply stated goal: identify independent loop iterations to execute on different processors
6. Loop Transformations
- To promote coarse-grained parallelism
- Perfect loop-nest methods
- Loop interchange
- Single-loop methods
- Privatization
- Loop alignment
- Loop distribution
- Loop fusion
7. Loop-Carried Dependences
- The level of a loop-carried dependence is the index of the leftmost non-"=" entry of the direction vector D(i,j) for the dependence.
- The outermost loop is critical to the execution order.
- Direction vector (=, =, <)
  - Level 3
  - Denotation of this dependence: S δ3 S

  DO I = 1, N
    DO J = 1, M
      DO K = 1, L
  S     A(I, J, K+1) = A(I, J, K) + C
      ENDDO
    ENDDO
  ENDDO

- The dependence is carried only by the innermost (K) loop, so the I and J loops can run across N x M processors.
8. Example
- D(i,j,k) = (=, <, <)

  DO I = 1, 10
    DO J = 1, 10
      DO K = 1, 10
  S     A(I, J+2, K+3) = A(I, J, K) + C
      ENDDO
    ENDDO
  ENDDO

- The dependence is carried by the J loop; the I and K loops can each run across 10 processors.
9. Loop Interchange: Example
- After interchanging the I-loop and the J-loop:

  PARALLEL DO J = 1, N
    DO I = 1, N
      A(I+1, J) = A(I, J) + B(I, J)
    ENDDO
  END PARALLEL DO

- DV(J,I) = (=, <)
- Only one barrier is needed.
10. Loop Interchange: Profitability
- It is not always possible to move a parallel loop outward and have it remain free of dependence.
- Example:

  DO J = 1, N
    DO I = 1, N
      A(I+1, J+1) = A(I, J) + B(I, J)
    ENDDO
  ENDDO

- DV(J,I) = (<, <)
- The best we can do is parallelize the inner loop.
11. Loop Interchange: Safety
- Loop interchange: switching the nesting order of two loops in a perfect nest
  - A reordering transformation
- Not all loop interchanges are legal:

  DO J = 1, M
    DO I = 1, N
      A(I, J+1) = A(I+1, J) + B
    ENDDO
  ENDDO

- DV(J,I) = (<, >)
- Theorem: let D be a direction vector for a dependence in a perfect loop nest. The direction vector of the same dependence after any loop permutation is determined by applying the same permutation to the elements of D.
12. Direction Matrix
- Loop interchange might affect all dependences involved in a loop nest.
- We need a way to describe the original and updated groups of dependences.
- We define the direction matrix for a nest of loops:
  - A matrix in which each row is a direction vector for some dependence of the nest
  - Every dependence in the loop has its direction vector represented by a row in the matrix

  DO I = 1, N
    DO J = 1, M
      DO K = 1, L
        A(I+1, J+1, K) = A(I, J, K) + A(I, J+1, K+1)
      ENDDO
    ENDDO
  ENDDO

- Here the rows are (<, <, =) and (<, =, >), one for each dependence on A.
13. Loop Interchange: Safety
- The direction matrix after loop interchange can be computed by applying the same permutation of loops to its columns.
- Theorem: a permutation of the loops in a perfect nest is legal if and only if the updated direction matrix has no ">" direction as the leftmost non-"=" direction in any row.
14. Loop Interchange: Profitability
- Theorem 6.3: In a perfect nest of loops, a particular loop can be parallelized at the outermost level if and only if the column of the direction matrix for that loop contains only "=" entries.
15. Loop Selection and Interchange

  DO I = 1, N
    PARALLEL DO J = 1, M
      DO K = 1, L
  S1     A(I+1, J, K)     = A(I, J, K) + X1
  S2     B(I, J, K+1)     = B(I, J, K) + X2
  S3     C(I+1, J+1, K+1) = C(I, J, K) + X3
      ENDDO
    END PARALLEL DO
  ENDDO

- The general case of loop selection is NP-complete.
16. Privatization

  DO I = 1, N
  S1  T = A(I)
  S2  A(I) = B(I)
  S3  B(I) = T
  ENDDO

- Privatization
  - Determine that T is privatizable: it is assigned within the loop, and each value assigned is used only in the same iteration in which it is assigned.
  - Replicate T across different iterations.
- After privatization:

  PARALLEL DO I = 1, N PRIVATE(t)
  S1  t = A(I)
  S2  A(I) = B(I)
  S3  B(I) = t
  END PARALLEL DO
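A minimal Python sketch of the transformation (the reversed iteration order stands in for a parallel schedule; array values are made up):

```python
# Sketch of privatization: the scalar T never carries a value between
# iterations, so each (conceptual) processor can use its own copy.
def loop_with_shared_t(a, b):
    a, b = list(a), list(b)
    for i in range(len(a)):
        t = a[i]          # S1: T is written...
        a[i] = b[i]       # S2
        b[i] = t          # S3: ...and read only in the same iteration
    return a, b

def loop_privatized(a, b):
    a2, b2 = list(a), list(b)
    for i in reversed(range(len(a2))):   # any iteration order is now legal
        t = a2[i]                        # private copy of T per iteration
        a2[i] = b2[i]
        b2[i] = t
    return a2, b2

assert loop_with_shared_t([1, 2], [3, 4]) == loop_privatized([1, 2], [3, 4])
```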
17. Loop Alignment
- Example:

  DO I = 2, N
  S1  A(I) = B(I) + C(I)
  S2  D(I) = A(I-1) * 2.0
  ENDDO

- Before alignment: S1 δ1 S2 (loop-carried)
- Goal: turn it into S1 δ∞ S2 (loop-independent)
18. Loop Alignment
- Loop alignment works by
  - increasing the number of iterations, and
  - executing statements on slightly different subsets of those iterations,
- changing a loop-carried dependence into a loop-independent dependence:

  DO I = 1, N
    IF (I .GT. 1) A(I) = B(I) + C(I)
    IF (I .LT. N) D(I+1) = A(I) * 2.0
  ENDDO
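A minimal Python sketch of the alignment above (1-based arrays are emulated with a padding slot at index 0; data values are made up):

```python
# Sketch of loop alignment: S2 is shifted one iteration later via
# guards, so its read of A happens in the same iteration as S1's write.
def original(b, c, n):
    a = [0.0] * (n + 1)
    d = [0.0] * (n + 1)
    for i in range(2, n + 1):
        a[i] = b[i] + c[i]          # S1
        d[i] = a[i - 1] * 2.0       # S2 reads A from the previous iteration
    return d

def aligned(b, c, n):
    a = [0.0] * (n + 1)
    d = [0.0] * (n + 1)
    for i in range(1, n + 1):
        if i > 1:
            a[i] = b[i] + c[i]      # S1 still runs for I = 2..N
        if i < n:
            d[i + 1] = a[i] * 2.0   # S2 now reads the A written this iteration
    return d

b = [0, 1, 2, 3, 4, 5]
c = [0, 5, 4, 3, 2, 1]
assert original(b, c, 5) == aligned(b, c, 5)
```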
19. Loop Distribution
- Break one loop into multiple smaller loops.
- May not be desirable for coarse-grained parallelism:
  - Barriers are needed between the distributed loops.
- May facilitate other transformations.
20. Loop Distribution
- Example:

  DO I = 1, N
  S1  A(I) = B(I) + 1
  S2  C(I) = A(I) + C(I-1)
  S3  D(I) = A(I) * X
  ENDDO

- Only one loop-carried dependence (the recurrence on C in S2).
21. Loop Fusion
- After distribution, the parallel loops L1 and L3 can be fused; the recurrence L2 stays sequential:

  L1+L3: PARALLEL DO I = 1, N
           A(I) = B(I) + 1
           D(I) = A(I) * X
         END PARALLEL DO
  L2:    DO I = 1, N
           C(I) = A(I) + C(I-1)
         ENDDO

- Merge small loops into a single loop with a larger loop body.
- The parallelization overhead might then be compensated.
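The distribute-then-fuse result can be sketched in Python (x is a made-up scalar; c0 is the incoming value of C(0)):

```python
# Sketch: distribute the original loop, then fuse the two parallel
# statements (S1 and S3); the recurrence (S2) stays in its own
# sequential loop.
def original(b, c0, x, n):
    a = [0.0] * n; c = [0.0] * n; d = [0.0] * n
    prev = c0
    for i in range(n):
        a[i] = b[i] + 1             # S1
        c[i] = a[i] + prev          # S2: recurrence on C
        prev = c[i]
        d[i] = a[i] * x             # S3
    return a, c, d

def distributed_and_fused(b, c0, x, n):
    a = [0.0] * n; d = [0.0] * n
    for i in range(n):              # fused parallel loop (S1 + S3)
        a[i] = b[i] + 1
        d[i] = a[i] * x
    c = [0.0] * n
    prev = c0
    for i in range(n):              # sequential loop carrying C(I-1)
        c[i] = a[i] + prev
        prev = c[i]
    return a, c, d

assert original([1, 2, 3], 0.0, 2.0, 3) == distributed_and_fused([1, 2, 3], 0.0, 2.0, 3)
```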
22. Today
- Coarse-grained parallelization
- Vectorization
  - Transformations to increase parallelism
    - Loop interchange
  - Transformations to break recurrences
    - Scalar expansion, scalar/array renaming, node splitting
- Locality
23. Loop Vectorization
- Loops that carry no dependence can also be vectorized:

  DO I = 1, N
  S  A(I) = A(I) + C
  ENDDO

  becomes

  A(1:N) = A(1:N) + C

- How about loops that do carry some dependence?

  DO I = 1, N
  S1  A(I+1) = B(I) + C
  S2  D(I) = A(I) + E
  ENDDO
24. Loop Vectorization

  DO I = 1, N
  S1  A(I+1) = B(I) + C
  S2  D(I) = A(I) + E
  ENDDO

- S1 δ1 S2
- We can split (distribute) the loop into two loops; the loop-carried dependence is transformed into a loop-independent dependence S1 δ∞ S2.
25. More Examples

  DO I = 1, N
  S1  D(I) = A(I) + E
  S2  A(I+1) = B(I) + C
  ENDDO

- S2 δ1 S1
- First interchange S1 and S2, then vectorize the loop.

  DO I = 1, N
  S1  B(I) = A(I) + E
  S2  A(I+1) = B(I) + C
  ENDDO

- S2 δ1 S1 and S1 δ∞ S2
- Here the loop-independent dependence prevents us from interchanging S1 and S2, so the loop cannot be vectorized.
26. Loop Vectorization
- Loop distribution and vectorization cannot work if there is a cycle of dependences:

  DO I = 1, N
  S1  B(I+1) = A(I) + E
  S2  A(I+1) = B(I) + C
  ENDDO

- S2 δ1 S1 and S1 δ1 S2
- This leads us to the first vectorization algorithm.
27. Simple Vectorization Algorithm
- Idea:
  - Construct the dependence graph.
  - Reduce the strongly connected subgraphs to single nodes.
  - Topologically sort the reduced graph.
  - Generate code for each node in the sorted graph:
    - Single statement: vectorize.
    - Strongly connected subgraph: keep the sequential loop.

  DO I = 1, N
  S1  D(I) = A(I) + E
  S2  A(I+1) = B(I) + C
  ENDDO
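The four steps can be sketched in Python (the statement names and dependence edges are made up; Tarjan's algorithm stands in for whatever SCC routine a real vectorizer uses):

```python
# Simple vectorization sketch: build the statement dependence graph,
# collapse strongly connected components (Tarjan), emit components in
# topological order; singleton components without self-loops vectorize,
# everything else stays a sequential loop.
from collections import defaultdict

def sccs(nodes, edges):
    # Tarjan's algorithm; components come out in reverse topological order.
    index, low, stack, on_stack, comps = {}, {}, [], set(), []
    counter = [0]
    def visit(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in edges[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = []
            while True:
                w = stack.pop(); on_stack.discard(w); comp.append(w)
                if w == v:
                    break
            comps.append(comp)
    for v in nodes:
        if v not in index:
            visit(v)
    return comps

def simple_vectorize(nodes, edges):
    plan = []
    for comp in reversed(sccs(nodes, edges)):        # topological order
        members = sorted(comp)
        if len(members) == 1 and members[0] not in edges[members[0]]:
            plan.append(("vector", members))          # single statement
        else:
            plan.append(("sequential loop", members)) # dependence cycle
    return plan

# The two-statement loop above: the only edge is S1 -> S2.
deps = defaultdict(list, {"S1": ["S2"]})
assert simple_vectorize(["S1", "S2"], deps) == [
    ("vector", ["S1"]), ("vector", ["S2"])]
```

With a cyclic graph (S1 -> S2 -> S1, as on the previous slide) the single component comes back marked "sequential loop".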
28. Problems with Simple Vectorization
- Some opportunities are missed.
- S is contained in a dependence cycle (on itself), so the simple vectorization algorithm cannot parallelize the loop.
- Direction vector: (<, =)
- By Theorem 2.4, as long as we keep the I-loop sequential, the dependence is preserved:

  DO I = 1, N
    DO J = 1, M
  S   A(I+1, J) = A(I, J) + B
    ENDDO
  ENDDO

  DO I = 1, N
    A(I+1, 1:M) = A(I, 1:M) + B
  ENDDO
29. Advanced Vectorization Algorithm
- Idea:
  - Construct the dependence graph.
  - Process the loop nest level by level, starting from the outermost loop.
  - For each level:
    - Find the strongly connected subgraphs and perform the topological sort.
    - Vectorize the single statements of the sorted graph.
    - Otherwise, generate a sequential loop for the current level, update the dependence graph, and recursively process the next loop level.
30. Example

  DO I = 1, 100
  S1  X(I) = Y(I) + 10
    DO J = 1, 100
  S2    B(J) = A(J, N)
      DO K = 1, 100
  S3      A(J+1, K) = B(J) + C(J, K)
      ENDDO
  S4    Y(I+J) = A(J+1, N)
    ENDDO
  ENDDO

- After the advanced algorithm:

  DO I = 1, 100
    DO J = 1, 100
      B(J) = A(J, N)                      ! S2
      A(J+1, 1:100) = B(J) + C(J, 1:100)  ! S3 vectorized
    ENDDO
    Y(I+1:I+100) = A(2:101, N)            ! S4 vectorized
  ENDDO
  X(1:100) = Y(1:100) + 10                ! S1 vectorized
31. Vectorization: Other Issues
- Basic idea: find all the possible parallelism by loop distribution and statement reordering.
- This may not work if cyclic dependences exist, especially when they are carried by the inner loops.
- Move the cycle outwards:
  - Loop interchange
- Break the cycle:
  - Scalar expansion
  - Scalar/array renaming
  - Node splitting
32. Example

  DO I = 1, N
    DO J = 1, M
  S   A(I, J+1) = A(I, J) + B
    ENDDO
  ENDDO

- Direction vector (I,J) = (=, <)
- The dependence is carried by the inner loop.
- Codegen() would generate two sequential loops.
- Is this loop nest really not vectorizable?
33. Loop Interchange & Vectorization
- Motivation
  - If we deal with loops containing cyclic dependences early on in the loop nest, we can potentially vectorize more loops.
  - Shift loops that carry no dependence inward.
- Theorem: In a perfect loop nest, loops that carry no dependence can legally be shifted inward, and they will not carry any dependence in their new position.
- Note: we are moving the dependences in the opposite direction than we did for parallelization.
34. Loop Interchange & Vectorization

  DO I = 1, N
    DO J = 1, M
  S   A(I, J+1) = A(I, J) + B
    ENDDO
  ENDDO

- Direction vector (I,J) = (=, <)
- After interchanging the I-loop and the J-loop:

  DO J = 1, M
    DO I = 1, N
  S   A(I, J+1) = A(I, J) + B
    ENDDO
  ENDDO

- Direction vector (J,I) = (<, =)
- Vectorization:

  DO J = 1, M
  S  A(1:N, J+1) = A(1:N, J) + B
  ENDDO
35. Scalar Expansion

  DO I = 1, N
  S1  T = A(I)
  S2  A(I) = B(I)
  S3  B(I) = T
  ENDDO

- Scalar expansion replaces T with a compiler-generated temporary array T$ that has a location for each loop iteration:

  DO I = 1, N
  S1  T$(I) = A(I)
  S2  A(I) = B(I)
  S3  B(I) = T$(I)
  ENDDO
  T = T$(N)

- Vectorization:

  S1  T$(1:N) = A(1:N)
  S2  A(1:N) = B(1:N)
  S3  B(1:N) = T$(1:N)
      T = T$(N)
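A minimal Python sketch of the expanded-and-vectorized form (whole-list copies stand in for vector operations; data values are made up):

```python
# Sketch of scalar expansion: T becomes a per-iteration temporary
# array t_, breaking the anti- and output dependences on T so each
# statement can run as a whole-array (vector) operation.
def original(a, b):
    a, b = list(a), list(b)
    t = 0
    for i in range(len(a)):
        t = a[i]          # S1
        a[i] = b[i]       # S2
        b[i] = t          # S3
    return a, b, t

def expanded_vectorized(a, b):
    n = len(a)
    t_ = a[:]                      # S1 vectorized: T$(1:N) = A(1:N)
    a2 = b[:]                      # S2 vectorized: A(1:N) = B(1:N)
    b2 = t_[:]                     # S3 vectorized: B(1:N) = T$(1:N)
    return a2, b2, t_[n - 1]       # T = T$(N) restores the live-out scalar

assert original([1, 2, 3], [4, 5, 6]) == expanded_vectorized([1, 2, 3], [4, 5, 6])
```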
36. Scalar Expansion
- Always safe to apply.
- May need SSA form to decide how to rewrite the code.
- Not always profitable:

  DO I = 1, N
    T = T + A(I) + A(I-1)
    A(I) = T
  ENDDO

- After scalar expansion:

  T$(0) = T
  DO I = 1, N
  S1  T$(I) = T$(I-1) + A(I) + A(I-1)
  S2  A(I) = T$(I)
  ENDDO
  T = T$(N)
37. Scalar Expansion: Profitability
- Dependences due to reuse of a memory location vs. reuse of values:
  - Dependences due to reuse of values must be preserved.
  - Dependences due to reuse of a memory location can be deleted by expansion.
- Overhead:
  - Increased memory consumption
38. Scalar Renaming

  DO I = 1, 100
  S1  T = A(I) + B(I)
  S2  C(I) = T + T
  S3  T = D(I) - B(I)
  S4  A(I+1) = T * T
  ENDDO

- After renaming T:

  DO I = 1, 100
  S1  T1 = A(I) + B(I)
  S2  C(I) = T1 + T1
  S3  T2 = D(I) - B(I)
  S4  A(I+1) = T2 * T2
  ENDDO

- Vectorization (after expansion of T1 and T2):

  S3  T2(1:100) = D(1:100) - B(1:100)
  S4  A(2:101) = T2(1:100) * T2(1:100)
  S1  T1(1:100) = A(1:100) + B(1:100)
  S2  C(1:100) = T1(1:100) + T1(1:100)
      T = T2(100)
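The renamed-and-vectorized ordering can be checked with a small Python sketch (array contents are made up; list comprehensions stand in for vector statements):

```python
# Sketch of scalar renaming: the two value chains through T (S1/S2 and
# S3/S4) get separate names t1 and t2, after which the statement groups
# can be reordered and vectorized: S3, S4, then S1, S2.
def original(a, b, d):
    a = list(a); c = [0] * len(b)
    for i in range(len(b)):
        t = a[i] + b[i]       # S1
        c[i] = t + t          # S2
        t = d[i] - b[i]       # S3 reuses the *name* T, not the value
        a[i + 1] = t * t      # S4
    return a, c, t

def renamed_vectorized(a, b, d):
    n = len(b)
    a = list(a)
    t2 = [d[i] - b[i] for i in range(n)]            # S3: T2(1:N)
    a[1:n + 1] = [t2[i] * t2[i] for i in range(n)]  # S4: A(2:N+1)
    t1 = [a[i] + b[i] for i in range(n)]            # S1: T1(1:N)
    c = [t1[i] + t1[i] for i in range(n)]           # S2: C(1:N)
    return a, c, t2[n - 1]                          # live-out T = T2(N)

assert original([1, 0, 0, 0], [1, 2, 3], [4, 5, 6]) == \
       renamed_vectorized([1, 0, 0, 0], [1, 2, 3], [4, 5, 6])
```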
39. Array Renaming
- Original:

  DO I = 1, N
  S1  A(I) = A(I-1) + X
  S2  Y(I) = A(I) + Z
  S3  A(I) = B(I) + C
  ENDDO

- After renaming A in S1 and S2 to A$:

  DO I = 1, N
  S1  A$(I) = A(I-1) + X
  S2  Y(I) = A$(I) + Z
  S3  A(I) = B(I) + C
  ENDDO

- Vectorization:

  S3  A(1:N) = B(1:N) + C
  S1  A$(1:N) = A(0:N-1) + X
  S2  Y(1:N) = A$(1:N) + Z
40. Node Splitting

  DO I = 1, N
  S1  A(I) = X(I+1) + X(I)
  S2  X(I+1) = B(I) + 10
  ENDDO

- Renaming does not work, because the two dependences share one single access to X(I+1).
- Renaming would try to give both name spaces the original array name.
- Solution: create a copy of the node from which the critical anti-dependence emanates.
41. Node Splitting: Example
- Original:

  DO I = 1, N
  S1  A(I) = X(I+1) + X(I)
  S2  X(I+1) = B(I) + 10
  ENDDO

- After node splitting:

  DO I = 1, N
  S1' X$(I) = X(I+1)
  S1  A(I) = X$(I) + X(I)
  S2  X(I+1) = B(I) + 10
  ENDDO

- Vectorization:

  S1' X$(1:N) = X(2:N+1)
  S2  X(2:N+1) = B(1:N) + 10
  S1  A(1:N) = X$(1:N) + X(1:N)
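The split-and-vectorized ordering can be verified with a small Python sketch (array contents are made up; `xs` plays the role of X$):

```python
# Sketch of node splitting: copy X(I+1) into a fresh temporary xs so
# the anti-dependence from S1's read to S2's write no longer shares
# an access, letting every statement become a vector operation.
def original(x, b):
    x = list(x); a = [0] * len(b)
    for i in range(len(b)):
        a[i] = x[i + 1] + x[i]    # S1 reads X(I+1) before S2 overwrites it
        x[i + 1] = b[i] + 10      # S2
    return a, x

def split_vectorized(x, b):
    n = len(b)
    x = list(x)
    xs = x[1:n + 1]                               # S1': X$(1:N) = X(2:N+1)
    x[1:n + 1] = [b[i] + 10 for i in range(n)]    # S2:  X(2:N+1) = B(1:N) + 10
    a = [xs[i] + x[i] for i in range(n)]          # S1:  A(1:N) = X$(1:N) + X(1:N)
    return a, x

assert original([1, 2, 3, 4], [10, 20, 30]) == split_vectorized([1, 2, 3, 4], [10, 20, 30])
```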
42. Today
- Coarse-grained parallelization
- Vectorization
- Locality
  - Loop transformations that may improve locality
    - Loop interchange
    - Loop blocking/tiling
  - Loop transformations that enable further optimizations
    - Loop fusion
    - Loop skewing
43. Motivation
- Cache is important to a computer system:
  - A fast buffer between CPU and memory
  - Stores the most recently/most frequently accessed data
  - Affects system performance, cost, and energy consumption
- Locality
  - Cache effectiveness depends on the program's reuse pattern.
  - Temporal locality
    - LRU replacement policy
  - Spatial locality
    - Cache blocks (lines)
44. CPU-Memory Speed Gap
- From Patterson & Hennessy, Computer Architecture, Morgan Kaufmann Publishers.
45. Motivation Example: Spatial Reuse

  DO I = 1, N
    DO J = 1, M
  S   A(I, J) = A(I, J) + B(I, J)
    ENDDO
  ENDDO

- Array storage
  - Fortran style: column-major
- Access pattern
  - J-loop: iterates over a row of A(I,J) with I fixed
  - I-loop: iterates over different rows
  - Potential spatial reuse
- Cache misses
  - Could be N*M for A(I,J) if M is large enough
46. Motivation Example: Spatial Reuse

  DO J = 1, M
    DO I = 1, N
  S   A(I, J) = A(I, J) + B(I, J)
    ENDDO
  ENDDO

- Interchanging the I-loop and the J-loop
- Access pattern
  - I-loop: iterates over a column of A(I,J) with J fixed
  - Spatial locality is exploited: N/b misses per column, given a cache line length of b words
- Cache misses
  - Always N*M/b for A(I,J), assuming perfect alignment
  - Similar result for B(I,J)
47. Motivation Example: Temporal Reuse

  DO I = 1, N
    DO J = 1, M
  S   A(I) = A(I) + B(J)
    ENDDO
  ENDDO

- Assume block size b = 1.
- Access pattern
  - A(I) is reused across the J-loop iterations: N misses
  - B(J) reuse is limited by the cache size
    - When M is large: N*M misses under an LRU replacement policy
- Strip-mine-and-interchange
  - Divide the large array into small sections (strips of size S)
  - B(jj): M misses
  - A(I): N*M/S misses
48. Profitability of Loop Interchange
- Choosing the right loop to put innermost is critical.
- Spatial reuse: consecutive loop iterations access adjacent memory locations.
- Temporal reuse: consecutive loop iterations access the same set of memory locations.
- It is not always clear-cut:

  DO I = 1, N
    DO J = 1, M
  S   D(I) = D(I) + B(I, J)   ! assume column-major storage
    ENDDO
  ENDDO

- Cache misses: N*M for B, N/b for D
- Misses after interchange: N*M/b for B, N*M/b for D
- When should we interchange?
  - N*M + N/b - 2*N*M/b > 0, i.e. M*(b-2) + 1 > 0
49. A Heuristic Approach
- Carr, McKinley, Tseng, "Compiler Optimizations for Improving Data Locality", ASPLOS '94.
- For each loop l, attach a cost to each reference in the loop nest as if l were the innermost loop.
- Rank the loops using the attached loop cost.
- Reorder the loops from lowest cost to highest.
- Place the loop with the lowest cost in the innermost position, if the direction matrix shows it can legally be placed there.
50. Cost Assignment
- Attach a cost to each reference as if the loop being considered were the innermost loop:
  - Cost = 1 if the reference does not depend on the loop induction variable
  - Cost = N if the reference is non-consecutive (the induction variable strides over a non-contiguous dimension)
  - Cost = N*s/b if the reference is consecutive in small steps of size s (the induction variable strides over a contiguous dimension)
- Multiply the cost by the trip count of each outer loop.
- Intuitively, the cost approximates the total number of cache lines accessed, i.e. the total number of cache misses.

  DO I = 1, N      ! A(J): cost 1 (invariant in I)
    ... A(J) ...
  ENDDO
  DO J = 1, N      ! A(I,J): cost N (non-consecutive, column-major)
    ... A(I,J) ...
  ENDDO
  DO I = 1, N, s   ! A(I,J): cost N*s/b (consecutive with stride s)
    ... A(I,J) ...
  ENDDO
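The cost model can be sketched in Python and applied to the D(I) = D(I) + B(I,J) example from the previous slide (the cache-line size b = 8 and N = M = 100 are made-up parameters):

```python
# Sketch of the Carr/McKinley/Tseng-style cost assignment: cost each
# reference as if the candidate loop were innermost, then multiply by
# the trip count of the remaining outer loop.
def ref_cost(kind, trip, b=8, s=1):
    if kind == "invariant":        # no use of the induction variable
        return 1
    if kind == "nonconsecutive":   # strides over a non-contiguous dimension
        return trip
    return trip * s / b            # "consecutive": one miss per cache line

N = M = 100
# Candidate innermost = I: D(I) and B(I,J) are both stride-1 (column-major).
cost_I = (ref_cost("consecutive", N) + ref_cost("consecutive", N)) * M
# Candidate innermost = J: D(I) is invariant, B(I,J) strides across rows.
cost_J = (ref_cost("invariant", M) + ref_cost("nonconsecutive", M)) * N
assert cost_I < cost_J   # so the heuristic places the I loop innermost
```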
51. Example: Matrix Multiplication

  DO I = 1, N
    DO J = 1, N
      DO K = 1, N
  S     C(I, J) = C(I, J) + A(I, K) * B(K, J)
      ENDDO
    ENDDO
  ENDDO

- Ideal loop order: I innermost, J outermost.
- Direction matrix: (=, =, <) — the only dependence (on C) is carried by the K loop, so any loop permutation is legal.
52. Example: Matrix Multiplication

  DO I = 1, N
    DO J = 1, N
      DO K = 1, N
  S     C(I, J) = C(I, J) + A(I, K) * B(K, J)
      ENDDO
    ENDDO
  ENDDO

- After reordering:

  DO J = 1, N
    DO K = 1, N
      DO I = 1, N
  S     C(I, J) = C(I, J) + A(I, K) * B(K, J)
      ENDDO
    ENDDO
  ENDDO
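The legality of the reordering can be demonstrated with a Python sketch (Python lists don't model column-major layout, so this only shows that both orders compute the same result; the locality benefit applies to Fortran's storage order):

```python
# Sketch: the direction matrix permits any permutation of the matmul
# loops, so IJK and JKI orders compute identical results; JKI puts the
# stride-1 I loop innermost for column-major arrays.
def matmul_ijk(A, B, n):
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_jki(A, B, n):
    C = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            for i in range(n):        # innermost: C(I,J), A(I,K) stride-1
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert matmul_ijk(A, B, 2) == matmul_jki(A, B, 2)
```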
53. Loop Fusion
- Takes multiple compatible loop nests and combines their bodies into one loop nest.
- Legal if no data dependences are reversed.
- Improves locality directly by merging accesses to the same cache line into one loop iteration.
- Also enables further loop interchange: it generates perfect loop nests.

  DO I = 2, N
    DO K = 1, N
      X(I, K) = X(I, K) - X(I-1, K) * A(I, K) / B(I-1, K)
    ENDDO
    DO K = 1, N
      B(I, K) = B(I, K) - A(I, K) / B(I-1, K)
    ENDDO
  ENDDO
54. Loop Fusion
- After loop fusion:

  DO I = 2, N
    DO K = 1, N
      X(I, K) = X(I, K) - X(I-1, K) * A(I, K) / B(I-1, K)
      B(I, K) = B(I, K) - A(I, K) / B(I-1, K)
    ENDDO
  ENDDO

- After fusion + interchange:

  DO K = 1, N
    DO I = 2, N
      X(I, K) = X(I, K) - X(I-1, K) * A(I, K) / B(I-1, K)
      B(I, K) = B(I, K) - A(I, K) / B(I-1, K)
    ENDDO
  ENDDO
55. Loop Blocking
- Example revisited:

  DO J = 1, M
    DO I = 1, N
  S   D(I) = D(I) + B(I, J)
    ENDDO
  ENDDO

- Spatial locality of B(I,J) is exploited.
- How about D(I)?
  - Long-term reuse, separated by N I-loop iterations
- What if we reduce the number of intervening iterations?
56. Strip-Mine-and-Interchange

  DO J = 1, M
    DO I = 1, N
  S   D(I) = D(I) + B(I, J)
    ENDDO
  ENDDO

- Iterate on smaller strips of the I-dimension.
- Misses drop from 2*N*M/b to N/b + N*M/b = (1 + 1/M) * N*M/b.
57. Loop Blocking
- Splitting a loop into smaller strips is always legal.
- Interchanging the by-strip loop to the outside of some containing loop is not always legal.
  - Condition: after interchange, no direction vector has ">" as its leftmost non-"=" direction.
  - This can be overly conservative.
- Blocking is profitable if there is reuse between iterations of a loop that is not the innermost loop.

  DO I = 1, N, S
    DO J = 1, M
      DO ii = I, MIN(I+S-1, N)
  S     . . .
      ENDDO
    ENDDO
  ENDDO
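A minimal Python sketch of strip-mine-and-interchange on the running example (strip size S = 2 and the data are made up; the point is only that the blocked schedule computes the same result):

```python
# Sketch: split the I dimension into strips of size S and move the
# by-strip loop outside the J loop, so D(I) stays cached within a strip.
def original(d, b, n, m):
    d = list(d)
    for j in range(m):
        for i in range(n):
            d[i] += b[i][j]
    return d

def strip_mined(d, b, n, m, S=2):
    d = list(d)
    for i0 in range(0, n, S):              # by-strip loop, now outermost
        for j in range(m):
            for i in range(i0, min(i0 + S, n)):
                d[i] += b[i][j]
    return d

b = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
assert original([0, 0, 0], b, 3, 3) == strip_mined([0, 0, 0], b, 3, 3)
```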
58. References
- Padua and Wolfe, "Advanced Compiler Optimizations for Supercomputers", Communications of the ACM, 1986 (a survey of parallelization transformations)
- Allen and Kennedy, "Automatic Loop Interchange", SIGPLAN Symposium on Compiler Construction, 1984 (loop interchange)
- Kennedy and McKinley, "Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution", LCPC 1993 (loop distribution and fusion)
- Carr, McKinley, Tseng, "Compiler Optimizations for Improving Data Locality", ASPLOS, 1994 (loop interchange and loop fusion for locality)
- Wolf and Lam, "A Data Locality Optimizing Algorithm", PLDI, 1991 (sophisticated transformations for locality)
- Allen and Kennedy, Optimizing Compilers for Modern Architectures, Ch. 5, 6, 9
59. Program Dependence Graph
- For a procedure P:
  - Nodes: program statements of P
    - For each variable v used before being defined in P, there is an added node initial(v).
    - For each variable v named in P's end statement, there is a final(v) node.
    - An additional distinguished entry vertex
  - Edges: program dependences
    - Data (output, flow, anti)
    - Control
- J. Ferrante, K. Ottenstein, J. Warren, "The Program Dependence Graph and its Use in Optimization", ACM Transactions on Programming Languages and Systems, Vol. 9, No. 3, July 1987, pp. 319-349.
60. Example PDG: Output Dependences

  main()
    sum = 0
    i = 1
    while i < 11 do
      sum = sum + i
      i = i + 1
    od
  end(sum, i)

[PDG figure: nodes ENTRY, sum = 0, i = 1, while i < 11, sum = sum + i, i = i + 1, final(sum), final(i); output-dependence edges highlighted]

61. Example PDG: Loop-Independent Flow
[Same PDG; loop-independent flow edges highlighted]

62. Example PDG: Loop-Carried Flow
[Same PDG; loop-carried flow edges highlighted]

63. Example PDG: All Data Dependences
[Same PDG; edge kinds: loop-carried flow, loop-independent flow, output]
64. Control Dependence (δc)
- V1 δc V2: node V2 is control dependent on V1 if, during execution, whenever V1 evaluates to c (where c is either true or false), V2 must eventually execute.
- If V2 is control dependent on V1, then V1 must have two exits:
  - On exit 1, V2 must execute.
  - On exit 2, there is a path on which V2 will not execute.
65. Control Dependence
- V1 δc V2: V2 is control dependent on V1 if
  - there exists a path from V1 to V2 such that V2 post-dominates every vertex p on that path (p ≠ V1, V2), and
  - V2 does not strictly post-dominate V1.
- p PDOM v if every path from v to the exit node includes p.

[CFG and post-dominator tree figure with nodes A, B, C, D, exit; A δc B and A δc C]

66. Control Dependence
- Same definition, second example.

[Another CFG and post-dominator tree over nodes A, B, C, D; again A δc B and A δc C]
67. Example PDG: Control Dependences
[PDG control-dependence figure for the sum example: ENTRY has true-labeled edges to sum = 0, i = 1, while i < 11, final(sum), final(i); the while predicate has true-labeled edges to sum = sum + i and i = i + 1]

68. Complete PDG for the Example
[Same PDG with all edge kinds shown: control, loop-carried flow, loop-independent flow, output]
69. Using PDGs
- Constant propagation and folding
  - Via graph walking
- Code motion
- Slicing
- Basis for system-level analysis
70. Using the PDG for Code Motion

  do i = 1, 100
    k = i * (n*2)
    do j = i, 100
      a(i,j) = 100*n + 10*k + j
    end
  end

[CFG of the lowered code: i = 1; test i < 100; body computes t1 = n*2, k = i*t1, j = i; inner test j < 100; inner body computes t2 = 100*n, t3 = 10*k, t4 = t2 + t3, t5 = t4 + j, j = j + 1; i = i + 1 on inner-loop exit]
71. PDG

  i = 1
  while i < 100 do
    k = i * (n*2)
    j = i
    while j < 100 do
      a(i,j) = 100*n + 10*k + j
      j = j + 1
    end
    i = i + 1
  end

[PDG figure: ENTRY controls i = 1, the outer while, final(i), final(j); the outer while controls t1 = n*2, k = i*t1, j = i, the inner while, i = i+1; the inner while controls t2 = 100*n, t3 = 10*k, t4 = t2+t3, t5 = t4+j, j = j+1; edge kinds: control, loop-carried flow, loop-independent flow, output]

72. PDG
- Goal: move the invariant statements (t1 = n*2, t2 = 100*n) out of the loops.

73. PDG
- Loop-carried flow edges: fine.

74. PDG
- Output dependences: fine.

75. PDG
- Loop-independent flow edges (dotted lines): unaffected.

76. PDG After the Move
[Same PDG, but t1 = n*2 and t2 = 100*n are now control dependent only on ENTRY, i.e. hoisted out of both loops]
- Loop-independent flow: fine.

77. New PDG
- Control edges: some issues. We need a conditional to handle the null (zero-trip) case (which actually isn't an issue here, but ...).
78. Next Lecture
- Topic
  - Interprocedural analysis and optimization
- References
  - Dragon Book, Ch. 12
79. Computing Control Dependence from a CFG
- Add a slicing edge (entry → exit).
- Choose S = the set of edges (A, B) where B does not post-dominate A.
- For each edge (A, B) in S: traverse the post-dominator tree from B until reaching A's parent; all nodes visited (before A's parent) are control dependent on A.
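The three steps can be sketched in Python. The tiny diamond CFG and the naive iterative post-dominator solver are made up for illustration (a production compiler would use a faster algorithm); the entry→exit slicing edge is omitted because the example's entry is an ordinary branch node, and every non-exit node is assumed to reach the exit.

```python
# Sketch of control-dependence computation: solve post-dominators by
# iteration, then for each CFG edge (a, b) where b does not
# post-dominate a, walk the post-dominator tree from b up to a's
# parent, marking every visited node as control dependent on a.
def postdominators(succ, exit_node):
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom

def control_dependences(succ, exit_node):
    pdom = postdominators(succ, exit_node)

    def ipdom(n):                  # parent of n in the post-dominator tree
        strict = pdom[n] - {n}
        for c in strict:
            if all(o in pdom[c] for o in strict):
                return c
        return None

    cd = {}
    for a in succ:
        for b in succ[a]:
            if b not in pdom[a]:   # b does not post-dominate a
                stop = ipdom(a)    # walk the tree from b up to a's parent
                x = b
                while x is not None and x != stop:
                    cd.setdefault(x, set()).add(a)
                    x = ipdom(x)
    return cd

# if/else diamond: 1 branches to 2 and 3, which merge at 4.
cfg = {1: [2, 3], 2: [4], 3: [4], 4: ["exit"], "exit": []}
assert control_dependences(cfg, "exit") == {2: {1}, 3: {1}}
```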
80. Example 1
[CFG figure: entry → start → 1, with conditional branches at nodes 1, 2, and 3, merging through nodes 5 and 6 into 7 → exit; true branches labeled (the other branch is false). Shown with its post-dominator tree.]

81. Computing Control Dependence from a CFG
- Add a slicing edge (entry → exit).
- Choose S = the set of edges (A, B) where B does not post-dominate A.
- For each edge (A, B) in S: traverse the post-dominator tree from B until reaching A's parent; all nodes visited (before A's parent) are control dependent on A.

82. Example 1
- S = the set of edges (A, B) where B does not post-dominate A:
  S = {(entry, start), (1, 2), (1, 3), (2, 4), (2, 5), (3, 5)}
[CFG and post-dominator tree figure]

83. Determining Control Dependence
- Apply the algorithm above edge by edge to S.

84. Example 1 (post-dominator tree)
- S = {(entry, start), (1, 2), (1, 3), (2, 4), (2, 5), (3, 5)}
[Post-dominator tree figure: exit at the root, with 7, 6, and the remaining nodes beneath]

85. Example 1 (control dependences)
- S = {(entry, start), (1, 2), (1, 3), (2, 4), (2, 5), (3, 5)}
[Resulting control-dependence graph, with edges labeled T or F]

86. Region Nodes
- Used to summarize conditions shared by several control-dependent nodes.
[Figure: the control-dependence graph of Example 1 rewritten with region nodes R1-R6 factoring out common (predicate, branch) conditions]

87. Example 1
[CFG figure repeated]

88. Example 2
[CFG figure: nodes entry and A through H, true branches labeled; shown with its post-dominator tree]

89. Example 2 (continued)
- S = {(entry, A), (B, C), (C, D), (C, E)}
- The control dependences follow by walking the post-dominator tree for each edge in S.
[Control-dependence graph figure]