Dependence Modeling - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Dependence Modeling
1
Dependence Modeling: Parallelization, Part 2
  • CS 640

2
Review: Dependence Model
  • Dependence
  • Flow (true), anti, output
  • Loop dependence
  • Distance vector
  • Direction vector
  • Reordering transformation
  • Should preserve every dependence in the program
  • The left-most non-"=" element of the direction
    vector must be "<"
  • Source occurs before sink

3
Today
  • Coarse-grained parallelization
  • Loop interchange
  • Privatization, loop alignment, loop distribution,
    loop fusion
  • Vectorization
  • Locality
  • Program Dependence Graphs

4
Loop Parallelization
  • It is always valid to convert a sequential loop
    to a parallel loop if the loop carries no
    dependence: each iteration can then be assigned
    to a different processor
  • Loop-carried dependences are excluded
  • Loop-independent dependences are fine:
    parallelizing a loop does not change the order of
    statements within a single iteration of the loop
    body
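To make this rule concrete, here is a small Python sketch (mine, not from the slides) that takes the direction vectors of all dependences in a nest and reports which loop levels carry no dependence and are therefore trivially parallelizable:

# Sketch: find parallelizable loop levels from direction vectors.
# Vectors use '<', '=', '>' per level, outermost first.

def carried_level(dv):
    """Level (1-based) of the left-most non-'=' entry, or None if the
    dependence is loop-independent (all '=')."""
    for level, d in enumerate(dv, start=1):
        if d != '=':
            return level
    return None

def parallel_levels(direction_vectors, depth):
    """Loop levels that carry no dependence."""
    carried = {carried_level(dv) for dv in direction_vectors}
    return [l for l in range(1, depth + 1) if l not in carried]

# For D = (=, <, <) (the example a few slides below), only level 2
# carries the dependence, so the level-1 and level-3 loops are parallel:
print(parallel_levels([('=', '<', '<')], depth=3))  # -> [1, 3]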

5
Coarse-Grained Parallelism
  • Target machine: symmetric multiprocessor
  • Multiple processors with a shared memory
  • Parallelism employed by creating and executing a
    process on each processor
  • PARALLEL DO
  • Expensive overhead: process initiation and
    synchronization (barrier)
  • The concern for high performance: find and
    package parallelism with a granularity large
    enough to compensate for the overhead
  • Delicate trade-off between overhead minimization
    and load balancing
  • Simply stated goal: identify independent loop
    iterations to execute on different processors

6
Loop Transformations
  • To promote coarse-grained parallelism
  • Perfect loop-nest methods
  • Loop interchange
  • Single-loop methods
  • Privatization
  • Loop alignment
  • Loop distribution
  • Loop fusion

7
Loop-Carried Dependences
  • The level of a loop-carried dependence is the
    index of the left-most non-"=" entry of the
    direction vector D(i,j) for the dependence
  • The outermost such loop is critical to the
    execution order
  • Direction vector: (=, =, <)
  • Level: 3
  • Notation for this dependence: S δ3 S

DO I = 1, N
  DO J = 1, M
    DO K = 1, L
S     A(I, J, K+1) = A(I, J, K) + C
    ENDDO
  ENDDO
ENDDO

The innermost (K) loop must run sequentially, but its
N×M instances (one per I, J pair) can be run across
N×M processors.
8
Example
D(i,j,k) = (=, <, <)

DO I = 1, 10
  DO J = 1, 10
    DO K = 1, 10
S     A(I, J+2, K+3) = A(I, J, K) + C
    ENDDO
  ENDDO
ENDDO

The dependence is carried at level 2 (the J loop);
the I and K loops carry nothing and can each be run
across 10 processors.
9
Loop Interchange Example
  • After interchanging I-loop and J-loop
  • PARALLEL DO J 1, N
  • DO I 1, N
  • A(I1, J) A(I, J) B(I, J)
  • ENDDO
  • END PARALLEL DO

DV(J,I) ( , lt)
1 barrier needed
10
Loop Interchange Profitability
  • Not always possible to move a parallel loop
    outward and have it remain free of dependence
  • Example
  • DO J = 1, N
  •   DO I = 1, N
  •     A(I+1, J+1) = A(I, J) + B(I, J)
  •   ENDDO
  • ENDDO
  • The best we can do: parallelize the inner loop

DV(J,I) = (<, <)
11
Loop Interchange Safety
  • Loop interchange: switching the nesting order of
    two loops in a perfect nest
  • A reordering transformation
  • Not all loop interchanges are legal
  • DO J = 1, M
  •   DO I = 1, N
  •     A(I,J+1) = A(I+1,J) + B
  •   ENDDO
  • ENDDO
  • Theorem: let D be the direction vector for a
    dependence in a perfect loop nest; the direction
    vector of the same dependence after any loop
    permutation is determined by applying the same
    permutation to the elements of D

DV(J,I) = (<, >). After interchange it would become
(>, <), whose left-most non-"=" entry is ">", so the
interchange is illegal.
12
Direction Matrix
  • Loop interchange might affect all dependences
    involved in a loop nest
  • We need a way to describe the original and
    updated groups of dependences
  • We define a direction matrix for a nest of loops:
  • A matrix in which each row is a direction vector
    for some dependence of the nest
  • Every dependence in the loop nest has its
    direction vector represented by a row in the
    matrix

DO I = 1, N
  DO J = 1, M
    DO K = 1, L
      A(I+1,J+1,K) = A(I,J,K) + A(I,J+1,K+1)
    ENDDO
  ENDDO
ENDDO

For this nest the direction matrix has two rows:
(<, <, =) for the flow dependence through A(I,J,K),
and (<, =, >) for the flow dependence through
A(I,J+1,K+1).
13
Loop Interchange Safety
  • The direction matrix after loop interchange can
    be computed by applying the same permutation of
    the loops to its columns
  • Theorem: a permutation of the loops in a perfect
    nest is legal if and only if the updated
    direction matrix has no ">" direction as the
    leftmost non-"=" direction in any row.
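A minimal Python sketch of this legality test (my own, not from the slides): permute the columns of the direction matrix and reject the permutation if any row's leftmost non-'=' entry becomes '>':

def is_legal_permutation(direction_matrix, perm):
    """direction_matrix: list of direction vectors, e.g. [('<', '>')].
    perm: tuple of 0-based column indices giving the new loop order."""
    for row in direction_matrix:
        permuted = [row[p] for p in perm]
        for d in permuted:
            if d == '<':       # leftmost non-'=' is '<': this row is fine
                break
            if d == '>':       # leftmost non-'=' is '>': illegal
                return False
    return True

# Slide 11's example: DV(J,I) = (<, >). Interchanging the two loops
# yields (>, <), so the interchange is rejected:
print(is_legal_permutation([('<', '>')], (1, 0)))  # -> False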

14
Loop Interchange Profitability
  • Theorem 6.3: In a perfect nest of loops, a
    particular loop can be parallelized at the
    outermost level if and only if the column of the
    direction matrix for that loop contains only "="
    entries.

15
Loop Selection and Interchange
  • DO I = 1, N
  •   PARALLEL DO J = 1, M
  •     DO K = 1, L
  • S1    A(I+1,J,K) = A(I,J,K) + X1
  • S2    B(I,J,K+1) = B(I,J,K) + X2
  • S3    C(I+1,J+1,K+1) = C(I,J,K) + X3
  •     ENDDO
  •   END PARALLEL DO
  • ENDDO
  • The general case of loop selection is NP-complete

16
Privatization
  • DO I = 1, N
  • S1  T = A(I)
  • S2  A(I) = B(I)
  • S3  B(I) = T
  • ENDDO
  • Privatization:
  • Determine that T is privatizable: T is assigned
    within the loop, and each assigned value is used
    only in the same iteration in which it is
    assigned
  • Replicate T across different iterations

After privatization:
PARALLEL DO I = 1, N
  PRIVATE t
S1  t = A(I)
S2  A(I) = B(I)
S3  B(I) = t
END PARALLEL DO
17
Loop Alignment
  • Example
  • DO I = 2, N
  • S1  A(I) = B(I) + C(I)
  • S2  D(I) = A(I-1) * 2.0
  • ENDDO

S1 δ1 S2 (loop-carried)
Goal after alignment: S1 δ∞ S2 (loop-independent)
18
Loop Alignment
  • Loop alignment works by:
  • Increasing the number of iterations, and
  • Executing statements on slightly different
    subsets of those iterations
  • Changing a loop-carried dependence into a
    loop-independent dependence

DO I = 1, N
  IF (I > 1) A(I) = B(I) + C(I)
  IF (I < N) D(I+1) = A(I) * 2.0
ENDDO
19
Loop Distribution
  • Break one loop into multiple smaller loops
  • May not be desirable for coarse-grained parallelism
  • Barriers are needed between the distributed loops
  • May facilitate other transformations

20
Loop Distribution
  • Example
  • DO I = 1, N
  • S1  A(I) = B(I) + 1
  • S2  C(I) = A(I) + C(I-1)
  • S3  D(I) = A(I) + X
  • ENDDO

Only one loop-carried dependence: the recurrence in
S2. Distribution isolates it, as sketched below.
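A NumPy analogue of the distributed result (illustrative only; the slides' examples are Fortran): S1 and S3 become fully parallel vector statements, while the S2 recurrence stays in a sequential loop.

import numpy as np

N = 8
B = np.arange(N, dtype=float)
X = 2.0
A = np.empty(N)
C = np.zeros(N + 1)          # C[0] plays the role of C(0)
D = np.empty(N)

A[:] = B + 1                 # S1: distributed, vectorizable
for i in range(N):           # S2: loop-carried recurrence, sequential
    C[i + 1] = A[i] + C[i]   # C(I) = A(I) + C(I-1) in 0-based form
D[:] = A + X                 # S3: distributed, vectorizable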
21
Loop Fusion
After fusing L1 and L3 (L2 stays separate):

L1+L3: PARALLEL DO I = 1, N
         A(I) = B(I) + 1
         D(I) = A(I) + X
       END PARALLEL DO
L2:    DO I = 1, N
         C(I) = A(I) + C(I-1)
       ENDDO

  • Merge small loops into a single loop with a
    larger loop body
  • Parallelization overhead might be compensated for

22
Today
  • Coarse-grained parallelization
  • Vectorization
  • Transformations to increase parallelism
  • Loop interchange
  • Transformations to break recurrence
  • Scalar expansion, scalar/array renaming, node
    splitting
  • Locality

23
Loop Vectorization
  • Loops that carry no dependence can also be
    vectorized
  • How about loops that carry some dependence?

DO I = 1, N
S   A(I) = A(I) + C
ENDDO

vectorizes to:

A(1:N) = A(1:N) + C

DO I = 1, N
S1  A(I+1) = B(I) + C
S2  D(I) = A(I) + E
ENDDO

Can this one be vectorized?
24
Loop Vectorization
DO I = 1, N
S1  A(I+1) = B(I) + C
S2  D(I) = A(I) + E
ENDDO

S1 δ1 S2 (before distribution)
S1 δ∞ S2 (after distribution)
  • We can split the loop into two loops. The
    loop-carried dependence is transformed into a
    loop-independent dependence, and both loops
    vectorize:
    A(2:N+1) = B(1:N) + C
    D(1:N) = A(1:N) + E

25
More Examples
DO I = 1, N
S1  D(I) = A(I) + E
S2  A(I+1) = B(I) + C
ENDDO

S2 δ1 S1
  • First interchange S1 and S2, then vectorize the
    loop
  • In the loop below, the loop-independent
    dependence prevents us from interchanging S1 and
    S2, so it cannot be vectorized

DO I = 1, N
S1  B(I) = A(I) + E
S2  A(I+1) = B(I) + C
ENDDO

S2 δ1 S1 and S1 δ∞ S2
26
Loop Vectorization
  • Loop distribution and vectorization cannot work
    if there is a cycle of dependences
  • Leads us to the first algorithm of vectorization

DO I = 1, N
S1  B(I+1) = A(I) + E
S2  A(I+1) = B(I) + C
ENDDO

S2 δ1 S1 and S1 δ1 S2: a dependence cycle
27
Simple Vectorization Algorithm
  • Idea
  • Construct the dependence graph
  • Reduce the strongly connected subgraphs to
    single nodes
  • Topologically sort the reduced graph
  • Generate code for each node in the sorted graph:
  • Single statement: vectorize it
  • Strongly connected subgraph: keep a sequential
    loop
  • A sketch of this driver appears below

DO I = 1, N
S1  D(I) = A(I) + E
S2  A(I+1) = B(I) + C
ENDDO
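A minimal Python sketch of this driver (my own, not the slides' notation), leaning on networkx for the graph plumbing; the printed 'vectorize'/'sequential loop' actions stand in for a real code generator.

import networkx as nx

def simple_vectorize(dep_graph):
    """dep_graph: nx.DiGraph of statements and dependence edges."""
    scc_dag = nx.condensation(dep_graph)      # reduce SCCs to single nodes
    for n in nx.topological_sort(scc_dag):    # respect dependence order
        stmts = list(scc_dag.nodes[n]['members'])
        if len(stmts) == 1 and not dep_graph.has_edge(stmts[0], stmts[0]):
            print('vectorize:', stmts[0])
        else:
            print('sequential loop around:', sorted(stmts))

# The loop above has one edge S2 -> S1 (the flow of A) and no cycle,
# so both statements vectorize, with S2's code generated before S1's:
simple_vectorize(nx.DiGraph([('S2', 'S1')]))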
28
Problems with Simple Vectorization
  • Some opportunities are missed
  • S is contained in a dependence cycle (a self
    loop), so our simple vectorization algorithm
    cannot parallelize the loop
  • Direction vector: (<, =)
  • From Theorem 2.4, as long as we keep the I loop
    sequential, the dependence is preserved

DO I = 1, N
  DO J = 1, M
S   A(I+1,J) = A(I,J) + B
  ENDDO
ENDDO

DO I = 1, N
  A(I+1,1:M) = A(I,1:M) + B
ENDDO
29
Advanced Vectorization Algorithm
  • Idea
  • Construct the dependence graph
  • Process the loop nest level by level, starting
    from the outermost loop
  • For each level:
  • Find the strongly connected subgraphs and
    topologically sort the reduced graph
  • Vectorize single statements of the sorted graph
  • Otherwise, generate a sequential loop for the
    current level, update the dependence graph, and
    recursively process the next loop level
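A rough Python sketch of this recursive codegen (my own reading of the steps above; the `level` attribute on each edge records which loop carries the dependence, and the prints stand in for real code generation):

import networkx as nx

def codegen(dep_graph, level, stmts, depth):
    """Vectorize what we can at this level; wrap each remaining cycle
    in a sequential DO and recurse one level deeper."""
    g = dep_graph.subgraph(stmts)
    scc_dag = nx.condensation(g)
    for n in nx.topological_sort(scc_dag):
        members = list(scc_dag.nodes[n]['members'])
        if len(members) == 1 and not g.has_edge(members[0], members[0]):
            print('  ' * (level - 1) +
                  f'vectorize {members[0]} over loops {level}..{depth}')
        else:
            print('  ' * (level - 1) + f'sequential DO at level {level}')
            # The sequential DO now enforces dependences carried at this
            # level, so drop them and process the inner levels.
            inner = nx.DiGraph()
            inner.add_nodes_from(members)
            inner.add_edges_from(
                (u, v, {'level': l})
                for u, v, l in g.subgraph(members).edges(data='level')
                if l != level)
            codegen(inner, level + 1, members, depth)

# Slide 28's nest: S depends on itself, carried at level 1 (the I loop).
g = nx.DiGraph()
g.add_edge('S', 'S', level=1)
codegen(g, 1, ['S'], depth=2)
# -> sequential DO at level 1, then S vectorized over loop 2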

30
Example
  • DO I = 1, 100
  • S1  X(I) = Y(I) + 10
  •     DO J = 1, 100
  • S2    B(J) = A(J,N)
  •       DO K = 1, 100
  • S3      A(J+1,K) = B(J) + C(J,K)
  •       ENDDO
  • S4    Y(I+J) = A(J+1, N)
  •     ENDDO
  • ENDDO

Generated code:
DO I = 1, 100
  DO J = 1, 100
    B(J) = A(J,N)                       ! S2
    A(J+1,1:100) = B(J) + C(J,1:100)    ! S3
  ENDDO
  Y(I+1:I+100) = A(2:101,N)             ! vectorized S4
ENDDO
X(1:100) = Y(1:100) + 10                ! vectorized S1
31
Vectorization: other issues
  • Basic idea: find all the possible parallelism by
    loop distribution and statement reordering
  • May not work if cyclic dependences exist,
    especially when they are carried by the inner
    loops
  • Move the cycle outwards
  • Loop interchange
  • Break the cycle
  • Scalar expansion, scalar/array renaming, node
    splitting

32
Example
  • DO I = 1, N
  •   DO J = 1, M
  • S   A(I,J+1) = A(I,J) + B
  •   ENDDO
  • ENDDO

Direction vector: (I,J) = (=, <)
  • The dependence is carried by the inner loop
  • Codegen() would generate two sequential loops
  • Is this loop nest really not vectorizable?

33
Loop Interchange & Vectorization
  • Motivation
  • If we deal with loops containing cyclic
    dependences early on in the loop nest, we can
    potentially vectorize more loops
  • Shift loops that carry no dependence inward
  • Theorem: in a perfect loop nest, loops that carry
    no dependence can legally be shifted inward, and
    they will not carry any dependences in their new
    position. Note that we are moving the dependences
    in the opposite direction than we did for
    parallelization

34
Loop Interchange & Vectorization
  • DO I = 1, N
  •   DO J = 1, M
  • S   A(I,J+1) = A(I,J) + B
  •   ENDDO
  • ENDDO

Direction vector: (I,J) = (=, <)
  • After interchanging the I loop and the J loop:
  • DO J = 1, M
  •   DO I = 1, N
  • S   A(I,J+1) = A(I,J) + B
  •   ENDDO
  • ENDDO

Direction vector: (J,I) = (<, =)
  • Vectorization:
  • DO J = 1, M
  • S   A(1:N,J+1) = A(1:N,J) + B
  • ENDDO

35
Scalar Expansion
  • DO I = 1, N
  • S1  T = A(I)
  • S2  A(I) = B(I)
  • S3  B(I) = T
  • ENDDO

Scalar expansion:
DO I = 1, N
S1  T$(I) = A(I)
S2  A(I) = B(I)
S3  B(I) = T$(I)
ENDDO
T = T$(N)

Vectorization:
S1  T$(1:N) = A(1:N)
S2  A(1:N) = B(1:N)
S3  B(1:N) = T$(1:N)
    T = T$(N)

  • Scalar expansion replaces T with a
    compiler-generated temporary array T$ that has a
    location for each loop iteration

36
Scalar Expansion
  • Always safe to apply
  • May need SSA to decide how to rewrite the code
  • Not always profitable
  • DO I = 1, N
  •   T = T + A(I) + A(I-1)
  •   A(I) = T
  • ENDDO
  • After scalar expansion:
  • T$(0) = T
  • DO I = 1, N
  • S1  T$(I) = T$(I-1) + A(I) + A(I-1)
  • S2  A(I) = T$(I)
  • ENDDO
  • T = T$(N)
  • The recurrence on T$ remains, so expansion does
    not break the cycle here

37
Scalar Expansion Profitability
  • Dependences due to reuse of memory location vs.
    reuse of values
  • Dependences due to reuse of values must be
    preserved
  • Dependences due to reuse of memory location can
    be deleted by expansion
  • Overhead
  • Increased memory consumption

38
Scalar Renaming
After renaming T:
DO I = 1, 100
S1  T1 = A(I) + B(I)
S2  C(I) = T1 + T1
S3  T2 = D(I) - B(I)
S4  A(I+1) = T2 * T2
ENDDO

  • DO I = 1, 100
  • S1  T = A(I) + B(I)
  • S2  C(I) = T + T
  • S3  T = D(I) - B(I)
  • S4  A(I+1) = T * T
  • ENDDO

Vectorization:
S3  T2$(1:100) = D(1:100) - B(1:100)
S4  A(2:101) = T2$(1:100) * T2$(1:100)
S1  T1$(1:100) = A(1:100) + B(1:100)
S2  C(1:100) = T1$(1:100) + T1$(1:100)
    T = T2$(100)
39
Array Renaming
After renaming A:
DO I = 1, N
S1  A$(I) = A(I-1) + X
S2  Y(I) = A$(I) + Z
S3  A(I) = B(I) + C
ENDDO

  • Original:
  • DO I = 1, N
  • S1  A(I) = A(I-1) + X
  • S2  Y(I) = A(I) + Z
  • S3  A(I) = B(I) + C
  • ENDDO

Vectorization:
S3  A(1:N) = B(1:N) + C
S1  A$(1:N) = A(0:N-1) + X
S2  Y(1:N) = A$(1:N) + Z
40
Node Splitting
  • DO I = 1, N
  • S1  A(I) = X(I+1) + X(I)
  • S2  X(I+1) = B(I) + 10
  • ENDDO
  • Renaming does not work because the two
    dependences share one single access to X(I+1)
  • Renaming would try to give both name spaces the
    original array name
  • Solution: create a copy of the node from which
    the critical anti-dependence emanates

41
Node Splitting Example
After node splitting:
DO I = 1, N
S1' X$(I) = X(I+1)
S1  A(I) = X$(I) + X(I)
S2  X(I+1) = B(I) + 10
ENDDO

  • Original:
  • DO I = 1, N
  • S1  A(I) = X(I+1) + X(I)
  • S2  X(I+1) = B(I) + 10
  • ENDDO

Vectorization:
S1' X$(1:N) = X(2:N+1)
S2  X(2:N+1) = B(1:N) + 10
S1  A(1:N) = X$(1:N) + X(1:N)
42
Today
  • Coarse-grained parallelization
  • Vectorization
  • Locality
  • Loop Transformations that may improve locality
  • Loop interchange
  • Loop blocking/tiling
  • Loop Transformations that enable further
    optimizations
  • Loop fusion
  • Loop skewing

43
Motivation
  • Cache is important to a computer system
  • A fast buffer between CPU and memory
  • Stores most recently/most frequently accessed
    data
  • Affects system performance, cost, and energy
    consumption
  • Locality
  • Cache effectiveness depends on the program's
    reuse pattern
  • Temporal locality
  • LRU replacement policy
  • Spatial locality
  • Cache blocks (lines)

44
CPU Memory Speed Gap
  • From Patterson & Hennessy, Computer Architecture,
    Morgan Kaufmann Publishers.

45
Motivation Example: Spatial Reuse

DO I = 1, N
  DO J = 1, M
S   A(I, J) = A(I, J) + B(I,J)
  ENDDO
ENDDO

[Figure: rows I and I+1 of the N×M array traversed
along the J dimension]

  • Array storage
  • Fortran style: column-major
  • Access pattern
  • J loop: iterates over a row of A(I,J) with I
    fixed
  • I loop: iterates over different rows
  • Potential spatial reuse
  • Cache misses
  • Could be N*M for A(I,J) if M is large enough
46
Motivation Example: Spatial Reuse

DO J = 1, M
  DO I = 1, N
S   A(I, J) = A(I, J) + B(I,J)
  ENDDO
ENDDO

[Figure: column J of the N×M array traversed along
the I dimension]

  • Interchanging the I loop and the J loop
  • Access pattern
  • I loop: iterates over a column of A(I,J) with J
    fixed
  • Spatial locality exploited: N/b misses per
    column, where b is the cache line length in words
  • Cache misses
  • Always N*M/b for A(I,J), assuming perfect
    alignment
  • Similar result for B(I,J)

47
Motivation Example: Temporal Reuse

DO I = 1, N
  DO J = 1, M
S   A(I) = A(I) + B(J)
  ENDDO
ENDDO

  • Assume block size b = 1
  • Access pattern
  • A(I) is reused across the J-loop iterations: N
    misses
  • B(J) reuse is limited by the cache size
  • When M is large: N*M misses under an LRU
    replacement policy
  • Strip-mine and interchange
  • Divide the large array into small sections of
    size S
  • B(jj): M misses
  • A(I): N*M/S misses

48
Profitability of Loop Interchange
  • Choosing the right loop to put innermost is
    critical
  • Spatial reuse: consecutive loop iterations access
    adjacent memory locations
  • Temporal reuse: consecutive loop iterations
    access the same set of memory locations
  • Not always clear-cut
  • Cache misses
  • N*M for B, N/b for D
  • Misses after interchange
  • N*M/b for B, N*M/b for D
  • When should we interchange?
  • When N/b + N*M - 2*N*M/b > 0, i.e.,
    M(b-2) + 1 > 0, which holds whenever the line
    size b >= 2

DO I = 1, N
  DO J = 1, M
S   D(I) = D(I) + B(I,J)
  ENDDO
ENDDO
! assume column-major storage
49
A Heuristic Approach
  • Carr, McKinley, and Tseng, "Compiler
    Optimizations for Improving Data Locality",
    ASPLOS 1994
  • For each loop l, attach a cost to each reference
    in the loop nest as if l were the innermost loop
  • Rank loops using the attached loop cost
  • Reorder loops from lowest cost to highest
  • Place the loop with the lowest cost in the
    innermost position, if the direction matrix shows
    it can legally be placed there

50
Cost Assignment
  • Attach a cost to each reference as if the loop
    considered is the innermost loop
  • Cost1 if the reference does not depend on the
    loop induction variable
  • CostN if the reference is non-consecutive
    (induction variable strides over a noncontiguous
    dimension)
  • Cost Ns/b if the reference is consecutive in
    small steps of size s (induction variables
    strides over a contiguous dimension)
  • Multiply the cost by the trip count of each outer
    loop
  • Intuitively, the cost approximates the total
    number of cache lines accessed / the total number
    of cache misses

DO I1,N A(J) ENDDO
DO J1,N A(I,J) ENDDO
DO I1,N,s A(I,J) ENDDO
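A rough Python sketch of these cost rules (my own; the `Ref` descriptor is a hypothetical stand-in for a real compiler's reference representation):

from dataclasses import dataclass

@dataclass
class Ref:
    depends_on_loop: bool  # subscript uses this loop's induction variable?
    consecutive: bool      # strides over the contiguous dimension?
    stride: int            # step size s in array elements

def ref_cost(ref, trip, b):
    """Approximate cache lines touched by one reference, pretending the
    candidate loop is innermost (trip = N, line size = b words)."""
    if not ref.depends_on_loop:
        return 1                      # same location every iteration
    if ref.consecutive and ref.stride < b:
        return trip * ref.stride / b  # N*s/b lines
    return trip                       # a new line (almost) every iteration

def loop_cost(refs, trip, outer_trips, b=8):
    """Sum the reference costs and scale by the trip counts of the
    loops that would remain outside the candidate loop."""
    outer = 1
    for t in outer_trips:
        outer *= t
    return outer * sum(ref_cost(r, trip, b) for r in refs)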
51
Example Matrix Multiplication
DO I = 1, N
  DO J = 1, N
    DO K = 1, N
S     C(I,J) = C(I,J) + A(I,K)*B(K,J)
    ENDDO
  ENDDO
ENDDO

  • Ideal loop order: I innermost, J outermost
  • Direction matrix: (=, =, <), the recurrence on
    C(I,J) carried by the K loop

52
Example Matrix Multiplication
DO I = 1, N
  DO J = 1, N
    DO K = 1, N
S     C(I,J) = C(I,J) + A(I,K)*B(K,J)
    ENDDO
  ENDDO
ENDDO

After reordering:
DO J = 1, N
  DO K = 1, N
    DO I = 1, N
S     C(I,J) = C(I,J) + A(I,K)*B(K,J)
    ENDDO
  ENDDO
ENDDO
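Applying the cost sketch from slide 50 to this nest (reusing `Ref` and `loop_cost`; column-major storage, with N = 100 and line size b = 8 words as illustrative values) reproduces the ranking behind this ordering:

N, b = 100, 8
# Descriptors for (C(I,J), A(I,K), B(K,J)) with each loop as candidate:
refs = {
    'I': [Ref(True, True, 1), Ref(True, True, 1), Ref(False, False, 0)],
    'J': [Ref(True, False, 0), Ref(False, False, 0), Ref(True, False, 0)],
    'K': [Ref(False, False, 0), Ref(True, False, 0), Ref(True, True, 1)],
}
for loop, rs in refs.items():
    print(loop, loop_cost(rs, N, [N, N], b))
# Relative costs: I (lowest) < K < J (highest), so the heuristic places
# I innermost and J outermost -- the J, K, I order shown above.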
53
Loop Fusion
  • Takes multiple compatible loop nests and combines
    their bodies into one loop nest
  • Is legal if no data dependences are reversed
  • Improves locality directly by merging accesses to
    the same cache line into one loop iteration
  • Also enables further loop interchange by
    generating perfect loop nests

DO I = 2, N
  DO K = 1, N
    X(I,K) = X(I,K) - X(I-1,K)*A(I,K)/B(I-1,K)
  ENDDO
  DO K = 1, N
    B(I,K) = B(I,K) - A(I,K)/B(I-1,K)
  ENDDO
ENDDO
54
Loop Fusion
After loop fusion:
DO I = 2, N
  DO K = 1, N
    X(I,K) = X(I,K) - X(I-1,K)*A(I,K)/B(I-1,K)
    B(I,K) = B(I,K) - A(I,K)/B(I-1,K)
  ENDDO
ENDDO

After fusion + interchange:
DO K = 1, N
  DO I = 2, N
    X(I,K) = X(I,K) - X(I-1,K)*A(I,K)/B(I-1,K)
    B(I,K) = B(I,K) - A(I,K)/B(I-1,K)
  ENDDO
ENDDO
55
Loop Blocking
  • Example revisited
  • Spatial locality of B(I,J) exploited
  • How about D(I)?
  • Long-term reuse, separated by the N iterations of
    the I loop
  • What if we reduce the number of intervening
    iterations?

DO J = 1, M
  DO I = 1, N
S   D(I) = D(I) + B(I,J)
  ENDDO
ENDDO
56
Strip-mine-and-interchange
DO J = 1, M
  DO I = 1, N
S   D(I) = D(I) + B(I,J)
  ENDDO
ENDDO

  • Iterate on smaller strips of the I dimension
  • Misses drop from 2*N*M/b to
    N/b + N*M/b = (1 + 1/M) * N*M/b

[Figure: arrays D and B partitioned into strips]
57
Loop Blocking
  • Splitting a loop into smaller strips is always
    legal
  • Interchanging the by-strip loop to the outside of
    some containing loop is not always legal
  • Condition: after interchange, no direction vector
    has ">" as the leftmost non-"=" direction
  • This condition can be overly conservative
  • Blocking is profitable if there is reuse between
    iterations of a loop that is not the innermost
    loop

DO I = 1, N, S
  DO J = 1, M
    DO ii = I, min(I+S-1, N)
S     . . .
    ENDDO
  ENDDO
ENDDO
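A NumPy-flavored sketch of the strip-mined example (illustrative only: the strip size S and array shapes are made up, and NumPy is row-major where the slides assume column-major):

import numpy as np

N, M, S = 1024, 64, 128
B = np.ones((N, M))
D = np.zeros(N)

for I in range(0, N, S):          # by-strip loop, interchanged outward
    hi = min(I + S, N)
    for J in range(M):            # the strip of D stays cache-resident
        D[I:hi] += B[I:hi, J]     #   across all M sweeps of this loop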
58
References
  • Padua and Wolfe, "Advanced Compiler Optimizations
    for Supercomputers", Communications of the ACM,
    1986 (a survey of parallelization
    transformations)
  • Allen and Kennedy, "Automatic Loop Interchange",
    SIGPLAN SCC, 1984 (loop interchange)
  • Kennedy and McKinley, "Maximizing Loop
    Parallelism and Improving Data Locality via Loop
    Fusion and Distribution", LCPC 1993 (loop
    distribution and fusion)
  • Carr, McKinley, and Tseng, "Compiler
    Optimizations for Improving Data Locality",
    ASPLOS, 1994 (loop interchange and loop fusion
    for locality)
  • Wolf and Lam, "A Data Locality Optimizing
    Algorithm", PLDI, 1991 (sophisticated
    transformations for locality)
  • Allen and Kennedy, Optimizing Compilers for
    Modern Architectures, Ch. 5, 6, 9

59
Program Dependence Graph
  • For procedure P:
  • Nodes: program statements of P
  • For each variable v used before being defined in
    P, there is an added node initial(v)
  • For each variable v named in P's end statement,
    there is a final(v) node
  • An additional distinguished entry vertex
  • Edges: program dependences
  • Data (output, flow, anti)
  • Control

J. Ferrante, K. Ottenstein, J. Warren, "The Program
Dependence Graph and Its Use in Optimization", ACM
Transactions on Programming Languages and Systems,
Vol. 9, No. 3, July 1987, pp. 319-349.
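For concreteness, a small Python sketch (mine, not Ferrante et al.'s construction algorithm) that records a representative subset of the PDG of the running example from the next slides as a labeled graph:

import networkx as nx

pdg = nx.MultiDiGraph()
pdg.add_node('ENTRY')
# Control dependences: ENTRY to the top-level statements; the while
# header to the loop body and to itself.
for s in ['sum = 0', 'i = 1', 'while i < 11', 'final(sum)', 'final(i)']:
    pdg.add_edge('ENTRY', s, kind='control')
for s in ['sum = sum + i', 'i = i + 1', 'while i < 11']:
    pdg.add_edge('while i < 11', s, kind='control')
# A few of the data dependences:
pdg.add_edge('sum = 0', 'sum = sum + i', kind='loop-independent flow')
pdg.add_edge('i = 1', 'while i < 11', kind='loop-independent flow')
pdg.add_edge('sum = sum + i', 'sum = sum + i', kind='loop-carried flow')
pdg.add_edge('i = i + 1', 'while i < 11', kind='loop-carried flow')
pdg.add_edge('sum = 0', 'sum = sum + i', kind='output')
pdg.add_edge('i = 1', 'i = i + 1', kind='output')

print(pdg.number_of_nodes(), 'nodes,', pdg.number_of_edges(), 'edges')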
60
Example PDG: Output Dependences

main()
  sum = 0
  i = 1
  while i < 11 do
    sum = sum + i
    i = i + 1
  od
end(sum, i)

[PDG figure: nodes ENTRY, sum = 0, i = 1,
while i < 11, sum = sum + i, i = i + 1, final(sum),
final(i); the output dependences sum = 0 ->
sum = sum + i and i = 1 -> i = i + 1 are highlighted]
61
Example PDG: Loop-Independent Flow

[Same program and nodes as the previous slide; the
loop-independent flow edges are highlighted, e.g.
sum = 0 -> sum = sum + i and i = 1 -> while i < 11]
62
Example PDG: Loop-Carried Flow

[Same program and nodes; the loop-carried flow edges
are highlighted, e.g. sum = sum + i -> sum = sum + i
and i = i + 1 -> while i < 11]
63
Example PDG
[PDG figure for the same program with all data
dependence edges shown]
Legend: loop-carried flow, loop-independent flow,
output
64
Control Dependence δc
  • V1 δc V2
  • Node V2 is control dependent on V1: during
    execution, whenever V1 evaluates to c (where c is
    either true or false), V2 must eventually execute
  • If V2 is control dependent on V1, then V1 must
    have two exits:
  • On exit 1, V2 must execute
  • On exit 2, there is a path on which V2 will not
    execute

65
Control Dependence
  • V1 δc V2
  • V2 is control dependent on V1 if:
  • there exists a path from V1 to V2 such that V2
    post-dominates every vertex p on that path
    (p ≠ V1, V2), and
  • V2 does not strictly post-dominate V1
  • p PDOM v if every path from v to the exit node
    includes p

[Figure: a CFG in which A branches to B and C, which
join at D before the exit, together with its
post-dominator tree. Here A δc B and A δc C.]
66
Control Dependence
  • V2 is control dependent on V1 if:
  • there exists a path from V1 to V2 such that V2
    post-dominates every vertex p on that path
    (p ≠ V1, V2), and
  • V2 does not strictly post-dominate V1

[Figure: a second CFG and its post-dominator tree,
again giving A δc B and A δc C]
67
Example PDG: Control Dependences

main()
  sum = 0
  i = 1
  while i < 11 do
    sum = sum + i
    i = i + 1
  od
end(sum, i)

[PDG figure: control dependence edges, each labeled
T, from ENTRY to sum = 0, i = 1, while i < 11,
final(sum), and final(i); and from while i < 11 to
sum = sum + i, i = i + 1, and back to itself]
68
Complete PDG for example
[Complete PDG figure for the same program: the
control edges above plus all data dependence edges]
Legend: control, loop-carried flow, loop-independent
flow, output
69
Using PDGs
  • Constant propagation and folding
  • Via graph walking
  • Code motion
  • Slicing
  • Basis for system-level analysis

70
Using PDG for Code Motion
do i = 1, 100
  k = i + (n*2)
  do j = i, 100
    a(i,j) = 100*n + 10*k + j
  end
end

[CFG figure: i = 1; test i < 100; the body computes
t1 = n*2, k = i + t1, j = i; inner test j < 100 with
body t2 = 100*n, t3 = 10*k, t4 = t2 + t3,
t5 = t4 + j, j = j + 1; then i = i + 1]
71
PDG
i = 1
while i < 100 do
  k = i + (n*2)
  j = i
  while j < 100 do
    a(i,j) = 100*n + 10*k + j
    j = j + 1
  end
  i = i + 1
end

[PDG figure: nodes ENTRY, i = 1, while i < 100,
t1 = n*2, k = i + t1, j = i, while j < 100,
t2 = 100*n, t3 = 10*k, t4 = t2 + t3, t5 = t4 + j,
j = j + 1, i = i + 1, final(i), final(j)]
Legend: control, loop-carried flow, loop-independent
flow, output
72
PDG
[Same program and PDG as the previous slide]
Goal: move the invariant statements (t1 = n*2 and
t2 = 100*n) out of the loop
73
PDG
[Same PDG]
Loop-carried flow edges: fine (unaffected by the
move)
74
PDG
[Same PDG]
Output dependences: fine
75
PDG
[Same PDG]
Loop-independent flow (dotted lines): unaffected
76
PDG after the move
[PDG figure after the move: t1 = n*2 and t2 = 100*n
now hang directly off ENTRY, outside the loops]
Loop-independent flow: fine
77
New PDG

[New PDG figure for the same program]
Control: some issues; we need to use a conditional
to handle the null case, i.e. when the loop body
never executes (which actually isn't an issue here,
but ...)
78
Next Lecture
  • Topic
  • Interprocedural analysis and optimization
  • References
  • Dragon book, Ch. 12

79
Computing Control Dependence from a CFG
  • Add a slicing edge (entry -> exit)
  • Choose S = the set of edges (A,B) where B does
    not post-dominate A
  • For each edge (A,B) in S, traverse from B up the
    post-dominator tree until reaching A's parent;
    all nodes visited before A's parent are control
    dependent on A
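A Python sketch of this recipe (my own, using networkx; post-dominators are computed as dominators of the reversed CFG rooted at the exit):

import networkx as nx

def control_dependences(cfg, entry, exit_node):
    aug = cfg.copy()
    aug.add_edge(entry, exit_node)          # the added slicing edge
    ipdom = nx.immediate_dominators(aug.reverse(copy=True), exit_node)
    deps = {}
    for a, b in aug.edges():
        if b == ipdom[a]:                   # B post-dominates A: not in S
            continue
        runner = b                          # walk B up the pdom tree
        while runner != ipdom[a]:           #   until A's parent
            deps.setdefault(a, []).append(runner)
            runner = ipdom[runner]
    return deps

# Tiny usage example: a diamond CFG (1 branches to 2 and 3, joining at 4).
g = nx.DiGraph([('entry', '1'), ('1', '2'), ('1', '3'),
                ('2', '4'), ('3', '4'), ('4', 'exit')])
print(control_dependences(g, 'entry', 'exit'))
# 2 and 3 are control dependent on the branch at 1;
# 1 and 4 are control dependent on entry.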

80
Example 1
[Figure: Example 1 CFG with nodes entry, start, 1-7,
and exit; true branches are labeled T (the other
branch is false). Beside it, the post-dominator tree
of the CFG.]
81
Computing Control Dependence from a CFG
  • Add a slicing edge (entry -> exit)
  • Choose S = the set of edges (A,B) where B does
    not post-dominate A
  • For each edge (A,B) in S, traverse from B up the
    post-dominator tree until reaching A's parent;
    all nodes visited before A's parent are control
    dependent on A

82
Example 1
[Figure: the Example 1 CFG and its post-dominator
tree again]

S = the set of edges (A,B) where B does not
post-dominate A = {(E,S), (1,2), (1,3), (2,4),
(2,5), (3,5)}
83
Determining Control Dependence
  • Add a slicing edge (entry -> exit)
  • Choose S = the set of edges (A,B) where B does
    not post-dominate A
  • For each edge (A,B) in S, traverse from B up the
    post-dominator tree until reaching A's parent;
    all nodes visited before A's parent are control
    dependent on A

84
  • S = {(E,S), (1,2), (1,3), (2,4), (2,5), (3,5)}

[Figure: the post-dominator tree over entry, start,
1-7, and exit]
85
  • S = {(E,S), (1,2), (1,3), (2,4), (2,5), (3,5)}

[Figure: the resulting control dependence graph;
edges are labeled with the branch condition (T or F)
that induces them]
86
Region Nodes
  • Used to summarize conditions

[Figure: the control dependence graph before and
after inserting region nodes R1-R6; each region node
summarizes a shared condition, so statements with
identical control conditions hang off a single
region node]
87
Example 1
[Figure: the Example 1 CFG, repeated]
88
Example 2
[Figure: Example 2 CFG with nodes entry, S, and A-H;
true branches labeled T. Beside it, its
post-dominator tree.]
  • S = {(entry,A), (B,C), (C,D), (C,E)}
  • Control dependencies:

[Figure: the resulting control dependence relation
over entry and A-H]