1
Enhancing Fine-Grained Parallelism
  • Chapter 5 of Allen and Kennedy

Optimizing Compilers for Modern Architectures
2
Fine-Grained Parallelism
  • Techniques to enhance fine-grained parallelism
  • Loop Interchange
  • Scalar Expansion
  • Scalar Renaming
  • Array Renaming
  • Node Splitting

3
Recall the vectorization procedure
  • procedure codegen(R, k, D)
  • // R is the region for which we must generate code.
  • // k is the minimum nesting level of possible parallel loops.
  • // D is the dependence graph among statements in R.
  • find the set S1, S2, ..., Sm of maximal strongly-connected regions in the dependence graph D restricted to R
  • construct Rp from R by reducing each Si to a single node and compute Dp, the dependence graph naturally induced on Rp by D
  • let p1, p2, ..., pm be the m nodes of Rp numbered in an order consistent with Dp (use topological sort to do the numbering)
  • for i = 1 to m do begin
  •   if pi is cyclic then begin
  •     generate a level-k DO statement
  •     let Di be the dependence graph consisting of all dependence edges in D that are at level k+1 or greater and are internal to pi
  •     codegen(pi, k+1, Di)
  •     generate the level-k ENDDO statement
  •   end
  •   else
  •     generate a vector statement for pi in r(pi) - k + 1 dimensions, where r(pi) is the number of loops containing pi
  • end
  • end
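To make the control flow of codegen concrete, here is a minimal Python sketch. It is an illustration under assumptions not stated on the slides: the dependence graph is a dict mapping each statement to (successor, level) pairs, Tarjan's algorithm supplies both the strongly-connected regions and their topological numbering, and "generated code" is simulated with print. All helper names are hypothetical.

def strongly_connected_components(nodes, edges):
    # Tarjan's algorithm; emits SCCs in reverse topological order of the condensation.
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w, _level in edges.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            sccs.append(frozenset(component))

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return sccs

def codegen(region, k, edges):
    # Emit a sequential level-k loop around each cyclic component and a vector
    # statement for each acyclic one, recursing at level k+1 inside each loop.
    components = strongly_connected_components(region, edges)
    for pi in reversed(components):      # reversed: topological order consistent with D_p
        internal = {v: [(w, lvl) for w, lvl in edges.get(v, ()) if w in pi] for v in pi}
        cyclic = len(pi) > 1 or any(w == v for v in pi for w, _ in internal[v])
        indent = "  " * (k - 1)
        if cyclic:
            print(f"{indent}DO        ! level-{k} loop around {sorted(pi)}")
            deeper = {v: [(w, lvl) for w, lvl in internal[v] if lvl >= k + 1] for v in pi}
            codegen(pi, k + 1, deeper)
            print(f"{indent}ENDDO")
        else:
            (stmt,) = pi                 # a single, non-cyclic statement
            print(f"{indent}vector statement for {stmt}")

# S1 and S2 form a level-1 recurrence; S3 is independent and vectorizes directly.
deps = {"S1": [("S2", 1)], "S2": [("S1", 1)], "S3": []}
codegen({"S1", "S2", "S3"}, 1, deps)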

4
Can we do better?
  • Codegen tries to find parallelism using only two transformations: loop distribution and statement reordering
  • If we deal with loops containing cyclic dependences early on in the loop nest, we can potentially vectorize more loops
  • Goal of Chapter 5: explore other transformations that expose additional parallelism

5
Motivational Example
  • DO J = 1, M
  •   DO I = 1, N
  •     T = 0.0
  •     DO K = 1, L
  •       T = T + A(I,K) * B(K,J)
  •     ENDDO
  •     C(I,J) = T
  •   ENDDO
  • ENDDO
  • codegen will not uncover any vector operations. However, by scalar expansion, we can get
  • DO J = 1, M
  •   DO I = 1, N
  •     T(I) = 0.0
  •     DO K = 1, L
  •       T(I) = T(I) + A(I,K) * B(K,J)
  •     ENDDO
  •     C(I,J) = T(I)
  •   ENDDO
  • ENDDO

6
Motivational Example
  • DO J = 1, M
  •   DO I = 1, N
  •     T(I) = 0.0
  •     DO K = 1, L
  •       T(I) = T(I) + A(I,K) * B(K,J)
  •     ENDDO
  •     C(I,J) = T(I)
  •   ENDDO
  • ENDDO

7
Motivational Example II
  • Loop distribution gives us
  • DO J = 1, M
  •   DO I = 1, N
  •     T(I) = 0.0
  •   ENDDO
  •   DO I = 1, N
  •     DO K = 1, L
  •       T(I) = T(I) + A(I,K) * B(K,J)
  •     ENDDO
  •   ENDDO
  •   DO I = 1, N
  •     C(I,J) = T(I)
  •   ENDDO
  • ENDDO

8
Motivational Example III
  • Finally, interchanging the I and K loops, we get
  • DO J = 1, M
  •   T(1:N) = 0.0
  •   DO K = 1, L
  •     T(1:N) = T(1:N) + A(1:N,K) * B(K,J)
  •   ENDDO
  •   C(1:N,J) = T(1:N)
  • ENDDO
  • A couple of new transformations were used:
  • Loop interchange
  • Scalar expansion

9
Loop Interchange
  • DO I = 1, N
  •   DO J = 1, M
  • S   A(I,J+1) = A(I,J) + B        DV: (=, <)
  •   ENDDO
  • ENDDO
  • Applying loop interchange
  • DO J = 1, M
  •   DO I = 1, N
  • S   A(I,J+1) = A(I,J) + B        DV: (<, =)
  •   ENDDO
  • ENDDO
  • leads to
  • DO J = 1, M
  • S   A(1:N,J+1) = A(1:N,J) + B
  • ENDDO

10
Loop Interchange
  • Loop interchange is a reordering transformation
  • Why?
  • Think of statements being parameterized with the
    corresponding iteration vector
  • Loop interchange merely changes the execution
    order of these statements.
  • It does not create new instances or delete existing instances
  • DO J = 1, M
  •   DO I = 1, N
  • S   <some statement>
  •   ENDDO
  • ENDDO
  • If interchanged, S(2, 1) will execute before S(1,
    2)
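A tiny self-contained Python illustration (not from the slides) makes the point: enumerating the iteration vectors in both loop orders shows the same set of instances of S, only permuted.

# Instances of S are parameterized by their iteration vector (J, I).
M, N = 3, 2
original     = [(j, i) for j in range(1, M + 1) for i in range(1, N + 1)]  # J outer, I inner
interchanged = [(j, i) for i in range(1, N + 1) for j in range(1, M + 1)]  # I outer, J inner

assert set(original) == set(interchanged)                       # no instance created or deleted
print(original.index((1, 2)) < original.index((2, 1)))          # True: S(1,2) runs before S(2,1)
print(interchanged.index((2, 1)) < interchanged.index((1, 2)))  # True: the order is reversed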

11
Loop Interchange Safety
  • Safety: not all loop interchanges are safe
  • DO J = 1, M
  •   DO I = 1, N
  •     A(I,J+1) = A(I+1,J) + B
  •   ENDDO
  • ENDDO
  • Direction vector: (<, >)
  • If we interchange the loops, we violate the dependence

12
Loop Interchange Safety
  • A dependence is interchange-preventing with
    respect to a given pair of loops if interchanging
    those loops would reorder the endpoints of the
    dependence.

13
Loop Interchange Safety
  • A dependence is interchange-sensitive if it is
    carried by the same loop after interchange. That
    is, an interchange-sensitive dependence moves
    with its original carrier loop to the new level.
  • Example: Interchange-Sensitive?
  • Example: Interchange-Insensitive?

14
Loop Interchange Safety
  • Theorem 5.1: Let D(i,j) be a direction vector for a dependence in a perfect nest of n loops. Then the direction vector for the same dependence after a permutation of the loops in the nest is determined by applying the same permutation to the elements of D(i,j).
  • The direction matrix for a nest of loops is a
    matrix in which each row is a direction vector
    for some dependence between statements contained
    in the nest and every such direction vector is
    represented by a row.

15
Loop Interchange Safety
  • DO I = 1, N
  •   DO J = 1, M
  •     DO K = 1, L
  •       A(I+1,J+1,K) = A(I,J,K) + A(I,J+1,K+1)
  •     ENDDO
  •   ENDDO
  • ENDDO
  • The direction matrix for the loop nest is
  •   < < =
  •   < = >
  • Theorem 5.2: A permutation of the loops in a perfect nest is legal if and only if the direction matrix, after the same permutation is applied to its columns, has no ">" direction as the leftmost non-"=" direction in any row.
  • Follows from Theorem 5.1 and Theorem 2.3
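Theorem 5.2 translates directly into a small legality test. The Python sketch below is illustrative only, assuming each direction vector is written as a string over "<", "=", ">" and a permutation is given as a list of column indices.

def permutation_is_legal(direction_matrix, perm):
    # Theorem 5.2: legal iff no permuted row has ">" as its leftmost non-"=" entry.
    for row in direction_matrix:
        for d in (row[p] for p in perm):
            if d == "<":       # dependence carried by an outer loop: this row is fine
                break
            if d == ">":       # leftmost non-"=" entry is ">": dependence reversed
                return False
    return True

# The direction matrix from this slide: rows (<, <, =) and (<, =, >) for loops (I, J, K).
matrix = ["<<=", "<=>"]
print(permutation_is_legal(matrix, [0, 1, 2]))   # True : original order I, J, K
print(permutation_is_legal(matrix, [1, 0, 2]))   # True : interchanging I and J is legal
print(permutation_is_legal(matrix, [2, 1, 0]))   # False: making K outermost reverses (<, =, >)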

16
Loop Interchange Profitability
  • Profitability depends on the architecture
  • DO I = 1, N
  •   DO J = 1, M
  •     DO K = 1, L
  • S     A(I+1,J+1,K) = A(I,J,K) + B
  •     ENDDO
  •   ENDDO
  • ENDDO
  • For SIMD machines with a large number of FUs:
  • DO I = 1, N
  • S   A(I+1,2:M+1,1:L) = A(I,1:M,1:L) + B
  • ENDDO
  • Not suitable for vector register machines

17
Loop Interchange Profitability
  • For vector machines, we want to vectorize loops with stride-one memory access
  • Since Fortran stores arrays in column-major order, it is most useful to vectorize the I-loop
  • Thus, transform to
  • DO J = 1, M
  •   DO K = 1, L
  • S   A(2:N+1,J+1,K) = A(1:N,J,K) + B
  •   ENDDO
  • ENDDO

18
Loop Interchange Profitability
  • MIMD machines with vector execution units want to cut down synchronization costs
  • Hence, shift the K-loop to the outermost level
  • PARALLEL DO K = 1, L
  •   DO J = 1, M
  •     A(2:N+1,J+1,K) = A(1:N,J,K) + B
  •   ENDDO
  • END PARALLEL DO

19
Scalar Expansion
  • DO I = 1, N
  • S1  T = A(I)
  • S2  A(I) = B(I)
  • S3  B(I) = T
  • ENDDO
  • Scalar expansion gives
  • DO I = 1, N
  • S1  T$(I) = A(I)
  • S2  A(I) = B(I)
  • S3  B(I) = T$(I)
  • ENDDO
  • T = T$(N)
  • which leads to
  • S1  T$(1:N) = A(1:N)
  • S2  A(1:N) = B(1:N)
  • S3  B(1:N) = T$(1:N)
  •     T = T$(N)

20
Scalar Expansion
  • However, scalar expansion is not always profitable. Consider
  • DO I = 1, N
  •   T = T + A(I) + A(I+1)
  •   A(I) = T
  • ENDDO
  • Scalar expansion gives us
  • T$(0) = T
  • DO I = 1, N
  • S1  T$(I) = T$(I-1) + A(I) + A(I+1)
  • S2  A(I) = T$(I)
  • ENDDO
  • T = T$(N)
  • The carried true dependence of S1 on itself (through T$(I-1)) remains, so the recurrence is not broken and the loop still cannot be vectorized

21
Scalar Expansion Safety
  • Scalar expansion is always safe
  • When is it profitable?
  • Naïve approach: expand all scalars, vectorize, then shrink all unnecessary expansions
  • However, we would rather predict when expansion is profitable
  • The key distinction is between dependences due to reuse of a memory location and dependences due to reuse of values
  • Dependences due to reuse of values (true dependences) must be preserved
  • Dependences due to reuse of a memory location (anti- and output dependences) can be deleted by expansion
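This distinction can be made concrete with a small Python sketch (illustrative only; the edge representation is an assumption, not the book's): after a scalar is expanded, anti- and output dependence edges on it, which exist only because its memory location is reused, are dropped, while flow dependences that carry values are kept.

# Dependence edges as (source, sink, kind, variable); kinds: "flow", "anti", "output".
def prune_after_expansion(edges, expanded):
    # Keep value-carrying flow dependences; drop anti/output edges on the expanded scalar.
    return [(s, t, kind, var) for (s, t, kind, var) in edges
            if not (var == expanded and kind in ("anti", "output"))]

# The edges on T in the loop of slide 19: S1: T = A(I) ... S3: B(I) = T.
deps = [
    ("S1", "S3", "flow",   "T"),   # the value of T produced by S1 is used by S3
    ("S3", "S1", "anti",   "T"),   # the next iteration's S1 overwrites the T just read
    ("S1", "S1", "output", "T"),   # successive iterations write the same location T
]
print(prune_after_expansion(deps, "T"))   # only the flow edge survives, so the cycle is gone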

22
Scalar Expansion Drawbacks
  • Expansion increases memory requirements
  • Solutions:
  • Expand in a single loop only
  • Strip mine the loop before expansion
  • Use forward substitution instead:
  • DO I = 1, N
  •   T = A(I) + A(I+1)
  •   A(I) = T + B(I)
  • ENDDO
  • becomes
  • DO I = 1, N
  •   A(I) = A(I) + A(I+1) + B(I)
  • ENDDO

23
Scalar Renaming
  • DO I = 1, 100
  • S1  T = A(I) + B(I)
  • S2  C(I) = T + T
  • S3  T = D(I) - B(I)
  • S4  A(I+1) = T * T
  • ENDDO
  • Renaming the scalar T gives
  • DO I = 1, 100
  • S1  T1 = A(I) + B(I)
  • S2  C(I) = T1 + T1
  • S3  T2 = D(I) - B(I)
  • S4  A(I+1) = T2 * T2
  • ENDDO

24
Scalar Renaming
  • will lead to
  • S3  T2(1:100) = D(1:100) - B(1:100)
  • S4  A(2:101) = T2(1:100) * T2(1:100)
  • S1  T1(1:100) = A(1:100) + B(1:100)
  • S2  C(1:100) = T1(1:100) + T1(1:100)
  •     T = T2(100)

25
Node Splitting
  • Sometimes renaming fails
  • DO I = 1, N
  • S1  A(I) = X(I+1) + X(I)
  • S2  X(I+1) = B(I) + 32
  • ENDDO
  • The recurrence is kept intact by the renaming algorithm

26
Node Splitting
  • DO I = 1, N
  • S1  A(I) = X(I+1) + X(I)
  • S2  X(I+1) = B(I) + 32
  • ENDDO
  • Break the critical antidependence
  • Make a copy of the node from which the antidependence emanates
  • DO I = 1, N
  • S1' X$(I) = X(I+1)
  • S1  A(I) = X$(I) + X(I)
  • S2  X(I+1) = B(I) + 32
  • ENDDO
  • Recurrence broken
  • Vectorized to
  •   X$(1:N) = X(2:N+1)
  •   X(2:N+1) = B(1:N) + 32
  •   A(1:N) = X$(1:N) + X(1:N)

27
Node Splitting
  • Determining the minimal set of critical antidependences is NP-complete
  • So a perfect job of node splitting is difficult
  • Heuristic:
  • Select an antidependence
  • Delete it and check whether the dependence graph becomes acyclic
  • If it does, apply node splitting to break that antidependence
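A rough Python sketch of this heuristic (the graph representation and function names are assumptions, not from the book): remove one antidependence edge at a time and test whether the statement-level dependence graph becomes acyclic; only such edges are worth splitting.

def has_cycle(graph):
    # Depth-first cycle test; graph maps each node to a set of successor nodes.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def dfs(v):
        color[v] = GRAY
        for w in graph.get(v, ()):
            if color.get(w, WHITE) == GRAY:              # back edge: cycle found
                return True
            if color.get(w, WHITE) == WHITE and dfs(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and dfs(v) for v in graph)

def splitting_candidates(graph, antideps):
    # Antidependence edges whose removal leaves the dependence graph acyclic.
    candidates = []
    for src, dst in antideps:
        trimmed = {v: set(succ) for v, succ in graph.items()}
        trimmed.get(src, set()).discard(dst)
        if not has_cycle(trimmed):
            candidates.append((src, dst))
    return candidates

# The recurrence of slide 25: antidependence S1 -> S2, carried true dependence S2 -> S1.
g = {"S1": {"S2"}, "S2": {"S1"}}
print(splitting_candidates(g, [("S1", "S2")]))   # [('S1', 'S2')]: splitting S1 breaks the cycle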