Using the Iteration Space Visualizer in Loop Parallelization

1
Using the Iteration Space Visualizer in Loop
Parallelization
  • Yijun YU
  • http://winpar.elis.rug.ac.be/ppt/isv

2
Overview
  • ISV: a 3D Iteration Space Visualizer to view the
    dependences in the iteration space
  • iteration -- one instance of the loop body
  • space -- the grid of all index values
  • Detect the parallelism
  • Estimate the speedup
  • Derive a loop transformation
  • Find statement-level parallelism
  • Future development

3
1. Dependence
program

  DO I = 1, 3
    A(I) = A(I-1)
  ENDDO

  DOALL I = 1, 3
    A(I) = A(I-1)
  ENDDO
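The DO loop carries a dependence: iteration I reads A(I-1), which iteration I-1 has just written. A minimal Python sketch (array size and initial values are illustrative) shows the sequential semantics that any DOALL version would have to preserve:

```python
# Sequential semantics of: DO I = 1,3  A(I) = A(I-1)  ENDDO
# Iteration I reads the value written by iteration I-1,
# so the first element propagates through the whole array.
A = [10, 1, 2, 3]          # A(0..3); initial values are illustrative
for i in range(1, 4):      # I = 1, 2, 3
    A[i] = A[i-1]
print(A)                   # [10, 10, 10, 10]
```

Running the iterations out of order, as a DOALL would, could read a stale A(I-1); this is the dependence ISV visualizes.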
4
1.1 Example 1
ISV directive: visualize
5
1.2 Visualize the Dependence
  • A dependence is visualized in an iteration space
    dependence graph

6
1.3 Parallelism?
  • Stepwise view: sequential execution
  • No parallelism found
  • However, many programs do have parallelism

7
2. Potential Parallelism
  • Time(sequential) = number of iterations
  • Dataflow: iterations are executed as soon as their
    data are ready. Time(dataflow) = number of
    iterations on the longest critical path
  • The potential parallelism is denoted by speedup =
    Time(sequential)/Time(dataflow)
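The dataflow time is the length of the longest chain of dependent iterations. A small sketch (the helper name and the example dependences are my own, not part of ISV):

```python
# Dataflow time = length of the longest chain of dependent iterations.
# Each iteration runs as soon as all its predecessors have run.
def dataflow_steps(iterations, preds):
    """preds maps an iteration to the iterations it depends on."""
    step = {}
    for it in iterations:  # iterations listed in a valid sequential order
        step[it] = 1 + max((step[p] for p in preds.get(it, [])),
                           default=0)
    return max(step.values())

# Chain 1 -> 2 -> 3 (as in the DO loop above): no parallelism, speedup 1.
print(dataflow_steps([1, 2, 3], {2: [1], 3: [2]}))  # 3
# Independent iterations: all run in one dataflow step, speedup 3.
print(dataflow_steps([1, 2, 3], {}))                # 1
```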

8
2.1 Example 2
9
Diophantine equations and the loop bounds (the
polytope of the iteration space) determine the dependences
10
2.2 Irregular dependence
  • Dependences have non-uniform distances
  • Parallelism analysis: 200 iterations over 15
    dataflow steps

Problem: How to exploit it?
11
3. Visualize parallelism
  • Find answers to these questions:
  • What is the dependence pattern?
  • Is there a parallel loop? (How to find it?)
  • What is the maximal parallelism? (How to exploit
    it?)
  • Is the load of parallel tasks balanced?

12
3.1 Example 3
13
3.2 3D Space
14
3.3 Loop parallelizable?
  • The I, J, K loops span a 3D space of 32
    iterations

Simulate sequential execution
  • Which loop can be parallel?

15
3.4 Loop parallelization
  • Interactively try the parallelization

Interactively check whether loop I is parallel
  • The blinking dependence edges prevent the
    parallelization of the given loop I.

16
3.5 Parallel execution
  • Let ISV find the correct parallelization

Automatically check the parallel loop
  • It takes 16 time steps

Simulate parallel execution
17
3.6 Dataflow execution
  • Sequential execution takes 32 time steps

Simulate dataflow execution
  • Dataflow execution takes only 4 time
    steps
  • Potential speedup = 32/4 = 8.

18
3.7 Graph partitioning
  • Dataflow speedup = 8

Iterating through partitions: the connected
components
  • All the partitions are load balanced

19
4. Loop Transformation
Potential parallelism -> Transformation -> Real parallelism
20
4.1 Example 4
21
4.2 The iteration space
  • Sequential execution: 25 iterations

22
4.3 Loop Parallelizable?
  • check loop I
  • check loop J

23
4.4 Dataflow execution
  • In total: 9 steps
  • Potential speedup:
  • 25/9 = 2.78
  • Wavefront effect: all iterations on the same
    wave are on the same line
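The wavefront schedule can be reproduced with a short sketch, assuming (as in this 5x5 example, my own construction) that each iteration (i,j) depends on (i-1,j) and (i,j-1):

```python
# Dataflow schedule for a 2D loop nest where iteration (i,j)
# depends on (i-1,j) and (i,j-1): iterations with equal i+j
# form one "wave" and run in the same time step.
N = 5
step = {}
for i in range(1, N + 1):
    for j in range(1, N + 1):
        ready = max(step.get((i - 1, j), 0), step.get((i, j - 1), 0))
        step[(i, j)] = ready + 1

print(max(step.values()))             # 9 dataflow steps = 2N-1
print(N * N / max(step.values()))     # speedup 25/9, about 2.78
```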

24
4.5 Zoom-in on the I-space
25
4.6 Speedup vs program size
  • Zoom-in previews parallelism in part of a loop
    without modifying the program
  • Executing the program for different sizes n
    estimates a speedup of n^2/(2n-1)

26
4.7 How to obtain the potential parallelism
  • Here we already have these metrics:
  • Sequential time steps: N^2
  • Dataflow time steps: 2N-1
  • Potential speedup: N^2/(2N-1)

How to obtain the potential speedup of a loop?
Transformation.
27
4.8 Unimodular transformation (UT)
Unimodular matrix
New loop index
Old loop index
  • A unimodular matrix is a square integer matrix
    with unit determinant. It is obtained from the
    identity matrix by three kinds of basic
    transformations: reversal, interchange, and
    skewing
  • The new loop execution order is determined by the
    transformed index. The iteration space keeps its
    unit step size
  • Finding a suitable UT reorders the iterations such
    that the new loop nest has a parallel loop
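The three elementary transformations and their composition can be sketched for a 2-deep loop nest (the matrix names and helpers are mine, for illustration only):

```python
# The three elementary unimodular transformations for a 2-deep
# loop nest, and a check that their product stays unimodular
# (integer entries, determinant +1 or -1).
reversal    = [[-1, 0], [0, 1]]   # reverse the outer loop
interchange = [[0, 1], [1, 0]]    # swap the two loops
skewing     = [[1, 0], [1, 1]]    # skew the inner index by the outer

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

T = matmul(skewing, interchange)   # compose two basic transformations
print(T, det(T))                   # still integer, determinant -1
```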

28
4.9 Hyperplane transformation
  • Interactively define a hyper-plane
  • Observe that the plane iteration matches the dataflow
    simulation
  • plane = dataflow
  • Based on the plane, ISV calculates a unimodular
    transformation

29
4.10 The derived UT
The transformed iteration space and the
generated loop
30
4.11 Verify the UT
  • ISV checks whether the transformation is valid
  • Observe that the parallel loop execution in the
    transformed loop matches the plane execution
  • parallel = plane

31
5. Statement-level parallelism
  • Unimodular transformations work at the iteration
    level
  • The statement dependences within the loop body are
    hidden in the iteration space graph
  • How to exploit parallelism at the statement level?
    Map statements to iterations

32
5.1 Example 5
SSV: statement space visualization
33
5.2 Iteration-level parallelism
  • The iteration space is 2D.
  • There are N^2 = 16 iterations
  • The dataflow execution has 2N-1 = 7 time steps.
  • The potential speedup is 16/7 = 2.29

34
5.3 Parallelism in statements
  • The (statement) iteration space is 3D
  • There are 2N^2 = 32 statement instances
  • The dataflow execution still has 2N-1 = 7 time
    steps.
  • The potential speedup is 32/7 = 4.57

35
5.4 Comparison
  • Statement-level parallelism doubles the potential
    speedup obtained at the iteration level

36
5.5 Define the partition planes
  • partitions
  • hyper-planes

37
What is validity?
Show the execution order on top of the dependence
arrows (for one plane or all together, depending
on the density of the slide)
38
5.6 Invalid UT
  • An invalid unimodular transformation derived
    from a hyper-plane is rejected by ISV
  • Alternatively, ISV calculates the unimodular
    transformation from the dependence distance
    vectors available in the dependence graph

39
6. Pseudo distance method
  • The pseudo distance method:
  • Extract base vectors from the dependent
    iterations
  • Examine whether the base vectors generate all the
    distances
  • Calculate the unimodular transformation from
    the base vectors
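The "generates" test can be sketched as a brute-force search for integer combinations. The vectors and the coefficient range here are illustrative, not taken from ISV:

```python
# Check whether every observed dependence distance is an integer
# linear combination of the candidate base vectors -- the core test
# of the pseudo distance method. Brute force over small coefficients
# is enough for illustration.
from itertools import product

def generates(base, distance, coeff_range=range(-4, 5)):
    dim = len(distance)
    for coeffs in product(coeff_range, repeat=len(base)):
        combo = tuple(sum(c * b[k] for c, b in zip(coeffs, base))
                      for k in range(dim))
        if combo == tuple(distance):
            return True
    return False

base = [(0, 1, 1), (1, 0, -1)]                   # candidate base vectors
distances = [(0, 1, 1), (1, 0, -1), (1, 1, 0)]   # illustrative distances
print(all(generates(base, d) for d in distances))  # True
```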

40
Another way to find parallelism automatically
The iteration space is a grid; non-uniform
dependences are members of a uniform dependence
grid with unknown base vectors. Finding these
base vectors allows us to extend existing
parallelization to the non-uniform case.
41
6.1 Dependence distance
  • (0,1,1)
  • (1,0,-1)

42
6.2 The Transformation
  • The transforming matrix discovered by the pseudo
    distance method:
  •  1  1  0
  • -1  0  1
  •  1  0  0
  • The distance vectors are transformed:
    (1,0,-1) -> (0,1,0)
    (0,1,1) -> (0,0,1)
  • The dependent iterations have the same first
    index, which implies the outermost loop is parallel.
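These transformed distances can be checked directly. The row-vector convention d' = d * T is my inference from the vectors shown on the slide:

```python
# Transform each dependence distance d by the unimodular matrix T
# (row-vector convention: d' = d * T) and check the results shown
# on the slide.
T = [[ 1, 1, 0],
     [-1, 0, 1],
     [ 1, 0, 0]]

def transform(d, T):
    return tuple(sum(d[i] * T[i][j] for i in range(3)) for j in range(3))

print(transform((1, 0, -1), T))  # (0, 1, 0)
print(transform((0, 1, 1), T))   # (0, 0, 1)
# The first component is 0 in both: each dependence stays inside one
# outermost iteration, so the outermost loop is parallel.
```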

43
6.3 Compare the UT matrices
  • The transforming matrix discovered by the pseudo
    distance method:
  •  1  1  0
  • -1  0  1
  •  1  0  0
  • An invalid transforming matrix discovered by the
    hyper-plane method:
  •  1  0  0
  • -1  1  0
  •  1  0  1

The same first column means the transformed
outermost loops have the same index.
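A UT is valid only if every transformed distance vector remains lexicographically positive. A sketch comparing the two matrices above (same row-vector convention as before, which is my assumption):

```python
# Validity test for a unimodular transformation: every transformed
# dependence distance must stay lexicographically positive, otherwise
# the new loop order would run an iteration before one it depends on.
def transform(d, T):
    n = len(d)
    return tuple(sum(d[i] * T[i][j] for i in range(n)) for j in range(n))

def lex_positive(v):
    for x in v:
        if x != 0:
            return x > 0
    return False

def valid(T, distances):
    return all(lex_positive(transform(d, T)) for d in distances)

distances = [(0, 1, 1), (1, 0, -1)]
T_pseudo = [[1, 1, 0], [-1, 0, 1], [1, 0, 0]]    # pseudo distance method
T_plane  = [[1, 0, 0], [-1, 1, 0], [1, 0, 1]]    # hyper-plane method
print(valid(T_pseudo, distances))  # True
print(valid(T_plane, distances))   # False: (1,0,-1) maps to (0,0,-1)
```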
44
6.4 The transformed space
  • The outermost loop is parallel
  • There are 8 parallel tasks
  • The load of tasks is not balanced
  • The longest task takes 7 time steps

45
7. Non-perfectly nested loop
  • What is it?
  • The unimodular transformations only work for
    perfectly nested loops
  • For a non-perfectly nested loop, the iteration
    space is constructed with extended indices
  • An N-fold non-perfectly nested loop becomes an
    (N+1)-fold perfectly nested loop

46
7.1 Perfectly nested Loop?
  • Non-perfectly nested loop
  • DO I1 = 1, 3
  •   A(I1) = A(I1-1)
  •   DO I2 = 1, 4
  •     B(I1,I2) = B(I1-1,I2) + B(I1,I2-1)
  •   ENDDO
  • ENDDO

Perfectly nested loop
  DO I1 = 1, 3
    DO I2 = 1, 5
      DO I3 = 0, 1
        IF (I2.EQ.1 .AND. I3.EQ.0) THEN
          A(I1) = A(I1-1)
        ELSE IF (I3.EQ.1) THEN
          B(I1-1,I2) = B(I1-2,I2) + B(I1-1,I2-1)
        ENDIF
      ENDDO
    ENDDO
  ENDDO
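The extended-index idea can be checked in a small sketch. This version keeps the original array indices and adds only a statement index I3 (the slide's generated loop also shifts bounds, which is not reproduced here; the bounds and initial values are illustrative):

```python
# Sketch of the extended-index idea: add one statement index i3 so
# the non-perfect 2-deep nest becomes a perfect 3-deep nest that
# performs the same computation in the same order.
N1, N2 = 3, 4

def init():
    A = [float(i) for i in range(N1 + 1)]
    B = [[float(i + j) for j in range(N2 + 1)] for i in range(N1 + 1)]
    return A, B

def non_perfect():
    A, B = init()
    for i1 in range(1, N1 + 1):
        A[i1] = A[i1 - 1]
        for i2 in range(1, N2 + 1):
            B[i1][i2] = B[i1 - 1][i2] + B[i1][i2 - 1]
    return A, B

def perfect():
    A, B = init()
    for i1 in range(1, N1 + 1):
        for i2 in range(1, N2 + 1):
            for i3 in (0, 1):
                if i2 == 1 and i3 == 0:   # statement 1: once per i1
                    A[i1] = A[i1 - 1]
                elif i3 == 1:             # statement 2: for every i2
                    B[i1][i2] = B[i1 - 1][i2] + B[i1][i2 - 1]
    return A, B

print(non_perfect() == perfect())  # True
```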
47
7.2 Exploit parallelism with UT
48
8. Applications
49
9. Future considerations
  • Weighted dependence graph
  • More semantics on data locality
  • data space graph, data communication graph
  • data reuse iteration space graph
  • More loop transformations
  • Affine (statement) iteration space mappings
  • Automatic statement distribution
  • Integration with the Omega library