Title: ECE 669 Parallel Computer Architecture, Lecture 23: Parallel Compilation
1. ECE 669 Parallel Computer Architecture, Lecture 23: Parallel Compilation
2. Parallel Compilation
- Two approaches to compilation
  - Parallelize a program manually: sequential code converted to parallel code
  - Develop a parallel compiler
- Intermediate form
- Partitioning: block based or loop based
- Placement
- Routing
3. Compilation technologies for parallel machines
- Assumptions
  - Input: parallel program
  - Output: coarse parallel program with directives for
    - Which threads run in one task
    - Where tasks are placed
    - Where data is placed
    - Which data elements go in each data chunk
- Limitation: no special optimizations for synchronization; synchronization memory references are treated like any other communication.
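As a rough illustration (not from the lecture, and with invented directive wording), the comments in the C sketch below show the kind of coarse-grain output directives such a compiler might emit for a simple loop:

```c
/* Hypothetical sketch only: the "directives" are shown as comments and
 * their wording is invented for illustration, not taken from any tool.  */
#define N 1024
double A[N], B[N];

void scaled_copy(void)
{
    /* task t (t = 0..3): iterations i = t*256 .. t*256+255 run as one task */
    /* place task t on processor t                                          */
    /* place the data chunk A[t*256 .. t*256+255], B[t*256 .. t*256+255]    */
    /* in the memory local to processor t                                   */
    for (int i = 0; i < N; i++)
        A[i] = 2.0 * B[i];
}
```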
4. Toy example
5. Example
- Matrix multiply
- Typically, looking to find parallelism...
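A minimal C sketch (mine, not from the slides) of the matrix multiply kernel in question; the i and j iterations carry no cross-iteration dependences, so they are natural candidates for parallel threads.

```c
#define N 256
double a[N][N], b[N][N], c[N][N];

/* Dense matrix multiply: every (i, j) result is independent of the others,
 * so the i and j loops can be split into parallel threads.                */
void matmul(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}
```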
6. Choosing a program representation...
- Dataflow graph
  - No notion of storage (problem)
  - Data values flow along arcs
  - Nodes represent operations
7. Compiler representation
- For certain kinds of structured programs
- Unstructured programs
(Figure: a data array A with index expressions feeding a LOOP nest; tasks A and B exchange data X over an edge labeled with a communication weight.)
8. Process reference graph
- Nodes represent threads (processes): computation
- Edges represent communication (memory references)
- Can attach weights on edges to represent volume of communication
- Extension: precedence-relation edges can be added too
- Can also try to represent multiple loop-produced threads as one node
9. Process communication graph
- Allocate data items to nodes as well
- Nodes: threads, data objects
- Edges: communication
- Key: works for shared-memory, object-oriented (message passing), and dataflow systems!
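As a concrete (but assumed, not lecture-given) picture of a process communication graph, here is a minimal C representation with thread and data-object nodes and weighted communication edges:

```c
/* Illustrative-only data structure for a process communication graph (PCG). */
typedef enum { NODE_THREAD, NODE_DATA } node_kind_t;

typedef struct {
    node_kind_t kind;   /* thread (computation) or data object             */
    int         id;     /* e.g. loop iteration index or array chunk index  */
} pcg_node_t;

typedef struct {
    int    src, dst;    /* indices into the node array                     */
    double weight;      /* estimated communication volume (e.g. in words)  */
} pcg_edge_t;

typedef struct {
    pcg_node_t *nodes; int n_nodes;
    pcg_edge_t *edges; int n_edges;
} pcg_t;
```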
10. PCG for Jacobi relaxation
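For reference, a plain C version of the Jacobi relaxation sweep underlying such a PCG; each interior point reads only its four neighbors, which is what gives the graph its regular nearest-neighbor communication edges.

```c
#define N 64
double u[N][N], unew[N][N];

/* One Jacobi relaxation sweep: each interior point is replaced by the
 * average of its four neighbors.  All (i, j) updates are independent,
 * and each one touches only nearest-neighbor data.                     */
void jacobi_sweep(void)
{
    for (int i = 1; i < N - 1; i++)
        for (int j = 1; j < N - 1; j++)
            unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] +
                                 u[i][j-1] + u[i][j+1]);
}
```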
11. Compilation with PCGs
Fine process communication graph → partitioning → coarse process communication graph
12. Compilation with PCGs
Fine process communication graph → partitioning → coarse process communication graph → placement onto the multiprocessor (MP) → placed coarse process communication graph
... other phases, scheduling. Dynamic?
13. Parallel Compilation
- Consider loop partitioning
- Create small local compilation
- Consider static routing between tiles
- Short neighbor-to-neighbor communication
- Compiler orchestrated
14. Flow Compilation
- Modulo unrolling
- Partitioning
- Scheduling
15. Modulo Unrolling (Smart Memory)
- Loop unrolling relies on dependencies
- Allow maximum parallelism
- Minimize communication
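A hedged C sketch of the idea behind modulo unrolling: with an array low-order interleaved across (say) four memory banks, unrolling the loop by the bank count makes each reference in the unrolled body map to one fixed bank, so the compiler knows statically where every access goes. The bank count and interleaving scheme are assumptions for illustration.

```c
#define N 1024            /* assume N is a multiple of 4 for this sketch */
double a[N], b[N];

/* Original loop: under low-order interleaving across 4 banks, the bank
 * holding a[i] (i mod 4) changes from one iteration to the next.        */
void scale_orig(void)
{
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * b[i];
}

/* After modulo unrolling by the bank count (4): each reference in the
 * unrolled body always touches the same bank (a[i+k] lives on bank k
 * when i is a multiple of 4), so every access resolves to a bank at
 * compile time and can be handled by that bank's tile directly.         */
void scale_modulo_unrolled(void)
{
    for (int i = 0; i < N; i += 4) {
        a[i]     = 2.0 * b[i];      /* bank 0 */
        a[i + 1] = 2.0 * b[i + 1];  /* bank 1 */
        a[i + 2] = 2.0 * b[i + 2];  /* bank 2 */
        a[i + 3] = 2.0 * b[i + 3];  /* bank 3 */
    }
}
```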
16. Array Partitioning (Smart Memory)
- Assign each line to separate memory
- Consider exchange of data
- Approach is scalable
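A tiny C sketch (with assumed block sizes and names) of the idea: when an array is partitioned line by line across memories, the memory that owns a given line is a simple, compile-time-computable function of the line index, and the scheme scales with the number of tiles.

```c
#define N_ROWS  256
#define N_TILES 16

/* Illustrative owner computation for a row-partitioned array: rows are
 * dealt out to tiles in contiguous blocks, so the memory (tile) holding
 * row r follows directly from r.                                         */
int owner_of_row(int r)
{
    int rows_per_tile = N_ROWS / N_TILES;   /* assumes N_TILES divides N_ROWS */
    return r / rows_per_tile;
}
```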
17. Communication Scheduling (Smart Memory)
- Determine where data should be sent
- Determine when data should be sent
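One hedged way to picture the result of communication scheduling (not necessarily the representation used in the lecture): a static table recording, for each transferred value, where it is sent from, where it goes, and when it is routed.

```c
/* Hypothetical statically scheduled communication: the compiler emits a
 * table saying which tile sends which value to which tile in which slot. */
typedef struct {
    int src_tile;    /* where the data lives                */
    int dst_tile;    /* where the consuming task was placed */
    int value_id;    /* which data element is transferred   */
    int time_slot;   /* when the transfer is routed         */
} comm_event_t;

static const comm_event_t schedule[] = {
    { 0, 1, /*value_id=*/42, /*time_slot=*/3 },
    { 2, 1, /*value_id=*/43, /*time_slot=*/4 },
};
```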
18. Speedup for Jacobi (Smart Memory)
- Virtual wires indicate scheduled paths
- Hard wires are dedicated paths
- Hard wires require more wiring resources
- RAW is a parallel processor from MIT
19. Partitioning
- Use heuristics for unstructured programs
- For structured programs, start from the arrays and loop nests:
(Figure: a list of arrays A, B, C and a list of loop nests L0, L1, L2.)
20. Notion of Iteration space, data space
(Figure: the data space of matrix A alongside the iteration space; each point (i, j) of the iteration space represents a thread with that value of i, j.)
21. Notion of Iteration space, data space
- E.g., partitioning: how to tile the iteration and data spaces for MIMD machines?
(Figure: as on the previous slide, with the thread at (i, j) in the iteration space shown affecting a region of matrix A's data space.)
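A hedged C sketch of tiling the iteration space of a 2-D loop nest: the (i, j) iteration space is cut into rectangular tiles, each tile becomes one coarse task, and the block of the data space it touches suggests where that part of the array should be placed. Tile sizes here are arbitrary choices for illustration.

```c
#define N  512
#define TI 64            /* tile size in i (illustrative choice) */
#define TJ 64            /* tile size in j (illustrative choice) */
double a[N][N];

/* Each (ti, tj) tile of the iteration space is one coarse task; the data
 * it touches is the matching TI x TJ block of a's data space.            */
void init_tiled(void)
{
    for (int ti = 0; ti < N; ti += TI)
        for (int tj = 0; tj < N; tj += TJ)
            for (int i = ti; i < ti + TI; i++)
                for (int j = tj; j < tj + TJ; j++)
                    a[i][j] = (double)(i + j);
}
```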
22. Loop partitioning for caches
- Machine model
  - Assume all data is in memory
  - Minimize first-time cache fetches
  - Ignore secondary effects such as invalidations due to writes
(Figure: shared memory holding array A, connected through a network to per-processor caches.)
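A hedged C sketch of the goal on this slide: choose which iterations each processor runs so that the set of array lines it must fetch into its cache for the first time is small. The partition below gives each processor a contiguous block of rows, so neighboring iterations on a processor reuse the same cached rows; the processor count is an assumed parameter.

```c
#define N 1024
#define P 8                        /* assumed processor count */
double x[N][N], y[N][N];

/* Work assigned to processor p: a contiguous block of rows.  Iterations in
 * the block read overlapping rows of x, so after the first-time cache
 * fetches for those rows, later iterations on the same processor hit in
 * its cache rather than going back to memory.                             */
void smooth_rows(int p)
{
    int rows = N / P;              /* assumes P divides N */
    int lo = p * rows, hi = lo + rows;
    for (int i = (lo > 0 ? lo : 1); i < hi && i < N - 1; i++)
        for (int j = 1; j < N - 1; j++)
            y[i][j] = 0.5 * (x[i-1][j] + x[i+1][j]);
}
```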
23. Summary
- Parallel compilation often targets block-based and loop-based parallelism
- Compilation steps address identification of parallelism and representations
- Graphs are often useful to represent program dependencies
- For static scheduling, both computation and communication can be represented
- Data positioning is important for computation