Title: Message Passing and MPI: Collective Operations and Buffering
1. Message Passing and MPI: Collective Operations and Buffering
2. Example: Jacobi Relaxation

Pseudocode (A, Anew: N x N 2D arrays of floating-point numbers):

loop (how many times?)
  for each I = 1, N
    for each J = 1, N
      Anew[I,J] = average of A[I,J]'s 4 neighbors and itself
  swap Anew and A
end loop

- Red and Blue boundaries held at fixed values (say, temperature)
- Discretization: divide the space into a grid of cells. For all cells except those on the boundary, iteratively compute the temperature as the average of their neighboring cells.
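The flattened pseudocode above corresponds roughly to the following sequential C sketch. The grid size N, the iteration count, and the boundary initialization are illustrative assumptions, not values from the slide.

```c
#include <stdio.h>

#define N     16    /* grid size (illustrative) */
#define ITERS 100   /* fixed iteration count (illustrative) */

/* Extra row/column on each side holds the fixed boundary values. */
double A[N + 2][N + 2], Anew[N + 2][N + 2];

int main(void) {
    /* Boundaries (the "Red" and "Blue" edges) are held at fixed temperatures. */
    for (int i = 0; i < N + 2; i++) {
        A[i][0] = Anew[i][0] = 1.0;          /* left edge held at 1.0  */
        A[i][N + 1] = Anew[i][N + 1] = 0.0;  /* right edge held at 0.0 */
    }

    for (int iter = 0; iter < ITERS; iter++) {
        /* Each interior cell becomes the average of its 4 neighbors and itself. */
        for (int i = 1; i <= N; i++)
            for (int j = 1; j <= N; j++)
                Anew[i][j] = (A[i][j] + A[i - 1][j] + A[i + 1][j] +
                              A[i][j - 1] + A[i][j + 1]) / 5.0;
        /* "Swap Anew and A" (done here by copying back). */
        for (int i = 1; i <= N; i++)
            for (int j = 1; j <= N; j++)
                A[i][j] = Anew[i][j];
    }
    printf("A[%d][%d] = %f\n", N / 2, N / 2, A[N / 2][N / 2]);
    return 0;
}
```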
3. How to parallelize?
- Decide how to decompose the data
  - What options are there? (e.g. 16 processors)
    - Vertically
    - Horizontally
    - In square chunks
  - Pros and cons of each
- Identify the communication needed
  - Let us assume we will run for a fixed number of iterations
  - What data do I need from others?
  - From whom, specifically?
  - Reverse the question: who needs my data?
  - Express this with sends and recvs (see the sketch after this list)
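A minimal sketch of those sends and recvs, assuming a horizontal (row-block) decomposition: each rank needs the row just above and just below its block, and, symmetrically, its neighbors need its first and last rows. MPI_Sendrecv is used to pair each send with its matching receive, since a naive pair of blocking MPI_Send/MPI_Recv calls on every rank can deadlock. The names N, exchange_rows, and the buffer parameters are illustrative, not from the slide.

```c
#include <mpi.h>

#define N 16  /* global grid width (illustrative) */

/* Row-block decomposition: I need my upper neighbor's last row and my
   lower neighbor's first row; they need my first and last rows in return.
   MPI_PROC_NULL turns the sends/recvs at the top and bottom edges into no-ops. */
void exchange_rows(double my_first_row[N], double my_last_row[N],
                   double row_from_above[N], double row_from_below[N],
                   int rank, int nprocs) {
    MPI_Status status;
    int up   = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Send my first row up; receive the row just below my block from 'down'. */
    MPI_Sendrecv(my_first_row,   N, MPI_DOUBLE, up,   0,
                 row_from_below, N, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, &status);
    /* Send my last row down; receive the row just above my block from 'up'. */
    MPI_Sendrecv(my_last_row,    N, MPI_DOUBLE, down, 1,
                 row_from_above, N, MPI_DOUBLE, up,   1,
                 MPI_COMM_WORLD, &status);
}
```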
4. Ghost cells: a common apparition
- The data I need from neighbors
- But that I don't modify (and therefore don't own)
- Can be stored in my own data structures
- So that my inner loops don't have to know about communication at all
  - They can be written as if they were sequential code (see the sketch below)
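Continuing the row-block assumption above, a minimal sketch of how the ghost rows can live directly in my local array (rows 0 and nrows+1), so the interior update reads like sequential code; local, local_new, and nrows are illustrative names.

```c
#define N 16  /* global grid width (illustrative) */

/* Rows 1..nrows are mine; rows 0 and nrows+1 are ghost rows holding copies
   of my neighbors' boundary rows. They are read here but never written,
   because I don't own that data. */
void update_interior(double local[][N + 2], double local_new[][N + 2],
                     int nrows) {
    /* No communication appears in these loops: once the ghost rows have
       been refreshed, the code looks exactly like the sequential version. */
    for (int i = 1; i <= nrows; i++)
        for (int j = 1; j <= N; j++)
            local_new[i][j] = (local[i][j] + local[i - 1][j] + local[i + 1][j] +
                               local[i][j - 1] + local[i][j + 1]) / 5.0;
}
```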
5. Convergence Test
- Notice that all processors must report their convergence
  - Only if all of them have converged has the program converged
- Send data to one processor (say, 0)
  - What if you are running on 1000 processors?
  - Too much overhead on that one processor (serialization)
- Use a spanning tree (see the sketch after this list)
  - A simple one: processor P's parent is (P-1)/2, and its children are 2P+1 and 2P+2
  - Is that the best spanning tree?
  - It depends on the machine!
- MPI supports a single interface
  - Implemented differently on different machines
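A minimal hand-rolled sketch of the binary-tree reduction described above, combining per-rank "converged" flags with a logical AND up the tree (parent of P is (P-1)/2, children are 2P+1 and 2P+2). The function name and the use of an int flag are illustrative; in practice MPI_Reduce or MPI_Allreduce (next slide) replaces this.

```c
#include <mpi.h>

/* Returns the global AND of every rank's 'my_flag'; the value is only
   meaningful on rank 0, the root of the spanning tree. */
int tree_converged(int my_flag, int rank, int nprocs) {
    int result = my_flag, child_flag;
    int left  = 2 * rank + 1;
    int right = 2 * rank + 2;
    MPI_Status status;

    if (left < nprocs) {   /* fold in the left child's subtree result */
        MPI_Recv(&child_flag, 1, MPI_INT, left, 0, MPI_COMM_WORLD, &status);
        result = result && child_flag;
    }
    if (right < nprocs) {  /* fold in the right child's subtree result */
        MPI_Recv(&child_flag, 1, MPI_INT, right, 0, MPI_COMM_WORLD, &status);
        result = result && child_flag;
    }
    if (rank != 0)         /* pass my partial result up to my parent */
        MPI_Send(&result, 1, MPI_INT, (rank - 1) / 2, 0, MPI_COMM_WORLD);

    return result;
}
```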
6. MPI_Reduce
- Reduce data, and use the result on the root.

MPI_Reduce(data, result, size, MPI_Datatype, MPI_Op, root, communicator)
MPI_Allreduce(data, result, size, MPI_Datatype, MPI_Op, communicator)   (no root argument: every process receives the result)
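A minimal sketch of the convergence test from the previous slide written with MPI_Allreduce, so every rank learns the result and can decide whether to stop iterating; local_error, tol, and the function name are illustrative.

```c
#include <mpi.h>

/* Returns nonzero on every rank iff all ranks have converged.
   MPI_LAND is the logical-AND reduction operator. */
int all_converged(double local_error, double tol) {
    int my_flag = (local_error < tol);
    int global_flag;
    MPI_Allreduce(&my_flag, &global_flag, 1, MPI_INT, MPI_LAND, MPI_COMM_WORLD);
    return global_flag;
}
```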
7. Other collective ops
- Barriers, Gather, Scatter

MPI_Barrier(comm)
MPI_Gather(sendBuf, sendSize, sendType, recvBuf, recvSize, recvType, root, comm)
MPI_Scatter(sendBuf, sendSize, sendType, recvBuf, recvSize, recvType, root, comm)
MPI_Allgather(... no root: every process receives the gathered result ...)
MPI_Alltoall(... MPI's all-to-all scatter/gather, also with no root ...)
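A minimal sketch of MPI_Scatter and MPI_Gather: the root distributes equal-sized chunks of an array, every rank works on its own chunk, and the root collects the results back in rank order. CHUNK and the doubling of values are illustrative.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 4  /* elements handed to each process (illustrative) */

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double *all = NULL;
    double mine[CHUNK];
    if (rank == 0) {                       /* only the root holds the full array */
        all = malloc(nprocs * CHUNK * sizeof(double));
        for (int i = 0; i < nprocs * CHUNK; i++)
            all[i] = (double)i;
    }

    /* Root sends CHUNK elements to each process (including itself). */
    MPI_Scatter(all, CHUNK, MPI_DOUBLE, mine, CHUNK, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (int i = 0; i < CHUNK; i++)        /* work on my own chunk */
        mine[i] *= 2.0;

    /* Root collects the processed chunks back, ordered by rank. */
    MPI_Gather(mine, CHUNK, MPI_DOUBLE, all, CHUNK, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("all[0] = %g, all[%d] = %g\n", all[0], nprocs * CHUNK - 1,
               all[nprocs * CHUNK - 1]);
        free(all);
    }
    MPI_Finalize();
    return 0;
}
```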
8. Collective calls
- Message passing is often, but not always, used for the SPMD style of programming
  - SPMD: Single Program, Multiple Data
  - All processors execute essentially the same program, and the same steps, but not in lockstep
  - Communication, however, is almost in lockstep
- Collective calls
  - Global reductions (such as max or sum)
  - syncBroadcast (often just called broadcast)
    - syncBroadcast(whoAmI, dataSize, dataBuffer)
    - whoAmI: sender or receiver (see the MPI_Bcast sketch below)
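In MPI, the broadcast described above is MPI_Bcast: the root argument plays the role of "whoAmI", identifying the sender, and every other process receives into the same buffer. A minimal sketch with illustrative parameter values:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    double params[3] = {0.0, 0.0, 0.0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {   /* the root (sender) fills in the data to broadcast */
        params[0] = 1.0; params[1] = 2.5; params[2] = 0.01;
    }
    /* Every process makes the same call; the root sends, the rest receive. */
    MPI_Bcast(params, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    printf("rank %d got params[2] = %g\n", rank, params[2]);
    MPI_Finalize();
    return 0;
}
```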
9. Other Operations
- Collective Operations
  - Broadcast
  - Reduction
  - Scan (prefix reduction; see the sketch below)
  - All-to-All
  - Gather/Scatter
- Support for Topologies
- Buffering issues: optimizing message passing
- Data-type support
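Of the operations listed above, Scan is perhaps the least familiar: MPI_Scan gives each rank the reduction of the contributions from ranks 0 through itself (an inclusive prefix reduction). A minimal prefix-sum sketch with illustrative values:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, mine, prefix;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    mine = rank + 1;  /* each rank contributes an illustrative value */

    /* Inclusive prefix sum: rank r receives 1 + 2 + ... + (r + 1). */
    MPI_Scan(&mine, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("rank %d: prefix sum = %d\n", rank, prefix);
    MPI_Finalize();
    return 0;
}
```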