Title: Message Passing Programming
1. Message Passing Programming
- Carl Tropper
- Department of Computer Science
2. Generalities
- Structure of message passing programs
- Asynchronous
- SPMD (single program, multiple data) model
- First look at building blocks
- Send and receive operations
- Blocking and non-blocking versions
- MPI (the standard) specifics
3. Send and receive operations
- send(void *sendbuf, int nelems, int dest)
- receive(void *recvbuf, int nelems, int source)
- nelems = number of elements to be sent/received
- Example: P0 sends data to P1

    P0                     P1
    a = 100;               receive(&a, 1, 0);
    send(&a, 1, 1);        printf("%d\n", a);
    a = 0;

- Good semantics: P1 receives 100
- Bad semantics: P1 receives 0
- Can happen because the DMA and communication hardware may return before 100 is actually sent
4. Blocking message passing operations
- Handshake:
- Sender asks to send, receiver agrees to receive
- Sender sends, receiver receives
- Implemented without buffers
5. Deadlocks in blocking, non-buffered send/receive

    P0                       P1
    send(&b, 1, 1);          send(&b, 1, 0);
    receive(&a, 1, 1);       receive(&a, 1, 0);

- Both sends wait for both receives: DEADLOCK
- Can cure this deadlock by reversing the send and receive ops (e.g. in P1). Ugh
6. Send/receive: blocking buffered
- Buffers used at sender and receiver
- Dedicated comm hardware at both ends
- If the sender has no buffer but the receiver does, this can still be made to work
7. The impact of finite buffer space

    P0                              P1
    for (i = 0; i < 1000; i++) {    for (i = 0; i < 1000; i++) {
        produce_data(&a);               receive(&a, 1, 0);
        send(&a, 1, 1);                 consume_data(&a);
    }                               }

- If the consumer consumes more slowly than the producer produces, the buffers fill up and the sender is eventually forced to block
8. Deadlocks in buffered send/receive

    P0                       P1
    receive(&a, 1, 1);       receive(&a, 1, 0);
    send(&b, 1, 1);          send(&b, 1, 0);

- The receive operation still blocks, so deadlock can happen
- Moral of the story: still have to be careful to avoid deadlocks!
9. Non-blocking optimizations
- Blocking is safe but wastes time
- Alternative: use non-blocking operations plus a check-status operation
- The process is free to perform any operation which does not depend upon completion of the send or receive
- Once the transfer is complete, the data can be used
10. Non-blocking optimization (figure)
11. Possibilities (figure)
12. MPI
- Vendors all had their own message passing libraries
- Enter MPI: the standard for C and Fortran
- Defines the syntax and semantics of a core set of library routines (125 are defined)
13. Core set of MPI routines
- MPI_Init        Initializes MPI
- MPI_Finalize    Terminates MPI
- MPI_Comm_size   Determines the number of processes
- MPI_Comm_rank   Determines the label of the calling process
- MPI_Send        Sends a message
- MPI_Recv        Receives a message
14. Starting and terminating MPI
- int MPI_Init(int *argc, char ***argv)
- int MPI_Finalize()
- MPI_Init is called prior to other MPI routines; it initializes the MPI environment
- MPI_Finalize is called at the end; it does clean-up
- The return code for both is MPI_SUCCESS
- mpi.h contains MPI constants and data structures
15. Communicators
- Communication domain: processes which communicate with one another
- Communicators are variables of type MPI_Comm. They store information about communication domains
- MPI_COMM_WORLD: the default communicator, containing all processes in the program
16. Communicators
- int MPI_Comm_size(MPI_Comm comm, int *size)
- int MPI_Comm_rank(MPI_Comm comm, int *rank)
- MPI_Comm_size returns the number of processes in the communicator
- The rank identifies each process; ranks run from 0 to size - 1
17. Hello world

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int npes, myrank;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &npes);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        printf("From process %d out of %d, Hello World!\n", myrank, npes);
        MPI_Finalize();
        return 0;
    }

- Prints hello world from each process
18. Sending/receiving messages
- int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
- int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
- MPI_Send sends the data in buf: count entries of type datatype
- The length of the message is specified as a number of entries, not as a number of bytes, for portability
- dest = rank of the destination process, tag = type of message
- MPI_ANY_SOURCE: any process can be the source
- MPI_ANY_TAG: same for the tag
- For MPI_Recv, buf is where the received message is stored
- count and datatype specify the length of the buffer (see the sketch below)
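A minimal sketch of a send/receive pair, assuming exactly two processes (e.g. mpirun -np 2); the payload value and tag are illustrative:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int myrank, value = 0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        if (myrank == 0) {
            value = 42;                                  /* illustrative payload */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (myrank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("Process 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }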
19. Datatypes
- MPI Datatype C Datatype
- MPI_CHAR signed char
- MPI_SHORT signed short int
- MPI_INT signed int
- MPI_LONG signed long int
- MPI_UNSIGNED_CHAR unsigned char
- MPI_UNSIGNED_SHORT unsigned short int
- MPI_UNSIGNED unsigned int
- MPI_UNSIGNED_LONG unsigned long int
- MPI_FLOAT float
- MPI_DOUBLE double
- MPI_LONG_DOUBLE long double
- MPI_BYTE (no corresponding C datatype)
- MPI_PACKED (no corresponding C datatype)
20. Sending/receiving
- The status variable is used to get information about the Recv operation
- In C the status is stored in an MPI_Status struct:

    typedef struct MPI_Status {
        int MPI_SOURCE;
        int MPI_TAG;
        int MPI_ERROR;
    };

- int MPI_Get_count(MPI_Status *status, MPI_Datatype datatype, int *count) returns the number of received entries in the count variable (see the sketch below)
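A sketch of reading the status fields after a wildcard receive, assuming two processes; the buffer size (100), message length (10), and tag are arbitrary:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int myrank, count, data[100] = {0};
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        if (myrank == 0) {
            /* Accept a message from any process, with any tag. */
            MPI_Recv(data, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            MPI_Get_count(&status, MPI_INT, &count);
            printf("Got %d ints from rank %d, tag %d\n",
                   count, status.MPI_SOURCE, status.MPI_TAG);
        } else if (myrank == 1) {
            /* Send fewer entries than the receiver's capacity. */
            MPI_Send(data, 10, MPI_INT, 0, 7, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }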
21. Sending/receiving
- MPI_Recv is a blocking receive op: it returns only after the message is in the buffer
- MPI_Send has two possible implementations:
- Returns after the matching MPI_Recv is issued and the message is sent
- Returns after MPI_Send has copied the message into a buffer; does not wait for MPI_Recv to be issued
22. Avoiding deadlocks
- Process 0 sends 2 messages to process 1, which receives them in reverse order

    int a[10], b[10], myrank;
    MPI_Status status;
    ...
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank == 0) {
        MPI_Send(a, 10, MPI_INT, 1, 1, MPI_COMM_WORLD);
        MPI_Send(b, 10, MPI_INT, 1, 2, MPI_COMM_WORLD);
    }
    else if (myrank == 1) {
        MPI_Recv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
        MPI_Recv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
    }
    ...

- If MPI_Send is implemented by blocking until the matching receive is issued, then process 0 blocks in its first send waiting for a receive with tag 1, while process 1 blocks in its first receive waiting for a send with tag 2. Each waits for the other. Deadlock
- Solution: the programmer has to match the order in which sends and receives are issued. Ugh!
23. Circular deadlock
- Process i sends a message to process i + 1 and receives a message from process i - 1 (mod npes)

    int a[10], b[10], npes, myrank;
    MPI_Status status;
    ...
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Send(a, 10, MPI_INT, (myrank + 1) % npes, 1, MPI_COMM_WORLD);
    MPI_Recv(b, 10, MPI_INT, (myrank - 1 + npes) % npes, 1, MPI_COMM_WORLD, &status);
    ...

- Deadlock if MPI_Send is blocking
- Works if MPI_Send is implemented using buffering
- The same pattern with just two processes sending to each other also deadlocks when sends block
24. Break the circle
- Break the circle into odd and even processes
- Odds first send and then receive
- Evens first receive and then send

    int a[10], b[10], npes, myrank;
    MPI_Status status;
    ...
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank % 2 == 1) {
        MPI_Send(a, 10, MPI_INT, (myrank + 1) % npes, 1, MPI_COMM_WORLD);
        MPI_Recv(b, 10, MPI_INT, (myrank - 1 + npes) % npes, 1, MPI_COMM_WORLD, &status);
    }
    else {
        MPI_Recv(b, 10, MPI_INT, (myrank - 1 + npes) % npes, 1, MPI_COMM_WORLD, &status);
        MPI_Send(a, 10, MPI_INT, (myrank + 1) % npes, 1, MPI_COMM_WORLD);
    }
    ...
25. Break the circle, part II
- A simultaneous send/receive operation:

    int MPI_Sendrecv(void *sendbuf, int sendcount, MPI_Datatype senddatatype,
                     int dest, int sendtag,
                     void *recvbuf, int recvcount, MPI_Datatype recvdatatype,
                     int source, int recvtag,
                     MPI_Comm comm, MPI_Status *status)

- Problem: the send and receive need to use disjoint buffers
- Solution: the MPI_Sendrecv_replace function; the received data replaces the sent data in the same buffer (see the sketch below)

    int MPI_Sendrecv_replace(void *buf, int count, MPI_Datatype datatype,
                             int dest, int sendtag,
                             int source, int recvtag,
                             MPI_Comm comm, MPI_Status *status)
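A sketch of the ring exchange from the previous slides redone with MPI_Sendrecv_replace; no odd/even ordering trick is needed because the combined call cannot deadlock against itself:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int npes, myrank, token;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &npes);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        token = myrank;   /* each process starts with its own rank */

        /* Send to the right neighbor and receive from the left neighbor
           in one call; the received value overwrites token. */
        MPI_Sendrecv_replace(&token, 1, MPI_INT,
                             (myrank + 1) % npes, 1,           /* dest, sendtag */
                             (myrank - 1 + npes) % npes, 1,    /* source, recvtag */
                             MPI_COMM_WORLD, &status);

        printf("Process %d now holds %d\n", myrank, token);
        MPI_Finalize();
        return 0;
    }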
26. Topologies and embedding
- MPI sees processes arranged linearly, while parallel programs communicate naturally in higher dimensions
- Need to map the linear ordering to these topologies
- Several mappings are possible
27. Solution
- MPI helps the programmer arrange processes in topologies by supplying library routines
- The mapping to processors is done by the libraries without programmer intervention
28. Cartesian topologies
- Arbitrary topologies can be specified, but most topologies are grid-like (Cartesian)
- MPI_Cart_create takes the processes in comm_old and builds a virtual process topology:

    int MPI_Cart_create(MPI_Comm comm_old, int ndims, int *dims, int *periods,
                        int reorder, MPI_Comm *comm_cart)

- The new topology information is in comm_cart
- All processes belonging to comm_old need to call MPI_Cart_create
- ndims = number of dimensions, dims = size of each dimension
- The array periods specifies whether there are wraparound connections: periods[i] is true if there is a wrap in dimension i
- reorder = true allows MPI to reorder the processes (see the sketch below)
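A sketch of creating a 2D wraparound (torus) topology; the 4x4 shape is an assumption, so it expects 16 processes:

    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        MPI_Comm comm_2d;
        int dims[2]    = {4, 4};   /* assumed 4x4 grid: run with 16 processes */
        int periods[2] = {1, 1};   /* wraparound in both dimensions (a torus) */

        MPI_Init(&argc, &argv);

        /* reorder = 1 lets MPI renumber ranks to match the physical machine. */
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &comm_2d);

        /* ... communicate using comm_2d ... */

        MPI_Comm_free(&comm_2d);
        MPI_Finalize();
        return 0;
    }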
29. Process naming
- Sources and destinations are specified by ranks in MPI
- MPI_Cart_rank takes coordinates in the array coords and returns the rank
- MPI_Cart_coord takes the rank of a process and returns its Cartesian coordinates in the array coords (maxdims is the dimension of the coords array)

    int MPI_Cart_coord(MPI_Comm comm_cart, int rank, int maxdims, int *coords)
    int MPI_Cart_rank(MPI_Comm comm_cart, int *coords, int *rank)
30. Shifting
- Want to shift data along a dimension of the topology?

    int MPI_Cart_shift(MPI_Comm comm_cart, int dir, int s_step,
                       int *rank_source, int *rank_dest)

- dir = dimension of the shift
- s_step = size of the shift (see the sketch below)
31. Overlapping communication with computation
- Blocking sends/receives do not permit overlap; need non-blocking functions
- MPI_Isend starts a send, but returns before it is complete
- MPI_Irecv starts a receive, but returns before the data is received
- MPI_Test tests whether a non-blocking operation has completed
- MPI_Wait waits until a non-blocking operation finishes (don't say it)
32. More non-blocking

    int MPI_Isend(void *buf, int count, MPI_Datatype datatype, int dest,
                  int tag, MPI_Comm comm, MPI_Request *request)
    int MPI_Irecv(void *buf, int count, MPI_Datatype datatype, int source,
                  int tag, MPI_Comm comm, MPI_Request *request)

- Both allocate a request object and return a handle to it in request
- The object is used as an argument by MPI_Test and MPI_Wait to identify the op whose status
- we want to query or
- we want to wait for

    int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status)

- flag is set to true if the op has finished

    int MPI_Wait(MPI_Request *request, MPI_Status *status)
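A sketch of the overlap idea, assuming two processes: start the transfer, do unrelated work, then wait before touching the buffer:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int myrank, a = 0, other = 0;
        MPI_Request request;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        if (myrank == 0) {
            a = 100;
            MPI_Isend(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
            /* ... compute on anything that does NOT touch a ... */
            MPI_Wait(&request, &status);   /* a may be reused after this */
        } else if (myrank == 1) {
            MPI_Irecv(&other, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
            /* ... compute on anything that does NOT read other ... */
            MPI_Wait(&request, &status);   /* other now holds the data */
            printf("Received %d\n", other);
        }

        MPI_Finalize();
        return 0;
    }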
33. Avoiding deadlocks
- Using non-blocking operations removes most deadlocks
- The following code is not safe:

    int a[10], b[10], myrank;
    MPI_Status status;
    ...
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank == 0) {
        MPI_Send(a, 10, MPI_INT, 1, 1, MPI_COMM_WORLD);
        MPI_Send(b, 10, MPI_INT, 1, 2, MPI_COMM_WORLD);
    }
    else if (myrank == 1) {
        MPI_Recv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
        MPI_Recv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
    }

- Replacing either the send or the receive operations with non-blocking counterparts fixes this deadlock (sketch below)
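A sketch of the repaired version: process 1 posts both receives with MPI_Irecv, so neither blocks while the sends arrive, and then waits for both with MPI_Waitall (another standard MPI call):

    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int a[10] = {0}, b[10] = {0}, myrank;   /* contents elided */
        MPI_Request requests[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        if (myrank == 0) {
            MPI_Send(a, 10, MPI_INT, 1, 1, MPI_COMM_WORLD);
            MPI_Send(b, 10, MPI_INT, 1, 2, MPI_COMM_WORLD);
        } else if (myrank == 1) {
            /* Post both receives immediately; the order no longer matters. */
            MPI_Irecv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &requests[0]);
            MPI_Irecv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &requests[1]);
            MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }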
34. Collective ops: communication and computation
- Collective communication ops (broadcast, reduction, etc.) are implemented by MPI
- All of the ops take a communicator argument which defines the group of processes involved in the op
- The ops don't act as barriers: a process can go past the call without waiting for the other processes, but counting on that is not a great idea
35. The collective ops
- The barrier synchronization operation:

    int MPI_Barrier(MPI_Comm comm)

- The call returns after all processes have called the function
- The one-to-all broadcast operation:

    int MPI_Bcast(void *buf, int count, MPI_Datatype datatype, int source,
                  MPI_Comm comm)

- source sends the data in buf to all processes in the group
- The all-to-one reduction operation:

    int MPI_Reduce(void *sendbuf, void *recvbuf, int count,
                   MPI_Datatype datatype, MPI_Op op, int target, MPI_Comm comm)

- Combines the elements in sendbuf of each process using op and returns the combined values in recvbuf of the process with rank target
- If count is more than one, op is applied to each element (see the sketch below)
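A sketch combining the two collectives: rank 0 broadcasts a parameter, every process computes a local contribution, and the contributions are summed back on rank 0 (the parameter value and local computation are stand-ins):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int myrank, n = 0, local, total;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        if (myrank == 0)
            n = 1000;                  /* illustrative parameter */

        /* Everyone gets n from rank 0. */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

        local = n + myrank;            /* stand-in for a real local computation */

        /* Sum the local values; only rank 0 receives the result. */
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (myrank == 0)
            printf("Total = %d\n", total);

        MPI_Finalize();
        return 0;
    }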
36. Pre-defined reduction operations

    Operation    Meaning                  Allowed datatypes
    MPI_MAX      Maximum                  C integers and floating point
    MPI_MIN      Minimum                  C integers and floating point
    MPI_SUM      Sum                      C integers and floating point
    MPI_PROD     Product                  C integers and floating point
    MPI_LAND     Logical AND              C integers
    MPI_BAND     Bit-wise AND             C integers and byte
    MPI_LOR      Logical OR               C integers
    MPI_BOR      Bit-wise OR              C integers and byte
    MPI_LXOR     Logical XOR              C integers
    MPI_BXOR     Bit-wise XOR             C integers and byte
    MPI_MAXLOC   Max value and location   Data pairs
    MPI_MINLOC   Min value and location   Data pairs
37. More reduction
- The operation MPI_MAXLOC combines pairs of values (vi, li) and returns the pair (v, l) such that v is the maximum among all the vi and l is the corresponding li (if there is more than one maximum, the smallest such li)
- MPI_MINLOC does the same, except for the minimum value of vi
- It is possible to define your own ops
38. Reduction
- MPI datatypes are needed for the data pairs used with MPI_MAXLOC and MPI_MINLOC
- MPI_2INT corresponds to the C datatype pair of ints
- The MPI_Allreduce op returns the result to all processes (see the sketch below):

    int MPI_Allreduce(void *sendbuf, void *recvbuf, int count,
                      MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
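A sketch of an all-reduce over MPI_2INT pairs with MPI_MAXLOC: every process learns the maximum value and the rank that holds it (the per-process values are made up):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int myrank;
        struct { int value; int rank; } in, out;   /* matches MPI_2INT layout */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        in.value = (myrank * 37) % 11;   /* illustrative per-process value */
        in.rank  = myrank;               /* the "location" half of the pair */

        MPI_Allreduce(&in, &out, 1, MPI_2INT, MPI_MAXLOC, MPI_COMM_WORLD);

        printf("Process %d: max is %d at rank %d\n", myrank, out.value, out.rank);
        MPI_Finalize();
        return 0;
    }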
39. Prefix sum
- The prefix sum op is done via MPI_Scan: store the partial sum up to node i on node i

    int MPI_Scan(void *sendbuf, void *recvbuf, int count,
                 MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)

- In the end, the receive buffer of the process with rank i stores the reduction of the send buffers of nodes 0 to i (see the sketch below)
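A sketch of a prefix sum over the ranks themselves: process i ends up with 0 + 1 + ... + i in its receive buffer:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int myrank, prefix;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        /* Inclusive scan: prefix on rank i is the sum over ranks 0..i. */
        MPI_Scan(&myrank, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        printf("Process %d: prefix sum = %d\n", myrank, prefix);
        MPI_Finalize();
        return 0;
    }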
40. Gather ops
- The gather operation is performed in MPI using

    int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype senddatatype,
                   void *recvbuf, int recvcount, MPI_Datatype recvdatatype,
                   int target, MPI_Comm comm)

- Each process sends the data in sendbuf to target
- The data is stored in recvbuf in rank order: data from process i is stored at offset i * sendcount of recvbuf
- MPI also provides the MPI_Allgather function, in which the data are gathered at all the processes:

    int MPI_Allgather(void *sendbuf, int sendcount, MPI_Datatype senddatatype,
                      void *recvbuf, int recvcount, MPI_Datatype recvdatatype,
                      MPI_Comm comm)

- These ops assume that all of the arrays are the same size; there are versions of these routines which allow different-sized arrays (see the sketch below)
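A sketch of a gather with sendcount = 1: every process contributes its rank, and rank 0 receives them in rank order (only the target needs to allocate recvbuf):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int npes, myrank, *all = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &npes);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        if (myrank == 0)
            all = malloc(npes * sizeof(int));   /* recvbuf on the target only */

        MPI_Gather(&myrank, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (myrank == 0) {
            for (int i = 0; i < npes; i++)
                printf("slot %d holds %d\n", i, all[i]);
            free(all);
        }

        MPI_Finalize();
        return 0;
    }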
41. Scatter op
- MPI_Scatter:

    int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype senddatatype,
                    void *recvbuf, int recvcount, MPI_Datatype recvdatatype,
                    int source, MPI_Comm comm)

- The source process sends a different part of sendbuf to each process. The received data is stored in recvbuf
- A version of MPI_Scatter allows different amounts of data to be sent to different processes
42. All-to-all op
- The all-to-all personalized communication operation is performed by

    int MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype senddatatype,
                     void *recvbuf, int recvcount, MPI_Datatype recvdatatype,
                     MPI_Comm comm)

- Each process sends a different part of sendbuf to each other process: process i receives the sendcount elements starting at offset i * sendcount
- The received data is stored in the recvbuf array
- A vector variant exists which allows different amounts of data to be sent
43. Groups and communicators
- Might want to split a group of processes into subgroups

    int MPI_Comm_split(MPI_Comm comm, int color, int key, MPI_Comm *newcomm)

- Has to be called by all processes in the group
- Partitions the processes in communicator comm into disjoint subgroups
- color and key are input parameters
- color defines the subgroups
- key defines the rank within a subgroup
- A new communicator is returned for each group in the newcomm parameter (see the sketch below)
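A sketch splitting MPI_COMM_WORLD into row subgroups, assuming a logical grid four processes wide; color selects the row and key (the old rank) orders processes within it:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int myrank, rowrank;
        MPI_Comm rowcomm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        /* Processes with the same color land in the same subgroup;
           within it, key determines the new rank order. */
        MPI_Comm_split(MPI_COMM_WORLD, myrank / 4, myrank, &rowcomm);

        MPI_Comm_rank(rowcomm, &rowrank);
        printf("World rank %d -> row %d, row rank %d\n",
               myrank, myrank / 4, rowrank);

        MPI_Comm_free(&rowcomm);
        MPI_Finalize();
        return 0;
    }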
44. MPI_Comm_split (figure)
45. Splitting Cartesian topologies
- MPI_Cart_sub splits a Cartesian topology into smaller topologies:

    int MPI_Cart_sub(MPI_Comm comm_cart, int *keep_dims, MPI_Comm *comm_subcart)

- The array keep_dims tells us how to break up the topology
- The original topology is stored in comm_cart; comm_subcart stores the new topologies
46. Splitting Cartesian topologies
- The array keep_dims tells us how: if keep_dims[i] is true, then the ith dimension is retained in the new sub-topology (see the sketch below)
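A sketch of this rule on an assumed 4x4 grid (16 processes): keep_dims = {0, 1} drops dimension 0 and keeps dimension 1, so each process lands in a communicator for its own row:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        MPI_Comm comm_2d, comm_row;
        int dims[2] = {4, 4}, periods[2] = {0, 0};   /* assumed 4x4 grid */
        int keep_dims[2] = {0, 1};   /* keep only dimension 1: one comm per row */
        int rowrank;

        MPI_Init(&argc, &argv);
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &comm_2d);

        /* Each process gets the sub-topology containing its own row. */
        MPI_Cart_sub(comm_2d, keep_dims, &comm_row);
        MPI_Comm_rank(comm_row, &rowrank);
        printf("Rank within my row communicator: %d\n", rowrank);

        MPI_Comm_free(&comm_row);
        MPI_Comm_free(&comm_2d);
        MPI_Finalize();
        return 0;
    }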