Title: Blocking and nonblocking communication
1. Blocking and nonblocking communication
2. Point-to-Point Communication
- References
- Quinn text, Ch. 6 and Ch. 9
- MPI standard: http://www-unix.mcs.anl.gov/mpi/
- Cornell Theory Center web site: http://www.tc.cornell.edu
- My web page: http://my.fit.edu/jim/
3. Point-to-Point Communication Review
[Figure: before communication, process 10 holds X[0] = 3.14, X[1] = 3.14; process 32 holds Z[0] = 0.0, Z[1] = 0.0, Z[2] = 0.0]
4. Point-to-Point Communication Syntax
MPI_Status status;
int sender = 10, receiver = 32;
int count = 2, recv_allocated_size = 3, tag = 0;
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
if (my_rank == sender)
    MPI_Send(X, count, MPI_DOUBLE, receiver, tag, MPI_COMM_WORLD);
if (my_rank == receiver)
    MPI_Recv(Z, recv_allocated_size, MPI_DOUBLE, sender, tag, MPI_COMM_WORLD, &status);
5. Point-to-Point Communication Syntax
MPI_Status status;
int sender = 10, receiver = 32;
int count = 2, recv_allocated_size = 3, tag = 0;
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
if (my_rank == sender)
    MPI_Send(X, count, MPI_DOUBLE, receiver, tag, MPI_COMM_WORLD);
if (my_rank == receiver)
    MPI_Recv(Z, recv_allocated_size, MPI_DOUBLE, sender, tag, MPI_COMM_WORLD, &status);
- NOTE: MPI_Send and MPI_Recv are often inside if blocks
- The same source code is executed on all processes
- Only process 10 should call MPI_Send
- Only process 32 should call MPI_Recv
- All other processes pass through the if blocks doing nothing
6. Point-to-Point Communication Syntax
- MPI matches Send and Recv based on envelope information
- Identical tags and communicators
- The MPI_Recv 4th argument (source) matches the rank of the process calling MPI_Send
- The MPI_Send 4th argument (destination) matches the rank of the process calling MPI_Recv
MPI_Status status;
int sender = 10, receiver = 32;
int count = 2, recv_allocated_size = 3, tag = 0;
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
if (my_rank == sender)
    MPI_Send(X, count, MPI_DOUBLE, receiver, tag, MPI_COMM_WORLD);
if (my_rank == receiver)
    MPI_Recv(Z, recv_allocated_size, MPI_DOUBLE, sender, tag, MPI_COMM_WORLD, &status);
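The fragment above omits setup and teardown. A minimal complete program built around it might look like the following sketch; the array sizes, the printf, and the MPI_Init/MPI_Finalize framing are assumptions added here for illustration, and the job must be run with at least 33 processes so that ranks 10 and 32 exist.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int my_rank, sender = 10, receiver = 32;
    int count = 2, recv_allocated_size = 3, tag = 0;
    double X[2] = {3.14, 3.14};    /* data on the sender */
    double Z[3] = {0.0, 0.0, 0.0}; /* receive buffer on the receiver */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank == sender)
        MPI_Send(X, count, MPI_DOUBLE, receiver, tag, MPI_COMM_WORLD);
    if (my_rank == receiver) {
        MPI_Recv(Z, recv_allocated_size, MPI_DOUBLE, sender, tag,
                 MPI_COMM_WORLD, &status);
        printf("Z[0] = %f, Z[1] = %f\n", Z[0], Z[1]);
    }

    MPI_Finalize();
    return 0;
}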
7. Point-to-Point Communication Details
- MPI_Recv allows wildcards
- MPI_ANY_SOURCE (for source, i.e. the sender)
- MPI_ANY_TAG (for tag)
- The MPI_Status structure contains at least 3 fields
- status -> MPI_SOURCE
- status -> MPI_TAG
- status -> MPI_ERROR
- The received message may be smaller than the 2nd argument of MPI_Recv. The actual size of the received message can be computed with
- MPI_Get_count(&status, datatype, &count) (see the example below)
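As a brief illustration (the buffer size of 100 and the variable names are made up for this sketch), a receiver can accept a message from any rank with any tag and then ask how many doubles actually arrived:

double Z[100];            /* receive buffer, assumed size */
MPI_Status status;
int received_count;

/* Accept a message from any sender with any tag */
MPI_Recv(Z, 100, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);

/* Who sent it, with what tag, and how many doubles arrived? */
printf("source = %d, tag = %d\n", status.MPI_SOURCE, status.MPI_TAG);
MPI_Get_count(&status, MPI_DOUBLE, &received_count);
printf("received %d doubles\n", received_count);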
8. Point-to-Point Communication Review
[Figure: after communication, process 10 still holds X[0] = 3.14, X[1] = 3.14; process 32 now holds Z[0] = 3.14, Z[1] = 3.14, Z[2] = 0.0]
9. MPI_Send and MPI_Recv are blocking
- MPI_Send does not complete until it is safe for the sender to modify (or delete) the message contents.
- MPI_Recv does not complete until it is safe for the receiver to use the message contents.
if (my_rank == sender) {
    MPI_Send(X, count, MPI_DOUBLE, receiver, tag, MPI_COMM_WORLD);
    X[0] = 0.0;
}
if (my_rank == receiver) {
    MPI_Recv(Z, recv_allocated_size, MPI_DOUBLE, sender, tag, MPI_COMM_WORLD, &status);
    two_pi = Z[0] + Z[1];
}
10. Blocking Send
[Timeline: MPI_Send posted on processor 10]
- Processor 10 sends a "ready to send" message to processor 32
11. Blocking Send
[Timeline: MPI_Send on processor 10, MPI_Recv on processor 32]
- Processor 32 sends a "ready to receive" message to processor 10
12. Blocking Send
[Timeline: MPI_Send, data transfer, MPI_Recv]
- A copy of processor 10's data is now sent to processor 32.
- Both processors are free to continue.
13. Blocking Send
[Timeline: MPI_Send, data transfer, MPI_Recv]
- Think about the time line if MPI_Recv comes first.
14. Blocking Send
- Typically there is less wasted time if the receiver is waiting by the phone, ready for the incoming call.
15. Blocking Send: Ring pass
up_proc = myid + 1;
if (up_proc == num_procs) up_proc = 0;
down_proc = myid - 1;
if (down_proc == -1) down_proc = num_procs - 1;
MPI_Send(X, count, MPI_DOUBLE, up_proc, tag_up, MPI_COMM_WORLD);
MPI_Recv(Y, count, MPI_DOUBLE, down_proc, tag_up, MPI_COMM_WORLD, &status);
16. Blocking Send: Ring Pass
[Figure: ring of processes 0-5; P0 sends the contents of his X, P1 receives the contents into his Y]
- Think of it as passing a note to your left neighbor
- Lean to the left and get your neighbor's attention
- Pass the note
17. Deadlock in Ring Pass
[Figure: ring of processes 0-5; P0 wants to send the contents of his X]
- Think of it as passing a note to your left neighbor
- Can't get your left neighbor's attention
- He's busy trying to get his left neighbor's attention
18. Deadlock in Ring Pass
[Figure: ring of processes 0-5; P0 wants to send the contents of his X]
- Think of it as passing a note to your left neighbor
- Can't get your left neighbor's attention
- He's busy trying to get his left neighbor's attention
A process is in a deadlock state if it is blocked waiting for a condition that will never become true.
19. Ring Pass with 2 processes
[Figure: processes 0 and 1; P0 sends the contents of his X and receives contents into his Y; P1 sends the contents of his X and receives contents into his Y]
- Think of it as exchanging notes with your neighbor
- Say to your neighbor, "I have a note for you."
- Your neighbor says, "OK, give me the note."
- Pass the note
20. Deadlock in Ring Pass with 2 processes
[Figure: processes 0 and 1; both want to send the contents of their X]
- Think of it as exchanging notes with your neighbor
- Both of you are stuck saying, "I have a note for you."
- Neither of you gets to the part of saying, "OK, give me the note."
21. Blocking Send: potential deadlock problem with num_procs = 2
up_proc = myid + 1;
if (up_proc == num_procs) up_proc = 0;
down_proc = myid - 1;
if (down_proc == -1) down_proc = num_procs - 1;
MPI_Send(X, count, MPI_DOUBLE, up_proc, tag_up, MPI_COMM_WORLD);
MPI_Recv(Y, count, MPI_DOUBLE, down_proc, tag_up, MPI_COMM_WORLD, &status);
[Figure: P0 and P1 both blocked in MPI_Send]
22. Blocking Send: potential deadlock problem
up_proc = myid + 1;
if (up_proc == num_procs) up_proc = 0;
down_proc = myid - 1;
if (down_proc == -1) down_proc = num_procs - 1;
MPI_Send(X, count, MPI_DOUBLE, up_proc, tag, MPI_COMM_WORLD);
MPI_Recv(Y, count, MPI_DOUBLE, down_proc, tag, MPI_COMM_WORLD, &status);
- Think how to remove the potential deadlock by reordering the send and recv.
23. Ring Pass with reordered send/recv to avoid deadlock
if (myid % 2 == 0) {
    MPI_Send(X, count, MPI_DOUBLE, up_proc, tag, MPI_COMM_WORLD);
    MPI_Recv(Y, count, MPI_DOUBLE, down_proc, tag, MPI_COMM_WORLD, &status);
} else {
    MPI_Recv(Y, count, MPI_DOUBLE, down_proc, tag, MPI_COMM_WORLD, &status);
    MPI_Send(X, count, MPI_DOUBLE, up_proc, tag, MPI_COMM_WORLD);
}
24. Ring Pass without deadlock, first step
[Figure: ring of processes 0-5; P0 sends the contents of his X, P1 receives the contents into his Y]
25. Ring Pass without deadlock, second step
[Figure: ring of processes 0-5]
26. MPI implementations of MPI_Send and MPI_Recv typically provide buffering
[Figure: MPI_Send, buffer on P0, MPI_Recv]
- When P0 makes the send call, the message contents are immediately copied (buffered) into temporary storage (a buffer) in P0's local memory
- When P1 makes the receive call, the data is sent from the buffer to P1's local memory
27. MPI implementations of MPI_Send and MPI_Recv typically provide buffering
[Figure: MPI_Send, data transfer, buffer on P1, MPI_Recv]
- The buffer may be in the receiver's memory instead (or in addition)
- When P0 makes the send call, the message contents are immediately sent and copied (buffered) into temporary storage (a buffer) in P1's local memory
- When P1 makes the receive call, the data is copied from the buffer to P1's local memory
28. However implemented, MPI_Send and MPI_Recv are blocking
- MPI_Send does not complete until it is safe for the sender to modify (or delete) the message contents.
- MPI_Recv does not complete until it is safe for the receiver to use the message contents.
29. However implemented, MPI_Send and MPI_Recv are blocking
- When MPI_Send completes, you should not assume that the data has been received on the other processor. It is safe to modify your copy of the data, but the message may be buffered somewhere or in transit.
- MPI_Ssend (S for synchronous): the send completes only after the other processor has executed the matching MPI_Recv and the data transfer has begun - not necessarily completed. Like MPI_Send, it is then safe to modify the sender's copy of the data. Same arguments as MPI_Send (see the sketch below).
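Because MPI_Ssend takes exactly the same argument list as MPI_Send, switching to a synchronous send is a one-line change. A minimal sketch, reusing the variable names from the earlier slides:

/* Synchronous send: returns only after the matching MPI_Recv has been
   posted and the data transfer has begun */
if (my_rank == sender)
    MPI_Ssend(X, count, MPI_DOUBLE, receiver, tag, MPI_COMM_WORLD);
if (my_rank == receiver)
    MPI_Recv(Z, recv_allocated_size, MPI_DOUBLE, sender, tag,
             MPI_COMM_WORLD, &status);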
30. However implemented, MPI_Send and MPI_Recv are blocking
- The original ring pass code likely will not deadlock because of buffering; however:
- Buffering is not required by the MPI standard
- It is not good practice to rely on buffering
- Buffering large messages reduces available memory
- Other MPI functions avoid the potential deadlock completely
31. MPI_Sendrecv: a combined MPI_Send and MPI_Recv
MPI_Sendrecv(void *send_data, int send_count, MPI_Datatype send_type,
             int destination, int send_tag,
             void *recv_data, int recv_count, MPI_Datatype recv_type,
             int source, int recv_tag,
             MPI_Comm comm, MPI_Status *status)
- Like MPI_Send followed by MPI_Recv (or vice versa), but MPI can avoid deadlock.
32. MPI_Sendrecv: no deadlock problem
up_proc = myid + 1;
if (up_proc == num_procs) up_proc = 0;
down_proc = myid - 1;
if (down_proc == -1) down_proc = num_procs - 1;
MPI_Sendrecv(X, count, MPI_DOUBLE, up_proc, tag,
             Y, count, MPI_DOUBLE, down_proc, tag,
             MPI_COMM_WORLD, &status);
33. MPI_Sendrecv_replace allows received data to overwrite sent data
MPI_Sendrecv_replace(void *data, int count, MPI_Datatype type,
                     int destination, int send_tag,
                     int source, int recv_tag,
                     MPI_Comm comm, MPI_Status *status)
- Requires some sort of buffering, or temporary storage, in the MPI implementation (a ring-pass sketch follows below).
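A sketch of the ring pass using MPI_Sendrecv_replace, so that the incoming data overwrites X in place; the neighbor computation is the same as on the earlier ring-pass slides:

/* Ring pass where the received data overwrites X in place */
up_proc = myid + 1;
if (up_proc == num_procs) up_proc = 0;
down_proc = myid - 1;
if (down_proc == -1) down_proc = num_procs - 1;

MPI_Sendrecv_replace(X, count, MPI_DOUBLE,
                     up_proc, tag,     /* destination and send tag */
                     down_proc, tag,   /* source and receive tag */
                     MPI_COMM_WORLD, &status);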
34. Nonblocking communication
- Break the message passing event into two parts (think of send for now)
- Initiating the send. This can happen as soon as I have the data required to be sent.
- Finalizing the send. This needs to happen before I modify (or delete) the data to be sent.
35. Nonblocking communication
- Break the message passing event into two parts (think of send for now)
- Initiating the send. This can happen as soon as I have the data required to be sent.
- Finalizing the send. This needs to happen before I modify (or delete) the data to be sent.
- Allows computations in what might otherwise be dead times.
36. Nonblocking communication
- Break the message passing event into two parts (think of send for now)
- Initiating the send. This can happen as soon as I have the data required to be sent.
- Finalizing the send. This needs to happen before I modify (or delete) the data to be sent.
- Allows computations in what might otherwise be dead times.
Initialize sending X to P32
Do some computations changing stuff other than X
Finalize sending of X to P32
Mess with X
37. Nonblocking communication
- Break the message passing event into two parts (think of send for now)
- Initiating the send. This can happen as soon as I have the data required to be sent.
- Finalizing the send. This needs to happen before I modify (or delete) the data to be sent.
- Allows computations in what might otherwise be dead times.
Overlapping of communication and computation can occur on some machines. This allows the cost of communication to be hidden.
Initialize sending X to P32
Do some computations changing stuff other than X
Finalize sending of X to P32
Mess with X
38. MPI_Isend: a nonblocking send (I for immediate return)
MPI_Isend(void *data, int count, MPI_Datatype type,
          int destination, int tag, MPI_Comm comm,
          MPI_Request *request)
- This call initializes, or posts, the send.
- request is a handle to an opaque object.
39. MPI_Wait finalizes a nonblocking send identified by its handle
MPI_Wait(MPI_Request *request, MPI_Status *status)
- This call completes when the send identified by the request handle is done.
- request is returned as MPI_REQUEST_NULL.
- status has no meaning for Isend.
- The data should not be written to between the Isend and Wait calls.
40. MPI_Wait finalizes a nonblocking send identified by its handle
MPI_Wait(MPI_Request *request, MPI_Status *status)
- This call completes when the send identified by the request handle is done.
- request is returned as MPI_REQUEST_NULL.
- status has no meaning for Isend.
- The data should not be written to between the Isend and Wait calls.
- MPI Standard: you cannot read the data either
- MPI Standard: you cannot have two pending Isends with the same data
41. MPI_Irecv: a nonblocking receive (I for immediate return)
MPI_Irecv(void *data, int count, MPI_Datatype type,
          int source, int tag, MPI_Comm comm,
          MPI_Request *request)
- This call initializes, or posts, the receive.
- request is a handle to an opaque object.
42. MPI_Wait finalizes a nonblocking recv identified by its handle
MPI_Wait(MPI_Request *request, MPI_Status *status)
- This call completes when the receive identified by the request handle is done.
- request is returned as MPI_REQUEST_NULL.
- status has the same meaning as with MPI_Recv.
- The data should not be read from or written to between the Irecv and Wait calls.
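Putting MPI_Isend, MPI_Irecv, and MPI_Wait together in the pattern sketched on the earlier nonblocking-communication slides: a minimal sketch in which rank 10 sends X to rank 32. The do_other_work() call is a hypothetical placeholder for computation that touches neither X nor Z; the other names follow the earlier slides.

MPI_Request request;
MPI_Status status;

if (my_rank == sender) {
    /* Post the send as soon as X holds the data to be sent */
    MPI_Isend(X, count, MPI_DOUBLE, receiver, tag,
              MPI_COMM_WORLD, &request);
    do_other_work();               /* must not write to X here */
    MPI_Wait(&request, &status);   /* now it is safe to modify X */
    X[0] = 0.0;
}
if (my_rank == receiver) {
    /* Post the receive as soon as Z is available as a buffer */
    MPI_Irecv(Z, recv_allocated_size, MPI_DOUBLE, sender, tag,
              MPI_COMM_WORLD, &request);
    do_other_work();               /* must not read or write Z here */
    MPI_Wait(&request, &status);   /* now Z holds the received data */
}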
43. Blocking Send
[Timeline: MPI_Send, data transfer, MPI_Recv]
- Solid lines: useful computation
44. Nonblocking communication: best case
[Timeline: MPI_Isend / MPI_Wait on the sender, MPI_Irecv / MPI_Wait on the receiver, data transfer overlapped with computation]
- Best case requires
- Simultaneous communication and computation.
- Cost of communication is completely hidden.
45. Nonblocking communication: nearly best case
[Timeline: MPI_Isend / MPI_Wait on the sender, MPI_Irecv / MPI_Wait on the receiver, data transfer]
- Hardware (or software) may not allow the processor to do computations simultaneously with the actual data transfer.
46. Nonblocking communication: nearly best case
[Timeline: MPI_Isend / MPI_Wait on the sender, MPI_Irecv / MPI_Wait on the receiver, data transfer]
- Useful computations gained over the blocking case
- Sender: useful computation possible between the Isend call and writing to the data
47. Nonblocking communication: nearly best case
[Timeline: MPI_Isend / MPI_Wait on the sender, MPI_Irecv / MPI_Wait on the receiver, data transfer]
- Useful computations gained over the blocking case
- Sender: useful computation possible between the Isend call and writing to the data
48. Nonblocking communication: nearly best case
[Timeline: MPI_Isend / MPI_Wait on the sender, MPI_Irecv / MPI_Wait on the receiver, data transfer]
- Useful computations gained over the blocking case
- Receiver: useful computation possible between the Irecv call and reading the data
49. Nonblocking communication: nearly best case
[Timeline: MPI_Isend / MPI_Wait on the sender, MPI_Irecv / MPI_Wait on the receiver, data transfer]
- Useful computations gained over the blocking case
- Receiver: useful computation possible between the Irecv call and reading the data
50. Isend (or Irecv) immediately followed by Wait is the same as a blocking Send (or Recv)
[Timeline: MPI_Isend + MPI_Wait on the sender, data transfer, MPI_Irecv + MPI_Wait on the receiver]
- Sender: no computation between the send call and writing to the data location
- Receiver: no computation between the recv call and reading from the data location
51. Can mix blocking and nonblocking calls
[Timeline: MPI_Isend / MPI_Wait paired with a blocking MPI_Recv; a blocking MPI_Send paired with MPI_Irecv / MPI_Wait]
52. Nonblocking communication removes potential deadlock from ring pass
up_proc = myid + 1;
if (up_proc == num_procs) up_proc = 0;
down_proc = myid - 1;
if (down_proc == -1) down_proc = num_procs - 1;
MPI_Isend(X, count, MPI_DOUBLE, down_proc, tag,
          MPI_COMM_WORLD, &send_request);
MPI_Irecv(Y, count, MPI_DOUBLE, up_proc, tag,
          MPI_COMM_WORLD, &recv_request);
/* Here we could do anything except read from Y or write to X */
MPI_Wait(&send_request, &status);
MPI_Wait(&recv_request, &status);
53. Best Practices
- Use nonblocking communication if there is useful work to do
- Post sends and receives as early as possible
- Sender: as soon as the data to send is available
- Receiver: as soon as storage for the data is available
- Do waits as late as possible
- Sender: just before the data will be overwritten or deleted
- Receiver: just before the data will be read
(A sketch applying these rules follows below.)
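A hedged sketch of these rules in the ring-pass setting; work_not_touching_X_or_Y() and use(Y) are hypothetical placeholders for the computation in a real code, and the neighbor setup is as on the earlier slides.

/* Post the receive as soon as the buffer Y is available,
   and the send as soon as X holds the data to be sent */
MPI_Irecv(Y, count, MPI_DOUBLE, down_proc, tag,
          MPI_COMM_WORLD, &recv_request);
MPI_Isend(X, count, MPI_DOUBLE, up_proc, tag,
          MPI_COMM_WORLD, &send_request);

work_not_touching_X_or_Y();      /* overlap with communication */

/* Wait as late as possible: just before Y is read ... */
MPI_Wait(&recv_request, &status);
use(Y);

/* ... and just before X is overwritten */
MPI_Wait(&send_request, &status);
X[0] = 0.0;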
54. Best Practices
- Post Irecvs before Isends to potentially avoid buffering.
[Timeline: MPI_Irecv posted]
55. Best Practices
- Post Irecvs before Isends to potentially avoid buffering.
[Timeline: MPI_Irecv posted before the matching MPI_Isend]