Title: Collective Communication
2 Collective Communication
- Collective communication is defined as communication that involves a group of processes
- More restrictive than point-to-point communication:
  - The data sent must match the data received in type and amount
  - All processes involved make the same call; there is no tag to match the operation
  - A process involved can return only when its part of the operation completes
  - Blocking communication only
  - Standard mode only
3 Collective Functions
- Barrier synchronization across all group members
- Broadcast from one member to all members of a group
- Gather data from all group members to one member
- Scatter data from one member to all members of a group
- A variation on Gather where all members of the group receive the result (allgather)
- Scatter/Gather data from all members to all members of a group, also called complete exchange or all-to-all (alltoall)
- Global reduction operations such as sum, max, min, or user-defined functions, where the result is returned to all group members, and a variation where the result is returned to only one member
- A combined reduction and scatter operation
- Scan across all members of a group (also called prefix)
6 Collective Functions: MPI_BARRIER
- Blocks the caller until all group members have called it
- Returns at any process only after all group members have entered the call
- C
  - int MPI_Barrier(MPI_Comm comm)
- Input Parameter
  - comm: communicator (handle)
- Fortran
  - MPI_BARRIER(COMM, IERROR)
  - INTEGER COMM, IERROR
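Below is a minimal C sketch (not from the original slides) showing a typical use of MPI_Barrier to separate two phases of work; the printed messages are illustrative only.

/* Every process does some local work, then waits at MPI_Barrier so that
 * no process starts the next phase before all have finished the first. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int my_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    printf("Proc %d: phase 1 done\n", my_rank);

    /* No process passes this point until all processes have reached it. */
    MPI_Barrier(MPI_COMM_WORLD);

    printf("Proc %d: phase 2 starts\n", my_rank);

    MPI_Finalize();
    return 0;
}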
7 Collective Functions: MPI_BCAST
- Broadcasts a message from the process with rank root to all processes of the group, itself included
- C
  - int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
- Input Parameters
  - count: number of entries in buffer (integer)
  - datatype: data type of buffer (handle)
  - root: rank of broadcast root (integer)
  - comm: communicator (handle)
- Input/Output Parameter
  - buffer: starting address of buffer (choice)
- Fortran
  - MPI_BCAST(BUFFER, COUNT, DATATYPE, ROOT, COMM, IERROR)
  - <type> BUFFER(*)
  - INTEGER COUNT, DATATYPE, ROOT, COMM, IERROR
8 Collective Functions: MPI_BCAST
[Figure: before the call only the root's buffer holds A; after MPI_BCAST every process's buffer holds A.]
9 Collective Functions: MPI_GATHER
- Each process (root process included) sends the contents of its send buffer to the root process
- The root process receives the messages and stores them in rank order
- C
  - int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)
- Input Parameters
  - sendbuf: starting address of send buffer (choice)
  - sendcount: number of elements in send buffer (integer)
  - sendtype: data type of send buffer elements (handle)
  - recvcount: number of elements for any single receive (integer, significant only at root)
  - recvtype: data type of recv buffer elements (significant only at root) (handle)
  - root: rank of receiving process (integer)
  - comm: communicator (handle)
10 Collective Functions: MPI_GATHER
- Output Parameter
  - recvbuf: address of receive buffer (choice, significant only at root)
- Fortran
  - MPI_GATHER(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR)
  - <type> SENDBUF(*), RECVBUF(*)
  - INTEGER SENDCOUNT, SENDTYPE, RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR
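A minimal C sketch (not from the original slides): each rank contributes one integer and the root receives them in rank order. The fixed 64-entry receive buffer is an assumption made to keep the example short.

/* Gather one int per process into rank 0 (the root), in rank order. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int my_rank, np, sendval;
    int recvbuf[64];              /* assumes at most 64 processes */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    sendval = 100 + my_rank;      /* each rank sends a distinct value */

    /* recvbuf is significant only at the root (rank 0). */
    MPI_Gather(&sendval, 1, MPI_INT, recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (my_rank == 0) {
        int i;
        for (i = 0; i < np; i++)
            printf("recvbuf[%d] = %d\n", i, recvbuf[i]);
    }

    MPI_Finalize();
    return 0;
}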
11 Collective Functions: MPI_GATHER
[Figure: ranks 0-3 each hold one block (A, B, C, D); after MPI_GATHER the root's receive buffer holds A, B, C, D in rank order.]
12 Collective Functions: MPI_SCATTER
- MPI_SCATTER is the inverse operation to MPI_GATHER
- C
  - int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)
- Input Parameters
  - sendbuf: address of send buffer (choice, significant only at root)
  - sendcount: number of elements sent to each process (integer, significant only at root)
  - sendtype: data type of send buffer elements (significant only at root) (handle)
  - recvcount: number of elements in receive buffer (integer)
  - recvtype: data type of receive buffer elements (handle)
  - root: rank of sending process (integer)
  - comm: communicator (handle)
13 Collective Functions: MPI_SCATTER
- Output Parameter
  - recvbuf: address of receive buffer (choice)
- Fortran
  - MPI_SCATTER(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR)
  - <type> SENDBUF(*), RECVBUF(*)
  - INTEGER SENDCOUNT, SENDTYPE, RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR
14 Collective Functions: MPI_SCATTER
[Figure: the root's send buffer holds A, B, C, D; after MPI_SCATTER rank 0 receives A, rank 1 receives B, rank 2 receives C, and rank 3 receives D.]
15 Collective Functions: MPI_ALLGATHER
- MPI_ALLGATHER can be thought of as MPI_GATHER, but where all processes receive the result, instead of just the root
- The block of data sent from the jth process is received by every process and placed in the jth block of the buffer recvbuf
- C
  - int MPI_Allgather(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm)
- Input Parameters
  - sendbuf: starting address of send buffer (choice)
  - sendcount: number of elements in send buffer (integer)
  - sendtype: data type of send buffer elements (handle)
  - recvcount: number of elements received from any process (integer)
  - recvtype: data type of receive buffer elements (handle)
  - comm: communicator (handle)
16 Collective Functions: MPI_ALLGATHER
- Output Parameter
  - recvbuf: address of receive buffer (choice)
- Fortran
  - MPI_ALLGATHER(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNT, RECVTYPE, COMM, IERROR)
  - <type> SENDBUF(*), RECVBUF(*)
  - INTEGER SENDCOUNT, SENDTYPE, RECVCOUNT, RECVTYPE, COMM, IERROR
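A minimal C sketch (not from the original slides): like the gather example, but every rank ends up with the full rank-ordered array. The 64-entry buffer is again an assumption.

/* Every process contributes one int and every process receives all of them. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int my_rank, np, sendval, i;
    int recvbuf[64];              /* assumes at most 64 processes */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    sendval = my_rank * my_rank;  /* arbitrary per-rank value */

    /* Like MPI_Gather, but the gathered array appears on every rank. */
    MPI_Allgather(&sendval, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    for (i = 0; i < np; i++)
        printf("Proc %d: recvbuf[%d] = %d\n", my_rank, i, recvbuf[i]);

    MPI_Finalize();
    return 0;
}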
17 Collective Functions: MPI_ALLGATHER
[Figure: ranks 0-3 each contribute one block (A, B, C, D); after MPI_ALLGATHER every process's receive buffer holds A, B, C, D.]
18 Collective Functions: MPI_ALLTOALL
- Extension of MPI_ALLGATHER to the case where each process sends distinct data to each of the receivers. The jth block sent from process i is received by process j and is placed in the ith block of recvbuf
- C
  - int MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm)
- Input Parameters
  - sendbuf: starting address of send buffer (choice)
  - sendcount: number of elements sent to each process (integer)
  - sendtype: data type of send buffer elements (handle)
  - recvcount: number of elements received from any process (integer)
  - recvtype: data type of receive buffer elements (handle)
  - comm: communicator (handle)
19 Collective Functions: MPI_ALLTOALL
- Output Parameter
  - recvbuf: address of receive buffer (choice)
- Fortran
  - MPI_ALLTOALL(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNT, RECVTYPE, COMM, IERROR)
  - <type> SENDBUF(*), RECVBUF(*)
  - INTEGER SENDCOUNT, SENDTYPE, RECVCOUNT, RECVTYPE, COMM, IERROR
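A minimal C sketch (not from the original slides): each rank prepares one integer per destination, and after the call entry i of recvbuf on rank j holds the value that rank i prepared for rank j. The 64-entry buffers are an assumption.

/* Complete exchange: block i of sendbuf goes to rank i. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int my_rank, np, i;
    int sendbuf[64], recvbuf[64]; /* assumes at most 64 processes */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    for (i = 0; i < np; i++)
        sendbuf[i] = 100 * my_rank + i;   /* block i is destined for rank i */

    MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    for (i = 0; i < np; i++)
        printf("Proc %d: recvbuf[%d] = %d\n", my_rank, i, recvbuf[i]);

    MPI_Finalize();
    return 0;
}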
20 Collective Functions: MPI_ALLTOALL
[Figure: ranks 0-3 each hold four blocks; after MPI_ALLTOALL block j of rank i has moved to block i of rank j (complete exchange).]
21 Collective Functions: MPI_REDUCE
- MPI_REDUCE combines the elements provided in the input buffer (sendbuf) of each process in the group, using the operation op, and returns the combined value in the output buffer (recvbuf) of the process with rank root
- C
  - int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)
- Input Parameters
  - sendbuf: address of send buffer (choice)
  - count: number of elements in send buffer (integer)
  - datatype: data type of elements of send buffer (handle)
  - op: reduce operation (handle)
  - root: rank of root process (integer)
  - comm: communicator (handle)
- Output Parameter
  - recvbuf: address of receive buffer (choice, significant only at root)
22 Collective Functions: MPI_REDUCE
- Fortran
  - MPI_REDUCE(SENDBUF, RECVBUF, COUNT, DATATYPE, OP, ROOT, COMM, IERROR)
  - <type> SENDBUF(*), RECVBUF(*)
  - INTEGER COUNT, DATATYPE, OP, ROOT, COMM, IERROR
- Predefined Reduce Operations
  - MPI_MAX: maximum
  - MPI_MIN: minimum
  - MPI_SUM: sum
  - MPI_PROD: product
  - MPI_LAND: logical and
  - MPI_BAND: bit-wise and
  - MPI_LOR: logical or
  - MPI_BOR: bit-wise or
  - MPI_LXOR: logical xor
  - MPI_BXOR: bit-wise xor
  - MPI_MAXLOC: max value and location (returns the max and an integer, which is the rank storing the max value)
  - MPI_MINLOC: min value and location
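A minimal C sketch (not from the original slides): each rank contributes its rank number and root 0 receives the sum, using the predefined MPI_SUM operation.

/* Reduce one int per rank with MPI_SUM; the result lands only on root 0. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int my_rank, np, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    MPI_Reduce(&my_rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (my_rank == 0)
        printf("Sum of ranks 0..%d = %d\n", np - 1, sum);

    MPI_Finalize();
    return 0;
}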
23 Collective Functions: MPI_REDUCE
[Figure: ranks 0-3 hold send buffers beginning with A, E, I, M. With root = 1, the root's receive buffer gets A∘E∘I∘M (∘ denotes the reduction operation op) in its first element; if count = 2, the second element of the array holds B∘F∘J∘N.]
24 Collective Functions: MPI_ALLREDUCE
- Variants of the reduce operations where the result is returned to all processes in the group
- The all-reduce operations can be implemented as a reduce followed by a broadcast; however, a direct implementation can lead to better performance
- C
  - int MPI_Allreduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
25 Collective Functions: MPI_ALLREDUCE
- Input Parameters
  - sendbuf: starting address of send buffer (choice)
  - count: number of elements in send buffer (integer)
  - datatype: data type of elements of send buffer (handle)
  - op: operation (handle)
  - comm: communicator (handle)
- Output Parameter
  - recvbuf: starting address of receive buffer (choice)
- Fortran
  - MPI_ALLREDUCE(SENDBUF, RECVBUF, COUNT, DATATYPE, OP, COMM, IERROR)
  - <type> SENDBUF(*), RECVBUF(*)
  - INTEGER COUNT, DATATYPE, OP, COMM, IERROR
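A minimal C sketch (not from the original slides): every rank contributes one value and every rank receives the global maximum (MPI_MAX).

/* Same effect as MPI_Reduce followed by MPI_Bcast, but in a single call. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int my_rank, localval, globalmax;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    localval = 10 * my_rank + 3;   /* arbitrary per-rank value */

    MPI_Allreduce(&localval, &globalmax, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);

    printf("Proc %d: global max = %d\n", my_rank, globalmax);

    MPI_Finalize();
    return 0;
}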
26 Collective Functions: MPI_ALLREDUCE
[Figure: ranks 0-3 hold A, E, I, M; after MPI_ALLREDUCE every rank's receive buffer holds the combined value A∘E∘I∘M.]
27 Collective Functions: MPI_REDUCE_SCATTER
- Variants of the reduce operations where the result is scattered to all processes in the group on return
- MPI_REDUCE_SCATTER first does an element-wise reduction on a vector of count = Σi recvcounts[i] elements in the send buffer defined by sendbuf, count and datatype
- Next, the resulting vector of results is split into n disjoint segments, where n is the number of members in the group. Segment i contains recvcounts[i] elements
- The ith segment is sent to process i and stored in the receive buffer defined by recvbuf, recvcounts[i] and datatype
- The MPI_REDUCE_SCATTER routine is functionally equivalent to an MPI_REDUCE operation with count equal to the sum of recvcounts[i], followed by an MPI_SCATTERV with sendcounts equal to recvcounts. However, a direct implementation may run faster
28 Collective Functions: MPI_REDUCE_SCATTER
- C
  - int MPI_Reduce_scatter(void *sendbuf, void *recvbuf, int *recvcounts, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
- Input Parameters
  - sendbuf: starting address of send buffer (choice)
  - recvcounts: integer array specifying the number of elements in the result distributed to each process. The array must be identical on all calling processes
  - datatype: data type of elements of input buffer (handle)
  - op: operation (handle)
  - comm: communicator (handle)
- Output Parameter
  - recvbuf: starting address of receive buffer (choice)
- Fortran
  - MPI_REDUCE_SCATTER(SENDBUF, RECVBUF, RECVCOUNTS, DATATYPE, OP, COMM, IERROR)
  - <type> SENDBUF(*), RECVBUF(*)
  - INTEGER RECVCOUNTS(*), DATATYPE, OP, COMM, IERROR
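A minimal C sketch (not from the original slides), assuming exactly 4 processes and recvcounts = {1, 1, 1, 1}: the 4-element vectors are summed element-wise and element i of the result is delivered to rank i.

/* Run with: mpirun -np 4 <program> */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int my_rank, i;
    int sendbuf[4], recvbuf[1];
    int recvcounts[4] = {1, 1, 1, 1};   /* one result element per rank */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    for (i = 0; i < 4; i++)
        sendbuf[i] = 10 * my_rank + i;  /* arbitrary per-rank vector */

    /* Element-wise sum across ranks, then scatter of the summed vector. */
    MPI_Reduce_scatter(sendbuf, recvbuf, recvcounts, MPI_INT, MPI_SUM,
                       MPI_COMM_WORLD);

    printf("Proc %d: my segment of the summed vector = %d\n",
           my_rank, recvbuf[0]);

    MPI_Finalize();
    return 0;
}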
29 Collective Functions: MPI_REDUCE_SCATTER
[Figure: with recvcounts = {1, 2, 0, 1}, rank 0 receives A∘E∘I∘M, rank 1 receives B∘F∘J∘N and C∘G∘K∘O, rank 2 receives nothing, and rank 3 receives D∘H∘L∘P.]
30 Collective Functions: MPI_SCAN
- MPI_SCAN is used to perform a prefix reduction on data distributed across the group. The operation returns, in the receive buffer of the process with rank i, the reduction of the values in the send buffers of processes with ranks 0, ..., i (inclusive). The types of operations supported, their semantics, and the constraints on send and receive buffers are the same as for MPI_REDUCE
- C
  - int MPI_Scan(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
31 Collective Functions: MPI_SCAN
- Input Parameters
  - sendbuf: starting address of send buffer (choice)
  - count: number of elements in input buffer (integer)
  - datatype: data type of elements of input buffer (handle)
  - op: operation (handle)
  - comm: communicator (handle)
- Output Parameter
  - recvbuf: starting address of receive buffer (choice)
- Fortran
  - MPI_SCAN(SENDBUF, RECVBUF, COUNT, DATATYPE, OP, COMM, IERROR)
  - <type> SENDBUF(*), RECVBUF(*)
  - INTEGER COUNT, DATATYPE, OP, COMM, IERROR
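A minimal C sketch (not from the original slides): an inclusive prefix sum of the ranks, so rank i receives 0 + 1 + ... + i.

/* Inclusive prefix reduction: rank i gets the reduction over ranks 0..i. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int my_rank, prefix;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    MPI_Scan(&my_rank, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("Proc %d: prefix sum = %d\n", my_rank, prefix);

    MPI_Finalize();
    return 0;
}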
32 Collective Functions: MPI_SCAN
[Figure: with send values A, E, I, M on ranks 0-3, rank 0 receives A, rank 1 receives A∘E, rank 2 receives A∘E∘I, and rank 3 receives A∘E∘I∘M.]
33 Example: MPI_BCAST
- Demonstrates how to use MPI_BCAST to distribute an array to the other processes
34 Example: MPI_BCAST (C)
/* root broadcasts the array to all processes */
#include <stdio.h>
#include <mpi.h>
#define SIZE 10

int main(int argc, char *argv[])
{
    int my_rank;                 /* the rank of each proc */
    int array[SIZE];
    int root = 0;                /* the rank of root */
    int i;
    MPI_Comm comm = MPI_COMM_WORLD;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(comm, &my_rank);

    if (my_rank == root)         /* root fills the array with distinct values */
    {
        for (i = 0; i < SIZE; i++)
        {
            array[i] = i;
        }
    }
35 Example: MPI_BCAST (C)
    else                         /* all other processes start with zeros */
    {
        for (i = 0; i < SIZE; i++)
        {
            array[i] = 0;
        }
    }

    printf("Proc %d (Before Broadcast) ", my_rank);
    for (i = 0; i < SIZE; i++)
    {
        printf("%d ", array[i]);
    }
    printf("\n");

    MPI_Bcast(array, SIZE, MPI_INT, root, comm);

    printf("Proc %d (After Broadcast) ", my_rank);
    for (i = 0; i < SIZE; i++)
    {
        printf("%d ", array[i]);
    }
    printf("\n");

    MPI_Finalize();
    return 0;
}
36 Example: MPI_BCAST (Fortran)
C     root broadcasts the array to all processes
      PROGRAM main
      INCLUDE 'mpif.h'
      INTEGER SIZE
      PARAMETER (SIZE = 10)
      INTEGER my_rank, ierr, root, i
      INTEGER array(SIZE)
      INTEGER comm
      INTEGER arraysize

      root = 0
      comm = MPI_COMM_WORLD
      arraysize = SIZE
37 Example: MPI_BCAST (Fortran)
      CALL MPI_INIT(ierr)
      CALL MPI_COMM_RANK(comm, my_rank, ierr)

      IF (my_rank .EQ. 0) THEN
         DO i = 1, SIZE
            array(i) = i
         END DO
      ELSE
         DO i = 1, SIZE
            array(i) = 0
         END DO
      END IF

      WRITE(6, *) "Proc ", my_rank, " (Before Broadcast)",
     &            (array(i), i = 1, SIZE)

      CALL MPI_BCAST(array, arraysize, MPI_INTEGER, root, comm, ierr)

      WRITE(6, *) "Proc ", my_rank, " (After Broadcast)",
     &            (array(i), i = 1, SIZE)

      CALL MPI_FINALIZE(ierr)
      END
38 Case Study 1: MPI_SCATTER and MPI_REDUCE
- The master distributes (scatters) an array across the processes. Each process adds up its elements, and the partial sums are then combined in the master through a reduction operation
- Step 1
  - Proc 0 initializes an array of 16 integers
  - Proc 0: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
39 Case Study 1: MPI_SCATTER and MPI_REDUCE
- Step 2
  - Scatter the array among all processes
  - Proc 0: 1, 2, 3, 4
  - Proc 1: 5, 6, 7, 8
  - Proc 2: 9, 10, 11, 12
  - Proc 3: 13, 14, 15, 16
- Step 3
  - Each process sums its four elements
40 Case Study 1: MPI_SCATTER and MPI_REDUCE
- Step 4
  - Reduce the partial sums to Proc 0
  - Proc 0: total sum (1 + 2 + ... + 16 = 136)
- C
  - mpi_scatter_reduce01.c
  - Compilation: mpicc mpi_scatter_reduce01.c -o mpi_scatter_reduce01
  - Run: mpirun -np 4 mpi_scatter_reduce01
- Fortran
  - mpi_scatter_reduce01.f
  - Compilation: mpif77 mpi_scatter_reduce01.f -o mpi_scatter_reduce01
  - Run: mpirun -np 4 mpi_scatter_reduce01
- A sketch of the C program follows below
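The course file mpi_scatter_reduce01.c itself is not reproduced in these slides; the following is a hypothetical sketch of the four steps above, assuming exactly 4 processes.

/* Scatter a 16-element array over 4 processes, sum locally, reduce to root. */
#include <stdio.h>
#include <mpi.h>
#define N 16

int main(int argc, char *argv[])
{
    int my_rank, i;
    int array[N];              /* significant only at root */
    int chunk[N / 4];          /* 4 elements per process */
    int partial = 0, total = 0;
    int root = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Step 1: root initializes the 16-element array 1..16. */
    if (my_rank == root)
        for (i = 0; i < N; i++)
            array[i] = i + 1;

    /* Step 2: scatter 4 elements to each of the 4 processes. */
    MPI_Scatter(array, N / 4, MPI_INT, chunk, N / 4, MPI_INT,
                root, MPI_COMM_WORLD);

    /* Step 3: each process sums its own chunk. */
    for (i = 0; i < N / 4; i++)
        partial += chunk[i];

    /* Step 4: reduce the partial sums into the root. */
    MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, root, MPI_COMM_WORLD);

    if (my_rank == root)
        printf("Total sum = %d\n", total);   /* expect 136 */

    MPI_Finalize();
    return 0;
}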
41 Case Study 2: MPI_GATHER (Matrix Multiplication)
- Algorithm
  - Multiply a 4x4 matrix A by a 4x1 vector x to form the product
  - Each process stores one row of A and a single entry of x
  - Use 4 gather operations to place a full copy of x in each process, then perform the multiplications
42 Case Study 2: MPI_GATHER (Matrix Multiplication)
- Step 1
  - Initialization (each process holds one row of A and one entry of x)
  - Proc 0: 1 5 9 13, 17
  - Proc 1: 2 6 10 14, 18
  - Proc 2: 3 7 11 15, 19
  - Proc 3: 4 8 12 16, 20
- Step 2
  - Perform MPI_GATHER 4 times to gather the vector x into each process
  - Proc 0: 1 5 9 13, 17 18 19 20
  - Proc 1: 2 6 10 14, 17 18 19 20
  - Proc 2: 3 7 11 15, 17 18 19 20
  - Proc 3: 4 8 12 16, 17 18 19 20
43 Case Study 2: MPI_GATHER (Matrix Multiplication)
- Step 3
  - Perform the multiplication (inner product of the local row with x)
  - Proc 0: 1x17 + 5x18 + 9x19 + 13x20 = 538
  - Proc 1: 2x17 + 6x18 + 10x19 + 14x20 = 612
  - Proc 2: 3x17 + 7x18 + 11x19 + 15x20 = 686
  - Proc 3: 4x17 + 8x18 + 12x19 + 16x20 = 760
- Step 4
  - Gather all the processes' inner products into the master process and display the result
44 Case Study 2: MPI_GATHER (Matrix Multiplication)
- C
  - mpi_gather01.c
  - Compilation: mpicc mpi_gather01.c -o mpi_gather01
  - Run: mpirun -np 4 mpi_gather01
- Fortran
  - mpi_gather01.f
  - Compilation: mpif77 mpi_gather01.f -o mpi_gather01
  - Run: mpirun -np 4 mpi_gather01
- A sketch of the C program follows below
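The course file mpi_gather01.c itself is not reproduced in these slides; the following is a hypothetical sketch of the algorithm above, assuming exactly 4 processes.

/* Each rank owns one row of the 4x4 matrix A and one entry of the vector x. */
#include <stdio.h>
#include <mpi.h>
#define N 4

int main(int argc, char *argv[])
{
    int my_rank, i, root;
    int row[N], xlocal, x[N], inner, result[N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Step 1: initialize the local row of A and the local entry of x
     * with the values used on the slides. */
    for (i = 0; i < N; i++)
        row[i] = my_rank + 1 + 4 * i;   /* Proc 0: 1 5 9 13, Proc 1: 2 6 10 14, ... */
    xlocal = 17 + my_rank;              /* Proc 0: 17, Proc 1: 18, ... */

    /* Step 2: gather the full vector x into every process
     * (4 gathers, one per root, as on the slides). */
    for (root = 0; root < N; root++)
        MPI_Gather(&xlocal, 1, MPI_INT, x, 1, MPI_INT, root, MPI_COMM_WORLD);

    /* Step 3: each process computes the inner product of its row with x. */
    inner = 0;
    for (i = 0; i < N; i++)
        inner += row[i] * x[i];

    /* Step 4: gather all inner products into the master and print. */
    MPI_Gather(&inner, 1, MPI_INT, result, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (my_rank == 0)
        for (i = 0; i < N; i++)
            printf("A*x[%d] = %d\n", i, result[i]);   /* expect 538 612 686 760 */

    MPI_Finalize();
    return 0;
}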
45 END