Transcript and Presenter's Notes

Title: Collective Communication


1
Collective Communication
2
Collective Communication
  • Collective communication is communication that involves a group of processes
  • It is more restrictive than point-to-point communication:
  • The type and amount of data sent must match the type and amount of data received
  • All processes involved make the same single call; there is no tag to match the operation
  • A process may return only when its part of the operation is complete
  • Blocking communication only
  • Standard mode only

3
Collective Functions
  • Barrier synchronization across all group members
  • Broadcast from one member to all members of a
    group
  • Gather data from all group members to one member
  • Scatter data from one member to all members of a
    group
  • A variation on Gather where all members of the
    group receive the result. (allgather)
  • Scatter/Gather data from all members to all
    members of a group (also called complete exchange
    or all-to-all) (alltoall)
  • Global reduction operations such as sum, max,
    min, or user-defined functions, where the result
    is returned to all group members and a variation
    where the result is returned to only one member
  • A combined reduction and scatter operation
  • Scan across all members of a group (also called
    prefix)

4
Collective Functions
5
Collective Functions
6
Collective Functions MPI_BARRIER
  • blocks the caller until all group members have
    called it
  • returns at any process only after all group
    members have entered the call
  • C
  • int MPI_Barrier(MPI_Comm comm )
  • Input Parameter
  • comm communicator (handle)
  • Fortran
  • MPI_BARRIER(COMM, IERROR)
  • INTEGER COMM, IERROR
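A minimal usage sketch in C (not from the original slides; the printed messages are illustrative): no process can pass the barrier until every process in the communicator has reached it.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("Proc %d: before the barrier\n", rank);

    /* No rank can pass this point until every rank has called MPI_Barrier. */
    MPI_Barrier(MPI_COMM_WORLD);

    printf("Proc %d: after the barrier\n", rank);

    MPI_Finalize();
    return 0;
}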

7
Collective Functions MPI_BCAST
  • broadcasts a message from the process with rank
    root to all processes of the group, itself
    included
  • C
  • int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
  • Input Parameters
  • count number of entries in buffer (integer)
  • datatype data type of buffer (handle)
  • root rank of broadcast root (integer)
  • comm communicator (handle)
  • Input / Output Parameter
  • buffer starting address of buffer (choice)
  • Fortran
  • MPI_BCAST(BUFFER, COUNT, DATATYPE, ROOT, COMM,
    IERROR)
  • <type> BUFFER(*)
  • INTEGER COUNT, DATATYPE, ROOT, COMM, IERROR

8
Collective Functions MPI_BCAST
(Diagram: before the call only the root holds A; after MPI_BCAST every process in the group holds A.)
9
Collective Functions MPI_GATHER
  • Each process (root process included) sends the
    contents of its send buffer to the root process.
  • The root process receives the messages and stores
    them in rank order
  • C
  • int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)
  • Input Parameters
  • sendbuf starting address of send buffer (choice)
  • sendcount number of elements in send buffer
    (integer)
  • sendtype data type of send buffer elements
    (handle)
  • recvcount number of elements for any single
    receive (integer, significant only at root)
  • recvtype data type of recv buffer elements
    (significant only at root) (handle)
  • root rank of receiving process (integer)
  • comm communicator (handle)

10
Collective Functions MPI_GATHER
  • Output Parameter
  • recvbuf address of receive buffer (choice,
    significant only at root)
  • Fortran
  • MPI_GATHER(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF,
    RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR)
  • <type> SENDBUF(*), RECVBUF(*)
  • INTEGER SENDCOUNT, SENDTYPE, RECVCOUNT, RECVTYPE,
    ROOT, COMM, IERROR
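A short C sketch of a typical call (not from the original slides; names and values are illustrative, and the receive buffer assumes at most 64 processes): every rank, including the root, contributes one integer and rank 0 receives them in rank order.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, i;
    int sendval, recvbuf[64];       /* recvbuf only matters at the root */
    int root = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sendval = 100 + rank;           /* each rank's single contribution */

    /* The root receives size blocks of one int, stored in rank order. */
    MPI_Gather(&sendval, 1, MPI_INT, recvbuf, 1, MPI_INT, root, MPI_COMM_WORLD);

    if (rank == root)
        for (i = 0; i < size; i++)
            printf("from rank %d: %d\n", i, recvbuf[i]);

    MPI_Finalize();
    return 0;
}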

11
Collective Functions MPI_GATHER
(Diagram: processes 0-3 hold A, B, C, D respectively; after MPI_GATHER the root holds A B C D in rank order.)
12
Collective Functions MPI_SCATTER
  • MPI_SCATTER is the inverse operation to
    MPI_GATHER
  • C
  • int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)
  • Input Parameters
  • sendbuf address of send buffer (choice,
    significant only at root)
  • sendcount number of elements sent to each
    process (integer, significant only at root)
  • sendtype data type of send buffer elements
    (significant only at root) (handle)
  • recvcount number of elements in receive buffer
    (integer)
  • recvtype data type of receive buffer elements
    (handle)
  • root rank of sending process (integer)
  • comm communicator (handle)

13
Collective Functions MPI_SCATTER
  • Output Parameter
  • recvbuf address of receive buffer (choice)
  • Fortran
  • MPI_SCATTER(SENDBUF, SENDCOUNT, SENDTYPE,
    RECVBUF, RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR)
  • <type> SENDBUF(*), RECVBUF(*)
  • INTEGER SENDCOUNT, SENDTYPE, RECVCOUNT, RECVTYPE,
    ROOT, COMM, IERROR
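A short illustrative C sketch (values and names are not from the slides; the send buffer assumes at most 64 processes): the root's buffer is split into one-element blocks and block i is delivered to rank i.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, i;
    int sendbuf[64], recvval;       /* sendbuf is significant only at the root */
    int root = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == root)
        for (i = 0; i < size; i++)
            sendbuf[i] = 100 + i;   /* block i is destined for rank i */

    /* Block i of the root's send buffer lands in rank i's recvval. */
    MPI_Scatter(sendbuf, 1, MPI_INT, &recvval, 1, MPI_INT, root, MPI_COMM_WORLD);

    printf("rank %d received %d\n", rank, recvval);

    MPI_Finalize();
    return 0;
}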

14
Collective Functions MPI_SCATTER
(Diagram: the root holds A B C D; after MPI_SCATTER process i receives block i, so processes 0-3 receive A, B, C, D respectively.)
15
Collective Functions MPI_ALLGATHER
  • MPI_ALLGATHER can be thought of as MPI_GATHER,
    but where all processes receive the result,
    instead of just the root.
  • The block of data sent from the jth process is received by every process and placed in the jth block of the buffer recvbuf.
  • C
  • int MPI_Allgather(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm)
  • Input Parameters
  • sendbuf starting address of send buffer (choice)
  • sendcount number of elements in send buffer
    (integer)
  • sendtype data type of send buffer elements
    (handle)
  • recvcount number of elements received from any
    process (integer)
  • recvtype data type of receive buffer elements
    (handle)
  • comm communicator (handle)

16
Collective Functions MPI_ALLGATHER
  • Output Parameter
  • recvbuf address of receive buffer (choice)
  • Fortran
  • MPI_ALLGATHER(SENDBUF, SENDCOUNT, SENDTYPE,
    RECVBUF, RECVCOUNT, RECVTYPE, COMM, IERROR)
  • <type> SENDBUF(*), RECVBUF(*)
  • INTEGER SENDCOUNT, SENDTYPE, RECVCOUNT, RECVTYPE,
    COMM, IERROR
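A short C sketch (illustrative names; the receive buffer assumes at most 64 processes): every rank contributes one integer and every rank ends up with the full, identical vector.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, i;
    int sendval, recvbuf[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sendval = 100 + rank;

    /* Like MPI_Gather to every rank: afterwards all ranks hold the same vector. */
    MPI_Allgather(&sendval, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    for (i = 0; i < size; i++)
        printf("rank %d sees recvbuf[%d] = %d\n", rank, i, recvbuf[i]);

    MPI_Finalize();
    return 0;
}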

17
Collective Functions MPI_ALLGATHER
(Diagram: processes 0-3 hold A, B, C, D; after MPI_ALLGATHER every process holds A B C D.)
18
Collective Functions MPI_ALLTOALL
  • Extension of MPI_ALLGATHER to the case where each
    process sends distinct data to each of the
    receivers. The jth block sent from process i is
    received by process j and is placed in the ith
    block of recvbuf
  • C
  • int MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm)
  • Input Parameters
  • sendbuf starting address of send buffer (choice)
  • sendcount number of elements sent to each
    process (integer)
  • sendtype data type of send buffer elements
    (handle)
  • recvcount number of elements received from any
    process (integer)
  • recvtype data type of receive buffer elements
    (handle)
  • comm communicator (handle)

19
Collective Functions MPI_ALLTOALL
  • Output Parameter
  • recvbuf address of receive buffer (choice)
  • Fortran
  • MPI_ALLTOALL(SENDBUF, SENDCOUNT, SENDTYPE,
    RECVBUF, RECVCOUNT, RECVTYPE, COMM, IERROR)
  • <type> SENDBUF(*), RECVBUF(*)
  • INTEGER SENDCOUNT, SENDTYPE, RECVCOUNT, RECVTYPE,
    COMM, IERROR
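A short C sketch (illustrative values; buffers assume at most 64 processes): each rank prepares one block per destination, and afterwards block j sent by rank i sits in block i of rank j's receive buffer.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, i;
    int sendbuf[64], recvbuf[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (i = 0; i < size; i++)
        sendbuf[i] = 100 * rank + i;    /* block i is destined for rank i */

    /* Block j sent by rank i is placed in block i of rank j's recvbuf. */
    MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    for (i = 0; i < size; i++)
        printf("rank %d got %d from rank %d\n", rank, recvbuf[i], i);

    MPI_Finalize();
    return 0;
}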

20
Collective Functions MPI_ALLTOALL
(Diagram: ranks 0-3 each hold four distinct blocks; after MPI_ALLTOALL block j sent by rank i sits in block i of rank j's receive buffer, i.e. the block layout is transposed across the ranks.)
21
Collective Functions MPI_REDUCE
  • MPI_REDUCE combines the elements provided in the
    input buffer (sendbuf) of each process in the
    group, using the operation op, and returns the
    combined value in the output buffer (recvbuf) of
    the process with rank root
  • C
  • int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)
  • Input Parameters
  • sendbuf address of send buffer (choice)
  • count number of elements in send buffer
    (integer)
  • datatype data type of elements of send buffer
    (handle)
  • op reduce operation (handle)
  • root rank of root process (integer)
  • comm communicator (handle)
  • Output Parameter
  • recvbuf address of receive buffer (choice,
    significant only at root)

22
Collective Functions MPI_REDUCE
  • Fortran
  • MPI_REDUCE(SENDBUF, RECVBUF, COUNT, DATATYPE, OP,
    ROOT, COMM, IERROR)
  • <type> SENDBUF(*), RECVBUF(*)
  • INTEGER COUNT, DATATYPE, OP, ROOT, COMM, IERROR
  • Predefined Reduce Operations
  • MPI_MAX maximum
  • MPI_MIN minimum
  • MPI_SUM sum
  • MPI_PROD product
  • MPI_LAND logical and
  • MPI_BAND bit-wise and
  • MPI_LOR logical or
  • MPI_BOR bit-wise or
  • MPI_LXOR logical xor
  • MPI_BXOR bit-wise xor
  • MPI_MAXLOC max value and location (returns the maximum together with an integer index, typically the rank that holds the maximum)
  • MPI_MINLOC min value and location
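A minimal C sketch (illustrative values, using MPI_SUM from the list above): each rank contributes one integer and only the root receives the combined result.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, partial, total;
    int root = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    partial = rank + 1;             /* each rank's local contribution */

    /* Combine the partial values with MPI_SUM; only root receives the result. */
    MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, root, MPI_COMM_WORLD);

    if (rank == root)
        printf("sum over all ranks = %d\n", total);

    MPI_Finalize();
    return 0;
}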

23
Collective Functions MPI_REDUCE
(Diagram: ranks 0-3 contribute A, E, I, M as their first elements; with root = 1, rank 1 receives A∘E∘I∘M. If count = 2, the second element of the result array is B∘F∘J∘N.)
24
Collective Functions MPI_ALLREDUCE
  • Variants of the reduce operations where the
    result is returned to all processes in the group
  • The all-reduce operations can be implemented as a
    reduce, followed by a broadcast. However, a
    direct implementation can lead to better
    performance.
  • C
  • int MPI_Allreduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)

25
Collective Functions MPI_ALLREDUCE
  • Input Parameters
  • sendbuf starting address of send buffer (choice)
  • count number of elements in send buffer
    (integer)
  • datatype data type of elements of send buffer
    (handle)
  • op operation (handle)
  • comm communicator (handle)
  • Output Parameter
  • recvbuf starting address of receive buffer
    (choice)
  • Fortran
  • MPI_ALLREDUCE(SENDBUF, RECVBUF, COUNT, DATATYPE,
    OP, COMM, IERROR)
  • <type> SENDBUF(*), RECVBUF(*)
  • INTEGER COUNT, DATATYPE, OP, COMM, IERROR
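A minimal C sketch (illustrative values, MPI_SUM chosen for the example): the same reduction as MPI_Reduce, but every rank receives the result.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, partial, total;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    partial = rank + 1;

    /* Every rank receives the sum of all the partial values. */
    MPI_Allreduce(&partial, &total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("rank %d: sum = %d\n", rank, total);

    MPI_Finalize();
    return 0;
}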

26
Collective Functions MPI_ALLREDUCE
(Diagram: after MPI_ALLREDUCE every rank 0-3 holds the reduced value A∘E∘I∘M.)
27
Collective Functions MPI_REDUCE_SCATTER
  • Variants of each of the reduce operations where
    the result is scattered to all processes in the
    group on return.
  • MPI_REDUCE_SCATTER first does an element-wise reduction on a vector of count = Σi recvcounts[i] elements in the send buffer defined by sendbuf, count and datatype.
  • Next, the resulting vector is split into n disjoint segments, where n is the number of members in the group. Segment i contains recvcounts[i] elements.
  • The ith segment is sent to process i and stored in the receive buffer defined by recvbuf, recvcounts[i] and datatype.
  • MPI_REDUCE_SCATTER is functionally equivalent to an MPI_REDUCE with count equal to the sum of the recvcounts[i], followed by an MPI_SCATTERV with sendcounts equal to recvcounts. However, a direct implementation may run faster.

28
Collective Functions MPI_REDUCE_SCATTER
  • C
  • int MPI_Reduce_scatter(void *sendbuf, void *recvbuf, int *recvcounts, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
  • Input Parameters
  • sendbuf starting address of send buffer (choice)
  • recvcounts integer array specifying the number
    of elements in result distributed to each
    process. Array must be identical on all calling
    processes.
  • datatype data type of elements of input buffer
    (handle)
  • op operation (handle)
  • comm communicator (handle)
  • Output Parameter
  • recvbuf starting address of receive buffer
    (choice)
  • Fortran
  • MPI_REDUCE_SCATTER(SENDBUF, RECVBUF, RECVCOUNTS,
    DATATYPE, OP, COMM, IERROR)
  • <type> SENDBUF(*), RECVBUF(*)
  • INTEGER RECVCOUNTS(*), DATATYPE, OP, COMM, IERROR
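A short C sketch (illustrative values; buffers assume at most 64 processes, and recvcounts is set to all ones): the per-rank vectors are summed element-wise and element i of the result is delivered to rank i.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, i;
    int sendbuf[64], recvcounts[64], result;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (i = 0; i < size; i++) {
        sendbuf[i] = 100 * rank + i;    /* each rank contributes a vector of size elements */
        recvcounts[i] = 1;              /* every rank receives one element of the result */
    }

    /* Element-wise sum over all ranks, then segment i of the result goes to rank i. */
    MPI_Reduce_scatter(sendbuf, &result, recvcounts, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("rank %d holds reduced element %d = %d\n", rank, rank, result);

    MPI_Finalize();
    return 0;
}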

29
Collective Functions MPI_REDUCE_SCATTER
(Diagram: with recvcounts = {1, 2, 0, 1}, rank 0 receives A∘E∘I∘M, rank 1 receives B∘F∘J∘N and C∘G∘K∘O, rank 2 receives nothing, and rank 3 receives D∘H∘L∘P.)
30
Collective Functions MPI_SCAN
  • MPI_SCAN is used to perform a prefix reduction on
    data distributed across the group. The operation
    returns, in the receive buffer of the process
    with rank i, the reduction of the values in the
    send buffers of processes with ranks 0,...,i
    (inclusive). The type of operations supported,
    their semantics, and the constraints on send and
    receive buffers are as for MPI_REDUCE.
  • C
  • int MPI_Scan(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)

31
Collective Functions MPI_SCAN
  • Input Parameters
  • sendbuf starting address of send buffer (choice)
  • count number of elements in input buffer
    (integer)
  • datatype data type of elements of input buffer
    (handle)
  • op operation (handle)
  • comm communicator (handle)
  • Output Parameter
  • recvbuf starting address of receive buffer
    (choice)
  • Fortran
  • MPI_SCAN(SENDBUF, RECVBUF, COUNT, DATATYPE, OP,
    COMM, IERROR)
  • <type> SENDBUF(*), RECVBUF(*)
  • INTEGER COUNT, DATATYPE, OP, COMM, IERROR
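A minimal C sketch (illustrative values): with MPI_SUM this computes an inclusive prefix sum over the ranks, so rank i receives the sum of the values from ranks 0 through i.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, value, prefix;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    value = rank + 1;

    /* Inclusive prefix reduction: rank i receives value_0 + ... + value_i. */
    MPI_Scan(&value, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("rank %d: prefix sum = %d\n", rank, prefix);

    MPI_Finalize();
    return 0;
}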

32
Collective Functions MPI_SCAN
(Diagram: rank 0 receives A, rank 1 receives A∘E, rank 2 receives A∘E∘I, rank 3 receives A∘E∘I∘M.)
33
Example MPI_BCAST
  • To demonstrate how to use MPI_BCAST to distribute an array to all the other processes

34
Example MPI_BCAST (C)
/*
 * root broadcasts the array to all processes
 */
#include <stdio.h>
#include <mpi.h>
#define SIZE 10

int main(int argc, char *argv[])
{
    int my_rank;                    /* the rank of each proc */
    int array[SIZE];
    int root = 0;                   /* the rank of root */
    int i;
    MPI_Comm comm = MPI_COMM_WORLD;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(comm, &my_rank);
35
Example MPI_BCAST (C)
    if (my_rank == root)            /* root fills the array with 1..SIZE, mirroring the Fortran version */
        for (i = 0; i < SIZE; i++)
            array[i] = i + 1;
    else
        for (i = 0; i < SIZE; i++)
            array[i] = 0;

    printf("Proc %d (Before Broadcast) ", my_rank);
    for (i = 0; i < SIZE; i++)
        printf("%d ", array[i]);
    printf("\n");

    MPI_Bcast(array, SIZE, MPI_INT, root, comm);

    printf("Proc %d (After Broadcast) ", my_rank);
    for (i = 0; i < SIZE; i++)
        printf("%d ", array[i]);
    printf("\n");

    MPI_Finalize();
}
36
Example MPI_BCAST (Fortran)
C     ***********************************************
C     root broadcasts the array to all processes
C     ***********************************************
      PROGRAM main
      INCLUDE 'mpif.h'
      INTEGER SIZE
      PARAMETER (SIZE = 10)
      INTEGER my_rank, ierr, root, i
      INTEGER array(SIZE)
      INTEGER comm
      INTEGER arraysize
      root = 0
      comm = MPI_COMM_WORLD
      arraysize = SIZE
37
Example MPI_BCAST (Fortran)
      CALL MPI_INIT(ierr)
      CALL MPI_COMM_RANK(comm, my_rank, ierr)
      IF (my_rank .EQ. 0) THEN
         DO i = 1, SIZE
            array(i) = i
         END DO
      ELSE
         DO i = 1, SIZE
            array(i) = 0
         END DO
      END IF
      WRITE(6,*) "Proc ", my_rank, " (Before Broadcast)",
     &           (array(i), i = 1, SIZE)
      CALL MPI_BCAST(array, arraysize, MPI_INTEGER, root, comm, ierr)
      WRITE(6,*) "Proc ", my_rank, " (After Broadcast)",
     &           (array(i), i = 1, SIZE)
      CALL MPI_FINALIZE(ierr)
      END

38
Case Study 1 MPI_SCATTER and MPI_REDUCE
  • The master distributes (scatters) an array across the processes. Each process adds up its own elements, then the partial sums are combined in the master through a reduction operation.
  • Step 1
  • Proc 0 initializes an array of 16 integers
  • Proc 0: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16

39
Case Study 1 MPI_SCATTER and MPI_REDUCE
  • Step 2
  • Scatter array among all processes
  • Proc 0: 1, 2, 3, 4
  • Proc 1: 5, 6, 7, 8
  • Proc 2: 9, 10, 11, 12
  • Proc 3: 13, 14, 15, 16
  • Step 3
  • Each process computes the sum of its own four elements

40
Case Study 1 MPI_SCATTER and MPI_REDUCE
  • Step 4
  • Reduce the partial sums to Proc 0
  • Proc 0: total sum (136 for the values 1 to 16)
  • C
  • mpi_scatter_reduce01.c
  • Compilation
  • mpicc mpi_scatter_reduce01.c -o mpi_scatter_reduce01
  • Run
  • mpirun -np 4 mpi_scatter_reduce01
  • Fortran
  • mpi_scatter_reduce01.f
  • Compilation
  • mpif77 mpi_scatter_reduce01.f -o mpi_scatter_reduce01
  • Run
  • mpirun -np 4 mpi_scatter_reduce01
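The source of mpi_scatter_reduce01.c is not reproduced in this transcript; the following is only a minimal sketch of the four steps above (illustrative, assuming 4 processes and the 16-integer array), not the original program.

#include <stdio.h>
#include <mpi.h>

#define N 16

int main(int argc, char *argv[])
{
    int rank, size, i;
    int data[N], chunk[N], local_sum = 0, total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* run with -np 4 so N/size = 4 */

    if (rank == 0)                          /* Step 1: root fills 1..16 */
        for (i = 0; i < N; i++)
            data[i] = i + 1;

    /* Step 2: each rank receives N/size consecutive elements. */
    MPI_Scatter(data, N / size, MPI_INT, chunk, N / size, MPI_INT, 0, MPI_COMM_WORLD);

    /* Step 3: local partial sum. */
    for (i = 0; i < N / size; i++)
        local_sum += chunk[i];

    /* Step 4: combine the partial sums at rank 0. */
    MPI_Reduce(&local_sum, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Total sum = %d\n", total);  /* 136 for the values 1..16 */

    MPI_Finalize();
    return 0;
}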

41
Case Study 2 MPI_GATHER Matrix Multiplication
  • Algorithm
  • Multiply a 4x4 matrix A by a 4x1 vector x
  • Each process stores one row of A and a single entry of x
  • Use 4 gather operations to place a full copy of x in each process, then perform the multiplications

42
Case Study 2 MPI_GATHER Matrix Multiplication
  • Step 1
  • Initialization
  • Proc 0: row 1 5 9 13 of A, x entry 17
  • Proc 1: row 2 6 10 14 of A, x entry 18
  • Proc 2: row 3 7 11 15 of A, x entry 19
  • Proc 3: row 4 8 12 16 of A, x entry 20
  • Step 2
  • Perform MPI_GATHER 4 times (once with each process as root) so that every process obtains the full vector x
  • Proc 0: 1 5 9 13, x = 17 18 19 20
  • Proc 1: 2 6 10 14, x = 17 18 19 20
  • Proc 2: 3 7 11 15, x = 17 18 19 20
  • Proc 3: 4 8 12 16, x = 17 18 19 20

43
Case Study 2 MPI_GATHER Matrix Multiplication
  • Step 3
  • Perform the multiplication
  • Proc 0: 1x17 + 5x18 + 9x19 + 13x20 = 538
  • Proc 1: 2x17 + 6x18 + 10x19 + 14x20 = 612
  • Proc 2: 3x17 + 7x18 + 11x19 + 15x20 = 686
  • Proc 3: 4x17 + 8x18 + 12x19 + 16x20 = 760
  • Step 4
  • Gather the inner products from all processes into the master process and display the result

44
Case Study 2 MPI_GATHER Matrix Multiplication
  • C
  • mpi_gather01.c
  • Compilation
  • mpicc mpi_gather01.c -o mpi_gather01
  • Run
  • mpirun -np 4 mpi_gather01
  • Fortran
  • mpi_gather01.f
  • Compilation
  • mpif77 mpi_gather01.f -o mpi_gather01
  • Run
  • mpirun -np 4 mpi_gather01
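The source of mpi_gather01.c is likewise not included in the transcript; a sketch of the algorithm under the same assumptions (4 processes, the matrix and vector values above, one MPI_GATHER per root) might look like this. Run it with mpirun -np 4.

#include <stdio.h>
#include <mpi.h>

#define N 4

int main(int argc, char *argv[])
{
    int rank, r, j;
    int row[N], xi, x[N], inner = 0, result[N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (j = 0; j < N; j++)         /* Step 1: this rank's row of A (1 5 9 13, 2 6 10 14, ...) */
        row[j] = rank + 1 + 4 * j;
    xi = 17 + rank;                 /* this rank's entry of x */

    /* Step 2: N gathers, one per root, so every rank ends up holding all of x
       (a single MPI_Allgather would achieve the same effect). */
    for (r = 0; r < N; r++)
        MPI_Gather(&xi, 1, MPI_INT, x, 1, MPI_INT, r, MPI_COMM_WORLD);

    for (j = 0; j < N; j++)         /* Step 3: local inner product */
        inner += row[j] * x[j];

    /* Step 4: collect the inner products at rank 0 and print the result vector. */
    MPI_Gather(&inner, 1, MPI_INT, result, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0)
        for (j = 0; j < N; j++)
            printf("y[%d] = %d\n", j, result[j]);

    MPI_Finalize();
    return 0;
}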

45
END