Transcript and Presenter's Notes

Title: Message Passing Programming


1
Message Passing Programming
  • Carl Tropper
  • Department of Computer Science

2
Generalities
  • Structure of message passing programs
  • Asynchronous
  • SPMD (single program, multiple data) model
  • First look at building blocks
  • Send and receive operations
  • Blocking and unblocking versions
  • MPI (the standard) specifics

3
Send and receive operations
  • send(void *sendbuf, int nelems, int dest)
  • receive(void *recvbuf, int nelems, int source)
  • nelems = number of elements to be sent/received
  • Example: P0 sends data to P1

        P0                        P1
        a = 100;                  receive(&a, 1, 0);
        send(&a, 1, 1);           printf("%d\n", a);
        a = 0;

  • Good semantics: P1 receives 100
  • Bad semantics: P1 receives 0
  • Could happen because the DMA and communication hardware
    may return before 100 is actually sent

4
Blocking message passing operations
  • Handshake-
  • Sender asks to send, receiver agrees to receive
  • Sender sends, receiver receives
  • Implemented without buffers

5
Deadlocks in Blocking, non buffered send/receive
        P0                        P1
        send(&b, 1, 1);           send(&b, 1, 0);
        receive(&a, 1, 1);        receive(&a, 1, 0);

  • Both sends wait for receives that are never posted-DEADLOCK
  • Can cure this deadlock by reversing the send and
    receive ops (e.g. in P1)
  • Ugh

6
Send/Receive Blocking Buffered
  • Buffers used at sender and receiver
  • Dedicated comm hardware at both ends
  • If the sender has no buffer but the receiver does, the
    protocol can still be made to work (buffering on the
    receiving side only)

7
The impact of non-infinite buffer space
        P0                              P1
        for (i = 0; i < 1000; i++) {    for (i = 0; i < 1000; i++) {
            produce_data(&a);               receive(&a, 1, 0);
            send(&a, 1, 1);                 consume_data(&a);
        }                               }

  • If the consumer consumes more slowly than the producer
    produces, the finite buffer space eventually fills up and
    the sender must block


8
Deadlocks in Buffered Send/Receive
        P0                        P1
        receive(&a, 1, 1);        receive(&a, 1, 0);
        send(&b, 1, 1);           send(&b, 1, 0);

  • The receive operation still blocks, so deadlock can
    still happen
  • Moral of the story-still have to be careful to
    avoid deadlocks!

9
Non blocking optimizations
  • Blocking is safe but wastes time
  • Alternative-use non-blocking with check-status
    operation
  • Process is free to perform any operation which
    does not depend upon completion of send or
    receive
  • Once transfer is complete, data can be used

10
Non blocking optimization
11
Possibilities
12
MPI
  • Vendors all had their own message passing
    libraries
  • Enter MPI-the standard for C and Fortran
  • Defines syntax, semantics of core set of library
    routines (125 are defined)

13
Core set of routines for MPI
  • MPI_Init Initializes MPI.
  • MPI_Finalize Terminates MPI.
  • MPI_Comm_size Determines the number of
    processes.
  • MPI_Comm_rank Determines the label of calling
    process.
  • MPI_Send Sends a message.
  • MPI_Recv Receives a message.

14
Starting and Terminating MPI
  • int MPI_Init(int *argc, char ***argv)
  • int MPI_Finalize()
  • MPI_Init is called prior to other MPI routines-it
    initializes the MPI environment
  • MPI_Finalize is called at the end-it does the
    clean-up
  • Return code for both is MPI_SUCCESS
  • mpi.h contains MPI constants and data structures

15
Communicators
  • Communication domain- processes which communicate
    with one another
  • Communicators are variables of type MPI_Comm.
    They store information about communication
    domains
  • MPI_COMM_WORLD - default communicator, all
    processes in program

16
Communicators
  • int MPI_Comm_size(MPI_Comm comm, int *size)
  • int MPI_Comm_rank(MPI_Comm comm, int *rank)
  • MPI_Comm_size - number of processes in the
    communicator
  • The rank (0 to size-1) identifies each process

17
Hello world
        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char *argv[])
        {
            int npes, myrank;
            MPI_Init(&argc, &argv);
            MPI_Comm_size(MPI_COMM_WORLD, &npes);
            MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
            printf("From process %d out of %d, Hello World!\n",
                   myrank, npes);
            MPI_Finalize();
            return 0;
        }

  • Prints hello world from each process

18
Sending/Receiving Messages
  • int MPI_Send(void *buf, int count, MPI_Datatype
    datatype, int dest, int tag, MPI_Comm comm)
  • int MPI_Recv(void *buf, int count, MPI_Datatype
    datatype, int source, int tag, MPI_Comm
    comm, MPI_Status *status)
  • MPI_Send sends the data in buf: count entries of
    type datatype
  • Length of message is specified as a number of
    entries, not as a number of bytes, for
    portability
  • dest = rank of destination process, tag = type of
    message
  • MPI_ANY_SOURCE - any process can be the source
  • MPI_ANY_TAG - same for the tag
  • For MPI_Recv, buf is where the received message is stored
  • count, datatype specify the maximum length of the buffer
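
  A minimal sketch of a matching send/receive pair under these
  signatures; the tag value 7 and the single-int payload are
  arbitrary illustrative choices, and the program is assumed to
  run on at least two processes:

        int rank, value;
        MPI_Status status;

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            value = 100;
            /* send one int to rank 1 with tag 7 */
            MPI_Send(&value, 1, MPI_INT, 1, 7, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* blocks until the matching message has arrived */
            MPI_Recv(&value, 1, MPI_INT, 0, 7, MPI_COMM_WORLD,
                     &status);
            printf("received %d from rank %d\n", value,
                   status.MPI_SOURCE);
        }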

19
Datatypes
  • MPI Datatype C Datatype
  • MPI_CHAR signed char
  • MPI_SHORT signed short int
  • MPI_INT signed int
  • MPI_LONG signed long int
  • MPI_UNSIGNED_CHAR unsigned char
  • MPI_UNSIGNED_SHORT unsigned short int
  • MPI_UNSIGNED unsigned int
  • MPI_UNSIGNED_LONG unsigned long int
  • MPI_FLOAT float
  • MPI_DOUBLE double
  • MPI_LONG_DOUBLE long double
  • MPI_BYTE
  • MPI_PACKED

20
Sending/Receiving
  • Status variable used to get info on Recv
    operation
  • In C, the status is stored in an MPI_Status struct:

        typedef struct MPI_Status {
            int MPI_SOURCE;
            int MPI_TAG;
            int MPI_ERROR;
        };

  • int MPI_Get_count(MPI_Status *status,
    MPI_Datatype datatype, int *count) returns the
    number of received entries in the count variable

21
Sending/Receiving
  • MPI_Recv is a blocking receive op- it returns
    after message is in buffer.
  • MPI_Send has 2 implementations
  • Returns after MPI_Recv issued and message is sent
  • Returns after MPI_Send copied message into
    buffer-does not wait for MPI_Recv to be issued

22
Avoiding Deadlocks
  • Process 0 sends 2 messages to process 1, which
    receives them in reverse order.

        int a[10], b[10], myrank;
        MPI_Status status;
        ...
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        if (myrank == 0) {
            MPI_Send(a, 10, MPI_INT, 1, 1, MPI_COMM_WORLD);
            MPI_Send(b, 10, MPI_INT, 1, 2, MPI_COMM_WORLD);
        }
        else if (myrank == 1) {
            MPI_Recv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
            MPI_Recv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
        }
        ...

  • If MPI_Send blocks until the matching receive is issued,
    then process 0 waits on its tag 1 send for a receive that
    process 1 never posts first, while process 1 waits on its
    tag 2 receive for a send that process 0 never reaches.
    Deadlock
  • Solution- Programmer has to match the order in which
    sends and receives are issued-Ugh!

23
Circular Deadlock
  • Process i sends a message to process i + 1 and
    receives a message from process i - 1 (mod npes)

        int a[10], b[10], npes, myrank;
        MPI_Status status;
        ...
        MPI_Comm_size(MPI_COMM_WORLD, &npes);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        MPI_Send(a, 10, MPI_INT, (myrank+1)%npes, 1,
                 MPI_COMM_WORLD);
        MPI_Recv(b, 10, MPI_INT, (myrank-1+npes)%npes, 1,
                 MPI_COMM_WORLD, &status);
        ...

  • Deadlock if MPI_Send blocks until the receive is posted
  • Works if MPI_Send is implemented using buffering
  • With two processes this is just two processes trying to
    send each other messages first, and it deadlocks the
    same way

24
Break the circle
  • Break circle into odd and even processes
  • Odds first send and then receive
  • Evens first receive and then send
        int a[10], b[10], npes, myrank;
        MPI_Status status;
        ...
        MPI_Comm_size(MPI_COMM_WORLD, &npes);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        if (myrank%2 == 1) {
            MPI_Send(a, 10, MPI_INT, (myrank+1)%npes, 1,
                     MPI_COMM_WORLD);
            MPI_Recv(b, 10, MPI_INT, (myrank-1+npes)%npes, 1,
                     MPI_COMM_WORLD, &status);
        }
        else {
            MPI_Recv(b, 10, MPI_INT, (myrank-1+npes)%npes, 1,
                     MPI_COMM_WORLD, &status);
            MPI_Send(a, 10, MPI_INT, (myrank+1)%npes, 1,
                     MPI_COMM_WORLD);
        }
        ...

25
Break the circle, part II
  • A simultaneous send/receive operation
  • int MPI_Sendrecv(void *sendbuf, int sendcount,
    MPI_Datatype senddatatype, int dest, int sendtag,
    void *recvbuf, int recvcount, MPI_Datatype
    recvdatatype, int source, int recvtag, MPI_Comm
    comm, MPI_Status *status)
  • Problem-need to use disjoint send and receive buffers
  • Solution-the MPI_Sendrecv_replace function-received
    data replaces the sent data in the same buffer
  • int MPI_Sendrecv_replace(void *buf, int count,
    MPI_Datatype datatype, int dest, int sendtag,
    int source, int recvtag, MPI_Comm comm,
    MPI_Status *status)
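
  As a sketch, the circular shift from the previous slides can be
  written with a single MPI_Sendrecv call, letting MPI handle the
  ordering so no deadlock is possible (the 10-int arrays and tag 1
  follow the earlier example):

        int a[10], b[10], npes, myrank;
        MPI_Status status;

        MPI_Comm_size(MPI_COMM_WORLD, &npes);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        /* send a[] to the right neighbor and receive b[] from the
           left neighbor in one combined operation */
        MPI_Sendrecv(a, 10, MPI_INT, (myrank+1)%npes, 1,
                     b, 10, MPI_INT, (myrank-1+npes)%npes, 1,
                     MPI_COMM_WORLD, &status);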

26
Topologies and Embedding
MPI sees processes arranged linearly, while parallel
programs communicate naturally in higher-dimensional
topologies. We need to map the linear ordering onto
these topologies; several mappings are possible.
27
Solution
  • MPI helps programmer to arrange processes in
    topologies by supplying libraries
  • Mapping to processors is done by libraries
    without programmer intervention

28
Cartesian topologies
  • Can specify arbitrary topologies, but most
    topologies are grid-like (Cartesian)
  • MPI_Cart_create takes the processes in comm_old and
    builds a virtual process topology
  • int MPI_Cart_create(MPI_Comm comm_old, int ndims,
    int *dims, int *periods, int reorder,
    MPI_Comm *comm_cart)
  • The new topology information is in comm_cart
  • All processes belonging to comm_old need to call
    MPI_Cart_create
  • ndims = number of dimensions, dims = size of each
    dimension
  • Array periods specifies whether there are wraparound
    connections: periods[i] is true if dimension i wraps
    around
  • reorder = true allows the processes to be reordered by MPI
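
  A minimal sketch of creating a 2-D wraparound (torus) topology;
  the 4x4 grid is an arbitrary assumption, so this must be run
  with 16 processes:

        int dims[2]    = {4, 4};   /* 4 x 4 process grid            */
        int periods[2] = {1, 1};   /* wraparound in both dimensions */
        MPI_Comm comm_2d;

        /* reorder = 1 lets MPI renumber the ranks to fit the machine */
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1,
                        &comm_2d);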

29
Process Naming
  • Sources and destinations of messages are specified by
    process ranks in MPI
  • MPI_Cart_rank takes coordinates in the array coords
    and returns the corresponding rank
  • MPI_Cart_coord takes the rank of a process and
    returns its Cartesian coords in the array coords
    (maxdims is the dimension of the coords array)
  • int MPI_Cart_coord(MPI_Comm comm_cart, int rank,
    int maxdims, int *coords)
  • int MPI_Cart_rank(MPI_Comm comm_cart, int
    *coords, int *rank)

30
Shifting
  • Want to shift data along a dimension of the
    topology?
  • int MPI_Cart_shift(MPI_Comm comm_cart, int dir,
    int s_step, int *rank_source, int *rank_dest)
  • dir = dimension of the shift (which dimension it lives
    in)
  • s_step = size of the shift
  • The ranks of the source and destination processes are
    returned in rank_source and rank_dest
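
  A sketch combining MPI_Cart_shift with MPI_Sendrecv_replace to
  shift one value to the right along dimension 0; comm_2d is the
  torus from the earlier sketch, myrank comes from MPI_Comm_rank
  as before, and the single-int payload is illustrative:

        int left, right, value = myrank;
        MPI_Status status;

        /* neighbors one step away along dimension 0 */
        MPI_Cart_shift(comm_2d, 0, 1, &left, &right);

        /* send value to the right neighbor, receive a new value
           from the left neighbor into the same buffer */
        MPI_Sendrecv_replace(&value, 1, MPI_INT, right, 0,
                             left, 0, comm_2d, &status);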

31
Overlapping communication with computation
  • Blocking sends/receives do not permit overlap.
    Need non-blocking functions
  • MPI_Isend starts send, but returns before it is
    complete.
  • MPI_Irecv starts receive, but returns before data
    is received
  • MPI_Test tests if non-blocking operation has
    completed
  • MPI_Wait waits until a non-blocking operation
    finishes (don't say it)

32
More non blocking
  • int MPI_Isend(void *buf, int count, MPI_Datatype
    datatype, int dest, int tag, MPI_Comm comm,
    MPI_Request *request)
  • int MPI_Irecv(void *buf, int count, MPI_Datatype
    datatype, int source, int tag, MPI_Comm comm,
    MPI_Request *request)
  • Both allocate a request object and return a handle
    to it in request.
  • The request object is used as an argument by MPI_Test
    and MPI_Wait to identify the op whose status
  • we want to query, or
  • we want to wait for
  • int MPI_Test(MPI_Request *request, int *flag,
    MPI_Status *status)
  • flag = true if the op has finished
  • int MPI_Wait(MPI_Request *request, MPI_Status
    *status)
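
  A minimal sketch of overlapping communication with computation:
  the receive is posted early, independent work goes on (the
  do_useful_work() call is a hypothetical placeholder), and
  MPI_Wait is called only when the data is actually needed:

        int incoming;
        MPI_Request req;
        MPI_Status status;

        /* post the receive early; the call returns immediately */
        MPI_Irecv(&incoming, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                  MPI_COMM_WORLD, &req);

        do_useful_work();   /* computation that does not touch 'incoming' */

        /* block only when the received value is required */
        MPI_Wait(&req, &status);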

33
Avoiding deadlocks
  • Using non-blocking operations removes most
    deadlocks.
  • The following code is not safe:

        int a[10], b[10], myrank;
        MPI_Status status;
        ...
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        if (myrank == 0) {
            MPI_Send(a, 10, MPI_INT, 1, 1, MPI_COMM_WORLD);
            MPI_Send(b, 10, MPI_INT, 1, 2, MPI_COMM_WORLD);
        }
        else if (myrank == 1) {
            MPI_Recv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
            MPI_Recv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
        }

  • Replacing either the send or the receive operations
    with non-blocking counterparts fixes this
    deadlock, as sketched below.
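
  A sketch of the fix on the receiving side, using the same arrays
  as above: posting both receives as non-blocking lets process 1
  accept the messages in whichever order they arrive, then wait
  for both:

        else if (myrank == 1) {
            MPI_Request req1, req2;
            MPI_Irecv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &req1);
            MPI_Irecv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &req2);
            /* both receives are already posted, so the waits cannot
               deadlock no matter which send completes first */
            MPI_Wait(&req1, &status);
            MPI_Wait(&req2, &status);
        }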

34
Collective Ops-communication and computation
  • Comm ops (MPI broadcast, reduction, etc.) are
    implemented by MPI
  • All of the ops take a communicator argument, which
    defines the group of processes involved in the op
  • The ops don't act like barriers - a process can go past
    the call without waiting for the other processes, but
    it is not a great idea to rely on that

35
The collective
  • Barrier synchronization operation
  • int MPI_Barrier(MPI_Comm comm)
  • Call returns after all processes have called the
    function
  • The one-to-all broadcast operation is
  • int MPI_Bcast(void *buf, int count, MPI_Datatype
    datatype, int source, MPI_Comm comm)
  • The source sends the data in buf to all processes in
    the group.
  • The all-to-one reduction operation is
  • int MPI_Reduce(void *sendbuf, void *recvbuf, int
    count, MPI_Datatype datatype, MPI_Op op, int
    target, MPI_Comm comm)
  • Combines the elements in sendbuf of each process
    using op, and returns the combined values in recvbuf
    of the process with rank target
  • If count is more than one, then op is applied to
    each element
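
  A minimal sketch using both collectives: rank 0 broadcasts a
  value, every process adds its rank to it, and the partial
  results are summed back onto rank 0 (the variable names are
  illustrative):

        int myrank, value, partial, total;
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        if (myrank == 0) value = 42;
        /* every process receives rank 0's value */
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

        partial = value + myrank;
        /* sum the partial results onto rank 0 */
        MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, 0,
                   MPI_COMM_WORLD);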

36
Pre-defined Reduction Types
  • MPI_MAX Maximum C integers and floating point
  • MPI_MIN Minimum C integers and floating point
  • MPI_SUM Sum C integers and floating
    point
  • MPI_PROD Product C integers and
    floating point
  • MPI_LAND Logical AND C integers
  • MPI_BAND Bit-wise AND C integers and byte
  • MPI_LOR Logical OR C integers
  • MPI_BOR Bit-wise OR C integers and byte
  • MPI_LXOR Logical XOR C integers
  • MPI_BXOR Bit-wise XOR C integers and byte
  • MPI_MAXLOC max value and location Data-pairs
  • MPI_MINLOC min value and location Data-pairs

37
More Reduction
  • The operation MPI_MAXLOC combines pairs of values
    (vi, li) and returns the pair (v, l) such that v
    is the maximum among all vi 's and l is the
    corresponding li (if there are more than one, it
    is the smallest among all these li 's).
  • MPI_MINLOC does the same, except for minimum
    value of vi.
  • Possible to define your own ops

38
Reduction
  • Need MPI datatypes for the data pairs used with
    MPI_MAXLOC and MPI_MINLOC
  • MPI_2INT corresponds to the C datatype pair of ints
  • The MPI_Allreduce op returns the result to all processes
  • int MPI_Allreduce(void *sendbuf, void *recvbuf,
    int count, MPI_Datatype datatype, MPI_Op op,
    MPI_Comm comm)
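
  A sketch of MPI_Allreduce with MPI_MINLOC on MPI_2INT pairs:
  every process learns the global minimum and the rank that owns
  it (local_value is an illustrative name for whatever each
  process contributes, and myrank comes from MPI_Comm_rank as in
  the earlier sketches):

        struct { int value; int rank; } in, out;

        in.value = local_value;   /* this process's contribution */
        in.rank  = myrank;        /* its location (rank)         */

        /* everyone gets the smallest value and where it lives */
        MPI_Allreduce(&in, &out, 1, MPI_2INT, MPI_MINLOC,
                      MPI_COMM_WORLD);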

39
Prefix Sum
  • The prefix sum op is done via MPI_Scan: the partial
    reduction up to node i is stored on node i
  • int MPI_Scan(void *sendbuf, void *recvbuf, int
    count, MPI_Datatype datatype, MPI_Op op,
    MPI_Comm comm)
  • In the end, the receive buffer of the process with
    rank i stores the reduction of the send buffers of
    nodes 0 to i
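
  A minimal sketch: scanning the ranks themselves with MPI_SUM
  (myrank obtained from MPI_Comm_rank as before), so process i
  ends up with 0 + 1 + ... + i in prefix:

        int prefix;
        /* inclusive prefix sum over the ranks */
        MPI_Scan(&myrank, &prefix, 1, MPI_INT, MPI_SUM,
                 MPI_COMM_WORLD);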

40
Gather Ops
  • The gather operation is performed in MPI using
  • int MPI_Gather(void *sendbuf, int sendcount,
    MPI_Datatype senddatatype, void *recvbuf,
    int recvcount, MPI_Datatype recvdatatype,
    int target, MPI_Comm comm)
  • Each process sends the data in sendbuf to target
  • Data is stored in recvbuf in rank order - data from
    process i is stored at offset i*sendcount of recvbuf
  • MPI also provides the MPI_Allgather function, in
    which the data are gathered at all the processes.
  • int MPI_Allgather(void *sendbuf, int sendcount,
    MPI_Datatype senddatatype, void *recvbuf,
    int recvcount, MPI_Datatype recvdatatype,
    MPI_Comm comm)
  • These ops assume that the sizes of all the arrays
    are the same - there are vector versions of these
    routines which allow different-sized arrays
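
  A sketch of gathering one int per process onto rank 0; myrank
  comes from MPI_Comm_rank as before, recvbuf only has to be
  meaningful on the target, and MAX_PROCS is an illustrative upper
  bound on the number of processes:

        #define MAX_PROCS 64
        int all_ranks[MAX_PROCS];

        /* each process contributes its rank; rank 0 collects them
           in rank order */
        MPI_Gather(&myrank, 1, MPI_INT,
                   all_ranks, 1, MPI_INT,
                   0, MPI_COMM_WORLD);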

41
Scatter Op
  • MPI_Scatter
  • int MPI_Scatter(void *sendbuf, int sendcount,
    MPI_Datatype senddatatype, void *recvbuf,
    int recvcount, MPI_Datatype recvdatatype,
    int source, MPI_Comm comm)
  • The source process sends a different part of sendbuf
    to each process. Received data is stored in
    recvbuf
  • A vector version of MPI_Scatter allows different
    amounts of data to be sent to different processes
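
  A sketch of the inverse of the gather above: rank 0 hands each
  process one element of an array (MAX_PROCS as in the gather
  sketch; the array contents are illustrative and only matter on
  rank 0):

        int chunk;
        int data[MAX_PROCS];

        /* rank 0 sends data[i] to the process with rank i */
        MPI_Scatter(data, 1, MPI_INT,
                    &chunk, 1, MPI_INT,
                    0, MPI_COMM_WORLD);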

42
All to all Op
  • The all-to-all personalized communication
    operation is performed by
  • int MPI_Alltoall(void *sendbuf, int sendcount,
    MPI_Datatype senddatatype, void *recvbuf,
    int recvcount, MPI_Datatype recvdatatype,
    MPI_Comm comm)
  • Each process sends a different part of sendbuf to
    every other process (sendcount elements starting at
    offset i*sendcount go to process i)
  • Received data is stored in the recvbuf array
  • A vector variant exists, which allows different
    amounts of data to be sent

43
Groups and communicators
  • Might want to split a group of processes into
    subgroups
  • int MPI_Comm_split(MPI_Comm comm, int color, int
    key, MPI_Comm *newcomm)
  • Has to be called by all processes in the group
  • Partitions the processes in communicator comm into
    disjoint subgroups
  • color and key are input parameters
  • color defines the subgroups
  • key defines the rank within each subgroup
  • A new communicator is returned to each process in the
    newcomm parameter
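
  A minimal sketch splitting MPI_COMM_WORLD into two subgroups by
  rank parity (myrank from MPI_Comm_rank as before); using the
  original rank as the key keeps the processes in their original
  relative order within each new communicator:

        MPI_Comm parity_comm;
        int color = myrank % 2;   /* 0 = even ranks, 1 = odd ranks */

        MPI_Comm_split(MPI_COMM_WORLD, color, myrank, &parity_comm);

        /* each process now also belongs to an "even" or an "odd"
           communicator with its own local ranks 0, 1, 2, ... */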

44
MPI_Comm_split
45
Splitting Cartesian Topologies
  • MPI_Cart_sub splits a Cartesian topology into
    smaller topologies
  • int MPI_Cart_sub(MPI_Comm comm_cart, int
    *keep_dims, MPI_Comm *comm_subcart)
  • The array keep_dims tells us how to break up the
    topology
  • The original topology is stored in comm_cart;
    comm_subcart stores the new sub-topologies

46
Splitting Cartesian Topologies
  • The array keep_dims tells us how: if keep_dims[i] is
    true, then the ith dimension is kept in the sub-topology
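
  A sketch splitting the comm_2d torus from the earlier sketch
  into row communicators (drop dimension 0, keep dimension 1):

        MPI_Comm row_comm;
        int keep_dims[2] = {0, 1};   /* drop dim 0, keep dim 1 */

        /* each process lands in the 1-D communicator for its row */
        MPI_Cart_sub(comm_2d, keep_dims, &row_comm);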