Principles of Message Passing and MPI
Author: Sathish Vadhiyar

1
MPI Message Passing Interface
  • Source: http://www.netlib.org/utk/papers/mpi-book/mpi-book.html

2
Message Passing Principles
  • Explicit communication and synchronization
  • Programming complexity is high
  • But widely popular
  • More control with the programmer

3
MPI Introduction
  • A standard for explicit message passing in MIMD
    machines.
  • Need for a standard
    >> portability
    >> for hardware vendors
    >> for widespread use of concurrent computers
  • Started in April 1992, MPI Forum in 1993, 1st MPI
    standard in May 1994.

4
MPI contains
  • Point-Point (1.1)
  • Collectives (1.1)
  • Communication contexts (1.1)
  • Process topologies (1.1)
  • Profiling interface (1.1)
  • I/O (2)
  • Dynamic process groups (2)
  • One-sided communications (2)
  • Extended collectives (2)
  • About 125 functions; mostly 6 are used

5
MPI Implementations
  • OpenMPI
  • MPICH (Argonne National Lab)
  • LAM-MPI (Ohio, Notre Dame, Bloomington)
  • Cray, IBM, SGI
  • MPI-FM (Illinois)
  • MPI / Pro (MPI Software Tech.)
  • Sca MPI (Scali AS)
  • Plenty of others

6
  • Communication Primitives
  • - Communication scope
  • - Point-point communications
  • - Collective communications

7
Point-Point communications: send and recv
  • MPI_SEND(buf, count, datatype, dest, tag, comm)
    - (buf, count, datatype): the message
    - dest: rank of the destination
    - tag: message identifier
    - comm: communication context
  • MPI_RECV(buf, count, datatype, source, tag, comm, status)
  • MPI_GET_COUNT(status, datatype, count)
8
A Simple Example
  • comm = MPI_COMM_WORLD;
  • MPI_Comm_rank(comm, &rank);
  • for(i=0; i<n; i++) a[i] = 0;
  • if(rank == 0){
  •   MPI_Send(a+n/2, n/2, MPI_INT, 1, tag, comm);
  • }
  • else{
  •   MPI_Recv(b, n/2, MPI_INT, 0, tag, comm, &status);
  • }
  • /* process array a */
  • /* do reverse communication */
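A complete, runnable version of this sketch is shown below. It is only a minimal illustration, assuming exactly two processes and a fixed array size N; the MPI_Init/MPI_Finalize boilerplate and the printout are additions, not part of the original slide.

  /* Minimal sketch of the slide's example: rank 0 sends the upper half of a
     to rank 1. Assumes it is run with 2 processes (e.g. mpirun -np 2). */
  #include <mpi.h>
  #include <stdio.h>

  #define N 8

  int main(int argc, char **argv)
  {
      int a[N], b[N/2], rank, i, tag = 0;
      MPI_Status status;
      MPI_Comm comm = MPI_COMM_WORLD;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(comm, &rank);

      for (i = 0; i < N; i++) a[i] = i;

      if (rank == 0) {
          /* send the upper half of a to process 1 */
          MPI_Send(a + N/2, N/2, MPI_INT, 1, tag, comm);
      } else if (rank == 1) {
          /* receive it into b */
          MPI_Recv(b, N/2, MPI_INT, 0, tag, comm, &status);
          for (i = 0; i < N/2; i++) printf("b[%d] = %d\n", i, b[i]);
      }

      MPI_Finalize();
      return 0;
  }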

9
Communication Scope
  • Explicit communications
  • Each communication associated with communication
    scope
  • Process defined by
  • Group
  • Rank within a group
  • Message labeled by
  • Message context
  • Message tag
  • A communication handle called Communicator
    defines the scope

10
Communicator
  • Communicator represents the communication domain
  • Helps in the creation of process groups
  • Can be intra or inter (more later).
  • Default communicator MPI_COMM_WORLD includes
    all processes
  • Wild cards
  • The receiver's source and tag fields can be
    wildcarded: MPI_ANY_SOURCE, MPI_ANY_TAG

11
Buffering and Safety
  • The previous send and receive are blocking.
    Buffering mechanisms can come into play.
  • Safe buffering

Process 0: MPI_Send; MPI_Recv; ...    Process 1: MPI_Recv; MPI_Send; ...    =>  OK
Process 0: MPI_Recv; MPI_Send; ...    Process 1: MPI_Recv; MPI_Send; ...    =>  Leads to deadlock
Process 0: MPI_Send; MPI_Recv; ...    Process 1: MPI_Send; MPI_Recv; ...    =>  May or may not succeed. Unsafe
12
Non-blocking communications
  • A post of a send or recv operation, followed later
    by a completion of the operation
  • MPI_ISEND(buf, count, datatype, dest, tag, comm,
    request)
  • MPI_IRECV(buf, count, datatype, source, tag, comm,
    request)
  • MPI_WAIT(request, status)
  • MPI_TEST(request, flag, status)
  • MPI_REQUEST_FREE(request)

13
Non-blocking
  • A post-send returns before the message is copied
    out of the send buffer
  • A post-recv returns before data is copied into
    the recv buffer
  • Non-blocking calls consume space
  • Efficiency depends on the implementation

14
Other Non-blocking communications
  • MPI_WAITANY(count, array_of_requests, index,
    status)
  • MPI_TESTANY(count, array_of_requests, index,
    flag, status)
  • MPI_WAITALL(count, array_of_requests,
    array_of_statuses)
  • MPI_TESTALL(count, array_of_requests, flag,
    array_of_statuses)
  • MPI_WAITSOME(incount, array_of_requests,
    outcount, array_of_indices, array_of_statuses)
  • MPI_TESTSOME(incount, array_of_requests,
    outcount, array_of_indices, array_of_statuses)
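A minimal sketch of the non-blocking calls above: each of two processes posts an MPI_Irecv and an MPI_Isend and then completes both with MPI_Waitall. It assumes exactly two processes; the variable names are illustrative only.

  /* Exchange one int between ranks 0 and 1 with non-blocking calls. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, other, sendval, recvval;
      MPI_Request reqs[2];
      MPI_Status stats[2];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      other = 1 - rank;              /* assumes exactly two processes */
      sendval = rank * 100;

      /* Post the receive first, then the send; neither call blocks. */
      MPI_Irecv(&recvval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[0]);
      MPI_Isend(&sendval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[1]);

      /* Complete both operations before touching the buffers. */
      MPI_Waitall(2, reqs, stats);
      printf("rank %d received %d\n", rank, recvval);

      MPI_Finalize();
      return 0;
  }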

15
Buffering and Safety
Process 0: MPI_Send(1); MPI_Send(2); ...    Process 1: MPI_Irecv(2); MPI_Irecv(1); ...    =>  Safe
Process 0: MPI_Isend; MPI_Recv; ...         Process 1: MPI_Isend; MPI_Recv; ...           =>  Safe
16
Communication Modes
Mode                      Start                   Completion
Standard (MPI_Send)       Before or after recv    Before recv (if buffered) or after recv (if no buffering)
Buffered (MPI_Bsend)      Before or after recv    Before recv (uses MPI_Buffer_attach)
Synchronous (MPI_Ssend)   Before or after recv    At a particular point in recv
Ready (MPI_Rsend)         After recv              After recv
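A small sketch of the buffered mode, assuming a single-int message: the user attaches a buffer with MPI_Buffer_attach, so MPI_Bsend can complete locally before the matching receive is posted. It assumes at least two processes.

  /* Buffered-mode send from rank 0 to rank 1. */
  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      int rank, value, bufsize;
      char *buffer;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* Room for one int message plus the per-message overhead MPI requires. */
      bufsize = sizeof(int) + MPI_BSEND_OVERHEAD;
      buffer = malloc(bufsize);
      MPI_Buffer_attach(buffer, bufsize);

      if (rank == 0) {
          value = 42;
          MPI_Bsend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* completes locally */
      } else if (rank == 1) {
          MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
          printf("received %d\n", value);
      }

      /* Detach blocks until buffered messages have been delivered. */
      MPI_Buffer_detach(&buffer, &bufsize);
      free(buffer);
      MPI_Finalize();
      return 0;
  }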
17
  • Collective Communications

18
Example: Matrix-Vector Multiply

[Figure: A (matrix) x b (vector) = x, with A distributed row-wise across processes.]

Communication: all processes should gather all
elements of b.
19
Collective Communications AllGather
[Figure: 5 processes; before the call, process i holds only its block A_i.
After AllGather, every process holds all blocks A_0 ... A_4.]
MPI_ALLGATHER(sendbuf, sendcount, sendtype,
recvbuf, recvcount, recvtype, comm)
MPI_ALLGATHERV(sendbuf, sendcount, sendtype,
recvbuf, array_of_recvcounts, array_of_displ,
recvtype, comm)
20
Example: Row-wise Matrix-Vector Multiply
  • MPI_Comm_size(comm, &size);
  • MPI_Comm_rank(comm, &rank);
  • nlocal = n/size;
  • MPI_Allgather(local_b, nlocal, MPI_DOUBLE, b,
    nlocal, MPI_DOUBLE, comm);
  • for(i=0; i<nlocal; i++){
  •   x[i] = 0.0;
  •   for(j=0; j<n; j++)
  •     x[i] += a[i*n+j]*b[j];
  • }

21
Example: Column-wise Matrix-Vector Multiply

[Figure: A (matrix) x b (vector) = x, with A distributed column-wise across processes.]

Dot products corresponding to each element of x
will be parallelized.
Steps:
1. Each process computes its contribution to x.
2. Contributions from all processes are added and
stored in the appropriate process.
22
Example: Column-wise Matrix-Vector Multiply
  • MPI_Comm_size(comm, &size);
  • MPI_Comm_rank(comm, &rank);
  • nlocal = n/size;
  • /* Compute partial dot-products */
  • for(i=0; i<n; i++){
  •   px[i] = 0.0;
  •   for(j=0; j<nlocal; j++)
  •     px[i] += a[i*nlocal+j]*b[j];
  • }

23
Collective Communications: Reduce, Allreduce

[Figure: each of 3 processes holds a vector (process 0: A0..A2, process 1: B0..B2,
process 2: C0..C2). Reduce leaves the element-wise combination A_i+B_i+C_i only at
the root; Allreduce leaves the same result at every process.]

MPI_REDUCE(sendbuf, recvbuf, count, datatype, op,
root, comm)
MPI_ALLREDUCE(sendbuf, recvbuf, count, datatype,
op, comm)
24
Collective Communications: Scatter, Gather

[Figure: Scatter sends block A_i of the root's buffer to process i; Gather is the
reverse, collecting one block from each process at the root.]
MPI_SCATTER(sendbuf, sendcount, sendtype,
recvbuf, recvcount, recvtype, root, comm)
MPI_SCATTERV(sendbuf, array_of_sendcounts,
array_of_displ, sendtype, recvbuf, recvcount,
recvtype, root, comm)
MPI_GATHER(sendbuf, sendcount, sendtype, recvbuf,
recvcount, recvtype, root, comm)
MPI_GATHERV(sendbuf, sendcount, sendtype,
recvbuf, array_of_recvcounts, array_of_displ,
recvtype, root, comm)
25
Example: Column-wise Matrix-Vector Multiply
  • /* Summing the dot-products */
  • MPI_Reduce(px, fx, n, MPI_DOUBLE, MPI_SUM, 0,
    comm);
  • /* Now all values of x are stored in process 0.
    Need to scatter them */
  • MPI_Scatter(fx, nlocal, MPI_DOUBLE, x, nlocal,
    MPI_DOUBLE, 0, comm);

26
Or
  • for(i=0; i<size; i++)
  •   MPI_Reduce(px+i*nlocal, x, nlocal,
      MPI_DOUBLE, MPI_SUM, i, comm);

27
Collective Communications
  • Only blocking; standard mode; no tags
  • Simple variant or vector variant
  • Some collectives have roots
  • Different types
  • One-to-all
  • All-to-one
  • All-to-all

28
Collective Communications - Barrier
MPI_BARRIER(comm)
A return from barrier in one process tells the
process that the other processes have entered the
barrier.
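One common use, sketched below, is to line the processes up before taking a timing with MPI_Wtime (listed later among the miscellaneous functions); the timed loop here is only a placeholder for real work.

  /* Use barriers so all processes start and stop timing together. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int i, rank;
      double t_start, t_elapsed, x = 0.0;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      MPI_Barrier(MPI_COMM_WORLD);          /* line everyone up before timing */
      t_start = MPI_Wtime();
      for (i = 0; i < 1000000; i++)          /* placeholder for the real work */
          x += 1.0 / (i + 1);
      MPI_Barrier(MPI_COMM_WORLD);          /* wait for the slowest process */
      t_elapsed = MPI_Wtime() - t_start;

      if (rank == 0)
          printf("elapsed: %f s (x=%f)\n", t_elapsed, x);
      MPI_Finalize();
      return 0;
  }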
29
Collective Communications - Broadcast
[Figure: the root's value A is copied to every process.]
MPI_BCAST(buffer, count, datatype, root, comm)
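A minimal sketch of broadcast: the root sets a parameter (hard-coded here) and MPI_Bcast delivers it to every process.

  /* Broadcast one parameter from rank 0 to all processes. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, n = 0;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0)
          n = 1024;                         /* the root sets the value */

      /* Every process calls MPI_Bcast; afterwards all ranks hold n = 1024. */
      MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
      printf("rank %d: n = %d\n", rank, n);

      MPI_Finalize();
      return 0;
  }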
30
Collective Communications: AlltoAll

[Figure: 5 processes; process 0 starts with blocks A0..A4, process 1 with B0..B4,
and so on. After AlltoAll, process j holds block j from every process: Aj, Bj, Cj,
Dj, Ej.]
MPI_ALLTOALL(sendbuf, sendcount, sendtype,
recvbuf, recvcount, recvtype, comm)
MPI_ALLTOALLV(sendbuf, array_of_sendcounts,
array_of_sdispl, sendtype, recvbuf,
array_of_recvcounts, array_of_rdispl, recvtype, comm)
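A small sketch of the all-to-all pattern in the figure: every process sends one distinct int to every other process; the buffer sizes assume one element per destination.

  /* Each process i sends value 100*i + j to process j. */
  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      int rank, size, j;
      int *sendbuf, *recvbuf;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      sendbuf = malloc(size * sizeof(int));
      recvbuf = malloc(size * sizeof(int));
      for (j = 0; j < size; j++)
          sendbuf[j] = 100 * rank + j;      /* block j goes to process j */

      /* After the call, recvbuf[i] on process j holds block j sent by process i. */
      MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

      printf("rank %d received %d ... %d\n", rank, recvbuf[0], recvbuf[size-1]);
      free(sendbuf); free(recvbuf);
      MPI_Finalize();
      return 0;
  }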
31
Collective Communications: Reduce-Scatter, Scan

[Figure: Reduce-Scatter performs the element-wise reduction A_i+B_i+C_i and then
scatters the result, one block per process. Scan (prefix reduction) gives process i
the combination of the values from processes 0..i.]

MPI_REDUCE_SCATTER(sendbuf, recvbuf,
array_of_recvcounts, datatype, op, comm)
MPI_SCAN(sendbuf, recvbuf, count, datatype, op,
comm)
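A minimal sketch of MPI_Scan: each process contributes one value and receives the running (inclusive) sum over ranks 0..i.

  /* Inclusive prefix sum across ranks with MPI_Scan. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, myval, prefix;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      myval = rank + 1;                       /* process i contributes i+1 */

      /* prefix on rank i becomes 1 + 2 + ... + (i+1) */
      MPI_Scan(&myval, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
      printf("rank %d: prefix sum = %d\n", rank, prefix);

      MPI_Finalize();
      return 0;
  }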
32
  • Communicators and Groups

33
Communicators
  • For logical division of processes
  • For forming communication contexts and avoiding
    message conflicts
  • Communicator specifies a communication domain
    for communications
  • Can be
  • Intra: used for communicating within a single
    group of processes
  • Inter: used for communication between two
    disjoint groups of processes
  • Default communicators MPI_COMM_WORLD,
    MPI_COMM_SELF

34
Groups
  • An ordered set of processes.
  • New group derived from base groups.
  • Group represented by a communicator
  • Group associated with MPI_COMM_WORLD is the first
    base group
  • New groups can be created with unions,
    intersections, and differences of existing groups
  • Functions provided for obtaining sizes, ranks

35
Communicator functions
  • MPI_COMM_DUP(comm, newcomm)
  • MPI_COMM_CREATE(comm, group, newcomm)
  • MPI_GROUP_INCL(group, n, ranks, newgroup)
  • MPI_COMM_GROUP(comm, group)
  • MPI_COMM_SPLIT(comm, color, key, newcomm)

Rank:     0  1  2  3  4  5  6  7  8  9
Process:  A  B  C  D  E  F  G  H  I  J
Color:    0  N  3  0  3  0  0  5  3  N
Key:      3  1  2  5  1  1  1  2  1  0

Resulting groups (ordered by key): {F, G, A, D} for color 0, {E, I, C} for color 3,
{H} for color 5; processes with color N (undefined) get MPI_COMM_NULL.
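A hedged sketch of MPI_Comm_split along the lines of the table above: color selects the new group, key orders the ranks inside it, and (not shown here) MPI_UNDEFINED as color yields MPI_COMM_NULL. The even/odd split below is illustrative, not the table's exact color values.

  /* Split MPI_COMM_WORLD into even-rank and odd-rank communicators. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, color, newrank;
      MPI_Comm newcomm;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      color = rank % 2;                 /* two groups: even ranks, odd ranks */
      /* key = rank keeps the original ordering within each new communicator */
      MPI_Comm_split(MPI_COMM_WORLD, color, rank, &newcomm);

      MPI_Comm_rank(newcomm, &newrank);
      printf("world rank %d -> color %d, new rank %d\n", rank, color, newrank);

      MPI_Comm_free(&newcomm);
      MPI_Finalize();
      return 0;
  }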
36
Intercommunicators
  • For multi-disciplinary applications, pipeline
    applications, and easier program readability
  • Inter-communicator can be used for point-point
    communication (send and recv) between processes
    of disjoint groups
  • Does not support collectives in 1.1
  • MPI_INTERCOMM_CREATE(local_comm, local_leader,
    bridge_comm, remote_leader, tag, comm)
  • MPI_INTERCOMM_MERGE(intercomm, high,
    newintracomm)

37
Communicator and Groups example
[Figure: 12 processes split into 3 groups, shown as local rank(global rank):
 Group 0: 0(0) 1(3) 2(6) 3(9)
 Group 1: 0(1) 1(4) 2(7) 3(10)
 Group 2: 0(2) 1(5) 2(8) 3(11)]

main(){
  ...
  membership = rank % 3;
  MPI_Comm_split(MPI_COMM_WORLD, membership, rank,
  &mycomm);
38
Communicator and Groups example
  • if(membership == 0){
  •   MPI_Intercomm_create(mycomm, 0,
      MPI_COMM_WORLD, 1, 01, &my1stcomm);
  • }
  • else if(membership == 1){
  •   MPI_Intercomm_create(mycomm, 0,
      MPI_COMM_WORLD, 0, 01, &my1stcomm);
  •   MPI_Intercomm_create(mycomm, 0,
      MPI_COMM_WORLD, 2, 12, &my2ndcomm);
  • }
  • else{
  •   MPI_Intercomm_create(mycomm, 0,
      MPI_COMM_WORLD, 1, 12, &my1stcomm);
  • }
39
MPI Process Topologies
40
Motivation
  • Logical process arrangement
  • For convenient identification of processes -
    program readability
  • For assisting the runtime system in mapping
    processes onto hardware - increase in performance
  • Default: linear array, ranks from 0 to n-1
  • Virtual topology can give rise to trees, graphs,
    meshes etc.

41
Introduction
  • Any process topology can be represented by
    graphs.
  • MPI provides defaults for ring, mesh, torus and
    other common structures

42
Cartesian Topology
  • Cartesian structures of arbitrary dimensions
  • Can be periodic along any number of dimensions
  • Popular cartesian structures: linear array,
    ring, rectangular mesh, cylinder, torus
    (hypercubes)

43
Cartesian Topology - constructors
  • MPI_CART_CREATE(
  •   comm_old - old communicator,
  •   ndims - number of dimensions,
  •   dims - number of processes along each
      dimension,
  •   periods - periodicity of the dimensions,
  •   reorder - whether ranks may be reordered,
  •   comm_cart - new communicator representing the
      cartesian topology
  • )
  • Collective communication call

44
Cartesian Topology - Constructors
  • MPI_DIMS_CREATE(
  •   nnodes (in) - number of nodes in a grid,
  •   ndims (in) - number of dimensions,
  •   dims (inout) - number of processes along each
      dimension
  • )
  • Helps create dimension sizes that are as close
    to each other as possible.
  • The user can specify constraints by putting
    positive integers in certain entries of dims.
  • Only entries equal to 0 are modified.

dims before call    (nnodes, ndims)    dims after call
(0, 0)              (6, 2)             (3, 2)
(0, 3, 0)           (6, 3)             (2, 3, 1)
(0, 3, 0)           (7, 3)             error
45
Cartesian Topology Inquiry Translators
  • MPI_CARTDIM_GET(comm, ndims)
  • MPI_CART_GET(comm, maxdims, dims, periodic,
    coords)
  • MPI_CART_RANK(comm, coords, rank): coordinates ->
    rank
  • MPI_CART_COORDS(comm, rank, maxdims, coords):
    rank -> coordinates
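A hedged sketch tying the constructors and translators of the last three slides together: MPI_Dims_create picks a balanced 2D grid, MPI_Cart_create builds the topology, and MPI_Cart_coords/MPI_Cart_rank translate between ranks and coordinates.

  /* Build a periodic 2D process grid and query coordinates. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int size, myrank, my2drank, backrank;
      int dims[2] = {0, 0}, periods[2] = {1, 1}, coords[2];
      MPI_Comm comm_2d;

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

      MPI_Dims_create(size, 2, dims);       /* e.g. 6 processes -> 3 x 2 */
      MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &comm_2d);

      MPI_Comm_rank(comm_2d, &my2drank);
      MPI_Cart_coords(comm_2d, my2drank, 2, coords);   /* rank -> (row, col) */
      MPI_Cart_rank(comm_2d, coords, &backrank);       /* (row, col) -> rank */

      printf("rank %d: coords (%d,%d), back to rank %d\n",
             my2drank, coords[0], coords[1], backrank);

      MPI_Comm_free(&comm_2d);
      MPI_Finalize();
      return 0;
  }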

46
Cartesian topology - Shifting
  • MPI_CART_SHIFT(
  • comm,
  • direction,
  • displacement,
  • source,
  • dest
  • )
  • Useful for a subsequent Sendrecv
  • MPI_Sendrecv(..., dest, ..., source, ...)
  • Example

MPI_CART_SHIFT(comm, 1, 1, source, dest)
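A hedged sketch of this shift pattern: the source and dest returned by MPI_Cart_shift feed straight into MPI_Sendrecv for a cyclic shift along dimension 1; the grid setup repeats the construction shown earlier.

  /* Cyclic shift of one value along dimension 1 of a periodic grid. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int size, my2drank, source, dest, sendval, recvval;
      int dims[2] = {0, 0}, periods[2] = {1, 1};
      MPI_Comm comm_2d;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      MPI_Dims_create(size, 2, dims);
      MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &comm_2d);
      MPI_Comm_rank(comm_2d, &my2drank);

      /* Who do I receive from (source) and send to (dest) for a +1 shift along dim 1? */
      MPI_Cart_shift(comm_2d, 1, 1, &source, &dest);

      sendval = my2drank;
      MPI_Sendrecv(&sendval, 1, MPI_INT, dest, 0,
                   &recvval, 1, MPI_INT, source, 0, comm_2d, &status);
      printf("rank %d got value from rank %d: %d\n", my2drank, source, recvval);

      MPI_Comm_free(&comm_2d);
      MPI_Finalize();
      return 0;
  }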
47
Example: Cannon's Matrix-Matrix Multiplication

[Figure: A and B partitioned into sqrt(P) x sqrt(P) blocks, one block pair per
process. In the initial realignment, block A(i,j) is shifted left by i positions
and block B(i,j) is shifted up by j positions.]

Initial Realignment
48
Example: Cannon's Matrix-Matrix Multiplication

[Figure: block layouts of A and B after the first, second, and third shift steps.
In each step every A block moves one position to the left and every B block moves
one position up, both with wraparound.]
49
Cannon's Algorithm with MPI Topologies
  • dims[0] = dims[1] = sqrt(P);
  • periods[0] = periods[1] = 1;
  • MPI_Cart_create(comm, 2, dims, periods, 1, &comm_2d);
  • MPI_Comm_rank(comm_2d, &my2drank);
  • MPI_Cart_coords(comm_2d, my2drank, 2, mycoords);
  • MPI_Cart_shift(comm_2d, 0, -1, &rightrank,
    &leftrank);
  • MPI_Cart_shift(comm_2d, 1, -1, &downrank,
    &uprank);
  • nlocal = n/dims[0];

50
Cannon's Algorithm with MPI Topologies
  • /* Initial Matrix Alignment */
  • MPI_Cart_shift(comm_2d, 0, -mycoords[0],
    &shiftsource, &shiftdest);
  • MPI_Sendrecv_replace(a, nlocal*nlocal,
    MPI_DOUBLE, shiftdest, 1, shiftsource, 1,
    comm_2d, &status);
  • MPI_Cart_shift(comm_2d, 1, -mycoords[1],
    &shiftsource, &shiftdest);
  • MPI_Sendrecv_replace(b, nlocal*nlocal,
    MPI_DOUBLE, shiftdest, 1, shiftsource, 1,
    comm_2d, &status);

51
Cannon's Algorithm with MPI Topologies
  • /* Main Computation Loop */
  • for(i=0; i<dims[0]; i++){
  •   MatrixMultiply(nlocal, a, b, c); /* c = c + a*b */
  •   /* Shift matrix a left by one */
  •   MPI_Sendrecv_replace(a, nlocal*nlocal,
      MPI_DOUBLE, leftrank, 1, rightrank, 1, comm_2d,
      &status);
  •   /* Shift matrix b up by one */
  •   MPI_Sendrecv_replace(b, nlocal*nlocal,
      MPI_DOUBLE, uprank, 1, downrank, 1, comm_2d,
      &status);
  • }

52
Cannon's Algorithm with MPI Topologies
  • /* Restore original distribution of a and b */
  • MPI_Cart_shift(comm_2d, 0, +mycoords[0],
    &shiftsource, &shiftdest);
  • MPI_Sendrecv_replace(a, nlocal*nlocal,
    MPI_DOUBLE, shiftdest, 1, shiftsource, 1,
    comm_2d, &status);
  • MPI_Cart_shift(comm_2d, 1, +mycoords[1],
    &shiftsource, &shiftdest);
  • MPI_Sendrecv_replace(b, nlocal*nlocal,
    MPI_DOUBLE, shiftdest, 1, shiftsource, 1,
    comm_2d, &status);

53
General Graph Topology
  • MPI_GRAPH_CREATE(comm_old,
    nnodes, index, edges,
    reorder, comm_graph)
  • Example:
  •   nnodes = 8
  •   index = {3, 4, 6, 7, 10, 11, 13, 14}
  •   edges = {1, 2, 4, 0, 0, 3, 2, 0, 5, 6, 4, 4, 7,
      6}

[Figure: the 8-node graph described by index and edges above; node 0 is connected
to 1, 2, and 4, node 2 to 3, node 4 to 5 and 6, and node 6 to 7.]
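A hedged sketch that creates exactly this graph topology; the index and edges arrays are copied from the example above, and the program assumes it is run with exactly 8 processes.

  /* Build the 8-node graph topology from the slide (run with 8 processes). */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int index[8] = {3, 4, 6, 7, 10, 11, 13, 14};
      int edges[14] = {1, 2, 4, 0, 0, 3, 2, 0, 5, 6, 4, 4, 7, 6};
      int rank, nneighbors;
      MPI_Comm comm_graph;

      MPI_Init(&argc, &argv);
      MPI_Graph_create(MPI_COMM_WORLD, 8, index, edges, 1, &comm_graph);

      MPI_Comm_rank(comm_graph, &rank);
      MPI_Graph_neighbors_count(comm_graph, rank, &nneighbors);
      printf("node %d has %d neighbours\n", rank, nneighbors);

      MPI_Comm_free(&comm_graph);
      MPI_Finalize();
      return 0;
  }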
54
General Graph Topology - Inquiry
  • MPI_Graphdims_get(MPI_Comm comm, int *nnodes, int
    *nedges)
  • MPI_Graph_get(MPI_Comm comm, int maxindex, int
    maxedges, int *index, int *edges)
  • MPI_Graph_neighbors_count(MPI_Comm comm, int
    rank, int *nneighbors)
  • MPI_Graph_neighbors(MPI_Comm comm, int rank, int
    maxneighbors, int *neighbors)
  • MPI_TOPO_TEST(comm, status)
  • status can be MPI_GRAPH, MPI_CART, MPI_UNDEFINED

55
  • END

56
Communicators as caches
  • Caches used for storing and retrieving attributes
  • MPI_KEYVAL_CREATE(copy_fn, delete_fn, keyval,
    extra_state)
  • typedef int MPI_Copy_function(MPI_Comm oldcomm,
    int keyval, void *extra_state,
    void *attribute_val_in,
    void *attribute_val_out, int *flag)
  • typedef int MPI_Delete_function(MPI_Comm comm,
    int keyval, void *attribute_val,
    void *extra_state)
  • MPI_ATTR_PUT(comm, keyval, attribute_val)
  • MPI_ATTR_GET(comm, keyval, attribute_val)
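A hedged sketch of attribute caching with the C bindings of these MPI-1 calls (MPI_Keyval_create, MPI_Attr_put, MPI_Attr_get with the predefined null copy/delete functions); note that later MPI versions deprecate these in favour of MPI_Comm_create_keyval / MPI_Comm_set_attr / MPI_Comm_get_attr. The cached int is purely illustrative.

  /* Cache an attribute on a communicator and read it back. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int keyval, flag, value = 7;
      int *retrieved;

      MPI_Init(&argc, &argv);

      /* Create a key; no special copy/delete behaviour is needed here. */
      MPI_Keyval_create(MPI_NULL_COPY_FN, MPI_NULL_DELETE_FN, &keyval, NULL);

      MPI_Attr_put(MPI_COMM_WORLD, keyval, &value);        /* store a pointer */
      MPI_Attr_get(MPI_COMM_WORLD, keyval, &retrieved, &flag);
      if (flag)
          printf("cached attribute = %d\n", *retrieved);

      MPI_Keyval_free(&keyval);
      MPI_Finalize();
      return 0;
  }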

57
Point-Point example
  • main(){
  •   int count;
  •   int sendbuf[count], recvbuf[count];
  •   ...
  •   if(rank != 0)
  •     MPI_Recv(recvbuf, count, MPI_INT, rank-1, tag,
        comm, &status);
  •   else
  •     for(i=0; i<count; i++)
  •       recvbuf[i] = 0;
  •   for(i=0; i<count; i++)
  •     recvbuf[i] += sendbuf[i];
  •   if(rank != size-1)
  •     MPI_Send(recvbuf, count, MPI_INT, rank+1, tag, comm);
  • }
58
Collective Communications: Finding the maximum
  • main(){
  •   ...
  •   MPI_Scatter(full_array, local_size, MPI_INT,
      local_array, local_size, MPI_INT, 0, comm);
  •   local_max = max(local_array);
  •   MPI_Allreduce(&local_max, &global_max, 1, MPI_INT,
      MPI_MAX, comm);
  • }

59
Miscellaneous attributes / functions
  • MPI_WTIME_IS_GLOBAL for checking clock
    synchronization
  • MPI_GET_PROCESSOR_NAME(name, resultlen)
  • MPI_WTIME(), MPI_WTICK()

60
Profiling Interface
  • Primarily intended for profiling tool developers
  • Also used for combining different MPI
    implementations
  • MPI implementors need to provide equivalent
    functions with the PMPI_ prefix,
  • e.g., PMPI_Bcast for MPI_Bcast

61
Profiling Interface - Example
  • #pragma weak MPI_Send = PMPI_Send
  • int PMPI_Send(/* appropriate args */)
  • { /* Useful content */ }
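A hedged sketch of the wrapper pattern this interface enables: a tool defines its own MPI_Send, does some bookkeeping, and forwards to the implementation's PMPI_Send. This file would be compiled and linked into the application ahead of the MPI library; the const qualifier on the buffer matches MPI-3 style bindings (drop it for older headers).

  /* Profiling wrapper: count calls to MPI_Send and forward to PMPI_Send. */
  #include <mpi.h>
  #include <stdio.h>

  static int send_calls = 0;

  int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
               int dest, int tag, MPI_Comm comm)
  {
      send_calls++;                                   /* tool bookkeeping */
      return PMPI_Send(buf, count, datatype, dest, tag, comm);
  }

  int MPI_Finalize(void)
  {
      printf("MPI_Send was called %d times\n", send_calls);
      return PMPI_Finalize();
  }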

62
Point-Point: Some more functions / default
definitions
  • MPI_PROC_NULL
  • MPI_ANY_SOURCE
  • MPI_IPROBE(source, tag, comm, flag, status)
  • MPI_PROBE(source, tag, comm, status)
  • MPI_CANCEL(request)
  • MPI_TEST_CANCELLED(status, flag)
  • Persistent Communication Requests
  • MPI_SEND_INIT(buf, count, datatype, dest, tag,
    comm, request)
  • MPI_RECV_INIT(buf, count, datatype, source, tag,
    comm, request)
  • MPI_START(request)
  • MPI_STARTALL(count, array_of_requests)
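A hedged sketch of the persistent requests listed above: the send and receive are set up once with MPI_Send_init/MPI_Recv_init, restarted each iteration with MPI_Startall, completed with MPI_Waitall, and freed at the end. It assumes exactly two processes; the values exchanged are illustrative.

  /* Persistent requests reused across iterations (run with 2 processes). */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, other, iter, sendval, recvval;
      MPI_Request reqs[2];
      MPI_Status stats[2];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      other = 1 - rank;                 /* assumes exactly two processes */

      /* Set up the communication once ... */
      MPI_Recv_init(&recvval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[0]);
      MPI_Send_init(&sendval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[1]);

      for (iter = 0; iter < 5; iter++) {
          sendval = rank * 10 + iter;
          MPI_Startall(2, reqs);        /* ... and restart it every iteration */
          MPI_Waitall(2, reqs, stats);
          printf("rank %d, iter %d: got %d\n", rank, iter, recvval);
      }

      /* Persistent requests must be freed explicitly. */
      MPI_Request_free(&reqs[0]);
      MPI_Request_free(&reqs[1]);
      MPI_Finalize();
      return 0;
  }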