Title: MPI
1. MPI: Message Passing Interface
- Source: http://www.netlib.org/utk/papers/mpi-book/mpi-book.html
2. Message Passing Principles
- Explicit communication and synchronization
- Programming complexity is high
- But widely popular
- More control for the programmer
3. MPI Introduction
- A standard for explicit message passing in MIMD machines.
- Need for a standard
  >> portability
  >> for hardware vendors
  >> for widespread use of concurrent computers
- Started in April 1992; MPI Forum in 1993; 1st MPI standard in May 1994.
4. MPI contains
- Point-Point (1.1)
- Collectives (1.1)
- Communication contexts (1.1)
- Process topologies (1.1)
- Profiling interface (1.1)
- I/O (2)
- Dynamic process groups (2)
- One-sided communications (2)
- Extended collectives (2)
- About 125 functions; mostly only 6 are used.
5. MPI Implementations
- OpenMPI
- MPICH (Argonne National Lab)
- LAM-MPI (Ohio, Notre Dame, Bloomington)
- Cray, IBM, SGI
- MPI-FM (Illinois)
- MPI / Pro (MPI Software Tech.)
- Sca MPI (Scali AS)
- Plenty of others
6. Communication Primitives
- Communication scope
- Point-point communications
- Collective communications
7. Point-Point Communications: send and recv
- MPI_SEND(buf, count, datatype, dest, tag, comm)
  - buf, count, datatype: the message
  - dest: rank of the destination
  - tag: message identifier
  - comm: communication context
- MPI_RECV(buf, count, datatype, source, tag, comm, status)
- MPI_GET_COUNT(status, datatype, count)
8. A Simple Example
  comm = MPI_COMM_WORLD;
  MPI_Comm_rank(comm, &rank);
  for(i=0; i<n; i++) a[i] = 0;
  if(rank == 0){
      MPI_Send(a+n/2, n/2, MPI_INT, 1, tag, comm);
  }
  else{  /* rank == 1 */
      MPI_Recv(b, n/2, MPI_INT, 0, tag, comm, &status);
  }
  /* process array a */
  /* do reverse communication */
9. Communication Scope
- Explicit communications
- Each communication is associated with a communication scope
- A process is defined by
  - Group
  - Rank within a group
- A message is labeled by
  - Message context
  - Message tag
- A communication handle called a Communicator defines the scope
10. Communicator
- Communicator represents the communication domain
- Helps in the creation of process groups
- Can be intra or inter (more later)
- Default communicator MPI_COMM_WORLD includes all processes
- Wild cards
  - The receiver's source and tag fields can be wild-carded: MPI_ANY_SOURCE, MPI_ANY_TAG
11. Buffering and Safety
- The previous send and receive are blocking; buffering mechanisms can come into play.
- Safe buffering:

  Process 0: MPI_Send; MPI_Recv; ...
  Process 1: MPI_Recv; MPI_Send; ...
  => OK

  Process 0: MPI_Recv; MPI_Send; ...
  Process 1: MPI_Recv; MPI_Send; ...
  => Leads to deadlock (both wait to receive first)

  Process 0: MPI_Send; MPI_Recv; ...
  Process 1: MPI_Send; MPI_Recv; ...
  => May or may not succeed, depending on buffering. Unsafe.
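A minimal sketch of the "OK" ordering above (not from the original slides; assumes exactly two processes): rank 0 sends first, rank 1 receives first, so no cycle of waiting can form.

  #include <mpi.h>
  int main(int argc, char **argv)
  {
      int rank, x = 42, y;
      MPI_Status status;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 0) {
          MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);          /* send first */
          MPI_Recv(&y, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status); /* then receive */
      } else if (rank == 1) {
          MPI_Recv(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status); /* receive first */
          MPI_Send(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);          /* then send */
      }
      MPI_Finalize();
      return 0;
  }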
12. Non-blocking Communications
- A post of a send or recv operation followed by a complete of the operation
- MPI_ISEND(buf, count, datatype, dest, tag, comm, request)
- MPI_IRECV(buf, count, datatype, source, tag, comm, request)
- MPI_WAIT(request, status)
- MPI_TEST(request, flag, status)
- MPI_REQUEST_FREE(request)
13. Non-blocking
- A post-send returns before the message is copied out of the send buffer
- A post-recv returns before data is copied into the recv buffer
- Non-blocking calls consume space
- Efficiency depends on the implementation
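A minimal sketch (not from the original slides; assumes two processes): post the operation, overlap it with other work, then complete it with MPI_Wait before touching the buffer.

  #include <mpi.h>
  int main(int argc, char **argv)
  {
      int rank, out = 1, in = 0;
      MPI_Request req;
      MPI_Status status;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 0) {
          MPI_Isend(&out, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req); /* post send */
          /* ... useful work that does not touch 'out' ... */
          MPI_Wait(&req, &status);  /* complete: 'out' may now be reused */
      } else if (rank == 1) {
          MPI_Irecv(&in, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);  /* post recv */
          /* ... useful work that does not touch 'in' ... */
          MPI_Wait(&req, &status);  /* complete: 'in' now holds the message */
      }
      MPI_Finalize();
      return 0;
  }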
14. Other Non-blocking Communications
- MPI_WAITANY(count, array_of_requests, index, status)
- MPI_TESTANY(count, array_of_requests, index, flag, status)
- MPI_WAITALL(count, array_of_requests, array_of_statuses)
- MPI_TESTALL(count, array_of_requests, flag, array_of_statuses)
- MPI_WAITSOME(incount, array_of_requests, outcount, array_of_indices, array_of_statuses)
- MPI_TESTSOME(incount, array_of_requests, outcount, array_of_indices, array_of_statuses)
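A sketch of MPI_WAITALL (not from the original slides): every process exchanges its rank with every other process, posting all sends and receives up front and completing them in one call.

  #include <mpi.h>
  #include <stdlib.h>
  int main(int argc, char **argv)
  {
      int rank, size, i, n = 0;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      int *in  = malloc(size * sizeof(int));
      int *out = malloc(size * sizeof(int));
      MPI_Request *reqs  = malloc(2 * size * sizeof(MPI_Request));
      MPI_Status  *stats = malloc(2 * size * sizeof(MPI_Status));
      for (i = 0; i < size; i++) {
          if (i == rank) continue;
          out[i] = rank;  /* one private send buffer per pending send */
          MPI_Irecv(&in[i],  1, MPI_INT, i, 0, MPI_COMM_WORLD, &reqs[n++]);
          MPI_Isend(&out[i], 1, MPI_INT, i, 0, MPI_COMM_WORLD, &reqs[n++]);
      }
      MPI_Waitall(n, reqs, stats);  /* blocks until every posted operation completes */
      MPI_Finalize();
      return 0;
  }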
15. Buffering and Safety
  Process 0: MPI_Send(1); MPI_Send(2); ...
  Process 1: MPI_Irecv(2); MPI_Irecv(1); ...
  => Safe

  Process 0: MPI_Isend; MPI_Recv; ...
  Process 1: MPI_Isend; MPI_Recv; ...
  => Safe
16. Communication Modes
17. Collective Communications
18. Example: Matrix-Vector Multiply
[Figure: x = A * b, with the rows of A and the elements of b and x distributed across processes]
- Communication: all processes should gather all elements of b.
19. Collective Communications: AllGather
[Figure: AllGather. Before: process i holds only Ai. After: every process holds A0, A1, A2, A3, A4]
MPI_ALLGATHER(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm)
MPI_ALLGATHERV(sendbuf, sendcount, sendtype, recvbuf, array_of_recvcounts, array_of_displs, recvtype, comm)
20. Example: Row-wise Matrix-Vector Multiply
  MPI_Comm_size(comm, &size);
  MPI_Comm_rank(comm, &rank);
  nlocal = n/size;
  MPI_Allgather(local_b, nlocal, MPI_DOUBLE, b, nlocal, MPI_DOUBLE, comm);
  for(i=0; i<nlocal; i++){
      x[i] = 0.0;
      for(j=0; j<n; j++)
          x[i] += a[i*n+j]*b[j];
  }
21. Example: Column-wise Matrix-Vector Multiply
[Figure: x = A * b, with the columns of A distributed across processes]
- Dot-products corresponding to each element of x will be parallelized
- Steps:
  1. Each process computes its contribution to x
  2. Contributions from all processes are added and stored in the appropriate process
22. Example: Column-wise Matrix-Vector Multiply
  MPI_Comm_size(comm, &size);
  MPI_Comm_rank(comm, &rank);
  nlocal = n/size;
  /* Compute partial dot-products */
  for(i=0; i<n; i++){
      px[i] = 0.0;
      for(j=0; j<nlocal; j++)
          px[i] += a[i*nlocal+j]*b[j];
  }
23. Collective Communications: Reduce, Allreduce
[Figure: Reduce. Processes 0, 1, 2 hold (A0, A1, A2), (B0, B1, B2), (C0, C1, C2); after Reduce the root holds (A0+B0+C0, A1+B1+C1, A2+B2+C2)]
MPI_REDUCE(sendbuf, recvbuf, count, datatype, op, root, comm)
[Figure: Allreduce. Same input, but every process ends up with (A0+B0+C0, A1+B1+C1, A2+B2+C2)]
MPI_ALLREDUCE(sendbuf, recvbuf, count, datatype, op, comm)
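A minimal sketch (not from the original slides): summing the ranks with both calls; only the root gets the result with MPI_Reduce, every process does with MPI_Allreduce.

  #include <mpi.h>
  #include <stdio.h>
  int main(int argc, char **argv)
  {
      int rank, sum = 0, gsum = 0;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);  /* result at root only */
      if (rank == 0) printf("sum of ranks = %d\n", sum);
      MPI_Allreduce(&rank, &gsum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD); /* result everywhere */
      MPI_Finalize();
      return 0;
  }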
24. Collective Communications: Scatter, Gather
[Figure: Scatter distributes the root's blocks A0..A4, one per process; Gather is the inverse, collecting one block from each process at the root]
MPI_SCATTER(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)
MPI_SCATTERV(sendbuf, array_of_sendcounts, array_of_displs, sendtype, recvbuf, recvcount, recvtype, root, comm)
MPI_GATHER(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)
MPI_GATHERV(sendbuf, sendcount, sendtype, recvbuf, array_of_recvcounts, array_of_displs, recvtype, root, comm)
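A minimal round-trip sketch (not from the original slides): the root scatters one element to each process, each process updates its piece, and the root gathers the results back.

  #include <mpi.h>
  #include <stdlib.h>
  int main(int argc, char **argv)
  {
      int rank, size, i, mine;
      int *all = NULL;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      if (rank == 0) {                       /* only the root owns the full array */
          all = malloc(size * sizeof(int));
          for (i = 0; i < size; i++) all[i] = i * i;
      }
      MPI_Scatter(all, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);
      mine += 1;                             /* each process works on its piece */
      MPI_Gather(&mine, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);
      MPI_Finalize();
      return 0;
  }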
25. Example: Column-wise Matrix-Vector Multiply
  /* Summing the dot-products */
  MPI_Reduce(px, fx, n, MPI_DOUBLE, MPI_SUM, 0, comm);
  /* Now all values of x are stored in process 0. Need to scatter them */
  MPI_Scatter(fx, nlocal, MPI_DOUBLE, x, nlocal, MPI_DOUBLE, 0, comm);
26. Or
  for(i=0; i<size; i++)
      MPI_Reduce(px+i*nlocal, x, nlocal, MPI_DOUBLE, MPI_SUM, i, comm);
27. Collective Communications
- Only blocking; only standard mode; no tags
- Simple variant or vector variant
- Some collectives have roots
- Different types:
  - One-to-all
  - All-to-one
  - All-to-all
28. Collective Communications - Barrier
MPI_BARRIER(comm)
- A return from the barrier in one process tells that process that the other processes have entered the barrier.
29. Collective Communications - Broadcast
[Figure: the root's value A is copied to every process]
MPI_BCAST(buffer, count, datatype, root, comm)
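A minimal sketch (not from the original slides): the root fills in parameters and broadcasts them; after the call every process has the same values.

  #include <mpi.h>
  int main(int argc, char **argv)
  {
      int rank, params[3] = {0, 0, 0};
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 0) { params[0] = 100; params[1] = 7; params[2] = 3; } /* root's data */
      MPI_Bcast(params, 3, MPI_INT, 0, MPI_COMM_WORLD);                 /* copied to all */
      MPI_Finalize();
      return 0;
  }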
30. Collective Communications: AlltoAll
[Figure: AlltoAll. Process 0 starts with (A0..A4), process 1 with (B0..B4), and so on; afterwards process 0 holds (A0, B0, C0, D0, E0), process 1 holds (A1, B1, C1, D1, E1), etc.: a transpose of blocks across processes]
MPI_ALLTOALL(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm)
MPI_ALLTOALLV(sendbuf, array_of_sendcounts, array_of_sdispls, sendtype, recvbuf, array_of_recvcounts, array_of_rdispls, recvtype, comm)
31. Collective Communications: ReduceScatter, Scan
[Figure: ReduceScatter. Inputs as in Reduce; the element-wise sums (A0+B0+C0, A1+B1+C1, A2+B2+C2) are scattered, one per process]
MPI_REDUCE_SCATTER(sendbuf, recvbuf, array_of_recvcounts, datatype, op, comm)
[Figure: Scan. Process 0 keeps (A0, A1, A2); process 1 gets (A0+B0, A1+B1, A2+B2); process 2 gets (A0+B0+C0, A1+B1+C1, A2+B2+C2): an inclusive prefix reduction]
MPI_SCAN(sendbuf, recvbuf, count, datatype, op, comm)
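A minimal sketch (not from the original slides): an inclusive prefix sum of the ranks, so process i receives 0 + 1 + ... + i.

  #include <mpi.h>
  int main(int argc, char **argv)
  {
      int rank, prefix;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Scan(&rank, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD); /* inclusive scan */
      MPI_Finalize();
      return 0;
  }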
33. Communicators
- For logical division of processes
- For forming communication contexts and avoiding message conflicts
- A communicator specifies a communication domain for communications
- Can be:
  - Intra: used for communicating within a single group of processes
  - Inter: used for communication between two disjoint groups of processes
- Default communicators: MPI_COMM_WORLD, MPI_COMM_SELF
34. Groups
- An ordered set of processes
- New groups are derived from base groups
- A group is represented by a communicator
- The group associated with MPI_COMM_WORLD is the first base group
- New groups can be created with unions, intersections, and differences of existing groups
- Functions provided for obtaining sizes, ranks
35. Communicator Functions
- MPI_COMM_DUP(comm, newcomm)
- MPI_COMM_CREATE(comm, group, newcomm)
- MPI_GROUP_INCL(group, n, ranks, newgroup)
- MPI_COMM_GROUP(comm, group)
- MPI_COMM_SPLIT(comm, color, key, newcomm)
[Figure: an MPI_COMM_SPLIT example partitioning the processes into the groups {F,G,A,D}, {E,I,C} and {h}]
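A sketch combining the group calls above (not from the original slides): build a communicator containing only the even-ranked processes.

  #include <mpi.h>
  #include <stdlib.h>
  int main(int argc, char **argv)
  {
      int rank, size, i, n = 0;
      MPI_Group world_group, even_group;
      MPI_Comm even_comm;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      int *ranks = malloc(size * sizeof(int));
      for (i = 0; i < size; i += 2) ranks[n++] = i;      /* even world ranks */
      MPI_Comm_group(MPI_COMM_WORLD, &world_group);
      MPI_Group_incl(world_group, n, ranks, &even_group);
      /* collective over MPI_COMM_WORLD; returns MPI_COMM_NULL on odd ranks */
      MPI_Comm_create(MPI_COMM_WORLD, even_group, &even_comm);
      MPI_Group_free(&even_group);
      MPI_Group_free(&world_group);
      MPI_Finalize();
      return 0;
  }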
36. Intercommunicators
- For multi-disciplinary applications, pipeline applications, easy readability of programs
- An inter-communicator can be used for point-point communication (send and recv) between processes of disjoint groups
- Does not support collectives in 1.1
- MPI_INTERCOMM_CREATE(local_comm, local_leader, bridge_comm, remote_leader, tag, comm)
- MPI_INTERCOMM_MERGE(intercomm, high, newintracomm)
37. Communicator and Groups Example
[Figure: 12 processes split into three groups by rank mod 3; shown as local rank (world rank):
  Group 0: 0(0) 1(3) 2(6) 3(9)
  Group 1: 0(1) 1(4) 2(7) 3(10)
  Group 2: 0(2) 1(5) 2(8) 3(11)]
  main(){
      ...
      membership = rank % 3;
      MPI_Comm_split(MPI_COMM_WORLD, membership, rank, &mycomm);
  }
38. Communicator and Groups Example
  if(membership == 0){
      MPI_Intercomm_create(mycomm, 0, MPI_COMM_WORLD, 1, 01, &my1stcomm);
  }
  else if(membership == 1){
      MPI_Intercomm_create(mycomm, 0, MPI_COMM_WORLD, 0, 01, &my1stcomm);
      MPI_Intercomm_create(mycomm, 0, MPI_COMM_WORLD, 2, 12, &my2ndcomm);
  }
  else{
      MPI_Intercomm_create(mycomm, 0, MPI_COMM_WORLD, 1, 12, &my1stcomm);
  }
39. MPI Process Topologies
40. Motivation
- Logical process arrangement
- For convenient identification of processes - program readability
- For assisting the runtime system in mapping processes onto hardware - increase in performance
- Default: linear array, ranks from 0 to n-1
- Virtual topologies can give rise to trees, graphs, meshes, etc.
41. Introduction
- Any process topology can be represented by a graph.
- MPI provides defaults for ring, mesh, torus and other common structures
42. Cartesian Topology
- Cartesian structures of arbitrary dimensions
- Can be periodic along any number of dimensions
- Popular cartesian structures: linear array, ring, rectangular mesh, cylinder, torus (hypercubes)
43. Cartesian Topology - Constructors
- MPI_CART_CREATE(
  - comm_old - old communicator,
  - ndims - number of dimensions,
  - dims - number of processes along each dimension,
  - periods - periodicity of the dimensions,
  - reorder - whether ranks may be reordered,
  - comm_cart - new communicator representing the cartesian topology
  )
- Collective communication call
44. Cartesian Topology - Constructors
- MPI_DIMS_CREATE(
  - nnodes (in) - number of nodes in a grid,
  - ndims (in) - number of dimensions,
  - dims (inout) - number of processes along each dimension
  )
- Helps to create dimension sizes such that the sizes are as close to each other as possible.
- The user can specify constraints with positive integers in certain entries of dims; only entries set to 0 are modified.
- Examples: dims = (0,0) with nnodes = 6 returns (3,2); dims = (0,3,0) with nnodes = 6 returns (2,3,1); dims = (0,3,0) with nnodes = 7 is an error.
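A minimal sketch (not from the original slides): let MPI_DIMS_CREATE factor the process count into a balanced 2-D grid, then build a periodic (torus) cartesian communicator on it.

  #include <mpi.h>
  int main(int argc, char **argv)
  {
      int size, myrank;
      int dims[2] = {0, 0};        /* 0 = let MPI choose both extents */
      int periods[2] = {1, 1};     /* periodic in both dimensions: a torus */
      int coords[2];
      MPI_Comm cart;
      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      MPI_Dims_create(size, 2, dims);                         /* e.g. 6 -> (3,2) */
      MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);
      MPI_Comm_rank(cart, &myrank);
      MPI_Cart_coords(cart, myrank, 2, coords);               /* my grid position */
      MPI_Finalize();
      return 0;
  }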
45. Cartesian Topology - Inquiry and Translators
- MPI_CARTDIM_GET(comm, ndims)
- MPI_CART_GET(comm, maxdims, dims, periods, coords)
- MPI_CART_RANK(comm, coords, rank) - coordinates -> rank
- MPI_CART_COORDS(comm, rank, maxdims, coords) - rank -> coordinates
46. Cartesian Topology - Shifting
- MPI_CART_SHIFT(
  - comm,
  - direction,
  - displacement,
  - source,
  - dest
  )
- Useful for a subsequent SendRecv:
  MPI_Sendrecv(..., dest, ..., source, ...)
- Example:
  MPI_CART_SHIFT(comm, 1, 1, source, dest)
47. Example: Cannon's Matrix-Matrix Multiplication
[Figure: A and B partitioned into sqrt(P) x sqrt(P) blocks (a 4x4 layout shown). Initial realignment: block row i of A is shifted left by i and block column j of B is shifted up by j, so process (i,j) starts with blocks A(i,(i+j) mod 4) and B((i+j) mod 4, j)]
Initial Realignment
48. Example: Cannon's Matrix-Matrix Multiplication
[Figure: the three subsequent states of the 4x4 block layout after the first, second and third shifts; each step multiplies the resident blocks, then shifts A one block left and B one block up]
49. Cannon's Algorithm with MPI Topologies
  dims[0] = dims[1] = sqrt(P);
  periods[0] = periods[1] = 1;
  MPI_Cart_create(comm, 2, dims, periods, 1, &comm_2d);
  MPI_Comm_rank(comm_2d, &my2drank);
  MPI_Cart_coords(comm_2d, my2drank, 2, mycoords);
  MPI_Cart_shift(comm_2d, 0, -1, &rightrank, &leftrank);
  MPI_Cart_shift(comm_2d, 1, -1, &downrank, &uprank);
  nlocal = n/dims[0];
50. Cannon's Algorithm with MPI Topologies
  /* Initial Matrix Alignment */
  MPI_Cart_shift(comm_2d, 0, -mycoords[0], &shiftsource, &shiftdest);
  MPI_Sendrecv_replace(a, nlocal*nlocal, MPI_DOUBLE, shiftdest, 1,
                       shiftsource, 1, comm_2d, &status);
  MPI_Cart_shift(comm_2d, 1, -mycoords[1], &shiftsource, &shiftdest);
  MPI_Sendrecv_replace(b, nlocal*nlocal, MPI_DOUBLE, shiftdest, 1,
                       shiftsource, 1, comm_2d, &status);
51. Cannon's Algorithm with MPI Topologies
  /* Main Computation Loop */
  for(i=0; i<dims[0]; i++){
      MatrixMultiply(nlocal, a, b, c);  /* c = c + a*b */
      /* Shift matrix a left by one */
      MPI_Sendrecv_replace(a, nlocal*nlocal, MPI_DOUBLE, leftrank, 1,
                           rightrank, 1, comm_2d, &status);
      /* Shift matrix b up by one */
      MPI_Sendrecv_replace(b, nlocal*nlocal, MPI_DOUBLE, uprank, 1,
                           downrank, 1, comm_2d, &status);
  }
52. Cannon's Algorithm with MPI Topologies
  /* Restore the original distribution of a and b */
  MPI_Cart_shift(comm_2d, 0, mycoords[0], &shiftsource, &shiftdest);
  MPI_Sendrecv_replace(a, nlocal*nlocal, MPI_DOUBLE, shiftdest, 1,
                       shiftsource, 1, comm_2d, &status);
  MPI_Cart_shift(comm_2d, 1, mycoords[1], &shiftsource, &shiftdest);
  MPI_Sendrecv_replace(b, nlocal*nlocal, MPI_DOUBLE, shiftdest, 1,
                       shiftsource, 1, comm_2d, &status);
53. General Graph Topology
- MPI_GRAPH_CREATE(comm_old, nnodes, index, edges, reorder, comm_graph)
- Example:
  - nnodes = 8
  - index = {3, 4, 6, 7, 10, 11, 13, 14}
  - edges = {1, 2, 4, 0, 0, 3, 2, 0, 5, 6, 4, 4, 7, 6}
[Figure: the 8-node graph (nodes 0-7) described by these index and edges arrays]
54. General Graph Topology - Inquiry
- MPI_Graphdims_get(MPI_Comm comm, int *nnodes, int *nedges)
- MPI_Graph_get(MPI_Comm comm, int maxindex, int maxedges, int *index, int *edges)
- MPI_Graph_neighbors_count(MPI_Comm comm, int rank, int *nneighbors)
- MPI_Graph_neighbors(MPI_Comm comm, int rank, int maxneighbors, int *neighbors)
- MPI_TOPO_TEST(comm, status)
  - status can be MPI_GRAPH, MPI_CART, MPI_UNDEFINED
56. Communicators as Caches
- Caches are used for storing and retrieving attributes
- MPI_KEYVAL_CREATE(copy_fn, delete_fn, keyval, extra_state)
  typedef int MPI_Copy_function(MPI_Comm oldcomm, int keyval,
                                void *extra_state, void *attribute_val_in,
                                void *attribute_val_out, int *flag);
  typedef int MPI_Delete_function(MPI_Comm comm, int keyval,
                                  void *attribute_val, void *extra_state);
- MPI_ATTR_PUT(comm, keyval, attribute_val)
- MPI_ATTR_GET(comm, keyval, attribute_val, flag)
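A minimal sketch of this MPI-1 attribute-caching interface (not from the original slides; note these calls were deprecated in MPI-2 in favor of MPI_Comm_create_keyval and friends):

  #include <mpi.h>
  int main(int argc, char **argv)
  {
      int keyval, flag;
      static int value = 42;   /* the cached attribute must outlive the put */
      void *got;
      MPI_Init(&argc, &argv);
      MPI_Keyval_create(MPI_NULL_COPY_FN, MPI_NULL_DELETE_FN, &keyval, NULL);
      MPI_Attr_put(MPI_COMM_WORLD, keyval, &value);     /* cache it on the comm */
      MPI_Attr_get(MPI_COMM_WORLD, keyval, &got, &flag);
      /* here flag != 0 and got == &value */
      MPI_Finalize();
      return 0;
  }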
57. Point-Point Example
  main(){
      int count;
      /* sendbuf[count] holds this process's contribution */
      if(rank != 0){
          MPI_Recv(recvbuf, count, MPI_INT, rank-1, tag, comm, &status);
      }
      else{
          for(i=0; i<count; i++)
              recvbuf[i] = 0;
      }
      for(i=0; i<count; i++)
          recvbuf[i] += sendbuf[i];
      if(rank != size-1)
          MPI_Send(recvbuf, count, MPI_INT, rank+1, tag, comm);
  }
58. Collective Communications: Finding Maximum
  main(){
      MPI_Scatter(full_array, local_size, MPI_INT, local_array,
                  local_size, MPI_INT, 0, comm);
      local_max = max(local_array);
      MPI_Allreduce(&local_max, &global_max, 1, MPI_INT, MPI_MAX, comm);
  }
59. Miscellaneous Attributes / Functions
- MPI_WTIME_IS_GLOBAL - for checking clock synchronization
- MPI_GET_PROCESSOR_NAME(name, resultlen)
- MPI_WTIME(), MPI_WTICK()
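A minimal timing sketch (not from the original slides): bracket a region with MPI_Wtime; MPI_Wtick reports the timer's resolution.

  #include <mpi.h>
  #include <stdio.h>
  int main(int argc, char **argv)
  {
      double t0, t1;
      MPI_Init(&argc, &argv);
      MPI_Barrier(MPI_COMM_WORLD);   /* rough common starting point */
      t0 = MPI_Wtime();
      /* ... region being timed ... */
      t1 = MPI_Wtime();
      printf("elapsed %g s (timer resolution %g s)\n", t1 - t0, MPI_Wtick());
      MPI_Finalize();
      return 0;
  }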
60. Profiling Interface
- Primarily intended for profiling tool developers
- Also used for combining different MPI implementations
- MPI implementors need to provide equivalent functions with the PMPI_ prefix, e.g. PMPI_Bcast for MPI_Bcast
61. Profiling Interface - Example
  #pragma weak MPI_Send = PMPI_Send
  int PMPI_Send(/* appropriate args */)
  {
      /* Useful content */
  }
62. Point-Point: Some More Functions / Default Definitions
- MPI_PROC_NULL
- MPI_ANY_SOURCE
- MPI_IPROBE(source, tag, comm, flag, status)
- MPI_PROBE(source, tag, comm, status)
- MPI_CANCEL(request)
- MPI_TEST_CANCELLED(status, flag)
- Persistent communication requests (see the sketch below):
  - MPI_SEND_INIT(buf, count, datatype, dest, tag, comm, request)
  - MPI_RECV_INIT(buf, count, datatype, source, tag, comm, request)
  - MPI_START(request)
  - MPI_STARTALL(count, array_of_requests)
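A minimal persistent-request sketch (not from the original slides; assumes two processes): a receive repeated with identical arguments is set up once and restarted each iteration.

  #include <mpi.h>
  int main(int argc, char **argv)
  {
      int rank, buf, i, iters = 10;
      MPI_Request req;
      MPI_Status status;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 1) {
          MPI_Recv_init(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req); /* set up once */
          for (i = 0; i < iters; i++) {
              MPI_Start(&req);          /* activate the persistent request */
              MPI_Wait(&req, &status);  /* completes, but stays allocated */
          }
          MPI_Request_free(&req);
      } else if (rank == 0) {
          for (i = 0; i < iters; i++)
              MPI_Send(&i, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      }
      MPI_Finalize();
      return 0;
  }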