Title: MPI Message Passing Interface
1. MPI: Message Passing Interface
Mehmet Balman, Cmpe 587, December 2001
2. Parallel Computing
- Separate workers or processes
- Interact by exchanging information
- Types of parallel computing
- SIMD (single instruction multiple data)
- SPMD (single program multiple data)
- MPMD (multiple program multiple data)
- Hardware models
- Distributed memory (Paragon, IBM SPx, workstation network)
- Shared memory (SGI Power Challenge, Cray T3D)
3. Communication with other processes
- One sided
- one worker performs transfer of data
- Cooperative
- all parties agree to transfer data
4. What is MPI?
- A message-passing library specification
- Coordinates multiple processors by message passing
- A library of functions and macros that can be used in C and FORTRAN programs
- For parallel computers, clusters, and heterogeneous networks
- Who designed MPI?
- Vendors: IBM, Intel, TMC, Meiko, Cray, Convex, Ncube
- Library writers: PVM, p4, Zipcode, TCGMSG, Chameleon, Express, Linda
- Broad participation
5. Development history (1993-1994)
- Began at Williamsburg Workshop in April, 1992
- Organized at Supercomputing '92 (November)
- Met every six weeks for two days
- Pre-final draft distributed at Supercomputing '93
- Final version of draft in May, 1994
- Public and vendor implementations available
6. Features of MPI
- Point-to-point communication
- blocking, nonblocking
- synchronous, asynchronous
- ready, buffered
- Collective routines
- built-in, user defined
- A large number of data movement routines
- Built-in support for grids and graphs
- 125 functions (MPI is large)
- 6 basic functions (MPI is small)
- Communicators combine context and groups for
message security
7. Example
include "mpi.h" include ltstdio.hgt int main(
int argc, char argv) int rank, size
MPI_Init( argc, argv ) MPI_Comm_rank(
MPI_COMM_WORLD, rank ) MPI_Comm_size(
MPI_COMM_WORLD, size ) printf( "Hello world!
I'm d of d\n", rank, size ) MPI_Finalize()
return 0
8. What happens when an MPI job is run
- The user issues a directive to the operating system which places a copy of the executable program on each processor
- Each processor begins execution of its copy of the executable
- Different processes can execute different statements by branching within the program; typically the branching is based on process rank
- Envelope of a message (control block)
- the rank of the receiver
- the rank of the sender
- a tag
- a communicator
9. Two mechanisms for partitioning message space
- Tags (0-32767)
- Communicators (e.g., MPI_COMM_WORLD)
10. Basic MPI calls
- MPI_Init, MPI_Finalize, MPI_Comm_size, MPI_Comm_rank, MPI_Send, MPI_Recv
- MPI_Send(start, count, datatype, dest, tag, comm)
- MPI_Recv(start, count, datatype, source, tag, comm, status)
- MPI_Bcast(start, count, datatype, root, comm)
- MPI_Reduce(start, result, count, datatype, operation, root, comm)
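A minimal sketch of how these calls fit together for point-to-point messaging; the tag value (99) and the payload are illustrative choices, not part of the slides:

#include "mpi.h"
#include <stdio.h>

/* Sketch: rank 0 sends one integer to rank 1; run with at least 2 processes. */
int main(int argc, char *argv[])
{
    int rank, value = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                      /* illustrative payload */
        MPI_Send(&value, 1, MPI_INT, 1, 99, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 99, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}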
11. Collective patterns
12. Collective computation operations
Operation Name      Meaning
MPI_MAX             Maximum
MPI_MIN             Minimum
MPI_SUM             Sum
MPI_PROD            Product
MPI_LAND            Logical and
MPI_BAND            Bitwise and
MPI_LOR             Logical or
MPI_BOR             Bitwise or
MPI_LXOR            Logical exclusive or
MPI_BXOR            Bitwise exclusive or
MPI_MAXLOC          Maximum and location of maximum
MPI_MINLOC          Minimum and location of minimum
MPI_Op_create(user_function, commute, &op)   /* commute is true if the operation is commutative */
MPI_Op_free(&op)
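A short sketch of a built-in collective computation; having each rank contribute its own rank number under MPI_SUM is only for illustration:

#include "mpi.h"
#include <stdio.h>

/* Sketch: every rank contributes its rank number; rank 0 receives the sum. */
int main(int argc, char *argv[])
{
    int rank, size, local, global;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    local = rank;                                        /* illustrative local value */
    MPI_Reduce(&local, &global, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of ranks 0..%d = %d\n", size - 1, global);

    MPI_Finalize();
    return 0;
}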
13. User-defined communication groups
A communicator contains a context and a group. A group is just a set of processes.
MPI_Comm_create(oldcomm, group, &newcomm)
MPI_Comm_group(oldcomm, &group)
MPI_Group_free(&group)
MPI_Group_incl, MPI_Group_excl
MPI_Group_range_incl, MPI_Group_range_excl
MPI_Group_union, MPI_Group_intersection
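A sketch of how these calls compose: build a group containing only the even ranks of MPI_COMM_WORLD and create a communicator over it. The helper name make_even_comm and the even-rank selection are assumptions for illustration.

#include "mpi.h"
#include <stdlib.h>

/* Sketch: build a communicator that contains only the even ranks of
 * MPI_COMM_WORLD. On ranks outside the group, MPI_Comm_create returns
 * MPI_COMM_NULL. */
void make_even_comm(MPI_Comm *even_comm)
{
    MPI_Group world_group, even_group;
    int size, i, n = 0, *ranks;

    MPI_Comm_size(MPI_COMM_WORLD, &size);
    ranks = (int *) malloc(size * sizeof(int));
    for (i = 0; i < size; i += 2)
        ranks[n++] = i;                       /* pick ranks 0, 2, 4, ... */

    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    MPI_Group_incl(world_group, n, ranks, &even_group);
    MPI_Comm_create(MPI_COMM_WORLD, even_group, even_comm);

    MPI_Group_free(&even_group);
    MPI_Group_free(&world_group);
    free(ranks);
}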
14. Non-blocking operations
Non-blocking operations return immediately.
- MPI_Isend(start, count, datatype, dest, tag, comm, request)
- MPI_Irecv(start, count, datatype, source, tag, comm, request)
- MPI_Wait(request, status)
- MPI_Waitall
- MPI_Waitany
- MPI_Waitsome
- MPI_Test( request, flag, status)
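A sketch of the usual pattern: post the non-blocking receive first, do the send, then wait for the receive to complete. The pairing of ranks 0 and 1 is illustrative.

#include "mpi.h"
#include <stdio.h>

/* Sketch: ranks 0 and 1 exchange one integer each without deadlock,
 * because each receive is posted before the matching send. */
int main(int argc, char *argv[])
{
    int rank, other, sendval, recvval;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank < 2) {
        other = 1 - rank;                     /* partner rank: 0 <-> 1 */
        sendval = rank;
        MPI_Irecv(&recvval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &req);
        MPI_Send(&sendval, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
        MPI_Wait(&req, &status);              /* receive completes here */
        printf("rank %d got %d\n", rank, recvval);
    }

    MPI_Finalize();
    return 0;
}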
15. Communication modes
- Synchronous mode (MPI_Ssend): the send does not complete until a matching receive has begun.
- Buffered mode (MPI_Bsend): the user supplies a buffer to the system for its use.
- Ready mode (MPI_Rsend): the user guarantees that a matching receive has already been posted.
- Non-blocking versions: MPI_Issend, MPI_Ibsend, MPI_Irsend
int bufsize;
char *buf = malloc(bufsize);          /* bufsize chosen by the user */
MPI_Buffer_attach(buf, bufsize);
...
MPI_Bsend( /* ... same arguments as MPI_Send ... */ );
...
MPI_Buffer_detach(&buf, &bufsize);
16. Datatypes
- Two main purposes
- Heterogeneity --- parallel programs between different processors
- Noncontiguous data --- structures, vectors with non-unit stride
MPI datatype          C datatype
MPI_CHAR              signed char
MPI_SHORT             signed short int
MPI_INT               signed int
MPI_LONG              signed long int
MPI_UNSIGNED_CHAR     unsigned char
MPI_UNSIGNED_SHORT    unsigned short int
MPI_UNSIGNED          unsigned int
MPI_UNSIGNED_LONG     unsigned long int
MPI_FLOAT             float
MPI_DOUBLE            double
MPI_LONG_DOUBLE       long double
MPI_BYTE              (no C equivalent)
MPI_PACKED            (no C equivalent)
17. Build derived type
void Build_derived_type(INDATA_TYPE* indata, MPI_Datatype* message_type_ptr)
{
    int block_lengths[3];
    MPI_Aint displacements[3];
    MPI_Aint addresses[4];
    MPI_Datatype typelist[3];

    /* Describe the three struct members. */
    typelist[0] = MPI_FLOAT;
    typelist[1] = MPI_FLOAT;
    typelist[2] = MPI_INT;

    /* Each member is a single element. */
    block_lengths[0] = block_lengths[1] = block_lengths[2] = 1;

    /* Compute member displacements relative to the start of the struct. */
    MPI_Address(indata, &addresses[0]);
    MPI_Address(&(indata->a), &addresses[1]);
    MPI_Address(&(indata->b), &addresses[2]);
    MPI_Address(&(indata->n), &addresses[3]);
    displacements[0] = addresses[1] - addresses[0];
    displacements[1] = addresses[2] - addresses[0];
    displacements[2] = addresses[3] - addresses[0];

    /* Build and commit the derived type. */
    MPI_Type_struct(3, block_lengths, displacements, typelist, message_type_ptr);
    MPI_Type_commit(message_type_ptr);
}
18. Other derived data types
- int MPI_Type_contiguous(int count, MPI_Datatype oldtype, MPI_Datatype *newtype)
- elements are contiguous entries in an array
- int MPI_Type_vector(int count, int block_length, int stride, MPI_Datatype element_type, MPI_Datatype *new_type)
- elements are equally spaced entries of an array
- int MPI_Type_indexed(int count, int array_of_blocklengths[], int array_of_displacements[], MPI_Datatype element_type, MPI_Datatype *new_type)
- elements are arbitrary entries of an array
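A sketch of MPI_Type_vector for non-unit stride data: describing one column of a row-major matrix so it can be sent with a single call. The 10x10 dimensions and the helper name send_column are assumptions for illustration.

#include "mpi.h"

#define N 10    /* illustrative matrix dimension */

/* Sketch: send column 'col' of a row-major NxN float matrix to 'dest'. */
void send_column(float A[N][N], int col, int dest, MPI_Comm comm)
{
    MPI_Datatype column_type;

    /* N blocks of 1 float each, N floats apart in memory. */
    MPI_Type_vector(N, 1, N, MPI_FLOAT, &column_type);
    MPI_Type_commit(&column_type);

    MPI_Send(&A[0][col], 1, column_type, dest, 0, comm);

    MPI_Type_free(&column_type);
}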
19. Pack/unpack
void Get_data4(int my_rank, float* a_ptr, float* b_ptr, int* n_ptr)
{
    int root = 0;                  /* rank of the broadcast root */
    char buffer[100];              /* pack buffer */
    int position;                  /* current position in the buffer */

    if (my_rank == 0) {
        printf("Enter a, b, and n\n");
        scanf("%f %f %d", a_ptr, b_ptr, n_ptr);

        /* Pack the data into the buffer. */
        position = 0;
        MPI_Pack(a_ptr, 1, MPI_FLOAT, buffer, 100, &position, MPI_COMM_WORLD);
        MPI_Pack(b_ptr, 1, MPI_FLOAT, buffer, 100, &position, MPI_COMM_WORLD);
        MPI_Pack(n_ptr, 1, MPI_INT, buffer, 100, &position, MPI_COMM_WORLD);

        /* Broadcast the packed buffer. */
        MPI_Bcast(buffer, 100, MPI_PACKED, root, MPI_COMM_WORLD);
    } else {
        MPI_Bcast(buffer, 100, MPI_PACKED, root, MPI_COMM_WORLD);

        /* Unpack the data from the buffer. */
        position = 0;
        MPI_Unpack(buffer, 100, &position, a_ptr, 1, MPI_FLOAT, MPI_COMM_WORLD);
        MPI_Unpack(buffer, 100, &position, b_ptr, 1, MPI_FLOAT, MPI_COMM_WORLD);
        MPI_Unpack(buffer, 100, &position, n_ptr, 1, MPI_INT, MPI_COMM_WORLD);
    }
}
20. Profiling
static int nsend = 0;
int MPI_Send(void *start, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    nsend++;    /* count user-level calls to MPI_Send */
    return PMPI_Send(start, count, datatype, dest, tag, comm);
}
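This works because every MPI routine is also available under the name-shifted entry point PMPI_...; a wrapper library that defines its own MPI_Send can be linked ahead of the MPI library and still reach the real implementation through PMPI_Send, without modifying the application.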
21. Architecture of MPI
- Complex communication operations can be expressed portably in terms of lower-level ones
- All MPI functions are implemented in terms of the macros and functions that make up the ADI (Abstract Device Interface)
- ADI
- specifying a message to be sent or received
- moving data between the API and the message-passing hardware
- managing lists of pending messages (both sent and received)
- providing basic information about the execution environment (e.g., how many tasks there are)
22. Upper layers of MPICH
23. Channel Interface
- Routines for sending and receiving envelope (control) information
- MPID_SendControl (MPID_SendControlBlock)
- MPID_RecvAnyControl
- MPID_ControlMsgAvail
- Send and receive data
- MPID_SendChannel
- MPID_RecvFromChannel
24. Channel Interface: three different data exchange mechanisms
- Eager (default): data is sent to the destination immediately and buffered on the receiver side.
- Rendezvous (MPI_Bsend): data is sent to the destination only when requested.
- Get (shared memory): data is read directly by the receiver.
25. Lower layers of MPICH
26. Summary
- Point-to-point and collective operations
- Blocking
- Non-blocking
- Asynchronous
- Synchronous
- Buffered
- Ready
- Abstraction for processes
- Rank within the group
- Virtual topologies
- Data types
- User-defined
- Predefined
- pack/unpack
- Architecture of MPI
- ADI(Abstract Device Interface)
- Channel Interface