Title: MPI and OpenMP
1 MPI and OpenMP
- Kevin Leung
- Nathan Liang
- Paul Maynard
2 What is MPI?
- The Message Passing Interface (MPI) standard
  - is a library of functions that can be called from C, C++, or Fortran
  - is a programming paradigm used widely on parallel computers, e.g.
    - Scalable Parallel Computers (SPCs) with distributed memory
    - Networks of Workstations (NOWs)
  - was developed by a broadly based committee of vendors, implementers, and users
3 Why was MPI introduced?
- Motivation for parallel systems
  - Hardware limits on single CPUs
  - Commodity computing
- Problem
  - Coordinating the use of multiple CPUs
- Solution
  - Message passing
- Problem
  - Proprietary systems, lack of portability
- Solution
  - The MPI consortium, started in 1992
4 Goals of MPI
- Design an application programming interface.
- Allow efficient communication.
- Allow data to be passed between processes in a distributed memory environment.
- Allow for implementations that can be used in a heterogeneous environment.
- Allow convenient C and Fortran 77 bindings for the interface.
- Provide a reliable communication interface: the user need not cope with communication failures.
- Define an interface not too different from current practice (e.g., NX, PVM), with extensions that allow greater flexibility.
- Define an interface that can be implemented on many vendors' platforms.
5 Versions of MPI
- The original MPI standard was created by the Message Passing Interface Forum (MPIF).
- Version 1.0 of MPI was publicly released in June 1994.
- The MPIF began meeting again in March 1995, and version 1.1 of the standard was released in June 1995.
- Since July 1997, the original MPI has been referred to as MPI-1 and the new effort is called MPI-2.
6 What is included in the MPI standard?
- Bindings for Fortran 77 and C
- Point-to-point communication
- Collective operations
- Process groups
- Communication domains
- Process topologies
- Environmental Management and inquiry
- Profiling interface
7 Language Bindings
- All MPI names have an MPI_ prefix.
- In Fortran 77, all characters are upper case.
- In C, constants are in all capital letters, and defined types and functions have one capital letter after the prefix.
- Programs must not declare variables or functions with names beginning with the prefix MPI_ or PMPI_.
- The definitions of named constants, function prototypes, and type definitions are supplied in the include files mpi.h (C) and mpif.h (Fortran).
8 MPI Functions
- MPI is large (there are 128 MPI routines)
- 6 basic functions
  - MPI_INIT(&argc, &argv)
    - Initiate an MPI computation.
  - MPI_FINALIZE()
    - Shut down a computation.
  - MPI_COMM_SIZE(comm, size)
    - Determine the number of processes in a computation.
  - MPI_COMM_RANK(comm, pid)
    - Determine the identifier (rank) of the current process.
  - MPI_SEND(buf, count, datatype, dest, tag, comm)
    - Send a message.
  - MPI_RECV(buf, count, datatype, source, tag, comm, status)
    - Receive a message.
9 How to use MPI
- Include the MPI header file
  - e.g. #include <mpi.h>
- Initialize the MPI environment
- Write the code
- Finalize the MPI environment
  - e.g. MPI_Finalize()
10 Hello World!!

#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int my_rank, p, source, dest, tag = 0;
    char message[100];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (my_rank != 0) {
        /* Create message */
        sprintf(message, "Hello from process %d!", my_rank);
        dest = 0;
        MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    } else {
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }
    MPI_Finalize();
    return 0;
}
11 Include File
Include the MPI header file:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])

(Program stages: Include -> Initialize -> Work -> Terminate)
12 Initialize MPI
Initialize the MPI environment:

int main(int argc, char *argv[])
{
    int numtasks, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    ...
}

(Stage: Initialize)
13 Initialize MPI (cont.)
- MPI_Init(&argc, &argv)
  - No MPI functions may be called before this call.
- MPI_Comm_size(MPI_COMM_WORLD, &nump)
  - A communicator is a collection of processes that can send messages to each other. MPI_COMM_WORLD is a predefined communicator that consists of all the processes running when the program execution begins.
- MPI_Comm_rank(MPI_COMM_WORLD, &myrank)
  - Lets a process find out its own rank.
(Stage: Initialize)
14 Work with MPI
Work: make message passing calls (Send, Receive):

if (my_rank != 0)
    MPI_Send(data, strlen(data)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
else
    MPI_Recv(data, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);

(Stage: Work)
15 Terminate MPI environment
Terminate the MPI environment:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    ...
    MPI_Finalize();
}

No MPI functions may be called after this call.
(Stage: Terminate)
16 Compile and Run MPI
- Compile
  - gcc -o hello.exe mpi_hello.c -lmpi
  - or: mpicc mpi_hello.c
- Run
  - mpirun -np 5 hello.exe
- Output

mpirun -np 5 hello.exe
Hello from process 1!
Hello from process 2!
Hello from process 3!
Hello from process 4!
17 Implementation
- MPI's advantage over older message passing
libraries is that it is both portable (because
MPI has been implemented for almost every
distributed memory architecture) and fast
(because each implementation is optimized for the
hardware it runs on).
18 Kinds of Commands
- Point to Point Communication
- Collective Communication
- User Defined Datatypes and Packing
- Groups and Communicators
- Process Topologies
19 Point-to-Point Communication
- The basic communication mechanism
- Handles data transmission between any two processes
- One process sends the data and the other receives it
20 Example: C code in which process 0 sends a message to process 1.

char msg[20];
int myrank, tag = 99;
MPI_Status status;
...
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find my rank */
if (myrank == 0) {
    strcpy(msg, "Hello there");
    MPI_Send(msg, strlen(msg)+1, MPI_CHAR, 1, tag, MPI_COMM_WORLD);
} else if (myrank == 1) {
    MPI_Recv(msg, 20, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
}
21 Blocking Communication
- MPI_SEND and MPI_RECV
- The send function blocks until process 0 can safely overwrite the contents of msg
- The receive function blocks until the receive buffer actually contains the contents of the message
22 Deadlock
- Example: each process posts a blocking receive first, so neither ever reaches its send.
    Process 0      Process 1
    Recv(1)        Recv(0)
    Send(1)        Send(0)
- Solutions (a Sendrecv sketch follows below)
  - Reorder the communications
        Process 0      Process 1
        Send(1)        Recv(0)
        Recv(1)        Send(0)
  - Use MPI_Sendrecv
        Process 0      Process 1
        Sendrecv(1)    Sendrecv(0)
  - Use non-blocking Isend or Irecv
        Process 0      Process 1
        Isend(1)       Isend(0)
        Irecv(1)       Irecv(0)
        Waitall        Waitall
  - Use the buffered mode Bsend
        Process 0      Process 1
        Bsend(1)       Bsend(0)
        Recv(1)        Recv(0)
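A minimal sketch of the MPI_Sendrecv solution, not taken from the slides: two processes exchange one integer in a single call, so neither can block on the other's send. The buffer names, tag 0, and the two-process assumption are illustrative.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, sendbuf, recvbuf, partner;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    partner = 1 - rank;          /* run with exactly 2 processes */
    sendbuf = rank;

    /* Send and receive in one call: MPI orders the transfers internally,
       so neither process deadlocks waiting for the other's send. */
    MPI_Sendrecv(&sendbuf, 1, MPI_INT, partner, 0,
                 &recvbuf, 1, MPI_INT, partner, 0,
                 MPI_COMM_WORLD, &status);

    printf("Process %d received %d\n", rank, recvbuf);
    MPI_Finalize();
    return 0;
}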
23 Nonblocking Communication
- MPI_ISEND and MPI_IRECV
- The call returns immediately; it does not wait for the communication to complete
- Allows communication and computation to overlap (concurrency); see the sketch below
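A minimal non-blocking sketch, not from the slides: MPI_Isend and MPI_Irecv return immediately and MPI_Waitall later confirms completion. The two-process exchange, tag 0, and variable names are illustrative assumptions.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, partner, sendbuf, recvbuf;
    MPI_Request reqs[2];
    MPI_Status stats[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    partner = 1 - rank;              /* run with exactly 2 processes */
    sendbuf = rank;

    /* Both calls return immediately; the exchange proceeds in the background. */
    MPI_Isend(&sendbuf, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&recvbuf, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... other computation could overlap with the communication here ... */

    MPI_Waitall(2, reqs, stats);     /* block until both operations complete */
    printf("Process %d received %d\n", rank, recvbuf);

    MPI_Finalize();
    return 0;
}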
24 User-Defined Datatypes and Packing
- All MPI communication functions take a datatype argument. In the simplest case this is a primitive type, such as an integer or floating-point number.
- An important and powerful generalization results from allowing user-defined types wherever the primitive types can occur.
- The user can define derived datatypes that specify more general data layouts (see the sketch below).
- A sending process can explicitly pack noncontiguous data (an array, a structure, etc.) into a contiguous buffer and then send it.
- A receiving process can unpack the contiguous buffer and store it as noncontiguous data.
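A sketch of one derived datatype, MPI_Type_vector, not taken from the slides: it describes every second element of an array so noncontiguous data can be sent in a single message. Array size 8, the "evens" type name, and the two-process setup are illustrative assumptions.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, i;
    double a[8];
    MPI_Datatype evens;            /* every second element of a[] */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* 4 blocks of 1 double each, separated by a stride of 2 elements */
    MPI_Type_vector(4, 1, 2, MPI_DOUBLE, &evens);
    MPI_Type_commit(&evens);

    if (rank == 0) {
        for (i = 0; i < 8; i++) a[i] = i;
        MPI_Send(a, 1, evens, 1, 0, MPI_COMM_WORLD);   /* sends a[0], a[2], a[4], a[6] */
    } else if (rank == 1) {
        for (i = 0; i < 8; i++) a[i] = -1;
        MPI_Recv(a, 1, evens, 0, 0, MPI_COMM_WORLD, &status);
        for (i = 0; i < 8; i++) printf("%g ", a[i]);   /* received values land at the same offsets */
        printf("\n");
    }

    MPI_Type_free(&evens);
    MPI_Finalize();
    return 0;
}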
25 Collective Communication
- Collective communications transmit data among all processes in a group
- Barrier synchronization
  - MPI_Barrier synchronizes all processes in the communicator calling this function
- Data movement (a broadcast/reduce sketch follows below)
  - Broadcast: from 1 --> all
  - Gather: data from all --> 1
  - Scatter: data from 1 --> all
  - All-gather
  - All-to-all
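A minimal collective-communication sketch, not from the slides: a broadcast from rank 0, a sum reduction back onto rank 0, and a barrier. The value 100 and the variable names are illustrative assumptions.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, n, local, total;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Broadcast: process 0 sends n to every process in the communicator */
    if (rank == 0) n = 100;
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Reduce: sum each process's contribution onto process 0 */
    local = rank * n;
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("sum = %d\n", total);

    MPI_Barrier(MPI_COMM_WORLD);   /* all processes wait here before finishing */
    MPI_Finalize();
    return 0;
}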
26 Groups and Communicators
- Division of processes
  - MPI_COMM_WORLD
  - MPI_COMM_SIZE
  - MPI_COMM_RANK
- Avoiding message conflicts between modules
- Expanding the functionality of the message passing system
- Safety
(A communicator-splitting sketch follows below.)
27 Process Topologies
- The processes (ranks) can be arranged in topological patterns such as two- or three-dimensional grids.
- A topology can provide a convenient naming mechanism for the processes of a group (within a communicator), and may additionally assist the runtime system in mapping the processes onto hardware. A Cartesian-topology sketch follows below.

Figure: Relationship between ranks and Cartesian coordinates for a 3x4 2D topology. The upper number in each box is the rank of the process and the lower value is its (row, column) coordinates.
Figure: Overlapping topology. The upper values in each process are the rank / (row, col) in the original 2D topology and the lower values are the same for the shifted 2D topology.
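A sketch of the 3x4 Cartesian topology from the figure above, not from the slides themselves: MPI_Cart_create builds the grid communicator and MPI_Cart_coords maps each rank to its (row, column). The communicator name grid and the non-periodic dimensions are illustrative assumptions.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, coords[2];
    int dims[2]    = {3, 4};     /* 3 rows x 4 columns, as in the figure */
    int periods[2] = {0, 0};     /* no wrap-around in either dimension */
    MPI_Comm grid;

    MPI_Init(&argc, &argv);

    /* Arrange the processes of MPI_COMM_WORLD in a 3x4 grid
       (run with exactly 12 processes). */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &grid);

    MPI_Comm_rank(grid, &rank);
    MPI_Cart_coords(grid, rank, 2, coords);
    printf("rank %d is at (row %d, col %d)\n", rank, coords[0], coords[1]);

    MPI_Comm_free(&grid);
    MPI_Finalize();
    return 0;
}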
28 OpenMP
29 What is it?
- What does OpenMP stand for?
  - Open specifications for Multi Processing
- It is an API with three main components
  - Compiler directives
  - Library routines
  - Environment variables
- Used for writing multithreaded programs
30 What do you need?
- What programming languages?
  - C/C++
  - FORTRAN (77, 90, 95)
- What operating systems?
  - UNIX
  - Windows NT
- Can I compile OpenMP code with gcc?
  - No, it requires an OpenMP-aware compiler
31 Some compilers for OpenMP
- SGI MIPSpro
  - Fortran, C, C++
- IBM XL
  - C/C++ and Fortran
- Sun Studio 10
  - Fortran 95, C, and C++
- Portland Group Compilers and Tools
  - Fortran, C, and C++
- Absoft Pro FortranMP
  - Fortran, C, and C++
- PathScale
  - Fortran
32 What it does
- A program starts off with a master thread
- It runs for some amount of time
- When the master thread reaches a region where the work can be done concurrently
  - It creates several threads
  - They all do work in this region
- When the end of the region is reached
  - All the threads terminate, except for the master thread
(A minimal fork-join sketch follows below.)
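A minimal fork-join sketch, not from the slides: the master thread runs alone, a team of threads is forked at the parallel region, and only the master continues afterwards. The printed messages are illustrative.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("master thread alone\n");        /* serial part: master thread only */

    #pragma omp parallel                     /* fork: a team of threads is created */
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }                                        /* join: only the master thread continues */

    printf("master thread alone again\n");
    return 0;
}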
33 Example
- I get a job moving boxes
- When I go to work I bring several friends
  - Who help me move the boxes
- On pay day
  - I don't bring any friends, and I get all the money
34 OpenMP directives
- Format example (shown in the sketch below)
  - #pragma omp parallel for shared(y)
- Always starts with
  - #pragma omp
- Then the directive name
  - parallel for
- Followed by a clause
  - The clause is optional
  - shared(y)
- At the end, a newline
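A sketch showing the example directive in context, not from the slides: the parallel for directive with the shared(y) clause splits the loop iterations across threads. The array size N and the loop body are illustrative assumptions.

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void)
{
    double y[N];
    int i;

    /* "#pragma omp" + directive name "parallel for" + optional clause "shared(y)" + newline */
    #pragma omp parallel for shared(y)
    for (i = 0; i < N; i++)       /* the loop variable i is made private automatically */
        y[i] = 2.0 * i;

    printf("y[N-1] = %g\n", y[N - 1]);
    return 0;
}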
35 Directives list (a sections/single sketch follows the list)
- PARALLEL
  - Multiple threads will execute the code in the block
- DO/for
  - Causes the do or for loop to be executed in parallel by the worker threads
- SECTIONS
  - The enclosed sections are divided among the threads; each section is executed once
- SINGLE
  - Executed by only one thread
- PARALLEL DO/for
  - A parallel region that contains only one DO/for loop
- PARALLEL SECTIONS
  - A parallel region that contains only one SECTIONS block
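A sketch of the SECTIONS and SINGLE work-sharing directives, not from the slides: each section runs once on some thread, and the single block runs on exactly one thread. The printed labels are illustrative.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        #pragma omp sections          /* each section runs once, on some thread */
        {
            #pragma omp section
            printf("section A on thread %d\n", omp_get_thread_num());

            #pragma omp section
            printf("section B on thread %d\n", omp_get_thread_num());
        }

        #pragma omp single            /* executed by exactly one thread */
        printf("single block on thread %d\n", omp_get_thread_num());
    }
    return 0;
}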
36 Work Sharing (figure)
37 Work Sharing (figure)
38 Work Sharing (figure)
39 Data scope attribute clauses (see the sketch below)
- PRIVATE
  - Each thread gets its own independent copy of the listed variables
- SHARED
  - The listed variables are shared among all threads
- DEFAULT
  - Sets a default scope for all variables in the block
- FIRSTPRIVATE
  - PRIVATE, with each copy initialized from the original variable
- LASTPRIVATE
  - PRIVATE, with the value from the last loop iteration (or last section) copied back to the original variable
- COPYIN
  - Copies the master thread's value of a threadprivate variable into each thread's copy
- REDUCTION
  - Combines the private copies of a variable with a reduction operation and stores the result in the shared variable
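A sketch combining several data-scope clauses, not from the slides: x stays shared, each thread gets an initialized private copy of offset via firstprivate, and the per-thread partial sums are combined by reduction. The variable names and loop bounds are illustrative assumptions.

#include <stdio.h>
#include <omp.h>

#define N 100

int main(void)
{
    int i, offset = 5;
    int sum = 0;
    double x[N];

    /* x is shared; each thread gets a private copy of offset's initial value;
       the threads' private sums are added into the shared sum at the end. */
    #pragma omp parallel for shared(x) firstprivate(offset) reduction(+:sum)
    for (i = 0; i < N; i++) {
        x[i] = i + offset;
        sum += i;
    }

    printf("sum = %d (expected %d)\n", sum, (N - 1) * N / 2);
    return 0;
}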
40 Directives and clauses (summary table figure)
41 Synchronization (see the sketch below)
- MASTER
  - Only the master thread executes this block
- CRITICAL
  - Only one thread can execute this block at a time
- BARRIER
  - Causes all threads to wait at this point until every thread has reached it
- ATOMIC
  - The memory location is updated by one thread at a time
- FLUSH
  - Makes each thread's view of memory consistent
- ORDERED
  - The enclosed loop iterations execute in the order of a serial loop
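A sketch of several synchronization directives, not from the slides: critical and atomic protect the shared counters, barrier makes every thread wait, and master restricts the final print to the master thread. The counter names are illustrative.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int count = 0, hits = 0;

    #pragma omp parallel
    {
        #pragma omp critical          /* one thread at a time in this block */
        count++;

        #pragma omp atomic            /* single memory update, one thread at a time */
        hits++;

        #pragma omp barrier           /* everyone waits until all threads arrive */

        #pragma omp master            /* only the master thread prints */
        printf("count = %d, hits = %d\n", count, hits);
    }
    return 0;
}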
42 Environment Variables
- OMP_SCHEDULE
  - How loop iterations are scheduled (schedule type and chunk size)
- OMP_NUM_THREADS
  - Number of threads
- OMP_DYNAMIC
  - Whether a dynamic number of threads is allowed
- OMP_NESTED
  - Whether nested parallelism is allowed
43 Library Routines (a usage sketch follows the list)
- OMP_SET_NUM_THREADS
- OMP_GET_NUM_THREADS
- OMP_GET_MAX_THREADS
- OMP_GET_THREAD_NUM
- OMP_GET_NUM_PROCS
- OMP_IN_PARALLEL
- OMP_SET_DYNAMIC
- OMP_GET_DYNAMIC
- OMP_SET_NESTED
- OMP_GET_NESTED
- OMP_INIT_LOCK
- OMP_DESTROY_LOCK
- OMP_SET_LOCK
- OMP_UNSET_LOCK
- OMP_TEST_LOCK
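A sketch exercising several of the routines listed above, not from the slides: setting the thread count, querying thread and processor information, and using a simple lock. The printed format is illustrative.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_lock_t lock;

    omp_set_num_threads(4);               /* OMP_SET_NUM_THREADS */
    omp_init_lock(&lock);                 /* OMP_INIT_LOCK */

    #pragma omp parallel
    {
        int id = omp_get_thread_num();    /* OMP_GET_THREAD_NUM */

        omp_set_lock(&lock);              /* OMP_SET_LOCK: enter one thread at a time */
        printf("thread %d of %d (on %d processors, in parallel: %d)\n",
               id, omp_get_num_threads(), /* OMP_GET_NUM_THREADS */
               omp_get_num_procs(),       /* OMP_GET_NUM_PROCS */
               omp_in_parallel());        /* OMP_IN_PARALLEL */
        omp_unset_lock(&lock);            /* OMP_UNSET_LOCK */
    }

    omp_destroy_lock(&lock);              /* OMP_DESTROY_LOCK */
    return 0;
}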
44 Example (http://beowulf.lcs.mit.edu/18.337/beowulf.html)

#include <math.h>
#include <stdio.h>
#define N 16384
#define M 10

double dotproduct(int, double *);

double dotproduct(int i, double *x)
{
    double temp = 0.0, denom;
    int j;
    for (j = 0; j < N; j++) {
        // zero based!!
        denom = (i + j) * (i + j + 1) / 2 + i + 1;
        temp = temp + x[j] * (1 / denom);
    }
    return temp;
}

int main()
{
    double *x = new double[N];
    double *y = new double[N];
    double eig = sqrt(N);
    double denom, temp;
    int i, j, k;

    for (i = 0; i < N; i++) x[i] = 1 / eig;

    for (k = 0; k < M; k++) {
        // compute y = Ax
        #pragma omp parallel for shared(y)
        for (i = 0; i < N; i++) y[i] = dotproduct(i, x);

        // find largest eigenvalue of y
        eig = 0;
        for (i = 0; i < N; i++) eig = eig + y[i] * y[i];
        eig = sqrt(eig);
        printf("The largest eigenvalue after %2d iterations is %16.15e\n", k + 1, eig);

        // normalize
        for (i = 0; i < N; i++) x[i] = y[i] / eig;
    }
}
45 References
- http://beowulf.lcs.mit.edu/18.337/beowulf.html
- http://www.compunity.org/resources/compilers/index.php
- http://www.llnl.gov/computing/tutorials/workshops/workshop/openMP/MAIN.html#ClausesDirectives