Title: MPI and OpenMP
1 MPI and OpenMP
- Kevin Leung
- Nathan Liang
- Paul Maynard
2 What is MPI?
- The Message Passing Interface (MPI) standard
  - is a library of functions that can be called from C, C++, or Fortran
  - is a programming paradigm used widely on parallel computers, e.g.
    - Scalable Parallel Computers (SPCs) with distributed memory
    - Networks of Workstations (NOWs)
  - was developed by a broadly based committee of vendors, implementers, and users
3 Why was MPI introduced?
- Motivation for parallel systems
  - Hardware limits on single CPUs
  - Commodity computing
- Problem
  - Coordinating the use of multiple CPUs
- Solution
  - Message passing
- Problem
  - Proprietary systems, lack of portability
- Solution
  - The MPI consortium, started in 1992
4 Goals of MPI
- Design an application programming interface.
- Allow efficient communication.
- Allow data to be passed between processes in a distributed memory environment.
- Allow for implementations that can be used in a heterogeneous environment.
- Allow convenient C and Fortran 77 bindings for the interface.
- Provide a reliable communication interface: the user need not cope with communication failures.
- Define an interface not too different from current practice (e.g., NX, PVM), with extensions that allow greater flexibility.
- Define an interface that can be implemented on many vendors' platforms.
5 Versions of MPI
- The original MPI standard was created by the Message Passing Interface Forum (MPIF).
- Version 1.0 of MPI was publicly released in June 1994.
- The MPIF began meeting again in March 1995, and version 1.1 of the standard was released in June 1995.
- Since July 1997, the original MPI has been referred to as MPI-1 and the new effort is called MPI-2.
6 What is included in the MPI standard?
- Bindings for Fortran 77 and C
- Point-to-point communication
- Collective operations
- Process groups
- Communication domains
- Process topologies
- Environmental Management and inquiry
- Profiling interface
7 Language Bindings
- All MPI names have an MPI_ prefix.
- In Fortran 77, all characters are upper case.
- In C, constants are in all capital letters, and defined types and functions have one capital letter after the prefix.
- Programs must not declare variables or functions with names beginning with the prefix MPI_ or PMPI_.
- The definitions of named constants, function prototypes, and type definitions are supplied in the include files mpi.h (C) and mpif.h (Fortran).
8 MPI Functions
- MPI is large (there are 128 MPI routines)
- 6 basic functions
  - MPI_INIT(&argc, &argv)
    - Initiate an MPI computation.
  - MPI_FINALIZE()
    - Shut down a computation.
  - MPI_COMM_SIZE(comm, size)
    - Determine the number of processes in a computation.
  - MPI_COMM_RANK(comm, pid)
    - Determine the identifier (rank) of the current process.
  - MPI_SEND(buf, count, datatype, dest, tag, comm)
    - Send a message.
  - MPI_RECV(buf, count, datatype, source, tag, comm, status)
    - Receive a message.
9 How to use MPI
- Include the MPI header file
  - e.g. #include <mpi.h>
- Initialize the MPI environment
- Write the code
- Finalize the MPI environment
  - e.g. MPI_Finalize()
10 Hello World!!

#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int my_rank, p, source, dest, tag = 0;
    char message[100];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (my_rank != 0) {
        /* Create message */
        sprintf(message, "Hello from process %d!", my_rank);
        dest = 0;
        MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    } else {
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }
    MPI_Finalize();
    return 0;
}
11 Include File
Include the MPI header file:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])

(Program stages: Include -> Initialize -> Work -> Terminate)
12 Initialize MPI
Initialize the MPI environment:

int main(int argc, char *argv[])
{
    int numtasks, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    ...
}

(Stage: Initialize)
13 Initialize MPI (cont.)
- MPI_Init(&argc, &argv)
  - No MPI functions may be called before this call.
- MPI_Comm_size(MPI_COMM_WORLD, &nump)
  - A communicator is a collection of processes that can send messages to each other. MPI_COMM_WORLD is a predefined communicator that consists of all the processes running when the program execution begins.
- MPI_Comm_rank(MPI_COMM_WORLD, &myrank)
  - Lets a process find out its own rank.
(Stage: Initialize)
14 Work with MPI
Work: make message passing calls (Send, Receive):

if (my_rank != 0)
    MPI_Send(data, strlen(data)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
else
    MPI_Recv(data, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);

(Stage: Work)
15 Terminate MPI environment
Terminate the MPI environment:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    ...
    MPI_Finalize();
}

No MPI functions may be called after this call.
(Stage: Terminate)
16 Compile and Run MPI
- Compile
  - gcc -o hello.exe mpi_hello.c -lmpi
  - or: mpicc mpi_hello.c
- Run
  - mpirun -np 5 hello.exe
- Output

mpirun -np 5 hello.exe
Hello from process 1!
Hello from process 2!
Hello from process 3!
Hello from process 4!
17 Implementation
- MPI's advantage over older message passing
libraries is that it is both portable (because
MPI has been implemented for almost every
distributed memory architecture) and fast
(because each implementation is optimized for the
hardware it runs on).
18 Kinds of Commands
- Point to Point Communication
- Collective Communication
- User Defined Datatypes and Packing
- Groups and Communicators
- Process Topologies
19 Point-to-Point Communication
- The basic communication mechanism
- Handles data transmission between any two processes
- One process sends the data and the other receives it
20 Example: C code in which process 0 sends a message to process 1.

char msg[20];
int myrank, tag = 99;
MPI_Status status;
...
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find my rank */
if (myrank == 0) {
    strcpy(msg, "Hello there");
    MPI_Send(msg, strlen(msg)+1, MPI_CHAR, 1, tag, MPI_COMM_WORLD);
} else if (myrank == 1) {
    MPI_Recv(msg, 20, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
}
21 Blocking Communication
- MPI_SEND and MPI_RECV
- The send function blocks until process 0 can safely overwrite the contents of msg
- The receive function blocks until the receive buffer actually contains the contents of the message
22 Deadlock
- Example: each process posts a blocking receive first, so neither ever reaches its send.
    Process 0      Process 1
    Recv(1)        Recv(0)
    Send(1)        Send(0)
- Solutions (a Sendrecv sketch follows below)
  - Reorder the communications
        Process 0      Process 1
        Send(1)        Recv(0)
        Recv(1)        Send(0)
  - Use MPI_Sendrecv
        Process 0      Process 1
        Sendrecv(1)    Sendrecv(0)
  - Use non-blocking Isend or Irecv
        Process 0      Process 1
        Isend(1)       Isend(0)
        Irecv(1)       Irecv(0)
        Waitall        Waitall
  - Use the buffered mode Bsend
        Process 0      Process 1
        Bsend(1)       Bsend(0)
        Recv(1)        Recv(0)
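A minimal sketch of the MPI_Sendrecv solution, not taken from the slides: two processes exchange one integer in a single call, so neither can block on the other's send. The buffer names, tag 0, and the two-process assumption are illustrative.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, sendbuf, recvbuf, partner;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    partner = 1 - rank;          /* run with exactly 2 processes */
    sendbuf = rank;

    /* Send and receive in one call: MPI orders the transfers internally,
       so neither process deadlocks waiting for the other's send. */
    MPI_Sendrecv(&sendbuf, 1, MPI_INT, partner, 0,
                 &recvbuf, 1, MPI_INT, partner, 0,
                 MPI_COMM_WORLD, &status);

    printf("Process %d received %d\n", rank, recvbuf);
    MPI_Finalize();
    return 0;
}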
23 Nonblocking Communication
- MPI_ISEND and MPI_IRECV
- The call returns immediately; it does not wait for the communication to complete
- Allows communication and computation to overlap (concurrency); see the sketch below
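A minimal non-blocking sketch, not from the slides: MPI_Isend and MPI_Irecv return immediately and MPI_Waitall later confirms completion. The two-process exchange, tag 0, and variable names are illustrative assumptions.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, partner, sendbuf, recvbuf;
    MPI_Request reqs[2];
    MPI_Status stats[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    partner = 1 - rank;              /* run with exactly 2 processes */
    sendbuf = rank;

    /* Both calls return immediately; the exchange proceeds in the background. */
    MPI_Isend(&sendbuf, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&recvbuf, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... other computation could overlap with the communication here ... */

    MPI_Waitall(2, reqs, stats);     /* block until both operations complete */
    printf("Process %d received %d\n", rank, recvbuf);

    MPI_Finalize();
    return 0;
}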
24 User-Defined Datatypes and Packing
- All MPI communication functions take a datatype argument. In the simplest case this is a primitive type, such as an integer or floating-point number.
- An important and powerful generalization results from allowing user-defined types wherever the primitive types can occur.
- The user can define derived datatypes that specify more general data layouts (see the sketch below).
- A sending process can explicitly pack noncontiguous data (an array, a structure, etc.) into a contiguous buffer and then send it.
- A receiving process can unpack the contiguous buffer and store it as noncontiguous data.
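A sketch of one derived datatype, MPI_Type_vector, not taken from the slides: it describes every second element of an array so noncontiguous data can be sent in a single message. Array size 8, the "evens" type name, and the two-process setup are illustrative assumptions.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, i;
    double a[8];
    MPI_Datatype evens;            /* every second element of a[] */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* 4 blocks of 1 double each, separated by a stride of 2 elements */
    MPI_Type_vector(4, 1, 2, MPI_DOUBLE, &evens);
    MPI_Type_commit(&evens);

    if (rank == 0) {
        for (i = 0; i < 8; i++) a[i] = i;
        MPI_Send(a, 1, evens, 1, 0, MPI_COMM_WORLD);   /* sends a[0], a[2], a[4], a[6] */
    } else if (rank == 1) {
        for (i = 0; i < 8; i++) a[i] = -1;
        MPI_Recv(a, 1, evens, 0, 0, MPI_COMM_WORLD, &status);
        for (i = 0; i < 8; i++) printf("%g ", a[i]);   /* received values land at the same offsets */
        printf("\n");
    }

    MPI_Type_free(&evens);
    MPI_Finalize();
    return 0;
}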
25 Collective Communication
- Collective communications transmit data among all processes in a group
- Barrier synchronization
  - MPI_Barrier synchronizes all processes in the communicator calling this function
- Data movement (a broadcast/reduce sketch follows below)
  - Broadcast: from 1 --> all
  - Gather: data from all --> 1
  - Scatter: data from 1 --> all
  - All-gather
  - All-to-all
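A minimal collective-communication sketch, not from the slides: a broadcast from rank 0, a sum reduction back onto rank 0, and a barrier. The value 100 and the variable names are illustrative assumptions.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, n, local, total;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Broadcast: process 0 sends n to every process in the communicator */
    if (rank == 0) n = 100;
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Reduce: sum each process's contribution onto process 0 */
    local = rank * n;
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("sum = %d\n", total);

    MPI_Barrier(MPI_COMM_WORLD);   /* all processes wait here before finishing */
    MPI_Finalize();
    return 0;
}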
26 Groups and Communicators
- Division of processes
  - MPI_COMM_WORLD
  - MPI_COMM_SIZE
  - MPI_COMM_RANK
- Avoiding message conflicts between modules
- Expanding the functionality of the message passing system
- Safety
(A communicator-splitting sketch follows below.)
27 Process Topologies
- The processes (ranks) can be arranged in topological patterns such as two- or three-dimensional grids.
- A topology can provide a convenient naming mechanism for the processes of a group (within a communicator), and may additionally assist the runtime system in mapping the processes onto hardware. A Cartesian-topology sketch follows below.

Figure: Relationship between ranks and Cartesian coordinates for a 3x4 2D topology. The upper number in each box is the rank of the process and the lower value is its (row, column) coordinates.
Figure: Overlapping topology. The upper values in each process are the rank / (row, col) in the original 2D topology and the lower values are the same for the shifted 2D topology.
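A sketch of the 3x4 Cartesian topology from the figure above, not from the slides themselves: MPI_Cart_create builds the grid communicator and MPI_Cart_coords maps each rank to its (row, column). The communicator name grid and the non-periodic dimensions are illustrative assumptions.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, coords[2];
    int dims[2]    = {3, 4};     /* 3 rows x 4 columns, as in the figure */
    int periods[2] = {0, 0};     /* no wrap-around in either dimension */
    MPI_Comm grid;

    MPI_Init(&argc, &argv);

    /* Arrange the processes of MPI_COMM_WORLD in a 3x4 grid
       (run with exactly 12 processes). */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &grid);

    MPI_Comm_rank(grid, &rank);
    MPI_Cart_coords(grid, rank, 2, coords);
    printf("rank %d is at (row %d, col %d)\n", rank, coords[0], coords[1]);

    MPI_Comm_free(&grid);
    MPI_Finalize();
    return 0;
}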
28 OpenMP
29 What is it?
- What does OpenMP stand for?
  - Open specifications for Multi Processing
- It is an API with three main components
  - Compiler directives
  - Library routines
  - Environment variables
- Used for writing multithreaded programs
30 What do you need?
- What programming languages?
  - C/C++
  - FORTRAN (77, 90, 95)
- What operating systems?
  - UNIX
  - Windows NT
- Can I compile OpenMP code with gcc?
  - No, it requires an OpenMP-aware compiler
31 Some compilers for OpenMP
- SGI MIPSpro
  - Fortran, C, C++
- IBM XL
  - C/C++ and Fortran
- Sun Studio 10
  - Fortran 95, C, and C++
- Portland Group Compilers and Tools
  - Fortran, C, and C++
- Absoft Pro FortranMP
  - Fortran, C, and C++
- PathScale
  - Fortran
32 What it does
- A program starts off with a master thread
- It runs for some amount of time
- When the master thread reaches a region where the work can be done concurrently
  - It creates several threads
  - They all do work in this region
- When the end of the region is reached
  - All the threads terminate, except for the master thread
(A minimal fork-join sketch follows below.)
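A minimal fork-join sketch, not from the slides: the master thread runs alone, a team of threads is forked at the parallel region, and only the master continues afterwards. The printed messages are illustrative.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("master thread alone\n");        /* serial part: master thread only */

    #pragma omp parallel                     /* fork: a team of threads is created */
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }                                        /* join: only the master thread continues */

    printf("master thread alone again\n");
    return 0;
}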
33 Example
- I get a job moving boxes
- When I go to work I bring several friends
  - Who help me move the boxes
- On pay day
  - I don't bring any friends, and I get all the money
34 OpenMP directives
- Format example (shown in the sketch below)
  - #pragma omp parallel for shared(y)
- Always starts with
  - #pragma omp
- Then the directive name
  - parallel for
- Followed by a clause
  - The clause is optional
  - shared(y)
- At the end, a newline
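A sketch showing the example directive in context, not from the slides: the parallel for directive with the shared(y) clause splits the loop iterations across threads. The array size N and the loop body are illustrative assumptions.

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void)
{
    double y[N];
    int i;

    /* "#pragma omp" + directive name "parallel for" + optional clause "shared(y)" + newline */
    #pragma omp parallel for shared(y)
    for (i = 0; i < N; i++)       /* the loop variable i is made private automatically */
        y[i] = 2.0 * i;

    printf("y[N-1] = %g\n", y[N - 1]);
    return 0;
}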
35 Directives list (a sections/single sketch follows the list)
- PARALLEL
  - Multiple threads will execute the code in the block
- DO/for
  - Causes the do or for loop to be executed in parallel by the worker threads
- SECTIONS
  - The enclosed sections are divided among the threads; each section is executed once
- SINGLE
  - Executed by only one thread
- PARALLEL DO/for
  - A parallel region that contains only one DO/for loop
- PARALLEL SECTIONS
  - A parallel region that contains only one SECTIONS block
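A sketch of the SECTIONS and SINGLE work-sharing directives, not from the slides: each section runs once on some thread, and the single block runs on exactly one thread. The printed labels are illustrative.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        #pragma omp sections          /* each section runs once, on some thread */
        {
            #pragma omp section
            printf("section A on thread %d\n", omp_get_thread_num());

            #pragma omp section
            printf("section B on thread %d\n", omp_get_thread_num());
        }

        #pragma omp single            /* executed by exactly one thread */
        printf("single block on thread %d\n", omp_get_thread_num());
    }
    return 0;
}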
36 Work Sharing (figure)
37 Work Sharing (figure)
38 Work Sharing (figure)
39 Data scope attribute clauses (see the sketch below)
- PRIVATE
  - Each thread gets its own independent copy of the listed variables
- SHARED
  - The listed variables are shared among all threads
- DEFAULT
  - Sets a default scope for all variables in the block
- FIRSTPRIVATE
  - PRIVATE, with each copy initialized from the original variable
- LASTPRIVATE
  - PRIVATE, with the value from the last loop iteration (or last section) copied back to the original variable
- COPYIN
  - Copies the master thread's value of a threadprivate variable into each thread's copy
- REDUCTION
  - Combines the private copies of a variable with a reduction operation and stores the result in the shared variable
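A sketch combining several data-scope clauses, not from the slides: x stays shared, each thread gets an initialized private copy of offset via firstprivate, and the per-thread partial sums are combined by reduction. The variable names and loop bounds are illustrative assumptions.

#include <stdio.h>
#include <omp.h>

#define N 100

int main(void)
{
    int i, offset = 5;
    int sum = 0;
    double x[N];

    /* x is shared; each thread gets a private copy of offset's initial value;
       the threads' private sums are added into the shared sum at the end. */
    #pragma omp parallel for shared(x) firstprivate(offset) reduction(+:sum)
    for (i = 0; i < N; i++) {
        x[i] = i + offset;
        sum += i;
    }

    printf("sum = %d (expected %d)\n", sum, (N - 1) * N / 2);
    return 0;
}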
40 Directives and clauses (summary table figure)
41 Synchronization (see the sketch below)
- MASTER
  - Only the master thread executes this block
- CRITICAL
  - Only one thread can execute this block at a time
- BARRIER
  - Causes all threads to wait at this point until every thread has reached it
- ATOMIC
  - The memory location is updated by one thread at a time
- FLUSH
  - Makes each thread's view of memory consistent
- ORDERED
  - The enclosed loop iterations execute in the order of a serial loop
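A sketch of several synchronization directives, not from the slides: critical and atomic protect the shared counters, barrier makes every thread wait, and master restricts the final print to the master thread. The counter names are illustrative.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int count = 0, hits = 0;

    #pragma omp parallel
    {
        #pragma omp critical          /* one thread at a time in this block */
        count++;

        #pragma omp atomic            /* single memory update, one thread at a time */
        hits++;

        #pragma omp barrier           /* everyone waits until all threads arrive */

        #pragma omp master            /* only the master thread prints */
        printf("count = %d, hits = %d\n", count, hits);
    }
    return 0;
}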
42 Environment Variables
- OMP_SCHEDULE
  - How loop iterations are scheduled (schedule type and chunk size)
- OMP_NUM_THREADS
  - Number of threads
- OMP_DYNAMIC
  - Whether a dynamic number of threads is allowed
- OMP_NESTED
  - Whether nested parallelism is allowed
43 Library Routines (a usage sketch follows the list)
- OMP_SET_NUM_THREADS
- OMP_GET_NUM_THREADS
- OMP_GET_MAX_THREADS
- OMP_GET_THREAD_NUM
- OMP_GET_NUM_PROCS
- OMP_IN_PARALLEL
- OMP_SET_DYNAMIC
- OMP_GET_DYNAMIC
- OMP_SET_NESTED
- OMP_GET_NESTED
- OMP_INIT_LOCK
- OMP_DESTROY_LOCK
- OMP_SET_LOCK
- OMP_UNSET_LOCK
- OMP_TEST_LOCK
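A sketch exercising several of the routines listed above, not from the slides: setting the thread count, querying thread and processor information, and using a simple lock. The printed format is illustrative.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_lock_t lock;

    omp_set_num_threads(4);               /* OMP_SET_NUM_THREADS */
    omp_init_lock(&lock);                 /* OMP_INIT_LOCK */

    #pragma omp parallel
    {
        int id = omp_get_thread_num();    /* OMP_GET_THREAD_NUM */

        omp_set_lock(&lock);              /* OMP_SET_LOCK: enter one thread at a time */
        printf("thread %d of %d (on %d processors, in parallel: %d)\n",
               id, omp_get_num_threads(), /* OMP_GET_NUM_THREADS */
               omp_get_num_procs(),       /* OMP_GET_NUM_PROCS */
               omp_in_parallel());        /* OMP_IN_PARALLEL */
        omp_unset_lock(&lock);            /* OMP_UNSET_LOCK */
    }

    omp_destroy_lock(&lock);              /* OMP_DESTROY_LOCK */
    return 0;
}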
44 Example (http://beowulf.lcs.mit.edu/18.337/beowulf.html)

#include <math.h>
#include <stdio.h>
#define N 16384
#define M 10

double dotproduct(int, double *);

double dotproduct(int i, double *x)
{
    double temp = 0.0, denom;
    int j;
    for (j = 0; j < N; j++) {
        // zero based!!
        denom = (i + j) * (i + j + 1) / 2 + i + 1;
        temp = temp + x[j] * (1 / denom);
    }
    return temp;
}

int main()
{
    double *x = new double[N];
    double *y = new double[N];
    double eig = sqrt(N);
    double denom, temp;
    int i, j, k;

    for (i = 0; i < N; i++) x[i] = 1 / eig;

    for (k = 0; k < M; k++) {
        // compute y = Ax
        #pragma omp parallel for shared(y)
        for (i = 0; i < N; i++) y[i] = dotproduct(i, x);

        // find largest eigenvalue of y
        eig = 0;
        for (i = 0; i < N; i++) eig = eig + y[i] * y[i];
        eig = sqrt(eig);
        printf("The largest eigenvalue after %2d iterations is %16.15e\n", k + 1, eig);

        // normalize
        for (i = 0; i < N; i++) x[i] = y[i] / eig;
    }
}
45 References
- http://beowulf.lcs.mit.edu/18.337/beowulf.html
- http://www.compunity.org/resources/compilers/index.php
- http://www.llnl.gov/computing/tutorials/workshops/workshop/openMP/MAIN.html#ClausesDirectives