Title: Message Passing Programming
1. Message Passing Programming
2. Learning Objectives
- Understanding how MPI programs execute
- Familiarity with fundamental MPI functions
3. Review of Flynn's Taxonomy
- SPMD (Single Program, Multiple Data)
- MIMD programs can be converted to SPMD form; MPI is primarily for MIMD/SPMD codes
4. Message-passing Model
5. SPMD Model
- Single Program, Multiple Data
- Each processor has a copy of the same program
- All processes run it at their own rate
- May take different paths through the code
- Process-specific control (see the sketch below) through
- My process number (rank)
- Total number of processes
- Explicit communication and synchronization
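As a minimal sketch of rank-based control (it uses MPI_Comm_rank and MPI_Comm_size, which are introduced later in these slides):

#include <stdio.h>
#include "mpi.h"
int main (int argc, char *argv[])
{
    int rank, p;
    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);   /* my process number   */
    MPI_Comm_size (MPI_COMM_WORLD, &p);      /* total process count */
    if (rank == 0)                           /* only one process takes this path */
        printf ("Process 0 of %d coordinates the work\n", p);
    else
        printf ("Process %d does its share of the work\n", rank);
    MPI_Finalize ();
    return 0;
}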
6. Task/Channel vs. Message-passing
- Communication
- Point to Point
- Broadcast
7. Advantages of Message-passing Model
- Portability to many architectures
- Natural fit for multicomputers
- Distinguishes between local memory (fast access) and remote memory (slow access)
- Ability to manage the memory hierarchy
- Each process controls its own memory
- No cache coherence problems
- Easier to create a deterministic program
- Simplifies debugging
8. The Message Passing Interface
- 1980s: vendors had unique libraries
- 1989: Parallel Virtual Machine (PVM) developed at Oak Ridge National Lab
- 1992: Work on the MPI standard begun
- 1994: Version 1.0 of the MPI standard
- 1997: Version 2.0 of the MPI standard
- Today MPI is the dominant message-passing library standard
- Public-domain versions at http://www-unix.mcs.anl.gov/mpi/
9. MPI Features
- A message-passing library specification
- Message-passing model
- Not a language or compiler specification
- For parallel computers, clusters, and heterogeneous networks
- Designed for ease of parallel software development
- Designed to provide access to advanced hardware
- Not designed for fault tolerance
- No process management
- No virtual memory management
10. Flexibility of MPI
- Large (has 125 functions)
- But most programs can be written using just 6 functions
- Need not master all parts of MPI to use it
11. Getting Started
12. Getting Started
- Set up the path to MPICH
- set MPI_ROOT = /home/software/mpich
- set path = ($path $MPI_ROOT/bin)
- Copy Makefile and machines.sample into your working directory
- Create a .rhosts file in your home directory
- Add names of machines from machines.sample, one entry per line:
  machine_name userid
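For illustration only (hypothetical host names and user id), the .rhosts entries might look like:

node01  jsmith
node02  jsmith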
13Hello World Version 1(hello0.c)
include ltstdio.hgt include "mpi.h int main
(int argc, char argv) MPI_Init (argc,
argv) printf ("\n Hello World\n")
MPI_Finalize ()
Compile make hello0
Run mpirun -np x -machinefile machines.sample
hello0
14. Include Files

#include <mpi.h>
#include <stdio.h>
15. Initialize MPI

MPI_Init (&argc, &argv);

- First MPI function called by each process
- Not necessarily the first executable statement
- Allows the system to do any necessary setup
16. Shutting Down MPI

MPI_Finalize ();

- Call after all other MPI library calls
- Allows the system to free up MPI resources
17. Communicators
- Communicator: an opaque object that provides the message-passing environment for processes
- MPI_COMM_WORLD
- Default communicator
- Includes all processes
18. Communicator
[Figure: MPI_COMM_WORLD containing six processes with ranks 0 through 5]
19. Determine Number of Processes

MPI_Comm_size (MPI_COMM_WORLD, &p);

- First argument is the communicator
- Number of processes returned through the second argument
20. Determine Process Rank

MPI_Comm_rank (MPI_COMM_WORLD, &id);

- First argument is the communicator
- Process rank (in range 0, 1, ..., p-1) returned through the second argument
21. Replication of Automatic Variables
22. Hello World Version 2 (hello1.c)

#include <stdio.h>
#include "mpi.h"
int main (int argc, char *argv[])
{
    int rank, n, i, message;
    char buff[1000];
    MPI_Init (&argc, &argv);
    MPI_Comm_size (MPI_COMM_WORLD, &n);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    printf ("\n Hello from process %3d \n", rank);
    MPI_Finalize ();
}
23. Point-to-point Communication
- Involves a pair of processes
- One process sends a message
- Other process receives the message
24. Send/Receive (Blocking)
25Function MPI_Send
int MPI_Send ( void message,
int count, MPI_Datatype
datatype, int dest, int
tag, MPI_Comm comm )
26Function MPI_Recv
int MPI_Recv ( void message,
int count, MPI_Datatype
datatype, int source, int
tag, MPI_Comm comm,
MPI_Status status ) MPI_Recv blocks until the
message has been received, or error occurs
27. Inside MPI_Send and MPI_Recv
[Figure: the message travels from the sending process's program memory through the sender's and receiver's system buffers into the receiving process's program memory]
28. Return from MPI_Send
- Function blocks until the message buffer is free
- Message buffer is free when
- Message copied to system buffer, or
- Message transmitted
- Typical scenario
- Message copied to system buffer
- Transmission overlaps computation
29. Return from MPI_Recv
- Function blocks until the message is in the buffer
- If the message never arrives, the function never returns
30. Deadlock
- Deadlock: a process waits for a condition that will never become true
- Easy to write send/receive code that deadlocks (see the sketch below)
- Two processes both receive before sending
- Send tag doesn't match receive tag
- Process sends message to wrong destination process
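A minimal sketch of the first case, assuming exactly two processes and an illustrative tag of 0; both processes post a blocking receive first, so neither send is ever reached:

/* Deadlock sketch: both processes block in MPI_Recv and never reach MPI_Send */
int rank, partner, in, out = 42;
MPI_Status status;
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
partner = 1 - rank;                                               /* the other process */
MPI_Recv (&in, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &status);  /* blocks forever    */
MPI_Send (&out, 1, MPI_INT, partner, 0, MPI_COMM_WORLD);          /* never executed    */

Reversing the send/receive order on one of the two processes removes the deadlock.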
31. Hello World Version 3 (hello2.c)

#include <stdio.h>
#include "mpi.h"
int main (int argc, char *argv[])
{
    int rank, n, i, message;
    char buff[1000];
    MPI_Status status;
    MPI_Init (&argc, &argv);
    MPI_Comm_size (MPI_COMM_WORLD, &n);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    if (rank == 0) {    /* Process 0 will output data */
        printf ("\n Hello from process %3d", rank);
        for (i = 1; i < n; i++) {
            MPI_Recv (&message, 1, MPI_INT, i, 111, MPI_COMM_WORLD, &status);
            printf ("\n Hello Sent from process %3d\n", message);
        }
    }
    else
        MPI_Send (&rank, 1, MPI_INT, 0, 111, MPI_COMM_WORLD);
    MPI_Finalize ();
}
32. MPI Functions
- MPI_Init
- MPI_Comm_size
- MPI_Comm_rank
- MPI_Send
- MPI_Recv
- MPI_Finalize
33. Global Communications
34Prototype of MPI_Reduce()
int MPI_Reduce ( void operand,
/ addr of 1st reduction element /
void result, / addr of
1st reduction result / int count,
/ reductions to perform /
MPI_Datatype type, / type of
elements / MPI_Op operator,
/ reduction operator / int
root, / process getting
result(s) / MPI_Comm comm
/ communicator / )
35. Addition (add1.c)

...
MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &n);
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
division = total_numbers / n;
start = rank * division;
end = (rank + 1) * division;
sum = 0; total_sum = 0;
for (i = start; i < end; i++)
    sum = sum + i;
printf ("\n Process %3d calculated from %d to %d \n", rank, start, end);
MPI_Reduce (&sum, &total_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (rank == 0)    /* Process 0 will output data */
    printf ("\n Total sum is %3d\n", total_sum);
MPI_Finalize ();
36Function MPI_Bcast
int MPI_Bcast ( void buffer, / Addr of 1st
element / int count, / elements to
broadcast / MPI_Datatype datatype, / Type of
elements / int root, / ID of root
process / MPI_Comm comm) / Communicator /
MPI_Bcast (k, 1, MPI_INT, 0, MPI_COMM_WORLD)
37. Addition (add2.c)

MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &n);
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
if (rank == 0) {
    printf ("How many numbers ? \n");
    fscanf (stdin, "%d", &total_numbers);
}
MPI_Bcast (&total_numbers, 1, MPI_INT, 0, MPI_COMM_WORLD);
printf ("\n Process %3d knows total numbers is %d \n", rank, total_numbers);
division = total_numbers / n;
start = rank * division;
end = (rank + 1) * division;
sum = 0; total_sum = 0;
for (i = start; i < end; i++)
    sum = sum + i;
printf ("\n Process %3d calculated from %d to %d \n", rank, start, end);
MPI_Reduce (&sum, &total_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (rank == 0)    /* Process 0 will output data */
    printf ("\n Total sum is %3d\n", total_sum);
MPI_Finalize ();
38. MPI_Datatype Options
- MPI_CHAR
- MPI_DOUBLE
- MPI_FLOAT
- MPI_INT
- MPI_LONG
- MPI_LONG_DOUBLE
- MPI_SHORT
- MPI_UNSIGNED_CHAR
- MPI_UNSIGNED
- MPI_UNSIGNED_LONG
- MPI_UNSIGNED_SHORT
39. MPI_Op Options
- MPI_BAND Bitwise AND
- MPI_BOR Bitwise OR
- MPI_BXOR Bitwise XOR
- MPI_LAND Logical AND
- MPI_LOR Logical OR
- MPI_MAX Maximum
- MPI_MAXLOC Maximum and Location
- MPI_MIN Minimum
- MPI_MINLOC Minimum and Location
- MPI_PROD Product
- MPI_SUM Sum
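For example, a sketch of a reduction using one of the operators above (the variable names are illustrative):

/* Find the global maximum of a per-process value on process 0 */
double local_max = (double) rank;    /* illustrative local value */
double global_max;
MPI_Reduce (&local_max, &global_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
if (rank == 0)
    printf ("Global maximum is %f\n", global_max);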
40. Benchmarking the Program
- MPI_Barrier - barrier synchronization
- MPI_Wtick - timer resolution
- MPI_Wtime - current time
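A minimal timing sketch combining these three calls (the work being timed is a placeholder):

double start, elapsed;
MPI_Barrier (MPI_COMM_WORLD);            /* line up all processes before timing */
start = MPI_Wtime();                     /* wall-clock time in seconds          */
/* ... code to be timed ... */
elapsed = MPI_Wtime() - start;
printf ("Elapsed %f s (timer resolution %g s)\n", elapsed, MPI_Wtick());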
41. Addition (add3.c)

long sum, total_sum;
double startwtime, endwtime, exetime;
MPI_Init (&argc, &argv);
MPI_Barrier (MPI_COMM_WORLD);
MPI_Comm_size (MPI_COMM_WORLD, &n);
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
if (rank == 0) {
    printf ("How many numbers ? \n");
    fscanf (stdin, "%d", &total_numbers);
}
startwtime = MPI_Wtime();
MPI_Bcast (&total_numbers, 1, MPI_INT, 0, MPI_COMM_WORLD);
division = total_numbers / n;
start = rank * division;
end = (rank + 1) * division;
sum = 0; total_sum = 0;
for (i = start; i < end; i++)
    sum = sum + i;
MPI_Reduce (&sum, &total_sum, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
if (rank == 0) {    /* Process 0 will output data */
    endwtime = MPI_Wtime();
    printf ("\n Total sum is %3ld\n", total_sum);
    printf ("Time taken %f\n", endwtime - startwtime);
}
MPI_Finalize ();
42. Benchmarking Results
43. Circuit Satisfiability
[Figure: a combinational logic circuit with 16 binary inputs; the goal is to find input combinations that make the output 1]
44. Solution Method
- Circuit satisfiability is NP-complete
- No known algorithm solves it in polynomial time
- We seek all solutions
- We find them through exhaustive search
- 16 inputs means 65,536 combinations to test
45. Partitioning: Functional Decomposition
- Embarrassingly parallel: no channels between tasks
46. Agglomeration and Mapping
- Properties of parallel algorithm
- Fixed number of tasks
- No communications between tasks
- Time needed per task is variable
- Consult mapping strategy decision tree
- Map tasks to processors in a cyclic fashion
47. Cyclic (Interleaved) Allocation
- Assume p processes
- Each process gets every pth piece of work
- Example: 5 processes and 12 pieces of work
- P0: 0, 5, 10
- P1: 1, 6, 11
- P2: 2, 7
- P3: 3, 8
- P4: 4, 9
48. Cyclic Allocation
- Assume n pieces of work, p processes, and cyclic allocation
- What is the largest number of pieces of work any process has? (see the worked answer below)
- What is the smallest number of pieces of work any process has?
- How many processes have the largest number of pieces?
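Worked answer for the example above (n = 12, p = 5): the most any process has is ceil(n/p) = ceil(12/5) = 3 pieces, the least is floor(n/p) = 2 pieces, and n mod p = 2 processes (P0 and P1) have the most; when p divides n evenly, every process has exactly n/p pieces.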
49. Summary of Program Design
- Program will consider all 65,536 combinations of 16 boolean inputs
- Combinations allocated in cyclic fashion to processes
- Each process examines each of its combinations
- If it finds a satisfiable combination, it will print it
50.

#include <mpi.h>
#include <stdio.h>
int main (int argc, char *argv[]) {
    int i, id, p;
    void check_circuit (int, int);
    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &id);
    MPI_Comm_size (MPI_COMM_WORLD, &p);
    for (i = id; i < 65536; i += p)
        check_circuit (id, i);
    printf ("Process %d is done\n", id);
    fflush (stdout);
    MPI_Finalize();
    return 0;
}
51.

/* Return 1 if 'i'th bit of 'n' is 1; 0 otherwise */
#define EXTRACT_BIT(n,i) ((n&(1<<i))?1:0)

void check_circuit (int id, int z) {
    int v[16];    /* Each element is a bit of z */
    int i;
    for (i = 0; i < 16; i++) v[i] = EXTRACT_BIT(z,i);
    if ((v[0] || v[1]) && (!v[1] || !v[3]) && (v[2] || v[3])
        && (!v[3] || !v[4]) && (v[4] || !v[5])
        && (v[5] || !v[6]) && (v[5] || v[6]) && (v[6] || !v[15])
        && (v[7] || !v[8]) && (!v[7] || !v[13]) && (v[8] || v[9])
        && (v[8] || !v[9]) && (!v[9] || !v[10]) && (v[9] || v[11])
        && (v[10] || v[11]) && (v[12] || v[13]) && (v[13] || !v[14])
        && (v[14] || v[15])) {
        printf ("%d) %d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d\n", id,
            v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7],
            v[8],v[9],v[10],v[11],v[12],v[13],v[14],v[15]);
        fflush (stdout);
    }
}
52Our Call to MPI_Reduce()
MPI_Reduce (count, global_count,
1, MPI_INT,
MPI_SUM, 0,
MPI_COMM_WORLD)
if (!id) printf ("There are d different
solutions\n", global_count)
53Benchmarking Code
double elapsed_time MPI_Init (argc,
argv)MPI_Barrier (MPI_COMM_WORLD)elapsed_time
- MPI_Wtime() MPI_Reduce ()elapsed_time
MPI_Wtime()
54. Summary
- Message-passing programming follows naturally from the task/channel model
- Portability of message-passing programs
- MPI is the most widely adopted standard
55. Summary
- MPI functions introduced
- MPI_Init
- MPI_Comm_rank
- MPI_Comm_size
- MPI_Reduce
- MPI_Finalize
- MPI_Barrier
- MPI_Wtime
- MPI_Wtick