Title: Message Passing Programming
1. Message Passing Programming
2. Learning Objectives
- Understanding how MPI programs execute
- Familiarity with fundamental MPI functions
3. Review of Flynn's Taxonomy
- SPMD (Single Program, Multiple Data)
- MIMD programs can be converted to SPMD form; MPI is primarily for MIMD/SPMD codes
4. Message-passing Model
5. SPMD Model
- Single Program, Multiple Data
- Each processor has a copy of the same program
- All processes run it at their own rate
- May take different paths through the code
- Process-specific control (see the sketch below) through
- My process number (rank)
- Total number of processes
- Explicit communication and synchronization
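As a minimal sketch of rank-based control (it uses MPI_Comm_rank and MPI_Comm_size, which are introduced later in these slides):

#include <stdio.h>
#include "mpi.h"
int main (int argc, char *argv[])
{
    int rank, p;
    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);   /* my process number   */
    MPI_Comm_size (MPI_COMM_WORLD, &p);      /* total process count */
    if (rank == 0)                           /* only one process takes this path */
        printf ("Process 0 of %d coordinates the work\n", p);
    else
        printf ("Process %d does its share of the work\n", rank);
    MPI_Finalize ();
    return 0;
}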
6. Task/Channel vs. Message-passing
- Communication
- Point to Point
- Broadcast
7. Advantages of Message-passing Model
- Portability to many architectures
- Natural fit for multicomputers
- Distinguishes between local memory (fast access) and remote memory (slow access)
- Ability to manage the memory hierarchy
- Each process controls its own memory
- No cache coherence problems
- Easier to create a deterministic program
- Simplifies debugging
8. The Message Passing Interface
- 1980s: vendors had unique libraries
- 1989: Parallel Virtual Machine (PVM) developed at Oak Ridge National Lab
- 1992: Work on the MPI standard begun
- 1994: Version 1.0 of the MPI standard
- 1997: Version 2.0 of the MPI standard
- Today MPI is the dominant message-passing library standard
- Public-domain versions at http://www-unix.mcs.anl.gov/mpi/
9. MPI Features
- A message-passing library specification
- Message-passing model
- Not a language or compiler specification
- For parallel computers, clusters, and heterogeneous networks
- Designed for ease of parallel software development
- Designed to provide access to advanced hardware
- Not designed for fault tolerance
- No process management
- No virtual memory management
10. Flexibility of MPI
- Large (has 125 functions)
- But most programs can be written using just 6 functions
- Need not master all parts of MPI to use it
11. Getting Started
12. Getting Started
- Set up the path to MPICH
- set MPI_ROOT = /home/software/mpich
- set path = ($path $MPI_ROOT/bin)
- Copy Makefile and machines.sample into your working directory
- Create a .rhosts file in your home directory
- Add names of machines from machines.sample, one entry per line:
  machine_name userid
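For illustration only (hypothetical host names and user id), the .rhosts entries might look like:

node01  jsmith
node02  jsmith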
13Hello World Version 1(hello0.c)
include ltstdio.hgt include "mpi.h int main
(int argc, char argv) MPI_Init (argc,
argv) printf ("\n Hello World\n")
MPI_Finalize ()
Compile make hello0
Run mpirun -np x -machinefile machines.sample
hello0
14. Include Files

#include <mpi.h>
#include <stdio.h>
15. Initialize MPI

MPI_Init (&argc, &argv);

- First MPI function called by each process
- Not necessarily the first executable statement
- Allows the system to do any necessary setup
16. Shutting Down MPI

MPI_Finalize ();

- Call after all other MPI library calls
- Allows the system to free up MPI resources
17. Communicators
- Communicator: an opaque object that provides the message-passing environment for processes
- MPI_COMM_WORLD
- Default communicator
- Includes all processes
18. Communicator
[Figure: MPI_COMM_WORLD containing six processes with ranks 0 through 5]
19. Determine Number of Processes

MPI_Comm_size (MPI_COMM_WORLD, &p);

- First argument is the communicator
- Number of processes returned through the second argument
20. Determine Process Rank

MPI_Comm_rank (MPI_COMM_WORLD, &id);

- First argument is the communicator
- Process rank (in range 0, 1, ..., p-1) returned through the second argument
21. Replication of Automatic Variables
22. Hello World Version 2 (hello1.c)

#include <stdio.h>
#include "mpi.h"
int main (int argc, char *argv[])
{
    int rank, n, i, message;
    char buff[1000];
    MPI_Init (&argc, &argv);
    MPI_Comm_size (MPI_COMM_WORLD, &n);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    printf ("\n Hello from process %3d \n", rank);
    MPI_Finalize ();
}
23. Point-to-point Communication
- Involves a pair of processes
- One process sends a message
- Other process receives the message
24. Send/Receive (Blocking)
25Function MPI_Send
int MPI_Send ( void message,
int count, MPI_Datatype
datatype, int dest, int
tag, MPI_Comm comm )
26Function MPI_Recv
int MPI_Recv ( void message,
int count, MPI_Datatype
datatype, int source, int
tag, MPI_Comm comm,
MPI_Status status ) MPI_Recv blocks until the
message has been received, or error occurs
27. Inside MPI_Send and MPI_Recv
[Figure: the message travels from the sending process's program memory through the sender's and receiver's system buffers into the receiving process's program memory]
28. Return from MPI_Send
- Function blocks until the message buffer is free
- Message buffer is free when
- Message copied to system buffer, or
- Message transmitted
- Typical scenario
- Message copied to system buffer
- Transmission overlaps computation
29. Return from MPI_Recv
- Function blocks until the message is in the buffer
- If the message never arrives, the function never returns
30. Deadlock
- Deadlock: a process waits for a condition that will never become true
- Easy to write send/receive code that deadlocks (see the sketch below)
- Two processes both receive before sending
- Send tag doesn't match receive tag
- Process sends message to wrong destination process
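A minimal sketch of the first case, assuming exactly two processes and an illustrative tag of 0; both processes post a blocking receive first, so neither send is ever reached:

/* Deadlock sketch: both processes block in MPI_Recv and never reach MPI_Send */
int rank, partner, in, out = 42;
MPI_Status status;
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
partner = 1 - rank;                                               /* the other process */
MPI_Recv (&in, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &status);  /* blocks forever    */
MPI_Send (&out, 1, MPI_INT, partner, 0, MPI_COMM_WORLD);          /* never executed    */

Reversing the send/receive order on one of the two processes removes the deadlock.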
31. Hello World Version 3 (hello2.c)

#include <stdio.h>
#include "mpi.h"
int main (int argc, char *argv[])
{
    int rank, n, i, message;
    char buff[1000];
    MPI_Status status;
    MPI_Init (&argc, &argv);
    MPI_Comm_size (MPI_COMM_WORLD, &n);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    if (rank == 0) {    /* Process 0 will output data */
        printf ("\n Hello from process %3d", rank);
        for (i = 1; i < n; i++) {
            MPI_Recv (&message, 1, MPI_INT, i, 111, MPI_COMM_WORLD, &status);
            printf ("\n Hello Sent from process %3d\n", message);
        }
    }
    else
        MPI_Send (&rank, 1, MPI_INT, 0, 111, MPI_COMM_WORLD);
    MPI_Finalize ();
}
32. MPI Functions
- MPI_Init
- MPI_Comm_size
- MPI_Comm_rank
- MPI_Send
- MPI_Recv
- MPI_Finalize
33. Global Communications
34Prototype of MPI_Reduce()
int MPI_Reduce ( void operand,
/ addr of 1st reduction element /
void result, / addr of
1st reduction result / int count,
/ reductions to perform /
MPI_Datatype type, / type of
elements / MPI_Op operator,
/ reduction operator / int
root, / process getting
result(s) / MPI_Comm comm
/ communicator / )
35. Addition (add1.c)

...
MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &n);
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
division = total_numbers / n;
start = rank * division;
end = (rank + 1) * division;
sum = 0; total_sum = 0;
for (i = start; i < end; i++)
    sum = sum + i;
printf ("\n Process %3d calculated from %d to %d \n", rank, start, end);
MPI_Reduce (&sum, &total_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (rank == 0)    /* Process 0 will output data */
    printf ("\n Total sum is %3d\n", total_sum);
MPI_Finalize ();
36Function MPI_Bcast
int MPI_Bcast ( void buffer, / Addr of 1st
element / int count, / elements to
broadcast / MPI_Datatype datatype, / Type of
elements / int root, / ID of root
process / MPI_Comm comm) / Communicator /
MPI_Bcast (k, 1, MPI_INT, 0, MPI_COMM_WORLD)
37. Addition (add2.c)

MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &n);
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
if (rank == 0) {
    printf ("How many numbers ? \n");
    fscanf (stdin, "%d", &total_numbers);
}
MPI_Bcast (&total_numbers, 1, MPI_INT, 0, MPI_COMM_WORLD);
printf ("\n Process %3d knows total numbers is %d \n", rank, total_numbers);
division = total_numbers / n;
start = rank * division;
end = (rank + 1) * division;
sum = 0; total_sum = 0;
for (i = start; i < end; i++)
    sum = sum + i;
printf ("\n Process %3d calculated from %d to %d \n", rank, start, end);
MPI_Reduce (&sum, &total_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (rank == 0)    /* Process 0 will output data */
    printf ("\n Total sum is %3d\n", total_sum);
MPI_Finalize ();
38. MPI_Datatype Options
- MPI_CHAR
- MPI_DOUBLE
- MPI_FLOAT
- MPI_INT
- MPI_LONG
- MPI_LONG_DOUBLE
- MPI_SHORT
- MPI_UNSIGNED_CHAR
- MPI_UNSIGNED
- MPI_UNSIGNED_LONG
- MPI_UNSIGNED_SHORT
39. MPI_Op Options
- MPI_BAND Bitwise AND
- MPI_BOR Bitwise OR
- MPI_BXOR Bitwise XOR
- MPI_LAND Logical AND
- MPI_LOR Logical OR
- MPI_MAX Maximum
- MPI_MAXLOC Maximum and Location
- MPI_MIN Minimum
- MPI_MINLOC Minimum and Location
- MPI_PROD Product
- MPI_SUM Sum
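For example, a sketch of a reduction using one of the operators above (the variable names are illustrative):

/* Find the global maximum of a per-process value on process 0 */
double local_max = (double) rank;    /* illustrative local value */
double global_max;
MPI_Reduce (&local_max, &global_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
if (rank == 0)
    printf ("Global maximum is %f\n", global_max);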
40. Benchmarking the Program
- MPI_Barrier - barrier synchronization
- MPI_Wtick - timer resolution
- MPI_Wtime - current time
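A minimal timing sketch combining these three calls (the work being timed is a placeholder):

double start, elapsed;
MPI_Barrier (MPI_COMM_WORLD);            /* line up all processes before timing */
start = MPI_Wtime();                     /* wall-clock time in seconds          */
/* ... code to be timed ... */
elapsed = MPI_Wtime() - start;
printf ("Elapsed %f s (timer resolution %g s)\n", elapsed, MPI_Wtick());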
41. Addition (add3.c)

long sum, total_sum;
double startwtime, endwtime, exetime;
MPI_Init (&argc, &argv);
MPI_Barrier (MPI_COMM_WORLD);
MPI_Comm_size (MPI_COMM_WORLD, &n);
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
if (rank == 0) {
    printf ("How many numbers ? \n");
    fscanf (stdin, "%d", &total_numbers);
}
startwtime = MPI_Wtime();
MPI_Bcast (&total_numbers, 1, MPI_INT, 0, MPI_COMM_WORLD);
division = total_numbers / n;
start = rank * division;
end = (rank + 1) * division;
sum = 0; total_sum = 0;
for (i = start; i < end; i++)
    sum = sum + i;
MPI_Reduce (&sum, &total_sum, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
if (rank == 0) {    /* Process 0 will output data */
    endwtime = MPI_Wtime();
    printf ("\n Total sum is %3ld\n", total_sum);
    printf ("Time taken %f\n", endwtime - startwtime);
}
MPI_Finalize ();
42. Benchmarking Results
43. Circuit Satisfiability
[Figure: a combinational logic circuit with 16 binary inputs; the goal is to find input combinations that make the output 1]
44. Solution Method
- Circuit satisfiability is NP-complete
- No known algorithm solves it in polynomial time
- We seek all solutions
- We find them through exhaustive search
- 16 inputs means 65,536 combinations to test
45. Partitioning: Functional Decomposition
- Embarrassingly parallel: no channels between tasks
46. Agglomeration and Mapping
- Properties of parallel algorithm
- Fixed number of tasks
- No communications between tasks
- Time needed per task is variable
- Consult mapping strategy decision tree
- Map tasks to processors in a cyclic fashion
47. Cyclic (Interleaved) Allocation
- Assume p processes
- Each process gets every pth piece of work
- Example: 5 processes and 12 pieces of work
- P0: 0, 5, 10
- P1: 1, 6, 11
- P2: 2, 7
- P3: 3, 8
- P4: 4, 9
48. Cyclic Allocation
- Assume n pieces of work, p processes, and cyclic allocation
- What is the largest number of pieces of work any process has? (see the worked answer below)
- What is the smallest number of pieces of work any process has?
- How many processes have the largest number of pieces?
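Worked answer for the example above (n = 12, p = 5): the most any process has is ceil(n/p) = ceil(12/5) = 3 pieces, the least is floor(n/p) = 2 pieces, and n mod p = 2 processes (P0 and P1) have the most; when p divides n evenly, every process has exactly n/p pieces.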
49. Summary of Program Design
- Program will consider all 65,536 combinations of 16 boolean inputs
- Combinations allocated in cyclic fashion to processes
- Each process examines each of its combinations
- If it finds a satisfiable combination, it will print it
50.

#include <mpi.h>
#include <stdio.h>
int main (int argc, char *argv[]) {
    int i, id, p;
    void check_circuit (int, int);
    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &id);
    MPI_Comm_size (MPI_COMM_WORLD, &p);
    for (i = id; i < 65536; i += p)
        check_circuit (id, i);
    printf ("Process %d is done\n", id);
    fflush (stdout);
    MPI_Finalize();
    return 0;
}
51.

/* Return 1 if 'i'th bit of 'n' is 1; 0 otherwise */
#define EXTRACT_BIT(n,i) ((n&(1<<i))?1:0)

void check_circuit (int id, int z) {
    int v[16];    /* Each element is a bit of z */
    int i;
    for (i = 0; i < 16; i++) v[i] = EXTRACT_BIT(z,i);
    if ((v[0] || v[1]) && (!v[1] || !v[3]) && (v[2] || v[3])
        && (!v[3] || !v[4]) && (v[4] || !v[5])
        && (v[5] || !v[6]) && (v[5] || v[6]) && (v[6] || !v[15])
        && (v[7] || !v[8]) && (!v[7] || !v[13]) && (v[8] || v[9])
        && (v[8] || !v[9]) && (!v[9] || !v[10]) && (v[9] || v[11])
        && (v[10] || v[11]) && (v[12] || v[13]) && (v[13] || !v[14])
        && (v[14] || v[15])) {
        printf ("%d) %d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d\n", id,
            v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7],
            v[8],v[9],v[10],v[11],v[12],v[13],v[14],v[15]);
        fflush (stdout);
    }
}
52Our Call to MPI_Reduce()
MPI_Reduce (count, global_count,
1, MPI_INT,
MPI_SUM, 0,
MPI_COMM_WORLD)
if (!id) printf ("There are d different
solutions\n", global_count)
53Benchmarking Code
double elapsed_time MPI_Init (argc,
argv)MPI_Barrier (MPI_COMM_WORLD)elapsed_time
- MPI_Wtime() MPI_Reduce ()elapsed_time
MPI_Wtime()
54. Summary
- Message-passing programming follows naturally from the task/channel model
- Portability of message-passing programs
- MPI is the most widely adopted standard
55. Summary
- MPI functions introduced
- MPI_Init
- MPI_Comm_rank
- MPI_Comm_size
- MPI_Reduce
- MPI_Finalize
- MPI_Barrier
- MPI_Wtime
- MPI_Wtick