Title: Message Passing Interface
1 Message Passing Interface
- Outline
- Introduction to the message passing library (MPI)
- Basics of MPI implementation (blocking communication)
- Basic input and output of data
- Basic non-blocking communication
2 Introduction
- Basic concept of message passing
- Most commonly used method of programming distributed-memory MIMD systems
- In message passing, the processes coordinate their activities by explicitly sending and receiving messages
3 Message Passing Interface
4 Introduction to MPI
- Message Passing Interface (MPI)
- Commonly used message passing library, which statically allocates processes (the number of processes is set at the beginning of program execution, and no additional processes are created during execution)
- Each process is assigned a unique integer rank in the range 0, 1, ..., p-1 (p is the total number of processes defined)
- Basically, one can write a single program and execute it on different processes (SPMD)
5 Introduction to MPI
- Message Passing Interface (MPI)
- Selective execution is based on conditional branches within the source code (see the sketch below)
- Buffering in communication
- Blocking and non-blocking communication
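A minimal sketch of such rank-based selective execution (the printed strings are illustrative):

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char* argv[]) {
        int my_rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

        if (my_rank == 0)
            printf("I am the master\n");          /* executed only by process 0 */
        else
            printf("I am worker %d\n", my_rank);  /* executed by all other processes */

        MPI_Finalize();
        return 0;
    }

Every process runs the same executable; the if/else on the rank decides which statements each process actually executes.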
6 Introduction to MPI
- A parallel computing utility library of subroutines/functions, not an independent language
- MPI subroutines and functions can be called from Fortran and C, respectively
- Compiled with FORTRAN or C compilers
- MPI-1 doesn't support F90, but MPI-2 supports F90 and C++
7 Introduction to MPI (cont.)
- Why do people use MPI?
- To speed up computation
- Big demands on CPU time and memory
- More portable and scalable than using an automatic "parallelizer", which might not work
- Good for distributed-memory computers, such as distributed clusters, network-based computers, or workstations
8 Introduction to MPI (cont.)
- Why are people afraid of MPI?
- More complicated than serial computing
- More complicated to master the technique
- Synchronization can be lost
- The amount of time required to convert serial code to parallelized code
9 Introduction to MPI (cont.)
- Alternative ways?
- Data parallel model using a high-level language such as HPF
- Advanced libraries (or interfaces), such as the Portable, Extensible Toolkit for Scientific Computation (PETSc)
- Java multithreaded computing for internet-based distributed computation
10 Basics of MPI
- The MPI header file should be included in the user's FORTRAN or C code. The header file contains definitions of constants and prototypes:
      include 'mpif.h'     (for FORTRAN code)
      #include "mpi.h"     (for C code)
11 Basics of MPI
- MPI is initiated by calling MPI_Init() first, before invoking any other MPI subroutines or functions.
- MPI processing ends with a call to MPI_Finalize().
12 Basics of MPI
- The only difference between MPI subroutines (for FORTRAN) and MPI functions (for C) is the error reporting flag.
- In FORTRAN, it is returned as the last member of the subroutine's argument list. In C, the integer error flag is returned through the function return value.
13 Basics of MPI
- Consequently, MPI FORTRAN subroutines always contain one more variable in the argument list than their C counterparts. (Compare the two calls below.)
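For example, the same rank query in the two languages; only the error-flag convention differs:

    C:        ierr = MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);    /* flag is the return value */
    FORTRAN:  call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)  ! flag is the extra last argument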
14 Basics of MPI (cont.)
- C MPI function names start with MPI_ followed by a character string, with the leading character in upper case and the rest in lower case
- FORTRAN subroutines bear the same names but are case-insensitive
- On SGI's Origin 2000 (NCSA), parallel I/O is supported
15 Compilation and Execution (f77)
- To compile and execute an f77 (or f90) code without MPI:
      f77 -o example example.f
      f90 -o example example.f
      /bin/time example     (or: time example)
16 - To compile and execute an f77 (or f90) code with MPI:
      f77 -o example1_1 example1_1.f -lmpi
      g77 -o example1_1 example1_1.f -lmpi
      f90 -o example1_1 example1_1.f -lmpi
      mpif77 -o example1_1 example1_1.f     (our cluster)
      mpif90 -o example1_1 example1_1.f     (our cluster)
      /bin/time mpirun -np 4 example1_1
      time mpirun -np 4 example1_1
17 - To compile and execute a C code without MPI:
      gcc -o exampleC exampleC.c -lm
      (or: cc -o exampleC exampleC.c -lm)
      exampleC
18 - To compile and execute a C code with MPI:
      cc -o exampleC1_1 exampleC1_1.c -lm -lmpi
      gcc -o exampleC1_1 exampleC1_1.c -lm -lmpi
      mpicc exampleC1_1.c     (our cluster)
  Execution:
      /bin/time mpirun -np 10 exampleC1_1
      time mpirun -np 10 exampleC1_1
19 Basic communication among processes
- Example 0: basic communication between processes
- p processes, numbered from 0 to p-1
- Process 0 receives a message from each of the other processes
[Diagram: processes 1, 2, and 3 each send a message to process 0]
20 Learning MPI by Examples
- Example 0: mechanism
- The system copies the executable code to each process
- Each process begins execution of the copied executable code, simultaneously
- Different processes can execute different statements by branching within the program based on their ranks (this form of MIMD programming is called single-program multiple-data (SPMD) programming)
21 /* greetings.c -- greetings program
 *
 * Send a message from all processes with rank != 0 to process 0.
 * Process 0 prints the messages received.
 *
 * Input: none.
 * Output: contents of messages received by process 0.
 */
#include <stdio.h>
#include <string.h>
#include "mpi.h"     /* include MPI library */
22 Passing command-line parameters to the main function
main(int argc, char* argv[]) {
    int         my_rank;       /* rank of process           */
    int         p;             /* number of processes       */
    int         source;        /* rank of sender            */
    int         dest;          /* rank of receiver          */
    int         tag = 0;       /* tag for messages          */
    char        message[100];  /* storage for message       */
    MPI_Status  status;        /* return status for receive */

    /* Start up MPI */
    MPI_Init(&argc, &argv);
23 Obtain the rank number
    /* Find out process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    printf("my_rank is %d\n", my_rank);

    /* Find out number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    printf("p, the total number of processes %d\n", p);

    if (my_rank != 0) {   /* other processes, but not process 0 */
        /* Create message */
        sprintf(message, "Greetings from process %d!", my_rank);
        dest = 0;         /* destination to which the message is sent */
24      /* Use strlen+1 so that '\0' gets transmitted */
        MPI_Send(message, strlen(message)+1, MPI_CHAR,
                 dest, tag, MPI_COMM_WORLD);
    } else {              /* my_rank == 0, process 0 */
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 100, MPI_CHAR, source, tag,
                     MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }
25 Learning MPI by Examples
    /* Shut down MPI */
    MPI_Finalize();
} /* main */

Commands:
    mpicc greetings.c
    mpirun -np 8 a.out
26 Result: mpicc greetings.c; mpirun -np 8 a.out
my_rank is 3
p, the total number of processes 8
my_rank is 4
p, the total number of processes 8
my_rank is 0
p, the total number of processes 8
my_rank is 1
p, the total number of processes 8
Greetings from process 1!
my_rank is 2
27 p, the total number of processes 8
my_rank is 7
p, the total number of processes 8
Greetings from process 2!
Greetings from process 3!
my_rank is 5
p, the total number of processes 8
Greetings from process 4!
Greetings from process 5!
my_rank is 6
p, the total number of processes 8
Greetings from process 6!
Greetings from process 7!
28 c  greetings.f -- greetings program
c
c  Send a message from all processes with rank .ne. 0 to process 0.
c  Process 0 prints the messages received.
c
c  Input: none.
c  Output: contents of messages received by process 0.
c
c  Note: Due to the differences between character data in Fortran
c  and char in C, there may be problems in MPI_Send/MPI_Recv.
c
29    program greetings
c
      include 'mpif.h'
c
      integer my_rank
      integer p
      integer source
      integer dest
      integer tag
      character*100 message
      character*10  digit_string
      integer size
      integer status(MPI_STATUS_SIZE)
      integer ierr
c
30 c  function
      integer string_len
c
      call MPI_Init(ierr)
c
      call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, p, ierr)
c
      if (my_rank .ne. 0) then
          call to_string(my_rank, digit_string, size)
          message = 'Greetings from process ' //
     +        digit_string(1:size) // '!'
          dest = 0
          tag = 0
          call MPI_Send(message, string_len(message), MPI_CHARACTER,
     +        dest, tag, MPI_COMM_WORLD, ierr)
      else
31        do 200 source = 1, p-1
              tag = 0
              call MPI_Recv(message, 100, MPI_CHARACTER, source,
     +            tag, MPI_COMM_WORLD, status, ierr)
              call MPI_Get_count(status, MPI_CHARACTER, size, ierr)
              write(6,100) message(1:size)
 100          format(' ',a)
 200      continue
      endif
c
      call MPI_Finalize(ierr)
      stop
      end
c
32 cccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c
c  Converts the integer stored in number into an ascii
c  string.  The string is returned in string.  The number
c  of digits is returned in size.
      subroutine to_string(number, string, size)
      integer number
      character*(*) string
      integer size
      character*100 temp
      integer local
      integer last_digit
      integer i

      local = number
      i = 0
33 c  strip digits off, starting with the least significant
c  do-while loop
 100  last_digit = mod(local,10)
      local = local/10
      i = i + 1
      temp(i:i) = char(last_digit + ichar('0'))
      if (local .ne. 0) go to 100
      size = i
c  reverse digits
      do 200 i = 1, size
          string(size-i+1:size-i+1) = temp(i:i)
 200  continue
c
      return
      end
34 c  end of to_string
c
ccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c  Finds the number of characters stored in a string
c
      integer function string_len(string)
      character*(*) string
c
      character*1 space
      parameter (space = ' ')
      integer i
c
      i = len(string)
35 c  while loop
 100  if ((string(i:i) .eq. space) .and. (i .gt. 1)) then
          i = i - 1
          go to 100
      endif
c
      if ((i .eq. 1) .and. (string(i:i) .eq. space)) then
          string_len = 0
      else
          string_len = i
      endif
c
      return
      end
c  end of string_len
36    mpif77 greetings.f
      mpirun -np 8 a.out
37 - It is not necessary to call the MPI_Init function at the very beginning of your code.
- It is not necessary to call the MPI_Finalize function at the very end of your code.
- The MPI section needs to be inserted only where you need the code to run in parallel (see the sketch below).
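A minimal sketch of this layout (the printed strings are illustrative):

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char* argv[]) {
        int my_rank;

        /* Non-MPI work may precede MPI_Init; no MPI routine
           may be called before this point. */
        printf("before MPI_Init\n");

        MPI_Init(&argc, &argv);                  /* parallel section starts */
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        printf("parallel work on process %d\n", my_rank);
        MPI_Finalize();                          /* parallel section ends   */

        /* Non-MPI work may follow MPI_Finalize; no MPI routine
           may be called after this point. */
        printf("after MPI_Finalize\n");
        return 0;
    }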
38 Numerical Integration
- Example 1: numerical integration using the mid-point method
- Mathematical problem
- Numerical method
- Serial programming and parallel programming
39 - Problem
- Test: integration of cos(x) from 0 to π/2
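For reference, the exact answer, and the mid-point (rectangle) rule as it is used in the Fortran example later in this section:

    \int_0^{\pi/2} \cos x \, dx = \sin(\pi/2) - \sin(0) = 1

    \int_a^b f(x)\, dx \approx \sum_{j=0}^{N-1} f\!\left(a + \left(j + \tfrac{1}{2}\right)h\right) h,
    \qquad h = \frac{b-a}{N}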
41 Example of a C serial program
/* serial.c -- serial version of the trapezoidal rule
 *
 * Calculate a definite integral using the trapezoidal rule.
 * The function f(x) is hardwired.
 * Input: a, b, n.
 * Output: estimate of the integral from a to b of f(x)
 *         using n trapezoids.
 *
 * See Chapter 4, pp. 53 ff. in PPMPI.
 */
#include <stdio.h>
42 main() {
    float  integral;   /* Store result in integral */
    float  a, b;       /* Left and right endpoints */
    int    n;          /* Number of trapezoids     */
    float  h;          /* Trapezoid base width     */
    float  x;
    int    i;

    float f(float x);  /* Function we're integrating */

    printf("Enter a, b, and n\n");
    scanf("%f %f %d", &a, &b, &n);
43
    h = (b-a)/n;
    integral = (f(a) + f(b))/2.0;
    x = a;
    for (i = 1; i <= n-1; i++) {
        x = x + h;
        integral = integral + f(x);
    }
    integral = integral*h;

    printf("With n = %d trapezoids, our estimate\n", n);
    printf("of the integral from %f to %f = %f\n", a, b, integral);
}
44 float f(float x) {
    float return_val;

    /* Calculate f(x).  Store calculation in return_val. */
    return_val = x*x;
    return return_val;
}
45 Example of serial code in Fortran
C  serial.f -- calculate definite integral using trapezoidal rule.
C
C  The function f(x) is hardwired.
C  Input: a, b, n.
C  Output: estimate of integral from a to b of f(x)
C          using n trapezoids.
C
C  See Chapter 4, pp. 53 ff. in PPMPI.
C
      PROGRAM serial
      real integral
      real a
      real b
46    integer n
      real h
      real x
      integer i
C
      real f
C
      print *, 'Enter a, b, and n'
      read *, a, b, n
C
      h = (b-a)/n
      integral = (f(a) + f(b))/2.0
      x = a
      do 100 i = 1, n-1
          x = x + h
          integral = integral + f(x)
 100  continue
47    integral = integral*h
C
      print *, 'With n = ', n, ' trapezoids, our estimate'
      print *, 'of the integral from ', a, ' to ', b, ' = ', integral
      end
C
C
      real function f(x)
      real x
C  Calculate f(x).
C
      f = x*x
      return
      end
48 - To compile and execute serial.f:
      g77 -o serial serial.f
      serial
- Result:
      The result = 1.000000
      real 0.021  user 0.002  sys 0.013
49 - Parallel programming with MPI: blocking Send/Receive
- This version is implementation-dependent because the inputs are assigned (hardwired) in the code rather than read in
- Uses the following MPI functions:
  - MPI_Init and MPI_Finalize
  - MPI_Comm_rank
  - MPI_Comm_size
  - MPI_Recv
  - MPI_Send
50 - Parallel programming with MPI: blocking Send/Receive
- The master process receives each partial result, based on subinterval integration, from the other processes
- The master sums all of the sub-results together
- The other processes are idle during the master's work (due to blocking communication)
51 Example of parallel programming in C (trap.c)
/* trap.c -- Parallel Trapezoidal Rule, first version
 *
 * Input: None.
 * Output: Estimate of the integral from a to b of f(x)
 *         using the trapezoidal rule and n trapezoids.
 *
 * Algorithm:
 *    1. Each process calculates "its" interval of integration.
 *    2. Each process estimates the integral of f(x)
 *       over its interval using the trapezoidal rule.
 *    3a. Each process != 0 sends its integral to process 0.
 *    3b. Process 0 sums the calculations received from
 *        the individual processes and prints the result.
52  * Notes:
 *    1. f(x), a, b, and n are all hardwired.
 *    2. The number of processes (p) should evenly divide
 *       the number of trapezoids (n = 1024).
 *
 * See Chap. 4, pp. 56 ff. in PPMPI.
 */
#include <stdio.h>

/* We'll be using MPI routines, definitions, etc. */
#include "mpi.h"
53 main(int argc, char* argv[]) {
    int    my_rank;       /* My process rank           */
    int    p;             /* The number of processes   */
    float  a = 0.0;       /* Left endpoint             */
    float  b = 1.0;       /* Right endpoint            */
    int    n = 1024;      /* Number of trapezoids      */
    float  h;             /* Trapezoid base length     */

    /* local_a and local_b are the bounds of the
       integration performed by each individual process */
    float  local_a;       /* Left endpoint my process  */
    float  local_b;       /* Right endpoint my process */
    int    local_n;       /* Number of trapezoids for  */
                          /* my calculation            */
    float  integral;      /* Integral over my interval */
54  float  total;         /* Total integral            */
    int    source;        /* Process sending integral  */
    int    dest = 0;      /* All messages go to 0      */
    int    tag = 0;
    MPI_Status status;

    /* Trap function prototype.  Trap is used to
       calculate the local integral */
    float Trap(float local_a, float local_b, int local_n,
               float h);

    /* Let the system do what it needs to start up MPI */
    MPI_Init(&argc, &argv);

    /* Get my process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
55  /* Find out how many processes are being used */
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    h = (b-a)/n;      /* h is the same for all processes */
    local_n = n/p;    /* So is the number of trapezoids  */

    /* Length of each process' interval of
       integration = local_n*h.  So my interval
       starts at: */
    local_a = a + my_rank*local_n*h;
    local_b = local_a + local_n*h;
    integral = Trap(local_a, local_b, local_n, h);

    if (my_rank == 0) {
        /* Add up the integrals calculated by each process */
        total = integral;   /* this is the integral calculated by process 0 */
56      for (source = 1; source < p; source++) {
            MPI_Recv(&integral, 1, MPI_FLOAT, source, tag,
                     MPI_COMM_WORLD, &status);
            total = total + integral;
        }
    } else {
        printf("The integral calculated by process %d is %f\n",
               my_rank, integral);
        MPI_Send(&integral, 1, MPI_FLOAT, dest, tag,
                 MPI_COMM_WORLD);
    }
57  /* Print the result */
    if (my_rank == 0) {
        printf("With n = %d trapezoids, our estimate\n", n);
        printf("of the integral from %f to %f = %f\n", a, b, total);
    }

    /* Shut down MPI */
    MPI_Finalize();
} /* main */
58 float Trap(
        float  local_a  /* in */,
        float  local_b  /* in */,
        int    local_n  /* in */,
        float  h        /* in */) {

    float integral;    /* Store result in integral */
    float x;
    int   i;

    float f(float x);  /* function we're integrating */

    integral = (f(local_a) + f(local_b))/2.0;
    x = local_a;
59  for (i = 1; i <= local_n-1; i++) {
        x = x + h;
        integral = integral + f(x);
    }
    integral = integral*h;
    return integral;
} /* Trap */

float f(float x) {
    float return_val;

    /* Calculate f(x). */
    /* Store calculation in return_val. */
    return_val = x*x;
    return return_val;
} /* f */
60 - To compile a C code with the MPI library:
      cc -o trap trap.c -lmpi -lm
- In our cluster system:
      mpicc trap.c
      mpirun -np 8 a.out
61 With n = 1024 trapezoids, our estimate
of the integral from 0.000000 to 1.000000 = 0.333333
The integral calculated by process 3 is 0.024089
The integral calculated by process 4 is 0.039714
The integral calculated by process 7 is 0.110026
The integral calculated by process 5 is 0.059245
The integral calculated by process 1 is 0.004557
The integral calculated by process 2 is 0.012370
The integral calculated by process 6 is 0.082682
62 - Example of parallel programming in Fortran (trap.f)
c  trap.f -- Parallel Trapezoidal Rule, first version
c
c  Input: None.
c  Output: Estimate of the integral from a to b of f(x)
c      using the trapezoidal rule and n trapezoids.
c
c  Algorithm:
c      1. Each process calculates "its" interval of
c         integration.
c      2. Each process estimates the integral of f(x)
c         over its interval using the trapezoidal rule.
c      3a. Each process .ne. 0 sends its integral to 0.
c      3b. Process 0 sums the calculations received from
63 c        the individual processes and prints the result.
c
c  Notes:
c      1. f(x), a, b, and n are all hardwired.
c      2. Assumes the number of processes (p) evenly divides
c         the number of trapezoids (n = 1024).
c
c  See Chap. 4, pp. 56 ff. in PPMPI.
c
      program trapezoidal
c
      include 'mpif.h'
c
      integer my_rank
      integer p
      real    a
64    real    b
      integer n
      real    h
      real    local_a
      real    local_b
      integer local_n
      real    integral
      real    total
      integer source
      integer dest
      integer tag
      integer status(MPI_STATUS_SIZE)
      integer ierr
c
      real    Trap
c
65    data a, b, n, dest, tag /0.0, 1.0, 1024, 0, 0/

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, p, ierr)

      h = (b-a)/n
      local_n = n/p
      local_a = a + my_rank*local_n*h
      local_b = local_a + local_n*h
      integral = Trap(local_a, local_b, local_n, h)

      if (my_rank .EQ. 0) then
          total = integral
66        do 100 source = 1, p-1
              call MPI_RECV(integral, 1, MPI_REAL, source, tag,
     +            MPI_COMM_WORLD, status, ierr)
              total = total + integral
 100      continue
      else
          call MPI_SEND(integral, 1, MPI_REAL, dest,
     +        tag, MPI_COMM_WORLD, ierr)
      endif

      if (my_rank .EQ. 0) then
          write(6,200) n
 200      format(' ','With n = ',I4,' trapezoids, our estimate')
          write(6,300) a, b, total
 300      format(' ','of the integral from ',f6.2,' to ',f6.2,
     +        ' = ',f11.5)
      endif
67    call MPI_FINALIZE(ierr)
      end
c
c
      real function Trap(local_a, local_b, local_n, h)
      real    local_a
      real    local_b
      integer local_n
      real    h
c
      real    integral
      real    x
      integer i
c
      real    f
68    integral = (f(local_a) + f(local_b))/2.0
      x = local_a
      do 100 i = 1, local_n-1
          x = x + h
          integral = integral + f(x)
 100  continue
      Trap = integral*h
      return
      end
c
      real function f(x)
      real x
      real return_val

      return_val = x*x
      f = return_val
      return
      end
69 - Example of parallel programming in Fortran (trap.f)
- Result:
      With n = 1024 trapezoids, our estimate
      of the integral from 0.00 to 1.00 = 0.33333
- To compile an f77 code with the MPI library:
      f77 -o trap trap.f -lmpi
- In our cluster system:
      mpif77 trap.f
      mpirun -np 8 trap
70 - Basic mechanism of message passing through buffering
- Compose a message and put it in a buffer
- Drop the message in a "mailbox", done by calling MPI_Send
- The destination address must be specified
- An envelope must be created, which contains the destination of the message and the size of the message; the source process is also added to the envelope
- Tags (message types) are standard in message passing
- A tag is used to identify what the process should do with the data
71 - The message envelope contains at least the following information:
  - The rank of the receiver
  - The rank of the sender
  - A tag, like a project identification
  - A communicator: a collection of processes that can send messages to each other. The predefined MPI_COMM_WORLD, available on all MPI systems, consists of all the processes running when execution of the program starts.
- "Message" refers to the actual data being transmitted
- Status holds information on the data that was actually received (see the sketch below)
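A sketch of how the receiver can inspect the envelope after MPI_Recv, using the standard status fields and MPI_Get_count (run with at least two processes; the tag value 7 and the message text are illustrative):

    #include <stdio.h>
    #include <string.h>
    #include "mpi.h"

    int main(int argc, char* argv[]) {
        int        my_rank, count;
        char       buf[100];
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

        if (my_rank == 1) {
            sprintf(buf, "hello");
            MPI_Send(buf, strlen(buf)+1, MPI_CHAR, 0, 7, MPI_COMM_WORLD);
        } else if (my_rank == 0) {
            /* Accept a message from any sender with any tag,
               then read the envelope back from status */
            MPI_Recv(buf, 100, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            MPI_Get_count(&status, MPI_CHAR, &count);  /* size actually received */
            printf("source = %d, tag = %d, count = %d\n",
                   status.MPI_SOURCE, status.MPI_TAG, count);
        }

        MPI_Finalize();
        return 0;
    }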
72 - MPI datatypes (and the corresponding C types):
- MPI_CHAR: signed char
- MPI_SHORT: signed short int
- MPI_INT: signed int
- MPI_LONG: signed long int
- MPI_UNSIGNED_CHAR: unsigned char
- MPI_UNSIGNED_SHORT: unsigned short int
- MPI_UNSIGNED: unsigned int
- MPI_UNSIGNED_LONG: unsigned long int
- MPI_FLOAT: float
- MPI_DOUBLE: double
- MPI_LONG_DOUBLE: long double
- MPI_BYTE
- MPI_PACKED
73 int MPI_Send(
        void*         message   /* in */,
        int           count     /* in */,
        MPI_Datatype  datatype  /* in */,
        int           dest      /* in */,
        int           tag       /* in */,
        MPI_Comm      comm      /* in */)

int MPI_Recv(
        void*         message   /* out */,
        int           count     /* in  */,
        MPI_Datatype  datatype  /* in  */,
        int           source    /* in  */,
        int           tag       /* in  */,
        MPI_Comm      comm      /* in  */,
        MPI_Status*   status    /* out */)
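A minimal sketch filling in these argument lists, with every process other than 0 sending one float to process 0 (the per-process value is illustrative):

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char* argv[]) {
        int        my_rank, p, source;
        float      value;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        if (my_rank != 0) {
            value = 0.5f * my_rank;   /* some per-process result */
            /* message, count, datatype, dest, tag, comm */
            MPI_Send(&value, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
        } else {
            for (source = 1; source < p; source++) {
                /* message, count, datatype, source, tag, comm, status */
                MPI_Recv(&value, 1, MPI_FLOAT, source, 0,
                         MPI_COMM_WORLD, &status);
                printf("received %f from process %d\n", value, source);
            }
        }

        MPI_Finalize();
        return 0;
    }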
74 - Parallel programming with MPI: non-blocking Send/Receive
- Does not leave processes idle
- Uses the following MPI functions:
  - MPI_Init and MPI_Finalize
  - MPI_Comm_rank
  - MPI_Comm_size
  - MPI_Recv
  - MPI_Isend (see the sketch below)
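The basic non-blocking pattern, as used in the examples later in this section: MPI_Isend starts the send and returns immediately, and MPI_Wait blocks until the send has completed and the buffer may be reused. A minimal sketch (run with at least two processes):

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char* argv[]) {
        int         my_rank;
        float       value = 1.0f;
        MPI_Request req;
        MPI_Status  status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

        if (my_rank == 1) {
            /* Start the send; the call returns without waiting */
            MPI_Isend(&value, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &req);
            /* ... other useful work could overlap the communication here ... */
            MPI_Wait(&req, &status);  /* complete the send before reusing value */
        } else if (my_rank == 0) {
            MPI_Recv(&value, 1, MPI_FLOAT, 1, 0, MPI_COMM_WORLD, &status);
            printf("received %f\n", value);
        }

        MPI_Finalize();
        return 0;
    }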
75 - Basic input and output in MPI
- Global and local variables:
  - Some variables are significant on all the processes
  - Some variables are significant on individual processes
- I/O on a parallel system:
  - Many parallel systems provide standard I/O (keyboard input and terminal output) on process 0
  - Some systems allow all the processes to read and write
- How do we deal with input?
76 - If we want to input values such as a, b, and n from the keyboard, should we just add
      scanf("%f %f %d", &a, &b, &n);  ?
- Usually we assume only process 0 can read and write
- Modified parallel code:
77 /* get_data.c -- Parallel Trapezoidal Rule, uses a basic
 *     Get_data function for input.
 *
 * Input:  a, b: limits of integration.
 *         n: number of trapezoids.
 * Output: Estimate of the integral from a to b of f(x)
 *         using the trapezoidal rule and n trapezoids.
 *
 * Notes:
 *    1. f(x) is hardwired.
 *    2. Assumes the number of processes (p) evenly divides
 *       the number of trapezoids (n).
 *
 * See Chap. 4, pp. 60 ff in PPMPI.
 */
78 #include <stdio.h>

/* We'll be using MPI routines, definitions, etc. */
#include "mpi.h"

main(int argc, char* argv[]) {
    int    my_rank;   /* My process rank           */
    int    p;         /* The number of processes   */
    float  a;         /* Left endpoint             */
    float  b;         /* Right endpoint            */
    int    n;         /* Number of trapezoids      */
    float  h;         /* Trapezoid base length     */
    float  local_a;   /* Left endpoint my process  */
    float  local_b;   /* Right endpoint my process */
    int    local_n;   /* Number of trapezoids for  */
                      /* my calculation            */
79  float  integral;  /* Integral over my interval */
    float  total;     /* Total integral            */
    int    source;    /* Process sending integral  */
    int    dest = 0;  /* All messages go to 0      */
    int    tag = 0;
    MPI_Status status;

    /* function prototypes */
    void Get_data(float* a_ptr, float* b_ptr,
                  int* n_ptr, int my_rank, int p);
    float Trap(float local_a, float local_b, int local_n,
               float h);   /* Calculate local integral */

    /* Let the system do what it needs to start up MPI */
    MPI_Init(&argc, &argv);
80  /* Get my process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Find out how many processes are being used */
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    Get_data(&a, &b, &n, my_rank, p);

    h = (b-a)/n;      /* h is the same for all processes */
    local_n = n/p;    /* So is the number of trapezoids  */

    /* Length of each process' interval of
       integration = local_n*h.  So my interval
       starts at: */
    local_a = a + my_rank*local_n*h;
    local_b = local_a + local_n*h;
    integral = Trap(local_a, local_b, local_n, h);
81  /* Add up the integrals calculated by each process */
    if (my_rank == 0) {
        total = integral;
        for (source = 1; source < p; source++) {
            MPI_Recv(&integral, 1, MPI_FLOAT, source, tag,
                     MPI_COMM_WORLD, &status);
            total = total + integral;
        }
    } else {
        MPI_Send(&integral, 1, MPI_FLOAT, dest, tag,
                 MPI_COMM_WORLD);
    }
82  /* Print the result */
    if (my_rank == 0) {
        printf("With n = %d trapezoids, our estimate\n", n);
        printf("of the integral from %f to %f = %f\n",
               a, b, total);
    }

    /* Shut down MPI */
    MPI_Finalize();
} /* main */

/**********************************************************/
/* Function Get_data
 * Reads in the user input a, b, and n.
 * Input parameters:
83  *    1. int my_rank: rank of current process.
 *    2. int p: number of processes.
 * Output parameters:
 *    1. float* a_ptr: pointer to left endpoint a.
 *    2. float* b_ptr: pointer to right endpoint b.
 *    3. int* n_ptr: pointer to number of trapezoids.
 * Algorithm:
 *    1. Process 0 prompts user for input and
 *       reads in the values.
 *    2. Process 0 sends input values to other processes.
 */
void Get_data(
        float*  a_ptr  /* out */,
        float*  b_ptr  /* out */,
        int*    n_ptr  /* out */,
84      int     my_rank  /* in */,
        int     p        /* in */) {

    int source = 0;   /* All local variables used by */
    int dest;         /* MPI_Send and MPI_Recv       */
    int tag;
    MPI_Status status;

    if (my_rank == 0) {
        printf("Enter a, b, and n\n");
        scanf("%f %f %d", a_ptr, b_ptr, n_ptr);
        for (dest = 1; dest < p; dest++) {
            tag = 0;
85          MPI_Send(a_ptr, 1, MPI_FLOAT, dest,
                     tag, MPI_COMM_WORLD);
            tag = 1;
            MPI_Send(b_ptr, 1, MPI_FLOAT, dest,
                     tag, MPI_COMM_WORLD);
            tag = 2;
            MPI_Send(n_ptr, 1, MPI_INT, dest,
                     tag, MPI_COMM_WORLD);
        }
    } else {
        tag = 0;
        MPI_Recv(a_ptr, 1, MPI_FLOAT, source,
                 tag, MPI_COMM_WORLD, &status);
        tag = 1;
86      MPI_Recv(b_ptr, 1, MPI_FLOAT, source,
                 tag, MPI_COMM_WORLD, &status);
        tag = 2;
        MPI_Recv(n_ptr, 1, MPI_INT, source,
                 tag, MPI_COMM_WORLD, &status);
    }
} /* Get_data */

/**********************************************************/
float Trap(
        float  local_a  /* in */,
        float  local_b  /* in */,
        int    local_n  /* in */,
        float  h        /* in */) {
87  float integral;   /* Store result in integral */
    float x;
    int   i;

    float f(float x); /* function we're integrating */

    integral = (f(local_a) + f(local_b))/2.0;
    x = local_a;
    for (i = 1; i <= local_n-1; i++) {
        x = x + h;
        integral = integral + f(x);
    }
    integral = integral*h;
    return integral;
} /* Trap */
88 /**********************************************************/
float f(float x) {
    float return_val;

    /* Calculate f(x). */
    /* Store calculation in return_val. */
    return_val = x*x;
    return return_val;
} /* f */
89 Enter a, b, and n
0 1.0 1024
With n = 1024 trapezoids, our estimate
of the integral from 0.000000 to 1.000000 = 0.333333
90 - Non-blocking Send/Receive
/* get_dataNonBlocking.c -- Parallel Trapezoidal Rule; uses a
 *     basic Get_data function for input and non-blocking
 *     MPI functions.
 *
 * Input:  a, b: limits of integration.
 *         n: number of trapezoids.
 * Output: Estimate of the integral from a to b of f(x)
 *         using the trapezoidal rule and n trapezoids.
 *
 * Notes:
91  *    1. f(x) is hardwired.
 *    2. Assumes the number of processes (p) evenly divides
 *       the number of trapezoids (n).
 *
 * See Chap. 4, pp. 60 ff in PPMPI.
 */
#include <stdio.h>

/* We'll be using MPI routines, definitions, etc. */
#include "mpi.h"

main(int argc, char* argv[]) {
    int    my_rank;   /* My process rank         */
    int    p;         /* The number of processes */
    float  a;         /* Left endpoint           */
92  float  b;         /* Right endpoint            */
    int    n;         /* Number of trapezoids      */
    float  h;         /* Trapezoid base length     */
    float  local_a;   /* Left endpoint my process  */
    float  local_b;   /* Right endpoint my process */
    int    local_n;   /* Number of trapezoids for  */
                      /* my calculation            */
    float  integral;  /* Integral over my interval */
    float  total;     /* Total integral            */
    int    source;    /* Process sending integral  */
    int    dest = 0;  /* All messages go to 0      */
    int    tag = 0;
    MPI_Status  status;
    MPI_Request req;
93  /* function prototypes */
    void Get_data(float* a_ptr, float* b_ptr,
                  int* n_ptr, int my_rank, int p);
    float Trap(float local_a, float local_b, int local_n,
               float h);   /* Calculate local integral */

    /* Let the system do what it needs to start up MPI */
    MPI_Init(&argc, &argv);

    /* Get my process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Find out how many processes are being used */
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    Get_data(&a, &b, &n, my_rank, p);
94  h = (b-a)/n;      /* h is the same for all processes */
    local_n = n/p;    /* So is the number of trapezoids  */

    /* Length of each process' interval of
       integration = local_n*h.  So my interval
       starts at: */
    local_a = a + my_rank*local_n*h;
    local_b = local_a + local_n*h;
    integral = Trap(local_a, local_b, local_n, h);

    /* Add up the integrals calculated by each process */
    if (my_rank == 0) {
        total = integral;
95      for (source = 1; source < p; source++) {
            MPI_Recv(&integral, 1, MPI_FLOAT, source, tag,
                     MPI_COMM_WORLD, &status);
            total = total + integral;
        }
    } else {
        MPI_Isend(&integral, 1, MPI_FLOAT, dest,
                  tag, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, &status);
    }

    /* Print the result */
    if (my_rank == 0) {
96      printf("With n = %d trapezoids, our estimate\n", n);
        printf("of the integral from %f to %f = %f\n",
               a, b, total);
    }

    /* Shut down MPI */
    MPI_Finalize();
} /* main */

/**********************************************************/
/* Function Get_data
 * Reads in the user input a, b, and n.
 * Input parameters:
 *    1. int my_rank: rank of current process.
 *    2. int p: number of processes.
97  * Output parameters:
 *    1. float* a_ptr: pointer to left endpoint a.
 *    2. float* b_ptr: pointer to right endpoint b.
 *    3. int* n_ptr: pointer to number of trapezoids.
 * Algorithm:
 *    1. Process 0 prompts user for input and
 *       reads in the values.
 *    2. Process 0 sends input values to other processes.
 */
void Get_data(
        float*  a_ptr    /* out */,
        float*  b_ptr    /* out */,
        int*    n_ptr    /* out */,
        int     my_rank  /* in  */,
        int     p        /* in  */) {
98  int source = 0;   /* All local variables used by */
    int dest;         /* MPI_Send and MPI_Recv       */
    int tag;
    MPI_Status status;

    if (my_rank == 0) {
        printf("Enter a, b, and n\n");
        scanf("%f %f %d", a_ptr, b_ptr, n_ptr);
        for (dest = 1; dest < p; dest++) {
            tag = 0;
            MPI_Send(a_ptr, 1, MPI_FLOAT, dest,
                     tag, MPI_COMM_WORLD);
            tag = 1;
99          MPI_Send(b_ptr, 1, MPI_FLOAT, dest,
                     tag, MPI_COMM_WORLD);
            tag = 2;
            MPI_Send(n_ptr, 1, MPI_INT, dest,
                     tag, MPI_COMM_WORLD);
        }
    } else {
        tag = 0;
        MPI_Recv(a_ptr, 1, MPI_FLOAT, source,
                 tag, MPI_COMM_WORLD, &status);
        tag = 1;
        MPI_Recv(b_ptr, 1, MPI_FLOAT, source,
                 tag, MPI_COMM_WORLD, &status);
        tag = 2;
        MPI_Recv(n_ptr, 1, MPI_INT, source,
                 tag, MPI_COMM_WORLD, &status);
    }
} /* Get_data */
100 /**********************************************************/
float Trap(
        float  local_a  /* in */,
        float  local_b  /* in */,
        int    local_n  /* in */,
        float  h        /* in */) {

    float integral;   /* Store result in integral */
    float x;
    int   i;

    float f(float x); /* function we're integrating */

    integral = (f(local_a) + f(local_b))/2.0;
101 x = local_a;
    for (i = 1; i <= local_n-1; i++) {
        x = x + h;
        integral = integral + f(x);
    }
    integral = integral*h;
    return integral;
} /* Trap */

/**********************************************************/
float f(float x) {
    float return_val;

    /* Calculate f(x). */
    /* Store calculation in return_val. */
    return_val = x*x;
    return return_val;
} /* f */
102 - Non-blocking Send/Receive (in Fortran)
      Program Example1_2
c
c  example1_1.f
c  Parallel programming in Fortran to solve numerical
c  integration using the mid-point method.
c  The function selected is cos(x).
c  It demonstrates non-blocking communication.
c
c  This is an MPI example on parallel integration.
103 c It demonstrates the use of:
c
c  MPI_Init
c  MPI_Comm_rank
c  MPI_Comm_size
c  MPI_Recv
c  MPI_Isend
c  MPI_Wait
c  MPI_Finalize
c
      implicit none
      integer n, p, i, j, k, ierr, master
      real h, a, b, integral, pi
      integer req(1)
104   include "mpif.h"   !! This brings in pre-defined MPI constants, ...
      integer Iam, source, dest, tag, status(MPI_STATUS_SIZE)
      real my_result, Total_result, result
      data master/0/

c  Starts MPI processes ...
      call MPI_Init(ierr)                            !! starts MPI
      call MPI_Comm_rank(MPI_COMM_WORLD, Iam, ierr)  !! get current proc id
      call MPI_Comm_size(MPI_COMM_WORLD, p, ierr)    !! get number of procs

      pi = acos(-1.0)   !! = 3.14159...
      a = 0.0           !! lower limit of integration
      b = pi/2.         !! upper limit of integration
105   n = 500           !! number of increments within each process
      dest = master     !! define the process that computes the final result
      tag = 123         !! set the tag to identify this particular job
      h = (b-a)/n/p     !! length of increment

      my_result = integral(a,Iam,h,n)
      write(*,*) 'Iam', Iam, ', my_result', my_result

      if (Iam .eq. master) then    ! the following is serial
          result = my_result
          do k = 1, p-1   !! more efficient, less prone to deadlock
c  root receives my_result from each proc
              call MPI_Recv(my_result, 1, MPI_REAL,
     +            MPI_ANY_SOURCE, tag, MPI_COMM_WORLD,
     +            status, ierr)
              result = result + my_result
          enddo
106   else
          call MPI_Isend(my_result, 1, MPI_REAL, dest, tag,
     +        MPI_COMM_WORLD, req, ierr)   !! send my_result to intended dest.
          call MPI_Wait(req, status, ierr) !! wait for nonblocking send ...
      endif

c  results from all procs have been collected and summed ...
      if (Iam .eq. 0) then
          write(*,*) 'Final Result =', result
      endif

      call MPI_Finalize(ierr)   !! let MPI finish up ...
      stop
      end
107   real function integral(a,i,h,n)
      implicit none
      integer n, i, j
      real h, h2, aij, a
      real fct, x
      fct(x) = cos(x)        !! kernel of the integral

      integral = 0.0         !! initialize integral
      h2 = h/2.
      do j = 0, n-1          !! sum over all "j" integrals
          aij = a + (i*n + j)*h   !! lower limit of "j" integral
          integral = integral + fct(aij+h2)*h
      enddo
      return
      end
108 Result:
Process 6 has the partial result of 0.056906
Process 1 has the partial result of 0.187593
Process 0 has the partial result of 0.195090
Process 2 has the partial result of 0.172887
Process 3 has the partial result of 0.151536
Process 4 has the partial result of 0.124363
Process 5 has the partial result of 0.092410
Process 7 has the partial result of 0.019215
The result = 0.9999998