Title: Parallel Programming with Java
1. Parallel Programming with Java
- Aamir Shafi
- National University of Sciences and Technology (NUST)
- http://hpc.seecs.edu.pk/aamir
- http://mpj-express.org
2. Two Important Concepts
- Two fundamental concepts of parallel programming are:
  - Domain decomposition
  - Functional decomposition
3. Domain Decomposition
Image taken from https://computing.llnl.gov/tutorials/parallel_comp/
4. Functional Decomposition
Image taken from https://computing.llnl.gov/tutorials/parallel_comp/
5. Message Passing Interface (MPI)
- MPI is a standard (an interface or an API)
  - It defines a set of methods that application developers use to write their applications
  - MPI libraries implement these methods
  - MPI itself is not a library; it is a specification document that implementations follow
  - MPI-1.2 is the most popular specification version
- Reasons for popularity
  - Software and hardware vendors were involved
  - Significant contribution from academia
  - MPICH served as an early reference implementation
  - MPI compilers are simply wrappers around widely used C and Fortran compilers
- History
  - The first draft specification was produced in 1993
  - MPI-2.0, introduced in 1997, adds many new features to MPI
  - Bindings available for C, C++, and Fortran
- MPI is a success story
  - It is the most widely adopted programming paradigm on IBM Blue Gene systems
  - At least two production-quality MPI libraries
    - MPICH2 (http://www-unix.mcs.anl.gov/mpi/mpich2/)
    - Open MPI (http://open-mpi.org)
6. Message Passing Model
- The message passing model allows processors to communicate by passing messages
  - Processors do not share memory
- Data transfer between processors requires cooperative operations to be performed by each processor
  - One processor sends the message while the other receives it
7. Distributed Memory Cluster
- barq.niit.edu.pk (cluster head node)
- Compute nodes: chenab1, chenab2, chenab3, chenab4, chenab5, chenab6, chenab7
8. Steps Involved in Executing the "Hello World" Program
- Log on to the cluster head node
- Write the Hello World program
- Compile the program
- Write the machines file
- Start the MPJ Express daemons
- Execute the parallel program
- Stop the MPJ Express daemons
9. Step 1: Log on to the head node
10. Step 2: Write the Hello World program
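A minimal version of the program could look like the sketch below. It follows the output format shown on the execution slide; this is an illustrative sketch, not the exact program used in the course, and it requires the MPJ Express `mpj.jar` on the classpath to compile and run.

```java
import mpi.*;

// Minimal MPJ Express "Hello World": every process prints its rank and
// the total number of processes in MPI.COMM_WORLD.
public class HelloWorld {
  public static void main(String args[]) throws Exception {
    MPI.Init(args);                      // start up MPI
    int size = MPI.COMM_WORLD.Size();    // total number of processes
    int rank = MPI.COMM_WORLD.Rank();    // this process's unique id
    System.out.println("Hi from process <" + rank + "> of total <" + size + ">");
    MPI.Finalize();                      // shut down MPI
  }
}
```

When launched with `-np 6`, each of the six processes executes this same program and prints one line, which is why six "Hi from process" lines appear in the output slide.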
11. Step 3: Compile the code
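Compilation is plain `javac` with the MPJ Express API jar on the classpath. The exact jar location below assumes the distribution's usual layout under an `MPJ_HOME` install directory; adjust the path to your installation.

```
# Path to mpj.jar is an assumption about the MPJ Express install layout
javac -cp .:$MPJ_HOME/lib/mpj.jar HelloWorld.java
```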
12. Step 4: Write the machines file
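A machines file is a plain-text list of hostnames, one per line, naming the nodes where processes should run. Using six of the compute nodes named on the cluster slide (the exact selection here is illustrative):

```
chenab1
chenab2
chenab3
chenab4
chenab5
chenab6
```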
13. Step 5: Start the MPJ Express daemons
14. Step 6: Execute the parallel program
- aamir@barq:/projects/mpj-user> mpjrun.sh -np 6 -headnodeip 10.3.20.120 -dport 11050 HelloWorld
- ..
- Hi from process <3> of total <6>
- Hi from process <1> of total <6>
- Hi from process <2> of total <6>
- Hi from process <4> of total <6>
- Hi from process <5> of total <6>
- Hi from process <0> of total <6>
15. Step 7: Stop the MPJ Express daemons
16. COMM WORLD Communicator
- import java.util.*;
- import mpi.*;
- ..
- // Initialize MPI
- MPI.Init(args); // start up MPI
- // Get total number of processes and rank
- size = MPI.COMM_WORLD.Size();
- rank = MPI.COMM_WORLD.Rank();
- ..
17. What is size?
import java.util.*; import mpi.*;
.. // Get total number of processes
size = MPI.COMM_WORLD.Size(); ..
- The total number of processes in a communicator
- The size of MPI.COMM_WORLD is 6
18. What is rank?
import java.util.*; import mpi.*;
.. // Get the rank of this process
rank = MPI.COMM_WORLD.Rank(); ..
- The unique identifier (id) of a process in a communicator
- Each of the six processes in MPI.COMM_WORLD has a distinct rank or id
19. Single Program Multiple Data (SPMD) Model
- import java.util.*;
- import mpi.*;
- public class HelloWorld {
-   public static void main(String args[]) throws Exception {
-     MPI.Init(args); // start up MPI
-     int size = MPI.COMM_WORLD.Size();
-     int rank = MPI.COMM_WORLD.Rank();
-     if (rank == 0) {
-       System.out.println("I am Process 0");
-     } else if (rank == 1) {
-       System.out.println("I am Process 1");
-     }
-     MPI.Finalize();
-   }
- }
20. Single Program Multiple Data (SPMD) Model
- import java.util.*;
- import mpi.*;
- public class HelloWorld {
-   public static void main(String args[]) throws Exception {
-     MPI.Init(args); // start up MPI
-     int size = MPI.COMM_WORLD.Size();
-     int rank = MPI.COMM_WORLD.Rank();
-     if (rank % 2 == 0) {
-       System.out.println("I am an even process");
-     } else if (rank % 2 == 1) {
-       System.out.println("I am an odd process");
-     }
-     MPI.Finalize();
-   }
- }
21. Point-to-Point Communication
- The most fundamental facility provided by MPI
- Essentially the exchange of messages between two processes
  - One process (source) sends the message
  - The other process (destination) receives the message
22. Point-to-Point Communication
- Messages can be sent for each basic datatype
  - Floats (MPI.FLOAT), integers (MPI.INT), doubles (MPI.DOUBLE)
  - Java objects (MPI.OBJECT)
- Each message carries a tag, an identifier that lets the receiver distinguish between messages (e.g. Tag1, Tag2)
23. Point-to-Point Communication
[Diagram: eight processes (0-7); a message travels from a source process to a destination process]
24. Blocking Send() and Recv() Methods
- public void Send(Object buf, int offset, int count,
                   Datatype datatype, int dest, int tag)
      throws MPIException
- public Status Recv(Object buf, int offset, int count,
                     Datatype datatype, int src, int tag)
      throws MPIException
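A sketch of how these two signatures are used together: rank 0 sends a single int to rank 1, which receives it with a matching tag. The buffer size, tag value, and payload are illustrative, and running this requires the MPJ Express runtime.

```java
import mpi.*;

// Blocking point-to-point ping: rank 0 sends one int to rank 1.
// All other ranks fall through and do nothing.
public class SendRecvExample {
  public static void main(String args[]) throws Exception {
    MPI.Init(args);
    int rank = MPI.COMM_WORLD.Rank();
    int[] buf = new int[1];
    int tag = 99;                        // message identifier; must match on both sides
    if (rank == 0) {
      buf[0] = 42;                       // payload (illustrative)
      MPI.COMM_WORLD.Send(buf, 0, 1, MPI.INT, 1, tag);  // dest = rank 1
    } else if (rank == 1) {
      MPI.COMM_WORLD.Recv(buf, 0, 1, MPI.INT, 0, tag);  // src = rank 0
      System.out.println("Received " + buf[0]);
    }
    MPI.Finalize();
  }
}
```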
25. Blocking and Non-blocking Point-to-Point Communication
- There are blocking and non-blocking versions of the send and receive methods
- Blocking versions
  - A process calls Send() or Recv(); these methods return only once the message has been sent or received, i.e. when it is safe to reuse the buffer
- Non-blocking versions
  - A process calls Isend() or Irecv(); these methods return immediately
  - The user can check the status of the message by calling Test() or Wait()
  - Non-blocking versions allow overlapping of computation and communication
  - Asynchronous communication
27. Non-blocking Point-to-Point Communication
- public Request Isend(Object buf, int offset, int count,
                       Datatype datatype, int dest, int tag)
      throws MPIException
- public Request Irecv(Object buf, int offset, int count,
                       Datatype datatype, int src, int tag)
      throws MPIException
- public Status Wait() throws MPIException
- public Status Test() throws MPIException
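A sketch of the non-blocking pattern using the signatures above: Isend()/Irecv() return a Request immediately, computation can proceed while the message is in flight, and Wait() blocks until the transfer completes. The tag and payload are illustrative, and the MPJ Express runtime is required to run it.

```java
import mpi.*;

// Non-blocking ping: overlap computation with communication, then
// synchronize with Wait() before touching the buffer again.
public class IsendExample {
  public static void main(String args[]) throws Exception {
    MPI.Init(args);
    int rank = MPI.COMM_WORLD.Rank();
    int[] buf = new int[1];
    if (rank == 0) {
      buf[0] = 7;                        // payload (illustrative)
      Request req = MPI.COMM_WORLD.Isend(buf, 0, 1, MPI.INT, 1, 0);
      // ... do useful computation here while the message is in flight ...
      req.Wait();                        // safe to reuse buf only after this
    } else if (rank == 1) {
      Request req = MPI.COMM_WORLD.Irecv(buf, 0, 1, MPI.INT, 0, 0);
      // ... overlap computation with the incoming receive ...
      req.Wait();                        // message is now available in buf
      System.out.println("Got " + buf[0]);
    }
    MPI.Finalize();
  }
}
```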
28. Performance Evaluation of Point-to-Point Communication
- Normally, ping-pong benchmarks are used to measure:
  - Latency: how long does it take to send N bytes from sender to receiver?
  - Throughput: how much bandwidth is achieved?
- Latency is a useful measure for studying the performance of small messages
- Throughput is a useful measure for studying the performance of large messages
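The arithmetic behind a ping-pong benchmark can be sketched as follows: half the measured round-trip time gives the one-way latency, and message size divided by the one-way time gives throughput. The timing number below is made up purely for illustration.

```java
// Convert a ping-pong round-trip time into the two benchmark metrics.
public class PingPongMetrics {
  // One-way latency in seconds: half the round-trip time.
  static double latency(double roundTripSeconds) {
    return roundTripSeconds / 2.0;
  }
  // Throughput in bytes/second for a message of `bytes` bytes.
  static double throughput(long bytes, double roundTripSeconds) {
    return bytes / latency(roundTripSeconds);
  }
  public static void main(String[] args) {
    long bytes = 1 << 20;                // 1 MiB message
    double rtt = 0.002;                  // 2 ms round trip (invented value)
    System.out.println("latency    = " + latency(rtt) + " s");
    System.out.println("throughput = " + throughput(bytes, rtt) + " B/s");
  }
}
```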
29. Latency on Myrinet
30. Throughput on Myrinet
31. Collective Communications
- Provided as a convenience for application developers
  - Save significant development time
  - Efficient algorithms may be used
  - Stable (tested)
- Built on top of point-to-point communications
- These operations include
  - Broadcast, Barrier, Reduce, Allreduce, Alltoall, Scatter, Gather, Scan, Allgather
  - Versions that allow displacements between the data
32. Broadcast, Scatter, Gather, Allgather, Alltoall
Image from the MPI standard document
33. Broadcast, Scatter, Gather, Allgather, Alltoall
- public void Bcast(Object buf, int offset, int count,
                    Datatype type, int root) throws MPIException
- public void Scatter(Object sendbuf, int sendoffset, int sendcount, Datatype sendtype,
                      Object recvbuf, int recvoffset, int recvcount, Datatype recvtype,
                      int root) throws MPIException
- public void Gather(Object sendbuf, int sendoffset, int sendcount, Datatype sendtype,
                     Object recvbuf, int recvoffset, int recvcount, Datatype recvtype,
                     int root) throws MPIException
- public void Allgather(Object sendbuf, int sendoffset, int sendcount, Datatype sendtype,
                        Object recvbuf, int recvoffset, int recvcount, Datatype recvtype)
      throws MPIException
- public void Alltoall(Object sendbuf, int sendoffset, int sendcount, Datatype sendtype,
                       Object recvbuf, int recvoffset, int recvcount, Datatype recvtype)
      throws MPIException
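A sketch of the simplest of these, Bcast(): the root process fills a buffer, and after the call every process in the communicator holds the same contents. Buffer size and values are illustrative, and the MPJ Express runtime is required to run it.

```java
import mpi.*;

// Broadcast: only the root fills the buffer; after Bcast() all
// processes in MPI.COMM_WORLD hold identical contents.
public class BcastExample {
  public static void main(String args[]) throws Exception {
    MPI.Init(args);
    int rank = MPI.COMM_WORLD.Rank();
    int[] buf = new int[4];
    if (rank == 0) {                     // root prepares the data
      for (int i = 0; i < buf.length; i++) buf[i] = i * 10;
    }
    MPI.COMM_WORLD.Bcast(buf, 0, buf.length, MPI.INT, 0);  // root = 0
    System.out.println("Rank " + rank + " now has buf[3] = " + buf[3]);
    MPI.Finalize();
  }
}
```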
34. Reduce Collective Operations
- Predefined reduction operations applied across all processes:
  - MPI.PROD, MPI.SUM, MPI.MIN, MPI.MAX
  - MPI.LAND, MPI.BAND, MPI.LOR, MPI.BOR, MPI.LXOR, MPI.BXOR
  - MPI.MINLOC, MPI.MAXLOC
35. Reduce Collective Operations
- public void Reduce(Object sendbuf, int sendoffset,
                     Object recvbuf, int recvoffset, int count,
                     Datatype datatype, Op op, int root)
      throws MPIException
- public void Allreduce(Object sendbuf, int sendoffset,
                        Object recvbuf, int recvoffset, int count,
                        Datatype datatype, Op op)
      throws MPIException
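A sketch of Reduce() with MPI.SUM using the signature above: each process contributes its rank, and the root receives the sum of all contributions. With the six processes used throughout these slides, the result is 0+1+2+3+4+5 = 15. The MPJ Express runtime is required to run it.

```java
import mpi.*;

// Reduce with MPI.SUM: each process contributes one int (its rank);
// only the root's recv buffer holds the combined result.
public class ReduceExample {
  public static void main(String args[]) throws Exception {
    MPI.Init(args);
    int rank = MPI.COMM_WORLD.Rank();
    int[] send = { rank };               // each process contributes its rank
    int[] recv = new int[1];
    MPI.COMM_WORLD.Reduce(send, 0, recv, 0, 1, MPI.INT, MPI.SUM, 0);
    if (rank == 0) {
      System.out.println("Sum of ranks = " + recv[0]);
    }
    MPI.Finalize();
  }
}
```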
36. Collective Communication Performance
37. Summary
- MPJ Express is a Java messaging system that can be used to write parallel applications
  - MPJ/Ibis and mpiJava are other similar software
- MPJ Express provides point-to-point communication methods like Send() and Recv()
  - Blocking and non-blocking versions
- Collective communication is also supported
- Feel free to contact me if you have any queries