Title: Domain Decomposed Parallel Heat Distribution Problem in Two Dimensions
1. Domain Decomposed Parallel Heat Distribution Problem in Two Dimensions
- Yana Kortsarts
- Jeff Rufinus
- Widener University
- Computer Science Department
2. Introduction
- In 2004 the Office of Science in the Department of Energy issued a twenty-year strategic plan with seven highest priorities, ranging from fusion energy to genomics.
- To achieve the necessary levels of algorithmic and computational capability, it is essential to educate students in computation and computational techniques.
- Parallel computing is one of the most attractive topics in the computational science field.
3. Introductory Parallel Computing Course
- Computer Science Department, Widener University; CS and CIS majors
- A series of two courses: Introduction to Parallel Computing I and II
- Resources: a computer cluster of six nodes; each node has two 2.4 GHz processors and 1 GB of memory, and the nodes are connected by a Gigabit Ethernet switch.
4. Course Curriculum
- Matrix manipulation
- Numerical simulation concepts
- Direct applications in science and engineering
- Introduction to MPI libraries and their applications
- Concepts of parallelism
- Finite difference method for the 2-D heat equation using a parallel algorithm
5. 2-D Heat Distribution Problem
- The problem: determine the temperature u(x, y, t) in an isotropic two-dimensional rectangular plate.
- The model is the two-dimensional heat equation

$$\frac{\partial u}{\partial t} = \alpha \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \right)$$

where $\alpha$ is the thermal diffusivity of the plate.
6. Finite Difference Method
- The finite difference method begins with the discretization of space and time, such that there is an integer number of grid points in space and an integer number of times at which we calculate the temperature.
[Figure: the space-time discretization grid, showing a grid point $(x_i, y_j)$, time levels $t_k$ and $t_{k+1}$, and the spacings $\Delta t$, $\Delta x$, and $\Delta y$]
7. Notation
We will use the following notation for the temperature at grid point $(x_i, y_j)$ and time $t_k$:

$$u_{i,j,k} \approx u(x_i, y_j, t_k)$$

We will use the finite difference approximations for the derivatives:

$$\frac{\partial u}{\partial t} \approx \frac{u_{i,j,k+1} - u_{i,j,k}}{\Delta t}, \qquad
\frac{\partial^2 u}{\partial x^2} \approx \frac{u_{i+1,j,k} - 2u_{i,j,k} + u_{i-1,j,k}}{(\Delta x)^2}, \qquad
\frac{\partial^2 u}{\partial y^2} \approx \frac{u_{i,j+1,k} - 2u_{i,j,k} + u_{i,j-1,k}}{(\Delta y)^2}$$

Substituting these into the heat equation and expressing $u_{i,j,k+1}$ yields formula (1):

$$u_{i,j,k+1} = u_{i,j,k}
+ \frac{\alpha \, \Delta t}{(\Delta x)^2}\left(u_{i+1,j,k} - 2u_{i,j,k} + u_{i-1,j,k}\right)
+ \frac{\alpha \, \Delta t}{(\Delta y)^2}\left(u_{i,j+1,k} - 2u_{i,j,k} + u_{i,j-1,k}\right) \qquad (1)$$
8. Finite Difference Method: Explicit Scheme
[Figure: the five-point stencil of the explicit scheme; the new value $u_{i,j,k+1}$ at time level $k+1$ is computed from $u_{i,j,k}$, $u_{i-1,j,k}$, $u_{i+1,j,k}$, $u_{i,j-1,k}$, and $u_{i,j+1,k}$ at time level $k$, advancing $k \to k+1$]
9. Single Processor Implementation
- double u_old[n+1][n+1], u_new[n+1][n+1];
- Initialize u_old with initial values and boundary conditions
- while (still time points to compute)
-     for (i = 1; i < n; i++)
-         for (j = 1; j < n; j++)
-             compute u_new[i][j] using formula (1)
-         // end of for
-     // end of for
-     u_old <- u_new (copy the new grid into the old one)
- // end of while
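Below is a minimal runnable C version of this pseudocode. The grid size, number of time steps, diffusivity, and the boundary and initial temperatures are illustrative assumptions, not values from the slides; the plate is taken to be the unit square.

    #include <stdio.h>
    #include <string.h>

    #define N 100        /* grid has (N+1) x (N+1) points; illustrative value */
    #define STEPS 1000   /* number of time steps; illustrative value */

    int main(void) {
        double alpha = 1.0;                  /* thermal diffusivity (assumed) */
        double dx = 1.0 / N, dy = 1.0 / N;   /* unit-square plate (assumed) */
        /* The explicit scheme is stable only when
           alpha*dt*(1/dx^2 + 1/dy^2) <= 1/2; this dt meets the bound. */
        double dt = 0.25 * dx * dx / alpha;
        double rx = alpha * dt / (dx * dx);
        double ry = alpha * dt / (dy * dy);

        static double u_old[N + 1][N + 1], u_new[N + 1][N + 1];

        /* Initial values: interior at 0; boundary condition: edges held
           at 100 (both assumed for illustration). */
        for (int i = 0; i <= N; i++)
            for (int j = 0; j <= N; j++)
                u_old[i][j] = (i == 0 || i == N || j == 0 || j == N) ? 100.0 : 0.0;
        memcpy(u_new, u_old, sizeof u_new);   /* keeps boundary values in u_new */

        for (int k = 0; k < STEPS; k++) {     /* while still time points to compute */
            for (int i = 1; i < N; i++)
                for (int j = 1; j < N; j++)
                    /* formula (1): explicit five-point update */
                    u_new[i][j] = u_old[i][j]
                        + rx * (u_old[i+1][j] - 2.0 * u_old[i][j] + u_old[i-1][j])
                        + ry * (u_old[i][j+1] - 2.0 * u_old[i][j] + u_old[i][j-1]);
            memcpy(u_old, u_new, sizeof u_old);   /* u_old <- u_new */
        }

        printf("temperature at plate center: %f\n", u_old[N / 2][N / 2]);
        return 0;
    }

In production code the copy at the end of each step would usually be replaced by swapping two pointers, but the memcpy matches the pseudocode above directly.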
10. Parallel Implementation: Domain Decomposition
- Dividing computation and data into pieces
- The domain can be decomposed in three ways:
- Column-wise: adjacent groups of columns (A)
- Row-wise: adjacent groups of rows (B)
- Block-wise: adjacent groups of two-dimensional blocks (C)
11. Domain Decomposition and Partition
- Example: column-wise domain decomposition method, 200 points to be calculated simultaneously, 4 processors
- Neighboring processors exchange boundary columns using MPI_Send and MPI_Recv (sketched below)
- Processor 1: x0..x49
- Processor 2: x50..x99
- Processor 3: x100..x149
- Processor 4: x150..x199
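Below is a minimal sketch of this boundary-column exchange. It assumes each process stores its block of columns with one ghost column on each side, and that each column is contiguous in memory (one column per row of the local array); the function name and tags are illustrative. The slides name MPI_Send and MPI_Recv; this sketch uses MPI_Sendrecv, which combines the two and avoids the deadlock that can occur when every process issues a blocking send first.

    #include <mpi.h>

    #define N 200   /* grid points per column; illustrative value */

    /* Exchange ghost columns with the left and right neighbors.
       local[0] and local[ncols+1] are ghost columns; local[1..ncols]
       are the columns owned by this process. */
    void exchange_ghost_columns(double local[][N], int ncols,
                                int rank, int nprocs, MPI_Comm comm)
    {
        int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;
        MPI_Status status;

        /* Send the rightmost owned column right; receive the left
           ghost column from the left neighbor. */
        MPI_Sendrecv(local[ncols], N, MPI_DOUBLE, right, 0,
                     local[0],     N, MPI_DOUBLE, left,  0, comm, &status);

        /* Send the leftmost owned column left; receive the right
           ghost column from the right neighbor. */
        MPI_Sendrecv(local[1],         N, MPI_DOUBLE, left,  1,
                     local[ncols + 1], N, MPI_DOUBLE, right, 1, comm, &status);
    }

Edge processes pass MPI_PROC_NULL as the missing neighbor, which turns the corresponding send and receive into no-ops.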
12. Load Imbalance
- When dividing the data among processes we have to pay attention to the amount of work assigned to each processor
- An uneven load distribution may cause some processes to finish earlier than others
- Load imbalance is one source of overhead
- Good task mapping is needed
- All tasks should be mapped onto processes as evenly as possible, so that all tasks complete in the shortest amount of time and idle time is minimized (an even block partition is sketched below)
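One standard way to achieve such an even mapping is a block distribution in which block sizes differ by at most one. The helper functions below follow the standard block-distribution formulas and are an illustrative sketch, not code from the slides:

    /* Block distribution of n columns among p processes; the sizes of
       any two blocks differ by at most one column. */
    int block_low(int rank, int p, int n)  { return rank * n / p; }
    int block_high(int rank, int p, int n) { return block_low(rank + 1, p, n) - 1; }
    int block_size(int rank, int p, int n) { return block_high(rank, p, n) - block_low(rank, p, n) + 1; }

For the 200-point, 4-processor example of the earlier slide these formulas yield exactly the ranges x0..x49, x50..x99, x100..x149, and x150..x199.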
13. Communication
- Communication time depends on the latency and the speed of the communication network; both are much slower than the CPUs
- There is a cost to using too many communications: each message adds overhead (a standard cost model is sketched below)
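A standard way to quantify this, found throughout the parallel computing literature (the symbols below are not from the slides), is to charge each message a fixed startup latency plus a per-word transfer time:

$$t_{\mathrm{comm}} = t_s + t_w \, m$$

where $t_s$ is the message startup (latency) time, $t_w$ is the transfer time per data word, and $m$ is the message length in words. Since $t_s$ is typically orders of magnitude larger than $t_w$, sending many small messages costs far more than sending the same data in one large message.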
14. Running Time and Speedup
- The running time of one time iteration of the sequential algorithm is $\Theta(MN)$, where M and N are the numbers of grid points in each direction
- The running time of one time iteration of the parallel algorithm is computation time plus communication time:

$$T_p = \Theta(MN/p) + B$$

- where p is the number of processors and B is the total send-receive communication time required for one time iteration
- The speedup is defined as the ratio of the sequential running time to the parallel running time:

$$S(p) = \frac{T_{\mathrm{sequential}}}{T_{\mathrm{parallel}}}$$
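As a worked example using the measurements reported later in this deck: for 500,000 inputs on p = 10 processors the speedup is 4.13, so the parallel efficiency (the standard ratio of speedup to processor count, not defined explicitly in the slides) is

$$E(p) = \frac{S(p)}{p} = \frac{4.13}{10} \approx 0.41,$$

meaning each processor does useful computation roughly 41% of the time, with the rest lost mainly to the communication term B.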
15. Results
[Figure: the temperature distribution on the two-dimensional plate at a much later time]
16. Results
- Two cases were considered:
- M x N = 1,000
- M x N = 500,000
- The next slide shows the speed-up versus the number of processors for the two inputs: 500,000 (top chart) and 1,000 (bottom chart).
- The dashed line indicates a speed-up equal to one, i.e., the performance of the sequential version of the algorithm.
- A higher speed-up at a given number of processors means better performance of the parallel algorithm.
- Most of the results come from the column-wise domain decomposition method.
17. Results
[Charts: speed-up versus the number of processors for total inputs of 500,000 (top) and 1,000 (bottom)]
18. Results
- For the case of 1,000 inputs, the sequential version (p = 1) is faster than the parallel version (p = 2). The parallel version is slower because of the latency and limited speed of the communication network, an overhead that does not exist in the sequential version.
- The top chart shows the speedup versus the number of processors for a total input of 500,000. In this case, as we increase the number of processors the speedup also increases, reaching 4.13 at p = 10.
- For a large number of inputs the computation time grows much faster than the communication time, so computation dominates the overhead and the parallel algorithm performs better.
19. Speedup Comparisons for Column-wise and Block-wise Decomposition (p = 4 and p = 9)

Total number of inputs | Column-wise, p = 4 | Column-wise, p = 9 | Block-wise, p = 4 | Block-wise, p = 9
1,000                  | 0.093              | 0.05               | 0.065             | 0.04
500,000                | 2.12               | 3.82               | 2.19              | 3.51
20. Results
- Overall, the speedups of the two methods are not very different.
- For 1,000 inputs, the column-wise decomposition produces better speed-ups than the block-wise decomposition.
- For 500,000 inputs, we have a mixed result: the column-wise method performs better for 9 processors, while the block-wise method performs slightly better for 4 processors.
- The results in the table do not give a conclusive answer as to which decomposition method is better; settling that would require extending the number of inputs and the number of processors beyond those used here.
21. Summary
- Numerical simulation of two-dimensional heat distribution has been used as an example for teaching parallel computing concepts in an introductory course.
- With this simple example we introduce the core concepts of parallelism:
- Domain decomposition and partitioning
- Load balancing and mapping
- Communication
- Speedup
- We show benchmarking results for the parallel version of the two-dimensional heat distribution problem with different numbers of processors.