Title: Parallelisation of Grid-oriented Problems
1. Parallelisation of Grid-oriented Problems
- Algorithms for Parallel Computers (Algoritmen voor parallelle computers)
- 8/11/2000
2. Grid-oriented problems
- PDEs, image processing, ...: data set defined on a grid
- local computations with small stencils → data dependencies between neighbouring grid points
- "grid point" is a generic name for the data associated with a grid point, pixel, cell, finite element, ...
- the grid, the data set and the associated work are partitioned in subdomains; the subdomains are assigned (mapped) to processors
3. Grid-oriented problems (cont.)
- extra tasks (compared with sequential code):
- partitioning and mapping, to ensure work load balance and communication minimisation
- communication between neighbouring subdomains
4. Model problems
- PDEs
- explicit time integration (forward Euler)
- relaxation methods (Jacobi, Gauss-Seidel, SOR, ...)
- on a structured (regular) 2D grid
- image processing
- convolution
- on a 2D pixel matrix
- same data-dependency pattern
- → same parallelisation strategy
5. Explicit time integration and convolution
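As an illustration of why the two model problems parallelise the same way (this sketch is not part of the original slides), here one forward-Euler step and one convolution both read only the 5-point neighbourhood of each grid point; the weights, step size and list-of-lists grid representation are illustrative:

```python
def euler_step(u, alpha=0.1):
    """One explicit time step: u_new = u + alpha * (5-point Laplacian)."""
    ny, nx = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            lap = u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1] - 4 * u[i][j]
            new[i][j] = u[i][j] + alpha * lap
    return new

def convolve5(img, w_c=0.5, w_n=0.125):
    """Convolution restricted to the same 5-point stencil."""
    ny, nx = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            out[i][j] = (w_c * img[i][j]
                         + w_n * (img[i-1][j] + img[i+1][j]
                                  + img[i][j-1] + img[i][j+1]))
    return out
```

Both loops have identical data dependencies, so the same partitioning and overlap-exchange scheme serves both.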
6. Computational molecules
- 5-point stencil and 9-point stencil
7. Computational molecules (cont.)
- two different 9-point stencils
8. Subdomains and overlap regions
- Note: the overlap region can have a width > 1
9. Skeleton of a typical program
- in every subdomain (processor):
- exchange data in the overlap region
- communication with procs. holding neighbouring subdomains
- do calculations for all grid points in the subdomain
- check the stopping criterion (e.g. convergence check)
- global communication (reduction)
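The skeleton can be sketched serially (an illustration, not MPI code from the slides): each "processor" owns a strip of a 1D grid plus one ghost cell per side, and the three steps above appear in order. The Jacobi update and tolerance are assumed for the example:

```python
def run(strips, steps=50, tol=1e-6):
    """strips[k] = [left ghost, interior..., right ghost] for 'processor' k."""
    for _ in range(steps):
        # 1) exchange data in the overlap region (width-1 ghost cells)
        for k in range(len(strips) - 1):
            strips[k][-1] = strips[k + 1][1]   # right ghost <- neighbour's edge
            strips[k + 1][0] = strips[k][-2]   # left ghost  <- neighbour's edge
        # 2) do calculations for all interior grid points (Jacobi averaging)
        change = 0.0
        for s in strips:
            old = s[:]
            for i in range(1, len(s) - 1):
                s[i] = 0.5 * (old[i - 1] + old[i + 1])
                change = max(change, abs(s[i] - old[i]))
        # 3) stopping criterion via a global reduction (here: max over strips)
        if change < tol:
            break
    return strips
```

In a real distributed-memory code step 1 becomes message passing and step 3 a global reduction; the structure per iteration is the same.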
10. Exchange of overlap regions
11. Analysis of communication overhead
- assume p processors, n = nx × ny points per subdomain
- only communication overhead: no sequential part, no load imbalance
- T(n,p) = parallel execution time; T(n,1) = execution time on 1 proc.
- Tcalc = calculation time; Tcomm = communication time
- T(n,p) = Tcalc + Tcomm
- Speedup: S(n,p) = T(n,1) / T(n,p)
- Efficiency: E(n,p) = S(n,p) / p
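Under these assumptions T(n,1) = p · Tcalc, so speedup and efficiency reduce to simple formulas; a sketch (function names and sample numbers are mine, not the slides'):

```python
# Speedup and efficiency assuming no sequential part and no load
# imbalance: T(n,1) = p * Tcalc and T(n,p) = Tcalc + Tcomm.

def speedup(p, t_calc, t_comm):
    return p * t_calc / (t_calc + t_comm)

def efficiency(p, t_calc, t_comm):
    return speedup(p, t_calc, t_comm) / p  # = 1 / (1 + Tcomm/Tcalc)
```

For example, with Tcomm a quarter of Tcalc the efficiency is 0.8 regardless of p.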
12. Analysis of communication overhead (cont.)
- Communication overhead: fc = Tcomm / Tcalc
- relative to the calculation cost!
- For the model problems: E(n,p) = 1 / (1 + fc)
- tcalc = time to perform a floating point operation
- tcomm = average time to communicate one floating point number
- Note: in case of 1 message of length m, tcomm = (ts + m·tw) / m !!
13. Analysis of communication overhead (cont.)
- Communication overhead fc depends on
- the size of the subdomain: large subdomains have a small perimeter-to-surface ratio
- the machine characteristic tcomm/tcalc: indicates how fast communication can be performed compared with floating point operations
- the algorithm, via the ratio cc/cf: fc is small when there are many flops per grid point (cf) compared with the amount of data associated with a grid point (cc)
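Putting the three factors together gives a simple overhead model (my reading of the slide, assuming a square √n × √n subdomain that exchanges its four edges):

```python
import math

# Assumed model: fc ≈ (tcomm/tcalc) * (cc/cf) * (4/sqrt(n)).
# Sample values below are illustrative, e.g. a 5-point Jacobi sweep
# with cf = 5 flops and cc = 1 number per grid point.

def fc(n, t_comm, t_calc, c_c, c_f):
    return (t_comm / t_calc) * (c_c / c_f) * (4.0 / math.sqrt(n))
```

With n = 10000 points per subdomain, tcomm/tcalc = 10 and cc/cf = 1/5, this gives fc = 0.08, i.e. roughly 7% efficiency loss.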
14. Partitioning strategies
- 2D grid, M grid points, n = M/p grid points per proc.
- blockwise partitioning vs. stripwise partitioning
- n = nx × ny; square blockwise partitioning if nx = ny = √n
15. Partitioning strategies (cont.)
- communication volume ∝ perimeter of the subdomain
- square subdomains (nx = ny): minimal perimeter
- → blockwise partitioning is to be preferred
- BUT:
- stripwise partitioning
- higher communication volume
- fewer neighbours → fewer messages
- the choice depends on problem and machine characteristics
- stripwise partitioning may also be better when communication is mainly in one direction (anisotropic communication)
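A back-of-the-envelope comparison of the two strategies (a sketch, assuming idealised interior subdomains and overlap width 1; counts are per processor):

```python
import math

# Per-processor communication volume and message count for an
# M-point square grid partitioned over p processors.

def blockwise(M, p):
    side = math.sqrt(M / p)              # subdomain is side x side
    return {"volume": 4 * side, "messages": 4}

def stripwise(M, p):
    width = math.sqrt(M)                 # strip spans the full grid width
    return {"volume": 2 * width, "messages": 2}
```

For M = 10000 and p = 16 the block exchanges 100 values in 4 messages while the strip exchanges 200 values in 2 messages: exactly the volume-versus-message-count trade-off described above.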
16. Comm. overhead: dependence on problem size
- 2D grid, M grid points, n = M/p grid points per proc.
- blockwise partitioning: √n × √n points per proc.
- fc ∝ (tcomm/tcalc)(cc/cf)(4/√n)
- fc (and speedup, efficiency) is constant when n (problem size per proc.) is constant and p grows
- fc ↑ (speedup, efficiency ↓) when the total problem size is constant and p grows
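The two scaling regimes can be checked numerically with the proportionality fc ∝ 4/√n (constants dropped; the sample sizes are illustrative):

```python
import math

# Blockwise partitioning: fc depends only on the per-processor size n.
def fc_block(n):
    return 4.0 / math.sqrt(n)

# Weak scaling: n fixed per processor -> fc constant as p grows.
weak = [fc_block(4096) for p in (4, 16, 64)]

# Strong scaling: total size M fixed -> n = M/p shrinks, fc grows.
strong = [fc_block(65536 / p) for p in (4, 16, 64)]
```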
17. Comm. overhead: dependence on problem size (cont.)
- 2D grid, M grid points, n = M/p grid points per proc.
- stripwise partitioning: √M × (√M/p) points per proc.
- fc ∝ (tcomm/tcalc)(cc/cf)(2√M/n) = (tcomm/tcalc)(cc/cf)(2p/√M)
- fc ↑ (speedup, efficiency ↓) when n is constant and p grows
- fc ↑↑ (speedup, efficiency ↓↓) when the total problem size is constant and p grows
18. Comm. overhead: dependence on problem size (cont.)
- 3D problems: communication volume
- ∝ surface-to-volume ratio of the subdomains
- blockwise partitioning: n^(1/3) × n^(1/3) × n^(1/3) points per proc., so fc ∝ 1/n^(1/3)
- fc decreases more slowly as a function of n than in the 2D case
- d-dimensional problems: fc ∝ 1/n^(1/d)
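The dimension dependence is easy to check numerically (proportionality constants dropped; the sample sizes are mine):

```python
# fc ∝ 1 / n**(1/d) for blockwise partitioning in d dimensions.
def fc_dim(n, d):
    return 1.0 / n ** (1.0 / d)

# Growing n by 100x cuts fc by 10x in 2D but only ~4.6x in 3D.
ratio_2d = fc_dim(10**6, 2) / fc_dim(10**4, 2)
ratio_3d = fc_dim(10**6, 3) / fc_dim(10**4, 3)
```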
19. Comm. overhead: dependence on the computational molecule
- computational molecules of increasing size
- when the molecule covers the whole domain (i.e. the new value of a grid point depends on all other grid points), every processor needs data from all other processors!!
- (figure: computational molecules of increasing size, from a small stencil around a grid point up to one covering the whole domain)
20. Analysis of load imbalance
- Let Ti = calculation time for processor i, i = 1, ..., p
- Tavg = (1/p) Σi Ti : average calculation time
- Tmax = maxi Ti : maximal calculation time (over all procs.)
- Assume:
- the number of operations (counted sequentially) is independent of p
- communication time and sequential fraction can be neglected
- Execution time of the parallel program is determined by Tmax
- Efficiency: E = T(n,1) / (p · T(n,p)) = (p · Tavg) / (p · Tmax) = Tavg/Tmax
- Load balance factor: LB = Tavg/Tmax, so E = LB
21. Analysis of load imbalance (cont.)
- The load balance factor does not depend on the time per operation: scaling every Ti by the same factor leaves Tavg/Tmax unchanged!
22. Analysis of load imbalance (cont.)
- Assume in addition:
- the amount of work is equal for each grid point
- procs. are (implicitly) synchronised by the communication at the end of each iteration
- Let Nmax = maximum number of grid points per subdomain
- Naverage = M/p = average number of grid points per subdomain
- then E = LB = Naverage/Nmax
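With equal work per grid point, the load balance factor is just a ratio of grid-point counts; a minimal sketch (the sample distributions are made up):

```python
# Load balance factor LB = N_average / N_max; under the assumptions
# above it equals the parallel efficiency.
def load_balance(points_per_proc):
    n_avg = sum(points_per_proc) / len(points_per_proc)
    return n_avg / max(points_per_proc)
```

A perfectly even distribution gives LB = 1; one processor with 20% extra points caps the efficiency at Naverage/Nmax.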
23. Load imbalance and partitioning
- If the computational cost is NOT equal for each grid point
- different physics in different regions
- grid points corresponding to boundary conditions
- then the optimal partitioning w.r.t. work load balance is difficult to compute
- If the work load imbalance is due only to the boundary conditions,
- then blockwise partitioning ensures that the boundary conditions are well distributed over the processors
- → good load balance
- blockwise partitioning vs. stripwise partitioning
24. Load imbalance and partitioning (cont.)
- If, in a rectangular grid, the number of grid lines is not a multiple of p, then typically the grid is partitioned in (unequal) rectangles
- not optimal w.r.t. work load balance, but easy
- Also in this case, blockwise partitioning leads to minimal work load imbalance
- blockwise partitioning vs. stripwise partitioning