On the domain decomposition approach in some convection-diffusion-reaction problems

Transcript and Presenter's Notes

Title: On the domain decomposition approach in some convection-diffusion-reaction problems


1
On the domain decomposition approach in some
convection-diffusion-reaction problems
  • K. Georgiev (1), Z. Zlatev (2)
  • (1) Institute for Parallel Processing, Bulgarian Academy of Sciences, Bulgaria
  • (2) National Environmental Research Institute, Roskilde, Denmark

2
Introduction
  • Efficient numerical methods and algorithms for solving
  • convection-diffusion-reaction systems
  • are of high priority because of the
  • numerous practically important problems in which they arise

3-9
Introduction
  • Prominent in this field are simulations in
  • air pollution modelling
  • pipe networks
  • acoustics
  • turbulent kinetic energy and its dispersion
  • non-Newtonian flows (extra stresses)
  • magnetohydrodynamics
  • modelling of zeolite filters
  • etc.

10
The Mathematical Model
  • Danish Eulerian Model for long-range transport of air pollutants

11
Splitting into submodels
12
Numerical treatment
  • Finite elements (1D linear first order, bilinear,
    nonconforming)

13
Numerical treatment in 2D case
  • Finite elements (1D linear first order, bilinear, nonconforming)
  • Predictor-corrector methods with several different correctors in the advection-diffusion submodel
  • QSSA (Quasi-Steady-State Algorithm) in the chemistry-emission submodel
  • Exact solution in the deposition submodel
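
To make the QSSA step concrete, here is a minimal sketch in C of the classical quasi-steady-state update for a single species governed by dc/dt = P - L*c. The helper name qssa_step, the thresholds and the fallback choices are illustrative assumptions, not the actual UNI-DEM chemistry routine, which treats many coupled species at once.

    #include <math.h>

    /* One QSSA update for a single species with dc/dt = P - L*c.
       P: production rate, L: loss rate, dt: chemical time-step.
       Illustrative sketch only. */
    static double qssa_step(double c, double P, double L, double dt)
    {
        if (L * dt < 1.0e-10)        /* very slow loss: explicit Euler step */
            return c + dt * (P - L * c);
        if (L * dt > 10.0)           /* very fast loss: quasi-steady state  */
            return P / L;
        /* intermediate lifetimes: exact solution of the linear ODE         */
        return P / L + (c - P / L) * exp(-L * dt);
    }

The three branches correspond to slowly reacting species (explicit step), very short-lived species (set directly to their steady state P/L), and species of intermediate lifetime (exponential formula).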

14
Size of the computational task
15-16
Parallelization strategy
  • Distributed memory parallelization model via the Message Passing Interface (MPI) - maximum portability of the code
  • Based on domain decomposition of the horizontal grid:
  • overlapping subdomains in the advection-diffusion submodel
  • nonoverlapping subdomains in the chemistry-deposition submodel
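
A minimal sketch, in C, of how such a row-block decomposition of the horizontal grid could be set up: each processor owns a contiguous block of grid rows, and the owned block is extended by a few overlap rows for the advection-diffusion submodel, while the chemistry-deposition submodel works on the owned rows only. The struct, the function name split_rows and the choice of splitting by rows are illustrative assumptions, not the actual UNI-DEM bookkeeping.

    /* Partition Ny grid rows among nproc processors; p is this processor's index. */
    typedef struct {
        int first, last;          /* owned rows (nonoverlapping subdomain)  */
        int ext_first, ext_last;  /* extended rows (overlapping subdomain)  */
    } SubDomain;

    static SubDomain split_rows(int Ny, int nproc, int p, int overlap)
    {
        SubDomain s;
        int base = Ny / nproc, rem = Ny % nproc;   /* distribute rows evenly */
        s.first = p * base + (p < rem ? p : rem);
        s.last  = s.first + base + (p < rem ? 1 : 0) - 1;
        s.ext_first = (p == 0)         ? s.first : s.first - overlap;
        s.ext_last  = (p == nproc - 1) ? s.last  : s.last  + overlap;
        return s;
    }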

17
DD approach
18-20
Need for parallel computations
  • SunFire 6800 (24 CPU UltraSparc-III/750 MHz) at DTU, Lyngby, Denmark
  • Grid 480 x 480 (10 km resolution)
  • 2D version of UNI-DEM
  • ONE processor
  • 4 017 852 sec. ≈ 1116 h ≈ 46.5 days!

21-24
Danish Eulerian Model (UNI-DEM)
  • vector computers (CRAY C92A, Fujitsu, etc.)
  • parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, etc.)
  • parallel computers with shared memory (SGI Origin, SUN, etc.)
  • parallel computers with two levels of parallelism (IBM SMP, Macintosh G4 clusters, etc.)

25-26
Some numerical experiments
  • Machines used:
  • SunFire 6800 at DTU, Lyngby, Denmark (24 CPU UltraSparc-III/750 MHz)
  • Macintosh G4 Power PC cluster at IPP, Sofia (Linux cluster, 4 nodes x 2 CPU G4/450 MHz)

27-28
Some numerical experiments
  • Computing time (in sec.) and speedup (in parentheses, relative to one processor) on the Macintosh cluster

    Grid size    1 proc.      2 proc.          4 proc.          8 proc.
    96 x 96      65 036       33 698 (1.93)    17 185 (3.78)    9 684 (6.72)
    288 x 288    1 338 960    699 424 (1.91)   366 548 (3.65)   175 066 (7.65)

  • Computing time (in sec.) and speedup on the SunFire 6800

    Grid size    1 proc.      2 proc.          4 proc.          8 proc.
    96 x 96      52 744       23 217 (2.27)    13 296 (3.97)    7 765 (6.79)
    288 x 288    709 339      327 400 (2.17)   198 030 (3.58)   100 033 (7.09)

29-31
Some numerical experiments
  • SunFire 6800 vs. Macintosh G4 cluster
  • Ratio of the top (peak) performances: SUN top performance / G4 top performance = 1.67
  • Measured ratios of the computing times (G4 time / SunFire time):

    Grid size    1 proc.    2 proc.    4 proc.    8 proc.
    96 x 96      1.23       1.45       1.29       1.25
    288 x 288    1.89       2.14       1.85       1.75

32
Acknowledgments
  • This research was supported in part by
  • Grant IO-01/03 of the Bulgarian NSF
  • a grant from the NATO Scientific Programme (CRG 960505)

33
Implementations
  • for sequential computers
  • for vector computers (CRAY C92A, Fujitsu, etc.)
  • for parallel computers with distributed memory
    (IBM SP, CRAY T3E, Beowulf clusters, etc.)
  • for parallel computers with shared memory (SGI
    Origin, SUN, etc.)
  • for parallel computers with two levels of parallelism: distributed memory between the nodes and shared memory inside the nodes, which consist of several processors (IBM SMP, clusters of multiprocessor nodes, etc.)

34
Space discretization
  • 32 x 32 (150 km resolution)
  • 96 x 96 (50 km resolution)
  • 288 x 288 (16.7 km resolution)
  • 480 x 480 (10 km resolution)

35
Need for parallel computations
  • Number of equations in the systems of ODEs treated at every time-step (typically 3456 time-steps for a one-month period)

    No. of species    32x32x10      96x96x10      288x288x10      480x480x10
    35                358 400       3 225 600     29 030 400      80 640 000
    56                573 440       5 160 960     46 448 640      129 024 000
    168               1 720 320     15 482 880    139 345 920     387 072 000

36
Need for parallel computations
  • SunFire 6800 (24 CPU UltraSparc-III/750 MHz) at DTU, Lyngby, Denmark
  • Grid 480 x 480 (10 km resolution)
  • 2D version of UNI-DEM
  • ONE processor
  • 4 017 852 sec. ≈ 1116 h ≈ 46.5 days!

37
Computing time by modules
  • Computing time (in sec.) on one processor of an IBM SMP

    Module               Comp. time    Percent
    Chemistry            16 147        83.09
    Advection            3 013         15.51
    Initialization       2             0.00
    Input operations     50            0.26
    Output operations    220           1.13
    Total time           19 432        100.00

38
Chemistry chunks
  • To reduce the computing time used in the chemical module, it is worthwhile to divide the arrays into smaller portions (chunks).
  • Data from the appropriate sections of the large arrays are copied into small arrays in which the chunks are stored. The bulk of the computational work is then performed with the data in the small arrays (which will hopefully stay in the caches).

39
Chemistry chunks
  • Let us assume that M is the leading dimension of the 2D arrays used in the chemical module
  • Divide these arrays into nchunks chunks
  • The leading dimension of the resulting smaller arrays is nsize = M/nchunks

40
Chemistry chunks
  • The largest arrays in chemical submodel
  • (a) three arrays for the concentrations,
  • (b) one array for the emissions,
  • (c) one array for the time-dependent chemical
    rate coefficients and
  • (d) three arrays for the depositions (dry, wet
    and total).

41
Chemistry chunks
  • DO ichunk = 1, nchunks
  •   Copy chunk ichunk from some of the eight large arrays into small two-dimensional arrays with leading dimension nsize
  •   DO j = 1, nspecies
  •     DO i = 1, nsize
  •       Perform the chemical reactions involving species j for grid-point i
  •     END DO
  •   END DO
  •   Copy some of the small two-dimensional arrays with leading dimension nsize into chunk ichunk of the corresponding large arrays
  • END DO
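
A minimal C sketch of the same idea, assuming a single large concentration array stored species-by-species; the array names, the grid size and the placeholder "reaction" are illustrative, not the actual UNI-DEM data structures or chemistry.

    #include <string.h>

    #define NPOINTS  9216                 /* 96 x 96 horizontal grid-points       */
    #define NSPECIES 35                   /* number of chemical species (example) */
    #define NSIZE    48                   /* chunk length, NPOINTS / NCHUNKS      */
    #define NCHUNKS  (NPOINTS / NSIZE)

    static double conc[NSPECIES][NPOINTS];   /* one large array                   */
    static double work[NSPECIES][NSIZE];     /* chunk-sized working array         */

    void chemistry_in_chunks(double dt)
    {
        for (int ichunk = 0; ichunk < NCHUNKS; ichunk++) {
            int offset = ichunk * NSIZE;

            /* copy chunk ichunk of the large array into the small working array */
            for (int j = 0; j < NSPECIES; j++)
                memcpy(work[j], &conc[j][offset], NSIZE * sizeof(double));

            /* the bulk of the work touches only the small, cache-resident array */
            for (int j = 0; j < NSPECIES; j++)
                for (int i = 0; i < NSIZE; i++)
                    work[j][i] *= 1.0 - 0.1 * dt;    /* placeholder "reaction"   */

            /* copy the updated chunk back into the large array                  */
            for (int j = 0; j < NSPECIES; j++)
                memcpy(&conc[j][offset], work[j], NSIZE * sizeof(double));
        }
    }

With this layout the working set per chunk is NSPECIES x NSIZE x 8 bytes (about 13 KB for the values above), small enough to stay in cache, which is the effect the timings on the next slide illustrate.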

42
Chemistry chunks
  • Computing time (in sec.), 2D UNI-DEM, (96 x 96) grid, one processor; a chunk size of 9216 = 96 x 96 means the whole array is treated as one chunk

    Chunk size    Fujitsu    Origin 2000    Mac G4    IBM SMP
    1             76 964     14 847         6 952     10 313
    48            2 611      12 114         5 792     5 225
    9216          494        18 549         12 893    19 432

43
Chemistry chunks
  • Conclusions
  • The optimal length of the chunks depends on the
    memory hierarchy of the computer.
  • The length of the chunks, nsize, should be a
    parameter which can be selected in the main
    program.

44
Chemistry chunks
  • Conclusions (cont)
  • The use of long chunks leads to many cache misses
    and, therefore, to a very significant increase of
    the computing time.
  • The use of very short chunks greatly increases the number of copies that have to be made and, thus, the computing time.
  • The use of medium-sized chunks within a rather large range normally gives very good results.

45
UNI-DEM on shared memory computers
  • OpenMP directives are used in order
  • (i) to get good results on different shared-memory computers, and
  • (ii) to achieve a high degree of portability.

46
UNI-DEM on shared memory computers
  • It is important to identify the parallel tasks and to group them in an appropriate way when necessary
  • The horizontal advection and diffusion
  • The horizontal advection and diffusion can be carried out independently for every chemical compound (and, in the 3-D version, for every layer)

47
UNI-DEM on shared memory computers
  • The chemistry and deposition
  • These two processes can be carried out in
    parallel for every grid-point.
  • The number of parallel tasks is equal to the number of grid-points, but each task is small.
  • Therefore, the tasks should be grouped in an appropriate way, as in the sketch below.
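
A minimal OpenMP sketch of this grouping, assuming the chemistry at each grid-point can be advanced independently; the grid size, the chunk size of 48 and the placeholder per-point routine are illustrative assumptions, not the UNI-DEM code.

    #include <omp.h>

    #define NPOINTS (96 * 96)   /* horizontal grid-points (96 x 96 example grid) */
    #define CHUNK   48          /* grouping factor for the small per-point tasks */

    static double conc[NPOINTS];               /* dummy concentration field      */

    /* stand-in for the per-point chemistry/deposition work (assumption) */
    static void chemistry_at_point(int i, double dt)
    {
        conc[i] *= 1.0 - 0.1 * dt;             /* trivial placeholder reaction   */
    }

    void chemistry_step(double dt)
    {
        /* each OpenMP task processes CHUNK consecutive grid-points, so the many
           tiny per-point tasks are grouped into fewer, larger ones              */
        #pragma omp parallel for schedule(static, CHUNK)
        for (int i = 0; i < NPOINTS; i++)
            chemistry_at_point(i, dt);
    }

The vertical exchange discussed on the next slide can be grouped along the same lines, with one task per bundle of vertical grid-lines.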

48
UNI-DEM on shared memory computers
  • The vertical exchange
  • The vertical exchange along each vertical grid-line is a parallel task (the grid has Nx x Ny x Nz points, i.e. Nx x Ny vertical grid-lines).
  • If the grid is fine, the number of these tasks becomes enormous.
  • However, the parallel tasks are not very big and have to be grouped.

49
UNI-DEM on distributed memory computers
  • Message Passing Interface (MPI)
  • The space domain of the model is divided into
    several sub-domains (the number of these
    sub-domains being equal to the number of the
    processors assigned to the job).
  • Each processor works on its own sub-domain.
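
A minimal MPI sketch of this division, assuming a row-block decomposition of the horizontal grid with one overlap row exchanged between neighbouring processors; the grid size, the array layout and the single exchanged field are illustrative assumptions, not the actual UNI-DEM communication code.

    #include <mpi.h>
    #include <stdlib.h>

    #define NX 96                         /* columns of the horizontal grid (example) */

    int main(int argc, char **argv)
    {
        int rank, nproc, Ny = 96;         /* rows of the horizontal grid (example)    */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);

        /* each processor owns a block of rows plus one halo row on each side */
        int rows = Ny / nproc + (rank < Ny % nproc ? 1 : 0);
        double *field = calloc((size_t)(rows + 2) * NX, sizeof(double));

        int up   = (rank > 0)         ? rank - 1 : MPI_PROC_NULL;
        int down = (rank < nproc - 1) ? rank + 1 : MPI_PROC_NULL;

        /* exchange the overlap rows needed by the advection-diffusion submodel;
           the chemistry-deposition submodel works on the owned rows only and
           needs no communication at all */
        MPI_Sendrecv(&field[1 * NX],          NX, MPI_DOUBLE, up,   0,
                     &field[(rows + 1) * NX], NX, MPI_DOUBLE, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&field[rows * NX],       NX, MPI_DOUBLE, down, 1,
                     &field[0],               NX, MPI_DOUBLE, up,   1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        free(field);
        MPI_Finalize();
        return 0;
    }

Rows 1..rows are owned by the processor; row 0 and row rows+1 hold copies of the neighbours' boundary rows after the two MPI_Sendrecv calls.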

50
DD of the domain
51
UNI-DEM on distributed memory computers
  • The pre-processing procedure
  • The input data (the meteorological data and the
    emission data) are distributed (consistently with
    the sub-domains) to the assigned processors.
  • In this way each processor not only works on its own sub-domain, but also has direct access to all the meteorological and emission data needed during the run.

52
UNI-DEM on distributed memory computers
  • The post-processing procedure
  • During the run, each processor prepares output data files for its own sub-domain. At the end of the job all these files have to be collected on one of the processors and prepared for future use.
  • The pre-processing and post-processing procedures are used in order to reduce as much as possible the communication during the actual computations.
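
A minimal sketch of the collection step, in which the per-subdomain results are gathered on processor 0 with MPI_Gatherv; the function name and the assumption that each subdomain contributes a contiguous block of values are illustrative, not the actual UNI-DEM post-processing.

    #include <mpi.h>
    #include <stdlib.h>

    /* Gather each processor's output field (my_count values) on processor 0. */
    void collect_on_root(double *my_data, int my_count, MPI_Comm comm)
    {
        int rank, nproc;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &nproc);

        int *counts = NULL, *displs = NULL;
        double *all = NULL;

        if (rank == 0) {
            counts = malloc(nproc * sizeof(int));
            displs = malloc(nproc * sizeof(int));
        }
        /* processor 0 learns how much data every subdomain contributes */
        MPI_Gather(&my_count, 1, MPI_INT, counts, 1, MPI_INT, 0, comm);

        if (rank == 0) {
            int total = 0;
            for (int p = 0; p < nproc; p++) { displs[p] = total; total += counts[p]; }
            all = malloc(total * sizeof(double));
        }
        /* collect the subdomain fields into one global array on processor 0 */
        MPI_Gatherv(my_data, my_count, MPI_DOUBLE,
                    all, counts, displs, MPI_DOUBLE, 0, comm);

        /* ... processor 0 would now write the combined output file ... */
        free(all); free(counts); free(displs);
    }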

53
Some performance results
  • SGI Origin 2000, (96 x 96 x 10) grid

    Proc.    Time (sec.)    Speed-up    Efficiency (%)
    1        42 907         --          --
    32       2 215          19.37       61

54
Some performance results
  • IBM SP, (480 x 480) grid (speed-up and efficiency relative to the 8-processor run)

    Proc.    Time (sec.)    Speed-up    Efficiency (%)
    8        54 978         --          --
    32       15 998         3.44        86

55
Some performance results
  • IBM SMP, (96 x 96) grid

    Proc.    Time (sec.)    Speed-up    Efficiency (%)
    1        5 978          --          --
    16       424            12.32       72