MLD2P4: a package of parallel algebraic multilevel preconditioners

1
MLD2P4: a package of parallel algebraic
multilevel preconditioners
Bologna, March 2008

Pasqua D'Ambra, Institute for High-Performance
Computing and Networking (ICAR-CNR), Naples
Branch, Italy

joint work with Daniela di Serafino (Second
University of Naples) and Salvatore Filippone
(University of Rome Tor Vergata)
2
Overview

Motivations
Background
Objectives
MLD2P4: Multi-Level Domain Decomposition Parallel
Preconditioners Package based on PSBLAS
Algorithms and computational kernels
Software architecture
Some results and applications

3
Background

Large-scale applications have to solve linear systems Ax = b

The linear system matrix A is:
Real or complex, and square
Large and sparse
Distributed among parallel processors
Matrix dimensions and entries, conditioning,
sparsity pattern and coupling among variables
vary during a simulation

4
Background (cont'd)

What is the best method/preconditioner?
No absolute winner, experimentation is needed
Reliable preconditioners require access to the
complete matrix
Parallel implementation is not trivial

Interfacing with application software is required
Custom-made interfaces to parallel legacy codes
Different interfaces for different
preconditioners/solvers

5
Objectives

Designing and implementing a suite of
algebraic preconditioners
based on linear algebra kernels for
parallel sparse matrix computations

Flexibility:
different preconditioners through a single API
Portability and efficiency:
standard base software for serial kernels and
data communications
Simplicity of usage:
modern (OO) Fortran 95 features and auxiliary
routines for smooth legacy code integration

6
MLD2P4

Multi-Level Domain Decomposition
Parallel Preconditioners Package based on PSBLAS

7
PSBLAS (Filippone et al., http://www.ce.uniroma2.it/psblas/)
Basic Linear Algebra Operations with
Sparse Matrices on MIMD Architectures

Software layers, as listed on the slide:
Appl.: Iterative Sparse Linear Solvers (CG, BiCG, CGS,
BiCGSTAB, RGMRES, ...)
Kernels: Parallel Sparse Matrix Operations (matrix-matrix
products, matrix-vector products, ...)
Kernels: Parallel Sparse Matrix Management (allocate,
build, update, ...)
Base sw: BLACS (Basic Linear Algebra Communication
Subprograms), MPI
F95, F77
8
MLD2P4 Design: Algorithms

Algebraic multi-level Schwarz preconditioners
based on smoothed aggregation
good trade-off between parallelism and
convergence
optimal scalability for symmetric
positive-definite matrices
algebraic framework allows general-purpose
application

9
(1-lev) Schwarz: basic ingredients
Adjacency graph of A
0-overlap partition of W
d-overlap partition of W
[Figure: example with nodes numbered 1 to 9]
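
The step from a 0-overlap partition to a d-overlap partition can be sketched in a few lines of Python. This is a minimal illustration, not MLD2P4 code: the 3x3 grid graph, the two starting subsets, and the helper names `grid_adjacency` and `expand_overlap` are assumptions made for the example. Each subset is grown d times by adding every vertex adjacent to it in the graph of A.

```python
def grid_adjacency(rows, cols):
    """Adjacency sets of a rows x cols grid graph, vertices numbered from 1.

    Hypothetical stand-in for the adjacency graph of a sparse matrix A.
    """
    adj = {}
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c + 1
            nbrs = set()
            if r > 0:
                nbrs.add(v - cols)
            if r < rows - 1:
                nbrs.add(v + cols)
            if c > 0:
                nbrs.add(v - 1)
            if c < cols - 1:
                nbrs.add(v + 1)
            adj[v] = nbrs
    return adj


def expand_overlap(adj, subset, d):
    """Grow `subset` by d layers of graph neighbours."""
    current = set(subset)
    for _ in range(d):
        layer = set()
        for v in current:
            layer |= adj[v]
        current |= layer
    return current


adj = grid_adjacency(3, 3)
# Assumed 0-overlap partition of the vertex set {1, ..., 9}.
partition0 = [{1, 2, 3, 4}, {5, 6, 7, 8, 9}]
# Corresponding 1-overlap partition.
partition1 = [expand_overlap(adj, w, 1) for w in partition0]
```

With d = 0 the subsets are returned unchanged, so the same helper covers both cases named on the slide.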
10
AS: basic ingredients (cont'd)
Restriction/prolongation operators
Restriction of A
[Figure: example with nodes numbered 1 to 9]
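
As a small illustration of these ingredients, the sketch below builds a 0/1 restriction operator R for an index subset and checks that R A R^T is simply the submatrix of A on that subset. The test matrix (a 5x5 tridiagonal matrix), the subset, and all helper names are assumptions made for the example; dense lists are used only to keep it self-contained.

```python
def matmul(X, Y):
    """Dense matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def restriction(indices, n):
    """0/1 matrix R with one row per kept index: (R v)[k] = v[indices[k]]."""
    return [[1.0 if col == idx else 0.0 for col in range(n)]
            for idx in indices]


n = 5
# Assumed example matrix: tridiagonal with 2 on the diagonal, -1 next to it.
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(n)] for i in range(n)]

subset = [1, 2, 3]                  # 0-based indices of one subdomain
R = restriction(subset, n)          # restriction operator
P = transpose(R)                    # prolongation operator (zero padding)

A_sub = matmul(matmul(R, A), P)     # restriction of A to the subdomain
A_direct = [[A[i][j] for j in subset] for i in subset]

# Prolongation puts local values back in place and fills the rest with zeros.
prolonged = [row[0] for row in matmul(P, [[1.0], [2.0], [3.0]])]
```

In practice the restricted matrix is obtained by index extraction, as in `A_direct`; the explicit operator is shown only to make the definition concrete.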
11
Coarse-level correction: basic ingredients
Algebraic coarsening: uncoupled aggregation
Smoothed prolongation/restriction operators
Coarse-level matrix
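
The three ingredients can be sketched together as follows. This is a simplified illustration under stated assumptions, not the package's algorithm: the greedy aggregation rule, the damping value `omega = 2/3`, the 6x6 tridiagonal test matrix, and the split of rows between two hypothetical processes are all chosen for the example. "Uncoupled" is taken here to mean that each process aggregates only its own rows, so no aggregate spans two processes.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def uncoupled_aggregation(A, owned_rows):
    """Greedy aggregation done independently on each process's rows."""
    agg_of = {}
    n_agg = 0
    for rows in owned_rows:
        for i in rows:
            if i in agg_of:
                continue
            # Unaggregated local neighbours of i in the graph of A.
            members = [i] + [j for j in rows
                             if j != i and j not in agg_of and A[i][j] != 0.0]
            for j in members:
                agg_of[j] = n_agg
            n_agg += 1
    return agg_of, n_agg


def smoothed_prolongator(A, agg_of, n_agg, omega):
    """P = (I - omega * D^-1 * A) * P_tent, P_tent piecewise constant."""
    n = len(A)
    P_tent = [[1.0 if agg_of[i] == c else 0.0 for c in range(n_agg)]
              for i in range(n)]
    S = [[(1.0 if i == j else 0.0) - omega * A[i][j] / A[i][i]
          for j in range(n)] for i in range(n)]
    return matmul(S, P_tent)


n = 6
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(n)] for i in range(n)]
owned_rows = [[0, 1, 2], [3, 4, 5]]      # two hypothetical processes

agg_of, n_agg = uncoupled_aggregation(A, owned_rows)
P = smoothed_prolongator(A, agg_of, n_agg, omega=2.0 / 3.0)
R = transpose(P)                         # restriction taken as P^T
A_c = matmul(matmul(R, A), P)            # Galerkin coarse-level matrix
```

Because the restriction is the transpose of the prolongator, the coarse-level matrix inherits the symmetry of A.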
12
Multilevel Schwarz preconditioners:
computational kernels
Example: 2-lev hybrid-post
P. D'Ambra, D. di Serafino, S. Filippone, On the
Development of PSBLAS-based Parallel Two-level
Schwarz Preconditioners, Applied Numerical
Mathematics, 57, 2007.
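
One common reading of a two-level hybrid scheme with post-smoothing is: apply the coarse-level correction first, then apply the one-level preconditioner to the updated residual. The sketch below follows that reading and should be taken as an assumption about the slide, not as the package's implementation. A point Jacobi step stands in for the one-level Schwarz preconditioner, the aggregates are fixed by hand, and all function names are hypothetical.

```python
def apply_hybrid_post(r, matvec, restrict, coarse_solve, prolong, smooth):
    """z = w + M1^-1 (r - A w), with w the coarse-level correction of r."""
    w = prolong(coarse_solve(restrict(r)))               # coarse correction
    r_new = [ri - ti for ri, ti in zip(r, matvec(w))]    # updated residual
    return [wi + si for wi, si in zip(w, smooth(r_new))]  # post-smoothing


# Assumed example: 4x4 tridiagonal matrix, aggregates {0, 1} and {2, 3}.
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
A_c = [[2.0, -1.0], [-1.0, 2.0]]     # P^T A P for piecewise-constant P


def matvec(v):
    return [sum(A[i][j] * v[j] for j in range(4)) for i in range(4)]


def restrict(v):                     # sum the entries of each aggregate
    return [v[0] + v[1], v[2] + v[3]]


def prolong(vc):                     # copy each coarse value to its aggregate
    return [vc[0], vc[0], vc[1], vc[1]]


def coarse_solve(rc):                # direct 2x2 solve by Cramer's rule
    det = A_c[0][0] * A_c[1][1] - A_c[0][1] * A_c[1][0]
    return [(rc[0] * A_c[1][1] - A_c[0][1] * rc[1]) / det,
            (A_c[0][0] * rc[1] - rc[0] * A_c[1][0]) / det]


def smooth(v):                       # Jacobi stand-in for the 1-lev prec.
    return [v[i] / A[i][i] for i in range(4)]


z = apply_hybrid_post([1.0, 1.0, 1.0, 1.0],
                      matvec, restrict, coarse_solve, prolong, smooth)
```

For this right-hand side the exact solution of A x = r is [2, 3, 3, 2], so a single application already gives a usable approximation.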
13
MLD2P4 Design: Software Architecture
14
Performance Results and Comparisons

Different test matrices from various sources:
thm matrices: thermal diffusion in solids
kivap matrices: automotive engine design
shipsec matrices: from the UF Sparse Matrix
Collection
Experiments carried out on different Linux
clusters:
64 Intel Itanium dual-processor nodes connected
by Quadrics QSNetII Elan 4
32 AMD Opteron dual-processor nodes connected by
Myrinet
8 AMD Opteron dual-processor nodes connected by
InfiniBand
8 Intel Itanium dual-processor nodes connected by
Myrinet
16 Intel Pentium IV nodes connected by Fast
Ethernet
Comparison with up-to-date related work:
Trilinos-ML

A. Buttari, P. D'Ambra, D. di Serafino, S.
Filippone, 2LEV-D2P4: a package of
high-performance preconditioners for scientific
and engineering applications, Applicable Algebra
in Engineering, Communication and Computing, Vol.
18, 2007.
15
Experimental Setting

MLD2P4: right-preconditioned BiCGSTAB
1-lev: Restricted Additive Schwarz preconditioner
with ILU(0) (RAS)
2-lev: hybrid Schwarz preconditioner, with
RAS/ILU(0) as 1-lev prec.
Distributed coarsest matrix: 4 sweeps of block
Jacobi with ILU(0) (2LDI) or with UMFPACK (2LDU)
on diagonal blocks
3-lev: hybrid Schwarz preconditioner, with
RAS/ILU(0) as 1-lev prec.
Distributed coarsest matrix: 4 sweeps of block
Jacobi with ILU(0) (3LDI) or with UMFPACK (3LDU)
on diagonal blocks
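
The coarsest-level solver described above, a fixed number of block-Jacobi sweeps with a solve on each diagonal block, can be sketched as follows. This is a minimal serial illustration under assumptions: the blocks are solved exactly by a small dense Gaussian elimination (closest in spirit to the variant with a direct solver on the diagonal blocks, whereas an ILU(0) variant would solve them only approximately), and the 4x4 matrix, the block layout, and the helper names are chosen for the example.

```python
def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def block_jacobi(A, b, blocks, sweeps=4):
    """x <- x + B^-1 (b - A x), with B the block diagonal of A."""
    n = len(A)
    x = [0.0] * n
    for _ in range(sweeps):
        # The residual is computed once per sweep, so the blocks are
        # updated independently (Jacobi, not Gauss-Seidel).
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        for blk in blocks:
            A_bb = [[A[i][j] for j in blk] for i in blk]
            dx = solve_dense(A_bb, [r[i] for i in blk])
            for i, d in zip(blk, dx):
                x[i] += d
    return x


A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
b = [1.0, 1.0, 1.0, 1.0]
blocks = [[0, 1], [2, 3]]            # one diagonal block per process (assumed)

x1 = block_jacobi(A, b, blocks, sweeps=1)
x4 = block_jacobi(A, b, blocks, sweeps=4)
```

In a distributed setting each block solve is local to one process, and only the residual computation requires communication.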

16
thm matrices: number of iterations
thm1: n = 600000, nnz = 2996800

OV0 (overlap 0)
np    RAS   2LDI   2LDU   3LDI   3LDU
 1    613    190      -     70      -
 2    705    184      -     72      -
 4    761    206      -     74      -
 8    688    202     44     67     28
16    748    211     61     70     36
32    766    186     81     69     51
64    809    196    113     86     68

OV1 (overlap 1)
np    RAS   2LDI   2LDU   3LDI   3LDU
 1    613    190      -     70      -
 2    923    183      -     76      -
 4    684    178      -     63      -
 8    937    191     34     62     27
16    688    172     57     68     33
32    714    181     74     65     45
64    720    180    107     77     62

64 Intel Itanium dual-processor nodes connected
by QSNetII
17
thm matrices: execution times and speed-ups
(OV1; best execution times: 3LDU)
64 Intel Itanium dual-processor nodes connected
by QSNetII
18
Application test case: large eddy simulation of
incompressible turbulent flows in a bi-periodical
channel

Main computational kernel:
nonsymmetric and singular linear systems arising
from elliptic PDE with Neumann b.c.

A. Aprovitola, P. D'Ambra, F. M. Denaro, D. di
Serafino, S. Filippone, Application of Parallel
Algebraic Multilevel Domain Decomposition
Preconditioners in Large-Eddy Simulations of
Wall-bounded Turbulent Flows: First Experiments,
RT-ICAR-NA-2007-02, July 2007.
19
Experimental Setting
Reynolds number: 180
Computational grid: 140x32x45, non-uniform in the y direction
Time step: 10^-4
Pressure linear system: n = 201600, nnz = 1398600
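
The reported sizes can be checked with a short count. The sketch assumes a 7-point stencil that is periodic in the two directions with 140 and 45 points and non-periodic in the direction with 32 points; the slides do not state the discretization, so this is an assumption, but it reproduces both reported numbers exactly. The function name is hypothetical.

```python
def stencil_counts(nx, ny, nz):
    """Unknowns and nonzeros for a 7-point stencil, periodic in x and z.

    Every point couples to itself and to two x- and two z-neighbours
    (periodic wrap-around); y-neighbours exist only inside the channel.
    """
    n = nx * ny * nz
    nnz = 0
    for j in range(ny):
        y_neighbours = (1 if j > 0 else 0) + (1 if j < ny - 1 else 0)
        nnz += nx * nz * (5 + y_neighbours)
    return n, nnz


counts = stencil_counts(140, 32, 45)
```

Equivalently, nnz = 7n - 2 * 140 * 45, since only the points on the two walls miss one neighbour.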

MLD2P4: right-preconditioned RGMRES(30)
1-lev: Restricted Additive Schwarz preconditioner
with ILU(0) (RAS)
2-lev/3-lev: hybrid Schwarz preconditioner, with
RAS/ILU(0) as 1-lev prec.
Distributed coarse matrix: 4 sweeps of block
Jacobi with ILU(0) (2LDI/3LDI) on diagonal blocks
Stopping criterion or
maxit
General row-block distribution

20
LES of incompressible wall-bounded flow
SOR on 1 proc.: 9 sec.
SOR on 1 proc.: 8580 sec.
16 Intel Itanium dual-processor nodes connected
by QSNetII
21
Work in progress