MLD2P4: a package of parallel algebraic multilevel Preconditioners

1
MLD2P4: a package of parallel algebraic multilevel Preconditioners
Bologna, March 2008
  • Pasqua D'Ambra, Institute for High-Performance
    Computing and Networking (ICAR-CNR), Naples
    Branch, Italy

Joint work with Daniela di Serafino (Second
University of Naples) and Salvatore Filippone
(University of Rome Tor Vergata)
2
Overview
  • Motivations
  • Background
  • Objectives
  • MLD2P4: Multi-Level Domain Decomposition Parallel
    Preconditioners Package based on PSBLAS
  • Algorithms and computational kernels
  • Software architecture
  • Some results and applications

3
Background
  • Large-scale applications have to solve linear
    systems Ax = b
  • The linear system matrix is
    • Real or complex, and square
    • Large and sparse
    • Distributed among parallel processors
  • Matrix dimensions and entries, conditioning,
    sparsity pattern and coupling among variables
    vary over the course of a simulation

4
Background (contd)
  • What is the best method/preconditioner?
    • There is no absolute winner: experimentation is needed
  • Reliable preconditioners require access to the
    complete matrix
    • Parallel implementation is not trivial
  • Interfacing with application software is required
    • Custom-made interfaces to parallel legacy codes
    • Different interfaces for different
      preconditioners/solvers

5
Objectives
  • Design and implement a suite of algebraic
    preconditioners based on linear algebra kernels
    for parallel sparse matrix computations
  • Flexibility
    • Different preconditioners through a single API
  • Portability and efficiency
    • Standard base software for serial kernels and
      data communications
  • Simplicity of usage
    • Modern (OO) Fortran 95 features and auxiliary
      routines for smooth integration with legacy codes

6
MLD2P4
  • Multi-Level Domain Decomposition Parallel
    Preconditioners Package based on PSBLAS

7
PSBLAS (Filippone et al., http://www.ce.uniroma2.it/psblas/):
Basic Linear Algebra Operations with Sparse Matrices on MIMD Architectures
  • Application layer: iterative sparse linear solvers
    (CG, BiCG, CGS, BiCGSTAB, RGMRES, …)
  • Kernels layer
    • Parallel sparse matrix operations: matrix-matrix
      products, matrix-vector products, …
    • Parallel sparse matrix management: allocate,
      build, update, …
  • Base software layer: BLACS (Basic Linear Algebra
    Communication Subprograms), MPI, F95, F77
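To make the solver layer concrete, here is a minimal serial sketch of BiCGSTAB with right preconditioning (the variant used in the experiments later in the talk): the method iterates on A M^{-1} u = b and recovers x = M^{-1} u, so the preconditioner is only ever applied through a function call. This is plain Python on dense lists, not the PSBLAS Fortran 95 interface; the names `bicgstab_right`, `matvec` and `apply_prec`, the zero initial guess and the relative-residual stopping test are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def bicgstab_right(matvec, apply_prec, b, tol=1e-8, maxit=200):
    """Right-preconditioned BiCGSTAB (hypothetical helper): A M^{-1} u = b, x = M^{-1} u.

    matvec(v) must return A v and apply_prec(v) must return M^{-1} v.
    Stops when ||r||_2 <= tol * ||b||_2 (an assumed criterion) or after
    maxit iterations. Returns (x, iterations_used).
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                     # r0 = b - A x0 with x0 = 0
    r_hat = list(r)                 # fixed "shadow" residual
    p = [0.0] * n
    v = [0.0] * n
    rho_old = alpha = omega = 1.0
    bnorm = math.sqrt(dot(b, b)) or 1.0
    for it in range(1, maxit + 1):
        rho = dot(r_hat, r)
        if rho == 0.0:              # breakdown: return the current iterate
            return x, it - 1
        beta = (rho / rho_old) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        p_hat = apply_prec(p)       # M^{-1} p
        v = matvec(p_hat)
        alpha = rho / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) <= tol * bnorm:
            return [xi + alpha * ph for xi, ph in zip(x, p_hat)], it
        s_hat = apply_prec(s)       # M^{-1} s
        t = matvec(s_hat)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * ph + omega * sh for xi, ph, sh in zip(x, p_hat, s_hat)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) <= tol * bnorm:
            return x, it
        rho_old = rho
    return x, maxit


# Usage: 1-D Laplacian with a diagonal (Jacobi) preconditioner.
N = 10


def laplacian_1d(v):
    return [2.0 * v[i] - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < N - 1 else 0.0) for i in range(N)]


x, its = bicgstab_right(laplacian_1d, lambda v: [vi / 2.0 for vi in v], [1.0] * N)
```

Passing `matvec` and `apply_prec` as callables mirrors the layering above: the solver only needs "multiply by A" and "apply M^{-1}", so any preconditioner can be swapped in without touching the Krylov loop.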
8
MLD2P4 Design: Algorithms
  • Algebraic multilevel Schwarz preconditioners
    based on smoothed aggregation
    • Good trade-off between parallelism and
      convergence
    • Optimal scalability for symmetric
      positive-definite matrices
    • The purely algebraic framework allows
      general-purpose application

9
(1-lev) Schwarz: basic ingredients
  • Adjacency graph of A
  • 0-overlap partition of Ω
  • δ-overlap partition of Ω
[Figure: adjacency graph of A on nodes 1 to 9, with its 0-overlap and δ-overlap partitions]
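The step from a 0-overlap partition to a δ-overlap partition can be sketched as graph growth: each subdomain is extended δ times by adding all graph neighbours of its current nodes. The example below assumes, purely for illustration, that the 9-node graph in the figure is a 3x3 grid with a 5-point stencil and that the 0-overlap partition splits it by grid rows; `grid_adjacency` and `overlap_partition` are hypothetical helpers, not MLD2P4 routines.

```python
def grid_adjacency(nx, ny):
    """Adjacency graph of a 5-point stencil on an nx-by-ny grid.

    Nodes are numbered 1..nx*ny row by row; returns {node: set_of_neighbours}.
    """
    adj = {}
    for j in range(ny):
        for i in range(nx):
            k = j * nx + i + 1
            nbrs = set()
            if i > 0:
                nbrs.add(k - 1)
            if i < nx - 1:
                nbrs.add(k + 1)
            if j > 0:
                nbrs.add(k - nx)
            if j < ny - 1:
                nbrs.add(k + nx)
            adj[k] = nbrs
    return adj


def overlap_partition(adj, parts0, delta):
    """Extend each subdomain of a 0-overlap partition by delta layers of neighbours."""
    parts = []
    for part in parts0:
        current = set(part)
        for _ in range(delta):
            frontier = set()
            for node in current:
                frontier |= adj[node]
            current |= frontier
        parts.append(current)
    return parts


adj = grid_adjacency(3, 3)
parts0 = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]   # hypothetical 0-overlap partition (grid rows)
parts1 = overlap_partition(adj, parts0, 1)
print(parts1)  # [{1, 2, 3, 4, 5, 6}, {1, 2, 3, 4, 5, 6, 7, 8, 9}, {4, 5, 6, 7, 8, 9}]
```

With δ = 1 the middle subdomain already covers the whole graph, which shows how quickly overlap grows on small examples and why δ is usually kept at 0 or 1.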
10
AS (Additive Schwarz) basic ingredients (contd)
  • Restriction/prolongation operators
  • Restriction of A
[Figure: restriction/prolongation operators and restriction of A on the 9-node example graph]
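In the standard Schwarz notation (recalled from the literature, not transcribed from the lost slide formulas), $R_i^\delta$ selects the unknowns of the δ-overlap subdomain $\Omega_i^\delta$, the restriction of A is $A_i^\delta = R_i^\delta A (R_i^\delta)^T$, and the one-level AS preconditioner is $M_{AS}^{-1} = \sum_i (R_i^\delta)^T (A_i^\delta)^{-1} R_i^\delta$. Restricted AS (RAS) replaces the prolongation $(R_i^\delta)^T$ by $(\tilde R_i^0)^T$, which writes back only the values on the non-overlapping part $\Omega_i^0$. The sketch below applies both on a dense matrix; `schwarz_apply` is a hypothetical name, and the exact Gaussian-elimination local solve stands in for the ILU(0) local solver used in the talk.

```python
def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting (stand-in for a local ILU(0) solve)."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[piv] = a[piv], a[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= f * a[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def schwarz_apply(A, r, parts_delta, parts_zero, restricted=True):
    """Return z = M^{-1} r for one-level AS (restricted=False) or RAS (restricted=True).

    A is a dense matrix (list of rows, 0-based); parts_delta[i] and parts_zero[i]
    are the index lists of subdomain i with and without overlap.
    """
    n = len(r)
    z = [0.0] * n
    for dom, core in zip(parts_delta, parts_zero):
        idx = sorted(dom)
        A_loc = [[A[i][j] for j in idx] for i in idx]   # A_i = R_i A R_i^T
        r_loc = [r[i] for i in idx]                      # R_i r
        z_loc = solve_dense(A_loc, r_loc)                # A_i^{-1} R_i r
        for k, i in enumerate(idx):
            if not restricted or i in core:              # RAS: keep 0-overlap values only
                z[i] += z_loc[k]
    return z


# Usage: 1-D Laplacian on 6 nodes, two subdomains, overlap delta = 1.
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(6)]
     for i in range(6)]
parts0 = [[0, 1, 2], [3, 4, 5]]
parts1 = [[0, 1, 2, 3], [2, 3, 4, 5]]
z_ras = schwarz_apply(A, [1.0] * 6, parts1, parts0, restricted=True)
z_as = schwarz_apply(A, [1.0] * 6, parts1, parts0, restricted=False)
```

In RAS each entry of z receives exactly one contribution, so the prolongation step needs no communication in a parallel setting; AS adds contributions from every subdomain that covers a node.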
11
Coarse-level correction: basic ingredients
  • Algebraic coarsening: uncoupled aggregation
  • Smoothed prolongation/restriction operators
  • Coarse-level matrix
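The three ingredients can be sketched in the usual smoothed-aggregation form: each process aggregates only its own rows (uncoupled aggregation), a tentative prolongator $\tilde P$ maps each aggregate to its nodes, damped Jacobi smoothing gives $P = (I - \omega D^{-1} A)\tilde P$, the restriction is $R = P^T$, and the Galerkin coarse matrix is $A_C = R A P$. Several details below are assumptions, not MLD2P4's actual choices: every off-diagonal nonzero is treated as a strong coupling, aggregation is a simple two-pass greedy scheme, and $\omega = 4/(3\rho)$ uses the cheap bound $\rho \le \|D^{-1}A\|_\infty$. All function names are hypothetical.

```python
def uncoupled_aggregation(A, owner_ranges):
    """Greedy aggregation done independently on each process's rows [lo, hi)."""
    n = len(A)
    agg = [-1] * n
    n_agg = 0
    for lo, hi in owner_ranges:
        local = range(lo, hi)
        # Neighbours outside the local block are ignored: aggregates never cross processes.
        nbrs = {i: [j for j in local if j != i and A[i][j] != 0.0] for i in local}
        for i in local:              # pass 1: seed at nodes whose neighbours are all free
            if agg[i] == -1 and all(agg[j] == -1 for j in nbrs[i]):
                for j in [i] + nbrs[i]:
                    agg[j] = n_agg
                n_agg += 1
        for i in local:              # pass 2: attach leftovers, or make a singleton
            if agg[i] == -1:
                taken = [agg[j] for j in nbrs[i] if agg[j] != -1]
                if taken:
                    agg[i] = taken[0]
                else:
                    agg[i] = n_agg
                    n_agg += 1
    return agg, n_agg


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def smoothed_aggregation_level(A, owner_ranges):
    """Return (aggregates, smoothed prolongator P, Galerkin coarse matrix A_C)."""
    n = len(A)
    agg, n_agg = uncoupled_aggregation(A, owner_ranges)
    P_tent = [[1.0 if agg[i] == c else 0.0 for c in range(n_agg)] for i in range(n)]
    rho_bound = max(sum(abs(a) for a in A[i]) / A[i][i] for i in range(n))
    omega = 4.0 / (3.0 * rho_bound)
    AP = matmul(A, P_tent)
    # P = (I - omega D^{-1} A) P_tent
    P = [[P_tent[i][c] - omega * AP[i][c] / A[i][i] for c in range(n_agg)] for i in range(n)]
    R = transpose(P)                         # R = P^T
    A_c = matmul(R, matmul(A, P))            # A_C = R A P
    return agg, P, A_c


# Usage: 1-D Laplacian on 8 nodes, rows split between two "processes".
n = 8
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
agg, P, A_c = smoothed_aggregation_level(A, [(0, 4), (4, 8)])
print(agg)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Note that aggregation ignores couplings across the process boundary, but the smoothing step still produces prolongator rows that reach into neighbouring aggregates, so P itself is not block diagonal.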
12
Multilevel Schwarz preconditioners:
computational kernels
Example: 2-lev hybrid-post
P. D'Ambra, D. di Serafino, S. Filippone, On the
Development of PSBLAS-based Parallel Two-level
Schwarz Preconditioners, Applied Numerical
Mathematics, 57, 2007.
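Reading "hybrid-post" as a multiplicative combination of the two levels with the one-level preconditioner applied after the coarse correction (our interpretation; check the cited paper for the exact definition), the application of the 2-level preconditioner to a residual r is $y = P A_C^{-1} P^T r$ followed by $z = y + M_1^{-1}(r - A y)$. The sketch expresses this in terms of four kernels (matrix-vector product, one-level smoother, prolongation, coarse solve); `hybrid_post` and the toy operators in the usage lines are hypothetical.

```python
def hybrid_post(matvec, smoother, P, coarse_solve, r):
    """Apply a 2-level hybrid preconditioner with post-smoothing: z = M^{-1} r.

    matvec(v) = A v; smoother(v) = M_1^{-1} v (e.g. one-level RAS);
    P = prolongator as a list of rows, with restriction R = P^T;
    coarse_solve(v) = A_C^{-1} v with A_C = P^T A P.
    """
    n, nc = len(P), len(P[0])
    r_c = [sum(P[i][c] * r[i] for i in range(n)) for c in range(nc)]   # R r
    e_c = coarse_solve(r_c)                                              # A_C^{-1} R r
    y = [sum(P[i][c] * e_c[c] for c in range(nc)) for i in range(n)]    # coarse correction
    res = [ri - ayi for ri, ayi in zip(r, matvec(y))]                    # r - A y
    s = smoother(res)                                                    # post-smoothing
    return [yi + si for yi, si in zip(y, s)]


# Usage with toy operators: A = 2I on two unknowns, one aggregate (P = column of ones),
# hence A_C = P^T A P = 4, and an exact diagonal smoother.
z = hybrid_post(lambda v: [2.0 * vi for vi in v],
                lambda v: [vi / 2.0 for vi in v],
                [[1.0], [1.0]],
                lambda vc: [vc[0] / 4.0],
                [1.0, 3.0])
print(z)  # [0.5, 1.5]
```

Because the smoother sees the residual left by the coarse correction, each application costs one extra matrix-vector product compared with the purely additive combination.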
13
MLD2P4 Design: Software Architecture
14
Performance Results and Comparisons
  • Test matrices from various sources
    • thm matrices: thermal diffusion in solids
    • kivap matrices: automotive engine design
    • shipsec matrices: from the UF Sparse Matrix
      Collection
  • Experiments carried out on several Linux
    clusters
    • 64 Intel Itanium dual-processor nodes connected
      by Quadrics QSNetII Elan 4
    • 32 AMD Opteron dual-processor nodes connected by
      Myrinet
    • 8 AMD Opteron dual-processor nodes connected by
      InfiniBand
    • 8 Intel Itanium dual-processor nodes connected by
      Myrinet
    • 16 Intel Pentium IV nodes connected by Fast
      Ethernet
  • Comparison with up-to-date related work
    • Trilinos-ML

A. Buttari, P. D'Ambra, D. di Serafino, S.
Filippone, 2LEV-D2P4: a package of
high-performance preconditioners for scientific
and engineering applications, Applicable Algebra
in Engineering, Communication and Computing, Vol.
18, 2007.
15
Experimental Setting
  • BiCGSTAB, right-preconditioned with MLD2P4
  • 1-lev Restricted Additive Schwarz preconditioner
    with ILU(0) (RAS)
  • 2-lev hybrid Schwarz preconditioner, with
    RAS/ILU(0) as 1-lev prec.
    • Distributed coarsest matrix: 4 sweeps of block
      Jacobi with ILU(0) (2LDI) or with UMFPACK (2LDU)
      on the diagonal blocks
  • 3-lev hybrid Schwarz preconditioner, with
    RAS/ILU(0) as 1-lev prec.
    • Distributed coarsest matrix: 4 sweeps of block
      Jacobi with ILU(0) (3LDI) or with UMFPACK (3LDU)
      on the diagonal blocks
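The coarsest-level solver above is a fixed number of block-Jacobi sweeps, $x_{k+1} = x_k + D_B^{-1}(b - A x_k)$ starting from $x_0 = 0$, where $D_B$ is the block diagonal of the distributed coarse matrix (one block per process). The sketch below uses an exact dense local solve, roughly playing the role of UMFPACK in 2LDU/3LDU; the ILU(0) variants (2LDI/3LDI) would replace `solve_dense` with an incomplete factorization, which is not implemented here. `block_jacobi_sweeps` and the block layout are hypothetical.

```python
def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting (exact local solve, UMFPACK-like role)."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[piv] = a[piv], a[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= f * a[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def block_jacobi_sweeps(A, b, blocks, sweeps=4):
    """Approximate A^{-1} b with `sweeps` block-Jacobi iterations from x = 0.

    blocks: list of index lists, one diagonal block per (hypothetical) process.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        # Residual is computed once per sweep, so all blocks use the same old iterate (Jacobi).
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        for idx in blocks:
            A_bb = [[A[i][j] for j in idx] for i in idx]
            d = solve_dense(A_bb, [r[i] for i in idx])
            for k, i in enumerate(idx):
                x[i] += d[k]
    return x


# Usage: diagonally dominant tridiagonal matrix split into two diagonal blocks.
A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(6)]
     for i in range(6)]
x4 = block_jacobi_sweeps(A, [1.0] * 6, [[0, 1, 2], [3, 4, 5]], sweeps=4)
```

Each sweep needs only local factorizations plus one distributed matrix-vector product, which is why a fixed small number of sweeps keeps the coarsest level parallel instead of gathering the coarse matrix on one process.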

16
thm matrices: number of iterations
thm1: n = 600000, nnz = 2996800

Overlap 0 (OV0)
np    RAS   2LDI  2LDU  3LDI  3LDU
1     613   190   -     70    -
2     705   184   -     72    -
4     761   206   -     74    -
8     688   202   44    67    28
16    748   211   61    70    36
32    766   186   81    69    51
64    809   196   113   86    68

Overlap 1 (OV1)
np    RAS   2LDI  2LDU  3LDI  3LDU
1     613   190   -     70    -
2     923   183   -     76    -
4     684   178   -     63    -
8     937   191   34    62    27
16    688   172   57    68    33
32    714   181   74    65    45
64    720   180   107   77    62

64 Intel Itanium dual-processor nodes connected
by QSNetII
17
thm matrices: execution times and speed-ups
(OV1; best execution times with 3LDU)
64 Intel Itanium dual-processor nodes connected
by QSNetII
18
Application test case: large eddy simulation of
incompressible turbulent flows in a bi-periodical
channel
  • Main computational kernel: nonsymmetric and
    singular linear systems arising from an elliptic
    PDE with Neumann b.c.
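Why these systems are singular and nonsymmetric can be seen on a 1-D analogue: with pure Neumann conditions no flux enters through the boundary, so every row sums to zero and constant vectors lie in the kernel; and if each row of a finite-volume discretization is divided by its own cell width on a non-uniform grid, the coupling coefficients $a_{i,i+1}$ and $a_{i+1,i}$ differ. This is an illustrative assumption about how such matrices can arise, not the authors' discretization; `neumann_laplacian_1d` and the cell widths are hypothetical.

```python
def neumann_laplacian_1d(h):
    """Finite-volume 1-D Laplacian on cells of widths h with homogeneous Neumann b.c.

    Row i is divided by the cell width h[i], so the matrix is nonsymmetric
    whenever neighbouring cells have different widths.
    """
    n = len(h)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:                      # no flux through the boundary (Neumann)
                d = 0.5 * (h[i] + h[j])         # distance between cell centres
                c = 1.0 / (h[i] * d)
                A[i][j] += c
                A[i][i] -= c
    return A


h = [0.1, 0.2, 0.4, 0.2, 0.1]                   # hypothetical stretched grid
A = neumann_laplacian_1d(h)
ones = [1.0] * len(h)
Av = [sum(a * v for a, v in zip(row, ones)) for row in A]   # A times the constant vector
```

Since A maps the constant vector to zero (up to rounding), a solution exists only if the right-hand side satisfies a compatibility condition, and it is determined only up to an additive constant; the preconditioned Krylov solver has to cope with this.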

A. Aprovitola, P. D'Ambra, F. M. Denaro, D. di
Serafino, S. Filippone, Application of Parallel
Algebraic Multilevel Domain Decomposition
Preconditioners in Large-Eddy Simulations of
Wall-bounded Turbulent Flows: First Experiments,
RT-ICAR-NA-2007-02, July 2007.
19
Experimental Setting
Reynolds number: 180; computational grid:
140 x 32 x 45, non-uniform in the y direction;
time step: 10^-4
Pressure linear system: n = 201600, nnz = 1398600
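The reported sizes can be cross-checked with a short calculation. Assuming (this is not stated on the slide) one unknown per grid cell, a 7-point stencil, periodicity in x and z, and walls at the first and last y layers, each wall-adjacent cell loses exactly one neighbour, which gives nnz = 7n - 2·nx·nz.

```python
nx, ny, nz = 140, 32, 45          # grid from the slide; y assumed to be the wall-normal direction
n = nx * ny * nz                  # one unknown per cell
# Assumed 7-point stencil: each row couples a cell to itself and 6 neighbours, except
# that cells next to the two walls (first and last y layer) lack one neighbour each.
# x and z are assumed periodic, so no couplings are lost in those directions.
nnz = 7 * n - 2 * nx * nz
print(n, nnz)  # 201600 1398600
```

Both numbers match the slide exactly, which supports (but does not prove) the 7-point, bi-periodic reading of the pressure matrix.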
  • RGMRES(30), right-preconditioned with MLD2P4
  • 1-lev Restricted Additive Schwarz preconditioner
    with ILU(0) (RAS)
  • 2-lev/3-lev hybrid Schwarz preconditioner, with
    RAS/ILU(0) as 1-lev prec.
    • Distributed coarse matrix: 4 sweeps of block
      Jacobi with ILU(0) (2LDI/3LDI) on the diagonal blocks
  • Stopping criterion: … or maxit
  • General row-block distribution
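For reference, restarted GMRES(m) with right preconditioning can be sketched as follows: each cycle builds an Arnoldi basis of A M^{-1} with modified Gram-Schmidt, keeps the small Hessenberg least-squares problem triangular with Givens rotations, and updates x with the stored preconditioned vectors M^{-1} v_j. This is a serial dense-list sketch, not the PSBLAS RGMRES routine; `gmres_right`, the relative-residual stopping test and the test matrix in the usage lines are assumptions.

```python
import math


def gmres_right(matvec, apply_prec, b, m=30, tol=1e-8, maxit=1000):
    """Restarted GMRES(m) with right preconditioning: A M^{-1} u = b, x = M^{-1} u.

    Stops when ||b - A x||_2 <= tol * ||b||_2 (an assumed criterion) or after
    maxit inner iterations in total. Returns (x, iterations_used).
    """
    n = len(b)
    x = [0.0] * n
    bnorm = math.sqrt(sum(bi * bi for bi in b)) or 1.0
    total = 0
    while True:
        r = [bi - ai for bi, ai in zip(b, matvec(x))]
        beta = math.sqrt(sum(ri * ri for ri in r))
        if beta <= tol * bnorm or total >= maxit:
            return x, total
        V = [[ri / beta for ri in r]]           # Krylov basis
        Z = []                                  # preconditioned vectors M^{-1} v_j
        H = [[0.0] * m for _ in range(m + 1)]   # Hessenberg matrix, rotated in place
        cs, sn = [0.0] * m, [0.0] * m
        g = [beta] + [0.0] * m                  # rotated right-hand side beta * e_1
        k = 0
        for j in range(m):
            Z.append(apply_prec(V[j]))
            w = matvec(Z[j])
            for i in range(j + 1):              # modified Gram-Schmidt
                H[i][j] = sum(wi * vi for wi, vi in zip(w, V[i]))
                w = [wi - H[i][j] * vi for wi, vi in zip(w, V[i])]
            h_next = math.sqrt(sum(wi * wi for wi in w))
            H[j + 1][j] = h_next
            for i in range(j):                  # apply earlier rotations to column j
                t = cs[i] * H[i][j] + sn[i] * H[i + 1][j]
                H[i + 1][j] = -sn[i] * H[i][j] + cs[i] * H[i + 1][j]
                H[i][j] = t
            d = math.hypot(H[j][j], H[j + 1][j])
            cs[j], sn[j] = H[j][j] / d, H[j + 1][j] / d
            H[j][j], H[j + 1][j] = d, 0.0
            g[j + 1] = -sn[j] * g[j]
            g[j] = cs[j] * g[j]
            k, total = j + 1, total + 1
            if abs(g[j + 1]) <= tol * bnorm or total >= maxit or h_next == 0.0:
                break
            V.append([wi / h_next for wi in w])
        y = [0.0] * k                           # back substitution on the triangular system
        for i in range(k - 1, -1, -1):
            y[i] = (g[i] - sum(H[i][l] * y[l] for l in range(i + 1, k))) / H[i][i]
        for l in range(k):                      # x += Z y, i.e. x += M^{-1} V y
            x = [xi + y[l] * zi for xi, zi in zip(x, Z[l])]


# Usage: nonsymmetric tridiagonal matrix, diagonal preconditioner, short restart to force restarts.
N = 30


def conv_diff(v):
    return [4.0 * v[i] - 1.2 * (v[i - 1] if i > 0 else 0.0)
            - 0.8 * (v[i + 1] if i < N - 1 else 0.0) for i in range(N)]


x, its = gmres_right(conv_diff, lambda v: [vi / 4.0 for vi in v], [1.0] * N, m=5)
```

Storing Z (the preconditioned vectors) costs extra memory per cycle but lets the update use them directly; with a fixed preconditioner one could instead store only V and apply M^{-1} once to V y at the end of each cycle.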

20
LES of incompressible wall-bounded flow
[Figure: LES results; annotations "SOR on 1 proc.: 9 sec." and "SOR on 1 proc.: 8580 sec."]
16 Intel Itanium dual-processor nodes connected
by QSNetII
21
Work in progress
  • Package to be available on the web very soon
  • More sophisticated aggregation algorithms
  • Integration of preconditioners and solvers in
    large-scale applications