Title: MUMPS: A Multifrontal Massively Parallel Solver
http://www.ens-lyon.fr/~jylexcel/MUMPS/
http://www.enseeiht.fr/apo/MUMPS/
Main features
MUMPS solves large systems of linear equations of the form Ax = b by factorizing A into A = LU or A = LDL^T:
- symmetric or unsymmetric matrices (partial pivoting),
- parallel factorization and solve phases (uniprocessor version also available),
- iterative refinement and backward error analysis,
- various matrix input formats:
  - assembled format,
  - distributed assembled format,
  - sum of elemental matrices,
- null space functionalities (experimental): rank detection and null space basis,
- partial factorization and Schur complement matrix,
- version for complex arithmetic.
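Several items in the feature list can be illustrated, in a much simplified setting, by the dense sequential sketch below. MUMPS itself handles sparse matrices with a distributed multifrontal method; this Python/NumPy code only shows the underlying numerical steps: LU factorization with partial pivoting (PA = LU), the forward and back substitutions of the solve phase, and iterative refinement monitored by the componentwise backward error max_i |b - Ax|_i / (|A||x| + |b|)_i. All function names, the tolerance and the iteration limit are assumptions for illustration and are not part of the MUMPS interface.

```python
import numpy as np


def lu_partial_pivoting(A):
    """Compute P A = L U with partial pivoting; L and U are stored in one array."""
    LU = np.array(A, dtype=float)  # work on a copy
    n = LU.shape[0]
    perm = np.arange(n)            # perm[i] = original row now in position i
    for k in range(n - 1):
        # Partial pivoting: move the largest |entry| of column k onto the diagonal.
        p = k + int(np.argmax(np.abs(LU[k:, k])))
        if LU[p, k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            LU[[k, p]] = LU[[p, k]]
            perm[[k, p]] = perm[[p, k]]
        # Column k of L (multipliers), then rank-1 update of the trailing block.
        LU[k + 1:, k] /= LU[k, k]
        LU[k + 1:, k + 1:] -= np.outer(LU[k + 1:, k], LU[k, k + 1:])
    if n > 0 and LU[n - 1, n - 1] == 0.0:
        raise ValueError("matrix is singular")
    return LU, perm


def lu_solve(LU, perm, b):
    """Solve A x = b using the factors from lu_partial_pivoting."""
    n = LU.shape[0]
    x = np.asarray(b, dtype=float)[perm]  # apply the row permutation P (copy)
    for i in range(n):                    # forward substitution, unit-diagonal L
        x[i] -= LU[i, :i] @ x[:i]
    for i in range(n - 1, -1, -1):        # back substitution with U
        x[i] = (x[i] - LU[i, i + 1:] @ x[i + 1:]) / LU[i, i]
    return x


def backward_error(A, x, b):
    """Componentwise backward error max_i |r_i| / (|A||x| + |b|)_i."""
    r = b - A @ x
    denom = np.abs(A) @ np.abs(x) + np.abs(b)
    mask = denom > 0  # rows with denom == 0 necessarily have r_i == 0
    return float(np.max(np.abs(r[mask]) / denom[mask])) if mask.any() else 0.0


def solve_with_refinement(A, b, tol=4 * np.finfo(float).eps, max_iter=5):
    """Factor once, solve, then refine x while the backward error exceeds tol."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    LU, perm = lu_partial_pivoting(A)
    x = lu_solve(LU, perm, b)
    for _ in range(max_iter):
        if backward_error(A, x, b) <= tol:
            break
        x = x + lu_solve(LU, perm, b - A @ x)  # the correction reuses the factors
    return x, backward_error(A, x, b)
```

The key design point is that each refinement step costs only one residual and one solve with the existing factors, which is why it is cheap to offer after an expensive factorization.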
IMPLEMENTATION
- Distributed multifrontal solver
- MPI / F90 based (C user interface also available)
- Stability based on partial pivoting
- Dynamic distributed scheduling to accommodate both numerical fill-in and multi-user environments
- Use of BLAS, LAPACK, ScaLAPACK
A fully asynchronous distributed solver (VAMPIR
trace, 8 processors).
COMPETITIVE PERFORMANCE
MUMPS performs well compared with other parallel sparse solvers. For example, the table below compares it with the SuperLU code of Demmel and Li. These results are taken from "Analysis and comparison of two general sparse solvers for distributed memory computers", ACM TOMS, 27, 388-421.
Factorisation time in seconds for large matrices on the CRAY T3E (1 proc: not enough memory).
BMW car body: 148,770 unknowns, 5,396,386 nonzeros (MSC.Software).
AVAILABILITY
- MUMPS is available free of charge for non-commercial use.
- It has been used on a number of platforms (Cray T3E, Origin 2000, IBM SP, Linux clusters, ...) by a few hundred current users (finite elements, chemistry, simulation, aeronautics, ...).
- If you are interested in obtaining MUMPS for your own use, please refer to the MUMPS home page.
CURRENT RESEARCH
Active research is feeding the MUMPS software platform.
Mixing dynamic and static scheduling strategies
MUMPS uses a completely dynamic approach with distributed scheduling and scales well up to around 100 processors. Introducing more static information helps reduce the cost of the dynamic decisions and makes MUMPS more scalable.
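The poster does not say which static information is injected. As a hedged illustration, assume it is a proportional mapping of processors onto the assembly tree: each subtree receives a share of the processors proportional to its estimated work, which fixes in advance where most of the work may run and leaves the dynamic scheduler fewer decisions to make. The tree, the work estimates and the function names below are hypothetical.

```python
def subtree_work(tree, work, node, memo):
    """Total work of the subtree rooted at node (node plus all descendants)."""
    total = work[node] + sum(subtree_work(tree, work, c, memo) for c in tree.get(node, []))
    memo[node] = total
    return total


def proportional_mapping(tree, work, root, procs):
    """Map each tree node to a list of processors, proportionally to subtree work.

    tree: dict node -> list of children; work: dict node -> estimated work.
    """
    memo = {}
    subtree_work(tree, work, root, memo)
    mapping = {}

    def assign(node, candidates):
        mapping[node] = list(candidates)
        children = tree.get(node, [])
        total = sum(memo[c] for c in children)
        if total == 0:
            # Nothing to split on (or a leaf): children may use all candidates.
            for c in children:
                assign(c, candidates)
            return
        p, start, cum = len(candidates), 0, 0.0
        for c in children:
            cum += memo[c]
            end = round(cum / total * p)  # cumulative rounding keeps slices contiguous
            # A very small subtree may get an empty slice; it then shares one processor.
            share = candidates[start:end] or [candidates[min(start, p - 1)]]
            assign(c, share)
            start = end

    assign(root, list(procs))
    return mapping
```

With such a mapping computed before factorization, a dynamic scheduler would only need to choose among the processors pre-assigned to a subtree instead of among all processors.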
Reorderings and optimization of the memory usage
MUMPS uses state-of-the-art reordering techniques (AMD, AMF, ND, SCOTCH, PORD, METIS). These techniques have a strong impact on the parallelism and on the number of operations, and we are currently studying their impact on the dynamic memory usage of MUMPS. In particular, we have designed algorithms to optimize the memory occupation of the multifrontal stack. Future work includes dynamic memory load balancing and the design of an out-of-core version.
Best decrease in stack size obtained with our algorithm, for each reordering technique. Results obtained by A. Guermouche (PhD student in the INRIA ReMaP project).
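The poster does not describe the stack-optimization algorithms themselves. For background only, the sketch below shows a classic way in which the traversal order of the assembly tree affects the peak of the multifrontal stack: under the simplified memory model stated in the docstring, visiting the children of each node in decreasing order of (subtree peak minus contribution-block size) minimizes the peak, a classical result for this model. It is not claimed to be the algorithm developed for MUMPS; the function name, tree and sizes are hypothetical.

```python
def peak_stack(tree, front, cb, node, reorder=True):
    """Peak stack memory needed to process the subtree rooted at node.

    Simplified model (an assumption): children are processed one after the
    other, each leaving its contribution block cb[c] on the stack; the parent's
    frontal matrix front[node] is then allocated while all those blocks are
    still stacked. tree: dict node -> list of children.
    """
    children = tree.get(node, [])
    peaks = {c: peak_stack(tree, front, cb, c, reorder) for c in children}
    order = list(children)
    if reorder:
        # Process first the children whose peak most exceeds what they leave behind.
        order.sort(key=lambda c: peaks[c] - cb[c], reverse=True)
    stacked, peak = 0, 0
    for c in order:
        peak = max(peak, stacked + peaks[c])  # child runs on top of earlier blocks
        stacked += cb[c]
    return max(peak, stacked + front[node])  # assemble the parent front
```

For a root with a small child b (front 5, contribution block 4) listed before a large child a (front 20, contribution block 8), the given order needs a peak of 24 units, while the reordered traversal needs only 22.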
Effect of injecting more static information into the dynamic scheduling of MUMPS. Rectangular grids of increasing size, ND. Results obtained by C. Vömel (PhD, CERFACS) on a CRAY T3E.
Platforms with heterogeneous network (clusters of SMP)
In the MUMPS scheduling, work is given to processors according to their load. Adding a penalty to the load of processors on a distant node helps keep tasks with heavy communication on the same node and improves performance, as shown in the table below.
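A minimal sketch of this idea, assuming a multiplicative penalty factor (the poster does not say how the penalty is applied): when a master process looks for helper processors, those located on another SMP node have their load inflated before the least-loaded candidates are chosen, so communication-heavy work tends to stay on the master's node. The function name, the load values and the penalty of 1.5 are hypothetical.

```python
def select_slaves(loads, node_of, master, nslaves, penalty=1.5):
    """Choose nslaves helper processors for master, favoring its own SMP node.

    loads: dict proc -> current load estimate; node_of: dict proc -> SMP node id.
    """
    home = node_of[master]

    def effective_load(p):
        # Penalize processors on a distant node (assumed multiplicative penalty).
        return loads[p] * penalty if node_of[p] != home else loads[p]

    candidates = [p for p in loads if p != master]
    # Sort by penalized load; ties are broken by processor id for determinism.
    return sorted(candidates, key=lambda p: (effective_load(p), p))[:nslaves]
```

For example, with processors 0 to 2 on node 0, processors 3 and 4 on node 1, and master 0, a slightly less loaded remote processor is skipped in favor of a local one unless its advantage outweighs the penalty.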
Mixing MPI and OpenMP on clusters of SMP
We report below on a preliminary experiment with hybrid parallelism on one node (16 processors) of an IBM SP. The best results are obtained with 8 MPI processes of 2 OpenMP threads each. Regular problem from an 11-point discretization (cubic grid 64x64x64), ND used. Results obtained by S. Pralet (PhD, CERFACS).
Effect of taking the hybrid network into account.
Matrix PRE2, SCOTCH, 2 nodes of 16 processors of
an IBM SP. Results obtained by S. Pralet (PhD
CERFACS).
The MUMPS package has been partially supported by the Esprit IV Project PARASOL and by CERFACS, ENSEEIHT-IRIT, INRIA Rhône-Alpes, LBNL-NERSC, PARALLAB and RAL. The authors are Patrick Amestoy, Jean-Yves L'Excellent, Iain Duff and Jacko Koster. Functionalities related to rank revealing were first implemented by M. Tuma (Institute of Computer Science, Academy of Sciences of the Czech Republic) while he was at CERFACS. We are also grateful to C. Bousquet, C. Daniel, A. Guermouche, G. Richard, S. Pralet and C. Vömel, who have worked on specific parts of this software.
This poster was prepared by Jean-Yves L'Excellent (Jean-Yves.LExcellent_at_inrialpes.fr).