Title: Symmetric Eigensolvers in Sca/LAPACK
1Symmetric Eigensolvers in Sca/LAPACK
- Osni Marques
- (oamarques_at_lbl.gov)
2LAPACK Symmetric Tridiagonal Eigensolvers
- QR (STEQR) all eigenvectors, O(n3)
- Bisection plus inverse iteration (STEVX) subset
of eigenvectors, O(n2) - Divide-and-conquer (STEDC) all eigenvectors,
faster than the the previous two but needs more
workspace. - Multiple relative robust representations (STEGR)
faster than all above for most matrices from
industrial and scientific applications, least
workspace.
Typical performance (timing) of different
eigensolvers on matrices coming from industrial
applications. In the picture, old refers to the
version currently available in LAPACK, which will
be soon replaced by a new and more robust
implementation n ranges from 1824 to 8012.
3The Essence of the MRRR Algorithm
- Factor T??ILDLT, (L,D) is a relative robust
representation RRR for the eigenvalue subset ? - determines all eigenvalues in ? to high relative
accuracy - small relative changes in entries of L and D
cause small relative changes in each eigenvalue
in ? - Given an RRR for a set of eigenvalues
- For each eigenvalue with a large relative gap
- Compute eigenvalue to high relative accuracy
- Compute the FP (Fernando Parlett) vector
(eigenvector) - For each of the remaining groups of eigenvalues
- Choose shift outside the group
- Compute new RRR, LDLTLDLT ??new I
- Refine the eigenvalues.
4Testing LAPACK functionalities
- At installation time optional and limited number
of test cases to verify the integrity of the
installation (LAPACK/TESTING) - During the development phase intensive and
stressful tests on a variety of computer
architectures
5Intensive Testing Requirements and Goals
- Generation of difficult test cases
- Bookkeeping of test cases (so that new or
competing algorithm can stressed in a similar
way) - Various platforms
- AMD Athlon
- AMD Opteron
- Itanium 2
- Pentium III
- Pentium 4
- POWER 3
- SGI IP35
- SUN sparcv9
- CRAY X1
- ?
- Various (Fortran) compilers Intel, SUN, SGI, IBM
- Accuracy
- Performance (time)
- Tuning of parameters (automatic or manual)
- Algorithmic choices (different IEEE variants)
Reveal different numerical behaviors (in
particular IEEE arithmetic features), as well as
performance issues
6Matrix Types
- Built-in matrices
- 1-2-1 tridiagonal matrix
- (1D Poisson equation)
- Wilkinson tridiagonal matrix
- (eigenvalues clustered in pairs)
- ?
- Built-in eigenvalue distributions
- repeated eigenvalues
- ?11 and ?i1/k, i2,3n
- ?11 and ?i1, i1,2n-1, ?n1/k
- geometric distribution
- ?i k (1- i)/(n-1), i1,2n
- different condition numbers (k)
- different random number distributions
- ? can be multiplied by random signs
- ?
- Glued matrices
- combinations of the above cases
- very tight eigenvalue clusters
- Eigenvalue distributions (D) read from files
QTDQ?T with random orthogonal Q - Tridiagonal matrices from real world applications
- Chemistry
- (analysis of molecules)
- Harwell-Boeing Collection (structural
engineering, etc) - University of Florida Collection
- (FEM analysis, NASA)
- Matrices from LAPACK users
- ?
- Lanczos algorithm without reorthogonalization to
provoke very close eigenvalues
7Examples of eigenvalue distributions of matrices
from applications
8Accuracy and timings for families of matrices,
for a number of different computer architectures
9Profiles
10What have we found?
- LAPACK 3.0 STEGR (and STEDC!) fails on some of
the new test matrices - Different matrix classes with different
challenges - STEGR about 10 times slower than STEDC for glued
Wilkinson matrices - Architecture differences
- Pentium slows when infinity occurs
- Vectorization issues on CRAY
- Reference tester for future development
11Parallel Eigensolvers
- PDSYEVX bisection inverse iteration
- PDSYEVD parallel divide and conquer (F. Tisseur)
- PDSYEVR MRRR (C. Vömel)
12Pitfalls of Parallelization
- Straightforward approach n eigenpairs, p
processors ? cyclic assignment of ? n/p
eigenpairs to each processor - Each processor computes orthogonal eigenvectors
- Orthogonality between processors is not
guaranteed - ScaLAPACK PDSYEVX can break!
13Parallelization the right way
14MRRR versus DC
(Tridiagonal part of PDSYEVR and PDSYEVD)
Lapw (n22908, A. Tate). Runtime and efficiency
of the tridiagonal MRRR/DC part on the IBM SP5.
Hubbard (n63504, Ward and Bai). Runtime and
efficiency of the tridiagonal MRRR/DC part on
the IBM SP5.
15References
- Performance and Accuracy of LAPACK's Symmetric
Tridiagonal Eigensolvers, J. Demmel, O. Marques,
B. Parlett, and C. Vömel. SIAM J. Sci. Comp.,
3015081526, 2008. - A Testing Infrastructure for Symmetric
Tridiagonal Eigensolvers, J. Demmel, O. Marques,
B. Parlett, and C. Vömel. ACM TOMS, 35, 2008. - Computations of Eigenpair Subsets with the MRRR
Algorithm, B. Parlett, O. Marques and C. Voemel.
Numerical Linear Algebra with Applications,
13643-653, 2006. - The Design and Implementation of the MRRR
Algorithm, I. Dhillon, B. Parlett, and C. Vömel.
Technical Report UT-CS-04-541, December, 2004. - ScaLAPACKS MRRR Algorithm, C. Vömel, LAPACK
Working Note 195, November 2007. - http//crd.lbl.gov/osni/Codes/stetester (source
code available upon request)