Symmetric Eigensolvers in Sca/LAPACK - PowerPoint PPT Presentation

1 / 15
About This Presentation
Title:

Symmetric Eigensolvers in Sca/LAPACK

Description:

Divide-and-conquer (STEDC): all eigenvectors, faster than the the previous two ... PDSYEVD: parallel divide and conquer (F. Tisseur) PDSYEVR: MRRR (C. V mel) ... – PowerPoint PPT presentation

Number of Views:42
Avg rating:3.0/5.0
Slides: 16
Provided by: osn6
Category:

less

Transcript and Presenter's Notes

Title: Symmetric Eigensolvers in Sca/LAPACK


1
Symmetric Eigensolvers in Sca/LAPACK
  • Osni Marques
  • (oamarques_at_lbl.gov)

2
LAPACK Symmetric Tridiagonal Eigensolvers
  • QR (STEQR) all eigenvectors, O(n3)
  • Bisection plus inverse iteration (STEVX) subset
    of eigenvectors, O(n2)
  • Divide-and-conquer (STEDC) all eigenvectors,
    faster than the the previous two but needs more
    workspace.
  • Multiple relative robust representations (STEGR)
    faster than all above for most matrices from
    industrial and scientific applications, least
    workspace.

Typical performance (timing) of different
eigensolvers on matrices coming from industrial
applications. In the picture, old refers to the
version currently available in LAPACK, which will
be soon replaced by a new and more robust
implementation n ranges from 1824 to 8012.
3
The Essence of the MRRR Algorithm
  • Factor T??ILDLT, (L,D) is a relative robust
    representation RRR for the eigenvalue subset ?
  • determines all eigenvalues in ? to high relative
    accuracy
  • small relative changes in entries of L and D
    cause small relative changes in each eigenvalue
    in ?
  • Given an RRR for a set of eigenvalues
  • For each eigenvalue with a large relative gap
  • Compute eigenvalue to high relative accuracy
  • Compute the FP (Fernando Parlett) vector
    (eigenvector)
  • For each of the remaining groups of eigenvalues
  • Choose shift outside the group
  • Compute new RRR, LDLTLDLT ??new I
  • Refine the eigenvalues.

4
Testing LAPACK functionalities
  • At installation time optional and limited number
    of test cases to verify the integrity of the
    installation (LAPACK/TESTING)
  • During the development phase intensive and
    stressful tests on a variety of computer
    architectures

5
Intensive Testing Requirements and Goals
  • Generation of difficult test cases
  • Bookkeeping of test cases (so that new or
    competing algorithm can stressed in a similar
    way)
  • Various platforms
  • AMD Athlon
  • AMD Opteron
  • Itanium 2
  • Pentium III
  • Pentium 4
  • POWER 3
  • SGI IP35
  • SUN sparcv9
  • CRAY X1
  • ?
  • Various (Fortran) compilers Intel, SUN, SGI, IBM
  • Accuracy
  • Performance (time)
  • Tuning of parameters (automatic or manual)
  • Algorithmic choices (different IEEE variants)

Reveal different numerical behaviors (in
particular IEEE arithmetic features), as well as
performance issues
6
Matrix Types
  • Built-in matrices
  • 1-2-1 tridiagonal matrix
  • (1D Poisson equation)
  • Wilkinson tridiagonal matrix
  • (eigenvalues clustered in pairs)
  • ?
  • Built-in eigenvalue distributions
  • repeated eigenvalues
  • ?11 and ?i1/k, i2,3n
  • ?11 and ?i1, i1,2n-1, ?n1/k
  • geometric distribution
  • ?i k (1- i)/(n-1), i1,2n
  • different condition numbers (k)
  • different random number distributions
  • ? can be multiplied by random signs
  • ?
  • Glued matrices
  • combinations of the above cases
  • very tight eigenvalue clusters
  • Eigenvalue distributions (D) read from files
    QTDQ?T with random orthogonal Q
  • Tridiagonal matrices from real world applications
  • Chemistry
  • (analysis of molecules)
  • Harwell-Boeing Collection (structural
    engineering, etc)
  • University of Florida Collection
  • (FEM analysis, NASA)
  • Matrices from LAPACK users
  • ?
  • Lanczos algorithm without reorthogonalization to
    provoke very close eigenvalues

7
Examples of eigenvalue distributions of matrices
from applications
8
Accuracy and timings for families of matrices,
for a number of different computer architectures
9
Profiles
10
What have we found?
  • LAPACK 3.0 STEGR (and STEDC!) fails on some of
    the new test matrices
  • Different matrix classes with different
    challenges
  • STEGR about 10 times slower than STEDC for glued
    Wilkinson matrices
  • Architecture differences
  • Pentium slows when infinity occurs
  • Vectorization issues on CRAY
  • Reference tester for future development

11
Parallel Eigensolvers
  • PDSYEVX bisection inverse iteration
  • PDSYEVD parallel divide and conquer (F. Tisseur)
  • PDSYEVR MRRR (C. Vömel)

12
Pitfalls of Parallelization
  • Straightforward approach n eigenpairs, p
    processors ? cyclic assignment of ? n/p
    eigenpairs to each processor
  • Each processor computes orthogonal eigenvectors
  • Orthogonality between processors is not
    guaranteed
  • ScaLAPACK PDSYEVX can break!

13
Parallelization the right way
14
MRRR versus DC
(Tridiagonal part of PDSYEVR and PDSYEVD)
Lapw (n22908, A. Tate). Runtime and efficiency
of the tridiagonal MRRR/DC part on the IBM SP5.
Hubbard (n63504, Ward and Bai). Runtime and
efficiency of the tridiagonal MRRR/DC part on
the IBM SP5.
15
References
  • Performance and Accuracy of LAPACK's Symmetric
    Tridiagonal Eigensolvers, J. Demmel, O. Marques,
    B. Parlett, and C. Vömel. SIAM J. Sci. Comp.,
    3015081526, 2008.
  • A Testing Infrastructure for Symmetric
    Tridiagonal Eigensolvers, J. Demmel, O. Marques,
    B. Parlett, and C. Vömel. ACM TOMS, 35, 2008.
  • Computations of Eigenpair Subsets with the MRRR
    Algorithm, B. Parlett, O. Marques and C. Voemel.
    Numerical Linear Algebra with Applications,
    13643-653, 2006.
  • The Design and Implementation of the MRRR
    Algorithm, I. Dhillon, B. Parlett, and C. Vömel.
    Technical Report UT-CS-04-541, December, 2004.
  • ScaLAPACKS MRRR Algorithm, C. Vömel, LAPACK
    Working Note 195, November 2007.
  • http//crd.lbl.gov/osni/Codes/stetester (source
    code available upon request)
Write a Comment
User Comments (0)
About PowerShow.com