Title: A PARALLEL BISECTION ALGORITHM (WITHOUT COMMUNICATION)
1A PARALLEL BISECTION ALGORITHM (WITHOUT
COMMUNICATION)
- Rui Ralha
- DMAT, CMAT
- Univ. do Minho
- Portugal
- r_ralha_at_math.uminho.pt
2Acknowledgements
- CMAT
- FCT
- POCTI (European Union contribution)
- Prof. B. Parlett
3Outline
- Counting eigenvalues of symmetric tridiagonals
- The ScaLAPACKs routine
- A parallel algorithm without communication
- An alternative algorithm
- Some conclusions
4Counting eigenvalues
5Nonmonotonicity of Count(x)
6The ScaLAPACKs implementation (1)
7The ScaLAPACKs implementation (2)
- In 1 the authors wrote
- Ideally, we would like a bracketing algorithm
that was simultaneously parallel, load balanced,
devoid of communication, and correct in the face
of nonmonotonicity. We still do not know how to
achieve this completely in the most general
case, when different parallel processors do not
even possess the same floating point format, we
do not know how to implement a correct and
reasonably fast algorithm at all. Even when
floating point formats are the same, we do not
know how to avoid some global communication -
- and considered a bracketing algorithm to be
correct if - (1) every eigenvalue is computed
exactly once, - (2) the computed eigenvalues are
correct to within the user - specified error tolerance,
- (3) the computed eigenvalues are in
sorted order.
8The ScaLAPACKs implementation (3)
9The ScaLAPACKs implementation (4)
10Drawbacks of the ScaLAPACKs implementation
11A simple and incorrect parallel algorithm
(without communication)
- To partition the initial Gerschgorin interval
into p subintervals of equal width - and assign to processor i the task of finding
all the eigenvalues in the - ith subinterval . But, even with
processors with the same arithmetic - (nonmonotonic) the algorithm may be incorrect.
- For example, with np3, it may happen 1
- Therefore, the second eigenvalue will be computed
twice (processors 1 and 3)
12Parallel bisection for computing the eigenvalues
of -1 2 -1 with 100 processors
13Our proposal (1)
14Our proposal (2)
15Our proposal (3)
16Our proposal (4)
17Sorting eigenvalues
- For the Wilkinsons matrix of order 21 we have
- With single precision in Matlab we get
- With double precision we get
- We assume that eigenvalues are to be gathered in
a master - processor (this is a standard feature of
ScaLAPACK). Supose that the - master receives
(out of order) and knows that the - processor that computed has better
accuracy. Then, it keeps - and, if required, it corrects to be smaller
than .
18An alternative algorithm (1)
- Phase 1(equal for every processor) carry out a
(not too large) number of bisection steps in a
breadth first search to get a good picture of
the spectrum. Produces a number of intervals (at
least p number of processors). - Phase 2 distributes intervals to processors
trying to achieve load- balance (the same number
of eigenvalues to each processor) - Phase 3 each processor computes the assigned
eigenvalues to some prescribed accuracy
19An alternative algorithm (2)
20An alternative algorithm (3)
- Preliminar implementation (in Matlab)
- Finishes Phase 1 when enough intervals have been
produced such that, for each k1,,p-1, an end
point x of one of those intervals satisfies -
-
- This may affect the speedup by 10.
- This termination criteria for Phase 1 may be hard
(i.e, take too many bisection steps) to satisfy
in some cases.
21Parallel bisection for computing the eigenvalues
of -1 2 -1 of order 104
22Conclusions
- Parallel bracketing in ScaLAPACKs requires
global communication - We have proposed an algorithm that is
communication free and is load balanced in the
sense that each processor computes the same
number of eigenvalues (if p divides n) - In homogeneous systems, our algorithm produces
sorted eigenvalues even when the arithmetic is
nonmonotonic - In heterogeneous systems, eigenvalues may be
unsorted (they may be sorted by the master if
required)