A PARALLEL BISECTION ALGORITHM (WITHOUT COMMUNICATION)

Provided by: ccam1

Learn more at: https://www.cse.psu.edu

1
A PARALLEL BISECTION ALGORITHM (WITHOUT
COMMUNICATION)

Rui Ralha
DMAT, CMAT
Univ. do Minho
Portugal
r_ralha_at_math.uminho.pt

2
Acknowledgements

CMAT
FCT
POCTI (European Union contribution)
Prof. B. Parlett

3
Outline

Counting eigenvalues of symmetric tridiagonals
The ScaLAPACK routine
A parallel algorithm without communication
An alternative algorithm
Some conclusions

4
Counting eigenvalues
5
Nonmonotonicity of Count(x)
6
The ScaLAPACK implementation (1)
7
The ScaLAPACK implementation (2)

In [1] the authors wrote:
"Ideally, we would like a bracketing algorithm
that was simultaneously parallel, load balanced,
devoid of communication, and correct in the face
of nonmonotonicity. We still do not know how to
achieve this completely; in the most general
case, when different parallel processors do not
even possess the same floating point format, we
do not know how to implement a correct and
reasonably fast algorithm at all. Even when
floating point formats are the same, we do not
know how to avoid some global communication."
They considered a bracketing algorithm to be
correct if
(1) every eigenvalue is computed exactly once,
(2) the computed eigenvalues are correct to within the user-specified error tolerance,
(3) the computed eigenvalues are in sorted order.
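
As a rough illustration, the three correctness conditions above can be written as a small check. This is only a sketch: the function name, the idea of tagging each computed value with the index it claims, and the availability of trusted reference eigenvalues are all assumptions made for the example and do not come from [1].

```python
def is_correct_bracketing(indices, values, reference, tol):
    """Check the three conditions on a gathered result (hypothetical helper).

    indices   -- eigenvalue index (1..n) claimed for each computed value
    values    -- the computed eigenvalues, in the same order as indices
    reference -- trusted eigenvalues in ascending order (assumed available)
    tol       -- user-specified error tolerance
    """
    n = len(reference)
    # (1) every eigenvalue is computed exactly once
    if sorted(indices) != list(range(1, n + 1)):
        return False
    # (2) each computed eigenvalue is within the tolerance
    if any(abs(v - reference[k - 1]) > tol for k, v in zip(indices, values)):
        return False
    # (3) arranged by index, the computed eigenvalues are in sorted order
    by_index = [v for _, v in sorted(zip(indices, values))]
    return all(x <= y for x, y in zip(by_index, by_index[1:]))
```

Note that condition (3) can fail even when condition (2) holds: two very close eigenvalues may each be within the tolerance and still come out in the wrong order.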

8
The ScaLAPACK implementation (3)
9
The ScaLAPACK implementation (4)
10
Drawbacks of the ScaLAPACK implementation
11
A simple and incorrect parallel algorithm
(without communication)

Partition the initial Gerschgorin interval
into p subintervals of equal width
and assign to processor i the task of finding
all the eigenvalues in the
ith subinterval. But even when all
processors have the same (nonmonotonic)
arithmetic, the algorithm may be incorrect.
For example, with n = p = 3, it may happen [1]
that the second eigenvalue is computed
twice (by processors 1 and 3).
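
The partitioning scheme described on this slide can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code behind the slides: the function names, the Sturm-count recurrence with a LAPACK-style zero-pivot guard, and the use of 0-based processor numbers are all choices made for illustration.

```python
import math

def gerschgorin_interval(d, e):
    """Interval [lo, hi] containing all eigenvalues of the symmetric
    tridiagonal matrix with diagonal d and off-diagonal e."""
    n = len(d)
    lo, hi = math.inf, -math.inf
    for i in range(n):
        r = (abs(e[i - 1]) if i > 0 else 0.0) + (abs(e[i]) if i < n - 1 else 0.0)
        lo, hi = min(lo, d[i] - r), max(hi, d[i] + r)
    return lo, hi

def count(d, e, x, tiny=1e-300):
    """Sturm count: number of negative pivots of T - x*I, which estimates
    the number of eigenvalues below x."""
    c, q = 0, 1.0
    for i in range(len(d)):
        q = d[i] - x - (e[i - 1] ** 2 / q if i > 0 else 0.0)
        if q == 0.0:
            q = -tiny  # assumed guard against division by zero
        if q < 0.0:
            c += 1
    return c

def kth_eigenvalue(d, e, k, a, b, tol):
    """Plain bisection for the k-th smallest eigenvalue, assumed to lie in [a, b]."""
    while b - a > tol:
        m = 0.5 * (a + b)
        if count(d, e, m) >= k:
            b = m
        else:
            a = m
    return 0.5 * (a + b)

def naive_processor_task(d, e, i, p, tol=1e-12):
    """Work of processor i (0-based here) in the naive scheme: find every
    eigenvalue whose index falls between the counts at its two endpoints."""
    lo, hi = gerschgorin_interval(d, e)
    w = (hi - lo) / p
    a, b = lo + i * w, lo + (i + 1) * w
    ca, cb = count(d, e, a), count(d, e, b)
    return {k: kth_eigenvalue(d, e, k, a, b, tol) for k in range(ca + 1, cb + 1)}
```

Each processor decides which eigenvalue indices it owns purely from the counts at its own two endpoints, so no communication is needed. The scheme is only safe if those counts never decrease from left to right. As a hypothetical illustration (these values are not taken from the slide), endpoint counts of 0, 2, 1, 3 with p = 3 would give indices 1 and 2 to the first processor, nothing to the second, and indices 2 and 3 to the third.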

12
Parallel bisection for computing the eigenvalues
of the tridiagonal matrix [-1 2 -1] with 100 processors
13
Our proposal (1)
14
Our proposal (2)
15
Our proposal (3)
16
Our proposal (4)
17
Sorting eigenvalues

For Wilkinson's matrix of order 21 we have
With single precision in Matlab we get
With double precision we get
We assume that eigenvalues are to be gathered in
a master processor (this is a standard feature of
ScaLAPACK). Suppose that the master receives two
eigenvalues out of order and knows that the
processor that computed one of them has better
accuracy. Then it keeps the more accurate value
and, if required, corrects the other one so that
the sorted order is restored.

18
An alternative algorithm (1)

Phase 1 (the same for every processor): carry out a
(not too large) number of bisection steps in a
breadth-first search to get a good picture of
the spectrum. This produces a number of intervals (at
least p, the number of processors).
Phase 2: distribute the intervals to the processors,
trying to achieve load balance (the same number
of eigenvalues for each processor).
Phase 3: each processor computes its assigned
eigenvalues to some prescribed accuracy.
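
Phases 1 and 2 above might look roughly like the following Python sketch. The fixed number of levels, the greedy left-to-right assignment, the clamping of counts, and all function names are assumptions made for illustration; the slides do not specify these details.

```python
from collections import deque

def count(d, e, x, tiny=1e-300):
    """Sturm count: number of negative pivots of T - x*I for the symmetric
    tridiagonal T with diagonal d and off-diagonal e."""
    c, q = 0, 1.0
    for i in range(len(d)):
        q = d[i] - x - (e[i - 1] ** 2 / q if i > 0 else 0.0)
        if q == 0.0:
            q = -tiny  # assumed guard against division by zero
        if q < 0.0:
            c += 1
    return c

def phase1(d, e, lo, hi, levels):
    """Breadth-first bisection of [lo, hi], assumed to contain the whole
    spectrum. Returns the nonempty intervals (a, b, ca, cb), left to right,
    where cb - ca is the number of eigenvalues attributed to (a, b]."""
    queue = deque([(lo, hi, 0, len(d), 0)])
    intervals = []
    while queue:
        a, b, ca, cb, depth = queue.popleft()
        if cb <= ca:
            continue  # no eigenvalues here, drop the interval
        if depth == levels:
            intervals.append((a, b, ca, cb))
            continue
        m = 0.5 * (a + b)
        # Clamp so the counts of the two halves always add up to cb - ca.
        cm = min(max(count(d, e, m), ca), cb)
        queue.append((a, m, ca, cm, depth + 1))
        queue.append((m, b, cm, cb, depth + 1))
    return intervals

def phase2(intervals, n, p):
    """Greedy assignment: walk the intervals left to right and move on to the
    next processor once the current one has reached its share of n/p."""
    tasks = [[] for _ in range(p)]
    j = 0
    for iv in intervals:
        ca = iv[2]
        while j < p - 1 and ca >= (j + 1) * n / p:
            j += 1
        tasks[j].append(iv)
    return tasks
```

Because every processor runs the same Phase 1 on the same data, each can derive its own task list from `phase2` without exchanging messages, provided all processors obtain identical counts. Phase 3 would then be ordinary bisection inside each assigned interval.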

19
An alternative algorithm (2)

20
An alternative algorithm (3)

Preliminary implementation (in Matlab)
Finishes Phase 1 when enough intervals have been
produced such that, for each k = 1, ..., p-1, an end
point x of one of those intervals satisfies
This may affect the speedup by 10%.
This termination criterion for Phase 1 may be hard
to satisfy (i.e., take too many bisection steps)
in some cases.

21
Parallel bisection for computing the eigenvalues
of the tridiagonal matrix [-1 2 -1] of order 10^4
22
Conclusions

Parallel bracketing in ScaLAPACK requires
global communication.
We have proposed an algorithm that is
communication free and load balanced, in the
sense that each processor computes the same
number of eigenvalues (if p divides n).
In homogeneous systems, our algorithm produces
sorted eigenvalues even when the arithmetic is
nonmonotonic.
In heterogeneous systems, eigenvalues may be
unsorted (they may be sorted by the master if
required).