Title: Parallel Algorithms and Parallel Computers (ii), in4026 Lecture 4

1 Parallel Algorithms and Parallel Computers (ii)
in4026 Lecture 4

- Cees Witteveen, C.Witteveen@tudelft.nl
- Software Technology, Faculty EEMCS, TU-Delft
2 Subjects

- Exercise: discussion of last week's exercise (accelerated cascading).
- Solving Linear Equations: to solve a linear equation Ax = b, Gaussian
Elimination (GE) is used to reduce the equation to Ux = b', where U is
upper triangular. Thereafter back or forward substitution is applied to
solve for x. We discuss a pipelining technique for applying GE and a
divide and conquer approach for solving x from Ux = b'.
- Matrix multiplication, special cases: we discuss matrix multiplication
for boolean matrices and apply it to compute the transitive closure of a
relation.
- Linear Recurrences: we provide some methods to solve linear (matrix)
recurrences and special matrix multiplications.
3 Exercise

- Let A be an (O(n log n), O(log n))-algorithm for P.
- Algorithm B with (O(n), O(log n / log log n)) reduces every instance of
P by a factor c > 1 without affecting the solution.
- Construct an (O(n), O(log n))-algorithm for P.
4 Note

- To prove that O(log n / log log n) + O(log(n/c) / log log(n/c)) + ... =
O(log n), we reason as follows: let x = log log(n/c^i) and y = log log n.
Clearly x ≤ y ≤ p·x for some constant p > 1, and since 2^t / t is
increasing, 2^x / x ≤ 2^y / y, i.e.
log(n/c^i) / log log(n/c^i) ≤ log n / log log n.
- Since the sum has O(log log n) terms, this implies
O(log n / log log n) + O(log(n/c) / log log(n/c)) + ...
= O(log n / log log n) + ... + O(log n / log log n)
= O((log n / log log n) × log log n) = O(log n).
5 Solving Linear Equations
6 Linear systems Ax = b

- Let Ax = b be a system of linear equations.
- Standard construction to compute x:
- reduce A to (upper or lower) triangular form with Gaussian elimination
- apply a simple back or forward substitution algorithm to compute the
values of the unknown variables x_1, ..., x_n
- Construction and discussion of two parallel algorithms:
- first phase: parallelizing Gaussian elimination by a standard
pipelining approach
- second phase: forward substitution, a fast parallel algorithm for
solving linear systems with triangular matrices.
7 Gaussian Elimination (GE)

see Grama, pages 352-357

- GE principle: for k = 1, ..., n, the variable x_k is eliminated from
equations k+1 to n by
- dividing row a_k by a_kk,
- multiplying row a_k by a suitable constant c_{k+i}, and
- then subtracting the result c_{k+i} a_k from row a_{k+i} such that
entry a_{k+i,k} = 0.
[figure: the entries of rows a_k and a_{k+i} that are active in
iteration k; the update rule is
a_{k+i,j} = a_{k+i,j} - a_{k+i,k} × a_{k,j} / a_kk]
8 Gaussian Elimination (GE)

see Grama, pages 352-357

- GE principle: for equation k+i, with k = 1, ..., n and i = 1 to n-k,
the variable x_k is eliminated by
- dividing row a_k by a_kk,
- multiplying row a_k by a suitable constant c_{k+i}, and
- then subtracting the result c_{k+i} a_k from row a_{k+i} such that
entry a_{k+i,k} = 0.
- The resulting matrix is an upper triangular matrix U with u_kk = 1.
- Complexity: we need (n-1) + (n-2) + ... + 1 = n(n-1)/2 divisions plus
(n-1)^2 + (n-2)^2 + ... + 1 = (n-1)n(2n-1)/6 subtractions and
(n-1)n(2n-1)/6 multiplications. The resulting number of operations is
therefore W(n) = (4n+1)n(n-1)/6 = O(n^3).
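As an illustration, the elimination rule above can be coded sequentially as follows (a minimal sketch, not the parallel version; the name `gauss_eliminate` is mine, pivoting is omitted, and all pivots are assumed nonzero):

```python
def gauss_eliminate(a, b):
    """Reduce a x = b to u x = b' with u unit upper triangular.

    Sequential sketch of the slide's rule: row k is divided by a[k][k],
    then a[i][j] -= a[i][k] * a[k][j] for every row i below k.
    """
    n = len(a)
    a = [row[:] for row in a]   # work on copies
    b = b[:]
    for k in range(n):
        pivot = a[k][k]
        for j in range(k, n):   # divide row k by the pivot, so u[k][k] = 1
            a[k][j] /= pivot
        b[k] /= pivot
        for i in range(k + 1, n):   # eliminate x_k from the rows below
            factor = a[i][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    return a, b
```

The triple loop makes the sequential O(n^3) operation count visible.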
9 Gaussian elimination

(this is from Grama)

- Parallelizing GE (naively): use a row-wise 1D partitioning with one
processor per row. The divisions in row k cost n-k-1 time, followed by
n-k-1 multiplications and subtractions. So iteration step k costs:
- 3(n-k-1) (multiplications + divisions + subtractions)
- a one-to-all broadcast to distribute the results to the other rows,
costing (t_s + t_w (n-k-1)) log n
- Computation time (summing over row iterations):
T_p(n) = 3n(n-1)/2 + t_s n log n + t_w n(n-1)/2 × log n = O(n^2 log n)
- Note that the work complexity equals n × T_p(n) = O(n^3 log n).
10 Gaussian elimination

- Pipelining GE.
- We consider the task pertaining to the k-th row as consisting of the
following sequence of subtasks:
- a sequence of k-1 subtasks, where a result from processor j
(c_{kj} × a_j) is subtracted from row a_k, for j = 1, ..., k-1
- a division step, where the elements of row k are divided by a_kk
- a sequence of n-k subtasks, where row a_k is multiplied by a suitable
scalar c_{k+i} and the result is communicated to the task pertaining to
the (k+i)-th row.
- Attach a separate process to each of these tasks. Then the subtasks can
be pipelined. The resulting time complexity is O(n^2). Convince yourself!
11 Solving Ax = b

- Assumptions:
- A (n x n) is nonsingular and triangular (this is the result of phase 1,
Gaussian elimination)
- a_ii = 1 for i = 1, ..., n
- Substitution, the sequential method:
x_1 = b_1, ...,  x_i = b_i - Σ_{j=1..i-1} a_ij x_j
- results in T(n) = O(n^2)
- Not suitable for a fast parallel algorithm. Grama discusses an
O(n^2 / √p) algorithm using p processors. We will discuss another, much
faster method.
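The sequential substitution method can be sketched as follows (assuming a unit lower triangular system, as stated above; the helper name `forward_substitute` is illustrative):

```python
def forward_substitute(a, b):
    """Solve a x = b for a unit lower triangular matrix a.

    Implements x_1 = b_1, x_i = b_i - sum_{j<i} a[i][j] * x[j],
    i.e. the O(n^2) sequential method of the slide.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = b[i]
        for j in range(i):      # subtract the already-solved unknowns
            s -= a[i][j] * x[j]
        x[i] = s
    return x
```

Each x_i depends on all earlier x_j, which is exactly why this formulation does not parallelize well.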
12 Solving Ax = b (ii)

- Alternative approach using divide and conquer:
- assume a triangular matrix A
- compute A^-1 with divide and conquer (to be shown on the next slides):
T(n) = O(log^2 n), W(n) = O(n^3)
- compute x = A^-1 b: T(n) = O(log n), W(n) = O(n^2)
- total cost: T(n) = O(log^2 n), W(n) = O(n^3)
13 A^-1 divide and conquer

- Given an n x n lower triangular A.
- Find a parallel algorithm to compute A^-1.
- Solution:
- partition A into 4 blocks of size n/2 x n/2:
A = [[A_1, 0], [A_2, A_3]]; then we have
A^-1 = [[A_1^-1, 0], [-A_3^-1 A_2 A_1^-1, A_3^-1]]
- Hence computing A^-1 is reduced to computing A_1^-1 and A_3^-1, each
being lower triangular of order n/2 x n/2.
- The recurrence relations for computing T(n) and W(n) are:
T(n) = T(n/2) + O(log n) ⇒ T(n) = O(log^2 n)
W(n) = 2 W(n/2) + O(n^3) ⇒ W(n) = O(n^3)
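The block identity above can be turned into a short recursive routine. The following is a sequential sketch (names `tri_inverse` and `mat_mul` are mine; n is assumed a power of 2 and the diagonal nonzero); on a PRAM the two recursive calls are independent and run in parallel, which is what yields T(n) = O(log^2 n):

```python
def mat_mul(a, b):
    """Plain triple-loop matrix product (stands in for the parallel product)."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def tri_inverse(a):
    """Divide-and-conquer inverse of a lower triangular matrix, using
    inv([[A1,0],[A2,A3]]) = [[inv(A1),0],[-inv(A3)·A2·inv(A1), inv(A3)]]."""
    n = len(a)
    if n == 1:
        return [[1.0 / a[0][0]]]
    h = n // 2
    a1 = [row[:h] for row in a[:h]]    # top-left block
    a2 = [row[:h] for row in a[h:]]    # bottom-left block
    a3 = [row[h:] for row in a[h:]]    # bottom-right block
    i1, i3 = tri_inverse(a1), tri_inverse(a3)   # independent subproblems
    corner = mat_mul(mat_mul(i3, a2), i1)       # A_3^-1 A_2 A_1^-1
    top = [i1[i] + [0.0] * h for i in range(h)]
    bot = [[-x for x in corner[i]] + i3[i] for i in range(h)]
    return top + bot
```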
14 Matrix multiplication: special cases
15 Matrix multiplication

- input: A (n x n), B (n x n), n = 2^k
- output: C = A × B
- begin
1. for 1 ≤ i,j,k ≤ n pardo
     C[i,j,k] := A[i,k] × B[k,j]               T(n) = O(1), W(n) = O(n^3)
2. for h = 1 to log n do
     for 1 ≤ i,j ≤ n, 1 ≤ k ≤ n/2^h pardo
       C[i,j,k] := C[i,j,2k-1] + C[i,j,2k]     T(n) = O(log n), W(n) = O(n^3)
3. for 1 ≤ i,j ≤ n pardo
     C[i,j] := C[i,j,1]                        T(n) = O(1), W(n) = O(n^2)
- end
- Total: T(n) = O(log n), W(n) = O(n^3)
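The three steps above can be simulated sequentially as follows (a sketch for intuition only; the pardo loops become ordinary loops, and the name `pram_matmul` is mine):

```python
import math

def pram_matmul(a, b):
    """Simulate the three-step PRAM product C = A × B for n a power of 2.

    Step 1 forms all n^3 elementary products in one parallel step;
    step 2 halves the k-dimension log n times (a balanced-tree sum);
    step 3 copies out the single remaining partial sum per (i, j).
    """
    n = len(a)
    # step 1: C[i][j][k] = A[i][k] * B[k][j]
    c = [[[a[i][k] * b[k][j] for k in range(n)] for j in range(n)]
         for i in range(n)]
    # step 2: log n rounds of pairwise addition along the k-axis
    for _ in range(int(math.log2(n))):
        c = [[[c[i][j][2 * k] + c[i][j][2 * k + 1]
               for k in range(len(c[i][j]) // 2)]
              for j in range(n)] for i in range(n)]
    # step 3: extract C[i][j][1]
    return [[c[i][j][0] for j in range(n)] for i in range(n)]
```

Sequentially this is just O(n^3) work; the point is that every round of step 2 is a single parallel step.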
16 Matrix product specialisations

- Let A (m x n) and B (n x p) be matrices.
- There exists an (O(m × n × p), O(log n)) parallel algorithm to compute
C = A × B on a CREW-PRAM.
- Comment: this is a direct consequence of the preceding algorithm.
17 Matrix product specialisations

- If A and B are boolean and C = A × B, then
c_ij = (a_i1 ∧ b_1j) ∨ (a_i2 ∧ b_2j) ∨ ... ∨ (a_in ∧ b_nj).
There exists an (O(m × n × p), O(1)) parallel algorithm to compute
C = A × B on a CRCW-PRAM.
- Comment: the boolean OR of the n and-terms (a_ik ∧ b_kj) can be
computed on a CRCW-PRAM in O(1) time and O(n) work. There are m × p such
combinations to be computed (in parallel). (See next slide.)
18 Multiplication of boolean matrices

- input: A (n x n), B (n x n), boolean matrices, n = 2^k
- output: C = A × B, a boolean matrix
- begin
1. for 1 ≤ i,j ≤ n pardo C[i,j] := 0                          T(n) = O(1), W(n) = O(n^2)
2. for 1 ≤ i,j,k ≤ n pardo C[i,j,k] := A[i,k] ∧ B[k,j]        T(n) = O(1), W(n) = O(n^3)
3. for 1 ≤ i,j,k ≤ n pardo if C[i,j,k] = 1 then C[i,j] := 1   T(n) = O(1), W(n) = O(n^3)
- end
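A sequential sketch of this boolean product (the name `bool_matmul` is mine): the CRCW trick is that all processors whose and-term is 1 write concurrently into C[i,j], which `any()` merely simulates here.

```python
def bool_matmul(a, b):
    """Boolean product: c[i][j] = OR over k of (a[i][k] AND b[k][j]).

    On a CRCW-PRAM the n-way OR takes O(1) time; sequentially we just
    evaluate it with any().
    """
    n = len(a)
    return [[int(any(a[i][k] and b[k][j] for k in range(n)))
             for j in range(n)] for i in range(n)]
```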
19 Matrix product specialisations

- If A is n x n and m ≤ 2^s, then there exists an (O(s × n^3), O(s log n))-
algorithm to compute A^m.
- Comment: A^m can be computed with O(s) matrix multiplications, since
log m ≤ s, by an obvious recursive algorithm; each multiplication costs
O(log n) time and O(n^3) work.

function power(A, m)
var Z
begin
  if m = 1 then return A
  else if m ≡ 0 mod 2 then
    Z := power(A, m/2); return Z × Z
  else
    return A × power(A, m-1)
end
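The repeated-squaring pseudocode above translates directly to Python (a sketch; `mat_mul` is a plain sequential product standing in for the parallel one):

```python
def mat_mul(a, b):
    """Ordinary matrix product; the parallel algorithm replaces this step."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(a, m):
    """A^m by repeated squaring: O(log m) multiplications."""
    if m == 1:
        return a
    if m % 2 == 0:
        z = mat_power(a, m // 2)
        return mat_mul(z, z)
    return mat_mul(a, mat_power(a, m - 1))
```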
20 Matrix product specialisations

- If A (n x n) is boolean and m ≤ 2^s, then there exists an
(O(s × n^3), O(s))-algorithm to compute A^m on a CRCW-PRAM.
- Comment: combine the previous results:
(i) boolean matrix multiplication can be performed in O(1) time and
O(n^3) work;
(ii) computing A^m requires log m ≤ s iterations of such matrix
multiplications.
21 Application: transitive closure

- Let G = (V, E) be a directed graph.
- The boolean adjacency matrix B (n x n) associated with G is defined as
follows:
B_ij = 1 if (v_i, v_j) ∈ E, and B_ij = 0 otherwise.
- The transitive closure G* = (V, E*) of G is defined by:
(v, w) ∈ E* iff v = w or there is a directed path from v to w in G.
- Problem: compute B* (the adjacency matrix associated with G*) from B.
22 Example: transitive closure

- Observe: B^k is the matrix with the following property:
(B^k)_ij = 1 iff there exists a path of length k from v_i to v_j in G.
Proof: by easy induction on the path length k, observing that there
exists a path of length k+1 from i to j iff there exists some node m,
1 ≤ m ≤ n, such that there exists a path of length k from i to m and an
edge from m to j.
- Hence, the adjacency matrix B* associated with G* equals
B* = I ∨ B ∨ B^2 ∨ ... ∨ B^n = (I ∨ B)^n.
- Hence there exists an (O(n^3 log n), O(log n))-algorithm to compute B*
from B (n x n).
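Computing (I ∨ B)^n by repeatedly squaring a boolean matrix can be sketched as follows (the name `transitive_closure` is mine; log n squarings suffice because once the exponent reaches n the matrix no longer changes):

```python
def transitive_closure(b):
    """Compute B* = (I OR B)^n via repeated boolean squaring."""
    n = len(b)
    # m := I OR B, so powers of m accumulate paths of all shorter lengths
    m = [[1 if i == j else b[i][j] for j in range(n)] for i in range(n)]
    p = 1
    while p < n:                # square until the exponent is >= n
        m = [[int(any(m[i][k] and m[k][j] for k in range(n)))
              for j in range(n)] for i in range(n)]
        p *= 2
    return m
```

With the O(1)-time CRCW boolean product of slide 18, each squaring is one parallel step of O(n^3) work, giving the (O(n^3 log n), O(log n)) bound.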
23 Linear recurrences

- y_1 = b_1;  y_i = a_i y_{i-1} + b_i,  2 ≤ i ≤ n
- Remark: if a_i = 1 then this problem reduces to computing the prefix
sums of (b_1, b_2, ..., b_n).
- (Unfolding) Let
y_{2i} = a_{2i} y_{2i-1} + b_{2i}
       = a_{2i} (a_{2i-1} y_{2i-2} + b_{2i-1}) + b_{2i}
       = a_{2i} a_{2i-1} y_{2i-2} + (a_{2i} b_{2i-1} + b_{2i})
       = a'_i y_{2(i-1)} + b'_i
- Define z_i = y_{2i}, 1 ≤ i ≤ n/2; then
z_1 = b'_1;  z_i = a'_i z_{i-1} + b'_i,  2 ≤ i ≤ n/2
24 Linear recurrences

- y_1 = b_1;  y_i = a_i y_{i-1} + b_i,  2 ≤ i ≤ n
- z_1 = b'_1;  z_i = a'_i z_{i-1} + b'_i,  2 ≤ i ≤ n/2,
where a'_i = a_{2i} a_{2i-1}, b'_i = a_{2i} b_{2i-1} + b_{2i}, and
z_i = y_{2i}.
- Computing y_1, ..., y_n from z_1, ..., z_{n/2}:
- y_1 = b_1
- y_{2i} = z_i
- y_{2i-1} = a_{2i-1} y_{2i-2} + b_{2i-1} = a_{2i-1} z_{i-1} + b_{2i-1},
for i > 1
25 Algorithm

- lincur(a, b)
input: a, b, sequences of n coefficients
output: y, the sequence of terms y_1 = b_1, y_i = a_i y_{i-1} + b_i
- begin
1. if n = 1 then y_1 := b_1; exit                  T = O(1), W = O(1)
2. for 1 ≤ i ≤ n/2 pardo
     a'_i := a_{2i} a_{2i-1}
     b'_i := a_{2i} b_{2i-1} + b_{2i}              T = O(1), W = O(n)
3. z := lincur(a', b')                             T = T(n/2), W = W(n/2)
4. for 1 ≤ i ≤ n pardo
     i even → y_i := z_{i/2}
     i = 1  → y_1 := b_1
     else   → y_i := a_i z_{(i-1)/2} + b_i         T = O(1), W = O(n)
- end
- Total: T(n) = O(log n), W(n) = O(n)
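A sequential sketch of lincur (0-indexed lists, so a[i], b[i] hold the (i+1)-th coefficients; n is assumed a power of 2, and the pardo loops become ordinary loops):

```python
def lincur(a, b):
    """Recursive halving for y_1 = b_1, y_i = a_i*y_{i-1} + b_i."""
    n = len(b)
    if n == 1:
        return [b[0]]
    # step 2: contract pairs, a'_i = a_{2i} a_{2i-1}, b'_i = a_{2i} b_{2i-1} + b_{2i}
    a2 = [a[2 * i + 1] * a[2 * i] for i in range(n // 2)]
    b2 = [a[2 * i + 1] * b[2 * i] + b[2 * i + 1] for i in range(n // 2)]
    z = lincur(a2, b2)          # step 3: z_i = y_{2i}
    # step 4: even positions come from z, odd ones from one extra step
    y = [0] * n
    for i in range(n):
        if i % 2 == 1:          # 1-indexed even position 2i
            y[i] = z[i // 2]
        elif i == 0:
            y[0] = b[0]
        else:                   # y_{2i-1} = a_{2i-1} z_{i-1} + b_{2i-1}
            y[i] = a[i] * z[i // 2 - 1] + b[i]
    return y
```

With a[0] unused (y_1 = b_1 has no coefficient), lincur([0, 2, 3, 4], [1, 1, 1, 1]) reproduces the direct recurrence y = 1, 2·1+1, 3·3+1, 4·10+1.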
26 Application of recurrences

- Given a sequence A = (a_0, a_1, ..., a_n) representing the polynomial
p(x) = a_0 x^n + a_1 x^{n-1} + ... + a_{n-1} x + a_n, and a given value
x_0.
- To compute: p(x_0)
- Solution: compute y_n in the linear recurrence
y_0 = a_0;  y_i = x_0 y_{i-1} + a_i,  1 ≤ i ≤ n,
using the previous algorithm, in O(log n) time and O(n) work.
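Sequentially the recurrence is just Horner's rule (a minimal sketch, name `eval_poly` mine); the parallel version simply feeds the constants x_0 and a_i into lincur from the previous slide:

```python
def eval_poly(coeffs, x0):
    """Evaluate p(x) = a_0 x^n + ... + a_n at x0 via
    y_0 = a_0, y_i = x0*y_{i-1} + a_i; then p(x0) = y_n."""
    y = coeffs[0]
    for a in coeffs[1:]:
        y = x0 * y + a
    return y
```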