Title: Parallel Algorithms and Parallel Computers (ii), in4026 Lecture 4

1 Parallel Algorithms and Parallel Computers (ii)
in4026 Lecture 4

- Cees Witteveen, C.Witteveen@tudelft.nl
- Software Technology, Faculty EEMCS, TU-Delft
2 Subjects

- Exercise: discussion of last week's exercise (accelerated cascading).
- Solving Linear Equations: to solve a linear equation Ax = b, Gaussian
Elimination (GE) is used to reduce the equation to Ux = b', where U is
upper triangular. Thereafter back or forward substitution is applied to
solve for x. We discuss a pipelining technique for applying GE and a
divide and conquer approach for solving x from Ux = b'.
- Matrix multiplication, special cases: we discuss matrix multiplication
for boolean matrices and apply it to compute the transitive closure of a
relation.
- Linear Recurrences: we provide some methods to solve linear (matrix)
recurrences and special matrix multiplications.
3 Exercise

- Let A be an (O(n log n), O(log n))-algorithm for P.
- Algorithm B with (O(n), O(log n / log log n)) reduces every instance of
P by a factor c > 1 without affecting the solution.
- Construct an (O(n), O(log n))-algorithm for P.
4 Note

- To prove that O(log n / log log n) + O(log(n/c) / log log(n/c)) + ... =
O(log n), we reason as follows: let x = log log(n/c^i) and y = log log n.
Clearly x ≤ y ≤ p·x for some constant p > 1, and since 2^t / t is
increasing, 2^x / x ≤ 2^y / y, i.e.
log(n/c^i) / log log(n/c^i) ≤ log n / log log n.
- Since the sum has O(log log n) terms, this implies
O(log n / log log n) + O(log(n/c) / log log(n/c)) + ...
= O(log n / log log n) + ... + O(log n / log log n)
= O((log n / log log n) × log log n) = O(log n).
5 Solving Linear Equations
6 Linear systems Ax = b

- Let Ax = b be a system of linear equations.
- Standard construction to compute x:
- reduce A to (upper or lower) triangular form with Gaussian elimination
- apply a simple back or forward substitution algorithm to compute the
values of the unknown variables x_1, ..., x_n
- Construction and discussion of two parallel algorithms:
- first phase: parallelizing Gaussian elimination by a standard
pipelining approach
- second phase: forward substitution, a fast parallel algorithm for
solving linear systems with triangular matrices.
7 Gaussian Elimination (GE)

see Grama, pages 352-357

- GE principle: for k = 1, ..., n, the variable x_k is eliminated from
equations k+1 to n by
- dividing row a_k by a_kk,
- multiplying row a_k by a suitable constant c_{k+i}, and
- then subtracting the result c_{k+i} a_k from row a_{k+i} such that
entry a_{k+i,k} = 0.
[figure: the entries of rows a_k and a_{k+i} that are active in
iteration k; the update rule is
a_{k+i,j} = a_{k+i,j} - a_{k+i,k} × a_{k,j} / a_kk]
8 Gaussian Elimination (GE)

see Grama, pages 352-357

- GE principle: for equation k+i, with k = 1, ..., n and i = 1 to n-k,
the variable x_k is eliminated by
- dividing row a_k by a_kk,
- multiplying row a_k by a suitable constant c_{k+i}, and
- then subtracting the result c_{k+i} a_k from row a_{k+i} such that
entry a_{k+i,k} = 0.
- The resulting matrix is an upper triangular matrix U with u_kk = 1.
- Complexity: we need (n-1) + (n-2) + ... + 1 = n(n-1)/2 divisions plus
(n-1)^2 + (n-2)^2 + ... + 1 = (n-1)n(2n-1)/6 subtractions and
(n-1)n(2n-1)/6 multiplications. The resulting number of operations is
therefore W(n) = (4n+1)n(n-1)/6 = O(n^3).
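As an illustration, the elimination rule above can be coded sequentially as follows (a minimal sketch, not the parallel version; the name `gauss_eliminate` is mine, pivoting is omitted, and all pivots are assumed nonzero):

```python
def gauss_eliminate(a, b):
    """Reduce a x = b to u x = b' with u unit upper triangular.

    Sequential sketch of the slide's rule: row k is divided by a[k][k],
    then a[i][j] -= a[i][k] * a[k][j] for every row i below k.
    """
    n = len(a)
    a = [row[:] for row in a]   # work on copies
    b = b[:]
    for k in range(n):
        pivot = a[k][k]
        for j in range(k, n):   # divide row k by the pivot, so u[k][k] = 1
            a[k][j] /= pivot
        b[k] /= pivot
        for i in range(k + 1, n):   # eliminate x_k from the rows below
            factor = a[i][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    return a, b
```

The triple loop makes the sequential O(n^3) operation count visible.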
9 Gaussian elimination

(this is from Grama)

- Parallelizing GE (naively): use a row-wise 1D partitioning with one
processor per row. The divisions in row k cost n-k-1 time, followed by
n-k-1 multiplications and subtractions. So iteration step k costs:
- 3(n-k-1) (multiplications + divisions + subtractions)
- a one-to-all broadcast to distribute the results to the other rows,
costing (t_s + t_w (n-k-1)) log n
- Computation time (summing over row iterations):
T_p(n) = 3n(n-1)/2 + t_s n log n + t_w n(n-1)/2 × log n = O(n^2 log n)
- Note that the work complexity equals n × T_p(n) = O(n^3 log n).
10 Gaussian elimination

- Pipelining GE.
- We consider the task pertaining to the k-th row as consisting of the
following sequence of subtasks:
- a sequence of k-1 subtasks, where a result from processor j
(c_{kj} × a_j) is subtracted from row a_k, for j = 1, ..., k-1
- a division step, where the elements of row k are divided by a_kk
- a sequence of n-k subtasks, where row a_k is multiplied by a suitable
scalar c_{k+i} and the result is communicated to the task pertaining to
the (k+i)-th row.
- Attach a separate process to each of these tasks. Then the subtasks can
be pipelined. The resulting time complexity is O(n^2). Convince yourself!
11 Solving Ax = b

- Assumptions:
- A (n x n) is nonsingular and triangular (this is the result of phase 1,
Gaussian elimination)
- a_ii = 1 for i = 1, ..., n
- Substitution, the sequential method:
x_1 = b_1, ...,  x_i = b_i - Σ_{j=1..i-1} a_ij x_j
- results in T(n) = O(n^2)
- Not suitable for a fast parallel algorithm. Grama discusses an
O(n^2 / √p) algorithm using p processors. We will discuss another, much
faster method.
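The sequential substitution method can be sketched as follows (assuming a unit lower triangular system, as stated above; the helper name `forward_substitute` is illustrative):

```python
def forward_substitute(a, b):
    """Solve a x = b for a unit lower triangular matrix a.

    Implements x_1 = b_1, x_i = b_i - sum_{j<i} a[i][j] * x[j],
    i.e. the O(n^2) sequential method of the slide.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = b[i]
        for j in range(i):      # subtract the already-solved unknowns
            s -= a[i][j] * x[j]
        x[i] = s
    return x
```

Each x_i depends on all earlier x_j, which is exactly why this formulation does not parallelize well.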
12 Solving Ax = b (ii)

- Alternative approach using divide and conquer:
- assume a triangular matrix A
- compute A^-1 with divide and conquer (to be shown on the next slides):
T(n) = O(log^2 n), W(n) = O(n^3)
- compute x = A^-1 b: T(n) = O(log n), W(n) = O(n^2)
- total cost: T(n) = O(log^2 n), W(n) = O(n^3)
13 A^-1 divide and conquer

- Given an n x n lower triangular A.
- Find a parallel algorithm to compute A^-1.
- Solution:
- partition A into 4 blocks of size n/2 x n/2:
A = [[A_1, 0], [A_2, A_3]]; then we have
A^-1 = [[A_1^-1, 0], [-A_3^-1 A_2 A_1^-1, A_3^-1]]
- Hence computing A^-1 is reduced to computing A_1^-1 and A_3^-1, each
being lower triangular of order n/2 x n/2.
- The recurrence relations for computing T(n) and W(n) are:
T(n) = T(n/2) + O(log n) ⇒ T(n) = O(log^2 n)
W(n) = 2 W(n/2) + O(n^3) ⇒ W(n) = O(n^3)
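The block identity above can be turned into a short recursive routine. The following is a sequential sketch (names `tri_inverse` and `mat_mul` are mine; n is assumed a power of 2 and the diagonal nonzero); on a PRAM the two recursive calls are independent and run in parallel, which is what yields T(n) = O(log^2 n):

```python
def mat_mul(a, b):
    """Plain triple-loop matrix product (stands in for the parallel product)."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def tri_inverse(a):
    """Divide-and-conquer inverse of a lower triangular matrix, using
    inv([[A1,0],[A2,A3]]) = [[inv(A1),0],[-inv(A3)·A2·inv(A1), inv(A3)]]."""
    n = len(a)
    if n == 1:
        return [[1.0 / a[0][0]]]
    h = n // 2
    a1 = [row[:h] for row in a[:h]]    # top-left block
    a2 = [row[:h] for row in a[h:]]    # bottom-left block
    a3 = [row[h:] for row in a[h:]]    # bottom-right block
    i1, i3 = tri_inverse(a1), tri_inverse(a3)   # independent subproblems
    corner = mat_mul(mat_mul(i3, a2), i1)       # A_3^-1 A_2 A_1^-1
    top = [i1[i] + [0.0] * h for i in range(h)]
    bot = [[-x for x in corner[i]] + i3[i] for i in range(h)]
    return top + bot
```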
14 Matrix multiplication: special cases
15 Matrix multiplication

- input: A (n x n), B (n x n), n = 2^k
- output: C = A × B
- begin
1. for 1 ≤ i,j,k ≤ n pardo
     C[i,j,k] := A[i,k] × B[k,j]               T(n) = O(1), W(n) = O(n^3)
2. for h = 1 to log n do
     for 1 ≤ i,j ≤ n, 1 ≤ k ≤ n/2^h pardo
       C[i,j,k] := C[i,j,2k-1] + C[i,j,2k]     T(n) = O(log n), W(n) = O(n^3)
3. for 1 ≤ i,j ≤ n pardo
     C[i,j] := C[i,j,1]                        T(n) = O(1), W(n) = O(n^2)
- end
- Total: T(n) = O(log n), W(n) = O(n^3)
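The three steps above can be simulated sequentially as follows (a sketch for intuition only; the pardo loops become ordinary loops, and the name `pram_matmul` is mine):

```python
import math

def pram_matmul(a, b):
    """Simulate the three-step PRAM product C = A × B for n a power of 2.

    Step 1 forms all n^3 elementary products in one parallel step;
    step 2 halves the k-dimension log n times (a balanced-tree sum);
    step 3 copies out the single remaining partial sum per (i, j).
    """
    n = len(a)
    # step 1: C[i][j][k] = A[i][k] * B[k][j]
    c = [[[a[i][k] * b[k][j] for k in range(n)] for j in range(n)]
         for i in range(n)]
    # step 2: log n rounds of pairwise addition along the k-axis
    for _ in range(int(math.log2(n))):
        c = [[[c[i][j][2 * k] + c[i][j][2 * k + 1]
               for k in range(len(c[i][j]) // 2)]
              for j in range(n)] for i in range(n)]
    # step 3: extract C[i][j][1]
    return [[c[i][j][0] for j in range(n)] for i in range(n)]
```

Sequentially this is just O(n^3) work; the point is that every round of step 2 is a single parallel step.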
16 Matrix product specialisations

- Let A (m x n) and B (n x p) be matrices.
- There exists an (O(m × n × p), O(log n)) parallel algorithm to compute
C = A × B on a CREW-PRAM.
- Comment: this is a direct consequence of the preceding algorithm.
17 Matrix product specialisations

- If A and B are boolean and C = A × B, then
c_ij = (a_i1 ∧ b_1j) ∨ (a_i2 ∧ b_2j) ∨ ... ∨ (a_in ∧ b_nj).
There exists an (O(m × n × p), O(1)) parallel algorithm to compute
C = A × B on a CRCW-PRAM.
- Comment: the boolean OR of the n and-terms (a_ik ∧ b_kj) can be
computed on a CRCW-PRAM in O(1) time and O(n) work. There are m × p such
combinations to be computed (in parallel). (See next slide.)
18 Multiplication of boolean matrices

- input: A (n x n), B (n x n), boolean matrices, n = 2^k
- output: C = A × B, a boolean matrix
- begin
1. for 1 ≤ i,j ≤ n pardo C[i,j] := 0                          T(n) = O(1), W(n) = O(n^2)
2. for 1 ≤ i,j,k ≤ n pardo C[i,j,k] := A[i,k] ∧ B[k,j]        T(n) = O(1), W(n) = O(n^3)
3. for 1 ≤ i,j,k ≤ n pardo if C[i,j,k] = 1 then C[i,j] := 1   T(n) = O(1), W(n) = O(n^3)
- end
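A sequential sketch of this boolean product (the name `bool_matmul` is mine): the CRCW trick is that all processors whose and-term is 1 write concurrently into C[i,j], which `any()` merely simulates here.

```python
def bool_matmul(a, b):
    """Boolean product: c[i][j] = OR over k of (a[i][k] AND b[k][j]).

    On a CRCW-PRAM the n-way OR takes O(1) time; sequentially we just
    evaluate it with any().
    """
    n = len(a)
    return [[int(any(a[i][k] and b[k][j] for k in range(n)))
             for j in range(n)] for i in range(n)]
```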
19 Matrix product specialisations

- If A is n x n and m ≤ 2^s, then there exists an (O(s × n^3), O(s log n))-
algorithm to compute A^m.
- Comment: A^m can be computed with O(s) matrix multiplications, since
log m ≤ s, by an obvious recursive algorithm; each multiplication costs
O(log n) time and O(n^3) work.

function power(A, m)
var Z
begin
  if m = 1 then return A
  else if m ≡ 0 mod 2 then
    Z := power(A, m/2); return Z × Z
  else
    return A × power(A, m-1)
end
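The repeated-squaring pseudocode above translates directly to Python (a sketch; `mat_mul` is a plain sequential product standing in for the parallel one):

```python
def mat_mul(a, b):
    """Ordinary matrix product; the parallel algorithm replaces this step."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(a, m):
    """A^m by repeated squaring: O(log m) multiplications."""
    if m == 1:
        return a
    if m % 2 == 0:
        z = mat_power(a, m // 2)
        return mat_mul(z, z)
    return mat_mul(a, mat_power(a, m - 1))
```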
20 Matrix product specialisations

- If A (n x n) is boolean and m ≤ 2^s, then there exists an
(O(s × n^3), O(s))-algorithm to compute A^m on a CRCW-PRAM.
- Comment: combine the previous results:
(i) boolean matrix multiplication can be performed in O(1) time and
O(n^3) work;
(ii) computing A^m requires log m ≤ s iterations of such matrix
multiplications.
21 Application: transitive closure

- Let G = (V, E) be a directed graph.
- The boolean adjacency matrix B (n x n) associated with G is defined as
follows:
B_ij = 1 if (v_i, v_j) ∈ E, and B_ij = 0 otherwise.
- The transitive closure G* = (V, E*) of G is defined by:
(v, w) ∈ E* iff v = w or there is a directed path from v to w in G.
- Problem: compute B* (the adjacency matrix associated with G*) from B.
22 Example: transitive closure

- Observe: B^k is the matrix with the following property:
(B^k)_ij = 1 iff there exists a path of length k from v_i to v_j in G.
Proof: by easy induction on the path length k, observing that there
exists a path of length k+1 from i to j iff there exists some node m,
1 ≤ m ≤ n, such that there exists a path of length k from i to m and an
edge from m to j.
- Hence, the adjacency matrix B* associated with G* equals
B* = I ∨ B ∨ B^2 ∨ ... ∨ B^n = (I ∨ B)^n.
- Hence there exists an (O(n^3 log n), O(log n))-algorithm to compute B*
from B (n x n).
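Computing (I ∨ B)^n by repeatedly squaring a boolean matrix can be sketched as follows (the name `transitive_closure` is mine; log n squarings suffice because once the exponent reaches n the matrix no longer changes):

```python
def transitive_closure(b):
    """Compute B* = (I OR B)^n via repeated boolean squaring."""
    n = len(b)
    # m := I OR B, so powers of m accumulate paths of all shorter lengths
    m = [[1 if i == j else b[i][j] for j in range(n)] for i in range(n)]
    p = 1
    while p < n:                # square until the exponent is >= n
        m = [[int(any(m[i][k] and m[k][j] for k in range(n)))
              for j in range(n)] for i in range(n)]
        p *= 2
    return m
```

With the O(1)-time CRCW boolean product of slide 18, each squaring is one parallel step of O(n^3) work, giving the (O(n^3 log n), O(log n)) bound.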
23 Linear recurrences

- y_1 = b_1;  y_i = a_i y_{i-1} + b_i,  2 ≤ i ≤ n
- Remark: if a_i = 1 then this problem reduces to computing the prefix
sums of (b_1, b_2, ..., b_n).
- (Unfolding) Let
y_{2i} = a_{2i} y_{2i-1} + b_{2i}
       = a_{2i} (a_{2i-1} y_{2i-2} + b_{2i-1}) + b_{2i}
       = a_{2i} a_{2i-1} y_{2i-2} + (a_{2i} b_{2i-1} + b_{2i})
       = a'_i y_{2(i-1)} + b'_i
- Define z_i = y_{2i}, 1 ≤ i ≤ n/2; then
z_1 = b'_1;  z_i = a'_i z_{i-1} + b'_i,  2 ≤ i ≤ n/2
24 Linear recurrences

- y_1 = b_1;  y_i = a_i y_{i-1} + b_i,  2 ≤ i ≤ n
- z_1 = b'_1;  z_i = a'_i z_{i-1} + b'_i,  2 ≤ i ≤ n/2,
where a'_i = a_{2i} a_{2i-1}, b'_i = a_{2i} b_{2i-1} + b_{2i}, and
z_i = y_{2i}.
- Computing y_1, ..., y_n from z_1, ..., z_{n/2}:
- y_1 = b_1
- y_{2i} = z_i
- y_{2i-1} = a_{2i-1} y_{2i-2} + b_{2i-1} = a_{2i-1} z_{i-1} + b_{2i-1},
for i > 1
25 Algorithm

- lincur(a, b)
input: a, b, sequences of n coefficients
output: y, the sequence of terms y_1 = b_1, y_i = a_i y_{i-1} + b_i
- begin
1. if n = 1 then y_1 := b_1; exit                  T = O(1), W = O(1)
2. for 1 ≤ i ≤ n/2 pardo
     a'_i := a_{2i} a_{2i-1}
     b'_i := a_{2i} b_{2i-1} + b_{2i}              T = O(1), W = O(n)
3. z := lincur(a', b')                             T = T(n/2), W = W(n/2)
4. for 1 ≤ i ≤ n pardo
     i even → y_i := z_{i/2}
     i = 1  → y_1 := b_1
     else   → y_i := a_i z_{(i-1)/2} + b_i         T = O(1), W = O(n)
- end
- Total: T(n) = O(log n), W(n) = O(n)
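A sequential sketch of lincur (0-indexed lists, so a[i], b[i] hold the (i+1)-th coefficients; n is assumed a power of 2, and the pardo loops become ordinary loops):

```python
def lincur(a, b):
    """Recursive halving for y_1 = b_1, y_i = a_i*y_{i-1} + b_i."""
    n = len(b)
    if n == 1:
        return [b[0]]
    # step 2: contract pairs, a'_i = a_{2i} a_{2i-1}, b'_i = a_{2i} b_{2i-1} + b_{2i}
    a2 = [a[2 * i + 1] * a[2 * i] for i in range(n // 2)]
    b2 = [a[2 * i + 1] * b[2 * i] + b[2 * i + 1] for i in range(n // 2)]
    z = lincur(a2, b2)          # step 3: z_i = y_{2i}
    # step 4: even positions come from z, odd ones from one extra step
    y = [0] * n
    for i in range(n):
        if i % 2 == 1:          # 1-indexed even position 2i
            y[i] = z[i // 2]
        elif i == 0:
            y[0] = b[0]
        else:                   # y_{2i-1} = a_{2i-1} z_{i-1} + b_{2i-1}
            y[i] = a[i] * z[i // 2 - 1] + b[i]
    return y
```

With a[0] unused (y_1 = b_1 has no coefficient), lincur([0, 2, 3, 4], [1, 1, 1, 1]) reproduces the direct recurrence y = 1, 2·1+1, 3·3+1, 4·10+1.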
26 Application of recurrences

- Given a sequence A = (a_0, a_1, ..., a_n) representing the polynomial
p(x) = a_0 x^n + a_1 x^{n-1} + ... + a_{n-1} x + a_n, and a given value
x_0.
- To compute: p(x_0)
- Solution: compute y_n in the linear recurrence
y_0 = a_0;  y_i = x_0 y_{i-1} + a_i,  1 ≤ i ≤ n,
using the previous algorithm, in O(log n) time and O(n) work.
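Sequentially the recurrence is just Horner's rule (a minimal sketch, name `eval_poly` mine); the parallel version simply feeds the constants x_0 and a_i into lincur from the previous slide:

```python
def eval_poly(coeffs, x0):
    """Evaluate p(x) = a_0 x^n + ... + a_n at x0 via
    y_0 = a_0, y_i = x0*y_{i-1} + a_i; then p(x0) = y_n."""
    y = coeffs[0]
    for a in coeffs[1:]:
        y = x0 * y + a
    return y
```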