Parallel Algorithms and Parallel Computers ii in4026 Lecture 4

Slides: 26
Provided by: ceeswit
1
Parallel Algorithms and Parallel Computers
(ii) in4026 Lecture 4
  • Cees Witteveen, C.Witteveen_at_tudelft.nl
  • Software Technology, Faculty EEMCS, TU-Delft

2
Subjects
  • Exercise: discussion of last week's exercise
    (accelerated cascading).
  • Solving linear equations: to solve a linear
    system $Ax = b$, Gaussian Elimination (GE) is
    used to reduce the system to $Ux = b'$, where $U$ is
    upper-triangular and $b'$ is the correspondingly
    transformed right-hand side. Thereafter back- or forward
    substitution is applied to solve for $x$. We discuss a
    pipelining technique for applying GE and a
    divide-and-conquer approach for solving $x$ from $Ux = b'$.
  • Matrix multiplication, special cases: we discuss
    matrix multiplication for boolean matrices and
    apply it to compute the transitive closure of a
    relation.
  • Linear recurrences: we provide some methods to
    solve linear (matrix) recurrences and special
    matrix multiplications.

3
Exercise
  • Let A be an $(O(n \log n), O(\log n))$-algorithm for
    P, where the pair gives (work, time).
  • Algorithm B, with $(O(n), O(\log n / \log\log n))$,
    reduces every instance of P in size by a constant
    factor $c > 1$ without affecting the solution.
  • Construct an $(O(n), O(\log n))$-algorithm for P.

4
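  • Note
  • To prove that
    $O(\log n / \log\log n) + O(\log(n/c) / \log\log(n/c)) + \dots = O(\log n)$
    we reason as follows: let $x = \log\log(n/c^i)$
    and $y = \log\log n$. Clearly we have $y = p \cdot x$ for
    some $p > 1$. Hence $2^x / x \le 2^{x p} / (x p) = 2^y / y$
    (the function $2^x / x$ is increasing for $x \ge 2$), so
    $\log(n/c^i) / \log\log(n/c^i) \le \log n / \log\log n$.
  • This implies, for the $\log\log n$ terms of the sum,
    $O(\log n / \log\log n) + O(\log(n/c) / \log\log(n/c)) + \dots$
    $\le O(\log n / \log\log n) + O(\log n / \log\log n) + \dots + O(\log n / \log\log n)$
    $= O((\log n / \log\log n) \times \log\log n) = O(\log n)$.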

5
Solving Linear Equations
6
Linear systems $Ax = b$
  • Let $Ax = b$ be a system of linear equations.
  • Standard construction to compute $x$:
      • reduce $A$ to (upper or lower) triangular form
        with Gaussian elimination;
      • apply a simple back or forward substitution
        algorithm to compute the values of the unknown
        variables $x_1, \dots, x_n$.
  • Construction and discussion of two parallel
    algorithms:
      • first phase: parallelizing Gaussian elimination by
        a standard pipelining approach;
      • second phase (forward substitution): a fast
        parallel algorithm for solving linear systems
        with triangular matrices.

7
Gaussian Elimination (GE)
see Grama, pages 352 - 357
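  • GE principle: for $k = 1, \dots, n$, the variable $x_k$
    is eliminated from equations $k+1$ to $n$ by
      • dividing row $a_k$ by $a_{kk}$,
      • multiplying row $a_k$ by a suitable constant $c_{k+i}$,
        and
      • then subtracting the result $c_{k+i} a_k$ from row $a_{k+i}$
        such that entry $a_{k+i,k} = 0$.

[Figure: rows $a_k = (a_{k,1}, a_{k,2}, \dots, a_{k,k}, a_{k,k+1}, \dots, a_{k,n})$ and $a_{k+i} = (a_{k+i,1}, a_{k+i,2}, \dots)$, with the update $a_{k+i,j} := a_{k+i,j} - a_{k+i,k} \times a_{k,j} / a_{kk}$.]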
8
Gaussian Elimination (GE)
see Grama, pages 352 - 357
  • GE principle: for equation $k$, $k = 1, \dots, n$, the
    variable $x_k$ is eliminated from equation $k+i$, for
    $i = 1$ to $n - k$, by
      • dividing row $a_k$ by $a_{kk}$,
      • multiplying row $a_k$ by a suitable constant $c_{k+i}$,
        and
      • then subtracting the result $c_{k+i} a_k$ from row $a_{k+i}$
        such that entry $a_{k+i,k} = 0$.
  • The resulting matrix is an upper triangular
    matrix $U$ with $u_{kk} = 1$.
  • Complexity: we need $(n-1) + \dots + 1 = n(n-1)/2$
    divisions plus $(n-1)^2 + \dots + 1 = (n-1)n(2n-1)/6$
    subtractions and $(n-1)n(2n-1)/6$
    multiplications. The resulting number of
    operations is therefore $W(n) = (4n+1)n(n-1)/6 = O(n^3)$.
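The operation counts on this slide can be checked with a small sequential sketch. It is an illustration, not the lecture's code: the function name `gaussian_eliminate`, the exact `Fraction` arithmetic and the absence of pivoting (every pivot is assumed nonzero) are assumptions made here. Only operations on entries of A to the right of the pivot column are counted, which is the counting that reproduces the formula for W(n).

```python
from fractions import Fraction


def gaussian_eliminate(a):
    """Reduce a square matrix to unit upper-triangular form, counting operations.

    `a` is a list of rows; it is copied, not modified. No pivoting is done,
    so every pivot met along the way must be nonzero (an assumption of this
    sketch). Returns (U, counts).
    """
    n = len(a)
    u = [[Fraction(v) for v in row] for row in a]
    counts = {"div": 0, "mul": 0, "sub": 0}
    for k in range(n):
        pivot = u[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        # Divide row k by a_kk (entries right of the diagonal only).
        for j in range(k + 1, n):
            u[k][j] /= pivot
            counts["div"] += 1
        u[k][k] = Fraction(1)
        # Eliminate x_k from every later row.
        for i in range(k + 1, n):
            c = u[i][k]  # multiplier for row i
            for j in range(k + 1, n):
                u[i][j] -= c * u[k][j]
                counts["mul"] += 1
                counts["sub"] += 1
            u[i][k] = Fraction(0)
    return u, counts
```

For a 3 × 3 input the three counters add up to 13, which equals (4·3+1)·3·2/6.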

9
Gaussian elimination
(this is from Grama)
  • Parallelizing GE (naively): use a row-wise 1D
    partitioning with one processor per
    row. Divisions per row $k$ cost $n-k-1$ time,
    followed by $n-k-1$ multiplications and
    subtractions. So iteration step $k$ costs
      • $3(n-k-1)$ for computation (multiplications + divisions +
        subtractions), and
      • a one-to-all broadcast to
        distribute the results of multiplications to
        other rows, costing $(t_s + t_w(n-k-1)) \log n$, where
        $t_s$ is the message startup time and $t_w$ the
        per-word transfer time.
  • Computation time (summing over row iterations), with $p = n$ processors:
    $T_p(n) = 3n(n-1)/2 + t_s n \log n + t_w (n(n-1)/2) \log n = O(n^2 \log n)$.
  • Note that the work complexity equals $n \times T_p(n) = O(n^3 \log n)$.
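The closed form follows by summing the per-iteration costs. A short derivation, assuming the iterations are indexed $k = 0, \dots, n-1$ (an indexing chosen here so that the slide's $n-k-1$ counts add up to the stated totals):

```latex
\begin{align*}
T_p(n) &= \sum_{k=0}^{n-1} \Bigl[ 3(n-k-1) + \bigl(t_s + t_w (n-k-1)\bigr)\log n \Bigr] \\
       &= 3\,\frac{n(n-1)}{2} + t_s\, n \log n + t_w\, \frac{n(n-1)}{2} \log n \\
       &= O(n^2 \log n),
\end{align*}
```

using $\sum_{k=0}^{n-1} (n-k-1) = n(n-1)/2$.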

10
Gaussian elimination
  • Pipelining GE
  • We consider the task pertaining to the $k$-th row as
    consisting of the following sequence of
    subtasks:
      • a sequence of $k-1$ subtasks, where a result from
        processor $j$ ($c_k \times a_j$) is subtracted from row
        $a_k$, for $j = 1, \dots, k-1$;
      • a division step where the elements of row $k$ are
        divided by $a_{kk}$;
      • a sequence of $n-k$ subtasks where row $a_k$ is
        multiplied with a suitable scalar $c_{k+i}$ and the
        result is communicated to the task pertaining to
        the $(k+i)$-th row.
  • Attach a separate process to each of these
    tasks. Then the subtasks can be pipelined. The
    resulting time complexity is $O(n^2)$. Convince
    yourself!

11
solving $Ax = b$
  • Assumptions (this is the result of phase 1, Gaussian elimination):
      • $A_{n \times n}$ is non-singular and triangular
      • $a_{ii} = 1$ for $i = 1, \dots, n$
  • Substitution, sequential method:
      • $x_1 = b_1$, ..., $x_i = b_i - \sum_{j=1}^{i-1} a_{ij} x_j$
      • results in $T(n) = O(n^2)$
      • not suitable for a fast parallel algorithm. Grama
        discusses an $O(n^2 / \sqrt{p})$ algorithm using $p$
        processors. We will discuss another, much faster
        method.
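The sequential substitution method can be sketched as follows. This is a minimal illustration under the slide's assumptions (lower-triangular A with unit diagonal); the function name `forward_substitution` and the list-of-rows representation are choices made here.

```python
def forward_substitution(a, b):
    """Solve A x = b for a lower-triangular A with unit diagonal.

    Only the entries strictly below the diagonal of `a` are read; the
    diagonal is assumed to be 1, as on the slide.
    """
    n = len(b)
    x = [0] * n
    for i in range(n):
        # x_i = b_i - sum over j < i of a_ij * x_j  (0-based indices here)
        x[i] = b[i] - sum(a[i][j] * x[j] for j in range(i))
    return x
```

Each x_i needs all earlier values, which is why the loop over i is inherently sequential and costs O(n^2) in total.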
12
solving $Ax = b$ (ii)
  • Alternative approach using divide and conquer:
      • assume a triangular matrix $A$
      • compute $A^{-1}$ with divide and conquer (to be shown
        in the next slides): $T(n) = O(\log^2 n)$, $W(n) = O(n^3)$
      • compute $x = A^{-1} b$: $T(n) = O(\log n)$, $W(n) = O(n^2)$
      • total cost: $T(n) = O(\log^2 n)$, $W(n) = O(n^3)$
13
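$A^{-1}$: divide and conquer
  • Given an $n \times n$ lower triangular $A$.
  • Find a parallel algorithm to compute $A^{-1}$.
  • Solution:
      • partition $A$ into 4 blocks of size $n/2 \times n/2$,
        $A = \begin{pmatrix} A_1 & 0 \\ A_2 & A_3 \end{pmatrix}$; then
        we have $A^{-1} = \begin{pmatrix} A_1^{-1} & 0 \\ -A_3^{-1} A_2 A_1^{-1} & A_3^{-1} \end{pmatrix}$.
      • Hence computing $A^{-1}$ is reduced to computing $A_1^{-1}$
        and $A_3^{-1}$, each being lower triangular of order
        $n/2 \times n/2$, followed by two matrix products.
      • The recurrence relations for computing $T(n)$ and
        $W(n)$ are
        $T(n) = T(n/2) + O(\log n) \Rightarrow T(n) = O(\log^2 n)$,
        $W(n) = 2W(n/2) + O(n^3) \Rightarrow W(n) = O(n^3)$.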
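A sequential sketch of the block recursion, for illustration only. It assumes the standard block formula for the inverse of a lower-triangular matrix, with diagonal blocks A1 and A3 and an off-diagonal block called A2 here (that name, the helper `matmul` and the use of `Fraction` are assumptions of this sketch). The two recursive calls are independent, which is what a parallel version would exploit.

```python
from fractions import Fraction


def matmul(x, y):
    """Plain sequential matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def invert_lower_triangular(a):
    """Invert a non-singular lower-triangular matrix by block recursion.

    The slide assumes the order n is a power of two, so that every split
    yields n/2 x n/2 blocks.
    """
    n = len(a)
    if n == 1:
        return [[Fraction(1) / Fraction(a[0][0])]]
    h = n // 2
    a1 = [row[:h] for row in a[:h]]   # top-left, lower triangular
    a2 = [row[:h] for row in a[h:]]   # bottom-left, arbitrary
    a3 = [row[h:] for row in a[h:]]   # bottom-right, lower triangular
    # Independent subproblems: these could run in parallel.
    a1_inv = invert_lower_triangular(a1)
    a3_inv = invert_lower_triangular(a3)
    # Bottom-left block of the inverse: -A3^{-1} A2 A1^{-1}.
    prod = matmul(matmul(a3_inv, a2), a1_inv)
    bottom_left = [[-v for v in row] for row in prod]
    top = [r1 + [Fraction(0)] * (n - h) for r1 in a1_inv]
    bottom = [bl + r3 for bl, r3 in zip(bottom_left, a3_inv)]
    return top + bottom
```

Given the inverse, x is obtained with one matrix-vector product, which is where the O(log n) time and O(n^2) work of the second step come from.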

14
matrix multiplication special cases
15
Matrix multiplication
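  • input: $A_{n \times n}$, $B_{n \times n}$, $n = 2^k$
  • output: $C = A \times B$
  • begin
      1. for $1 \le i, j, k \le n$ pardo $C_{i,j,k} := A_{i,k} \times B_{k,j}$
         [$T(n) = O(1)$, $W(n) = O(n^3)$]
      2. for $h = 1$ to $\log n$ do
           for $1 \le i, j \le n$, $1 \le k \le n/2^h$ pardo $C_{i,j,k} := C_{i,j,2k-1} + C_{i,j,2k}$
         [$T(n) = O(\log n)$, $W(n) = O(n^3)$]
      3. for $1 \le i, j \le n$ pardo $C_{i,j} := C_{i,j,1}$
         [$T(n) = O(1)$, $W(n) = O(n^2)$]
    end
  • Total: $T(n) = O(\log n)$, $W(n) = O(n^3)$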
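The three steps can be simulated sequentially as a sanity check. This is a sketch, not a PRAM program: each loop nest stands in for one pardo step, indices are 0-based, and the function name `pram_matmul` is chosen here.

```python
def pram_matmul(a, b):
    """Sequentially simulate the three pardo steps for n x n inputs.

    Assumes n is a power of two, as on the slide.
    """
    n = len(a)
    # Step 1: all n^3 products at once.
    c3 = [[[a[i][k] * b[k][j] for k in range(n)]
           for j in range(n)] for i in range(n)]
    # Step 2: log n rounds of pairwise sums (balanced binary tree).
    width = n
    while width > 1:
        width //= 2
        for i in range(n):
            for j in range(n):
                # Build the new prefix first so that all reads precede
                # the writes, mimicking one synchronous PRAM step.
                new = [c3[i][j][2 * k] + c3[i][j][2 * k + 1]
                       for k in range(width)]
                c3[i][j][:width] = new
    # Step 3: the total for entry (i, j) now sits in position 0.
    return [[c3[i][j][0] for j in range(n)] for i in range(n)]
```

The `while` loop runs log n times, matching the O(log n) time bound of step 2; the work per round halves, so the total work stays O(n^3).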
16
Matrix product specialisations
  • Let $A_{m \times n}$ and $B_{n \times p}$ be matrices.
  • There exists an $(O(m \times n \times p), O(\log n))$
    parallel algorithm to compute $C = A \times B$ on a
    CREW-PRAM.
  • Comment: this is a direct consequence of the
    preceding algorithm.

17
Matrix product specialisations
  • If $A$ and $B$ are boolean, and $C = A \times B$, then
    $c_{ij} = (a_{i1} \wedge b_{1j}) \vee (a_{i2} \wedge b_{2j}) \vee \dots \vee (a_{in} \wedge b_{nj})$.
    There exists an $(O(m \times n \times p), O(1))$ parallel
    algorithm to compute $C = A \times B$ on a CRCW-PRAM.
  • Comment: the boolean or of $n$ and-terms
    $(a_{ik} \wedge b_{kj})$ can be computed on a CRCW-PRAM in
    $O(1)$-time and $O(n)$-work. There are $m \times p$ such
    combinations to be computed (in parallel). (See
    next slide.)

18
multiplication of boolean matrices
input: $A_{n \times n}$, $B_{n \times n}$, boolean matrices, $n = 2^k$
output: $C = A \times B$, boolean matrix
begin
  1. for $1 \le i, j \le n$ pardo $C_{i,j} := 0$
     [$T(n) = O(1)$, $W(n) = O(n^2)$]
  2. for $1 \le i, j, k \le n$ pardo $C_{i,j,k} := A_{i,k} \wedge B_{k,j}$
     [$T(n) = O(1)$, $W(n) = O(n^3)$]
  3. for $1 \le i, j, k \le n$ pardo if $C_{i,j,k} = 1$ then $C_{i,j} := 1$
     [$T(n) = O(1)$, $W(n) = O(n^3)$]
end
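A sequential simulation of the constant-time boolean product, as a sketch. On a CRCW-PRAM every processor that writes to the same entry of C writes the same value 1, so the concurrent writes never conflict; here the triple loop merely plays the role of the n^3 processors. The function name `boolean_matmul` and the 0/1 list-of-rows representation are assumptions of this sketch.

```python
def boolean_matmul(a, b):
    """Sequentially simulate the O(1)-time CRCW boolean matrix product.

    Matrices are lists of rows holding 0/1 entries.
    """
    n = len(a)
    # Step 1: clear the result.
    c = [[0] * n for _ in range(n)]
    # Steps 2 and 3: "processor" (i, j, k) forms one and-term and,
    # if it is 1, writes 1 into C[i][j].
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if a[i][k] & b[k][j]:
                    c[i][j] = 1
    return c
```

No summation tree is needed because an or over n terms only has to detect a single 1, which is what removes the log n factor from the running time.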
19
Matrix product specialisations
  • If $A$ is $n \times n$ and $m = 2^s$, then there exists an
    $(O(s \times n^3), O(s \log n))$-algorithm to compute $A^m$.
  • Comment:
      • $A^m$ can be computed in $\log m = s$ iterations by
        an obvious iterative algorithm using $s$ matrix
        multiplications, each costing $O(\log n)$-time and
        $O(n^3)$-work.

function power(A, m)
var Z
begin
  if m = 1 then return A
  else if m ≡ 0 mod 2 then
    Z := power(A, m/2)
    return Z × Z
  else return A × power(A, m-1)
end
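The `power` function translates directly into runnable form. In this sketch the helper `matmul` is a plain sequential product standing in for the parallel matrix multiplication, and matrices are lists of rows; both are choices made here for illustration.

```python
def matmul(x, y):
    """Plain sequential matrix product of two square lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def power(a, m):
    """Return the m-th power of matrix a (integer m >= 1) by repeated squaring."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    if m == 1:
        return a
    if m % 2 == 0:
        z = power(a, m // 2)
        return matmul(z, z)
    return matmul(a, power(a, m - 1))
```

When m is a power of two, only the squaring branch is taken, so exactly log m multiplications are performed.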
20
Matrix product specialisations
  • If $A$ is $n \times n$ and boolean, and $m = 2^s$, then there
    exists an $(O(s \times n^3), O(s))$-algorithm to
    compute $A^m$ on a CRCW-PRAM.
  • Comment: combine the previous results:
      • (i) boolean matrix multiplication can be performed
        in $O(1)$-time and $O(n^3)$-work;
      • (ii) computing $A^m$ requires $s = \log m$ iterations
        of such matrix multiplications.

21
Application: transitive closure
  • Let $G = (V, E)$ be a directed graph.
  • The boolean incidence-matrix $B_{n \times n}$ associated with
    $G$ is defined as follows: $B_{i,j} = 1$ if $(v_i, v_j) \in E$,
    and $B_{i,j} = 0$ otherwise.
  • The transitive closure $G^* = (V, E^*)$ of $G$ is
    defined by: $(v, w) \in E^*$ iff $v = w$ or
    there exists a directed path from $v$ to $w$ in $G$.
  • Problem: compute $B^*$ (the incidence matrix
    associated with $G^*$) from $B$.

22
Example: transitive closure
  • Observe: $B^k$ is the matrix with the following
    property: $(B^k)_{i,j} = 1$ iff there exists
    a path of length $k$ from $v_i$ to $v_j$ in $G$.
  • Proof: by easy induction on the path
    length $k$, observing that
    there exists a path of length
    $k+1$ from $i$ to $j$ iff there exists some node $m$,
    $1 \le m \le n$, such that there exists a path
    of length $k$ from $i$ to $m$ and an
    edge from $m$ to $j$.
  • Hence, the incidence-matrix $B^*$ associated with $G^*$
    equals $B^* = I \vee B \vee B^2 \vee \dots \vee B^n = (I \vee B)^n$.
  • Hence, by the preceding result on powers of boolean
    matrices, there exists an $(O(n^3 \log n), O(\log n))$-algorithm
    to compute $B^*$ from $B_{n \times n}$.
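The closure can be computed by repeatedly squaring the boolean matrix I ∨ B. The sketch below is a sequential illustration: `bool_matmul` is a plain boolean product standing in for the constant-time CRCW version, and the function names are chosen here. It squares until the exponent reaches at least n, which is safe because raising I ∨ B to any power of n - 1 or more gives the same matrix.

```python
def bool_matmul(x, y):
    """Boolean matrix product: or over k of (x[i][k] and y[k][j])."""
    n = len(x)
    return [[int(any(x[i][k] and y[k][j] for k in range(n)))
             for j in range(n)] for i in range(n)]


def transitive_closure(b):
    """Return the reflexive-transitive closure of the 0/1 matrix b."""
    n = len(b)
    # Start from M = I or B.
    m = [[1 if i == j else int(bool(b[i][j])) for j in range(n)]
         for i in range(n)]
    exponent = 1
    while exponent < n:      # about log n squarings
        m = bool_matmul(m, m)
        exponent *= 2
    return m
```

For a chain v1 → v2 → v3 → v4 the result is the upper-triangular all-ones matrix, obtained after two squarings.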

23
Linear recurrences
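  • $y_1 = b_1$, $y_i = a_i y_{i-1} + b_i$, $2 \le i \le n$
  • Remark: if $a_i = 1$ then this problem reduces to
    computing prefix-sums of $(b_1, b_2, \dots, b_n)$.
  • (Unfolding) Let
    $y_{2i} = a_{2i} y_{2i-1} + b_{2i}$
    $= a_{2i} (a_{2i-1} y_{2i-2} + b_{2i-1}) + b_{2i}$
    $= a_{2i} a_{2i-1} y_{2i-2} + (a_{2i} b_{2i-1} + b_{2i})$
    $= a'_i y_{2(i-1)} + b'_i$,
    where $a'_i = a_{2i} a_{2i-1}$ and $b'_i = a_{2i} b_{2i-1} + b_{2i}$.
  • Define $z_i = y_{2i}$, $1 \le i \le n/2$. Then
    $z_1 = b'_1$, $z_i = a'_i z_{i-1} + b'_i$, $2 \le i \le n/2$.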
24
Linear recurrences
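  • $y_1 = b_1$, $y_i = a_i y_{i-1} + b_i$, $2 \le i \le n$
  • $z_1 = b'_1$, $z_i = a'_i z_{i-1} + b'_i$, $2 \le i \le n/2$,
    with $a'_i = a_{2i} a_{2i-1}$, $b'_i = a_{2i} b_{2i-1} + b_{2i}$ and $z_i = y_{2i}$
  • Computing $(y_i)_{i=1..n}$ using $(z_i)_{i=1..n/2}$:
      • $y_1 = b_1$ (case $i = 1$)
      • $y_{2i} = z_i$
      • $y_{2i-1} = a_{2i-1} y_{2i-2} + b_{2i-1} = a_{2i-1} z_{i-1} + b_{2i-1}$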
25
algorithm
  • lincur(a, b)
    input: a, b, sequences of n coefficients
    output: y, the sequence of terms
    $y_1 = b_1$, $y_i = a_i y_{i-1} + b_i$
  • begin
      1. if $n = 1$ then $y_1 := b_1$; exit
         [$T = O(1)$, $W = O(1)$]
      2. for $1 \le i \le n/2$ pardo
           $a'_i := a_{2i} a_{2i-1}$; $b'_i := a_{2i} b_{2i-1} + b_{2i}$
         [$T = O(1)$, $W = O(n)$]
      3. z := lincur(a', b')
         [$T = T(n/2)$, $W = W(n/2)$]
      4. for $1 \le i \le n$ pardo
           $i$ even → $y_i := z_{i/2}$
           $i = 1$ → $y_1 := b_1$
           else → $y_i := a_i z_{(i-1)/2} + b_i$
         [$T = O(1)$, $W = O(n)$]
    end
  • Total: $T(n) = O(\log n)$, $W(n) = O(n)$
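A sequential Python rendering of `lincur`, as a sketch. The pardo loops are simulated by ordinary loops and comprehensions, and the lists are 0-based, so `a[i - 1]` holds a_i. The slide presumably takes n to be a power of two; this version also accepts other n >= 1, which is an extension made here.

```python
def lincur(a, b):
    """Solve y_1 = b_1, y_i = a_i * y_{i-1} + b_i for i = 2..n.

    `a` and `b` are lists of equal length n >= 1. The value of a[0]
    (that is, a_1) does not affect the result, but must be a number.
    """
    n = len(b)
    if n == 1:
        return [b[0]]
    half = n // 2
    # Step 2: fold each pair (2i-1, 2i) into one coefficient pair.
    a2 = [a[2 * i + 1] * a[2 * i] for i in range(half)]
    b2 = [a[2 * i + 1] * b[2 * i] + b[2 * i + 1] for i in range(half)]
    # Step 3: the recursive call yields z_i = y_{2i}.
    z = lincur(a2, b2)
    # Step 4: recover every y_i from z.
    y = [None] * n
    for i in range(1, n + 1):          # i is the 1-based index
        if i % 2 == 0:
            y[i - 1] = z[i // 2 - 1]
        elif i == 1:
            y[0] = b[0]
        else:
            y[i - 1] = a[i - 1] * z[(i - 1) // 2 - 1] + b[i - 1]
    return y
```

With all a_i equal to 1 the function returns the prefix sums of b, in line with the earlier remark on prefix sums.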
26
application of recurrences
  • Given a sequence $A = (a_0, a_1, \dots, a_n)$
    representing the polynomial
    $p(x) = a_0 x^n + a_1 x^{n-1} + \dots + a_{n-1} x + a_n$
    and a given value $x_0$.
  • To compute: $p(x_0)$.
  • Solution: compute $y_n$ in the linear recurrence
    $y_0 = a_0$, $y_i = x_0 y_{i-1} + a_i$, $1 \le i \le n$,
    using the previous algorithm in $O(\log n)$ time
    and $O(n)$ work.
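A minimal sketch of the reduction only. It builds the two coefficient sequences of the recurrence and, for brevity, evaluates the recurrence with a plain sequential loop where the parallel recurrence solver would be called; that substitution, and the function name `horner_as_recurrence`, are choices made here.

```python
def horner_as_recurrence(coeffs, x0):
    """Evaluate p(x0), where coeffs = (a_0, ..., a_n), highest degree first.

    Sets up y_0 = a_0, y_i = x0 * y_{i-1} + a_i and returns y_n. The loop
    is a sequential stand-in for the parallel solver.
    """
    multipliers = [x0] * len(coeffs)   # the recurrence multipliers are all x0
    offsets = list(coeffs)             # the additive terms are the a_i
    y = offsets[0]
    for i in range(1, len(offsets)):
        y = multipliers[i] * y + offsets[i]
    return y
```

For example, the coefficients (2, 3, 4) describe 2x^2 + 3x + 4, and evaluating at x0 = 5 runs through the values 2, 13, 69.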