Advanced Algorithms - PowerPoint PPT Presentation

Provided by: sony65
Learn more at: http://www.cs.fsu.edu
1
Advanced Algorithms
  • Piyush Kumar
  • (Lecture 12: Parallel Algorithms)

Courtesy Baker 05.
Welcome to COT5405
2
Parallel Models
  • An abstract description of a real world parallel
    machine.
  • Attempts to capture essential features (and
    suppress details?)
  • What other models have we seen so far?

RAM? External Memory Model?
3
RAM
  • Random Access Machine Model
  • Memory is a sequence of bits/words.
  • Each memory access takes O(1) time.
  • Basic operations take O(1) time:
    Add/Mul/Xor/Sub/AND/NOT
  • Instructions cannot be modified.
  • No consideration of memory hierarchies.
  • Has been very successful in modelling real world
    machines.

4
Parallel RAM aka PRAM
  • Generalization of RAM
  • P processors with their own programs (and unique
    id)
  • MIMD processors: at each point in time, the
    processors might be executing different
    instructions on different data.
  • Shared Memory
  • Instructions are synchronized among the
    processors

5
PRAM
Shared Memory
Access variants: EREW/ERCW/CREW/CRCW (E = exclusive, C = concurrent, R = read, W = write)
EREW: No two processors are allowed to access the same
memory location at the same time.
6
Variants of CRCW
  • Common CRCW: concurrent writes are allowed iff all processors write the same value.
  • Arbitrary CRCW
  • Priority CRCW
  • Combining CRCW
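
The four variants differ only in how simultaneous writes to one cell are resolved. A minimal sketch of that resolution step, assuming the usual textbook conventions (lowest processor id wins under Priority, values are summed under Combining); the function name `resolve_write` and the request format are hypothetical, not from the slides:

```python
def resolve_write(requests, policy, combine=sum):
    """Decide what one memory cell holds after a concurrent write.

    requests: list of (processor_id, value) pairs aimed at the same cell.
    policy:   "common", "arbitrary", "priority" or "combining".
    """
    values = [value for _, value in requests]
    if policy == "common":
        # Legal only if every writer agrees on the value.
        if len(set(values)) != 1:
            raise ValueError("Common CRCW: writers disagree")
        return values[0]
    if policy == "arbitrary":
        # Any single writer may win; this sketch picks the first listed.
        return values[0]
    if policy == "priority":
        # Assumed convention: the lowest processor id has highest priority.
        return min(requests, key=lambda r: r[0])[1]
    if policy == "combining":
        # The written values are merged by an operator (sum by default).
        return combine(values)
    raise ValueError("unknown policy: " + policy)
```

For example, with writers `[(3, 7), (1, 5), (2, 7)]` the Priority rule stores 5 and the Combining rule (with sum) stores 19.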

7
Why PRAM?
  • A lot of literature is available on algorithms for
    PRAM.
  • One of the cleanest models.
  • Focuses on what communication is needed (and
    ignores the cost/means to do it).

8
PRAM Algorithm design.
  • Problem 1: Produce the sum of an array of n
    numbers.
  • RAM ?
  • PRAM ?
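
One common answer for the PRAM case is a balanced-tree reduction. The sketch below simulates it sequentially, assuming each iteration of the inner loop stands for one processor acting in the same synchronous round; the name `tree_sum` and the returned round count are illustrative choices, not from the slides:

```python
def tree_sum(values):
    """Sum by pairwise tree reduction; returns (total, parallel_rounds)."""
    s = list(values)
    n = len(s)
    stride = 1
    rounds = 0
    while stride < n:
        # One synchronous round: the processor at index i adds in the
        # partial sum stored stride cells to its right. All cells touched
        # in a round are distinct, so the access pattern is EREW.
        for i in range(0, n - stride, 2 * stride):
            s[i] += s[i + stride]
        stride *= 2
        rounds += 1
    return (s[0] if n else 0), rounds
```

On 8 numbers this takes 3 rounds, in line with the O(lg n) parallel time one expects, versus the n - 1 additions of a sequential RAM loop.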

9
Problem 2: Prefix Computation
Let X = {s0, s1, ..., sn-1} be in a set S.
Let ⊗ be a binary, associative, closed operator
with respect to S (usually Θ(1) time: MIN, MAX,
AND, ...).
The result of s0 ⊗ s1 ⊗ ... ⊗ sk is called the k-th
prefix.
Computing all n such prefixes is the parallel
prefix computation.
10
Prefix computation
  • Suffix computation is a similar problem.
  • Assumes the binary operation takes O(1) time.
  • In RAM ?

11
Prefix Computation (Akl)
12
EREW PRAM Prefix computation
  • Assume the PRAM has n processors and n is a power of
    2.
  • Input: si for i = 0, 1, ..., n-1.
  • Algorithm steps:
  •   for j = 0 to (lg n) - 1 do
  •     for i = 2^j to n-1 do (in parallel, one processor per i)
  •       h = i - 2^j
  •       si = sh ⊗ si
  •     endfor
  •   endfor

Total time in EREW PRAM?
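
The doubling scheme above can be simulated sequentially. This is a sketch only, assuming a snapshot of the array models the synchronous "all processors read, then all write" step; the function name `pram_prefix` is hypothetical:

```python
from operator import add


def pram_prefix(values, op=add):
    """All prefixes of `values` under an associative operator `op`."""
    s = list(values)
    n = len(s)
    stride = 1  # stride equals 2^j in round j
    while stride < n:
        old = s[:]  # snapshot: every processor reads before anyone writes
        for i in range(stride, n):  # conceptually one processor per i
            s[i] = op(old[i - stride], old[i])
        stride *= 2
    return s
```

The outer loop runs about lg n times and each round is constant time per processor, which suggests the O(lg n) answer to the question above. The sketch also works when n is not a power of 2.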
13
Problem 3: Array Packing
  • Assume that we have
    • an array of n elements, X = {x1, x2, ..., xn}
    • Some array elements are marked (or
      distinguished).
  • The requirements of this problem are to
    • pack the marked elements in the front part of the
      array.
    • place the remaining elements in the back of the
      array.
  • While not a requirement, it is also desirable to
    • maintain the original order between the marked
      elements
    • maintain the original order between the unmarked
      elements

14
In RAM?
  • How would you do this?
  • In place?
  • Running time?
  • Any ideas on how to do this in PRAM?

15
EREW PRAM Algorithm
  • 1. Set si in Pi to 1 if xi is marked and set si = 0
    otherwise.
  • 2. Perform a prefix sum on S = (s1, s2, ..., sn)
    to obtain the destination di = si for each marked xi.
  • 3. All PEs set m = sn, the total number of marked
    elements.
  • 4. Pi sets si to 0 if xi is marked and otherwise
    sets si = 1.
  • 5. Perform a prefix sum on S and set di = si + m
    for each unmarked xi.
  • 6. Each Pi copies array element xi into address
    di in X.
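
The six steps can be traced with a short sequential sketch. It assumes `itertools.accumulate` as a stand-in for the parallel prefix sum and writes into a fresh output list instead of overwriting X in place; the name `pack` is hypothetical:

```python
from itertools import accumulate


def pack(xs, marked):
    """Stable packing: marked elements first, unmarked after, order kept."""
    n = len(xs)
    # Steps 1-2: prefix sums over the 0/1 marks give 1-based destinations.
    s = list(accumulate(1 if mk else 0 for mk in marked))
    # Step 3: m is the total number of marked elements.
    m = s[-1] if n else 0
    # Steps 4-5: prefix sums over the flipped marks, offset by m.
    t = list(accumulate(0 if mk else 1 for mk in marked))
    dest = [s[i] if marked[i] else t[i] + m for i in range(n)]
    # Step 6: every element is copied to its destination (1-based -> 0-based).
    out = [None] * n
    for i in range(n):
        out[dest[i] - 1] = xs[i]
    return out
```

For instance, `pack([5, 8, 2, 7, 4], [False, True, False, True, True])` gives `[8, 7, 4, 5, 2]`: the marked elements 8, 7, 4 keep their relative order, as do the unmarked 5, 2.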

16
Array Packing
  • Assume n processors are used above.
  • Optimal prefix sums requires O(lg n) time.
  • The EREW broadcast of sn needed in Step 3 takes
    O(lg n) time using a binary tree in memory
  • All other steps require constant time.
  • Runs in O(lg n) time and is cost optimal.
  • Maintains original order in unmarked group as
    well
  • Notes
  • Algorithm illustrates usefulness of Prefix Sums
  • There are many applications for the Array Packing
    algorithm

17
Problem 4: PRAM MergeSort
  • RAM Merge Sort recursion?
  • PRAM Merge Sort recursion?
  • Can we speed up the merging?
  • Merging n elements with n processors can be done
    in O(log n) time.
  • Assume all elements are distinct.
  • Rank(a, A) = number of elements in A smaller than
    a. For example, rank(8, {1,3,5,7,9}) = 4.

18
PRAM Merging
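A = (2, 3, 10, 15, 16)
B = (1, 8, 12, 14, 19)
Ranks of A's elements in B: Rank(2) = 1, Rank(3) = 1, Rank(10) = 2, Rank(15) = 4,
Rank(16) = 4
Ranks of B's elements in A: Rank(1) = 0, Rank(8) = 2, Rank(12) = 3, Rank(14) = 3,
Rank(19) = 5
(Figure: both arrays drawn with positions 1 to 5.)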
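
The ranks determine the merged order directly: an element at 0-based index i of its own array, with rank r in the other array, lands at position i + r of the output. A sketch under the slide's assumption that all elements are distinct, using `bisect_left` as the per-processor binary search; the name `merge_by_rank` is hypothetical:

```python
from bisect import bisect_left


def merge_by_rank(a, b):
    """Merge two sorted lists of distinct elements via cross-ranking."""
    out = [None] * (len(a) + len(b))
    # Each loop iteration is independent, so on a PRAM every element
    # could be handled by its own processor in O(log n) time.
    for i, x in enumerate(a):
        out[i + bisect_left(b, x)] = x  # bisect_left(b, x) == Rank(x, b)
    for j, y in enumerate(b):
        out[j + bisect_left(a, y)] = y
    return out
```

With the arrays A and B from this slide, the result is `[1, 2, 3, 8, 10, 12, 14, 15, 16, 19]`.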
19
PRAM Merge Sort
  • T(n) = T(n/2) + O(log n), which solves to O(log^2 n).
  • Using the idea of pipelined divide and conquer (DC),
    PRAM Mergesort can be done in O(log n) time.
  • DC is one of the most powerful techniques to
    solve problems in parallel.

20
Problem 5: Closest Pair
  • RAM Version ?

(Figure: points split by the vertical separation line L; δ = min(12, 21).)
21
Closest Pair: RAM Version
Closest-Pair(p1, ..., pn)
  Compute separation line L such that half the points
  are on one side and half on the other side.            O(n log n)
  δ1 = Closest-Pair(left half)                           2T(n / 2)
  δ2 = Closest-Pair(right half)
  δ = min(δ1, δ2)
  Delete all points further than δ from separation
  line L.                                                O(n)
  Sort remaining points by y-coordinate.                 O(n log n)
  Scan points in y-order and compare distance between
  each point and next 11 neighbors. If any of these
  distances is less than δ, update δ.                    O(n)
  return δ.
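
A runnable sketch of the RAM version follows. It assumes points are (x, y) tuples, re-sorts the strip at every level as in the pseudocode (so it does not use the presorting refinement), and uses `math.dist` from Python 3.8+; the function names are hypothetical:

```python
import math


def closest_pair(points):
    """Smallest distance between any two of the given (x, y) points."""
    return _closest(sorted(points))  # sort once by x-coordinate


def _closest(pts):
    n = len(pts)
    if n <= 3:
        # Base case: brute force over all pairs (inf if fewer than 2 points).
        return min((math.dist(p, q)
                    for i, p in enumerate(pts) for q in pts[i + 1:]),
                   default=math.inf)
    mid = n // 2
    x_mid = pts[mid][0]  # the separation line L is x = x_mid
    delta = min(_closest(pts[:mid]), _closest(pts[mid:]))
    # Keep only points within delta of L, ordered by y-coordinate.
    strip = sorted((p for p in pts if abs(p[0] - x_mid) < delta),
                   key=lambda p: p[1])
    # Compare each strip point with its next 11 neighbours in y-order.
    for i, p in enumerate(strip):
        for q in strip[i + 1:i + 12]:
            delta = min(delta, math.dist(p, q))
    return delta
```

For example, `closest_pair([(0, 0), (5, 4), (3, 1), (9, 9), (3, 2), (7, 7)])` returns `1.0`, the distance between (3, 1) and (3, 2).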
22
Closest Pair: PRAM Version?
Closest-Pair(p1, ..., pn)
  Compute separation line L such that half the points
  are on one side and half on the other side.            O(1), use sorted lists
  δ1 = Closest-Pair(left half)                           T(n / 2), in parallel
  δ2 = Closest-Pair(right half)
  δ = min(δ1, δ2)
  Delete all points further than δ from separation
  line L.                                                O(log n), use presorting and prefix computation
  Sort remaining points by y-coordinate.                 O(1)
  Scan points in y-order and compare distance between
  each point and next 11 neighbors. Find the min of all
  these distances, update δ.                             O(log n), again use prefix computation
  return δ.
Recurrence: T(n) = T(n/2) + O(log n)
23
Problem 6: Planar Convex Hulls
  • MergeHull(P)
  •   HL = MergeHull(points left of the median)
  •   HR = MergeHull(points right of the median)
  •   Return JoinHulls(HL, HR)

Time complexity in RAM? Time complexity in PRAM?
24
Join_Hulls
25
Towards a better Planar Convex Hull
  • Let Q = {q1, q2, ..., qn} be a set of points
    in the Euclidean plane (i.e., E^2-space).
  • The convex hull of Q is denoted by CH(Q) and is
    the smallest convex polygon containing Q.
  • It is specified by listing its corner points
    (which are from Q) in order (e.g., clockwise
    order).
  • Usual Computational Geometry Assumptions
    • No three points lie on the same straight line.
    • No two points have the same x or y coordinate.
    • There are at least 4 points, as CH(Q) = Q for
      n ≤ 3.

26
PRAM CONVEX HULL(n,Q, CH(Q))
  • 1. Sort the points of Q by x-coordinate.
  • 2. Partition Q into k = √n subsets Q1, Q2, ..., Qk
    of k points each such that a vertical line can
    separate Qi from Qj.
    Also, if i < j, then Qi is left of Qj.
  • 3. For i = 1 to k, compute the convex hulls of Qi
    in parallel, as follows:
    • if |Qi| ≤ 3, then CH(Qi) = Qi
    • else (using k = √n PEs) call PRAM CONVEX HULL(k,
      Qi, CH(Qi))
  • 4. Merge the convex hulls CH(Q1), CH(Q2), ...,
    CH(Qk) together.
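
The recursive structure (sort, split into about √n vertical slabs, recurse, merge) can be sketched sequentially. Note the loud assumption: the merge step here simply reruns Andrew's monotone chain on the union of the sub-hull vertices, a stand-in for the tangent-based parallel merge described on the following slides. The names `monotone_chain` and `sqrt_hull` are hypothetical:

```python
import math


def cross(o, a, b):
    """Positive if the turn o -> a -> b is counterclockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def monotone_chain(points):
    """Convex hull in counterclockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def sqrt_hull(points):
    """Hull via the sqrt(n)-way split; sub-hulls would be built in parallel."""
    pts = sorted(points)                      # step 1: sort by x
    n = len(pts)
    if n <= 3:
        return monotone_chain(pts)
    k = math.isqrt(n)
    if k * k < n:
        k += 1                                # k = ceil(sqrt(n))
    groups = [pts[i:i + k] for i in range(0, n, k)]   # step 2: vertical slabs
    sub_hulls = [sqrt_hull(g) for g in groups]        # step 3: recurse
    # Step 4 (stand-in merge): hull of all sub-hull vertices.
    return monotone_chain([p for h in sub_hulls for p in h])
```

Only sub-hull vertices reach the final merge, which is the property the parallel algorithm relies on as well.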

27
Basic Idea
28
Last Step
  • The upper hull is found first. Then the lower
    hull is found using the same method.
  • Only finding the upper hull is described here.
  • The upper and lower convex hull points are then
    merged into an ordered set.
  • Each CH(Qi) has √n PEs assigned to it.
  • The PEs assigned to CH(Qi) (in parallel) compute
    the upper tangent from CH(Qi) to another CH(Qj).
  • A total of √n - 1 tangents are computed for each
    CH(Qi).
  • Details for computing the upper tangents will be
    discussed separately.

29
(No Transcript)
30
Last Step
  • Among the tangent lines between CH(Qi) and polygons
    to the left of CH(Qi), let Li be the one with the
    smallest slope.
  • Among the tangent lines between CH(Qi) and polygons
    to the right, let Ri be the one with the largest
    slope.
  • If the angle between Li and Ri is less than 180
    degrees, no point of CH(Qi) is in CH(Q).
  • See Figure 5.13 on next slide (from Akl's online
    text).
  • Otherwise, all points in CH(Qi) between where Li
    touches CH(Qi) and where Ri touches CH(Qi) are in
    CH(Q).
  • Array Packing is used to combine all convex hull
    points of CH(Q) after they are identified.

31
(No Transcript)
32
Complexity
  • Step 1: The sort takes O(lg n) time.
  • Step 2: Partition of Q into subsets takes O(1)
    time.
  • Step 3: The recursive calculations of CH(Qi) for
    1 ≤ i ≤ √n in parallel take t(√n) time (using √n
    PEs for each Qi).
  • Step 4: The big steps here require O(lg n) time and
    are
    • Finding the upper tangent from CH(Qi) to CH(Qj)
      for each i, j pair.
    • Array packing used to form the ordered sequence
      of upper convex hull points for Q.
  • The above steps find the upper convex hull. The lower
    convex hull is found similarly.
  • Upper and lower hulls are merged in O(1) time into an
    ordered set.

33
Complexity
  • Cost for Step 3: Solving the recurrence relation
  •   t(n) = t(√n) + β lg n (β a constant)
  • yields
  •   t(n) = O(lg n)
  • Running time for PRAM Convex Hull is O(lg n)
    since this is the maximum cost for each step.
  • Then the cost for PRAM Convex Hull is
  •   C(n) = O(n lg n).
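
Unrolling the recurrence shows why it stays logarithmic. This is a sketch, assuming the non-recursive work at each level is β lg n for some constant β (the constant's name is an assumption here) and that the recursion stops at constant-size subproblems:

```latex
\begin{align*}
t(n) &= t(n^{1/2}) + \beta \lg n \\
     &= t(n^{1/4}) + \beta \lg n^{1/2} + \beta \lg n \\
     &= t\bigl(n^{1/2^{k}}\bigr) + \beta \lg n \sum_{i=0}^{k-1} \frac{1}{2^{i}} \\
     &\le t(O(1)) + 2\beta \lg n \;=\; O(\lg n).
\end{align*}
```

Each level costs half of the level above it, so the geometric series is bounded by twice the top-level cost. With n processors the cost is then n · t(n) = O(n lg n).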