Title: Advanced Algorithms
1Advanced Algorithms
- Piyush Kumar
- (Lecture 12 Parallel Algorithms)
Courtesy Baker 05.
Welcome to COT5405
2Parallel Models
- An abstract description of a real world parallel machine.
- Attempts to capture essential features (and suppress details?)
- What other models have we seen so far? RAM? External Memory Model?
3RAM
- Random Access Machine Model
- Memory is a sequence of bits/words.
- Each memory access takes O(1) time.
- Basic operations take O(1) time: Add/Mul/Xor/Sub/AND/NOT
- Instructions cannot be modified.
- No consideration of memory hierarchies.
- Has been very successful in modelling real world
machines.
4Parallel RAM aka PRAM
- Generalization of RAM
- P processors with their own programs (and unique id)
- MIMD processors: At each point in time the processors might be executing different instructions on different data.
- Shared Memory
- Instructions are synchronized among the processors
5PRAM
Shared Memory
EREW/ERCW/CREW/CRCW
EREW: A program isn't allowed to have two processors access the same memory location at the same time.
6Variants of CRCW
- Common CRCW: a concurrent write is allowed iff all writing processors write the same value.
- Arbitrary CRCW
- Priority CRCW
- Combining CRCW
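The four write rules can be compared with a small sequential sketch that decides which value ends up in one memory cell. The function name `resolve_crcw` and the policy strings are hypothetical, and the sketch assumes the common conventions that Priority CRCW lets the lowest-numbered processor win and that Combining CRCW folds all written values with a user-supplied operator.

```python
from functools import reduce

def resolve_crcw(writes, policy, combine=None):
    """Return the value stored when several processors write one cell.

    writes  : list of (processor_id, value) pairs, all aimed at the same cell
    policy  : "common", "arbitrary", "priority" or "combining" (names assumed)
    combine : binary operator, only used by the "combining" policy
    """
    if not writes:
        raise ValueError("no writes to resolve")
    values = [value for _, value in writes]
    if policy == "common":
        # Legal only if every processor writes the same value.
        if any(value != values[0] for value in values):
            raise ValueError("Common CRCW requires identical values")
        return values[0]
    if policy == "arbitrary":
        # Any single writer may succeed; this sketch picks the first one listed.
        return values[0]
    if policy == "priority":
        # Assumption: the smallest processor id has the highest priority.
        return min(writes, key=lambda w: w[0])[1]
    if policy == "combining":
        return reduce(combine, values)
    raise ValueError("unknown policy: " + policy)
```

For example, `resolve_crcw([(3, 5), (1, 7), (2, 9)], "priority")` selects the value written by processor 1.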
7Why PRAM?
- Lot of literature available on algorithms for PRAM.
- One of the cleanest models.
- Focuses on what communication is needed (and ignores the cost/means to do it)
8PRAM Algorithm design.
- Problem 1: Produce the sum of an array of n numbers.
- RAM ?
- PRAM ?
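One possible answer to the PRAM question is a balanced-tree reduction: a RAM needs n - 1 additions one after another, while a PRAM can add disjoint pairs simultaneously and finish in about lg n rounds. The sketch below simulates those rounds sequentially; `pram_sum` is a hypothetical name, and each pass of the inner loop stands for one synchronous parallel step.

```python
def pram_sum(values):
    """Simulate a tree reduction; returns (sum, number of parallel rounds)."""
    a = list(values)
    n = len(a)
    rounds = 0
    stride = 1
    while stride < n:
        # One parallel step: every processor i that is a multiple of 2*stride
        # adds in the partial sum held stride cells to its right.
        # All pairs (i, i + stride) are disjoint, so no cell is shared.
        for i in range(0, n - stride, 2 * stride):
            a[i] += a[i + stride]
        stride *= 2
        rounds += 1
    return (a[0] if n else 0), rounds
```

With 8 inputs the simulation uses 3 rounds, against 7 sequential additions on a RAM.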
9Problem 2 Prefix Computation
Let X = {s_0, s_1, ..., s_{n-1}} be in a set S.
Let ⊗ be a binary, associative, closed operator with respect to S (usually Θ(1) time: MIN, MAX, AND, +, ...).
The result of s_0 ⊗ s_1 ⊗ ... ⊗ s_k is called the k-th prefix.
Computing all such n prefixes is the parallel prefix computation.
10Prefix computation
- Suffix computation is a similar problem.
- Assumes the binary operation takes O(1) time.
- In RAM ?
11Prefix Computation (Akl)
12EREW PRAM Prefix computation
- Assume PRAM has n processors and n is a power of 2.
- Input: s_i for i = 0, 1, ..., n-1.
- Algorithm Steps:
  - for j = 0 to (lg n) - 1, do
    - for i = 2^j to n-1 do in parallel
      - h = i - 2^j
      - s_i = s_h ⊗ s_i
    - endfor
  - endfor
Total time in EREW PRAM?
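The doubling loop above can be simulated sequentially to check both the result and the number of rounds. In this sketch, `erew_prefix` is a hypothetical name; a snapshot of the array is taken at the start of each round so that every simulated processor reads the values from before the round, as it would in a synchronous PRAM step.

```python
import operator

def erew_prefix(values, op):
    """Simulate the doubling prefix computation.

    Returns (list of prefixes, number of parallel rounds).
    """
    s = list(values)
    n = len(s)
    step = 1          # step == 2**j in round j
    rounds = 0
    while step < n:
        prev = s[:]   # values as they were before this parallel round
        for i in range(step, n):          # one processor per index i
            s[i] = op(prev[i - step], prev[i])
        step *= 2
        rounds += 1
    return s, rounds

prefixes, rounds = erew_prefix([1, 2, 3, 4, 5, 6, 7, 8], operator.add)
```

For n = 8 the loop runs lg n = 3 rounds, each of which costs O(1) time with n processors.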
13Problem 3 Array packing
- Assume that we have
  - an array of n elements, X = {x_1, x_2, ..., x_n}
  - Some array elements are marked (or distinguished).
- The requirements of this problem are to
  - pack the marked elements in the front part of the array.
  - place the remaining elements in the back of the array.
- While not a requirement, it is also desirable to
  - maintain the original order between the marked elements
  - maintain the original order between the unmarked elements
14In RAM?
- How would you do this?
- In place?
- Running time?
- Any ideas on how to do this in PRAM?
15EREW PRAM Algorithm
- 1. Set s_i in P_i to 1 if x_i is marked and set s_i = 0 otherwise.
- 2. Perform a prefix sum on S = (s_1, s_2, ..., s_n) to obtain destination d_i = s_i for each marked x_i.
- 3. All PEs set m = s_n, the total number of marked elements.
- 4. P_i sets s_i to 0 if x_i is marked and otherwise sets s_i = 1.
- 5. Perform a prefix sum on S and set d_i = s_i + m for each unmarked x_i.
- 6. Each P_i copies array element x_i into address d_i in X.
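The six steps can be traced with a short sequential sketch. The name `pack` is hypothetical, `itertools.accumulate` stands in for the parallel prefix sum, and the result is written to a fresh list rather than back into X so that no element is overwritten before it has been read.

```python
from itertools import accumulate

def pack(x, marked):
    """Stable array packing: marked elements first, then unmarked ones."""
    n = len(x)
    if n == 0:
        return []
    # Steps 1-2: prefix sum over the marks gives 1-based destinations.
    s = list(accumulate(1 if mk else 0 for mk in marked))
    m = s[-1]                       # Step 3: total number of marked elements
    # Steps 4-5: prefix sum over the complement, shifted past the marked block.
    t = list(accumulate(0 if mk else 1 for mk in marked))
    d = [s[i] if marked[i] else t[i] + m for i in range(n)]
    # Step 6: every element moves to its destination (converted to 0-based).
    out = [None] * n
    for i in range(n):
        out[d[i] - 1] = x[i]
    return out
```

Calling `pack(list("abcde"), [False, True, False, True, True])` moves b, d, e to the front and keeps a, c in their original relative order behind them.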
16Array Packing
- Assume n processors are used above.
- Optimal prefix sums requires O(lg n) time.
- The EREW broadcast of s_n needed in Step 3 takes O(lg n) time using a binary tree in memory
- All other steps require constant time.
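- Runs in O(lg n) time. With n processors the cost is O(n lg n); the algorithm becomes cost optimal when run on n / lg n processors, each handling a block of lg n elements.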
- Maintains original order in unmarked group as well
- Notes
  - Algorithm illustrates usefulness of Prefix Sums
  - There are many applications for the Array Packing algorithm
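The O(lg n) broadcast bound in Step 3 comes from doubling the number of copies in every round. The sketch below uses a doubling scheme rather than an explicit binary tree laid out in memory, which is an assumption of this example; `erew_broadcast` is a hypothetical name and n is assumed to be at least 1.

```python
def erew_broadcast(value, n):
    """Copy `value` into n cells without any concurrent read or write.

    Returns (cells, number of parallel rounds). Assumes n >= 1.
    """
    cells = [None] * n
    cells[0] = value
    filled = 1
    rounds = 0
    while filled < n:
        # One parallel step: processor i reads its own cell i and writes
        # a distinct empty cell filled + i, so every access is exclusive.
        for i in range(min(filled, n - filled)):
            cells[filled + i] = cells[i]
        filled = min(2 * filled, n)
        rounds += 1
    return cells, rounds
```

For n = 8 the value reaches every cell after 3 rounds.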
17Problem 4 PRAM MergeSort
- RAM Merge Sort Recursion?
- PRAM Merge Sort recursion?
- Can we speed up the merging?
- Merging n elements with n processors can be done in O(log n) time.
- Assume all elements are distinct
- Rank(a, A) = number of elements in A smaller than a. For example rank(8, {1,3,5,7,9}) = 4
18PRAM Merging
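A = (2, 3, 10, 15, 16)
B = (1, 8, 12, 14, 19)
Ranks of the elements of A in B: Rank(2) = 1, Rank(3) = 1, Rank(10) = 2, Rank(15) = 4, Rank(16) = 4
Ranks of the elements of B in A: Rank(1) = 0, Rank(8) = 2, Rank(12) = 3, Rank(14) = 3, Rank(19) = 5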
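The ranks determine the output directly: an element at 0-based index i of its own sorted array, with rank r in the other array, belongs at position i + r of the merged result. The sketch below assumes distinct elements, as in the slides, and uses `bisect_left` as the binary search that each processor would perform in O(log n) time; `rank` and `pram_merge` are hypothetical names.

```python
from bisect import bisect_left

def rank(a, seq):
    """Number of elements of the sorted sequence `seq` smaller than a."""
    return bisect_left(seq, a)

def pram_merge(A, B):
    """Merge two sorted lists of distinct elements by ranking."""
    out = [None] * (len(A) + len(B))
    # Each loop iteration is independent, so one processor per element
    # could execute all of them at the same time.
    for i, a in enumerate(A):
        out[i + rank(a, B)] = a
    for j, b in enumerate(B):
        out[j + rank(b, A)] = b
    return out

merged = pram_merge([2, 3, 10, 15, 16], [1, 8, 12, 14, 19])
```

Because all elements are distinct, no two elements compute the same destination, so the writes are exclusive.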
19PRAM Merge Sort
- T(n) = T(n/2) + O(log n)
- Using the idea of pipelined divide and conquer (D&C), PRAM Mergesort can be done in O(log n) time.
- D&C is one of the most powerful techniques to solve problems in parallel.
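For comparison with the pipelined O(log n) bound, the non-pipelined recurrence can be unrolled. This is a sketch assuming n is a power of 2 and that the merge at each level costs at most c log n for some constant c (a symbol introduced here for illustration).

```latex
\begin{aligned}
T(n) &\le T(n/2) + c \log_2 n \\
     &\le T(1) + c \sum_{i=0}^{\log_2 n - 1} \log_2\!\left(\frac{n}{2^i}\right) \\
     &=   T(1) + c \sum_{i=0}^{\log_2 n - 1} \left(\log_2 n - i\right) \\
     &=   T(1) + c \, \frac{\log_2 n \,(\log_2 n + 1)}{2} \\
     &=   O(\log^2 n)
\end{aligned}
```

So the straightforward recursion gives O(log^2 n) time, and pipelining the merges is what removes the extra log factor.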
20Problem 5 Closest Pair
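[Figure: points split by the vertical separation line L; closest distances 12 and 21 in the two halves, so δ = min(12, 21); points near L are labeled 1 to 7 in y-order.]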
21Closest Pair RAM Version
Closest-Pair(p_1, ..., p_n)
  Compute separation line L such that half the points are on one side and half on the other side. [O(n log n)]
  δ1 = Closest-Pair(left half) [2T(n / 2) for both recursive calls]
  δ2 = Closest-Pair(right half)
  δ = min(δ1, δ2)
  Delete all points further than δ from separation line L. [O(n)]
  Sort remaining points by y-coordinate. [O(n log n)]
  Scan points in y-order and compare distance between each point and next 11 neighbors. If any of these distances is less than δ, update δ. [O(n)]
  return δ.
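The RAM version above translates fairly directly into code. The sketch below follows the same steps and re-sorts the strip by y at every level, matching the O(n log n) sort line; the names `closest_pair` and `_solve` are hypothetical, points are assumed to be (x, y) tuples, and `math.dist` requires Python 3.8 or later.

```python
import math

def closest_pair(points):
    """Return the smallest distance between any two of the given points."""
    return _solve(sorted(points))          # presort by x-coordinate

def _solve(pts):
    n = len(pts)
    if n < 2:
        return math.inf
    if n <= 3:
        # Small base case: check every pair directly.
        return min(math.dist(pts[i], pts[j])
                   for i in range(n) for j in range(i + 1, n))
    mid = n // 2
    x_line = pts[mid][0]                   # separation line L
    delta = min(_solve(pts[:mid]), _solve(pts[mid:]))
    # Keep only points within delta of L, sorted by y-coordinate.
    strip = sorted((p for p in pts if abs(p[0] - x_line) < delta),
                   key=lambda p: p[1])
    # Compare each point with its next 11 neighbors in y-order.
    for i, p in enumerate(strip):
        for q in strip[i + 1:i + 12]:
            delta = min(delta, math.dist(p, q))
    return delta
```

A call such as `closest_pair([(0, 0), (3, 4), (7, 1), (1, 1), (10, 10)])` returns the distance between (0, 0) and (1, 1).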
22Closest Pair PRAM Version?
Closest-Pair(p_1, ..., p_n)
  Compute separation line L such that half the points are on one side and half on the other side. [O(1): use sorted lists]
  δ1 = Closest-Pair(left half) [T(n / 2): both recursive calls run in parallel]
  δ2 = Closest-Pair(right half)
  δ = min(δ1, δ2)
  Delete all points further than δ from separation line L. Sort remaining points by y-coordinate. [O(log n): use presorting and prefix computation]
  Scan points in y-order and compare distance between each point and next 11 neighbors. [O(1)]
  Find min of all these distances, update δ. [O(log n): again use prefix computation]
  return δ.
Recurrence: T(n) = T(n/2) + O(log n)
23Problem 6 Planar Convex hulls
- MergeHull(P)
  - HL = MergeHull(Left of median)
  - HR = MergeHull(Right of median)
  - Return JoinHulls(HL, HR)
Time complexity in RAM? Time complexity in PRAM?
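The recursive structure of MergeHull can be sketched as follows. The slides do not spell out JoinHulls here, so this example substitutes a stand-in: it recomputes the hull of the two sub-hulls with Andrew's monotone chain instead of finding tangents. All function names are hypothetical, and points are assumed to be (x, y) tuples.

```python
def cross(o, a, b):
    """Positive if the turn o -> a -> b is counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def monotone_chain(points):
    """Convex hull in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def join_hulls(hl, hr):
    # Stand-in for JoinHulls: a tangent-based join would run in linear
    # time, while this version simply rebuilds the hull of the union.
    return monotone_chain(hl + hr)

def _merge_hull(pts):
    if len(pts) <= 3:
        return monotone_chain(pts)
    mid = len(pts) // 2                    # split at the median x
    return join_hulls(_merge_hull(pts[:mid]), _merge_hull(pts[mid:]))

def merge_hull(points):
    return _merge_hull(sorted(set(points)))
```

With a linear-time join, the RAM recurrence is T(n) = 2T(n/2) + O(n), which gives O(n log n); the stand-in join used here costs an extra sort per level.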
24Join_Hulls
25Towards a better Planar Convex hull
- Let Q = {q_1, q_2, ..., q_n} be a set of points in the Euclidean plane (i.e., E^2-space).
- The convex hull of Q is denoted by CH(Q) and is the smallest convex polygon containing Q.
- It is specified by listing its corner points (which are from Q) in order (e.g., clockwise order).
- Usual Computational Geometry Assumptions:
  - No three points lie on the same straight line.
  - No two points have the same x or y coordinate.
  - There are at least 4 points, as CH(Q) = Q for n ≤ 3.
26PRAM CONVEX HULL(n,Q, CH(Q))
- 1. Sort the points of Q by x-coordinate.
- 2. Partition Q into k = √n subsets Q_1, Q_2, ..., Q_k of k points each such that a vertical line can separate Q_i from Q_j
  - Also, if i < j, then Q_i is left of Q_j.
- 3. For i = 1 to k, compute the convex hulls of Q_i in parallel, as follows:
  - if |Q_i| ≤ 3, then CH(Q_i) = Q_i
  - else (using k = √n PEs) call PRAM CONVEX HULL(k, Q_i, CH(Q_i))
- 4. Merge the convex hulls in {CH(Q_1), CH(Q_2), ..., CH(Q_k)} together.
27Basic Idea
28Last Step
- The upper hull is found first. Then, the lower hull is found next using the same method.
  - Only finding the upper hull is described here
  - Upper and lower convex hull points merged into ordered set
- Each CH(Q_i) has √n PEs assigned to it.
- The PEs assigned to CH(Q_i) (in parallel) compute the upper tangent from CH(Q_i) to another CH(Q_j).
- A total of √n - 1 tangents are computed for each CH(Q_i), one to every other CH(Q_j).
- Details for computing the upper tangents will be discussed separately
30Last Step
- Among the tangent lines to CH(Q_i) and polygons to the left of CH(Q_i), let L_i be the one with the smallest slope.
- Among the tangent lines to CH(Q_i) and polygons to the right, let R_i be the one with the largest slope.
- If the angle between L_i and R_i is less than 180 degrees, no point of CH(Q_i) is in CH(Q).
  - See Figure 5.13 on next slide (from Akl's Online text)
- Otherwise, all points in CH(Q_i) between where L_i touches CH(Q_i) and where R_i touches CH(Q_i) are in CH(Q).
- Array Packing is used to combine all convex hull points of CH(Q) after they are identified.
32Complexity
- Step 1: The sort takes O(lg n) time.
- Step 2: Partition of Q into subsets takes O(1) time.
- Step 3: The recursive calculations of CH(Q_i) for 1 ≤ i ≤ √n in parallel takes t(√n) time (using √n PEs for each Q_i).
- Step 4: The big steps here require O(lg n) and are
  - Finding the upper tangent from CH(Q_i) to CH(Q_j) for each i, j pair.
  - Array packing used to form the ordered sequence of upper convex hull points for Q.
- Above steps find the upper convex hull. The lower convex hull is found similarly.
  - Upper and lower hulls merged in O(1) time to ordered set
33Complexity
- Cost for Step 3: Solving the recurrence relation
  - t(n) = t(√n) + β lg n
- yields
  - t(n) = O(lg n)
- Running time for PRAM Convex Hull is O(lg n) since this is maximum cost for each step.
- Then the cost for PRAM Convex Hull is
  - C(n) = O(n lg n).
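The O(lg n) solution of the Step 3 recurrence can be checked by unrolling it. This is a sketch that writes the constant in front of lg n as β (an assumed name) and stops the recursion once the subproblem has constant size; each level halves lg n, so the terms form a geometric series.

```latex
\begin{aligned}
t(n) &= t\!\left(n^{1/2}\right) + \beta \lg n \\
     &= t\!\left(n^{1/4}\right) + \beta \lg n^{1/2} + \beta \lg n \\
     &= t(O(1)) + \beta \lg n \left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right) \\
     &\le t(O(1)) + 2\beta \lg n \\
     &= O(\lg n)
\end{aligned}
```

Multiplying this running time by the n processors gives the stated cost C(n) = O(n lg n).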