Advanced Algorithms - PowerPoint PPT Presentation

About This Presentation

Title:

Advanced Algorithms

Description:

EREW/ERCW/CREW/CRCW. EREW: A program isnt allowed to access the same memory location ... Finding the upper tangent from CH(Qi) to CH(Qj) for each i, j pair. ... – PowerPoint PPT presentation

Slides: 34

Provided by: sony65

Learn more at: http://www.cs.fsu.edu

Transcript and Presenter's Notes

Title: Advanced Algorithms

1
Advanced Algorithms

Piyush Kumar
(Lecture 12 Parallel Algorithms)

Courtesy Baker 05.
Welcome to COT5405
2
Parallel Models

An abstract description of a real world parallel
machine.
Attempts to capture essential features (and
suppress details?)
What other models have we seen so far?

RAM? External Memory Model?
3
RAM

Random Access Machine Model
Memory is a sequence of bits/words.
Each memory access takes O(1) time.
Basic operations take O(1) time
Add/Mul/Xor/Sub/AND/not
Instructions cannot be modified.
No consideration of memory hierarchies.
Has been very successful in modelling real world
machines.

4
Parallel RAM aka PRAM

Generalization of RAM
P processors with their own programs (and unique
ids)
MIMD processors: at each point in time the
processors might be executing different
instructions on different data.
Shared memory
Instructions are synchronized among the
processors

5
PRAM
Shared Memory
EREW/ERCW/CREW/CRCW (E = exclusive, C = concurrent,
R = read, W = write)
EREW: no two processors are allowed to access the
same memory location at the same time.
6
Variants of CRCW

Common CRCW: a concurrent write is allowed iff all
the processors write the same value.
Arbitrary CRCW
Priority CRCW
Combining CRCW
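
The slide names four write-conflict rules but defines only the first. The sketch below uses the usual textbook readings of the other three, which are assumptions here rather than something the slides state: arbitrary (any one writer wins), priority (the lowest-numbered processor wins), and combining (the written values are reduced with an associative operator). The function name `resolve_write` and the request format are hypothetical, chosen only for illustration.

```python
def resolve_write(requests, variant, combine=sum):
    """Decide what one shared cell holds after simultaneous writes.

    requests: non-empty list of (processor_id, value) pairs aimed at one cell.
    variant:  "common", "arbitrary", "priority", or "combining".
    """
    values = [value for _, value in requests]
    if variant == "common":
        # Legal only if every processor writes the same value.
        if len(set(values)) != 1:
            raise ValueError("common CRCW requires identical values")
        return values[0]
    if variant == "arbitrary":
        # Any single writer may win; this sketch simply takes the first.
        return values[0]
    if variant == "priority":
        # Assumption: a smaller processor id means higher priority.
        return min(requests, key=lambda r: r[0])[1]
    if variant == "combining":
        # Reduce all written values, here with sum by default.
        return combine(values)
    raise ValueError(f"unknown variant: {variant}")
```

For example, with requests `[(2, 7), (0, 5), (1, 9)]`, the priority rule stores 5 and the combining rule (with `sum`) stores 21.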

7
Why PRAM?

A lot of literature is available on algorithms for
PRAM.
One of the cleanest models.
Focuses on what communication is needed (and
ignores the cost/means to do it).

8
PRAM Algorithm design.

Problem 1: Produce the sum of an array of n
numbers.
RAM ?
PRAM ?
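
One common answer to the PRAM question is a balanced-tree reduction: in each round, disjoint pairs of cells are added, so the rounds could run as synchronous parallel steps. The sketch below only simulates this sequentially; the name `pram_sum` and the round counter are illustrative assumptions, not part of the lecture.

```python
def pram_sum(values):
    """Simulate a tree reduction; each while-iteration models one PRAM step.

    Returns (total, rounds). With n processors the rounds would take
    O(lg n) parallel time, versus O(n) for the sequential RAM loop.
    """
    a = list(values)
    n = len(a)
    rounds = 0
    stride = 1
    while stride < n:
        # Every iteration of this inner loop touches a disjoint pair of
        # cells, so one processor per pair could execute them together.
        for i in range(0, n - stride, 2 * stride):
            a[i] = a[i] + a[i + stride]
        stride *= 2
        rounds += 1
    return (a[0] if a else 0), rounds
```

Calling `pram_sum([1, 2, 3, 4, 5, 6, 7, 8])` returns `(36, 3)`: eight numbers are summed in lg 8 = 3 rounds.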

9
Problem 2: Prefix Computation
Let X = {s0, s1, ..., sn-1} be in a set S.
Let ⊗ be a binary, associative, closed operator
with respect to S (usually Θ(1) time: MIN, MAX,
AND, ...).
The result of s0 ⊗ s1 ⊗ ... ⊗ sk is called the k-th
prefix.
Computing all such n prefixes is the parallel
prefix computation.
10
Prefix computation

Suffix computation is a similar problem.
Assumes the binary operation takes O(1) time.
In RAM ?

11
Prefix Computation (Akl)
12
EREW PRAM Prefix computation

Assume the PRAM has n processors and n is a power of
2.
Input: si for i = 0, 1, ..., n-1.
Algorithm steps:
for j = 0 to (lg n) - 1 do
    for i = 2^j to n-1 do in parallel
        h = i - 2^j
        si = sh ⊗ si
    endfor
endfor

Total time in EREW PRAM?
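
The doubling loop above can be simulated sequentially as a sanity check. This is a minimal sketch, assuming each outer iteration is one synchronous step in which all processors read before any of them writes (modelled with a snapshot copy); the name `erew_prefix` and the step counter are illustrative.

```python
import operator


def erew_prefix(s, op=operator.add):
    """Simulate the doubling prefix computation; returns (prefixes, steps)."""
    s = list(s)
    n = len(s)
    steps = 0
    d = 1  # d plays the role of 2^j in the slide
    while d < n:
        old = s[:]  # snapshot: every processor reads before any writes
        for i in range(d, n):  # conceptually executed in parallel
            s[i] = op(old[i - d], old[i])
        d *= 2
        steps += 1
    return s, steps
```

For `erew_prefix([1, 2, 3, 4, 5, 6, 7, 8])` the result is `([1, 3, 6, 10, 15, 21, 28, 36], 3)`, so the step count matches lg n. Passing `op=min` gives running minima instead of running sums.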
13
Problem 3: Array packing

Assume that we have
an array of n elements, X = {x1, x2, ..., xn}.
Some array elements are marked (or
distinguished).
The requirements of this problem are to
pack the marked elements in the front part of the
array.
place the remaining elements in the back of the
array.
While not a requirement, it is also desirable to
maintain the original order between the marked
elements
maintain the original order between the unmarked
elements

14
In RAM?

How would you do this?
Inplace?
Running time?
Any ideas on how to do this in PRAM?

15
EREW PRAM Algorithm

1. Set si in Pi to 1 if xi is marked and set si = 0
otherwise.
2. Perform a prefix sum on S = (s1, s2, ..., sn)
to obtain destination di = si for each marked xi.
3. All PEs set m = sn, the total number of marked
elements.
4. Pi sets si to 0 if xi is marked and otherwise
sets si = 1.
5. Perform a prefix sum on S and set di = si + m
for each unmarked xi.
6. Each Pi copies array element xi into address
di in X.
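
The six steps can be traced with a short sequential sketch. It assumes `itertools.accumulate` as a stand-in for the parallel prefix sum and writes into a separate output list so that the simulated copies in step 6 do not overwrite elements that have not been read yet; the name `pack` is illustrative.

```python
from itertools import accumulate


def pack(x, marked):
    """Stable array packing: marked elements first, then unmarked ones."""
    n = len(x)
    s = [1 if mk else 0 for mk in marked]      # step 1
    pre_marked = list(accumulate(s))           # step 2: prefix sum
    m = pre_marked[-1] if n else 0             # step 3: number of marked items
    t = [0 if mk else 1 for mk in marked]      # step 4
    pre_unmarked = list(accumulate(t))         # step 5: prefix sum
    out = [None] * n
    for i in range(n):                         # step 6: one PE per element
        d = pre_marked[i] if marked[i] else pre_unmarked[i] + m
        out[d - 1] = x[i]                      # slide addresses are 1-based
    return out
```

With `x = ['a', 'b', 'c', 'd', 'e', 'f']` and marks on b, d and e, the call returns `['b', 'd', 'e', 'a', 'c', 'f']`: both groups keep their original order.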

16
Array Packing

Assume n processors are used above.
Optimal prefix sums requires O(lg n) time.
The EREW broadcast of sn needed in Step 3 takes
O(lg n) time using a binary tree in memory
All other steps require constant time.
Runs in O(lg n) time and is cost optimal.
Maintains original order in unmarked group as
well
Notes
Algorithm illustrates usefulness of Prefix Sums
There are many applications for the Array Packing
algorithm

17
Problem 4: PRAM MergeSort

RAM Merge Sort recursion?
PRAM Merge Sort recursion?
Can we speed up the merging?
Merging n elements with n processors can be done
in O(log n) time.
Assume all elements are distinct.
Rank(a, A) = number of elements in A smaller than
a. For example, Rank(8, {1, 3, 5, 7, 9}) = 4.

18
PRAM Merging
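A = (2, 3, 10, 15, 16)
B = (1, 8, 12, 14, 19)
Ranks of A's elements in B: Rank(2) = 1, Rank(3) = 1,
Rank(10) = 2, Rank(15) = 4, Rank(16) = 4
Ranks of B's elements in A: Rank(1) = 0, Rank(8) = 2,
Rank(12) = 3, Rank(14) = 3, Rank(19) = 5
Positions within A and within B: 1 2 3 4 5
An element's position in the merged output is its
own position plus its rank in the other array.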
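
Merging by rank can be checked on the example above. In this sketch each element finds its rank in the other array with a binary search (`bisect_left` from the standard library), which is the O(log n) work a single processor would do; the function names are illustrative and elements are assumed distinct, as on the slide.

```python
from bisect import bisect_left


def rank(a, seq):
    """Number of elements in the sorted sequence seq that are smaller than a."""
    return bisect_left(seq, a)


def merge_by_rank(A, B):
    """Merge two sorted lists of distinct elements using ranks only."""
    out = [None] * (len(A) + len(B))
    # Each loop body is independent, so one processor per element suffices.
    for i, a in enumerate(A):
        out[i + rank(a, B)] = a   # 0-based index in A plus rank in B
    for j, b in enumerate(B):
        out[j + rank(b, A)] = b   # 0-based index in B plus rank in A
    return out
```

For A = [2, 3, 10, 15, 16] and B = [1, 8, 12, 14, 19] this yields [1, 2, 3, 8, 10, 12, 14, 15, 16, 19].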
19
PRAM Merge Sort

T(n) = T(n/2) + O(log n), which solves to
T(n) = O(log^2 n).
Using the idea of pipelined divide and conquer
(D&C), PRAM Mergesort can be done in O(log n).
D&C is one of the most powerful techniques to
solve problems in parallel.

20
Problem 5: Closest Pair

RAM Version ?

(Figure: points split by a vertical separation line L; δ = min(12, 21).)
21
Closest Pair RAM Version
Closest-Pair(p1, ..., pn)
    Compute separation line L such that half the points
    are on one side and half on the other side.           O(n log n)
    δ1 = Closest-Pair(left half)                          2T(n / 2)
    δ2 = Closest-Pair(right half)
    δ = min(δ1, δ2)
    Delete all points further than δ from separation
    line L.                                               O(n)
    Sort remaining points by y-coordinate.                O(n log n)
    Scan points in y-order and compare distance between
    each point and next 11 neighbors. If any of these
    distances is less than δ, update δ.                   O(n)
    return δ
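
A direct sequential rendering of this procedure is sketched below. It sorts the strip by y-coordinate at every level, as the slide does, so it reflects the costs listed here rather than the presorted refinement. The names `closest_pair` and `_solve` are illustrative, points are assumed to be distinct (x, y) tuples, and `math.dist` needs Python 3.8 or later.

```python
import math


def closest_pair(points):
    """Smallest pairwise distance among at least two distinct 2-D points."""
    return _solve(sorted(points))  # sort once by x-coordinate


def _solve(pts):
    n = len(pts)
    if n <= 3:
        # Base case: brute force over the few remaining points.
        return min((math.dist(p, q)
                    for i, p in enumerate(pts) for q in pts[i + 1:]),
                   default=math.inf)
    mid = n // 2
    x_line = pts[mid][0]                      # separation line L
    delta = min(_solve(pts[:mid]), _solve(pts[mid:]))
    # Keep only points within delta of L, sorted by y-coordinate.
    strip = sorted((p for p in pts if abs(p[0] - x_line) <= delta),
                   key=lambda p: p[1])
    for i, p in enumerate(strip):
        for q in strip[i + 1:i + 12]:         # next 11 neighbors in y-order
            delta = min(delta, math.dist(p, q))
    return delta
```

For the points (0,0), (5,5), (1,1), (9,9), (5,6) the function returns 1.0, the distance between (5,5) and (5,6).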
22
Closest Pair PRAM Version?
Closest-Pair(p1, ..., pn)
    Compute separation line L such that half the points
    are on one side and half on the other side.           O(1): use sorted lists
    δ1 = Closest-Pair(left half)                          T(n / 2): in parallel
    δ2 = Closest-Pair(right half)
    δ = min(δ1, δ2)
    Delete all points further than δ from separation
    line L. Sort remaining points by y-coordinate.        O(log n): use presorting and prefix computation
    Scan points in y-order and compare distance between
    each point and next 11 neighbors.                     O(1)
    Find min of all these distances, update δ.            O(log n): again use prefix computation
    return δ
Recurrence: T(n) = T(n/2) + O(log n)
23
Problem 6: Planar Convex Hulls

MergeHull(P)
    HL = MergeHull(points left of the median)
    HR = MergeHull(points right of the median)
    return JoinHulls(HL, HR)

Time complexity in RAM? Time complexity in PRAM?
24
Join_Hulls
25
Towards a better Planar Convex Hull

Let Q = {q1, q2, ..., qn} be a set of points
in the Euclidean plane (i.e., E^2-space).
The convex hull of Q is denoted by CH(Q) and is
the smallest convex polygon containing Q.
It is specified by listing its corner points
(which are from Q) in order (e.g., clockwise
order).
Usual Computational Geometry Assumptions
No three points lie on the same straight line.
No two points have the same x or y coordinate.
There are at least 4 points, as CH(Q) = Q for
n ≤ 3.

26
PRAM CONVEX HULL(n,Q, CH(Q))

1. Sort the points of Q by x-coordinate.
2. Partition Q into k = √n subsets Q1, Q2, ..., Qk
of k points each such that a vertical line can
separate Qi from Qj.
Also, if i < j, then Qi is left of Qj.
3. For i = 1 to k, compute the convex hulls of Qi
in parallel, as follows:
if |Qi| ≤ 3, then CH(Qi) = Qi
else (using k = √n PEs) call PRAM CONVEX HULL(k,
Qi, CH(Qi))
4. Merge the convex hulls CH(Q1), CH(Q2), ...,
CH(Qk) together.

27
Basic Idea
28
Last Step

The upper hull is found first. Then, the lower
hull is found using the same method.
Only finding the upper hull is described here.
Upper and lower convex hull points are merged into
an ordered set.
Each CH(Qi) has √n PEs assigned to it.
The PEs assigned to CH(Qi) (in parallel) compute
the upper tangent from CH(Qi) to another CH(Qj).
A total of √n - 1 tangents are computed for each
CH(Qi).
Details for computing the upper tangents will be
discussed separately.

29
(No Transcript)
30
Last Step

Among the tangent lines from CH(Qi) to polygons
to the left of CH(Qi), let Li be the one with the
smallest slope.
Among the tangent lines from CH(Qi) to polygons
to the right, let Ri be the one with the largest
slope.
If the angle between Li and Ri is less than 180
degrees, no point of CH(Qi) is in CH(Q).
See Figure 5.13 on next slide (from Akl's online
text).
Otherwise, all points in CH(Qi) between where Li
touches CH(Qi) and where Ri touches CH(Qi) are in
CH(Q).
Array Packing is used to combine all convex hull
points of CH(Q) after they are identified.

31
(No Transcript)
32
Complexity

Step 1: The sort takes O(lg n) time.
Step 2: Partition of Q into subsets takes O(1)
time.
Step 3: The recursive calculations of CH(Qi) for
1 ≤ i ≤ √n in parallel take t(√n) time (using √n
PEs for each Qi).
Step 4: The big steps here require O(lg n) time and
are:
Finding the upper tangent from CH(Qi) to CH(Qj)
for each i, j pair.
Array packing, used to form the ordered sequence
of upper convex hull points for Q.
The above steps find the upper convex hull. The lower
convex hull is found similarly.
Upper and lower hulls are merged in O(1) time into an
ordered set.

33
Complexity

Cost for Step 3: Solving the recurrence relation
t(n) = t(√n) + β lg n (β a constant)
yields
t(n) = O(lg n)
Running time for PRAM Convex Hull is O(lg n),
since this is the maximum cost of any step.
Then the cost for PRAM Convex Hull is
C(n) = O(n lg n).
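
A short unrolling shows why the recurrence gives O(lg n). It is a sketch under two assumptions: the additive term is β lg n for some constant β, and rounding of √n is ignored. Each level of recursion halves lg n, so the terms form a geometric series.

```latex
\begin{align*}
t(n) &= t(n^{1/2}) + \beta \lg n \\
     &= t(n^{1/4}) + \beta \lg n^{1/2} + \beta \lg n \\
     &= t(n^{1/4}) + \beta \left(1 + \tfrac{1}{2}\right) \lg n \\
     &\;\;\vdots \\
     &\le t(O(1)) + \beta \lg n \sum_{i \ge 0} 2^{-i} \\
     &= O(1) + 2\beta \lg n \;=\; O(\lg n).
\end{align*}
```

With n processors running for O(lg n) time, the cost is the product n · O(lg n) = O(n lg n), matching C(n) above.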
