Title: Computer Science 320
2. Estimating π
Throw N darts at the unit square, and let C be the number of darts
that land within the quadrant of the unit circle that lies inside
the square. Then C / N should be about the same as the ratio
quadrant area / square area. A circle's area is πR², so with R = 1
the quadrant's area is π / 4, and the unit square's area is 1.
Hence C / N ≈ π / 4, and π ≈ 4C / N.
3. Sequential Program PiSeq
    // Generate N random points in the unit square, count how many
    // are in the unit circle.
    count = 0;
    for (long i = 0; i < N; ++i) {
        double x = prng.nextDouble();
        double y = prng.nextDouble();
        if (x * x + y * y < 1.0) ++count;
    }

    // Stop timing (time is assumed to hold the negated start time).
    time += System.currentTimeMillis();

    // Print results.
    System.out.println("pi = 4 * " + count + " / " + N + " = " +
        (4.0 * count / N));
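The fragment above omits its setup, so here is a minimal self-contained sketch of the same dart-throwing loop. It assumes `java.util.Random` as a stand-in for the slide's `prng` object; the class name `PiSeqSketch`, the method `estimatePi`, and the seed value are hypothetical and chosen only for illustration.

```java
import java.util.Random;

class PiSeqSketch {
    // Estimate pi by throwing n darts at the unit square and counting
    // those that land inside the unit circle quadrant.
    static double estimatePi(long n, long seed) {
        Random prng = new Random(seed); // assumption: stand-in for the slide's prng
        long count = 0;
        for (long i = 0; i < n; ++i) {
            double x = prng.nextDouble();
            double y = prng.nextDouble();
            if (x * x + y * y < 1.0) ++count;
        }
        return 4.0 * count / n;
    }

    public static void main(String[] args) {
        long n = 10_000_000L;
        long time = -System.currentTimeMillis(); // start timing
        double pi = estimatePi(n, 142857L);
        time += System.currentTimeMillis();      // stop timing
        System.out.println("pi is approximately " + pi);
        System.out.println(time + " msec");
    }
}
```

With a fixed seed the estimate is reproducible, which makes it easy to compare against a parallel version later.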
4. Parallel Program PiSmp3
    new ParallelTeam().execute(new ParallelRegion() {
        public void run() throws Exception {
            execute(0, N - 1, new LongForLoop() {
                // Set up per-thread PRNG and counter.
                Random prng_thread = Random.getInstance(seed);
                long count_thread = 0;

                // Extra padding to avert cache interference.
                long pad0, pad1, pad2, pad3, pad4, pad5, pad6, pad7;
                long pad8, pad9, pada, padb, padc, padd, pade, padf;

                // Parallel loop body.
                public void run(long first, long last) {
                    // Skip PRNG ahead to index <first>
                    // (each point consumes two random numbers).
                    prng_thread.setSeed(seed);
                    prng_thread.skip(2 * first);

                    // Generate random points; the chunk bounds
                    // first..last are inclusive.
                    for (long i = first; i <= last; ++i) {
                        double x = prng_thread.nextDouble();
                        double y = prng_thread.nextDouble();
                        if (x * x + y * y < 1.0) ++count_thread;
                    }
                }
                . . .
5. Reduction Step, SMP-Style
    static SharedLong count;
    . . .
    . . .
    public void finish() {
        // Reduce per-thread counts into shared count.
        count.addAndGet(count_thread);
    }
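The same pattern (private per-thread counter, one atomic add at the end) can be sketched with the standard library alone. This is a minimal illustration, not the Parallel Java implementation: it assumes `java.util.concurrent.atomic.AtomicLong` in place of `SharedLong`, plain `Thread` objects in place of `ParallelTeam`, and gives each thread its own seed (`seed + rank`) rather than skipping ahead in one shared sequence. The names `PiSmpSketch` and `countHits` are hypothetical.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

class PiSmpSketch {
    // Count darts inside the quadrant using numThreads threads.
    static long countHits(long n, int numThreads, long seed)
            throws InterruptedException {
        AtomicLong count = new AtomicLong(0); // shared reduction target
        Thread[] team = new Thread[numThreads];
        for (int t = 0; t < numThreads; ++t) {
            final int rank = t;
            // Contiguous chunk [first, last) of the n darts for this thread.
            final long first = n * rank / numThreads;
            final long last = n * (rank + 1) / numThreads;
            team[t] = new Thread(() -> {
                // Assumption: a separately seeded PRNG per thread.
                Random prng = new Random(seed + rank);
                long countThread = 0; // private counter, no contention
                for (long i = first; i < last; ++i) {
                    double x = prng.nextDouble();
                    double y = prng.nextDouble();
                    if (x * x + y * y < 1.0) ++countThread;
                }
                // Reduction step: one atomic update per thread.
                count.addAndGet(countThread);
            });
            team[t].start();
        }
        for (Thread th : team) th.join();
        return count.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long n = 4_000_000L;
        long hits = countHits(n, 4, 142857L);
        System.out.println("pi is approximately " + (4.0 * hits / n));
    }
}
```

Updating the shared counter once per thread, rather than once per dart, is what keeps synchronization cost negligible.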
6. Monte Carlo Design for a Cluster
- Could keep a global counter in process 0, but that
  would involve too many messages
- Use reduction instead, so message passing is minimal
- Each process has its own PRNG, with its own split
  sequence
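To illustrate what a split sequence buys, the sketch below simulates K processes in a single program. Each "process" seeds its own PRNG identically, skips ahead to its slice, and counts hits there; the per-process counts then add up to exactly the sequential count. It assumes `java.util.Random`, which has no skip method, so skipping is emulated by discarding draws (linear time, for illustration only). The names `SplitSequenceSketch`, `sliceCount`, and `splitCount` are hypothetical.

```java
import java.util.Random;

class SplitSequenceSketch {
    // Draw one point and report whether it lands inside the quadrant.
    static boolean hit(Random prng) {
        double x = prng.nextDouble();
        double y = prng.nextDouble();
        return x * x + y * y < 1.0;
    }

    // Reference: one PRNG, all n darts in order.
    static long sequentialCount(long n, long seed) {
        Random prng = new Random(seed);
        long count = 0;
        for (long i = 0; i < n; ++i) if (hit(prng)) ++count;
        return count;
    }

    // What one process computes for darts first..last (inclusive).
    static long sliceCount(long first, long last, long seed) {
        Random prng = new Random(seed);
        // Emulated skip: each dart consumes two draws.
        for (long s = 0; s < 2 * first; ++s) prng.nextDouble();
        long count = 0;
        for (long i = first; i <= last; ++i) if (hit(prng)) ++count;
        return count;
    }

    // Simulate k processes and add up their counts.
    static long splitCount(long n, int k, long seed) {
        long total = 0;
        for (int rank = 0; rank < k; ++rank) {
            long first = n * rank / k;
            long last = n * (rank + 1) / k - 1;
            total += sliceCount(first, last, seed);
        }
        return total;
    }

    public static void main(String[] args) {
        long n = 100_000L;
        System.out.println(sequentialCount(n, 1234L) == splitCount(n, 8, 1234L));
    }
}
```

Because every dart uses the same random numbers it would have used sequentially, the answer does not depend on the number of processes.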
7. Reduction vs Gather
- Could allocate an array of K cells for results,
  where the ith process's result is in the ith
  cell, then gather these into process 0 and let
  process 0 reduce the end result from these
- Instead, the reduce method employs all processes
  in computing the reduction
8. Reduction in Cluster
- Concentrate data into fewer and fewer processes
- When K = 8,
  - processes 4-7 send their data to processes 0-3
  - processes 2-3 send their results to processes 0-1
  - process 1 sends its results to process 0
- Only log2(K) rounds of messages, since the messages
  within a round are sent in parallel!
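The message pattern listed above can be sketched as a single-program simulation, with an array cell standing in for each process's data and an array update standing in for a message. It assumes K is a power of two and that the reduction operator is addition; the names `TreeReduceSketch` and `reduce` are hypothetical.

```java
class TreeReduceSketch {
    // Simulate a tree reduction over data.length "processes".
    // Returns { reduced value held by process 0, rounds, messages }.
    // Assumption: data.length is a power of two.
    static long[] reduce(long[] data) {
        long[] buf = data.clone(); // buf[rank] is the data held by that process
        int rounds = 0;
        int messages = 0;
        for (int half = buf.length / 2; half >= 1; half /= 2) {
            // Processes half..2*half-1 send to processes 0..half-1.
            // On a real cluster these sends happen in parallel.
            for (int rank = half; rank < 2 * half; ++rank) {
                buf[rank - half] += buf[rank];
                ++messages;
            }
            ++rounds;
        }
        return new long[] { buf[0], rounds, messages };
    }

    public static void main(String[] args) {
        long[] result = reduce(new long[] { 1, 2, 3, 4, 5, 6, 7, 8 });
        System.out.println("sum = " + result[0]
            + ", rounds = " + result[1]
            + ", messages = " + result[2]);
    }
}
```

For K = 8 this takes three rounds and seven messages in total, compared with seven sequential receives at process 0 in the gather approach.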
9. Reduction Tree for K = 8
Messages are sent in parallel at each level,
starting at the bottom. Once the receiving processes have
combined their results, messages are sent from the next level.
10. Example: Add the Results
Initial state
After first set of messages
11. Example: Add the Results (continued)
After second set of messages
After third set of messages
12. It's Automatic: reduce
world.reduce(0, buf, LongOp.SUM)
    // Compute the count in each processor
    ...
    // Perform the reduction step. The buffer holds a long,
    // so the matching operator is LongOp.SUM.
    LongItemBuf buf = new LongItemBuf();
    buf.item = count;
    world.reduce(0, buf, LongOp.SUM);
    count = buf.item;
    ...
    ...
    if (rank == 0) {
        // Output the count and the estimate of PI
    }
13. Reduction in Mandelbrot Histogram
    int[] histogram = new int[maxiter + 1];
    ...
    world.reduce(0, IntegerBuf.buffer(histogram), IntegerOp.SUM);
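Reducing an array buffer with a sum operator combines the histograms element by element: cell i of the result is the sum of cell i over all processes. The sketch below shows only that arithmetic, with a 2-D array standing in for the per-process histograms; it does not use the cluster library, and the names `HistogramReduceSketch` and `reduceSum` are hypothetical.

```java
class HistogramReduceSketch {
    // Element-wise sum of per-process histograms of equal length.
    // perProcess[rank] is the histogram computed by that process.
    static int[] reduceSum(int[][] perProcess) {
        int[] total = new int[perProcess[0].length];
        for (int[] histogram : perProcess) {
            for (int i = 0; i < total.length; ++i) {
                total[i] += histogram[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] perProcess = { { 1, 0, 2 }, { 0, 3, 1 } };
        System.out.println(java.util.Arrays.toString(reduceSum(perProcess)));
        // prints [1, 3, 3]
    }
}
```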