Title: Shortest Path Algorithms
1. Shortest Path Algorithms
2. Talk Outline
- Background for the problem
- Algorithms
- Code Listings
- Numerical Results
- Conclusions and Issues
3. Talk Outline
- Background for the problem
- Algorithms
- Code Listings
- Numerical Results
- Conclusions and Issues
4. Weighted Directed Graph
[Figure: example weighted directed graph with numbered vertices and edge weights]
5. Distance Matrix
[Figure: distance matrix for the example graph]
6. Shortest Path Problem
- Given the adjacency matrix A.
- Compute the distance matrix D.
7. Talk Outline
- Background for the problem
- Algorithms
- Code Listings
- Numerical Results
- Conclusions and Issues
8. Floyd's Algorithm

for (k = 0; k < n; k++)
  for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
      D[i][j] = min(D[i][j], D[i][k] + D[k][j]);
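The triple loop above is complete as it stands; a minimal self-contained C sketch follows. The toy 4-vertex graph, the INF sentinel for "no edge", and the function name are illustrative assumptions, not from the slides.

```c
#define N 4           /* toy problem size, for illustration only */
#define INF 1000000   /* stands in for "no edge"; an assumed sentinel */

/* Floyd's algorithm: D starts as the adjacency matrix (0 on the
   diagonal, INF where there is no edge) and ends as the distance
   matrix. The update is exactly
   D[i][j] = min(D[i][j], D[i][k] + D[k][j]). */
void floyd(int D[N][N])
{
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (D[i][k] + D[k][j] < D[i][j])
                    D[i][j] = D[i][k] + D[k][j];
}
```

Writing the min as an explicit comparison avoids evaluating the sum twice; the behavior is the same as the slide's one-line update.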
9. Floyd's Algorithm

D[i][j] = min(D[i][j], D[i][k] + D[k][j])

[Figure: the update step, showing entries d_ij, d_ik, and d_kj at the intersections of rows i and k with columns j and k]
10. Floyd Parallel 1
- Give each processor a contiguous set of rows of A and D (row-wise partition).
- Can use at most N processors.
11. Floyd Parallel 1

for (k = 0; k < n; k++)
  for (i = i_local_start; i < i_local_end + 1; i++)
    for (j = 0; j < n; j++)
      D[i][j] = min(D[i][j], D[i][k] + D[k][j]);
12. Floyd Parallel 1 (P3's view)

for (k = 0; k < n; k++)
  for (i = i_local_start; i < i_local_end + 1; i++)
    for (j = 0; j < n; j++)
      D[i][j] = min(D[i][j], D[i][k] + D[k][j]);

[Figure: the matrix with the kth row and P3's rows ("my rows") highlighted]
13. Floyd Parallel 1 Costs
- Computation:
  - T = (N^3/P) tc
- Communication (broadcasts of the kth row):
  - T = N log(P) (a + bN)
- Overall:
  - T = (N^3/P) tc + N log(P) (a + bN)
14. Floyd Parallel 2
- Give each processor a contiguous block of A and D (row/column partition).
- Can use up to N^2 processors.
15. Floyd Parallel 2

for (k = 0; k < n; k++)
  for (i = i_local_start; i < i_local_end + 1; i++)
    for (j = j_local_start; j < j_local_end; j++)
      D[i][j] = min(D[i][j], D[i][k] + D[k][j]);
16. Floyd Parallel 2 (P14's view)

for (k = 0; k < n; k++)
  for (i = i_local_start; i < i_local_end + 1; i++)
    for (j = j_local_start; j < j_local_end; j++)
      D[i][j] = min(D[i][j], D[i][k] + D[k][j]);

[Figure: the matrix with P14's block ("my block") highlighted]
17. Floyd Parallel 2 (P14's view)

for (k = 0; k < n; k++)
  for (i = i_local_start; i < i_local_end + 1; i++)
    for (j = j_local_start; j < j_local_end; j++)
      D[i][j] = min(D[i][j], D[i][k] + D[k][j]);

[Figure: the matrix with the kth row, the kth column, and P14's block highlighted]
18. Floyd Parallel 2 Costs
- Computation:
  - T = (N^3/P) tc
- Communication (broadcasts of the kth row and kth column):
  - T = 2N log(sqrt(P)) (a + bN/sqrt(P))
    = N log(P) (a + bN/sqrt(P))
- Overall:
  - T = (N^3/P) tc + N log(P) (a + bN/sqrt(P))
19. Dijkstra's Algorithm: find shortest paths from vertex s to all others (row s of D).

d[s] = 0; if (i != s) d[i] = inf;
T = V;  /* set of all vertices */
for (k = 0; k < n; k++) {
  find i in T with min d[i]
  for each edge (i, j) with j in T
    if (d[j] > d[i] + a[i][j]) d[j] = d[i] + a[i][j]
  T = T - {i}
}
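The pseudocode above can be turned directly into runnable C. The adjacency-matrix representation, the INF sentinel, and the 5-vertex problem size below are illustrative assumptions.

```c
#define NV 5          /* toy vertex count, for illustration only */
#define INF 1000000   /* stands in for "no edge"; an assumed sentinel */

/* Single-source Dijkstra over adjacency matrix a (INF = no edge),
   following the slide's pseudocode: repeatedly remove the closest
   unvisited vertex i from T, then relax its outgoing edges. */
void dijkstra(const int a[NV][NV], int s, int d[NV])
{
    int inT[NV];                          /* 1 while vertex is still in T */
    for (int v = 0; v < NV; v++) { d[v] = INF; inT[v] = 1; }
    d[s] = 0;
    for (int k = 0; k < NV; k++) {
        int i = -1;
        for (int v = 0; v < NV; v++)      /* find i in T with min d[i] */
            if (inT[v] && (i < 0 || d[v] < d[i])) i = v;
        inT[i] = 0;                       /* T = T - {i} */
        for (int j = 0; j < NV; j++)      /* relax edges (i, j), j in T */
            if (inT[j] && d[i] + a[i][j] < d[j])
                d[j] = d[i] + a[i][j];
    }
}
```

The linear scan for the minimum gives the O(N^2) per-source cost assumed by the cost models later in the talk.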
20. Dijkstra's Algorithm

D[j] = min(D[j], D[i] + A[i][j])

[Figure: the relaxation step, showing vertices s, i, and j with labels d_i, d_j, and a_ij]
21. Dijkstra Parallel
- Give each processor all of A and have it run serial Dijkstra to compute a contiguous set of rows of D.
- Can use at most N processors. Local memory must hold all of A.

[Figure: A replicated on every processor; D partitioned by rows]
22. Dijkstra's Algorithm, Parallel

for (s = local_firstrow; s < local_lastrow; s++) {
  D[s][s] = 0; if (i != s) D[s][i] = inf;
  T = V;  /* set of all vertices */
  for (k = 0; k < n; k++) {
    find i in T with min D[s][i]
    for each edge (i, j) with j in T
      if (D[s][j] > D[s][i] + A[i][j])
        D[s][j] = D[s][i] + A[i][j]
    T = T - {i}
  }
}
23. Dijkstra Parallel Costs
- From the literature, Dijkstra's algorithm is slower than Floyd's by a factor F of about 1.6.
- Computation:
  - T = (N^3/P) F tc
- No communication.
24. Cost Summary
[Table: cost models for Floyd 1, Floyd 2, and Dijkstra]
25. Talk Outline
- Background for the problem
- Algorithms
- Code Listings
- Numerical Results
- Conclusions and Issues
26. Talk Outline
- Background for the problem
- Algorithms
- Code Listings
- Numerical Results
- Conclusions and Issues
27. Run Times on Bluemarlin
- On each run, the maximum time over all the processors was recorded as the time for that run.
- Three runs were made, and the median run time is recorded in the following tables.
28. Serial Run Times
- In serial, the Floyd code was faster than the Dijkstra code.
- Floyd's speed advantage was smaller than the reported value: our F was about 1.15.
29. Parallel Run Times
30. N = 720, Run Times and Speed-up
- Floyd 2 is fastest up to P = 4; Dijkstra is fastest thereafter.
- Dijkstra has the best speed-ups.
31. Models of Performance
- Ping-pong test code was used to generate data.
- Least-squares fit:
  - a = 0.002 sec/message
  - b = 8x10^-7 sec/double
32. Dijkstra Model Comparison
- Model:
  - T = (N^3/P) F tc
- No communication, so no a or b terms.
33. Floyd Model Comparison
- Poor match between model and results.
- The model appears to overestimate the cost of communication.
34. Talk Outline
- Background for the problem
- Algorithms
- Code Listings
- Numerical Results
- Conclusions and Issues
35. The actual run times for the codes confirmed expectations.
- Floyd was faster than Dijkstra in serial.
- With an increasing number of processors, Dijkstra eventually becomes faster because no communication occurs.
- Speed-ups were good for Floyd 1, better for Floyd 2, and best (near perfect) for Dijkstra.
- The worst speed-up was with Floyd 1 and N = 576, where the speed-up was 9.3 for the 16-processor run.
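The speed-up figures above follow from the standard definitions; this minimal sketch makes them explicit (the 9.3 figure is from the slide, the helper names are mine).

```c
/* Speed-up S = T_serial / T_parallel;
   parallel efficiency E = S / P.     */
double speedup(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}

double efficiency(double s, double p)
{
    return s / p;
}
```

A speed-up of 9.3 on 16 processors corresponds to roughly 58% parallel efficiency, which is why Floyd 1's scaling is described as the worst of the three codes.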
36. The model gave a poor quantitative prediction for the parallel run times of Floyd.
- Communication consists only of broadcasts.
- The a and b terms were computed from MPI_Send and MPI_Recv code.
- Plugging these a and b into
  - T = N log(P) (a + bN)
- must not give a good prediction for broadcast time.
- Actual broadcast times seem to be quite a bit smaller.
37. Reconciling Model to Result
- It is clear that the cost of the broadcasts is being overestimated.
- The model fits much better if the latency factor a is reduced by a factor of 10.
38. Reconciling Model to Result
- It is clear that the cost of the broadcasts is being overestimated.
- The model fits much better if the latency factor a is reduced by a factor of 10.
- Better still if the bandwidth parameter b is also reduced by a factor of 10.
39. Future Work?
- Time the broadcast and see how it matches the model (a poor match is expected).
- Adjust a and b to fit.
- Find another model with a better match.