Title: All-Pairs Shortest Paths
1. All-Pairs Shortest Paths
- CSc 8530, Dr. Prasad
- Jon A. Preston
- March 17, 2004
2. Outline
- Review of graph theory
- Problem definition
- Sequential algorithms
- Properties of interest
- Parallel algorithm
- Analysis
- Recent research
- References
3. Graph Terminology
- G = (V, E)
- W = weight matrix
- w_ij = weight/length of edge (v_i, v_j)
- w_ij = ∞ if v_i and v_j are not connected by an edge
- w_ii = 0
- Assume W contains positive, zero, and negative values
- For this problem, we cannot have a negative-sum cycle in G
4. Weighted Graph and Weight Matrix
[Figure: an undirected weighted graph on vertices v0-v4 (edge weights include -4, 1, 2, 3, 5, 6, 7, 9) and its corresponding weight matrix]
5. Directed Weighted Graph and Weight Matrix
[Figure: a directed weighted graph on vertices v0-v5 (edge weights include -2, -1, 1, 2, 3, 4, 5, 6, 7, 9) and its corresponding 6 x 6 weight matrix, indexed 0-5]
6. All-Pairs Shortest Paths Problem Defined
- For every pair of vertices v_i and v_j in V, we are required to find the length of the shortest path from v_i to v_j along edges in E.
- Specifically, a matrix D is to be constructed such that d_ij is the length of the shortest path from v_i to v_j in G, for all i and j.
- The length of a path (or cycle) is the sum of the lengths (weights) of the edges forming it.
7. Sample Shortest Path
[Figure: the directed weighted graph from the previous slide, with the path v0 -> v1 -> v2 -> v4 highlighted]
The shortest path from v0 to v4 is along edges (v0, v1), (v1, v2), (v2, v4) and has length 6.
8. Disallowing Negative-Length Cycles
- APSP does not allow the input to contain negative-length cycles
- This restriction is necessary because:
- If such a cycle existed within a path from v_i to v_j, one could traverse the cycle indefinitely, producing paths of ever-shorter length from v_i to v_j
- If a negative-length cycle exists, then all paths which contain this cycle would have a length of -∞
9. Recent Work on Sequential Algorithms
- Floyd-Warshall algorithm is Θ(V³) (a minimal sketch follows this list)
  - Appropriate for dense graphs: |E| = O(V²)
- Johnson's algorithm
  - Appropriate for sparse graphs: |E| = O(V)
  - O(V² log V + V·E) if using a Fibonacci heap
  - O(V·E log V) if using a binary min-heap
- Shoshan and Zwick (1999)
  - Integer edge weights in {1, 2, ..., W}
  - O(W·V^ω · p(V, W)), where ω ≈ 2.376 comes from fast matrix multiplication (cf. Strassen's algorithm) and p is a polylog function
- Pettie (2002)
  - Allows real-weighted edges
  - O(V² log log V + V·E)
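For concreteness, a minimal sequential Floyd-Warshall sketch in Python (my own illustrative code, not from the slides). The final check also illustrates why negative-length cycles are disallowed: they would surface as negative diagonal entries.

    INF = float("inf")

    def floyd_warshall(w):
        """Return the APSP distance matrix for weight matrix w.

        w[i][j] is the edge length, INF if there is no edge, and w[i][i] = 0.
        Runs in Theta(V^3) time, as stated above.
        """
        n = len(w)
        d = [row[:] for row in w]          # start from D = W
        for k in range(n):                 # allow v_k as an intermediate vertex
            for i in range(n):
                for j in range(n):
                    if d[i][k] + d[k][j] < d[i][j]:
                        d[i][j] = d[i][k] + d[k][j]
        # Sanity check: this problem disallows negative-length cycles,
        # which would show up as a negative entry on the diagonal.
        if any(d[i][i] < 0 for i in range(n)):
            raise ValueError("graph contains a negative-length cycle")
        return d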
10. Properties of Interest
- Let d_ij^(k) denote the length of the shortest path from v_i to v_j that goes through at most k - 1 intermediate vertices (k hops)
- d_ij^(1) = w_ij (edge length from v_i to v_j)
- If i ≠ j and there is no edge from v_i to v_j, then d_ij^(1) = ∞
- Also, d_ii^(1) = 0
- Given that there are no negative-weight cycles in G, there is no advantage in visiting any vertex more than once in the shortest path from v_i to v_j
- Since there are only n vertices in G, d_ij = d_ij^(n-1)
11. Guaranteeing Shortest Paths
- If the shortest path from v_i to v_j contains v_r and v_s (where v_r precedes v_s):
- The path from v_r to v_s must be minimal (or it wouldn't appear within the shortest path)
- Thus, to obtain the shortest path from v_i to v_j, we can compute all combinations of optimal sub-paths (whose concatenation is a path from v_i to v_j), and then select the shortest one
[Figure: a path v_i ... v_r ... v_s ... v_j decomposed into minimal sub-paths; the overall shortest path is the minimum over such combinations of minima]
12. Iteratively Building Shortest Paths
[Figure: paths from v_i to v_j built by extending a path to each candidate predecessor v_1, ..., v_n with a final edge of weight w_1j, ..., w_nj]
13. Recurrence Definition
- For k > 1, d_ij^(k) = min over all l of { d_il^(k/2) + d_lj^(k/2) }
- Guarantees O(log k) steps to calculate d_ij^(k)
[Figure: a k-vertex path from v_i to v_j split at an intermediate vertex v_l into two minimal k/2-vertex halves]
14. Similarity
- Note the similarity to ordinary matrix multiplication: c_ij = sum over l of a_il × b_lj, versus d_ij^(k) = min over l of { d_il^(k/2) + d_lj^(k/2) }
15. Computing D
- Let D_k = the matrix with entries d_ij^(k) for 0 ≤ i, j ≤ n - 1
- Given D_1, compute D_2, D_4, ..., D_m, where m = 2^⌈log(n-1)⌉
- D = D_m
- To calculate D_k from D_{k/2}, use a special form of matrix multiplication (sketched below) in which × is replaced by + and + is replaced by min
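A minimal Python sketch of this (min, +) product (my own illustrative code, not from the slides): the inner × becomes + and the outer summation becomes a min.

    INF = float("inf")

    def min_plus(a, b):
        """(min, +) matrix product: c[i][j] = min over l of a[i][l] + b[l][j].

        Ordinary matrix multiplication with * replaced by + and + replaced
        by min, exactly the substitution the slide describes.
        """
        n = len(a)
        return [[min(a[i][l] + b[l][j] for l in range(n))
                 for j in range(n)]
                for i in range(n)]

With this product, D_k = min_plus(D_{k/2}, D_{k/2}), so D is reached after ⌈log(n-1)⌉ squarings.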
16. Modified Matrix Multiplication
- Step 2: for r = 0 to N - 1 dopar
    C_r = A_r + B_r
  end for
- Step 3: for m = 2q to 3q - 1 do
    for all r in N with r_m = 0 dopar
      C_r = min(C_r, C_r(m))
    end for
  end for
- (Here N = n³ is the number of processors, q = log n, r(m) denotes r with bit m complemented, × has become +, and the summation has become min; a sequential rendering follows)
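A sequential Python sketch of the same structure (names are mine): conceptually, after step 2 the "processor" at cube position (l, i, j) holds A[i][l] + B[l][j], and step 3 min-reduces along the l dimension, which the hypercube performs in log n rounds of pairwise mins.

    INF = float("inf")

    def min_plus_cube(a, b):
        """(min, +) product organized like the n^3-processor cube."""
        n = len(a)
        # Step 2 analogue: every position (l, i, j) computes one sum "in parallel".
        cube = [[[a[i][l] + b[l][j] for j in range(n)]
                 for i in range(n)]
                for l in range(n)]
        # Step 3 analogue: min-reduce along the l dimension (folded sequentially here).
        return [[min(cube[l][i][j] for l in range(n))
                 for j in range(n)]
                for i in range(n)]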
17. Modified Example
[Figure: from 9.2, after step (1.3) - the eight processors P000-P111, each holding a pair of A and B entries]
18. Modified Example (step 2)
[Figure: from 9.2, after modified step 2 - each processor's C register holds a sum A_r + B_r]
19. Modified Example (step 3)
[Figure: from 9.2, after modified step 3 - pairwise MINs across the cube leave the (min, +) product in the C registers]
20. Hypercube Setup
- Begin with a hypercube of n³ processors
- Each has registers A, B, and C
- Arrange them in an n × n × n array (cube)
- Set A(0, j, k) = w_jk for 0 ≤ j, k ≤ n - 1
  - i.e., processors in positions (0, j, k) contain D_1 = W
- When done, C(0, j, k) contains the APSP matrix D_m
21. Setup Example
[Figure: the directed weighted graph from slide 5 with its weight matrix W; D_1 = W is loaded into the registers A(0, j, k)]
22. APSP Parallel Algorithm
Algorithm HYPERCUBE SHORTEST PATH (A, C) (a Python rendering follows the pseudocode):
Step 1: for j = 0 to n - 1 dopar
          for k = 0 to n - 1 dopar
            B(0, j, k) = A(0, j, k)
          end for
        end for
Step 2: for i = 1 to ⌈log(n - 1)⌉ do
          (2.1) HYPERCUBE MATRIX MULTIPLICATION (A, B, C)
          (2.2) for j = 0 to n - 1 dopar
                  for k = 0 to n - 1 dopar
                    (i)  A(0, j, k) = C(0, j, k)
                    (ii) B(0, j, k) = C(0, j, k)
                  end for
                end for
        end for
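A sequential Python rendering of this driver loop (illustrative; the (min, +) product from slide 15 is repeated so the sketch is self-contained):

    import math

    INF = float("inf")

    def min_plus(a, b):
        n = len(a)
        return [[min(a[i][l] + b[l][j] for l in range(n))
                 for j in range(n)]
                for i in range(n)]

    def hypercube_shortest_path(w):
        """Repeated (min, +) squaring: D_1 -> D_2 -> D_4 -> ... -> D_m."""
        n = len(w)
        a = [row[:] for row in w]                  # A holds D_1 = W
        b = [row[:] for row in w]                  # Step 1: B <- A
        for _ in range(max(1, math.ceil(math.log2(n - 1)))):  # Step 2
            c = min_plus(a, b)                     # (2.1) modified product
            a = [row[:] for row in c]              # (2.2)(i)  A <- C
            b = [row[:] for row in c]              # (2.2)(ii) B <- C
        return a                                   # = D_m, the APSP matrix

For example, hypercube_shortest_path([[0, 3, INF], [INF, 0, -1], [1, INF, 0]]) returns the 3 x 3 distance matrix.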
23. An Example
[Figure: the sequence of distance matrices D_1, D_2, D_4, D_8 (each 6 x 6, indexed 0-5) produced by the algorithm on the example graph]
24. Analysis
- Steps 1 and (2.2) require constant time
- There are ⌈log(n - 1)⌉ iterations of Step (2.1)
  - Each requires O(log n) time
- The overall running time is t(n) = O(log² n)
- p(n) = n³
- Cost is c(n) = p(n) · t(n) = O(n³ log² n)
- Efficiency is O(n³) / O(n³ log² n) = O(1 / log² n)
25. Recent Research
- Jenq and Sahni (1987) empirically compared various parallel algorithms for solving APSP
- Kumar and Singh (1991) used the isoefficiency metric (developed by Kumar and Rao) to analyze the scalability of parallel APSP algorithms
  - Hardware vs. scalability
  - Memory vs. scalability
26. Isoefficiency
- For scalable algorithms (those whose efficiency increases monotonically as the problem size increases while p remains constant), efficiency can be maintained for an increasing number of processors provided that the problem size also increases
- Relates the problem size to the number of processors necessary for the speedup to increase in proportion to the number of processors used (in symbols, see below)
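In symbols (the standard formulation from the Kumar-Rao isoefficiency work; this notation is mine, not the slides'): with sequential time $T_s$, parallel time $T_p$ on $p$ processors, and total overhead $T_o = p\,T_p - T_s$,

    E = \frac{T_s}{p\,T_p} = \frac{1}{1 + T_o/T_s}
    \qquad\Longrightarrow\qquad
    T_s = \frac{E}{1 - E}\, T_o(T_s, p)

Holding E fixed and solving for the problem size as a function of p gives the isoefficiency function.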
27. Isoefficiency (cont.)
- Given an architecture, defines the degree of scalability
- Tells us the required growth in problem size to be able to efficiently utilize an increasing number of processors
- Example (arithmetic written out below): given an isoefficiency of k·p³, suppose that with p_0 processors and problem size w_0 the speedup is 0.8·p_0 (efficiency 0.8). If p_1 = 2·p_0, then to maintain an efficiency of 0.8 we need w_1 = 2³·w_0 = 8·w_0
- Indicates the superiority of one algorithm over another only when problem sizes are increased in the range between the two isoefficiency functions
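The arithmetic behind the example, written out:

    w = k\,p^{3}, \qquad
    w_1 = k\,p_1^{3} = k\,(2p_0)^{3} = 8\,k\,p_0^{3} = 8\,w_0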
29. Memory Overhead Factor (MOF)
- The ratio of:
  - the total memory required across all processors, to
  - the memory required for the same problem size on a single processor
- We'd like this to be low!
30. Architectures Discussed
- Shared Memory (CREW)
- Hypercube (Cube)
- Mesh
- Mesh with Cut-Through Routing
- Mesh with Cut-Through and Multicast Routing
- Also examined fast and slow communication
technologies
31. Parallel APSP Algorithms
- Floyd Checkerboard
- Floyd Pipelined Checkerboard
- Floyd Striped
- Dijkstra Source-Partition
- Dijkstra Source-Parallel
32. General Parallel Algorithm (Floyd)
- Repeat steps 1 through 4 for k = 1 to n:
- Step 1: If this processor has a segment of column k of P^(k-1), transmit it to all processors that need it
- Step 2: If this processor has a segment of row k of P^(k-1), transmit it to all processors that need it
- Step 3: Wait until the needed segments of column k and row k of P^(k-1) have been received
- Step 4: For all i, j in this processor's partition, compute P^(k)[i, j] = min( P^(k-1)[i, j], P^(k-1)[i, k] + P^(k-1)[k, j] ) (sketched below)
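A sequential Python sketch of this loop (my own illustrative code). It makes explicit that iteration k reads only row k and column k of P^(k-1), which is exactly what steps 1 and 2 broadcast in the layouts on the next slides.

    INF = float("inf")

    def general_parallel_floyd(w):
        """Floyd's algorithm, written to expose the row-k/column-k dependence."""
        n = len(w)
        p = [row[:] for row in w]
        for k in range(n):
            col_k = [p[i][k] for i in range(n)]   # steps 1/3: broadcast column k
            row_k = p[k][:]                        # steps 2/3: broadcast row k
            for i in range(n):                     # step 4: purely local updates
                for j in range(n):
                    if col_k[i] + row_k[j] < p[i][j]:
                        p[i][j] = col_k[i] + row_k[j]
        return p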
33. Floyd Checkerboard
Each cell is assigned to a different processor, and this processor is responsible for updating the cost matrix values at each iteration of the Floyd algorithm. Steps 1 and 2 of the GPF (the General Parallel Floyd algorithm above) involve each of the processors sending their data to the neighbor columns and rows.
34. Floyd Pipelined Checkerboard
Similar to the preceding. Steps 1 and 2 of the GPF involve each of the processors sending their data to the neighbor columns and rows. The difference is that the processors are not synchronized and compute and send data as soon as possible (or send as soon as they receive).
35. Floyd Striped
Each column is assigned a different processor, and this processor is responsible for updating the cost matrix values at each iteration of the Floyd algorithm. Step 1 of the GPF involves each of the processors sending their data to the neighbor columns. Step 2 is not needed (since the column is contained within the processor).
36. Dijkstra Source-Partition
- The set of source vertices is partitioned equally over the p processors, and Dijkstra's single-source shortest path algorithm is executed in parallel (sketched below)
- Each processor finds the shortest paths from each vertex in its set to all other vertices in the graph
- Fortunately, this approach involves no inter-processor communication
- Unfortunately, only n processors can be kept busy
- Also, the memory overhead is high, since each processor has a copy of the weight matrix
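A Python sketch of source partitioning (illustrative; the p "processors" are simulated with a sequential loop, and non-negative edge weights are assumed, as Dijkstra requires):

    import heapq

    INF = float("inf")

    def dijkstra(adj, s):
        """Single-source shortest paths; adj[u] = list of (v, weight >= 0)."""
        dist = [INF] * len(adj)
        dist[s] = 0
        pq = [(0, s)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist[u]:
                continue                    # stale queue entry
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(pq, (d + w, v))
        return dist

    def source_partition_apsp(adj, p):
        """Each of the p 'processors' solves SSSP from its share of sources."""
        n = len(adj)
        rows = [None] * n
        for r in range(p):                  # one iteration = one processor
            for s in range(r, n, p):        # processor r's vertex set
                rows[s] = dijkstra(adj, s)  # no inter-processor communication
        return rows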
37. Dijkstra Source-Parallel
- Motivated by keeping more processors busy
- Run n copies of Dijkstra's SSSP algorithm
- Each copy runs on p/n processors (for p > n)
38. Calculating Isoefficiency
- Example: Floyd Checkerboard
- At most n² processors can be kept busy
- n must grow as Θ(√p) due to the problem structure
- By Floyd (sequential), T_e = Θ(n³)
- Thus the isoefficiency is Θ((√p)³) = Θ(p^1.5)
- But what about communication?
39. Calculating Isoefficiency (cont.)
- t_s = message startup time
- t_w = per-word communication time
- t_c = time to compute the next iteration value for one cell of the matrix
- m = number of words sent
- d = number of hops between nodes
- Hypercube:
  - (t_s + t_w·m) log d = time to deliver m words
  - 2(t_s + t_w) log p = barrier synchronization time (up and down the tree)
  - d = √p
  - Step 1: (t_s + t_w·n/√p) log √p
  - Step 2: (t_s + t_w·n/√p) log √p
  - Step 3 (barrier synch): 2(t_s + t_w) log p
  - Step 4: t_c·n²/p
- Isoefficiency: Θ(p^1.5 (log p)³)
40. Mathematical Details
How are n and p related?
41. Mathematical Details (cont.)
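The derivation on these two slides was an image and did not survive extraction; what follows is a hedged reconstruction from the per-step costs on slide 39, not the original slide content. Multiplying the per-iteration time by the n iterations and by p, the total overhead is dominated by

    p\,T_p - n^{3} t_c \;\approx\;
    \Theta\!\left(n\,p\,t_s \log p\right) + \Theta\!\left(n^{2}\sqrt{p}\;t_w \log p\right)

Balancing the Θ(n³) useful work against the dominant Θ(n²√p log p) term gives n = Θ(√p log p), so the problem size n³ must grow as Θ(p^1.5 (log p)³), matching the isoefficiency stated on slide 39.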
42. Calculating Isoefficiency (cont.)
- t_s = message startup time
- t_w = per-word communication time
- t_c = time to compute the next iteration value for one cell of the matrix
- m = number of words sent
- d = number of hops between nodes
- Mesh:
  - Steps 1-4: [per-step cost expressions lost in extraction]
- Isoefficiency: Θ(p³ + p^2.25) = Θ(p³)
43. Isoefficiency and MOF for Algorithm-Architecture Combinations
[Table lost in extraction: the isoefficiency function and memory overhead factor for each algorithm-architecture pair]
44. Comparing Metrics
- We've used cost previously this semester (cost = p · T_p)
- But notice that the cost of all of the architecture-algorithm combinations discussed here is Θ(n³)
- Clearly some are more scalable than others
- Thus isoefficiency is a useful metric when analyzing algorithms and architectures
45. References
- Akl, S. G. Parallel Computation: Models and Methods. Prentice Hall, Upper Saddle River, NJ, pp. 381-384, 1997.
- Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Algorithms (2nd Edition). The MIT Press, Cambridge, MA, pp. 620-642, 2001.
- Jenq, J. and Sahni, S. "All Pairs Shortest Path on a Hypercube Multiprocessor." In International Conference on Parallel Processing, pp. 713-716, 1987.
- Kumar, V. and Singh, V. "Scalability of Parallel Algorithms for the All Pairs Shortest Path Problem." Journal of Parallel and Distributed Computing, vol. 13, no. 2, Academic Press, San Diego, CA, pp. 124-138, 1991.
- Pettie, S. "A Faster All-Pairs Shortest Path Algorithm for Real-Weighted Sparse Graphs." In Proc. 29th Int'l Colloquium on Automata, Languages, and Programming (ICALP '02), LNCS vol. 2380, pp. 85-97, 2002.