Title: A survey of external graph algorithms
1 A survey of external graph algorithms
2 Outline
- Introduction
- Locality
- External graph algorithm
- Current best results
- Sparsification
- Simulating PRAM
- BFS
- Minimum Spanning Forest
- Point-to-point shortest path
3 Does time complexity measure the running time?
- Problem: loading a graph into adjacency lists
- Input: a list of edges sorted by Link_From
- In the left one, we add each edge to the linked list of FROM
- In the right one, we add each edge to the linked list of TO
Left:
    while (input not exhausted) {
        read an edge and allocate mem;
        ptr->node = edge->to;
        ptr->next = list[edge->from];
        list[edge->from] = ptr;
    }
Right:
    while (input not exhausted) {
        read an edge and allocate mem;
        ptr->node = edge->from;
        ptr->next = list[edge->to];
        list[edge->to] = ptr;
    }
The difference of running time is 2050
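The two loops do the same amount of work and differ only in which list head each insertion touches. A minimal Python sketch of that difference, assuming a hypothetical helper `build_lists` that appends rather than prepends, and that counts how often two consecutive insertions touch different lists as a rough stand-in for non-local memory accesses:

```python
from collections import defaultdict

def build_lists(edges, key_index):
    """Build adjacency lists keyed by endpoint key_index (0 = FROM, 1 = TO).

    Also counts how often two consecutive insertions touch different
    lists, a crude proxy for accesses that lose locality.
    """
    lists = defaultdict(list)
    switches = 0
    previous_key = None
    for edge in edges:
        key = edge[key_index]
        lists[key].append(edge[1 - key_index])
        if previous_key is not None and key != previous_key:
            switches += 1
        previous_key = key
    return lists, switches

# Edges sorted by FROM, as in the slide's input.
edges = sorted([(0, 3), (0, 1), (1, 2), (1, 0), (2, 3), (2, 1), (3, 0), (3, 2)])
by_from, switches_from = build_lists(edges, 0)  # consecutive edges share a list
by_to, switches_to = build_lists(edges, 1)      # consecutive edges jump between lists
```

On this tiny input the FROM-keyed build changes list only when the source vertex changes, while the TO-keyed build jumps on every insertion.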
4 Another test
- Data access in a large array
- Random vs Sequential
- 100M integers (400M bytes), no virtual memory
- Performance ratio 4004000
- 300M integers (1.2G bytes), virtual memory
- Performance ratio: very large
- Program running time does not depend only on computational time complexity
5 What's the problem?
- To load a graph into memory
- Edges stored in a file
- Sorted adjacency list
- How long does it take for a graph with 30M edges?
- Sorted: 64 secs
- Unsorted: > 10 hrs!!
- What's the time complexity?
- Linear time (bounded degree)?
- For massive data, CPU time is not the major concern.
[Figure: the time to load a graph (nodes = edges/15); axes: edges, secs]
6 Memory hierarchy
[Figure: the memory hierarchy annotated with access times (ns, 10 ns, ms); from Vitter 2007]
7 Locality
- Locality, working set
- Caching, prefetching
- System level (general) solutions
- External memory (or EM) algorithms/data structures
- The ones that explicitly manage data placement and movement.
- A.k.a. I/O algorithms or out-of-core algorithms.
- The I/O complexity
- The number of communications between the internal memory and the external memory.
- Bottleneck of the performance of EM algorithms
8 The reason for our tests
- The computer system supports virtual memory
- Programmers can use very large memory as if it were internal memory
- Severe performance problems will be encountered if data is not arranged well
- For both cache memory and external memory
9 A nice survey paper
- External Memory Algorithms and Data Structures: Dealing with Massive Data
- ACM Computing Surveys, Vol. 33, No. 2, June 2001
- February 2007 revision available online
- Jeffrey Scott Vitter
- Dean of Science
- Purdue University
10 Parallel disk model
11 Problem parameters
- Usually, we focus on the case P = 1 and regard DB as the block size.
- For graph problems, N = V + E.
12 Fundamental operations
Note that the constant is somewhat more important when discussing I/O complexity.
13 In practice, sorting can be done in an almost linear number of I/Os
- $\log_m n$ is almost constant: for N = 1T, M = 1G, B = 10K we get n = N/B = 100M and m = M/B = 100K, so $\log_m n < 2$
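The arithmetic behind the "almost constant" claim can be checked directly. A small sketch, assuming the usual bounds Scan(N) = N/B and Sort(N) = (N/B) log_{M/B}(N/B) with constants dropped; the function names `scan_ios` and `sort_ios` are illustrative only:

```python
import math

def scan_ios(N, B):
    """Number of block transfers to read N items with block size B."""
    return -(-N // B)  # ceiling division

def sort_ios(N, M, B):
    """Sort(N) = (N/B) * log_{M/B}(N/B), constants dropped, at least one pass."""
    n = N / B
    m = M / B
    return n * max(1.0, math.log(n) / math.log(m))

# The slide's parameters: N = 1T, M = 1G, B = 10K.
N, M, B = 10**12, 10**9, 10**4
n, m = N // B, M // B                  # n = 100M blocks, m = 100K blocks
passes = math.log(n) / math.log(m)     # log_m n, about 1.6
```

With these values sorting costs fewer than two scans' worth of I/Os, which is why Sort(N) is treated as nearly linear in practice.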
14 Current best results
15 Current best results (cont.)
16 Current best results (cont.)
17 Sparsification
- D. Eppstein, Z. Galil, G. F. Italiano, and A. Nissenzweig. Sparsification: a technique for speeding up dynamic graph algorithms. FOCS 1992, pp. 60-69; J. ACM 44(5):669-696, 1997.
- Giuseppe F. Italiano: U. Roma "Tor Vergata"
- Zvi Galil: Columbia U.; President, Tel-Aviv U.
- David Eppstein: UC Irvine
- "Research is what I'm doing when I don't know what I'm doing."
18 (Often used to) convert I/O bounds
- from O(Sort(E)) to O((E/V) Sort(V))
- speedup: log E / log V
19 An illustration of sparsification: MSF
- L. Arge's algorithm: O((log log(V/e)) Sort(E))
- Improve by sparsification
- Partition the graph into E/V sparse subgraphs, each with V edges on the V vertices
- Apply L. Arge's algorithm to each subgraph
- Merge the E/V forests, two at a time, in a balanced binary merging procedure by repeatedly applying L. Arge's algorithm.
20 For all levels, total: O((log log B)(E/V) Sort(V))
- Each subgraph: (log log(V/e)) Sort(E) = (log log B) Sort(V), since each has only V edges
- 1st level, total: (log log B)(E/V) Sort(V)
- 2nd level: only E/2 edges left; total: (log log B)((E/2)/V) Sort(V)
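The per-level totals on this slide form a geometric series. A short derivation, assuming (as the slide suggests) that the number of edges still alive halves from one level of the merging tree to the next, with level i = 0 denoting the first level:

```latex
\sum_{i \ge 0} (\log\log B)\,\frac{E/2^{i}}{V}\,\mathrm{Sort}(V)
  \;=\; (\log\log B)\,\frac{E}{V}\,\mathrm{Sort}(V) \sum_{i \ge 0} 2^{-i}
  \;\le\; 2\,(\log\log B)\,\frac{E}{V}\,\mathrm{Sort}(V)
  \;=\; O\!\left((\log\log B)\,\frac{E}{V}\,\mathrm{Sort}(V)\right)
```

So the first level dominates, and the whole merging tree costs only a constant factor more than that level alone.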
21 Why sparsification works
- After each merge, only O(V) data is needed
- The same approach can be applied to connectivity, biconnectivity, and maximal matching.
- For example, for biconnected components,
- the merging process replaces each biconnected component by a cycle.
- The resulting graph has O(V) size and contains the necessary information.
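The partition-and-merge scheme for MSF can be sketched in memory. This is only an illustration of the control flow: an ordinary Kruskal routine (`kruskal_msf`, a stand-in chosen here) replaces L. Arge's external algorithm, and edges are assumed to be `(weight, u, v)` tuples with distinct weights.

```python
def kruskal_msf(num_vertices, edges):
    """In-memory stand-in for the MSF subroutine; edges are (weight, u, v)."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest

def sparsified_msf(num_vertices, edges):
    # Partition into about E/V subgraphs of at most V edges each.
    size = max(1, num_vertices)
    forests = [kruskal_msf(num_vertices, edges[i:i + size])
               for i in range(0, len(edges), size)]
    # Balanced binary merging: each merged forest keeps fewer than V edges.
    while len(forests) > 1:
        merged = [kruskal_msf(num_vertices, forests[i] + forests[i + 1])
                  for i in range(0, len(forests) - 1, 2)]
        if len(forests) % 2:
            merged.append(forests[-1])
        forests = merged
    return forests[0] if forests else []

edges = [(1, 0, 1), (4, 1, 2), (3, 2, 3), (2, 0, 3), (5, 0, 2), (6, 1, 3)]
msf = sparsified_msf(4, edges)
```

The merge is valid because an edge discarded inside a subgraph is the heaviest edge on some cycle, so it cannot belong to the MSF of the whole graph.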
22 Simulation of parallel algorithms
- Y.-J. Chiang, M. T. Goodrich, E. F. Grove, R. Tamassia, D. E. Vengroff, and J. S. Vitter. External-memory graph algorithms. SODA 1995, ACM-SIAM.
- Yi-Jen Chiang: Polytechnic U.; BS, NTU, 1986; PhD, Brown University, 1995
- T. H. Cormen. Virtual memory for data parallel computing. PhD thesis, MIT, 1992.
23 An illustration: list ranking
[Figure: list ranking example showing the input list and the output ranks]
After finding the ranks, we can rearrange the
nodes by sorting
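A minimal sketch of what list ranking computes and how the ranks are then used. The pointer-chasing loop is the naive in-memory version (in external memory it could cost one I/O per hop, which is what the following slides avoid); the node names and the `successor` dictionary are made up for illustration.

```python
def list_ranks(successor, head):
    """Rank of each node = its 1-based position counted from the head."""
    ranks = {}
    node, rank = head, 1
    while node is not None:
        ranks[node] = rank
        node = successor[node]
        rank += 1
    return ranks

# A list stored in arbitrary order: c -> a -> d -> b
successor = {"a": "d", "b": None, "c": "a", "d": "b"}
ranks = list_ranks(successor, "c")

# Once ranks are known, one sort lays the nodes out in list order.
ordered = sorted(successor, key=ranks.get)
```

After the sort, a traversal of the list becomes a sequential scan.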
24 A parallel algorithm
- Initial
- Find an independent set
- Merge with successors
- Recursively
[Figure: node weights of the list at each of the four stages]
25 Simulating the PRAM algorithm
- In each phase
- Find the independent set: O(Sort(N))
- Several methods
- Take O(Sort(N)) I/Os to combine the nodes
- If the independent set has size at least cN, the data size is decreased by a constant factor.
- The total cost is also O(Sort(N)) I/Os
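The phases can be sketched in memory as follows. This is an illustrative version only: the independent set is chosen by random coin flips (one of the several possible methods), dictionaries stand in for the sorts and scans an external-memory implementation would use, and ranks are distances from the head starting at 0. The function name and the `seed` parameter are assumptions of this sketch.

```python
import random

def rank_by_contraction(successor, head, seed=0):
    """List ranking by repeatedly splicing out an independent set of nodes."""
    rng = random.Random(seed)
    succ = dict(successor)
    pred = {s: v for v, s in succ.items() if s is not None}
    weight = {v: (0 if v == head else 1) for v in succ}  # distance from predecessor
    alive = set(succ)
    removed = []  # stack of (node, predecessor at removal, weight at removal)
    while len(alive) > 1:
        coin = {v: rng.random() < 0.5 for v in alive}
        # Independent set: nodes showing heads whose predecessor shows tails.
        chosen = [v for v in alive
                  if v != head and coin[v] and not coin[pred[v]]]
        for v in chosen:
            p, s = pred[v], succ[v]
            removed.append((v, p, weight[v]))
            succ[p] = s                      # merge v into the link p -> s
            if s is not None:
                pred[s] = p
                weight[s] += weight[v]
            alive.remove(v)
    ranks = {head: 0}
    for v, p, w in reversed(removed):        # undo the contractions
        ranks[v] = ranks[p] + w
    return ranks

order = [5, 2, 8, 1, 9, 3, 7, 4, 6, 0]
successor = {a: b for a, b in zip(order, order[1:])}
successor[order[-1]] = None
ranks = rank_by_contraction(successor, head=order[0])
```

No two chosen nodes are adjacent, so all splices in a phase are independent of each other, which is what lets a phase be carried out with a constant number of sorts.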
26 Duplicate elimination in a multiset; BFS
- K. Munagala and A. Ranade. I/O-complexity of graph algorithms. SODA 1999.
27 Duplicate elimination
- Input: N integers in [1, P].
- Output: an array C[1..P], with C[i] = 1 if i exists in the input.
- LB: $\Omega((N/P)\,\mathrm{Sort}(P))$
- UB (algorithm):
- Divide the N input records into N/P groups of P records: O(Scan(N))
- Sort the records within each group and construct a vector of size P: O((N/P) Sort(P))
- Merge the vectors (OR operation): O(Scan(N))
- $\mathrm{Scan}(N) = N/B$; $(N/P)\,\mathrm{Sort}(P) = (N/P)(P/B)\log_m(P/B)$
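The three steps of the upper-bound algorithm can be sketched in memory as below. The sort inside each group is not needed when everything fits in RAM; it is kept only to mirror the step that costs Sort(P) in external memory. The function name is an assumption of this sketch.

```python
def eliminate_duplicates(values, P):
    """Return C[0..P] with C[i] = 1 if i occurs in values (index 0 unused)."""
    result = [0] * (P + 1)
    for start in range(0, len(values), P):        # N/P groups of P records
        group = sorted(values[start:start + P])   # Sort(P) per group
        vector = [0] * (P + 1)                    # presence vector of size P
        for x in group:
            vector[x] = 1
        result = [a | b for a, b in zip(result, vector)]  # OR-merge
    return result

C = eliminate_duplicates([3, 1, 3, 5, 1, 1, 2, 5, 3, 3], P=5)
```

Here the ten input values fall into two groups of five, and only the value 4 is absent from the merged vector.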
28 Connected component via BFS
- Input: unordered edge list
- Output: a list L[1..n], where L[i] is the smallest vertex that vertex i is connected to.
- Sort the edges into ordered adjacency lists and get the degree and the pointer to the first edge of each vertex: O((E/V) Sort(V))
- Construct Front(t) for t = 1, 2, ...
- Construct Nbr(Front(t-1))
- O(Scan(E) + V) in total; the term V is for the possible round-off
- Remove duplicates: O((E/V) Sort(V) + V) in total
- Eliminate from it those in Front(t-1) or Front(t-2): O(Scan(E) + V) in total
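The construction of Front(t) from Front(t-1) and Front(t-2) can be sketched as follows for an undirected graph. Python sets stand in for the sorted lists an external-memory version would scan, and `bfs_fronts` is a name chosen for this sketch.

```python
def bfs_fronts(adjacency, source):
    """Return [Front(0), Front(1), ...] for an undirected graph."""
    fronts = [[source]]
    prev, prev2 = {source}, set()
    while True:
        # Nbr(Front(t-1)): concatenated adjacency lists, a multiset with duplicates.
        neighbours = [w for v in fronts[-1] for w in adjacency[v]]
        # Remove duplicates, then drop vertices already in Front(t-1) or Front(t-2).
        front = [w for w in sorted(set(neighbours))
                 if w not in prev and w not in prev2]
        if not front:
            return fronts
        fronts.append(front)
        prev, prev2 = set(front), prev

adjacency = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
fronts = bfs_fronts(adjacency, 0)
```

Checking only the two previous fronts is enough because, in an undirected graph, every neighbour of a vertex at level t-1 lies at level t-2, t-1, or t.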
29 Remark
- The algorithm is optimal for dense graphs (E/V > B).
- If the graph is sparse, we can use a preprocessing algorithm to group vertices into super-nodes.
- The total I/O complexity is then within an additional factor of log log(VB/E) of the optimal.
30 An external algorithm for MST
- L. Arge, G. S. Brodal, and L. Toma. On external-memory MST, SSSP, and multi-way planar graph separation. Journal of Algorithms, 2004.
31 Prim's algorithm
- A priority queue for vertices not in T
- Priority of x: d(x,T), the minimum distance from x to any vertex in T
[Figure: the tree T]
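32 The steps in each iteration
- Extract-min u from the priority queue
- Insert u into T
- Relax: $d(T,v) = \min\{d(T,v),\ d(T,u) + w(u,v)\}$ for all v not in T (note that d(T,u) = 0 once u is in T)
- Problem in EM: how to avoid one I/O per relaxation
[Figure: the tree T, the extracted vertex u, and a neighbour v]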
33 L. Arge's solution
- The priority queue Q is not for vertices but for all edges with one or both endpoints in T
- When a vertex v is selected, insert all edges incident to v.
- How do we check whether both u and v are already in T when extracting an edge (u,v) from Q?
- If so, (u,v) must appear in Q twice.
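A minimal in-memory sketch of this edge-queue variant of Prim's algorithm. It assumes distinct edge weights, so that the two copies of an edge with both endpoints in T come out of the queue consecutively, and it assumes the edge that brought a vertex into T is not re-inserted; `heapq` stands in for an external priority queue and `lazy_prim` is a name chosen here.

```python
import heapq

def lazy_prim(adjacency, root):
    """adjacency[v] = list of (weight, neighbour); weights assumed distinct."""
    tree = []
    queue = [(w, root, x) for w, x in adjacency[root]]
    heapq.heapify(queue)
    while queue:
        w, u, v = heapq.heappop(queue)
        if queue and queue[0][0] == w:
            # The same edge was inserted from both endpoints: both are in T.
            heapq.heappop(queue)
            continue
        tree.append((w, u, v))               # v is new: (u, v) is an MST edge
        for w2, x in adjacency[v]:
            if w2 != w:                      # skip the edge just used
                heapq.heappush(queue, (w2, v, x))
    return tree

adjacency = {
    0: [(1, 1), (2, 3), (5, 2)],
    1: [(1, 0), (4, 2), (6, 3)],
    2: [(3, 3), (4, 1), (5, 0)],
    3: [(2, 0), (3, 2), (6, 1)],
}
tree = lazy_prim(adjacency, 0)
```

Note that the loop never looks up whether a vertex is in T; the duplicate test on the queue replaces that random access.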
34 I/O complexity
- V + (E/B) I/Os to read the adjacency lists
- O(E) inserts and deletes on the queue
- Amortized $O((1/B)\log_{M/B}(N/B))$ I/Os per operation
- $O(V + (E/B) + (E/B)\log_{M/B}(N/B)) = O(V + \mathrm{Sort}(E))$
- $\mathrm{Sort}(N) = O((N/B)\log_{M/B}(N/B))$
- Comment: laziness, i.e., more computation and less I/O
35 MST: vertex reduction
- The complexity O(V + Sort(E)) can be further improved for sparse graphs
- Vertex reduction in each phase
- Choose a shortest edge for each vertex and make it an MST edge
- Also contract the two endpoints into one super-vertex
- After O(log(VB/E)) phases, the above method gives an O(Sort(E) log(VB/E)) algorithm.
- Further improved to O(Sort(E) log log(VB/E))
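One vertex-reduction phase can be sketched in memory as below. It assumes distinct edge weights, uses a small union-find for the contraction (an external version would use sorting and scanning instead), and the function name and return shape are choices made for this sketch.

```python
def reduction_phase(num_vertices, edges):
    """One phase: pick each vertex's lightest edge, then contract.

    edges are (weight, u, v) with distinct weights. Returns the chosen
    MST edges, the number of super-vertices, and the contracted edge list.
    """
    best = {}
    for w, u, v in edges:
        for x in (u, v):
            if x not in best or w < best[x][0]:
                best[x] = (w, u, v)
    chosen = set(best.values())

    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, u, v in chosen:
        parent[find(u)] = find(v)
    roots = sorted({find(x) for x in range(num_vertices)})
    label = {r: i for i, r in enumerate(roots)}   # renumber super-vertices
    contracted = [(w, label[find(u)], label[find(v)])
                  for w, u, v in edges if find(u) != find(v)]
    return sorted(chosen), len(roots), contracted

edges = [(1, 0, 1), (2, 2, 3), (3, 4, 5), (4, 1, 2), (5, 3, 4)]
chosen, count, contracted = reduction_phase(6, edges)
```

In this example six vertices shrink to three super-vertices in one phase; in general each phase at least halves the number of vertices that have an incident edge.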
36 Computing Point-to-Point Shortest Paths from External Memory
- A. V. Goldberg and R. Werneck, ALENEX 2005
- Andrew Goldberg: Microsoft Research; Ph.D. in Computer Science, MIT, 1987
37 Shortest path problems
- Single source and point-to-point (P2P)
- Worst case: $O(m + n\log n)$ computational time
- This is not our only concern for the P2P problem
- Dijkstra's algorithm
- Relaxation: for each arc (v,w), if $d(w) > d(v) + l(v,w)$, set $d(w) = d(v) + l(v,w)$.
- Start from the source and continue until the sink is met.
[Figure: relaxing an arc (v,w) with labels d(v) and d(w), starting from the source s]
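A minimal sketch of point-to-point Dijkstra with the relaxation rule above and early termination at the sink. The graph representation (`graph[v]` as a list of `(w, length)` arcs) and the returned count of scanned vertices are assumptions made for illustration.

```python
import heapq

def dijkstra_p2p(graph, s, t):
    """Return (dist(s, t), number of scanned vertices); lengths non-negative."""
    dist = {s: 0}
    done = set()
    heap = [(0, s)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue                      # stale queue entry
        done.add(v)
        if v == t:
            return d, len(done)           # stop as soon as the sink is scanned
        for w, length in graph.get(v, []):
            if w not in dist or dist[w] > d + length:   # relaxation
                dist[w] = d + length
                heapq.heappush(heap, (dist[w], w))
    return float("inf"), len(done)

graph = {
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 2), ("t", 6)],
    "b": [("t", 1)],
    "t": [],
}
result = dijkstra_p2p(graph, "s", "t")
```

The number of scanned vertices is the quantity the later speed-up techniques try to reduce.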
38 Bidirectional algorithm
- Goldberg's data
- 1.6M vertices, 3.8M arcs, travel time metric.
[Figure: two panels labeled "Bidirectional" and "Dijkstra"]
39 A* algorithm
- Similar to Dijkstra's algorithm but
- Domain-specific estimates $p_t(v)$ on dist(v, t).
- At each step pick a labeled vertex with the minimum $k(v) = d(v) + p_t(v)$.
- Best estimate of the path length through v.
- In general, optimality is not guaranteed.
- Finds an optimal path if $p_t(v)$ is an underestimate.
- To use A*, we need a lower bound function $p_t(v)$ of dist(v, t).
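A small sketch of A* search keyed by k(v) = d(v) + p_t(v). The `potential` argument plays the role of p_t and is supplied by the caller; passing a function that always returns 0 gives plain Dijkstra. The line graph and the estimate |t - v| are made-up test data, and the sketch assumes the estimate is a consistent lower bound.

```python
import heapq

def a_star(graph, s, t, potential):
    """Return (dist(s, t), number of scanned vertices)."""
    dist = {s: 0}
    done = set()
    heap = [(potential(s), s)]
    while heap:
        _, v = heapq.heappop(heap)
        if v in done:
            continue
        done.add(v)
        if v == t:
            return dist[v], len(done)
        for w, length in graph.get(v, []):
            if w not in dist or dist[w] > dist[v] + length:
                dist[w] = dist[v] + length
                # Key is k(w) = d(w) + p_t(w).
                heapq.heappush(heap, (dist[w] + potential(w), w))
    return float("inf"), len(done)

# A path 0 - 1 - ... - 10 with unit lengths in both directions.
graph = {v: [(w, 1) for w in (v - 1, v + 1) if 0 <= w <= 10] for v in range(11)}
plain = a_star(graph, 5, 8, lambda v: 0)              # behaves like Dijkstra
guided = a_star(graph, 5, 8, lambda v: abs(8 - v))    # lower bound on dist(v, 8)
```

Both runs return the same distance, but the guided search scans only the vertices between the source and the sink.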
40 ALT algorithms
- Use A* search and landmark-based lower bounds.
- Landmarks a and b in the graph
- $dist(v,w) \ge dist(v,b) - dist(w,b)$
- $dist(v,w) \ge dist(a,w) - dist(a,v)$
- We choose several landmarks
- At preprocessing, compute dist(a,v)
- Select the largest LB
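A sketch of the landmark lower bound, assuming an undirected graph so that one distance table per landmark yields both triangle-inequality bounds. The helper names, the line graph, and the choice of landmarks 0 and 10 are illustrative only.

```python
import heapq

def distances_from(graph, source):
    """Preprocessing: shortest-path distances from one landmark."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue
        for w, length in graph[v]:
            if w not in dist or dist[w] > d + length:
                dist[w] = d + length
                heapq.heappush(heap, (dist[w], w))
    return dist

def landmark_lower_bound(tables, v, w):
    """Largest triangle-inequality bound on dist(v, w) over all landmarks."""
    return max(max(d[w] - d[v], d[v] - d[w]) for d in tables)

graph = {v: [(w, 1) for w in (v - 1, v + 1) if 0 <= w <= 10] for v in range(11)}
tables = [distances_from(graph, a) for a in (0, 10)]   # landmarks 0 and 10
bound = landmark_lower_bound(tables, 5, 8)
```

The bound can be plugged into A* as the estimate of dist(v, t); on this path graph it happens to equal the true distance.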
41 How to choose the landmarks
- $dist(v,w) \ge dist(a,w) - dist(a,v)$
- Equality holds when v is on a shortest path from the landmark a to w
[Figure: Bidirectional ALT example]
42 Reaches [Gutman 04]
- Consider a vertex v that splits a path P into P1 and P2.
- $r_P(v) = \min(l(P_1), l(P_2))$.
- $r(v) = \max_P r_P(v)$ over all shortest paths P through v.
- Using reaches to prune Dijkstra
- If $r(w) < \min(d(v) + l(v,w),\ LB(w,t))$, then prune w.
43 Shortcuts
- We don't like having many nodes with large reach
- Add shortcuts; break ties by number of hops.