A survey of external graph algorithms

1
A survey of external graph algorithms

2
Outline
  • Introduction
  • Locality
  • External graph algorithm
  • Current best results
  • Sparsification
  • Simulating PRAM
  • BFS & Minimum Spanning Forest
  • Point-to-point shortest path

3
Does time complexity measure the running time?
  • Problem: loading a graph into adjacency lists
  • Input: a list of edges sorted by Link_From
  • In the left one, we add the edge into the
    linked list of FROM
  • In the right one, we add the edge into the
    linked list of TO
  • Left: while (input not exhausted) { read an edge and
    allocate mem.; ptr->node = edge->to;
    ptr->next = list[edge->from]; list[edge->from] = ptr; }
  • Right: while (input not exhausted) { read an edge and
    allocate mem.; ptr->node = edge->from;
    ptr->next = list[edge->to]; list[edge->to] = ptr; }

The difference in running time is 2050
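The two loaders can be compared with a minimal Python sketch. Python lists do not model pointer chasing, so the sketch only counts how often consecutive insertions touch a different list head, which is a rough proxy for lost locality. The function name `load`, the graph size, and the use of `append` instead of pushing to the front are assumptions made for illustration.

```python
import random

def load(edges, key):
    """Build adjacency lists, filing each edge (frm, to) under edge[key]:
    key=0 groups by FROM (the left variant), key=1 groups by TO (the right one).
    Also counts how often consecutive insertions touch a different list head."""
    adj = {}
    switches = 0
    prev = None
    for edge in edges:
        head = edge[key]
        other = edge[1 - key]
        adj.setdefault(head, []).append(other)
        if head != prev:
            switches += 1
        prev = head
    return adj, switches

random.seed(0)
n = 1000
# Edge list sorted by FROM, as in the slide's input.
edges = sorted((random.randrange(n), random.randrange(n)) for _ in range(15 * n))

by_from, s_from = load(edges, 0)   # insertions follow the input order
by_to, s_to = load(edges, 1)       # almost every insertion jumps to another list
print(s_from, s_to)
```

Grouping by FROM switches lists at most once per vertex, while grouping by TO switches on nearly every edge, even though both loops do the same amount of work.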
4
Another test
  • Data access in a large array
  • Random vs Sequential
  • 100M integers (400M bytes), no virtual memory
  • Performance ratio 4004000
  • 300M integers (1.2G bytes), virtual memory
  • Performance ratio: very large
  • Program running time does not depend only on
    computational time complexity

5
What's the problem?
  • To load a graph into memory
  • Edges stored in a file
  • Sorted adjacency list
  • How long does it take for a graph with 30M edges?
  • Sorted: 64 secs
  • Unsorted: > 10 hrs!!
  • What's the time complexity?
  • Linear time (bounded degree)?
  • For massive data, CPU time is not the major
    concern.

[Figure: the time to load a graph (# nodes = # edges/15); x-axis: edges, y-axis: secs]
6
Memory hierarchy
[Figure: memory hierarchy with access times (labels: ns, 10 ns, ms), from Vitter 2007]
7
Locality
  • Locality, working set
  • Caching, prefetching
  • System level (general) solutions
  • External memory (or EM) algorithms/data
    structures
  • The ones that explicitly manage data placement
    and movement.
  • A.k.a. I/O algorithms or out-of-core algorithms.
  • The I/O complexity
  • The number of communications between the internal
    memory and the external memory.
  • Bottleneck of the performance of EM algorithms
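To make the I/O measure concrete, here is a small Python sketch that counts block transfers for a sequence of array accesses. It assumes an LRU-managed internal memory; the function name `count_ios` and the sizes are illustrative choices, not values from the slides.

```python
import random
from collections import OrderedDict

def count_ios(accesses, block_size, cache_blocks):
    """Count block transfers for a sequence of array indices, assuming an
    LRU cache that holds `cache_blocks` blocks of `block_size` items each."""
    cache = OrderedDict()
    ios = 0
    for i in accesses:
        block = i // block_size
        if block in cache:
            cache.move_to_end(block)       # hit: mark as most recently used
        else:
            ios += 1                       # miss: one block transfer
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return ios

N, B, M_BLOCKS = 100_000, 1_000, 10
sequential = list(range(N))
random.seed(1)
shuffled = sequential[:]
random.shuffle(shuffled)

seq_ios = count_ios(sequential, B, M_BLOCKS)
rand_ios = count_ios(shuffled, B, M_BLOCKS)
print(seq_ios, rand_ios)
```

The sequential pass costs exactly N/B transfers, while the random pass misses on most accesses because only a small fraction of the blocks fits in memory.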

8
The reason for our tests
  • The computer system supports virtual memory
  • Programmers can use a very large memory as if it
    were internal memory
  • Severe performance problems will be encountered if
    the data is not arranged well
  • For both cache memory and external memory

9
  • A nice survey paper
  • External Memory Algorithms and Data Structures:
    Dealing with Massive Data,
  • ACM Computing Surveys, Vol. 33, No. 2, June 2001,
  • February 2007 revision available online,
  • Scott Vitter,
  • Dean of Science
  • Purdue University

10
Parallel disk model
11
Problem parameters
  • Usually, we focus on the case $P = 1$ and regard $DB$
    as the block size.
  • For graph problems, $N = V + E$.

12
Fundamental operations
Note that the constant is somewhat more important
when discussing I/O complexity.
13
  • In practice, sorting can be done in an almost linear
    number of I/Os
  • $\log_m n$ is almost constant: for $N = 1$T, $M = 1$G, $B = 10$K,
    we get $n = 100$M, $m = 100$K, so $\log_m n < 2$
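The arithmetic behind this claim can be checked directly. The sketch below assumes that n and m denote the problem size and the memory size measured in blocks (n = N/B, m = M/B), which matches the numbers on the slide.

```python
import math

N = 10**12      # problem size: 1T items
M = 10**9       # internal memory: 1G items
B = 10**4       # block size: 10K items

n = N // B      # problem size in blocks (100M)
m = M // B      # memory size in blocks (100K)
log_m_n = math.log(n) / math.log(m)
print(n, m, log_m_n)   # the ratio is about 1.6
```

With these sizes the logarithmic factor in the sorting bound stays below 2, which is why sorting behaves almost like a constant number of scans.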

14
Current best results
15
Current best results (cont.)
16
Current best results (cont.)
17
Sparsification
Sparsification: a technique for speeding up
dynamic graph algorithms. D. Eppstein, Z. Galil,
G. F. Italiano, and A. Nissenzweig. FOCS 1992,
pp. 60-69. J. ACM 44(5):669-696, 1997.
Giuseppe F. Italiano, U. Roma "Tor Vergata"
Zvi Galil, Columbia U.; President, Tel-Aviv U.
David Eppstein, UC Irvine
"Research is what I'm doing when I don't know
what I'm doing."
18
  • (often used to) convert I/O bounds
  • from $O(\mathrm{Sort}(E))$ to $O((E/V)\,\mathrm{Sort}(V))$
  • speedup: $\log E / \log V$
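A short derivation shows where the speedup comes from. It assumes the standard sorting bound $\mathrm{Sort}(N) = \Theta((N/B)\log_{M/B}(N/B))$; the ratio it yields is presumably what the slide abbreviates as $\log E/\log V$.

```latex
\begin{align*}
\mathrm{Sort}(E) &= \Theta\!\left(\frac{E}{B}\,\log_{M/B}\frac{E}{B}\right),\\
\frac{E}{V}\,\mathrm{Sort}(V) &= \Theta\!\left(\frac{E}{V}\cdot\frac{V}{B}\,\log_{M/B}\frac{V}{B}\right)
  = \Theta\!\left(\frac{E}{B}\,\log_{M/B}\frac{V}{B}\right),\\
\frac{\mathrm{Sort}(E)}{(E/V)\,\mathrm{Sort}(V)} &= \Theta\!\left(\frac{\log(E/B)}{\log(V/B)}\right).
\end{align*}
```

Both bounds share the linear factor $E/B$; only the argument of the logarithm shrinks from the number of edges to the number of vertices.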

19
An illustration of sparsification: MSF
  • L. Arge's algorithm: $O((\log\log(V/e))\,\mathrm{Sort}(E))$, where $e = E/B$
  • Improve by sparsification
  • Partition the graph into $E/V$ sparse subgraphs, each
    with $V$ edges on the $V$ vertices
  • Apply L. Arge's algorithm to each subgraph
  • Merge the $E/V$ forests, two at a time, in a
    balanced binary merging procedure by repeatedly
    applying L. Arge's algorithm.
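The partition-and-merge structure can be sketched in Python. This is an in-memory illustration only: an ordinary Kruskal routine stands in for the external MSF algorithm, and the names `kruskal` and `sparsified_msf` are hypothetical. Edge weights are assumed distinct so that the forest is unique.

```python
import random

def kruskal(vertices, edges):
    """In-memory minimum spanning forest; a stand-in for the external
    algorithm applied to each sparse subgraph. Edges are (w, u, v)."""
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest

def sparsified_msf(vertices, edges):
    """Split the edges into about E/V groups of at most V edges, take the MSF
    of each group, then merge the forests two at a time, level by level."""
    size = max(1, len(vertices))
    forests = [kruskal(vertices, edges[i:i + size])
               for i in range(0, len(edges), size)]
    while len(forests) > 1:
        merged = []
        for i in range(0, len(forests), 2):
            pair = forests[i] + (forests[i + 1] if i + 1 < len(forests) else [])
            merged.append(kruskal(vertices, pair))   # input has fewer than 2V edges
        forests = merged
    return forests[0] if forests else []

random.seed(2)
V = list(range(50))
E = [(random.random(), u, v) for u in V for v in V
     if u < v and random.random() < 0.3]
direct = kruskal(V, E)
sparse = sparsified_msf(V, E)
print(len(direct), len(sparse))
```

Every call to the subroutine sees only O(V) edges, which is the point of the technique: an edge discarded in a subgraph's forest can never be in the forest of the whole graph.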

20
Each subgraph: $(\log\log(V/e))\,\mathrm{Sort}(E) = (\log\log B)\,\mathrm{Sort}(V)$, since
each has only $V$ edges
1st level: total $(\log\log B)(E/V)\,\mathrm{Sort}(V)$
2nd level: only $E/2$ edges left, total
$(\log\log B)((E/2)/V)\,\mathrm{Sort}(V)$
For all levels: total $O((\log\log B)(E/V)\,\mathrm{Sort}(V))$
21
Why sparsification works
  • After each merge, only O(V) data is needed
  • The same approach can be applied to connectivity,
    biconnectivity, and maximal matching.
  • For example, for biconnected components,
  • the merging process replaces each biconnected
    component by a cycle.
  • The resulting graph has O(V) size and contains
    the necessary information.

22
Simulation of parallel algorithms
Y.-J. Chiang, M. T. Goodrich, E. F. Grove, R.
Tamassia, D. E. Vengroff, and J. S. Vitter.
External-memory graph algorithms. SODA 1995,
ACM-SIAM.
Yi-Jen Chiang, Polytechnic U.; BS, NTU, 1986; PhD,
Brown University, 1995
T. H. Cormen, Virtual memory for data parallel
computing, PhD Thesis, MIT, 1992
23
An illustration: list ranking
[Figure: list ranking example; panels: Input, Output; node labels 3, 1, 7, 4, 2, 5, 8, 6, 9]
After finding the ranks, we can rearrange the
nodes by sorting
24
A parallel algorithm
[Figure: node weights on an example list in four panels: Initial; Find an independent set; Merge with successors; Recursively]
25
Simulating the PRAM algorithm
  • In each phase
  • Find the independent set O(sort(N))
  • Several methods
  • Take O(sort(N)) I/O to combine the nodes
  • If the independent set has size at least cN, the data
    size is decreased by a constant factor.
  • The total cost is also O(sort(N)) I/Os
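The contraction scheme can be sketched in Python as follows. This is an in-memory illustration: in the external setting each step would be carried out with sorts and scans rather than dictionary lookups. The independent set is chosen by coin flipping, which is only one of the possible methods, and the function name `list_rank` and the definition of rank as the weighted distance to the tail are assumptions.

```python
import random

def list_rank(succ, weight, rng):
    """rank[v] = total weight of the nodes from v to the tail, v included.
    `succ` maps each node to its successor, or to None at the tail."""
    if len(succ) <= 2:                    # base case: walk the tiny list
        rank = {}

        def walk(v):
            if v is None:
                return 0
            if v not in rank:
                rank[v] = weight[v] + walk(succ[v])
            return rank[v]

        for v in succ:
            walk(v)
        return rank

    coin = {v: rng.random() < 0.5 for v in succ}
    # v is chosen if it flipped heads and its successor flipped tails,
    # so a chosen node is never the successor of another chosen node.
    chosen = [v for v in succ
              if coin[v] and succ[v] is not None and not coin[succ[v]]]

    new_succ, new_weight = dict(succ), dict(weight)
    removed = {}
    for v in chosen:                      # merge each chosen node with its successor
        s = succ[v]
        new_succ[v] = succ[s]
        new_weight[v] = weight[v] + weight[s]
        removed[s] = v
    for s in removed:
        del new_succ[s]
        del new_weight[s]

    rank = list_rank(new_succ, new_weight, rng)   # recurse on the shorter list
    for s, v in removed.items():          # reinsert the removed nodes
        rank[s] = rank[v] - weight[v]
    return rank

order = [3, 1, 7, 4, 2, 5, 8, 6, 9]       # the list in traversal order
succ = {a: b for a, b in zip(order, order[1:] + [None])}
rank = list_rank(succ, {v: 1 for v in succ}, random.Random(0))
rearranged = sorted(succ, key=lambda v: -rank[v])
print(rank, rearranged)
```

Sorting the nodes by decreasing rank recovers the traversal order, which is the rearrangement step mentioned for the list-ranking example.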

26
Duplicate Elimination in a Multiset & BFS
  • K. Munagala and A. Ranade,
  • I/O-complexity of graph algorithms.
  • SODA 1999

27
Duplicate Elimination
  • Input: $N$ integers in $[1, P]$.
  • Output: an array $C[1..P]$ with $C[i] = 1$ if $i$ exists.
  • LB: $\Omega((N/P)\,\mathrm{Sort}(P))$
  • UB (algorithm):
  • Divide the $N$ input records into $N/P$ groups of $P$
    records: $O(\mathrm{Scan}(N))$
  • Sort the records within each group and construct
    a vector of size $P$: $O((N/P)\,\mathrm{Sort}(P))$
  • Merge the vectors (OR operation): $O(\mathrm{Scan}(N))$
  • $\mathrm{Scan}(N) = N/B$; $(N/P)\,\mathrm{Sort}(P) = (N/P)(P/B)\log_m(P/B)$
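The three steps of the upper bound can be written as a short Python sketch. It runs in memory, so it only mirrors the structure of the algorithm (group, sort, build a vector, OR-merge); the function name is hypothetical.

```python
def duplicate_elimination(values, P):
    """Return C[0..P] (index 0 unused) with C[i] = 1 iff i occurs in values."""
    C = [0] * (P + 1)
    for start in range(0, len(values), P):        # about N/P groups of P records
        group = sorted(values[start:start + P])   # Sort(P) per group
        vector = [0] * (P + 1)
        for x in group:                           # one scan of the sorted group
            vector[x] = 1
        C = [a | b for a, b in zip(C, vector)]    # OR-merge into the result
    return C

result = duplicate_elimination([3, 1, 3, 5, 1, 2, 2, 5], 5)
print(result)   # [0, 1, 1, 1, 0, 1]
```

Only the per-group sort costs more than a scan, which is where the (N/P) Sort(P) term comes from.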

28
Connected component via BFS
  • Input: unordered edge list
  • Output: a list $L[1..n]$, where $L[i]$ is the smallest
    vertex that vertex $i$ is connected to.
  • Sort the edges into ordered adjacency lists and
    get the degree of each vertex and the pointer to its
    first edge: $O((E/V)\,\mathrm{Sort}(V))$
  • Construct Front($t$) for $t = 1, 2, \ldots$
  • Construct Nbr(Front($t-1$))
  • $O(\mathrm{Scan}(E) + V)$ in total, the term $V$ for the
    possible round-off
  • Remove duplicates: $O((E/V)\,\mathrm{Sort}(V) + V)$ in total
  • Eliminate from it those in Front($t-1$) or
    Front($t-2$): $O(\mathrm{Scan}(E) + V)$ in total
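The level-by-level construction can be sketched in Python. The sketch keeps everything in memory and uses Python sets where the external algorithm would use sorting and merging, so it shows only the logic: in an undirected graph, the neighbours of Front(t-1) lie in Front(t-2), Front(t-1), or Front(t). The function name and the 1-based vertex numbering are assumptions.

```python
from collections import defaultdict

def bfs_components(n, edges):
    """Label every vertex 1..n with the smallest vertex in its component,
    building each BFS level as a sorted, duplicate-free list."""
    adj = defaultdict(list)
    for u, v in sorted(edges):            # stands in for the external sort
        adj[u].append(v)
        adj[v].append(u)

    label = {}
    for s in range(1, n + 1):             # increasing order gives the smallest label
        if s in label:
            continue
        label[s] = s
        prev2, prev1 = [], [s]            # Front(t-2), Front(t-1)
        while prev1:
            nbr = [w for v in prev1 for w in adj[v]]    # Nbr(Front(t-1))
            nbr = sorted(set(nbr))                      # remove duplicates
            seen = set(prev1) | set(prev2)
            front = [w for w in nbr if w not in seen]   # drop the two previous levels
            for w in front:
                label[w] = s
            prev2, prev1 = prev1, front
    return [label[i] for i in range(1, n + 1)]

labels = bfs_components(6, [(2, 3), (1, 2), (4, 5)])
print(labels)   # [1, 1, 1, 4, 4, 6]
```

No global "visited" array is consulted inside a level, which is what removes the one-I/O-per-edge cost of the textbook BFS.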

29
Remark
  • The algorithm is optimal for dense graphs ($E/V > B$).
  • If the graph is sparse, we can use a
    preprocessing algorithm to group vertices into
    super nodes.
  • The total I/O complexity is then at most an additional
    factor of $\log\log(VB/E)$ away from the optimal.

30
An external algorithm for MST
  • L. Arge, G. S. Brodal, and L. Toma.
  • On external-memory MST, SSSP, and multi-way
    planar graph separation.
  • Journal of Algorithms, 2004.

31
Prim's algorithm
A priority queue for vertices not in T. Priority
of x: d(x,T), the min distance from x to any vertex in T
[Figure label: T]
32
  • The steps in each iteration:
  • Extract-min u from the priority queue
  • Insert u into T
  • Relax: $d(T,v) = \min\{d(T,v),\ d(T,u) + w(u,v)\}$ for all
    v not in T (with $d(T,u) = 0$, since u is now in T)
  • Problem in EM: how to avoid one I/O per relaxation

[Figure labels: v, u, T]
33
L. Arge's solution
  • The priority queue Q is not for vertices but for all
    edges with one or both endpoints in T
  • When a vertex v is selected, insert all edges
    incident to v.
  • How do we check whether both u and v are already in T
    when extracting an edge (u,v) from Q?
  • If so, (u,v) must appear in Q twice.
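The duplicate-detection trick can be illustrated with an in-memory Python sketch built on `heapq`. It rests on two assumptions that the slide does not state: edge weights are distinct, so the two copies of an edge come out of the queue consecutively, and the edge that brought a vertex into T is not reinserted from that vertex. The function name and the tuple layout are hypothetical.

```python
import heapq

def prim_edge_queue(adj, start):
    """Minimum spanning tree of the component containing `start`.
    adj[u] is a list of (weight, v); weights are assumed distinct.
    No 'is v already in T?' lookup is made while processing the queue."""
    tree = []
    heap = []
    for w, v in adj[start]:
        heapq.heappush(heap, (w, min(start, v), max(start, v), v))
    while heap:
        w, a, b, new = heapq.heappop(heap)
        if heap and heap[0][:3] == (w, a, b):
            heapq.heappop(heap)           # second copy: both endpoints are in T
            continue
        old = a if new == b else b
        tree.append((w, old, new))        # lightest edge leaving T
        for w2, x in adj[new]:
            if w2 != w:                   # skip the edge that brought `new` into T
                heapq.heappush(heap, (w2, min(new, x), max(new, x), x))
    return tree

edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
adj = {v: [] for v in range(4)}
for w, u, v in edges:
    adj[u].append((w, v))
    adj[v].append((w, u))
tree = prim_edge_queue(adj, 0)
print(tree)   # [(1, 0, 1), (2, 1, 2), (4, 2, 3)]
```

The only accesses outside the queue are the reads of one adjacency list per selected vertex, which is what the I/O analysis charges for.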

34
I/O complexity
  • $V + E/B$ I/Os to read the adjacency lists
  • $O(E)$ inserts and deletes on the queue
  • Amortized $O((1/B)\log_{M/B}(N/B))$ I/Os per op.
  • $O(V + E/B + (E/B)\log_{M/B}(N/B)) = O(V + \mathrm{Sort}(E))$
  • $\mathrm{Sort}(N) = O((N/B)\log_{M/B}(N/B))$
  • Comments
  • Laziness: more computation and less I/O

35
MST vertex reduction
  • The complexity $O(V + \mathrm{Sort}(E))$ can be further
    improved for sparse graphs
  • Vertex reduction in each phase:
  • Choose a shortest edge for each vertex and make
    it an MST edge
  • Also contract its two endpoints into one super-vertex
  • After $O(\log(VB/E))$ phases, the above method gives
    an $O(\mathrm{Sort}(E)\log(VB/E))$ algorithm.
  • Further improved to $O(\mathrm{Sort}(E)\log\log(VB/E))$
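One vertex-reduction phase can be sketched in Python as below. It is an in-memory illustration that assumes distinct edge weights; the function name and the returned triple are hypothetical, and contracted edges keep only their super-vertex names.

```python
def boruvka_phase(vertices, edges):
    """One vertex-reduction phase. Each vertex picks its lightest incident edge
    (weights assumed distinct); the picked edges are MST edges and their
    endpoints are contracted. Returns (mst_edges, super_vertices, new_edges)."""
    best = {}
    for w, u, v in edges:
        for x in (u, v):
            if x not in best or w < best[x][0]:
                best[x] = (w, u, v)
    chosen = set(best.values())

    comp = {v: v for v in vertices}

    def find(x):
        while comp[x] != x:
            comp[x] = comp[comp[x]]       # path halving
            x = comp[x]
        return x

    for _, u, v in chosen:                # contract the endpoints of picked edges
        comp[find(u)] = find(v)

    new_edges = [(w, find(u), find(v)) for w, u, v in edges
                 if find(u) != find(v)]   # drop edges inside a super-vertex
    supers = {find(v) for v in vertices}
    return sorted(chosen), supers, new_edges

edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
mst_edges, supers, rest = boruvka_phase(range(4), edges)
print(mst_edges, supers, rest)
```

When no vertex is isolated, every super-vertex absorbs at least two vertices, so each phase at least halves the vertex count.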

36
Computing Point-to-Point Shortest Paths from
External Memory
  • A. V. Goldberg and R. Werneck, ALENEX 2005

Andrew Goldberg, Microsoft Research; Ph.D. in
Computer Science, MIT, 1987
37
Shortest path problems
  • Single source and Point-to-Point
  • Worst case: $O(m + n\log n)$ computational time,
  • which is not our only concern for the P2P problem
  • Dijkstra's algorithm
  • Relaxation: for each arc (v,w), if $d(w) > d(v) + l(v,w)$,
    set $d(w) = d(v) + l(v,w)$.
  • Start from the source and stop when the sink is reached.
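A point-to-point version of Dijkstra's algorithm with the relaxation step above can be sketched in Python. The function name, the adjacency format, and the returned count of scanned vertices are illustrative assumptions.

```python
import heapq

def p2p_dijkstra(adj, s, t):
    """Shortest s-t distance; adj[v] is a list of (w, length) arcs with
    non-negative lengths. Returns (distance, number of scanned vertices)."""
    dist = {s: 0}
    done = set()
    heap = [(0, s)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue                      # stale queue entry
        done.add(v)
        if v == t:
            return d, len(done)           # stop as soon as the sink is scanned
        for w, length in adj.get(v, []):
            if d + length < dist.get(w, float("inf")):   # relaxation
                dist[w] = d + length
                heapq.heappush(heap, (dist[w], w))
    return float("inf"), len(done)

adj = {
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 2), ("t", 6)],
    "b": [("t", 1)],
    "t": [("far", 10)],
}
answer = p2p_dijkstra(adj, "s", "t")
print(answer)   # (4, 4)
```

Stopping at the sink means that vertices farther away than the sink, such as "far" here, are never scanned.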

[Figure: relaxing arc (v,w) in a search from s; labels d(v), d(w)]
38
Bidirectional algorithm
  • Goldberg's data
  • 1.6M vertices, 3.8M arcs, travel time metric.

[Figure panels: Bidirectional; Dijkstra]
39
A* algorithm
  • Similar to Dijkstra's algorithm, but
  • uses domain-specific estimates $p_t(v)$ on dist(v, t).
  • At each step, pick a labeled vertex with the
    minimum $k(v) = d(v) + p_t(v)$.
  • $k(v)$ is the best estimate of the path length through v.
  • In general, optimality is not guaranteed.
  • It finds an optimal path if $p_t(v)$ is an underestimate.
  • To use A*, we need a lower-bound function $p_t(v)$
    on dist(v, t).
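The selection rule can be sketched in Python as follows. The sketch assumes that the estimate never overestimates dist(v, t) and is zero at t; vertices may be re-scanned if a shorter label is found later. The grid graph and the Manhattan-distance estimate are illustrative choices, not taken from the slides.

```python
import heapq

def a_star(adj, s, t, p):
    """A* search keyed by k(v) = d(v) + p(v). Assumes p(v) <= dist(v, t)
    and p(t) = 0. Returns (distance, number of scanned vertices)."""
    dist = {s: 0}
    heap = [(p(s), s)]
    scanned = 0
    while heap:
        k, v = heapq.heappop(heap)
        if k > dist[v] + p(v):
            continue                      # stale queue entry
        scanned += 1
        if v == t:
            return dist[v], scanned
        for w, length in adj.get(v, []):
            if dist[v] + length < dist.get(w, float("inf")):
                dist[w] = dist[v] + length
                heapq.heappush(heap, (dist[w] + p(w), w))
    return float("inf"), scanned

def grid(n):
    """n-by-n grid graph with unit-length arcs between neighbouring cells."""
    adj = {}
    for r in range(n):
        for c in range(n):
            adj[(r, c)] = [((r + dr, c + dc), 1)
                           for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                           if 0 <= r + dr < n and 0 <= c + dc < n]
    return adj

adj = grid(21)
s, t = (10, 10), (10, 15)
manhattan = lambda v: abs(v[0] - t[0]) + abs(v[1] - t[1])
d_astar, n_astar = a_star(adj, s, t, manhattan)
d_plain, n_plain = a_star(adj, s, t, lambda v: 0)   # zero estimate: Dijkstra
print(d_astar, n_astar, d_plain, n_plain)
```

With the zero estimate the search degenerates to Dijkstra's algorithm and scans a whole ball around the source, while the informed estimate keeps the search on the straight line to the target.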

40
ALT algorithms
  • Use A* search and landmark-based lower bounds.
  • Landmarks a and b in the graph
  • $\mathrm{dist}(v,w) \ge \mathrm{dist}(v,b) - \mathrm{dist}(w,b)$
  • $\mathrm{dist}(v,w) \ge \mathrm{dist}(a,w) - \mathrm{dist}(a,v)$
  • We choose several landmarks
  • At preprocessing, compute dist(a,v)
  • Select the largest LB
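The landmark bounds can be precomputed as in the Python sketch below. It assumes a connected undirected graph, where dist(v,b) = dist(b,v) and both triangle inequalities collapse to an absolute difference; the names `dijkstra_all` and `make_lower_bound` are hypothetical.

```python
import heapq

def dijkstra_all(adj, source):
    """Distances from `source` to every reachable vertex."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue                      # stale queue entry
        for w, length in adj[v]:
            if d + length < dist.get(w, float("inf")):
                dist[w] = d + length
                heapq.heappush(heap, (dist[w], w))
    return dist

def make_lower_bound(adj, landmarks):
    """Precompute dist(a, .) for each landmark a and return
    LB(v, w) = max over landmarks a of |dist(a, w) - dist(a, v)|."""
    tables = [dijkstra_all(adj, a) for a in landmarks]

    def lower_bound(v, w):
        return max(abs(d[w] - d[v]) for d in tables)

    return lower_bound

# A cycle on 6 vertices with unit lengths, and two landmarks.
adj = {i: [((i + 1) % 6, 1), ((i - 1) % 6, 1)] for i in range(6)}
lb = make_lower_bound(adj, [0, 2])
true = {v: dijkstra_all(adj, v) for v in adj}
print(lb(0, 3), true[0][3])
```

Taking the maximum over landmarks is safe because each individual bound is valid, so the largest one is still a lower bound and can serve as the estimate in A* search.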

41
How to choose the landmarks
  • $\mathrm{dist}(v,w) \ge \mathrm{dist}(a,w) - \mathrm{dist}(a,v)$.
  • Equality holds when v is on a shortest
    path from the landmark a to w

[Figure: bidirectional ALT example]
42
Reaches [Gutman 04]
  • Consider a vertex v that splits a path P into P1
    and P2.
  • $r_P(v) = \min(l(P_1), l(P_2))$.
  • $r(v) = \max_P r_P(v)$ over all shortest paths P
    through v.
  • Using reaches to prune Dijkstra:
  • If $r(w) < \min(d(v) + l(v,w), \mathrm{LB}(w,t))$, then prune w.
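For a small graph the definition can be evaluated by brute force, as in the Python sketch below. It is only meant to make the definition and the pruning test concrete: it uses all-pairs shortest paths, which is not how reaches would be computed on a large road network, and the function names are hypothetical.

```python
def reaches(nodes, edges):
    """Brute-force reaches on a small undirected graph.
    `edges` maps (u, v) to a positive length."""
    INF = float("inf")
    d = {u: {v: (0 if u == v else INF) for v in nodes} for u in nodes}
    for (u, v), length in edges.items():
        d[u][v] = min(d[u][v], length)
        d[v][u] = min(d[v][u], length)
    for k in nodes:                       # Floyd-Warshall all-pairs distances
        for i in nodes:
            for j in nodes:
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    reach = {}
    for v in nodes:
        # v lies on some shortest s-t path iff d(s,v) + d(v,t) = d(s,t)
        reach[v] = max(min(d[s][v], d[v][t])
                       for s in nodes for t in nodes
                       if d[s][t] < INF and d[s][v] + d[v][t] == d[s][t])
    return reach

def can_prune(reach_w, dist_via_v, lb_w_t):
    """Pruning test: skip w if r(w) < min(d(v) + l(v,w), LB(w,t))."""
    return reach_w < min(dist_via_v, lb_w_t)

nodes = ["a", "b", "c", "d", "e"]         # a path a-b-c-d-e with unit lengths
edges = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1, ("d", "e"): 1}
r = reaches(nodes, edges)
print(r)   # {'a': 0, 'b': 1, 'c': 2, 'd': 1, 'e': 0}
```

A vertex with a small reach can matter only near the start or the end of a shortest path, which is exactly what the pruning test checks.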

43
Short-cut
  • We don't want many nodes with large reach
  • Add shortcuts; break ties by # of hops.
