A survey of external graph algorithms - PowerPoint PPT Presentation


1
A survey of external graph algorithms


2
Outline

Introduction
Locality
External graph algorithms
Current best results
Sparsification
Simulating PRAM
BFS & Minimum Spanning Forest
Point-to-point shortest path

3
Does time complexity measure the running time?

Problem: loading a graph into adjacency lists
Input: a list of edges sorted by Link_From
In the left version, we add each edge to the linked list of its FROM node
In the right version, we add each edge to the linked list of its TO node

Left:
    while (input not exhausted) {
        read an edge and allocate memory for ptr;
        ptr->node = edge->to;
        ptr->next = list[edge->from];
        list[edge->from] = ptr;
    }

Right:
    while (input not exhausted) {
        read an edge and allocate memory for ptr;
        ptr->node = edge->from;
        ptr->next = list[edge->to];
        list[edge->to] = ptr;
    }

The difference of running time is 2050
4
Another test

Data access in a large array
Random vs. sequential
100M integers (400M bytes), no virtual memory
Performance ratio: 4004000
300M integers (1.2G bytes), virtual memory
Performance ratio: very large
Program running time does not depend only on computational time complexity
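
The gap between random and sequential access can be illustrated with a toy model that counts block transfers instead of measuring time. This is only a sketch under assumed parameters: the block size, the cache size, and the LRU replacement policy are hypothetical choices, not the setup of the test on this slide.

```python
import random
from collections import OrderedDict


def count_block_transfers(indices, block_size, cache_blocks):
    """Count block loads for a sequence of array indices.

    Assumes a cache holding `cache_blocks` whole blocks with LRU eviction
    (a hypothetical model, not a measurement of real hardware).
    """
    cache = OrderedDict()
    transfers = 0
    for i in indices:
        block = i // block_size
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            transfers += 1                 # block must be fetched
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return transfers


n = 100_000          # array length (illustrative)
block_size = 1_000   # items per block (illustrative)
cache_blocks = 10    # blocks that fit in fast memory (illustrative)

shuffled = list(range(n))
random.Random(0).shuffle(shuffled)

seq_io = count_block_transfers(range(n), block_size, cache_blocks)
rand_io = count_block_transfers(shuffled, block_size, cache_blocks)
print(seq_io, rand_io)
```

Both loops touch every element once, so their time complexity is the same; only the number of block transfers differs.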

5
What's the problem?

To load a graph into memory
Edges are stored in a file
Sorted adjacency list
How long does it take for a graph with 30M edges?
Sorted: 64 secs
Unsorted: > 10 hrs!!
What's the time complexity?
Linear time (bounded degree)?
For massive data, CPU time is not the major concern.

[Figure: the time to load a graph (nodes = edges/15); axes labeled edges and secs]
6
Memory hierarchy
[Figure: memory hierarchy with access times, labeled ns, 10 ns, and ms; from Vitter 2007]
7
Locality

Locality, working set
Caching, prefetching
System level (general) solutions
External memory (or EM) algorithms/data structures
The ones that explicitly manage data placement and movement.
A.k.a. I/O algorithms or out-of-core algorithms.
The I/O complexity
The number of communications between the internal memory and the external memory.
Bottleneck of the performance of EM algorithms

8
The reason for our tests

The computer system supports virtual memory: programmers can use very large memory as if it were internal memory
Severe performance problems will be encountered if data is not arranged well
This holds for both cache memory and external memory

A nice survey paper
External Memory Algorithms and Data Structures: Dealing with Massive Data,
ACM Computing Surveys, Vol. 33, No. 2, June 2001.
February 2007 revision available online.

Scott Vitter,
Dean of Science,
Purdue University

10
Parallel disk model
11
Problem parameters

Usually, we focus on the case P = 1 and regard DB as the block size.
For graph problems, N = V + E.

12
Fundamental operations
Note that constant factors matter more when discussing I/O complexity.
13

In practice, sorting can be done in an almost linear number of I/Os
log_m n is almost constant: for N = 1T, M = 1G, B = 10K
=> n = N/B = 100M, m = M/B = 100K => log_m n < 2
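
The arithmetic behind this claim can be checked directly. The sketch below assumes decimal readings of the slide's values (1T = 10^12, 1G = 10^9, 10K = 10^4 items) and uses the Scan and Sort bounds as they appear elsewhere in these slides; the function names are illustrative.

```python
import math


def scan_ios(N, B):
    """Scan(N) = N/B, rounded up to whole blocks."""
    return -(-N // B)


def sort_ios(N, M, B):
    """Sort(N) = (N/B) * log_{M/B}(N/B), ignoring constant factors."""
    n = N / B
    m = M / B
    return n * max(1.0, math.log(n) / math.log(m))


N, M, B = 10**12, 10**9, 10**4   # assumed decimal readings of 1T, 1G, 10K
n = N // B                        # number of blocks of input
m = M // B                        # number of blocks that fit in memory
log_m_n = math.log(n) / math.log(m)

print(n, m, log_m_n)
print(scan_ios(N, B), sort_ios(N, M, B))
```

With these values the logarithmic factor is below 2, so Sort(N) is within a small constant of Scan(N).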

14
Current best results
15
Current best results (cont.)
16
Current best results (cont.)
17
Sparsification
Sparsification--A technique for speeding up dynamic graph algorithms.
D. Eppstein, Z. Galil, G.F. Italiano, and A. Nissenzweig.
FOCS, 1992, pp. 60-69. J. ACM 44(5):669-696, 1997.
Giuseppe F. Italiano: U. Roma "Tor Vergata"
Zvi Galil: Columbia U.; President, Tel-Aviv U.
David Eppstein: UC, Irvine
"Research is what I'm doing when I don't know what I'm doing."
18

(often used to) convert I/O bounds
from O(Sort(E)) to O((E/V) Sort(V))
Speedup: log E / log V
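
The speedup can be checked from the definition of Sort. A short derivation, assuming Sort(N) = (N/B) log_m (N/B) with m = M/B as used elsewhere in these slides:

```latex
\begin{align*}
\mathrm{Sort}(E) &= \frac{E}{B}\,\log_m \frac{E}{B},\\
\frac{E}{V}\,\mathrm{Sort}(V) &= \frac{E}{V}\cdot\frac{V}{B}\,\log_m \frac{V}{B}
  = \frac{E}{B}\,\log_m \frac{V}{B},\\
\frac{\mathrm{Sort}(E)}{(E/V)\,\mathrm{Sort}(V)} &= \frac{\log (E/B)}{\log (V/B)}.
\end{align*}
```

The slide's log E / log V reads as this ratio with the block size B ignored.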

19
An illustration of sparsification: MSF

L. Arge's algorithm: O((log log(V/e)) Sort(E)), where e = E/B
Improve by sparsification
Partition the graph into E/V sparse subgraphs, each with V edges on the V vertices
Apply L. Arge's algorithm to each subgraph
Merge the E/V forests, two at a time, in a balanced binary merging procedure by repeatedly applying L. Arge's algorithm.
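
The partition-and-merge structure can be sketched in memory. This is only an illustration: an ordinary Kruskal routine stands in for L. Arge's external-memory algorithm, the function names are hypothetical, and edge weights are assumed distinct so the forest is unique.

```python
def kruskal_msf(num_vertices, edges):
    """In-memory stand-in for the MSF subroutine; edges are (w, u, v)."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest


def sparsified_msf(num_vertices, edges):
    """Split edges into groups of V, solve each, then merge forests pairwise."""
    chunk = max(1, num_vertices)
    forests = [kruskal_msf(num_vertices, edges[i:i + chunk])
               for i in range(0, len(edges), chunk)]
    if not forests:
        return []
    while len(forests) > 1:
        merged = []
        for i in range(0, len(forests), 2):
            pair = forests[i] + (forests[i + 1] if i + 1 < len(forests) else [])
            # Each forest has fewer than V edges, so the union stays O(V).
            merged.append(kruskal_msf(num_vertices, pair))
        forests = merged
    return forests[0]


edges = [(5, 0, 1), (1, 1, 2), (7, 2, 3), (3, 0, 3), (2, 0, 2), (6, 1, 3)]
msf = sparsified_msf(4, edges)
print(sorted(msf))
```

Merging is safe because an edge discarded inside a subgraph is the heaviest edge on some cycle, so it cannot belong to the forest of the whole graph.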

20
1st level: each subgraph costs (log log(V/e)) Sort(E) = (log log B) Sort(V), since each has only V edges
Total in 1st level: (log log B)(E/V) Sort(V)
2nd level: only E/2 edges left; total (log log B)((E/2)/V) Sort(V)
For all levels, total: O((log log B)(E/V) Sort(V))
21
Why sparsification works

After each merge, only O(V) data is needed
The same approach can be applied to connectivity, biconnectivity, and maximal matching.
For example, for biconnected components, the merging process replaces each biconnected component by a cycle.
The resulting graph has O(V) size and contains the necessary information.

22
Simulation of parallel algorithms
Y.-J. Chiang, M. T. Goodrich, E. F. Grove, R. Tamassia, D. E. Vengroff, and J. S. Vitter.
External-memory graph algorithms. SODA 1995, ACM-SIAM.
Yi-Jen Chiang: Polytechnic U.; BS, NTU, 1986; PhD, Brown University, 1995
T.H. Cormen, Virtual memory for data-parallel computing, PhD Thesis, MIT, 1992
23
An illustration: list ranking
[Figure: an input linked list and the output ranks of its nodes]
After finding the ranks, we can rearrange the nodes by sorting
24
A parallel algorithm
Initial
Find an independent set
Merge with successors
Recursively
[Figure: node weights along the list at each step]
25
Simulating the PRAM algorithm

In each phase
Find the independent set: O(Sort(N))
Several methods
Take O(Sort(N)) I/Os to combine the nodes
If the independent set has size at least cN, the data size is decreased by a constant factor.
The total cost is also O(Sort(N)) I/Os
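
The phases above can be sketched in memory as follows. This is a hedged illustration, not the external-memory implementation: it uses dictionaries where an EM algorithm would sort and scan, it picks the independent set by seeded coin flips (one of several possible methods), and the names `list_rank` and `_rank` are hypothetical.

```python
import random


def list_rank(succ, head, seed=0):
    """Return {node: distance from head} for a linked list.

    succ maps each node to its successor (None for the tail).
    """
    weight = {v: 1 for v in succ}   # weight of the link v -> succ[v]
    return _rank(dict(succ), weight, head, random.Random(seed))


def _rank(succ, weight, head, rng):
    if len(succ) <= 2:
        # Base case: walk the short list directly.
        ranks, v, r = {}, head, 0
        while v is not None:
            ranks[v] = r
            r += weight[v]
            v = succ[v]
        return ranks
    pred = {s: v for v, s in succ.items() if s is not None}
    coin = {v: rng.random() < 0.5 for v in succ}
    # Independent set: v flips heads and its predecessor flips tails.
    chosen = [v for v in succ
              if v != head and coin[v] and not coin[pred[v]]]
    removed = {}
    for v in chosen:
        p = pred[v]
        removed[v] = (p, weight[p])   # remember the link weight p -> v
        weight[p] += weight[v]        # merge v into its predecessor's link
        succ[p] = succ[v]
    for v in chosen:
        del succ[v], weight[v]
    ranks = _rank(succ, weight, head, rng)   # recurse on the shorter list
    for v, (p, w) in removed.items():
        ranks[v] = ranks[p] + w
    return ranks


order = ["c", "a", "g", "d", "b", "e", "h", "f", "i"]   # hypothetical list
succ = {order[i]: (order[i + 1] if i + 1 < len(order) else None)
        for i in range(len(order))}
ranks = list_rank(succ, head="c")
print(sorted(ranks, key=ranks.get))
```

Sorting the nodes by rank, as in the last line, recovers the list order.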

26
Duplicate Elimination in a Multiset & BFS

K. Munagala and A. Ranade,
I/O-complexity of graph algorithms.
SODA 1999

27
Duplicate Elimination

Input: N integers in [1, P].
Output: an array C[1..P], C[i] = 1 if i exists.
LB: Ω((N/P) Sort(P))
UB (Algorithm)
Divide the N input records into N/P groups of P records: O(Scan(N))
Sort the records within each group and construct a vector of size P: O((N/P) Sort(P))
Merge the vectors (OR-operation): O(Scan(N))
Scan(N) = N/B; (N/P) Sort(P) = (N/P)(P/B) log_m(P/B)
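
The three steps of the upper-bound algorithm can be sketched in memory. This is a minimal illustration with a hypothetical function name; list slicing and an in-memory sort stand in for the scans and external sorts.

```python
def duplicate_elimination(values, P):
    """Return C as a list of length P + 1 (index 0 unused), C[i] = 1 if i occurs."""
    combined = [0] * (P + 1)
    for start in range(0, len(values), P):
        group = sorted(values[start:start + P])   # one group of at most P records
        vector = [0] * (P + 1)
        for x in group:
            vector[x] = 1                          # presence vector of size P
        # Merge by OR, one scan over the two vectors.
        combined = [a | b for a, b in zip(combined, vector)]
    return combined


C = duplicate_elimination([3, 1, 3, 5, 1, 1, 5], P=5)
print(C[1:])
```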

28
Connected component via BFS

Input: unordered edge list
Output: a list L[1..n], where L[i] is the smallest vertex that vertex i is connected to.
Sort the edges into an ordered adjacency list and get the degree of each vertex and the pointer to its first edge: O((E/V) Sort(V))
Construct Front(t) for t = 1, 2, ...
Construct Nbr(Front(t-1)): O(Scan(E) + V) in total, the term V for the possible round off
Remove duplicates: O((E/V) Sort(V) + V) in total
Eliminate from it those in Front(t-1) or Front(t-2): O(Scan(E) + V) in total
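
The level-by-level construction can be sketched in memory. This is an illustration only: Python sets stand in for the sorted lists that an external-memory version would scan in parallel, the graph is assumed undirected, and the function names are hypothetical.

```python
def bfs_levels(adj, source):
    """Return the BFS fronts from source; adj maps vertex -> neighbor list."""
    prev2, prev1 = set(), {source}          # Front(t-2), Front(t-1)
    levels = [[source]]
    while prev1:
        # Nbr(Front(t-1)) as a multiset, then sorted.
        nbrs = sorted(w for v in sorted(prev1) for w in adj[v])
        # Remove duplicates with one pass over the sorted multiset.
        dedup = [x for i, x in enumerate(nbrs) if i == 0 or x != nbrs[i - 1]]
        # Eliminate vertices already in Front(t-1) or Front(t-2).
        front = [x for x in dedup if x not in prev1 and x not in prev2]
        if not front:
            break
        levels.append(front)
        prev2, prev1 = prev1, set(front)
    return levels


def component_labels(adj):
    """L[i] = smallest vertex in the component of i."""
    labels = {}
    for s in sorted(adj):
        if s not in labels:
            for level in bfs_levels(adj, s):
                for v in level:
                    labels[v] = s
    return labels


adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4], 6: [7], 7: [6]}
levels = bfs_levels(adj, 1)
labels = component_labels(adj)
print(levels, labels)
```

In an undirected graph every neighbor of Front(t-1) lies in Front(t-2), Front(t-1), or Front(t), which is why checking the two previous fronts is enough.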

29
Remark

The algorithm is optimal for dense graphs (E/V > B).
If the graph is sparse, we can use a preprocessing algorithm to group vertices into super nodes.
The total I/O complexity is then at most an additional factor of log log(VB/E) above the optimal.

30
An external algorithm for MST

L. Arge, G. S. Brodal, and L. Toma.
On external-memory MST, SSSP, and multi-way planar graph separation.
Journal of Algorithms, 2004.

31
Prim's algorithm
A priority queue for vertices not in T
Priority of x: d(x,T), the minimum distance from x to any vertex in T
32

The steps in each iteration
Extract-min u from the priority queue
Insert u into T
Relax: d(T,v) = min{d(T,v), d(T,u) + w(u,v)} for all v not in T (d(T,u) = 0 once u is in T)
Problem in EM: how to avoid one I/O per relaxation
33
L. Arge's solution

The priority queue Q is not for vertices but for all edges with one or both endpoints in T
When a vertex v is selected, insert all edges incident to v.
How do we check whether both u and v are already in T when extracting an edge (u,v) from Q?
If so, (u,v) must appear in Q twice.
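
The duplicate test can be sketched with an in-memory heap standing in for the external priority queue. This is a hedged reading of the idea rather than the authors' implementation: it assumes a simple connected graph with distinct edge weights, skips re-inserting the edge through which a vertex was reached, and uses hypothetical names.

```python
import heapq


def prim_edge_queue(adj, root):
    """adj maps vertex -> list of (weight, neighbor); returns tree edges (w, a, b)."""

    def entry(w, frm, to):
        # Key (w, smaller endpoint, larger endpoint) identifies the edge;
        # the last field is the endpoint opposite the inserting vertex.
        return (w, min(frm, to), max(frm, to), to)

    heap = [entry(w, root, v) for w, v in adj[root]]
    heapq.heapify(heap)
    tree = []
    while heap:
        w, a, b, new_vertex = heapq.heappop(heap)
        if heap and heap[0][:3] == (w, a, b):
            # The same edge is in Q twice, so both endpoints are in T.
            heapq.heappop(heap)
            continue
        tree.append((w, a, b))
        old_vertex = a if new_vertex == b else b
        for w2, x in adj[new_vertex]:
            if x != old_vertex:               # skip the tree edge itself
                heapq.heappush(heap, entry(w2, new_vertex, x))
    return tree


adj = {
    0: [(1, 1), (3, 2)],
    1: [(1, 0), (2, 2), (5, 3)],
    2: [(2, 1), (3, 0), (4, 3)],
    3: [(4, 2), (5, 1)],
}
tree = prim_edge_queue(adj, root=0)
print(tree)
```

No membership table for T is consulted: with distinct weights, the second copy of an edge is always the next minimum, so one extra look at the queue replaces the lookup.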

34
I/O complexity

V + E/B I/Os to read the adjacency lists
O(E) inserts and deletes on the queue
Amortized O((1/B) log_{M/B}(N/B)) I/Os per operation
O(V + E/B + (E/B) log_{M/B}(N/B)) = O(V + Sort(E))
Sort(N) = O((N/B) log_{M/B}(N/B))
Comments
Laziness: more computation and less I/O

35
MST vertex reduction

The complexity O(V + Sort(E)) can be further improved for sparse graphs
Vertex reduction in each phase
Choose a shortest edge for each vertex and make it an MST edge
Also contract the two endpoints into one super-vertex
After O(log(VB/E)) phases, the above method gives an O(Sort(E) log(VB/E)) algorithm.
Further improved to O(Sort(E) log log(VB/E))
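
One vertex-reduction phase can be sketched in memory as follows. This is an illustration under assumptions: edge weights are distinct, a union-find structure stands in for the sort-and-scan contraction an external-memory version would use, and the function name is hypothetical.

```python
def vertex_reduction_phase(num_vertices, edges):
    """One phase: each vertex picks its lightest edge, then endpoints are contracted.

    edges are (w, u, v) with distinct weights.
    Returns (chosen MST edges, number of super-vertices, contracted edge list).
    """
    best = {}
    for e in edges:
        _, u, v = e
        for x in (u, v):
            if x not in best or e < best[x]:
                best[x] = e                    # lightest edge seen at x
    chosen = sorted(set(best.values()))

    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, u, v in chosen:
        parent[find(u)] = find(v)              # contract the chosen edge

    label = {}
    for x in range(num_vertices):
        label.setdefault(find(x), len(label))  # number the super-vertices
    contracted = [(w, label[find(u)], label[find(v)])
                  for w, u, v in edges if find(u) != find(v)]
    return chosen, len(label), contracted


edges = [(1, 0, 1), (2, 2, 3), (3, 1, 2)]
chosen, super_count, contracted = vertex_reduction_phase(4, edges)
print(chosen, super_count, contracted)
```

Every vertex with an incident edge is merged with at least one other vertex, so each phase at least halves the number of such vertices.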

36
Computing Point-to-Point Shortest Paths from
External Memory