Title: IO Efficient Minimum Spanning Tree Algorithm
1IO Efficient Minimum Spanning Tree Algorithm
2Outline
- What is Minimum Spanning Tree (MST)
- MST algorithms in internal memory
- The connectivity problem in an IO-efficient manner
(background for MST in external memory)
- IO Efficient MST algorithm
4Minimum Spanning Tree (MST)
- Spanning Tree (ST): Given a connected, undirected
graph, a spanning tree is a subgraph that is a tree
and contains all of the graph's vertices.
- Minimum Spanning Tree (MST): Given a connected,
undirected graph with a weight assigned to each
edge, an MST is an ST whose total edge weight is
no larger than that of any other ST.
5Minimum Spanning Tree (MST)
6Outline
- What is Minimum Spanning Tree (MST)
- MST algorithms in internal memory
- The connectivity problem in an IO-efficient manner
(background for MST in external memory)
- IO Efficient MST algorithm
7MST algorithms in internal memory
- Prim's and Kruskal's algorithms
- Only Kruskal's algorithm is introduced here
It works as follows:
1. Create a forest F (a set of trees), where each
vertex in the graph is a separate tree.
2. Create a set S containing all the edges in the
graph.
3. While S is nonempty:
a. Remove an edge with minimum weight from S.
b. If that edge connects two different trees,
add it to the forest, combining the two trees
into a single tree.
c. Otherwise, discard that edge.
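The steps of Kruskal's algorithm can be sketched in Python as follows. This is a minimal illustration, not the presenter's code: the union-find structure used to test whether an edge joins two different trees is one common choice, and the function name `kruskal` is hypothetical. The sample graph uses the arc lengths quoted in the walk-through slides (AD 5, CE 5, DF 6, AB 7, BE 7, EG 9); the lengths given to the rejected arcs BC, BD, DE and EF are assumptions, since the slides do not state them.

```python
def kruskal(vertices, edges):
    """Return the MST edges in the order Kruskal's algorithm accepts them.

    edges is a list of (x, y, weight) tuples.
    """
    # Step 1: every vertex starts as its own tree (union-find parent map).
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    forest = []
    # Steps 2-3: take edges in order of increasing weight. sorted() is
    # stable, so ties keep their input order.
    for x, y, w in sorted(edges, key=lambda e: e[2]):
        rx, ry = find(x), find(y)
        if rx != ry:            # the edge connects two different trees
            parent[rx] = ry     # combine them into a single tree
            forest.append((x, y, w))
        # otherwise the edge would close a cycle and is discarded
    return forest


# Arc lengths for BC, BD, DE and EF are assumed for illustration.
graph_edges = [
    ("A", "B", 7), ("A", "D", 5), ("B", "C", 8), ("B", "D", 9),
    ("B", "E", 7), ("C", "E", 5), ("D", "E", 15), ("D", "F", 6),
    ("E", "F", 8), ("E", "G", 9),
]
mst = kruskal("ABCDEFG", graph_edges)
total_weight = sum(w for _, _, w in mst)
```

With these inputs the accepted edges are AD, CE, DF, AB, BE, EG, in that order, for a total weight of 39.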
8MST algorithms in internal memory
This is our original graph. The numbers near the
arcs indicate their weight. None of the arcs are
highlighted.
9MST algorithms in internal memory
AD and CE are the shortest arcs, with length 5,
and AD has been arbitrarily chosen, so it is
highlighted.
10MST algorithms in internal memory
CE is now the shortest arc that does not form a
cycle, with length 5, so it is highlighted as the
second arc.
11MST algorithms in internal memory
The next arc, DF with length 6, is highlighted
using much the same method.
12MST algorithms in internal memory
The next-shortest arcs are AB and BE, both with
length 7. AB is chosen arbitrarily, and is
highlighted. The arc BD has been highlighted in
red, because there already exists a path (in
green) between B and D, so it would form a cycle
(ABD) if it were chosen.
13MST algorithms in internal memory
The process continues to highlight the
next-smallest arc, BE with length 7. Many more
arcs are highlighted in red at this stage: BC
because it would form the loop BCE, DE because it
would form the loop DEBA, and FE because it would
form the loop FEBAD.
14MST algorithms in internal memory
Finally, the process finishes with the arc EG of
length 9, and the minimum spanning tree is found.
15MST algorithms in internal memory
- Complexity of Kruskal's algorithm: O(E log E)
- Proof: not presented here.
16Outline
- What is Minimum Spanning Tree (MST)
- MST algorithms in internal memory
- The connectivity problem in an IO-efficient manner
(background for MST in external memory)
- IO Efficient MST algorithm
17Connectivity Problem
- The Connectivity Problem is to compute the number
of connected parts (components) of a given graph.
18Connectivity Problem
- The Connectivity Problem is equivalent to the
labeling problem: if two vertices are in the same
component, give them the same label.
19SemiExternalConnectivity
- SemiExternalConnectivity algorithm: assume all
vertices can be loaded into main memory, i.e.
|V| < M.
Let L(x) denote the label of x; L(x) is a number.
For each edge (x, y):
  if L(x) != L(y):
    for every vertex m with L(m) = L(x) or
    L(m) = L(y), let L(m) = min{L(x), L(y)}
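The labeling loop above can be sketched in Python as follows. This is an illustration under stated assumptions: each vertex starts with its own identifier as its label (the slide does not say how labels are initialised), the edges arrive as a one-pass stream standing in for a scan of the edge list on disk, and the function name is hypothetical.

```python
def semi_external_connectivity(vertices, edge_stream):
    """Label vertices so that two vertices share a label iff connected.

    Only the label table is kept in memory; edges are read once, in order.
    """
    # Assumption: the initial label of a vertex is the vertex itself.
    label = {v: v for v in vertices}
    for x, y in edge_stream:
        lx, ly = label[x], label[y]
        if lx != ly:
            low = min(lx, ly)
            # Relabel every vertex carrying either label: O(V) per edge,
            # which gives the O(VE) bound for the whole scan.
            for m in label:
                if label[m] == lx or label[m] == ly:
                    label[m] = low
    return label


# A one-pass iterator stands in for streaming the edges from disk.
edges = iter([(1, 2), (3, 4), (2, 3), (5, 6)])
labels = semi_external_connectivity(range(1, 8), edges)
num_components = len(set(labels.values()))
```

Here vertices 1 to 4 end up with label 1, vertices 5 and 6 with label 5, and the isolated vertex 7 keeps its own label, so the graph has three components.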
20SemiExternalConnectivity
- Black solid lines: edges that have been checked.
- Dashed lines: edges that have not been checked.
- Red solid line: the edge being checked.
21SemiExternalConnectivity
- Correctness of SemiExternalConnectivity: obvious.
- Complexity: O(VE)
22FullExternalConnectivity
- FullExternalConnectivity algorithm for graph G:
If |V| < M, apply SemiExternalConnectivity.
Else:
Let w_v denote the smallest neighbor of vertex w.
Create a subgraph H of G consisting of the edges
(w, w_v), for all w in G.
Compress each connected component of H into a
single vertex.
Recursively run FullExternalConnectivity on the
compressed graph.
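The recursion can be sketched in Python as follows. This only simulates the control flow in memory and is not an I/O-efficient implementation: the components of H are labelled with the same naive loop used for the semi-external case, as a stand-in for whatever external-memory procedure a real implementation would use. The function names and the `memory_size` parameter (playing the role of M, assumed to be at least 1) are hypothetical.

```python
def naive_labels(vertices, edges):
    """In-memory labeling, as in the SemiExternalConnectivity loop."""
    label = {v: v for v in vertices}
    for x, y in edges:
        lx, ly = label[x], label[y]
        if lx != ly:
            low = min(lx, ly)
            for m in label:
                if label[m] in (lx, ly):
                    label[m] = low
    return label


def full_external_connectivity(vertices, edges, memory_size):
    """Return {vertex: component label} via repeated contraction."""
    vertices = list(vertices)
    edges = [(x, y) for x, y in edges if x != y]
    if len(vertices) < memory_size:          # base case: |V| < M
        return naive_labels(vertices, edges)

    # w_v: the smallest neighbor of every vertex w that has a neighbor.
    smallest = {}
    for x, y in edges:
        smallest[x] = min(smallest.get(x, y), y)
        smallest[y] = min(smallest.get(y, x), x)

    # Vertices without neighbors are complete components on their own.
    isolated = [v for v in vertices if v not in smallest]

    # H has the edges (w, w_v). Stand-in: label its components in memory.
    h_label = naive_labels(smallest.keys(), smallest.items())

    # Compress each component of H to one super-vertex and keep only
    # the edges that still join two different super-vertices.
    super_vertices = set(h_label.values())
    super_edges = {
        (min(a, b), max(a, b))
        for x, y in edges
        for a, b in [(h_label[x], h_label[y])]
        if a != b
    }

    super_label = full_external_connectivity(
        super_vertices, super_edges, memory_size
    )
    result = {v: super_label[h_label[v]] for v in smallest}
    result.update({v: v for v in isolated})
    return result


labels = full_external_connectivity(
    [1, 2, 3, 4, 5, 6], [(1, 3), (2, 4), (3, 4), (5, 6)], memory_size=2
)
```

Every component of H contains at least two vertices, so each level of the recursion at least halves the number of non-isolated vertices, which is what makes the recursion terminate.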
23FullExternalConnectivity
- The compress procedure
- Lines: edges in G
- Red lines: edges in H (which are also in G)
24FullExternalConnectivity
- Correctness: obvious
- Complexity
25Outline
- What is Minimum Spanning Tree (MST)
- MST algorithms in internal memory
- The connectivity problem in an IO-efficient manner
(background for MST in external memory)
- IO Efficient MST algorithm
26IO Efficient MST algorithm
- SemiExternal case: similar to the connectivity
problem.
- FullExternal case: similar to the connectivity
problem, too.
27SemiExternal case
- All the vertices can be loaded into memory, i.e.
|V| < M.
- Similar to the connectivity problem.
28SemiExternal case
- The modification: at each step we check only the
unchecked edge with the smallest weight.
- S is the result edge set.
Let L(x) denote the label of x; L(x) is a number.
For the unchecked edge (x, y) with the smallest
weight:
  if L(x) != L(y):
    for every vertex m with L(m) = L(x) or
    L(m) = L(y), let L(m) = min{L(x), L(y)},
    and add (x, y) to S
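A minimal Python sketch of this modified loop follows. It makes three assumptions that are not in the slides: initial labels are the vertex identifiers, the in-memory `sorted` call stands in for an external sort of the edge list, and the function name is hypothetical.

```python
def semi_external_mst(vertices, edges):
    """Return S, the list of MST edges, for edges given as (x, y, weight)."""
    # Assumption: the initial label of a vertex is the vertex itself.
    label = {v: v for v in vertices}
    S = []
    # Visit the unchecked edge of smallest weight first. sorted() stands
    # in for an external-memory sort of the edges by weight.
    for x, y, w in sorted(edges, key=lambda e: e[2]):
        lx, ly = label[x], label[y]
        if lx != ly:
            low = min(lx, ly)
            for m in label:
                if label[m] in (lx, ly):
                    label[m] = low
            S.append((x, y, w))   # the edge joined two components
    return S


sample_edges = [(1, 2, 4), (1, 3, 1), (2, 3, 2), (3, 4, 3), (2, 4, 5)]
S = semi_external_mst([1, 2, 3, 4], sample_edges)
```

On this sample the edges (1, 3), (2, 3) and (3, 4) are accepted, while (1, 2) and (2, 4) are skipped because their endpoints already share a label.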
29SemiExternal case
- Correctness: obvious, because it is actually
Kruskal's algorithm.
- Complexity: O(sort(E))
30FullExternal case
- Similar to the connectivity problem.
- The modification: when constructing the subgraph
H of G, we use the edges (w, w_v) where (w, w_v)
is the lowest-weight edge of w, rather than w_v
being w's smallest neighbor.
- The final H is the resulting MST.
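Picking the lowest-weight edge of every vertex and then contracting is one phase of Borůvka's algorithm, so the construction of H can be sketched in Python as below. This is an in-memory illustration only: it keeps contracting until no edge joins two different components, instead of switching to the semi-external case once |V| < M, and it breaks ties by comparing whole `(weight, x, y)` tuples so that the chosen edges cannot form a cycle. The function name is hypothetical.

```python
def contraction_mst(vertices, edges):
    """Return the set of MST edges; edges are (weight, x, y) tuples."""
    comp = {v: v for v in vertices}   # current super-vertex of each vertex
    mst = set()
    while True:
        # Lowest-weight edge leaving each super-vertex: the edges of H.
        best = {}
        for edge in edges:
            w, x, y = edge
            cx, cy = comp[x], comp[y]
            if cx == cy:
                continue              # both ends already contracted together
            for c in (cx, cy):
                if c not in best or edge < best[c]:
                    best[c] = edge
        if not best:
            return mst                # nothing left to contract
        # Add the edges of H to the result and contract along them.
        for edge in set(best.values()):
            w, x, y = edge
            cx, cy = comp[x], comp[y]
            if cx != cy:
                mst.add(edge)
                low = min(cx, cy)
                for m in comp:
                    if comp[m] in (cx, cy):
                        comp[m] = low


sample = [(1, 1, 2), (2, 3, 4), (3, 2, 3), (4, 1, 4)]
tree = contraction_mst([1, 2, 3, 4], sample)
```

In the first round the edges of weight 1 and 2 are chosen and contracted; in the second round the edge of weight 3 joins the two super-vertices, and the edge of weight 4 is never used.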
31FullExternal case
- The construction of H
- Lines: edges in G
- Red lines: edges in H (which are also in G)
32FullExternal case
- Correctness: proved on the following slides.
- Complexity
There are optimizations, but they are not presented here.
33Proof of FullExternal
- First of all, we need to show that, with this
construction method, the resulting subgraph H is a
tree.
- We will prove this by showing that there are
no cycles in H.
- To simplify the problem, we assume that all
edge weights are distinct.
34Proof of FullExternal
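- Suppose there is a cycle in H with edges
a_1, a_2, ..., a_n, where a_i joins vertex i and
vertex i+1, and a_n joins vertex n and vertex 1.
- Clearly, a_1 is the lowest-weight edge of either
vertex 1 or vertex 2, and similarly for
a_2, a_3, ..., a_n.
- Assume a_1 < a_2 (the other case is symmetric).
Then a_2 is not vertex 2's lowest-weight edge, so
it must be vertex 3's. So a_2 < a_3.
- Since a_2 < a_3, by the same argument we get
a_3 < a_4, and so on.
- So we have a_1 < a_2 < a_3 < ... < a_n < a_1,
which is impossible.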
35Proof of FullExternal
- Further, for a tree T which is an MST of G, we
prove that every edge of H is also in T.
- If not, there must be an edge (v, w_v) that is in
H but not in T. Further, since H and T are both
STs, they have the same number of edges. So there
must be an edge (x, y) that is in T but not in H.
36Proof of FullExternal
- Edge (x, y) is not chosen at random. W.l.o.g., we
assume that the path from y to v in T is the same
as in H (note that y and v may be the same
vertex).
37Proof of FullExternal
- Assume the path from v to y is (a_1, a_2, ..., a_n),
where a_1 = v and a_n = y. Because (v, w_v) is the
lowest-weight edge of v, we have
(v, w_v) < (a_1, a_2), where v = a_1. Similarly,
(a_1, a_2) < (a_2, a_3) < ... < (a_{n-1}, a_n) < (x, y).
So we get (v, w_v) < (x, y).
38Proof of FullExternal
- Then, in T, we replace (x, y) by (v, w_v) and get
another ST (*) whose total edge weight is smaller
than that of T, which contradicts the fact that T
is an MST.
- So every edge of H is also in T.
- Further, by the definition of MST, H is
an MST of G.
(*) Note that a connected graph G is a tree if and
only if |V| = |E| + 1. Since |E| does not change,
a tree remains a tree provided that it is still
connected.
39Reference
- Norbert Zeh, I/O-Efficient Graph Algorithms,
section 5.4
40Q&A