I/O-Efficient Algorithms

Provided by: Lars154
1
I/O-Efficient Algorithms and Data Structures
Ke Yi, February 28, 2008
2
Graph Representation
  • Vertices are represented by their ids
  • Adjacency matrix: definitely NOT!
  • Adjacency list
    • 1: 3, 4, 7, 10
    • 2: 1, 5, 6, 7
  • Edge list
    • (1, 3), (1, 4), (2, 1), (2, 5), ...
  • Reducible to each other in O(sort(E)) I/Os, so we can use either
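
A minimal in-memory sketch of the two conversions, using the example lists from this slide. The function names are illustrative assumptions, and Python's `sorted` stands in for the external sort that gives the O(sort(E)) bound.

```python
from collections import defaultdict

def adjacency_to_edges(adj):
    """Flatten {u: [v, ...]} into a sorted list of (u, v) pairs."""
    return sorted((u, v) for u, nbrs in adj.items() for v in nbrs)

def edges_to_adjacency(edges):
    """Sort the edge list by source, then group the targets per source."""
    adj = defaultdict(list)
    for u, v in sorted(edges):
        adj[u].append(v)
    return dict(adj)

# Example adjacency lists from the slide.
adj = {1: [3, 4, 7, 10], 2: [1, 5, 6, 7]}
edges = adjacency_to_edges(adj)
```

In external memory both directions come down to one sort plus one scan, which is why either representation can be assumed.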

3
Connectivity
  • Input: an undirected graph G = (V, E)
  • Output: a label label(v) for each v in V, such that label(u) = label(v) iff u and v are connected
  • Total I/O: O(sort(N) log₂(N/M))
  • The best result known so far: O(sort(N) log₂ log₂ B)
  • Internal memory: use BFS to solve connectivity; both take linear time
  • External memory: BFS is even more difficult.
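
For reference, the internal-memory approach mentioned above can be sketched as follows. This is an illustrative in-memory version only; the name `component_labels` and the choice of integer labels starting at 1 are assumptions, not part of the slides.

```python
from collections import deque

def component_labels(vertices, edges):
    """Label every vertex so that two vertices share a label
    exactly when they are connected (undirected graph)."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    label = {}
    next_label = 1
    for s in vertices:
        if s in label:
            continue  # already reached from an earlier start vertex
        label[s] = next_label
        queue = deque([s])
        while queue:  # plain BFS from s
            u = queue.popleft()
            for w in adj[u]:
                if w not in label:
                    label[w] = next_label
                    queue.append(w)
        next_label += 1
    return label
```

Each vertex and edge is touched a constant number of times, which is the linear-time bound the slide refers to.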

[Figure: example undirected graph, each vertex annotated with its component label]

4
Today: Breadth-First Search
  • Input: an undirected graph G = (V, E)
  • Output: the depth d(v) of each v
  • Equivalent to single-source shortest paths when all the edges have the same length

[Figure: example undirected graph with root r, each vertex annotated with its BFS depth]

5
Undirected Breadth-First Search
[Figure: example undirected graph with root r, each vertex annotated with its BFS depth]
  • An important (and simple) property: the depths of two adjacent nodes differ by at most 1
  • Let L(d) be all nodes at depth d; we will find L(0), L(1), L(2), ...
  • L(d+1) = L(d)'s neighbors \ (L(d) ∪ L(d−1))
  • So, to find L(d+1), we only need to consider L(d−1) and L(d)

6
Undirected Breadth-First Search
  • L(0) = {r}
  • d = 0
  • While L(d) is not empty do
    • Find L(d)'s neighbors, remove duplicates
    • Remove all vertices in (L(d) ∪ L(d−1))
    • The resulting set is L(d+1)
    • d = d + 1
  • Total I/O: O(V + sort(E)) (on board)
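
The loop above can be sketched in memory as follows. This only illustrates the level-by-level logic: Python sets stand in for the sort-and-scan steps an external-memory version would use, and the name `layered_bfs` is an assumption.

```python
def layered_bfs(adj, root):
    """adj maps each vertex of an undirected graph to its neighbor list.
    Returns the BFS depth of every vertex reachable from root."""
    depth = {root: 0}
    prev, cur = set(), {root}  # L(d-1) and L(d)
    d = 0
    while cur:
        # Neighbors of L(d), with duplicates removed by the set.
        nbrs = {w for u in cur for w in adj[u]}
        # Drop everything already in L(d) or L(d-1); the rest is L(d+1).
        nxt = nbrs - cur - prev
        d += 1
        for w in nxt:
            depth[w] = d
        prev, cur = cur, nxt
    return depth
```

Only the two most recent levels are consulted, which is exactly what the "differ by at most 1" property permits for undirected graphs.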

7
Directed Breadth-First Search
[Figure: example directed graph with root r, each vertex annotated with its BFS depth]
  • "The depths of two adjacent nodes differ by at most 1" doesn't hold for directed graphs
  • Need more sophisticated algorithms

8
Review: (2, 3)-tree
  • T is a (2, 3)-tree
    • All leaves on the same level
    • Each node has 2 or 3 children
    • Leaves in sorted order
    • Each node stores splitting elements
  • Height of tree: O(log n)
  • Range query: O(log n + k), where k is the number of nodes reported
  • Insert: O(log n)
    • May need O(log n) splits after insertion
  • Delete: O(log n)
    • May need O(log n) merges and one share after deletion

9
Buffer Repository Tree
  • T is a (2, 3)-tree
    • All leaves on the same level
    • Each node has 2 or 3 children
    • Leaves in sorted order
    • Each node stores splitting elements
  • Each node keeps a buffer of < B elements
    • Each leaf is at least half full
    • An internal node may be empty
    • Root is always in memory
  • Height of tree: O(log₂(N/B))
  • Insert: a pair (key, value)
  • Extract: return all pairs with a given key, and delete them from T

10
Buffer Repository Tree
  • Insert(k, v)
    • Insert into the root buffer
    • If the root is full
      • Empty it by pushing all elements to the corresponding children: O(1) I/O
      • Recursively empty children if needed
    • If a leaf overflows, split it: O(1) I/O
      • May need to split its ancestors
      • If the root splits, create a new root
  • Amortized cost: O((1/B) log₂(N/B))
    • Cost to empty a node is charged to the B elements pushed down
    • Each element goes down O(log₂(N/B)) times
    • Cost of a split is charged to the new leaf created
11
Buffer Repository Tree
  • Extract(k)
    • Visit the subtree whose leaves contain all (k, v)
    • Delete all pairs with key k in the leaves and return them
    • Inspect all internal nodes in that subtree
      • Delete all pairs with the queried key, and return them
    • At most two leaves are now below half-full but still nonempty
      • Merge them
      • If still below half-full, merge or share with one of the siblings
    • Delete all empty leaves
      • Merge or share internal nodes in the same way as in the (2,3)-tree
      • If an internal node has > B elements after a merge, empty it
  • Cost: O(log₂(N/B) + K/B) I/Os (except the last step)
  • Cost of the last step
    • O(log₂(N/B)) for each deleted leaf, O((N/B) log₂(N/B)) in total

12
Buffer Repository Tree Summary
  • Insert: a pair (key, value)
    • O((1/B) log₂(N/B)) I/Os amortized
  • Extract: return all pairs with a given key, and delete them from T
    • O(log₂(N/B) + K/B) I/Os amortized
  • For a sequence of N insertions and < N extractions,
    • The amortized cost per insertion is O((1/B) log₂(N/B))
    • The amortized cost per extraction is O(log₂(N/B))
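
To make the buffering idea concrete, here is a deliberately simplified sketch. It is not the full structure from the slides: it assumes integer keys from a fixed universe [0, universe), uses a static binary tree instead of a (2, 3)-tree, never splits or merges, and lets leaves grow without bound. The class names `Node` and `StaticBufferTree` are hypothetical. What it does show is the lazy insert (append at the root, push down in batches of B) and an extract that cleans every buffer on the root-to-leaf path of the key.

```python
class Node:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi      # this node covers keys in [lo, hi)
        self.buffer = []               # pending (key, value) pairs
        self.left = self.right = None
        if hi - lo > 1:                # internal node: split the range
            mid = (lo + hi) // 2
            self.left, self.right = Node(lo, mid), Node(mid, hi)

class StaticBufferTree:
    def __init__(self, universe, B):
        self.root = Node(0, universe)
        self.B = B                     # buffer capacity of internal nodes

    def insert(self, key, value):
        self.root.buffer.append((key, value))
        self._flush_if_full(self.root)

    def _flush_if_full(self, node):
        # Leaves never flush in this sketch (assumption: no splitting).
        if node.left is None or len(node.buffer) < self.B:
            return
        mid = node.left.hi
        for k, v in node.buffer:       # push each pair to the matching child
            child = node.left if k < mid else node.right
            child.buffer.append((k, v))
        node.buffer = []
        self._flush_if_full(node.left)
        self._flush_if_full(node.right)

    def extract(self, key):
        """Return and delete all pairs with this key."""
        out, node = [], self.root
        while node is not None:
            out += [p for p in node.buffer if p[0] == key]
            node.buffer = [p for p in node.buffer if p[0] != key]
            if node.left is None:
                break
            node = node.left if key < node.left.hi else node.right
        return out
```

Because elements move down only when a buffer fills, each push-down is paid for by B elements, which is the source of the 1/B factor in the insertion bound.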

13
Directed Breadth-First Search
[Figure: example directed graph with root r, each vertex annotated with its BFS depth]
  • The key problem is to decide whether a node has been visited or not
  • Use the buffered repository tree!
    • For each node v we visit, put all edges (w, v) into the BRT, using w as the key
    • These edges should not be used anymore

14
Internal Memory Directed Breadth-First Search
[Figure: example directed graph with root r, each vertex annotated with its BFS label]
  • Put the root in a queue, give it a BFS label of 0
  • While the queue is not empty
    • Pop the first node u from the queue
    • For each edge (u, w) with w not visited before
      • Give w the BFS label (u's label + 1), append w to the queue
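
The steps above correspond to the textbook queue-based BFS. A minimal sketch, assuming the graph is given as a dictionary of out-neighbor lists (the name `directed_bfs` is illustrative):

```python
from collections import deque

def directed_bfs(out_adj, root):
    """out_adj maps u to the list of w with a directed edge (u, w).
    Returns the BFS label of every vertex reachable from root."""
    label = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in out_adj.get(u, []):
            if w not in label:         # "w not visited before"
                label[w] = label[u] + 1
                queue.append(w)
    return label
```

The `w not in label` test is a random-access lookup per edge, which is cheap in memory but is the step that becomes expensive once the visited set lives on disk.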

15
External Directed Breadth-First Search
[Figure: example directed graph with root r, each vertex annotated with its BFS label]
  • Put the root in a queue, give it a BFS label of 0
  • While the queue is not empty
    • Pop the first node u from the queue
    • Collect all edges (u, v), call this set X
    • Extract from the BRT all edges originating from u, call this set Y
    • For each edge (u, w) in X \ Y
      • Give w the BFS label (u's label + 1), append w to the queue
      • Put all incoming edges (x, w) of w into the BRT, using x as the key
16
External Directed Breadth-First Search
  • Put the root in a queue, give it a BFS label of 0
  • While the queue is not empty
    • Pop the first node u from the queue
    • Collect all edges (u, v), call this set X - total I/O: O(V + E/B)
    • Extract from the BRT all edges originating from u, call this set Y - total: V extracts
    • For each edge (u, w) in X \ Y - total I/O: O(V + sort(E))
      • Give w the BFS label (u's label + 1), append w to the queue
      • Put all incoming edges (x, w) of w into the BRT - total: E inserts
  • Analysis
    • Black lines: O(V) I/Os (internal-memory BFS)
    • Red lines: O((V + E/B) log₂ E) I/Os
    • BRT costs O((1/B) log₂ E) per insert, O(log₂ E) per extract
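
The bound for the BRT operations follows by multiplying the operation counts by the per-operation costs listed above. The last line assumes the standard definition sort(E) = O((E/B) log_{M/B}(E/B)), which the slides use but do not restate.

```latex
\begin{align*}
\underbrace{V \cdot O(\log_2 E)}_{V \text{ extracts}}
 \;+\; \underbrace{E \cdot O\!\left(\tfrac{1}{B}\log_2 E\right)}_{E \text{ inserts}}
 &= O\!\left(\left(V + \tfrac{E}{B}\right)\log_2 E\right), \\
O\!\left(V + \mathrm{sort}(E)\right)
 &\subseteq O\!\left(\left(V + \tfrac{E}{B}\right)\log_2 E\right).
\end{align*}
```

So the scanning and sorting steps are dominated by the BRT operations, and the total matches the bound stated for directed BFS.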

17
I/O-Efficient BFS Summary
  • Directed BFS: O((V + E/B) log₂ E) I/Os
    • Best known result
    • DFS can be done in the same I/O bound using the same techniques
  • Undirected BFS
    • Simple O(V + sort(E)) algorithm shown in class
    • Good if the graph is dense, e.g., E > VB
    • Meaningless if the graph is sparse, i.e., E = O(V)
    • Can be solved much more efficiently (often in O(sort(E)) I/Os) for special sparse graphs, e.g., planar graphs
    • It used to be conjectured that Ω(V) is a lower bound
    • But a major breakthrough in 2002 gives an undirected BFS algorithm with I/O complexity