I/O-Efficient Algorithms

Slides: 17
Provided by: Lars154
1
I/O-Efficient Algorithms and Data Structures
Ke Yi February 14, 2008
2
Recap: Merge Sort
  • Merge sort
  • Internal memory: two-way merge, naïve I/O: $O\left(\frac{N}{B} \log_2 \frac{N}{B}\right)$
  • External memory: $O(M/B)$-way merge

Total I/O: $O\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right) = \mathrm{sort}(N)$
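A minimal in-memory sketch of the multiway merge idea, assuming the input fits in a Python list and that memory_size and block_size are counted in elements. The function name and the pass counter are illustrative only; a real external sort would stream runs from disk block by block.

```python
import heapq

def external_merge_sort(items, memory_size, block_size):
    """Simulate run formation followed by (M/B)-way merge passes."""
    fan_in = max(2, memory_size // block_size)   # merge width, about M/B
    # Run formation: sort memory-sized chunks (one pass over the data).
    runs = [sorted(items[i:i + memory_size])
            for i in range(0, len(items), memory_size)]
    passes = 0
    # Each pass merges groups of fan_in runs, so the number of runs
    # shrinks by a factor of fan_in per pass.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        passes += 1
    return (runs[0] if runs else []), passes
```

With 100 elements, memory_size = 10 and block_size = 2, the merge width is 5, so the 10 initial runs collapse to one run in two merge passes.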
3
Recap: External Heap
[Figure: external heap, with the labels "insert buffer", "main memory", and "in memory"]
The heap has fan-out $\Theta(M/B)$; each node has $\Theta(M/B)$ blocks
Amortized I/O per insert or delete-max: $O\left(\frac{1}{B} \log_{M/B} \frac{N}{B}\right)$
Naïve I/O with an internal heap: $O\left(\log_2 \frac{N}{B}\right)$
Heap property: all elements in a child are smaller than those in its parent
4
External Heap in Practice
  • In practice: know the scale of your problem!
  • Suppose M = 512M and B = 256K; then two levels can
    support M · (M/B) = 1024G = 1T of data!
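The arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes "512M" and "256K" mean 512 · 2^20 and 256 · 2^10 units, and it treats the fan-out as exactly M/B; the function name is hypothetical.

```python
def two_level_capacity(memory_size, block_size):
    """Capacity of two heap levels: M/B children, each holding about M."""
    fan_out = memory_size // block_size
    return memory_size * fan_out

M = 512 * 2**20   # assumed meaning of "512M"
B = 256 * 2**10   # assumed meaning of "256K"
print(M // B)                            # fan-out M/B
print(two_level_capacity(M, B) // 2**30) # capacity in units of G
```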

5
Recap: Basic General I/O Techniques
(3) Reduce to sort + pqueue
6
Pointer Dereferencing
  • "Almost every problem in computer science can be
    solved by another level of indirection"
  • Dereferencing the pointers one by one needs many random I/Os
  • How do we get the values I/O-efficiently?
  • Output: (i, data) pairs

Pointer array: P[i]
Data array: D[i]
7
I/O-Efficient Pointer Dereferencing
Pointer array: P[i]
Data array: D[i]
Total I/O: sort(N)
  • Sort the pointer array by pointers
  • Produce a list of (i, P[i]) pairs, sorted by P[i]
  • Scan both arrays in parallel
  • Produce (i, data) pairs
  • Sort the list back by i if needed
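The steps above can be sketched in Python as follows. This is an in-memory illustration, assuming pointers are valid indices into the data array; the function name is hypothetical, and Python's sort stands in for an external sort.

```python
def dereference(pointers, data):
    """Return (i, data[pointers[i]]) pairs using sort + scan, no random access pattern."""
    # Sort the (P[i], i) pairs by pointer value.
    by_target = sorted((p, i) for i, p in enumerate(pointers))
    pairs = []
    d = 0  # current position of the forward scan over the data array
    for p, i in by_target:
        while d < p:       # advance the data scan; it never moves backwards
            d += 1
        pairs.append((i, data[d]))
    pairs.sort()           # sort back by i if the original order is needed
    return pairs
```

Because the pointer list is sorted, the scan over the data array only moves forward, which is what turns N random I/Os into two sorts and a scan.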

8
Time-Forward Processing
  • Scan the sequence in order and create a priority queue
  • For each cell:
    • For each incoming edge:
      • DeleteMin from the pq; if there's a match, obtain the
        incoming value
    • Compute the outgoing value
    • For each outgoing edge:
      • Insert (destination address, value) into the pq, with
        the destination as key

Total I/O: sort(N)
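A small sketch of time-forward processing, under these assumptions: cells are numbered 0..n-1 in processing order, every edge (u, v) has u < v, and the outgoing value of a cell is its local value plus the sum of its incoming values. The combining rule and all names are illustrative; heapq stands in for the external priority queue.

```python
import heapq

def time_forward(local_values, edges):
    """Evaluate cells in order, forwarding values through a priority queue."""
    edges = sorted(edges)   # outgoing edges grouped by source, in scan order
    pq = []                 # entries are (destination, value)
    results = []
    pos = 0                 # scan position in the sorted edge list
    for v, local in enumerate(local_values):
        incoming = 0
        # DeleteMin while the smallest key matches the current cell.
        while pq and pq[0][0] == v:
            incoming += heapq.heappop(pq)[1]
        out = local + incoming          # compute the outgoing value
        results.append(out)
        # Send the value forward along each outgoing edge.
        while pos < len(edges) and edges[pos][0] == v:
            heapq.heappush(pq, (edges[pos][1], out))
            pos += 1
    return results
```

For example, with local values [1, 1, 1, 1] and edges (0,1), (0,2), (1,3), (2,3), cell 3 receives 2 from each of cells 1 and 2.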
9
Application: Maximal Independent Set
  • Given an undirected graph G = (V, E) stored on
    disk
    • A list of (vertex-id, vertex-id) pairs
      representing all edges
  • An independent set is a set I of vertices such that
    no two vertices in I are adjacent
  • I is maximal if adding any other vertex to
    I makes it no longer independent
  • Note: finding a maximum independent set is NP-hard!
  • Internal memory:
    • Add vertices one by one until no more vertices
      can be added
    • Time: O(|E|)

10
I/O-Efficient Maximal Independent Set
[Figure: example graph on vertices 1 to 7]
Total I/O: sort(N)
  • Make all edges directed from a low vertex id to a
    high vertex id
  • Sort all edges by source
  • Now we have a time-forward processing problem!
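One way to sketch this reduction in Python, assuming vertices are numbered 1..num_vertices and that a vertex joins I exactly when no lower-numbered neighbor is already in I. The function name is hypothetical, and heapq stands in for the external priority queue.

```python
import heapq

def maximal_independent_set(num_vertices, edges):
    """Greedy MIS by vertex id, driven by time-forward processing."""
    # Direct every edge from the low id to the high id, then sort by source.
    directed = sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})
    pq = []            # holds ids of vertices blocked by a chosen lower neighbor
    independent = []
    pos = 0            # scan position in the sorted edge list
    for v in range(1, num_vertices + 1):
        blocked = False
        while pq and pq[0] == v:        # messages addressed to v
            heapq.heappop(pq)
            blocked = True
        chosen = not blocked
        if chosen:
            independent.append(v)
        # Scan v's outgoing edges; only chosen vertices send messages.
        while pos < len(directed) and directed[pos][0] == v:
            if chosen:
                heapq.heappush(pq, directed[pos][1])
            pos += 1
    return independent
```

On the path 1-2-3-4 this selects vertices 1 and 3.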

11
Distribution Sweeping
  • An I/O-Efficient Technique for Solving Batched
    Geometry Problems

12
Plane Sweep
  • A technique for solving batched geometry problems
  • Plane sweep: an important technique in
    computational geometry (in internal memory)
  • Example: orthogonal segment intersection
    • Given a set of horizontal and vertical segments,
      the goal is to report all intersections
  • Internal memory: plane sweep + binary search tree
  • Time: $O(N \log N + K)$, where K = output size
    (output-sensitive algorithm)
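A minimal internal-memory sketch of the sweep for orthogonal segment intersection. Assumptions: horizontal segments are (x1, x2, y) with x1 <= x2, vertical segments are (x, y1, y2) with y1 <= y2, and touching endpoints count as intersections. A sorted Python list with bisect stands in for the binary search tree, so inserts and deletes here cost O(N) rather than O(log N); the names are illustrative.

```python
import bisect

def orthogonal_intersections(horizontals, verticals):
    """Sweep a horizontal line upward; active holds verticals crossing it."""
    events = []
    for x, y1, y2 in verticals:
        events.append((y1, 0, (x, y1, y2)))   # kind 0: insert at bottom end
        events.append((y2, 2, (x, y1, y2)))   # kind 2: delete at top end
    for x1, x2, y in horizontals:
        events.append((y, 1, (x1, x2, y)))    # kind 1: range query
    events.sort()   # at equal y: insert, then query, then delete
    active = []     # vertical segments sorted by x
    found = []
    for _, kind, seg in events:
        if kind == 0:
            bisect.insort(active, seg)
        elif kind == 2:
            active.pop(bisect.bisect_left(active, seg))
        else:
            x1, x2, _y = seg
            i = bisect.bisect_left(active, (x1,))   # first vertical with x >= x1
            while i < len(active) and active[i][0] <= x2:
                found.append((seg, active[i]))
                i += 1
    return found
```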

13
Distribution Sweeping
  • Divide the plane into M/B slabs
  • On this level, consider only the middle part (red) of each
    horizontal segment
  • Push the leftover end pieces (blue) one level down and process
    them recursively
  • One input segment pushes two blue pieces down
    only once!
  • Total size is linear at any level (phew)

14
Distribution Sweeping
  • Maintain an active list for each slab storing all
    vertical segments intersecting the sweep line
  • Reporting is fine, but how do we delete?
  • Delete lazily!
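A sketch of one level of the sweep with lazy deletion, under these assumptions: the sweep runs from top to bottom, slab s covers [boundaries[s], boundaries[s+1]), every vertical segment (x, y_lo, y_hi) satisfies boundaries[0] <= x < boundaries[-1], and only slabs fully spanned by a horizontal segment (x1, x2, y) are handled on this level. Python lists stand in for on-disk active lists; all names are hypothetical.

```python
import bisect

def sweep_one_level(boundaries, horizontals, verticals):
    """Report intersections between horizontals and verticals in spanned slabs."""
    active = [[] for _ in range(len(boundaries) - 1)]   # one active list per slab
    events = [(-y_hi, 0, (x, y_lo, y_hi)) for x, y_lo, y_hi in verticals]
    events += [(-y, 1, (x1, x2, y)) for x1, x2, y in horizontals]
    events.sort()   # top to bottom; inserts before queries at equal y
    found = []
    for _, kind, seg in events:
        if kind == 0:
            # A vertical segment enters the active list of its slab.
            s = bisect.bisect_right(boundaries, seg[0]) - 1
            active[s].append(seg)
        else:
            x1, x2, y = seg
            first = bisect.bisect_left(boundaries, x1)       # first boundary >= x1
            last = bisect.bisect_right(boundaries, x2) - 1   # last boundary <= x2
            for s in range(first, last):                     # fully spanned slabs
                survivors = []
                for v in active[s]:
                    if v[1] <= y:           # still crosses the sweep line
                        found.append((seg, v))
                        survivors.append(v)
                    # else: expired segment, dropped lazily during this scan
                active[s] = survivors
    return found
```

Every element touched during a scan is either reported or deleted for good, which is why the scans on one level are charged to the output plus the input size.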

15
Distribution Sweeping
  • Total I/O on this level: $O(N/B + K'/B)$
    • K' = number of intersections found on this level
  • Total I/O on all levels: $O\left(\frac{N}{B} \log_{M/B} \frac{N}{B} + \frac{K}{B}\right)$, where K = total output size
  • Optimal in the comparison-I/O model

16
Distribution Sweeping Framework
  • Sort objects by x-coordinate and y-coordinate
  • Divide into M/B slabs
  • Sweep the plane, solving problems on the slab
    level
  • Generate M/B problem instances inside the slabs
  • Solve the problem inside each slab recursively
    until smaller than memory size
  • Total levels: $O(\log_{M/B}(N/M))$
  • The key is how to solve the problem on the slab level
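As a rough worked bound for the segment intersection example, assume level j costs $O(N/B + K_j/B)$ I/Os, where $K_j$ (a symbol introduced here) is the number of intersections reported on level j, and that each intersection is reported on exactly one level. Summing over the levels and adding the initial sort gives:

```latex
\underbrace{O\!\left(\tfrac{N}{B}\log_{M/B}\tfrac{N}{B}\right)}_{\text{initial sort}}
+ \sum_{j=1}^{L} O\!\left(\tfrac{N}{B} + \tfrac{K_j}{B}\right)
= O\!\left(\tfrac{N}{B}\log_{M/B}\tfrac{N}{B} + \tfrac{K}{B}\right),
\qquad L = O\!\left(\log_{M/B}\tfrac{N}{M}\right),\quad \sum_{j=1}^{L} K_j \le K .
```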