Rapid Multipole Graph Drawing on the GPU

1
Rapid Multipole Graph Drawing on the GPU
  • Graph Drawing 2008
  • Apeksha Godiyal, Jared Hoberock, Michael Garland,
    and John C. Hart
  • University of Illinois at Urbana-Champaign,
    NVIDIA Corp.

2
Introduction
  • Problem: To speed up automatic graph drawing
    using the GPU
  • Automatic graph drawing
  • Vertex adjacency → Vertex positions
  • Applications
  • Software engineering, databases, web design
  • VLSI circuit design, bioinformatics
  • Social networking, financial data visualization

3
Motivation
  • Classical method: force-directed layout
  • O(|V|² + |E|): very slow, does not scale.
  • Vertices → charged particles (obey Coulomb's law)
  • Edges → springs (obey Hooke's law)
  • Minimization of the energy function of the
    physical system.
  • Graphics Processing Unit (GPU)
  • Powerful manycore architecture.
  • Easier to program now.
  • GPGPU
  • Maintain good quality of graph layout.

4
GeForce 8800
  • 16 highly threaded SMs.
  • Multithreaded SPMD model uses application data
    parallelism.
  • Programmable in C with CUDA tools.

5
CUDA Programming Model
  • A Highly Multithreaded Coprocessor.
  • The GPU is viewed as a compute device that:
  • Is a coprocessor to the CPU or host
  • Has its own DRAM (device memory)
  • Runs many threads in parallel
  • Differences between GPU and CPU threads
  • GPU threads are extremely lightweight
  • Very little creation overhead
  • GPU needs 1000s of threads for full efficiency
  • Multi-core CPU needs only a few
  • Data-parallel portions of an application are
    executed on the device as kernels which run in
    parallel on many threads

6
Related Work
  • Approximate Force Directed Layout
  • Multilevel Strategy
  • Grid Based - GVA
  • Tree Based - Fast Multipole Multilevel Method
    (FM3)
  • Linear Algebra - Quasi-grids
  • Algebraic Multigrid Method (ACE)
  • Minimizing a quadratic energy function.
  • High Dimensional Embedding (HDE)
  • Embed the graph in a very high dimension.
  • Project it onto the 2-D plane using PCA.
  • GPU
  • Multilevel Graph Layout

7
Overall Algorithm
  • Input: G0(V, E) with random initial placements
  • Output: G0(V, E) with final placements

Multilevel coarsening: G0 → (G0, G1, G2, ..., Gk-1, Gk)
For i = k down to 0:
    Compute the layout of Gi        /* on the GPU */
    Interpolate the initial positions of Gi-1

8
Algorithm: Multilevel Strategy
  • Benefit
  • Fewer iterations are required for larger graphs.
  • Most of the work is done on smaller graphs.

9
Coarsening
  • Independent set S of graph G(V, E)
  • No two elements of S are adjacent.
  • Maximal Independent Set (MIS) filtration
  • Family of successive independent sets of V.
  • 2-Approximation algorithm
  • S can be computed by repeatedly deleting a vertex
    v ∈ V, adding it to S, and removing all
    vertices adjacent to v from V until V is empty.
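The greedy procedure in the last bullet can be sketched in a few lines of Python. This is a minimal sequential illustration, not the authors' implementation. The function name `maximal_independent_set` and the dictionary-of-sets graph format are assumptions made for the example.

```python
def maximal_independent_set(adj):
    """Greedy maximal independent set.

    adj maps each vertex to the set of its neighbours (undirected graph).
    The dict-of-sets format is an assumption of this sketch.
    """
    remaining = set(adj)      # working copy of V
    independent = []          # the set S being built
    for v in sorted(adj):     # fixed visiting order keeps the result reproducible
        if v in remaining:
            independent.append(v)      # add v to S
            remaining.discard(v)       # delete v from V
            remaining -= adj[v]        # remove every vertex adjacent to v
    return independent


# Path graph 0 - 1 - 2 - 3 - 4
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
coarse_vertices = maximal_independent_set(path)
```

Applying the same function again to a graph built on `coarse_vertices` would give the next, smaller level of the filtration.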

10
Interpolation
  • To get the starting positions of vertices in Gi
    based on the positions of vertices in the layout
    of Gi+1
  • Each vertex v ∈ Vi is initially placed at the
    position of its parent vertex v' ∈ Vi+1.
  • Iterative algorithm: Place v at the average of
    its current position, pi, and the current average
    position of its neighbors, Ni.
  • In our implementation we use a maximum of 50
    iterations.
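The interpolation step above can be sketched as follows. The slide does not say whether vertices that also exist in the coarser graph keep moving, so this sketch assumes they stay fixed. The names `interpolate`, `adj`, `parent` and `coarse_pos` are hypothetical.

```python
def interpolate(adj, parent, coarse_pos, max_iters=50):
    """Place the vertices of a finer graph from a coarser layout.

    adj:        fine vertex -> list of neighbours in the finer graph
    parent:     fine vertex -> its parent vertex in the coarser graph
    coarse_pos: coarse vertex -> (x, y) from the coarser layout
    """
    # Initial placement: every vertex starts at its parent's position.
    pos = {v: coarse_pos[parent[v]] for v in adj}
    # Assumption of this sketch: a vertex that is its own parent stays fixed.
    free = [v for v in adj if parent[v] != v and adj[v]]
    for _ in range(max_iters):
        updated = {}
        for v in free:
            nx = sum(pos[u][0] for u in adj[v]) / len(adj[v])
            ny = sum(pos[u][1] for u in adj[v]) / len(adj[v])
            # Average of the current position and the neighbours' mean position.
            updated[v] = ((pos[v][0] + nx) / 2, (pos[v][1] + ny) / 2)
        pos.update(updated)   # all vertices are updated from the same old state
    return pos


# Path 0 - 1 - 2 where vertices 0 and 2 survived coarsening and 1 did not.
adj = {0: [1], 1: [0, 2], 2: [1]}
parent = {0: 0, 1: 0, 2: 2}
coarse_pos = {0: (0.0, 0.0), 2: (2.0, 0.0)}
fine_pos = interpolate(adj, parent, coarse_pos)
```

In this example vertex 1 starts on top of vertex 0 and drifts toward the midpoint between its two neighbours.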

11
Computing The Layout - GPU
  • Force Based Layout
  • Let e be an edge connecting two vertices u, v ∈ V
    with actual length d and desired length d'.
  • Nodes → charged particles (same charge).
  • Fr = 1/d
  • Edges → springs
  • GPU threads
  • 1 GPU thread handles the force calculations for
    one vertex of the graph.

12
Attractive Force
  • Compressed Sparse Row
  • Sparse representation of the edges of the graph.
  • Stored in the GPU memory.
  • Let i be a vertex of graph G such that i has k
    edges (i, j1), (i, j2), ..., (i, jk). Then the
    graph's adjacency list is represented by two
    arrays:
  • Edge-Value: For each vertex i, this array stores
    the vertices j1, j2, ..., jk, i.e. the adjacency
    list of i.
  • Edge-Index: Edge-Index[i-1] and Edge-Index[i]
    store the beginning and end of the adjacency
    list of vertex i.
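The two-array layout can be illustrated with a short Python sketch. It uses 0-based vertex ids, so the adjacency list of vertex i lies between `edge_index[i]` and `edge_index[i + 1]`. The slide does not give the spring formula, so the Hooke-style force with `rest_length` and `stiffness` is an assumption made only to show how one thread would walk its vertex's neighbours.

```python
import math


def build_csr(num_vertices, edges):
    """Return (edge_index, edge_value) for an undirected edge list."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    edge_index = [0] * (num_vertices + 1)
    for i in range(num_vertices):
        edge_index[i + 1] = edge_index[i] + degree[i]
    edge_value = [0] * edge_index[-1]
    cursor = edge_index[:-1]          # next free slot per vertex (slice makes a copy)
    for u, v in edges:
        edge_value[cursor[u]] = v
        cursor[u] += 1
        edge_value[cursor[v]] = u
        cursor[v] += 1
    return edge_index, edge_value


def attractive_force(i, pos, edge_index, edge_value, rest_length=1.0, stiffness=1.0):
    """Sum of spring forces on vertex i (assumed Hooke's-law springs)."""
    fx = fy = 0.0
    for j in edge_value[edge_index[i]:edge_index[i + 1]]:
        dx, dy = pos[j][0] - pos[i][0], pos[j][1] - pos[i][1]
        d = math.hypot(dx, dy)
        if d == 0.0:
            continue                  # coincident vertices: direction undefined
        pull = stiffness * (d - rest_length) / d
        fx += pull * dx
        fy += pull * dy
    return fx, fy


# Triangle on vertices 0, 1, 2
tri_index, tri_value = build_csr(3, [(0, 1), (0, 2), (1, 2)])
```

On the GPU, the body of `attractive_force` corresponds to the work of the single thread assigned to vertex i.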

[Figure: example graph with its Edge Value and Edge Index arrays]

13
Repulsive Force: Multipole Expansion
  • Force calculations are well-behaved in the far
    field.
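  • Multipole expansion: Suppose that m charges of
    strengths q_i are located at points z_i, for
    i = 1, ..., m, with center z_0 and |z_i - z_0| < r.
    Then for any z ∈ C with |z - z_0| > r, the
    potential φ(z) induced by the charges is
      \phi(z) = Q \log(z - z_0) + \sum_{k=1}^{\infty} \frac{a_k}{(z - z_0)^k}
  • where
      Q = \sum_{i=1}^{m} q_i  and  a_k = \sum_{i=1}^{m} \frac{-q_i (z_i - z_0)^k}{k}
  • Force: F_r(z) = (Re(\phi'(z)), -Im(\phi'(z)))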
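A small numerical check helps show what the expansion buys. The sketch below uses the standard two-dimensional form, with potential q log(z - z_i) per charge, truncated to a finite number of terms, and compares its derivative with the direct sum. All function names and the choice of 10 terms are assumptions of this example.

```python
def multipole_coefficients(charges, points, center, terms):
    """Return (Q, [a_1, ..., a_terms]) for charges q_i at complex points z_i."""
    total = sum(charges)
    coeffs = [
        sum(-q * (z - center) ** k / k for q, z in zip(charges, points))
        for k in range(1, terms + 1)
    ]
    return total, coeffs


def expansion_derivative(z, center, total, coeffs):
    """phi'(z) = Q / (z - z0) - sum_k k * a_k / (z - z0)^(k + 1), truncated."""
    w = z - center
    return total / w - sum(k * a / w ** (k + 1)
                           for k, a in enumerate(coeffs, start=1))


def direct_derivative(z, charges, points):
    """Exact phi'(z) for the potential sum_i q_i * log(z - z_i)."""
    return sum(q / (z - zi) for q, zi in zip(charges, points))


def repulsive_force(z, center, total, coeffs):
    """Force as the pair (Re(phi'(z)), -Im(phi'(z)))."""
    d = expansion_derivative(z, center, total, coeffs)
    return d.real, -d.imag


charges = [1.0, 1.0, 1.0]
points = [0.1 + 0.2j, -0.3 + 0.1j, 0.2 - 0.2j]
total, coeffs = multipole_coefficients(charges, points, 0j, terms=10)
approx = expansion_derivative(5 + 3j, 0j, total, coeffs)
exact = direct_derivative(5 + 3j, charges, points)
```

Once `total` and `coeffs` are known for a cluster, the cost of evaluating the force at a far point depends on the number of terms rather than on the number of charges in the cluster.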

14
Repulsive Force: K-D Tree and Multipole Expansion
  • Calculate_Repulsion(v, n)
  • Input: v, a vertex of graph G(V, E)
  • n, a node of K-DTree(G)
  • If v is sufficiently far from Center(n)
  • Fr(v) = Approximate Multipole Force
  • Else if n is a leaf
  • Fr(v) = Exact Repulsive Force
  • Else
  • Fr(v) = Calculate_Repulsion(v, n.right_child)
    + Calculate_Repulsion(v, n.left_child)
  • Output: Fr(v)
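The recursion above can be sketched on the CPU in Python. This is an illustration under several assumptions that the slide does not state: unit charges, a leaf size of 4, 12 expansion terms, and "sufficiently far" meaning more than twice the node radius from its center. Positions are complex numbers and the function returns φ'(v), from which the force is (Re, -Im).

```python
LEAF_SIZE = 4      # assumed maximum number of points in a leaf
TERMS = 12         # assumed number of expansion terms
FAR_FACTOR = 2.0   # assumed threshold: far if distance > FAR_FACTOR * radius


class Node:
    """k-d tree node over complex points, with a multipole expansion."""

    def __init__(self, points, depth=0):
        self.points = points
        self.center = sum(points) / len(points)
        self.radius = max(abs(p - self.center) for p in points)
        self.total = len(points)                       # Q for unit charges
        self.coeffs = [sum(-(p - self.center) ** k / k for p in points)
                       for k in range(1, TERMS + 1)]   # a_1 .. a_TERMS
        self.left = self.right = None
        if len(points) > LEAF_SIZE:
            # Alternate the split dimension and bisect at the median.
            key = (lambda p: p.real) if depth % 2 == 0 else (lambda p: p.imag)
            ordered = sorted(points, key=key)
            mid = len(ordered) // 2
            self.left = Node(ordered[:mid], depth + 1)
            self.right = Node(ordered[mid:], depth + 1)


def calculate_repulsion(v, n):
    """Return phi'(v) induced by the unit charges in node n, excluding v."""
    w = v - n.center
    if abs(w) > FAR_FACTOR * n.radius:
        # Far field: truncated multipole expansion.
        return n.total / w - sum(k * a / w ** (k + 1)
                                 for k, a in enumerate(n.coeffs, start=1))
    if n.left is None:
        # Leaf: exact pairwise sum.
        return sum(1 / (v - p) for p in n.points if p != v)
    return calculate_repulsion(v, n.right) + calculate_repulsion(v, n.left)


grid = [complex(x, y) for x in range(8) for y in range(8)]
root = Node(grid)
d = calculate_repulsion(grid[0], root)
force_on_corner = (d.real, -d.imag)
```

A larger `FAR_FACTOR` or more `TERMS` trades speed for accuracy. The values here were picked only to keep the example small.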

15
Processing the K-D Tree
  • Construction
  • Root node represents all the vertices.
  • Each node of a k-d tree divides the set of
    vertices it represents into two equal sets by
    splitting along a chosen dimension.
  • Bisection is achieved by finding the median
    element with the radix selection algorithm on
    the GPU.
  • Radix Selection
  • Splits the input array into 2 groups, based on
    the MSB. At deeper levels of recursion, less
    significant bits are used, until the median is
    found.
  • Split can be done in parallel if the final
    address of each element is known. This is done by
    finding the prefix sum.
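Radix selection as described above can be sketched sequentially in Python. The sketch assumes non-negative integer keys. The function name `radix_select` and the 0-based rank `k` are choices made for this example.

```python
def radix_select(values, k, bit):
    """Return the k-th smallest (0-based) of non-negative integers.

    bit is the index of the most significant bit still to be examined.
    """
    if bit < 0 or len(values) == 1:
        return values[0]          # all remaining values are equal
    zeros = [v for v in values if not (v >> bit) & 1]
    ones = [v for v in values if (v >> bit) & 1]
    if k < len(zeros):
        return radix_select(zeros, k, bit - 1)
    return radix_select(ones, k - len(zeros), bit - 1)


keys = [7, 0, 5, 3, 2, 6, 1]      # 3-bit keys, so the MSB is bit 2
median = radix_select(keys, len(keys) // 2, bit=2)
```

Only the group that contains the requested rank is examined further, so each level of recursion handles a smaller array.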

16
Processing the K-D Tree II
  • Array bisection example using prefix sum
  • 7 0 5 3 2 6 1   input array
  • 0 1 0 1 1 0 1   e = 0 if MSB is 1, 1 if MSB is 0
  • 0 0 1 1 2 3 3   f = prefix sum of e; NF = total
    number of false entries (here NF = 4)
  • 0 1 2 3 4 5 6   id = thread id
  • 4 5 5 6 6 6 7   t = id - f + NF
  • 4 0 5 1 2 6 3   addr = e ? f : t
  • 0 3 2 1 7 5 6   out[addr[id]] = in[id]
  • Prefix Sum
  • a0, a1, a2, ... -> 0, a0, a0+a1, a0+a1+a2, ...
  • Prefix sum calculation has a very efficient GPU
    implementation.
  • The crossover array size (above which GPU radix
    selection is faster than the CPU implementation)
    is machine dependent: 50,000 in our
    configuration.
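The bisection example above can be reproduced with a short Python sketch. The loop over `tid` stands in for the GPU threads, and the scan is written sequentially for clarity. The names `exclusive_scan` and `split_by_bit` are hypothetical, and the input is assumed to be non-empty.

```python
def exclusive_scan(values):
    """a0, a1, a2, ... -> 0, a0, a0+a1, ... (sequential stand-in for a GPU scan)."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out


def split_by_bit(data, bit):
    """Stable split: elements whose chosen bit is 0 come first.

    Returns the reordered array and NF, the size of the first group.
    """
    n = len(data)
    e = [0 if (x >> bit) & 1 else 1 for x in data]   # 1 marks a "false" bit
    f = exclusive_scan(e)
    nf = f[-1] + e[-1]                               # total number of false entries
    out = [None] * n
    for tid in range(n):                             # one iteration per GPU thread
        t = tid - f[tid] + nf
        addr = f[tid] if e[tid] else t
        out[addr] = data[tid]                        # scatter to the final address
    return out, nf


result, num_false = split_by_bit([7, 0, 5, 3, 2, 6, 1], bit=2)
```

Because every thread computes its own destination address from `f`, the scatter needs no communication between threads once the scan has finished.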

17
Processing the K-D Tree III
  • Traversal
  • The CUDA programming model does not support
    recursion/stack.
  • The GPU does not execute conditional statements
    efficiently.

18
Results
  • Implementation
  • NVIDIA's CUDA on GeForce 8800 GTX card.
  • 2.21 GHz AMD Athlon(tm) 64 Processor.

19
Results II
  • GPU force calculation is 7 to 40 times faster
    than the CPU force calculation, depending on the
    size of the graph.
  • The GPU implementation spends 18-25% of its
    running time on data movement, compared with
    2-3% for the CPU implementation.
  • The time for constructing the k-d tree is nearly
    the same in the CPU and GPU implementations for
    graphs with fewer than 50,000 vertices. For
    graphs with more than 50,000 vertices, k-d tree
    construction is more than 30% faster on the GPU.

20
Conclusions
  • The paper presents a manycore graph drawing
    algorithm.
  • Manycore chips accelerate graph drawing. In
    particular, the GeForce 8800 achieves a speedup
    of between 20 and 60 times.
  • Our algorithm produces high-quality graph
    layouts.
  • The high constant factor involved in working
    with local expansions in FMM can be avoided in
    manycore implementations, without compromising
    the quality of the layouts.

21
  • Questions?