Rapid Multipole Graph Drawing on the GPU

1
Rapid Multipole Graph Drawing on the GPU

Graph Drawing 2008
Apeksha Godiyal, Jared Hoberock, Michael Garland and John C. Hart
University of Illinois at Urbana-Champaign,
NVIDIA Corp.

2
Introduction

Problem: to speed up automatic graph drawing using the GPU
Automatic graph drawing
Vertex adjacency → vertex positions
Applications
Software engineering, databases, Web design
VLSI circuit design, bioinformatics
Social networking, financial data visualization

3
Motivation

Classical method: force-directed layout
O(|V|² + |E|) per iteration: very slow, does not scale
Vertices → charged particles (obey Coulomb's law)
Edges → springs (obey Hooke's law)
Minimization of the energy function of the physical system
Graphics Processing Unit (GPU)
Powerful manycore architecture.
Easier to program now.
GPGPU
Maintain good quality of graph layout.

4
GeForce 8800

16 highly threaded streaming multiprocessors (SMs).
A multithreaded SPMD model exploits the application's data parallelism.
Programmable in C with CUDA tools.

5
CUDA Programming Model

A highly multithreaded coprocessor
The GPU is viewed as a compute device that:
is a coprocessor to the CPU, or host
has its own DRAM (device memory)
runs many threads in parallel
Differences between GPU and CPU threads:
GPU threads are extremely lightweight
Very little creation overhead
The GPU needs thousands of threads for full efficiency
A multi-core CPU needs only a few
Data-parallel portions of an application are executed on the device as kernels, which run in parallel on many threads
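To make the thread-indexing model concrete, here is a small sequential Python emulation (not CUDA itself). The `launch` helper and the `saxpy_kernel` example are hypothetical: `launch` runs the kernel once per block/thread pair, and the kernel computes its global index the same way a CUDA kernel computes `blockIdx.x * blockDim.x + threadIdx.x`. The block size of 256 is an arbitrary choice.

```python
from typing import Callable, List

def launch(kernel: Callable, grid_dim: int, block_dim: int, *args) -> None:
    """Run `kernel` once per (block, thread) pair.

    On a GPU these invocations run in parallel; here they run one after
    another, only to illustrate the indexing scheme.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    # Global thread index, as blockIdx.x * blockDim.x + threadIdx.x in CUDA.
    i = block_idx * block_dim + thread_idx
    if i < len(x):  # guard: the last block may be only partly used
        out[i] = a * x[i] + y[i]

n = 1000
x = [float(i) for i in range(n)]
y = [1.0] * n
out: List[float] = [0.0] * n
block_dim = 256
grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n elements
launch(saxpy_kernel, grid_dim, block_dim, 2.0, x, y, out)
```

One lightweight thread per data element is the pattern the later slides use, with one thread per graph vertex.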

6
Related Work

Approximate Force-Directed Layout
Multilevel Strategy
Grid-Based: GVA
Tree-Based: Fast Multipole Multilevel Method (FM3)
Linear Algebra: Quasi-grids
Algebraic Multigrid Method (ACE)
Minimizing a quadratic energy function.
High-Dimensional Embedding (HDE)
Embed the graph in a very high-dimensional space.
Project it onto the 2-D plane using PCA.
GPU
Multilevel Graph Layout

7
Overall Algorithm

Input: G0(V, E) with random initial placements
Output: G0(V, E) with final placements

Multilevel coarsening: G0 → (G0, G1, G2, ..., Gk-1, Gk)
For i = k down to 0:
    Compute the layout of Gi (on the GPU)
    If i > 0, interpolate the initial positions of Gi-1
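The control flow above can be sketched in Python, assuming graphs are adjacency dictionaries and positions are complex numbers. `coarsen`, `layout`, and `interpolate` are hypothetical callbacks standing in for MIS coarsening, the GPU force-directed layout, and interpolation; the toy versions only exercise the loop order. It is also assumed that every coarse vertex set is a subset of V0, so the coarsest level can start from G0's random placement.

```python
from typing import Callable, Dict, Hashable, List, Optional

Graph = Dict[Hashable, List[Hashable]]   # adjacency lists
Positions = Dict[Hashable, complex]      # 2-D positions as x + iy

def multilevel_layout(
    g0: Graph,
    initial_pos: Positions,
    coarsen: Callable[[Graph], Optional[Graph]],
    layout: Callable[[Graph, Positions], Positions],
    interpolate: Callable[[Graph, Positions], Positions],
) -> Positions:
    # Build the hierarchy G0, G1, ..., Gk; coarsen returns None when done.
    levels: List[Graph] = [g0]
    while (coarser := coarsen(levels[-1])) is not None:
        levels.append(coarser)
    k = len(levels) - 1
    # Assumption: coarse vertices are a subset of V0.
    pos = {v: initial_pos[v] for v in levels[k]}
    for i in range(k, -1, -1):
        pos = layout(levels[i], pos)               # done on the GPU in the slides
        if i > 0:
            pos = interpolate(levels[i - 1], pos)  # starting positions for G(i-1)
    return pos

calls: List[int] = []

def toy_coarsen(g: Graph) -> Optional[Graph]:
    if len(g) <= 1:
        return None
    keep = sorted(g)[::2]  # hypothetical rule: keep every other vertex
    return {v: [u for u in g[v] if u in keep] for v in keep}

def toy_layout(g: Graph, pos: Positions) -> Positions:
    calls.append(len(g))  # record which level is being laid out
    return pos

def toy_interpolate(finer: Graph, pos: Positions) -> Positions:
    return {v: pos.get(v, 0j) for v in finer}  # new vertices start at the origin

g0 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
initial = {v: complex(v, 0) for v in g0}
result = multilevel_layout(g0, initial, toy_coarsen, toy_layout, toy_interpolate)
# calls == [1, 2, 4]: coarsest level first, original graph last
```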
8
Algorithm: Multilevel Strategy

Benefit
Fewer iterations are required for larger graphs.
Much of the work is done on smaller graphs.

9
Coarsening

Independent set S of a graph G(V, E)
No two elements of S are adjacent.
Maximal Independent Set (MIS) filtration
A family of successive independent sets of V.
2-approximation algorithm
S can be computed by repeatedly deleting a vertex v ∈ V, adding it to S, and removing all vertices adjacent to v from V, until V is empty.
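The greedy procedure above can be sketched in Python as follows. Visiting vertices in sorted order is an arbitrary choice made here for reproducibility; any order yields a maximal independent set, though not always the same one.

```python
from typing import Dict, Hashable, List, Set

def greedy_independent_set(adj: Dict[Hashable, Set[Hashable]]) -> List[Hashable]:
    """Greedy maximal independent set.

    Take a vertex v still in V, add it to S, then remove v and all of its
    neighbours from V; repeat until V is empty.
    """
    remaining = set(adj)
    s: List[Hashable] = []
    for v in sorted(adj):          # assumed visiting order
        if v in remaining:
            s.append(v)
            remaining.discard(v)
            remaining -= adj[v]    # neighbours can no longer join S
    return s

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
mis = greedy_independent_set(path)  # [0, 2, 4]
```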

10
Interpolation

Goal: obtain the starting positions of the vertices of Gi from the positions of the vertices in the layout of Gi+1.
Each vertex v ∈ Vi is initially placed at the position of its parent vertex v′ ∈ Vi+1.
Iterative algorithm: place v at the average of its current position, p_v, and the current average position of its neighbors N_v:
$$p_v \leftarrow \frac{1}{2}\left(p_v + \frac{1}{|N_v|}\sum_{u \in N_v} p_u\right)$$
In our implementation we use a maximum of 50 iterations.
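The interpolation rule can be sketched in Python, assuming positions are complex numbers and that a `parent` map from each vertex of G_i to its parent in G_{i+1} is already known (how parents are chosen is not specified here). All vertices are updated synchronously, as one thread per vertex would do; isolated vertices keep their position, and the `tol` early exit is a hypothetical addition. Only the 50-iteration cap comes from the slide.

```python
from typing import Dict, Hashable, List

def interpolate(
    adj: Dict[Hashable, List[Hashable]],
    parent: Dict[Hashable, Hashable],
    coarse_pos: Dict[Hashable, complex],
    max_iters: int = 50,
    tol: float = 1e-6,
) -> Dict[Hashable, complex]:
    # Start every fine vertex at its parent's position in the coarse layout.
    pos = {v: coarse_pos[parent[v]] for v in adj}
    for _ in range(max_iters):
        new_pos = {}
        for v, nbrs in adj.items():
            if nbrs:
                avg = sum(pos[u] for u in nbrs) / len(nbrs)
                new_pos[v] = (pos[v] + avg) / 2  # midpoint of self and neighbour mean
            else:
                new_pos[v] = pos[v]              # isolated vertex: leave in place
        shift = max((abs(new_pos[v] - pos[v]) for v in adj), default=0.0)
        pos = new_pos
        if shift < tol:                          # hypothetical early stop
            break
    return pos

adj = {0: [1], 1: [0, 2], 2: [1]}
parent = {0: 0, 1: 0, 2: 2}  # hypothetical: vertex 1 inherits vertex 0's spot
coarse_pos = {0: 0j, 2: 2 + 0j}
one_step = interpolate(adj, parent, coarse_pos, max_iters=1)
# one_step == {0: 0j, 1: 0.5+0j, 2: 1+0j}
```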

11
Computing The Layout - GPU

Force-based layout
Let e be an edge connecting two vertices u, v ∈ V, with actual length d and desired length d′.
Nodes → charged particles (all with the same charge): Fr ∝ 1/d
Edges → springs
GPU threads
One GPU thread handles the force calculations for one vertex of the graph.
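A minimal sketch of the one-thread-per-vertex force computation, emulated sequentially in Python with positions as complex numbers. The spring law k_spring * (d - d′), the constants `k_rep` and `k_spring`, the step size, and the brute-force O(|V|) repulsion loop are illustrative assumptions; the slides replace that repulsion loop with the k-d tree and multipole approximation.

```python
from typing import List

def vertex_force(i, pos, adj, desired_len=1.0, k_rep=1.0, k_spring=1.0) -> complex:
    """Net force on vertex i, as computed by the GPU thread that owns vertex i."""
    f = 0j
    # Repulsion: every other vertex pushes i away with magnitude k_rep / d.
    for j, pj in enumerate(pos):
        if j != i:
            delta = pos[i] - pj
            d2 = abs(delta) ** 2
            if d2 > 0:
                f += k_rep * delta / d2
    # Attraction: each edge is a Hooke spring with rest length desired_len.
    for j in adj[i]:
        delta = pos[j] - pos[i]
        d = abs(delta)
        if d > 0:
            f += k_spring * (d - desired_len) * delta / d
    return f

def layout_step(pos: List[complex], adj, step=0.1, **params) -> List[complex]:
    # All forces are read from the old positions before anything moves,
    # like a kernel that reads one buffer and writes another.
    forces = [vertex_force(i, pos, adj, **params) for i in range(len(pos))]
    return [p + step * f for p, f in zip(pos, forces)]

pos = [0j, 2 + 0j]
adj = [[1], [0]]
# Vertex 0: repulsion -0.5, spring +1.0, net +0.5 (the spring is stretched).
print(vertex_force(0, pos, adj))
```

With unit constants, two connected vertices are in equilibrium where 1/d = d - 1, i.e. at the golden ratio.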

12
Attractive Force

Compressed Sparse Row (CSR)
A sparse representation of the edges of the graph.
Stored in GPU memory.
Let i be a vertex of graph G with k edges (i, j1), (i, j2), ..., (i, jk). The graph's adjacency list is then represented by two arrays:
Edge-Value: for each vertex i, this array stores the vertices j1, j2, ..., jk, i.e. the adjacency list of i.
Edge-Index: Edge-Index[i-1] and Edge-Index[i] store the beginning and end of the adjacency list of vertex i.

[Figure: example graph with its Edge Value and Edge Index arrays]
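The CSR layout can be sketched in Python. This uses the common 0-based convention: an index array of length |V| + 1 whose entries i and i + 1 delimit vertex i's adjacency list, which is the same idea as the slide's Edge-Index[i-1], Edge-Index[i] with a leading zero. The Hooke-spring form of `attractive_force` and its constants are assumptions for illustration.

```python
from typing import List, Tuple

def build_csr(num_vertices: int, edges: List[Tuple[int, int]]):
    """Build (edge_index, edge_value) for an undirected graph.

    Each undirected edge is stored in both endpoints' adjacency lists.
    """
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    edge_index = [0] * (num_vertices + 1)
    for i in range(num_vertices):          # exclusive prefix sum of degrees
        edge_index[i + 1] = edge_index[i] + degree[i]
    edge_value = [0] * edge_index[-1]
    fill = edge_index[:-1]                 # next free slot per list (slice = copy)
    for u, v in edges:
        edge_value[fill[u]] = v
        fill[u] += 1
        edge_value[fill[v]] = u
        fill[v] += 1
    return edge_index, edge_value

def neighbors(i, edge_index, edge_value):
    return edge_value[edge_index[i]:edge_index[i + 1]]

def attractive_force(i, pos, edge_index, edge_value, desired_len=1.0, k=1.0) -> complex:
    """Spring force on vertex i, looping over its CSR adjacency list."""
    f = 0j
    for j in neighbors(i, edge_index, edge_value):
        delta = pos[j] - pos[i]
        d = abs(delta)
        if d > 0:
            f += k * (d - desired_len) * delta / d  # assumed Hooke's-law form
    return f

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
edge_index, edge_value = build_csr(4, edges)
# edge_index == [0, 2, 4, 7, 8]; edge_value == [1, 2, 0, 2, 0, 1, 3, 2]
pos = [0j, 1 + 0j, 1j, 3 + 1j]
```

Each vertex's thread reads one contiguous slice of `edge_value`, which keeps memory access simple on the GPU.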
13
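Repulsive Force: Multipole Expansion

Force calculations are well-behaved in the far field.
Multipole expansion: suppose that m charges of strengths $q_i$ are located at points $z_i$, for $i = 1, \dots, m$, with center $z_0$ and $|z_i - z_0| < r$. Then for any $z \in \mathbb{C}$ with $|z - z_0| > r$, the potential $\phi(z)$ induced by the charges is
$$\phi(z) = Q \log(z - z_0) + \sum_{k=1}^{\infty} \frac{a_k}{(z - z_0)^k},$$
where
$$Q = \sum_{i=1}^{m} q_i \quad\text{and}\quad a_k = \sum_{i=1}^{m} \frac{-q_i (z_i - z_0)^k}{k}.$$
Force: $F_r(z) = \big(\mathrm{Re}\,F(z),\ \mathrm{Im}\,F(z)\big)$, where $F(z) = \overline{\phi'(z)}$ is the complex conjugate of the derivative of the potential.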
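A small numerical check of the expansion, sketched in Python. It assumes the force is the conjugate of φ′(z), which gives each charge a repulsive force of magnitude q/d, and truncates the series after p terms (p = 8 here is an arbitrary choice). Differentiating term by term gives φ′(z) = Q/(z - z0) - Σ k·a_k/(z - z0)^(k+1).

```python
from typing import List, Tuple

def multipole_coefficients(q: List[float], z: List[complex], z0: complex, p: int):
    """Q and a_1..a_p of the expansion about z0, truncated after p terms."""
    Q = sum(q)
    a = [sum(-qi * (zi - z0) ** k for qi, zi in zip(q, z)) / k for k in range(1, p + 1)]
    return Q, a

def multipole_force(Q, a, z0, z) -> Tuple[float, float]:
    """F_r(z) from the truncated expansion, as an (x, y) pair."""
    w = z - z0
    dphi = Q / w                          # derivative of Q log(z - z0)
    for k, ak in enumerate(a, start=1):
        dphi -= k * ak / w ** (k + 1)     # derivative of a_k / (z - z0)^k
    F = dphi.conjugate()
    return (F.real, F.imag)

def direct_force(q, z, point) -> Tuple[float, float]:
    """Exact sum over all charges, for comparison."""
    dphi = sum(qi / (point - zi) for qi, zi in zip(q, z))
    F = dphi.conjugate()
    return (F.real, F.imag)

charges = [1.0, 1.0, 1.0, 1.0]
positions = [0.5 + 0.2j, -0.3 + 0.4j, 0.1 - 0.6j, -0.2 - 0.1j]
center = sum(positions) / len(positions)
Q, a = multipole_coefficients(charges, positions, center, p=8)
approx = multipole_force(Q, a, center, 10 + 5j)
exact = direct_force(charges, positions, 10 + 5j)
```

The coefficients depend only on the cluster, so they can be computed once per tree node and reused for every distant vertex.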

14
Repulsive Force: K-D Tree and Multipole Expansion

Calculate_Repulsion(v, n)
Input: v, a vertex of graph G(V, E)
       n, a node of the k-d tree of G
If v is sufficiently far from Center(n):
    Fr(v) ← approximate multipole force of n
Else if n is a leaf:
    Fr(v) ← exact repulsive force from the vertices of n
Else:
    Fr(v) ← Calculate_Repulsion(v, n.right_child) + Calculate_Repulsion(v, n.left_child)
Output: Fr(v)
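The recursion can be sketched in Python under several assumptions not taken from the slides: a node's expansion center is the centroid of its vertices, "sufficiently far" is the hypothetical test radius < theta * distance, every vertex carries unit charge, and the far-field approximation keeps only the leading term of the multipole expansion. Sorting stands in for the GPU radix selection when splitting at the median.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    points: List[complex]            # vertex positions in this cell
    center: complex                  # centroid (assumed expansion center)
    radius: float                    # max distance from center to a point
    charge: float                    # Q: unit charge per vertex
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_kdtree(points: List[complex], depth: int = 0, leaf_size: int = 2) -> Node:
    center = sum(points) / len(points)
    radius = max(abs(p - center) for p in points)
    node = Node(points, center, radius, float(len(points)))
    if len(points) > leaf_size:
        # Alternate x / y; sorting stands in for GPU median selection.
        key = (lambda z: z.real) if depth % 2 == 0 else (lambda z: z.imag)
        pts = sorted(points, key=key)
        mid = len(pts) // 2
        node.left = build_kdtree(pts[:mid], depth + 1, leaf_size)
        node.right = build_kdtree(pts[mid:], depth + 1, leaf_size)
    return node

def pair_force(v: complex, p: complex, q: float = 1.0) -> complex:
    # Repulsion from charge q at p: direction p -> v, magnitude q / d.
    return q * (1 / (v - p)).conjugate()

def exact_force(v: complex, points: List[complex]) -> complex:
    return sum((pair_force(v, p) for p in points if p != v), 0j)

def calculate_repulsion(v: complex, n: Node, theta: float = 0.5) -> complex:
    if n.radius < theta * abs(v - n.center):      # assumed "sufficiently far" test
        return pair_force(v, n.center, n.charge)  # leading (monopole) term only
    if n.left is None:                            # leaf: exact sum
        return exact_force(v, n.points)
    return calculate_repulsion(v, n.right, theta) + calculate_repulsion(v, n.left, theta)

points = [complex(i % 7, i // 7) for i in range(49)]
tree = build_kdtree(points)
far = complex(100, 100)
approx = calculate_repulsion(far, tree)
exact = exact_force(far, points)
```

Because the center is the centroid of equal charges, the a_1 term of the expansion vanishes, so even this one-term approximation has error of order (r/d)².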

15
Processing the K-D Tree

Construction
The root node represents all the vertices.
Each node of a k-d tree divides the set of vertices it represents into two equal sets by splitting along a chosen dimension.
Bisection is achieved by finding the median element with the radix selection algorithm on the GPU.
Radix selection
Splits the input array into two groups based on the most significant bit (MSB). The search continues in the group that contains the median, using the next less significant bit at each deeper level of recursion, until the median is found.
The split can be done in parallel once the final address of each element is known; these addresses are computed with a prefix sum.
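Radix selection can be sketched in Python for non-negative integer keys. Each partition step here is a sequential list comprehension, where the GPU version performs it as a parallel prefix-sum split. Floating-point coordinates would first need an order-preserving mapping to integer keys, which is omitted.

```python
from typing import List

def radix_select(values: List[int], k: int, num_bits: int) -> int:
    """Return the k-th smallest (0-based) of non-negative ints below 2**num_bits."""
    if not 0 <= k < len(values):
        raise IndexError("k out of range")
    candidates = list(values)
    for bit in range(num_bits - 1, -1, -1):            # MSB first
        zeros = [v for v in candidates if not (v >> bit) & 1]
        ones = [v for v in candidates if (v >> bit) & 1]
        if k < len(zeros):
            candidates = zeros                          # answer lies in the 0-group
        else:
            k -= len(zeros)                             # skip past the 0-group
            candidates = ones
    return candidates[0]  # all remaining candidates are equal

def median(values: List[int], num_bits: int) -> int:
    return radix_select(values, (len(values) - 1) // 2, num_bits)

print(median([7, 0, 5, 3, 2, 6, 1], 3))  # 3
```

Only one group survives each step, so the total work shrinks as the bits are consumed, unlike a full radix sort.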

16
Processing the K-D Tree II

Array bisection example using prefix sum (3-bit keys, split on the MSB):
in    7 0 5 3 2 6 1   input array
e     0 1 0 1 1 0 1   e = 0 if the MSB is 1, 1 if the MSB is 0
f     0 0 1 1 2 3 3   f = exclusive prefix sum of e; NF = total number of false (MSB = 0) entries = 4
id    0 1 2 3 4 5 6   thread id
t     4 5 5 6 6 6 7   t = id - f + NF
addr  4 0 5 1 2 6 3   addr = e ? f : t
out   0 3 2 1 7 5 6   out[addr[id]] = in[id]
Prefix sum: (a0, a1, a2, ...) → (0, a0, a0 + a1, a0 + a1 + a2, ...)
Prefix sum calculation has a very efficient GPU implementation.
The crossover array size, above which GPU radix selection outperforms the CPU implementation, is machine dependent: 50,000 in our configuration.
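The worked example can be reproduced in Python. The scan below is the work-efficient up-sweep/down-sweep (Blelloch) formulation, chosen here as one common way to compute an exclusive prefix sum on a GPU; the slides do not say which scan variant was used. Each inner `for` loop has independent iterations, which a GPU would run as parallel threads.

```python
from typing import List, Tuple

def blelloch_exclusive_scan(a: List[int]) -> List[int]:
    """Exclusive prefix sum; input is padded to a power of two."""
    n = 1
    while n < len(a):
        n *= 2
    x = list(a) + [0] * (n - len(a))
    d = 1
    while d < n:                               # up-sweep: build partial sums
        for i in range(0, n, 2 * d):
            x[i + 2 * d - 1] += x[i + d - 1]
        d *= 2
    x[n - 1] = 0
    d = n // 2
    while d >= 1:                              # down-sweep: distribute prefixes
        for i in range(0, n, 2 * d):
            t = x[i + d - 1]
            x[i + d - 1] = x[i + 2 * d - 1]
            x[i + 2 * d - 1] += t
        d //= 2
    return x[: len(a)]

def split_by_bit(values: List[int], bit: int) -> Tuple[List[int], int]:
    """Stable split: elements whose `bit` is 0 first, then those whose bit is 1."""
    n = len(values)
    e = [0 if (v >> bit) & 1 else 1 for v in values]  # e = 1 when the bit is 0
    f = blelloch_exclusive_scan(e)
    nf = f[-1] + e[-1] if n else 0                    # NF: number of false entries
    out = [0] * n
    for i in range(n):                                # one GPU thread per element
        t = i - f[i] + nf
        addr = f[i] if e[i] else t
        out[addr] = values[i]
    return out, nf

out, nf = split_by_bit([7, 0, 5, 3, 2, 6, 1], 2)
# out == [0, 3, 2, 1, 7, 5, 6], nf == 4, matching the table
```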

17
Processing the K-D Tree III

Traversal
The CUDA programming model does not support recursion or a stack.
Conditional statements are inefficient on the GPU: when threads of a warp take different branches, the branches are executed serially.
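One common workaround for the lack of recursion, not necessarily the one used in this work, is a stackless traversal: store the tree in depth-first (preorder) order and give each node a `skip` index pointing just past its subtree. Opening a node moves to index + 1 (its left child); accepting or finishing a node jumps to `skip`. The Python sketch below makes the same assumptions as a recursive version would: centroid centers, a hypothetical radius < theta * distance test, unit charges, and a leading-term far-field approximation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FlatNode:
    center: complex
    radius: float
    charge: float
    points: List[complex]
    is_leaf: bool
    skip: int = -1                     # index of the first node after this subtree

def flatten_kdtree(points, depth=0, leaf_size=2, out: Optional[List[FlatNode]] = None):
    """Build a k-d tree directly into a list in preorder (built on the host)."""
    if out is None:
        out = []
    center = sum(points) / len(points)
    node = FlatNode(center, max(abs(p - center) for p in points),
                    float(len(points)), points, len(points) <= leaf_size)
    out.append(node)
    if not node.is_leaf:
        key = (lambda z: z.real) if depth % 2 == 0 else (lambda z: z.imag)
        pts = sorted(points, key=key)
        mid = len(pts) // 2
        flatten_kdtree(pts[:mid], depth + 1, leaf_size, out)  # left child = index + 1
        flatten_kdtree(pts[mid:], depth + 1, leaf_size, out)
    node.skip = len(out)
    return out

def repulsion_stackless(v: complex, nodes: List[FlatNode], theta: float = 0.5) -> complex:
    force = 0j
    i = 0
    while i < len(nodes):              # no recursion, no stack
        n = nodes[i]
        if n.radius < theta * abs(v - n.center):      # far: approximate, skip subtree
            force += n.charge * (1 / (v - n.center)).conjugate()
            i = n.skip
        elif n.is_leaf:                               # near leaf: exact sum
            force += sum(((1 / (v - p)).conjugate() for p in n.points if p != v), 0j)
            i = n.skip
        else:                                         # open node: visit left child next
            i += 1
    return force

points = [complex(i % 7, i // 7) for i in range(49)]
nodes = flatten_kdtree(points)

def brute(v: complex) -> complex:
    return sum(((1 / (v - p)).conjugate() for p in points if p != v), 0j)
```

Each thread keeps only an index, which suits per-thread state on the GPU, though neighbouring threads can still diverge when they open different nodes.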

18
Results

Implementation
NVIDIA CUDA on a GeForce 8800 GTX card.
2.21 GHz AMD Athlon 64 processor.

19
Results II

GPU force calculation is 7 to 40 times faster than the CPU force calculation, depending on the size of the graph.
The GPU implementation spends 18-25% of its running time on data movement, compared with 2-3% for the CPU implementation.
Time for constructing the k-d tree is nearly the same in the CPU and GPU implementations for graphs with fewer than 50,000 vertices. For graphs with more than 50,000 vertices, k-d tree construction is more than 30% faster on the GPU.

20
Conclusions

The paper presents a manycore graph drawing algorithm.
Manycore chips accelerate graph drawing. In particular, the GeForce 8800 achieves speedups of 20 to 60 times.
Our algorithm produces high-quality graph layouts.
The high constant factor involved in working with local expansions in the FMM can be avoided in manycore implementations without compromising the quality of the layouts.