GPU Acceleration of Iterative Clustering

About This Presentation



Slides: 13

Provided by: johnc143


Transcript and Presenter's Notes


1
GPU Acceleration of Iterative Clustering

Jesse D. Hall
John C. Hart
University of Illinois, Urbana-Champaign

2
Iterative Clustering (aka k-means / Lloyd's / LBG Alg.)

Initialization: Pick (random) collection of cluster centers
Partitioning: Assign each point to nearest cluster center
Fitting: Find new center for each cluster
Cluster bunnies courtesy Nate Carr
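
The three steps on this slide can be sketched as a short loop. This is a minimal pure-Python illustration, not the presenters' code: the function names (`partition`, `fit`, `lloyd`), the Euclidean metric, the fixed iteration cap, and the policy of leaving an empty cluster's center where it was are all assumptions made for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def partition(points, centers):
    """Partitioning: give each point the index of its nearest center."""
    return [min(range(len(centers)),
                key=lambda j: squared_distance(p, centers[j]))
            for p in points]

def fit(points, labels, centers):
    """Fitting: move each center to the mean of the points assigned to it."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p, j in zip(points, labels):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    # Assumption: an empty cluster simply keeps its previous center.
    return [tuple(s / counts[j] for s in sums[j]) if counts[j]
            else tuple(centers[j])
            for j in range(len(centers))]

def lloyd(points, k, iterations=20, seed=0):
    """Initialization, then alternate fitting and partitioning until stable."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)        # Initialization: random data points
    labels = partition(points, centers)
    for _ in range(iterations):
        centers = fit(points, labels, centers)
        new_labels = partition(points, centers)
        if new_labels == labels:           # no point changed cluster
            break
        labels = new_labels
    return centers, labels
```

Calling `lloyd(points, 2)` on a list of coordinate tuples returns the final centers and one cluster index per point.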
3
Applications

Vector quantization
  Image compression (Gersho & Gray, VQ & Signal Compression, '92)
  Light field compression (Levoy & Hanrahan, S96)
  Texture synthesis (Wei & Levoy, S00)
Clustered principal component analysis
  Precomputed radiance transfer (Sloan, us & Snyder, S03)
Mesh clustering
  Multichart geometry images (Sander et al., SGP03)
  Variational Shape Approx. (Cohen-Steiner et al., S04)

Vector quantization of a light field (Levoy & Hanrahan, S96)
Vector quantization of radiance transfer
4
Clustering of
Orientation
Illumination
Position
5
CPU + GPU Approach

GPU: Partitioning
  Metric evaluation independent for each point
  Metric evaluation usually not data dependent
  Ideal for SIMD streaming implementation
CPU: Fitting
  Fitting inherently a reduction operation
  Relies on sophisticated data structures (e.g. kd-trees) and processes (e.g. matrix eigenstructure)

6
GPU Partitioning

Load cluster center and ID into fragment shader constants
Fragment shader evaluates metric on point data stored in (deep) texture
Distance stored in z-buffer if less than current z-value (and ID written to framebuffer)
After all clusters processed, framebuffer contains IDs of nearest cluster for each datapoint
Requires z-buffer readback, which will be faster with PCI-Express
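
The pass structure on this slide can be emulated on the CPU to make the data flow concrete. The sketch below is an illustration only: `depth` stands in for the z-buffer and `ids` for the framebuffer, both names are hypothetical, and the metric is assumed to be squared Euclidean distance. A real depth buffer stores a limited-range value, so distances would need scaling before being written; that detail is left out.

```python
import math

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def depth_test_partition(points, centers):
    """Emulate one rendering pass per cluster with a less-than depth test."""
    depth = [math.inf] * len(points)   # "z-buffer", cleared to the far value
    ids = [-1] * len(points)           # "framebuffer" holding cluster IDs
    for cluster_id, center in enumerate(centers):   # one pass per cluster
        for i, p in enumerate(points):              # one fragment per point
            d = squared_distance(p, center)         # metric evaluation
            if d < depth[i]:                        # depth test passes
                depth[i] = d                        # write distance
                ids[i] = cluster_id                 # write ID
    return ids, depth
```

After the last pass, `ids` holds the nearest cluster for every point and `depth` holds the corresponding distances, which is the part the slide says must be read back.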

7
Hierarchical GPU Partitioning
or

Organize cluster centers in a kd-tree
Point may need to backtrack several times to find best cluster
Nevertheless O(n log k) in practice
Requires lots of decisions per point, which would hinder GPU implementation
Instead send group (e.g. previous cluster) of datapoints simultaneously through kd-tree
Traverse based on group's bounding box
Keep track of individual point's closest distance and ID
Groups organized as rectangles in texture memory

8
Clustered Principal Components for Precomputed Radiance Transfer

Lloyd's algorithm on PRT datasets would take hours on John Snyder's PC; ran overnight
Can implement CPCA cluster metric in Cg
Final result is d^2
VDIM = # of 4-vectors needed to store a point
NPCA = # of PCA basis vectors
More complex than Euclidean distance, but same # of texture fetches
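
The slide does not spell out the CPCA metric. A common form, assumed here, is the squared distance from a point to a cluster's affine subspace: the squared length of the point's offset from the cluster mean, minus the squared projections onto the cluster's PCA basis vectors. The sketch below is in Python rather than Cg, assumes the basis vectors are orthonormal, and uses the hypothetical name `cpca_squared_distance`.

```python
def cpca_squared_distance(x, mean, basis):
    """Squared distance from x to the affine subspace mean + span(basis).

    Assumes the basis vectors are orthonormal.
    """
    r = [a - b for a, b in zip(x, mean)]          # offset from cluster mean
    total = sum(v * v for v in r)                 # plain squared Euclidean term
    for b in basis:                               # one term per PCA basis vector
        proj = sum(v * w for v, w in zip(r, b))   # projection onto this vector
        total -= proj * proj                      # remove the part the basis explains
    return max(total, 0.0)                        # clamp small negative round-off
```

With an empty basis this reduces to the squared Euclidean distance to the mean, so the extra cost over plain k-means is the loop over basis vectors.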

9
GPU + CPU vs. CPU
NVIDIA GeForce FX 5900 vs. AMD Athlon XP 2800+
16D points, 128 clusters
3x
10
16K points, 128 clusters
11
64K 16D points
12
Conclusions

GPU + CPU 3x faster than CPU alone
Slower CPU fared just as well => process is bandwidth limited
Scales well even with # of clusters, which corresponds to # of fragment program state changes => process dominated by floating point performance
GPU valuable for preprocessing
