GPU Acceleration of Iterative Clustering

1
GPU Acceleration of Iterative Clustering
  • Jesse D. Hall
  • John C. Hart
  • University of Illinois, Urbana-Champaign

2
Iterative Clustering (aka k-means / Lloyd's / LBG algorithm)
  • Initialization: Pick (random) collection of cluster centers
  • Partitioning: Assign each point to nearest cluster center
  • Fitting: Find new center for each cluster
Cluster bunnies courtesy Nate Carr
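
The three steps on this slide can be sketched as a short CPU-only loop. This is a minimal illustration assuming a Euclidean metric and centroid fitting; the names `sq_dist`, `partition`, `fit`, and `lloyd` are hypothetical and do not come from the presentation.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def partition(points, centers):
    """Partitioning: index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def fit(points, labels, centers):
    """Fitting: move each center to the mean of the points assigned to it."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p, j in zip(points, labels):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    new_centers = []
    for j, old in enumerate(centers):
        if counts[j] == 0:
            new_centers.append(old)  # assumption: an empty cluster keeps its center
        else:
            new_centers.append(tuple(s / counts[j] for s in sums[j]))
    return new_centers

def lloyd(points, k, max_iters=50, seed=0):
    """Alternate partitioning and fitting until the assignment stops changing."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)               # Initialization
    labels = None
    for _ in range(max_iters):
        new_labels = partition(points, centers)   # Partitioning
        if new_labels == labels:
            break
        labels = new_labels
        centers = fit(points, labels, centers)    # Fitting
    return centers, labels
```

For example, `lloyd([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)` separates the two groups of three points.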
3
Applications
  • Vector quantization
      • Image compression (Gersho & Gray, VQ & Signal Compression, '92)
      • Light field compression (Levoy & Hanrahan, S96)
      • Texture synthesis (Wei & Levoy, S00)
  • Clustered principal component analysis
      • Precomputed radiance transfer (Sloan, us & Snyder, S03)
  • Mesh clustering
      • Multichart geometry images (Sander et al., SGP03)
      • Variational Shape Approx. (Cohen-Steiner et al., S04)

Vector quantization of a light field (Levoy & Hanrahan, S96)
Vector quantization of radiance transfer
4
Clustering of
  • Orientation
  • Illumination
  • Position
5
CPU + GPU Approach
  • GPU: Partitioning
      • Metric evaluation is independent for each point
      • Metric evaluation is usually not data dependent
      • Ideal for a SIMD streaming implementation
  • CPU: Fitting
      • Fitting is inherently a reduction operation
      • Relies on sophisticated data structures (e.g. kd-trees)
        and processes (e.g. matrix eigenstructure)
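
To illustrate why fitting is a reduction, the sketch below computes centroids by merging per-chunk partial sums. It assumes the simplest fitting rule (Euclidean centroid), not the kd-tree or eigenstructure fitting the slide mentions; `partial_sums`, `merge`, and `fit_centroids` are hypothetical names, and `points` is assumed to be non-empty.

```python
from functools import reduce

def partial_sums(points, labels, k):
    """Per-cluster coordinate sums and counts for one chunk of points."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, j in zip(points, labels):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts

def merge(a, b):
    """Combine two partial results (associative up to floating-point rounding)."""
    (sums_a, counts_a), (sums_b, counts_b) = a, b
    sums = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(sums_a, sums_b)]
    counts = [x + y for x, y in zip(counts_a, counts_b)]
    return sums, counts

def fit_centroids(points, labels, k, chunk=2):
    """Centroid of each cluster = reduced sum / reduced count (None if empty)."""
    parts = [partial_sums(points[i:i + chunk], labels[i:i + chunk], k)
             for i in range(0, len(points), chunk)]
    sums, counts = reduce(merge, parts)
    return [tuple(s / n for s in row) if n else None
            for row, n in zip(sums, counts)]
```

Every output value depends on many input points, which is the opposite of the per-point independence that makes partitioning a good fit for the fragment shader.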

6
GPU Partitioning
  • Load cluster center and ID into fragment shader
    constants
  • Fragment shader evaluates metric on point data
    stored in (deep) texture
  • Distance stored in z-buffer if less than current
    z-value (and ID written to framebuffer)
  • After all clusters processed, framebuffer
    contains IDs of nearest cluster for each
    datapoint
  • Requires z-buffer readback which will be faster
    with PCI-Express
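
The depth-test scheme above can be emulated on the CPU to see how it works. This is only a sketch of the logic, not shader code: Python lists stand in for the z-buffer and framebuffer, the metric is assumed to be squared Euclidean distance, and `zbuffer_partition` is a hypothetical name. A real depth buffer holds values in a fixed range, so distances would need scaling; that is not modeled here.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def zbuffer_partition(points, centers):
    """Emulate one rendering pass per cluster with a 'less than' depth test."""
    zbuffer = [math.inf] * len(points)       # depth cleared to "far"
    framebuffer = [-1] * len(points)         # nearest cluster ID per data point
    for cluster_id, center in enumerate(centers):   # one pass per cluster
        for i, p in enumerate(points):              # one fragment per data point
            d = sq_dist(p, center)                  # metric evaluation
            if d < zbuffer[i]:                      # depth test passes
                zbuffer[i] = d                      # keep the smaller distance
                framebuffer[i] = cluster_id         # and the ID that produced it
    return framebuffer, zbuffer
```

After the last pass, `framebuffer[i]` holds the ID of the nearest cluster for point `i`, and `zbuffer[i]` holds the corresponding distance.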

7
Hierarchical GPU Partitioning
  • Organize cluster centers in a kd-tree
  • Point may need to backtrack several times to find
    best cluster
  • Nevertheless O(n log k) in practice
  • Requires lots of decisions per point, which would
    hinder GPU implementation
  • Instead send a group (e.g. previous cluster) of
    data points simultaneously through the kd-tree
  • Traverse based on the group's bounding box
  • Keep track of each individual point's closest distance
    and ID
  • Groups organized as rectangles in texture memory
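
One way to read the group traversal is sketched below on the CPU. It is an assumption-laden illustration, not the authors' implementation: the kd-tree stores one cluster center per node, the metric is squared Euclidean distance, and a subtree is skipped only when the group's bounding box is farther from it than every point's current best distance. All function names are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_kdtree(ids, centers, depth=0):
    """kd-tree over cluster centers; each node stores one center ID."""
    if not ids:
        return None
    axis = depth % len(centers[0])
    ids = sorted(ids, key=lambda j: centers[j][axis])
    mid = len(ids) // 2
    return {"id": ids[mid], "axis": axis,
            "left": build_kdtree(ids[:mid], centers, depth + 1),
            "right": build_kdtree(ids[mid + 1:], centers, depth + 1)}

def traverse_group(node, centers, group, best_d2, best_id):
    """Send a whole group of (index, point) pairs through the tree together."""
    if node is None or not group:
        return
    center, axis = centers[node["id"]], node["axis"]
    for i, p in group:                        # same center for the whole group
        d2 = sq_dist(p, center)
        if d2 < best_d2[i]:                   # per-point closest distance and ID
            best_d2[i], best_id[i] = d2, node["id"]
    lo = min(p[axis] for _, p in group)       # group's bounding box on this axis
    hi = max(p[axis] for _, p in group)
    split = center[axis]
    children = [(max(0.0, lo - split), node["left"]),    # gap from box to left side
                (max(0.0, split - hi), node["right"])]   # gap from box to right side
    children.sort(key=lambda c: c[0])         # visit the nearer side first
    for gap, child in children:
        worst = max(best_d2[i] for i, _ in group)
        if gap * gap < worst:                 # some point could still improve
            traverse_group(child, centers, group, best_d2, best_id)

def hierarchical_partition(points, centers, groups):
    """groups: lists of point indices, e.g. the previous iteration's clusters."""
    best_d2 = [math.inf] * len(points)
    best_id = [-1] * len(points)
    tree = build_kdtree(list(range(len(centers))), centers)
    for members in groups:
        group = [(i, points[i]) for i in members]
        traverse_group(tree, centers, group, best_d2, best_id)
    return best_id
```

The branching decisions are made once per group rather than once per point, so every point in a group evaluates the same metric against the same center. Tight groups, such as the points of one previous cluster, prune more subtrees than scattered ones.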

8
Clustered Principal Components for Precomputed
Radiance Transfer
  • Lloyd's algorithm on PRT datasets would take hours
    on John Snyder's PC; ran overnight
  • Can implement CPCA cluster metric in Cg
  • Final result is d²
  • VDIM = number of 4-vectors needed to store a point
  • NPCA = number of PCA basis vectors
  • More complex than Euclidean distance, but same
    number of texture fetches
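
The Cg code itself is not part of this transcript. As a stand-in, here is a Python sketch of one common form of a CPCA cluster metric, assuming d² is the squared residual left after projecting the centered point onto the cluster's orthonormal PCA basis: d² = ||x − μ||² − Σᵢ ((x − μ) · bᵢ)². The function name and argument layout are hypothetical.

```python
def cpca_d2(x, mean, basis):
    """Squared distance from x to the affine subspace mean + span(basis).

    Assumes the basis vectors are orthonormal.
    """
    r = [xi - mi for xi, mi in zip(x, mean)]          # centered point
    d2 = sum(ri * ri for ri in r)                     # ||x - mean||^2
    for b in basis:                                   # one term per PCA basis vector
        proj = sum(ri * bi for ri, bi in zip(r, b))   # coefficient along b
        d2 -= proj * proj                             # subtract the explained part
    return d2
```

With an empty basis this reduces to the squared Euclidean distance to the cluster mean. The extra work per cluster is arithmetic on the same point data, not additional reads of it.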

9
GPU + CPU vs. CPU
NVIDIA GeForce FX 5900 vs. AMD Athlon XP 2800+
16D points, 128 clusters
3x faster
10
16K points, 128 clusters
11
64K 16D points
12
Conclusions
  • GPU + CPU 3x faster than CPU alone
  • Slower CPU fared just as well → process is
    bandwidth limited
  • Scales well even with number of clusters, which
    corresponds to number of fragment program state
    changes → process dominated by floating point
    performance
  • GPU valuable for preprocessing