Title: GPGPU: Distance Fields
1. GPGPU: Distance Fields
- Avneesh Sud and Dinesh Manocha
- Feb 12, 2007
2. So Far
- Overview
- Intro to GPGPU using OpenGL
- Current Architecture (Cell, G80)
- Programming (CUDA, Compilers)
- Applications (Vision)
3. Interesting Reading on Parallel Computing
- The Landscape of Parallel Computing Research: A View from Berkeley
4. This Lecture
- Distance Fields and Voronoi Diagrams
- Hands-on demo
- Advanced Optimization
- Discussion: Why fast on a GPU?
5. This Lecture
- Distance Fields and Voronoi Diagrams
- Hands-on application demo
- Parallel algorithm
- Example code 2D
- Visual Debugging (imdebug)
- Example code 3D
- Advanced Optimization
- Discussion: Why fast on a GPU?
6. Outline
- Distance Fields and Voronoi Diagrams
- Hands-on application demo
- Advanced Optimization
- Discussion: Why fast on a GPU?
7. Distance Field
- Given a set of geometric primitives (sites), the distance field is a scalar field giving, at every point, the distance from that point to the closest site
[Figure: 2D distance field of a set of sites]
8. Generalized Voronoi Diagram
- Given a collection of sites, it is a subdivision of space into cells such that all points in a cell are closer to one site than to any other site
[Figure: Voronoi diagram of a set of sites, with a Voronoi site and its Voronoi cell labeled]
9. Voronoi Diagram and Distance Fields
- Voronoi region: the region where a site's distance function contributes to the final distance field
[Figure: distance field and the corresponding Voronoi diagram]
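For reference, the definitions on slides 7-9 in symbols. The notation is introduced here for illustration: sites $s_1, \dots, s_n$, point-to-site distance $\mathrm{dist}$, distance field $D$, and Voronoi cell $V_i$. Points on a cell boundary are equidistant from two or more sites and belong to each of their cells.

```latex
\begin{aligned}
\mathrm{dist}(\mathbf{x}, s_i) &= \min_{\mathbf{y} \in s_i} \lVert \mathbf{x} - \mathbf{y} \rVert \\
D(\mathbf{x}) &= \min_{i} \, \mathrm{dist}(\mathbf{x}, s_i) \\
V_i &= \{\, \mathbf{x} : \mathrm{dist}(\mathbf{x}, s_i) \le \mathrm{dist}(\mathbf{x}, s_j) \ \ \forall j \ne i \,\} \\
D(\mathbf{x}) &= \mathrm{dist}(\mathbf{x}, s_i) \quad \text{for } \mathbf{x} \in V_i
\end{aligned}
```

The last line is the relationship on slide 9: inside its Voronoi region, a site's distance function is the one that survives the minimum.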
10. Distance Functions 2D
- A scalar function f(x) representing the minimum distance from a point x to a site
- Graph: z = f(x, y)
- For a point site at the origin: $f(x, y) = \sqrt{x^2 + y^2}$
11. Distance Functions 3D
- Restricted to a plane (slice), the distance function of a site is a quadric
- Point site: circular paraboloid
- Line site: elliptic cone
- Plane site: plane
12. Why Should We Compute Them?
- Collision Detection, Proximity Queries
- Robot Motion Planning
- Surface Reconstruction
- Non-Photorealistic Rendering
- Surface Simplification
- Mesh Generation
- Shape Analysis
13-14. Why Difficult?
- Exact computation
- Compute analytic boundaries
- Boundaries are composed of high-degree curves and surfaces and their intersections
- Complex and difficult to implement
- Robustness and accuracy problems
[Figure: analytic boundary]
15. Approximate Computation
[Figure: taxonomy of approximate algorithms: discretize sites; discretize space; GPU]
16. Outline
- Distance Fields and Voronoi Diagrams
- Hands-on application demo
- Parallel algorithm
- Example code 2D
- Visual Debugging (imdebug)
- Demo 3D
- Advanced Optimization
- Discussion: Why fast on a GPU?
17. Brute-force Algorithm
- Record the ID of the closest site to each sample point
[Figure: coarse point-sampling result; finer point-sampling result]
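A minimal CPU sketch of the brute-force idea, assuming point sites and one sample point at each pixel center of a `width` x `height` grid. The function name `brute_force_voronoi` and the pixel-center convention are illustrative assumptions, not taken from the lecture's demo code.

```python
import math

def brute_force_voronoi(sites, width, height):
    """Return a height x width grid holding the index of the closest site
    to each sample point (pixel center). Ties go to the lower site index."""
    ids = []
    for j in range(height):
        row = []
        for i in range(width):
            x, y = i + 0.5, j + 0.5          # sample point at the pixel center
            best_id, best_d = -1, math.inf
            for k, (sx, sy) in enumerate(sites):
                d = math.hypot(x - sx, y - sy)
                if d < best_d:               # keep the closest site seen so far
                    best_id, best_d = k, d
            row.append(best_id)
        ids.append(row)
    return ids

sites = [(1.0, 1.0), (6.0, 2.0)]             # example point sites
ids = brute_force_voronoi(sites, 8, 3)       # 3 rows of 8 site IDs
```

The cost is one distance evaluation per (sample point, site) pair, so the finer sampling on the slide multiplies the work directly.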
18. Slight Variation
- Given sites and a uniform sampling
- For each site, compute distances to all sample points
- Composite through the minimum operator
- Record IDs of closest sites
19. GPU Algorithm
- Given sites and a frame buffer
- For each site, compute distances to all pixels
- Composite through the depth test
- Read back IDs of closest sites
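The GPU formulation is the same computation with the loops swapped: one pass per site, with the depth test acting as the minimum operator. Below is a CPU emulation under stated assumptions: the `depth` and `color` lists stand in for the depth buffer and an ID render target, the comparison mirrors OpenGL's default `GL_LESS` depth function, and `gpu_style_voronoi` is a hypothetical name. Real depth buffers typically hold values in [0, 1], which this sketch ignores.

```python
import math

def gpu_style_voronoi(sites, width, height):
    """Emulate the GPU algorithm on the CPU: one 'render pass' per site,
    composited into the frame buffer with a less-than depth test."""
    depth = [[math.inf] * width for _ in range(height)]  # cleared depth buffer
    color = [[-1] * width for _ in range(height)]        # site-ID buffer
    for site_id, (sx, sy) in enumerate(sites):
        # "Render" this site: evaluate its distance at every pixel center.
        for j in range(height):
            for i in range(width):
                d = math.hypot(i + 0.5 - sx, j + 0.5 - sy)
                if d < depth[j][i]:          # depth test (like GL_LESS)
                    depth[j][i] = d          # depth write
                    color[j][i] = site_id    # color write: the site's ID
    return color, depth                      # "read back" both buffers
```

Because a later site only overwrites a pixel when it is strictly closer, ties go to the site drawn first, so the IDs match the per-sample loop when sites are drawn in index order.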
20. GPU Algorithm 2D
[Figure: fragment program computing the distance from the pixel coord to the point coord (uniform parameter)]
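On this slide the fragment program receives the site's point coordinate as a uniform parameter and its own pixel coordinate as input. A Python stand-in for that per-fragment work, as a sketch: `point_site_fragment` and `max_dist` (a hypothetical scale factor, for example the grid diagonal) are assumptions, and scaling the distance into [0, 1] reflects the usual depth-buffer range rather than anything shown on the slide.

```python
import math

def point_site_fragment(pixel, site, site_id, max_dist):
    """Per-fragment work for one point site.

    pixel    -- this fragment's pixel-center coordinate (varies per fragment)
    site     -- point coordinate, a uniform parameter for the whole pass
    site_id  -- ID written to the color buffer, also uniform
    max_dist -- hypothetical scale that maps distances into [0, 1]
    Returns (depth, color), the fragment's two outputs.
    """
    d = math.hypot(pixel[0] - site[0], pixel[1] - site[1])
    return min(d / max_dist, 1.0), site_id

# 3-4-5 triangle: the distance is 5.0, scaled by max_dist = 10.0
depth, color = point_site_fragment((3.5, 4.5), (0.5, 0.5), 7, 10.0)
```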
21. GPU Algorithm 2D Source
- Initialization
  - Set up GL state (depth, render target)
  - Set up fragment program
- Fragment program
- Computation: for each point site
  - Set program parameters
  - Execute fragment program
- Display
  - Display results
22. GPU Algorithm 2D Source
- Show source
- Compile Cg source and show assembly
23. GPU Algorithm Debugging
- Visual debugging with imdebug (by Bill Baxter)
- http://www.billbaxter.com/projects/imdebug/index.html
- Steps
  - Modify fragment program
  - Read back and display buffer contents
24. GPU Algorithm Debugging
25. GPU Algorithm 2D
[Figure: fragment program for a line segment site: end-point coords (uniform parameters), pixel coord]
- Careful: the equation gives the distance to an infinite line
26. GPU Algorithm 2D
- Line segment: region closer to the interior of the line segment
- In the remaining region?
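One way to handle both regions, as a sketch: project the pixel onto the line through the end points; if the projection falls inside the segment, the infinite-line formula from slide 25 is valid, otherwise the closest end point gives the distance. The function name and the (x, y) tuple representation are illustrative, and the sketch assumes e != f.

```python
import math

def dist_to_segment(x, e, f):
    """Distance from point x to the 2D segment with end points e and f."""
    ex, ey = e
    dx, dy = f[0] - ex, f[1] - ey                    # segment direction f - e
    vx, vy = x[0] - ex, x[1] - ey                    # vector from e to x
    t = (vx * dx + vy * dy) / (dx * dx + dy * dy)    # projection parameter
    if 0.0 <= t <= 1.0:
        # Region closer to the segment interior: perpendicular distance,
        # i.e. the infinite-line formula is valid here.
        return abs(vx * dy - vy * dx) / math.hypot(dx, dy)
    # Remaining region: the closest point is one of the end points.
    if t < 0.0:
        return math.hypot(vx, vy)
    return math.hypot(x[0] - f[0], x[1] - f[1])

e, f = (0.0, 0.0), (4.0, 0.0)
d_interior = dist_to_segment((2.0, 3.0), e, f)   # perpendicular distance
d_beyond = dist_to_segment((7.0, 4.0), e, f)     # distance to end point f
```

For the point (7, 4), the infinite-line formula would return 4, while the true distance to the segment is 5; this is the pitfall the previous slide warns about.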
27. GPU Algorithm 2D Source
28. GPU Algorithm 3D
- Graphics hardware computes one 2D slice
- Sweep along the 3rd dimension (Z-axis), computing one slice at a time
[Figure: 3D Voronoi diagram]
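A CPU sketch of the slice-by-slice sweep for point sites in 3D, assuming one voxel center at each (i + 0.5, j + 0.5, k + 0.5). Each slice is computed independently, the way one 2D pass would be on the GPU. It compares squared distances, which select the same closest site. The function name is hypothetical.

```python
import math

def voronoi_3d_by_slices(sites, nx, ny, nz):
    """Discrete 3D Voronoi diagram computed one z-slice at a time.
    sites are (x, y, z) points; returns nz slices of ny x nx site IDs."""
    volume = []
    for k in range(nz):                       # sweep along the z-axis
        z = k + 0.5                           # z of this slice's voxel centers
        depth = [[math.inf] * nx for _ in range(ny)]
        ids = [[-1] * nx for _ in range(ny)]
        for site_id, (sx, sy, sz) in enumerate(sites):
            dz2 = (z - sz) ** 2               # constant over the whole slice
            for j in range(ny):
                for i in range(nx):
                    d2 = (i + 0.5 - sx) ** 2 + (j + 0.5 - sy) ** 2 + dz2
                    if d2 < depth[j][i]:      # depth-test style minimum
                        depth[j][i] = d2
                        ids[j][i] = site_id
        volume.append(ids)
    return volume
```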
30. GPU Optimizations
- Where to optimize?
- Make fragment program run faster
- GPU / Application dependent optimizations
- Reduce memory bandwidth
- Reduce number of invocations of fragment program
- Geometric culling
31. GPU Optimizations: Recommended Reading
- Practical Performance Analysis and Tuning
- GPU Programming Guide
- GPU Gems 2
- GPU Computation Strategies and Tips (Ian Buck)
- GPU Program Optimization (Cliff Woolley)
33. Optimization: Fragment Program
- Reduce number of instructions!
- Do we need dist(x, p) or dist²(x, p)?
- Advantage of dist²: dist() requires an additional reciprocal sqrt
- Show code demo
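Why dist² is enough: squaring is monotonically increasing for non-negative values, so both versions pick the same closest site, and only the dist() version pays for a square root. A small check in Python; the function names are illustrative.

```python
import math

def closest_by_dist(x, sites):
    """Index of the closest site using Euclidean distance (needs a sqrt)."""
    return min(range(len(sites)),
               key=lambda k: math.sqrt(sum((a - b) ** 2 for a, b in zip(x, sites[k]))))

def closest_by_dist2(x, sites):
    """Same query with squared distance: no sqrt, same ordering."""
    return min(range(len(sites)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, sites[k])))
```

If the distance values themselves are needed, one square root per pixel can be applied after the minimum is found, since the square root of the minimum squared distance equals the minimum distance.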
34-35. Optimization: Fragment Program
- Do we need to evaluate (x - p) in the fragment program?
- Recall the Rasterization/G80 lectures: GPUs have VERY FAST dedicated hardware for linear interpolation (lerp)
- Lerp color, textures, normals across triangle vertices
36. GPU Linear Interpolation
37. Optimization: Fragment Program
- Evaluate (x - p) at polygon vertices and use dedicated hardware to lerp at each pixel!
- What about line / triangle sites?
- Can be linearly interpolated too!
- More details later
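A small numeric check of why (x - p) can be left to the interpolation hardware but the distance cannot, using barycentric weights as a stand-in for the rasterizer's interpolation (perspective correction ignored). The site, triangle, and weights below are made-up example values.

```python
import math

def interpolate(values, weights):
    """Blend per-vertex attribute tuples with barycentric weights (summing
    to 1), as the rasterizer does for a pixel inside a triangle."""
    return tuple(sum(w * v[c] for w, v in zip(weights, values))
                 for c in range(len(values[0])))

p = (2.0, 1.0)                                     # point site (example)
tri = [(0.0, 0.0), (8.0, 0.0), (0.0, 8.0)]         # polygon covering pixels
vecs = [(vx - p[0], vy - p[1]) for vx, vy in tri]  # x - p at each vertex
weights = (0.25, 0.5, 0.25)                        # one pixel in the triangle

x = interpolate(tri, weights)                      # pixel position (4, 2)
v = interpolate(vecs, weights)                     # interpolated x - p
d_fragment = math.hypot(*v)                        # norm per fragment: exact
d_lerped = interpolate([(math.hypot(*u),) for u in vecs], weights)[0]  # wrong
```

Because x - p is affine in x, interpolating it is exact; the distance is not, and interpolating the per-vertex distances gives about 5.42 here instead of the true value of about 2.24.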
39. Optimization: Memory Bandwidth
- Reduce number of texture lookups and framebuffer writes
- Pack data into fewer channels
- Is bandwidth limited?
40. Optimization: Memory Bandwidth
- Reduce number of texture lookups and framebuffer writes
- Pack data into fewer channels
- How?
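41. Optimization: Memory Bandwidth
- Pack data into fewer channels
- Using an fp32 render target
- 32 bits per channel: up to 2^32 (about 4 billion) site IDs if the raw bit pattern is stored; IDs written as float values are exact only up to 2^24 (16,777,216)
- We can use only 1 channel (red) for writing the site ID instead of 4 channels (RGBA)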
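A quick check of the channel trade-off, using Python's `struct` module to mimic one 32-bit float channel versus four 8-bit channels. It illustrates that IDs written as float values are exact only up to 2^24, while four 8-bit channels cover all 2^32 values. The helper names are illustrative; how a given shader actually writes the ID is an assumption, not something the slides show.

```python
import struct

def store_in_fp32(site_id):
    """Write an integer ID as the *value* of one fp32 channel, read it back."""
    return struct.unpack("<f", struct.pack("<f", float(site_id)))[0]

def pack_rgba8(site_id):
    """Split a 32-bit ID across four 8-bit channels (R, G, B, A)."""
    return tuple((site_id >> shift) & 0xFF for shift in (0, 8, 16, 24))

def unpack_rgba8(rgba):
    """Reassemble the ID from the four 8-bit channels."""
    r, g, b, a = rgba
    return r | (g << 8) | (b << 16) | (a << 24)

exact = store_in_fp32(2**24)          # 16777216 survives unchanged
rounded = store_in_fp32(2**24 + 1)    # rounds to 16777216.0
```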
43. Linear Factorization
- Distance vector field: gives the vector from a point in 3D to the closest point on a site
[Figure: distance vectors from points to a line site]
44. Linear Factorization
- Distance functions are non-linear (quadric)
- Distance vectors can be factored into linear terms
- Linearly interpolated along each axis
45. Linear Factorization 2D
- Distance vectors are linearly interpolated
[Figure: line segment site with end points e and f]
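A sketch of the 2D factorization for a line segment site with end points e and f: in the region where the closest point lies inside the segment, the distance vector is an affine function of the pixel position, so linearly interpolating it between two pixels reproduces the exact vector at every pixel in between. The coordinates are example values and the function names are illustrative.

```python
def distance_vector(x, e, f):
    """Vector from point x to its closest point on the line through e and f.
    Affine in x, so it can be linearly interpolated."""
    dx, dy = f[0] - e[0], f[1] - e[1]
    t = ((x[0] - e[0]) * dx + (x[1] - e[1]) * dy) / (dx * dx + dy * dy)
    return (e[0] + t * dx - x[0], e[1] + t * dy - x[1])

def lerp(a, b, s):
    """Linear interpolation between tuples a and b, s in [0, 1]."""
    return tuple((1 - s) * ai + s * bi for ai, bi in zip(a, b))

e, f = (0.0, 0.0), (4.0, 2.0)           # segment site (example values)
x0, x1 = (1.0, 3.0), (2.0, 4.0)         # both project inside the segment
v0, v1 = distance_vector(x0, e, f), distance_vector(x1, e, f)

# Largest difference between the exact vector and the interpolated one.
max_err = max(
    max(abs(a - b) for a, b in zip(distance_vector(lerp(x0, x1, s), e, f),
                                   lerp(v0, v1, s)))
    for s in (0.0, 0.25, 0.5, 0.75, 1.0))
```

The same does not hold for the scalar distance, which is why the norm is taken only after interpolation.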
46-52. Linear Factorization 3D
- Distance vectors are bi-linearly interpolated
[Figure: bi-linear interpolation of distance vectors on a slice; labeled points a, b, c, e, f, p]
53. Linear Factorization 3D
- Distance vectors are bi-linearly interpolated
- Distance vector in the interior of a convex polygon: convex combination of the distance vectors at its vertices
- Convex polygon obtained from the geometry of the site
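Why the convex combination is exact, assuming the distance vector $\mathbf{d}(\mathbf{x})$ (from $\mathbf{x}$ to its closest point on the site) is an affine function $A\mathbf{x} + \mathbf{b}$ over the polygon. This holds for a point, line, or plane site while the closest point stays on the same feature. The symbols $\mathbf{v}_k$, $\lambda_k$, $A$, $\mathbf{b}$, $\mathbf{e}$, $\mathbf{u}$ are introduced here for illustration.

```latex
\begin{aligned}
\mathbf{x} &= \sum_k \lambda_k \mathbf{v}_k, \qquad \lambda_k \ge 0,\ \ \sum_k \lambda_k = 1 \\
\mathbf{d}(\mathbf{x}) &= A\mathbf{x} + \mathbf{b}
  = \sum_k \lambda_k \left( A\mathbf{v}_k + \mathbf{b} \right)
  = \sum_k \lambda_k \, \mathbf{d}(\mathbf{v}_k) \\
\text{point site } \mathbf{p}: &\quad \mathbf{d}(\mathbf{x}) = \mathbf{p} - \mathbf{x} \\
\text{line through } \mathbf{e} \text{ with unit direction } \mathbf{u}: &\quad
  \mathbf{d}(\mathbf{x}) = \left( \mathbf{u}\mathbf{u}^{\mathsf T} - I \right)(\mathbf{x} - \mathbf{e})
\end{aligned}
```

The middle step uses $\sum_k \lambda_k = 1$ to distribute $\mathbf{b}$. The distance itself, $\lVert \mathbf{d}(\mathbf{x}) \rVert$, is not affine, so the norm is computed per fragment after interpolation.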
54. Domain Computation
- Compute a convex polytope bounding the Voronoi region
  - Non-manifold sites
  - Manifold sites
- Intersect the polytope with each slice to get a convex polygonal domain
- Clamping: restricting the computation to this domain reduces fill
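A sketch of the "intersect the polytope with each slice" step, assuming the bounding polytope is given as a vertex list plus edge index pairs and the slice is the plane Z = z. Because the polytope is convex, the points where its edges cross the plane form a convex polygon, which can be ordered by angle around their centroid. The representation and function name are hypothetical, and slices passing exactly through a face of the polytope are not handled carefully here.

```python
import math

def slice_convex_polytope(vertices, edges, z):
    """Intersect a convex polytope (vertex list + edge index pairs) with the
    plane Z = z. Returns the polygon's (x, y) vertices in counter-clockwise order."""
    pts = []
    for a, b in edges:
        (x0, y0, z0), (x1, y1, z1) = vertices[a], vertices[b]
        if (z0 - z) * (z1 - z) > 0 or z0 == z1:
            continue                      # edge misses the plane or is parallel to it
        t = (z - z0) / (z1 - z0)          # where the edge crosses the plane
        pts.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    pts = list(dict.fromkeys(pts))        # drop duplicates at shared vertices
    if len(pts) < 3:
        return pts                        # empty or degenerate slice
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    return sorted(pts, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
```

Only the pixels inside the returned polygon need to run the fragment program for that site and slice, which is where the fill savings come from.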
55. Domain Computation: Non-Manifold Point
[Figure: point site, its bounding polygon, and the slice]
56. Domain Computation: Line Segment
[Figure: line segment site, its bounding polygon, and the slice]
57. Domain Computation: Triangle
[Figure: triangle site, its bounding prism, and the resulting polygon on the slice]
58. Domain Computation: Manifold Edge
- Triangular prisms given by incident triangles
[Figure: edge site, its bounding prism, and the resulting polygon on the slice]
59. Domain Computation: Manifold Vertex
- Cone given by half-plane intersections
- Compute a bounding right circular cone
- Valid for hyperbolic points
[Figure: bounding cone of a vertex site and its slice]
60. GPU Based Algorithm
[Figure: pipeline on a Pentium IV 3.4 GHz CPU and an NVIDIA GeForce 7800 GTX GPU. Stages: compute bounding polygon (CPU), compute distance vectors (vertex processor), bi-linear interpolation (rasterizer), compute norm (fragment processor), compute min (raster ops). Texture and GPU memory are also shown. Throughput labels in the figure: 25.6 GFLOPS, 35 GFLOPS, 280 GFLOPS, 1.3 TFLOPS.]
62. Distance Field Timings
[Figure: distance field computation times on a log scale (0.1 to 100) for DiFi (GPU), HAVOC, and CSC (CPU), with speedup labels 240x and 313x]
Disclaimer: non-optimized CPU code (no SSE); includes geometric culling
63. Efficiency: Parallelism
- Brute force: insanely parallelizable
- All fragment processors (cores) are utilized
64. Efficiency: Dedicated H/W
- Graphics hardware efficiently performs bilinear interpolation of vertex attributes
  - Texture coordinates
  - Color
  - Normals (Phong shading)
- Fast depth test: atomic compare-and-set (32-bit FP)
65. Efficiency: Bandwidth
- GPU: high bandwidth to the framebuffer
- CPU: the set of sites and the grid do NOT fit in the L2 cache