Title: Photon Mapping on Programmable Graphics Hardware
1Photon Mapping on Programmable Graphics Hardware
- Timothy J. Purcell
- Mike Cammarano
- Pat Hanrahan
- Stanford University
- Craig Donner
- Henrik Wann Jensen
- University of California, San Diego
3Motivation
- Interactive global illumination on the GPU
- Nearly have sufficient compute power and flexibility
- Explore GPU-based computation algorithms
4Related Work
- CPU-based interactive global illumination
- Supercomputers [Parker et al.]
- Clusters [Tole et al., Wald et al.]
- Global illumination on programmable GPUs
- Ray tracing [Carr et al., Purcell et al.]
- Photon mapping [Ma et al.]
- Radiosity [Carr et al., Coombe et al.]
- Translucency [Carr et al., Stamminger et al.]
5Photon Mapping Algorithm Review
- Photon tracing
- Emission, scattering, storing into kd-tree
- Similar to ray tracing
- Rendering
- Ray tracing for direct illumination
- Photon map visualization
- Indirect bounce
6Computational Challenge for GPUs 1
- Constructing an irregular or sparse data structure
7Computational Challenge for GPUs 2
- Adaptive nearest neighbor search
- Noise vs. blur
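For reference, the noise-versus-blur trade-off can be read off the standard photon map radiance estimate. The formula below is the usual density estimate from the photon mapping literature, stated here as background rather than taken from the slides; r is the radius of the sphere that contains the k photons used.

```latex
L_r(x, \omega) \approx \frac{1}{\pi r^2} \sum_{p=1}^{k} f_r(x, \omega_p, \omega)\, \Delta\Phi_p(x, \omega_p)
```

With a small r the sum covers few photons and the estimate is noisy; with a large r it averages over a wide area and blurs illumination features. An adaptive search picks r per sample point instead of fixing it globally.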
9Photon Mapping on the CPU
- Balanced kd-tree
- Compact storage of photons
- Efficient
- O(log n) search
- Priority queue
- Nearest neighbor search
- Incremental insertion and removal of photons
10Algorithmic Changes for the GPU
- Direct visualization of photon map
- Keeps rendering costs low
- Use grid instead of kd-tree
- Tried kd-tree
- Kd-tree construction is difficult
- Radiance estimate
- Fixed radius search works fine
- Adaptive search needs priority queue
- No priority queue
- Can't build on GPU
- Too much state
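As an illustration of why a uniform grid is simpler to build than a kd-tree, the cell a photon belongs to can be computed from its position alone. The sketch below is a minimal Python version under assumed conventions (a cubic cell size, clamping at the grid boundary, x-fastest flattening); the name `cell_index` and its parameters are hypothetical and not from the talk.

```python
import math

def cell_index(pos, grid_min, cell_size, dims):
    """Map a 3D photon position to a flattened grid cell index.

    Assumed layout: x varies fastest, then y, then z.
    """
    coords = []
    for p, lo, n in zip(pos, grid_min, dims):
        c = int(math.floor((p - lo) / cell_size))
        coords.append(min(max(c, 0), n - 1))  # clamp photons on the boundary
    x, y, z = coords
    nx, ny, _ = dims
    return x + nx * (y + ny * z)

# Example: a 4x4x4 grid of unit cells starting at the origin.
idx = cell_index((1.5, 2.5, 3.5), (0.0, 0.0, 0.0), 1.0, (4, 4, 4))
```

The resulting index is what serves as the sort key when photons are later sorted into grid cells.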
11Contributions
- Mapped complete grid-based photon mapping algorithm onto the GPU
- Including photon tracing, ray tracing, etc.
- Implemented an adaptive k-nearest neighbor search
- kNN-grid
- Show how to construct a sparse data structure on the GPU
- Bitonic merge sort with binary search
- Stencil routing
12Configuring the GPU for Computing
- GPU as data parallel compute engine
- Fragment programs execute compute kernels
- Screen sized quad initializes computation
- SIMD execution
- Floating point texture memory
- Render-to-texture for intermediate results
- Data structure storage
- Pointer dereferencing via dependent fetches
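The compute model above can be mimicked on the CPU to make the terms concrete. This is a loose Python analogy, not GPU code: `run_kernel` stands in for drawing a screen-sized quad, a nested list stands in for a texture, and `deref_kernel` shows a dependent fetch, where the result of one texture read is used as the address of the next. All names are illustrative.

```python
def run_kernel(width, height, kernel, *textures):
    """Run `kernel` once per output pixel, like a fragment program over a quad."""
    return [[kernel(x, y, *textures) for x in range(width)]
            for y in range(height)]

def deref_kernel(x, y, pointer_tex, data_tex):
    """Pointer dereference via a dependent fetch."""
    px, py = pointer_tex[y][x]   # first fetch: read an address
    return data_tex[py][px]      # dependent fetch: read the data it points to

pointer_tex = [[(1, 1), (0, 0)],
               [(0, 1), (1, 0)]]
data_tex = [[10, 20],
            [30, 40]]
result = run_kernel(2, 2, deref_kernel, pointer_tex, data_tex)
```

Each output pixel can read from anywhere (gather) but writes only to its own location, which is the restriction the later slides work around.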
13Computational Challenge 1
- Building a Sparse Data Structure
14Building a Sparse Data Structure
- Requires scatter
- Dependent texture write
- Why don't we have fragment scatter?
- Fragment processing has highly coherent blocked memory writes
- Extra hardware support would be needed
- Write hazards
- Memory latencies
15Scatter on the GPU
- Sort photons into grid cells
- Grid cell is sort key
- Simulate scatter with fragment programs
- Bitonic merge sort followed by binary search
- Compact grid
- O(log² n) rendering passes
16Bitonic Merge Sort
[Figure: bitonic merge sort network sorting the values 1 to 8, shown pass by pass]
O(log² n) rendering passes
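A minimal Python sketch of the pass structure, assuming a power-of-two list of integer sort keys (in this setting, grid cell indices). The function name `bitonic_sort_passes` is illustrative. Each pass is written in gather form: every element reads itself and one partner and keeps either the smaller or the larger value, so no element ever writes to another element's location.

```python
def bitonic_sort_passes(keys):
    """Sort `keys` with a bitonic network; return (sorted list, pass count)."""
    n = len(keys)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    data = list(keys)
    passes = 0
    k = 2
    while k <= n:                  # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:              # compare distance within this merge stage
            out = [None] * n
            for i in range(n):     # one "fragment" per element
                partner = i ^ j
                ascending = (i & k) == 0
                lo = min(data[i], data[partner])
                hi = max(data[i], data[partner])
                keep_min = (i < partner) == ascending
                out[i] = lo if keep_min else hi
            data = out             # ping-pong: output becomes next input
            passes += 1
            j //= 2
        k *= 2
    return data, passes

sorted_keys, num_passes = bitonic_sort_passes([3, 7, 4, 8, 6, 2, 1, 5])
```

For n = 8 the loop runs 6 passes, which is log n (log n + 1) / 2 and matches the O(log² n) bound on the slide.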
17Binary Search
- Grid cell searches for self in photon list
- If none, find first element in next cell
- Empty grid cells waste compute
- log(n) + 1 steps
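The search described above can be sketched in Python as a fixed-length lower-bound search, assuming the photon list is already sorted by cell index. The name `first_photon_index` and the example cell list are hypothetical. Every grid cell runs the same number of steps, which suits lock-step execution, and an empty cell ends up pointing at the first photon of the next occupied cell.

```python
def first_photon_index(sorted_cells, cell):
    """Index of the first photon whose cell index is >= `cell`."""
    lo, hi = 0, len(sorted_cells)
    steps = len(sorted_cells).bit_length()  # log2(n) + 1 for power-of-two n
    for _ in range(steps):                  # fixed step count for every cell
        if lo < hi:
            mid = (lo + hi) // 2
            if sorted_cells[mid] < cell:
                lo = mid + 1
            else:
                hi = mid
    return lo

# Photon list sorted by cell index; cells 1, 3, 4 and 6 are empty.
sorted_cells = [0, 0, 0, 2, 2, 2, 5, 5]
cell_start = [first_photon_index(sorted_cells, c) for c in range(7)]
```

The photon count of a cell is then the difference between its start index and the start index of the following cell, so empty cells cost a full search but contribute zero photons.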
22Binary Search
- Grid cell searches for self in photon list
- If none, find first element in next cell
- Empty grid cells waste compute
- log(n) + 1 steps
[Figure: searching the sorted photon list for the first v5 photon; panels show the search state at initialize and after steps 1 to 4]
24Scatter on the GPU
- Vertex programs can scatter
- Draw point to buffer
- Collisions?
- Stencil routing
- Limit photon count per grid cell
- Pre-allocate grid cell space
- Draw photons as points
- Vertex program computes grid cell
- Stencil buffer controls location within cell
- Single rendering pass
25Stencil Routing
- Fix each grid cell size to n² pixels
- Draw fat points to cover each fat cell
- glPointSize(n)
[Figure: Vertex(photon_pos) passes through the vertex program and lands on a 4-pixel cell of the flattened grid]
26Stencil Routing
- Control location written to with stencil
- Pass when stencil is n² - 1
- Stencil always increments
- Location written depends on draw order
[Figure: Vertex(photon_pos) passes through the vertex program to a 4-pixel cell of the flattened grid; each cell's stencil values start at 0, 1, 2, 3 and become 1, 2, 3, 4 after a photon is drawn into that cell]
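The routing rule on this slide can be simulated on the CPU. The following Python sketch assumes each cell's n² stencil values are initialized to 0 through n² - 1, as the figure suggests, and that the stencil increments whether or not the test passes; `stencil_route` and its arguments are illustrative names, and the real mechanism runs in the GPU's stencil unit rather than in a loop.

```python
def stencil_route(photon_cells, num_cells, n=2):
    """Simulate stencil routing for photons drawn in list order.

    photon_cells[i] is the grid cell of photon i. Returns, per cell,
    the photon id stored in each of its n*n pixels (None if empty).
    """
    slots = n * n
    stencil = [list(range(slots)) for _ in range(num_cells)]
    stored = [[None] * slots for _ in range(num_cells)]
    for photon_id, cell in enumerate(photon_cells):
        # A fat point of size n touches every pixel of the target cell.
        for pixel in range(slots):
            if stencil[cell][pixel] == slots - 1:   # stencil test passes
                stored[cell][pixel] = photon_id
            stencil[cell][pixel] += 1               # always increment
    return stored

# Five photons land in cell 0, one in cell 1; each cell holds at most 4.
routed = stencil_route([0, 1, 0, 0, 0, 0], num_cells=2)
```

Exactly one pixel per cell passes the test for each photon drawn, so successive photons fill successive pixels. The fifth photon sent to cell 0 finds no pixel at n² - 1 and is dropped, which is the per-cell photon limit mentioned earlier.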
27Computational Challenge 2
- Adaptive Nearest Neighbor Search
28Adaptive Nearest Neighbor Search
- Iterative algorithm
- Accept or reject photons in cell visit order
29kNN-grid Algorithm
Want a 4 photon estimate
30kNN-grid Algorithm
- Candidate photons must be within max search radius
- Visit voxels in order of distance to sample point
Want a 4 photon estimate
31kNN-grid Algorithm
- If current number of photons in estimate is less than number requested, grow search radius
[Figure: 1 photon in estimate]
Want a 4 photon estimate
32kNN-grid Algorithm
- If current number of photons in estimate is less than number requested, grow search radius
[Figure: 2 photons in estimate]
Want a 4 photon estimate
33kNN-grid Algorithm
- Don't add photons outside maximum search radius
- Don't grow search radius when photon is outside maximum radius
[Figure: 2 photons in estimate]
Want a 4 photon estimate
34kNN-grid Algorithm
- Add photons within search radius
[Figure: 3 photons in estimate]
Want a 4 photon estimate
35kNN-grid Algorithm
- Add photons within search radius
[Figure: 4 photons in estimate]
Want a 4 photon estimate
36kNN-grid Algorithm
- Don't expand search radius if enough photons already found
[Figure: 4 photons in estimate]
Want a 4 photon estimate
37kNN-grid Algorithm
- Add photons within search radius
[Figure: 5 photons in estimate]
Want a 4 photon estimate
38kNN-grid Algorithm
- Visit all other voxels accessible within determined search radius
- Add photons within search radius
[Figure: 6 photons in estimate]
Want a 4 photon estimate
39kNN-grid Algorithm
- Finds all photons within a sphere centered about sample point
- May locate more than requested k-nearest neighbors
[Figure: 6 photons in estimate]
Want a 4 photon estimate
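The walkthrough above can be condensed into a short CPU sketch. This Python version is one reading of the slides, not the authors' fragment program: it uses a 2D grid stored as a dictionary from integer cell coordinates to photon positions, and the names `knn_grid` and `cell_distance` are hypothetical. It requires Python 3.8 or later for `math.dist`.

```python
import math

def cell_distance(sample, cell, size):
    """Distance from the sample point to the nearest point of a grid cell."""
    d2 = 0.0
    for s, c in zip(sample, cell):
        lo, hi = c * size, (c + 1) * size
        delta = max(lo - s, 0.0, s - hi)
        d2 += delta * delta
    return math.sqrt(d2)

def knn_grid(sample, grid, cell_size, k, max_radius):
    """Return (photons in estimate, final search radius)."""
    # Visit voxels in order of distance to the sample point.
    order = sorted(grid, key=lambda c: cell_distance(sample, c, cell_size))
    found, radius = [], 0.0
    for cell in order:
        reach = radius if len(found) >= k else max_radius
        if cell_distance(sample, cell, cell_size) > reach:
            break                        # no remaining voxel can contribute
        for p in grid[cell]:
            d = math.dist(sample, p)
            if d > max_radius:
                continue                 # never accept, never grow
            if len(found) < k:
                found.append(p)          # still short: accept and grow radius
                radius = max(radius, d)
            elif d <= radius:
                found.append(p)          # enough found: radius stays fixed

    return found, radius

grid = {
    (0, 0): [(0.6, 0.5), (0.9, 0.9)],
    (1, 0): [(1.1, 0.5), (1.9, 0.5)],
    (0, 1): [(0.5, 1.2), (0.5, 1.05)],
    (3, 3): [(3.5, 3.5)],
}
photons, r = knn_grid((0.5, 0.5), grid, cell_size=1.0, k=3, max_radius=1.0)
```

In this example three photons are requested but four are returned, because the fourth lies inside the radius fixed by the third. That is the "may locate more than requested" behavior: the result is every visited photon inside the final sphere, not an exact k-nearest set.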
40System Implementation
- NVIDIA GeForce FX 5900 Ultra (NV35)
- Cg compiler 1.1
[Figure: system flow. Compute Lighting: Trace Photons, Build Photon Map. Render Image: Ray Trace Scene, Compute Radiance Estimate.]
41Demos
42Glass Ball Bitonic Sort
18s @ 512x384, 5K photons
43Glass Ball Stencil Routing
11s @ 512x384, 5K photons
44Ring Bitonic Sort
9s @ 512x384, 16K photons
45Ring Stencil Routing
8s @ 512x384, 16K photons
46Cornell Box Bitonic Sort
64s @ 512x512, 65K photons
47Cornell Box Stencil Routing
47s @ 512x512, 65K photons
48Cornell Box Increased Search Radius
49Open Issues (1)
- How to prevent program execution over a subset of pixels?
- Non-uniform pixel computation distribution
- Radiance estimate
- KILL is only a write mask
- Early-z occlusion culling
- No pixel level control
- Compute mask, branching, or stream buffer?
- Improve radiance estimate speed by 30-70% over tiling
50Open Issues (2)
- Scatter
- Makes (a programmer's) life easier
- Is it worth implementing?
- Gain factor of log² n avoiding sort
51Future Work
- Kd-trees
- Photon power redistribution
- Adaptive sampling
- Progressive refinement
52Conclusions
- The GPU can compute an entire global illumination solution
- Nearly interactive
- Implemented an adaptive k-nearest neighbor query for the GPU
- kNN-grid
- Shown how to construct sparse data structures on the GPU
- Bitonic merge sort and binary search
- Stencil routing
- Sorting and searching algorithms applicable to other computations
53Acknowledgments
- Stanford FlashG
- Ian Buck, Mike Houston, Kekoa Proudfoot
- Stencil routing
- Kurt Akeley, Matt Papakipos
- Hardware and drivers
- David Kirk, Nick Triantos
- Funding
- NVIDIA, DARPA, NSF, 3Com