Title: Clustered Principal Components for Precomputed Radiance Transfer
1 Clustered Principal Components for Precomputed Radiance Transfer
- Peter-Pike Sloan
- Microsoft Corporation
- Jesse Hall, John Hart
- UIUC
- John Snyder
- Microsoft Research
2 Demo
3 PRT Terminology
4 PRT Terminology
5 PRT Terminology
6 PRT Terminology
7 PRT as a Linear Operator
- l: light vector (in source basis)
- M_p: source-to-exit transfer matrix
- e_p: exit radiance vector (in exit basis)
- y(v_p): exit basis evaluated in direction v_p
- e_p(v_p): exit radiance in direction v_p
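A minimal sketch of how these quantities fit together, assuming the usual reading e_p = M_p l and e_p(v_p) = y(v_p) · e_p. The sizes (25 coefficients on each side, matching the 25x25 matrices quoted later in the deck) and the random placeholder values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_light, n_exit = 25, 25                      # assumed basis sizes

l = rng.standard_normal(n_light)              # light vector in the source basis
M_p = rng.standard_normal((n_exit, n_light))  # transfer matrix at surface point p (placeholder)
y_vp = rng.standard_normal(n_exit)            # exit basis evaluated at v_p (placeholder)

e_p = M_p @ l            # exit radiance vector, expressed in the exit basis
radiance = y_vp @ e_p    # scalar exit radiance in direction v_p
```

Because everything is linear, the same scalar can also be computed as (y(v_p)^T M_p) l, which is what makes the special cases on the following slides possible.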
8 PRT Special Case: Diffuse Objects
transfer vector rather than matrix
- independent of view (constant exit basis)
- matrix is row vector
- previous work uses different light bases
  - PRT02 (SH), Xi03 (Directional), Ng03 (Haar), Ashikhmin02 (Steerable)
- image relighting
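For the diffuse case the per-point operator collapses to a row vector, so shading a vertex is one dot product per color channel. The sketch below assumes a monochrome transfer vector and RGB light coefficients; the array names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_vertices, n_coeffs = 4, 25

T = rng.standard_normal((n_vertices, n_coeffs))  # one transfer (row) vector per vertex
L_rgb = rng.standard_normal((n_coeffs, 3))       # light coefficients, one column per channel

# One dot product per vertex and channel; no view dependence.
colors = T @ L_rgb                               # shape (n_vertices, 3)
```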
9 PRT Special Case: Surface Light Fields
transfer vector rather than matrix
Miller98, Nishino99, Wood00, Chen02, Matusik02
- frozen lighting environment
- matrix is column vector
10 Factoring PRT (BRDFs)
- T_p: source → transferred incident radiance
- R_p: rotate to local frame
- B: integrate against BRDF (Westin92)
- y(v_p) e_p: evaluate exit radiance at v_p
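A small sketch of the factored operator, assuming the factors compose as M_p = B R_p T_p and are applied right to left to the light vector. The matrices here are random placeholders (a real R_p would be an SH rotation and B a BRDF-derived matrix); only the order of operations is the point.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 25                                   # assumed number of basis coefficients

l = rng.standard_normal(n)               # source lighting
T_p = rng.standard_normal((n, n))        # source -> transferred incident radiance
R_p = rng.standard_normal((n, n))        # rotation to the local frame (placeholder)
B = rng.standard_normal((n, n))          # integration against the BRDF (placeholder)
y_vp = rng.standard_normal(n)            # exit basis evaluated at v_p (placeholder)

# Apply the factors one at a time: three matrix-vector products.
e_p = B @ (R_p @ (T_p @ l))
radiance = y_vp @ e_p

# Equivalent single-matrix form, M_p = B R_p T_p.
M_p = B @ R_p @ T_p
```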
11 Hemispherical Projection
- exit radiance is defined over hemisphere, not sphere
- spherical harmonics not orthogonal over hemisphere
- how to project hemispherical functions using SH?
- naïve projection assumes underside is zero
- least squares projection minimizes approximation error
- see appendix
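The difference between the two projections can be illustrated numerically. This is only a sketch under stated assumptions: it uses the first four real SH basis functions, Monte Carlo sampling of directions, and an arbitrary example signal (z squared on the upper hemisphere); none of these choices come from the slides or the appendix.

```python
import numpy as np

def sh_basis(d):
    """First 4 real SH basis functions (bands 0 and 1) at unit directions d, shape (n, 3)."""
    x, y, z = d[:, 0], d[:, 1], d[:, 2]
    return np.stack([np.full_like(x, 0.282095),
                     0.488603 * y, 0.488603 * z, 0.488603 * x], axis=1)

rng = np.random.default_rng(0)
d = rng.standard_normal((20000, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)   # uniform directions on the sphere
upper = d[:, 2] >= 0.0

f = np.where(upper, d[:, 2] ** 2, 0.0)          # example signal, zero on the underside
Y = sh_basis(d)

# Naive projection: integrate f * Y over the whole sphere, treating the underside as zero.
c_naive = (4.0 * np.pi / len(d)) * (Y.T @ f)

# Least-squares projection: fit the coefficients on the upper hemisphere only.
Yh, fh = Y[upper], f[upper]
c_lsq, *_ = np.linalg.lstsq(Yh, fh, rcond=None)

# Mean squared error measured where the function is actually defined.
err_naive = np.mean((Yh @ c_naive - fh) ** 2)
err_lsq = np.mean((Yh @ c_lsq - fh) ** 2)
```

On the hemisphere samples the least-squares coefficients cannot do worse than the naive ones, since they minimize exactly that error.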
12 Factoring PRT (BRDFs)
Technique   Light basis  Exit basis  Note
Sloan02     SH           SH          Phong
Kautz02     SH           Dir         Arb
Lehtinen03  SH           Dir         Lsq
Matusik02   Dir          Dir         IBR
13 Extending PRT to BSSRDFs
- already handled by original equation
- use Jensen02, only multiple scattering (matrix with only 1 row)
- mix with conventional BRDF
14 Problems With PRT
- Big matrices at each surface point
  - 25-vectors for diffuse, x3 for spectral
  - 25x25 matrices for glossy
  - at 50,000 vertices
- Slows glossy rendering (4 Hz)
  - Frozen View/Light can increase performance
  - Not as GPU friendly
- Limits diffuse lighting order
  - Only very soft shadows
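A back-of-the-envelope check of the storage these numbers imply. Two things here are assumptions rather than statements from the slide: 32-bit floats, and the x3 spectral factor applying to the glossy matrices as well as the diffuse vectors.

```python
n_vertices = 50_000
channels = 3            # assumption: x3 spectral applies in both cases
bytes_per_float = 4     # assumption: 32-bit floats

diffuse_bytes = n_vertices * 25 * channels * bytes_per_float        # 25-vectors
glossy_bytes = n_vertices * 25 * 25 * channels * bytes_per_float    # 25x25 matrices

diffuse_mb = diffuse_bytes / 1e6
glossy_mb = glossy_bytes / 1e6
```

Under these assumptions the diffuse data is 15 MB while the glossy data is 375 MB, which gives a sense of why the glossy case is the one that needs compression.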
15 Compression Goals
- Decode efficiently
  - As much on the GPU as possible
  - Render compressed representation directly
- Increase rendering performance
  - Make non-diffuse case practical
- Reduce memory consumption
  - Not just on disk
16 Compression Example
Surface is curve, signal is normal
17 Compression Example
Signal Space
18 VQ
Cluster normals
19 VQ
Replace samples with cluster mean
20 PCA
Replace samples with mean + linear combination
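A minimal sketch of what "mean plus linear combination" means in practice, using NumPy's SVD. The function names and the synthetic rank-3 data are hypothetical and chosen only so the reconstruction is exact.

```python
import numpy as np

def pca_fit(X, k):
    """Return the sample mean, k orthonormal basis vectors, and per-sample weights."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:k]                      # (k, d), rows are principal directions
    weights = (X - mean) @ basis.T      # (n, k), coordinates in that basis
    return mean, basis, weights

def pca_reconstruct(mean, basis, weights):
    """Each sample is replaced by mean + linear combination of the basis vectors."""
    return mean + weights @ basis

rng = np.random.default_rng(0)
# Synthetic signal: 200 samples in 10-D that lie on a 3-D affine subspace.
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 10)) + 5.0

mean, basis, w = pca_fit(X, k=3)
X_hat = pca_reconstruct(mean, basis, w)
max_err = np.abs(X - X_hat).max()
```

Each sample is now stored as 3 weights instead of 10 values, at the cost of storing the shared mean and basis once.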
21 CPCA
Compute a linear subspace in each cluster
22 CPCA
- Clusters with low-dimensional affine models
- How should clustering be done?
  - Static PCA
    - VQ, followed by one-time per-cluster PCA
    - optimizes for piecewise-constant reconstruction
  - Iterative PCA
    - PCA in the inner loop, slower to compute
    - optimizes for piecewise-affine reconstruction
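The two clustering strategies can be sketched as follows. This is an illustrative toy, not the authors' implementation: plain k-means stands in for VQ, the data is synthetic, and all function names are hypothetical. The static variant clusters once and then fits PCA per cluster; the iterative variant reassigns each sample to the cluster whose affine model reconstructs it best and refits.

```python
import numpy as np

def kmeans_labels(X, k, iters=20, seed=0):
    """Plain k-means (the VQ step); returns one cluster index per sample."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def fit_models(X, labels, k, n_pca):
    """Per-cluster affine model: mean plus the top n_pca principal directions."""
    d = X.shape[1]
    models = []
    for c in range(k):
        Xc = X[labels == c]
        if len(Xc) == 0:                       # empty cluster: degenerate model
            models.append((np.zeros(d), np.zeros((0, d))))
            continue
        mean = Xc.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc - mean, full_matrices=False)
        models.append((mean, Vt[:n_pca]))
    return models

def sq_errors(X, models):
    """Squared reconstruction error of every sample against every cluster model."""
    errs = np.empty((len(X), len(models)))
    for c, (mean, basis) in enumerate(models):
        r = X - mean
        r = r - (r @ basis.T) @ basis          # remove the part the PCA vectors explain
        errs[:, c] = (r ** 2).sum(axis=1)
    return errs

def total_error(X, models, labels):
    return sq_errors(X, models)[np.arange(len(X)), labels].sum()

# Synthetic signal: three 2-D affine patches embedded in 6-D.
rng = np.random.default_rng(1)
k, n_pca, d = 3, 2, 6
X = np.concatenate([
    rng.standard_normal((100, n_pca)) @ rng.standard_normal((n_pca, d))
    + 4.0 * rng.standard_normal(d)
    for _ in range(k)
])

# Static: VQ once, then one PCA per cluster.
labels = kmeans_labels(X, k)
err_vq = total_error(X, fit_models(X, labels, k, 0), labels)   # cluster mean only
models = fit_models(X, labels, k, n_pca)
err_static = total_error(X, models, labels)

# Iterative: reassign by affine reconstruction error, refit PCA, repeat.
for _ in range(10):
    labels = sq_errors(X, models).argmin(axis=1)
    models = fit_models(X, labels, k, n_pca)
err_iter = total_error(X, models, labels)
```

In this sketch the error can only go down from VQ to static PCA to iterative PCA, since each step either adds basis vectors or re-optimizes against the piecewise-affine error directly.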
23 Static vs. Iterative
24 Related Work
- VQPCA Kambhatla94 (static)
- VQPCA Kambhatla97 (iterative)
- Mixture PC Dony95 (iterative)
- More sophisticated models exist
  - Brand03, Roweis02
  - Mapping to current GPUs is challenging
    - Variable storage per vertex
    - Partitioning is more difficult (or requires more passes)
25 Equal Rendering Cost
Figure panels: VQ, PCA, CPCA
26 Rendering with CPCA
27 Rendering with CPCA
- Constant per cluster: precompute on the CPU
- Rendering is a dot product
- Compute linear combination of vectors
- Only depends on rows of M
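A sketch of why the compressed representation can be rendered directly, assuming each vertex's matrix is approximated as the cluster mean plus a weighted sum of the cluster's PCA matrices. Applying the light vector to the per-cluster matrices once leaves only a linear combination of vectors per vertex. Names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_pca = 25, 8

M_mean = rng.standard_normal((n, n))          # cluster mean matrix
M_pca = rng.standard_normal((n_pca, n, n))    # cluster PCA matrices
l = rng.standard_normal(n)                    # light vector
w_p = rng.standard_normal(n_pca)              # per-vertex PCA weights

# Once per cluster (CPU): apply the lighting to the mean and PCA matrices.
e_mean = M_mean @ l                           # (n,)
e_pca = M_pca @ l                             # (n_pca, n)

# Per vertex: a linear combination of the precomputed vectors.
e_p = e_mean + w_p @ e_pca

# Reference: rebuild the full matrix first, then apply the light.
M_p = M_mean + np.tensordot(w_p, M_pca, axes=1)
e_ref = M_p @ l
```

The per-vertex work scales with the number of PCA vectors and the length of the exit vector, not with the full matrix size.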
28 Non-Local Viewer
- Assume v_p constant across object (distant viewer)
- Rendering independent of view and light orders
- linear combination of colors
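With a single view direction for the whole object, the exit basis can also be folded into the per-cluster precomputation, leaving one color per cluster matrix. The sketch below assumes RGB light coefficients and the same mean-plus-PCA cluster model; the names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_pca, n_vertices = 25, 8, 5

M = rng.standard_normal((n_pca + 1, n, n))   # M[0] = cluster mean, M[1:] = PCA matrices
l_rgb = rng.standard_normal((n, 3))          # light coefficients per color channel
y_v = rng.standard_normal(n)                 # exit basis at the single distant view direction
W = rng.standard_normal((n_vertices, n_pca)) # per-vertex PCA weights

# Once per cluster: y^T M_i l collapses every matrix to one RGB color.
cluster_colors = np.einsum('j,ijk,kc->ic', y_v, M, l_rgb)    # (n_pca + 1, 3)

# Per vertex: mean color plus a weighted sum of n_pca colors.
vertex_colors = cluster_colors[0] + W @ cluster_colors[1:]   # (n_vertices, 3)
```

The per-vertex cost no longer depends on the size of either basis, only on the number of PCA vectors.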
29 Rendering
30 Overdraw
- faces belong to 1-3 clusters
  - OD 1 → face drawn once
  - OD 2 → face drawn 2x
  - OD 3 → face drawn 3x
- coherence optimization
  - reclassification
  - superclustering
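Overdraw as described here can be measured by counting the distinct clusters touched by each triangle's vertices. A small sketch with a hypothetical toy mesh:

```python
def overdraw(faces, vertex_cluster):
    """Per-face overdraw (number of distinct vertex clusters) and its mean over the mesh."""
    per_face = [len({vertex_cluster[v] for v in face}) for face in faces]
    return per_face, sum(per_face) / len(per_face)

# Toy data: 6 vertices in 3 clusters, 5 triangles.
vertex_cluster = [0, 0, 0, 1, 1, 2]
faces = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (2, 3, 5)]

per_face, mean_od = overdraw(faces, vertex_cluster)
```

Here `per_face` is `[1, 2, 2, 2, 3]` and the mean overdraw is 2.0. Reclassifying vertices or merging clusters so that neighboring vertices share a cluster lowers this number.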
31 GPU Dataflow
Vertices → Vertex Shader → Pixel Shader
32 Demo
33 Results
All examples have 25x25 matrices, 256 clusters, 8 PCA vectors
Model      Points  Static PCA time  Iterative PCA time  FPS
Buddha     49.9k   3m30s            1h51m               27
BuddhaSS   49.9k   6m12s            4h32m               27
Bird Anis  48.7k   6m34s            3h43m               45
Bird Diff  48.7k   43s              3m26s               227
Head       50k     4m20s            2h12m               58.5
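A rough estimate of the storage saving for these settings. It assumes a single color channel, roughly 50,000 vertices, and ignores the per-vertex cluster index; these simplifications are assumptions and not figures from the slides.

```python
n_vertices, n_clusters, n_pca = 50_000, 256, 8
floats_per_matrix = 25 * 25

# Uncompressed: one full matrix per vertex.
raw_floats = n_vertices * floats_per_matrix

# CPCA: n_pca weights per vertex, plus a mean and n_pca matrices per cluster.
per_vertex_floats = n_vertices * n_pca
per_cluster_floats = n_clusters * (n_pca + 1) * floats_per_matrix
compressed_floats = per_vertex_floats + per_cluster_floats

ratio = raw_floats / compressed_floats
```

Under these assumptions the compressed form needs 1,840,000 floats instead of 31,250,000, a ratio of about 17 to 1.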
34 Conclusions
- CPCA
  - works in signal space, not surface space
  - uses affine subspace per-cluster
  - compresses PRT well
  - is used directly without blowing out signal
  - requires small, uniform state storage
- provides
  - faster rendering
  - higher-frequency lighting
35 Future Work
- time-dependent and parameterized geometry
- higher-frequency lighting
- combination with bi-scale rendering
- better signal continuity
36 Questions?
- DirectX SDK for PRT available soon.
- Jason Mitchell, Hugues Hoppe, Jason Sandlin, David Kirk
- Stanford, MPI for models