Title: Real-Time Computer Graphics
1Real-Time Computer Graphics
Kun Zhou, State Key Lab of CAD&CG, Zhejiang University
www.kunzhou.net
2The Graphics Process
Lighting
3D Modeling
Image Storage and Display
Rendering
3D Animation
real-time rendering
3The Graphics Process
Lighting
3D Modeling
Image Storage and Display
Rendering
3D Animation
real-time computer graphics
GPU
4GPU Data-Parallel Computing Device
- Multiple cores, very high memory bandwidth
GeForce GTX 280: 933 GFLOPS, 141.7 GB/s
[Figure: floating-point operations per second for the CPU and GPU (NVIDIA, 2008)]
5GPU Stream Processors
GeForce GTX 280: 30 x 8 = 240 processors
6Outline
- Data structures and algorithms
- Modeling: surface reconstruction
- Animation: surface deformation
- Rendering: ray tracing, refraction
- Programming tools
- BSGP: bulk-synchronous GPU programming
8Modeling Surface Reconstruction
- Parallel Surface Reconstruction
Technical Report, 2008
A set of 3D points
Triangular mesh
9Modeling Surface Reconstruction
- Parallel Surface Reconstruction
Technical Report, 2008
- Octrees on GPUs
- nodes, faces, edges, vertices
- neighborhood info
Kazhdan06
10Modeling Surface Reconstruction
- Parallel Surface Reconstruction
Technical Report, 2008
- Octrees on GPUs
- nodes, faces, edges, vertices
- neighborhood info
- Bottom-up, breadth-first order
- Precompute look-up tables to compute neighbors
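The bottom-up, breadth-first order mentioned above can be sketched on the CPU as follows. This is only an illustration in Python, assuming that leaf cells are identified by Morton (bit-interleaved) keys and that each coarser level is obtained by dropping the last three key bits. The names `morton_key` and `build_levels` are hypothetical and not taken from the technical report, and the neighbor look-up tables are not shown.

```python
def morton_key(x, y, z, depth):
    """Interleave the low `depth` bits of integer cell coordinates x, y, z."""
    key = 0
    for bit in range(depth):
        key |= ((x >> bit) & 1) << (3 * bit)
        key |= ((y >> bit) & 1) << (3 * bit + 1)
        key |= ((z >> bit) & 1) << (3 * bit + 2)
    return key


def build_levels(points, depth):
    """Return sorted key lists per level, finest level first.

    `points` are (x, y, z) tuples with coordinates in [0, 1).
    """
    cells = 1 << depth
    # Sort + unique stands in for the GPU sort and compaction primitives.
    keys = sorted({
        morton_key(int(px * cells), int(py * cells), int(pz * cells), depth)
        for px, py, pz in points
    })
    levels = [keys]
    for _ in range(depth):
        # A parent key is its child's key without the last 3 bits,
        # so a whole level can be derived at once (breadth-first).
        keys = sorted({k >> 3 for k in keys})
        levels.append(keys)
    return levels
```

Because every level is produced by one pass over the level below it, each pass maps naturally to a data-parallel kernel.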
11Modeling Surface Reconstruction
- Parallel Surface Reconstruction
Technical Report, 2008
Our GPU algorithm: 5 FPS for 512K points
CPU algorithm [Kazhdan06]: 42 seconds
12Modeling Surface Reconstruction
- Parallel Surface Reconstruction
Technical Report, 2008
User-guided surface reconstruction
14Modeling Surface Reconstruction
- Parallel Surface Reconstruction
Technical Report, 2008
On-the-fly conversion of dynamic point clouds
15Animation Surface Deformation
- Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007
16Animation Surface Deformation
- Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007
vertex positions
Laplacian matrix
Laplacian coordinates
positional constraint matrix
constrained positions
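The labels above presumably annotate a Laplacian deformation energy. A sketch of its standard form, assuming $V$ denotes the vertex positions, $L$ the Laplacian matrix, $\delta$ the Laplacian coordinates, $C$ the positional constraint matrix and $U$ the constrained positions (the symbols and the weight $\lambda$ are assumptions, not taken from the slide):

```latex
\min_{V} \; \lVert L V - \delta \rVert^{2} + \lambda \, \lVert C V - U \rVert^{2}
```

The first term asks the deformed surface to keep its local detail, the second asks it to follow the user's constraints.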
17Animation Surface Deformation
- Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007
18Animation Surface Deformation
- Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007
19Animation Surface Deformation
- Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007
nonlinear least-squares optimization
20Animation Surface Deformation
- Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007
Inexact Gauss-Newton iterative solver
21Animation Surface Deformation
- Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007
- Precompute on the CPU
- Compute on the GPU
- Subdivision [Shiue05], Laplacian coordinates
- Compute on the GPU
- Matrix-vector multiplication [Bolz03, Kruger03]
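The matrix-vector product that dominates the GPU work can be illustrated with a small sparse example. The sketch below assumes a CSR (compressed sparse row) layout, which is a choice made here for clarity; the cited GPU implementations use their own texture-based layouts. The function name `csr_matvec` is hypothetical.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR sparse matrix by a dense vector x."""
    y = []
    for row in range(len(row_ptr) - 1):
        # Rows are independent, so on a GPU each row could map to one thread.
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Graph Laplacian of a 3-vertex path: [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
values = [1, -1, -1, 2, -1, -1, 1]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
result = csr_matvec(values, col_idx, row_ptr, [1, 2, 4])
```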
22Animation Surface Deformation
- Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007
Real-time MOCAP animation
23Animation Surface Deformation
- Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007
24Rendering Ray Tracing
- Real-time KD-Tree Construction on Graphics
Hardware, ACM TOG (siggraph asia), 2008
- Interactive frame rates
- Shadows, textures
- Multi-bounce reflection/refraction
25Rendering Ray Tracing
- Real-time KD-Tree Construction on Graphics
Hardware, ACM TOG (siggraph asia), 2008
KD-tree
Generate Eye Rays
Traverse Acceleration Structure
Intersect Triangles
Shade Hits and Generate Secondary Rays
Andrew Morres slides
26Rendering Ray Tracing
- Real-time KD-Tree Construction on Graphics
Hardware, ACM TOG (siggraph asia), 2008
Constructing kd-tree on GPUs
- Maximize parallelism
- Build trees in BFS order
- Parallelize computation over primitives at upper tree levels
- Preserve high quality
- New schemes for node splitting
Generate Eye Rays
Traverse Acceleration Structure
Intersect Triangles
Shade Hits and Generate Secondary Rays
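Building the tree in BFS order means that all nodes of one level are processed before any node of the next, which exposes many independent nodes at once. A minimal CPU sketch in Python, assuming a simple median split on alternating axes as a stand-in for the splitting schemes of the paper, which are not reproduced here; `build_kdtree_bfs` and the node dictionary layout are hypothetical:

```python
from collections import deque


def build_kdtree_bfs(points, leaf_size=2):
    """Build a kd-tree level by level; returns a flat list of node dicts."""
    nodes = [{"points": list(points), "depth": 0}]
    queue = deque([0])
    while queue:
        idx = queue.popleft()  # FIFO order yields a breadth-first build
        node = nodes[idx]
        pts = node.pop("points")
        if len(pts) <= leaf_size:
            node["leaf"] = pts
            continue
        axis = node["depth"] % len(pts[0])
        pts.sort(key=lambda p: p[axis])
        mid = len(pts) // 2  # median split (assumption, not the paper's scheme)
        node["axis"], node["split"] = axis, pts[mid][axis]
        for side, part in (("left", pts[:mid]), ("right", pts[mid:])):
            nodes.append({"points": part, "depth": node["depth"] + 1})
            node[side] = len(nodes) - 1
            queue.append(node[side])
    return nodes
```

In the flat node list, nodes of the same depth are stored contiguously, which is the property a level-parallel GPU build relies on.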
27Rendering Ray Tracing
- Real-time KD-Tree Construction on Graphics
Hardware, ACM TOG (siggraph asia), 2008
Scene      [Wald07], 1 core    [Shevtsov07], 4 cores    Our algorithm, GeForce 8800 Ultra
(image)    10.5 FPS            23.5 FPS                 32.0 FPS
(image)    2.30 FPS            5.84 FPS                 6.40 FPS
28Rendering Ray Tracing
- Real-time KD-Tree Construction on Graphics
Hardware, ACM TOG (siggraph asia), 2008
photon mapping for caustic rendering
29Rendering Dynamic Refraction
- Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008
- Interactions
- Geometry, lighting, materials, viewpoint
- Rendering effects
- Refraction, reflection, single scattering
- Shadows, caustics
30Rendering Dynamic Refraction
- Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008
object voxelization
octree construction
photon generation
adaptive photon tracing
view pass
31Rendering Dynamic Refraction
- Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008
object voxelization
octree construction
photon generation
adaptive photon tracing
view pass
All performed on the GPU!
32Rendering Dynamic Refraction
- Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008
object voxelization
octree construction
- Dense 3D array instead of sparse tree
- Accounts for refractive index and extinction coefficients
- Construction is similar to mipmap
photon generation
adaptive photon tracing
view pass
33Rendering Dynamic Refraction
- Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008
GPU Octree Construction
pyramid of min max values
index of refraction values
octree
index of refraction values
pyramid of hierarchy levels
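The min/max pyramid can be sketched in the same way a mipmap is built: each coarser level summarizes a block of the finer one. The Python sketch below is a 2D analogue with 2x2 blocks, assuming a square grid whose side is a power of two; `minmax_pyramid` is a hypothetical name. Reading a block whose min and max agree as a uniform region that can be treated as one node is an interpretation, not something the slide states.

```python
def minmax_pyramid(grid):
    """Return pyramid levels of (min, max) pairs, finest level first.

    `grid` is a square 2D list whose side is a power of two.
    """
    level = [[(v, v) for v in row] for row in grid]
    levels = [level]
    while len(level) > 1:
        half = len(level) // 2
        coarser = []
        for i in range(half):
            row = []
            for j in range(half):
                # Gather the 2x2 block of the finer level, as in mipmapping.
                block = [level[2 * i + di][2 * j + dj]
                         for di in (0, 1) for dj in (0, 1)]
                row.append((min(b[0] for b in block),
                            max(b[1] for b in block)))
            coarser.append(row)
        level = coarser
        levels.append(level)
    return levels
```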
34Rendering Dynamic Refraction
- Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008
Adaptive Photon Tracing
35Rendering Dynamic Refraction
- Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008
Surface manipulation
36Rendering Dynamic Refraction
- Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008
Volume painting
37Rendering Dynamic Refraction
- Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008
Simulation visualization
38Outline
- Data structures and algorithms
- Modeling: surface reconstruction
- Animation: surface deformation
- Rendering: ray tracing, refraction
- Programming tools
- BSGP: bulk-synchronous GPU programming, ACM TOG (siggraph), 2008
39Programming the GPU
- Cg [Mark03], GLSL, HLSL: graphics oriented
- Stream processing
- Brook [Buck04]: streams and kernels
- Sh [McCool04]: meta-programming lib
- NVIDIA CUDA: scattering, local communication
- AMD CAL, Brook+
- OpenCL
- DirectX11 Compute Shader
40Stream Processing Model
- Data-centric: uniform streams
- Applying individual kernels in parallel to all
stream elements
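As a rough illustration of this model, a kernel is a function of individual elements and a stream is a uniform sequence of them. The Python sketch below uses hypothetical names (`run_kernel`, `saxpy_kernel`) and a sequential loop where a GPU would run all elements in parallel.

```python
def run_kernel(kernel, *streams):
    """Apply `kernel` independently to corresponding elements of the streams."""
    return [kernel(*elems) for elems in zip(*streams)]


def saxpy_kernel(x, y):
    # No element depends on another, so every call could run in parallel.
    return 2.0 * x + y


positions = run_kernel(saxpy_kernel, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
```

Any intermediate result passed from one kernel to the next has to live in a temporary stream, which is the source of the dataflow management burden.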
41Stream Processing Model
- Supplies high performance, but makes GPU programming hard
- Program readability and maintenance
- Bundle independent processes to reduce temporary streams and kernel launches
- Manual dataflow management
- Recycle temporary streams
- Inefficient code reuse
- Primitives with broken integrity
42BSGP Model
- Programmer specifies barriers, compiler deduces supersteps [Valiant 1990]
43BSGP Model
- Programmer specifies barriers, compiler deduces supersteps [Valiant 1990]
- Implicit data dependencies through local variables
44BSGP Model
- Programmer specifies barriers, compiler deduces supersteps [Valiant 1990]
- Implicit data dependencies through local variables
- Allows collective operation
- Parallel primitives are called as a whole in a single statement
45BSGP Model
- Easy to read, write and maintain
- Similar or better performance than native languages
- e.g., CUDA
- Complex programs
- e.g., X3D parser
46Example one-ring neighborhood
- Compute the one-ring neighboring triangles of
each vertex of a triangular mesh
v1: t1, t2, t3, t4, t5
v2: t4, t5, t6, t7, t8, t9
v3
[Figure: triangular mesh with vertices v1, v2, v3 and triangles t1 to t9]
47One-ring neighborhood BSGP version
48One-ring neighborhood BSGP version
- Sorting the triplicated triangles
49One-ring neighborhood BSGP version
- Sorting the triplicated triangles
- Compute each vertex's head pointer
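The two steps, sorting the triplicated triangles and computing a head pointer per vertex, can be sketched sequentially in Python. This is an illustration of the idea, not the BSGP or CUDA code from the slides; `one_ring` and its return layout are hypothetical.

```python
def one_ring(triangles, num_vertices):
    """Return (sorted_tris, head) for a list of (v0, v1, v2) index triples.

    The triangles around vertex v are sorted_tris[head[v]:head[v + 1]].
    """
    # Step 1: emit one (vertex, triangle) pair per corner ("triplicate").
    pairs = [(v, t) for t, tri in enumerate(triangles) for v in tri]
    # Step 2: sort by vertex id so each vertex's triangles are contiguous.
    pairs.sort()
    sorted_tris = [t for _, t in pairs]
    # Step 3: head[v] is the first position holding vertex id v.
    head = [len(pairs)] * (num_vertices + 1)
    for i in range(len(pairs) - 1, -1, -1):
        head[pairs[i][0]] = i
    # Vertices without triangles inherit the next vertex's head pointer,
    # so their slice is empty.
    for v in range(num_vertices - 1, -1, -1):
        head[v] = min(head[v], head[v + 1])
    return sorted_tris, head


tris, head = one_ring([(0, 1, 2), (0, 2, 3)], 4)
ring_of_vertex_2 = tris[head[2]:head[3]]
```

On a GPU the sort would be a parallel primitive, and each head pointer can be found by one thread comparing a pair with its predecessor.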
50One-ring neighborhood CUDA version
Dataflow management
Kernels
51BSGP Language Constructs
- spawn and barrier
- Insert CPU code: require
- Thread manipulation: fork and kill
- Communication: thread.get and thread.put
- Reducing barriers: par
- Parallel primitive operations, including reduce, scan and sort
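To give a feel for what a parallel primitive such as scan is used for, the sketch below shows an exclusive scan and the stream compaction built on it. It is written in plain sequential Python, not BSGP syntax, and the names `exclusive_scan` and `compact` are hypothetical.

```python
from itertools import accumulate


def exclusive_scan(values):
    """Prefix sums where output[i] is the sum of values[:i]."""
    return [0] + list(accumulate(values))[:-1] if values else []


def compact(items, keep_flags):
    """Keep flagged items; the scan gives each kept item its output slot."""
    slots = exclusive_scan(keep_flags)
    out = [None] * (slots[-1] + keep_flags[-1]) if items else []
    for item, flag, slot in zip(items, keep_flags, slots):
        if flag:
            out[slot] = item  # slots are unique, so writes never collide
    return out


kept = compact(["a", "b", "c", "d"], [1, 0, 1, 1])
```

Because every kept item knows its slot in advance, the final write needs no coordination between threads.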
59Sample Applications
Recursive ray tracer
Particle simulation
X3D parser
Adaptive tessellation
60Recursive Ray Tracer
- Both the BSGP and CUDA versions are implemented and optimized by the same programmer
61Recursive Ray Tracer
- Both the BSGP and CUDA versions are implemented and optimized by the same programmer
- Clear advantage in code complexity
- Similar performance and memory usage
               CUDA    BSGP
Render fps     4.00    4.61
Mem usage      144M    150M
Code lines     815     475
GPU funcs      10      3
Coding days    2~3     1
Tuning days    4~5     2~3
62Particle Simulation
- CUDA SDK demo
- Rewrote simulation module in BSGP, reused GUI
code
63Particle Simulation
                CUDA    BSGP
Render fps      187     290
Module lines    -       154
Total lines     2113    1579
Coding time     -       1 hour
- CUDA SDK demo
- Rewrote simulation module in BSGP, reused GUI code
- Simpler and faster
- Integration and sort preparation aren't bundled
- Sort isn't bundled with sort preparation
- Sort calls unbundled scan
64X3D Parser
- BSGP implementation
- Incremental development
- 16 GPU functions, compiled into 82 kernels, 19k lines of assembly
- 15x faster than CPU parser
- Extremely difficult in CUDA
A 7.03 MB X3D scene loaded in 183 ms
65Adaptive Tessellation
- A displacement map based terrain renderer
66Adaptive Tessellation
- Without thread manipulation
- Parallelized over all input triangles
- With thread manipulation
- Parallelized over output vertices using
thread.fork
View    Without thread manipulation    With thread manipulation    Output vertices
        Ttess       FPS                Ttess       FPS
Side    43.9 ms     21.0               3.62 ms     142             1.14M
Top     5.0 ms      144                2.1 ms      249             322k
2x to 10x speedup
67Try BSGP Now!
- BSGP compiler, programming guide, primitive library, editor and all example code
- http://www.kunzhou.net/BSGP
68Summary
- GPUs are fast and cheap, and are getting faster and cheaper
- General-purpose computing
- Re-think your algorithms to be massively parallel
- Data structures: quadtree, octree, kd-tree
- Algorithms: nonlinear/linear optimization, matrix-vector operations, parallel primitives
- Programming the GPU
- BSGP makes the programmer's life much easier
69Questions?
Kun Zhou
kunzhou_at_acm.org
70Other Real-Time Applications
Dynamic BRDF (sig2007)
Soft shadow (sig2006)
Smoke (sig2008)
Skinning (sig2008)