Title: Real-Time Computer Graphics

1
Real-Time Computer Graphics
Kun Zhou
State Key Lab of CAD&CG, Zhejiang University
www.kunzhou.net
2
The Graphics Process
Lighting
3D Modeling
Image Storage and Display
Rendering
3D Animation
real-time rendering
3
The Graphics Process
Lighting
3D Modeling
Image Storage and Display
Rendering
3D Animation
real-time computer graphics
GPU
4
GPU: Data-Parallel Computing Device

Multiple cores, very high memory bandwidth

GeForce GTX 280: 933 GFLOPS, 141.7 GB/s
[Figure: floating-point operations per second for the CPU and GPU (NVIDIA 2008)]
5
GPU: Stream Processors
GeForce GTX 280: 30 x 8 = 240 processors
6
Outline

Data structures & algorithms
Modeling: surface reconstruction
Animation: surface deformation
Rendering: ray tracing, refraction
Programming tools
BSGP: bulk-synchronous GPU programming

8
Modeling Surface Reconstruction

Parallel Surface Reconstruction
Technical Report, 2008

A set of 3D points
Triangular mesh
9
Modeling Surface Reconstruction

Parallel Surface Reconstruction
Technical Report, 2008

Octrees on GPUs
nodes, faces, edges, vertices
neighborhood info

[Kazhdan06]
10
Modeling Surface Reconstruction

Parallel Surface Reconstruction
Technical Report, 2008

Octrees on GPUs
nodes, faces, edges, vertices
neighborhood info

Bottom-up, breadth-first order
Precompute look-up tables to compute neighbors
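
The bottom-up, breadth-first order can be sketched in a few lines. This is a sequential Python stand-in, not the GPU code from the technical report: it assumes the points are already quantized to integer grid coordinates, uses Morton (bit-interleaved) keys as node identifiers, and the helper names `morton3` and `build_levels` are hypothetical. Each level is derived only from the level below it, so on a GPU each level could be handled as one data-parallel pass.

```python
def morton3(x, y, z, depth):
    """Interleave the low `depth` bits of x, y, z into one integer key."""
    key = 0
    for bit in range(depth):
        key |= ((x >> bit) & 1) << (3 * bit)
        key |= ((y >> bit) & 1) << (3 * bit + 1)
        key |= ((z >> bit) & 1) << (3 * bit + 2)
    return key


def build_levels(points, depth):
    """Return sorted node keys per level: leaves first, root last."""
    leaves = sorted({morton3(x, y, z, depth) for x, y, z in points})
    levels = [leaves]
    for _ in range(depth):
        # A parent's key is its child's key with the last 3 bits dropped.
        parents = sorted({key >> 3 for key in levels[-1]})
        levels.append(parents)
    return levels


# Four points on a 4 x 4 x 4 grid (depth 2).
points = [(0, 0, 0), (1, 0, 0), (3, 3, 3), (3, 3, 2)]
levels = build_levels(points, depth=2)
```

Because sibling keys differ only in their last three bits, neighbors of a node can be found by key arithmetic, which is the kind of computation a precomputed look-up table can serve.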

11
Modeling Surface Reconstruction

Parallel Surface Reconstruction
Technical Report, 2008

Our GPU algorithm: 5 FPS for 512K points
CPU algorithm [Kazhdan06]: 42 seconds
12
Modeling Surface Reconstruction

Parallel Surface Reconstruction
Technical Report, 2008

User-guided surface reconstruction
14
Modeling Surface Reconstruction

Parallel Surface Reconstruction
Technical Report, 2008

On-the-fly conversion of dynamic point clouds
15
Animation Surface Deformation

Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007

16
Animation Surface Deformation

Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007

vertex positions
Laplacian matrix
Laplacian coordinates
positional constraint matrix
constrained positions
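
The five labels on this slide fit a common way of writing Laplacian surface deformation as a least-squares problem. The formula below is a sketch of that standard formulation, not necessarily the exact energy used in the paper; the symbols $V$ (vertex positions), $L$ (Laplacian matrix), $\delta$ (Laplacian coordinates), $C$ (positional constraint matrix) and $U$ (constrained positions) are names assumed here for the labelled quantities.

```latex
\min_{V} \; \| L V - \delta \|^2 + \| C V - U \|^2
\quad\Longrightarrow\quad
\left( L^{\top} L + C^{\top} C \right) V = L^{\top} \delta + C^{\top} U
```

If $\delta$ is held fixed the problem is linear in $V$ and the normal equations on the right solve it. If $\delta$ itself depends on $V$, the problem becomes a nonlinear least-squares optimization.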
17
Animation Surface Deformation

Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007

18
Animation Surface Deformation

Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007

19
Animation Surface Deformation

Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007

nonlinear least-squares optimization
20
Animation Surface Deformation

Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007

Inexact Gauss-Newton iterative solver
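
The slides do not spell out the solver, so the following is only a toy illustration of the Gauss-Newton idea: linearize the residual, solve the resulting linear least-squares problem for a step, update, and repeat. It fits a single parameter `a` in the made-up model y = exp(a x), so the linear solve is a scalar division. An inexact variant would solve that linear system only approximately at each iteration. The function name `gauss_newton_1d` and the test data are hypothetical.

```python
import math


def gauss_newton_1d(xs, ys, a=0.0, iterations=20):
    """Fit y = exp(a * x) by minimizing the sum of squared residuals."""
    for _ in range(iterations):
        residuals = [math.exp(a * x) - y for x, y in zip(xs, ys)]
        jacobian = [x * math.exp(a * x) for x in xs]  # d(residual)/da
        jtj = sum(j * j for j in jacobian)
        jtr = sum(j * r for j, r in zip(jacobian, residuals))
        if jtj == 0.0:
            break
        a -= jtr / jtj  # solve (J^T J) step = -(J^T r), then update
    return a


xs = [0.0, 1.0, 2.0]
ys = [math.exp(0.5 * x) for x in xs]  # synthetic data with a = 0.5
a_fit = gauss_newton_1d(xs, ys)
```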
21
Animation Surface Deformation

Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007

Precompute on the CPU
Compute on the GPU: subdivision [Shiue05], Laplacian coordinates
Compute on the GPU: matrix-vector multiplication [Bolz03, Kruger03]
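
Matrix-vector multiplication maps well to the GPU because every output row can be computed independently. The sketch below uses the compressed sparse row (CSR) layout as an assumption for illustration; the cited GPU methods used their own texture-based layouts. The function name `csr_matvec` is hypothetical.

```python
def csr_matvec(indptr, indices, data, x):
    """Compute y = A x for a sparse matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):  # rows are independent: one GPU thread each
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# 3 x 3 Laplacian of a path graph: [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
data = [1.0, -1.0, -1.0, 2.0, -1.0, -1.0, 1.0]
y = csr_matvec(indptr, indices, data, [1.0, 2.0, 4.0])
```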

22
Animation Surface Deformation

Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007

Real-time MOCAP animation
23
Animation Surface Deformation

Direct Manipulation of Subdivision Surfaces on
the GPU, ACM TOG (siggraph), 2007

24
Rendering Ray Tracing

Real-time KD-Tree Construction on Graphics
Hardware, ACM TOG (siggraph asia), 2008

Interactive frame rates
Shadows, textures
Multi-bounce reflection/refraction

25
Rendering Ray Tracing

Real-time KD-Tree Construction on Graphics
Hardware, ACM TOG (siggraph asia), 2008

KD-tree
Generate Eye Rays
Traverse Acceleration Structure
Intersect Triangles
Shade Hits, Generate Secondary Rays
Andrew Morres slides
26
Rendering Ray Tracing

Real-time KD-Tree Construction on Graphics
Hardware, ACM TOG (siggraph asia), 2008

Constructing kd-trees on GPUs
Maximize parallelism
Build trees in BFS order
Parallelize computation over primitives at upper tree levels
Preserve high quality
New schemes for node splitting

Generate Eye Rays
Traverse Acceleration Structure
Intersect Triangles
Shade Hits, Generate Secondary Rays
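
Building the tree in BFS order means processing one whole level at a time, so that all nodes of a level can be split in parallel. The sketch below shows only that control structure, under simplifying assumptions: it builds over points rather than triangles, splits at the spatial median of the node's extent instead of using the paper's node-splitting schemes, and runs sequentially. The function name `build_kdtree_bfs` and the node layout are hypothetical.

```python
def build_kdtree_bfs(points, leaf_size=2):
    """Build a kd-tree level by level; returns a flat list of node dicts."""
    nodes = [{"ids": list(range(len(points))), "depth": 0}]
    frontier = [0]  # indices of the nodes at the current level
    while frontier:
        next_frontier = []
        for n in frontier:  # nodes of one level are independent of each other
            node = nodes[n]
            ids = node["ids"]
            if len(ids) <= leaf_size:
                node["leaf"] = True
                continue
            axis = node["depth"] % len(points[0])
            coords = [points[i][axis] for i in ids]
            split = (min(coords) + max(coords)) / 2.0  # spatial median
            left = [i for i in ids if points[i][axis] <= split]
            right = [i for i in ids if points[i][axis] > split]
            if not left or not right:  # cannot separate: keep as a leaf
                node["leaf"] = True
                continue
            node.update(leaf=False, axis=axis, split=split,
                        children=(len(nodes), len(nodes) + 1))
            del node["ids"]
            for child_ids in (left, right):
                next_frontier.append(len(nodes))
                nodes.append({"ids": child_ids, "depth": node["depth"] + 1})
        frontier = next_frontier
    return nodes


points = [(0, 0), (1, 0), (4, 0), (5, 0), (4, 3)]
nodes = build_kdtree_bfs(points)
```

Children are appended in level order, so the flat `nodes` list is itself laid out breadth-first.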
27
Rendering Ray Tracing

Real-time KD-Tree Construction on Graphics
Hardware, ACM TOG (siggraph asia), 2008

Scene     [Wald07], 1 core   [Shevtsov07], 4 cores   Our algorithm, GF8800 Ultra
(image)   10.5 FPS           23.5 FPS                32.0 FPS
(image)   2.30 FPS           5.84 FPS                6.40 FPS
28
Rendering Ray Tracing

Real-time KD-Tree Construction on Graphics
Hardware, ACM TOG (siggraph asia), 2008

photon mapping for caustic rendering
29
Rendering Dynamic Refraction

Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008

Interactions
Geometry, lighting, materials, viewpoint
Rendering effects
Refraction, reflection, single scattering
Shadows, caustics

30
Rendering Dynamic Refraction

Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008

object voxelization
octree construction
photon generation
adaptive photon tracing
view pass
31
Rendering Dynamic Refraction

Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008

object voxelization
octree construction
photon generation
adaptive photon tracing
view pass
all performed on the GPU!
32
Rendering Dynamic Refraction

Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008

object voxelization
octree construction
photon generation
adaptive photon tracing
view pass

octree construction:
Dense 3D array instead of sparse tree
Accounts for refractive index and extinction coefficients
Construction is similar to mipmap
33
Rendering Dynamic Refraction

Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008

GPU Octree Construction
[Figure: index of refraction values, pyramid of min/max values, pyramid of hierarchy levels, octree]
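
A mipmap-like min/max pyramid over a dense 3D array can be sketched as follows. This is an assumption-laden illustration rather than the paper's implementation: the volume is a flat Python list with x varying fastest, the side length `n` is a power of two, and the uniformity test with a `tolerance` parameter is a hypothetical stand-in for whatever criterion decides that a region can be a single octree node.

```python
def minmax_pyramid(volume, n):
    """volume: flat list of n*n*n values, x fastest, n a power of two.
    Returns grids of (min, max) pairs, finest level first, 1x1x1 last."""
    level = [(v, v) for v in volume]
    levels = [level]
    while n > 1:
        h = n // 2
        coarse = []
        for z in range(h):
            for y in range(h):
                for x in range(h):
                    # Gather the 2 x 2 x 2 block of finer cells.
                    block = [level[(2 * z + dz) * n * n
                                   + (2 * y + dy) * n + (2 * x + dx)]
                             for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)]
                    coarse.append((min(b[0] for b in block),
                                   max(b[1] for b in block)))
        levels.append(coarse)
        level, n = coarse, h
    return levels


def uniform_cells(levels, tolerance):
    """Flag cells whose value range is within `tolerance`."""
    return [[(hi - lo) <= tolerance for lo, hi in lvl] for lvl in levels]


n = 4
volume = [1.0] * (n ** 3)
volume[0] = 1.5  # one voxel with a different index of refraction
levels = minmax_pyramid(volume, n)
flags = uniform_cells(levels, tolerance=0.1)
```

Each coarse cell depends only on its eight finer cells, so every level can be computed in one parallel pass, just as with mipmap generation.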
34
Rendering Dynamic Refraction

Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008

Adaptive Photon Tracing
35
Rendering Dynamic Refraction

Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008

Surface manipulation
36
Rendering Dynamic Refraction

Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008

Volume painting
37
Rendering Dynamic Refraction

Interactive Relighting of Dynamic Refractive
Objects, ACM TOG (siggraph), 2008

Simulation visualization
38
Outline

Data structures & algorithms
Modeling: surface reconstruction
Animation: surface deformation
Rendering: ray tracing, refraction
Programming tools
BSGP: bulk-synchronous GPU programming, ACM TOG (siggraph), 2008

39
Programming the GPU

Cg [Mark03], GLSL, HLSL: graphics oriented
Stream processing
Brook [Buck04]: streams and kernels
Sh [McCool04]: meta-programming lib
NVIDIA CUDA: scattering, local communication
AMD CAL, Brook+
OpenCL
DirectX 11 Compute Shader

40
Stream Processing Model

Data centric uniform streams
Applying individual kernels in parallel to all
stream elements
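
The stream model can be mimicked in plain Python: a kernel is a function of one element, and running it means applying it to every element of its input streams independently. The helper `run_kernel` and the particle-style data are hypothetical and only illustrate the model. Note the temporary stream `moved` that has to be passed from the first kernel to the second.

```python
def run_kernel(kernel, *streams):
    """Apply `kernel` independently at every element position."""
    return [kernel(*elems) for elems in zip(*streams)]


positions = [0.0, 1.0, 2.0]
velocities = [1.0, 0.5, -30.0]
dt = 0.1

# Kernel 1 writes a temporary stream that kernel 2 then consumes.
moved = run_kernel(lambda p, v: p + v * dt, positions, velocities)
clamped = run_kernel(lambda p: max(p, 0.0), moved)
```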

41
Stream Processing Model

Supplies high performance, but makes GPU programming hard
Program readability and maintenance
Bundle independent processes to reduce temporary streams and kernel launches
Manual dataflow management
Recycle temporary streams
Inefficient code reuse
Primitives with broken integrity

42
BSGP Model

Programmer specifies barriers, compiler deduces supersteps [Valiant 1990]

43
BSGP Model

Programmer specifies barriers, compiler deduces supersteps [Valiant 1990]
Implicit data dependencies through local variables

44
BSGP Model

Programmer specifies barriers, compiler deduces supersteps [Valiant 1990]
Implicit data dependencies through local variables
Allows collective operation
Parallel primitives are called as a whole in a single statement
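
The barrier and superstep idea can be imitated with Python generators. This is an analogy only and does not show BSGP syntax: each thread is a generator, each `yield` plays the role of a barrier, and the hypothetical scheduler `run_bsp` runs every thread up to its next barrier before any thread continues. The local variable `x` survives across the barrier, which is the implicit data dependency the slide refers to.

```python
def run_bsp(thread_fn, num_threads, shared):
    """Run generator-based threads; each `yield` acts as a barrier."""
    threads = [thread_fn(tid, num_threads, shared)
               for tid in range(num_threads)]
    supersteps = 0
    while threads:
        supersteps += 1
        alive = []
        for t in threads:  # order within a superstep does not matter
            try:
                next(t)  # run this thread up to its next barrier
                alive.append(t)
            except StopIteration:
                pass
        threads = alive
    return supersteps


def neighbor_sum(tid, n, shared):
    x = (tid + 1) * 10  # local variable, kept across the barrier
    shared["out"][tid] = x  # superstep 1: publish a value
    yield  # barrier
    left = shared["out"][(tid - 1) % n]  # superstep 2: read a neighbor
    shared["result"][tid] = x + left


shared = {"out": [0] * 4, "result": [0] * 4}
steps = run_bsp(neighbor_sum, 4, shared)
```

In a stream-style version the same program would be two separate kernels, with `x` written to and read back from a temporary stream by hand.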

45
BSGP Model

Easy to read, write and maintain
Similar or better performance than native languages
e.g., CUDA
Complex programs
e.g., X3D parser

46
Example one-ring neighborhood

Compute the one-ring neighboring triangles of
each vertex of a triangular mesh

v1: t1, t2, t3, t4, t5
v2: t4, t5, t6, t7, t8, t9
[Figure: triangle mesh around vertices v1, v2, v3 with triangles t1 to t9]
47
One-ring neighborhood BSGP version
48
One-ring neighborhood BSGP version

Sorting the triplicated triangles

49
One-ring neighborhood BSGP version

Sorting the triplicated triangles
Compute each vertex's head pointer
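
The two steps named on this slide can be sketched sequentially in Python. This is not the BSGP source shown in the talk; it is a minimal reconstruction of the idea, with the function name `one_ring` and the output layout (a sorted triangle list plus a head-pointer array) chosen here for illustration.

```python
def one_ring(triangles, num_vertices):
    """Return (tris, head): the triangles around vertex v are
    tris[head[v]:head[v + 1]]."""
    # Triplicate: one (vertex, triangle) pair per triangle corner.
    pairs = [(v, t) for t, tri in enumerate(triangles) for v in tri]
    # Sort by vertex so that each vertex's triangles become contiguous.
    pairs.sort()
    # Head pointer: index of the first pair belonging to each vertex.
    head = [len(pairs)] * (num_vertices + 1)
    for i, (v, _) in enumerate(pairs):  # independent per i
        if i == 0 or pairs[i - 1][0] != v:
            head[v] = i
    # Vertices without triangles get an empty range.
    for v in range(num_vertices - 1, -1, -1):
        head[v] = min(head[v], head[v + 1])
    tris = [t for _, t in pairs]
    return tris, head


# A small fan of three triangles around vertex 0; vertex 5 is unused.
triangles = [(0, 1, 2), (0, 2, 3), (0, 3, 4)]
tris, head = one_ring(triangles, num_vertices=6)
ring_of_v0 = tris[head[0]:head[1]]
```

The sort is the only step that needs communication across all elements. The head-pointer test compares each pair only with its predecessor, so it parallelizes trivially.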

50
One-ring neighborhood CUDA version
Dataflow management
Kernels
51
BSGP Language Constructs

spawn and barrier
Insert CPU code: require
Thread manipulation: fork and kill
Communication: thread.get and thread.put
Reducing barriers: par
Parallel primitive operations, including reduce, scan and sort
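
To make the parallel primitives concrete, here is a sequential Python sketch of what reduce and scan compute and of how they break into parallel steps. It does not use BSGP's primitive library. The scan follows the well-known step-doubling (Hillis-Steele) scheme and the reduce uses pairwise tree reduction; the function names are hypothetical.

```python
def inclusive_scan(values):
    """Prefix sums in about log2(n) steps; each step is fully parallel."""
    data = list(values)
    offset = 1
    while offset < len(data):
        # Every element reads only the previous buffer, never the new one.
        data = [data[i] + data[i - offset] if i >= offset else data[i]
                for i in range(len(data))]
        offset *= 2
    return data


def reduce_sum(values):
    """Sum by repeatedly adding adjacent pairs (a reduction tree)."""
    data = list(values)
    while len(data) > 1:
        if len(data) % 2:
            data.append(0)  # pad so that every element has a partner
        data = [data[i] + data[i + 1] for i in range(0, len(data), 2)]
    return data[0] if data else 0


scanned = inclusive_scan([3, 1, 4, 1, 5])
total = reduce_sum([3, 1, 4, 1, 5])
```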

59
Sample Applications
Recursive ray tracer
Particle simulation
X3D parser
Adaptive tessellation
60
Recursive Ray Tracer

Both the BSGP and CUDA versions are implemented and optimized by the same programmer

61
Recursive Ray Tracer

Both the BSGP and CUDA versions are implemented and optimized by the same programmer
Clear advantage in code complexity
Similar performance and memory usage

              CUDA    BSGP
Render fps    4.00    4.61
Mem usage     144M    150M
Code lines    815     475
GPU funcs     10      3
Coding days   23      1
Tuning days   45      23
62
Particle Simulation

CUDA SDK demo
Rewrote simulation module in BSGP, reused GUI
code

63
Particle Simulation
CUDA SDK demo
Rewrote simulation module in BSGP, reused GUI code
Simpler and faster

               CUDA    BSGP
Render fps     187     290
Module lines   -       154
Total lines    2113    1579
Coding time    -       1 hour

Integration and sort preparation aren't bundled
Sort isn't bundled with sort preparation
Sort calls unbundled scan

64
X3D Parser

BSGP implementation
Incremental development
16 GPU functions, compiled into 82 kernels, 19k lines of assembly
15x faster than CPU parser
Extremely difficult in CUDA

A 7.03 MB X3D scene loaded in 183 ms
65
Adaptive Tessellation

A displacement map based terrain renderer

66
Adaptive Tessellation

Without thread manipulation
Parallelized over all input triangles
With thread manipulation
Parallelized over output vertices using thread.fork

        No thread man.       With thread man.
View    Ttess     FPS        Ttess     FPS       Vert. output
Side    43.9 ms   21.0       3.62 ms   142       1.14M
Top     5.0 ms    144        2.1 ms    249       322k

2x to 10x speedup
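
Parallelizing over outputs instead of inputs balances the load when input triangles produce very different numbers of vertices. The sketch below is a generic way to express that idea with a prefix sum and a search. It is not how `thread.fork` is implemented, which the slides do not describe, and the function name `expand` is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def expand(counts):
    """For each output slot, return (source index, index within source)."""
    ends = list(accumulate(counts))  # inclusive scan of output counts
    total = ends[-1] if ends else 0
    out = []
    for slot in range(total):  # one unit of work per output, independent
        src = bisect_right(ends, slot)  # first source whose range ends after slot
        start = ends[src] - counts[src]
        out.append((src, slot - start))
    return out


# Four input triangles that emit 1, 4, 0 and 2 output vertices.
work = expand([1, 4, 0, 2])
```

Every output slot does the same small amount of work regardless of which triangle it came from, and inputs that emit nothing consume no slots at all.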
67
Try BSGP Now!

BSGP compiler, programming guide, primitive
library, editor and all example code
http://www.kunzhou.net/BSGP

68
Summary

GPUs are fast and cheap, and are getting faster and cheaper
General-purpose computing
Re-think your algorithms to be massively parallel
Data structures: quadtree, octree, kd-tree
Algorithms: nonlinear/linear optimization, matrix-vector operations, parallel primitives
Programming the GPU
BSGP makes programmers' lives much easier

69
Questions?
Kun Zhou
kunzhou_at_acm.org
70
Other Real-Time Applications
Dynamic BRDF (sig2007)
Soft shadow (sig2006)
Smoke (sig2008)
Skinning (sig2008)
