Title: GPU Shading and Rendering
3. GPU Shading and Rendering: Introduction
4. GPU
- GPU: Graphics Processing Unit
- Designed for real-time graphics
- Present in almost every PC
- Increasing realism and complexity
(Screenshot: America's Army)
5. GPU computation
(Diagram: CPU to Displayed Pixels)
6. Low-level code
    !!ARBvp1.0
    # Transform the normal to view space
    TEMP Nv, Np;
    DP3 Nv.x, state.matrix.modelview.invtrans.row[0], vertex.normal;
    DP3 Nv.y, state.matrix.modelview.invtrans.row[1], vertex.normal;
    DP3 Nv.z, state.matrix.modelview.invtrans.row[2], vertex.normal;
    MAD Np, Nv, {.9,.9,.9,0}, {0,0,0,1};
    # screen position from vertex
    TEMP Vp;
    DP4 Vp.x, state.matrix.mvp.row[0], vertex.position;
    DP4 Vp.y, state.matrix.mvp.row[1], vertex.position;
    DP4 Vp.z, state.matrix.mvp.row[2], vertex.position;
    DP4 Vp.w, state.matrix.mvp.row[3], vertex.position;
    # interpolate
    MAD Np, Np, -vertex.color.x, Np;
    MAD result.position, Vp, vertex.color.x, Np;
    END
7. High-level code
    void main() {
        vec4 Kin = gl_Color;    // key input
        // screen position from vertex, texture and normal
        vec4 Vp = ftransform();
        vec4 Tp = vec4(gl_MultiTexCoord0.xy*1.8-.9, 0, 1);
        vec4 Np = vec4(nn*.9, 1);    // nn: normal, declared off-slide
        // interpolate between Vp, Tp and Np
        gl_Position = Vp;
        gl_Position = mix(Tp, gl_Position, pow(1.-Kin.x, 8.));
        gl_Position = mix(Np, gl_Position, pow(1.-Kin.y, 8.));
        // copy to output
        gl_TexCoord[0] = gl_MultiTexCoord0;
        gl_TexCoord[1] = Vp;
        gl_TexCoord[3] = Kin;
    }
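The interpolation step in the high-level shader can be followed on the CPU. The sketch below is a plain Python stand-in, not shader code: mix() follows the GLSL definition a*(1-t) + b*t, while blend_position() and the sample positions are hypothetical names and values chosen for illustration.

```python
def mix(a, b, t):
    """GLSL-style mix: componentwise a*(1-t) + b*t."""
    return tuple(x * (1.0 - t) + y * t for x, y in zip(a, b))

def blend_position(vp, tp, np_, kin_x, kin_y):
    """Mirror the three gl_Position assignments from the slide."""
    pos = vp
    pos = mix(tp, pos, (1.0 - kin_x) ** 8.0)   # kin_x near 1 pulls toward Tp
    pos = mix(np_, pos, (1.0 - kin_y) ** 8.0)  # kin_y near 1 pulls toward Np
    return pos

# Hypothetical sample positions (not from the slides).
vp = (0.2, 0.4, 0.5, 1.0)    # screen position from the vertex
tp = (0.9, -0.9, 0.0, 1.0)   # position derived from the texture coordinate
np_ = (0.0, 0.9, 0.0, 1.0)   # position derived from the normal

print(blend_position(vp, tp, np_, 0.0, 0.0))
print(blend_position(vp, tp, np_, 1.0, 0.0))
print(blend_position(vp, tp, np_, 0.5, 1.0))
```

With both keys at 0 the vertex keeps its screen position; the eighth power keeps the blend weight near zero until a key gets close to 1.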
8. Non-real-time vs. Real-time
- Not real-time
  - Developed from general CPU code
  - Seconds to hours per frame
  - 1000s of lines
  - Unlimited computation, texture, memory, ...
- Real-time
  - Developed from fixed-function hardware
  - Tens of frames per second
  - 1000s of instructions
  - Limited computation, texture, memory, ...
9. Non-real-time vs. Real-time
(Diagram: two pipelines, side by side)
- Not real-time stages: Application, Displacement, Surface, Light, Volume, Atmosphere, Imager, Displayed Pixels
- Real-time stages: Application, Texture/Buffer, Vertex, Geometry, Fragment, Displayed Pixels
10. History (not real-time)
- Testbed [Whitted and Weimer 1981]
- Shade Trees [Cook 1984]
- Image Synthesizer [Perlin 1985]
- RenderMan [Hanrahan and Lawson 1990]
- Multi-pass RenderMan [Peercy et al. 2000]
- GPU acceleration [Wexler et al. 2005]
11. History (real-time)
- Custom HW [Olano and Lastra 1998]
- Multi-pass standard HW [Peercy et al. 2000]
- Register combiners [NVIDIA 2000]
- Vertex programs [Lindholm et al. 2001]
- Compiling to mixed HW [Proudfoot et al. 2001]
- Fragment programs
- Standardized languages
- Geometry shaders [Blythe 2006]
12. Choices
- OS: Windows, Mac, Linux
- API: DirectX, OpenGL
- Language: HLSL, GLSL, Cg, ...
- Compiler: DirectX, OpenGL, Cg, ASHLI
- Runtime: CgFX, ASHLI, OSG (and others), sample code
13. Major Commonalities
- Vertex and Fragment/Pixel
- C-like, if/while/for
- Structs and arrays
- Float plus small vector and matrix
- Swizzle and mask (a.xyz = b.xxw)
- Common math and shading functions
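The swizzle and mask notation can be unpacked with a small stand-in. This is a sketch in plain Python, not any shading language's implementation: the swizzle on the right-hand side picks and reorders source components, and the mask on the left-hand side limits which destination components are written. The helper names swizzle() and masked_assign() are hypothetical.

```python
COMPONENT = {"x": 0, "y": 1, "z": 2, "w": 3}

def swizzle(v, pattern):
    """Read components of v in the order named by pattern, e.g. 'xxw'."""
    return [v[COMPONENT[c]] for c in pattern]

def masked_assign(dst, mask, values):
    """Return a copy of dst with only the components named in mask replaced."""
    if len(mask) != len(values):
        raise ValueError("mask and value count must match")
    out = list(dst)
    for c, value in zip(mask, values):
        out[COMPONENT[c]] = value
    return out

a = [0.0, 0.0, 0.0, 9.0]
b = [1.0, 2.0, 3.0, 4.0]

# Equivalent of the slide's a.xyz = b.xxw
a = masked_assign(a, "xyz", swizzle(b, "xxw"))
print(a)  # [1.0, 1.0, 4.0, 9.0]
```

Note that a.w is untouched because w is not in the mask, and b.x is read twice because the swizzle may repeat components.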
14. GPU Parallelism
Pipeline
15. GPU Parallelism
Pipeline
SPMD Parallel: Fragment Stream
16. GPU Parallelism
SIMD Parallel: 2x2 Block
SPMD Parallel: Fragment Stream
17. GPU Parallelism
SIMD Parallel: 2x2 Block
Pipeline (NVIDIA)
18. GPU Parallelism
Vector Parallel, Limited MIMD
Pipeline (NVIDIA)
19. Managing GPU Programming
- Simplified computational model
  - Bonus: consistent as hardware changes
- All stages SIMD
- Explicit 4-element SIMD vectors
- Fixed conversion / remapping between each stage
20. Vertex
- One element in / one out
- NO communication
- Can select fragment address
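One way to read the vertex bullets is as a map with a scatter at the end: each output depends only on its own input, and the output position decides which pixel the result lands on. The sketch below is a CPU stand-in under that reading; vertex_program(), pixel_address(), and the 8x8 target are hypothetical.

```python
def vertex_program(vertex):
    """One element in, one out: shrink the position by half, pass color through."""
    x, y = vertex["position"]
    return {"position": (x * 0.5, y * 0.5), "color": vertex["color"]}

def run_vertex_stage(vertices, program):
    """Apply the program to each vertex independently (no communication)."""
    return [program(v) for v in vertices]

def pixel_address(position, width, height):
    """Map a position in [-1, 1] x [-1, 1] to an integer pixel address."""
    px = min(int((position[0] + 1.0) * 0.5 * width), width - 1)
    py = min(int((position[1] + 1.0) * 0.5 * height), height - 1)
    return px, py

vertices = [
    {"position": (1.0, -1.0), "color": "red"},
    {"position": (-1.0, 1.0), "color": "blue"},
]
shaded = run_vertex_stage(vertices, vertex_program)
addresses = [pixel_address(v["position"], 8, 8) for v in shaded]
print(addresses)  # [(6, 2), (2, 6)]
```

Because the program never sees another vertex, the stage can process vertices in any order or all at once.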
21. Geometry
- More next (Blythe talk)
- One element in / 0 to about 100 out
  - Limited by hardware buffer sizes
- Like vertex
  - NO communication
  - Can select fragment address
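The variable output count is what separates this stage from the vertex stage. A minimal CPU sketch, assuming a 1D "segment" primitive and a made-up cap of 100 outputs standing in for the hardware buffer limit (geometry_program(), run_geometry_stage(), and MAX_OUTPUT are all hypothetical):

```python
MAX_OUTPUT = 100  # hypothetical stand-in for the hardware buffer limit

def geometry_program(segment, pieces):
    """One segment in, zero or more out: drop degenerate input, subdivide the rest."""
    start, end = segment
    if start == end:
        return []  # zero outputs: the primitive is discarded
    pieces = min(pieces, MAX_OUTPUT)  # output count is capped
    step = (end - start) / pieces
    return [(start + i * step, start + (i + 1) * step) for i in range(pieces)]

def run_geometry_stage(primitives, pieces):
    """Each primitive is expanded independently (no communication)."""
    out = []
    for primitive in primitives:
        out.extend(geometry_program(primitive, pieces))
    return out

result = run_geometry_stage([(0.0, 1.0), (2.0, 2.0)], 4)
print(result)  # [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]
```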
22. Fragment
- Biggest computational resource
- One element in / 0 or 1 out
- Cannot change destination address
  - "I am element x,y in an array, what is my value?"
- Effectively no communication
- Conditionals expensive
  - Better if block coherence
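The "I am element x,y" model amounts to a gather: the program is told its address and returns a value, or nothing. The sketch below emulates that on the CPU. The function names, the None-as-discard convention, and the checkerboard rule are hypothetical choices for illustration.

```python
def fragment_program(x, y):
    """Return the value for pixel (x, y), or None to discard (0 outputs)."""
    if (x + y) % 2 == 1:
        return None          # discard: this pixel keeps its old value
    return x * 10 + y        # 1 output, written at (x, y) and nowhere else

def run_fragment_stage(width, height, program, clear_value=0):
    """Call the program once per pixel; the stage, not the program, picks the address."""
    framebuffer = [[clear_value] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            value = program(x, y)
            if value is not None:
                framebuffer[y][x] = value
    return framebuffer

image = run_fragment_stage(3, 2, fragment_program, clear_value=-1)
print(image)  # [[0, -1, 20], [-1, 11, -1]]
```

The program has no way to name a different destination, which is the "cannot change destination address" restriction on the slide.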
23. Program / Multiple Passes
- Communication
  - None in one pass
  - Arbitrary read addresses between passes
- Data layout
  - No persistent per-processor memory
  - No penalty to change
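The communication rule can be illustrated with two passes over a 1D array. In this CPU sketch, pass one cannot see its neighbors' results, but pass two may read any address of pass one's finished output. The names run_pass(), pass_one(), pass_two(), and the "pass_one" texture key are hypothetical.

```python
def run_pass(size, program, textures):
    """Run one pass: each element is computed independently from read-only textures."""
    return [program(i, textures) for i in range(size)]

def pass_one(i, textures):
    """First pass: each element depends only on its own index."""
    return i * i

def pass_two(i, textures):
    """Second pass: read neighboring addresses from the first pass's output."""
    src = textures["pass_one"]
    left = src[max(i - 1, 0)]               # clamp at the edges
    right = src[min(i + 1, len(src) - 1)]
    return left + src[i] + right

first = run_pass(4, pass_one, {})
second = run_pass(4, pass_two, {"pass_one": first})
print(first)   # [0, 1, 4, 9]
print(second)  # [1, 5, 14, 22]
```

Nothing is carried over between passes except the output array itself, which matches the "no persistent per-processor memory" bullet.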
24. Multiple passes
- GPGPU
- Non-local effects
  - Shadow maps
  - Texture space
- Precomputation
  - Fix some degrees of freedom
  - Factor into 1D to 3D functions
  - Project input or output into another space
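As one illustration of the factoring bullet, a separable 2D function can be replaced by lookups into a precomputed 1D table. The Gaussian used here is an assumed example, not one taken from the slides, and the helper names are hypothetical.

```python
import math

def gaussian_2d(x, y, sigma=1.0):
    """Direct evaluation: one exp() per sample."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))

def build_table(size, sigma=1.0):
    """Precomputation pass: a 1D 'texture' holding exp(-i^2 / (2 sigma^2))."""
    return [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(size)]

def gaussian_from_table(table, x, y):
    """Run-time evaluation: two 1D lookups and a multiply, using symmetry in x and y."""
    return table[abs(x)] * table[abs(y)]

table = build_table(8)
direct = gaussian_2d(2, -3)
factored = gaussian_from_table(table, 2, -3)
print(abs(direct - factored) < 1e-12)  # True
```

The factoring works because exp(-(x^2 + y^2)) equals exp(-x^2) * exp(-y^2); functions that do not separate exactly would need an approximate factorization instead.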