Title: Implementing the Render Cache and the Edge-and-Point Image on Graphics Hardware
1. Implementing the Render Cache and the Edge-and-Point Image on Graphics Hardware
- Edgar Velázquez-Armendáriz
- Eugene Lee
- Bruce Walter
- Kavita Bala
2. Motivation
- High quality shading is still too slow.
- Not ready for interactivity.
- It is slow even on the GPU.
- Potential applications.
  - Architecture.
  - Modeling.
  - Movies.
3. Overview
- GPU acceleration of the Render Cache and the Edge-and-Point Image (EPI).
[Figure: Render Cache reconstruction from points; EPI reconstruction from edges and points]
4. Render Cache overview
[Figure: Render Cache stages: projection, depth cull, interpolation]
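The three stages named on this slide can be illustrated with a small CPU sketch. This is not the authors' shader code: it assumes a pinhole camera with a hypothetical `focal` parameter, grayscale point colors, a plain nearest-point depth test in place of the Render Cache's depth-culling heuristic, and a 3x3 mean as the interpolation filter.

```python
def project(points, width, height, focal):
    """Project camera-space points (x, y, z, color) into a sparse image.

    Keeps only the nearest point per pixel (a z-buffer style depth cull).
    """
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for x, y, z, c in points:
        if z <= 0.0:  # point is behind the camera
            continue
        px = int(round(focal * x / z + (width - 1) / 2))
        py = int(round(focal * y / z + (height - 1) / 2))
        if 0 <= px < width and 0 <= py < height and z < depth[py][px]:
            depth[py][px] = z
            color[py][px] = c
    return color, depth


def interpolate(color):
    """Fill empty pixels with the mean of their valid 3x3 neighbours."""
    h, w = len(color), len(color[0])
    out = [row[:] for row in color]
    for y in range(h):
        for x in range(w):
            if color[y][x] is not None:
                continue
            vals = [color[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))
                    if color[j][i] is not None]
            if vals:
                out[y][x] = sum(vals) / len(vals)
    return out


# Two points land on the same pixel; the nearer one survives the depth cull.
points = [(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 2.0, 5.0), (1.0, 0.0, 1.0, 3.0)]
sparse, _ = project(points, width=5, height=5, focal=2.0)
filled = interpolate(sparse)
```

Each pixel in `interpolate` depends only on a fixed neighbourhood of the sparse image, which is the kind of access pattern that a pixel shader handles well.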
5. Edge-and-Point Image overview
- Alternative display representation
- Edge-constrained interpolation preserves sharp features
- Fast anti-aliasing
6. Presented work
- Mapping to the hardware
  - The algorithms' components differ from standard hardware rendering.
  - Overcome GPU limitations.
- Results
  - GPU strategies.
  - Better interactivity.
7. Related Work
- Interactive.
  - Shading cache. [Tole02]
  - Corrective texturing. [Stamminger00]
  - Tapestry. [Simmons00]
  - Adaptive Frameless Rendering. [Dayal05]
  - Distance impostors. [Szirmay-Kalos05]
- Non-interactive.
  - Irradiance caching. [Smky05]
- Pure hardware implementations.
  - Ray tracing. [Purcell02, Carr06]
  - Photon mapping. [Purcell03]
8. Talk overview
- Algorithm overview.
- Mapping to the hardware: strategies and challenges.
- Results.
- Discussion.
9. Overview
10. Overview
11. Overview
12. Public availability
- The complete Cg source of the shaders is available online:
  http://www.cs.cornell.edu/kb/projects/epigpu/
13. Talk overview
- Algorithm overview.
- Mapping to the hardware: strategies and challenges.
- Results.
- Discussion.
14. Mapping to the hardware
- Sections are grouped by computational similarity:
  - Point processing
  - Edge finding
  - Edge-constrained interpolation
- Most of the processing has been moved to the GPU.
15. Point processing
- Point Cloud as Vertex Buffer Object (VBO) and Texture.
- Multiple Render Targets (MRT) used to write all information in a single pass.
- Simplified predicted projection.
- Not as accurate as the regular projection.
[Figure: 4 one-pixel points compared with 1 splat point using one quarter of the point cloud]
16. Point processing: Update
- The Render Cache's structures are complex to map.
- We cannot modify pipelined GPU data.
- Use additional passes.
17. Point processing: Bandwidth issues
- Point projection is bandwidth limited.
- Point cloud update.
- New samples request.
- Write to the point cloud only the new samples.
- We use vertex scatter.
- Faster than replacing all the point cloud.
- A static VBO is projected three times faster than a constantly modified one.
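The idea of writing only the new samples can be sketched on the CPU with a plain list standing in for the point cloud buffer. The function names and the "slots written" cost measure are illustrative assumptions; the real update runs on the GPU and is not shown on the slides.

```python
def scatter_update(point_cloud, new_samples):
    """Overwrite only the slots listed in new_samples, in place.

    new_samples is a list of (slot_index, point) pairs. Returns the number
    of slots written, a rough proxy for the upload bandwidth used.
    """
    for slot, point in new_samples:
        point_cloud[slot] = point
    return len(new_samples)


def full_replace(point_cloud, new_cloud):
    """Baseline: rewrite every slot, as if re-uploading the whole buffer."""
    point_cloud[:] = new_cloud
    return len(new_cloud)


cloud = [(0.0, 0.0, 0.0)] * 8
written = scatter_update(cloud, [(2, (1.0, 1.0, 1.0)), (5, (2.0, 2.0, 2.0))])
rewritten = full_replace(list(cloud), [(9.0, 9.0, 9.0)] * 8)
```

In this toy case the scatter path touches 2 slots while the full replacement touches all 8, which is the motivation for scattering only the new samples.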
18. Silhouette detection
- The original EPI uses hierarchical trees.
- Does not map well to GPU.
- Brute force method on the GPU.
- Avoids transferring edges every frame.
- Faster than hierarchical structures!
- Shadow edge detection left on the CPU.
[Figure: model edges and the resulting edge texture]
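A brute-force silhouette test evaluates every model edge independently, which is why it suits the GPU better than a hierarchical traversal. The sketch below uses the standard facing test (an edge is a silhouette when its two adjacent faces face opposite ways relative to the eye). The data layout, one tuple per edge holding a point on the edge and the two face normals, is an assumption for illustration and not the texture layout used in the talk.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def is_silhouette(edge_point, normal_a, normal_b, eye):
    """True when one adjacent face is front-facing and the other back-facing."""
    view = sub(eye, edge_point)
    return dot(normal_a, view) * dot(normal_b, view) < 0.0


def find_silhouettes(edges, eye):
    """Brute force: test every edge, with no hierarchy and no early exit."""
    return [is_silhouette(p, na, nb, eye) for p, na, nb in edges]


eye = (0.0, 0.0, 5.0)
edges = [
    ((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)),  # faces disagree
    ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)),   # both front-facing
]
flags = find_silhouettes(edges, eye)
```

Because each result depends on one edge only, the list of flags plays the role of a results texture with one entry per edge.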
19. Silhouette detection: Limitations
- GPU silhouette detection is limited by the fill rate.
- Texture memory constraints.
- We need to keep all vertices as VBO.
- Vertices and normals as textures.
- One results texture.
- Normals stored as fp16 to reduce space.
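The space saving from fp16 normals is easy to check on the CPU. This sketch packs the same normals as 32-bit and 16-bit floats with Python's `struct` module; the sample normals are made up, and the packing order is an assumption rather than the texture format used in the talk.

```python
import struct


def pack_normals(normals, fmt):
    """Pack a list of (x, y, z) normals using struct format 'f' or 'e'."""
    flat = [component for normal in normals for component in normal]
    return struct.pack("<%d%s" % (len(flat), fmt), *flat)


normals = [(0.0, 0.0, 1.0), (0.6, 0.8, 0.0)]
fp32 = pack_normals(normals, "f")  # 4 bytes per component
fp16 = pack_normals(normals, "e")  # 2 bytes per component
restored = struct.unpack("<6e", fp16)
```

Half precision halves the storage at the cost of a small rounding error per component, which is usually tolerable for a facing test on unit normals.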
20. Edge Raster
- Raster edges with subpixel precision.
- Depends on model complexity.
- Extended lines as described in [SEN03].
- Filtered depth as read-only depth buffer.
- Free occlusion culling!
[Figure: edge raster with no depth texture compared with depth texture]
21. Edge-Constrained Interpolation
- Multi-pass pixel shaders.
- Very long.
- A lot of texture accesses.
- Image resolution dependent.
- Use look-up tables encoded as textures.
- Avoid control code in shaders.
- Encode original EPI operations.
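Replacing control code with a look-up table can be sketched as follows. The encoding here is hypothetical and is not the one used by the EPI shaders: an 8-bit mask marks which of a pixel's 8 neighbours are reachable without crossing an edge, and the table maps each mask to normalized interpolation weights, so the filter itself is a fetch plus a weighted sum with no branching.

```python
def build_weight_table():
    """Entry m holds 8 weights; bit i of m set means neighbour i is reachable."""
    table = []
    for mask in range(256):
        bits = [(mask >> i) & 1 for i in range(8)]
        total = sum(bits)
        table.append([b / total if total else 0.0 for b in bits])
    return table


# Built once, like a texture uploaded before rendering starts.
WEIGHTS = build_weight_table()


def interpolate_pixel(samples, mask):
    """Weighted sum of the 8 neighbour samples using the precomputed table."""
    weights = WEIGHTS[mask]
    return sum(w * s for w, s in zip(weights, samples))


samples = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
value = interpolate_pixel(samples, 0b00000011)  # only neighbours 0 and 1
```

Neighbours blocked by an edge get weight zero from the table, so colors never bleed across the edge even though the shader-side code contains no conditionals.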
22. Future trends
- Branching granularity.
- Some filters require fine granularity to take advantage of dynamic branching.
- This issue is being solved with newer cards, beginning with the ATI X1000 series.
- Bit operations not directly supported.
- DirectX 10 will support them.
- Bottom line: GPU implementation will get better and faster.
23. Limitations
- Fill rate and texture access.
- These characteristics constantly improve with newer hardware with more pipelines and faster clock frequencies.
- Improve by reducing shader length.
- Number of registers used is still important.
- A 180-instruction shader with 25 registers performs 50% slower than a 215-instruction shader with 24 registers on our GPU.
24. Talk overview
- Algorithm overview.
- Mapping to the hardware: strategies and challenges.
- Results.
- Discussion.
25. Test platform
- Test environment.
  - Software written in C, Cg 1.4rc, and Java through JNI under Windows XP.
  - Pentium 4 EE 3.2 GHz dual core, 2 GB RAM, dual Nvidia GeForce 7800 GTX (81.85).
- Test scenes.
  - Cornell Box
  - Chains
  - Mackintosh Room
  - David Head
  - Dragon
26. Results: FPS
- GPU version is 60-110% faster than the original.
- Speedup increases along with scene complexity.
27. Results: Speed increase from CPU
28. Results: Rendering times
29. Talk overview
- Algorithm overview.
- Mapping to the hardware: strategies and challenges.
- Results.
- Discussion.
30. Discussion
- Point projection, even though it maps straightforwardly to the GPU, is the bottleneck.
- Image filters are very fast in spite of their multiple texture accesses and multiple passes.
- We originally thought the opposite would be true!
31. Discussion
- Projection is not optimal.
- We wanted to use Vertex Texture Fetch (VTF) for mapping the point cloud update, but it was slower than Render to Vertex Array (RTV).
- Dual GPU rendering with Scalable Link Interface (SLI) showed marginal gains.
32. Future performance
- Texture accesses are very fast and efficient.
- Transferring vertex data on the GPU is too slow to be fully useful.
- Scatter write on pixel shaders and geometry shaders may allow complete data management on the GPU.
33. Conclusions
- We presented a hybrid GPU/CPU system for the Render Cache and the EPI using commodity graphics hardware.
- Our implementation is 60-110% faster than a pure CPU implementation and frees the CPU up for other operations.
- System performance is likely to improve with the current trend of GPUs.
34. Questions?
Implementing the Render Cache and the Edge-and-Point Image on Graphics Hardware
http://www.cs.cornell.edu/kb/projects/epigpu/