Implementing the Render Cache and the Edge-and-Point Image on Graphics Hardware

About This Presentation

Description:

Subject: GI 2006
Author: Edgar Velázquez-Armendáriz

Slides: 35

Provided by: Edga51

Learn more at: https://www.cs.cornell.edu

Transcript and Presenter's Notes

1
Implementing the Render Cache and the
Edge-and-Point Image on Graphics Hardware

Edgar Velázquez-Armendáriz
Eugene Lee
Bruce Walter
Kavita Bala

2
Motivation

High quality shading is still too slow.
Not ready for interactivity.
It is slow even on the GPU.
Potential applications.
Architecture.
Modeling.
Movies.

3
Overview

GPU acceleration of the Render Cache and the
Edge-and-Point Image (EPI).

Points
Render Cache reconstruction
EPI reconstruction
Edges and Points
4
Render Cache overview
Projection
Depth cull
Interpolation
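
The three stages above can be illustrated with a minimal CPU sketch. It is not the authors' shader code: the pinhole camera, the `focal` parameter, the scalar colours, and the 3x3 averaging filter are all assumptions made for illustration, and a plain nearest-depth test stands in for the Render Cache's depth cull.

```python
import math

def reproject(points, width, height, focal=1.0):
    """Project cached samples (x, y, z, colour) into a new frame.

    Keeps only the nearest sample per pixel, a simple stand-in for depth culling.
    """
    depth = [[math.inf] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for x, y, z, c in points:
        if z <= 0.0:  # sample is behind the (hypothetical) pinhole camera
            continue
        # Perspective divide to [-1, 1], then map to pixel coordinates.
        u = math.floor((focal * x / z * 0.5 + 0.5) * width)
        v = math.floor((focal * y / z * 0.5 + 0.5) * height)
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
            color[v][u] = c
    return color, depth

def fill_holes(color):
    """Fill empty pixels with the mean of the valid samples in their 3x3 neighbourhood."""
    h, w = len(color), len(color[0])
    out = [row[:] for row in color]
    for v in range(h):
        for u in range(w):
            if color[v][u] is not None:
                continue
            nb = [color[j][i]
                  for j in range(max(0, v - 1), min(h, v + 2))
                  for i in range(max(0, u - 1), min(w, u + 2))
                  if color[j][i] is not None]
            if nb:
                out[v][u] = sum(nb) / len(nb)
    return out
```

Pixels with no valid neighbour stay empty in this sketch; a real system would request new samples for them.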
5
Edge-and-Point Image overview

Alternative display representation
Edge-constrained interpolation preserves sharp
features
Fast anti-aliasing
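
To make the idea of edge-constrained interpolation concrete, here is a minimal one-dimensional sketch. It assumes a single scanline of sparse samples and a hypothetical `edges` set marking discontinuities between adjacent pixels; the actual EPI works on 2D images with subpixel edges, so this only illustrates the constraint itself: a pixel never borrows colour from across an edge.

```python
def edge_constrained_fill(samples, edges):
    """Fill empty pixels on one scanline without interpolating across edges.

    samples: list with a float per pixel, or None where no sample exists.
    edges:   set of indices i meaning "an edge lies between pixel i and i + 1".
    """
    n = len(samples)
    out = list(samples)
    for i in range(n):
        if samples[i] is not None:
            continue
        reachable = []  # (distance, value) of the nearest sample on each side
        j = i
        while j > 0 and (j - 1) not in edges:  # walk left until an edge blocks us
            j -= 1
            if samples[j] is not None:
                reachable.append((i - j, samples[j]))
                break
        j = i
        while j < n - 1 and j not in edges:    # walk right until an edge blocks us
            j += 1
            if samples[j] is not None:
                reachable.append((j - i, samples[j]))
                break
        if reachable:
            # Inverse-distance weighting over the reachable samples only.
            wsum = sum(1.0 / d for d, _ in reachable)
            out[i] = sum(s / d for d, s in reachable) / wsum
    return out
```

With `samples = [0.0, None, None, 1.0]` and an edge between pixels 1 and 2, the two empty pixels take the values of their own sides (0.0 and 1.0) instead of blending, which is what keeps the discontinuity sharp.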

6
Presented work

Mapping to the hardware
The algorithms' components differ from standard
hardware rendering.
Overcome GPU limitations.
Results
GPU strategies.
Better interactivity.

7
Related Work

Interactive.
Shading cache. Tole02
Corrective texturing. Stamminger00
Tapestry. Simmons00
Adaptive Frameless Rendering. Dayal05
Distance impostors. Szirmay-Kalos05
Non-interactive.
Irradiance caching. Smyk05
Pure Hardware implementations.
Ray tracing. Purcell02, Carr06
Photon mapping. Purcell03

8
Talk overview

Algorithm overview.
Mapping to the hardware: strategies and challenges.
Results.
Discussion.

9
Overview
10
Overview
11
Overview
12
Public availability

The complete Cg source of the shaders is
available online
http://www.cs.cornell.edu/kb/projects/epigpu/

13
Talk overview

Algorithm overview.
Mapping to the hardware: strategies and challenges.
Results.
Discussion.

14
Mapping to the hardware

Sections are grouped by computational similarity:
Point processing
Edge finding
Edge constrained interpolation
Most of the processing has been moved to the GPU.

15
Point processing

Point Cloud as Vertex Buffer Object (VBO) and
Texture.
Multiple Render Targets (MRT) used to write all
information in a single pass.
Simplified predicted projection.
Not as accurate as the regular projection.

4 one-pixel points
1 splat point using one quarter of the point cloud
16
Point processing: Update

The Render Cache's structures are complex to map.
We cannot modify pipelined GPU data.
Use additional passes.

17
Point processing: Bandwidth issues

Point projection is bandwidth limited.
Point cloud update.
New samples request.
Write to the point cloud only the new samples.
We use vertex scatter.
Faster than replacing all the point cloud.
A static VBO is projected three times faster than
a constantly modified one.
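
The benefit of writing only the new samples can be sketched on the CPU as follows. This is only an analogy: on the GPU the scatter is done by rendering vertices that address their destination slots, whereas here a plain list stands in for the point cloud, and the `(slot, point)` pairs are a hypothetical representation of the new samples.

```python
def scatter_update(point_cloud, new_samples):
    """Overwrite only the slots named by new_samples, in place.

    point_cloud: fixed-size list of points.
    new_samples: iterable of (slot_index, point) pairs.
    Returns the number of slots written.
    """
    written = 0
    for slot, point in new_samples:
        point_cloud[slot] = point
        written += 1
    return written

# Usage: two new samples touch 2 of 8 slots; the other 6 are left untouched.
cloud = [(0.0, 0.0, 0.0)] * 8
touched = scatter_update(cloud, [(2, (1.0, 1.0, 1.0)), (5, (2.0, 2.0, 2.0))])
```

The data written scales with the number of new samples rather than with the size of the cloud, which is the property the slide relies on.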

18
Silhouette detection

The original EPI uses hierarchical trees.
Does not map well to GPU.
Brute force method on the GPU.
Avoids transferring edges every frame.
Faster than hierarchical structures!
Shadow edge detection left on the CPU.

Edge texture
Model edges
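
A brute-force silhouette test can be sketched as below. The slides do not give the exact test used, so this assumes the common definition: an edge is a silhouette when one of its two adjacent faces points toward the viewer and the other points away. The per-edge layout `(midpoint, normal_a, normal_b)` is hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def find_silhouettes(edges, eye):
    """Test every edge independently, with no hierarchy.

    edges: list of (midpoint, normal_a, normal_b), the normals being those
           of the two faces sharing the edge.
    eye:   viewer position.
    Returns one boolean per edge.
    """
    flags = []
    for mid, n_a, n_b in edges:
        view = sub(eye, mid)              # direction from the edge to the viewer
        front_a = dot(n_a, view) > 0.0
        front_b = dot(n_b, view) > 0.0
        flags.append(front_a != front_b)  # one face front-facing, the other back-facing
    return flags
```

Each edge is tested without reference to any other, which is why this kind of loop maps naturally onto one fragment per edge on the GPU.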
19
Silhouette detection: Limitations

GPU silhouette detection is limited by the fill
rate.
Texture memory constraints.
We need to keep all vertices as VBO.
Vertices and normals as textures.
One results texture.
Normals stored as fp16 to reduce space.
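
The space saving from fp16 normals can be checked with a short sketch. It uses Python's `struct` half-precision format (`e`) and assumes three tightly packed components per normal; the actual texture layout used by the authors is not stated in the slides.

```python
import struct

def pack_normals(normals, fmt_char):
    """Pack (x, y, z) normals little-endian; fmt_char is 'e' (fp16) or 'f' (fp32)."""
    flat = [c for n in normals for c in n]
    return struct.pack("<%d%s" % (len(flat), fmt_char), *flat)

def unpack_normals_fp16(data):
    """Inverse of pack_normals(..., 'e'): rebuild the list of (x, y, z) tuples."""
    count = len(data) // 2  # two bytes per half-precision value
    flat = struct.unpack("<%de" % count, data)
    return [tuple(flat[i:i + 3]) for i in range(0, count, 3)]

normals = [(0.0, 0.0, 1.0), (0.5, 0.5, -0.5)]
half = pack_normals(normals, "e")
full = pack_normals(normals, "f")
```

Here `len(half)` is half of `len(full)`. The values chosen are exactly representable in fp16; arbitrary unit normals would lose some precision.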

20
Edge Raster

Raster edges with subpixel precision.
Depends on model complexity.
Extended lines as described in SEN03.
Filtered depth as read-only depth buffer.
Free occlusion culling!

No depth texture
With depth texture
21
Edge Constrained Interpolation

Multi-pass pixel shaders.
Very long.
A lot of texture accesses.
Image resolution dependent.
Use look-up tables encoded as textures.
Avoid control code in shaders.
Encode original EPI operations.
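
The look-up-table idea can be sketched as follows. The table contents are an illustrative assumption, not the EPI's actual tables: a 4-bit mask says which of the four neighbours (left, right, up, down) is blocked by an edge, and the table returns precomputed weights, so the per-pixel filter needs no branching.

```python
def build_weight_table():
    """table[mask] -> weights for (left, right, up, down).

    Bit k of mask set means direction k is blocked by an edge.
    Built once, offline; on the GPU it would be stored as a texture.
    """
    table = []
    for mask in range(16):
        open_dirs = [0 if (mask >> k) & 1 else 1 for k in range(4)]
        total = sum(open_dirs)
        table.append(tuple(o / total if total else 0.0 for o in open_dirs))
    return table

WEIGHTS = build_weight_table()

def filter_pixel(neighbours, mask):
    """Branch-free blend: one table fetch, then a weighted sum."""
    w = WEIGHTS[mask]
    return sum(wi * ni for wi, ni in zip(w, neighbours))
```

For example, `filter_pixel((1.0, 3.0, 5.0, 7.0), 0b0011)` ignores the blocked left and right neighbours and averages the other two.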

22
Future trends

Branching granularity.
Some filters require fine granularity to take
advantage of dynamic branching.
This issue is being solved with newer cards,
beginning with the ATI X1000 series.
Bit operations not directly supported.
DirectX 10 will support them.
Bottom line: GPU implementations will get better
and faster.

23
Limitations

Fill rate and texture access.
These characteristics constantly improve with
newer hardware with more pipelines and faster
clock frequencies.
Improve by reducing shader length.
Number of registers used is still important.
A 180-instruction shader with 25 registers
performs 50% slower than a 215-instruction
shader with 24 registers on our GPU.

24
Talk overview

Algorithm overview.
Mapping to the hardware: strategies and challenges.
Results.
Discussion.

25
Test platform

Test environment.
Software written in C, Cg 1.4rc, and Java
through JNI under Windows XP.
Pentium 4 EE 3.2 GHz dual core, 2 GB RAM, dual
Nvidia GeForce 7800 GTX (driver version 81.85).
Test scenes.
Cornell Box
Chains
Mackintosh Room
David Head
Dragon

26
Results: FPS

GPU version is 60-110% faster than the original.
Speedup increases with scene complexity.

27
Results: Speed increase from CPU
28
Results: Rendering times
29
Talk overview

Algorithm overview.
Mapping to the hardware: strategies and challenges.
Results.
Discussion.

30
Discussion

Point projection, even though it maps
straightforwardly to the GPU, is the bottleneck.
Image filters are very fast in spite of their
multiple texture accesses and multiple passes.
We originally thought the opposite would be true!

31
Discussion

Projection is not optimal.
We wanted to use Vertex Texture Fetch (VTF) for
mapping the point cloud update, but it was slower
than Render to Vertex Array (RTV).
Dual GPU rendering with Scalable Link Interface
(SLI) showed marginal gains.

32
Future performance

Texture accesses are very fast and efficient.
Transferring vertex data on the GPU is too slow
to be fully useful.
Scatter write on pixel shaders and geometry
shaders may allow complete data management on the
GPU.

33
Conclusions

We presented a hybrid GPU/CPU system for the
Render Cache and the EPI using commodity graphics
hardware.
Our implementation is 60-110% faster than a pure
CPU implementation and frees the CPU up for other
operations.
System performance is likely to improve with
the current trend of GPUs.

34
Questions?
Implementing the Render Cache and the
Edge-and-Point Image on Graphics Hardware
http://www.cs.cornell.edu/kb/projects/epigpu/
