Implementing the Render Cache and the Edge-and-Point Image on Graphics Hardware

Subject: GI 2006
Author: Edgar Velázquez-Armendáriz



1
Implementing the Render Cache and the
Edge-and-Point Image on Graphics Hardware
  • Edgar Velázquez-Armendáriz
  • Eugene Lee
  • Bruce Walter
  • Kavita Bala

2
Motivation
  • High quality shading is still too slow.
  • Not ready for interactivity.
  • It is slow even on the GPU.
  • Potential applications: architecture, modeling, movies.

3
Overview
  • GPU acceleration of the Render Cache and the
    Edge-and-Point Image (EPI).

[Figure labels: Points; Render Cache reconstruction; Edges and Points; EPI reconstruction]
4
Render Cache overview
[Figure: pipeline stages: Projection, Depth cull, Interpolation]
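The three stages named on this slide can be sketched on the CPU as follows. This is only an illustration under stated assumptions, not the authors' shader code: the pinhole camera at the origin, the function names `project_points` and `fill_holes`, and the 3x3 averaging used for interpolation are all hypothetical choices.

```python
import math


def project_points(points, width, height, focal):
    """Project cached samples (x, y, z, color) into an image with a depth test.

    Assumption: pinhole camera at the origin looking down +z.
    """
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for x, y, z, c in points:
        if z <= 0.0:
            continue  # sample is behind the camera
        px = math.floor(focal * x / z + width / 2)
        py = math.floor(focal * y / z + height / 2)
        if 0 <= px < width and 0 <= py < height and z < depth[py][px]:
            depth[py][px] = z  # depth cull: keep only the nearest sample
            color[py][px] = c
    return color, depth


def fill_holes(color):
    """Interpolation: fill empty pixels with the mean of valid 3x3 neighbors."""
    h, w = len(color), len(color[0])
    out = [row[:] for row in color]
    for y in range(h):
        for x in range(w):
            if color[y][x] is not None:
                continue
            near = [color[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))
                    if color[j][i] is not None]
            if near:
                out[y][x] = sum(near) / len(near)
    return out
```

Pixels with no valid neighbor stay empty in this sketch; a real system would request new samples for them.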
5
Edge-and-Point Image overview
  • Alternative display representation
  • Edge-constrained interpolation preserves sharp
    features
  • Fast anti-aliasing
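To illustrate why edge-constrained interpolation preserves sharp features, here is a one-dimensional sketch. It is a simplification made for this transcript, not the EPI algorithm itself: the scanline layout, the `edges` set, and the function name `interpolate_scanline` are assumptions.

```python
def interpolate_scanline(samples, edges):
    """Fill empty pixels from the nearest samples not separated by an edge.

    samples: one float or None per pixel.
    edges: set of indices i, meaning an edge lies between pixel i and i + 1.
    """
    n = len(samples)
    out = list(samples)
    for i in range(n):
        if samples[i] is not None:
            continue
        reachable = []
        # Walk left until a sample is found or an edge blocks the way.
        j = i
        while j > 0 and (j - 1) not in edges:
            j -= 1
            if samples[j] is not None:
                reachable.append(samples[j])
                break
        # Walk right under the same rule.
        j = i
        while j < n - 1 and j not in edges:
            j += 1
            if samples[j] is not None:
                reachable.append(samples[j])
                break
        if reachable:
            out[i] = sum(reachable) / len(reachable)
    return out
```

With samples `[1.0, None, None, 0.0]` and an edge between pixels 1 and 2, each empty pixel copies the sample on its own side of the edge; without the edge, both would blur to the average.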

6
Presented work
  • Mapping to the hardware
  • The algorithms' components differ from standard
    hardware rendering.
  • Overcome GPU limitations.
  • Results
  • GPU strategies.
  • Better interactivity.

7
Related Work
  • Interactive:
  • Shading cache [Tole02].
  • Corrective texturing [Stamminger00].
  • Tapestry [Simmons00].
  • Adaptive Frameless Rendering [Dayal05].
  • Distance impostors [Szirmay-Kalos05].
  • Non-interactive:
  • Irradiance caching [Smky05].
  • Pure hardware implementations:
  • Ray tracing [Purcell02, Carr06].
  • Photon mapping [Purcell03].

8
Talk overview
  • Algorithm overview.
  • Mapping to the hardware: strategies and
    challenges.
  • Results.
  • Discussion.

9
Overview
10
Overview
11
Overview
12
Public availability
  • The complete Cg source of the shaders is
    available online
  • http://www.cs.cornell.edu/kb/projects/epigpu/

13
Talk overview
  • Algorithm overview.
  • Mapping to the hardware: strategies and
    challenges.
  • Results.
  • Discussion.

14
Mapping to the hardware
  • Sections are grouped by computational similarity:
  • Point processing
  • Edge finding
  • Edge constrained interpolation
  • Most of the processing has been moved to the GPU.

15
Point processing
  • Point Cloud as Vertex Buffer Object (VBO) and
    Texture.
  • Multiple Render Targets (MRT) used to write all
    information in a single pass.
  • Simplified predicted projection.
  • Not as accurate as the regular projection.

[Figure panels: four one-pixel points; one splat point using one quarter of the point cloud]
16
Point processing: Update
  • The Render Cache's structures are complex to map.
  • We cannot modify pipelined GPU data.
  • Use additional passes.

17
Point processing: Bandwidth issues
  • Point projection is bandwidth limited.
  • Point cloud update.
  • New samples request.
  • Write only the new samples to the point cloud.
  • We use vertex scatter.
  • Faster than replacing the entire point cloud.
  • A static VBO is projected three times faster than
    a constantly modified one.
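The bandwidth argument can be modeled on the CPU with a short sketch. The slide does not describe how the vertex scatter is implemented on the GPU, so this only counts how many points each strategy would transfer; the function names `full_replace` and `scatter_update` are hypothetical.

```python
def full_replace(cloud, new_cloud):
    """Re-upload every point; returns the number of points transferred."""
    cloud[:] = new_cloud
    return len(new_cloud)


def scatter_update(cloud, updates):
    """Write only the new samples; updates is a list of (index, point) pairs.

    Returns the number of points transferred.
    """
    for index, point in updates:
        cloud[index] = point
    return len(updates)
```

When only a small fraction of the samples changes per frame, the scatter path moves proportionally less data and leaves the rest of the buffer untouched.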

18
Silhouette detection
  • The original EPI uses hierarchical trees.
  • Does not map well to GPU.
  • Brute force method on the GPU.
  • Avoids transferring edges every frame.
  • Faster than hierarchical structures!
  • Shadow edge detection left on the CPU.
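A brute-force silhouette pass tests every model edge independently, which is what makes it a good fit for parallel hardware. The sketch below uses the standard front-facing/back-facing test on the two faces adjacent to an edge; whether the authors' shader uses exactly this test is an assumption, as are the data layout and the name `silhouette_edges`.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def silhouette_edges(edges, eye):
    """Return indices of edges whose adjacent faces face opposite ways.

    edges: list of (point_on_edge, normal_a, normal_b) tuples.
    eye: viewer position.
    """
    result = []
    for index, (point, n_a, n_b) in enumerate(edges):
        view = tuple(e - p for e, p in zip(eye, point))
        # One face front-facing and the other back-facing: silhouette edge.
        if dot(n_a, view) * dot(n_b, view) < 0.0:
            result.append(index)
    return result
```

Every edge is evaluated with no hierarchy and no dependence on other edges, so each test could map to one fragment writing into a results texture.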

[Figure labels: Edge texture; Model edges]
19
Silhouette detection: Limitations
  • GPU silhouette detection is limited by the fill
    rate.
  • Texture memory constraints.
  • We need to keep all vertices as VBO.
  • Vertices and normals as textures.
  • One results texture.
  • Normals stored as fp16 to reduce space.
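The space saving from half-precision normals is easy to check. The sketch below uses Python's `struct` module, whose `e` format character is IEEE 754 half precision; the helper names are hypothetical and the three-component layout is an assumption about how a normal would be packed.

```python
import struct


def pack_normal_fp16(normal):
    """Pack a 3-component normal as three half-precision floats (6 bytes)."""
    return struct.pack("<3e", *normal)


def unpack_normal_fp16(data):
    """Inverse of pack_normal_fp16; returns a 3-tuple of floats."""
    return struct.unpack("<3e", data)


def pack_normal_fp32(normal):
    """Same normal in single precision (12 bytes), for comparison."""
    return struct.pack("<3f", *normal)
```

The fp16 version takes half the memory, at the cost of a rounding error of a few parts in ten thousand per component for unit-length normals.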

20
Edge Raster
  • Raster edges with subpixel precision.
  • Depends on model complexity.
  • Extended lines as described in [SEN03].
  • Filtered depth as read-only depth buffer.
  • Free occlusion culling!

[Figure panels: No depth texture; With depth texture]
21
Edge Constrained Interpolation
  • Multi-pass pixel shaders.
  • Very long.
  • A lot of texture accesses.
  • Image resolution dependent.
  • Use look-up tables encoded as textures.
  • Avoid control code in shaders.
  • Encode original EPI operations.
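The idea of replacing control code with a look-up table can be sketched as follows. This is a hypothetical example rather than the tables used in the Cg shaders: it assumes a 4-bit mask saying which of four neighbors are cut off by an edge, and precomputes the interpolation weights for all 16 masks so that the per-pixel filter needs one table fetch and no branches.

```python
NEIGHBORS = 4  # assumed neighborhood: N, E, S, W


def build_weight_table():
    """Precompute neighbor weights for every possible edge mask.

    Bit k set in the mask means neighbor k is blocked by an edge.
    """
    table = []
    for mask in range(1 << NEIGHBORS):
        open_bits = [0 if (mask >> k) & 1 else 1 for k in range(NEIGHBORS)]
        total = sum(open_bits)
        table.append(tuple(b / total if total else 0.0 for b in open_bits))
    return table


WEIGHTS = build_weight_table()  # plays the role of the look-up texture


def filter_pixel(neighbor_values, edge_mask):
    """Weighted sum of neighbors; the table fetch replaces per-neighbor ifs."""
    weights = WEIGHTS[edge_mask]
    return sum(w * v for w, v in zip(weights, neighbor_values))
```

On the GPU the table would be stored as a small texture and indexed by the mask, so the bit tests are resolved when the table is built rather than in the shader.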

22
Future trends
  • Branching granularity.
  • Some filters require fine granularity to take
    advantage of dynamic branching.
  • This issue is being solved with newer cards
    beginning with ATI X1000 series.
  • Bit operations not directly supported.
  • DirectX 10 will support them.
  • Bottom line: GPU implementations will get better
    and faster.

23
Limitations
  • Fill rate and texture access.
  • These characteristics constantly improve with
    newer hardware with more pipelines and faster
    clock frequencies.
  • Improve by reducing shader length.
  • Number of registers used is still important.
  • A 180-instruction shader with 25 registers
    performs 50% slower than a 215-instruction
    shader with 24 registers on our GPU.

24
Talk overview
  • Algorithm overview.
  • Mapping to the hardware: strategies and
    challenges.
  • Results.
  • Discussion.

25
Test platform
  • Test environment.
  • Software written in C, Cg 1.4rc, and Java
    through JNI under Windows XP.
  • Pentium 4 EE 3.2 GHz dual core, 2 GB RAM, dual
    Nvidia GeForce 7800 GTX (81.85).
  • Test scenes.
  • Cornell Box
  • Chains
  • Mackintosh Room
  • David Head
  • Dragon

26
Results: FPS
  • GPU version is 60-110% faster than the original.
  • The speedup grows with scene complexity.
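As a quick check of what these figures mean, assuming they denote a 60% to 110% increase in frame rate over the CPU version, the equivalent speedup factors and frame-time ratios can be computed as below. The function names are hypothetical.

```python
def speedup_factor(percent_faster):
    """Frame-rate ratio GPU/CPU for a given 'percent faster' figure."""
    return 1.0 + percent_faster / 100.0


def frame_time_ratio(percent_faster):
    """Frame-time ratio GPU/CPU, the reciprocal of the speedup factor."""
    return 1.0 / speedup_factor(percent_faster)
```

Under that reading, 60% faster corresponds to a factor of 1.6 (each frame takes 62.5% of the CPU time), and 110% faster to a factor of 2.1 (about 48% of the CPU time).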

27
Results: Speed increase from CPU
28
Results: Rendering times
29
Talk overview
  • Algorithm overview.
  • Mapping to the hardware: strategies and
    challenges.
  • Results.
  • Discussion.

30
Discussion
  • Point projection, even though it maps
    straightforwardly to the GPU, is the bottleneck.
  • Image filters are very fast in spite of their
    multiple texture accesses and multiple passes.
  • We originally thought the opposite would be true!

31
Discussion
  • Projection is not optimal.
  • We wanted to use Vertex Texture Fetch (VTF) for
    mapping the point cloud update, but it was slower
    than Render to Vertex Array (RTV).
  • Dual GPU rendering with Scalable Link Interface
    (SLI) showed marginal gains.

32
Future performance
  • Texture accesses are very fast and efficient.
  • Transferring vertex data on the GPU is too slow
    to be fully useful.
  • Scatter write on pixel shaders and geometry
    shaders may allow complete data management on the
    GPU.

33
Conclusions
  • We presented a hybrid GPU/CPU system for the
    Render Cache and the EPI using commodity graphics
    hardware.
  • Our implementation is 60-110% faster than a pure
    CPU implementation and frees the CPU up for other
    operations.
  • System performance is likely to improve with
    the current trend of GPUs.

34
Questions?
Implementing the Render Cache and the
Edge-and-Point Image on Graphics Hardware
http://www.cs.cornell.edu/kb/projects/epigpu/