Interactive Distributed Ray Tracing of Highly Complex Models
1
Interactive Distributed Ray Tracing of Highly Complex Models
  • Ingo Wald
  • University of Saarbrücken
  • http://graphics.cs.uni-sb.de/wald
  • http://graphics.cs.uni-sb.de/rtrt

2
Reference Model (12.5 million tris)
3
Power Plant: Detail Views
4
Previous Work
  • Interactive Rendering of Massive Models (UNC)
  • Framework of algorithms
  • Textured depth meshes (96% reduction in triangles)
  • View-frustum culling and LOD (50% each)
  • Hierarchical occlusion maps (10%)
  • Extensive preprocessing required
  • Entire model: 3 weeks (estimated)
  • Framerate (Onyx): 5 to 15 fps
  • Needs a shared-memory supercomputer

5
Previous Work II
  • Memory-Coherent Ray Tracing, Pharr et al. (Stanford)
  • Explicit cache management for rays and geometry
  • Extensive reordering and scheduling
  • Too slow for interactive rendering
  • Provides global illumination
  • Parallel Ray Tracing, Parker et al. (Utah), Muuss (ARL)
  • Needs a shared-memory supercomputer
  • Interactive Rendering with Coherent Ray Tracing (Saarbrücken, EG 2001)
  • IRT on (cheap) PC systems
  • Avoiding CPU stalls is crucial

6
Previous Work Lessons Learned
  • Rasterization possible for massive models but
    not straightforward (UNC)
  • Interactive Ray Tracing is possible
    (Utah,Saarbrücken)
  • Easy to parallelize
  • Cost is only logarithmic in scene size
  • Conclusion Parallel, Interactive Ray Tracing
    should work great for Massive Models

7
Parallel IRT
  • Parallel Interactive Ray Tracing
  • Supercomputer more threads
  • PCs Distributed IRT on CoW
  • Distributed CoW Need fast access to scene data
  • Simplistic access to scene data
  • mmapCaching, all done automatically by OS
  • Either Replicate scene
  • Extremely inflexible
  • Or Access to single copy of scene over NFS
    (mmap)
  • Network issues Latencies/Bandwidth
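
To make the simplistic option concrete, here is a minimal C++ sketch of mapping one shared scene file (e.g. on an NFS mount) into memory; the file path, the Triangle layout, and mapScene are illustrative placeholders, not the actual system's code:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstddef>

    struct Triangle { float v0[3], v1[3], v2[3]; };  // hypothetical layout

    // Map one shared scene file into the address space; the OS pages
    // triangles in on demand.
    const Triangle* mapScene(const char* path, std::size_t& count) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return nullptr;
        struct stat st;
        if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
        void* base = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);  // the mapping stays valid after close
        if (base == MAP_FAILED) return nullptr;
        count = st.st_size / sizeof(Triangle);
        // Caveat: the first touch of each page triggers demand paging over
        // the network, stalling the whole process (see the next slide).
        return static_cast<const Triangle*>(base);
    }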

8
Simplistic Approach
  • Caching via OS support won't work
  • The OS can't even address more than 2 GB of data
  • Massive models are far larger than 2 GB!
  • Also an issue when replicating the scene
  • Process stalls due to demand paging
  • Stalls are very expensive!
  • Dual 1 GHz P-III: a 1 ms stall = 1 million cycles ≈ 1000 rays!
  • The OS stalls the process automatically → reordering impossible

9
Distributed Scene Access
  • Simplistic approach doesn't work
  • Need manual caching and memory management

10
Caching Scene Data
  • 2-level hierarchy of BSP trees
  • Caching based on self-contained voxels
  • Clients need only the top-level BSP (a few KB)
  • Straightforward implementation (sketched below)
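
A minimal sketch of what such a client-side structure could look like, assuming hypothetical Voxel, VoxelId, and fetch interfaces rather than the actual RTRT code: the replicated top-level BSP stores voxel IDs in its leaves, and voxel contents live in a local cache:

    #include <cstdint>
    #include <memory>
    #include <unordered_map>

    using VoxelId = std::uint32_t;

    struct Voxel {
        // Self-contained: the voxel's triangles plus its own local BSP.
    };

    std::unique_ptr<Voxel> fetchFromServer(VoxelId id);  // hypothetical RPC

    class VoxelCache {
        std::unordered_map<VoxelId, std::unique_ptr<Voxel>> cache_;
    public:
        // Returns the voxel if resident; nullptr means the ray must be
        // suspended while the fetch completes asynchronously (slide 13).
        const Voxel* lookup(VoxelId id) {
            auto it = cache_.find(id);
            return it == cache_.end() ? nullptr : it->second.get();
        }
        void insert(VoxelId id, std::unique_ptr<Voxel> v) {
            cache_[id] = std::move(v);  // eviction policy omitted
        }
    };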

11
BSP-Tree Structure and Caching Grain
12
Caching Scene Data
  • Preprocessing: splitting into voxels
  • Simple spatial sorting (BSP-tree construction)
  • Out-of-core algorithm due to model size
  • File-size limit and address space (2 GB)
  • Simplistic implementation: 2.5 hours
  • Model Server
  • One machine serves the entire model
  • → Single server is a potential bottleneck!
  • Could easily be distributed

13
Hiding CPU Stalls
  • Caching alone does not prevent stalls!
  • Avoiding stalls → reordering
  • Suspend rays that would stall on missing data
  • Fetch missing data asynchronously!
  • Immediately continue with other rays
  • Potentially no CPU stall at all!
  • Resume stalled rays once their data is available (see the sketch after this list)
  • Can only hide some latency
  • → Minimize voxel-fetching latencies
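
A sketch of this suspend-and-continue loop, reusing VoxelId, Voxel, and VoxelCache from the earlier sketch; Ray, traceInVoxel, and requestAsync are hypothetical stand-ins for the real tracing and fetch code, and duplicate fetch requests are assumed to be deduplicated:

    #include <deque>
    #include <utility>

    struct Ray { /* origin, direction, payload ... */ };

    void requestAsync(VoxelId id);                // non-blocking fetch
    void traceInVoxel(Ray& ray, const Voxel& v);  // traverse the voxel's BSP

    void traceAll(std::deque<std::pair<Ray, VoxelId>>& rays, VoxelCache& cache) {
        while (!rays.empty()) {
            auto [ray, vid] = rays.front();
            rays.pop_front();
            if (const Voxel* v = cache.lookup(vid)) {
                traceInVoxel(ray, *v);         // data resident: trace now
            } else {
                requestAsync(vid);             // fetch in the background
                rays.push_back({ray, vid});    // suspend; other rays keep the CPU busy
            }
        }
    }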

14
Reducing Latencies
  • Reduce network latencies
  • Prefetching?
  • Hard to predict data accesses several ms in advance!
  • Latency is dominated by transmission time
  • (100 Mbit/s → 1 MB takes 80 ms ≈ 160 million cycles!)
  • → Reduce the transmitted data volume

15
Reducing Bandwidth
  • Compression of voxel data (sketched below)
  • The LZO library provides roughly 3:1 compression
  • Compared to the uncompressed transmission time, decompression cost is negligible!
  • Dual-CPU system: share the voxel cache
  • Amortizes bandwidth, storage, and decompression effort over both CPUs
  • → Even better with more CPUs
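
A hedged sketch of the voxel (de)compression step using the miniLZO API (the worst-case buffer bound follows the LZO documentation; error handling is abbreviated, and the helper functions themselves are illustrative):

    #include "minilzo.h"
    #include <cstdint>
    #include <vector>

    // Compress a voxel blob on the server before sending it.
    std::vector<std::uint8_t> compressVoxel(const std::vector<std::uint8_t>& raw) {
        static bool ok = (lzo_init() == LZO_E_OK);  // one-time library init
        (void)ok;
        std::vector<std::uint8_t> wrk(LZO1X_1_MEM_COMPRESS);
        // Worst case: output can be slightly larger than the input.
        std::vector<std::uint8_t> out(raw.size() + raw.size() / 16 + 64 + 3);
        lzo_uint outLen = out.size();
        lzo1x_1_compress(raw.data(), raw.size(), out.data(), &outLen, wrk.data());
        out.resize(outLen);
        return out;
    }

    // Decompress on the client; rawSize is transmitted along with the voxel.
    std::vector<std::uint8_t> decompressVoxel(const std::vector<std::uint8_t>& in,
                                              std::size_t rawSize) {
        std::vector<std::uint8_t> out(rawSize);
        lzo_uint outLen = out.size();
        lzo1x_decompress(in.data(), in.size(), out.data(), &outLen, nullptr);
        return out;
    }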

16
Load Balancing
  • Load balancing
  • Demand-driven distribution of image tiles (32×32 pixels)
  • Buffering of work tiles on the client
  • Avoids communication latency
  • Frame-to-frame coherence
  • → Improves caching
  • Keep rays on the same client
  • Simple: keep tiles on the same client (implemented; sketched below)
  • Better: assign tiles based on reprojected pixels (future work)
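
A sketch of the implemented policy under illustrative data structures: tiles are handed out on demand, but a client is preferentially given the tiles it rendered last frame so that its voxel cache stays warm:

    #include <deque>
    #include <vector>

    constexpr int kTile = 32;  // tile edge length in pixels

    struct TileQueues {
        std::vector<std::deque<int>> perClient;  // tiles a client had last frame
        std::deque<int> unassigned;              // everything else
    };

    // Called whenever a client asks for work (demand-driven balancing).
    int nextTile(TileQueues& q, int client) {
        auto& mine = q.perClient[client];
        if (!mine.empty()) {              // frame-to-frame coherence: reuse
            int t = mine.front(); mine.pop_front();
            return t;
        }
        if (!q.unassigned.empty()) {      // otherwise take any pending tile
            int t = q.unassigned.front(); q.unassigned.pop_front();
            return t;
        }
        // Last resort: steal from another client to keep the load even.
        for (auto& other : q.perClient)
            if (!other.empty()) { int t = other.back(); other.pop_back(); return t; }
        return -1;                        // frame finished
    }

Stealing from another client's queue only as a last resort trades a little cache coherence for an even load.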

17
Results
  • Setup
  • Seven dual Pentium III 800-866 MHz machines as rendering clients
  • 100 Mbit FastEthernet
  • One display and model server (same machine)
  • Gigabit Ethernet (already necessary for the pixel data)
  • Power plant performance
  • 3-6 fps in the pure C implementation
  • 6-12 fps with SSE support

18
Animation Framerate vs. Bandwidth
  • → Latency hiding works!

19
Scalability
  • Server becomes the bottleneck beyond 12 CPUs
  • → Distribute the model server!

20
Performance: Detail Views
Framerate (640×480): 3.9-4.7 fps (seven dual P-III 800-866 MHz machines, no SSE)
21
Shadows and Reflections
Framerate: 1.4-2.2 fps (no SSE)
22
Demo
23
Conclusions
  • IRT works well for highly complex models!
  • Distribution issues can be solved
  • At least as fast as sophisticated hardware techniques
  • Less preprocessing
  • Cheap
  • Simple and easy to extend (shadows, reflections, shading, ...)

24
Future Work
  • Smaller cache granularity
  • Distributed scene server
  • Cache-coherent load balancing
  • Dynamic scenes and instances
  • Hardware support for ray-tracing

25
Acknowledgments
  • Anselmo Lastra, UNC
  • Power plant reference model
  • Other complex models are welcome!

26
Questions?
For further information visit http://graphics.cs.uni-sb.de/rtrt
27
Four Power Plants (50 million tris)
28
Detailed View of Power Plant
Framerate: 4.7 fps (seven dual P-III 800-866 MHz machines, no SSE)
29
Detail View Furnace
Framerate: 3.9 fps (no SSE)
30
Overview
  • Reference Model
  • Previous Work
  • Distribution Issues
  • Massive Model Issues
  • Images and Demo