Title: Interactive Distributed Ray Tracing of Highly Complex Models
1 Interactive Distributed Ray Tracing of Highly Complex Models
- Ingo Wald
- University of Saarbrücken
- http://graphics.cs.uni-sb.de/wald
- http://graphics.cs.uni-sb.de/rtrt
2 Reference Model (12.5 million tris)
3 Power Plant: Detail Views
4 Previous Work
- Interactive Rendering of Massive Models (UNC)
- Framework of algorithms
- Textured depth meshes (96% reduction in tris)
- View-frustum culling and LOD (50% each)
- Hierarchical occlusion maps (10%)
- Extensive preprocessing required
- Entire model: 3 weeks (estimated)
- Framerate (Onyx): 5 to 15 fps
- Needs shared-memory supercomputer
5 Previous Work II
- Memory-Coherent Ray Tracing, Pharr et al. (Stanford)
- Explicit cache management for rays and geometry
- Extensive reordering and scheduling
- Too slow for interactive rendering
- Provides global illumination
- Parallel Ray Tracing, Parker et al. (Utah), Muuss (ARL)
- Needs shared-memory supercomputer
- Interactive Rendering with Coherent Ray Tracing (Saarbrücken, EG 2001)
- IRT on (cheap) PC systems
- Avoiding CPU stalls is crucial
6Previous Work Lessons Learned
- Rasterization possible for massive models but
not straightforward (UNC) - Interactive Ray Tracing is possible
(Utah,Saarbrücken) - Easy to parallelize
- Cost is only logarithmic in scene size
- Conclusion Parallel, Interactive Ray Tracing
should work great for Massive Models
7 Parallel IRT
- Parallel Interactive Ray Tracing
- Supercomputer: more threads
- PCs: distributed IRT on a cluster of workstations (CoW)
- Distributed CoW: need fast access to scene data
- Simplistic access to scene data
- mmap: caching all done automatically by the OS
- Either: replicate the scene
- Extremely inflexible
- Or: access a single copy of the scene over NFS (mmap)
- Network issues: latencies / bandwidth
8 Simplistic Approach
- Caching via OS support won't work
- The OS can't even address more than 2 GB of data
- Massive models >> 2 GB!
- Also an issue when replicating the scene
- Process stalls due to demand paging
- Stalls are very expensive!
- Dual 1 GHz P-III: 1 ms stall = 1 million cycles, about 1000 rays! (see the arithmetic below)
- The OS automatically stalls the process → reordering impossible
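Written out, the stall arithmetic behind that bullet (the ~1000 cycles per ray is the figure implied by the slide's own numbers, not a measurement made here):

$$
1\,\mathrm{ms} \times 1\,\mathrm{GHz} = 10^{6}\ \text{cycles},
\qquad
\frac{10^{6}\ \text{cycles}}{\sim 10^{3}\ \text{cycles per ray}} \approx 10^{3}\ \text{rays lost per stall}.
$$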
9 Distributed Scene Access
- The simplistic approach doesn't work
- Need manual caching and memory management
10 Caching Scene Data
- 2-level hierarchy of BSP trees
- Caching based on self-contained voxels
- Clients need only the top-level BSP tree (a few KB)
- Straightforward implementation (see the sketch below)
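A minimal C++ sketch of how the two-level scheme fits together; the names (Voxel, VoxelCache, TopLevelBSP, suspendRay) are illustrative assumptions, not the actual RTRT classes. The small top-level BSP tree is replicated on every client and only selects voxels; each voxel is a self-contained caching grain with its own BSP subtree and triangles.

```cpp
// Sketch only: illustrative names, not the actual RTRT implementation.
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

struct Ray;   // origin, direction, t-interval (omitted)
struct Hit;   // intersection record (omitted)

// A voxel is the caching grain: a self-contained block holding its own
// BSP subtree plus the triangles inside it, loadable as one unit.
struct Voxel {
    bool intersect(Ray &ray, Hit &hit) const;
};

// The top-level BSP tree is only a few KB and is replicated on every
// client; it maps a ray to the voxels it pierces, front to back.
struct TopLevelBSP {
    std::vector<uint32_t> voxelsAlongRay(const Ray &ray) const;
};

class VoxelCache {
    std::unordered_map<uint32_t, std::shared_ptr<Voxel>> resident;
public:
    std::shared_ptr<Voxel> lookup(uint32_t voxelId);   // nullptr if not resident
    void requestAsync(uint32_t voxelId);               // non-blocking fetch from the server
};

// Provided by the scheduler (later slides): park a ray instead of stalling.
void suspendRay(Ray &ray, uint32_t missingVoxelId);

bool traceRay(Ray &ray, Hit &hit, const TopLevelBSP &top, VoxelCache &cache) {
    for (uint32_t id : top.voxelsAlongRay(ray)) {       // front-to-back order
        std::shared_ptr<Voxel> v = cache.lookup(id);
        if (!v) {                                       // cache miss
            cache.requestAsync(id);                     // fetch in the background
            suspendRay(ray, id);                        // do not stall the CPU
            return false;
        }
        if (v->intersect(ray, hit))
            return true;                                // closest hit found
    }
    return false;                                       // ray leaves the scene
}
```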
11 BSP-Tree Structure and Caching Grain
12 Caching Scene Data
- Preprocessing: splitting into voxels
- Simple spatial sorting (BSP-tree construction)
- Out-of-core algorithm due to model size
- File-size limit and address space (2 GB)
- Simplistic implementation: 2.5 hours
- Model server (see the sketch below)
- One machine serves the entire model
- → Single server: potential bottleneck!
- Could easily be distributed
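As a rough illustration of the model server's role, a sketch of its request loop; the Connection wrapper, loadVoxelFromDisk and compressBlock are assumed placeholders, not the actual server code, and a real server would handle many clients concurrently.

```cpp
// Hypothetical model-server loop: a client sends a voxel ID, the server
// replies with that voxel's (compressed) data block.
#include <cstdint>
#include <vector>

struct Connection {                      // thin wrapper over a TCP socket (assumed)
    bool receive(uint32_t &voxelId);
    void send(const std::vector<uint8_t> &block);
};

std::vector<uint8_t> loadVoxelFromDisk(uint32_t voxelId);             // out-of-core file access
std::vector<uint8_t> compressBlock(const std::vector<uint8_t> &raw);  // e.g. LZO (see later slide)

void serveClient(Connection &client) {
    uint32_t voxelId;
    while (client.receive(voxelId)) {
        // Voxels are stored pre-split on disk, so serving one is a single
        // read plus compression (or blocks could be stored pre-compressed).
        std::vector<uint8_t> raw = loadVoxelFromDisk(voxelId);
        client.send(compressBlock(raw));
    }
}
```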
13 Hiding CPU Stalls
- Caching alone does not prevent stalls!
- Avoiding stalls → reordering (see the sketch below)
- Suspend rays that would stall on missing data
- Fetch the missing data asynchronously!
- Immediately continue with another ray
- Potentially no CPU stall at all!
- Resume stalled rays once the data is available
- Can only hide some latency
- → Minimize voxel-fetching latencies
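A sketch of the suspend/resume idea under assumed interfaces (traceUntilMissOrDone, requestVoxelAsync, newlyArrivedVoxels are illustrative, not the actual RTRT API): rays that would touch a non-resident voxel are parked per voxel, the fetch runs in the background, and the CPU keeps tracing other rays.

```cpp
// Hypothetical reordering loop: a cache miss suspends the ray instead of
// stalling the CPU; suspended rays resume once their voxel has arrived.
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <vector>

struct Ray { /* origin, direction, pixel id, ... */ };

std::deque<Ray> readyRays;                               // rays that can make progress
std::unordered_map<uint32_t, std::vector<Ray>> waiting;  // rays parked per missing voxel

// Provided elsewhere (assumed interfaces):
bool traceUntilMissOrDone(Ray &ray, uint32_t &missingVoxel); // false => hit a missing voxel
void requestVoxelAsync(uint32_t voxelId);                    // non-blocking fetch
std::vector<uint32_t> newlyArrivedVoxels();                  // fetches completed since last call

void renderLoop() {
    while (!readyRays.empty() || !waiting.empty()) {
        // 1. Trace as long as there are rays that can run.
        if (!readyRays.empty()) {
            Ray ray = readyRays.front();
            readyRays.pop_front();
            uint32_t missing;
            if (!traceUntilMissOrDone(ray, missing)) {
                waiting[missing].push_back(ray);   // park the ray
                requestVoxelAsync(missing);        // fetch in the background
            }
        }
        // 2. Wake up rays whose voxels have arrived in the meantime.
        for (uint32_t id : newlyArrivedVoxels()) {
            for (Ray &r : waiting[id]) readyRays.push_back(r);
            waiting.erase(id);
        }
    }
}
```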
14 Reducing Latencies
- Reduce network latencies
- Prefetching?
- Hard to predict data accesses several ms in advance!
- Latency is dominated by transmission time
- (100 Mbit/s → 1 MB takes 80 ms ≈ 160 million cycles; see the calculation below!)
- Reduce the transmitted data volume
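The transmission-time estimate written out (the 160 million cycles presumably counts both CPUs of a dual 1 GHz node):

$$
t_{\text{transmit}} = \frac{1\ \mathrm{MB}}{100\ \mathrm{Mbit/s}}
                    = \frac{8\ \mathrm{Mbit}}{100\ \mathrm{Mbit/s}}
                    = 80\ \mathrm{ms},
\qquad
80\ \mathrm{ms} \times 2\ \mathrm{GHz} = 1.6 \times 10^{8}\ \text{cycles}.
$$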
15 Reducing Bandwidth
- Compression of voxel data
- The LZO library provides roughly 3:1 compression
- Compared to the original transmission time, the decompression cost is negligible!
- Dual-CPU system: sharing of the voxel cache (see the sketch below)
- Amortize bandwidth, storage, and decompression effort over both CPUs
- → Even better with more CPUs
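A sketch of a node-local shared voxel cache under assumptions: decompressLZO() and fetchCompressedVoxel() stand in for the LZO call and the network fetch, and a real implementation would not hold the lock across the fetch.

```cpp
// Hypothetical shared voxel cache for a dual-CPU node: both rendering
// threads use one cache, so each voxel is transmitted, stored and
// decompressed only once per node.
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <vector>

struct Voxel;  // self-contained BSP subtree + triangles

std::shared_ptr<Voxel> decompressLZO(const std::vector<uint8_t> &compressed); // placeholder for the LZO call
std::vector<uint8_t>   fetchCompressedVoxel(uint32_t voxelId);                // from the model server

class SharedVoxelCache {
    std::mutex mtx;
    std::unordered_map<uint32_t, std::shared_ptr<Voxel>> resident;
public:
    std::shared_ptr<Voxel> get(uint32_t voxelId) {
        std::lock_guard<std::mutex> lock(mtx);        // sketch: a real cache would not
        auto it = resident.find(voxelId);             // hold the lock across the fetch
        if (it != resident.end())
            return it->second;                        // second CPU reuses the cached voxel
        std::vector<uint8_t> data = fetchCompressedVoxel(voxelId); // ~1/3 of the raw size (3:1)
        std::shared_ptr<Voxel> v  = decompressLZO(data);           // cheap vs. transmission time
        resident.emplace(voxelId, v);
        return v;
    }
};
```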
16 Load Balancing
- Load balancing
- Demand-driven distribution of tiles (32x32)
- Buffering of work tiles on the client
- Avoids communication latency
- Frame-to-frame coherence
- → Improves caching
- Keep rays on the same client
- Simple: keep tiles on the same client (implemented; see the sketch below)
- Better: assign tiles based on reprojected pixels (future)
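A sketch of the implemented scheme as it might look on the master, assuming illustrative names (TileScheduler, requestTile): tiles are handed out on demand, preferring the client that rendered the same tile in the previous frame so its voxel cache stays warm.

```cpp
// Sketch only: illustrative master-side tile scheduler, not the actual RTRT code.
#include <deque>
#include <utility>
#include <vector>

struct Tile { int x, y; };                      // 32x32 pixel block at (x, y)

class TileScheduler {
    std::vector<std::deque<Tile>> preferred;    // per client: tiles it rendered last frame
public:
    explicit TileScheduler(int numClients) : preferred(numClients) {}

    // Once per frame: re-queue every tile for the client that rendered it
    // in the previous frame (frame-to-frame coherence).
    void startFrame(const std::vector<std::pair<Tile, int>> &tileAndLastOwner) {
        for (const auto &p : tileAndLastOwner)
            preferred[p.second].push_back(p.first);
    }

    // Demand-driven: clients keep a small local buffer of tiles and ask
    // for more before it runs empty, hiding the request latency.
    bool requestTile(int client, Tile &out) {
        if (!preferred[client].empty()) {                    // reuse last frame's tile
            out = preferred[client].front();
            preferred[client].pop_front();
            return true;
        }
        for (auto &q : preferred)                            // otherwise steal any remaining tile
            if (!q.empty()) { out = q.front(); q.pop_front(); return true; }
        return false;                                        // frame is finished
    }
};
```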
17 Results
- Setup
- Seven dual Pentium-III 800-866 MHz machines as rendering clients
- 100 Mbit FastEthernet
- One display / model server (same machine)
- Gigabit Ethernet (already necessary for the pixel data)
- Power plant performance
- 3-6 fps in a pure C implementation
- 6-12 fps with SSE support
18 Animation: Framerate vs. Bandwidth
19 Scalability
- The server becomes a bottleneck beyond 12 CPUs
- → Distribute the model server!
20 Performance: Detail Views
Framerate (640x480): 3.9-4.7 fps (seven dual P-III 800-866 MHz CPUs, NO SSE)
21 Shadows and Reflections
Framerate: 1.4-2.2 fps (NO SSE)
22 Demo
23 Conclusions
- IRT works great for highly complex models!
- Distribution issues can be solved
- At least as fast as sophisticated hardware techniques
- Less preprocessing
- Cheap
- Simple and easy to extend (shadows, reflections, shading, ...)
24 Future Work
- Smaller cache granularity
- Distributed scene server
- Cache-coherent load balancing
- Dynamic scenes, instances
- Hardware support for ray-tracing
25 Acknowledgments
- Anselmo Lastra, UNC
- Power plant reference model
- Other complex models are welcome
26 Questions?
For further information visit http://graphics.cs.uni-sb.de/rtrt
27 Four Power Plants (50 million tris)
28 Detailed View of Power Plant
Framerate: 4.7 fps (seven dual P-III 800-866 MHz CPUs, NO SSE)
29 Detail View: Furnace
Framerate: 3.9 fps, NO SSE
30 Overview
- Reference Model
- Previous Work
- Distribution Issues
- Massive Model Issues
- Images and Demo