Title: Interactive CGF Computations using COTS Graphics Processors
1Interactive CGF Computations using COTS Graphics
Processors
- Dinesh Manocha
- University of North Carolina at Chapel Hill
- dm_at_cs.unc.edu
- http://gamma.cs.unc.edu/LOS
2UNC Collaborators
- Co-PI
- Ming C. Lin
- Research Staff
- Naga Govindaraju
- Dave Tuft
- Graduate Students
- Russ Gayle
- Brandon Lloyd
- Brian Salomon
- Avneesh Sud
- Sungeui Yoon
- Talha Zaman
3Collaborative Effort
- RDECOM
- Maria Bauer
- Angel Rodriguez
- SAIC
- Eric Root
- Marlo Verdesca
- Jaeson Munro
- Stanford
- Pat Hanrahan
- Ian Buck
4Acknowledgements
- BCSEO
- DARPA
- RDECOM
- PEO STRI
5Real-time Computational Challenges for Computer
Generated Forces (CGF)
- Atmospheric transport models
- Vehicle dynamics
- Wide area sensors
- Petabyte Urban Terrain Databases
6Real-time Terrain Reasoning for Computer
Generated Forces
- Best algorithms are O(N^2), where N is the number of objects/entities in the CGF database (e.g., sensors, platforms, buildings, people)
- Currently over 40% of CGF CPU time for battalion-level scenarios is spent in:
- Collision detection
- Line of sight computation
- Terrain placement
- Current system can barely handle 300 entities on a 300K-polygon terrain model at 10m x 10m resolution
- Need 200-500 times improvement to handle sub-meter resolution terrain model
- CPUs progressing at Moore's law (1.7x per year) => need more than 7-8 years to catch up
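To illustrate the quadratic cost noted above, here is a minimal Python sketch of a brute-force all-pairs line-of-sight test over a gridded height field. The height-field terrain, the sampling-based `has_los` test, the eye height, and the entity format are all assumptions made for illustration; this is not the CGF or OneSAF implementation.

```python
from itertools import combinations

def has_los(height, a, b, eye=2.0, samples=32):
    """Return True if the sight line between grid cells a and b clears the terrain.

    height: 2D list of terrain elevations (hypothetical height field).
    a, b:   (row, col) cells; eye is an assumed sensor height above ground.
    """
    (r0, c0), (r1, c1) = a, b
    z0 = height[r0][c0] + eye
    z1 = height[r1][c1] + eye
    for i in range(1, samples):
        t = i / samples
        # Look up the nearest terrain cell under the ray at parameter t.
        r = round(r0 + t * (r1 - r0))
        c = round(c0 + t * (c1 - c0))
        ray_z = z0 + t * (z1 - z0)
        if height[r][c] > ray_z:
            return False  # terrain rises above the sight line
    return True

def all_pairs_los(height, entities):
    """Test every unordered pair: N * (N - 1) / 2 queries, i.e. O(N^2)."""
    visible = []
    for a, b in combinations(entities, 2):
        if has_los(height, a, b):
            visible.append((a, b))
    return visible

def pair_count(n):
    """Number of unordered entity pairs for n entities."""
    return n * (n - 1) // 2

# Flat 5 x 5 terrain with a 10 m ridge along column 2 (made-up data).
terrain = [[0.0] * 5 for _ in range(5)]
for row in terrain:
    row[2] = 10.0

entities = [(0, 0), (4, 0), (0, 4)]
pairs = all_pairs_los(terrain, entities)
```

Under these assumptions, 300 entities require 44,850 pair tests per update and 5000 entities require 12,497,500, before counting the per-query cost of walking the terrain.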
7Current Desktop System
[Diagram: desktop system. Labels: GPU (500 MHz); video memory (512 MB); 2 x 1 MB cache; system memory (2 GB); PCI-E bus (4 GB/s); 35.2 GB/s bandwidth; 6.4 GB/s bandwidth]
8GeForce 7800: 302M Transistors
9CPU vs. GPU
10CPU vs. GPU (Henry Moreton, NVIDIA, Aug. 2005)
Metric                PEE 840   7800GTX   GPU/CPU
Graphics GFLOPs          25.6      1300      50.8
Shader GFLOPs            25.6       313      12.2
Die Area (mm^2)           206       326       1.6
Die Area normalized       206       218       1.1
Transistors (M)           230       302       1.3
Power (W)                 130        65       0.5
GFLOPS/mm^2               0.1       6.0      47.9
GFLOPS/tr                 0.1       4.3      38.7
GFLOPS/W                  0.2      20.0     101.6
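The GPU/CPU column and the three efficiency rows can be re-derived from the raw rows of the table above. The short Python sketch below does that arithmetic. It assumes the efficiency rows are based on the Graphics GFLOPs figures and the normalized die area, which is what the tabulated numbers appear to match; the dictionary keys are illustrative names, not part of the original slide.

```python
# Raw values copied from the table, as (PEE 840, 7800GTX).
rows = {
    "graphics_gflops": (25.6, 1300),
    "shader_gflops": (25.6, 313),
    "die_area_mm2": (206, 326),
    "die_area_normalized": (206, 218),
    "transistors_m": (230, 302),
    "power_w": (130, 65),
}

def ratio(name):
    """GPU/CPU ratio for one raw row, rounded as in the table."""
    cpu, gpu = rows[name]
    return round(gpu / cpu, 1)

def per_unit(flops_row, unit_row):
    """GFLOPS per unit of a resource, returned as (cpu, gpu, gpu/cpu)."""
    cpu_f, gpu_f = rows[flops_row]
    cpu_u, gpu_u = rows[unit_row]
    cpu, gpu = cpu_f / cpu_u, gpu_f / gpu_u
    return round(cpu, 1), round(gpu, 1), round(gpu / cpu, 1)

per_watt = per_unit("graphics_gflops", "power_w")
per_transistor = per_unit("graphics_gflops", "transistors_m")
per_area = per_unit("graphics_gflops", "die_area_normalized")
```

With these rounded inputs the per-watt and per-transistor rows reproduce the table exactly. The per-area ratio comes out at about 48.0 rather than the tabulated 47.9, a difference small enough to be explained by rounding in the inputs.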
11Goal: Exploit GPUs for CGF Computations
GPUs Growing Faster than Moore's Law
This graph highlights the relative growth rate of
GPUs vs. CPUs. GPUs have been growing at a rate
faster than Moore's law, and this trend is
expected to continue for at least 5 more years.
12Issues in using GPUs
- Programmability
- Precision
- Handling large data
13Project Accomplishments
- GPU-based LOS algorithm
- 150-200x improvement in LOS query
- Integration into OneSAF: 15-20x simulation speed improvement (5000 entities)
14Project Accomplishments
- Region-based visibility algorithms to accelerate LOS (Supported by ATO)
- 4-10x further improvement in LOS query
- Integration into OneSAF: 10x simulation speed improvement in urban environments (3000 entities)
15Project Accomplishments
- GPU-based route planning
- 10-30x improvement in route computation
- 10x simulation speed improvement (3000 entities)
16Project Accomplishments
- GPU-based collision detection
- 10x estimated improvement in collision query
- 10x simulation speed improvement (150 entities)
17Project Accomplishments
- Successful demonstration at DARPATech 2005, I/ITSEC'04, and I/ITSEC'05 (RDECOM Booth 2266)
- Other GPU-based algorithms and applications
- Database, data streaming, numerical computation, fluid dynamics, sorting, motion planning
18LOS Integration Process
19OneSAF with GPU-based LOS Algorithm: Demonstration
- Average time for Standard LOS service call
- 1-2 milliseconds (w/o GPU-based algorithm)
- Average time for GPU LOS service call
- 8-12 microseconds
- Almost 200X speedup for single LOS query
- 15-20x improvement in OneSAF simulation speed in
JRTC terrain with 5000 entities
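The per-query speedup follows directly from the two timing ranges above, as the short Python sketch below shows. It also includes a back-of-the-envelope Amdahl's law estimate of why a roughly 200x query speedup gives a 15-20x overall speedup. Applying Amdahl's law here is an assumption of this sketch and is not stated on the slides.

```python
MS, US = 1e-3, 1e-6  # seconds per millisecond / microsecond

def speedup(baseline_s, accelerated_s):
    """Ratio of baseline time to accelerated time."""
    return baseline_s / accelerated_s

# Timing ranges quoted on the slide.
cpu_lo, cpu_hi = 1 * MS, 2 * MS      # standard LOS service call
gpu_lo, gpu_hi = 8 * US, 12 * US     # GPU LOS service call

worst = speedup(cpu_lo, gpu_hi)      # fastest baseline vs. slowest GPU call
best = speedup(cpu_hi, gpu_lo)       # slowest baseline vs. fastest GPU call
mid = speedup((cpu_lo + cpu_hi) / 2, (gpu_lo + gpu_hi) / 2)

def amdahl_fraction(overall, component):
    """Fraction p of run time spent in the accelerated part.

    Solves Amdahl's law S = 1 / ((1 - p) + p / s) for p, where S is the
    overall speedup and s is the speedup of the accelerated component.
    """
    return (1 - 1 / overall) / (1 - 1 / component)
```

The ranges give a speedup between roughly 83x and 250x, with the midpoints giving 150x. Under the Amdahl assumption, a 200x query speedup yielding 15-20x overall would correspond to LOS taking about 94% to 95% of the original run time in that scenario. This is an inferred figure, not a measurement from the talk.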
20Databases: Predicate Evaluation
CPU implementation: Intel compiler 7.1 with SSE optimizations
(CPU + GPU) is 20 times faster than CPU only (SIGMOD 2004)
21Comparison on Different GPUs: Super-Moore's Law
22GPUSort: 32-bit floating point inputs
GPUSORT: slashdot.org, Tom's Hardware Guide
(750 downloads in 6 weeks)
23LU-Decomposition with Partial Pivoting (32-bit
inputs)
IEEE/ACM SuperComputing 2005
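For reference, LU decomposition with partial pivoting can be sketched in a few lines of sequential Python. This is the textbook CPU formulation and not the GPU algorithm presented in the talk. It also uses Python's 64-bit floats rather than the 32-bit inputs named in the slide title, and the function names are illustrative.

```python
def lu_partial_pivot(a):
    """Factor a square matrix so that P A = L U.

    Returns (perm, lu): perm[i] is the original row now in position i, and
    lu holds U on and above the diagonal and the multipliers of L below it
    (the unit diagonal of L is implied).
    """
    n = len(a)
    lu = [row[:] for row in a]           # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the largest-magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]         # multiplier l_ik
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return perm, lu

def reconstruct(lu):
    """Multiply L and U back together from the packed factor."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]
                s += l_ik * lu[k][j]
            out[i][j] = s
    return out

A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
perm, lu = lu_partial_pivot(A)
product = reconstruct(lu)   # should match the rows of A in the order given by perm
```

The row swaps keep every multiplier at magnitude 1 or less, which is what makes the factorization numerically usable at low precision.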
24Project Status
- Integration of GPU-based algorithms in OOS
- Line-of-sight
- Route planning
- Collision detection
- 35 publications in last 18 months
- 2 best paper awards (Pacific Graphics'04 and IEEE VR'05)
- Paper presentations on GPU technology in OOS
- Poster presentation at Army Science Conference'04
- Best paper in Research and Development Track at I/ITSEC'05
- Nominated for best overall paper award at I/ITSEC'05
- Other applications: sorting, stream data mining, surgical simulation, physical simulation, computer animation, high-performance computing
- Other collaborators: NVIDIA, Intel, ATI, AGEIA, Disney
25Future Goals
- Develop novel GPU-based algorithms
- Other LOS computations: attenuation, handling smoke
- Force and atmospheric simulations
- Combine with multi-resolution representations
- Handle very large and complex terrains
- GPU clusters for modeling and simulation
- Extension to multiple simulation environments: WARSIM, JMTK, GIG