Title: A Hardware Processing Unit For Point Sets


1
A Hardware Processing Unit For Point Sets
  • S. Heinzle, G. Guennebaud, M. Botsch, M. Gross
  • Graphics Hardware 2008

2
Motivation
  • Point-based graphics established
  • Powerful algorithms
  • Representation
  • Processing
  • Manipulation
  • Rendering
  • Decomposition
  • Get neighborhood
  • Operate on neighbors

3
Motivation
  • GPUs not suited for getting neighborhood
  • SIMD
  • Incoherent branching
  • Dynamic data structures slow
  • Recursive calls not supported
  • CPUs
  • Small number of FPUs
  • Inflexible memory caches

Courtesy of NVIDIA
Courtesy of Intel
4
Contributions
  • Hardware architecture for point sets
  • Neighbor search module
  • Novel advanced caching mechanism
  • Reconfigurable processing module
  • Programmability using FPGA compiler
  • FPGA prototype and measurements
  • Small and lean
  • → Integration into multi-core CPU/GPU possible

5
Outline
  • Related Work
  • Spatial Searching and Caching
  • Architecture and Prototype
  • Results
  • Conclusion

6
Related Work
  • Kd-Tree: Bentley 75
  • kNN on GPUs: Ma and McCool 02
  • Kd-Tree Hardware: Woop et al. 05, Woop et al. 06
  • Kd-Tree on GPUs: Popov et al. 07

7
Related Work
  • Algebraic Moving Least Squares: Guennebaud and Gross 07
  • Linear Moving Least Squares: Adamson and Alexa 04
  • Adaptive SPH Fluid Simulation: Adams et al. 07

8
Linear Moving Least Squares
  • Implicit surface defined by a set of points

10
Linear Moving Least Squares
[Figure: query point x and points p_i with normals n_i]

11
Linear Moving Least Squares
  • Iterative projections onto plane

15
Linear Moving Least Squares
  • Surface defined by points projecting onto
    themselves
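
A minimal sketch of the iterative projection described on these slides, assuming the formulation in which the implicit function is f(x) = n(x) · (x − a(x)), with a(x) and n(x) the weighted averages of the point positions p_i and normals n_i. The Gaussian weight, the support size `h`, and the fixed iteration count are illustrative assumptions, not values from the talk. In a full pipeline the sums would run over the neighbors returned by the spatial search rather than over all points.

```python
import math

def gaussian_weight(dist2, h):
    """Gaussian falloff on squared distance (assumed weight function)."""
    return math.exp(-dist2 / (h * h))

def mls_project(x, points, normals, h=0.5, iterations=10):
    """Project x onto the implicit surface by repeated plane projections."""
    x = list(x)
    for _ in range(iterations):
        wsum = 0.0
        a = [0.0, 0.0, 0.0]  # weighted average position a(x)
        n = [0.0, 0.0, 0.0]  # weighted average normal n(x)
        for p, pn in zip(points, normals):
            d2 = sum((xc - pc) ** 2 for xc, pc in zip(x, p))
            w = gaussian_weight(d2, h)
            wsum += w
            for c in range(3):
                a[c] += w * p[c]
                n[c] += w * pn[c]
        a = [ac / wsum for ac in a]
        length = math.sqrt(sum(nc * nc for nc in n))
        n = [nc / length for nc in n]
        # Signed distance from x to the plane through a(x) with normal n(x).
        f = sum(nc * (xc - ac) for nc, xc, ac in zip(n, x, a))
        # Step onto that plane; the plane is re-estimated in the next iteration.
        x = [xc - f * nc for xc, nc in zip(x, n)]
    return x
```

For points sampled from the plane z = 0 with normals (0, 0, 1), a query above the plane lands on z = 0, and a query already on the plane stays where it is, which is the "points projecting onto themselves" property.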

16
Outline
  • Related Work
  • Spatial Searching and Caching
  • Architecture and Prototype
  • Results
  • Conclusion

17
Spatial Search
  • Spatial search: kNN and eNN
  • Common in most point operations
  • Based on kd-tree
  • Example: eNN
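
The eNN example can be sketched in software as a fixed-radius query on a kd-tree. This is an illustrative sketch, not the hardware design: the tuple-based tree layout, the median split, the `leaf_size` default, and the function names are all assumptions. The traversal uses an explicit stack instead of recursion.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_kdtree(points, indices=None, depth=0, leaf_size=4):
    """Build a kd-tree as nested tuples; leaves hold lists of point indices."""
    if indices is None:
        indices = list(range(len(points)))
    if len(indices) <= leaf_size:
        return ("leaf", indices)
    axis = depth % 3
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    split = points[indices[mid]][axis]
    return ("node", axis, split,
            build_kdtree(points, indices[:mid], depth + 1, leaf_size),
            build_kdtree(points, indices[mid:], depth + 1, leaf_size))

def range_query(tree, points, q, radius):
    """Return the indices of all points within `radius` of q."""
    result = []
    r2 = radius * radius
    stack = [tree]
    while stack:
        node = stack.pop()
        if node[0] == "leaf":
            result.extend(i for i in node[1] if dist2(points[i], q) <= r2)
            continue
        _, axis, split, left, right = node
        delta = q[axis] - split
        # Descend only into children whose half-space overlaps the query ball.
        if delta <= radius:
            stack.append(left)   # left child holds coordinates <= split
        if delta >= -radius:
            stack.append(right)  # right child holds coordinates >= split
    return result
```

A call such as `range_query(build_kdtree(pts), pts, (0.5, 0.5, 0.5), 0.2)` returns the same index set as a linear scan over `pts`, while skipping subtrees the query ball cannot reach.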

18
Spatial Search
  • kNN search similar to eNN search
  • Start with infinite radius
  • Sort leaf points into priority queue
  • Shrink radius with every point sorted
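
The kNN steps above can be sketched as follows: the search radius is infinite until the priority queue holds k points, then shrinks to the distance of the current k-th candidate. The tree layout and all names are assumptions made for this sketch, and Python's `heapq` stands in for the hardware priority queue.

```python
import heapq

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_kdtree(points, indices=None, depth=0, leaf_size=4):
    """Build a kd-tree as nested tuples; leaves hold lists of point indices."""
    if indices is None:
        indices = list(range(len(points)))
    if len(indices) <= leaf_size:
        return ("leaf", indices)
    axis = depth % 3
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    split = points[indices[mid]][axis]
    return ("node", axis, split,
            build_kdtree(points, indices[:mid], depth + 1, leaf_size),
            build_kdtree(points, indices[mid:], depth + 1, leaf_size))

def knn_query(tree, points, q, k):
    """Return the indices of the k nearest points to q, nearest first."""
    best = []  # max-heap via negated squared distances: (-d2, index)

    def radius2():
        # Infinite until k candidates are queued, then the k-th distance.
        return -best[0][0] if len(best) == k else float("inf")

    def visit(node):
        if node[0] == "leaf":
            for i in node[1]:
                d2 = dist2(points[i], q)
                if len(best) < k:
                    heapq.heappush(best, (-d2, i))
                elif d2 < radius2():
                    heapq.heapreplace(best, (-d2, i))  # shrinks the radius
            return
        _, axis, split, left, right = node
        delta = q[axis] - split
        near, far = (left, right) if delta < 0 else (right, left)
        visit(near)
        # The far side can only matter if the split plane is within the radius.
        if delta * delta <= radius2():
            visit(far)

    visit(tree)
    return [i for _, i in sorted(best, reverse=True)]
```

Visiting the near child first fills the queue early, so the radius shrinks before the far child is tested and more subtrees can be skipped.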

19
Coherent Neighbor Cache (eNN)
  • Find neighbors in slightly bigger radius
  • Re-use result for spatially close query
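
A minimal sketch of this reuse rule, assuming a single cached neighborhood and a brute-force scan standing in for the kd-tree search. The class name, the `slack` parameter, and its default are hypothetical. A cached result gathered with the enlarged radius R around center c can serve a new query q with radius r whenever |q − c| + r ≤ R, because the triangle inequality then guarantees that every point within r of q is already in the cached set.

```python
import math

class RangeCache:
    """One cached neighborhood for fixed-radius queries (illustrative only)."""

    def __init__(self, points, radius, slack=0.25):
        self.points = points
        self.radius = radius
        self.big_radius = radius * (1.0 + slack)  # slightly bigger radius
        self.center = None
        self.candidates = []
        self.hits = 0
        self.misses = 0

    def query(self, q):
        reusable = (self.center is not None and
                    math.dist(q, self.center) + self.radius <= self.big_radius)
        if reusable:
            self.hits += 1
        else:
            # Miss: search again with the enlarged radius and cache the result.
            self.misses += 1
            self.center = q
            self.candidates = [i for i, p in enumerate(self.points)
                               if math.dist(p, q) <= self.big_radius]
        # Filter the cached candidates down to the requested radius.
        return [i for i in self.candidates
                if math.dist(self.points[i], q) <= self.radius]
```

With `radius=0.2` and `slack=0.25`, queries that stay within 0.05 of the cached center are answered from the cache without a new search.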

20
Coherent Neighbor Cache (kNN, exact)
  • Find (k+1) neighbors
  • Re-use result for spatially close query
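
The slide does not state the reuse condition, so the following sketch uses one that follows from the triangle inequality: with d_k and d_(k+1) the distances from the cached query to its k-th and (k+1)-th neighbors, a new query closer than (d_(k+1) − d_k) / 2 to the cached one has the same k nearest neighbors. The class, its names, and the brute-force search standing in for the kd-tree are assumptions.

```python
import math

def knn_brute(points, q, k):
    """Indices of the k points nearest to q (stand-in for the kd-tree search)."""
    return sorted(range(len(points)), key=lambda i: math.dist(points[i], q))[:k]

class ExactKnnCache:
    """One cached kNN result that is reused only when it is provably exact."""

    def __init__(self, points, k):
        self.points = points  # needs at least k + 1 points
        self.k = k
        self.center = None
        self.neighbors = []
        self.reuse_radius = 0.0
        self.hits = 0
        self.misses = 0

    def query(self, q):
        if (self.center is not None and
                math.dist(q, self.center) < self.reuse_radius):
            self.hits += 1
            return self.neighbors  # same set; order is relative to the center
        self.misses += 1
        found = knn_brute(self.points, q, self.k + 1)  # find (k+1) neighbors
        d_k = math.dist(self.points[found[self.k - 1]], q)
        d_k1 = math.dist(self.points[found[self.k]], q)
        self.center = q
        self.neighbors = found[:self.k]
        self.reuse_radius = (d_k1 - d_k) / 2.0
        return self.neighbors
```

The extra neighbor costs little during the search but tells the cache how far a query may move before the k-th and (k+1)-th neighbors could swap.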

21
Coherent Neighbor Cache (kNN, approximation)
  • Approximation error e
  • Enlarge radius
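
One way to read "enlarge radius" is sketched below, under the assumption that an approximation error e means the farthest returned neighbor may be at most (1 + e) times as far as the true k-th neighbor. Under that assumption the triangle inequality gives a reuse radius of ((1 + e) · d_(k+1) − d_k) / (2 + e), which reduces to the exact case for e = 0. The function names are hypothetical and the slide itself does not give the formula.

```python
import math

def reuse_radius(d_k, d_k1, e=0.0):
    """Largest query displacement for which a cached kNN set stays valid.

    d_k and d_k1 are the distances from the cached query to its k-th and
    (k+1)-th neighbors; e is the tolerated relative error (assumed meaning).
    """
    return ((1.0 + e) * d_k1 - d_k) / (2.0 + e)

def max_dist(points, indices, q):
    """Distance from q to the farthest point of a candidate neighbor set."""
    return max(math.dist(points[i], q) for i in indices)
```

For d_k = 0.6 and d_(k+1) = 1.6 the exact reuse radius is 0.5, while e = 0.1 enlarges it to about 0.552, so more nearby queries can be served from the cache.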

22
Outline
  • Related Work
  • Spatial Searching and Caching
  • Architecture and Prototype
  • Results
  • Conclusion

23
The Architecture
[Figure: architecture overview; label: Host]
24
Coherent Neighbor Cache
  • Eight cached neighborhoods
  • Problem: parallel queries in kd-tree module
  • → Interleave spatially similar queries

25
Kd-Tree Traversal
26
NodeRecurse
  • Kd-tree structure on chip
  • 16 threads
  • Pipelining and multi-threading

27
Stacks
  • 16 stacks
  • Parallel read/write
  • Bounded in depth
  • 6 bytes per thread per recursion
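
The figures above allow a quick estimate of the on-chip stack memory. The slide gives 16 stacks and 6 bytes per thread per recursion level but not the depth bound, so `max_depth` below is a hypothetical parameter and the function name is illustrative.

```python
def stack_bytes(max_depth, threads=16, bytes_per_entry=6):
    """Total stack storage for a bounded recursion depth."""
    return threads * bytes_per_entry * max_depth
```

For an assumed bound of 32 levels this gives 16 × 6 × 32 = 3072 bytes across all stacks.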

28
Leaf
  • 16 parallel priority queues (1-cycle ops)
  • Queues store pointers and distances
  • Bandwidth bottleneck

29
Processing Module
  • Multithreaded quad-port bank of 16 registers
  • 128 threads
  • Programmability using FPGA-technology

30
Further Data
  • Implemented on two FPGAs
  • 64-bit DDR DRAM
  • Interconnection: no overhead
  • Resource usage (registers and LUTs):
  • Virtex 2 Pro 100 (kNN): 26 registers, 38 LUTs
  • Virtex 2 Pro 70 (MLS): 47 registers, 52 LUTs
  • Clock frequency: 75 MHz

31
Outline
  • Related Work
  • Spatial Searching and Caching
  • Architecture and Prototype
  • Results
  • Conclusion

32
Applications
  • Tested on various applications
  • PCI interface of prototype slow

Weyrich et al. 04
Adams et al. 07
33
Results kNN
[Chart: number of queries vs. number of neighbors]
  • Clock frequencies shown: 75 MHz, 2200 MHz, 1200 MHz
  • ASIC estimate, 500 MHz: x6.6
  • Labels: CUDA x4, CUDA w/o sort x4.0, CPU x1.5, CUDA x2.4, CUDA w/o sort x3.1, CPU x1.4, CUDA x1.6, FPGA x1, CPU x1.1, FPGA x1, FPGA x1
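
The "ASIC estimate, 500 MHz x6.6" label is consistent with scaling the 75 MHz FPGA prototype by clock ratio alone. That reading is an assumption, since the transcript does not say how the estimate was obtained, and the function name is hypothetical.

```python
def clock_scaled_speedup(target_mhz, baseline_mhz=75.0):
    """Speedup expected if throughput scales linearly with clock frequency."""
    return target_mhz / baseline_mhz
```

`clock_scaled_speedup(500.0)` evaluates to about 6.67, in line with the x6.6 shown on the chart.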
34
Results kNN
  • Small hardware footprint
  • FPGA slightly slower
  • Realistic clock frequency
  • → Prototype faster than CPU/GPU

35
Results MLS
[Chart: number of queries vs. number of neighbors; clock frequencies shown: 75 MHz, 2200 MHz, 1200 MHz; labels: MLS CUDA x3.8, FPGA x1, MLS CPU x0.4]
  • kNN bottleneck
  • FPGA
  • GPU
  • FPGA faster than CPU

36
Coherent Neighbor Cache
[Chart: number of queries vs. level of coherence; series: CPU (e = 0.1), FPGA (e = 0.1), FPGA (exact)]
37
Results Approximation Error (MLS projection)
[Chart: MLS error; series: e approximation, no approx.]
38
Results Approximation Error (MLS projection)
[Chart: cache hits; label: e approximation]
39
Approximation Error (visual)
40
Approximation Error (visual)
  • Coherent Neighbor Cache
  • Not optimal for exact queries
  • Approximate queries
  • Can be tolerated in most cases
  • Greatly increases performance
  • Even for small approximations

41
Outline
  • Related Work
  • Spatial Searching and Caching
  • Architecture and Prototype
  • Results
  • Conclusion

42
Conclusion
  • Novel hardware architecture for
  • Nearest-neighbor searches
  • Generic meshless processing operators
  • Cache exploiting spatial coherence
  • Good performance considering resources
  • Possible GPU integration

43
Future Work
  • Programmable data structure
  • Support different data structures
  • Programmability in data structure
  • Construction on-chip
  • Real programmability in point processing module

44
A Hardware Processing Unit For Point Sets
  • S. Heinzle, G. Guennebaud, M. Botsch, M. Gross
  • Graphics Hardware 2008