Title: GPU Computational Geometry
1GPU Computational Geometry
- By Shawn Brown - April 3rd, 2007, CS790-058
2Overview
- Introduction to Computational Geometry
- 3 Papers in the area
3Computational Geometry
- Where am I? How do I get there?
- mapping
- Where is the closest post office?
- Nearest neighbor search
- Find all the movie theaters in a 10 mile square.
- Range queries
- Geometric Problems
- Think of problem solution in geometric terms
- Data structures and algorithms follow from this
approach
4CG Application Areas
- Computer Graphics
- Robotics (motion planning)
- Geographic Information Systems (mapping)
- CAD/CAM (design, manufacturing)
- Molecular Modeling
- Pattern Recognition
- Databases (queries)
- AI (Path finding)
- Etc
5Some broad themes
- Geometric Reasoning
- Vertices, lines, polygons, half-planes, simplices,
arrangements, connectedness, graph theory, etc.
- Normal CS data structures and algorithms
- Applied in geometric context
- Backwards Analysis
- Look at algorithm in reverse order to make proofs
- At current step (final step), how did I get here?
- Randomization techniques
- Randomly pick next object to work on from set
- Robustness and degeneracies
- Will algorithm work correctly under numerical
accuracy constraints?
- Will algorithm work correctly for co-incident,
co-linear, co-planar, redundant data, etc.?
6CG Data Structures Algorithms
- Convex hulls
- Polygon Triangulation
- Line segment intersection
- Linear Programming
- Minimum enclosing region (Disc, Sphere, box)
- Range Searching
- KD-Trees, Range Trees, Partition Trees, Simplex
Trees, Cutting Trees, etc.
- Point Location
- Trapezoidal Maps
7More data structures Algorithms
- Voronoi Diagrams
- Delaunay Triangulation (dual of Voronoi)
- Arrangements and Duality
- Windowing (Rectangle query)
- Binary Space Partitions (BSPs)
- Minkowski Sums (Motion Planning)
- Quad Trees
- Visibility Graphs (shortest path)
8GPU Limitations
- Fixed size memory
- Upper bound on amount of data handled
- Works best on stand-alone objects
- Each object handled has very few dependencies on
neighbors
- Works best on memory efficient data
- Cache coherent memory access
- Coalesce memory accesses
- Regular grids better than irregular meshes
- Neighbor dependencies as predictable patterns
- Works best on multiple objects in parallel
- Data structures and algorithms need to support this
- Works poorly on algorithms with dependencies on
previous steps
- Avoid comparisons between objects and levels
- Works best on algorithms with high arithmetic
intensity
- High cost of I/O vs. compute power
9GPU Solutions Data Structures Algorithms
- Data represented on regular grids (texture maps)
- Data access patterns are regular and predictable
- Data has few dependencies
- Each object is independent of its neighbors
- Any dependencies are read only, predictable,
cache coherent
- Dependencies across multiple iterations are
regular, predictable, and cache coherent
- Low bandwidth I/O
- Lots of compute operations per I/O operation
10GPU Vs. CPU
- Good Fits for GPU
- Voronoi Diagrams, Distance Fields
- Poor Fits for GPU
- Binary Searches, Tree searches (KDTrees, etc.)
- Can't parallelize (next compare dependent on
results of previous compare)
- Unpredictable, cache-incoherent access patterns
across multiple data objects
- Traditional Sorting
- Bitonic sort is exception
- Reductions (from n objects to single answer)
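Bitonic sort is the exception among sorts because its compare-exchange pattern is fixed in advance and never depends on the data. The following is a minimal sequential Python sketch, not GPU code; the function name is illustrative and the input length is assumed to be a power of two.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network."""
    a = list(values)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:                  # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:              # distance between compared elements
            # Each i in this loop touches a disjoint pair, so on a GPU
            # every i could be handled by its own thread or fragment.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The pairs (i, partner) depend only on the indices, which is what makes the access pattern regular and predictable.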
113 Research Papers
- Generic Mesh Refinement on GPU, by Tamy
Boubekeur and Christophe Schlick, 2005
- Dynamic LOD on GPU, by Junfeng Ji, Enhua Wu,
Sheng Li, and Xuehui Liu, 2005
- Isosurface Computation Made Simple: Hardware
Acceleration, Adaptive Refinement and Tetrahedral
Stripping, by Valerio Pascucci, Joint
Eurographics - IEEE TVCG Symposium on
Visualization (VisSym), 2004, p. 293-300.
121st Paper
- Generic Mesh Refinement on GPU by Tamy
Boubekeur and Christophe Schlick, Proceedings of
SIGGRAPH /Eurographics Graphics Hardware, 2005,
ACM Press
13Mesh Refinement - Intro
- Geometry Mesh Refinement
- Displacement Mapping
- Subdivision Surfaces
- Refinement Typically done on CPU
- GPU pipeline optimized for rendering millions of
triangles from vertex lists
- But lack of support for geometry generation on
GPU
- Goal: how to do mesh refinement on GPU
14Displacement mapping
- A texture (height map) is used to displace
underlying geometry.
- Displacement done in direction of local surface
normal.
- Re-tessellation of original polygons into
micro-polygons
- Example: Pixar's REYES in RenderMan
(from Wikipedia.com)
15SUBDIVISION
- The limit of an infinite refinement process
- Start with an initial polyhedral mesh, G0 = (V0,
E0, F0)
- Subdivide via a set of rules, Gn+1 = Subdivide(
Gn )
- Repeat subdivision step until refined polyhedral
mesh approximates desired smooth surface.
- Algorithm (one refinement step)
- New Edge Vertices (by weighting rules)
- Remesh each original face (new edges, new faces)
- Perturb original vertices (by weighting rules)
16Loop Subdivision: New Vertex Weighting Rules
Edge Mask: Interior Edge
Edge Mask: Border Edge
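The mask figures did not survive on this slide. As a reminder of what they show, here is a small Python sketch of the standard Loop edge masks (3/8 for each edge endpoint and 1/8 for each opposite vertex on an interior edge; 1/2 for each endpoint on a border edge). The function name and argument layout are illustrative only.

```python
def loop_edge_vertex(a, b, c=None, d=None):
    """New vertex for edge (a, b).

    c and d are the vertices opposite the edge in the two faces that share
    it; leave them out for a border edge, which has only one face.
    """
    if c is None or d is None:
        # Border edge mask: plain midpoint.
        return tuple(0.5 * (x + y) for x, y in zip(a, b))
    # Interior edge mask: 3/8, 3/8 on the edge, 1/8, 1/8 on the opposite vertices.
    return tuple(0.375 * (x + y) + 0.125 * (z + w)
                 for x, y, z, w in zip(a, b, c, d))
```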
17LOOP SUBDIVISION: REMESH
Remesh New Edges, New Faces
Create New Edge Vertices
18Loop Subdivision: Perturb Original Vertex Rules
Vertex Mask: Ordinary Valence
Vertex Mask: Extraordinary Valence
19Loop Subdivision: Refinement
Gn: Current Mesh
Create New Edges And Remesh
Gn+1: Subdivided Mesh
Perturb Original Vertices
20Previous Schemes
- Traditional subdivision schemes (Loop) require
dynamic adjacency information to implement.
- Adjacency information is cache coherent in at
most one direction (vertical or horizontal) for
both reads and writes
- Works best on CPU
- Works poorly on GPU
- Lack of cache coherency
- Hard to parallelize
21GPU LIMITATIONS
- Entire mesh must fit in GPU memory
- LOD rendering means n copies of different size
meshes must be stored in memory
- Dynamic meshes must be updated on each frame by
CPU
- Conclusion: use/update coarse meshes on CPU,
generate refined meshes on GPU to desired LOD.
22JUSTIFICATION
- Main reason: overcome bandwidth bottleneck
- CPU approach
- Load coarse mesh on CPU (thousands of polygons)
- Optionally load height map (for displacement mapping)
- Generate refined mesh on CPU (millions of polygons)
- Transfer refined mesh to GPU (high bandwidth)
- Render refined mesh on GPU
- GPU approach
- Load coarse mesh on CPU (thousands of polygons)
- Transfer coarse mesh to GPU (low bandwidth)
- Optionally transfer height map (for displacement mapping)
- Generate refined mesh on GPU (millions of polygons)
- Render refined mesh on GPU
- Secondary reason: offload workload from CPU onto GPU
23Proposed SOLUTION
- Generic Refinement Pattern (RP - template)
- Store RP as vertex buffer on GPU
- Use coarse triangle T as input to vertex shader
- Update and Draw virtual triangles of RP from
attributes of input Triangle T
24Algorithm
- Render( Mesh M)
- For each coarse triangle T in M do
- Place triangle attributes TA as inputs to vertex
shader
- Draw parameterized RP template instead of T
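To make the refinement pattern (RP) concrete, the sketch below builds one possible template in Python: a uniform tessellation of the unit triangle, stored as barycentric vertices plus triangle indices. The uniform layout, the function name, and the (u, v, w) ordering are assumptions for illustration; the paper's patterns may be organized differently.

```python
def refinement_pattern(n):
    """Uniform pattern with n segments per edge.

    Returns (verts, tris): verts are barycentric triples (u, v, w) with
    u + v + w = 1, tris are index triples into verts.
    """
    index = {}
    verts = []
    for i in range(n + 1):
        for j in range(n + 1 - i):
            index[(i, j)] = len(verts)
            u, v = i / n, j / n
            verts.append((u, v, 1.0 - u - v))
    tris = []
    for i in range(n):
        for j in range(n - i):
            # "Upward" triangle of the grid cell.
            tris.append((index[(i, j)], index[(i + 1, j)], index[(i, j + 1)]))
            # "Downward" triangle, absent along the diagonal border.
            if i + j < n - 1:
                tris.append((index[(i + 1, j)], index[(i + 1, j + 1)],
                             index[(i, j + 1)]))
    return verts, tris
```

Such a template is built once and reused for every coarse triangle, so only the coarse attributes change per draw.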
25MORE Details
- Need to map virtual vertices of pattern onto
actual attributes (<x,y,z>, <u,v>, etc.) of
triangle T
- Store virtual coordinates of pattern vertices V
as barycentric triple (u,v,w).
- V = {w, u, v} with w = 1 - u - v
- Given {P0, P1, P2} as actual positions of T
- Vpos = V.w * P0 + V.u * P1 + V.v * P2
- Other triangle attributes (u,v, colors, etc.) can
be generated in a similar manner from virtual
vertices
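The mapping above is a weighted sum of the three corner attributes. A minimal Python sketch of what the vertex shader computes per pattern vertex follows; the function name is hypothetical, and V is taken as a (u, v, w) tuple with w = 1 - u - v.

```python
def map_pattern_vertex(V, A0, A1, A2):
    """Interpolate a per-corner attribute (position, uv, color, ...) at V.

    V = (u, v, w) is the barycentric triple of a pattern vertex;
    A0, A1, A2 are the attribute values at the corners of triangle T.
    """
    u, v, w = V
    return tuple(w * a0 + u * a1 + v * a2 for a0, a1, a2 in zip(A0, A1, A2))
```

The same function works for any attribute with any number of components, which is why the approach is generic.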
26GPU Displacement MAPPING
- Given coarse triangle T with attributes TA
- Position, texture coords, normals, etc.
- <P0,P1,P2, u0,u1,u2, v0,v1,v2, N0,N1,N2>
- For each vertex V in RP template
- Interpolate position Pv = (x,y,z) from P0,P1,P2
- Interpolate texture values Huv = (u,v)
- Interpolate normal values Nv = (nx,ny,nz)
- Use texture coords (Huv) to get value h in
height map
- Compute displaced position
- Dv = Pv + h * Nv
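The per-vertex steps can be sketched in Python as below. This is an illustration, not the paper's shader: `height_at` is a hypothetical callable standing in for the height-map lookup, and renormalizing the interpolated normal is an added assumption that the slide does not state.

```python
import math

def lerp3(V, a0, a1, a2):
    """Barycentric interpolation of a corner attribute at V = (u, v, w)."""
    u, v, w = V
    return tuple(w * x + u * y + v * z for x, y, z in zip(a0, a1, a2))

def displaced_position(V, positions, uvs, normals, height_at):
    """Compute Dv = Pv + h * Nv for one pattern vertex V."""
    Pv = lerp3(V, *positions)            # interpolated position
    Huv = lerp3(V, *uvs)                 # interpolated texture coordinates
    Nv = lerp3(V, *normals)              # interpolated normal
    length = math.sqrt(sum(c * c for c in Nv)) or 1.0
    Nv = tuple(c / length for c in Nv)   # assumption: renormalize
    h = height_at(*Huv)                  # hypothetical height-map lookup
    return tuple(p + h * n for p, n in zip(Pv, Nv))
```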
27Procedural DISPLACEMENT Mapping
- Texture map access in vertex shader can be slow
(especially if accesses are not coherent).
- Use a parameter driven function instead which can
be quickly computed in vertex shader
- D = P + (a sin(f P)) N
28LEVEL of DETAIL (LOD)
- Store a set of larger and larger refinement
patterns on GPU: RP0, RP1, ..., RPn
- Use LOD techniques to pick appropriate LOD
pattern for refinement and rendering
29LIMITATIONS TO APPROACH
- No true subdivision scheme support
- No geometric continuity guarantees across shared
edges of coarse triangles
- LOD scheme is not adaptive and exhibits popping
artifacts
30Curved PN Triangles
- Purely local interpolating refinement scheme
- Fast mesh smoothing
- Provides visual smoothness
- Despite lack of geometric continuity across edges
- Generate triangle normals using linear or
quadratic interpolation (enhanced triangle
definition)
- Offers results similar to Modified Butterfly
subdivision scheme
31PERFORMANCE
- Environment
- P4 3.0 GHz
- Nvidia Quadro FX 4400 PCIe
- MS Windows XP
- Running on OpenGL
- Conclusion: frame rates are equivalent,
vertices on bus greatly reduced, CPU freed up
to work on tasks other than refinement.
32CONCLUSIONS
- Simple vertex shader method for low cost
tessellation of meshes on GPU
- At cost of linear interpolation of 3 original
triangle attributes for each virtual triangle
attribute in pattern
- Generic and economic PN-Triangle implementation
on GPU
- Reduced bandwidth on graphics bus
- Low, constant amount transferred regardless
of target refinement (use larger templates for
more refined results)
- CPU freed up
- to work on tasks other than refinement
332nd Paper
- Dynamic LOD on GPU, by Junfeng Ji, Enhua Wu, Sheng
Li, and Xuehui Liu, Proceedings of Computer
Graphics International (CGI), 2005, IEEE Computer
Society Press.
34Introduction
- Modern datasets are getting too large to visualize
at interactive rates
- Level of Detail (LOD) methods are used to greatly
reduce the amount of geometry that needs to be
visualized
- Because of complexity, LOD methods are
traditionally performed on the CPU
- This paper proposes a GPU LOD technique using
shaders
35PRIOR WORK
- Irregular Meshes
- Progressive Meshes, H. Hoppe, 1996
- Hierarchical Dynamic Simplification, D. Luebke,
1997
- Regular Meshes
- Multi-resolution Analysis of Arbitrary Meshes,
Eck et al., 1995
- Digital Elevation Models (DEMs), LOD Quad Trees,
Lindstrom 1996, Pajarola 1998
- Geometry Image Meshes, Gu, Hoppe et al., 2002
- Extended to polycube maps by Tarini et al., 2004.
- Point Techniques
- Qsplat, Rusinkiewicz, 2000
36Progressive Meshes
[Figure: edge collapse (ecol) and vertex split (vspl) operations on vertices vs, vt, vl, vr]
37Hierarchical DYNamic SIMPLIFIcATION
- Entire object represented as single vertex tree
- Start at base level
- Collapse group of vertices into parent
representative vertex (proxy)
- Render at appropriate LOD by traversing to level
of tree based on current viewing parameters
38Geometry Image Meshes
[Figure: geometry image pipeline: cut, parameterize, sample on regular grid, geometry image (RGB = XYZ), render]
39Poly-CUBE MAPS
- GIMs have complex distorted parameterizations
- Approximate geometry by polycube map
- Project Geometry onto PolyCube
- Store each face of polycube in texture atlas
TEXTURE ATLAS
40GOAL GPU LOD Geometry
- Perform LOD geometry selection dynamically on GPU
- GPU limitations push us towards a regular
representation of geometry
- For max efficiency, data structure must support
parallel algorithms.
41Proposed Solution
- Use Geometry Image Mesh (GIM) as underlying data
structure.
- Regular structure (texture map) works very well
on GPU.
- Use polycube texture atlas for complex objects
- Add LOD support via a modified quadtree data
structure called P-QuadTree.
42OVERVIEW of APPROACH
- Creation
- LOD Atlas Texture
- Rendering
- Select appropriate LOD level
- Render on GPU
43CREATION
- Generate GIM Atlas from 3D model
- Generate LOD atlas from GIM
- Generate additional texture maps
- Normal Map
- LOD metrics
- Index map (parent lookup)
44CREATE GIM ATLAS
- Generate polycube from geometry object using
semi-automatic technique from Tarini et al.
- Cut cube faces along edges to get individual
textures
- Pack face textures into square or rectangular
texture
- Sample texture atlas on regular grid
- Create GIM from projected samples
45CREATE LOD QUADTREE ATLAS
- For each chart, texture must be (2^m + 1) x (2^m + 1)
- Pad Texture with null samples
- Construct QuadTree top down using GPU Kernel
- Each node represents 3x3 of vertices
- Uses Restricted QuadTree triangulation
- Stack all levels of LOD quadtree in LOD Atlas
- Can be done in rectangle with ratio 1:1.5
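The size constraint means each chart is padded up to the next size of the form 2^m + 1 before the quadtree is built. A small Python sketch of that padding rule follows; the function name is illustrative and not from the paper.

```python
def padded_chart_size(n):
    """Return (m, size) for the smallest size = 2**m + 1 with size >= n.

    n is the chart's larger dimension in samples; the size - n extra rows
    or columns would be filled with null samples.
    """
    m = 0
    while (1 << m) + 1 < n:
        m += 1
    return m, (1 << m) + 1
```

For example, a chart 100 samples wide would be padded to 129 samples (m = 7).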
46RESTRICTED QUADTREE TRIANGULATION
- Avoid problems with cracks at T-intersections
- Compute error at each node
- Parent error always greater than children
- Constrain difference in error between neighboring
vertices to never be greater than one
- Check 2 nephews as well (cost of 2 texture
lookups)
47LOD NODES
- Each node represents 3x3 vertices and 8 triangles
- Easily rendered as triangle fan
- Bounding sphere around 9 vertices
- Not much information in paper on how they compute
normals or normal cone
48CUTTING AND PACKING
[Figure: cutting and packing, shown for rectangular charts and for square charts]
49GIM ATLAS LOD ATLAS
504 Texture maps required
- Geometry Map (GIM) (x,y,z) on regular grid
- Center position of node
- LOD Parameter map
- Error (used for LOD selection)
- Normal cone (used for back face culling)
- Bounding sphere radius (used for backface
culling)
- Normal Map (N.x,N.y,N.z)
- Normal at center position of node
- Index Map
- Parent node lookup
51RENDERING
- Pass 1
- LOD Selection (GPU Kernel)
- Pass 2
- Node Culling and Triangulation (GPU Kernel)
- Rasterization
- Pass triangles to normal render pipeline
52LOD Selection (GPU Kernel)
- Parameters
- Viewing frustum, viewing cone
- Pass in CPU LOD error threshold from viewpoint
- LOD Atlas textures
- 1-1 mapping (fragments processed to texels in LOD atlas)
- Algorithm
- 1. Kill invalid nodes (padded or empty pixels)
- 2. LOD threshold tests
- Threshold test parent LOD, if passes discard current node
- Threshold test current node, keep if passes
- 3. Culling tests
- Normal cone vs. view cone
- Bounding sphere vs. viewing frustum
- Output
- Bitmap (true/false) of LOD fragments
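The selection kernel runs the same independent test on every texel of the LOD atlas. The Python sketch below mimics that per-node logic on a plain list. It assumes that "passes" means the node's error is at or below the threshold, and it leaves the culling tests to a hypothetical `is_culled` callback; the dictionary field names are also illustrative.

```python
def select_lod_nodes(nodes, threshold, is_culled=lambda node: False):
    """Return one boolean per node, like the LOD bitmap.

    nodes: list of dicts with keys 'valid' (bool), 'error' (float) and
    'parent' (index into nodes, or None for a root).
    """
    selected = []
    for node in nodes:
        keep = node["valid"]                      # 1. kill invalid nodes
        if keep and node["parent"] is not None:
            parent = nodes[node["parent"]]
            if parent["error"] <= threshold:      # 2a. parent is good enough
                keep = False
        if keep and node["error"] > threshold:    # 2b. node itself too coarse
            keep = False
        if keep and is_culled(node):              # 3. culling tests
            keep = False
        selected.append(keep)
    return selected
```

Each node reads only its own record and its parent's, so the loop body maps directly onto one fragment per texel.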
53CULLING Triangulation (GPU Kernel)
- Cull node (false in LOD bitmap)
- Retrieve 3x3 vertices for each valid node using vertex texturing
- Cull invalid vertices (false in LOD bitmap) by moving them to infinity.
- Keep valid vertices (true in LOD bitmap)
- T-Intersection tests
- Check 4 edge vertices (1,3,5,7) for possible T cracks
- Check 2 adjacent nephew connections for each edge
- Can't actually delete vertices from triangle fan
- One position (active): actual edge vertex position
- 2nd position (inactive): move to corresponding corner vertex (disappears)
- Output: triangle fan from vertex shader
54Rendering PIPELINE
LOD Bitmap
LOD Threshold Kernel
Normal Map Atlas
LOD QuadTree Atlas
LOD Mesh Kernel (Cull Triangulate)
Rasterize Triangles
55RESULTS
56PERFORMANCE
- Environment
- VC
- Windows 2000
- OpenGL extensions
- CPU 2.8GHz Pentium 4
- 2G DRAM
- NVIDIA GeforceFX 5950
- 256 MB of DDR RAM
- Vertex texturing not available on our GPU,
- so this step is estimated in our test.
- GPU approach about 10x faster than CPU approach
57LIMITATIONS
- Minor discontinuity artifacts sometimes visible
- No speedup in LOD algorithm itself for small
distant objects vs. full size objects: O(1.5 x N)
- All nodes (texels) visited in both GPU kernels
- Speedup win is in rasterization (reduced tris)
VS.
All nodes visited
Subset of nodes visited
58CONCLUSIONS
- Proof of this LOD technique on GPU
- Robust
- Efficient (10x performance over CPU)
- Dynamic LOD
- Offloads work from CPU
- Future work
- Room for more complex operations in shader
- Adaptive Tessellation for Radiosity lighting
593rd Paper
- Isosurface Computation Made Simple: Hardware
Acceleration, Adaptive Refinement and Tetrahedral
Stripping, by Valerio Pascucci, Joint
Eurographics - IEEE TVCG Symposium on
Visualization (VisSym), 2004, p. 293-300.
60Introduction
- Use a GPU to speed up generation of iso-surfaces
from a 3D volume set for interactive exploration
of the data volume
- Spatially subdivide 3D volume with a
tetrahedral, volume-filling 3D curve
- Find all tetrahedrons containing desired
iso-value and interpolate a quad (or tri)
approximating iso-surface in that tetrahedron
- Complete set of generated quads (tris) forms
iso-surface corresponding to iso-value.
- Uses nested errors scheme to form consistent
meshes
61ISO-CONTOURS ISO-SURFACES
- Iso-contour: all points in a 2D data set with
the same function value
- Iso-surface: all points in a 3D data volume with
the same function value
Elevation Maps
Medical Imaging Scientific Visualization
622D Iso-Contours Spatial SUBDIVISION
TRIANGULATION
- Data set covered with triangulation
- 2D scalar function value associated with each
vertex, F(x,y)
- Generate iso-contours for a given iso-value
632D ISO-CONTOURS (INTERPOLATION)
- Find triangles with vertices that bracket desired
iso-value C(w).
- Create line segments that approximate iso-contour
by interpolating values on triangle edges
[Figure: triangle with vertex values F=0, F=3, F=2; iso-value C(w)=1.8 crosses two edges at F=1.8]
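The bracketing test and edge interpolation can be written in a few lines. Below is a minimal Python sketch for a single triangle using linear interpolation along each edge; the function name and argument layout are illustrative.

```python
def contour_segment(points, values, iso):
    """Return the points where the iso-contour crosses the triangle's edges.

    points: three (x, y) vertices; values: F at those vertices.
    The result has 0 entries (no crossing) or 2 entries (one line segment).
    """
    crossings = []
    for i, j in ((0, 1), (1, 2), (2, 0)):
        fi, fj = values[i], values[j]
        if (fi < iso) != (fj < iso):          # edge endpoints bracket iso
            t = (iso - fi) / (fj - fi)        # linear interpolation weight
            crossings.append(tuple(a + t * (b - a)
                                   for a, b in zip(points[i], points[j])))
    return crossings
```

With vertex values 0, 3, 2 and iso-value 1.8 as in the slide's figure, the two edges touching the F=0 vertex are crossed and the third is not.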
642D Iso-COntours
- Collection of interpolating line segments form
final Iso-contour set for a given Iso-value.
653D ISo-SURFACES
- Use tetrahedrons instead of triangles
- Iso-surface approximated by quads (or tris)
instead of line segments
- Need to estimate normals for quads
[Figure: tetrahedron with vertex values F=0, F=3, F=2, F=4 and iso-value C(w)=2.5]
66PRIOR-WORK
- Iso-Surfaces
- Marching Cubes, Lorensen and Cline, 1987
- Octree with min/max scalars, Wilhelms and Van
Gelder, 1992
- Span Spaces, Livnat et al., 1996, Shen et al., 1996
- Occlusion Optimization, Livnat and Hansen, 1998
- Multi-Pass, Gao and Shen, 2001
- Nested Errors
- Longest edge bisection rule, various, 1997-2001
- Saturated Errors, Gerstner and Pajarola, 2000
67Marching CUBES, LORENSen, 1987
- Cubes formed from 4-pixel neighborhoods on each of 2
adjacent image slices
- Identify cubes containing iso-value
- Create surface interpolation according to 14
templates
- Normals computed from approximate gradient in
local neighborhood
68Saturated ERRORS, Gerstner et al, 2000
- Extraction: topology-preserving iso-surface
extraction from multi-resolution cubes
- Uses tetrahedrons to guarantee piecewise linear
connected components
- Automatically generates lookup table of all
possible valid topologies of cube
- Identifies critical points (genus changing
topology)
- Simplification: reduces size of mesh in topology
preserving manner
- Sorts critical points by importance
- User-defined threshold value eliminates critical
points below the threshold from topology first
69Building Blocks
- Generate quads from tetrahedrons
- Compute Normals
- Render quads
- View Dependent Refinement
- Tetrahedral Strips
- 3D Space Filling Curve
- Author says he uses 4 GPU kernels to accomplish
this technique but, in my opinion, he doesn't
explain it well, so I'm not quite sure which
building blocks are on GPU and which are on CPU
and how the overall flow of the program works
70Quad GENERATION
- Generate one Quad per tetrahedron
- Interpolate one vertex along each edge
- Mark invalid if outside range of 2 defining
vertices
- 3 types
- Empty tetrahedron (4 invalid co-incident
vertices)
- Triangle (1 invalid, 2 co-incident vertices)
- Quad (4 valid vertices)
- lookup tables for efficient interpolation
generation
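The three cases fall out of how many tetrahedron edges bracket the iso-value. The Python sketch below uses the standard marching-tetrahedra edge test rather than the paper's lookup tables or its degenerate-quad encoding; function names are illustrative, and the crossing points are returned in edge order, not ordered around the polygon.

```python
from itertools import combinations

def tetra_crossings(points, values, iso):
    """Interpolated crossing points on the 6 edges of one tetrahedron."""
    crossings = []
    for i, j in combinations(range(4), 2):    # the 6 edges
        fi, fj = values[i], values[j]
        if (fi < iso) != (fj < iso):          # edge brackets the iso-value
            t = (iso - fi) / (fj - fi)
            crossings.append(tuple(a + t * (b - a)
                                   for a, b in zip(points[i], points[j])))
    return crossings

def classify(crossings):
    """0 crossings: empty, 3: triangle, 4: quad (no other counts can occur)."""
    return {0: "empty", 3: "triangle", 4: "quad"}[len(crossings)]
```

With one vertex on one side of the iso-value there are 3 crossed edges, with two vertices on each side there are 4, which matches the triangle and quad cases listed above.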
71Compute Normals
Orientation (1 determinant)
where
Normal (3 determinants)
where
Note F can be stored in vertex.w coordinate for
efficiency
72Quad Rendering
- Draw quads in OpenGL directly
- Rely on OpenGL to solve problems
- Throw away invalid quads (4 co-incident vertices)
- Reduce quad to triangle (2 co-incident vertices)
- Use computed normals for shading
73View Dependent Refinement
- Adaptively refine a tetrahedral mesh
- Bisect the longest edge of tetrahedron, creating
2 new tetrahedrons
- Similar to octree
- Given a cube divided into 6 tetrahedrons
- 3 subdivisions of tetrahedrons give you a new
smaller grid of 8 cubes (1 level of octree
subdivision)
- Cube subdivision can be done via a simple rule
pattern without ever measuring lengths
74View Dependent REFINEMENT, cont
75View DEPENDENT REFINEMENT, III
- Each split point is actually at center of several
tetrahedrons which form a shape called a diamond.
- Each tetrahedron can be associated with the
vertex that caused its split into a diamond
shape
- The hierarchy of refined tetrahedrons forms a
binary tree
- The hierarchy of diamonds is more complicated and
takes the form of a directed acyclic graph (DAG).
- Starting from a uniform grid guarantees a
predictable pattern to the size and shape of
tetrahedrons
- i.e. no need to store auxiliary info (it can be
calculated based on the level of the subdivision)
76View DEPENDENT REFINEMENT - Algorithm
- RefineMesh( tetra, level, tier )
- v = Bisect longest edge, by tier pattern (0,1,2)
- If (level = max level) or satisfies_tolerance( v, level )
- Draw_ISO_Surface( tetra )
- Cull Mesh
- Bisection point outside [min, max] bounding distances
- Bounding sphere of diamond outside view frustum
- Recursively refine mesh, by tier pattern (0,1,2)
- RefineMesh( left tetra, level + 1, (tier + 1) mod 3 )
- RefineMesh( right tetra, level + 1, (tier + 1) mod 3 )
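The recursion can be sketched in Python as follows. This is a simplified illustration rather than the paper's kernel: it finds the longest edge by actually measuring lengths instead of using the tier pattern, it omits the culling step, and `satisfies_tolerance` and `draw` are caller-supplied callbacks with hypothetical signatures.

```python
import math

def bisect_longest_edge(tetra):
    """Split a tetrahedron (tuple of 4 points) at its longest edge's midpoint."""
    pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
    i, j = max(pairs, key=lambda p: math.dist(tetra[p[0]], tetra[p[1]]))
    mid = tuple((a + b) / 2 for a, b in zip(tetra[i], tetra[j]))
    left = list(tetra)
    left[j] = mid            # keeps endpoint i, replaces endpoint j
    right = list(tetra)
    right[i] = mid           # keeps endpoint j, replaces endpoint i
    return mid, tuple(left), tuple(right)

def refine_mesh(tetra, level, max_level, satisfies_tolerance, draw):
    """Refine until the level limit or the error tolerance is met, then draw."""
    v, left, right = bisect_longest_edge(tetra)
    if level >= max_level or satisfies_tolerance(v, level):
        draw(tetra)
        return
    refine_mesh(left, level + 1, max_level, satisfies_tolerance, draw)
    refine_mesh(right, level + 1, max_level, satisfies_tolerance, draw)
```

Passing `leaves.append` as `draw` collects the leaf tetrahedrons, which is a convenient way to inspect the resulting mesh.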
77Satisfies_TOLERANCE( V, level )
- Projects the error of vertex v onto the current
view plane from the closest point of bounding
sphere of diamond
- View plane can be computed from level
- Size of bounding sphere can also be computed from
level
- Returns true if projected error is smaller than a
given threshold tolerance (global variable)
- Written to guarantee that if any diamond is
included in the current mesh then all its
parents are also included.
- Therefore the adaptive mesh will have no cracks
78Results of Adaptive Refinement
Adaptive
Non-Adaptive
79Tetrahedral Strips (Streaming)
- Transferring all the vertices of base level
tetrahedrons from CPU to GPU consumes a lot of
bandwidth
- Use tetrahedral strips, similar to triangle
strips in 2D, to reduce vertex bandwidth
- Any 2 adjacent tetrahedrons have 3 vertices in
common (meaning only 1 new vertex needs to be
transferred).
- Use adjacency graph info to build strips
- Results in a 60% decrease in vertex bandwidth
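The saving from stripping follows from simple vertex counting: independent tetrahedrons send 4 vertices each, while a strip sends 4 for the first tetrahedron and 1 for each one after it. The Python sketch below computes the fraction saved for a single unbroken strip; the function name is illustrative, and it does not model how the paper's strips are actually built or how long they are.

```python
def strip_vertex_savings(n_tetra):
    """Fraction of vertex transfers saved by one strip of n_tetra tetrahedrons."""
    independent = 4 * n_tetra          # 4 vertices per stand-alone tetrahedron
    stripped = 4 + (n_tetra - 1)       # 4 for the first, then 1 per extra
    return 1.0 - stripped / independent
```

The saving grows with strip length and approaches 75% for very long strips.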
803D Space Filling curve
- The author recommends using a new 3D space-filling
curve, based on Sierpinski's curve adapted for
tetrahedrons, that fills 1/6 of a cube. 6 such
curves fill a 3D cube.
- Author provides no details other than some
pictures and a link to another paper
81Performance Results
800 MHz Pentium CPU, 800 RAM Main Memory, Linux
Operating System
82CONTRIBUTIONS
- Simple technique presented for generating
iso-surfaces from a 3D volume data set
- Using tetrahedrons and OpenGL quads
- Allows interactive rendering of 512x512x512 data
sets
- Adaptive Refinement
- Based on viewing direction
- Uses longest edge bisection
- Uses nested errors scheme to avoid cracks in
meshes
- Tetrahedral Strips
- Optimizes bandwidth from CPU to GPU
- 3D space-filling curve