GPU Computational Geometry

1
GPU Computational Geometry

By Shawn Brown - April 3rd, 2007, CS790-058

2
Overview

Introduction to Computational Geometry
3 Papers in the area

3
Computational Geometry

Where am I? How do I get there?
mapping
Where is the closest post office?
Nearest neighbor search
Find all the movie theaters in a 10 mile square.
Range queries
Geometric Problems
Think of problem & solution in geometric terms
Data structures & algorithms follow from this
approach

4
CG Application Areas

Computer Graphics
Robotics (motion planning)
Geographic Information Systems (mapping)
CAD/CAM (design, manufacturing)
Molecular Modeling
Pattern Recognition
Databases (queries)
AI (Path finding)
Etc

5
Some broad themes

Geometric Reasoning
Vertices, lines, polygons, half-planes, simplices,
arrangements, connectedness, graph theory, etc.
Normal CS data structures & algorithms
Applied in geometric context
Backwards Analysis
Look at algorithm in reverse order to make proofs
At current step (final step), how did I get here?
Randomization techniques
Randomly pick next object to work on from set
Robustness & Degeneracies
Will the algorithm work correctly under numerical
accuracy constraints?
Will the algorithm work correctly for co-incident,
co-linear, co-planar, redundant data, etc.?
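
To make the robustness & degeneracy questions concrete, here is a minimal sketch (not taken from the presentation) of the classic 2D orientation predicate. Evaluating it with Python's exact `fractions.Fraction` arithmetic lets a co-linear triple be classified exactly as degenerate, whereas plain floating-point evaluation can misclassify nearly co-linear points. The function name `orient2d` is our own choice.

```python
from fractions import Fraction

def orient2d(a, b, c):
    """Sign of the 2D cross product (b - a) x (c - a).

    Returns +1 for a left (counter-clockwise) turn, -1 for a right
    (clockwise) turn, and 0 when a, b, c are co-linear. Coordinates are
    converted to Fractions so the determinant is computed exactly.
    """
    ax, ay = map(Fraction, a)
    bx, by = map(Fraction, b)
    cx, cy = map(Fraction, c)
    det = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    return (det > 0) - (det < 0)

print(orient2d((0, 0), (1, 0), (0, 1)))   # counter-clockwise turn
print(orient2d((0, 0), (1, 1), (2, 2)))   # co-linear: a degenerate case
```

Decimal strings such as `"0.1"` are parsed exactly by `Fraction`, so `orient2d(("0.1", "0.1"), ("0.2", "0.2"), ("0.3", "0.3"))` reports the points as co-linear.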

6
CG Data Structures & Algorithms

Convex hulls
Polygon Triangulation
Line segment intersection
Linear Programming
Minimum enclosing region (Disc, Sphere, box)
Range Searching
KD-Trees, Range Trees, Partition Trees, Simplex
Trees, Cutting trees, etc.
Point Location
Trapezoidal Maps

7
More Data Structures & Algorithms

Voronoi Diagrams
Delaunay Triangulation (dual of Voronoi)
Arrangements and Duality
Windowing (Rectangle query)
Binary Space Partitions (BSPs)
Minkowski Sums (Motion Planning)
Quad Trees
Visibility Graphs (shortest path)

8
GPU Limitations

Fixed size memory
Upper bound on amount of data handled
Works best on stand-alone objects
Each object handled has very few dependencies on
neighbors
Works best on memory-efficient data
Cache coherent memory access
Coalesce memory accesses
Regular grids better than irregular meshes
Neighbor dependencies as predictable patterns
Works best on multiple objects in parallel
Data structures & algorithms need to support this
Works poorly on algorithms with dependencies on
previous steps
Avoid comparisons between objects and levels
Works best on algorithms with high arithmetic
intensity
High cost of I/O vs. compute power

9
GPU Solutions: Data Structures & Algorithms

Data represented on regular grids (texture maps)
Data access patterns are regular and predictable
Data has few dependencies
Each object is independent of its neighbors
Any dependencies are read only, predictable,
cache coherent
Dependencies across multiple iterations are
regular, predictable, and cache coherent
Low bandwidth I/O
Lots of compute operations per I/O operation

10
GPU Vs. CPU

Good Fits for GPU
Voronoi Diagrams, Distance Fields
Poor Fits for GPU
Binary Searches, Tree searches (KDTrees, etc.)
Can't parallelize (next compare depends on the
result of the previous compare)
Unpredictable, cache-incoherent access patterns
across multiple data objects
Traditional Sorting
Bitonic sort is the exception
Reductions (from n objects to single answer)
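
Bitonic sort is the exception because its sequence of compare-exchange steps is fixed in advance and never depends on the data, so each step can run as one parallel pass over all elements. Below is a minimal sequential Python sketch of the network, assuming the input length is a power of two; on a GPU, each pass of the loop over `i` would be a single data-parallel kernel or render pass.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:                  # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:               # compare distance within this merge stage
            # The pairs (i, i ^ j) are disjoint, so this loop is one
            # data-parallel pass: every i could run on its own thread.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

print(bitonic_sort([5, 1, 4, 1, 5, 9, 2, 6]))
```

The network always performs the same O(n log² n) comparisons, more than a good CPU sort, but with none of the data-dependent branching that makes traditional sorts a poor GPU fit.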

11
3 Research Papers

Generic Mesh Refinement on GPU, by Tamy
Boubekeur and Christophe Schlick, 2005
Dynamic LOD on GPU by Junfeng Ji, Enhua Wu,
Sheng Li, and Xuehui Liu, 2005
Isosurface Computation Made Simple: Hardware
Acceleration, Adaptive Refinement and Tetrahedral
Stripping by Valerio Pascucci, Joint
Eurographics - IEEE TVCG Symposium on
Visualization (VisSym), 2004, p. 293-300.

12
1st Paper

Generic Mesh Refinement on GPU by Tamy
Boubekeur and Christophe Schlick, Proceedings of
SIGGRAPH/Eurographics Graphics Hardware, 2005,
ACM Press

13
Mesh Refinement - Intro

Geometry Mesh Refinement
Displacement Mapping
Subdivision Surfaces
Refinement typically done on CPU
GPU pipeline optimized for rendering millions of
triangles from vertex lists
But lacks support for geometry generation on
GPU
Goal: How to do mesh refinement on GPU

14
Displacement mapping

A texture (height map) is used to displace
underlying geometry.
Displacement done in direction of local surface
normal.
Re-tessellation of original polygons into
micro-polygons
Example: Pixar's REYES (RenderMan)

(Figure from Wikipedia.com)
15
SUBDIVISION

The limit of an infinite refinement process
Start with an initial polyhedral mesh, $G_0 = (V_0, E_0, F_0)$
Subdivide via a set of rules, $G_{n+1} = \mathrm{Subdivide}(G_n)$
Repeat subdivision step until refined polyhedral
mesh approximates desired smooth surface.
Algorithm (One Refinement step)
New Edge Vertices (by weighting rules)
Remesh each original face (new edges, new faces)
Perturb original vertices (by weighting rules)

16
Loop Subdivision: New Vertex Weighting Rules
Edge Mask: Interior Edge
Edge Mask: Border Edge
17
LOOP SUBDIVISION: REMESH
Remesh: New Edges, New Faces
Create New Edge Vertices
18
LOOP SUBDIVISION: Perturb Original Vertex Rules
Vertex Mask: Ordinary Valence
Vertex Mask: Extra-ordinary Valence
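
The masks on these slides appear only as pictures in the deck. As a hedged reference, here is a minimal Python sketch of the standard masks from Loop's scheme: weights 3/8, 3/8, 1/8, 1/8 for a new vertex on an interior edge, 1/2, 1/2 on a border edge, and Loop's weight β(n) for repositioning an original interior vertex of valence n (β = 1/16 for the ordinary valence 6; extra-ordinary vertices use the same formula with n ≠ 6). Border-vertex rules are omitted, and all helper names are illustrative.

```python
import math

def loop_beta(n):
    """Loop's neighbour weight for an interior vertex of valence n."""
    c = 3.0 / 8.0 + 0.25 * math.cos(2.0 * math.pi / n)
    return (5.0 / 8.0 - c * c) / n

def weighted_sum(weights, points):
    """Weighted sum of 3D points given as (x, y, z) tuples."""
    return tuple(sum(w * p[i] for w, p in zip(weights, points)) for i in range(3))

def interior_edge_vertex(a, b, c, d):
    """New vertex on interior edge (a, b); c and d are the opposite vertices."""
    return weighted_sum([3 / 8, 3 / 8, 1 / 8, 1 / 8], [a, b, c, d])

def border_edge_vertex(a, b):
    """New vertex on a border edge: the edge midpoint."""
    return weighted_sum([0.5, 0.5], [a, b])

def perturbed_vertex(v, neighbors):
    """Reposition an original interior vertex from its n one-ring neighbours."""
    n = len(neighbors)
    beta = loop_beta(n)
    return weighted_sum([1 - n * beta] + [beta] * n, [v] + list(neighbors))
```

A refinement step applies the edge masks to every edge, remeshes each face into four, and then applies the vertex mask to every original vertex, always reading from the old mesh $G_n$.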
19
Loop Subdivision: Refinement
$G_n$: Current Mesh
Create New Edges and Remesh
Perturb Original Vertices
$G_{n+1}$: Subdivided Mesh
20
Previous Schemes

Traditional subdivision schemes (Loop) require
dynamic adjacency information to implement.
Adjacency information is cache coherent in at
most one direction (vertical or horizontal) for
both reads and writes
Works best on CPU
Works poorly on GPU
Lack of cache coherency
Hard to parallelize

21
GPU LIMITATIONS

Entire mesh must fit in GPU memory
LOD rendering means n copies of different size
meshes must be stored in memory
Dynamic Meshes must be updated on each frame by
CPU
Conclusion: Use/update coarse meshes on CPU,
generate refined meshes on GPU to desired LOD.

22
JUSTIFICATION

Main Reason: Overcome Bandwidth Bottleneck
CPU approach
Load coarse mesh on CPU (thousands of polygons)
Optionally load height map (for displacement
mapping)
Generate refined mesh on CPU (millions of
polygons)
Transfer refined mesh to GPU (high bandwidth)
Render refined mesh on GPU
GPU approach
Load coarse mesh on CPU (thousands of polygons)
Transfer coarse mesh to GPU (low bandwidth)
Optionally transfer height map (for displacement
mapping)
Generate refined mesh on GPU (millions of
polygons)
Render refined mesh on GPU
Secondary Reason: Offload workload from CPU
onto GPU

23
Proposed SOLUTION

Generic Refinement Pattern (RP - template)
Store RP as vertex buffer on GPU
Use coarse triangle T as input to vertex shader
Update and Draw virtual triangles of RP from
attributes of input Triangle T

24
Algorithm

Render( Mesh M)
For each coarse triangle T in M do
Place triangle attributes TA as inputs to vertex
shader
Draw parameterized RP template instead of T

25
MORE Details

Need to map virtual vertices of pattern onto
actual attributes (<x,y,z>, <u,v>, etc.) of
triangle T
Store virtual coordinates of pattern vertices V
as barycentric triple (u,v,w).
$V_{wuv} = (w, u, v)$ with $w = 1 - u - v$
Given $P_0, P_1, P_2$ as actual positions of T
$V_{pos} = V.w\,P_0 + V.u\,P_1 + V.v\,P_2$
Other triangle attributes (u,v, colors, etc.) can
be generated in a similar manner from virtual
vertices
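
A minimal Python sketch of how such a refinement pattern could be built and mapped onto a coarse triangle. The uniform grid layout and the `level` parameter are our own assumptions (the paper's pattern generator is not shown here); the mapping itself is exactly the barycentric formula above, which in the paper runs per vertex in the vertex shader with T's attributes as shader inputs.

```python
def refinement_pattern(level):
    """Barycentric (w, u, v) vertices and triangles of a uniform pattern.

    With `level` subdivisions per edge the pattern has
    (level + 1) * (level + 2) / 2 vertices and level**2 triangles.
    """
    if level < 1:
        raise ValueError("level must be at least 1")
    verts, index = [], {}
    for i in range(level + 1):              # i steps along u
        for j in range(level + 1 - i):      # j steps along v
            u, v = i / level, j / level
            index[(i, j)] = len(verts)
            verts.append((1.0 - u - v, u, v))
    tris = []
    for i in range(level):
        for j in range(level - i):
            a, b, c = index[(i, j)], index[(i + 1, j)], index[(i, j + 1)]
            tris.append((a, b, c))                      # "upward" triangle
            if i + j + 1 < level:
                d = index[(i + 1, j + 1)]
                tris.append((b, d, c))                  # "downward" triangle
    return verts, tris

def map_vertex(bary, p0, p1, p2):
    """V_pos = V.w * P0 + V.u * P1 + V.v * P2 for one virtual vertex."""
    w, u, v = bary
    return tuple(w * a + u * b + v * c for a, b, c in zip(p0, p1, p2))

verts, tris = refinement_pattern(4)
print(len(verts), len(tris))
```

The pattern is built once and stored on the GPU; only the three corner attributes of each coarse triangle change from draw to draw, which is why the bus traffic stays constant regardless of the refinement level.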

26
GPU Displacement MAPPING

Given coarse triangle T with attributes TA
Position, texture coords, normals, etc.
<P0,P1,P2, u0,u1,u2, v0,v1,v2, N0,N1,N2>
For each vertex V in RP template
Interpolate position $P_v = (x,y,z)$ from P0,P1,P2
Interpolate texture coordinates $H_{uv} = (u,v)$
Interpolate normal $N_v = (n_x, n_y, n_z)$
Use texture coords ($H_{uv}$) to get value h in
height map
Compute displaced position
$D_v = P_v + h\,N_v$
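
The per-vertex steps above can be sketched in Python as follows. The nearest-neighbour height lookup stands in for the GPU texture fetch, and re-normalising the blended normal is our own assumption; neither detail is specified on the slide.

```python
import math

def interpolate(bary, a0, a1, a2):
    """Barycentric blend w*a0 + u*a1 + v*a2 of equal-length tuples."""
    w, u, v = bary
    return tuple(w * x + u * y + v * z for x, y, z in zip(a0, a1, a2))

def sample_height(height_map, uv):
    """Nearest-neighbour lookup in a row-major height map, uv in [0, 1]."""
    rows, cols = len(height_map), len(height_map[0])
    col = min(int(uv[0] * cols), cols - 1)
    row = min(int(uv[1] * rows), rows - 1)
    return height_map[row][col]

def displace(bary, P, UV, N, height_map):
    """D_v = P_v + h * N_v for one virtual vertex of the refinement pattern.

    P, UV and N hold the coarse triangle's three positions, texture
    coordinates and normals.
    """
    p = interpolate(bary, *P)
    uv = interpolate(bary, *UV)
    n = interpolate(bary, *N)
    length = math.sqrt(sum(c * c for c in n))
    n = tuple(c / length for c in n)           # re-normalise blended normal
    h = sample_height(height_map, uv)
    return tuple(pc + h * nc for pc, nc in zip(p, n))

P = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
UV = ((0, 0), (1, 0), (0, 1))
N = ((0, 0, 1),) * 3
print(displace((0.0, 1.0, 0.0), P, UV, N, [[0.0, 1.0], [2.0, 3.0]]))
```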

27
Procedural DISPLACEMENT Mapping

Texture Map access in Vertex Shader can be slow
(especially if accesses are not coherent).
Use a parameter driven function instead which can
be quickly computed in Vertex Shader

$D = P + a \sin(f P)\,N$
28
LEVEL of DETAIL (LOD)

Store a set of larger and larger refinement
patterns on GPU: RP0, RP1, ..., RPn
Use LOD techniques to pick appropriate LOD
pattern for refinement and rendering

29
LIMITATIONS TO APPROACH

No true subdivision scheme support
No geometric continuity guarantees across shared
edges of coarse triangles
LOD Scheme is not adaptive and exhibits popping
artifacts

30
Curved PN Triangles

Purely local interpolating refinement scheme
Fast mesh smoothing
Provides visual smoothness
Despite lack of geometric continuity across edges
Generate triangle normals using linear or
quadratic interpolation (enhanced triangle
definition)
Offers results similar to Modified Butterfly
subdivision scheme
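
A hedged sketch of the position part of the PN-triangle construction as commonly published (Vlachos et al., 2001): each corner normal bends the nearby cubic Bézier control points out of the flat triangle, and the patch is evaluated at the barycentric coordinates of each refinement-pattern vertex. The quadratic normal interpolation is omitted, and the helper names are our own.

```python
from math import factorial

def add(*vs):
    """Component-wise sum of 3D tuples."""
    return tuple(sum(c) for c in zip(*vs))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def scale(s, v):
    return tuple(s * c for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pn_control_points(P, N):
    """Cubic Bezier control points of a PN triangle, keyed by (i, j, k).

    P and N hold the corner positions and unit normals; exponent i belongs
    to P[0], j to P[1] and k to P[2].
    """
    def edge_point(a, b):
        # One third of the way from P[a] to P[b], projected onto the tangent
        # plane at P[a] by removing its component along N[a].
        w = dot(sub(P[b], P[a]), N[a])
        return scale(1.0 / 3.0, add(scale(2.0, P[a]), P[b], scale(-w, N[a])))

    b = {(3, 0, 0): P[0], (0, 3, 0): P[1], (0, 0, 3): P[2],
         (2, 1, 0): edge_point(0, 1), (1, 2, 0): edge_point(1, 0),
         (0, 2, 1): edge_point(1, 2), (0, 1, 2): edge_point(2, 1),
         (1, 0, 2): edge_point(2, 0), (2, 0, 1): edge_point(0, 2)}
    E = scale(1.0 / 6.0, add(*[b[key] for key in b if 3 not in key]))
    V = scale(1.0 / 3.0, add(*P))
    b[(1, 1, 1)] = add(E, scale(0.5, sub(E, V)))     # centre control point
    return b

def pn_point(b, bary):
    """Evaluate the cubic patch at barycentric coordinates (b0, b1, b2)."""
    result = (0.0, 0.0, 0.0)
    for (i, j, k), cp in b.items():
        coeff = 6 / (factorial(i) * factorial(j) * factorial(k))
        weight = coeff * bary[0] ** i * bary[1] ** j * bary[2] ** k
        result = add(result, scale(weight, cp))
    return result
```

Because only the three corners and normals of one triangle are used, the scheme stays purely local, which is also why it cannot guarantee geometric continuity across edges.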

31
PERFORMANCE

Environment
P4 3.0 GHz
Nvidia Quadro FX 4400 PCIe
MS Windows XP
Running on OpenGL

Conclusion: Frame rates are equivalent,
Vertices on bus greatly reduced, CPU freed up
to work on other tasks than refinement.
32
CONCLUSIONS

Simple Vertex Shader Method for low cost
tessellation of meshes on GPU
At cost of linear interpolation of 3 original
triangle attributes for each virtual triangle
attribute in pattern
Generic and Economic PN-Triangle implementation
on GPU
Reduced bandwidth on graphics bus
Low, constant amount transferred regardless
of target refinement (use larger templates for
more refined results)
CPU freed up
to work on other tasks than refinement

33
2nd Paper

Dynamic LOD on GPU by Junfeng Ji, Enhua Wu, Sheng
Li, and Xuehui Liu, Proceedings of Computer
Graphics International (CGI), 2005, IEEE Computer
Society Press.

34
Introduction

Modern datasets are getting too large to visualize
at interactive rates
Level of Detail (LOD) methods are used to greatly
reduce the amount of geometry that needs to be
visualized
Because of complexity, LOD methods are
traditionally performed on the CPU
This paper proposes a GPU LOD technique using
shaders

35
PRIOR WORK

Irregular Meshes
Progressive Meshes, H. Hoppe, 1996
Hierarchical Dynamic Simplification, D. Luebke,
1997
Regular Meshes
Multi-resolution Analysis of Arbitrary Meshes,
Eck et al., 1995
Digital Elevation Models (DEMs) LOD Quad Trees,
Lindstrom 1996, Pajarola 1998
Geometry Image Meshes, Gu, Hoppe et al., 2002
Extended to polycube maps by Tarini et al., 2004
Point Techniques
QSplat, Rusinkiewicz, 2000

36
Progressive Meshes
[Figure: edge collapse ecol(vs, vt, vs') and its inverse, vertex split vspl(vs, vl, vr, vs', vt); vl and vr are the left and right neighbors of edge (vs, vt)]
37
Hierarchical Dynamic Simplification

Entire object represented as single vertex tree
Start at base level
Collapse group of vertices into parent
representative vertex (proxy)
Render at appropriate LOD by traversing to level
of tree based on current viewing parameters

38
Geometry Image Meshes
[Figure: pipeline Cut → Parameterize → Sample on regular grid → Geometry image (RGB = XYZ) → Render]
39
Poly-CUBE MAPS

GIMs have complex distorted parameterizations
Approximate geometry by polycube map
Project Geometry onto PolyCube
Store each face of polycube in texture atlas

[Figure: texture atlas]
40
GOAL: GPU LOD Geometry

Perform LOD geometry selection dynamically on GPU
GPU limitations push us towards a regular
representation of geometry
For max efficiency, data structure must support
parallel algorithms.

41
Proposed Solution

Use Geometry Image Mesh (GIM) as underlying data
structure.
Regular structure (texture map) works very well
on GPU.
Use Polycube texture atlas for complex objects
Add LOD support via a modified Quad Tree data
structure called P-QuadTree.

42
OVERVIEW of APPROACH

Creation
LOD Atlas Texture
Rendering
Select appropriate LOD level
Render on GPU

43
CREATION

Generate GIM Atlas from 3D model
Generate LOD atlas from GIM
Generate additional texture maps
Normal Map
LOD metrics
Index map (parent lookup)

44
CREATE GIM ATLAS

Generate Polycube from geometry object using
semi-automatic technique from Tarini et al.
Cut cube faces along edges to get individual
textures
Pack face textures into square or rectangular
texture
Sample texture atlas on regular grid
Create GIM from projected samples

45
CREATE LOD QUADTREE ATLAS

For each chart, texture must be $(2^m+1) \times (2^m+1)$
Pad texture with null samples
Construct QuadTree top down using GPU kernel
Each node represents 3x3 vertices
Uses Restricted QuadTree triangulation
Stack all levels of LOD quadtree in LOD Atlas
Can be done in rectangle with ratio 1:1.5

46
RESTRICTED QUADTREE TRIANGULATION

Avoid problems with cracks at T-intersections
Compute error at each node
Parent error always greater than children
Constrain difference in error between neighboring
vertices to never be greater than one
Check 2 nephews as well (cost of 2 texture
lookups)

47
LOD NODES

Each node represents 3x3 vertices and 8 triangles
Easily rendered as triangle fan
Bounding sphere around 9 vertices
Not much information in paper on how they compute
normals or normal cone

48
CUTTING AND PACKING
[Figure: cutting and packing of rectangular charts and square charts]
49
GIM ATLAS & LOD ATLAS
50
4 Texture maps required

Geometry Map (GIM) (x,y,z) on regular grid
Center position of node
LOD Parameter map
Error (used for LOD selection)
Normal cone (used for back-face culling)
Bounding sphere radius (used for view-frustum
culling)
Normal Map (N.x,N.y,N.z)
Normal at center position of node
Index Map
Parent node lookup

51
RENDERING

Pass 1
LOD Selection (GPU Kernel)
Pass 2
Node Culling and Triangulation (GPU Kernel)
Rasterization
Pass triangles to normal render pipeline

52
LOD Selection (GPU Kernel)

Parameters
Viewing frustum, viewing cone
Pass in CPU LOD error threshold from viewpoint
LOD Atlas textures
1-1 mapping (fragments processed to texels in LOD
atlas)
Algorithm
1. Kill invalid nodes (padded or empty pixels)
2. LOD threshold tests
Threshold test parent LOD, if passes discard
current node
Threshold test current node, keep if passes
3. Culling tests
Normal: Normal Cone vs. View Cone
Bounding Sphere vs. Viewing frustum
Output
Bitmap (true/false) of LOD fragments
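
A per-node sketch of the selection kernel, written as a plain Python function that would run once per LOD-atlas texel. The dictionary layout, the error/distance metric standing in for projected error, and the simplified normal-cone test (which ignores the node's spatial extent) are all assumptions for illustration; these slides do not give the paper's exact formulas.

```python
import math

def lod_select(node, eye, threshold, planes):
    """True if this LOD node should be drawn (one 'fragment' of the kernel).

    node: dict with 'valid', 'center', 'radius', 'error', 'parent_error',
          'cone_axis' (unit vector) and 'cone_angle' (half-angle, radians).
    planes: frustum planes (n, d) with n pointing inward, so a point p is
            inside when dot(n, p) + d >= 0.
    """
    if not node["valid"]:                           # 1. padded or empty texel
        return False
    to_node = [c - e for c, e in zip(node["center"], eye)]
    dist = math.sqrt(sum(c * c for c in to_node)) or 1e-9
    # 2. LOD threshold tests (error / distance is a hypothetical stand-in
    #    for the projected screen-space error).
    if node["parent_error"] / dist <= threshold:    # parent good enough: drop
        return False
    if node["error"] / dist > threshold:            # node itself too coarse
        return False
    # 3a. Back-face test: cull if every normal in the cone faces away.
    view_dir = [c / dist for c in to_node]
    cos_axis = sum(a * v for a, v in zip(node["cone_axis"], view_dir))
    if cos_axis > math.sin(node["cone_angle"]):
        return False
    # 3b. Bounding sphere vs. view frustum.
    for n, d in planes:
        if sum(a * b for a, b in zip(n, node["center"])) + d < -node["radius"]:
            return False
    return True
```

Every texel runs the same test independently and writes one boolean, which is what lets the kernel fill the LOD bitmap in a single pass.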

53
CULLING & Triangulation (GPU Kernel)

Cull node (false in LOD bitmap)
Retrieve 3x3 vertices for each valid node using
vertex texturing
Cull invalid vertices (false in LOD bitmap) by
moving them to infinity.
Keep valid vertices (true in LOD bitmap)
T-Intersection tests
Check 4 edge vertices (1,3,5,7) for possible T
cracks
Check 2 adjacent nephew connections for each edge
Can't actually delete vertices from triangle fan
One position (active): actual edge vertex
position
2nd position (inactive): move to corresponding
corner vertex (its triangle degenerates and disappears)
Output Triangle Fan from Vertex Shader
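
A small Python sketch of the triangle-fan layout and the crack fix, assuming a row-major numbering of the node's 3x3 vertices that matches the edge-vertex indices (1, 3, 5, 7) quoted above. The fan order and which corner each inactive edge vertex collapses onto are our own assumptions; the slide only says it moves to the corresponding corner.

```python
# Row-major indices of the 3x3 vertices of one LOD node:
#   0 1 2
#   3 4 5
#   6 7 8
# 4 is the centre, 1/3/5/7 are edge vertices, 0/2/6/8 are corners.
FAN_ORDER = [4, 0, 1, 2, 5, 8, 7, 6, 3, 0]   # centre, then the perimeter loop

# Corner each edge vertex collapses onto (its predecessor in FAN_ORDER).
COLLAPSE_TO = {1: 0, 5: 2, 7: 8, 3: 6}

def node_fan(inactive_edges=()):
    """Vertex indices of the node's fan after the T-crack fix.

    An inactive edge vertex is replaced by a corner instead of being
    deleted, so exactly one fan triangle degenerates to zero area.
    """
    return [COLLAPSE_TO[i] if i in inactive_edges else i for i in FAN_ORDER]

def fan_triangles(fan):
    """Expand a triangle fan (centre first) into explicit triangles."""
    centre = fan[0]
    return [(centre, fan[k], fan[k + 1]) for k in range(1, len(fan) - 1)]

def is_degenerate(tri):
    return len(set(tri)) < 3

tris = fan_triangles(node_fan({1}))
print(sum(is_degenerate(t) for t in tris), "of", len(tris), "triangles degenerate")
```

Keeping the fan length fixed at 10 vertices means every node emits the same amount of geometry, which suits a vertex shader that cannot change its output topology.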

54
Rendering PIPELINE
[Figure: pipeline diagram linking the LOD QuadTree Atlas, Normal Map Atlas, LOD Threshold Kernel, LOD Bitmap, LOD Mesh Kernel (Cull & Triangulate), and Rasterize Triangles]
55
RESULTS
56
PERFORMANCE

Environment
VC++
Windows 2000
OpenGL extensions
CPU 2.8 GHz Pentium 4
2 GB DRAM
NVIDIA GeForce FX 5950
256 MB of DDR RAM
Vertex texturing not available on our GPU;
this step is estimated in our test.
GPU approach about 10x faster than CPU approach

57
LIMITATIONS

Minor discontinuity artifacts sometimes visible
No speedup in LOD algorithm itself for small
distant objects vs. full-size objects: O(1.5 N),
all nodes (texels) visited in both GPU kernels
Speedup win is in rasterization (reduced tris)

[Figure: all nodes visited vs. subset of nodes visited]
58
CONCLUSIONS

Demonstrates this LOD technique on GPU
Robust
Efficient (10x performance over CPU)
Dynamic LOD
Offloads work from CPU
Future work
Room for more complex operations in shader
Adaptive Tessellation for Radiosity lighting

59
3rd Paper

Isosurface Computation Made Simple: Hardware
Acceleration, Adaptive Refinement and Tetrahedral
Stripping by Valerio Pascucci, Joint
Eurographics - IEEE TVCG Symposium on
Visualization (VisSym), 2004, p. 293-300.

60
Introduction

Use a GPU to speed up generation of Iso-Surfaces
from a 3D volume set for interactive exploration
of the data volume
Spatially subdivide the 3D volume with a tetrahedral
3D space-filling curve
Find all tetrahedrons containing the desired
iso-value and interpolate a quad (or tri)
approximating the iso-surface in that tetrahedron
Complete set of generated quads (tris) forms
iso-surface corresponding to iso-value.
Uses nested errors scheme to form consistent
meshes

61
ISO-CONTOURS & ISO-SURFACES

Iso-contour: all points in a 2D data set with
the same function value
Iso-surface: all points in a 3D data volume with
the same function value

[Figures: elevation maps; medical imaging & scientific visualization]
62
2D Iso-Contours: Spatial Subdivision
(Triangulation)

Data set covered with triangulation
Scalar function value F(x,y) associated with each
vertex
Generate Iso-contours for a given Iso-Value

63
2D ISO-CONTOURS (INTERPOLATION)

Find triangles with vertices that bracket desired
iso-value C(w).
Create line segments that approximate iso-contour
by interpolating values on triangle edges

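[Figure: triangle with vertex values F = 0, F = 3, F = 2 and iso-value C(w) = 1.8; the contour segment joins the interpolated points where F = 1.8 on two of its edges]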
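
The interpolation step can be sketched per triangle in Python (a minimal version of what is often called marching triangles; the names are illustrative): an edge contributes a contour point when its two vertex values bracket the iso-value, and the point's position comes from linear interpolation along that edge. The usage reuses the figure's vertex values 0, 3, 2 and C(w) = 1.8 on an arbitrary unit triangle.

```python
def edge_crossing(p, q, fp, fq, iso):
    """Point on segment p-q where the linear interpolant of F equals iso."""
    t = (iso - fp) / (fq - fp)
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

def triangle_contour(verts, values, iso):
    """Iso-contour segment inside one triangle, or None if there is none.

    verts: three (x, y) points; values: F at each vertex. A vertex exactly
    at the iso-value is treated as lying above it.
    """
    points = []
    for a, b in ((0, 1), (1, 2), (2, 0)):
        fa, fb = values[a], values[b]
        # The edge brackets the iso-value when exactly one end is below it,
        # which also guarantees fa != fb (no division by zero).
        if (fa < iso) != (fb < iso):
            points.append(edge_crossing(verts[a], verts[b], fa, fb, iso))
    return tuple(points) if len(points) == 2 else None

print(triangle_contour([(0, 0), (1, 0), (0, 1)], [0, 3, 2], 1.8))
```

Here the contour segment runs from about (0.6, 0) on the F = 0 to F = 3 edge to about (0, 0.9) on the F = 2 to F = 0 edge.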
64
2D Iso-Contours

Collection of interpolating line segments form
final Iso-contour set for a given Iso-value.

65
3D ISO-SURFACES

Use tetrahedrons instead of triangles
Iso-surface approximated by quads (or tris)
instead of line segments
Need to estimate normals for quads

[Figure: tetrahedron with vertex values F = 0, 2, 3, 4 and iso-value C(w) = 2.5]
66
PRIOR-WORK

Iso-Surfaces
Marching Cubes, Lorensen & Cline, 1987
Octree with min/max scalars, Wilhelms & Van
Gelder, 1992
Span Spaces, Livnat et al., 1996, Shen et al., 1996
Occlusion Optimization, Livnat & Hansen, 1998
Multi-Pass, Gao & Shen, 2001
Nested Errors
Longest edge bisection rule, various, 1997-2001
Saturated Errors, Gerstner & Pajarola, 2000

67
Marching Cubes, Lorensen, 1987

Cubes formed from 4-pixel neighborhoods of 2
adjacent image slices
Identify cubes containing iso-value
Create surface interpolation according to 14
templates
Normals computed from approximate gradient in
local neighborhood

68
Saturated Errors, Gerstner et al., 2000

Extraction: topology-preserving iso-surface
extraction from multi-resolution cubes
Uses tetrahedrons to guarantee piecewise-linear
connected components
Automatically generates lookup table of all
possible valid topologies of cube
Identifies critical points (genus-changing
topology)
Simplification: reduces size of mesh in a
topology-preserving manner
Sorts critical points by importance
User-defined threshold value: critical points below
the threshold are eliminated from the topology first

69
Building Blocks

Generate quads from tetrahedrons
Compute Normals
Render quads
View Dependent Refinement
Tetrahedral Strips
3D Space Filling Curve
The author says he uses 4 GPU kernels to accomplish
this technique, but in my opinion he doesn't
explain it well, so I'm not quite sure which
building blocks are on the GPU and which are on the CPU,
or how the overall flow of the program works

70
Quad GENERATION

Generate one Quad per tetrahedron
Interpolate one vertex along each edge
Mark invalid if outside range of 2 defining
vertices
3 types
Empty tetrahedron (4 invalid co-incident
vertices)
Triangle (1 invalid 2 co-incident vertices)
Quad (4 valid vertices)
Lookup tables for efficient interpolation &
generation
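
A hedged sketch of the three cases, written as plain marching tetrahedra in Python rather than the paper's lookup-table formulation (the table contents are not given here). Following the slide, a non-empty tetrahedron always yields four points: a triangle is encoded as a quad with one repeated vertex, and the function returns None where the GPU version would emit four co-incident vertices.

```python
def lerp_crossing(p, q, fp, fq, iso):
    """Point on segment p-q where the linear interpolant of F equals iso."""
    t = (iso - fp) / (fq - fp)
    return tuple(a + t * (b - a) for a, b in zip(p, q))

def tetra_quad(verts, values, iso):
    """Iso-surface piece inside one tetrahedron as 4 points, or None if empty."""
    below = [i for i in range(4) if values[i] < iso]
    above = [i for i in range(4) if values[i] >= iso]

    def cross(i, j):
        return lerp_crossing(verts[i], verts[j], values[i], values[j], iso)

    if not below or not above:
        return None                                   # empty tetrahedron
    if len(below) == 2:
        (a, b), (c, d) = below, above
        # Cyclic order ac, ad, bd, bc: consecutive edges share a vertex,
        # so the four points form a proper (non-crossing) quad.
        return (cross(a, c), cross(a, d), cross(b, d), cross(b, c))
    # One vertex is isolated on its side: the surface is a triangle.
    lone, others = (below[0], above) if len(below) == 1 else (above[0], below)
    tri = [cross(lone, o) for o in others]
    return (tri[0], tri[1], tri[2], tri[2])           # triangle as degenerate quad
```

Always emitting four points keeps the output size fixed per tetrahedron, and the renderer simply drops zero-area primitives.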

71
Compute Normals
Orientation (1 determinant)
[equation not recovered]
Normal (3 determinants)
[equation not recovered]
Note: F can be stored in the vertex.w coordinate for
efficiency
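
The determinant formulas on this slide did not survive extraction. One plausible reading, offered as an assumption rather than the paper's exact equations: the iso-surface normal is the gradient of the linear interpolant of F over the tetrahedron, obtained with Cramer's rule, where one determinant gives the tetrahedron's (signed) orientation and three more give the gradient components. A minimal Python sketch:

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def tetra_gradient(verts, values):
    """Gradient of the linear interpolant of F over one tetrahedron.

    Solves M g = dF by Cramer's rule, where the rows of M are the edge
    vectors p_k - p_0 and dF_k = F_k - F_0: det(M) is the orientation
    determinant, and each component needs one more determinant.
    """
    p0, f0 = verts[0], values[0]
    M = [[verts[k][c] - p0[c] for c in range(3)] for k in (1, 2, 3)]
    dF = [values[k] - f0 for k in (1, 2, 3)]
    D = det3(M)
    grad = []
    for col in range(3):
        # Replace column `col` of M with dF.
        Mc = [row[:col] + [dF[r]] + row[col + 1:] for r, row in enumerate(M)]
        grad.append(det3(Mc) / D)
    return tuple(grad)
```

Normalising the returned vector gives a shading normal that is constant over the tetrahedron; the sign of det(M) tells whether the vertex order is positively oriented.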
72
Quad Rendering

Draw quads in OpenGL directly
Rely on OpenGL to handle degenerate quads
Throw away invalid quads (4 co-incident vertices)
Reduce quad to triangle (2 co-incident vertices)
Use computed normals for shading

73
View Dependent Refinement

Adaptively refine a tetrahedral mesh
Bisect the longest edge of tetrahedron, creating
2 new tetrahedrons
Similar to Octree
Given a cube divided into 6 tetrahedrons
3 sub-divisions of tetrahedrons gives you a new
smaller grid of 8 cubes (1 level of octree
subdivision)
Cube subdivision can be done via a simple rule
pattern without ever measuring lengths
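
A minimal Python sketch of longest-edge bisection. Unlike the paper's tier-based rule, which avoids measuring lengths, this version measures all six edges explicitly. Starting from the unit cube split into 6 tetrahedra around its main diagonal, three rounds of bisection give 48 tetrahedra of volume 1/48, the same count and volume as 8 half-size cubes of 6 tetrahedra each.

```python
from itertools import combinations, permutations

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def bisect_longest_edge(tet):
    """Split a tetrahedron (4 vertex tuples) at the midpoint of its longest edge."""
    i, j = max(combinations(range(4), 2), key=lambda e: dist2(tet[e[0]], tet[e[1]]))
    mid = tuple((a + b) / 2 for a, b in zip(tet[i], tet[j]))
    left = tuple(mid if k == j else v for k, v in enumerate(tet))   # keeps vertex i
    right = tuple(mid if k == i else v for k, v in enumerate(tet))  # keeps vertex j
    return left, right

def volume(tet):
    p0 = tet[0]
    (a, b, c), (d, e, f), (g, h, i) = [[v[k] - p0[k] for k in range(3)] for v in tet[1:]]
    return abs(a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)) / 6

def cube_tetrahedra():
    """Split the unit cube into 6 tetrahedra sharing the main diagonal."""
    tets = []
    for order in permutations(range(3)):
        p = [0, 0, 0]
        verts = [tuple(p)]
        for axis in order:          # walk from (0,0,0) to (1,1,1) one axis at a time
            p[axis] = 1
            verts.append(tuple(p))
        tets.append(tuple(verts))
    return tets

tets = cube_tetrahedra()
for _ in range(3):
    tets = [half for t in tets for half in bisect_longest_edge(t)]
print(len(tets))
```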

74
View Dependent REFINEMENT, cont
75
View DEPENDENT REFINEMENT, III

Each split point is actually at center of several
tetrahedrons which form a shape called a diamond.
Each tetrahedron can be associated with the
vertex that caused its split into a diamond
shape
The hierarchy of refined tetrahedra forms a
binary tree
The hierarchy of diamonds is more complicated and
takes the form of a directed acyclic graph (DAG).
Starting from a uniform grid guarantees a
predictable pattern to the size and shape of
tetrahedrons
(i.e., no need to store auxiliary info; it can be
calculated from the level of the subdivision)

76
View DEPENDENT REFINEMENT - Algorithm

RefineMesh( tetra, level, tier )
v = Bisect longest edge, by tier pattern (0,1,2)
If (level == max level) or satisfies_tolerance(
v, level )
Draw_ISO_Surface( tetra )
Cull Mesh
Bisection point outside min, max bounding
distances
Bounding sphere of diamond outside view frustum
Recursively refine mesh, by tier pattern (0,1,2)
RefineMesh( left tetra, level + 1, (tier + 1) % 3 )
RefineMesh( right tetra, level + 1, (tier + 1) % 3 )

77
Satisfies_TOLERANCE( V, level )

projects the error of vertex v onto the current
view plane from the closest point of bounding
sphere of diamond
View plane can be computed from level
Size of bounding sphere can also be computed from
level
Returns true if projected error is smaller than a
given threshold tolerance (global variable)
Written to guarantee that if any diamond is
included in the current mesh then all its
parents are also included.
Therefore the adaptive mesh will have no cracks

78
Results of Adaptive Refinement
[Figure: adaptive vs. non-adaptive refinement]
79
Tetrahedral Strips (Streaming)

Transferring all the vertices of base level
tetrahedrons from CPU to GPU consumes a lot of
bandwidth
Use tetrahedral strips similar to triangular
strips in 2D to reduce vertex bandwidth
Any 2 adjacent tetrahedrons have 3 vertices in
common (meaning only 1 new vertex needs to be
transferred).
Use adjacency graph info to build strips
Results in a 60% decrease in vertex bandwidth
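
The strip idea can be sketched as a sliding window in Python (a hypothetical encoding for illustration; the paper's actual strip format and its adjacency-graph strip construction are not shown): each new vertex, together with the previous three, defines the next tetrahedron, so n tetrahedra cost n + 3 vertices instead of 4n.

```python
def strip_to_tetrahedra(strip):
    """Decode a tetrahedral strip: every window of 4 consecutive vertices is a tet."""
    return [tuple(strip[k:k + 4]) for k in range(len(strip) - 3)]

def vertices_sent(num_tets, use_strip):
    """Vertices transferred for num_tets tetrahedra, with or without a strip."""
    return num_tets + 3 if use_strip else 4 * num_tets

tets = strip_to_tetrahedra([0, 1, 2, 3, 4, 5])
print(tets)   # consecutive tets share 3 vertices
print(vertices_sent(1000, True), "vs", vertices_sent(1000, False))
```

A single unbroken strip would approach a 75% saving; the 60% quoted above is the figure measured in the paper.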

80
3D Space Filling curve

The author recommends using a new 3D space-filling
curve based on Sierpinski's curve, adapted for
tetrahedrons, that fills 1/6 of a cube; 6 such
curves fill a 3D cube.
Author provides no details other than some
pictures and a link to another paper

81
Performance Results
800 MHz Pentium CPU, 800 RAM main memory, Linux
operating system
82
CONTRIBUTIONS

Simple technique presented for generating
iso-surfaces from a 3D volume data set
Using tetrahedrons & OpenGL quads
Allows interactive rendering of 512x512x512 data
sets
Adaptive Refinement
based on viewing direction
Uses longest-edge bisection
Uses nested errors scheme to avoid cracks in
meshes
Tetrahedral Strips
Optimizes bandwidth from CPU to GPU
3D space filling curve