Title: Computer Vision on the GPU
1Computer Vision on the GPU
- Feb 5, 2007
- Sudipta N. Sinha
2Overview
- Introduction
- A Simple Example
- Vision algorithms on GPU
- Feature Extraction, Tracking
- Stereo
- Similarity measure / matching cost
- Optic flow
- Image Registration
- Project Ideas
3Introduction
- Image Processing / Low-level Computer Vision
- Data parallel (multiple pixels can be processed in parallel)
- 2D/3D grids form computational domains
- Linear algebra, vector operations common.
- Classical GPGPU Concepts
- Fragment programs: computational kernel
- Textures: storage (arrays)
- Render to texture: multiple iterations / multi-stage algorithms (multiple render passes)
4A Simple Example Removing Radial Distortion
5 Radial Distortion Model / Parameters
6Radial Distortion Model / Parameters
- Parametric Model D(x_c, y_c, K_1, K_2, K_3)
$$r^2 = (x - x_c)^2 + (y - y_c)^2$$
$$L(r) = 1 + K_1 r + K_2 r^2 + K_3 r^3$$
for i = 1 to nrows
    for j = 1 to ncols
        (x, y) = distort(i, j)    // radial distortion model
        B(i, j) = A(x, y)         // bilinear interpolation
    end
end
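The loop above can be sketched on the CPU as follows. This is a minimal illustration, not the slide author's code: the helper names `bilinear` and `undistort` are made up here, images are plain lists of rows, and the source position is assumed to be the distortion centre plus L(r) times the offset from the centre.

```python
import math

def bilinear(img, x, y):
    """Sample img (a list of rows) at real-valued (x, y), clamping to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def distort(x, y, xc, yc, k1, k2, k3):
    """Radial model from the slide: scale the offset from (xc, yc) by L(r)."""
    dx, dy = x - xc, y - yc
    r = math.hypot(dx, dy)
    L = 1 + k1 * r + k2 * r**2 + k3 * r**3
    return xc + L * dx, yc + L * dy  # assumption: L(r) multiplies the offset

def undistort(A, xc, yc, k1, k2, k3):
    """For every output pixel, gather the distorted source position from A."""
    h, w = len(A), len(A[0])
    B = [[0.0] * w for _ in range(h)]
    for i in range(h):          # row index, i.e. y
        for j in range(w):      # column index, i.e. x
            x, y = distort(j, i, xc, yc, k1, k2, k3)
            B[i][j] = bilinear(A, x, y)
    return B
```

Every output pixel is computed independently of the others, which is why the same loop body maps directly onto a fragment program.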
7GPU Implementation 1
- 1. Upload target image (a texture of size w x h)
- 2. Bind fragment program (see below)
- 3. Render a screen-aligned quad of size w x h
- 4. Readback the rendered quad (undistorted)
Undistort.cg
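float4 undistort(float2 texCoord : TEXCOORD0,
                 uniform samplerRECT image) : COLOR
{
    // xc, yc, K1, K2, K3 are assumed to be constants defined elsewhere
    float2 pos = texCoord - float2(xc, yc);
    float r = length(pos);
    // evaluate the distortion func L(r) of the model above
    pos = float2(xc, yc) + pos * (1 + K1*r + K2*r*r + K3*r*r*r);
    return texRECT(image, pos);
}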
8GPU Implementation 2
- Upload target image (a texture of size w x h)
- Upload pre-computed lookup table as 2nd texture
- Create fragment program (see below)
- Render, Readback
Undistort2.cg
float4 undistort(float2 texCoord : TEXCOORD0,
                 uniform samplerRECT image,
                 uniform samplerRECT LUTable) : COLOR
{
    // LUTable is assumed to store a per-pixel (dx, dy) offset
    float2 pos = texCoord + texRECT(LUTable, texCoord).xy;
    return texRECT(image, pos);
}
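A CPU sketch of the lookup-table variant, under stated assumptions: the table stores a per-pixel (dx, dy) offset computed once from the radial model, and sampling is nearest-neighbour for brevity (the GPU version would use hardware bilinear filtering). The names `build_lut` and `remap` are illustrative only.

```python
import math

def build_lut(w, h, xc, yc, k1, k2, k3):
    """Precompute, once per camera, the (dx, dy) offset for every output pixel."""
    lut = []
    for i in range(h):
        row = []
        for j in range(w):
            dx, dy = j - xc, i - yc
            r = math.hypot(dx, dy)
            L = 1 + k1 * r + k2 * r**2 + k3 * r**3
            row.append(((L - 1) * dx, (L - 1) * dy))
        lut.append(row)
    return lut

def remap(A, lut):
    """Per frame: one table lookup and one image lookup per pixel."""
    h, w = len(A), len(A[0])
    B = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dx, dy = lut[i][j]
            x = min(max(int(round(j + dx)), 0), w - 1)  # nearest neighbour, clamped
            y = min(max(int(round(i + dy)), 0), h - 1)
            B[i][j] = A[y][x]
    return B
```

The trade-off is one extra texture fetch per pixel in exchange for skipping the polynomial evaluation on every frame.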
9Typical Image Processing pipeline using GPGPU
- 3 successive computations mapped to render
passes. Each step implemented as a separate
fragment program
Undistort.cg
RGB2Gray.cg
Threshold.cg
10A few things to remember
- Scatter vs. Gather
- Concurrent Memory Read / Write not allowed
- Ping-pong rendering
- GPU -> CPU readback (PCI-e)
- Key Issue
- Computer Vision applications are often a pipeline of algorithms/routines. The key challenge is to port the whole application to the GPU, not just a few computationally expensive routines.
- May need to use CPU for some steps.
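A small CPU sketch of gather-style passes with ping-pong buffers. The function names (`run_pass`, `pipeline`) and the threshold value are illustrative assumptions; each "kernel" plays the role of one fragment program.

```python
def run_pass(kernel, src):
    """One render pass: each output pixel gathers from src and writes only itself."""
    h, w = len(src), len(src[0])
    return [[kernel(src, x, y) for x in range(w)] for y in range(h)]

def rgb2gray(src, x, y):
    r, g, b = src[y][x]
    return 0.299 * r + 0.587 * g + 0.114 * b

def threshold(src, x, y, t=128):  # t is an arbitrary example threshold
    return 255 if src[y][x] >= t else 0

def pipeline(image, kernels):
    """Ping-pong: read from one buffer, write to the other, then swap roles."""
    buffers = [image, None]
    read = 0
    for kernel in kernels:
        write = 1 - read
        # a pass never reads and writes the same buffer
        buffers[write] = run_pass(kernel, buffers[read])
        read = write
    return buffers[read]
```

For example, `pipeline(img, [rgb2gray, threshold])` runs two passes and only the final buffer would need to be read back to the CPU.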
11Convolution, Building Gaussian Scale Space
Stack
12Convolution
NVidia OpenGL/Cg Demo
http://developer.nvidia.com/object/convolution_filters.html
13Convolution
A Gaussian Scale Space stack (or volume) is obtained
by repeated convolution of an image
with a Gaussian of a certain sigma. Useful
for multi-scale image matching, segmentation etc.
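A minimal sketch of building such a stack with a separable Gaussian (a row pass followed by a column pass). The kernel radius of 3 sigma and the clamp-to-edge border handling are assumptions made for this example, and the function names are illustrative.

```python
import math

def gaussian_kernel(sigma):
    """Normalised 1D Gaussian, truncated at 3 sigma (an assumed cut-off)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def convolve_rows(img, k):
    """1D convolution along each row, clamping reads at the image border."""
    h, w, r = len(img), len(img[0]), len(k) // 2
    return [[sum(k[t + r] * img[y][min(max(x + t, 0), w - 1)]
                 for t in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_blur(img, sigma):
    """Separable 2D blur: horizontal pass, then vertical pass."""
    k = gaussian_kernel(sigma)
    horiz = convolve_rows(img, k)
    return transpose(convolve_rows(transpose(horiz), k))

def scale_space(img, sigma, levels):
    """Stack of images, each level a blurred copy of the previous one."""
    stack = [img]
    for _ in range(levels - 1):
        stack.append(gaussian_blur(stack[-1], sigma))
    return stack
```

Separability reduces the per-pixel cost from (2r+1)^2 reads to 2(2r+1), which is what makes the two-pass layout attractive on the GPU.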
14Feature Tracking
15Lucas-Kanade Tracker (KLT)
Shi and Tomasi CVPR'94, Tomasi and Kanade '91, Birchfield '96
- Main Idea: Assuming brightness constancy, try to find the new positions of some salient image points in the second image (where the motion is small)
- Steps
- Detect salient points to track (in current frame)
- Track those features in next frame
- Could be done by searching (template matching), BUT
- KLT does gradient descent optimization and finds the motion vector by solving a linear system.
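One iteration of the linear solve can be sketched as below. This is a simplified single-step, single-scale illustration (no pyramid, no iteration, central-difference gradients), not the GPU-KLT implementation; `lucas_kanade_step` and its parameters are names chosen for this example.

```python
def lucas_kanade_step(I, J, cx, cy, half):
    """Estimate the motion (dx, dy) of the window centred at (cx, cy) from I to J.

    Linearised brightness constancy gives  G d = b  with
    G = sum [Ix*Ix, Ix*Iy; Ix*Iy, Iy*Iy]  and  b = -sum It * [Ix, Iy].
    """
    gxx = gxy = gyy = bx = by = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (I[y][x + 1] - I[y][x - 1]) / 2.0   # spatial gradients
            iy = (I[y + 1][x] - I[y - 1][x]) / 2.0
            it = J[y][x] - I[y][x]                   # temporal difference
            gxx += ix * ix
            gxy += ix * iy
            gyy += iy * iy
            bx -= ix * it
            by -= iy * it
    det = gxx * gyy - gxy * gxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture to track")
    # closed-form solution of the 2x2 system
    return ((gyy * bx - gxy * by) / det, (gxx * by - gxy * bx) / det)
```

In practice the step is repeated, re-sampling J at the updated position, until the update becomes small.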
16GPU-KLT
17GPU-KLT
cs.unc.edu/~ssinha/Research/GPU_KLT
18GPU-KLT CPU vs GPU Timings
- GPU-KLT tracks 1000 features in real time at 30 Hz at 1024 x 768 resolution
- 15-20X improvement over the CPU
- Can be extended to deal with changes in brightness (gain and offset)
19GPU-KLT on various graphics hardware
20Feature Extraction
21Harris Corner Detector
- Idea
- Detect a patch which looks locally unique.
- Shifting the patch in any direction will give a large change in intensity.
- Texture-less region: no change in all directions
- Edge: no change along one direction.
- Corner: large changes in all directions.
22 Harris Corner Detector
Eigenvalue analysis of the 2x2 matrix M
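A sketch of the eigenvalue test, assuming M is the sum of gradient outer products over a small window. The window size, the threshold, and the function names are illustrative choices, not values from the slides.

```python
import math

def structure_tensor(img, cx, cy, half):
    """Entries (a, b, c) of M = [[a, b], [b, c]] summed over the window."""
    a = b = c = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            a += ix * ix
            b += ix * iy
            c += iy * iy
    return a, b, c

def eigenvalues(a, b, c):
    """Closed-form eigenvalues of a symmetric 2x2 matrix, largest first."""
    mean = (a + c) / 2.0
    d = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean + d, mean - d

def classify(img, cx, cy, half=1, thresh=1.0):
    """Both eigenvalues small: flat. One large: edge. Both large: corner."""
    l1, l2 = eigenvalues(*structure_tensor(img, cx, cy, half))
    if l1 < thresh:
        return "flat"
    if l2 < thresh:
        return "edge"
    return "corner"
```

The three outcomes correspond to the texture-less, edge, and corner cases listed on the previous slide.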
23SIFT Scale Invariant Feature Transform
Lowe IJCV 2004
- Goal
- Find feature vectors with invariant properties
- Detect locations in image scale space which are invariant to translation, scaling, rotation and small distortions.
Match in n-D feature space
Detect Keypoints
Local Description
24SIFT Scale Invariant Feature Transform
Lowe IJCV 2004
- (1) Selecting Interest Point Locations
- Build Gaussian Scale Space.
- Detect local extrema of Difference of Gaussian on Scale Space
25SIFT Scale Invariant Feature Transform
Lowe IJCV 2004
- For each interest point, using local gradient vectors:
- (2) Compute a local coordinate frame
- (3) Compute a weighted orientation histogram (128-dimensional SIFT descriptor vector).
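A much-reduced sketch of the idea behind step (3): a single orientation histogram over one window, with each pixel voting with its gradient magnitude. The full SIFT descriptor additionally uses Gaussian weighting and a grid of such histograms; the 8-bin layout and the name `orientation_histogram` are assumptions for illustration.

```python
import math

def orientation_histogram(img, cx, cy, half, bins=8):
    """Histogram of gradient directions, weighted by gradient magnitude."""
    hist = [0.0] * bins
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            mag = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2 * math.pi)   # in [0, 2*pi)
            b = int(angle / (2 * math.pi) * bins) % bins
            hist[b] += mag                               # scatter-style write
    return hist
```

The line marked as a scatter-style write is the part that maps poorly onto fragment programs, since a fragment can only write to its own output location.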
26GPU-SIFT
- Issues for GPGPU implementation
- Speed up scale space construction (I, ∇I, DoG)
- Fast separable Gaussian convolution
- variable sigma.
- Avoid read-back of large buffers.
- Computing the weighted orientation histogram is difficult on the GPU.
- Exploit texture mapping hardware for fast bilinear interpolation
- Split computation between CPU and GPU
27GPU-SIFT
28(1) Scale Space Construction
I1
I2
I3
- R: Intensity
- G: Gradient.X
- B: Gradient.Y
- A: DoG
- 2D Gaussian Convolution is Separable.
- Ping Pong Between 2 Surfaces of pbuffer
- 1 fragment program computes all four values per iteration
29(1) Scale Space Construction
30(2) Finding DoG Extrema
- Need to compare against 26 neighbors.
- Set glBlendEquation(GL_MAX)
- Render quad + six +/-1 shifted quads with blending enabled.
- Render maxima to depth buffer
- Render image again with depth test set to EQUAL
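For reference, the 26-neighbour comparison that the blending trick emulates can be written directly on the CPU. This sketch only finds maxima (minima are symmetric) and treats the DoG volume as a list of 2D slices; the function names are illustrative.

```python
def is_local_max(stack, s, y, x):
    """True if stack[s][y][x] is strictly greater than all 26 neighbours."""
    v = stack[s][y][x]
    for ds in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if (ds, dy, dx) == (0, 0, 0):
                    continue
                if stack[s + ds][y + dy][x + dx] >= v:
                    return False
    return True

def find_maxima(stack):
    """Return (x, y, scale) for every interior strict maximum of the volume."""
    n_s, h, w = len(stack), len(stack[0]), len(stack[0][0])
    return [(x, y, s)
            for s in range(1, n_s - 1)
            for y in range(1, h - 1)
            for x in range(1, w - 1)
            if is_local_max(stack, s, y, x)]
```

The GPU version reaches the same result by taking a running GL_MAX over shifted copies and then keeping only pixels equal to that maximum.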
31(3) Readback Interest Points (x, y, scale)
- Sparse bitmap must be read back to the CPU.
- Fragment shader packs 32 bitmap bits into one RGBA pixel (8 bits per channel)
- Read back RGBA, decode color to recover the bitmap.
- 8X speedup.
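The packing step can be sketched as plain bit manipulation. The bit order within each channel (least significant bit first) is an arbitrary choice for this example; `pack_rgba` and `unpack_rgba` are illustrative names.

```python
def pack_rgba(bits):
    """Pack 32 flags (0 or 1) into four 8-bit channel values (R, G, B, A)."""
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    return tuple(sum(bits[8 * c + i] << i for i in range(8)) for c in range(4))

def unpack_rgba(rgba):
    """Inverse of pack_rgba: recover the 32 flags on the CPU after readback."""
    return [(rgba[c] >> i) & 1 for c in range(4) for i in range(8)]
```

Since each RGBA pixel then carries 32 bitmap entries, the buffer that crosses the bus is correspondingly smaller.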
32GPU_SIFT
33Stereo
34Stereo
- 1. Determine camera calibration (pixel -> backprojected ray)
- 2. Compute dense pixel correspondence (u <-> u')
- 3. Triangulate to obtain 3D point X
- Ill-posed problem, requires regularization (priors)
35Matching Cost / Similarity Measure
image I(x,y)
image I'(x',y')
Disparity map D(x,y)
(x',y') = (x + D(x,y), y)
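A minimal winner-take-all sketch using the sum of absolute differences (SAD) as the matching cost, with the convention (x', y') = (x + D(x,y), y). SAD is one possible choice of cost, and the function names, window size, and search range are assumptions for illustration.

```python
def sad(left, right, x, y, d, half):
    """Sum of absolute differences between a window in left and its shift in right."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[y + dy][x + dx] - right[y + dy][x + dx + d])
    return total

def best_disparity(left, right, x, y, max_d, half=1):
    """Try every disparity 0..max_d and keep the one with the lowest cost."""
    w = len(left[0])
    costs = {d: sad(left, right, x, y, d, half)
             for d in range(max_d + 1) if x + d + half < w}
    return min(costs, key=costs.get)
```

Choosing the minimum independently per pixel ignores the regularization mentioned on the previous slide, so the result is typically noisy in texture-less regions.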
36Planesweeping Stereo
- Hypothesize a plane
- Project images from all cameras onto it
- Measure dissimilarity as seen from a reference view
- Repeat for a family of planes
- Per pixel, record plane with least dissimilarity
near
far
37Summing values over a window
- (1) Ruigang and Marc used a nice trick: they used the texture mipmapping hardware to do the summation over 2D windows (square windows with power-of-2 sides)
- (2) Another trick is to average 4 values in a 2 x 2 block: do a texture lookup at the center (here)
- (3) For larger windows which are not 2^n x 2^n, need to use other tricks like
- - partial sum of columns and then rows
- - TEXTURE REDUCE (log M render passes)
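Two of these tricks sketched on the CPU: window sums via a column pass followed by a row pass, and a reduction that sums 2 x 2 blocks repeatedly. Function names are illustrative, and `reduce_sum` assumes a square image whose side is a power of two.

```python
def window_sums(img, n):
    """Sum over every n x n window (anchored at its top-left), valid region only."""
    h, w = len(img), len(img[0])
    # pass 1: partial sums down each column
    col = [[sum(img[y + k][x] for k in range(n)) for x in range(w)]
           for y in range(h - n + 1)]
    # pass 2: sum the column partials along each row
    return [[sum(col[y][x + k] for k in range(n)) for x in range(w - n + 1)]
            for y in range(h - n + 1)]

def reduce_sum(img):
    """Total of a 2^k x 2^k image using k passes, each summing 2 x 2 blocks."""
    while len(img) > 1:
        half = len(img) // 2
        img = [[img[2 * y][2 * x] + img[2 * y][2 * x + 1]
                + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]
                for x in range(half)] for y in range(half)]
    return img[0][0]
```

The two-pass window sum costs 2n reads per pixel instead of n^2, and the reduction needs log2(M) passes for an M x M image.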
38Computing Histograms on the GPU
- Image histograms (e.g. RGB histogram)
- (used in histogram equalization, image-based retrieval etc.)
- Method 1: Use occlusion queries
- N-bin histogram needs N passes
- Expensive for higher dimensions.
- There is some recent work, but this may still be worth exploring
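The N-pass structure can be emulated on the CPU as below: each pass keeps only pixels whose value falls in one bin, and the count of surviving pixels stands in for the occlusion query result. The bin layout and the name `histogram_by_passes` are assumptions for this sketch.

```python
def histogram_by_passes(img, n_bins, max_val=256):
    """One pass per bin; each pass counts the pixels that fall inside that bin."""
    counts = []
    for b in range(n_bins):
        lo = b * max_val / n_bins
        hi = (b + 1) * max_val / n_bins
        # on the GPU: discard fragments outside [lo, hi), read the query count
        passed = sum(1 for row in img for v in row if lo <= v < hi)
        counts.append(passed)
    return counts
```

Every pass touches the whole image, which is why the cost grows linearly with the number of bins and becomes expensive for multi-dimensional histograms.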
39Fusion of Multi-View Silhouette Cues Using a Space Occupancy Grid
Jean-Sébastien Franco, Edmond Boyer, ICCV 2005
- Unreliable silhouettes: do not make decisions about their image location
- Sensor fusion: use all image information simultaneously
- Account for silhouette and CCD sensor uncertainty
- Use the occupancy grid framework, which has 2 associated models
- sensor model
- probabilistic representation of space: a grid of voxels that store the probability of object occupancy
40Bayesian formulation
- Idea: we wish to find the content of the scene from images, as a probability grid
- Modeling the forward problem (explaining image observations given the grid state) is easy. It can be accounted for in a sensor model.
- Bayesian inference enables the formulation of our initial inverse problem from the sensor model
- Simplification for tractability: independent analysis and processing of voxels
41Modeling
Grid
Sensor model
- I: color information in images
- B: background color model
- F: silhouette detection variable (0 or 1), hidden
- O_X: occupancy at voxel X (0 or 1)
Inference
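A heavily simplified per-voxel sketch of this kind of inference. It is not the sensor model of the paper: it assumes each camera contributes a binary silhouette detection at the voxel's projection, that cameras are conditionally independent given occupancy, and that the detection and false-alarm probabilities (`p_detect`, `p_false`) are known constants chosen here for illustration.

```python
def occupancy_posterior(detections, p_detect=0.9, p_false=0.1, prior=0.5):
    """P(O_X = 1 | detections) for one voxel, by Bayes' rule.

    detections: one 0/1 value per camera (silhouette seen at the projection).
    p_detect:   assumed P(detection | voxel occupied).
    p_false:    assumed P(detection | voxel empty).
    """
    like_occ = prior          # running product for the "occupied" hypothesis
    like_emp = 1.0 - prior    # running product for the "empty" hypothesis
    for f in detections:
        like_occ *= p_detect if f else 1.0 - p_detect
        like_emp *= p_false if f else 1.0 - p_false
    return like_occ / (like_occ + like_emp)
```

Because each voxel is processed independently, the same computation can be run for all voxels in parallel, which is what makes the grid formulation a natural fit for the GPU.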
42 Fusion of Multi-View Silhouette Cues Using a Space Occupancy Grid
Jean-Sébastien Franco, Edmond Boyer, ICCV 2005
43Project ideas
- Fast Bilateral Filtering / Anisotropic diffusion
- Background Segmentation in Video
- Multi-scale Optic Flow
- Optimization on the GPU
- Graph Cuts, Level Sets, Snakes.
- Implement SIFT Feature Extraction
- - Widely used by Vision Community
- (my GPU-SIFT code developed at Siemens SCR, Princeton)
- Jan Michael Frahm and Philippos Mordohai have ideas; feel free to discuss with them.
44A few points
- Fast bilinear interpolation in hardware.
- - non-programmable FLOPS.
- The fixed-function graphics pipeline can be used to efficiently render images from a viewpoint.
- Visibility inference in multiple-view algorithms can use the Z-buffer hardware present on GPUs.
- Other useful features
- floating point blending
- Alpha blending, depth test, occlusion queries.
- Classical GPGPU vs. CUDA-based implementations.
45References
- www.gpgpu.org/vision
- GPU Gems 2, Chapter 40
- OpenVidia: www.eyetap.org
- GPU-Based Video Feature Tracking and Matching, tech report.