Computer Vision on the GPU - PowerPoint PPT Presentation

1
Computer Vision on the GPU

Feb 5, 2007
Sudipta N. Sinha

2
Overview

Introduction
A Simple Example
Vision algorithms on GPU
Feature Extraction, Tracking
Stereo
Similarity measure / matching cost
Optic flow
Image Registration
Project Ideas

3
Introduction

Image Processing / Low-level Computer Vision
Data parallel (multiple pixels can be processed in parallel)
2D/3D grids form computational domains
Linear algebra and vector operations are common
Classical GPGPU Concepts
Fragment Programs = Computational Kernel
Textures = Storage (Arrays)
Render to Texture (multiple render passes) = Multiple iterations / multi-stage algorithms

4
A Simple Example: Removing Radial Distortion

5

Radial Distortion Model / Parameters

6
Radial Distortion Model / Parameters

Parametric Model: D(xc, yc, K1, K2, K3)

$$r^2 = (x - x_c)^2 + (y - y_c)^2, \qquad L(r) = 1 + K_1 r + K_2 r^2 + K_3 r^3$$

for i = 1 to nrows
  for j = 1 to ncols
    (x, y) = distort(i, j)    // radial distortion model
    B(i, j) = A(x, y)         // bilinear interpolation
  end
end

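The loop above can be sketched on the CPU as a reference for the GPU version. This is a minimal sketch, not the presenter's code: the function names are hypothetical, and it assumes that L(r) scales the offset from the distortion center (xc, yc), which the slide does not state explicitly.

```python
import math

def bilinear(img, x, y):
    """Sample img (a list of rows) at real-valued (x, y), clamping to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def distort(x, y, xc, yc, k1, k2, k3):
    """Map an output pixel to its source position in the distorted image.

    Assumption: L(r) multiplies the offset from the center (xc, yc).
    """
    dx, dy = x - xc, y - yc
    r = math.hypot(dx, dy)
    scale = 1.0 + k1 * r + k2 * r ** 2 + k3 * r ** 3  # L(r)
    return xc + scale * dx, yc + scale * dy

def undistort(a, xc, yc, k1, k2, k3):
    """B(i, j) = A(distort(i, j)); every output pixel is computed independently."""
    h, w = len(a), len(a[0])
    return [[bilinear(a, *distort(x, y, xc, yc, k1, k2, k3)) for x in range(w)]
            for y in range(h)]
```

Because each output pixel only gathers from the input, the inner body maps directly onto one fragment program invocation per pixel.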
7
GPU Implementation 1

1. Upload target image (a texture of size w x h)
2. Bind fragment program (see below)
3. Render a screen-aligned quad of size w x h
4. Readback the rendered quad (undistorted)

Undistort.cg
float4 undistort ( float2 texCoord : TEXCOORD0,
                   uniform samplerRECT image ) : COLOR
{
    float2 pos = texCoord - float2( xc , yc );
    pos = ... ;   // evaluate the distortion func
    return texRECT ( image , pos );
}
8
GPU Implementation 2

Upload target image (a texture of size w x h)
Upload pre-computed lookup table as 2nd texture
Create fragment program (see below)
Render, Readback

Undistort2.cg
float4 undistort ( float2 texCoord : TEXCOORD0,
                   uniform samplerRECT image,
                   uniform samplerRECT LUTable ) : COLOR
{
    float2 pos;
    // LUTable is assumed to store a per-pixel offset in its first two channels
    pos = texCoord + texRECT( LUTable , texCoord ).xy;
    return texRECT ( image , pos );
}
9
Typical Image Processing pipeline using GPGPU

3 successive computations mapped to render
passes. Each step implemented as a separate
fragment program

Undistort.cg
RGB2Gray.cg
Threshold.cg
10
A few things to remember

Scatter vs. Gather
Concurrent Memory Read / Write not allowed
Ping-pong rendering
GPU -> CPU readback (PCI-e)
Key Issue
Computer Vision applications are often a pipeline
of algorithms/routines. The key challenge is to
port the whole application to the GPU, not just a
few computationally expensive routines.
May need to use CPU for some steps.
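Ping-pong rendering and the gather-only restriction can be illustrated with a small CPU sketch. The names below are hypothetical, the luma weights are the common Rec. 601 values (an assumption, not taken from the slides), and each "kernel" stands in for a fragment program that reads the source buffer and writes exactly one output pixel.

```python
def run_passes(image, kernels):
    """Apply per-pixel kernels in sequence, ping-ponging between two buffers."""
    h, w = len(image), len(image[0])
    buffers = [[row[:] for row in image], [[None] * w for _ in range(h)]]
    src = 0
    for kernel in kernels:
        dst = 1 - src
        for y in range(h):
            for x in range(w):
                # Gather: read anywhere in the source, write only pixel (x, y).
                buffers[dst][y][x] = kernel(buffers[src], x, y)
        src = dst  # the buffer just written becomes the next pass's input
    return buffers[src]

def rgb_to_gray(img, x, y):
    """Stand-in for RGB2Gray.cg (Rec. 601 weights assumed)."""
    r, g, b = img[y][x]
    return 0.299 * r + 0.587 * g + 0.114 * b

def threshold(img, x, y, t=128):
    """Stand-in for Threshold.cg; the cutoff t is a hypothetical default."""
    return 255 if img[y][x] >= t else 0
```

For example, `run_passes(rgb_image, [rgb_to_gray, threshold])` chains two passes without ever reading and writing the same buffer in one pass.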

11
Convolution, Building Gaussian Scale Space
Stack

12
Convolution

NVidia OpenGL/Cg Demo
http://developer.nvidia.com/object/convolution_filters.html
13
Convolution
A Gaussian scale-space stack (or volume) is obtained by repeatedly convolving an image with a Gaussian of a given σ.
Useful for multi-scale image matching, segmentation, etc.
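A CPU sketch of building such a stack with separable Gaussian convolution (a horizontal pass followed by a vertical pass) might look as follows. The function names, the 3-sigma kernel radius, and the clamp-to-edge border handling are assumptions made for illustration.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian taps, truncated at about 3 sigma (assumed cutoff)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_rows(img, kernel):
    """Convolve every row with a 1D kernel, clamping reads at the border."""
    radius = len(kernel) // 2
    w = len(img[0])
    return [[sum(kernel[k + radius] * row[min(max(x + k, 0), w - 1)]
                 for k in range(-radius, radius + 1))
             for x in range(w)] for row in img]

def transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_blur(img, sigma):
    """2D Gaussian blur as two 1D passes: rows, then columns."""
    k = gaussian_kernel(sigma)
    horiz = convolve_rows(img, k)
    return transpose(convolve_rows(transpose(horiz), k))

def scale_space(img, sigma, levels):
    """Stack of images; each level is the previous one blurred again."""
    stack = [img]
    for _ in range(levels):
        stack.append(gaussian_blur(stack[-1], sigma))
    return stack
```

Since Gaussian variances add under repeated convolution, level k of this stack corresponds to an effective blur of sqrt(k) times sigma.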

14
Feature Tracking

15
Lucas-Kanade Tracker (KLT)
Shi and Tomasi CVPR94, Tomasi and Kanade 91, Birchfield 96

Main Idea: Assuming brightness constancy, find the new positions of some salient image points in the second image (where the motion is small)
Steps:
Detect salient points to track (in the current frame)
Track those features in the next frame
This could be done by searching (template matching), BUT KLT does gradient descent optimization and finds the motion vector by solving a linear system.
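The linear system mentioned above can be sketched for a single window and a single iteration. This is a simplified illustration rather than the GPU-KLT implementation: the function name is hypothetical, gradients are plain central differences, and a real tracker would iterate and use image pyramids.

```python
def lucas_kanade_step(I, J, x0, y0, half):
    """One Lucas-Kanade update for the window centered at (x0, y0).

    I and J are consecutive frames (lists of rows). Returns the estimated
    motion (dx, dy) such that J(x + dx, y + dy) is approximately I(x, y).
    """
    gxx = gxy = gyy = bx = by = 0.0
    for y in range(y0 - half, y0 + half + 1):
        for x in range(x0 - half, x0 + half + 1):
            ix = (I[y][x + 1] - I[y][x - 1]) / 2.0  # central differences
            iy = (I[y + 1][x] - I[y - 1][x]) / 2.0
            it = I[y][x] - J[y][x]                  # brightness-constancy residual
            gxx += ix * ix
            gxy += ix * iy
            gyy += iy * iy
            bx += ix * it
            by += iy * it
    det = gxx * gyy - gxy * gxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture to track")
    # Solve the 2x2 system G d = b with Cramer's rule.
    return ((gyy * bx - gxy * by) / det, (gxx * by - gxy * bx) / det)
```

The determinant check reflects why salient points are selected first: in a textureless window or along an edge the 2x2 matrix is (near) singular and the motion is not recoverable.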

16
GPU-KLT

17
GPU-KLT
cs.unc.edu/ssinha/Research/GPU_KLT

18
GPU-KLT: CPU vs GPU Timings

GPU-KLT tracks 1000 features in real time at 30 Hz on 1024 x 768 resolution
15-20X improvement over the CPU
Can be extended to deal with changes in brightness (gain and offset)

19
GPU-KLT on various graphics hardware

20
Feature Extraction

21
Harris Corner Detector

Idea:
Detect a patch which looks locally unique. Shifting the patch in any direction will give a large change in intensity.
Texture-less region: no change in any direction.
Edge: no change along one direction.
Corner: large changes in all directions.

22

Harris Corner Detector
Eigenvalue analysis of the 2x2 matrix M
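The slide does not define M; the sketch below assumes the usual choice, the sum of gradient outer products over the patch, and classifies a patch by its two eigenvalues. The function names and the threshold eps are hypothetical.

```python
import math

def structure_tensor(patch):
    """Entries (a, b, c) of M = [[a, b], [b, c]] summed over the patch interior."""
    a = b = c = 0.0
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            ix = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            iy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            a += ix * ix
            b += ix * iy
            c += iy * iy
    return a, b, c

def eigenvalues(a, b, c):
    """Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    spread = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean + spread, mean - spread

def classify(patch, eps=1e-3):
    """Both eigenvalues small: flat. One small: edge. Both large: corner."""
    lam_max, lam_min = eigenvalues(*structure_tensor(patch))
    if lam_max < eps:
        return "flat"
    if lam_min < eps:
        return "edge"
    return "corner"
```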
23
SIFT Scale Invariant Feature Transform
Lowe IJCV 2004

Goal:
Find feature vectors with invariant properties
Detect locations in image scale space which are invariant to translation, scaling, rotation and small distortions.

Match in n-D feature space
Detect Keypoints
Local Description
24
SIFT Scale Invariant Feature Transform
Lowe IJCV 2004

(1) Selecting Interest Point Locations
Build Gaussian Scale Space.
Detect local extrema of Difference of
Gaussian on Scale Space

25
SIFT Scale Invariant Feature Transform
Lowe IJCV 2004

For each interest point, using local gradient vectors:
(2) Compute a local coordinate frame
(3) Compute a weighted orientation histogram (128-dimensional SIFT descriptor vector).

26
GPU-SIFT

Issues for GPGPU implementation
Speed up scale space construction (I, ∇I, DoG)
Fast separable Gaussian convolution with variable sigma.
Avoid read-back of large buffers.
Computing the weighted orientation histogram is difficult on the GPU.
Exploit texture mapping hardware for fast bilinear interpolation
Split computation between CPU and GPU

27
GPU-SIFT

28
(1) Scale Space Construction

I1
I2
I3

R Intensity
G Gradient.X
B Gradient.Y
A DoG

2D Gaussian Convolution is Separable.
Ping Pong Between 2 Surfaces of pbuffer
1 Fragment Program computes all four values

29
(1) Scale Space Construction per iteration

30
(2) Finding DoG Extrema

Need to compare 26 neighbors.
Set glBlendEquation(GL_MAX)
Render quad + six +/-1 shifted quads with blending enabled.
Render maxima to depth buffer
Render image again with DepthTest set to EQUAL
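For reference, the 26-neighbor test that the blending trick above emulates can be written directly on the CPU. This is a plain sketch with hypothetical names; it only finds maxima (minima follow by negating the stack) and indexes the DoG stack as dog[scale][y][x].

```python
def is_strict_maximum(dog, s, y, x):
    """True if dog[s][y][x] exceeds all 26 neighbors in the 3x3x3 block."""
    center = dog[s][y][x]
    for ds in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if (ds, dy, dx) == (0, 0, 0):
                    continue
                if dog[s + ds][y + dy][x + dx] >= center:
                    return False
    return True

def find_maxima(dog):
    """Return (x, y, scale) for every interior strict maximum of the stack."""
    found = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[0]) - 1):
            for x in range(1, len(dog[0][0]) - 1):
                if is_strict_maximum(dog, s, y, x):
                    found.append((x, y, s))
    return found
```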

31
(3) Readback Interest Points (x, y, scale)

The sparse bitmap must be read back to the CPU.
Fragment shader encodes 32 bits into 8-bit RGBA
Read back RGBA, decode color to recover bitmap.
8X speedup.
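The encode/decode step can be sketched as follows. The bit layout (8 flags per channel, least significant bit first) and the function names are assumptions; the slides only say that 32 bits are packed into one 8-bit RGBA pixel.

```python
def pack_bits(bits):
    """Pack 32 booleans into an (r, g, b, a) tuple of 8-bit channel values."""
    if len(bits) != 32:
        raise ValueError("expected exactly 32 flags")
    channels = []
    for c in range(4):
        value = 0
        for i in range(8):
            if bits[8 * c + i]:
                value |= 1 << i
        channels.append(value)
    return tuple(channels)

def unpack_bits(rgba):
    """Recover the 32 booleans from an (r, g, b, a) tuple."""
    return [bool((rgba[c] >> i) & 1) for c in range(4) for i in range(8)]
```

Packing 32 bitmap entries per pixel shrinks the buffer that crosses the bus by a factor of 32 compared with one entry per RGBA pixel.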

32
GPU_SIFT

33
Stereo

34
Stereo

1. Determine Camera Calibration (Pixel -> Backprojected Ray)
2. Compute Dense Pixel Correspondence (u <-> u')
3. Triangulate to obtain 3D Point X
Ill-posed problem, requires regularization (priors)

35
Matching Cost / Similarity Measure

image I(x, y)
image I'(x', y')
Disparity map D(x, y)
$(x', y') = (x + D(x, y),\ y)$
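A minimal matching-cost sketch for this setup: for each pixel, try every candidate disparity d along the scanline and keep the one with the lowest cost. The per-pixel absolute difference and the winner-take-all selection are assumptions chosen for brevity; the function name is hypothetical.

```python
def disparity_wta(left, right, max_disp):
    """Winner-take-all disparity using absolute difference as matching cost.

    Follows the convention (x', y') = (x + D(x, y), y): pixel (x, y) in
    `left` is compared against (x + d, y) in `right`.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_cost = None
            for d in range(max_disp + 1):
                if x + d >= w:
                    break  # candidate falls outside the second image
                cost = abs(left[y][x] - right[y][x + d])
                if best_cost is None or cost < best_cost:
                    best_cost = cost
                    disp[y][x] = d
    return disp
```

Single-pixel costs are noisy in practice, which is why the cost is usually summed over a window before taking the minimum.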
36
Planesweeping Stereo

Hypothesize a plane
Project images from all cameras onto it
Measure dissimilarity as seen from a reference
view
Repeat for a family of planes
Per pixel, record plane with least dissimilarity

near
far
37
Summing values over a window

(1) Ruigang and Marc used a nice trick: they used the texture mipmapping hardware to do the summation over 2D windows (square windows with power-of-2 sides)
(2) Another trick is to average 4 values in a 2 x 2 block: do a single texture lookup at the center of the block
(3) For larger windows which are not 2^n x 2^n, need to use other tricks like
- partial sum of columns and then rows
- TEXTURE REDUCE (log M render passes)
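Two of these tricks can be sketched on the CPU: a window sum computed as column sums followed by row sums, and a reduction that halves the image with 2 x 2 sums in each pass, as a mipmap chain would. Function names are hypothetical and samples outside the image are treated as zero.

```python
def box_sum(img, radius):
    """Sum over (2*radius+1)^2 windows via two 1D passes: columns, then rows."""
    h, w = len(img), len(img[0])
    cols = [[sum(img[yy][x]
                 for yy in range(max(0, y - radius), min(h, y + radius + 1)))
             for x in range(w)] for y in range(h)]
    return [[sum(cols[y][xx]
                 for xx in range(max(0, x - radius), min(w, x + radius + 1)))
             for x in range(w)] for y in range(h)]

def reduce_sum(img):
    """Sum a 2^n x 2^n image with n passes, each adding 2x2 blocks."""
    level = img
    while len(level) > 1:
        half = len(level) // 2
        level = [[level[2 * y][2 * x] + level[2 * y][2 * x + 1]
                  + level[2 * y + 1][2 * x] + level[2 * y + 1][2 * x + 1]
                  for x in range(half)] for y in range(half)]
    return level[0][0]
```

On the GPU each pass of `reduce_sum` corresponds to rendering a quad half the size of the previous one, which is where the log M pass count comes from.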

38
Computing Histograms on the GPU

Image Histograms (e.g. RGB histogram)
(used in histogram equalization, image-based retrieval, etc.)
Method 1: Use Occlusion Queries
An N-bin histogram needs N passes
Expensive for higher dimensions.
There is some recent work, but this may still be worth exploring.
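The N-pass structure of the occlusion-query method can be mimicked on the CPU: each pass keeps only the pixels that fall into one bin, and the query result is the number of surviving fragments. This is a sketch with a hypothetical function name, assuming equal-width bins over [0, max_value).

```python
def histogram_by_passes(pixels, n_bins, max_value=256):
    """Build an n_bins histogram using one counting pass per bin."""
    counts = []
    width = max_value / n_bins
    for b in range(n_bins):  # one "render pass" per bin
        lo, hi = b * width, (b + 1) * width
        # Stand-in for the occlusion query: count fragments passing the test.
        counts.append(sum(1 for p in pixels if lo <= p < hi))
    return counts
```

The cost grows with the number of bins, which is why the slide calls the method expensive for higher-dimensional histograms.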

39
Fusion of Multi-View Silhouette Cues Using a Space Occupancy Grid
Jean-Sébastien Franco, Edmond Boyer, ICCV05

Unreliable silhouettes: do not make decisions about their location in the image
Sensor fusion: use all image information simultaneously
Account for silhouette and CCD sensor uncertainty
Use the occupancy grid framework, which has 2 associated models:
- sensor model
- probabilistic representation of space: a grid of voxels that store the probability of object occupancy

40
Bayesian formulation

Idea: we wish to find the content of the scene from images, as a probability grid
Modeling the forward problem (explaining image observations given the grid state) is easy. It can be accounted for in a sensor model.
Bayesian inference enables the formulation of our initial inverse problem from the sensor model
Simplification for tractability: independent analysis and processing of voxels

41
Modeling
Grid
Sensor model

I: color information in images
B: background color model
F: silhouette detection variable (0 or 1), hidden
O_X: occupancy at voxel X (0 or 1)

Inference
42

Fusion of Multi-View Silhouette Cues Using a Space Occupancy Grid
Jean-Sébastien Franco, Edmond Boyer, ICCV05

43
Project ideas

Fast Bilateral Filtering / Anisotropic Diffusion
Background Segmentation in Video
Multi-scale Optic Flow
Optimization on the GPU: Graph Cuts, Level Sets, Snakes.
Implement SIFT Feature Extraction
- Widely used by the vision community
(my GPU-SIFT code was developed at Siemens SCR, Princeton)
Jan-Michael Frahm and Philippos Mordohai have ideas; feel free to discuss with them.

44
A few points

Fast bilinear interpolation in hardware.
- non-programmable FLOPS.
The fixed-function graphics pipeline can be used to efficiently render images from a viewpoint.
Visibility inference in multiple-view algorithms can use the Z-buffer hardware present on GPUs.
Other useful features:
floating point blending
Alpha blending, Depth Test, Occlusion Queries.
Classical GPGPU vs. CUDA-based implementations.

45
References