Computer Vision on the GPU

1
Computer Vision on the GPU
  • Feb 5, 2007
  • Sudipta N. Sinha

2
Overview
  • Introduction
  • A Simple Example
  • Vision algorithms on GPU
  • Feature Extraction, Tracking
  • Stereo
  • Similarity measure / matching cost
  • Optic flow
  • Image Registration
  • Project Ideas

3
Introduction
  • Image Processing / Low-level Computer Vision
  • Data Parallel (multiple pixels can be processed
    in parallel)
  • 2D/3D grids form computational domains
  • Linear Algebra, Vector Operations common.
  • Classical GPGPU Concepts
  • Fragment Programs -> Computational Kernel
  • Textures -> Storage (Arrays)
  • Render to Texture (multiple render passes) ->
    Multiple iterations / multi-stage algorithms

4
A Simple Example: Removing Radial Distortion

5

Radial Distortion Model / Parameters

6
Radial Distortion Model / Parameters
  • Parametric Model D ( x_c , y_c , K_1 , K_2 , K_3 )

r^2 = (x - x_c)^2 + (y - y_c)^2
L(r) = 1 + K_1 r + K_2 r^2 + K_3 r^3

for i = 1 to nrows
  for j = 1 to ncols
    (x, y) = distort(i, j)    // radial distortion model
    B(i, j) = A(x, y)         // bilinear interpolation
  end
end

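
As a CPU reference for the loop above, here is a minimal Python sketch. It assumes that L(r) scales the offset of each pixel from the distortion centre (the slide gives L(r) but not how it is applied), that coordinates are in unnormalized pixels, and that samples outside the image are clamped to the border. The function names and the list-of-rows image layout are illustrative, not taken from the slides.

```python
import math

def bilinear(img, x, y):
    """Sample img (a list of rows) at real-valued (x, y), clamping to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bot = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bot

def distort(x, y, xc, yc, k1, k2, k3):
    """Assumed model: source position = centre + L(r) * (offset from centre)."""
    dx, dy = x - xc, y - yc
    r = math.hypot(dx, dy)
    scale = 1.0 + k1 * r + k2 * r * r + k3 * r ** 3
    return xc + scale * dx, yc + scale * dy

def undistort(a, xc, yc, k1, k2, k3):
    """Gather: every output pixel B(i, j) reads one interpolated sample of A."""
    h, w = len(a), len(a[0])
    return [[bilinear(a, *distort(j, i, xc, yc, k1, k2, k3)) for j in range(w)]
            for i in range(h)]
```

Each output pixel depends only on reads from the input image, which is why the same loop body maps directly onto a fragment program.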
7
GPU Implementation 1
  • 1. Upload target image (a texture of size w x h)
  • 2. Bind fragment program (see below)
  • 3. Render a screen-aligned quad of size w x h
  • 4. Readback the rendered quad (undistorted)

Undistort.cg
float4 undistort ( float2 texCoord : TEXCOORD0,
                   uniform samplerRECT image ) : COLOR
{
    float2 pos = texCoord - float2( xc , yc );
    pos = ... ;    // evaluate the distortion func at pos
    return texRECT ( image , pos );
}
8
GPU Implementation 2
  • Upload target image (a texture of size w x h)
  • Upload pre-computed lookup table as 2nd texture
  • Create fragment program (see below)
  • Render, Readback

Undistort2.cg
float4 undistort ( float2 texCoord : TEXCOORD0,
                   uniform samplerRECT image,
                   uniform samplerRECT LUTable ) : COLOR
{
    float2 pos;
    // the table is assumed to hold a per-pixel offset
    pos = texCoord + texRECT( LUTable , texCoord ).xy;
    return texRECT ( image , pos );
}
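
The lookup-table variant can be sketched on the CPU as follows. This assumes the table stores a per-pixel offset that is added to the pixel coordinate, and it uses nearest-neighbour fetches to stay short; `build_lut`, `apply_lut` and the `mapping` callback are hypothetical names used only for illustration.

```python
def build_lut(w, h, mapping):
    """Precompute per-pixel offsets once; mapping(x, y) returns the source (x, y)."""
    lut = []
    for y in range(h):
        row = []
        for x in range(w):
            sx, sy = mapping(x, y)
            row.append((sx - x, sy - y))
        lut.append(row)
    return lut

def apply_lut(img, lut):
    """Per frame: position = pixel coordinate + table offset, then fetch (nearest)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = lut[y][x]
            sx = min(max(int(round(x + dx)), 0), w - 1)
            sy = min(max(int(round(y + dy)), 0), h - 1)
            row.append(img[sy][sx])
        out.append(row)
    return out
```

The table is built once per camera calibration, so the per-frame work shrinks to one extra fetch per pixel.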
9
Typical Image Processing pipeline using GPGPU
  • 3 successive computations mapped to render
    passes. Each step implemented as a separate
    fragment program

Undistort.cg -> RGB2Gray.cg -> Threshold.cg
10
A few things to remember
  • Scatter vs. Gather
  • Concurrent Memory Read / Write not allowed
  • Ping-pong rendering
  • GPU -> CPU readback (PCI-e)
  • Key Issue
  • Computer Vision applications are often a pipeline
    of algorithms/routines. The key challenge is to
    port the whole application to the GPU, not just a
    few computationally expensive routines.
  • May need to use CPU for some steps.
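
A minimal CPU sketch of ping-pong rendering with gather-style kernels: each pass reads only the source buffer and writes exactly one destination pixel, then the two buffers swap roles. The two kernels stand in for the RGB2Gray and Threshold passes of the pipeline; the luma weights and the threshold value are assumptions, not values from the slides.

```python
def run_passes(image, kernels):
    """Apply kernels in order, alternating between two buffers (ping-pong)."""
    h, w = len(image), len(image[0])
    src = [row[:] for row in image]
    dst = [[0] * w for _ in range(h)]
    for kernel in kernels:
        for y in range(h):
            for x in range(w):
                # Gather: read from src only, write a single dst pixel.
                dst[y][x] = kernel(src, x, y)
        src, dst = dst, src  # the output of this pass is the input of the next
    return src

def rgb2gray(img, x, y):
    r, g, b = img[y][x]
    return 0.299 * r + 0.587 * g + 0.114 * b  # assumed luma weights

def threshold(img, x, y, t=128):
    return 255 if img[y][x] >= t else 0
```

Because no pass reads and writes the same buffer, the sketch respects the "no concurrent read / write" restriction listed above.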

11
Convolution, Building Gaussian Scale Space
Stack

12
Convolution

NVidia OpenGL/Cg Demo
http://developer.nvidia.com/object/convolution_filters.html
13
Convolution
A Gaussian Scale Space stack (or volume) is obtained
by repeatedly convolving an image with a Gaussian
of a given sigma. Useful for multi-scale image
matching, segmentation etc.
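
A small CPU sketch of building such a stack, assuming a fixed sigma per level, a kernel truncated at three sigma, and border clamping. It uses the row-then-column (separable) form of the 2D Gaussian; all function names are illustrative.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian weights, truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [wgt / total for wgt in weights]

def convolve_rows(img, k):
    """1D convolution along each row, clamping reads at the border."""
    h, w, r = len(img), len(img[0]), len(k) // 2
    return [[sum(k[t + r] * img[y][min(max(x + t, 0), w - 1)]
                 for t in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_blur(img, sigma):
    """Separable blur: one horizontal pass, then one vertical pass."""
    k = gaussian_kernel(sigma)
    horizontal = convolve_rows(img, k)
    return transpose(convolve_rows(transpose(horizontal), k))

def scale_space(img, sigma, levels):
    """Stack of progressively blurred images; level 0 is the input."""
    stack = [img]
    for _ in range(levels):
        stack.append(gaussian_blur(stack[-1], sigma))
    return stack
```

On the GPU each 1D pass would be one render pass, with the intermediate image kept in a texture.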

14
Feature Tracking

15
Lucas-Kanade Tracker (KLT)
Shi and Tomasi CVPR'94, Tomasi and Kanade '91,
Birchfield '96
  • Main Idea: Assuming brightness constancy, try to
    find the new positions of some salient image
    points in the second image (where the motion is
    small)
  • Steps
  • Detect salient points to track (in current frame)
  • Track those features in the next frame
  • Tracking could be done by searching (template
    matching), BUT KLT does gradient descent
    optimization and finds the motion vector by
    solving a linear system.
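
To make the "solving a linear system" step concrete, here is a minimal sketch of a single Lucas-Kanade update for one window. It assumes grayscale images stored as lists of rows, central-difference gradients, and a window that lies fully inside the image; `lk_step` is a hypothetical name, and a real tracker would iterate this step, typically over an image pyramid.

```python
def lk_step(img0, img1, cx, cy, half=2):
    """One Lucas-Kanade update for the window centred at (cx, cy).

    Returns the estimated motion (dx, dy), or None if the 2x2 system is singular.
    """
    gxx = gxy = gyy = ex = ey = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (img0[y][x + 1] - img0[y][x - 1]) / 2.0  # spatial gradient
            iy = (img0[y + 1][x] - img0[y - 1][x]) / 2.0
            it = img1[y][x] - img0[y][x]                  # temporal difference
            gxx += ix * ix
            gxy += ix * iy
            gyy += iy * iy
            ex -= ix * it
            ey -= iy * it
    det = gxx * gyy - gxy * gxy
    if abs(det) < 1e-12:
        return None  # flat region or pure edge: motion is not determined
    # Solve [gxx gxy; gxy gyy] [dx dy]^T = [ex ey]^T with Cramer's rule.
    return (gyy * ex - gxy * ey) / det, (gxx * ey - gxy * ex) / det
```

The 2x2 matrix built here is the same gradient matrix whose eigenvalues decide whether a point is worth tracking.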

16
GPU-KLT

17
GPU-KLT
cs.unc.edu/~ssinha/Research/GPU_KLT

18
GPU-KLT: CPU vs GPU Timings
  • GPU-KLT tracks 1000 features in real-time at
    30 Hz on 1024 x 768 resolution
  • 15-20X improvement over the CPU
  • Can be extended to deal with change in
    brightness (gain and offset)

19
GPU-KLT on various graphics hardware

20
Feature Extraction

21
Harris Corner Detector
  • Idea
  • Detect a patch which looks locally unique.
  • Shifting the patch in any direction will give a
    large change in intensity.
  • Texture-less region: no change in any direction.
  • Edge: no change along one direction.
  • Corner: large changes in all directions.

22

Harris Corner Detector
Eigenvalue analysis of the 2x2 matrix M
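
A minimal sketch of that eigenvalue analysis, assuming M is the symmetric matrix [[a, b], [b, c]] of summed gradient products over a patch (the slide does not spell M out) and using an arbitrary threshold to separate "small" from "large" eigenvalues.

```python
import math

def eigenvalues_2x2(a, b, c):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    dev = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean + dev, mean - dev

def classify(a, b, c, threshold=1.0):
    """Label a patch from the eigenvalues of its gradient matrix."""
    lam1, lam2 = eigenvalues_2x2(a, b, c)
    if lam1 < threshold:
        return "flat"    # both eigenvalues small: no change in any direction
    if lam2 < threshold:
        return "edge"    # one small eigenvalue: no change along one direction
    return "corner"      # both large: change in all directions
```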
23
SIFT: Scale Invariant Feature Transform
Lowe, IJCV 2004
  • Goal
  • Find feature vectors with invariant properties
  • Detect locations in image scale space which are
    invariant to translation, scaling, rotation and
    small distortions.

Pipeline: Detect Keypoints -> Local Description ->
Match in n-D feature space
24
SIFT: Scale Invariant Feature Transform
Lowe, IJCV 2004
  • (1) Selecting Interest Point Locations
  • Build Gaussian Scale Space.
  • Detect local extrema of the Difference of
    Gaussian (DoG) on Scale Space

25
SIFT: Scale Invariant Feature Transform
Lowe, IJCV 2004
  • For each Interest Point, using local gradient
    vectors
  • (2) Compute a local coordinate frame
  • (3) Compute a weighted orientation histogram
    (128-dimensional SIFT descriptor vector).

26
GPU-SIFT
  • Issues for GPGPU implementation
  • Speed up Scale Space Construction ( I, ∇I, DoG )
  • Fast Separable Gaussian Convolution with
    variable sigma.
  • Avoid Read-back of large buffers.
  • Computing Weighted Orientation Histogram
    difficult on GPU.
  • Exploit texture mapping hardware for
    fast-bilinear interpolation
  • Split computation between CPU and GPU

27
GPU-SIFT

28
(1) Scale Space Construction

I1, I2, I3
  • R: Intensity
  • G: Gradient.X
  • B: Gradient.Y
  • A: DoG
  • 2D Gaussian Convolution is Separable.
  • Ping Pong Between 2 Surfaces of pbuffer
  • 1 Fragment Program computes all four values

29
(1) Scale Space Construction per iteration

30
(2) Finding DoG Extrema
  • Need to compare 26 neighbors.
  • Set glBlendEquation(GL_MAX)
  • Render the quad plus six +/-1 shifted quads
    with blending enabled.
  • Render maxima -> depth buffer
  • Render image again with DepthTest set to EQUAL
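
For reference, the test that the blending and depth-test passes compute can be written on the CPU as a direct 26-neighbour comparison. The sketch below assumes the DoG stack is indexed as dog[scale][y][x] and looks only for strict maxima (minima are symmetric); function names are illustrative.

```python
def is_local_max(dog, s, y, x):
    """True if dog[s][y][x] is strictly greater than all 26 neighbours."""
    centre = dog[s][y][x]
    for ds in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if (ds, dy, dx) == (0, 0, 0):
                    continue
                if dog[s + ds][y + dy][x + dx] >= centre:
                    return False
    return True

def find_maxima(dog):
    """Return (scale, y, x) for every interior sample that beats its neighbours."""
    found = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][y]) - 1):
                if is_local_max(dog, s, y, x):
                    found.append((s, y, x))
    return found
```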

31
(3) Readback Interest Points (x, y, scale)
  • Sparse bitmap must be read back to the CPU.
  • Fragment shader encodes 32 bits into one 8-bit
    RGBA pixel (4 channels x 8 bits)
  • Read back RGBA, decode color to recover bitmap.
  • 8X speedup.
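
The encode / decode step can be sketched as below. The bit layout (eight flags per channel, least significant bit first, channels in R, G, B, A order) is an assumption chosen for illustration; the slides do not specify it.

```python
def pack_bits(bits):
    """Pack 32 booleans into four 8-bit channel values (R, G, B, A)."""
    if len(bits) != 32:
        raise ValueError("expected exactly 32 flags")
    channels = []
    for c in range(4):
        value = 0
        for i in range(8):
            if bits[8 * c + i]:
                value |= 1 << i
        channels.append(value)
    return tuple(channels)

def unpack_bits(rgba):
    """Inverse of pack_bits: recover the 32 booleans from (R, G, B, A)."""
    return [bool((rgba[c] >> i) & 1) for c in range(4) for i in range(8)]
```

Packing 32 bitmap entries into each RGBA pixel makes the buffer that crosses the bus 32 times smaller than one pixel per entry.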

32
GPU-SIFT

33
Stereo

34
Stereo
  • 1. Determine Camera Calibration
    (Pixel -> Backprojected Ray)
  • 2. Compute Dense Pixel Correspondence
    (u <-> u')
  • 3. Triangulate to obtain 3D Point X
  • Ill-posed problem, requires regularization
    (priors)

35
Matching Cost / Similarity Measure

image I(x,y)
image I'(x',y')
Disparity map D(x,y)
(x',y') = (x + D(x,y), y)
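
One possible matching cost under this disparity model is the sum of absolute differences (SAD) over a small window, evaluated for each candidate disparity with the lowest cost winning. SAD, the window size and the function names are assumptions for illustration; the slides do not commit to a specific measure here.

```python
def clamp(v, lo, hi):
    return max(lo, min(v, hi))

def sad_cost(left, right, x, y, d, half=1):
    """SAD between a window of `left` at (x, y) and `right` at (x + d, y)."""
    h, w = len(left), len(left[0])
    cost = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            yy = clamp(y + dy, 0, h - 1)
            xl = clamp(x + dx, 0, w - 1)
            xr = clamp(x + dx + d, 0, w - 1)
            cost += abs(left[yy][xl] - right[yy][xr])
    return cost

def best_disparity(left, right, x, y, candidates, half=1):
    """Winner-take-all: the candidate disparity with the lowest cost."""
    return min(candidates, key=lambda d: sad_cost(left, right, x, y, d, half))
```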
36
Planesweeping Stereo
  • Hypothesize a plane
  • Project images from all cameras onto it
  • Measure dissimilarity as seen from a reference
    view
  • Repeat for a family of planes
  • Per pixel, record plane with least dissimilarity

near
far
37
Summing values over a window
  • (1) Ruigang and Marc used a nice trick: they
    used the texture mipmapping hardware to do the
    summation over 2D windows (square windows whose
    side is a power of 2)
  • (2) Another trick is to average 4 values in a
    2 x 2 block: do a single bilinear texture lookup
    at the center of the block
  • (3) For larger windows which are not
    2^n x 2^n, need to use other tricks like
  • - partial sum of columns and then rows
  • - TEXTURE REDUCE (log M render passes)
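
The two tricks in item (3) can be sketched on the CPU as follows. `box_sum` does 1D sums along rows and then along columns, truncating the window at the image border; `reduce_sum` adds 2 x 2 blocks once per pass and assumes a square image whose side is a power of two. Both names are hypothetical.

```python
def box_sum(img, half):
    """Sum over a (2*half+1) x (2*half+1) window via two 1D passes."""
    def sum_rows(a):
        w = len(a[0])
        return [[sum(row[max(x - half, 0):min(x + half, w - 1) + 1])
                 for x in range(w)] for row in a]

    def transpose(a):
        return [list(col) for col in zip(*a)]

    return transpose(sum_rows(transpose(sum_rows(img))))

def reduce_sum(img):
    """Add 2x2 blocks once per pass until a single value remains."""
    passes = 0
    while len(img) > 1:
        n = len(img) // 2
        img = [[img[2 * y][2 * x] + img[2 * y][2 * x + 1]
                + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]
                for x in range(n)] for y in range(n)]
        passes += 1
    return img[0][0], passes
```

For an M x M input, `reduce_sum` needs log2(M) passes, matching the pass count quoted for the texture reduce.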

38
Computing Histograms on the GPU
  • Image Histograms (e.g. RGB histogram), used in
    histogram equalization, image-based retrieval
    etc.
  • Method 1: Use Occlusion Queries
  • N-bin histogram needs N passes
  • Expensive for higher dimensions.
  • There is some recent work, but this may still
    be worth exploring further
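
The N-pass structure of the occlusion-query method can be imitated on the CPU: in each pass only the pixels that fall into one bin are "drawn", and the number of drawn pixels (what an occlusion query would report) becomes that bin's count. Equal-width bins over [0, max_value) are an assumption of this sketch.

```python
def histogram_by_passes(img, n_bins, max_value=256):
    """One pass per bin: count the pixels whose value falls inside the bin."""
    counts = []
    for b in range(n_bins):
        lo = b * max_value / n_bins
        hi = (b + 1) * max_value / n_bins
        # Emulates "discard fragments outside [lo, hi), then query the count".
        counts.append(sum(1 for row in img for v in row if lo <= v < hi))
    return counts
```

The whole image is touched once per bin, which is why the cost grows quickly with the number of bins and dimensions.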

39
Fusion of Multi-View Silhouette Cues Using a
Space Occupancy Grid
Jean-Sébastien Franco, Edmond Boyer, ICCV'05
  • Unreliable silhouettes: do not make decisions
    about their image location
  • Sensor fusion: use all image information
    simultaneously
  • Account for silhouette and CCD sensor uncertainty
  • Use the occupancy grid framework, which has 2
    associated models
  • sensor model
  • probabilistic representation of space: a grid of
    voxels that store the probability of object
    occupancy

40
Bayesian formulation
Fusion of Multi-View Silhouette Cues Using a
Space Occupancy Grid
Jean-Sébastien Franco, Edmond Boyer, ICCV'05
  • Idea: we wish to find the content of the scene
    from images, as a probability grid
  • Modeling the forward problem (explaining image
    observations given the grid state) is easy. It
    can be accounted for in a sensor model.
  • Bayesian inference enables the formulation of our
    initial inverse problem from the sensor model
  • Simplification for tractability: independent
    analysis and processing of voxels
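
As a generic illustration of per-voxel Bayesian fusion (not the sensor model of the paper), the sketch below combines one likelihood pair per view under the added assumption that views are conditionally independent given the voxel's occupancy. The function name and the likelihood values in the usage are hypothetical.

```python
def occupancy_posterior(prior, likelihoods):
    """Posterior P(occupied | observations) for a single voxel.

    likelihoods: one (p(obs | occupied), p(obs | empty)) pair per view,
    assumed conditionally independent given the occupancy state.
    """
    occupied, empty = prior, 1.0 - prior
    for p_occ, p_emp in likelihoods:
        occupied *= p_occ
        empty *= p_emp
    return occupied / (occupied + empty)
```

For example, `occupancy_posterior(0.5, [(0.9, 0.1), (0.8, 0.3)])` fuses two views that both favour occupancy; because each voxel is handled independently, the same function could run once per voxel in parallel.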

41
Modeling
Fusion of Multi-View Silhouette Cues Using a
Space Occupancy Grid
Jean-Sébastien Franco, Edmond Boyer, ICCV'05
Grid
Sensor model
  • I: color information in images
  • B: background color model
  • F: silhouette detection variable (0 or 1), hidden
  • O_X: occupancy at voxel X (0 or 1)

Inference
42

Fusion of Multi-View Silhouette Cues Using a
Space Occupancy Grid
Jean-Sébastien Franco, Edmond Boyer, ICCV'05

43
Project ideas
  • Fast Bilateral Filtering / Anisotropic diffusion
  • Background Segmentation in Video
  • Multi-scale Optic Flow
  • Optimization on the GPU
  • Graph Cuts, Level Sets, Snakes.
  • Implement SIFT Feature Extraction
  • - Widely used by the Vision Community
  • (my GPU-SIFT code was developed at Siemens SCR,
    Princeton)
  • Jan-Michael Frahm and Philippos Mordohai have
    ideas; feel free to discuss with them.

44
A few points
  • Fast bilinear interpolation in hardware
    (non-programmable FLOPS).
  • The fixed-function graphics pipeline can be used
    to efficiently render images from a viewpoint.
  • Visibility Inference in Multiple View algorithms
    can use Z-buffer hardware present on GPUs.
  • Other useful features
  • floating point blending
  • Alpha blending, Depth Test, Occlusion Queries.
  • Classical GPGPU vs. CUDA based implementations.

45
References
  • www.gpgpu.org/vision
  • GPU Gems 2, Chapter 40
  • OpenVIDIA: www.eyetap.org
  • GPU-Based Video Feature Tracking and Matching,
    tech report.