1
Parallel Image Registration using Graphical
Processing Unit
CS 387 Semester Project Srinivasa Shivakar
Vulli Singiresu Dheeraj
2
Problem Statement
3
Image Registration

Image registration, also called image
mosaicing or image stitching, involves
estimating the transformation from one image
coordinate system to another image coordinate
system
Transformation elements could include any or all of
the following:
Translation
Rotation
Scale
Shear
Warp

4
Data Description
5
Data Collection
Step-Stare Pattern
6
Multi Spectral Imagery (MSI) Data
BLUE
NIR
GREEN
RED
7
Medium Wave Infrared (MWIR) Data
Each image is called a frame
8
Data files
Each file, called a Segment, consists of 21
contiguous frames
9
Motivation
10
Need to move to a real-time system

Existing code base is in MATLAB
Registering a segment takes around 3-5 minutes
Done as post-processing long after the data is
collected
Need to speed up the registration process for
onboard data processing
Data is collected at the rate of 2 segments
(1 IR and 1 MSI) per 3 sec
Real-time requirement is therefore 7 frames per
sec for each sensor (21 frames every 3 sec)
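
The arithmetic behind this target can be checked with a few lines of C. This is only a sketch: the function names are illustrative and do not come from the project, and the 180 and 300 second figures used in the note below are simply the 3-5 minutes quoted above converted to seconds.

```c
#include <assert.h>

/* Frames one sensor must register per second to keep up with acquisition. */
double required_fps(int frames_per_segment, double seconds_per_segment)
{
    assert(seconds_per_segment > 0.0);
    return frames_per_segment / seconds_per_segment;
}

/* Speed-up needed when `segments` segments arrive every `period_s` seconds
   but registering one segment currently takes `current_s` seconds. */
double required_speedup(double current_s, int segments, double period_s)
{
    assert(segments > 0 && period_s > 0.0);
    return current_s / (period_s / segments);
}
```

With 21 frames per segment and one segment per sensor every 3 sec, required_fps(21, 3.0) gives 7. With both segments to be registered inside the same 3 sec window, required_speedup(180.0, 2, 3.0) and required_speedup(300.0, 2, 3.0) give 120 and 200, so the existing code would need to run roughly two orders of magnitude faster.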

11
Registration Procedure
12
Phase correlation

Used frequently for estimating the translation
between two images
Advantages
Operation is performed in the frequency domain,
potentially faster than spatial processing
Provides an approximate translation between
image coordinates
Reduces the effort needed for other estimations
(rotation, scaling, feature tracking)
Disadvantages
Not very robust, sensitive to low-frequency
noise
Bad results when the signal-to-noise ratio is low
(plain featureless backgrounds)

13
Phase correlation
14
Feature selection
RX (Reed-Xiaoli) anomaly detector
Select features from the region of interest (ROI)
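
The slide only names the detector, so the following is a hedged sketch of a global RX score rather than the project's implementation. It assumes just two spectral bands so that the covariance inverse has a closed form, and it scores every pixel by its squared Mahalanobis distance from the scene mean. The name rx_scores is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Global RX score for two-band pixels: the squared Mahalanobis distance of
   each pixel from the scene mean. Returns 0 on success, -1 if the band
   covariance is singular. */
int rx_scores(const double *band0, const double *band1, size_t n, double *score)
{
    assert(n > 0);
    double m0 = 0.0, m1 = 0.0;
    for (size_t i = 0; i < n; ++i) {
        m0 += band0[i];
        m1 += band1[i];
    }
    m0 /= (double)n;
    m1 /= (double)n;

    /* 2x2 covariance matrix [c00 c01; c01 c11] of the two bands. */
    double c00 = 0.0, c01 = 0.0, c11 = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d0 = band0[i] - m0, d1 = band1[i] - m1;
        c00 += d0 * d0;
        c01 += d0 * d1;
        c11 += d1 * d1;
    }
    c00 /= (double)n;
    c01 /= (double)n;
    c11 /= (double)n;

    double det = c00 * c11 - c01 * c01;
    if (det < 1e-12)
        return -1;

    /* d^T C^-1 d, using the closed-form inverse of a symmetric 2x2 matrix. */
    for (size_t i = 0; i < n; ++i) {
        double d0 = band0[i] - m0, d1 = band1[i] - m1;
        score[i] = (c11 * d0 * d0 - 2.0 * c01 * d0 * d1 + c00 * d1 * d1) / det;
    }
    return 0;
}
```

Pixels with the highest scores stand out most from the background statistics, which is what makes them plausible candidates for features to track.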
15
Feature Tracking

Selected features are tracked in the second frame.

A Sum of Absolute Differences (SAD) block-matching
algorithm is used.

This is the most computationally expensive task.
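
A minimal serial sketch of SAD block matching follows. The exhaustive search window, the block size, and the convention that a feature is addressed by the top-left corner of its patch are all assumptions made for illustration. The names block_sad and track_feature are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* SAD between the block x block patch at (x1, y1) in img1 and the patch at
   (x2, y2) in img2. Both images are row-major with the same width. */
static long block_sad(const unsigned char *img1, const unsigned char *img2,
                      int width, int x1, int y1, int x2, int y2, int block)
{
    long sad = 0;
    for (int j = 0; j < block; ++j)
        for (int i = 0; i < block; ++i)
            sad += abs((int)img1[(y1 + j) * width + x1 + i] -
                       (int)img2[(y2 + j) * width + x2 + i]);
    return sad;
}

/* Searches img2 within +/- radius pixels of (fx, fy) for the patch that best
   matches the patch whose top-left corner is (fx, fy) in img1.
   Writes the best top-left corner to (*mx, *my) and returns its SAD. */
long track_feature(const unsigned char *img1, const unsigned char *img2,
                   int width, int height, int fx, int fy,
                   int block, int radius, int *mx, int *my)
{
    assert(fx >= 0 && fy >= 0 && fx + block <= width && fy + block <= height);
    long best = LONG_MAX;
    for (int cy = fy - radius; cy <= fy + radius; ++cy) {
        for (int cx = fx - radius; cx <= fx + radius; ++cx) {
            if (cx < 0 || cy < 0 || cx + block > width || cy + block > height)
                continue;  /* candidate patch would leave the image */
            long sad = block_sad(img1, img2, width, fx, fy, cx, cy, block);
            if (sad < best) {
                best = sad;
                *mx = cx;
                *my = cy;
            }
        }
    }
    return best;
}
```

Each feature costs (2 * radius + 1)^2 * block^2 absolute differences, which is why this step dominates the run time. Every candidate position is independent of the others, so the search maps naturally onto GPU threads.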

16
Feature Tracking
17
Refining Locations

Bad pairs should be removed to avoid
mis-registration.
A Euclidean distance measure is used to weed out
bad pairs.
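
The slides do not say what the distance is measured against. The sketch below assumes that each pair's displacement is compared with an expected shift, for example the phase-correlation estimate, and that pairs further away than a threshold are dropped. The MatchPair type and the refine_pairs function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* A feature location in frame 1 and its tracked location in frame 2. */
typedef struct { double x1, y1, x2, y2; } MatchPair;

/* Keeps only pairs whose displacement (x2 - x1, y2 - y1) lies within
   max_dist (Euclidean) of the expected shift (ex, ey). Surviving pairs are
   compacted to the front of the array; returns how many were kept. */
size_t refine_pairs(MatchPair *pairs, size_t n,
                    double ex, double ey, double max_dist)
{
    assert(max_dist >= 0.0);
    double limit2 = max_dist * max_dist;  /* compare squared distances */
    size_t kept = 0;
    for (size_t i = 0; i < n; ++i) {
        double ddx = (pairs[i].x2 - pairs[i].x1) - ex;
        double ddy = (pairs[i].y2 - pairs[i].y1) - ey;
        if (ddx * ddx + ddy * ddy <= limit2)
            pairs[kept++] = pairs[i];
    }
    return kept;
}
```

Comparing squared distances avoids a square root per pair without changing which pairs pass the test.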

18
Refining Locations
19
Affine Transform

The affine transform is calculated from the refined
location pairs by a least-squares fit

Takes into account translation, rotation, scale
and shear
Does not take warping into account; warping is not
handled in the current project
Singular Value Decomposition (SVD) is used to
calculate the pseudo-inverse
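
The least-squares fit can be sketched in C as follows. Note the substitution: instead of an SVD-based pseudo-inverse, this sketch solves the 3x3 normal equations with Cramer's rule. That gives the same solution when the point pairs are not collinear, but it is less robust numerically. The function names and the coefficient layout {a, b, c, d, e, f} are assumptions.

```c
#include <assert.h>

static double det3(double m[3][3])
{
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

/* Solves m * p = r by Cramer's rule, given d = det(m) != 0. */
static void solve3(double m[3][3], const double r[3], double d, double p[3])
{
    for (int c = 0; c < 3; ++c) {
        double t[3][3];
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j)
                t[i][j] = (j == c) ? r[i] : m[i][j];
        p[c] = det3(t) / d;
    }
}

/* Least-squares fit of x' = a*x + b*y + c and y' = d*x + e*y + f over n point
   pairs. Writes {a, b, c, d, e, f} to coef. Returns 0 on success, -1 if the
   points are degenerate (fewer than 3, or all on one line). */
int fit_affine(const double *x, const double *y,
               const double *xp, const double *yp, int n, double coef[6])
{
    assert(n >= 0);
    double m[3][3] = {{0}}, rx[3] = {0}, ry[3] = {0};
    for (int i = 0; i < n; ++i) {
        double row[3] = { x[i], y[i], 1.0 };
        for (int j = 0; j < 3; ++j) {
            for (int k = 0; k < 3; ++k)
                m[j][k] += row[j] * row[k];  /* normal matrix A^T A */
            rx[j] += row[j] * xp[i];         /* A^T x' */
            ry[j] += row[j] * yp[i];         /* A^T y' */
        }
    }
    double d = det3(m);
    if (n < 3 || (d > -1e-9 && d < 1e-9))
        return -1;
    solve3(m, rx, d, coef);
    solve3(m, ry, d, coef + 3);
    return 0;
}
```

Three non-collinear pairs determine the six coefficients exactly; additional refined pairs average out tracking noise.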

20
Serial Code

The serial version of the code is written using the best
available tools for image processing on the CPU.
Intel Integrated Performance Primitives (IPP) is used for
low-level routines.
It provides most of the image processing functions,
such as FFT, convolution, matrix multiplication,
matrix conjugate multiplication and element-wise
division.
OpenCV is used for high-level functions like GUI
display and window handling.
The serial code was tested on a Pentium
processor with 4 GB of RAM.

21
Parallel Code

The parallel version of the code is written in
C/CUDA.
OpenGL is used for GUI and window handling.
The code was tested on an Nvidia GTX 280, which has
240 stream processors and 1 GB of GDDR3 memory.

22
Graphics Processing Unit (GPU)

A Graphics Processing Unit (GPU) is a dedicated
graphics rendering device used to offload work
from the CPU.
Traditionally, GPU hardware had vertex shaders
to process 3D geometry and pixel shaders to
handle scene lighting and color toning.
A modern GPU has as many as 240 stream
processors, which can be used as either pixel or
vertex shaders, thus increasing hardware
utilization.

23
Compute Unified Device Architecture (CUDA)

CUDA is a parallel programming model and software
environment designed to use GPUs for general-purpose
computing.
Stream processors on the GPU can be programmed
in C using the APIs provided.
GPUs with the G80 or a newer core architecture can
be programmed using CUDA.

24
Task Partitioning

Not all tasks can be parallelized.
Tasks done on CPU
Refining of correlation pairs
Affine Transformation
Tasks done on GPU
Phase correlation
Feature selection
Feature tracking
Image transformation

25
Data Partitioning

Each thread handles one pixel.
Each multiprocessor is allotted 64 threads (8
threads/proc)
Number of thread blocks = width * height / 64
= 5120 thread blocks
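
The block count can be checked with a short helper. This is a sketch under assumptions: the slides do not state the frame size, and the 640 x 512 used in the note below is simply one size that reproduces the 5120 blocks quoted above. The function name is illustrative.

```c
#include <assert.h>

/* Number of thread blocks needed so that every pixel gets one thread,
   rounding up when the pixel count is not a multiple of the block size. */
unsigned int num_thread_blocks(unsigned int width, unsigned int height,
                               unsigned int threads_per_block)
{
    assert(threads_per_block > 0);
    unsigned int pixels = width * height;
    return (pixels + threads_per_block - 1) / threads_per_block;
}
```

For a 640 x 512 frame, num_thread_blocks(640, 512, 64) returns 5120. The rounding up matters only for frame sizes that are not a multiple of 64 pixels, in which case the kernel also needs a bounds check for the spare threads in the last block.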

26
Libraries Used

CUFFT, the CUDA FFT library
Powers of 2 not required.
Faster when the size of the input matrix is a power of a
single factor
CUDPP, CUDA Data Parallel Primitives
Has routines for parallel sorting and parallel
reduction
Used to calculate the image mean
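
To show the access pattern behind a parallel reduction, here is a serial C sketch of a pairwise (tree) reduction used to compute a mean. It does not use or imitate the CUDPP API, and the name tree_mean is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Mean of n pixel values via pairwise (tree) reduction. The inner-loop
   iterations of each pass are independent of one another, so on a GPU each
   could be given to its own thread; here they simply run in sequence. */
double tree_mean(const float *pixels, size_t n)
{
    assert(n > 0);
    double *buf = malloc(n * sizeof *buf);
    assert(buf != NULL);
    for (size_t i = 0; i < n; ++i)
        buf[i] = pixels[i];

    /* After the pass with a given stride, buf[i] holds the sum of up to
       2 * stride consecutive inputs starting at i. */
    for (size_t stride = 1; stride < n; stride *= 2)
        for (size_t i = 0; i + stride < n; i += 2 * stride)
            buf[i] += buf[i + stride];

    double mean = buf[0] / (double)n;
    free(buf);
    return mean;
}
```

The number of passes grows with log2(n) rather than n, which is what makes the pattern attractive when each pass can run in parallel.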

27
Sample Code
texture<float, 2, cudaReadModeElementType> tex;
tex.addressMode[0] = cudaAddressModeWrap;
tex.addressMode[1] = cudaAddressModeWrap;
tex.filterMode = cudaFilterModeLinear;
tex.normalized = true; // access with normalized texture coordinates
28
Sample Code
dim3 dimBlock(8, 16, 1);
dim3 dimGrid(fwidth / dimBlock.x, fheight / dimBlock.y, 1);
CUDA_SAFE_CALL(cudaMemcpyToArray(cuarray2, 0, 0, img2,
    width * height * sizeof(float), cudaMemcpyDeviceToDevice));
registr<<<dimGrid, dimBlock, 0>>>(final, affMatD,
    min(dl.x, ul.x), min(dl.y, dr.y), fx, width, height);
29
Sample Code
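__global__ void registr(float *final, float *affMatD, int xshift,
                        int yshift, int fx, int width, int height)
{
    // output pixel handled by this thread
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x + xshift;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y + yshift;
    // scale factors for normalized texture coordinates
    float ux = 1 / (float)(width);
    float uy = 1 / (float)(height);
    ux = (affMatD[1] * y + affMatD[1 + 3] * x + affMatD[1 + 6] + 1) * ux;
    uy = (affMatD[0] * y + affMatD[0 + 3] * x + affMatD[0 + 6] + 1) * uy;
    if (ux < 1 && ux > 0 && uy < 1 && uy > 0)
        final[y * (fx * width) + x] = tex2D(tex2, ux, uy);
}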
30
Results
31
Comparison
32
Conclusions

We presented parallel implementations of image
registration and anomaly detection tools using
the GPU
Parallel code on the GPU was significantly faster
than C/IPP for all the applications
The current implementation of the registration tool is
not as robust as the existing MATLAB code
Need to survey new data structures and
algorithms for faster processing on GPUs

33
Thank you
Questions