1
Parallel Image Registration using Graphical Processing Unit
CS 387 Semester Project
Srinivasa Shivakar
Vulli Singiresu Dheeraj
2
Problem Statement
3
Image Registration
  • Image registration, the core step of image mosaicing or image
    stitching, involves estimating the transformation that maps one
    image's coordinate system onto another's
  • The transformation can include any or all of the following
      • Translation
      • Rotation
      • Scale
      • Shear
      • Warp

4
Data Description
5
Data Collection
Step-Stare Pattern
6
Multi Spectral Imagery (MSI) Data
Spectral bands shown: BLUE, GREEN, RED and NIR (near infrared)
7
Medium Wave Infrared (MWIR) Data
Each image is called a frame
8
Data files
Each file, called a Segment, consists of 21
contiguous frames
9
Motivation
10
Need to move to a real-time system
  • Existing code base is in Matlab
  • Registering a segment takes around 3-5 minutes
  • Registration is done as post-processing, long after the data is
    collected
  • The registration process must be sped up for onboard data
    processing
  • Data is collected at a rate of 2 segments (1 IR and 1 MSI) every 3 s
  • The real-time requirement is therefore 7 frames per second for each
    sensor (21 frames per segment every 3 s)
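The numbers on this slide also give a per-frame time budget and the speed-up needed over the Matlab code. The worked arithmetic below treats each sensor's stream separately, which is our reading of the slide. If a single processor must handle both the IR and the MSI stream, the budget halves.

```latex
\begin{aligned}
\text{frame rate} &= \frac{21\ \text{frames/segment}}{3\ \text{s/segment}} = 7\ \text{frames/s per sensor},\\
\text{time per frame} &= \frac{3\ \text{s}}{21\ \text{frames}} \approx 143\ \text{ms},\\
\text{required speed-up} &= \frac{180\text{--}300\ \text{s per segment (Matlab)}}{3\ \text{s per segment}} = 60\text{--}100\times .
\end{aligned}
```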

11
Registration Procedure
12
Phase correlation
  • Frequently used to estimate the translation between two images
  • Advantages
      • Performed in the frequency domain, potentially faster than
        spatial-domain processing
      • Provides an approximate translation between the image
        coordinate systems
      • Reduces the effort needed for the other estimations (rotation,
        scale, feature tracking)
  • Disadvantages
      • Not very robust; sensitive to low-frequency noise
      • Poor results when the signal-to-noise ratio is low (e.g., plain,
        featureless backgrounds)

13
Phase correlation
14
Feature selection
RX (Reed-Xiaoli anomaly detector)
Select features from the region of interest (ROI)
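RX here presumably refers to the Reed-Xiaoli detector (Reed and Yu, 1990, in the references). RX scores each pixel by its Mahalanobis distance (x − μ)ᵀ Σ⁻¹ (x − μ) from the background statistics. The C sketch below is a deliberately simplified single-band version, where that distance reduces to (v − μ)² / σ². It uses global image statistics and returns the highest-scoring pixel inside an ROI as the feature. The slide gives neither the project's band count, nor its statistics window, nor its selection rule. `rx_select_feature` is a hypothetical name.

```c
#include <stddef.h>

/* Single-band RX score: (v - mean)^2 / var, the 1-D case of the
   Reed-Xiaoli Mahalanobis distance (x - mu)^T Sigma^-1 (x - mu). */
static double rx_score(double v, double mean, double var)
{
    double d = v - mean;
    return var > 0.0 ? d * d / var : 0.0;
}

/* Return the index of the most anomalous pixel inside the ROI
   [x0, x0 + rw) x [y0, y0 + rh), using global mean and variance. */
size_t rx_select_feature(const float *img, int w, int h,
                         int x0, int y0, int rw, int rh)
{
    size_t n = (size_t)w * h;
    double sum = 0.0, sumsq = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += img[i];
        sumsq += (double)img[i] * img[i];
    }
    double mean = sum / n;
    double var = sumsq / n - mean * mean;

    size_t best = (size_t)y0 * w + x0;
    double best_score = -1.0;
    for (int y = y0; y < y0 + rh && y < h; y++)
        for (int x = x0; x < x0 + rw && x < w; x++) {
            size_t i = (size_t)y * w + x;
            double s = rx_score(img[i], mean, var);
            if (s > best_score) { best_score = s; best = i; }
        }
    return best;
}
```

Anomalous pixels make good features because they are distinctive, so the block matcher in the next step is less likely to lock onto a look-alike patch.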
15
Feature Tracking
  • Selected features are tracked in the second frame.
  • A Sum of Absolute Differences (SAD) block-matching algorithm is used.
  • This is the most computationally expensive task.
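SAD block matching compares a small block around each feature in the first frame with every candidate position in a search window of the second frame. It keeps the offset with the smallest sum of absolute pixel differences. The C sketch below is an exhaustive serial version. Block size `b`, search radius `r` and the function names are hypothetical parameters, not values from the project.

```c
#include <limits.h>
#include <stdlib.h>

/* Sum of absolute differences between the b x b block of `ref` at
   (rx, ry) and the block of `cur` at (cx, cy); both images are w wide. */
static long sad_block(const unsigned char *ref, const unsigned char *cur,
                      int w, int rx, int ry, int cx, int cy, int b)
{
    long s = 0;
    for (int j = 0; j < b; j++)
        for (int i = 0; i < b; i++)
            s += labs((long)ref[(ry + j) * w + rx + i] -
                      (long)cur[(cy + j) * w + cx + i]);
    return s;
}

/* Exhaustive search: find the offset (dx, dy) within +/- r whose block in
   `cur` best matches the block at (fx, fy) in `ref`. Candidates that fall
   outside the image are skipped. Returns the minimum SAD found. */
long sad_track(const unsigned char *ref, const unsigned char *cur,
               int w, int h, int fx, int fy, int b, int r, int *dx, int *dy)
{
    long best = LONG_MAX;
    *dx = 0;
    *dy = 0;
    for (int oy = -r; oy <= r; oy++)
        for (int ox = -r; ox <= r; ox++) {
            int cx = fx + ox, cy = fy + oy;
            if (cx < 0 || cy < 0 || cx + b > w || cy + b > h) continue;
            long s = sad_block(ref, cur, w, fx, fy, cx, cy, b);
            if (s < best) { best = s; *dx = ox; *dy = oy; }
        }
    return best;
}
```

The cost is (2r + 1)² · b² operations per feature, which is why this step dominates the run time. Each candidate offset, however, is independent of the others, so the search maps naturally onto GPU threads.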

16
Feature Tracking
17
Refining Locations
  • Bad pairs are removed to avoid misregistration.
  • A Euclidean distance measure is used to weed out bad pairs.

18
Refining Locations
19
Affine Transform
  • The affine transform is calculated from the refined location pairs by
    least squares
  • Takes into account translation, rotation, scale and shear
  • Does not account for warping, which is not handled in the current
    project
  • Singular value decomposition (SVD) is used to calculate the
    pseudo-inverse
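The usual affine model maps a point (x1, y1) to its match (x2, y2) as x2 = a·x1 + b·y1 + c, y2 = d·x1 + e·y1 + f. Stacking all refined pairs gives an overdetermined linear system, and its least-squares solution is exactly what the pseudo-inverse computes. The project obtains the pseudo-inverse with SVD. To stay short, the C sketch below swaps in the normal equations (AᵀA) p = Aᵀb, solved by Gaussian elimination. This gives the same answer for well-spread points but is less numerically robust than SVD. Function names are hypothetical.

```c
#include <math.h>

/* Solve the 3x3 system m * p = r by Gaussian elimination with partial
   pivoting (m and r are overwritten). Returns 0 on success, -1 if the
   system is (near) singular. */
static int solve3(double m[3][3], double r[3], double p[3])
{
    for (int c = 0; c < 3; c++) {
        int piv = c;
        for (int i = c + 1; i < 3; i++)
            if (fabs(m[i][c]) > fabs(m[piv][c])) piv = i;
        if (fabs(m[piv][c]) < 1e-12) return -1;
        for (int j = 0; j < 3; j++) {
            double t = m[c][j]; m[c][j] = m[piv][j]; m[piv][j] = t;
        }
        double t = r[c]; r[c] = r[piv]; r[piv] = t;
        for (int i = c + 1; i < 3; i++) {
            double f = m[i][c] / m[c][c];
            for (int j = c; j < 3; j++) m[i][j] -= f * m[c][j];
            r[i] -= f * r[c];
        }
    }
    for (int i = 2; i >= 0; i--) {  /* back substitution */
        double s = r[i];
        for (int j = i + 1; j < 3; j++) s -= m[i][j] * p[j];
        p[i] = s / m[i][i];
    }
    return 0;
}

/* Least-squares affine fit from n >= 3 point pairs via the normal
   equations. Writes {a, b, c, d, e, f} to out; returns 0 on success. */
int fit_affine(const double *x1, const double *y1, const double *x2,
               const double *y2, int n, double out[6])
{
    double ata[3][3] = {{0}}, atx[3] = {0}, aty[3] = {0};
    for (int k = 0; k < n; k++) {
        double row[3] = {x1[k], y1[k], 1.0};  /* one row of A */
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) ata[i][j] += row[i] * row[j];
            atx[i] += row[i] * x2[k];
            aty[i] += row[i] * y2[k];
        }
    }
    double m[3][3];  /* solve3 overwrites its matrix, so copy A^T A */
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) m[i][j] = ata[i][j];
    if (solve3(m, atx, out) != 0) return -1;
    if (solve3(ata, aty, out + 3) != 0) return -1;
    return 0;
}
```

Both coordinate equations share the same matrix AᵀA, so only the right-hand side differs between the two solves. SVD is preferable when the points are nearly collinear, because forming AᵀA squares the condition number.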

20
Serial Code
  • The serial version of the code uses the best available tools for
    image processing on the CPU.
  • Intel Integrated Performance Primitives (IPP) is used for low-level
    routines.
      • Provides most of the image-processing functions needed, such as
        FFT, convolution, matrix multiplication, matrix conjugate
        multiplication and element-wise division.
  • OpenCV is used for high-level functions such as GUI display and
    window handling.
  • The serial code was tested on a Pentium processor with 4 GB of RAM.

21
Parallel Code
  • The parallel version of the code is written in C/CUDA.
  • OpenGL is used for GUI display and window handling.
  • The code was tested on an Nvidia GTX 280, which has 240 stream
    processors and 1 GB of GDDR3 memory.

22
Graphics Processing Unit (GPU)
  • A graphics processing unit (GPU) is a dedicated graphics rendering
    device used to offload work from the CPU.
  • Traditionally, GPU hardware had separate vertex shaders, to process
    3D geometry, and pixel shaders, to handle scene lighting and color
    toning.
  • A modern GPU has as many as 240 unified stream processors, each of
    which can act as either a pixel or a vertex shader, increasing
    hardware utilization.

23
Compute Unified Device Architecture (CUDA)
  • CUDA is a parallel programming model and software environment
    designed to use GPUs for general-purpose computing.
  • Stream processors on the GPU can be programmed in C using the
    provided APIs.
  • GPUs with the G80 or a newer core architecture can be programmed
    using CUDA.

24
Task Partitioning
  • Not all tasks can be parallelized.
  • Tasks done on the CPU
      • Refining the correlation pairs
      • Affine transform calculation
  • Tasks done on the GPU
      • Phase correlation
      • Feature selection
      • Feature tracking
      • Image transformation

25
Data Partitioning
  • Each thread handles one pixel.
  • Each thread block, which runs on one multiprocessor, has 64 threads
    (8 threads per stream processor)
  • Number of thread blocks = width × height / 64
  • 5120 thread blocks per frame
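With one thread per pixel and 64 threads per block, the block count is the pixel count divided by 64, rounded up so that no pixel is left without a thread. The C sketch below performs that arithmetic on the host, together with the inverse mapping a kernel uses to find its pixel. Both helpers are hypothetical. Note that 5120 blocks × 64 threads = 327,680 pixels, which matches a 640 × 512 frame; that frame size is our inference, not something the slide states.

```c
/* Thread blocks needed so every pixel gets its own thread; rounds up so a
   partially filled last block covers any leftover pixels. */
unsigned long num_blocks(unsigned long width, unsigned long height,
                         unsigned long threads_per_block)
{
    unsigned long pixels = width * height;
    return (pixels + threads_per_block - 1) / threads_per_block;
}

/* Map a (block, thread) pair back to the pixel it owns. Returns 0 when the
   thread falls past the last pixel and must do nothing. */
int pixel_of_thread(unsigned long block, unsigned long thread,
                    unsigned long threads_per_block, unsigned long width,
                    unsigned long height, unsigned long *x, unsigned long *y)
{
    unsigned long idx = block * threads_per_block + thread;
    if (idx >= width * height) return 0;
    *x = idx % width;
    *y = idx / width;
    return 1;
}
```

Inside a CUDA kernel, the same index comes from `blockIdx.x * blockDim.x + threadIdx.x`. The same bounds check is needed whenever the pixel count is not a multiple of the block size.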

26
Libraries Used
  • CUFFT (CUDA FFT library)
      • Input sizes need not be powers of 2
      • Faster when the input matrix size is a power of a single factor
  • CUDPP (CUDA Data Parallel Primitives library)
      • Has routines for parallel sorting and parallel reduction
      • Used to calculate the image mean
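A parallel reduction combines partial sums in a tree rather than keeping one running total. Each pass adds pairs of elements and halves the active length, so n values are summed in about log₂ n steps. The serial C sketch below reproduces that pairwise pattern to compute an image mean. It illustrates the idea only and is not CUDPP's API; `reduce_mean` is a hypothetical name.

```c
#include <stdlib.h>

/* Image mean by pairwise (tree) reduction: each pass adds element
   i + half into element i, halving the active length. In a GPU version
   every i of a pass would be handled by a separate thread.
   Returns 0.0 if n == 0 or if memory cannot be allocated. */
double reduce_mean(const float *img, size_t n)
{
    if (n == 0) return 0.0;
    double *buf = malloc(n * sizeof *buf);
    if (!buf) return 0.0;
    for (size_t i = 0; i < n; i++) buf[i] = img[i];
    for (size_t len = n; len > 1; ) {
        size_t half = (len + 1) / 2;  /* odd lengths leave the middle alone */
        for (size_t i = 0; i + half < len; i++) buf[i] += buf[i + half];
        len = half;
    }
    double mean = buf[0] / (double)n;
    free(buf);
    return mean;
}
```

Besides exposing parallelism, pairwise summation also tends to accumulate less rounding error than a single running sum, which helps when averaging hundreds of thousands of pixels in single precision.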

27
Sample Code
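// File-scope texture reference (legacy texture-reference API of CUDA 2.x;
// deprecated in later toolkits and removed in CUDA 12)
texture<float, 2, cudaReadModeElementType> tex;
// In host code, before binding the texture:
tex.addressMode[0] = cudaAddressModeWrap;   // wrap around in x
tex.addressMode[1] = cudaAddressModeWrap;   // wrap around in y
tex.filterMode = cudaFilterModeLinear;      // bilinear interpolation
tex.normalized = true;  // access with normalized texture coordinates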
28
Sample Code
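// Host side: one thread per output pixel, 8 x 16 threads per block
// (assumes fwidth and fheight are multiples of the block dimensions)
dim3 dimBlock(8, 16, 1);
dim3 dimGrid(fwidth / dimBlock.x, fheight / dimBlock.y, 1);
// Copy the second image into the CUDA array read through the texture;
// CUDA_SAFE_CALL is the error-checking macro from the old SDK's cutil
CUDA_SAFE_CALL(cudaMemcpyToArray(cuarray2, 0, 0, img2,
                                 width * height * sizeof(float),
                                 cudaMemcpyDeviceToDevice));
registr<<<dimGrid, dimBlock, 0>>>(final, affMatD,
                                  min(dl.x, ul.x), min(dl.y, dr.y),
                                  fx, width, height);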
29
Sample Code
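__global__ void registr(float *final, float *affMatD, int xshift, int yshift,
                        int fx, int width, int height)
{
    // Output pixel handled by this thread, offset by the mosaic origin
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x + xshift;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y + yshift;
    float ux = 1 / (float)(width);
    float uy = 1 / (float)(height);
    // Map (x, y) through the 3x3 affine matrix (indices r + 3*c, i.e.
    // column-major) and scale to normalized texture coordinates
    ux = (affMatD[1] * y + affMatD[1 + 3] * x + affMatD[1 + 6] * 1) * ux;
    uy = (affMatD[0] * y + affMatD[0 + 3] * x + affMatD[0 + 6] * 1) * uy;
    // Sample the second image only where the mapped point lies inside it;
    // the output row stride is taken to be fx + width
    if (ux < 1 && ux > 0 && uy < 1 && uy > 0)
        final[y * (fx + width) + x] = tex2D(tex2, ux, uy);
}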
30
Results
31
Comparison
32
Conclusions
  • We presented parallel GPU implementations of image registration and
    anomaly detection tools
  • The parallel GPU code was significantly faster than the C/IPP code
    for all the applications
  • The current implementation of the registration tool is not as robust
    as the existing Matlab code
  • New data structures and algorithms need to be surveyed for faster
    processing on GPUs

33
Thank you
Questions
34
References
  • Elsen, E., Houston, M., Vishal, V., Darve, E.,
    Hanrahan, P., and Pande, V. 2006. N-Body
    simulation on GPUs. In Proceedings of the 2006
    ACM/IEEE Conference on Supercomputing (Tampa,
    Florida, November 11-17, 2006). SC '06. ACM,
    New York, NY, 188.
  • Griesser, A. Aug. 2005. Real-time, GPU-based
    foreground-background segmentation. Tech. Rep.
    BIWI-TR-269. Computer Vision Lab, ETH Zurich.
  • Govindaraju, N. K., Lloyd, B., Dotsenko, Y.,
    Smith, B., and Manferdelli, J. 2008. High
    performance discrete Fourier transforms on
    graphics processors. In Proceedings of the 2008
    ACM/IEEE Conference on Supercomputing (Austin,
    Texas, November 15-21, 2008). Conference on
    High Performance Networking and Computing. IEEE
    Press, Piscataway, NJ, 1-12.
  • Garcia, V., Debreuve, E., and Barlaud, M. Apr.
    2008. Fast k nearest neighbor search using GPU.
    [Online]. Available: http://arxiv.org/abs/0804.1448
  • Govindaraju, N. K., Lloyd, B., Wang, W., Lin, M.,
    and Manocha, D. 2004. Fast computation of database
    operations using graphics processors. In
    Proceedings of the 2004 ACM SIGMOD International
    Conference on Management of Data, 215-226.
  • Zitova, B., and Flusser, J. 2003. Image
    registration methods: A survey. Image and Vision
    Computing, Vol. 21, 977-1000.
  • Crosby, F. 2007. Adaptive correlation analysis
    with non-overlapping imagery indication.
    Photogrammetric Engineering and Remote Sensing,
    Vol. 73, No. 9, 1-7.
  • Reed, I. S., and Yu, X. 1990. Adaptive multiband
    CFAR detection of an optical pattern with unknown
    spectral distribution. IEEE Transactions on
    Acoustics, Speech and Signal Processing, Vol. 38.
  • Chandola, V., Banerjee, A., and Kumar, V. 2009.
    Anomaly detection: A survey. To appear in ACM
    Computing Surveys. Available:
    http://www-users.cs.umn.edu/kumar/papers/anomaly-survey.php
  • CUDA Programming Guide v2.0. [Online]. Available:
    http://developer.download.nvidia.com/compute/cuda/2_0/docs/NVIDIA_CUDA_Programming_Guide_2.0.pdf