Title: Outline
2. Outline
- Introduction
- Image Registration
- High Performance Computing
- Desired Testing Methodology
- Reviewed Registration Methods
- Preliminary Results
- Future Work
- Cool App Demo
3. Introduction
- Primary Motivation
- After some research, the scope of this project increased tenfold
4. Image Registration
- Image registration is the process of determining a spatial transformation that establishes the correspondence between two images
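To make the definition concrete, here is a minimal sketch that assumes the simplest possible case: the transformation is a whole-pixel translation, the images are small 2-D lists of intensities, and the best translation is the one that minimizes the mean squared intensity difference. The names `mean_squares` and `register_translation` and the `max_shift` parameter are hypothetical and are not taken from any of the toolkits discussed here.

```python
def mean_squares(fixed, moving, dx, dy):
    """Mean squared intensity difference with `moving` sampled at an offset of (dx, dy).

    Only pixels where the shifted sample position falls inside the image are compared.
    """
    height, width = len(fixed), len(fixed[0])
    total, count = 0.0, 0
    for y in range(height):
        for x in range(width):
            my, mx = y + dy, x + dx  # matching position in the moving image
            if 0 <= my < height and 0 <= mx < width:
                diff = fixed[y][x] - moving[my][mx]
                total += diff * diff
                count += 1
    return total / count if count else float("inf")


def register_translation(fixed, moving, max_shift=2):
    """Exhaustively search integer translations and return the best (dx, dy)."""
    candidates = [
        (dx, dy)
        for dy in range(-max_shift, max_shift + 1)
        for dx in range(-max_shift, max_shift + 1)
    ]
    return min(candidates, key=lambda t: mean_squares(fixed, moving, t[0], t[1]))


# Illustrative data: one bright pixel, displaced by 1 column and 2 rows.
fixed = [[0] * 6 for _ in range(6)]
moving = [[0] * 6 for _ in range(6)]
fixed[2][2] = 1
moving[4][3] = 1
best_shift = register_translation(fixed, moving)
```

Real registration problems use richer transformations (rigid, affine, deformable) and a proper optimizer instead of exhaustive search, which is where the cost discussed later comes from.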
5. Image Registration
- Applications of Image Registration
- Cartography
- Computer Vision
- Image Guided Surgery
- Brain Mapping
- Detection of disease-state change over time
- And many more
6. Image Registration
- Software packages, libraries, and frameworks capable of image registration
- Automated Image Registration Package (AIR)
- Insight Segmentation and Registration Toolkit (ITK)
- FLexible Image Registration Toolkit (FLIRT)
- MathWorks Image Processing Toolbox
- Others
- None currently support registration by means of parallel computing!
7. Image Registration
- Depending on the application, registration can be highly demanding of resources
- The data to be processed can be too large for physical memory (results in disk swapping)
- Large search spaces (deformable problems can get as large as, say, 9.8 × 10^6)
8. High Performance Computing
- Highly effective at relieving performance and memory bottlenecks
- Steadily decreasing prices and the increased availability of high-performance machines have made parallel computing a reality for many
- Most image registration specialists are not familiar with parallel and distributed computing techniques
- Many researchers have successfully applied such methods, but none have created a generic software module
9. High Performance Computing
- My Role
- Administer and maintain the two clusters, Nick and Optimus
- Head of the USC High Performance Computing Group
- Assist users
- Developed and (try to) maintain the HPCG webpage
10. High Performance Computing
- Systems: Nick
- HARDWARE: 76 compute nodes (dual 3.4 GHz Xeon, 2 MB L2, 4 GB RAM, 1 × 40 GB disk); 1 master node (dual 3.2 GHz Xeon, 2 MB L2, 4 GB RAM, 3 × 73 GB disks, RAID 5)
- INTERCONNECT: Topspin InfiniBand
- SOFTWARE: Platform Rocks 4 (RHEL 4), Platform LSF, OpenMPI (compiled with InfiniBand libraries), 64-bit GCC compilers, Intel compilers, Star-CD, ITK, others
- Will support starting Summer: GAMESS, NWCHEM, ...
11. High Performance Computing
- Systems: Optimus
- HARDWARE: 64 compute nodes (dual, dual-core 2.2 GHz Opteron, 2 MB L2, 8 GB RAM, 1 × 250 GB disk); 1 master node (dual, dual-core 2.2 GHz Xeon, 2 MB L2, 8 GB RAM, 2 × 500 GB disks)
- INTERCONNECT: GigE
- SOFTWARE: Fedora Core 4, ABC Management Software, OpenPBS scheduling software, OpenMPI (compiled with InfiniBand libraries), 64-bit GCC compilers, Intel compilers, ITK, others
- Will support starting Summer: GAMESS, NWCHEM, ...
12. High Performance Computing
- Message Passing
- In distributed memory systems, the most prevalent means of communication is message passing
- Message Passing Interface (MPI)
- Takes care of low-level details such as buffering, error handling, and data-type conversion
- Middleware component used in conjunction with standard programming languages like C, C++, and Fortran
13. High Performance Computing
- Issues with Multi-core [6]
- Memory Contention
- Interconnect Contention
- Program Locality
- "--mca mpi_paffinity_alone 1"
14. Desired Testing Methodology
- Research and analyze existing registration frameworks to determine if their workload can be distributed in a parallel environment
- Thoroughly test all methods sequentially and in parallel to determine speedup
- Testing in 2-D and 3-D, intermodal and intramodal, and rigid and non-rigid image registration
- Focus on intensity-based methods
- Address known multi-core issues
15. Desired Testing Methodology
- Two strategies
- Parallelizing the optimization method
- Parallelizing the metric function
16. Desired Testing Methodology
- The measure of quality will be defined using Parallel Speedup and Parallel Efficiency
- Parallel speedup is defined as
- $S_N = T_S / T_N$
- where $T_S$ is the execution time of the best sequential algorithm, and $T_N$ is the execution time on $N$ processors
- Parallel efficiency is defined as
- $E_N = S_N / N$
- where $N$ is the number of processors
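The two definitions above can be sketched as a pair of small helper functions. The function names are hypothetical, and the timings in the example are made-up illustrative numbers, not measurements from either cluster.

```python
def parallel_speedup(t_sequential, t_parallel):
    """S_N = T_S / T_N: best sequential time divided by time on N processors."""
    if t_sequential <= 0 or t_parallel <= 0:
        raise ValueError("execution times must be positive")
    return t_sequential / t_parallel


def parallel_efficiency(t_sequential, t_parallel, n_processors):
    """E_N = S_N / N: speedup normalized by the number of processors."""
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    return parallel_speedup(t_sequential, t_parallel) / n_processors


# Hypothetical example: 400 s sequentially, 125 s on 4 processors.
speedup = parallel_speedup(400.0, 125.0)           # 3.2
efficiency = parallel_efficiency(400.0, 125.0, 4)  # 0.8
```

An efficiency of 1.0 corresponds to ideal linear speedup; values below 1.0 indicate time lost to communication, load imbalance, or contention.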
17. Reviewed Registration Methods
- Warfield's approach [3]
- Cachier's demons algorithm [5] as used in [7]
- Claimed to be precise and robust, with relatively low computation time
- Structure makes it a good candidate for parallelization
- Can be divided into three main bricks
- Oversampling needed by the pyramidal approach
- Search for the matches
- Parallel Gaussian filtering
18. Reviewed Registration Methods
- Cachier's demons algorithm [5] as used in [6]
19. Reviewed Registration Methods
- Acceleration of Genetic Algorithm with Parallel Processing with Application in Medical Image Registration (B. Laksanapanai, W. Withayachumnankul, C. Pintavirooj, P. Tosranon)
- Very intriguing, but the paper is short and did not really dive into how it was implemented
20. Reviewed Registration Methods
- Distributed Registration Framework as proposed by Michael Kuhn [1]
- The metric calculation is organized in a master/slave design
- The master process is responsible for data distribution as well as communication with the existing framework
- Each slave is assigned a region of the fixed image and calculates an intermediate metric value
- The master node coordinates all steps required to collect and process the partial results and passes the final result to the registration framework
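The master/slave metric calculation described above can be sketched in plain Python. This is not Kuhn's implementation and uses no MPI: the workers are simulated by ordinary function calls, regions are assumed to be contiguous row bands of the fixed image, the metric is assumed to be mean squares, and the two images are assumed to be already aligned on the same grid. All function names are hypothetical.

```python
def split_rows(n_rows, n_workers):
    """Give each worker a contiguous (start, stop) band of image rows."""
    base, extra = divmod(n_rows, n_workers)
    regions, start = [], 0
    for worker in range(n_workers):
        stop = start + base + (1 if worker < extra else 0)
        regions.append((start, stop))
        start = stop
    return regions


def partial_sum_of_squares(fixed, moving, region):
    """Worker step: intermediate metric value (sum of squares, pixel count) for one region."""
    start, stop = region
    total, count = 0.0, 0
    for y in range(start, stop):
        for f, m in zip(fixed[y], moving[y]):
            total += (f - m) ** 2
            count += 1
    return total, count


def distributed_mean_squares(fixed, moving, n_workers):
    """Master step: distribute regions, collect partial results, combine them."""
    regions = split_rows(len(fixed), n_workers)
    partials = [partial_sum_of_squares(fixed, moving, r) for r in regions]
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count


# Illustrative data: the combined result should not depend on the worker count.
fixed = [[1, 2], [3, 4], [5, 6]]
moving = [[1, 1], [1, 1], [1, 1]]
one_worker = distributed_mean_squares(fixed, moving, 1)
two_workers = distributed_mean_squares(fixed, moving, 2)
```

Returning the pixel count along with each partial sum lets the master normalize correctly even when regions differ in size.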
21. Reviewed Registration Methods
22. Reviewed Registration Methods
- Implemented these concepts through
- DistributedImageToImageMetric
- RegistrationCommunicator
- The DistributedImageToImageMetric class is divided into master and slave, and is derived from the itkImageToImageMetric class
- RegistrationCommunicator provides an interface for all communication tasks and uses MPI
23. Reviewed Registration Methods
- The whole registration process consists of two stages: Initialization and Optimization
- Initialization: distribute data to nodes
- Optimization: optimizers in ITK are iteration-based
- During each iteration, metric values and derivatives are requested from the metric function
- When new values are required, the optimizer requests a metric from the master; the master then asks the slaves to compute the partial value associated with their fixed region and transmit it back to the master
- The master processes the results, and the cycle repeats until complete
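The request cycle above can be illustrated with a toy problem. This sketch assumes a 1-D shift estimated from two lists of corresponding sample values instead of real images, a simplified step-halving descent in place of ITK's optimizers, and function calls in place of MPI messages. Every name here is hypothetical.

```python
def worker_partial(fixed_pts, moving_pts, region, shift):
    """Slave step: partial squared-residual sum and derivative sum for one region."""
    start, stop = region
    sum_sq = sum_grad = 0.0
    for i in range(start, stop):
        residual = fixed_pts[i] + shift - moving_pts[i]
        sum_sq += residual * residual
        sum_grad += 2.0 * residual
    return sum_sq, sum_grad


def master_metric(fixed_pts, moving_pts, regions, shift):
    """Master step: ask each worker for its partial result, then combine."""
    partials = [worker_partial(fixed_pts, moving_pts, r, shift) for r in regions]
    n = len(fixed_pts)
    value = sum(p[0] for p in partials) / n
    derivative = sum(p[1] for p in partials) / n
    return value, derivative


def optimize_shift(fixed_pts, moving_pts, regions, start=0.0, step=1.0,
                   min_step=1e-3, max_iterations=200):
    """Iteration-based optimizer: one metric request to the master per iteration."""
    shift, previous_sign, iterations = start, 0, 0
    for _ in range(max_iterations):
        iterations += 1
        _, derivative = master_metric(fixed_pts, moving_pts, regions, shift)
        if derivative == 0:
            break  # exactly at the minimum
        sign = 1 if derivative > 0 else -1
        if previous_sign and sign != previous_sign:
            step /= 2.0  # direction flipped, so the last step overshot
        if step < min_step:
            break
        shift -= sign * step
        previous_sign = sign
    return shift, iterations


# Illustrative data: the moving samples are the fixed samples shifted by 2.5.
fixed_pts = [0.0, 1.0, 2.0, 3.0]
moving_pts = [2.5, 3.5, 4.5, 5.5]
regions = [(0, 2), (2, 4)]  # two workers, two samples each
estimated_shift, n_iterations = optimize_shift(fixed_pts, moving_pts, regions)
```

Because every iteration triggers a full round of partial computations, the cost of the metric and of the master/worker communication dominates the run time.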
24. Preliminary Results
- Sequential Runs: MeanSquaresImageToImageMetric
25. Preliminary Results
- Sequential Runs: MeanSquaresImageToImageMetric
                 Nick       Optimus
Best Run Time    427.7 s    522.3 s
26. Future Work
- Implement an attachable parallel image registration framework (that supports multi-core as well) for existing tools such as ITK
- Thorough testing on both clusters
- The usage of multiple cores in one node requires a new programming model
- Forms of data decomposition