Title: Experiences with Distributed and Parallel MATLAB on CCS
1. Experiences with Distributed and Parallel MATLAB on CCS
Daniel Goodman, Stef Salvini and Anne Trefethen
2. Who we are
- The focus of the OeRC is the development and application of new advances in computational and information technology to allow groups of researchers to tackle problems of increasing scale and complexity, facilitating interdisciplinary research and creating appropriate research infrastructure.
- The Centre supports a community of multidisciplinary researchers who are engaged in e-Research, providing suitable education and training and an interface to industry.
3. Our CCS Cluster
- 20 dual-CPU, dual-core SMP nodes with 8 GB of RAM
- 2 quad-CPU, dual-core SMP nodes with 32 GB of RAM
- Gigabit private network
- 10 terabyte file store
- Installed libraries include MS-MPI, the Intel Math Kernel Library, the Numerical Algorithms Group Windows libraries and ITK
- 32 Distributed MATLAB licenses
4. Users of the OeRC Cluster
- Financial computing: both for research and teaching, led by Prof. Mike Giles
- Zoology Department: analysing homologous recombination in bacteria
- Experiments using CCS as the backend for large Excel workbooks
- OxGrid: a Globus gateway based on software from Southampton
5. Users of the OeRC Cluster
- ClimatePrediction.net: the world's largest climate experiment
6. Users of the OeRC Cluster
- Optical Grid: high-bandwidth collaboration between Oxford and UCSD
7. What we are going to cover in this talk
- Introduce the MATLAB Distributed Computing Toolbox
- Introduce two existing MATLAB projects
- Examine the different techniques available to port these to the CCS cluster
- Our thoughts on the MATLAB Distributed Computing Toolbox
- Our thoughts on the CCS cluster
- Recommendations for improvement
8. MATLAB Distributed Computing Toolbox
- Allows instances of MATLAB to run as workers on clusters.
- These workers can be used to run a range of different styles of job (Condor-style independent tasks, message passing, global operations).
- Supports a set of distributed matrices that can be used to abstract the parallelisation from the system.
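
For simple task farms the toolbox also offers the dfeval convenience call. A minimal sketch (our illustration, not from the slides), assuming dfeval accepts a saved scheduler configuration named 'CCS', as findResource does elsewhere in this talk:

% Each cell element becomes one independent task on a worker;
% the results come back as a 3-by-1 cell array.
% 'CCS' is an assumed saved scheduler configuration.
results = dfeval(@svd, {rand(100); rand(200); rand(300)}, ...
                 'Configuration', 'CCS');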
9-12. Electron Microscope Data
First, take images of a slide from many different angles.
[Image sequence built up over four slides: the sample imaged from successive angles]
Thanks to Rick Lawrence of UCSD

13-17. Electron Microscope Data
Then, using CT-style techniques, convert these images into slices of your sample.
[Image sequence: reconstructed slices 1 through 5]
Thanks to Rick Lawrence of UCSD

18. Electron Microscope Data
[Diagram: the sweep of images and the resulting slices]
19-22. Merging Medical Information
Take different sources of information and merge them to produce a more detailed image.
[Diagram built up over four slides: a beating heart and a static heart with annotations are combined through an alignment function to give a beating heart with annotations]
Thanks to Vicente Grau of Oxford University
23. Independent Tasks
- Both problems are embarrassingly parallel, so in theory they are easily split into independent tasks.
- We used standard distributed toolbox objects to parallelise the code and executed it on the cluster, as shown on the next two slides.
- Results are returned to the client for marshalling and saving.
24Independent Tasks
- jm findResource('scheduler','configuration',CCS
') - job1 createJob(jm)
- createTask(job1, _at_projective_reconstruction_core,
1, imodfile_in, numtlts, xsize, ysize,
plane_coeffs2, blocksize, z_inc, homography_3D,
z_block_start) - f 'projective_reconstruction_core.m',
'imod_fileread_slice.m', 'mrc_head_read.m',
'mrc_read_slice.m', 'init_mrc_head.m',
'get_datumsize.m' - set(job1, 'FileDependencies', f)
25. Independent Tasks
submit(job1);
waitForState(job1, 'finished');
% Gather the output arguments of every task into a cell array
blocks = getAllOutputArguments(job1);
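
The marshalling and saving step is not shown on the slides; a hypothetical sketch, assuming the returned blocks are consecutive slabs of a single volume (names are illustrative only):

% Hypothetical marshalling (not from the talk): stack the returned
% blocks into one volume along the third dimension and save it.
volume = cat(3, blocks{:});
save('reconstruction.mat', 'volume');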
26. Analysis of Method 1
- Lack of refactoring tools and of tools to determine file dependencies makes construction from legacy code fiddly. (MATLAB)
- Limited and sometimes fiddly control of output. (MATLAB)
- Requires construction of custom submission and activation filters. (CCS)
27. Communicating Tasks
- Allow tasks to communicate so they can save the results from the nodes directly to the file system.
- Use the labSend and labReceive commands to pass a token that controls access to the file system.
- Construct code to control which tasks each node will perform based on its index, as shown on the following slides.
28Communicating Tasks
- sched findResource('scheduler',configuration',
CCS') - pjob createParallelJob(sched)
- set(pjob, 'MaximumNumberOfWorkers', 30)
- set(pjob, 'MinimumNumberOfWorkers', 15)
- f 'mrc_write_slice.m', 'imod_filewrite_slice.m'
, 'mrc_head_write.m', 'imod_filewrite_first_slice.
m', 'projective_reconstruction.m',
'imod_fileread_slice.m', 'mrc_head_read.m',
'mrc_read_slice.m', 'init_mrc_head.m',
'get_datumsize.m' - set(pjob, 'FileDependencies', f)
29. Communicating Tasks
% In a parallel job the same task runs on every worker
task = createTask(pjob, @projective_reconstruction, 0, {basename});
submit(pjob);
waitForState(pjob);
30. Communicating Tasks
% Initialise: each lab starts on the block matching its index
iblock = labindex;
while iblock <= numblocks
    % ... reconstruct block iblock ...
    % Save output: wait for the token from the lab that wrote the
    % previous block before touching the file system
    if iblock > 1
        out_ptr = labReceive(mod(labindex - 2, numlabs) + 1);
    end
    % ... write the block to the shared file store, then pass the
    % token on, unless this was the last block
    if iblock < numblocks
        labSend(out_ptr, mod(labindex, numlabs) + 1);
    end
    % Advance to this lab's next block
    iblock = iblock + numlabs;
end
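
The mod(labindex - 2, numlabs) + 1 and mod(labindex, numlabs) + 1 expressions pick out the previous and next lab in circular order, so the token travels round the ring of workers and the blocks are written to the file system strictly in sequence.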
31. Analysis of Method 2
- Same issues as before with refactoring and determining file dependencies (MATLAB)
- Lack of multi-threading wastes resources on tasks with heterogeneous execution times; this is being addressed (MATLAB)
- Again, custom submission and activation filters need to be constructed (CCS)
- Much more vulnerable to failing nodes (CCS and MATLAB)
32. Performance
- Both methods provided almost linear speedup.
- Using 30 nodes, the time to analyse the microscope data was reduced from 3.4 hours to 7 minutes, a speedup of roughly 29x.
- Using 19 nodes, the time to run the heart analysis was reduced from 5.7 hours to 18 minutes, a speedup of 19x.
33. Thoughts on MATLAB
- Easy to install
- Easy to configure
- Easy to use
- Lacks tooling for refactoring jobs out of existing code and for setting configuration parameters
- Data model needs extending
- Lack of ability to have threads sharing data wastes time and memory
34. Thoughts on CCS
- Mostly a good experience
- A few specific difficulties:
  - Submission and activation filters
  - Authentication
  - Shared folders
  - Error messages
  - Failover of the head node
35. Submission and Activation Filters
- A single executable for each makes management of multiple applications hard
- It can be hard to determine which application the user is attempting to run
- No means for the activation filter to feed back why a job was rejected
- It would be nice to have more control over job license restrictions without the use of filters
36. Authentication
- On some client machines it appears not to be possible to get the client to remember the user's password and authenticate automatically.
37. Shared Folders
- When copying large data files to nodes, the file server ceases to appear as a network resource, causing transfers to fail.
38. Error Messages
- Often when a job fails, no error message is provided to assist in debugging.
39. Failover of Head Node
- When the head node fails, it remains in its failed state indefinitely.
40. Recommendations
- Tool for picking up console output and load information from your job
- Better way of managing licenses
- Mandatory field identifying the program to be executed
- Better control of job distribution across nodes
- Make it easier to integrate legacy systems
- Include SFU and SUA
- Include more information from Active Directory in CCS Administrator
- Add more descriptive filtering to the job queue