Title: aka The Full Monte!
1aka The Full Monte!
- Optimisation of Monte Carlo codes for High Performance Computing in Radiotherapy Applications
Dr Iwan Cornelius, M.B. Flegg, C.M. Poole, Prof Christian Langton
Faculty of Science and Technology, Queensland University of Technology
Queensland Cancer Physics Collaborative
2Outline
- Introduction
- Development of a LINAC Monte Carlo model using GEANT4
- Optimisation
- Future Directions
- Conclusions
3Introduction Radiotherapy
- LINACs produce a highly controllable source of MeV photons
- Energy
- Gantry angle
- Patient position
4Introduction Radiotherapy
- LINACs produce a highly controllable source of MeV photons
- Multi Leaf Collimators (MLCs) to define arbitrarily shaped fields
5Introduction Radiotherapy
- Planning
- Patient imaged
- PTV (planning target volume) and OARs (organs at risk) contoured
- Optimisation of fields to conform dose to tumour and spare healthy tissue
- Delivery
- Fractionated
- Based on analytical calculations
- Can be inaccurate in regions of high
heterogeneity
6Monte Carlo
- What is it?
- How is it used in radiotherapy?
- Treatment plan verification
- Support new dosimetry measurements used in QA
- What tools exist?
- EGSnrc/BEAMnrc, PENELOPE, MCNPX, GEANT4
- Challenges to overcome
- Reduce Computation times (maintain accuracy)
- Code optimisation
- Variance reduction
- High Performance Computing (HPC)
- Usability
7High Performance Computing
- Monte Carlo is trivial to parallelise
- Launch identical application with unique random number generator seed
- Collate results
- Centralised Clusters
- Multiple machines, Beowulf
- Multiple CPU, Shared memory (SGI Altix)
- Cons
- Look better on paper
- Sharing resource with other users
- Often limited in number of processors; jobs wait in a queue
- Single machine, multiple processors
- Dual quad core
- With hyper-threading, the operating system sees 16 logical cores
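The launch-and-collate pattern on this slide can be sketched in a few lines. This is a toy illustration, not GEANT4 code: the "history" is just a sampled first-interaction depth, and the 10 cm mean free path, the seed values, and the function names `run_job` and `collate` are all assumptions made for the example. In practice each job would be a separate process or cluster task; here they run one after another.

```python
import math
import random

MEAN_FREE_PATH_CM = 10.0  # hypothetical value, for illustration only


def run_job(seed, n_histories):
    """Run one independent job and return (n, sum, sum of squares)."""
    rng = random.Random(seed)  # each job owns a generator with its own seed
    total = 0.0
    total_sq = 0.0
    for _ in range(n_histories):
        # Toy history: depth of the first interaction (exponential law).
        depth = rng.expovariate(1.0 / MEAN_FREE_PATH_CM)
        total += depth
        total_sq += depth * depth
    return n_histories, total, total_sq


def collate(results):
    """Merge per-job tallies into one mean and its standard error."""
    n = sum(r[0] for r in results)
    s = sum(r[1] for r in results)
    sq = sum(r[2] for r in results)
    mean = s / n
    variance = sq / n - mean * mean
    return mean, math.sqrt(variance / n)


seeds = [1001, 1002, 1003, 1004]  # one unique seed per job
results = [run_job(seed, 50_000) for seed in seeds]
mean, std_error = collate(results)
print(f"mean depth = {mean:.3f} cm +/- {std_error:.3f} cm")
```

Returning sums rather than per-job means lets the collation step weight jobs correctly even when they run different numbers of histories.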
8High Performance Computing GPGPU
- General Purpose Graphics Processing Units
- hundreds of processors on a chip
- NVIDIA Tesla C1060: PCIe, 240 cores per card, 4 GB memory
- CUDA
- Compute Unified Device Architecture
- Write kernel in C for CUDA to run on the GPU
- Copy from main memory to device memory
- Kernel executes on GPU
- Copies result back to main memory
- Great for loops
- How to Accelerate Monte Carlo codes with GPUs
- Re-engineer entire code into C for CUDA kernels
- Re-write computationally intensive portions of code into kernels using CUDA
- Calculation time doesn't scale with number of processors
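One way to read the last bullet is through Amdahl's law: if only a fraction of the run time is moved to the GPU, the serial remainder caps the overall speed-up no matter how many cores the card has. The sketch below assumes that reading; the parallel fractions are arbitrary illustrative values, and 240 is the core count quoted above for one C1060.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Overall speed-up when only part of the run time is parallelised."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


# Illustrative fractions of run time that have been ported to kernels.
for fraction in (0.05, 0.50, 0.95):
    speedup = amdahl_speedup(fraction, 240)
    print(f"parallel fraction {fraction:.2f}: speed-up {speedup:.2f}x")
```

Under this model, porting a routine that accounts for only a few percent of the run time can yield at most a few percent improvement.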
9GEANT4
- Toolkit of C++ classes
- Primary beam, geometry, physics processes, scoring
- User must create their own application based on these
- Very powerful general purpose Monte Carlo tool
- High energy physics, space physics, medical
physics, optics, radiation protection,
astrophysics
10GEANT4
- Pros
- Extremely flexible
- Time dependent geometries
- Radioactive decay, Neutron transport
- Various visualisation tools
- Cons
- Extremely flexible
- Requires proficiency with C++ programming
- Steep learning curve
- Deterrent for first time users
- Hospital based Medical Physicists with limited
research time
11The Full Monte!
- Create generic LINAC application using GEANT4
- Capable of modelling Elekta, Varian, Siemens LINACs
- Do for GEANT4 what BEAMnrc did for EGSnrc (just text inputs)
- Accurate. Verify against experimental data.
- Optimise for HPC environments (Desktop Supercomputer)
- Distribute over available CPUs
- Port to the GPU
- User interface
- Simple text-file based interface
- Graphical User Interface
- Interface with TPS
- Able to routinely verify treatment plans
12Geometry
- Varian 2100 Clinac
- Dimensions, material composition from Varian docs
- Target
- Primary Collimator
- Vacuum window
13Geometry
- Flattening filter
- Compensate for forward peaked distribution of bremsstrahlung photons
- Ionisation chamber
- Monitor total Dose delivery
14Geometry
- Jaws
- Define square fields
15Geometry
- Multi-Leaf Collimators (MLCs)
- Interleaved Tungsten leaves
- Varian Millennium
- Brad Oborn (UoW)
16Primary Beam
- Monoenergetic electron beam
- Normally incident on target
- Gaussian spread radially
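A minimal sketch of how such a primary could be sampled, assuming "Gaussian spread radially" means independent Gaussian offsets in x and y about the beam axis. The 1.0 mm sigma, the choice of -z as the beam direction, and the names `Primary` and `sample_primary` are hypothetical; the 5.85 MeV energy is the value quoted on the results slides.

```python
import random
from dataclasses import dataclass


@dataclass
class Primary:
    x_mm: float
    y_mm: float
    direction: tuple  # unit vector; normal incidence on the target
    energy_mev: float


def sample_primary(rng, energy_mev, sigma_mm):
    """Sample one monoenergetic electron with a Gaussian spot profile."""
    x = rng.gauss(0.0, sigma_mm)
    y = rng.gauss(0.0, sigma_mm)
    # Assumed convention: the beam travels along -z towards the target.
    return Primary(x, y, (0.0, 0.0, -1.0), energy_mev)


rng = random.Random(12345)
primaries = [sample_primary(rng, 5.85, 1.0) for _ in range(10_000)]
mean_x = sum(p.x_mm for p in primaries) / len(primaries)
print(f"mean x offset = {mean_x:.4f} mm over {len(primaries)} primaries")
```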
17Physics
- Photons
- Photoelectric effect
- Compton
- GammaConversion
- Electrons
- Multiple scatter
- Ionisation
- Bremsstrahlung
- Positrons
- Ditto
- Annihilation
18Scoring
- Water Phantom
- 50 cm x 50 cm x 50 cm
- Score in voxelised geometry
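Scoring in a voxelised geometry amounts to mapping each energy deposit to a voxel index and accumulating it. The sketch below assumes a phantom spanning 0 to 50 cm on each axis and a 0.5 cm voxel size; both the coordinate origin and the voxel size are assumptions, as is the class name `VoxelGrid`. Dose is energy divided by voxel mass, with water taken as 1 g/cm^3.

```python
MEV_TO_JOULE = 1.602176634e-13
WATER_DENSITY_G_PER_CM3 = 1.0


class VoxelGrid:
    def __init__(self, size_cm=50.0, n_voxels=100):
        self.size_cm = size_cm
        self.n = n_voxels
        self.voxel_cm = size_cm / n_voxels
        self.energy_mev = {}  # sparse tally: (i, j, k) -> deposited MeV

    def index(self, x_cm, y_cm, z_cm):
        """Return the (i, j, k) voxel for a point, or None if outside."""
        idx = []
        for c in (x_cm, y_cm, z_cm):
            if not 0.0 <= c < self.size_cm:
                return None
            idx.append(min(int(c / self.voxel_cm), self.n - 1))
        return tuple(idx)

    def deposit(self, x_cm, y_cm, z_cm, energy_mev):
        """Add an energy deposit to the voxel containing the point."""
        key = self.index(x_cm, y_cm, z_cm)
        if key is not None:
            self.energy_mev[key] = self.energy_mev.get(key, 0.0) + energy_mev
        return key

    def dose_gy(self, key):
        """Dose in gray: deposited energy (J) over voxel mass (kg)."""
        mass_kg = WATER_DENSITY_G_PER_CM3 * self.voxel_cm ** 3 / 1000.0
        return self.energy_mev.get(key, 0.0) * MEV_TO_JOULE / mass_kg


grid = VoxelGrid()
key = grid.deposit(25.0, 25.0, 10.2, 1.5)
grid.deposit(25.1, 25.3, 10.4, 0.5)  # lands in the same voxel
print(key, grid.dose_gy(key))
```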
19Validation / Commissioning
- Comparison with ionisation chamber measurements in a water phantom
- Scanning in x, y, z
- Dose along beam axis
20Validation Tune Electron Beam Energy
- Tuning of electron beam energy for best match
- 10 cm x 10 cm field
- Compare between 10 cm and 30 cm depths
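The tuning step can be pictured as a search over candidate energies for the depth-dose curve that best matches a reference between 10 cm and 30 cm. The sketch below uses no measured or simulated data: `toy_depth_dose` is an invented analytic stand-in for a full simulation, the reference curve is generated from the same toy model at 6.0 MeV, and the candidate list and the RMS relative difference metric are assumptions chosen only to show the search.

```python
import math


def toy_depth_dose(energy_mev, depth_cm):
    """Invented stand-in for a simulated depth-dose curve (not physics)."""
    mu = 0.1 / math.sqrt(energy_mev)
    return math.exp(-mu * depth_cm)


def rms_relative_difference(simulated, reference):
    """Root mean square of the point-by-point relative differences."""
    terms = [((s - r) / r) ** 2 for s, r in zip(simulated, reference)]
    return math.sqrt(sum(terms) / len(terms))


depths_cm = [10, 15, 20, 25, 30]  # comparison window from the slide
reference = [toy_depth_dose(6.0, d) for d in depths_cm]  # synthetic target

candidates_mev = [5.5, 5.75, 6.0, 6.25, 6.5]
scores = {}
for energy in candidates_mev:
    simulated = [toy_depth_dose(energy, d) for d in depths_cm]
    scores[energy] = rms_relative_difference(simulated, reference)

best = min(scores, key=scores.get)
print(f"best matching candidate energy: {best} MeV")
```

With real data, each candidate would require its own simulation run, so the candidate grid is usually kept coarse.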
21Results Tune Electron Beam Energy
- Comparison with ionisation chamber measurements in water
- Tuning of electron beam energy for best match
- 10 cm x 10 cm field
- Compare between 10 cm and 30 cm depths
22Results 5.85 MeV, 10 cm x 10 cm
- Within 2% agreement between 0.5 cm and 38 cm
23Results 5.85 MeV, 10 cm x 10 cm
- Within 2% agreement between 0.5 cm and 38 cm
24Results 5.85 MeV, 5 cm x 5 cm
25Results 5.85 MeV, 20 cm x 20 cm
26Results 5.85 MeV, 40 cm x 40 cm
27Optimisations
- No Optimisation
- Many photons produced will never reach the
sensitive region of the geometry
28Optimisations
- Kill zones
- Nothing fancy-pants
- Terminate histories that are unlikely to contribute to observable
- Above target
- Around primary collimator
- Relative computation time: 78%
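A kill zone is simply a region test applied at each tracking step: once a particle enters the region, its history is terminated. The sketch below is a toy version under assumed geometry; the coordinate system (target at z = 0, beam along -z), all dimensions, and the names `in_kill_zone` and `track` are hypothetical and not taken from the Varian model.

```python
import math

# Hypothetical geometry, in cm, for illustration only.
ABOVE_TARGET_Z_CM = 1.0
COLLIMATOR_Z_RANGE_CM = (-8.0, -1.0)
COLLIMATOR_OUTER_RADIUS_CM = 5.0


def in_kill_zone(x_cm, y_cm, z_cm):
    """True if the point lies above the target or outside the collimator."""
    if z_cm > ABOVE_TARGET_Z_CM:
        return True
    z_min, z_max = COLLIMATOR_Z_RANGE_CM
    radius = math.hypot(x_cm, y_cm)
    return z_min <= z_cm <= z_max and radius > COLLIMATOR_OUTER_RADIUS_CM


def track(positions):
    """Process step positions in order; stop at the first kill-zone hit."""
    processed = 0
    for position in positions:
        if in_kill_zone(*position):
            break  # terminate the history, skipping all remaining steps
        processed += 1
    return processed


history = [(0.0, 0.0, 0.0), (0.0, 0.0, -0.5), (7.0, 0.0, -3.0), (8.0, 0.0, -4.0)]
print(f"steps processed before termination: {track(history)}")
```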
29Optimisations
- Phase space files
- Some aspects of geometry don't change
- Create pre-calculated radiation field at plane
- Sample this population to conserve computation time
- Relative computation time: 38%
- 380 hrs, O(10^10)
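The phase space idea is to pay for the fixed upstream geometry once, record the particles crossing a plane, and then draw from that stored population for every later run. The sketch below keeps the records in memory and fills them with random placeholder values rather than simulated ones; the record fields and the names `build_phase_space` and `sample_phase_space` are assumptions, and a real phase space would be written to and read from a file.

```python
import random


def build_phase_space(rng, n_records):
    """Stand-in for the expensive upstream run: records at the plane."""
    records = []
    for _ in range(n_records):
        records.append({
            "x_cm": rng.gauss(0.0, 2.0),             # placeholder values,
            "y_cm": rng.gauss(0.0, 2.0),             # not simulated data
            "energy_mev": rng.uniform(0.1, 5.85),
        })
    return records


def sample_phase_space(rng, records, n_samples):
    """Draw starting particles, with replacement, from the stored records."""
    return [rng.choice(records) for _ in range(n_samples)]


rng = random.Random(2024)
phase_space = build_phase_space(rng, 1_000)   # computed once
starts = sample_phase_space(rng, phase_space, 5_000)  # reused many times
print(f"{len(starts)} starting particles drawn from {len(phase_space)} records")
```

Sampling with replacement reuses records, so the statistical quality of any downstream result is limited by the size of the stored population.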
30HPC GPU/CPU Desktop Supercomputer
- Purchase of Xenon T5 Desktop Supercomputer
- The Terminator
- 4 x C1060 Tesla cards: 960 cores!
- 2 x quad core processors
- hyper-threading
- Linux sees 16 processors
- NVIDIA Professorial partnership grant
- Awarded 3 x C1060 Tesla cards
- Research team learning CUDA
- Mark Harris, local CUDA guru
31Optimisations Parallelise on CPUs
- Message Passing Interface (MPI)
- Run identical simulation on different cores, each with a unique random number seed
- Geant4 MPImanager class
- Speed-up scales roughly linearly with number of processors
- Simulations in 24 hrs, O(10^10)
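The figures in the deck are consistent with near-ideal scaling. As a quick check, assuming the 380 hrs quoted on the phase space slide is the single-processor time and that all 16 processors seen by Linux were used (neither is stated explicitly), dividing one by the other gives roughly the 24 hrs reported here. The function name `ideal_parallel_time` is hypothetical.

```python
def ideal_parallel_time(serial_hours, n_processors):
    """Run time under perfect linear scaling, ignoring all overheads."""
    return serial_hours / n_processors


hours = ideal_parallel_time(380.0, 16)
print(f"{hours:.2f} hours")  # prints 23.75 hours
```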
32The GPU Dilemma
- 1. Re-write entire code into C for CUDA?
- C for CUDA doesn't support sophisticated data types (classes)
- O(10^6) lines of code, dozens of developers
- Wait for CUDA to catch up (?)
- 2. Create C++ wrapper classes for certain methods
- First step, random number generator
- Incorporated into GEANT4 framework via inheritance
- Implementing Mersenne Twister algorithm (hack example from CUDA SDK) to generate cache of random numbers
- Improvement of only a few percent
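The wrapper approach can be sketched as a subclass that overrides the engine's single-number method and serves values from a cache refilled in bulk. This is a structural illustration only: the class and method names are hypothetical rather than the GEANT4 interface, and the bulk refill, which on the slide would be done by a GPU kernel, is replaced here by Python's own Mersenne Twister generator.

```python
import random


class RandomEngine:
    """Hypothetical base interface: one uniform number per call."""

    def flat(self):
        raise NotImplementedError


class CachedEngine(RandomEngine):
    """Serves numbers from a cache that is refilled in bulk when empty."""

    def __init__(self, seed, cache_size=4096):
        # Python's random module uses the Mersenne Twister; it stands in
        # for the bulk generation that would run on the GPU.
        self._bulk = random.Random(seed)
        self._cache_size = cache_size
        self._cache = []
        self._next = 0
        self.refills = 0

    def _refill(self):
        self._cache = [self._bulk.random() for _ in range(self._cache_size)]
        self._next = 0
        self.refills += 1

    def flat(self):
        if self._next >= len(self._cache):
            self._refill()
        value = self._cache[self._next]
        self._next += 1
        return value


engine = CachedEngine(seed=42, cache_size=4)
values = [engine.flat() for _ in range(10)]
print(f"{len(values)} numbers served using {engine.refills} bulk refills")
```

Because the rest of the code only calls the base-class method, the cached engine can be swapped in without touching the callers.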
33Profiling!
- Great first step when optimising code
- Linux: gprof requires recompiling with profiling flags set
- MacOSX
- Profiling tool doesn't require recompile
34Conclusions
- GEANT4 LINAC application has been developed
- Specific to Varian Clinac
- Many parameters hard-coded
- Work commenced on text-file based UI commands
- Preliminary validation promising
- Optimisation
- Phase space files
- Kill zones
- MPI for parallel processing on CPUs
- Porting random number generator to GPU
35Future Directions
- Validation
- Verify dose distributions in heterogeneous phantoms
- Verify model of MLCs (irregular fields)
- Develop interface to Treatment Planning System
- Optimisation
- Re-write part of GEANT4 to run on GPU
- Interface
- User friendly text-file based commands
- Treatment Plan interface
- Implement DICOM-RT interface
36Acknowledgements
- QUT
- Scott Crowe, Tanya Kairn, Andrew Fielding
- discussion on Varian LINAC model, experimental data
- Mark Barry, Mark O'Dwyer
- discussion on CPU optimisation, High Performance Computing
- Mater Hospital, Brisbane
- Radiation Oncology Group
- UoW
- Brad Oborn
- Millennium MLC model
- GEANT4 Collaboration
- Joseph Perl (SLAC)
- discussion on visualisation / profiling
- NVIDIA
- Mark Harris