
Title: Simulation in a Distributed Computing Environment


1
Simulation in a Distributed Computing
Environment
  • S. Guatelli (1), J. Moscicki (2), M.G. Pia (1)
  • (1) INFN Genova, Italy
  • (2) CERN, Geneva, Switzerland

2
Speed of Monte Carlo simulation
  • Speed of execution is often a concern in Monte
    Carlo simulation
  • Often a trade-off between precision of the
    simulation and speed of execution

Typical use cases
  • Semi-interactive response
      • Detector design
      • Optimisation
      • Oncological radiotherapy
  • Very long execution time
      • High statistics simulation
      • High precision simulation

Methods for faster simulation response
  • Fast simulation
  • Variance reduction techniques (event biasing)
  • Inverse Monte Carlo methods
  • Parallelisation
3
Features of this study
  • Geant4 application in a distributed computing
    environment
      • Architecture
      • Implications on simulation applications
  • Environments
      • PC farm
      • GRID
  • Two use cases: Geant4 Advanced Examples
      • semi-interactive response (brachytherapy)
      • high statistics (medical_linac)
  • By-product results for Geant4 medical
    applications
  • Quantitative study
      • results to be submitted for publication

4
Requirements
Architectural requirements
  • Transparent execution in sequential/parallel mode
  • Transparent execution on a PC farm and on the Grid

Semi-interactive simulation
  • Geant4 brachytherapy
      • Execution time for 20 M events: 5 hours
      • Goal: execution time of a few minutes

High statistics simulation
  • Geant4 medical_linac
      • Execution time for 10^9 events: 10 days
      • Goal: execution time of a few hours

Reference: sequential mode on a Pentium IV, 3 GHz
5
Parallel mode: local cluster / GRID
  • Both applications have the same computing model
      • a job consists of a number of independent tasks
        which may be executed in parallel
      • the result of each task is a small data packet
        (a few kB), which is merged as the job runs
  • In a cluster
      • computing resources are used for parallel
        execution
      • the user connects to a possibly remote cluster
      • input data for the job must be available on the
        site
      • typically there is a shared file system and a
        queuing system
      • the network is fast
  • GRID computing uses resources from multiple
    computing centres
      • typically there is no shared file system
      • (parts of) input data must be replicated in
        remote sites
      • the network connection is slower than within a
        cluster

6
Overview
  • Architectural issues
  • DIANE
  • How to dianize a Geant4 application
  • Performance tests
      • On a single CPU
      • On clusters
      • On the GRID
  • Conclusions
      • Lessons learned
      • Outlook

Quantitative, documented results
Publicly distributed DIANE Geant4 application
code
7
DIANE
http://cern.ch/DIANE
Developed by J. Moscicki, CERN/IT
  • R&D project
      • started in 2001 in CERN/IT with very limited
        resources
      • collaboration with Geant4 groups at CERN, INFN,
        ESA
      • successful prototypes running on LSF and EDG

Master-Worker architectural pattern
  • Parallel cluster processing
      • makes fine tuning and customisation easy
      • transparently uses GRID technology
      • application independent

8
Practical example: Geant4 simulation with analysis
  • Each task produces a file with histograms
  • The job result is the sum of the histograms
    produced by the tasks
  • Master-worker model
      • the client starts a job
      • workers perform tasks and produce histograms
      • the master integrates the results
  • Distributed processing for Geant4 applications
      • task = N events
      • job = M tasks
      • tasks may be executed in parallel
      • tasks produce histograms/ntuples
      • task output is automatically combined (add
        histograms, append ntuples)
  • Master-Worker model
      • The master steers the execution of the job,
        automatically splits the job and merges the
        results
      • The worker initializes the Geant4 application and
        executes macros
      • The client gets the results
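
The split-and-merge model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not use DIANE or Geant4, a thread pool stands in for the worker nodes, a random number stands in for a simulated event, and the names `run_task`, `merge`, `run_job` and `N_BINS` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

N_BINS = 10  # hypothetical histogram binning


def run_task(seed, n_events):
    """Worker side: process n_events and fill one histogram (a list of bin counts)."""
    rng = random.Random(seed)              # separate random stream for each task
    histogram = [0] * N_BINS
    for _ in range(n_events):
        value = rng.random()               # stand-in for one simulated event
        histogram[min(int(value * N_BINS), N_BINS - 1)] += 1
    return histogram


def merge(total, partial):
    """Master side: add a task histogram, bin by bin, to the running total."""
    return [a + b for a, b in zip(total, partial)]


def run_job(n_tasks, events_per_task, n_workers=4, master_seed=12345):
    """Split a job into independent tasks, run them in parallel, merge the results."""
    seeds = [master_seed + i for i in range(n_tasks)]   # simple per-task seeds for the sketch
    total = [0] * N_BINS
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # results are merged as they are collected, in task order
        for histogram in pool.map(run_task, seeds, [events_per_task] * n_tasks):
            total = merge(total, histogram)
    return total


if __name__ == "__main__":
    result = run_job(n_tasks=8, events_per_task=1000)
    print(result, sum(result))
```

Because every task owns its seed and histograms are only added, the merged result does not depend on which worker ran which task.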

9
simulation with DIANE
UML Deployment Diagram for Geant4 applications
  • Completely transparent to the user: same Geant4
    application code
  • The G4Simulation class is responsible for managing
    the simulation
      • management of random number seeds
      • Geant4 initialisation
      • macros to be executed in batch mode
      • termination
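
A minimal sketch of these four responsibilities, assuming nothing about the real G4Simulation interface: `task_seeds` and `SimulationTask` are hypothetical names, no Geant4 call is made, and each step is only recorded in a log to show the order of operations. The seed helper draws without replacement, so no two tasks share a seed.

```python
import random


def task_seeds(master_seed, n_tasks):
    """Derive n_tasks distinct, reproducible seeds from one master seed."""
    rng = random.Random(master_seed)
    # random.sample draws without replacement, so the seeds are all different
    return rng.sample(range(1, 2**31 - 1), n_tasks)


class SimulationTask:
    """Hypothetical stand-in for the listed responsibilities (not the real class)."""

    def __init__(self, seed, macros):
        self.seed = seed
        self.macros = list(macros)
        self.log = []

    def run(self):
        self.log.append(f"set seed {self.seed}")    # 1. random number seed
        self.log.append("initialise")               # 2. Geant4 initialisation would go here
        for macro in self.macros:                   # 3. macros executed in batch mode
            self.log.append(f"execute {macro}")
        self.log.append("terminate")                # 4. termination
        return self.log


if __name__ == "__main__":
    seeds = task_seeds(master_seed=42, n_tasks=180)
    print(SimulationTask(seeds[0], ["run.mac"]).run())
```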

10
Development costs
  • Strategy to minimise the cost of migrating a
    Geant4 simulation to a distributed environment
  • DIANE Active Workflow framework
      • provides automatic communication/synchronization
        mechanisms
      • the application is glued to the framework using a
        small Python module
      • in most cases no code changes to the original
        application are required
      • load balancing and error recovery policies may be
        plugged in as simple Python functions
  • Transparent adaptation for Clusters/GRIDs,
    shared/local file systems, shared/private queues
  • Development/modification of application code
      • original source code unmodified
      • addition of an interface class which binds
        together the application and the M-W framework

The application developer is shielded from the
complexity of underlying technology via DIANE
11
Test results
  • Performance of the execution of the dianized
    Brachytherapy example
      • Test on a single CPU
      • Test on a dedicated farm (60 CPUs)
      • Test on a farm shared with other users (LSF,
        CERN)
      • Test on the GRID (LCG)

Tools and libraries
  • Simulation toolkit: Geant4 7.0.p01
  • Analysis tools: AIDA 3.2.1 and PI 1.3.3
  • DIANE: DIANE 1.4.2
  • CLHEP 1.9.1.2
  • G4EMLOW 2.3
12
Overhead at initialisation/termination
  • Test on a single dedicated CPU (Intel Pentium
    IV, 3.00 GHz)
  • Study execution via DIANE w.r.t. sequential
    execution
      • run 1 event

Standalone application                              4.6 ± 0.2 s
Application via DIANE, simulation only              8.8 ± 0.8 s
Application via DIANE, with analysis integration    9.5 ± 0.5 s

Overhead of about 5 s, negligible in a high statistics job
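
The arithmetic behind the overhead figure can be checked directly from the three timings. The sketch below assumes that the overhead is a fixed cost that does not grow with the number of events, and compares it with the 16646 s quoted elsewhere in this talk for 20 M events in sequential mode. The variable names are illustrative only.

```python
standalone_s = 4.6        # standalone application, 1 event
diane_sim_s = 8.8         # via DIANE, simulation only
diane_full_s = 9.5        # via DIANE, with analysis integration

overhead_sim_s = diane_sim_s - standalone_s      # simulation only
overhead_full_s = diane_full_s - standalone_s    # with analysis integration

sequential_job_s = 16646  # 20 M events in sequential mode, as quoted in the talk
relative_overhead = overhead_full_s / sequential_job_s   # assumes a fixed overhead

print(f"overhead, simulation only: {overhead_sim_s:.1f} s")
print(f"overhead, with analysis:   {overhead_full_s:.1f} s")
print(f"share of a 20 M event job: {relative_overhead:.3%}")
```

The measurement uncertainties (0.2 s to 0.8 s) are not propagated here, so the result is only an order-of-magnitude check.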
13
Overhead due to DIANE
  • Test on a single dedicated CPU (Intel Pentium
    IV, 3.00 GHz)
  • Study execution via DIANE w.r.t. sequential
    execution

Plots: execution time vs. number of events in the job;
ratio with respect to the number of events
The overhead of DIANE is negligible in high
statistics jobs
14
Farm execution time and efficiency
  • Dedicated farm: 30 identical bi-processors
    (Pentium IV, 3 GHz)
      • Thanks to the Regional Operation Centre (ROC)
        Team, Taiwan
      • Thanks to Hurng-Chun Lee (Academia Sinica Grid
        Computing Center, Taiwan)
  • Load balancing: optimisation of the number of
    tasks and workers

15
Optimizing the number of tasks
  • The job ends when all the tasks have been executed
    by the workers
  • If the job is split into a higher number of
    tasks, the chance that the workers finish their
    tasks at the same time is higher
  • Note: the overall time of the job is determined
    by the last worker to finish the last task

Figure: example of good job balancing
Figure: example of a job that can be improved from a
performance point of view
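
The effect of task granularity on the job time can be illustrated with a small scheduling model. It is a sketch under assumptions that are not in the talk: tasks have equal size, each free worker pulls the next task, per-task overhead is ignored, and the worker rates are invented (two fast and two slow workers). The function names are hypothetical.

```python
import heapq


def makespan(total_events, n_tasks, worker_rates):
    """Time at which the last worker finishes when free workers pull equal-size tasks."""
    events_per_task = total_events / n_tasks
    # heap of (time at which the worker becomes free, worker index)
    free_at = [(0.0, w) for w in range(len(worker_rates))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_tasks):
        t, w = heapq.heappop(free_at)                  # next free worker takes the next task
        done = t + events_per_task / worker_rates[w]
        finish = max(finish, done)                     # the job ends with the last task
        heapq.heappush(free_at, (done, w))
    return finish


def efficiency(total_events, n_tasks, worker_rates):
    """Ideal time for perfectly divisible work, divided by the simulated job time."""
    ideal = total_events / sum(worker_rates)
    return ideal / makespan(total_events, n_tasks, worker_rates)


rates = [1200.0, 1200.0, 600.0, 600.0]   # events/s, hypothetical worker speeds
for n_tasks in (4, 40, 400):
    time_s = makespan(20_000_000, n_tasks, rates)
    print(n_tasks, round(time_s), round(efficiency(20_000_000, n_tasks, rates), 2))
```

With one task per worker, the slow workers set the job time. With many small tasks, the fast workers absorb more of the work and the finishing times converge. A real system also pays a cost per task, which this model leaves out.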
16
Farm shared with other users
Real-life case: farm shared with other users
  • Execution in parallel mode on 5 workers of CERN LSF
  • DIANE used as intermediate layer
  • The load of the cluster changes quickly in time
  • The conditions of the test are not reproducible
Highly variable performance
Preliminary!
17
Parallel execution in a PC farm
  • Required production of Brachytherapy: 20 M events
  • 20 M events in sequential mode
      • 16646 s (about 4 h 37 min) on an Intel
        Pentium IV, 3.00 GHz
  • The same simulation runs in 5 minutes in parallel
    on 56 CPUs
      • appropriate for clinical usage
  • Similar results for the Geant4 medical_linac
    Advanced Example
      • production can become compatible with usage for
        the verification of IMRT treatment planning
      • sequential execution requires 10 days to obtain
        significant results
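
As a rough check of these figures, speedup and parallel efficiency can be computed with the usual textbook definitions (sequential time over parallel time, and speedup over number of CPUs). The transcript does not state which definition of efficiency the authors used, and "5 minutes" is a rounded value, so the result is indicative only.

```python
sequential_s = 16646      # 20 M events, sequential mode
parallel_s = 5 * 60       # "5 minutes" as quoted, a rounded figure
n_cpus = 56

speedup = sequential_s / parallel_s     # textbook definition, assumed here
efficiency = speedup / n_cpus           # fraction of the ideal linear speedup

print(f"speedup:    {speedup:.1f}")
print(f"efficiency: {efficiency:.2f}")
```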

18
Running on the Grid (LCG)
  • G4Brachy executed on the GRID (LCG)
      • nodes located in Spain, Russia, Italy, Germany,
        Switzerland
  • Conditions of the test
      • The load of the GRID changes quickly in time
      • The conditions of the test are not reproducible
  • Efficiency
      • The evaluation of the efficiency with the same
        criterion as in a dedicated farm does not make
        much sense in this context
      • Study the efficiency of DIANE as automated job
        management w.r.t. manual submission through
        simple scripts

19
Test results
Two plots of time (seconds) vs. worker number:
  • Execution on the GRID through DIANE, 20 M events,
    180 tasks, 30 workers
  • Execution on the GRID, without DIANE
Through DIANE
  • All the tasks are executed successfully on 22
    workers
  • Not all the workers are initialized and used:
    on-going investigation
Without DIANE
  • 2 jobs not successfully executed due to set-up
    problems of the workers
20
How the GRID load changes
  • Execution time of Brachytherapy in two different
    conditions of the GRID
  • DIANE used as intermediate layer

Two plots of time (seconds) vs. worker number:
20 M events, 60 workers initialized, 360 tasks
Very different result!
21
Farm/GRID execution
  • Brachy, 20 M events, 180 tasks
      • Taipei cluster: 29 machines, 734 s (about 12
        minutes)
      • GRID: 27 machines, 1517 s (about 25 minutes)

Preliminary indication: the conditions are not
reproducible
22
Lessons learned
  • DIANE as intermediate layer
      • Transparency
      • Good separation of the subsystems
      • Good management of CPU resources
      • Negligible overhead
  • Load balancing
      • A relatively large number of tasks increases the
        efficiency of parallel execution in a farm
      • Trade-off between optimisation of task splitting
        and overhead introduced
  • Controlled and real-life situations are quite
    different in a farm
      • need a dedicated farm for critical usage (e.g.
        hospital)
  • Grid
      • highly variable environment
      • not mature yet for critical usage
      • automated management through a smart system is
        mandatory
      • work in progress, details still to be understood
        quantitatively

23
Conclusions
  • General approach to the execution of Geant4
    simulation in a distributed computing environment
      • transparent sequential/parallel application
      • transparent execution on a local farm or on the
        Grid
      • user code is the same
  • Quantitative, documented results
      • reference for users and for further improvement
      • on-going work to understand details