Simulation in a Distributed Computing Environment

Slides: 24

Provided by: guat

1
Simulation in a Distributed Computing
Environment

S. Guatelli (1), J. Moscicki (2), M.G. Pia (1)
(1) INFN Genova, Italy
(2) CERN, Geneva, Switzerland

2
Speed of Monte Carlo simulation

Speed of execution is often a concern in Monte
Carlo simulation
Often a trade-off between precision of the
simulation and speed of execution

Typical use cases

Semi-interactive response
Detector design
Optimisation
Oncological radiotherapy

Very long execution time
High statistics simulation
High precision simulation

Methods for faster simulation response
Fast simulation
Variance reduction techniques (event biasing)
Inverse Monte Carlo methods
Parallelisation

3
Features of this study

Geant4 application in a distributed computing
environment
Architecture
Implications on simulation applications
Environments
PC farm
GRID
Two use cases: Geant4 Advanced Examples
semi-interactive response (brachytherapy)
high statistics (medical_linac)
By-product: results for Geant4 medical applications
Quantitative study
results to be submitted for publication

4
Requirements
Architectural requirements

Transparent execution in sequential/parallel mode
Transparent execution on a PC farm and on the Grid

High statistics simulation
Semi-interactive simulation

Geant4 brachytherapy
Execution time for 20 M events: 5 hours
Goal: execution time of a few minutes

Geant4 medical_linac
Execution time for 10^9 events: 10 days
Goal: execution time of a few hours

Reference: sequential mode on a Pentium IV, 3 GHz

5
Parallel mode: local cluster / GRID

Both applications have the same computing model
a job consists of a number of independent tasks
which may be executed in parallel
result of each task is a small data packet (few
kb), which is merged as the job runs
In a cluster
computing resources are used for parallel
execution
user connects to a possibly remote cluster
input data for the job must be available on the
site
typically there is a shared file system and a
queuing system
network is fast
GRID computing uses resources from multiple
computing centres
typically there is no shared file system
(parts of) input data must be replicated in
remote sites
network connection is slower than within a cluster

6
Overview

Architectural issues
DIANE
How to dianize a Geant4 application
Performance tests
On a single CPU
On clusters
On the GRID
Conclusions
Lessons learned
Outlook

Quantitative, documented results
Publicly distributed DIANE Geant4 application
code
7
DIANE
http://cern.ch/DIANE
Developed by J. Moscicki, CERN/IT

R&D project
started in 2001 in CERN/IT with very limited
resources
collaboration with Geant4 groups at CERN, INFN,
ESA
successful prototypes running on LSF and EDG

Master-Worker architectural pattern

Parallel cluster processing
make fine tuning and customisation easy
transparently using GRID technology
application independent

8
Practical example: Geant4 simulation with analysis

Each task produces a file with histograms
The job result is the sum of histograms produced
by tasks
Master-worker model
client starts a job
workers perform tasks and produce histograms
master integrates the results
Distributed Processing for Geant4 Applications
task = N events
job = M tasks
tasks may be executed in parallel
tasks produce histograms/ntuples
task output is automatically combined (add
histograms, append ntuples)
Master-Worker Model
Master steers the execution of job, automatically
splits the job and merges the results
Worker initializes the Geant4 application and
executes macros
Client gets the results
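
The master-worker scheme above can be sketched in a few lines of Python. This is a toy illustration only, not DIANE or Geant4 code: `run_task`, `merge` and `run_job` are hypothetical names, the "simulation" is plain random sampling, the per-task seeding rule is an assumption, and local threads stand in for remote workers.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

N_BINS = 10  # histogram bins over [0, 1)

def run_task(task_id, n_events, job_seed=12345):
    """Worker side: simulate n_events and return a small histogram."""
    # Hypothetical seeding rule: one reproducible random stream per task.
    rng = random.Random(job_seed * 1_000_003 + task_id)
    histogram = [0] * N_BINS
    for _ in range(n_events):
        x = rng.random()  # stand-in for one simulated event
        histogram[min(int(x * N_BINS), N_BINS - 1)] += 1
    return histogram

def merge(total, partial):
    """Master side: add one task histogram to the running job result."""
    return [a + b for a, b in zip(total, partial)]

def run_job(n_tasks, events_per_task, n_workers=4):
    """Master: split the job into tasks, dispatch them, merge results as they arrive."""
    total = [0] * N_BINS
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(run_task, t, events_per_task) for t in range(n_tasks)]
        for future in as_completed(futures):
            total = merge(total, future.result())
    return total

if __name__ == "__main__":
    print(run_job(n_tasks=8, events_per_task=1000))
```

Because adding histograms is order-independent, the merged result does not depend on which worker finishes first, which is what allows the master to merge while the job is still running.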

9
simulation with DIANE
UML Deployment Diagram for Geant4 applications

Completely transparent to the user: same Geant4 application code
G4Simulation class is responsible for managing the simulation
manage random number seeds
Geant4 initialisation
macros to be executed in batch mode
termination
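
Seed management matters in this model because tasks that share a seed would generate identical events, and the merged statistics would be wrong. Below is a minimal sketch of one way to derive distinct, reproducible per-task seeds. `task_seed` is a hypothetical helper and the hashing scheme is an assumption for illustration; it is not taken from G4Simulation or DIANE.

```python
import hashlib

def task_seed(job_seed: int, task_index: int) -> int:
    """Derive a reproducible 64-bit seed for one task from the job seed."""
    digest = hashlib.sha256(f"{job_seed}:{task_index}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

# One seed per task for a hypothetical job of 180 tasks.
seeds = [task_seed(20050101, i) for i in range(180)]
```

Rerunning a job with the same job seed reproduces every task, while a failed task can be resubmitted on another worker without changing its random stream.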

10
Development costs

Strategy to minimise the cost of migrating a
Geant4 simulation to a distributed environment
DIANE Active Workflow framework
provides automatic communication/synchronization
mechanisms
application is glued to the framework using a
small Python module
in most cases no code changes to the original
application are required
load balancing and error recovery policies may be plugged in as simple Python functions
Transparent adaptation for Clusters/GRIDs,
shared/local file systems, shared/private queues
Development/modification of application code
original source code unmodified
addition of an interface class which binds
together application and M-W framework

The application developer is shielded from the
complexity of underlying technology via DIANE
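
To give an idea of what a policy "plugged in as a simple Python function" could look like, here is a hypothetical sketch. The function names, their arguments and the scheduling loop are invented for illustration and are not the DIANE API.

```python
def retry_up_to(max_attempts):
    """Build an error recovery policy: retry a failed task a limited number of times."""
    def policy(task_id, attempts_so_far, error):
        return attempts_so_far < max_attempts
    return policy

def run_tasks(task_ids, execute, should_retry):
    """Toy scheduler loop: run each task and ask the policy what to do on failure."""
    results, failed = {}, []
    for task_id in task_ids:
        attempts = 0
        while True:
            attempts += 1
            try:
                results[task_id] = execute(task_id)
                break
            except Exception as error:
                if not should_retry(task_id, attempts, error):
                    failed.append(task_id)
                    break
    return results, failed

# Usage: a task executor that fails once for task 2, then succeeds.
calls = {}

def flaky(task_id):
    calls[task_id] = calls.get(task_id, 0) + 1
    if task_id == 2 and calls[task_id] == 1:
        raise RuntimeError("worker lost")
    return task_id * 10

results, failed = run_tasks(range(4), flaky, retry_up_to(3))
```

The point of the design is that the scheduler loop stays fixed while the policy is swapped, so the application code itself never has to change.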
11
Test results

Performance of the execution of the dianized
Brachytherapy example
Test on a single CPU
Test on a dedicated farm (60 CPUs)
Test on a farm shared with other users (LSF,
CERN)
Test on the GRID (LCG)

Tools and libraries
Simulation toolkit: Geant4 7.0.p01
Analysis tools: AIDA 3.2.1 and PI 1.3.3
DIANE: DIANE 1.4.2
CLHEP 1.9.1.2
G4EMLOW 2.3

12
Overhead at initialisation/termination

Test on a single dedicated CPU (Intel Pentium IV, 3.00 GHz)
Study execution via DIANE w.r.t. sequential
execution
run 1 event

Standalone application: 4.6 ± 0.2 s
Application via DIANE, simulation only: 8.8 ± 0.8 s
Application via DIANE, with analysis integration: 9.5 ± 0.5 s
Overhead of about 5 s, negligible in a high statistics job
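
The quoted overhead follows from the three timings above. The short calculation below redoes the subtraction; combining the uncertainties in quadrature assumes the measurements are independent, which the slides do not state.

```python
import math

# (mean, uncertainty) in seconds, as reported on the slide
standalone = (4.6, 0.2)
diane_sim_only = (8.8, 0.8)
diane_with_analysis = (9.5, 0.5)

def difference(a, b):
    """Return a - b, with uncertainties added in quadrature (assumes independent errors)."""
    return a[0] - b[0], math.hypot(a[1], b[1])

overhead_sim, err_sim = difference(diane_sim_only, standalone)
overhead_full, err_full = difference(diane_with_analysis, standalone)

print(f"simulation only: {overhead_sim:.1f} +/- {err_sim:.1f} s")
print(f"with analysis:   {overhead_full:.1f} +/- {err_full:.1f} s")
```

The second figure, roughly 4.9 s, is the "about 5 s" overhead quoted on the slide.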
13
Overhead due to DIANE

Test on a single dedicated CPU (Intel Pentium IV, 3.00 GHz)
Study execution via DIANE w.r.t. sequential
execution

[Plot: execution time vs. number of events in the job]
The overhead of DIANE is negligible in high statistics jobs
[Plot: ratio with respect to the number of events]
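
Why the overhead vanishes at high statistics can be seen with a simple linear timing model. This is an assumed model, not a fit to the measured curves: a fixed start-up time, a fixed DIANE overhead, and a constant time per event estimated from the 20 M event sequential run (16646 s) reported elsewhere in this talk.

```python
T_INIT = 4.6            # s, standalone run of 1 event
OVERHEAD = 4.9          # s, extra time via DIANE with analysis (9.5 - 4.6)
T_EVENT = 16646 / 20e6  # s per event, rough estimate from the 20 M event run

def time_ratio(n_events):
    """Ratio of execution time via DIANE to sequential time under the linear model."""
    sequential = T_INIT + n_events * T_EVENT
    return (sequential + OVERHEAD) / sequential

for n in (1, 1_000, 100_000, 20_000_000):
    print(n, round(time_ratio(n), 4))
```

Under these assumptions the ratio is about 2 for a single event and approaches 1 as the number of events grows.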
14
Farm execution time and efficiency

Dedicated farm: 30 identical bi-processors (Pentium IV, 3 GHz)
Thanks to Regional Operation Centre (ROC) Team,
Taiwan
Thanks to Hurng-Chun Lee (Academia Sinica Grid
Computing Center, Taiwan)
Load balancing: optimisation of the number of tasks and workers

15
Optimizing the number of tasks

The job ends when all the tasks are executed in
the workers
If the job is split into a higher number of tasks, the chance that the workers finish their tasks at the same time is higher
Note: the overall time of the job is determined by the last worker to finish the last task

Example of a good job balancing
Example of a job that can be improved from a
performance point of view
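
The effect of the number of tasks on the overall job time can be illustrated with a toy scheduling model. Everything here is an assumption for illustration: `makespan` is a hypothetical function, workers pull the next task as soon as they are free, and the worker speeds are invented (one worker is half as fast as the others).

```python
import heapq

def makespan(total_events, n_tasks, worker_speeds, per_task_overhead=0.0):
    """Time at which the last worker finishes, with equal-size tasks pulled on demand."""
    events_per_task = total_events / n_tasks
    # Heap of (time at which the worker becomes free, worker index).
    free_at = [(0.0, i) for i in range(len(worker_speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_tasks):
        t, i = heapq.heappop(free_at)  # earliest available worker takes the next task
        done = t + per_task_overhead + events_per_task / worker_speeds[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish

speeds = [1.0, 1.0, 1.0, 0.5]           # events per second, invented values
coarse = makespan(4000, 4, speeds)      # one task per worker
fine = makespan(4000, 40, speeds)       # ten times more tasks
```

With one task per worker the job waits 2000 s for the slow worker; with 40 tasks the faster workers absorb more of the load and the job ends after 1200 s, close to the ideal 4000 / 3.5 ≈ 1143 s. Setting `per_task_overhead` above zero shows the opposite pressure: too many tasks and the fixed cost per task starts to dominate.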
16
Farm shared with other users
Real-life case: farm shared with other users
Execution in parallel mode on 5 workers of CERN LSF
DIANE used as intermediate layer
Preliminary!
The load of the cluster changes quickly in time
The conditions of the test are not reproducible
Highly variable performance

17
Parallel execution in a PC farm

Required production of Brachytherapy: 20 M events
20 M events in sequential mode:
16646 s (about 4 h 37 min) on an Intel Pentium IV, 3.00 GHz
The same simulation runs in 5 minutes in parallel on 56 CPUs:
appropriate for clinical usage
Similar results for Geant4 medical_linac Advanced
Example
production can become compatible with usage for
the verification of IMRT treatment planning
sequential execution requires 10 days to obtain
significant results
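
The brachytherapy figures above translate into a speed-up and a parallel efficiency as follows. The parallel time is only quoted as "5 minutes", so the result is indicative: a rounded input gives a rounded efficiency.

```python
def speedup_and_efficiency(t_sequential, t_parallel, n_cpus):
    """Speed-up = sequential time / parallel time; efficiency = speed-up per CPU."""
    speedup = t_sequential / t_parallel
    return speedup, speedup / n_cpus

# 16646 s sequential, about 5 minutes on 56 CPUs (values quoted on the slide)
speedup, efficiency = speedup_and_efficiency(16646, 5 * 60, 56)
print(f"speed-up ~ {speedup:.1f}, efficiency ~ {efficiency:.2f}")
```

Taken at face value the numbers give a speed-up of about 55 on 56 CPUs; a true parallel time of 5.5 minutes rather than 5 would lower the efficiency to about 0.90.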

18
Running on the Grid (LCG)

G4Brachy executed on the GRID (LCG)
nodes located in Spain, Russia, Italy, Germany,
Switzerland
Conditions of the test
The load of the GRID changes quickly in time
The conditions of the test are not reproducible
Efficiency
The evaluation of the efficiency with the same
criterion as in a dedicated farm does not make
much sense in this context
Study the efficiency of DIANE as automated job
management w.r.t. manual submission through
simple scripts

19
Test results
Execution on the GRID through DIANE: 20 M events, 180 tasks, 30 workers
Execution on the GRID without DIANE
[Two plots: time (seconds) vs. worker number]
Through DIANE: all the tasks are executed successfully on 22 workers; not all the workers are initialized and used (on-going investigation)
Without DIANE: 2 jobs not successfully executed due to set-up problems of the workers

20
How the GRID load changes

Execution time of Brachytherapy in two different
conditions of the GRID
DIANE used as intermediate layer

[Two plots: time (seconds) vs. worker number]
20 M events, 60 workers initialized, 360 tasks
Very different result!

21
Farm/GRID execution

Brachy, 20 M events, 180 tasks
Taipei cluster
29 machines, 734 s (about 12 minutes)
GRID
27 machines, 1517 s (about 25 minutes)

Preliminary indication: the conditions are not reproducible

22
Lessons learned

DIANE as intermediate layer
Transparency
Good separation of the subsystems
Good management of CPU resources
Negligible overhead
Load balancing
A relatively large number of tasks increases the
efficiency of parallel execution in a farm
Trade-off between optimisation of task splitting
and overhead introduced
Controlled and real-life situations are quite different in a farm
need a dedicated farm for critical usage (e.g. hospital)
Grid
highly variable environment
not mature yet for critical usage
automated management through a smart system is
mandatory
work in progress, details still to be understood
quantitatively

23
Conclusions

General approach to the execution of Geant4
simulation in a distributed computing environment
transparent sequential/parallel application
transparent execution on a local farm or on the
Grid
user code is the same
Quantitative, documented results
reference for users and for further improvement
on-going work to understand details
