Title: Distributed Monte Carlo Instrument Simulations at ISIS
1Distributed Monte Carlo Instrument Simulations at
ISIS
Tom Griffin, ISIS Facility University of
Manchester
2Introduction
- What is Distributed Computing
- The software we use
- VITESS Specifics
- McStas Specifics
- Conclusions
3What do I mean by Distributed Grid?
- A way of speeding up large, compute-intensive tasks
- Break large jobs into smaller chunks
- Send these chunks out to (distributed) machines
- Distributed machines do the work
- Collate and merge the results
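The split, distribute and merge cycle listed above can be sketched in a few lines of Python. This is only an illustration: local threads stand in for the distributed machines, and the function names (`split`, `work`, `run_distributed`) are hypothetical, not part of any grid product.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, n_chunks):
    """Break a list into n_chunks nearly equal contiguous chunks."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def work(chunk):
    """Stand-in for the compute-intensive task run on one machine."""
    return sum(x * x for x in chunk)


def run_distributed(items, n_chunks):
    chunks = split(items, n_chunks)
    # Assumption: local threads play the role of remote machines here.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(work, chunks))
    # Collate and merge the partial results.
    return sum(partials)
```

The merge step here is a plain sum because the stand-in task is additive; a real job needs a merge rule that matches its output.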
4Spare Cycles Concept
- Typical PC usage is about 10%
- Most PCs not used at all after 5pm
- Even with heavily used (Outlook, Word, IE) PCs, the CPU is still grossly underutilised
- Everyone wants a fast PC!
- Can we use (steal?) their unused CPU cycles?
- SETI@home, World Community Grid (www.worldcommunitygrid.org)
5Possible Software Implementations
- Toolkit e.g. COSM
- Low-level toolkit: source-code-level integration
- So time-consuming work for each application
- Entropia DC Grid
- Trial run at ISIS two years ago. Some success
- Company bought out and in limbo (?)
- United Devices Grid MP
- What we're currently using
- Quite expensive
- Condor
- Free (academic research project)
- In our experience two years ago, not reliable with Windows
6The United Devices System
- Server hardware
- We use two dual-Xeon servers, 280 client licenses
- Could (will) easily cope with more clients
- Software
- Servers run RedHat Linux Advanced Server / DB2
- Clients available for Windows, Linux, SPARCs and Macs
- Programming
- MGSI Web Services interface: XML, SOAP
- Accessed with C and Java classes etc
- Management Console
- Web browser based
- Can manage services, jobs, devices etc
7Visual Introduction to the Grid
8Suitable / Unsuitable Applications
- CPU Intensive
- Low to moderate memory use
- Not too much file output
- Coarse grained
- Command line / batch driven
- Licensing issues?
9Objects within the Grid
- Program
- McStas
- Job
- wish_simulation
- Jobstep
- Workunit
- sent to a Device
- Data Set
- Data
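One way to picture how these objects relate is as a small set of Python dataclasses. The containment shown here (a Program holds Jobs, a Job holds Jobsteps, a Jobstep pairs a Data Set with Workunits, one Workunit per Data) is an assumption read off the slide, and the class and field names are hypothetical rather than the Grid MP schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Data:
    name: str  # one uploaded data package


@dataclass
class DataSet:
    name: str
    datas: List[Data] = field(default_factory=list)


@dataclass
class Workunit:
    data: Data        # the package this unit processes
    device: str = ""  # filled in once the unit is sent to a Device


@dataclass
class Jobstep:
    data_set: DataSet
    workunits: List[Workunit] = field(default_factory=list)

    def generate_workunits(self):
        # Assumption: one Workunit per Data in the Data Set.
        self.workunits = [Workunit(d) for d in self.data_set.datas]
        return self.workunits


@dataclass
class Job:
    name: str  # e.g. "wish_simulation"
    steps: List[Jobstep] = field(default_factory=list)


@dataclass
class Program:
    name: str  # e.g. "McStas"
    jobs: List[Job] = field(default_factory=list)
```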
10How to write Grid Programs
- Fairly easy to write
- Interface to grid via Web Services
- So far used C, Java, Perl, Fortran, C
- Think about how to split your data and merge results
- Wrap and upload your executable
- Write the application service
- Pre and Post processing
- Use the Grid
11Wrapping Your Executable
- Executable and any DLLs etc.
- Standard data files
- Compression
- Encryption
- Capture screen output
- Set Environmental Variables
- Command Line
12Application Service
- Pre-processing
- Partition data
- Package data partitions
- Log in to the Grid server
- Create a Job and Job Step
- Create a Data Set
- Create Datas and upload data packages
- Create Workunits
- Set the Job running
- Post-Processing
- Retrieve results
- Merge results
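The order of these steps can be sketched with a toy, in-memory stand-in for the grid server. `InMemoryGrid` and its methods are hypothetical and exist only to make the sequence runnable; they are not the MGSI interface, which is reached through SOAP web services.

```python
class InMemoryGrid:
    """Hypothetical stand-in for a grid server; not the Grid MP interface."""

    def __init__(self):
        self.log = []       # records the order of the calls
        self.packages = []  # uploaded data packages

    def call(self, step, payload=None):
        self.log.append(step)
        if step == "upload_data":
            self.packages.append(payload)

    def run_workunits(self, worker):
        # Stand-in for devices processing one workunit per package.
        self.log.append("run_job")
        return [worker(p) for p in self.packages]


def application_service(values, n_parts, worker, merge, grid):
    # Pre-processing: partition the data, one package per partition.
    parts = [values[i::n_parts] for i in range(n_parts)]
    for step in ("login", "create_job", "create_jobstep", "create_data_set"):
        grid.call(step)
    for part in parts:
        grid.call("upload_data", part)
    grid.call("create_workunits")
    results = grid.run_workunits(worker)
    # Post-processing: retrieve and merge the results.
    return merge(results)
```

For example, `application_service(list(range(10)), 3, sum, sum, InMemoryGrid())` sums three partitions separately and then sums the partial results.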
13Monte Carlo Speed-up Ideas
- Two scenarios
- Single large simulation run
- Split the neutrons into smaller numbers and
execute separately - Merge results in some way
- Many smaller runs
- Parameter scan
14VITESS Splitting It
- Easy mode of operation: fixed executables and data files
- Executables held on server
- Split command line into bits: divide Ncount
- Vary the random seed
- Create data packages
- Upload data packages
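A minimal sketch of the splitting step, assuming the neutron count is divided as evenly as possible and each chunk gets its own seed. The `-n` form follows the submit program's output; treating `--Z` as the random seed option, and the exact `--Z<seed>` spelling, are assumptions. The function names are hypothetical.

```python
def split_run(total_neutrons, n_chunks, base_seed=1):
    """Return one (neutron_count, seed) pair per chunk."""
    per_chunk, remainder = divmod(total_neutrons, n_chunks)
    runs = []
    for i in range(n_chunks):
        # Spread any remainder over the first few chunks.
        count = per_chunk + (1 if i < remainder else 0)
        # A different seed per chunk keeps the chunks statistically independent.
        runs.append((count, base_seed + i))
    return runs


def chunk_options(runs):
    # Assumption: "--Z<seed>" is how the seed is written on the command line.
    return [f"-n{count} --Z{seed}" for count, seed in runs]
```

With 30,000,000 neutrons and 20 chunks this gives 20 runs of 1,500,000 neutrons each.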
15VITESS Running It
- Use GUI to create instrument, then Save As Command
- Parameter directory set to '.'
- Submit program parses bat file
- Substitutes V and P
- Removes header and footer
- Creates many new bat files with different --Zs
and
16VITESS Running It
- Submit program creates many bat files
    C:\My_GRID\VITESSE\VITESSE\build>Vitess-Submit.exe example_job example.bat req_files 20
    logging in to https://bruce.nd.rl.ac.uk:18443/mgsi/rpc_soap.fcgi as tom....
    Adding Vitesse dataset....
    Adding Vitesse datas....
    3e+007 neutrons split into 20 chunks, of -n1500000 neutrons
    Total number of Vitesse 'runs' 20
    Uploading data for run 1...
    Uploading data for run 2...
    .
    .
    Uploading data for run 19...
    Uploading data for run 20...
    Adding Vitesse datas to system....
    Adding job....
    Adding jobstep....
    Turning on automatic workunit generation....
    Closing jobstep....
    All done
    Your job_id is 4878
17VITESS Monitoring It
22VITESS Merging It
- Download the chunks
- Merge Data files
- DetectedNeutrons.dat: concatenate
- vpipes: trajectories, count rate
- Two classes of files
- 1D values: sum, then divide by the number of chunks
- 1D errors: square, sum and divide
- 2D: sum, then divide by the number of chunks
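The merge arithmetic can be sketched as follows. The slide only says "square, sum and divide" for the errors; taking the square root of the summed squares and dividing by the number of chunks (standard error propagation for the mean of independent estimates) is an assumption, as are the function names and the in-memory list layout.

```python
import math


def merge_1d(chunks):
    """chunks: list of (values, errors) pairs with identical binning."""
    n = len(chunks)
    n_bins = len(chunks[0][0])
    values, errors = [], []
    for b in range(n_bins):
        # Values: sum over chunks, divide by the number of chunks.
        values.append(sum(v[b] for v, _ in chunks) / n)
        # Errors (assumed rule): sqrt of the summed squares, divided by n.
        errors.append(math.sqrt(sum(e[b] ** 2 for _, e in chunks)) / n)
    return values, errors


def merge_2d(grids):
    """grids: list of equally shaped 2D lists; returns their element-wise mean."""
    n = len(grids)
    rows, cols = len(grids[0]), len(grids[0][0])
    return [[sum(g[r][c] for g in grids) / n for c in range(cols)]
            for r in range(rows)]
```

Averaging rather than summing fits the case where each chunk's output is already normalised to the full source intensity, so every chunk is an independent estimate of the same quantity.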
23VITESS Advantages and Problems
- Many times faster: linear increase
- Needs verification runs (x3)
- Typically 11 times faster, potentially 30 times
- A 12-hour run completes in 1 hour!
- Very large simulations reach random limits
24VITESS Some Results
176 hours 59 hours
6hrs 20mins
25McStas Splitting It
- Different executable for every run
- Executable must be uploaded at run time
- Split n into chunks
- or run many instances (parameter scan)
- Create data (and executable) packages
- Upload packages
26McStas Running It
- Use McGui to create and compile executable
- Create input file for Submit program
27McStas Running It
- Large run
- Submit program breaks up n
- Uploads new command line, data and executable
- Parameter Scan
- Send each run to a separate machine
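Both modes come down to generating one command line per machine. The sketch below assumes the compiled instrument accepts the usual McStas options `-n` (neutron count), `-s` (seed) and `-d` (output directory) followed by `name=value` parameters; the executable name, directory names and parameter names are hypothetical, and this is not the actual Submit program.

```python
def large_run(executable, ncount, n_chunks, params):
    """One command line per chunk: same parameters, different seed."""
    per_chunk = ncount // n_chunks  # any remainder is dropped in this sketch
    fixed = " ".join(f"{k}={v}" for k, v in params.items())
    return [f"{executable} -n {per_chunk} -s {seed} -d run_{seed} {fixed}"
            for seed in range(1, n_chunks + 1)]


def parameter_scan(executable, ncount, name, values, params):
    """One command line per scanned value of the parameter `name`."""
    lines = []
    for i, value in enumerate(values, start=1):
        merged = dict(params, **{name: value})  # override the scanned parameter
        args = " ".join(f"{k}={v}" for k, v in merged.items())
        lines.append(f"{executable} -n {ncount} -d scan_{i} {args}")
    return lines
```

In the large-run case the outputs are later merged into one result; in a parameter scan each run is a result in its own right, so no merging is needed.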
28McStas Merging It
- Many output files, so a separate merge program
- PGPLOT and Matlab implemented
- Very similar
- PGPLOT
- 1D intensities: sum and divide. Errors: square, sum and divide. Events: sum
- 2D intensities: sum and divide. Errors: square, sum and divide. Events: sum
- Matlab
- 1D: same maths, different format
- 2D: virtually the same
- Metadata: leave untouched
29McStas Advantages and Problems
- Security Do we trust users?
- 100 times faster?
- Linux version much faster than Windows?
- How do we merge certain fields?
- values: '1.44156e+006 10459.9 30748'
- statistics: 'X0=3.5418 dX=1.52975 Y0=0.000822474 dY=1.0288'
- Some issue related to randomness of moderator file
30Future Developments
- Expansion
- Proposal accepted for an additional 400 licenses
- Giving us a total of 480
- Change in licensing model
50k
45k
- Bottom Line Costs
- Setup, server licenses, 80 client licenses, support: 18k CMSD
50k
83k
31Conclusions
- Both VITESS and McStas run well under Grid MP
- Submit and Retrieve programs: a few hours' work
- Merge program: a bit more
- Needs to merge more output formats?
- Issues with very large simulations
- More info on Grid MP at www.ud.com