Title: Distributed computing at the Facility level: applications and attitudes
1 Distributed computing at the Facility level: applications and attitudes
- Tom Griffin
- STFC ISIS Facility
- tom.griffin_at_stfc.ac.uk
- NOBUGS 2008, Sydney
2 Spare cycles
- Typical PC CPU usage is about 10%
- Usage minimal from 5pm to 8am
- Most desktop PCs are really fast
- Waste of energy
- How can we use (steal?) unused CPU cycles
to solve computational problems?
3 Types of Application
- CPU Intensive
- Low to moderate memory use
- Not too much file output
- Coarse grained
- Command line / batch driven
- Licensing issues?
4 Distributed computing solutions
Lots of choice: CONDOR, GridEngine, Grid MP
- Grid MP server hardware
  - Two dual-Xeon 2.8 GHz servers, RAID 10
- Software
  - Servers run Red Hat Enterprise Linux Server / DB2
  - Unlimited Windows (and other) clients
- Programming
  - Web Services interface: XML, SOAP
  - Accessed with C++, Java, C#
- Management Console
  - Web browser based
  - Can manage services, jobs, devices etc.
- Large industrial user base
  - GSK, J&J, Novartis etc.
5 Installing and Running Grid MP
- Server installation: 2 hours
- Client installation
  - Create MSI and RPM using setmsiprop: 30 seconds
  - Manual install: better security on Linux and Macs
6 Adapting a program for Grid MP
- Fairly easy to write
- Interface to grid via Web Services
- C++, Java, C#
- Think about how to split your data
- Wrap your executable
- Write the application service
- Pre and Post processing
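The "think about how to split your data" step can be sketched as below. This is a minimal illustration in Python, not Grid MP code: the function name `split_into_workunits` and the chunk size are hypothetical, and the sketch assumes the inputs are independent so each chunk can run on a different machine.

```python
from typing import List, Sequence


def split_into_workunits(items: Sequence, chunk_size: int) -> List[list]:
    """Split a sequence of independent inputs into coarse-grained chunks.

    Each returned chunk is a candidate workunit: large enough to keep a
    CPU busy for a while, small enough to spread across many machines.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [list(items[i:i + chunk_size])
            for i in range(0, len(items), chunk_size)]


if __name__ == "__main__":
    # Ten hypothetical run identifiers, grouped four to a workunit.
    runs = list(range(1, 11))
    for unit in split_into_workunits(runs, 4):
        print(unit)
```

Larger chunks mean fewer, longer workunits and less transfer overhead; smaller chunks spread the load more evenly over machines of different speeds.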
7 Package your executable
PROGRAM MODULE EXECUTABLE contains:
- Executable
- DLLs
- Standard data files
- Environment variables
Options:
- Compress?
- Encrypt?
Uploaded to, and resident on, the server
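As a rough illustration of the packaging idea, the sketch below bundles an executable and its supporting files into a single, optionally compressed archive and computes a checksum for it. It uses only Python's standard library and is not Grid MP's own packaging tool; the function name `build_package` and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
import tarfile
from pathlib import Path
from typing import Iterable


def build_package(files: Iterable[Path], archive: Path, compress: bool = True) -> str:
    """Bundle files into one tar archive and return its SHA-256 hex digest.

    Files are stored under their bare names, so the package unpacks into a
    single flat directory alongside the executable.
    """
    mode = "w:gz" if compress else "w"
    with tarfile.open(str(archive), mode) as tar:
        for f in files:
            tar.add(str(f), arcname=f.name)
    # The digest lets the receiving side check the upload arrived intact.
    return hashlib.sha256(archive.read_bytes()).hexdigest()
```

A call such as `build_package([Path("prog.exe"), Path("params.dat")], Path("pkg.tar.gz"))` (file names hypothetical) would produce one file to upload. Encryption is left out of the sketch.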
8 Create / run a job
Client side
- Datasets: Proteins, Molecules
- Packages: Pkg1, Pkg2, Pkg3, Pkg4
- Upload over https://
- Create job, generate cross product
Server side
- Workunits
- Start job
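The "generate cross product" step pairs every item of one dataset with every item of the other, and each pair becomes one workunit. A minimal sketch of that idea follows; the function `make_workunits` and the item names are hypothetical and stand in for whatever the server does internally.

```python
from itertools import product
from typing import List, Sequence, Tuple


def make_workunits(*datasets: Sequence) -> List[Tuple]:
    """Return one workunit per combination of items, one item from each dataset."""
    return list(product(*datasets))


if __name__ == "__main__":
    proteins = ["protein_1", "protein_2"]
    molecules = ["mol_A", "mol_B", "mol_C"]
    workunits = make_workunits(proteins, molecules)
    print(len(workunits))   # 2 x 3 = 6 workunits
    print(workunits[0])     # ('protein_1', 'mol_A')
```

The number of workunits is the product of the dataset sizes, so it grows quickly: 100 proteins against 1,000 molecules would give 100,000 workunits.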
9 Code examples
    // Create the job record on the server
    Mgsi.Job job = new Mgsi.Job();
    job.application_gid = app.application_gid;
    job.description = txtJobName.Text.Trim();
    job.state_id = 1;
    job.job_gid = ud.createJob(auth, job);

    // Describe a job step that runs the uploaded program
    Mgsi.JobStep js = new Mgsi.JobStep();
    js.job_gid = job.job_gid;
    js.state_id = 1;
    js.max_concurrent = 1;
    js.max_errors = 20;
    js.num_results = 1;
    js.program_gid = prog.program_gid;
10 Code examples
    // Create a data set attached to the job
    Mgsi.DataSet ds = new Mgsi.DataSet();
    ds.job_gid = job.job_gid;
    ds.data_set_name = job.description + "_ds_" + DateTime.Now.Ticks;
    ds.data_set_gid = ud.createDataSet(auth, ds);

    // Upload one data file per workunit and record it in the data set
    for (int i = 1; i <= numWorkunits.Value; i++)
    {
        FileTransfer.UploadData uploadD = ft.uploadFile(auth,
            Application.StartupPath + "\\testdata.tar");
        Mgsi.Data data = new Mgsi.Data();
        data.data_set_gid = ds.data_set_gid;
        data.index = i;
        data.file_hash = uploadD.hash;
        data.file_size = long.Parse(uploadD.size);
        datas[i - 1] = data;
    }

    ud.createDatas(auth, datas);
    // Generate the workunits from the data set
    ud.createWorkunitsFromDataSetsAsync(auth, js.job_step_gid,
        new string[] { ds.data_set_gid }, options);
11 Performance
- Famotidine form B
- 13 degrees of freedom
- P21/c, V = 1421
- Sync data to 1.64 Å
- 1 x 10^7 moves per run, 64 runs
12 Performance: 999 SA runs, full grid
- 4 days 18 hours CPU in 40 minutes elapsed time
- 317 cores from 163 devices
  - 42 Athlons, 1.6-2.2 GHz
  - 168 Core 2 Duos, 1.83 GHz
  - 36 Core 2 Quads, 2.4-2.8 GHz
  - 1 Duron @ 1.2 GHz
  - 42 P4s, 2.4-3.6 GHz
  - 27 Xeons, 2.5-3.6 GHz
[Chart: workunits against time]
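The headline figures can be turned into a speed-up with a little arithmetic. The sketch below does that calculation; the "efficiency" it prints is not a figure from the slides but a rough derived number, and it assumes the CPU time is comparable across the mixed processors, which is only approximately true on a grid of different machines.

```python
def speedup_and_efficiency(cpu_hours: float, elapsed_hours: float, cores: int):
    """Return (speed-up, efficiency) for a job spread over several cores.

    speed-up   = total CPU time / wall-clock time
    efficiency = speed-up / number of cores
    """
    speedup = cpu_hours / elapsed_hours
    return speedup, speedup / cores


if __name__ == "__main__":
    cpu_hours = 4 * 24 + 18      # 4 days 18 hours = 114 CPU hours
    elapsed_hours = 40 / 60      # 40 minutes
    speedup, efficiency = speedup_and_efficiency(cpu_hours, elapsed_hours, 317)
    print(f"speed-up {speedup:.0f}x, efficiency {efficiency:.0%}")
    # prints: speed-up 171x, efficiency 54%
```

In other words, the grid delivered roughly 171 times the throughput of a single core. An efficiency well below 100% is expected when some cores are busy with their owners' work or finish their share early.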
13 A Particular Success: McStas
- HRPD supermirror guide design
- Complex design
- Meaningful simulations take a long time
- Want to try lots of ideas
- Many runs of >200 CPU days
- Simpler model was best value
- Massive improvement in flux
- Significant cost savings
14 Problems
- McStas
- Interactions "in the wild": Symantec Anti-Virus
- Did not show up in testing
- McStas restricted to night running only
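A night-only restriction comes down to checking whether the current time falls in a window that wraps past midnight. The sketch below shows one way to write that check. The 5pm to 8am window is borrowed from the quiet period mentioned on the "Spare cycles" slide; the actual hours used for McStas, and how Grid MP enforces them, are not stated, so the function `in_night_window` and its defaults are assumptions.

```python
from datetime import time


def in_night_window(now: time,
                    start: time = time(17, 0),
                    end: time = time(8, 0)) -> bool:
    """Return True if `now` lies in [start, end), allowing a wrap past midnight."""
    if start <= end:
        # Ordinary same-day window, e.g. 09:00 to 17:00.
        return start <= now < end
    # Window wraps midnight, e.g. 17:00 to 08:00 the next morning.
    return now >= start or now < end


if __name__ == "__main__":
    for t in (time(12, 0), time(23, 0), time(3, 0)):
        print(t, in_night_window(t))
```

A job scheduler could call a check like this before dispatching work to a desktop, so that long simulations only run when the owner is unlikely to be at the machine.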
15 User Attitudes
- A range
- "Theft"
- "I'm not having that on my machine"
- First thing to get blamed
- Gaining more trust
- Evangelism by users
16 Flexibility with virtualisation
- Request to run GARefl code
- ISIS is Windows based
- Few Linux PCs
- VMware Server is freeware
- 8 hosts gave 26 cores
- More cores, more demand
- 56 real cores recruited from servers, 64-core Beowulf, 10 Mac cores
- Run Linux as a job
18 The Future
- Grid growing in power every day
  - New machines added, old ones still left on
- Electricity
  - Energy saving drive at STFC: switch machines off
  - Wake-on-LAN "magic packets"
  - Remote hibernate
- Laptops: good or bad?
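For reference, a Wake-on-LAN "magic packet" is six 0xFF bytes followed by the target machine's MAC address repeated sixteen times, usually sent as a UDP broadcast. The sketch below builds and sends one with Python's standard library. The MAC address in the usage note is a placeholder, and the broadcast address and port 9 are common defaults rather than anything taken from the STFC setup.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC such as '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("MAC address must contain 12 hex digits")
    mac_bytes = bytes.fromhex(hex_digits)   # raises ValueError on non-hex input
    # Six 0xFF bytes, then the 6-byte MAC repeated 16 times: 102 bytes in all.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str,
                      broadcast: str = "255.255.255.255",
                      port: int = 9) -> None:
    """Broadcast the magic packet on the local network (assumed defaults)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Calling `send_magic_packet("00:11:22:33:44:55")` would wake a machine with that (placeholder) address, provided Wake-on-LAN is enabled in its firmware and network adapter and the broadcast reaches its subnet.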
19 Summary
- Distributed computing: perfect for coarse-grained, CPU-intensive, "disk-lite" work
- Resources: use existing resources. Power increases with time, no need to write off assets.
- Scalable
- Not just faster: allows one to try different scenarios
- Virtualisation: Linux under Windows, Windows under Linux
- Green credentials: PCs are running anyway, better to utilise them. Can be powered down and up.
20 Acknowledgements
- ISIS Data Analysis Group: Kenneth Shankland, Damian Flannery
- STFC FBU IT Service Desk and ISIS Computing Group
- Key users: Richard Ibberson (HRPD), Stephen Holt (GARefl)
Questions?