Title: Using Lewis and Clark
1 Using Lewis and Clark
Bill Spollen, Division of Information Technology / Research Support Computing
Thursday, Sept. 24, 2009
http://umbc.rnet.missouri.edu/
spollenw@missouri.edu
2 Outline
- Background
- Overview of High Performance Computing at the UMBC
- Clark usage
  - Portable Batch System (PBS) and qsub
- Lewis usage
  - Load Sharing Facility (LSF) and bsub
3 Why this workshop?
- To encourage you to take advantage of resources available through the UMBC.
- Running your jobs in parallel will save you time.
- We may have tools to broaden your investigative reach.
- To show how best to use the resources.
4 http://umbc.rnet.missouri.edu
6 Some Definitions
- A process is a program in execution.
- Serial processing uses only 1 CPU.
- Parallel processing (multiprocessing) uses two or more CPUs simultaneously.
- Threads allow a process to run on more than one CPU, but only if the CPUs are all on the same computer or node.
- A cluster is a collection of interconnected computers; each node has its own OS.
7 Parallel Architectures
- Distinguished by the kind of interconnection, both between processors and between processors and memory:
  - Shared memory (A)
  - Network (B)
(Diagram: a job running on each architecture.)
8 A High Performance Computing Infrastructure
A $2M federal earmark was made to the UM Bioinformatics Consortium to obtain computers with architectures matched to the research problems in the UM system.
9 High Performance Computing Infrastructure Concept
- (1) Clark: modeling and simulations. SGI Altix 3700 BX2, 128 GB shared memory, 64 CPUs.
- (2) Lewis: general-purpose computing. Dell Linux cluster with 128 nodes, 4 CPUs per node.
- (3) York: macromolecule database searches. TimeLogic DeCypher hardware/software for streamlined searches.
- (4) 12 TB SGI TP9500 Infinite Storage disk array.
- (5) 50 TB EMC CLARiiON CX700 networked storage.
(Diagram: the compute systems attach to the TP9500 over Fibre Channel (FC); Lewis reaches the CLARiiON over InfiniBand (IB) through IBRIX Fusion.)
10 (1) SGI Altix 3700 BX2
- 64 1.5 GHz Itanium2 processors
- 128 GB NUMAlink Symmetric Multi-Processor (SMP) shared memory
- One OS image across all 64 processors
- Each processor has 28 ns access to all 128 GB RAM
clark.rnet.missouri.edu
11 (2) Dell 130-Node Dual-Core HPC Cluster
- Head node: Dell PowerEdge 2950 with 2 dual-core 2.66 GHz (Woodcrest) CPUs
- Cluster admin node: Dell Xeon 2.8 GHz
- 128 Dell PowerEdge 1850 Xeon EM64T 2.8 GHz compute nodes (512 processors)
- 640 GB RAM total (64 nodes @ 6 GB, 64 nodes @ 4 GB)
- TopSpin InfiniBand 2-tier interconnect switch
- Access to 50 TB disk storage
lewis.rnet.missouri.edu
12 (3) Sun/TimeLogic DeCypher
- 4 Sun V240 servers (UltraSPARC IIIi, 1.5 GHz, 4 processors, 4 GB)
- 8 TimeLogic G4 DeCypher FPGA engines
- TimeLogic DeCypher annotation suite (BLAST, HMM, Smith-Waterman, etc.)
- 50-1,000 times faster than clusters for some BLASTs
york.rnet.missouri.edu
13 (4) SGI TP9500 Infinite Storage Disk Array
- SGI TP9500 disk array with dual 2 Gbit controllers, 2 GB cache
- 12 TB Fibre Channel disk array (6 drawers x 14 x 146 GB disks per drawer = 2.044 TB/drawer)
- 2 fibre connections each to the Altix, Dell, and Sun systems
14 (5) EMC CLARiiON CX700 Disk Storage
- 125 x 500 GB SATA drives
- InfiniBand SAN support to Lewis
- IBRIX software manages the I/O to the disk storage from all Lewis nodes
15 Selected Software Installed
- SAS
- R
- Matlab
- Gaussian03
- NAMD
- AMBER
- CHARMM
- Octopus
- Locally developed code
- More
- NCBI Blast
- WU Blast
- HMMER
- ClustalW
- NextGen sequencing tools
- Phred, Phrap, Consed
- Oracle
- MySQL
- PGenesis
- M-Cells
16 Compilers
- Linux (lewis, clark)
  - Intel (icc, ifort): preferred; better optimized for these architectures than GNU
  - GNU (gcc, g++, g77)
  - javac
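For example, a minimal sketch of the corresponding compile commands (the source-file names are placeholders):
- icc -O2 -o myprog myprog.c        # Intel C
- ifort -O2 -o myprog myprog.f90    # Intel Fortran
- gcc -O2 -o myprog myprog.c        # GNU C
- g++ -O2 -o myprog myprog.cpp      # GNU C++
- javac MyProg.java                 # Java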
17 Some Research Areas
- Chemical structure prediction and property analysis with GAUSSIAN
- Ab initio quantum-mechanical molecular dynamics with VASP
- Simulation of large biomolecular systems with NAMD
- Molecular simulations with CHARMM/AMBER
- Statistics of microarray experiments with R
18 Clark: 128 GB SMP shared memory, one Linux OS with 64 processors
Use the Portable Batch System (PBS)!
(Diagram: CPU1, CPU2, ..., CPU63, CPU64 all sharing the 128 GB memory.)
19 Using Clark: PBS (Portable Batch System)
- clark> qsub scriptfile
- qsub submits a batch job to PBS. Submitting a PBS job specifies a task, requests resources, and sets job attributes.
- clark> cat scriptfile
- #PBS -l cput=10:00:00,ncpus=8,mem=2gb
- (note: -l gives the resource list)
- ./myProgram
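A slightly fuller serial script might look like the following sketch (the job name and program are placeholders; $PBS_O_WORKDIR is the standard PBS variable holding the submission directory):
- clark> cat scriptfile
- #PBS -N myjob
- #PBS -l cput=10:00:00,ncpus=1,mem=1gb
- cd $PBS_O_WORKDIR                     # run from the directory the job was submitted from
- ./myProgram > myProgram.log 2>&1      # capture stdout and stderr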
20 Using Clark: output
- scriptfile.onnnn (the job's standard output stream)
- scriptfile.ennnn (the job's standard error stream)
- (nnnn is the PBS job number)
21 Using Clark: PBS example 1
- clark> qsub runSAS
- 6190.clark
- clark> cat runSAS
- #PBS -l cput=1:00:00,ncpus=1,mem=1gb
- cd workingdir/
- sas test
- Output files:
- runSAS.o6190
- runSAS.e6190
- test.log
- test.lst
22 Using Clark: PBS example 2
As part of a script:
qsub -V -k n -j oe -o $PBS_path \
     -r n -z \
     -l ncpus=$PBS_ncpus \
     -l cput=$PBS_cput myprog
To learn more about qsub: clark> man qsub
23 Using Clark: qstat and queues
- clark> qstat
  Jobid  Name   User  Time Use  S  Queue
  -----  -----  ----  --------  -  -----
  6422   qcid2  fjon  60:36:20  R  long
  6432   redo1  fjon  0         Q  long
  6434   wrky4  fjon  10:03:34  R  standard
  6487   job1   cdar  05:06:10  R  standard
  6488   job23  cdar  01:34:12  R  standard
  6489   jobh2  cdar  0         Q  standard
- The long queue is for jobs > 100 h.
- 1 or 2 of your jobs can run simultaneously; only one can be in the long queue.
- Submit as many as you like; the rest wait in the queue.
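For example, a hedged sketch of sending a long-running job to the long queue (the script name is a placeholder):
- clark> qsub -q long longjob.pbs
- or, inside the script itself: #PBS -q long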
24 Using Clark: qstat -f
- clark> qstat -f 6502
- Job Id: 6502.clark
- Job_Name = Blast.rice.
- Job_Owner = mid@clark.rnet.missouri.edu
- resources_used.cpupercent = 0
- resources_used.cput = 00:00:01
- resources_used.mem = 52048kb
- resources_used.ncpus = 8
- job_state = R
- ctime = Thu Apr 19 09:55:15 2007
- .......................................
25 Using Clark: qdel
- clark> qstat 6422
  Jobid  Name   User  Time Use  S  Queue
  -----  -----  ----  --------  -  -----
  6422   qcid2  fjon  60:36:20  R  long
- clark> qdel 6422    (kills the job)
26 Using Clark: user limits
- Maximum
  - number of CPUs: 16
  - jobs running: 2
  - jobs pending: no limit
  - data storage: no limit (yet)
27 Lewis: 129 Linux OSs; the head-node OS coordinates the rest; InfiniBand connects all nodes
Use the Load Sharing Facility (LSF)!
(Diagram: the head node and 128 compute nodes (Node 1 ... Node 128), connected over InfiniBand, reach the 50 TB EMC CLARiiON CX700 networked storage through IBRIX over Fibre Channel.)
29 LSF example 1: a 1-processor program
- lewis> bsub < myJob
- lewis> cat myJob
- #BSUB -J 1Pjob
- #BSUB -oo 1Pjob.o%J
- #BSUB -eo 1Pjob.e%J
- ./myProg
N.B. -oo and -eo write the output to files, which avoids filling your mailbox with job output.
30 Using Lewis: bjobs
- lewis> bsub < myJob
- lewis> bjobs
  JOBID  USER     STAT  QUEUE  HOST   EXEC_HOST     JOB_NAME  SUB_TIME
  14070  spollen  RUN   norm   lewis  compute-20-5  myjob     Sep 18 132
31 Using Lewis: bjobs
- lewis> bjobs
- JOBID: 14070
- USER: spollenw
- STAT: RUN
- QUEUE: norm
- HOST: lewis
- EXEC_HOST: compute-20-5
- JOB_NAME: myjob
- SUB_TIME: Sep 18 132
32 Using Lewis: bjobs (-w)
- lewis> bjobs
  JOBID  USER  STAT  QUEUE  HOST   EXEC_HOST    JOB_NAME  SUB_TIME
  14070  sqx1  RUN   norm   lewis  4*compute-2  myjob     Apr 18 132
                                   4*compute-22-
                                   4*compute-20-
                                   3*compute-22-
                                   1*compute-20-
- lewis> bjobs -w
  JOBID  USER  STAT  QUEUE  HOST   EXEC_HOST                                                                         JOB_NAME  SUB_TIME
  14070  sqx1  RUN   norm   lewis  4*compute-22-28:4*compute-22-30:4*compute-20-11:3*compute-22-29:1*compute-20-30  myjob     Apr 18 132
33 Monitor job performance on Lewis
(Go to a compute node:)
lewis> lsrun -P -m compute-22-30 top

top - 10:35:46 up 65 days, 17:02, 1 user, load average: 4.00, 4.00, 4.00
Tasks: 149 total, 5 running, 144 sleeping, 0 stopped, 0 zombie
Cpu(s): 98.2% us, 1.8% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem:   4038084k total,  2421636k used,  1616448k free,   251500k buffers
Swap: 11261556k total,    26776k used, 11234780k free,  1069484k cached

  PID  USER   PR  NI  VIRT  RES   SHR   S  %CPU  %MEM  TIME+      COMMAND
14070  sqx1   25   0  247m  177m  9672  R  99.9   4.5  1509:01    BlastFull
13602  larry  25   0  318m  241m  11m   R  99.7   6.1  6958:42    namd9
18608  moe    25   0  247m  177m  9668  R  99.7   4.5  1511:03    MyProg
13573  shemp  25   0  319m  243m  11m   R  99.4   6.2  6871:15    namd9
 3055  root   16   0  153m  47m   1496  S   0.7   1.2  300:03.49  precept
. . .
(For help interpreting the output, run: man top)
34 Using Lewis: lsrun for short test runs and interactive tasks
- > lsrun command argument
- > lsrun -P command argument
- (-P runs the command with a pseudo-terminal, needed for interactive programs)
- e.g.,
- > lsrun -P vi myFile.txt
35 Using Lewis: bjobs -l 646190
spollenw@lewis> bjobs -l 646190

Job <646190>, Job Name <bust>, User <spollenw>, Project <default>, Status <RUN>, Queue <multi>,
Command <#BSUB -q multi; #BSUB -J bust; #BSUB -o bust.o%J; #BSUB -e bust.e%J; #BSUB -n 4; #BSUB -R "span[hosts=1]"; nohup make -j 4 recursive>
Wed Sep 23 11:40:45: Submitted from host <lewis>, CWD </ifs/data/dnalab/Solexa/090918_HWI-EAS313_42KW0AAXX/Data/C1-42_Firecrest1.4.0_22-09-2009_spollenw/Bustard1.4.0_22-09-2009_spollenw>, Output File <bust.o%J>, Error File <bust.e%J>, 4 Processors Requested, Requested Resources <span[hosts=1]>
Wed Sep 23 11:40:49: Started on 4 Hosts/Processors <4*compute-20-32>, Execution Home </home/spollenw>, Execution CWD </ifs/data/dnalab/Solexa/090918_HWI-EAS313_42KW0AAXX/Data/C1-42_Firecrest1.4.0_22-09-2009_spollenw/Bustard1.4.0_22-09-2009_spollenw>
Wed Sep 23 12:19:41: Resource usage collected. The CPU time used is 2637 seconds.
MEM: 352 Mbytes; SWAP: 865 Mbytes; NTHREAD: 13
PGID: 3112; PIDs: 3112 3113 3116 3117 3123 11957 11958 11961 11962 11965 11966 11969 11970

SCHEDULING PARAMETERS:
           r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem  gm_ports
loadSched   -     -    -    -   -   -   -   -   -    -    -    -
loadStop    -     -    -    -   -   -   -   -   -    -    -    -
36 LSF example 2: threaded, 1 node
- lewis> cat myJob
- #BSUB -J thrdjob
- . . . . . . . . .
- #BSUB -n 4
- #BSUB -R "span[hosts=1]"
- (-R gives a resource requirement; span[hosts=1] keeps all 4 slots on one node; a complete version is sketched below)
- ./myProg
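A complete version of this threaded script might look like the following sketch (the elided lines above are assumed to be the usual output directives; the job name, file names, and program are placeholders):
- #BSUB -J thrdjob
- #BSUB -oo thrdjob.o%J
- #BSUB -eo thrdjob.e%J
- #BSUB -n 4
- #BSUB -R "span[hosts=1]"
- ./myProg      # myProg itself must be multithreaded (e.g., OpenMP or pthreads) to use the 4 CPUs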
37 LSF example 3: parallel programs, things to know
- Lewis uses the Message Passing Interface (MPI) to communicate between nodes.
- Parallel programs run over either
  - the InfiniBand interconnect (preferred), or
  - the TCP/IP network connection.
38 LSF example 3: MPI with InfiniBand
- #BSUB -a mvapich
- (-a names specific application requirements; here, a program compiled for InfiniBand)
- #BSUB -J jobname
- . . . . . . . . . . . . . . . . . . . .
- Set the number of CPUs:
- #BSUB -n 16
- mpirun.lsf ./mpi_program
39 LSF example 4: MPI, but pre-compiled for TCP/IP
- #BSUB -a mpichp4
- (note: the program here is compiled for TCP/IP; a full script combining these directives is sketched below)
- #BSUB -J jobname
- . . . . . . . . . .
- #BSUB -n 16
- mpirun.lsf ./mpi_program
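Putting examples 3 and 4 together, a complete MPI job script might look like this sketch (the job and program names are placeholders; swap mvapich for mpichp4 if the binary was built for TCP/IP):
- #BSUB -a mvapich
- #BSUB -J mpi_job
- #BSUB -oo mpi_job.o%J
- #BSUB -eo mpi_job.e%J
- #BSUB -n 16
- mpirun.lsf ./mpi_program
Submit it with: lewis> bsub < mpiJob (where mpiJob is the file holding the lines above).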
40 Job arrays for multiple inputs
- (for command-line input)
- bsub -J "myArray[1-100]" -o %J.output.%I ./myProgram file.\$LSB_JOBINDEX
- where the input files are numbered file.1, file.2, ..., file.100
41 Job arrays for multiple inputs
- (for standard input)
- bsub -J "myArray[1-100]" -o %J.output.%I -i file.%I ./myProgram
- where the input files are numbered file.1, file.2, ..., file.100
42 Conditional jobs: bsub -w
- bsub -w 'done("myArray[1-100]")' -J collate ./collateData
- (conditions include done, ended, exit, external, numdone, numended, numexit, numhold, numpend, numrun, numstart, post_done, post_err, started)
43 gocomp for interactive sessions with a GUI
- spollenw@lewis> gocomp
- Logging on to interactive compute node...
- spollenw@compute-20-5>
- LSF does not schedule jobs to this node; it is free for interactive work.
- Some programs used interactively: MATLAB, MapMan, Genesis
44 MATLAB
- Batch job (32 licenses available) or interactive mode
- Graphical or non-graphical
45 MATLAB: Batch, Multi-CPU, Non-Graphical Use
- lewis> cat firstscript
- #BSUB -J myjob
- #BSUB -n 1
- #BSUB -R "rusage[matlab=1:duration=1]"
- #BSUB -oo myjob.o%J
- #BSUB -eo myjob.e%J
- matlab -nodisplay -r MyMATLABscript
- lewis> bsub < firstscript
N.B. Only one CPU is requested here; the additional CPUs are requested inside the MATLAB script (next slide).
46 MATLAB: Multi-CPU, Non-Graphical Use
- lewis> cat MyMATLABscript.m
- sched = findResource('scheduler', 'configuration', 'lsf')
- set(sched, 'configuration', 'lsf')
- set(sched, 'SubmitArguments', '-R "rusage[mdce=3]"')
- job = createJob(sched)
- createTask(job, @sum, 1, {[1 1]})
- createTask(job, @sum, 1, {[2 2]})
- createTask(job, @sum, 1, {[3 3]})
- submit(job)
- waitForState(job, 'finished')
- results = getAllOutputArguments(job)
47 MATLAB: Graphical Use, 1 CPU
- An X-windowing system is needed for the MATLAB window to display. See the UMBC web site for instructions on downloading and installing the Cygwin X server.
- After opening an X window, type:
- lewis> gocomp
- lewis> matlab
48 Using Lewis: the short, normal, and multi queues
- short jobs have higher priority than normal jobs but are killed after 15 minutes.
- In the script: #BSUB -q short
- Or: lewis> bsub -q short < scriptfile
- More CPUs for non-MPI jobs with the multi queue:
- bsub -q multi < scriptfile
49 Using Lewis: Intel compiling of programs for MPI
- mpicc.i - to compile C programs
- mpiCC.i - to compile C++ programs
- mpif77.i - to compile FORTRAN 77 programs
- mpif90.i - to compile FORTRAN 90 programs
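For example, a hedged sketch of the compile step (hello.c, hello.cpp, and hello.f90 are placeholder sources); the resulting binary is then launched with mpirun.lsf as in examples 3 and 4:
- lewis> mpicc.i -O2 -o hello_c hello.c
- lewis> mpiCC.i -O2 -o hello_cpp hello.cpp
- lewis> mpif90.i -O2 -o hello_f90 hello.f90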
50 Using Lewis: processor limits
- Maximum number of
  - cores in running jobs: 48
  - cores in pending jobs: 200
- Do not submit jobs requiring more than 248 cores in total, or those in the PEND state will never progress!
51 Using Lewis: memory limits
- For large memory requirements (> 900 MB), use the resource specification string:
- #BSUB -R "rusage[mem=nnnn]"
- nnnn is in MB and is per node.
- Maximum available on any node: 5,700 MB
- If a job spans multiple nodes, each node must have nnnn MB available.
52 Storage through Lewis
- 50 TB EMC CLARiiON CX700 networked storage, divided as:
  - 5 TB for home directories (2.5 GB/user)
  - 15 TB for data directories (50 GB/user): uid@lewis ./data
  - 14 TB paid for by a grant and dedicated to that project
  - 13 TB for backup and future needs
53 Using Lewis: storage limits
- 2.5 GB in the home directory.
- 50 GB soft quota under <userid>/data.
- 55 GB hard quota: no further writing to files.
- The EMC CLARiiON storage is not backed up; a deleted file cannot be retrieved.
- The EMC is a RAID 5 design, so if one disk fails the data are still available.
- The data are viewable only by the user unless he/she changes the permissions on the directory (see the sketch below).
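For example, a minimal sketch of opening a data directory to members of your group (the ~/data path stands in for the data directory described above; check with support before sharing data):
- lewis> chmod g+rx ~/data             # let group members enter and list the directory
- lewis> chmod -R g+rX ~/data/shared   # let them read files under a subdirectory (capital X adds execute to directories only)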
54 Checkpointing on Lewis and Clark
- Checkpointing is not supported on either system.
- However, some programs (e.g., GAUSSIAN) come with their own checkpointing options, which can be used.
55 Questions?
- Any questions about the high performance computing equipment and its use can be sent to support@rnet.missouri.edu
56 http://umbc.rnet.missouri.edu