Title: Using Lewis and Clark
1. Using Lewis and Clark
William G. Spollen
Division of Information Technology / Research Support Computing
Thursday, Nov. 1, 2007
http://umbc.rnet.missouri.edu/
spollenw@missouri.edu
2. Outline
- Background
- Overview of High Performance Computing at the UMBC
- Clark usage
- Portable Batch Submission and qsub
- Lewis usage
- Load Sharing Facility and bsub
3. Why this workshop?
- To encourage you to take advantage of resources available through the UMBC.
- Running your jobs in parallel will save you time.
- We may have tools to broaden your investigative reach.
- To show how best to use the resources.
5. Please note that each user must have their own account. Do not use your advisor's, a friend's, or anyone else's account.
6. Some Definitions
- A process is a program in execution.
- Serial processing uses only 1 CPU.
- Parallel processing (multiprocessing) uses two or more CPUs simultaneously.
- A cluster is a collection of interconnected computers, but each node has its own OS.
- Threads allow a process to run on more than one CPU, but only if the CPUs are all on the same computer or node.
7. Parallel Architectures
- Distinguished by the kind of interconnection, both between processors and between processors and memory:
- Shared memory
- Network
(Diagram: a job shown running on (A) a shared-memory system and (B) a networked system)
8. A High Performance Computing Infrastructure
A $2M Federal Earmark was made to the UM Bioinformatics Consortium to obtain computers with architectures matched to the research problems in the UM system.
9. High Performance Computing Infrastructure Concept
(1) Clark - Modeling and Simulations: SGI Altix 3700 BX2, 128 GB shared memory, 64 CPUs
(2) Lewis - General Purpose Computing: Dell Linux cluster with 128 nodes, 4 CPUs per node
(3) York - Macromolecule Database Searches: TimeLogic DeCypher hardware/software for streamlined searches
(4) 12 TB SGI TP9500 Infinite Storage Disk Array (Fibre Channel connections to the compute systems)
(5) 50 TB EMC CLARiiON CX700 Networked Storage (InfiniBand connections to Lewis, managed by IBRIX Fusion)
10. (1) SGI Altix 3700 BX2
- 64 1.5 GHz Itanium2 processors
- 128 GB NUMAlink Symmetric Multi-Processor (SMP) shared memory
- One OS image across all 64 processors
- Each processor has 28 ns access to all 128 GB RAM
clark.rnet.missouri.edu
11. (2) Dell 130-Node Dual-Core HPC Cluster
- Woodcrest head node: 2 Dell Dual-Core 2950 2.66 GHz CPUs
- Dell Xeon 2.8 GHz cluster admin node
- 128 Dell PowerEdge 1850 Xeon EM64T 2.8 GHz compute nodes (512 processors)
- 640 GB RAM (64 nodes @ 6 GB, 64 nodes @ 4 GB)
- TopSpin InfiniBand 2-tier interconnect switch
- Access to 50 TB disk storage
lewis.rnet.missouri.edu
12. (3) Sun/TimeLogic DeCypher
- 4 Sun V240 servers (UltraSPARC IIIi, 1.5 GHz, 4 processors, 4 GB)
- 8 TimeLogic G4 DeCypher FPGA engines
- TimeLogic DeCypher Annotation Suite (BLAST, HMM, Smith-Waterman, etc.)
- 50-1,000 times faster than clusters for some BLASTs
york.rnet.missouri.edu
13. (4) SGI TP9500 Infinite Storage Disk Array
- SGI TP9500 disk array with dual 2 Gbit controllers, 2 GB cache
- 12 TB Fibre Channel disk array (6 drawers, 14 146-GB disks/drawer, 2.044 TB/drawer)
- 2 fibre connections each to the Altix, Dell, and Sun systems
14. (5) EMC CLARiiON CX700 Disk Storage
- 125 × 500 GB SATA drives
- InfiniBand SAN support to Lewis
- IBRIX software is used to manage the I/O to the disk storage from all Lewis nodes
15. Selected Software Installed
- SAS
- R
- Matlab
- Oracle
- MySQL
- PGenesis
- M-Cells
- Phred, Phrap, Consed
- Locally developed code
- More
- Gaussian03
- NAMD
- AMBER
- CHARMM
- Octopus
- NCBI Blast
- WU Blast
- MSA
- ClustalW
16. Compilers
- Linux (lewis, clark):
- Intel (icc, ifort): usually preferred; better optimized for the architecture than GNU
- GNU (gcc, g++, g77)
- javac
17. Some Research Areas
- Chemical structure prediction and property analysis with GAUSSIAN
- Ab initio quantum-mechanical molecular dynamics with VASP
- Simulation of large biomolecular systems with NAMD
- Molecular simulations with CHARMM/AMBER
- Statistics of microarray experiments with R
18. Clark: 128 GB SMP shared memory, one Linux OS with 64 processors
Use the Portable Batch System (PBS)!!!
(Diagram: CPU1 through CPU64 all sharing the 128 GB memory)
19. Using Clark: PBS (Portable Batch System)
- clark> qsub scriptfile
- clark> cat scriptfile
- #PBS -l cput=10:00:00,ncpus=8,mem=2gb
- (note: -l specifies the resource list)
- ./myProgram
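A minimal complete script along these lines might look like the following sketch; the job name, the -j oe line, and the use of PBS_O_WORKDIR are illustrative additions, not from the slide:

#!/bin/bash
#PBS -N myjob                            # job name (illustrative)
#PBS -l cput=10:00:00,ncpus=8,mem=2gb    # resource list, as on the slide
#PBS -j oe                               # merge stderr into stdout (optional)

cd $PBS_O_WORKDIR                        # start in the directory qsub was run from
./myProgram                              # the program named on the slide

Submit it with qsub scriptfile and check on it with qstat (later slides).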
20. Using Clark: output
- scriptfile.onnnn (written to the standard output stream)
- scriptfile.ennnn (written to the standard error stream)
21. Using Clark: PBS example 1
- clark> qsub runSAS
- 6190.clark
- clark> cat runSAS
- #PBS -l cput=1:00:00,ncpus=1,mem=1gb
- cd workingdir/
- sas test
Output files:
- runSAS.o6190
- runSAS.e6190
- test.log
- test.lst
22. Using Clark: PBS example 2
As part of a script:
qsub -V -k n -j oe -o $PBS_path \
     -r n -z \
     -l ncpus=$PBS_ncpus \
     -l cput=$PBS_cput myprog
To learn more about qsub: clark> man qsub
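For example, a small wrapper script along these lines could set those variables before submitting; the variable values and the output path are illustrative assumptions, and myprog stands for your own program:

#!/bin/bash
# Illustrative wrapper: choose the resources, then submit myprog with qsub.
PBS_path=$HOME/jobs/output      # where the job's output should go (assumed)
PBS_ncpus=8                     # CPUs to request (assumed value)
PBS_cput=10:00:00               # CPU-time limit (assumed value)

qsub -V -k n -j oe -o $PBS_path \
     -r n -z \
     -l ncpus=$PBS_ncpus \
     -l cput=$PBS_cput myprog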
23. Using Clark: qstat and queues
- clark> qstat
- Jobid  Name   User  Time Use  S  Queue
- -----  -----  ----  --------  -  --------
- 6422   qcid2  fjon  60:36:20  R  long
- 6432   redo1  fjon  0         Q  long
- 6434   wrky4  fjon  10:03:34  R  standard
- 6487   job1   cdar  05:06:10  R  standard
- 6488   job23  cdar  01:34:12  R  standard
- 6489   jobh2  cdar  0         Q  standard
- The long queue is for 100 h jobs.
- 1 or 2 of your jobs can run simultaneously, but only one can be in the long queue.
- Submit as many as you like.
24. Using Clark: qstat -f and qdel
- clark> qstat -f 6502
- Job Id: 6502.clark
- Job_Name = Blast.rice.
- Job_Owner = mid@clark.rnet.missouri.edu
- resources_used.cpupercent = 0
- resources_used.cput = 00:00:01
- resources_used.mem = 52048kb
- resources_used.ncpus = 8
- job_state = R
- ctime = Thu Apr 19 09:55:15 2007
- .......................................
- clark> qdel 6502 (to kill a job)
25. Using Clark: user limits
- Maximum
- number of CPUs: 16
- jobs running: 2
- jobs pending: no limit
- data storage: no limit (yet)
26. Lewis: 129 Linux OSs; one OS (the head node) coordinates the rest. InfiniBand connects all nodes.
Use the Load Sharing Facility (LSF)!!!
(Diagram: head node plus compute nodes 1-128, connected over Fibre Channel/IBRIX to the 50 TB EMC CLARiiON CX700 networked storage)
27. LSF ex 1: 1-processor program
- lewis> bsub < myJob
- lewis> cat myJob
- #BSUB -J 1Pjob
- #BSUB -oo 1Pjob.o%J
- #BSUB -eo 1Pjob.e%J
- ./myProg
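Filled out as a complete job file, the same example might read as follows; the shebang line and the explicit -n 1 are small additions for completeness, not from the slide:

#!/bin/bash
#BSUB -J 1Pjob          # job name, as on the slide
#BSUB -oo 1Pjob.o%J     # overwrite the stdout file; %J is the job ID
#BSUB -eo 1Pjob.e%J     # overwrite the stderr file
#BSUB -n 1              # one processor (implied by the slide title)

./myProg

Submit it with bsub < myJob, as shown above.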
28. LSF ex 2: threaded / 1 node
- lewis> cat myJob
- #BSUB -J thrdjob
- . . . . . . . . .
- #BSUB -n 4
- #BSUB -R "span[hosts=1]"
- (-R is for a resource requirement; span[hosts=1] keeps all four slots on one node)
- ./myProg
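A complete sketch of such a threaded job file, assuming myProg is an OpenMP-style program; the output-file lines and the OMP_NUM_THREADS setting are assumptions mirroring example 1, not from the slide:

#!/bin/bash
#BSUB -J thrdjob              # job name, as on the slide
#BSUB -oo thrdjob.o%J         # stdout file (assumed)
#BSUB -eo thrdjob.e%J         # stderr file (assumed)
#BSUB -n 4                    # request four slots
#BSUB -R "span[hosts=1]"      # keep all four slots on one node

export OMP_NUM_THREADS=4      # assumption: an OpenMP program picks up its thread count here
./myProg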
29. LSF ex 3: parallel programs - things to know
- Lewis uses the Message Passing Interface (MPI) to communicate between nodes.
- Parallel programs can run over either
- the InfiniBand connection, or
- the TCP/IP network connection.
30. LSF ex 3: MPI with InfiniBand
- #BSUB -a mvapich
- (-a gives application-specific requirements; here, a program compiled for InfiniBand)
- #BSUB -J jobname
- . . . . . . . . . . . . . . . . . . . .
- Set the number of CPUs:
- #BSUB -n 16
- mpirun.lsf ./mpi_program
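Put together, a complete InfiniBand MPI job file along these lines might read as follows; the output-file lines are assumptions mirroring the earlier examples:

#!/bin/bash
#BSUB -a mvapich              # MVAPICH (InfiniBand) MPI, as on the slide
#BSUB -J jobname
#BSUB -oo jobname.o%J         # stdout file (assumed)
#BSUB -eo jobname.e%J         # stderr file (assumed)
#BSUB -n 16                   # number of MPI processes

mpirun.lsf ./mpi_program      # LSF-aware MPI launcher, as on the slide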
31. LSF ex 4: MPI, but pre-compiled for TCP/IP
- #BSUB -a mpichp4
- (note: the program is compiled for TCP/IP)
- #BSUB -J jobname
- . . . . . . . . . .
- #BSUB -n 16
- mpirun.lsf ./mpi_program
32. Using Lewis: Intel compiling programs for MPI
- mpicc.i - to compile C programs
- mpiCC.i - to compile C++ programs
- mpif77.i - to compile FORTRAN 77 programs
- mpif90.i - to compile FORTRAN 90 programs
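As a sketch, compiling a C MPI program and then submitting it might look like this; the source and job file names are illustrative, not from the slides:

lewis> mpicc.i -o mpi_program mpi_program.c     # build with the Intel MPI wrapper
lewis> bsub < myMpiJob                          # submit using one of the LSF examples above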
33. Using Lewis: bjobs (-w)
- lewis> bjobs
- JOBID  USER  STAT  QUEUE  HOST   EXEC_HOST        JOB_NAME  SUB_TIME
- 14070  sqx1  RUN   norm   lewis  4*compute-2      myjob     Apr 18 13:2
-                                  4*compute-22-28
-                                  4*compute-20-11
-                                  3*compute-22-29
-                                  1*compute-20-30
- lewis> bjobs -w
- JOBID  USER  STAT  QUEUE  HOST   EXEC_HOST  JOB_NAME  SUB_TIME
- 14070  sqx1  RUN   norm   lewis  4*compute-22-28:4*compute-22-30:4*compute-20-11:3*compute-22-29:1*compute-20-30  myjob  Apr 18 13:2
34. Monitoring job performance on Lewis
(Go to a compute node:)
lewis> lsrun -P -m compute-22-30 top

top - 10:35:46 up 65 days, 17:02, 1 user, load average: 4.00, 4.00, 4.00
Tasks: 149 total, 5 running, 144 sleeping, 0 stopped, 0 zombie
Cpu(s): 98.2% us, 1.8% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 4038084k total, 2421636k used, 1616448k free, 251500k buffers
Swap: 11261556k total, 26776k used, 11234780k free, 1069484k cached

  PID  USER   PR  NI  VIRT  RES   SHR   S  %CPU  %MEM  TIME       COMMAND
14070  sqx1   25   0  247m  177m  9672  R  99.9   4.5  1509:01    BlastFull
13602  larry  25   0  318m  241m  11m   R  99.7   6.1  6958:42    namd9
18608  moe    25   0  247m  177m  9668  R  99.7   4.5  1511:03    MyProg
13573  shemp  25   0  319m  243m  11m   R  99.4   6.2  6871:15    namd9
 3055  root   16   0  153m  47m   1496  S   0.7   1.2  300:03.49  precept
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
(For help interpreting the output, run man top.)
35. Using Lewis: short and normal queues
- Short jobs have higher priority than normal jobs but will quit at 15 minutes.
- In a script:
- #BSUB -q short
- Or, on submission:
- lewis> bsub -q short
36. Using Lewis: lsrun for short test runs and interactive tasks
- lsrun command argument
- lsrun -P command argument
- e.g.,
- lsrun -P vi myFile.txt
37. Job arrays for multiple inputs
- bsub -J "myArray[1-100]"
-      -o %J.output.%I
- (for command-line input)
-      ./myProgram file.\$LSB_JOBINDEX
- where the input files are numbered
- file.1, file.2, ... file.100
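Wrapped in a job file rather than typed on one line, an equivalent array submission might look like this sketch; the job-file form and its name, myArrayJob, are assumptions:

#!/bin/bash
#BSUB -J "myArray[1-100]"     # 100-element job array, as on the slide
#BSUB -o %J.output.%I         # one output file per element; %I is the array index

# Each element of the array reads its own numbered input file, file.1 ... file.100
./myProgram file.$LSB_JOBINDEX

Submit it with bsub < myArrayJob.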
38. Job arrays for multiple inputs
- bsub -J "myArray[1-100]"
-      -o %J.output.%I
- (for standard input)
-      -i file.%I ./myProgram
- where the input files are numbered
- file.1, file.2, ... file.100
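Assembled on a single command line, the standard-input variant of the array submission would look roughly like this (built only from the fragments above, plus the usual prompt):

lewis> bsub -J "myArray[1-100]" -o %J.output.%I -i file.%I ./myProgram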
39. Conditional jobs: bsub -w
- bsub -w "done(myArray[1-100])" -J collate ./collateData
- (conditions include done, ended, exit, external, numdone, numended, numexit, numhold, numpend, numrun, numstart, post_done, post_err, started)
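Combined with the array from the previous slides, a simple two-step pipeline might be submitted like this sketch; the job-file name myArrayJob is an illustrative assumption:

# Step 1: submit the 100-element array
lewis> bsub < myArrayJob

# Step 2: run the collate step only after every array element has finished successfully
lewis> bsub -w "done(myArray[1-100])" -J collate ./collateData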
40. gocomp for interactive sessions with a GUI
- lewis> gocomp
- Logging on to interactive compute node
- Last login: Wed Apr 18 13:54:46 2007 from
- Platform OCS Compute Node
- Platform OCS 4.1.1-1.0 (Cobblestone)
- Profile built 17:41 12-Apr-2007
- Kickstarted 17:58 12-Apr-2007
- spollenw@compute-20-5
- LSF does not schedule to this node. It is free for interactive jobs.
- Some interactive programs:
- MATLAB
- MapMan
- Genesis
41. MATLAB Graphical Use, 1 CPU
- An X-windowing system is needed for the MATLAB window to display. See the UMBC web site for instructions on downloading and installing the Cygwin X server.
- After opening an X window, type:
- lewis> gocomp
- lewis> matlab
42. MATLAB Multi-CPU, Non-Graphical Use
- lewis> cat secondscript
- #BSUB -J myjob
- #BSUB -n 1
- #BSUB -R "rusage[matlab=1:duration=1]"
- #BSUB -oo myjob.o%J
- #BSUB -eo myjob.e%J
- matlab -nodisplay -r firstscript
- lewis> bsub < secondscript
43. MATLAB Multi-CPU, Non-Graphical Use
- lewis> cat firstscript.m
- sched = findResource('scheduler', 'configuration', 'lsf')
- set(sched, 'configuration', 'lsf')
- job = createJob(sched)
- createTask(job, @sum, 1, {[1 1]})
- createTask(job, @sum, 1, {[2 2]})
- createTask(job, @sum, 1, {[3 3]})
- submit(job)
- waitForState(job, 'finished')
- results = getAllOutputArguments(job)
44. Using Lewis: processor limits
- Maximum number of
- CPUs: 64
- jobs running: 64
- jobs pending: 200
Do not submit more than 264 jobs (64 running + 200 pending)!
45. Using Lewis: memory limits
- For large memory requirements (> 900 MB), use the resource specification string:
- #BSUB -R "rusage[mem=nnnn]"
- nnnn is in MB, and is per node.
- Maximum available on any node: 5,700 MB
- If a job spans multiple nodes, each node will have to have nnnn MB available.
46. Storage through Lewis
- 5 TB for home directories (2.5 GB/user)
- 15 TB for data directories (50 GB/user), uid@lewis ./data
- 50 TB EMC CLARiiON CX700 Networked Storage
- 14 TB paid for by a grant and dedicated to that project
- 13 TB for backup and future needs
47. Using Lewis: Storage Limits
- 2.5 GB in the home directory.
- 50 GB soft quota under /data.
- 55 GB hard quota: no further writing to files.
- The EMC CLARiiON storage is not backed up. A deleted file cannot be retrieved.
- The EMC is a RAID 5 design, so if one disk fails the data are still available.
- The data are viewable only by the user unless they change the permissions on their directory.
48. Checkpointing on Lewis and Clark
- Checkpointing is not supported on either system.
- However, some programs, e.g., GAUSSIAN, come with their own checkpointing options, which can be used.
49. Distributed and Parallel Computing with MATLAB
Thursday, November 8, 2007
Registration/Sign-in: 1:00 p.m.
Presentation: 1:30 p.m. to 3:30 p.m.
W1005 Lafferre Hall
www.mathworks.com/seminars/columbianov8
For more information contact Alyssa Winer, alyssa.winer@mathworks.com, 508-647-4343
50. http://umbc.rnet.missouri.edu
Please note that each user must have their own account. Do not use your advisor's, a friend's, or anyone else's account.