Title: Introduction to Scientific Computing on BU
Introduction to Scientific Computing on BU's Linux Cluster
Doug Sondak
Linux Clusters and Tiled Display Walls
Boston University
July 30 - August 1, 2002
Outline
- hardware
- parallelization
- compilers
- batch system
- profilers
Hardware
BU's Cluster
- 52 2-processor nodes
- specifications
  - 2 Pentium III processors per node
    - 1 GHz
  - 1 GB memory per node
  - 32 KB L1 cache per CPU
  - 256 KB L2 cache per CPU
BU's Cluster (2)
- Myrinet 2000 interconnects
  - sustained 1.96 Gb/s
- Linux
Some Timings
- CFD code, MPI, 4 procs.

Machine                      Sec.
Origin2000                   495
SP                           329
Cluster, 2 procs. per box    174
Cluster, 1 proc. per box     153
Regatta                      78
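To make the comparison easier to read, the timings can be expressed relative to the slowest machine. The sketch below is only an illustration: the helper name speedup is made up here, and the numbers are copied from the table above.

```bash
#!/bin/bash
# speedup BASE T prints BASE/T to two decimals (hypothetical helper).
speedup() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f", base / t }'
}

# Timings in seconds, copied from the table; Origin2000 is the baseline.
for entry in "Origin2000:495" "SP:329" "Cluster-2-per-box:174" \
             "Cluster-1-per-box:153" "Regatta:78"; do
    name=${entry%%:*}     # text before the colon
    secs=${entry##*:}     # text after the colon
    echo "$name $(speedup 495 "$secs")"
done
```

On these numbers the cluster with one process per box comes out a little over three times faster than the Origin2000 for this particular code.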
Parallelization
- MPI is the recommended method
- PVM may also be used
- some MPI tutorials
  - Boston University
    - http://scv.bu.edu/Tutorials/MPI/
  - NCSA
    - http://pacont.ncsa.uiuc.edu:8900/public/MPI/
Parallelization (2)
- OpenMP is available for SMP within a node
- mixed MPI/OpenMP not presently available
  - we're working on it!
Compilers
- Portland Group
  - pgf77
  - pgf90
  - pgcc
  - pgCC
Compilers (3)
- Intel
  - Fortran
    - ifc
  - C/C++
    - icc
Compilers (2)
Polyhedron F77 Benchmarks
http://www.polyhedron.com/

Benchmark    PG       gnu      Intel
AC           8.66     12.38    6.13
ADI          8.48     9.27     6.83
AIR          16.41    15.65    13.45
CHESS        11.67    10.06    10.16
DODUC        21.35    36.23    18.18
LP8          4.31     7.88     4.16
MDB          3.62     3.81     2.94
MOLENR       11.66    12.72    7.61
PI           24.58    41.95    7.08
PNPOLY       3.81     5.24     4.86
RO           10.75    10.31    3.92
TFFT         18.84    20.24    20.18
Compilers (3)
- Portland Group
  - pgf77 generally faster than g77
- Intel
  - ifc generally faster than pgf77
Compilers (4)
- Linux C/C++ compilers
  - gcc/g++ seems to be the standard, usually described as a good compiler
Portland Group
- -O2
  - highest level of optimization
- -fast
  - same as -O2 -Munroll -Mnoframe
- -Minline
  - function inlining
Portland Group (2)
- -Mbyteswapio
  - swaps between big endian and little endian
  - useful for using files created on our SP, Regatta, or Origin2000
- -Ktrapfp
  - trap floating point invalid operation, divide by zero, or overflow
  - slows code down, only use for debugging
Portland Group (3)
- -Mbounds
  - array bounds checking
  - slows code down, only use for debugging
- -mp
  - process OpenMP directives
- -Mconcur
  - automatic SMP parallelization
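One way to keep the debugging flags out of production builds is to choose the flag set by mode. The sketch below is a dry run that only prints the compile command; the helper name pg_flags and the file name myprog.f90 are placeholders, and the flags are the ones listed on these slides.

```bash
#!/bin/bash
# pg_flags MODE prints Portland Group flags for MODE ("debug" or "fast").
# Hypothetical helper; it does not call the compiler.
pg_flags() {
    case "$1" in
        debug) echo "-Mbounds -Ktrapfp" ;;   # run-time checks, slow the code down
        fast)  echo "-fast -Minline" ;;      # optimization flags
        *)     echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Dry run: print the command lines instead of executing them.
echo "pgf90 $(pg_flags fast) -o myprog myprog.f90"
echo "pgf90 $(pg_flags debug) -o myprog myprog.f90"
```

To actually build, the echoed command would be run directly on the cluster, where the Portland Group compilers are installed.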
Intel
- Need to set some environment variables
  - contained in /usr/local/IT/intel6.0/compiler60/ia32/bin/iccvars.csh
  - source this file, copy it into your .cshrc file, or source it in .cshrc
  - there's an identical file called ifcvars.csh to avoid (create?) confusion
Intel (2)
- -O3
  - highest level of optimization
- -ipo
  - interprocedural optimization
- -unroll
  - loop unrolling
Intel (3)
- -openmp -fpp
  - process OpenMP directives
- -parallel
  - automatic SMP parallelization
- -CB
  - array bounds checking
Intel (3)
- -CU
  - check for use of uninitialized variables
- Endian conversion by way of environment variables
  - setenv F_UFMTENDIAN big
  - all reads will be converted from big to little endian, all writes from little to big endian
Intel (4)
- Can specify units for endian conversion
  - setenv F_UFMTENDIAN big:10,20
- Can mix endian conversions
  - setenv F_UFMTENDIAN "little;big:10,20"
  - all units are little endian except for 10 and 20, which will be converted
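Endian conversion, whether requested with -Mbyteswapio or with F_UFMTENDIAN, amounts to reversing the byte order of each value as it is read or written. The sketch below shows this for a single 4-byte integer written out as hex bytes. It is a conceptual illustration only, not how either compiler implements the conversion, and the helper name swap_bytes is made up.

```bash
#!/bin/bash
# swap_bytes reverses the order of space-separated hex bytes,
# which is the effect of an endian conversion on one value.
swap_bytes() {
    local out=""
    local b
    for b in $1; do
        out="$b${out:+ $out}"    # prepend each byte to the result
    done
    echo "$out"
}

big="00 00 00 01"    # the 4-byte integer 1 as stored on a big-endian machine
echo "big endian:    $big"
echo "little endian: $(swap_bytes "$big")"
```

Read without conversion on the little-endian Pentium nodes, the big-endian bytes above would be interpreted as 16777216 rather than 1, which is why files from the SP, Regatta, or Origin2000 need one of these options.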
Batch System
- PBS
  - different from LSF on O2ks, SPs, Regattas
- there's only one queue
  - dque
qsub
- job submission done through script
  - script details will follow
- qsub scriptname
  - returns job ID
- in working directory
  - std. out - scriptname.o<jobid>
  - std. err - scriptname.e<jobid>

sondak@hn003 run% qsub corrun
808.hn003.nerf.bu.edu
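PBS names the output files after the numeric part of the job ID, not the full string that qsub prints. A minimal sketch of deriving the file names from the example above follows; the helper name job_number is made up for illustration.

```bash
#!/bin/bash
# job_number strips the server name from a PBS job ID,
# e.g. 808.hn003.nerf.bu.edu -> 808 (hypothetical helper).
job_number() {
    echo "${1%%.*}"
}

script=corrun                      # script name from the example above
jobid=808.hn003.nerf.bu.edu        # job ID returned by qsub in the example
num=$(job_number "$jobid")

echo "stdout: $script.o$num"
echo "stderr: $script.e$num"
```

In a real session the job ID could be captured with jobid=$(qsub corrun) instead of being typed in.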
qstat
- Check status of all your jobs
  - qstat
- lies about run time
  - often (always?) zero

sondak@hn003 run% qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
808.hn003        corrun           sondak           0        R dque
qstat (2)
- S - job status
  - Q - queued
  - R - running
  - E - exiting (finishing up)
- qstat -f gives detailed status
  - exec_host = nodem019/0+nodem018/0+nodem017/0+nodem016/0
- to specify jobid
  - qstat jobid
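The status letter can also be picked out of the qstat listing in a script. The sketch below works on a sample line copied from the qstat example rather than on live output, and it assumes the S column is the fifth whitespace-separated field, as it is in that example. The helper name job_state is made up.

```bash
#!/bin/bash
# job_state JOBNUM reads qstat-style output on stdin and prints the
# S column for the job whose ID starts with "JOBNUM." (hypothetical helper).
job_state() {
    awk -v id="$1" 'index($1, id ".") == 1 { print $5 }'
}

# Sample line copied from the qstat listing (not live output).
sample="808.hn003        corrun           sondak           0        R dque"

state=$(echo "$sample" | job_state 808)
case "$state" in
    Q) echo "queued" ;;
    R) echo "running" ;;
    E) echo "exiting" ;;
    *) echo "unknown" ;;
esac
```

On the cluster, the sample line would be replaced by piping the real command, for example qstat | job_state 808.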
Other PBS Commands
- kill job
  - qdel jobid
- some less-important PBS commands
  - qalter, qhold, qrls, qmsg, qrerun
- man pages are available for all commands
PBS Script
#!/bin/bash
# Set the default queue
#PBS -q dque
# ppn is CPUs per node
#PBS -l nodes=1:ppn=1,walltime=00:30:00
cd $PBS_O_WORKDIR
myrun
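When the same job is run with different node counts or time limits, it can be convenient to generate the script rather than edit it by hand. The sketch below prints a serial PBS script with the same directives as the one on this slide. The function name make_pbs_script and its argument order are assumptions made for illustration.

```bash
#!/bin/bash
# make_pbs_script NODES PPN WALLTIME PROGRAM
# Prints a PBS script for the dque queue to stdout (hypothetical helper).
make_pbs_script() {
    cat <<EOF
#!/bin/bash
#PBS -q dque
#PBS -l nodes=$1:ppn=$2,walltime=$3
cd \$PBS_O_WORKDIR
$4
EOF
}

# Reproduce the script from the slide: 1 node, 1 CPU, 30 minutes.
make_pbs_script 1 1 00:30:00 myrun
```

Redirecting the output to a file, for example make_pbs_script 1 1 00:30:00 myrun > corrun, gives a script that can be submitted with qsub corrun.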
PBS/MPI
- For MPI, set up the GM configuration file (GMCONF) in PBS script

test -d ~/.gmpi || mkdir ~/.gmpi
GMCONF=~/.gmpi/conf.$PBS_JOBID
/usr/local/xcat/bin/pbsnodefile2gmconf $PBS_NODEFILE > $GMCONF
cd $PBS_O_WORKDIR
NP=$(head -1 $GMCONF)
PBS/MPI (2)
- To run MPI, end PBS script with (all on one line)

mpirun.ch_gm --gm-f $GMCONF --gm-recv polling --gm-use-shmem --gm-kill 5 -np $NP PBS_JOBID=$PBS_JOBID myprog
PBS/MPI (3)
- mpirun.ch_gm
  - version of mpirun that uses Myrinet
- --gm-f $GMCONF
  - access configuration file constructed above
- --gm-recv polling
  - poll continually to check for completion of sends and receives
  - most efficient for dedicated procs.
    - That's us!
PBS/MPI (4)
- --gm-use-shmem
  - enable shared-memory support
  - may improve or degrade performance
  - try your code with and without it
- --gm-kill 5
  - if one MPI process aborts, kill others after 5 sec.
PBS/MPI (5)
- -np $NP
  - run on NP procs as computed earlier in script
  - equals nodes x ppn from PBS -l option
- PBS_JOBID=$PBS_JOBID
  - seems redundant redundant
  - do it anyway
- myprog
  - run the darn code already!
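The relation NP = nodes x ppn can be checked against the node file that PBS hands to the job. The sketch below assumes the usual PBS convention that $PBS_NODEFILE contains one line per allocated processor, and it uses a fake node file so it can be run anywhere. The node names are borrowed from the qstat -f example; the request nodes=2:ppn=2 is made up.

```bash
#!/bin/bash
# Build a stand-in for $PBS_NODEFILE as it might look for nodes=2:ppn=2.
# This file is fake; inside a real job, use "$PBS_NODEFILE" instead.
nodefile=$(mktemp)
printf '%s\n' nodem019 nodem019 nodem018 nodem018 > "$nodefile"

NP=$(( $(wc -l < "$nodefile") ))               # one line per processor
NODES=$(( $(sort -u "$nodefile" | wc -l) ))    # distinct machines
PPN=$(( NP / NODES ))

echo "nodes=$NODES ppn=$PPN NP=$NP"
rm -f "$nodefile"
```

This is a cross-check only. The script on the slides takes NP from the first line of the GM configuration file, and that remains the value to pass to mpirun.ch_gm.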
Profiling
Portland Group
- Portland Group Compiler flag
  - function level
    - -Mprof=func
  - line level
    - -Mprof=lines
    - much larger file
- creates pgprof.out file in working directory
PG (2)
- At unix prompt, type pgprof command
  - will pop up window with bar chart of timing results
  - can take file name argument in case you've renamed the pgprof.out file
    - pgprof pgprof.lines
PG (3)
- option to specify source directory
  - pgprof -I sourcedir pgprof.lines
  - can specify multiple directories with multiple -I flags
- also can use GUI menu
  - Options > Source Directory...
PG (5)
- Calls - number of times routine was called
- Time - time spent in specified routine
- Cost - time spent in specified routine plus time spent in called routines
PG (6)
- Lines profiling
  - with optimization, may not be able to identify many (most?) lines in source code
    - reports results for blocks of code, e.g., loops
  - without optimization, doesn't measure what you really want
  - initial screen looks like func screen
  - double-click function/subroutine name to get line-level listing
Questions/Comments
- Feel free to contact us directly with questions about the cluster or parallelization/optimization issues
  - Doug Sondak sondak@bu.edu
  - Kadin Tseng kadin@bu.edu