Title: STG WW Blue Gene HPC Benchmark Centers Tutorial: Introduction to the Blue Gene Facility in Rochester, Minnesota
STG WW Blue Gene HPC Benchmark Centers
Tutorial: Introduction to the Blue Gene Facility in Rochester, Minnesota
Carlos P. Sosa
Chemistry and Life Sciences Group, Advanced Systems Software Development, Rochester, MN
Rochester Blue Gene Center Team
- Cindy Mestad, Certified PMP, STG WW Blue Gene HPC Benchmark Centers
- Steve M. Westerbeck, System Administrator, STG WW Blue Gene HPC Benchmark Centers
Chemistry and Life Sciences Applications Team
- Carlos P. Sosa, Chemistry and Life Sciences Applications, Advanced Systems Software Development
Preface
- This tutorial provides a brief introduction to the environment of the IBM Blue Gene facilities in Rochester, Minnesota
- Customers should be mindful of their own security issues; the following points should be considered:
- Sharing of userids is not an accepted practice, in order to maintain proper authentication controls
- Additional encryption of data and source code on the filesystem is encouraged
- Housekeeping procedures on your assigned frontend node and filesystem are recommended
- Report any security breaches or concerns to the Rochester Blue Gene System Administration
- Changing permissions on user-generated files for resource sharing is the responsibility of the individual user
- Filesystem cleanup at the end of the engagement is the responsibility of the customer
1. Blue Gene Hardware Overview
Blue Gene System Modularity
[Diagram: How is BG/P configured? Service node, front-end (login) nodes, file servers, and the storage subsystem, linked by a 1GbE service network and a 10GbE functional network; the software stack includes SLES10, DB2, XLF, XLC/C++, GPFS, ESSL, TWS, and LoadLeveler (LL).]
Hierarchy
- Compute nodes are dedicated to running user applications and almost nothing else; they run a simple Compute Node Kernel (CNK)
- I/O nodes run Linux and provide a more complete range of OS services: files, sockets, process launch, debugging, and termination
- The service node performs system management services (e.g., heartbeating, monitoring errors) and is largely transparent to application/system software
Looking inside Blue Gene
Blue Gene Environment
IBM System Blue Gene/P
- System-on-Chip (SoC): quad PowerPC 450 cores with double FPU
- Memory controller with ECC, L2/L3 cache, DMA, PMU
- Networks: torus, collective, global barrier, 10GbE, control network (JTAG monitor)
BG/P Application-Specific Integrated Circuit (ASIC) Diagram
Blue Gene/P Job Modes Allow Flexible Use of Node Memory
What's new?
- Virtual Node Mode (previously called Virtual Node Mode): all four cores run one MPI process each; no threading; memory per MPI process is 1/4 of the node memory; MPI programming model
- Dual Node Mode: two cores run one MPI process each; each process may spawn one thread on a core not used by the other process; memory per MPI process is 1/2 of the node memory; hybrid MPI/OpenMP programming model
- SMP Node Mode: one core runs one MPI process; the process may spawn threads on each of the other cores; memory per MPI process is the full node memory; hybrid MPI/OpenMP programming model
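As a rough illustration of the hybrid MPI/OpenMP model used in Dual and SMP modes, here is a minimal sketch (not from the tutorial; the file name is hypothetical, and it assumes compilation with the thread-safe wrapper mpixlc_r plus the XL OpenMP option -qsmp=omp in addition to the flags shown later). Each MPI rank spawns threads on the cores that the chosen mode leaves free:

/* hybrid_hello.c -- minimal sketch of the hybrid MPI/OpenMP model.
 * Assumption: built with mpixlc_r -qsmp=omp. In SMP mode one rank per node
 * may use up to 4 threads, in Dual mode up to 2; Virtual Node mode does not
 * use threading. */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        /* one line per (rank, thread) pair */
        printf("rank %d, thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}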
Blue Gene Integrated Networks
- Torus
  - Interconnects all compute nodes
  - Used for point-to-point communication
- Collective
  - Interconnects compute and I/O nodes
  - One-to-all broadcast functionality
  - Reduction operations functionality
- Barrier
  - Compute and I/O nodes
  - Low-latency barrier across the system (< 1 µs for a 72-rack system)
  - Used to synchronize timebases
- 10Gb Functional Ethernet
  - I/O nodes only
- 1Gb Private Control Ethernet
  - Provides JTAG, I2C, etc. access to hardware; accessible only from the Service Node system
  - Boot, monitoring, and diagnostics
- Clock network
  - Single clock source for all racks
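As a hedged illustration of how applications exercise these networks, the sketch below (hypothetical file name; standard MPI calls only) shows the operations whose traffic the BG/P MPI layer can place on the torus, collective, and global barrier networks. The mapping is chosen by the MPI layer, not by the application.

/* networks_sketch.c -- illustrative only: MPI calls whose traffic can be
 * carried by the torus (point-to-point), collective, and barrier networks. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, token = 0, sum = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Point-to-point: typically routed over the torus network */
    if (rank == 0 && size > 1)
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* One-to-all broadcast and reduction: candidates for the collective network */
    MPI_Bcast(&token, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    /* Barrier: can use the low-latency global barrier network */
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of ranks = %d\n", sum);
    MPI_Finalize();
    return 0;
}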
HPC Software Tools for Blue Gene
- Other Software Support
  - Parallel file systems
    - Lustre at LLNL, PVFS2 at ANL
  - Job schedulers
    - SLURM at LLNL, Cobalt at ANL
    - Altair PBS Pro, Platform LSF (for BG/L only)
    - Condor HTC (porting for BG/P)
  - Parallel debuggers
    - Etnus TotalView (for BG/L as of now, porting for BG/P)
    - Allinea DDT and OPT (porting for BG/P)
  - Libraries
    - FFT library - tuned functions by TU Vienna
    - VNI (porting for BG/P)
  - Performance tools
    - HPC Toolkit: MP_Profiler, Xprofiler, HPM, PeekPerf, PAPI
    - Tau, Paraver, Kojak
- IBM Software Stack
  - XL (Fortran, C, and C++) compilers
    - Externals preserved
    - Optimized for specific BG functions
    - OpenMP support
  - LoadLeveler scheduler
    - Same externals for job submission and system query functions
    - Backfill scheduling to achieve maximum system utilization
  - GPFS parallel file system
    - Provides high-performance file access, as in current pSeries and xSeries clusters
    - Runs on I/O nodes and disk servers
  - ESSL/MASSV libraries
    - Optimization library and intrinsics for better application performance
    - Serial static library supporting 32-bit applications
    - Callable from Fortran, C, and C++
  - MPI library
    - Message Passing Interface library, based on MPICH2, tuned for the Blue Gene architecture
High-Throughput Computing (HTC) Modes on Blue Gene/P
- BG/P with HTC looks like a cluster for serial and parallel apps
- Hybrid environment: standard HPC (MPI) apps plus now HTC apps
- Enables a new class of workloads that use many single-node jobs
- Easy administration of HTC using the web-based Navigator
2. IBM Rochester Center Overview
Rochester Blue Gene Infrastructure
Shared GPFS Filesystem
Understanding Performance on Blue Gene/P
- Theoretical floating-point performance
- 1 fpmadd per cycle
- Total of 4 floating-point operations per cycle
- 4 floating-point operations/cycle x 850 x 10^6 cycles/s = 3,400 x 10^6 = 3.4 GFlop/s per core
- Peak performance: 13.6 GFlop/s per node (4 cores)
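The same arithmetic written out as a small C sketch (hypothetical file name; the clock rate and flop counts are the values from the slide, and this is only the calculation, not a measurement):

/* peak_flops.c -- minimal sketch of the peak-performance arithmetic above */
#include <stdio.h>

int main(void)
{
    const double clock_hz        = 850.0e6; /* PowerPC 450 core clock: 850 MHz */
    const double flops_per_cycle = 4.0;     /* 4 floating-point operations per cycle */
    const int    cores_per_node  = 4;

    double per_core = flops_per_cycle * clock_hz;   /* 3.4e9 flop/s */
    double per_node = per_core * cores_per_node;    /* 13.6e9 flop/s */

    printf("peak per core: %.1f GFlop/s\n", per_core / 1.0e9);
    printf("peak per node: %.1f GFlop/s\n", per_node / 1.0e9);
    return 0;
}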
Two Generations: BG/L and BG/P
3. How to Access Your Frontend Node
How to Login to the Frontend
bcssh.rochester.ibm.com
Gateway
- From the gateway, ssh to your assigned front-end node
Your Front-end
Transferring Files
- Transferring files into the Rochester IBM Blue Gene Center (e.g., using WinSCP)
Transferring to the Front-end
- Use scp
- bcssh:/codhome/myaccount> scp conf_gen.cpp frontend-1:
- conf_gen.cpp    100%   46KB  45.8KB/s   00:00
Current Disk Space Limits
- bcssh gateway
- /codhome/userid directories on bcssh are limited to 300GB (shared, no quota)
- Used for transferring files in and out of the environment
- Frontend node
- /home directories have 10GB for all users, no quotas
- The /gpfs file system is 400GB in size; there are no quotas, as the file space is shared between all users on that frontend node
4. Compilers for Blue Gene
IBM Compilers
- Compilers for Blue Gene are located on the front-end node (/opt/ibmcmp)
- Fortran
- /opt/ibmcmp/xlf/bg/11.1/bin/bgxlf
- /opt/ibmcmp/xlf/bg/11.1/bin/bgxlf90
- /opt/ibmcmp/xlf/bg/11.1/bin/bgxlf95
- C
- /opt/ibmcmp/vac/bg/9.0/bin/bgxlc
- C++
- /opt/ibmcmp/vacpp/bg/9.0/bin/bgxlC
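As a hedged usage sketch (hypothetical file name; the compiler path is the one listed above, and the -O3 -qarch=450 -qtune=450 flags are the ones used elsewhere in this tutorial), a small serial program can be cross-compiled on the front end for the compute nodes:

/* probe.c -- minimal sketch; cross-compile on the front-end node with, e.g.:
 *   /opt/ibmcmp/vac/bg/9.0/bin/bgxlc -O3 -qarch=450 -qtune=450 probe.c -o probe
 * The resulting binary targets the Blue Gene compute nodes, not the front end. */
#include <stdio.h>

int main(void)
{
    printf("built with the Blue Gene XL C compiler\n");
    return 0;
}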
GNU Compilers
- The standard GNU compilers and libraries, which are also located on the frontend node, will NOT produce Blue Gene compatible binary code. The standard GNU compilers can only be used for utility or frontend code development that your application may require.
- GNU compilers (Fortran, C, C++) for Blue Gene are located in /opt/blrts-gnu/
- Fortran
- /opt/gnu/powerpc-bgp-linux-gfortran
- C
- /opt/gnu/powerpc-bgp-linux-gcc
- C++
- /opt/gnu/powerpc-bgp-linux-g++
- It is recommended not to use the GNU compilers for Blue Gene, as the IBM XL compilers offer significantly higher performance. The GNU compilers do, however, offer more flexible support for things like inline assembler (see the sketch below).
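For completeness, a minimal, hypothetical sketch of the kind of GCC-style inline assembler the GNU toolchain accepts; it only issues a PowerPC no-op and is not taken from the tutorial:

/* inline_asm.c -- hypothetical sketch: GCC-style inline assembler (a PowerPC
 * no-op), the sort of construct for which the GNU compilers offer more
 * flexible support than the XL compilers. */
int main(void)
{
    __asm__ __volatile__("nop");
    return 0;
}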
5. MPI on Blue Gene
MPI Library Location
- The MPI implementation on Blue Gene is based on MPICH-2 from Argonne National Laboratory
- The include files mpi.h and mpif.h are at the following location:
- -I/bgsys/drivers/ppcfloor/comm/include
6 & 7. Compilation and Execution on Blue Gene
Copying Executables and Input
- Step 1: Copy input files and executables to a shared directory
- Place data and executables in a directory under /gpfs
- Example:
- cd /gpfs/fs2/frontend-1
- mkdir myaccount
- cp myaccount/sander /gpfs/fs2/frontend-1/myaccount
- cp myaccount/input.tar /gpfs/fs2/frontend-1/myaccount
Compiling on Blue Gene: C
- /gpfs/fs2/frontend-11/myaccount/hello 0> make -f make.hello
- mpixlc_r -O3 -qarch=450 -qtune=450 hello.c -o hello

> cat make.hello
XL_CC  = mpixlc_r
OBJ    = hello
SRC    = hello.c
FLAGS  = -O3 -qarch=450 -qtune=450
LIBS   =

$(OBJ): $(SRC)
	$(XL_CC) $(FLAGS) $(SRC) -o $(OBJ) $(LIBS)

clean:
	rm *.o hello
Hello World: C
> cat hello.c
#include <stdio.h>   /* Headers */
#include <string.h>  /* for strcpy() */
#include "mpi.h"

int main(int argc, char *argv[]) /* Function main */
{
    int rank, size, tag, rc, i;
    MPI_Status status;
    char message[20];

    rc = MPI_Init(&argc, &argv);
    rc = MPI_Comm_size(MPI_COMM_WORLD, &size);
    rc = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    tag = 100;

    if (rank == 0) {
        strcpy(message, "Hello, world");
        for (i = 1; i < size; i++)
            rc = MPI_Send(message, 13, MPI_CHAR, i, tag, MPI_COMM_WORLD);
    } else {
        rc = MPI_Recv(message, 13, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
    }
    printf("node %d: %s\n", rank, message);

    rc = MPI_Finalize();
    return 0;
}
Compiling on Blue Gene: C++
> cat make.hello
XL_CC  = mpixlcxx_r
OBJ    = hello
SRC    = hello.cc
FLAGS  = -O3 -qarch=450 -qtune=450
LIBS   =

$(OBJ): $(SRC)
	$(XL_CC) $(FLAGS) $(SRC) -o $(OBJ) $(LIBS)

clean:
	rm *.o hello
Hello World: C++
> cat hello.cc
// Include the MPI version 2 C++ bindings
#include <mpi.h>
#include <iostream>
#include <string.h>
using namespace std;

int main(int argc, char *argv[])
{
    MPI::Init(argc, argv);

    int rank = MPI::COMM_WORLD.Get_rank();
    int size = MPI::COMM_WORLD.Get_size();
    char name[MPI_MAX_PROCESSOR_NAME];
    int len;
    MPI::Get_processor_name(name, len);

    cout << "Hello, world! I am " << rank << " of " << size
         << " on " << name << endl;

    MPI::Finalize();
    return 0;
}
https://spaces.umbc.edu/pages/viewpage.action?pageId=5245461#C%2B%2BHelloWorldProgram-parallel
Running Programs (Applications) on Blue Gene
- Job execution is managed via LoadLeveler
- LoadLeveler is a job scheduler written by IBM to control the scheduling of batch jobs
- mpirun is invoked via LoadLeveler
Script to Emulate the Syntax of mpirun
llrun
- pts/0 0> llrun
mpirun
- Step 2: Job submission using mpirun
- Users can use mpirun to submit jobs
- The Blue Gene mpirun is located in /usr/bin/mpirun
- Typical use of mpirun:
- mpirun -np <# of processes> -partition <block id> -cwd `pwd` -exe <executable>
- Where:
- -np: Number of processes to be used; must fit in the available partition
- -partition: A partition (block) of the Blue Gene rack on which a given executable will execute, e.g., R000
- -cwd: The current working directory, generally used to specify where any input and output files are located
- -exe: The actual binary program which the user wishes to execute
- Example:
- mpirun -np 32 -partition R000 -cwd /gpfs/fs2/frontend-11/myaccount -exe /gpfs/fs2/frontend-11/myaccount/hello
mpirun: Selected Options
- Selected options:
- -args: List of arguments to the executable, in double quotes
- -env: List of environment variables, in double quotes (VARIABLE=value)
- -mode: SMP, VN, or DUAL
- For more details, run the following at the command prompt:
- mpirun -h
mpirun: Selected Example in an sh Script

#!/bin/sh
# --------- User options start here --------------------
MPIRUN="mpirun"
MPIOPT="-np 32"
PARTITION="-partition R000_J203_128"
WDIR="-cwd /FS1/myaccount/amber/IIsc/b4amber_mod/data1_32"
SANDER="-exe /FS1/myaccount/amber/exe/sander_bob_noBTREE"
time_ends=1600    # till many picoseconds after 150ps
# ---------- User options end here ---------------------
. . .
$MPIRUN $MPIOPT $PARTITION -args "-O -i trna.md.in -o trna.FRST_LAST.out -p trna.prm.top -c trna.PRIV_FRST.res -r trna.FRST_LAST.res -x trna.FRST_LAST.crd -v trna.FRST_LAST.vel -e trna.FRST_LAST.en -inf trna.FRST_LAST.info" $WDIR $SANDER
Invoking llrun
- pts/0 /gpfs/fs2/frontend-11/myaccount/test 0> llrun -np 32 -cwd /gpfs/fs2/frontend-11/myaccount/test -exe /gpfs/fs2/frontend-11/myaccount/test/hello
- Output:
- Submitted job frontend-11.rchland.ibm.com.1675
- Command file: llrun.myaccount.090704.1040.cmd
- Output: stdout myaccount.frontend-11.$(jobid).out, stderr myaccount.frontend-11.$(jobid).err, path /gpfs/fs2/frontend-11/myaccount/test/
- Files created:
- myaccount@frontend-11 pts/0 /gpfs/fs2/frontend-11/myaccount/test 1> ls
- myaccount.frontend-11.1675.err  myaccount.frontend-11.1675.out  llrun.myaccount.090704.1040.cmd
llrun cmd File
# @ job_type = bluegene
# @ requirements = (Machine == "$(host)")
# @ class = medium
# @ job_name = myaccount.frontend-11
# @ comment = "llrun generated jobfile"
# @ error = myaccount.frontend-11.$(jobid).err
# @ output = myaccount.frontend-11.$(jobid).out
# @ environment = COPY_ALL
# @ wall_clock_limit = 00:30:00
# @ notification = always
# @ notify_user =
# @ bg_connection = prefer_torus
# @ bg_size = 32
# @ initialdir = /gpfs/fs2/frontend-11/myaccount/test
# @ queue
/bgsys/drivers/ppcfloor/bin/mpirun -np 32 -cwd /gpfs/fs2/frontend-11/myaccount/test -exe /gpfs/fs2/frontend-11/myaccount/test/hello
LL Command Script
- pts/0 /gpfs/fs2/frontend-11/myaccount/namd_test 0> cat llrun_namd.cmd
# @ job_type = bluegene
# @ requirements = (Machine == "$(host)")
# @ class = medium
# @ job_name = myaccount.frontend-11
# @ comment = "LoadLeveler llrun script"
# @ error = $(job_name).$(jobid).err
# @ output = $(job_name).$(jobid).out
# @ environment = COPY_ALL
# @ wall_clock_limit = 00:60:00
# @ notification = never
# @ notify_user =
# @ bg_connection = prefer_torus
# @ bg_size = 256
# @ initialdir = /gpfs/fs2/frontend-11/myaccount/namd_test
# @ queue
/bgsys/drivers/ppcfloor/bin/mpirun -np 256 -verbose 1 -mode SMP -env "BG_MAPPING=TXYZ" -cwd /gpfs/fs2/frontend-11/myaccount/namd_test -exe ./namd2 -args "apoa1.namd"
LL section (the # @ lines)
mpirun section specific to the application (the final line)
mpirun Standalone Versus mpirun in the LL Environment
- Comparison between the mpirun command and the LoadLeveler llsubmit command
- The job_type and requirements tags must ALWAYS be specified as listed above
- If the above command file listing were contained in a file named my_job.cmd, the job would be submitted to the LoadLeveler queue using: llsubmit my_job.cmd
Blue Gene Monitoring Jobs: bgstatus
- Monitor the status of jobs executing on Blue Gene
- bgstatus
Blue Gene Monitoring Jobs: lljobq
Avoid Firewall Inactivity Timeout Issues
- Before (start a session):
- screen <enter>
- After (reattach to the session):
- screen -r <enter>
- More information:
- http://www.kuro5hin.org/story/2004/3/9/16838/14935
Appendix: Blue Gene Specific LL Keywords - 1
Appendix: Blue Gene Specific LL Keywords - 2
Appendix: Blue Gene Specific LL Keywords - 3
Appendix: Understanding Job Status - 1
Appendix: Understanding Job Status - 2
Appendix: Understanding Job Status - 3
Appendix: Hardware Naming Convention - 1
http://www.redbooks.ibm.com/redbooks/SG247417/wwhelp/wwhimpl/js/html/wwhelp.htm
Appendix: Hardware Naming Convention - 2
http://www.redbooks.ibm.com/redbooks/SG247417/wwhelp/wwhimpl/js/html/wwhelp.htm
Appendix: Understanding Job Status - 4
Help?
- Where to submit questions related to the Rochester IBM Center?
- bgcod@us.ibm.com
References: Blue Gene/L
- Blue Gene/L: System Administration, SG24-7178-03, IBM Redbooks, published 27 October 2006, last updated 30 October 2006
- Blue Gene/L: Safety Considerations, REDP-3983-01, IBM Redpapers, published 29 June 2006
- Blue Gene/L: Hardware Overview and Planning, SG24-6796-02, IBM Redbooks, published 11 August 2006
- Blue Gene/L: Application Development, SG24-7179-03, IBM Redbooks, published 27 October 2006, last updated 18 January 2007
- Unfolding the IBM eServer Blue Gene Solution, SG24-6686-00, IBM Redbooks, published 20 September 2005, last updated 1 February 2006
- GPFS Multicluster with the IBM System Blue Gene Solution and eHPS Clusters, REDP-4168-00, IBM Redpapers, published 24 October 2006
- Blue Gene/L: Performance Analysis Tools, SG24-7278-00, IBM Redbooks, published 18 July 2006
- IBM System Blue Gene Solution: Problem Determination Guide, SG24-7211-00, IBM Redbooks, published 11 October 2006
- http://www.redbooks.ibm.com/