Running on the SDSC Blue Gene

Transcript and Presenter's Notes
1
Running on the SDSC Blue Gene
  • Mahidhar Tatineni
  • Blue Gene Workshop
  • SDSC, April 5, 2007

2
BG System Overview: SDSC's three-rack system
3
BG System Overview: Integrated system
4
BG System Overview: Multiple operating systems and
functions
  • Compute nodes run the Compute Node Kernel (CNK,
    blrts)
  • Each runs only one job at a time
  • Each uses very little memory for the CNK
  • I/O nodes run Embedded Linux
  • Run CIOD to manage compute nodes
  • Perform file I/O
  • Run GPFS
  • Front-end nodes run SuSE Linux
  • Support user logins
  • Run cross compilers and the linker
  • Run parts of mpirun to submit jobs and LoadLeveler
    to manage jobs
  • Service node runs SuSE Linux
  • Uses DB2 to manage four system databases
  • Runs control system software, including MMCS
  • Runs other parts of mpirun and LoadLeveler
  • Software comes in drivers: we are currently
    running Driver V1R3M1

5
SDSC Blue Gene: Getting started, logging on and
moving files
  • Logging on:
  • ssh bglogin.sdsc.edu
  • or
  • ssh -l username bglogin.sdsc.edu
  • Alternate login node: bg-login4.sdsc.edu
  • (We will use bg-login4 for the workshop)
  • Moving files:
  • scp file username@bglogin.sdsc.edu:
  • or
  • scp -r directory username@bglogin.sdsc.edu:

6
SDSC Blue Gene: Getting started, places to store
your files
  • /users (home directory)
  • 1.1 TB NFS-mounted file system
  • Recommended for storing source / important files
  • Do not write data/output to this area: slow and
    limited in size!
  • Regular backups
  • /bggpfs available for parallel I/O via GPFS
  • 18.5 TB accessed via IA-64 NSD servers
  • No backups
  • 700 TB /gpfs-wan available for parallel I/O and
    shared with DataStar and the TG IA-64 cluster

7
SDSC Blue Gene: Checking your allocation
  • Use the reslist command to check your allocation
    on the SDSC Blue Gene
  • Sample output is as follows:

    bg-login1 mahidhar/bg_workshop> reslist -u ux452208
    Querying database, this may take several seconds ...
    Output shown is local machine usage. For full
    usage on roaming accounts, please use tgusage.

    SBG  Blue Gene at SDSC
                                        SU Hours   SU Hours
    Name      UID     ACID  ACC  PCTG   ALLOCATED  USED   USER
    ux452208  452208  1606  U    100    99999      0      Guest8, Hpc
    MKG000            1606              99999      40

8
Accessing HPSS from the Blue Gene
  • What is HPSS?
  • The centralized, long-term data storage system at
    SDSC is the High Performance Storage System (HPSS)
  • Set up your authentication:
  • run the get_hpss_keytab script
  • Use the hsi and htar clients to connect to HPSS.
    For example:
  • hsi put mytar.tar
  • htar -c -f mytar.tar -L file_or_directory
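  • To pull data back from HPSS, the same clients
    work in reverse; a minimal sketch (mytar.tar is
    the archive name from the example above):
  • hsi get mytar.tar
  • htar -x -f mytar.tar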

9
Using the compilers: Important programming
considerations
  • Front-end nodes have different processors and run
    a different OS than the compute nodes
  • Hence codes must be cross-compiled
  • Care must be taken with configure scripts
  • Discovery of system characteristics during
    compilation (e.g., via configure) may require
    modifications to the configure script
  • Make sure that if code has to be executed during
    the configure, it runs on the compute nodes
  • Alternately, system characteristics can be
    specified by the user and the configure script
    modified to take this into account; see the
    sketch below
  • Some system calls are not supported by the
    compute node kernel
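  • As an illustration only, a cross-compiling
    configure invocation might look like the
    following; the host triple and the pre-seeded
    cache variable are assumptions for this sketch,
    not taken from the slides:
  • ./configure --host=powerpc-bgl-blrts-gnu CC=blrts_xlc FC=blrts_xlf
  • ac_cv_sizeof_long=4 ./configure ... (pre-seeding
    a result that configure would otherwise probe by
    running a test program)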

10
Using the compilers: Compiler versions, paths,
wrappers
  • Compilers (version numbers the same as on
    DataStar)
  • XL Fortran V10.1: blrts_xlf, blrts_xlf90
  • XL C/C++ V8.0: blrts_xlc, blrts_xlC
  • Paths to compilers in default .bashrc:
  • export PATH=/opt/ibmcmp/xlf/bg/10.1/bin:$PATH
  • export PATH=/opt/ibmcmp/vac/bg/8.0/bin:$PATH
  • export PATH=/opt/ibmcmp/vacpp/bg/8.0/bin:$PATH
  • Compilers with MPI wrappers (recommended)
  • mpxlf, mpxlf90, mpcc, mpCC
  • Path to MPI-wrapped compilers in default .bashrc:
  • export PATH=/usr/local/apps/bin:$PATH

11
Using the compilers: Options
  • Compiler options:
  • -qarch=440 uses only a single FPU per processor
    (minimum option)
  • -qarch=440d allows both FPUs per processor
    (alternate option)
  • -qtune=440 tunes for the 440 processor
  • -O3 gives minimal optimization with no
    SIMDization
  • -O3 -qarch=440d adds backend SIMDization
  • -O3 -qhot adds TPO (a high-level inter-procedural
    optimizer) SIMDization and more loop optimization
  • -O4 adds compile-time interprocedural analysis
  • -O5 adds link-time interprocedural analysis
  • (TPO SIMDization is the default with -O4 and -O5)
  • Current recommendation:
  • Start with -O3 -qarch=440d -qtune=440
  • Try -O4, -O5 next
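  • For example, the recommendation translates into a
    compile line like this (myapp.f is a hypothetical
    source file):
  • mpxlf -O3 -qarch=440d -qtune=440 -o myapp myapp.f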

12
Using libraries
  • ESSL
  • Version 4.2 is available in /usr/local/apps/lib
  • MASS/MASSV
  • Version 4.3 is available in /usr/local/apps/lib
  • FFTW
  • Versions 2.1.5 and 3.1.2 are available in both
    single and double precision. The libraries are
    located in /usr/local/apps/V1R3
  • NETCDF
  • Versions 3.6.0p1 and 3.6.1 are available in
    /usr/local/apps/V1R3
  • Example link path:
  • -Wl,--allow-multiple-definition
    -L/usr/local/apps/lib -lmassv -lmass -lesslbg
    -L/usr/local/apps/V1R3/fftw-3.1.2s/lib -lfftw3f
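  • Putting it together, a full compile-and-link
    command might look like the following sketch
    (mycode.c is a hypothetical source file):
  • mpcc -O3 -qarch=440d -qtune=440 -o mycode mycode.c
    -Wl,--allow-multiple-definition
    -L/usr/local/apps/lib -lmassv -lmass -lesslbg
    -L/usr/local/apps/V1R3/fftw-3.1.2s/lib -lfftw3f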

13
Running jobs: Overview
  • There are two compute modes:
  • Coprocessor (CO) mode: one compute processor per
    node
  • Virtual node (VN) mode: two compute processors
    per node
  • Jobs run in partitions or blocks
  • These are typically powers of two
  • Blocks must be allocated (or booted) before a run
    and are restricted to a single user at a time
  • Only batch jobs are supported
  • Batch jobs are managed by LoadLeveler
  • Users can monitor jobs using llq -b and llq -x,
    as shown below
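  • For example (the job step ID is hypothetical):
  • llq (list queued and running jobs)
  • llq -b (include Blue Gene block information)
  • llq -x bgsn.100.0 (extended information for one
    job step)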

14
Running jobs: LoadLeveler for batch jobs
  • Here is an example LoadLeveler run script
    (test.cmd):

    #!/usr/bin/ksh
    # @ environment = COPY_ALL
    # @ job_type = BlueGene
    # @ account_no = <your user account>
    # @ class = parallel
    # @ bg_partition = <partition name, for example: top>
    # @ output = file.$(jobid).out
    # @ error = file.$(jobid).err
    # @ notification = complete
    # @ notify_user = <your email address>
    # @ wall_clock_limit = 00:10:00
    # @ queue
    mpirun -mode VN -np <number of procs> -exe <your executable> -cwd <working directory>

  • Submit as follows:
  • llsubmit test.cmd

15
Running jobs: mpirun options
  • Key mpirun options are:
  • -mode: compute mode, CO or VN
  • -np: number of compute processors
  • -mapfile: logical mapping of processors
  • -cwd: full path of the current working directory
  • -exe: full path of the executable
  • -args: arguments of the executable (in double
    quotes)
  • -env: environment variables (in double quotes)
  • (These are mostly different from those on the
    TeraGrid)
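  • A complete invocation combining these options
    might look like the following sketch (the paths,
    process count, and argument string are
    hypothetical):
  • mpirun -mode VN -np 128
    -cwd /bggpfs/projects/myrun
    -exe /bggpfs/projects/myrun/myapp
    -args "-n 1000"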

16
Running jobs: Partition Layout and Usage
Guidelines
  • To make effective use of the Blue Gene,
    production runs should generally use one-fourth
    or more of the machine, i.e., 256 or more compute
    nodes. Thus predefined partitions are provided
    for production runs:
  • SDSC: all 3,072 nodes
  • R01R02: 2,048 nodes combining racks 1 and 2
  • rack, R01, R02: all 1,024 nodes of rack 0, rack 1,
    and rack 2, respectively
  • top, bot, R01-top, R01-bot, R02-top, R02-bot: 512
    nodes each
  • top256-1, top256-2: 256 nodes in each half of the
    top midplane of rack 0
  • bot256-1, bot256-2: 256 nodes in each half of the
    bottom midplane of rack 0
  • Smaller 64-node (bot64-1, ..., bot64-8) and
    128-node (bot128-1, ..., bot128-4) partitions are
    available for test runs
  • Use the /usr/local/apps/utils/showq command to
    get more information on the partition requests of
    jobs in the queue

17
Running jobs: Partition Layout
18
Running Jobs: Reservation
  • There is a reservation in place for today's
    workshop for all the guest users
  • The reservation ID is bgsn.76.r
  • Set the LL_RES_ID variable to bgsn.76.r. This
    will automatically bind jobs to the reservation.
  • csh/tcsh: setenv LL_RES_ID bgsn.76.r
  • bash: export LL_RES_ID=bgsn.76.r

19
Running Jobs: Example 1
  • The examples featured in today's talk are
    included in the following directory:
  • /bggpfs/projects/bg_workshop
  • Copy them to your directory by using the
    following command:
  • cp -r /bggpfs/projects/bg_workshop
    /users/<your_dir>
  • In the first example we will compile a simple MPI
    program (mpi_hello_c.c / mpi_hello_f.f) and use
    the sample LoadLeveler script (example1.cmd) to
    submit and run the job

20
Example 1 (contd.)
  • Compile the example files using the mpcc/mpxlf
    wrappers:
  • mpcc -o hello mpi_hello_c.c
  • mpxlf -o hello mpi_hello_f.f
  • Modify the LoadLeveler submit file
    (example1.cmd): add the account number, partition
    name, email address, and mpirun options
  • Use llsubmit to put the job in the queue:
  • llsubmit example1.cmd

21
Running Jobs: Example 2
  • In example 2 we will use an I/O benchmark (IOR)
    to illustrate the use of arguments with mpirun
  • The mpirun line is as follows:
  • mpirun -np 64 -mode CO
    -cwd /bggpfs/projects/bg_workshop
    -exe /bggpfs/projects/bg_workshop/IOR
    -args "-a MPIIO -b 32m -t 4m -i 3"
  • The -mode, -exe, and -args options are used in
    this example. The -args option is used to pass
    options to the IOR executable.

22
Checkpoint-Restart on the Blue Gene
  • Checkpoint and restart are among the primary
    techniques for fault recovery on the Blue Gene
  • The current version of the checkpoint library
    requires users to manually insert checkpoint
    calls at the proper places in their code
  • The process can be initialized by calling the
    BGLCheckpointInit() function
  • Checkpoint files can be written by making a call
    to BGLCheckpoint(). This can be done any number
    of times, and the checkpoint files are
    distinguished by a sequence number (see the
    sketch below).
  • The environment variables BGL_CHKPT_RESTART_SEQNO
    and BGL_CHKPT_DIR_PATH control the restart
    sequence number and the checkpoint file location
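  • A minimal C sketch of placing these calls; the
    no-argument prototypes are an assumption for
    illustration, so consult the IBM application
    development guide for the exact API:

    #include <mpi.h>

    /* From libchkpt.rts.a; the exact signatures are assumed here */
    void BGLCheckpointInit(void);
    void BGLCheckpoint(void);

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        BGLCheckpointInit();            /* initialize checkpoint support */
        int nsteps = 4000;              /* hypothetical iteration count */
        for (int step = 1; step <= nsteps; step++) {
            /* ... one solver step ... */
            if (step % 1000 == 0)
                BGLCheckpoint();        /* files tagged with a sequence number */
        }
        MPI_Finalize();
        return 0;
    }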

23
Example for Checkpoint-Restart
  • Let us look at the entire checkpoint restart
    process using the example provided in the
    /bggpfs/projects/bg_workshop directory.
  • We are using a simple Poisson solver to
    illustrate the checkpoint process (file
    poisson-chkpt.f)
  • Compile the program using mpxlf and including the
    checkpoint library
  • mpxlf o pchk poisson-chkpt.f /bgl/BlueLight/ppcfl
    oor/bglsys/lib/libchkpt.rts.a
  • Use the chkpt.cmd file to submit the job
  • The program writes checkpoint files after every
    1000 steps. The checkpoint files are tagged with
    the node ids and the sequence number. For
    example
  • ckpt.x06-y01-z00.1.2

24
Example for Checkpoint-Restart (contd.)
  • Verify that checkpoint/restart works
  • From the first run (when the checkpoint files
    were written):
  • Done Step 3997 Error 1.83992678887004613
  • Done Step 3998 Error 1.83991115295111185
  • Done Step 3999 Error 1.83989551716504351
  • Done Step 4000 Error 1.83987988151185511
  • Done Step 4001 Error 1.83986424599153198
  • Done Step 4002 Error 1.83984861060408078
  • Done Step 4003 Error 1.83983297534951951
  • From the second run (continued from step 4000,
    sequence 4):
  • Done Step 4000 Error 1.83987988151185511
  • Done Step 4001 Error 1.83986424599153198
  • Done Step 4002 Error 1.83984861060408078
  • We get identical results from both runs

25
References
  • Blue Gene Web site at SDSC:
  • http://www.sdsc.edu/us/resources/bluegene
  • LoadLeveler guide:
  • http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.loadl.doc/loadl331/am2ug30305.html
  • Blue Gene Application Development guide (IBM
    Redbooks):
  • http://www.redbooks.ibm.com/abstracts/sg247179.html