SGE and Modules: Getting Started in CRC/HPCC

1
SGE and Modules: Getting Started in CRC/HPCC
  • In-Saeng Suh and Rich Sudlow
  • OIT, Univ. of Notre Dame

2
Contents
  • Overview: http://crc.nd.edu
  • Modules
  • Examples in Modules
  • ND-HPCC Batch System
  • SGE in ND
  • SGE Commands
  • Examples in SGE Client Commands

3
Overview
  • Introduction to ND-HPCC
  • Provides faculty, staff, and graduate students at
    ND with high-end facilities and applications for their
    research.
  • Facilities
  • http://crc.nd.edu/resources/facilities.shtml
  • Software
  • http://crc.nd.edu/resources/software.shtml

4
Using Modules
  • What is a module?
  • module is the user interface to the Modules package.
  • The Modules package provides for the dynamic
    modification of the user's environment
    via modulefiles.
  • Modules provide a lot of flexibility, at the cost of
    some added complexity. HPCC staff try to
    minimize this complexity for users by loading
    frequently used modules by default, e.g.,
    SGE.

5
  • What is a module? (continued)
  • A modulefile contains the information needed to
    configure the shell for a specific
    application.
  • Modules typically add to the PATH, MANPATH, and
    LD_LIBRARY_PATH variables.
  • Modules give a user access to multiple versions
    of software, e.g., matlab/7.0_SP2 (default),
    6.1, 6.5, 7.0, 7.0_SP3.
  • Usually, system administrators set the default to
    the latest version.
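As a rough illustration of what loading a module amounts to, the sketch below prepends one application's directories to PATH, MANPATH, and LD_LIBRARY_PATH by hand and then removes them again. It uses Bourne-shell syntax rather than csh, and the install prefix is hypothetical; the Modules package makes these changes through modulefiles, not through code like this.

```shell
# Hypothetical install prefix, used only for illustration.
app_root="/opt/example/matlab/7.0_SP2"

# "Load": put the application's directories first in each search path.
PATH="$app_root/bin:$PATH"
MANPATH="$app_root/man${MANPATH:+:$MANPATH}"
LD_LIBRARY_PATH="$app_root/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# The first PATH entry is the one searched first for commands.
first_path_entry="${PATH%%:*}"
echo "first PATH entry after load: $first_path_entry"

# "Unload": strip the entries that were added above.
PATH="${PATH#"$app_root/bin:"}"
MANPATH="${MANPATH#"$app_root/man"}"
MANPATH="${MANPATH#:}"
LD_LIBRARY_PATH="${LD_LIBRARY_PATH#"$app_root/lib"}"
LD_LIBRARY_PATH="${LD_LIBRARY_PATH#:}"
```

Because the new entry goes at the front of each variable, the requested version wins over any other copy found later in the search path.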

6
Modules at ND-HPCC
  • ND-HPCC has implemented and enhanced modules
    to manage the user environment.
  • Modules have been used for a number of
    years on the SGI architecture; they were added
    around September 2001 for the Sun HPCC
    environment and later for the Linux systems.
  • In general, ND users are not required to
    explicitly set environment variables
    for applications installed in ND-AFS.

7
Module Initialization
  • The module package and the module command are
    initialized when a shell-specific initialization
    script is sourced into the shell.
  • This is already done in /usr/local/Startup/Cshrc
  • Most modulefiles are created by the system
    administrators, although users may create their
    own.
  • Many modules are loaded by default when a
    user logs in (or uses the batch system).

8
Module Initialization
  • Modules can be written to prevent conflicts,
    e.g., loading two different versions of an
    application. They may also be written to force a user
    to specifically request (and understand)
    their changes.
  • For example, two versions of matlab cannot be
    loaded simultaneously.

9
Module Subcommands
module ____
  • avail
  • List all available modulefiles in the current
    MODULEPATH
  • list
  • List all modules currently loaded.
  • load | add
  • Load modulefile into the shell environment.
  • unload | rm
  • Remove modulefile from the shell environment.

10
Module Subcommands (II)
  • swap | switch modulefile1 modulefile2
  • Switch loaded modulefile1 to modulefile2.
  • show | display modulefile
  • Display information about a modulefile, e.g.,
    the full path and environment changes.
  • help
  • Print the usage of each sub-command.

> man module
  • The which command searches your PATH for a
    match for the name specified after which.
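A short session using the subcommands above might look like the following sketch. The matlab version strings are the ones listed earlier in these slides; whether they exist on a given system is an assumption. The guard is there only so the sketch does nothing harmful in a shell where the Modules package has not been initialized.

```shell
# Run the subcommands only where the module command is available.
if command -v module >/dev/null 2>&1; then
    module avail                             # modulefiles found in MODULEPATH
    module list                              # modules currently loaded
    module load matlab/7.0                   # load a specific version
    module swap matlab/7.0 matlab/7.0_SP3    # replace it with another version
    module show matlab/7.0_SP3               # path and environment changes
    module unload matlab/7.0_SP3             # remove it again
    modules_demo="ran"
else
    modules_demo="skipped"
    echo "module command not found; skipping the demonstration"
fi
```

Using swap rather than a second load avoids the conflict that arises when two versions of the same application are requested together.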

11
Modulefiles
  • Modulefiles are written in Tcl (Tool Command
    Language) and are interpreted by modulecmd.
  • Environment variables set by a modulefile are unset when
    the modulefile is unloaded.
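To make the Tcl format concrete, the sketch below writes a small modulefile to a scratch location from the shell. The application name, install prefix, and file location are all hypothetical, and this is not the sample shown in the original presentation; it only uses standard modulefile commands (module-whatis, conflict, prepend-path, setenv).

```shell
# Scratch location for the illustrative modulefile (hypothetical name).
modulefile="${TMPDIR:-/tmp}/example_app_1.0.modulefile"

# Quoted here-document: the Tcl text is written exactly as shown.
cat > "$modulefile" <<'EOF'
#%Module1.0
## Hypothetical modulefile for an application called example_app.
module-whatis "example_app 1.0 (illustrative only)"

# Refuse to load alongside another version of the same application.
conflict example_app

set root /opt/example/example_app/1.0
prepend-path PATH             $root/bin
prepend-path MANPATH          $root/man
prepend-path LD_LIBRARY_PATH  $root/lib
setenv       EXAMPLE_APP_HOME $root
EOF

echo "wrote $modulefile"
```

On unload, the prepend-path entries are removed and the variable created with setenv is unset, which is the behavior described in the bullet above.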

12
Modulefile Sample

13
  • The HPCC Batch System
  • Serial queue
  • x86: 64 dual-CPU Xeon (each 2 SMP)
  • x86_64: 144 dual-core Opteron
    (dcopt)
  • Parallel queue: Xeon (64), Opteron (16),
    dcopt (288)
  • Serial/parallel SMP queue: Sun Solaris 5 V880
  • Batch queuing system
  • We use SGE (Sun Grid Engine) 5.3, which is
    released as SGEEE (Sun Grid Engine
    Enterprise Edition).

15
SGE in ND-AFS
  • All SGE commands and environment setup are
    contained in the sge/5.3 module which is loaded
    by default.
  • It allows for transparent use of the AFS file
    system used extensively at Notre Dame.
  • The token lifetime used for all batch jobs:
  • 720 hours, or 30 days

16
SGE Commands
  • SGE's command-line user interface
  • Manage queues, submit and delete jobs, and
    check the status of jobs and queues.
  • Prerequisite
  • Skip commands that are not appropriate for
    batch work (configuring the prompt, setting the
    delete character, etc.).
  • Near the top of your .login and/or .cshrc files, add
    if ( $?ENVIRONMENT != 0 ) exit 0
  • This stops execution of your interactive
    commands.
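The same check can be sketched in Bourne-shell syntax for readers who do not use csh. It relies on the assumption, implied by the slide, that SGE sets the ENVIRONMENT variable for batch jobs and that interactive logins leave it unset; the function name is hypothetical.

```shell
# Succeeds when ENVIRONMENT is set, which is what the csh test above checks.
is_batch_shell() {
    [ -n "${ENVIRONMENT+set}" ]
}

if is_batch_shell; then
    echo "batch shell: skipping interactive setup"
else
    echo "interactive shell: prompt and key settings would be configured here"
fi
```

Placing the check before any prompt or terminal configuration keeps those commands from running, and possibly failing, inside a batch job that has no terminal.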

17
SGE client commands
  • qsub
  • The user interface for submitting a job to SGE.
  • qstat
  • A status listing of all jobs and queues
    associated with the cluster.
  • qdel
  • To delete SGE jobs, regardless of whether they are
    running or spooled.
  • qalter
  • Alters the attributes of already submitted but
    still pending jobs.

18
Examples: Job submission
  • To run the job from the command line, type
  • > qsub -l arch=solaris64 -M afs_id@nd.edu -m ae -r
    y a.out
  • To submit a batch script job file, e.g.,
    sample.job

#!/bin/csh
#$ -l arch=solaris64
#$ -M afs_id@nd.edu
#$ -m ae
#$ -r y
a.out
(a.out is an executable compiled on Solaris)
> qsub sample.job

19
qsub: Options to SGE (I)
The following options can be given to qsub on the
command line, or preceded with #$ in batch
scripts.
  • -M your_afs_id@nd.edu (optional)
  • Specify an address where SGE should send email
    about your job.
  • -m abe (optional)
  • Tell SGE to send email to the specified address
    if the job aborts, begins, or ends.
  • -r y or n (optional)
  • Tell SGE whether your job is rerunnable. Most jobs
    are rerunnable, but application jobs such as
    Gaussian are not; for those, specify -r n.
20
qsub: Options to SGE (II)
  • -j y or n (optional)
  • Specify whether or not the standard error stream
    of the job is merged into the standard output
    stream.
  • The default is to merge the standard error and the
    standard output streams.
  • -pe pe_name # of processors (required for parallel
    jobs!!)
  • Specify the parallel environment (e.g., smp-xeon) and
    the # of processors your job will need for
    parallel jobs. The default is one CPU. The
    maximum # of CPUs that can be requested is
    specified in the parallel queue.

!! Jobs requesting a large number of CPUs might
spend a long time waiting in the queue.

21
qsub: Options to SGE (III)
  • -l Request a resource
  • cput=hh:mm:ss (optional)
  • Requests this much CPU time to
    run your job. cput is the sum of all the
    time used by all threads in parallel jobs.
  • Not currently used, but may be beneficial in the
    future.
  • The limit is the token lifetime.
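As a worked example of the limit, the sketch below converts a cput request in hh:mm:ss form to seconds and compares it with the 720-hour token lifetime stated earlier. The 36:00:00 value is taken from the example job scripts; the variable names are hypothetical, and the check is an illustration, not something SGE runs.

```shell
cput="36:00:00"            # requested CPU time, hh:mm:ss
token_lifetime_hours=720   # token lifetime stated for batch jobs

# Split hh:mm:ss into its three fields.
IFS=: read -r hh mm ss <<EOF
$cput
EOF

# Strip one leading zero so values such as "08" are not read as octal.
hh="${hh#0}"; hh="${hh:-0}"
mm="${mm#0}"; mm="${mm:-0}"
ss="${ss#0}"; ss="${ss:-0}"

requested_seconds=$(( hh * 3600 + mm * 60 + ss ))
limit_seconds=$(( token_lifetime_hours * 3600 ))

if [ "$requested_seconds" -le "$limit_seconds" ]; then
    echo "cput=$cput ($requested_seconds s) is within the token lifetime"
else
    echo "cput=$cput ($requested_seconds s) exceeds the token lifetime"
fi
```

Here 36 hours is 129600 seconds, well under the 2592000 seconds in 720 hours.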

!! Note that it is better to run several shorter jobs
than one long job.

22
  • -l Request a resource (continued)
  • arch=glinux or solaris64
  • (Optional; depends
    on architecture)
  • Requests resources of this architecture type to
    run your job on.
  • glinux specifies the Linux architecture (32/64
    bit), while solaris64 specifies the Sun Solaris
    64-bit architecture.
  • Specifying an architecture is unnecessary if you
    are running an application which is available on
    both architectures, e.g., Gaussian.
    (The job may start sooner if no architecture is
    specified.)

23
  • arch=glinux or solaris64 (continued)
  • Compilation and execution depend on which
    architecture is used.

!! Note that by default the batch queuing system
will run the job on the fastest system
that meets the requirements you
specify.
25
Examples: Job submission
  • Application jobs: Gaussian, MATLAB, ...

E.g., sample_qsub.job
#!/bin/csh
#$ -l cput=36:00:00
#$ -M your_afs_id@nd.edu
#$ -m ae
#$ -r n
g03 < testDFT.com
> qsub sample_qsub.job

26
Parallel Job submission
  • To submit a batch script job file, e.g.,
    sample_parallel.job

#!/bin/csh
#$ -l cput=36:00:00
#$ -l arch=glinux
#$ -pe smp-xeon 4
#$ -M afs_id@nd.edu
#$ -m ae
module load mpich/.
mpirun -np 4 ./a.out
(a.out is compiled with the MPI or OpenMP
library on the x86 Xeon architecture)
> qsub sample_parallel.job
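In a parallel job the slot count given to -pe and the process count given to mpirun need to agree. One way to keep them in step, sketched below, is to generate the job script from a single shell variable. The file location and the slots variable are hypothetical, the module load line is left out because its version string is not given in full, and the script is only written, not submitted.

```shell
slots=4    # number of CPUs to request and MPI processes to start
job_file="${TMPDIR:-/tmp}/sample_parallel.job"

# Unquoted here-document: $slots is expanded, and \$ keeps the literal
# "#$" prefix that marks an embedded qsub option.
cat > "$job_file" <<EOF
#!/bin/csh
#\$ -l cput=36:00:00
#\$ -l arch=glinux
#\$ -pe smp-xeon $slots
#\$ -M afs_id@nd.edu
#\$ -m ae
mpirun -np $slots ./a.out
EOF

echo "wrote $job_file requesting $slots slots"
```

Changing slots in one place then updates both the resource request and the mpirun command.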
27
Parallel Job Submission
E.g., gaussian.job
#!/bin/csh
#$ -l cput=36:00:00
#$ -pe smp-xeon 4
#$ -M your_afs_id@nd.edu
#$ -m ae
#$ -r n
g03l < testDFT.com
> qsub gaussian.job

28
SGE client commands
  • qsub
  • The user interface for submitting a job to SGE.
  • qstat
  • The status of all jobs and queues associated with
    the cluster.
  • qdel
  • To cancel SGE jobs, regardless of whether they
    are running or spooled.
  • qalter
  • Changes the attributes of already submitted but
    still pending jobs.

29
qstat: Options on qstat (I)
Jobs can be monitored using the qstat command.
  • qstat without arguments prints the
    status of all jobs. For each job it shows:
  • The job ID number
  • Priority of job
  • Name of job
  • ID of user who submitted job
  • Submit or start time and date of the job
  • If running, the queue in which the job is running
  • The function of the running job (MASTER or SLAVE)
  • The job array task ID

30
qstat: Options on qstat (II)
  • State of the job. States can be
  • t(ransferring)
  • r(unning)
  • s(uspended)
  • S(uspended)
  • T(hreshold)
  • R(estarted)
  • qstat -j Job-ID
  • Prints the reason for not being scheduled, either
    for all pending jobs or for the jobs contained in
    the job list.
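The state column can combine several of the letters above, so a small helper that spells them out may be handy when reading qstat output. The sketch below covers only the letters listed on this slide and reports anything else as "other"; the function name is hypothetical, and no qstat call is made.

```shell
# Translate a qstat job-state string, letter by letter, into words.
describe_job_state() {
    state="$1"
    description=""
    while [ -n "$state" ]; do
        rest="${state#?}"              # everything after the first letter
        letter="${state%"$rest"}"      # the first letter itself
        case "$letter" in
            t) word="transferring" ;;
            r) word="running" ;;
            s) word="suspended" ;;
            S) word="suspended (S)" ;;
            T) word="threshold" ;;
            R) word="restarted" ;;
            *) word="other ($letter)" ;;
        esac
        description="${description:+$description, }$word"
        state="$rest"
    done
    echo "$description"
}

describe_job_state Rr
describe_job_state t
```

The first call decodes a combined state (restarted and running); the second decodes a single letter.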

31
qstat: Options on qstat (III)
  • qstat -f Job-ID
  • Provides a full listing of the job which has the
    listed Job-ID (or all jobs if no Job-ID is
    given).
  • The information printed for each queue:
  • The queue name
  • The queue type Types or combinations of types
    can be
  • B(atch)
  • I(nteractive)
  • C(heckpointing)
  • P(arallel)
  • T(ransfer)
  • The number of used and available job slots
  • The load average on the queue host
  • The architecture of the queue host

32
qstat: Options on qstat (IV)
  • qstat -f Job-ID (continued)
  • The state of the queue. Queue states or
    combinations of states can be
  • u(nknown)
  • a(larm)
  • A(larm)
  • C(alendar suspended)
  • s(uspended)
  • S(ubordinate)
  • d(isabled)
  • D(isabled)
  • E(rror)

33
SGE client commands
  • qdel
  • To delete SGE jobs, regardless of whether they are
    running or pending.
  • Usage: qdel Job_ID
  • Deletes the job that matches the Job ID.
  • qalter
  • Usage: qalter Job_ID

34
SGE client commands
  • qhost
  • displays status information about SGE execution
    hosts.
  • qmon
  • An X-windows Motif command interface and
    monitoring facility.

35
qmon GUI

36
Thank you!!