1
LSF for Users
  • Mike Page
  • mpage_at_ucar.edu
  • SCD Consulting Services Group
  • SCD/HSS/CSG

2
What is LSF?
  • LSF - Load Sharing Facility
  • Batch Management Subsystem for multi-host, multi-vendor
    complexes
  • Same role as LoadLeveler or NQE, with the capability to manage
    computing resources across multiple platforms
  • LSF runs on the Lightning cluster
  • Documentation: /usr/local/docs/LSF/6.0/.pdf
  • Hardware description:
    http://www.scd.ucar.edu/docs/lightning/overview.html
  • At a lightning command line, enter: man lsfintro
  • Further reading: http://accl.grc.nasa.gov/lsf/about.html

3
To be able to access LSF
  • This has been added to your login processing:
    . /usr/local/lsf/conf/profile.lsf (sh users)
    or
    source /usr/local/lsf/conf/cshrc.lsf (csh users)
  • These commands are executed before you receive a command
    prompt.
  • There is no need for you to add anything to your login files
    in order to use LSF.
  • These commands define the LSF environment: LSF_SERVERDIR,
    LSF_BINDIR, LSF_LIBDIR, XLSF_UIDDIR, LSF_ENVDIR, PATH, MANPATH
  • Check: env | grep -i lsf
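
A slightly more explicit check than grepping the environment can be sketched as a small shell function that reports each LSF variable named on this slide. The function name check_lsf_env is hypothetical (it is not an LSF command), and PATH and MANPATH are left out because they are set on any login whether or not LSF is configured.

```shell
#!/bin/sh
# Report which of the LSF environment variables named on the slide are set.
# check_lsf_env is a hypothetical helper name, not part of LSF.
check_lsf_env() {
    missing=0
    for name in LSF_SERVERDIR LSF_BINDIR LSF_LIBDIR XLSF_UIDDIR LSF_ENVDIR; do
        # Indirect lookup: read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            echo "$name=$value"
        else
            echo "$name is not set"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every variable was set.
    [ "$missing" -eq 0 ]
}

check_lsf_env || echo "LSF environment looks incomplete"
```

On a login where the LSF profile has been sourced, every line should show a path and the final message should not appear.
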
4
Essential Commands for Users
  • bhosts
  • bqueues
  • bsub
  • bjobs
  • bhist
  • bpeek
  • bmod
  • bbot/btop
  • bswitch
  • bstop/bresume
  • bkill

5
Essential Commands: Purpose
  • bhosts - information about available hosts
    (lshosts)
  • bqueues - information about available queues
  • bsub - submit jobs to batch subsystem
  • bjobs - list jobs in the batch subsystem
  • bhist - displays historical information about
    user's jobs
  • bpeek - displays stdout and stderr of user's
    unfinished job
  • bmod - modifies job submission options for user's
    job

6
Essential Commands: Purpose (contd)
  • bbot/btop - moves a pending job relative to
    user's last/first job in a queue
  • bswitch - switches user's unfinished jobs from
    one queue to another
  • bstop/bresume - suspends/resumes user's
    unfinished jobs
  • bkill - sends signals to kill, suspend, or resume
    user's unfinished jobs

7
Essential Commands: bhosts
  • bhosts [-w | -l] [-R "res_req"] [host_name | host_group]
  • Displays information about hosts/platforms
  • lshosts [-w | -l] [-R "res_req"] [host_name | cluster_name]
  • lshosts -s [shared_resource_name ...]
  • Displays hosts and their static resource
    information
  • Example output of bhosts on ln0126en:

    HOST_NAME   STATUS   JL/U   MAX   NJOBS   RUN   SSUSP   USUSP   RSV
    ln0126en    ok       -      2     0       0     0       0       0
    ln0127en    ok       -      2     0       0     0       0       0
    ln0128en    ok       -      2     0       0     0       0       0
    ...
    ln0440en    ok       -      2     0       0     0       0       0
    ln0441en    ok       -      2     0       0     0       0       0
    ln0442en    ok       -      2     0       0     0       0       0
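
With several hundred hosts, the listing is easier to read as a summary. The sketch below assumes the column order shown above (STATUS in column 2, MAX in column 4, NJOBS in column 5); summarize_bhosts is a hypothetical helper name, and a few rows copied from the slide stand in for a live bhosts call so the example can run anywhere.

```shell
#!/bin/sh
# Summarize bhosts-style output read from standard input.
# summarize_bhosts is a hypothetical helper; the column layout is assumed
# to be HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV.
summarize_bhosts() {
    awk 'NR > 1 && NF >= 5 {
             hosts++
             if ($2 == "ok") ok++
             slots += $4          # MAX: job slots configured on the host
             used  += $5          # NJOBS: job slots currently in use
         }
         END { printf "hosts=%d ok=%d free_slots=%d\n", hosts, ok, slots - used }'
}

# Sample rows copied from the slide stand in for a live "bhosts" call.
summarize_bhosts <<'EOF'
HOST_NAME   STATUS   JL/U   MAX   NJOBS   RUN   SSUSP   USUSP   RSV
ln0126en    ok       -      2     0       0     0       0       0
ln0127en    ok       -      2     0       0     0       0       0
ln0128en    ok       -      2     0       0     0       0       0
EOF
```

For the three sample rows this prints hosts=3 ok=3 free_slots=6. On the cluster the same function would be fed with bhosts | summarize_bhosts. A host whose MAX is shown as "-" counts as zero slots in this sketch.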

8
Essential Commands: bqueues
  • bqueues [-w | -l | -r] [-m host_name | -m all]
    [-u user_name | -u all] [queue_name ...]
  • Displays information about queues.
  • By default, returns the following information
    about all queues: queue name, queue priority,
    queue status, job slot statistics, and job state
    statistics.
  • Example output of bqueues on ln0126en:

    QUEUE_NAME   PRIO   STATUS        MAX   JL/U   JL/P   JL/H   NJOBS   PEND   RUN   SUSP
    special      500    Open:Active   -     -      -      -      0       0      0     0
    premium      300    Open:Active   -     -      -      -      0       0      0     0
    regular      200    Open:Active   -     -      -      -      0       0      0     0
    economy      160    Open:Active   -     -      -      -      0       0      0     0
    hold         104    Open:Active   -     -      -      -      0       0      0     0
    standby      100    Open:Active   -     -      -      -      0       0      0     0
    share        100    Open:Active   -     -      -      -      0       0      0     0
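
When choosing a queue for bsub -q, it can help to list only the queues that are accepting and dispatching jobs, highest priority first. This is a rough sketch, assuming the STATUS column reads Open:Active as LSF prints it and that PRIO is the second column; list_open_queues is a hypothetical helper name, and rows from the slide stand in for a live bqueues call.

```shell
#!/bin/sh
# Print "queue priority" for every Open:Active queue, highest priority first.
# list_open_queues is a hypothetical helper; input is bqueues-style text.
list_open_queues() {
    awk 'NR > 1 && $3 == "Open:Active" { print $1, $2 }' | sort -k2,2nr
}

# Sample rows copied from the slide stand in for a live "bqueues" call.
list_open_queues <<'EOF'
QUEUE_NAME   PRIO   STATUS        MAX   JL/U   JL/P   JL/H   NJOBS   PEND   RUN   SUSP
regular      200    Open:Active   -     -      -      -      0       0      0     0
special      500    Open:Active   -     -      -      -      0       0      0     0
premium      300    Open:Active   -     -      -      -      0       0      0     0
EOF
```

The sample prints special 500, premium 300, regular 200, one per line. On the cluster the input would come from bqueues | list_open_queues.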

9
Essential Commands: bsub
  • bsub [options] command [cmd_args]
  • Submits a job for batch execution

10
Essential Commands: bsub (contd)
  • bsub [options] command [cmd_args]

11
Essential Commands: bsub (contd)
  • bsub [options] command [cmd_args]

12
The Importance of Being <
  • LSF usage is different from LL/NQS
  • bsub a.out
  • bsub -n 2 a.out
  • bsub myscript
  • bsub -q queuename a.out
  • bsub -i infile -o outfile -e errfile a.out
  • bsub < myscript
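
The redirection matters because bsub parses embedded #BSUB lines only when the script arrives on standard input (bsub < myscript). With bsub myscript, the script is simply the command to run and its #BSUB lines are ordinary shell comments. As a small sketch, the hypothetical helper show_bsub_options below lists the options that a script carries in its #BSUB lines. It only inspects the file and submits nothing; the temporary file name is illustrative.

```shell
#!/bin/sh
# Print the options embedded in "#BSUB" lines of a job script, one per line,
# with any trailing "# comment" text removed.
# show_bsub_options is a hypothetical inspection helper, not an LSF command.
show_bsub_options() {
    sed -n 's/^#BSUB[[:space:]]*//p' "$1" | sed 's/[[:space:]]*#.*$//'
}

# Write a tiny example script to a temporary file and inspect it.
demo=${TMPDIR:-/tmp}/demo.$$.lsf
cat > "$demo" <<'EOF'
#!/bin/ksh
#BSUB -n 1                # number of tasks
#BSUB -q regular          # queue
./a.out
EOF
show_bsub_options "$demo"
rm -f "$demo"
```

For the example script this prints -n 1 and -q regular on separate lines, the same options that would otherwise be given as bsub -n 1 -q regular ./a.out.
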
13
Sample LSF script: Serial Job
#!/bin/ksh
#
# LSF batch script to run a serial code
#
#BSUB -P 93300070          # Project 93300070
#BSUB -n 1                 # number of tasks
#BSUB -J seriallsf.test    # job name
#BSUB -o seriallsf.out     # output filename
#BSUB -e seriallsf.err     # error filename
#BSUB -q regular           # queue

# Fortran example
pgf90 -o samp_f -Mextend samp.f
./samp_f

# C example
pgcc -o samp_c samp.c
./samp_c

# C++ example
pgCC --no_auto_instantiation -o samp_cc samp.cc
./samp_cc

Submit with: bsub < serial.lsf
14
Sample LSF script: MPI Job
#!/bin/ksh
#
# LSF batch script to run the test MPI code
#
#BSUB -P 93300070          # Project 93300070
#BSUB -a mpich_gm          # select the mpich-gm elim
#BSUB -x                   # exclusive use of node (not_shared)
#BSUB -n 2                 # number of total tasks
#BSUB -R "span[ptile=1]"   # run 1 task per node
#BSUB -J mpilsf.test       # job name
#BSUB -o mpilsf.out        # output filename
#BSUB -e mpilsf.err        # error filename
#BSUB -q regular           # queue

# Fortran example
mpif90 -o mpi_samp_f mpisamp.f
mpirun.lsf ./mpi_samp_f

# C example
mpicc -o mpi_samp_c mpisamp.c
mpirun.lsf ./mpi_samp_c

# C++ example
mpicxx -o mpi_samp_cc mpisamp.cc
mpirun.lsf ./mpi_samp_cc

Submit with: bsub < mpi.lsf
15
Sample LSF script: OpenMP Job
#!/bin/ksh
#
# LSF script to run the test OMP codes
#
#BSUB -P 93300070          # Proposal group 2 - Project 93300070
#BSUB -a mpich_gm          # select the mpich-gm elim
#BSUB -x                   # exclusive use of node
#BSUB -n 2                 # number of tasks
#BSUB -R "span[hosts=1]"   # jobs run on one host
#BSUB -J omplsf.test       # job name
#BSUB -o omplsf.out        # output filename
#BSUB -e omplsf.err        # error filename
#BSUB -q regular           # queue

# Fortran example
pgf90 -o samp_f -Mextend -mp samp.f
export OMP_NUM_THREADS=1
./samp_f
export OMP_NUM_THREADS=2
./samp_f

# C example
pgcc -mp -o samp_c samp.c
export OMP_NUM_THREADS=1
./samp_c
export OMP_NUM_THREADS=2
./samp_c

# C++ example
pgCC --no_auto_instantiation -mp -o samp_cc samp.cc
export OMP_NUM_THREADS=1
./samp_cc
export OMP_NUM_THREADS=2
./samp_cc

Submit with: bsub < omp.lsf
16
Sample LSF script: MPMD Job
#!/bin/ksh
#
# LSF batch script to run the test MPMD codes
#
#BSUB -P 93300070          # Project 93300070
#BSUB -a mpich_gm
#BSUB -n 2
#BSUB -x
#BSUB -R "span[ptile=1]"
#BSUB -o mpmdlsf.out       # output filename
#BSUB -e mpmdlsf.err       # error filename
#BSUB -J mpmdlsf.test      # job name
#BSUB -q regular           # queue

# Build pgfile for mpmd run
rm -f pgfile
touch pgfile
EXE=../bin/itmpmd
j=0
for h in `echo $LSB_HOSTS`
do
  echo $h" "$j" "$EXE$j >> pgfile
  j=`expr $j + 1`
done
cat pgfile

# Fortran example
mpif90 -Mextend -o $EXE'0' ../src/mpmd/itmpmd.f
mpif90 -Mextend -o $EXE'1' ../src/mpmd/itmpmd.f
mpirun -pg pgfile /bin/pwd

# C example
mpicc -o $EXE'0' ../src/mpmd/itmpmd.c
mpicc -o $EXE'1' ../src/mpmd/itmpmd.c
mpirun -pg pgfile /bin/pwd

# C++ example
mpicxx --no_auto_instantiation -o $EXE'0' ../src/mpmd/itmpmd.cc
mpicxx --no_auto_instantiation -o $EXE'1' ../src/mpmd/itmpmd.cc
mpirun -pg pgfile /bin/pwd

rm $EXE'0' $EXE'1' pgfile

Submit with: bsub < mpmd.lsf
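
To see what the pgfile loop produces without submitting a job, the same logic can be run by hand. This is a dry-run sketch: LSB_HOSTS is normally set by LSF for a running job, so the two host names here (borrowed from the bhosts example) are only a stand-in, and build_pgfile is a hypothetical function name. It writes to standard output rather than to pgfile.

```shell
#!/bin/sh
# Dry run of the pgfile loop from the MPMD script, outside of LSF.
# LSB_HOSTS is set by LSF inside a job; this value is an assumed stand-in.
LSB_HOSTS="ln0126en ln0127en"
EXE=../bin/itmpmd

build_pgfile() {
    j=0
    for h in $LSB_HOSTS; do
        # One line per host: host name, index j, executable name with j appended.
        echo "$h $j $EXE$j"
        j=$((j + 1))
    done
}

build_pgfile
```

With the two stand-in hosts this prints "ln0126en 0 ../bin/itmpmd0" followed by "ln0127en 1 ../bin/itmpmd1", which is the file the script later hands to mpirun -pg.
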
17
Sample LSF script: Hybrid Job
#!/bin/ksh
#
# LSF batch script to run the test mixed MPI/OMP codes
#
#BSUB -a mpich_gm          # select mpich_gm elim
#BSUB -x                   # exclusive use of node
#BSUB -n 2                 # sum of number of tasks
#BSUB -R "span[ptile=1]"   # number of processes per node
#BSUB -o mixlsf.out        # output filename
#BSUB -e mixlsf.err        # error filename
#BSUB -J mixlsf.test       # job name
#BSUB -q regular           # queue

# Build pgfile for mix run
rm -f pgfile
touch pgfile
EXE=$PWD/mix
echo $LSB_HOSTS
j=0
for h in `echo $LSB_HOSTS`
do
  echo $h" "$j" "$EXE >> pgfile
  j=`expr $j + 1`
done

# Fortran example
mpif90 -Mextend -mp -lmp -o mix mix.f
export OMP_NUM_THREADS=1
mpirun-env.pl -pg pgfile $EXE
export OMP_NUM_THREADS=2
mpirun-env.pl -pg pgfile $EXE

# C example
mpicc -mp -o mix mix.c
export OMP_NUM_THREADS=1
mpirun-env.pl -pg pgfile $EXE
export OMP_NUM_THREADS=2
mpirun-env.pl -pg pgfile $EXE

# C++ example
mpicxx --no_auto_instantiation -mp -o mix mix.cc
export OMP_NUM_THREADS=1
mpirun-env.pl -pg pgfile $EXE
export OMP_NUM_THREADS=2
mpirun-env.pl -pg pgfile $EXE

rm pgfile

Submit with: bsub < mix.lsf
18
Essential Commands: bjobs
  • bjobs - Displays information about LSF jobs
  • bjobs -u user_name
  • bjobs -u all
  • bjobs -l
  • bjobs -r
  • bjobs -s
  • bjobs -q queue_name

19
Essential Commands: bhist
  • bhist - displays historical information about
    jobs
  • bhist -J job_name
  • bhist -C start_time,end_time
  • bhist -D start_time,end_time
  • bhist -S start_time,end_time
  • bhist -T start_time,end_time

20
Essential Commands: bpeek
  • bpeek - displays stdout and stderr of user's
    selected, unfinished job
  • bpeek -f uses tail -f to display output instead
    of cat
  • bpeek [-f] [-q queue_name | -m host_name | -J job_name |
    job_ID | "job_ID[index_list]"]

21
Essential Commands: bmod
  • bmod - modifies job submission options of a
    job
  • bmod [bsub options] [job_ID | "job_ID[index]"]
  • bmod -g job_group_name | -gn [job_ID]
  • bmod [-sla service_class_name | -slan] [job_ID]
  • bmod [-h | -V]

22
Essential Commands: bbot, btop
  • bbot - moves a pending job relative to the last
    job in the queue
  • bbot job_ID | "job_ID[index_list]" [position]
  • bbot -h | -V
  • btop - moves a pending job relative to the first
    job in the queue
  • btop job_ID | "job_ID[index_list]" [position]
  • btop -h | -V

23
Essential Commands: bswitch
  • bswitch - switches unfinished jobs from one
    queue to another
  • bswitch [-J job_name] [-m host_name | -m host_group]
    [-q queue_name] [-u user_name | -u user_group | -u all]
    destination_queue [0]
  • bswitch destination_queue [job_ID | "job_ID[index_list]"] ...
  • bswitch [-h | -V]

24
Essential Commands: bstop/bresume
  • bstop - suspends unfinished jobs
  • bstop [-a] [-d] [-g job_group_name | -sla service_class_name]
    [-J job_name] [-m host_name | -m host_group]
    [-q queue_name] [-u user_name | -u user_group | -u all] [0]
    [job_ID | "job_ID[index]"] ...
  • bstop [-h | -V]
  • bresume - resumes one or more suspended jobs
  • bresume [-g job_group_name] [-J job_name] [-m host_name]
    [-q queue_name] [-u user_name | -u user_group | -u all] [0]
  • bresume [job_ID | "job_ID[index_list]"] ...
  • bresume [-h | -V]

25
Essential Commands: bkill
  • bkill - sends signals to kill, suspend, or
    resume unfinished jobs
  • bkill [-l] [-g job_group_name | -sla service_class_name]
    [-J job_name] [-m host_name | -m host_group]
    [-q queue_name] [-r | -s (signal_value | signal_name)]
    [-u user_name | -u user_group | -u all]
    [job_ID ... | 0 | "job_ID[index]" ...]
  • bkill [-h | -V]

26
Questions? Comments?