Interactive and Batch Jobs HP XC 6000 Cluster


1
Interactive and Batch Jobs
HP XC 6000 Cluster
  • Horst Gernert
  • University of Karlsruhe
  • Computing Center
  • Gernert@rz.uni-karlsruhe.de

2
Software Environment for running (MPI) programs
  • We have built our own environment, the Job
    Management System (JMS). The XC batch user
    interface described in the original HP
    documentation is not available to the users
    on our system.
  • JMS is based on HP MPI (mpirun) and Slurm.
  • Slurm (Simple Linux Utility for Resource
    Management) is part of the CHAOS project,
    developed at the Lawrence Livermore National
    Laboratory (LLNL).

3
Architecture of the system
(Figure: nodes for login and pre-/postprocessing)
4
Model of Operation
  • Run programs interactively (login nodes).
  • Run programs in batch mode (batch nodes).
  • Development and test (development class)
  • Partition d (2 nodes): resources (nodes, CPUs,
    memory) are shared.
  • Production (production class)
  • Partition p (99 nodes): resources are used
    exclusively.
  • Partition f (4/8 nodes): nodes are shared, but
    CPUs and memory are used exclusively.

5
Architecture of the system (Jan. 2005)
(Figure: thin nodes for MPI applications; Quadrics QSNet II interconnect; nodes for login and pre-/postprocessing. Legend: login node, compute node, service node.)
6
Software Environment for running (MPI) programs
  • User commands:
  • job_run (alias mpirun, alias mpirun.mpich):
    run an MPI program.
  • job_submit: create a batch job.
  • job_queue: show all batch jobs of the user.
  • job_cancel: delete a job.
  • job_info: show limits and other JMS information.

7
General behavior of the job_... commands
  • Option -h prints a short syntax help.
  • Option -H prints a detailed description of the
    command (like man).
  • Messages go to STDOUT.
  • Errors go to STDERR.
  • If the command fails, the return code is > 0.
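Because failure is signalled by the return code and errors go to STDERR, a calling script can detect problems without parsing the messages. A minimal Python sketch, assuming only that convention; `run_jms_command` is a hypothetical helper, not part of JMS:

```python
import subprocess
from typing import Sequence


def run_jms_command(args: Sequence[str]) -> str:
    """Run a job_... command and return its STDOUT, raising on failure.

    Hypothetical helper. It relies only on the convention from the slide:
    messages on STDOUT, errors on STDERR, non-zero exit code on failure.
    Python reports death-by-signal as a negative code, so any non-zero
    value is treated as a failure here.
    """
    result = subprocess.run(list(args), capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the error text the command wrote to STDERR.
        raise RuntimeError(
            f"{args[0]} failed with code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout


# Hypothetical usage on the cluster:
# print(run_jms_command(["job_queue"]))
```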

8
job_run
  • job_run -? -help -H
  • job_run -version -v -d -p -ck -j -T -1sided -mpich
      -i <spec> -np <tasks> -n <tasks> -distribution=block|cyclic
      -stdio=<options> -f <appfile> program
  • mpirun and job_run are equivalent.
  • mpirun.mpich is equivalent to job_run -mpich.
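The block/cyclic choice decides how MPI ranks are laid out over the allocated nodes. The sketch below shows the usual meaning of the two policies (as in the block and cyclic distributions of Slurm's srun); that job_run maps ranks exactly this way is an assumption, and `place_tasks` is a hypothetical helper:

```python
from typing import List


def place_tasks(ntasks: int, nnodes: int, policy: str) -> List[int]:
    """Return the node index for each MPI rank 0..ntasks-1 (hypothetical).

    block:  consecutive ranks fill one node before moving to the next;
            each node takes at most ceil(ntasks / nnodes) ranks.
    cyclic: ranks are dealt round-robin over the nodes.
    """
    if ntasks < 1 or nnodes < 1:
        raise ValueError("ntasks and nnodes must be positive")
    if policy == "block":
        per_node = -(-ntasks // nnodes)  # ceiling division
        return [rank // per_node for rank in range(ntasks)]
    if policy == "cyclic":
        return [rank % nnodes for rank in range(ntasks)]
    raise ValueError(f"unknown distribution: {policy}")


print(place_tasks(8, 4, "block"))   # [0, 0, 1, 1, 2, 2, 3, 3]
print(place_tasks(8, 4, "cyclic"))  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Block placement keeps neighbouring ranks on the same node, which helps codes that mostly exchange data with adjacent ranks; cyclic placement spreads consecutive ranks across nodes.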

9
job_submit
  • job_submit -? -h -H
  • job_submit -t time -m mem -c class -p number -T time -M mem
      -i file -o file -e file job parameter
  • -h: help
  • -?: help
  • -H: more help

10
job_submit (continued)
  • -t time: maximum CPU time per thread/CPU
    (minutes)
  • -T time: maximum elapsed time of the job
    (minutes)
  • -m mem: maximum real memory requirement per
    process/task (Mbyte)
  • -p i/j: number of processes (tasks) i
    and threads per task j
  • -c class: job class (class = priority)
  • class = d: development class
  • class = p: production class
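The -p value combines two counts: i tasks with j threads each. A minimal sketch that parses it and derives the job's footprint, assuming one CPU per thread, that j defaults to 1 when omitted, and that -m is in Mbyte per task (the unit shown by job_info); `parse_p_spec` and `job_footprint` are hypothetical helpers:

```python
from typing import Tuple


def parse_p_spec(spec: str) -> Tuple[int, int]:
    """Parse a -p value "i" or "i/j" into (tasks, threads_per_task).

    Hypothetical helper; j = 1 when omitted is an assumption.
    """
    tasks_str, _, threads_str = spec.partition("/")
    tasks = int(tasks_str)
    threads = int(threads_str) if threads_str else 1
    if tasks < 1 or threads < 1:
        raise ValueError(f"invalid -p value: {spec!r}")
    return tasks, threads


def job_footprint(spec: str, mem_per_task_mb: int) -> Tuple[int, int]:
    """Return (CPUs, total memory in Mbyte), assuming one CPU per thread."""
    tasks, threads = parse_p_spec(spec)
    return tasks * threads, tasks * mem_per_task_mb


# Same values as the example "job_submit -p8/2 -cp -t200 -m6000 ..."
print(job_footprint("8/2", 6000))  # (16, 48000)
```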

11
job_submit (continued)
  • -i file: standard input (default /dev/null)
  • -o file: standard output (default Job_JID.out)
  • -e file: standard error (default Job_JID.err)
  • If file then stderr → stdout (2>&1)
  • job: name of the executable (program /
    shell script / ...)
  • parameter: if you want to pass parameters to
    the job, write them after the job name.

12
job_submit (continued)
  • Parameters can also be set as environment
    variables: "export JMS_<parameter>=value"
  • Examples: JMS_t=time, JMS_o=file, JMS_job=job
  • Parameters set on the command line override
    parameters set in the environment.
  • job_submit itself sets the environment and
    exports it to the user job.
  • Examples:
      job_submit -p 4 -c d -t 20 -m 100 test_job -x 1000 5 16 0
      job_submit -p1 -cp -t2000 -T3000 -m2000 big_serial_job
      job_submit -p 64 -c p -t 2000 -m 1000 big_parallel_job 4 10
      job_submit -p8/2 -cp -t200 -m6000 -M9000 2_threads_per_task
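The precedence rule (command line over JMS_<parameter> environment variables) amounts to a dictionary merge. This sketch assumes the parameter names match the option letters, as in JMS_t and JMS_o; `effective_parameters` is a hypothetical helper, not JMS code:

```python
import os
from typing import Dict, Mapping, Optional


def effective_parameters(
    cli: Mapping[str, str], environ: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Merge JMS_<parameter> environment settings with command-line options.

    Hypothetical helper. Environment values are read first; command-line
    values then overwrite them, mirroring the rule on the slide.
    """
    env = os.environ if environ is None else environ
    prefix = "JMS_"
    params = {k[len(prefix):]: v for k, v in env.items() if k.startswith(prefix)}
    params.update(cli)  # the command line wins
    return params


env = {"JMS_t": "20", "JMS_o": "run.log", "HOME": "/home/user"}
print(effective_parameters({"t": "60"}, env))  # {'t': '60', 'o': 'run.log'}
```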

13
job_info
  • job_info
  • job_info -? -h -H
  • -h help
  • -? help
  • -H more help
  • job_info gives you information about the limits
    of the job classes.

14
job_info (output)
    c   q     i     j     m (Mbyte)  t (min.)  T (min.)   T (min.)
        max.  max.  max.  max.       max.      default    max.
    d   1     8     4     1000       60        2 × t      240
    p   6     64    2     12000      4320      1.1 × t    5400
              128   1     6000
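The limits can be read as validation rules for a job_submit request. The sketch hard-codes the values from the job_info output and derives the default elapsed time T as 2 × t (class d) or 1.1 × t (class p). Treating the second p line (128 tasks, 1 thread, 6000 Mbyte) as an alternative limit set with the same t and T limits, and the exact order of checks, are assumptions; `Limits` and `check_request` are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Limits:
    max_tasks: int        # i
    max_threads: int      # j, threads per task
    max_mem_mb: int       # m, Mbyte per task
    max_cpu_min: int      # t, minutes per thread/CPU
    t_factor: float       # default T = t_factor * t
    max_elapsed_min: int  # T, minutes


# Transcribed from the job_info output; the second "p" entry is an assumption.
CLASS_LIMITS: Dict[str, List[Limits]] = {
    "d": [Limits(8, 4, 1000, 60, 2.0, 240)],
    "p": [Limits(64, 2, 12000, 4320, 1.1, 5400),
          Limits(128, 1, 6000, 4320, 1.1, 5400)],
}


def check_request(cls: str, tasks: int, threads: int, mem_mb: int,
                  t: int, T: Optional[int] = None) -> int:
    """Return the effective elapsed-time limit T or raise ValueError."""
    if cls not in CLASS_LIMITS:
        raise ValueError(f"unknown job class: {cls}")
    for lim in CLASS_LIMITS[cls]:
        if (tasks <= lim.max_tasks and threads <= lim.max_threads
                and mem_mb <= lim.max_mem_mb):
            break  # first limit set that fits the request
    else:
        raise ValueError("tasks/threads/memory exceed the class limits")
    if t > lim.max_cpu_min:
        raise ValueError("CPU time t exceeds the class limit")
    elapsed = T if T is not None else round(lim.t_factor * t)
    if elapsed > lim.max_elapsed_min:
        raise ValueError("elapsed time T exceeds the class limit")
    return elapsed


print(check_request("p", 46, 1, 5000, 1200))  # 1320
```

The call above yields T = 1320 for t = 1200 in class p, the same pair of values shown for job 3649 in the job_queue listing.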

15
job_queue
  • job_queue -? -h -H
  • -h help
  • -? help
  • -H more help
  • Lists all of the user's jobs in the JMS job queue.

16
job_queue (output)
    job-id  c  P  n/i/j    t      T     m     queued   s   start   node(s)
    ---------------------------------------------------------------------------
    3652    d  d  1/1/1    30     120   200   04/1321  r   4/1321  112
    3649    p  p  23/46/1  1200   1320  5000  04/0334  r   4/0907  14-17,20-23,30,77-90
    3650    p  p  13/25/1  2000   2200  3000  04/0814  r   4/1201  54-66
    3651    p  p  4/8/1    2400   2640  3000  04/0827  r   4/1201  31-34
    3642    p  p  16/16/1  10000  4730  4096  03/1908  w
    3654    p  p  4/8/1    4300   4730  4000  04/0909  w
    3563    p  p  16/8/2   240    250   6000  03/2301  Lw  21
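The node(s) column uses a compressed notation of single node numbers and inclusive ranges. A minimal sketch that expands it, assuming the format seen in the listing (comma-separated items, "a-b" ranges, stray spaces ignored); `expand_nodes` is a hypothetical helper. For job 3649 it yields 23 nodes, matching the first number of its n/i/j field (23/46/1):

```python
from typing import List


def expand_nodes(spec: str) -> List[int]:
    """Expand a node list such as "14-17,20-23,30" into node numbers.

    Hypothetical helper; ranges "a-b" are inclusive and whitespace
    around items is ignored.
    """
    nodes: List[int] = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        first, sep, last = item.partition("-")
        if sep:
            nodes.extend(range(int(first), int(last) + 1))
        else:
            nodes.append(int(first))
    return nodes


nodes = expand_nodes("14-17,20-23,30,77-90")
print(len(nodes))  # 23
```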

17
job_cancel
  • job_cancel -? -h -H
  • job_cancel jid jid ...
  • -h help
  • -? help
  • -H more help
  • jid: job identifier (first column of the
    job_queue listing).
  • You can only address your own jobs.
  • job_cancel deletes one or more jobs from the
    job queue.
  • If a job is running, it is killed.

18
Batch jobs
  • job_submit myjob.bash
      #!/bin/bash
      ...
      job_run myprogram
      ...
  • The environment and the current directory are
    passed on to the user job.
  • JMS sets additional environment variables.

19
Batch jobs (continued)
  • If the job script contains only one command line,
    the script can be omitted:
      job_submit -p1 ... myprogram
      job_submit -p? ... job_run myprogram      (? > 1)
  • TMP is set by JMS and points to a local
    file system on each node,
  • NOT one per task!!!
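Since all tasks on a node share the same TMP, they must not use identical file names there. One way is a private subdirectory per task. This sketch takes the task's rank as an argument (how a program obtains it is left open) and falls back to the system default when TMP is unset; `task_tmpdir` is a hypothetical helper:

```python
import os
import tempfile


def task_tmpdir(rank: int) -> str:
    """Create and return a private scratch directory for one task.

    Hypothetical helper. TMP is per node, not per task, so each task
    gets its own directory. mkdtemp creates a new, uniquely named
    directory atomically, so tasks (and other jobs sharing the node)
    cannot collide even if they use the same rank.
    """
    base = os.environ.get("TMP") or tempfile.gettempdir()
    return tempfile.mkdtemp(prefix=f"task{rank}_", dir=base)
```

Each task should remove its directory (for example with shutil.rmtree) before it exits, since the local file system is shared with other tasks on the node.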

20
Batch jobs (chaining)
  • job_submit ... myjob.bash
      #!/bin/bash
      job_run ... myprogram
      if ...
      then
        job_submit
      fi
      exit
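The condition that decides whether the job resubmits itself is elided on the slide. One common pattern is a step counter kept in a file in the working directory; the sketch below shows only that decision logic. The counter file, the number of steps, and the resubmit command are hypothetical, and the command is returned rather than executed:

```python
from pathlib import Path
from typing import List, Optional


def next_chain_step(counter_file: Path, total_steps: int,
                    resubmit_cmd: List[str]) -> Optional[List[str]]:
    """Advance a persistent step counter; return the command to resubmit.

    Hypothetical helper. Returns None once total_steps chained jobs
    have run, which ends the chain.
    """
    done = int(counter_file.read_text()) if counter_file.exists() else 0
    done += 1
    counter_file.write_text(str(done))
    return resubmit_cmd if done < total_steps else None


# Hypothetical usage at the end of a chained job:
# cmd = next_chain_step(Path("chain.count"), 3, ["job_submit", "myjob.bash"])
# if cmd:
#     subprocess.run(cmd, check=True)
```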

21
todo list
  • Integration of "fat" nodes.
  • Give jobs a name.
  • Send email from JMS when a job starts, ends, ...
  • History of jobs.
  • Information about system state, maintenance,
    workload, ...
  • What else???

22
End
  • Thank you very much!
  • Tel.
    07231/608-6422

  • Gernert@rz.uni-karlsruhe.de