Batch Systems - PowerPoint PPT Presentation

About This Presentation
Title:

Batch Systems

Description:

Title: Slide 1 Author: pnilsson Last modified by: pnilsson Created Date: 10/16/2005 6:17:46 PM Document presentation format: On-screen Show Company – PowerPoint PPT presentation

Slides: 15
Provided by: pnil

Transcript and Presenter's Notes

Title: Batch Systems


1
Batch Systems P. Nilsson, PROOF Meeting, October
18, 2005
2
Overview
  • Sun's Grid Engine (open source)
      • What is the Grid Engine project?
      • Daemon components
      • Execution daemon
      • Priorities
      • Job life cycle
  • Platform LSF (commercial system)
      • What is Platform LSF? CERN Batch Service
      • Product features
      • Architecture
      • Load sharing
      • Job life cycle
  • Other Batch Systems
      • OpenPBS, Condor, BQS
      • Maui Scheduler (open source)

3
Sun's Grid Engine
What is the Grid Engine project?
  • An open source community effort to facilitate
    the adoption of distributed computing solutions,
    sponsored by Sun
  • The project provides distributed resource
    management software for wide ranging requirements
    from compute farms to grid computing

The Grid Engine has been ported to many operating
systems, including Sun Solaris, Linux, SGI IRIX,
Compaq/HP Tru64, IBM AIX, HP-UX, Apple Mac
OS X and others. The project welcomes those who
are interested in implementing new ports or in
taking over the maintenance of an existing port.
Good documentation.
More information at http://gridengine.sunsource.net/
4
Sun's Grid Engine
Grid Engine Components
  • Qmaster (queue master): controls the overall
    behavior in a cluster; responsible for answering
    requests from clients and for delivering
    dispatched jobs to the assigned Execds
  • Schedd (scheduling daemon): gets notified about
    all scheduling-relevant information. The resulting
    scheduling decisions are sent as orders to
    Qmaster
  • Execd (execution daemon): provides Qmaster with
    information about utilization and availability of
    resources. A job sent to Execd is started by
    writing all relevant information into files
    describing the job and forking a Shepherd.
    After the Shepherd's termination, Execd reports
    back to Qmaster
  • Shepherd: starts all kinds of jobs according to
    what it finds in the per-job configuration files
    written by Execd
  • Commd (communication daemon): handles network
    communication in a cluster
  • Shadowd (shadow daemon): detects failures of the
    Qmaster and starts a new Qmaster if necessary

5
Sun's Grid Engine
Execd: The Execution Daemon
The Execution Daemon is the instance that
  • Starts jobs
  • Controls jobs (e.g. it can suspend/unsuspend a
    job, reprioritize the processes associated with a
    job, etc)
  • Gathers information about jobs (e.g. resource
    usage, exit code, etc)
  • Gathers information about the execution host it
    controls (e.g. load, free memory, etc)

There is one execd on each host of a cluster
6
Sun's Grid Engine
On Priorities
The Grid Engine has the feature of a share-based
scheduler, where each job gets a certain share of
the system resources
  • The sum of all shares for a job is expressed in
    tickets. A job has a certain number of tickets
    enabling it to run with certain process
    priorities
  • If multiple jobs are running concurrently on a
    host, their different shares of system resources
    (their different numbers of tickets) can be mapped
    to priorities in the OS
  • Setting priorities in the OS is done by either
    setting the nice value for all processes of a job
    or by using special priority mapping facilities
    provided by the OS
  • The Grid Engine reassigns the number of tickets
    per job at regular intervals. It then maps the
    number of tickets of a job to a nice value (or
    another operating system priority representation)
    and renices all processes of the job
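
As a rough illustration of the ticket-to-priority mapping described above, the sketch below maps ticket counts linearly onto the Unix nice range. The function name, the linear rule, and the job names are assumptions made for illustration; the slides do not say how the Grid Engine actually computes nice values.

```python
def tickets_to_nice(tickets_by_job, nice_best=0, nice_worst=19):
    """Map ticket counts to nice values: more tickets -> lower nice value.

    Hypothetical linear mapping, not the Grid Engine's real algorithm.
    """
    if not tickets_by_job:
        return {}
    lo = min(tickets_by_job.values())
    hi = max(tickets_by_job.values())
    if hi == lo:
        # All jobs hold the same share, so they all get the same priority.
        return {job: nice_best for job in tickets_by_job}
    span = nice_worst - nice_best
    return {
        job: nice_best + round(span * (hi - tickets) / (hi - lo))
        for job, tickets in tickets_by_job.items()
    }


# Three concurrent jobs on one host with different ticket counts.
nice_values = tickets_to_nice({"job_a": 600, "job_b": 300, "job_c": 100})
print(nice_values)
```

On a Unix host, each process of a job could then be reniced with a call such as `os.setpriority(os.PRIO_PROCESS, pid, nice_value)`, repeated at every reassignment interval.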

Much more information about the scheduler is in
the documentation
7
Sun's Grid Engine
Job Life Cycle
  1. Execds report load information to Qmaster
  2. User submits job using qsub command
  3. Qmaster notifies Schedd about new job
  4. Schedd dispatches job to an Execd
  5. Qmaster delivers job to Execd; Execd starts job
    using Shepherd
  6. At job end, Execd notifies Qmaster about job
    finish
  7. Qmaster feeds the job's resource consumption into
    the accounting database
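
The seven steps can be traced in a toy model. This is only a sketch: the class and function names, the least-loaded-host policy, and the host names are assumptions for illustration and do not reflect the Grid Engine's internal interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Execd:
    host: str
    load: float

    def run(self, job):
        # Stand-in for forking a Shepherd; returns a usage record.
        return {"job": job, "host": self.host, "exit_code": 0}


def schedd_pick_host(loads):
    # Hypothetical scheduling policy: choose the least loaded host.
    return min(loads, key=loads.get)


@dataclass
class Qmaster:
    execds: dict = field(default_factory=dict)
    loads: dict = field(default_factory=dict)
    accounting: list = field(default_factory=list)

    def report_load(self, execd):              # step 1
        self.execds[execd.host] = execd
        self.loads[execd.host] = execd.load

    def submit(self, job):                     # step 2 (triggered by qsub)
        host = schedd_pick_host(self.loads)    # steps 3 and 4
        usage = self.execds[host].run(job)     # steps 5 and 6
        self.accounting.append(usage)          # step 7
        return host


qmaster = Qmaster()
for execd in (Execd("node1", 0.9), Execd("node2", 0.2), Execd("node3", 0.5)):
    qmaster.report_load(execd)
chosen = qmaster.submit("analysis.sh")
print(chosen, qmaster.accounting)
```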

[Diagram: Execd1, Execd2, ..., ExecdN, Qmaster, Schedd and qsub, connected by arrows numbered 1 to 7 to match the steps above]
8
Platform LSF
What is Platform LSF?
  • Platform LSF (Load Sharing Facility) is a
    commercial workload management solution that
    optimizes the use of enterprise-wide resources by
    providing transparent, on demand access to
    valuable computing resources

CERN Batch Service
  • CERN Batch Service provides an LSF farm with
    1500 dual-processor machines for data analysis
    and simulation (used CPU time is charged to the
    experiments!)
  • LXPLUS is used for public logon (i.e. for
    job submission)
  • Depending on the load and the resource requirements
    of the jobs, up to 3 jobs run in parallel on
    the same node

More information at
http://batch.web.cern.ch/batch/
http://www.platform.com/Products/Platform.LSF.Family/Platform.LSF
http://www.hp.com/techservers/software/lsf.html
9
Platform LSF
Product features
  • Dynamic Load Balancing by continuous monitoring
    of system resources: CPU and memory usage, swap
    space, software license availability (!)
  • Resource-based Queuing and Scheduling. Resources
    are dynamically managed based on policies,
    schedules and thresholds; jobs submitted to any
    network-based queue are automatically processed
    as resources become available
  • Optimal Resource Sharing. Continuous resource
    management, even in the event of host failures:
    failed jobs are automatically re-run and failed
    servers restarted
  • Administrative Control and Policies. Admins can
    suspend, stop, and submit jobs from any node in a
    network; users can modify their own jobs once
    submitted to queues. Varied options are available
    for configuring workload policies, supporting
    resource sharing by users, user groups, and
    projects

10
Platform LSF
Load Sharing
To achieve load sharing, LSF must have up to date
information about the load on each machine in a
cluster. The Load Information Manager (LIM)
component is responsible for this. A LIM daemon
runs on each host of the cluster. It gathers
information about its host and makes the
information available to all hosts. The
information is organized as a load vector. The
load vector comprises a number of load indices as
described in the following table:

Load index   Description
r15s         Load average for last 15 seconds
r1m          Load average for last minute
r15m         Load average for last 15 minutes
ut           Percent CPU utilization averaged over last minute
pg           Paging (in/out) activity over last 20 seconds
ls           Number of login sessions
it           Idle time: number of minutes since last keyboard or mouse activity
tmp          Available space (MB) in /tmp file system
swp          Available swap space (MB)
mem          Available real memory (MB)

(The load averages r15s, r1m and r15m are exponentially averaged CPU run queue lengths.)
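
A load vector of this kind can be pictured as a small record per host. The sketch below is illustrative only: the `LoadVector` class, the `best_host` function, the selection rule (enough free memory, then lowest `r1m`, then lowest `ut`) and the host names are assumptions, not LSF's actual data structures or placement policy.

```python
from dataclasses import dataclass


@dataclass
class LoadVector:
    host: str
    r1m: float   # one-minute load average (CPU run queue length)
    ut: float    # CPU utilization in percent, averaged over the last minute
    mem: float   # available real memory (MB)
    swp: float   # available swap space (MB)


def best_host(vectors, min_mem_mb):
    """Pick the least loaded host that has at least min_mem_mb free."""
    candidates = [v for v in vectors if v.mem >= min_mem_mb]
    if not candidates:
        return None
    return min(candidates, key=lambda v: (v.r1m, v.ut)).host


# Hypothetical load vectors as they might be collected from three hosts.
vectors = [
    LoadVector("hostA", r1m=0.3, ut=20.0, mem=256.0, swp=1024.0),
    LoadVector("hostB", r1m=1.2, ut=80.0, mem=2048.0, swp=4096.0),
    LoadVector("hostC", r1m=0.7, ut=45.0, mem=1024.0, swp=2048.0),
]
print(best_host(vectors, min_mem_mb=512))
```

Here `hostA` is the least loaded machine but is skipped because it does not satisfy the memory requirement, which is the kind of trade-off a resource-aware placement has to make.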
11
Platform LSF
Job Life Cycle
  1. User submits a job to LSF for execution
  2. The submitted job proceeds through the batch
    library to the Load Information Manager (LIM)
  3. LIM communicates the job's information to the
    cluster's master LIM. Periodically, the LIM on
    individual machines gathers its 12 built-in load
    indices and forwards this information to the
    master LIM (see previous slide)
  4. The master LIM determines the best host to run
    the job and sends this information back to the
    submission host's LIM
  5. (Information about the chosen execution host is
    passed through the batch library)
  6. Information about the host to execute the job is
    passed back to the bsub process or lsb_submit()
    function
  7. To enter the batch system, bsub or lsb_submit()
    sends the job to the batch library
  8. Using batch library services, the job is sent to
    the mbatchd running on the cluster's master host

(Job Life Cycle, part 1)
12
Platform LSF
Job Life Cycle
  1. The mbatchd puts the job in an appropriate queue
    and waits for the appropriate time to dispatch
    the job. User jobs are held in batch queues by
    mbatchd, which checks the load information on all
    candidate hosts periodically
  2. The mbatchd dispatches the job when an execution
    host with the necessary resources becomes
    available, where it is received by the host's
    sbatchd
  3. sbatchd controls the execution of the job and
    reports the job's status to mbatchd. The sbatchd
    creates a child sbatchd to handle job execution
  4. The child sbatchd sends the job to the Remote
    Execution Server (RES)
  5. The RES creates the execution environment to run
    the job
  6. The job is run in the execution environment
  7. The results of the job are sent to the email
    system
  8. The email system sends the job's results to the
    user
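
The queue-and-dispatch behavior attributed to mbatchd above can be sketched as a periodic dispatch pass. The function name, the one-slot-per-job model and the first-host-with-a-free-slot rule are simplifying assumptions; LSF's real dispatch logic also weighs queue priorities and load thresholds.

```python
from collections import deque


def dispatch_cycle(queue, free_slots):
    """One dispatch pass: send queued jobs, in order, to hosts with a free slot."""
    dispatched = []
    while queue:
        host = next((h for h, n in free_slots.items() if n > 0), None)
        if host is None:
            break  # no capacity left; remaining jobs stay pending
        job = queue.popleft()
        free_slots[host] -= 1
        dispatched.append((job, host))
    return dispatched


pending = deque(["job1", "job2", "job3"])
slots = {"hostA": 1, "hostB": 1}

first_pass = dispatch_cycle(pending, slots)   # job3 has to wait
slots["hostA"] += 1                           # job1 finishes on hostA
second_pass = dispatch_cycle(pending, slots)  # job3 is dispatched now
print(first_pass, second_pass)
```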

(Job Life Cycle, part 2)
13
Other Batch Systems
What other batch systems are on the market?
  • OpenPBS: open source Portable Batch System.
    Unsupported version; development stopped in
    1999. Flexible batch queuing system developed
    for NASA in the early to mid-1990s. It operates
    on networked, multi-platform UNIX environments.
    Developed into a commercial PBS Pro version
    (http://www.openpbs.org; compilation requires
    hacker intervention).
    Public home: http://www-unix.mcs.anl.gov/openpbs
  • Condor: specialized workload management system
    for compute-intensive jobs. Like other
    full-featured batch systems, Condor provides a
    job queuing mechanism, scheduling policy,
    priority scheme, resource monitoring, and
    resource management. Users submit their serial or
    parallel jobs to Condor; Condor places them into
    a queue, chooses when and where to run the jobs
    based upon a policy, carefully monitors their
    progress, and ultimately informs the user upon
    completion (http://www.cs.wisc.edu/condor)
  • BQS: Batch Queuing System. Some information at
    the CC-IN2P3 web site
    (http://webcc.in2p3.fr/man/bqs/intro)

14
Maui Scheduler
  • Open source batch queuing and scheduling software
    designed to schedule parallel jobs
  • Maui can schedule the order of job execution for
    queued jobs (from other batch systems)
  • Offers many scheduling concepts: FIFO (first-in
    first-out), reservations, back-filling of
    jobs, job priorities, time-of-day scheduling, etc
  • Written in Java
  • Maui Scheduler has been designed to communicate
    directly with a database through an abstraction
    layer, but currently MySQL is the only one
    implemented, so MySQL is required
  • Considered for CASTOR2 (until LSF was chosen)
  • What is backfilling?
    Maui allows a lower priority job to be executed
    before a higher priority job if it does not delay
    the start of the prioritized job (apparently not
    found in other schedulers)
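
The backfilling rule can be made concrete with a small sketch. It assumes a single pool of identical nodes, known job runtimes, and a fixed reserved start time for the high priority job; the function names and job fields are hypothetical and this is not Maui's implementation.

```python
def can_backfill(job, free_nodes, now, reserved_start):
    """True if the job fits in the idle nodes and ends before the reservation."""
    fits = job["nodes"] <= free_nodes
    done_in_time = now + job["runtime"] <= reserved_start
    return fits and done_in_time


def pick_backfill(queue, free_nodes, now, reserved_start):
    """Select lower priority jobs that can run without delaying the reservation."""
    chosen = []
    for job in queue:  # queue is ordered by priority, highest first
        if can_backfill(job, free_nodes, now, reserved_start):
            chosen.append(job["name"])
            free_nodes -= job["nodes"]
    return chosen


# 4 nodes are idle now (t=0); the high priority job needs more nodes than
# that and is reserved to start at t=10, when running jobs finish.
waiting = [
    {"name": "short", "nodes": 2, "runtime": 5},
    {"name": "long", "nodes": 2, "runtime": 20},
    {"name": "small", "nodes": 1, "runtime": 10},
    {"name": "wide", "nodes": 4, "runtime": 3},
]
backfilled = pick_backfill(waiting, free_nodes=4, now=0, reserved_start=10)
print(backfilled)
```

The job `long` is rejected because it would still be running at the reserved start time, and `wide` is rejected because the idle nodes are already partly taken by earlier backfilled jobs.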

More information at http://mauischeduler.sourceforge.net