Title: The Bologna Batch System: Flexible Policy with Condor
1. The Bologna Batch System: Flexible Policy with Condor
2. The Bologna Batch System
- Custom batch scheduling system for local users at INFN in Bologna, Italy.
  - Istituto Nazionale di Fisica Nucleare
- Dr. Paolo Mazzanti initiated the idea.
- Implemented on a small subset of machines within the larger nationwide INFN Condor pool
  - INFN Condor Pool: 300 CPUs
  - INFN-Bologna Condor: 100 CPUs
  - Bologna Batch System: 50 CPUs
3. Where We Started
- Basic Condor Policy
  - Opportunistic resources
    - Jobs only run when machines are otherwise idle
    - Jobs can be preempted for machine owners or higher-priority users
  - Fair-share across INFN pool
    - Highest priority user in the pool gets first crack at a given resource
    - The more you use, the worse your priority becomes
- Some problems
  - Long-running vanilla jobs (with no checkpointing) were frequently preempted before running to completion
  - Users dislike waiting for a resource if they only want to run a short job
  - High-priority users from other INFN sites run on local resources while lower-priority local users wait.
4. BBS Policy Requirements
- Prioritize local work
  - Share resources, but run outside jobs as backfill
  - Treat local servers as dedicated resources for local jobs, but opportunistic resources for other jobs.
  - Run outside Condor jobs only if the server is idle.
  - Run local batch jobs regardless of other system load or console activity.
  - Preempt outside Condor jobs to allow local batch jobs to run, but don't preempt local jobs for outside work.
5. BBS Policy Requirements
- Ensure resource availability for both short and long-running jobs
  - Prioritize short batch jobs so that they are never kept waiting by long batch jobs.
  - Prevent long batch jobs from being preempted or starved by short jobs.
- Never waste resources
  - No idle CPUs when jobs are waiting to run!
  - No preemption of vanilla jobs!
  - Preemption is ideal if you can checkpoint, but here we can't
6. A Contradiction!
- No way to guarantee resource availability for short or long jobs without reserving some CPUs for each
- ...But no way to avoid idle CPUs without allowing them to start any kind of job
  - If CPUs reserved for short jobs are used for long jobs, they become unavailable to run short jobs.
  - If CPUs reserved for short jobs are not used for long jobs, they're being wasted when there are no short jobs to run.
- What to do, what to do...
7. A Solution!
- Allow resources to be temporarily overcommitted
  - We treat one CPU as two
  - On a two-CPU machine, define four Condor VMs (virtual machines): two for short jobs and two for long jobs.
- Allow jobs to be suspended rather than preempted
  - Think of it as checkpointing to swap
- OR allow jobs to be de-prioritized temporarily
  - If memory is adequate, allow suspended long jobs to continue running at a poor OS priority and steal cycles whenever active short jobs are busy doing I/O.
8. Everybody wins!
- Short jobs start right away on dedicated short VMs
- Long jobs aren't preempted by short jobs, but rather suspend temporarily or run at a lower priority.
- Outside jobs run only when no Bologna jobs are waiting.
- All CPUs available to all types of jobs.
- No idle CPUs when jobs are waiting.
9. Okay, how?
- Flipside of flexibility is complexity!
- It's pretty cool that Condor allows you to combine dedicated and opportunistic scheduling in one system, but it takes a bit of work to get it all set up
- Luckily for y'all, we've already done the hard part, and now you can copy it.
10. Copy it from where?
- Bologna Batch System document
  - http://www.cs.wisc.edu/pfc/bbs.doc
  - A detailed walk-through of the specific policies and the necessary Condor configuration to make each one work.
  - Line by line examples of how we implemented each.
- What's in it? Let's take a look...
11. First Step: No hand waving!!
- Bologna Batch Jobs are specially-designated jobs which may run only on specially-designated Bologna Batch Servers.
- Only users in Bologna may submit Bologna Batch Jobs.
- Bologna Batch Jobs must be vanilla-universe jobs (and therefore are not capable of checkpointing and resuming), and thus once they start they must not be preempted for other jobs.
- Bologna Batch Servers prefer Bologna Batch Jobs over other Condor jobs, and will start Bologna Batch Jobs regardless of system load or console activity.
- There are two types of Bologna Batch Jobs, short-running and long-running. Bologna Batch Jobs are assumed to be short-running unless they are explicitly labeled as long-running when they are submitted.
- A short-running Bologna Batch Job must not be forced to wait for the completion of a long-running Bologna Batch Job before starting.
- When short and long-running Bologna Batch Jobs are running simultaneously on the same physical machine, the short-running job processes should run at a lower (better) OS priority than the long-running jobs.
- A short-running Bologna Batch Job may only run for one hour, after which point it should be killed and removed from the queue.
- Bologna Batch Jobs have priority over other Condor jobs. This means two things: other jobs must never preempt Bologna Batch Jobs, and Bologna Batch Jobs must always immediately preempt other jobs.
12. Review
- Job
- Requirements
- Machine
- START
- PREEMPT
- RANK
- WANT_SUSPEND
- JOB_RENICE_INCREMENT
- PREEMPTION_REQUIREMENTS
- STARTD_EXPRS, SUBMIT_EXPRS
13. Requirement 1: Bologna Batch Jobs are specially-designated jobs which may run only on specially-designated Bologna Batch Servers.
- To identify the servers, place into the local Condor config:
    BolognaBatchServer = True
    STARTD_EXPRS = $(STARTD_EXPRS) BolognaBatchServer
- To identify Bologna Batch Jobs, insert the following line into their job submit description files:
    +BolognaBatchJob = True
- Now that Bologna Batch Jobs and Servers can identify one another, users ensure that Bologna Batch Jobs run only on Bologna Batch Servers by specifying a job requirement:
    Requirements = (BolognaBatchServer == True)
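The two-way identification for Requirement 1 can be modeled outside Condor. What follows is a minimal Python sketch, not Condor code: plain dictionaries stand in for ClassAds, and `job_requirements_met` and the machine names are hypothetical, invented only for illustration.

```python
# Toy model of ClassAd matching: plain dicts stand in for machine ads.
# Every name in this block is an illustrative assumption, not a Condor API.

def job_requirements_met(machine_ad):
    """Mimic the job-side requirement that BolognaBatchServer is True.

    A machine that never advertises the attribute does not match
    (in Condor the expression would be undefined, which also fails to match).
    """
    return machine_ad.get("BolognaBatchServer") is True


bbs_machine = {"Name": "vm1@bbs-node", "BolognaBatchServer": True}
other_machine = {"Name": "vm1@desktop"}  # attribute never advertised

print(job_requirements_met(bbs_machine))
print(job_requirements_met(other_machine))
```

The point of the sketch is that a job carrying this requirement can only land on machines that explicitly advertise themselves as Bologna Batch Servers.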
14. Requirement 2: Only users in Bologna may submit Bologna Batch Jobs.
- Each Bologna Batch Server double-checks the origin of a job claiming to be a Bologna Batch Job:
    IsBBJob = ( TARGET.BolognaBatchJob =?= True \
                && TARGET.SUBMIT_SITE_DOMAIN == $(SUBMIT_SITE_DOMAIN) )
- SUBMIT_SITE_DOMAIN is an attribute that INFN defines on all machines, and which they previously configured the Condor schedd to automatically add to each job's classad. Individual Condor users are not able to override it:
    SUBMIT_SITE_DOMAIN = "$(UID_DOMAIN)"
    SUBMIT_EXPRS = $(SUBMIT_EXPRS) SUBMIT_SITE_DOMAIN
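To make the double-check concrete, here is a small Python sketch of the server-side test. It is only a model: dictionaries stand in for job ClassAds, the domain strings are made-up examples, and `is_bb_job` is a hypothetical helper rather than anything Condor provides.

```python
# Toy model of the server-side IsBBJob check.
# The domain values are invented placeholders for $(SUBMIT_SITE_DOMAIN).

LOCAL_SUBMIT_SITE_DOMAIN = "bo.example"


def is_bb_job(job_ad):
    """True only if the job claims to be a Bologna Batch Job AND was submitted locally."""
    # A missing attribute simply counts as "not True", like the =?= meta-operator.
    claims_bb = job_ad.get("BolognaBatchJob") is True
    local_origin = job_ad.get("SUBMIT_SITE_DOMAIN") == LOCAL_SUBMIT_SITE_DOMAIN
    return claims_bb and local_origin


local_job = {"BolognaBatchJob": True, "SUBMIT_SITE_DOMAIN": "bo.example"}
outside_claimant = {"BolognaBatchJob": True, "SUBMIT_SITE_DOMAIN": "elsewhere.example"}
ordinary_job = {"SUBMIT_SITE_DOMAIN": "bo.example"}

for ad in (local_job, outside_claimant, ordinary_job):
    print(is_bb_job(ad))
```

A job from another site that sets the Bologna attribute itself is still rejected, because the origin attribute is added by the schedd and not by the user.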
15. Requirement 3: BB Jobs must be vanilla-universe jobs, and thus once they start they must not be preempted
- Next we modified each Bologna Batch Server's WANT_SUSPEND_VANILLA and PREEMPT expressions, which Condor uses to decide when to suspend or preempt a vanilla job, so that INFN's default preemption policy would only affect non-Bologna Batch Jobs:
    IsNotBBJob = ( $(IsBBJob) =!= True )
    WANT_SUSPEND_VANILLA = ( $(IsNotBBJob) && ($(WANT_SUSPEND_VANILLA)) )
    PREEMPT = ( $(IsNotBBJob) && ($(PREEMPT)) )
16. Requirement 4: Bologna Batch Servers prefer Bologna Batch Jobs over other Condor jobs, and will start Bologna Batch Jobs regardless of system load or console activity
    RANK = $(IsBBJob)
    INFN_START = ( (LoadAvg - CondorLoadAvg) < 0.3 \
                   && KeyboardIdle > (15 * 60) \
                   && TotalCondorLoadAvg < 1.0 )
    START = ( $(IsBBJob) || ($(INFN_START)) )
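The effect of this START policy can be checked with a short Python sketch. It is a model under stated assumptions: the thresholds (0.3, 15 minutes, 1.0) come from the slide, strict comparisons are assumed, and the function and parameter names are invented for illustration.

```python
# Toy model of START for Requirement 4: Bologna Batch Jobs always start,
# other jobs start only when the machine looks idle by INFN's rule.

def infn_start(load_avg, condor_load_avg, keyboard_idle_s, total_condor_load_avg):
    """INFN's default opportunistic rule (strict comparisons are an assumption)."""
    return (
        (load_avg - condor_load_avg) < 0.3
        and keyboard_idle_s > 15 * 60
        and total_condor_load_avg < 1.0
    )


def start(is_bb_job, **machine_state):
    """Bologna Batch Jobs bypass the load and console checks entirely."""
    return is_bb_job or infn_start(**machine_state)


busy = dict(load_avg=2.0, condor_load_avg=0.0, keyboard_idle_s=10, total_condor_load_avg=0.0)
idle = dict(load_avg=0.1, condor_load_avg=0.0, keyboard_idle_s=3600, total_condor_load_avg=0.0)

print(start(True, **busy), start(False, **busy), start(False, **idle))
```

On the busy machine only the Bologna Batch Job starts; the outside job has to wait for the idle case.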
17. Requirement 5: There are two types of Bologna Batch Jobs, short-running and long-running. Bologna Batch Jobs are assumed to be short-running unless they are explicitly labeled as long-running when they are submitted.
- Declare long-running jobs by placing the following into the submit file:
    +LongRunningJob = True
- Then in the config file, take advantage of meta-operators:
    IsLongBBJob = ( $(IsBBJob) && TARGET.LongRunningJob =?= True )
    IsShortBBJob = ( $(IsBBJob) && TARGET.LongRunningJob =!= True )
18. Requirement 6: A short-running Bologna Batch Job must not be forced to wait for the completion of a long-running Bologna Batch Job before starting.
- Declare more Virtual Machines than there are actual CPUs (dual CPU: 2 short VMs, 4 long)
    NUM_SHORT_RUNNING_VMS = 2
    IsShortRunningVM = (VirtualMachineID <= $(NUM_SHORT_RUNNING_VMS))
    IsLongRunningVM = (VirtualMachineID > $(NUM_SHORT_RUNNING_VMS))
- Change the start expression
    SHORT_RUNNING_VM_START = ( $(IsShortBBJob) \
                               || ( $(IsNotBBJob) && $(INFN_START) ) )
    LONG_RUNNING_VM_START = $(IsLongBBJob)
    START = ( ( $(IsShortRunningVM) && $(SHORT_RUNNING_VM_START) ) \
              || ( $(IsLongRunningVM) && $(LONG_RUNNING_VM_START) ) )
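The split between short and long VMs can be traced with a small Python sketch. It assumes VM IDs start at 1 and that the first `NUM_SHORT_RUNNING_VMS` IDs are the short VMs; the job-kind labels and the `vm_start` helper are invented for this example and do not exist in Condor.

```python
# Toy model of the per-VM START policy for Requirement 6.

NUM_SHORT_RUNNING_VMS = 2  # value from the slide; VM IDs assumed to start at 1


def vm_start(vm_id, job_kind, infn_start_ok=False):
    """Decide whether a VM accepts a job.

    job_kind is 'short_bb', 'long_bb' or 'outside' (labels made up for this sketch).
    infn_start_ok says whether INFN's idle-machine rule currently holds.
    """
    if vm_id <= NUM_SHORT_RUNNING_VMS:
        # Short VMs: short Bologna jobs always, outside jobs only as backfill.
        return job_kind == "short_bb" or (job_kind == "outside" and infn_start_ok)
    # Long VMs: long Bologna jobs only.
    return job_kind == "long_bb"


for vm_id in (1, 2, 3, 4):
    print(vm_id, vm_start(vm_id, "short_bb"), vm_start(vm_id, "long_bb"))
```

Because short and long jobs never compete for the same VM, a long job occupying a long VM cannot keep a short job waiting.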
19. Requirement 7: When short and long-running BB Jobs are running simultaneously on the same physical machine, the short-running job processes should run at a lower (better) OS priority
    JOB_RENICE_INCREMENT = \
        ( 5 + ( 10 * ( LongRunningJob =?= True \
                       || BolognaBatchJob =!= True ) ) )
- If LongRunningJob is true in the job classad, the expression evaluates to (5 + (10 * 1)), or 15. If LongRunningJob is undefined or false in the job classad, but BolognaBatchJob is true, the expression evaluates to (5 + (10 * 0)), or 5. If neither is defined, the expression evaluates to (5 + (10 * 1)), or 15.
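The three cases described for the renice increment can be reproduced with a few lines of Python. This is only an arithmetic model of the expression, with dictionaries standing in for job ClassAds and `job_renice_increment` a hypothetical helper name.

```python
# Toy model of the renice arithmetic: 5 for short Bologna jobs, 15 for everything else.

def job_renice_increment(job_ad):
    """Return 5 + 10 * (1 if the job is long-running or not a Bologna job, else 0)."""
    long_running = job_ad.get("LongRunningJob") is True
    not_bb = job_ad.get("BolognaBatchJob") is not True
    return 5 + 10 * int(long_running or not_bb)


print(job_renice_increment({"LongRunningJob": True, "BolognaBatchJob": True}))
print(job_renice_increment({"BolognaBatchJob": True}))
print(job_renice_increment({}))
```

Short Bologna Batch Jobs get the smaller nice value, so the operating system favors them over long jobs and outside jobs sharing the same CPUs.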
20. Requirement 8: A short-running Bologna Batch Job may only run for one hour, after which point it should be killed and removed from the queue.
- In the config file, preempt short jobs after one hour and do not start them again:
    PREEMPT = ( ( $(IsNotBBJob) && ($(PREEMPT)) ) \
                || ( $(IsShortBBJob) && ($(ActivityTimer) > 60*60) ) )
    SHORT_RUNNING_VM_START = ( ( $(IsShortBBJob) \
                                 && (RemoteWallClockTime < 60*60) =!= False ) \
                               || ( $(IsNotBBJob) && ($(INFN_START)) ) )
- To remove from the queue, in the job ad add:
    Periodic_Remove = ( LongRunningJob =!= True \
                        && (RemoteWallClockTime > 60*60) )
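The queue-removal rule for short jobs can be sketched as a simple predicate. The Python below is a model only: a dictionary stands in for the job ClassAd, a missing wall-clock time is treated as zero (an assumption of this sketch), and `should_remove` is a hypothetical name.

```python
# Toy model of the one-hour limit: remove jobs that are not marked long-running
# once they have accumulated more than an hour of wall-clock time.

ONE_HOUR = 60 * 60  # seconds


def should_remove(job_ad):
    """Mimic the periodic removal test for short-running jobs."""
    is_long = job_ad.get("LongRunningJob") is True
    # Assumption: a job that has never run has no RemoteWallClockTime; treat it as 0.
    return (not is_long) and job_ad.get("RemoteWallClockTime", 0) > ONE_HOUR


print(should_remove({"RemoteWallClockTime": 4000}))
print(should_remove({"RemoteWallClockTime": 4000, "LongRunningJob": True}))
print(should_remove({"RemoteWallClockTime": 1200}))
```

Only the unlabeled job that has exceeded an hour is removed; long-running jobs and short jobs still within their hour are left alone.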
21. Requirement 9: Bologna Batch Jobs have priority over other Condor jobs: other jobs must never preempt BB Jobs, and BB Jobs must always immediately preempt other jobs.
- RANK is already dealt with; now handle priority preemption:
    INFN_PREEMPTION_REQUIREMENTS = \
        ( $(StateTimer) > (2 * (60 * 60)) \
          && RemoteUserPrio > SubmittorPrio * 1.2 )
    PREEMPTION_REQUIREMENTS = \
        ( ( BolognaBatchServer =!= True && $(INFN_PREEMPTION_REQUIREMENTS) ) \
          || ( BolognaBatchServer =?= True \
               && ( BolognaBatchJob =!= True \
                    && ( TARGET.BolognaBatchJob =?= True \
                         || $(INFN_PREEMPTION_REQUIREMENTS) ) ) ) )
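One way to read Requirement 9 is as a small decision function. The Python sketch below encodes that reading and should be taken as an interpretation, not as the exact Condor expression: the two-hour and 1.2 thresholds come from the slide, while the function names, parameter names, and the way the conditions are combined are assumptions.

```python
# Toy model of priority preemption for Requirement 9.

TWO_HOURS = 2 * 60 * 60  # seconds


def infn_preemption_ok(state_timer_s, remote_user_prio, submittor_prio):
    """INFN's default rule: the running job has held the machine for over two hours
    and its owner's priority value exceeds the candidate's by more than 20%."""
    return state_timer_s > TWO_HOURS and remote_user_prio > submittor_prio * 1.2


def may_preempt(on_bb_server, running_is_bb, candidate_is_bb, infn_ok):
    """Decide whether a candidate job may preempt the running job."""
    if not on_bb_server:
        return infn_ok          # ordinary machines keep INFN's default policy
    if running_is_bb:
        return False            # Bologna Batch Jobs are never preempted
    return candidate_is_bb or infn_ok  # Bologna jobs preempt outside jobs immediately


print(may_preempt(True, True, True, True))
print(may_preempt(True, False, True, False))
print(may_preempt(False, False, False, infn_preemption_ok(3 * 3600, 10.0, 5.0)))
```

Under this reading, a running Bologna Batch Job is protected no matter who is asking, while an outside job on a Bologna Batch Server yields at once to a Bologna Batch Job.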
22. Wrap condor_submit to make it easy for users: bbs_submit_short / bbs_submit_long
    #!/bin/sh
    _CONDOR_APPEND_REQ_VANILLA='(BolognaBatchServer == True)'
    export _CONDOR_APPEND_REQ_VANILLA
    condor_submit -a '+BolognaBatchJob = True' \
                  -a 'should_transfer_files = IF_NEEDED' \
                  -a 'when_to_transfer_output = ON_EXIT' \
                  -a 'universe = vanilla' \
                  -a 'periodic_remove = ( LongRunningJob =!= True && (RemoteWallClockTime > 60*60) )' \
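As a rough illustration of what such a wrapper assembles, the Python sketch below builds a `condor_submit` argument list without running it. It is hypothetical throughout: `build_submit_command` is an invented helper, and the assumption that the long variant differs only by appending a `LongRunningJob` attribute is inferred from the submit-file convention for long-running jobs, not taken from the actual scripts.

```python
# Sketch of the argument list a bbs_submit_* wrapper might assemble.
# Nothing is executed; the command is only built and printed.
import shlex


def build_submit_command(submit_file, long_running):
    """Return the condor_submit argument list for a short or long Bologna Batch Job."""
    args = [
        "condor_submit",
        "-a", "+BolognaBatchJob = True",
        "-a", "should_transfer_files = IF_NEEDED",
        "-a", "when_to_transfer_output = ON_EXIT",
        "-a", "universe = vanilla",
        "-a", "periodic_remove = ( LongRunningJob =!= True && (RemoteWallClockTime > 60*60) )",
    ]
    if long_running:
        # Assumption: the long wrapper only adds this attribute.
        args += ["-a", "+LongRunningJob = True"]
    args.append(submit_file)
    return args


print(" ".join(shlex.quote(a) for a in build_submit_command("job.sub", long_running=True)))
```

Passing the list form to a process launcher avoids shell quoting problems with the parentheses and spaces in the appended expressions.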
23. Simple for Users
- Although policy is complicated, the interface for users is kept simple
- Users call bbs_submit_long or bbs_submit_short, just as they would condor_submit
  - Short jobs start quickly, but those that run for more than 1 hour are killed.
  - Long jobs will run to completion...
- bbs_submit_* scripts automatically add the appropriate classad attributes to the job to take advantage of the long or short running VMs on Bologna Batch Servers.
24. Any Questions?
- Email me at condor-admin_at_cs.wisc.edu.
- Check the Bologna Batch System document at http://www.cs.wisc.edu/pfc/bbs.doc
- Thanks!