Title: Grids and Service Oriented Architectures Research Group
Evaluating the EASY-Backfill Job Scheduling of Static Workloads on Clusters
- Dr. Adam K.L. Wong
- Prof. Andrzej M. Goscinski
- Email: aklwong, ang_at_deakin.edu.au
Outline
- Objective of this research
- Background of job scheduling on computer clusters
- Related work and our response
- Experimental work
- Results and analysis
- Conclusions
- Future work
Objective
- Most research on the performance of backfilling job scheduling has been carried out by simulation.
  - Many of these results are unclear or incomplete.
  - There is a lack of experiments carried out on real systems!
- Our performance evaluation of EASY-Backfilling on a computer cluster serves two purposes:
  - The results obtained can be used to validate simulation results achieved by other researchers.
  - The methodology used in our study alleviates some problems in the simulations presented in the current literature.
Background: Job Scheduling on a Computer Cluster
Background: Basic Algorithm, ARCA (First Come First Serve, All Requested Computers Available)
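The basic policy can be illustrated with a small simulation. This is a minimal Python sketch, not the authors' C scheduler. It assumes a static workload (every job is queued at time 0), known actual run times, and a homogeneous 16-computer cluster. `Job` and `schedule_arca` are hypothetical names. Jobs start strictly in queue order, and the job at the head of the queue waits until all the computers it requests are free, so no later job can overtake it.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    width: int       # number of computers requested
    runtime: float   # actual execution time


def schedule_arca(jobs, total_nodes=16):
    """Simulation sketch (hypothetical, not the authors' scheduler) of FCFS
    with All Requested Computers Available, for a static workload at t = 0.

    Returns {job name: (start time, end time)}.
    """
    if any(job.width > total_nodes for job in jobs):
        raise ValueError("a job requests more computers than the cluster has")
    now = 0.0
    free = total_nodes
    running = []  # min-heap of (end time, width) for executing jobs
    schedule = {}
    for job in jobs:  # strict queue order: the head job blocks all jobs behind it
        # Release finished jobs until all requested computers are available.
        while free < job.width:
            end, width = heapq.heappop(running)
            now = max(now, end)
            free += width
        schedule[job.name] = (now, now + job.runtime)
        heapq.heappush(running, (now + job.runtime, job.width))
        free -= job.width
    return schedule


if __name__ == "__main__":
    workload = [Job("J1", 8, 10.0), Job("J2", 16, 5.0), Job("J3", 4, 2.0)]
    for name, (start, end) in schedule_arca(workload).items():
        print(name, start, end)
```

In this toy workload, J3 is small enough to run next to J1. Under ARCA, however, it waits behind the 16-computer job J2 and does not start until t = 15.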
Background: Backfilling Algorithm (Perfect Estimation)
Background: Backfilling Algorithm (Over-Estimation vs. Under-Estimation)
Background: Backfilling Algorithm (Conservative vs. EASY)
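EASY backfilling can be sketched in the same simulated style, using a three-job toy workload on a 16-computer cluster. This hedged Python sketch follows the textbook EASY rule, not the authors' implementation; `Job`, `schedule_easy` and the event loop are hypothetical. Only the job at the head of the queue receives a reservation, whereas conservative backfilling would reserve computers for every queued job. Actual run times decide when jobs really finish. Users' estimates decide the reservation, and that is where over- and under-estimation take effect.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    width: int        # number of computers requested
    runtime: float    # actual execution time (decides when the job really ends)
    estimate: float   # user's estimate (used only to build the reservation)


def schedule_easy(jobs, total_nodes=16):
    """Simulation sketch (hypothetical) of EASY backfilling for a static
    workload in which all jobs are queued at t = 0.

    Returns {job name: (start time, end time)}.
    """
    if any(job.width > total_nodes for job in jobs):
        raise ValueError("a job requests more computers than the cluster has")
    queue = list(jobs)
    running = []  # (actual end, estimated end, width) of executing jobs
    schedule = {}
    now, free = 0.0, total_nodes

    def start(job):
        nonlocal free
        free -= job.width
        running.append((now + job.runtime, now + job.estimate, job.width))
        schedule[job.name] = (now, now + job.runtime)

    while queue:
        # 1. Start jobs from the head of the queue (FCFS) while they fit.
        while queue and queue[0].width <= free:
            start(queue.pop(0))
        if queue:
            head = queue[0]
            # 2. Reservation for the head job: the shadow time is the earliest
            #    *estimated* time at which enough computers will be free; the
            #    extra computers are those the head job will not need then.
            avail, shadow, extra = free, now, 0
            for _, est_end, width in sorted(running, key=lambda r: r[1]):
                avail += width
                if avail >= head.width:
                    shadow, extra = est_end, avail - head.width
                    break
            # 3. Backfill: a later job may start now if it fits and either
            #    ends (by its estimate) before the shadow time or uses only
            #    extra computers, so the head job is never delayed.
            waiting = [head]
            for job in queue[1:]:
                ends_in_time = now + job.estimate <= shadow
                if job.width <= free and (ends_in_time or job.width <= extra):
                    if not ends_in_time:
                        extra -= job.width
                    start(job)
                else:
                    waiting.append(job)
            queue = waiting
        # 4. Advance to the next actual completion and release its computers.
        if running:
            now = min(r[0] for r in running)
            free += sum(r[2] for r in running if r[0] == now)
            running[:] = [r for r in running if r[0] != now]
    return schedule


if __name__ == "__main__":
    workload = [Job("J1", 8, 10.0, 10.0), Job("J2", 16, 5.0, 5.0), Job("J3", 4, 2.0, 2.0)]
    print(schedule_easy(workload))
```

With accurate estimates, J3 is backfilled at t = 0 and finishes at t = 2. Suppose J3's estimate is inflated to 20. It then no longer appears to end before J2's reservation at t = 10, and no extra computers are available, so it waits until t = 15, just as under FCFS without backfilling.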
Related Work
- Many of the simulation results shown in the literature are unclear or incomplete, and some important issues are left unaddressed.
- Problems of simulation studies:
  - Non-unified use of terminology, e.g. system load.
  - Over-simplified programming model, e.g. memory and communication requirements are not addressed.
  - Ambiguity in experimental methodologies, e.g. a Poisson distribution of the workload is used blindly.
  - Lack of experiments on real systems.
- Of course, simulation studies are important, but they are not completely trustworthy!
Related Work
- Static workload vs. dynamic workload.
  - Jobs in a dynamic workload usually arrive in bursts of different intensities, depending on the distribution of their inter-arrival times.
  - Not enough attention has been paid to the arrival process of parallel jobs.
  - Changes in the arrival time and the service time of jobs affect a scheduler's performance, since they can change the number of jobs in the ready queue.
- Estimated execution time.
  - The mechanism of backfilling relies on estimates of jobs' execution times.
  - Users tend to over-estimate their jobs' execution times to avoid having their jobs killed!
  - Can a tighter estimate make a better schedule?
Our Response
- We carried out a detailed evaluation of EASY-Backfilling by scheduling MPI parallel applications on a real cluster.
- Static workloads were used.
  - A static workload is a snapshot of a dynamic workload.
  - The behaviour of a scheduler under different job bursts was captured by evaluating the scheduler with static workloads of different sizes.
- Different workload types were constructed to capture the characteristics of:
  - Job length: the execution time of a job.
  - Job width: the number of computers requested by a job.
  - Users' estimates of the job length.
- We studied the impact of the magnitude of users' estimates on scheduling performance under different workload compositions.
Experimental Work: Testbed
- Hardware: a cluster of 16 Pentium-class PCs
- OS: Linux (Red Hat 8.0)
- Job scheduler: developed by us (implemented in C)
  - We used a batch mode of job execution.
- Parallel programming tool: MPI (LAM versions 6.5.6 and 7.1.2)
- Parallel applications: selected NAS benchmark programs
Experimental Work: Workload Construction
- Classification of jobs
  - A narrow job needs > 1 but < 8 computers
  - A wide job needs > 8 but < 16 computers
- Three static workload sizes
  - 10 jobs
  - 50 jobs
  - 100 jobs
- Three workload composition types
  - Even-Distributed
  - Wide-Dominated
  - Narrow-Dominated
Experimental Work: Nine Different Workload Types Constructed
- The three workload sizes combined with the three composition types give nine workload types.
- We generated 10 instances of each workload type.
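The 3 × 3 design and its 10 instances per type (90 workloads in total) can be enumerated as follows. This is a sketch under stated assumptions, not the authors' generator. The slides do not say how jobs needing exactly 8 or 16 computers are classified. Following the example workload later in the deck (MG.8, EP.16, ...), the sketch assumes power-of-two widths, with 2, 4 and 8 counted as narrow and 16 as wide. The wide-job shares (50 %, 80 %, 20 %), the program mix and the seed are hypothetical.

```python
import itertools
import random

SIZES = (10, 50, 100)                                   # from the slides
COMPOSITIONS = ("Even-Distributed", "Wide-Dominated", "Narrow-Dominated")
INSTANCES_PER_TYPE = 10                                 # from the slides

# Hypothetical parameters: the slides do not state these values.
WIDE_SHARE = {"Even-Distributed": 0.5, "Wide-Dominated": 0.8, "Narrow-Dominated": 0.2}
NARROW_WIDTHS = (2, 4, 8)
WIDE_WIDTHS = (16,)
PROGRAMS = ("EP", "LU", "MG")


def make_instance(size, composition, rng):
    """Return job labels such as 'MG.8' (program.computers) with a fixed
    number of wide jobs for the given composition, in shuffled order."""
    n_wide = round(WIDE_SHARE[composition] * size)
    widths = [rng.choice(WIDE_WIDTHS) for _ in range(n_wide)]
    widths += [rng.choice(NARROW_WIDTHS) for _ in range(size - n_wide)]
    rng.shuffle(widths)
    return [f"{rng.choice(PROGRAMS)}.{w}" for w in widths]


def make_all_workloads(seed=1):
    """Map each (size, composition) workload type to its instances."""
    rng = random.Random(seed)
    return {
        (size, comp): [make_instance(size, comp, rng) for _ in range(INSTANCES_PER_TYPE)]
        for size, comp in itertools.product(SIZES, COMPOSITIONS)
    }


if __name__ == "__main__":
    workloads = make_all_workloads()
    print(len(workloads), "workload types,",
          sum(len(v) for v in workloads.values()), "instances")
```

The script prints `9 workload types, 90 instances`. The sketch fixes the number of wide jobs per instance rather than drawing each job independently, so every instance of a "Wide-Dominated" type really is dominated by wide jobs.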
Experimental Work: Performance Metrics
- Waiting time: the amount of time a job waits for execution in the batch queue.
- Execution time: the amount of time a job executes until completion.
- Response time: the sum of the waiting time in the batch queue and the execution time of the job.
- Slowdown: the response time normalized by the execution time, i.e. Response Time / Execution Time.
- Makespan: the total time taken for all jobs in a workload to finish on the cluster.
- Throughput: the number of jobs completed on the cluster per unit time.
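The definitions above translate directly into per-workload figures. This sketch assumes that per-job metrics are averaged over all jobs in a workload, and that throughput for a static workload is the number of jobs divided by the makespan. Neither choice is stated on the slides. `JobRecord` and `workload_metrics` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    submit: float   # time the job entered the batch queue
    start: float    # time the job started executing
    end: float      # time the job completed


def workload_metrics(records):
    """Per-workload metrics. Averaging the per-job values and defining
    throughput as jobs / makespan are assumptions of this sketch."""
    n = len(records)
    waiting = [r.start - r.submit for r in records]
    execution = [r.end - r.start for r in records]
    response = [w + e for w, e in zip(waiting, execution)]
    slowdown = [resp / e for resp, e in zip(response, execution)]
    makespan = max(r.end for r in records) - min(r.submit for r in records)
    return {
        "mean_waiting": sum(waiting) / n,
        "mean_execution": sum(execution) / n,
        "mean_response": sum(response) / n,
        "mean_slowdown": sum(slowdown) / n,
        "makespan": makespan,
        "throughput": n / makespan,  # jobs completed per unit time
    }


if __name__ == "__main__":
    # Three jobs submitted at t = 0 that run over [0, 10], [10, 15] and [0, 2].
    print(workload_metrics([JobRecord(0, 0, 10), JobRecord(0, 10, 15), JobRecord(0, 0, 2)]))
```

In this example the second job waits 10 time units and runs for 5, so its slowdown is 15 / 5 = 3. The mean slowdown is (1 + 3 + 1) / 3 ≈ 1.67, the makespan is 15 and the throughput is 3 / 15 = 0.2 jobs per time unit.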
Experimental Work: Experiment 1
- Objective
  - To find out by how much EASY-Backfilling outperforms ARCA.
- Work done
  - All ten instances created for each of the nine workload types were scheduled by our cluster batch scheduler with two computer-allocation policies: ARCA and EASY-Backfilling.
- We measured
  - The performance metrics specified above.
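"By how much" needs a definition of improvement, and the slides do not give one. A common choice, assumed here, is the relative change with respect to ARCA. Lower-is-better metrics (slowdown, waiting time, makespan) improve when EASY's value drops, and throughput improves when it rises. `improvement` and `mean_improvement` are hypothetical helpers.

```python
from statistics import mean


def improvement(arca, easy, higher_is_better=False):
    """Hypothetical definition: relative improvement of EASY-Backfilling
    over ARCA, in percent of the ARCA value."""
    change = (easy - arca) if higher_is_better else (arca - easy)
    return 100.0 * change / arca


def mean_improvement(arca_runs, easy_runs, higher_is_better=False):
    """Average the per-instance improvements over paired runs (same workload
    instance under both policies) of one workload type."""
    if len(arca_runs) != len(easy_runs):
        raise ValueError("each instance needs one ARCA and one EASY measurement")
    return mean(improvement(a, e, higher_is_better) for a, e in zip(arca_runs, easy_runs))


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from this study.
    print(improvement(4, 3))                     # slowdown 4 -> 3
    print(improvement(10, 15, higher_is_better=True))  # throughput 10 -> 15
```

The script prints 25.0 and 50.0. Averaging per-instance percentages gives each of the ten instances equal weight. Comparing the averages of the raw metrics instead would let instances with large values dominate.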
Experimental Work: Experiment 2
- Objective
  - To study the influence of static workload size, workload composition and the accuracy of users' estimates on the performance of EASY-Backfilling.
- Work done
  - Perfect estimate of a program's execution time (Tp)
    - We measured the actual execution time of each program.
  - Imperfect estimate of a program's execution time (Te)
    - Te = (1 + Random(k)) × Tp, where Random(k) is a random number between 0 and k, for k = 0.5, 1, 2 and 5. This represents workloads whose execution times are over-estimated by a random amount in the range 0 to 50%, 0 to 100%, 0 to 200% and 0 to 500%, respectively.
- We measured
  - The performance metrics specified above.
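The over-estimation model Te = (1 + Random(k)) × Tp is simple to reproduce. The slides do not name the distribution of Random(k), so this sketch assumes it is uniform on [0, k]. `over_estimate` is a hypothetical helper, and the 120-second Tp is only an illustrative value.

```python
import random

K_VALUES = (0.5, 1, 2, 5)   # over-estimation of up to 50 %, 100 %, 200 %, 500 %


def over_estimate(tp, k, rng):
    """Te = (1 + Random(k)) * Tp; assumption: Random(k) is uniform on [0, k]."""
    return (1.0 + rng.uniform(0.0, k)) * tp


if __name__ == "__main__":
    rng = random.Random(42)
    tp = 120.0  # measured execution time in seconds (illustrative value)
    for k in K_VALUES:
        te = over_estimate(tp, k, rng)
        print(f"k = {k}: Te = {te:.1f} s ({100 * (te / tp - 1):.0f} % over)")
```

Because Random(k) is never negative, every estimate satisfies Tp ≤ Te ≤ (1 + k) × Tp. The model therefore produces only over-estimates, never jobs that outlive their estimate.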
Results and Analysis: Performance of the ARCA Allocation Policy
- Workload: Small Even-Distributed (MG.8, LU.2, MG.4, MG.4, EP.16, MG.2, LU.4, LU.16, MG.8, MG.4)
Results and Analysis: Comparison of ARCA and EASY-Backfilling (Workload Size 10)
Results and Analysis: Improvement in Slowdown of EASY-Backfilling
- Charts: Workload Size 10, Workload Size 50, Workload Size 100
Results and Analysis: Improvement in Throughput of EASY-Backfilling
- Charts: Workload Size 10, Workload Size 50, Workload Size 100
Conclusions
- We have provided a new methodology for evaluating the EASY-Backfill job scheduling of static workloads on a real cluster.
- We carried out the experiments using our newly developed scheduler for a cluster batch system.
- We have learnt from our experiments that:
  - EASY-Backfilling (including its variants with over-estimated execution times) outperforms ARCA.
  - As the number of jobs in the queue (i.e. the workload size) increases, the performance of EASY-Backfilling improves.
  - Studying the impact of workload composition is non-trivial. More work is needed!
- Our experimental results have confirmed some simulation results.
- Our simple but effective experimental methodology has clarified some ambiguities in the simulation studies, and thus helps to improve the interpretation of those simulation results.
Future Work
- Submitted paper
  - "The Impact of Under-Estimated Length of Jobs on EASY-Backfill Scheduling", Adam K.L. Wong and Andrzej M. Goscinski (submitted to PDP2008, the 16th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, http://pdp2008.org).
- Paper to submit
  - "A Cluster Batch Scheduler with Efficient Computer-Allocation Policies for Moldable Jobs", Adam K.L. Wong and Andrzej M. Goscinski.
- Paper in preparation
  - "A Dynamic Space Sharing Computer-Allocation Policy for Scheduling Parallel Jobs on Clusters"