Grids and Service Oriented Architectures Research Group

1
Evaluating the EASY-Backfill Job Scheduling of
Static Workloads on Clusters

Dr. Adam K.L. Wong
Prof. Andrzej M. Goscinski
Email: {aklwong, ang}@deakin.edu.au

2
Outline

Objective of this research
Background of job scheduling on computer clusters
Related work and our response
Experimental work
Results and analysis
Conclusions
Future work

3
Objective

Most research on the performance of backfilling
job scheduling has been carried out by
simulation.
Many of these results are unclear or incomplete.
There is a lack of experiments carried out on
real systems!
Our performance evaluation of EASY-Backfilling on
a computer cluster serves two purposes:
The results obtained can be used to validate
simulation results achieved by other researchers.
The methodology used in our study alleviates some
problems existing in the simulations presented in
the current literature.

4
Background: Job Scheduling on a Computer Cluster

How does it work?

5
Background: Basic Algorithm (First Come First
Serve, All Requested Computers Available)
6
Background: Backfilling Algorithm (Perfect
Estimation)
7
Background: Backfilling Algorithm
(Over-Estimation vs. Under-Estimation)
8
Background: Backfilling Algorithm (Conservative
vs. EASY)
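
The two allocation policies named on these slides can be sketched as a small simulation. This is a minimal illustration of the EASY rule as it is commonly described (only the job at the head of the queue holds a reservation, and a later job may jump ahead if it does not delay that reservation). It is not the authors' C scheduler. The `Job` record, the `simulate` function and the four-job example are hypothetical, and the sketch assumes a static workload (all jobs queued at time 0), unique job names, no job wider than the cluster, and estimates that are never shorter than the actual run time.

```python
from collections import namedtuple

# width = computers requested; estimate and actual are run times.
# Assumption: estimate >= actual (over-estimation only) and names are unique.
Job = namedtuple("Job", "name width estimate actual")

def simulate(jobs, total, backfill):
    """Return {job name: start time} for a static workload on `total` computers."""
    queue = list(jobs)
    running = []                       # (actual_end, estimated_end, width)
    free, now, starts = total, 0, {}

    def start(job):
        nonlocal free
        free -= job.width
        running.append((now + job.actual, now + job.estimate, job.width))
        starts[job.name] = now

    while queue:
        # First come first serve: start head jobs while all requested
        # computers are available.
        while queue and queue[0].width <= free:
            start(queue.pop(0))
        if queue and backfill:
            head = queue[0]
            # Shadow time: when the head job can start according to the
            # estimates of the running jobs.
            avail, shadow = free, now
            for _, est_end, width in sorted(running, key=lambda r: r[1]):
                avail += width
                shadow = est_end
                if avail >= head.width:
                    break
            extra = avail - head.width   # computers the head job leaves unused
            for job in queue[1:]:
                if job.width > free:
                    continue
                if now + job.estimate <= shadow:
                    pass                 # finishes before the reservation
                elif job.width <= extra:
                    extra -= job.width   # runs beside the head job
                else:
                    continue
                queue.remove(job)
                start(job)
        if queue:
            # Advance time to the next actual job completion.
            running.sort()
            end, _, width = running.pop(0)
            now, free = end, free + width
    return starts

# Hypothetical workload on a 4-computer cluster.
jobs = [Job("A", 3, 10, 10), Job("B", 4, 5, 5),
        Job("C", 1, 10, 10), Job("D", 1, 20, 20)]
fcfs = simulate(jobs, total=4, backfill=False)
easy = simulate(jobs, total=4, backfill=True)
print(fcfs)
print(easy)
```

In this example job C fits into the one computer left idle beside A and is estimated to finish by the time B's reservation begins, so EASY starts it at once. Job D would run past the reservation and has to wait.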
9
Related Work

Many of the simulation results shown in the
literature are unclear or incomplete, and some
important issues are left unattended.
Problems of simulation studies:
Non-unified use of terminology, e.g. system load.
Over-simplified programming model, e.g. memory
and communication requirements are not addressed.
Ambiguity in experimental methodologies, e.g. the
Poisson distribution of workload is blindly used.
Lack of experiments on real systems.
Of course simulation studies are important, but
they are not completely trustworthy!

10
Related Work

Static Workload vs. Dynamic Workload
Jobs in a dynamic workload usually arrive in
bursts of different intensities, depending on the
distribution of their inter-arrival times.
Not enough attention has been paid to the arrival
process of parallel jobs.
Changes in the arrival time and the service time
of jobs affect a scheduler's performance, since
they can change the number of jobs in the ready
queue.
Estimated Execution Time
The mechanism of backfilling relies on the
estimates of jobs' execution times.
Users tend to overestimate their jobs' execution
times to avoid their jobs being killed!
Can a tighter estimate make a better schedule?

11
Our Response

We carried out a detailed evaluation of
EASY-backfilling by scheduling MPI parallel
applications on a real cluster.
Static workloads were used.
A static workload is a snapshot of a dynamic
workload.
The behaviour of a scheduler under different job
bursts was captured by evaluating the scheduler
with static workloads of different sizes.
Different workload types were constructed to
capture the characteristics of:
Job length: the execution time of a job.
Job width: the number of computers requested for
a job.
Users' estimation of the job length.
The impact of the magnitude of users' estimates
on the scheduling performance under different
workload compositions was studied.

12
Experimental Work: Testbed

Hardware: A cluster of 16 Pentium-class PCs
OS: Linux (Red Hat 8.0)
Job Scheduler: Developed by us (implemented in
C)
We used a batch mode of job execution
Parallel Programming Tool: MPI (LAM versions
6.5.6, 7.1.2)
Parallel Applications: Selected NAS benchmark
programs

13
Experimental Work: Workload Construction

Classification of jobs
A narrow job needs >1 but <8 computers
A wide job needs >8 but <16 computers
Three static workload sizes
10 jobs
50 jobs
100 jobs
Three workload composition types
Even-Distributed
Wide-Dominated
Narrow-Dominated

14
Experimental Work: Nine different workload types
constructed

We generated 10 instances for each of the
workload types.
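
The nine workload types are the combinations of the three workload sizes and the three composition types listed earlier. A short sketch of that enumeration follows. The size label "Small" appears in the results slides, while "Medium" and "Large" are labels assumed here for the 50-job and 100-job sizes.

```python
from itertools import product

# "Small" is the deck's label for 10 jobs; "Medium" and "Large" are assumed labels.
SIZES = {"Small": 10, "Medium": 50, "Large": 100}
COMPOSITIONS = ["Even-Distributed", "Wide-Dominated", "Narrow-Dominated"]
INSTANCES_PER_TYPE = 10

# One (type name, number of jobs) pair per size/composition combination.
workload_types = [(f"{label} {comp}", n_jobs)
                  for (label, n_jobs), comp in product(SIZES.items(), COMPOSITIONS)]

for name, n_jobs in workload_types:
    print(f"{name}: {n_jobs} jobs x {INSTANCES_PER_TYPE} instances")

total_instances = len(workload_types) * INSTANCES_PER_TYPE
print(total_instances, "workload instances in total")
```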

15
Experimental Work: Performance Metrics

Waiting Time: The amount of time a job has to
wait for execution in the batch queue.
Execution Time: The amount of time a job takes to
execute to completion.
Response Time: The sum of the waiting time spent
in the batch queue and the execution time of the
job.
Slowdown: The response time normalized by the
execution time, i.e. Response Time / Execution Time.
Makespan: The total time that it takes for all
jobs in a workload to finish on the cluster.
Throughput: The number of jobs completed on the
cluster per unit time.
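
The definitions above translate directly into arithmetic on three timestamps per job. The following is a minimal sketch. The `JobRecord` class, its field names and the two example records are hypothetical, and makespan is taken here as the time from the earliest submission to the last completion, which for a static workload is simply the finish time of the last job.

```python
from dataclasses import dataclass

@dataclass
class JobRecord:
    submit: float   # time the job entered the batch queue
    start: float    # time the job began executing
    end: float      # time the job completed

    @property
    def waiting(self):
        return self.start - self.submit

    @property
    def execution(self):
        return self.end - self.start

    @property
    def response(self):
        return self.waiting + self.execution

    @property
    def slowdown(self):
        return self.response / self.execution

def makespan(records):
    # Time for every job in the workload to finish.
    return max(r.end for r in records) - min(r.submit for r in records)

def throughput(records):
    # Jobs completed per unit time.
    return len(records) / makespan(records)

# Hypothetical static workload: both jobs are submitted at time 0.
records = [JobRecord(0, 0, 10), JobRecord(0, 10, 15)]
for r in records:
    print(r.waiting, r.execution, r.response, r.slowdown)
print(makespan(records), throughput(records))
```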

16
Experimental Work: Experiment 1

Objective
To find out by how much EASY-Backfilling
performs better than ARCA (All Requested
Computers Available).
Work done
All ten instances created for each of the
nine workload types were scheduled by our cluster
batch scheduler with the computer-allocation
policies ARCA and EASY-backfilling.
We measured
The performance metrics specified.

17
Experimental Work: Experiment 2

Objective
To study the influence of static workload size,
workload composition and accuracy of users'
estimates on the performance of
EASY-Backfilling.
Work done
Perfect Estimate of a program's execution time
(Tp)
We measured the program's actual execution time.
Imperfect Estimate of a program's execution time
(Te)
Te = (1 + Random × k) × Tp, for k = 0.5, 1, 2 and
5,
is used to represent workloads with an
over-estimation in the execution time in a random
range of 0 to 50%, 0 to 100%, 0 to 200% and 0 to
500%.
We measured
The performance metrics specified.
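
The imperfect estimate can be illustrated with a few lines of code. This sketch assumes that Random is drawn uniformly from the interval 0 to 1, which is what the stated over-estimation ranges suggest. The function name `imperfect_estimate`, the seed and the example run time are hypothetical.

```python
import random

def imperfect_estimate(tp, k, rng):
    """Return Te = (1 + Random * k) * Tp, with Random assumed uniform in [0, 1)."""
    return (1 + rng.random() * k) * tp

rng = random.Random(2008)   # fixed seed so the run is repeatable
tp = 120.0                  # hypothetical measured execution time, in seconds

estimates = {k: imperfect_estimate(tp, k, rng) for k in (0.5, 1, 2, 5)}
for k, te in estimates.items():
    over = 100 * (te - tp) / tp
    print(f"k={k}: Te={te:.1f} s ({over:.0f}% over-estimation)")
```

With k = 0.5 the estimate lies between Tp and 1.5 Tp, and with k = 5 between Tp and 6 Tp, so every generated estimate is an over-estimate.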

18
Results and Analysis: Performance of the ARCA
allocation policy

Workload of Small Even-Distributed: MG.8,
LU.2, MG.4, MG.4, EP.16, MG.2, LU.4, LU.16, MG.8,
MG.4

19
Results and Analysis: Comparison of ARCA and
EASY-Backfilling (Workload size of 10)
20
Results and Analysis: Improvement in Slowdown of
EASY-Backfilling

[Figure panels: Workload Size 10, Workload Size 50, Workload Size 100]
21
Results and Analysis: Improvement in Throughput of
EASY-Backfilling

[Figure panels: Workload Size 10, Workload Size 50, Workload Size 100]

22
Conclusions

We have provided a new methodology for evaluating
the EASY-Backfill job scheduling of static
workloads on a real cluster.
We carried out the experiments using our newly
developed scheduler for a cluster batch system.
We have learnt from our experiments:
EASY-backfilling (including the variants with
over-estimation) outperforms ARCA.
As the number of jobs in the queue (i.e. the
workload size) increases, the performance of
EASY-backfilling improves.
Studying the impact of the workload compositions
is non-trivial. More work is needed!
Our experimental results have confirmed some
simulation results.
Our simple but effective experimental methodology
has clarified some ambiguities existing in the
simulation studies and thus helps to improve
the interpretation of those simulation results.

23
Future Work

Submitted paper:
The Impact of Under-Estimated Length of Jobs on
EASY-Backfill Scheduling, Adam K.L. Wong and
Andrzej M. Goscinski (submitted to PDP2008,
the 16th Euromicro International Conference on
Parallel, Distributed and network-based
Processing, http://pdp2008.org).
Paper to submit:
A Cluster Batch Scheduler with Efficient
Computer-Allocation Policies for Moldable Jobs,
Adam K.L. Wong and Andrzej M. Goscinski.
Paper in preparation:
A Dynamic Space Sharing Computer-Allocation
Policy for Scheduling Parallel Jobs on Clusters