Grids and Service Oriented Architectures Research Group

1
Evaluating the EASY-Backfill Job Scheduling of
Static Workloads on Clusters
  • Dr. Adam K.L. Wong
  • Prof. Andrzej M. Goscinski
  • Email: {aklwong, ang}@deakin.edu.au

2
Outline
  • Objective of this research
  • Background of job scheduling on computer clusters
  • Related work and our response
  • Experimental work
  • Results and analysis
  • Conclusions
  • Future work

3
Objective
  • Most research on the performance of backfilling
    job scheduling has been carried out by
    simulation.
  • Many of these results are unclear or incomplete.
    There is a lack of experiments carried out on
    real systems!
  • Our performance evaluation of EASY-Backfilling on
    a computer cluster serves two purposes:
      • The results obtained can be used to validate
        simulation results achieved by other
        researchers.
      • The methodology used in our study alleviates
        some problems existing in the simulations
        presented in the current literature.

4
Background: Job Scheduling on a Computer Cluster
  • How does it work?

5
Background: Basic Algorithm (First Come First
Serve, All Requested Computers Available)

6
Background: Backfilling Algorithm (Perfect
Estimation)

7
Background: Backfilling Algorithm
(Over-Estimation vs. Under-Estimation)

8
Background: Backfilling Algorithm (Conservative
vs. EASY)
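
A minimal Python sketch of the two policies compared in this talk, not the authors' C scheduler. It assumes a static workload (every job is queued at time 0), estimates that are never shorter than the actual run time, and the usual description of EASY backfilling: only the job at the head of the queue holds a reservation, and a later job may start early if it does not delay that reservation. The `Job` fields, the `simulate` function and the four example jobs are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    width: int       # number of computers requested
    runtime: float   # actual execution time
    estimate: float  # user's estimate, assumed >= runtime

def simulate(jobs, total, backfill):
    """Return {job name: start time}; all jobs are queued at t = 0."""
    if any(j.width > total for j in jobs):
        raise ValueError("a job requests more computers than the cluster has")
    queue = list(jobs)
    running = []   # (actual end, estimated end, width) per running job
    free, now, starts = total, 0.0, {}

    def start(job):
        nonlocal free
        starts[job.name] = now
        running.append((now + job.runtime, now + job.estimate, job.width))
        free -= job.width

    while queue:
        # FCFS part: start jobs from the head of the queue while they fit.
        while queue and queue[0].width <= free:
            start(queue.pop(0))
        if queue and backfill:
            head = queue[0]
            # Reservation for the head job: the earliest estimated time
            # ("shadow") at which enough computers will be free, and the
            # computers left over ("extra") once the head job has started.
            avail = free
            for _, est_end, width in sorted(running, key=lambda r: r[1]):
                avail += width
                if avail >= head.width:
                    shadow, extra = est_end, avail - head.width
                    break
            # A later job may start now if it cannot delay the head job.
            for job in queue[1:]:
                fits_now = job.width <= free
                ends_in_time = now + job.estimate <= shadow
                if fits_now and (ends_in_time or job.width <= extra):
                    if not ends_in_time:
                        extra -= job.width
                    queue.remove(job)
                    start(job)
        if queue:
            # Advance to the next actual completion and release computers.
            now = min(end for end, _, _ in running)
            free += sum(w for end, _, w in running if end == now)
            running[:] = [r for r in running if r[0] > now]
    return starts

# Made-up jobs on a 16-computer cluster, with perfect estimates.
jobs = [Job("A", 8, 10, 10), Job("B", 16, 5, 5),
        Job("C", 4, 8, 8), Job("D", 8, 20, 20)]
fcfs = simulate(jobs, total=16, backfill=False)
easy = simulate(jobs, total=16, backfill=True)
```

With these made-up jobs, C starts at time 0 under backfilling instead of time 15 under FCFS, while B still starts at time 10 in both cases.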
9
Related Work
  • Many of the simulation results shown in the
    literature are unclear or incomplete, and some
    important issues are left unaddressed.
  • Problems of simulation studies:
      • Non-unified use of terminology, e.g. system
        load.
      • Over-simplified programming model, e.g. memory
        and communication requirements are not
        addressed.
      • Ambiguity in experimental methodologies, e.g.
        the Poisson distribution of workload is used
        blindly.
      • Lack of experiments on real systems.
  • Of course simulation studies are important, but
    they are not completely trustworthy!

10
Related Work
  • Static Workload vs. Dynamic Workload
      • Jobs in a dynamic workload usually arrive in
        bursts of different intensities, depending on
        the distribution of their inter-arrival times.
      • Not enough attention has been paid to the
        arrival process of parallel jobs.
      • Changes in the arrival time and the service
        time of jobs affect a scheduler's performance
        since they can change the number of jobs in
        the ready queue.
  • Estimated Execution Time
      • The mechanism of backfilling relies on
        estimates of jobs' execution times.
      • Users tend to over-estimate their jobs'
        execution times to avoid their jobs being
        killed!
      • Can a tighter estimate make a better schedule?

11
Our Response
  • We carried out a detailed evaluation of
    EASY-backfilling by scheduling MPI parallel
    applications on a real cluster.
  • Static workloads were used.
      • A static workload is a snapshot of a dynamic
        workload.
      • The behaviour of a scheduler under different
        job bursts was captured by evaluating the
        scheduler with static workloads of different
        sizes.
  • Different workload types were constructed to
    capture the characteristics of:
      • Job length: the execution time of a job.
      • Job width: the number of computers requested
        for a job.
  • Users' estimation of the job length
      • The impact of the magnitude of users'
        estimates on the scheduling performance under
        different workload compositions is studied.

12
Experimental Work: Testbed
  • Hardware: a cluster of 16 Pentium-class PCs
  • OS: Linux (Red Hat 8.0)
  • Job Scheduler: developed by us (implemented in
    C)
  • We used a batch mode of job execution
  • Parallel Programming Tool: MPI (LAM versions
    6.5.6, 7.1.2)
  • Parallel Applications: selected NAS benchmark
    programs

13
Experimental Work: Workload Construction
  • Classification of jobs
      • A narrow job needs > 1 but < 8 computers
      • A wide job needs > 8 but < 16 computers
  • Three static workload sizes
      • 10 jobs
      • 50 jobs
      • 100 jobs
  • Three workload composition types
      • Even-Distributed
      • Wide-Dominated
      • Narrow-Dominated

14
Experimental Work: Nine Different Workload Types
Constructed
  • We generated 10 instances for each of the
    workload types.
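
The nine workload types are the combinations of the three workload sizes and the three composition types. A small sketch of that enumeration follows; the variable names are illustrative only and are not taken from the authors' tooling.

```python
from itertools import product

SIZES = (10, 50, 100)  # jobs per static workload
COMPOSITIONS = ("Even-Distributed", "Wide-Dominated", "Narrow-Dominated")
INSTANCES_PER_TYPE = 10

# Every (size, composition) pair is one workload type.
workload_types = list(product(SIZES, COMPOSITIONS))

# Each type is instantiated ten times, numbered 1..10.
instances = [
    (size, composition, i)
    for size, composition in workload_types
    for i in range(1, INSTANCES_PER_TYPE + 1)
]

print(len(workload_types), "workload types,", len(instances), "instances")
```

This gives 9 workload types and 90 workload instances in total.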

15
Experimental Work: Performance Metrics
  • Waiting Time: the amount of time a job has to
    wait for execution in the batch queue.
  • Execution Time: the amount of time a job takes
    to execute to completion.
  • Response Time: the sum of the waiting time spent
    in the batch queue and the execution time of the
    job.
  • Slowdown: the response time normalized by the
    execution time, i.e. Response Time / Execution
    Time.
  • Makespan: the total time that it takes for all
    jobs in a workload to finish on the cluster.
  • Throughput: the number of jobs completed on the
    cluster per unit time.
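
The definitions above can be sketched in Python as follows. The `JobRecord` type and the `metrics` function are hypothetical, and reporting per-job metrics as means over the workload is an assumption; the slides do not say how per-job values were aggregated.

```python
from dataclasses import dataclass

@dataclass
class JobRecord:
    submit: float  # time the job entered the batch queue
    start: float   # time the job began executing
    end: float     # time the job finished

def metrics(records):
    """Compute the metrics defined above for one workload."""
    n = len(records)
    waits = [r.start - r.submit for r in records]
    execs = [r.end - r.start for r in records]
    responses = [w + e for w, e in zip(waits, execs)]
    slowdowns = [resp / e for resp, e in zip(responses, execs)]
    # Makespan: from the first submission to the last completion.
    makespan = max(r.end for r in records) - min(r.submit for r in records)
    return {
        "mean_wait": sum(waits) / n,          # assumption: mean over jobs
        "mean_response": sum(responses) / n,  # assumption: mean over jobs
        "mean_slowdown": sum(slowdowns) / n,  # assumption: mean over jobs
        "makespan": makespan,
        "throughput": n / makespan,           # jobs per unit time
    }

# Two made-up jobs: the second waits 10 time units, then runs for 5.
result = metrics([JobRecord(0, 0, 10), JobRecord(0, 10, 15)])
```

For the second made-up job the response time is 15 and the slowdown is 15 / 5 = 3, so even a short wait weighs heavily on short jobs.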

16
Experimental Work: Experiment 1
  • Objective
      • To find out by how much EASY-Backfilling
        outperforms ARCA.
  • Work done
      • All ten instances created for each of the nine
        workload types were scheduled by our cluster
        batch scheduler with the computer-allocation
        policies ARCA and EASY-backfilling.
  • We measured
      • The performance metrics specified.

17
Experimental Work: Experiment 2
  • Objective
      • To study the influence of static workload
        size, workload composition and accuracy of
        users' estimates on the performance of
        EASY-Backfilling.
  • Work done
      • Perfect estimate of a program's execution time
        (Tp)
          • We measured the program's actual execution
            time.
      • Imperfect estimate of a program's execution
        time (Te)
          • Te = (1 + Random × k) × Tp, for k = 0.5, 1,
            2 and 5
          • This represents workloads whose execution
            times are over-estimated by a random amount
            in the range 0 to 50%, 0 to 100%, 0 to 200%
            and 0 to 500%, respectively.
  • We measured
      • The performance metrics specified.
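
A short sketch of how such over-estimates could be generated. It assumes the random term is drawn uniformly from [0, 1) and scaled by k, which matches the stated ranges of 0 to 50% up to 0 to 500%; the function name, the seed and the example value of Tp are hypothetical.

```python
import random

K_VALUES = (0.5, 1, 2, 5)

def imperfect_estimate(tp, k, rng):
    """Over-estimate the measured time tp by a random 0 to k*100 percent."""
    # Assumption: the random term is uniform on [0, 1).
    return (1 + rng.random() * k) * tp

rng = random.Random(42)  # fixed seed so a workload can be regenerated
tp = 120.0               # made-up measured execution time, in seconds
estimates = {k: imperfect_estimate(tp, k, rng) for k in K_VALUES}
```

Under this assumption every estimate lies between Tp and (1 + k) × Tp, so a job is never under-estimated.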

18
Results and Analysis: Performance of the ARCA
Allocation Policy
  • Small Even-Distributed workload: MG.8, LU.2,
    MG.4, MG.4, EP.16, MG.2, LU.4, LU.16, MG.8, MG.4

19
Results and Analysis: Comparison of ARCA and
EASY-Backfilling (Workload Size of 10)

20
Results and Analysis: Improvement in Slowdown of
EASY-Backfilling
  • Panels: Workload Size 10, Workload Size 50,
    Workload Size 100

21
Results and Analysis: Improvement in Throughput of
EASY-Backfilling
  • Panels: Workload Size 10, Workload Size 50,
    Workload Size 100
22
Conclusions
  • We have provided a new methodology for evaluating
    the EASY-Backfill job scheduling of static
    workloads on a real cluster.
  • We carried out the experiments using our newly
    developed scheduler for a cluster batch system.
  • We have learnt from our experiments:
      • EASY-backfilling (and its variants with
        over-estimation) outperforms ARCA.
      • As the number of jobs in the queue (i.e. the
        workload size) increases, the performance of
        EASY-backfilling improves.
      • Studying the impact of the workload
        compositions is non-trivial. More work is
        needed!
  • Our experimental results have confirmed some
    simulation results.
  • Our simple but effective experimental methodology
    has clarified some ambiguities existing in the
    simulation studies and thus helps to improve
    the interpretation of those simulation results.

23
Future Work
  • Submitted paper
      • The Impact of Under-Estimated Length of Jobs
        on EASY-Backfill Scheduling, Adam K.L. Wong
        and Andrzej M. Goscinski (submitted to
        PDP2008, the 16th Euromicro International
        Conference on Parallel, Distributed and
        network-based Processing, http://pdp2008.org).
  • Paper to submit
      • A Cluster Batch Scheduler with Efficient
        Computer-Allocation Policies for Moldable
        Jobs, Adam K.L. Wong and Andrzej M. Goscinski.
  • Paper in preparation
      • A Dynamic Space Sharing Computer-Allocation
        Policy for Scheduling Parallel Jobs on
        Clusters