Title: Scheduling on Clusters, Scheduling on Grids
1. Scheduling on Clusters, Scheduling on Grids
- Jennifer M. Schopf
- Northwestern University
2. Outline
- Stochastic Scheduling
- Thesis work
- UC San Diego (Fran Berman)
- Data Replica Selection
- Work in progress
- Northwestern University (Jeff Mezger, Christopher Beckmann)
- Argonne National Lab (Sudharshan Vazhkudai and Mohamed Kerasha)
3. Stochastic Scheduling Overview: The Problem
- Clusters of workstations can provide the resources required to execute a scientific application efficiently
- Cannot achieve good performance for any single application when resources are shared
4. Our Solution
- Scheduling techniques can be developed to make use of the dynamic performance characteristics of shared resources
- The approach:
- Structural performance models
- Stochastic values and predictions
- Stochastic scheduling techniques
5. Why use shared clusters of workstations?
- More resources at a low cost
- Multiple distributed resources
- Processors
- Storage and Data sources
- Memory
- Cooperative execution of a single application
- Focus on performance: speed and capacity
6. How can these resources be used effectively?
- Efficient scheduling
- Selection of resources
- Mapping of tasks to resources
- Allocating data
- Accurate prediction of performance
- Good performance prediction modeling techniques
7. Roadmap (figure)
- Point Value Parameters
- Stochastic Value Parameters
- Structural Prediction Models
- Stochastic Prediction
- Stochastic Scheduling
8. Modeling performance on shared clusters
- Challenges
- Heterogeneous setting
- Multiple languages and programming styles
- Multiple implementations of application
- Non-dedicated and possibly highly contended
resources
9. Modeling approach must incorporate
- Different models for different machine types and different application types
- Different models for the same application, or parts of the application
- Extensible and flexible models to adjust to changing resources
10. Structural Modeling
- Method to construct flexible, extensible performance models for distributed parallel applications
- Application performance is decomposed according to the structure of the application
- Each sub-task can have its own model
- Result is a performance equation
- Parameters are application and system characteristics
11. Successive Over-Relaxation (SOR)
- Iterative solution to Laplace's equation
- Typical stencil application
- Divided into a red phase and a black phase
- 2-D grid of data divided into strips
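The red/black structure can be seen in a generic textbook sketch of the method (not the code used in the experiments): points are split checkerboard-style by color, and all points of one color can be updated independently, which is what makes the strip decomposition across processors possible.

```python
import numpy as np

def sor_redblack(grid, omega=1.5, iterations=100):
    """Red-black SOR for Laplace's equation on a 2-D grid.
    Boundary values are held fixed; interior points are updated in
    two half-sweeps (red phase, then black phase)."""
    g = grid.astype(float).copy()
    n, m = g.shape
    for _ in range(iterations):
        for color in (0, 1):  # red phase, then black phase
            for i in range(1, n - 1):
                for j in range(1, m - 1):
                    if (i + j) % 2 == color:
                        stencil = 0.25 * (g[i-1, j] + g[i+1, j]
                                          + g[i, j-1] + g[i, j+1])
                        g[i, j] += omega * (stencil - g[i, j])
    return g

# Fixed boundary of 1.0 on the top edge, 0.0 elsewhere:
grid = np.zeros((6, 6))
grid[0, :] = 1.0
result = sor_redblack(grid)
```

In the strip decomposition, each processor owns a band of rows and only needs to exchange the boundary rows of each color with its neighbors per iteration.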
12. SOR (figure)
13. Models (figure)
14. Dedicated SOR Experiments
- Platform: 2 Sparc 2s, 1 Sparc 5, 1 Sparc 10
- 10 Mbit Ethernet connection
- Quiescent machines and network
- Prediction within 3% before memory spill
15. Non-dedicated SOR results
- Available CPU on workstations varied from 0.43 to 0.53
16. Platforms with Scattered Range of CPU Availability (figure)
17. Improving structural models
- Available CPU has a range of 0.48 ± 0.05
- Prediction should also have a range
18. Using Additional Information
- Point value
- Bandwidth reported as 7 Mbits/sec
- A single value
- Often a best guess, an estimate under ideal circumstances, or a value accurate only for a given time frame
- Stochastic value
- Bandwidth reported as 7 ± 2 Mbits/sec
- A set of possible values weighted by probabilities
- Represents a range of likely behavior
19. Stochastic Structural Models
- Goal: extend structural models so that the resulting predictions are distributions
- The structural model is an equation, so we:
- Need to represent stochastic information
- Normal distribution
- Interval
- Histogram
- Need to be able to mathematically combine the stochastic values in a timely manner
20. Using Normal Distributions
- A distribution is a set of values with associated probabilities
- General distributions have no unifying characteristics (and no associated tractable arithmetic)
- Can often summarize with a well-known family of distributions
21. Normal distributions
- Symmetric and bell-shaped
- Summarized by a mean and a standard deviation
- A range of 2 standard deviations captures 95% of the values
- Assume that stochastic data can be adequately represented by normal distributions
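The normality assumption is what keeps the arithmetic tractable: for independent normal values, means add and variances add, so combining terms of a performance equation stays cheap. A minimal sketch (the class and method names are illustrative):

```python
from dataclasses import dataclass
import math

@dataclass
class NormalValue:
    """A stochastic value summarized as a normal distribution."""
    mean: float
    sd: float

    def __add__(self, other):
        # Sum of independent normals: means add, variances add.
        return NormalValue(self.mean + other.mean,
                           math.sqrt(self.sd ** 2 + other.sd ** 2))

    def scale(self, k):
        # Multiplying by a constant scales the mean and the spread.
        return NormalValue(k * self.mean, abs(k) * self.sd)

    def interval95(self):
        # Roughly 95% of values fall within two standard deviations.
        return (self.mean - 2 * self.sd, self.mean + 2 * self.sd)

compute = NormalValue(10.0, 1.0)   # e.g. compute-time term
comm = NormalValue(4.0, 0.5)       # e.g. communication-time term
total = compute + comm
print(total.interval95())
```

The result of combining two stochastic parameters is itself a stochastic value, so the prediction that falls out of the model is a range rather than a point.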
22. Dedicated Execution Time (figure)
23. Practical issues when using stochastic data
- Who/what can supply stochastic data?
- User
- Data from past runs
- On-line measurement tools
- Network Weather Service time series data
- Time frame
- Given a time series, how much data should we consider?
24. Accuracy of stochastic results
- The result of a stochastic prediction will also be a range of values
- Need to consider how to achieve a tight (sharp) interval
- What to do if the interval isn't tight
25. How can I use these predictions in scheduling?
Point Value Parameters
Stochastic Value Parameters
Structural Prediction Models
Stochastic Prediction
Stochastic Scheduling
26. Using stochastic predictions
- Simplest scheduling situation: given a data-parallel application, adjust the amount of data assigned to each processor to minimize execution time
27. Delay in one can cause delay in all (figure)
28. Stochastic Scheduling
- Examine:
- Stochastic data represented as normal distributions
- Data-parallel codes
- Fixed set of shared resources
- Question: how should data be distributed to minimize execution time?
- Approach: adjust the data allocation so that a high-variance machine receives less work, in order to minimize the effects of contention
29. Time Balancing
- Minimize execution time by assigning data so that each processor finishes at roughly the same time
- D_i = data assigned to processor i
- u_i = time per unit of data on processor i
- c_i = time to distribute the data
- D_i * u_i + c_i = D_j * u_j + c_j for all i, j
- Sum over i of D_i = D_total
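These equations solve in closed form: if T is the common finish time, then D_i = (T - c_i) / u_i, and the total-data constraint pins down T. A minimal sketch of that solve:

```python
def time_balance(u, c, d_total):
    """Solve D_i * u_i + c_i = T for all i, with sum(D_i) = d_total.
    u[i]: time per unit of data on processor i; c[i]: distribution cost."""
    # From sum((T - c_i)/u_i) = d_total:
    t = (d_total + sum(ci / ui for ui, ci in zip(u, c))) \
        / sum(1.0 / ui for ui in u)
    return [(t - ci) / ui for ui, ci in zip(u, c)]

# Two processors, no distribution cost: the faster one (smaller u) gets more data.
alloc = time_balance(u=[1.0, 2.0], c=[0.0, 0.0], d_total=300)
print(alloc)  # [200.0, 100.0]
```

With point-valued u_i this is the classic time-balancing schedule; the stochastic variants below only change how u_i is chosen.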
30. Stochastic Time Balancing
- Adapt the time to compute a unit of data (u_i) to reflect stochastic information
- Larger u_i means smaller D_i (less data)
- If we have normal distributions:
- The 95% confidence interval corresponds to [m - 2*sd, m + 2*sd]
- If we set u = m + 2*sd
- We get a 95% conservative schedule
31. Stochastic Time Balancing (cont.)
- The set of equations is now:
- D_i * (m_i + 2*sd_i) + c_i = D_j * (m_j + 2*sd_j) + c_j
- for all i, j
- Sum over i of D_i = D_total
32. How do policies compare in a production environment?
- 4 contended Sparcs over 10 Mbit shared Ethernet
33. Set of Schedules (figure)
34. Tuning factor
- The tuning factor (TF) is the knob to turn to decide how conservative a schedule should be
- For example, it can determine the number of standard deviations to add to the mean
- Let u_i = m_i + sd_i * TF
- Solve:
- D_i * (m_i + sd_i * TF) + c_i = D_j * (m_j + sd_j * TF) + c_j
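A sketch of a TF-parameterized scheduler, solving the time-balancing equations in closed form with u_i = m_i + sd_i * TF (function name is illustrative). TF = 0 reproduces the mean-based schedule; TF = 2 gives the 95%-conservative one:

```python
def tuned_schedule(means, sds, c, d_total, tf):
    """Time balancing with u_i = m_i + sd_i * TF: larger TF shifts
    data away from high-variance machines."""
    u = [m + s * tf for m, s in zip(means, sds)]
    # Common finish time T from sum((T - c_i)/u_i) = d_total:
    t = (d_total + sum(ci / ui for ui, ci in zip(u, c))) \
        / sum(1.0 / ui for ui in u)
    return [(t - ci) / ui for ui, ci in zip(u, c)]

# Same mean speed, but machine 1 is more variable:
means, sds, c = [1.0, 1.0], [0.0, 0.5], [0.0, 0.0]
print(tuned_schedule(means, sds, c, 300, tf=0.0))  # equal split: variance ignored
print(tuned_schedule(means, sds, c, 300, tf=2.0))  # less data to the noisy machine
```

Because TF is just a parameter, it can also be set per-platform by a heuristic (as in the VTF policy) rather than fixed at 2.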
35. Extensible approach
- Don't have to use mean and standard deviation
- TF can be defined in a variety of ways
36. Defining our stochastic scheduling policy goals
- Decrease execution time
- Predictable performance
- Avoid spikes in execution behavior
- More conservative when in doubt
37. System of benefits and penalties
- Based on Sih and Lee's approach to scheduling
- Benefit (give a less conservative schedule to):
- Platforms with fewer varying machines
- Low-variance machines, especially those with lower power
38. Partial ordering (figure)
39. Algorithm for TF (figure)
40. Scheduling Experiments
- Platform:
- 4 contended PCs running Linux
- 100 Mbit shared Ethernet connection
- 3 policies run back to back:
- Mean: u_i based on runtime mean prediction
- VTF: u_i based on mean and heuristic TF evaluation
- 95TF: u_i based on the 95% confidence interval
41. Metrics
- Window: which of each window of three runs has the fastest execution time?
- Compare: how often was one policy better than, worse than, or split when compared with the policy run just before and just after?
- What's the right metric?
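The Window metric can be tallied mechanically from back-to-back timings. A hedged sketch (the timing data below is made up purely for illustration):

```python
def window_wins(times):
    """times: list of (mean, vtf, tf95) execution times, one tuple per
    window of three back-to-back runs. Count which policy was fastest
    in each window."""
    names = ("Mean", "VTF", "95TF")
    wins = {n: 0 for n in names}
    for window in times:
        fastest = min(range(3), key=lambda i: window[i])
        wins[names[fastest]] += 1
    return wins

# Two illustrative windows of three runs each:
print(window_wins([(10.0, 8.0, 9.0), (7.0, 7.5, 6.9)]))
```

The Compare metric is a pairwise version of the same idea, crediting a policy only when it beats both the run before and the run after it.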
42. SOR scheduling 1
- Window: Mean 9, VTF 27, 95TF 22 (of 57)
- Compare:
      Policy | Better | Mixed | Worse
      Mean   |      3 |     4 |    12
      VTF    |     10 |     7 |     3
      95TF   |      6 |     9 |     4
43. CPU performance (figure)
44. SOR scheduling 2
- Window: Mean 8, VTF 39, 95TF 11 (of 57)
- Compare:
      Policy | Better | Mixed | Worse
      Mean   |      3 |     7 |     9
      VTF    |     15 |     2 |     3
      95TF   |      3 |     8 |     8
45. CPU (figure)
46. Experimental Conclusions
- Stochastic information was more beneficial when there was higher variability in available CPU
- We almost always saw a reduction in the variation of actual execution times
- It is unclear at this point when it is better to use which heuristic scheduling policy
47. Summary
- The ability to parameterize structural models with stochastic values in order to meet the prediction needs of shared clusters of workstations
- A stochastic scheduling policy that can make use of stochastic predictions to achieve better execution times and more predictable application behavior
48. The Grid
- What is a Grid?
- Shared resources
- Coordinated problem solving
- Grid Problems
- Multiple sites (multiple institutions)
- Autonomy
- Heterogeneity
- Focus on the user
49. Scheduling on the Grid
- Select resources
- Machines
- Network
- Storage Devices (Data replica)
- Move
- Data to compute
- Compute to Data
- Both
50. Moving Compute to Data
- Find the fastest data source, move the computation
- Pro:
- Often most efficient (bandwidth is a scarce commodity)
- Cons:
- Executables are picky (compilers, libraries, etc.)
- Authentication/authorization/accounting
- Many minimum parameters: memory, disk, bandwidth, etc.
- Socio-political
51. Move Data to Compute
- Select the best compute resource, copy the data
- Pro:
- Common model today
- Sure that the application will run
- Con:
- May not be most efficient (the best compute resource may be far from the data)
- Need to pick the best data source for a given compute source
52. Data Replication
- Extremely large data sets
- Distributed storage sites
- One file may be available from a number of different sources
- Question: where is the best source for me to copy it from?
53. High Energy Physics Example (figure)
Image courtesy H. Newman, Caltech and C. Kesselman, ISI
54. Data Replica Selection
- Given a logical file name, the Replica Catalog returns a set of physical file names
- Which replica should we copy the data from?
- (What happens if that connection is interrupted?)
55. What do we need to do?
- Need information
- Need to make a decision
- These are intertwined
56. Where does the info come from?
(Figure: resources register with VO-specific aggregate directories)
57. What info is available?
- Anything in the MDS
- Anything we write GRISs for
- GridFTP
- NWS
58. How do we make a decision?
- Depends on the data
- Sudharshan:
- GridFTP data, NWS data on bandwidth
- NWS predictors
- Christopher and Jeff:
- Data transfer data
- Case-based reasoning techniques
59. Stochastic Predictions
- Given variance information about
- Bandwidth
- Disk access times
- Example
- Data transfer from A will take 5-7 minutes
- Data transfer from B will take 3-9 minutes
- Which to pick?
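One way to frame the choice: a conservative policy minimizes the worst case (source A, bounded at 7 minutes), while an optimistic one chases the best case (source B, possibly 3 minutes). A hedged sketch of such a selector (names and policy labels are illustrative, not from an existing replica-selection API):

```python
def pick_replica(predictions, policy="conservative"):
    """predictions: {source: (low, high)} predicted transfer-time
    interval in minutes. 'conservative' minimizes the worst case
    (interval upper bound); 'optimistic' minimizes the best case."""
    idx = 1 if policy == "conservative" else 0
    return min(predictions, key=lambda s: predictions[s][idx])

preds = {"A": (5.0, 7.0), "B": (3.0, 9.0)}
print(pick_replica(preds, "conservative"))  # A: worst case is only 7 minutes
print(pick_replica(preds, "optimistic"))    # B: could finish in 3 minutes
```

A tighter interval is worth something in itself: the schedule built around source A is more predictable even when its midpoint is no better.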
60. The Bigger Picture
- We'd like a framework for data replica selection
- Choose any info available
- Choose your own algorithm
- We'd like to combine data replica selection with CPU selection
- We'd like to make life easier for the application scientist
61. Collaborators
- The AppLeS group: Fran Berman (UCSD), Rich Wolski (Univ. Tennessee, UCSB)
- The Northwestern University Parallel Distributed (Beer) Lab Team
- Argonne DSL Students
62. Contact
- jms@cs.nwu.edu
- http://www.cs.nwu.edu/jms
- Funding:
- NASA GSRP grant NGT-1-52133
- DARPA Contract N66001-97-C-8521
- NSF CAREER Grant ACI-0093300