Title: Scheduling on Clusters, Scheduling on Grids
1. Scheduling on Clusters, Scheduling on Grids
- Jennifer M. Schopf
- Northwestern University
2. Outline
- Stochastic Scheduling
- Thesis work
- UC San Diego (Fran Berman)
- Data Replica Selection
- Work in progress
- Northwestern University (Jeff Mezger, Christopher Beckmann)
- Argonne National Lab (Sudharshan Vazhkudai and Mohamed Kerasha)
3. Stochastic Scheduling Overview: The Problem
- Clusters of workstations can provide the resources required to execute a scientific application efficiently
- Cannot achieve good performance for any single application when resources are shared
4. Our Solution
- Scheduling techniques can be developed to make use of the dynamic performance characteristics of shared resources
- The approach:
- Structural performance models
- Stochastic values and predictions
- Stochastic scheduling techniques
5. Why use shared clusters of workstations?
- More resources at a low cost
- Multiple distributed resources
- Processors
- Storage and Data sources
- Memory
- Cooperative execution of a single application
- Focus on performance: speed and capacity
6. How can these resources be used effectively?
- Efficient scheduling
- Selection of resources
- Mapping of tasks to resources
- Allocating data
- Accurate prediction of performance
- Good performance prediction modeling techniques
7. Roadmap (figure)
- Point Value Parameters
- Stochastic Value Parameters
- Structural Prediction Models
- Stochastic Prediction
- Stochastic Scheduling
8. Modeling performance on shared clusters
- Challenges
- Heterogeneous setting
- Multiple languages and programming styles
- Multiple implementations of application
- Non-dedicated and possibly highly contended
resources
9. Modeling approach must incorporate
- Different models for different machine types and different application types
- Different models for the same application, or parts of the application
- Extensible and flexible models to adjust to changing resources
10. Structural Modeling
- Method to construct flexible, extensible performance models for distributed parallel applications
- Application performance is decomposed according to the structure of the application
- Each sub-task can have its own model
- Result is a performance equation
- Parameters are application and system characteristics
11. Successive Over-Relaxation (SOR)
- Iterative solution to Laplace's equation
- Typical stencil application
- Divided into a red phase and a black phase
- 2-D grid of data divided into strips
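The red/black structure can be seen in a generic textbook sketch of the method (not the code used in the experiments): points are split checkerboard-style by color, and all points of one color can be updated independently, which is what makes the strip decomposition across processors possible.

```python
import numpy as np

def sor_redblack(grid, omega=1.5, iterations=100):
    """Red-black SOR for Laplace's equation on a 2-D grid.
    Boundary values are held fixed; interior points are updated in
    two half-sweeps (red phase, then black phase)."""
    g = grid.astype(float).copy()
    n, m = g.shape
    for _ in range(iterations):
        for color in (0, 1):  # red phase, then black phase
            for i in range(1, n - 1):
                for j in range(1, m - 1):
                    if (i + j) % 2 == color:
                        stencil = 0.25 * (g[i-1, j] + g[i+1, j]
                                          + g[i, j-1] + g[i, j+1])
                        g[i, j] += omega * (stencil - g[i, j])
    return g

# Fixed boundary of 1.0 on the top edge, 0.0 elsewhere:
grid = np.zeros((6, 6))
grid[0, :] = 1.0
result = sor_redblack(grid)
```

In the strip decomposition, each processor owns a band of rows and only needs to exchange the boundary rows of each color with its neighbors per iteration.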
12. SOR (figure)
13. Models (figure)
14. Dedicated SOR Experiments
- Platform: 2 Sparc 2s, 1 Sparc 5, 1 Sparc 10
- 10 Mbit Ethernet connection
- Quiescent machines and network
- Prediction within 3% before memory spill
15. Non-dedicated SOR results
- Available CPU on workstations varied from 0.43 to 0.53
16. Platforms with Scattered Range of CPU Availability (figure)
17. Improving structural models
- Available CPU has a range of 0.48 ± 0.05
- Prediction should also have a range
18. Using Additional Information
- Point value
- Bandwidth reported as 7 Mbits/sec
- A single value
- Often a best guess, an estimate under ideal circumstances, or a value accurate only for a given time frame
- Stochastic value
- Bandwidth reported as 7 ± 2 Mbits/sec
- A set of possible values weighted by probabilities
- Represents a range of likely behavior
19. Stochastic Structural Models
- Goal: extend structural models so that the resulting predictions are distributions
- The structural model is an equation, so we:
- Need to represent stochastic information
- Normal distribution
- Interval
- Histogram
- Need to be able to mathematically combine the stochastic values in a timely manner
20. Using Normal Distributions
- A distribution is a set of values with associated probabilities
- General distributions have no unifying characteristics (and no associated tractable arithmetic)
- Can often summarize with a well-known family of distributions
21. Normal distributions
- Symmetric and bell-shaped
- Summarized by a mean and a standard deviation
- A range of 2 standard deviations captures 95% of the values
- Assume that stochastic data can be adequately represented by normal distributions
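The normality assumption is what keeps the arithmetic tractable: for independent normal values, means add and variances add, so combining terms of a performance equation stays cheap. A minimal sketch (the class and method names are illustrative):

```python
from dataclasses import dataclass
import math

@dataclass
class NormalValue:
    """A stochastic value summarized as a normal distribution."""
    mean: float
    sd: float

    def __add__(self, other):
        # Sum of independent normals: means add, variances add.
        return NormalValue(self.mean + other.mean,
                           math.sqrt(self.sd ** 2 + other.sd ** 2))

    def scale(self, k):
        # Multiplying by a constant scales the mean and the spread.
        return NormalValue(k * self.mean, abs(k) * self.sd)

    def interval95(self):
        # Roughly 95% of values fall within two standard deviations.
        return (self.mean - 2 * self.sd, self.mean + 2 * self.sd)

compute = NormalValue(10.0, 1.0)   # e.g. compute-time term
comm = NormalValue(4.0, 0.5)       # e.g. communication-time term
total = compute + comm
print(total.interval95())
```

The result of combining two stochastic parameters is itself a stochastic value, so the prediction that falls out of the model is a range rather than a point.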
22. Dedicated Execution Time (figure)
23. Practical issues when using stochastic data
- Who/what can supply stochastic data?
- User
- Data from past runs
- On-line measurement tools
- Network Weather Service time series data
- Time frame
- Given a time series, how much data should we consider?
24. Accuracy of stochastic results
- The result of a stochastic prediction will also be a range of values
- Need to consider how to achieve a tight (sharp) interval
- What to do if the interval isn't tight
25. How can I use these predictions in scheduling?
Point Value Parameters
Stochastic Value Parameters
Structural Prediction Models
Stochastic Prediction
Stochastic Scheduling
26. Using stochastic predictions
- Simplest scheduling situation: given a data-parallel application, adjust the amount of data assigned to each processor to minimize execution time
27. Delay in one can cause delay in all (figure)
28. Stochastic Scheduling
- Examine:
- Stochastic data represented as normal distributions
- Data-parallel codes
- Fixed set of shared resources
- Question: how should data be distributed to minimize execution time?
- Approach: adjust the data allocation so that a high-variance machine receives less work, in order to minimize the effects of contention
29. Time Balancing
- Minimize execution time by assigning data so that each processor finishes at roughly the same time
- D_i = data assigned to processor i
- u_i = time per unit of data on processor i
- c_i = time to distribute the data
- D_i * u_i + c_i = D_j * u_j + c_j for all i, j
- Sum over i of D_i = D_total
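These equations solve in closed form: if T is the common finish time, then D_i = (T - c_i) / u_i, and the total-data constraint pins down T. A minimal sketch of that solve:

```python
def time_balance(u, c, d_total):
    """Solve D_i * u_i + c_i = T for all i, with sum(D_i) = d_total.
    u[i]: time per unit of data on processor i; c[i]: distribution cost."""
    # From sum((T - c_i)/u_i) = d_total:
    t = (d_total + sum(ci / ui for ui, ci in zip(u, c))) \
        / sum(1.0 / ui for ui in u)
    return [(t - ci) / ui for ui, ci in zip(u, c)]

# Two processors, no distribution cost: the faster one (smaller u) gets more data.
alloc = time_balance(u=[1.0, 2.0], c=[0.0, 0.0], d_total=300)
print(alloc)  # [200.0, 100.0]
```

With point-valued u_i this is the classic time-balancing schedule; the stochastic variants below only change how u_i is chosen.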
30. Stochastic Time Balancing
- Adapt the time to compute a unit of data (u_i) to reflect stochastic information
- Larger u_i means smaller D_i (less data)
- If we have normal distributions:
- The 95% confidence interval corresponds to [m - 2*sd, m + 2*sd]
- If we set u = m + 2*sd
- We get a 95% conservative schedule
31. Stochastic Time Balancing (cont.)
- The set of equations is now:
- D_i * (m_i + 2*sd_i) + c_i = D_j * (m_j + 2*sd_j) + c_j
- for all i, j
- Sum over i of D_i = D_total
32. How do policies compare in a production environment?
- 4 contended Sparcs over 10 Mbit shared Ethernet
33. Set of Schedules (figure)
34. Tuning factor
- The tuning factor (TF) is the knob to turn to decide how conservative a schedule should be
- For example, it can determine the number of standard deviations to add to the mean
- Let u_i = m_i + sd_i * TF
- Solve:
- D_i * (m_i + sd_i * TF) + c_i = D_j * (m_j + sd_j * TF) + c_j
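A sketch of a TF-parameterized scheduler, solving the time-balancing equations in closed form with u_i = m_i + sd_i * TF (function name is illustrative). TF = 0 reproduces the mean-based schedule; TF = 2 gives the 95%-conservative one:

```python
def tuned_schedule(means, sds, c, d_total, tf):
    """Time balancing with u_i = m_i + sd_i * TF: larger TF shifts
    data away from high-variance machines."""
    u = [m + s * tf for m, s in zip(means, sds)]
    # Common finish time T from sum((T - c_i)/u_i) = d_total:
    t = (d_total + sum(ci / ui for ui, ci in zip(u, c))) \
        / sum(1.0 / ui for ui in u)
    return [(t - ci) / ui for ui, ci in zip(u, c)]

# Same mean speed, but machine 1 is more variable:
means, sds, c = [1.0, 1.0], [0.0, 0.5], [0.0, 0.0]
print(tuned_schedule(means, sds, c, 300, tf=0.0))  # equal split: variance ignored
print(tuned_schedule(means, sds, c, 300, tf=2.0))  # less data to the noisy machine
```

Because TF is just a parameter, it can also be set per-platform by a heuristic (as in the VTF policy) rather than fixed at 2.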
35. Extensible approach
- Don't have to use mean and standard deviation
- TF can be defined in a variety of ways
36. Defining our stochastic scheduling policy goals
- Decrease execution time
- Predictable performance
- Avoid spikes in execution behavior
- More conservative when in doubt
37. System of benefits and penalties
- Based on Sih and Lee's approach to scheduling
- Benefit (give a less conservative schedule to):
- Platforms with fewer varying machines
- Low-variance machines, especially those with lower power
38. Partial ordering (figure)
39. Algorithm for TF (figure)
40. Scheduling Experiments
- Platform:
- 4 contended PCs running Linux
- 100 Mbit shared Ethernet connection
- 3 policies run back to back:
- Mean: u_i based on runtime mean prediction
- VTF: u_i based on mean and heuristic TF evaluation
- 95TF: u_i based on the 95% confidence interval
41. Metrics
- Window: which of each window of three runs has the fastest execution time?
- Compare: how often was one policy better than, worse than, or split when compared with the policy run just before and just after?
- What's the right metric?
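The Window metric can be tallied mechanically from back-to-back timings. A hedged sketch (the timing data below is made up purely for illustration):

```python
def window_wins(times):
    """times: list of (mean, vtf, tf95) execution times, one tuple per
    window of three back-to-back runs. Count which policy was fastest
    in each window."""
    names = ("Mean", "VTF", "95TF")
    wins = {n: 0 for n in names}
    for window in times:
        fastest = min(range(3), key=lambda i: window[i])
        wins[names[fastest]] += 1
    return wins

# Two illustrative windows of three runs each:
print(window_wins([(10.0, 8.0, 9.0), (7.0, 7.5, 6.9)]))
```

The Compare metric is a pairwise version of the same idea, crediting a policy only when it beats both the run before and the run after it.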
42. SOR scheduling 1
- Window: Mean 9, VTF 27, 95TF 22 (of 57)
- Compare:
      Policy | Better | Mixed | Worse
      Mean   |      3 |     4 |    12
      VTF    |     10 |     7 |     3
      95TF   |      6 |     9 |     4
43. CPU performance (figure)
44. SOR scheduling 2
- Window: Mean 8, VTF 39, 95TF 11 (of 57)
- Compare:
      Policy | Better | Mixed | Worse
      Mean   |      3 |     7 |     9
      VTF    |     15 |     2 |     3
      95TF   |      3 |     8 |     8
45. CPU (figure)
46. Experimental Conclusions
- Stochastic information was more beneficial when there was higher variability in available CPU
- We almost always saw a reduction in the variation of actual execution times
- It is unclear at this point when it is better to use which heuristic scheduling policy
47. Summary
- The ability to parameterize structural models with stochastic values in order to meet the prediction needs of shared clusters of workstations
- A stochastic scheduling policy that can make use of stochastic predictions to achieve better execution times and more predictable application behavior
48. The Grid
- What is a Grid?
- Shared resources
- Coordinated problem solving
- Grid Problems
- Multiple sites (multiple institutions)
- Autonomy
- Heterogeneity
- Focus on the user
49. Scheduling on the Grid
- Select resources
- Machines
- Network
- Storage Devices (Data replica)
- Move
- Data to compute
- Compute to Data
- Both
50. Moving Compute to Data
- Find the fastest data source, move the computation
- Pro:
- Often most efficient (bandwidth is a scarce commodity)
- Cons:
- Executables are picky (compilers, libraries, etc.)
- Authentication/authorization/accounting
- Many minimum parameters: memory, disk, bandwidth, etc.
- Socio-political
51. Move Data to Compute
- Select the best compute resource, copy the data
- Pro:
- Common model today
- Sure that the application will run
- Con:
- May not be most efficient (the best compute resource may be far from the data)
- Need to pick the best data source for a given compute source
52. Data Replication
- Extremely large data sets
- Distributed storage sites
- One file may be available from a number of different sources
- Question: where is the best source for me to copy it from?
53. High Energy Physics Example (figure)
Image courtesy H. Newman, Caltech and C. Kesselman, ISI
54. Data Replica Selection
- Given a logical file name, the Replica Catalog returns a set of physical file names
- Which replica should we copy the data from?
- (What happens if that connection is interrupted?)
55. What do we need to do?
- Need information
- Need to make a decision
- These are intertwined
56. Where does the info come from?
(Figure: resources register with VO-specific aggregate directories)
57. What info is available?
- Anything in the MDS
- Anything we write GRISs for
- GridFTP
- NWS
58. How do we make a decision?
- Depends on the data
- Sudharshan:
- GridFTP data, NWS data on bandwidth
- NWS predictors
- Christopher and Jeff:
- Data transfer data
- Case-based reasoning techniques
59. Stochastic Predictions
- Given variance information about
- Bandwidth
- Disk access times
- Example
- Data transfer from A will take 5-7 minutes
- Data transfer from B will take 3-9 minutes
- Which to pick?
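One way to frame the choice: a conservative policy minimizes the worst case (source A, bounded at 7 minutes), while an optimistic one chases the best case (source B, possibly 3 minutes). A hedged sketch of such a selector (names and policy labels are illustrative, not from an existing replica-selection API):

```python
def pick_replica(predictions, policy="conservative"):
    """predictions: {source: (low, high)} predicted transfer-time
    interval in minutes. 'conservative' minimizes the worst case
    (interval upper bound); 'optimistic' minimizes the best case."""
    idx = 1 if policy == "conservative" else 0
    return min(predictions, key=lambda s: predictions[s][idx])

preds = {"A": (5.0, 7.0), "B": (3.0, 9.0)}
print(pick_replica(preds, "conservative"))  # A: worst case is only 7 minutes
print(pick_replica(preds, "optimistic"))    # B: could finish in 3 minutes
```

A tighter interval is worth something in itself: the schedule built around source A is more predictable even when its midpoint is no better.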
60. The Bigger Picture
- We'd like a framework for data replica selection
- Choose any info available
- Choose your own algorithm
- We'd like to combine data replica selection with CPU selection
- We'd like to make life easier for the application scientist
61. Collaborators
- The AppLeS group: Fran Berman (UCSD), Rich Wolski (Univ. Tennessee, UCSB)
- The Northwestern University Parallel Distributed (Beer) Lab Team
- Argonne DSL Students
62. Contact
- jms@cs.nwu.edu
- http://www.cs.nwu.edu/jms
- Funding:
- NASA GSRP grant NGT-1-52133
- DARPA Contract N66001-97-C-8521
- NSF CAREER Grant ACI-0093300