Evaluating Task Assignment Policies for Distributed Supercomputing Servers - PowerPoint PPT Presentation


Slides: 26

Provided by: rob1129

Learn more at: http://www.cs.cmu.edu


Transcript and Presenter's Notes


1
Evaluating Task Assignment Policies for
Distributed Supercomputing Servers
Bianca Schroeder, Mor Harchol-Balter
Computer Science Dept., Carnegie Mellon University
www.cs.cmu.edu/bianca,harchol
2
The Distributed Server Model
Task Assignment Policy (TAP): rule for assigning jobs
to hosts.

Jobs are processed First-Come-First-Served.
Jobs are run to completion.
Users provide upper bounds on runtime.

Motivation: Xolas, Pleiades, NASA Ames and PSC
distributed servers.
3
Commonly used TAPs

1. Random
2. Round-Robin
3. Shortest-Queue
Send job to host with the fewest jobs.
4. Least-Work-Left
Send job to host with the least total work left.
5. Runtime-Based-E
Separate jobs by runtime so that each host receives equal expected load.
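As a rough illustration of the five policies, the sketch below models the distributed server from slide 2 (FCFS hosts, jobs run to completion) and expresses each TAP as a function that picks a host for an arriving job. All names (`DistributedServer`, `run`, the `*_tap` functions) are hypothetical, and the sketch assumes the dispatcher knows each job's exact runtime and each host's exact work left, whereas in practice users supply only upper bounds on runtime.

```python
import bisect
import itertools
import random
from typing import Callable, List, Sequence, Tuple


class DistributedServer:
    """Hypothetical model: h hosts, each serving its own queue FCFS, run to completion."""

    def __init__(self, num_hosts: int) -> None:
        self.free_at = [0.0] * num_hosts  # time at which each host drains its queue
        self.completions: List[List[float]] = [[] for _ in range(num_hosts)]

    def work_left(self, t: float) -> List[float]:
        # Unfinished work (queued plus remaining service) on each host at time t.
        return [max(0.0, f - t) for f in self.free_at]

    def queue_lengths(self, t: float) -> List[int]:
        # Jobs still in the system (queued or running) on each host at time t.
        return [sum(1 for c in cs if c > t) for cs in self.completions]

    def assign(self, host: int, t: float, size: float) -> float:
        start = max(t, self.free_at[host])  # FCFS: start after all earlier jobs
        done = start + size
        self.free_at[host] = done
        self.completions[host].append(done)
        return done


# A policy maps (server state, arrival time, runtime) to a host index.
Policy = Callable[[DistributedServer, float, float], int]


def random_tap(seed: int = 0) -> Policy:
    rng = random.Random(seed)
    return lambda srv, t, size: rng.randrange(len(srv.free_at))


def round_robin_tap() -> Policy:
    counter = itertools.count()  # fresh counter per policy instance
    return lambda srv, t, size: next(counter) % len(srv.free_at)


def shortest_queue_tap(srv: DistributedServer, t: float, size: float) -> int:
    q = srv.queue_lengths(t)
    return q.index(min(q))  # ties go to the lowest-numbered host


def least_work_left_tap(srv: DistributedServer, t: float, size: float) -> int:
    w = srv.work_left(t)
    return w.index(min(w))


def runtime_based_tap(cutoffs: Sequence[float]) -> Policy:
    # Host i receives jobs whose runtime lies in (cutoffs[i-1], cutoffs[i]].
    return lambda srv, t, size: bisect.bisect_left(cutoffs, size)


def run(policy: Policy, jobs: Sequence[Tuple[float, float]], num_hosts: int) -> List[float]:
    """Response time of each job; jobs are (arrival_time, runtime) in arrival order."""
    srv = DistributedServer(num_hosts)
    return [srv.assign(policy(srv, t, size), t, size) - t for t, size in jobs]
```

For example, `run(least_work_left_tap, [(0.0, 10.0), (1.0, 1.0), (2.0, 1.0)], 2)` sends both short jobs to the idle host and yields `[10.0, 1.0, 1.0]`, while `run(round_robin_tap(), ...)` on the same jobs queues the third job behind the long one and yields `[10.0, 1.0, 9.0]`.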

4
What is a good TAP?

We want to minimize
1. mean response time,
2. mean slowdown,
3. variance in slowdown.

Additionally, we desire fairness.
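The three objectives can be computed from a simulation log roughly as sketched below. The sketch assumes slowdown is a job's response time divided by its runtime; `slowdown_by_class` is a hypothetical helper showing one possible way to inspect fairness, by comparing the mean slowdown of short and long jobs.

```python
import statistics
from typing import Dict, Sequence, Tuple


def tap_metrics(arrivals: Sequence[float], completions: Sequence[float],
                runtimes: Sequence[float]) -> Dict[str, float]:
    """Mean response time, mean slowdown and variance of slowdown over all jobs."""
    response = [c - a for a, c in zip(arrivals, completions)]
    # Slowdown = response time / runtime (at least 1 when jobs run to completion).
    slowdown = [r / s for r, s in zip(response, runtimes)]
    return {
        "mean_response_time": statistics.fmean(response),
        "mean_slowdown": statistics.fmean(slowdown),
        "var_slowdown": statistics.pvariance(slowdown),  # population variance
    }


def slowdown_by_class(arrivals: Sequence[float], completions: Sequence[float],
                      runtimes: Sequence[float], threshold: float) -> Tuple[float, float]:
    """Hypothetical fairness check: mean slowdown of jobs with runtime <= threshold
    and of jobs with runtime > threshold (both classes must be non-empty)."""
    short, long_ = [], []
    for a, c, s in zip(arrivals, completions, runtimes):
        (short if s <= threshold else long_).append((c - a) / s)
    return statistics.fmean(short), statistics.fmean(long_)
```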
5
Which TAP is best according to the literature?

1. Round-Robin
2. Random
3. Shortest-Queue
4. Least-Work-Left
Optimal for exponentially distributed runtimes (Wolff 1989).
5. Runtime-Based-E
Better for heavy-tailed runtime distributions (Harchol-Balter 1998).
6
Simulation Setup

Runtimes are taken from PSC's Cray J90 and C90 traces.
Arrival times are
A. Poisson (i.i.d. exponential interarrival times), or
B. taken from the traces.
The system has 2 or more hosts.
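For case A, one way to drive a simulation at a target system load is sketched below. It assumes system load is defined as λE[S]/h for total arrival rate λ, mean runtime E[S] and h hosts, and that trace runtimes are used in trace order; `poisson_workload` is a hypothetical helper.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def poisson_workload(runtimes: Sequence[float], num_hosts: int, load: float,
                     seed: int = 0) -> List[Tuple[float, float]]:
    """Hypothetical helper: (arrival_time, runtime) pairs with Poisson arrivals.

    Assumes system load = lam * E[S] / num_hosts, with lam the total arrival rate.
    """
    rng = random.Random(seed)
    lam = load * num_hosts / statistics.fmean(runtimes)
    t = 0.0
    jobs: List[Tuple[float, float]] = []
    for size in runtimes:            # runtimes kept in trace order (an assumption)
        t += rng.expovariate(lam)    # i.i.d. exponential interarrival times
        jobs.append((t, size))
    return jobs
```

At load 0.5 on two hosts with mean runtime 2.0, the total arrival rate is 0.5, so 3000 jobs span roughly 6000 time units.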

7
Simulation Results for Slowdown
[Plot: slowdown vs. system load for Random, LWL, and Runtime-Based.]
8
Simulation Results for Variance of Slowdown
[Plot: variance of slowdown vs. system load for Random, LWL, and Runtime-Based.]
9
WHY does Runtime-Based work so well?
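Recall the P-K formula for the M/G/1 FCFS queue:
$\mathbf{E}[W] = \frac{\lambda\,\mathbf{E}[S^2]}{2(1-\rho)}$
The mean waiting time $\mathbf{E}[W]$ grows with $\mathbf{E}[S^2]$, the second moment of the runtime distribution ($\lambda$ = arrival rate, $\rho = \lambda\,\mathbf{E}[S]$ = load).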
Runtime-Based reduces the variance of the runtime
distribution at each host. No other policy does
this!
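To see the effect numerically, the sketch below evaluates the P-K formula for two dispatch rules under Poisson arrivals. Both Random and a runtime cutoff split a Poisson stream into independent Poisson streams, so each host is an M/G/1 FCFS queue and the formula applies exactly; it does not model LWL or Shortest-Queue. The toy runtime mix and the function names are made up for illustration.

```python
from typing import Sequence


def pk_mean_wait(rate: float, sizes: Sequence[float]) -> float:
    """P-K formula for M/G/1 FCFS: E[W] = rate * E[S^2] / (2 * (1 - rho))."""
    if rate == 0 or not sizes:
        return 0.0
    m1 = sum(sizes) / len(sizes)                 # E[S] of the empirical mix
    m2 = sum(s * s for s in sizes) / len(sizes)  # E[S^2] of the empirical mix
    rho = rate * m1
    if rho >= 1:
        return float("inf")                      # unstable queue
    return rate * m2 / (2 * (1 - rho))


def wait_random(rate: float, sizes: Sequence[float], num_hosts: int) -> float:
    # Random: each host gets rate/h and still sees the full runtime mix.
    return pk_mean_wait(rate / num_hosts, sizes)


def wait_runtime_split(rate: float, sizes: Sequence[float], cutoff: float) -> float:
    # Two hosts: runtimes <= cutoff go to host 1, the rest to host 2.
    short = [s for s in sizes if s <= cutoff]
    long_ = [s for s in sizes if s > cutoff]
    p = len(short) / len(sizes)                  # fraction of jobs that are short
    # Average waiting time over all jobs, weighting each host by its share of jobs.
    return p * pk_mean_wait(rate * p, short) + (1 - p) * pk_mean_wait(rate * (1 - p), long_)
```

With `sizes = [1.0] * 10 + [10.0]` and total rate `0.55` (load 0.5 per host), `wait_random(0.55, sizes, 2)` gives about 2.75, while `wait_runtime_split(0.55, sizes, 1.0)` gives about 0.91: both hosts still carry load 0.5, but each now sees a single runtime.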
10
Simulation Results for Slowdown
[Plot: slowdown vs. system load for Random, LWL, and Runtime-Based.]
11
Is balancing load optimal?
All policies we have seen so far balance load.

12
New: Load Unbalancing
Runtime-Based-U
13
Simulation results for Runtime-Based-U: Slowdown
[Plot: slowdown vs. system load for Runtime-Based-E, Runtime-Based-U-fair, and Runtime-Based-U-opt.]
14
Simulation results for Runtime-Based-U: Variance in slowdown
[Plot: variance of slowdown vs. system load for Runtime-Based-E, Runtime-Based-U-fair, and Runtime-Based-U-opt.]
15
Why does Runtime-Based-U work so well?

Like Runtime-Based-E, it reduces
the variance in job sizes.
It unbalances load.

16
How unbalanced is the load under Runtime-Based-U?
[Plot: fraction of total load going to host 1 vs. system load for Runtime-Based-E, Runtime-Based-U-fair, and Runtime-Based-U-opt.]
17
Difficulties for runtime-based policies

Knowing runtimes (Downey 1997; Gibbons 1997; Smith et al. 1998).
Finding cutoffs: a simple calculation using the P-K formula,
needing only 1/10 of the trace data.
18
Conclusion
Differences between TAPs are huge! Which TAPs are good
is not intuitive before analysis!

Reducing variance at hosts
is important.
Load unbalancing may be
better than load balancing.
Penalizing long jobs may
actually be fair.

19
Simulation Results for Slowdown
[Plot: slowdown vs. system load.]
20
Simulation Results for Slowdown
[Plot: slowdown vs. system load.]
21
Simulation results for scaled interarrival times
22
Simulation results for scaled interarrival times
23
Simulation results for more than 2 hosts
[Plot: slowdown vs. number of hosts.]
24
The SITA-E algorithm: Size Interval Task Assignment with Equal Load
[Diagram: outside arrivals are dispatched by runtime: S jobs to Host 1, M jobs to Host 2, L jobs to Host 3, XL jobs to Host 4.]
The cutoffs are chosen so as to balance the
load at the hosts.
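Equal-load cutoffs for h hosts can be estimated from a runtime sample by sorting it and closing an interval each time the cumulative work passes another 1/h of the total, as sketched below. The helper names are hypothetical, and with discrete data the resulting loads are only approximately equal.

```python
import bisect
from typing import List, Sequence


def sita_e_cutoffs(sizes: Sequence[float], num_hosts: int) -> List[float]:
    """Hypothetical helper: cutoffs x_1 <= ... <= x_{h-1} such that each runtime
    interval carries roughly 1/h of the total work in the sample."""
    ordered = sorted(sizes)
    total = sum(ordered)
    cutoffs: List[float] = []
    acc = 0.0
    k = 1
    for s in ordered:
        acc += s
        # Close interval k once cumulative work reaches k/h of the total.
        # A single very large job can close several intervals (leaving some empty).
        while k < num_hosts and acc >= k * total / num_hosts:
            cutoffs.append(s)
            k += 1
    return cutoffs


def sita_host(size: float, cutoffs: Sequence[float]) -> int:
    """Host index for a job: S -> 0, M -> 1, ...; intervals are (x_{i-1}, x_i]."""
    return bisect.bisect_left(cutoffs, size)
```

For runtimes 1 through 8 on four hosts, `sita_e_cutoffs` returns `[4, 6, 7]`, and `sita_host(5, [4, 6, 7])` routes a 5-unit job to the second host (index 1).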
25
How do you find the optimal or fair cutoff?