Title: Scheduling for Performance
1Scheduling for Performance
- UMass-Boston
- Ethan Bolker
- April 21, 1999
2Acknowledgements
- Joint work with Jeff Buzen (BMC Software)
- BMC Software
- Dan Keefe
- Yefim Somin
- Chen Xiaogang (oliver_at_cs.umb.edu)
3Outline
- Impossibly much to cover
- Performance metrics for workloads
- Beyond priorities
- Modeling. Degradation as a performance metric
- Conservation laws and the permutahedron
- Specifying response times (IBM goal mode)
- Specifying CPU shares (Sun Fair Share)
- Priority distributions
- Work in progress
4Workload Performance Metrics
- Transaction (open) workload: jobs arrive at random from an external source
  - web or database server, eris with many interactive users
  - inputs: job arrival rate (throughput), service time
  - performance metric: response time
- Batch (closed) workload: jobs always waiting (latent demand)
  - weather prediction, data mining
  - input: job service time
  - performance metrics: response time, throughput
5Beyond priorities
- User wants performance assurance
  - response time (open wkls), throughput (closed wkls)
- Single workload: performance depends on resources available (CPU, IO, network)
- Multiple workloads: prioritize resource access
- Nice isn't nice: hard to predict performance from priorities
- Better: set performance goals, system tunes itself
- Examples: IBM Goal Mode, Sun Fair Share, Eclipse, SMART, ...
6Tuning by Tinkering
[Figure: loop connecting Administrator, Priority Assignments, and Workload Performance (Response Time)]
7Scheduling for Performance
[Figure: loop connecting Administrator, Performance Goals, Workload Performance (Response Time), and Priority Assignments, with arrows labeled "rarely change" and "measure frequently"]
8Modeling
- System is dynamic, state changes frequently
- Model is a static snapshot, deals in averages and probabilities
- Can ask "what if?" inexpensively
- Modeler's measure of performance: degradation = (elapsed time)/(service time)
  - $\mathrm{deg} \ge 1$, $\mathrm{deg} = 1$ when no contention
  - ($\mathrm{deg} < 1$ if parallel computation possible)
  - $\mathrm{deg} = n$ for $n$ closed workloads (no priorities)
9Modeling One Open Workload
- arrival rate $\lambda$ (job/sec) (Poisson)
- service time $s$ (sec/job) (exponential distribution)
- utilization $u = \lambda s$, $0 \le u < 1$
- Theorem: $\mathrm{deg} = 1/(1-u)$
- Often a useful guide even when hypotheses fail
  - depends only on $u$: many small jobs behave like few large jobs
  - faster system $\Rightarrow$ smaller $s$ $\Rightarrow$ smaller $u$ $\Rightarrow$ smaller deg
  - want $u$ small when waiting is costly (telephones)
  - want $u$ near 1 when system is costly (supercomputers)
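A minimal sketch of the one-workload theorem, assuming the Poisson arrival and exponential service hypotheses on this slide. The function name and the sample rates are illustrative only.

```python
def degradation(arrival_rate, service_time):
    """Degradation 1/(1-u) for one open workload, with u = arrival_rate * service_time."""
    u = arrival_rate * service_time
    if not 0 <= u < 1:
        raise ValueError("utilization must satisfy 0 <= u < 1")
    return 1.0 / (1.0 - u)

# Many small jobs and few large jobs with the same u degrade equally.
many_small = degradation(arrival_rate=8.0, service_time=0.1)   # u = 0.8
few_large = degradation(arrival_rate=0.08, service_time=10.0)  # u = 0.8

# A system twice as fast halves s, hence u, and lowers the degradation.
faster = degradation(arrival_rate=8.0, service_time=0.05)      # u = 0.4
```

At u = 0.8 a job takes about five times its service time; halving the service time brings that factor down to about 1.67.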
10Multiple (open) workloads
- Priority state: order workloads by priority (ties OK)
  - two workloads, 3 states: 12, 21, {12} (braces mark a tie)
  - three workloads, 13 states: 123 ($3! = 6$ ordered states), {12}3 (3 of these), 1{23} (3 of these), {123}
  - $n$ wkls, $f(n)$ states (simplex lock combos), $n!$ ordered
- At each time instant, system runs in some state $s$; $V(s)$ = vector of workload degradations
- Measure or model $V(s)$ (operational analysis)
- $p(s)$ = prob(state $s$) = fraction of time in state $s$
- $V = \sum_s p(s)\,V(s)$ (time average, convex combination)
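The state counts on this slide can be checked by brute force. The sketch below represents a priority state as a list of tied groups, highest priority first, and also shows the time average as a plain weighted sum. The probabilities and degradation vectors in the example are made-up numbers, not measurements.

```python
from itertools import combinations

def priority_states(workloads):
    """Yield every priority state: a list of tied groups, highest priority first."""
    workloads = tuple(workloads)
    if not workloads:
        yield []
        return
    # Choose the non-empty top-priority group, then order the rest recursively.
    for size in range(1, len(workloads) + 1):
        for top in combinations(workloads, size):
            rest = [w for w in workloads if w not in top]
            for tail in priority_states(rest):
                yield [top] + tail

def time_average(p, V):
    """V = sum over states s of p(s) * V(s); p and V are dicts keyed by state name."""
    n = len(next(iter(V.values())))
    return [sum(p[s] * V[s][i] for s in p) for i in range(n)]

# Number of states for 2, 3, and 4 workloads.
counts = [sum(1 for _ in priority_states(range(1, n + 1))) for n in (2, 3, 4)]

# Hypothetical two-workload system that spends half its time in each ordered state.
p = {"12": 0.5, "21": 0.5}
V = {"12": [1.25, 5.0], "21": [3.0, 2.0]}
average = time_average(p, V)
```

The counts for two and three workloads come out as 3 and 13, matching the slide.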
11Two workloads (general case)
[Figure: wkl 2 degradation plotted against wkl 1 degradation, showing the achievable region. Marked points: V(12) (wkl 1 high prio), V(21), the no-priorities point, and 0.5 V(12) + 0.5 V(21). Note $u_1 < u_2$.]
12Two workloads (conservation)
[Figure: wkl 2 degradation plotted against wkl 1 degradation. The achievable region lies on the line of constant average degradation, $(u_1 d_1 + u_2 d_2)/(u_1 + u_2) = \text{constant}$. Marked: V(12), V(21), the line $d_1 = d_2$, the no-priorities point, and 0.5 V(12) + 0.5 V(21).]
13Conservation
- Theorem: for any priority assignments, $\frac{1}{\mathrm{util}} \sum_{w} \mathrm{util}(w)\,\mathrm{deg}(w) = \text{constant}$ (the average degradation)
- Provable from some hypotheses, observable (false for printer queues)
- For any set $A$ of workloads
  - imagine giving those workloads top priority
  - discover (measure or model) avg degradation $\mathrm{deg}(A)$
  - $\frac{1}{\mathrm{util}(A)} \sum_{w \in A} \mathrm{util}(w)\,\mathrm{deg}(w) \ge \mathrm{deg}(A)$
- These linear inequalities determine the convex achievable region
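As an illustration of the conservation theorem, the sketch below builds the degradation vector for each strict priority ordering and checks that the utilization-weighted average is the same for all of them. It assumes one particular model for deg(A), namely that a top-priority set behaves like a single open workload with deg(A) = 1/(1 - util(A)). The slides leave deg(A) to measurement or modeling, so that formula, the function names, and the utilizations are all assumptions.

```python
from itertools import permutations

def top_priority_deg(util_a):
    """Assumed model: a top-priority set with total utilization util_a
    degrades like a single open workload, deg(A) = 1/(1 - util(A))."""
    return 1.0 / (1.0 - util_a)

def vertex(order, utils):
    """Degradation vector when workloads (0-based indices) are strictly
    prioritized in `order`, highest priority first.

    For each prefix A of the order, conservation is tight:
    sum over w in A of util(w) * deg(w) = util(A) * deg(A).
    """
    degs = [0.0] * len(utils)
    used = 0.0      # util(A) for the prefix so far
    weighted = 0.0  # sum of util(w) * deg(w) over the prefix so far
    for w in order:
        used += utils[w]
        target = used * top_priority_deg(used)
        degs[w] = (target - weighted) / utils[w]
        weighted = target
    return degs

def avg_deg(degs, utils):
    """Utilization-weighted average degradation, (1/util) * sum util(w) * deg(w)."""
    return sum(u * d for u, d in zip(utils, degs)) / sum(utils)

utils = [0.2, 0.3, 0.1]  # hypothetical utilizations, total 0.6
averages = [avg_deg(vertex(p, utils), utils) for p in permutations(range(3))]
```

Under this model every ordering gives the same average, 1/(1 - 0.6) = 2.5, while the individual degradations vary widely: the ordering (0, 1, 2) gives roughly 1.25, 2.5, and 5.0.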
14Two workloads (conservation)
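[Figure: $d_2$ plotted against $d_1$. The achievable region lies on the line $(u_1 d_1 + u_2 d_2)/(u_1 + u_2) = \text{constant avg degradation}$, between V(12) and V(21), bounded by $d_1 \ge 1/(1-u_1)$ and $d_2 \ge 1/(1-u_2)$. The no-priorities point is also marked.]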
15Three workloads
[Figure: axes $d_1$, $d_2$, $d_3$, with the plane $(u_1 d_1 + u_2 d_2 + u_3 d_3)/(u_1 + u_2 + u_3) = \text{avg degradation}$ and the vertices V(123) and V(213) marked]
16Three workload permutahedron
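[Figure: the three-workload permutahedron on axes $d_1$ and $d_2$, with the lines $d_1 = d_2$ and $d_2 = d_3$. Ordered states 123, 132, 312, 321, 231, 213 are labeled, along with the states that contain ties.]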
17Four workload permutahedron
- $4! = 24$ vertices (ordered states)
- $2^4 - 2 = 14$ facets (proper subsets) (conservation constraints)
- 74 faces (states)
Simplicial geometry and transportation polytopes, Trans. Amer. Math. Soc. 217 (1976) 138.
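The three counts on this slide can be reproduced by counting priority states by their number of tied groups: a state with k groups is one of k! S(n, k) ordered partitions, where S is the Stirling number of the second kind. This is a small counting sketch, and the function names are illustrative.

```python
from math import factorial

def stirling2(n, k):
    """Stirling number of the second kind: ways to split n items into k non-empty blocks."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def states_with_groups(n, k):
    """Priority states of n workloads that consist of exactly k tied groups."""
    return factorial(k) * stirling2(n, k)

n = 4
vertices = states_with_groups(n, n)   # all priorities distinct
facets = states_with_groups(n, 2)     # one high group and one low group
proper_faces = sum(states_with_groups(n, k) for k in range(2, n + 1))
```

For n = 4 this gives 24 vertices, 14 facets, and 74 faces in total; the single all-tied state corresponds to the polytope itself and is not included in the 74.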
18Scheduling for Performance
- Administrator specifies goals, e.g. degradations
- Software determines priorities, trying to meet goals
- Model maps goals to achievable degradations
[Figure: workload performance goals and the achievable region]
19IBM OS390 Goal Mode
Administrator specifies workload degradation goals
[Figure: wkl 2 degradation plotted against wkl 1 degradation, showing the achievable region and goal points labeled "too generous" and "too ambitious"]
20Modeling Goal Mode
- Find right point in permutahedron for given $V$
- Linear programming solution (Coffman, Mitrani)
- Algorithm modeling problem more closely
  - for each subset $A$ of workloads
    - scale($A$) = factor to force conservation true for $A$
  - for each workload $w$
    - scale($w$) = min { scale($A$) : scale($A$) < 1, $w \in A$ }
    - rescale $V(w)$ by scale($w$)
  - // inequalities now OK, scale back to permutahedron if necessary
- $O(2^n)$, fast enough, conjecture $\Omega(2^n)$
- Refinements for workload importance
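The sketch below covers only the first half of such an algorithm: it computes a conservation factor for every proper subset and uses the factors to label a goal vector as too ambitious, too generous, or achievable. It is not the algorithm from the talk. The factor is defined here as required/requested, so values above 1 signal trouble, which may be the inverse of the slide's convention. deg(A) = 1/(1 - util(A)) is an assumed model, and all names are hypothetical.

```python
from itertools import combinations

def top_priority_deg(util_a):
    """Assumed model for deg(A): a top-priority set behaves like one open workload."""
    return 1.0 / (1.0 - util_a)

def subset_scales(goals, utils):
    """For each proper subset A, the factor that would make conservation tight on A.

    scale(A) = util(A) * deg(A) / sum over w in A of util(w) * goal(w).
    A value above 1 means the goals for A ask for too little degradation.
    """
    n = len(utils)
    scales = {}
    for size in range(1, n):
        for a in combinations(range(n), size):
            util_a = sum(utils[w] for w in a)
            asked = sum(utils[w] * goals[w] for w in a)
            scales[a] = util_a * top_priority_deg(util_a) / asked
    return scales

def classify(goals, utils, tol=1e-9):
    """Label a goal vector 'too ambitious', 'too generous', or 'achievable'."""
    if any(s > 1 + tol for s in subset_scales(goals, utils).values()):
        return "too ambitious"
    total = sum(utils)
    asked = sum(u * g for u, g in zip(utils, goals))
    required = total * top_priority_deg(total)
    if asked > required + tol:
        return "too generous"
    if asked < required - tol:
        return "too ambitious"
    return "achievable"

utils = [0.2, 0.3]  # hypothetical utilizations
labels = [classify(g, utils) for g in ([1.0, 1.0], [1.25, 2.5], [3.0, 3.0])]
```

The loop over subsets is where the 2^n cost comes from. Moving an unachievable goal onto the achievable region, the "scale back" step, is deliberately left out.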
21SUN SRM (Solaris Resource Manager)
- Administrator specifies workload CPU shares
- Share $f$ ($0 < f < 1$) means wkl guaranteed fraction $f$ of CPU when it's on run queue, can get more if no competition
- Share = utilization only for closed workloads
- Model: $f_1 = 1$, $f_2 = f_3 = 0$ means wkl 1 has preemptive highest priority
- Two wkls: $V = f_1 V(12) + f_2 V(21)$
22Map Shares to Degradations
- Three ($n$) workloads
- $\mathrm{weight}(123) = \dfrac{f_1\, f_2\, f_3}{(f_1 + f_2 + f_3)(f_2 + f_3)(f_3)}$
- $V = \sum_{\text{ordered states } s} \mathrm{weight}(s)\, V(s)$
- Theorem: weights sum to 1
  - interesting identity generalizing adding fractions
  - prove by induction, or by coupon collecting
- $O(n!)$, $\Omega(n!)$, fast enough for $n < 9$ (12)
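A short sketch of the weight formula, generalized from three workloads to any ordering in the obvious way: each factor is a share divided by the sum of the shares not yet placed. The share values are hypothetical, and the code assumes all shares are strictly positive.

```python
from itertools import permutations

def weight(order, shares):
    """Product over positions k of shares[order[k]] / (sum of shares of order[k:]).

    Requires positive shares. This is the chance of drawing the workloads in
    this order when sampling without replacement in proportion to share.
    """
    remaining = sum(shares[w] for w in order)
    result = 1.0
    for w in order:
        result *= shares[w] / remaining
        remaining -= shares[w]
    return result

shares = [0.5, 0.3, 0.2]  # hypothetical shares f1, f2, f3 (0-based indices in code)
weights = {p: weight(p, shares) for p in permutations(range(3))}
total = sum(weights.values())
```

With these shares the ordering (0, 1, 2) gets weight 0.5/1.0 × 0.3/0.5 × 0.2/0.2 = 0.3, and the six weights sum to 1 up to rounding. For two workloads the formula reduces to weight(12) = f1/(f1 + f2), which agrees with the two-workload model V = f1 V(12) + f2 V(21) once shares are normalized.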
23Three workload example
24Map Shares to Degradations
- Normalize $f_1 + f_2 + f_3 = 1$ (barycentric coordinates)
[Figure: achievable region, with the locations $f_1 = 1$ and $f_1 = 0$ marked]
25Experimental results for 3 workloads
26Mapping a triangle to a hexagon
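[Figure: the share triangle mapped onto the three-workload hexagon. Labeled points: $f_1 = 1$, $f_2 = 0$ and $f_1 = 0$, $f_2 = 1$. Ordered states 132, 312, 123, 321, 231, 213 and the tied states are labeled. Labeled regions: wkl 1 high priority, wkl 1 low priority.]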
28Map Goals to Shares
- For open workloads, specifying shares is as unintuitive as specifying priorities
- Specify degradation goals
- Map to achievable region
- Reverse map from achievable region to shares
  - do
    - guess shares // bisection argument
    - compute degradations
  - until error is acceptably small
- 10 O(n!) is good to 1
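For two workloads the guess-and-check loop reduces to a one-dimensional bisection, sketched below. It assumes the two-workload share model V = f1 V(12) + (1 - f1) V(21) with normalized shares, and that the goal for workload 1 lies between its degradation at V(12) and at V(21). The vertex values are hypothetical, and the general n-workload search is not shown.

```python
def degradations(f1, v12, v21):
    """Two-workload share model: V = f1 * V(12) + (1 - f1) * V(21)."""
    return [f1 * a + (1.0 - f1) * b for a, b in zip(v12, v21)]

def shares_for_goal(goal_d1, v12, v21, tol=1e-6):
    """Bisect on f1 until workload 1's modeled degradation matches goal_d1.

    Assumes v12[0] <= goal_d1 <= v21[0], that is, the goal is achievable.
    """
    lo, hi = 0.0, 1.0  # a larger share for workload 1 means a lower d1
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if degradations(mid, v12, v21)[0] > goal_d1:
            lo = mid  # d1 still too high: give workload 1 more share
        else:
            hi = mid
    f1 = (lo + hi) / 2.0
    return f1, 1.0 - f1

v12 = [1.25, 2.5]  # hypothetical degradations with workload 1 on top
v21 = [2.75, 1.5]  # hypothetical degradations with workload 2 on top
f1, f2 = shares_for_goal(2.0, v12, v21)
```

Each pass halves the interval, so about twenty passes reach the default tolerance here. With n workloads every "compute degradations" step costs O(n!).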
29Map degradations to priorities
- Real system works with priorities
- pdist($w$, $p$) = prob(wkl $w$ at prio $p$) = time fraction
[Figure: pdist space (dim $n(n-1)$) and achievable region (dim $n-1$)]
30Pdists to degradations and back
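[Figure: achievable region for three workloads on axes $d_1$ and $d_2$, with the lines $d_1 = d_2$ and $d_2 = d_3$, divided into 6 pieces, each combinatorially a square; several states of workloads 1, 2, 3 are labeled]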
31Pdists to degradations and back
[Figure: four priority states of workloads 1, 2, 3, each with its pdist matrix]

1 0 0
0 .5 .5
0 .5 .5

.33 .33 .33
.33 .33 .33
.33 .33 .33

1 0 0
0 1 0
0 0 1

.5 .5 0
.5 .5 0
0 0 1
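One way to arrive at matrices like those on this slide is to average over orderings: if the system spends a given fraction of time in each strict ordering, pdist(w, p) is the total fraction of time in which workload w sits at level p. The sketch below assumes that a tie behaves like equal time in each of its consistent orderings, which is a modeling assumption and not something the slide states. Rows are workloads and columns are priority levels, both 0-based.

```python
from itertools import permutations

def pdist(weights, n):
    """matrix[w][p] = fraction of time workload w sits at priority level p.

    `weights` maps an ordering (tuple of workload indices, highest priority
    first) to the fraction of time the system spends in that ordering.
    """
    matrix = [[0.0] * n for _ in range(n)]
    for order, fraction in weights.items():
        for level, w in enumerate(order):
            matrix[w][level] += fraction
    return matrix

# Workload 0 always on top; workloads 1 and 2 swap places half the time.
tied_low = pdist({(0, 1, 2): 0.5, (0, 2, 1): 0.5}, 3)

# All six orderings equally likely: no priorities at all.
uniform = pdist({p: 1.0 / 6.0 for p in permutations(range(3))}, 3)
```

Every row and every column of the result sums to 1, because each ordering places every workload at exactly one level.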
32Work in progress
- Model mixed open and closed workloads
- Prove algorithms correct
- Solaris benchmark studies (under way)
- OS390 validation - does data exist?
- Write the paper ...
- Build a product for IBM/Sun/BMC customers