Title: Studies of the User-Scheduler Relationship
1. Studies of the User-Scheduler Relationship
- Cynthia Bailey Lee
- Advisor: Allan E. Snavely
- Department of Computer Science and Engineering
- San Diego Supercomputer Center
- University of California, San Diego
- May 19, 2008
2. Introduction
Outline: Introduction, Runtime Inaccuracy, Utility Functions, Utility Model, Scheduler
- The job submission routine
- Edit job script, including resources needed and amount of time requested
- Submit job; typically, many questions remain
- Did I request enough time?
- How long will the job wait in the queue?
- Eventually, job runs; more questions
- I submitted to a high-priority queue: was my wait time actually shorter than if I hadn't?
- By how much?
- Was it worth it?
- Is this a satisfying relationship for either party?
3. Contributions of This Work
- Falsified the Padding Hypothesis as the sole explanation for users' inaccurate runtime requests
- Quantified users' valuation of turnaround by collecting actual users' utility curves
- Proposed a model for synthetically generating utility functions that draws on patterns seen in the actual user curves
- Developed a genetic algorithm-based scheduler that uses aggregate utility as an explicit objective function
4. The Padding Hypothesis
- The inaccuracy of users' requested runtimes, relative to the actual runtime of jobs, is explained by users explicitly padding otherwise accurate runtime estimates in order to avoid the possibility of the job being killed by the scheduler.
5. Padding Hypothesis
SDSC users were asked to provide a no-kill/no-pressure runtime estimate, with prizes for being accurate
- Lessons Learned
- Users can't provide the information most schedulers ask for, but
- Maybe they can (and would want to) provide useful information schedulers currently don't ask for
Users are able to self-identify as more or less
accurate
6. What is a Utility Function?
[Figure: utility u(t) versus time, with time-of-day ticks (8 am, 12, 1 pm, 5 pm, 8 am, 9 am)]
Other factors: coordinating with other grid sites or sensors, paper deadlines, weather and hurricane prediction
7. Real Users' Functions
- Randomly-selected users of SDSC systems provided these data points for jobs they were submitting
- Utility is in terms of the SDSC charge unit (SU)
8. More Real Users' Functions
9. Existing Model
Used by, e.g., Chun and Culler 2002, and Irwin, Grit, and Chase 2004
10. Proposed Model
- To use Aggregate Utility, utility functions are needed for all jobs
- Propose to store each function as a series of (time, value) pairs appended to each line of the Standard Workload Format, allowing arbitrarily-shaped functions
Absent real data collected from users for each job, we need a model for synthetic generation...
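
As a rough illustration of the (time, value) storage idea, the sketch below reads pairs appended to a workload line and evaluates the resulting function at an arbitrary time. The field count of 18, the flat "time value time value ..." layout of the appended tokens, linear interpolation between points, and the names `parse_utility` and `utility_at` are all assumptions made for this example, not details taken from the talk.

```python
from bisect import bisect_right

SWF_FIELDS = 18  # assumed number of standard fields preceding the appended pairs


def parse_utility(line):
    """Extract the appended (time, value) pairs from one workload line."""
    tokens = line.split()
    extra = [float(tok) for tok in tokens[SWF_FIELDS:]]
    if len(extra) % 2 != 0:
        raise ValueError("appended tokens must form (time, value) pairs")
    # Pair up alternating tokens and order the points by time.
    return sorted(zip(extra[0::2], extra[1::2]))


def utility_at(points, t):
    """Evaluate the function at time t, interpolating linearly (an assumption)."""
    times = [p[0] for p in points]
    if t <= times[0]:
        return points[0][1]
    if t >= times[-1]:
        return points[-1][1]
    i = bisect_right(times, t)  # first point strictly later than t
    (t0, v0), (t1, v1) = points[i - 1], points[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical job line: 18 placeholder fields, then three (time, value) pairs.
line = " ".join(["1"] * SWF_FIELDS + ["0", "100", "3600", "50", "7200", "0"])
points = parse_utility(line)
print(utility_at(points, 1800))
```

Because the points are stored explicitly, any shape (linear, stepped, or irregular) can be represented without changing the format.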
11. Modeling Three Distinct Decay Patterns
- Expected Linear
- Expected Exponential
- Step
- "Expected" refers to the fact that each point is chosen randomly (i.e., most won't follow the pattern as cleanly as shown here)
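
One way to picture the three patterns is the sketch below, which draws each intermediate point near the pattern's ideal value. The uniform jitter, the decay constant `DECAY_RATE`, the clamp that keeps values non-increasing, and the function name `synthetic_curve` are illustrative assumptions; the published model may choose points differently.

```python
import math
import random

DECAY_RATE = 3.0  # assumed steepness of the exponential pattern


def synthetic_curve(pattern, start_value, deadline, n_points=5, noise=0.1, seed=None):
    """Return (time, value) pairs scattered around the chosen decay pattern."""
    rng = random.Random(seed)
    points = [(0.0, float(start_value))]  # curve starts at the maximum value
    previous = float(start_value)
    for k in range(1, n_points):
        frac = k / n_points  # fraction of the deadline elapsed
        if pattern == "linear":
            ideal = start_value * (1.0 - frac)
        elif pattern == "exponential":
            ideal = start_value * math.exp(-DECAY_RATE * frac)
        elif pattern == "step":
            ideal = start_value  # full value until the deadline
        else:
            raise ValueError(f"unknown pattern: {pattern}")
        # Step curves stay exact; the other patterns get random jitter.
        jitter = 1.0 if pattern == "step" else rng.uniform(1.0 - noise, 1.0 + noise)
        value = min(previous, ideal * jitter)  # never rise above the prior point
        points.append((deadline * frac, value))
        previous = value
    points.append((float(deadline), 0.0))  # value reaches zero at the deadline
    return points


for name in ("linear", "exponential", "step"):
    print(name, synthetic_curve(name, 100.0, 3600.0, seed=1))
```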
12. Start Values and Deadlines
- User-provided priority (queue) from the log controls the starting (maximum) job value
- Distribution of actual wait times from the log controls the deadline (when the value goes to zero)
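
A minimal sketch of how these two inputs might be derived from a log follows. The queue names and multipliers in `QUEUE_MULTIPLIER`, the `base_cost` parameter, and drawing the deadline by sampling directly from observed wait times are hypothetical choices for illustration only.

```python
import random

# Hypothetical charge multipliers per queue; real values depend on the site.
QUEUE_MULTIPLIER = {"low": 0.5, "normal": 1.0, "high": 2.0}


def start_value(queue, base_cost):
    """Starting (maximum) value: the job's base cost scaled by its queue."""
    return QUEUE_MULTIPLIER[queue] * base_cost


def sample_deadline(observed_waits, rng):
    """Deadline drawn from the empirical distribution of logged wait times."""
    return rng.choice(observed_waits)


rng = random.Random(42)
waits_from_log = [60.0, 600.0, 3600.0, 86400.0]  # made-up wait times in seconds
print(start_value("high", 100.0), sample_deadline(waits_from_log, rng))
```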
13. Metric: Aggregate Utility
- Reflects administrator's priorities
- allocation of funds (SUs/Monopoly money) to users at the beginning of the fiscal year/quarter/month/etc.
- Reflects users' personal input
- how they choose to spend their funds
- Enables more comprehensive evaluation and
comparison of all job scheduling algorithms
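
One plausible way to write the metric down is given here. The symbols are assumptions introduced for this note: J is the set of jobs, S is a schedule, u_j is job j's utility function, and t_j(S) is the delay job j experiences under S.

```latex
U_{\mathrm{agg}}(S) \;=\; \sum_{j \in J} u_j\bigl(t_j(S)\bigr)
```

Under this reading, two schedulers can be compared on the same workload by comparing their values of U_agg.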
14. Parallel Job Scheduling Explicitly by Utility Function
Finding the best solution is NP-hard
- Tennis Court Scheduling (human-powered)
- Still practiced occasionally at most centers (officially and not): a phone call to sys admins gets a job a reservation or moves it to the front of the queue
- Custom Heuristics
- Sort by current value, or a combination of start value and slope (Chun and Culler 2002; Irwin, Grit, and Chase 2004)
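
The "sort by current value" heuristic can be sketched as below. The job representation (a dict with `submit` and `utility` keys) and the function name `order_by_current_value` are assumptions for this example; the cited papers define their own formulations, including the start-value-and-slope variants, which are not reproduced here.

```python
def order_by_current_value(queue, now):
    """Return jobs ordered so the one worth the most right now comes first."""
    def current_value(job):
        waited = now - job["submit"]  # time the job has spent in the queue
        return job["utility"](waited)
    return sorted(queue, key=current_value, reverse=True)


# Two hypothetical jobs with different decay rates.
queue = [
    {"id": "a", "submit": 0, "utility": lambda w: max(0.0, 100 - w)},
    {"id": "b", "submit": 50, "utility": lambda w: max(0.0, 80 - 0.1 * w)},
]
print([job["id"] for job in order_by_current_value(queue, now=60)])
```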
15. Genetic Algorithm Scheduler
- Individuals
- permutations of the job queue ordering
- Mutation
- swap two randomly-selected jobs
- Reproduction
- zipper-like merging of parents (skip duplicates)
- Fitness: global utility of resulting schedule (approx.)
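
The operators listed above can be sketched as follows. The selection scheme (keep the fitter half), the mutation rate, the population size, and the toy `sequential_fitness` (jobs run one after another on a single machine with linear decay) are assumptions chosen to keep the example small; they are not the scheduler's actual parameters or fitness approximation.

```python
import random


def mutate(order, rng):
    """Mutation: swap two randomly selected jobs."""
    child = list(order)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def zipper_merge(parent_a, parent_b):
    """Reproduction: interleave the parents, skipping jobs already placed."""
    child, seen = [], set()
    for a, b in zip(parent_a, parent_b):
        for job in (a, b):
            if job not in seen:
                seen.add(job)
                child.append(job)
    return child


def sequential_fitness(runtime, start_value, deadline):
    """Toy fitness (assumed): total linear-decay utility on one machine."""
    def fitness(order):
        clock, total = 0.0, 0.0
        for job in order:
            total += max(0.0, start_value[job] * (1.0 - clock / deadline[job]))
            clock += runtime[job]
        return total
    return fitness


def evolve(jobs, fitness, generations=50, pop_size=20, mutation_rate=0.3, seed=0):
    """Evolve queue orderings and return the fittest one found."""
    rng = random.Random(seed)
    population = [rng.sample(jobs, len(jobs)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = zipper_merge(p1, p2)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


runtime = {"a": 10.0, "b": 5.0, "c": 20.0, "d": 1.0}
value = {"a": 100.0, "b": 40.0, "c": 80.0, "d": 10.0}
deadline = {"a": 15.0, "b": 60.0, "c": 40.0, "d": 100.0}
fitness = sequential_fitness(runtime, value, deadline)
best = evolve(list(runtime), fitness)
print(best, fitness(best))
```

Because both parents are permutations of the same job set, the zipper merge always yields a complete permutation, so no repair step is needed after reproduction.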
16. Results
- Schedulers compared
- CONS: Conservative Backfilling
- EASY: Aggressive Backfilling
- PRIO: Priority FIFO (typical supercomputer priority scheduler)
- GA: genetic algorithm
- Workload is SDSC-BLUE from the Parallel Workloads Archive (Dror Feitelson)
- Load modified by scaling inter-arrival times
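
Scaling inter-arrival times can be illustrated with a short sketch: each gap between consecutive submissions is multiplied by a factor, so a factor below 1 compresses arrivals and raises the load. The function name `scale_interarrival` and the sample numbers are assumptions; the scaling factors used in the experiments are not given on the slide.

```python
def scale_interarrival(submit_times, factor):
    """Rescale gaps between consecutive submissions by `factor`."""
    if not submit_times:
        return []
    scaled = [submit_times[0]]  # first job keeps its submit time
    for previous, current in zip(submit_times, submit_times[1:]):
        scaled.append(scaled[-1] + (current - previous) * factor)
    return scaled


# Halving every gap makes the same jobs arrive in half the time span.
print(scale_interarrival([0, 10, 30], 0.5))
```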
17. Accurate and Inaccurate Runtimes
[Figures: results under Normal Load and Heavy Load]
Many, many more results in the paper...
18. Current and Future Work
- Eliciting the Utility Function
- What would this look like in a production environment?
- Interview users to better see how they think about the utility function
- Quantifying the benefit
- What is the additional benefit of providing additional utility function data points?
- Who benefits? Everyone? Do users who provide more data points than their peers benefit individually?
19. For more information
- Inaccurate runtime requests survey
- Lee, C., Y. Schwartzman, J. Hardy, A. Snavely. "Are user runtime estimates inherently inaccurate?" Workshop on Job Scheduling Strategies for Parallel Processing, with SIGMETRICS, June 2004.
- Survey collecting SDSC users' utility curves
- Lee, C. and A. Snavely. "On the User-Scheduler Dialogue: Studies of User-Provided Runtime Estimates and Utility Functions." International Journal of High Performance Computing Applications, vol. 20, 2006.
- Genetic algorithm scheduler and model for generating synthetic utility curves
- Lee, C. and A. Snavely. "Precise and Realistic Utility Functions for User-Centric Performance Analysis of Schedulers." HPDC-16, June 2007.
- Contact: Cynthia Lee, CL_at_SDSC.EDU