Title: Scheduling Mixed Parallel Applications with Reservations
1. Scheduling Mixed Parallel Applications with Reservations
- Henri Casanova
- Information and Computer Science Dept.
- University of Hawaii at Manoa
- henric_at_hawaii.edu
2. Mixed Parallelism
- Both task- and data-parallelism
- Malleable tasks with precedence constraints
. . .
3. Mixed Parallelism
- Mixed parallelism arises in many applications, many of them scientific workflows
  - Example: image processing applications that apply a graph of data-parallel filters (e.g., Hastings et al., 2003)
- Many workflow toolkits support mixed-parallel applications (e.g., Stef-Praun et al., 2007; Kanazawa, 2005; Hunold et al., 2003)
4. Mixed-Parallel Scheduling
- Mixed-parallel scheduling has been studied by several researchers
  - NP-hard, with guaranteed algorithms (Lepère et al., 2001; Jansen et al., 2006)
- Several heuristics have been proposed in the literature
  - One-step algorithms (Boudet et al., 2003; Vydyanathan et al., 2006)
    - Task allocation and task mapping decisions happen concurrently
  - Two-step algorithms (Radulescu et al., 2001; Bandala et al., 2006; Rauber et al., 1998; Suter et al., 2007)
    - First, compute task allocations
    - Second, map tasks to processors using some standard list-scheduling approach
5. The Allocation Problem
- We can give each task very few (one?) processors
  - We then have tasks that run for a long time
  - But we can run a lot of them in parallel
- We can give each task many (all?) processors
  - We then have tasks that run quickly, but typically with diminishing returns due to < 1 parallel efficiencies
  - But we can't run many tasks in parallel
- Trade-off: parallelism vs. task execution times
- Question: How do we achieve a good trade-off?
6. Critical Path and Work
[Figure: Gantt chart (processors vs. time); total work = sum of the rectangle surfaces; critical path length = execution time of the longest path in the DAG]
- Two constraints:
  - Makespan × #procs ≥ total work
  - Makespan ≥ critical path length
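Equivalently, the two constraints combine into the lower bound used on the next slide (W = total work, p = number of processors, CP = critical path length):

```latex
% W = total work, p = number of processors, CP = critical path length
\text{makespan} \;\ge\; \max\!\left(\frac{W}{p},\ \mathrm{CP}\right)
```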
7Work vs. CP Trade-off
best lower bound on makespan
total work / procs
critical path
large
small
task allocations
8. The CPA 2-Step Algorithm
- Original algorithm (Radulescu et al., 2001), sketched below
  - For a homogeneous platform
  - Start by allocating 1 processor to all tasks
  - Then pick a task and increase its allocation by 1 processor
    - Picking the task that benefits the most from one extra processor, in terms of execution time
  - Repeat until the critical path length and the total work / #procs become approximately equal
- Improved algorithm (Suter et al., 2007)
  - Uses an empirically better stopping criterion
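A minimal sketch of that allocation loop (not the authors' code; `succ` encodes the DAG, and `exec_time(t, p)` is an assumed execution-time model):

```python
def cpa_allocate(succ, exec_time, num_procs):
    """Sketch of CPA's allocation phase (after Radulescu et al., 2001).

    succ: dict mapping each task to the list of its successors (the DAG).
    exec_time(t, p): execution time of task t on p processors.
    Returns a dict mapping each task to its processor allocation.
    """
    tasks = list(succ)
    alloc = {t: 1 for t in tasks}  # start by allocating 1 processor to all tasks

    def critical_path():
        # longest path in the DAG under the current allocations
        memo = {}
        def bl(t):
            if t not in memo:
                memo[t] = exec_time(t, alloc[t]) + max(
                    (bl(c) for c in succ[t]), default=0.0)
            return memo[t]
        return max(bl(t) for t in tasks)

    def avg_work():
        # total work (sum of rectangle surfaces) spread over all processors
        return sum(alloc[t] * exec_time(t, alloc[t]) for t in tasks) / num_procs

    # grow allocations until CP and total work / #procs are (roughly) equal
    while critical_path() > avg_work():
        growable = [t for t in tasks if alloc[t] < num_procs]
        if not growable:
            break
        # the task whose execution time shrinks most with one extra processor
        best = max(growable,
                   key=lambda t: exec_time(t, alloc[t]) - exec_time(t, alloc[t] + 1))
        alloc[best] += 1
    return alloc
```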
9. Presentation Outline
- Mixed-Parallel Scheduling
- The Scheduling Problem with Reservations
- Models and Assumptions
- Algorithms for Minimizing Makespan
- Algorithms for Meeting a Deadline
- Conclusion
10. Batch Scheduling and Reservations
- Platforms are shared by users, today typically via batch schedulers
- Batch schedulers have known drawbacks
  - non-deterministic queue waiting times
- In many scenarios, one needs guarantees regarding application completion times
- As a result, most batch schedulers today support advance reservations
  - One can acquire reservations for some number of processors and for some period of time
11. Reservations
- We have to schedule around the holes in the reservation schedule
[Figure: reservation schedule (processors vs. time), with reserved blocks and the holes between them]
12. Reservations
- One reservation per task
[Figure: the same schedule (processors vs. time), with one reservation acquired per application task]
13. Complexity
- The makespan minimization problem is NP-hard at several levels (and thus so is the problem of meeting a deadline)
  - Mixed-parallel scheduling is NP-hard
    - Guaranteed algorithms: Lepère et al., 2001; Jansen et al., 2006
  - Scheduling independent tasks with reservations is NP-hard and inapproximable in general (Eyraud-Dubois et al., 2007)
    - Guaranteed algorithms exist with restrictions
- Guaranteed algorithms for mixed-parallel scheduling with reservations are an open problem
- In this work we focus on developing heuristics
14. Presentation Outline
- Mixed-Parallel Scheduling
- The Scheduling Problem with Reservations
- Models and Assumptions
- Algorithms for Minimizing Makespan
- Algorithms for Meeting a Deadline
- Conclusion
15. Models and Assumptions
- Application
  - We assume that the application is fully specified and static
    - Conservative reservations can be used to be safe
  - Random DAGs are generated using the method in Suter et al., 2007
  - Data-parallelism is modeled based on Amdahl's law (sketched below)
- Platform
  - We assume that the reservation schedule does not change while we compute the schedule
  - We assume that we know the reservation schedule
    - Sometimes not enabled by cluster administrators
  - We ignore communication between tasks
    - Since a parent task may complete well before one of its children can start, data must be written to disk anyway
    - Can be modeled via task execution time and/or the Amdahl's law parameter
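For illustration, a sketch of what an Amdahl's-law task model looks like; `t_seq` and `alpha` (the non-parallelizable fraction) are per-task parameters, and the names are hypothetical:

```python
def amdahl_time(t_seq, alpha, p):
    """Execution time of a task on p processors under Amdahl's law.

    t_seq: time on 1 processor; alpha: non-parallelizable fraction (0..1).
    Parallel efficiency is < 1 whenever alpha > 0, which creates the
    diminishing returns mentioned earlier.
    """
    return t_seq * (alpha + (1.0 - alpha) / p)

# e.g., a 100s task that is 10% sequential: 19s on 10 procs, not 10s
print(amdahl_time(100.0, 0.1, 10))  # -> 19.0
```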
16. Minimizing Makespan
- Natural approach: adapt the CPA algorithm
  - It's a simple algorithm
  - First phase: compute allocations
  - Second phase: list-scheduling
- Problem
  - Allocations are computed without considering reservations
  - Considering reservations would involve considering time, which is only done in the second phase
- Greedy approach (sketched below)
  - Sort the tasks by decreasing bottom-level
  - For each task in this order, determine the best feasible processor allocation
    - i.e., the one that has the earliest completion time
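A sketch of this greedy step, assuming helper functions `bottom_level` and `earliest_completion` (the latter would scan the holes of the reservation schedule; both names are hypothetical):

```python
def greedy_schedule(tasks, num_procs, bottom_level, earliest_completion):
    """Sketch of the greedy approach with reservations.

    bottom_level(t): precomputed bottom level of task t (next slide).
    earliest_completion(t, p): the earliest-finishing placement of t on
    p processors that fits a hole in the reservation schedule and the
    precedence constraints; returns (finish_time, placement) or None.
    """
    placements = {}
    for t in sorted(tasks, key=bottom_level, reverse=True):
        # try every feasible allocation size; keep the earliest completion
        candidates = [earliest_completion(t, p) for p in range(1, num_procs + 1)]
        feasible = [c for c in candidates if c is not None]
        if not feasible:
            raise ValueError(f"no feasible placement for task {t}")
        finish, placement = min(feasible, key=lambda c: c[0])
        placements[t] = placement
    return placements
```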
17Example
B
C
A
possible task configurations
D
processors
B
time
18. Computing Bottom-Levels
- Problem
  - Computing bottom levels (BLs) requires that we know task execution times
  - Task execution times depend on allocations
  - But we compute the allocations after using the bottom levels
- We compare four ways to compute BLs:
  - use 1-processor allocations
  - use all-processor allocations
  - use CPA-computed allocations, using all processors
  - use CPA-computed allocations, using the historical average number of non-reserved processors
- We find that the 4th method is marginally better
  - wins in 78.4% of our simulations (more details on simulations later)
- All results hereafter use this method for computing BLs (the computation itself is sketched below)
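The BL computation, sketched for a fixed trial allocation (one of the four options above) that determines each task's execution time:

```python
def bottom_levels(succ, exec_time):
    """Bottom level of each task: its own execution time plus the longest
    chain of execution times below it in the DAG (communication between
    tasks is ignored, as per our assumptions).

    succ: dict task -> list of successors
    exec_time: dict task -> execution time under the chosen trial allocation
    """
    bl = {}
    def visit(t):
        if t not in bl:
            bl[t] = exec_time[t] + max((visit(c) for c in succ[t]), default=0.0)
        return bl[t]
    for t in succ:
        visit(t)
    return bl
```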
19. Bounding Allocations
- A known problem with such a greedy approach is that allocations are too large
  - the reduction in parallelism ends up being detrimental to makespan
- Let's try to bound allocations
- Three methods (see the sketch below):
  - BD_HALF: bound to half of the processors
  - BD_CPA: bound by the allocations in the CPA schedule computed using all processors
  - BD_CPAR: bound by the allocations in the CPA schedule computed using the historical average number of non-reserved processors
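A minimal sketch of how these bounds plug into the greedy step, which then only explores allocation sizes up to the bound:

```python
def bd_half(num_procs):
    """BD_HALF: cap every task's allocation at half of the processors."""
    return lambda t: max(1, num_procs // 2)

def bd_cpa(cpa_alloc):
    """BD_CPA / BD_CPAR: cap each task by its allocation in a reference
    CPA schedule (computed with all processors, or with the historical
    average number of non-reserved processors, respectively)."""
    return lambda t: cpa_alloc[t]

def bounded_sizes(t, num_procs, bound):
    """Allocation sizes the greedy step may consider for task t."""
    return range(1, min(num_procs, bound(t)) + 1)
```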
20. Reservation Schedule Model?
- We conduct our experiments in simulation
  - cheap, repeatable, controllable
- We need to simulate environments for given reservation schedules
- Question: what does a typical reservation schedule look like?
- Answer: we don't really know yet
  - There is no reservation schedule archive
- Let's look at what people have done in the past...
21. Synthetic Reservation Schedules
- We have schedules of batch jobs
  - e.g., the Parallel Workloads Archive, by D. Feitelson
- Typical approach, e.g., in Smith et al., 2000:
  - Take a batch job schedule
  - Mark some jobs as reserved
  - Remove all other jobs
- Problem: the amount of reservation is then approximately constant over time, while in the real world we expect it to be approximately decreasing
  - And we do see it behave this way in a real-world 2.5-year trace from the Grid5K platform
- We should generate reservation schedules in which the amount of reservation decreases with time
22. Synthetic Reservation Schedules
- Three methods to drop reservations after the simulated application start time:
  - Linearly or exponentially (the exponential variant is sketched below)
    - so that there are no reservations after 7 days
  - Based on job submission time
- Preliminary evaluations indicate that the exponential method leads to schedules that are more correlated with the Grid5K data
  - For 4 logs from the Parallel Workloads Archive
  - But this is not conclusive because we have only one (good) data set at this point
- We run simulations with the 4 logs, the 3 methods above, and with the Grid5K data
- Bottom line: for this work, we do not observe discrepancies in our results, for our purpose, regarding any of the above
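A sketch of one plausible reading of the exponential method: each job from the log that was marked as reserved is kept with a probability that decays exponentially with its start offset, reaching (near) zero at the 7-day mark. The 1% floor used to set the decay rate here is an assumption for illustration:

```python
import math
import random

SEVEN_DAYS = 7 * 24 * 3600.0  # seconds

def keep_reservation(start_offset, rng=random):
    """Decide whether a candidate reservation is kept, given its start
    time in seconds after the simulated application start.
    The keep probability decays exponentially, so the amount of
    reservation decreases with time and vanishes beyond 7 days.
    """
    if start_offset >= SEVEN_DAYS:
        return False
    rate = math.log(100.0) / SEVEN_DAYS  # assumption: p(7 days) = 1%
    return rng.random() < math.exp(-rate * start_offset)
```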
23. Simulation Procedure
- We use 40 application specifications
  - DAG size, width, regularity, etc.
  - 20 samples each
- We use 36 reservation schedule specifications
  - batch log, generation method, etc.
  - 50 samples each
- Total: (40 x 36) x (20 x 50) = 1,440 x 1,000 = 1,440,000 experiments
- Two metrics
  - Makespan
  - CPU-hours consumed
24Simulation Results
Algorithm Makespan Makespan CPU-hours CPU-hours
Algorithm avg. deg. from best of wins avg. deg. from best of wins
BD_ALL 33.75 36 42.48 0
BD_HALF 28.38 3 37.83 1
BD_CPA 0.29 1,026 0.75 6
BD_CPAR 0.21 386 0.00 1,434
- Similar results for Grid5K reservation schedules
25. Presentation Outline
- Mixed-Parallel Scheduling
- The Scheduling Problem with Reservations
- Models and Assumptions
- Algorithms for Minimizing Makespan
- Algorithms for Meeting a Deadline
- Conclusion
26. Meeting a Deadline
- A simple approach for meeting a deadline is to simply schedule backwards from the deadline
  - Picking tasks by increasing bottom-levels
- The way to be as safe as possible is to find, for each task, the feasible allocation that starts as late as possible, given that:
  - The exit task must complete before the deadline
  - The task must complete before all of its children begin
- Let's see this on a simple example (the backward pass is sketched below)
27-39. Meeting a Deadline Example
[Figure sequence: Task 1 and Task 2 each have possible configurations A-E; scheduling backwards from the deadline, each configuration of Task 2 is tried in turn and the latest-starting feasible one is kept, then the same is done for Task 1 (processors vs. time)]
40. Algorithms
- We can employ the same techniques for bounding allocations as for the makespan minimization algorithms
  - BD_ALL, BD_HALF, BD_CPA, BD_CPAR
- Problem: the algorithms do not consider the tightness of the deadline
  - If the deadline is loose, the above algorithms will consume an unnecessarily high number of CPU-hours
  - For a very loose deadline there should be no data-parallelism, and thus no parallel efficiency loss due to Amdahl's law
- Question: How can we reason about deadline tightness?
41. Deadline Tightness
- For each task we have a choice of allocations
  - Ones that use too many processors may be wasteful
  - Ones that use too few processors may be dangerous
- Idea
  - Consider the CPA-computed schedule assuming an empty reservation schedule
    - Using all processors, or the historical average number of non-reserved processors
  - Determine when the task would start in that schedule, i.e., at which fraction of the overall makespan
  - Pick the allocation that allows the task to start at the same fraction of the time interval between now and the deadline
42-44. Matching the CPA schedule
[Figure: the reference CPA schedule on q processors, with interval a before the task's start and interval b from its start to the end of the schedule; the schedule with reservations on p processors, with interval c from now to the task's start and interval d from its start to the task's deadline]
- Pick the cheapest allocation such that b / (a + b) > d / (c + d) (sketched below)
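A sketch of this rule, with `position(p)` standing in for the (hypothetical) machinery that computes the intervals c and d for an allocation of p processors:

```python
def rc_pick(sizes, a, b, position):
    """Sketch of the RC allocation rule from the figure above.

    a, b: time before / after the task's start in the reference CPA
          schedule (the task starts at fraction a / (a + b)).
    position(p): (c, d) = time before / after the task's latest feasible
          start in the reservation-aware schedule, measured over the
          interval from now to the task's deadline, for p processors.
    sizes: candidate allocation sizes; tried cheapest (smallest) first.
    """
    for p in sorted(sizes):
        c, d = position(p)
        # the task starts at (or past) the CPA fraction of [now, deadline]
        if b / (a + b) > d / (c + d):
            return p
    return None  # no allocation lets the task start late enough
```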
45. Simulation Experiments
- We call this new approach resource-conservative (RC)
- We conduct simulations similar to those for the makespan minimization algorithms
- Issue: the RC approach can be in trouble when it tries to schedule the first tasks
  - if the reservation schedule is non-stationary and/or tight
  - could be addressed via some tunable parameter (e.g., pick an allocation that starts at least x after the scaled CPA start time)
  - We do not use such a parameter in our results
- We use two metrics
  - Tightest deadline achieved
    - Necessary because deadline tightness depends on the instance
    - Determined via binary search (sketched below)
  - CPU-hours consumption for a deadline that's 50% later than the tightest deadline
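A sketch of the tightest-deadline binary search, assuming a predicate `feasible(d)` that runs the deadline algorithm and reports whether a schedule meeting deadline d was found; the stopping tolerance is an assumption:

```python
def tightest_deadline(feasible, lo, hi, eps=1.0):
    """Binary search for the tightest deadline an algorithm can meet.

    feasible(d): whether the algorithm finds a valid schedule meeting
    deadline d (assumed monotone in d).
    lo: a deadline known to be infeasible (e.g., the lower bound on
    makespan); hi: a deadline known to be feasible.
    """
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if feasible(mid):
            hi = mid  # a tighter deadline may still be achievable
        else:
            lo = mid
    return hi
```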
46Simulation Results
Algorithm Tightest deadline (average degradation from best) Tightest deadline (average degradation from best) Tightest deadline (average degradation from best) Tightest deadline (average degradation from best) CPU-hours consumed for a loose deadline CPU-hours consumed for a loose deadline CPU-hours consumed for a loose deadline CPU-hours consumed for a loose deadline
Algorithm Reservation schedule Reservation schedule Reservation schedule Reservation schedule Reservation schedule Reservation schedule Reservation schedule Reservation schedule
Algorithm sparse medium tight Grid5K sparse medium tight Grid5K
BD_ALL 178 175 188 227 3556 3486 3768 2006
BD_CPAR 6.52 6.44 6.91 8.38 231 236 243 179
RC_CPA 13.17 13.27 17.36 19.51 6.39 6.80 7.98 2.15
RC_CPAR 4.12 4.27 8.26 15.14 0.16 0.15 0.16 0.09
47. Conclusions
- Makespan minimization
  - Bounding task allocations based on the CPA schedule works well
- Meeting a deadline
  - Using the CPA schedule to determine task start times works well, at least when the reservation schedule isn't too tight
  - Some tuning parameter may help for tight schedules
  - Or, one can use the same approach as for makespan minimization, but backwards
- In both cases, using the historical number of unreserved processors leads to marginal improvements
48. Possible Future Directions
- Use a recent one-step algorithm instead of CPA
  - iCASLB (Vydyanathan, 2006)
- Experiments in a real-world setting
- What kind of interface should a batch scheduler expose if the full reservation schedule must remain hidden?
- A reservation schedule archive
  - Needs to be a community effort
49.
- "Scheduling Mixed-Parallel Applications with Advance Reservations", Kento Aida and Henri Casanova, to appear in Proc. of HPDC 2008
- Questions?