Title: Scheduling Mixed Parallel Applications with Reservations
1. Scheduling Mixed Parallel Applications with Reservations
- Henri Casanova
- Information and Computer Science Dept.
- University of Hawaii at Manoa
- henric_at_hawaii.edu
2. Mixed Parallelism
- Both task- and data-parallelism
- Malleable tasks with precedence constraints
. . .
3. Mixed Parallelism
- Mixed parallelism arises in many applications, many of them scientific workflows
  - Example: image processing applications that apply a graph of data-parallel filters (e.g., Hastings et al., 2003)
- Many workflow toolkits support mixed-parallel applications (e.g., Stef-Praun et al., 2007; Kanazawa, 2005; Hunold et al., 2003)
4. Mixed-Parallel Scheduling
- Mixed-parallel scheduling has been studied by several researchers
  - NP-hard, with guaranteed algorithms (Lepère et al., 2001; Jansen et al., 2006)
- Several heuristics have been proposed in the literature
  - One-step algorithms (Boudet et al., 2003; Vydyanathan et al., 2006)
    - Task allocation and task mapping decisions happen concurrently
  - Two-step algorithms (Radulescu et al., 2001; Bandala et al., 2006; Rauber et al., 1998; Suter et al., 2007)
    - First, compute task allocations
    - Second, map tasks to processors using some standard list-scheduling approach
5. The Allocation Problem
- We can give each task very few (one?) processors
  - We then have tasks that run for a long time
  - But we can run a lot of them in parallel
- We can give each task many (all?) processors
  - We then have tasks that run quickly, but typically with diminishing returns due to < 1 parallel efficiencies
  - But we can't run many tasks in parallel
- Trade-off: parallelism vs. task execution times
- Question: How do we achieve a good trade-off?
6. Critical Path and Work
[Figure: Gantt chart (processors vs. time); total work = sum of the rectangle surfaces; critical path length = execution time of the longest path in the DAG]
- Two constraints:
  - Makespan × #procs ≥ total work
  - Makespan ≥ critical path length
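Equivalently, the two constraints combine into the lower bound used on the next slide (W = total work, p = number of processors, CP = critical path length):

```latex
% W = total work, p = number of processors, CP = critical path length
\text{makespan} \;\ge\; \max\!\left(\frac{W}{p},\ \mathrm{CP}\right)
```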
7Work vs. CP Trade-off
best lower bound on makespan
total work / procs
critical path
large
small
task allocations
8. The CPA 2-Step Algorithm
- Original algorithm (Radulescu et al., 2001), sketched below
  - For a homogeneous platform
  - Start by allocating 1 processor to all tasks
  - Then pick a task and increase its allocation by 1 processor
    - Picking the task that benefits the most from one extra processor, in terms of execution time
  - Repeat until the critical path length and the total work / #procs become approximately equal
- Improved algorithm (Suter et al., 2007)
  - Uses an empirically better stopping criterion
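A minimal sketch of that allocation loop (not the authors' code; `succ` encodes the DAG, and `exec_time(t, p)` is an assumed execution-time model):

```python
def cpa_allocate(succ, exec_time, num_procs):
    """Sketch of CPA's allocation phase (after Radulescu et al., 2001).

    succ: dict mapping each task to the list of its successors (the DAG).
    exec_time(t, p): execution time of task t on p processors.
    Returns a dict mapping each task to its processor allocation.
    """
    tasks = list(succ)
    alloc = {t: 1 for t in tasks}  # start by allocating 1 processor to all tasks

    def critical_path():
        # longest path in the DAG under the current allocations
        memo = {}
        def bl(t):
            if t not in memo:
                memo[t] = exec_time(t, alloc[t]) + max(
                    (bl(c) for c in succ[t]), default=0.0)
            return memo[t]
        return max(bl(t) for t in tasks)

    def avg_work():
        # total work (sum of rectangle surfaces) spread over all processors
        return sum(alloc[t] * exec_time(t, alloc[t]) for t in tasks) / num_procs

    # grow allocations until CP and total work / #procs are (roughly) equal
    while critical_path() > avg_work():
        growable = [t for t in tasks if alloc[t] < num_procs]
        if not growable:
            break
        # the task whose execution time shrinks most with one extra processor
        best = max(growable,
                   key=lambda t: exec_time(t, alloc[t]) - exec_time(t, alloc[t] + 1))
        alloc[best] += 1
    return alloc
```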
9. Presentation Outline
- Mixed-Parallel Scheduling
- The Scheduling Problem with Reservations
- Models and Assumptions
- Algorithms for Minimizing Makespan
- Algorithms for Meeting a Deadline
- Conclusion
10. Batch Scheduling and Reservations
- Platforms are shared by users, today typically via batch schedulers
- Batch schedulers have known drawbacks
  - non-deterministic queue waiting times
- In many scenarios, one needs guarantees regarding application completion times
- As a result, most batch schedulers today support advance reservations
  - One can acquire reservations for some number of processors and for some period of time
11. Reservations
- We have to schedule around the holes in the reservation schedule
[Figure: reservation schedule (processors vs. time), with reserved blocks and the holes between them]
12. Reservations
- One reservation per task
[Figure: the same schedule (processors vs. time), with one reservation acquired per application task]
13. Complexity
- The makespan minimization problem is NP-hard at several levels (and thus so is the problem of meeting a deadline)
  - Mixed-parallel scheduling is NP-hard
    - Guaranteed algorithms: Lepère et al., 2001; Jansen et al., 2006
  - Scheduling independent tasks with reservations is NP-hard and inapproximable in general (Eyraud-Dubois et al., 2007)
    - Guaranteed algorithms exist with restrictions
- Guaranteed algorithms for mixed-parallel scheduling with reservations are an open problem
- In this work we focus on developing heuristics
14. Presentation Outline
- Mixed-Parallel Scheduling
- The Scheduling Problem with Reservations
- Models and Assumptions
- Algorithms for Minimizing Makespan
- Algorithms for Meeting a Deadline
- Conclusion
15. Models and Assumptions
- Application
  - We assume that the application is fully specified and static
    - Conservative reservations can be used to be safe
  - Random DAGs are generated using the method in Suter et al., 2007
  - Data-parallelism is modeled based on Amdahl's law (sketched below)
- Platform
  - We assume that the reservation schedule does not change while we compute the schedule
  - We assume that we know the reservation schedule
    - Sometimes not enabled by cluster administrators
  - We ignore communication between tasks
    - Since a parent task may complete well before one of its children can start, data must be written to disk anyway
    - Can be modeled via task execution time and/or the Amdahl's law parameter
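For illustration, a sketch of what an Amdahl's-law task model looks like; `t_seq` and `alpha` (the non-parallelizable fraction) are per-task parameters, and the names are hypothetical:

```python
def amdahl_time(t_seq, alpha, p):
    """Execution time of a task on p processors under Amdahl's law.

    t_seq: time on 1 processor; alpha: non-parallelizable fraction (0..1).
    Parallel efficiency is < 1 whenever alpha > 0, which creates the
    diminishing returns mentioned earlier.
    """
    return t_seq * (alpha + (1.0 - alpha) / p)

# e.g., a 100s task that is 10% sequential: 19s on 10 procs, not 10s
print(amdahl_time(100.0, 0.1, 10))  # -> 19.0
```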
16. Minimizing Makespan
- Natural approach: adapt the CPA algorithm
  - It's a simple algorithm
  - First phase: compute allocations
  - Second phase: list-scheduling
- Problem
  - Allocations are computed without considering reservations
  - Considering reservations would involve considering time, which is only done in the second phase
- Greedy approach (sketched below)
  - Sort the tasks by decreasing bottom-level
  - For each task in this order, determine the best feasible processor allocation
    - i.e., the one that has the earliest completion time
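A sketch of this greedy step, assuming helper functions `bottom_level` and `earliest_completion` (the latter would scan the holes of the reservation schedule; both names are hypothetical):

```python
def greedy_schedule(tasks, num_procs, bottom_level, earliest_completion):
    """Sketch of the greedy approach with reservations.

    bottom_level(t): precomputed bottom level of task t (next slide).
    earliest_completion(t, p): the earliest-finishing placement of t on
    p processors that fits a hole in the reservation schedule and the
    precedence constraints; returns (finish_time, placement) or None.
    """
    placements = {}
    for t in sorted(tasks, key=bottom_level, reverse=True):
        # try every feasible allocation size; keep the earliest completion
        candidates = [earliest_completion(t, p) for p in range(1, num_procs + 1)]
        feasible = [c for c in candidates if c is not None]
        if not feasible:
            raise ValueError(f"no feasible placement for task {t}")
        finish, placement = min(feasible, key=lambda c: c[0])
        placements[t] = placement
    return placements
```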
17Example
B
C
A
possible task configurations
D
processors
B
time
18. Computing Bottom-Levels
- Problem
  - Computing bottom levels (BLs) requires that we know task execution times
  - Task execution times depend on allocations
  - But we compute the allocations after using the bottom levels
- We compare four ways to compute BLs:
  - use 1-processor allocations
  - use all-processor allocations
  - use CPA-computed allocations, using all processors
  - use CPA-computed allocations, using the historical average number of non-reserved processors
- We find that the 4th method is marginally better
  - wins in 78.4% of our simulations (more details on simulations later)
- All results hereafter use this method for computing BLs (the computation itself is sketched below)
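The BL computation, sketched for a fixed trial allocation (one of the four options above) that determines each task's execution time:

```python
def bottom_levels(succ, exec_time):
    """Bottom level of each task: its own execution time plus the longest
    chain of execution times below it in the DAG (communication between
    tasks is ignored, as per our assumptions).

    succ: dict task -> list of successors
    exec_time: dict task -> execution time under the chosen trial allocation
    """
    bl = {}
    def visit(t):
        if t not in bl:
            bl[t] = exec_time[t] + max((visit(c) for c in succ[t]), default=0.0)
        return bl[t]
    for t in succ:
        visit(t)
    return bl
```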
19. Bounding Allocations
- A known problem with such a greedy approach is that allocations are too large
  - the reduction in parallelism ends up being detrimental to makespan
- Let's try to bound allocations
- Three methods (see the sketch below):
  - BD_HALF: bound to half of the processors
  - BD_CPA: bound by the allocations in the CPA schedule computed using all processors
  - BD_CPAR: bound by the allocations in the CPA schedule computed using the historical average number of non-reserved processors
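A minimal sketch of how these bounds plug into the greedy step, which then only explores allocation sizes up to the bound:

```python
def bd_half(num_procs):
    """BD_HALF: cap every task's allocation at half of the processors."""
    return lambda t: max(1, num_procs // 2)

def bd_cpa(cpa_alloc):
    """BD_CPA / BD_CPAR: cap each task by its allocation in a reference
    CPA schedule (computed with all processors, or with the historical
    average number of non-reserved processors, respectively)."""
    return lambda t: cpa_alloc[t]

def bounded_sizes(t, num_procs, bound):
    """Allocation sizes the greedy step may consider for task t."""
    return range(1, min(num_procs, bound(t)) + 1)
```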
20. Reservation Schedule Model?
- We conduct our experiments in simulation
  - cheap, repeatable, controllable
- We need to simulate environments for given reservation schedules
- Question: what does a typical reservation schedule look like?
- Answer: we don't really know yet
  - There is no reservation schedule archive
- Let's look at what people have done in the past...
21. Synthetic Reservation Schedules
- We have schedules of batch jobs
  - e.g., the Parallel Workloads Archive, by D. Feitelson
- Typical approach, e.g., in Smith et al., 2000:
  - Take a batch job schedule
  - Mark some jobs as reserved
  - Remove all other jobs
- Problem: the amount of reservation is then approximately constant over time, while in the real world we expect it to be approximately decreasing
  - And we do see it behave this way in a real-world 2.5-year trace from the Grid5K platform
- We should generate reservation schedules in which the amount of reservation decreases with time
22. Synthetic Reservation Schedules
- Three methods to drop reservations after the simulated application start time:
  - Linearly or exponentially (the exponential variant is sketched below)
    - so that there are no reservations after 7 days
  - Based on job submission time
- Preliminary evaluations indicate that the exponential method leads to schedules that are more correlated with the Grid5K data
  - For 4 logs from the Parallel Workloads Archive
  - But this is not conclusive because we have only one (good) data set at this point
- We run simulations with the 4 logs, the 3 methods above, and with the Grid5K data
- Bottom line: for this work, we do not observe discrepancies in our results, for our purpose, regarding any of the above
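A sketch of one plausible reading of the exponential method: each job from the log that was marked as reserved is kept with a probability that decays exponentially with its start offset, reaching (near) zero at the 7-day mark. The 1% floor used to set the decay rate here is an assumption for illustration:

```python
import math
import random

SEVEN_DAYS = 7 * 24 * 3600.0  # seconds

def keep_reservation(start_offset, rng=random):
    """Decide whether a candidate reservation is kept, given its start
    time in seconds after the simulated application start.
    The keep probability decays exponentially, so the amount of
    reservation decreases with time and vanishes beyond 7 days.
    """
    if start_offset >= SEVEN_DAYS:
        return False
    rate = math.log(100.0) / SEVEN_DAYS  # assumption: p(7 days) = 1%
    return rng.random() < math.exp(-rate * start_offset)
```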
23. Simulation Procedure
- We use 40 application specifications
  - DAG size, width, regularity, etc.
  - 20 samples each
- We use 36 reservation schedule specifications
  - batch log, generation method, etc.
  - 50 samples each
- Total: (40 x 36) x (20 x 50) = 1,440 x 1,000 = 1,440,000 experiments
- Two metrics
  - Makespan
  - CPU-hours consumed
24Simulation Results
Algorithm Makespan Makespan CPU-hours CPU-hours
Algorithm avg. deg. from best of wins avg. deg. from best of wins
BD_ALL 33.75 36 42.48 0
BD_HALF 28.38 3 37.83 1
BD_CPA 0.29 1,026 0.75 6
BD_CPAR 0.21 386 0.00 1,434
- Similar results for Grid5K reservation schedules
25. Presentation Outline
- Mixed-Parallel Scheduling
- The Scheduling Problem with Reservations
- Models and Assumptions
- Algorithms for Minimizing Makespan
- Algorithms for Meeting a Deadline
- Conclusion
26. Meeting a Deadline
- A simple approach for meeting a deadline is to simply schedule backwards from the deadline
  - Picking tasks by increasing bottom-levels
- The way to be as safe as possible is to find, for each task, the feasible allocation that starts as late as possible, given that:
  - The exit task must complete before the deadline
  - The task must complete before all of its children begin
- Let's see this on a simple example (the backward pass is sketched below)
27-39. Meeting a Deadline Example
[Figure sequence: Task 1 and Task 2 each have possible configurations A-E; scheduling backwards from the deadline, each configuration of Task 2 is tried in turn and the latest-starting feasible one is kept, then the same is done for Task 1 (processors vs. time)]
40. Algorithms
- We can employ the same techniques for bounding allocations as for the makespan minimization algorithms
  - BD_ALL, BD_HALF, BD_CPA, BD_CPAR
- Problem: the algorithms do not consider the tightness of the deadline
  - If the deadline is loose, the above algorithms will consume an unnecessarily high number of CPU-hours
  - For a very loose deadline there should be no data-parallelism, and thus no parallel efficiency loss due to Amdahl's law
- Question: How can we reason about deadline tightness?
41. Deadline Tightness
- For each task we have a choice of allocations
  - Ones that use too many processors may be wasteful
  - Ones that use too few processors may be dangerous
- Idea
  - Consider the CPA-computed schedule assuming an empty reservation schedule
    - Using all processors, or the historical average number of non-reserved processors
  - Determine when the task would start in that schedule, i.e., at which fraction of the overall makespan
  - Pick the allocation that allows the task to start at the same fraction of the time interval between now and the deadline
42-44. Matching the CPA schedule
[Figure: the reference CPA schedule on q processors, with interval a before the task's start and interval b from its start to the end of the schedule; the schedule with reservations on p processors, with interval c from now to the task's start and interval d from its start to the task's deadline]
- Pick the cheapest allocation such that b / (a + b) > d / (c + d) (sketched below)
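A sketch of this rule, with `position(p)` standing in for the (hypothetical) machinery that computes the intervals c and d for an allocation of p processors:

```python
def rc_pick(sizes, a, b, position):
    """Sketch of the RC allocation rule from the figure above.

    a, b: time before / after the task's start in the reference CPA
          schedule (the task starts at fraction a / (a + b)).
    position(p): (c, d) = time before / after the task's latest feasible
          start in the reservation-aware schedule, measured over the
          interval from now to the task's deadline, for p processors.
    sizes: candidate allocation sizes; tried cheapest (smallest) first.
    """
    for p in sorted(sizes):
        c, d = position(p)
        # the task starts at (or past) the CPA fraction of [now, deadline]
        if b / (a + b) > d / (c + d):
            return p
    return None  # no allocation lets the task start late enough
```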
45. Simulation Experiments
- We call this new approach resource-conservative (RC)
- We conduct simulations similar to those for the makespan minimization algorithms
- Issue: the RC approach can be in trouble when it tries to schedule the first tasks
  - if the reservation schedule is non-stationary and/or tight
  - could be addressed via some tunable parameter (e.g., pick an allocation that starts at least x after the scaled CPA start time)
  - We do not use such a parameter in our results
- We use two metrics
  - Tightest deadline achieved
    - Necessary because deadline tightness depends on the instance
    - Determined via binary search (sketched below)
  - CPU-hours consumption for a deadline that's 50% later than the tightest deadline
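A sketch of the tightest-deadline binary search, assuming a predicate `feasible(d)` that runs the deadline algorithm and reports whether a schedule meeting deadline d was found; the stopping tolerance is an assumption:

```python
def tightest_deadline(feasible, lo, hi, eps=1.0):
    """Binary search for the tightest deadline an algorithm can meet.

    feasible(d): whether the algorithm finds a valid schedule meeting
    deadline d (assumed monotone in d).
    lo: a deadline known to be infeasible (e.g., the lower bound on
    makespan); hi: a deadline known to be feasible.
    """
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if feasible(mid):
            hi = mid  # a tighter deadline may still be achievable
        else:
            lo = mid
    return hi
```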
46Simulation Results
Algorithm Tightest deadline (average degradation from best) Tightest deadline (average degradation from best) Tightest deadline (average degradation from best) Tightest deadline (average degradation from best) CPU-hours consumed for a loose deadline CPU-hours consumed for a loose deadline CPU-hours consumed for a loose deadline CPU-hours consumed for a loose deadline
Algorithm Reservation schedule Reservation schedule Reservation schedule Reservation schedule Reservation schedule Reservation schedule Reservation schedule Reservation schedule
Algorithm sparse medium tight Grid5K sparse medium tight Grid5K
BD_ALL 178 175 188 227 3556 3486 3768 2006
BD_CPAR 6.52 6.44 6.91 8.38 231 236 243 179
RC_CPA 13.17 13.27 17.36 19.51 6.39 6.80 7.98 2.15
RC_CPAR 4.12 4.27 8.26 15.14 0.16 0.15 0.16 0.09
47. Conclusions
- Makespan minimization
  - Bounding task allocations based on the CPA schedule works well
- Meeting a deadline
  - Using the CPA schedule to determine task start times works well, at least when the reservation schedule isn't too tight
  - Some tuning parameter may help for tight schedules
  - Or, one can use the same approach as for makespan minimization, but backwards
- In both cases, using the historical number of unreserved processors leads to marginal improvements
48. Possible Future Directions
- Use a recent one-step algorithm instead of CPA
  - iCASLB (Vydyanathan, 2006)
- Experiments in a real-world setting
- What kind of interface should a batch scheduler expose if the full reservation schedule must remain hidden?
- A reservation schedule archive
  - Needs to be a community effort
49.
- "Scheduling Mixed-Parallel Applications with Advance Reservations", Kento Aida and Henri Casanova, to appear in Proc. of HPDC 2008
- Questions?