Scheduling Mixed Parallel Applications with Reservations (Presentation Transcript)

1
Scheduling Mixed Parallel Applications with
Reservations
  • Henri Casanova
  • Information and Computer Science Dept.
  • University of Hawaii at Manoa
  • henric@hawaii.edu

2
Mixed Parallelism
  • Both task- and data-parallelism
  • Malleable tasks with precedence constraints

3
Mixed Parallelism
  • Mixed parallelism arises in many applications,
    many of them scientific workflows
  • Example: image processing applications that
    apply a graph of data-parallel filters
  • e.g., [Hastings et al., 2003]
  • Many workflow toolkits support mixed-parallel
    applications
  • e.g., [Stef-Praun et al., 2007], [Kanazawa,
    2005], [Hunold et al., 2003]

4
Mixed-Parallel Scheduling
  • Mixed-parallel scheduling has been studied by
    several researchers
  • NP-hard, with guaranteed algorithms [Lepère et
    al., 2001], [Jansen et al., 2006]
  • Several heuristics have been proposed in the
    literature
  • One-step algorithms [Boudet et al., 2003],
    [Vydyanathan et al., 2006]
  • Task allocation and task mapping decisions
    happen concurrently
  • Two-step algorithms [Radulescu et al., 2001],
    [Bandala et al., 2006], [Rauber et al., 1998],
    [Suter et al., 2007]
  • First, compute task allocations
  • Second, map tasks to processors using some
    standard list-scheduling approach

5
The Allocation Problem
  • We can give each task very few (one?) processors
  • We have tasks that run for a long time
  • But we can do a lot of them in parallel
  • We can give each task many (all?) processors
  • We have tasks that run quickly, but typically
    with diminishing returns due to parallel
    efficiencies below 1
  • But we can't run many tasks in parallel
  • Trade-off: parallelism vs. task execution times
  • Question: How do we achieve a good trade-off?

6
Critical Path and Work
[Figure: a schedule drawn as rectangles in a processors x time chart. Total
work = sum of rectangle surfaces; critical path length = execution time of
the longest path in the DAG.]
  • Two constraints (see the sketch below)
  • Makespan × procs ≥ total work
  • Makespan ≥ critical path length
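As a minimal illustration of how these two constraints combine (the code and
names are ours, not the talk's), the best lower bound on makespan is:

    # Minimal sketch: the two lower bounds on makespan combined.
    # 'alloc' maps each task to (procs, exec_time); 'cp_length' is the length
    # of the longest path in the DAG under the current allocations.
    def makespan_lower_bound(alloc, cp_length, num_procs):
        total_work = sum(p * t for (p, t) in alloc.values())
        return max(total_work / num_procs, cp_length)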

7
Work vs. CP Trade-off
[Figure: as task allocations go from small to large, the critical path
length shrinks while total work / procs grows; the best lower bound on
makespan is the larger of the two.]
8
The CPA 2-Step Algorithm
  • Original Algorithm [Radulescu et al., 2001]
  • For a homogeneous platform
  • Start by allocating 1 processor to all tasks
  • Then pick a task and increase its allocation by 1
    processor
  • Picking the task that benefits the most from one
    extra processor, in terms of execution time
  • Repeat until the critical path length and the
    total work / procs become approximately equal
    (see the sketch below)
  • Improved Algorithm [Suter et al., 2007]
  • Uses an empirically better stopping criterion
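A rough sketch of this allocation loop, in our own code; exec_time(t, p), the
task performance model, is an assumed input (e.g., an Amdahl's-law function),
and the authors' implementation may differ:

    # Sketch of the CPA allocation phase. 'succs' maps each task to its list
    # of successors in the DAG (empty list for exit tasks).
    def cpa_allocate(tasks, succs, exec_time, num_procs):
        alloc = {t: 1 for t in tasks}
        def total_work():
            return sum(alloc[t] * exec_time(t, alloc[t]) for t in tasks)
        def critical_path():
            memo = {}
            def bl(t):  # longest path from t to an exit, under current alloc
                if t not in memo:
                    memo[t] = exec_time(t, alloc[t]) + \
                        max((bl(s) for s in succs[t]), default=0.0)
                return memo[t]
            return max(bl(t) for t in tasks)
        # Grow allocations until the two lower bounds are roughly balanced
        while critical_path() > total_work() / num_procs:
            growable = [t for t in tasks if alloc[t] < num_procs]
            if not growable:
                break
            # Pick the task whose execution time shrinks most with 1 more proc
            best = max(growable, key=lambda t: exec_time(t, alloc[t])
                                               - exec_time(t, alloc[t] + 1))
            alloc[best] += 1
        return alloc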

9
Presentation Outline
  • Mixed-Parallel Scheduling
  • The Scheduling Problem with Reservations
  • Models and Assumptions
  • Algorithms for Minimizing Makespan
  • Algorithms for Meeting a Deadline
  • Conclusion

10
Batch Scheduling and Reservations
  • Platforms are shared by users, today typically
    via batch schedulers
  • Batch schedulers have known drawbacks
  • non-deterministic queue waiting times
  • In many scenarios, one needs guarantees regarding
    application completion times
  • As a result, most batch schedulers today support
    advance reservations
  • One can acquire reservations for some number of
    processors and for some period of time

11
Reservations
We have to schedule around the holes in the reservation schedule
[Figure: reservations as blocks in a processors x time chart, leaving holes
for the application's tasks.]
12
Reservations
One reservation per task
[Figure: each task placed in its own reservation in the processors x time
chart.]
13
Complexity
  • The makespan minimization problem is NP-hard at
    several levels (and thus also for meeting a
    deadline)
  • Mixed-parallel scheduling is NP-hard
  • Guaranteed algorithms [Lepère et al., 2001],
    [Jansen et al., 2006]
  • Scheduling independent tasks with reservations is
    NP-hard and unapproximable in general
    [Eyraud-Dubois et al., 2007]
  • Guaranteed algorithms exist with restrictions
  • Guaranteed algorithms for mixed-parallel
    scheduling with reservations are an open problem
  • In this work we focus on developing heuristics

14
Presentation Outline
  • Mixed-Parallel Scheduling
  • The Scheduling Problem with Reservations
  • Models and Assumptions
  • Algorithms for Minimizing Makespan
  • Algorithms for Meeting a Deadline
  • Conclusion

15
Models and Assumptions
  • Application
  • We assume that the application is fully specified
    and static
  • Conservative reservations can be used to be safe
  • Random DAGs are generated using the method in
    [Suter et al., 2007]
  • Data-parallelism is modeled based on Amdahl's law
    (see the sketch after this slide)
  • Platform
  • We assume that the reservation schedule does not
    change while we compute the schedule
  • We assume that we know the reservation schedule
  • Sometimes not enabled by cluster administrators
  • We ignore communication between tasks
  • Since a parent task may complete well before one
    of its children can start, data must be written
    to disk anyway
  • Can be modeled via task execution time and/or the
    Amdahl's-law parameter
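For concreteness, a common Amdahl's-law form for a malleable task's execution
time; the exact parameterization used in this work is an assumption on our
part:

    # Sketch: execution time of a data-parallel task on p processors, given
    # its sequential time t1 and non-parallelizable fraction alpha.
    def amdahl_time(t1, alpha, p):
        return t1 * (alpha + (1.0 - alpha) / p)

    # e.g. amdahl_time(100.0, 0.1, 8) == 21.25, i.e. parallel efficiency < 1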

16
Minimizing Makespan
  • Natural approach: adapt the CPA algorithm
  • It's a simple algorithm
  • First phase: compute allocations
  • Second phase: list-scheduling
  • Problem
  • Allocations are computed without considering
    reservations
  • Considering reservations would involve
    considering time, which is only done in the
    second phase
  • Greedy Approach (see the sketch below)
  • Sort the tasks by decreasing bottom-level
  • For each task in this order, determine the best
    feasible processor allocation
  • i.e., the one that has the earliest completion
    time
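A sketch of this greedy step, in our own code. feasible_options(t, schedule)
is a hypothetical helper that enumerates (procs, start, finish) triples
fitting around the reservation holes and after t's scheduled parents; the
optional bound argument anticipates the allocation caps of slide 19:

    # Sketch: decreasing bottom-level order; each task gets the feasible
    # allocation with the earliest completion time (assumes at least one
    # feasible option exists for every task).
    def greedy_map(tasks, bottom_level, feasible_options, bound=None):
        schedule = {}  # task -> (procs, start, finish)
        for t in sorted(tasks, key=bottom_level, reverse=True):
            options = [(p, s, f) for (p, s, f) in feasible_options(t, schedule)
                       if bound is None or p <= bound(t)]
            schedule[t] = min(options, key=lambda o: o[2])  # earliest finish
        return schedule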

17
Example
[Figure: a four-task DAG (A, B, C, D); the possible task configurations for
B are shown as candidate rectangles in a processors x time chart around the
reservations.]
18
Computing Bottom-Levels
  • Problem
  • Computing bottom levels (BLs) requires that we
    know task execution times
  • Task execution times depend on allocations
  • But allocations are computed only after, and
    using, the bottom levels
  • We compare four ways to compute BLs
  • use 1-processor allocations
  • use all-processor allocations
  • use CPA-computed allocations, using all
    processors
  • use CPA-computed allocations, using the
    historical average number of non-reserved
    processors
  • We find that the 4th method is marginally better
  • wins in 78.4% of our simulations (more details on
    simulations later)
  • All results hereafter use this method for
    computing BLs (see the sketch below)
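A sketch of the bottom-level computation itself; for the 4th method, est_time
would come from a CPA run (as in the earlier sketch) with the historical
average number of non-reserved processors:

    # Sketch: bottom-level of a task = its estimated execution time plus the
    # largest bottom-level among its successors.
    def bottom_levels(tasks, succs, est_time):
        memo = {}
        def bl(t):
            if t not in memo:
                memo[t] = est_time[t] + max((bl(s) for s in succs[t]),
                                            default=0.0)
            return memo[t]
        return {t: bl(t) for t in tasks}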

19
Bounding Allocations
  • A known problem with such a greedy approach is
    that allocations grow too large
  • the resulting loss of task parallelism ends up
    being detrimental to makespan
  • Let's try to bound allocations
  • Three methods
  • BD_HALF: bound to half of the processors
  • BD_CPA: bound by the allocations in the CPA
    schedule computed using all processors
  • BD_CPAR: bound by the allocations in the CPA
    schedule computed using the historical average
    number of non-reserved processors

20
Reservation Schedule Model?
  • We conduct our experiments in simulation
  • cheap, repeatable, controllable
  • We need to simulate environments for given
    reservation schedules
  • Question: what does a typical reservation
    schedule look like?
  • Answer: we don't really know yet
  • There is no reservation schedule archive
  • Let's look at what people have done in the
    past...

21
Synthetic Reservation Schedules
  • We have schedules of batch jobs
  • e.g., the Parallel Workloads Archive, by D.
    Feitelson
  • Typical approach, e.g., in [Smith et al., 2000]
  • Take a batch job schedule
  • Mark some jobs as reserved
  • Remove all other jobs
  • Problem: the amount of reservation is then
    approximately constant, while in the real world
    we expect it to be approximately decreasing with
    time
  • And we do see this behavior in a real-world
    2.5-year trace from the Grid5K platform
  • We should generate reservation schedules where
    the amount of reservation decreases with time

22
Synthetic Reservation Schedules
  • Three methods to drop reservations after the
    simulated application start time
  • Linearly or exponentially
  • so that there are no reservations after 7 days
  • Based on job submission time
  • Preliminary evaluations indicate that the
    exponential method leads to schedules that are
    more correlated with the Grid5K data
  • For 4 logs from the Parallel Workloads Archive
  • But this is not conclusive because we have only
    one (good) data set at this point
  • We run simulations with 4 logs, the 3 above
    methods, and with the Grid5K data
  • Bottom line: for this work we do not observe
    discrepancies in our results, for our purposes,
    across any of the above (a sketch of the
    exponential method follows below)
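One plausible reading of the exponential method (the talk does not give exact
parameters, so the decay rate and the 7-day horizon below are assumptions):

    import math, random

    # Sketch: keep each reservation derived from a batch job with a
    # probability that decays exponentially with its start time (relative to
    # the simulated application start), vanishing after about 7 days.
    def thin_reservations(jobs, horizon=7 * 24 * 3600.0, rate=5.0):
        kept = []
        for (start, procs, duration) in jobs:
            p_keep = math.exp(-rate * start / horizon) if start < horizon else 0.0
            if random.random() < p_keep:
                kept.append((start, procs, duration))
        return kept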

23
Simulation Procedure
  • We use 40 application specifications
  • DAG size, width, regularity, etc.
  • 20 samples each
  • We use 36 reservation schedule specifications
  • batch log, generation method, etc.
  • 50 samples each
  • Total: 1,440 × 1,000 = 1,440,000 experiments
  • Two metrics
  • Makespan
  • CPU-hours consumption

24
Simulation Results
            Makespan                        CPU-hours
Algorithm   avg. deg.      # of wins        avg. deg.      # of wins
            from best (%)                   from best (%)
BD_ALL      33.75          36               42.48          0
BD_HALF     28.38          3                37.83          1
BD_CPA      0.29           1,026            0.75           6
BD_CPAR     0.21           386              0.00           1,434
  • Similar results for Grid5K reservation schedules

25
Presentation Outline
  • Mixed-Parallel Scheduling
  • The Scheduling Problem with Reservations
  • Models and Assumptions
  • Algorithms for Minimizing Makespan
  • Algorithms for Meeting a Deadline
  • Conclusion

26
Meeting a Deadline
  • A simple approach for meeting a deadline is to
    simply schedule backwards from the deadline
  • Picking tasks by increasing bottom-level
  • The way to be as safe as possible is to find, for
    each task, the feasible allocation that starts as
    late as possible, given that
  • The exit task must complete before the deadline
  • The task must complete before all of its children
    begin
  • Let's see this on a simple example (a sketch of
    the backward pass follows below)
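A sketch of this backward pass, in our own code, reusing the hypothetical
feasible_options helper from the makespan section:

    # Sketch: process tasks by increasing bottom-level (exit tasks first);
    # each gets the feasible allocation that starts as late as possible while
    # finishing before the deadline and before every child's start
    # (assumes at least one feasible option exists for every task).
    def backward_map(tasks, succs, bottom_level, feasible_options, deadline):
        schedule = {}  # task -> (procs, start, finish)
        for t in sorted(tasks, key=bottom_level):
            limit = min((schedule[c][1] for c in succs[t]), default=deadline)
            options = [(p, s, f) for (p, s, f) in feasible_options(t, schedule)
                       if f <= limit]
            schedule[t] = max(options, key=lambda o: o[1])  # latest start
        return schedule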

27-39
Meeting a Deadline Example
[Figure sequence over 13 slides: a two-task example. For each of Task 2 and
then Task 1, the possible configurations (labeled A through E) are tried
against the reservation schedule in a processors x time chart, and the
feasible one that starts as late as possible before the deadline is kept.]
40
Algorithms
  • We can employ the same techniques for bounding
    allocations as for the makespan minimization
    algorithms
  • BD_ALL, BD_HALF, BD_CPA, BD_CPAR
  • Problem: the algorithms do not consider the
    tightness of the deadline
  • If the deadline is loose, the above algorithms
    will consume unnecessarily many CPU-hours
  • For a very loose deadline there should be no
    data-parallelism, and thus no parallel-efficiency
    loss due to Amdahl's law
  • Question: How can we reason about deadline
    tightness?

41
Deadline Tightness
  • For each task we have a choice of allocations
  • Ones that use too many processors may be wasteful
  • Ones that use too few processors may be dangerous
  • Idea
  • Consider the CPA-computed schedule assuming an
    empty reservation schedule
  • Using all processors, or the historical average
    number of non-reserved processors
  • Determine when the task would start in that
    schedule, i.e., at which fraction of the overall
    makespan
  • Pick the allocation that allows the task to start
    at the same fraction of the time interval between
    now and the deadline (a sketch follows below)
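A sketch of this rule under one plausible reading (all names are ours): among
feasible allocations that start no later than the CPA start fraction scaled
to [now, deadline], take the cheapest:

    # Sketch: 'frac' is the task's start time in the empty-reservation CPA
    # schedule divided by that schedule's makespan.
    def rc_pick(task, frac, feasible_options, schedule, now, deadline):
        target = now + frac * (deadline - now)   # scaled CPA start time
        safe = [(p, s, f) for (p, s, f) in feasible_options(task, schedule)
                if s <= target and f <= deadline]
        return min(safe, key=lambda o: o[0])     # fewest processors = cheapest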

42-44
Matching the CPA schedule
[Figure sequence: the CPA schedule (on q procs, the task's start splitting
its makespan into intervals a and b) next to the schedule with reservations
(on p procs, the task's start splitting the time up to the task deadline
into intervals c and d).]
Pick the cheapest allocation such that b / (a + b) ≥ d / (c + d)
45
Simulation Experiments
  • We call this new approach resource-conservative
    (RC)
  • We conduct simulations similar to those for the
    makespan minimization algorithms
  • Issue: the RC approach can be in trouble when it
    tries to schedule the first tasks
  • if the reservation schedule is non-stationary
    and/or tight
  • could be addressed via some tunable parameter
    (e.g., pick an allocation that starts at least x
    after the scaled CPA start time)
  • We do not use such a parameter in our results
  • We use two metrics
  • Tightest deadline achieved
  • Necessary because deadline tightness depends on
    the instance
  • Determined via binary search (see the sketch
    below)
  • CPU-hours consumption for a deadline that's 50%
    later than the tightest deadline
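A sketch of the binary search for the tightest achieved deadline; meets(d) is
a hypothetical predicate that runs the scheduling algorithm and reports
whether deadline d is met:

    # Sketch: standard bisection, maintaining meets(hi) == True and
    # meets(lo) == False; returns the tightest feasible deadline within eps.
    def tightest_deadline(meets, lo, hi, eps=1.0):
        while hi - lo > eps:
            mid = (lo + hi) / 2.0
            if meets(mid):
                hi = mid
            else:
                lo = mid
        return hi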

46
Simulation Results
            Tightest deadline (avg.            CPU-hours consumed for
            degradation from best, %)          a loose deadline (%)
            Reservation schedule               Reservation schedule
Algorithm   sparse  medium  tight   Grid5K     sparse  medium  tight   Grid5K
BD_ALL      178     175     188     227        3556    3486    3768    2006
BD_CPAR     6.52    6.44    6.91    8.38       231     236     243     179
RC_CPA      13.17   13.27   17.36   19.51      6.39    6.80    7.98    2.15
RC_CPAR     4.12    4.27    8.26    15.14      0.16    0.15    0.16    0.09
47
Conclusions
  • Makespan minimization
  • Bounding task allocations based on the CPA
    schedule works well
  • Meeting a deadline
  • Using the CPA schedule for determining task start
    times works well, at least when the reservation
    schedule isn't too tight
  • Some tuning parameter may help for tight
    schedules
  • Or, one can use the same approach as for makespan
    minimization, but backwards
  • In both cases, using the historical number of
    unreserved processors leads to marginal
    improvements

48
Possible Future Directions
  • Use a recent one-step algorithm instead of CPA
  • iCASLB [Vydyanathan et al., 2006]
  • Experiments in a real-world setting
  • What kind of interface should a batch scheduler
    expose if the full reservation schedule must
    remain hidden?
  • Reservation schedule archive
  • Needs to be a community effort

49
  • "Scheduling Mixed-Parallel Applications with
    Advance Reservations", Kento Aida and Henri
    Casanova, to appear in Proc. of HPDC 2008
  • Questions?