Title: Static scheduling of dependent parallel tasks on heterogeneous clusters
1. Static scheduling of dependent parallel tasks on heterogeneous clusters
- J. Barbosa and C. Morais
- University of Porto
2. Introduction
- Cluster: homogeneous vs. heterogeneous
- Scheduling: static vs. dynamic
- Tasks: dependent vs. independent
- Focus: task parallel vs. data parallel
- An application → Tasks → DAG → Scheduling
- Issues on heterogeneous clusters
- Size of subtasks
- How much data and how many operations
- Processor speed
- Capacity (affects FLOPs, i.e. floating-point operations)
- Network communication cost
- Impact of using idle processors
3. Objectives
- Scheduling of a parallel application represented by a DAG
- At any given time:
- n tasks are ready to start processing
- m processors are available
- The goal is to minimize max_i FT(n_i)
- DAG: G = (V, E)
- ST(n_i): starting time of task node i
- FT(n_i): finishing time of task node i
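The objective can be illustrated with a minimal sketch. It assumes every task has a known, fixed duration and that processors are unlimited (both are simplifications for illustration, not part of the slides); the function name and inputs are hypothetical. ST and FT are propagated through the DAG, and the quantity to minimize is the largest FT.

```python
from collections import defaultdict, deque

def earliest_times(durations, edges):
    """Return (ST, FT) dicts for a DAG, ignoring processor limits.

    durations: {task: processing time}  (hypothetical input)
    edges: list of (predecessor, successor) pairs
    """
    succs = defaultdict(list)
    indeg = {t: 0 for t in durations}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1

    st = {t: 0.0 for t in durations}
    ft = {}
    ready = deque(t for t in durations if indeg[t] == 0)
    while ready:
        u = ready.popleft()
        ft[u] = st[u] + durations[u]
        for v in succs[u]:
            # A task cannot start before all of its predecessors finish.
            st[v] = max(st[v], ft[u])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return st, ft

durations = {1: 2.0, 2: 3.0, 3: 1.0, 4: 2.0}
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
st, ft = earliest_times(durations, edges)
makespan = max(ft.values())  # max_i FT(n_i)
```

With limited processors the start times can only move later, so this value is a floor on what any schedule of the same durations can reach.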
4. Linear Algebra Kernel
- BLAS (LAPACK)
- Routines that provide standard building blocks for performing basic vector and matrix operations
- Vector-vector operations
- Matrix-vector multiplication
- Matrix-matrix multiplication
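To give a feel for how the cost of these three kernel classes grows with problem size, here is a small sketch that counts floating-point operations for the textbook dense algorithms on size-n operands. The function names are hypothetical, and the counts describe the naive algorithms, not any particular BLAS implementation.

```python
def flops_dot(n):
    """Dot product of two length-n vectors: n multiplies, n - 1 adds."""
    return 2 * n - 1

def flops_matvec(n):
    """n x n matrix times length-n vector: one dot product per row."""
    return n * flops_dot(n)

def flops_matmul(n):
    """n x n times n x n matrix: one dot product per output entry."""
    return n * n * flops_dot(n)

for n in (10, 100, 1000):
    print(n, flops_dot(n), flops_matvec(n), flops_matmul(n))
```

The growth is roughly 2n, 2n^2 and 2n^3, which is why matrix-matrix tasks dominate the processing time of a mixed task graph.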
5. Computational Model
- Estimation of processing time for each task
- S_i: capacity of processor i (measured in Mflop/s)
- T_L: network latency
- w: bandwidth
- Total computation time (T_comm + T_parallel)
- T_comm: time spent communicating
- where b is the message size and the latency term accounts for a message divided into k packets
- T_parallel: time spent in parallel operation
- where f(n) is the cost function of the algorithm (e.g. FLOP overhead)
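The estimate can be sketched in code, with one strong caveat: the slide's formulas are not reproduced in this text, so the two expressions below are assumptions chosen to match the listed symbols. Communication time is taken as k latencies plus message size over bandwidth, and parallel time as the operation count f(n) divided by the aggregate capacity of the processors used. All function names and sample values are hypothetical.

```python
def t_comm(b, k, t_latency, bandwidth):
    """Assumed form: k * T_L + b / w (b in bytes, bandwidth in bytes/s)."""
    return k * t_latency + b / bandwidth

def t_parallel(flops, capacities_mflops):
    """Assumed form: f(n) spread perfectly over the summed capacities S_i."""
    return flops / (sum(capacities_mflops) * 1e6)

def t_total(b, k, t_latency, bandwidth, flops, capacities_mflops):
    """Total estimated time: T_comm + T_parallel."""
    return (t_comm(b, k, t_latency, bandwidth)
            + t_parallel(flops, capacities_mflops))

# Hypothetical task: 8 MB of data in 4 packets, 2 Gflop of work,
# two processors rated 400 and 600 Mflop/s.
total = t_total(b=8e6, k=4, t_latency=1e-4, bandwidth=1e8,
                flops=2e9, capacities_mflops=[400, 600])
```

Under these assumptions, adding a processor reduces T_parallel but can raise T_comm, so the best processor count for a task is not necessarily all of them.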
6. DAG Generation
- Task dependency → DAG for scheduling
- Parameters
- Number of nodes
- CCR (communication-to-computation ratio)
- Average out-degree of a node
- Node ID
- 1 to N_a: nodes have only outgoing edges
- N_a to N_b: nodes have both outgoing and incoming edges
- N_b to N_c: nodes have only incoming edges
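A generator driven by these parameters might look like the sketch below. Several details are assumptions rather than the authors' procedure: the exact range boundaries (1..N_a, N_a+1..N_b, N_b+1..N_c), the uniform cost distributions, the out-degree distribution, and the repair step that guarantees every non-entry node a predecessor. Edges always run from a lower to a higher node ID, which keeps the graph acyclic.

```python
import random

def generate_dag(n_a, n_b, n_c, avg_out_degree, ccr, seed=0):
    """Random DAG; returns (comp, comm) cost dicts.

    Assumed ID ranges: 1..n_a entry nodes (outgoing only),
    n_a+1..n_b inner nodes, n_b+1..n_c exit nodes (incoming only).
    avg_out_degree must be a positive integer.
    """
    assert 1 <= n_a <= n_b < n_c
    rng = random.Random(seed)
    comp = {i: rng.uniform(1.0, 10.0) for i in range(1, n_c + 1)}

    edges = set()
    for i in range(1, n_b + 1):
        # Targets have a higher ID and are never entry nodes.
        targets = list(range(max(i, n_a) + 1, n_c + 1))
        k = min(len(targets), rng.randint(1, 2 * avg_out_degree - 1))
        edges.update((i, j) for j in rng.sample(targets, k))

    # Repair step: every non-entry node needs at least one incoming edge.
    has_pred = {j for _, j in edges}
    for j in range(n_a + 1, n_c + 1):
        if j not in has_pred:
            edges.add((rng.randint(1, min(j - 1, n_b)), j))

    # Scale edge costs so that mean(comm) / mean(comp) equals the CCR.
    edges = sorted(edges)
    raw = [rng.uniform(1.0, 10.0) for _ in edges]
    mean_comp = sum(comp.values()) / len(comp)
    scale = ccr * mean_comp / (sum(raw) / len(raw))
    comm = {e: r * scale for e, r in zip(edges, raw)}
    return comp, comm

comp, comm = generate_dag(n_a=3, n_b=8, n_c=12, avg_out_degree=2, ccr=0.5)
```

Because of the truncation and repair steps, the realized average out-degree only approximates the requested one.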
7. DAG Generation
[Figure: generated DAG. Labels: nodes having outgoing edges; nodes having no incoming edges; nodes n_i and n_j with i < j]
8. Scheduling Algorithm
- List scheduling technique
- Determine the available tasks to schedule
- Define a priority for them
- Until all tasks are scheduled, select the task with the highest priority and assign a processor to it
- Lower bound of execution time
- T_i,p: minimum processing time of task i on the machine, achieved when the fastest p processors are used
- T_∞: (theoretical, ideal) lower bound of execution time
- Sum of T
- Maximum capacity
- S_i: capacity required to achieve T_i,p
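The three list-scheduling steps can be sketched generically as follows. This is a simplified illustration, not the algorithm proposed in the slides: each task runs on a single processor, communication costs are ignored, priorities are supplied by the caller, and the processor is chosen by earliest finish time. All names and inputs are hypothetical.

```python
def list_schedule(work, edges, priority, speeds):
    """Return {task: (processor, ST, FT)} for a DAG of single-processor tasks.

    work: {task: operations}; priority: {task: value}, higher goes first;
    speeds: operations per second of each processor.
    """
    preds = {t: [] for t in work}
    for u, v in edges:
        preds[v].append(u)

    proc_free = [0.0] * len(speeds)
    schedule = {}
    while len(schedule) < len(work):
        # Step 1: determine the tasks whose predecessors are all scheduled.
        ready = [t for t in work if t not in schedule
                 and all(p in schedule for p in preds[t])]
        # Steps 2 and 3: take the ready task with the highest priority.
        task = max(ready, key=lambda t: priority[t])
        data_ready = max((schedule[p][2] for p in preds[task]), default=0.0)
        # Assign the processor on which the task finishes earliest.
        best = min(range(len(speeds)),
                   key=lambda q: max(proc_free[q], data_ready)
                   + work[task] / speeds[q])
        start = max(proc_free[best], data_ready)
        finish = start + work[task] / speeds[best]
        proc_free[best] = finish
        schedule[task] = (best, start, finish)
    return schedule

work = {"a": 4.0, "b": 2.0, "c": 6.0, "d": 2.0}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
priority = {"a": 4, "c": 3, "b": 2, "d": 1}
sched = list_schedule(work, edges, priority, speeds=[2.0, 1.0])
makespan = max(ft for _, _, ft in sched.values())
```

In this example the slower processor is used only for task b, which would otherwise wait behind c on the faster one.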
9. Scheduling Algorithm
- Node priority
- t-level and b-level
- Nodes along the DAG with the highest b-level belong to the critical path
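For reference, the two levels can be computed as in the sketch below, using the usual definitions: the t-level of a node is the length of the longest path from an entry node up to (but excluding) that node, and the b-level is the length of the longest path from the node (included) to an exit node. The sketch assumes edges go from lower to higher node IDs, so sorted IDs form a topological order; the function name and the sample costs are hypothetical.

```python
def levels(comp, comm):
    """Return (t_level, b_level) dicts.

    comp: {node: computation cost}
    comm: {(u, v): communication cost}, with u < v for every edge
    """
    order = sorted(comp)  # valid topological order because u < v
    preds = {n: [] for n in comp}
    succs = {n: [] for n in comp}
    for u, v in comm:
        preds[v].append(u)
        succs[u].append(v)

    t = {}
    for n in order:
        t[n] = max((t[u] + comp[u] + comm[(u, n)] for u in preds[n]),
                   default=0.0)
    b = {}
    for n in reversed(order):
        b[n] = comp[n] + max((comm[(n, v)] + b[v] for v in succs[n]),
                             default=0.0)
    return t, b

comp = {1: 2.0, 2: 3.0, 3: 1.0, 4: 2.0}
comm = {(1, 2): 1.0, (1, 3): 1.0, (2, 4): 2.0, (3, 4): 1.0}
t, b = levels(comp, comm)
cp_length = max(b.values())
critical = [n for n in sorted(comp) if t[n] + b[n] == cp_length]
```

A node lies on the critical path exactly when its t-level plus b-level equals the critical path length, which here selects nodes 1, 2 and 4.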
10. Simulation Setup
For each ready task:
    Determine computational capacity of processor
While ready tasks ≠ 0:
    Assign optimal processor (best estimated)
Correct the last schedule by assigning more processors (from low-priority tasks) to the task having the higher b-level
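One way to picture the processor assignment is the simplified sketch below. It does not reproduce the assign-then-correct sequence of the slide: instead it serves ready tasks in decreasing b-level, so that a higher-priority task is never short of processors while a lower-priority one holds them. The wanted capacity per task is taken as given, and all names and values are hypothetical.

```python
def allocate(ready, speeds):
    """Assign processors to ready tasks in decreasing b-level order.

    ready: list of (task, b_level, wanted_capacity) tuples
    speeds: {processor: capacity in Mflop/s}
    Returns {task: [processors]}; tasks left without processors must wait.
    """
    # Fastest processors are handed out first.
    free = sorted(speeds, key=speeds.get, reverse=True)
    alloc = {}
    for task, _, wanted in sorted(ready, key=lambda r: r[1], reverse=True):
        got, procs = 0.0, []
        while free and got < wanted:
            p = free.pop(0)
            procs.append(p)
            got += speeds[p]
        if procs:
            alloc[task] = procs
    return alloc

speeds = {"p0": 500.0, "p1": 300.0, "p2": 200.0, "p3": 100.0}
ready = [("t1", 5.0, 400.0), ("t2", 9.0, 700.0), ("t3", 2.0, 300.0)]
alloc = allocate(ready, speeds)
```

Here t2, with the highest b-level, receives the two fastest processors, t1 takes the remaining two even though they fall short of its wanted capacity, and t3 waits for the next scheduling step.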
11. Simulation Results
- T_∞: lower bound (if all tasks execute with S, i.e. best machine)
- T_seq: processing time if all tasks execute with S and one task at a time (i.e. sequential on one machine)
- T_sched: processing time with the proposed algorithm