Load Balancing Part 2: Static Load Balancing
1
Load Balancing Part 2: Static Load Balancing
  • Kathy Yelick
  • yelick@cs.berkeley.edu
  • www.cs.berkeley.edu/yelick/cs194f07

2
Load Balancing Overview
  • Load balancing differs with properties of the
    tasks (chunks of work)
  • Task costs
      • Do all tasks have equal costs?
      • If not, when are the costs known?
        (before starting, when the task is created, or only
        when the task ends)
  • Task dependencies
      • Can all tasks be run in any order (including
        in parallel)?
      • If not, when are the dependencies known?
        (before starting, when the task is created, or only
        when the task ends)
  • Locality
      • Is it important for some tasks to be scheduled on
        the same processor (or nearby) to reduce
        communication cost?
      • When is the information about communication known?

3
Task Cost Spectrum
4
Task Dependency Spectrum
5
Task Locality Spectrum (Communication)
6
Spectrum of Solutions
  • A key question is when certain information about
    the load balancing problem is known
  • Many combinations of answers lead to a spectrum
    of solutions:
  • Static scheduling. All information is available
    to the scheduling algorithm, which runs before any
    real computation starts.
      • Off-line algorithms make decisions before
        execution time
  • Semi-static scheduling. Information may be known
    at program startup, at the beginning of each
    timestep, or at other well-defined points.
      • Off-line algorithms may be used between major
        steps.
  • Dynamic scheduling. Information is not known
    until mid-execution.
      • On-line algorithms make decisions mid-execution

7
Solutions for Specific Problems
  • For the solutions we have so far, locality is not
    considered, i.e., the techniques do not optimize
    for it
  • Loops with independent iterations
  • Divide-and-conquer problems with little/no
    communication (a bound may be communicated in
    branch-and-bound search)
  • Computationally intensive tasks like matrix
    multiply

                               Equal cost tasks   Unequal, but known cost   Unpredictable cost
    Unordered bag of tasks     Trivial            Bin packing              Self Scheduling
    Task tree (unknown shape)  Work stealing      Work stealing            Work stealing
    Task graph (DAG)           ?                  ?                        ?
8
Solutions for Specific Problems
  • If locality is important then we may need other
    solutions
  • Two cases:
  • Task bag (independent tasks) that need to
    communicate → run on the same processor (serialize)
    or nearby
  • Task graph (dependencies) → if two dependent
    tasks need to share data, try to schedule them on the
    same processor

    (each entry below applies to all three cost columns: equal cost,
    unequal but known cost, and unpredictable cost)
    Unordered bag of tasks   Minimize surface-to-volume ratio; array decomposition or graph partition
    Task tree                Treat as a general DAG if locality is really critical
    Task graph (DAG)         General scheduling problem
9
Regular Meshes (e.g., Game of Life)
  • Independent tasks (a bag, not a DAG or tree)
  • Load balancing → equal-size partitions
  • Locality → minimize the perimeter using low-aspect-ratio
    partitions
  • Will hopefully reduce cache misses, false sharing

2n(p^(1/2) - 1) edge crossings
n(p - 1) edge crossings
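For concreteness (an illustrative reading of the two counts above, assuming they annotate a figure comparing a 2D blocked layout with a 1D strip layout of an n x n mesh over p processors): with n = 1000 and p = 16, the blocked layout cuts 2*1000*(4 - 1) = 6,000 edges, while the strip layout cuts 1000*(16 - 1) = 15,000.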
10
Irregular Communication Patterns
  • A task interaction graph shows which tasks
    communicate/share data with others
  • It may be weighted by the volume of data shared
  • If the data is constant, it may be replicated and
    doesn't count
  • The task interaction graph for the Game of Life
    is a regular 2D mesh
  • For animations, simulations of complex
    structures, etc., unstructured meshes are used
    instead

11
Definition of Graph Partitioning
  • Given a graph G = (N, E, W_N, W_E)
      • N = nodes (or vertices)
      • W_N = node weights
      • E = edges
      • W_E = edge weights
  • Ex: N = {tasks}, W_N = task costs, edge (j,k) in
    E means task j sends W_E(j,k) words to task k
  • Choose a partition N = N1 U N2 U ... U NP such that
      • The sum of the node weights in each Nj is about
        the same
      • The sum of the weights of edges connecting
        different pairs Nj and Nk is minimized
  • Ex: balance the work load while minimizing
    communication
  • Special case N = N1 U N2: graph bisection
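A minimal sketch of this objective (illustrative only, not from the slides), assuming the graph is stored in CSR arrays xadj/adjncy with per-node weights wn and per-edge weights we, and part[v] gives each node's part:

    /* Evaluate a partition: per-part node weight and total weight of cut edges.
       Assumes an undirected graph with each edge stored in both directions. */
    void partition_cost(int n, const int *xadj, const int *adjncy,
                        const double *wn, const double *we,
                        const int *part, int P,
                        double *part_weight, double *cut_weight) {
        for (int p = 0; p < P; p++) part_weight[p] = 0.0;
        *cut_weight = 0.0;
        for (int v = 0; v < n; v++) {
            part_weight[part[v]] += wn[v];                     /* node weight per part */
            for (int e = xadj[v]; e < xadj[v + 1]; e++) {
                if (part[adjncy[e]] != part[v])
                    *cut_weight += we[e];                      /* edge crosses the cut */
            }
        }
        *cut_weight *= 0.5;   /* each undirected edge was counted from both endpoints */
    }

Balanced entries in part_weight and a small cut_weight are exactly the two goals listed above.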

[Figure: example graph of eight nodes labeled 1-8, with node weights in parentheses and edge weights on the edges]
12
Definition of Graph Partitioning (continued)
  • Same definition and example graph as the previous
    slide; the edges connecting different parts (the cut
    edges) are shown in black.
13
Applications
  • Telephone network design
      • The original application; algorithm due to Kernighan
  • Load balancing while minimizing communication
      • Sparse matrix times vector multiplication
      • Solving PDEs
      • N = {1, ..., n}, (j,k) in E if A(j,k) is nonzero,
        W_N(j) = number of nonzeros in row j, W_E(j,k) = 1
  • VLSI layout
      • N = {units on chip}, E = {wires}, W_E(j,k) = wire
        length
  • Sparse Gaussian elimination
      • Used to reorder rows and columns to increase
        parallelism and to decrease fill-in
  • Data mining and clustering
  • Physical mapping of DNA

14
Sparse Matrix-Vector Multiplication: y = y + A*x

    declare A_local, A_remote(1:num_procs), x_local, x_remote, y_local
    y_local = y_local + A_local * x_local
    for all procs P that need part of x_local
        send(needed part of x_local, P)
    for all procs P owning needed part of x_remote
        receive(x_remote, P)
        y_local = y_local + A_remote(P) * x_remote
15
Cost of Graph Partitioning
  • There are many possible partitionings to search
  • Just to divide into 2 parts there are
    n choose n/2 ≈ sqrt(2/(n*pi)) * 2^n possibilities
  • Choosing the optimal partitioning is NP-complete
      • (NP-complete: we can prove it is as hard as other
        well-known hard problems in the class
        Nondeterministic Polynomial time)
      • The only known exact algorithms have cost
        exponential in n
  • We need good heuristics
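As illustrative arithmetic (not from the slides): for n = 100 nodes, 100 choose 50 ≈ sqrt(2/(100*pi)) * 2^100 ≈ 1.0 x 10^29 possible bisections.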

16
Overview of heuristics
17
First Heuristic Repeated Graph Bisection
  • To partition N into 2^k parts,
  • bisect the graph recursively k times
  • Henceforth we discuss mostly graph bisection

18
Edge Separators vs. Vertex Separators
  • An edge separator Es (a subset of E) separates G if
    removing Es from E leaves two equal-sized,
    disconnected components of N: N1 and N2
  • A vertex separator Ns (a subset of N) separates G if
    removing Ns and all incident edges leaves two
    equal-sized, disconnected components of N: N1
    and N2
  • Making an Ns from an Es: pick one endpoint of
    each edge in Es
      • |Ns| <= |Es|
  • Making an Es from an Ns: pick all edges incident
    on Ns
      • |Es| <= d * |Ns|, where d is the maximum degree of
        the graph
  • We will find edge or vertex separators, as
    convenient

G = (N, E), nodes N and edges E; Es = green edges (or blue edges), Ns = red vertices
19
Overview of Bisection Heuristics
  • Partitioning with nodal coordinates
      • Each node has x, y, z coordinates → partition space
  • Partitioning without nodal coordinates
      • E.g., a sparse matrix of Web documents:
        A(j,k) = number of times keyword j appears in URL k
  • Multilevel acceleration (advanced topic)
      • Approximate the problem by a coarse graph; do so
        recursively

20
Partitioning with Nodal Coordinates, i.e., nodes
at points in (x,y) or (x,y,z) space
21
Nodal Coordinates How Well Can We Do?
  • A planar graph can be drawn in the plane without edge
    crossings
  • Ex: an m x m grid of N = m^2 nodes has a vertex separator Ns
    with |Ns| = m = sqrt(N) (see the last slide for m = 5)
  • Theorem (Lipton, Tarjan, 1979): If G is planar, there exists
    an Ns such that
      • N = N1 U Ns U N2 is a partition,
      • |N1| <= 2/3 |N| and |N2| <= 2/3 |N|, and
      • |Ns| <= sqrt(8 |N|)
  • The theorem motivates the intuition behind the following
    algorithms
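As a quick numeric check (not from the slides): for the m = 5 grid, |N| = 25, removing the middle column gives |Ns| = 5, well under sqrt(8*25) ≈ 14.1, and each half has 10 <= (2/3)*25 ≈ 16.7 nodes.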

22
Nodal Coordinates Inertial Partitioning
  • For a graph in 2D, choose a line with half the
    nodes on one side and half on the other
  • In 3D, choose a plane, but consider 2D for
    simplicity
  • Choose a line L, and then choose a line L'
    perpendicular to it, with half the nodes on
    either side

23
Inertial Partitioning Choosing L
  • Clearly we prefer the L on the left below
  • Mathematically, choose L to be a total least
    squares fit of the nodes
      • Minimize the sum of squares of distances to L (green
        lines on the last slide)
      • Equivalent to choosing L as the axis of rotation that
        minimizes the moment of inertia of the nodes (unit
        weights), which is the source of the name

[Figure: two candidate lines L, each splitting the nodes into N1 and N2]
24
Inertial Partitioning: Choosing L (continued)

(a, b) is the unit vector perpendicular to L

  Sum_j (length of j-th green line)^2
      = Sum_j [ (x_j - xbar)^2 + (y_j - ybar)^2 - ( -b(x_j - xbar) + a(y_j - ybar) )^2 ]
        ... Pythagorean theorem
      = a^2 Sum_j (x_j - xbar)^2 + 2ab Sum_j (x_j - xbar)(y_j - ybar) + b^2 Sum_j (y_j - ybar)^2
      = a^2 X1 + 2ab X2 + b^2 X3
      = [a b] [ X1  X2 ] [a]
              [ X2  X3 ] [b]

  Minimized by choosing
      (xbar, ybar) = (Sum_j x_j, Sum_j y_j) / n  ... the center of mass
      (a, b) = eigenvector of the smallest eigenvalue of [ X1  X2 ]
                                                         [ X2  X3 ]
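A minimal C sketch of this computation (illustrative only, not from the slides), assuming node coordinates in arrays x[] and y[]; the 2x2 symmetric eigenproblem has a closed form, so no eigensolver is needed:

    #include <math.h>

    /* Return the unit vector (a,b) perpendicular to the inertia axis L, i.e. the
       eigenvector of the smallest eigenvalue of [[X1, X2], [X2, X3]]. */
    void inertial_direction(const double *x, const double *y, int n, double *a, double *b) {
        double xbar = 0.0, ybar = 0.0;
        for (int j = 0; j < n; j++) { xbar += x[j]; ybar += y[j]; }
        xbar /= n; ybar /= n;                               /* center of mass */

        double X1 = 0.0, X2 = 0.0, X3 = 0.0;                /* second moments */
        for (int j = 0; j < n; j++) {
            double dx = x[j] - xbar, dy = y[j] - ybar;
            X1 += dx * dx; X2 += dx * dy; X3 += dy * dy;
        }

        /* smallest eigenvalue of the symmetric 2x2 matrix [[X1,X2],[X2,X3]] */
        double lmin = 0.5 * ((X1 + X3) - sqrt((X1 - X3) * (X1 - X3) + 4.0 * X2 * X2));

        double va, vb;                                      /* corresponding eigenvector */
        if (fabs(X2) > 1e-12) { va = X2;  vb = lmin - X1; }
        else if (X1 <= X3)    { va = 1.0; vb = 0.0; }       /* matrix already diagonal */
        else                  { va = 0.0; vb = 1.0; }
        double len = sqrt(va * va + vb * vb);
        *a = va / len; *b = vb / len;
    }

The nodes would then be split at the median of their coordinate along L, i.e. of t_j = -b*(x_j - xbar) + a*(y_j - ybar).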
25
Nodal Coordinates Random Spheres
  • Generalize the nearest-neighbor idea of a planar
    graph to higher dimensions
  • Any graph can be drawn in 3D without edge crossings
  • Capture the intuition of planar graphs (being
    connected to nearest neighbors), but in
    higher than 2 dimensions
  • For intuition, consider the graph defined by a
    regular 3D mesh
      • An n by n by n mesh of N = n^3 nodes
      • Edges to the 6 nearest neighbors
      • Partition by taking a plane parallel to 2 axes
      • Cuts n^2 = N^(2/3) = O(|E|^(2/3)) edges
  • For general graphs
      • We need a notion of "well-shaped", like a mesh
26
Random Spheres: Well-Shaped Graphs
  • Approach due to Miller, Teng, Thurston, Vavasis
  • Def: A k-ply neighborhood system in d dimensions
    is a set {D1, ..., Dn} of closed disks in R^d such
    that no point in R^d is strictly interior to more
    than k disks
  • Def: An (α, k) overlap graph is a graph defined in
    terms of α >= 1 and a k-ply neighborhood system
    {D1, ..., Dn}: there is a node for each Dj, and an
    edge from j to i if expanding the radius of the
    smaller of Dj and Di by a factor α causes the two disks
    to overlap

Ex: an n-by-n mesh is a (1,1) overlap graph
Ex: any planar graph is an (α, k) overlap graph for some α, k
27
Generalizing Lipton/Tarjan to Higher Dimensions
  • Theorem (Miller, Teng, Thurston, Vavasis, 1993):
    Let G = (N, E) be an (α, k) overlap graph in d
    dimensions with n = |N|. Then there is a vertex
    separator Ns such that
      • N = N1 U Ns U N2,
      • N1 and N2 each have at most n(d+1)/(d+2) nodes, and
      • Ns has at most O(α k^(1/d) n^((d-1)/d)) nodes
  • When d = 2, this gives the same bound as Lipton/Tarjan
  • Algorithm:
      • Choose a sphere S in R^d
      • The edges that S cuts form an edge separator Es
      • Build Ns from Es
      • Choose S randomly, so that it satisfies the theorem
        with high probability

28
Stereographic Projection
  • Stereographic projection maps the plane to the sphere
  • In d = 2: draw a line from p to the North Pole; the
    projection p' of p is where the line and the sphere
    intersect
  • Similar in higher dimensions

p = (x, y)   →   p' = (2x, 2y, x^2 + y^2 - 1) / (x^2 + y^2 + 1)
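A direct transcription of the formula above into C (a small illustrative sketch):

    /* Stereographic projection of a 2D point onto the unit sphere in 3D. */
    void stereo_project(double x, double y, double p[3]) {
        double s = x * x + y * y + 1.0;     /* common denominator */
        p[0] = 2.0 * x / s;
        p[1] = 2.0 * y / s;
        p[2] = (x * x + y * y - 1.0) / s;   /* the North Pole (0,0,1) is the image of infinity */
    }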
29
Choosing a Random Sphere
  • Do a stereographic projection from R^d to a sphere S
    in R^(d+1)
  • Find a centerpoint of the projected points
      • Any plane through the centerpoint divides the points
        evenly
      • There is a linear programming algorithm; cheaper
        heuristics exist
  • Conformally map the points on the sphere
      • Rotate the points around the origin so the centerpoint is at
        (0, 0, r) for some r
      • Dilate the points (unproject, multiply by
        sqrt((1 - r)/(1 + r)), project)
      • This maps the centerpoint to the origin (0, ..., 0) and spreads
        the points around S
  • Pick a random plane through the origin
      • The intersection of the plane and the sphere S is a circle
  • Unproject the circle
      • This yields the desired circle C in R^d
  • Create Ns: j belongs to Ns if α*Dj intersects C

30
Random Sphere Algorithm (Gilbert)
[Figures: slides 30-35 show six animation frames of the random sphere algorithm on an example mesh]
36
Nodal Coordinates Summary
  • There are other variations on these algorithms
  • The algorithms are efficient
  • They rely on graphs having nodes connected (mostly) to
    nearest neighbors in space
      • The algorithm does not depend on where the actual edges
        are!
  • Common when the graph arises from a physical model
  • Ignores edges, but can be used as a good starting
    guess for subsequent partitioners that do examine
    edges
  • Can do poorly if graph connectivity is not spatial
  • Details at
      • www.cs.berkeley.edu/demmel/cs267/lecture18/lecture18.html
      • www.cs.ucsb.edu/gilbert
      • www.cs.bu.edu/steng

37
Partitioning without Nodal Coordinates, e.g., in
the WWW, where nodes are web pages
38
Coordinate-Free Breadth First Search (BFS)
  • Given G = (N, E) and a root node r in N, BFS produces
      • A subgraph T of G (same nodes, subset of edges)
      • T is a tree rooted at r
      • Each node is assigned a level = its distance from r

[Figure: BFS tree with levels 0 through 4, split into N1 and N2; tree edges, horizontal edges, and inter-level edges shown]
39
Partitioning via Breadth First Search
  • BFS identifies 3 kinds of edges:
      • Tree edges: part of T
      • Horizontal edges: connect nodes at the same level
      • Inter-level edges: connect nodes at adjacent
        levels
  • No edges connect nodes in levels
    differing by more than 1 (why?)
  • BFS partitioning heuristic:
      • N = N1 U N2, where
        N1 = {nodes at level <= L},
        N2 = {nodes at level > L}
      • Choose L so |N1| is close to |N2|

BFS partition of a 2D mesh using the center as root:
N1 = levels 0, 1, 2, 3; N2 = levels 4, 5, 6
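A minimal sketch of this heuristic (illustrative only, not from the slides), assuming a connected graph stored in CSR arrays xadj/adjncy; it assigns part 0 to the levels below the cut and part 1 to the rest:

    #include <stdlib.h>

    void bfs_bisect(int n, const int *xadj, const int *adjncy, int root, int *part) {
        int *level = malloc(n * sizeof(int));
        int *queue = malloc(n * sizeof(int));
        for (int v = 0; v < n; v++) level[v] = -1;

        int head = 0, tail = 0;
        level[root] = 0; queue[tail++] = root;
        while (head < tail) {                       /* standard BFS from the root */
            int v = queue[head++];
            for (int e = xadj[v]; e < xadj[v + 1]; e++) {
                int w = adjncy[e];
                if (level[w] == -1) { level[w] = level[v] + 1; queue[tail++] = w; }
            }
        }

        /* count nodes per level, then cut at the level where the running sum reaches n/2 */
        int maxlev = 0;
        for (int v = 0; v < n; v++) if (level[v] > maxlev) maxlev = level[v];
        int *cnt = calloc(maxlev + 1, sizeof(int));
        for (int v = 0; v < n; v++) cnt[level[v]]++;
        int Lcut = 0, sum = 0;
        while (Lcut <= maxlev && sum + cnt[Lcut] <= n / 2) sum += cnt[Lcut++];

        for (int v = 0; v < n; v++) part[v] = (level[v] < Lcut) ? 0 : 1;
        free(level); free(queue); free(cnt);
    }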
40
Coordinate-Free Kernighan/Lin
  • Take an initial partition and iteratively improve
    it
      • Kernighan/Lin (1970): cost O(|N|^3), but easy to
        understand
      • Fiduccia/Mattheyses (1982): cost O(|E|), much
        better, but more complicated
  • Given G = (N, E, W_E) and a partitioning N = A U B,
    where |A| = |B|
      • T = cost(A,B) = Σ W(e) over edges e connecting nodes in
        A and B
      • Find subsets X of A and Y of B with |X| = |Y|
      • Swapping X and Y should decrease the cost:
        newA = (A - X) U Y and newB = (B - Y) U X
        newT = cost(newA, newB) < cost(A,B)
  • Need to compute newT efficiently for many
    possible X and Y, and choose the smallest

41
Kernighan/Lin Preliminary Definitions
  • T = cost(A, B), newT = cost(newA, newB)
  • Need an efficient formula for newT; will use
      • E(a) = external cost of a in A = Σ W(a,b) for b
        in B
      • I(a) = internal cost of a in A = Σ W(a,a') for
        other a' in A
      • D(a) = cost of a in A = E(a) - I(a)
      • E(b), I(b) and D(b) are defined analogously for b in
        B
  • Consider swapping X = {a} and Y = {b}
      • newA = (A - {a}) U {b}, newB = (B - {b}) U {a}
      • newT = T - ( D(a) + D(b) - 2*w(a,b) ) = T - gain(a,b)
      • gain(a,b) measures the improvement gotten by swapping
        a and b
  • Update formulas:
      • newD(a') = D(a') + 2*w(a',a) - 2*w(a',b) for a'
        in A, a' != a
      • newD(b') = D(b') + 2*w(b',b) - 2*w(b',a) for b'
        in B, b' != b
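A small C sketch of these formulas (illustrative only; the dense weight matrix w, the side[] array, and the D[] array are assumed names, not from the slides):

    /* D(i) = E(i) - I(i) for node i; side[j] is 0 for A, 1 for B. */
    double D_cost(int i, int n, double w[n][n], const int *side) {
        double E = 0.0, I = 0.0;
        for (int j = 0; j < n; j++) {
            if (j == i) continue;
            if (side[j] != side[i]) E += w[i][j];   /* external */
            else                    I += w[i][j];   /* internal */
        }
        return E - I;
    }

    /* Improvement obtained by swapping a (in A) with b (in B). */
    double gain(int a, int b, int n, double w[n][n], const double *D) {
        return D[a] + D[b] - 2.0 * w[a][b];
    }

    /* Update D for the remaining nodes as though a and b had been swapped. */
    void update_D(int a, int b, int n, double w[n][n], const int *side, double *D) {
        for (int i = 0; i < n; i++) {
            if (i == a || i == b) continue;
            if (side[i] == 0) D[i] += 2.0 * w[i][a] - 2.0 * w[i][b];   /* i in A */
            else              D[i] += 2.0 * w[i][b] - 2.0 * w[i][a];   /* i in B */
        }
    }

These correspond to the gain and D-update steps used in the inner loop of the algorithm on the next slide.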

42
Kernighan/Lin Algorithm
    Compute T = cost(A,B) for initial A, B                       ... cost O(|N|^2)
    Repeat
        ... One pass greedily computes |N|/2 possible (X,Y) pairs to swap, then picks the best
        Compute costs D(n) for all n in N                        ... cost O(|N|^2)
        Unmark all nodes in N                                    ... cost O(|N|)
        While there are unmarked nodes                           ... |N|/2 iterations
            Find an unmarked pair (a,b) maximizing gain(a,b)     ... cost O(|N|^2)
            Mark a and b (but do not swap them)                  ... cost O(1)
            Update D(n) for all unmarked n,
                as though a and b had been swapped               ... cost O(|N|)
        Endwhile
        ... At this point we have computed a sequence of pairs (a1,b1), ..., (ak,bk)
        ... and gains gain(1), ..., gain(k), where k = |N|/2,
        ... numbered in the order in which we marked them
        Pick m maximizing Gain = Σ_{k=1..m} gain(k)              ... cost O(|N|)
        ... Gain is the reduction in cost from swapping (a1,b1) through (am,bm)
        If Gain > 0 then                                         ... it is worth swapping
            Update newA = A - {a1,...,am} U {b1,...,bm}          ... cost O(|N|)
            Update newB = B - {b1,...,bm} U {a1,...,am}          ... cost O(|N|)
            Update T = T - Gain                                  ... cost O(1)
        endif
    Until Gain <= 0
43
Comments on Kernighan/Lin Algorithm
  • The most expensive step is finding the unmarked pair (a,b)
    that maximizes gain(a,b): O(|N|^2) per iteration of the
    while loop, O(|N|^3) per pass overall
  • Some gain(k) may be negative, but if later gains
    are large, then the final Gain may be positive
      • This can escape local minima where swapping no single
        pair helps
  • How many times do we Repeat?
      • K/L was tested on very small graphs (|N| <= 360) and
        got convergence after 2-4 sweeps
      • For random graphs (of theoretical interest) the
        probability of convergence in one step appears to
        drop like 2^(-|N|/30)

44
Coordinate-Free Spectral Bisection
  • Based on the theory of Fiedler (1970s), popularized
    by Pothen, Simon, Liou (1990)
  • Motivation: by analogy to a vibrating string
  • Implementation: via the Lanczos algorithm
  • Note the circularity:
      • To optimize sparse matrix-vector multiply, we
        graph partition
      • To graph partition, we find an eigenvector of a
        matrix associated with the graph
      • To find an eigenvector, we do sparse matrix-vector
        multiply
      • No free lunch ...
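The slides defer the real implementation to Lanczos; purely as an illustration (a sketch assuming a small, dense 0/1 adjacency matrix adj), the Fiedler vector can also be approximated by power iteration on c*I - L, where L = D - A is the graph Laplacian, with the constant vector projected out:

    #include <math.h>
    #include <stdlib.h>

    void spectral_bisect(int n, const double *adj, int iters, int *part) {
        double *deg = calloc(n, sizeof(double));
        double *v = malloc(n * sizeof(double));
        double *u = malloc(n * sizeof(double));
        double maxdeg = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) deg[i] += adj[i * n + j];
            if (deg[i] > maxdeg) maxdeg = deg[i];
        }
        double c = 2.0 * maxdeg + 1.0;          /* >= largest eigenvalue of L (Gershgorin) */

        for (int i = 0; i < n; i++) v[i] = (i % 2) ? 1.0 : -1.0;   /* arbitrary start */
        for (int it = 0; it < iters; it++) {
            /* u = (c*I - L) v = c*v - deg.*v + A*v : one matrix-vector multiply per step */
            for (int i = 0; i < n; i++) {
                double Av = 0.0;
                for (int j = 0; j < n; j++) Av += adj[i * n + j] * v[j];
                u[i] = (c - deg[i]) * v[i] + Av;
            }
            /* project out the all-ones direction and normalize */
            double mean = 0.0, norm = 0.0;
            for (int i = 0; i < n; i++) mean += u[i];
            mean /= n;
            for (int i = 0; i < n; i++) { u[i] -= mean; norm += u[i] * u[i]; }
            norm = sqrt(norm);
            if (norm == 0.0) break;
            for (int i = 0; i < n; i++) v[i] = u[i] / norm;
        }
        for (int i = 0; i < n; i++) part[i] = (v[i] >= 0.0) ? 1 : 0;
        free(deg); free(v); free(u);
    }

Splitting at the median entry of v instead of at zero keeps the two halves equal in size.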

46
What About DAG Scheduling?
  • Each node weight corresponds to the task's execution
    time

[Figure: example DAG of eight tasks with node weights 10, 5, 12, 4, 4, 7, 8, 0]
47
List Scheduling
[Figure: min-min and max-min selection over per-task completion times; sufferage uses the difference between the 2nd smallest and the smallest completion time]
48
List Scheduling
  • MinMin (aggressively pick the task that can be
    done soonest):
      • for each task T, pick the host H that achieves the
        smallest completion time (CT) for task T
      • pick the task with the smallest such CT
      • schedule T on H
  • MaxMin (pick the largest tasks first):
      • for each task T, pick the host H that achieves the
        smallest CT for task T
      • pick the task with the largest such CT
      • schedule T on H
  • Sufferage (pick the task that would suffer the
    most if not picked):
      • for each task T, pick the host H that achieves the
        smallest CT for task T
      • for each task T, also find the host H' that achieves
        the second smallest CT, called CT'
      • pick the task with the largest (CT' - CT) value
      • schedule T on H
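A runnable C sketch of MinMin on the 3-task, 3-machine example that follows (the completion-time matrix is taken from the next slides, with rows = machines and columns = tasks; everything else is illustrative):

    #include <stdio.h>

    #define NT 3   /* tasks */
    #define NH 3   /* hosts */

    int main(void) {
        double ct[NH][NT] = { {10, 24, 23}, {16, 8, 30}, {70, 12, 27} };
        double ready[NH] = {0, 0, 0};     /* earliest time each host is free */
        int done[NT] = {0, 0, 0};
        double makespan = 0.0;

        for (int step = 0; step < NT; step++) {
            int best_t = -1, best_h = -1; double best_ct = 1e30;
            for (int t = 0; t < NT; t++) {
                if (done[t]) continue;
                /* best completion time for task t over all hosts */
                int h_min = 0; double c_min = ready[0] + ct[0][t];
                for (int h = 1; h < NH; h++) {
                    double c = ready[h] + ct[h][t];
                    if (c < c_min) { c_min = c; h_min = h; }
                }
                /* MinMin keeps the task whose best completion time is smallest;
                   flipping this comparison to "largest" gives MaxMin */
                if (c_min < best_ct) { best_ct = c_min; best_t = t; best_h = h_min; }
            }
            done[best_t] = 1;
            ready[best_h] = best_ct;
            if (best_ct > makespan) makespan = best_ct;
            printf("schedule T%d on H%d (completes at %g)\n", best_t + 1, best_h + 1, best_ct);
        }
        printf("makespan = %g\n", makespan);   /* prints 27, matching the walk-through below */
        return 0;
    }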

49
Example (MinMin)
  • 3 tasks, 3 machines

    Completion times (rows = machines H1-H3, columns = tasks T1-T3):

         T1  T2  T3
    H1   10  24  23
    H2   16   8  30
    H3   70  12  27

  • MinMin algorithm:
  • Smallest completion time per task: P1 = 10, P2 = 8, P3 = 23

50
Example (MinMin)
  • 3 tasks, 3 machines

    (same completion-time matrix as above)

  • MinMin algorithm:
  • P1 = 10, P2 = 8, P3 = 23
  • Pick T2, schedule it on H2

51
Example (MinMin)
  • 3 tasks, 3 machines

    (same completion-time matrix as above)

  • MinMin algorithm:
  • P1 = 10, P2 = 8, P3 = 23
  • Pick T2, schedule it on H2
  • Update matrix (shown below; H2 is now busy until time 8)

         T1  T3
    H1   10  23
    H2   24  38
    H3   70  27
52
Example (MinMin)
  • 3 tasks, 3 machines

    (same completion-time matrix as above)

  • MinMin algorithm:
  • P1 = 10, P2 = 8, P3 = 23
  • Pick T2, schedule it on H2
  • Update matrix (shown below)
  • P1 = 10, P3 = 23

         T1  T3
    H1   10  23
    H2   24  38
    H3   70  27
53
Example (MinMin)
  • 3 tasks, 3 machines

    (same completion-time matrix as above)

  • MinMin algorithm:
  • P1 = 10, P2 = 8, P3 = 23
  • Pick T2, schedule it on H2
  • Update matrix (shown below)
  • P1 = 10, P3 = 23
  • Pick T1, schedule it on H1

         T1  T3
    H1   10  23
    H2   24  38
    H3   70  27
54
Example (MinMin)
  • 3 tasks, 3 machines

    (same completion-time matrix as above)

  • MinMin algorithm:
  • P1 = 10, P2 = 8, P3 = 23
  • Pick T2, schedule it on H2
  • Update matrix (first table below)
  • P1 = 10, P3 = 23
  • Pick T1, schedule it on H1
  • Update matrix (second table below)

         T1  T3
    H1   10  23
    H2   24  38
    H3   70  27

         T3
    H1   33
    H2   38
    H3   27
55
Example (MinMin)
  • 3 tasks, 3 machines

    (same completion-time matrix as above)

  • MinMin algorithm:
  • P1 = 10, P2 = 8, P3 = 23
  • Pick T2, schedule it on H2
  • Update matrix (first table below)
  • P1 = 10, P3 = 23
  • Pick T1, schedule it on H1
  • Update matrix (second table below)
  • P3 = 27
  • Pick T3, schedule it on H3
  • Makespan = 27 seconds

         T1  T3
    H1   10  23
    H2   24  38
    H3   70  27

         T3
    H1   33
    H2   38
    H3   27
56
Example (MaxMin)
  • 3 tasks, 3 machines

    (same completion-time matrix as in the MinMin example)

  • MaxMin algorithm:
  • P1 = 10, P2 = 8, P3 = 23
  • Pick T3, schedule it on H1
  • Update matrix (first table below)
  • P1 = 24, P2 = 8
  • Pick T1, schedule it on H2
  • Update matrix (second table below)
  • P2 = 12
  • Pick T2, schedule it on H3
  • Makespan = 24 seconds

         T1  T2
    H1   33  47
    H2   24   8
    H3   70  12

         T2
    H1   47
    H2   32
    H3   12
57
Resulting Schedules
    MinMin schedule (makespan 27):
        machine 1: Task 1
        machine 2: Task 2
        machine 3: Task 3

    MaxMin schedule (makespan 24):
        machine 1: Task 3
        machine 2: Task 1
        machine 3: Task 2
58
DAGs?
  • While independent tasks occur in real
    applications, the most general model of
    computation is a Directed Acyclic Graph (DAG)
      • A set of weighted nodes
      • A set of edges
  • Representative of tasks that have dependencies
    among each other

59
Example of DAG
  • DAG length = 5: the number of nodes on the longest path

[Figure: the example DAG of eight tasks with node weights 10, 5, 12, 4, 4, 7, 8, 0]
60
Example of DAG
  • The DAG has 5 levels: 5 sets of tasks that can be
    done concurrently

[Figure: the same example DAG, grouped by level]
61
Example of DAG
  • DAG width = 3: the size of the largest level; no more
    than 3 processors are useful for running this DAG

[Figure: the same example DAG]
62
Example of DAG
  • Critical path = 34: the sum of the weights along the
    heaviest path; this is a lower bound on the DAG
    execution time

[Figure: the same example DAG]
63
Where do DAGs come from?
  • Some applications are naturally structured as
    DAGs
      • Example: image processing
      • Apply a bunch of filters whose outputs feed into
        each other's inputs
  • But other than that, DAGs emerge from the code to
    parallelize
      • Example: linear system back-solve
      • Ax = b, where A is lower triangular

    for (i = 0; i < n; i++) {
      x[i] = b[i] / a[i][i];            // Task T(i,i)
      for (j = i+1; j < n; j++)
        b[j] = b[j] - a[j][i] * x[i];   // Task T(i,j)
    }

  • This leads to a DAG (see next slide)

64
Where do DAGs come from?
[Figure: the resulting task DAG for n = 5, nodes T(1,1) through T(5,5); code as on the previous slide]
65
Where do DAGs come from?
[Figure: the same task DAG and code as on the previous slide]
66
Where do DAGs come from?
  • 9 levels, width = 4, length = 9

[Figure: the same task DAG for n = 5; code as on the previous slides]
67
DAG Scheduling Problem
  • Question:
      • I have a bunch of processors
      • Let's assume that they're identical
      • I have a DAG
      • Which processor does which task so that the DAG
        execution time is minimized?
  • The solution is called a schedule
      • A list of assignments of tasks to processors
      • P1 does T1 and T4
      • P2 does T2 and T7
      • etc.
  • Goal: find the optimal schedule
      • NP-hard
      • List scheduling is often used
68
Critical Path
  • The critical path gives a lower bound on the
    execution time
  • Therefore, it's intuitively a good idea to
    perform all tasks on the critical path as fast as
    possible
      • Running them slowly is certain to decrease
        performance
  • Therefore, people have developed DAG scheduling
    techniques that account for the critical path
    when scheduling tasks
  • Let's look at one possibility

69
Scheduling for Critical Path
  • First step: compute the weight of the critical
    path

[Figure: example DAG of tasks T1-T8 with node weights 10, 4, 8, 8, 18, 1, 2, 0; critical path weight CP = 33]
70
Scheduling for Critical Path
  • Second step: for each task, compute the weight of
    the heaviest path from that task to the exit node

[Figure: the same DAG annotated with these path weights, e.g., 33 for the entry task T1]
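A small sketch of this second step (illustrative names, not from the slides): with the DAG's successor lists in CSR form and a topological order already available, the weight of the heaviest path from each task to the exit node ("bottom level") follows in one reverse sweep:

    /* bl[v] = weight of the heaviest path from v to an exit node. */
    void bottom_levels(int n, const double *weight,
                       const int *xsucc, const int *succ,   /* CSR successor lists */
                       const int *topo,                     /* a topological order of 0..n-1 */
                       double *bl) {
        for (int k = n - 1; k >= 0; k--) {        /* visit nodes in reverse topological order */
            int v = topo[k];
            double best = 0.0;                    /* exit nodes have no successors */
            for (int e = xsucc[v]; e < xsucc[v + 1]; e++) {
                int w = succ[e];
                if (bl[w] > best) best = bl[w];
            }
            bl[v] = weight[v] + best;
        }
    }

Sorting the tasks by decreasing bl[] (equivalently, by increasing CP - bl[], the value computed in the next step) gives the priority order used on the following slides.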
71
Scheduling for Critical Path
  • Third step: for each task, compute CP minus the weight
    obtained in the second step

[Figure: the same DAG annotated with these values, e.g., 0 for the entry task T1]
72
Scheduling for Critical Path
  • Fourth step: sort the tasks in increasing order of that
    value: T1, T2, T4, T5, T3, T7, T6, T8

[Figure: the same DAG annotated with the sorted values]
73
Scheduling for Critical Path
  • Fifth step: assign the tasks to processors in that order:
    T1, T2, T4, T5, T3, T7, T6, T8

[Figure: the example DAG (node weights 10, 4, 8, 8, 18, 1, 2, 0) and the resulting schedule on processors P1 and P2]
74
More Complex cases
  • There can be communication among tasks
      • Denoted by edge weights
      • Network transfer times add to the computation
        times
  • The underlying platform can be heterogeneous
      • This makes the scheduling process much more
        complicated

75
Conclusion
  • Scheduling is the land of heuristics:
      • come up with intuitive reasoning for what a
        good schedule may look like
      • validate it via simulation (analytical results are
        typically not possible)
      • announce it as yet another scheduling heuristic
        (MCP, ETF, DSC, DLS, ...)
  • It's a good idea to know what types of heuristics
    are out there
      • People in the field often implement nothing
        beyond a greedy algorithm, which may be extremely
        harmful in many cases