Scheduling Task Dependence Graphs with Variable Task Execution Times onto Heterogeneous Multiprocessors

Provided by: hsienhs

1
Scheduling Task Dependence Graphs with Variable
Task Execution Times onto Heterogeneous
Multiprocessors
  • N. R. Satish
  • K. Ravindran
  • K. Keutzer
  • University of California at Berkeley

2
Outline
  • Motivation
  • Static Scheduling
  • Statistical Variations in real-life applications
  • Statistical Scheduling
  • Optimization Techniques
  • Results

4
Static task allocation and scheduling
  • Important step in deploying concurrent
    applications on multiprocessors
  • Key components
      • Allocate tasks to processors
      • Schedule task executions in time
  • Assume knowledge of workload and parallel tasks
    at compile time
  • Relevant when dynamic scheduling is prohibitive
  • Viable for design space exploration

[Figure: design flow with blocks Application description, Task Graph, Multiprocessor platform, Platform Constraints, Allocation/Scheduling, Implementation]
5
Limitations of static models
  • Static models do not capture variations in task
    execution times and dependencies
      • Dependence on inputs
      • Variations in memory access time due to cache
        effects [1]
  • Static scheduling methods rely on corner-case or
    average-case estimates
      • Worst-case estimates are used in hard real-time
        applications
  • Soft real-time applications (video
    encoding/decoding, networking, gaming) only
    require statistical guarantees on latency and
    throughput, which are hard to provide with static
    methods

[1] P. Moge and A. Kalavade, "A Tool for
Performance Estimation of Networked Embedded
End-Systems," DAC '98, pp. 257-262, 1998.
6
Proposed solution
  • Capture runtime variations using a statistical
    model for task execution times
  • Compute statistical distributions for the metric
    of interest (schedule length/makespan)
  • Compute percentiles to provide statistical
    guarantees

8
Static Scheduling Model
  • Task dependence graph
      • G = (V, E)
      • DAG with V = set of tasks, E = task dependencies
      • w(v,p) = execution time of task v on processor p
      • Each task is executed sequentially without
        preemption
      • c(e,l) = communication delay along edge e,
        incurred when tasks (u,v) on edge e communicate
        over communication link l
  • Multiprocessor model
      • P = set of processors
      • Connected by communication links

[Figure: task dependence graph for IPv4 packet forwarding]
[Figure: architecture model for a pipeline of three processors instantiated on a Xilinx FPGA; blocks labeled P1, P2, P3, M1, M2]
  • Restrictions
      • Lookups have to be on P2 or P3
      • Recv must be on P1
      • Send must be on P3
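
The model on this slide can be captured in a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `TaskGraph`, its fields, and the example execution times are all hypothetical, and the communication delay c(e,l) is simplified to one value per edge, ignoring which link is used.

```python
from dataclasses import dataclass, field

@dataclass
class TaskGraph:
    """Task dependence graph G = (V, E) plus the timing tables w and c."""
    tasks: list                                   # V: task names
    edges: list                                   # E: (u, v) dependence pairs
    w: dict                                       # w[(v, p)]: execution time of v on p
    c: dict = field(default_factory=dict)         # c[(u, v)]: delay if u, v are on different processors
    allowed: dict = field(default_factory=dict)   # optional restrictions: task -> set of processors

    def can_run_on(self, v, p):
        """True if v has a time on p and no restriction forbids the placement."""
        return (v, p) in self.w and p in self.allowed.get(v, {p})

    def topological_order(self):
        """Return the tasks in dependence order; fail if the graph is not a DAG."""
        indeg = {v: 0 for v in self.tasks}
        for _, v in self.edges:
            indeg[v] += 1
        ready = [v for v in self.tasks if indeg[v] == 0]
        order = []
        while ready:
            u = ready.pop()
            order.append(u)
            for a, b in self.edges:
                if a == u:
                    indeg[b] -= 1
                    if indeg[b] == 0:
                        ready.append(b)
        if len(order) != len(self.tasks):
            raise ValueError("dependence graph contains a cycle")
        return order

# Hypothetical three-task slice of the IPv4 example (times are made up).
procs = ["P1", "P2", "P3"]
g = TaskGraph(
    tasks=["Recv", "Lookup", "Send"],
    edges=[("Recv", "Lookup"), ("Lookup", "Send")],
    w={(v, p): 10 for v in ["Recv", "Lookup", "Send"] for p in procs},
    allowed={"Recv": {"P1"}, "Lookup": {"P2", "P3"}, "Send": {"P3"}},
)
```

The `allowed` table encodes the placement restrictions listed above, so an allocator can filter candidate processors with `g.can_run_on(task, proc)` before evaluating a schedule.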

9
Optimization Problem Definition
  • Find
      • Allocation
      • Schedule
  • Subject to
  • Minimize
      • Makespan

[Figure: example schedule; longest path = makespan = 165]
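
The constraint formulas on this slide did not survive in the transcript. As a hedged reconstruction only, a conventional way to state the problem with the symbols of the static scheduling model is sketched below. The allocation map A, the start times s(v), and the exact form of the constraints are assumptions, not the authors' formulation; l denotes the link between A(u) and A(v).

```latex
\begin{aligned}
\text{find}\quad & A : V \to P, \qquad s : V \to \mathbb{R}_{\ge 0} \\
\text{minimize}\quad & \max_{v \in V} \bigl( s(v) + w(v, A(v)) \bigr) \\
\text{subject to}\quad
 & s(v) \ge s(u) + w(u, A(u)) + c(e, l)
   && \forall e = (u, v) \in E,\ A(u) \ne A(v) \\
 & s(v) \ge s(u) + w(u, A(u))
   && \forall (u, v) \in E,\ A(u) = A(v) \\
 & s(v) \ge s(u) + w(u, A(u)) \ \lor\ s(u) \ge s(v) + w(v, A(v))
   && \forall u \ne v,\ A(u) = A(v)
\end{aligned}
```

The last constraint says that two tasks on the same processor cannot overlap, which is what makes the makespan equal to a longest path once an execution order is fixed on each processor.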
11
Variability in IPv4 packet forwarding
  • IPv4 forwarding involves route table lookup
  • Longest prefix match lookup on a trie table
  • Number of IP address bits required for lookup can
    vary
  • We implement IPv4 on a soft multiprocessor on an
    FPGA, so variation due to the architecture is minimal

12
Example: H.264 video decoding
[Figure: H.264 decoder stages labeled Parsing, Decoding, Filtering]
  • H.264 video is organized into frames and
    16x16-pixel macroblocks (MBs)
  • H.264 standard contains two types of MBs
      • Intra MB: depends on decoded neighboring MBs in
        the current frame
      • Predicted MB: depends on MBs from previously
        decoded frames
  • Both types contain an input-dependent, variable
    amount of residual data

[Figure: task graphs for decoding in intra (I) and predicted (P) frames]
13
Variations in H.264 decoding
  • Probabilistic dependencies
      • At compile time, we can only get the probability
        of a macroblock being an I- or P-macroblock
  • Probabilistic execution times
      • Execution times of I- and P-macroblocks depend on
        the residual data present in the macroblock and
        also on memory access time variations

Prob(MB is of type I) = p => probability of existence
of each edge = p
[Figure: variation in I-macroblock (a) and P-macroblock (b) execution times in H.264 decoding]
15
Statistical Optimization
  • Optimization metrics such as throughput become
    distributions
  • Typically optimize for a fixed QoS (soft
    real-time applications like media and networking)
      • For instance, we may want to define the makespan
        as the 99th percentile of completion time
  • We could consider worst-case execution times, but
    that is too conservative, while average-case
    could be too optimistic

16
Statistical Models
  • Model task execution times using distributions to
    account for variations
      • Simulation-based model
      • Use discretized bins for storing the pdf
  • Task dependencies may be probabilistic
      • Edges may have probabilities associated with them
17
Statistical Analysis: Monte Carlo simulations
  • Given a schedule, compute the finish time
    (makespan) distribution
  • Add edges corresponding to total ordering of
    tasks within a processor (ordering edges)
  • Longest path problem on the graph with ordering
    edges
  • Monte Carlo analysis

Deterministic worst case = 190; deterministic
average case = 125
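
The analysis steps on this slide can be sketched as follows: add ordering edges for the tasks on each processor, then repeatedly sample execution times and take the longest path. This is an illustrative sketch under stated assumptions, not the authors' code: communication delays are ignored, `order` must be a topological order of the dependence and ordering edges together, and the diamond-shaped graph, its times, and all function names are hypothetical.

```python
import math
import random

def makespan(order, preds, exec_time):
    """Longest path through the DAG for one set of sampled execution times."""
    finish = {}
    for v in order:
        start = max((finish[u] for u in preds.get(v, ())), default=0)
        finish[v] = start + exec_time[v]
    return max(finish.values())

def monte_carlo_makespan(order, dep_edges, schedule, sample_time, runs=2000, seed=0):
    """Sample the makespan distribution of a fixed allocation and schedule.

    schedule: {processor: [tasks in execution order]}
    sample_time(v, rng): draws one execution time for task v
    """
    rng = random.Random(seed)
    preds = {}
    for u, v in dep_edges:
        preds.setdefault(v, []).append(u)
    # Ordering edges: consecutive tasks on one processor cannot overlap.
    for tasks in schedule.values():
        for u, v in zip(tasks, tasks[1:]):
            preds.setdefault(v, []).append(u)
    samples = []
    for _ in range(runs):
        times = {v: sample_time(v, rng) for v in order}
        samples.append(makespan(order, preds, times))
    return sorted(samples)

def percentile(sorted_samples, q):
    """Empirical q-quantile by nearest rank, for q in (0, 1]."""
    rank = max(1, math.ceil(q * len(sorted_samples)))
    return sorted_samples[rank - 1]

# Hypothetical diamond graph: a -> {b, c} -> d, with b taking 20 or (rarely) 60.
order = ["a", "b", "c", "d"]
deps = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
schedule = {"P1": ["a", "b", "d"], "P2": ["c"]}

def sample_time(v, rng):
    if v == "b":
        return 60 if rng.random() < 0.1 else 20
    return {"a": 5, "c": 30, "d": 5}[v]

samples = monte_carlo_makespan(order, deps, schedule, sample_time)
p50, p99 = percentile(samples, 0.50), percentile(samples, 0.99)
```

In this toy case the makespan is either 40 or 70, so the median and the 99th percentile differ, which is the kind of gap the deterministic average-case and worst-case estimates cannot express.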
18
Statistical Analysis chooses better schedules
  • Det. avg. case = 125 (opt), det. worst case = 190,
    99th percentile = 170
  • Det. avg. case = 150, det. worst case = 165 (opt),
    99th percentile = 150
  • Det. avg. case = 125, det. worst case = 165,
    99th percentile = 145 (opt)

Worst-case and average-case scheduling can judge
sub-optimal solutions to be optimal!
20
Techniques for Statistical Optimization
  • Heuristics
      • List scheduling
      • Clustering
      • Force-directed scheduling
  • Evolutionary algorithms
      • Simulated annealing
      • Hill climbing, tabu search, genetic algorithms,
        ant-colony optimization
  • Exact constraint-optimization-based techniques
      • Based on mathematical programming

21
Static List Scheduling Example (DLS)
while (not all tasks are scheduled)
    Compute a priority level for each task-processor pair
    Select the task-processor pair with the highest
    priority and schedule the task on that processor
[Figure: algorithm execution on the task dependence graph for IPv4 packet forwarding (Recv, Verify checksum, Verify TTL, Lookup 1 to Lookup 7, Update TTL, Update checksum, Send), with numeric annotations on the tasks]
Worst-case makespan = 205; 99th percentile of makespan = 185; optimum 99th percentile = 145
static_level(v) = delay of the longest path from v to snk
priority_level(v,p) = static_level(v)
    - max{earliest_start_time(v,p), earliest_ready_time(p)}
Ref: G. C. Sih and E. A. Lee, "A Compile-time
Scheduling Heuristic for Interconnection-Constrained
Heterogeneous Processor Architectures," IEEE
Trans. Parallel Distrib. Syst. 4, 6, June 1993,
pp 75-87.
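
The loop and priority formula on this slide can be sketched in a few lines. This is a simplified illustration, not Sih and Lee's full algorithm: it assumes static levels are computed from the mean execution time over processors, uses a single uniform communication delay `comm` between different processors, and breaks ties by taking the first candidate found. The function names and the three-task example are hypothetical.

```python
def static_levels(tasks, succs, w_est):
    """static_level(v): longest path from v to the sink, using w_est[v]."""
    level = {}
    def visit(v):
        if v not in level:
            level[v] = w_est[v] + max((visit(s) for s in succs.get(v, ())), default=0)
        return level[v]
    for v in tasks:
        visit(v)
    return level

def dls(tasks, edges, procs, w, comm=0):
    """List scheduling with priority = static_level(v) - earliest start of v on p."""
    succs, preds = {}, {}
    for u, v in edges:
        succs.setdefault(u, []).append(v)
        preds.setdefault(v, []).append(u)
    # Assumption: static levels use the mean execution time over processors.
    w_est = {v: sum(w[v, p] for p in procs) / len(procs) for v in tasks}
    level = static_levels(tasks, succs, w_est)
    proc_ready = {p: 0 for p in procs}
    placed, finish = {}, {}
    while len(placed) < len(tasks):
        best = None
        for v in tasks:
            if v in placed or any(u not in placed for u in preds.get(v, ())):
                continue  # already scheduled, or a predecessor is still pending
            for p in procs:
                data_ready = max((finish[u] + (comm if placed[u] != p else 0)
                                  for u in preds.get(v, ())), default=0)
                start = max(data_ready, proc_ready[p])
                priority = level[v] - start
                if best is None or priority > best[0]:
                    best = (priority, v, p, start)
        _, v, p, start = best
        placed[v], finish[v] = p, start + w[v, p]
        proc_ready[p] = finish[v]
    return placed, finish

# Hypothetical example: a and b both feed c; two identical processors.
tasks, procs = ["a", "b", "c"], ["P1", "P2"]
times = {"a": 10, "b": 20, "c": 5}
w = {(v, p): times[v] for v in tasks for p in procs}
placed, finish = dls(tasks, [("a", "c"), ("b", "c")], procs, w)
```

Here the task with the larger static level (b) is placed first, the other source task goes to the idle processor, and the resulting makespan is `max(finish.values())`.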
22
Problem with static list scheduling
  • List scheduling works by using task criticalities
  • Static analysis wrongly evaluates critical tasks

[Figure: the IPv4 task dependence graph with values shown as pairs (155/135, 150/130, 130/110, 110/90, 90/70, 70/50) and legend entries "Worst case" and "At 99th percentile"]
static_level(v) = delay of the longest path from v to snk
  • Solution: use statistically analyzed static
    levels and processor finish times
  • Rest of the algorithm is unchanged; we obtain a
    makespan of 165 instead of 185

priority_level(v,p) = statistical_static_level(v)
    - max{statistical_earliest_start_time(v,p),
          statistical_earliest_ready_time(p)}
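
The slides do not spell out how the statistical static levels are obtained. One plausible reading, sketched here purely as an assumption, is to sample task execution times and take the required percentile of the longest path from each task to the sink. The function name, the two-task chain, and its timing distribution are hypothetical.

```python
import math
import random

def statistical_static_levels(order, succs, sample_time, q=0.99, runs=2000, seed=0):
    """q-quantile of the longest path delay from each task to the sink.

    order: tasks in topological order; succs: {task: [successor tasks]}
    sample_time(v, rng): draws one execution time for task v
    """
    rng = random.Random(seed)
    samples = {v: [] for v in order}
    for _ in range(runs):
        t = {v: sample_time(v, rng) for v in order}
        level = {}
        for v in reversed(order):   # sinks first, so successors are known
            level[v] = t[v] + max((level[s] for s in succs.get(v, ())), default=0)
            samples[v].append(level[v])
    rank = max(1, math.ceil(q * runs))
    return {v: sorted(xs)[rank - 1] for v, xs in samples.items()}

# Hypothetical chain of two lookups, each 20 units, or 60 with probability 0.05.
def sample_time(v, rng):
    return 60 if rng.random() < 0.05 else 20

levels = statistical_static_levels(["l1", "l2"], {"l1": ["l2"]}, sample_time)
```

The worst-case static level of `l1` in this toy chain is 120, because both lookups would have to be slow at once. That happens with probability 0.0025, so the 99th-percentile level is lower, which changes which tasks the list scheduler treats as critical.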
23
Simulated Annealing (SA)
Inputs: initial state s0, initial temperature
t0, final temperature t∞
  • Key characteristics
      • Function Temp: defines the temperature update
        function
      • Function Move: specifies state neighborhoods
      • Function Cost: the optimization objective
      • Function Prob: transition acceptance
        probabilities
      • Parameters t0, t∞: initial and final
        temperatures

[Flowchart]
  1. Iteration count i = 0
  2. ti = Temp(t0, i)
  3. s' = Move(si)
  4. Δ = Cost(s') - Cost(si)
  5. If Δ < 0 or Prob(Δ, ti) > Rand(0,1): accept the
     transition (si+1 = s') and update the best seen state
  6. Increment i
  7. If ti < t∞: output the best state; otherwise
     return to step 2
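
The flowchart corresponds to the generic annealing loop sketched below. The slide leaves Temp and Prob abstract, so the geometric cooling schedule, the Metropolis acceptance rule exp(-Δ/t), and the default parameter values are assumptions chosen for illustration; the toy cost function is hypothetical.

```python
import math
import random

def simulated_annealing(s0, cost, move, t0=100.0, t_final=0.1, alpha=0.95, seed=0):
    """Return (best state, best cost) seen while cooling from t0 to t_final."""
    rng = random.Random(seed)
    s, s_cost = s0, cost(s0)
    best, best_cost = s, s_cost
    i = 0
    while True:
        t = t0 * alpha ** i                      # Temp(t0, i): geometric cooling (assumed)
        if t < t_final:
            return best, best_cost
        cand = move(s, rng)
        cand_cost = cost(cand)
        delta = cand_cost - s_cost
        # Prob(delta, t): Metropolis rule exp(-delta / t) (assumed)
        if delta < 0 or math.exp(-delta / t) > rng.random():
            s, s_cost = cand, cand_cost          # accept the transition
            if s_cost < best_cost:
                best, best_cost = s, s_cost      # update best seen state
        i += 1

# Toy usage: walk along the integers to minimize (x - 3)^2.
best, best_cost = simulated_annealing(
    20, cost=lambda x: (x - 3) ** 2, move=lambda x, rng: x + rng.choice((-1, 1)))
```

Keeping `cost` and `move` as parameters mirrors the slide: the same loop can take a deterministic worst-case cost or a percentile-based cost without changes.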
24
SA Strategy for Statistical Scheduling
  • State space: set of valid allocations and
    schedules
  • Cost: the required percentile of the makespan of
    a valid allocation and schedule (statistical
    analysis / deterministic worst-case analysis)
  • Move: follow the approach for static scheduling
    in Koch et al.
      • Move a randomly selected task from one processor
        to a random position on another processor
      • Not all positions are acceptable: ordering or
        dependence constraints may be violated. If so, we
        undo the move
      • Compute the new schedule
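
A minimal sketch of such a move is given below. It is not taken from Koch et al.: the validity check (the dependence edges plus the per-processor orderings must stay acyclic), the optional `allowed` placement table, and all names are assumptions. It expects at least two processors and a schedule that contains every task.

```python
import random

def respects_dependences(schedule, edges):
    """True if the dependence edges plus per-processor orderings are acyclic."""
    succs = {}
    indeg = {v: 0 for tasks in schedule.values() for v in tasks}
    ordering = [(u, v) for tasks in schedule.values() for u, v in zip(tasks, tasks[1:])]
    for u, v in list(edges) + ordering:
        succs.setdefault(u, []).append(v)
        indeg[v] += 1
    ready = [v for v, d in indeg.items() if d == 0]
    seen = 0
    while ready:
        u = ready.pop()
        seen += 1
        for v in succs.get(u, ()):
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return seen == len(indeg)

def move(schedule, edges, rng, allowed=None):
    """Move one random task to a random slot on another processor.

    Returns a new schedule, or the unchanged input schedule when the move
    would violate a placement restriction or a dependence (the "undo").
    """
    new = {p: list(tasks) for p, tasks in schedule.items()}
    src = rng.choice([p for p in new if new[p]])
    task = rng.choice(new[src])
    dst = rng.choice([p for p in new if p != src])
    if allowed is not None and task in allowed and dst not in allowed[task]:
        return schedule
    new[src].remove(task)
    new[dst].insert(rng.randrange(len(new[dst]) + 1), task)
    return new if respects_dependences(new, edges) else schedule

# Hypothetical example: a must precede b and c.
sched = {"P1": ["a", "b"], "P2": ["c"]}
deps = [("a", "b"), ("a", "c")]
candidate = move(sched, deps, random.Random(0))
```

Because `move` copies the schedule before editing it, undoing a rejected move is just a matter of returning the original object; the cost of the accepted candidate would then be recomputed by the statistical analysis.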

25
Results
  • Two sets of experiments
      • Practical applications: IPv4 packet forwarding,
        H.264 video decoding, MPEG-2 encoding
          • Code profiled for IPv4 on a soft MicroBlaze
            processor, others on a 2.4 GHz Pentium
          • IPv4 task graph is replicated for multiple
            input ports
          • For H.264, we assume knowledge of
            per-macroblock probabilities of being an
            I-macroblock
      • Random task graphs from Davidovic et al.
          • Means taken from benchmarks, standard
            deviation between 0 and 0.7 × mean
  • Compared statistical DLS and statistical SA to
    worst-case and average-case SA results

26
Results
Statistical scheduling works 10-30% better than
static scheduling
  • Det. worst-case is about 10-20% off the
    statistical makespans at the 99th percentile and
    30% off at the 90th percentile
  • Expected trend, since worst-case estimates become
    worse predictors of makespan at lower percentiles
  • Statistical optimization is customized for a
    particular percentile value
  • More improvement from larger benchmarks

[Figure: percentage makespan difference between det. worst-case and statistical scheduling at different percentiles on realistic apps]
[Figure: average percentage makespan difference between det. worst-case and statistical scheduling at different percentiles for random task graph instances on 4, 6, and 8 processors]
27
Thank you for your attention!
  • Questions?