Lifetime Reliability-Aware Task Allocation and Scheduling for MPSoC Platforms

1
Lifetime Reliability-Aware Task Allocation and
Scheduling for MPSoC Platforms
  • Lin Huang, Feng Yuan and Qiang Xu
  • Reliable Computing Laboratory
  • Department of Computer Science and Engineering
  • The Chinese University of Hong Kong
  • DATE 2009

2
Lifetime Reliability of Embedded Multiprocessor
Platform
  • Multiprocessor system-on-a-chip (MPSoC)
  • Platform-based design
  • Hardware / software co-synthesis
  • Reliability issue
  • IC product wear-out → lifetime reliability
    threats
  • Time dependent dielectric breakdown (TDDB),
    electromigration (EM), stress migration (SM),
    negative bias temperature instability (NBTI)
  • Soft errors

3
Prior Work
  • Prior work in reliability-driven task allocation
    and scheduling
  • Constant failure rate
  • Limitation of thermal-aware task scheduling
  • Might improve the system's lifetime reliability
    implicitly
  • Not readily applicable, especially for
    heterogeneous MPSoC

4
Problem Motivation Example
  • Electromigration
  • Suppose [condition missing from transcript], and
    all other parameters are the same
  • P1 ages much faster than P2, dominating the
    MPSoC lifetime

5
Problem Formulation
  • Task allocation and scheduling
  • Output
  • Aim to maximize the expected service life (mean
    time to failure, MTTF) of the MPSoC system under
    the performance constraint

[Output labels: Binding, Scheduling]
6
Lifetime Reliability Estimation
  • Electromigration
  • Denote by R(t) the reliability of a
    single processor at time t
  • Expected service life
  • Weibull distribution

Computed by existing hard error models
Reflect some important factors (e.g.,
architecture properties)
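The Weibull bullet above can be illustrated with a small sketch. It assumes the standard two-parameter Weibull form, R(t) = exp(-(t / scale)^slope), whose mean is scale * Gamma(1 + 1/slope). The names `scale` and `slope` and the numeric values are hypothetical, since the slide's own formula was not preserved in the transcript.

```python
import math

def weibull_reliability(t, scale, slope):
    """R(t) = exp(-(t / scale) ** slope) for a two-parameter Weibull model."""
    return math.exp(-((t / scale) ** slope))

def weibull_mttf(scale, slope):
    """Mean of the Weibull distribution: scale * Gamma(1 + 1 / slope)."""
    return scale * math.gamma(1.0 + 1.0 / slope)

# Hypothetical parameters, for illustration only.
scale_years = 10.0
slope = 2.0
print(weibull_reliability(5.0, scale_years, slope))
print(weibull_mttf(scale_years, slope))
```

With slope equal to 1 the model reduces to the constant-failure-rate (exponential) case mentioned under Prior Work, and the MTTF equals the scale parameter.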
7
Main Approach: Simulated Annealing
  • Solution representation
  • (schedule order sequence; resource assignment
    sequence)
  • For example, (0, 1, 3, 2, 4; P2, P2, P2, P1, P1)
  • Schedule order sequence: partial order defined by
    the task graph
  • Every solution corresponds to a feasible schedule
  • Schedule reconstruction
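A minimal sketch of how a schedule might be reconstructed from the pair of sequences above. The slides do not give the reconstruction procedure, so this assumes a simple rule: each task starts as soon as its predecessors have finished and its processor is idle. The task graph (`preds`) and the execution times are hypothetical.

```python
def reconstruct_schedule(order, assignment, exec_time, preds):
    """Turn (schedule order; resource assignment) into start/finish times.

    order:      task ids in schedule order (must respect the task graph)
    assignment: assignment[i] is the processor of order[i]
    exec_time:  exec_time[(task, processor)] -> duration
    preds:      preds[task] -> list of predecessor tasks
    """
    proc_free = {}  # processor -> time at which it becomes idle
    finish = {}     # task -> finish time
    schedule = {}   # task -> (processor, start, finish)
    for task, proc in zip(order, assignment):
        ready = max((finish[p] for p in preds.get(task, [])), default=0.0)
        start = max(ready, proc_free.get(proc, 0.0))
        finish[task] = start + exec_time[(task, proc)]
        proc_free[proc] = finish[task]
        schedule[task] = (proc, start, finish[task])
    return schedule

# Hypothetical task graph and execution times, for illustration only.
preds = {1: [0], 4: [2, 3]}
exec_time = {(t, p): (2.0 if p == "P1" else 1.0)
             for t in range(5) for p in ("P1", "P2")}
order = (0, 1, 3, 2, 4)
assignment = ("P2", "P2", "P2", "P1", "P1")
schedule = reconstruct_schedule(order, assignment, exec_time, preds)
for task in order:
    print(task, schedule[task])
```

Because tasks are placed in schedule order and that order respects the precedence constraints, every predecessor already has a finish time when a task is placed, which is why every solution maps to a feasible schedule.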

8
Main Approach: Simulated Annealing
  • Transforms of the directed acyclic graph
  • Expanded task graph
  • Undirected complement graph
  • Lemma: Given a valid schedule order, swapping two
    adjacent nodes leads to another valid schedule
    order, provided there is an edge between these
    two nodes in the complement graph

[Figure: Task Graph, Expanded Task Graph, and Complement Graph for tasks T0 to T4]
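The lemma can be sketched in code under one reading of the slide: the expanded task graph is taken to be the transitive closure of the task graph, and the complement graph joins every pair of tasks with no precedence relation in either direction. The edge list below is hypothetical and is not the graph drawn on the slide.

```python
from itertools import combinations

def transitive_closure(n, edges):
    """reach[u] = set of tasks that must run after u (expanded task graph)."""
    succ = {u: set() for u in range(n)}
    for u, v in edges:
        succ[u].add(v)
    reach = {}

    def visit(u):
        if u not in reach:
            reach[u] = set()
            for v in succ[u]:
                reach[u] |= {v} | visit(v)
        return reach[u]

    for u in range(n):
        visit(u)
    return reach

def complement_graph(n, edges):
    """Undirected edges between tasks that are unordered by the task graph."""
    reach = transitive_closure(n, edges)
    return {frozenset((u, v)) for u, v in combinations(range(n), 2)
            if v not in reach[u] and u not in reach[v]}

def is_valid_order(order, edges):
    pos = {t: i for i, t in enumerate(order)}
    return all(pos[u] < pos[v] for u, v in edges)

def swap_adjacent(order, i, comp):
    """Swap order[i] and order[i + 1] if the complement graph allows it."""
    if frozenset((order[i], order[i + 1])) not in comp:
        return None
    new = list(order)
    new[i], new[i + 1] = new[i + 1], new[i]
    return tuple(new)

# Hypothetical precedence edges, for illustration only.
edges = [(0, 1), (2, 4), (3, 4)]
comp = complement_graph(5, edges)
swapped = swap_adjacent((0, 1, 3, 2, 4), 2, comp)
print(swapped, is_valid_order(swapped, edges))
print(swap_adjacent((0, 1, 3, 2, 4), 0, comp))
```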
9
Main Approach: Simulated Annealing
  • Theorem: Starting from a valid schedule order A,
    we are able to reach any other valid schedule
    order B after a finite number of adjacent swaps
  • For example

[Figure: example schedule orders (2, 3, 0, 4, 1), (2, 0, 3, 4, 1), (0, 2, 3, 4, 1), (0, 2, 3, 1, 4), each one adjacent swap away from the next; panels: Task Graph, Expanded Task Graph, Complement Graph for tasks T0 to T4]
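The theorem can be checked by brute force on a small instance. The sketch below enumerates every valid schedule order of a hypothetical five-task graph and runs a breadth-first search in which each step is one adjacent swap that keeps the order valid. The edge list is an assumption chosen so that the orders quoted on the slides are valid; it is not taken from the slide's figure.

```python
from collections import deque
from itertools import permutations

EDGES = [(0, 1), (2, 4), (3, 4)]  # hypothetical precedence edges

def is_valid(order, edges=EDGES):
    pos = {t: i for i, t in enumerate(order)}
    return all(pos[u] < pos[v] for u, v in edges)

def neighbours(order):
    """All valid orders that differ from `order` by one adjacent swap."""
    for i in range(len(order) - 1):
        new = list(order)
        new[i], new[i + 1] = new[i + 1], new[i]
        new = tuple(new)
        if is_valid(new):
            yield new

def swap_distances(start):
    """Breadth-first search over valid orders; returns swaps needed to reach each."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in neighbours(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist

all_valid = {p for p in permutations(range(5)) if is_valid(p)}
dist = swap_distances((2, 3, 0, 4, 1))
print(len(all_valid), set(dist) == all_valid)
print(dist[(0, 2, 3, 1, 4)])
```

For this graph the search reaches all valid orders, and (0, 2, 3, 1, 4) is three swaps away from (2, 3, 0, 4, 1).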
10
Main Approach: Simulated Annealing
  • Moves
  • M1: Swap two adjacent nodes in both the schedule
    order sequence and the resource assignment
    sequence, if there is an edge between these two
    nodes in the complement graph
  • M2: Swap two adjacent nodes in the resource
    assignment sequence
  • M3: Change the resource assignment of a task

[Figure: Task Graph, Expanded Task Graph, and Complement Graph for tasks T0 to T4]
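The three moves might be coded as below. This is a sketch under assumptions: the complement graph is passed in as a set of `frozenset` pairs, the position `i` is chosen by the caller, and M3 picks a different processor at random. None of these interface details come from the slides.

```python
import random

def move_m1(order, assign, i, complement):
    """M1: swap positions i and i + 1 in both sequences, if the two tasks
    are joined by an edge in the complement graph; otherwise return None."""
    if frozenset((order[i], order[i + 1])) not in complement:
        return None
    order, assign = list(order), list(assign)
    order[i], order[i + 1] = order[i + 1], order[i]
    assign[i], assign[i + 1] = assign[i + 1], assign[i]
    return order, assign

def move_m2(order, assign, i):
    """M2: swap positions i and i + 1 in the resource assignment only."""
    assign = list(assign)
    assign[i], assign[i + 1] = assign[i + 1], assign[i]
    return list(order), assign

def move_m3(order, assign, i, processors, rng=random):
    """M3: move the task at position i to a different processor."""
    assign = list(assign)
    assign[i] = rng.choice([p for p in processors if p != assign[i]])
    return list(order), assign

# Hypothetical solution and complement-graph edge, for illustration only.
order = [0, 1, 3, 2, 4]
assign = ["P2", "P2", "P2", "P1", "P1"]
complement = {frozenset((2, 3))}
print(move_m1(order, assign, 2, complement))
print(move_m2(order, assign, 2))
print(move_m3(order, assign, 0, ["P1", "P2"]))
```

Each move returns new lists and leaves its inputs untouched, so a rejected move in the annealing loop needs no undo step.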
11
Main Approach: Simulated Annealing
  • Three moves are defined, so that
  • Starting from a valid schedule order A, we are
    able to reach any other valid schedule order B
    after a finite number of adjacent swaps
  • Cost function
  • The first term guarantees that a schedule meets
    all task deadlines
  • The second term indicates the system lifetime

[Callout on the cost function: significantly large]
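The cost expression itself did not survive in the transcript. One common way to build a two-term cost of this kind is a large penalty on deadline violations minus the lifetime estimate, which is what the sketch below assumes. The function name, the `penalty` value, and the sign convention (lower cost is better) are all hypothetical.

```python
def sa_cost(finish_times, deadlines, mttf, penalty=1e6):
    """Two-term cost to be minimised by simulated annealing (assumed form).

    First term:  penalty * total deadline violation, with a large penalty so
                 that any infeasible schedule costs more than a feasible one.
    Second term: -mttf, so that a longer expected service life lowers the cost.
    """
    violation = sum(max(0.0, finish_times[t] - deadlines[t]) for t in deadlines)
    return penalty * violation - mttf

# Hypothetical numbers, for illustration only.
feasible = sa_cost({0: 5.0}, {0: 6.0}, mttf=10.0)
infeasible = sa_cost({0: 7.0}, {0: 6.0}, mttf=50.0)
print(feasible, infeasible)
```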
12
Main Approach: Simulated Annealing
  • Key problem: computation time
  • Source of time overhead
  • Run the temperature simulator EVERY TIME we reach
    a new solution
  • Simulator is called 3105 times
  • Every time, trace the temperature variation for
    the entire service life
  • In the range of years
  • Accurate calculation requires a fine-grained
    variation trace file
  • Significant variation within a very short time
  • An efficient cost computation strategy is
    essential!

SA parameters
13
Revisit System Lifetime Reliability Estimation
Speedup I
  • It would be better if we could compute MTTF
    by tracing the temperature variation of only one
    period

14
Revisit System Lifetime Reliability Estimation
Speedup I
  • A subdivision of time


15
Revisit System Lifetime Reliability Estimation
Speedup I
  • Given
  • Aging effect in one period
  • Property: the aging effect in one period does not
    vary from period to period
  • This property enables us to trace the temperature
    variation of only ONE period

16
Revisit System Lifetime Reliability Estimation
Speedup I
  • The expected service life of one processor is
  • Provided there are no redundant processors in the
    system, the expected service life of the entire
    system is
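The two expressions on this slide were lost in extraction. As a hedged illustration only, the sketch below assumes a Weibull model in which accumulated aging grows by the same amount every period, so that R_i(t) = exp(-(A_i * t / t_p)^slope) for processor i with per-period aging A_i and period t_p. It also assumes all processors share one slope parameter and that the system fails when its first processor fails. The symbols and the closed forms are assumptions, not the authors' formulas.

```python
import math

def processor_mttf(aging_per_period, period, slope):
    """MTTF of one processor: (period / A) * Gamma(1 + 1 / slope)."""
    return period / aging_per_period * math.gamma(1.0 + 1.0 / slope)

def system_mttf(agings, period, slope):
    """MTTF of a system without redundancy (series system, common slope).

    R_sys(t) = prod_i exp(-(A_i * t / period) ** slope)
             = exp(-(t / period) ** slope * sum_i A_i ** slope)
    which is again Weibull, with scale period / (sum_i A_i ** slope) ** (1 / slope).
    """
    combined = sum(a ** slope for a in agings) ** (1.0 / slope)
    return period / combined * math.gamma(1.0 + 1.0 / slope)

# Hypothetical per-period aging values, for illustration only.
agings = [3e-9, 1e-9]
print(processor_mttf(agings[0], 1.0, 2.0))
print(system_mttf(agings, 1.0, 2.0))
```

Under these assumptions the processor with the largest per-period aging dominates the system MTTF, which matches the motivation example in which P1 ages much faster than P2.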

17
Revisit System Lifetime Reliability Estimation
Speedup II
  • Given
  • Instead of computing the aging effect in every
    period, we propose to compute the aging effect
    of several periods at one time

18
Revisit System Lifetime Reliability Estimation
Speedup III
  • Accurate calculation requires setting the
    length of time intervals to a very small value
  • Use steady temperature rather than accurate
    temporal temperature

[Figure: Temperature Variation Example; Task Schedule]
19
Revisit System Lifetime Reliability Estimation
Speedup IV
  • Need to run temperature simulator every time we
    reach a new solution
  • There can be at most
    kinds of processor usage
    combinations in task schedules
  • Given 3, 4, we need only 255
    pre-computations, each for a steady temperature
  • Estimate processor temperatures for various
    processor usage combinations in the pre-calculation
    phase only
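The bound on the number of usage combinations was lost in extraction, but the figure of 255 is consistent with one reading: 4 processors, each either idle or running a task of one of 3 power-consumption types, with the all-idle case excluded, giving (3 + 1)^4 - 1 = 255. The sketch below enumerates the combinations under that assumed reading. The function name and the encoding of states are hypothetical.

```python
from itertools import product

def usage_combinations(num_processors, num_power_types):
    """All processor usage combinations with at least one busy processor.

    Each processor is in state 0 (idle) or 1..num_power_types (running a
    task of that power-consumption type).
    """
    states = range(num_power_types + 1)
    return [combo for combo in product(states, repeat=num_processors)
            if any(combo)]

combos = usage_combinations(num_processors=4, num_power_types=3)
print(len(combos))
```

Each combination needs one steady-temperature simulation, so the simulator cost is paid once up front instead of once per candidate solution.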

20
Revisit System Lifetime Reliability Estimation
Speedup IV
Processor index under usage
  • Time slot
  • The set of processors in use
  • The power consumption of the tasks running on
    these processors
  • Categorize the tasks into types according to
    power consumption
  • E.g.,

Task power consumption type
21
Revisit System Lifetime Reliability Estimation
Speedup IV
  • Pre-calculate the steady temperature of
    processor in time slot
  • The aging effect in unit time in this case is
    therefore
  • The aging effect of P1 in this schedule in a
    period is
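The per-slot computation can be sketched as a table lookup followed by a weighted sum. The slide's expressions were not preserved, so this assumes an Arrhenius-style temperature dependence for the electromigration aging rate. The activation energy, the prefactor, the temperature table, and the slot list are hypothetical values.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def aging_rate(temp_kelvin, activation_ev=0.9, prefactor=1e-6):
    """Aging per unit time at a steady temperature (assumed Arrhenius form).

    The lifetime scale is prefactor * exp(Ea / (k * T)); the aging rate is
    its reciprocal, so a hotter processor ages faster.
    """
    scale = prefactor * math.exp(activation_ev / (BOLTZMANN_EV * temp_kelvin))
    return 1.0 / scale

def aging_in_period(slots, steady_temp, proc):
    """Sum of duration * aging rate over the time slots of one period.

    slots:       list of (duration, usage_combination)
    steady_temp: steady_temp[usage_combination][proc] -> pre-calculated
                 steady temperature in kelvin
    """
    return sum(duration * aging_rate(steady_temp[combo][proc])
               for duration, combo in slots)

# Hypothetical pre-calculated table and schedule, for illustration only.
steady_temp = {
    ("busy", "idle"): {"P1": 350.0, "P2": 320.0},
    ("busy", "busy"): {"P1": 365.0, "P2": 360.0},
    ("idle", "busy"): {"P1": 325.0, "P2": 345.0},
}
slots = [(2.0, ("busy", "idle")), (3.0, ("busy", "busy")), (1.0, ("idle", "busy"))]
aging_p1 = aging_in_period(slots, steady_temp, "P1")
print(aging_p1)
```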

22
Revisit System Lifetime Reliability Estimation
Summary
  • A summary of speedup techniques
  • Rewrite MTTF expression in terms of aging effect
    in one period
  • Compute the aging effect of several periods at
    one time
  • Approximate aging effect in one period based on
    the task changes and using steady temperature
  • Call temperature estimation simulator in the
    pre-calculation phase only
  • The time consumption of pre-calculation can be
    reduced even further

23
Experimental Setup
  • Random task graphs generated by TGFF
  • Task numbers range from 20 to 260
  • Hypothetical MPSoC platforms
  • Processor core numbers range from 2 to 8
  • Homogeneous / Heterogeneous
  • Take the electromigration model in Goel-IEEEPress07
    as an example
  • Note that our model also applies to other
    failure mechanisms
  • Compare our method with a thermal-aware task
    scheduling algorithm proposed in Xie-JVLSISP06

24
Accuracy
  • Comparison between approximated MTTF and accurate
    value

25
Lifetime Reliability of Various Platforms with
Various Task Graphs
Difference ratio between the MTTF of simulated
annealing and that of the thermal-aware method
DR: Deadline Relaxation
26
Lifetime Reliability of 8-Processor Platforms
27
Efficiency
  • The simulated annealing process requires 50-200 s
    of CPU time on an Intel(R) Core(TM) 2 CPU at
    2.13 GHz for each case
  • 4 processors, 49 tasks: 84 s
  • 8 processors, 101 tasks: 158 s
  • The CPU time spent on pre-calculation ranges
    from 3 s to 160 s

28
Conclusion
  • Technology advancement has had an adverse
    impact on the lifetime reliability of MPSoC
    embedded systems
  • Prior work on task allocation and scheduling does
    not explicitly take wearout failure into account
  • We propose an analytical model to estimate the
    lifetime reliability of multiprocessor platforms
    under periodic tasks
  • We present a novel lifetime reliability-aware
    algorithm based on simulated annealing technique
  • We propose several speedup techniques to simplify
    the design space exploration process with
    satisfactory solution quality
  • Experimental results demonstrate the effectiveness

29
Lifetime Reliability-Aware Task Allocation and
Scheduling for MPSoC Platforms
Thank you for your attention !