Title: Une Heuristique d'Ordonnancement et de Distribution Tolérante aux Pannes pour Systèmes Temps-Réel Embarqués (A Fault-Tolerant Scheduling and Distribution Heuristic for Embedded Real-Time Systems)
1 - A Fault-Tolerant Scheduling and Distribution Heuristic for Embedded Real-Time Systems
MSR'03
Alain Girault - Hamoudi Kalla - Yves Sorel
POP ART and OSTRE teams
Metz, France - 6 October 2003
2 - Outline
- Introduction
- Modeling distributed real-time systems
- Problem: how to introduce fault-tolerance?
- The proposed solution for fault-tolerance
- Principles and example
- Simulations
- Conclusion and future work
3 - Introduction
Design flow:
- High-level program → Compiler → Model of the algorithm
- Architecture specification, distribution constraints, execution times, real-time constraints, failure specification
- Fault-tolerant distribution and scheduling heuristic → Fault-tolerant distributed static schedule
- Code generator → Fault-tolerant distributed code
4 - Modeling distributed real-time systems
- Algorithm model: a data-flow graph in which I1 and I2 are input operations, A, B and C are computation operations, and O is the output operation.
- Architecture model: a graph in which P1, P2 and P3 are processors and m1, m2 and m3 are communication links (a data-structure sketch follows).
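These two models can be captured by simple data structures; a Python sketch, where the edge sets and link endpoints of the example graphs are assumptions made for illustration:

```python
# Algorithm model: a data-flow graph, stored as operation -> set of predecessors.
algorithm_graph = {
    "I1": set(), "I2": set(),              # input operations
    "A": {"I1", "I2"}, "B": {"I1", "I2"},  # computation operations (edges assumed)
    "C": {"A", "B"},
    "O": {"C"},                            # output operation
}

# Architecture model: processors and point-to-point communication links
# (link endpoints are assumed for illustration).
architecture_graph = {
    "processors": ["P1", "P2", "P3"],
    "links": {"m1": ("P1", "P2"), "m2": ("P1", "P3"), "m3": ("P2", "P3")},
}
```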
5 - Problem: how to introduce fault-tolerance?
Problem:
- Find a distributed schedule of the algorithm on the architecture which tolerates processor failures.
[Figure: the algorithm graph (I1, I2, A, B, C, O) is scheduled onto the architecture graph (P1, P2, P3; links m1, m2, m3).]
6 - The proposed solution for fault-tolerance
Solution:
- A list scheduling heuristic which uses active software replication of operations and communications.
- It tolerates a number of processor failures Npf ≥ 1.
Assumption:
- Processors are assumed to be fail-silent.
7 - The proposed solution for fault-tolerance
Principles (1)
- Each operation/communication is replicated Npf + 1 times on different processors/links of the architecture graph (see the sketch below).
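A tiny sketch of this replication rule, assuming a per-processor cost has already been computed for the operation (the helper name and the numeric costs are hypothetical):

```python
def choose_replica_processors(cost_by_processor, npf):
    """Return the npf + 1 distinct processors with the smallest cost,
    i.e. the processors that will host the replicas of one operation."""
    ranked = sorted(cost_by_processor, key=cost_by_processor.get)
    return ranked[: npf + 1]

# With Npf = 1, each operation gets 2 replicas on 2 different processors:
print(choose_replica_processors({"P1": 7.0, "P2": 10.0, "P3": 9.0}, npf=1))  # ['P1', 'P3']
```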
8 - The proposed solution for fault-tolerance
Principles (2)
9 - The proposed solution for fault-tolerance
Principles (3)
10 - The proposed solution for fault-tolerance
Principles (4)
- The schedule pressure σ is used as a cost function to select the best processor p for each operation o, where:
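(Reconstruction, with assumed notation, of the schedule-pressure cost function used in this family of list-scheduling heuristics:)

$$\sigma^{(n)}(o, p) \;=\; S^{(n)}(o, p) \;+\; \overline{S}(o) \;-\; R^{(n-1)}$$

where $S^{(n)}(o, p)$ is the earliest time at which operation $o$ can start on processor $p$ at iteration $n$, $\overline{S}(o)$ is the latest start time of $o$ measured from the end of the schedule, and $R^{(n-1)}$ is the length of the partial schedule built at iteration $n-1$.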
11 - Heuristic
- Initialisation: O_cand := { o | o is an input operation }; O_sched := ∅
- While O_cand ≠ ∅ do
  a. Compute the schedule pressure σ for each candidate operation o of O_cand on each processor p, and keep the Npf + 1 smallest results
  b. Select the best candidate operation o_best, the one with the greatest schedule pressure σ(o_best, p)
  c. Schedule o_best on each processor p computed at step a; the communications implied by this schedule are replicated Npf + 1 times and scheduled on parallel links
  d. Try to minimise the start time of o_best on each processor p computed at step a by replicating its predecessors on p [Ahmad et al.]
  e. Update the lists of candidate and scheduled operations:
     O_cand := (O_cand − {o_best}) ∪ { o ∈ succs(o_best) | preds(o) ⊆ O_sched ∪ {o_best} }
     O_sched := O_sched ∪ {o_best}
- end while
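A minimal executable sketch of this loop, in Python, is given below. It is an illustration under simplifying assumptions, not the authors' implementation: the architecture is taken as fully connected, communication scheduling and step d (predecessor replication) are omitted, the schedule pressure is approximated by "earliest start time + bottom level", and the graph, execution times and function names are hypothetical.

```python
from collections import defaultdict


def fault_tolerant_schedule(preds, exec_time, processors, npf=1):
    """List-scheduling sketch with active replication (Npf + 1 replicas per operation).

    preds      : dict op -> set of predecessor ops (every op has an entry)
    exec_time  : dict op -> execution time (assumed identical on all processors)
    processors : list of processor names (assumed fully connected)
    npf        : number of fail-silent processor failures to tolerate

    Returns op -> list of (processor, start_time), one pair per replica.
    """
    succs = defaultdict(set)
    for op, ps in preds.items():
        for q in ps:
            succs[q].add(op)

    # Bottom level: longest execution-time path from an operation to an output.
    blevel = {}

    def bottom(op):
        if op not in blevel:
            blevel[op] = exec_time[op] + max((bottom(s) for s in succs[op]), default=0.0)
        return blevel[op]

    free_at = {p: 0.0 for p in processors}            # next free date of each processor
    placed = {}                                       # op -> [(processor, start), ...]
    scheduled = set()
    candidates = {o for o in preds if not preds[o]}   # input operations

    def earliest_start(op, proc):
        # Wait for the processor and, pessimistically, for every replica of
        # every predecessor; communication delays are ignored in this sketch.
        pred_end = max((s + exec_time[q] for q in preds[op] for _, s in placed[q]),
                       default=0.0)
        return max(free_at[proc], pred_end)

    while candidates:
        # (a) for each candidate, keep the npf+1 processors with the smallest pressure
        plans = {}
        for o in candidates:
            pressure = {p: earliest_start(o, p) + bottom(o) for p in processors}
            plans[o] = (pressure, sorted(processors, key=pressure.get)[: npf + 1])

        # (b) most urgent candidate: greatest pressure on its own best processor
        o_best = max(candidates, key=lambda o: plans[o][0][plans[o][1][0]])

        # (c) schedule npf+1 replicas of o_best on its npf+1 best processors
        placed[o_best] = []
        for p in plans[o_best][1]:
            start = earliest_start(o_best, p)
            placed[o_best].append((p, start))
            free_at[p] = start + exec_time[o_best]

        # (e) update the candidate and scheduled sets (step (d) is omitted here)
        scheduled.add(o_best)
        candidates.remove(o_best)
        candidates |= {s for s in succs[o_best] if preds[s] <= scheduled}

    return placed


if __name__ == "__main__":
    # Example graph in the spirit of slides 4 and 12 (edges and durations assumed).
    preds = {"I1": set(), "I2": set(),
             "A": {"I1", "I2"}, "B": {"I1", "I2"},
             "C": {"A", "B"}, "O": {"C"}}
    times = {"I1": 1.0, "I2": 1.0, "A": 3.0, "B": 2.0, "C": 2.0, "O": 1.0}
    for op, replicas in fault_tolerant_schedule(preds, times, ["P1", "P2", "P3"]).items():
        print(op, replicas)
```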
12 - Example
[Algorithm graph: input operations I1, I2; computation operations A, B, C; output operation O.]
[Architecture graph: processors P1, P2, P3 connected by links m1, m2, m3.]
Number of fail-silent processor failures that the system must tolerate: Npf = 1.
13 - Heuristic (identical to slide 11)
14 - Example
Step 1.(1): the initial candidate list contains the input operations I1 and I2; the schedule is still empty (Npf = 1).
[Algorithm graph and empty Gantt chart.]
15 - Example
Step 2.(1): schedule I1 on P1 and P2 (Npf = 1).
[Gantt chart: one replica of I1 on P1 and one on P2.]
16 - Example
Step 2.(2): schedule I2 on P1 and P2 (Npf = 1).
[Gantt chart: replicas of I1 and I2 on P1 and P2.]
17 - Example
Step 2.(3): I1 and I2 are scheduled; the candidate list becomes {A, B} (Npf = 1).
[Gantt chart: replicas of I1 and I2 on P1 and P2.]
18 - Heuristic (identical to slide 11)
19 - Example
Step 2.a.(3): compute the schedule pressure of the candidates A and B on each processor (Npf = 1).
[Gantt chart: replicas of I1 and I2 on P1 and P2.]
20 - Heuristic (identical to slide 11)
21 - Example
Step 2.b.(3): σ(A, P1, P3) = 7, 9 and σ(B, P2, P3) = 6, 8; A has the greatest schedule pressure, so it is selected as the best candidate (Npf = 1).
[Gantt chart: replicas of I1 and I2 on P1 and P2.]
22 - Heuristic (identical to slide 11)
23 - Example
Step 2.c.(3): schedule A on P1 and P3 (σ(A, P1, P3) = 7, 9; Npf = 1).
[Gantt chart: replicas of I1 and I2 on P1 and P2, and replicas of A on P1 and P3.]
24 - Heuristic (identical to slide 11)
25 - Example
Step 2.d.(3): replicate the predecessor I2 on P3 to minimise the start time of A (Npf = 1).
[Gantt chart: replicas of I1 and I2 on P1 and P2, an extra replica of I2 on P3, and replicas of A on P1 and P3.]
26 - Heuristic (identical to slide 11)
27 - Example
Step 2.e.(3): update the candidate list and the set of scheduled operations after scheduling A (Npf = 1).
[Gantt chart: replicas of I1 and I2 on P1 and P2, I2 also on P3, and replicas of A on P1 and P3.]
28 - Simulations
- Aim
  - Compare the proposed heuristic with the HBP heuristic [Hashimoto et al. 2002].
- Assumptions
  - Architecture with fully connected processors,
  - Number of fail-silent processor failures: Npf = 1.
- Simulation parameters
  - Communication-to-computation ratio (CCR), defined as the average communication time divided by the average computation time: CCR = 0.1, 0.5, 1, 2, 5 and 10,
  - Number of operations: N = 10, 20, ..., 80.
- Comparison parameter
  - Overhead = 100 × [length(HTBR or HBP) − length(HTBR without fault-tolerance)] / length(HTBR without fault-tolerance)  (see the sketch below)
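A one-line sketch of this overhead criterion (the function name and the sample lengths are hypothetical):

```python
def overhead_percent(ft_length, reference_length):
    """Overhead (%) of a fault-tolerant schedule over the schedule produced
    by HTBR without fault-tolerance, as defined on this slide."""
    return 100.0 * (ft_length - reference_length) / reference_length

print(overhead_percent(130.0, 100.0))  # 30.0
```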
29 - Impact of the number of operations
[Plots of the overhead versus the number of operations, with no processor failure and with one processor failure.]
30 - Impact of the communication-to-computation ratio
[Plots of the overhead versus the CCR, with no processor failure and with one processor failure.]
31 - Conclusion and future work
Result
- A new scheduling heuristic based on the active replication strategy.
- It produces a static distributed schedule of a given algorithm on a given distributed architecture, tolerant to Npf processor failures.
Future work
- A new fault-tolerant scheduling heuristic that
  - tolerates both processor and communication-link failures,
  - maximises the system's reliability.
32 - Thank you