Title: Real-Time Scheduling Analysis for Multiprocessor Platforms
1 Real-Time Scheduling Analysis for Multiprocessor Platforms
- Marko Bertogna
- PhD dissertation
- Scuola Superiore Sant'Anna,
- Pisa, Italy
2 Overview
- The Multicore Revolution
- Real-Time Multiprocessor Systems: existing results
- Schedulability Analysis for global schedulers
- Experimental evaluation
- Conclusions
- Other research activities
3 Main Contributions
- Systematization of existing results for RT scheduling and schedulability analysis on MP
- Polynomial and pseudo-polynomial schedulability tests for
  - Work-conserving schedulers
  - FP
  - EDF
  - EDZL
- Experimental comparison of existing techniques
4 Real-Time Systems
- Solid theory of single processor systems
  - Optimal schedulers, tight schedulability tests, shared resource protocols, bandwidth reservation schemes, hierarchical schedulers, OS, etc.
- Far fewer results for multiprocessors
  - Many NP-hard problems, few optimal results, heuristic approaches, simplified task models, only sufficient schedulability tests, etc.
- Do we really need to investigate multiprocessor real-time systems?
5 As Moore's law goes on
- Number of transistors per chip doubles every 18 to 24 months
6 heating becomes a problem
- Power P rises with supply voltage V and frequency f
- Clock speed limited to less than 4 GHz
7 Solution
Use a higher number of slower logic gates
Denser chips with transistors operating at lower frequencies
MULTICORE SYSTEMS
8 The Multicore invasion
- Intel's Core2, Itanium, Xeon: 2, 4 cores
- AMD's Opteron, Athlon 64 X2, Phenom: 2, 4 cores
- IBM-Toshiba-Sony Cell processor: 8 cores (PS3)
- Microsoft's Xenon: 3 cores (Xbox 360)
- ARM's MPCore: 4 cores
- Sun's Niagara UltraSPARC: 8 cores
- Tilera's TILE64: 64 cores
- Nios II: x soft cores
- TI, Freescale, Atmel, Broadcom, Picochip (picoArray: up to 300 DSP cores), ...
9 Identical vs heterogeneous cores
ARM's MPCore
STI's Cell Processor
- One Power Processor Element (PPE)
- 8 Synergistic Processing Elements (SPE)
10 System model
- Platform with m identical processors
- Task set $\tau$ with n periodic or sporadic tasks $\tau_i$
  - Period or minimum inter-arrival time $T_i$
  - Worst-case execution time $C_i$
  - Deadline $D_i$
  - Utilization $U_i = C_i/T_i$, density $\lambda_i = C_i/\min(D_i, T_i)$
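The task parameters above map naturally onto a small record type. The following is a minimal sketch, not part of the original slides: the class name `Task` and the helper names are illustrative assumptions, while the formulas for utilization and density are the ones stated in the system model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Sporadic task (hypothetical helper type, named here for illustration)."""
    C: float  # worst-case execution time
    D: float  # relative deadline
    T: float  # period or minimum inter-arrival time

    @property
    def utilization(self) -> float:
        # U_i = C_i / T_i
        return self.C / self.T

    @property
    def density(self) -> float:
        # lambda_i = C_i / min(D_i, T_i)
        return self.C / min(self.D, self.T)


def total_utilization(tasks):
    return sum(t.utilization for t in tasks)


def total_density(tasks):
    return sum(t.density for t in tasks)


example = [Task(C=1, D=2, T=4), Task(C=2, D=5, T=5)]
print(total_utilization(example), total_density(example))
```

For a constrained-deadline task (D < T) the density is larger than the utilization, which is why the density-based tests are more conservative than the utilization-based ones.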
11 Problems addressed
- Run-time scheduling problem
- Schedulability problem
[Figure: tasks $\tau_1, ..., \tau_5$ to be mapped onto CPU1, CPU2 and CPU3]
12 Assumptions
- Independent tasks
- Job-level parallelism prohibited
  - the same job cannot be executed simultaneously on more than one processor
- Preemption and Migration support
  - a preempted task can resume its execution on a different processor
- Cost of preemption/migration integrated into task WCET
13 Global vs partitioned scheduling
- Single system-wide queue or multiple per-processor queues
[Figure: global scheduler vs. partitioned scheduler]
14 Partitioned Scheduling
- The scheduling problem reduces to
  - Bin-packing problem: NP-hard in the strong sense. Various heuristics used: FF, NF, BF, FFDU, BFDD, etc.
  - Uniprocessor scheduling problem: well known. EDF: $U_{tot} \le 1$; RM: RTA; ...
- Global (work-conserving) and partitioned approaches are incomparable
[Figure: tasks $\tau_1, \tau_3, \tau_5, \tau_2, \tau_4$ packed onto the processors]
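As an illustration of the bin-packing view, here is a minimal sketch of a first-fit heuristic with tasks sorted by decreasing utilization (the FFDU idea), assuming implicit deadlines and EDF on each processor so that a processor accepts a task as long as its total utilization stays at or below 1. The function name and the small tolerance `eps` are assumptions made for this example.

```python
def first_fit_decreasing(utilizations, m, eps=1e-12):
    """Assign tasks (given by utilization) to m processors.

    Returns a list of m lists of task indices, or None if the heuristic
    fails to place some task. A failure does not prove infeasibility.
    """
    bins = [[] for _ in range(m)]
    load = [0.0] * m
    # Consider the heaviest tasks first.
    order = sorted(range(len(utilizations)),
                   key=lambda i: utilizations[i], reverse=True)
    for i in order:
        for p in range(m):
            # Uniprocessor EDF test for implicit deadlines: U_tot <= 1.
            if load[p] + utilizations[i] <= 1.0 + eps:
                bins[p].append(i)
                load[p] += utilizations[i]
                break
        else:
            return None  # no processor can host task i
    return bins


print(first_fit_decreasing([0.6, 0.5, 0.4, 0.5], m=2))
```

Because bin packing is NP-hard in the strong sense, a heuristic such as this one can reject task sets that some other partition would accommodate.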
15 Global scheduling
- The m highest priority ready jobs are always the ones executing
- Work-conserving scheduler
  - No processor is ever idled when a task is ready to execute.
16 Global scheduling advantages
- Load automatically balanced
- Easier re-scheduling (dynamic loads, selective shutdown, etc.)
- Lower average response time (see queueing theory)
- More efficient reclaiming and overload management
- Number of preemptions
- Migration cost can be mitigated by proper HW (e.g., MPCore's Direct Data Intervention)
- Few schedulability tests → Further research needed
17 Uniprocessor scheduling
- EDF optimal for arbitrary job collections
- Exact schedulability conditions
  - linear test for implicit deadlines: $U_{tot} \le 1$
  - Pseudo-polynomial test for constrained and arbitrary deadlines [Baruah et al. 90]
- Optimal priority assignments for sporadic and synchronous periodic task systems
  - RM for implicit deadlines
  - DM for constrained deadlines
- Exact pseudo-polynomial schedulability test for FP
  - Response Time Analysis (RTA)
18 Global Scheduling
- No optimal scheduler known for general task models
- Pfair optimal for implicit deadlines: $U_{tot} \le m$
  - preemption and synchronization issues
- Classic schedulers are not optimal (Dhall's effect)
  - m light tasks + 1 heavy task: $U_{tot} \to 1$
- Hybrid schedulers: EDF-US, RM-US, DM-DS, AdaptiveTkC, fpEDF, EDF(k), EDZL, ...
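Dhall's effect can be made concrete with the classic construction of m light tasks plus one heavy task. The parameter values below are the textbook choice, written out here as an illustration rather than taken from the slides; $\varepsilon$ is an arbitrarily small positive number.

```latex
\begin{align*}
\text{light tasks } \tau_1,\dots,\tau_m &: \quad C_i = 2\varepsilon,\; D_i = T_i = 1 \\
\text{heavy task } \tau_{m+1} &: \quad C_{m+1} = 1,\; D_{m+1} = T_{m+1} = 1 + \varepsilon \\
U_{tot} &= 2m\varepsilon + \frac{1}{1+\varepsilon} \;\longrightarrow\; 1
  \quad (\varepsilon \to 0)
\end{align*}
```

If all tasks are released at time 0, global EDF and global RM both give priority to the m light jobs, which occupy all m processors during $[0, 2\varepsilon)$. The heavy job can start only at $2\varepsilon$ and completes at $1 + 2\varepsilon > 1 + \varepsilon$, missing its deadline even though the total utilization is barely above 1 on m processors.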
19 Global scheduling: main results
- Only sufficient schedulability tests
- Utilization-based tests (implicit deadlines)
  - EDF → Goossens et al.: $U_{tot} \le m(1 - U_{max}) + U_{max}$
  - fpEDF → Baruah: $U_{tot} \le (m+1)/2$
  - RM-US → Andersson et al.: $U_{tot} \le m^2/(3m-2)$
- Polynomial tests
  - EDF, FP → Baker: $O(n^2)$ and $O(n^3)$ tests
  - EDZL → Cirinei, Baker: $O(n^2)$ test
- Pseudo-polynomial tests
  - EDF, FP → Fisher, Baruah: load-based tests
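The utilization-based conditions listed above are simple enough to evaluate directly. The sketch below assumes implicit deadlines and takes the bounds in the form $U_{tot} \le m(1 - U_{max}) + U_{max}$ for global EDF, $(m+1)/2$ for fpEDF and $m^2/(3m-2)$ for RM-US; the function names are illustrative and not from the slides.

```python
def edf_utilization_test(utilizations, m):
    """Sufficient test for global EDF: U_tot <= m * (1 - U_max) + U_max."""
    u_max = max(utilizations)
    return sum(utilizations) <= m * (1 - u_max) + u_max


def fpedf_bound(m):
    """Utilization bound for fpEDF on m processors."""
    return (m + 1) / 2


def rm_us_bound(m):
    """Utilization bound for RM-US on m processors."""
    return m * m / (3 * m - 2)


# Three tasks of utilization 0.5 on two processors pass the EDF test,
# while a single heavy task makes the same test fail.
print(edf_utilization_test([0.5, 0.5, 0.5], m=2))
print(edf_utilization_test([0.9, 0.5, 0.5], m=2))
```

All three conditions are only sufficient: a task set that fails them may still be schedulable.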
20 Density-based tests
- EDF: $\lambda_{tot} \le m(1 - \lambda_{max}) + \lambda_{max}$
- EDF-DS[1/2]: $\lambda_{tot} \le (m+1)/2$
- DM: $\lambda_{tot} \le m(1 - \lambda_{max})/2 + \lambda_{max}$
- DM-DS[1/3]: $\lambda_{tot} \le (m+1)/3$
EDF-DS[1/2] [ECRTS05]: gives highest priority to (at most m-1) tasks having $\lambda_\tau \ge 1/2$, and schedules the remaining ones with EDF
DM-DS[1/3] [OPODIS05]: gives highest priority to (at most m-1) tasks having $\lambda_\tau \ge 1/3$, and schedules the remaining ones with DM (only constrained deadlines)
21 Critical instant
- A particular configuration of releases that leads to the largest possible response time of a task.
- Possible to derive exact schedulability tests analyzing just the critical instant situation.
- Uniprocessor FP and EDF: a critical instant is when
  - all tasks arrive synchronously
  - all jobs are released as soon as permitted
- Response Time Analysis for uniprocessors
  - FP → the response time of task k is given by the fixed point of $R_k$ in the iteration $R_k \leftarrow C_k + \sum_{i<k} \lceil R_k/T_i \rceil C_i$ (tasks indexed by decreasing priority)
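The uniprocessor response time iteration for FP can be sketched as follows, assuming the standard recurrence $R_k \leftarrow C_k + \sum_{i<k} \lceil R_k/T_i \rceil C_i$, integer task parameters, constrained deadlines, and tasks listed in decreasing priority order. The function name and the tuple layout (C, D, T) are choices made for this example.

```python
def rta_uniprocessor(tasks, k):
    """Worst-case response time of tasks[k] under uniprocessor FP.

    tasks: list of (C, D, T) tuples sorted by decreasing priority.
    Returns the fixed point, or None if it would exceed the deadline D_k.
    """
    C_k, D_k, _ = tasks[k]
    R = C_k
    while R <= D_k:
        # -(-R // T) is the integer ceiling of R / T.
        R_new = C_k + sum(-(-R // T) * C for C, _, T in tasks[:k])
        if R_new == R:
            return R
        R = R_new
    return None


tasks = [(1, 4, 4), (2, 6, 6), (3, 12, 12)]
print([rta_uniprocessor(tasks, k) for k in range(len(tasks))])  # [1, 3, 10]
```

The iteration starts from the synchronous release, which is exactly the critical instant described above.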
22 Multiprocessor anomaly
- Synchronous periodic arrival of jobs is not a critical instant for multiprocessors
- Example from [Bar07], tasks given as (C, D, T): $\tau_1 = (1,1,2)$, $\tau_2 = (1,1,3)$, $\tau_3 = (5,6,6)$
[Figure: two schedules, the synchronous periodic situation and the situation where the second job of $\tau_2$ is delayed by one unit]
- Need to find pessimistic situations to derive sufficient schedulability tests
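The anomaly can be reproduced with a tiny discrete-time simulation. The sketch below makes several assumptions not stated on the slide: the triples are read as (C, D, T), the platform has m = 2 processors, priorities follow global EDF, and time advances in unit slots. The function names are illustrative.

```python
def simulate_global_edf(jobs, m, horizon):
    """jobs: list of (release, wcet, absolute_deadline) with integer values.

    Returns the finish time of every job (None if unfinished at `horizon`).
    Each job occupies at most one processor per time slot.
    """
    remaining = [wcet for _, wcet, _ in jobs]
    finish = [None] * len(jobs)
    for t in range(horizon):
        ready = [j for j, (release, _, _) in enumerate(jobs)
                 if release <= t and remaining[j] > 0]
        ready.sort(key=lambda j: jobs[j][2])  # earliest absolute deadline first
        for j in ready[:m]:                   # the m highest-priority jobs run
            remaining[j] -= 1
            if remaining[j] == 0:
                finish[j] = t + 1
    return finish


def job(release, C, D):
    return (release, C, release + D)


tau3 = job(0, 5, 6)
# tau1 = (1,1,2) released at 0, 2, 4; tau2 = (1,1,3) released at 0 and 3.
synchronous = [job(0, 1, 1), job(2, 1, 1), job(4, 1, 1),
               job(0, 1, 1), job(3, 1, 1), tau3]
# Same releases, but the second job of tau2 arrives one unit later.
delayed = [job(0, 1, 1), job(2, 1, 1), job(4, 1, 1),
           job(0, 1, 1), job(4, 1, 1), tau3]

finish_sync = simulate_global_edf(synchronous, m=2, horizon=8)[-1]
finish_delayed = simulate_global_edf(delayed, m=2, horizon=8)[-1]
print(finish_sync, finish_delayed)  # 6 7
```

In the synchronous case the job of $\tau_3$ completes exactly at its deadline (time 6); delaying one release of $\tau_2$ makes both processors busy with higher-priority jobs during slot 4, so $\tau_3$ completes at time 7 and misses its deadline.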
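23 Introducing the interference
- $I_k$: total interference suffered by task $\tau_k$
- $I_k^i$: interference of task $\tau_i$ on task $\tau_k$
[Figure: schedule on CPU1, CPU2 and CPU3 between the release $r_k$ and the completion $r_k + R_k$ of a job of $\tau_k$, with the interfering intervals labelled $I_k^1, ..., I_k^8$]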
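24 Limiting the interference
- It is sufficient to consider at most the portion $(R_k - C_k + 1)$ of each term $I_k^i$ in the sum
[Figure: schedule on CPU1, CPU2 and CPU3 between $r_k$ and $r_k + R_k$, with the interfering intervals labelled $I_k^1, ..., I_k^8$]
- It can be proved that $WCRT_k$ is given by the fixed point of $R_k \leftarrow C_k + \lfloor \frac{1}{m} \sum_{i \ne k} \min(I_k^i, R_k - C_k + 1) \rfloor$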
25 Bounding the interference
- Exactly computing the interference is complex
- Pessimistic assumptions
  - Bound the interference of a task with the workload
  - Use an upper bound on the workload.
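26 Bounding the workload
- Consider a situation in which
  - The first job executes as close as possible to its deadline
  - Successive jobs execute as soon as possible
- $W_i(L) = N_i(L) C_i + \min(C_i, L + D_i - C_i - N_i(L) T_i)$
  - first term: the $N_i(L)$ jobs, excluding the last one, where $N_i(L) = \lfloor (L + D_i - C_i)/T_i \rfloor$
  - second term: the last job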
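The workload bound for the situation described above can be written in a few lines. This sketch assumes the two-term form $W_i(L) = N_i(L) C_i + \min(C_i, L + D_i - C_i - N_i(L) T_i)$ with $N_i(L) = \lfloor (L + D_i - C_i)/T_i \rfloor$ and integer parameters; the function name is a choice made for this example.

```python
def workload_bound(C, D, T, L):
    """Upper bound on the work a task (C, D, T) can do in a window of length L.

    The first job is assumed to run as late as possible (ending at its
    deadline), and the following jobs as early as possible.
    """
    n = (L + D - C) // T                  # jobs counted in full, last one excluded
    carry = min(C, L + D - C - n * T)     # contribution of the last job
    return n * C + carry


# Task with C=2, D=T=10 observed over a window of length 3:
print(workload_bound(2, 10, 10, 3))  # 3
```

For very short windows the bound can exceed the window length, which is one reason the response time analysis additionally caps each term.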
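27 RTA for generic global schedulers
- An upper bound on the WCRT of task k is given by the fixed point of $R_k$ in the iteration $R_k \leftarrow C_k + \lfloor \frac{1}{m} \sum_{i \ne k} \min(W_i(R_k), R_k - C_k + 1) \rfloor$
- The slack of task k is at least $S_k = D_k - R_k$
[Figure: job of $\tau_k$ with response time $R_k$ and slack $S_k$ before its deadline]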
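A small worked iteration may help. It assumes the iteration $R_k \leftarrow C_k + \lfloor \frac{1}{m} \sum_{i \ne k} \min(W_i(R_k), R_k - C_k + 1) \rfloor$ with the workload bound $W_i(L) = N_i(L) C_i + \min(C_i, L + D_i - C_i - N_i(L) T_i)$, $N_i(L) = \lfloor (L + D_i - C_i)/T_i \rfloor$. The task set, given as (C, D, T), is a hypothetical example: $\tau_1 = (1,4,4)$, $\tau_2 = (1,5,5)$, $\tau_3 = (2,10,10)$ on $m = 2$ processors, analyzed for $k = 1$.

```latex
\begin{align*}
R_1 = 1:&\quad W_2(1) = 1,\; W_3(1) = 2,\; \text{cap} = 1
  &&\Rightarrow\; R_1 \leftarrow 1 + \lfloor (1 + 1)/2 \rfloor = 2 \\
R_1 = 2:&\quad W_2(2) = 2,\; W_3(2) = 2,\; \text{cap} = 2
  &&\Rightarrow\; R_1 \leftarrow 1 + \lfloor (2 + 2)/2 \rfloor = 3 \\
R_1 = 3:&\quad W_2(3) = 2,\; W_3(3) = 3,\; \text{cap} = 3
  &&\Rightarrow\; R_1 \leftarrow 1 + \lfloor (2 + 3)/2 \rfloor = 3
\end{align*}
```

The fixed point is $R_1 = 3 \le D_1 = 4$, so under these assumptions the slack of $\tau_1$ is at least $S_1 = D_1 - R_1 = 1$.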
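28 Improvement using slack values
- Consider a situation in which
  - The first job executes as close as possible to its deadline
  - Successive jobs execute as soon as possible
- With a slack lower bound $S_i$: $W_i(L) = N_i(L) C_i + \min(C_i, L + D_i - C_i - S_i - N_i(L) T_i)$
  - first term: the $N_i(L)$ jobs, excluding the last one, where $N_i(L) = \lfloor (L + D_i - C_i - S_i)/T_i \rfloor$
  - second term: the last job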
30 RTA for generic global schedulers
- An upper bound on the WCRT of task k is given by the fixed point of $R_k$ in the iteration
1.
2.
If a fixed point $R_k \le D_k$ is reached for every task k in the system, the task set is schedulable with any work-conserving global scheduler.
31 Iterative schedulability test
1. All slacks initialized to zero
2. Compute slack lower bound for tasks 1,...,n
   - If higher than old value → update slack bound
   - If lower, do nothing
3. If all tasks have a positive slack lower bound → return success
4. If no slack has been updated for tasks 1,...,n → return fail
5. Otherwise, return to point 2
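The steps above can be sketched as follows for a generic work-conserving global scheduler. This is a minimal sketch under stated assumptions, not the author's implementation: integer task parameters given as (C, D, T), the slack-aware workload bound $W_i(L) = N_i C_i + \min(C_i, L + D_i - C_i - S_i - N_i T_i)$ with $N_i = \lfloor (L + D_i - C_i - S_i)/T_i \rfloor$, and success interpreted as finding a fixed point $R_k \le D_k$ for every task. All function names are illustrative.

```python
def workload(C, D, T, L, slack=0):
    """Upper bound on the work of one task in any window of length L."""
    x = L + D - C - slack
    n = x // T                        # jobs counted in full, last one excluded
    return n * C + min(C, x - n * T)


def response_time_bound(tasks, k, m, slack):
    """Fixed point of the RTA iteration for tasks[k], or None if above D_k."""
    C_k, D_k, _ = tasks[k]
    R = C_k
    while R <= D_k:
        interference = sum(
            min(workload(C, D, T, R, slack[i]), R - C_k + 1)
            for i, (C, D, T) in enumerate(tasks) if i != k)
        R_new = C_k + interference // m
        if R_new == R:
            return R
        R = R_new
    return None


def iterative_test(tasks, m):
    """Sufficient schedulability test; True means schedulable."""
    slack = [0] * len(tasks)                      # step 1
    while True:
        updated, all_ok = False, True
        for k, (_, D_k, _) in enumerate(tasks):   # step 2
            R = response_time_bound(tasks, k, m, slack)
            if R is None:
                all_ok = False
            elif D_k - R > slack[k]:              # only ever raise the bound
                slack[k] = D_k - R
                updated = True
        if all_ok:
            return True                           # step 3
        if not updated:
            return False                          # step 4
        # step 5: some slack improved, so run another round


print(iterative_test([(1, 4, 4), (1, 5, 5), (2, 10, 10)], m=2))
```

Each round can only increase the slack bounds, and larger slacks shrink the workload bounds of the interfering tasks, so a task that failed in one round may succeed in the next.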
32 RTA refinement for Fixed Priority
- The interference of lower-priority tasks on higher-priority tasks is always null
- An upper bound on the WCRT of task k can be given by the fixed point of $R_k$ in the iteration
1.
2.
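For fixed priority, only higher-priority tasks need to appear in the interference sum, so the analysis can proceed in a single pass in decreasing priority order. The sketch below assumes the iteration $R_k \leftarrow C_k + \lfloor \frac{1}{m} \sum_{i<k} \min(W_i(R_k), R_k - C_k + 1) \rfloor$ with the slack-aware workload bound, integer parameters, and tasks given as (C, D, T) in decreasing priority order. The names are illustrative, and the formula is an assumption reconstructed from the generic iteration rather than copied from the slide.

```python
def workload(C, D, T, L, slack):
    """Slack-aware upper bound on the work of one task in a window of length L."""
    x = L + D - C - slack
    n = x // T
    return n * C + min(C, x - n * T)


def fp_response_times(tasks, m):
    """Response time bounds under global FP, or None if some task fails.

    tasks: (C, D, T) tuples sorted by decreasing priority.
    """
    bounds, slack = [], []
    for k, (C_k, D_k, _) in enumerate(tasks):
        R = C_k
        while True:
            # Only tasks with higher priority than k can interfere.
            interference = sum(
                min(workload(C, D, T, R, slack[i]), R - C_k + 1)
                for i, (C, D, T) in enumerate(tasks[:k]))
            R_new = C_k + interference // m
            if R_new > D_k:
                return None
            if R_new == R:
                break
            R = R_new
        bounds.append(R)
        slack.append(D_k - R)   # final slack, already available for lower priorities
    return bounds


print(fp_response_times([(1, 4, 4), (1, 5, 5), (2, 10, 10)], m=2))  # [1, 1, 3]
```

Since the slack of every higher-priority task is final by the time a lower-priority task is analyzed, no further rounds are needed.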
33 RTA refinement for EDF
- A different bound can be derived analyzing the worst-case workload in a situation in which
  - The interfering and interfered tasks have a common deadline
  - All jobs execute as late as possible
- An upper bound on the WCRT of task k is given by the fixed point of $R_k$ in the iteration
1.
2.
34 Complexity
- Pseudo-polynomial complexity
- Fast average behavior
  - We verified the schedulability of millions of task sets in a few minutes on a normal device.
- Lower complexity for Fixed Priority systems
  - at most one slack update per task, if slacks are updated in decreasing priority order.
- Possible to reduce complexity limiting the number of rounds
35 Polynomial complexity test
- A simpler test can be derived avoiding the iterations on the response times
- A lower bound on the slack of $\tau_k$ is given by $S_k \ge D_k - C_k - \lfloor \frac{1}{m} \sum_{i \ne k} \min(W_i(D_k), D_k - C_k + 1) \rfloor$
- The iteration on the slack values is the same
- Performance comparable to the RTA-based test
- Complexity down to $O(n^2)$
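As a worked illustration, assume the slack bound takes the form $S_k \ge D_k - C_k - \lfloor \frac{1}{m} \sum_{i \ne k} \min(W_i(D_k), D_k - C_k + 1) \rfloor$, with $W_i(L) = N_i(L) C_i + \min(C_i, L + D_i - C_i - N_i(L) T_i)$ and $N_i(L) = \lfloor (L + D_i - C_i)/T_i \rfloor$. The task set, given as (C, D, T), is a hypothetical example: $\tau_1 = (1,4,4)$, $\tau_2 = (1,5,5)$, $\tau_3 = (2,10,10)$ on $m = 2$ processors, all slacks initially zero, evaluated for $k = 1$.

```latex
\begin{align*}
W_2(D_1) &= W_2(4) = 1 \cdot 1 + \min(1,\, 8 - 5) = 2 \\
W_3(D_1) &= W_3(4) = 1 \cdot 2 + \min(2,\, 12 - 10) = 4 \\
D_1 - C_1 + 1 &= 4 \\
S_1 &\ge 4 - 1 - \left\lfloor \tfrac{\min(2,4) + \min(4,4)}{2} \right\rfloor = 3 - 3 = 0
\end{align*}
```

A single evaluation at $L = D_k$ replaces the whole response time iteration, which is where the lower complexity comes from; the price is a potentially smaller slack bound in each round.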
36 Experimental results for EDF
- 2 processors
- Constrained deadlines
- 1,000,000 task sets generated
- Our test is constantly superior at all utilizations
[Figure: task sets vs. task set utilization; labels: total generated task sets, I-BCL EDF, Goossens et al. 03, Baker et al. 07, Bertogna et al. 05, our test; annotation: improvement over existing solutions]
37 Experimental results for FP
- 2 processors
- Constrained deadlines
- 1,000,000 task sets generated
- Our test is constantly superior at all utilizations
[Figure: task sets vs. task set utilization; labels: total generated task sets, I-BCL FP, Bertogna et al. 05, Baker et al. 07, density bound, our test]
38 FP vs EDF
- 4 processors
- Constrained deadlines
- 1,000,000 task sets generated
- Our FP test is constantly superior to all tests at every utilization
[Figure: task sets vs. task set utilization; labels: total generated task sets, I-BCL FP, Baker et al. 07, I-BCL EDF, Goossens et al. 03, our FP test, our EDF test]
39 Conclusions
- Multiprocessor Real-Time systems are a promising field to explore.
- Still few existing results, far from tight conditions.
- We contributed to filling this gap.
- Future work
  - Find tighter schedulability tests.
  - Use our techniques to analyze the efficiency of other scheduling algorithms (EDZL, EDF-US, FP-DS, etc).
  - Take into account exclusive resource access.
  - Integrate into Resource Reservation framework.
40 The end
41 Other research activities
- Limited-preemption EDF
- Reducing Resource Holding Times
- Shared resources and open environments
42 ARM's MPCore
43 Frequency and power
- f: operating frequency
- V: supply voltage ($V = 0.3 + 0.7 f$)
  - Reducing the voltage causes a higher frequency reduction
- $I_{leak}$: leakage current (becomes non-negligible)
- $P = P_{dynamic} + P_{static}$: power consumed
  - $P_{dynamic} \propto A C V^2 f$ (main contributor until hundreds of nm)
  - $P_{static} \propto V I_{leak}$ (always present, due to subthreshold and gate-oxide leakage)
- Reducing V allows a quadratic reduction of $P_{dynamic}$
44 Power density
45 How many cores in the future?
- Intel's 80-core prototype already available
  - Able to transfer a TB of data/s (Core 2 Duo reaches 1.66 GB data/s)
  - To be released in 5 years
46 Beyond 2 billion transistors/chip
- Intel's Tukwila
  - Itanium based
  - 2.046 B FET
  - Quad-core
  - 65 nm technology
  - 2 GHz on 170 W
  - 30 MB cache
  - 2 SMT → 8 threads/ck
47 Intel's timeline
48
- From 4004 (1971) to Pentium D (2005)
- Tech: 10 um → 65 nm (150x)
- f: 100 kHz → 3 GHz (25000x)
- MOS: 2,300 → 291,000,000 (125,000x)
- P: 0.2 W → 100 W (500x)
- Vdd reduced (from 5 V to 1 V)
- Not all MOS change state
- Great part of chip occupied by cache
- $f \propto V_{dd} - V_{tt}$
- $I_{leak} \propto V_{dd}, 1/V_{tt}$
49 Intel Pentium IV (2000)
Intel 4004 (1971)
50 Itanium temperature plot
52
- Incandescent light bulb: 25-100 W
- Compact fluorescent lights: 5-30 W
- Typical car: 25 kW
- Human climbing stairs: 200 W
- 1 kWh = 1 kW constantly supplied for 1 h
- ENEL: 0.13-0.18 €/kWh
53 Density and utilization bounds
54 Uniprocessor feasibility
55 Uniprocessor static priority run-time scheduling
56 Uniprocessor static priority feasibility
57 Uniprocessor static priority schedulability
58 Multiprocessor feasibility
59 Multiprocessor run-time scheduling
60 Feasibility conditions
[Figure: feasibility regions. $U_{tot} > m$: not feasible; $load > m$: not feasible; $\sum_i C_i/\min(D_i, T_i) \le m$: feasible; region in between (???): sufficient feasibility and schedulability tests]
61 Multiprocessor static job priority feasibility
62 Multiprocessor static job priority schedulability
63 Multiprocessor static priority run-time scheduling
64 Multiprocessor static priority feasibility
65 Multiprocessor static priority schedulability
66 RTA for Uniprocessors
- For FP, the worst-case response time of a task is given by the first instance released at a critical instant
- For EDF, it is given by an instance in a busy interval starting with a critical instant
- With these observations it is possible to compute the WCRT of all tasks. Example: for FP, the WCRT of a task k is given by the fixed point of $R_k \leftarrow C_k + \sum_{i<k} \lceil R_k/T_i \rceil C_i$ (tasks indexed by decreasing priority)
67 RTA refinement for EDF
- The bound for generic work-conserving schedulers is still valid
- A different bound can be derived analyzing the worst-case workload in a situation in which
  - The interfering and interfered tasks have a common deadline
  - All jobs execute as late as possible
[Figure: jobs of $\tau_i$ (execution time $C_i$, period $T_i$, deadline $D_i$, slack $S_i$) executing as late as possible inside a window of length $D_k$]
69 Polynomial complexity test
- A lower bound on the slack of $\tau_k$ is given by
  - For EDF
  - For FP
70 Limiting the number of iterations