Real-Time Scheduling Analysis for Multiprocessor Platforms

Provided by: retis

1
Real-Time Scheduling Analysis for Multiprocessor Platforms

Marko Bertogna
PhD dissertation
Scuola Superiore S.Anna,
Pisa, Italy

2
Overview

The Multicore Revolution
Real-Time Multiprocessor Systems: existing results
Schedulability Analysis for global schedulers
Experimental evaluation
Conclusions
Other research activities

3
Main Contributions

Systematization of existing results for real-time scheduling
and schedulability analysis on multiprocessors
Polynomial and pseudo-polynomial schedulability
tests for
Work-conserving schedulers
FP
EDF
EDZL
Experimental comparison of existing techniques

4
Real-Time Systems

Solid theory of single processor systems
Optimal schedulers, tight schedulability tests,
shared resource protocols, bandwidth reservation
schemes, hierarchical schedulers, OS, etc.
Far fewer results for multiprocessors
Many NP-hard problems, few optimal results,
heuristic approaches, simplified task models,
only sufficient schedulability tests, etc.
Do we really need to investigate multiprocessor
real-time systems?

5
As Moore's law goes on...

Number of transistors per chip doubles every 18 to 24
months

6
...heating becomes a problem

P grows with V and f: clock speed limited to less than 4 GHz

7
Solution
Use a higher number of slower logic gates
Denser chips with transistors operating at lower
frequencies
MULTICORE SYSTEMS
8
The Multicore invasion

Intel's Core 2, Itanium, Xeon: 2, 4 cores
AMD's Opteron, Athlon 64 X2, Phenom: 2, 4 cores
IBM-Toshiba-Sony Cell processor: 8 cores (PSX3)
Microsoft's Xenon: 3 cores (Xbox 360)
ARM's MPCore: 4 cores
Sun's Niagara UltraSPARC: 8 cores
Tilera's TILE64: 64 cores
Nios II: x soft cores
TI, Freescale, Atmel, Broadcom, Picochip
(picoArray: up to 300 DSP cores), ...

9
Identical vs heterogeneous cores
ARM's MPCore

4 identical ARMv6 cores

STI's Cell Processor

One Power Processor Element (PPE)
8 Synergistic Processing Elements (SPEs)

10
System model

Platform with m identical processors
Task set $\tau$ with $n$ periodic or sporadic tasks $\tau_i$
Period or minimum inter-arrival time $T_i$
Worst-case execution time $C_i$
Deadline $D_i$
Utilization $U_i = C_i/T_i$, density $\lambda_i = C_i/\min(D_i, T_i)$
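The task parameters above can be captured in a few lines. This is a minimal sketch: the `Task` class, its field names and the sample values are illustrative assumptions, not part of the original slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Sporadic task; names are hypothetical, chosen for this sketch."""
    wcet: float      # worst-case execution time C_i
    deadline: float  # relative deadline D_i
    period: float    # period or minimum inter-arrival time T_i

    @property
    def utilization(self) -> float:
        # U_i = C_i / T_i
        return self.wcet / self.period

    @property
    def density(self) -> float:
        # lambda_i = C_i / min(D_i, T_i)
        return self.wcet / min(self.deadline, self.period)


# Made-up example task set
tasks = [Task(1, 4, 5), Task(2, 10, 10), Task(3, 6, 12)]
u_tot = sum(t.utilization for t in tasks)
lambda_tot = sum(t.density for t in tasks)
```

For implicit deadlines ($D_i = T_i$) density and utilization coincide; with constrained deadlines the density is the larger of the two.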

11
Problems addressed

Run-time scheduling problem
Schedulability problem

[Figure: tasks $\tau_1, \dots, \tau_5$ to be assigned to processors CPU1, CPU2 and CPU3]
12
Assumptions

Independent tasks
Job-level parallelism prohibited
the same job cannot be executed simultaneously on
more than one processor
Preemption and Migration support
a preempted task can resume its execution on a
different processor
Cost of preemption/migration integrated into task
WCET

13
Global vs partitioned scheduling

Single system-wide queue or multiple
per-processor queues

Global scheduler
Partitioned scheduler
14
Partitioned Scheduling

The scheduling problem reduces to:

Bin-packing problem
NP-hard in the strong sense
Various heuristics used: FF, NF, BF, FFDU, BFDD,
etc.

Uniprocessor scheduling problem
Well known
EDF: $U_{tot} \le 1$
RM (RTA)
...

[Figure: tasks $\tau_1, \dots, \tau_5$ partitioned among the processors]

Global (work-conserving) and partitioned
approaches are incomparable
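As an illustration of the two sub-problems, the sketch below combines the First-Fit (FF) heuristic with the uniprocessor EDF test $U_{tot} \le 1$ on each processor. It assumes implicit deadlines; the function name and the small floating-point tolerance are choices made for this example only.

```python
def first_fit_edf(utilizations, m):
    """Assign tasks to m processors with First-Fit.

    A task fits on a processor if the per-processor EDF bound U <= 1 still
    holds (implicit deadlines assumed). Returns the list of task indices per
    processor, or None if some task does not fit anywhere.
    """
    bins = [[] for _ in range(m)]
    load = [0.0] * m
    for idx, u in enumerate(utilizations):
        for p in range(m):
            if load[p] + u <= 1.0 + 1e-12:  # tolerance for float rounding
                bins[p].append(idx)
                load[p] += u
                break
        else:
            return None  # First-Fit found no processor for this task
    return bins


print(first_fit_edf([0.6, 0.6, 0.4, 0.4], 2))  # [[0, 2], [1, 3]]
print(first_fit_edf([0.6, 0.6, 0.6], 2))       # None
```

The second call fails although the total utilization (1.8) is below m = 2, which is the bin-packing loss that partitioned approaches pay.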
15
Global scheduling

The m highest priority ready jobs are always the
ones executing
Work-conserving scheduler
No processor is ever idled when a task is ready
to execute.

16
Global scheduling advantages

Load automatically balanced
Easier re-scheduling (dynamic loads, selective
shutdown, etc.)
Lower average response time (see queueing theory)
More efficient reclaiming and overload management
Number of preemptions
Migration cost can be mitigated by proper HW
(e.g., MPCore's Direct Data Intervention)
Few schedulability tests → Further research needed

17
Uniprocessor scheduling

EDF optimal for arbitrary job collections
Exact schedulability conditions
Linear test for implicit deadlines: $U_{tot} \le 1$
Pseudo-polynomial test for constrained and
arbitrary deadlines (Baruah et al. 90)
Optimal priority assignments for sporadic and
synchronous periodic task systems
RM for implicit deadlines
DM for constrained deadlines
Exact pseudo-polynomial schedulability test for
FP
Response Time Analysis (RTA)

18
Global Scheduling

No optimal scheduler known for general task
models
Pfair optimal for implicit deadlines: $U_{tot} \le m$
Preemption and synchronization issues
Classic schedulers are not optimal (Dhall's
effect)
Hybrid schedulers: EDF-US, RM-US, DM-DS,
AdaptiveTkC, fpEDF, EDF(k), EDZL, ...

$m$ light tasks + 1 heavy task: $U_{tot} \to 1$
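Dhall's effect can be reproduced with a small calculation. The sketch below uses one common textbook parameterization, which is an assumption here and not taken from the slides: m light tasks with $C = 2\epsilon$, $D = T = 1$, and one heavy task with $C = 1$, $D = T = 1 + \epsilon$, all released together under global EDF.

```python
from fractions import Fraction


def dhall_example(m, eps):
    """Return (heavy_task_misses_deadline, total_utilization).

    At time 0 the m light jobs have the earlier deadline (1 < 1 + eps), so
    global EDF runs them on all m processors during [0, 2*eps]. Only then
    can the heavy job start.
    """
    light_c = 2 * eps
    heavy_c, heavy_d = Fraction(1), 1 + eps
    heavy_finish = light_c + heavy_c           # starts at 2*eps, runs 1 unit
    u_tot = m * light_c / 1 + heavy_c / heavy_d
    return heavy_finish > heavy_d, u_tot


missed, u_tot = dhall_example(m=4, eps=Fraction(1, 100))
print(missed, float(u_tot))
```

The heavy job finishes at $1 + 2\epsilon$, after its deadline $1 + \epsilon$, while the total utilization tends to 1 as $\epsilon \to 0$, no matter how many processors are available.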
19
Global scheduling main results

Only sufficient schedulability tests
Utilization-based tests (implicit deadlines)
EDF → Goossens et al.: $U_{tot} \le m(1 - U_{max}) + U_{max}$
fpEDF → Baruah: $U_{tot} \le (m+1)/2$
RM-US → Andersson et al.: $U_{tot} \le m^2/(3m-2)$
Polynomial tests
EDF, FP → Baker: $O(n^2)$ and $O(n^3)$ tests
EDZL → Cirinei, Baker: $O(n^2)$ test
Pseudo-polynomial tests
EDF, FP → Fisher, Baruah: load-based tests
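The three utilization-based conditions can be written as simple predicates. This is a sketch for implicit-deadline task sets; the function names are hypothetical, and each test is only sufficient for the scheduler it is named after.

```python
def edf_goossens(utils, m):
    """Global EDF: U_tot <= m * (1 - U_max) + U_max."""
    u_max = max(utils)
    return sum(utils) <= m * (1 - u_max) + u_max


def fpedf_bound(utils, m):
    """fpEDF: U_tot <= (m + 1) / 2."""
    return sum(utils) <= (m + 1) / 2


def rm_us_bound(utils, m):
    """RM-US: U_tot <= m^2 / (3m - 2)."""
    return sum(utils) <= m * m / (3 * m - 2)


utils = [0.5, 0.5, 0.5]
print(edf_goossens(utils, 2), fpedf_bound(utils, 2), rm_us_bound(utils, 2))
```

A failed check does not prove the task set unschedulable, since none of these conditions is necessary.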

20
Density-based tests

EDF: $\lambda_{tot} \le m(1 - \lambda_{max}) + \lambda_{max}$
EDF-DS[1/2]: $\lambda_{tot} \le (m+1)/2$
DM: $\lambda_{tot} \le m(1 - \lambda_{max})/2 + \lambda_{max}$
DM-DS[1/3]: $\lambda_{tot} \le (m+1)/3$

ECRTS05
Gives highest priority to (at most m-1) tasks
having $\lambda_t \ge 1/2$, and schedules the remaining ones
with EDF
OPODIS05
Gives highest priority to (at most m-1) tasks
having $\lambda_t \ge 1/3$, and schedules the remaining ones
with DM (only constrained deadlines)
21
Critical instant

A particular configuration of releases that leads
to the largest possible response time of a task.
Possible to derive exact schedulability tests
analyzing just the critical instant situation.
Uniprocessor FP and EDF: a critical instant is
when
all tasks arrive synchronously
all jobs are released as soon as permitted
Response Time Analysis for uniprocessors
FP → the response time of task $\tau_k$ is given by the
fixed point of $R_k$ in the iteration
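The uniprocessor iteration referred to here is sketched below, assuming the classic fixed-priority recurrence $R_k \leftarrow C_k + \sum_{i<k} \lceil R_k/T_i \rceil C_i$ with constrained deadlines and integer parameters. The function name and the `(C, D, T)` tuple layout are choices made for this example.

```python
def rta_uniprocessor_fp(tasks):
    """Classic uniprocessor RTA for fixed priorities.

    tasks: list of (C, D, T) tuples sorted by decreasing priority.
    Returns, per task, the fixed point R_k, or None if it exceeds D_k.
    """
    result = []
    for k, (c_k, d_k, _) in enumerate(tasks):
        r = c_k
        while True:
            # -(-r // t_i) is an integer ceiling of r / t_i
            nxt = c_k + sum(-(-r // t_i) * c_i for c_i, _, t_i in tasks[:k])
            if nxt == r or nxt > d_k:
                r = nxt
                break
            r = nxt
        result.append(r if r <= d_k else None)
    return result


print(rta_uniprocessor_fp([(1, 4, 4), (2, 6, 6), (3, 12, 12)]))  # [1, 3, 10]
```

The iteration starts at $C_k$ and only grows, so it either reaches a fixed point or passes the deadline after finitely many steps.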

22
Multiprocessor anomaly

Synchronous periodic arrival of jobs is not a
critical instant for multiprocessors

$\tau_1 = (1,1,2)$, $\tau_2 = (1,1,3)$, $\tau_3 = (5,6,6)$
Synchronous periodic situation
Second job of $\tau_2$ delayed by one unit
(from Bar07)
Need to find pessimistic situations to derive
sufficient schedulability tests
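The anomaly can be checked with a small unit-step simulation on two processors. The sketch assumes that each triple is $(C_i, D_i, T_i)$ and that priorities follow the task index ($\tau_1$ highest); both are assumptions of this example, as are the function and variable names.

```python
def simulate_global_fp(wcets, releases, m, horizon):
    """Unit-step simulation of global fixed-priority scheduling.

    wcets[i] is the execution time of task i (index 0 = highest priority);
    releases[i] lists the release times of its jobs.
    Returns {(task, release): finishing time} for every completed job.
    """
    pending = []  # each job is [task index, release time, remaining work]
    finish = {}
    for t in range(horizon):
        for i, times in enumerate(releases):
            if t in times:
                pending.append([i, t, wcets[i]])
        pending.sort(key=lambda job: (job[0], job[1]))
        for job in pending[:m]:  # the m highest-priority ready jobs run
            job[2] -= 1
            if job[2] == 0:
                finish[(job[0], job[1])] = t + 1
        pending = [job for job in pending if job[2] > 0]
    return finish


wcets = [1, 1, 5]
synchronous = simulate_global_fp(wcets, [[0, 2, 4], [0, 3], [0]], m=2, horizon=8)
delayed = simulate_global_fp(wcets, [[0, 2, 4], [0, 4], [0]], m=2, horizon=8)
print(synchronous[(2, 0)], delayed[(2, 0)])  # 6 7
```

In the synchronous case $\tau_3$ completes exactly at its deadline 6. Delaying the second job of $\tau_2$ to time 4 makes it coincide with a job of $\tau_1$, both processors are taken for one more unit, and $\tau_3$ completes at 7.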
23
Introducing the interference
$I_k$: total interference suffered by task $\tau_k$
$I_k^i$: interference of task $\tau_i$ on task $\tau_k$
[Figure: schedule of $\tau_k$ on CPU1, CPU2 and CPU3 between $r_k$ and $r_k + R_k$, with the interfering contributions $I_k^1, \dots, I_k^8$]
24
Limiting the interference
It is sufficient to consider at most the portion
$(R_k - C_k + 1)$ of each term $I_k^i$ in the sum
[Figure: schedule of $\tau_k$ on CPU1, CPU2 and CPU3 between $r_k$ and $r_k + R_k$, with the interfering contributions $I_k^1, \dots, I_k^8$]
It can be proved that $WCRT_k$ is given by the fixed
point of
25
Bounding the interference

Exactly computing the interference is complex
Pessimistic assumptions
Bound the interference of a task with the
workload
Use an upper bound on the workload.

26
Bounding the workload

Consider a situation in which
The first job executes as close as possible to
its deadline
Successive jobs execute as soon as possible

[Equation: upper bound on the workload, given by the contribution of all jobs excluding the last one plus the contribution of the last job]
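A sketch of a workload bound for this densest-packing situation follows. It assumes the form $W_i(L) = N_i(L)\,C_i + \min(C_i,\ L + D_i - C_i - N_i(L)\,T_i)$ with $N_i(L) = \lfloor (L + D_i - C_i)/T_i \rfloor$, which is how such a bound is usually stated for a window of length $L$; the exact expression on the slide is not available, and the function name is hypothetical. Integer parameters are assumed.

```python
def workload_bound(c_i, d_i, t_i, length):
    """Upper bound on the work task i can execute in a window of given length.

    The first job finishes exactly at its deadline, later jobs run as soon
    as they are released.
    """
    # jobs that fit entirely in the window (all jobs excluding the last one)
    n_i = (length + d_i - c_i) // t_i
    # the last job contributes at most C_i, limited by the remaining window
    last = min(c_i, length + d_i - c_i - n_i * t_i)
    return n_i * c_i + last


print(workload_bound(2, 5, 5, 8))  # 5
```

For $C_i = 2$, $D_i = T_i = 5$ and $L = 8$: the first job runs in [0, 2], the next in [2, 4], and the third is released at 7 and gets one unit before the window closes, 5 units in total.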
27
RTA for generic global schedulers

An upper bound on the WCRT of task k is given by
the fixed point of Rk in the iteration
The slack of task k is at least

[Figure: $R_k$ and $S_k$]
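One way to write the iteration and the slack bound, assuming the workload bound $W_i$ is used in place of the interference and each term is limited to $R_k - C_k + 1$ as discussed earlier (a reconstruction under these assumptions, not the slide's own typesetting):

```latex
R_k \leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i \neq k}
      \min\bigl( W_i(R_k),\; R_k - C_k + 1 \bigr) \right\rfloor,
\qquad R_k^{(0)} = C_k
\\[4pt]
S_k \ge D_k - R_k
```

The second line holds whenever the iteration converges to a value $R_k \le D_k$.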
28
Improvement using slack values

Consider a situation in which
The first job executes as close as possible to
its deadline
Successive jobs execute as soon as possible

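[Equation: workload bound refined with the slack values, given by the contribution of all jobs excluding the last one plus the contribution of the last job]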
29
Improvement using slack values

30
RTA for generic global schedulers

An upper bound on the WCRT of task k is given by
the fixed point of Rk in the iteration

If a fixed point $R_k \le D_k$ is reached for every
task $\tau_k$ in the system, the task set is schedulable
with any work-conserving global scheduler.
31
Iterative schedulability test

1. All slacks initialized to zero
2. Compute slack lower bound for tasks 1, ..., n
   if higher than old value → update slack bound
   if lower, do nothing
3. If all tasks have a positive slack lower bound →
   return success
4. If no slack has been updated for tasks 1, ..., n →
   return fail
5. Otherwise, return to point 2
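The steps above can be sketched as follows. The code assumes integer `(C, D, T)` tuples, a slack-aware workload bound of the usual form (the first job is shifted earlier by the slack $S_i$), and it declares success when a fixed point $R_k \le D_k$ exists for every task; all function names are hypothetical.

```python
def workload(task, length, slack):
    """Workload bound of one task in a window, shifted by its slack bound."""
    c, d, t = task
    n = (length + d - c - slack) // t
    return n * c + min(c, length + d - c - slack - n * t)


def response_time_bound(tasks, k, m, slacks):
    """Fixed point of the RTA iteration for task k, or None if above D_k."""
    c_k, d_k, _ = tasks[k]
    r = c_k
    while r <= d_k:
        interference = sum(
            min(workload(tasks[i], r, slacks[i]), r - c_k + 1)
            for i in range(len(tasks)) if i != k
        )
        nxt = c_k + interference // m
        if nxt == r:
            return r
        r = nxt
    return None


def iterative_test(tasks, m):
    """Sufficient test for any work-conserving global scheduler."""
    slacks = [0] * len(tasks)                  # step 1
    while True:
        updated = False
        bounds = []
        for k, (_, d_k, _) in enumerate(tasks):  # step 2
            r = response_time_bound(tasks, k, m, slacks)
            bounds.append(r)
            if r is not None and d_k - r > slacks[k]:
                slacks[k] = d_k - r
                updated = True
        if all(r is not None for r in bounds):   # step 3
            return True
        if not updated:                          # step 4
            return False
        # step 5: otherwise run another round with the improved slacks


print(iterative_test([(1, 4, 4), (1, 4, 4), (2, 6, 6)], m=2))  # True
print(iterative_test([(1, 2, 2), (1, 2, 2), (2, 2, 2)], m=2))  # False
```

Slack bounds only ever increase and are bounded by the deadlines, so the outer loop terminates. A `False` result means the test is inconclusive, not that the task set is unschedulable.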

32
RTA refinement for Fixed Priority

The interference caused by lower priority tasks on
higher priority tasks is always null
An upper bound on the WCRT of task k can be given
by the fixed point of Rk in the iteration

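Under fixed priorities only the higher priority tasks contribute, so a plausible form of the iteration, assuming tasks are indexed by decreasing priority and $W_i$ denotes the workload bound, is:

```latex
R_k \leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i < k}
      \min\bigl( W_i(R_k),\; R_k - C_k + 1 \bigr) \right\rfloor
```

This is the generic iteration with the sum restricted to $i < k$; it is a reconstruction under the stated assumptions.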
33
RTA refinement for EDF

A different bound can be derived analyzing the
worst-case workload in a situation in which
The interfering and interfered tasks have a
common deadline
All jobs execute as late as possible

An upper bound on the WCRT of task k is given by
the fixed point of Rk in the iteration

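A sketch of the EDF-specific bound for the situation described above, assuming constrained deadlines, a slack lower bound $S_i$ for each interfering task, and the workload bound $W_i$ of the generic analysis (the symbol $J_k^i$ is introduced here for illustration):

```latex
J_k^i = \left\lfloor \frac{D_k}{T_i} \right\rfloor C_i
        + \min\bigl( C_i,\; \max(0,\; D_k \bmod T_i - S_i) \bigr)
\\[4pt]
R_k \leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i \neq k}
      \min\bigl( W_i(R_k),\; J_k^i,\; R_k - C_k + 1 \bigr) \right\rfloor
```

The first term of $J_k^i$ counts the jobs of $\tau_i$ that lie entirely inside the window of length $D_k$; the second is the part of the earliest job that can still execute inside it when that job completes $S_i$ before its deadline.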
34
Complexity

Pseudo-polynomial complexity
Fast average behavior
We verified the schedulability of millions of
task sets in a few minutes on a normal device.
Lower complexity for Fixed Priority systems
at most one slack update per task, if slacks are
updated in decreasing priority order.
Possible to reduce complexity limiting the number
of rounds

35
Polynomial complexity test

A simpler test can be derived avoiding the
iterations on the response times
A lower bound on the slack of $\tau_k$ is given by
The iteration on the slack values is the same
Performance comparable to the RTA-based test
Complexity down to $O(n^2)$
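A plausible form of this slack bound, assuming the window is fixed to the whole deadline $D_k$ instead of iterating on $R_k$, and with $W_i$ the slack-aware workload bound:

```latex
S_k \ge D_k - C_k - \left\lfloor \frac{1}{m} \sum_{i \neq k}
      \min\bigl( W_i(D_k),\; D_k - C_k + 1 \bigr) \right\rfloor
```

Each evaluation costs $O(n)$ per task, hence $O(n^2)$ per round over the whole task set.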

36
Experimental results for EDF

2 processors, constrained deadlines, 1,000,000 task sets generated
Our test is consistently superior at all utilizations

[Figure: task sets vs. task set utilization. Curves: total task sets generated, I-BCL EDF, Goossens et al. 03, Baker et al. 07, Bertogna et al. 05, our test. Annotation: improvement over existing solutions]
37
Experimental results for FP
2 processors, constrained deadlines, 1,000,000 task sets generated
Our test is consistently superior at all utilizations

[Figure: task sets vs. task set utilization. Curves: total task sets generated, I-BCL FP, Bertogna et al. 05, Baker et al. 07, density bound, our test]
38
FP vs EDF

4 processors, constrained deadlines, 1,000,000 task sets generated
Our FP test is consistently superior to all tests at every utilization

[Figure: task sets vs. task set utilization. Curves: total task sets generated, I-BCL FP, Baker et al. 07, I-BCL EDF, Goossens et al. 03, our FP test, our EDF test]
39
Conclusions

Multiprocessor Real-Time systems are a promising
field to explore.
Still few existing results, far from tight
conditions.
We contributed to filling this gap.
Future work
Find tighter schedulability tests.
Use our techniques to analyze the efficiency of
other scheduling algorithms (EDZL, EDF-US, FP-DS,
etc).
Take into account exclusive resource access.
Integrate into Resource Reservation framework.

40
The end
41
Other research activities

Limited-preemption EDF
Reducing Resource Holding Times
Shared resources and open environments

42
ARM's MPCore
43
Frequency and power

$f$: operating frequency
$V$: supply voltage ($V = 0.3 + 0.7 f$)
Reducing the voltage causes a higher frequency
reduction
$I_{leak}$: leakage current (becomes non-negligible)
$P = P_{dynamic} + P_{static}$: power consumed
$P_{dynamic} \propto A C V^2 f$ (main contributor until hundreds
of nm)
$P_{static} \propto V I_{leak}$ (always present, due to
subthreshold and gate-oxide leakage)
Reducing $V$ allows a quadratic reduction of
$P_{dynamic}$

44
Power density
45
How many cores in the future?

Intel's 80-core prototype already available
Able to transfer 1 TB of data per second (Core 2 Duo
reaches 1.66 GB of data per second)
To be released in 5 years

46
Beyond 2 billion transistors/chip

Intel's Tukwila
Itanium based
2.046 billion FETs
Quad-core
65 nm technology
2 GHz at 170 W
30 MB cache
2 SMT → 8 threads/ck

47
Intel's timeline
48

From 4004 (1971) to Pentium D (2005)
Technology: 10 µm → 65 nm (150×)
f: 100 kHz → 3 GHz (25,000×)
MOS: 2,300 → 291,000,000 (125,000×)
P: 0.2 W → 100 W (500×)
Vdd reduced (from 5 V to 1 V)
Not all MOS change state
A large part of the chip is occupied by cache
$f \propto V_{dd} - V_{tt}$
$I_{leak} \propto V_{dd},\ 1/V_{tt}$

49
Intel Pentium IV (2000)
Intel 4004 (1971)
50
Itanium temperature plot
51
Problems addressed

Run-time scheduling problem
Schedulability problem

[Figure: tasks $\tau_1, \dots, \tau_5$ to be assigned to processors CPU1, CPU2 and CPU3]
52

Incandescent light bulb: 25-100 W
Compact fluorescent lights: 5-30 W
Typical car: 25 kW
Human climbing stairs: 200 W
1 kWh = 1 kW constantly supplied for 1 h
ENEL: 0.13-0.18 €/kWh

53
Density and utilization bounds
54
Uniprocessor feasibility
55
Uniprocessor static priority run-time scheduling
56
Uniprocessor static priority feasibility
57
Uniprocessor static priority schedulability
58
Multiprocessor feasibility
59
Multiprocessor run-time scheduling
60
Feasibility conditions
$U_{tot} > m$: not feasible
load $> m$: not feasible
???
Sufficient feasibility and schedulability tests
$\sum_i C_i/\min(D_i, T_i) \le m$: feasible
61
Multiprocessor static job priority feasibility
62
Multiprocessor static job priority schedulability
63
Multiprocessor static priority run-time scheduling
64
Multiprocessor static priority feasibility
65
Multiprocessor static priority schedulability
66
RTA for Uniprocessors

For FP, the worst-case response time of a task is
given by the first instance released at a
critical instant
For EDF, it is given by an instance in a busy
interval starting with a critical instant
With these observations it is possible to compute
the WCRT of all tasks. Example: for FP, the WCRT
of a task $\tau_k$ is given by the fixed point of

67
RTA refinement for EDF

The previous bound is still valid
A different bound can be derived analyzing the
worst-case workload in a situation in which
The interfering and interfered tasks have a
common deadline
All jobs execute as late as possible

[Figure: timeline of the interfering task $\tau_i$, showing $D_i$, $T_i$, $S_i$, $C_i$ and the window $D_k$]
68
RTA refinement for EDF

A different bound can be derived analyzing the
worst-case workload in a situation in which
The interfering and interfered tasks have a
common deadline
All jobs execute as late as possible

[Figure: timeline of the interfering task $\tau_i$, showing $D_i$, $T_i$, $S_i$, $C_i$ and the window $D_k$]
69
Polynomial complexity test