Title: Operating System Requirements for Embedded Systems
1 Operating System Requirements for Embedded Systems
2 Complexity trends in OS
- OS functionality drives the complexity
- Figure labels (device classes): small controllers, sensors, home appliances, mobile phones, PDAs, game machines, routers
3 Requirements of EOS
- Memory-resident size is an important consideration
- Data structures are optimized
- Kernel is optimized and usually written in assembly language
- Support for signaling interrupts
- Real-time scheduling
- Tightly coupled scheduler and interrupts
- Power management capabilities
- Power-aware scheduling
- Control of non-processor resources
4 Embedded OS Design approach
- Traditional OS: monolithic or distributed
- Embedded: layered is the key (Constantine D. P, UIUC 2000)
- Layers shown in the figure: power management, basic loader, interrupt signalling, real-time scheduler, memory management, custom device support, networking support
5 Power Management by OS
- Static approach: rely on pre-set parameters
- Example: switch off the power to devices that have not been in use for a while (a pre-calculated number of cycles). Used in laptops now.
- Dynamic approach: based on the dynamic condition of workloads and on the specified power optimization guidelines
- Example: restrict multitasking progressively, reduce context switching, even avoid cache/memory access, etc.
6 Popular E-OS
- WinCE (proprietary, optimized assembly, ...)
- VxWorks
- Micro Linux
- MuCOS
- Java Virtual Machine (picoJava) OS
- Most likely the first open EOS!
7 Interrupts
- Each device has a 1-bit arm register that software sets if interrupts from the device are to be accepted.
- The CCR is used to program the interrupts.
- A good design should provide for extensibility in the number of devices that can issue interrupts and also in the number of ISRs.
- Either polled or vectored interrupts are used, depending on the nature of the processor and the I/O devices.
- Polling: dedicated controllers, periodic data acquisition, and slow I/O devices
- Interrupts: real-time environments, when events are unpredictable and asynchronous
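The arm-bit and vectored-dispatch ideas on this slide can be illustrated with a small simulation. This is only a sketch: the class `InterruptController`, its method names, and the `poll` helper are hypothetical and do not model any particular processor.

```python
class InterruptController:
    """Toy model: one arm bit per device plus a vector table of ISRs."""

    def __init__(self):
        self.armed = {}            # device name -> arm bit
        self.vectors = {}          # device name -> ISR callable
        self.global_enable = True  # stands in for a global interrupt mask

    def register(self, device, isr):
        # Adding a device only adds table entries, so the number of
        # devices and ISRs can grow without changing the dispatcher.
        self.vectors[device] = isr
        self.armed[device] = False

    def arm(self, device, on=True):
        self.armed[device] = on

    def raise_irq(self, device, payload=None):
        """Vectored dispatch: run the device's ISR; return True if it ran."""
        if not self.global_enable or not self.armed.get(device, False):
            return False
        self.vectors[device](payload)
        return True


def poll(devices):
    """Polling alternative: scan ready flags and return the ready devices."""
    return [name for name, ready in devices.items() if ready]
```

A request from an unarmed device is simply ignored, which is the behaviour the arm bit is meant to provide.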
8 Direct Memory Access
- DMA is used when low latency and/or high bandwidth is required (disk I/O, video output, or low-latency data acquisition).
- Software DMA: starts with a normal interrupt; the ISR sets the device registers and initiates the I/O; the processor returns to normal operation; on completion of the I/O, the device informs the processor.
- Hardware DMA: the above can be implemented in hardware.
- Burst DMA: used when buffers are placed in the I/O devices (e.g. disk).
- Low-latency asynchronous I/O cannot use burst DMA.
9 Real-Time Scheduling
- Interrupts are heavily used in scheduling when real-time events must be completed by some deadline.
- Events, threads, tasks, or processes need to support priority, deadline, blocking, restoring, and nesting.
- NP-hard problem, so no efficient optimal solution is known.
- Greedy heuristics are proposed as working solutions under some assumptions.
- Dynamic RT scheduling: use greedy heuristics together with priority-based interrupts.
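As one concrete example of a greedy heuristic, the sketch below uses earliest-deadline-first (EDF) with preemption at unit time steps. EDF is chosen here for illustration and is not named on the slide; the task tuple layout and the function name `edf_schedule` are assumptions.

```python
import heapq


def edf_schedule(tasks):
    """Greedy EDF over integer time units.

    tasks: list of (name, release, duration, deadline), duration >= 1.
    Returns (completion order, names of tasks that missed their deadline).
    """
    pending = sorted(tasks, key=lambda t: t[1])  # ordered by release time
    ready = []                                   # heap of (deadline, name, remaining)
    now, i = 0, 0
    finished, missed = [], []
    while i < len(pending) or ready:
        # Release every task whose release time has arrived.
        while i < len(pending) and pending[i][1] <= now:
            name, _, duration, deadline = pending[i]
            heapq.heappush(ready, (deadline, name, duration))
            i += 1
        if not ready:
            now = pending[i][1]  # idle until the next release
            continue
        # Greedy choice: run the task with the earliest deadline for one unit.
        deadline, name, remaining = heapq.heappop(ready)
        now += 1
        if remaining > 1:
            heapq.heappush(ready, (deadline, name, remaining - 1))
        else:
            finished.append(name)
            if now > deadline:
                missed.append(name)
    return finished, missed
```

A newly released task with an earlier deadline preempts the running one at the next time step, which plays the role of a priority-based interrupt in this toy model.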
10 OS directed power reduction
- Dynamic power management: determine the power state of a device based on the current workload, and move through the power transitions based on the shutdown policy.
- Usually, instead of power off/on, there are dynamic voltage settings and variable clock speeds => multiple power states
- Previous work
- Shut down a device if it has been idle long enough
- Hardware-centric => observe past requests at the device to predict future idleness; no OS information, no study of the characteristics of requesters
- Use a stochastic model and assume requests arrive randomly from a single requester, without distinguishing the source of the requests
11 OS directed power reduction
- Disk request sources: compiler, text editor, ftp program
- Network card request sources: internet browser or telnet session?
- It is important to have an accurate model of requesters in a concurrent environment (Task-Based Power Management, TBPM): a software-centric approach
- Two methods to reduce power: adjust the CPU clock speed, or use sleeping states
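12 Process states
- Figure: process state diagram with states new, ready, running, waiting, terminated
- new -> ready: admitted
- ready -> running: scheduler dispatch
- running -> ready: interrupt
- running -> waiting: I/O or event wait
- waiting -> ready: I/O or event completion
- running -> terminated: exit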
13 TBPM's supplement to device drivers
- Four problems
- Requests are generated by multiple tasks. TBPM uses knowledge from the OS kernel to separate tasks.
- Tasks are created, executed, and terminated. (A device driver has no knowledge of multiple tasks and their termination.)
- Tasks have different characteristics in device utilization.
- A task can generate requests while it is running. TBPM considers the CPU time of tasks when deciding the power states.
- Data structures
- Device-requester utilization matrix U(d, r): utilization of device d by requester r
- Processor utilization vector P(r): percentage of processor time used by requester r
14 Updating U, P
Example utilization matrix U(d, r):
        gcc    emacs  netscape
HDD     12     0.4    0.7
NIC     0      0      2.3
Each matrix element is the reciprocal of the average time between requests (TBR):
$TBR_n = \alpha \cdot TBR + (1 - \alpha) \cdot TBR_{n-1}$
$U(d, r) = 1 / TBR_n$, with $0 < \alpha < 1$
If $\alpha = 0$, $TBR_n$ stays constant at the first TBR; if $\alpha = 1$, $TBR_n$ is the last TBR.
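The discounted update of the average time between requests can be sketched in a few lines. The function names and the default weighting factor (called `alpha` here, an assumed name and value) are illustrative only.

```python
def update_tbr(prev_avg, latest_tbr, alpha=0.5):
    """Discounted average of the time between requests (TBR).

    alpha weights the most recent TBR; (1 - alpha) weights the history.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * latest_tbr + (1.0 - alpha) * prev_avg


def utilization(avg_tbr):
    """U(d, r) is the reciprocal of the averaged TBR."""
    return 1.0 / avg_tbr
```

With the weighting factor at 0 the average never moves from its initial value, and at 1 it tracks only the latest observation, matching the two limiting cases described on the slide.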
15 Updating U, P
- P(r) is the percentage of CPU time spent executing task r, i.e. $P(r) = \text{CPU time}(r) / \sum_{r'} \text{CPU time}(r')$ over all requesters.
- Updated with a sliding-window scheme, not the discounted scheme used for U.
- In the case of I/O-bound bursty requests, TBR will show high utilization but cannot capture the running-time requirements.
- A sliding window is used to compute how CPU time is distributed among processes. The window should be long enough to sample all processes, yet short enough to reflect workload variation.
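A sliding-window estimate of P(r) might look like the following sketch. The class name `CpuShareWindow`, the record format, and the choice to expire a slice once its end time falls outside the window are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class CpuShareWindow:
    """Tracks each requester's share of CPU time over a sliding window."""

    def __init__(self, window):
        self.window = window
        self.slices = deque()  # (end_time, requester, cpu_time)

    def record(self, end_time, requester, cpu_time):
        self.slices.append((end_time, requester, cpu_time))
        # Drop slices that ended at or before the start of the window.
        while self.slices and self.slices[0][0] <= end_time - self.window:
            self.slices.popleft()

    def shares(self):
        """Return P(r) for every requester seen inside the window."""
        totals = defaultdict(float)
        for _, requester, cpu_time in self.slices:
            totals[requester] += cpu_time
        total = sum(totals.values())
        if total == 0:
            return {}
        return {r: c / total for r, c in totals.items()}
```

The `window` parameter carries the trade-off noted above: too short and some processes are never sampled, too long and changes in workload are smoothed away.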
16 Shutdown condition
- Break-even time: the minimum length of idle time for which shutting down saves power
- Depends on device characteristics
- Independent of workloads
- Performance consideration
- Interactive system: if many shutdowns are issued in a short time, response time increases => perceived interactivity degrades
- The user might react to obtain a response, causing a steep increase in system load.
- Restrict two consecutive shutdowns from occurring within a set interval (say, the time to wake up)
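The slide does not give a formula for the break-even time. A commonly used two-state model, assumed here, compares staying in the working state for an idle period T (energy P_work * T) with shutting down (transition energy E_trans plus P_sleep * (T - T_trans)). Setting the two equal gives the sketch below; all parameter names are hypothetical.

```python
def break_even_time(p_work, p_sleep, e_trans, t_trans):
    """Minimum idle time for which a shutdown saves energy.

    p_work, p_sleep: power in the working and sleeping states (W)
    e_trans: total energy of shutting down and waking up (J)
    t_trans: total time of shutting down and waking up (s)
    """
    if p_work <= p_sleep:
        raise ValueError("sleeping must draw less power than working")
    t = (e_trans - p_sleep * t_trans) / (p_work - p_sleep)
    # The idle period can never be shorter than the transition itself.
    return max(t, t_trans)


def should_shut_down(predicted_idle, t_be):
    """Shut down only if the predicted idle period exceeds break-even."""
    return predicted_idle > t_be
```

Only device parameters appear in the calculation, which is consistent with the break-even time depending on device characteristics and not on the workload.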
17 TBPM Procedure
- Integration of power management with process management
- Figure: process state diagram (new, ready, running, waiting, terminated) annotated with power-management actions: allocate column (new), delete column (terminated), update P(r), update U(d,r)
18 TBPM procedure
- A requester column is allocated when a new task is created, and the column is deleted when the task terminates.
- Utilization is initially set to zero and is updated when a request is issued. The PM evaluates the utilization in the process scheduler.
- In a lightly loaded system
- Sparse requests will not cause the PM to keep a device in the working state for long, since P(r) is small for this requester.
- With heavy workload
- A task that does not use the device frequently lets the PM shut the device down after its use (U(d) is small).
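The bookkeeping described on this slide can be sketched as a small table keyed by device and requester. The class `TbpmTable` and its method names are hypothetical, and combining the entries into a per-device figure U(d) as the sum of U(d, r) * P(r) is an assumption; the slides do not state the exact formula.

```python
class TbpmTable:
    """Toy bookkeeping for U(d, r) and P(r), driven by process events."""

    def __init__(self, devices):
        self.u = {d: {} for d in devices}  # U(d, r)
        self.p = {}                        # P(r)

    def task_created(self, r):
        # Allocate a column for the new requester, initialised to zero.
        for column in self.u.values():
            column[r] = 0.0
        self.p[r] = 0.0

    def task_terminated(self, r):
        # Delete the requester's column.
        for column in self.u.values():
            column.pop(r, None)
        self.p.pop(r, None)

    def request_issued(self, d, r, utilization):
        self.u[d][r] = utilization

    def cpu_shares_updated(self, shares):
        for r in self.p:
            self.p[r] = shares.get(r, 0.0)

    def device_utilization(self, d):
        # Assumption: weight each requester's utilization by its CPU share.
        return sum(u * self.p[r] for r, u in self.u[d].items())
```

Under this weighting, a requester with sparse requests and a small CPU share contributes little to U(d), so it does not keep the device in the working state.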
19 Experiments
- Platform
- Personal computer, TBPM implemented in the Linux kernel, Red Hat 6.0
- Goal: control the power states of the HD and the (wireless) network transmitter
- The kernel and device drivers of a PC with X Window and networking were modified. The PC is configured as a client: server daemons (HTTP server, internet news server) are turned off, and cron tasks are scheduled at low frequencies. Power state changes in the HD and NIC are emulated with two states: working and sleeping.
- To compare with other PM policies, power state changes were emulated without actually setting the hardware power state. A set of variables recorded device statistics: the number of shutdowns and wake-ups under the various policies.
- See Table 1 for hardware parameters
20 Experimental Results
- Other PM policies
- Exponential regression: relationship between two adjacent idle periods
- Event-driven semi-Markov model
- Policy that sets the timeout value to Tbe (the break-even time)
- Timeouts of one and two minutes
- At least 10 hours of workload running
- Table 2 shows the compared results
- Ts: time in the sleeping state; Nd: number of shutdowns; St: longest sequence that causes delay ever 30 sec; Pa: average power in W; R: power consumption relative to TBPM.
21 Dynamic Voltage Scaling in processors
- Processor usage model
- Compute-intensive: uses full throughput
- Low-speed: a fraction of full throughput; fast processing is not required
- Idle
- Figure labels: compute-intensive and short-latency processes, maximum processor speed, desired throughput, system idle, background and long-latency processes
22 Why DVS
- Design objective of a processor: provide the highest possible peak throughput for compute-intensive tasks while maximizing battery life during the remaining low-speed and idle periods.
- Common power-saving technique: reduce the clock frequency during non-compute-intensive activity.
- This reduces power but not the total energy consumed per task, since energy consumption is frequency-independent to a first-order approximation.
- Conversely, reducing the voltage improves energy efficiency but compromises peak throughput.
23 DVS
- If both clock frequency and voltage are dynamically varied in response to computational load demands, the energy consumed per task can be reduced during low-computation periods while retaining peak throughput when required.
- This strategy, which achieves the highest possible energy efficiency for a time-varying computational load, is called DVS.
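The argument for scaling voltage together with frequency can be checked numerically with the usual first-order CMOS model, in which switching energy per cycle is proportional to Vdd squared. This is a sketch under that assumption; the effective capacitance `c_eff` and the function names are illustrative values, not figures from the slides.

```python
def task_energy(cycles, vdd, c_eff=1e-9):
    """First-order switching energy for a task: C_eff * Vdd^2 per cycle."""
    return c_eff * vdd ** 2 * cycles


def task_time(cycles, freq_hz):
    """Execution time of the task at a given clock frequency."""
    return cycles / freq_hz


def average_power(cycles, vdd, freq_hz, c_eff=1e-9):
    """Average power while the task runs: energy divided by time."""
    return task_energy(cycles, vdd, c_eff) / task_time(cycles, freq_hz)
```

In this model, halving the frequency at a fixed voltage halves the power but leaves `task_energy` unchanged, while halving the voltage as well cuts the energy per task to a quarter.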
24 DVS overview
- Key components
- An OS that can intelligently vary the processor speed
- A regulation loop that generates the minimum voltage required for the desired frequency
- A processor that can operate over a wide voltage range
- Circuit characteristics? (F V)
- HW or SW control of processor speed?
- SW, since HW cannot know whether the instruction being executed is part of a compute-intensive task!
- Control from the application program?
- No, an application cannot set the processor speed because it is unaware of other tasks. But applications can give useful information about their requirements.
25 DVS
- As frequency varies, Vdd must vary in order to optimize energy consumption. But the SW is not aware of the minimum supply voltage required for a given speed; it is a function of the HW implementation, process variation, and temperature.
- A ring oscillator provides this translation.
- Figure labels (system block diagram): CPU, ARM8, 16KB cache, co-processor, write buffer, 64KB SRAM, I/O chip, bus, 3.3V, VCO, regulator, Fdesired, Vdd, V battery
26 DVS
- The transition time and transition energy must be known in order to know the cost of interrupt and wake-up latency.
- Voltage scheduler as a new OS component
- Controls the processor speed by writing the desired frequency to a system control register, which the regulation loop uses to adjust the voltage and frequency
- Operates the processor at the minimum throughput level required by the current tasks, minimizing energy consumption
- Note: the job of determining the optimal frequency and the job of scheduling are independent of each other.
- Hence, a voltage scheduler can be retrofitted to the OS.
27 Voltage Scheduling Algorithm
- Determines the optimal clock frequency by combining the computation requirements of all active tasks in the system, ensuring that latency requirements are met given the task ordering of the temporal scheduler.
- The multiple-task case is complex. It involves predicting the workload, with the prediction updated by the VS at the end of each task.
- Research issue!
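One simple way to combine task requirements into a single clock frequency is sketched below. It assumes the cycle count of each task is known (in practice it would be predicted), that tasks run back to back in the order fixed by the temporal scheduler, and that the processor offers a discrete set of speeds. The function names and inputs are hypothetical and this is not the specific algorithm from the slides.

```python
def min_frequency(tasks, now=0.0):
    """Lowest constant frequency that meets every deadline.

    tasks: list of (cycles, deadline) in the order the temporal
    scheduler will run them; deadlines are absolute times in seconds.
    """
    required = 0.0
    cumulative = 0
    for cycles, deadline in tasks:
        cumulative += cycles
        slack = deadline - now
        if slack <= 0:
            raise ValueError("deadline already passed")
        # All work queued up to this task must finish by its deadline.
        required = max(required, cumulative / slack)
    return required


def pick_frequency(tasks, available, now=0.0):
    """Choose the slowest available speed that still meets the deadlines."""
    need = min_frequency(tasks, now)
    for f in sorted(available):
        if f >= need:
            return f
    return max(available)  # overloaded: run at full speed
```

Because the function only reads the task order and never changes it, it reflects the point that frequency selection can be layered on top of an existing scheduler.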