Title: Fred Kuhns
1Scheduling
- Fred Kuhns
- (fredk_at_arl.wustl.edu, http://www.arl.wustl.edu/fredk)
- Department of Computer Science and Engineering
- Washington University in St. Louis
2CPU Scheduling
- A multiprogrammed operating system maximizes utilization by switching the CPU between competing processes.
- The goal is to always have some process running.
- One process runs until it must wait for some event (for example, an I/O operation).
- Rather than letting the CPU remain idle, the OS removes the waiting process from the CPU and dispatches another ready process.
- Overview of scheduling module
- Basic Concepts: burst cycle, scheduler, preemption, dispatcher
- Scheduling Criteria
- Scheduling Algorithms
- Scheduling Issues, Implementations and Mechanisms
3Burst Cycle
- CPU-I/O burst cycle: process execution consists of a cycle of CPU execution and I/O wait.
- A process begins with a CPU burst, then waits for an I/O operation. When the I/O completes, the process runs for a period of time, then waits again for another I/O operation. This cycle repeats, ending with a final CPU burst in which the process voluntarily terminates.
- The CPU burst distribution generally consists of many short CPU bursts and a few longer bursts.
4Scheduling
- Short-term scheduler
- Selects the next process to run from among the ready processes in memory (i.e., the ready queue)
- The ready queue may be implemented as a FIFO queue, priority queue, tree, or random list.
- CPU scheduling decisions occur when a process:
- 1. Switches from running to waiting state.
- 2. Switches from running to ready state.
- 3. Switches from waiting to ready.
- 4. Terminates.
- Process (or thread) Preemption
- Scheduling under 1 and 4 is nonpreemptive.
- Scheduling under 2 and 3 is preemptive.
- With nonpreemptive schemes, once a process gets the CPU it keeps running until it either terminates or voluntarily gives up the CPU.
- With preemptive schemes, the OS may force a process to give up the CPU.
5Dispatcher
- The dispatcher module gives control of the CPU to the process selected by the short-term scheduler. This involves:
- switching context
- switching to user mode
- jumping to the proper location in the user program to restart that program
- Dispatch latency: the time it takes for the dispatcher to stop one process and start another running.
6Scheduling Criteria
- CPU utilization: percentage of time the CPU is busy
- Throughput: number of processes that complete their execution per time unit
- Turnaround time: time to execute a process from start to completion. It is the sum of the time spent:
- waiting to be loaded into memory,
- waiting in the ready queue,
- executing,
- blocked waiting for an event, for example waiting for I/O.
- Waiting time: time spent in the ready queue; directly impacted by the scheduling algorithm
- Response time: time from submission of a request to production of the first response.
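The criteria above can be made concrete with a small calculation. The following is a minimal sketch, assuming a process that never blocks for I/O (so all non-executing time counts as ready-queue time) and that the first response appears when the process first runs. The `ProcessTimes` record and its field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProcessTimes:
    """Timestamps for one process in arbitrary time units (hypothetical record)."""
    arrival: float      # time the request was submitted
    first_run: float    # time the process first got the CPU
    completion: float   # time the process finished
    cpu_time: float     # total time spent executing


def turnaround_time(p: ProcessTimes) -> float:
    # Start-to-completion time, including all waiting.
    return p.completion - p.arrival


def waiting_time(p: ProcessTimes) -> float:
    # Assumption: no I/O blocking, so everything except execution is ready-queue time.
    return turnaround_time(p) - p.cpu_time


def response_time(p: ProcessTimes) -> float:
    # Assumption: the first response is produced when the process first runs.
    return p.first_run - p.arrival


def cpu_utilization(busy_time: float, total_time: float) -> float:
    # Percentage of the interval during which the CPU was busy.
    return 100.0 * busy_time / total_time


def throughput(completed: int, total_time: float) -> float:
    # Processes completed per time unit.
    return completed / total_time
```

For example, a process submitted at time 0 that first runs at time 4, needs 3 units of CPU, and completes at time 10 has a turnaround time of 10, a waiting time of 7, and a response time of 4.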
7Optimization Criteria
- General goals
- Max CPU utilization
- Max throughput
- Min turnaround time
- Min waiting time
- Min response time
- May target average times or choose to place bounds on maximum or minimum values.
- May also choose to minimize variance in these times.
8Scheduling Algorithms
- FIFO (or FCFS)
- Easily implemented
- Average waiting time may be long with this policy
- Convoy effect (head-of-line blocking): a short process stuck behind a long process, or many I/O-bound processes stuck behind a CPU-bound process, results in inefficient use of I/O resources
- Shortest-Job-First (SJF)
- optimal in that it provides the minimum average waiting time for a given set of processes
- can use exponential averaging to estimate the next burst size
- non-preemptive and preemptive versions
- Priority-based
- preemptive or non-preemptive
- static or dynamic priorities
- Problem: starvation. Solution: aging.
- Round-robin
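- time-sharing; define a quantum q; with n processes, wait time is bounded by (n-1)*q
- large q => FCFS; small q => each of the n processes sees a dedicated processor of speed F/n, where F is the speed of the real processor (processor sharing)
- Multilevel Queue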
- ready queue partitioned into bands or priorities
- Multilevel feedback queue
- use feedback to move process between queues
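A few of the algorithm properties listed above can be checked with a short simulation. This is a minimal sketch, assuming all jobs arrive at time 0, there is no I/O, and context switches are free. The function names and the smoothing weight `alpha` are illustrative choices, not taken from any particular system.

```python
from collections import deque


def fcfs_waits(bursts):
    """Waiting time of each job when run in arrival order (all arrive at t=0)."""
    waits, clock = [], 0
    for burst in bursts:
        waits.append(clock)
        clock += burst
    return waits


def sjf_waits(bursts):
    """Non-preemptive SJF: the same as FCFS after sorting by burst length."""
    return fcfs_waits(sorted(bursts))


def next_burst_estimate(prev_estimate, actual_burst, alpha=0.5):
    """Exponential average; alpha is the weight given to the most recent burst."""
    return alpha * actual_burst + (1 - alpha) * prev_estimate


def round_robin_max_gap(bursts, quantum):
    """Simulate round-robin; return the longest wait any job had between turns."""
    queue = deque(enumerate(bursts))
    last_off_cpu = [0] * len(bursts)  # when each job last left the CPU (or arrived)
    clock, max_gap = 0, 0
    while queue:
        job, remaining = queue.popleft()
        max_gap = max(max_gap, clock - last_off_cpu[job])
        run = min(quantum, remaining)
        clock += run
        last_off_cpu[job] = clock
        if remaining > run:
            queue.append((job, remaining - run))
    return max_gap


bursts = [24, 3, 3]  # one long job ahead of two short ones (convoy effect)
fcfs_avg = sum(fcfs_waits(bursts)) / len(bursts)  # 17.0
sjf_avg = sum(sjf_waits(bursts)) / len(bursts)    # 3.0
```

With the long job at the head of the queue, FCFS gives an average wait of 17 time units while SJF gives 3. For three equal jobs and a quantum of 4, `round_robin_max_gap([10, 10, 10], 4)` returns 8, matching the round-robin bound of (n-1) quanta between turns.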
9Policy versus Mechanism
- Policies set the rules for determining when to switch and which process/thread to run next
- Mechanisms are used to implement the desired policy. They consist of the data structures and algorithms used to implement the policy
- Policy and Mechanisms
- influenced by the platform, for example, context switch cost
- balance the needs of different application types: Interactive, Batch and Real-Time
10Consider Three Basic Policies
- First-In First-Out (FIFO)
- runs to completion
- Round Robin (RR)
- runs for a specified time quantum
- Time-Sharing (TS)
- Multilevel feedback queues
11Platform Issues
- Interrupt latency
- Clock resolution
- Cost of a context switch
- saving processor state and loading new
- instruction and data cache misses/flushes
- memory map cache (TLB)
- FPU state
12Typical Scheduler Goals
- Interactive: examples are shells and GUIs.
- Spend most of their time waiting for I/O.
- Minimize perceived delay (50-150 msec)
- Batch
- Examples: compiles, long-running computations.
- Optimize throughput: can tolerate large scheduling latencies but want to maximize the number of jobs completed over some given time interval.
- Real-time
- Require predictable behavior.
- May require guarantees on throughput, latency or delay
- Real-time does not equal fast; it implies temporal constraints.
13Periodic Execution of OS Clock Interrupts
- A common requirement is for a system to track the passage of time
- processes request timed event notifications
- time-sharing of resources
- periodic system housekeeping
- Implemented using a hardware clock interrupt
- generally set to a periodic rate (clock tick), typically 10 msec; see the tick frequency defined by HZ in param.h
- however, may use a random timeout interval or synchronize to some other external event.
- High priority: only NMIs (non-maskable interrupts) are higher.
14Interrupt Latency
- time to recognize the interrupt and execute the first instruction of the ISR
- hardware: CPU finishes the current instruction, acknowledges the interrupt over the bus
- software: dispatch the interrupt to the correct ISR
- plus time to run ISR
- affected by interrupt nesting
- plus worst-case interrupt disable time
15Clock Interrupt Handler
- Update CPU usage for current process
- scheduler functions: priority computation, time-slice expiration (major/minor ticks)
- quota policing
- update time-of-day and other clocks
- callout queue processing
- process alarms
- run system processes
16Callout queue
- Queue of functions to be processed at a particular tick
- System context, base interrupt priority (sw int)
- Example
- packet retransmission, system management functions, device monitoring and polling
- for real-time systems insertion time is important
- Typical to optimize lookup time, not insertion time. Implementation options:
- time-to-fire,
- absolute time,
- timing wheel (fixed size circular array)
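The timing wheel option can be sketched as follows. This is a minimal illustration, assuming one bucket per tick modulo the wheel size and callouts stored with their absolute expiry tick. The `TimingWheel` class and its method names are hypothetical and are not a real kernel interface.

```python
class TimingWheel:
    """Callout queue kept as a fixed-size circular array of buckets (sketch)."""

    def __init__(self, size=8):
        self.size = size
        self.buckets = [[] for _ in range(size)]
        self.now = 0  # current tick

    def schedule(self, delay_ticks, func):
        """Constant-time insertion: hash the absolute expiry tick into a bucket."""
        if delay_ticks < 1:
            raise ValueError("delay must be at least one tick")
        expiry = self.now + delay_ticks
        self.buckets[expiry % self.size].append((expiry, func))

    def tick(self):
        """Advance one clock tick and run every callout that is now due."""
        self.now += 1
        bucket = self.buckets[self.now % self.size]
        due = [entry for entry in bucket if entry[0] == self.now]
        # Entries one or more full revolutions away stay in the bucket.
        bucket[:] = [entry for entry in bucket if entry[0] != self.now]
        for _, func in due:
            func()
        return len(due)
```

Insertion is constant time, which matters for the real-time case noted above; the cost is that each tick must scan one bucket, which may also hold entries that are due on a later revolution of the wheel.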
17Alarms
- BSD: alarm(), setitimer()
- SVR4: alarm(), hrtsys()
- Real-time - actual elapsed time
- sends SIGALRM
- profiling - execution time
- sends SIGPROF
- virtual time - user mode time
- sends SIGVTALRM
18BSD Scheduler
- Policy - Multilevel Feedback queues
- Policy Goal - good response time for interactive tasks while guaranteeing batch job progress
- How achieved
- leverage priority-based scheduler by adjusting priorities in response to I/O events
- Preemptive time-slicing (time quantum), usage-based priority assignment
- Mechanisms
- Priority-based: always run the highest-priority process
- Dynamically adjustable process priorities
- Clock interrupt to vary priorities
19 4.4 BSD Process Scheduling
- Scheduling priority range, high to low: 0 - 127
- 0-49 kernel mode
- 50-127 user mode
- 32 run queues (run queue index = priority / 4)
- Priority adjusted based on resource usage.
- Time quantum set to 0.1 second for over 20 years.
- Sleep priorities
20Scheduling Related Attributes
Predefined Process Priorities
PSWP     0   while swapping process
PVM      4   wait for memory
PINOD    8   wait for file control info
PRIBIO  16   wait on disk I/O
PVFS    20   wait for kernel-level FS lock
PZERO   22   baseline priority
PSOCK   24   wait on socket
PWAIT   32   wait for child to exit
PLOCK   36   wait for user-level FS lock
PPAUSE  40   wait for signal to arrive
PUSER   50   base user priority
PROC structure
p_priority (kernel pri)
p_usrpri (user pri)
p_estcpu
p_slptime
...
21Calculation of Priority
- Proc structure
- p_estcpu - estimate of cpu utilization
- p_nice - user-settable (-20 to 19, default 0)
- priority (p_usrpri) recalculated every 4 clock ticks
- p_estcpu incremented each clock tick the process is running
- every second CPU usage is decayed
- sleeping processes are ignored
22BSD Formulas
- Priority calculation (PUSER = 50)
- p_usrpri = PUSER + (p_estcpu / 4) + 2 * p_nice
- p_estcpu += 1 // each clock tick the process is running
- Decay calculation - schedcpu() each second
- p_estcpu = ((2 * load_avg) / (2 * load_avg + 1)) * p_estcpu + p_nice
- ex: load = 1, p_estcpu = 0.66*T4 + ... + 0.13*T0
- Processes sleeping for > 1 second
- p_slptime set to 0, then incremented by 1 each second
- p_estcpu = p_estcpu * ((2 * load) / (2 * load + 1)) ^ p_slptime
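The priority and decay formulas on this slide can be written out as a short sketch. The function names are hypothetical, and clamping the result to the user-mode range 50-127 is an assumption added here so the value stays within the priority range given earlier; it is not stated on the slide.

```python
PUSER = 50    # base user priority
MAXPRI = 127  # numerically largest (lowest) priority


def user_priority(p_estcpu, p_nice=0):
    """p_usrpri = PUSER + p_estcpu / 4 + 2 * p_nice."""
    pri = PUSER + p_estcpu / 4 + 2 * p_nice
    # Assumption: clamp so the result stays in the user-mode range 50-127.
    return int(min(max(pri, PUSER), MAXPRI))


def decay_factor(load_avg):
    """Fraction of p_estcpu retained each second: 2*load / (2*load + 1)."""
    return (2 * load_avg) / (2 * load_avg + 1)


def decay_estcpu(p_estcpu, load_avg, p_nice=0):
    """Once-per-second decay applied to a runnable process."""
    return decay_factor(load_avg) * p_estcpu + p_nice


def estcpu_after_sleep(p_estcpu, load_avg, p_slptime):
    """Catch-up decay for a process that slept for p_slptime seconds."""
    return p_estcpu * decay_factor(load_avg) ** p_slptime
```

With a load average of 1 the decay factor is 2/3, so after five decays only (2/3)^5, about 13%, of the original ticks remain, which agrees with the 0.13 coefficient in the slide's example.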
23Context switch on 4.4 BSD
- Synchronous vs. asynchronous
- Interprocess: voluntary vs. involuntary
- voluntary - process blocks because it requires an unavailable resource. sleep()
- involuntary - time slice expires or a higher-priority process becomes runnable. mi_switch()
24BSD - Voluntary Context Switch
- Invoke sleep() with a wait channel (resource address) and a (kernel) priority
- sleeping processes are organized as an array of queues. The wait channel is hashed to find the bin.
- sleep() raises the priority level to splhigh (blocks interrupts) and sets the kernel priority (p_priority)
- wakeup() removes process(es) from the queue at splhigh and recalculates the user priority (p_usrpri).
25BSD - Involuntary context switch
- Results from an asynchronous event
- kernel schedules an AST, need_resched(), and sets the global flag want_resched
- the current process checks this flag before returning to user mode. If set, it calls mi_switch()
- Note: BSD does not preempt a process in kernel mode.
26BSD Interprocess Context Switch
- Change user and kernel context
- All user mode state located in
- kernel-mode HW state - stored in PCB (u area)
- user-mode HW state - kernel stack (u area)
- proc structure (not swapped)
- kernel changes the current process pointers and
loads new state.
27Selecting a process to run
- cpu_switch(), called by mi_switch()
- block interrupts and check whichqs for a nonempty run queue. If none, unblock interrupts and loop.
- Remove the first process in the queue. If the queue is now empty, clear its bit in whichqs
- clear curproc and want_resched
- set new process (context switch) and unblock interrupts
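The run-queue selection step can be sketched in a few lines. This is a simplified model, assuming 32 run queues indexed by priority / 4 and a bitmask `whichqs` with one bit per queue. Interrupt blocking and the context switch itself are omitted, and the `RunQueues` class is hypothetical.

```python
from collections import deque

NQS = 32  # number of run queues; queue index = priority // 4


class RunQueues:
    def __init__(self):
        self.qs = [deque() for _ in range(NQS)]
        self.whichqs = 0  # bit i is set exactly when qs[i] is nonempty

    def setrunqueue(self, proc, priority):
        """Append proc to the queue for its priority and mark that queue nonempty."""
        index = priority // 4
        self.qs[index].append(proc)
        self.whichqs |= 1 << index

    def select(self):
        """Return the first process of the best nonempty queue, or None if idle."""
        if self.whichqs == 0:
            return None  # a real kernel would unblock interrupts and loop here
        # Lowest set bit = lowest queue index = numerically smallest (best) priority.
        index = (self.whichqs & -self.whichqs).bit_length() - 1
        proc = self.qs[index].popleft()
        if not self.qs[index]:
            self.whichqs &= ~(1 << index)  # queue is now empty: clear its bit
        return proc
```

Keeping the bitmask alongside the queues means the best nonempty queue is found with a single bit operation instead of a scan over all 32 queues.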
28Limitations of the BSD Model
- Limited scalability
- No resource guarantees
- non-deterministic, unbounded delays (priority inversion)
- limited application control over priorities
29Scheduler Implementations
- SVR4
- Solaris
- Mach
- Digital UNIX
- Other RT
30SVR4 Scheduler
- Redesigned from traditional approach
- Separate policy from implementation
- Define scheduling classes
- define framework with well defined interfaces
- Attempt to bound dispatch latencies
- Enhance application control
31SVR4 Scheduling Classes
- Scheduler represents an abstract base class
- defines interface
- performs class independent processing
- Derived classes are specific implementations of a scheduling policy
- priority computation, range of priorities, whether quantums are used, or whether priorities can vary dynamically.
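The base-class/derived-class structure described above might look like the following sketch. Everything here is illustrative: the method name `tick` is loosely modeled on the idea of a per-class clock-tick operation, the thread is represented as a plain dictionary, and lowering a time-sharing priority by one on quantum expiry is a simplification rather than the actual SVR4 rule. The priority ranges follow the default SVR4 allocations (time-sharing 0-59, real-time 100-159).

```python
from abc import ABC, abstractmethod


class SchedulingClass(ABC):
    """Class-dependent interface (hypothetical sketch)."""

    priority_range = (0, 0)

    @abstractmethod
    def tick(self, thread):
        """Called on each clock tick for the running thread; True means preempt."""

    def clamp(self, priority):
        # Class-independent helper: keep a priority inside this class's range.
        low, high = self.priority_range
        return max(low, min(high, priority))


class TimeSharing(SchedulingClass):
    priority_range = (0, 59)

    def __init__(self, quantum=10):
        self.quantum = quantum

    def tick(self, thread):
        thread["ticks_left"] -= 1
        if thread["ticks_left"] > 0:
            return False
        # Quantum used up: lower the priority (simplified) and start a new quantum.
        thread["priority"] = self.clamp(thread["priority"] - 1)
        thread["ticks_left"] = self.quantum
        return True


class RealTime(SchedulingClass):
    priority_range = (100, 159)

    def __init__(self, quantum=10):
        self.quantum = quantum

    def tick(self, thread):
        thread["ticks_left"] -= 1
        if thread["ticks_left"] > 0:
            return False
        thread["ticks_left"] = self.quantum  # fixed priority: only the quantum resets
        return True
```

The class-independent code only ever calls the base interface, so a new policy can be added by writing another derived class without touching the dispatcher.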
32SVR4 - Issues Addressed
- Dispatch latency - time between a process becoming runnable and when it executes.
- Does not include interrupt processing. Includes nonpreemptive kernel processing and context switch time.
- Kernel preemption points - PREEMPT checks kprunrun at well-defined points in the kernel
- runrun also used as in traditional implementations
- Response time - total time from event to application response.
33SVR4 - Class-Independent
- Responsible for
- context switching, run queue management and preemption.
- Highest (global) priority always runs
- priority range 0 - 159 (160 levels), each with own queue
- Default allocations
- real-time 100-159,
- system 60-99,
- timesharing 0-59
34SVR4 - Scheduler Interface
CL_TICK
CL_FORK, CL_FORKRET
CL_ENTERCLASS, CL_EXITCLASS
CL_SLEEP, CL_WAKEUP
CL_PREEMPT
CL_YIELD
CL_SETRUN
...
35SVR4 Class Implementations
- Time-Sharing: event-driven scheduling.
- priority changed dynamically, round-robin within priority; time slice and priority taken from a static parameter table.
- System: fixed priority (FIFO). System tasks.
- Real-time: fixed priority and time quantum.
36Solaris Overview
- Multithreaded, Symmetric Multi-Processing
- Preemptive kernel - protected data structures
- Interrupts handled using threads
- MP Support - per cpu dispatch queues, one global kernel preempt queue.
- System Threads
- Priority Inheritance
- Turnstiles rather than wait queues
37Hidden Scheduling
- Hidden Scheduling - kernel performs work asynchronously on behalf of threads, without considering the priority of the requester
- Examples: STREAMS processing and the callout queue
- Solaris addresses this by using system threads
- these run at kernel priority, which is lower than the RT class.
- Callout processing is also a problem. Solaris uses two different callout queues: real-time and non-real-time (callout thread).
38Priority Inversion
- A low-priority thread holds a resource required by a higher-priority thread.
- Partial solution - Priority Inheritance.
- The high-priority thread lends its priority to the lower-priority thread.
- Must be transitive
- kernel keeps a synchronization chain
39Priority Inheritance in Solaris
- Each thread has a global and inherited priority.
- Object must keep a reference to the owner
- If the requester's priority > the owner's, then the owner's priority is raised to that of the requester
- This works with mutexes, where the owner is always known
- Not used for semaphores or condition variables
- For reader/writer locks,
- owner of record inherits priority; only a partial solution
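The inheritance rule for mutexes, including the transitive case, can be sketched as below. This is an illustrative model only, not Solaris code: the `Thread`, `Mutex`, and `acquire` names are hypothetical, a larger number is assumed to mean a higher priority, releasing a mutex is left out, and the chain of blocked threads is assumed to contain no cycle.

```python
class Thread:
    def __init__(self, name, priority):
        self.name = name
        self.global_priority = priority     # assigned by the scheduling class
        self.inherited_priority = priority  # may be raised by blocked requesters
        self.blocked_on = None              # mutex this thread is waiting for

    @property
    def effective_priority(self):
        return max(self.global_priority, self.inherited_priority)


class Mutex:
    def __init__(self):
        self.owner = None  # the object keeps a reference to its owner


def acquire(thread, mutex):
    """Take the mutex if it is free; otherwise block and lend priority."""
    if mutex.owner is None:
        mutex.owner = thread
        return True
    thread.blocked_on = mutex
    # Walk the synchronization chain so that inheritance is transitive.
    # Assumption: the chain has no cycle (no deadlock).
    requester, lock = thread, mutex
    while lock is not None and lock.owner is not None:
        owner = lock.owner
        if requester.effective_priority > owner.effective_priority:
            owner.inherited_priority = requester.effective_priority
        requester, lock = owner, owner.blocked_on
    return False
```

If a low-priority thread owns mutex A, a medium-priority thread owns mutex B and blocks on A, and a high-priority thread then blocks on B, the walk raises both the medium and the low thread to the high thread's priority while leaving their global priorities unchanged.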
40MACH
- Inherited base scheduling priority, which is combined with a CPU usage factor
- CPU usage factor decayed by 5/8 for each second inactive
- threads set own priority after waking up.
- Clock handler charges current thread.
- Every 2 seconds, a system thread scans the run queue and recomputes priorities (addresses starvation)
- Fixed quantums, preemptive scheduling
- handoff scheduling - used by IPC
41MACH MP Support
- No cross processor interrupts
- processor sets
- thread runs on any of the processors from the assigned set.
- Processor allocation can be handled by a user-level server
- Gang scheduling
- dedicate one processor per thread.
- Minimize barrier synchronization delay
42Digital UNIX
- Time-sharing and real-time classes: SCHED_OTHER, SCHED_FIFO, SCHED_RR
- Highest priority process is always run
- Priority range 0-63 (0-29 TS, 20-31 SYS, 32-63 RT)
- nonpreemptive kernel
- no control for priority inversion
43Digital UNIX - MP
- Per processor dispatch queues
- Scheduler will move threads between queues to balance load
- soft affinity