Title: Scheduling in Linux and Windows 2000
1. Scheduling in Linux and Windows 2000
- CS 431
- Sanjay Kumar (01D07041)
- Sanjay Kumar Ram (01D07042)
2. Outline
- Linux Kernel
- Linux Processes
- Scheduling Policy
- Data Structures and Scheduling Algorithm in Uniprocessor System
- Scheduling Algorithm for Multiprocessor
- Scheduling in Windows 2000
3. Linux Kernel
Kernel subsystem overview
4. Linux Processes
- Three classes of processes
- Interactive processes
- Batch processes
- Real-time processes
- Linux processes have the following states
- Running
- Waiting
- Stopped
- Zombie
5. Scheduling Policy
- The Linux kernel is non-preemptive, but processes are preemptible
- The set of rules used to determine when and how to select a new process to run is called the scheduling policy
- Linux scheduling is based on the time-sharing technique: several processes are allowed to run "concurrently", and CPU time is roughly divided into "slices"
- The scheduling policy is also based on ranking processes according to their priority
- Static priority - assigned by users to real-time processes; ranges from 1 to 99 and is never changed by the scheduler
- Dynamic priority - the sum of the base time quantum (the base priority of the process) and the number of ticks of CPU time left to the process before its quantum expires in the current epoch
6. Scheduling Policy (contd.)
- Process Preemption
- A process is preempted when the dynamic priority of the currently running process is lower than that of a process waiting in the runqueue
- A process may also be preempted when its time quantum expires
- How Long Must a Quantum Last?
- Quantum duration too short: the system overhead caused by task switching becomes excessively high
- Quantum duration too long: processes no longer appear to be executed concurrently, and the response time of interactive applications and the overall responsiveness of the system degrade
- The rule of thumb adopted by Linux: choose a duration as long as possible while keeping good system response time
7. Data Structures and Scheduling Algorithm in Uniprocessor System
- The Linux scheduling algorithm works by dividing CPU time into epochs
- In a single epoch, every process has a specified time quantum
- An epoch ends when all runnable processes have exhausted their quantum
- Each process has a base time quantum
- the time-quantum value assigned by the scheduler to the process if it exhausted its quantum in the previous epoch
- users can change the base time quantum of their processes by using the nice( ) and setpriority( ) system calls (see the sketch below)
- The INIT_TASK macro sets the base time quantum of process 0 (swapper) to DEF_PRIORITY, which is approximately 200 ms
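As a quick illustration of the user-level knobs mentioned above, the following minimal sketch (not from the slides) lowers the calling process's priority with nice( ) and setpriority( ); both are standard Linux/POSIX calls, and the values 5 and 10 are arbitrary.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/resource.h>

int main(void)
{
    /* Lower our priority by 5 nice levels; nice() returns the new value. */
    errno = 0;
    if (nice(5) == -1 && errno != 0)
        perror("nice");

    /* Set an absolute nice value of 10 for the calling process (who = 0). */
    if (setpriority(PRIO_PROCESS, 0, 10) == -1)
        perror("setpriority");

    printf("current nice value: %d\n", getpriority(PRIO_PROCESS, 0));
    return 0;
}
```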
8. Data Structures and Scheduling Algorithm in Uniprocessor System (contd.)
- The first entry in the process table is the special init process
- task_struct - represents the state of every task in the system; its fields include (see the sketch below)
- scheduling policy - SCHED_OTHER, SCHED_FIFO, SCHED_RR
- a field for the process state - running, waiting, stopped, zombie
- a field that indicates the process's priority
- counter - a field that holds the number of clock ticks left in the quantum
- a doubly linked list (next_task and prev_task) to keep track of all existing processes
- mm_struct - contains the process's memory-management information
- the process ID and group ID are also stored
- fs_struct - file-system-specific process data
- fields that hold timing information
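For concreteness, a heavily simplified sketch of the fields listed above; the real struct task_struct in the kernel headers is far larger, and the field names here are only approximations.

```c
struct mm_struct;                  /* memory-management info (opaque here)   */
struct fs_struct;                  /* file-system-specific data (opaque)     */

/* Illustrative subset of the scheduler-relevant task_struct fields. */
struct task_struct_sketch {
    volatile long  state;          /* running, waiting, stopped, zombie      */
    unsigned long  policy;         /* SCHED_OTHER, SCHED_FIFO, SCHED_RR      */
    long           priority;       /* base time quantum (base priority)      */
    long           counter;        /* clock ticks left in the current epoch  */
    struct task_struct_sketch *next_task, *prev_task;  /* all-tasks list     */
    struct mm_struct *mm;          /* memory-management information          */
    struct fs_struct *fs;          /* file-system-specific data              */
    int            pid, pgrp;      /* process ID and group ID                */
    unsigned long  utime, stime;   /* timing information (user/system time)  */
};
```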
9. Data Structures and Scheduling Algorithm in Uniprocessor System (contd.)
- Each time the scheduler runs, it does the following
- Kernel work - the scheduler runs the bottom-half handlers and processes the scheduler task queue
- Current process
- the current process must be dealt with before another process can be selected to run (see the sketch below)
- If the scheduling policy of the current process is round robin, it is put onto the back of the run queue
- If the task is INTERRUPTIBLE and has received a signal since the last time it was scheduled, its state becomes RUNNING
- If the current process has timed out, its state becomes RUNNING
- If the current process is RUNNING, it will remain in that state
- Processes that were neither RUNNING nor INTERRUPTIBLE are removed from the run queue
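A minimal, illustrative sketch of those decisions about the current process, using simplified state/policy enums and stubbed run-queue helpers of our own rather than the kernel's actual types:

```c
#include <stdbool.h>

enum state  { RUNNING, INTERRUPTIBLE, UNINTERRUPTIBLE, STOPPED, ZOMBIE };
enum policy { POLICY_OTHER, POLICY_FIFO, POLICY_RR };

struct task {
    enum state  state;
    enum policy policy;
    long        counter;          /* clock ticks left in the quantum */
    bool        signal_pending;   /* signal received while sleeping  */
};

/* Stubs standing in for real run-queue manipulation. */
static void move_to_back_of_runqueue(struct task *t) { (void)t; }
static void remove_from_runqueue(struct task *t)     { (void)t; }

/* Decide what happens to the outgoing "current" task before selection. */
void handle_current(struct task *cur)
{
    /* A round-robin task that used up its slice goes to the back of the queue. */
    if (cur->policy == POLICY_RR && cur->counter == 0)
        move_to_back_of_runqueue(cur);

    /* A pending signal turns an INTERRUPTIBLE sleeper back into RUNNING. */
    if (cur->state == INTERRUPTIBLE && cur->signal_pending)
        cur->state = RUNNING;

    /* A task that merely timed out, or is already RUNNING, stays runnable. */

    /* Anything neither RUNNING nor INTERRUPTIBLE leaves the run queue. */
    if (cur->state != RUNNING && cur->state != INTERRUPTIBLE)
        remove_from_runqueue(cur);
}
```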
10. Data Structures and Scheduling Algorithm in Uniprocessor System (contd.)
- Process Selection
- the most deserving process is selected by the scheduler
- real-time processes are given higher priority than ordinary processes
- when several processes have the same priority, the one nearest the front of the run queue is chosen
- when a new process is created, the number of ticks left to the parent is split into two halves, one for the parent and one for the child (see the sketch below)
- the priority and counter fields are used both to implement time-sharing and to compute the process's dynamic priority
11. Data Structures and Scheduling Algorithm in Uniprocessor System (contd.)
- schedule( ) Function
- implements the scheduler
- finds a process in the runqueue list and then assigns the CPU to it
- Direct invocation (see the sketch below)
- 1. Inserts current in the proper wait queue
- 2. Changes the state of current either to TASK_INTERRUPTIBLE or to TASK_UNINTERRUPTIBLE
- 3. Invokes schedule( )
- 4. Checks whether the resource is available; if not, goes to step 2
- 5. Once the resource is available, removes current from the wait queue
- Lazy invocation
- when current has used up its quantum of CPU time; this is done by the update_process_times( ) function
- when a process is woken up and its priority is higher than that of the current process; performed by the reschedule_idle( ) function, which is invoked by the wake_up_process( ) function
- when a sched_setscheduler( ) or sched_yield( ) system call is issued
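The direct-invocation steps above follow the classic sleep-on-a-wait-queue pattern; the sketch below mirrors those five steps using stand-in types and prototypes (add_wait_queue, resource_available and friends are placeholders, not the exact kernel API):

```c
#define TASK_RUNNING          0
#define TASK_INTERRUPTIBLE    1
#define TASK_UNINTERRUPTIBLE  2

struct wait_queue_stub;                           /* opaque wait-queue type */
extern struct task_stub { long state; } *current; /* the running task       */

void add_wait_queue(struct wait_queue_stub *q, struct task_stub *t);
void remove_wait_queue(struct wait_queue_stub *q, struct task_stub *t);
void schedule(void);
int  resource_available(void);                    /* placeholder condition  */

void sleep_until_resource_ready(struct wait_queue_stub *q)
{
    add_wait_queue(q, current);                   /* step 1: join the wait queue  */
    do {
        current->state = TASK_INTERRUPTIBLE;      /* step 2: mark as sleeping     */
        schedule();                               /* step 3: give up the CPU      */
    } while (!resource_available());              /* step 4: if not ready, loop   */
    current->state = TASK_RUNNING;
    remove_wait_queue(q, current);                /* step 5: leave the wait queue */
}
```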
12. Data Structures and Scheduling Algorithm in Uniprocessor System (contd.)
- Actions performed by schedule( )
- before actually scheduling a process, the schedule( ) function starts by running the functions left by other kernel control paths in various queues
- the function then executes all active unmasked bottom halves
- Scheduling
- the value of current is saved in the prev local variable, and the need_resched field of prev is set to 0
- a check is made to determine whether prev is a Round Robin real-time process; if so, schedule( ) assigns a new quantum to prev and puts it at the bottom of the runqueue list
- if prev's state is TASK_INTERRUPTIBLE and a signal is pending, the function wakes up the process
- schedule( ) repeatedly invokes the goodness( ) function on the runnable processes to determine the best candidate
- when the counter field of every runnable process has become zero, schedule( ) assigns to all existing processes a fresh quantum, whose duration is the priority value plus half the counter value (see the sketch below)
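The epoch-end refresh in the last bullet is commonly written as counter = counter/2 + priority; a self-contained sketch over a hypothetical task array:

```c
/* Only the two fields used by the recalculation are modeled. */
struct task_epoch_sketch {
    long counter;    /* ticks left in the current quantum  */
    long priority;   /* base time quantum (base priority)  */
};

/* At the end of an epoch every process gets a fresh quantum equal to its
 * priority plus half of whatever counter it had left, so processes that
 * slept through the epoch start the next one with a slightly larger slice. */
void recalculate_quanta(struct task_epoch_sketch *tasks, int ntasks)
{
    for (int i = 0; i < ntasks; i++)
        tasks[i].counter = (tasks[i].counter >> 1) + tasks[i].priority;
}
```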
13. Data Structures and Scheduling Algorithm in Uniprocessor System (contd.)
- goodness( ) function
- identifies the best candidate among all processes in the runqueue list
- receives as input parameters prev (the descriptor pointer of the previously running process) and p (the descriptor pointer of the process to evaluate)
- the integer value c returned by goodness( ) measures the "goodness" of p and has the following meanings
- c = -1000: p must never be selected; this value is returned when the runqueue list contains only init_task
- c = 0: p has exhausted its quantum; unless p is the first process in the runqueue list and all runnable processes have also exhausted their quantum, it will not be selected for execution
- 0 < c < 1000: p is a conventional process that has not exhausted its quantum; a higher value of c denotes a higher level of goodness
- c > 1000: p is a real-time process; a higher value of c denotes a higher level of goodness
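A simplified, self-contained sketch of the goodness( ) logic summarized above; the real function also adds small bonuses (for example, for staying on the same CPU or sharing the address space with prev), which are omitted here:

```c
#define POLICY_OTHER 0
#define POLICY_FIFO  1
#define POLICY_RR    2

/* Minimal stand-in for the fields goodness() examines. */
struct task_goodness_sketch {
    int  policy;
    long counter;      /* ticks left in the current quantum  */
    long priority;     /* base priority / base time quantum  */
    long rt_priority;  /* static real-time priority (1..99)  */
    int  is_idle;      /* non-zero for the init_task entry   */
};

/* Returns the "goodness" value c of candidate p, as described on the slide. */
long goodness(const struct task_goodness_sketch *prev,
              const struct task_goodness_sketch *p)
{
    (void)prev;                        /* affinity bonuses omitted in this sketch */

    if (p->is_idle)
        return -1000;                  /* never select the idle task              */
    if (p->policy != POLICY_OTHER)
        return 1000 + p->rt_priority;  /* real-time: always beats conventional    */
    if (p->counter == 0)
        return 0;                      /* quantum exhausted for this epoch        */
    return p->counter + p->priority;   /* conventional process: 0 < c < 1000      */
}
```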
14. Scheduling algorithm for multiprocessor
- The Linux scheduler is defined in kernel/sched.c
- The new scheduler has been designed to accomplish specific goals
- Implement fully O(1) scheduling: every algorithm in the new scheduler completes in constant time, regardless of the number of running processes or any other input
- Implement perfect SMP scalability: each processor has its own locking and individual runqueue
- Implement improved SMP affinity: naturally attempt to group tasks on a specific CPU and continue to run them there; only migrate tasks from one CPU to another to resolve imbalances in runqueue sizes
15. Scheduling algorithm for multiprocessor (contd.)
- Provide good interactive performance: even under considerable system load, the system should react and schedule interactive tasks immediately
- Provide fairness: no process should find itself starved of timeslice for any reasonable amount of time; likewise, no process should receive an unfairly large amount of timeslice
- Optimize for the common case of only 1-2 runnable processes, yet scale well to multiple processors each with many processes
- Runqueues (see the sketch below)
- the basic data structure in the scheduler
- defined in kernel/sched.c as struct runqueue
- the list of runnable processes on a given processor; there is one runqueue per processor, and each runnable process is on exactly one runqueue
- additionally contains per-processor scheduling information
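An illustrative, stripped-down sketch of a per-CPU runqueue holding the items the slide mentions; the real struct runqueue has many more fields and uses a proper spinlock:

```c
struct prio_array;                      /* priority array, sketched on a later slide */

/* One of these exists per processor and owns that CPU's runnable tasks. */
struct runqueue_sketch {
    int                 lock;           /* stand-in for the per-runqueue spinlock */
    unsigned long       nr_running;     /* number of runnable tasks on this CPU   */
    struct prio_array  *active;         /* tasks that still have timeslice left   */
    struct prio_array  *expired;        /* tasks that have exhausted their slice  */
};
```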
16. Scheduling algorithm for multiprocessor (contd.)
- A group of macros is used to obtain specific runqueues
- cpu_rq(processor) returns a pointer to the runqueue associated with the given processor
- this_rq() returns the runqueue of the current processor
- task_rq(task) returns a pointer to the runqueue on which the given task is queued
- this_rq_lock() locks the current runqueue
- rq_unlock(struct runqueue *rq) unlocks the given runqueue
- double_rq_lock() and double_rq_unlock() lock and unlock two runqueues in a fixed order to avoid deadlock
17. Scheduling algorithm for multiprocessor (contd.)
- The Priority Arrays
- Each runqueue contains two priority arrays: the active array and the expired array
- Each priority array contains one queue of runnable processes per priority level
- Priority arrays are the data structure that provides O(1) scheduling
- Each array also contains a priority bitmap used to efficiently discover the highest-priority runnable task in the system
- The bitmap field has at least one bit for every priority level on the system
18. Scheduling algorithm for multiprocessor (contd.)
- Finding the highest-priority task on the system is therefore only a matter of finding the first set bit in the bitmap (see the sketch below)
- Each priority array also contains an array called queue, of struct list_head queues, one queue for each priority
- The Linux O(1) scheduler algorithm (diagram)
19. Scheduling algorithm for multiprocessor (contd.)
- Recalculating Timeslices
- The new Linux scheduler alleviates the need for a recalculation loop; instead, it maintains two priority arrays for each processor
- 1. active array: contains all the tasks in the associated runqueue that have timeslice left
- 2. expired array: contains all the tasks in the associated runqueue that have exhausted their timeslice
- When a task's timeslice reaches zero, its timeslice is recalculated before it is moved to the expired array
20. Scheduling algorithm for multiprocessor (contd.)
- Recalculating all the timeslices is then as simple as just switching the active and expired arrays; because the arrays are accessed only via pointer, switching them is as fast as swapping two pointers (see the sketch below)
- This swap is a key feature of the new O(1) scheduler: instead of recalculating each process's priority and timeslice all the time, the O(1) scheduler performs a simple two-step array swap
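A minimal sketch of that swap, reusing the illustrative active/expired pointers from the earlier runqueue sketch:

```c
struct prio_array;                     /* as sketched earlier */

struct rq_swap_sketch {
    struct prio_array *active;         /* tasks with timeslice remaining */
    struct prio_array *expired;        /* tasks out of timeslice         */
};

/* When the active array empties, "recalculating all timeslices" is just a
 * pointer swap: the expired tasks (whose slices were refilled one by one as
 * they expired) become the new active set. */
void switch_arrays(struct rq_swap_sketch *rq)
{
    struct prio_array *tmp = rq->active;

    rq->active  = rq->expired;
    rq->expired = tmp;
}
```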
21. Scheduling algorithm for multiprocessor (contd.)
22. Scheduling algorithm for multiprocessor (contd.)
23. Scheduling in Windows 2000
- Windows 2000 scheduler
- Concern is with threads not processes
- i.e. threads are scheduled, not processes
- Threads have priorities 0 through 31
- Priority levels (from the diagram): 31 down to 16 are the 16 real-time levels; 15 down to 1 are the 15 variable levels; priority 0 is used by the zero page thread; the idle thread(s) run below everything else
24. Manual Process Priority Adjustments
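This slide presumably showed the Task Manager priority controls; the same adjustments can also be made programmatically. A minimal Win32 sketch using the documented SetPriorityClass and SetThreadPriority calls (the chosen priority values are arbitrary):

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Raise the priority class of the whole process ... */
    if (!SetPriorityClass(GetCurrentProcess(), ABOVE_NORMAL_PRIORITY_CLASS))
        fprintf(stderr, "SetPriorityClass failed: %lu\n", GetLastError());

    /* ... and the relative priority of the current thread within that class. */
    if (!SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST))
        fprintf(stderr, "SetThreadPriority failed: %lu\n", GetLastError());

    printf("thread priority is now %d\n", GetThreadPriority(GetCurrentThread()));
    return 0;
}
```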
25. Thread Scheduling
- Strictly priority driven
- 32 queues (FIFO lists) of ready threads
- One queue for each priority level
- Queues are common to all CPUs
- When a thread becomes ready, it
- either runs immediately, or
- is inserted at the tail of the Ready queue for its current priority
- On a uniprocessor, the highest-priority Ready thread always runs
- Time-sliced, round-robin within a priority level
26. Thread Scheduling: Multiprocessor Issues
- On a multiprocessor, the n highest-priority threads (for n processors) will always run (subject to affinity, explained later)
- No attempt is made to share processors fairly among processes, only among threads
- Tries to keep threads on the same CPU
27. Scheduling Scenarios
- Preemption
- A thread becomes ready at a higher priority than the running thread
- The lower-priority thread is preempted
- The preempted thread goes to the head of its Ready queue
28. Scheduling Scenarios (contd.)
- Voluntary switch
- When the running thread gives up the CPU
- Waiting on a dispatcher object
- Termination
- Explicit lowering of priority
- Schedule the thread at the head of the next non-empty Ready queue
- Quantum end for the running thread
- Priority is decremented unless it is already at the thread's base priority
- Thread goes to the tail of the Ready queue for its new priority
- May continue running if no equal- or higher-priority threads are Ready, i.e. it gets a new quantum
29. Priority Adjustments
- Threads in the dynamic classes can have priority adjustments applied to them (boost or decay)
- Idle, Below Normal, Normal, Above Normal, and High
- Carried out upon wait completion
- Used to avoid CPU starvation
- No automatic adjustments in the real-time class
- Priority 16 and above
- Scheduling is therefore predictable with respect to the real-time threads
- Note, though, that this does not mean that there are absolute latency guarantees
30. Priority Boosting
- A priority boost takes place after a wait
- Occurs when a wait (usually I/O) is resolved
- Slow devices / long waits: big boost
- Fast devices / short waits: small boost
- The boost is applied to the thread's base priority
- Does not go over priority 15
- Keeps I/O devices busy
31. CPU Starvation
- The Balance Set Manager looks for starved threads
- The BSM is a thread running at priority 16 that wakes up once per second
- Looks at threads that have been Ready for 4 seconds or more
- Boosts up to 10 Ready threads per pass
- Special boost applied
- Priority is raised to 15
- Quantum is doubled
- Does not apply in the real-time range
32. Multiprocessor Support
- By default, threads can run on any available processor
- Soft affinity (introduced in NT 4.0)
- Every thread has an ideal processor
- When a thread becomes ready
- if its ideal processor is idle, it runs there
- else, if its previous processor is idle, it runs there
- else, the scheduler may look at the next thread, to run it on its ideal processor
33. Multiprocessor Support (contd.)
- Hard affinity
- Restricts a thread to a subset of the available CPUs (see the sketch below)
- Can lead to
- threads getting less CPU time than they normally would
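Both affinity flavors are exposed through documented Win32 calls; a minimal sketch using SetThreadIdealProcessor (the soft-affinity hint) and SetThreadAffinityMask (hard affinity), with the processor number and mask chosen purely for illustration:

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE th = GetCurrentThread();

    /* Soft affinity: hint that CPU 1 is this thread's ideal processor. */
    if (SetThreadIdealProcessor(th, 1) == (DWORD)-1)
        fprintf(stderr, "SetThreadIdealProcessor failed: %lu\n", GetLastError());

    /* Hard affinity: restrict the thread to CPUs 0 and 1 (mask 0x3). */
    if (SetThreadAffinityMask(th, 0x3) == 0)
        fprintf(stderr, "SetThreadAffinityMask failed: %lu\n", GetLastError());

    puts("affinity configured");
    return 0;
}
```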
34. References
- http://iamexwiwww.unibe.ch/studenten/schlpbch/linuxScheduling/LinuxScheduling.html
- http://oopweb.com/OS/Documents/tlk/VolumeFrames.html
- http://www.samspublishing.com/articles/article.asp?p=101760&rl=1
- http://www.oreilly.com/catalog/linuxkernel/chapter/ch10.html
- G. M. Candea and M. B. Jones, "Vassal: Loadable Scheduler Support for Multi-Policy Scheduling", Proceedings of the 2nd USENIX Windows NT Symposium, Seattle, Washington, August 3-4, 1998