Title: Multiprocessor Scheduling
1 Multiprocessor Scheduling
- Module 3.1
- For a good summary on multiprocessor and real-time scheduling, visit http://www.cs.uah.edu/weisskop/osnotes_html/M8.html
2 Classifications of Multiprocessor Systems
- Loosely coupled multiprocessors, or clusters
  - Each processor has its own memory and I/O channels
- Functionally specialized processors
  - Such as an I/O processor
  - Nvidia GPGPU
  - Controlled by a master processor
- Tightly coupled multiprocessing
  - MCMP
  - Processors share main memory
  - Controlled by the operating system
  - More economical than clusters
3 Types of Parallelism
- Bit-level parallelism
- Instruction-level parallelism
- Data parallelism
- Task parallelism (our focus)
4 Synchronization Granularity
- Refers to the frequency of synchronization among parallel processes in the system
- Five classes exist:
  - Independent (SI is not applicable)
  - Very coarse (2000 < SI < 1M)
  - Coarse (200 < SI < 2000)
  - Medium (20 < SI < 200)
  - Fine (SI < 20)
- SI is the synchronization interval, measured in instructions
5 Wikipedia on Fine-Grained, Coarse-Grained, and Embarrassing Parallelism
- Applications are often classified according to how often their subtasks need to synchronize or communicate with each other
- An application exhibits fine-grained parallelism if its subtasks must communicate many times per second
- It exhibits coarse-grained parallelism if they do not communicate many times per second
- It is embarrassingly parallel if they rarely or never have to communicate; embarrassingly parallel applications are considered the easiest to parallelize
6 Independent Parallelism
- Multiple unrelated processes
- Each is a separate application or job, e.g. a spreadsheet, a word processor, etc.
- No synchronization between them
- When more than one processor is available, average response time to users decreases
7 Coarse and Very Coarse-Grained Parallelism
- Very coarse: distributed processing across network nodes to form a single computing environment
- Coarse: synchronization among processes at a very gross level
- Good for concurrent processes running on a multiprogrammed uniprocessor
- Can be supported on a multiprocessor with little change
8 Medium-Grained Parallelism
- Parallel processing or multitasking within a single application
- A single application is a collection of threads
- Threads usually interact frequently, leading to medium-grained synchronization
9 Fine-Grained Parallelism
- Highly parallel applications
- Synchronization every few instructions (on very short events)
- Fills the gap between ILP (instruction-level parallelism) and medium-grained parallelism
- Can be found in small inner loops (see the sketch below)
- Typically expressed with parallel programming interfaces such as MPI and OpenMP
- The OS should not intervene; synchronization is usually handled by hardware
- In practice, this is a very specialized and fragmented area
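A minimal sketch of fine-grained parallelism in a small inner loop, using OpenMP (the array names and loop body are illustrative; compile with cc -fopenmp):

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    static double a[N], b[N];

    int main(void) {
        double sum = 0.0;
        for (int i = 0; i < N; i++) {       /* initialize illustrative data */
            a[i] = i;
            b[i] = 2.0 * i;
        }
        /* The small inner loop is split across threads; each thread
           accumulates a private partial sum that is combined at the end,
           so there is no per-element locking and no OS intervention. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i] * b[i];
        printf("dot product = %f\n", sum);
        return 0;
    }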
10 Scheduling
- Scheduling on a multiprocessor involves three interrelated design issues:
  - Assignment of processes to processors
  - Use of multiprogramming on individual processors
    - Makes sense for processes (coarse-grained)
    - May not be good for threads (medium-grained)
  - Actual dispatching of a process
    - Which scheduling policy should we use: FCFS, RR, etc.? Sometimes a very sophisticated policy becomes counterproductive.
11 Assignment of Processes to Processors
- Treat processors as a pooled resource and assign processes to processors on demand, or
- Permanently assign each process to a processor
  - Dedicate a short-term queue to each processor
  - Less overhead: each processor schedules only from its own queue
  - Disadvantage: a processor could be idle (empty queue) while another processor has a backlog, as the sketch below shows
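A toy sketch of permanent assignment with one queue per processor (all names and values are illustrative, and the toy omits overflow checks); it shows one CPU idling while another holds a backlog:

    #include <stdio.h>

    #define NCPU 2
    #define QCAP 8

    /* One dedicated short-term queue per processor. */
    struct run_queue { int pids[QCAP]; int len; };
    static struct run_queue rq[NCPU];

    /* Permanent assignment: the process stays on this CPU's queue. */
    static void assign(int cpu, int pid) {
        rq[cpu].pids[rq[cpu].len++] = pid;
    }

    /* Each CPU schedules only from its own queue. */
    static int dispatch(int cpu) {
        if (rq[cpu].len == 0)
            return -1;                  /* idle, even if another CPU has a backlog */
        int pid = rq[cpu].pids[0];
        for (int i = 1; i < rq[cpu].len; i++)
            rq[cpu].pids[i - 1] = rq[cpu].pids[i];
        rq[cpu].len--;
        return pid;
    }

    int main(void) {
        assign(0, 101); assign(0, 102); assign(0, 103);  /* CPU 0 gets a backlog */
        for (int cpu = 0; cpu < NCPU; cpu++)
            printf("cpu %d dispatches pid %d\n", cpu, dispatch(cpu));
        return 0;   /* cpu 1 prints -1: idle while cpu 0 still has work */
    }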
12 Assignment of Processes to Processors
- Global queue
  - Schedule onto any available processor
  - Over its lifetime, a process may run on different processors at different times
  - In an SMP architecture, context switching can be done at small cost
- Master/slave architecture
  - Key kernel functions always run on a particular processor
  - The master is responsible for scheduling
  - A slave sends service requests to the master
  - Synchronization is simplified
  - Disadvantages:
    - Failure of the master brings down the whole system
    - The master can become a performance bottleneck
13 Assignment of Processes to Processors
- Peer architecture
  - The operating system can execute on any processor
  - Each processor does self-scheduling from a pool of available processes
  - Complicates the operating system
    - Must ensure two processors do not choose the same process
    - Needs lots of synchronization
14 Process Scheduling in Today's SMP
- M/M/M/K queueing system
  - A single queue for all processes
  - Multiple queues are used for priorities
  - All queues feed a common pool of processors
- The specific scheduling discipline is less important with more than one processor
  - A simple FCFS discipline with static priorities may suffice for a multiprocessor system
  - See the graph on p. 460 of the text
- In conclusion, the specific scheduling discipline matters much less for SMP than for a uniprocessor
15 Threads
- A thread executes separately from the rest of its process
- An application can be a set of threads that cooperate and execute concurrently in the same address space (see the minimal example below)
- Running threads on separate processors can yield a dramatic gain in performance
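As a minimal illustration of threads cooperating in one address space, here is a POSIX-threads sketch (thread count and work are arbitrary; compile with cc -pthread):

    #include <stdio.h>
    #include <pthread.h>

    #define NTHREADS 4

    static long counts[NTHREADS];   /* one address space: every thread sees this array */

    static void *worker(void *arg) {
        long id = (long)arg;
        for (long i = 0; i < 1000000; i++)
            counts[id]++;           /* each thread writes only its own slot */
        return NULL;
    }

    int main(void) {
        pthread_t tid[NTHREADS];
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&tid[i], NULL, worker, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);     /* wait for all cooperating threads */
        for (int i = 0; i < NTHREADS; i++)
            printf("thread %d counted %ld\n", i, counts[i]);
        return 0;
    }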
16 Multiprocessor Thread Scheduling: Four General Approaches (1/2)
- Load sharing
  - Processes are not assigned to a particular processor
- Gang scheduling
  - A set of related threads is scheduled to run on a set of processors at the same time
17 Multiprocessor Thread Scheduling: Four General Approaches (2/2)
- Dedicated processor assignment
  - Each thread is assigned to a specific processor for the duration of the program
  - When the program terminates, its processors are returned to the pool of available processors
- Dynamic scheduling
  - The number of threads can be altered during the course of execution
18 Load Sharing
- Load is distributed evenly across the processors
- No centralized scheduler is required
  - The OS runs on every processor to select the next thread
- Uses a global queue of ready threads, usually with an FCFS policy (a sketch follows)
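A simplified sketch of a global FCFS ready queue protected by a single lock (queue capacity and thread ids are illustrative); the one mutex is exactly the contention point the next slide describes:

    #include <stdio.h>
    #include <pthread.h>

    #define QCAP 64

    /* Single global FCFS ready queue shared by all processors. */
    static int queue[QCAP];
    static int head, tail;
    static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;

    static void make_ready(int tid) {
        pthread_mutex_lock(&qlock);     /* every processor contends for this one lock */
        queue[tail % QCAP] = tid;
        tail++;
        pthread_mutex_unlock(&qlock);
    }

    /* Called by whichever processor becomes idle. */
    static int pick_next(void) {
        int tid = -1;
        pthread_mutex_lock(&qlock);
        if (head != tail) {
            tid = queue[head % QCAP];
            head++;
        }
        pthread_mutex_unlock(&qlock);
        return tid;                     /* -1: nothing is ready anywhere */
    }

    int main(void) {
        for (int t = 1; t <= 3; t++)
            make_ready(t);
        int tid;
        while ((tid = pick_next()) != -1)
            printf("dispatch thread %d on the next free processor\n", tid);
        return 0;
    }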
19 Disadvantages of Load Sharing
- The central queue needs mutual exclusion
  - May become a bottleneck when more than one processor looks for work at the same time
  - A noticeable problem when there are many processors
- Preempted threads are unlikely to resume execution on the same processor
  - Cache use is less efficient
- If all threads go through the global queue, the threads of a program will not all gain access to processors at the same time
  - Performance is compromised if coordination among threads is high
20 Gang Scheduling
- Simultaneous scheduling of the threads that make up a single process
- Useful for applications whose performance severely degrades when any part of the application is not running
- Threads often need to synchronize with each other
21 Advantages
- Closely related threads execute in parallel
  - Synchronization blocking is reduced (see the barrier sketch below)
  - Less context switching
- Scheduling overhead is reduced, as a single synchronization signal may affect many threads
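To see why dispatching the gang together matters, here is a POSIX barrier sketch (gang size and phase count are arbitrary; compile with cc -pthread): if even one member is not running, the other members block at pthread_barrier_wait until it arrives, which is the waste gang scheduling avoids:

    #include <stdio.h>
    #include <pthread.h>

    #define GANG 4

    static pthread_barrier_t barrier;

    static void *member(void *arg) {
        long id = (long)arg;
        for (int phase = 0; phase < 3; phase++) {
            /* ...this thread's share of the phase would run here... */
            pthread_barrier_wait(&barrier);   /* all GANG members must arrive */
            if (id == 0)
                printf("phase %d complete\n", phase);
        }
        return NULL;
    }

    int main(void) {
        pthread_t tid[GANG];
        pthread_barrier_init(&barrier, NULL, GANG);
        for (long i = 0; i < GANG; i++)
            pthread_create(&tid[i], NULL, member, (void *)i);
        for (int i = 0; i < GANG; i++)
            pthread_join(tid[i], NULL);
        pthread_barrier_destroy(&barrier);
        return 0;
    }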
22 Scheduling Groups: Two Time-Slicing Divisions
23 Dedicated Processor Assignment: Affinitization (1/2)
- When an application is scheduled, each of its threads is assigned a processor that remains dedicated (affinitized) to that thread until the application runs to completion
- No multiprogramming of processors, i.e. one processor serves one specific thread and no other
- Some processors may be idle, as threads may block
- Eliminates context switches; with certain types of applications, this can save enough time to compensate for the possible idle-time penalty
- However, in a highly parallel environment with hundreds of processors, processor utilization is not a major issue, but performance is; the total avoidance of switching results in substantial speedup (a Linux-specific pinning sketch follows)
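A minimal, Linux-specific sketch of affinitizing threads with pthread_setaffinity_np (a nonportable GNU extension; the CPU numbers are assumed valid on the host; compile with cc -pthread):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <pthread.h>
    #include <sched.h>

    #define NTHREADS 2

    /* Pin the calling thread to one CPU so it never migrates. */
    static void *worker(void *arg) {
        long cpu = (long)arg;
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        printf("thread affinitized to cpu %ld\n", cpu);
        /* ...dedicated work runs here with a warm cache and no migration... */
        return NULL;
    }

    int main(void) {
        pthread_t tid[NTHREADS];
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&tid[i], NULL, worker, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }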
24 Dedicated Processor Assignment: Affinitization (2/2)
- With dedicated assignment, when the number of threads exceeds the number of available processors, efficiency drops
  - Due to thread preemption, context switching, suspension of other threads, and cache pollution
- Notice that both gang scheduling and dedicated assignment are more concerned with allocation issues than with scheduling issues
- The important question becomes "How many processors should a process be assigned?" rather than "How shall I choose the next process?"
25 Dynamic Scheduling
- The number of threads within a process can vary during execution
- The work is shared between the OS and the application (see the sketch below)
- When a job originates, it requests a certain number of processors; the OS grants some or all of the request, based on the number of processors currently available
- The application itself then decides which of its threads run when, and on which processors; this requires language support, such as that provided by thread libraries
- When processors become free due to the termination of threads or processes, the OS can allocate them as needed to satisfy pending requests
- Simulations have shown that this approach is superior to gang scheduling and dedicated scheduling
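A toy sketch of the request/grant split between OS and application described above; the pool size, counters, and function names are purely illustrative:

    #include <stdio.h>

    /* Illustrative "OS" state: how many processors are currently free. */
    static int free_cpus = 3;

    /* OS side: grant some or all of the request, based on availability. */
    static int grant_processors(int requested) {
        int granted = requested < free_cpus ? requested : free_cpus;
        free_cpus -= granted;
        return granted;
    }

    /* OS side: processors freed by terminating jobs return to the pool
       and can satisfy pending requests. */
    static void release_processors(int n) {
        free_cpus += n;
    }

    int main(void) {
        int want = 5;
        int got = grant_processors(want);   /* job asks for 5; the pool holds 3 */
        printf("requested %d processors, granted %d\n", want, got);
        /* Application side: it now maps its own threads onto the granted
           processors, e.g. folding 5 threads onto 3 CPUs. */
        release_processors(got);            /* job terminates; pool refills */
        return 0;
    }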