Process Scheduling in Multiprocessor and Multithreaded Systems - PowerPoint PPT Presentation



1
Process Scheduling in Multiprocessor and
Multithreaded Systems
  • Matt Davis
  • CS535
  • 4/7/2003

2
Outline
  • Multiprocessor Systems
  • Issues in MP Scheduling
  • How to Allocate Processors
  • Cache Affinity
  • Linux MP Scheduling
  • Simultaneous Multithreaded Systems
  • Issues in SMT Scheduling
  • Symbiotic Jobscheduling
  • SMT and Priorities
  • Linux SMT Scheduling
  • Conclusions

3
Multiprocessor Systems
  • Symmetric Multiprocessing (SMP)
  • One copy of OS in memory, any CPU can use it
  • OS must ensure that multiple processors cannot
    access shared data structures at the same time

Shared Memory Multiprocessors
4
Issues in MP Scheduling
  • Starvation
  • Number of active parallel threads < number of
    allocated processors
  • Overhead
  • CPU time used to transfer and start various
    portions of the application
  • Contention
  • Multiple threads attempt to use same shared
    resource
  • Latency
  • Delay in communication between processors and I/O
    devices

5
How to Allocate Processors
  • Allocate proportional to average parallelism
  • Other factors
  • System load
  • Variable parallelism
  • Min/Max parallelism
  • Acquire/relinquish processors based on current
    program needs
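
The proportional-allocation idea above can be sketched as follows; the job records with average/min/max parallelism fields are illustrative, not an interface from the slides:

```python
def allocate(jobs, total_cpus):
    """Give each job a CPU share proportional to its average parallelism,
    clamped to its declared min/max parallelism."""
    total_par = sum(j["avg_par"] for j in jobs)
    alloc = {}
    for j in jobs:
        # Proportional share of the machine...
        share = round(total_cpus * j["avg_par"] / total_par)
        # ...clamped so a job never gets fewer CPUs than it needs
        # or more than it can use.
        alloc[j["name"]] = max(j["min_par"], min(j["max_par"], share))
    return alloc
```

A dynamic policy would rerun this as jobs' parallelism changes, acquiring or relinquishing processors to match current program needs.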

6
Cache Affinity
  • While a program runs, data needed is placed in
    local cache
  • When job is rescheduled, it will likely access
    some of the same data
  • Scheduling jobs where they have affinity
    improves performance by reducing cache penalties

7
Cache Affinity (cont)
  • Tradeoff between processor reallocation and cost
    of reallocation
  • Utilization versus cache behavior
  • Scheduling policies
  • Equipartition: a constant number of processors
    allocated evenly to all jobs. Low overhead.
  • Dynamic: constantly reallocates jobs to maximize
    utilization. High utilization.

8
Cache Affinity (cont)
  • Vaswani and Zahorjan, 1991
  • When a processor becomes available, allocate it
    to the runnable process that last ran on that
    processor, or to a higher-priority job
  • If a job requests additional processors, allocate
    critical tasks on processor with highest affinity
  • If an allocated processor becomes idle, hold it
    for a small amount of time in case task with
    affinity comes along
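
A minimal sketch of the first rule (the process records with `last_cpu` and `priority` fields are illustrative, not part of the original algorithm's interface):

```python
def pick_next(cpu, runnable):
    """Pick a job for a newly idle CPU: prefer a runnable process that
    last ran on this CPU (cache affinity), breaking ties by priority;
    otherwise fall back to the highest-priority runnable job."""
    if not runnable:
        return None
    affine = [p for p in runnable if p["last_cpu"] == cpu]
    pool = affine or runnable
    return max(pool, key=lambda p: p["priority"])
```
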

9
Vaswani and Zahorjan, 1991
  • Results showed that utilization was dominant
    effect on performance, not cache affinity
  • But their algorithm did not degrade performance
  • Predicted that as processor speeds increase,
    significance of cache affinity will also increase
  • Later studies validated their predictions

10
Linux 2.5 MP Scheduling
  • Each processor responsible for scheduling own
    tasks
  • schedule()
  • After process switch, check if new process should
    be transferred to other CPU running lower
    priority task
  • reschedule_idle()
  • Cache affinity
  • Affinity mask stored in /proc/pid/affinity
  • sched_setaffinity(), sched_getaffinity()
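
To illustrate, an affinity mask is a bitmask in which bit i set means the task may run on CPU i; these small helpers are a sketch of that representation, not the kernel's own code:

```python
def cpus_to_mask(cpus):
    """Build an affinity bitmask from a set of allowed CPU numbers."""
    return sum(1 << c for c in cpus)

def mask_to_cpus(mask):
    """Recover the set of allowed CPUs from an affinity bitmask."""
    return {i for i in range(mask.bit_length()) if (mask >> i) & 1}
```

sched_setaffinity() and sched_getaffinity() take this mask (as a cpu_set_t in C) to pin a task to, or query, its allowed CPUs.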

11
What is SMT?
  • Simultaneous Multithreading
  • aka HyperThreading
  • Issue instructions from multiple threads
    simultaneously on a superscalar processor

12
Why SMT?
  • Technique to exploit parallelism in and between
    programs with minimal additions in chip resources
  • Operating system treats SMT processor as two
    separate processors

13
Issues With SMT Scheduling
  • Not really separate processors
  • Share same caches
  • MP scheduling attempts to avoid idle processors
  • SMT-aware scheduler must differentiate between
    physical and logical processors

14
Symbiotic Jobscheduling
  • Recent studies from the University of
    Washington, where much of the early SMT research
    originated
  • OS coschedules jobs to run on hardware threads
  • Number of coscheduled jobs ≤ SMT level
  • Occasionally swap out running set to ensure
    fairness

15
Symbiotic Jobscheduling (cont)
  • Shared system resources
  • Functional units, caches, TLBs, etc
  • Coscheduled jobs may interact well
  • Few resource conflicts, high utilization
  • Or they may interact poorly
  • Many resource conflicts, lower utilization
  • Choice of coscheduled jobs can have large impact
    on system performance

16
Symbiotic Jobscheduling (cont)
  • Improve symbiosis by coscheduling jobs that get
    along well
  • Two phases of the SOS (Sample, Optimize,
    Symbios) jobscheduler
  • Sample: gather data on current performance
  • Symbios: use the computed scheduling
    configuration

17
Symbiotic Jobscheduling (cont)
  • Sample phase
  • Periodically alter coscheduled job mix
  • Record system utilization from hardware
    performance counter registers
  • Symbios phase
  • Pick job mix that had the highest utilization
  • Trade-off between sampling often and sampling
    infrequently
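
The sample/symbios loop can be sketched as below; `measure` stands in for reading the hardware performance counters, and all names are assumptions for illustration:

```python
from itertools import combinations

def sos_best_mix(jobs, smt_level, measure):
    """Sample phase: coschedule each candidate job mix of size smt_level
    and record its utilization via measure(). Symbios phase: return the
    mix that sampled best, to be run until the next sampling period."""
    best_mix, best_util = None, float("-inf")
    for mix in combinations(jobs, smt_level):
        util = measure(mix)
        if util > best_util:
            best_mix, best_util = mix, util
    return best_mix
```

Sampling every mix is only feasible for small job sets; the real scheduler samples a subset of permutations, which is exactly the trade-off noted above.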

18
How to Measure Utilization?
  • IPC not necessarily best predictor
  • IPC can have high variations throughout process
  • High-IPC threads may unfairly take system
    resources from low-IPC threads
  • Other predictors: low conflicts, high cache hit
    rate, diverse instruction mix
  • Balance: the schedule with the lowest deviation
    in IPC between coschedules is considered best
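
The balance criterion can be sketched as picking the candidate coschedule with the lowest IPC deviation (the IPC figures are illustrative data, not measured results):

```python
from statistics import pstdev

def best_balanced(coschedules):
    """Among candidate coschedules, each given as a list of per-thread
    IPC samples, pick the one whose threads' IPCs deviate least."""
    return min(coschedules, key=pstdev)
```

A mix like [2.0, 0.5] has high throughput but lets one thread starve the other, so this metric prefers a more even mix such as [1.2, 1.1].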

19
What About Priorities?
  • Scheduler estimates the natural IPC of job
  • If a high-priority job is not meeting its
    desired IPC, it is scheduled exclusively on the
    CPU
  • Provides a truer implementation of priority
  • Normal schedulers only guarantee proportional
    resource sharing, assuming no interaction between
    jobs

20
Another Priority Algorithm
  • SMT hardware fetches instructions to issue from
    queue
  • Scheduler can bias fetching algorithm to give
    preference to high-priority threads
  • Hardware already exists, minimal modifications

21
Symbiosis Performance Results
  • Without priorities
  • Up to 17% improvement
  • Software-enforced priorities
  • Up to 20%, average 8%
  • Hardware-based priorities
  • Up to 30%, average 15%

22
Linux 2.5 SMT Scheduling
  • Immediate reschedule forced when HT CPU is
    executing two idle processes
  • HT-aware affinity: processes prefer the same
    physical CPU
  • HT-aware load-balancing: distinguishes logical
    and physical CPUs in resource allocation

23
Conclusions
  • Intelligent allocation of resources can improve
    performance in parallel systems
  • Dynamic scheduling of processors in MP systems
    produces better utilization as processor speeds
    increase
  • Cache affinity can help improve throughput
  • Symbiotic coscheduling of tasks in SMT systems
    can improve average response time

24
Resources
  • Kenneth Sevcik, Characterizations of Parallelism
    in Applications and Their Use in Scheduling
  • Raj Vaswani and John Zahorjan, The Implications
    of Cache Affinity on Processor Scheduling for
    Multiprogrammed, Shared Memory Multiprocessors
  • Allan Snavely et al., Symbiotic Jobscheduling
    with Priorities for a Simultaneous Multithreading
    Processor
  • Linux MP cache affinity,
    http://www.tech9.net/rml/linux
  • Linux Hyperthreading Scheduler,
    http://www.kernel.org/pub/linux/kernel/people/rusty/Hyperthread_Scheduler_Modifications.html
  • Daniel Bovet and Marco Cesati, Understanding the
    Linux Kernel