Hyper-Threading Aware Process Scheduling Heuristics

1
Hyper-Threading Aware Process Scheduling
Heuristics
  • James Bulpin and Ian Pratt
  • University of Cambridge
  • Computer Laboratory

2
Introduction
  • Simultaneous Multithreading (SMT)
  • Fine-grain hardware multithreading
  • Processor resources dynamically shared
  • Extracts thread-level parallelism
  • Intel Hyper-Threading (HT)
  • 2 heavyweight threads
  • Some static partitioning, dynamic core with some
    fairness mechanisms
  • Shared caches

3
Effect on Scheduling
  • HT abstracted as independent logical processors
  • Processes can have mutual detrimental effect
  • Varies with workloads, large range of combined
    performance
  • Can improve performance by scheduling processes
    that perform well together
  • e.g. avoid co-scheduling two cache-thrashers
  • More important in multi-package/core systems

4
What is good performance?
  • Optimise for performance - how to measure?
  • Maximising IPC is biased towards high-IPC tasks
  • Want to compare performance under HT to what we
    would have had with no HT
  • Fair to high-IPC and low-IPC tasks
  • HT performance ratio p_{i,b} for code block s in
    interval i:

    $$p_{i,b} = \frac{\text{time to execute } s \text{ with no HT}}{\text{time to execute } s \text{ with HT and code block } b \text{ running on the other thread}}$$

  • System performance ratio is sum of running thread
    performance ratios

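To make the metric concrete, here is a minimal sketch that computes per-thread performance ratios from timings and sums them into a system performance ratio. The function names and the timings are invented for illustration and do not come from the presentation.

```python
def performance_ratio(time_no_ht: float, time_with_ht: float) -> float:
    """Time a code block takes with no HT, divided by the time it takes
    with another code block running on the other logical processor."""
    if time_no_ht <= 0 or time_with_ht <= 0:
        raise ValueError("times must be positive")
    return time_no_ht / time_with_ht


def system_performance_ratio(ratios):
    """Sum of the performance ratios of the currently running threads."""
    return sum(ratios)


# Hypothetical timings in seconds (illustrative only, not measured data).
p_a = performance_ratio(time_no_ht=2.0, time_with_ht=4.0)   # A runs at half speed
p_b = performance_ratio(time_no_ht=1.0, time_with_ht=1.25)  # B is barely affected
system = system_performance_ratio([p_a, p_b])
print(p_a, p_b, system)
```

Under this definition each thread is compared against its own no-HT speed, so a naturally low-IPC task is not penalised for being low-IPC. A sum above 1 would suggest the pair gets more work done together than a single thread would with no HT.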
5
Performance Estimation
  • Can only get p_{i,b} exactly by measuring both cases
  • Use online method to estimate
  • Processor performance counter data
  • Estimate p_{i,b} as f(perf counter increments)
  • f is linear function of increments
  • Coefficients learned by measuring HT and no-HT
    cases for a training set
  • Break tasks into multiple, short code blocks
  • Time code block running no-HT and HT against
    various other tasks
  • Multiple linear regression on results
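
The training step above can be sketched as an ordinary least-squares fit. This is an illustration only: the two counters, the training values, and the use of an intercept term are assumptions, and the presentation does not say which counters or solver were used.

```python
def fit_linear(X, y):
    """Least-squares fit of y ~ c0 + c1*x1 + ... + cn*xn (normal equations)."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column (assumption)
    n = len(rows[0])
    # Build A = R^T R and b = R^T y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: training set too small or collinear")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(col + 1, n):
            factor = A[k][col] / A[col][col]
            for j in range(col, n):
                A[k][j] -= factor * A[col][j]
            b[k] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs


def estimate_ratio(coeffs, increments):
    """Apply the learned linear function f to a set of counter increments."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], increments))


# Synthetic training set: two hypothetical counter increments per code block,
# with ratios generated from 0.9 - 0.02*x1 - 0.01*x2 so the fit is checkable.
X = [(1, 2), (3, 1), (5, 4), (2, 6), (7, 3)]
y = [0.9 - 0.02 * x1 - 0.01 * x2 for x1, x2 in X]
coeffs = fit_linear(X, y)
print(coeffs, estimate_ratio(coeffs, (4, 2)))
```

Once the coefficients are learned offline, the online estimate is just a handful of multiply-adds per sample, which is what makes it cheap enough to run inside a scheduler.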

6
Scheduling (1)
  • Modify scheduling decision to try to maximise
    system performance ratio
  • Estimate performance ratio of task pairs
  • Rolling average for each pairing that occurs
  • Small cache indexed by hash(pid1, pid2)
  • Use this to influence scheduling
  • e.g. avoid co-scheduling a pair of tasks that
    performed badly together in the past
  • e.g. select best performing task for one logical
    processor given the task running on the other
    logical processor

7
Scheduling (2)
  • Implementation "tryhard"
      • Modify dynamic priority of a task depending on
        the task running on the other logical processor
  • Implementation "plan"
      • Gang schedule tasks based on a scheduling plan
        created occasionally based on recorded
        performance data
      • Allow other pairings to run occasionally
      • Don't want to miss a potentially good pairing
  • Use heuristics rather than rigid rules
  • Respect static priorities, starvation avoidance,
    etc.
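
The idea of nudging dynamic priority by the task on the other logical processor could look roughly like the sketch below. Everything here is hypothetical: the task tuple layout, the bonus scaling, the exploration probability, and the rule of only choosing among tasks at the highest static priority are illustrative choices, not the Linux scheduler's or the authors' actual logic. Starvation avoidance is not modelled.

```python
import random


def pick_next(running_pid, runnable, pair_ratio,
              max_bonus=5, explore_prob=0.1, rng=random):
    """Choose a task for one logical processor.

    runnable:   list of (pid, static_priority, dynamic_priority),
                where a higher number means more deserving of the CPU.
    pair_ratio: callable (pid_a, pid_b) -> estimated ratio, or None if unknown.
    """
    # Respect static priority: only the top static level is eligible.
    top = max(static for _, static, _ in runnable)
    eligible = [task for task in runnable if task[1] == top]

    # Occasionally try an arbitrary pairing so good ones are not missed.
    if rng.random() < explore_prob:
        return rng.choice(eligible)[0]

    def score(task):
        pid, _, dynamic = task
        ratio = pair_ratio(running_pid, pid)
        if ratio is None:
            bonus = 0
        else:
            # Bounded bonus or penalty: a heuristic nudge, not a rigid rule.
            bonus = max(-max_bonus, min(max_bonus, round((ratio - 1.0) * 10)))
        return dynamic + bonus

    return max(eligible, key=score)[0]


# Hypothetical recorded ratios against the task (pid 1) on the other thread.
known = {(1, 10): 1.4, (1, 11): 0.8}
runnable = [(10, 0, 20), (11, 0, 22)]
choice = pick_next(1, runnable, lambda a, b: known.get((a, b)), explore_prob=0.0)
print(choice)
```

With no recorded data the bonus is zero and the choice falls back to plain dynamic priority, so the heuristic degrades to the unmodified behaviour rather than blocking a decision.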

8
Results
  • Linux 2.4.29
  • P4 Xeon HT 2.4GHz
  • Benchmarks from SPEC CPU2000
  • Speedup is akin to system performance ratio
  • "basic" is native package, rather than logical
    processor, affinity
  • "ipc" is a simple IPC-maximising scheme
  • Checked for fairness and respect of static
    priority

9
Summary and Conclusions
  • Performance on Hyper-Threading can be estimated
    using performance counters
  • More fair than counting IPC
  • Performance gains are possible with HT-aware
    scheduling
  • Gains are relatively small and incur some
    complexity
  • May not be worthwhile
  • Further work
  • Improved estimation
  • Application to other schedulers, e.g. Linux 2.6

10
Related Work
  • Thread-sensitive scheduling [Parekh 2000]
      • Maximise a metric (e.g. IPC, cache hit rate) per
        scheduling quantum
  • Symbiotic jobscheduling [Snavely 2000, 2002]
      • Sample, optimize, symbios
  • Hyper-Threading in Linux [Nakajima 2002]
      • Longer period performance counter sampling,
        modify hard CPU affinity of tasks
  • Chip multithreading [Fedorova 2004, 2005]
      • L2 cache-conscious scheduling to increase hit rate