Techniques for Multicore Thermal Management - PowerPoint PPT Presentation

About This Presentation
Author: bernie
Slides: 24
Provided by: bern3152
Learn more at: https://cs.login.cmu.edu

Transcript and Presenter's Notes



1
Techniques for Multicore Thermal Management
Field Cady, Bin Fu and Kai Ren
2
Techniques for Multicore Thermal Management
  • Overview and comparison of techniques
  • Plus determining the critical thread
  • DVFS details
  • Thread movement

3
Taxonomy
  • Stop-Go vs. DVFS
  • Stop-Go: suspend core operation for 30
    milliseconds when the temperature exceeds a threshold
  • DVFS: dynamic voltage and frequency scaling,
    driven by techniques from control theory
  • Distributed vs. global
  • Apply the above to each core individually
    (distributed) or to all cores at once (global)
  • Performance asymmetry: different demands on
    different cores
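To make the two taxonomy axes concrete, here is a minimal Python sketch of Stop-Go under both global and distributed control. The function names, the threshold, and the temperatures are hypothetical. Only the 30 ms stall length comes from the slide, and a real controller would read on-die thermal sensors rather than take a list.

```python
from typing import List

STALL_MS = 30.0  # stall length quoted on the slide; treat it as a tunable


def stop_go_step(now_ms: float, temps_c: List[float], stalled_until: List[float],
                 threshold_c: float, distributed: bool) -> List[float]:
    """Return updated per-core stall deadlines (a core runs once now_ms >= its deadline)."""
    hot = [t > threshold_c for t in temps_c]
    if not distributed and any(hot):
        # Global control: one hot core stalls the whole chip.
        hot = [True] * len(temps_c)
    # A hot core that is currently running starts a new STALL_MS stall;
    # a core that is already stalled keeps its existing deadline.
    return [now_ms + STALL_MS if h and now_ms >= until else until
            for h, until in zip(hot, stalled_until)]


def running(now_ms: float, stalled_until: List[float]) -> List[bool]:
    """Which cores are executing (not stalled) at time now_ms."""
    return [now_ms >= until for until in stalled_until]


temps = [80.0, 90.0, 70.0, 75.0]  # hypothetical readings, threshold 85 C
print(stop_go_step(0.0, temps, [0.0] * 4, 85.0, distributed=True))   # [0.0, 30.0, 0.0, 0.0]
print(stop_go_step(0.0, temps, [0.0] * 4, 85.0, distributed=False))  # [30.0, 30.0, 30.0, 30.0]
```

The contrast shows why distributed control helps Stop-Go in particular: under global control a single hot core idles all four cores for the full stall.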

4
Taxonomy (cont.)
  • Migration
  • Moving threads between cores
  • Timescale on the order of a millisecond, much slower
    than DVFS
  • Migration is the outer loop of control, running on
    top of DVFS or Stop-Go
  • Migrate the critical thread
  • Measure criticality with a heat sensor
  • Or use cache misses as a proxy
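The outer migration loop can be sketched as a periodic greedy matching that runs on the millisecond timescale while DVFS or Stop-Go keeps acting in between calls. This is one plausible policy, assumed for illustration and not necessarily the exact Donald and Martonosi algorithm. Threads are ranked by a heat proxy (a sensor reading or a counter-based estimate), and the hottest thread is placed on the coolest core. The function name and inputs are hypothetical.

```python
from typing import Dict, List


def migrate(thread_heat: Dict[str, float], core_temp_c: List[float]) -> Dict[str, int]:
    """Map each thread to a core: hottest thread -> coolest core.

    thread_heat is a hypothetical per-thread proxy (sensor-based or
    performance-counter-based); core_temp_c is the current per-core
    temperature. Called once per migration interval.
    """
    if len(thread_heat) > len(core_temp_c):
        raise ValueError("more threads than cores")
    threads = sorted(thread_heat, key=thread_heat.__getitem__, reverse=True)
    cores = sorted(range(len(core_temp_c)), key=core_temp_c.__getitem__)
    return dict(zip(threads, cores))


# Thread "b" is hottest and lands on core 1, the coolest core.
print(migrate({"a": 1.0, "b": 3.0, "c": 2.0}, [70.0, 60.0, 80.0]))
```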

5
Aside: Criticality
  • In a separate paper, Abhishek et al. define the
    critical thread as the slowest thread
  • If we know which thread is critical, we can:
  • Steal tasks from the critical thread
  • Guide DVFS to favor the critical thread
  • Explored proxies for criticality
  • 13-32% performance boost from task stealing on a
    32-core machine

6
Criticality (cont.)
  • Cache misses are an excellent proxy for criticality
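A rough sketch of how a cache-miss proxy could guide task stealing: each thread's criticality score is a weighted count of its misses, and an idle thread steals a pending task from the most critical peer. The weights and helper names are assumptions for illustration, not the predictor or values from the separate criticality paper.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

# Hypothetical weights: a miss served by L2 costs far less than one that
# goes to memory. The slides only say cache misses are a good proxy.
L2_HIT_WEIGHT = 10
MEMORY_WEIGHT = 200


def criticality(l1_misses_l2_hits: int, l2_misses: int) -> int:
    """Higher score = thread stalls more = more likely to be the slowest (critical)."""
    return L2_HIT_WEIGHT * l1_misses_l2_hits + MEMORY_WEIGHT * l2_misses


def steal(queues: Dict[str, Deque[str]], thief: str,
          misses: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """An idle thread steals one pending task from the most critical busy thread."""
    victims = [t for t in queues if t != thief and queues[t]]
    if not victims:
        return None
    victim = max(victims, key=lambda t: criticality(*misses[t]))
    return queues[victim].pop()  # take from the tail; the owner works from the head


misses = {"t0": (100, 0), "t1": (50, 10), "t2": (0, 1)}
queues = {"t0": deque(["a"]), "t1": deque(["b", "c"]), "t2": deque()}
print(steal(queues, "t2", misses))  # t1 scores highest, so its last task "c" is stolen
```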

7
Donald and Martonosi: comparison of techniques
  • Goal: maximize performance subject to a
    temperature constraint
  • Measure performance in BIPS (billions of instructions
    per second) and duty cycle, i.e., useful time scaled
    by DVFS frequency
  • Run on SPEC benchmarks
  • Simulated 4-core processor
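Under one reading of the metrics above, the two measurements can be computed as follows. The sample format (duration, frequency as a fraction of f_max, running flag) is a hypothetical representation chosen for the sketch, not the paper's simulator output.

```python
from typing import List, Tuple


def bips(instructions: int, seconds: float) -> float:
    """Throughput in billions of instructions per second."""
    return instructions / seconds / 1e9


def scaled_duty_cycle(samples: List[Tuple[float, float, bool]]) -> float:
    """Fraction of time doing useful work, weighting each running interval
    by its DVFS frequency relative to f_max.

    Each sample is (duration, f / f_max, running). A stalled (Stop-Go)
    interval contributes nothing; a half-speed DVFS interval counts half.
    """
    total = sum(d for d, _, _ in samples)
    useful = sum(d * ratio for d, ratio, is_running in samples if is_running)
    return useful / total


# 10 ms at full speed, 10 ms at half speed, 10 ms stalled -> 0.5
print(scaled_duty_cycle([(10, 1.0, True), (10, 0.5, True), (10, 1.0, False)]))
```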

8
Results
  • All normalized to distributed Stop-Go

9
  • Stop-Go was terrible!
  • Why didn't they try it with a lower frequency?
  • Was 30 milliseconds the right time to stop?
  • They subsequently focus solely on DVFS, even
    though the hardware is trickier

10
Migration Policies
11
Summary & Conclusion
  • DVFS is far superior to Stop-Go
  • Distributed control helps, especially for Stop-Go
  • Migration helps for Stop-Go
  • Counter-based and sensor-based migration perform comparably

12
DVFS
  • Dynamic voltage and frequency scaling (per core).
  • Dynamic voltage scaling is a power management
    technique in computer architecture in which the
    supply voltage of a component is raised or
    lowered at runtime
  • Dynamic frequency scaling (also known as CPU
    throttling) is a technique in computer
    architecture where a processor is run at a
    less-than-maximum frequency in order to conserve
    power.
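The reason DVFS scales both quantities follows from the standard dynamic-power relation for CMOS logic, P = a*C*V^2*f. The sketch below assumes that formula and ignores leakage power, which is a simplification. Lowering frequency alone cuts power only linearly, while lowering voltage along with it gives a roughly cubic reduction.

```python
def relative_dynamic_power(v_ratio: float, f_ratio: float) -> float:
    """Dynamic CMOS power P = a*C*V^2*f, relative to nominal V and f.

    Activity factor a and capacitance C cancel when comparing the same
    core against its own nominal settings. Leakage is ignored.
    """
    return v_ratio ** 2 * f_ratio


def relative_energy_per_task(v_ratio: float, f_ratio: float) -> float:
    """Runtime of a fixed compute-bound task scales as 1/f, so energy ~ V^2."""
    return relative_dynamic_power(v_ratio, f_ratio) / f_ratio


print(relative_dynamic_power(1.0, 0.8))  # frequency only: 0.8x power
print(relative_dynamic_power(0.8, 0.8))  # voltage and frequency: about 0.51x power
```

Frequency-only throttling (the first call) also leaves energy per task unchanged under this model, which is why voltage scaling is the part that saves energy.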

13
Challenge
  • Multiple cores may need to be manipulated
    simultaneously to control both power and
    temperature for a CMP chip. This requires
    Multi-Input-Multi-Output (MIMO) control
  • Application software is often designed for
    single-core processors, so power shifting among cores is needed
  • Heterogeneous cores
  • The workload of a CMP processor is unpredictable at
    design time and may vary significantly at runtime

14
DVFS
15
Open-Loop Control
  • $P(k+1) = P(k) + A\,\Delta f(k)$
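One reading of the open-loop model treats P(k) as the vector of per-core power in control period k, Δf(k) as the vector of per-core frequency changes, and A as a gain matrix whose off-diagonal terms capture coupling between cores. Under that assumed interpretation, an open-loop controller simply inverts A to pick the frequency change that should hit a power target. The 2-core numbers below are made up.

```python
from typing import List

Vec = List[float]
Mat = List[List[float]]


def predict(P: Vec, A: Mat, df: Vec) -> Vec:
    """One step of the model P(k+1) = P(k) + A * delta_f(k)."""
    return [p + sum(a * d for a, d in zip(row, df)) for p, row in zip(P, A)]


def solve2(M: Mat, b: Vec) -> Vec:
    """Solve the 2x2 system M x = b by Cramer's rule (enough for a 2-core example)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if abs(det) < 1e-12:
        raise ValueError("gain matrix is singular")
    return [(M[1][1] * b[0] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - M[1][0] * b[0]) / det]


def open_loop_df(P: Vec, A: Mat, P_target: Vec) -> Vec:
    """Pick delta_f so that the model predicts exactly P_target next period."""
    return solve2(A, [t - p for t, p in zip(P_target, P)])


A_model = [[2.0, 0.5], [0.5, 2.0]]          # hypothetical gain matrix
df = open_loop_df([10.0, 10.0], A_model, [12.0, 11.0])
print(predict([10.0, 10.0], A_model, df))   # model says we hit [12.0, 11.0]
A_true = [[2.4, 0.6], [0.6, 2.4]]           # the real chip differs by 20%
print(predict([10.0, 10.0], A_true, df))    # actual power overshoots to about [12.4, 11.2]
```

The second call is the weakness of open-loop control: when the real gain differs from A, the controller misses its target and has no way to notice.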

16
Using Feedback (Closed-Loop)
  • Dynamically update the matrix A at runtime
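Closing the loop means measuring the power actually reached each period, re-planning from it, and correcting the estimate of A along the way. The sketch below uses a normalized LMS update as a stand-in for an online model estimator. That choice is an assumption, since the slide does not say which estimator is used. The "true" matrix exists only to simulate the chip; the controller sees only measured power.

```python
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(M: Mat, v: Vec) -> Vec:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def solve2(M: Mat, b: Vec) -> Vec:
    """Solve the 2x2 system M x = b by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if abs(det) < 1e-12:
        raise ValueError("model matrix is singular")
    return [(M[1][1] * b[0] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - M[1][0] * b[0]) / det]


def nlms_update(A_hat: Mat, df: Vec, dP: Vec, mu: float = 1.0) -> Mat:
    """Nudge A_hat so that A_hat @ df moves toward the observed power change dP."""
    norm = sum(x * x for x in df)
    if norm < 1e-18:  # no excitation this period: keep the old model
        return A_hat
    resid = [o - p for o, p in zip(dP, matvec(A_hat, df))]
    return [[A_hat[i][j] + mu * resid[i] * df[j] / norm for j in range(len(df))]
            for i in range(len(dP))]


def run_closed_loop(A_true: Mat, A_hat: Mat, P0: Vec, target: Vec,
                    steps: int) -> Tuple[Vec, Mat]:
    P = list(P0)
    for _ in range(steps):
        err = [t - p for t, p in zip(target, P)]  # feedback from measured power
        df = solve2(A_hat, err)                   # plan with the current model
        dP = matvec(A_true, df)                   # the simulated chip responds
        P = [p + d for p, d in zip(P, dP)]
        A_hat = nlms_update(A_hat, df, dP)        # re-estimate A online
    return P, A_hat


P, A_est = run_closed_loop([[2.4, 0.6], [0.6, 2.4]], [[2.0, 0.5], [0.5, 2.0]],
                           [10.0, 10.0], [12.0, 11.0], steps=10)
print(P)  # settles on the [12.0, 11.0] target despite the 20% model error
```

With the same 20% model error that makes the open-loop version overshoot, this toy loop reaches the target within a few control periods.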

17
(No Transcript)
18
Thread Motion: Fine-Grained Power Management for
Multi-Core Systems
19
Motivation
  • Limitations of DVFS
  • Coarse-grained
  • Initiated by the OS at millisecond granularity
  • Voltage transition delay: 10 microseconds
  • Too slow to respond to fine-grained variations in program
    behavior (a cache miss takes nanoseconds)
  • Per-core DVFS with multiple VF settings
  • High cost of off-chip regulators
  • Poor scalability to a large number of cores

20
Thread Motion
  • Idea of Thread Motion
  • Move threads between cores in two VF domains
  • Threads experience virtually continuous voltage/frequency
    settings
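The "virtually continuous" effect can be read as time-averaging. A thread that spends a fraction x of its time on the high-VF core runs at an effective frequency of x*f_high + (1 - x)*f_low. Below is a minimal sketch assuming migration is free and using hypothetical frequency values.

```python
def high_core_fraction(f_target: float, f_low: float, f_high: float) -> float:
    """Share of time on the high-VF core that yields an average of f_target.

    Assumes moving a thread costs nothing; a real system pays some
    overhead per move.
    """
    if not f_low <= f_target <= f_high:
        raise ValueError("f_target must lie between the two VF levels")
    if f_high == f_low:
        return 0.0
    return (f_target - f_low) / (f_high - f_low)


def effective_frequency(x_high: float, f_low: float, f_high: float) -> float:
    """Time-averaged frequency seen by a thread spending x_high on the fast core."""
    return x_high * f_high + (1.0 - x_high) * f_low


# Hypothetical levels of 1.0 and 2.0 GHz: averaging 1.25 GHz needs 25% on the fast core.
print(high_core_fraction(1.25, 1.0, 2.0))   # 0.25
print(effective_frequency(0.25, 1.0, 2.0))  # 1.25
```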

21
Thread Motion
  • TM Manager
  • A separate embedded microcontroller running the TM
    algorithm
  • Effective IPC
  • Maintains a table of IPC values for each application
  • High IPC: compute-intensive
  • Low IPC: cache misses, memory-access latency
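A TM manager's per-application IPC table could be as simple as a smoothed IPC value per application plus a threshold that labels the current phase. The smoothing factor, the threshold of 1.0, and the class name are hypothetical choices for illustration.

```python
from typing import Dict


class IpcTable:
    """Hypothetical per-application IPC table, as a TM manager might keep it."""

    def __init__(self, alpha: float = 0.5, compute_threshold: float = 1.0) -> None:
        self.alpha = alpha                  # weight of the newest sample (EWMA)
        self.threshold = compute_threshold  # IPC at or above this = compute-intensive
        self.ipc: Dict[str, float] = {}

    def record(self, app: str, instructions: int, cycles: int) -> float:
        """Fold one sampling window into the application's smoothed IPC."""
        sample = instructions / cycles
        old = self.ipc.get(app)
        self.ipc[app] = sample if old is None else self.alpha * sample + (1 - self.alpha) * old
        return self.ipc[app]

    def phase(self, app: str) -> str:
        """'compute' for high IPC, 'memory' for low IPC (stalled on cache misses)."""
        return "compute" if self.ipc[app] >= self.threshold else "memory"


table = IpcTable()
table.record("a", 300, 100)  # IPC 3.0
table.record("b", 20, 100)   # IPC 0.2
print(table.phase("a"), table.phase("b"))  # compute memory
```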

22
Thread Motion Algorithm
  • Movement Policy
  • Assign a thread in a compute-intensive phase to a
    high-VF core
  • Intra-cluster movement is considered first
  • Trigger points
  • TM-interval: move at fixed intervals of 200 cycles
  • Miss-driven: move a thread when it incurs a cache miss
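The movement policy and both trigger points can be sketched together. Several assumptions here do not come from the slides. Cores are given as a flat list tagged with a cluster id and a high/low VF flag, IPC values come from a table like the one the TM manager keeps, and cross-cluster swaps are done greedily only after the intra-cluster pass. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Core:
    cluster: int
    high_vf: bool  # True = core sits in the high voltage/frequency domain
    thread: str


def should_move(policy: str, cycles_since_last: int, had_miss: bool,
                interval: int = 200) -> bool:
    """Trigger check; 200 cycles is the TM-interval quoted on the slide."""
    if policy == "interval":
        return cycles_since_last >= interval
    if policy == "miss":
        return had_miss
    raise ValueError(f"unknown policy: {policy}")


def rebalance(cores: List[Core], ipc: Dict[str, float]) -> int:
    """Put high-IPC threads on high-VF cores; returns how many threads changed cores."""
    moves = 0
    # Pass 1: intra-cluster movement, considered first.
    for cid in sorted({c.cluster for c in cores}):
        members = [c for c in cores if c.cluster == cid]
        ranked = sorted((c.thread for c in members), key=ipc.__getitem__, reverse=True)
        slots = [c for c in members if c.high_vf] + [c for c in members if not c.high_vf]
        for core, thread in zip(slots, ranked):
            if core.thread != thread:
                core.thread = thread
                moves += 1
    # Pass 2: cross-cluster swaps while some low-VF core holds a higher-IPC
    # thread than the weakest high-VF core.
    high = [c for c in cores if c.high_vf]
    low = [c for c in cores if not c.high_vf]
    while high and low:
        weakest = min(high, key=lambda c: ipc[c.thread])
        strongest = max(low, key=lambda c: ipc[c.thread])
        if ipc[strongest.thread] <= ipc[weakest.thread]:
            break
        weakest.thread, strongest.thread = strongest.thread, weakest.thread
        moves += 2
    return moves


cores = [Core(0, True, "a"), Core(0, False, "b"), Core(1, True, "d"), Core(1, False, "e")]
print(rebalance(cores, {"a": 3.0, "b": 2.9, "d": 0.1, "e": 0.2}))  # 4 threads moved
print(sorted(c.thread for c in cores if c.high_vf))                # ['a', 'b']
```

In the usage example, the intra-cluster pass only reorders cluster 1, and the cross-cluster pass then moves the high-IPC thread "b" onto cluster 1's fast core. Each swap strictly raises the total IPC on high-VF cores, so the loop always terminates.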

23
Thread Motion
Better Quality