Temperature-Sensitive Loop Parallelization for Chip Multiprocessors

1
Temperature-Sensitive Loop Parallelization for
Chip Multiprocessors
International Conference on Computer Design,
10/2-5, 2005, San Jose
  • Sri HK Narayanan, Guilin Chen, Mahmut Kandemir,
    Yuan Xie
  • Embedded Mobile Computing Center (EMC2)
  • The Pennsylvania State University

2
Outline
  • Motivation
  • Related Works
  • Our Approach
  • Example
  • Experimental Results
  • Conclusion

3
Motivation
  • Thermal hotspots are a cause for concern
  • Caused by increasing power density
  • Can result in permanent chip damage
  • How to avoid damage
  • Cooling techniques
  • How to prevent hotspots
  • Hardware techniques
  • This paper proposes a compiler-directed technique
    to avoid hotspots in CMPs

4
Related work: Dynamic Thermal Management
  • When one unit overheats, migrate its
    functionality to a distant, spare unit
  • Dual pipeline (Intel, ISQED 02)
  • Spare register file (Skadron et al. 2003)
  • Separate core (CMP) (Heo et al. ISLPED 2003)
  • Microarchitectural clusters (Intel, ICCD 2004)
  • Raises many interesting issues
  • Cost-benefit tradeoff for extra area
  • Use both resources (scheduling)
  • Run-time Thermal sensing/estimation
  • Yesterday, a UC Riverside paper at Session 2.2
    proposed a run-time thermal tracking method

5
Related work: Design-time techniques
  • MDL @ PSU
  • Thermal-Aware IP Virtualization and Placement for
    Networks-on-Chip Architecture, ICCD 2004
  • Thermal-Aware Allocation and Scheduling for MPSOC
    Design, DATE 2005
  • Thermal-Aware Floorplanning Using Genetic
    Algorithms, ISQED 2005
  • Thermal-Aware Voltage-Island Architecting, the
    other paper in this session
  • Other groups
  • Thermal-Aware High Level Synthesis (Northwestern
    Univ.: Memik, R. Dick) (ISLPED 2005, ASP-DAC 2006)
  • Many more in this conference
  • Industry
  • Gradient Design Automation (a start-up that
    showcased at DAC 2005)

6
CMP
Intel researchers and scientists are
experimenting with "many tens of cores,
potentially even hundreds of cores per die, per
single processor die ..."
Justin R. Rattner, Intel director of the
Corporate Technology Group, Spring 2005 IDF
Industry examples
Last night, Panel discussion on CMP
7
This paper: compiler approach
  • Temperature- and performance-sensitive loop
    scheduling
  • Schedules different loop iterations on the CMP
  • Data-locality aware and hence performance aware
  • Intuition behind the approach
  • Let hot cores idle while cool cores work.
  • Static scheduling of parallelized loop iterations
    at compile time

8
How can the compiler schedule temperature-aware
code?
  • This work targets loop-intensive programs run on
    embedded CMPs
  • Loop nests are divided into chunks.
  • The number of cycles in a chunk is τ.
  • Let the starting temperature of a processor be Tc
  • The temperature after executing the chunk is
  • Tc' = F(Tc, τ, floorplan, power_τ)
  • τ and power_τ are obtained by profiling the code.
  • Floorplan and physical parameters remain constant.

9
Thermal modeling
  • Want a good model of chip temperature
  • That accounts for adjacency and package
  • That does not require detailed designs
  • That is fast enough for practical use
  • A compact model based on thermal R, C (HotSpot)
  • Parameterized to automatically derive a model
    based on various
  • Architectures
  • Power models
  • Floorplans
  • Thermal Packages

10
Temperature Estimation
  • The temperature of each block depends on the
    power consumption and the location of blocks.
  • The thermal resistance Rij of PEi with respect to
    PEj can be represented by units of temperature
    rise at PEi due to one unit of power dissipated
    at PEj.
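The definition of Rij above can be turned into a small estimator. The sketch below assumes that the contributions of all PEs add up linearly, so the temperature rise at PEi is the sum over j of Rij times the power of PEj. The slide only defines Rij, so the superposition step, the function name, and the row-by-row matrix layout are illustrative assumptions.

```c
#include <stddef.h>

/* Hypothetical helper (not from the slides): temperature rise at each PE,
 * assuming linear superposition: rise[i] = sum over j of R[i][j] * P[j].
 * R is an n x n matrix stored row by row; P holds the power of each PE. */
void estimate_temperature_rise(size_t n, const double *R,
                               const double *P, double *rise)
{
    for (size_t i = 0; i < n; i++) {
        rise[i] = 0.0;
        for (size_t j = 0; j < n; j++)
            rise[i] += R[i * n + j] * P[j];
    }
}
```

Under this assumption, a PE heats up both from its own power (the diagonal entries) and from its neighbours (the off-diagonal entries), which is why the location of blocks matters.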

11
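Running Example: Basic Schedule
Jacobi's Algorithm
for (i = 1; i <= 600; i++)
  for (j = 1; j <= 1000; j++)
    B[i][j] = (A[i-1][j] + A[i+1][j] + A[i][j-1] + A[i][j+1]) / 4;
Parallel Schedule
Parallelized Algorithm for 5 cores (core k, k = 0..4)
for (i = k*120 + 1; i <= (k+1)*120; i++)
  for (j = 1; j <= 1000; j++)
    B[i][j] = (A[i-1][j] + A[i+1][j] + A[i][j-1] + A[i][j+1]) / 4;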
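As a rough illustration of the partitioning, the following sketch computes the range of outer-loop iterations a given core would execute. It assumes the outer loop covers i = 1..600 inclusive and that the iteration count divides evenly by the number of cores (600 / 5 = 120). The type and function names are hypothetical.

```c
/* Hypothetical helper (not from the slides): inclusive range of outer-loop
 * iterations assigned to core k when iterations 1..total are split evenly.
 * Assumes total is a multiple of cores. */
typedef struct {
    int first;
    int last;
} chunk_range;

chunk_range chunk_for_core(int k, int total, int cores)
{
    int size = total / cores;   /* 600 / 5 = 120 in the running example */
    chunk_range r = { k * size + 1, (k + 1) * size };
    return r;
}
```

With these assumptions, core 0 gets iterations 1..120 and core 4 gets 481..600. Neighbouring chunks share boundary rows of A, which is the source of the locality discussed on the next slide.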
12
Analysis of Basic Schedule
  • Assumptions in the example
  • Initial temperature is 0
  • Threshold temperature is 2
  • An idle slot reduces the temperature by 1 degree
    (but temperature stays ≥ 0)
  • So at most 2 active slots can be scheduled
    together on one core
  • The ideal number of active processors at any time
    is 5.
  • Due to Jacobi's algorithm, consecutive iteration
    chunks exhibit locality
  • Analysis
  • Great locality
  • Uses only 5 processors
  • Will definitely overheat
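The toy thermal model in these assumptions can be checked mechanically. The sketch below assumes that an active slot raises a core's temperature by 1 degree; the slide states only the idle-slot rule, and the 1-degree rise is inferred from the "at most 2 active slots" remark. The function name is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper (not from the slides): returns true if a core that
 * starts at temperature 0 never exceeds `threshold` under the toy model.
 * Assumed: an active slot adds 1 degree; an idle slot removes 1 degree,
 * and the temperature never drops below 0. */
bool schedule_is_safe(const bool *active, int slots, int threshold)
{
    int temp = 0;
    for (int s = 0; s < slots; s++) {
        if (active[s])
            temp += 1;
        else if (temp > 0)
            temp -= 1;
        if (temp > threshold)
            return false;
    }
    return true;
}
```

With a threshold of 2, three back-to-back active slots fail the check, while the pattern active, active, idle, active passes. This is why a schedule that keeps the same five cores busy in every slot overheats.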

13
Pure Temperature Aware Scheduling
  • Algorithm
  • Start with time slot as 0 and all iterations as
    unscheduled
  • While unscheduled iterations exist
  • Select the coolest A processors whose temperature
    is less than the threshold.
  • Schedule the chunks on those processors at
    current timeslot.
  • Reduce number of chunks to be scheduled.
  • Increase the time slot by 1.
  • Analysis
  • Poor locality
  • 1 extra time slot is used.
  • No temperature problems
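The steps above can be sketched as a greedy loop. This is a minimal illustration, not the authors' implementation: it reuses the toy thermal model from the earlier example (an assumed +1 degree per active slot, -1 per idle slot, never below 0), ignores locality entirely, and breaks temperature ties by lowest core index. The function name, parameter names, and `MAX_PROCS` limit are hypothetical.

```c
#include <stdbool.h>

#define MAX_PROCS 64

/* Hypothetical sketch (not from the slides): schedules `chunks` chunks on
 * `procs` processors, at most `want_active` per time slot, always picking
 * the coolest processors whose temperature is below `threshold`.
 * Returns the number of time slots used, or -1 for invalid arguments. */
int temperature_aware_schedule(int procs, int want_active, int chunks,
                               int threshold)
{
    if (procs < 1 || procs > MAX_PROCS || want_active < 1 || threshold < 1)
        return -1;

    int temp[MAX_PROCS] = {0};
    int slot = 0;

    while (chunks > 0) {
        bool busy[MAX_PROCS] = {false};
        int picked = 0;

        /* Pick the coolest eligible processors one at a time. */
        while (picked < want_active && chunks > 0) {
            int best = -1;
            for (int p = 0; p < procs; p++)
                if (!busy[p] && temp[p] < threshold &&
                    (best < 0 || temp[p] < temp[best]))
                    best = p;
            if (best < 0)
                break;          /* every remaining core is too hot */
            busy[best] = true;
            picked++;
            chunks--;
        }

        /* Toy thermal update (assumed): active heats, idle cools. */
        for (int p = 0; p < procs; p++) {
            if (busy[p])
                temp[p]++;
            else if (temp[p] > 0)
                temp[p]--;
        }
        slot++;
    }
    return slot;
}
```

For 15 chunks with a threshold of 2 and 5 chunks per slot, this sketch needs 4 slots on 5 processors (one slot is spent cooling) but only 3 slots on 8 processors, because spare cool cores can take over. Consecutive chunks land on arbitrary cores, which matches the poor locality noted in the analysis.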

14
Pure Temperature Aware Scheduling
15
Pure Locality Aware Scheduling
  • Algorithm
  • Start with a clean slate.
  • For each iteration chunk
  • Schedule it on the processor with the greatest
    locality to it, keeping at most two chunks
    together.
  • If more slots are required (when all processors
    are exhausted), increase the schedule length.
  • Otherwise move to the next processor
  • Analysis
  • Very good locality
  • However 2 extra time slots are used.
  • No temperature problems

16
Locality and temperature aware scheduling
  • Algorithm
  • Use temperature aware scheduling to obtain the
    schedulable slots.
  • Use locality aware scheduling to assign chunks to
    these slots.
  • Analysis - Best of both worlds
  • Great Locality
  • No temperature problems
  • Good performance

C = {I0, I1, I2, I3, I4}
17
(No Transcript)
18
Experiments
  • 5 loop-intensive codes were tested

19
adi - Threshold Temperature 88 °C
20
eflux - Threshold Temperature 88 °C
21
adi - Threshold Temperature 88 °C
22
eflux - Threshold Temperature 88 °C
23
Sensitivity Analysis: adi - Threshold Temperature 87 °C
24
Sensitivity Analysis: adi - Threshold Temperature 86 °C
25
Sensitivity Analysis: adi - Threshold Temperature 85 °C
26
Sensitivity Analysis: adi - Threshold Temperature 84 °C
27
Experiments
28
Experiments
29
Conclusion
  • Implemented a compiler-directed scheduling
    algorithm that is both temperature sensitive and
    performance aware.
  • Achieved reductions in both average and peak chip
    temperature.
  • This allows software to take up the burden of
    preventing chip damage due to thermal effects.
  • Chips can be aggressively scaled
  • Cooling costs can be reduced
  • Lowers the need for hardware-based thermal
    management schemes.

30
Thank you!