Title: Improving Energy Efficiency of Configurable Caches via Temperature-Aware Configuration Selection
1Improving Energy Efficiency of Configurable Caches via Temperature-Aware Configuration Selection
- Hamid Noori, Maziar Goudarzi, Koji Inoue, and Kazuaki Murakami
- Speaker: Tohru Ishihara
- Institute of Systems Information Technologies/KYUSHU, Japan
- Kyushu University, Japan
2Outline
- Background
- Motivation
- Problem Definition
- Proposed Approach
- Architecture
- Reconfiguration Flow
- Experimental Results
- Conclusions
4Background (1/2)
Dynamic energy per cache access
Leakage power of a cache memory
5Background (2/2)
7Motivational Example (1/3)
8Motivational Example (2/3)
Total dynamic energy for executing a program
Total static energy for executing a program
9Motivational Example (3/3)
Minimum-energy cache size
11Problem Definition (1/3)
- Objective function: total memory energy
- Cache dynamic energy
- Cache static energy
- Off-chip memory access energy
- Energy consumption during processor stall
[Figure: CPU with I-cache and D-cache, connected to main memory]
12Problem Definition (2/3)
- energy_memory(C, Temp, Tech) = energy_dynamic(C, Tech) + energy_static(C, Temp, Tech)   (1)
- energy_dynamic(C, Tech) = cache_accesses(C) × energy_cache_access(C, Tech) + cache_misses(C) × energy_miss(C, Tech)   (2)
- energy_miss(C, Tech) = energy_off_chip_stall + energy_cache_block_refill(C, Tech)   (3)
- energy_static(C, Temp, Tech) = executed_clock_cycles(C) × clock_period × leakage_power(C, Temp, Tech)   (4)
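Equations (1) to (4) can be sketched in Python, taking each right-hand side as the sum or product of its terms. This is only an illustration: the `CacheStats` container, the function signatures, and every numeric value in the usage lines are hypothetical placeholders, not data from the talk. The only figure taken from the slides is the 5 ns clock period, which follows from the 200 MHz base clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheStats:
    """Inputs for one cache configuration C (hypothetical container)."""
    cache_accesses: int               # accesses issued to the cache
    cache_misses: int                 # accesses that missed
    executed_clock_cycles: int        # total cycles, stalls included
    energy_cache_access: float        # joules per access, depends on (C, Tech)
    energy_cache_block_refill: float  # joules per refill, depends on (C, Tech)
    leakage_power: float              # watts, depends on (C, Temp, Tech)


def energy_miss(s: CacheStats, energy_off_chip_stall: float) -> float:
    # Eq. (3): off-chip access plus stall energy, plus block refill energy.
    return energy_off_chip_stall + s.energy_cache_block_refill


def energy_dynamic(s: CacheStats, energy_off_chip_stall: float) -> float:
    # Eq. (2): energy of all accesses plus energy of all misses.
    return (s.cache_accesses * s.energy_cache_access
            + s.cache_misses * energy_miss(s, energy_off_chip_stall))


def energy_static(s: CacheStats, clock_period: float) -> float:
    # Eq. (4): execution time multiplied by leakage power.
    return s.executed_clock_cycles * clock_period * s.leakage_power


def energy_memory(s: CacheStats, energy_off_chip_stall: float,
                  clock_period: float) -> float:
    # Eq. (1): dynamic plus static energy.
    return (energy_dynamic(s, energy_off_chip_stall)
            + energy_static(s, clock_period))


# Usage with made-up numbers (not measurements from the talk).
stats = CacheStats(
    cache_accesses=1_000_000,
    cache_misses=10_000,
    executed_clock_cycles=2_000_000,
    energy_cache_access=0.5e-9,
    energy_cache_block_refill=1e-9,
    leakage_power=0.05,
)
total = energy_memory(stats, energy_off_chip_stall=20e-9, clock_period=5e-9)
print(f"total memory energy: {total:.3e} J")
```

Only `leakage_power` depends on temperature in this model, so a hotter chip raises the static term and shifts the balance toward smaller configurations.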
13Problem Definition (3/3)
- For a given application, processor architecture, technology, and set of valid configurations of the configurable cache, find a valid cache configuration that minimizes energy consumption at a specific temperature over the entire execution of the application.
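The problem statement amounts to an argmin over the valid configurations at a fixed temperature. The sketch below assumes an exhaustive search, which the slides do not confirm as the authors' method. The function `select_configuration`, the `(size_kb, line_bytes, ways)` tuple layout, and the toy energy model are all hypothetical and only show the shape of the search.

```python
def select_configuration(valid_configs, energy_fn, temperature_c):
    """Return the configuration with the lowest energy at temperature_c.

    energy_fn(config, temperature_c) stands in for energy_memory(C, Temp, Tech)
    with the technology held fixed.
    """
    return min(valid_configs, key=lambda cfg: energy_fn(cfg, temperature_c))


def toy_energy(cfg, temperature_c):
    """Made-up model: misses shrink with size, leakage grows with size and heat."""
    size_kb, _line_bytes, _ways = cfg
    dynamic = 64.0 / size_kb                              # arbitrary units
    static = 0.1 * size_kb * 2 ** (temperature_c / 25)    # arbitrary units
    return dynamic + static


configs = [(16, 32, 2), (8, 32, 2), (4, 32, 1)]
cool = select_configuration(configs, toy_energy, 0)
hot = select_configuration(configs, toy_energy, 100)
print("0 C ->", cool, "| 100 C ->", hot)
```

With this toy model the selected cache shrinks as temperature rises, because leakage grows faster than the miss penalty saved by a larger cache.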
15Architecture
- TACC
- BCC (proposed by Zhang et al. [1])
- Cache size (way shutdown)
- Number of ways (way concatenation)
- Line size
- Thermal sensor
- Accessible port for reading the thermal sensor
[1] C. Zhang, F. Vahid, and W. Najjar, "A Highly Configurable Cache Architecture for Embedded Systems," ACM Trans. on Embedded Computing Systems, vol. 4, no. 2, May 2005.
16Reconfiguration Flow
18Experimental Setup (1/2)
- MiBench
- SimpleScalar
- Cache hit: one clock cycle
- Cache miss: 100 clock cycles
- Clock frequency of the base processor: 200 MHz
- CACTI 4.2
- Target technology: 70 nm (Vdd = 0.9 V)
- BCC (16 KB)
- 16 KB (4-, 2-, 1-way)
- 8 KB (2- and 1-way)
- 4 KB (1-way)
- The line size for each of the configurations can be 8, 16, or 32 bytes.
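The configuration space listed above is small enough to enumerate directly. The sketch below builds it from the size, associativity, and line-size options on this slide. The names `SIZE_WAY_OPTIONS`, `LINE_SIZES`, and `valid_configurations`, and the `(size_kb, line_bytes, ways)` tuple order, are illustrative choices rather than anything defined in the talk.

```python
# Cache size in KB -> associativities available at that size (from the slide).
SIZE_WAY_OPTIONS = {16: (4, 2, 1), 8: (2, 1), 4: (1,)}

# Line sizes in bytes available for every size/way combination (from the slide).
LINE_SIZES = (8, 16, 32)


def valid_configurations():
    """Return every (size_kb, line_bytes, ways) tuple the cache supports."""
    return [
        (size_kb, line_bytes, ways)
        for size_kb, way_options in SIZE_WAY_OPTIONS.items()
        for ways in way_options
        for line_bytes in LINE_SIZES
    ]


configs = valid_configurations()
print(len(configs), "configurations")
```

Six size/way combinations times three line sizes give 18 candidates, so evaluating all of them per application and temperature is inexpensive.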
19Experimental Setup (2/2)
- Base Configurable Cache (BCC)
- It has the same architecture as proposed by Zhang et al. [1]
- It supports a limited set of configurations
- It is configured for each application for the corner case (i.e., leakage at 100°C)
- Temperature-Aware Configurable Cache (TACC)
- TACC is configured for each execution of an application, considering the chip temperature at that time
[1] C. Zhang, F. Vahid, and W. Najjar, "A Highly Configurable Cache Architecture for Embedded Systems," ACM Trans. on Embedded Computing Systems, vol. 4, no. 2, May 2005.
20Energy Performance Evaluation
[Equations: energy saving and performance enhancement, each expressed as a percentage]
21Data and Instruction Cache
Data cache (D). Each entry: cache size, line size (bytes), number of ways
Temp  | qsort      | djpeg      | lame       | dijkstra   | patricia   | sha        | adpcm      | crc        | fft
0°C   | 16K, 32, 2 | 16K, 32, 2 | 16K, 32, 4 | 16K, 32, 2 | 16K, 32, 2 | 16K, 32, 2 | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 4
20°C  | 8K, 32, 2  | 16K, 32, 2 | 16K, 32, 4 | 16K, 32, 2 | 16K, 32, 2 | 8K, 32, 1  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 4
40°C  | 8K, 32, 2  | 16K, 32, 2 | 16K, 32, 4 | 8K, 32, 2  | 16K, 32, 2 | 4K, 32, 1  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 4
60°C  | 8K, 32, 2  | 16K, 32, 2 | 16K, 32, 2 | 8K, 32, 2  | 8K, 32, 2  | 4K, 32, 1  | 4K, 16, 1  | 8K, 32, 2  | 8K, 32, 2
80°C  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 2 | 8K, 32, 2  | 8K, 32, 2  | 4K, 32, 1  | 4K, 16, 1  | 4K, 32, 1  | 8K, 32, 2
100°C | 4K, 32, 1  | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 4K, 32, 1  | 4K, 32, 1  | 4K, 32, 1  | 8K, 32, 2

Instruction cache (I). Each entry: cache size, line size (bytes), number of ways
Temp  | basicmath  | qsort      | djpeg      | lame       | dijkstra   | blowfish   | rijndael   | gsm        | fft
0°C   | 16K, 8, 4  | 16K, 8, 4  | 16K, 32, 1 | 16K, 32, 2 | 16K, 32, 1 | 16K, 16, 2 | 16K, 32, 1 | 16K, 16, 4 | 8K, 32, 1
20°C  | 16K, 16, 4 | 16K, 16, 4 | 16K, 32, 1 | 16K, 32, 2 | 16K, 32, 1 | 16K, 16, 2 | 16K, 32, 1 | 16K, 32, 2 | 8K, 32, 1
40°C  | 16K, 16, 4 | 16K, 16, 4 | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 2 | 16K, 32, 1 | 16K, 32, 2 | 8K, 32, 1
60°C  | 16K, 16, 4 | 16K, 16, 4 | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 2 | 16K, 32, 1 | 8K, 32, 2  | 8K, 32, 1
80°C  | 16K, 32, 4 | 16K, 32, 4 | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 1 | 4K, 32, 1  | 8K, 32, 1
100°C | 16K, 32, 4 | 16K, 32, 4 | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 2 | 4K, 32, 1  | 8K, 32, 1
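One way to use such a table at run time is a per-application lookup keyed by the thermal sensor reading. The sketch below hard-codes the qsort data-cache entries listed above as `(size_kb, line_bytes, ways)` tuples. Rounding the reading up to the next tabulated temperature is an assumption made here for illustration, since the slides do not say how intermediate temperatures are handled. The names `QSORT_DCACHE` and `config_for_temperature` are hypothetical.

```python
from bisect import bisect_left

# qsort data-cache entries from the table: temperature (C) -> configuration.
QSORT_DCACHE = {
    0: (16, 32, 2),
    20: (8, 32, 2),
    40: (8, 32, 2),
    60: (8, 32, 2),
    80: (8, 32, 2),
    100: (4, 32, 1),
}
TABULATED_TEMPS = sorted(QSORT_DCACHE)


def config_for_temperature(temp_c):
    """Pick the entry for the lowest tabulated temperature >= temp_c.

    Readings above the hottest tabulated point reuse the 100 C entry
    (assumed policy, not stated in the slides).
    """
    idx = min(bisect_left(TABULATED_TEMPS, temp_c), len(TABULATED_TEMPS) - 1)
    return QSORT_DCACHE[TABULATED_TEMPS[idx]]


for reading in (0, 35, 100):
    print(reading, "C ->", config_for_temperature(reading))
```

Rounding up errs toward the higher-leakage entry, which tends to select the smaller cache. A nearest-neighbour policy would be an equally plausible choice.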
22Energy Saving
23Performance Enhancement
25Conclusions
- Temperature-aware configurable caches are important for finer technologies. Up to 61% (17% on average) of energy consumption can be saved in 70 nm technology for the instruction cache.
- The data cache is more easily affected by temperature than the instruction cache. Using a configurable data cache, up to 77% (36% on average) of energy can be saved in 70 nm technology.
- The TACC improves performance by up to 28% (5% on average) for the instruction cache and by up to 17% (8.1% on average) for the data cache.
26- Thank you for your attention
- Please direct any questions to noori_at_c.csce.kyushu-u.ac.jp
27Backup slides
30
Technology | Metric            | ARM7TDMI | ARM966E-S
130nm      | Power consumption | 7.98 mW  | 62.5 mW
130nm      | Frequency         | 133 MHz  | 250 MHz
90nm       | Power consumption | 7.08 mW  | 51.7 mW
90nm       | Frequency         | 236 MHz  | 470 MHz