Title: A Low-Power Instruction Cache Architecture Exploiting Program Execution Footprints
1A Low-Power Instruction Cache Architecture
Exploiting Program Execution Footprints
Koji Inoue and Kazuaki Murakami
Kyushu University
2Introduction
Increase in cache size
Power consumed in on-chip caches
- DEC 21164 CPU
- StrongARM SA-110 CPU
- Bipolar ECL CPU
Chart values (%): 50, 25, 43
Kamble et al., Analytical Energy Dissipation Models for Low Power Caches, ISLPED'97
Jouppi et al., A 300-MHz 115-W 32-b Bipolar ECL Microprocessor, IEEE Journal of Solid-State Circuits, '93
3Breakdown of Cache Energy
Word: 64 bits, Cache Size: 64 KB, Line Size: 64 B
Energy consumed in cache: Ecache = Edecode + Esram + Eio
Figure: Breakdown of Esram per access into Etag (tag bit-lines), Edata (data bit-lines), and others, versus # of words in a subbank-entry (total # of subbanks)
Cache Subbanking (figure labels: Tag Memory, Data Memory (Cache Lines), Subbank)
This calculation is based on Kamble et al., Analytical Energy Dissipation Models for Low Power Caches, ISLPED'97
4History-Based Tag-Comparison (HBTC) Instruction
Cache -Motivation-
Hit rate of instruction cache (I-cache) is quite HIGH!
Most of the tag-comparisons result in a HIT
5Can We Know Existence of Instructions in Cache
without Tag-Comparison?
YES!
Consider an instruction for which:
- It has been executed at least once.
- No cache miss has occurred since its last execution.
Then we know that the instruction exists in the cache without any tag-comparison.
6But, How?
Keep track of instruction execution by leaving footprints in the BTB!
1. Execute an instruction block A at time T.
   Leave the execution footprint in the corresponding BTB-entry.
2. (If a cache miss occurs, then erase all the footprints.)
3. Try to execute the instruction block A at time T+X.
   If the footprint is detected in the BTB, then omit the tag comparisons for all the instructions in A!
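The three steps can be sketched as a toy model. This is a minimal illustration only, assuming one footprint per instruction block held in a plain set; the class and method names (`FootprintTracker`, `execute_block`, `cache_miss`) are hypothetical and do not come from the slides, and the real design stores footprints in BTB entries.

```python
class FootprintTracker:
    """Toy model of execution footprints (hypothetical names, not the authors' design)."""

    def __init__(self):
        self.footprints = set()  # blocks executed since the last cache miss
        self.performed = 0       # block executions that needed tag comparison
        self.omitted = 0         # block executions that skipped tag comparison

    def execute_block(self, block_id):
        """Return True if tag comparison can be omitted for this block."""
        if block_id in self.footprints:
            # Step 3: footprint found, so the block is known to be cached.
            self.omitted += 1
            return True
        # Step 1: first execution since the last miss; compare tags, leave a footprint.
        self.performed += 1
        self.footprints.add(block_id)
        return False

    def cache_miss(self):
        # Step 2: a miss may have evicted any block, so erase all footprints.
        self.footprints.clear()


tracker = FootprintTracker()
first = tracker.execute_block("A")   # tag comparison performed
second = tracker.execute_block("A")  # tag comparison omitted
tracker.cache_miss()
third = tracker.execute_block("A")   # performed again after the miss
```

Erasing every footprint on a miss is conservative: it keeps the model simple at the cost of re-checking blocks that were not actually evicted.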
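7HBTC I-cache Architecture
EFT (Execution Footprint on Taken)
EFN (EF on Not-taken)
TCO (Tag Comparison Omitting flag)
Figure: BTB entries hold Branch Inst. Addr., Target Address, EFT, and EFN; the Branch Prediction Result selects the footprint that sets TCO, which decides whether tag comparison is enabled in the I-cache. An instruction block runs from its top (a target address or the not-taken successor of a branch) to its tail (a branch instruction).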
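The interaction between EFT, EFN, and TCO can be sketched as follows. This is an assumption-laden illustration, not the authors' hardware: it assumes that on each predicted branch the TCO flag takes the value of the footprint for the predicted direction and that the footprint is then set, and it models the BTB as an unbounded dictionary (no associativity or replacement). The names `BTBEntry` and `HBTCModel` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BTBEntry:
    branch_addr: int
    target_addr: int
    eft: bool = False  # footprint for the block reached when the branch is taken
    efn: bool = False  # footprint for the block reached when it is not taken


class HBTCModel:
    """Sketch of footprint-driven tag-comparison control (assumed behavior)."""

    def __init__(self):
        self.btb = {}     # branch address -> BTBEntry (unbounded, an assumption)
        self.tco = False  # True means tag comparison is omitted

    def on_branch(self, branch_addr, target_addr, predicted_taken):
        entry = self.btb.get(branch_addr)
        if entry is None:
            entry = BTBEntry(branch_addr, target_addr)
            self.btb[branch_addr] = entry
        if predicted_taken:
            self.tco = entry.eft  # omit only if the taken-path block left a footprint
            entry.eft = True
        else:
            self.tco = entry.efn  # same check for the not-taken path
            entry.efn = True
        return self.tco

    def on_cache_miss(self):
        # Erase every footprint and fall back to normal tag comparison.
        for entry in self.btb.values():
            entry.eft = False
            entry.efn = False
        self.tco = False

    def tag_comparison_enabled(self):
        return not self.tco
```

Keeping separate footprints per direction means a block reached on the taken path does not vouch for the fall-through block, which may lie in a different cache line.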
8Operation Example
Figure: execution flow of blocks A, B (branch to F), C (branch to A), D (branch to A), and F over iterations 1-7, shown next to the state of the Branch Target Buffer (EFT, EFN, TCO) over time
Time points are labeled (Iteration Count - Address of Branch), e.g. 1-C
9Operation Example
Figure (animation frame): execution flow of blocks A, B (branch to F), C (branch to A), D (branch to A), and F over iterations 1-7
Tag comparison: Performing!
10Operation Example
Figure (animation frame): execution flow of blocks A, B (branch to F), C (branch to A), D (branch to A), and F over iterations 1-7
BTB state (EFT, EFN, TCO) time points shown: 1-C (Branch-C)
Tag comparison: Performing!
11Operation Example
Figure (animation frame): execution flow of blocks A, B (branch to F), C (branch to A), D (branch to A), and F over iterations 1-7
BTB state (EFT, EFN, TCO) time points shown: 1-C (Branch-C)
Tag comparison: Performing!
12Operation Example
Figure (animation frame): execution flow of blocks A, B (branch to F), C (branch to A), D (branch to A), and F over iterations 1-7
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C (Branch-C)
Tag comparison: Performing!
13Operation Example
Figure (animation frame): execution flow of blocks A, B (branch to F), C (branch to A), D (branch to A), and F over iterations 1-7
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C (Branch-C)
Tag comparison: Omitting!
14Operation Example
Figure (animation frame): execution flow of blocks A, B (branch to F), C (branch to A), D (branch to A), and F over iterations 1-7
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C (Branch-C)
Tag comparison: Omitting!
15Operation Example
Figure (animation frame): execution flow of blocks A, B (branch to F), C (branch to A), D (branch to A), and F over iterations 1-7
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C (Branch-C)
Tag comparison: Omitting!
16Operation Example
Figure (animation frame): execution flow of blocks A, B (branch to F), C (branch to A), D (branch to A), and F over iterations 1-7
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C (Branch-C)
Tag comparison: Omitting!
17Operation Example
Figure (animation frame): execution flow of blocks A, B (branch to F), C (branch to A), D (branch to A), and F over iterations 1-7
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C (Branch-C)
Tag comparison: Performing!
18Operation Example
Figure (animation frame): execution flow of blocks A, B (branch to F), C (branch to A), D (branch to A), and F over iterations 1-7
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C (Branch-C), 4-D (Branch-C, Branch-D)
Tag comparison: Performing!
19Operation Example
Figure (animation frame): execution flow of blocks A, B (branch to F), C (branch to A), D (branch to A), and F over iterations 1-7
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C (Branch-C), 4-D (Branch-C, Branch-D)
Tag comparison: Performing!
20Operation Example
Figure (animation frame): execution flow of blocks A, B (branch to F), C (branch to A), D (branch to A), and F over iterations 1-7
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C, 4-D, 5-C (Branch-C, Branch-D); highlighted: EFN of Branch-C
Tag comparison: Performing!
21Operation Example
Figure (animation frame): execution flow of blocks A, B (branch to F), C (branch to A), D (branch to A), and F over iterations 1-7
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C, 4-D, 5-C (Branch-C, Branch-D); highlighted: EFN of Branch-C
Tag comparison: Omitting!
22Evaluation
Simulation Results
Figure: Normalized Tag-Comparison Count per benchmark
Integer Programs: 099.go, 124.m88ksim, 126.gcc, 129.compress, 130.li, 132.ijpeg, 134.perl, 147.vortex
FP Programs: 102.swim, 107.mgrid, 110.applu, 125.turb3d, 141.apsi
Simulator: SimpleScalar
Cache size: 32 KB, block size: 32 B
Branch predictor: 2-bit counter, # of BPT entries: 2K
# of BTB entries: 2K, BTB associativity: 4, RAS: 8
23Conclusions
History-Based Tag-Comparison Instruction Cache
- Exploits execution footprints recorded in BTB.
- Reduces tag-comparison count, by 99% for 107.mgrid.
Future work
- Analyze energy consumption with more accurate cache-energy models.
- Evaluate performance with cycle-based simulation.
24Backup Slides (History-Based Tag-Comparison Cache)
25Outline
- Introduction
- History-Based Tag-Comparison Cache
  - Motivation
  - Mechanism
  - Architecture
  - Operation
- Evaluations
- Conclusions
26Conventional Direct-Mapped Cache
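Ecache = Edecode + Esram + Eio
Figure: direct-mapped cache. The reference address is divided into Tag, Index, and Offset; the Index selects a tag from the tag memory (Etag) and a line from the data memory (Edata); the tag comparison produces Hit?, and the Offset selects the word data from the line.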
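The address split of a conventional direct-mapped cache can be sketched as below. This is a minimal sketch, assuming power-of-two sizes and the 32 KB cache with 32 B lines used in the evaluation; `split_address` and `DirectMappedCache` are hypothetical names, and the model counts one tag comparison per access, as the conventional scheme does.

```python
def split_address(addr, cache_size=32 * 1024, line_size=32):
    """Split a reference address into (tag, index, offset); sizes must be powers of two."""
    num_lines = cache_size // line_size
    offset_bits = line_size.bit_length() - 1
    index_bits = num_lines.bit_length() - 1
    offset = addr & (line_size - 1)
    index = (addr >> offset_bits) & (num_lines - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


class DirectMappedCache:
    """Tag-only model: every access performs one tag comparison."""

    def __init__(self, cache_size=32 * 1024, line_size=32):
        self.cache_size = cache_size
        self.line_size = line_size
        self.tags = [None] * (cache_size // line_size)
        self.tag_comparisons = 0

    def access(self, addr):
        tag, index, _ = split_address(addr, self.cache_size, self.line_size)
        self.tag_comparisons += 1       # conventional cache: compare on every access
        hit = self.tags[index] == tag
        if not hit:
            self.tags[index] = tag      # fill the line on a miss
        return hit
```

For example, `split_address(0x12345)` yields tag 2, index 282, and offset 5 under these sizes.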
27History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): execution flow of blocks A, B (branch to F), C (branch to A), D (branch to A), and F over iterations 1-7
Tag comparison: Performing!
28History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C (Branch-C)
Tag comparison: Performing!
29History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C (Branch-C)
Tag comparison: Performing!
30History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C (Branch-C)
Tag comparison: Performing!
31History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C (Branch-C)
Tag comparison: Omitting!
32History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C (Branch-C)
Tag comparison: Omitting!
33History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C (Branch-C)
Tag comparison: Omitting!
34History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C (Branch-C)
Tag comparison: Omitting!
35History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C (Branch-C)
Tag comparison: Performing!
36History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C (Branch-C), 4-D (Branch-C, Branch-D)
Tag comparison: Performing!
37History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C (Branch-C), 4-D (Branch-C, Branch-D)
Tag comparison: Performing!
38History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C, 4-D, 5-C; highlighted: EFN of Branch-C
Tag comparison: Performing!
39History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C, 4-D, 5-C; highlighted: EFN of Branch-C
Tag comparison: Omitting!
40History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C, 4-D, 5-C, 5-D; highlighted: EFN of Branch-C
Tag comparison: Omitting!
41History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C, 4-D, 5-C, 5-D; highlighted: EFN of Branch-C
Tag comparison: Omitting!
42History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C, 4-D, 5-C, 5-D, 6-C; highlighted: EFN of Branch-C
Tag comparison: Omitting!
43History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C, 4-D, 5-C, 5-D, 6-C; highlighted: EFN of Branch-C
Tag comparison: Omitting!
44History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C, 4-D, 5-C, 5-D, 6-C, 6-D; highlighted: EFN of Branch-C
Tag comparison: Omitting!
45History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C, 4-D, 5-C, 5-D, 6-C, 6-D; highlighted: EFN of Branch-C
Tag comparison: Omitting!
46History-Based Tag-Comparison Cache -Operation Example-
Figure (animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C, 4-D, 5-C, 5-D, 6-C, 6-D, 7-B (Branch-B); highlighted: EFN of Branch-C
Tag comparison: Performing!
47History-Based Tag-Comparison Cache -Operation Example-
Figure (final animation frame): same execution flow (blocks A, B, C, D, F; iterations 1-7)
BTB state (EFT, EFN, TCO) time points shown: 1-C, 2-C, 3-C, 4-C, 4-D, 5-C, 5-D, 6-C, 6-D, 7-B (Branch-B); highlighted: RCN of Branch-C
Tag comparison labels shown: Performing!, Omitting!
48Low Power Caches- Reducing both Etag and Edata -
Adding a small L0 cache (Processor, L0 Cache, L1 Cache)
- Filter Cache
- S-Cache
- Block Buffering
Dividing cache module, multiple accessing: Sequential Way-Access
- MRU Cache
- Hash-Rehash Cache
Figure labels: cache divided into way0, way1, way2, way3
49Low Power Caches- Reducing Edata -
Dividing cache module (Tag, Line), accessing sequentially
- Phased Cache
- Pipelined Cache
Figure labels: Tag, Line, Miss!, Hit!, Replace
50Low Power Caches- Reducing Etag -
Conditional Tag Compare
- Inter-Line Tag Comparison
Successive instructions i and j:
- Intra-line sequential flow: consecutive addresses, and same cache line
- Intra-line non-sequential flow: non-consecutive addresses, and same cache line
- Inter-line sequential flow: consecutive addresses, and different cache lines
- Inter-line non-sequential flow: non-consecutive addresses, and different cache lines
Perform tag comparison only on inter-line flows
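The four flow types and the resulting tag-comparison count can be sketched as follows. This is an illustrative sketch only, assuming 32 B cache lines and fixed 4-byte instructions (the instruction size is an assumption, not stated on the slide); `classify_flow` and `count_tag_comparisons` are hypothetical helper names.

```python
def classify_flow(prev_addr, addr, line_size=32, inst_size=4):
    """Classify the flow between two successive instruction addresses."""
    same_line = (addr // line_size) == (prev_addr // line_size)
    sequential = addr == prev_addr + inst_size
    line_part = "intra-line" if same_line else "inter-line"
    seq_part = "sequential" if sequential else "non-sequential"
    return f"{line_part} {seq_part}"


def count_tag_comparisons(trace, line_size=32):
    """Count tag comparisons when they are performed only on inter-line flows."""
    count = 0
    prev = None
    for addr in trace:
        # The first fetch has no predecessor, so it is always compared.
        if prev is None or addr // line_size != prev // line_size:
            count += 1
        prev = addr
    return count


# Ten sequential fetches covering two lines, then a jump back to address 0.
trace = list(range(0, 40, 4)) + [0]
comparisons = count_tag_comparisons(trace)
```

On this 11-fetch trace only 3 tag comparisons are performed, since 8 of the fetches stay within the line of their predecessor.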
51Breakdown of Esram
Figure: Breakdown of Energy (Esram_tag_bit, Esram_data_bit, Esram_others) versus # of words in a subbank (total # of subbanks)
- 32-bit CPU: CS = 32 KB, LS = 32 B
- 64-bit CPU: CS = 64 KB, LS = 64 B
CS: Cache Size, LS: Line Size
This calculation is based on Kamble et al., Analytical Energy Dissipation Models for Low Power Caches, ISLPED'97
52History-Based Tag-Comparison Cache -Operation Flow-
On BTB access
53History-Based Tag-Comparison Cache -Operation Flow-
On PC recovery
Flowchart: from Start, decision nodes BTB update?, Replacement?, and Wrong Prediction? (each Y/N), then go to start
Action labels: TCO RCT; TCO RCN; RCN TCO; RCT TCO; 1 RCN; 1 RCT
54Evaluations-Simulation Environment-
8 integer and 5 FP programs from SPEC95
Figure (block diagram): SimpleScalar Processor (Functional Execution) -> Address Traces -> Cache Simulator (Branch Target Buffer, Branch Prediction Table) -> Report (Total count of tag-comparisons)
55Evaluation-Cache Models-
- C-TC (Conventional Tag-Comparison, Base): perform tag comparison in every cache access
- IL-TC (Interline Tag-Comparison): perform tag comparison only on inter-line flow
- H-TC (History-based Tag-Comparison): perform tag comparison only when the TCO flag is 0
- H-TCideal (History-based Tag-Comparison): nearly ideal H-TC (perfect instruction cache and fully associative BTB)
- HIL-TC (History-based Interline Tag-Comparison): combination of IL-TC and H-TC
56Evaluation-Simulation Results-
Normalized Total Count of Tag-Comparisons
57Evaluation-Effect of BTB Associativity-
Figure: Normalized Total Counts of Tag-Comparison
BTB associativity: 2-way, 8-way, 32-way, 128-way, 512-way, 2048-way, H-TCideal
58Evaluation-Effect of Cache Size-
Figure: Normalized Total Counts of Tag-Comparison
Cache size: 4 KB, 8 KB, 16 KB, 32 KB, 64 KB, 512 KB, Perfect, H-TCideal
59Evaluation-Energy Overhead -
Figure: Ave. # of Erased Footprints per I-fetch (bit) and Ave. # of Erased Footprints per Erase-Operation (bit); axis range 0.00 to 0.1