Cache Performance Optimization by Application-Specific Re-Configurable Indexing


1
Cache Performance Optimization by
Application-Specific Re-Configurable Indexing
K. Patel, E. Macii, M. Poncino
Politecnico di Torino, 10129 Torino, Italy
L. Benini
Università di Bologna, 40136 Bologna, Italy
ICCAD 2004, San Jose, CA, November 9-11, 2004
2
Outline

Introduction.
Previous work.
Cache indexing overview.
Application specific indexing.
Re-configurable indexing.
Experimental results.
Conclusions.

3
Introduction

Caches in embedded systems
Want low miss rates,
but have tight area/power budgets.
Trade-off: low miss rate vs. complexity.
Direct-mapped caches
Low complexity, faster access, but higher miss rate.
Associative caches
Higher complexity, slower access, but lower miss rate.
Can we escape this trade-off?
Yes: by exploiting knowledge of the memory reference pattern.

4
Our Contribution

Target
Embedded systems running a given application mix.
Objective
Optimization technique for reducing conflict
misses in direct-mapped caches.
Solution
Use an application-specific indexing mechanism.
Re-configurable upon re-targeting of the system.

5
Smart Indexing: Previous Work

General-purpose indexing.
XOR-based indexing (Frailong, 1985).
Irreducible polynomials (Rau, 1997).
Bit selection.
Application-specific indexing.
Trace-driven bit selection (Givargis, 2003).

6
Cache Indexing Revisited

Strong analogy between cache indexing and
hashing.
Objective: map a large set of objects onto a much smaller space.
Difference
Must use simple hash functions for HW implementation!
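To make the "simple hash function" constraint concrete, here is a minimal Python sketch contrasting conventional modulo indexing with one common XOR-folding index (low m bits XOR the next m bits of the line address). The function names are hypothetical, and this folding is a generic illustration, not necessarily the exact XOR scheme of the works cited under previous work. Each index bit costs at most one two-input XOR gate.

```python
def modulo_index(line_addr, m):
    """Conventional indexing: the m low-order bits of the line address."""
    return line_addr & ((1 << m) - 1)


def xor_fold_index(line_addr, m):
    """Generic XOR-based indexing (illustrative form): low m bits XOR the next m bits.

    In hardware this is m two-input XOR gates.
    """
    mask = (1 << m) - 1
    return (line_addr & mask) ^ ((line_addr >> m) & mask)


# Line addresses with a stride of 4 all collide under modulo indexing (m = 2)...
stride = [0, 4, 8, 12]
print([modulo_index(a, 2) for a in stride])    # [0, 0, 0, 0]
# ...but spread over all four slots under XOR folding.
print([xor_fold_index(a, 2) for a in stride])  # [0, 1, 2, 3]
```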

7
Cache Misses Revisited

Compulsory misses.
First access to a line not in the cache.
Capacity misses.
Active portion of memory exceeds the cache size.
Conflict misses.
Active portion of address space fits in cache,
but too many lines map to the same cache entry.
Occur only in direct-mapped and set-associative
caches.

We target conflict misses, which are the weak
point of direct-mapped caches.
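The three miss classes can be separated with the usual simulation-based bookkeeping: compulsory misses are first touches, capacity misses are the extra misses of a fully associative LRU cache of the same size, and conflict misses are whatever the direct-mapped cache adds on top of that. The Python sketch below assumes a trace of line (block) addresses and a power-of-two number of lines; the function names are hypothetical.

```python
from collections import OrderedDict


def direct_mapped_misses(trace, num_lines):
    """Misses of a direct-mapped cache indexed by the low-order line-address bits."""
    slots = [None] * num_lines
    misses = 0
    for addr in trace:
        slot = addr % num_lines        # low log2(num_lines) bits when num_lines is a power of two
        if slots[slot] != addr:        # storing the whole line address stands in for the tag
            misses += 1
            slots[slot] = addr
    return misses


def fully_associative_lru_misses(trace, num_lines):
    """Misses of a fully associative LRU cache with the same number of lines."""
    cache = OrderedDict()
    misses = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)        # mark as most recently used
        else:
            misses += 1
            if len(cache) == num_lines:
                cache.popitem(last=False)  # evict the least recently used line
            cache[addr] = None
    return misses


def classify_misses(trace, num_lines):
    """Split direct-mapped misses into compulsory, capacity and conflict misses."""
    compulsory = len(set(trace))
    fa = fully_associative_lru_misses(trace, num_lines)
    dm = direct_mapped_misses(trace, num_lines)
    return {"compulsory": compulsory, "capacity": fa - compulsory, "conflict": dm - fa}


# Two lines that map to the same slot ping-pong even though 3 slots stay empty.
print(classify_misses([0, 4, 0, 4, 0, 4], num_lines=4))
# {'compulsory': 2, 'capacity': 0, 'conflict': 4}
```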
8
Cache Indexing and Conflict Misses

Conflict misses are affected by how cache lines
are addressed.
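Traditional indexing
$S$ = cache size, $B$ = block size, $n$ = # of address bits.
$b = \log_2 B$ = # of offset bits.
$S/B$ = # of cache lines.
$m = \log_2 (S/B)$ = # of index bits.
$t = n - (m + b)$ = # of tag bits.
Figure: n-bit address = tag (t bits) | index (m bits) | offset (b bits)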
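The definitions above translate directly into field widths and masks. A minimal Python sketch, assuming byte addresses and power-of-two $S$ and $B$ (the function names are hypothetical); for example, an 8 KB cache with 32-byte blocks and 32-bit addresses gives b = 5, m = 8, t = 19.

```python
def field_widths(S, B, n):
    """Return (b, m, t) for a direct-mapped cache with traditional indexing."""
    assert B > 0 and (B & (B - 1)) == 0, "block size must be a power of two"
    lines = S // B
    assert S % B == 0 and (lines & (lines - 1)) == 0, "S/B must be a power of two"
    b = B.bit_length() - 1          # b = log2 B
    m = lines.bit_length() - 1      # m = log2(S/B)
    t = n - (m + b)                 # remaining high-order bits form the tag
    return b, m, t


def split_address(addr, S, B, n):
    """Split an n-bit address into (tag, index, offset) using the low-order index bits."""
    b, m, _ = field_widths(S, B, n)
    offset = addr & ((1 << b) - 1)
    index = (addr >> b) & ((1 << m) - 1)
    tag = addr >> (b + m)
    return tag, index, offset


print(field_widths(8192, 32, 32))               # (5, 8, 19)
print(split_address(0x12345678, 8192, 32, 32))  # (37282, 179, 24)
```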
9
Proposed Indexing Scheme

Compute the index by selecting m of the z non-offset bits ($z = n - b$), rather than always using the m bits just above the offset.

Problem: which bits should we select?
Use a specific address trace to select the bits so as to minimize conflict misses.
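One straightforward reading of "use a trace to select the bits" is to simulate the trace under every candidate selection and keep the best. The Python sketch below does exactly that by brute force over all C(z, m) choices; it is a swapped-in illustration, not the symbolic BDD/ADD method described in the following slides, and its names are hypothetical. It assumes a trace of line addresses whose z non-offset bits are numbered 0..z-1.

```python
from itertools import combinations


def select_index(line_addr, bits):
    """Concatenate the chosen address bits; bits[i] drives index bit i."""
    index = 0
    for i, k in enumerate(bits):
        index |= ((line_addr >> k) & 1) << i
    return index


def misses_with_selection(trace, bits):
    """Simulate a direct-mapped cache whose index is select_index(addr, bits)."""
    slots = {}
    misses = 0
    for addr in trace:
        idx = select_index(addr, bits)
        # The unselected bits would form the tag; storing the whole address is equivalent.
        if slots.get(idx) != addr:
            misses += 1
            slots[idx] = addr
    return misses


def best_selection_by_simulation(trace, z, m):
    """Try every m-of-z selection and keep the one with the fewest misses."""
    return min(combinations(range(z), m),
               key=lambda bits: misses_with_selection(trace, bits))


trace = [0, 4, 0, 4, 1, 5, 1, 5]
print({bits: misses_with_selection(trace, bits) for bits in combinations(range(3), 2)})
# {(0, 1): 8, (0, 2): 4, (1, 2): 4}
print(best_selection_by_simulation(trace, z=3, m=2))  # (0, 2)
```

This needs one full trace simulation per candidate, and the number of candidates grows combinatorially with z, which is one motivation for a symbolic formulation.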

10
Application-Specific Cache Indexing

Modeling of conflict conditions
Trace $T = (a_0, \ldots, a_{L-1})$.
Direct Conflict Pattern between two addresses $a_i$ and $a_j$: $DCP_{ij}$
Boolean conditions under which $a_i$ and $a_j$ would conflict.
$DCP_{ij} = \prod_{k=0}^{z-1} \overline{a_k\, y_k}$
$y_k$: variable which is 1 if bit k is in the index.
$a_k$: 1 if $a_i$ and $a_j$ differ in bit k.

($\prod$ = AND)
11
Application-Specific Cache Indexing

Modeling of conflict conditions
Example
z = 6 bits
$a_i = (010101)$, $a_j = (110111)$ (they differ only in bits 5 and 1)

$DCP_{ij} = \bar{y}_5 \bar{y}_1$
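With each address packed into an integer, the $a_k$ of $DCP_{ij}$ are just the bits of $a_i \oplus a_j$, so evaluating the DCP for a given choice of index bits is one AND and one comparison. A minimal Python sketch (hypothetical names) that reproduces the example on this slide:

```python
def dcp(ai, aj, selected):
    """Evaluate DCP_ij for one assignment of the y_k.

    selected: the bit positions k with y_k = 1 (bits used in the index).
    Returns True when a_i and a_j would map to the same cache slot.
    """
    diff = ai ^ aj                          # bit k of diff is a_k
    ymask = sum(1 << k for k in selected)   # bit k of ymask is y_k
    return (diff & ymask) == 0              # AND over k of NOT(a_k AND y_k)


def dcp_literals(ai, aj, z):
    """Bit positions k (MSB first) whose complemented literal appears in DCP_ij."""
    return [k for k in reversed(range(z)) if ((ai ^ aj) >> k) & 1]


ai, aj = 0b010101, 0b110111
print(dcp_literals(ai, aj, 6))  # [5, 1]  ->  DCP_ij = ~y5 ~y1
print(dcp(ai, aj, {0, 2, 3}))   # True: neither bit 5 nor bit 1 is in the index
print(dcp(ai, aj, {1, 2, 3}))   # False: bit 1 separates the two addresses
```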
12
Application-Specific Cache Indexing

Modeling of conflict conditions
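Total Conflict Pattern for an address $a_i$: $CP_i$
OR of all the DCPs between $a_i$ and the references that follow it in the trace, up to the next reference to $a_i$.
$CP_i = \sum_{k=i+1}^{r_i - 1} DCP_{ik}$, where $r_i$ is the position of the next reference to $a_i$ in $T$.

($\sum$ = OR)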
13
Application-Specific Cache Indexing

Modeling of conflict conditions
Example
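Trace $T = (0, 1, 5, 6, 1, 5, 6, 5, 6)$.
For $a_1$ (CP for the 2nd reference):
$CP_1 = DCP_{12} + DCP_{13}$
$DCP_{12}$: $a_1 = (001)$, $a_2 = (101)$, so $DCP_{12} = \bar{y}_2$
$DCP_{13}$: $a_1 = (001)$, $a_3 = (110)$, so $DCP_{13} = \bar{y}_2 \bar{y}_1 \bar{y}_0$
$CP_1 = DCP_{12} + DCP_{13} = \bar{y}_2 + \bar{y}_2 \bar{y}_1 \bar{y}_0 = \bar{y}_2$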
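The same bitmask trick extends to $CP_i$: collect $a_i \oplus a_k$ for every reference between $a_i$ and its next reuse, and OR the resulting DCPs. The Python sketch below (hypothetical names) treats a reference that is never reused as contributing no conflict, which is an assumption of the sketch; it reproduces $CP_1 = \bar{y}_2$ for the trace on this slide.

```python
def cp_masks(trace, i):
    """XOR masks a_i ^ a_k for the references between a_i and its next reuse.

    Returns None if trace[i] is never referenced again (assumed here: CP_i = 0).
    """
    for r in range(i + 1, len(trace)):
        if trace[r] == trace[i]:
            return [trace[i] ^ trace[k] for k in range(i + 1, r)]
    return None


def cp(trace, i, selected):
    """Evaluate CP_i for one choice of index bits: OR of the DCP_ik."""
    masks = cp_masks(trace, i)
    if masks is None:
        return False
    ymask = sum(1 << k for k in selected)
    return any((d & ymask) == 0 for d in masks)


T = [0, 1, 5, 6, 1, 5, 6, 5, 6]
print(cp_masks(T, 1))    # [4, 7]: 001^101 and 001^110
print(cp(T, 1, {0, 1}))  # True:  y2 = 0, so CP_1 = ~y2 = 1
print(cp(T, 1, {0, 2}))  # False: y2 = 1
```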
14
Application-Specific Cache Indexing

From conflict conditions to conflict misses.
Consider each $CP_i$ as an integer-valued term.
Value of each $CP_i$ is either 0 or 1.
Sum over all addresses in the trace:
$Cost = \sum_{i=0}^{L-1} CP_i$ ($\sum$ = arithmetic sum)
Cost is an integer-valued function of all the $y_k$.
Find an assignment of the $y_k$ (with exactly m of them equal to 1) that minimizes Cost.
Proposed algorithm uses
BDDs (binary decision diagrams) for Boolean functions.
ADDs (algebraic decision diagrams) for integer-valued functions.
15
Application-Specific Cache Indexing

Example
z = 3 (8 memory words) and m = 2 (4 cache slots).
$T = (0, 1, 5, 6, 1, 5, 6, 5, 6)$.
m = 2 index bits out of z = 3: 3 choices
bit 0 and bit 1 (traditional indexing)
bit 0 and bit 2
bit 1 and bit 2
Resulting ADD

Optimal indexing
$y_0 = 1, y_1 = 0, y_2 = 1$
bit 0 and bit 2
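For a problem this small, Cost can simply be tabulated for every selection of m bits out of z. The Python sketch below does that by exhaustive enumeration, a stand-in for the BDD/ADD machinery used on these slides; the names are hypothetical, and a reference that is never reused is assumed to contribute 0. On this trace it gives Cost = 2 for traditional indexing and Cost = 0 for bits 0 and 2.

```python
from itertools import combinations


def all_cp_masks(trace):
    """One list of XOR masks per reference that is reused later in the trace."""
    result = []
    for i, a in enumerate(trace):
        for r in range(i + 1, len(trace)):
            if trace[r] == a:
                result.append([a ^ trace[k] for k in range(i + 1, r)])
                break
    return result


def cost(masks_per_ref, selected):
    """Arithmetic sum of the 0/1 values CP_i for one choice of index bits."""
    ymask = sum(1 << k for k in selected)
    return sum(any((d & ymask) == 0 for d in masks) for masks in masks_per_ref)


def best_indexing(trace, z, m):
    """Tabulate Cost for every m-of-z bit selection and return (best, table)."""
    masks = all_cp_masks(trace)
    table = {bits: cost(masks, bits) for bits in combinations(range(z), m)}
    return min(table, key=table.get), table


best, table = best_indexing([0, 1, 5, 6, 1, 5, 6, 5, 6], z=3, m=2)
print(table)  # {(0, 1): 2, (0, 2): 0, (1, 2): 0}
print(best)   # (0, 2)
```

Under this sketch's model, bits 1 and 2 also reach Cost = 0, so the optimum is not unique; `min` returns the first minimal entry, (0, 2).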

16
Making Bit Selection Re-Configurable

Architecture of a bit-slice of the bit selector

Figure: bit-slice circuit with inputs from the z address bits and a z-bit selection-value register, transistors P1, P2, N1 and NP (supply Vdd), and output index bit i.
2 FO4 maximum delay!
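Functionally, each bit-slice picks one of the z address bits under control of its selection register, so re-targeting the cache only means rewriting those registers. The Python model below assumes a one-hot encoding of each z-bit selection register (the slide's actual encoding and transistor-level behavior are not modeled) and uses hypothetical names.

```python
def configure(bits, z):
    """Build one z-bit one-hot selection register per index bit (assumed encoding)."""
    regs = []
    for k in bits:
        if not 0 <= k < z:
            raise ValueError(f"bit {k} is outside the {z} non-offset bits")
        regs.append(1 << k)
    return regs


def bit_slice(addr_bits, select_reg):
    """Index bit i = OR over k of (address bit k AND selection bit k)."""
    return int((addr_bits & select_reg) != 0)


def bit_selector(addr_bits, regs):
    """All m slices in parallel; slice i drives index bit i."""
    return sum(bit_slice(addr_bits, reg) << i for i, reg in enumerate(regs))


regs = configure((0, 2), z=3)    # the optimal selection from the example
print(bit_selector(0b101, regs))  # 3: address bits 0 and 2 are both 1
print(bit_selector(0b110, regs))  # 2: only address bit 2 is 1
```

With a one-hot register, each slice behaves as a z:1 multiplexer, and reconfiguration leaves the datapath untouched.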
17
Bit Selector Implementation
18
Experimental Flow
Input Trace
Cache Parameters
19
Experimental Results: Embedded Applications