1 BALANCED CACHE
- Ayse BAKIR, Zeynep ZENGIN
2 Outline
- Introduction
- Motivation
- The B-Cache Organization
- Experimental Methodology and Results
- Programmable Decoder Design
- Analysis
- Related Work
- Conclusion
3 Introduction
- The increasing gap between memory latency and processor speed is a critical bottleneck in achieving a high-performance computing system.
- A multilevel memory hierarchy has been developed to hide the memory latency.
4 Introduction
- Level-one cache normally resides on a processor's critical path, so fast access to the level-one cache is an important issue for improved processor performance.
5 Introduction
- Two cache organization models have been developed:
- Direct-Mapped Cache
- Set-Associative Cache
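As a minimal sketch of the difference between the two organizations (the field widths and function names below are illustrative assumptions, not taken from the slides): a direct-mapped cache has exactly one candidate location per address, while a set-associative cache checks every way in the indexed set.

```python
# Illustrative model of block lookup in the two organizations.
# NUM_SETS and BLOCK_BITS are assumed example values.

NUM_SETS = 8      # 3-bit set index
BLOCK_BITS = 5    # 32-byte lines -> 5 offset bits

def split_address(addr: int) -> tuple:
    """Split an address into (tag, set index), dropping the block offset."""
    block = addr >> BLOCK_BITS
    return block // NUM_SETS, block % NUM_SETS

def direct_mapped_hit(cache: list, addr: int) -> bool:
    """Direct-mapped: exactly one candidate line per index."""
    tag, index = split_address(addr)
    return cache[index] == tag

def set_associative_hit(cache: list, addr: int) -> bool:
    """Set-associative: any way in the indexed set may hold the block."""
    tag, index = split_address(addr)
    return tag in cache[index]
```

Two addresses that share an index but differ in tag conflict in the direct-mapped model but can coexist in the set-associative one, which is exactly the conflict-miss problem the B-Cache targets.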
9 Introduction
- Frequent hit sets have many more cache hits than other sets.
- Cache misses occur more frequently in frequent miss sets.
- Less accessed sets receive less than 1% of the total cache references.
10 Introduction
- Balanced Cache (B-Cache)
- A mechanism to provide the benefit of cache block replacement while maintaining the constant access time of a direct-mapped cache
11 Introduction
- The decoder length of a traditional direct-mapped cache is increased by three bits, so accesses to heavily used sets can be reduced to 1/8th of the original design.
- Only 1/8th of the memory address space has a mapping to the cache sets.
- A replacement policy is added.
- A programmable decoder is used.
13 Motivation - Example
- 8-bit address
- Same as in a 2-way cache
14 B-Cache Organization - Terminology
- Memory address mapping factor (MF)
- B-Cache associativity (BAS)
- PI: index length of the PD (programmable decoder)
- NPI: index length of the NPD (non-programmable decoder)
- OI: index length of the original direct-mapped cache
MF = 2^(PI+NPI) / 2^OI, where MF ≥ 1
BAS = 2^OI / 2^NPI, where BAS ≥ 1
15 B-Cache Organization
MF = 2^(PI+NPI) / 2^OI = 2^(6+6) / 2^9 = 8
BAS = 2^OI / 2^NPI = 2^9 / 2^6 = 2^3 = 8
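The MF and BAS formulas can be checked with a short Python sketch (the function names are mine; PI = 6, NPI = 6, OI = 9 are the example values from this slide):

```python
# MF = 2^(PI+NPI) / 2^OI  and  BAS = 2^OI / 2^NPI,
# evaluated for the 16 kB example: PI = 6, NPI = 6, OI = 9.

def mapping_factor(pi: int, npi: int, oi: int) -> int:
    """Memory address mapping factor (MF >= 1)."""
    return 2 ** (pi + npi) // 2 ** oi

def b_cache_associativity(oi: int, npi: int) -> int:
    """B-Cache associativity (BAS >= 1)."""
    return 2 ** oi // 2 ** npi

print(mapping_factor(6, 6, 9))       # 8
print(b_cache_associativity(9, 6))   # 8
```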
16 B-Cache Organization - Replacement Policy
- Random Policy
- Simple to design and requires very little extra hardware.
- Least Recently Used (LRU)
- May achieve a better hit rate but has more area overhead than the random policy.
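A minimal software sketch of this tradeoff (my own model, not the paper's hardware): random replacement needs no per-way state, while LRU must track recency for every way, which is where its extra area overhead comes from.

```python
import random

# Victim selection within one cache set, modeled in software.

def evict_random(num_ways: int) -> int:
    """Random policy: any way may be the victim; no bookkeeping."""
    return random.randrange(num_ways)

def evict_lru(last_used: list) -> int:
    """LRU policy: the victim is the way with the oldest access
    time, so a recency value must be stored per way."""
    return min(range(len(last_used)), key=lambda w: last_used[w])
```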
17 Experimental Methodology and Results
- Miss rate is used as the primary metric to measure the B-Cache effectiveness, and the MF and BAS parameters are determined.
- Results are compared with the baseline level-one cache (a direct-mapped 16 kB cache with a line size of 32 bytes for the instruction and data caches).
- A 4-issue out-of-order processor simulator is used to collect the miss rates; 26 SPEC2K benchmarks are run using the SimpleScalar tool set.
18 Experimental Methodology and Results
- 16-entry victim buffer
- Set-associative caches
- B-Caches with different MFs
19 Experimental Methodology and Results
- 16-entry victim buffer
- Set-associative caches
- B-Caches with different MFs
- The miss rate reduction of the B-Cache is as good as that of a 4-way cache for the data cache.
- For the instruction cache, on average, the miss rate reduction is 5% better than that of a 4-way cache.
20 Programmable Decoder Design
- Latency
- Storage
- Power Costs
21 Timing Analysis
- Critical path
- Direct-mapped: tag side
- B-Cache: may be on the tag side or the data side
- B-Cache modifies the local decoder
23 Storage Overhead
- The B-Cache additionally uses CAM cells.
- A CAM cell is 25% larger than the SRAM cell used by the data and tag memories.
24 Power Overhead
- Extra power consumption: the PD of each subarray.
- Power reduction
- 3-bit data length reduction
- Removal of 3-input NAND gates
25 ANALYSIS
- Overall Performance
- Overall Energy
- Design Tradeoffs for MF and BAS for a Fixed Length of PD
- Balance Evaluation
- The Effect of L1 Cache Sizes
- Comparison
26 Overall Performance
27 Overall Energy
- Static and dynamic power dissipation
- Charging and discharging of the load capacitance
- Memory related
- On-chip caches
- Off-chip memory
28 Design Tradeoffs for MF and BAS for a Fixed Length of PD
- Which design has a higher miss rate reduction?
30 Balance Evaluation
- Frequent hit sets: hits 2 times higher than the average
- Frequent miss sets: misses 2 times higher than the average
- Less accessed sets: accesses below half the average
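The three categories above can be written as a small classifier (a sketch; using per-set averages as the baseline is my reading of the slide, and the function name is mine):

```python
def classify_set(hits: int, misses: int, avg_hits: float,
                 avg_misses: float, avg_accesses: float) -> str:
    """Classify one cache set against the slide's thresholds:
    hits (or misses) 2x the average, or accesses below half
    the average number of accesses per set."""
    accesses = hits + misses
    if hits > 2 * avg_hits:
        return "frequent hit set"
    if misses > 2 * avg_misses:
        return "frequent miss set"
    if accesses < 0.5 * avg_accesses:
        return "less accessed set"
    return "balanced set"
```

A balanced cache is one in which most sets fall into the last category; the B-Cache's programmable decoder moves mappings away from the first two.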
31
- The miss rate reductions increase when the MF is increased.
- The B-Cache design with MF = 8 and BAS = 8 is the best.
32 Comparison
- With a victim buffer: the miss rate reduction of the B-Cache is higher than that of the victim buffer.
- With a highly associative cache (HAC):
- HAC is for low-power embedded systems.
- HAC is an extreme case of the B-Cache, where the decoder of the HAC is fully programmable.
33 RELATED WORK
- Reduce the miss rate of direct-mapped caches
- Reduce the access time of set-associative caches
34 Reducing Miss Rate of Direct-Mapped Caches
- TECHNIQUES
- Page allocation
- Column-associative cache
- Adaptive group-associative cache
- Skewed-associative cache
35 Reducing Access Time of Set-Associative Caches
- Partial address matching: predicting the hit way
- Difference-bit cache
36 B-CACHE SUMMARY
- The B-Cache can be applied to both high-performance and low-power embedded systems.
- Balanced without any software intervention.
- Feasible and easy to implement.
37 Conclusion
- The B-Cache allows accesses to cache sets to be balanced by increasing the decoder length and incorporating a replacement policy into a direct-mapped cache design.
- Programmable decoders dynamically determine which memory addresses have a mapping to the cache sets.
- A 16 kB level-one B-Cache outperforms a traditional same-sized direct-mapped cache by 64.5% and 37.8% for the instruction and data caches, respectively.
- Average IPC improvement: 5.9%
- Energy reduction: 2%
- Access time: same as a traditional direct-mapped cache
38 References
- C. Zhang, "Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches through Programmable Decoders," ISCA 2006, IEEE.
- C. Zhang, "Balanced Instruction Cache: Reducing Conflict Misses of Direct-Mapped Caches through Balanced Subarray Accesses," IEEE Computer Architecture Letters, May 2005.
- Wilkinson, B. (1996), Computer Architecture: Design and Performance, Prentice Hall Europe.
- University of Maryland, http://www.cs.umd.edu/class/fall2001/cmsc411/proj01/cache/cache.html