1
BALANCED CACHE
  • Ayse BAKIR, Zeynep ZENGIN

2
Outline
  • Introduction
  • Motivation
  • The B-Cache Organization
  • Experimental Methodology and Results
  • Programmable Decoder Design
  • Analysis
  • Related Work
  • Conclusion

3
Introduction
  • The increasing gap between memory latency and
    processor speed is a critical bottleneck to
    achieving a high-performance computing system.
  • A multilevel memory hierarchy has been developed
    to hide the memory latency.

4
Introduction
The level-one cache normally resides on a
processor's critical path, so fast access to the
level-one cache is an important issue for improved
processor performance.
5
Introduction
  • Two cache organization models have been
    developed:
  • Direct-Mapped Cache
  • Set-Associative Cache

6
Introduction
  • Direct-Mapped Cache
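
The original slide shows a figure here. As a minimal illustration (an assumption for this transcript, not content from the slide), the C sketch below models a direct-mapped lookup, using the deck's baseline geometry of a 16 kB cache with 32-byte lines:

```c
/* Minimal sketch of a direct-mapped lookup (illustrative assumption,
   not taken from the slides). Geometry matches the deck's baseline:
   16 kB capacity, 32-byte lines -> 512 sets, 9 index bits. */
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 5                  /* 32-byte line */
#define INDEX_BITS  9                  /* 512 sets     */
#define NUM_SETS    (1u << INDEX_BITS)

typedef struct {
    bool     valid;
    uint32_t tag;
} Line;

static Line dm_cache[NUM_SETS];

/* Each address maps to exactly one set: the index bits select it,
   and the remaining high bits must match the stored tag. */
bool dm_lookup(uint32_t addr) {
    uint32_t index = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    return dm_cache[index].valid && dm_cache[index].tag == tag;
}
```

The single candidate location per address is what makes the access fast, and also what causes conflict misses when hot addresses share an index.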

7
Introduction
  • Set-Associative Cache
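
The set-associative counterpart, under the same illustrative assumptions (4 ways at the same 16 kB capacity):

```c
/* Minimal sketch of a 4-way set-associative lookup (illustrative
   assumption): 16 kB, 32-byte lines -> 128 sets of 4 ways. */
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 5
#define WAYS        4
#define INDEX_BITS  7                  /* 512 lines / 4 ways = 128 sets */
#define NUM_SETS    (1u << INDEX_BITS)

typedef struct {
    bool     valid;
    uint32_t tag;
} Line;

static Line sa_cache[NUM_SETS][WAYS];

/* A block may live in any of the WAYS lines of its set, so all way
   tags are compared: fewer conflict misses, but a slower access path. */
bool sa_lookup(uint32_t addr) {
    uint32_t index = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    for (int w = 0; w < WAYS; w++)
        if (sa_cache[index][w].valid && sa_cache[index][w].tag == tag)
            return true;
    return false;
}
```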

8
Introduction
9
Introduction
  • Frequent hit sets have many more cache hits than
    other sets.
  • Cache misses occur more frequently in frequent
    miss sets.
  • Less accessed sets receive less than 1% of the
    total cache references.

10
Introduction
  • Balanced Cache (B-Cache)
  • A mechanism that provides the benefit of cache
    block replacement while maintaining the constant
    access time of a direct-mapped cache.

11
Introduction
  • The decoder length of a traditional direct-mapped
    cache is increased by three bits.
  • Accesses to heavily used sets can be reduced to
    1/8th of the original design.
  • Only 1/8th of the memory address space has a
    mapping to the cache sets at any time.
  • A replacement policy is added.
  • A programmable decoder is used (see the sketch
    below).
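
The original slide illustrates this organization with a figure. The C sketch below is a hypothetical software model of the decoding step only, using the example parameters from slide 15 (PI = NPI = 6, OI = 9, so BAS = 8); the real design is a hardware CAM, and all names here are invented for illustration:

```c
/* Hypothetical software model of B-Cache decoding (a sketch of the
   idea only; names and structure are assumptions, not the paper's
   design). Example parameters: PI = NPI = 6, OI = 9, BAS = 8. */
#include <stdbool.h>
#include <stdint.h>

#define NPI    6
#define PI     6
#define BAS    8                      /* 2^OI / 2^NPI = 2^9 / 2^6 */
#define GROUPS (1 << NPI)             /* 64 groups of BAS sets    */

typedef struct {
    bool     valid;
    uint32_t pindex;                  /* programmable index (CAM entry) */
} PDEntry;

static PDEntry pd[GROUPS][BAS];

/* The NPI bits go through the normal (non-programmable) decoder to
   pick a group; the PI bits are matched in parallel against the
   group's BAS CAM entries to pick one set within it. */
int bcache_decode(uint32_t npi_bits, uint32_t pi_bits) {
    for (int w = 0; w < BAS; w++)
        if (pd[npi_bits][w].valid && pd[npi_bits][w].pindex == pi_bits)
            return w;                 /* set selected within group   */
    return -1;                        /* no mapping programmed: miss */
}
```

Because PI + NPI = 12 index bits compete for 2^9 sets, only 1/8th of the index values are mapped at any moment (MF = 8); on a decoder miss, the replacement policy reprograms one entry, which is how a hot index can take over an underused set.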

12
Motivation - Example
  • 8-bit addresses

13
Motivation - Example
  • 8-bit address
  • same as in a 2-way cache

14
B-Cache Organization - Terminology
  • Memory address mapping factor (MF)
  • B-Cache associativity (BAS)
  • PI: index length of the programmable decoder (PD)
  • NPI: index length of the non-programmable decoder
    (NPD)
  • OI: index length of the original direct-mapped
    cache

MF = 2^(PI+NPI) / 2^OI, where MF ≥ 1
BAS = 2^OI / 2^NPI, where BAS ≥ 1
15
B-Cache Organization
MF = 2^(PI+NPI) / 2^OI = 2^(6+6) / 2^9 = 8
BAS = 2^OI / 2^NPI = 2^9 / 2^6 = 8
16
B-Cache Organization: Replacement Policy
  • Random policy
  • Simple to design and requires very little extra
    hardware.
  • Least Recently Used (LRU)
  • May achieve a better hit rate but has more area
    overhead than the random policy (both are
    sketched below).
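
A rough sketch of the tradeoff, as hypothetical victim-selection helpers continuing the decoder model above (not code from the paper):

```c
/* Victim selection for reprogramming a PD entry within a group of
   BAS = 8 sets; both policies are sketches, not the paper's circuit. */
#include <stdint.h>
#include <stdlib.h>

#define BAS 8

/* Random: a small PRNG/LFSR suffices, so almost no extra hardware. */
int pick_victim_random(void) {
    return rand() % BAS;
}

/* LRU: needs per-entry age state and update logic (more area), in
   exchange for a potentially better hit rate. */
int pick_victim_lru(const uint32_t last_used[BAS]) {
    int victim = 0;
    for (int w = 1; w < BAS; w++)
        if (last_used[w] < last_used[victim])
            victim = w;
    return victim;
}
```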

17
Experimental Methodology and Results
  • Miss rate is used as the primary metric to
    measure the BCache effectiveness, and MP and BAS
    parameters are determined.
  • Results are compared with baseline level one
    cache(a direct-mapped 16kB cache with a line size
    of 32 bytes for instruction and data caches)
  • 4-issue out-of-order processor simulator is used
    to collect the miss rate. 26 SPEC2K benchmarks
    are run using the SimpleScalar tool set.

18
Experimental Methodology and Results
  • (Figure: miss rates of a 16-entry victim buffer,
    set-associative caches, and B-Caches with
    different MFs.)

19
Experimental Methodology and Results
  • The miss rate reduction of the B-Cache is as good
    as that of a 4-way cache for the data cache.
  • For the instruction cache, on average, the miss
    rate reduction is 5% better than a 4-way cache.

20
Programmable Decoder Design
  • Latency costs
  • Storage costs
  • Power costs

21
Timing Analysis
  • Critical path:
  • Direct-mapped: tag side.
  • B-Cache: may be on the tag side or the data side.
  • B-Cache modifies the local decoder.

22
Timing Analysis
23
Storage Overhead
  • The B-Cache additionally uses CAM cells.
  • A CAM cell is 25% larger than the SRAM cell used
    by the data and tag memory.

24
Power Overhead
  • Extra power consumption: the PD of each subarray.
  • Power reduction:
  • 3-bit data length reduction
  • Removal of 3-input NAND gates

25
ANALYSIS
  • Overall Performance
  • Overall Energy
  • Design Tradeoffs for MF and BAS for a Fixed
    Length of PD
  • Balance Evaluation
  • The Effect of L1 Cache Sizes
  • Comparison

26
Overall Performance
27
Overall Energy
  • Static + dynamic power dissipation
  • Charging and discharging of the load capacitance
  • Memory related:
  • On-chip caches
  • Off-chip memory

28
Design Tradeoffs for MF and BAS for a Fixed
Length of PD
The question is: which design has a higher miss
rate reduction?
29
Design Tradeoffs for MF and BAS for a Fixed
Length of PD
30
Balance Evaluation
  • Frequent hit sets: hits 2 times higher than the
    average.
  • Frequent miss sets: misses 2 times higher than
    the average.
  • Less accessed sets: accesses below half the
    average (see the sketch below).
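
A minimal sketch of how such a classification could be computed from per-set counters; the thresholds follow the slide's wording, but the exact definitions and the per-set-average baseline are assumptions:

```c
/* Classify cache sets against the per-set averages (assumed
   thresholds inferred from the slide, not the paper's exact metric). */
#include <stdint.h>

typedef enum { FREQUENT_HIT, FREQUENT_MISS, LESS_ACCESSED, NORMAL } SetClass;

SetClass classify_set(uint64_t hits, uint64_t misses, uint64_t accesses,
                      double avg_hits, double avg_misses,
                      double avg_accesses) {
    if (accesses < avg_accesses / 2.0) return LESS_ACCESSED;
    if (hits   >= 2.0 * avg_hits)      return FREQUENT_HIT;
    if (misses >= 2.0 * avg_misses)    return FREQUENT_MISS;
    return NORMAL;
}
```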

31
  • The miss rate reductions increase when the MF is
    increased.
  • The B-Cache design with MF = 8 and BAS = 8 is
    the best.

32
Comparison
  • Compared with a victim buffer: the miss rate
    reduction of the B-Cache is higher than that of
    the victim buffer.
  • Compared with a highly associative cache (HAC):
  • The HAC is for low-power embedded systems.
  • The HAC is an extreme case of the B-Cache, where
    the decoder is fully programmable.

33
RELATED WORK
  • Reducing the miss rate of direct-mapped caches
  • Reducing the access time of set-associative
    caches

34
Reducing Miss Rate of Direct-Mapped Caches
  • Techniques:
  • Page allocation
  • Column-associative cache
  • Adaptive group-associative cache
  • Skewed-associative cache

35
Reducing Access Time of Set-Associative Caches
  • Partial address matching: predicting the hit way
  • Difference-bit cache

36
B-CACHE SUMMARY
  • The B-Cache can be applied to both
    high-performance and low-power embedded systems.
  • Sets are balanced without any software
    intervention.
  • Feasible and easy to implement.

37
Conclusion
  • The B-Cache allows accesses to cache sets to be
    balanced by increasing the decoder length and
    incorporating a replacement policy into a
    direct-mapped cache design.
  • Programmable decoders dynamically determine which
    memory addresses have a mapping to the cache
    sets.
  • A 16 kB level-one B-Cache outperforms a
    traditional direct-mapped cache of the same size,
    reducing the miss rate by 64.5% and 37.8% for the
    instruction and data caches, respectively.
  • Average IPC improvement: 5.9%.
  • Energy reduction: 2%.
  • Access time is the same as that of a traditional
    direct-mapped cache.

38
References
  • C. Zhang, "Balanced Cache: Reducing Conflict
    Misses of Direct-Mapped Caches through
    Programmable Decoders," ISCA 2006, IEEE.
  • C. Zhang, "Balanced Instruction Cache: Reducing
    Conflict Misses of Direct-Mapped Caches through
    Balanced Subarray Accesses," IEEE Computer
    Architecture Letters, May 2005.
  • Wilkinson, B. (1996), Computer Architecture:
    Design and Performance, Prentice Hall Europe.
  • University of Maryland: http://www.cs.umd.edu/class/fall2001/cmsc411/proj01/cache/cache.html