Reducing Cache Misses - PowerPoint PPT Presentation

About This Presentation
Title:

Reducing Cache Misses

Description:

Title: Computer Architecture. Author: jb. Last modified by: Engineering Science. Created Date: 7/9/2001 10:13:06 AM. Document presentation format: On-screen Show.

Slides: 22
Provided by: jb57

Transcript and Presenter's Notes

Title: Reducing Cache Misses


1
Reducing Cache Misses
Classifying Misses: 3 Cs
  • 5.1 Introduction
  • 5.2 The ABCs of Caches
  • 5.3 Reducing Cache Misses
  • 5.4 Reducing Cache Miss Penalty
  • 5.5 Reducing Hit Time
  • 5.6 Main Memory
  • 5.7 Virtual Memory
  • 5.8 Protection and Examples of Virtual Memory
  • Compulsory: The first access to a block is not in
    the cache, so the block must be brought into the
    cache. Also called cold start misses or first
    reference misses. (Misses in even an Infinite
    Cache)
  • Capacity: If the cache cannot contain all the
    blocks needed during execution of a program,
    capacity misses will occur due to blocks being
    discarded and later retrieved. (Misses in Fully
    Associative Size X Cache)
  • Conflict: If the block-placement strategy is set
    associative or direct mapped, conflict misses (in
    addition to compulsory and capacity misses) will
    occur because a block can be discarded and later
    retrieved if too many blocks map to its set. Also
    called collision misses or interference
    misses. (Misses in N-way Associative, Size X
    Cache)
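
The three definitions above can be turned into a small classifier. The following is a minimal sketch, not part of the original slides: it assumes LRU replacement, a trace given as block addresses, and the hypothetical function name classify_misses. A miss in the real cache is compulsory if the block was never seen before, capacity if a fully associative cache of the same size would also miss, and conflict otherwise.

```python
from collections import OrderedDict


def classify_misses(block_addrs, num_blocks, ways):
    """Classify each miss in a block-address trace as one of the 3 Cs.

    num_blocks: cache size in blocks; ways: associativity of the real cache.
    LRU replacement is an assumption of this sketch.
    """
    num_sets = num_blocks // ways
    seen = set()                                      # infinite cache
    full = OrderedDict()                              # fully associative, same size
    sets = [OrderedDict() for _ in range(num_sets)]   # the real cache
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}

    for b in block_addrs:
        # Reference model: fully associative LRU cache of num_blocks blocks.
        full_hit = b in full
        if full_hit:
            full.move_to_end(b)
        else:
            if len(full) == num_blocks:
                full.popitem(last=False)              # evict least recently used
            full[b] = True

        # Real cache: direct mapped (ways == 1) or set associative.
        s = sets[b % num_sets]
        real_hit = b in s
        if real_hit:
            s.move_to_end(b)
        else:
            if len(s) == ways:
                s.popitem(last=False)
            s[b] = True

        if not real_hit:
            if b not in seen:
                counts["compulsory"] += 1
            elif not full_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
        seen.add(b)
    return counts


# Blocks 0 and 4 collide in a 4-line direct-mapped cache.
print(classify_misses([0, 4, 0, 4], num_blocks=4, ways=1))
```

In the usage line, the two repeated accesses miss only because both blocks map to the same line, so they are counted as conflict misses.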

2
3Cs Absolute Miss Rate (SPEC92)
Reducing Cache Misses
Classifying Misses: 3 Cs
[Figure: 3Cs absolute miss rate (SPEC92). Labels: conflict misses; compulsory misses are vanishingly small.]
3
2:1 Cache Rule
Reducing Cache Misses
Classifying Misses: 3 Cs
miss rate of a 1-way associative cache of size X ≈ miss rate of a 2-way associative cache of size X/2
[Figure: miss rate by miss type, with the conflict region labeled.]
4
3Cs Relative Miss Rate
Reducing Cache Misses
Classifying Misses: 3 Cs
[Figure: 3Cs relative miss rate, with the conflict region labeled.]
5
Reducing Cache Misses
1. Larger Block Size
This exploits the principle of locality: the larger the
block, the greater the chance that other parts of it will
also be used.
[Figure: legend "Size of Cache".]
6
2. Higher Associativity
Reducing Cache Misses
  • 2:1 Cache Rule
  • Miss rate of a direct-mapped cache of size N ≈
    miss rate of a 2-way cache of size N/2
  • But beware: execution time is the only final
    measure we can believe!
  • Clock cycle time increases as a result of having a
    more complicated cache.
  • Hill [1988] suggested that hit time for 2-way vs.
    1-way is about 10% higher for an external cache
    and 2% higher for an internal cache.

7
Avg. Memory Access Time vs. Miss Rate
Reducing Cache Misses
2. Higher Associativity
The time to access memory has several components.
The equation is
Average Memory Access Time = Hit Time + Miss Rate × Miss Penalty
The miss penalty is 50 cycles. See data on next page.
Associativity    Clock Cycle Time
1                1.00
2                1.10
4                1.12
8                1.14
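
As a worked illustration of the equation, here is a minimal sketch. It assumes that the hit time is one clock cycle scaled by the clock cycle time from the table, and that the 50-cycle miss penalty stays constant. The miss rates are hypothetical placeholders, since the slide with the measured data was not captured in this transcript.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty


MISS_PENALTY = 50  # cycles, as stated on the slide

# Clock cycle times for a subset of the associativities in the table.
clock_cycle_time = {1: 1.00, 2: 1.10, 8: 1.14}

# HYPOTHETICAL miss rates for illustration only (not from the slides).
miss_rate = {1: 0.020, 2: 0.015, 8: 0.012}

for ways in clock_cycle_time:
    t = amat(clock_cycle_time[ways], miss_rate[ways], MISS_PENALTY)
    print(f"{ways}-way: AMAT = {t:.2f} cycles")
```

With different miss rates the ordering can flip: a small drop in miss rate may not pay for the longer hit time, which is the point of the "execution time is the only final measure" warning.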
8
Example: Avg. Memory Access Time vs. Miss Rate
Reducing Cache Misses
2. Higher Associativity
9
Reducing Cache Misses
3. Victim Caches
  • How to combine the fast hit time of direct mapped and
    still avoid conflict misses?
  • Add a buffer to hold data discarded from the cache
  • A 4-entry victim cache removed 20% to 95% of
    conflicts for a 4 KB direct-mapped data cache
  • Used in Alpha and HP machines.
  • In effect, this gives the same behavior as
    associativity, but only on those cache lines that
    really need it.
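
The buffer described above can be sketched as follows. This is an illustrative model rather than the design of any particular machine: the class name DirectMappedWithVictim, the FIFO replacement in the victim buffer, and the swap on a victim hit are assumptions of the sketch.

```python
from collections import deque


class DirectMappedWithVictim:
    """Direct-mapped cache backed by a small fully associative victim buffer."""

    def __init__(self, num_lines, victim_entries=4):
        self.num_lines = num_lines
        self.lines = [None] * num_lines               # block address held by each line
        self.victims = deque(maxlen=victim_entries)   # FIFO of recently evicted blocks

    def access(self, block):
        """Return 'hit', 'victim_hit', or 'miss' for a block address."""
        idx = block % self.num_lines
        if self.lines[idx] == block:
            return "hit"
        evicted = self.lines[idx]
        if block in self.victims:
            # Swap: the victim returns to the cache and the evicted block replaces it.
            self.victims.remove(block)
            result = "victim_hit"
        else:
            result = "miss"
        self.lines[idx] = block
        if evicted is not None:
            self.victims.append(evicted)              # oldest victim drops out when full
        return result


cache = DirectMappedWithVictim(num_lines=4)
print([cache.access(b) for b in (0, 4, 0, 4)])
```

Blocks 0 and 4 map to the same line, so without the buffer the last two accesses would be conflict misses that go to the next memory level.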

10
Reducing Cache Miss Penalty
Time to handle a miss is becoming more and more
the controlling factor. This is because of the
great improvement in speed of processors as
compared to the speed of memory.
Average Memory Access Time = Hit Time + Miss Rate × Miss Penalty
11
Reducing Cache Miss Penalty
Prioritization of Read Misses over Writes
  • Write-through with write buffers creates RAW (read-after-write)
    conflicts with main memory reads on cache misses
  • Simply waiting for the write buffer to empty might
    increase the read miss penalty (by 50% on the old
    MIPS 1000)
  • Check the write buffer contents before the read; if there
    are no conflicts, let the memory access continue
  • Write back?
  • Read miss replacing a dirty block
  • Normal: write the dirty block to memory, and then do
    the read
  • Instead: copy the dirty block to a write buffer,
    then do the read, and then do the write
  • The CPU stalls less, since it restarts as soon as the read is done
12
Reducing Cache Miss Penalty
Sub Block Placement for Reduced Miss Penalty
  • Don't have to load the full block on a miss
  • Have valid bits per subblock to indicate which
    subblocks are valid

[Figure labels: Subblocks, Valid Bits]
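
A single cache line with per-subblock valid bits might be modeled as in the following sketch. The class name SubBlockedLine and the three result strings are illustrative assumptions, not terms from the slides.

```python
class SubBlockedLine:
    """One cache line: a single tag shared by several subblocks, each with a valid bit."""

    def __init__(self, subblocks_per_block):
        self.tag = None
        self.valid = [False] * subblocks_per_block

    def access(self, tag, sub):
        """Return 'hit', 'subblock_miss' (tag matches, subblock invalid), or 'miss'."""
        if self.tag == tag:
            if self.valid[sub]:
                return "hit"
            self.valid[sub] = True          # fetch only the missing subblock
            return "subblock_miss"
        # Tag mismatch: claim the line and load only the requested subblock.
        self.tag = tag
        self.valid = [False] * len(self.valid)
        self.valid[sub] = True
        return "miss"


line = SubBlockedLine(subblocks_per_block=4)
print([line.access(7, 1), line.access(7, 1), line.access(7, 2), line.access(9, 2)])
```

Either kind of miss transfers one subblock rather than the full block, which is what shortens the miss penalty.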
13
Reducing Cache Miss Penalty
Early Restart and Critical Word First
  • Don't wait for the full block to be loaded before
    restarting the CPU
  • Early restart: As soon as the requested word of
    the block arrives, send it to the CPU and let the
    CPU continue execution
  • Critical word first: Request the missed word first
    from memory and send it to the CPU as soon as it
    arrives; let the CPU continue execution while
    filling the rest of the words in the block. Also
    called wrapped fetch and requested word first
  • Generally useful only with large blocks
  • Spatial locality is a problem: programs tend to want
    the next sequential word, so it is not clear how much
    early restart helps
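
The difference between the two policies can be sketched with a simple model that assumes one word arrives per transfer. Both function names and the policy strings are hypothetical.

```python
def wrapped_fetch_order(block_words, critical_word):
    """Order in which words arrive when the missed word is requested first."""
    return [(critical_word + i) % block_words for i in range(block_words)]


def words_until_restart(block_words, critical_word, policy):
    """Number of word transfers the CPU waits for before it can restart."""
    if policy == "full_block":
        return block_words
    if policy == "early_restart":        # block is still fetched starting at word 0
        return critical_word + 1
    if policy == "critical_word_first":
        return 1
    raise ValueError(policy)


print(wrapped_fetch_order(8, 5))
for policy in ("full_block", "early_restart", "critical_word_first"):
    print(policy, words_until_restart(8, 5, policy))
```

The gap between the policies grows with block size, which matches the remark that these techniques mainly pay off for large blocks.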
14
Reducing Cache Miss Penalty
Second Level Caches
  • L2 equations
  • Average Memory Access Time = Hit Time_L1 + Miss Rate_L1 × Miss Penalty_L1
  • Miss Penalty_L1 = Hit Time_L2 + Miss Rate_L2 × Miss Penalty_L2
  • Average Memory Access Time = Hit Time_L1 +
    Miss Rate_L1 × (Hit Time_L2 + Miss Rate_L2 × Miss Penalty_L2)
  • Definitions
  • Local miss rate: misses in this cache divided by
    the total number of memory accesses to this cache
    (Miss Rate_L2)
  • Global miss rate: misses in this cache divided by
    the total number of memory accesses generated by
    the CPU (Miss Rate_L1 × Miss Rate_L2)
  • Global miss rate is what matters
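
The two-level equations can be checked numerically with a short sketch. All numeric inputs below are hypothetical values chosen for easy arithmetic; they are not measurements from the slides.

```python
def amat_two_level(hit_l1, miss_rate_l1, hit_l2, local_miss_rate_l2, miss_penalty_l2):
    """AMAT with an L2 cache: the L1 miss penalty is the L2 access time."""
    miss_penalty_l1 = hit_l2 + local_miss_rate_l2 * miss_penalty_l2
    return hit_l1 + miss_rate_l1 * miss_penalty_l1


def global_miss_rate_l2(miss_rate_l1, local_miss_rate_l2):
    """Fraction of all CPU accesses that miss in both L1 and L2."""
    return miss_rate_l1 * local_miss_rate_l2


# HYPOTHETICAL parameters, in cycles and fractions.
print(amat_two_level(hit_l1=1, miss_rate_l1=0.04, hit_l2=10,
                     local_miss_rate_l2=0.5, miss_penalty_l2=100))
print(global_miss_rate_l2(0.04, 0.5))
```

With these inputs the L2 local miss rate looks poor at 50%, yet only 2% of CPU accesses reach main memory, which illustrates why the global miss rate is the figure that matters.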

15
Reducing Hit Time
This is about how to reduce the time to access data
that IS in the cache. Which techniques are useful
for quickly and efficiently finding out whether data
is in the cache and, if it is, getting that data
out of the cache?
Average Memory Access Time = Hit Time + Miss Rate × Miss Penalty
16
Reducing Hit Time
Small and Simple Caches
  • Why does the Alpha 21164 have an 8 KB instruction cache and
    an 8 KB data cache plus a 96 KB second-level cache?
  • Small data cache and clock rate
  • Direct Mapped, on chip

17
Reducing Hit Time
Pipelining Writes for Fast Write Hits
  • Pipeline Tag Check and Update Cache as separate
    stages: current write tag check and previous write
    cache update
  • Only STORES in the pipeline; empty during a miss
      Store r2, (r1)    Check r1
      Add               --
      Sub               --
      Store r4, (r3)    M[r1] <- r2, Check r3
  • The shaded part of the original figure is the Delayed
    Write Buffer; it must be checked on reads: either
    complete the write or read from the buffer
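
The two-stage store pipeline can be sketched as follows. The model is deliberately simplified: it assumes every store hits in the cache, holds a single delayed write, and uses the hypothetical class name PipelinedWriteCache.

```python
class PipelinedWriteCache:
    """Cache whose array update for a store is delayed until the next store."""

    def __init__(self):
        self.data = {}        # address -> value actually written into the cache array
        self.delayed = None   # (address, value) that passed the tag check, not yet written

    def store(self, addr, value):
        # Second stage for the previous store: update the cache array.
        if self.delayed is not None:
            prev_addr, prev_value = self.delayed
            self.data[prev_addr] = prev_value
        # First stage for this store: tag check (assumed to hit in this sketch).
        self.delayed = (addr, value)

    def load(self, addr):
        # Reads must check the delayed write buffer before the cache array.
        if self.delayed is not None and self.delayed[0] == addr:
            return self.delayed[1]
        return self.data.get(addr)


cache = PipelinedWriteCache()
cache.store(0x100, 2)          # tag check only; the array is not updated yet
print(cache.data, cache.load(0x100))
cache.store(0x300, 4)          # the previous store is written while this one is checked
print(cache.data, cache.load(0x300))
```

Here load reads from the buffer; completing the delayed write before the read is the other option the slide mentions.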
19
Way prediction to reduce hit time
Keeps the reduced conflict misses of associative caches. Predict
which of the blocks within the set contains the
requested data. The multiplexer is preset to this
predicted value, so the delay caused by the
multiplexer is avoided. On a misprediction, the correct
block is chosen and the prediction is updated. A one-bit
history can be used for prediction.
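
A one-bit predictor for a single 2-way set could look like the sketch below. The class name WayPredictedSet, the result strings, and the choice to fill the non-predicted way on a miss are assumptions made for illustration.

```python
class WayPredictedSet:
    """One set of a 2-way set-associative cache with a one-bit way predictor."""

    def __init__(self):
        self.tags = [None, None]
        self.predicted_way = 0      # one-bit history: the way used most recently

    def access(self, tag):
        """Return 'fast_hit', 'slow_hit' (mispredicted way), or 'miss'."""
        if self.tags[self.predicted_way] == tag:
            return "fast_hit"               # multiplexer was already set correctly
        other = 1 - self.predicted_way
        if self.tags[other] == tag:
            self.predicted_way = other      # correct the prediction
            return "slow_hit"
        # Miss: fill the non-predicted way and predict it next time.
        self.tags[other] = tag
        self.predicted_way = other
        return "miss"


s = WayPredictedSet()
print([s.access(t) for t in ("A", "A", "B", "A", "A")])
```

A slow hit still avoids going to the next memory level; it only costs the extra time to select the other way.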
20
  • Trace caches to reduce hit time
  • Used in the Pentium 4.
  • The idea is to use a dynamic trace of the executed
    instruction stream to fetch a sequence of instructions.
  • Complex to implement.
  • High overhead.

21
  • Nonblocking cache
  • Most caches can only handle one outstanding
    request at a time. If a request is made to the
    cache and there is a miss, the cache must wait
    for the memory to supply the value that was
    needed, and until then it is "blocked".
  • A non-blocking cache can work on other requests
    while waiting for memory to service any
    outstanding misses.
  • The Intel Pentium Pro and Pentium
    II processors use this technology for their level
    2 caches, which can manage up to four
    simultaneous requests.
  • This is done by using a transaction-based
    architecture, and a dedicated "backside" bus for
    the cache that is independent of the main memory
    bus. Intel calls this "dual independent bus"
    (DIB) architecture.
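
The blocking versus non-blocking distinction can be sketched with a small bookkeeping model. It is not a description of Intel's implementation: the class name NonBlockingCache, the result strings, and the merge behavior for repeated misses to the same block are assumptions. The default limit of four outstanding requests mirrors the figure quoted above.

```python
class NonBlockingCache:
    """Cache that keeps servicing requests while a limited number of misses are pending."""

    def __init__(self, max_outstanding=4):
        self.max_outstanding = max_outstanding
        self.resident = set()       # blocks currently in the cache
        self.outstanding = set()    # blocks requested from memory, not yet returned

    def request(self, block):
        """Return 'hit', 'miss_issued', 'miss_merged', or 'stall'."""
        if block in self.resident:
            return "hit"                    # hits are served even while misses are pending
        if block in self.outstanding:
            return "miss_merged"            # this block is already being fetched
        if len(self.outstanding) >= self.max_outstanding:
            return "stall"                  # no free slot: behaves like a blocking cache
        self.outstanding.add(block)
        return "miss_issued"

    def memory_returns(self, block):
        """Memory delivers a previously requested block."""
        self.outstanding.discard(block)
        self.resident.add(block)


cache = NonBlockingCache()
print([cache.request(b) for b in (1, 2, 3, 4, 5)])
cache.memory_returns(1)
print(cache.request(1), cache.request(5))
```

Setting max_outstanding=1 in this model approximates the blocking behavior described in the first bullet.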