Title: Multilevel Memory Caches
1 Multilevel Memory Caches
- Prof. Sirer
- CS 316
- Cornell University
2 Storage Hierarchy
- Technology      Capacity  Cost/GB  Latency
- Tape            1 TB      0.17     100 s
- Disk            300 GB    0.34     4 ms
- DRAM            4 GB      520      20 ns
- SRAM off chip   512 KB    123000   5 ns
- SRAM on chip    16 KB     ???      2 ns
- Capacity and latency are closely coupled (larger is slower); cost per GB moves inversely to both
- How do we create the illusion of large and fast memory?
[Figure: storage hierarchy, top to bottom: SRAM on chip, SRAM off chip, DRAM, Disk, Tape]
3 Memory Hierarchy
- Principle: Hide latency using small, fast memories called caches
- Caches exploit locality
  - Temporal locality: If a memory location is referenced, it is likely to be referenced again in the near future
  - Spatial locality: If a memory location is referenced, other locations near it are likely to be referenced in the near future
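As a small illustration of both kinds of locality, here is a sketch that is not from the slides; the function name `sum_array` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n ints.
 * Spatial locality: a[0], a[1], ... are adjacent in memory and are read in
 * order, so one fetched cache line serves several consecutive iterations.
 * Temporal locality: the loop's own instructions are fetched again on every
 * iteration. */
long sum_array(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```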
7 Cache Lookups (Read)
- Look at the address issued by the processor and search the cache tags to see if that block is in the cache
- Hit: Block is in the cache, return requested data
- Miss: Block is not in the cache, read line from memory, evict an existing line from the cache, place new line in cache, return requested data
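The read path above can be sketched in software as follows. This is a minimal model under stated assumptions, not the hardware design: the sizes, the type names (`line_t`, `cache_t`), the function `cache_read`, and the round-robin victim choice are all illustrative, and the sequential tag search stands in for comparisons that hardware performs in parallel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 32u    /* bytes per line (assumed) */
#define NUM_LINES  4u     /* lines in the cache (assumed) */
#define MEM_SIZE   1024u  /* bytes of simulated memory (assumed) */

typedef struct {
    int      valid;
    uint32_t tag;               /* block number held in this line */
    uint8_t  data[BLOCK_SIZE];
} line_t;

typedef struct {
    line_t   lines[NUM_LINES];
    unsigned next_victim;       /* round-robin eviction pointer */
    unsigned hits, misses;
} cache_t;

/* Reads one byte at addr through the cache. */
uint8_t cache_read(cache_t *c, const uint8_t *mem, uint32_t addr)
{
    assert(addr < MEM_SIZE);
    uint32_t tag    = addr / BLOCK_SIZE;
    uint32_t offset = addr % BLOCK_SIZE;

    /* Search the tags for the block. */
    for (unsigned i = 0; i < NUM_LINES; i++) {
        if (c->lines[i].valid && c->lines[i].tag == tag) {
            c->hits++;                       /* hit: return requested data */
            return c->lines[i].data[offset];
        }
    }

    /* Miss: pick a victim, read the line from memory, place it, return data. */
    c->misses++;
    line_t *v = &c->lines[c->next_victim];
    c->next_victim = (c->next_victim + 1u) % NUM_LINES;
    memcpy(v->data, mem + tag * BLOCK_SIZE, BLOCK_SIZE);
    v->tag   = tag;
    v->valid = 1;
    return v->data[offset];
}
```

Starting from a zero-initialized `cache_t`, the first read of any block misses and later reads of bytes in the same block hit.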
8 Cache Organization
- Cache has to be fast and small
- Gain speed by performing lookups in parallel, which requires die real estate
- Reduce hardware required by limiting where in the cache a block might be placed
- Three common designs
  - Fully associative: Block can be anywhere in the cache
  - Direct mapped: Block can only be in one line in the cache
  - Set-associative: Block can be in a few (2 to 8) places in the cache
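9 Tags and Offsets
- Cache block size determines cache organization
[Figure: a 32-bit virtual address (bits 31 to 0) split into Tag (bits 31 to 5) and Offset (bits 4 to 0); the offset selects a byte within the block]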
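With the offset in bits 4 to 0, a block holds 2^5 = 32 bytes and the remaining 27 bits form the tag. The split can be sketched with shifts and masks; the function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 5u                    /* bits 4..0 of the address */
#define BLOCK_BYTES (1u << OFFSET_BITS)   /* 32 bytes per block */

/* Byte position within the block (bits 4..0). */
uint32_t addr_offset(uint32_t addr) { return addr & (BLOCK_BYTES - 1u); }

/* Block identifier (bits 31..5). */
uint32_t addr_tag(uint32_t addr) { return addr >> OFFSET_BITS; }

/* Rebuilds the address, showing that the split loses no information. */
uint32_t addr_join(uint32_t tag, uint32_t offset)
{
    assert(offset < BLOCK_BYTES);
    return (tag << OFFSET_BITS) | offset;
}
```

For example, address 0x12345678 has offset 0x18 and tag 0x0091A2B3.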
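10 Fully Associative Cache
[Figure: fully associative cache. Each line holds V (valid bit), Tag, and Block. The address is split into Tag and Offset; the tag is compared against every line to produce line select and hit encode, and the offset drives word/byte select]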
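11 Direct Mapped Cache
[Figure: direct-mapped cache. Each line holds V (valid bit), Tag, and Block. The address is split into Tag, Index, and Offset; the index selects the single candidate line]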
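A direct-mapped lookup can be sketched by tracking only valid bits and tags. The field widths (5 offset bits, 3 index bits) and the names `dm_cache_t` and `dm_access` are assumptions made for this example, not values from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OFFSET_BITS 5u                  /* 32-byte blocks (assumed) */
#define INDEX_BITS  3u                  /* 8 lines (assumed) */
#define DM_LINES    (1u << INDEX_BITS)

typedef struct {
    int      valid[DM_LINES];
    uint32_t tag[DM_LINES];
} dm_cache_t;

/* Returns 1 on a hit, 0 on a miss (after installing the new tag). */
int dm_access(dm_cache_t *c, uint32_t addr)
{
    assert(c != NULL);
    uint32_t index = (addr >> OFFSET_BITS) & (DM_LINES - 1u);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);

    if (c->valid[index] && c->tag[index] == tag)
        return 1;
    c->valid[index] = 1;    /* the index alone decides which line is replaced */
    c->tag[index]   = tag;
    return 0;
}
```

With these widths, addresses 0x0000 and 0x0100 share index 0 but differ in tag, so alternating between them misses every time even though the other seven lines stay empty.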
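12 2-Way Set-Associative Cache
[Figure: 2-way set-associative cache. Two ways, each with V (valid bit), Tag, and Block per line. The address is split into Tag, Index, and Offset; the index selects a set and the tag is compared against both ways]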
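A 2-way set-associative lookup can be sketched the same way. The widths, the names `sa_cache_t` and `sa_access`, and the use of one least-recently-used marker per set are assumptions for this example; the slides do not specify a replacement policy for this design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OFFSET_BITS 5u                 /* 32-byte blocks (assumed) */
#define SET_BITS    2u                 /* 4 sets (assumed) */
#define NUM_SETS    (1u << SET_BITS)
#define WAYS        2u

typedef struct {
    int      valid[NUM_SETS][WAYS];
    uint32_t tag[NUM_SETS][WAYS];
    unsigned lru[NUM_SETS];            /* way to evict next in each set */
} sa_cache_t;

/* Returns 1 on a hit, 0 on a miss (after installing the block). */
int sa_access(sa_cache_t *c, uint32_t addr)
{
    assert(c != NULL);
    uint32_t set = (addr >> OFFSET_BITS) & (NUM_SETS - 1u);
    uint32_t tag = addr >> (OFFSET_BITS + SET_BITS);

    /* Compare the tag against both ways of the selected set. */
    for (unsigned w = 0; w < WAYS; w++) {
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            c->lru[set] = 1u - w;      /* the other way is now least recent */
            return 1;
        }
    }

    /* Miss: replace the least recently used way of this set. */
    unsigned victim = c->lru[set];
    c->valid[set][victim] = 1;
    c->tag[set][victim]   = tag;
    c->lru[set] = 1u - victim;
    return 0;
}
```

Two blocks that map to the same set (for example 0x000 and 0x080 here) can now be resident together; a third block in that set evicts whichever of the two was used less recently.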
13 Valid Bits
- Valid bits indicate whether a cache line contains an up-to-date copy of the values in memory
- Must be 1 for a hit
- Reset to 0 on power up
- An item can be removed from the cache by setting its valid bit to 0
14 Eviction
- Which cache line should be evicted from the cache to make room for a new line?
- Direct-mapped
  - no choice, must evict the line selected by the index
- Associative caches
  - random: select one of the lines at random
  - round-robin: cycle through the lines in order; behaves similarly to random
  - FIFO: replace the oldest line
  - LRU: replace the line that has not been used for the longest time
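The difference between FIFO and LRU comes down to which timestamp is compared. A minimal sketch, assuming each line records when it was filled and when it was last accessed (the struct and function names are hypothetical, and real hardware usually approximates LRU rather than storing full timestamps):

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned long loaded_at;   /* time the line was filled */
    unsigned long last_used;   /* time of the most recent access */
} line_age_t;

/* FIFO: evict the line that was filled earliest. */
size_t victim_fifo(const line_age_t *lines, size_t n)
{
    assert(n > 0);
    size_t v = 0;
    for (size_t i = 1; i < n; i++)
        if (lines[i].loaded_at < lines[v].loaded_at)
            v = i;
    return v;
}

/* LRU: evict the line whose most recent access is oldest. */
size_t victim_lru(const line_age_t *lines, size_t n)
{
    assert(n > 0);
    size_t v = 0;
    for (size_t i = 1; i < n; i++)
        if (lines[i].last_used < lines[v].last_used)
            v = i;
    return v;
}
```

A line that was loaded first but touched recently is the FIFO victim yet is kept by LRU.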
15 Cache Writes
[Figure: CPU connected to Cache (SRAM) and Memory (DRAM) by addr and data lines]
- No-Write
  - writes invalidate the cache line and go to memory
- Write-Through
  - writes go to both main memory and the cache
- Write-Back
  - writes go to the cache; main memory is written only when the block is evicted
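The three policies differ in how many writes reach main memory. The sketch below models a single cache line that already holds the block being written; the type and function names are hypothetical and the model ignores data entirely, counting only memory writes.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NO_WRITE, WRITE_THROUGH, WRITE_BACK } write_policy_t;

typedef struct {
    write_policy_t policy;
    int      valid;        /* line currently holds the block */
    int      dirty;        /* used only by write-back */
    unsigned mem_writes;   /* writes that reached main memory */
} one_line_cache_t;

/* CPU writes to the block held by this line. */
void cache_write(one_line_cache_t *c)
{
    assert(c != NULL);
    switch (c->policy) {
    case NO_WRITE:      c->valid = 0; c->mem_writes++; break;  /* invalidate, write memory */
    case WRITE_THROUGH: c->mem_writes++;               break;  /* update cache and memory */
    case WRITE_BACK:    c->dirty = 1;                  break;  /* update cache only */
    }
}

/* The line is evicted to make room for another block. */
void cache_evict(one_line_cache_t *c)
{
    assert(c != NULL);
    if (c->policy == WRITE_BACK && c->valid && c->dirty) {
        c->mem_writes++;   /* the deferred write happens now */
        c->dirty = 0;
    }
    c->valid = 0;
}
```

Ten writes to the same line followed by an eviction cost ten memory writes under no-write or write-through, and one under write-back.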
16 Dirty Bits and Write-Back Buffers
[Figure: cache line layout: V (valid bit), D (dirty bit), Tag, Data (Byte 0, Byte 1, ..., Byte N), with a column of example bit values (1, 0, 1, 1, 1, 0)]
- Dirty bits indicate which lines have been written
- Dirty bits enable the cache to handle multiple writes to the same cache line without having to go to memory
- Write-back buffer
  - A queue where dirty lines are placed
  - Items added to the end as dirty lines are evicted from the cache
  - Items removed from the front as memory writes are completed
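The write-back buffer is a plain FIFO queue. A minimal ring-buffer sketch, assuming a capacity of four entries and storing only block addresses rather than full lines (both are assumptions, as are the names `wb_buffer_t`, `wb_push`, and `wb_pop`):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WB_SLOTS 4u   /* buffer capacity (assumed) */

typedef struct {
    uint32_t block_addr[WB_SLOTS];
    unsigned head;    /* index of the oldest entry */
    unsigned count;   /* number of queued entries */
} wb_buffer_t;

/* Adds an evicted dirty line at the end of the queue.
 * Returns 0 if the buffer is full (the cache would have to stall). */
int wb_push(wb_buffer_t *b, uint32_t block_addr)
{
    assert(b != NULL);
    if (b->count == WB_SLOTS)
        return 0;
    b->block_addr[(b->head + b->count) % WB_SLOTS] = block_addr;
    b->count++;
    return 1;
}

/* Removes the entry at the front once its memory write completes.
 * Returns 0 if the buffer is empty. */
int wb_pop(wb_buffer_t *b, uint32_t *block_addr)
{
    assert(b != NULL && block_addr != NULL);
    if (b->count == 0)
        return 0;
    *block_addr = b->block_addr[b->head];
    b->head = (b->head + 1u) % WB_SLOTS;
    b->count--;
    return 1;
}
```

Because entries leave in the order they arrived, memory sees the evicted lines in eviction order.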
17 Misses
- Three types of misses
  - Cold
    - The line is being referenced for the first time
  - Capacity
    - The line was evicted because the cache was not large enough
  - Conflict
    - The line was evicted because of another access whose index conflicted
18 Cache Design
- Need to determine parameters
  - Block size
  - Number of ways
  - Eviction policy
  - Write policy
  - Separate I-cache from D-cache
19 Virtual vs. Physical Caches
[Figure: CPU -> MMU -> Cache (SRAM) -> Memory (DRAM), with addr and data paths]
Cache works on physical addresses
[Figure: CPU -> Cache (SRAM) -> MMU -> Memory (DRAM), with addr and data paths]
Cache works on virtual addresses
- L1 (on-chip) caches are typically virtual
- L2 (off-chip) caches are typically physical
20 Cache Conscious Programming
    int a[NROW][NCOL];   /* row-major: a[i][0..NCOL-1] are adjacent */
    int i, j, sum = 0;
    for (i = 0; i < NROW; i++)
        for (j = 0; j < NCOL; j++)
            sum += a[i][j];
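21 Cache Conscious Programming
[Figure: array grid numbered in access order, running down the columns: 1 to 10 down the first column, 11 to 15 down the second]
    int a[NROW][NCOL];   /* row-major: a[i][0..NCOL-1] are adjacent */
    int i, j, sum = 0;
    for (j = 0; j < NCOL; j++)        /* column in the outer loop */
        for (i = 0; i < NROW; i++)    /* consecutive accesses are NCOL ints apart */
            sum += a[i][j];
- Every access is a cache miss!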
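22 Cache Conscious Programming
[Figure: array grid numbered in access order, running along the rows: 1 to 10 across the first row, 11 to 15 across the second]
    int a[NROW][NCOL];   /* row-major: a[i][0..NCOL-1] are adjacent */
    int i, j, sum = 0;
    for (i = 0; i < NROW; i++)        /* row in the outer loop */
        for (j = 0; j < NCOL; j++)    /* consecutive accesses are adjacent ints */
            sum += a[i][j];
- Same program, trivial transformation: 3 out of 4 accesses hit in the cache (assuming four ints fit in one cache line)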
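The effect of loop order can be checked with a small simulation. Everything here is an assumption chosen for illustration: a 64 x 64 array of 4-byte ints stored row-major starting at address 0, 16-byte cache lines (four ints per line), and a 64-line direct-mapped cache. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NROW       64u
#define NCOL       64u
#define INT_BYTES  4u    /* assumed sizeof(int) */
#define LINE_BYTES 16u   /* assumed: 4 ints per cache line */
#define NUM_LINES  64u   /* assumed: 1 KB direct-mapped cache */

static int      valid[NUM_LINES];
static uint32_t tags[NUM_LINES];

/* Simulates one access; returns 1 on a hit, 0 on a miss. */
static int touch(uint32_t addr)
{
    uint32_t block = addr / LINE_BYTES;
    uint32_t index = block % NUM_LINES;
    uint32_t tag   = block / NUM_LINES;
    if (valid[index] && tags[index] == tag)
        return 1;
    valid[index] = 1;
    tags[index]  = tag;
    return 0;
}

/* Counts hits while visiting every a[i][j] once.
 * row_outer != 0: i in the outer loop, j in the inner loop.
 * row_outer == 0: j in the outer loop, i in the inner loop. */
unsigned count_hits(int row_outer)
{
    unsigned hits = 0;
    unsigned outer_n = row_outer ? NROW : NCOL;
    unsigned inner_n = row_outer ? NCOL : NROW;

    memset(valid, 0, sizeof valid);   /* start with an empty cache */
    for (unsigned outer = 0; outer < outer_n; outer++) {
        for (unsigned inner = 0; inner < inner_n; inner++) {
            unsigned i = row_outer ? outer : inner;   /* row index */
            unsigned j = row_outer ? inner : outer;   /* column index */
            hits += (unsigned)touch((i * NCOL + j) * INT_BYTES);
        }
    }
    assert(hits <= NROW * NCOL);
    return hits;
}
```

Under these assumptions `count_hits(1)` returns 3072 of 4096 accesses (3 out of 4), while `count_hits(0)` returns 0: each column step jumps 256 bytes, and the lines it brings in are evicted before the neighbouring column can reuse them.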