Title: CDA 5155
1. CDA 5155
- Associativity in Caches
- Lecture 25
2. New Topic: Memory Systems
- Cache 101 (review of undergraduate material)
- Associativity and other organization issues
- Advanced designs and interactions with pipelines
- Tomorrow's cache design (power/performance)
- Advances in memory design
- Virtual memory (and how to do it fast)
3. Direct-Mapped Cache
[Figure: a direct-mapped cache (V, d, tag, data per line) alongside memory blocks at addresses 00000-11110; the example reference is address 01011.]
- Address 01011 splits into:
  - Tag (2-bit)
  - Line Index (2-bit)
  - Block Offset (1-bit)
- Miss classification:
  - Compulsory Miss: first reference to a memory block
  - Capacity Miss: working set doesn't fit in the cache
  - Conflict Miss: working set maps to the same cache line
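The tag/index/offset split above can be sketched in a few lines of Python; the field widths follow this slide's 5-bit example (1-bit offset, 2-bit index, 2-bit tag), not any real machine.

```python
# Sketch: splitting the slide's 5-bit address 01011 into tag, line
# index, and block offset for a direct-mapped lookup. Field widths
# are the slide's example values, not universal.

OFFSET_BITS = 1
INDEX_BITS = 2

def split_address(addr):
    """Return (tag, index, offset) for a direct-mapped lookup."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0b01011)
print(tag, index, offset)  # 1 1 1 -> tag 01, line 01, offset 1
```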
4. 2-Way Set Associative Cache
[Figure: the same memory alongside a 2-way set associative cache (V, d, tag, data per way); the example reference is address 01101.]
- Address 01101 now splits into:
  - Larger (3-bit) Tag
  - 1-bit Set Index
  - Block Offset (unchanged)
- Rule of thumb: Increasing associativity decreases conflict misses. A 2-way set associative cache has about the same hit rate as a direct-mapped cache twice the size.
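The same split for the 2-way cache shows where the bits go: halving the number of sets shrinks the index by one bit, and that bit moves into the tag. A minimal sketch, again using the slide's 5-bit example:

```python
# Sketch: the slide's 5-bit address 01101 split for the 2-way cache
# -- 4 lines / 2 ways = 2 sets, so a 1-bit set index and a 3-bit tag.

def split_2way(addr, offset_bits=1, set_bits=1):
    offset = addr & ((1 << offset_bits) - 1)
    set_index = (addr >> offset_bits) & ((1 << set_bits) - 1)
    tag = addr >> (offset_bits + set_bits)
    return tag, set_index, offset

print(split_2way(0b01101))  # (3, 0, 1): tag 011, set 0, offset 1
```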
5. Effects of Varying Cache Parameters
- Total cache size = block size × sets × associativity
- Positives:
  - Should decrease miss rate
- Negatives:
  - May increase hit time
  - Increased area requirements
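The sizing identity above can be checked with made-up example numbers (the 32 KB / 64-byte / 4-way configuration below is purely illustrative):

```python
# Sketch of total size = block size x sets x associativity, with
# assumed example values: a 32 KB cache, 64-byte blocks, 4 ways.
block_size = 64          # bytes per block (assumed)
associativity = 4        # ways (assumed)
total_size = 32 * 1024   # bytes (assumed)

sets = total_size // (block_size * associativity)
print(sets)  # 128 sets
assert block_size * sets * associativity == total_size
```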
6. Effects of Varying Cache Parameters
- Bigger block size
- Positives:
  - Exploit spatial locality; reduce compulsory misses
  - Reduce tag overhead (bits)
  - Reduce transfer overhead (address, burst data mode)
- Negatives:
  - Fewer blocks for a given size; increase conflict misses
  - Increase miss transfer time (multi-cycle transfers)
  - Wasted bandwidth for non-spatial data
7. Effects of Varying Cache Parameters
- Increasing associativity
- Positives:
  - Reduces conflict misses
  - Low-associativity caches can have pathological behavior (very high miss rates)
- Negatives:
  - Increased hit time
  - More hardware requirements (comparators, muxes, bigger tags)
  - Decreased improvements past 4- or 8-way
8. Effects of Varying Cache Parameters
- Replacement Strategy (for associative caches)
- How is the evicted line chosen?
  - LRU: intuitive; difficult to implement with high associativity; worst-case performance can occur (N+1-element array)
  - Random / Pseudo-random: easy to implement; performance close to LRU for high associativity
  - Optimal: replace the block whose next reference is farthest in the future (Belady replacement); hard to implement
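The LRU worst case mentioned above is easy to demonstrate: cycling through N+1 blocks in an N-line fully associative LRU cache misses on every single access (random replacement would salvage some hits here). A small simulation sketch:

```python
# Sketch of the LRU pathological case: an N-line fully associative
# cache with LRU replacement, driven by a cyclic N+1-block reference
# stream, hits exactly zero times.

from collections import OrderedDict

def lru_hits(n_lines, refs):
    cache = OrderedDict()   # keys = resident blocks, LRU order first
    hits = 0
    for block in refs:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark most recently used
        else:
            if len(cache) == n_lines:
                cache.popitem(last=False)  # evict the LRU block
            cache[block] = True
    return hits

N = 4
refs = [i % (N + 1) for i in range(1000)]  # cyclic N+1 working set
print(lru_hits(N, refs))  # 0 -- each access evicts the next victim
```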
9. Other Cache Design Decisions
- Write Policy: How to deal with write misses?
- Write-through / no-allocate
  - Total traffic? read misses × block size + writes
  - Common for L1 caches backed by L2 (esp. on-chip)
- Write-back / write-allocate
  - Needs a dirty bit to determine whether cache data differs from memory
  - Total traffic? (read misses + write misses) × block size + dirty-block evictions × block size
  - Common for L2 caches (memory bandwidth limited)
- Variation: Write-validate
  - Write-allocate without fetch-on-write
  - Needs a sub-blocked cache with valid bits for each word/byte
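The two traffic formulas above can be compared numerically; the miss counts and sizes below are made up for illustration (write-through writes are assumed word-sized):

```python
# Sketch of the write-through vs. write-back traffic formulas above,
# with assumed example counts. Units are bytes.
block_size = 32   # bytes per block (assumed)
word_size = 4     # bytes per write in write-through (assumed)
read_misses, write_misses = 100, 40
writes, dirty_evictions = 1000, 30

write_through = read_misses * block_size + writes * word_size
write_back = (read_misses + write_misses) * block_size \
             + dirty_evictions * block_size

print(write_through)  # 100*32 + 1000*4      = 7200
print(write_back)     # (100+40)*32 + 30*32  = 5440
```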
10. Other Cache Design Decisions
- Write Buffering
  - Delay writes until bandwidth is available
  - Put them in a FIFO buffer
  - Only stall on a write if the buffer is full
  - Use bandwidth for reads first (since they have latency problems)
  - Important for write-through caches, since write traffic is frequent
- Write-back buffer
  - Holds evicted (dirty) lines for write-back caches
  - Also allows reads to have priority on the L2 or memory bus
  - Usually only needs a small buffer
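The write-buffer policy above (stall only when the FIFO is full, drain when the bus is idle) can be sketched as a toy class; the class name and depth are assumptions for illustration:

```python
# Minimal sketch of a FIFO write buffer: stores are buffered and only
# stall the pipeline when the buffer is full; buffered writes drain
# to the next level when bandwidth is free (reads go first).

from collections import deque

class WriteBuffer:
    def __init__(self, depth=4):
        self.fifo = deque()
        self.depth = depth

    def write(self, addr, data):
        """Return True if buffered, False if the store must stall."""
        if len(self.fifo) == self.depth:
            return False                 # buffer full: stall
        self.fifo.append((addr, data))
        return True

    def drain_one(self):
        """Send one buffered write onward when the bus is idle."""
        return self.fifo.popleft() if self.fifo else None

wb = WriteBuffer(depth=2)
print(wb.write(0x10, 1), wb.write(0x14, 2), wb.write(0x18, 3))
# True True False -- the third store stalls until a slot drains
```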
11. Adding a Victim Cache
[Figure: a direct-mapped cache (V, d, tag, data; 16 lines indexed 0000-1111) next to a 4-line fully associative victim cache; example references 11010011 and 01010011.]
- A small victim cache adds associativity to hot lines
- Blocks evicted from the direct-mapped cache go to the victim cache
- Tag compares are made to both the direct-mapped cache and the victim cache
- Victim hits cause lines to swap between L1 and the victim cache
- Not very useful for associative L1 caches
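The lookup-and-swap behavior above can be sketched as a toy model; the 16-line L1 and 4-line victim sizes follow the figure, and the two example references are the slide's (with a 1-bit block offset assumed):

```python
# Sketch of a direct-mapped L1 backed by a small fully associative
# victim cache: both are probed; a victim hit swaps the line back
# into L1; L1 evictions fall into the victim cache (FIFO).

class VictimCache:
    def __init__(self, l1_lines=16, victim_lines=4):
        self.l1 = [None] * l1_lines    # holds block numbers
        self.l1_lines = l1_lines
        self.victim = []               # FIFO of evicted blocks
        self.victim_lines = victim_lines

    def access(self, block):
        idx = block % self.l1_lines
        if self.l1[idx] == block:
            return "hit"
        if block in self.victim:           # victim hit: swap with L1
            self.victim.remove(block)
            if self.l1[idx] is not None:
                self.victim.append(self.l1[idx])
            self.l1[idx] = block
            return "victim hit"
        if self.l1[idx] is not None:       # miss: evicted line -> victim
            self.victim.append(self.l1[idx])
            if len(self.victim) > self.victim_lines:
                self.victim.pop(0)
        self.l1[idx] = block
        return "miss"

# The slide's two references conflict in L1 (same index) but both
# survive thanks to the victim cache.
c = VictimCache()
print(c.access(0b11010011 >> 1), c.access(0b01010011 >> 1),
      c.access(0b11010011 >> 1))  # miss miss victim hit
```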
12. Hash-Rehash Cache
[Figure: a direct-mapped cache (V, d, tag, data) stepping through the reference stream 11010011, 01010011, 11010011.]
13. Hash-Rehash Cache
[Figure continued: reference 01000011 misses in its primary line and misses again on rehash; allocate?]
14. Hash-Rehash Cache
[Figure continued: reference stream 11010011, 01010011, 01000011, 11010011; the current reference misses in its primary line and misses on rehash.]
15. Hash-Rehash Cache
[Figure continued: the final reference, 11000011, misses in its primary line but hits on rehash.]
16. Hash-Rehash Cache
- Calculating performance:
  - Primary hit time (normal direct-mapped)
  - Rehash hit time (sequential tag lookups)
  - Block swap time?
- Hit rate comparable to 2-way set associative
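The performance bookkeeping above amounts to a weighted average over the three outcomes. A sketch with made-up rates and latencies (all the numbers below are assumptions for illustration):

```python
# Sketch: average access time for a hash-rehash cache. Primary hits
# pay the direct-mapped hit time, rehash hits pay a second sequential
# lookup, and misses pay the full miss penalty. Assumed numbers.

t_primary, t_rehash, t_miss = 1, 3, 20   # cycles (assumed)
p_primary, p_rehash = 0.90, 0.05         # hit rates (assumed)
p_miss = 1 - p_primary - p_rehash

amat = p_primary * t_primary + p_rehash * t_rehash + p_miss * t_miss
print(round(amat, 2))  # 0.9*1 + 0.05*3 + 0.05*20 = 2.05
```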
17. Compiler Support for Caching
- Array merging (array of structs vs. 2 arrays)
- Loop interchange (row vs. column access)
- Structure padding and alignment (malloc)
- Cache-conscious data placement
  - Pack working set into the same line
  - Map to non-conflicting addresses if packing is impossible
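The loop-interchange point above can be made concrete by counting cache-line transitions for the two traversal orders of a row-major 2-D array; the array size and elements-per-line below are assumed:

```python
# Sketch of why loop interchange matters: count how often a sweep of
# a row-major N x N array moves to a different cache line, for
# row-order vs. column-order traversal. Assumed: 8 elements per line.

LINE_ELEMS = 8   # elements per cache line (assumed)
N = 64           # array dimension (assumed)

def line_changes(order):
    """Count transitions to a different cache line during the sweep."""
    last, changes = None, 0
    for i, j in order:
        line = (i * N + j) // LINE_ELEMS   # row-major address / line
        if line != last:
            changes += 1
            last = line
    return changes

row_order = [(i, j) for i in range(N) for j in range(N)]
col_order = [(i, j) for j in range(N) for i in range(N)]
print(line_changes(row_order))  # 512:  N*N / LINE_ELEMS
print(line_changes(col_order))  # 4096: every access changes line
```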
18. Prefetching
- Already done: bring in an entire line, assuming spatial locality
- Extend this: Next-Line Prefetch
  - Bring in the next block in memory as well as the missed line (very good for I-cache)
- Software prefetch
  - Loads to R0 have no data dependency
- Aggressive/speculative prefetch useful for L2
- Speculative prefetch problematic for L1
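Next-line prefetch as described above is simple to model: on a miss to block B, also fetch B+1. For a purely sequential stream (the I-cache best case), this halves the miss count. A toy sketch:

```python
# Sketch of next-line prefetch: each miss also brings in the
# following block. On a sequential block stream, misses drop by half.

def misses(blocks, prefetch_next):
    cache = set()
    count = 0
    for b in blocks:
        if b not in cache:
            count += 1
            cache.add(b)
            if prefetch_next:
                cache.add(b + 1)   # bring in the next block too
    return count

stream = list(range(100))          # sequential I-cache block stream
print(misses(stream, False))  # 100 misses without prefetch
print(misses(stream, True))   # 50 misses with next-line prefetch
```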
19. Calculating the Effects of Latency
- Does a cache miss reduce performance?
- It depends on whether there are critical instructions waiting for the result
20. Calculating the Effects of Latency
- It depends on whether critical resources are held up
  - Blocking: when a miss occurs, all later references to the cache must wait. This is a resource conflict.
  - Non-blocking: allows later references to access the cache while the miss is being processed.
  - Generally there is some limit to how many outstanding misses can be bypassed.
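The point of the last two slides, in numbers: a miss only costs what cannot be overlapped. With a blocking cache the full penalty is exposed; a non-blocking cache hides whatever independent work fits under the miss. The cycle counts below are made up for illustration:

```python
# Sketch: exposed stall cycles for one miss under blocking vs.
# non-blocking caches. Assumed numbers: 20-cycle miss penalty,
# 12 cycles of independent (non-dependent) work available.

miss_penalty = 20        # cycles (assumed)
independent_work = 12    # cycles of independent instructions (assumed)

blocking_stall = miss_penalty                            # all exposed
nonblocking_stall = max(0, miss_penalty - independent_work)
print(blocking_stall, nonblocking_stall)  # 20 8
```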