Title: CS1104: Computer Organisation (http://www.comp.nus.edu.sg/cs1104)
1. CS1104 Computer Organisation (http://www.comp.nus.edu.sg/cs1104)
- School of Computing
- National University of Singapore
2. PII Lecture 9: Cache
- Direct Mapped Cache
- Addressing Cache: Tag, Index, Offset Fields
- Accessing Data in Direct Mapped Cache
- Block Size Trade-off
- Types of Cache Misses
- Fully Associative Cache
- Multi-Level Cache Hierarchy
3. PII Lecture 9: Cache
- Reading
- Section 5.5.1 of Chapter 8 of the textbook, which is Chapter 5 in Computer Organization by Hamacher, Vranesic and Zaky.
4. Recap: Current Memory Hierarchy
Technology    Speed (ns)     Size (MB)     Cost ($/MB)
Regs          0.5            0.0005        --
SRAM          2              0.05          100
SRAM          6              1-4           30
DRAM          100            100-1000      1
Disk          10,000,000     100,000       0.05
5. Another View of Memory Hierarchy
6. Cache: 1st Level of Memory Hierarchy
- How do you know if something is in the cache?
- How to find it if it is in the cache?
- In a direct mapped cache, each memory address is associated with one possible block (also called a line) within the cache.
- Therefore, we only need to look in a single location in the cache for the data, if it exists in the cache.
7. Simplest Cache: Direct Mapped
[Figure: a 4-byte direct mapped cache; memory addresses 0-F map to cache locations 0-3 by cache index]
- Cache location 0 can be occupied by data from memory locations 0, 4, 8, ...
- In general, any memory location whose 2 rightmost address bits are 0s will go into cache location 0.
- Cache index = last 2 bits of the address (i.e. address AND 00...011), as in the sketch below.
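As a small aside (this code is not from the slides; the sample addresses are arbitrary), the mapping can be modelled in C by masking off the last 2 address bits:

    #include <stdio.h>

    int main(void) {
        /* 4-byte direct mapped cache with 1-byte blocks:
           the cache index is simply the last 2 bits of the address. */
        unsigned int addresses[] = {0x0, 0x4, 0x8, 0x1, 0x5, 0xA};
        for (int i = 0; i < 6; i++) {
            unsigned int index = addresses[i] & 0x3;   /* address AND 00...011 */
            printf("address 0x%X -> cache location %u\n", addresses[i], index);
        }
        return 0;
    }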
8. Tag, Index, Offset Fields
- Which memory block is in the cache? What if the block size is > 1 byte?
- Divide the memory address into 3 portions: tag, index, and byte offset within the block.
- The index tells where in the cache to look, the offset tells which byte in the block is the start of the desired data, and the tag tells whether the data in the cache corresponds to the memory address being looked for.
9. Tag, Index, Offset Fields (2)
- Assume
- 32-bit memory address
- Cache size = 2^N bytes
- Block (line) size = 2^M bytes
- Then
- The leftmost (32 - N) bits are the Cache Tag.
- The rightmost M bits are the Byte Offset.
- The remaining (N - M) bits are the Cache Index (see the sketch below).
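A minimal C sketch of this split (my own illustration, not from the slides), using the N = 14, M = 4 values of the worked example on the following slides:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        const unsigned N = 14;          /* cache size = 2^14 bytes (16 KB) */
        const unsigned M = 4;           /* block size = 2^4 bytes (16 B)   */
        uint32_t address = 0x00000014;  /* arbitrary 32-bit byte address   */

        uint32_t offset = address & ((1u << M) - 1);               /* rightmost M bits     */
        uint32_t index  = (address >> M) & ((1u << (N - M)) - 1);  /* middle N - M bits    */
        uint32_t tag    = address >> N;                            /* leftmost 32 - N bits */

        printf("tag = %u, index = %u, offset = %u\n", tag, index, offset);
        return 0;
    }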
10. Tag, Index, Offset Fields (3)
- Example: a 16 KB direct-mapped cache with blocks of 4 words each. Determine the size of the tag, index and offset fields, assuming a 32-bit architecture.
- Offset
- To identify the correct byte within a block.
- A block contains 4 words. Each word contains 4 bytes (because of the 32-bit architecture).
- Therefore a block contains 16 bytes = 2^4 bytes.
- Hence we need 4 bits for the offset field.
11. Tag, Index, Offset Fields (4)
- Index
- To identify the correct block/line in the cache.
- The cache contains 16 KB = 2^14 bytes.
- A block contains 16 bytes = 2^4 bytes.
- Therefore the cache contains 2^14 / 2^4 = 2^10 blocks.
- Hence we need 10 bits for the index field.
12. Tag, Index, Offset Fields (5)
- Tag
- To identify which of the main-memory blocks that map to a given cache block is actually present.
- Tag size = address size - offset size - index size = 32 - 4 - 10 bits = 18 bits.
- Verify: main memory contains 2^32 / 2^4 = 2^28 blocks, and the cache contains 2^10 blocks. Therefore, there are 2^28 / 2^10 = 2^18 blocks in memory that can be mapped to the same block in the cache.
- Hence we need 18 bits for the tag field (recomputed in the sketch below).
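The same field widths can be recomputed programmatically; the sketch below (my own, assuming power-of-two sizes) reproduces the 4/10/18-bit split:

    #include <stdio.h>

    /* log2 of a power of two, by repeated shifting */
    static unsigned log2u(unsigned x) {
        unsigned bits = 0;
        while (x > 1) { x >>= 1; bits++; }
        return bits;
    }

    int main(void) {
        unsigned address_bits = 32;
        unsigned cache_bytes  = 16 * 1024;  /* 16 KB cache            */
        unsigned block_bytes  = 16;         /* 4 words x 4 bytes each */

        unsigned offset_bits = log2u(block_bytes);                      /* 4  */
        unsigned index_bits  = log2u(cache_bytes / block_bytes);        /* 10 */
        unsigned tag_bits    = address_bits - offset_bits - index_bits; /* 18 */

        printf("offset = %u bits, index = %u bits, tag = %u bits\n",
               offset_bits, index_bits, tag_bits);
        return 0;
    }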
13. Direct Mapped Cache
A 64-KB cache using 4-word (16-byte) blocks.
[Figure: address breakdown showing bit positions; 1 word = 4 bytes]
14. Direct Mapped Cache: Accessing Data
- Let's go through accessing some data in a direct mapped, 16 KB cache: 16-byte blocks x 1024 cache blocks.
- Examples: 4 addresses divided (for convenience) into Tag, Index, Byte Offset fields.
15. 16 KB Direct Mapped Cache, 16 B Blocks
- Valid bit: to check whether the block holds valid data (see the lookup sketch below).
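A rough C model of one lookup in this cache (the structures and function are hypothetical, not the textbook's): check the valid bit, compare the tag, and on a hit read the byte selected by the offset.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_BLOCKS 1024   /* 16 KB cache / 16-byte blocks */
    #define BLOCK_SIZE 16

    typedef struct {
        bool     valid;                 /* valid bit                   */
        uint32_t tag;                   /* 18-bit tag (stored in 32)   */
        uint8_t  data[BLOCK_SIZE];      /* one 16-byte block of data   */
    } CacheBlock;

    static CacheBlock cache[NUM_BLOCKS];

    /* Returns true on a hit and writes the byte to *out; false on a miss. */
    bool cache_read_byte(uint32_t address, uint8_t *out) {
        uint32_t offset = address & (BLOCK_SIZE - 1);         /* bits 3..0   */
        uint32_t index  = (address >> 4) & (NUM_BLOCKS - 1);  /* bits 13..4  */
        uint32_t tag    = address >> 14;                      /* bits 31..14 */

        if (cache[index].valid && cache[index].tag == tag) {
            *out = cache[index].data[offset];  /* hit */
            return true;
        }
        return false;  /* miss: the block would be fetched from memory here */
    }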
16. Example 1: Address 000000000000000000 0000000001 0100
So we read block 1 (0000000001)
17. Example 1 (continued): Address 000000000000000000 0000000001 0100
18. Example 1 (continued): Address 000000000000000000 0000000001 0100
19. Example 2: Address 000000000000000000 0000000001 1100
20. Example 3: Address 000000000000000000 0000000011 0100
21. Example 4: Address 000000000000000010 0000000001 1000
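For reference (my own decoding, not from the slides), the four example addresses above break down as follows under the 18/10/4-bit tag/index/offset split:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        /* The four example addresses, written as 32-bit values. */
        uint32_t examples[] = {
            0x00000014,   /* Example 1 */
            0x0000001C,   /* Example 2 */
            0x00000034,   /* Example 3 */
            0x00008018,   /* Example 4 */
        };
        for (int i = 0; i < 4; i++) {
            uint32_t a = examples[i];
            printf("Example %d: tag = %u, index = %u, offset = %u\n",
                   i + 1, a >> 14, (a >> 4) & 0x3FF, a & 0xF);
        }
        return 0;
    }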
22. Block Size Trade-off
- In general, a larger block size takes advantage of spatial locality, but
- a larger block size also means a larger miss penalty (it takes longer to fill a block), and
- if the block size is too big relative to the cache size, the miss rate will go up (too few cache blocks).
- In general, minimize the average access time
- = (Hit time x Hit rate) + (Miss penalty x Miss rate), as computed in the sketch below.
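A quick illustration of this formula (the numbers here are made up, not from the slides):

    #include <stdio.h>

    int main(void) {
        /* Average access time = (hit time x hit rate) + (miss penalty x miss rate) */
        double hit_time     = 1.0;   /* cycles (assumed) */
        double hit_rate     = 0.95;  /* assumed          */
        double miss_penalty = 40.0;  /* cycles (assumed) */
        double miss_rate    = 1.0 - hit_rate;

        double avg = hit_time * hit_rate + miss_penalty * miss_rate;
        printf("average access time = %.2f cycles\n", avg);   /* 2.95 */
        return 0;
    }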
23. Extreme Case: Single Big Block!
- Cache size = 4 bytes, block size = 4 bytes
- Only one entry in the cache!
- If an item is accessed, it is likely to be accessed again soon
- But it is unlikely to be accessed again immediately!
- The next access is likely to be a miss again
- We continually load data into the cache but discard it (it is forced out) before it is used again.
- A nightmare for the cache designer: the Ping-Pong Effect.
24. Block Size Trade-off (2)
25. Types of Cache Misses
- Compulsory Misses
- occur when a program is first started
- the cache does not contain any of that program's data yet, so misses are bound to occur
- cannot be avoided easily, so we won't focus on these in this course
26. Types of Cache Misses (2)
- Conflict Misses
- a miss that occurs because two distinct memory addresses map to the same cache location
- two blocks (which happen to map to the same location) can keep overwriting each other
- a big problem in direct-mapped caches
- how do we lessen the effect of these?
27. Dealing with Conflict Misses
- Solution 1: make the cache bigger
- fails at some point
- Solution 2: let multiple distinct blocks fit in the same cache index?
28. Fully Associative Cache
- Memory address fields
- Tag: same as before
- Offset: same as before
- Index: non-existent
- What does this mean?
- no rows: any block can go anywhere in the cache
- must compare with all tags in the entire cache to see if the data is there
29. Fully Associative Cache (2)
- Fully Associative Cache (e.g., 32 B blocks)
- Compare tags in parallel
- No Conflict Misses (since data can go anywhere); a software model of the lookup follows below.
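A software model of a fully associative lookup (hypothetical structures, my own sketch): the hardware compares all tags in parallel, but software can only approximate this with a loop over every entry.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_ENTRIES 64   /* number of cache entries (assumed) */
    #define BLOCK_SIZE  32   /* 32-byte blocks, as on this slide  */

    typedef struct {
        bool     valid;
        uint32_t tag;
        uint8_t  data[BLOCK_SIZE];
    } Entry;

    static Entry cache[NUM_ENTRIES];

    /* Returns true on a hit and writes the byte to *out; false on a miss. */
    bool fully_associative_read(uint32_t address, uint8_t *out) {
        uint32_t offset = address & (BLOCK_SIZE - 1);  /* rightmost 5 bits  */
        uint32_t tag    = address >> 5;                /* remaining 27 bits */

        for (int i = 0; i < NUM_ENTRIES; i++) {        /* no index: search all entries */
            if (cache[i].valid && cache[i].tag == tag) {
                *out = cache[i].data[offset];
                return true;                           /* hit */
            }
        }
        return false;                                  /* miss */
    }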
30. Third Type of Cache Miss
- Capacity Misses
- a miss that occurs because the cache has a limited size
- a miss that would not occur if we increased the size of the cache
- This is the primary type of miss for Fully Associative caches.
31. Fully Associative Cache (3)
- Drawbacks of the Fully Associative Cache
- a hardware comparator is needed for every single entry: if we have 64 KB of data in the cache with 4 B entries, we need 16K comparators, which is infeasible
- Set-Associative Cache: combines the features of the direct-mapped cache and the fully associative cache.
32. Cache Replacement Algorithms
- In a fully associative cache, when the cache is full and a new block is to be loaded into the cache, which block should it replace? An algorithm is needed.
- LRU (Least Recently Used) algorithm: replace the block that was accessed least recently.
- LFU (Least Frequently Used) algorithm: replace the block that is accessed least frequently.
33. Cache Replacement Algorithms (2)
- Replace-Oldest-Block algorithm: replace the block that has been in the cache the longest.
- Random algorithm: replace a block chosen at random. (An LRU sketch follows below.)
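A simplified LRU sketch (my own model, not the textbook's) for a small fully associative cache: each entry records when it was last used, and on a miss the oldest entry (or an empty one) is replaced.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_ENTRIES 8

    typedef struct {
        bool     valid;
        uint32_t tag;
        uint64_t last_used;   /* time of the most recent access */
    } Entry;

    static Entry    cache[NUM_ENTRIES];
    static uint64_t now = 0;

    /* Returns true on a hit; on a miss, loads the tag into the LRU (or an empty) entry. */
    bool access_block(uint32_t tag) {
        now++;
        int victim = 0;
        for (int i = 0; i < NUM_ENTRIES; i++) {
            if (cache[i].valid && cache[i].tag == tag) {
                cache[i].last_used = now;   /* hit: refresh this entry's age */
                return true;
            }
            /* Track the replacement candidate: prefer empty entries, then the oldest. */
            if (!cache[i].valid ||
                (cache[victim].valid && cache[i].last_used < cache[victim].last_used))
                victim = i;
        }
        cache[victim].valid     = true;     /* miss: replace the victim */
        cache[victim].tag       = tag;
        cache[victim].last_used = now;
        return false;
    }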
34. Improving Caches
- In general, minimize the average access time
- = (Hit time x Hit rate) + (Miss penalty x Miss rate)
- So far, we have looked at improving the Hit Rate:
- larger block size
- larger cache
- higher associativity
- What about the Miss Penalty?
35. Improving Miss Penalty
- When caches started becoming popular, the Miss Penalty was about 10 processor clock cycles.
- Today: a 500 MHz processor (2 nanoseconds per clock cycle) and 200 ns to go to DRAM, i.e. 100 processor clock cycles!
- Solution: place another cache between memory and the processor cache: the Second Level (L2) Cache.
36. Multi-Level Cache Hierarchy
- We consider the L2 hit and miss times to include the cost of not finding the data in the L1 cache.
- Similarly, the L2 cache hit rate is only for accesses which actually make it to the L2 cache.
37. Multi-Level Cache Hierarchy: Calculations for the L1 Cache
- Access time = L1 hit time x L1 hit rate + L1 miss penalty x L1 miss rate
- We simply take the L1 miss penalty to be the access time of the L2 cache.
- Access time = L1 hit time x L1 hit rate + (L2 hit time x L2 hit rate + L2 miss penalty x L2 miss rate) x L1 miss rate.
38. Multi-Level Cache Hierarchy: Calculations for the L1 Cache (2)
- Assumptions
- L1 hit time = 1 cycle, L1 hit rate = 90%
- L2 hit time (also the L1 miss penalty) = 4 cycles, L2 miss penalty = 100 cycles, L2 hit rate = 90%
- Access time = L1 hit time x L1 hit rate + (L2 hit time x L2 hit rate + L2 miss penalty x (1 - L2 hit rate)) x L1 miss rate
- = 1 x 0.9 + (4 x 0.9 + 100 x 0.1) x (1 - 0.9) = 0.9 + 13.6 x 0.1 = 2.26 clock cycles (reproduced in the sketch below)
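The same calculation in C (the parameters are the slide's stated assumptions):

    #include <stdio.h>

    int main(void) {
        double l1_hit_time = 1.0,  l1_hit_rate = 0.90;
        double l2_hit_time = 4.0,  l2_hit_rate = 0.90;
        double l2_miss_penalty = 100.0;

        /* The L1 miss penalty is the L2 access time. */
        double l1_miss_penalty = l2_hit_time * l2_hit_rate
                               + l2_miss_penalty * (1.0 - l2_hit_rate);   /* 13.6 */
        double access_time = l1_hit_time * l1_hit_rate
                           + l1_miss_penalty * (1.0 - l1_hit_rate);       /* 2.26 */

        printf("average access time = %.2f clock cycles\n", access_time);
        return 0;
    }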
39. What Would It Be Without an L2 Cache?
- Assume that the L1 miss penalty would then be 100 clock cycles.
- Access time = 1 x 0.9 + 100 x 0.1 = 10.9 clock cycles, vs. 2.26 with an L2 cache.
- So we gain a benefit from having the second, larger cache before main memory.
- Today's L1 cache size: 16 KB to 64 KB; the L2 cache may be 512 KB to 4096 KB.
40. Conclusion
- Tag, index, offset: to find matching data, support larger blocks, reduce misses.
- Where in the cache? Direct Mapped Cache.
- Conflict Misses occur if memory addresses compete.
- Fully Associative: lets memory data go into any block, so no Conflict Misses.
- Set Associative: a compromise, with simpler hardware than Fully Associative and fewer misses than Direct Mapped.
- LRU: use history to predict replacement.
- Improving the miss penalty? Add an L2 cache.
41. Virtual Memory
- If the Principle of Locality allows caches to offer (usually) the speed of cache memory with the size of DRAM memory, then why not use it at the next level to give the speed of DRAM memory with the size of disk memory?
- This is called Virtual Memory.
- It also allows the OS to share memory and to protect programs from each other.
- Today, it is more important for protection than as just another level of the memory hierarchy.
- Historically, it predates caches.
42. End of file