Title: CS1104: Computer Organisation (http://www.comp.nus.edu.sg/cs1104)
1. CS1104 Computer Organisation (http://www.comp.nus.edu.sg/cs1104)
- School of Computing
- National University of Singapore
2. PII Lecture 9: Cache
- Direct Mapped Cache
- Addressing Cache: Tag, Index, Offset Fields
- Accessing Data in Direct Mapped Cache
- Block Size Trade-off
- Types of Cache Misses
- Fully Associative Cache
- Multi-Level Cache Hierarchy
3. PII Lecture 9: Cache
- Reading
- Section 5.5.1 of Chapter 8 of the textbook, which is Chapter 5 in Computer Organization by Hamacher, Vranesic and Zaky.
4. Recap: Current Memory Hierarchy
Technology    Speed (ns)     Size (MB)     Cost ($/MB)
Regs          0.5            0.0005        --
SRAM          2              0.05          100
SRAM          6              1-4           30
DRAM          100            100-1000      1
Disk          10,000,000     100,000       0.05
5. Another View of Memory Hierarchy
6. Cache: 1st Level of Memory Hierarchy
- How do you know if something is in the cache?
- How to find it if it is in the cache?
- In a direct mapped cache, each memory address is associated with one possible block (also called a line) within the cache.
- Therefore, we only need to look in a single location in the cache for the data, if it exists in the cache.
7. Simplest Cache: Direct Mapped
[Figure: a 4-byte direct mapped cache; memory addresses 0-F map to cache locations 0-3 by cache index]
- Cache location 0 can be occupied by data from memory locations 0, 4, 8, ...
- In general, any memory location whose 2 rightmost address bits are 0s will go into cache location 0.
- Cache index = last 2 bits of the address (i.e. address AND 00...011), as in the sketch below.
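As a small aside (this code is not from the slides; the sample addresses are arbitrary), the mapping can be modelled in C by masking off the last 2 address bits:

    #include <stdio.h>

    int main(void) {
        /* 4-byte direct mapped cache with 1-byte blocks:
           the cache index is simply the last 2 bits of the address. */
        unsigned int addresses[] = {0x0, 0x4, 0x8, 0x1, 0x5, 0xA};
        for (int i = 0; i < 6; i++) {
            unsigned int index = addresses[i] & 0x3;   /* address AND 00...011 */
            printf("address 0x%X -> cache location %u\n", addresses[i], index);
        }
        return 0;
    }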
8. Tag, Index, Offset Fields
- Which memory block is in the cache? What if the block size is > 1 byte?
- Divide the memory address into 3 portions: tag, index, and byte offset within the block.
- The index tells where in the cache to look, the offset tells which byte in the block is the start of the desired data, and the tag tells whether the data in the cache corresponds to the memory address being looked for.
9. Tag, Index, Offset Fields (2)
- Assume
- 32-bit memory address
- Cache size = 2^N bytes
- Block (line) size = 2^M bytes
- Then
- The leftmost (32 - N) bits are the Cache Tag.
- The rightmost M bits are the Byte Offset.
- The remaining (N - M) bits are the Cache Index (see the sketch below).
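A minimal C sketch of this split (my own illustration, not from the slides), using the N = 14, M = 4 values of the worked example on the following slides:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        const unsigned N = 14;          /* cache size = 2^14 bytes (16 KB) */
        const unsigned M = 4;           /* block size = 2^4 bytes (16 B)   */
        uint32_t address = 0x00000014;  /* arbitrary 32-bit byte address   */

        uint32_t offset = address & ((1u << M) - 1);               /* rightmost M bits     */
        uint32_t index  = (address >> M) & ((1u << (N - M)) - 1);  /* middle N - M bits    */
        uint32_t tag    = address >> N;                            /* leftmost 32 - N bits */

        printf("tag = %u, index = %u, offset = %u\n", tag, index, offset);
        return 0;
    }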
10. Tag, Index, Offset Fields (3)
- Example: a 16 KB direct-mapped cache with blocks of 4 words each. Determine the size of the tag, index and offset fields, assuming a 32-bit architecture.
- Offset
- To identify the correct byte within a block.
- A block contains 4 words. Each word contains 4 bytes (because of the 32-bit architecture).
- Therefore a block contains 16 bytes = 2^4 bytes.
- Hence we need 4 bits for the offset field.
11. Tag, Index, Offset Fields (4)
- Index
- To identify the correct block/line in the cache.
- The cache contains 16 KB = 2^14 bytes.
- A block contains 16 bytes = 2^4 bytes.
- Therefore the cache contains 2^14 / 2^4 = 2^10 blocks.
- Hence we need 10 bits for the index field.
12. Tag, Index, Offset Fields (5)
- Tag
- To identify which of the main-memory blocks that map to a given cache block is actually present.
- Tag size = address size - offset size - index size = 32 - 4 - 10 bits = 18 bits.
- Verify: main memory contains 2^32 / 2^4 = 2^28 blocks, and the cache contains 2^10 blocks. Therefore, there are 2^28 / 2^10 = 2^18 blocks in memory that can be mapped to the same block in the cache.
- Hence we need 18 bits for the tag field (recomputed in the sketch below).
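The same field widths can be recomputed programmatically; the sketch below (my own, assuming power-of-two sizes) reproduces the 4/10/18-bit split:

    #include <stdio.h>

    /* log2 of a power of two, by repeated shifting */
    static unsigned log2u(unsigned x) {
        unsigned bits = 0;
        while (x > 1) { x >>= 1; bits++; }
        return bits;
    }

    int main(void) {
        unsigned address_bits = 32;
        unsigned cache_bytes  = 16 * 1024;  /* 16 KB cache            */
        unsigned block_bytes  = 16;         /* 4 words x 4 bytes each */

        unsigned offset_bits = log2u(block_bytes);                      /* 4  */
        unsigned index_bits  = log2u(cache_bytes / block_bytes);        /* 10 */
        unsigned tag_bits    = address_bits - offset_bits - index_bits; /* 18 */

        printf("offset = %u bits, index = %u bits, tag = %u bits\n",
               offset_bits, index_bits, tag_bits);
        return 0;
    }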
13. Direct Mapped Cache
A 64-KB cache using 4-word (16-byte) blocks.
[Figure: address breakdown showing bit positions; 1 word = 4 bytes]
14. Direct Mapped Cache: Accessing Data
- Let's go through accessing some data in a direct mapped, 16 KB cache: 16-byte blocks x 1024 cache blocks.
- Examples: 4 addresses divided (for convenience) into Tag, Index, Byte Offset fields.
15. 16 KB Direct Mapped Cache, 16 B Blocks
- Valid bit: to check whether the block holds valid data (see the lookup sketch below).
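A rough C model of one lookup in this cache (the structures and function are hypothetical, not the textbook's): check the valid bit, compare the tag, and on a hit read the byte selected by the offset.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_BLOCKS 1024   /* 16 KB cache / 16-byte blocks */
    #define BLOCK_SIZE 16

    typedef struct {
        bool     valid;                 /* valid bit                   */
        uint32_t tag;                   /* 18-bit tag (stored in 32)   */
        uint8_t  data[BLOCK_SIZE];      /* one 16-byte block of data   */
    } CacheBlock;

    static CacheBlock cache[NUM_BLOCKS];

    /* Returns true on a hit and writes the byte to *out; false on a miss. */
    bool cache_read_byte(uint32_t address, uint8_t *out) {
        uint32_t offset = address & (BLOCK_SIZE - 1);         /* bits 3..0   */
        uint32_t index  = (address >> 4) & (NUM_BLOCKS - 1);  /* bits 13..4  */
        uint32_t tag    = address >> 14;                      /* bits 31..14 */

        if (cache[index].valid && cache[index].tag == tag) {
            *out = cache[index].data[offset];  /* hit */
            return true;
        }
        return false;  /* miss: the block would be fetched from memory here */
    }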
16. Example 1: Address 000000000000000000 0000000001 0100
So we read block 1 (0000000001)
17. Example 1 (continued): Address 000000000000000000 0000000001 0100
18. Example 1 (continued): Address 000000000000000000 0000000001 0100
19. Example 2: Address 000000000000000000 0000000001 1100
20. Example 3: Address 000000000000000000 0000000011 0100
21. Example 4: Address 000000000000000010 0000000001 1000
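For reference (my own decoding, not from the slides), the four example addresses above break down as follows under the 18/10/4-bit tag/index/offset split:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        /* The four example addresses, written as 32-bit values. */
        uint32_t examples[] = {
            0x00000014,   /* Example 1 */
            0x0000001C,   /* Example 2 */
            0x00000034,   /* Example 3 */
            0x00008018,   /* Example 4 */
        };
        for (int i = 0; i < 4; i++) {
            uint32_t a = examples[i];
            printf("Example %d: tag = %u, index = %u, offset = %u\n",
                   i + 1, a >> 14, (a >> 4) & 0x3FF, a & 0xF);
        }
        return 0;
    }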
22. Block Size Trade-off
- In general, a larger block size takes advantage of spatial locality, but
- a larger block size also means a larger miss penalty (it takes longer to fill a block), and
- if the block size is too big relative to the cache size, the miss rate will go up (too few cache blocks).
- In general, minimize the average access time
- = (Hit time x Hit rate) + (Miss penalty x Miss rate), as computed in the sketch below.
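A quick illustration of this formula (the numbers here are made up, not from the slides):

    #include <stdio.h>

    int main(void) {
        /* Average access time = (hit time x hit rate) + (miss penalty x miss rate) */
        double hit_time     = 1.0;   /* cycles (assumed) */
        double hit_rate     = 0.95;  /* assumed          */
        double miss_penalty = 40.0;  /* cycles (assumed) */
        double miss_rate    = 1.0 - hit_rate;

        double avg = hit_time * hit_rate + miss_penalty * miss_rate;
        printf("average access time = %.2f cycles\n", avg);   /* 2.95 */
        return 0;
    }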
23. Extreme Case: Single Big Block!
- Cache size = 4 bytes, block size = 4 bytes
- Only one entry in the cache!
- If an item is accessed, it is likely to be accessed again soon
- But it is unlikely to be accessed again immediately!
- The next access is likely to be a miss again
- We continually load data into the cache but discard it (it is forced out) before it is used again.
- A nightmare for the cache designer: the Ping-Pong Effect.
24. Block Size Trade-off (2)
25. Types of Cache Misses
- Compulsory Misses
- occur when a program is first started
- the cache does not contain any of that program's data yet, so misses are bound to occur
- cannot be avoided easily, so we won't focus on these in this course
26. Types of Cache Misses (2)
- Conflict Misses
- a miss that occurs because two distinct memory addresses map to the same cache location
- two blocks (which happen to map to the same location) can keep overwriting each other
- a big problem in direct-mapped caches
- how do we lessen the effect of these?
27. Dealing with Conflict Misses
- Solution 1: make the cache bigger
- fails at some point
- Solution 2: let multiple distinct blocks fit in the same cache index?
28. Fully Associative Cache
- Memory address fields
- Tag: same as before
- Offset: same as before
- Index: non-existent
- What does this mean?
- no rows: any block can go anywhere in the cache
- must compare with all tags in the entire cache to see if the data is there
29. Fully Associative Cache (2)
- Fully Associative Cache (e.g., 32 B blocks)
- Compare tags in parallel
- No Conflict Misses (since data can go anywhere); a software model of the lookup follows below.
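A software model of a fully associative lookup (hypothetical structures, my own sketch): the hardware compares all tags in parallel, but software can only approximate this with a loop over every entry.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_ENTRIES 64   /* number of cache entries (assumed) */
    #define BLOCK_SIZE  32   /* 32-byte blocks, as on this slide  */

    typedef struct {
        bool     valid;
        uint32_t tag;
        uint8_t  data[BLOCK_SIZE];
    } Entry;

    static Entry cache[NUM_ENTRIES];

    /* Returns true on a hit and writes the byte to *out; false on a miss. */
    bool fully_associative_read(uint32_t address, uint8_t *out) {
        uint32_t offset = address & (BLOCK_SIZE - 1);  /* rightmost 5 bits  */
        uint32_t tag    = address >> 5;                /* remaining 27 bits */

        for (int i = 0; i < NUM_ENTRIES; i++) {        /* no index: search all entries */
            if (cache[i].valid && cache[i].tag == tag) {
                *out = cache[i].data[offset];
                return true;                           /* hit */
            }
        }
        return false;                                  /* miss */
    }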
30. Third Type of Cache Miss
- Capacity Misses
- a miss that occurs because the cache has a limited size
- a miss that would not occur if we increased the size of the cache
- This is the primary type of miss for Fully Associative caches.
31. Fully Associative Cache (3)
- Drawbacks of the Fully Associative Cache
- a hardware comparator is needed for every single entry: if we have 64 KB of data in the cache with 4 B entries, we need 16K comparators, which is infeasible
- Set-Associative Cache: combines the features of the direct-mapped cache and the fully associative cache.
32. Cache Replacement Algorithms
- In a fully associative cache, when the cache is full and a new block is to be loaded into the cache, which block should it replace? An algorithm is needed.
- LRU (Least Recently Used) algorithm: replace the block that was accessed least recently.
- LFU (Least Frequently Used) algorithm: replace the block that is accessed least frequently.
33. Cache Replacement Algorithms (2)
- Replace-Oldest-Block algorithm: replace the block that has been in the cache the longest.
- Random algorithm: replace a block chosen at random. (An LRU sketch follows below.)
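A simplified LRU sketch (my own model, not the textbook's) for a small fully associative cache: each entry records when it was last used, and on a miss the oldest entry (or an empty one) is replaced.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_ENTRIES 8

    typedef struct {
        bool     valid;
        uint32_t tag;
        uint64_t last_used;   /* time of the most recent access */
    } Entry;

    static Entry    cache[NUM_ENTRIES];
    static uint64_t now = 0;

    /* Returns true on a hit; on a miss, loads the tag into the LRU (or an empty) entry. */
    bool access_block(uint32_t tag) {
        now++;
        int victim = 0;
        for (int i = 0; i < NUM_ENTRIES; i++) {
            if (cache[i].valid && cache[i].tag == tag) {
                cache[i].last_used = now;   /* hit: refresh this entry's age */
                return true;
            }
            /* Track the replacement candidate: prefer empty entries, then the oldest. */
            if (!cache[i].valid ||
                (cache[victim].valid && cache[i].last_used < cache[victim].last_used))
                victim = i;
        }
        cache[victim].valid     = true;     /* miss: replace the victim */
        cache[victim].tag       = tag;
        cache[victim].last_used = now;
        return false;
    }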
34. Improving Caches
- In general, minimize the average access time
- = (Hit time x Hit rate) + (Miss penalty x Miss rate)
- So far, we have looked at improving the Hit Rate:
- larger block size
- larger cache
- higher associativity
- What about the Miss Penalty?
35. Improving Miss Penalty
- When caches started becoming popular, the Miss Penalty was about 10 processor clock cycles.
- Today: a 500 MHz processor (2 nanoseconds per clock cycle) and 200 ns to go to DRAM, i.e. 100 processor clock cycles!
- Solution: place another cache between memory and the processor cache: the Second Level (L2) Cache.
36. Multi-Level Cache Hierarchy
- We consider the L2 hit and miss times to include the cost of not finding the data in the L1 cache.
- Similarly, the L2 cache hit rate is only for accesses which actually make it to the L2 cache.
37. Multi-Level Cache Hierarchy: Calculations for the L1 Cache
- Access time = L1 hit time x L1 hit rate + L1 miss penalty x L1 miss rate
- We simply take the L1 miss penalty to be the access time of the L2 cache.
- Access time = L1 hit time x L1 hit rate + (L2 hit time x L2 hit rate + L2 miss penalty x L2 miss rate) x L1 miss rate.
38. Multi-Level Cache Hierarchy: Calculations for the L1 Cache (2)
- Assumptions
- L1 hit time = 1 cycle, L1 hit rate = 90%
- L2 hit time (also the L1 miss penalty) = 4 cycles, L2 miss penalty = 100 cycles, L2 hit rate = 90%
- Access time = L1 hit time x L1 hit rate + (L2 hit time x L2 hit rate + L2 miss penalty x (1 - L2 hit rate)) x L1 miss rate
- = 1 x 0.9 + (4 x 0.9 + 100 x 0.1) x (1 - 0.9) = 0.9 + 13.6 x 0.1 = 2.26 clock cycles (reproduced in the sketch below)
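The same calculation in C (the parameters are the slide's stated assumptions):

    #include <stdio.h>

    int main(void) {
        double l1_hit_time = 1.0,  l1_hit_rate = 0.90;
        double l2_hit_time = 4.0,  l2_hit_rate = 0.90;
        double l2_miss_penalty = 100.0;

        /* The L1 miss penalty is the L2 access time. */
        double l1_miss_penalty = l2_hit_time * l2_hit_rate
                               + l2_miss_penalty * (1.0 - l2_hit_rate);   /* 13.6 */
        double access_time = l1_hit_time * l1_hit_rate
                           + l1_miss_penalty * (1.0 - l1_hit_rate);       /* 2.26 */

        printf("average access time = %.2f clock cycles\n", access_time);
        return 0;
    }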
39. What Would It Be Without an L2 Cache?
- Assume that the L1 miss penalty would then be 100 clock cycles.
- Access time = 1 x 0.9 + 100 x 0.1 = 10.9 clock cycles, vs. 2.26 with an L2 cache.
- So we gain a benefit from having the second, larger cache before main memory.
- Today's L1 cache size: 16 KB to 64 KB; the L2 cache may be 512 KB to 4096 KB.
40. Conclusion
- Tag, index, offset: to find matching data, support larger blocks, reduce misses.
- Where in the cache? Direct Mapped Cache.
- Conflict Misses occur if memory addresses compete.
- Fully Associative: lets memory data go into any block, so no Conflict Misses.
- Set Associative: a compromise, with simpler hardware than Fully Associative and fewer misses than Direct Mapped.
- LRU: use history to predict replacement.
- Improving the miss penalty? Add an L2 cache.
41. Virtual Memory
- If the Principle of Locality allows caches to offer (usually) the speed of cache memory with the size of DRAM memory, then why not use it at the next level to give the speed of DRAM memory with the size of disk memory?
- This is called Virtual Memory.
- It also allows the OS to share memory and to protect programs from each other.
- Today, it is more important for protection than as just another level of the memory hierarchy.
- Historically, it predates caches.
42. End of file