Lecture 13: Caches
Transcript and Presenter's Notes



1
Lecture 13: Caches
  • Prof. Kenneth M. Mackenzie
  • Computer Systems and Networks
  • CS2200, Spring 2003

Includes slides from Bill Leahy
2
Review
  • Page tables
  • data structures!
  • Virtual memory
  • what happens when pages > frames
  • policy questions
  • fetch policy
  • replacement policy
  • Performance of virtual memory
  • phenomenon of locality, working sets
  • average-memory-access-time (AMAT)

3
Page Fault
(Diagram: page-fault handling — the CPU's reference hits an invalid page-table entry (i); the operating system brings the page in from disk to a frame in physical memory.)
4
Today: Caches and the full memory hierarchy
5
Problem: 1. we want a big memory; 2. big memory is slow
Processor
Memory
6
Memory Background
(Diagram: a RAM array — the address in drives a row decoder that asserts a wordline; each storage cell sits on a bitline; sense amplifiers and a column mux produce the data in/out.)
7
Itanium 2 (McKinley) Die Photo
(source: Microprocessor Report)
8
Pick Your Storage Cells
  • DRAM
  • "dynamic": must be refreshed
  • densest technology; cost/bit is paramount
  • SRAM
  • "static": value is stored in a latch
  • fastest technology: 8-16x faster than DRAM
  • larger cell: 4-8x larger
  • more expensive: 8-16x more per bit
  • others
  • EEPROM/Flash: high density, non-volatile
  • core...

9
Main Memory Deep Background
  • Out-of-Core, In-Core, Core Dump?
  • Core: stores a bit as magnetic state (ca. 1955-75)
  • Non-volatile, also radiation-resistant
  • Replaced by the 4 Kbit DRAM (current is 256 Mbit)
  • Access time 750 ns, cycle time 1500-3000 ns

10
Pre-core Memory Technology: Mercury Delay Lines!
shift register via acoustic wave in a tube of mercury
Maurice Wilkes, Computing Perspectives
11
Problem: 1. we want a big memory; 2. big memory is slow
Processor
Memory
12
How big is the problem?
Processor-DRAM Memory Gap (latency)
(Chart, 1980-2000, log performance scale: CPU performance ("Moore's Law") grows at 60%/yr (2X/1.5yr) while DRAM performance grows at 9%/yr (2X/10yrs), so the gap widens every year.)
13
"Ideally one would desire an indefinitely large capacity memory such that any particular...word would be immediately available.... We are...forced to recognize the possibility of constructing a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible."
  • A. W. Burks, H. H. Goldstine, and J. von Neumann
  • Preliminary Discussion of the Logical Design of an Electronic Computing Instrument, 1946

14
Solution: Small memory unit closer to the processor
Processor
small, fast memory
BIG SLOW MEMORY
15
Terminology
Processor
upper level (the cache)
small, fast memory
Memory
lower level (sometimes called backing store)
BIG SLOW MEMORY
16
Terminology
Processor
hit rate: fraction of accesses resulting in hits.
A hit: block found in the upper level
small, fast memory
Memory
BIG SLOW MEMORY
17
Terminology
Processor
hit rate: fraction of accesses resulting in hits.
A miss: not found in the upper level, must look in the lower level
small, fast memory
Memory
miss rate = (1 - hit_rate)
BIG SLOW MEMORY
18
Terminology Summary
  • Hit: data appears in some block in the upper
    level (example: Block X in the cache)
  • Hit Rate: the fraction of memory accesses found in
    the upper level
  • Hit Time: time to access the upper level, which
    consists of
  • RAM access time + time to determine hit/miss
  • Miss: data must be retrieved from a block in
    the lower level (example: Block Y in memory)
  • Miss Rate = 1 - (Hit Rate)
  • Miss Penalty: extra time to replace a block in
    the upper level
  • + time to deliver the block to the processor
  • Hit Time << Miss Penalty (a miss costs ~500
    instructions on the 21264)

19
Average Memory Access Time
AMAT = HitTime + (1 - h) x MissPenalty
  • Hit time: basic time of every access
  • Hit rate (h): fraction of accesses that hit
  • Miss penalty: extra time to fetch a block from
    the lower level, including time to replace it in the CPU

20
The Full Memory Hierarchy: always reuse a good idea
(faster and costlier toward the upper level, larger and slower toward the lower level)
  • Registers (upper level): 100s bytes, <10s ns; staged by prog./compiler in 1-8 byte Instr. Operands
  • Cache: K bytes, 10-100 ns, 1-0.1 cents/bit; staged by the cache cntl in 8-128 byte Blocks
  • Main Memory: M bytes, 200-500 ns, .0001-.00001 cents/bit; staged by the OS in 4K-16K byte Pages
  • Disk: G bytes, 10 ms (10,000,000 ns), 10^-5 - 10^-6 cents/bit; staged by the user/operator in Mbyte Files
  • Tape (lower level): infinite capacity, sec-min, 10^-8 cents/bit
21
Virtual Memory
  • Virtual memory is a kind of cache DRAM is used
    as a cache for disk.
  • Why does it work?
  • How did it work?

22
Virtual Memory
  • Virtual memory is a kind of cache DRAM is used
    as a cache for disk.
  • Why does it work?
  • locality! the phenomenon of locality means that you
    tend to reuse the same locations
  • How did it work?
  • 1. find block in upper level (DRAM) via page
    table (a map)
  • 2. replace least-recently-used (LRU) page on a
    miss

23
Virtual Memory
  • Timing was tough with virtual memory
  • AMAT = Tmem + (1-h) x Tdisk
  • = 100 ns + (1-h) x 25,000,000 ns
  • h (hit rate) had to be incredibly (almost
    unattainably) close to perfect to work
  • so VM is a cache but an odd one.

24
Hardware Cache: Timing is much more feasible
Processor
cache
1nS
AMAT = Thit + (1-h) x Tmem = 1 ns + (1-h) x 100 ns; a hit rate of 98% would yield an AMAT of 3 ns ... pretty good!
BIG SLOW MEMORY
100nS
25
Hardware Cache: How do you find things in the upper level?
Processor
cache
1nS
don't have much time!
BIG SLOW MEMORY
100nS
26
One way
  • Have a scheme that allows the contents of a main
    memory address to be found in exactly one place
    in the cache.
  • Remember: the cache is smaller than the level
    below it, thus multiple locations could map to
    the same place
  • Severe restriction! But let's see what we can do
    with it...

27
One way
Example: Looking for location 10011 (19). Look in
011 (3), since 3 = 19 MOD 8
28
One way
If there are four possible locations in
memory which map into the same location in
our cache...
29
One way
TAG
000 001 010 011 100 101 110 111
We can add tags which tell us if we have a match.
00 00 00 10 00 00 00 00
30
One way
TAG
000 001 010 011 100 101 110 111
But there is still a problem! What if we haven't
put anything into the cache? The 00 (for
example) will confuse us.
00 00 00 00 00 00 00 00
31
One way
V
000 001 010 011 100 101 110 111
Solution: Add a valid bit
0 0 0 0 0 0 0 0
32
One way
V
000 001 010 011 100 101 110 111
Now if the valid bit is set, our match is good
0 0 0 1 0 0 0 0
33
Basic Algorithm
  • Assume we want the contents of location M
  • Calculate CacheAddr = M MOD CacheSize
  • Calculate TargetTag = M / CacheSize
  • if (Valid[CacheAddr] == SET
  •     and Tag[CacheAddr] == TargetTag)
  •     return Data[CacheAddr]  (hit)
  • else  (miss)
  •     Fetch contents of location M from backup memory
  •     Put in Data[CacheAddr]
  •     Update Tag[CacheAddr] and Valid[CacheAddr]
34
Questions?
35
Example
  • Cache is initially empty
  • We get following sequence of memory references
  • 10110
  • 11010
  • 10110
  • 11010
  • 10000
  • 00011
  • 10000
  • 10010

36
Example
Initial condition
Index: 000 001 010 011 100 101 110 111
Tag:    00  00  00  00  00  00  00  00
Valid:   0   0   0   0   0   0   0   0
(Memory: locations 00000-11111; each maps to index = address MOD 8)
37
Example
10110 → Result?
Index: 000 001 010 011 100 101 110 111
Tag:    00  00  00  00  00  00  00  00
Valid:   0   0   0   0   0   0   0   0
38
Example
10110 → Miss
Index: 000 001 010 011 100 101 110 111
Tag:    00  00  00  00  00  00  10  00
Valid:   0   0   0   0   0   0   1   0
39
Example
11010 → Result?
Index: 000 001 010 011 100 101 110 111
Tag:    00  00  00  00  00  00  10  00
Valid:   0   0   0   0   0   0   1   0
40
Example
11010 → Miss
Index: 000 001 010 011 100 101 110 111
Tag:    00  00  11  00  00  00  10  00
Valid:   0   0   1   0   0   0   1   0
41
Example
10110 → Result?
Index: 000 001 010 011 100 101 110 111
Tag:    00  00  11  00  00  00  10  00
Valid:   0   0   1   0   0   0   1   0
42
Example
10110 → Hit
Index: 000 001 010 011 100 101 110 111
Tag:    00  00  11  00  00  00  10  00
Valid:   0   0   1   0   0   0   1   0
43
Example
11010 → Result?
Index: 000 001 010 011 100 101 110 111
Tag:    00  00  11  00  00  00  10  00
Valid:   0   0   1   0   0   0   1   0
44
Example
11010 → Hit
Index: 000 001 010 011 100 101 110 111
Tag:    00  00  11  00  00  00  10  00
Valid:   0   0   1   0   0   0   1   0
45
Example
10000 → Result?
Index: 000 001 010 011 100 101 110 111
Tag:    00  00  11  00  00  00  10  00
Valid:   0   0   1   0   0   0   1   0
46
Example
10000 → Miss
Index: 000 001 010 011 100 101 110 111
Tag:    10  00  11  00  00  00  10  00
Valid:   1   0   1   0   0   0   1   0
47
Example
00011 → Result?
Index: 000 001 010 011 100 101 110 111
Tag:    10  00  11  00  00  00  10  00
Valid:   1   0   1   0   0   0   1   0
48
Example
00011 → Miss
Index: 000 001 010 011 100 101 110 111
Tag:    10  00  11  00  00  00  10  00
Valid:   1   0   1   1   0   0   1   0
49
Example
10000 → Result?
Index: 000 001 010 011 100 101 110 111
Tag:    10  00  11  00  00  00  10  00
Valid:   1   0   1   1   0   0   1   0
50
Example
10000 → Hit
Index: 000 001 010 011 100 101 110 111
Tag:    10  00  11  00  00  00  10  00
Valid:   1   0   1   1   0   0   1   0
51
Example
10010 → Result?
Index: 000 001 010 011 100 101 110 111
Tag:    10  00  11  00  00  00  10  00
Valid:   1   0   1   1   0   0   1   0
52
Example
10010 → Miss
Index: 000 001 010 011 100 101 110 111
Tag:    10  00  10  00  00  00  10  00
Valid:   1   0   1   1   0   0   1   0
53
Hardware Cache Variations
  • 1. Block Size
  • 2. Associativity
  • 3. Write policy
  • 4. Multiple caches?

54
1. Block Size
  • Wouldn't make much sense to have a different
    entry for every byte!
  • Block = the number of bytes sharing the same tag.

55
1 KB Direct Mapped Cache, 32B blocks
  • For a 2^N byte cache:
  • The uppermost (32 - N) bits are always the Cache
    Tag
  • The lowest M bits are the Byte Select (Block Size
    = 2^M)

Address (bits 31...0): Cache Tag (example: 0x50) | Cache Index (ex: 0x01) | Byte Select (ex: 0x00), with fields split at bits 9 and 4
(Diagram: 32 cache lines, each storing a valid bit and cache tag as part of the cache state plus 32 bytes of cache data — line 0 holds Byte 0...Byte 31, line 1, with tag 0x50, holds Byte 32...Byte 63, ..., line 31 holds Byte 992...Byte 1023.)
56
Block Size
  • How big should the block size be?
  • (i.e. what happens as you change the block size?)

57
Misses via Block Size
(Chart: miss rate (%) vs. block size, 16-256 bytes, for cache sizes 1K, 4K, 16K, 64K, 256K. Miss rate drops at first as block size grows, then climbs again, most sharply for the small caches.)
Why does it get better at first? Why does it get worse later?
58
2. Associativity
  • Requiring that every memory location be cacheable
    in exactly one place (direct-mapped) was simple
    but incredibly limiting
  • How can we relax this constraint?

59
Associativity
  • Block 12 placed in an 8-block cache
  • Fully associative, direct mapped, 2-way set
    associative
  • S.A. mapping = Block Number Modulo Number of Sets

Direct Mapped: (12 mod 8) = 4
2-Way Assoc: (12 mod 4) = set 0
Fully Mapped: any block
(Diagram: cache and memory)
60
Two-way Set Associative Cache
  • N-way set associative: N entries for each Cache
    Index
  • N direct-mapped caches operate in parallel (N
    typically 2 to 4)
  • Example: Two-way set associative cache
  • Cache Index selects a set from the cache
  • The two tags in the set are compared in parallel

(Diagram: the Cache Index selects a set; the Adr Tag is compared against both cache tags in parallel; the compare outputs Sel0/Sel1 drive a mux that picks the hitting Cache Block, and their OR produces the Hit signal.)
Advantage: typically exhibits a hit rate equal to a 2X-sized direct-mapped cache.
61
Disadvantage of Set Associative Cache
  • N-way Set Associative Cache v. Direct Mapped
    Cache
  • N comparators vs. 1
  • Extra MUX delay for the data
  • Data comes AFTER Hit/Miss

62
Associativity
  • If you have associativity > 1 you have to have a
    replacement policy (like VM!)
  • FIFO
  • LRU
  • random
  • "Full" or "full-map" associativity means you
    check every tag in parallel and a memory block
    can go into any cache block
  • virtual memory is effectively fully associative

63
3. Write Policy
  • Write through: The information is written to both
    the block in the cache and the block in the
    lower-level memory.
  • Write back: The information is written only to the
    block in the cache. The modified cache block is
    written to main memory only when it is replaced.
  • need to remember whether the block is clean or dirty
    (a dirty bit with each tag, in addition to the
    valid bit)
  • Pros of each:
  • WB: no repeated writes to the same location
  • WT: read misses cannot result in writes

64
4. Multiple Caches
  • Caches for different purposes
  • instructions vs. data
  • Multiple levels of caches
  • L1, L2, etc.

65
Separate Instruction and Data
Not two memories, just two caches!
(Diagram: the five-stage pipeline — IF, ID, EX, MEM, WB — with the Instr Cache feeding the fetch stage and the Data Cache in the MEM stage.)
66
Multilevel caches: Recall the 1-level cache numbers
Processor
cache
1nS
AMAT = Thit + (1-h) x Tmem = 1 ns + (1-h) x 100 ns; a hit rate of 98% would yield an AMAT of 3 ns ... pretty good!
BIG SLOW MEMORY
100nS
67
Multilevel Cache: Add a medium-size, medium-speed L2
Processor
AMAT = Thit_L1 + (1-h_L1) x Thit_L2 + (1-h_L1) x (1-h_L2) x Tmem
A hit rate of 98% in L1 and 95% in L2 would yield an AMAT of 1 + 0.2 + 0.1 = 1.3 ns -- outstanding!
L1 cache
1nS
L2 cache
10nS
BIG SLOW MEMORY
100nS
68
Cache Mechanics Summary
  • Basic action
  • look up block
  • check tag
  • select byte from block
  • Block size
  • Associativity
  • Write Policy

69
Great Cache Questions
  • How do you use the processor's address bits to
    look up a value in a cache?
  • How many bits of storage are required in a cache
    with a given organization?

70
Great Cache Questions
  • How do you use the processor's address bits to
    look up a value in a cache?
  • How many bits of storage are required in a cache
    with a given organization?
  • E.g. 64KB, direct-mapped, 16B blocks, write-back:
  • 64K x 8 bits for the data
  • 4K x (16 + 1 + 1) bits for the tag, valid and dirty bits

(Address fields: tag | index | offset)
71
More Great Cache Questions
  • Suppose you have a loop like this:
  • What's the hit rate in a 64KB/direct/16B-block
    cache?

char a[1024][1024];
for (i = 0; i < 1024; i++)
  for (j = 0; j < 1024; j++)
    ... a[i][j] ...
72
More Great Cache Questions
  • Suppose instead the loop is like this:
  • What's the hit rate in a 64KB/direct/16B-block
    cache?

char a[1024][1024];
for (i = 0; i < 1024; i++)
  for (j = 0; j < 1024; j++)
    ... a[j][i] ...
73
Bonus Slides
74
Intel P6 Core
  • Core of the PPro, PII, PIII
  • Split I + D L1 (8K + 8K)
  • Split I + D TLBs (32 + 64)
  • L2 originally on the MCM
  • PIII has a 2x L2 cache
  • Xeon has a big L2
  • Aside: note the 5 FUs,
  • 3-wide fetch unit,
  • 40-entry ROB

75
Itanium 2 (McKinley) Die Photo
L2 cache: 256 KB
L3 cache: 3 MB
(source: Microprocessor Report)
76
Program Miss Rate Characteristics
  • What can you tell about a program from its miss
    rates?
  • Run your favorite cache simulator and see...
  • Three sample programs:
  • SOR: Jacobi relaxation -- averages points on a
    2D grid
  • IJPEG: integer version of JPEG compression
  • CC1: guts of the gcc compiler
  • Warning: these benchmarks and simulator are
    slightly different from those used in Project 1

77
Cache Misses in SOR: percent misses vs. cache size (split I/D, direct-mapped, 16B blocks)
key data structure: 100x100 grid of doubles (80K bytes)
78
Cache Misses in IJPEG: percent misses vs. cache size (split I/D, direct-mapped, 16B blocks)
data input: 938x636 array of 24-bit pixels (1.8 Mbytes)
79
Cache Misses in CC1: percent misses vs. cache size (split I/D, direct-mapped, 16B blocks)