Title: Computer Architecture and Organization
1. Computer Architecture and Organization
- Module 8: The Memory Hierarchy
- Ben Juurlink
- Delft University of Technology
- April-May 2001
- Additional information: http://ce.et.tudelft.nl/benj/Courses/CAO
2. Objectives
- After this lecture, you should be able to:
- describe the basics of caches
- describe temporal and spatial locality
- describe direct-mapped caches and how data items are found in such a cache
- describe set-associative caches and how data items are found in them
- given a cache description, compute the number of sets and how large the tag is
- given a sequence of addresses, compute the miss rate
- use several cache replacement strategies
- describe virtual memory
- translate virtual addresses to physical addresses
- given a sequence of page references, compute the number of page faults
3. Memory Hierarchy, why?
- Users want large and fast memories!

  Type   Access time       Cost/MB (in 1997)
  SRAM   2-25 ns           $100-$250
  DRAM   60-120 ns         $5-$10
  Disk   10-20 million ns  $0.10-$0.20

- Try and give it to them anyway: build a memory hierarchy
4. Locality
- A principle that makes having a memory hierarchy a good idea
- If an item is referenced:
- temporal locality: it will tend to be referenced again soon
- spatial locality: nearby items will tend to be referenced soon
- Why does code have locality?
- Our initial focus: two levels (upper, lower)
- block: minimum unit of data
- hit: data requested is in the upper level
- miss: data requested is not in the upper level
5. Cache
- Two issues:
- How do we know if a data item is in the cache?
- If it is, how do we find it?
- Our first example:
- block size is one word of data
- "direct mapped"
- For each item of data at the lower level, there is exactly one location in the cache where it might be; e.g., lots of items at the lower level share locations in the upper level
6. Direct Mapped Cache
- Mapping (see the sketch below):
- block address = (byte address) div (block size in bytes)
- cache address = (block address) mod (cache size in blocks)
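As an illustration (not on the original slide), a minimal sketch of this mapping in Python; the function names and example parameters are assumptions:

```python
def block_address(byte_address, block_size_bytes):
    # block address = byte address div block size (integer division)
    return byte_address // block_size_bytes

def cache_address(byte_address, block_size_bytes, cache_size_blocks):
    # cache address = block address mod number of blocks in the cache
    return block_address(byte_address, block_size_bytes) % cache_size_blocks

# Example: one-word (4-byte) blocks, 8-block cache:
# byte address 44 -> block address 11 -> cache address 3
print(cache_address(44, 4, 8))
```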
7. Direct Mapped Cache Organization
- [Figure: address bit positions 31-0 for a direct-mapped cache with 1024 one-word entries. The 32-bit address is split into a 20-bit tag (bits 31-12), a 10-bit index (bits 11-2), and a 2-bit byte offset (bits 1-0). Each entry (index 0 through 1023) holds a valid bit, a tag, and 32 bits of data; comparing the stored tag with the address tag produces the hit signal.]
- For MIPS
- What kind of locality are we taking advantage of?
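A hedged sketch (not from the slides) of decoding an address under the 20/10/2 split shown above; the cache is modeled as a hypothetical list of (valid, tag, data) entries:

```python
def decode(address):
    byte_offset = address & 0x3        # bits 1-0
    index = (address >> 2) & 0x3FF     # bits 11-2: 10 bits -> 1024 entries
    tag = address >> 12                # bits 31-12: 20 bits
    return tag, index, byte_offset

def lookup(cache, address):
    # cache: list of 1024 (valid, tag, data) tuples
    tag, index, _ = decode(address)
    valid, stored_tag, data = cache[index]
    return data if (valid and stored_tag == tag) else None  # None = miss
```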
8. Hits vs. Misses
- Read hits
- this is what we want!
- Read misses
- stall the CPU, fetch the block from memory, deliver it to the cache, restart the load instruction
- Write hits
- can replace data in cache and memory (write-through)
- write the data only into the cache, and write it back to memory later (write-back)
- Write misses
- read the entire block into the cache, then write the word (allocate on write miss)
- do not read the cache line, just write to memory (no allocate on write miss)
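A minimal sketch (mine, not the slides') contrasting the two write-hit policies; the cache entry layout and the dirty-bit array are assumed structures:

```python
WRITE_THROUGH = True  # set to False for write-back

def handle_write_hit(cache, dirty, memory, index, tag, address, value):
    cache[index] = [True, tag, value]  # entry layout: [valid, tag, data]
    if WRITE_THROUGH:
        memory[address] = value        # memory is updated on every write
    else:
        dirty[index] = True            # block is written back only on eviction
```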
9. Direct Mapped Cache
- Taking advantage of spatial locality
- [Figure: address bit positions for a direct-mapped cache with multi-word blocks; a block-offset field in the address selects a word within the block]
10. Hardware Issues
- Make reading multiple words faster by using multiple banks of memory
11. Performance
- Increasing the block size tends to decrease the miss rate but increases the miss penalty
12. Split caches
- Split cache: separate caches for instructions (code) and data
- Useful because there is more spatial locality in code
13. Impact of Cache Performance on Execution Time
- T_exec = N_inst × CPI × T_cycle
- where
- CPI = CPI_ideal + CPI_stall
- CPI_stall = (reads per instruction) × miss-rate_read × miss-penalty_read + (writes per instruction) × miss-rate_write × miss-penalty_write
- or
- T_exec = (N_normal-cycles + N_stall-cycles) × T_cycle
- where
- N_stall-cycles = N_reads × miss-rate_read × miss-penalty_read + N_writes × miss-rate_write × miss-penalty_write + (write-buffer stalls)
14. Impact of Cache Performance on Execution Time
T_exec = (N_normal-cycles + N_stall-cycles) × T_cycle, where
N_stall-cycles = N_access × miss-rate × miss-penalty
15. Performance example
- Assume the GCC application (page 311)
- I-cache miss rate: 2%
- D-cache miss rate: 4%
- CPI_ideal = 2.0
- Miss penalty: 40 cycles
- Calculate the CPI:
- CPI = 2.0 + CPI_stall
- N_stall-cycles = (instruction miss cycles) + (data miss cycles)
- Instruction miss cycles = N_instr × 0.02 × 40 = 0.80 × N_instr
- loads and stores are 36% of instructions
- Data miss cycles = N_instr × 0.36 × 0.04 × 40 = 0.576 × N_instr
- Total cycles = 3.376 × N_instr, so CPI = 3.376
- Slowdown: 1.688 !!
16. Performance example (continued)
- What if the ideal processor had CPI = 1.0 (instead of 2.0)?
- Slowdown would be 2.38 !
- What if the processor is clocked twice as fast?
- -> miss penalty becomes 80 cycles
- CPI = 4.752
- Speedup = (N × CPI_a × T_clock) / (N × CPI_b × T_clock/2) = 3.376 / (4.752/2)
- Speedup is not 2, but only 1.42 !! (see the worked script below)
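A short script (not part of the slides) that reproduces the numbers from both example slides; the 2%, 4%, 36%, and 40-cycle figures come from the slides above:

```python
def cpi_with_stalls(cpi_ideal, miss_penalty,
                    i_miss=0.02, d_miss=0.04, ld_st_frac=0.36):
    # stalls per instruction = instruction-fetch misses + data-access misses
    stall = i_miss * miss_penalty + ld_st_frac * d_miss * miss_penalty
    return cpi_ideal + stall

base = cpi_with_stalls(2.0, 40)   # 3.376
print(base / 2.0)                 # slowdown vs. CPI_ideal = 2.0 -> 1.688
print(cpi_with_stalls(1.0, 40))   # 2.376 -> slowdown ~2.38 if CPI_ideal = 1.0
fast = cpi_with_stalls(2.0, 80)   # 4.752 when the clock rate doubles
print(base / (fast / 2))          # speedup ~1.42, not 2
```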
17. Improving performance
- Two ways of improving performance:
- decrease the miss ratio: associativity
- decrease the miss penalty: multilevel caches
- Active Learning: What happens if we increase the block size?
18. Decrease miss ratio using associative caches
- [Figure: cache organizations with 2, 4, and 8 blocks per set]
19. Implementation: 4-way associative
20. Active Learning
- Useful formula:
- (cache size) = (number of sets) × associativity × (block size)
- Active learning: given the following
- cache size: 4 KB
- associativity: 4
- block size: 4 words
- word: 4 bytes
- how many sets are there in the cache? (worked out in the sketch below)
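A quick worked check (the answer is not on the slide; the arithmetic below is mine):

```python
cache_size = 4 * 1024              # 4 KB in bytes
associativity = 4
block_size = 4 * 4                 # 4 words x 4 bytes = 16 bytes
num_sets = cache_size // (associativity * block_size)
print(num_sets)                    # 64 sets
```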
21. Replacement
- Which block is replaced on a cache miss?
- Cache replacement strategies (sketched in code below):
- Random: pick a block at random
- First-In-First-Out (FIFO): replace the block that has been in the cache longest
- Least Recently Used (LRU): replace the block that has not been used for the longest time
- Optimal algorithm (MIN): replace the block that will not be used for the longest time
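A hedged sketch of how the first three policies choose a victim within one set; the per-block load_time and last_use timestamps are hypothetical bookkeeping, and MIN is omitted because it needs knowledge of future references:

```python
import random

def choose_victim(blocks, policy):
    # blocks: list of dicts like {"load_time": t0, "last_use": t1}
    if policy == "random":
        return random.randrange(len(blocks))
    if policy == "fifo":    # evict the block loaded earliest
        return min(range(len(blocks)), key=lambda i: blocks[i]["load_time"])
    if policy == "lru":     # evict the block used least recently
        return min(range(len(blocks)), key=lambda i: blocks[i]["last_use"])
    raise ValueError("unknown policy: " + policy)
```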
22. Performance
- [Figure: miss rates for 1 KB, 2 KB, and 8 KB caches]
23. Multilevel Caches
- Add a second-level cache:
- the primary cache is often on the same chip as the processor
- use SRAMs to add another cache above primary memory (DRAM)
- the miss penalty goes down if the data is in the 2nd-level cache
- Example (worked out below):
- CPI of 1.0 on a 500 MHz machine with a 5% miss rate and 200 ns DRAM access
- adding a 2nd-level cache with 20 ns access time decreases the miss rate to 2%
- Using multilevel caches:
- try and optimize the hit time on the 1st-level cache
- try and optimize the miss rate on the 2nd-level cache
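A sketch of the usual way this example is worked (the arithmetic is mine; I assume the 2% is the global miss rate that still goes to DRAM): at 500 MHz a cycle is 2 ns, so the 200 ns DRAM access costs 100 cycles and the 20 ns L2 access costs 10 cycles:

```python
cycle_ns = 2.0                                    # 1 / 500 MHz
dram_penalty = 200 / cycle_ns                     # 100 cycles
l2_penalty = 20 / cycle_ns                        # 10 cycles

cpi_no_l2 = 1.0 + 0.05 * dram_penalty             # 6.0
cpi_l2 = 1.0 + 0.05 * l2_penalty + 0.02 * dram_penalty  # 3.5
print(cpi_no_l2 / cpi_l2)                         # ~1.7x faster with L2
```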
24. Virtual Memory
- Main memory can act as a cache for secondary storage (disk)
- Advantages:
- illusion of having more physical memory
- program relocation
- protection
- [Figure: virtual memory pages mapped onto physical memory]
25. Pages and Page Table
- Pages: virtual memory blocks
- Page table: mapping of virtual page numbers to physical page numbers
- [Figure: a page table with a valid bit per entry maps virtual pages 0-63 to physical pages 0-3; pages with valid = 0 are not resident in physical memory]
26. Page Faults
- Page fault: the data is not in memory, retrieve it from disk
- huge miss penalty, thus pages should be fairly large (e.g., 4 KB)
- reducing page faults is important (LRU is worth the price)
- faults can be handled in software (OS) instead of hardware
- write-through is too expensive, use write-back
27. Page Tables
28. Making Address Translation Fast
- A cache for address translations: the translation lookaside buffer (TLB)
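An illustrative sketch (structures and sizes are my assumptions, not from the slides) of translating a virtual address through a TLB, falling back to the page table on a TLB miss, with 4 KB pages:

```python
PAGE_SIZE = 4096  # 4 KB pages: the low 12 bits are the page offset

def translate(vaddr, tlb, page_table):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:                        # TLB hit: no page-table access
        ppn = tlb[vpn]
    else:                                 # TLB miss: consult the page table
        valid, ppn = page_table[vpn]
        if not valid:
            raise RuntimeError("page fault: OS fetches the page from disk")
        tlb[vpn] = ppn                    # cache the translation for next time
    return ppn * PAGE_SIZE + offset
```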
29. Active Learning
- Suppose there is room for 3 pages in memory and the processor references the following pages:
- 7 0 1 2 0 3 0 4 2 3 0 3 2 1
- How many page faults occur (assuming LRU replacement)? (A simulator sketch follows.)
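A minimal LRU simulator (mine, not from the slides) that counts the faults for this reference string:

```python
def lru_page_faults(refs, num_frames):
    resident = []                     # least recently used page first
    faults = 0
    for page in refs:
        if page in resident:
            resident.remove(page)     # hit: just refresh recency
        else:
            faults += 1               # miss: page fault
            if len(resident) == num_frames:
                resident.pop(0)       # evict the least recently used page
        resident.append(page)         # page is now most recently used
    return faults

print(lru_page_faults([7,0,1,2,0,3,0,4,2,3,0,3,2,1], 3))  # -> 10
```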
30. Modern Systems
- First-level cache organization
- [Figure: Pentium Pro dual-chip module]
31. Modern Systems
- Very complicated memory systems
- Virtual memory
32. Research Issues
- Processor speeds continue to increase very fast, much faster than either DRAM or disk access times
- Design challenge: dealing with this growing disparity
- Trends:
- synchronous SRAMs (provide a burst of data)
- redesign DRAM chips to provide higher bandwidth or processing
- restructure code to increase locality
- use prefetching (make the cache visible to the ISA)
33. Active Learning
- Suggested exercises from Chapter 7:
- 7.2, 7.3
- 7.7-7.10
- 7.20, 7.21
- 7.27, 7.32