1
Computer Architecture and Organization
  • Module 8: The Memory Hierarchy
  • Ben Juurlink
  • Delft University of Technology
  • April-May 2001
  • Additional information: http://ce.et.tudelft.nl/benj/Courses/CAO

2
Objectives
  • After this lecture, you should be able to:
  • describe the basics of caches
  • describe temporal and spatial locality
  • describe direct-mapped caches and how data items are found in such a cache
  • describe set-associative caches and how data items are found in them
  • given a cache description, compute the number of sets and how large the tag is
  • given a sequence of addresses, compute the miss rate
  • use several cache replacement strategies
  • describe virtual memory
  • translate virtual addresses to physical addresses
  • given a sequence of page references, compute the number of page faults

3
Memory Hierarchy, why?
  • Users want large and fast memories!

    Type   Access time          Cost/MB (in 1997)
    SRAM   2-25 ns              $100-$250
    DRAM   60-120 ns            $5-$10
    Disk   10-20 million ns     $0.10-$0.20

  • Try and give it to them anyway:
  • build a memory hierarchy

4
Locality
  • A principle that makes having a memory hierarchy
    a good idea
  • If an item is referenced,temporal locality it
    will tend to be referenced again soon
  • spatial locality nearby items will tend to
    be referenced soon.
  • Why does code have locality?
  • Our initial focus two levels (upper, lower)
  • block minimum unit of data
  • hit data requested is in the upper level
  • miss data requested is not in the upper level

5
Cache
  • Two issues:
  • How do we know if a data item is in the cache?
  • If it is, how do we find it?
  • Our first example:
  • block size is one word of data
  • "direct mapped"

For each item of data at the lower level, there is exactly one location in the cache where it might be. Consequently, many items at the lower level share the same location in the upper level.
6
Direct Mapped Cache
  • Mapping:
  • block address = (byte address) div (block size in bytes)
  • cache address = (block address) mod (cache size in blocks)
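
A minimal sketch of this mapping in Python (the function name is ours, not from the lecture):

```python
def cache_block_location(byte_address, block_size_bytes, cache_size_blocks):
    """Map a byte address to its only possible location in a
    direct-mapped cache."""
    block_address = byte_address // block_size_bytes      # div
    cache_address = block_address % cache_size_blocks     # mod
    return cache_address

# Example: 4-byte blocks, 8-block cache: byte address 36 lies in
# block 9, which can only live at cache address 9 mod 8 = 1.
assert cache_block_location(36, 4, 8) == 1
```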

7
Direct Mapped Cache Organization
[Figure: direct-mapped cache organization for MIPS. The 32-bit address (bit positions 31-0) is split into a 20-bit tag (bits 31-12), a 10-bit index (bits 11-2), and a 2-bit byte offset (bits 1-0); the index selects one of 1024 cache entries (0-1023), each holding a valid bit, a tag, and a data word, and a hit is signaled when the selected entry is valid and its stored tag matches the address tag.]

  • What kind of locality are we taking advantage of?
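
A sketch of the tag/index/offset split shown in the figure, assuming the 20/10/2-bit field widths of the 1024-entry cache (function names are ours):

```python
def split_address(addr):
    """Split a 32-bit byte address into (tag, index, byte offset)
    for the 1024-entry direct-mapped cache of the figure."""
    byte_offset = addr & 0x3        # bits 1-0
    index = (addr >> 2) & 0x3FF     # bits 11-2: selects 1 of 1024 entries
    tag = addr >> 12                # bits 31-12: 20-bit tag
    return tag, index, byte_offset

def is_hit(cache, addr):
    """cache[i] = (valid, tag, data); hit if the entry is valid
    and the stored tag matches the address tag."""
    tag, index, _ = split_address(addr)
    valid, stored_tag, _data = cache[index]
    return valid and stored_tag == tag
```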
8
Hits vs. Misses
  • Read hits:
  • this is what we want!
  • Read misses:
  • stall the CPU, fetch the block from memory, deliver it to the cache, restart the load instruction
  • Write hits:
  • write the data to both the cache and memory (write-through)
  • write the data only into the cache and write it back to memory later (write-back)
  • Write misses:
  • read the entire block into the cache, then write the word (allocate on write miss)
  • do not read the cache line; just write to memory (no allocate on write miss)

9
Direct Mapped Cache
  • Taking advantage of spatial locality

[Figure: direct-mapped cache with multiword blocks; the address bit positions select tag, index, and offsets within the block.]
10
Hardware Issues
  • Make reading multiple words faster by using
    multiple banks of memory

11
Performance
  • Increasing the block size tends to decrease miss
    rate but increases miss penalty

12
Split caches
  • Split cache: separate caches for instructions (code) and data
  • Useful because there is more spatial locality in
    code

13
Impact of Cache Performance on Execution Time
  • Texec = Ninst × CPI × Tcycle
  • where
  • CPI = CPIideal + CPIstall
  • CPIstall = %reads × missrate_read × misspenalty_read + %writes × missrate_write × misspenalty_write
  • or
  • Texec = (Nnormal-cycles + Nstall-cycles) × Tcycle
  • where
  • Nstall-cycles = Nreads × missrate_read × misspenalty_read + Nwrites × missrate_write × misspenalty_write + (write-buffer stalls)

14
Impact of Cache Performance on Execution Time
  • Simplified model:

Texec = (Nnormal-cycles + Nstall-cycles) × Tcycle
where Nstall-cycles = Naccesses × miss-rate × miss-penalty
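
The simplified model written out as a small function (a sketch; parameter names are ours):

```python
def exec_time(n_normal_cycles, n_accesses, miss_rate, miss_penalty, t_cycle):
    """Texec = (Nnormal-cycles + Nstall-cycles) * Tcycle,
    with Nstall-cycles = Naccesses * miss-rate * miss-penalty."""
    n_stall_cycles = n_accesses * miss_rate * miss_penalty
    return (n_normal_cycles + n_stall_cycles) * t_cycle
```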
15
Performance example
  • Assume the GCC application (page 311):
  • I-cache miss rate = 2%
  • D-cache miss rate = 4%
  • CPIideal = 2.0
  • Miss penalty = 40 cycles
  • Calculate the CPI
  • CPI = 2.0 + CPIstall
  • Nstall-cycles = (instruction miss cycles) + (data miss cycles)
  • Instruction miss cycles = Ninstr × 0.02 × 40 = 0.80 × Ninstr
  • loads and stores are 36% of instructions
  • Data miss cycles = Ninstr × 0.36 × 0.04 × 40 = 0.576 × Ninstr
  • CPI = 2.0 + 0.80 + 0.576 = 3.376
  • Slowdown = 3.376 / 2.0 = 1.688 !!
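
The same arithmetic, checked in a few lines of Python (numbers taken from the slide):

```python
cpi_ideal  = 2.0
i_miss     = 0.02    # I-cache miss rate (2%)
d_miss     = 0.04    # D-cache miss rate (4%)
ld_st_frac = 0.36    # loads/stores per instruction (36%)
penalty    = 40      # miss penalty in cycles

cpi = cpi_ideal + i_miss * penalty + ld_st_frac * d_miss * penalty
print(cpi)              # 3.376
print(cpi / cpi_ideal)  # 1.688 (slowdown)
```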

16
Performance example (continued)
  • What if the ideal processor had CPI = 1.0 (instead of 2.0)?
  • Slowdown would be 2.38 !
  • What if the processor is clocked twice as fast?
  • => miss penalty becomes 80 cycles
  • CPI = 2.0 + 0.02 × 80 + 0.36 × 0.04 × 80 = 4.752
  • Speedup = (N × CPIa × Tclock) / (N × CPIb × Tclock / 2) = 3.376 / (4.752 / 2)
  • Speedup is not 2, but only 1.42 !!
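
Continuing the Python check above (reusing the variables from the previous sketch):

```python
penalty_fast = 80    # doubling the clock doubles the penalty in cycles
cpi_fast = cpi_ideal + i_miss * penalty_fast + ld_st_frac * d_miss * penalty_fast
print(cpi_fast)              # 4.752
print(cpi / (cpi_fast / 2))  # ~1.42: same instruction count, half the cycle time
```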

17
Improving performance
  • Two ways of improving performance:
  • decrease the miss ratio: associativity
  • decrease the miss penalty: multilevel caches
  • Active Learning: What happens if we increase the block size?

18
Decrease miss ratio using associative caches

[Figure: set-associative cache organizations with 2, 4, and 8 blocks per set.]
19
Implementation: 4-way set-associative cache
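
A behavioral sketch of a 4-way set-associative lookup (the block size and data layout here are assumptions for illustration; in hardware, all four ways are compared in parallel):

```python
def lookup(cache_sets, addr, block_size=16):
    """cache_sets[s] is a list of up to 4 (tag, data) pairs.
    Returns the data on a hit, or None on a miss."""
    block_address = addr // block_size
    index = block_address % len(cache_sets)     # which set
    tag = block_address // len(cache_sets)      # identifies the block in the set
    for stored_tag, data in cache_sets[index]:  # hardware checks all 4 ways at once
        if stored_tag == tag:
            return data                         # hit
    return None                                 # miss
```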
20
Active Learning
  • Useful formula: (cache size) = (number of sets) × associativity × (block size)
  • Active learning: given the following:
  • cache size = 4 KB
  • associativity = 4
  • block size = 4 words
  • word = 4 bytes
  • how many sets are there in the cache? (a worked check follows below)
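
One way to verify your answer, applying the formula directly:

```python
cache_size    = 4 * 1024   # 4 KB
associativity = 4
block_size    = 4 * 4      # 4 words of 4 bytes = 16 bytes

num_sets = cache_size // (associativity * block_size)
print(num_sets)            # 64 sets
```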
21
Replacement
  • Which block is replaced on a cache miss?
  • Cache replacement strategies:
  • Random: pick one block at random
  • First-In-First-Out (FIFO): replace the block that has been in the cache longest
  • Least Recently Used (LRU): replace the block that has not been used for the longest time (sketched below)
  • Optimal algorithm (MIN): replace the block that will not be used for the longest time
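
A minimal sketch of LRU miss counting (reused later for the page-fault exercise); FIFO would differ only by not promoting a block on a hit:

```python
from collections import OrderedDict

def lru_misses(refs, capacity):
    """Count misses when the reference string refs is run through a
    fully associative store of the given capacity under LRU."""
    cache = OrderedDict()
    misses = 0
    for block in refs:
        if block in cache:
            cache.move_to_end(block)       # promote: most recently used
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[block] = True
    return misses
```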

22
Performance
[Figure: miss rate versus associativity for 1 KB, 2 KB, and 8 KB caches.]
23
Multilevel Caches
  • Add a second-level cache:
  • the primary cache is often on the same chip as the processor
  • use SRAMs to add another cache above primary memory (DRAM)
  • the miss penalty goes down if the data is in the 2nd-level cache
  • Example (worked through below):
  • CPI of 1.0 on a 500 MHz machine with a 5% miss rate and 200 ns DRAM access
  • adding a 2nd-level cache with a 20 ns access time decreases the miss rate to 2%
  • Using multilevel caches:
  • try to optimize the hit time on the 1st-level cache
  • try to optimize the miss rate on the 2nd-level cache
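
A sketch of the example's arithmetic (assuming, as the slide's numbers suggest, that the miss rates are misses per instruction):

```python
t_cycle_ns   = 1000 / 500        # 500 MHz -> 2 ns per cycle
main_penalty = 200 / t_cycle_ns  # 200 ns DRAM access = 100 cycles
l2_penalty   = 20 / t_cycle_ns   # 20 ns L2 access = 10 cycles

cpi_l1_only = 1.0 + 0.05 * main_penalty                      # 6.0
cpi_l1_l2   = 1.0 + 0.05 * l2_penalty + 0.02 * main_penalty  # 3.5
print(cpi_l1_only / cpi_l1_l2)                               # ~1.7x speedup
```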

24
Virtual Memory
  • Main memory can act as a cache for secondary
    storage (disk)
  • Advantages:
  • illusion of having more physical memory
  • program relocation
  • protection

[Figure: mapping of virtual memory pages onto physical memory.]
25
Pages and Page Table
  • Pages: virtual memory blocks
  • Page table: mapping of virtual page numbers to physical page numbers

[Figure: page table with a valid bit per entry, mapping virtual pages 0-63 to physical pages; entries with the valid bit clear refer to pages that are not in physical memory.]
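
A sketch of the translation through a flat page table (4 KB pages as on the next slide; names are ours):

```python
PAGE_SIZE = 4096   # assumed 4 KB pages (see the page-fault slide)

def translate(page_table, virtual_address):
    """page_table[vpn] = (valid, physical page number)."""
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    valid, ppn = page_table[vpn]
    if not valid:
        raise MemoryError(f"page fault on virtual page {vpn}")
    return ppn * PAGE_SIZE + offset
```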
26
Page Faults
  • Page fault: the data is not in memory, so retrieve it from disk
  • huge miss penalty, thus pages should be fairly large (e.g., 4 KB)
  • reducing page faults is important (LRU is worth the price)
  • faults can be handled in software (OS) instead of hardware
  • write-through is too expensive, so use write-back

27
Page Tables

28
Making Address Translation Fast
  • A cache for address translations: the translation lookaside buffer (TLB)
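
A toy model of a TLB in front of the page table, reusing PAGE_SIZE and the table layout from the translation sketch above (a dictionary stands in for the real, small, associative hardware structure; capacity and replacement are ignored):

```python
tlb = {}   # vpn -> ppn

def translate_with_tlb(page_table, virtual_address):
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    if vpn in tlb:                    # TLB hit: no page-table access needed
        return tlb[vpn] * PAGE_SIZE + offset
    valid, ppn = page_table[vpn]      # TLB miss: consult the page table
    if not valid:
        raise MemoryError(f"page fault on virtual page {vpn}")
    tlb[vpn] = ppn                    # cache the translation for next time
    return ppn * PAGE_SIZE + offset
```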

29
Active Learning
  • Suppose there is room for 3 pages in memory and
    the processor references the following pages
  • 7 0 1 2 0 3 0 4 2 3 0 3 2 1
  • How many page faults occur (assuming LRU
    replacement)?
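
One way to check your answer, using the lru_misses sketch from the replacement slide:

```python
refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1]
print(lru_misses(refs, capacity=3))   # 10 page faults
```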

30
Modern Systems
  • First level cache organization

[Figure: Pentium Pro dual-chip module.]
31
Modern Systems
  • Very complicated memory systems
  • Virtual memory

32
Research Issues
  • Processor speeds continue to increase very fast, much faster than either DRAM or disk access times
  • Design challenge: dealing with this growing disparity
  • Trends:
  • synchronous SRAMs (provide a burst of data)
  • redesign DRAM chips to provide higher bandwidth or processing
  • restructure code to increase locality
  • use prefetching (make the cache visible to the ISA)

33
Active Learning
  • Suggested exercises from Chapter 7:
  • 7.2, 7.3
  • 7.7 - 7.10
  • 7.20, 7.21
  • 7.27, 7.32