Title: Review CPSC 321
1 Review CPSC 321
2 Announcements
- Tuesday, November 30, midterm exam
3 Cache
- Placement strategies
- direct mapped
- fully associative
- set-associative
- Replacement strategies
- random
- FIFO
- LRU
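As an illustration of the replacement strategies above, LRU for a single cache set can be sketched with Python's OrderedDict (the class and method names here are hypothetical helpers, not from the slides):

```python
from collections import OrderedDict

class LRUSet:
    """One n-way cache set with least-recently-used replacement (sketch)."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> data; insertion order = recency

    def access(self, tag):
        """Return True on a hit; on a miss, evict the LRU block if full."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)  # mark as most recently used
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[tag] = None
        return False
```

Random and FIFO replacement differ only in which victim `popitem` would pick: FIFO never calls `move_to_end`, and random picks an arbitrary resident tag.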
4 Direct Mapped Cache
- Mapping: address modulo the number of blocks in the cache, x -> x mod B
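The mapping x -> x mod B is one line of code; a minimal sketch (function name is illustrative):

```python
def dm_index(block_number, num_blocks):
    """Direct mapped placement: memory block x maps to cache slot x mod B."""
    return block_number % num_blocks

# With B = 1024 blocks, memory block 3075 lands in cache slot 3.
```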
5 Set Associative Caches
- Each block maps to a unique set
- the block can be placed into any element of that set
- Position is given by
- (Block number) modulo (number of sets in cache)
- If the sets contain n elements, then the cache is
called n-way set associative
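The set-index rule above covers all three placement strategies as special cases; a sketch (names are illustrative, assuming power-of-two sizes):

```python
def set_index(block_number, num_blocks, ways):
    """Set-associative placement: block -> (block number) mod (# of sets).
    ways=1 is direct mapped; ways=num_blocks is fully associative (1 set)."""
    num_sets = num_blocks // ways
    return block_number % num_sets
```

For an 8-block cache, block 12 maps to slot 4 when direct mapped, set 0 in a 2-way cache, and the single set 0 when fully associative.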
6 Direct Mapped Cache
- Cache with 1024 = 2^10 words
- The index is determined by address mod 1024
- The tag from the cache is compared against the upper portion of the address
- If the tag = upper 20 bits and the valid bit is set, then we have a cache hit; otherwise it is a cache miss
- What kind of locality are we taking advantage of?
(Figure: direct mapped cache; address split into tag, index, and byte offset)
7 Direct Mapped Cache
- Taking advantage of spatial locality
(Figure: cache with multiword blocks; address includes a block offset field)
8 Address Determination
- reconstruction of the memory address
- tag bits | set index bits | block offset | byte offset
- Example
- 32 bit words, cache capacity 2^12 = 4096 words, blocks of 8 words, direct mapped
- byte offset 2 bits, block offset 3 bits, set index 9 bits, tag 18 bits
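The field split from this example can be checked mechanically; a sketch (function name is illustrative, defaults taken from the slide: 18 + 9 + 3 + 2 = 32 bits):

```python
def address_fields(addr, tag_bits=18, index_bits=9, block_bits=3, byte_bits=2):
    """Split a 32-bit byte address into (tag, set index, block offset, byte offset)."""
    byte_off  = addr & ((1 << byte_bits) - 1)
    block_off = (addr >> byte_bits) & ((1 << block_bits) - 1)
    index     = (addr >> (byte_bits + block_bits)) & ((1 << index_bits) - 1)
    tag       = addr >> (byte_bits + block_bits + index_bits)
    return tag, index, block_off, byte_off
```

Reconstruction goes the other way: shift each field back into place and OR them together.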
10 Example
- Suppose you want to realize a cache with a capacity for 8 KB of data (32 bits of address size). Assume that the block size is 4 words and a word consists of 4 bytes.
- How many bits are needed to realize a direct mapped cache?
- 8 KByte = 2K words = 512 blocks = 2^9 blocks
- direct mapped => index bits = log2(2^9) = 9
- 2^9 x (128 + (32 - 9 - 2 - 2) + 1) = 2^9 x 148 bits
- i.e., number of blocks x (data bits per block + tag bits + valid bit)
- How many bits are needed to realize an 8-way set associative cache?
- Number of tag bits increases by 3. Why?
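The counting argument above generalizes; a sketch that reproduces both answers (function name is illustrative, power-of-two sizes assumed):

```python
def cache_bits(data_bytes, block_words, ways, addr_bits=32, word_bytes=4):
    """Total SRAM bits: #blocks x (data bits + tag bits + valid bit)."""
    blocks = data_bytes // (block_words * word_bytes)
    sets = blocks // ways
    index_bits = sets.bit_length() - 1                         # log2(#sets)
    offset_bits = (block_words * word_bytes).bit_length() - 1  # block + byte offset
    tag_bits = addr_bits - index_bits - offset_bits
    data_bits = block_words * word_bytes * 8
    return blocks * (data_bits + tag_bits + 1)
```

Going from direct mapped to 8-way shrinks the index from 9 to 6 bits (64 sets instead of 512), so each tag grows by those 3 bits, which answers the "Why?".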
11 Typical Questions
- Show the evolution of a cache
- Determine the number of bits needed in an implementation of a cache
- Know the placement and replacement strategies
- Be able to design a cache according to specifications
- Determine the number of cache misses
- Measure cache performance
12 Typical Questions
- What kind of placement is typically used in virtual memory systems?
- What is a translation lookaside buffer?
- Why is a TLB used?
13 Pages: virtual memory blocks
- Page faults: if data is not in memory, retrieve it from disk
- huge miss penalty, thus pages should be fairly large (e.g., 4KB)
- reducing page faults is important (LRU is worth the price)
- can handle the faults in software instead of hardware
- using write-through takes too long, so we use write-back
- Example: page size 2^12 = 4KB, 2^18 physical pages
- main memory < 1GB, virtual memory < 4GB
14 Page Faults
- Incredibly high penalty for a page fault
- Reduce the number of page faults by optimizing page placement
- Use fully associative placement
- full search of pages is impractical
- pages are located by a full table that indexes the memory, called the page table
- the page table resides within the memory
15 Page Tables
The page table maps each page to either a page in
main memory or to a page stored on disk
16 Page Tables
17 Making Memory Access Fast
- Page tables slow us down
- Memory access will take at least twice as long
- access page table in memory
- access page
- What can we do?
Memory access is local => use a cache that keeps track of recently used address translations, called a translation lookaside buffer
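The lookup order described here can be sketched in a few lines (names and dict-based tables are illustrative; a real TLB is a small hardware cache and a real miss may also page-fault):

```python
def translate(vaddr, tlb, page_table, page_size=4096):
    """Virtual -> physical address, consulting the TLB before the page table."""
    vpn, offset = divmod(vaddr, page_size)
    if vpn in tlb:                # TLB hit: no page-table memory access needed
        ppn = tlb[vpn]
    else:                         # TLB miss: walk the in-memory page table
        ppn = page_table[vpn]
        tlb[vpn] = ppn            # refill the TLB for future accesses
    return ppn * page_size + offset
```

Only the miss path pays the extra memory access for the page table, which is exactly why the TLB makes the common case fast.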
18 Making Address Translation Fast
- A cache for address translations: the translation lookaside buffer
19 MIPS Processor and Variations
20 Datapath for MIPS instructions
Note the seven control signals!
21 Single Cycle Datapath
22 Pipelined Version
23 Obstacles to Pipelining
- Structural Hazards
- hardware cannot support the combination of instructions in the same clock cycle
- Control Hazards
- need to make a decision based on the results of one instruction while another is still executing
- Data Hazards
- instruction depends on the results of an instruction still in the pipeline
24 Control Hazard Resolution (for branch)
- Stall pipeline
- predict result
- delayed branch
25 Stall on Branch
- Assume that all branch computations are done in stage 2
- Delay by one cycle to wait for the result
26 Branch Prediction
- Predict the branch result
- For example, always predict that the branch is not taken
- (e.g., reasonable for while loops)
- if the choice is correct, then the pipeline runs at full speed
- if the choice is incorrect, then the pipeline stalls
27 Branch Prediction
28 Delayed Branch
29 Data Hazards
- A data hazard results if an instruction depends on the result of a previous instruction
- add $s0, $t0, $t1
- sub $t2, $s0, $t3 // $s0 to be determined
- These dependencies happen often, so it is not possible to avoid them completely
- Use forwarding to get the missing data from internal resources once available
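Detecting where forwarding is needed is mechanical; a sketch (names are illustrative; instructions are modeled as (dest, src1, src2) register tuples, and only the two preceding instructions matter, matching the EX/MEM and MEM/WB forwarding paths of a 5-stage pipeline):

```python
def needs_forwarding(instrs):
    """Return (consumer, producer, register) triples where a source register
    is written by one of the two immediately preceding instructions."""
    hazards = []
    for i, (dest, *srcs) in enumerate(instrs):
        for j in (i - 1, i - 2):          # one or two instructions earlier
            if j >= 0 and instrs[j][0] in srcs:
                hazards.append((i, j, instrs[j][0]))
    return hazards
```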
30 Forwarding
- add $s0, $t0, $t1
- sub $t2, $s0, $t3
33 Typical Questions
- Given a brief specification of the processor and a sequence of instructions, determine all pipeline hazards.
- Most typical question: fill in some steps in a timing diagram (almost every exam has such a question; search online for examples).
34 Example
- add $1, $2, $3 _ _ _ _ _
- add $4, $5, $6 _ _ _ _ _
- add $7, $8, $9 _ _ _ _ _
- add $10, $11, $12 _ _ _ _ _
- add $13, $14, $1 _ _ _ _ _ (data arrives early, OK)
- add $15, $16, $7 _ _ _ _ _ (data arrives on time, OK)
- add $17, $18, $13 _ _ _ _ _ (uh oh)
- add $19, $20, $17 _ _ _ _ _ (uh oh)
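The annotations in this example follow from the distance between producer and consumer; a sketch of the rule (function name is illustrative, assuming a 5-stage pipeline with no forwarding where the register file is written in WB and can be read in the same cycle, as the example implies):

```python
def result_timing(distance):
    """Classify a dependence by how many instructions separate the producer
    from the consumer: $1 is used 4 instructions later (early), $7 three
    later (on time), $13 and $17 two and one later (hazard)."""
    if distance >= 4:
        return "arrives early"
    if distance == 3:
        return "arrives on time"
    return "hazard"
```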
35 Verilog
36 Mixed Questions