Title: Review CPSC 321
1 Review CPSC 321
2 Announcements
- Tuesday, November 30, midterm exam
3 Cache
- Placement strategies
- direct mapped
- fully associative
- set-associative
- Replacement strategies
- random
- FIFO
- LRU
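As an illustration of the replacement strategies above, LRU for a single cache set can be sketched with Python's OrderedDict (the class and method names here are hypothetical helpers, not from the slides):

```python
from collections import OrderedDict

class LRUSet:
    """One n-way cache set with least-recently-used replacement (sketch)."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> data; insertion order = recency

    def access(self, tag):
        """Return True on a hit; on a miss, evict the LRU block if full."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)  # mark as most recently used
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[tag] = None
        return False
```

Random and FIFO replacement differ only in which victim `popitem` would pick: FIFO never calls `move_to_end`, and random picks an arbitrary resident tag.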
4 Direct Mapped Cache
- Mapping: address modulo the number of blocks in the cache, x -> x mod B
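The mapping x -> x mod B is one line of code; a minimal sketch (function name is illustrative):

```python
def dm_index(block_number, num_blocks):
    """Direct mapped placement: memory block x maps to cache slot x mod B."""
    return block_number % num_blocks

# With B = 1024 blocks, memory block 3075 lands in cache slot 3.
```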
5 Set Associative Caches
- Each block maps to a unique set
- the block can be placed into any element of that set
- Position is given by
- (Block number) modulo (number of sets in cache)
- If the sets contain n elements, then the cache is
called n-way set associative
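The set-index rule above covers all three placement strategies as special cases; a sketch (names are illustrative, assuming power-of-two sizes):

```python
def set_index(block_number, num_blocks, ways):
    """Set-associative placement: block -> (block number) mod (# of sets).
    ways=1 is direct mapped; ways=num_blocks is fully associative (1 set)."""
    num_sets = num_blocks // ways
    return block_number % num_sets
```

For an 8-block cache, block 12 maps to slot 4 when direct mapped, set 0 in a 2-way cache, and the single set 0 when fully associative.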
6 Direct Mapped Cache
- Cache with 1024 = 2^10 words
- The index is determined by address mod 1024
- The tag from the cache is compared against the upper portion of the address
- If the tag = upper 20 bits and the valid bit is set, then we have a cache hit; otherwise it is a cache miss
- What kind of locality are we taking advantage of?
(Figure: direct mapped cache; address split into tag, index, and byte offset)
7 Direct Mapped Cache
- Taking advantage of spatial locality
(Figure: cache with multiword blocks; address includes a block offset field)
8 Address Determination
- reconstruction of the memory address
- tag bits | set index bits | block offset | byte offset
- Example
- 32 bit words, cache capacity 2^12 = 4096 words, blocks of 8 words, direct mapped
- byte offset 2 bits, block offset 3 bits, set index 9 bits, tag 18 bits
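The field split from this example can be checked mechanically; a sketch (function name is illustrative, defaults taken from the slide: 18 + 9 + 3 + 2 = 32 bits):

```python
def address_fields(addr, tag_bits=18, index_bits=9, block_bits=3, byte_bits=2):
    """Split a 32-bit byte address into (tag, set index, block offset, byte offset)."""
    byte_off  = addr & ((1 << byte_bits) - 1)
    block_off = (addr >> byte_bits) & ((1 << block_bits) - 1)
    index     = (addr >> (byte_bits + block_bits)) & ((1 << index_bits) - 1)
    tag       = addr >> (byte_bits + block_bits + index_bits)
    return tag, index, block_off, byte_off
```

Reconstruction goes the other way: shift each field back into place and OR them together.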
10 Example
- Suppose you want to realize a cache with a capacity for 8 KB of data (32 bits of address size). Assume that the block size is 4 words and a word consists of 4 bytes.
- How many bits are needed to realize a direct mapped cache?
- 8 KByte = 2K words = 512 blocks = 2^9 blocks
- direct mapped => index bits = log2(2^9) = 9
- 2^9 x (128 + (32 - 9 - 2 - 2) + 1) = 2^9 x 148 bits
- i.e., number of blocks x (data bits per block + tag bits + valid bit)
- How many bits are needed to realize an 8-way set associative cache?
- Number of tag bits increases by 3. Why?
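The counting argument above generalizes; a sketch that reproduces both answers (function name is illustrative, power-of-two sizes assumed):

```python
def cache_bits(data_bytes, block_words, ways, addr_bits=32, word_bytes=4):
    """Total SRAM bits: #blocks x (data bits + tag bits + valid bit)."""
    blocks = data_bytes // (block_words * word_bytes)
    sets = blocks // ways
    index_bits = sets.bit_length() - 1                         # log2(#sets)
    offset_bits = (block_words * word_bytes).bit_length() - 1  # block + byte offset
    tag_bits = addr_bits - index_bits - offset_bits
    data_bits = block_words * word_bytes * 8
    return blocks * (data_bits + tag_bits + 1)
```

Going from direct mapped to 8-way shrinks the index from 9 to 6 bits (64 sets instead of 512), so each tag grows by those 3 bits, which answers the "Why?".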
11 Typical Questions
- Show the evolution of a cache
- Determine the number of bits needed in an implementation of a cache
- Know the placement and replacement strategies
- Be able to design a cache according to specifications
- Determine the number of cache misses
- Measure cache performance
12 Typical Questions
- What kind of placement is typically used in virtual memory systems?
- What is a translation lookaside buffer?
- Why is a TLB used?
13 Pages: virtual memory blocks
- Page faults: if data is not in memory, retrieve it from disk
- huge miss penalty, thus pages should be fairly large (e.g., 4KB)
- reducing page faults is important (LRU is worth the price)
- can handle the faults in software instead of hardware
- using write-through takes too long, so we use write-back
- Example: page size 2^12 = 4KB, 2^18 physical pages
- main memory < 1GB, virtual memory < 4GB
14 Page Faults
- Incredibly high penalty for a page fault
- Reduce the number of page faults by optimizing page placement
- Use fully associative placement
- full search of pages is impractical
- pages are located by a full table that indexes the memory, called the page table
- the page table resides within the memory
15 Page Tables
The page table maps each page to either a page in
main memory or to a page stored on disk
16 Page Tables
17 Making Memory Access Fast
- Page tables slow us down
- Memory access will take at least twice as long
- access page table in memory
- access page
- What can we do?
Memory access is local => use a cache that keeps track of recently used address translations, called a translation lookaside buffer
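The lookup order described here can be sketched in a few lines (names and dict-based tables are illustrative; a real TLB is a small hardware cache and a real miss may also page-fault):

```python
def translate(vaddr, tlb, page_table, page_size=4096):
    """Virtual -> physical address, consulting the TLB before the page table."""
    vpn, offset = divmod(vaddr, page_size)
    if vpn in tlb:                # TLB hit: no page-table memory access needed
        ppn = tlb[vpn]
    else:                         # TLB miss: walk the in-memory page table
        ppn = page_table[vpn]
        tlb[vpn] = ppn            # refill the TLB for future accesses
    return ppn * page_size + offset
```

Only the miss path pays the extra memory access for the page table, which is exactly why the TLB makes the common case fast.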
18 Making Address Translation Fast
- A cache for address translations: the translation lookaside buffer
19 MIPS Processor and Variations
20 Datapath for MIPS instructions
Note the seven control signals!
21 Single Cycle Datapath
22 Pipelined Version
23 Obstacles to Pipelining
- Structural Hazards
- hardware cannot support the combination of instructions in the same clock cycle
- Control Hazards
- need to make a decision based on the results of one instruction while another is still executing
- Data Hazards
- instruction depends on the results of an instruction still in the pipeline
24 Control Hazard Resolution (for branch)
- Stall pipeline
- predict result
- delayed branch
25 Stall on Branch
- Assume that all branch computations are done in stage 2
- Delay by one cycle to wait for the result
26 Branch Prediction
- Predict the branch result
- For example, always predict that the branch is not taken
- (e.g., reasonable for while loops)
- if the choice is correct, then the pipeline runs at full speed
- if the choice is incorrect, then the pipeline stalls
27 Branch Prediction
28 Delayed Branch
29 Data Hazards
- A data hazard results if an instruction depends on the result of a previous instruction
- add $s0, $t0, $t1
- sub $t2, $s0, $t3 // $s0 to be determined
- These dependencies happen often, so it is not possible to avoid them completely
- Use forwarding to get the missing data from internal resources once available
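Detecting where forwarding is needed is mechanical; a sketch (names are illustrative; instructions are modeled as (dest, src1, src2) register tuples, and only the two preceding instructions matter, matching the EX/MEM and MEM/WB forwarding paths of a 5-stage pipeline):

```python
def needs_forwarding(instrs):
    """Return (consumer, producer, register) triples where a source register
    is written by one of the two immediately preceding instructions."""
    hazards = []
    for i, (dest, *srcs) in enumerate(instrs):
        for j in (i - 1, i - 2):          # one or two instructions earlier
            if j >= 0 and instrs[j][0] in srcs:
                hazards.append((i, j, instrs[j][0]))
    return hazards
```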
30 Forwarding
- add $s0, $t0, $t1
- sub $t2, $s0, $t3
33 Typical Questions
- Given a brief specification of the processor and a sequence of instructions, determine all pipeline hazards.
- Most typical question: fill in some steps in a timing diagram (almost every exam has such a question; search online for examples).
34 Example
- add $1, $2, $3 _ _ _ _ _
- add $4, $5, $6 _ _ _ _ _
- add $7, $8, $9 _ _ _ _ _
- add $10, $11, $12 _ _ _ _ _
- add $13, $14, $1 _ _ _ _ _ (data arrives early, OK)
- add $15, $16, $7 _ _ _ _ _ (data arrives on time, OK)
- add $17, $18, $13 _ _ _ _ _ (uh oh)
- add $19, $20, $17 _ _ _ _ _ (uh oh)
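The annotations in this example follow from the distance between producer and consumer; a sketch of the rule (function name is illustrative, assuming a 5-stage pipeline with no forwarding where the register file is written in WB and can be read in the same cycle, as the example implies):

```python
def result_timing(distance):
    """Classify a dependence by how many instructions separate the producer
    from the consumer: $1 is used 4 instructions later (early), $7 three
    later (on time), $13 and $17 two and one later (hazard)."""
    if distance >= 4:
        return "arrives early"
    if distance == 3:
        return "arrives on time"
    return "hazard"
```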
35 Verilog
36 Mixed Questions