Title: Caching IV
1. Caching IV
- Andreas Klappenecker
- CPSC 321 Computer Architecture
2. Virtual Memory
- Processor generates virtual addresses
- Memory is accessed using physical addresses
- Virtual and physical memory is broken into blocks of memory, called pages
- A virtual page may be
  - absent from main memory, residing on the disk
  - or may be mapped to a physical page
3. Virtual Memory
- Main memory can act as a cache for the secondary storage (disk)
- Virtual address generated by processor (left)
- Address translation (middle)
- Physical addresses (right)
4. Pages: virtual memory blocks
- Page faults: if the data is not in memory, retrieve it from disk
  - huge miss penalty, thus pages should be fairly large (e.g., 4 KB)
  - reducing page faults is important (LRU is worth the price)
  - can handle the faults in software instead of hardware
  - using write-through takes too long, so we use write-back
- Example: page size 2^12 bytes = 4 KB, 2^18 physical pages
  - main memory: 1 GB, virtual memory: 4 GB
5. Page Faults
- Incredibly high penalty for a page fault
- Reduce the number of page faults by optimizing page placement
- Use fully associative placement
  - a full search of pages is impractical
  - pages are located by a full table that indexes the memory, called the page table
  - the page table resides within the memory
6. Page Tables
The page table maps each page to either a page in main memory or to a page stored on disk
7. Page Tables
8. Making Memory Access Fast
- Page tables slow us down
- Memory access will take at least twice as long:
  - access the page table in memory
  - access the page
- What can we do?
- Memory access is local, so use a cache that keeps track of recently used address translations, called the translation lookaside buffer
9. Making Address Translation Fast
- A cache for address translations: the translation lookaside buffer (TLB)
10. Translation Lookaside Buffer
- Some typical values for a TLB:
  - TLB size: 32-4096 entries
  - Block size: 1-2 page table entries (4-8 bytes each)
  - Hit time: 0.5-1 clock cycle
  - Miss penalty: 10-30 clock cycles
  - Miss rate: 0.01%-1%
11. TLBs and Caches
12. More Modern Systems
- Very complicated memory systems
13. Some Issues
- Processor speeds continue to increase very fast, much faster than either DRAM or disk access times
- Design challenge: dealing with this growing disparity
- Trends:
  - synchronous SRAMs (provide a burst of data)
  - redesign DRAM chips to provide higher bandwidth or processing
  - restructure code to increase locality
  - use prefetching (make the cache visible to the ISA)
14. Where can a Block be Placed?

| Name              | Number of sets                    | Blocks per set                |
|-------------------|-----------------------------------|-------------------------------|
| Direct mapped     | Blocks in cache                   | 1                             |
| Set associative   | (Blocks in cache) / Associativity | Associativity (typically 2-8) |
| Fully associative | 1                                 | Number of blocks in cache     |
15. How is a Block Found?

| Associativity     | Location method                      | Comparisons required    |
|-------------------|--------------------------------------|-------------------------|
| Direct mapped     | Index                                | 1                       |
| Set associative   | Index the set, search among elements | Degree of associativity |
| Fully associative | Search all cache entries             | Size of the cache       |
| Fully associative | Separate lookup table                | 0                       |
16. Algorithm for Success
- Read Chapters 5-7
  - get the big picture
- Read again
  - focus on the little details
  - do calculations
  - work problems
- Get enough sleep!
- What should be reviewed?
17. Project
- Provide a working solution
  - it is better to submit a working solution implementing a subset of instructions
  - if you submit a faulty version, comment your bugs
- Have test programs that exercise all instructions
- Have a full report that explains your design
  - it should include a table of control signals