Title: Virtual Memory
1. Virtual Memory Address Translation
- Vivek Pai
- Princeton University
2. General Memory Problem
- We have a limited (expensive) physical resource: main memory
- We want to use it as efficiently as possible
- We have an abundant, slower resource: disk
3. Lots of Variants
- Many programs, total size less than memory
  - Technically possible to pack them together
  - Will programs know about each other's existence?
- One program, using lots of memory
  - Can you keep only part of the program in memory?
- Lots of programs, total size exceeds memory
  - Which programs are in memory, and how to decide?
4. History Versus Present
- History
  - Each variant had its own solution
  - Solutions have different hardware requirements
  - Some solutions software/programmer visible
- Present: general-purpose microprocessors
  - One mechanism used for all of these cases
- Present: less capable microprocessors
  - May still use historical approaches
5. Many Programs, Small Total Size
- Observation: we can pack them into memory
- Requirements, by segment:
  - Text: maybe contiguous
  - Data: keep contiguous, relocate at start
  - Stack: assume contiguous, fixed size
    - Just set pointer at start, reserve space
  - Heap: no need to make it contiguous
6. Many Programs, Small Total Size
- Software approach
  - Just find appropriate space for data and code segments
  - Adjust any pointers to globals/functions in the code
  - Heap and stack are automatically adjustable
- Hardware approach
  - Pointer to data segment
  - All accesses to globals indirected
7. One Program, Lots of Memory
- Observation: locality
  - Instructions in a function are generally related
  - Stack accesses are generally in the current stack frame
  - Not all globals are used all the time
- Goal: keep recently-used portions in memory
- Explicit approach: the programmer/compiler reserves and controls part of the memory space (overlays)
- Note: the limited resource may be address space
8. Many Programs, Lots of Memory
- Software approach
  - Keep only a subset of programs in memory
  - When loading a program, evict any programs that use the same memory regions
  - Swap programs in/out as needed
- Hardware approach
  - Don't permanently associate any address of any program with any part of physical memory
- Note: doesn't address the problem of too few address bits
9. Why Virtual Memory?
- Use secondary storage
  - Extend DRAM with reasonable performance
- Protection
  - Programs do not step over each other
  - Communication requires explicit IPC operations
- Convenience
  - Flat address space
  - Programs have the same view of the world
10. How To Translate
- Must have some mapping mechanism
- Mapping must have some granularity
  - Granularity determines flexibility
  - Finer granularity requires more mapping info
- Extremes
  - Any byte to any byte: the mapping info is as large as the program itself
  - Map whole segments: larger segments are problematic
11. Translation Options
- Granularity
  - Small number of big fixed/flexible regions: segments
  - Large number of fixed regions: pages
- Visibility
  - Translation mechanism integral to instruction set: segments
  - Mechanism partly visible, external to processor: obsolete
  - Mechanism part of processor, visible to OS: pages
12. Translation Overview
- Actual translation is done in hardware (the MMU)
- Controlled in software
- CPU view
  - What the program sees: virtual memory
- Memory view
  - Physical memory
- [Diagram: CPU issues a virtual address; the MMU translates it to a physical address used to access physical memory and I/O devices]
13. Goals of Translation
- Implicit translation for each memory reference
- A hit should be very fast
- Trigger an exception on a miss
- Protected from user faults
- [Memory hierarchy and relative access costs: registers, cache(s) (~10x), DRAM (~100x), and disk via paging (~10Mx)]
14. Base and Bound
- Built into the Cray-1
- A program can only access physical memory in [base, base+bound]
- On a context switch: save/restore the base and bound registers
- Pros: simple
- Cons: fragmentation, hard to share, and difficult to use disks
- [Diagram: virtual address compared against bound (if greater, error); otherwise added to base to form the physical address]
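The base-and-bound check above can be sketched in a few lines; the register values here are made up for illustration.

```python
# Base-and-bound translation: a minimal sketch.
# A virtual address is valid only if it is below the bound register;
# the physical address is then base + virtual address.

class BoundError(Exception):
    pass

def translate(vaddr, base, bound):
    """Translate vaddr under base-and-bound; raise on an out-of-range access."""
    if vaddr >= bound:
        raise BoundError("address %#x exceeds bound %#x" % (vaddr, bound))
    return base + vaddr
```

Note that every process sees addresses starting at 0; relocation happens entirely in the addition, which is why the scheme is so simple (and why it fragments physical memory).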
15. Segmentation
- Have a table of (seg, size) entries
- Protection: each entry has access bits
  - (nil, read, write, exec)
- On a context switch: save/restore the table, or a pointer to the table in kernel memory
- Pros: efficient, easy to share
- Cons: complex management, and fragmentation within a segment
- [Diagram: virtual address split into (segment, offset); the offset is compared against the segment's size (if greater, error), then added to the segment base to form the physical address]
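The segment-table lookup can be sketched as follows; the segment numbers, bases, and sizes are illustrative, and protection bits would be checked the same way.

```python
# Segment-table translation: an illustrative sketch.
# Each entry holds a base address and a size; the offset must
# fall within the segment or the access faults.

SEGMENTS = {              # segment number -> (base, size); made-up values
    0: (0x0000, 0x400),   # e.g. text
    1: (0x8000, 0x200),   # e.g. data
}

def translate(seg, offset):
    base, size = SEGMENTS[seg]
    if offset >= size:
        raise MemoryError("offset outside segment")  # segmentation fault
    return base + offset
```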
16. Paging
- Use a page table to translate
- Various bits in each entry
- Context switch: similar to the segmentation scheme
- What should the page size be?
- Pros: simple allocation, easy to share
- Cons: big table; cannot deal with holes easily
- [Diagram: virtual address split into (VPage, offset); VPage is checked against the page table size (if greater, error) and indexes the page table to yield PPage; physical address = (PPage, offset)]
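A linear page-table lookup boils down to bit manipulation; this sketch assumes 4 KB pages (12 offset bits) and a couple of made-up entries.

```python
# Linear page-table lookup: split the virtual address into
# (virtual page number, offset), index the table, and splice
# the physical frame number back onto the offset.
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1

page_table = {0: 5, 1: 9}   # VPage -> PPage; illustrative entries

def translate(vaddr):
    vpage, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
    if vpage not in page_table:
        raise MemoryError("page fault")   # invalid entry -> fault
    return (page_table[vpage] << PAGE_SHIFT) | offset
```

For example, virtual address 0x1234 is page 1, offset 0x234, which maps to physical address 0x9234 under this table.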
17. How Many PTEs Do We Need?
- Assume a 4 KB page
  - The offset is the low-order 12 bits
- Worst case for a 32-bit address machine
  - # of processes x 2^20 PTEs
- What about a 64-bit address machine?
  - # of processes x 2^52 PTEs
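The arithmetic behind those counts is just the address width minus the offset bits:

```python
# With 4 KB pages, 12 bits of every address are the page offset;
# the remaining bits select a page, so a full linear table needs
# one PTE per possible page, per process.
PAGE_SHIFT = 12

ptes_32 = 2 ** (32 - PAGE_SHIFT)   # 2^20 = 1,048,576 entries
ptes_64 = 2 ** (64 - PAGE_SHIFT)   # 2^52 entries -- hopelessly large
```

At 4 bytes per PTE, the 32-bit case already costs 4 MB per process; the 64-bit case is why flat linear tables are not used there.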
18. Segmentation with Paging
- [Diagram: virtual address split into (Vseg, VPage, offset); Vseg indexes a segment table of (page table, size) entries; VPage is checked against the size (if greater, error) and indexes that segment's page table to yield PPage; physical address = (PPage, offset)]
19. Multiple-Level Page Tables
- [Diagram: virtual address split into (dir, table, offset); dir indexes a directory whose entries point to second-level page tables; table indexes the selected page table to find the PTE]
- What does this buy us? Sparse address spaces and easier paging
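A two-level walk can be sketched as below, using the classic 32-bit split of 10 directory bits, 10 table bits, and a 12-bit offset (the table contents are made up). The point is that second-level tables exist only for regions actually in use, which is what makes sparse address spaces cheap.

```python
# Two-level page-table walk: directory entry -> page table -> PTE.
PAGE_SHIFT = 12

# directory index -> {table index -> PPage}; only touched regions
# get a second-level table allocated at all
directory = {0: {3: 7}}

def translate(vaddr):
    d = vaddr >> 22                        # top 10 bits: directory index
    t = (vaddr >> PAGE_SHIFT) & 0x3FF      # next 10 bits: table index
    offset = vaddr & 0xFFF                 # low 12 bits: page offset
    table = directory.get(d)
    if table is None or t not in table:
        raise MemoryError("page fault")    # missing at either level
    return (table[t] << PAGE_SHIFT) | offset
```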
20. Inverted Page Tables
- Main idea
  - One PTE for each physical page frame
  - Hash (Vpage, pid) to Ppage
- Pros
  - Small page table for a large address space
- Cons
  - Lookup is difficult
  - Overhead of managing hash chains, etc.
- [Diagram: virtual address (pid, vpage, offset) hashed to entry k of the inverted page table (entries 0..n-1, each holding a (pid, vpage) tag); a match at slot k yields physical address (k, offset)]
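The idea can be sketched with a reverse map keyed by (pid, vpage); a real implementation hashes into a fixed-size frame table and chains on collisions, which a Python dict stands in for here. All values are illustrative.

```python
# Inverted page table: one entry per physical frame, tagged with the
# (pid, vpage) it currently holds. Lookup goes from the tag to the
# frame number, the reverse of a normal page table.
PAGE_SHIFT = 12

inverted = {(42, 0): 3, (42, 1): 8}   # (pid, vpage) -> frame; made up

def translate(pid, vaddr):
    vpage = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    frame = inverted.get((pid, vpage))
    if frame is None:
        raise MemoryError("page fault")
    return (frame << PAGE_SHIFT) | offset
```

The table size tracks physical memory (one entry per frame) rather than the virtual address space, which is the whole attraction on 64-bit machines.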
21. Virtual-To-Physical Lookups
- Programs only know virtual addresses
  - Each virtual address must be translated
  - May involve walking the hierarchical page table
- Page table stored in memory
  - So each program memory access requires several actual memory accesses
- Solution: cache the active part of the page table
22. Translation Look-aside Buffer (TLB)
- [Diagram: virtual address (VPage, offset); VPage is looked up in the TLB, a small cache of (VPage, PPage) entries; on a hit, physical address = (PPage, offset); on a miss, the real page table is consulted]
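The TLB sits in front of the page table as a small translation cache; this sketch (with an invented mapping) shows the hit/miss logic, ignoring capacity limits and replacement for now.

```python
# TLB lookup with page-table fallback: on a miss, walk the page
# table and install the translation so the next access hits.
PAGE_SHIFT = 12

page_table = {n: n + 100 for n in range(16)}   # VPage -> PPage; made up
tlb = {}                                       # cached translations

def translate(vaddr):
    vpage = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    if vpage not in tlb:                 # miss: consult the real table
        tlb[vpage] = page_table[vpage]   # a KeyError here = page fault
    return (tlb[vpage] << PAGE_SHIFT) | offset
```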
23. Bits in a TLB Entry
- Common (necessary) bits
  - Virtual page number: matched against the virtual address
  - Physical page number: the translated address
  - Valid
  - Access bits: kernel and user (nil, read, write)
- Optional (useful) bits
  - Process tag
  - Reference
  - Modify
  - Cacheable
24. Hardware-Controlled TLB
- On a TLB miss
  - Hardware loads the PTE into the TLB
    - Needs to write back if there is no free entry
  - Generates a fault if the page containing the PTE is invalid
    - VM software performs fault handling
    - Restart the CPU
- On a TLB hit, hardware checks the valid bit
  - If valid, pointer to the page frame in memory
  - If invalid, the hardware generates a page fault
    - Perform page fault handling
    - Restart the faulting instruction
25. Software-Controlled TLB
- On a TLB miss
  - Write back if there is no free entry
  - Check if the page containing the PTE is in memory
    - If not, perform page fault handling
  - Load the PTE into the TLB
  - Restart the faulting instruction
- On a TLB hit, the hardware checks the valid bit
  - If valid, pointer to the page frame in memory
  - If invalid, the hardware generates a page fault
    - Perform page fault handling
    - Restart the faulting instruction
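The software miss path above can be sketched as an OS handler; the TLB size, page table, and random replacement policy are all illustrative choices.

```python
# Software-managed TLB miss handler: the OS, not hardware, walks the
# page table, evicts an entry if the TLB is full, and installs the
# new translation before restarting the faulting instruction.
import random

TLB_SIZE = 4
tlb = {}                                       # vpage -> ppage
page_table = {n: n + 100 for n in range(16)}   # made-up mapping

def tlb_miss(vpage):
    if vpage not in page_table:
        raise MemoryError("page fault")    # bring the page in first
    if len(tlb) >= TLB_SIZE:               # no free entry
        victim = random.choice(list(tlb))  # random replacement
        del tlb[victim]                    # write back bits if dirty
    tlb[vpage] = page_table[vpage]
```

Because this runs as ordinary kernel code, the OS is free to pick any replacement policy or table organization, which is exactly the flexibility the next slide credits to the software approach.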
26. Hardware vs. Software Controlled
- Hardware approach
  - Efficient
  - Inflexible
  - Needs more space for the page table
- Software approach
  - Flexible
  - Software can do mappings by hashing
    - PP -> (Pid, VP)
    - (Pid, VP) -> PP
  - Can deal with a large virtual address space
27. Cache vs. TLBs
- Similarities
  - Both cache a portion of memory
  - Both write back on a miss (when evicting an entry)
  - Combining the L1 cache with the TLB
    - Virtually addressed cache
    - Why wouldn't everyone use virtually addressed caches?
- Differences
  - Associativity
    - The TLB is usually fully associative
    - A cache can be direct-mapped
  - Consistency
    - The TLB does not deal with consistency with memory
    - The TLB can be controlled by software
28. Caches vs. TLBs
- Similarities
  - Both cache a portion of memory
  - Both read from memory on misses
- Differences
  - Associativity
    - TLBs are generally fully associative
    - Caches can be direct-mapped
  - Consistency
    - No TLB/memory consistency
    - Some TLBs are software-controlled
- Combining L1 caches with TLBs
  - Virtually addressed caches
  - Not always used: what are their drawbacks?
29. Issues
- Which TLB entry should be replaced?
  - Random
  - Pseudo-LRU
- What happens on a context switch?
  - Process tag: change the TLB registers and process register
  - No process tag: invalidate the entire TLB contents
- What happens when changing a page table entry?
  - Change the entry in memory
  - Invalidate the TLB entry
30. Consistency Issues
- Snoopy cache protocols can maintain consistency with DRAM, even when DMA happens
- No hardware maintains consistency between DRAM and TLBs: you need to flush the related TLB entries whenever you change a page table entry in memory
- On multiprocessors, when you modify a page table entry, you need to do a TLB shoot-down to flush all related TLB entries on all processors
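The shoot-down protocol can be sketched as below; real systems deliver inter-processor interrupts and wait for acknowledgements, which this simplified model elides.

```python
# TLB shoot-down sketch: after updating a PTE in memory, invalidate
# every CPU's cached copy of that translation before the change is
# considered complete.
NUM_CPUS = 4
tlbs = [dict() for _ in range(NUM_CPUS)]   # per-CPU TLB contents
page_table = {0: 5}                        # shared table; made-up entry

def update_pte(vpage, new_ppage):
    page_table[vpage] = new_ppage    # 1. change the entry in memory
    for tlb in tlbs:                 # 2. shoot down stale copies
        tlb.pop(vpage, None)         #    (real HW: IPI + local flush)
```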
31. Issues to Ponder
- Everyone is moving to hardware TLB management: why?
- Segmentation was/is a way of maintaining backward compatibility: how?
- For the hardware-inclined: what kind of hardware support is needed for everything we discussed today?