Title: Virtual Memory
1. Virtual Memory
2. Review: The memory hierarchy
- Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology
[Figure: memory hierarchy pyramid: Processor at the top, then L1, L2, Main Memory, Secondary Memory; access time increases with distance from the processor, and the (relative) size of the memory grows at each level]
3. Virtual memory
- Use main memory as a "cache" for secondary memory
- Allows efficient and safe sharing of memory among multiple programs
- Provides the ability to easily run programs larger than the size of physical memory
- Automatically manages the memory hierarchy (as one level)
- What makes it work? Again, the Principle of Locality: a program is likely to access a relatively small portion of its address space during any period of time
- Each program is compiled into its own address space, a "virtual" address space
- During run-time each virtual address must be translated to a physical address (an address in main memory)
4. IBM System/360 Model 67
5. VM simplifies loading and sharing
- Simplifies loading a program for execution by avoiding code relocation
- Address mapping allows programs to be loaded at any location in physical memory
- Simplifies shared libraries, since all sharing programs can use the same virtual addresses
- Relocation does not need special OS hardware support as in the past
6. Virtual memory motivation
- "Historically, there were two major motivations for virtual memory: to allow efficient and safe sharing of memory among multiple programs, and to remove the programming burden of a small, limited amount of main memory." (Patterson and Hennessy)
- "A system has been devised to make the core-drum combination appear to the programmer as a single level store, the requisite transfers taking place automatically." (Kilburn et al.)
7. Terminology
- Page: a fixed-size block of memory, 512-4096 bytes
- Segment: a contiguous block of memory of variable size
- Page fault: a page is referenced, but is not in memory
- Virtual address: the address seen by the program
- Physical address: the address seen by the cache or memory
- Memory mapping (or address translation): see next slide
8. Memory management unit
[Figure: the MMU sits between the processor and memory, taking addresses from the processor and sending translated addresses to memory]
9. Address translation
- A virtual address is translated to a physical address by a combination of hardware and software
[Figure: a 32-bit virtual address (VA) split into a virtual page number (bits 31..12) and a page offset (bits 11..0)]
- So each memory request first requires an address translation from the virtual space to the physical space
- A virtual memory miss (i.e., when the page is not in physical memory) is called a page fault
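The split above can be sketched in a few lines. This assumes the slide's 4 KB pages (12 offset bits) and a 32-bit address; the helper names and the sample page table are illustrative:

```python
# Splitting a virtual address into virtual page number (VPN) and page
# offset, then forming a physical address from a page-table mapping.
PAGE_OFFSET_BITS = 12
PAGE_SIZE = 1 << PAGE_OFFSET_BITS  # 4096 bytes

def split_virtual_address(va: int) -> tuple[int, int]:
    """Return (virtual page number, page offset) for a 32-bit VA."""
    vpn = va >> PAGE_OFFSET_BITS       # bits 31..12
    offset = va & (PAGE_SIZE - 1)      # bits 11..0
    return vpn, offset

def translate(vpn: int, page_table: dict, offset: int) -> int:
    """Form a physical address from the physical page number in the table."""
    ppn = page_table[vpn]              # a missing key would be a page fault
    return (ppn << PAGE_OFFSET_BITS) | offset

vpn, offset = split_virtual_address(0x00402ABC)
# vpn == 0x402, offset == 0xABC
pa = translate(vpn, {0x402: 0x7F}, offset)
# pa == 0x7FABC
```

Note that the page offset passes through translation unchanged; only the page number is mapped.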
10. Mapping virtual to physical space
[Figure: (a) a 64K virtual address space mapped onto a 32K main memory in 4K pages; (b) the corresponding virtual address to main memory address mapping]
11. A paging system
[Figure: virtual page numbers indexing a page table that points into physical memory]
The page table maps each page in virtual memory to either a page in physical memory or a page stored on disk, which is the next level in the hierarchy.
12. A virtual address cache (TLB)
The TLB acts as a cache on the page table for
the entries that map to physical pages only
13. Two Programs Sharing Physical Memory
- A program's address space is divided into pages (all one fixed size) or segments (variable sizes)
- The starting location of each page (either in main memory or in secondary memory) is contained in the program's page table
[Figure: the virtual address spaces of Program 1 and Program 2 both mapping into the same main memory]
14. Typical ranges of VM parameters
- These figures, contrasted with the values for
caches, represent increases of 10 to 100,000
times.
15. Some virtual memory design parameters
16. Technology
- Technology: access time / $ per GB in 2004
- SRAM: 0.5-5 ns / $4,000-$10,000
- DRAM: 50-70 ns / $100-$200
- Magnetic disk: 5-20 x 10^6 ns / $0.50-$2
17. Address Translation Considerations
- Direct mapping using register sets
- Indirect mapping using tables
- Associative mapping of frequently used pages
18. Fundamental considerations
- The Page Table (PT) must have one entry for each page in virtual memory!
- How many pages?
- How large is the PT?
19. 4 key design issues
- Pages should be large enough to amortize the high access time of disk. Sizes from 4 KB to 16 KB are typical, and some designers are considering sizes as large as 64 KB.
- Organizations that reduce the page fault rate are attractive. The primary technique used here is to allow flexible placement of pages (e.g., fully associative).
20. 4 key design issues (cont.)
- Page faults (misses) in a virtual memory system can be handled in software, because the overhead will be small compared to the access time to disk. Furthermore, the software can afford to use clever algorithms for choosing how to place pages, because even small reductions in the miss rate will pay for the cost of such algorithms.
- Using write-through to manage writes in virtual memory will not work, since writes take too long. Instead, we need a scheme that reduces the number of disk writes.
21. Page Size Selection Constraints
- Efficiency of the secondary memory device (slotted disk/drum)
- Page table size
- Page fragmentation: the last part of the last page is wasted
- Program logic structure: logical block size < 1K-4K
- Table fragmentation: a full PT can occupy a large, sparse space
- Uneven locality: text, globals, stack
- Miss ratio
22. An Example
- Case 1
- VM page size: 512
- VM address space: 64K
- Total virtual pages: 64K / 512 = 128 pages
23. An Example (cont.)
- Case 2
- VM page size: 512 = 2^9
- VM address space: 4G = 2^32
- Total virtual pages: 4G / 512 = 2^32 / 2^9 = 2^23 = 8M pages
- Each PTE has 32 bits, so total PT size = 8M x 4 = 32M bytes
- Note: assuming main memory holds a working set of 4M bytes, that is 4M / 512 = 2^22 / 2^9 = 2^13 = 8192 pages
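The arithmetic in both cases above can be checked mechanically; the function name here is illustrative:

```python
# Size of a flat (one-level) page table: one PTE per virtual page.
def flat_page_table_bytes(va_space: int, page_size: int, pte_bytes: int) -> int:
    num_pages = va_space // page_size   # total virtual pages
    return num_pages * pte_bytes

# Case 1: 64K address space, 512-byte pages -> 128 pages
assert (64 * 2**10) // 512 == 128

# Case 2: 4G address space, 512-byte pages, 4-byte PTEs -> 32 MB of PT
size = flat_page_table_bytes(4 * 2**30, 512, 4)
print(size // 2**20)  # 32 (MB), matching the slide's 8M pages x 4 bytes
```

The takeaway is that the flat table grows linearly with the address space, which motivates the size-reduction techniques on the later slide.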
24. An Example (cont.)
- How about:
- VM address space: 2^52 (R-6000), i.e., 4 petabytes
- Page size: 4K bytes
- So the total number of virtual pages is 2^52 / 2^12 = 2^40!
25. Techniques for Reducing PT Size
- Set a lower limit, and permit dynamic growth
- Permit growth from both directions (text, stack)
- Inverted page table (a hash table)
- Multi-level page table (segments and pages)
- The PT itself can be paged, i.e., put the PT itself in the virtual address space (note: some small portion of its pages should be in main memory and never paged out)
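A minimal sketch of the multi-level technique listed above, assuming an illustrative 10/10/12-bit split of a 32-bit address with 4 KB pages. Second-level tables are allocated only for regions actually in use, which is what shrinks the table:

```python
# Two-level page table: a sparse level-1 table of level-2 tables.
L1_BITS, L2_BITS, OFFSET_BITS = 10, 10, 12

class TwoLevelPageTable:
    def __init__(self):
        self.l1 = {}  # level-1 index -> level-2 table (dict)

    def map(self, vpn: int, ppn: int) -> None:
        i1, i2 = vpn >> L2_BITS, vpn & ((1 << L2_BITS) - 1)
        self.l1.setdefault(i1, {})[i2] = ppn   # allocate an L2 table on demand

    def lookup(self, va: int) -> int:
        vpn, offset = va >> OFFSET_BITS, va & ((1 << OFFSET_BITS) - 1)
        i1, i2 = vpn >> L2_BITS, vpn & ((1 << L2_BITS) - 1)
        l2 = self.l1.get(i1)
        if l2 is None or i2 not in l2:
            raise KeyError("page fault")       # no mapping present
        return (l2[i2] << OFFSET_BITS) | offset

pt = TwoLevelPageTable()
pt.map(vpn=0x401, ppn=0x12)
# Only one level-2 table exists, instead of 2**20 flat entries.
```

A flat table for this address space would need 2^20 entries up front; here each populated level-2 table covers only 2^10 pages of the space.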
26. LSI-11/73 Segment Registers
27. VM implementation issues
- Page fault handling: hardware, software, or both
- Efficient input/output: slotted drum/disk
- Queue management; a process can be linked on:
- CPU ready queue: waiting for the CPU
- Page-in queue: waiting for a page transfer from disk
- Page-out queue: waiting for a page transfer to disk
- Protection issues: read/write/execute
- Management bits: dirty, reference, valid
- Multiple program issues: context switch, timeslice end
28. Where to place pages
- Placement
- OS designers always pick lower miss rates over a simpler placement algorithm
- So, full associativity: VM pages can go anywhere in main memory (compare with a sector cache)
- Question: why not use associative hardware?
- (The number of PT entries is too big!)
29. How to handle protection and multiple users
- If s/u = 1: supervisor mode
- PME(x).C = 1: the page at the PFA has been modified
- PME(x).P = 1: the page is private to the process
- PME(x).pid: process identification number
- PME(x).PFA: page frame address
Virtual to real address translation using a page map
30. Page fault handling
- When a virtual page number is not in the TLB, the PT in memory is accessed (through the PTBR) to find the PTE
- Hopefully, the PTE is in the data cache
- If the PTE indicates that the page is missing, a page fault occurs
- If so, put the disk sector number and page number on the page-in queue and continue with the next process
- If all page frames in main memory are occupied, find a suitable one and put it on the page-out queue
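The sequence above can be sketched as follows. The PTE fields, queue shape, and function names are illustrative, not from any particular OS:

```python
# TLB lookup -> page-table walk -> page fault path, as on the slide.
from collections import deque

page_in_queue: deque = deque()   # (disk sector, virtual page number) pairs

def access(va, tlb, page_table, page_offset_bits=12):
    vpn = va >> page_offset_bits
    if vpn in tlb:                       # TLB hit: the fast path
        return tlb[vpn]
    pte = page_table.get(vpn)            # TLB miss: walk the PT in memory
    if pte is None or not pte["valid"]:  # page not resident: page fault
        page_in_queue.append((pte["disk_sector"] if pte else None, vpn))
        return None                      # the OS would run another process
    tlb[vpn] = pte["ppn"]                # refill the TLB with the translation
    return tlb[vpn]
```

A real handler would also evict a frame onto the page-out queue when memory is full, as the last bullet notes; that bookkeeping is omitted here.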
31. Fast address translation
- Translation through the PT requires at least two memory accesses for each memory fetch or store
- Improvements:
- Store the PT in fast registers (example: Xerox, 256 registers)
- Implement a VM address cache (TLB)
- Make maximal use of the instruction/data cache
32. Some typical values for a TLB might be
- Miss penalty: may sometimes be as high as 100 cycles
- TLB size: can be as small as 16 entries
33. TLB design issues
- Placement policy
- For small TLBs, fully associative placement can be used
- For large TLBs, fully associative placement may be too slow
- Replacement policy: a random policy is used for speed/simplicity
- The TLB miss rate is low (Clark-Emer data, 1985: 3-4 times smaller than the usual cache miss rate)
- The TLB miss penalty is relatively low: a miss usually results in a cache fetch
34. TLB design issues (cont.)
- A TLB miss implies a higher miss rate for the main cache
- TLB translation is process-dependent
- Strategies for context switching:
- 1. Tagging entries by context
- 2. Flushing: a complete purge by context (shared)
- There is no absolute answer
35. A Case Study: DECStation 3100
[Figure: DECStation 3100 address translation. The 32-bit virtual address splits into a 20-bit virtual page number (bits 31..12) and a 12-bit page offset (bits 11..0). The TLB translates the virtual page number into a 20-bit physical page number, asserting "TLB hit". The resulting physical address is split into a 16-bit tag, a 14-bit index, and a 2-bit byte offset for the cache (valid, tag, data fields), producing a "cache hit" signal and 32 bits of data]
36. DECStation 3100 TLB and cache
37. IBM System/360-67 memory management unit
- CPU cycle time: 200 ns; memory cycle time: 750 ns
38. IBM System/360-67 address translation
[Figure: the bus-out address from the CPU, Page (12 bits) + Offset (12 bits), is interpreted as part of a 32-bit virtual address of Segment (12) + Page (8) + Offset (12); Dynamic Address Translation (DAT) produces the bus-in address to memory, Page (12) + Offset (12)]
39. IBM System/360-67 associative registers
[Figure: associative registers map the VM page (12 bits) of the bus-out address from the CPU to the physical page (12 bits) of the bus-in address to memory; the offset (12 bits) passes through unchanged]
40. IBM System/360-67 segment/page mapping
[Figure: a 24-bit virtual address of Segment (4) + Page (8) + Offset (12). The Segment Table Register (32 bits) locates the segment table; each segment table entry points to a page table; each page table entry holds a physical page address plus V/R/W bits. V = valid bit, R = reference bit, W = write (dirty) bit]
41. Virtual addressing with a cache
- Thus it takes an extra memory access to translate a VA to a PA
- This makes memory (cache) accesses very expensive (if every access were really two accesses)
- The hardware fix is to use a Translation Lookaside Buffer (TLB): a small cache that keeps track of recently used address mappings, to avoid having to do a page table lookup
42. Making address translation fast
[Figure: a page table in physical memory with a valid bit per entry; valid entries hold a physical page base address in main memory, while invalid entries refer to disk storage]
43. Translation lookaside buffers (TLBs)
- Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped
- A TLB entry holds: valid bit, virtual page, physical page, and dirty, reference, and access bits
- TLB access time is typically smaller than cache access time (because TLBs are much smaller than caches)
- TLBs are typically not more than 128 to 256 entries, even on high-end machines
44. A TLB in the memory hierarchy
- On a TLB miss: is it a page fault or merely a TLB miss?
- If the page is loaded into main memory, then the TLB miss can be handled (in hardware or software) by loading the translation information from the page table into the TLB
- Takes 10s of cycles to find and load the translation info into the TLB
- If the page is not in main memory, then it's a true page fault
- Takes 1,000,000s of cycles to service a page fault
- TLB misses are much more frequent than true page faults
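A back-of-the-envelope model makes the asymmetry above concrete. The cycle counts follow the slide's rough figures (10s of cycles for a TLB refill, around 10^6 for a page fault); the miss rates are illustrative:

```python
# Average cycles per memory access with TLB misses and page faults.
def avg_access_cycles(tlb_miss_rate, page_fault_rate,
                      hit=1, tlb_refill=20, fault=1_000_000):
    return (hit
            + tlb_miss_rate * tlb_refill
            + page_fault_rate * fault)

# Frequent-but-cheap TLB misses barely hurt; rare page faults dominate:
print(avg_access_cycles(0.01, 0.0))       # about 1.2 cycles
print(avg_access_cycles(0.01, 0.00001))   # about 11.2 cycles
```

Even a one-in-100,000 page fault rate costs far more than a 1% TLB miss rate, which is why page fault rates, not TLB miss rates, drive VM design.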
45. Two Machines' Cache Parameters
46. TLB Event Combinations
47. TLB Event Combinations (cont.)
TLB | Page Table | Cache | Possible?
Hit | Hit | Hit | Yes: what we want!
Hit | Hit | Miss | Yes: although the page table is not checked if the TLB hits
Miss | Hit | Hit | Yes: TLB miss, PA in page table
Miss | Hit | Miss | Yes: TLB miss, PA in page table, but data not in cache
Miss | Miss | Miss | Yes: page fault
Hit | Miss | Hit/Miss | Impossible: TLB translation not possible if page is not present in memory
Miss | Miss | Hit | Impossible: data not allowed in cache if page is not in memory
48. Reducing Translation Time
- Can overlap the cache access with the TLB access
- Works when the high-order bits of the VA are used to access the TLB while the low-order bits are used as the index into the cache
[Figure: a 2-way associative cache accessed in parallel with the TLB. The VA tag goes to the TLB while the index and block offset select a set in each way; the PA tag produced on a "TLB hit" is compared against the tags of both ways to generate "cache hit" and select the desired word]
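The overlap above is only feasible when the cache index plus block-offset bits fit inside the page offset, since those bits are identical in the VA and PA and need no translation. A quick feasibility check under that assumption (sizes illustrative):

```python
# Can a cache be indexed in parallel with TLB translation?
def can_overlap(cache_bytes: int, associativity: int, page_bytes: int) -> bool:
    # Bits used to index the cache = log2(bytes per way).
    index_plus_offset_bits = (cache_bytes // associativity).bit_length() - 1
    page_offset_bits = page_bytes.bit_length() - 1
    # Overlap works only if the index bits come from the untranslated offset.
    return index_plus_offset_bits <= page_offset_bits

print(can_overlap(8 * 1024, 2, 4096))    # True: 4 KB per way fits in a page
print(can_overlap(32 * 1024, 2, 4096))   # False: 16 KB per way needs VA bits
```

This is one reason L1 caches are often kept small or made more associative: raising associativity shrinks the bytes per way, keeping the index inside the page offset.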
49. Why Not a Virtually Addressed Cache?
- A virtually addressed cache would only require address translation on cache misses
- But: two different virtual addresses can map to the same physical address (when processes are sharing data), i.e., two different cache entries hold data for the same physical address: synonyms
- Must update all cache entries with the same physical address, or the memory becomes inconsistent
50. The Hardware/Software Boundary
- Which parts of the virtual-to-physical address translation are done by, or assisted by, the hardware?
- Translation Lookaside Buffer (TLB) that caches the recent translations
- TLB access time is part of the cache hit time
- May allot an extra stage in the pipeline for TLB access
- Page table storage, fault detection, and updating
- Page faults result in interrupts (precise) that are then handled by the OS
- Hardware must support (i.e., update appropriately) the Dirty and Reference bits (e.g., for LRU) in the page tables
- Disk placement
- Bootstrap (e.g., out of disk sector 0) so the system can service a limited number of page faults before the OS is even loaded
51. Very little hardware, with software assist
The TLB acts as a cache on the page table for the entries that map to physical pages only
52. Summary
- The Principle of Locality:
- A program is likely to access a relatively small portion of the address space at any instant of time
- Temporal Locality: locality in time
- Spatial Locality: locality in space
- Caches, TLBs, and Virtual Memory can all be understood by examining how they deal with the four questions:
- Where can a block be placed?
- How is a block found?
- Which block is replaced on a miss?
- How are writes handled?
- Page tables map virtual addresses to physical addresses
- TLBs are important for fast translation