Title: Virtual Memory
1. Virtual Memory
2. Announcements
- Prelim coming up in one week
- In 203 Thurston, Thursday October 16th, 10:10-11:25pm, 1½ hours
- Topics: everything up to (and including) Thursday, October 9th
- Lectures 1-13, chapters 1-9, and 13 (8th ed)
- Review session will be this Thursday, October 9th
- Time and location TBD; possibly 6:30pm-7:30pm
- Nazrul's office hours changed for today
- 12:30pm-2:30pm in Upson 328
- Homework 3 due today, October 7th
- CS 4410 Homework 2 graded (solutions available via CMS)
- Mean 45 (stddev 5), high 50 out of 50
- Common problems:
- Q1: did not satisfy bounded waiting
- mutual exclusion was not violated
3. Homework 2, Question 1
- Version 1:
    CSEnter(int i):
        inside[i] = true
        while (inside[J]):
            if (turn == J):
                inside[i] = false
                while (turn == J) continue
                inside[i] = true
- Version 2:
    CSEnter(int i):
        inside[i] = true
        while (inside[J]):
            inside[i] = false
            while (turn == J) continue
            inside[i] = true
- CSExit(int i):
        turn = J
        inside[i] = false
4. Review: Multi-level Translation
- Illusion of a contiguous address space
- Physical reality:
- address space broken into segments or fixed-size pages
- Segments or pages spread throughout physical memory
- Could have any number of levels. Example (top level: segments)
- What must be saved/restored on context switch?
- Contents of top-level segment registers (for this example)
- Pointer to top-level table (page table)
5. Review: Two-Level Page Table
- Tree of page tables
- Tables fixed size (1024 entries)
- On context switch, save single PageTablePtr register
- Sometimes top-level page tables are called directories (Intel)
- Each entry called a (surprise!) Page Table Entry (PTE)
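A minimal sketch of the two-level walk, assuming the 10/10/12-bit address split used in these slides (the directory/table contents below are made up for illustration):

```python
# Two-level page-table translation sketch: virtual address splits into
# a 10-bit directory index, a 10-bit table index, and a 12-bit offset.
PAGE_SIZE = 1 << 12  # 4 KB pages

def translate(vaddr, directory):
    """Walk directory -> page table -> frame; None entries are invalid."""
    dir_index = (vaddr >> 22) & 0x3FF    # top 10 bits
    table_index = (vaddr >> 12) & 0x3FF  # next 10 bits
    offset = vaddr & 0xFFF               # bottom 12 bits
    table = directory[dir_index]
    if table is None:
        raise LookupError("page fault: directory entry invalid")
    frame = table[table_index]
    if frame is None:
        raise LookupError("page fault: PTE invalid")
    return frame * PAGE_SIZE + offset

# Tiny example: map virtual page (dir=1, table=2) to physical frame 7
directory = [None] * 1024
directory[1] = [None] * 1024
directory[1][2] = 7
vaddr = (1 << 22) | (2 << 12) | 0x123
print(hex(translate(vaddr, directory)))  # frame 7, offset 0x123 -> 0x7123
```

Only the single PageTablePtr (here, `directory`) needs saving on a context switch; everything else is reached through it.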
6. What is in a PTE?
- What is in a Page Table Entry (or PTE)?
- Pointer to next-level page table or to actual page
- Permission bits: valid, read-only, read-write, execute-only
- Example: Intel x86 architecture PTE
- Address in same format as previous slide (10, 10, 12-bit offset)
- Intermediate page tables called Directories
- P: Present (same as valid bit in other architectures)
- W: Writeable
- U: User accessible
- PWT: Page write transparent: external cache write-through
- PCD: Page cache disabled (page cannot be cached)
- A: Accessed: page has been accessed recently
- D: Dirty (PTE only): page has been modified recently
- L: Large page, 4MB (directory only); bottom 22 bits of virtual address serve as offset
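The flags above occupy the low bits of a 32-bit x86 PTE (P is bit 0, W bit 1, U bit 2, PWT bit 3, PCD bit 4, A bit 5, D bit 6, L bit 7); the sample PTE value here is made up:

```python
# Decode the x86 PTE flag bits listed on the slide.
FLAGS = [("P", 0), ("W", 1), ("U", 2), ("PWT", 3),
         ("PCD", 4), ("A", 5), ("D", 6), ("L", 7)]

def decode_pte(pte):
    frame = pte >> 12                                   # top 20 bits: frame number
    bits = {name: bool(pte >> bit & 1) for name, bit in FLAGS}
    return frame, bits

frame, bits = decode_pte(0x00007067)                    # frame 7, flags P|W|U|A|D
print(frame, [name for name, on in bits.items() if on])
```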
7. Examples of how to use a PTE
- How do we use the PTE?
- Invalid PTE can imply different things:
- Region of address space is actually invalid, or
- Page/directory is just somewhere other than memory
- Validity checked first
- OS can use the other (say) 31 bits for location info
- Usage example: Demand paging
- Keep only active pages in memory
- Place others on disk and mark their PTEs invalid
- Usage example: Copy on write
- UNIX fork gives copy of parent address space to child
- Address spaces disconnected after child created
- How to do this cheaply?
- Make copy of parent's page tables (point at same memory)
- Mark entries in both sets of page tables as read-only
- Page fault on write creates two copies
- Usage example: Zero fill on demand
- New data pages must carry no information (say, be zeroed)
- Mark PTEs as invalid; page fault on use gets zeroed page
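The copy-on-write case can be sketched in a few lines; the structures here (dict PTEs, bytearray frames, a frame reference count) are made up for illustration, not any real kernel's layout:

```python
# Copy-on-write sketch: fork shares the frame read-only; the first
# write fault gives the faulting side a private copy.
frames = {}            # frame number -> bytearray ("physical memory")
refcount = {}          # frame -> number of page tables sharing it
next_frame = [0]

def cow_fork(pte):
    """Share the parent's frame read-only with the child."""
    pte["writable"] = False
    refcount[pte["frame"]] = refcount.get(pte["frame"], 1) + 1
    return dict(pte)

def cow_write_fault(pte):
    """On a write fault, copy the shared frame if others still use it."""
    f = pte["frame"]
    if refcount.get(f, 1) > 1:
        new = next_frame[0]; next_frame[0] += 1
        frames[new] = bytearray(frames[f])   # private copy made lazily
        refcount[f] -= 1
        pte["frame"] = new
        refcount[new] = 1
    pte["writable"] = True

# Example: parent page in frame 0; child forks, then writes.
next_frame[0] = 1
frames[0] = bytearray(b"hello")
refcount[0] = 1
parent = {"frame": 0, "writable": True}
child = cow_fork(parent)
cow_write_fault(child)                       # child gets its own frame
frames[child["frame"]][:] = b"HELLO"
print(frames[0], frames[child["frame"]])
```

The parent's view is untouched; only the writer pays for the copy.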
8. How is the translation accomplished?
- What, exactly, happens inside the MMU?
- One possibility: Hardware tree traversal
- For each virtual address, takes the page table base pointer and traverses the page table in hardware
- Generates a page fault if it encounters an invalid PTE
- Fault handler will decide what to do (more on this next lecture)
- Pros: Relatively fast (but still many memory accesses!)
- Cons: Inflexible, complex hardware
- Another possibility: Software
- Each traversal done in software
- Pros: Very flexible
- Cons: Every translation must invoke a fault!
- In fact, need a way to cache translations for either case!
9. Caching Concept
- Cache: a repository for copies that can be accessed more quickly than the original
- Make the frequent case fast and the infrequent case less dominant
- Caching underlies many of the techniques used today to make computers fast
- Can cache memory locations, address translations, pages, file blocks, file names, network routes, etc.
- Only good if:
- Frequent case is frequent enough, and
- Infrequent case is not too expensive
- Important measure: Average Access Time = (Hit Rate x Hit Time) + (Miss Rate x Miss Time)
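A quick worked instance of the average-access-time formula, with made-up numbers (1 ns hit, 100 ns miss, 98% hit rate):

```python
# Average access time = hit_rate * hit_time + miss_rate * miss_time
def amat(hit_rate, hit_time, miss_time):
    return hit_rate * hit_time + (1 - hit_rate) * miss_time

print(amat(0.98, 1.0, 100.0))  # 0.98 + 2.0 = 2.98 ns
```

Note how the 2% miss rate still dominates: the infrequent case must not be too expensive.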
10. Why Bother with Caching?
[Figure: Processor-DRAM memory gap (latency). Log-scale performance (1 to 1000) vs. time (1980-2000): processor performance climbs steeply (Moore's Law, really Joy's Law) while DRAM latency improves slowly ("Less' Law?"), so the gap widens every year.]
11. Another Major Reason to Deal with Caching
- Too expensive to translate on every access
- At least two DRAM accesses per actual DRAM access
- Or perhaps I/O, if the page table is partially on disk!
- Even worse problem: what if we are using caching to make memory access faster than DRAM access?
- Solution? Cache translations!
- Translation cache: TLB (Translation Lookaside Buffer)
12. Why Does Caching Help? Locality!
- Temporal locality (locality in time)
- Keep recently accessed data items closer to the processor
- Spatial locality (locality in space)
- Move contiguous blocks to the upper levels
13. Review: Memory Hierarchy of a Modern Computer System
- Take advantage of the principle of locality to:
- Present as much memory as in the cheapest technology
- Provide access at the speed offered by the fastest technology
14. A Summary on Sources of Cache Misses
- Compulsory (cold start): first reference to a block
- Cold fact of life: not a whole lot you can do about it
- Note: when running billions of instructions, compulsory misses are insignificant
- Capacity
- Cache cannot contain all blocks accessed by the program
- Solution: increase cache size
- Conflict (collision)
- Multiple memory locations mapped to the same cache location
- Solutions: increase cache size, or increase associativity
- Two others:
- Coherence (invalidation): other process (e.g., I/O) updates memory
- Policy: due to non-optimal replacement policy
15. Review: Where does a Block Get Placed in a Cache?
- Example: Block 12 placed in an 8-block cache
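The block-12 example can be computed directly for the three standard placement schemes (the 2-way choice below is an assumption; the slide's figure may use a different associativity):

```python
# Where can block 12 go in an 8-block cache?
BLOCK, CACHE_BLOCKS = 12, 8

# Direct mapped: exactly one slot, (block number mod number of blocks)
direct = BLOCK % CACHE_BLOCKS                      # slot 4

# 2-way set associative: one set of 2 slots, (block number mod number of sets)
sets = CACHE_BLOCKS // 2
set_index = BLOCK % sets                           # set 0 -> slots 0 and 1
set_assoc = [set_index * 2, set_index * 2 + 1]

# Fully associative: any slot at all
fully = list(range(CACHE_BLOCKS))

print(direct, set_assoc, fully)
```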
16. Other Caching Questions
- What line gets replaced on a cache miss?
- Easy for direct mapped: only one possibility
- Set associative or fully associative:
- Random
- LRU (Least Recently Used)
- What happens on a write?
- Write through: the information is written to both the cache and the block in the lower-level memory
- Write back: the information is written only to the block in the cache
- Modified cache block is written to main memory only when it is replaced
- Question: is the block clean or dirty?
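The write-back behavior above can be sketched with a dirty bit (the dict-based cache here is a toy, not a real cache organization):

```python
# Write-back sketch: writes only dirty the cached copy; memory is
# updated when the dirty block is evicted.
memory = {0: "old"}
cache = {}                               # addr -> [value, dirty_bit]

def write(addr, value):
    cache[addr] = [value, True]          # mark dirty; memory untouched

def evict(addr):
    value, dirty = cache.pop(addr)
    if dirty:
        memory[addr] = value             # flush modified block on replacement

write(0, "new")
print(memory[0])   # still "old": the write has not reached memory yet
evict(0)
print(memory[0])   # "new": flushed on eviction
```

Write-through would instead update `memory[addr]` inside `write`, trading extra memory traffic for never having dirty blocks.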
17. Caching Applied to Address Translation
[Figure: CPU issues a virtual address; if the translation is cached in the TLB, go straight to physical memory; otherwise translate via the MMU and fill the TLB.]
- Question is one of page locality: does it exist?
- Instruction accesses spend a lot of time on the same page (since accesses are sequential)
- Stack accesses have definite locality of reference
- Data accesses have less page locality, but still some
- Can we have a TLB hierarchy?
- Sure: multiple levels at different sizes/speeds
18. What Actually Happens on a TLB Miss?
- Hardware-traversed page tables:
- On TLB miss, hardware in the MMU looks at the current page table to fill the TLB (may walk multiple levels)
- If PTE valid, hardware fills the TLB and the processor never knows
- If PTE marked as invalid, causes a page fault, and the kernel decides what to do
- Software-traversed page tables (like MIPS):
- On TLB miss, processor receives a TLB fault
- Kernel traverses the page table to find the PTE
- If PTE valid, fills the TLB and returns from the fault
- If PTE marked as invalid, internally calls the page fault handler
- Most chipsets provide hardware traversal
- Modern operating systems tend to have more TLB faults since they use translation for many things
- Examples:
- shared segments
- user-level portions of an operating system
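The software-traversed (MIPS-style) path can be sketched as follows; the one-level page table and its contents are made up for illustration:

```python
# Software TLB refill sketch: on a miss, the kernel walks the page
# table and fills the TLB; an invalid PTE escalates to a page fault.
page_table = {0: 32, 2: 177}   # virtual page -> frame; absent = invalid
tlb = {}

def lookup(vpn):
    if vpn in tlb:
        return tlb[vpn]                  # TLB hit: fast hardware path
    frame = page_table.get(vpn)          # TLB fault: kernel walks the table
    if frame is None:
        raise LookupError(f"page fault on vpn {vpn}")
    tlb[vpn] = frame                     # fill TLB, return from fault
    return frame

print(lookup(2))   # miss, refilled from the page table: 177
print(lookup(2))   # now a TLB hit: 177
```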
19. Goals for Today
- Virtual memory
- How does it work?
- Page faults
- Resuming after page faults
- When to fetch?
- What to replace?
- Page replacement algorithms
- FIFO, OPT, LRU (Clock)
- Page buffering
- Allocating pages to processes
20. What is virtual memory?
- Each process has the illusion of a large address space
- 2^32 bytes for 32-bit addressing
- However, physical memory is much smaller
- How do we give this illusion to multiple processes?
- Virtual memory: some addresses reside on disk
21. Virtual Memory
- Separates users' logical memory from physical memory
- Only part of the program needs to be in memory for execution
- Logical address space can therefore be much larger than physical address space
- Allows address spaces to be shared by several processes
- Allows for more efficient process creation
22. Virtual Memory
- Load entire process in memory (swapping), run it, exit
- Slow (for big processes)
- Wasteful (might not require everything)
- Solution: partial residency
- Paging: bring in individual pages, not the whole process
- Demand paging: bring in only the pages that are required
- Where to fetch a page from?
- Have a contiguous space on disk: swap file (pagefile.sys)
23. How does VM work?
- Modify page tables with another bit (valid)
- If page in memory, valid = 1; else valid = 0
- If page is in memory, translation works as before
- If page is not in memory, translation causes a page fault
- Example page table (frame, valid bit):
    VPN 0: frame 32,   V=1 (in memory)
    VPN 1: frame 4183, V=0
    VPN 2: frame 177,  V=1 (in memory)
    VPN 3: frame 5721, V=0
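The valid-bit check can be sketched directly with the slide's example table (4 KB pages assumed):

```python
# Translation with a valid bit: V=0 entries raise a page fault.
page_table = [(32, 1), (4183, 0), (177, 1), (5721, 0)]  # (frame, valid)
PAGE_SIZE = 4096

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    frame, valid = page_table[vpn]
    if not valid:
        raise LookupError(f"page fault on vpn {vpn}")
    return frame * PAGE_SIZE + offset

print(translate(2 * PAGE_SIZE + 100))   # vpn 2 is resident in frame 177
try:
    translate(1 * PAGE_SIZE)            # vpn 1 is not resident
except LookupError as e:
    print(e)
```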
24. Page Faults
- On a page fault:
- OS finds a free frame, or evicts one from memory (which one?)
- Want knowledge of the future?
- Issues disk request to fetch data for the page (what to fetch?)
- Just the requested page, or more?
- Block current process, context switch to new process (how?)
- Process might be executing an instruction
- When disk completes, set valid bit to 1, and put current process in ready queue
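The steps above can be sketched end to end; everything here (the dict "disk", the placeholder eviction policy) is a simplified stand-in, not a real OS path:

```python
# Page fault handling sketch: pick a frame (free or evicted),
# fetch the page from backing store, then mark it resident.
free_frames = [7]
resident = {}                  # vpn -> frame (valid pages)
disk = {5: b"page five data"}  # backing store contents
memory = {}                    # frame -> contents

def handle_page_fault(vpn):
    if free_frames:
        frame = free_frames.pop()
    else:
        victim = next(iter(resident))    # placeholder policy: evict any page
        frame = resident.pop(victim)     # (slides discuss better choices)
    memory[frame] = disk[vpn]            # "disk I/O" completes here
    resident[vpn] = frame                # set valid bit: page now in memory
    return frame

print(handle_page_fault(5), memory[7])
```

In a real kernel the faulting process blocks during the disk I/O and re-runs the faulting instruction afterward; the next slides cover restarting.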
25. Steps in Handling a Page Fault
26. Resuming after a page fault
- Should be able to restart the instruction
- For RISC processors this is simple:
- Instructions are idempotent until references are done
- More complicated for CISC:
- E.g., move 256 bytes from one location to another
- Possible solutions:
- Ensure pages are in memory before the instruction executes
27. Page Fault (Cont.)
- Restart instruction
- block move
- auto increment/decrement location
28. When to fetch?
- Just before the page is used!
- Need to know the future
- Demand paging:
- Fetch a page when it faults
- Prepaging:
- Get the page on fault plus some of its neighbors, or
- Get all pages in use the last time the process was swapped
29. Performance of Demand Paging
- Page fault rate p: 0 <= p <= 1.0
- if p = 0, no page faults
- if p = 1, every reference is a fault
- Effective Access Time (EAT):
- EAT = (1 - p) x memory access
       + p x (page fault overhead
              + swap page out
              + swap page in
              + restart overhead)
30. Demand Paging Example
- Memory access time = 200 nanoseconds
- Average page-fault service time = 8 milliseconds
- EAT = (1 - p) x 200 + p x 8 milliseconds
      = (1 - p) x 200 + p x 8,000,000
      = 200 + p x 7,999,800
- If one access out of 1,000 causes a page fault:
- EAT = 8.2 microseconds
- This is a slowdown by a factor of 40!!
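Checking the slide's arithmetic directly:

```python
# EAT for demand paging: 200 ns memory access, 8 ms fault service.
def eat(p, mem_ns=200, fault_ns=8_000_000):
    return (1 - p) * mem_ns + p * fault_ns

e = eat(1 / 1000)
print(e)   # 8199.8 ns ~= 8.2 microseconds, about 40x slower than 200 ns
```

Even a one-in-a-thousand fault rate is dominated by the 8 ms disk service time, which is why replacement policy matters so much.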
31. What to replace?
- What happens if there is no free frame?
- Find some page in memory that is not really in use, and swap it out
- Page replacement:
- When a process has used up all the frames it is allowed to use
- OS must select a page to eject from memory to allow a new page in
- The page to eject is selected using the page replacement algorithm
- Goal: select the page that minimizes future page faults
32. Page Replacement
- Prevent over-allocation of memory by modifying the page-fault service routine to include page replacement
- Use the modify (dirty) bit to reduce overhead of page transfers: only modified pages are written to disk
- Page replacement completes the separation between logical memory and physical memory: a large virtual memory can be provided on a smaller physical memory
33. Page Replacement
34. Page Replacement Algorithms
- Random: pick any page to eject at random
- Used mainly for comparison
- FIFO: the page brought in earliest is evicted
- Ignores usage
- Suffers from Belady's Anomaly
- Fault rate can increase when the number of frames increases
- E.g., 0 1 2 3 0 1 4 0 1 2 3 4 with frame sizes 3 and 4
- OPT: Belady's algorithm
- Select the page not used for the longest time (in the future)
- LRU: evict the page that hasn't been used for the longest time
- The past can be a good predictor of the future
35. First-In-First-Out (FIFO) Algorithm
- Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
- 3 frames (3 pages can be in memory at a time per process): 9 page faults
- 4 frames: 10 page faults
- Belady's Anomaly: more frames → more page faults
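A short simulation reproduces the slide's counts and Belady's Anomaly on this reference string:

```python
# FIFO page replacement: evict the page that arrived earliest.
from collections import deque

def fifo_faults(refs, nframes):
    frames, order, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.discard(order.popleft())   # evict oldest arrival
            frames.add(page)
            order.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3), fifo_faults(refs, 4))  # 9 10
```

Adding a frame made things worse: FIFO ignores usage, so there is no guarantee more memory helps.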
36. FIFO Illustrating Belady's Anomaly
37. Optimal Algorithm
- Replace the page that will not be used for the longest period of time
- 4 frames example:
- 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5: 6 page faults
- How do you know this (the future)?
- Used for measuring how well your algorithm performs
38. Least Recently Used (LRU) Algorithm
- Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
- 4 frames: 8 page faults
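Simulating OPT and LRU on the same reference string (4 frames) checks the counts on these two slides and shows LRU sitting between FIFO (10) and OPT (6):

```python
# OPT: evict the resident page whose next use is farthest in the future.
def opt_faults(refs, nframes):
    frames, faults = set(), 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            def next_use(p):
                rest = refs[i + 1:]
                return rest.index(p) if p in rest else len(refs)
            frames.discard(max(frames, key=next_use))
        frames.add(page)
    return faults

# LRU: evict the page unused for the longest time (recency stack).
def lru_faults(refs, nframes):
    stack, faults = [], 0              # most recently used at the end
    for page in refs:
        if page in stack:
            stack.remove(page)
        else:
            faults += 1
            if len(stack) == nframes:
                stack.pop(0)           # evict least recently used
        stack.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(opt_faults(refs, 4), lru_faults(refs, 4))  # 6 8
```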
39. Implementing Perfect LRU
- On reference: timestamp each page
- On eviction: scan for the oldest frame
- Problems:
- Large page lists
- Timestamps are costly
- Approximate LRU
- LRU is already an approximation!
40. LRU Clock Algorithm
- Each page has a reference bit
- Set on use, reset periodically by the OS
- Algorithm: FIFO + reference bit (keep pages in a circular list)
- Scan: if ref bit is 1, set it to 0 and proceed; if ref bit is 0, stop and evict
- Problem: low accuracy for large memory
[Figure: circular list of pages with reference bits R=0/1; the clock hand sweeps, clearing bits until it finds a page with R=0]
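A minimal sketch of the scan above (the `Clock` class and its list representation are made up for illustration; real kernels hang this off the frame table):

```python
# Clock (second-chance) sketch: referenced pages get their bit cleared
# and survive one sweep; the first page found with ref bit 0 is evicted.
class Clock:
    def __init__(self, pages):
        self.pages = [[p, 0] for p in pages]   # circular list of [page, ref_bit]
        self.hand = 0

    def reference(self, page):
        for entry in self.pages:
            if entry[0] == page:
                entry[1] = 1                   # set on use

    def evict(self):
        while True:
            page, ref = self.pages[self.hand]
            if ref == 0:
                self.pages.pop(self.hand)      # stop and evict
                return page
            self.pages[self.hand][1] = 0       # second chance: clear and move on
            self.hand = (self.hand + 1) % len(self.pages)

c = Clock(["A", "B", "C"])
c.reference("A")        # A was used, so it survives the sweep
print(c.evict())        # B is the first page found with ref bit 0
```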
41. LRU with large memory
- Solution: add another clock hand
- Leading edge clears ref bits
- Trailing edge evicts pages with ref bit 0
- What if the angle between the hands is small?
- What if the angle is big?
42. Clock Algorithm Discussion
- Sensitive to sweeping interval
- Fast: lose usage information
- Slow: all pages look used
- Clock: add more bits
- Could use (ref bit, modified bit) as an ordered pair
- Might have to scan all pages
- LFU: remove the page with the lowest count
- Doesn't track when the page was referenced
- Use multiple bits; shift right by 1 at regular intervals
- MFU: remove the most frequently used page
- LFU and MFU do not approximate OPT well
43. Page Buffering
- Cute, simple trick (XP, 2K, Mach, VMS)
- Keep a list of free pages
- Track which page each free page corresponds to
- Periodically write modified pages, and reset the modified bit
- Maintain two lists: an unmodified free list, and a modified list (batch the writes for speed)
44. Allocating Pages to Processes
- Global replacement
- Single memory pool for the entire system
- On page fault, evict the oldest page in the system
- Problem: protection
- Local (per-process) replacement
- Have a separate pool of pages for each process
- Page fault in one process can only replace pages from its own process
- Problem: might have idle resources
45. Allocation of Frames
- Each process needs a minimum number of pages
- Example: IBM 370 needs 6 pages to handle the SS MOVE instruction:
- instruction is 6 bytes, might span 2 pages
- 2 pages to handle from
- 2 pages to handle to
- Two major allocation schemes:
- fixed allocation
- priority allocation
46. Summary
- Demand paging:
- Treat memory as a cache for the disk
- Cache miss → get page from disk
- Transparent level of indirection:
- User program is unaware of activities of the OS behind the scenes
- Data can be moved without affecting application correctness
- Replacement policies:
- FIFO: place pages on a queue, replace the page at the end
- OPT: replace the page that will be used farthest in the future
- LRU: replace the page that hasn't been used for the longest time
- Clock algorithm: approximation to LRU
- Arrange all pages in a circular list
- Sweep through them, marking as not in use
- If a page is not used for one pass, then it can be replaced