Title: School of Computing Science
1- School of Computing Science
- Simon Fraser University
- CMPT 300 Operating Systems I
- Ch 9 Virtual Memory
- Dr. Mohamed Hefeeda
-
2Objectives
- Understand virtual memory system, its benefits,
and mechanisms that make it feasible - Demand paging
- Page-replacement algorithms
- Frame allocation
- Locality and working-set models
- Understand how the kernel memory is allocated and
used
3Background
- Virtual memory separation of user logical
memory from physical memory - Only part of the program needs to be in memory
for execution - Logical address space can therefore be much
larger than physical address space - Allows address spaces to be shared by several
processes - Allows for more efficient process creation
- Virtual memory can be implemented via
- Demand paging
- Demand segmentation
4Virtual Memory That is Larger Than Physical Memory
?
5Demand Paging
- The core enabling idea of virtual memory systems
- A page is brought into memory only when needed
- Why?
- Less I/O needed
- Less memory needed
- Faster response
- More processes can be admitted to the system
- How?
- Process generates logical (virtual) addresses
which are mapped to physical addresses using a
page table - If the requested page is not in memory, kernel
brings it from hard disk - How do we know whether a page is in memory?
6Valid-Invalid Bit
- Each page table entry has a validinvalid bit
- v ? in-memory,
- i ? not-in-memory
- Initially, it is set to i on for entries
- During address translation, if validinvalid bit
is i, then it could be - Illegal reference (outside process address
space) ? abort process - Legal reference but not in memory ? page fault
(bring the page from disk)
7Handling Page Fault
- OS looks at another table to decide
- Invalid reference ? abort
- Just not in memory ? bring it in
- Get empty frame
- Swap page into frame (I/O operation)
- Reset tables
- Set validation bit v
- Restart the instruction that caused page fault
8Handling a Page Fault (contd)
- Restarting an instruction e.g., C ? A B
- assume page fault when accessing C
- bring page that has C in memory (I/O ? process
may be suspended) - fetch ADD instruction (again)
- fetch A, B (again)
- do the addition (again)
- then and store in C
- Restarting an instruction can be complicated,
e.g., - MVC (Move Character) instruction in IBM 360/370
systems - Can move up to 256 bytes from one location to
another, possibly overlapping - Page fault may occur in the middle of copying ?
- some data may be overwritten ?
- simply restarting instruction is not enough (data
has been modified) - Solution hardware attempts to access both ends
of both blocks if any is not in memory, a page
fault occurs before executing instruction - Bottom line demanding paging may raise subtle
problems and they must be addressed
9Performance of Demand Paging
- Page Fault Rate 0 ? p ? 1.0
- if p 0 means no page faults
- if p 1, means every reference is a fault
- Effective Access Time (EAT)
- EAT (1 p) x memory access time
- p x (page fault time)
- Page fault time service page-fault interrupt
(microseconds) - read in
requested page (milliseconds) - restart process
(microseconds) -
- Note reading in requested page may require
writing another page to disk if there is no free
frame
10Demand Paging Example
- Memory access time 200 nanoseconds
- Average page fault time 8 milliseconds
- (disk latency, seek and transfer time)
- EAT (1 p) x 200 p (8 milliseconds)
- (1 p) x 200 p x 8,000,000
- 200 p x 7,999,800
- If one access out of 1,000 causes a page fault,
then - EAT 8.2 microseconds.
- This is a slowdown by a factor of 40!!
- Bottom line We should minimize number of page
faults they are very costly
11Virtual Memory and Process Creation
- VM allows faster/efficient process creation using
- Copy-on-Write (COW) technique
- COW allows both parent and child processes to
initially share the same pages in memory (during
fork()) - If either process modifies a shared page, page is
copied
Copy of C
12Page Replacement
- Page fault occurs ? need to bring requested page
in memory - Find location of the requested page on disk
- Find a free frame
- If there is a free frame, use it
- If there is no free frame, use a page
replacement algorithm to select a victim frame - Bring requested page into the free frame
- Update the page table and free frame list
- Restart the process
13Page Replacement (contd)
- Note
- we can save swap out overhead if victim page was
NOT modified - ? significant savings (I/O operation)
- We associate a dirty (modify) bit with each page
to indicate whether a page has been modified
14Page Replacement Algorithms
- Objective minimize page-fault rate
- Algorithm evaluation
- Take a particular string of memory references,
and - Compute number of page faults on that string
- The reference string looks like
- 1, 2, 3, 4, 1, 2, 5, 1, 2, 3,
4, 5 - Notes
- We use page numbers
- The address sequence could have been 100, 250,
270, 301, 490, , Assuming a page of size 100
bytes. - References 250 and 270 are in the same page (2)
only the first one may cause a page fault. It is
why we mention 2 only once
15Page Faults vs. Number of Frames
- We expect number of page faults decreases as
number of physical frames allocated to process
increases
16Page Replacement First-In-First-Out (FIFO)
- Reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3,
4, 5 - 3 frames (3 pages can be in memory at any time)
- Let us work it out
- On every page fault, we show memory contents
- Number of page faults 9
- Pros
- Easy to understand and implement
- Cons
- Performance may not always be good
- It may replace a page that is used heavily (e.g.,
one that has a variable which is accessed most of
the time) - It suffers from Beladys anomaly
17FIFO Beladys Anomaly
- Assume reference string
- 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
- If we have 3 frames, how many page faults?
- 9 page faults
- If we have 4 frames, how many page faults?
- 10 page faults
- More frames are supposed to result in fewer page
faults! - Beladys Anomaly more frames ? more page faults
18FIFO Beladys Anomaly (contd)
19Optimal Algorithm
- Replace page that will not be used for longest
period of time - 4-frame example 1, 2, 3, 4, 1, 2, 5,
1, 2, 3, 4, 5 - 6 page faults
- How can we know the future? We cannot!
- Used for comparing algorithms
20Least Recently Used (LRU) Algorithm
- Try to approximate Optimal policy look at past
to infer future - LRU Replace page that has not been used for
longest period - Rational this page may not be needed anymore
(e.g., pages of initialization module) - 4-frame example 1, 2, 3, 4, 1, 2,
5, 1, 2, 3, 4, 5 - 8 page faults (compare to optimal 6, FIFO 10)
- LRU and Optimal do not suffer from Beladys
anomaly
21LRU Implementation Counters
- Every page-table entry has a time-of-use
(counter) field - When page is referenced, copy CPU logical clock
into this field - CPU clock is maintained in a register and
incremented with every memory access - Need to replace a page, search for the page with
smallest (oldest) value - Cons
- search time, updating the time-of-use fields
(writing to memory!), clock overflow - Need hardware support (increment clock and update
time-of-use field)
22 LRU Implementation Stack
- Keep stack of page numbers in a doubly-linked
list - If a page is referenced, move it to the top
- The least recently used page sinks to the bottom
- Cons
- Each memory reference is a bit expensive
(requires updating 6 pointers in worst case) - Pros
- No search for replacement
- Also needs hardware support to update the stack
23 LRU Implementation (contd)
- Can we implement LRU without hardware support?
- Say by using interrupts, i.e., when hardware
needs to update the stack or the counters, it
issues an interrupt and an ISR does the update? - NO. Too costly, it will slow every memory
reference by a factor of at least 10 - Even LRU (which approximates OPT) is not easy to
implement without hardware support!
24Second-chance (Clock) Replacement
- An approximation of LRU, aka Clock replacement
- Each page has a reference bit (ref_bit),
initially 0 - When page is referenced, ref_bit is set to 1 (by
hardware) - Maintain a moving pointer to the next (candidate)
victim - When choosing a page to replace, check the
ref_bit of the victim, - if ref_bit 0, replace it
- else set ref_bit to 0
- leave page in memory (give it another chance),
- move pointer to next page,
- repeat till a victim is found
25Second-Chance (Clock) Replacement
26Counting Replacement Algorithms
- Keep a counter of number of references that have
been made to each page - LFU Algorithm replace page with smallest count
- Argument page with smallest count is not used
often - Problem some pages were heavily used at earlier
time, but are no longer needed, will stay in (and
waste) memory - MFU Algorithm replace page with highest count
- Argument page with the smallest count was
probably just brought in and has yet to be used - Problem consider a code that uses a module or a
subroutine heavily, MFU will consider it a good
candidate for eviction!
27Counting Replacement Algorithms (contd)
- LFU vs. MFU
- Consider the following example
- A database code that reads many pages then
processes them - Which policy (LFU or MFU) would perform better?
- MFU Even though the read module accumulated
large frequency, we need to evict its pages
during processing
28Commercial Ad
- CMPT 371 Computer Networks (Spring 2007)
- Internet Real networks, real Protocols
- Lost of fun Projects (ALL in JAVA)
- Multi-threaded web server
- Ping client
- Reliable Data Transfer Protocol (part of TCP)
- Routing Protocol (RIP, used by many routers)
- Network measurements and analysis experiments
- http//nsl.cs.surrey.sfu.ca/teaching/07/371/
29Allocation of Frames
- Each process needs a minimum number of pages
- Defined by the computer architecture (hardware)
- instruction width and number of address
indirection levels - Consider an instruction that takes one operand
and allows one level of indirection. What is the
minimum number of frames needed to execute it? - load addr
- Answer 3 (load is in a page, addr is in
another, addr is in a third) - Note Maximum number of frames allocated to a
process is determined by the OS
30Frame Allocation
- Equal allocation All processes get the same
number of frames - m frames, n processes ? each process gets m/n
frames - Proportional allocation Allocate according to
the size of process
- Priority Use proportional allocation using
priorities rather than size
31Global vs. Local Frame Replacement
- If a page fault occurs and there is no free
frame, we need to free one. Two ways - Global replacement
- Process selects a replacement frame from the set
of all frames one process can take a frame from
another - Commonly used in operating systems
- Pros
- Better throughput (process can use any available
frame) - Cons
- A process cannot control its own page-fault rate
32Global vs. Local Frame Replacement (contd)
- Local replacement
- Each process selects from only its own set of
allocated frames - Pros
- Each process has its own share of frames not
impacted by the paging behavior of others - Cons
- A process may suffer from high page-fault rate
even though there are lightly used frames
allocated to other processes
33Thrashing
- What happens if a process does not have enough
frames to maintain its active set of pages in
memory? - Page-fault rate is very high. This leads to
- low CPU utilization, which
- makes the OS think that it needs to increase the
degree of multiprogramming, thus - OS admits another process to the system (making
it worse!) - Thrashing ? a process is busy swapping pages in
and out more than executing
34Thrashing (cont'd)
35Thrashing (contd)
- To prevent thrashing, we should provide each
process with as many frames as it needs - How do we know how many frames a process actually
needs? - A program is usually composed of several
functions or modules - When executing a function, memory references are
made to instructions and local variables of that
function and some global variables - So, we may need to keep in memory only the pages
needed to execute the function - After finishing a function, we execute another.
Then, we bring in pages needed by the new
function - This is called the Locality Model
36Locality Model
- The Locality Model states that
- As a process executes, it moves from locality to
locality, where a locality is a set of pages that
are actively used together - Notes
- locality is not restricted to functions/modules
it is more general. It could be a segment of code
in a function, e.g., loop touching
data/instructions in several pages - Localities may overlap
- Locality is a major reason behind the success of
demand paging - How can we know the size of a locality?
- Using the Working-Set model
37Working-Set Model
- Let ? be a fixed number of page references
- called working-set window
- The set of pages in the most recent ? references
is the working set - Example ? 10
- Size of WS at t1 is 5 pages, and at t2 is 2 pages
38Working-Set Model (contd)
- Accuracy of WS model depends on choosing ?
- if ? is too small, it will not encompass entire
locality - if ? is too large, it will encompass several
localities - if ? ? ? it will encompass entire program
- Using WS model
- OS monitors the WS of each process
- It allocates number of frames WS size to that
process - If we have more memory frames available, another
process can be started
39Keeping Track of the Working Set
- WS is a moving window
- At each memory reference, a new reference is
added at one end, and another is dropped off the
other end - Maintaining the entire window is costly
- Solution Approximate with interval timer a
reference bit - Example ? 10,000 references
- Timer interrupts every 5,000 references
- Keep in memory 2 bits for each page
- Whenever a timer interrupts, copy and sets the
values of all reference bits to 0 - If one of the bits in memory 1 ? page in
working set
40Thrashing Control Using WS Model
- WSSi ? the working set size of process Pi
- Total number of pages referenced in the most
recent ? - m ? memory size in frames
- D ? WSSi ? total demand frames
- if D gt m ? Thrashing
- Policy if D gt m, then suspend one of the
processes - But, marinating WS is costly. Is there an easier
way to control thrashing?
41Thrashing Control Using Page-Fault Rate
- Monitor page-fault rate and increase/decrease
allocated frames accordingly - Establish acceptable page-fault rate range
(upper and lower bounds) - If actual rate too low, process loses frame
- If actual rate too high, process gains frame
42Allocating Kernel Memory
- Treated differently from user memory, why?
- Kernel requests memory for structures of varying
sizes - Process descriptors (PCB), semaphores, file
descriptors, - Some of them are less than a page
- Some kernel memory needs to be contiguous
- some hardware devices interact directly with
physical memory without using virtual memory - Virtual memory may just be too expensive for the
kernel (cannot afford a page fault) - Often, a free-memory pool is dedicated to kernel
from which it allocates the needed memory using - Buddy system, or
- Slab allocation
43Buddy System
- Allocates memory from fixed-size segment
consisting of physically-contiguous pages - Memory allocated using power-of-2 allocator
- Satisfies requests in units sized as power of 2
- Request rounded up to next highest power of 2
- Fragmentation 17 KB request will be rounded to
32 KB! - When smaller allocation needed than is available,
current chunk split into two buddies of
next-lower power of 2 - Continue until appropriate sized chunk available
- Adjacent buddies are combined (or coalesced)
together to form a large segment - Used in older Unix/Linux systems
44Buddy System Allocator
45Slab Allocator
- Slab allocator
- Creates caches, each consisting of one or more
slabs - Slab is one or more physically contiguous pages
- Single cache for each unique kernel data
structure - Each cache is filled with objects
instantiations of the data structure - Objects are initially marked as free
- When structures stored, objects marked as used
- Benefits
- Fast memory allocation, no fragmentation
- Used in Solaris, Linux
46Slab Allocation
47VM and Memory-Mapped Files
- VM enables mapping a file to memory address space
of a process - How?
- A page-sized portion of the file is read from the
file system into a physical frame - Subsequent reads/writes to/from file are treated
as ordinary memory accesses - Example mmap() on Unix systems
- Why?
- I/O operations (e.g., read(), write()) on files
are treated as memory accesses ? Simplifies file
handling (simpler code) - More efficient memory accesses are less costly
than I/O system calls - One way of implementing shared memory for
inter-process communication
48Memory-Mapped Files and Shared Memory
- Memory-mapped files allow several processes to
map the same file ? - Allowing pages in memory to be shared
- Win XP implements shared memory using this
technique
49VM Issues Pre-paging
- Page size selection impacts
- fragmentation
- page table size
- I/O overhead
- locality
- Prepaging
- Prepage all or some of the pages a process will
need, before they are referenced - Tradeoff
- Reduce number of page faults at process startup
- But, may waste memory and I/O because some of the
prepaged pages may not be used
50VM Issues Program Structure
- Program structure
- int data 128128
- Each row is stored in one page allocated frames
lt128 - How many page faults in each of the following
programs? - Program 1
- for (j 0 j lt128 j)
for (i 0 i lt 128 i)
dataij 0 - page faults 128 x 128 16,384
- Program 2
- for (i 0 i lt 128 i)
for (j 0 j lt 128 j)
dataij 0 - page faults 128
51VM Issues I/O interlock
- Example Scenario
- A process allocates a buffer for an I/O request
(in its own address space) - The process issues an I/O request and waits
(blocks) for it - Meanwhile, CPU is given to another process, which
makes a page fault - The (global) replacement algorithm chooses the
page that contains the buffer as a victim! - Later, the I/O device sends an interrupt
signaling the request is ready - BUT the frame that contains the buffer is now
used by a different process! - Solutions
- Lock the (buffer) page in memory (I/O Interlock)
- Make I/O in kernel memory (not in user memory)
data is first transferred to kernel buffers then
copied to user space - Note page locking can be used in other
situations as well, e.g., kernel pages are locked
in memory
52OS Example Windows XP
- Uses demand paging with clustering
- Clustering brings in pages surrounding the
faulting page - Processes are assigned working set minimum and
working set maximum - Working set minimum is the minimum number of
pages the process is guaranteed to have in memory - A process may be assigned as many pages up to its
working set maximum - When the amount of free memory in the system
falls below a threshold, automatic working set
trimming is performed to restore the amount of
free memory - Working set trimming removes pages from processes
that have pages in excess of their working set
minimum
53Summary
- Virtual memory A technique to map a large
logical address space onto a smaller physical
address space - Uses demand paging bring pages into memory when
needed - Allows running large programs in small physical
memory, page sharing, efficient process creation,
and simplifies programming - Page fault occurs when a referenced page is not
in memory - Page replacement algorithms FIFO, OPT, LRU,
second-chance, - Frame allocation proportional, global and local
page replacement - Thrashing process does not have sufficient
frames ? too many page faults ? poor CPU
utilization - Locality and working-set models
- Kernel memory buddy system, slab allocator
- Many issues and tradeoffs page size, pre-paging,
I/O interlock, .