Title: Memory Management
1. Memory Management
- Memory is an important, expensive resource.
- Parkinson's law: "Programs expand to fill the memory available to hold them."
- Ideally, programmers want memory that is
  - fast
  - non-volatile
  - large (if memory were cheap it would have been large, and we wouldn't have to discuss its management).
- Strong relation: multiprogramming <-> memory management.
2. Memory Management
- Memory hierarchy:
  - a small amount of fast, expensive cache memory (< 1 MB);
  - some medium-speed, medium-price main memory (RAM), ~512 MB;
  - gigabytes of slow, cheap disk storage (a portion of which is used for virtual memory), ~16 GB.
- The memory manager handles the memory hierarchy.
3. Memory Management - Motivation
- With n processes, each spending a fraction p of its time waiting for I/O, the probability that all processes wait for I/O simultaneously is p^n.
- CPU utilization = 1 - p^n.
4. Utilizing Memory
- Assume each process takes 200K, and so does the operating system.
- Assume 1 MB of memory is available and that p = 0.8:
  - space for 4 processes → ~60% CPU utilization.
- Another 1 MB enables 9 processes → ~87% CPU utilization.
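The utilization numbers above follow directly from the 1 - p^n model; a small sketch to reproduce them:

```python
def cpu_utilization(n_processes: int, p_io_wait: float) -> float:
    """CPU utilization under the multiprogramming model: 1 - p^n."""
    return 1.0 - p_io_wait ** n_processes

# 1 MB of memory (200K OS + 4 x 200K processes), p = 0.8:
print(round(cpu_utilization(4, 0.8) * 100))   # 59 (the slide's ~60%)
# Adding another 1 MB makes room for 9 processes:
print(round(cpu_utilization(9, 0.8) * 100))   # 87
```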
5. Types of Memory Managers
- Those that move processes back and forth between main memory and disk,
- and those that don't.
- Simplest form: one process in memory at a time.
  - The user types a command.
  - The system loads the program into main memory and executes it.
  - The system reports when it is done.
6. Multiprogramming with Fixed Partitions
- How to organize the memory?
- How to assign jobs to partitions?
- Separate queues vs. a single queue.
7. Allocating Memory - Growing Segments
8. Memory Allocation and Fragmentation
- Job queue:

  process  memory  time
  P1        600K    10
  P2       1000K     5
  P3        300K    20
  P4        700K     8
  P5        500K    15
9. Memory Allocation - Keeping Track (bitmaps, linked lists)
10. Strategies for Allocation
- First fit: do not search too much...
- Next fit: start the search from the last location.
- Best fit: drawback - generates small holes.
- Worst fit: solves the above problem, but badly.
- Quick fit: several queues of different sizes.
- (Try allocating 2 on the previous slide.)
- Main problem of memory allocation - fragmentation:
  - internal: wasted parts of allocated space;
  - external: wasted unallocated space.
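A minimal sketch (names and free-list layout are illustrative, not from the slides) contrasting first fit and best fit over a list of (start, size) holes:

```python
def first_fit(holes, request):
    """Return the start address of the first hole large enough."""
    for i, (start, size) in enumerate(holes):
        if size >= request:
            # Shrink the hole; drop it if fully consumed.
            if size == request:
                holes.pop(i)
            else:
                holes[i] = (start + request, size - request)
            return start
    return None  # no hole large enough

def best_fit(holes, request):
    """Return the start address of the smallest hole that fits."""
    candidates = [(size, i) for i, (_, size) in enumerate(holes) if size >= request]
    if not candidates:
        return None
    _, i = min(candidates)
    start, size = holes[i]
    if size == request:
        holes.pop(i)
    else:
        holes[i] = (start + request, size - request)
    return start

holes = [(0, 300), (500, 600), (1200, 400)]
print(first_fit(list(holes), 350))  # 500  - first hole that fits
print(best_fit(list(holes), 350))   # 1200 - smallest hole that fits
```

Note how best fit picks the tighter 400-byte hole, leaving a small 50-byte remainder - exactly the "generates small holes" drawback the slide mentions.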
11. The Buddy System
- An example of an elaborate scheme: the buddy system (Knuth, 1973).
- Separate lists of free holes of sizes that are powers of two.
- For any request, pick the first hole of the right size.
- Not very good memory utilization.
- Freed blocks can only be merged with neighbors ("buddies") of their own size.
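The merge restriction above comes from the buddy system's key invariant; a tiny sketch (illustrative, not a full allocator): a block of size 2^k at address a has its buddy at a XOR 2^k, and only that one neighbor can be merged with it.

```python
def buddy_of(addr: int, size: int) -> int:
    """Address of the buddy of the block [addr, addr+size)."""
    assert size & (size - 1) == 0, "buddy blocks are powers of two"
    return addr ^ size

# In a region split into 256-byte blocks:
print(buddy_of(0, 256))    # 256
print(buddy_of(256, 256))  # 0   - buddies pair up symmetrically
print(buddy_of(512, 256))  # 768
```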
12. The Buddy System
13. Fragmentation
- External fragmentation: total memory space exists to satisfy a request, but it is not contiguous.
- Internal fragmentation: allocated memory may be slightly larger than the requested memory; the size difference is memory internal to a partition that is not being used.
- Reduce external fragmentation by compaction:
  - shuffle memory contents to place all free memory together in one large block;
  - compaction is possible only if relocation is dynamic, and is done at execution time.
- I/O problem:
  - latch a job in memory while it is involved in I/O, or
  - do I/O only into OS buffers.
14. Memory Compaction
15. Swapping
- A process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution.
- Backing store: a fast disk, large enough to accommodate copies of all memory images for all users; it must provide direct access to these memory images.
- Roll out, roll in: a swapping variant used for priority-based scheduling algorithms; a lower-priority process is swapped out so a higher-priority process can be loaded and executed.
- The major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped.
- Modified versions of swapping are found on many systems, e.g., UNIX, Linux, and Windows.
16. Schematic View of Swapping
17. Managing Memory by Swapping
- Processes move from disk to memory and from memory to disk whenever there are too many jobs to fit in memory.
- Swapping can help solve fragmentation:
  - allocating memory;
  - freeing memory and holes;
  - possible solution: swapping plus memory compaction.
- Since swapping is performed on whole processes, it results in noticeable response time; longer queues of blocked processes can lead to many swaps.
- Allocating swap space:
  - processes are swapped in/out from the same location;
  - allocate the maximum space, or estimate the maximum;
  - don't allocate swap space for memory-resident processes (e.g., daemons).
18. Swapping in Unix
- When? The kernel runs out of memory:
  - a fork system call - no space for the child process;
  - a brk system call to expand the data segment;
  - a stack becomes too large.
- Who?
  - a blocked process with the highest priority;
  - a process which consumed much CPU.
- How much space?
  - the maximum;
  - use holes and first/best fit (old Unix).
19. Issues - Relocation and Linking
- Compile time: create absolute code.
- Load time: the linker lists relocatable instructions and the loader changes the instructions (at each reload...).
- Execution time: special hardware is needed to support moving processes during run time.
- Dynamic linking: used with system libraries; includes only a stub in each user routine, indicating how to locate the memory-resident library function (or how to load it, if needed).
20. Binding of Instructions and Data to Memory
- Address binding of instructions and data to memory addresses can happen at three different stages:
  - Compile time: if the memory location is known a priori, absolute code can be generated; the code must be recompiled if the starting location changes.
  - Load time: relocatable code must be generated if the memory location is not known at compile time.
  - Execution time: binding is delayed until run time if the process can be moved during its execution from one memory segment to another. Needs hardware support for address maps (e.g., base and limit registers).
21. Dynamic Linking
- Linking is postponed until execution time.
- A small piece of code, the stub, is used to locate the appropriate memory-resident library routine.
- The stub replaces itself with the address of the routine and executes the routine.
- The operating system needs to check whether the routine is in the process's memory address space.
- Dynamic linking is particularly useful for libraries.
22. Logical vs. Physical Address Space
- The concept of a logical address space that is bound to a separate physical address space is central to proper memory management.
- Logical address: generated by the CPU; also referred to as a virtual address.
- Physical address: the address seen by the memory unit.
- Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in the execution-time address-binding scheme.
23. Paging and Virtual Memory
- Enable an address space that is independent of physical memory:
  - 2^32 addresses for a 32-bit (address bus) machine - virtual addresses.
- Can be achieved by segmenting the executable (with segment registers...) or by dividing memory using another method: paging.
- Paging: divide memory into fixed-size blocks (page frames):
  - blocks small enough that one process needs many of them;
  - allocate non-contiguous memory chunks to processes, avoiding holes...
24. Memory-Management Unit (MMU)
- A hardware device that maps virtual to physical addresses.
- In the MMU scheme, the value in the relocation register is added to every address generated by a user process at the time the address is sent to memory.
- The user program deals with logical addresses; it never sees the real physical addresses.
25. Paging
26. Memory Management Unit
27. MMU Operation - page fault if the accessed page is absent
28. Pages: the data. Page frames: the physical memory locations.
- Page table entries (PTEs) contain, per page:
  - page frame number (physical address);
  - present/absent bit (valid bit);
  - dirty (modified) bit;
  - referenced (accessed) bit;
  - protection bits;
  - caching disable/enable bit.
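A small sketch of decoding such an entry. The exact field layout here is an assumption for illustration (no real CPU is implied): frame number in the high bits, flag bits at the bottom.

```python
# Illustrative PTE layout (an assumption, not a specific architecture's):
PRESENT, DIRTY, REFERENCED, CACHE_DISABLE = 1 << 0, 1 << 1, 1 << 2, 1 << 3
FRAME_SHIFT = 12  # with 4K pages, the frame number sits above the low 12 bits

def decode_pte(pte: int) -> dict:
    return {
        "frame":      pte >> FRAME_SHIFT,
        "present":    bool(pte & PRESENT),
        "dirty":      bool(pte & DIRTY),
        "referenced": bool(pte & REFERENCED),
    }

pte = (9 << FRAME_SHIFT) | PRESENT | REFERENCED
print(decode_pte(pte))  # frame 9, present and referenced, not dirty
```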
29. Page vs. Page-Table Sizes - Tradeoffs
- A logical address of 24 bits (16 MB) (on a 32-bit machine with op-codes of 8 bits) can be divided into:
  - 1K pages and a 16K-entry table (16K x 8 = 128K);
  - 4K pages and a 4K-entry table (4K x 8 = 32K).
- Large pages: fewer pages, but waste in the last page.
- Small pages: larger tables (also a waste of space).
- A logical address of 32 bits (4 GB) can be divided into:
  - 1K pages and a 4M-entry table (4M x 8 = 32M!);
  - 4K pages and a 1M-entry table (1M x 8 = 8M).
- Huge tables! What to do?
30. Two-Level Paging Example
- A logical address (on a 32-bit machine with 4K page size) is divided into:
  - a page number consisting of 20 bits;
  - a page offset consisting of 12 bits.
- Since the page table is paged, the page number is further divided into:
  - a 10-bit outer page number;
  - a 10-bit page offset (within the outer page table's page).
- Thus a logical address is | p1 (10 bits) | p2 (10 bits) | d (12 bits) |,
- where p1 is an index into the outer page table, and p2 is the displacement within the page of the outer page table.
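The 10/10/12 split above can be sketched with plain bit operations:

```python
def split_address(va: int):
    """Split a 32-bit virtual address into (p1, p2, d) per the slide."""
    d  = va & 0xFFF          # low 12 bits: offset within the 4K page
    p2 = (va >> 12) & 0x3FF  # next 10 bits: entry in the inner page table
    p1 = (va >> 22) & 0x3FF  # top 10 bits: entry in the outer page table
    return p1, p2, d

print(split_address(0x00403004))  # (1, 3, 4)
```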
31. Two-Level Page-Table Scheme
32. Two-Level Paging Example - VAX
- A logical address (on a 32-bit machine) is divided into:
  - a page number consisting of 23 bits;
  - a page offset consisting of 9 bits (page size 1/2 K!).
- Since the page table is paged, the page number is further divided into:
  - a 21-bit page number;
  - a 2-bit section index (code, heap, stack, system).
- Thus a logical address is | s (2 bits) | p (21 bits) | d (9 bits) |,
- where s is an index into the section table, and p is the pointer into the page table. Note: the section table is always in memory; the page table may be swapped. Its maximum size is 2M x 4 = 8 MB!
33. SPARC 3-Level Paging
- Context table (in MMU hardware) - one entry per process.
34. Page Table Considerations
- Can be very large (1M pages for 32-bit addresses).
- Must be fast (every instruction needs it).
- One extreme: keep it all in hardware - fast registers that hold the page table and are loaded with each process; too expensive for the above size.
- The other extreme: keep it all in memory, using a page-table base register (PTBR) to point to it; each memory reference during instruction translation is then doubled...
- To avoid keeping complete page tables in memory, make them multilevel (and avoid the danger of accumulating memory references per instruction by caching).
35. Multilevel Paging and Performance
- Since each level is stored as a separate table in memory, converting a logical address to a physical one may take four memory accesses.
- Even though the time needed for one memory access is quintupled, caching permits performance to remain reasonable.
- A cache hit rate of 98 percent yields an effective access time of 0.98 x 120 + 0.02 x 520 = 128 nanoseconds, which is only a 28 percent slowdown in memory access time.
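The 128 ns figure above is a straightforward weighted average; a one-liner to check it:

```python
def effective_access(hit_rate: float, hit_ns: float, miss_ns: float) -> float:
    """Weighted average of the hit path and the full-table-walk miss path."""
    return hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns

print(effective_access(0.98, 120, 520))  # ~128 ns, a 28% slowdown over 100 ns
```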
36. Inverted Page Tables
- For very large memories (page tables), one can use an inverted page table, organized by (physical) page frames.
- Examples: IBM RT, HP Spectrum (thinking of 64-bit memories).
- To avoid a linear search for every virtual address of a process, use a hash table (one or a few memory references).
- Only one page table - the physical one - for all processes currently in memory.
- In addition to the hash table, associative memory registers are used to store recently used page table entries.
- It is the only way to deal with a 64-bit memory: with 4K pages, two-level page tables can result in 2^42 entries.
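A minimal sketch of the lookup path (illustrative only; real hardware chains entries inside the table rather than using Python dicts): one entry per physical frame, found by hashing (pid, virtual page).

```python
NUM_FRAMES = 8
table = [None] * NUM_FRAMES   # frame -> (pid, virtual page) it holds
buckets = {}                  # hash bucket -> candidate frames

def insert(pid, page, frame):
    table[frame] = (pid, page)
    buckets.setdefault(hash((pid, page)) % NUM_FRAMES, []).append(frame)

def lookup(pid, page):
    """Return the physical frame holding (pid, page), or None (page fault)."""
    for frame in buckets.get(hash((pid, page)) % NUM_FRAMES, []):
        if table[frame] == (pid, page):
            return frame
    return None

insert(pid=1, page=0x42, frame=3)
print(lookup(1, 0x42))  # 3
print(lookup(1, 0x99))  # None - would trigger a page fault
```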
37. Inverted Page Table Architecture
38. Shared Pages
39. Motivation for Virtual Memory
- Unused code:
  - error routines;
  - rare functionality.
- Unused data:
  - arrays larger than needed;
  - garbage not collected.
40. Demand Paging
- Bring a page into memory only when it is needed:
  - less I/O needed;
  - less memory needed;
  - faster response;
  - more users.
- A page is needed → reference it:
  - invalid reference → abort;
  - not in memory → bring it into memory.
41. In-Memory Bit
- With each page table entry a valid-invalid bit is associated (1 → in memory, 0 → not in memory).
- Initially, the valid-invalid bit is set to 0 on all entries.
- Example of a page table snapshot.
- During address translation, if the valid-invalid bit in a page table entry is 0 → page fault.
42. Page Fault
- If there is a reference to an absent page, the first reference will trap to the OS → page fault.
- The OS looks at another table to decide:
  - invalid reference → abort;
  - just not in memory:
    - get an empty frame;
    - swap the page into the frame;
    - reset the tables, set the validation bit = 1;
    - restart the instruction.
- Hard cases for restart:
  - block move instructions;
  - auto increment/decrement locations.
43. What Happens if There Is No Free Frame?
- Page replacement: find some page in memory that is not really in use and swap it out.
  - Needs an algorithm.
  - Performance: we want an algorithm which will result in a minimum number of page faults.
- The same page may be brought into memory several times.
44. Page Fault Handling
- 1. Trap to the kernel; save the PC on the stack and (sometimes) partial state in registers (and/or on the stack).
- 2. An assembly routine saves the volatile information and calls the operating system.
- 3. Find the requested virtual page.
- 4. Check protection. If legal, find a free page frame (or invoke the page replacement algorithm).
- 5. If replacing, check whether the victim is modified and start writing it to disk. Mark the frame busy. Call the scheduler to block the process until the write to disk has completed.
45. Page Fault Handling (cont'd)
- 6. Transfer the requested page from disk (the scheduler runs alternative processes).
- 7. Upon transfer completion, update the page table: mark the new page as valid and update all other parameters.
- 8. Back up the faulting instruction, which was in principle in mid-execution; now the PC can be set back to its initial value.
- 9. Schedule the faulting process; return from the operating system.
- 10. Restore state (i.e., all volatile information stored by the assembly routine) and return to user space for execution of the faulted process.
46. Problem - Instruction Backup
- Page-faulting instructions trap to the OS; the OS must restart the instruction.
- The page fault may originate at the op-code or at any of the operands - the PC value is useless:
  - the location of the instruction itself is lost;
  - worse still, undoing of autoincrement or autodecrement - was it already performed?
- Hardware solutions:
  - a register to store the PC value of the instruction, and a register to store changes to other registers (increment/decrement);
  - microcode dumps all information on the stack;
  - restart the complete instruction and redo the increments etc.;
  - do nothing - RISC...
47. Memory Access with Page Faults
- P: probability of a page fault.
- MA: memory access time.
- PF: time to process a page fault.
- EMA: effective memory access time,
  - EMA = (1 - P) x MA + P x PF,
- where PF = page-fault interrupt service time + read-in-page time (maybe a write-page too?) + restart-process time.
48. Effective Memory Access
- For MA = 100 nsec and PF = 25 msec:
  - if P = 0.001 → EMA ≈ 100 + 25 x 10^6 / 10^3 = 25,100 nsec;
  - if P = 10^-5 → EMA ≈ 100 + 250 = 350 nsec.
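The two cases above drop straight out of the EMA formula from the previous slide:

```python
def ema(p_fault: float, ma_ns: float = 100.0, pf_ns: float = 25e6) -> float:
    """EMA = (1 - P) * MA + P * PF, with MA = 100 ns and PF = 25 ms."""
    return (1.0 - p_fault) * ma_ns + p_fault * pf_ns

print(round(ema(1e-3)))  # 25100 ns - 250x slower than a plain access
print(round(ema(1e-5)))  # 350 ns
```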
49. Associative Memory
- Content-addressable memory.
- Page insertion: copy the complete entry from the page table.
- Page deletion: copy just the modified bit back to the page table.
50. Associative Memory - Comments
- With a large enough hit ratio, the average added translation time is close to 0.
- Only a complete virtual address (all levels) can be counted as a hit.
- With multiprocessing, the associative memory can be cleared on a context switch - wasteful...
- Better: add a field to the associative memory to hold the process ID, and a special register for the current PID.
51. Fundamental Concepts (1)
- Virtual address space layout for 3 user processes:
  - white areas are private per process;
  - shaded areas are shared among all processes.
52. Fundamental Concepts (2)
- Mapped regions with their shadow pages on disk.
- The lib.dll file is mapped into two address spaces simultaneously.
53. Page Replacement Algorithms
- A page fault forces a choice:
  - which page must be removed to make room for the incoming page?
- A modified page must first be saved;
  - an unmodified one is just overwritten.
- Better not to choose an often-used page:
  - it will probably need to be brought back in soon.
54. Optimal Page Replacement
- Demand comes in for pages (3 physical page frames). Reference string:
  7, 5, 1, 0, 5, 4, 7, 0, 2, 1, 0, 7
- An optimal algorithm faults on:
  7  5  1  (0,1)  -  (4,5)  -  -  (2,4)  (1,2)  -  -
  (a pair (in, out) marks a replacement) - altogether 4 page replacements.
- Take FIFO for example:
  7  5  1  (0,7)  -  (4,5)  (7,1)  -  (2,0)  (1,4)  (0,7)  (7,2)
  - 3 additional page replacements.
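The counts above can be replayed with a short simulation of both policies on the slide's reference string (replacements counted after the 3 initial loads):

```python
from collections import deque

REFS = [7, 5, 1, 0, 5, 4, 7, 0, 2, 1, 0, 7]

def fifo_replacements(refs, frames=3):
    mem, order, repl = set(), deque(), 0
    for r in refs:
        if r in mem:
            continue
        if len(mem) == frames:       # evict the oldest resident page
            mem.discard(order.popleft())
            repl += 1
        mem.add(r)
        order.append(r)
    return repl

def optimal_replacements(refs, frames=3):
    mem, repl = set(), 0
    for i, r in enumerate(refs):
        if r in mem:
            continue
        if len(mem) == frames:
            future = refs[i + 1:]
            # Evict the page used farthest in the future (or never again).
            victim = max(mem, key=lambda p: future.index(p) if p in future else len(future))
            mem.discard(victim)
            repl += 1
        mem.add(r)
    return repl

print(optimal_replacements(REFS))  # 4
print(fifo_replacements(REFS))     # 7 - three more than optimal
```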
55. Good Old FIFO
- Implemented as a queue.
- The usual drawback:
  - the oldest page may be a referenced (needed) page.
- Second-chance FIFO:
  - if the reference bit is on, move the page to the end of the queue.
- Better implemented as a circular queue,
  - to save the overhead of movements on the queue.
56. LRU - Least Recently Used
- Approximates the optimal algorithm:
  - the most recently used pages are the most probable next references.
- Replace the page used furthest in the past.
- Not easy to implement - needs counting of references:
  - use a large counter (number of operations), saved in a page-table field on each page reference, or
  - use a bit array of n x n bits.
- In both cases, the page entry with the smallest number attached to it is selected for replacement.
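A minimal sketch of the counter variant (illustrative): each reference stamps the page with a global operation count, and eviction picks the smallest stamp.

```python
class LRU:
    def __init__(self, frames: int):
        self.frames = frames
        self.stamp = {}   # page -> time of last reference
        self.clock = 0    # global operation counter
        self.faults = 0

    def reference(self, page: int):
        self.clock += 1
        if page not in self.stamp:
            self.faults += 1
            if len(self.stamp) == self.frames:
                # Evict the page with the smallest (oldest) stamp.
                victim = min(self.stamp, key=self.stamp.get)
                del self.stamp[victim]
        self.stamp[page] = self.clock

lru = LRU(frames=3)
for p in [7, 0, 1, 2, 0, 3, 0, 4]:
    lru.reference(p)
print(lru.faults)  # 6
```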
57. LRU vs. Optimal
- Reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1 (page frames shown per reference).
- Figure 9.10: optimal page-replacement algorithm.
- Figure 9.11: LRU page-replacement algorithm.
58. Second-Chance Page Replacement Algorithm
- Operation of second chance:
  - pages are sorted in FIFO order.
- Page list if a fault occurs at time 20 and A has its R bit set (the numbers above the pages are loading times).
- When A moves forward (to the end of the list), its R bit is cleared!
59. The Clock Page Replacement Algorithm
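A sketch of the clock variant of second chance: the hand sweeps the circular frame list, clearing R bits, and evicts the first frame whose R bit is already 0.

```python
class Clock:
    def __init__(self, frames: int):
        self.pages = [None] * frames   # page resident in each frame
        self.rbit = [0] * frames
        self.hand = 0
        self.faults = 0

    def reference(self, page: int):
        if page in self.pages:          # hit: just set the reference bit
            self.rbit[self.pages.index(page)] = 1
            return
        self.faults += 1
        while self.rbit[self.hand]:     # referenced pages get a second chance
            self.rbit[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.pages)
        self.pages[self.hand] = page    # evict and install the new page
        self.rbit[self.hand] = 1
        self.hand = (self.hand + 1) % len(self.pages)

clk = Clock(frames=3)
for p in [1, 2, 3, 1, 4, 5]:
    clk.reference(p)
print(clk.faults)  # 5
```

Unlike plain second-chance FIFO, no list entries are moved; only the hand advances.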
60. Page Replacement: NRU - Not Recently Used
- There are 4 classes of pages, according to the reference and modification bits.
- Select a page at random from the least-needed class.
- An easy scheme to implement.
- Prefers keeping a frequently referenced (unmodified) page over an old modified page.
- Class b (referenced = 0, modified = 1) is interesting: it can only happen when a clock tick erases the reference bit...
61. LRU - Realizing in Hardware
- Use a large counter (64 bits), saved in a page-table field on each page reference. At page-fault time, find the minimum - how?
- Another option: per-page counters with shift. On each page reference, shift all the counters and put a 1 for the referenced page; select the page with the most zeroes from the left. Too many counter shifts!
- Another option: use a bit array of n x n bits with only TWO operations: set a row to 1s, set a column to 0s.
- In all cases: too much overhead for the hardware.
- Needed: an (approximate) software solution.
62. LRU with Bit Tables
- Reference string: 0, 1, 2, 3, 2, 1, 0, 3, 2, 3
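The n x n bit-matrix scheme can be sketched in a few lines: on a reference to page k, set row k to all 1s, then clear column k; the row with the smallest binary value then belongs to the least recently used page.

```python
def simulate_bit_matrix(refs, n):
    matrix = [[0] * n for _ in range(n)]
    for k in refs:
        matrix[k] = [1] * n    # set row k to 1s
        for row in matrix:     # clear column k in every row
            row[k] = 0
    return matrix

def lru_page(matrix):
    """Row with the smallest binary value = least recently used page."""
    value = lambda row: int("".join(map(str, row)), 2)
    return min(range(len(matrix)), key=lambda k: value(matrix[k]))

m = simulate_bit_matrix([0, 1, 2, 3, 2, 1, 0, 3], n=4)
print(lru_page(m))  # 2 - the page referenced longest ago
```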
63. NFU - Not Frequently Used
- To record frequently used pages, add a counter to all table entries, but don't update on each memory reference - update on each clock tick!
- At each clock tick, add the R bit to the counters.
- Select the page with the lowest counter for replacement.
- Problem: it remembers everything.
- Remedy (an aging algorithm):
  - shift-right the counter before adding the reference bit;
  - add the reference bit at the left.
- Fewer operations than LRU; depends on the intervals used for updating.
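A sketch of the aging remedy (8-bit counters assumed for illustration): on each tick every counter is shifted right and the R bit lands in the leftmost position, so recent references dominate old ones.

```python
BITS = 8  # counter width (an assumption for this sketch)

def age_tick(counters, r_bits):
    """One clock tick: counters[i] >>= 1, then the R bit goes to the top bit."""
    for i in range(len(counters)):
        counters[i] = (counters[i] >> 1) | (r_bits[i] << (BITS - 1))
        r_bits[i] = 0  # hardware clears R after the tick

counters = [0, 0, 0]
# Tick 1: pages 0 and 1 referenced; tick 2: only page 2 referenced.
age_tick(counters, [1, 1, 0])
age_tick(counters, [0, 0, 1])
print(counters)  # [64, 64, 128] - page 2, referenced last, has the highest value
print(counters.index(min(counters)))  # 0 - pages 0 and 1 tie; index picks 0
```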
64. NFU - the Aging Simulation Version
65. Differences between LRU and NFU
- If two pages have the same number of zeroes before the first 1, which to select?
- If two pages both have all-zero counters, which to select? (counter too short)
- Therefore it is only an approximation!
66. Modelling (Static) Paging Algorithms
- Belady's anomaly.
- Example: FIFO with the reference string 1 2 3 4 1 2 5 1 2 3 4 5.
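Running FIFO on that reference string shows the anomaly directly - adding a frame increases the fault count:

```python
from collections import deque

def fifo_faults(refs, frames):
    mem, order, faults = set(), deque(), 0
    for r in refs:
        if r in mem:
            continue
        faults += 1
        if len(mem) == frames:
            mem.discard(order.popleft())  # evict the oldest page
        mem.add(r)
        order.append(r)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # 9 faults with 3 frames
print(fifo_faults(refs, 4))  # 10 faults with 4 frames - Belady's anomaly
```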
67. Characterizing Page Replacement
- A reference string (of requested pages).
- The number of virtual pages, n.
- The number of physical page frames, m (static).
- A page replacement algorithm.
- Can be represented by an array M of n rows.
68. Stack Algorithms
- Definition: the set of pages in physical memory with m page frames is a subset of the set of pages in physical memory with m+1 page frames (for every reference string).
- Stack algorithms have no anomaly.
- Examples: LRU, optimal replacement.
- FIFO is not a stack algorithm.
- Useful definition:
  - distance string: distance from the top of the stack.
69. Predicting Page Fault Numbers
- Ci is the number of times that distance i occurs in the distance string.
- The number of page faults with m frames is
  Fm = C(m+1) + C(m+2) + ... + Cn + C∞
  (every reference at a stack distance greater than m faults).
70. The Distance String
- Probability density functions for two hypothetical distance strings.
71. Page Allocation Policies (2)
- Page fault rate as a function of the number of page frames assigned.
72. Page Frame Allocation
- For a page-fault rate p, a memory access time of 100 nanosecs, and a page-fault service time of 25 millisecs, the effective access time is (1-p) x 100 + p x 25,000,000.
- For p = 0.001 the effective access time is still larger than 100 nanosecs by a factor of 250.
- For a goal of only a 10% degradation in access time, we need p < 0.0000004.
- Policies for page-frame allocation must allocate as much as possible to processes, to enhance performance - leave no unassigned page frame.
- It is difficult to know how many frames to allocate: processes differ in size, structure, and priority.
73. Allocation to Multiple Processes
- Fair share is not the best policy (it is static!!).
- Allocating according to process size: so-so.
- There must be a minimum for running a process...
74. Thrashing
- If a process does not have enough pages, the page-fault rate is very high. This leads to:
  - low CPU utilization;
  - the operating system thinks that it needs to increase the degree of multiprogramming;
  - another process is added to the system.
- Thrashing: a process is busy swapping pages in and out.
75. Thrashing Diagram
- Why does paging work? The locality model:
  - a process migrates from one locality to another;
  - localities may overlap.
- Why does thrashing occur? Σ (size of locality) > total memory size.
76. Working-Set Model
- Δ = working-set window = a fixed number of page references. Example: 10,000 instructions.
- WSSi (working set size of process Pi) = total number of pages referenced in the most recent Δ (varies in time):
  - if Δ is too small, it will not encompass the entire locality;
  - if Δ is too large, it will encompass several localities;
  - if Δ = ∞, it will encompass the entire program.
- D = Σ WSSi = total demand for frames.
- If D > m → thrashing.
- Policy: if D > m, then suspend one of the processes.
77. Working-Set Model
- The working set is the set of pages used by the k most recent memory references.
- The function w(k,t) is the size of the working set at time t.
- How do we estimate w(k,t) WITHOUT an update on each memory reference?
78. Working Set Model
79. Dynamic Page Allocation - Lookback
- Reference string: 0 2 1 3 5 4 6 3 7 5 7 3 3 5 6 4
- With 5 page frames (LRU): p p p p p p p - p - - - - - - -   (p = page fault) - optimal.
- With Δ = 5 (and LRU): p p p p p p p - p - - (4)(3) - p(4) p(4)
- For a window of size 5, the allocated working set decreases after requests 12 and 14.
- The maximum page allocation is Δ.
- Extra page faults occur because of the size of the WS.
- After the last request (page 4), the number of allocated page frames increases again (to 4).
80. Keeping Track of the Working Set
- Approximate with an interval timer + the reference bit.
- Example: Δ = 10,000:
  - the timer interrupts after every 5,000 time units;
  - keep 2 bits in memory for each page;
  - whenever the timer interrupts, copy the values of all reference bits and set them to 0;
  - if one of the bits in memory = 1 → the page is in the working set.
- Why is this not completely accurate?
- Improvement: 10 bits and an interrupt every 1,000 time units.
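A sketch of that timer-based approximation: each page keeps a short history of captured reference bits, shifted at every timer interrupt; a page is in the working set if any history bit is set.

```python
HISTORY = 2  # bits kept per page (the slide's basic scheme)

def timer_interrupt(history, r_bits):
    """Shift each page's history and capture-then-clear its R bit."""
    for page in range(len(history)):
        history[page] = ((history[page] << 1) | r_bits[page]) & ((1 << HISTORY) - 1)
        r_bits[page] = 0

def working_set(history):
    return {page for page, h in enumerate(history) if h}

history = [0, 0, 0, 0]
timer_interrupt(history, [1, 0, 1, 0])  # pages 0, 2 referenced this interval
timer_interrupt(history, [0, 0, 1, 0])  # only page 2 referenced
print(sorted(working_set(history)))     # [0, 2] - page 0 still within history
```

The inaccuracy the slide asks about is visible here: a reference is only known to fall somewhere within a 5,000-unit interval, not at an exact time.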
81. Dynamic Set Aging
- The look-back window cannot be based on memory references - too expensive.
- One way to enlarge the time gap between updates is to use clock-tick triggering.
- Reference bits are updated by the hardware.
- Some algorithms clear the reference bits but also use an additional data structure to store the current virtual time of the process - aging. The current virtual time is stored for each entry with R = 1; this is done on every clock interrupt.
- At page-fault time, the table is scanned and the entry with R = 0 and the largest age (virtual time - stored time) is selected.
- Why virtual time? Because we need to keep times independently per process.
- This idea can be the basis for page replacement that selects the oldest pages among the non-referenced.
82. The Working Set Page Replacement Algorithm (2)
- The working set algorithm.
83. Dynamic Set - Clock Algorithm
- WSClock is a global clock algorithm - for pages held by all processes in memory.
- Circling the clock, the algorithm uses the reference bit and an additional data structure, ref(frame), which is set to the current virtual time of the process.
- WSClock: use an additional condition that measures elapsed (process) time and compares it to τ.
- Replace a page when both conditions apply:
  - the reference bit is unset, and
  - Tp - ref(frame) > τ.
84. The WSClock Page Replacement Algorithm
85. Dynamic Set - WSClock Example
- 3 processes: p0, p1 and p2.
- Current (virtual) times of the 3 processes: Tp0 = 50, Tp1 = 70, Tp2 = 90.
- WSClock: replace when Tp - ref(frame) > τ; the minimal distance (window size) is τ = 20.
- The clock hand is currently pointing to page frame 4.

  page frame   0   1   2   3   4   5   6   7   8   9  10
  ref. bit     0   0   1   1   1   0   1   0   0   1   0
  process ID   0   1   0   1   2   1   0   0   1   2   2
  last_ref    10  30  42  65  81  57  31  37  31  47  55

- Sweeping from frame 5: frames 5 and 7 have R = 0 but elapsed times of only 13 and 13 (not > 20); frame 8 has R = 0 and 70 - 31 = 39 > 20.
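The example can be replayed with a simplified sweep (illustrative: R bits are cleared as the hand passes, and the per-process ref-time update of full WSClock is omitted):

```python
TP = {0: 50, 1: 70, 2: 90}   # per-process virtual times from the slide
TAU = 20

rbit     = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0]
pid      = [0, 1, 0, 1, 2, 1, 0, 0, 1, 2, 2]
last_ref = [10, 30, 42, 65, 81, 57, 31, 37, 31, 47, 55]

def wsclock_victim(hand):
    """Advance the hand and return the first frame with R = 0 and age > TAU."""
    n = len(rbit)
    for step in range(1, n + 1):
        f = (hand + step) % n
        if rbit[f] == 0 and TP[pid[f]] - last_ref[f] > TAU:
            return f
        rbit[f] = 0  # second chance: clear R as the hand passes
    return None

print(wsclock_victim(hand=4))  # 8: R = 0 and 70 - 31 = 39 > 20
```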
86. Review of Page Replacement Algorithms
87. Comment - Page Size Analysis
- To minimize wasted memory, with:
  - process size s;
  - page size p;
  - page table entry size e.
- Fragmentation overhead is p/2 (half of the last page, on average).
- Table space overhead is s·e/p.
- Total overhead is p/2 + s·e/p.
- Minimizing the overhead gives p = sqrt(2se).
- Example: s = 128K, e = 8 bytes → the optimal page size is 1448 bytes... i.e., use 1K or 2K or 4K.
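The overhead model above, total(p) = p/2 + s·e/p, is minimized at p* = sqrt(2·s·e):

```python
import math

def optimal_page_size(s_bytes: int, e_bytes: int) -> float:
    """Page size minimizing p/2 + s*e/p, i.e. sqrt(2*s*e)."""
    return math.sqrt(2 * s_bytes * e_bytes)

print(round(optimal_page_size(128 * 1024, 8)))  # 1448 bytes -> use 1K/2K/4K in practice
```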
88. Virtual Memory - Advantages
- Programs use much smaller physical memory than their maximum requirements (much code or data is unused):
  - more programs can run concurrently in memory.
- Programs can use a much larger (virtual) memory:
  - simplifies programming and enables using powerful software.
- Swapping time is smaller.
- All physical memory can be used, whether consecutive or not.
- More flexible memory protection.
89. Virtual Memory - Disadvantages
- Special hardware for address translation - some instructions may require 5-6 address translations!
- Difficulties in restarting instructions (chip/microcode complexity).
- Complexity of the OS!
- Overhead - a page fault is an expensive operation in terms of both CPU and I/O overhead.
- Difficulty of optimizing memory utilization - e.g., buffering in DBMSs. Dangers of thrashing!
90. Additional Issues - Locking and Sharing
- An I/O channel/processor (DMA) transfers data independently:
  - a page must not be replaced during a transfer;
  - the OS can use a lock variable per page.
- Pages of an editor's code can be shared among processes:
  - swapping out, or terminating, process A (and its pages) may cause many page faults for a process B that shares them;
  - looking up evicted pages in all page tables is impossible;
  - solution: maintain special data structures for shared pages.
- Nice idea: transfer a page directly from the (kernel) process sending data to the process receiving it.
91. Handling the Backing Store
- Non-resident pages need to be stored on disk.
- The backing store (disk swap area) needs to be managed:
  - allocate swap area to (whole) processes and address pages by offset from the swap address;
  - processes grow during execution - assign separate swap areas to text, data and stack;
  - allocate disk blocks when needed - needs disk addresses in memory to keep track of swapped pages.
92. Backing Store
- (a) Paging to a static swap area.
- (b) Backing up pages dynamically.
93. Implementation Issues
- Four times when the OS is involved with paging:
  - Process creation:
    - determine program size;
    - create the page table.
  - Process execution:
    - MMU reset for the new process;
    - TLB flushed.
  - Page fault:
    - determine the virtual address causing the fault;
    - swap the target page out and the needed page in.
  - Process termination:
    - release the page table and pages.
94. Cleaning Policy
- Need for a background process, the paging daemon:
  - periodically inspects the state of memory;
  - when too few frames are free, selects pages to evict using a replacement algorithm.
- It can use the same circular list (clock) as the regular page replacement algorithm, but with a different pointer.
95. Locking Pages in Memory
- Virtual memory and I/O occasionally interact:
  - a process issues a call for a read from a device into a buffer;
  - while waiting for the I/O, another process starts up and has a page fault;
  - the buffer of the first process may be chosen to be paged out.
- Need to be able to specify some pages as locked:
  - they are exempted from being target pages.
96. Separation of Policy and Mechanism
- Page fault handling with an external pager.
- Example user: a DBMS!
97. Page Daemons - Unix
- It is assumed useful to keep a number of free pages.
- Freeing page frames can be done by a page daemon - a process that sleeps most of the time:
  - awakened periodically to inspect the state of memory;
  - if there are too few free page frames, it frees page frames.
- Yet another type of (global) dynamic page replacement policy.
- This strategy performs better than evicting pages only when needed (and writing the modified ones to disk in a hurry).
- The net result is the use of all available memory as a page pool.
98. Page Replacement - Unix
- The page daemon uses a two-handed clock algorithm.
- Any global clock algorithm either clears the reference bit or grabs the (unreferenced) page from its process. It is fast and uses just the reference bit.
- A two-handed clock algorithm clears the reference bit with its first hand and grabs with its second hand. Its parameter is the angle between the hands - a small angle leaves only busy pages.
- Interesting idea on fork: keep the same pages for the offspring and only copy upon write (Linux).
- Another interesting idea (Linux): inspect user pages in virtual memory order (global clock) and in system order (first unused cache, second unused shared, third, unused pages of the heaviest user process).
- bdflush: a daemon to flush dirty pages.
99. ... and in Windows 2000
- Processes have working sets defined by two parameters - the minimal and maximal number of pages.
- The WS of a process is updated at the occurrence of each page fault (i.e., the data structure WS):
  - PF and WS < Min: add the page to the WS;
  - PF and WS > Max: remove a page from the WS.
- Memory is managed by keeping a number of free pages, which is a complex function of memory use, at all times (at most one disk reference per PF).
- When the balance-set manager runs (every second) and needs to free pages:
  - surplus pages (beyond the WS) are removed from a process (large background processes before small foreground ones);
  - counters of page references are maintained (on a multiprocessor, reference bits don't work since they are per-CPU local).
100. Memory Management System Calls
- The principal Win32 API functions for mapping virtual memory in Windows 2000.
101. Implementation of Memory Management
- A page table entry for a mapped page on the Pentium.
102. Physical Memory Management (1)
- Various page lists and the transitions between them.
103. Segmentation
- Several logical address spaces per process.
- A compiler needs segments for:
  - source text;
  - symbol table;
  - constants segment;
  - stack;
  - parse tree;
  - compiler executable code.
- Most of these segments grow during execution.
104. Segmentation - Segment Table
105. Sharing of Segments
106. Segmentation vs. Paging

  Consideration                                          Paging                     Segmentation
  Need the program be aware of the technique?            no                         yes
  How many linear address spaces?                        1                          many
  Can the total address space exceed physical memory?    yes                        yes
  Can procedures and data be distinguished?              no                         yes
  Is sharing of procedures among users facilitated?      no                         yes
  Motivation for the technique                           get a large linear space   programs and data in logically
                                                                                    independent address spaces
107. Segmentation Architecture
- A logical address consists of a two-tuple: <segment-number, offset>.
- The segment table maps two-dimensional addresses to physical addresses; each table entry has:
  - base: the starting physical address where the segment resides in memory;
  - limit: the length of the segment.
- Segment-table base register (STBR): points to the segment table's location in memory.
- Segment-table length register (STLR): indicates the number of segments used by a program; a segment number s is legal if s < STLR.
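The translation path just described (STLR check, then limit check, then base + offset) can be sketched as follows; the table contents are made up for illustration:

```python
from typing import NamedTuple

class Segment(NamedTuple):
    base: int    # starting physical address
    limit: int   # segment length

segment_table = [Segment(base=1000, limit=400), Segment(base=5000, limit=100)]
STLR = len(segment_table)

def translate(s: int, offset: int) -> int:
    if s >= STLR:
        raise MemoryError("illegal segment number")       # s must be < STLR
    seg = segment_table[s]
    if offset >= seg.limit:
        raise MemoryError("offset beyond segment limit")  # protection trap
    return seg.base + offset

print(translate(0, 42))  # 1042
print(translate(1, 99))  # 5099
```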
108. Segmentation Architecture (Cont.)
- Protection: with each entry in the segment table associate:
  - a validation bit (0 → illegal segment);
  - read/write/execute privileges.
- Protection bits are associated with segments; code sharing occurs at the segment level.
- Since segments vary in length, memory allocation is a dynamic storage-allocation problem (i.e., a fragmentation problem).
109. Segmentation with Paging
- MULTICS combined segmentation and paging:
  - 2^18 segments of up to 64K words (36 bits);
  - addresses of 34 bits:
    - 18-bit segment number;
    - 16 bits: page number (6) + offset within page (10).
- Each process has a segment table (STBR).
- The segment table is itself a segment and is paged (8-bit page + 10-bit offset); the STBR is added to the 18-bit segment number.
- Each segment is a separate virtual memory with a page table (6 bits).
- Segment tables contain segment descriptors: an 18-bit page table address + a 9-bit segment length.
110MULTICS segment descriptors
111Segmentation - Memory reference procedure
- 1. Use the segment number to find the segment descriptor
- the segment table is itself paged because it is large, so in actuality the STBR is used to locate the page holding the descriptor
- 2. Check whether the segment's page table is in memory
- if not, a segment fault occurs
- if there is a protection violation, TRAP (fault)
- 3. The page table entry for the requested virtual page is examined a page fault may occur
- if the page is in memory, the address of the start of the page is extracted from the page table
- 4. The offset is added to the page origin to construct the main-memory address
- 5. Perform the read/store etc.
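The five steps above can be sketched as code. These structures are simplified stand-ins, not real MULTICS formats: a per-segment page table whose entries are frame numbers (empty = page fault), and a segment table whose descriptors hold a page-table pointer (null = segment fault, i.e. the segment's page table is not in memory).

```cpp
#include <cassert>
#include <optional>
#include <vector>

struct PageTable { std::vector<std::optional<int>> frames; };
struct Descriptor { PageTable* pt; };

enum class Result { Ok, SegmentFault, PageFault };

// The procedure, with 1024-word pages (10-bit offset within page).
Result lookup(const std::vector<Descriptor>& segtab,
              int seg, int page, int offset, long& phys) {
    const Descriptor& d = segtab[seg];        // 1. find the segment descriptor
    if (d.pt == nullptr)
        return Result::SegmentFault;          // 2. page table not in memory
    const auto& frame = d.pt->frames[page];   // 3. examine the page table
    if (!frame)
        return Result::PageFault;             //    page fault
    phys = static_cast<long>(*frame) * 1024 + offset;  // 4. page origin + offset
    return Result::Ok;                        // 5. ready for the read/store
}
```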
112MULTICS Address Translation Scheme
113segmentation and paging - locating addresses
114Segmentation with Paging MULTICS
- Simplified version of the MULTICS TLB
- Existence of 2 page sizes makes actual TLB more
complicated
115Multics - Additional checks during Segment link
(call)
- Since segments are mapped to files, ACLs (access-control lists) are checked on first access (open)
- Protection rings are checked
- Parameters may be passed via special gates
- A most advanced architecture!
116Paged segmentation on the INTEL 80386
- 16K segments, each of up to 1G 32-bit words
- 2 types of segment descriptor tables
- Local Descriptor Table (LDT), one per process
- Global Descriptor Table (GDT) system segments etc.
- Access by loading a 16-bit selector into one of the 6 segment registers (CS, DS, SS, ...), which hold the selectors at run time (selector 0 means not-in-use)
- The selector points to a segment descriptor (8 bytes)
- Selector format 13-bit index, 1 bit choosing GDT (0) / LDT (1), 2-bit privilege level (0-3)
11780386 - segment descriptors
11880386 - Forming the linear address
- Segment descriptor is in an internal (microcode) register
- If the selector is zero (TRAP) or the segment is paged out (TRAP)
- Offset is checked against the limit field of the descriptor
- Base field of the descriptor is added to the offset (4K page size)
11980386 - paged segmentation (cont'd.)
- Combine descriptor and offset into a linear address
- If paging is disabled, this is pure segmentation (286 compatibility) the linear address is the physical address
- Paging is 2-level
- page directory (1K entries) + page tables (1K entries each)
- pages are 4K bytes each (12-bit offset)
- The page directory is pointed to by a special register
- PTEs have a 20-bit page frame and 12 bits of modified, accessed, protection, etc.
- Small segments need just a few page tables
12080386 - 2-level paging
121Segmentation with Paging Pentium (4)
- Mapping of a linear address onto a physical
address
122Intel 80386 address translation
123The end
124Dynamic Loading
- A routine is not loaded until it is called
- Better memory-space utilization an unused routine is never loaded
- Useful when large amounts of code are needed to handle infrequently occurring cases
- No special support from the operating system is required implemented through program design
125Dynamic Linking
- Linking is postponed until execution time
- A small piece of code, the stub, is used to locate the appropriate memory-resident library routine
- The stub replaces itself with the address of the routine, and executes the routine
- The operating system is needed to check whether the routine is in the process's memory address space
- Dynamic linking is particularly useful for libraries
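The self-replacing stub can be illustrated with a toy sketch: calls go through a function pointer that initially targets a resolver "stub"; on the first call the stub locates the real routine, overwrites the pointer, and executes the routine, so later calls go direct. (Real dynamic linkers do this through linker-generated tables; all names here are made up.)

```cpp
#include <cassert>

static int real_routine(int x) { return x * 2; }  // the "library routine"

static int stub(int x);                 // forward declaration
static int (*routine)(int) = stub;      // call slot, initially the stub

static int stub(int x) {
    routine = real_routine;             // stub replaces itself with the address
    return routine(x);                  // ...and executes the routine
}
```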
126Memory Protection
- Hardware
- history the IBM 360 had a 4-bit protection code in the PSW and memory in 2K partitions the process code in the PSW must match the memory partition's code
- Two registers - base & limit
- base is added by hardware without changing instructions dynamic relocation
- every request is checked against limit runtime bound checking
- reminder the IBM PC has segment registers (but no limit)
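Base & limit protection amounts to one comparison and one addition per reference, which is why hardware can do it on every access. A minimal sketch:

```cpp
#include <cassert>
#include <cstdint>

// Dynamic relocation with base & limit registers: every address the program
// issues is bound-checked against limit, then base is added by "hardware".
struct MMURegs { uint32_t base, limit; };

bool relocate(const MMURegs& r, uint32_t virt, uint32_t& phys) {
    if (virt >= r.limit) return false;  // runtime bound check -> protection fault
    phys = r.base + virt;               // relocation without changing instructions
    return true;
}
```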
127Modeling Multiprogramming
- CPU utilization as a function of the degree of multiprogramming (number of processes in memory)
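The model is the formula from the start of the lecture: with n processes each waiting for I/O a fraction p of the time, utilization = 1 - p^n. A one-liner reproduces the earlier numbers (p = 0.8: 4 processes give about 59%, 9 give about 87%):

```cpp
#include <cassert>
#include <cmath>

// CPU utilization = 1 - p^n, where p is the fraction of time a process
// spends waiting for I/O and n is the degree of multiprogramming.
double utilization(double p, int n) { return 1.0 - std::pow(p, n); }
```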
128No page tables - MIPS R2000
- a 64-entry associative memory for virtual pages
- if the page is not found there, TRAP to the operating system
- software uses some hardware registers to find the needed virtual page
- a second trap may happen on a page fault...
129Inverted page tables
- for very large (virtual) memories, whose page tables would be huge, one can have an inverted page table sorted by (physical) page frame
- IBM RT, HP Spectrum (thinking of 64-bit memories)
- to avoid a linear search on every virtual address a process references, use a hash table (one or a few memory references)
- only one page table the physical one for all processes currently in memory
- in addition to the hash table, associative memory registers store recently used page table entries
- practically the only way to deal with a 64-bit address space with 4K pages, two-level page tables can result in 2^42 entries
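The hash-table lookup can be sketched as follows: one entry per physical frame for all processes, keyed on (pid, virtual page), so a miss means a page fault. A simplified illustration (the hash function is an arbitrary choice, not what any real machine uses):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Key into the inverted page table: which process, which virtual page.
struct Key {
    uint64_t pid, vpage;
    bool operator==(const Key& o) const { return pid == o.pid && vpage == o.vpage; }
};
struct KeyHash {
    size_t operator()(const Key& k) const {
        return std::hash<uint64_t>()(k.pid * 1000003 + k.vpage);
    }
};

// (pid, virtual page) -> physical frame; nullopt means page fault.
std::optional<uint64_t> ipt_lookup(
        const std::unordered_map<Key, uint64_t, KeyHash>& ipt,
        uint64_t pid, uint64_t vpage) {
    auto it = ipt.find({pid, vpage});
    if (it == ipt.end()) return std::nullopt;
    return it->second;
}
```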
130Inverted Page Table Architecture
131Problem - instruction backup
- page-faulting instructions trap to the OS
- the OS must restart the instruction
- the page fault may originate at the op-code or at any of the operands the PC value is useless
- the location of the instruction itself is lost
- worse still, undoing of autoincrement or autodecrement was it already performed ??
- Hardware solutions
- a register to store the PC value of the instruction and a register to store changes to other registers (increment/decrement)
- micro-code dumps all information on the stack
- restart the complete instruction and redo increments etc.
- do nothing - RISC ......
132Assignment 3 Virtual Memory
- In your third assignment you will implement a virtual memory simulator
- The VM's goal is to give the user the ability to write programs without concern for the physical memory size of her computer
- The simulator will enable simulation of paging hardware and page-replacement software, and testing of various page-replacement strategies
133The main questions
- Which page replacement algorithm to use?
- how to maintain the page tables?
- Before we can answer these questions we must
review our hardware.
134The main components
- Swapper - a very simple swapper device, simulating a paging disk. It reads/writes pages from/to a specific page address
- Fast memory - the physical memory plus some bookkeeping info. It has the ability to read/write a byte or a page from/to a specific address. For the same price, it also includes a table with the following info on each page ID, dirty bit, reference bit
- MMU - the hardware translator from logical to physical addresses. It has a limited amount of space to store information. When a page is not in physical memory, the MMU traps to the page replacement manager
135And two more
- Page Replacement Manager - acts as the OS at the time of a trap from the MMU. When called to duty, it chooses a page from physical memory and replaces it with the requested page
- VM - the object that the user interfaces with. All other components are transparent to the user. It provides read/write from/to any address in the virtual address space, and can be asked for some statistical data (e.g. hit ratio)
136Back to our questions
- Which page replacement algorithm to use?
- Answer you will have to design an LRU approximation algorithm with the hardware given in the fast memory
- For comparison, also a FIFO algorithm
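One possible LRU approximation using only the per-page reference bit that the fast memory provides is the classic clock (second-chance) algorithm. This sketch is an illustration of that family of algorithms, not the required assignment solution: frames whose reference bit is set get a second chance (bit cleared, hand advances); the first frame found with a clear bit is the victim.

```cpp
#include <cassert>
#include <vector>

// ref[i] is frame i's reference bit; hand is the clock hand, kept by the caller.
// Returns the index of the victim frame and advances the hand past it.
int clock_evict(std::vector<bool>& ref, int& hand) {
    for (;;) {
        if (!ref[hand]) {                        // reference bit clear -> victim
            int victim = hand;
            hand = (hand + 1) % (int)ref.size();
            return victim;
        }
        ref[hand] = false;                       // give a second chance
        hand = (hand + 1) % (int)ref.size();
    }
}
```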
137- How to maintain the page table?
- Answer use a 2-level page table. The first level is stored in the MMU cache memory. The 2nd-level tables are each page-sized and are located in physical memory. Important 2nd-level tables may be swapped in and out of memory
138A Typical configuration.
(figure: physical memory divided into kernel space, holding the second-level page tables, and user space, holding user pages a swap device holding the pages and tables currently out of memory and the first-level table in the MMU)
First Level Table in MMU (no / adr / V-I):
1  2  V
2  -  I
3  1  V
4  -  I
139What happens if
- The user wishes to write to user page no. 6
- The user wishes to write to user page no. 5, while the next candidate to be swapped out is user page no. 6
- The user wishes to write to user page no. 3, while the next candidate to be swapped out is user page no. 7
140The scenario
- 1. The user wishes to read/write a character from/to address v_adr in the virtual memory, belonging to virtual page number pg
- 2. The virtual memory queries the MMU for the physical address of v_adr
- 3. The MMU first checks (in the first-level table) whether the second-level page (the one containing the entry for pg) is in physical memory. If it is, go to 6
- 4. Notify the Page Replacement Manager that a page fault occurred provide the required information
- 5. The Page Replacement Manager chooses a page p from the second-level pages section in physical memory and replaces the requested page with p. Then it updates both entries in the first-level table. Go to 3
141- 6. Look for the physical address of pg in the appropriate second-level page table entry. If it is in physical memory, return the correct physical address of v_adr and go to 9
- 7. The MMU notifies the Page Replacement Manager that a page fault occurred
- 8. The Page Replacement Manager chooses page sp from the user pages section of the physical memory and replaces the requested page with sp. Then it updates both entries in the appropriate second-level pages (but the second-level page containing the entry of sp might not be in physical memory in that case we have another page fault that has to be taken care of). Go to 6
- 9. The VM receives from the MMU the physical address of v_adr and reads/writes from/to that physical address
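The control flow of this scenario can be condensed into a sketch: the first-level table (in the "MMU") points at second-level tables; a miss at either level counts as a page fault. The replacement steps here are trivial stand-ins with no real victim choice, since the point is the two-level fault handling, not the policy.

```cpp
#include <cassert>
#include <optional>
#include <vector>

struct SecondLevel { std::vector<std::optional<int>> frame; };  // vpage -> frame

struct Sim {
    std::vector<SecondLevel*> first;   // nullptr = 2nd-level table not in memory
    std::vector<SecondLevel> backing;  // "swap device" copy of the 2nd-level tables
    int faults = 0;

    int translate(int idx, int vpage) {
        if (first[idx] == nullptr) {       // steps 3-5: the table itself faults
            ++faults;
            first[idx] = &backing[idx];    // stand-in for swapping the table in
        }
        auto& e = first[idx]->frame[vpage];
        if (!e) {                          // steps 6-8: the user page faults
            ++faults;
            e = vpage + 100;               // stand-in frame choice
        }
        return *e;                         // step 9: physical frame
    }
};
```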
142For evaluating your assignment
- virtual void pf_history() = 0 for each page fault, displays on screen a record serial number, type (kernel/user), page in, page out
- virtual double hit_ratio() = 0
- virtual void showMemoryTable() = 0
- virtual void showPhysicalAddress(int adr) = 0
- virtual void showFirstLevelPageTable() = 0
- virtual void showSecondLevelPageTable(int i) = 0
- Important these methods are for evaluation only and will not change the simulator's configuration
143Segmentation - Dynamic Linking
144Fundamental Concepts (1)
- Virtual address space layout for 3 user processes
- White areas are private per process
- Shaded areas are shared among all processes
145Fundamental Concepts (2)
- Mapped regions with their shadow pages on disk
- The lib.dll file is mapped into two address spaces simultaneously