Title: Memory Management
1. Memory Management
- Memory is an important, expensive resource.
- Parkinson's law: "Programs expand to fill the memory available to hold them."
- Ideally, programmers want memory that is
  - fast
  - non-volatile
  - large (if memory were cheap it would have been large, and we wouldn't have to discuss its management).
- Strong relation: multiprogramming <-> memory management.
2. Memory Management
- Memory hierarchy:
  - a small amount of fast, expensive cache memory (< 1 MB);
  - some medium-speed, medium-price main memory (RAM), ~512 MB;
  - gigabytes of slow, cheap disk storage (a portion of which is used for virtual memory), ~16 GB.
- The memory manager handles the memory hierarchy.
3. Memory Management - Motivation
- With n processes, each spending a fraction p of its time waiting for I/O, the probability that all processes wait for I/O simultaneously is p^n.
- CPU utilization = 1 - p^n.
4. Utilizing Memory
- Assume each process takes 200K, and so does the operating system.
- Assume 1 MB of memory is available and that p = 0.8:
  - space for 4 processes → ~60% CPU utilization.
- Another 1 MB enables 9 processes → ~87% CPU utilization.
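The utilization numbers above follow directly from the 1 - p^n model; a small sketch to reproduce them:

```python
def cpu_utilization(n_processes: int, p_io_wait: float) -> float:
    """CPU utilization under the multiprogramming model: 1 - p^n."""
    return 1.0 - p_io_wait ** n_processes

# 1 MB of memory (200K OS + 4 x 200K processes), p = 0.8:
print(round(cpu_utilization(4, 0.8) * 100))   # 59 (the slide's ~60%)
# Adding another 1 MB makes room for 9 processes:
print(round(cpu_utilization(9, 0.8) * 100))   # 87
```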
5. Types of Memory Managers
- Those that move processes back and forth between main memory and disk,
- and those that don't.
- Simplest form: one process in memory at a time.
  - The user types a command.
  - The system loads the program into main memory and executes it.
  - The system reports when it is done.
6. Multiprogramming with Fixed Partitions
- How to organize the memory?
- How to assign jobs to partitions?
- Separate queues vs. a single queue.
7. Allocating Memory - Growing Segments
8. Memory Allocation and Fragmentation
- Job queue:

  process  memory  time
  P1        600K    10
  P2       1000K     5
  P3        300K    20
  P4        700K     8
  P5        500K    15
9. Memory Allocation - Keeping Track (bitmaps, linked lists)
10. Strategies for Allocation
- First fit: do not search too much...
- Next fit: start the search from the last location.
- Best fit: drawback - generates small holes.
- Worst fit: solves the above problem, but badly.
- Quick fit: several queues of different sizes.
- (Try allocating 2 on the previous slide.)
- Main problem of memory allocation - fragmentation:
  - internal: wasted parts of allocated space;
  - external: wasted unallocated space.
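A minimal sketch (names and free-list layout are illustrative, not from the slides) contrasting first fit and best fit over a list of (start, size) holes:

```python
def first_fit(holes, request):
    """Return the start address of the first hole large enough."""
    for i, (start, size) in enumerate(holes):
        if size >= request:
            # Shrink the hole; drop it if fully consumed.
            if size == request:
                holes.pop(i)
            else:
                holes[i] = (start + request, size - request)
            return start
    return None  # no hole large enough

def best_fit(holes, request):
    """Return the start address of the smallest hole that fits."""
    candidates = [(size, i) for i, (_, size) in enumerate(holes) if size >= request]
    if not candidates:
        return None
    _, i = min(candidates)
    start, size = holes[i]
    if size == request:
        holes.pop(i)
    else:
        holes[i] = (start + request, size - request)
    return start

holes = [(0, 300), (500, 600), (1200, 400)]
print(first_fit(list(holes), 350))  # 500  - first hole that fits
print(best_fit(list(holes), 350))   # 1200 - smallest hole that fits
```

Note how best fit picks the tighter 400-byte hole, leaving a small 50-byte remainder - exactly the "generates small holes" drawback the slide mentions.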
11. The Buddy System
- An example of an elaborate scheme: the buddy system (Knuth, 1973).
- Separate lists of free holes of sizes that are powers of two.
- For any request, pick the first hole of the right size.
- Not very good memory utilization.
- Freed blocks can only be merged with neighbors ("buddies") of their own size.
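The merge restriction above comes from the buddy system's key invariant; a tiny sketch (illustrative, not a full allocator): a block of size 2^k at address a has its buddy at a XOR 2^k, and only that one neighbor can be merged with it.

```python
def buddy_of(addr: int, size: int) -> int:
    """Address of the buddy of the block [addr, addr+size)."""
    assert size & (size - 1) == 0, "buddy blocks are powers of two"
    return addr ^ size

# In a region split into 256-byte blocks:
print(buddy_of(0, 256))    # 256
print(buddy_of(256, 256))  # 0   - buddies pair up symmetrically
print(buddy_of(512, 256))  # 768
```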
12. The Buddy System
13. Fragmentation
- External fragmentation: total memory space exists to satisfy a request, but it is not contiguous.
- Internal fragmentation: allocated memory may be slightly larger than the requested memory; the size difference is memory internal to a partition that is not being used.
- Reduce external fragmentation by compaction:
  - shuffle memory contents to place all free memory together in one large block;
  - compaction is possible only if relocation is dynamic, and is done at execution time.
- I/O problem:
  - latch a job in memory while it is involved in I/O, or
  - do I/O only into OS buffers.
14. Memory Compaction
15. Swapping
- A process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution.
- Backing store: a fast disk, large enough to accommodate copies of all memory images for all users; it must provide direct access to these memory images.
- Roll out, roll in: a swapping variant used for priority-based scheduling algorithms; a lower-priority process is swapped out so a higher-priority process can be loaded and executed.
- The major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped.
- Modified versions of swapping are found on many systems, e.g., UNIX, Linux, and Windows.
16. Schematic View of Swapping
17. Managing Memory by Swapping
- Processes move from disk to memory and from memory to disk whenever there are too many jobs to fit in memory.
- Swapping can help solve fragmentation:
  - allocating memory;
  - freeing memory and holes;
  - possible solution: swapping plus memory compaction.
- Since swapping is performed on whole processes, it results in noticeable response time; longer queues of blocked processes can lead to many swaps.
- Allocating swap space:
  - processes are swapped in/out from the same location;
  - allocate the maximum space, or estimate the maximum;
  - don't allocate swap space for memory-resident processes (e.g., daemons).
18. Swapping in Unix
- When? The kernel runs out of memory:
  - a fork system call - no space for the child process;
  - a brk system call to expand the data segment;
  - a stack becomes too large.
- Who?
  - a blocked process with the highest priority;
  - a process which consumed much CPU.
- How much space?
  - the maximum;
  - use holes and first/best fit (old Unix).
19. Issues - Relocation and Linking
- Compile time: create absolute code.
- Load time: the linker lists relocatable instructions and the loader changes the instructions (at each reload...).
- Execution time: special hardware is needed to support moving processes during run time.
- Dynamic linking: used with system libraries; includes only a stub in each user routine, indicating how to locate the memory-resident library function (or how to load it, if needed).
20. Binding of Instructions and Data to Memory
- Address binding of instructions and data to memory addresses can happen at three different stages:
  - Compile time: if the memory location is known a priori, absolute code can be generated; the code must be recompiled if the starting location changes.
  - Load time: relocatable code must be generated if the memory location is not known at compile time.
  - Execution time: binding is delayed until run time if the process can be moved during its execution from one memory segment to another. Needs hardware support for address maps (e.g., base and limit registers).
21. Dynamic Linking
- Linking is postponed until execution time.
- A small piece of code, the stub, is used to locate the appropriate memory-resident library routine.
- The stub replaces itself with the address of the routine and executes the routine.
- The operating system needs to check whether the routine is in the process's memory address space.
- Dynamic linking is particularly useful for libraries.
22. Logical vs. Physical Address Space
- The concept of a logical address space that is bound to a separate physical address space is central to proper memory management.
- Logical address: generated by the CPU; also referred to as a virtual address.
- Physical address: the address seen by the memory unit.
- Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in the execution-time address-binding scheme.
23. Paging and Virtual Memory
- Enable an address space that is independent of physical memory:
  - 2^32 addresses for a 32-bit (address bus) machine - virtual addresses.
- Can be achieved by segmenting the executable (with segment registers...) or by dividing memory using another method: paging.
- Paging: divide memory into fixed-size blocks (page frames):
  - blocks small enough that one process needs many of them;
  - allocate non-contiguous memory chunks to processes, avoiding holes...
24. Memory-Management Unit (MMU)
- A hardware device that maps virtual to physical addresses.
- In the MMU scheme, the value in the relocation register is added to every address generated by a user process at the time the address is sent to memory.
- The user program deals with logical addresses; it never sees the real physical addresses.
25. Paging
26. Memory Management Unit
27. MMU Operation - page fault if the accessed page is absent
28. Pages: the data. Page frames: the physical memory locations.
- Page table entries (PTEs) contain, per page:
  - page frame number (physical address);
  - present/absent bit (valid bit);
  - dirty (modified) bit;
  - referenced (accessed) bit;
  - protection bits;
  - caching disable/enable bit.
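A small sketch of decoding such an entry. The exact field layout here is an assumption for illustration (no real CPU is implied): frame number in the high bits, flag bits at the bottom.

```python
# Illustrative PTE layout (an assumption, not a specific architecture's):
PRESENT, DIRTY, REFERENCED, CACHE_DISABLE = 1 << 0, 1 << 1, 1 << 2, 1 << 3
FRAME_SHIFT = 12  # with 4K pages, the frame number sits above the low 12 bits

def decode_pte(pte: int) -> dict:
    return {
        "frame":      pte >> FRAME_SHIFT,
        "present":    bool(pte & PRESENT),
        "dirty":      bool(pte & DIRTY),
        "referenced": bool(pte & REFERENCED),
    }

pte = (9 << FRAME_SHIFT) | PRESENT | REFERENCED
print(decode_pte(pte))  # frame 9, present and referenced, not dirty
```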
29. Page vs. Page-Table Sizes - Tradeoffs
- A logical address of 24 bits (16 MB) (on a 32-bit machine with op-codes of 8 bits) can be divided into:
  - 1K pages and a 16K-entry table (16K x 8 = 128K);
  - 4K pages and a 4K-entry table (4K x 8 = 32K).
- Large pages: fewer pages, but waste in the last page.
- Small pages: larger tables (also a waste of space).
- A logical address of 32 bits (4 GB) can be divided into:
  - 1K pages and a 4M-entry table (4M x 8 = 32M!);
  - 4K pages and a 1M-entry table (1M x 8 = 8M).
- Huge tables! What to do?
30. Two-Level Paging Example
- A logical address (on a 32-bit machine with 4K page size) is divided into:
  - a page number consisting of 20 bits;
  - a page offset consisting of 12 bits.
- Since the page table is paged, the page number is further divided into:
  - a 10-bit outer page number;
  - a 10-bit page offset (within the outer page table's page).
- Thus a logical address is | p1 (10 bits) | p2 (10 bits) | d (12 bits) |,
- where p1 is an index into the outer page table, and p2 is the displacement within the page of the outer page table.
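The 10/10/12 split above can be sketched with plain bit operations:

```python
def split_address(va: int):
    """Split a 32-bit virtual address into (p1, p2, d) per the slide."""
    d  = va & 0xFFF          # low 12 bits: offset within the 4K page
    p2 = (va >> 12) & 0x3FF  # next 10 bits: entry in the inner page table
    p1 = (va >> 22) & 0x3FF  # top 10 bits: entry in the outer page table
    return p1, p2, d

print(split_address(0x00403004))  # (1, 3, 4)
```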
31. Two-Level Page-Table Scheme
32. Two-Level Paging Example - VAX
- A logical address (on a 32-bit machine) is divided into:
  - a page number consisting of 23 bits;
  - a page offset consisting of 9 bits (page size 1/2 K!).
- Since the page table is paged, the page number is further divided into:
  - a 21-bit page number;
  - a 2-bit section index (code, heap, stack, system).
- Thus a logical address is | s (2 bits) | p (21 bits) | d (9 bits) |,
- where s is an index into the section table, and p is the pointer into the page table. Note: the section table is always in memory; the page table may be swapped. Its maximum size is 2M x 4 = 8 MB!
33. SPARC 3-Level Paging
- Context table (in MMU hardware) - one entry per process.
34. Page Table Considerations
- Can be very large (1M pages for 32-bit addresses).
- Must be fast (every instruction needs it).
- One extreme: keep it all in hardware - fast registers that hold the page table and are loaded with each process; too expensive for the above size.
- The other extreme: keep it all in memory, using a page-table base register (PTBR) to point to it; each memory reference during instruction translation is then doubled...
- To avoid keeping complete page tables in memory, make them multilevel (and avoid the danger of accumulating memory references per instruction by caching).
35. Multilevel Paging and Performance
- Since each level is stored as a separate table in memory, converting a logical address to a physical one may take four memory accesses.
- Even though the time needed for one memory access is quintupled, caching permits performance to remain reasonable.
- A cache hit rate of 98 percent yields an effective access time of 0.98 x 120 + 0.02 x 520 = 128 nanoseconds, which is only a 28 percent slowdown in memory access time.
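The 128 ns figure above is a straightforward weighted average; a one-liner to check it:

```python
def effective_access(hit_rate: float, hit_ns: float, miss_ns: float) -> float:
    """Weighted average of the hit path and the full-table-walk miss path."""
    return hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns

print(effective_access(0.98, 120, 520))  # ~128 ns, a 28% slowdown over 100 ns
```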
36. Inverted Page Tables
- For very large memories (page tables), one can use an inverted page table, organized by (physical) page frames.
- Examples: IBM RT, HP Spectrum (thinking of 64-bit memories).
- To avoid a linear search for every virtual address of a process, use a hash table (one or a few memory references).
- Only one page table - the physical one - for all processes currently in memory.
- In addition to the hash table, associative memory registers are used to store recently used page table entries.
- It is the only way to deal with a 64-bit memory: with 4K pages, two-level page tables can result in 2^42 entries.
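A minimal sketch of the lookup path (illustrative only; real hardware chains entries inside the table rather than using Python dicts): one entry per physical frame, found by hashing (pid, virtual page).

```python
NUM_FRAMES = 8
table = [None] * NUM_FRAMES   # frame -> (pid, virtual page) it holds
buckets = {}                  # hash bucket -> candidate frames

def insert(pid, page, frame):
    table[frame] = (pid, page)
    buckets.setdefault(hash((pid, page)) % NUM_FRAMES, []).append(frame)

def lookup(pid, page):
    """Return the physical frame holding (pid, page), or None (page fault)."""
    for frame in buckets.get(hash((pid, page)) % NUM_FRAMES, []):
        if table[frame] == (pid, page):
            return frame
    return None

insert(pid=1, page=0x42, frame=3)
print(lookup(1, 0x42))  # 3
print(lookup(1, 0x99))  # None - would trigger a page fault
```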
37. Inverted Page Table Architecture
38. Shared Pages
39. Motivation for Virtual Memory
- Unused code:
  - error routines;
  - rare functionality.
- Unused data:
  - arrays larger than needed;
  - garbage not collected.
40. Demand Paging
- Bring a page into memory only when it is needed:
  - less I/O needed;
  - less memory needed;
  - faster response;
  - more users.
- A page is needed → reference it:
  - invalid reference → abort;
  - not in memory → bring it into memory.
41. In-Memory Bit
- With each page table entry a valid-invalid bit is associated (1 → in memory, 0 → not in memory).
- Initially, the valid-invalid bit is set to 0 on all entries.
- Example of a page table snapshot.
- During address translation, if the valid-invalid bit in a page table entry is 0 → page fault.
42. Page Fault
- If there is a reference to an absent page, the first reference will trap to the OS → page fault.
- The OS looks at another table to decide:
  - invalid reference → abort;
  - just not in memory:
    - get an empty frame;
    - swap the page into the frame;
    - reset the tables, set the validation bit = 1;
    - restart the instruction.
- Hard cases for restart:
  - block move instructions;
  - auto increment/decrement locations.
43. What Happens if There Is No Free Frame?
- Page replacement: find some page in memory that is not really in use and swap it out.
  - Needs an algorithm.
  - Performance: we want an algorithm which will result in a minimum number of page faults.
- The same page may be brought into memory several times.
44. Page Fault Handling
- 1. Trap to the kernel; save the PC on the stack and (sometimes) partial state in registers (and/or on the stack).
- 2. An assembly routine saves the volatile information and calls the operating system.
- 3. Find the requested virtual page.
- 4. Check protection. If legal, find a free page frame (or invoke the page replacement algorithm).
- 5. If replacing, check whether the victim is modified and start writing it to disk. Mark the frame busy. Call the scheduler to block the process until the write to disk has completed.
45. Page Fault Handling (cont'd)
- 6. Transfer the requested page from disk (the scheduler runs alternative processes).
- 7. Upon transfer completion, update the page table: mark the new page as valid and update all other parameters.
- 8. Back up the faulting instruction, which was in principle in mid-execution; now the PC can be set back to its initial value.
- 9. Schedule the faulting process; return from the operating system.
- 10. Restore state (i.e., all volatile information stored by the assembly routine) and return to user space for execution of the faulted process.
46. Problem - Instruction Backup
- Page-faulting instructions trap to the OS; the OS must restart the instruction.
- The page fault may originate at the op-code or at any of the operands - the PC value is useless:
  - the location of the instruction itself is lost;
  - worse still, undoing of autoincrement or autodecrement - was it already performed?
- Hardware solutions:
  - a register to store the PC value of the instruction, and a register to store changes to other registers (increment/decrement);
  - microcode dumps all information on the stack;
  - restart the complete instruction and redo the increments etc.;
  - do nothing - RISC...
47. Memory Access with Page Faults
- P: probability of a page fault.
- MA: memory access time.
- PF: time to process a page fault.
- EMA: effective memory access time,
  - EMA = (1 - P) x MA + P x PF,
- where PF = page-fault interrupt service time + read-in-page time (maybe a write-page too?) + restart-process time.
48. Effective Memory Access
- For MA = 100 nsec and PF = 25 msec:
  - if P = 0.001 → EMA ≈ 100 + 25 x 10^6 / 10^3 = 25,100 nsec;
  - if P = 10^-5 → EMA ≈ 100 + 250 = 350 nsec.
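The two cases above drop straight out of the EMA formula from the previous slide:

```python
def ema(p_fault: float, ma_ns: float = 100.0, pf_ns: float = 25e6) -> float:
    """EMA = (1 - P) * MA + P * PF, with MA = 100 ns and PF = 25 ms."""
    return (1.0 - p_fault) * ma_ns + p_fault * pf_ns

print(round(ema(1e-3)))  # 25100 ns - 250x slower than a plain access
print(round(ema(1e-5)))  # 350 ns
```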
49. Associative Memory
- Content-addressable memory.
- Page insertion: copy the complete entry from the page table.
- Page deletion: copy just the modified bit back to the page table.
50. Associative Memory - Comments
- With a large enough hit ratio, the average added translation time is close to 0.
- Only a complete virtual address (all levels) can be counted as a hit.
- With multiprocessing, the associative memory can be cleared on a context switch - wasteful...
- Better: add a field to the associative memory to hold the process ID, and a special register for the current PID.
51. Fundamental Concepts (1)
- Virtual address space layout for 3 user processes:
  - white areas are private per process;
  - shaded areas are shared among all processes.
52. Fundamental Concepts (2)
- Mapped regions with their shadow pages on disk.
- The lib.dll file is mapped into two address spaces simultaneously.
53. Page Replacement Algorithms
- A page fault forces a choice:
  - which page must be removed to make room for the incoming page?
- A modified page must first be saved;
  - an unmodified one is just overwritten.
- Better not to choose an often-used page:
  - it will probably need to be brought back in soon.
54. Optimal Page Replacement
- Demand comes in for pages (3 physical page frames). Reference string:
  7, 5, 1, 0, 5, 4, 7, 0, 2, 1, 0, 7
- An optimal algorithm faults on:
  7  5  1  (0,1)  -  (4,5)  -  -  (2,4)  (1,2)  -  -
  (a pair (in, out) marks a replacement) - altogether 4 page replacements.
- Take FIFO for example:
  7  5  1  (0,7)  -  (4,5)  (7,1)  -  (2,0)  (1,4)  (0,7)  (7,2)
  - 3 additional page replacements.
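The counts above can be replayed with a short simulation of both policies on the slide's reference string (replacements counted after the 3 initial loads):

```python
from collections import deque

REFS = [7, 5, 1, 0, 5, 4, 7, 0, 2, 1, 0, 7]

def fifo_replacements(refs, frames=3):
    mem, order, repl = set(), deque(), 0
    for r in refs:
        if r in mem:
            continue
        if len(mem) == frames:       # evict the oldest resident page
            mem.discard(order.popleft())
            repl += 1
        mem.add(r)
        order.append(r)
    return repl

def optimal_replacements(refs, frames=3):
    mem, repl = set(), 0
    for i, r in enumerate(refs):
        if r in mem:
            continue
        if len(mem) == frames:
            future = refs[i + 1:]
            # Evict the page used farthest in the future (or never again).
            victim = max(mem, key=lambda p: future.index(p) if p in future else len(future))
            mem.discard(victim)
            repl += 1
        mem.add(r)
    return repl

print(optimal_replacements(REFS))  # 4
print(fifo_replacements(REFS))     # 7 - three more than optimal
```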
55. Good Old FIFO
- Implemented as a queue.
- The usual drawback:
  - the oldest page may be a referenced (needed) page.
- Second-chance FIFO:
  - if the reference bit is on, move the page to the end of the queue.
- Better implemented as a circular queue,
  - to save the overhead of movements on the queue.
56. LRU - Least Recently Used
- Approximates the optimal algorithm:
  - the most recently used pages are the most probable next references.
- Replace the page used furthest in the past.
- Not easy to implement - needs counting of references:
  - use a large counter (number of operations), saved in a page-table field on each page reference, or
  - use a bit array of n x n bits.
- In both cases, the page entry with the smallest number attached to it is selected for replacement.
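A minimal sketch of the counter variant (illustrative): each reference stamps the page with a global operation count, and eviction picks the smallest stamp.

```python
class LRU:
    def __init__(self, frames: int):
        self.frames = frames
        self.stamp = {}   # page -> time of last reference
        self.clock = 0    # global operation counter
        self.faults = 0

    def reference(self, page: int):
        self.clock += 1
        if page not in self.stamp:
            self.faults += 1
            if len(self.stamp) == self.frames:
                # Evict the page with the smallest (oldest) stamp.
                victim = min(self.stamp, key=self.stamp.get)
                del self.stamp[victim]
        self.stamp[page] = self.clock

lru = LRU(frames=3)
for p in [7, 0, 1, 2, 0, 3, 0, 4]:
    lru.reference(p)
print(lru.faults)  # 6
```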
57. LRU vs. Optimal
- Reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1 (page frames shown per reference).
- Figure 9.10: optimal page-replacement algorithm.
- Figure 9.11: LRU page-replacement algorithm.
58. Second-Chance Page Replacement Algorithm
- Operation of second chance:
  - pages are sorted in FIFO order.
- Page list if a fault occurs at time 20 and A has its R bit set (the numbers above the pages are loading times).
- When A moves forward (to the end of the list), its R bit is cleared!
59. The Clock Page Replacement Algorithm
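A sketch of the clock variant of second chance: the hand sweeps the circular frame list, clearing R bits, and evicts the first frame whose R bit is already 0.

```python
class Clock:
    def __init__(self, frames: int):
        self.pages = [None] * frames   # page resident in each frame
        self.rbit = [0] * frames
        self.hand = 0
        self.faults = 0

    def reference(self, page: int):
        if page in self.pages:          # hit: just set the reference bit
            self.rbit[self.pages.index(page)] = 1
            return
        self.faults += 1
        while self.rbit[self.hand]:     # referenced pages get a second chance
            self.rbit[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.pages)
        self.pages[self.hand] = page    # evict and install the new page
        self.rbit[self.hand] = 1
        self.hand = (self.hand + 1) % len(self.pages)

clk = Clock(frames=3)
for p in [1, 2, 3, 1, 4, 5]:
    clk.reference(p)
print(clk.faults)  # 5
```

Unlike plain second-chance FIFO, no list entries are moved; only the hand advances.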
60. Page Replacement: NRU - Not Recently Used
- There are 4 classes of pages, according to the reference and modification bits.
- Select a page at random from the least-needed class.
- An easy scheme to implement.
- Prefers keeping a frequently referenced (unmodified) page over an old modified page.
- Class b (referenced = 0, modified = 1) is interesting: it can only happen when a clock tick erases the reference bit...
61. LRU - Realizing in Hardware
- Use a large counter (64 bits), saved in a page-table field on each page reference. At page-fault time, find the minimum - how?
- Another option: per-page counters with shift. On each page reference, shift all the counters and put a 1 for the referenced page; select the page with the most zeroes from the left. Too many counter shifts!
- Another option: use a bit array of n x n bits with only TWO operations: set a row to 1s, set a column to 0s.
- In all cases: too much overhead for the hardware.
- Needed: an (approximate) software solution.
62. LRU with Bit Tables
- Reference string: 0, 1, 2, 3, 2, 1, 0, 3, 2, 3
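The n x n bit-matrix scheme can be sketched in a few lines: on a reference to page k, set row k to all 1s, then clear column k; the row with the smallest binary value then belongs to the least recently used page.

```python
def simulate_bit_matrix(refs, n):
    matrix = [[0] * n for _ in range(n)]
    for k in refs:
        matrix[k] = [1] * n    # set row k to 1s
        for row in matrix:     # clear column k in every row
            row[k] = 0
    return matrix

def lru_page(matrix):
    """Row with the smallest binary value = least recently used page."""
    value = lambda row: int("".join(map(str, row)), 2)
    return min(range(len(matrix)), key=lambda k: value(matrix[k]))

m = simulate_bit_matrix([0, 1, 2, 3, 2, 1, 0, 3], n=4)
print(lru_page(m))  # 2 - the page referenced longest ago
```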
63. NFU - Not Frequently Used
- To record frequently used pages, add a counter to all table entries, but don't update on each memory reference - update on each clock tick!
- At each clock tick, add the R bit to the counters.
- Select the page with the lowest counter for replacement.
- Problem: it remembers everything.
- Remedy (an aging algorithm):
  - shift-right the counter before adding the reference bit;
  - add the reference bit at the left.
- Fewer operations than LRU; depends on the intervals used for updating.
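A sketch of the aging remedy (8-bit counters assumed for illustration): on each tick every counter is shifted right and the R bit lands in the leftmost position, so recent references dominate old ones.

```python
BITS = 8  # counter width (an assumption for this sketch)

def age_tick(counters, r_bits):
    """One clock tick: counters[i] >>= 1, then the R bit goes to the top bit."""
    for i in range(len(counters)):
        counters[i] = (counters[i] >> 1) | (r_bits[i] << (BITS - 1))
        r_bits[i] = 0  # hardware clears R after the tick

counters = [0, 0, 0]
# Tick 1: pages 0 and 1 referenced; tick 2: only page 2 referenced.
age_tick(counters, [1, 1, 0])
age_tick(counters, [0, 0, 1])
print(counters)  # [64, 64, 128] - page 2, referenced last, has the highest value
print(counters.index(min(counters)))  # 0 - pages 0 and 1 tie; index picks 0
```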
64. NFU - the Aging Simulation Version
65. Differences between LRU and NFU
- If two pages have the same number of zeroes before the first 1, which to select?
- If two pages both have all-zero counters, which to select? (counter too short)
- Therefore it is only an approximation!
66. Modelling (Static) Paging Algorithms
- Belady's anomaly.
- Example: FIFO with the reference string 1 2 3 4 1 2 5 1 2 3 4 5.
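Running FIFO on that reference string shows the anomaly directly - adding a frame increases the fault count:

```python
from collections import deque

def fifo_faults(refs, frames):
    mem, order, faults = set(), deque(), 0
    for r in refs:
        if r in mem:
            continue
        faults += 1
        if len(mem) == frames:
            mem.discard(order.popleft())  # evict the oldest page
        mem.add(r)
        order.append(r)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # 9 faults with 3 frames
print(fifo_faults(refs, 4))  # 10 faults with 4 frames - Belady's anomaly
```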
67. Characterizing Page Replacement
- A reference string (of requested pages).
- The number of virtual pages, n.
- The number of physical page frames, m (static).
- A page replacement algorithm.
- Can be represented by an array M of n rows.
68. Stack Algorithms
- Definition: the set of pages in physical memory with m page frames is a subset of the set of pages in physical memory with m+1 page frames (for every reference string).
- Stack algorithms have no anomaly.
- Examples: LRU, optimal replacement.
- FIFO is not a stack algorithm.
- Useful definition:
  - distance string: distance from the top of the stack.
69. Predicting Page Fault Numbers
- Ci is the number of times that distance i occurs in the distance string.
- The number of page faults with m frames is
  Fm = C(m+1) + C(m+2) + ... + Cn + C∞
  (every reference at a stack distance greater than m faults).
70. The Distance String
- Probability density functions for two hypothetical distance strings.
71. Page Allocation Policies (2)
- Page fault rate as a function of the number of page frames assigned.
72. Page Frame Allocation
- For a page-fault rate p, a memory access time of 100 nanosecs, and a page-fault service time of 25 millisecs, the effective access time is (1-p) x 100 + p x 25,000,000.
- For p = 0.001 the effective access time is still larger than 100 nanosecs by a factor of 250.
- For a goal of only a 10% degradation in access time, we need p < 0.0000004.
- Policies for page-frame allocation must allocate as much as possible to processes, to enhance performance - leave no unassigned page frame.
- It is difficult to know how many frames to allocate: processes differ in size, structure, and priority.
73. Allocation to Multiple Processes
- Fair share is not the best policy (it is static!!).
- Allocating according to process size: so-so.
- There must be a minimum for running a process...
74. Thrashing
- If a process does not have enough pages, the page-fault rate is very high. This leads to:
  - low CPU utilization;
  - the operating system thinks that it needs to increase the degree of multiprogramming;
  - another process is added to the system.
- Thrashing: a process is busy swapping pages in and out.
75. Thrashing Diagram
- Why does paging work? The locality model:
  - a process migrates from one locality to another;
  - localities may overlap.
- Why does thrashing occur? Σ (size of locality) > total memory size.
76. Working-Set Model
- Δ = working-set window = a fixed number of page references. Example: 10,000 instructions.
- WSSi (working set size of process Pi) = total number of pages referenced in the most recent Δ (varies in time):
  - if Δ is too small, it will not encompass the entire locality;
  - if Δ is too large, it will encompass several localities;
  - if Δ = ∞, it will encompass the entire program.
- D = Σ WSSi = total demand for frames.
- If D > m → thrashing.
- Policy: if D > m, then suspend one of the processes.
77. Working-Set Model
- The working set is the set of pages used by the k most recent memory references.
- The function w(k,t) is the size of the working set at time t.
- How do we estimate w(k,t) WITHOUT an update on each memory reference?
78. Working Set Model
79. Dynamic Page Allocation - Lookback
- Reference string: 0 2 1 3 5 4 6 3 7 5 7 3 3 5 6 4
- With 5 page frames (LRU): p p p p p p p - p - - - - - - -   (p = page fault) - optimal.
- With Δ = 5 (and LRU): p p p p p p p - p - - (4)(3) - p(4) p(4)
- For a window of size 5, the allocated working set decreases after requests 12 and 14.
- The maximum page allocation is Δ.
- Extra page faults occur because of the size of the WS.
- After the last request (page 4), the number of allocated page frames increases again (to 4).
80. Keeping Track of the Working Set
- Approximate with an interval timer + the reference bit.
- Example: Δ = 10,000:
  - the timer interrupts after every 5,000 time units;
  - keep 2 bits in memory for each page;
  - whenever the timer interrupts, copy the values of all reference bits and set them to 0;
  - if one of the bits in memory = 1 → the page is in the working set.
- Why is this not completely accurate?
- Improvement: 10 bits and an interrupt every 1,000 time units.
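A sketch of that timer-based approximation: each page keeps a short history of captured reference bits, shifted at every timer interrupt; a page is in the working set if any history bit is set.

```python
HISTORY = 2  # bits kept per page (the slide's basic scheme)

def timer_interrupt(history, r_bits):
    """Shift each page's history and capture-then-clear its R bit."""
    for page in range(len(history)):
        history[page] = ((history[page] << 1) | r_bits[page]) & ((1 << HISTORY) - 1)
        r_bits[page] = 0

def working_set(history):
    return {page for page, h in enumerate(history) if h}

history = [0, 0, 0, 0]
timer_interrupt(history, [1, 0, 1, 0])  # pages 0, 2 referenced this interval
timer_interrupt(history, [0, 0, 1, 0])  # only page 2 referenced
print(sorted(working_set(history)))     # [0, 2] - page 0 still within history
```

The inaccuracy the slide asks about is visible here: a reference is only known to fall somewhere within a 5,000-unit interval, not at an exact time.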
81. Dynamic Set Aging
- The look-back window cannot be based on memory references - too expensive.
- One way to enlarge the time gap between updates is to use clock-tick triggering.
- Reference bits are updated by the hardware.
- Some algorithms clear the reference bits but also use an additional data structure to store the current virtual time of the process - aging. The current virtual time is stored for each entry with R = 1; this is done on every clock interrupt.
- At page-fault time, the table is scanned and the entry with R = 0 and the largest age (virtual time - stored time) is selected.
- Why virtual time? Because we need to keep times independently per process.
- This idea can be the basis for page replacement that selects the oldest pages among the non-referenced.
82. The Working Set Page Replacement Algorithm (2)
- The working set algorithm.
83. Dynamic Set - Clock Algorithm
- WSClock is a global clock algorithm - for pages held by all processes in memory.
- Circling the clock, the algorithm uses the reference bit and an additional data structure, ref(frame), which is set to the current virtual time of the process.
- WSClock: use an additional condition that measures elapsed (process) time and compares it to τ.
- Replace a page when both conditions apply:
  - the reference bit is unset, and
  - Tp - ref(frame) > τ.
84. The WSClock Page Replacement Algorithm
85. Dynamic Set - WSClock Example
- 3 processes: p0, p1 and p2.
- Current (virtual) times of the 3 processes: Tp0 = 50, Tp1 = 70, Tp2 = 90.
- WSClock: replace when Tp - ref(frame) > τ; the minimal distance (window size) is τ = 20.
- The clock hand is currently pointing to page frame 4.

  page frame   0   1   2   3   4   5   6   7   8   9  10
  ref. bit     0   0   1   1   1   0   1   0   0   1   0
  process ID   0   1   0   1   2   1   0   0   1   2   2
  last_ref    10  30  42  65  81  57  31  37  31  47  55

- Sweeping from frame 5: frames 5 and 7 have R = 0 but elapsed times of only 13 and 13 (not > 20); frame 8 has R = 0 and 70 - 31 = 39 > 20.
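The example can be replayed with a simplified sweep (illustrative: R bits are cleared as the hand passes, and the per-process ref-time update of full WSClock is omitted):

```python
TP = {0: 50, 1: 70, 2: 90}   # per-process virtual times from the slide
TAU = 20

rbit     = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0]
pid      = [0, 1, 0, 1, 2, 1, 0, 0, 1, 2, 2]
last_ref = [10, 30, 42, 65, 81, 57, 31, 37, 31, 47, 55]

def wsclock_victim(hand):
    """Advance the hand and return the first frame with R = 0 and age > TAU."""
    n = len(rbit)
    for step in range(1, n + 1):
        f = (hand + step) % n
        if rbit[f] == 0 and TP[pid[f]] - last_ref[f] > TAU:
            return f
        rbit[f] = 0  # second chance: clear R as the hand passes
    return None

print(wsclock_victim(hand=4))  # 8: R = 0 and 70 - 31 = 39 > 20
```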
86. Review of Page Replacement Algorithms
87. Comment - Page Size Analysis
- To minimize wasted memory, with:
  - process size s;
  - page size p;
  - page table entry size e.
- Fragmentation overhead is p/2 (half of the last page, on average).
- Table space overhead is s·e/p.
- Total overhead is p/2 + s·e/p.
- Minimizing the overhead gives p = sqrt(2se).
- Example: s = 128K, e = 8 bytes → the optimal page size is 1448 bytes... i.e., use 1K or 2K or 4K.
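The overhead model above, total(p) = p/2 + s·e/p, is minimized at p* = sqrt(2·s·e):

```python
import math

def optimal_page_size(s_bytes: int, e_bytes: int) -> float:
    """Page size minimizing p/2 + s*e/p, i.e. sqrt(2*s*e)."""
    return math.sqrt(2 * s_bytes * e_bytes)

print(round(optimal_page_size(128 * 1024, 8)))  # 1448 bytes -> use 1K/2K/4K in practice
```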
88. Virtual Memory - Advantages
- Programs use much smaller physical memory than their maximum requirements (much code or data is unused):
  - more programs can run concurrently in memory.
- Programs can use a much larger (virtual) memory:
  - simplifies programming and enables using powerful software.
- Swapping time is smaller.
- All physical memory can be used, whether consecutive or not.
- More flexible memory protection.
89. Virtual Memory - Disadvantages
- Special hardware for address translation - some instructions may require 5-6 address translations!
- Difficulties in restarting instructions (chip/microcode complexity).
- Complexity of the OS!
- Overhead - a page fault is an expensive operation in terms of both CPU and I/O overhead.
- Difficulty of optimizing memory utilization - e.g., buffering in DBMSs. Dangers of thrashing!
90. Additional Issues - Locking and Sharing
- An I/O channel/processor (DMA) transfers data independently:
  - a page must not be replaced during a transfer;
  - the OS can use a lock variable per page.
- Pages of an editor's code can be shared among processes:
  - swapping out, or terminating, process A (and its pages) may cause many page faults for a process B that shares them;
  - looking up evicted pages in all page tables is impossible;
  - solution: maintain special data structures for shared pages.
- Nice idea: transfer a page directly from the (kernel) process sending data to the process receiving it.
91. Handling the Backing Store
- Non-resident pages need to be stored on disk.
- The backing store (disk swap area) needs to be managed:
  - allocate swap area to (whole) processes and address pages by offset from the swap address;
  - processes grow during execution - assign separate swap areas to text, data and stack;
  - allocate disk blocks when needed - needs disk addresses in memory to keep track of swapped pages.
92. Backing Store
- (a) Paging to a static swap area.
- (b) Backing up pages dynamically.
93. Implementation Issues
- Four times when the OS is involved with paging:
  - Process creation:
    - determine program size;
    - create the page table.
  - Process execution:
    - MMU reset for the new process;
    - TLB flushed.
  - Page fault:
    - determine the virtual address causing the fault;
    - swap the target page out and the needed page in.
  - Process termination:
    - release the page table and pages.
94. Cleaning Policy
- Need for a background process, the paging daemon:
  - periodically inspects the state of memory;
  - when too few frames are free, selects pages to evict using a replacement algorithm.
- It can use the same circular list (clock) as the regular page replacement algorithm, but with a different pointer.
95. Locking Pages in Memory
- Virtual memory and I/O occasionally interact:
  - a process issues a call for a read from a device into a buffer;
  - while waiting for the I/O, another process starts up and has a page fault;
  - the buffer of the first process may be chosen to be paged out.
- Need to be able to specify some pages as locked:
  - they are exempted from being target pages.
96. Separation of Policy and Mechanism
- Page fault handling with an external pager.
- Example user: a DBMS!
97. Page Daemons - Unix
- It is assumed useful to keep a number of free pages.
- Freeing page frames can be done by a page daemon - a process that sleeps most of the time:
  - awakened periodically to inspect the state of memory;
  - if there are too few free page frames, it frees page frames.
- Yet another type of (global) dynamic page replacement policy.
- This strategy performs better than evicting pages only when needed (and writing the modified ones to disk in a hurry).
- The net result is the use of all available memory as a page pool.
98. Page Replacement - Unix
- The page daemon uses a two-handed clock algorithm.
- Any global clock algorithm either clears the reference bit or grabs the (unreferenced) page from its process. It is fast and uses just the reference bit.
- A two-handed clock algorithm clears the reference bit with its first hand and grabs with its second hand. Its parameter is the angle between the hands - a small angle leaves only busy pages.
- Interesting idea on fork: keep the same pages for the offspring and only copy upon write (Linux).
- Another interesting idea (Linux): inspect user pages in virtual memory order (global clock) and in system order (first unused cache, second unused shared, third, unused pages of the heaviest user process).
- bdflush: a daemon to flush dirty pages.
99. ... and in Windows 2000
- Processes have working sets defined by two parameters - the minimal and maximal number of pages.
- The WS of a process is updated at the occurrence of each page fault (i.e., the data structure WS):
  - PF and WS < Min: add the page to the WS;
  - PF and WS > Max: remove a page from the WS.
- Memory is managed by keeping a number of free pages, which is a complex function of memory use, at all times (at most one disk reference per PF).
- When the balance-set manager runs (every second) and needs to free pages:
  - surplus pages (beyond the WS) are removed from a process (large background processes before small foreground ones);
  - counters of page references are maintained (on a multiprocessor, reference bits don't work since they are per-CPU local).
100. Memory Management System Calls
- The principal Win32 API functions for mapping virtual memory in Windows 2000.
101. Implementation of Memory Management
- A page table entry for a mapped page on the Pentium.
102. Physical Memory Management (1)
- Various page lists and the transitions between them.
103. Segmentation
- Several logical address spaces per process.
- A compiler needs segments for:
  - source text;
  - symbol table;
  - constants segment;
  - stack;
  - parse tree;
  - compiler executable code.
- Most of these segments grow during execution.
104. Segmentation - Segment Table
105. Sharing of Segments
106. Segmentation vs. Paging

  Consideration                                          Paging                     Segmentation
  Need the program be aware of the technique?            no                         yes
  How many linear address spaces?                        1                          many
  Can the total address space exceed physical memory?    yes                        yes
  Can procedures and data be distinguished?              no                         yes
  Is sharing of procedures among users facilitated?      no                         yes
  Motivation for the technique                           get a large linear space   programs and data in logically
                                                                                    independent address spaces
107. Segmentation Architecture
- A logical address consists of a two-tuple: <segment-number, offset>.
- The segment table maps two-dimensional addresses to physical addresses; each table entry has:
  - base: the starting physical address where the segment resides in memory;
  - limit: the length of the segment.
- Segment-table base register (STBR): points to the segment table's location in memory.
- Segment-table length register (STLR): indicates the number of segments used by a program; a segment number s is legal if s < STLR.
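The translation path just described (STLR check, then limit check, then base + offset) can be sketched as follows; the table contents are made up for illustration:

```python
from typing import NamedTuple

class Segment(NamedTuple):
    base: int    # starting physical address
    limit: int   # segment length

segment_table = [Segment(base=1000, limit=400), Segment(base=5000, limit=100)]
STLR = len(segment_table)

def translate(s: int, offset: int) -> int:
    if s >= STLR:
        raise MemoryError("illegal segment number")       # s must be < STLR
    seg = segment_table[s]
    if offset >= seg.limit:
        raise MemoryError("offset beyond segment limit")  # protection trap
    return seg.base + offset

print(translate(0, 42))  # 1042
print(translate(1, 99))  # 5099
```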
108. Segmentation Architecture (Cont.)
- Protection: with each entry in the segment table associate:
  - a validation bit (0 → illegal segment);
  - read/write/execute privileges.
- Protection bits are associated with segments; code sharing occurs at the segment level.
- Since segments vary in length, memory allocation is a dynamic storage-allocation problem (i.e., a fragmentation problem).
109. Segmentation with Paging
- MULTICS combined segmentation and paging:
  - 2^18 segments of up to 64K words (36 bits);
  - addresses of 34 bits:
    - 18-bit segment number;
    - 16 bits: page number (6) + offset within page (10).
- Each process has a segment table (STBR).
- The segment table is itself a segment and is paged (8-bit page + 10-bit offset); the STBR is added to the 18-bit segment number.
- Each segment is a separate virtual memory with a page table (6 bits).
- Segment tables contain segment descriptors: an 18-bit page table address + a 9-bit segment length.
110MULTICS segment descriptors
111Segmentation - Memory reference procedure
- 1. Use the segment number to find the segment descriptor
- the segment table is itself paged because it is large, so in actuality the STBR is used to locate the page holding the descriptor
- 2. Check whether the segment's page table is in memory
- if not, a segment fault occurs
- if there is a protection violation, TRAP (fault)
- 3. The page table entry for the requested virtual page is examined a page fault may occur
- if the page is in memory, the address of the start of the page is extracted from the page table
- 4. The offset is added to the page origin to construct the main-memory address
- 5. Perform the read/store etc.
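The five steps above can be sketched as code. These structures are simplified stand-ins, not real MULTICS formats: a per-segment page table whose entries are frame numbers (empty = page fault), and a segment table whose descriptors hold a page-table pointer (null = segment fault, i.e. the segment's page table is not in memory).

```cpp
#include <cassert>
#include <optional>
#include <vector>

struct PageTable { std::vector<std::optional<int>> frames; };
struct Descriptor { PageTable* pt; };

enum class Result { Ok, SegmentFault, PageFault };

// The procedure, with 1024-word pages (10-bit offset within page).
Result lookup(const std::vector<Descriptor>& segtab,
              int seg, int page, int offset, long& phys) {
    const Descriptor& d = segtab[seg];        // 1. find the segment descriptor
    if (d.pt == nullptr)
        return Result::SegmentFault;          // 2. page table not in memory
    const auto& frame = d.pt->frames[page];   // 3. examine the page table
    if (!frame)
        return Result::PageFault;             //    page fault
    phys = static_cast<long>(*frame) * 1024 + offset;  // 4. page origin + offset
    return Result::Ok;                        // 5. ready for the read/store
}
```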
112MULTICS Address Translation Scheme
113segmentation and paging - locating addresses
114Segmentation with Paging MULTICS
- Simplified version of the MULTICS TLB
- Existence of 2 page sizes makes actual TLB more
complicated
115Multics - Additional checks during Segment link
(call)
- Since segments are mapped to files, ACLs (access-control lists) are checked on first access (open)
- Protection rings are checked
- Parameters may be passed via special gates
- A most advanced architecture!
116Paged segmentation on the INTEL 80386
- 16K segments, each of up to 1G 32-bit words
- 2 types of segment descriptor tables
- Local Descriptor Table (LDT), one per process
- Global Descriptor Table (GDT) system segments etc.
- Access by loading a 16-bit selector into one of the 6 segment registers (CS, DS, SS, ...), which hold the selectors at run time (selector 0 means not-in-use)
- The selector points to a segment descriptor (8 bytes)
- Selector format 13-bit index, 1 bit choosing GDT (0) / LDT (1), 2-bit privilege level (0-3)
11780386 - segment descriptors
11880386 - Forming the linear address
- Segment descriptor is in an internal (microcode) register
- If the selector is zero (TRAP) or the segment is paged out (TRAP)
- Offset is checked against the limit field of the descriptor
- Base field of the descriptor is added to the offset (4K page size)
11980386 - paged segmentation (cont'd.)
- Combine descriptor and offset into a linear address
- If paging is disabled, this is pure segmentation (286 compatibility) the linear address is the physical address
- Paging is 2-level
- page directory (1K entries) + page tables (1K entries each)
- pages are 4K bytes each (12-bit offset)
- The page directory is pointed to by a special register
- PTEs have a 20-bit page frame and 12 bits of modified, accessed, protection, etc.
- Small segments need just a few page tables
12080386 - 2-level paging
121Segmentation with Paging Pentium (4)
- Mapping of a linear address onto a physical
address
122Intel 80386 address translation
123The end
124Dynamic Loading
- A routine is not loaded until it is called
- Better memory-space utilization an unused routine is never loaded
- Useful when large amounts of code are needed to handle infrequently occurring cases
- No special support from the operating system is required implemented through program design
125Dynamic Linking
- Linking is postponed until execution time
- A small piece of code, the stub, is used to locate the appropriate memory-resident library routine
- The stub replaces itself with the address of the routine, and executes the routine
- The operating system is needed to check whether the routine is in the process's memory address space
- Dynamic linking is particularly useful for libraries
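The self-replacing stub can be illustrated with a toy sketch: calls go through a function pointer that initially targets a resolver "stub"; on the first call the stub locates the real routine, overwrites the pointer, and executes the routine, so later calls go direct. (Real dynamic linkers do this through linker-generated tables; all names here are made up.)

```cpp
#include <cassert>

static int real_routine(int x) { return x * 2; }  // the "library routine"

static int stub(int x);                 // forward declaration
static int (*routine)(int) = stub;      // call slot, initially the stub

static int stub(int x) {
    routine = real_routine;             // stub replaces itself with the address
    return routine(x);                  // ...and executes the routine
}
```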
126Memory Protection
- Hardware
- history the IBM 360 had a 4-bit protection code in the PSW and memory in 2K partitions the process code in the PSW must match the memory partition's code
- Two registers - base & limit
- base is added by hardware without changing instructions dynamic relocation
- every request is checked against limit runtime bound checking
- reminder the IBM PC has segment registers (but no limit)
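Base & limit protection amounts to one comparison and one addition per reference, which is why hardware can do it on every access. A minimal sketch:

```cpp
#include <cassert>
#include <cstdint>

// Dynamic relocation with base & limit registers: every address the program
// issues is bound-checked against limit, then base is added by "hardware".
struct MMURegs { uint32_t base, limit; };

bool relocate(const MMURegs& r, uint32_t virt, uint32_t& phys) {
    if (virt >= r.limit) return false;  // runtime bound check -> protection fault
    phys = r.base + virt;               // relocation without changing instructions
    return true;
}
```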
127Modeling Multiprogramming
- CPU utilization as a function of the degree of multiprogramming (number of processes in memory)
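The model is the formula from the start of the lecture: with n processes each waiting for I/O a fraction p of the time, utilization = 1 - p^n. A one-liner reproduces the earlier numbers (p = 0.8: 4 processes give about 59%, 9 give about 87%):

```cpp
#include <cassert>
#include <cmath>

// CPU utilization = 1 - p^n, where p is the fraction of time a process
// spends waiting for I/O and n is the degree of multiprogramming.
double utilization(double p, int n) { return 1.0 - std::pow(p, n); }
```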
128No page tables - MIPS R2000
- a 64-entry associative memory for virtual pages
- if the page is not found there, TRAP to the operating system
- software uses some hardware registers to find the needed virtual page
- a second trap may happen on a page fault...
129Inverted page tables
- for very large (virtual) memories, whose page tables would be huge, one can have an inverted page table sorted by (physical) page frame
- IBM RT, HP Spectrum (thinking of 64-bit memories)
- to avoid a linear search on every virtual address a process references, use a hash table (one or a few memory references)
- only one page table the physical one for all processes currently in memory
- in addition to the hash table, associative memory registers store recently used page table entries
- practically the only way to deal with a 64-bit address space with 4K pages, two-level page tables can result in 2^42 entries
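The hash-table lookup can be sketched as follows: one entry per physical frame for all processes, keyed on (pid, virtual page), so a miss means a page fault. A simplified illustration (the hash function is an arbitrary choice, not what any real machine uses):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Key into the inverted page table: which process, which virtual page.
struct Key {
    uint64_t pid, vpage;
    bool operator==(const Key& o) const { return pid == o.pid && vpage == o.vpage; }
};
struct KeyHash {
    size_t operator()(const Key& k) const {
        return std::hash<uint64_t>()(k.pid * 1000003 + k.vpage);
    }
};

// (pid, virtual page) -> physical frame; nullopt means page fault.
std::optional<uint64_t> ipt_lookup(
        const std::unordered_map<Key, uint64_t, KeyHash>& ipt,
        uint64_t pid, uint64_t vpage) {
    auto it = ipt.find({pid, vpage});
    if (it == ipt.end()) return std::nullopt;
    return it->second;
}
```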
130Inverted Page Table Architecture
131Problem - instruction backup
- page-faulting instructions trap to the OS
- the OS must restart the instruction
- the page fault may originate at the op-code or at any of the operands the PC value is useless
- the location of the instruction itself is lost
- worse still, undoing of autoincrement or autodecrement was it already performed ??
- Hardware solutions
- a register to store the PC value of the instruction and a register to store changes to other registers (increment/decrement)
- micro-code dumps all information on the stack
- restart the complete instruction and redo increments etc.
- do nothing - RISC ......
132Assignment 3 Virtual Memory
- In your third assignment you will implement a virtual memory simulator
- The VM's goal is to give the user the ability to write programs without concern for the physical memory size of her computer
- The simulator will enable simulation of paging hardware and page-replacement software, and testing of various page-replacement strategies
133The main questions
- Which page replacement algorithm to use?
- how to maintain the page tables?
- Before we can answer these questions we must
review our hardware.
134The main components
- Swapper - a very simple swapper device, simulating a paging disk. It reads/writes pages from/to a specific page address
- Fast memory - the physical memory plus some bookkeeping info. It has the ability to read/write a byte or a page from/to a specific address. For the same price, it also includes a table with the following info on each page ID, dirty bit, reference bit
- MMU - the hardware translator from logical to physical addresses. It has a limited amount of space to store information. When a page is not in physical memory, the MMU traps to the page replacement manager
135And two more
- Page Replacement Manager - acts as the OS at the time of a trap from the MMU. When called to duty, it chooses a page from physical memory and replaces it with the requested page
- VM - the object that the user interfaces with. All other components are transparent to the user. It provides read/write from/to any address in the virtual address space, and can be asked for some statistical data (e.g. hit ratio)
136Back to our questions
- Which page replacement algorithm to use?
- Answer you will have to design an LRU approximation algorithm with the hardware given in the fast memory
- For comparison, also a FIFO algorithm
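One possible LRU approximation using only the per-page reference bit that the fast memory provides is the classic clock (second-chance) algorithm. This sketch is an illustration of that family of algorithms, not the required assignment solution: frames whose reference bit is set get a second chance (bit cleared, hand advances); the first frame found with a clear bit is the victim.

```cpp
#include <cassert>
#include <vector>

// ref[i] is frame i's reference bit; hand is the clock hand, kept by the caller.
// Returns the index of the victim frame and advances the hand past it.
int clock_evict(std::vector<bool>& ref, int& hand) {
    for (;;) {
        if (!ref[hand]) {                        // reference bit clear -> victim
            int victim = hand;
            hand = (hand + 1) % (int)ref.size();
            return victim;
        }
        ref[hand] = false;                       // give a second chance
        hand = (hand + 1) % (int)ref.size();
    }
}
```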
137- How to maintain the page table?
- Answer use a 2-level page table. The first level is stored in the MMU cache memory. The 2nd-level tables are each page-sized and are located in physical memory. Important 2nd-level tables may be swapped in and out of memory
138A Typical configuration.
(figure: physical memory divided into kernel space, holding the second-level page tables, and user space, holding user pages a swap device holding the pages and tables currently out of memory and the first-level table in the MMU)
First Level Table in MMU (no / adr / V-I):
1  2  V
2  -  I
3  1  V
4  -  I
139What happens if
- The user wishes to write to user page no. 6
- The user wishes to write to user page no. 5, while the next candidate to be swapped out is user page no. 6
- The user wishes to write to user page no. 3, while the next candidate to be swapped out is user page no. 7
140The scenario
- 1. The user wishes to read/write a character from/to address v_adr in the virtual memory, belonging to virtual page number pg
- 2. The virtual memory queries the MMU for the physical address of v_adr
- 3. The MMU first checks (in the first-level table) whether the second-level page (the one containing the entry for pg) is in physical memory. If it is, go to 6
- 4. Notify the Page Replacement Manager that a page fault occurred provide the required information
- 5. The Page Replacement Manager chooses a page p from the second-level pages section in physical memory and replaces the requested page with p. Then it updates both entries in the first-level table. Go to 3
141- 6. Look for the physical address of pg in the appropriate second-level page table entry. If it is in physical memory, return the correct physical address of v_adr and go to 9
- 7. The MMU notifies the Page Replacement Manager that a page fault occurred
- 8. The Page Replacement Manager chooses page sp from the user pages section of the physical memory and replaces the requested page with sp. Then it updates both entries in the appropriate second-level pages (but the second-level page containing the entry of sp might not be in physical memory in that case we have another page fault that has to be taken care of). Go to 6
- 9. The VM receives from the MMU the physical address of v_adr and reads/writes from/to that physical address
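The control flow of this scenario can be condensed into a sketch: the first-level table (in the "MMU") points at second-level tables; a miss at either level counts as a page fault. The replacement steps here are trivial stand-ins with no real victim choice, since the point is the two-level fault handling, not the policy.

```cpp
#include <cassert>
#include <optional>
#include <vector>

struct SecondLevel { std::vector<std::optional<int>> frame; };  // vpage -> frame

struct Sim {
    std::vector<SecondLevel*> first;   // nullptr = 2nd-level table not in memory
    std::vector<SecondLevel> backing;  // "swap device" copy of the 2nd-level tables
    int faults = 0;

    int translate(int idx, int vpage) {
        if (first[idx] == nullptr) {       // steps 3-5: the table itself faults
            ++faults;
            first[idx] = &backing[idx];    // stand-in for swapping the table in
        }
        auto& e = first[idx]->frame[vpage];
        if (!e) {                          // steps 6-8: the user page faults
            ++faults;
            e = vpage + 100;               // stand-in frame choice
        }
        return *e;                         // step 9: physical frame
    }
};
```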
142For evaluating your assignment
- virtual void pf_history() = 0 for each page fault, displays on screen a record serial number, type (kernel/user), page in, page out
- virtual double hit_ratio() = 0
- virtual void showMemoryTable() = 0
- virtual void showPhysicalAddress(int adr) = 0
- virtual void showFirstLevelPageTable() = 0
- virtual void showSecondLevelPageTable(int i) = 0
- Important these methods are for evaluation only and will not change the simulator's configuration
143Segmentation - Dynamic Linking
144Fundamental Concepts (1)
- Virtual address space layout for 3 user processes
- White areas are private per process
- Shaded areas are shared among all processes
145Fundamental Concepts (2)
- Mapped regions with their shadow pages on disk
- The lib.dll file is mapped into two address spaces simultaneously