Title: Memory Management
- Basic memory management
- Swapping
- Virtual memory
- Page replacement algorithms
- Modeling page replacement algorithms
- Design issues for paging systems
- Implementation issues
- Segmentation
Memory Management
- Ideally, programmers want memory that is large, fast, and nonvolatile.
- Memory hierarchy:
  - a small amount of fast, expensive memory: cache
  - some medium-speed, medium-price main memory
  - gigabytes of slow, cheap disk storage (40 GB to 160 GB)
- The memory manager handles the memory hierarchy.
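Basic Memory Management
Memory management systems fall into two classes: (1) those that move processes back and forth between main memory and disk during execution (swapping and paging), and (2) those that do not.
Monoprogramming without Swapping or Paging
[Figure: three simple ways of organizing memory with an operating system and one user process: (a) OS in RAM at the bottom of memory, user program above it; (b) OS in ROM at the top of memory, user program below it; (c) device drivers in ROM at the top, the rest of the OS in RAM at the bottom, user program in between]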
- Model (a) was used on mainframes and minicomputers, and is rarely used any more.
- Model (b) is used on some palmtop computers and embedded systems.
- Model (c) was used by the early personal computers. The portion of the system in ROM is called the BIOS (Basic Input Output System).
- Except on simple embedded systems, monoprogramming is hardly used anymore.
Multiprogramming with Fixed Partitions
- (a) Separate input queues for each partition
- (b) A single input queue
- Disadvantage of multiple input queues: the queue for a large partition may be empty while the queue for a small partition is full.
- Since the partitions are fixed, any space in a partition not used by a job is lost.
- Single input queue: whenever a partition becomes free, the job closest to the front of the queue that fits in it could be loaded into the empty partition and run.
- A different strategy: since it is undesirable to waste a large partition on a small job, search the whole input queue whenever a partition becomes free and pick the largest job that fits the partition.
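A minimal sketch of the two single-queue policies above, assuming each waiting job is a hypothetical `(name, size)` pair and the queue is kept in arrival order. The function names are illustrative only.

```python
from typing import List, Optional, Tuple

Job = Tuple[str, int]  # (job name, memory size needed)


def closest_to_front(queue: List[Job], partition_size: int) -> Optional[Job]:
    """Pick the job nearest the front of the queue that fits the free partition."""
    for job in queue:
        if job[1] <= partition_size:
            return job
    return None


def largest_that_fits(queue: List[Job], partition_size: int) -> Optional[Job]:
    """Search the whole queue and pick the largest job that fits the free partition."""
    fitting = [job for job in queue if job[1] <= partition_size]
    return max(fitting, key=lambda job: job[1]) if fitting else None


queue = [("A", 20), ("B", 90), ("C", 60), ("D", 150)]
print(closest_to_front(queue, 100))   # ('A', 20)
print(largest_that_fits(queue, 100))  # ('B', 90)
```

With a free 100-unit partition, the first policy loads the small job A, while the second picks B and so avoids wasting the large partition on a small job.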
Modeling Multiprogramming
- CPU utilization = 1 - p^n
- p: fraction of time a process spends waiting for I/O
- n: number of processes in memory
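A worked use of the formula, under the model's assumption that processes wait for I/O independently of each other: how many processes must be in memory to keep the CPU at least 90% busy if each one waits for I/O 80% of the time (p = 0.8)?

```latex
1 - p^n \ge 0.9
\;\iff\; p^n \le 0.1
\;\iff\; n \ge \frac{\ln 0.1}{\ln p}
\quad (\text{dividing by } \ln p < 0 \text{ flips the inequality})

p = 0.8:\quad n \ge \frac{\ln 0.1}{\ln 0.8} \approx \frac{-2.303}{-0.223} \approx 10.3
\;\Rightarrow\; n = 11 \quad (0.8^{11} \approx 0.086,\ \text{utilization} \approx 91\%)
```

With n = 10 the utilization is only about 89% (0.8^10 ≈ 0.107), so 11 is the smallest degree of multiprogramming that meets the target.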
Analysis of Multiprogramming System Performance
Example: the arrival times and CPU time requirements of 4 jobs are shown in (a). How long will it take for all jobs to complete? Assume every job spends 80% of its time waiting for I/O.
- (a) Arrival and work requirements of 4 jobs
- (b) CPU utilization for 1 to 4 jobs with 80% I/O wait
- (c) Sequence of events as jobs arrive and finish; the numbers show the amount of CPU time the jobs get in each interval
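A small sketch that tabulates the model for the example's 80% I/O wait, i.e. the quantities shown in (b). It assumes, as the analysis does, that the jobs in memory share the CPU equally.

```python
def cpu_utilization(p: float, n: int) -> float:
    """Probabilistic model: the CPU idles only when all n processes wait for I/O at once."""
    return 1 - p ** n


IO_WAIT = 0.8  # each job waits for I/O 80% of the time, as in the example

for n in range(1, 5):
    busy = cpu_utilization(IO_WAIT, n)
    # If the n jobs share the CPU equally, each one gets busy / n of it.
    print(f"{n} job(s): CPU busy {busy:.2f}, CPU per job {busy / n:.2f}")
```

This prints utilizations of 0.20, 0.36, 0.49 and 0.59, i.e. 0.20, 0.18, 0.16 and 0.15 of the CPU per job; dividing each job's remaining CPU requirement by its share gives the interval lengths in (c).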
Swapping
- Two general approaches to memory management:
  - Swapping: the method of copying a process's memory contents to secondary storage, removing the process from memory, and allocating the freed memory to a new process, which runs for a while and is then put back on disk.
  - Virtual memory: the capability of operating systems that enables programs to address more memory locations than are actually provided in main memory. Virtual memory systems help remove much of the burden of memory management from programmers, freeing them to concentrate on application development (see Sec. 4.3).
Memory Allocation
[Figure: memory allocation changing over time as processes are swapped in and out]
- In a swapping system:
  - The number of processes in memory varies dynamically.
  - The locations of processes in memory vary dynamically.
  - The sizes of the partitions vary dynamically.
- Memory compaction: when swapping creates multiple holes in memory, it is possible to combine them all into one big hole by moving all the processes downward as far as possible.
  - Usually not done, because it requires a lot of CPU time.
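A sketch of compaction on a hypothetical memory map given as `(owner, start, length)` entries, where an owner of `None` marks a hole. Each process is slid down to the lowest free address; the amount of memory copied is what makes compaction expensive.

```python
from typing import List, Optional, Tuple

# (process name or None for a hole, start address, length), in allocation units
Segment = Tuple[Optional[str], int, int]


def compact(memory: List[Segment], total_size: int) -> List[Segment]:
    """Move every process downward as far as possible and merge all holes into one."""
    result: List[Segment] = []
    next_free = 0
    for owner, _start, length in sorted(memory, key=lambda seg: seg[1]):
        if owner is None:
            continue  # holes vanish; their space reappears as one hole at the top
        # A real OS would copy `length` units of memory here, which costs CPU time.
        result.append((owner, next_free, length))
        next_free += length
    if next_free < total_size:
        result.append((None, next_free, total_size - next_free))
    return result


memory = [("A", 0, 10), (None, 10, 5), ("B", 15, 20), (None, 35, 15), ("C", 50, 10)]
print(compact(memory, total_size=60))
# [('A', 0, 10), ('B', 10, 20), ('C', 30, 10), (None, 40, 20)]
```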
- How much memory should be allocated for a process when it is created or swapped in?
- If processes are created with a fixed size that never changes, allocation is simple: the OS allocates exactly what is needed, no more and no less.
- If processes' data segments can grow, a problem occurs whenever a process tries to grow.
Allocating Space for Growth
- (a) Allocating space for a growing data segment
  - If the hole between processes A and B runs out, A or B will have to be moved to a hole with enough space, swapped out of memory until a large enough hole can be created, or killed.
- (b) Allocating space for a growing stack and a growing data segment
  - If the hole between the stack segment and the data segment runs out, the process will have to be moved to a hole with enough space, swapped out of memory until a large enough hole can be created, or killed.
- Two ways to keep track of memory usage:
- bit maps
- lists
Memory Management with Bitmaps
- Memory is divided into allocation units; the size of a unit may be as small as a few words or as large as several kilobytes.
- Part of memory with 5 processes and 3 holes:
  - tick marks show allocation units
  - shaded regions (0 in the bitmap) are free
- Trade-off:
  - The smaller the allocation unit, the larger the bitmap.
  - If the allocation unit is chosen large, the bitmap will be smaller, but memory may be wasted in the last unit of the process if the process size is not an exact multiple of the allocation unit.
- Main problem:
  - When it has been decided to bring a k-unit process into memory, the memory manager must search the bitmap to find a run of k consecutive 0 bits. Searching a bitmap for a run of a given length is a slow operation.
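A sketch of that search, assuming the bitmap is a plain Python list with one entry per allocation unit (1 = in use, 0 = free) and k ≥ 1; a real allocator would pack the bits into machine words, but the scan is still linear.

```python
from typing import List, Optional


def find_run(bitmap: List[int], k: int) -> Optional[int]:
    """Return the start of the first run of k consecutive 0 (free) units, or None.

    The scan may have to look at every bit, which is why this search is slow."""
    run_start, run_len = 0, 0
    for i, bit in enumerate(bitmap):
        if bit == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == k:
                return run_start
        else:
            run_len = 0
    return None


def allocate(bitmap: List[int], k: int) -> Optional[int]:
    """Mark a run of k free units as used (1) and return its start, or None."""
    start = find_run(bitmap, k)
    if start is not None:
        for i in range(start, start + k):
            bitmap[i] = 1
    return start


bitmap = [1, 1, 0, 0, 1, 0, 0, 0, 1]
print(allocate(bitmap, 3))  # 5
print(bitmap)               # [1, 1, 0, 0, 1, 1, 1, 1, 1]
```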
Memory Management with Linked Lists
- Linked list of allocated and free memory segments (P = process, H = hole)
- The segment list is kept sorted by address. Sorting this way has the advantage that when a process terminates or is swapped out, updating the list is straightforward.
- Four neighbor combinations for a terminating process:
  - (a) Updating the list requires replacing a P with an H.
  - (b) Two entries are coalesced into one, and the list becomes one entry shorter.
  - (c) The same as (b).
  - (d) Three entries are merged and two items are removed from the list.
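A sketch of the list update, assuming the segment list is a Python list of `[kind, start, length]` entries kept sorted by address. Depending on the neighbors, the entry is simply relabeled, merged with one hole, or merged with holes on both sides.

```python
from typing import List

# Each entry is [kind, start, length]: kind "P" = process, "H" = hole.
# The list is kept sorted by start address.


def release(segments: List[list], i: int) -> None:
    """Turn process entry i into a hole and coalesce it with neighboring holes."""
    segments[i][0] = "H"  # neighbors are both processes: just replace P with H
    # A hole right after it: merge, and the list gets one entry shorter.
    if i + 1 < len(segments) and segments[i + 1][0] == "H":
        segments[i][2] += segments[i + 1][2]
        del segments[i + 1]
    # A hole right before it: merge again (with both merges, three entries become one).
    if i > 0 and segments[i - 1][0] == "H":
        segments[i - 1][2] += segments[i][2]
        del segments[i]


segs = [["P", 0, 5], ["H", 5, 3], ["P", 8, 4], ["H", 12, 6], ["P", 18, 2]]
release(segs, 2)
print(segs)  # [['P', 0, 5], ['H', 5, 13], ['P', 18, 2]]
```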
Algorithms to Allocate Memory for a Newly Created Process
Assume that the memory manager knows how much memory to allocate.
- First fit: the memory manager scans along the list of segments until it finds a hole that is big enough. The hole is then broken into two pieces, one for the process and one for the unused memory.
  - It is a fast algorithm because it searches as little as possible.
- Next fit: works the same way as first fit, except that it keeps track of where it is whenever it finds a suitable hole. The next time it is called to find a hole, it starts searching the list from the place where it left off last time.
  - Simulations (Bays, 1977) show that it gives slightly worse performance than first fit.
- Best fit: searches the entire list and takes the smallest hole that is adequate.
  - It is slower than first fit.
- Worst fit: to get around the problem of breaking up nearly exact matches into a process and a tiny hole, it always takes the largest available hole, so that the hole broken off will be big enough to be useful.
  - Simulation has shown that worst fit is not a very good idea either.
- Quick fit: maintains separate lists for some of the more common sizes requested.
  - e.g. a table with n entries, in which the first entry is a pointer to the head of a list of 4-KB holes, the second entry is a pointer to a list of 8-KB holes, the third entry a pointer to a list of 12-KB holes, and so on.
  - Finding a hole of the required size is fast.
  - It has the same disadvantage as all schemes that sort by hole size: when a process terminates or is swapped out, finding its neighbors to see if a merge is possible is expensive.
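A sketch of the placement decisions of first, next, best, and worst fit, assuming the free list is reduced to a hypothetical list of hole sizes in address order. Each function only chooses a hole; splitting it (e.g. `holes[i] -= size`) is left to the caller. Quick fit is omitted.

```python
from typing import List, Optional


def first_fit(holes: List[int], size: int) -> Optional[int]:
    for i, hole in enumerate(holes):
        if hole >= size:
            return i  # stop at the first hole that is big enough
    return None


def best_fit(holes: List[int], size: int) -> Optional[int]:
    fits = [i for i, hole in enumerate(holes) if hole >= size]
    return min(fits, key=lambda i: holes[i]) if fits else None  # smallest adequate hole


def worst_fit(holes: List[int], size: int) -> Optional[int]:
    fits = [i for i, hole in enumerate(holes) if hole >= size]
    return max(fits, key=lambda i: holes[i]) if fits else None  # largest hole


class NextFit:
    """First fit that resumes searching where the previous search stopped."""

    def __init__(self) -> None:
        self.pos = 0

    def find(self, holes: List[int], size: int) -> Optional[int]:
        for step in range(len(holes)):
            i = (self.pos + step) % len(holes)
            if holes[i] >= size:
                self.pos = i
                return i
        return None


holes = [5, 14, 9, 20]
print(first_fit(holes, 8), best_fit(holes, 8), worst_fit(holes, 8))  # 1 2 3
```

The difference between first fit and next fit shows up on the second request: after `NextFit().find(holes, 8)` returns 1 and the caller shrinks that hole to 6, a request for 3 units is satisfied from hole 1 by next fit but from hole 0 by first fit.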
Virtual Memory
- Virtual memory: the capability of operating systems that enables programs to address more memory locations than are actually provided in main memory. Virtual memory systems help remove much of the burden of memory management from programmers, freeing them to concentrate on application development (devised by Fotheringham, 1961).
- Basic idea: the combined size of the program, data, and stack may exceed the amount of physical memory available for it. The OS keeps those parts of the program currently in use in main memory, and the rest on disk.
  - e.g. a 16-MB program can run on a 4-MB machine by carefully choosing which 4 MB to keep in memory at each instant, with pieces of the program being swapped between disk and memory as needed.
Paging
- Paging: a virtual memory organization technique that divides an address space into fixed-size blocks of contiguous addresses. When applied to a process's virtual address space, the blocks are called pages, which store process data and instructions. When applied to main memory, the blocks are called page frames.
- Virtual address: a program-generated address (using indexing, base registers, segment registers, and other ways).
- Virtual address space: formed by all virtual addresses.
- e.g. Pentium Pro / Pentium II: 36-bit (physical) addresses, 2^36 bytes = 64 GB
- Memory management unit (MMU): a chip or collection of chips that maps virtual addresses onto physical memory addresses.
- Example of how the mapping works:
  - Virtual addresses: 16 bits (0 to 64 KB)
  - Physical memory: 32 KB
  - A user program can be up to 64 KB, but it cannot be loaded into memory entirely and run.
- The virtual address space is divided into units called pages.
- The corresponding units in physical memory are called page frames.
- Pages and page frames are always the same size: 4 KB here (512 B to 64 KB in real systems), giving 16 virtual pages and 8 page frames.
- e.g. MOV REG, 0 is transformed by the MMU into MOV REG, 8192.
- e.g. MOV REG, 8192 is transformed into MOV REG, 24576.
- In the actual hardware, a Present/absent bit keeps track of which pages are physically present in memory.
- Page fault: a fault that occurs when a process attempts to access a nonresident page, in which case the OS can load it from disk.
  - e.g. MOV REG, 32780 (byte 12 within virtual page 8, since 32780 = 8 × 4096 + 12)
  - The MMU notices that the page is unmapped and causes the CPU to trap to the OS.
  - The OS picks a little-used page frame and writes its contents back to disk.
  - It then fetches the page just referenced into the page frame just freed.
  - Finally, it changes the map and restarts the trapped instruction.
Page Tables
- Page table: a table that stores entries that map page numbers to page frames. A page table contains an entry for each of a process's virtual pages.
- e.g. with 16-bit addresses, the high-order 4 bits are the virtual page number and the low-order 12 bits are the offset. Virtual address 8196 (page 2, offset 4) is transformed by the MMU into 24580: page 2 maps to page frame 6, and 6 × 4096 + 4 = 24580.
- Internal operation of the MMU with 16 4-KB pages
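A sketch of what the MMU does with a 16-bit address: split off the 4-bit page number, look it up, and glue the frame number onto the unchanged 12-bit offset. Only the mappings stated on these slides are filled in (page 0 → frame 2, page 2 → frame 6); every other page, including page 8, is treated as absent here, which is a simplification and not the full table from the figure. `PageFault` is a hypothetical name.

```python
PAGE_SIZE = 4096   # 4 KB pages
OFFSET_BITS = 12   # low-order 12 bits are the offset within the page


class PageFault(Exception):
    """Raised when the Present/absent bit of the referenced page is 0."""


# Partial page table: virtual page -> page frame. A missing key means "absent".
page_table = {0: 2, 2: 6}


def translate(vaddr: int) -> int:
    vpn = vaddr >> OFFSET_BITS          # high-order 4 bits: virtual page number
    offset = vaddr & (PAGE_SIZE - 1)    # low-order 12 bits: offset, copied unchanged
    if vpn not in page_table:
        raise PageFault(f"virtual page {vpn} is not in memory")
    return (page_table[vpn] << OFFSET_BITS) | offset


print(translate(0), translate(8192), translate(8196))  # 8192 24576 24580
```

Calling `translate(32780)` raises `PageFault` for virtual page 8, which is the point at which the OS would pick a frame, load the page, update `page_table`, and restart the instruction.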
- The purpose of the page table is to map virtual pages onto page frames.
- Two major issues must be faced:
  - (1) The page table can be extremely large. e.g. a computer uses 32-bit virtual addresses and a 4-KB page size: number of pages = 2^32 / 2^12 = 2^20 (about 1 million). Remember that each process needs its own page table, because it has its own virtual address space.
  - (2) The mapping must be fast. The virtual-to-physical mapping must be done on every memory reference. A typical instruction has an instruction word, and often a memory operand as well. Consequently, it is necessary to make 1, 2, or sometimes more page table references per instruction.
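The size problem in numbers, assuming 32-bit (4-byte) page table entries, a common entry size:

```latex
\frac{2^{32}\ \text{bytes}}{2^{12}\ \text{bytes/page}} = 2^{20}\ \text{pages}
\qquad
2^{20}\ \text{entries} \times 4\ \text{bytes/entry} = 2^{22}\ \text{bytes} = 4\ \text{MB per process}
```

Because every process has its own table, this cost is multiplied by the number of processes.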
- Hardware solutions:
  - Simplest design: a single page table consisting of an array of fast hardware registers, with one entry for each virtual page, indexed by virtual page number.
    - Advantage: straightforward, and requires no memory references during mapping.
    - Disadvantage: expensive if the page table is large.
  - Page table entirely in main memory, with one hardware register that points to the start of the page table.
    - Advantage: allows the memory map to be changed at a context switch by reloading one register.
    - Disadvantage: requires one or more memory references to read page table entries during the execution of each instruction.
  - Variations of the two approaches are also used.
Multilevel Page Tables
- Goal: to get around the problem of having to store huge page tables in memory all the time.
- [Figure: a top-level page table whose entries point to second-level page tables]
- A 32-bit virtual address is split into PT1 (10 bits), PT2 (10 bits), and Offset (12 bits); page size 4 KB; number of pages 2^20.
- The secret of the multilevel page table method is to avoid keeping all the page tables in memory all the time.
  - e.g. a process needs 12 MB: the bottom 4 MB for text, the next 4 MB for data, and the top 4 MB for stack. Only 4 page tables are actually needed: the top-level table and the second-level tables for 0 to 4 MB, 4 MB to 8 MB, and the top 4 MB.
  - e.g. virtual address 0x00402004 (4 MB + 8 KB + 4) gives PT1 = 1, PT2 = 2, Offset = 4.
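A sketch of the 10/10/12 split and the two-step table walk. The tables model the 12 MB process above: only top-level entries 0 (text), 1 (data), and 1023 (stack) point to second-level tables. The frame numbers in them are made up for illustration.

```python
def split_two_level(vaddr: int) -> tuple:
    """Split a 32-bit virtual address into (PT1, PT2, offset) = 10 + 10 + 12 bits."""
    pt1 = (vaddr >> 22) & 0x3FF   # index into the top-level page table
    pt2 = (vaddr >> 12) & 0x3FF   # index into the chosen second-level page table
    offset = vaddr & 0xFFF        # byte within the 4 KB page
    return pt1, pt2, offset


# Hypothetical tables: absent top-level entries need no second-level table at all.
top_level = {0: {0: 100}, 1: {2: 37}, 1023: {1023: 5}}


def walk(vaddr: int) -> int:
    pt1, pt2, offset = split_two_level(vaddr)
    second_level = top_level.get(pt1)
    if second_level is None or pt2 not in second_level:
        raise LookupError(f"page fault at {vaddr:#010x}")
    return (second_level[pt2] << 12) | offset


print(split_two_level(0x00402004))  # (1, 2, 4)
print(walk(0x00402004))             # 151556, i.e. frame 37 * 4096 + 4
```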
Structure of a Page Table Entry
- The exact layout of an entry is highly machine dependent, but the kind of information present is roughly the same from machine to machine.
- The size varies from computer to computer, but 32 bits is a common size.
- Page frame number: the goal of the page mapping is to locate this value.
- Present/absent bit: if this bit is 1, the entry is valid and can be used. If it is 0, the virtual page to which the entry belongs is not currently in memory.
- Modified and Referenced bits: keep track of page usage. When a page is written to, the hardware automatically sets the Modified bit. When the OS reclaims a page frame, a page that has been modified must first be written back to disk. The Modified bit is sometimes called the dirty bit. The Referenced bit is set whenever a page is referenced (read or written).
- Caching disabled bit: allows caching to be disabled for the page.
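Because the layout is machine dependent, this sketch borrows one concrete layout, the 32-bit x86 (Pentium) page table entry: bit 0 present, bit 4 cache disable, bit 5 accessed (referenced), bit 6 dirty (modified), bits 12 to 31 page frame number. Other x86 bits (read/write, user/supervisor, and so on) are ignored here.

```python
from dataclasses import dataclass


@dataclass
class PageTableEntry:
    frame: int
    present: bool
    cache_disabled: bool
    referenced: bool
    modified: bool


def decode_pte(entry: int) -> PageTableEntry:
    """Decode a 32-bit entry laid out like a 32-bit x86 page table entry."""
    return PageTableEntry(
        frame=entry >> 12,                      # bits 12-31: page frame number
        present=bool(entry & (1 << 0)),         # bit 0: present/absent
        cache_disabled=bool(entry & (1 << 4)),  # bit 4: caching disabled
        referenced=bool(entry & (1 << 5)),      # bit 5: accessed (referenced)
        modified=bool(entry & (1 << 6)),        # bit 6: dirty (modified)
    )


entry = (0x12345 << 12) | (1 << 6) | (1 << 5) | 1
pte = decode_pte(entry)
print(hex(pte.frame), pte.present, pte.referenced, pte.modified, pte.cache_disabled)
# 0x12345 True True True False
```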
TLBs: Translation Lookaside Buffers
- All paging schemes keep the page tables in memory → performance problems!
- Most programs tend to make a large number of references to a small number of pages, and not the other way around.
- Solution: equip computers with a small hardware device for mapping virtual addresses to physical addresses without going through the page table.
- This device is called an associative memory (AM) or translation lookaside buffer (TLB). It is usually inside the MMU and consists of a small number of entries (normally 32).
A TLB to Speed Up Paging
- When a virtual address is presented to the MMU for translation, the hardware first checks whether its virtual page number is present in the TLB by comparing it to all the entries simultaneously. If a valid match is found and the access does not violate the protection bits, the page frame is taken directly from the TLB, without going to the page table.
- Hit ratio: the fraction of memory references that can be satisfied from the TLB. The higher the hit ratio, the better the performance.
- When the virtual page number is not in the TLB, the MMU detects the miss and does an ordinary page table lookup. It then evicts one of the TLB entries and replaces it with the page table entry just looked up.
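A software model of the lookup path, assuming a dictionary stands in for the parallel hardware comparison and that the entry to evict on a miss is the oldest-loaded one (FIFO; real TLBs use various policies). Protection bits are not modeled, and a page missing from `page_table` simply raises `KeyError`, standing in for a page fault.

```python
from collections import OrderedDict
from typing import Dict


class TLB:
    """A tiny model of a TLB in front of a page table (virtual page -> frame)."""

    def __init__(self, page_table: Dict[int, int], capacity: int = 32) -> None:
        self.page_table = page_table
        self.capacity = capacity
        self.entries: Dict[int, int] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn: int) -> int:
        if vpn in self.entries:  # hardware compares all entries in parallel
            self.hits += 1
            return self.entries[vpn]
        self.misses += 1
        frame = self.page_table[vpn]  # miss: ordinary page table lookup
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the oldest-loaded entry
        self.entries[vpn] = frame
        return frame

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


page_table = {vpn: vpn + 100 for vpn in range(10)}
tlb = TLB(page_table, capacity=2)
for vpn in [1, 1, 2, 1, 3, 1]:
    tlb.lookup(vpn)
print(tlb.hits, tlb.misses)  # 2 4
```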
Software TLB Management
- Hardware TLB management:
  - The MMU hardware knows where the page tables are. TLB management and TLB miss handling are done entirely by the MMU hardware.
- Software TLB management:
  - Many modern RISC computers do nearly all of this page management in software, e.g. SPARC, MIPS, Alpha, and HP PA.
  - On these machines, TLB entries are explicitly loaded by the OS. When a TLB miss occurs, the MMU just generates a TLB fault and tosses the problem to the OS. The OS must find the page, remove an entry from the TLB, enter the new one, and restart the instruction that faulted. All of this must be done in a handful of instructions, because TLB misses occur much more frequently than page faults.
  - If the TLB is reasonably large, to reduce the miss rate, software management of the TLB turns out to be acceptably efficient (Uhlig, 1994).
  - Main gain: a simpler MMU, which frees up area on the CPU chip for caches and other features.
Inverted Page Tables
- With a 32-bit virtual address space and a 4-KB page size, each process needs 2^20 entries in its page table (PT); at 4 bytes per entry that is 4 MB per process. The PT is large but manageable (with multilevel paging schemes).
- What about RISC chips with a 64-bit virtual address space?
  - The 64-bit virtual address space is vastly larger than physical memory.
  - A 64-bit address space is 2^64 bytes (about 1.8 × 10^19 bytes, roughly 18 million terabytes).
  - With 4-KB pages, the PT would need 2^52 (about 4.5 quadrillion) entries → this requires rethinking.
- Solution: the virtual address space is immense, but the number of physical page frames is still manageable → inverted page table. In this design, there is one entry per page frame in real memory, rather than one entry per page of virtual address space.
  - e.g. with 64-bit virtual addresses, a 4-KB page, and 256 MB of RAM, an inverted page table requires only 65,536 entries. Each entry keeps track of which (process, virtual page) is located in the page frame.
- All virtual pages currently in memory that have the same hash value are chained together.
- Comparison of a traditional page table with an inverted page table
- IBM and HP workstations use inverted page tables. They will become more common as 64-bit machines become widespread.
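A sketch of an inverted page table with hash chaining, assuming Python's built-in `hash` on the `(pid, virtual page)` pair as the hash function and a hypothetical bucket count. There is one slot per physical page frame; a real design would consult the TLB first and only follow the hash chain on a TLB miss. Unmapping is omitted.

```python
from typing import Dict, List, Optional, Tuple

PAGE_SIZE = 4 * 1024
RAM_SIZE = 256 * 1024 * 1024
NUM_FRAMES = RAM_SIZE // PAGE_SIZE  # one entry per page frame


class InvertedPageTable:
    """One entry per physical page frame, plus a hash table to find entries quickly."""

    def __init__(self, num_frames: int, num_buckets: int = 1024) -> None:
        self.frames: List[Optional[Tuple[int, int]]] = [None] * num_frames  # frame -> (pid, vpn)
        self.buckets: Dict[int, List[int]] = {}  # hash value -> chain of frame numbers
        self.num_buckets = num_buckets

    def _bucket(self, pid: int, vpn: int) -> int:
        return hash((pid, vpn)) % self.num_buckets

    def map(self, pid: int, vpn: int, frame: int) -> None:
        self.frames[frame] = (pid, vpn)
        self.buckets.setdefault(self._bucket(pid, vpn), []).append(frame)

    def lookup(self, pid: int, vpn: int) -> Optional[int]:
        """Follow the hash chain; None means the page is not in memory (page fault)."""
        for frame in self.buckets.get(self._bucket(pid, vpn), []):
            if self.frames[frame] == (pid, vpn):
                return frame
        return None


print(NUM_FRAMES)  # 65536
ipt = InvertedPageTable(num_frames=16)
ipt.map(pid=1, vpn=0x7FFFFFFFF, frame=3)
print(ipt.lookup(1, 0x7FFFFFFFF), ipt.lookup(2, 0x7FFFFFFFF))  # 3 None
```

The table size depends only on the amount of RAM, not on the 64-bit virtual address space; the price is that translation needs a search, which the hash chains keep short.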
Page Replacement Algorithms
- Page fault → the OS has to select a page for replacement:
  - modified page → write it back to disk
  - unmodified page → just overwrite it with the new page
- How to decide which page should be replaced?
  - at random
  - many algorithms take into account:
    - usage
    - age
    - ...
Optimal Page Replacement Algorithm
- What is the optimal page replacement algorithm?
  - An unrealizable page replacement strategy that replaces the page that will not be used until furthest in the future.
- Easy to describe, but impossible to implement, because the OS cannot look into the future.
- Useful as a yardstick for evaluating realizable page replacement algorithms.
- Best (optimal) page replacement algorithm:
  - when a page fault occurs, some set of pages is in memory
  - label each page with the number of instructions that will be executed before that page is next used
  - replace the page with the highest label
- It is of no use in practice.
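Because the algorithm needs the future reference string, it can only be run in simulation, after the fact, for instance to measure how far a realizable algorithm is from the optimum. A sketch, assuming the program's behavior is recorded as a list of page numbers (the reference string below is arbitrary) and using "distance to next use" in place of the instruction count:

```python
from typing import List


def opt_faults(refs: List[int], num_frames: int) -> int:
    """Count page faults for the optimal algorithm on a known reference string."""
    frames: List[int] = []
    faults = 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)
            continue
        future = refs[i + 1:]
        # Distance to each resident page's next use; never used again = farthest.
        next_use = {p: future.index(p) if p in future else len(future) for p in frames}
        victim = max(frames, key=lambda p: next_use[p])
        frames[frames.index(victim)] = page
    return faults


refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(opt_faults(refs, 3))  # 9
```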
NRU (Not Recently Used) Page Replacement Algorithm
- What is the NRU page replacement algorithm?
  - A page replacement strategy that uses the referenced and modified bits to choose a page to replace.
- Status bits associated with each page:
  - R: page referenced (read or written)
  - M: page modified (written); also called the dirty bit (a dirty page)
- The R bit is cleared periodically (e.g. on each clock interrupt), which is how a modified page can end up not referenced.
- Four classes:
  - class 0: not referenced, not modified
  - class 1: not referenced, modified
  - class 2: referenced, not modified
  - class 3: referenced, modified
- NRU removes a page at random from the lowest-numbered nonempty class.
- Low overhead.
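A sketch of the NRU choice, assuming the OS can read each resident page's R and M bits into a dictionary. The class number is simply 2R + M, and a seeded `random.Random` makes the random pick reproducible.

```python
import random
from typing import Dict, List, Tuple


def nru_victim(pages: Dict[int, Tuple[int, int]], rng: random.Random) -> int:
    """pages maps page number -> (R, M); pick a random page from the lowest nonempty class."""
    classes: List[List[int]] = [[] for _ in range(4)]
    for page, (r, m) in pages.items():
        classes[2 * r + m].append(page)  # class 0..3 = 2R + M
    for members in classes:
        if members:
            return rng.choice(members)
    raise ValueError("no pages in memory")


pages = {0: (1, 1), 1: (0, 1), 2: (1, 0), 3: (0, 1)}
print(nru_victim(pages, random.Random(42)))  # 1 or 3 (both are in class 1)
```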
FIFO Page Replacement Algorithm
- What is the FIFO page replacement algorithm?
  - A page replacement strategy that replaces the page that has been in memory longest.
- The OS maintains a list of all pages currently in memory.
- Pages are stored in the list by age.
- On a page fault, FIFO replaces the oldest page.
- Incurs low overhead, but does not predict future page usage accurately.
- FIFO is rarely used in its pure form.
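A sketch that counts FIFO page faults on a reference string, assuming the list of pages is a `deque` with the oldest page on the left and a set is kept alongside it for fast membership tests. The reference string is arbitrary.

```python
from collections import deque
from typing import Deque, List, Set


def fifo_faults(refs: List[int], num_frames: int) -> int:
    queue: Deque[int] = deque()  # pages in memory, oldest on the left
    resident: Set[int] = set()
    faults = 0
    for page in refs:
        if page in resident:
            continue  # a hit does not change the page's position in the queue
        faults += 1
        if len(queue) == num_frames:
            resident.discard(queue.popleft())  # evict the page in memory longest
        queue.append(page)
        resident.add(page)
    return faults


refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(fifo_faults(refs, 3))  # 15
```

The comment on hits is the whole weakness of FIFO: a heavily used page is evicted just as readily as one that was touched once.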
Second Chance Page Replacement Algorithm
- What is the second chance page replacement algorithm?
  - A variation of FIFO page replacement that uses the referenced bit and the FIFO queue to determine which page to replace. If the oldest page's referenced bit is off, the algorithm replaces that page. Otherwise, it turns off the referenced bit on the oldest page, moves the page to the tail of the FIFO queue, and examines the next page or pages until it locates a page with its referenced bit turned off.
- R: referenced bit.
- Second chance is a reasonable algorithm,
- but it is inefficient, because it is constantly moving pages around on its list.
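A sketch of the victim selection, assuming the FIFO list is a `deque` of page names (oldest first) and the R bits live in a dictionary that the hardware would normally update.

```python
from collections import deque
from typing import Deque, Dict


def second_chance_victim(queue: Deque[str], r_bits: Dict[str, int]) -> str:
    """queue holds resident pages oldest first; remove and return the page to evict."""
    while True:
        page = queue.popleft()
        if r_bits[page]:
            r_bits[page] = 0    # give it a second chance ...
            queue.append(page)  # ... by treating it as if it had just been loaded
        else:
            return page         # oldest page with R == 0


queue = deque(["A", "B", "C", "D"])
r_bits = {"A": 1, "B": 0, "C": 1, "D": 0}
print(second_chance_victim(queue, r_bits), list(queue))  # B ['C', 'D', 'A']
```

If every page has R = 1, the loop clears all of them and ends up evicting the original oldest page, so second chance degenerates into pure FIFO. The `popleft`/`append` pair is the list shuffling that makes the algorithm inefficient.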
The Clock Page Replacement Algorithm
- When a page fault occurs, the page the hand (arrow) is pointing to is inspected. The action taken depends on its R bit:
  - R = 0: evict the page
  - R = 1: clear R and advance the hand
- What is clock page replacement? It is a variation of the second chance page replacement strategy that arranges the pages in a circular list instead of a linear list.
  - A pointer (the hand) points to the oldest page.
  - R bit = 0: page not referenced in the last round → replace it
  - R bit = 1: page referenced in the last round:
    - set the R bit to 0
    - advance until the first page with R = 0 is found
  - In both cases, advance the pointer to the next entry.
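A sketch of the clock, assuming frames are slots in a fixed-size Python list and the hand is an index into it. Two choices here are assumptions of the sketch: empty frames are filled before anything is evicted, and a newly loaded page starts with R = 1 because the faulting reference itself touches it.

```python
from typing import List, Optional


class Clock:
    """Page frames in a circular list with a hand pointing at the oldest page."""

    def __init__(self, num_frames: int) -> None:
        self.pages: List[Optional[int]] = [None] * num_frames
        self.r_bits = [0] * num_frames
        self.hand = 0

    def access(self, page: int) -> bool:
        """Reference a page; return True if the reference caused a page fault."""
        if page in self.pages:
            self.r_bits[self.pages.index(page)] = 1  # the hardware would set R
            return False
        # Page fault: R == 1 means clear R and advance; stop at R == 0 or an empty frame.
        while self.pages[self.hand] is not None and self.r_bits[self.hand] == 1:
            self.r_bits[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.pages)
        self.pages[self.hand] = page   # evict the page under the hand, load the new one
        self.r_bits[self.hand] = 1     # the faulting reference itself sets R
        self.hand = (self.hand + 1) % len(self.pages)
        return True


clock = Clock(3)
faults = [clock.access(p) for p in [1, 2, 3, 4, 2, 5]]
print(faults, clock.pages)  # [True, True, True, True, False, True] [4, 2, 5]
```

Unlike the second chance list, nothing is moved: only the hand advances, which is why the clock is the cheaper way to implement the same policy.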
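Least Recently Used (LRU) Page Replacement Algorithm
- What is the LRU page replacement algorithm?
  - A page replacement strategy that replaces the page that has not been referenced for the longest time. LRU generally predicts future page usage well but incurs significant overhead.
- Linked list of all pages in memory, ordered by most recent use: expensive, because the list must be updated on every memory reference.
- Special hardware, version 1: a counter that is incremented after each instruction. After each memory reference, the current counter value is stored in the page table entry of the referenced page, so each entry must have a field large enough to contain the counter. On a page fault, the page with the lowest stored value is the least recently used.
- Special hardware, version 2: a matrix of n × n bits for n page frames, initially all 0. Whenever page frame k is referenced, the hardware sets all bits of row k to 1 and then all bits of column k to 0. At any instant, the row whose binary value is lowest is the least recently used.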
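A software model of the n × n matrix hardware, assuming page frames are numbered 0 to n - 1; the reference string is arbitrary. Each row is read as an n-bit binary number, most significant bit first.

```python
from typing import List


class MatrixLRU:
    """n x n bit matrix for n page frames, mimicking the LRU hardware."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.bits: List[List[int]] = [[0] * n for _ in range(n)]

    def reference(self, k: int) -> None:
        self.bits[k] = [1] * self.n  # set every bit of row k to 1 ...
        for row in self.bits:
            row[k] = 0               # ... then clear every bit of column k

    def row_value(self, i: int) -> int:
        return int("".join(str(b) for b in self.bits[i]), 2)  # row i as a binary number

    def least_recently_used(self) -> int:
        return min(range(self.n), key=self.row_value)


lru = MatrixLRU(4)
for frame in [0, 1, 2, 3, 2, 1, 0, 3, 2, 3]:
    lru.reference(frame)
print(lru.least_recently_used())  # 1
```

After the last reference the frames were last used in the order 1, 0, 2, 3, so frame 1 is the least recently used, and its row has the lowest value.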
Simulating LRU in Software
- The previous LRU algorithms are realizable in principle if machines have this hardware. They are of no use to an OS designer who is building a system for a machine that does not have this hardware.
- Solution: the NFU (Not Frequently Used) algorithm. It requires a software counter associated with each page, initially zero. At each clock interrupt, the OS scans all pages in memory. For each page, the R bit (0 or 1) is added to the counter. On a page fault, the page with the lowest counter is chosen for replacement.
- Main problem of the NFU algorithm: it never forgets anything.
- Aging: modifies the NFU algorithm as follows, which lets it simulate LRU quite well:
  - (1) the counters are each shifted right 1 bit before the R bit is added in;
  - (2) the R bit is added to the leftmost bit, rather than the rightmost.
- The aging algorithm simulates LRU in software.
  - Note: 6 pages over 5 clock ticks, (a) to (e).
- In practice, 8 bits is enough if a clock tick is around 20 msec.
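A sketch of aging with 8-bit counters, assuming the R bits sampled at each tick are handed to the OS as a dictionary. The R-bit pattern below is illustrative.

```python
from typing import Dict

COUNTER_BITS = 8  # 8-bit counters


def aging_tick(counters: Dict[int, int], r_bits: Dict[int, int]) -> None:
    """One clock tick: shift each counter right, put R in the leftmost bit, clear R."""
    for page in counters:
        counters[page] = (counters[page] >> 1) | (r_bits[page] << (COUNTER_BITS - 1))
        r_bits[page] = 0


def aging_victim(counters: Dict[int, int]) -> int:
    """On a page fault, evict the page with the lowest counter."""
    return min(counters, key=counters.get)


# Illustrative R bits for pages 0-5 over five clock ticks.
ticks = [
    [1, 0, 1, 0, 1, 1],
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 0],
]
counters = {page: 0 for page in range(6)}
for r_values in ticks:
    aging_tick(counters, dict(enumerate(r_values)))
print(format(counters[0], "08b"), aging_victim(counters))  # 01111000 3
```

Page 3 was referenced only once, three ticks ago, so it has the smallest counter (00100000). Because old references drift right and eventually fall off the end, aging forgets, which is exactly what NFU cannot do.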
The Working Set Page Replacement Algorithm
[Figure: the working set size w(k, t) as a function of k]
- Working set: the set of pages that a process is currently using.
  - k: the k most recent memory references
  - t: time
  - w(k, t): the size of the working set at time t
- Age of a page = current virtual time - time of last use; a page whose age exceeds a predetermined span τ is no longer in the working set.
- The working set page replacement algorithm:
  - The hardware is assumed to set the R and M bits.
  - A periodic clock interrupt is assumed to cause software to run that clears the R bit on every clock tick.
  - On every page fault, the page table is scanned to look for a suitable page to evict.
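A sketch of the page-fault scan, assuming each page table entry is modeled as a small record holding the R bit and the virtual time of last use, and that `tau` is the predetermined span. If every page has R = 1, the sketch simply takes the first page; a random choice is the usual description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageEntry:
    number: int
    r: int          # referenced bit, cleared by software on every clock tick
    last_use: int   # virtual time of last use


def ws_victim(pages: List[PageEntry], current_vt: int, tau: int) -> int:
    """Scan the page table on a page fault and return the page to evict."""
    victim: Optional[PageEntry] = None
    oldest: Optional[PageEntry] = None
    for p in pages:
        if p.r:
            p.last_use = current_vt  # referenced during this tick: in the working set
            continue
        age = current_vt - p.last_use
        if age > tau and victim is None:
            victim = p  # outside the working set; keep scanning to update the others
        elif oldest is None or p.last_use < oldest.last_use:
            oldest = p  # remember the R == 0 page with the greatest age
    if victim is not None:
        return victim.number
    if oldest is not None:
        return oldest.number
    return pages[0].number  # every page was referenced: any choice will do


pages = [PageEntry(0, 1, 90), PageEntry(1, 0, 40), PageEntry(2, 0, 95)]
print(ws_victim(pages, current_vt=100, tau=50))  # 1
```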
The WSClock Page Replacement Algorithm
- An improved algorithm that is based on the clock algorithm but also uses the working set information: age of a page = current virtual time - time of last use, compared against the predetermined span τ.
Review of Page Replacement Algorithms
Segmentation
- Problem: a one-dimensional address space with growing tables
  - e.g. a compiler has the following tables:
    - source text
    - symbol table
    - constant table
    - parse tree
    - stack
  - Problem: one table may bump into another.
- Solution: provide the machine with many completely independent address spaces, called segments.
- Segment: a variable-size set of contiguous addresses in a process's virtual address space that is managed as one unit. A segment is typically the size of an entire set of similar items, such as a set of instructions in a procedure or the contents of an array, which enables the system to protect such items with fine granularity using appropriate access rights.
- Two or more separate, independent virtual address spaces that can grow and shrink
- Different kinds of protection are possible.
- Two-part address (n, k):
  - n: segment number (which segment)
  - k: address within the segment
- Segmentation also facilitates sharing procedures or data between several processes, e.g. a shared library.
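A sketch of how a two-part address (n, k) could be mapped under pure segmentation, assuming a hypothetical per-process segment table that stores a base address and a length for each segment. Protection bits and sharing are not modeled.

```python
from typing import Dict, NamedTuple


class SegmentEntry(NamedTuple):
    base: int    # physical address where the segment starts
    length: int  # segment size in bytes


class SegmentationFault(Exception):
    pass


def translate(segment_table: Dict[int, SegmentEntry], n: int, k: int) -> int:
    """Map the two-part address (n, k) to a physical address."""
    if n not in segment_table:
        raise SegmentationFault(f"segment {n} does not exist")
    seg = segment_table[n]
    if not 0 <= k < seg.length:
        raise SegmentationFault(f"address {k} is outside segment {n}")
    return seg.base + k


# Hypothetical segment table, e.g. 0 = code, 1 = symbol table, 2 = stack.
table = {0: SegmentEntry(1400, 1000), 1: SegmentEntry(6300, 400), 2: SegmentEntry(4300, 1100)}
print(translate(table, 1, 53))  # 6353
```

Each segment is checked against its own length, which is why one table can grow without running into another: growing a segment only changes its entry, not its neighbors' addresses.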
- Segmented memory allows each table to grow or shrink independently of the other tables.
- Comparison of paging and segmentation
Implementation of Pure Segmentation
The implementation of segmentation differs from paging in an essential way: pages are fixed size and segments are not.
- (a)-(d) Development of checkerboarding
- (e) Removal of the checkerboarding by compaction
- External fragmentation (or checkerboarding): after the system has been running for a while, memory will be divided into a number of chunks, some containing segments and some containing holes. This phenomenon is called external fragmentation.
Segmentation with Paging: MULTICS
- MULTICS (MULTiplexed Information and Computing Service): one of the first operating systems to implement virtual memory. It was developed by MIT, GE, and Bell Laboratories as the successor to MIT's CTSS (Compatible Time Sharing System).
- Ken Thompson, one of the computer scientists at Bell Labs who had worked on the MULTICS project, wrote a stripped-down, one-user version of MULTICS. This work later developed into UNIX.
Segmentation with Paging
- Segments may be large, even larger than main memory → page them.
- MULTICS:
  - ran on Honeywell 6000 machines and their descendants
  - per-program virtual memory of up to 2^18 = 256K segments, each up to 64K 36-bit words long
  - treats each segment as a virtual memory and pages it
  - segment table → page tables
  - 16-word high-speed TLB
- The descriptor segment points to the page tables.
- A 34-bit MULTICS virtual address
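A sketch of the 34-bit address split, assuming the usual 1K-word MULTICS page: an 18-bit segment number, a 6-bit page number within the segment, and a 10-bit word offset within the page. These widths agree with the 256K segments of up to 64K words each stated above.

```python
from typing import Tuple

SEGMENT_BITS, PAGE_BITS, OFFSET_BITS = 18, 6, 10  # 18 + 6 + 10 = 34 bits


def split_multics(addr: int) -> Tuple[int, int, int]:
    """Split a 34-bit MULTICS virtual address into (segment, page, word offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    page = (addr >> OFFSET_BITS) & ((1 << PAGE_BITS) - 1)
    segment = addr >> (OFFSET_BITS + PAGE_BITS)
    return segment, page, offset


def join_multics(segment: int, page: int, offset: int) -> int:
    return (segment << (PAGE_BITS + OFFSET_BITS)) | (page << OFFSET_BITS) | offset


print(2 ** SEGMENT_BITS, 2 ** (PAGE_BITS + OFFSET_BITS))  # 262144 65536
print(split_multics(join_multics(5, 3, 100)))            # (5, 3, 100)
```

The segment number selects an entry in the descriptor segment, which locates that segment's page table; the page number indexes that table, and the offset is added to the frame address.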
Memory Reference
- Conversion of a 2-part MULTICS address into a main memory address
- Problem: if this conversion were carried out on every memory reference, programs would not run very fast.
- Solution: a 16-word TLB
- Simplified version of the MULTICS TLB (the existence of two page sizes makes the actual TLB more complicated)
Segmentation with Paging: The Intel Pentium
- MULTICS:
  - both segmentation and paging
  - 256K independent segments, each up to 64K 36-bit words
- Intel Pentium:
  - both segmentation and paging
  - 16K independent segments, each up to 1 billion 32-bit words
- Each program has its own LDT (Local Descriptor Table). The LDT describes segments local to each program, including its code, data, stack, and so on.
- A single GDT (Global Descriptor Table) is shared by all programs on the computer. The GDT describes system segments, including the OS itself.
- To access a segment, a Pentium program first loads a selector for that segment into one of the machine's 6 segment registers.
  - CS holds the selector for the code segment.
  - DS holds the selector for the data segment.
- A selector specifies an LDT or GDT entry number. These tables are restricted to holding 8K segment descriptors each.
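A sketch of decoding a selector, assuming the standard 16-bit x86 selector format: bits 3 to 15 hold a 13-bit descriptor index (hence the 8K limit), bit 2 chooses the GDT (0) or the LDT (1), and bits 0 to 1 hold a privilege level. Since descriptors are 8 bytes, the descriptor sits at the table's base address plus 8 × index.

```python
from typing import Tuple

DESCRIPTOR_SIZE = 8  # bytes per segment descriptor


def decode_selector(selector: int) -> Tuple[int, str, int]:
    """Split a 16-bit selector into (descriptor index, table, privilege level)."""
    privilege = selector & 0b11                   # bits 0-1
    table = "LDT" if selector & 0b100 else "GDT"  # bit 2
    index = (selector >> 3) & 0x1FFF              # bits 3-15: up to 8K entries
    return index, table, privilege


def descriptor_address(table_base: int, selector: int) -> int:
    index, _table, _privilege = decode_selector(selector)
    return table_base + DESCRIPTOR_SIZE * index


print(decode_selector(0x2B))  # (5, 'GDT', 3)
```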
- At the time a selector is loaded into a segment register, the corresponding descriptor is fetched from the LDT or GDT and stored in microprogram registers, so it can be accessed quickly.
- Pentium code segment descriptor (8 bytes; data segments differ slightly)
- How is a (selector, offset) pair converted to a physical address?
  - (1) Find the descriptor corresponding to the selector. If the segment does not exist, or is currently paged out, a trap occurs.
  - (2) Check whether the offset is beyond the end of the segment, in which case a trap also occurs. If G (granularity) = 0, the 20-bit limit field is the exact segment size, up to 1 MB. If G = 1, the limit field gives the segment size in pages instead of bytes. Since the Pentium page size is fixed at 4 KB, 20 bits are enough for segments up to 2^32 bytes.
  - (3) Assuming that the segment is in memory and the offset is in range, the Pentium then adds the 32-bit base field to the offset to form a linear address. The 32-bit base is broken into 3 pieces spread over the descriptor, for compatibility with the 286, whose base is only 24 bits.
  - (4) If paging is disabled (by a bit in a global control register), the linear address is interpreted as the physical address and sent to memory for the read or write. This is a pure segmentation scheme.
  - (5) If paging is enabled, the linear address is interpreted as a virtual address and mapped onto a physical address using page tables. With a 4-KB page size, a segment might contain 1 million pages.
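A sketch of steps (1) to (3), assuming the descriptor has already been fetched into a small record; `SegmentTrap` is a hypothetical stand-in for the hardware trap. One detail the steps above gloss over: on real x86 hardware the limit field holds the highest valid offset (the segment size minus one), and with G = 1 the low 12 bits of that offset are filled with 1s. The sketch follows that convention.

```python
from typing import NamedTuple


class Descriptor(NamedTuple):
    base: int          # 32-bit base address
    limit: int         # 20-bit limit field
    granularity: int   # G bit: 0 = limit counts bytes, 1 = limit counts 4 KB pages
    present: bool      # segment is in memory


class SegmentTrap(Exception):
    """Stands in for the hardware trap (hypothetical name)."""


def to_linear(d: Descriptor, offset: int) -> int:
    if not d.present:
        raise SegmentTrap("segment not present")
    # The limit is the highest valid offset; with G = 1 it is scaled to 4 KB pages.
    max_offset = (d.limit << 12) | 0xFFF if d.granularity else d.limit
    if offset > max_offset:
        raise SegmentTrap("offset beyond end of segment")
    return (d.base + offset) & 0xFFFFFFFF  # 32-bit linear address


d = Descriptor(base=0x10000, limit=0xFFF, granularity=0, present=True)
print(hex(to_linear(d, 0x10)))  # 0x10010
```

A descriptor with base 0, limit 0xFFFFF, and G = 1 covers the whole 4 GB, so every offset maps to the identical linear address; this is the flat model that lets an OS ignore segmentation and rely on paging alone.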
- Conversion of a (selector, offset) pair to a linear address
Each running program has a page directory consisting of 1K 32-bit entries, located at an address pointed to by a global register. Each entry in this directory points to a page table also containing 1K 32-bit entries.
- Mapping of a linear address onto a physical address
- Page table entries are 32 bits each; 20 of them contain the page frame number, and the remaining bits contain access and dirty bits, set by the hardware for the benefit of the OS, as well as protection and other utility bits.
- A single page table handles 4 MB of memory (1K page frames of 4 KB each).
- To avoid making repeated references to memory, the Pentium (like MULTICS) has a small TLB that directly maps the most recently used Dir-Page combinations onto the physical address of the page frame.
- If some application does not need segmentation but is content with a single, paged, 32-bit address space, that model is possible. All segment registers can be set up with the same selector, whose descriptor has base = 0 and limit set to the maximum. In fact, all current OSs for the Pentium work this way. OS/2 was the only one that used the full power of the Intel MMU architecture.
- Protection on the Pentium
  - The Pentium supports 4 protection levels. A running program is at a certain level, indicated by 2 bits in its PSW (processor status word).
  - Each segment in the system also has a level.