Title: Linux Virtual Memory for Intel Processor
2 Overview
- Overview of virtual memory.
- What support the Intel architecture provides for virtual memory.
- How Linux uses that hardware support to implement virtual memory.
- Process address space.
- Page fault handler.
- Additional improvements in kernel 2.6.
- References.
3 Introduction
- In a virtual memory environment, a large logical address space is simulated with a small amount of physical memory (RAM) and some disk storage (swap space).
- The logical addresses issued by the processor are converted to physical addresses during program execution.
- Implementation requires extensive hardware assistance, a lot of complex OS code, and time.
- Virtual memory can be implemented as
  - Paging: fixed-size memory blocks.
  - Segmentation: variable-size memory blocks.
- Fetch technique: demand paging.
- Replacement technique: Least Recently Used (LRU) algorithm.
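To make demand paging with LRU replacement concrete, the following is a minimal user-space sketch that counts page faults for a reference string. It is not kernel code: the function name `lru_page_faults`, the `MAX_FRAMES` bound and the timestamp-based bookkeeping are assumptions made for illustration only.

```c
#include <stddef.h>

#define MAX_FRAMES 16   /* illustrative upper bound on simulated frames */

/* Count page faults for a reference string under demand paging with
 * LRU replacement. Returns -1 if nframes is outside 1..MAX_FRAMES. */
int lru_page_faults(const int *refs, size_t nrefs, int nframes)
{
    int frame_page[MAX_FRAMES];    /* page currently held by each frame */
    size_t last_use[MAX_FRAMES];   /* time of the most recent access */
    int used = 0, faults = 0;

    if (nframes < 1 || nframes > MAX_FRAMES)
        return -1;

    for (size_t t = 0; t < nrefs; t++) {
        int slot = -1;

        /* Is the referenced page already resident? */
        for (int i = 0; i < used; i++) {
            if (frame_page[i] == refs[t]) {
                slot = i;
                break;
            }
        }

        if (slot < 0) {
            faults++;                  /* not resident: fetch on demand */
            if (used < nframes) {
                slot = used++;         /* a free frame is still available */
            } else {
                slot = 0;              /* evict the least recently used page */
                for (int i = 1; i < used; i++)
                    if (last_use[i] < last_use[slot])
                        slot = i;
            }
            frame_page[slot] = refs[t];
        }
        last_use[slot] = t;            /* record this access */
    }
    return faults;
}
```

Calling it with a reference string and a small frame count shows how the number of faults grows as fewer frames are available.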
4 Why Virtual Memory?
A process may be too big for physical memory, or there may be more active processes than physical memory can hold.
Solution: virtual memory, where a large virtual address space (4 GB) for each process is simulated with a small amount of physical memory (RAM) and some disk storage (swap space).
Figure: RAM shared by the OS (8 MB), Process 1 (50 MB), Process 2 (50 MB) and Process 3 (30 MB).
5 Virtual Memory
Figure: pages of Process 1 and Process 2 placed in RAM page frames next to the OS (8 MB). Sequence shown: Process 1 running; Process 2 scheduled to run; Process 1 sleeps; Process 2 running; Process 2 faults.
The system works because the principle of locality holds. Thrashing: the system swaps pages in and out all the time and no real work is done.
6 IA-32 Virtual Memory
- The IA-32 architecture supports either pure segmentation or segmentation combined with paging.
- Logical address
  - Consists of a segment selector (16 bits) and an offset (32 bits).
- Linear address (LA) or virtual address (VA)
  - The base address of the segment plus the offset. This 32-bit address can address 4 GB of memory.
- Physical address (PA)
  - 32-bit address in RAM.
8 IA-32 Segmentation (1)
- Segment registers (6)
  - Hold segment selectors so they can be retrieved quickly.
  - CS (code segment register) points to a segment containing program instructions. It also includes the Current Privilege Level (CPL) field: 0 means kernel mode and 3 means user mode.
  - DS (data segment register) points to a segment containing static and external data.
  - SS (stack segment register) points to a segment containing the current program stack.
  - ES, FS and GS are general-purpose segment registers and may refer to arbitrary data segments.
9 IA-32 Segmentation (2)
- Segment descriptors (8 bytes)
  - Unique segment identifier.
  - Stored in the Global Descriptor Table (GDT).
  - Contains
    - 32-bit base address of the segment
    - 20-bit limit
    - 4-bit type that denotes the segment type and access rights
    - DPL (Descriptor Privilege Level) field: 0 means use is restricted to kernel mode only, 3 means both modes.
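The descriptor fields listed above can be pulled out of the raw 8-byte value with shifts and masks. The sketch below follows the IA-32 descriptor bit layout and then forms a linear address as base plus offset. The struct and function names (`seg_desc`, `decode_descriptor`, `linear_address`) are hypothetical, and the limit check (which depends on the granularity bit) is deliberately left out.

```c
#include <stdint.h>

/* Decoded view of an 8-byte segment descriptor (illustrative names). */
struct seg_desc {
    uint32_t base;    /* 32-bit base address of the segment */
    uint32_t limit;   /* 20-bit limit */
    unsigned type;    /* 4-bit type field */
    unsigned dpl;     /* descriptor privilege level, 0..3 */
};

/* Extract the fields from the raw descriptor. The base and limit are
 * scattered over the descriptor, so they are reassembled piecewise. */
struct seg_desc decode_descriptor(uint64_t raw)
{
    struct seg_desc d;

    d.base  = (uint32_t)((raw >> 16) & 0xFFFFFFu)          /* base bits 0..23  */
            | ((uint32_t)((raw >> 56) & 0xFFu) << 24);     /* base bits 24..31 */
    d.limit = (uint32_t)(raw & 0xFFFFu)                    /* limit bits 0..15 */
            | ((uint32_t)((raw >> 48) & 0xFu) << 16);      /* limit bits 16..19 */
    d.type  = (unsigned)((raw >> 40) & 0xFu);
    d.dpl   = (unsigned)((raw >> 45) & 0x3u);
    return d;
}

/* Linear address = segment base + offset (no limit check in this sketch). */
uint32_t linear_address(const struct seg_desc *d, uint32_t offset)
{
    return d->base + offset;
}
```

For a flat segment whose base is 0, the linear address equals the offset, which is why a flat segment layout makes segmentation effectively transparent.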
10 IA-32 Protection
- Protection
  - Intel uses 4 privilege levels, 0-3, with 0 being the most privileged.
  - The privilege level of the executing program is determined by the privilege level of the code segment currently executing.
  - CPL (current privilege level): bits 0 and 1 of the CS (code segment) register.
  - The processor changes the CPL when program control is transferred to a code segment with a different privilege level.
  - DPL (descriptor privilege level): bits in the segment descriptor. When the currently executing code segment attempts to access a segment, the DPL is compared to the CPL of CS.
  - Programs executing at a numerically higher (less privileged) level cannot access segments with a numerically lower DPL, while programs executing at level 0 can access all segments.
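The CPL/DPL comparison described above can be sketched in a few lines of C. This is a simplification: it models only the CPL-versus-DPL check for data segments and ignores the selector's requested privilege level (RPL), which the hardware also takes into account. The function names are hypothetical.

```c
#include <stdint.h>

/* The CPL is held in bits 0 and 1 of the selector loaded in CS. */
unsigned selector_cpl(uint16_t cs)
{
    return cs & 0x3u;
}

/* Simplified data-segment check: access is allowed when the CPL is
 * numerically less than or equal to the DPL of the target segment.
 * Assumption: the RPL of the selector is ignored in this sketch. */
int may_access_segment(unsigned cpl, unsigned dpl)
{
    return cpl <= dpl;
}
```

With this rule, code running at CPL 0 passes the check for every DPL, while code at CPL 3 passes only for segments whose DPL is 3.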
11 Segmentation in Linux
- There is no mode bit to disable segmentation.
- Linux prefers paging over segmentation because of simplicity and portability.
- The pages are divided among 4 segments.
- All processes use the same logical addresses and segment descriptors.
- The GDT is implemented in /arch/i386/kernel/head.S
- Each time the CPL in CS changes, DS and SS are changed correspondingly.
- SS refers to the same data segment as DS.
Segments used by Linux  Type                 DPL  Accessed by
Kernel Code             Code, Read, Execute  0    Kernel
Kernel Data             Data, Read, Write    0    Kernel
User Code               Code, Read, Execute  3    Both
User Data               Data, Read, Write    3    Both
12 Protection in Linux
- Segments overlap in the linear address space (/arch/i386/kernel/head.S).
- Thus access is effectively allowed to the entire virtual address space using any of the above segments.
- All processes have two segments
  - 0 - 3 GB user segment
  - 3 GB - 4 GB kernel segment
- The boundary is determined by PAGE_OFFSET (0xC0000000).
- A process in user mode (CPL 3) can only access addresses lower than 3 GB (only segments with DPL 3).
- A process in kernel mode (e.g. after a system call) can access both. When the CPL is 0, it can access segments with DPL 0 and 3.
- Any distinction between code and data is enforced at the page level, not at the segment level, through the R/W and U/S bits of the page.
13 IA-32 Paging
- Paging
  - RAM is partitioned into fixed-size page frames.
  - The linear address space is divided into pages of the same size.
  - The processor uses information contained in page directories and page tables (stored in RAM) to map linear addresses to physical addresses and to generate page fault exceptions.
  - Translation Lookaside Buffers (TLBs) store the most recently accessed page directory and page table entries to reduce access time.
  - Intel supports 4 KB, 2 MB and 4 MB page sizes.
- Paging is controlled by three flags in the processor's control registers, set by the OS during initialization.
  - PG (paging): available in all Intel processors starting from the 80386. Enables paging.
  - PSE (page size extensions): introduced in the Pentium processor. Permits large pages (4 MB, or 2 MB when PAE is set).
  - PAE (physical address extension): introduced in the Pentium Pro processor. Provides a method of extending physical addresses to 36 bits (64 GB). Supports page sizes of 4 KB and 2 MB.
14 Page Table and Directories
- The 32-bit linear address is divided into 3 fields (4 KB page)
  - Page directory: most significant 10 bits (1024 entries)
  - Page table: the intermediate 10 bits (1024 entries)
  - Offset: least significant 12 bits (each page is 4 KB)
- In the case of a 4 MB page, the most significant 10 bits are for the page directory and the remaining 22 bits are the page offset. Page tables are not used. (A 2 MB page under PAE uses a 21-bit offset instead.)
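The 10/10/12 and 10/22 splits described above come down to shifts and masks. The following sketch shows both; the struct `la_fields` and the two function names are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

/* Fields of a 32-bit linear address (illustrative names). */
struct la_fields {
    unsigned dir;      /* index into the page directory */
    unsigned table;    /* index into the page table (unused for 4 MB pages) */
    uint32_t offset;   /* byte offset inside the page */
};

/* 4 KB pages: 10-bit directory index, 10-bit table index, 12-bit offset. */
struct la_fields split_linear_4k(uint32_t la)
{
    struct la_fields f;
    f.dir    = la >> 22;
    f.table  = (la >> 12) & 0x3FFu;
    f.offset = la & 0xFFFu;
    return f;
}

/* 4 MB pages: 10-bit directory index and a 22-bit offset, no page table. */
struct la_fields split_linear_4m(uint32_t la)
{
    struct la_fields f;
    f.dir    = la >> 22;
    f.table  = 0;
    f.offset = la & 0x3FFFFFu;
    return f;
}
```

For example, the linear address 0xC0101234 falls in directory entry 0x300 in both schemes; with 4 KB pages it selects table entry 0x101 and offset 0x234.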
15 Page Directory and Page Table Entries
- When 32-bit addresses and 4 KB pages are used
  - 20-bit base address, bits 12 through 31.
  - Present: when set, the page is in RAM.
  - Read/Write: when set, the page can be read and written; when clear, it is read-only.
  - User/Supervisor: when set, the page is also accessible at user privilege level; otherwise supervisor only.
  - Accessed: set each time the paging unit accesses the entry.
  - PCD (page-level cache disable) and PWT (page-level write-through).
  - Dirty: applies to page table entries only. Set when the page is written.
  - Global: introduced in the Pentium Pro. Applies to page table entries only. When set, it indicates a global page and prevents the entry from being flushed from the TLB when a context switch occurs.
  - Page size: applies to page directory entries only. When 1, the entry refers to a 2 MB/4 MB page frame (the PGD entry points directly to the page). When 0, the page size is 4 KB.
- These flags are checked by hardware to see whether the requested kind of access can be performed.
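The flag checks listed above can be mimicked in software. The sketch below uses the IA-32 bit positions for the flags, but the macro names (`ENTRY_*`) and the function names are hypothetical, not the kernel's own. It looks at a single entry only (the hardware combines the directory and table entries) and assumes supervisor writes also honor the Read/Write flag, as is the case when CR0.WP is set.

```c
#include <stdint.h>

/* Flag bits of a 32-bit page directory / page table entry. */
#define ENTRY_PRESENT   (1u << 0)
#define ENTRY_RW        (1u << 1)
#define ENTRY_USER      (1u << 2)
#define ENTRY_PWT       (1u << 3)
#define ENTRY_PCD       (1u << 4)
#define ENTRY_ACCESSED  (1u << 5)
#define ENTRY_DIRTY     (1u << 6)
#define ENTRY_PS        (1u << 7)   /* page directory entries only */
#define ENTRY_GLOBAL    (1u << 8)

/* The upper 20 bits (bits 12..31) hold the page frame base address. */
uint32_t entry_frame_base(uint32_t entry)
{
    return entry & 0xFFFFF000u;
}

/* Return 1 if the access would be permitted by this entry, 0 if it
 * would raise a page fault. Assumption: CR0.WP is set, so supervisor
 * writes are also blocked by a read-only entry. */
int access_allowed(uint32_t entry, int user_mode, int write)
{
    if (!(entry & ENTRY_PRESENT))
        return 0;                       /* page not in RAM */
    if (user_mode && !(entry & ENTRY_USER))
        return 0;                       /* supervisor-only page */
    if (write && !(entry & ENTRY_RW))
        return 0;                       /* read-only page */
    return 1;
}
```

A read-only user page therefore allows user reads but faults on a user write, which is the mechanism copy-on-write relies on.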
16 Paging in Linux (1)
- Linux uses 3-level paging to adapt to 64-bit architectures.
  - Page global directory (PGD)
  - Page middle directory (PMD)
  - Page table
- The linear address is divided into four parts: three table offsets and a page offset.
- What happens with IA-32, which uses only two-level page tables?
  - Linux folds the PMD into the PGD: the PMD entry points back to the PGD entry.
  - IA-32 has 1024 entries in the PGD, one entry in the PMD and 1024 entries in the page table.
- Each process has its own PGD. During a context switch, the PGD base address of the process to run next is loaded into CR3 and the TLB is flushed.
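The folded middle level can be illustrated with a small user-space simulation of a page table walk. The types below are simplified stand-ins, not the kernel's definitions: here a directory entry holds a plain C pointer and a table entry holds a frame number plus a present flag. The point of the sketch is that the walk is written for three levels, while the PMD step simply hands back the PGD entry.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT    12
#define PGDIR_SHIFT   22
#define PTRS_PER_PGD  1024
#define PTRS_PER_PMD  1      /* middle level is folded away on IA-32 */
#define PTRS_PER_PTE  1024

/* Simplified entries for illustration only. */
typedef struct { uint32_t frame; int present; } pte_t;
typedef struct { pte_t *table; } pgd_t;   /* NULL: no page table allocated */
typedef pgd_t pmd_t;

/* With a folded PMD, the "middle directory" is the PGD entry itself. */
static pmd_t *pmd_offset(pgd_t *pgd, uint32_t la)
{
    (void)la;
    return pgd;
}

/* Walk the tables for linear address la. On success store the physical
 * address in *pa and return 0; return -1 where hardware would fault. */
int translate(pgd_t *pgd_base, uint32_t la, uint32_t *pa)
{
    pgd_t *pgd = &pgd_base[la >> PGDIR_SHIFT];
    pmd_t *pmd = pmd_offset(pgd, la);
    pte_t *pte;

    if (pmd->table == NULL)
        return -1;                               /* no page table */

    pte = &pmd->table[(la >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)];
    if (!pte->present)
        return -1;                               /* page not present */

    *pa = (pte->frame << PAGE_SHIFT) | (la & ((1u << PAGE_SHIFT) - 1));
    return 0;
}
```

Because the three-level structure is kept in the code, the same walk can serve architectures with a real middle directory by changing only the constants and `pmd_offset`.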
17 Paging in Linux (2)
- Linux can use PAE. Large pages are selected through the page size (PS) flag of an individual PGD entry; without PAE, the processor honors that flag only when PSE is enabled.
- Mixing 4 MB and 4 KB page sizes
  - The kernel uses large pages (4 MB) and one-level translation to reduce TLB entries and memory.
  - Applications use 4 KB pages.
PAE  PS of PGD  Page size  Physical address size
0    0          4 KB       32 bit
0    1          4 MB       32 bit
1    0          4 KB       36 bit
1    1          2 MB       36 bit
18 Paging in Linux (3)
- include/asm-i386/page.h
  - #define PAGE_SHIFT 12
  - #define PAGE_SIZE (1UL << PAGE_SHIFT)
  - #define PAGE_MASK (~(PAGE_SIZE-1))
- include/asm-i386/pgtable.h, include/asm-i386/pgtable-2level.h
- Page table lookup code: mm/memory.c
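As a usage sketch for the three page macros from include/asm-i386/page.h, the helpers below show the typical address arithmetic built on them. The macro definitions are repeated so the example compiles on its own; the helper function names are hypothetical and are not taken from the kernel.

```c
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Start address of the page that contains addr. */
unsigned long page_base(unsigned long addr)
{
    return addr & PAGE_MASK;
}

/* Byte offset of addr inside its page. */
unsigned long page_offset_in_page(unsigned long addr)
{
    return addr & ~PAGE_MASK;
}

/* Round addr up to the next page boundary (no change if already aligned). */
unsigned long page_align_up(unsigned long addr)
{
    return (addr + PAGE_SIZE - 1) & PAGE_MASK;
}

/* Page frame number: the address with the offset bits shifted out. */
unsigned long page_frame_number(unsigned long addr)
{
    return addr >> PAGE_SHIFT;
}
```

With a 12-bit shift, the address 0x12345 lies in the page starting at 0x12000, at offset 0x345, and rounds up to 0x13000.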
19 Paging in Linux (4)
- The linear address space is split into two parts.
  - User space (0-3 GB) can be addressed in both modes.
  - Kernel space (3 GB-4 GB) can be accessed only in kernel mode.
  - PAGE_OFFSET is defined as 0xc0000000 (3 GB).
- Kernel paging (4 MB pages)
  - Kernel code and data are stored in a group of reserved page frames.
  - These frames are never dynamically assigned or swapped to disk.
  - The kernel maintains a set of page tables rooted at the master kernel Page Global Directory.
- How does the kernel initialize its own page tables?
  - swapper_pg_dir is initialized during kernel compilation.
  - Phase 1: the kernel can address the first 8 MB of RAM either through linear addresses identical to the physical addresses or through the 8 MB of linear addresses starting at 0xc0000000.
  - Phase 2: only linear addresses starting at 0xc0000000 are mapped to physical addresses starting at 0.
- Where does paging start? /arch/i386/kernel/head.S
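Since the final kernel mapping places linear address 0xc0000000 at physical address 0, converting between the two inside the directly mapped range is a constant offset. The sketch below shows that arithmetic in user-space C. The function names are hypothetical, and the conversion is only meaningful for addresses inside the directly mapped kernel range.

```c
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000u   /* 3 GB boundary between user and kernel */

/* Kernel linear address -> physical address, for the direct mapping. */
uint32_t direct_virt_to_phys(uint32_t la)
{
    return la - PAGE_OFFSET;
}

/* Physical address -> kernel linear address, for the direct mapping. */
uint32_t direct_phys_to_virt(uint32_t pa)
{
    return pa + PAGE_OFFSET;
}

/* Addresses at or above PAGE_OFFSET belong to kernel space. */
int is_kernel_address(uint32_t la)
{
    return la >= PAGE_OFFSET;
}
```

For instance, physical address 0x00100000 (1 MB) appears to the kernel at linear address 0xC0100000.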
20 Physical Memory Management
- Physical memory is divided into three zones: DMA, Normal and HighMem.
- Page frames are assigned from these zones.
- Each physical page is associated with a page descriptor.
- All page descriptors are stored in the mem_map array.
- Requesting page frames: alloc_pages() allocates groups of contiguous page frames using the buddy system.
- If alloc_pages() can't find a free page frame, it calls try_to_free_pages() to reclaim some.
- try_to_free_pages() reclaims pages according to an LRU algorithm.
- Memory for small data structures is allocated by the slab allocator.
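The buddy system mentioned above manages blocks of 2^order contiguous page frames, and its central trick is that the buddy of a block is found by flipping a single bit of its starting frame number. The helpers below sketch that arithmetic only. They are hypothetical illustrations, not the kernel's allocator, which additionally keeps per-order free lists.

```c
/* Starting frame of the buddy of the block that begins at frame `pfn`
 * and spans 2^order frames: flip the bit that distinguishes the pair. */
unsigned long buddy_of(unsigned long pfn, unsigned order)
{
    return pfn ^ (1UL << order);
}

/* Starting frame of the block of 2^(order+1) frames formed when a block
 * and its buddy are both free and get merged. */
unsigned long merged_start(unsigned long pfn, unsigned order)
{
    return pfn & ~(1UL << order);
}

/* Smallest order whose block size (2^order frames) covers npages. */
unsigned order_for_pages(unsigned long npages)
{
    unsigned order = 0;

    while ((1UL << order) < npages)
        order++;
    return order;
}
```

A request for 3 pages is thus served from an order-2 block (4 frames), and the order-2 blocks starting at frames 8 and 12 are buddies that merge into one order-3 block starting at frame 8.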
21 Process Address Space
- The linear address space is split into two parts.
  - User space (0-3 GB) changes with each context switch and is accessed in both modes.
  - Kernel space (3 GB-4 GB) remains constant and is accessed only in kernel mode.
- Memory descriptor: mm_struct.
  - One structure exists for each process and is shared among its threads.
  - Memory descriptor for kernel threads.
Figure: linear address space with user code and data below PAGE_OFFSET (0xC0000000) and kernel code and data above it.
22 Memory Regions
- The full address space is rarely used.
- Each address space consists of several non-overlapping, page-aligned regions that are in use.
- Each region contains pages with the same protection and purpose.
- The list of mapped regions is available through /proc/PID/maps
- Regions are described by vm_area_struct
- If a file is memory mapped, the file pointer is available through vm_file.
- Related functions: do_mmap(), find_vma(), get_unmapped_area()
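To show how a region lookup works in principle, here is a simplified sketch over a sorted, singly linked list of regions. The struct `vma` is a cut-down stand-in for vm_area_struct that keeps only the start, end and next fields, and `find_vma_simple` is a hypothetical name. The real find_vma() also consults a cache and a tree, which this sketch omits.

```c
#include <stddef.h>

/* Cut-down stand-in for a memory region: [vm_start, vm_end). */
struct vma {
    unsigned long vm_start;
    unsigned long vm_end;
    struct vma *vm_next;    /* next region, sorted by address */
};

/* Return the first region whose end lies above addr, or NULL. Note that
 * the returned region may start above addr, so addr is not necessarily
 * inside it. */
struct vma *find_vma_simple(struct vma *head, unsigned long addr)
{
    struct vma *v;

    for (v = head; v != NULL; v = v->vm_next)
        if (addr < v->vm_end)
            return v;
    return NULL;
}

/* An address is mapped only if the region found actually contains it. */
int addr_is_mapped(struct vma *head, unsigned long addr)
{
    struct vma *v = find_vma_simple(head, addr);

    return v != NULL && v->vm_start <= addr;
}
```

The "first region ending above the address" convention is useful to a page fault handler: it can tell whether a faulting address lies inside a region or in the gap just below one.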
23 Process Address Space
Figure: the memory descriptor, with its mmap and mmap_cache fields, pointing to the memory regions that cover the linear address space.
24 Page Faulting
- Demand fetching
  - A page is fetched from swap space only when the hardware raises a page fault exception, which the OS traps and handles by allocating a page.
  - A number of pages after the faulting page are prefetched.
- Two types of page fault
  - Major: the page has to be read from disk, which is expensive.
  - Minor: the page is in the swap cache, or the fault is a protection fault.
- Architecture-specific function do_page_fault()
  - Decides what type of fault occurred and how it can be handled.
  - If it is a valid page fault in a valid memory region, it calls the architecture-independent function handle_mm_fault().
  - handle_mm_fault() allocates the required page table entries and calls handle_pte_fault().
25 do_page_fault() flow diagram
26 handle_mm_fault() Call Graph
- handle_mm_fault: allocates the required page table entries if they don't exist.
- handle_pte_fault: based on the properties of the page table entry, calls the corresponding handler.
  - do_swap_page: the page was swapped out to disk.
  - do_wp_page: Copy on Write (COW) page.
  - do_no_page: first-time allocation.
  - do_anonymous_page: handles anonymous access.
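The dispatch performed by handle_pte_fault can be summarized as a decision over a few properties of the page table entry. The sketch below models only that decision and performs no paging work. The enum, the struct `pte_state` and the function `pick_handler` are hypothetical, and choosing between do_no_page and do_anonymous_page directly from a "region has its own fault routine" flag is a simplifying assumption of this sketch.

```c
/* Which handler the fault would be routed to (names mirror the call graph). */
enum fault_handler {
    DO_NO_PAGE,
    DO_ANONYMOUS_PAGE,
    DO_SWAP_PAGE,
    DO_WP_PAGE,
    NO_HANDLER          /* nothing to fix, e.g. only flags need updating */
};

/* Simplified view of a page table entry. */
struct pte_state {
    int present;    /* page is in RAM */
    int none;       /* entry is completely empty (never allocated) */
    int writable;   /* Read/Write flag is set */
};

/* Decide which handler applies. has_nopage_op says whether the memory
 * region supplies its own routine for first-time faults (assumption). */
enum fault_handler pick_handler(struct pte_state pte, int write_access,
                                int has_nopage_op)
{
    if (!pte.present) {
        if (pte.none)       /* first-time allocation */
            return has_nopage_op ? DO_NO_PAGE : DO_ANONYMOUS_PAGE;
        return DO_SWAP_PAGE;    /* entry refers to a swapped-out page */
    }
    if (write_access && !pte.writable)
        return DO_WP_PAGE;      /* write to a write-protected page: COW */
    return NO_HANDLER;
}
```

Reading the function top to bottom follows the call graph: a missing page is either brand new or swapped out, and a present but write-protected page leads to the copy-on-write path.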
27 Copy on Write (COW)
- During fork, the kernel duplicates the parent's address space for the child. Done naively, this requires
  - Allocating page frames for the page tables of the child process.
  - Allocating page frames for the pages of the child process.
  - Copying the pages of the parent process to the pages of the child process.
- Linux uses an efficient copy-on-write approach
  - The pages and page table entries are shared between the parent and child processes and can't be modified.
  - Whenever either one tries to write, a write fault occurs.
  - The kernel then duplicates the page into a new page frame and marks it as writable.
  - The original page frame remains write protected. When the other process tries to write, the kernel checks whether it is the only owner. If so, the page becomes writable.
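The copy-on-write steps above can be mimicked with a reference count per page frame. The following user-space sketch is an illustration under stated assumptions, not the kernel's implementation: `struct frame`, `struct mapping`, `cow_share` and `cow_write_fault` are hypothetical names, and malloc stands in for page frame allocation.

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* A simulated page frame with a count of the mappings that share it. */
struct frame {
    int refcount;
    unsigned char data[PAGE_SIZE];
};

/* One process's view of a page: which frame, and whether it may write. */
struct mapping {
    struct frame *frame;
    int writable;
};

/* At fork: share the parent's frame with the child, write-protect both. */
void cow_share(struct mapping *parent, struct mapping *child)
{
    parent->writable = 0;
    child->frame = parent->frame;
    child->writable = 0;
    parent->frame->refcount++;
}

/* On a write fault: copy the frame if it is still shared, otherwise just
 * make it writable again. Returns 0 on success, -1 if allocation fails. */
int cow_write_fault(struct mapping *m)
{
    if (m->frame->refcount > 1) {
        struct frame *copy = malloc(sizeof *copy);

        if (copy == NULL)
            return -1;
        memcpy(copy->data, m->frame->data, PAGE_SIZE);
        copy->refcount = 1;
        m->frame->refcount--;      /* the writer leaves the shared frame */
        m->frame = copy;
    }
    m->writable = 1;               /* sole owner: writing is now safe */
    return 0;
}
```

The first writer pays for one page copy; when the remaining process later writes, the count shows it is the only owner, so no copy is made and the page simply becomes writable.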
28 What's Different in 2.6
- The big change is Linux's new support for NUMA servers: high-end systems with multiple processors and separate memory pools directly connected to each processor.
- Support for Intel's PAE (Physical Address Extension) allows access to up to 64 GB of RAM in paged mode. Linux can now run applications that access large blocks of memory.
  - For example, bigger databases are now supported on Linux.
- Reverse mapping
  - Multiple virtual pages (pages shared by different processes) might point to the same physical page.
  - Reverse mapping lets the kernel find every page table entry that maps a given physical page, which is useful when the kernel wants to free that physical page.
29 References
- IA-32 Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide (Document 253668), Chapters 3 and 4.
- Bovet, D., and Cesati, M. Understanding the Linux Kernel. O'Reilly, 2001. (Chapters 2, 7, 8 and 16)
- Virtual memory management for Linux 2.4 kernel: description and code documentation. http://home.earthlink.net/~jknapka/linux-mm/vmoutline.html
- Deitel and Deitel, Operating Systems, Prentice Hall, 2004.
- The Wonderful World of Linux 2.6, by Joseph Pranevich.