Chapter 3 Memory Management - PowerPoint PPT Presentation

Author: zx. Created: 1/5/2006 8:04:39 AM.


1
Chapter 3 Memory Management
Page Management
  • Li Wensheng
  • wenshli_at_bupt.edu.cn

2
Outline
  • Data Structure
  • Page Scanner Operation
  • Page-out Algorithm
  • Hardware Address Translation Layer

3
Pages: The Basic Unit of Solaris Memory
  • Physical memory is divided into pages.
  • A page's identity is its vnode/offset pair.
  • The hardware address translation (HAT) and
    address space layers manage the mapping between
    a physical page and its virtual address space.

4
The Page Structure
5
The Page Hash List
  • global hash list -- an array of pointers to
    linked lists of pages
  • VM system hashes pages with identity onto a
    global hash list so that they can be located by
    vnode/offset.
  • Three page functions search the global page hash
    list
  • page_find()
  • page_lookup()
  • page_lookup_nowait()
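
A minimal sketch of locating a page by its vnode/offset identity through a hash list, as described above. The bucket count, the hash function, and the names Page, PageHash, and page_hash_index are illustrative assumptions, not the Solaris kernel's actual definitions.

```python
from collections import namedtuple

# Hypothetical stand-in for a page: only the identity fields are modeled.
Page = namedtuple("Page", ["vnode", "offset"])

PAGE_HASH_SIZE = 64  # assumed number of hash buckets


def page_hash_index(vnode_id, offset):
    """Map a vnode/offset identity to a bucket (assumed hash function)."""
    return hash((vnode_id, offset)) % PAGE_HASH_SIZE


class PageHash:
    """An array of buckets, each holding a list of pages."""

    def __init__(self):
        self.buckets = [[] for _ in range(PAGE_HASH_SIZE)]

    def insert(self, page):
        self.buckets[page_hash_index(page.vnode, page.offset)].append(page)

    def find(self, vnode_id, offset):
        # Walk only the one bucket that this identity hashes to.
        for page in self.buckets[page_hash_index(vnode_id, offset)]:
            if page.vnode == vnode_id and page.offset == offset:
                return page
        return None
```

The point of the structure is that a lookup touches one short list rather than every page in the system.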

6
Locating Pages by Vnode/Offset Identity
7
MMU-Specific Page Structures
  • need to keep machine-specific data about every
    page, e.g. the HAT information that describes
    how the page is mapped by the MMU.
  • struct machpage
  • The contents of the machine-specific page
    structure are hidden from the generic kernel.
  • only the HAT machine-specific layer can see or
    manipulate its contents

8
Machine-Specific Page Structures: sun4u Example
9
Physical Page Lists
  • a segmented global physical page list, consisting
    of segments of contiguous physical memory.
  • Contiguous physical memory segments are added
    during system boot.
  • Segments can also be added and deleted dynamically
    when physical memory is added and removed while
    the system is running.

10
arrangement of the physical page lists
11
Free List and Cache List
  • hold pages that are not mapped into any address
    space and that have been freed by page_free().
  • free list
  • pages do not have a vnode/offset associated
  • pages are put on the free list when a process exits
  • is generally very small
  • cache list
  • still have a vnode/offset
  • Seg_map free-behind and seg_vn executables and
    libraries (for reuse)

12
The Page-Level Interfaces
Method | Description
page_create() | Creates pages. Page coloring is based on a hash of the vnode offset. page_create() is provided for backward compatibility only. Don't use it if you don't have to. Instead, use the page_create_va() function so that pages are correctly colored.
page_create_va() | Creates pages, taking into account the virtual address they will be mapped to. The address is used to calculate page coloring.
page_exists() | Tests that a page for vnode/offset exists.
page_find() | Searches the hash list for a page with the specified vnode and offset that is known to exist and is already locked.
page_first() | Finds the first page on the global page hash list.
page_free() | Frees a page. Pages with vnode/offset go onto the cache list; other pages go onto the free list.
page_isfree() | Checks whether a page is on the free list.
page_ismod() | Checks whether a page is modified. This function checks only the software bit in the page structure. To sync the MMU bits with the page structure, you may need to call hat_pagesync() before calling page_ismod().
13
The Page-Level Interfaces (Cont.)
Method | Description
page_isref() | Checks whether a page has been referenced. This function checks only the software bit in the page structure. To sync the MMU bits with the page structure, you may need to call hat_pagesync() before calling page_isref().
page_isshared() | Checks whether a page is shared across more than one address space.
page_lookup() | Finds a page representing the specified vnode/offset. If the page is found on a free list, then it will be removed from the free list.
page_lookup_nowait() | Finds a page representing the specified vnode/offset that is not locked or on the free list.
page_needfree() | Informs the VM system that we need some pages freed up. Calls to page_needfree() must be symmetric; that is, they must be followed by another page_needfree() with the same amount of memory multiplied by -1 after the task is complete.
page_next() | Finds the next page on the global page hash list.
14
The Page Throttle
  • implemented in the page_create() and
    page_create_va() functions
  • causes page creates to block when the PG_WAIT
    flag is specified, that is, when available memory
    is less than the system global, throttlefree.
  • throttlefree is set to the same value as minfree.
  • memory allocated through the kernel memory
    allocator specifies PG_WAIT and is subject to the
    page-create throttle.

15
Page Sizes
System Type | Architecture | MMU Page Size Capability | Solaris 2.x Page Size
Early SPARC systems | sun4c | 4K | 4K
microSPARC-I, -II | sun4m | 4K | 4K
SuperSPARC-I, -II | sun4m | 4K, 4M | 4K, 4M
UltraSPARC-I, -II | sun4u | 8K, 64K, 512K, 4M | 8K, 4M
Intel x86 architecture | i86pc | 4K, 4M | 4K, 4M
16
Page Coloring
  • page placement policy affects processor
    performance
  • The optimal placement of pages often depends on
    the memory access patterns of the application.
  • in a random order
  • in some sort of strided order
  • How can page placement affect performance?
  • The UltraSPARC-I and -II implementations
  • The L1 cache is 16 Kbytes
  • The L2 (external) cache can vary between 512
    Kbytes and 8 Mbytes
  • The L2 cache is arranged in lines of 64 bytes,
    and transfers are done to and from physical
    memory in 64-byte units.

17
Page Coloring (Cont.)
  • Assume
  • we have a 32-Kbyte L2 cache
  • page size of 8 Kbytes
  • four page-sized slots on the L2 cache
  • The cache does not necessarily read and write
    8-Kbyte units from memory; it does that in
    64-byte chunks, so the 32-Kbyte cache has 512
    addressable slots.

18
Page Coloring (Cont.)
Offsets 0 and 32768 map to the same cache
line. If we were now to access these two
addresses, a cache ping-pong effect occurs.
We program to virtual memory rather than physical
memory. The OS must provide a sensible mapping
between virtual memory and physical memory.
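
The collision between offsets 0 and 32768 can be checked with a few lines of arithmetic. This is a sketch that assumes a direct-mapped, physically indexed cache with the example's 32-Kbyte size and 64-byte lines; the function names are hypothetical.

```python
CACHE_SIZE = 32 * 1024   # example L2 cache size from the slides
LINE_SIZE = 64           # cache line size from the slides
PAGE_SIZE = 8 * 1024     # example page size from the slides


def cache_line(addr):
    """Index of the cache line an address maps to (direct-mapped assumption)."""
    return (addr % CACHE_SIZE) // LINE_SIZE


def page_color(addr):
    """Which page-sized slot of the cache the address falls in."""
    return (addr % CACHE_SIZE) // PAGE_SIZE


# Addresses one cache size apart land on the same line and evict each other.
same_line = cache_line(0) == cache_line(32768)
```

With these numbers the cache has four page-sized slots, so page_color() returns a value from 0 to 3.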
19
Page Coloring (Cont.)
  • physical pages are assigned to an address space
    in the order they appear in the free list.
  • page coloring algorithm
  • the free list of physical pages is organized into
    specifically colored bins, one color bin for each
    slot in the physical cache.
  • When a page is put on the free list, the
    page_free() algorithms assign it to a color bin.
  • When a page is consumed from the free list
    (page_create_va() function ), the
    virtual-to-physical algorithm takes the page from
    a physical color bin.
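
A minimal sketch of a free list organized into color bins. It assumes the simplest policy, in which the wanted color is derived directly from the virtual page number, and it falls back to the next non-empty bin when the wanted one is empty. The class and method names (ColoredFreeList, free_page, alloc_for_va) are hypothetical and are not the kernel's page_free() or page_create_va().

```python
from collections import deque

PAGE_SIZE = 8 * 1024  # example page size from the slides


class ColoredFreeList:
    def __init__(self, num_colors):
        # One bin per page-sized slot in the physical cache.
        self.bins = [deque() for _ in range(num_colors)]

    def free_page(self, pfn):
        """Put a physical page (by frame number) into its color bin."""
        self.bins[pfn % len(self.bins)].append(pfn)

    def alloc_for_va(self, vaddr):
        """Take a page whose color matches the virtual address, if possible."""
        wanted = (vaddr // PAGE_SIZE) % len(self.bins)
        for i in range(len(self.bins)):
            color_bin = self.bins[(wanted + i) % len(self.bins)]
            if color_bin:
                return color_bin.popleft()
        return None  # free list is empty
```

Consecutive virtual pages then receive physical pages of consecutive colors, which spreads them across the cache.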

20
Page Coloring (Cont.)
  • The kernel supports a default algorithm and two
    optional algorithms.
  • The default algorithm was chosen according to the
    following criteria
  • Fairly consistent, repeatable results
  • Good overall performance for the majority of
    applications
  • Acceptable performance across a wide range of
    applications

21
Solaris Page Coloring Algorithms
No. | Name | Description | Solaris 2.5.1 | Solaris 2.6 | Solaris 7
0 | Hashed VA | The physical page color bin is chosen on a hashed algorithm to ensure even distribution of virtual addresses across the cache. | Default | Default | Default
1 | P.Addr = V.Addr | The physical page color is chosen so that physical addresses map directly to the virtual addresses (as in the example). | Yes | Yes | Yes
2 | Bin Hopping | Physical pages are allocated with a round-robin method. | Yes | Yes | Yes
6 | Kessler's Best Bin | Kessler best bin algorithm. Keeps a history per process of used colors and chooses the least used color; if multiple, uses the largest bin. | E10000 only (default) | E10000 only (default) | Not Available
22
Outline
  • Data Structure
  • Page Scanner Operation
  • Page-out Algorithm
  • Hardware Address Translation Layer

23
Page Scanner
  • Is the memory management daemon that manages
    system wide physical memory
  • When there is a memory shortage, the page scanner
    runs to steal memory from address spaces, by
  • taking pages that haven't been used recently
  • syncing them up with their backing store
  • freeing them
  • If paged-out virtual memory is required again, a
    memory page fault occurs.

24
Page Scanner (Cont.)
  • The balancing of page stealing and page faults
    determines which parts of virtual memory will be
    backed and which will be moved out to swap.
  • global page replacement / local page replacement
  • The subtleties of which pages are stolen govern
    the memory allocation policies and can affect
    different workloads in different ways.
  • Enhancements to minimize page stealing from
    extensively shared libraries and executables
  • Priority paging to prevent application, shared
    library, and executable paging on systems with
    ample memory.

25
Page Scanner Operation
  • tracks page usage by reading a per-page hardware
    bit from the MMU for each page
  • Two bits for each page: reference bit and modify
    bit
  • awakened when the amount of memory on the
    free-page list falls below a system threshold
  • typically 1/64th of total physical memory.
  • scans through pages in physical page order
  • looking for pages that haven't been used recently
    to page out to the swap device and free

26
Two-handed Clock Algorithm
  • front hand clears the referenced and modified
    bits for each page
  • back hand inspects the referenced and modified
    bits some time later
  • Pages that haven't been referenced or modified
    are swapped out and freed
  • scan rate is controlled by the amount of free
    memory on the system
  • The gap between the front and back hand is fixed
    by a boot-time parameter, handspreadpages.
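
A small simulation of one sweep of the two hands. It is a sketch under simplifying assumptions: only the referenced bit is modeled, the sweep is a single pass rather than a circular scan, and the `touches` argument (a hypothetical name) maps a step number to the pages a program references at that step.

```python
def two_handed_sweep(num_pages, handspread, touches):
    """Return the pages the back hand would free in one pass.

    touches: dict mapping step number -> iterable of page indices that are
    referenced by a program just before the hands move at that step.
    """
    referenced = [True] * num_pages
    freed = []
    for step in range(num_pages + handspread):
        for page in touches.get(step, ()):
            referenced[page] = True      # a program uses the page
        front = step
        back = step - handspread
        if front < num_pages:
            referenced[front] = False    # front hand clears the bit
        if 0 <= back < num_pages and not referenced[back]:
            freed.append(back)           # back hand: untouched since clearing
    return freed


# Page 0 is touched after the front hand passes, so it survives.
survivors_example = two_handed_sweep(4, 2, {1: [0]})
```

A larger hand spread gives each page more steps in which to be referenced again before the back hand arrives.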

27
Outline
  • Data Structure
  • Page Scanner Operation
  • Page-out Algorithm
  • Hardware Address Translation Layer

28
Introduction to page-out algorithm
  • Steals pages when memory is lower than lotsfree
  • Scanner runs
  • Starts scanning at slowscan (pages/sec)
  • Four times/second when memory is short
  • Awoken by page allocator if very low
  • Puts memory out to backing store
  • Uses a Least Recently Used process
  • A kernel thread does the scanning

29
Page Scanner Parameters
Parameter | Description | Min | Default
lotsfree | scanner starts stealing anonymous memory pages | 512K | 1/64th of memory
desfree | scanner is started at 100 times/second | minfree | ½ of lotsfree
minfree | scanner starts scanning every time a new page is created | - | ½ of desfree
throttlefree | page_create routine makes the caller wait until free pages are available | - | minfree
fastscan | scan rate (pages per second) when free memory = minfree | slowscan | minimum of 64 MB/s or ½ memory size
slowscan | scan rate (pages per second) when free memory = lotsfree | - | 100
maxpgio | max number of pages per second that the swap device can handle | 60 | 60 or 90 pages per spindle
handspreadpages | number of pages between the front hand (clearing) and back hand (checking) | 1 | fastscan
min_percent_cpu | CPU usage when free memory is at lotsfree | - | 4% (1 clock tick) of a single CPU
30
Scan Rate Parameters (Assuming No Priority
Paging)
Starts scanning at slowscan
Scans faster as the amount of free memory
approaches 0
31
Scan Rate Parameters calculation
  • lotsfree is calculated at startup as 1/64th of
    memory
  • slowscan parameter is 100 by default on Solaris
    systems
  • fastscan is set to total physical memory / 2,
    capped at 64 Mbytes/sec
  • If total physical memory is 1G, then
  • lotsfree = 2048 pages, fastscan = 8192 pages/sec
  • If free memory falls to 12 Mbytes (1536 pages)
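
The 1-Gbyte example can be worked through numerically. This sketch assumes the scan rate is a straight-line interpolation between slowscan (at lotsfree) and fastscan (at zero free memory); the slides only state that the rate rises as free memory approaches zero, so the exact formula here is an assumption, as is the function name.

```python
PAGE_SIZE = 8 * 1024                     # 8-Kbyte pages
total_pages = (1024 ** 3) // PAGE_SIZE   # 1 Gbyte of physical memory
lotsfree = total_pages // 64             # 1/64th of memory, in pages
fastscan = min((64 * 1024 ** 2) // PAGE_SIZE, total_pages // 2)
slowscan = 100


def scan_rate(freemem, lotsfree, slowscan, fastscan):
    """Pages/sec to scan, linearly interpolated (assumed formula)."""
    if freemem >= lotsfree:
        return 0.0                       # no shortage, scanner is idle
    deficit = lotsfree - freemem
    return (deficit * fastscan + freemem * slowscan) / lotsfree


# Free memory at 12 Mbytes (1536 pages):
rate = scan_rate(1536, lotsfree, slowscan, fastscan)
```

Under this assumption the scanner would run at 2123 pages per second when free memory is 1536 pages.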

32
Not Recently Used Time
  • The time between the front hand and back hand
  • short time -> the most active pages remain intact
  • long time -> only the largely unused pages are
    stolen
  • varies from just a few seconds to several
    hours, according to
  • the number of pages between front and back hand
  • the scan rate
  • Example
  • Scan rate: 2000 pages/sec
  • Hand spread: 8192 pages
  • Clear/check time: about 4 seconds
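
The clear/check time in the example is the hand spread divided by the scan rate. A short sketch, with a hypothetical function name:

```python
def clear_check_time(handspread_pages, scan_rate_pages_per_sec):
    """Seconds between the front hand clearing a page and the back hand checking it."""
    return handspread_pages / scan_rate_pages_per_sec


seconds = clear_check_time(8192, 2000)  # roughly 4 seconds
```

At slowscan (100 pages/sec) the same hand spread would give a page far longer to be referenced again.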

33
Shared Library Optimizations
  • prevents scanner from stealing pages from
    extensively shared libraries
  • looks at the share reference count for each page
  • if the page is shared more than a certain amount,
    then it is skipped during the page scan
    operation.
  • threshold parameter po_share
  • ranges from 8 to 134217728; by default, starts at 8
  • A page shared by more than po_share processes
    will be skipped
  • Each time around, it is decremented ?

34
The Priority Paging Algorithm
  • Purpose: overcome adverse behavior that results
    from the memory pressure caused by the file
    system.
  • puts a higher priority on a process's pages
  • its heap, stack, shared libraries, and
    executables.
  • permits scanner to
  • pick file system cache pages only when ample
    memory is available
  • only steal application pages when there is a true
    memory shortage.

35
The Priority Paging Algorithm
  • a new paging parameter, cachefree
  • When the amount of free memory lies between
    cachefree and lotsfree, the page scanner steals
    only file system cache pages
  • scanner wakes up when memory falls below
    cachefree rather than below lotsfree
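
The decision described above can be sketched as a small function. It assumes cachefree is larger than lotsfree, and the function name and return strings are illustrative only.

```python
def scanner_targets(freemem, cachefree, lotsfree):
    """Which pages the scanner may steal under priority paging (sketch)."""
    if freemem >= cachefree:
        return "none"                       # scanner is not woken
    if freemem >= lotsfree:
        return "file system cache pages"    # between cachefree and lotsfree
    return "all pages"                      # true memory shortage


# Example with cachefree = 4096 pages and lotsfree = 2048 pages.
states = [scanner_targets(f, 4096, 2048) for f in (5000, 3000, 1000)]
```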

36
Scan Rate Interpolation with the Priority Paging
Algorithm
37
Page Scanner CPU Utilization Clamp
  • Purpose: to prevent the page-out daemon from
    using too much processor time
  • Two parameters
  • min_percent_cpu, default 4% of a single CPU
  • max_percent_cpu, default 80% of a single CPU
  • CPU time can be used
  • From min_percent_cpu to max_percent_cpu
  • min_percent_cpu when free memory is at lotsfree
    (cachefree with priority paging enabled)
  • max_percent_cpu if free memory were to fall to
    zero
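
A sketch of how the CPU allowance might move between the two limits. The slides give only the endpoints, so the straight-line interpolation and the function name below are assumptions.

```python
def scanner_cpu_percent(freemem, threshold, min_pct=4.0, max_pct=80.0):
    """Percent of one CPU the scanner may use (assumed linear interpolation).

    threshold is lotsfree, or cachefree when priority paging is enabled.
    """
    freemem = max(0, min(freemem, threshold))   # clamp into [0, threshold]
    shortfall = (threshold - freemem) / threshold
    return min_pct + (max_pct - min_pct) * shortfall


halfway = scanner_cpu_percent(1024, 2048)  # free memory at half of lotsfree
```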

38
Parameters That Limit Pages Paged Out
  • Maxpgio
  • limits the rate at which I/O is queued to the
    swap devices
  • defaults to 40 or 60 I/Os per second
  • Often set to 100 times the number of swap
    spindles
  • Maxpgio can also indirectly affect file system
    throughput

39
Page Scanner Implementation
  • implemented as two kernel threads
  • Page scanner thread scans pages
  • Page-out thread pushes the dirty pages queued
    for I/O

40
Page Scanner Architecture
41
Scanner: schedpaging()
  • woken up
  • called four times per second by a callout
  • triggered by the clock() thread if memory falls
    below minfree
  • triggered by the page allocator if memory falls
    below throttlefree
  • calculates two setup parameters for the page
    scanner thread
  • the number of pages to scan
  • the number of CPU ticks that the scanner thread
    can consume
  • triggers the scanner through a condition variable

42
Page scanner thread
  • cycles through the physical page list
  • The front and back hand each have a page pointer
  • front hand is incremented first to clear the
    referenced and modified bits for pointed page
  • back hand is then incremented to check the status
    of the pointed page (using check_page() function)
  • If modified, placed in the dirty page queue
  • If not referenced, freed

43
Page-out thread
  • uses a preinitialized list of async buffer
    headers as the queue for I/O requests
  • The number of entries is controlled by parameter
    async_request_size, initialized with 256
  • Requests to queue more I/Os will be blocked
  • if the entire queue is full
  • if the rate of pages queued has exceeded the
    maxpgio
  • removes I/O entries from the queue
  • initiates I/O by calling the vnode putpage()

44
The Memory Scheduler
  • swap out entire processes to conserve memory
  • removing all of a process's thread structures and
    private pages
  • setting flags in the process table to indicate
    that this process has been swapped out
  • Not expensive but affects the process's performance
  • launched at boot time
  • does nothing unless memory is less than desfree
  • looking for processes that can completely swap
    out
  • soft-swap out / hard-swap out

45
Soft Swapping
  • takes place when the 30-second average for free
    memory is below desfree
  • memory scheduler looks for processes that have
    been inactive for at least maxslp seconds
  • If found
  • swaps out the thread structures for each thread
  • pages out all of the private pages of memory for
    that process

46
Hard Swapping
  • takes place when all of the following are true
  • At least two processes are on the run queue,
    waiting for CPU.
  • The average free memory over 30 seconds is
    consistently less than desfree.
  • Excessive paging is going on
  • determined to be true if page-out + page-in >
    maxpgio
  • Use a much more aggressive approach to find
    memory
  • First, the kernel is requested to unload all
    modules and cache memory that are not currently
    active
  • Then, processes are sequentially swapped out
    until the desired amount of free memory is
    returned
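
The three hard-swap conditions can be combined into one predicate. This is a sketch with hypothetical parameter names; it reads the paging test as the sum of the page-out and page-in rates exceeding maxpgio.

```python
def should_hard_swap(runq_len, avg_free_30s, desfree,
                     pageout_rate, pagein_rate, maxpgio):
    """True only when all three hard-swap conditions hold."""
    cpu_contention = runq_len >= 2                  # at least two runnable processes
    memory_short = avg_free_30s < desfree           # 30-second average below desfree
    excessive_paging = (pageout_rate + pagein_rate) > maxpgio
    return cpu_contention and memory_short and excessive_paging


decision = should_hard_swap(runq_len=3, avg_free_30s=500, desfree=1024,
                            pageout_rate=50, pagein_rate=30, maxpgio=60)
```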

47
Memory Scheduler Parameters
Parameter | Effect on Memory Scheduler
desfree | If the average amount of free memory falls below desfree for 30 seconds, then the memory scheduler is invoked.
maxslp | When soft-swapping, the memory scheduler starts swapping processes that have slept for at least maxslp seconds. The default for maxslp is 20 seconds and is tunable.
maxpgio | When the run queue is greater than 2, free memory is below desfree, and the paging rate is greater than maxpgio, then hard swapping occurs, unloading kernel modules and process memory.
48
Outline
  • Data Structure
  • Page Scanner Operation
  • Page-out Algorithm
  • Hardware Address Translation Layer

49
Introduction to HAT
  • Hardware Address Translation (HAT)
  • controls the hardware that manages mapping of
    virtual to physical memory
  • provides interfaces that implement the creation
    and destruction of mappings between virtual and
    physical memory
  • provides a set of interfaces to probe and control
    the MMU
  • implements all of the low-level trap handlers to
    manage page faults and memory exceptions

50
Solaris Virtual Memory Layers
51
Solaris Memory Model
52
Address Space
  • Process Address Space
  • Process Text and Data
  • Stack (anon memory) and Libraries
  • Heap (anon memory)
  • Kernel Address Space
  • Kernel Text and Data
  • Kernel map Space (data structures, caches)
  • 32-bit kernel map (64-bit kernels only)
  • Trap table
  • Critical virtual memory data structures
  • Mapping File System Cache (segmap)

53
The Address Space
54
Role of the HAT layer in virtual-to-physical
translation
  • hides the platform-specific implementation
  • used by the segment drivers to implement the
    segment driver's view of virtual-to-physical
    translation
  • use hat to hold top-level translation information
  • hat structure is platform specific
  • hat is referenced by the address space structure
  • HAT-specific data structures existing in every
    page represent the translation information at a
    page level
  • HAT layer is called when the segment drivers want
    to manipulate the hardware MMU

55
Summary of HAT Functions
Function | Description
hat_chgattr() | Changes the protections for the supplied virtual address range.
hat_clrattr() | Clears the protections for the supplied virtual address range.
hat_free_end() | Informs the HAT layer that a process has exited.
hat_free_start() | Informs the HAT layer that a process is exiting.
hat_get_mapped_size() | Returns the number of bytes that have valid mappings.
hat_getattr() | Gets the protections for the supplied virtual address range.
hat_memload() | Creates a mapping for the supplied page at the supplied virtual address. Used to create mappings.
hat_setattr() | Sets the protections for the supplied virtual address range.
hat_stats_disable() | Finishes collecting stats on an address space.
hat_stats_enable() | Starts collecting page reference and modification stats on an address space.
hat_swapin() | Allocates resources for a process that is about to be swapped in.
hat_swapout() | Frees resources for a process that is about to be swapped out.
hat_sync() | Synchronizes the struct_page software referenced and modified bits with the hardware MMU.
hat_unload() | Unloads a mapping for the given page at the given address.
56
Virtual Memory Contexts and Address Spaces
  • A virtual memory context is a set of
    virtual-to-physical translations that maps an
    address space
  • contexts change when
  • scheduler wants to switch execution from one
    process to another
  • a trap or interrupt from user mode to kernel
    occurs
  • virtual memory context zero refers to kernel
    context
  • HAT layer implements functions to create, delete,
    and switch virtual memory contexts
  • Different hardware MMUs support different numbers
    of concurrent virtual memory contexts

57
Hardware Translation Acceleration
  • translation lookaside buffer (TLB)
  • a hardware cache of recent translations
  • The number of entries in the TLB is typically 64
    on SPARC systems
  • TLB fill
  • hardware
  • such as Intel and older SPARC implementations
  • software algorithms
  • like the UltraSPARC architecture

58
The UltraSPARC-I and -II HAT
  • The UltraSPARC-I and -II MMUs do the following
  • Implement mapping between a 44-bit virtual
    address and a 41-bit physical address
  • Support page sizes of 8 Kbytes, 64 Kbytes, 512
    Kbytes, and 4 Mbytes

59
Virtual-to-Physical Translation
60
Translation Table Entry (TTE)
  • TTE is a translation map entry, one for each page
  • TTE contains a virtual address tag and the high
    bits of the physical address
  • TTEs must be loaded into the TLB
  • When MMU finds the TTE entry that matches the
    virtual page number and current context, it
    retrieves the physical page information

61
Relationship of TLBs, TSBs, and TTEs
Translation Software Buffer (TSB)
  • a software cache of TTEs
  • a direct-mapped cache of the TLB
  • an array of TTEs in regular physical memory
62
TSB Size
Memory Size | Kernel TSB Entries | Kernel TSB Size | User TSB Entries | User TSB Size
< 32 Mbytes | - | - | 2048 | 128 Kbytes
32 Mbytes - 64 Mbytes | 4096 | 256 Kbytes | 8192 - 16383 | 512 Kbytes - 1 Mbyte
32 Mbytes - 2 Gbytes | 4096 - 262,144 | 512 Kbytes - 16 Mbytes | 16384 - 524,287 | 1 Mbyte - 32 Mbytes
2 Gbytes - 8 Gbytes | 262,144 | 16 Mbytes | 524,288 - 2,097,511 | 32 Mbytes - 128 Mbytes
> 8 Gbytes | 262,144 | 16 Mbytes | 2,097,512 | 128 Mbytes
63
Address Space Identifiers
  • describe the MMU mode and hardware used to access
    pages
  • derived from the instruction being executed and
    the current trap level
  • grouped into three different modes of physical
    memory access
  • The MMU translation context used to index TLB
    entries is derived from the ASI

ASI | Description | Derived Context
Primary | The default address translation used for regular SPARC instructions | The address space translation is done through TLB entries that match the context number in the MMU primary context register
Secondary | A secondary address space context used for accessing another address space context without requiring a context switch | The address space translation is done through TLB entries that match the context number in the MMU secondary context register
Nucleus | The address translation used for TLB miss handlers, system calls, and interrupts | The nucleus context is always zero (the kernel's context).
64
UltraSPARC-I and -II Watchpoint Implementation
  • watchpoint registers describe the address of
    watchpoints for the address space
  • Virtual address / physical address
  • Watchpoint traps are generated when
  • watchpoints are enabled, and
  • the data MMU detects a load or store to the
    virtual or physical address specified by the
    virtual address data watchpoint register or the
    physical data watchpoint register

65
UltraSPARC-I and -II Protection Modes
TTE in D-MMU | TTE in I-MMU | Writable Attribute Bit | Resultant Protection Mode
Yes | No | 0 | Read-only
No | Yes | Don't Care | Execute-only
Yes | No | 1 | Read/Write
Yes | Yes | 0 | Read-only/Execute
Yes | Yes | 1 | Read/Write/Execute
66
UltraSPARC-I and -II MMU-Generated Traps
Trap Description
Instruction_access_miss A TTE for the virtual address of an instruction was not found in the instruction TLB
Instruction_access_exception An instruction privilege violation or invalid instruction address occurred
Data_access_MMU_miss A TTE for the virtual address of a load was not found in the data TLB
Data_access_exception A data access privilege violation or invalid data address occurred
Data_access_protection A data write was attempted to a read-only page
Privileged_action An attempt was made to access a privileged address space
Watchpoint Watchpoints were enabled and the CPU attempted to load or store at the address equivalent to that stored in the watchpoint register
Mem_address_not_aligned An attempt was made to load or store from an address that is not correctly word aligned
67
TLB Performance and Large Pages
  • large pages
  • typically 4 Mbytes in size
  • optimize the effectiveness of the hardware TLB
  • memory performance is largely influenced by the
    effectiveness of the TLB
  • because of the time spent servicing TLB misses
  • TLBs are limited in size
  • only 64 entries in UltraSPARC-I and -II

68
TLB reach
  • TLB reach -- the amount of memory that TLB can
    address concurrently
  • TLB reach = TLB entries × page size
  • 64 × 8 Kbytes, or 512 Kbytes
  • increase TLB reach
  • Increase the number of entries in the TLB
  • Increase the page size that each entry reflects
  • A trade-off method -- use two or more different
    page sizes at the same time
  • 8-Kbyte, 64-Kbyte, 512-Kbyte, or 4-Mbyte pages
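
The effect of page size on TLB reach is easy to compute. A short sketch using the 64-entry TLB mentioned for UltraSPARC-I and -II; the function name is hypothetical.

```python
KBYTE = 1024
MBYTE = 1024 * KBYTE


def tlb_reach(entries, page_size):
    """Bytes of memory the TLB can map at once: entries times page size."""
    return entries * page_size


reach_8k = tlb_reach(64, 8 * KBYTE)    # 512 Kbytes
reach_4m = tlb_reach(64, 4 * MBYTE)    # 256 Mbytes
```

Moving from 8-Kbyte to 4-Mbyte pages multiplies the reach by 512 without adding a single TLB entry.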

69
Solaris Support for Large Pages
  • 8 Kbytes
  • a good mix of performance across the range of
    smaller machines to larger machines
  • hurts large-memory scientific applications and
    large-memory databases
  • hurts kernel performance
  • 4 Mbytes
  • speeds up the kernel code path
  • frees up valuable TLB slots for hungry
    applications
  • accelerates graphics performance
  • Large-Page Database Performance Improvements

Database | Performance Improvement
Oracle TPC-C | 12%
Informix TPC-C | 1%
Informix TPC-D | 6%
70
End
  • Last.first_at_Sun.COM