Provided by: pavelyos


1
Windows 2000/XP Internals
  • Muhammad Zahalqa
  • We! Consulting group
  • muhammad_at_we-can.co.il
  • tryfinally_at_hotmail.com

2
Contents
  • Module 0: Introduction
  • Module 1: System Architecture
  • Module 2: Kernel Mechanisms
  • Module 3: Memory Management
  • Module 4: Processes, Threads and Jobs
  • Module 5: I/O System
  • Module 6: Windows XP Kernel Enhancements
  • Appendix: Kernel Debugging

3
Windows 2000/XP Internals
  • Module 0
  • Introduction

4
Agenda
  • Course objectives
  • Target Audience
  • Resources

5
Course Objectives
  • Objectives
  • Understand Windows 2000/XP features and
    architecture
  • Uncover the internal algorithms of Windows
    2000/XP relevant to developers
  • Enhance the ability to design and implement
    optimized software for the Windows 2000/XP
    platform
  • Target Audience
  • Win32 programmers, device driver programmers
  • Not Covered
  • Device driver development
  • Win32 API Programming
  • Networking Internals

6
Resources
  • Books
  • Inside Windows 2000, 3rd Ed. / David Solomon and
    Mark Russinovich (MS Press)
  • Programming Applications for Windows, 4th Ed. /
    Jeffrey Richter (MS Press)
  • Windows NT/2000 Native API Reference / Gary
    Nebbett
  • Programming the Windows Driver Model / Walter
    Oney (MS Press)
  • Platform SDK Documentation
  • Windows 2000/XP Driver Development Kit (DDK)
    Documentation
  • Web Sites
  • www.osr.com
  • www.sysinternals.com
  • www.microsoft.com/hwdev

7
Windows 2000/XP Internals
  • Module 1
  • System Architecture

8
Agenda
  • Windows 2000/XP Design Goals
  • Key System Components
  • Symmetric Multiprocessing
  • Portability
  • Professional vs. Server
  • System Architecture
  • Win32 Subsystem

9
Windows 2000/XP Design (1)
  • Separate address space per process
  • One process cannot (easily) corrupt another's
    memory
  • Protected kernel
  • User mode code can't touch it in any way
  • Preemptive multithreading and multitasking
  • Supports up to 32 CPUs
  • Fully supports internationalization with Unicode
  • Meets requirements of C2 security level

10
Windows 2000/XP Design (2)
  • Integrated networking support
  • Fully supports Plug and Play and Power Management
  • High performance file system (NTFS)
  • Supports protection, compression and encryption
  • Multiple personalities
  • DOS, Win16, Win32
  • POSIX, OS/2 (Windows 2000 only)
  • Easily portable to new platforms

11
General Architecture Overview
[Diagram. User mode: Environment Subsystem, User Applications, Server Processes, System Processes, Subsystem DLLs. Kernel mode: Executive, RegDB, Graphics (Win32k), Device Drivers, Kernel, Hardware Abstraction Layer (HAL).]
12
User Mode Key Components
  • User Applications
  • One of Win32, Win16, DOS
  • OS/2, POSIX (Windows 2000 only)
  • System Processes
  • Logon, Session Manager, etc.
  • Server Processes
  • Services Scheduler, Event Log, etc.
  • Environment Subsystem
  • Win32 Subsystem process (csrss.exe)
  • Subsystem DLLs
  • Expose an API to applications (kernel32.dll,
    user32.dll, gdi32.dll, advapi32.dll, etc.)

13
Kernel Mode Key Components
  • Executive
  • Virtual memory management, I/O, Cache, Process
    management, Security, IPC, Power management
  • Kernel
  • Basic OS functionality: thread scheduling,
    interrupt and exception handling, multiprocessor
    synchronization
  • Hardware Abstraction Layer (HAL)
  • An abstraction of hardware to isolate hardware
    specifics from the kernel and device drivers
  • Device Drivers
  • Loadable kernel modules that translate I/O calls
    into specific hardware requests
  • Win32 Graphics and windowing system
  • Implemented in Win32k.sys

14
Symmetric Multiprocessing
  • SMP
  • All CPUs share same main memory and have equal
    access to peripheral devices (no master/slave)
  • Basic architecture supports up to 32 CPUs
  • Specific HAL and kernel: NtOsKrnl.Exe (for 1 CPU)
    or NtKrnlMp.Exe (for more than 1 CPU)
  • Actual number of CPUs enabled is determined by
    licensing
  • Home (XP only): 1 CPU
  • Professional: 2 CPUs
  • Server: 4 CPUs
  • Advanced Server: 8 CPUs
  • Datacenter Server: 32 CPUs
  • Number of licensed CPUs is stored in registry
    under HKLM\System\CCS\Control\Session Manager
  • LicensedProcessors (DWORD), e.g. 2

15
Portability
  • Windows 2000/XP has a layered design where
    processor architecture specific portions of the
    system are isolated into separate modules to
    shield upper layers from differences in hardware
    platforms.
  • Majority of OS written in portable C.
  • Assembly is used where direct access to system
    hardware is a must or performance is critical.
    Mainly HAL, Kernel, Executive routines
    (interlocked) and LPC facility

16
Professional vs. Server
  • Identical kernel
  • Some policy changes and other parameters tuning
  • Can determine if server using GetVersionEx
    (Win32) or MmIsThisAnNtAsSystem (DDK)
  • Windows XP has a Home Edition
  • Designed to replace the Windows 9x/ME family
  • Similar to Professional but with several
    limitations

17
Windows 2000/XP Architecture
[Diagram of the layered architecture. User-mode labels: System Processes (Logon, Session manager), Services (Replicator, Alerter, RPC, Event logger, File server, Other), Subsystems (Win32), Applications, all connected through LPC to the Windows 2000 System. Kernel-mode labels: Executive (Security monitor, Power Management, Process support, Memory Management, I/O manager, File systems, Object management/executive run time, Device drivers), Kernel, Hardware abstraction layer (Platform interface, Privileged architecture, Interrupt dispatch, I/O devices, DMA control, Bus mapping, Clocks/timers, Cache control).]
18
Environment Subsystems
  • Environment Subsystems and Subsystems DLLs
  • Windows 2000 has three environment subsystems
    (POSIX, OS/2, Win32)
  • Windows XP has only Win32
  • Subsystems expose native kernel services to user
    applications as distinct OS personalities, each
    with its own semantics and I/O model
  • Executables and DLLs are associated with only one
    Subsystem
  • Subsystem type is embedded in image header
  • Use the ExeType utility
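
The subsystem type mentioned above lives in the Subsystem field of the PE optional header. The following is a minimal sketch in portable C of how a tool in the spirit of ExeType might read it from an image already loaded into a buffer. The function and constant names (pe_subsystem, SUBSYS_*) are hypothetical, chosen for illustration; the offsets follow the published PE layout (e_lfanew at 0x3C, Subsystem at offset 68 of the optional header).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Subsystem codes stored in the PE optional header (illustrative names). */
enum { SUBSYS_NATIVE = 1, SUBSYS_WIN_GUI = 2, SUBSYS_WIN_CUI = 3,
       SUBSYS_OS2_CUI = 5, SUBSYS_POSIX_CUI = 7 };

/* Little-endian readers, so the sketch works on any host. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const uint8_t *p)
{
    return p[0] | p[1] << 8 | p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns the subsystem code, or -1 if the buffer is not a PE image. */
int pe_subsystem(const uint8_t *img, size_t len)
{
    assert(img != NULL);
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;                          /* no DOS header */
    uint32_t pe = rd32(img + 0x3C);         /* e_lfanew: offset of "PE\0\0" */
    if (pe > len || len - pe < 4 + 20 + 70)
        return -1;                          /* headers would run off the end */
    if (memcmp(img + pe, "PE\0\0", 4) != 0)
        return -1;
    /* 4-byte signature + 20-byte file header + 68 bytes into optional header */
    return rd16(img + pe + 4 + 20 + 68);
}
```

A console program such as cmd.exe would be expected to yield SUBSYS_WIN_CUI, and a device driver SUBSYS_NATIVE.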

19
Subsystems Components
  • Subsystem API DLLs
  • Win32: Kernel32.dll, User32.dll, Gdi32.dll, etc.
  • Implement some helper functions and provide a
    gate for calling the Environment Process or
    kernel Executive services
  • Subsystem Process
  • Win32: Client/Server Run-time Subsystem -
    CSRSS.EXE
  • Owner of display
  • Implements subsystem-wide functionality
  • Global state of subsystem handles objects
  • Processes and threads created under subsystem
  • Window management for character-mode applications
  • Kernel Mode Code
  • Win32 only: kernel-mode GDI and User code in
    Win32K.SYS
  • Implements Win32 GDI functions by calling graphic
    device drivers

20
Environment Subsystems Information
  • Subsystem configuration and startup is maintained
    in registry
  • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems
  • Kmode: location of Win32K.sys
  • Windows, Posix, Os2: location of subsystem files
  • Required: list of subsystems to load at boot
    time
  • Optional: list of subsystems to load on demand

21
Win32 Subsystem Functions
  • Exposed through a set of subsystem DLLs
  • Kernel32.dll: low level functions (CreateFile,
    OpenProcess, WaitForSingleObject)
  • User32.dll: user interface functions
    (GetMessage, CreateWindowEx, CreateMenu)
  • Gdi32.dll: graphics functions (GetDC, CreatePen,
    Ellipse)
  • Advapi32.dll: registry and security functions
    (RegCreateKeyEx, RegSetValueEx, AccessCheck)
  • Ole32.dll, OleAut32.dll: COM support functions
    (CoCreateInstance, MkParseDisplayName)
  • Implementation
  • Some implemented entirely inside the DLL (e.g.
    PtInRect)
  • Some require transition to kernel mode (e.g.
    CreateFile)
  • Some also require notification of the subsystem
    process (e.g. PostThreadMessage)

22
Win32 API Calls
[Diagram labels, top to bottom: User Applications, Environment Subsystem, Subsystem DLLs, NTDLL.DLL, Executive, LPC, Graphics (Win32k), Device Drivers, Kernel, Hardware Abstraction Layer (HAL).]
23
NTDLL.DLL
  • Special DLL required for all subsystems
  • Provides a gateway to kernel mode by implementing
    Native Windows functions (undocumented)
  • Some implementations are complete in NtDll.Dll
  • Others load the system service index in EAX and
    the arguments pointer in EDX, then execute int 2E
    to switch to kernel mode and enter the kernel's
    System Service Dispatcher
  • The dispatcher uses the EAX register as an index
    to a table and calls the appropriate function
  • Windows XP on Pentium II and up uses special
    (faster) instructions to achieve the same effect
    (sysenter / sysexit)
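
The table lookup performed by the dispatcher can be illustrated with a small user-mode model. This is only a sketch: the service numbers, the two toy services and the function dispatch_system_service are hypothetical stand-ins, and the real dispatcher also copies arguments from the user stack and validates them. The index parameter plays the role of EAX and the args pointer plays the role of EDX.

```c
#include <assert.h>
#include <stddef.h>

/* A "system service" takes a pointer to its argument block (the EDX role). */
typedef long (*service_fn)(const long *args);

static long svc_add(const long *a)    { return a[0] + a[1]; }
static long svc_negate(const long *a) { return -a[0]; }

/* Hypothetical service table; real service numbers vary between builds. */
static const service_fn service_table[] = { svc_add, svc_negate };
#define SERVICE_COUNT (sizeof service_table / sizeof service_table[0])
#define INVALID_SERVICE (-1L)

/* Model of the dispatcher: bounds-check the index, then call through the table. */
long dispatch_system_service(unsigned long index, const long *args)
{
    assert(args != NULL);
    if (index >= SERVICE_COUNT)
        return INVALID_SERVICE;     /* unknown service number */
    return service_table[index](args);
}
```

With an argument block of {2, 3}, index 0 returns 5, index 1 returns -2, and any index past the end of the table is rejected.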

24
NTDLL.DLL Examples
  • Contains
  • Internal support functions (Rtl, Csr, Ldr)
  • System calls dispatch stubs to the Executive

Exported fn(): RtlFillMemoryUlong - Ord:01D0h
77FB96CC 57            push edi
77FB96CD 8B7C2408      mov edi, dword ptr [esp+08]
77FB96D1 8B4C240C      mov ecx, dword ptr [esp+0C]
77FB96D5 8B442410      mov eax, dword ptr [esp+10]
77FB96D9 C1E902        shr ecx, 02
77FB96DC F3            repz
77FB96DD AB            stosd
77FB96DE 5F            pop edi
77FB96DF C20C00        ret 000C

Exported fn(): NtFlushBuffersFile - Ord:007Eh
Exported fn(): ZwFlushBuffersFile - Ord:02FEh
77F83880 B841000000    mov eax, 00000041
77F83885 8D542404      lea edx, dword ptr [esp+04]
77F83889 CD2E          int 2E
77F8388B C20800        ret 0008
25
Function Call Flow
  • E.g. ReadFile

application: call fread
CRT: call ReadFile
Kernel32.DLL: call NtReadFile; return to caller
NtDll.DLL: int 0x2E; return to caller
NtOskrnl.EXE: call NtReadFile; dismiss int 0x2E
NtOskrnl.EXE: check parameters; call driver; block, or not; return to caller
driver.sys: initiate I/O; return to caller
26
Dependency Walker
  • Allows exploring of images (EXE, DLL)
  • Imported and exported functions, linking types

27
Access Mode
  • Code under Windows 2000/XP always executes with
    an access mode
  • User mode
  • The mode user applications always run in
  • Code cannot touch kernel code/data in any way
  • Trying to do so causes an access violation
    exception
  • Code cannot touch hardware in any way
  • Ring 3 on Intel 80x86
  • Kernel (privileged) mode
  • The kernel (including device drivers) runs in this
    mode
  • Code can do anything it wants
  • An unhandled exception will crash the system
    (Blue Screen)
  • Ring 0 on Intel 80x86

28
Code and Stacks
  • User mode code runs with a user space stack
  • Size is practically limited only by available
    virtual memory in process (but typically limited
    to 1MB)
  • Kernel mode code runs with a stack residing in
    kernel space
  • Size is 12KB (x86), 16KB (Alpha)
  • Mapped to system space (upper 2GB)
  • No documented way to change it
  • Usually non-pageable
  • This means that each thread is created with two
    stacks

29
System Crash
  • A.K.A. Blue Screen of Death (BSOD)
  • Called bugcheck by DDK docs
  • Generated by code in the kernel which decides
    "This should never have happened"
  • Kernel code is trusted, so should not do illegal
    things
  • Most often caused by an unhandled exception
    occurring in kernel mode
  • Most common exception is Access Violation, e.g.
  • Trying to dereference a NULL pointer
  • Trying to dereference a pointer to an unmapped
    address
  • Trying to write on a page protected by a
    read-only access

30
The Kernel API
  • Implementation
  • Most of the Kernel code is in NtOsKrnl.Exe
    (single CPU) or NtKrnlMp.Exe (Multi CPU)
  • Always called NtOsKrnl.Exe on the local hard disk
    (in the System32 directory)
  • Some implementation is in Hal.Dll
  • The DDK documents about 1/3 of the exported
    functions
  • Most functions have a prefix suggesting origin

31
Kernel API Prefixes
Ex - General executive routines
Exp - Executive private (not exported)
Cc - Cache Manager (Controller)
Mm - Memory Manager
Rtl - General runtime library
FsRtl - File system runtime library
Ob - Object management
Io - I/O subsystem
Se - Security
Ps - Process structure
Po - Power management
Wmi - Windows Management Instrumentation
Zw - File and registry access
Ke - General Kernel
Ki - Kernel internal (not available outside of kernel)
Hal - Hardware abstraction layer
READ_xxx, WRITE_xxx - I/O port and register access (HAL)
32
Executive
  • Functionality
  • Executive is the upper layer of NTOSKRNL and the
    Gate to kernel Mode
  • Components
  • Process and Thread Manager
  • Virtual Memory Manager
  • Security Reference Monitor
  • I/O System
  • Cache Manager
  • Plug and Play Manager
  • Functions
  • Object Manager Functions
  • LPC Facility: a flexible, optimized version of
    RPC
  • Run-Time Library: string manipulation, data
    conversion, arithmetic and security structure
    processing
  • Executive Support: memory allocation, interlocked
    memory access and fast mutexes

33
Kernel
  • The lowest layer of NTOSKRNL governs how OS uses
    the processor/s. It provides for
  • Thread scheduling and context switching
  • Interrupt handling and exception dispatching
  • Multiprocessor synchronization
  • CPU architecture functions: GDT and LDT
    manipulation on x86, CPU cache support
  • Generic Wait Operations
  • Provides foundation synchronization primitives
    for use by the Executive
  • Kernel Code
  • Mostly Resident (non pageable)
  • Interruptible but sometimes non preemptible
  • Search for functions beginning with Ke, Ki

34
Kernel Objects
  • Kernel Control Objects: control various OS
    functions
  • APC, DPC, IRP, Interrupt, Adapter, Device and
    more
  • Dispatcher Kernel Objects: have synchronization
    capabilities
  • Thread, process, mutant (mutex), event,
    semaphore, timer and more
  • The Executive uses kernel objects to construct
    more complex objects to provide to user mode. It
    adds
  • Policy: management through handles
  • Security: checks access and protects
  • Quotas: limits

35
HAL
  • Hardware Abstraction Layer
  • Purpose
  • Isolates Kernel and Executive from hardware
    specifics
  • Presents uniform model to ease device driver
    development and porting
  • HAL provides low level interface to
  • I/O System Specifics (buses, DMA, ports and
    registers)
  • Interrupt controllers and system timers
  • MP Communication
  • Hardware interrupt priorities
  • Importance somewhat reduced in Windows 2000/XP
  • Bus drivers do some of these functions

36
Device Drivers
  • Device Drivers are loadable kernel modules that
    interface between the I/O subsystem and hardware
  • Access hardware through the HAL
  • Typically written in C (sometimes in C++)
  • Types
  • Hardware Device Drivers manipulate physical
    devices using the HAL
  • Also interact with the Plug and Play manager
  • File System drivers handle file-oriented requests
  • Filter Drivers perform added-value processing
    on-top of another driver

37
System Process
  • Represents the kernel (in a way)
  • Hosts kernel threads
  • Always run in kernel mode
  • Number of threads is not constant (drivers are
    free to add their own threads under this process)
  • Process ID is constant
  • 2 (NT 4), 8 (2000), 4 (XP)

38
Viewing Process/Thread Information
  • Use PSTAT.EXE

39
Session Manager
  • First user-mode process created by
    ExInitializeSystem
  • Key system initializations from configuration
    information in registry
    HKEY_LOCAL_MACHINE\SYSTEM\CCS\Control\Session Manager
  • Loads subsystems processes (usually just CSRSS)
  • Acts as switch and monitor between applications
    and debuggers
  • Creates LPC port object \SmApiPort and two
    threads to wait for requests (loading subsystems,
    creating a new session)
  • Defines symbolic links for DOS devices (C:, COM1,
    LPT1, etc.)
  • Opens Known DLLs
  • Loads Win32K.SYS
  • Starts Logon Process (Winlogon.exe)
  • Create LPC ports (DbgSsApiPort, DbgUiApiPort) and
    threads to handle debug events messages
  • Waits on CSRSS and WinLogon processes and crashes
    system if they are terminated

40
Windows 2000/XP Internals
  • Module 2
  • Kernel Mechanisms

41
Agenda
  • Object Management
  • The Registry
  • Interrupt Dispatching
  • Interrupt Request Level (IRQL)
  • Deferred Procedure Call (DPC)
  • Multiprocessor Synchronization

42
Object Manager
  • The kernel implements an object model to provide
    consistent and secure access to internal services
    in the executive
  • Executive objects are implemented in the
    executive.
  • Kernel objects are a more primitive set of objects
    implemented in the kernel
  • Object Manager is an Executive component that
    manages system objects
  • Manager is responsible for creating, maintaining
    and tracking objects
  • Objects are named structures used to represent
    executive resources
  • Provides a common mechanism for using, naming and
    sharing objects
  • Manages an object hierarchical namespace.
  • Provides object protection with Access Control
    Lists (ACLs)
  • Manages object retention and maintains handle
    counts on objects

43
Object Namespace
  • View with WinObj.Exe (from MS or SysInternals)

44
User Mode Objects
  • Named user mode objects reside in
    \BaseNamedObjects

45
Object Structure
  • An Object represents a hardware or a software
    entity
  • Most objects are documented but some are opaque

[Diagram of an object's layout.
Owned by Object Manager: Object Name, Object Directory, Security Descriptor, Quota Charges, Open Handle Count, Open Handles List, Object Type, Reference Count
Type Object: Type Name, Access Type, Synchronizable?, Pageable?, Object Methods
Owned by Executive: Object Data
Owned by kernel: Kernel Object]
46
Object Handles
  • When a process creates an object, it receives a
    handle to it
  • Handles prevent direct access to objects
  • Handles provide a consistent interface to
    objects
  • The created object lives in kernel space
  • An object handle is an index into a process
    handle table
  • Code in kernel mode can manipulate objects
    directly
  • Handle table entries are two 32-bit values
  • Pointer to the object header (headers are 8-byte
    aligned, which leaves the low bits free for flags)
  • Flags
  • Inheritance designation: will the handle be
    inherited by a child process?
  • Protect from close
  • Audit on close
  • Access mask: a product of the requested access
    rights and the access rights granted by the
    Security Reference Monitor
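
The handle-as-index idea can be modeled in a few lines of portable C. This is a sketch under simplifying assumptions, not the kernel's layout: the names (handle_table, handle_open, handle_reference, handle_close) are hypothetical, the flags are kept in their own field rather than packed into the pointer's low bits, and the granted access is computed as the bitwise AND of requested and allowed rights, following the slide's wording.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HANDLE_INHERIT        0x1u
#define HANDLE_PROTECT_CLOSE  0x2u
#define HANDLE_AUDIT_CLOSE    0x4u
#define TABLE_SIZE 16

typedef struct {
    void    *object;          /* stands in for the object header pointer */
    uint32_t flags;           /* inherit / protect from close / audit on close */
    uint32_t granted_access;  /* requested rights AND allowed rights */
} handle_entry;

typedef struct { handle_entry slots[TABLE_SIZE]; } handle_table;

/* Returns a handle (slot index + 1), or 0 if the table is full. */
int handle_open(handle_table *t, void *obj,
                uint32_t requested, uint32_t allowed, uint32_t flags)
{
    assert(t != NULL && obj != NULL);
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (t->slots[i].object == NULL) {
            t->slots[i] = (handle_entry){ obj, flags, requested & allowed };
            return i + 1;
        }
    }
    return 0;
}

/* Translates a handle to its object if the handle carries the desired access. */
void *handle_reference(const handle_table *t, int h, uint32_t desired)
{
    if (h < 1 || h > TABLE_SIZE)
        return NULL;
    const handle_entry *e = &t->slots[h - 1];
    if (e->object == NULL || (e->granted_access & desired) != desired)
        return NULL;
    return e->object;
}

/* Closes a handle; fails for invalid handles and for protected ones. */
int handle_close(handle_table *t, int h)
{
    if (h < 1 || h > TABLE_SIZE || t->slots[h - 1].object == NULL)
        return 0;
    if (t->slots[h - 1].flags & HANDLE_PROTECT_CLOSE)
        return 0;
    t->slots[h - 1] = (handle_entry){ 0 };
    return 1;
}
```

User code only ever sees the small integer; the pointer stays inside the table, which is what keeps user mode away from the object itself.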

47
Object Retention
  • Object Retention is the process of tracking the
    life time of an object.
  • User applications must open a handle to an object
    before using it.
  • The Object Manager tracks the open handles
  • Name retention is controlled by the number of
    open handles. When the count reaches zero, the
    Object Manager deletes the object name from its
    namespace.
  • Reference counting records how many pointers to
    the object were dispensed to kernel mode
    components. When the reference count reaches zero
    the object is deleted
  • View with oh.exe (resource kit) or Process
    Explorer (SysInternals)
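
The two counters can be modeled as follows. This is a sketch with hypothetical names (object_model, obj_open_handle and so on), and it assumes that every open handle also holds one reference, so an object cannot be deleted while a handle to it is still open.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int  handle_count;   /* open handles held by processes */
    int  ref_count;      /* handles plus pointers handed to kernel code */
    bool named;          /* name still present in the namespace */
    bool deleted;        /* object storage released */
} object_model;

/* Kernel-mode component takes a pointer to the object. */
void obj_reference(object_model *o) { o->ref_count++; }

/* Dropping the last reference deletes the object. */
void obj_dereference(object_model *o)
{
    assert(o->ref_count > 0);
    if (--o->ref_count == 0)
        o->deleted = true;
}

/* Assumption of this model: a handle also counts as a reference. */
void obj_open_handle(object_model *o)
{
    o->handle_count++;
    o->ref_count++;
}

/* Closing the last handle removes the name; the object may live on. */
void obj_close_handle(object_model *o)
{
    assert(o->handle_count > 0);
    if (--o->handle_count == 0)
        o->named = false;
    obj_dereference(o);
}
```

An object with one handle and one extra kernel reference loses its name when the handle closes, but is deleted only when the kernel reference is dropped as well.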

48
Object Protection
  • Windows 2000/XP objects can be protected
  • Files, devices, mailslots, pipes, jobs,
    processes, threads, events, mutexes, semaphores,
    sections, I/O completion ports, LPC ports,
    waitable timers, access tokens, window stations,
    desktops, network shares, services, registry keys
    and printers
  • An object is created with a Security Descriptor
  • Determines who can do what with that object
  • When a caller requests access to an object
  • The object manager checks with the security
    system if the caller can obtain a handle to the
    object

49
Security Descriptor
  • Associated with an object upon creation
  • Main ingredients
  • Owner SID
  • Discretionary Access Control List (DACL)
  • Specifies who has what access to the object
  • System Access Control List (SACL)
  • Specifies which operations by which users should
    be logged in the security audit log
  • Access Control List contains
  • A header
  • Zero or more Access Control Entry (ACE)
    structures
  • Each ACE contains a SID and an Access Mask

50
ACE Types
  • Access Allowed
  • Access is allowed for that SID
  • Access Denied
  • Access is denied for that SID
  • Allowed Object, Denied Object
  • Used only with Active Directory
  • These ACEs have a GUID (Global unique identifier)
  • See documentation for Active Directory
  • ACE order is important!

51
ACL Assignment
  • Upon creation, the object manager must assign a
    DACL to the newly created object
  • If the caller specifies a Security Descriptor,
    the object manager uses it
  • Also, if the object is named, and under a
    container (such as an Event under
    \BaseNamedObjects), it adds the inheritable ACL,
    if applicable
  • If the SD is NULL, and there is an inherited ACL,
    then that ACL is used
  • If the SD is NULL and no ACL is inherited, then
    the default DACL from the creator's access token
    is used
  • If the SD is NULL, there is no inherited ACL and
    no default DACL, then the object is created with
    a NULL DACL, which means Everyone has access to
    the object

52
Determining Access (Simplified)
  • When a caller tries to open a handle to an
    object, an access check is made
  • If the object has no DACL (NULL) then it has no
    protection and the access is allowed
  • If the caller has the take-ownership privilege,
    then a write-access is granted
  • If the caller is the owner of the object, then a
    read-control and write DACL access is granted
  • Each ACE in the DACL is examined from first to
    last
  • If an access allowed for that SID is present,
    access is granted to the object with the relevant
    access mask
  • If an access denied for that SID is present,
    access is denied to the object
  • If the end of the DACL is reached, access is
    denied
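
The ACE walk described above can be sketched in portable C. This is a simplified model, not the Win32 AccessCheck function: the types ace and dacl and the function access_check are hypothetical, a caller is represented by a single integer SID (a real token carries a user SID plus group SIDs), and the owner and take-ownership shortcuts are left out. Allowed rights are accumulated across ACEs, and a deny ACE fails the check if it names any right that is still outstanding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ACE_ALLOW, ACE_DENY } ace_type;
typedef struct { ace_type type; int sid; uint32_t mask; } ace;
typedef struct { const ace *aces; size_t count; } dacl;

/* Returns true if caller_sid is granted every right in desired. */
bool access_check(const dacl *d, int caller_sid, uint32_t desired)
{
    if (d == NULL)
        return true;                      /* NULL DACL: no protection */
    uint32_t granted = 0;
    for (size_t i = 0; i < d->count; i++) {   /* first to last: order matters */
        const ace *a = &d->aces[i];
        if (a->sid != caller_sid)
            continue;
        if (a->type == ACE_DENY && (a->mask & desired & ~granted))
            return false;                 /* a still-needed right is denied */
        if (a->type == ACE_ALLOW)
            granted |= a->mask & desired;
        if (granted == desired)
            return true;
    }
    return false;                         /* end of DACL reached: denied */
}
```

Swapping a deny ACE and an allow ACE for the same SID changes the result, which is why ACE order is important. Note also that an empty DACL (zero ACEs) denies everyone, while a NULL DACL allows everyone.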

53
Process Explorer Tool
54
Object Names
  • User mode can see only \BaseNamedObjects and \??
    directories
  • Named objects created by user applications reside
    in \BaseNamedObjects and are global on a computer
  • Kernel device drivers place a symbolic link in
    \??
  • The Object Directory Object
  • Enables the object manager to support a
    hierarchical namespace
  • Kernel mode code can create object directories
    with ZwCreateDirectoryObject
  • Symbolic Links
  • \?? (formerly \DosDevices) contains symbolic link
    objects
  • \??\A: -> \Device\Floppy0
  • \??\COM1 -> \Device\Serial0
  • User mode can query with QueryDosDevice

55
LPC Facility
  • An optimized high-speed communication facility
    between client/server processes on one machine
  • Not Available through Win32 API, but RPC on same
    machine is switched by kernel to LPC
    automatically
  • Kernel API is exported, but undocumented
  • Types of Message Passing
  • Short messages (below 256 bytes) are placed in a
    buffer and copied from one address space to
    another
  • Big messages are exchanged in a shared memory
    section mapped by both client and server processes
  • Larger amounts of data can be directly read from
    or written to the client's address space
  • LPC Port Object
  • Server Connection Port: a named server request
    port
  • Server Communication Port: unnamed port used to
    talk to a client
  • Client Communication Port: unnamed port used to
    talk to a server
  • Unnamed Communication Port: unnamed port for
    communication between two threads in the same
    process

56
The Registry
  • Hierarchical repository of system / user
    configuration data
  • Some stored in files, some built dynamically and
    stored in memory
  • Access
  • REGEDIT.EXE
  • Originally written for Windows 95
  • Enhanced significantly in Windows 2000 and XP
  • REGEDT32.EXE
  • Originally written for Windows NT
  • UI not so convenient, functionality now identical
    to REGEDIT
  • Programmatically
  • APIs in Win32 and the Kernel
  • Activity
  • Can watch with RegMon.Exe from SysInternals

57
Registry: The Hives
  • HKEY_LOCAL_MACHINE (HKLM)
  • Contains machine specific configuration (not user
    related)
  • HKEY_CURRENT_USER (HKCU)
  • Contains per-user information
  • HKEY_CLASSES_ROOT (HKCR)
  • Contains file extension associations and COM
    registration
  • HKEY_USERS
  • Contains sub-keys for each user ever logged on to
    the system
  • HKEY_CURRENT_CONFIG
  • Contains current hardware configuration
  • HKEY_PERFORMANCE_DATA
  • Contains performance counters data
  • Only accessible through registry APIs

58
HKEY_CURRENT_USER
  • Contains user specific information
  • Maintained in a file named NtUser.Dat stored
    under \Documents and Settings\<user name>
  • Sub-keys
  • AppEvents: sound/event associations
  • Console: command window settings
  • Control Panel: screen saver, desktop scheme,
    keyboard and mouse settings, etc.
  • Environment: environment variable definitions
  • Keyboard Layout
  • Network: network drive mappings
  • Printers: printer connection settings
  • Software: user specific software preferences

59
HKEY_CLASSES_ROOT
  • Contains file extension association (used by the
    shell Explorer.Exe) and COM servers
    registration (classes, interfaces, type
    libraries, applications, prog IDs)
  • Actual data comes from two sources
  • HKLM\Software\Classes
  • Per-user class registration in
    HKCU\Software\Classes, stored in
    \Documents and Settings\<username>\Local Settings\Application Data\Microsoft\Windows\UsrClass.dat
  • The addition of per-user classes is new to
    Windows 2000
  • Sub-keys
  • CLSID: COM class registration
  • Interface: COM interface proxy/stub
    registration
  • TypeLib: registered type libraries
  • AppID: COM server application registration (not
    the same as COM+ applications)

60
HKEY_LOCAL_MACHINE
  • Contains information relevant to the machine
    regardless of the logged on user
  • Sub-keys
  • Hardware: contains device descriptions detected
    during the boot process
  • SAM: contains local users and group information
  • Actually a link to HKLM\Security\SAM
  • Security: system-wide security policy and user
    rights assignments
  • Software: system-wide configuration for system
    boot as well as third party software settings
    (directories, passwords, etc.)
  • System: system-wide configuration needed to boot
    the system, such as device drivers and Win32
    services to load

61
Hive Store Path
  • Non volatile registry data is stored in files
  • HKLM\System
  • System32\Config\System
  • HKLM\SAM
  • System32\Config\SAM
  • HKLM\Security
  • System32\Config\Security
  • HKLM\Software
  • System32\Config\Software

62
Performance Monitor
  • Allows monitoring various system and process
    activities
  • Can be reached from the Administrative Tools
  • Applications can create their own add-in to the
    Performance monitor
  • Examples SQL Server, Internet Information Server
    (IIS)
  • Search the Performance Monitor documentation

63
Interrupt Dispatching
An interrupt arrives while user or kernel mode code is running.
Interrupt Dispatch Routine:
  • Record machine state (trap frame) to allow resume
  • Mask equal or lower IRQL interrupts
  • Call appropriate ISR
  • Dismiss interrupt
  • Restore machine state
ISR:
  • Tell the device to stop interrupting
  • Start next operation on device, etc.
  • Request a DPC
  • Return to caller
64
Interrupt Request Level (IRQL)
  • Each interrupt has an associated Interrupt
    Request Level (IRQL)
  • Can be considered its priority
  • Each processor's context includes its current
    IRQL
  • A CPU always runs the highest IRQL code
  • Servicing an interrupt raises the processor IRQL
    to the level of the interrupt's IRQL
  • This masks all interrupts at that IRQL and lower
  • Dismissing an interrupt restores the processor's
    IRQL to that prior to the interrupt
  • Allowing any previously masked interrupts to be
    serviced
  • A high IRQL interrupt preempts a lower IRQL one
  • WinDbg's !pcr <processor> shows current IRQL
  • ? IRQL shows the same info in SoftICE

65
IRQLs on Intel 80x86
Hardware interrupts:
HIGH_LEVEL (31)
POWER_LEVEL (30)
IPI_LEVEL (29)
CLOCK1_LEVEL (28)
PROFILE_LEVEL (27)
Device n ... Device 1
Software interrupts:
DISPATCH_LEVEL (2)
APC_LEVEL (1)
Thread priorities (0-31):
PASSIVE_LEVEL (0)
66
IRQL Levels (1)
  • PASSIVE_LEVEL (0)
  • The normal IRQL level
  • User mode code always runs at this level
  • APC_LEVEL (1)
  • Used for special kernel APCs
  • Not really interesting for driver writers
  • DISPATCH_LEVEL or DPC_LEVEL (2)
  • Many driver routines run at this IRQL
  • The kernel scheduler runs at this level
  • If the CPU runs code at this (or higher) level,
    no context switching will occur on that CPU until
    IRQL drops below this level
  • Also no waiting on kernel objects (requires
    scheduler)
  • Page faults cannot be serviced at this level
  • Code running at this or higher IRQL must always
    access non-paged memory

67
IRQL Levels (2)
  • Device IRQL (DIRQL) (3-26)
  • Reserved for hardware devices
  • The level that an ISR runs at
  • Always greater than DISPATCH_LEVEL (2)
  • HIGH_LEVEL (31)
  • The highest level possible
  • If code runs at this level, nothing can interfere
    on that CPU
  • However, other CPUs are not affected
  • Use only when absolutely necessary!
  • Other levels exist for kernel internal use

68
Manipulating IRQL
VOID KeRaiseIrql(IN KIRQL NewIrql, OUT PKIRQL OldIrql);
VOID KeLowerIrql(IN KIRQL NewIrql);
  • KIRQL is just a UCHAR
  • Make sure KeRaiseIrql actually raises the IRQL
    and KeLowerIrql actually lowers it!
  • However, raising by zero is OK
  • To get the current IRQL call KeGetCurrentIrql

69
Some IRQL Usage Rules
  • Each data set is protected at a particular IRQL
  • Must be at that IRQL to modify
  • Unsynchronized reading is sometimes acceptable
  • A low-IRQL routine must raise its IRQL to gain
    access to data synchronized at that IRQL
  • High-IRQL code must not modify data that may be
    written by low-IRQL code
  • Watch for IRQL restrictions on every routine in
    the DDK!
  • In general
  • KeRaiseIrql and KeLowerIrql should come in pairs
  • Always leave a routine at same IRQL level it was
    called
  • If you raise IRQL, ensure all exit paths lower
    the IRQL
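
The raise/lower pairing rule can be shown with a user-mode model of a single CPU. This is only a sketch: raise_irql, lower_irql and update_shared are hypothetical stand-ins for KeRaiseIrql, KeLowerIrql and a driver routine, and the "IRQL" here is just a variable, so nothing is actually masked.

```c
#include <assert.h>

typedef unsigned char irql_t;            /* the slides note KIRQL is a UCHAR */
enum { PASSIVE = 0, APC = 1, DISPATCH = 2, HIGH = 31 };

static irql_t current_irql = PASSIVE;    /* one simulated CPU */

irql_t get_current_irql(void) { return current_irql; }

/* Model of KeRaiseIrql: must raise (or keep) the level, never lower it. */
void raise_irql(irql_t new_irql, irql_t *old_irql)
{
    assert(old_irql != 0 && new_irql >= current_irql);
    *old_irql = current_irql;
    current_irql = new_irql;
}

/* Model of KeLowerIrql: must lower (or keep) the level, never raise it. */
void lower_irql(irql_t new_irql)
{
    assert(new_irql <= current_irql);
    current_irql = new_irql;
}

static int shared_counter;   /* data treated as protected at DISPATCH level */

/* Raises to the data's IRQL, updates it, and restores the caller's IRQL. */
int update_shared(int delta)
{
    irql_t old;
    raise_irql(DISPATCH, &old);
    shared_counter += delta;
    int result = shared_counter;
    lower_irql(old);         /* single exit path: IRQL always restored */
    return result;
}
```

Because update_shared restores the saved level rather than a hard-coded one, it leaves the IRQL unchanged whether it is called at passive or at dispatch level.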

70
Deferred Procedure Call (DPC)
  • Used to defer processing from higher (device)
    IRQL to a lower (dispatch) level
  • Implemented via DPC objects and software
    interrupts
  • DPC object defines a procedure and arguments
  • Executes specified procedure at dispatch IRQL
    (also "dispatch level", "DPC level")
  • Used heavily for driver "after interrupt"
    functions

71
DPC Object
typedef struct _KDPC {
    CSHORT Type;
    UCHAR CpuNumber;
    UCHAR Importance;
    LIST_ENTRY DpcListEntry;
    PKDEFERRED_ROUTINE DeferredRoutine;
    PVOID DeferredContext;
    PVOID SystemArgument1;
    PVOID SystemArgument2;
    PULONG_PTR Lock;
} KDPC, *PKDPC;
  • Defined in <wdm.h> and <ntddk.h>

72
DPC Queue
  • A list of "work requests"
  • One queue per CPU
  • But one CPU can process a DPC from another CPU's
    queue
  • Implicitly ordered by time (FIFO)
  • Can be somewhat manipulated with the Importance
    field
  • Each specifies procedure and arguments
  • Processed after all higher-IRQL work (interrupts)
    completed
  • DpcListEntry field member holds pointers to next
    and previous DPC object (if any)

73
The DPC Processing Loop
Software interrupt at DPC level
Attempt to remove a DPC object from the DPC queue
Did we get one?
No (queue empty): done
Yes: call DPC routine with arguments from the DPC object
DPC routine executes and returns; try the queue again
  • DPCs are queued and cannot interrupt each other
    on any one CPU
  • Each CPU in the system might be executing a
    different DPC at the same time
74
DPC Serialization
  • A DPC object can be on the queue only once
  • Trying to insert a DPC object into the queue if
    it's already there will do nothing, except return
    FALSE
  • If the DPC object was actually queued, TRUE is
    returned from IoRequestDpc or KeInsertDpcQueue,
    FALSE otherwise
  • The DPC object is removed from the queue before
    the DPC routine executes
  • A DPC can be queued while its previous instance's
    DPC routine is still executing
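
The serialization rules above can be sketched with a small user-mode model. Everything here is hypothetical (`SimDpc`, `sim_insert_dpc`, `sim_drain_dpcs` and the sample routine are illustrative names, not DDK types or routines), and it models a single queue with no interrupts or locking. It shows three things from the slide: a second insert of a queued object returns false, the object is dequeued before its routine is called, and a routine can therefore requeue its own DPC object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-mode model; names are illustrative, not DDK types. */
typedef struct SimDpc SimDpc;
typedef void (*SimDpcRoutine)(SimDpc *dpc, void *context);

struct SimDpc {
    SimDpcRoutine routine;
    void *context;
    bool queued;          /* true while the object sits on the queue */
    SimDpc *next;
};

typedef struct {
    SimDpc *head;
    SimDpc *tail;
} SimDpcQueue;

/* Returns true if the object was queued, false if it was already there. */
static bool sim_insert_dpc(SimDpcQueue *q, SimDpc *dpc)
{
    if (dpc->queued)
        return false;
    dpc->queued = true;
    dpc->next = NULL;
    if (q->tail)
        q->tail->next = dpc;
    else
        q->head = dpc;
    q->tail = dpc;
    return true;
}

/* Drain loop: dequeue first, then call the routine (FIFO order). */
static int sim_drain_dpcs(SimDpcQueue *q)
{
    int executed = 0;
    while (q->head) {
        SimDpc *dpc = q->head;
        q->head = dpc->next;
        if (!q->head)
            q->tail = NULL;
        dpc->queued = false;              /* removed before it runs */
        dpc->routine(dpc, dpc->context);
        executed++;
    }
    return executed;
}

/* Sample routine: counts calls and requeues its own object once. */
typedef struct {
    SimDpcQueue *queue;
    int calls;
} SimCounterContext;

static void sim_counting_routine(SimDpc *dpc, void *context)
{
    SimCounterContext *ctx = context;
    ctx->calls++;
    if (ctx->calls == 1)
        sim_insert_dpc(ctx->queue, dpc);  /* succeeds: already dequeued */
}
```

Clearing the `queued` flag before the call is what lets the requeue inside the routine succeed; if the flag were cleared after the routine returned, that insert would be rejected.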

75
Synchronization on MP Systems
  • Raising the IRQL on one processor does not mask
    interrupts on other processors
  • So, IRQL based synchronization works only for a
    single CPU
  • DPCs are not necessarily MP-safe
  • Two DPCs might run simultaneously on two CPUs and
    access same data
  • A spin lock is used to protect shared data in an
    MP system
  • A driver should always assume it's running on an
    MP system and use a spin lock

76
The Spin Lock
  • Synchronization on MP systems uses IRQLs within
    each CPU and spin locks to coordinate among the
    CPUs
  • A spin lock is just a data cell in memory
  • It is accessed with a test and modify operation,
    atomic across all processors
  • KSPIN_LOCK is an opaque type, typedef'ed as
    ULONG_PTR (ULONG on 32-bit systems)
  • On the free build of Windows 2000/XP only bit 0
    is used

77
Spin Lock Concepts
  • Spin lock acquisition and release routines
    implement a one-owner-at-a-time algorithm
  • Analogous in concept to mutexes
  • But no ownership!
  • Where do spin locks come from?
  • Some are defined in the system
  • Some are automatically associated with I/O
    devices and related objects
  • Device drivers can create and use additional spin
    locks
  • A spin lock is either free or owned by a specific
    CPU
  • A CPU should own a spin lock that protects shared
    data before manipulating it

78
Spin Locks and IRQLs
  • Each spin lock has an associated IRQL
  • DISPATCH_LEVEL
  • "executive spin locks" - acquired by
    KeAcquireSpinLock or KeAcquireSpinLockAtDpcLevel
  • Device IRQL
  • "interrupt object spin locks" - acquired by
    KeSynchronizeExecution, interrupt dispatcher
  • HIGH_LEVEL
  • Not directly acquired by drivers; acquired by
    some routines ("interlocked" routines)
  • Code that wants to own a spin lock first raises
    to the associated IRQL to synchronize with other
    requests on the same CPU
  • Must follow IRQL rules, otherwise a deadlock is
    possible!

79
Acquiring a Spin Lock
  • IRQL is implicit in the choice of routine
  • KeAcquireSpinLock uses IRQL = DISPATCH_LEVEL
  • KeAcquireSpinLockAtDpcLevel does not change the
    IRQL
  • KeSynchronizeExecution and interrupt dispatcher
    use SyncIrql found in interrupt object
  • ExInterlockedXxx routines use IRQL = HIGH_LEVEL
  • Spin locks should not be requested if already
    owned
  • Causes a deadlock!

Flowchart: Raise to associated IRQL ->
test and set the spin lock bit ->
was it previously clear?
No: spin (test and set again)
Yes: this CPU now owns the spin lock
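
The test-and-set step in the flowchart can be sketched in portable C11. This is an illustrative user-mode lock, not the kernel's KSPIN_LOCK implementation: `SimSpinLock` and the `sim_spin_*` functions are hypothetical names, and the "raise to associated IRQL" step is left out because it has no user-mode equivalent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative user-mode lock; not the kernel's KSPIN_LOCK implementation. */
typedef struct {
    atomic_flag bit;      /* set = owned, clear = free */
} SimSpinLock;

static void sim_spin_init(SimSpinLock *lock)
{
    atomic_flag_clear(&lock->bit);
}

/* One test-and-set attempt: returns true if the bit was previously clear. */
static bool sim_spin_try_acquire(SimSpinLock *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->bit,
                                              memory_order_acquire);
}

/* Spin until a test-and-set finds the bit clear. */
static void sim_spin_acquire(SimSpinLock *lock)
{
    while (!sim_spin_try_acquire(lock)) {
        /* busy-wait: another owner holds the lock */
    }
}

/* Clear the bit so the next test-and-set succeeds. */
static void sim_spin_release(SimSpinLock *lock)
{
    atomic_flag_clear_explicit(&lock->bit, memory_order_release);
}
```

Note that the lock records no owner, which matches the "no ownership" point made earlier: a second `sim_spin_acquire` from the same thread would spin forever, which is the deadlock the slide warns about.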
80
Spin Locks and Single CPU Systems
  • All drivers should use spin lock routines, not
    KeRaise/LowerIrql
  • Assume the driver runs or might run on an MP
    system
  • Except for code that is inherently single
    threaded (e.g. DriverEntry routine, Unload
    routine)
  • Single-CPU and multi-CPU systems use different
    versions of the kernel
  • NTOSKRNL.EXE vs. NTKRNLMP.EXE
  • On single CPU systems, free build
  • KeAcquire/ReleaseSpinLock ignore the spin lock
    argument and simply call KeRaise/LowerIrql
  • KeAcquire/ReleaseSpinLockAtDpcLevel do nothing
  • KeSynchronizeExecution does only IRQL changes

81
IRQLs and Spin Locks in the Checked Build
  • Spin lock routines actually acquire and release
    the spin lock, even on a single CPU system
  • This means the MP kernel is always running in the
    checked build, regardless of the number of CPUs
  • Consistency checks
  • Check for release of a non-owned spin lock
  • Check for recursive spin lock acquisition
  • Many system routines check for correct IRQL
  • Bugcheck if more than 250msec spent at
    DISPATCH_LEVEL IRQL or above
  • Many other checks
  • The free build does not make these checks

82
Queued Spin Locks
  • New to Windows XP
  • Efficient version of dispatch-level spin locks
    (IRQL 2)
  • Ensure that the spin lock is acquired on a
    first-come, first-served basis among CPUs
  • Check KeAcquireInStackQueuedSpinLock and
    KeReleaseInStackQueuedSpinLock in the DDK

83
Windows 2000/XP Internals
  • Module 3
  • Memory Management

84
Agenda
  • Memory Manager Features
  • Virtual Memory vs. Physical Memory
  • Working Set
  • Memory Mapped Files and Shared Memory

85
Memory Manager Features
  • The Windows 2000/XP Memory Manager provides
  • 4GB flat address space per process
  • Maximum RAM
  • 4GB (Home (XP), Professional, Server)
  • 8GB (Advanced Server)
  • 64GB (Datacenter Server)
  • Sharing pages between processes
  • Memory mapped files
  • True 64-bit OS on Intel Itanium (XP only)

86
4GB Virtual Address Space
  • 2 GB per-process
  • Address space of one process is not directly
    reachable from other processes
  • 2 GB systemwide
  • The operating system is loaded here, and appears
    in every process's address space

Address space layout (figure):
00000000 - 7FFFFFFF: .EXE code, globals, per-thread
user mode stacks, process heaps, .DLL code (unique
per process, accessible in user or kernel mode)
From 80000000: Exec, Kernel, HAL, drivers, per-thread
kernel mode stacks, Win32K.Sys (system wide,
accessible only in kernel mode)
From C0000000 to FFFFFFFF: process page tables,
hyperspace (per process, accessible only in kernel
mode); file system cache, paged pool, non-paged pool
(system wide, accessible only in kernel mode)
87
3GB Address Space Option
  • Only available on x86 Windows 2000 Advanced and
    Datacenter Server
  • Boot with /3GB option in BOOT.INI
  • Chief loser in system space is file system
    cache
  • Expands per-process address space
  • But image must be marked as large address
    space aware

Address space layout with /3GB (figure):
00000000 - BFFFFFFF: .EXE code, globals, per-thread
user mode stacks, .DLL code, process heaps (unique
per process, accessible in user or kernel mode)
C0000000 - FFFFFFFF: process page tables, hyperspace
(per process, accessible only in kernel mode); Exec,
kernel, HAL, drivers, etc. (system wide, accessible
only in kernel mode)
88
Virtual Address Translation
  • Hardware translates each virtual address to a
    physical address

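Figure: Address Translation (hardware). The virtual
address is split into a virtual page number and a
byte within page. The virtual page number is
translated through the page directory and page
tables, or through the translation lookaside buffer
(TLB), which holds recently used page table entries.
The result is a physical page number plus the same
byte within page. If the page is not valid, a page
fault occurs (an exception handled by software).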
89
Virtual Address Translation Example (x86)
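Figure: A 32-bit virtual address (bits 31 to 0) is
split into 10 bits, 10 bits and 12 bits. CR3 locates
the page directory (one per process, 1024 entries).
The first 10 bits select a PDE, which locates a page
table (1024 entries). The next 10 bits select a PTE,
which locates the page in RAM. The last 12 bits are
the byte within page.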
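
The 10/10/12 split can be written out as plain bit arithmetic. This is a minimal sketch of the field extraction only; the struct and function names are made up for illustration, and no actual page directory or page table is walked.

```c
#include <assert.h>
#include <stdint.h>

/* The three fields of a 32-bit x86 virtual address (4KB pages). */
typedef struct {
    uint32_t pde_index;    /* bits 31..22: index into the page directory */
    uint32_t pte_index;    /* bits 21..12: index into the page table */
    uint32_t byte_offset;  /* bits 11..0:  byte within the 4KB page */
} X86AddressParts;

/* Split a virtual address into its 10-bit, 10-bit and 12-bit fields. */
static X86AddressParts split_virtual_address(uint32_t va)
{
    X86AddressParts parts;
    parts.pde_index   = va >> 22;
    parts.pte_index   = (va >> 12) & 0x3FFu;
    parts.byte_offset = va & 0xFFFu;
    return parts;
}
```

For example, the address 0x80001234 splits into page directory index 0x200, page table index 0x001 and byte offset 0x234. Ten bits per index is why each table has 1024 entries, and twelve offset bits is why a page is 4KB.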
90
PDE and PTE Layout
  • Memory management is always done in pages
  • Page size is determined by the CPU architecture
  • 4KB on Intel's 80x86
  • 8KB on Digital's Alpha
  • Each PDE (Page Directory Entry) and PTE (Page
    Table Entry) is 32 bits
  • Upper 20 bits are the Page Frame Number (PFN)
  • Bit 0 is the Valid bit
  • If set the page exists in RAM
  • Otherwise accessing the page will cause a page
    fault
  • Other bits exist (check Inside Windows 2000)
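
A short sketch of how the two fields named above could be read out of a 32-bit entry. The helper names are hypothetical and only the Valid bit and the PFN are modeled; the other bits mentioned in the last bullet are ignored. A false return stands in for the page fault case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_VALID_BIT 0x1u   /* bit 0 */
#define PAGE_SHIFT    12     /* 4KB pages: PFN lives in the upper 20 bits */

/* True if bit 0 (Valid) is set, i.e. the page is in RAM. */
static bool pte_is_valid(uint32_t pte)
{
    return (pte & PTE_VALID_BIT) != 0;
}

/* The upper 20 bits of the entry. */
static uint32_t pte_page_frame_number(uint32_t pte)
{
    return pte >> PAGE_SHIFT;
}

/* Builds the physical address for a valid entry.
   Returns false (modeling a page fault) when the Valid bit is clear. */
static bool translate_with_pte(uint32_t pte, uint32_t byte_offset,
                               uint32_t *physical)
{
    if (!pte_is_valid(pte))
        return false;
    *physical = (pte_page_frame_number(pte) << PAGE_SHIFT)
              | (byte_offset & 0xFFFu);
    return true;
}
```

With the made-up entry value 0x12345067, the PFN is 0x12345, so byte offset 0xABC maps to physical address 0x12345ABC; clearing bit 0 of the same entry makes the translation fail instead.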

91
Address Windowing Extensions
  • Temporary solution to providing access to large
    amounts of physical memory (>4GB)
  • Platform independent
  • Applications allocate physical memory
  • Then map views of physical memory into their
    virtual address space (can do I/Os to it)
  • Win32 functions
  • AllocateUserPhysicalPages
  • MapUserPhysicalPages
  • A true 64 bit OS (XP on Itanium) will make this
    solution obsolete

92
Virtual Memory Pages
  • Page states
  • Free: not mapped
  • Access will cause an Access Violation exception
  • Reserved: not mapped, but new allocations will
    not use that address space
  • Allows allocations later as needed
  • Access will cause an Access Violation exception
  • Committed: mapped to RAM or a page file
  • Access is allowed, although a page fault
    exception might be raised to fetch the page from
    disk
  • Page protection
  • Each page in virtual memory can be protected
  • Read only, read/write, execute only, no access,
    etc.

93
Sharing Pages (1)
Process B
Process A
RAM
Kernel32.DLL code
Kernel32.DLL code
Kernel32.DLL code
Process B code
EXE code
EXE code
Process A code
94
Sharing Pages (2)
  • Code pages are shared between processes
  • 2 or more processes based on the same image
  • DLL code
  • However, DLLs must be loaded at the same address
  • Data pages (read/write) are shared at first
  • But with special protection called Copy-On-Write
  • If one process changes the data, an exception is
    caught by the Memory Manager, which creates a
    private copy of the accessed page for that
    process, removing the Copy-On-Write protection

95
Memory APIs in User Mode (1)
High Level
C/C++ runtime API
Local/Global API
Heap API
Virtual API
Low Level
96
Memory APIs in User Mode (2)
  • Virtual API
  • VirtualAlloc, VirtualFree, VirtualProtect, etc.
  • Lowest level API
  • Works on page granularity only
  • Allows reserving and/or committing of memory
  • Good for large allocations
  • Heap API
  • HeapCreate, HeapAlloc, HeapFree, etc.
  • Uses the Virtual API internally
  • Allows small allocations without wasting pages
  • C/C++ runtime
  • malloc, realloc, free, operator new, etc.
  • Uses the Heap API (usually compiler dependent)
  • Uses the Default Heap (always exists per process)
  • Local/Global API
  • LocalAlloc, GlobalAlloc, GlobalLock, LocalFree,
    etc.
  • Mostly for compatibility with Win16
  • But some new APIs use it (e.g.
    CreateStreamOnHGlobal)
  • Global/Local are practically the same

97
Virtual View of a Process
  • Use Process Walker (pwalk.exe) from Resource kit

98
Committed Memory
  • (almost) All committed virtual memory is mapped
    to files
  • Except non-paged pool
  • Ranges of virtual address space are mapped to
    ranges of blocks within disk files
  • These files are the backing store for virtual
    address space
  • Commonly-used files are
  • The system paging file(s)
  • For writeable, non-shareable pages
  • Executable program or DLL
  • For read-only application-defined code and for
    shareable data
  • Can set up additional file/virtual address space
    relationships at run time (Memory Mapped Files)

99
Page File(s)
  • Backup storage for writeable, non-shareable
    committed memory
  • Up to 16 page files are supported
  • On different partitions
  • Initial size and maximum size can be set
  • Using the System applet in Control Panel
  • Named PageFile.Sys on disk
  • Created contiguous on boot
  • Initial value should be maximum of normal usage
  • Recommended 2 X size of RAM
  • When page file space runs low
  • "System running low on virtual memory"
  • First time: before page file expansion
  • Second time: when committed bytes reach the
    commit limit
  • "System out of virtual memory"
  • Page files are full

100
Working Set
  • Working set: the subset of the process's virtual
    address space residing in physical memory
  • Essentially, all the pages the process can
    reference without incurring a page fault
  • Upper limit on size for each process
  • When limit is reached, a page must be released
    for every page that's brought in (working set
    replacement)
  • Working set limit: the maximum number of pages
    the process can own
  • Default value for new processes
  • System-wide maximum computed at boot time

101
Working Set Replacement
  • When working set of process is too large
  • Working set must be trimmed
  • Algorithm is dependent on number of CPUs
  • Single CPU: Least Recently Used (LRU)
  • Most needed pages remain in the working set
  • Multi CPU: First In First Out (FIFO)
  • First page to arrive on the working set is also
    first page to leave the working set
  • LRU is not used, in order to avoid invalidating
    TLB entries on other CPUs
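
The FIFO policy described for multi-CPU systems can be sketched as a toy model. This is not the Memory Manager's code: `FifoWorkingSet`, `ws_reference` and the limit of 3 pages are assumptions chosen to keep the example small. It shows the defining property that a hit does not change replacement order, so a recently used page can still be the next one to leave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WS_LIMIT 3   /* illustrative working set limit, in pages */

typedef struct {
    unsigned pages[WS_LIMIT];
    size_t count;     /* pages currently in the working set */
    size_t oldest;    /* slot holding the page that arrived first */
    unsigned faults;  /* page faults taken so far */
} FifoWorkingSet;

/* True if the page is already in the working set. */
static bool ws_contains(const FifoWorkingSet *ws, unsigned page)
{
    for (size_t i = 0; i < ws->count; i++)
        if (ws->pages[i] == page)
            return true;
    return false;
}

/* Reference a page; returns true if the reference caused a page fault. */
static bool ws_reference(FifoWorkingSet *ws, unsigned page)
{
    if (ws_contains(ws, page))
        return false;          /* hit: arrival order is NOT updated */
    ws->faults++;
    if (ws->count < WS_LIMIT) {
        ws->pages[ws->count++] = page;
    } else {
        ws->pages[ws->oldest] = page;             /* evict first arrival */
        ws->oldest = (ws->oldest + 1) % WS_LIMIT;
    }
    return true;
}
```

Referencing pages 1, 2, 3, then 1 again, then 4 evicts page 1 even though it was just used. An LRU policy would have evicted page 2 instead, which is the trade-off the slide describes.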

102
Page Faults
  • Hard page faults involve a disk read
  • Some hard page faults are unavoidable
  • Code is brought into physical memory (from EXEs
    and DLLs) via page faults
  • More than one page is read as an optimization
  • The file system cache reads data from cached
    files in response to page faults
  • Soft page faults are satisfied in memory
  • A shared page that's valid for one process can be
    faulted into other processes
  • Pages can be faulted back into a process from the
    standby and modified page lists (described later)
  • Performance counters
  • Page faults/sec versus page reads/sec
  • Demand zero faults/second

103
Balance Set Manager
  • Balance set: sum of all in-swapped working sets
  • Balance Set Manager is a system thread
  • Wakes up every second; if paging activity is high
    or memory is needed
  • Trims working sets of processes
  • If a thread is in a long user-mode wait, marks its
    kernel stack pages as pageable
  • If a process has no nonpageable kernel stacks,
    outswaps the process
  • Triggers a separate thread to do the outswap,
    gradually reducing the target process's working
    set to zero
  • Evidence: look for threads in Transition state
    in PerfMon
  • Means that kernel stack has been paged out, and
    thread is waiting for memory to be allocated so
    it can be paged back in
  • This thread also performs a scheduling-related
    function
  • Priority inversion avoidance

104
Memory Information (1)
  • Task Manager: Processes tab
  • Mem Usage: physical memory used by process
    (working set size, not working set limit)
  • VM Size: private (not shared) committed
    virtual space in processes
  • Mem Usage in the status bar is the same as commit
    charge/commit limit in the Performance tab (see
    next slide) - not the same as the Mem Usage column
    here!

105
Memory Information (2)
  • Performance Monitor: Process object
  • Working Set: working set size (not limit)
  • Private Bytes: same as VM Size from Task
    Manager Processes list
  • Virtual Bytes: committed virtual space,
    including shared pages
  • Also: in Threads object, look for threads in
    Transition state - evidence of swapping (usually
    caused by severe memory pressure)

106
Process Explode Tool (1)
  • Pview.exe from Resource kit

  • Virtual sizes of committed sections of image and
    DLLs, or total of all ("total" selected)
  • Virtual sizes of sections mapped after image
    startup (including DLLs loaded with LoadLibrary)
  • Process-private committed virtual address space
    (i.e. paging file allocation)
  • Note: "writecopy" means writeable, but not written
    to yet. Windows 2000/XP has yet to create
    process-private pages for these; they are still
    shared, and they become private commit when
    written to
  • Some, but not all, of this info is also shown by
    Process Viewer's memory detail button
107
Process Explode Tool (2)
  • Total virtual address space (committed PLUS
    reserved, private and shared)
  • WS: working set (physical)
  • PF: paging file space allocated (not necessarily
    written to!). Same as PerfMon "private bytes",
    TaskMan "VM size"
  • Systemwide paged pool (virtual) and nonpaged pool
    used by this process
  • Systemwide paged pool
  • Systemwide nonpaged pool
  • Paging file space allocated by all processes + OS
  • Note: limits in the last three groups are
    per-process limits, i.e., how much each process
    can use of these
108
Memory Information (3)
  • Commit charge total: total of private (not
    shared) committed virtual space in all processes,
    i.e., total of "VM Size" from the Processes
    display, + paged Kernel Memory
  • Commit charge limit: sum of available physical
    memory for processes + free space in paging file
109
Unassigned Physical Memory
  • System keeps unassigned (available) physical
    pages on one of several lists
  • Free page list
  • Modified page list
  • Standby page list
  • Zero page list
  • Bad page list
  • Lists are implemented by entries in the PFN
    database

110
Paging Dynamics
Figure: physical pages move between the Process
Working Sets and the Zero, Free, Standby, Modified
and Bad Page Lists. Labeled transitions: demand zero
page faults; page read from disk or kernel
allocations; working set replacement; private pages
at process exit.
111
Standby and Modified Page Lists
  • Used to
  • Avoid writing pages back to disk too soon
  • Avoid releasing pages to the free list too soon
  • The system can replenish the free page list by
    taking pages from the top of the standby page
    list
  • This breaks the association between the process
    and the physical page
  • i.e., the system no longer knows if the page
    still contains the process info
  • Pages move from the modified list to the standby
    list
  • Modified pages' contents are copied to the pages'
    backing stores (usually the paging file) by the
    modified page writer (see next slide)
  • The pages are then placed at the bottom of the
    standby page list
  • Pages can be faulted back into a process from the
    standby and modified page lists
  • The SPL and MPL form a system-wide cache of
    pages likely to be needed again

112
Modified Page Writer
  • Moves pages from modified to standby list, and
    copies their contents to disk
  • i.e., this is what writes the pag