Title: Windows XP Performance and Tuning: An Update
1. Windows XP Performance and Tuning: An Update
- Demand Technology Software
- 1020 Eighth Avenue South, Suite 6, Naples, FL 34102
- phone (941) 261-8945, fax (941) 261-5456
- e-mail markf_at_demandtech.com
- http://www.demandtech.com
2. Windows XP
- Designed to provide an upgrade path from Windows (9x, ME) to Windows NT
- Conceived with usability, not performance, in mind
- Unified desktop permits MS apps to exploit native NT technology
- Multithreading, file cache, NTFS, etc.
- Another new UI to get used to
- A minor, maintenance release of Windows NT Server (internally, version 5.1)
- Synchronized with Windows XP 64 bit support
3. Windows XP
- No major changes in the way the OS works!
- New Power Management-related Processor utilization Counters
- Many incremental improvements
- Several changes designed to enhance scalability
- Prefetching to speed up program loading (including the boot process)
- New Volume snapshot copy APIs for backing up open files with integrity
4. Windows NT evolution: new in WinXP
5. Windows NT evolution: new in WinXP
- Boot and image file prefetch
- (An attempt to answer criticisms leveled at OS designers by Jef Raskin in his influential book, The Humane Interface)
6. Windows NT evolution: new in WinXP
- Volume shadow copy
- IOCTL_VOLSNAP_FLUSH_AND_HOLD_WRITES, IOCTL_VOLSNAP_RELEASE_WRITES
7. Windows NT evolution: new in WinXP
- New Networking tab in Taskman
8. Windows NT evolution: new in WinXP
9. Windows NT evolution: new in WinXP
- New built-in security profiles
- SYSTEM
- LOCAL SERVICE
- NETWORK SERVICE
10. Windows NT evolution: new in WinXP
11. Windows XP 64 bit support
- Major new version of the OS to support new Intel 64-bit processors
- P7 chips
- 64-bit virtual addressing
- Matches look-and-feel of Windows XP 32-bit desktop and applications
12. Intel 786 IA-64 architecture
- EPIC: Explicitly Parallel Instruction Computing
- Explicit parallelism
- Predication
- Speculation
- Massive Resources
- 10 GHz by 2010
- First generation Itanium chips 800 MHz
- .013 micron fabrication process
- Second generation Itanium chips (McKinley)
- Clocked at 1.2 GHz and higher
- 400 MHz X 128 bit internal system bus
13. Intel 786 IA-64 architecture
- Very difficult to compare performance of the P7 to the P6
- Significant architectural differences
- New instruction set
- Parallel programming model
- Massive microprocessor designed for high-end applications
- Currently, requires Intel compiler optimizations that exploit its major architectural features
- Far superior Floating Point performance
14. Intel 786 IA-64 architecture
- Parallel Execution Resources
- 2 Memory Units
- 2 Integer Units
- 2 Floating Point Units
- 3 Branch Units
- all designed to execute up to six separate
instructions in parallel
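The unit counts above suggest a simple capacity check: a group of instructions can only issue together if it needs no more units of each kind than the processor has. The sketch below is a deliberately simplified model, not the real Itanium dispersal logic; the one-letter unit codes (M, I, F, B) and the function name are assumptions made for illustration.

```python
from collections import Counter

# Unit counts as listed on the slide. The one-letter codes are an assumption.
UNITS = {"M": 2, "I": 2, "F": 2, "B": 3}
MAX_PARALLEL = 6  # "up to six separate instructions in parallel"

def fits_in_one_cycle(unit_codes):
    """Return True if the requested units do not exceed the listed resources.

    unit_codes is a string such as "MFIMIB", one letter per instruction.
    Real hardware applies further template and ordering rules not modelled here.
    """
    if len(unit_codes) > MAX_PARALLEL:
        return False
    demand = Counter(unit_codes)
    return all(count <= UNITS.get(unit, 0) for unit, count in demand.items())

print(fits_in_one_cycle("MFIMIB"))  # two M, two I, one F, one B
print(fits_in_one_cycle("MMIMMI"))  # four memory operations, only two M units
```

Under this model the first group fits and the second does not, which is the kind of mismatch a compiler tries to avoid when it schedules instructions.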
15. Intel 786 IA-64 architecture
- Massive Resources: extended Register set
- 64-bit Instruction Pointer (IP)
- 128 64-bit GPRs, each with an associated Not a Thing (NaT) bit
- some GPRs have reserved meanings
- GR 0 is hardwired to always contain a Zero value
- GR 1 is a global data pointer (gp) for the currently addressable global data segment
- Register stacking functions for loop optimization
- 128 82-bit Floating Point Registers
- 128 64-bit dedicated Application Registers
- e.g., 8 dedicated Kernel registers (AR0-AR7)
- 64 1-bit Predicate Registers
- 8 64-bit Branch Registers
16. Intel 786 IA-64 architecture
- VLIW: Very Long Instruction Word
- 16-byte Instruction Bundles
- (aligned on 16-byte boundaries)
- 5-bit template, followed by
- 3 41-bit instruction slots
- Can be filled out with No Ops
- Compiler optimization
- Match Instruction Bundles to Execution Resources
- Instruction dispersal
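The bundle layout adds up exactly: 5 template bits plus 3 x 41 slot bits is 128 bits, or 16 bytes. The following sketch packs and unpacks such a bundle. It assumes the template occupies the low-order 5 bits and that the bundle is stored little-endian; the function names are hypothetical and the slot contents are treated as opaque integers.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
BUNDLE_BYTES = 16  # 5 + 3 * 41 = 128 bits

def pack_bundle(template, slots):
    """Combine a 5-bit template and three 41-bit slots into 16 bytes."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three slots")
    value = template
    for index, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("each slot must fit in 41 bits")
        value |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return value.to_bytes(BUNDLE_BYTES, "little")

def unpack_bundle(raw):
    """Split 16 bytes back into (template, [slot0, slot1, slot2])."""
    if len(raw) != BUNDLE_BYTES:
        raise ValueError("a bundle is exactly 16 bytes")
    value = int.from_bytes(raw, "little")
    template = value & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [(value >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
             for i in range(3)]
    return template, slots

raw = pack_bundle(0x10, [1, 2, 3])
print(len(raw), unpack_bundle(raw))
```

A slot that the compiler cannot fill with useful work would simply carry a no-op encoding, which is why poorly scheduled code wastes issue bandwidth.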
17. Intel 786 IA-64 architecture
- Memory Latency
- Instructions executing in parallel all stall
during memory waits
18. Intel 786 IA-64 architecture
- Strategies to minimize memory latency
- Instructions executing in parallel all stall during memory waits
- Utilizes a Register stack for passing parameters to and from functions
- Function arguments do not have to be loaded from memory
- Register stack overflows into process virtual memory
- Speculative Loads from memory
19. Intel 786 IA-64 architecture
- Speculation
- Data speculation
- Advanced Load with an associated Check to ensure that there was no intermediate store instruction
- ld8.a r6=[r8] makes an entry in the ALAT (Advanced Load Address Table)
- ld8.c r6=[r8] is a zero cycle Check instruction that must be issued prior to using the data loaded in r6 speculatively
- A Store into memory at the address in r8 invalidates the ALAT entry, causing the processor to recover the Load
- Control speculation
- Speculative Load moved ahead of a Branch instruction, with an associated Check executed only on the path that actually uses the data
- ld8.s r6=[r8] (a deferred exception sets the NaT bit on r6, which the Check detects)
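The interplay of advanced load, store, and check can be illustrated with a toy model. This is only a sketch of the idea, not how the hardware is built: the class and method names are hypothetical, the ALAT is modelled as a plain dictionary keyed by target register, and memory is a dictionary of addresses.

```python
class ToyMachine:
    """Toy model of data speculation with an Advanced Load Address Table."""

    def __init__(self, memory):
        self.memory = dict(memory)
        self.regs = {}
        self.alat = {}  # target register -> address of its advanced load

    def ld_a(self, reg, addr):
        """Advanced load: load early and record the address in the ALAT."""
        self.regs[reg] = self.memory[addr]
        self.alat[reg] = addr

    def store(self, addr, value):
        """Store: write memory and drop ALAT entries for the same address."""
        self.memory[addr] = value
        self.alat = {r: a for r, a in self.alat.items() if a != addr}

    def ld_c(self, reg, addr):
        """Check load: reload only if the ALAT entry is gone.

        Returns True when recovery (a reload) was needed.
        """
        if self.alat.get(reg) == addr:
            return False
        self.regs[reg] = self.memory[addr]
        return True

m = ToyMachine({0x100: 1})
m.ld_a("r6", 0x100)       # load hoisted above a possibly conflicting store
m.store(0x100, 7)         # the store does conflict
print(m.ld_c("r6", 0x100), m.regs["r6"])
```

When no conflicting store intervenes, the check finds the ALAT entry intact and the early-loaded value is used as is, which is where the latency saving comes from.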
20. Intel 786 IA-64 architecture
- Predication
- Conditional execution of an instruction based on a qualifying predicate value
- Contained in a Predicate Register
- Uses
- If-conversion: remove branches from IF-THEN-ELSE constructs and execute in-line predicated instructions
- Loop optimizations (control parallel execution)
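If-conversion can be sketched in a few lines. The model below is illustrative only: Python has no predicate registers, so a hypothetical helper `predicated` stands in for an instruction guarded by a qualifying predicate, committing its result only when the predicate is true.

```python
def branchy_abs_diff(a, b):
    """Original IF-THEN-ELSE form, with a branch."""
    if a > b:
        return a - b
    else:
        return b - a

def predicated(pred, compute, old_value):
    """Model of '(pred) instruction': commit the result only if pred is true."""
    return compute() if pred else old_value

def if_converted_abs_diff(a, b):
    """Same logic after if-conversion: a compare sets two complementary
    predicates, and both arms appear in line, each guarded by one of them."""
    p1 = a > b
    p2 = not p1
    result = 0
    result = predicated(p1, lambda: a - b, result)
    result = predicated(p2, lambda: b - a, result)
    return result

print(if_converted_abs_diff(3, 10), if_converted_abs_diff(10, 3))
```

Because the two guarded instructions no longer depend on a branch outcome, they can be scheduled side by side in the same instruction group.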
21. HP i2000 Itanium Workstation
- Uses first generation Itanium chip
- 733 MHz
- 4.2 GB/sec system bus
- 1 GB RAM
- DVD/CD drive
- Ethernet port
- Etc.
22. HP i2000 Itanium Workstation
- Uses first generation Itanium chip
- Install Evaluation copy of Windows XP 64 bit from bootable CD-ROM
- Test Performance SeNTry collection agent
23. 64-bit Address Space
- One uniform 64-bit Virtual address space
- 7152 GB Process address spaces are built on demand
0
User Mode User Space
6fc 0000 0000
Kernel Mode User Space
1fff ff00 0000 0000
User Page Tables
2000 0000 0000 0000
Session Space
3fff ff00 0000 0000
Session Space Page Tables
e000 0000 0000 0000
System Space
e000 0600 0000 0000
System Space Page Tables
ffff ff00 0000 0000
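The 7152 GB figure follows directly from the first boundary in the map: user-mode space runs from 0 up to 6fc 0000 0000. A quick check of the arithmetic, assuming binary gigabytes (2^30 bytes); the constant and function names are chosen for illustration:

```python
GB = 1 << 30  # bytes per gigabyte (binary)

USER_SPACE_START = 0x0
USER_SPACE_END = 0x6FC_0000_0000  # "6fc 0000 0000" in the map

def size_in_gb(start, end):
    """Size of the address range [start, end) in binary gigabytes."""
    return (end - start) // GB

print(size_in_gb(USER_SPACE_START, USER_SPACE_END))  # 7152
```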
24. 64-bit Address Space
25. 64-bit Windows Applications
- WOW64 provides emulation services for 32-bit applications
- Thunking in User mode is performed to extract arguments from the 32-bit stack, extend them to 64 bits, then make the native 64-bit system call to ntdll.dll.
- WOW64.dll, WOW64cpu.dll, and WOW64win.dll increase the size of the application's working set significantly
- System calls redirected to systemroot\SysWOW64 for 32-bit DLLs
26. 64-bit Windows Applications
- WOW64.dll, WOW64cpu.dll, and WOW64win.dll increase the size of the application's working set significantly
27. 64-bit Windows Programming
- New data types
- DWORD32: 32-bit unsigned integer
- DWORD64: 64-bit unsigned integer
- INT32: 32-bit signed integer
- INT64: 64-bit signed integer
- LONG32: 32-bit signed integer
- LONG64: 64-bit signed integer
- UINT32: Unsigned INT32
- UINT64: Unsigned INT64
- ULONG32: Unsigned LONG32
- ULONG64: Unsigned LONG64
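The point of these types is that their width and signedness stay the same regardless of platform. As a rough illustration outside of C, the table below maps each name to a Python `ctypes` type of the same width and signedness. The mapping and the helper `wrap` are assumptions made for this sketch; they are not the Windows header definitions.

```python
import ctypes

# Assumed stand-ins for the fixed-width types named on the slide.
FIXED_WIDTH = {
    "DWORD32": ctypes.c_uint32, "DWORD64": ctypes.c_uint64,
    "INT32": ctypes.c_int32,    "INT64": ctypes.c_int64,
    "LONG32": ctypes.c_int32,   "LONG64": ctypes.c_int64,
    "UINT32": ctypes.c_uint32,  "UINT64": ctypes.c_uint64,
    "ULONG32": ctypes.c_uint32, "ULONG64": ctypes.c_uint64,
}

def wrap(type_name, value):
    """Store value in the named type and read it back, showing wraparound."""
    return FIXED_WIDTH[type_name](value).value

for name, ctype in FIXED_WIDTH.items():
    print(name, ctypes.sizeof(ctype) * 8, "bits")

print(wrap("DWORD32", -1))         # all 32 bits set, read as unsigned
print(wrap("INT32", 0xFFFFFFFF))   # same bit pattern, read as signed
```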
28. 64-bit Windows Programming
- New Pointers
- POINTER_32
- A 32-bit pointer. On 32-bit Windows, this is a native pointer. On 64-bit Windows, this is a truncated 64-bit pointer.
- POINTER_64
- A 64-bit pointer. On 64-bit Windows, this is a native pointer. On 32-bit Windows, this is a sign-extended 32-bit pointer.
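Truncation and sign extension are plain bit operations, sketched here on integers standing in for pointer values. The function names are hypothetical; the sketch only models the arithmetic, not the compiler's handling of the pointer types.

```python
MASK32 = 0xFFFF_FFFF
SIGN_BIT32 = 0x8000_0000
HIGH_ONES = 0xFFFF_FFFF_0000_0000

def truncate_to_32(ptr64):
    """Keep only the low 32 bits of a 64-bit pointer value."""
    return ptr64 & MASK32

def sign_extend_to_64(ptr32):
    """Widen a 32-bit pointer value, copying bit 31 into the upper 32 bits."""
    ptr32 &= MASK32
    if ptr32 & SIGN_BIT32:
        return ptr32 | HIGH_ONES
    return ptr32

print(hex(truncate_to_32(0x0000_06FB_1234_5678)))
print(hex(sign_extend_to_64(0x7FFF_0000)))
print(hex(sign_extend_to_64(0x8000_0000)))
```

Truncation loses information whenever the upper 32 bits are non-zero, which is why a truncated pointer is only safe for addresses known to lie in the low 4 GB.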
29. 64-bit Windows Programming
- New 64-bit compiler
- macros
- _WIN64: 64-bit platform.
- _WIN32: 32-bit platform. This value is also defined by the 64-bit compiler for backward compatibility.
- _WIN16: 16-bit platform
- Inline Helper functions to convert from one data type to another
- E.g., UIntToPtr
30. Where to get more information
- Windows XP Kernel improvements create a more robust, powerful, and scalable OS, by David Solomon and Mark Russinovich, MSDN Magazine, December 2001.
- Itanium Processor Microarchitecture Reference: http://developer.intel.com/design/itanium/downloads/245474.htm
- Programming Itanium-based Systems, by Triebel, Bissell and Booth (Intel Press)
- TechNet or the Microsoft Developer Network (MSDN) CD