Title: Microsoft%20SQL%20Server%20Database%20Engine%20I/O
1Microsoft SQL ServerDatabase Engine I/O
- by Bob Dorr, Microsoft SQL Server Principle
Escalation Engineer, 1994 PresentBuilt Jan
2008
2Areas Covered
- Write Ahead Logging (WAL) Protocol
- Synchronous vs Asynchronous I/O
- Scatter / Gather I/O
- Sector alignment, Block Alignment
- Latching and a page A read walk-through
- SQL Server I/O Sizes
- Data cache maintenance
- PAE and AWE
- Read Ahead
- User Mode and Kernel Mode (SYSTRAP)
- Sparse Files and Copy On Write (COW) Pages
- Locked Pages
- Scribbler(s) and Bit flips
- Page Protection and Constant Pages
- Checksum vs Torn
- Stale Read
- Stalled I/O
3WAL Protocol
- Write Ahead Logging
- ACID (Durability Property)
- Log records secured before data
- Hardened / Stable Media
- Log contains parity bit
- Commit
- Rollback
- Trigger Snapshot
4Synchronous vs Asynchronous I/O
- Sync Wait for Completion
- Async Post and Continue
- Overlapped
- Event
- Completion Port
- SQL Server
- 98 Async Usage
- Overlapped and HasOverlappedIoCompleted
- Network Layers Use Completion Port
- Backup/Restore Use Sync Sequential Patterns
- dm_io_pending_io_requests
- Overlapped Structure
- Async Processing CPU
- Package vs Phone
5Scatter / Gather I/O
- Consolidates or Distributes
- APIs
- ReadFileScatter
- WriteFileGather
- Increases Efficiency
- Used by SQL I/O Paths
- Used by Windows Page File
Memory
Gather
Scatter
Disk
- Old Design 6.x Sorting
- AWE Availability
- WriteMultiple
- of 8K Pages
- Forward and Backward
- Buffer Pool Ramp-up
6Sector AlignmentBlock Alignment
- Sector Log Writes
- Block Performance
- Avoid Crossovers
- DiskPart/DiskPar Utilities
- Discuss with your Vendor
Alignment http//support.microsoft.com/kb/929491
To verify that an existing partition is aligned,
divide the size of the stripe unit by the
starting offset of the RAID disk group. Use the
following syntax ((Partition offset) (Disk
sector size)) / (Stripe unit size) Example of
alignment calculations in bytes for a 256-KB
stripe unit size (63 512) / 262144
0.123046875(64 512) / 262144 0.125(128
512) / 262144 0.25(256 512) / 262144
0.5(512 512) / 262144 1 These examples
shows that the partition is not aligned correctly
for a 256-KB stripe unit size until the partition
is created by using an offset of 512 sectors (512
bytes per sector).
- Double Touch
- Rewrites
- Defragment
- 4K Sectors
7Latch
- Multiple Readers (SH)
- One Writer (EX)
- Protects In-Memory Data Page
- Latch Physical Protection
- Lock Logical Protection
- User Mode
- UMS/SQLOS Aware
- Optimized FIFO Ordering
Memory (Data Pages)
BUF Array
BUF StatusLatchDatabasePageIdHash
- Flushed Rollback
- Latch Timeout
- Sub-latch
8Reading A Page
- Get Free Buffer for Read
- Acquire Exclusive (EX) Latch
- Is already in-memory/hashed?
- Add Entry to Page Hash
- Post and Record Asynchronous Read
- Continue Processing .
- Check Status (Scheduler Switch)
- Complete Validate I/O and Release Latch
0000gt uf ZwWriteFile mov r10,rcx mov
eax,5 Syscall ? Kernel Transition ret
- Page Audits
- Read retry
- Stalled I/O Warnings
- Error raised at Acquire
- Shared (SH) waiters
- PAGE_IO vs PAGE Latch
- Writing A Page
kernel transition Stuck I/O?ntdll!ZwWriteFile0
xa kernel32!WriteFile0xf6 sqlservr!DiskWriteAsy
nc0xee
9Myth Single Worker Per FileTruth Each Worker
Issues I/O
Vol 1
Vol 2
Create Database Workers Assigned by Volume
IDPrimary dbTest.MDFSecondary
dbTest.NDFLog dbTest.LDF
Worker 4
dbTest.MDF
dbTest.NDF
dbTest.LDF
Worker 5
Worker 3
Worker 1
Worker 2
Serial Planselect from dbTest.dbo.tblTestinser
t into dbTest.dbo.tblTest
Parallel Planselect from dbTest.dbo.tblTest ins
ert into dbTest.dbo.tblTest
10Data Cache Maintenance
- Memory Pressure LazyWriter
- Per NUMA Node
- Time Of Last Access (TLA)
- Recovery Interval Checkpoint
- Queue
- I/O Targets
- .LDF Usage Triggers
- Alternate Triggers (Backup, Restore, )
- Scatter/Gather Usage (WriteMultiple)
- Checkpoint Assignments
- By Ordinal Sweep
- Stalled I/O LW 0
- I/O Queue Depth gt 2
11PAE and AWE
- Physical Address Extensions
- /PAE in Boot.ini
- Boots Kernel with 36 bit addressing
- Physical Memory gt 4GB
- Virtual Address Unchanged (/2gb or /3GB)
- Automatic for Hot Add Memory Computers
- Address Windows Extension
- Windows APIs (AllocateUserPhysicalPages)
- Physical Memory Allocations
- Un/Mapped in or out of Virtual Address Range
- 32 Bit Address 4294967295 (0xFFFFFFFF) 4GB
- Interlocked Instruction lock xadd dword ptr
ecx,eax - 36 Bit Address 68719476735 (0xFFFFFFFFF) 64GB
- Multiple Instructions
- Data Pages-Only
- Locked Pages
- Windows Paging
- Windows 2000 Bugs
12Read Ahead
- 128 Pages Standard SKU
- 1024 Pages Enterprise SKU
- Uses ReadFileScatter
- Plan Based Decisions
- Power of Asynchronous I/O
13Sparse Files Copy On Write
- Usage
- Online DBCC
- Snapshot Databases
- Buffer Pool PrepareToDirty
- File Control Block (FCB) Chaining
- Sparse Allocation
- FCB Tracking
- Windows Limits
- New Page Allocations
14Advanced Protection
- What is a Scribbler?
- Data Page Audits
- None
- Torn Bits
- Checksum
- Log Block Checksum
- Constant Page
- Backup with Checksum
- DBCC Page Audit
- Stale Read Check
- SQLIOSim
15References
16Overview
- SQL Server Always Onhttp//www.microsoft.com/sql/
alwayson - SQL Server I/O Basics Chapter 1http//www.microso
ft.com/technet/prodtechnol/sql/2000/maintain/sqlIO
basics.mspx - SQL Server I/O Basics Chapter 2http//www.microso
ft.com/technet/prodtechnol/sql/2005/iobasics.mspx
17Fundamentals and Requirements
- KB230785 - SQL Server 7.0, SQL Server 2000 and
SQL Server 2005 logging and data storage
algorithms extend data reliability - KB917047 - Microsoft SQL Server I/O subsystem
requirements for the tempdb database - KB231347 - SQL Server databases not supported on
compressed volumes (except 2005 read only files)
18Subsystems
- KB917043 - Key factors to consider when
evaluating third-party file cache systems with
SQL Server - KB234656- Using disk drive caching with SQL
Server - KB46091- Using hard disk controller caching with
SQL Server - KB86903 - Description of caching disk controls in
SQL Server - KB304261- Description of support for network
database files in SQL Server - KB910716 (in progress) - Support for third-party
Remote Mirroring solutions used with SQL Server
2000 and 2005 - KB833770 - Support for SQL Server 2000 on iSCSI
technology components (applies to SQL
Server 2005)
19Design and Configuration
- White paper - Physical Database Layout and Design
- KB298402 - Understanding How to Set the SQL
Server I/O Affinity Option - KB78363 - When Dirty Cache Pages are Flushed to
Disk - White paper - Database Mirroring in SQL Server
2005 - White paper - Database Mirroring Best Practices
and Performance Considerations - KB910378 - Scalable shared database are supported
by SQL Server 2005 - MSDN article - Read-Only Filegroups
- KB156932 - Asynchronous Disk I/O Appears as
Synchronous on Windows NT, Windows 2000, and
Windows XP
20Diagnostics
- KB826433 - Additional SQL Server Diagnostics
Added to Detect Unreported I/O Problems - KB897284 - SQL Server 2000 SP4 diagnostics help
detect stalled and stuck I/O operations (applies
to SQL Server 2005) - KB828339 - Error message 823 may indicate
hardware problems or system problems in SQL
Server - KB167711 - Understanding Bufwait and Writelog
Timeout Messages - KB815436 - Use Trace Flag 3505 to Control SQL
Server Checkpoint Behavior - KB906121 - Checkpoint resumes behavior that it
exhibited before you installed SQL Server 2000
SP3 when you enable trace flag 828 - WebCast- Data Recovery in SQL Server 2005
21Certification Policy
- KB913945- Microsoft does not certify that
third-party products will work with Microsoft SQL
Server - KB841696 - Overview of the Microsoft third-party
storage software solutions support policy - KB231619 - How to use the SQLIOStress utility to
stress a disk subsystem such as SQL Server
22Utilities
- Download - SQLIO Disk Subsystem Benchmark Tool
- Download - SQLIOStress utility to stress disk
subsystem (applies to SQL Server 7.0, 2000, and
2005 - replaced with SQLIOSim and SQL Server 2008
installed in BINN)
23Blog Content
- SQL Server Urban Legends Discussedhttp//blogs.ms
dn.com/psssql/archive/2007/02/21/sql-server-urban-
legends-discussed.aspx - How It Works SQL Server Checkpoint (FlushCache)
Outstanding I/O Targethttp//blogs.msdn.com/psssq
l/archive/2008/04/11/how-it-works-sql-server-check
point-flushcache-outstanding-i-o-target.aspx - How It Works SQL Server Page Allocationshttp//b
logs.msdn.com/psssql/archive/2008/04/08/how-it-wor
ks-sql-server-page-allocations.aspx - How It Works Shapshot Database (Replica) Dirty
Page Copy Behavior (NewPage)http//blogs.msdn.com
/psssql/archive/2008/03/24/how-it-works-shapshot-d
atabase-replica-dirty-page-copy-behavior-newpage.a
spx - How It Works SQL Server 2005 I/O Affinity and
NUMA Don't Always Mixhttp//blogs.msdn.com/psssql
/archive/2008/03/18/how-it-works-sql-server-2005-i
-o-affinity-and-numa-don-t-always-mix.aspx - How It Works Debugging SQL Server Stalled or
Stuck I/O Problems - Root Causehttp//blogs.msdn.
com/psssql/archive/2008/03/03/how-it-works-debuggi
ng-sql-server-stalled-or-stuck-i-o-problems-root-c
ause.aspx - How It Works SQL Server 2005 Database Snapshots
(Replica)http//blogs.msdn.com/psssql/archive/200
8/02/07/how-it-works-sql-server-2005-database-snap
shots-replica.aspx - How It Works File Stream the Before and After
Image of a Filehttp//blogs.msdn.com/psssql/archi
ve/2008/01/15/how-it-works-file-stream-the-before-
and-after-image-of-a-file.aspx - Using SQLIOSim to Diagnose SQL Server Reported
Checksum (Error 824/823) Failureshttp//blogs.msd
n.com/psssql/archive/2008/12/19/using-sqliosim-to-
diagnose-sql-server-reported-checksum-error-824-82
3-failures.aspx - How to use the SQLIOSim utility to simulate SQL
Server activity on a disk subsystem
http//support.microsoft.com/kb/231619 - Should I run SQLIOSim? - An e-mail follow-up from
SQL PASS 2008 http//blogs.msdn.com/psssql/archiv
e/2008/11/24/should-i-run-sqliosim-an-e-mail-follo
w-up-from-sql-pass-2008.aspx - What do I need to know about SQL Server database
engine I/O? http//blogs.msdn.com/psssql/archive/
2006/11/27/what-do-i-need-to-know-about-sql-server
-database-engine-i-o.aspx - SQLIOSim is "NOT" an I/O Performance Tuning Tool
http//blogs.msdn.com/psssql/archive/2008/04/05/
sqliosim-is-not-an-i-o-performance-tuning-tool.asp
x - How It Works SQLIOSim - Running Average, Target
Duration, Discarded Buffers ...
http//blogs.msdn.com/psssql/archive/2008/11/12/h
ow-it-works-sqliosim-running-average-target-durati
on-discarded-buffers.aspx - How It Works SQLIOSim Audit Users and .INI
Control File Sections with User Count Options
http//blogs.msdn.com/psssql/archive/2008/08/19/h
ow-it-works-sqliosim-audit-users-and-ini-control-f
ile-sections-with-user-count-options.aspx - Understanding SQLIOSIM Output http//sqlblog.com/
blogs/kevin_kline/archive/2007/06/28/understanding
-sqliosim-output.aspx
24Additional Learning Resources
- Inside SQL Server 7.0 and Inside SQL Server 2000
- Written by Kalen Delaney her husband is Paul
Randle who wrote the core dbcc checks for SQL
7.0, 2000 and 2005 - The Gurus Guide to SQL Server Architecture and
Internals ISBN 0-201-70047-6 - Written by Ken after he joined Microsoft SQL
Server Support - Many chapters reviewed by developers and folks
like myself - SQL Server 2005 Practical Troubleshooting ISBN
0-321-44774-3 Ken Henderson - Authors of this book were key developers or
support team members - Cesar QP developer and leader of the QP
RedZone with Keithelm and Jackli - Sameert Developer of UMS and SQLOS Scheduler
- Santeriv Developer of the lock manager
- Slavao Developer of the SOS memory managers
and engine architect - Wei Xiao Engine developer
- Bart Duncan long time SQL EE and now
developer of the Microsoft Data Warehouse
performance focused - Bob Ward SQL Server Support Senior EE
- Advanced Windows Debugging ISBN 0-321-37446
- Written by Microsoft developers excellent
resource - Applications for Windows Jeffrey Richter
- Great details about Windows basics