Title: Parallel%20Execution%20Plans
1Parallel Execution Plans
- Joe Chang
- jchang6_at_yahoo.com
- www.sql-server-performance.com/joe_chang.asp
2Parallel Execution Plans
- Allows single query to use multiple processors
- Query should run faster but may consume more
resources - Example
- 1 thread 10 sec run time, 10 CPU-sec
- 2 threads 6 sec run time, 12 CPU-sec
3Parallel Execution Configuration
- Cost Threshold For Parallelism
- Minimum query plan threshold for considering
queries for parallel execution - Default 5 Considering increasing to 20-50 for
new systems - Max Degree of Parallelism
- Default 0 Can use all available processors
- SQL Server determines level based on available
memory and recent CPU usage
4Parallel Plan Operators
The Distribute Streams operator consumes a single
input stream of records and produces multiple
output streams. The record contents and format
are not changed. Each record from the input
stream appears in one of the output streams. This
operator automatically preserves the relative
order of the input records in the output streams.
Usually, hashing is used to decide to which
output stream a particular input record belongs.
The Repartition Streams operator consumes
multiple streams and produces multiple streams of
records. The record contents and format are not
changed. Each record from an input stream is
placed into one output stream. If this operator
is order-preserving, then all input streams must
be ordered and merged into several ordered output
streams.
The Gather Streams operator consumes several
input streams and produces a single output stream
of records by combining the input streams. The
record contents and format are not changed. If
this operator is order-preserving, then all input
streams must be ordered.
5Execution Plan Cost Formulas
Index Seek Plan Formula I/O Cost 0.006328500
0.000740741 per additional page (1GB)
0.003203425 0.000740741 per additional page
(gt1GB) CPU Cost 0.000079600 0.000001100 per
additional row
Bookmark Lookup May have changed ? I/O Cost
multiple of 0.006250000 (1GB)
multiple of 0.003124925 (gt1GB) CPU Cost
0.0000011 per row
Table Scan or Index Scan I/O 0.0375785
0.0007407 per page CPU 0.0000785 0.0000011
per row
Table Scan or Index Scan IUD I/O Cost 0.01002
0.01010 (gt100 rows) IUD CPU Cost 0.000001 per
row
6Cost Interpretation
- Time in seconds? CPU time?
Too fast for 7200RPM disk random I/Os.
0.0062500sec -gt 160/sec 0.000740741 -gt1350/sec
(8KB) -gt169/sec(64K)-gt 10.8MB/sec
About right for 1997 sequential disk transfer
rate?
S2K BOL Administering SQL Server, Managing
Servers, Setting Configuration Options cost
threshold for parallelism Opt Query cost refers
to the estimated elapsed time, in seconds,
required to execute a query on a specific
hardware configuration.
7Test Table
CREATE TABLE M3A_20 ( GroupID int NOT NULL, ID
int NOT NULL, ID2 int NOT NULL, ID3 int NOT
NULL, ID4 int NOT NULL, sID smallint NOT
NULL, bID1 bigint NOT NULL, bID2 bigint NOT
NULL, bID3 bigint NOT NULL, rMoney money NOT
NULL, rDate datetime NOT NULL, rReal real NOT
NULL, rDecimal decimal (9,4) NOT
NULL, CONSTRAINT PK_M3A_20 PRIMARY KEY
CLUSTERED ( GroupID, ID ) WITH FILLFACTOR
100 ) GO
8Data Population Script 1
SET NOCOUNT ON DECLARE _at_BatchTotal int,
_at_BatchSize int, _at_TotalRows int, _at_BatchStart int,
_at_BatchEnd int, _at_BatchRow int, _at_I int,
_at_RowsPerPage bigint , _at_Card int , _at_DistinctValues
int SELECT _at_BatchStart1, _at_BatchEnd1000,
_at_BatchTotal1000, _at_BatchSize100000,
_at_RowsPerPage100, _at_Card100000 SELECT
_at_TotalRows_at_BatchTotal_at_BatchSize SELECT
_at_I(_at_BatchStart-1)_at_BatchSize1,
_at_DistinctValues_at_TotalRows/_at_Card WHILE
_at_BatchStart lt _at_BatchEnd BEGIN BEGIN
TRANSACTION SELECT _at_BatchRow
_at_BatchStart_at_BatchSize WHILE _at_I lt _at_BatchRow
BEGIN INSERT M3A_20 (GroupID, ID, ID2, ID3,
ID4, sID, bID1, bID2, bID3, rMoney, rDate, rReal,
rDecimal) VALUES ( 1, _at_I, _at_TotalRows-_at_I1,
(_at_I-1)/_at_Card1, (_at_TotalRows-_at_I)_at_Card1,
_at_I32768, _at_I, (_at_I-1)_at_Card1,
1(_at_I-1)_at_RowsPerPage/_at_TotalRows((_at_I-1)_at_RowsPerP
age)_at_TotalRows, 10000rand(),
DATEADD(hour,_at_I3000000,'1900-01-01'),
10000rand(), 10000rand() ) IF _at__at_ERROR gt 0
BEGIN GOTO B END SET _at_I _at_I1 END COMMIT
TRANSACTION CHECKPOINT PRINT CONVERT(varchar,GETDA
TE(),121) ', row ' CONVERT(varchar,_at_BatchRow)
SET _at_BatchStart _at_BatchStart1 END B IF
_at__at_TRANCOUNT gt 0 COMMIT TRANSACTION PRINT '01
Complete ' CONVERT(varchar,GETDATE(),121) ',
row ' CONVERT(varchar,_at_BatchRow) ',
Trancount ' CONVERT(varchar(10),_at__at_TRANCOUNT)
9Data Population Script 1 Notes
- Double While Loop
- Each Insert/Update/Delete statement is an
implicit transaction - Gets separate transaction log entry
- Explicit transaction generates a single
transaction log write (max 64KB per IO) - Single TRAN for entire loop requires excessively
large log file - Inserts are grouped into intermediate size batches
10Data Population Scripts 2
Primary table populated using single row inserts
in a WHILE loop, Additional tables populated with
INSERT / SELECT statement Single row inserts
20-30K rows/sec INSERT / SELECT statement 100K
rows/sec
DECLARE _at_L int SELECT _at_L 1 WHILE _at_L lt 3 BEGIN
INSERT M3A_11 (GroupID,ID,ID2,ID3,ID4,sID,bID1,bID
2,bID3,rMoney,rDate,rReal, rDecimal) SELECT TOP
500000 GroupID, ID, 1500001-ID, ID3, ID4, sID,
bID1, bID2, bID3, rMoney, rDate, rReal, rDecimal
FROM M3A_20 WHERE GroupID 1 AND ID BETWEEN
(_at_L-1)5000001 AND _at_L500000 SELECT _at_L _at_L 1
CHECKPOINT PRINT '11 Step '
CONVERT(varchar,_at_L) ', ' CONVERT(varchar,GETDA
TE(),121) END UPDATE STATISTICS M3A_01
(PK_M3A_01) WITH FULLSCAN CREATE STATISTICS ST_01
ON M3A_01 (ID) WITH FULLSCAN, NORECOMPUTE
11Index Seek Plans
Many rows returned, Non-parallel plan
Cost 9.34
Parallel Execution disabled
Cost 9.82
Cost 4.94
Parallel Plan
12Index Seek Details
Non-parallel plan
Parallel plan
13Index Seek Non-parallel
Cost assigned to SELECT
Index Seek, 1M rows in 11,115 pages (81
bytes/row, 90 Fill) I/O cost is 8.2365 CPU Cost
is 1.1000785 Cost sub-tree Cost is correct, I/O
CPU is ½ of correct value
14Index Seek Parallel Plan
No cost assigned to SELECT
Index Seek I/O and CPU cost ½ of non-parallel
plan
15Index Seek with Aggregate
1
2
3
4
16Index Seek Aggregate Parallel Plan Details
2
4
1
3
17Table Scan
Cost 9.01
Cost 8.26
18Table Scan Details
Non-parallel plan
Parallel plan
I/O cost same CPU cost ½ of non parallel plan
19Table Scan Details
Non-parallel plan
Parallel plan
No cost
No cost on Select
I/O cost same CPU cost ½ of non parallel plan
20Parallel Plan Cost Formulas Patterns
- CPU costs are ½ of non-parallel plan
- Index Seek I/O cost are also ½
- Scan I/O cost is same as non-parallel plan
- Parallel plan costs are based on 2 processors
- Actual number of processors determined at runtime
- Overhead operations
- Distribute, Repartition Gather Streams
21Hash Join
200,000 rows 15 byte OS row size
Cost 6.50
Cost 4.79
22Hash Join Details
Non-parallel plan
Parallel plan
23Hash Join Details
Non-parallel plan
Parallel plan
24Hash Join Non-parallel plan
25Hash Join Parallel Plan
2
3
4
1
2
4
1
3
26Hash Join with I/O Cost
900,000 rows MAXDOP 1
Cost 85.1
Cost 74.1
27Hash Join Join I/O Cost
730,000 rows
740,000 rows
28Hash Join - Bitmap
29Hash Join Cost Formula
Index Seek Plan Formula I/O Cost 0.006328500
0.000740741 per additional page (1GB)
0.003203425 0.000740741 per additional page
(gt1GB) CPU Cost 0.000079600 0.000001100 per
additional row
Hash Join CPU Cost 0.017750000 base
0.0000001749 (2-30 rows) 0.0000000720 (100
rows) 0.000015091 per row 0.000015857
(parallel) 0.000001880 per row per 4 bytes
in OS 0.000005320 per additional row in
IS I/O Cost 0.000042100 per row over 64MB (Row
Size8) 0.0000036609 per 4 byte over 15B
30Parallel Cost Formula
Base Cost 0.028500 Repartition Stream Cost per
row 0.0000024705 Base (15 Bytes)
0.000000759 per additional 4 Bytes Gather
Stream Cost per row 0.0000018735 Base(15)
0.000000759 per additional 4 Bytes Dispatch
31Loop Join
32Loop Join Details
Non-parallel plan Outer Source
Parallel plan Outer Source
33Loop Join Details
Inner Source cost identical for both non-parallel
and parallel plans
34Loop Join Details
Non-parallel plan
Parallel plan
35Merge Join
36Merge Join Details
Non-parallel plan
Parallel plan
37Merge Join Details
Non-parallel plan
Parallel plan
38Merge Join Details
Non-parallel plan
Parallel plan
39Index Seek Aggregate Test
Xeon 2.4GHz/512K
Opteron2.2GHz 1M
40Index Seek Aggregate Test, Itanium 2
Itanium 2 1.5GHz/6M
41Index Seek Aggregate Test, SUM(INT)
Itanium 2 1.5GHz/6M
42Index Seek Aggregate Test, NULL
Itanium 2 1.5GHz/6M
43Loop Join, COUNT()
Itanium 2 1.5GHz/6M
44Hash Join, COUNT()
Itanium 2 1.5GHz/6M
45Merge Join, COUNT()
Itanium 2 1.5GHz/6M
46General Recommendations
- Useful in DW, ETL, and maintenance activities
- Use judgment on transactions processing
- Is throughput more important
- Or faster expensive queries
- Increase Cost Threshold from 5 to 20-50
- Limit MAXDOP to 4
- Verify or limit parallelism on Xeon systems with
Hyper-Threading enabled
47Additional Information
www.sql-server-performance.com/joe_chang.asp SQL
Server Quantitative Performance Analysis Server
System Architecture Processor Performance Direct
Connect Gigabit Networking Parallel Execution
Plans Large Data Operations Transferring
Statistics SQL Server Backup Performance with
Imceda LiteSpeed jchang6_at_yahoo.com