Title: SQL Server 2000 Query Processing and Optimization
1SQL Server 2000Query Processing and Optimization
- Don Vilen
- Program Manager
- SQL Server Development Team
2Agenda
- SQL Server Overview
- SQL Server Architecture
- Storage and Access Methods
- Query Processing and Optimization
- Transaction Processing
- Other Topics
3Query Processing and Optimization
4Query Processing and Optimization
- SQL Server Architecture
- Query Processor goals
- Optimization techniques
- Query plans
- Futures
5Server ArchitectureThe Big Picture
6Query Processor Components
- Query optimization
- Selection of best execution plan
- Cost-based, transformation-driven
- Extensive logical inferences
- Query execution
- Algorithms to perform join, group by,
- Hash-based, merge-based
- Parallelism
- Provides plan building blocks
7Query Processor Goals
- Responsibilities
- Processing of DML queries (T-SQL and )
- SELECT, INSERT, UPDATE, DELETE,
- Processing of DDL operations
- Index creation, DBCC CHECK,
- Creation and maintenance of statistics
- DBCC UPDATE STATISTICS
8Query ProcessorGoal Performance
- SQL Server 6.5 provides excellent OLTP
performance - SQL Server 7.0 extends to set-oriented, decision
support queries (star schema) - SQL Server 2000 extends to handle snowflake
schemas, indexed views, partitioned views, and
extensive parallelism
9Query ProcessorGoal Modularity
- Robustness
- Rapid future innovation
- Uniform internal interfaces
- Learn from the lessons of 6.5
- Keep the code clean ? extensible
10Query Processor Goal Functionality
- Distributed and heterogeneous queries
- More complex queries, gt 16 tables
- Some extensions, like SELECT TOP
- Indexed views support
- Statistics on non-indexed columns
- Partitioned views
- SELECT, INSERT, UPDATE, DELETE
- Cascading DRI, etc
- Rich heterogeneous query support (OLEDB)
11Agenda
- SQL Server Architecture
- Query Processor goals
- Optimization techniques
- Query plans
- Futures
12Query OptimizerOverview
- Rewrite- property- cost-driven
- Like DB2 C/S v5 Tandem, not Oracle v6 (but
Oracle 8i does) - Extensive rewrite set
- E.g. index selection, join order,
- Rich inference capabilities
- E.g. contradiction detection
- Sensitive to query complexity
- Optimization time-out based on estimated
execution cost
13Query OptimizerOptimization model
Input tree
Subtree
Pool of alternatives
Rewrite
Output cheapest plan
New subtree
14Query OptimizerOptimization time
- Will you wait for the exhaustive optimization of
your 20-table query? - Goal Make optimization time proportional to the
query complexity - Query complexity ? cost of the optimal plan
15Query OptimizerMulti-stage optimization
- Multiple stages
- No-choice queries (trivial plan)
- Transaction processing queries
- Complex query I
- Complex query II
- Parallel queries
- No knobs!
16Query OptimizerTree-rewrite and search
- Change join order
- (R JOIN S) JOIN T, can also be done as
- (R JOIN T) JOIN S
- Alternative or replacement
- Evaluate filter conditions early
- Extent of search based on query
- Sub-second queries optimized quickly
17Query OptimizerTransformations
- Lots of transformations in SQL Server (over 300)
Filter (A.x 5)
GrpBy A.x, sum(A.y)
Join
Join
Join
B
A
B
A
B
A
Join
Join
Hash-Join
Filter (A.x 5)
B
B
GrpBy A.x, sum(A.y)
B
A
A
A
Simplification
Implementation
Exploration
18Query OptimizerOver 300 transformation rules
- Join reordering
- Outerjoins
- Subqueries
- Aggregation
- Star and snowflakes
- Join elimination
- Materialized views
- Index plans
- Update plans
- Halloween protection
- Empty tab simplification
- (Integrity constraints)
- Partitioned tables
- Parallelism
- Remote queries
19Query OptimizerLogical inferences
- Equivalence classes for columns
- If ab then sort(a) same as sort(b)
- Implied join predicates
- Keys and functional dependencies
- GROUPBY(e,ename) same as GROUPBY(e)
- Contradiction detection
- Infer empty table(s) from check constraints
- Join simplification using FK constraints
- Outer join simplification using nullability
20Query OptimizerResult size estimation
- Basis for cost estimation
- Uses statistics on stored data
- Densities and histograms
- Keys (from unique indices)
- Constraints (DRI, check constraints)
- Available in showplan
21Query OptimizerStatistics on demand
- Optimizer relies on up-to-date statistics
- Automatic create, drop and update statistics
- Fall-back mechanism
- Quick statistics estimation (heuristics)
- Statistics on histogram and density
- MAXDIFF to capture frequent values
22Query OptimizerCost calculation
- Cost to first/last row
- I/O CPU, normalized to seconds
- Row goals during optimization
- E.g. optimize for first-10 rows retrieval
- Available in showplan output
- Forms basis for FAST-N/TOP-N hints
23Query OptimizerChoosing the right plan
- Give as much information as possible!
- You wont always know how your tables will be
used - Declare constraints they can help
- Uniqueness
- DRI
- Nullability
- Keep statistics up to date
- Use auto-stats or your own maintenance plan
- Provide useful indexes ?
24Query OptimizerHints
- Youre smart but
- One or more indexes
- Join order
- Join, grouping, distinct algorithms
- Row goal (FAST N)
- But be careful!
- Maintenance
- Compatibility
- your data volume can change
25Query OptimizerUtility Operations
- More than queries
- Update plans
- Bulk Import, export convert
- CREATE INDEX
- DBCC CHECKDB/CHECKTABLE
- ALTER TABLE
- CREATE STATISTICS
26Query OptimizerOptimized update plans
- Small updates (e.g., OLTP operations)
- Row-by-row update all indexes for each row
- Standard technique may use lots of random I/O
- Large updates (e.g., warehouse refresh)
- Index-by-index update
- Pre-sorting per index merges change into index
- Each index leaf is touched at most once
- Saved index update cost often exceeds sort cost
271 of every 8 rows is deleted
(These spool operations share one work file)
8KB/page / 24B/entry 335 entries/page 70
fill factor ? 235 1 in 8 ? 30 deletions/page
A sort operation per updated index very fast
index maintenance
Is each index leaf touchedonce or 30 times?
On average, after lots of random insertions and
deletions, B-tree pages are about 70 full
thats why this fill factor is used in the
example.
28Query OptimizerUtility example BCP IN
- OLE/DB over local text file or TDS INSERT
SELECT from tds-stream - Query semantics conversion, restriction
- Standard optimized update plans
- E.g. sorted insertions in each index
29Agenda
- SQL Server Architecture
- Query processor goals
- Optimization techniques
- Query plans
- Futures
30Query PlansIterator execution model
- An iterator consumes/produces a stream of rows
- A query plan is a graph of iterators
- Open, get-next, close methods
- Pull parent-demand-driven
- Modular, easy to add new iterators
- Just like LEGO
- E.g. sample, rank, top, segment, sort,
31Query PlansExample Filter
- GetNext Method
- Repeat
- Call child.GetNext() to obtain row r
- IF EndOfStream, RETURN END_OF_STREAM
- IF predicate(r) is TRUE, RETURN r
32Query PlansCovering indices
- Index with all needed columns
- Clustering columns automatically added
non-clustered indices (as key/bookmark) - CREATE INDEX ON Orders(employeeid)
- Delivers employeeid and orderid columns
- Clustering column not in 6.5 indices
- 6.5 uses a RID (physical locator) model
33Query PlansMulti-index operations
- Index intersection, union, difference
- month January AND store 3
- month January AND store ltgt 3
- Join indices of a single table
- Extends covering indices
- Performed using existing join algorithms (no
special stuff) - Good for complex DSS/adhoc queries
34Query PlansIndex-lookup JOIN
- CUSTOMER JOIN ORDERS ON cust
- For every row in CUSTOMER
- Use index on ORDERS.cust to look up matches
- Apply fetch performance techniques
- Sort/pre-fetch lookup values
- Fetch columns from CUSTOMER or ORDERS after join
- Workhorse for OLTP queries
35Query PlansIndex lookup/scan, Fetch
- Indices are used to
- Restrict rows (T.a between 10 and 20)
- Extract columns (covering indices)
- Obtain rows in sorted order or random order
- QP can scan forwards or backwards
- SQL 2000 supports mixed ASC and DESC indexes
- Fetch locates rows in table from bookmark
- Fetch is a separate operator
36Query PlansFetch performance
- Locates base table row using bookmark retrieved
from non-clustered index - Simple fetch
- Synchronous I/O
- Scattered I/O
- Pre-fetch reduces I/O latency issues
- Delayed, hinted queue of records
- Sort by RID for good caching
37Query PlansSorting
- Integrated as iterator, not a step
- Pipeline result of last merge to consumer
- Support for sort distinct
- Sorted-input algorithms for join, grouping
- Merge join
- Stream aggregation
38Query PlansAggregation GROUP BY
- MIN/MAX
- Use sort order provided by index
- First non-null value
- Forwards or backwards direction
- Multiple min/max computed independently then
joined - Local-partial-global
- Aggregates split into multiple stages
- Exploit interesting sort orders, joins, etc
39Query PlansAggregation hash/stream
- Select sum(sales) group by quarter
sort on quarterstream aggregation
hash on quarterhash aggregation
40Query PlansMerge Join
- T join R on T.a R.a
- Sorted input streams
- Nested loops on groups
- Many-1 / Many-Many
- Finds non-matches
- Mark and re-scan
- Outerjoin, not exist subq
- Preserves order on join columns
41Query PlansSummary
- Large set of interesting plans
- Combined together to answer user queries
- Plan choice and combination decided at compile
time by the Optimizer - We looked at-
- Index-lookup join
- Small outer input, large indexed inner
- Merge join
- Sorted inputs, e.g. (covering) index
- Sorted output, e.g. stream aggregation
- Aggregation min/max/local/global/partial
- Others hash-join, nested-loop join,
42Query ProcessorConclusion
- State of the art query technology
- Industrial strength
- Framework for rapid future innovation
- More than just a query processor
- How you can help
- Provide as much information as you can DRI,
CHECK, NULL, etc - Keep statistics up to date (try auto-stats)
- Provide sensible indexes
43Questions?