Title: Communicating with the Outside
1. Communicating with the Outside
2. Overview
3. Package several SQL statements within one call to the database server
- Use an embedded procedural language (Transact-SQL) with control-flow facilities.
- Use User-Defined Functions (UDFs) when they filter out a high number of records.
- Do not use positioned updates, which update rows obtained as the result of a query and therefore force updates one row at a time.
4. Packing SQL statements into one call to the DB server
create table Temp1(parent varchar(200))
create table Temp2(parent varchar(200))
create table Ancestor(person varchar(200))

/* Temp2 will hold the latest generation discovered. */
INSERT INTO Temp1
  SELECT parent FROM Parental WHERE child = 'Nicholas Bennet'

WHILE EXISTS (SELECT * FROM Temp1)
BEGIN
  INSERT INTO Ancestor SELECT * FROM Temp1
  INSERT INTO Temp2 SELECT * FROM Temp1
  DELETE FROM Temp1
  INSERT INTO Temp1
    SELECT Parental.parent FROM Parental, Temp2
    WHERE Parental.child = Temp2.parent
  DELETE FROM Temp2
END

IF EXISTS (SELECT * FROM Ancestor WHERE person = 'Carol Diane')
  PRINT 'Carol Diane is an ancestor of Nicholas Bennet.'
ELSE
  PRINT 'Carol Diane is not an ancestor of Nicholas Bennet.'
T-SQL code that determines whether Carol Diane is an ancestor of Nicholas Bennet. It assumes a genealogical database containing at least the relation Parental(parent, child).
5. User-Defined Functions
- Example: a function that computes the number of working days between two dates.
- The function can be executed either on the database site (as a UDF) or on the application site.
- Applying the UDF yields good performance when it significantly reduces the amount of data sent back to the application (see the sketch below).
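A minimal Transact-SQL sketch of such a function, under simplifying assumptions: it counts only weekdays and ignores holidays, and the Shipments table and its columns are hypothetical (not part of the original slides). Evaluating the function in the WHERE clause lets the server discard non-qualifying rows instead of shipping them all to the application.

-- Hypothetical sketch: number of weekdays (Mon-Fri) between two dates, holidays ignored.
-- Assumes an English-language session, since DATENAME returns localized weekday names.
CREATE FUNCTION dbo.WorkingDays (@start date, @finish date)
RETURNS int
AS
BEGIN
    DECLARE @days int = 0;
    DECLARE @d date = @start;
    WHILE @d <= @finish
    BEGIN
        IF DATENAME(weekday, @d) NOT IN ('Saturday', 'Sunday')
            SET @days = @days + 1;
        SET @d = DATEADD(day, 1, @d);
    END
    RETURN @days;
END
GO

-- Server-side filtering: only the slow shipments cross the application interface.
SELECT shipment_id
FROM Shipments
WHERE dbo.WorkingDays(ship_date, receipt_date) > 10;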
6. Retrieve Needed Columns Only
- Avoid transferring unnecessary data.
- Selecting all columns might prevent the use of a covering index, i.e., an index-only scan (see the sketch below).
- In the experiment the subset contains ¼ of the attributes.
- Reducing the amount of data that crosses the application interface yields a significant performance improvement.
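A rough illustration of both points, using a hypothetical Employee table and index (none of these names come from the original experiment): selecting only the needed columns reduces the transferred data, and an index that covers exactly those columns can answer the query with an index-only scan.

-- Hypothetical index covering the two columns the application needs (SQL Server syntax).
CREATE INDEX idx_emp_dept ON Employee (dept_id) INCLUDE (salary);

-- Covered query: served by an index-only scan, and only two columns cross the interface.
SELECT dept_id, salary
FROM Employee
WHERE dept_id = 42;

-- SELECT * defeats the covering index and transfers columns the application never uses.
SELECT *
FROM Employee
WHERE dept_id = 42;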
7. Retrieve Needed Rows Only
- If the user is only viewing a small subset of a very large result set, it is best to
  - Only transfer that subset
  - Only compute that subset
- When users only want a feel for the data, use TOP or FETCH FIRST to fetch just a few rows (see the sketch below).
- Applications that allow the formulation of ad-hoc queries (an ad-hoc query is one that cannot be determined prior to the moment it is issued) should permit users to cancel them.
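Both forms below express the same idea: TOP is the Transact-SQL spelling, FETCH FIRST is the SQL-standard spelling (supported by DB2 and by recent Oracle and PostgreSQL releases). The Orders table and its columns are placeholders, not part of the original slides.

-- Transact-SQL: give the user a feel for the data with a handful of rows.
SELECT TOP (20) order_id, order_date, total_price
FROM Orders
ORDER BY order_date DESC;

-- SQL standard:
SELECT order_id, order_date, total_price
FROM Orders
ORDER BY order_date DESC
FETCH FIRST 20 ROWS ONLY;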
8. Minimize the Number of Query Compilations
- Prepared execution yields better performance when the query is executed more than once:
  - No compilation
  - No access to the catalog
- Prepared execution plans become obsolete if indexes are added or the size of the relation changes.
Experiment performed on Oracle 8i EE on Windows 2000.
Direct = ODBC direct execution. Prepared = prepare the ODBC command once, then execute it repeatedly (see the sketch below).
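The experiment uses ODBC prepare-and-execute; a related way to see plan reuse from plain Transact-SQL is sp_executesql, which compiles the parameterized statement once and reuses the cached plan across executions with different parameter values. The Employee table and the @d parameter are only illustrative.

-- Parameterized statement: compiled once, plan reused on subsequent executions.
EXEC sp_executesql
     N'SELECT emp_id, emp_name FROM Employee WHERE hire_date > @d',
     N'@d date',
     @d = '2000-01-01';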
9. Tuning the Application Interface
- Avoid user interaction within a transaction
- Minimize the number of roundtrips between the application and the database
- Retrieve needed columns only
- Retrieve needed rows only
- Minimize the number of query compilations
10. Bulk Loading Data
- Purpose: loading large volumes of data into a database
- Tools
  - SQL Server: bcp and the Transact-SQL command BULK INSERT (sketched after this list)
  - Oracle: SQL*Loader
  - DB2: the Load utility
- Tool parameters
  - Bypass the query engine
  - Avoid logging
  - No index update
  - No constraint check
  - Frequency of commits
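A minimal Transact-SQL sketch of the SQL Server route (the file path is a placeholder; TPC-H lineitem data is pipe-delimited). It maps onto several of the tool parameters above: CHECK constraints and triggers are skipped unless explicitly requested, and TABLOCK trades concurrency for load speed.

BULK INSERT lineitem
FROM 'C:\Data\lineitem.tbl'
WITH (
    FIELDTERMINATOR = '|',
    ROWTERMINATOR   = '\n',
    TABLOCK                -- table-level lock instead of row locks
    -- add CHECK_CONSTRAINTS / FIRE_TRIGGERS only if those checks are wanted during the load
);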
11. Use Direct Path for Bulk Loading
sqlldr direct=true control=load_lineitem.ctl data=E:\Data\lineitem.tbl

Control file load_lineitem.ctl:

load data
infile 'lineitem.tbl'
into table LINEITEM append
fields terminated by '|'
(
  L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER,
  L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX,
  L_RETURNFLAG, L_LINESTATUS,
  L_SHIPDATE DATE "YYYY-MM-DD",
  L_COMMITDATE DATE "YYYY-MM-DD",
  L_RECEIPTDATE DATE "YYYY-MM-DD",
  L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT
)
12. Direct Path
- Loading 600,000 records into the lineitem relation from TPC-H.
- Direct-path loading bypasses the query engine and the storage manager. It is orders of magnitude faster than conventional-path loading (with a commit every 100 records) and than inserts (with a commit for each record).
Experiment performed on Oracle 8i EE on Windows
2000.
13. Batch Size
- Bulk load of 600,000 records.
- Throughput increases steadily as the batch size grows to 100,000 records; it remains constant afterwards.
- Trade-off between performance and the amount of data that has to be reloaded in case of a problem (see the BATCHSIZE sketch after the experiment note).
Experiment performed on SQL Server 2000 on
Windows 2000.
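On SQL Server the commit frequency of a bulk load is controlled by the batch size option; a minimal sketch (the path is a placeholder). Each batch commits as a unit, so after a failure only the current batch has to be reloaded, which is the trade-off described above.

BULK INSERT lineitem
FROM 'C:\Data\lineitem.tbl'
WITH (
    FIELDTERMINATOR = '|',
    BATCHSIZE = 100000,   -- commit every 100,000 rows
    TABLOCK
);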
14. Storage Engine Parameters
- Bulk load of 600,000 records.
- As expected:
  - Turning off logging helps.
  - Collecting statistics hurts.
  - Maintaining indexes incrementally hurts a lot.
Experiment performed on IBM DB2 UDB V7.1 on
Windows 2000.
15. Connecting to Multiple Databases
- Shared connection to reduce start-up cost
  - Connection pooling
- Pass-through statements when performance is CPU-bound
  - E.g., dblink in Oracle (see the sketch below)
  - Eliminates query rewriting, translation, and postprocessing to adapt to a specific SQL dialect
- Transfer large blocks of data to minimize roundtrips when performance is network-bound.
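A rough Oracle sketch of the dblink mechanism (the link name, credentials, and TNS alias are placeholders): statements that reference the link are shipped to and evaluated on the remote server, avoiding local rewriting of the query.

-- Create a link to the remote database.
CREATE DATABASE LINK remote_sales
  CONNECT TO app_user IDENTIFIED BY app_password
  USING 'remote_tns_alias';

-- Tables on the remote database are referenced with @link; the work happens remotely.
SELECT COUNT(*) FROM orders@remote_sales;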
16. Summary