Communicating with the Outside

1
Communicating with the Outside
2
Overview
3
Package several SQL statements within one call to
the database server.
  • Use an embedded procedural language (Transact-SQL)
    with control-flow facilities.
  • Use user-defined functions (UDFs) when they filter
    out a large number of records on the server side.
  • Do not use positioned updates, which update rows
    obtained as the result of a query, forcing updates
    one row at a time; a set-oriented alternative is
    sketched below.
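For illustration, a minimal contrast between a positioned
(cursor-based) update and its set-oriented equivalent, written
in Transact-SQL against a hypothetical Employee(salary, dept)
table: the cursor variant forces one update per row, while the
single UPDATE lets the server do all the work in one call.

-- Positioned update: one row per loop iteration.
DECLARE @sal MONEY
DECLARE emp_cursor CURSOR FOR
    SELECT salary FROM Employee WHERE dept = 'Sales'
    FOR UPDATE OF salary
OPEN emp_cursor
FETCH NEXT FROM emp_cursor INTO @sal
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE Employee SET salary = salary * 1.1
        WHERE CURRENT OF emp_cursor
    FETCH NEXT FROM emp_cursor INTO @sal
END
CLOSE emp_cursor
DEALLOCATE emp_cursor

-- Set-oriented equivalent: a single statement, no row-at-a-time traffic.
UPDATE Employee SET salary = salary * 1.1 WHERE dept = 'Sales'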

4
Packing SQL Statements into One Call to the Database Server
create table Temp1 (parent varchar(200))
create table Temp2 (parent varchar(200))
create table Ancestor (person varchar(200))
/* Temp2 will hold the latest generation discovered. */
INSERT INTO Temp1
  SELECT parent FROM Parental WHERE child = 'Nicholas Bennet'
WHILE EXISTS (SELECT * FROM Temp1)
BEGIN
  INSERT INTO Ancestor SELECT * FROM Temp1
  INSERT INTO Temp2 SELECT * FROM Temp1
  DELETE FROM Temp1
  INSERT INTO Temp1
    SELECT Parental.parent
    FROM Parental, Temp2
    WHERE Parental.child = Temp2.parent
  DELETE FROM Temp2
END
IF EXISTS (SELECT * FROM Ancestor WHERE person = 'Carol Diane')
  PRINT 'Carol Diane is an ancestor of Nicholas Bennet.'
ELSE
  PRINT 'Carol Diane is not an ancestor of Nicholas Bennet.'
T-SQL that determines whether Carol Diane is an ancestor of
Nicholas Bennet. Assume a genealogical database containing
at least the relation Parental(parent, child).
5
User Defined Functions
  • Function computes the number of working days
    between two dates.
  • The function can be executed either on the database
    site (as a UDF) or on the application site.
  • Applying the UDF yields good performance when it
    significantly reduces the amount of data sent back
    to the application; a minimal sketch of such a UDF
    follows.
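A sketch in Transact-SQL, assuming a hypothetical function name
dbo.WorkingDays; holidays are ignored and only Saturdays and
Sundays are skipped.

-- Counts the days between @start and @end that are not Saturday or Sunday.
-- DATENAME(weekday, ...) returns language-dependent names; English is assumed.
CREATE FUNCTION dbo.WorkingDays (@start DATETIME, @end DATETIME)
RETURNS INT
AS
BEGIN
    DECLARE @days INT
    SET @days = 0
    WHILE @start <= @end
    BEGIN
        IF DATENAME(weekday, @start) NOT IN ('Saturday', 'Sunday')
            SET @days = @days + 1
        SET @start = DATEADD(day, 1, @start)
    END
    RETURN @days
END

Evaluated as a UDF on the database site, a filter such as
WHERE dbo.WorkingDays(l_shipdate, l_receiptdate) > 5 returns
only the qualifying rows (or just an aggregate) instead of
shipping every row so the application can do the date
arithmetic itself.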

6
Retrieve Needed Columns Only
  • Avoid transferring unnecessary data.
  • Retrieving unneeded columns might prevent the use
    of a covering index, i.e., an index-only scan.
  • In the experiment the subset contains ¼ of the
    attributes.
  • Reducing the amount of data that crosses the
    application interface yields a significant
    performance improvement; see the sketch below.
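A minimal sketch using the LINEITEM relation from the
bulk-loading slides; the covering index on the four selected
columns is an assumption.

-- Ships all sixteen attributes and may rule out an index-only scan.
SELECT * FROM LINEITEM WHERE l_orderkey = 1

-- Ships a quarter of the attributes; an index covering these
-- columns can answer the query without touching the table.
SELECT l_orderkey, l_partkey, l_quantity, l_extendedprice
FROM LINEITEM
WHERE l_orderkey = 1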

7
Retrieve Needed Rows Only
  • If the user is only viewing a small subset of a
    very large result set, it is best to
  • Only transfer that subset
  • Only compute that subset
  • When users only want a feel for the data, use TOP
    or FETCH FIRST to fetch just a few rows, as
    sketched below.
  • Applications that allow the formulation of ad hoc
    queries (a query that cannot be determined before
    the moment it is issued) should permit users to
    cancel them.
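For example, using the LINEITEM relation from the later slides
(SQL Server's TOP and the standard FETCH FIRST form accepted by
DB2, among others):

-- SQL Server / Sybase:
SELECT TOP 10 l_orderkey, l_shipdate
FROM LINEITEM
ORDER BY l_shipdate DESC

-- DB2 / SQL standard:
SELECT l_orderkey, l_shipdate
FROM LINEITEM
ORDER BY l_shipdate DESC
FETCH FIRST 10 ROWS ONLY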

8
Minimize the Number of Query Compilations
  • Prepared execution yields better performance when
    the query is executed more than once:
  • No compilation
  • No access to the catalog
  • Prepared execution plans become obsolete if indexes
    are added or the size of the relation changes.

Experiment performed on Oracle 8i EE on Windows 2000.
Direct: ODBC direct execution. Prepared: ODBC prepared
command, then executed repeatedly.
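The experiment used the ODBC prepare/execute interface. As a
rough Transact-SQL analogue on SQL Server, sp_executesql runs a
parameterized statement whose plan is compiled once and reused
across calls with different parameter values; the LINEITEM
query is only an illustration.

EXEC sp_executesql
    N'SELECT l_quantity, l_extendedprice
      FROM LINEITEM
      WHERE l_orderkey = @okey',
    N'@okey INT',
    @okey = 1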
9
Tuning the Application Interface
  • Avoid user interaction within a transaction
  • Minimize the number of roundtrips between the
    application and the database
  • Retrieve needed columns only
  • Retrieve needed rows only
  • Minimize the number of query compilations

10
Bulk Loading Data
  • Purpose: loading large volumes of data into a
    database
  • Tools:
  • SQL Server: bcp and the Transact-SQL command BULK
    INSERT (see the sketch after this list)
  • Oracle: SQL*Loader
  • DB2: the Load utility
  • Tool parameters:
  • Bypass the query engine
  • Avoid logging
  • No index update
  • No constraint check
  • Frequency of commits
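A minimal BULK INSERT sketch, assuming the pipe-delimited TPC-H
lineitem file used on the next slide; TABLOCK and BATCHSIZE
correspond to the logging and commit-frequency parameters
listed above.

BULK INSERT LINEITEM
FROM 'E:\Data\lineitem.tbl'
WITH (
    FIELDTERMINATOR = '|',   -- column delimiter in the input file
    ROWTERMINATOR = '\n',
    TABLOCK,                 -- table lock enables minimally logged loading
    BATCHSIZE = 100000       -- commit every 100,000 rows
)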

11
Use Direct Path for Bulk Loading
sqlldr direct=true control=load_lineitem.ctl data=E:\Data\lineitem.tbl

load data
infile "lineitem.tbl"
into table LINEITEM append
-- TPC-H .tbl files are '|'-delimited
fields terminated by '|'
(
  L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER,
  L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX,
  L_RETURNFLAG, L_LINESTATUS,
  L_SHIPDATE DATE "YYYY-MM-DD",
  L_COMMITDATE DATE "YYYY-MM-DD",
  L_RECEIPTDATE DATE "YYYY-MM-DD",
  L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT
)

12
Direct Path
  • Loading 600,000 records into the lineitem relation
    from TPC-H.
  • Direct path loading bypasses the query engine and
    the storage manager. It is orders of magnitude
    faster than conventional path loading (with a
    commit every 100 records) and inserts (with a
    commit for each record).

Experiment performed on Oracle 8i EE on Windows 2000.
13
Batch Size
  • Bulk load of 600,000 records.
  • Throughput increases steadily as the batch size
    grows to 100,000 records; it remains constant
    afterwards.
  • Trade-off between performance and the amount of
    data that has to be reloaded in case of a problem.

Experiment performed on SQL Server 2000 on
Windows 2000.
14
Storage Engine Parameters
  • Bulk load of 600,000 records.
  • As expected:
  • Turning off logging helps.
  • Collecting statistics hurts.
  • Maintaining indexes incrementally hurts a lot (see
    the Load sketch below).

Experiment performed on IBM DB2 UDB V7.1 on
Windows 2000.
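A rough sketch of how these knobs appear on the DB2 Load
utility command; clause order and availability vary by DB2
version, and the input file name is hypothetical. STATISTICS NO
skips statistics collection, NONRECOVERABLE turns off logging
for the load, and INDEXING MODE REBUILD rebuilds indexes once
at the end instead of maintaining them incrementally.

db2 "LOAD FROM lineitem.del OF DEL
     INSERT INTO LINEITEM
     STATISTICS NO
     NONRECOVERABLE
     INDEXING MODE REBUILD"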
15
Connecting to Multiple Databases
  • Use a shared connection to reduce start-up cost
  • Connection pooling
  • Use pass-through statements when performance is CPU
    bound
  • E.g., database links (dblink) in Oracle, sketched
    below
  • Eliminates query rewriting, translation, and
    postprocessing to adapt to a specific SQL dialect
  • Transfer large blocks of data to minimize
    roundtrips when performance is network bound.
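A minimal Oracle database-link sketch; the link name, account,
and TNS alias are hypothetical. Once the link exists, remote
tables are queried directly in Oracle's own dialect, with no
rewriting layer in between.

CREATE DATABASE LINK sales_link
  CONNECT TO report_user IDENTIFIED BY report_pwd
  USING 'SALES_DB';

SELECT l_orderkey, l_quantity
FROM LINEITEM@sales_link
WHERE l_shipdate > TO_DATE('1998-01-01', 'YYYY-MM-DD');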

16
Summary