Communicating with the Outside

1
Communicating with the Outside
2
Application
Query Processor
Indexes
Storage Subsystem
Concurrency Control
Recovery
Operating System
Hardware: Processor(s), Disk(s), Memory
3
How Applications Access the Database System
  • 4GL (fourth-generation language)
  • Executes at a high level, resulting in small
    programs
  • Hinders portability: different vendors offer
    different 4GLs
  • Programming language + call-level interface
  • ODBC (Open DataBase Connectivity) for C/C++
  • JDBC: Java-based API, for Java (see the JDBC
    sketch below)
  • Database-specific drivers: OCI (C/Oracle),
    CLI (C/DB2)
  • Perl/DBI
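
For comparison with the C-oriented interfaces above, a
minimal JDBC sketch; the connection URL, user and
password are placeholders, not from the slides:

    import java.sql.*;

    public class JdbcHello {
        public static void main(String[] args) throws SQLException {
            // Placeholder URL; a real driver-specific connection string is needed.
            String url = "jdbc:oracle:thin:@//dbhost:1521/orcl";
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("select l_orderkey, l_shipdate from lineitem")) {
                while (rs.next()) {
                    System.out.println(rs.getLong("l_orderkey") + " " + rs.getDate("l_shipdate"));
                }
            }
        }
    }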

4
API pitfalls
  • Cost of portability
  • Layer of abstraction on top of ODBC drivers to
    hide discrepancies across drivers with different
    conformance levels.
  • Beware of performance problems in this layer of
    abstraction
  • Use of metadata descriptions when submitting
    queries and when accessing the result set (see the
    sketch below)
  • Iteration over the result set
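
To illustrate the metadata point, a JDBC sketch that asks
the driver to describe an arbitrary result set at run
time; abstraction layers that do this for every query pay
this cost on every call. The connection and query are
assumed to come from the caller:

    import java.sql.*;

    public class MetaDataProbe {
        // Print the column names and types of an arbitrary query result.
        static void describe(Connection conn, String query) throws SQLException {
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(query)) {
                ResultSetMetaData md = rs.getMetaData();
                for (int i = 1; i <= md.getColumnCount(); i++) {
                    System.out.println(md.getColumnName(i) + " : " + md.getColumnTypeName(i));
                }
            }
        }
    }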

5
ODBC vs. OCI
  • ODBC vs. OCI on Oracle 8i EE on Windows 2000
  • Iteration over a result set, one record at a time;
    prefetching is performed.
  • Low OCI overhead when the number of records
    transferred is small
  • ODBC performs better as the number of records
    transferred increases, because it is better at
    prefetching (a fetch-size sketch follows).
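
In JDBC, the prefetch granularity can only be hinted at
with setFetchSize; whether and how the driver honors the
hint is driver-specific. A sketch, not the code used in
the experiment:

    import java.sql.*;

    public class FetchSizeHint {
        static long scan(Connection conn) throws SQLException {
            long count = 0;
            try (Statement stmt = conn.createStatement()) {
                stmt.setFetchSize(1000);  // hint: fetch about 1000 rows per roundtrip
                try (ResultSet rs = stmt.executeQuery("select l_orderkey from lineitem")) {
                    while (rs.next()) {
                        count++;          // iterate one record at a time, as in the experiment
                    }
                }
            }
            return count;
        }
    }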

6
Client-Server Mechanisms
  • Connection pooling and multiplexing when multiple
    clients access a server
  • Communication buffer on the database server. One
    per connection.
  • If a client does not consume results fast enough,
    then the server holds resources until it can
    output the result.
  • Data is sent either when the communication buffer
    is full or when a batch is finished executing.
  • Small buffer: frequent transfer overhead
  • Large buffer: time to first record increases.
  • No actual impact on a 100 Mb/s network; more
    sensitive in an intranet with low bandwidth.

7
Communication buffer
8
Harmful small objects
  • authorized(user, type)
  • doc(id, type, date)
  • What are the document instances a user can see?
  • SQL:
  • select doc.id, doc.date
    from authorized, doc
    where doc.type = authorized.type
    and authorized.user = <input>
  • If documents and doctypes are encapsulated in
    objects, the risk is the following:
  • Find the types t authorized for user <input>:
  • select type as t
    from authorized
    where user = <input>
  • For each type t, issue the query:
  • select id, date
    from doc
    where type = <t>
  • The join is executed in the application and not
    in the DB! (See the sketch below.)
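
A JDBC sketch of the two access patterns, using the
relations from the slide; the connection is assumed to be
open and the column names follow the slide:

    import java.sql.*;
    import java.util.*;

    public class DocAccess {
        // Good: one statement, the join runs inside the DBMS.
        static void joinInDb(Connection conn, String user) throws SQLException {
            String q = "select doc.id, doc.date from authorized, doc "
                     + "where doc.type = authorized.type and authorized.user = ?";
            try (PreparedStatement ps = conn.prepareStatement(q)) {
                ps.setString(1, user);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) { /* consume doc.id, doc.date */ }
                }
            }
        }

        // Risky: the object-by-object style issues one query per authorized type,
        // i.e. the join is executed in the application.
        static void joinInApp(Connection conn, String user) throws SQLException {
            List<String> types = new ArrayList<>();
            try (PreparedStatement ps = conn.prepareStatement(
                    "select type from authorized where user = ?")) {
                ps.setString(1, user);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) types.add(rs.getString(1));
                }
            }
            for (String t : types) {
                try (PreparedStatement ps = conn.prepareStatement(
                        "select id, date from doc where type = ?")) {
                    ps.setString(1, t);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) { /* consume id, date */ }
                    }
                }
            }
        }
    }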

9
Tuning the Application Interface
  • Avoid user interaction within a transaction
  • Minimize the number of roundtrips between the
    application and the database
  • Retrieve needed columns only
  • Retrieve needed rows only
  • Minimize the number of query compilations

10
Avoid User Interaction within a Transaction
  • User interaction within a transaction forces
    locks to be held for a long time.
  • Careful transaction design (possibly transaction
    chopping) to avoid this problem; a sketch follows.
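
A minimal JDBC sketch of the principle, assuming a
made-up account table: all user interaction (choosing the
accounts and amount, confirming) happens before the
transaction starts, so no locks are held while waiting
for a human.

    import java.sql.*;

    public class NoUserInteractionInTxn {
        static void transferApproved(Connection conn, int fromAcct, int toAcct, int amount)
                throws SQLException {
            conn.setAutoCommit(false);   // transaction starts here; input is already in hand
            try (PreparedStatement debit = conn.prepareStatement(
                     "update account set balance = balance - ? where id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                     "update account set balance = balance + ? where id = ?")) {
                debit.setInt(1, amount);  debit.setInt(2, fromAcct);  debit.executeUpdate();
                credit.setInt(1, amount); credit.setInt(2, toAcct);   credit.executeUpdate();
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            } finally {
                conn.setAutoCommit(true);
            }
        }
    }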

11
Minimize the Number of Roundtrips to the Database
  • Rule of thumb: crossing the interface between the
    application and the database server is costly.
  • Avoid loops:
  • Application programming languages offer looping
    facilities (over SQL statements, cursors,
    positioned updates).
  • Positioned updates: ODBC allows updating the rows
    obtained as the result of a query, which forces the
    processing of updates one row at a time.
  • Rigid object-oriented programming might force
    such loops (a batching sketch follows).
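
When a loop over single-row statements cannot be replaced
by one SQL statement, JDBC batching at least cuts the
number of roundtrips. A sketch; the update itself is
illustrative, not from the slides:

    import java.sql.*;

    public class BatchedUpdates {
        static void discountParts(Connection conn) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "update lineitem set l_discount = 0.05 where l_partkey = ?")) {
                for (int i = 1; i <= 200; i++) {
                    ps.setInt(1, i);
                    ps.addBatch();      // queued on the client, not yet sent
                }
                ps.executeBatch();      // sent to the server in one or a few
                                        // roundtrips, depending on the driver
            }
        }
    }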

12
Avoid External Loops
  • No loop:
  • sqlStmt = "select * from lineitem
    where l_partkey <= 200"
  • odbc->prepareStmt(sqlStmt)
  • odbc->execPrepared(sqlStmt)
  • Loop:
  • sqlStmt = "select * from lineitem
    where l_partkey = ?"
  • odbc->prepareStmt(sqlStmt)
  • for (int i = 1; i <= 200; i++)
  •   odbc->bindParameter(1, SQL_INTEGER, i)
  •   odbc->execPrepared(sqlStmt)

13
Avoid External Loops
Let the DBMS optimize set operations. Fetching 2,000
records takes 200 queries with the loop and 1 query
with no loop. Crossing the application interface too
often hurts performance.
  • SQL Server 2000 on Windows 2000
  • Crossing the application interface has a
    significant impact on performance

14
Avoid Cursors
  • No cursor:
  • select * from employees
  • Cursor:
  • DECLARE d_cursor CURSOR FOR select * from
    employees
  • OPEN d_cursor
  • while (@@FETCH_STATUS = 0)
  • BEGIN
  •   FETCH NEXT from d_cursor
  • END
  • CLOSE d_cursor
  • go

15
Avoiding Cursors
  • Cursors are terribly slow in almost all systems
  • With a cursor, records are transmitted from the
    database server to the application one at a time.
  • The query fetches 200,000 records of 56 bytes
    each.
  • Response time is a few seconds with a plain SQL
    query and more than an hour iterating over a
    cursor.

16
Package several SQL statements within one call to
the database server
  • Embedded procedural language (Transact SQL) with
    control flow facilities.
  • Use user-defined functions (UDFs) when they filter
    out a large number of records.
  • Do not use positioned updates, which update rows
    that are obtained as the result of a query,
    forcing updates one row at a time.

17
Packing Several SQL Statements into One Call to the DB Server
create table Temp1 (parent varchar(200))
create table Temp2 (parent varchar(200))
create table Ancestor (person varchar(200))
/* Temp2 will hold the latest generation discovered. */
INSERT INTO Temp1
  SELECT parent FROM Parental WHERE child = 'Nicholas Bennet'
WHILE EXISTS (SELECT * FROM Temp1)
BEGIN
  INSERT INTO Ancestor SELECT * FROM Temp1
  INSERT INTO Temp2 SELECT * FROM Temp1
  DELETE FROM Temp1
  INSERT INTO Temp1
    SELECT Parental.parent FROM Parental, Temp2
    WHERE Parental.child = Temp2.parent
  DELETE FROM Temp2
END
IF EXISTS (SELECT * FROM Ancestor WHERE person = 'Carol Diane')
  PRINT 'Carol Diane is an ancestor of Nicholas Bennet.'
ELSE
  PRINT 'Carol Diane is not an ancestor of Nicholas Bennet.'

T-SQL that determines whether Carol Diane is an
ancestor of Nicholas Bennet, assuming a genealogical
database containing at least the relation
Parental(parent, child).
18
User Defined Functions
  • Example: a function that computes the number of
    working days between two dates.
  • The function executes either on the database site
    (as a UDF) or on the application site.
  • Applying the UDF yields good performance when it
    significantly reduces the amount of data sent back
    to the application (see the sketch below).
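
A hedged JDBC sketch of the working-days example:
working_days is assumed to be a UDF installed on the
server, not a built-in function. Only the aggregate
crosses the interface; the application-site alternative
would transfer both dates for every row and compute the
difference in the client.

    import java.sql.*;

    public class WorkingDaysOnServer {
        // UDF on the database site: only qualifying/aggregated data
        // crosses the application interface.
        static int countSlowShipments(Connection conn) throws SQLException {
            String q = "select count(*) from lineitem "
                     + "where working_days(l_shipdate, l_receiptdate) > 5";  // hypothetical UDF
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(q)) {
                rs.next();
                return rs.getInt(1);
            }
        }
    }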

19
Retrieve Needed Columns Only
  • Avoid transferring unnecessary data.
  • Retrieving unneeded columns might also prevent the
    use of a covering index (an index-only scan).
  • In the experiment, the subset contains ¼ of the
    attributes.
  • Reducing the amount of data that crosses the
    application interface yields a significant
    performance improvement (see the sketch below).
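
In application code this simply means naming the columns;
a trivial JDBC sketch:

    import java.sql.*;

    public class NeededColumnsOnly {
        static void shipDates(Connection conn) throws SQLException {
            // "select * from lineitem" would drag every attribute across the
            // interface; the two columns below are all the application needs.
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                     "select l_orderkey, l_shipdate from lineitem")) {
                while (rs.next()) { /* consume l_orderkey, l_shipdate */ }
            }
        }
    }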

20
Retrieve Needed Rows Only
  • If the user is only viewing a small subset of a
    very large result set, it is best to
  • Only transfer that subset
  • Only compute that subset
  • When users only want a feel for the data, use TOP
    or FETCH FIRST to fetch just a few rows (a sketch
    follows this list).
  • Applications that allow the formulation of ad hoc
    queries (an ad hoc query is one that cannot be
    determined before the moment it is issued) should
    permit users to cancel them.
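
A sketch of both options: limiting the rows in SQL (FETCH
FIRST is the standard spelling, TOP the SQL Server one;
support varies by DBMS) and capping the transfer in the
driver with setMaxRows, which limits what crosses the
interface but not necessarily what the server computes.

    import java.sql.*;

    public class PreviewRows {
        static void preview(Connection conn) throws SQLException {
            try (Statement stmt = conn.createStatement()) {
                stmt.setMaxRows(20);   // driver-side cap: at most 20 rows cross the interface
                try (ResultSet rs = stmt.executeQuery(
                        // DBMS-side cap: the server can stop computing early
                        "select l_orderkey, l_shipdate from lineitem "
                      + "order by l_shipdate fetch first 20 rows only")) {
                    while (rs.next()) { /* show a preview */ }
                }
            }
        }
    }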

21
Minimize the Number of Query Compilations
  • Prepared execution yields better performance when
    the query is executed more than once (see the
    sketch below):
  • No compilation
  • No access to catalog.
  • Prepared execution plans become obsolete if
    indexes are added or the size of the relation
    changes.

Experiment performed on Oracle 8i EE on Windows
2000. Direct: ODBC direct execution. Prepared: ODBC
prepared command, then executed repeatedly.
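
The JDBC counterpart of prepared execution, as a sketch:
compile the statement once with prepareStatement, then
only re-bind and re-execute it.

    import java.sql.*;

    public class PrepareOnceExecuteMany {
        static void lookupParts(Connection conn) throws SQLException {
            // Compiled (and optimized) once ...
            try (PreparedStatement ps = conn.prepareStatement(
                    "select l_quantity from lineitem where l_partkey = ?")) {
                for (int key = 1; key <= 1000; key++) {
                    ps.setInt(1, key);   // ... then only re-bound and re-executed
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) { /* consume l_quantity */ }
                    }
                }
            }
        }
    }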
22
Tuning the Application Interface
  • Avoid user interaction within a transaction
  • Minimize the number of roundtrips between the
    application and the database
  • Retrieve needed columns only
  • Retrieve needed rows only
  • Minimize the number of query compilations

23
Bulk Loading Data
  • Purpose: loading large volumes of data into a
    database
  • Tools:
  • SQL Server: bcp and the Transact-SQL command BULK
    INSERT
  • Oracle: SQL*Loader
  • DB2: Load utility
  • Tool parameters
  • Bypass query engine
  • Avoid logging
  • No index update
  • No constraint check
  • Frequency of commits

24
Use Direct Path for Bulk Loading
  • sqlldr direct=true control=load_lineitem.ctl
    data=E:\Data\lineitem.tbl
  • load data
  • infile "lineitem.tbl"
  • into table LINEITEM append
  • fields terminated by '|'
  • (
  • L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER,
    L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX,
    L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE DATE
    "YYYY-MM-DD", L_COMMITDATE DATE "YYYY-MM-DD",
    L_RECEIPTDATE DATE "YYYY-MM-DD", L_SHIPINSTRUCT,
    L_SHIPMODE, L_COMMENT
  • )

25
Direct Path
  • Loading 600,000 records into the lineitem relation
    from TPC-H
  • Direct path loading bypasses the query engine and
    the storage manager. It is orders of magnitude
    faster than conventional path (with a commit
    every 100 records) and inserts (with a commit for
    each record).

Experiment performed on Oracle 8i EE on Windows
2000.
26
Batch Size
  • Bulk load of 600,000 records.
  • Throughput increases steadily as the batch size
    grows to 100,000 records; throughput remains
    constant afterwards.
  • Trade-off between performance and the amount of
    data that has to be reloaded in case of a problem
    (a sketch of the commit interval follows).
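
When loading through the application interface rather
than a bulk tool, the same trade-off shows up as the
commit interval. A JDBC sketch; the two-column insert and
the 10,000-row interval are illustrative only:

    import java.sql.*;
    import java.util.List;

    public class CommitInterval {
        // Larger intervals mean fewer commits (faster), but more rows to
        // reload if the load fails midway.
        static void load(Connection conn, List<long[]> rows) throws SQLException {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "insert into lineitem (l_orderkey, l_partkey) values (?, ?)")) {
                int n = 0;
                for (long[] r : rows) {
                    ps.setLong(1, r[0]);
                    ps.setLong(2, r[1]);
                    ps.addBatch();
                    if (++n % 10_000 == 0) {   // batch size / commit interval
                        ps.executeBatch();
                        conn.commit();
                    }
                }
                ps.executeBatch();             // flush the remainder
                conn.commit();
            }
        }
    }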

Experiment performed on SQL Server 2000 on
Windows 2000.
27
Storage Engine Parameters
  • Bulk load of 600,000 records.
  • As expected:
  • Turning off logging helps.
  • Collecting statistics hurts.
  • Maintaining indexes incrementally hurts a lot.

Experiment performed on IBM DB2 UDB V7.1 on
Windows 2000.
28
Connecting to Multiple Databases
  • Shared connections to reduce start-up cost
  • Connection pooling (a toy pool sketch follows this
    list)
  • Pass-through statements when performance is CPU
    bound
  • E.g., dblink in Oracle
  • Eliminates query rewriting, translation and
    postprocessing to adapt to a specific SQL dialect
  • Transfer large blocks of data to minimize
    roundtrips when performance is network bound.
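
A toy pool to make the pooling idea concrete; it only
illustrates paying the connection start-up cost once and
reusing connections. A real application would use a
library pool instead, and the URL is a placeholder.

    import java.sql.*;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class TinyConnectionPool {
        private final BlockingQueue<Connection> idle;

        // Open a fixed number of connections up front.
        TinyConnectionPool(String url, String user, String pw, int size) throws SQLException {
            idle = new ArrayBlockingQueue<>(size);
            for (int i = 0; i < size; i++) {
                idle.add(DriverManager.getConnection(url, user, pw));
            }
        }

        Connection borrow() throws InterruptedException {
            return idle.take();    // blocks if every connection is in use
        }

        void giveBack(Connection c) {
            idle.add(c);           // callers must not close the connection themselves
        }
    }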