Title: CG096: Lecture 9
1. CG096: Lecture 9
2. Overview
- Background
  - Recap of Relational Systems.
- Parallel Query Processing
  - Sort and Hash-Join
  - assuming a shared-nothing architecture
  - supposedly the hardest to program, but actually quite clean.
- Data Layout.
- Parallel Query Optimisation.
3. Some History
- In pre-relational databases, programmers reigned
  - Data models used explicit pointers
  - Manipulated by (e.g.) COBOL code.
- Relational revolution: data abstraction
  - Declarative languages and data independence
  - Key to the most successful parallel systems.
- Significant developments
  - Codd's relational model: early 70s
  - Experimental, partially relational implementations (System R, INGRES): mid-late 70s
  - Commercial, fully relational implementations (Oracle, IBM DB2, INGRES Corp.): early 80s
  - Rise of parallel DBs: late 80s onwards.
4. The Relational Data Model
- A data model is a collection of concepts for describing data.
- A schema is a description of a particular collection of data, using a given data model.
- The relational model of data
  - Main construct: the relation, basically a table with rows and columns.
  - Every relation has a schema, which describes the columns, or fields.
  - N.B. no pointers, no nested structures, no ordering, no irregular collections.
5. 2 Levels of Indirection
- Many views, a single conceptual (logical) schema and a single physical schema.
  - Views describe how users see the data.
  - Conceptual schema defines the logical structure.
  - Physical schema describes the files and indexes used.
6. Data Independence: Aiding the Parallel Approach
- Applications are insulated from how data is structured and stored.
- Logical data independence
  - Protection from changes in the logical structure of data
  - lets you implement parallel processing under traditional applications.
- Physical data independence
  - Protection from changes in the physical structure of data
  - minimises constraints on processing, enabling clean parallelism.
7. Structure of a DBMS
Parallel considerations mostly here
- A typical DBMS has a layered architecture.
- The figure does not show the concurrency control and recovery components.
- This is one of several possible architectures; each system has its own variations.
8. Relational Query Languages
- Query languages allow manipulation and retrieval of data from a database.
- The relational model supports simple, powerful QLs
  - Strong formal foundation based on logic
  - supporting optimisation and parallelisation.
- Query Languages <> Programming Languages!
  - QLs are not expected to be computationally complete.
  - QLs are not designed for doing complex calculations.
  - QLs support easy, efficient access to large data sets.
9. Formal Relational Query Languages
- Two mathematical query languages form the basis for real languages (e.g. SQL), and for implementation:
  - Relational algebra: more operational, very useful for representing internal execution plans.
    - Database opcodes can be optimised and parallelised for the sake of efficiency.
  - Relational calculus: lets users describe what they want, rather than how to compute it (non-operational, declarative; SQL comes from here).
10. Basic SQL
- Relation-list: a list of relation names
  - possibly with a range-variable after each name.
- Target-list: a list of attributes of tables in the relation-list.
- Qualification: comparisons combined using AND, OR and NOT.
  - Comparisons are Attr op const or Attr1 op Attr2, where op is one of =, <>, <, >, <=, >=.
- DISTINCT: optional keyword indicating that the answer should not contain duplicates.
  - Default is that duplicates are not eliminated!
11. Conceptual Evaluation Strategy
- Semantics of an SQL query are defined in terms of the following conceptual evaluation strategy:
  - Compute the cross-product of the relation-list.
  - Discard resulting tuples if they fail the qualification.
  - Delete attributes that are not in the target-list.
  - If DISTINCT is specified, eliminate duplicate rows.
- This approach is naïve from the performance point of view.
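The four conceptual steps above can be sketched directly in Python; this is a minimal illustration of the semantics, not how any real engine evaluates a query (the function and relation names are invented for the example):

```python
from itertools import product

def conceptual_eval(relations, qualification, target, distinct=False):
    """Naive SQL semantics: cross-product, filter, project, dedup."""
    # 1. Compute the cross-product of the relation-list.
    tuples = product(*relations)
    # 2. Discard tuples that fail the qualification.
    kept = (t for t in tuples if qualification(t))
    # 3. Keep only the target-list attributes.
    projected = [target(t) for t in kept]
    # 4. Eliminate duplicates only if DISTINCT was specified.
    if distinct:
        seen, out = set(), []
        for row in projected:
            if row not in seen:
                seen.add(row)
                out.append(row)
        return out
    return projected

# Join two toy relations on their first field, project the second field of S.
R = [(1, 'a'), (2, 'b')]
S = [(1, 'x'), (3, 'y')]
result = conceptual_eval([R, S],
                         qualification=lambda t: t[0][0] == t[1][0],
                         target=lambda t: t[1][1])
```

Materialising the full cross-product is exactly why this strategy is the performance baseline an optimiser must beat.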
12. Downside of Naïve Approach
- Probably the least efficient way to compute a query.
- An optimiser will find the same answers using more efficient strategies.
- Need to devise a strategy which
  - minimises the size of relations involved in joins
  - does selects and projects before joins
  - joins small to large relations before large to large, where 3 or more joins are required.
13. Query Optimisation & Processing
- The optimiser maps SQL to an algebra tree with specific algorithms
  - access methods, join algorithms, scheduling.
- Relational operators are implemented as iterators
  - Open()
  - Next() (possibly with a condition)
  - Close()
- Parallel processing engine built on partitioning data flow to iterators
  - inter- and intra-query parallelism.
14. Why Use Parallel DBs?
- Parallel and distributed databases are increasingly used to achieve
  - better performance
  - increased availability of data
  - access to data distributed at various sites.
- A parallel DB predominantly enables the first of the above; a DDB enables the last two.
15. Parallelism on One CPU
- Can use a multi-threaded approach
  - Threads are a way for a program to split itself into two or more simultaneously running tasks.
  - Multiple threads can be executed in parallel.
- We are more concerned here with the use of multiple CPUs.
16. Parallel Database Architecture
- Parallel databases (PDBs) use more than one CPU / disk to evaluate a query faster.
- There are three possible ways of sharing processors, disks and memory:
  - Shared nothing: individual CPU / disk / memory units are interconnected.
  - Shared memory: all processors share the same memory; all data can move between memory and any disk.
  - Shared disk: every processor has its own memory, but can read / write from any disk.
17. Communication Overhead
- The shared-disk and shared-memory architectures have communication overheads.
- As the number of CPUs / disks increases, there is more and more of a bottleneck.
- Beyond a certain limit, increasing the number of CPUs reduces performance.
- Shared-nothing systems provide linear speed-up and scale-up: the time per transaction should remain the same when the numbers of CPUs and transactions increase in proportion.
18. Parallel Query Evaluation
- The query tree can be evaluated in parallel; each node can be independently evaluated if the corresponding relations are stored on a separate CPU / disk.
- Even a simple query can be evaluated in parallel if the relation is partitioned.
- How can we partition a relation?
19. Data Partitioning Recap
- Data can be partitioned in various ways:
  - Round-robin: each row is allocated to one of the processors in turn.
  - Hash: each row is hashed and allocated to the corresponding processor.
  - Range: each CPU stores a range of rows.
- Which is better? It depends on the nature of the queries.
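The three partitioning schemes can be sketched in a few lines each; the function names are invented for illustration, and a real system would of course write partitions to different sites rather than to in-memory lists:

```python
def round_robin(rows, n):
    """Row i goes to processor i mod n: perfect balance, no locality."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key=lambda r: r):
    """Each row goes to the processor its key hashes to:
    equal keys always land together, which is what joins need."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(key(row)) % n].append(row)
    return parts

def range_partition(rows, bounds, key=lambda r: r):
    """Partition i holds keys below bounds[i]; keys >= bounds[-1]
    fall into the last partition. Good for range queries."""
    parts = [[] for _ in range(len(bounds) + 1)]
    for row in rows:
        for i, b in enumerate(bounds):
            if key(row) < b:
                parts[i].append(row)
                break
        else:
            parts[-1].append(row)
    return parts
```

Round-robin balances load but forces every query to touch every processor; hash routes equality lookups to one processor; range routes range scans to few processors but is vulnerable to skew.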
20. Background: Hashing
- Hashing is an access method that
  - hashes (chops) an input key
  - so that the result is an integer
  - in the range of the address space available.
- Example
  - Address space is 0-99 buckets
  - Key is s105
  - Hash function chops off the s, divides by 100 and takes the remainder -> 5
  - Record is stored in bucket 5.
- Fast retrieval and placement (one disk access) if there is space in the buckets.
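The slide's worked example is small enough to code directly (the function name is illustrative):

```python
def bucket_for(key: str, n_buckets: int = 100) -> int:
    """The slide's example hash function: chop off the leading
    letter, then take the numeric part modulo the bucket count."""
    return int(key[1:]) % n_buckets

# Key "s105" in an address space of 100 buckets lands in bucket 5.
bucket = bucket_for("s105")
```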
21. Distributed Databases Recap
- A DDB system involves multiple sites or nodes connected by a network.
- Each site has its own CPU, terminals, DBMS, users, DBA and local autonomy.
- A user can access data stored locally or on other nodes.
- DDBs are becoming more common.
22. Benefits of DDBMS
- Closer match to distributed applications such as airline reservation systems, bank ATMs, etc.
- Increased reliability and availability; when coupled with replication, better system functionality.
- Shares data between sites, under local control.
- Improved performance may be achieved, because transactions are processed in parallel at different sites.
23. Costs of DDBMS
- Communication between sites for transmission of data and commands.
- Need to keep track of indices when processing queries that access dispersed and replicated data.
- Maintenance.
- Risk of communication failure.
24. Example of Distributed Data
- Site A (head office): Supplier Details, Customer Details, Product Details, Stock Levels, Warehouse Details, Sales Transactions.
- Site B: Supplier Details (snapshot), Customer Details (snapshot), Product Details (daily replication), Transaction Records.
- Site C: Supplier Details (snapshot), Customer Details (snapshot), Product Details (daily replication), Transaction Records.
- Supplier and customer data do not change often, so a snapshot is sufficient; product details change infrequently, so daily replication is enough. Replication is for performance.
25. Workloads
- On-Line Transaction Processing
  - Many little jobs (e.g. debit / credit).
  - A typical SQL system c. 1995 supported 21,000 t.p.m. using 112 CPUs and 670 disks.
- Batch (decision support and utility)
  - Few big jobs, with parallelism inside.
  - Scan data at 100 MB/sec.
  - Linear scale-up to 500 processors.
26. Parallel Sorting
- Why?
  - DISTINCT, GROUP BY, ORDER BY, sort-merge join and index build are major overheads and inefficient.
  - Parallelisation targets these components to improve performance
  - by splitting the work into manageable pieces.
- Phases
  - Phase 1: parallel read and partition (coarse radix sort), pipelined with parallel sorting of memory-sized runs, spilling runs to disk.
  - Phase 2: parallel reading and merging of runs.
27. Coarse Radix Sort
- Radix sort
  - based on the binary representation of keys
  - compute the ranks of elements.
- Coarse-grained
  - individual tasks are relatively large.
28. Parallel Sorting 2
- Notes
  - Phase 1 requires repartitioning; a high-bandwidth network is required.
  - Phase 2 is totally local processing.
  - Both pipelined and partitioned parallelism.
  - Linear speed-up and scale-up.
29. Hash Join
- Phase 1: partition both relations using a hash function h; R tuples in partition i will only match S tuples in partition i.
- Phase 2: read in a partition of R and hash it using h2 (<> h!). Scan the matching partition of S, searching for matches.
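Both phases can be sketched in Python; the function names are invented, and the role of h2 is played here by Python's built-in dict hashing, which is distinct from the partitioning function h as the slide requires:

```python
def hash_join(R, S, rk, sk, n_parts=4):
    """Two-phase hash join sketch.
    Phase 1: partition both inputs with h, so any R tuple can only
    match S tuples in the same-numbered partition.
    Phase 2: per partition, build an in-memory table on R keyed by
    a second hash (the dict's own), then probe it with S."""
    h = lambda k: hash(k) % n_parts
    r_parts = [[] for _ in range(n_parts)]
    s_parts = [[] for _ in range(n_parts)]
    for r in R:                                  # Phase 1: partition R
        r_parts[h(rk(r))].append(r)
    for s in S:                                  # Phase 1: partition S
        s_parts[h(sk(s))].append(s)
    out = []
    for i in range(n_parts):                     # Phase 2, one partition pair at a time
        table = {}
        for r in r_parts[i]:                     # build on R
            table.setdefault(rk(r), []).append(r)
        for s in s_parts[i]:                     # probe with S
            for r in table.get(sk(s), []):
                out.append((r, s))
    return out

R = [(1, 'a'), (2, 'b')]
S = [(1, 'x'), (2, 'y'), (3, 'z')]
joined = hash_join(R, S, rk=lambda t: t[0], sk=lambda t: t[0])
```

Because each partition pair is processed independently in Phase 2, the partitions can be assigned to different processors with no coordination beyond the initial repartitioning.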
30. Parallel Query Processing 1
- Essentially no synchronisation except setup and teardown
  - no barriers, cache coherence, etc.
- Database transactions work fine in parallel
  - data updated in place, with 2-phase locking transactions
  - replicas managed only at end of transaction via 2-phase commit
  - coarser grain, higher overhead than cache coherency.
31. Terminology
- Setup
  - Establish an active connection (add to the connection table).
- Teardown
  - Remove from the connection table.
- 2-phase locking
  - In the first phase, locks are acquired but may not be released.
  - In the second phase, locks are released but new locks may not be acquired.
  - Rationalises the acquiring and releasing of resources.
  - Serialises concurrent transactions.
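The two-phase discipline above can be sketched as a per-transaction guard; this is an illustration of the rule, not a lock manager (the class name is invented, and real 2PL also handles shared/exclusive modes and deadlock):

```python
class TwoPhaseLocker:
    """Enforces 2PL for one transaction: the first release ends
    the growing phase, after which no new lock may be acquired."""
    def __init__(self):
        self.held = set()
        self.shrinking = False
    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after first release")
        self.held.add(item)
    def release(self, item):
        self.shrinking = True        # growing phase is over for good
        self.held.discard(item)

txn = TwoPhaseLocker()
txn.acquire('a')
txn.acquire('b')
txn.release('a')                      # shrinking phase begins
```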
32. Parallel Query Processing 2
- Bandwidth is much more important than latency
  - often pump 1 - 1/n of a table through the network
  - aggregate network bandwidth should match aggregate disk bandwidth.
- Ordering of data flow is immaterial (relational model)
  - simplifies synchronisation, allows for work-sharing.
- Shared memory helps with skew
  - but distributed work queues may solve this.
33. Terminology
- Bandwidth
  - The data rate across a network.
- Latency
  - Delays in processing network data.
- Skew
  - Uneven distribution of data and/or workload across the disks and processors.
34. Disk Layout
- Where was the data to begin with?
  - Major effects on performance.
  - Algorithms run at the speed of the slowest disk!
- Disk placement
  - logical partitioning: hash, round-robin
  - declustering for availability and load balance
  - indices stored with their data.
- This task is typically left to the DBA.
35. Handling Skew
- For range partitioning,
  - sample the load on disks
  - cool hot (overloaded) disks by making their range smaller.
- For hash partitioning,
  - cool hot disks by mapping some buckets to others during query processing.
- Use hashing and assume uniformity
  - if range partitioning, sample the data and use a histogram to even out the distribution
  - in the application for the Shore Management Plan/River scheme, a work queue was used to balance load.
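The histogram idea can be sketched as an equi-depth bound chooser (the function name is illustrative): bounds are taken at equal-count quantiles of a sample, so a skewed value distribution still yields partitions of roughly equal size.

```python
def equidepth_bounds(sample, n_parts):
    """Choose range-partition bounds from a sorted sample so each
    partition receives about the same number of rows, regardless
    of how the key values are distributed."""
    srt = sorted(sample)
    return [srt[len(srt) * i // n_parts] for i in range(1, n_parts)]
```

A uniform sample yields evenly spaced bounds; on a skewed sample the bounds crowd into the dense region, shrinking the range of the hot partitions exactly as the slide prescribes.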
36. Query Optimisation
- Map SQL to a relational algebra tree, annotated with a choice of algorithms. Issues:
  - choice of access methods (indices, scans)
  - join ordering
  - join algorithms
  - post-processing (e.g. hash vs. sort for grouping and ordering).
- Typical scheme (courtesy of System R)
  - bottom-up dynamic-programming construction of the entire plan space
  - prune based on cost and selectivity estimation.
37. Parallel Query Optimisation
- More dimensions to the plan space:
  - degree of parallelism for each operator
  - scheduling
  - assignment of work to processors.
38. Parallel Query Scheduling 1
- Usage of a site by an isolated operator is given by (Tseq, W, V), where
  - Tseq is the sequential execution time (in the absence of parallelism) of the operator
  - W is a d-dimensional work vector over time-shared resources (e.g. disks, CPUs, network interfaces)
  - V is an s-dimensional demand vector over space-shared resources (e.g. memory buffers).
- A set of time-space pairs S = <(W1,V1), ..., (Wk,Vk)> is said to be compatible
  - if they can be executed together on a site
  - that is, the trade-off between time and space is satisfactory in practice.
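One simple reading of compatibility can be sketched as a vector capacity check; note that the capacity vectors and the "sum within capacity" criterion are assumptions made for this illustration, since the slide leaves "satisfactory in practice" open:

```python
def compatible(pairs, time_cap, space_cap):
    """Check whether a set of (W, V) pairs can be co-scheduled on one
    site, taken here to mean the component-wise sums of the work
    vectors and of the demand vectors stay within the site's
    capacity vectors (time_cap, space_cap are assumed inputs)."""
    d, s = len(time_cap), len(space_cap)
    w_tot = [sum(w[i] for w, _ in pairs) for i in range(d)]
    v_tot = [sum(v[j] for _, v in pairs) for j in range(s)]
    return all(w <= c for w, c in zip(w_tot, time_cap)) and \
           all(v <= c for v, c in zip(v_tot, space_cap))
```

In practice time-shared resources degrade gracefully when overcommitted while space-shared ones (memory buffers) are hard limits, so a real test may treat the two vector sums differently.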
39. Parallel Query Scheduling 2
- Algorithms in general: all algorithms in computing science are classified with respect to
  - time
  - space.
- Time is represented here by W, space by V.
- There are often trade-offs between time and space:
  - increase space (memory), reduce time
  - reduce space, increase time.
- Such trade-offs must be recognised in techniques for handling parallel databases.
40. Parallel Query Scheduling 3
- Challenges
  - capture dependencies among operators (simple)
  - pick a degree of parallelism for each operator (number of clones)
  - schedule clones to sites, under the constraint of compatibility.
- The solution is a mixture of
  - query plan understanding,
  - approximation algorithms for bin-packing,
  - modifications of dynamic-programming optimisation algorithms.
41. Moving Onward
- Parallelism and object-relational
  - Can you abandon the structure and keep the parallelism?
  - E.g. multi-dimensional objects, lists and array data, multimedia (usually arrays).
- Typical tricks include chunking and clustering, followed by sorting
  - i.e. try to apply set-like algorithms and put things right later.