Title: Access Paths
1Access Paths
Chapter 20
2Types of Associative Access Paths
- Primary index Given the primary key value, find
the tuple. - Secondary index Given the value of a non-unique
attribute, find all qualified tuples. - Range index Given the value range of some
attribute, find all tuples within that range. - Structure index Given some tuple, find all
structurally related tuples (CODASYL sets, object
hierarchies, etc.)
3Two Important Techniques
- Two basic techniques dominate in modern DBMSs
- Hashing Use a fixed transformation algorithm to
convert the attribute value into a database
address. - Tree searchA dynamical search structure is built
that guides the search for a given attribute
value to the proper database location. - Hashing supports primary indices only. Tree
search is more versatile.
4The Access Gap
- Accessing a tuple in buffer costs ca. 2000 instr.
Accessing it on disk takes 25 ms for I/O-related
activities. On a 20 MIPS machine, this translates
into 500,000 instructions. - Therefore, one can spend many instructions on an
algorithm that saves one I/O on the average. - For access paths, the dominant cost measure is
the number of different pages that need to be
accessed during the search.
5Hashing
6Folding vs. Hashing
- Folding is used to turn an attribute value of
arbitrary length and arbitrary data type in to an
unsigned integer the maximum length of which is
determined by the instruction set. - Hashing is used to transform the result of
folding into the address of a page that
(probably) holds the tuple with the specified key
value.
7Requirement
- A good hash function H has to map primary key
values, which are very unevenly distributed over
a large value range, into a tuple address space
that is much smaller (proportional to the number
of existing tuples), such that the resulting
addresses are evenly distributed over the address
range.
8Parameters of the Hash Function
- K This is the (folded) key value. It varies
between 0 and 232 - 1. - B Number of pages to be allocated for the file.
Depends on the number of tuples expected. - H Hash function that performs a mapping of
- (0, 232-1) -gt (0, B-1).
9Consequences of the Approach
- Contiguous allocation All B pages must allocated
in physical contiguity, because the relative
addresses vary between 0 and B-1. - Fixed size The file size must be determined upon
creation time, because changing the size (i.e.
changing B) means changing the hash function.
This in turn requires a complete reorganization.
10Requirements for a Hash Function
- Hash-based allocation assumes that it is possible
to estimate the number of tuples T that the
relation will have, and that this estimate is not
drastically exceeded. - If a block has length B, and a tuple has an
average length of L bytes, then we need at least
S é T / (ëB/Lû) ù blocks to store the T
tuples. - The required number of blocks (S) is allocated
before the first tuple is stored. It is a good
idea to allocate some more blocks (S gt S) to
allow for unexpected growth. - Then a hash function H is defined, which takes in
the value of the primary key k of the relation
and converts it into a number between 1 and S
this is the block number where the tuple is to be
stored. If K is the set of possible values for
the primary key we have - H K 1, 2, , S
- The set of potential values for the primary key
attribute will be much larger than the number of
blocks allocated (think, e.g., of ISBNs for the
books relation). So the hash function is a
compacting function. For each primary key value,
there is exactly one block it is mapped to. Many
different primary key values are mapped to the
same block (n1 relationship).
11Properties of Hash Functions
- A hash function must be easy to compute and must
not require access to any blocks in the database. - It must be able to map a generally very large set
of potential primary key values (remember
primary keys can be constructed by concatenating
several attributes of a relation) into a
comparatively small set of block numbers in which
the tuples will be stored. - It must be able to take in primary key values of
different data types (integer, binary, decimal,
character, etc.) and map them to the set of
integers between 1 and S with equal efficiency. - The formula for estimating S based on the number
of tuples, average tuple length and block length
implicitly assumes that all blocks are equally
filled, i.e. that the same number of tuples is
mapped to each block. This is the most difficult
requirement, because the primary key values in
general are not equally distributed over their
value range Some parts of the value range are
used, others are not used at all, keys are
generated by some regular mechanism, etc. - To achieve this hashing property, different
methods exist table look-up, base conversion,
folding, encryption, division by prime numbers,
etc.
12A Popular Hash Function
- A hash function that most database systems use as
a default (if the user does not specify one) is
defined as - H(k) k mod d 1
- This requires that k is a positive integer. If
the primary key attribute of the relation does
not have the data type integer - it could, for
example, be a name - then it has to be turned
into an integer first. The usual way of doing
this is to fold the binary representation of
the key value such that its length does not
exceed 32 bits. Then these 32 bits are
interpreted as an integer if it is negative, it
is multiplied by -1 one. Details of folding are
omitted here. - For H to be a good hash function, d must be a
large enough prime number this is explained by
detailed number theoretic analyses. We also must
make sure that the number of blocks allocated is
about 25 larger than the minimum requirement. - Summing it up We first compute S. Then we
compute S 1.25S. Then we compute d
next_higher_prime (S). H(k) will then determine
the block for each tuple based on the primary key
value.
13Average Number of Overflow Pages
14Hashing for Non-Unique Attributes
- Let V denote the number of different attribute
values. Then we can distinguish three cases - V T The attribute is almost unique a good
hash function should work in that case. - V gt B There are more values than buckets. Can be
made work, but some buckets may get much higher
utilization than others. - V lt B This is the case where hashing cannot be
used.
15Overflow Handling
16Hashing Summary
- For unique attributes, hashing can yield
one-access retrieval. - It is critical to find a good hash function to
reduce collisions. - If the original estimate of the file size is
wrong, reorganization is inevitable. - Synchronization at the page level is done using
standard crabbing techniques. - Hashing does not support range queries.
17B-Trees
- B-Trees consist of two types of nodes
- Leaf nodes The contain the data, i.e. the tuples
or pointers to the tuples (TIDs). - Index nodes Index nodes contain reference keys
to direct the search towards the leaves. The data
structure looks like this - struct char KeyValue
- PAGEID PointerToNextNode
- index_node_structure
18Rules for Index Nodes
- Key values are in sorted order K0 K1 ...
Ki ... Kf (f is max. capacity of a node). - For any two adjacent key values Ki, Ki1 the
pointer Pi points to a node covering all values
in the interval (Ki, Ki1. - If a search for value v arrives at an index node,
the next node to be visited is pointed to by Pi
such that Ki v lt Ki1. - K0 is an arbitrary low value (smaller than
anything else in that node).
19Properties of a B-Tree
- Parameter f is called the fan-out of the tree.
- The number of nodes visited from the root to a
leaf is called the height of the tree. - A B-tree is always perfectly balanced, i.e. the
height is the same for all leaves. - Storage utilization is at least 50 for all nodes
except for the root. - Average storage utilization is close to 70.
20A Simple B-Tree
21Some Observations
- B-trees also work for non-unique attributes
implementational optimizations will be discussed
later on. - The reference keys in the index nodes can be
different from all real key values in the
leaves they only have to guide the search
correctly. - The key values at the leaf level are sorted in
ascending order this supports range queries.
22Inserting Into a B-Tree
23Growing a B-Tree
- If the insert leaf is full, allocate a new node,
distribute the values (the new one sorted in
place) evenly across the old leaf and the new
node, move the lowest key value of the new node
up to the index node. - If that index node is full, split it in the same
way. - If the root has to be split Allocate two new
nodes, distribute the key values evenly over
them, put the reference key in the root.
24Deleting Tuples From a B-Tree
- To maintain the space utilization guarantees, a
leaf that becomes under-utilized (lt 50) would
have to be merged with its neighbours. - This is a very costly operation in particular,
synchronization at the page level is very
complicated. - Therefore, most systems let nodes become empty
and discard them when that happens. - Analyses show that this does not deteriorate the
overall B-tree performance.
25Non-Unique Attributes
26The Basic Formula of B-Tree-Performance
- With the N number of tuples, C average
number of entries in a leaf, F average number of
entries in an index node, the height H of a
B-tree is - H 1 élog (é N/Cù)ù
F
27Some Performance Figures
28Tuples in the Leaves?
- Assuming a tuple is x times longer than a TID, we
get the following estimate - 1 logF (N/(xC)) 1.1 1 logF
(N/C). - This transforms into
- 1.1 logF x
- When this holds, moving the tuples out of the
- leaves improves performance.
29Key Compression Suffix Compression
30Synchronization on B-Trees What Is the Problem?
- B-Trees are fully redundant structures, which can
be reconstructed from the tuples therefore, no
synchronization should be required at all. - However, some queries operate on the index only.
This requires all operations on B-trees to be
serializable with the operations on the tuples. - Standard two-phase locking with the nodes as the
objects is not feasible for performance reasons.
31Protecting Tree Traversal
1. semaphore on Q
Node Q at level i
search path
2. follow search path
3. semaphore on R
Node R at level i1
4. release sem. on Q
32B-Trees and Value Locks
33Making Lock Names
- To implement value locking, we need to build
- lock names according to the following rule
- LockN TableName, IndexName, KeyValue.
- KeyValue in turn is a composite
- KeyValue AttributeValue, TupleIdentifier.
34Key Range Locking an B-Trees
- Simple retrieval k c
- Get a semaphore on leaf page get S-lock on
key range defined by largest existing value c1
with - c1 c hold lock until commit.
35Key Range Locking an B-Trees
- Range retrieval c1 lt k lt c2
- Get s semaphore on first leaf page get S-lock
on key defined by largest existing value c3 with
c3 lt c1 proceed sequentially along leaf level
request key range S-lock for each new attribute
value up to and including c2 do careful crabbing
across leaf pages hold S-lock until commit.
36Key Range Locking an B-Trees
- Insert c, ki
- Get X-semaphore on leaf page find largest
existing value c1 with c1 lt c request instant
IX-lock on c1 request long X-lock on c.
37Key Range Locking an B-Trees
Delete c, kd Get X-semaphore on leaf page
find largest existing value c1 with c1 lt c
request long IX-lock on c else request long
X-lock on c and c1.
38B-Tree Recovery
39B-Tree Recovery Based on Physiological Logging
- Cover all B-tree operations with semaphores on
all affected pages. - For each logical update a log record with the
logical UNDO operation must be moved to the log - While the update operation is being performed,
physical REDO log records are written. - After all REDO records are safely in the log, the
exclusive semaphores can be released.
40The Two Phases of B-Tree-Recovery
- Phase1 Go forward through the log up to its
current end, applying all REDO records to the
tree. - Phase2 Go backward to the Begin of transaction
record of the oldest incomplete transaction,
executing the UNDO operations on the tree for all
losers along the way.
41Other Access Path MethodsExtendible Hashing
42New Techniques
- Grid files Symmetric multi-dimensional point
access. Can become very unbalanced depending on
correlation in the data. - R-Trees Symmetric multi-dimensional access. Can
deteriorate depending on insertion strategy. - hb-Tree Symmetric multi-dimensional access. Can
turn into a DAG depending on deletion order.