Title: Reading and Review Chapter 12: Indexing and Hashing
1. Reading and Review: Chapter 12 Indexing and Hashing
- 12.1 Basic Concepts
- 12.2 Ordered Indices
- 12.3 B+-Tree Index Files
- lots of stuff
- 12.4 B-tree Index Files
- 12.5 Static Hashing
- lots of stuff
- 12.6 Dynamic Hashing (Extendable Hashing)
- lots of stuff
- 12.7 Comparison of Ordered Indexing and Hashing
- 12.9 Multiple-Key Access
- Multiple-key indices, Grid files, Bitmap Indices
- 12.10 Summary
2. Reading and Review: Chapter 13 Query Processing
- 13.1 Overview
- 13.2 Measures of Query Cost
- 13.3 Selection Operation
- 13.4 Sorting (Sort-Merge Algorithm)
- 13.5 Join Operation
- Nested-loop Join (regular, block, indexed)
- Merge Join
- Hash Join
- Complex Joins
- 13.6 Other Operations
- 13.7 Evaluation of Expressions
- 13.8 Summary
3. Reading and Review: Chapter 14 Query Optimization
- 14.1 Overview
- 14.2 Estimating Statistics of Expression Results
- Catalog information
- Selection size estimation
- Join size estimation
- Size estimation for other operations
- Estimation of number of distinct values
- 14.3 Transformation of Relational Expressions
- Equivalence rules and examples
- Join ordering, enumeration of equivalent expressions
- 14.4 Choice of Evaluation Plans
- from 14.4 on, ignore any section with a ** at the end of the section title
- 14.6 Summary
4. Reading and Review: Chapter 15 Transaction Management
- 15.1 Transaction Concept
- ACID properties
- 15.2 Transaction State
- state diagram of a transaction
- (15.3 Implementation of Atomicity and Durability)
- not important -- we covered this material better in section 17.4
- 15.4 Concurrent Execution and Scheduling
- 15.5 Serializability (Conflict Serializability)
- (ignore 15.5.2 View Serializability)
- 15.6 Recoverability
- (15.7 Isolation -- not important, see 16.1 on locking instead)
- 15.9 Testing for Serializability (Precedence Graphs)
- 15.10 Summary
5. Reading and Review: Chapters 16 and 17
- 16.1 Lock-Based Protocols
- granting locks
- deadlocks
- two-phase locking protocol
- ignore 16.1.4 and 16.1.5
- 16.6.3 Deadlock Detection and Recovery
- Wait-for Graph
- 17.1, 17.2.1 -- skim these sections. The material should be familiar to you as background, but I won't be testing on definitions or memorization of it
- 17.4 Log-Based Recovery
- logs, redo/undo, basic concepts
- checkpoints
6. Indexing and Hashing: Motivation
- Query response speed is a major issue in database design
- Some queries only need to access a very small proportion of the records in a database
- Find all accounts at the Perryridge bank branch
- Find the balance of account number A-101
- Find all accounts owned by Zephraim Cochrane
- Checking every single record for the queries above is very inefficient and slow
- To allow fast access for those sorts of queries, we create additional structures that we associate with files: indices (index files)
7. Basic Concepts
- Indexing methods are used to speed up access to desired data
- e.g., the author card catalog in a library
- Search key -- an attribute or set of attributes used to look up records in a file. This use of the word key differs from that used before in class.
- An index file consists of records (index entries) of the form (search-key, pointer)
- where the pointer is a link to a location in the original file
- Index files are typically much smaller than the original file
- Two basic types of index
- ordered indices: search keys are stored in sorted order
- hash indices: search keys are distributed uniformly across buckets using a hash function
8. Index Evaluation Metrics
- We will look at a number of techniques for both ordered indexing and hashing. No one technique is best in all circumstances -- each has its advantages and disadvantages. The factors that can be used to evaluate different indices are
- Access types supported efficiently
- Finding records with a specified attribute value
- Finding records with attribute values within a specified range of values
- Access (retrieval) time: finding a single desired tuple in the file
- Insertion time
- Deletion time
- Update time
- Space overhead for the index
9. Index Evaluation Metrics (2)
- Speed issues are
- Access time
- Insertion time
- Deletion time
- Update time
- Access is the operation that occurs the most frequently, because it is also used for insert, delete, and update
- For insert, delete, and update operations we must consider not only the time for the operation itself (inserting a new record into the file) but also any time required to update the index structure to reflect changes
- We will often want to have more than one index for a file
- e.g., card catalogs for author, subject, and title in a library
10. Ordered Indices
- An ordered index stores the values of the search keys in sorted order
- records in the original file may themselves be stored in some sorted order (or not)
- the original file may have several indices, on different search keys
- when there are multiple indices, a primary index is an index whose search key also determines the sort order of the original file. Primary indices are also called clustering indices
- secondary indices are indices whose search key specifies an order different from the sequential order of the file. Also called non-clustering indices
- an index-sequential file is an ordered sequential file with a primary index
11. Dense and Sparse Indices
- A dense index is one where an index record appears for every search-key value in the file
- A sparse index contains index records for only some search-key values
- applicable when records are sequentially ordered on the search key
- to locate a record with search-key value K we must (see the sketch below)
- find the index record with the largest search-key value ≤ K
- search the file sequentially starting at the location pointed to
- stop (fail) when we hit a record with search-key value > K
- less space and less maintenance overhead for insert, delete
- generally slower than a dense index for locating records
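A minimal sketch of that sparse-index lookup, assuming the index is a sorted list of (search-key, block-number) pairs and the data file is a list of blocks sorted on the search key (all names and structures here are illustrative, not from the text):

```python
import bisect

def sparse_index_lookup(index, blocks, key):
    """index: sorted list of (search_key, block_no); blocks: list of blocks,
    each a list of (record_key, record) pairs sorted on the search key.
    Returns the matching records, or [] if the search fails."""
    keys = [k for k, _ in index]
    # index record with the largest search-key value <= key
    pos = bisect.bisect_right(keys, key) - 1
    if pos < 0:
        return []                       # key is smaller than every indexed value
    _, block_no = index[pos]
    matches = []
    # search the file sequentially starting at the block pointed to
    for block in blocks[block_no:]:
        for record_key, record in block:
            if record_key == key:
                matches.append(record)
            elif record_key > key:      # stop (fail) once we pass the key
                return matches
    return matches
```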
12. Example of a Dense Index
13. Example of a Sparse Index
14. Problems with Index-Sequential Files
- Retrieval: search until the key value or a larger key value is found
- individual key access: BAD
- scanning the file in order of the key: GOOD
- Insert is hard -- all higher-key records must be shifted to place the new record
- Delete may leave holes
- Update is equivalent to a combined delete and insert
- updating a search-key field may cause the combined disadvantages of an insert and a delete by shifting the record's location in the sorted file
- differential files are often used to hold the recent updates until the database can be reorganized off-line
15. Multi-level Index
- If the primary index does not fit in memory, access becomes expensive (disk reads cost)
- To reduce the number of disk accesses to index records, we treat the primary index kept on disk as a sequential file and construct a sparse index on it
- If the outer index is still too large, we can create another level of index to index it, and so on
- Indices at all levels must be updated on insertion or deletion from the file
16. Index Update: Deletion
- When deletions occur in the primary file, the index will sometimes need to change
- If the deleted record was the only record in the file with its particular search-key value, the search-key is deleted from the index also
- Single-level index deletion
- dense indices -- deletion of the search-key is similar to record deletion
- sparse indices -- if an entry for the search key exists in the index, it is replaced by the next search-key value in the original file (taken in search-key order)
- Multi-level index deletion is a simple extension of the above
17. Index Update: Insertion
- As with deletions, when insertions occur in the primary file, the index will sometimes need to change
- Single-level index insertion
- first perform a lookup using the search-key value appearing in the record to be inserted
- dense indices -- if the search-key value does not appear in the index, insert it
- sparse indices -- depends upon the design of the sparse index. If the index is designed to store an entry for each block of the file, then no change is necessary to the index unless the insertion creates a new block (if the old block overflows). If that happens, the first search-key value in the new block is inserted in the index
- multi-level insertion is a simple extension of the above
18. Secondary Index: Motivation
- Good database design is to have an index to handle all common (frequently requested) queries
- Queries based upon values of the primary key can use the primary index
- Queries based upon values of other attributes will require other (secondary) indices
19. Secondary Indices vs. Primary Indices
- Secondary indices must be dense, with an index entry for every search-key value, and a pointer to every record in the file. A primary index may be dense or sparse. (Why?)
- A secondary index on a candidate key looks just like a dense primary index, except that the records pointed to by successive values of the index are not sequential
- when a primary index is not on a candidate key it suffices if the index points to the first record with a particular value for the search key, as subsequent records can be found with a sequential scan from that point
20. Secondary Indices vs. Primary Indices (2)
- When the search key of a secondary index is not a candidate key (i.e., there may be more than one tuple in the relation with a given search-key value) it isn't enough to just point to the first record with that value -- other records with that value may be scattered throughout the file. A secondary index must contain pointers to all the records
- so an index record points to a bucket that contains pointers to every record in the file with that particular search-key value
21. Secondary Index on balance
22. Primary and Secondary Indices
- As mentioned earlier
- secondary indices must be dense
- Indices offer substantial benefits when searching for records
- When a file is modified, every index on the file must be updated
- this overhead imposes limits on the number of indices
- relatively static files will reasonably permit more indices
- relatively dynamic files make index maintenance quite expensive
- Sequential scan using a primary index is efficient; sequential scan using secondary indices is very expensive
- each record access may fetch a new block from disk (oy!)
23. Disadvantages of Index-Sequential Files
- The main disadvantage of the index-sequential file organization is that performance degrades as the file grows, both for index lookups and for sequential scans through the data. Although this degradation can be remedied through file reorganization, frequent reorgs are expensive and therefore undesirable.
- Next we'll start examining index structures that maintain their efficiency despite insertion and deletion of data: the B+-tree (section 12.3)
24. B+-Tree Index Files
- The main disadvantage of ISAM files is that performance degrades as the file grows, creating many overflow blocks and the need for periodic reorganization of the entire file
- B+-trees are an alternative to indexed-sequential files
- used for both primary and secondary indexing
- B+-trees are a multi-level index
- B+-tree index files automatically reorganize themselves with small local changes on insertion and deletion
- No reorganization of the entire file is required to maintain performance
- disadvantages: extra insertion, deletion, and space overhead
- advantages outweigh disadvantages: B+-trees are used extensively
25. B+-Tree Index Files (2)
- Definition: a B+-tree of order n has
- All leaves at the same level
- balanced tree (the B in the name stands for balanced)
- logarithmic performance
- root has between 1 and n-1 keys
- all other nodes have between n/2 and n-1 keys (> 50% space utilization)
- we construct the tree with order n such that one node corresponds to one disk block I/O (in other words, each disk page read brings up one full tree node)
26. B+-Tree Index Files (3)
- A B+-tree is a rooted tree satisfying the following properties
- All paths from the root to a leaf are the same length
- Search for an index value takes time according to the height of the tree (whether successful or unsuccessful)
27. B+-Tree Node Structure
- The B+-tree is constructed so that each node (when full) fits on a single disk page
- parameters: B = size of a block in bytes (e.g., 4096)
- K = size of a key in bytes (e.g., 8)
- P = size of a pointer in bytes (e.g., 4)
- an internal node must have n such that (n-1)·K + n·P ≤ B
- n ≤ (B+K)/(K+P)
- with the example values above, this becomes
- n ≤ (4096+8)/(8+4) = 4104/12, so n ≤ 342 (see the check below)
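The arithmetic can be checked in a couple of lines; the block, key, and pointer sizes are just the example values from this slide:

```python
B, K, P = 4096, 8, 4           # block, key, and pointer sizes in bytes
# A node with n pointers and n-1 keys must fit in one block: (n-1)*K + n*P <= B
n_max = (B + K) // (K + P)     # largest n satisfying the inequality
print(n_max)                   # 342: each node can hold up to 342 pointers
```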
28. B+-Tree Node Structure (2)
- Typical B+-tree node
- the Ki are the search-key values
- the Pi are the pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes)
- the search keys in a node are ordered
- K1 < K2 < K3 < ... < Kn-1
29. Non-Leaf Nodes in B+-Trees
- Non-leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with n pointers
- all the search keys in the subtree to which P1 points are less than K1
- for 2 ≤ i ≤ n-1, all the search keys in the subtree to which Pi points have values greater than or equal to Ki-1 and less than Ki
30. Leaf Nodes in B+-Trees
- As mentioned last class, primary indices may be sparse indices. So B+-trees constructed on a primary key (that is, where the search-key order corresponds to the sort order of the original file) can have the pointers of their leaf nodes point to an appropriate position in the original file that represents the first occurrence of that key value.
- Secondary indices must be dense indices. B+-trees constructed as a secondary index must have the pointers of their leaf nodes point to a bucket storing all locations where a given search-key value occurs; this set of buckets is often called an occurrence file
31. Example of a B+-Tree
- B+-tree for the account file (n = 3)
32. Another Example of a B+-Tree
- B+-tree for the account file (n = 5)
- Leaf nodes must have between 2 and 4 values
- (⌈(n-1)/2⌉ and n-1, with n = 5)
- Non-leaf nodes other than the root must have between 3 and 5 children
- (⌈n/2⌉ and n, with n = 5)
- Root must have at least 2 children
33. Observations about B+-Trees
- Since the inter-node connections are done by pointers, logically close blocks need not be physically close
- The non-leaf levels of the B+-tree form a hierarchy of sparse indices
- The B+-tree contains a relatively small number of levels (logarithmic in the size of the main file), thus searches can be conducted efficiently
- Insertions and deletions to the main file can be handled efficiently, as the index can be restructured in logarithmic time (as we shall examine later in class)
34. Queries on B+-Trees
- Find all records with a search-key value of k (see the sketch below)
- start with the root node (assume it has m pointers)
- examine the node for the smallest search-key value > k
- if we find such a value, say at Kj, follow the pointer Pj to its child node
- if no such value exists, then k ≥ Km-1, so follow Pm
- if the node reached is not a leaf node, repeat the procedure above and follow the corresponding pointer
- eventually we reach a leaf node. If we find a matching key value (our search value k = Ki for some i) then we follow Pi to the desired record or bucket. If we find no matching value, the search is unsuccessful and we are done.
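A sketch of that lookup procedure. The node layout (a sorted key list plus a pointer list holding children for internal nodes and record/bucket pointers for leaves) and the field names are assumptions for illustration:

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, pointers, is_leaf):
        self.keys = keys          # sorted search-key values K1 < K2 < ...
        self.pointers = pointers  # children (internal) or record/bucket ptrs (leaf)
        self.is_leaf = is_leaf

def bplus_lookup(root, k):
    """Trace one root-to-leaf path and return the record/bucket pointer for k,
    or None if the search is unsuccessful."""
    node = root
    while not node.is_leaf:
        # position of the smallest key > k, which is also the child to follow:
        # keys < K1 go to P1, keys in [Ki-1, Ki) go to Pi, keys >= Km-1 go to Pm
        i = bisect_right(node.keys, k)
        node = node.pointers[i]
    for i, key in enumerate(node.keys):   # exact-match check in the leaf
        if key == k:
            return node.pointers[i]
    return None
```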
35. Queries on B+-Trees (2)
- Processing a query traces a path from the root node to a leaf node
- If there are K search-key values in the file, the path is no longer than ⌈log⌈n/2⌉(K)⌉
- A node is generally the same size as a disk block, typically 4 kilobytes, and n is typically around 100 (40 bytes per index entry)
- With 1 million search-key values and n = 100, at most ⌈log50(1,000,000)⌉ = 4 nodes are accessed in a lookup
- In a balanced binary tree with 1 million search-key values, around 20 nodes are accessed in a lookup
- the difference is significant, since every node access might need a disk I/O, costing around 20 milliseconds
36. Insertion on B+-Trees
- Find the leaf node in which the search-key value would appear
- If the search-key value is already present, add the record to the main file and (if necessary) add a pointer to the record to the appropriate occurrence-file bucket
- If the search-key value is not there, add the record to the main file as above (including creating a new occurrence-file bucket if necessary). Then
- if there is room in the leaf node, insert (key-value, pointer) in the leaf node
- otherwise, overflow: split the leaf node (along with the new entry)
37. Insertion on B+-Trees (2)
- Splitting a node (see the sketch below)
- take the n (search-key-value, pointer) pairs, including the one being inserted, in sorted order. Place half in the original node, and the other half in a new node.
- Let the new node be p, and let k be the least key value in p. Insert (k, p) in the parent of the node being split.
- If the parent becomes full by this new insertion, split it as described above, and propagate the split as far up as necessary
- The splitting of nodes proceeds upwards until a node that is not full is found. In the worst case the root node may be split, increasing the height of the tree by 1.
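A sketch of just the leaf-split step, operating on plain lists of (key, pointer) pairs; updating the parent (and any cascading splits) is left to the caller, and the function name is illustrative:

```python
def split_leaf(entries, new_entry):
    """entries: the full leaf's (key, pointer) pairs; new_entry: the pair being
    inserted.  Returns (left_entries, right_entries, separator_key): the
    separator is the least key in the new right node and is the value that
    gets inserted into the parent."""
    all_entries = sorted(entries + [new_entry], key=lambda e: e[0])
    mid = (len(all_entries) + 1) // 2      # half stay, half move to the new node
    left, right = all_entries[:mid], all_entries[mid:]
    return left, right, right[0][0]

# e.g. a full leaf of 4 entries plus one new entry splits 3 / 2
print(split_leaf([(5, "p5"), (10, "p10"), (20, "p20"), (30, "p30")], (12, "p12")))
```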
38. Insertion on B+-Trees: Example
39. Deletion on B+-Trees
- Find the record to be deleted, and remove it from the main file and the bucket (if necessary)
- If there is no occurrence-file bucket, or if the deletion caused the bucket to become empty, then delete (key-value, pointer) from the B+-tree leaf node
- If the leaf node now has too few entries, underflow has occurred. If the active leaf node has a sibling with few enough entries that the combined entries can fit in a single node, then
- combine all the entries of both nodes in a single one
- delete the (K, P) pair pointing to the deleted node from the parent. Follow this procedure recursively if the parent node underflows.
40. Deletion on B+-Trees (2)
- Otherwise, if no sibling node is small enough to combine with the active node without causing overflow, then
- Redistribute the pointers between the active node and the sibling so that both of them have sufficient pointers to avoid underflow
- Update the corresponding search-key value in the parent node
- No deletion occurs in the parent node, so no further recursion is necessary in this case
- Deletions may cascade upwards until a node with ⌈n/2⌉ or more pointers is found. If the root node has only one pointer after deletion, it is removed and the sole child becomes the root (reducing the height of the tree by 1)
41. Deletion on B+-Trees: Example 1
42. Deletion on B+-Trees: Example 2
43. Deletion on B+-Trees: Example 3
44. B+-Tree File Organization
- B+-tree indices solve the problem of index-file degradation. The original data file will still degrade upon a stream of insert/delete operations.
- Solve data-file degradation by using a B+-tree file organization
- Leaf nodes in a B+-tree file organization store records, not pointers into a separate original data file
- since records are larger than pointers, the maximum number of records that can be stored in a leaf node is less than the number of pointers in a non-leaf node
- leaf nodes must still be maintained at least half full
- insert and delete are handled in the same way as insert and delete for entries in a B+-tree index
45. B+-Tree File Organization: Example
- Records are much bigger than pointers, so good space usage is important
- To improve space usage, involve more sibling nodes in redistribution during splits and merges (to avoid a split/merge when possible)
- involving one sibling guarantees 50% space use
- involving two guarantees at least 2/3 space use, etc.
46. B-Tree Index Files
- B-trees are similar to B+-trees, but search-key values appear only once in the index (eliminating redundant storage of key values)
- search keys in non-leaf nodes don't appear in the leaf nodes, so an additional pointer field for each search key in a non-leaf node must be stored to point to the bucket or record for that key value
- leaf nodes look like B+-tree leaf nodes
- (P1, K1, P2, K2, ..., Pn)
- non-leaf nodes look like so
- (P1, B1, K1, P2, B2, K2, ..., Pn)
- where the Bi are pointers to buckets or file records
47. B-Tree Index File: Example
48. B-Tree Index Files (cont.)
- Advantages of B-tree indices (vs. B+-trees)
- May use fewer tree nodes than a B+-tree on the same data
- Sometimes possible to find a specific key value before reaching a leaf node
- Disadvantages of B-tree indices
- Only a small fraction of key values are found early
- Non-leaf nodes are larger, so fanout is reduced, and B-trees may be slightly taller than B+-trees on the same data
- Insertion and deletion are more complicated than on B+-trees
- Implementation is more difficult than for B+-trees
- In general, the advantages don't outweigh the disadvantages
49. Hashing
- We've examined ordered indices (designs based upon sorting or ordering search-key values); the other major type of indexing technique is hashing
- The underlying concept is very simple
- observation: small files don't require indices or complicated search methods
- use some clever method, based upon the search key, to split a large file into a lot of little buckets
- each bucket is sufficiently small
- use the same method to find the bucket for a given search key
50. Hashing Basics
- A bucket is a unit of storage containing one or more records (typically a bucket is one disk block in size)
- In a hash file organization we find the bucket for a record directly from its search-key value using a hash function
- A hash function is a function that maps from the set of all search-key values K to the set of all bucket addresses B
- The hash function is used to locate records for access, insertion, and deletion
- Records with different search-key values may be mapped to the same bucket
- the entire bucket must be searched to find a record
- buckets are designed to be small, so this task is usually not onerous
51. Hashed File Example
- So we
- divide the set of disk blocks that make up the file into buckets
- devise a hash function that maps each key value into a bucket
- V = set of key values
- B = number of buckets
- H = hashing function, H: V → {0, 1, 2, 3, ..., B-1}
- Example: V = 9-digit SSN, B = 1000, H = key modulo 1000
52. Hash Functions
- To search/insert/delete/modify a key, do
- compute H(k) to get the bucket number
- search sequentially in the bucket (heap organization within each bucket)
- Choosing H: almost any function that generates random numbers in the range [0, B-1]
- try to distribute the keys evenly into the B buckets
- one rule of thumb when using MOD -- use a prime number of buckets (see the sketch below)
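A sketch of the bucket computation and the per-bucket sequential search, using the modulo rule of thumb just mentioned (the bucket count and structure are illustrative assumptions):

```python
B = 997                                   # number of buckets; a prime near 1000

def h(key: int) -> int:
    """Map a numeric search-key value to a bucket number in 0..B-1."""
    return key % B

def lookup(buckets, key):
    """buckets: list of B lists of (key, record) pairs (heap organization
    within each bucket).  Only the single bucket H(key) is searched."""
    return [rec for k, rec in buckets[h(key)] if k == key]

buckets = [[] for _ in range(B)]
buckets[h(123456789)].append((123456789, "account A-101"))
print(lookup(buckets, 123456789))
```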
53. Hash Functions (2)
- A collision is when two or more key values go to the same bucket
- too many collisions increases search time and degrades performance
- no or few collisions means that each bucket has only one (or very few) key(s)
- Worst-case hash functions map all search keys to the same bucket
54. Hash Functions (3)
- Ideal hash functions are uniform
- each bucket is assigned the same number of search-key values from the set of all possible values
- Ideal hash functions are random
- each bucket has approximately the same number of records assigned to it irrespective of the actual distribution of search-key values in the file
- Finding a good hash function is not always easy
55. Examples of Hash Functions
- Given 26 buckets and a string-valued search key, consider the following possible hash functions
- Hash based upon the first letter of the string
- Hash based upon the last letter of the string
- Hash based upon the middle letter of the string
- Hash based upon the most common letter in the string
- Hash based upon the average letter in the string: the sum of the letters (using A=0, B=1, etc.) divided by the number of letters
- Hash based upon the length of the string (modulo 26)
- Typical hash functions perform computation on the internal binary representation of the search key
- example: searching on a string value, hash based upon the binary sum of the characters in the string, modulo the number of buckets (sketched below)
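The last example, as a sketch: sum the character codes of the string and take the result modulo the number of buckets (26 here, to match the slide's setup):

```python
NUM_BUCKETS = 26

def string_hash(s: str) -> int:
    """Sum of the character codes, modulo the number of buckets.
    Simple, but anagrams collide and short strings cluster in low buckets,
    which is why real systems use better mixing functions."""
    return sum(ord(c) for c in s) % NUM_BUCKETS

print(string_hash("Perryridge"), string_hash("Brighton"))
```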
56. Overflow
- Overflow is when an insertion into a bucket can't occur because the bucket is full
- Overflow can occur for the following reasons
- too many records (not enough buckets)
- poor hash function
- skewed data
- multiple records might have the same search key
- multiple search keys might be assigned the same bucket
57. Overflow (2)
- Overflow is handled by one of two methods
- chaining of multiple blocks in a bucket, by attaching a number of overflow buckets together in a linked list
- double hashing: use a second hash function to find another (hopefully non-full) bucket
- in theory we could use the next bucket that had space; this is often called open hashing or linear probing, and is often used to construct symbol tables for compilers
- useful where deletion does not occur
- deletion is very awkward with linear probing, so it isn't useful in most database applications
58. Hashed File Performance Metrics
- An important performance measure is the loading factor
- (number of records)/(B · f)
- B is the number of buckets
- f is the number of records that will fit in a single bucket
- when the loading factor is too high (the file becomes too full), double the number of buckets and rehash
59. Hashed File Performance
- (Assume that the hash table is in main memory)
- Successful search: best case 1 block; worst case every chained bucket; average case half of the worst case
- Unsuccessful search: always hits every chained bucket (best case, worst case, average case)
- With a loading factor around 90% and a good hashing function, the average is about 1.2 blocks
- Advantage of hashing: very fast for exact queries
- Disadvantage: records are not sorted in any order. As a result, it is effectively impossible to do range queries
60. Hash Indices
- Hashing can be used for index-structure creation as well as for file organization
- A hash index organizes the search keys (and their record pointers) into a hash file structure
- strictly speaking, a hash index is always a secondary index
- if the primary file was stored using the same hash function, an additional, separate primary hash index would be unnecessary
- We use the term hash index to refer both to secondary hash indices and to files organized using hashing file structures
61. Example of a Hash Index
- Hash index into the file account, on search key account-number
- The hash function computes the sum of the digits in the account number, modulo 7
- Bucket size is 2
62. Static Hashing
- We've been discussing static hashing: the hash function maps search-key values to a fixed set of buckets. This has some disadvantages
- databases grow with time. Once buckets start to overflow, performance will degrade
- if we attempt to anticipate some future file size and allocate sufficient buckets for that expected size when we build the database initially, we will waste lots of space
- if the database ever shrinks, space will be wasted
- periodic reorganization avoids these problems, but is very expensive
- By using techniques that allow us to modify the number of buckets dynamically (dynamic hashing) we can avoid these problems
- Good for databases that grow and shrink in size
- Allows the hash function to be modified dynamically
63. Dynamic Hashing
- One form of dynamic hashing is extendable hashing
- the hash function generates values over a large range -- typically b-bit integers, with b being something like 32
- At any given moment, only a prefix of the hash value is used to index into a table of bucket addresses
- With the prefix length at a given moment being j, with 0 ≤ j ≤ 32, the bucket address table size is 2^j
- The value of j grows and shrinks as the size of the database grows and shrinks
- Multiple entries in the bucket address table may point to the same bucket
- Thus the actual number of buckets is ≤ 2^j
- the number of buckets also changes dynamically due to coalescing and splitting of buckets
64. General Extendable Hash Structure
65. Use of Extendable Hash Structure
- Each bucket j stores a value i_j; all the entries that point to the same bucket have the same values on the first i_j bits
- To locate the bucket containing search key Kj
- compute H(Kj) = X
- use the first i high-order bits of X as a displacement into the bucket address table and follow the pointer to the appropriate bucket
- To insert a record with search-key value Kj
- follow the lookup procedure to locate the bucket, say j
- if there is room in bucket j, insert the record
- otherwise the bucket must be split and the insertion reattempted
- in some cases we use overflow buckets instead (as explained shortly)
66. Splitting in Extendable Hash Structure
- To split a bucket j when inserting a record with search-key value Kj:
- if i > i_j (more than one pointer into bucket j)
- allocate a new bucket z
- set i_j and i_z to the old value of i_j incremented by one
- update the bucket address table (change the second half of the set of entries pointing to j so that they now point to z)
- remove all the entries in j and rehash them so that they fall in either z or j
- reattempt the insert of Kj. If the bucket is still full, repeat the above.
67. Splitting in Extendable Hash Structure (2)
- To split a bucket j when inserting a record with search-key value Kj:
- if i = i_j (only one pointer into bucket j)
- increment i and double the size of the bucket address table
- replace each entry in the bucket address table with two entries that point to the same bucket
- recompute the bucket address table entry for Kj
- now i > i_j, so use the first case described earlier
- When inserting a value, if the bucket is still full after several splits (that is, i reaches some preset value b), give up and create an overflow bucket rather than splitting the bucket address table further
- how might this occur? (A toy version of both split cases is sketched below.)
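A toy sketch of extendable hashing covering both split cases. It differs from the slides in two assumed simplifications: it indexes the directory with the low-order bits of the hash rather than a high-order prefix (which makes directory doubling a one-liner), and it omits overflow buckets and deletion. All names are illustrative:

```python
class ExtendableHash:
    """Toy extendable hash table: a directory of 2**i entries, each pointing
    to a bucket that remembers its local depth i_j."""
    def __init__(self, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.i = 0                                    # global depth (prefix length)
        self.directory = [{"depth": 0, "items": {}}]  # 2**i entries

    def _index(self, key):
        # use the low-order self.i bits of the hash as the directory index
        return hash(key) & ((1 << self.i) - 1)

    def lookup(self, key):
        return self.directory[self._index(key)]["items"].get(key)

    def insert(self, key, value):
        bucket = self.directory[self._index(key)]
        bucket["items"][key] = value
        while len(bucket["items"]) > self.capacity:   # overflow: split
            bucket = self._split(bucket, key)

    def _split(self, bucket, key):
        if bucket["depth"] == self.i:          # case i = i_j: only one pointer,
            self.directory = self.directory * 2   # so double the directory
            self.i += 1                            # and increment global depth
        # case i > i_j: allocate bucket z with local depth i_j + 1
        bucket["depth"] += 1
        new_bucket = {"depth": bucket["depth"], "items": {}}
        high_bit = 1 << (bucket["depth"] - 1)
        # redirect the half of the entries pointing to j whose new bit is 1
        for idx, b in enumerate(self.directory):
            if b is bucket and idx & high_bit:
                self.directory[idx] = new_bucket
        # remove all entries from j and rehash them into j or z
        items, bucket["items"] = bucket["items"], {}
        for k, v in items.items():
            self.directory[self._index(k)]["items"][k] = v
        return self.directory[self._index(key)]   # bucket now holding the key

ht = ExtendableHash(bucket_capacity=2)
for k in range(8):                 # integer keys keep the demo deterministic
    ht.insert(k, f"rec{k}")
print(ht.i, ht.lookup(5))          # 2 rec5
```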
68. Deletion in Extendable Hash Structure
- To delete a key value Kj
- locate it in its bucket and remove it
- the bucket itself can be removed if it becomes empty (with appropriate updates to the bucket address table)
- coalescing of buckets is possible
- can only coalesce with a buddy bucket having the same value of i_j and the same i_j - 1 bit prefix, if one such bucket exists
- decreasing the bucket address table size is also possible
- very expensive
- should only be done if the number of buckets becomes much smaller than the size of the table
69. Extendable Hash Structure: Example
- Hash function on branch-name
- Initial hash table (empty)
70. Extendable Hash Structure: Example (2)
- Hash structure after insertion of one Brighton and two Downtown records
71. Extendable Hash Structure: Example (3)
- Hash structure after insertion of the Mianus record
72. Extendable Hash Structure: Example (4)
- Hash structure after insertion of three Perryridge records
73. Extendable Hash Structure: Example (5)
- Hash structure after insertion of the Redwood and Round Hill records
74. Extendable Hashing vs. Other Hashing
- Benefits of extendable hashing
- hash performance doesn't degrade with growth of the file
- minimal space overhead
- Disadvantages of extendable hashing
- extra level of indirection (the bucket address table) to find the desired record
- the bucket address table may itself become very big (larger than memory)
- then we need a tree structure to locate the desired record in the structure!
- changing the size of the bucket address table is an expensive operation
- Linear hashing is an alternative mechanism which avoids these disadvantages at the possible cost of more bucket overflows
75. Comparison: Ordered Indexing vs. Hashing
- Each scheme has advantages for some operations and situations. To choose wisely between different schemes we need to consider
- cost of periodic reorganization
- relative frequency of insertions and deletions
- is it desirable to optimize average access time at the expense of worst-case access time?
- what types of queries do we expect?
- Hashing is generally better at retrieving records for a specific key value
- Ordered indices are better for range queries
76. Index Definition in SQL
- Create an index
- create index <index-name> on <relation-name> (<attribute-list>)
- e.g.,
- create index br-index on branch(branch-name)
- Use create unique index to indirectly specify and enforce the condition that the search key is a candidate key
- not really required if the SQL unique integrity constraint is supported
- To drop an index
- drop index <index-name>
77. Multiple-Key Access
- With some queries we can use multiple indices
- Example:
- select account-number
- from account
- where branch-name = 'Perryridge' and balance = 1000
- Possible strategies for processing this query using indices on single attributes
- use the index on balance to find accounts with balance = 1000, then test them individually to see if branch-name = 'Perryridge'
- use the index on branch-name to find accounts with branch-name = 'Perryridge', then test them individually to see if balance = 1000
- use the branch-name index to find pointers to all records of the Perryridge branch, use the balance index similarly, then take the intersection of both sets of pointers
78. Multiple-Key Access (2)
- With some queries, using a single-attribute index is unnecessarily expensive
- with methods (1) and (2) from the earlier slide, we might have the index we use return a very large set, even though the final result is quite small
- Even with method (3) (use both indices and then find the intersection) we might have both indices return a large set, which will make for a lot of unneeded work if the final result (the intersection) is small
- An alternative strategy is to create and use an index on more than one attribute -- in this example, an index on (branch-name, balance) (both attributes)
79. Indices on Multiple Attributes
- Suppose we have an ordered index on the combined search key (branch-name, balance)
- Examining the earlier query with the clause
- where branch-name = 'Perryridge' and balance = 1000
- Our new index will fetch only records that satisfy both conditions -- much more efficient than trying to answer the query with separate single-valued indices
- We can also handle range queries like
- where branch-name = 'Perryridge' and balance < 1000
- But our index will not work well for
- where branch-name < 'Perryridge' and balance = 1000
80. Multi-Attribute Indexing
- Example: EMP(eno, ename, age, sal)
- lots of ways to handle this problem
- separate indices: lots of false hits
- combined index based on a composite key
- key = sal * 100 + age
- a search for 30 < sal < 40 translates into 3000 < key < 4000 (easy)
- a search for 60 < age < 80 is difficult (see the sketch below)
- Grid files (up next)
- R-tree (B-tree generalization)
- Quad-trees, K-d trees, etc.
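The composite-key trick in code form, to make the asymmetry concrete (the EMP tuples and values are made up for illustration):

```python
def composite_key(sal: int, age: int) -> int:
    # key = sal * 100 + age, so sal occupies the high-order digits
    return sal * 100 + age

emps = [(30, 25), (35, 70), (45, 65), (62, 41)]        # (sal, age) pairs
index = sorted(composite_key(s, a) for s, a in emps)   # ordered index on the key

# 30 < sal < 40 becomes one contiguous key range: easy with an ordered index
sal_range = [k for k in index if 3000 < k < 4000]
# 60 < age < 80 is scattered across the key space: every sal value contributes
# its own small range, so the ordered index on the composite key barely helps
age_range = [k for k in index if 60 < k % 100 < 80]
print(sal_range, age_range)
```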
81. Grid Files
- A structure used to speed processing of general multiple search-key queries involving one or more comparison operators
- The grid file has
- a single grid array
- the array has a number of dimensions equal to the number of search-key attributes
- one linear scale for each search-key attribute
- Multiple cells of the grid array can point to the same bucket
- To find the bucket for a search-key value, locate the row and column of the cell using the linear scales to get the grid location, then follow the pointer in that grid location to the bucket
82. Example Grid File for account
83. Queries on a Grid File
- A grid file on two attributes A and B can handle queries of all the following forms with reasonable efficiency
- (a1 ≤ A ≤ a2)
- (b1 ≤ B ≤ b2)
- (a1 ≤ A ≤ a2 and b1 ≤ B ≤ b2)
- For example, to answer (a1 ≤ A ≤ a2 and b1 ≤ B ≤ b2), use the linear scales to find the corresponding candidate grid-array cells, and look up all the buckets pointed to from those cells
84. Grid Files (cont.)
- During insertion, if a bucket becomes full, a new bucket can be created if more than one cell points to it
- the idea is similar to extendable hashing, but in multiple dimensions
- if only one cell points to the bucket, either an overflow bucket must be created or the grid array size must be increased
- Linear scales can be chosen to uniformly distribute records across buckets
- it is not necessary to have the scale uniform across the domain -- if records are distributed in some other pattern, the linear scale can mirror it (as shown in the example earlier)
85. Grid Files (end)
- Periodic reorganization to increase the grid array size helps maintain good performance
- reorganization can be very expensive, though
- Space overhead of the grid array can be high
- R-trees (chapter 23) are an alternative
86. Bitmap Indices
- Bitmap indices are a special type of index designed for efficient queries on multiple keys
- Records in a relation are assumed to be numbered sequentially from 0
- given a number n, the objective is for it to be easy to retrieve record n
- very easy to achieve if we're looking at fixed-length records
- Bitmap indices are applicable on attributes that take on a relatively small number of distinct values
- e.g., gender, country, state, hockey team
- or an arbitrary mapping of a wider spectrum of values into a small number of categories (e.g., income level: divide income into a small number of levels such as 0-9999, 10K-19999, 20K-49999, 50K and greater)
- A bitmap is simply an array of bits
87. Bitmap Indices (cont.)
- In the simplest form, a bitmap index on an attribute has a bitmap for each value of the attribute
- the bitmap has as many bits as there are records
- in the bitmap for value v, the bit for a record is 1 if the record has the value v for the attribute, and 0 otherwise
88. Bitmap Indices (cont.)
- Bitmap indices are useful for queries on multiple attributes
- not particularly useful for single-attribute queries
- Queries are answered using bitmap operations
- intersection (and)
- union (or)
- complementation (not)
- Each operation takes two bitmaps of the same size and applies the operation on corresponding bits to get the result bitmap (see the sketch below)
- e.g., 100110 AND 110011 = 100010
- 100110 OR 110011 = 110111
- NOT 100110 = 011001
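The three operations, sketched with Python integers as bit vectors; the example bitmaps are the ones from the bullets above, and the fixed width mask makes complementation behave like the slide's 6-bit example:

```python
num_records = 6
mask = (1 << num_records) - 1          # keeps results num_records bits wide

a = 0b100110
b = 0b110011

b_and = a & b                          # intersection: 100010
b_or  = a | b                          # union:        110111
b_not = ~a & mask                      # complement:   011001

print(format(b_and, "06b"), format(b_or, "06b"), format(b_not, "06b"))
```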
89. Bitmap Indices (cont.)
- Every 1 bit in the result marks a desired tuple
- we can compute its location (since records are numbered sequentially and are all of the same size) and retrieve the records
- counting the number of tuples in the result (the SQL count aggregate) is even faster
- Bitmap indices are generally very small compared to the relation size
- e.g., if a record is 100 bytes, the space for a single bitmap on all tuples of the relation is 1/800 of the size of the relation itself
- if the bitmap index covers 8 distinct values, it is still only 1% of the size of the whole relation
90. Bitmap Indices (cont.)
- Deletion needs to be handled properly
- can't just store zero values at deleted locations (why?)
- we need an existence bitmap to note whether the record at location X is valid or not
- the existence bitmap is necessary for complementation
- not(A = v): (NOT bitmap-A-v) AND ExistenceBitmap
- Should keep bitmaps for all values, even null
- to correctly handle SQL null semantics for not(A = v) we must compute
- (NOT bitmap-A-null) AND (NOT bitmap-A-v) AND ExistenceBitmap
91. Efficient Implementation of Bitmap Operations
- Bitmaps are packed into words; a single word AND is a basic CPU instruction that computes the and of 32 or 64 bits at once
- e.g., two one-million-bit bitmaps can be ANDed with just 31,250 instructions
- Counting the number of 1s can be done fast by a trick (sketched below)
- use each byte to index into a precomputed array of 256 elements, each storing the count of 1s in the binary representation of that byte
- add up the retrieved counts
- can use pairs of bytes to speed this up further in a similar way, but at a higher memory cost (64K values in the precomputed array for 2 bytes)
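The byte-lookup counting trick as a sketch: precompute the number of 1s in every possible byte once, then count a long packed bitmap one byte at a time:

```python
# precomputed array of 256 elements: count of 1 bits in each possible byte value
POPCOUNT = [bin(i).count("1") for i in range(256)]

def count_ones(bitmap: bytes) -> int:
    """Count set bits in a packed bitmap by summing per-byte lookups."""
    return sum(POPCOUNT[byte] for byte in bitmap)

# e.g. a one-million-bit bitmap occupies 125,000 bytes -> 125,000 lookups
bitmap = bytes([0b10011010]) * 125_000
print(count_ones(bitmap))              # 4 ones per byte -> 500,000
```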
92. Bitmaps and B+-Trees
- Bitmaps can be used instead of tuple-ID lists at the leaf levels of B+-trees, for values that have a large number of matching records
- assuming a tuple-ID is 64 bits, this becomes worthwhile if > 1/64 of the records have a given value
- This technique merges the benefits of bitmap and B+-tree indices
- useful if some values are uncommon and some are quite common
- why is this not useful if there are only a small number of different values total (16, 20, something less than 64)?
- That covers all of chapter 12.
93. Query Processing
- SQL is good for humans, but not as an internal (machine) representation of how to calculate a result
- Processing an SQL (or other) query requires these steps
- parsing and translation
- turning the query into a useful internal representation in the extended relational algebra
- optimization
- manipulating the relational algebra query into the most efficient form (one that gets results the fastest)
- evaluation
- actually computing the results of the query
94. Query Processing: Diagram
95. Query Processing Steps
- 1. parsing and translation
- details of parsing are covered in other places (texts and courses on compilers). We've already covered SQL and relational algebra; translating between the two should be relatively familiar ground
- 2. optimization
- This is the meat of chapter 13: how to figure out which plan, among many, is the best way to execute a query
- 3. evaluation
- actually computing the results of the query is mostly mechanical (doesn't require much cleverness) once a good plan is in place
96. Query Processing: Example
- Initial query:
- select balance
- from account
- where balance < 2500
- Two different relational algebra expressions could represent this query
- σ_balance<2500(Π_balance(account))
- Π_balance(σ_balance<2500(account))
- which choice is better? It depends upon metadata (data about the data) and what indices are available for use in these operations.
97. Query Processing: Metadata
- Cost parameters (some are easy to maintain, some are very hard -- this is statistical info maintained in the system catalog)
- n(r) = number of tuples in relation r
- b(r) = number of disk blocks containing tuples of relation r
- s(r) = average size of a tuple of relation r
- f(r) = blocking factor of r: how many tuples fit in a disk block
- V(A,r) = number of distinct values of attribute A in r (V(A,r) = n(r) if A is a candidate key)
- SC(A,r) = average selection cardinality (selectivity) for attribute A of r, equivalent to n(r)/V(A,r) (1 if A is a key)
- min(A,r) = minimum value of attribute A in r
- max(A,r) = maximum value of attribute A in r
98. Query Processing: Metadata (2)
- Cost parameters are used in two important computations
- the I/O cost of an operation
- the size of the result
- In the following examination we'll find it useful to differentiate three important operations
- Selection (search) for equality (R.A1 = c)
- Selection (search) for inequality (R.A1 > c) (range queries)
- Projection on attribute A1
99. Selection for Equality (no indices)
- Selection (search) for equality (R.A1 = c)
- cost (sequential search on a sorted relation):
- b(r)/2 on average, unsuccessful
- b(r)/2 + SC(A1,r) - 1 on average, successful
- cost (binary search on a sorted relation):
- ⌈log2 b(r)⌉ on average, unsuccessful
- ⌈log2 b(r)⌉ + SC(A1,r) - 1 on average, successful
- size of the result, n(σ_A1=c(R)):
- SC(A1,r) = n(r)/V(A1,r) (a small worked estimate is sketched below)
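A small sketch of how these catalog statistics turn into an estimate, using the names defined on the metadata slide; the numeric values are made up for illustration:

```python
import math

def equality_selection_estimate(n_r, b_r, V_A, sorted_file=True):
    """Return (estimated result size, estimated I/O cost in blocks) for a
    selection R.A1 = c, following the formulas on this slide."""
    sc = n_r / V_A                                    # SC(A1,r): expected matches
    if sorted_file:
        cost = math.ceil(math.log2(b_r)) + sc - 1     # binary search, successful
    else:
        cost = b_r                                    # full scan of the file
    return sc, cost

# e.g. 10,000 tuples in 400 blocks, 50 distinct values of the attribute
print(equality_selection_estimate(n_r=10_000, b_r=400, V_A=50))
```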
100. Selection for Inequality (no indices)
- Selection (search) for inequality (R.A1 > c)
- cost (file unsorted):
- b(r)
- cost (file sorted on A1):
- b(r)/2 + b(r)/2 (if we assume that half the tuples qualify)
- b(r) in general
- (regardless of the number of tuples that qualify; why?)
- size of the result
- depends upon the query; unpredictable
101. Projection on A1
- Projection on attribute A1
- cost:
- b(r)
- size of the result, n(Π_A1(R)):
- V(A1,r)
102. Selection (Indexed Scan) for Equality
- Primary index on a key:
- cost = (height + 1), unsuccessful
- cost = (height + 1) + 1, successful
- Primary (clustering) index on a non-key:
- cost = (height + 1) + SC(A1,r)/f(r)
- all tuples with the same value are clustered
- Secondary index:
- cost = (height + 1) + SC(A1,r)
- tuples with the same value are scattered
103. Selection (Indexed Scan) for Inequality
- Primary index on a key: search for the first value and then pick tuples > value
- cost = (height + 1) + 1 + size of the result (in disk pages)
- = height + 2 + n(r) · (max(A,r) - c)/(max(A,r) - min(A,r)) / f(r)
- Primary (clustering) index on a non-key:
- cost as above (all tuples with the same value are clustered)
- Secondary (non-clustering) index:
- cost = (height + 1) + (number of B+-tree leaves)/2 + size of the result (in tuples)
- = height + 1 + (number of B+-tree leaves)/2 + n(r) · (max(A,r) - c)/(max(A,r) - min(A,r))
104. Complex Selections
- Conjunction (select where θ1 and θ2)
- (s1 = number of tuples satisfying selection condition θ1)
- combined SC = (s1/n(r)) · (s2/n(r)) = s1·s2/n(r)^2
- assuming independence of the predicates
- Disjunction (select where θ1 or θ2)
- combined SC = 1 - (1 - s1/n(r)) · (1 - s2/n(r))
- = s1/n(r) + s2/n(r) - s1·s2/n(r)^2
- Negation (select where not θ1)
- n(¬θ1) = n(r) - n(θ1)
- (these estimates are sketched in code below)
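The same estimates written out as functions, so the independence assumption is explicit; the sample numbers are invented for illustration:

```python
def conjunction(n_r, s1, s2):
    """Estimated result size of theta1 AND theta2, assuming independence."""
    return n_r * (s1 / n_r) * (s2 / n_r)

def disjunction(n_r, s1, s2):
    """Estimated result size of theta1 OR theta2 (inclusion-exclusion)."""
    return n_r * (1 - (1 - s1 / n_r) * (1 - s2 / n_r))

def negation(n_r, s1):
    """Estimated result size of NOT theta1."""
    return n_r - s1

# e.g. 10,000 tuples; theta1 matches 200 of them, theta2 matches 500
print(conjunction(10_000, 200, 500),   # 10
      disjunction(10_000, 200, 500),   # 690
      negation(10_000, 200))           # 9800
```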
105. Complex Selections with Indices
- GOAL: apply the most restrictive condition first, and combine the use of multiple indices, to reduce the intermediate results as early as possible
- Why? No index will be available on intermediate results!
- Conjunctive selection using one index B
- select using B and then apply the remaining predicates to the intermediate results
- Conjunctive selection using a composite-key index on (R.A1, R.A2)
- create a composite key or range from the query values and search directly (range search on the first attribute (the MSB of the composite key) only)
- Conjunctive selection using two indices B1 and B2
- search each separately and intersect the tuple identifiers (TIDs)
- Disjunctive selection using two indices B1 and B2
- search each separately and union the tuple identifiers (TIDs)
106. Sorting?
- What does sorting have to do with query processing?
- SQL queries can specify that the output be sorted
- several relational operations (such as joins) can be implemented very efficiently if the input data is sorted first
- as a result, query processing is often concerned with sorting temporary (intermediate) and final results
- creating a secondary index on the active relation (logical sorting) isn't sufficient -- sequential scans through the data on secondary indices are very inefficient. We often need to sort the data physically into order
107. Sorting
- We differentiate two types of sorting
- internal sorting: the entire relation fits in memory
- external sorting: the relation is too large to fit in memory
- Internal sorting can use any of a large range of well-established sorting algorithms (e.g., Quicksort)
- In databases, the most commonly used method for external sorting is the sort-merge algorithm (based upon Mergesort)
108. Sort-Merge Algorithm
- Create-runs phase:
- Load in M consecutive blocks of the relation (M is the number of blocks that will fit easily in main memory)
- Use some internal sorting algorithm to sort the tuples in those M blocks
- Write the sorted run to disk
- Continue with the next M blocks, etcetera, until finished
- Merge-runs phase (assuming that the number of runs, N, is less than M):
- load the first block of each run into memory
- grab the first tuple (lowest value) among all the runs and write it to an output buffer page
- when the last tuple of a block is read, grab a new block from that run
- when the output buffer page is full, write it to disk and start a new one
- continue until all buffer pages are empty
- (a sketch of both phases appears below)
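A sketch of the two phases, with Python lists standing in for disk blocks and runs; M plays the role described above, and heapq.merge does the "take the lowest tuple among all runs" step:

```python
import heapq

def sort_merge(blocks, M):
    """blocks: list of lists of records (one list per disk block).
    Returns all records in sorted order, assuming the number of runs N <= M."""
    # create-runs phase: sort M blocks at a time and "write" each run out
    runs = []
    for i in range(0, len(blocks), M):
        run = sorted(rec for block in blocks[i:i + M] for rec in block)
        runs.append(run)
    # merge-runs phase: repeatedly emit the smallest record among all runs
    return list(heapq.merge(*runs))

blocks = [[27, 3], [99, 14], [8, 56], [41, 2]]
print(sort_merge(blocks, M=2))   # [2, 3, 8, 14, 27, 41, 56, 99]
```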
109. Sort-Merge Algorithm (2)
- Merge-runs phase (N > M):
- operate on M runs at a time, creating runs of length M^2, and continue in multiple passes of the merge operation
- Cost of sorting (b(r) is the number of blocks occupied by relation r):
- the runs phase does one read and one write on each block of r: cost 2·b(r)
- total number of runs: N = ⌈b(r)/M⌉
- number of passes in the merge operation: 1 if N < M
- otherwise ⌈log_(M-1)(b(r)/M)⌉
- during each pass of the merge phase we read the whole relation and write it all out again: cost 2·b(r) per pass
- the total cost (runs phase plus merge passes) is therefore 2·b(r) · (⌈log_(M-1)(b(r)/M)⌉ + 1)
- if only one merge pass is required (N < M) the cost is 4·b(r)
- if M > b(r) then there is only one run (internal sorting) and the cost is b(r)
110. Join Operation
- Join is perhaps the most important operation combining two relations
- Algorithms for computing the join efficiently are crucial to the optimization phase of query processing
- We will examine a number of algorithms for computing joins