Title: Yet More on Indexes
Yet More on Indexes
Source: our textbook; slides by Hector Garcia-Molina
Main Memory Hash Tables
- A hash function h maps search keys to integers in some range 0 to B-1
- B is the number of buckets
- There is a B-element array; each entry holds a pointer to a linked list
- A record with key k is put in the linked list that starts at entry h(k) of the array
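The structure described above can be sketched in a few lines of Python. This is only an illustration: the names `Node`, `ChainedHashTable`, `insert` and `lookup` are hypothetical, and Python's built-in `hash` modulo B stands in for the hash function h.

```python
class Node:
    """One record in a bucket's linked list."""
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, num_buckets=4):
        self.B = num_buckets
        self.array = [None] * self.B   # B-element array of list heads

    def h(self, key):
        # Assumption: any function into the range 0..B-1 would do here.
        return hash(key) % self.B

    def insert(self, key, value):
        # Put the record at the front of the list that starts at entry h(key).
        idx = self.h(key)
        self.array[idx] = Node(key, value, self.array[idx])

    def lookup(self, key):
        # Walk the linked list for bucket h(key), collecting matching records.
        found = []
        node = self.array[self.h(key)]
        while node is not None:
            if node.key == key:
                found.append(node.value)
            node = node.next
        return found


table = ChainedHashTable(num_buckets=4)
table.insert(5, "x")
table.insert(9, "y")   # 5 and 9 land in the same bucket when B = 4
table.insert(5, "z")
```

Here `table.lookup(5)` walks a single linked list and returns both records stored under key 5.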
Changes for Secondary Storage
- The bucket array contains blocks, not pointers to linked lists
- Records that hash to a certain bucket are put in the corresponding block
- If a bucket overflows, then start a chain of overflow blocks
Insertion into Static Hash Table
- To insert a record with key K:
- compute h(K)
- insert the record into one of the blocks in the chain of blocks for bucket number h(K), adding a new block to the chain if necessary
EXAMPLE: 2 records/bucket
Buckets: 0 1 2 3
- INSERT:
- h(a) = 1
- h(b) = 2
- h(c) = 1
- h(d) = 0
- h(e) = 1
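The insertion steps and the example above can be traced with a minimal sketch. The names `Block`, `insert`, `chain` and `BLOCK_CAPACITY` are hypothetical, and the hash values are taken from the example rather than computed.

```python
BLOCK_CAPACITY = 2   # records per block, as in the example


class Block:
    def __init__(self):
        self.records = []
        self.overflow = None   # next block in the chain, if any


def insert(buckets, h, key):
    """Insert key into the chain of blocks for bucket number h(key)."""
    block = buckets[h(key)]
    while True:
        if len(block.records) < BLOCK_CAPACITY:
            block.records.append(key)
            return
        if block.overflow is None:
            block.overflow = Block()   # add a new block to the chain
        block = block.overflow


def chain(block):
    """Return the records of each block in a chain, for inspection."""
    out = []
    while block is not None:
        out.append(list(block.records))
        block = block.overflow
    return out


# Hash values copied from the example above.
H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}
buckets = [Block() for _ in range(4)]
for key in "abcde":
    insert(buckets, H.get, key)
```

After these inserts, bucket 1 holds a and c in its primary block, and e has gone to an overflow block because the primary block was already full.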
Deletion from a Static Hash Table
- To delete records with key K:
- Go to the bucket numbered h(K)
- Search for records with key K, deleting any that are found
- Possibly condense the chain of overflow blocks for that bucket
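A minimal sketch of these deletion steps follows. It assumes each bucket is represented as a list of blocks (primary block first), and it always condenses the chain, whereas the slide leaves that step optional. The name `delete` and the sample data are hypothetical.

```python
BLOCK_CAPACITY = 2   # records per block (assumed)


def delete(buckets, h, key):
    """Delete all records with the given key from bucket h(key).

    buckets[b] is a chain of blocks; each block is a list of records.
    Returns the number of records deleted.
    """
    b = h(key)
    chain = buckets[b]
    remaining = [r for block in chain for r in block if r != key]
    deleted = sum(len(block) for block in chain) - len(remaining)
    # Condense: repack the survivors into as few blocks as possible,
    # keeping the primary block even when it ends up empty.
    repacked = [remaining[i:i + BLOCK_CAPACITY]
                for i in range(0, len(remaining), BLOCK_CAPACITY)]
    buckets[b] = repacked or [[]]
    return deleted


# Made-up hash values and contents: bucket 1 has one overflow block.
H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}
buckets = [[["d"]], [["a", "c"], ["e"]], [["b"]], [[]]]
deleted = delete(buckets, H.get, "c")
```

Deleting c frees a slot in the primary block of bucket 1, so e moves up and the overflow block disappears.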
EXAMPLE: deletion
Delete: e, f
[Figure: buckets 0 to 3 holding records a, b, c, d, e, f, g]
Rule of thumb
- Try to keep space utilization between 50% and 80%
- Utilization = (# keys used) / (total # keys that fit)
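As a small worked example of the formula, the sketch below assumes that "total keys that fit" means the number of blocks times the number of keys per block. The function names and the numbers are made up for illustration.

```python
def utilization(keys_used, num_blocks, keys_per_block):
    """Utilization = (# keys used) / (total # keys that fit)."""
    return keys_used / (num_blocks * keys_per_block)


def within_rule_of_thumb(u, low=0.5, high=0.8):
    """True when utilization lies in the suggested 50% to 80% range."""
    return low <= u <= high


u = utilization(keys_used=12, num_blocks=10, keys_per_block=2)
ok = within_rule_of_thumb(u)
```

With 12 keys stored in 10 blocks of 2 keys each, utilization is 12/20 = 60%, which falls inside the suggested range.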
Efficiency of Static Hash Tables
- If the hash table is large enough and the hash function distributes keys sufficiently "evenly", then most buckets have no overflow blocks
- In this case lookup typically takes one disk I/O, and insertion/deletion take two
- Significantly better than sequential indexes and B-trees
- (But hash tables do not support efficient range queries as B-trees do)
- What if there are long chains of overflow blocks?
How do we cope with growth?
- Overflows and reorganizations
- Dynamic hashing
Extensible Hash Tables
- Each bucket in the bucket array contains a pointer to a block, instead of a block itself
- The bucket array can grow by doubling in size
- Certain buckets can share a block if small enough
- The hash function computes a sequence of k bits, but only the first i bits are used at any time to index into the bucket array
- The value of i can increase (this corresponds to the bucket array doubling in size)
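Extensible hashing: two ideas
- (a) Use i of the b bits output by the hash function
- h(K) is a b-bit string, e.g. 00110101
- Use the first i bits; i grows over time
- (b) Use a directory
- The first i bits of h(K), written h(K)[i], select the directory entry that points to the bucket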
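Both ideas can be illustrated with a short sketch that uses the bit string 00110101 from the slide. The function name `first_bits`, the 8-bit width, and the directory contents are assumptions made for illustration.

```python
def first_bits(hash_value, i, b=8):
    """Return the first (high-order) i bits of a b-bit hash value."""
    return hash_value >> (b - i)


h_K = 0b00110101   # the b = 8 bit hash value shown above

# Idea (a): as i grows, more leading bits of the same hash value are used.
prefixes = [first_bits(h_K, i) for i in (1, 2, 3, 4)]

# Idea (b): a directory with 2**i entries maps the i-bit prefix to a block.
# Entries 00 and 01 share one block here (hypothetical contents).
i = 2
directory = ["block A", "block A", "block B", "block C"]
target = directory[first_bits(h_K, i)]
```

For this hash value the prefixes are 0, 00, 001 and 0011, and with i = 2 the record is directed to the block behind directory entry 00.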
Inserting into Extensible Hash Table
- To insert a record with key K:
- compute h(K)
- go to the bucket indexed by the first i bits of h(K)
- follow the pointer to get to block B
- if there is room in B, insert the record
- else let j be the number of bits of the hash value used to determine membership in B
Insertion cont'd
- Case 1: j < i.
- split block B in two
- distribute the records in B to the 2 new blocks based on the value of their (j+1)-st bit
- update the header of each new block to j+1
- adjust the pointers in the bucket array so that entries that used to point to B now point to the correct block
- if there is still no room in the appropriate block for the new record, then repeat this process
Insertion cont'd
- Case 2: j = i.
- increment i by 1
- double the length of the bucket array
- the entries for w0 and w1 both point to the same block that the old entry w pointed to (the block is shared)
- apply case 1 to split block B
Example: h(k) is 4 bits; 2 keys/bucket
- Block with j = 1: 0001
- Block with j = 1: 1001, 1100
- Insert 1010
Example continued
- i = 2; bucket array entries 00 01 10 11
- Block with j = 1: 0001, 0111
- Block: 1001, 1010
- Block: 1100
- Insert 0111, 0000
Example continued
- i = 2; bucket array entries 00 01 10 11
- Block with j = 2: 0000, 0001
- Block with j = 2: 0111
- Insert 1001
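The insertion procedure (cases 1 and 2) and the 4-bit example above can be traced with the following sketch. It is a simplified in-memory model, not a disk-based implementation: the class and method names are hypothetical, the hash function is assumed to be the identity on 4-bit integers, and the table starts with i = 1 and two empty blocks.

```python
class Block:
    def __init__(self, j, capacity):
        self.j = j              # number of hash bits that determine membership
        self.capacity = capacity
        self.keys = []


class ExtensibleHashTable:
    def __init__(self, b=4, capacity=2, hash_fn=None):
        self.b = b                                  # bits produced by h
        self.capacity = capacity                    # keys per block
        self.h = hash_fn or (lambda k: k)           # assumed identity hash
        self.i = 1                                  # bits currently used
        self.directory = [Block(1, capacity), Block(1, capacity)]

    def _prefix(self, hv):
        return hv >> (self.b - self.i)              # first i bits of h(K)

    def insert(self, key):
        hv = self.h(key)
        while True:
            block = self.directory[self._prefix(hv)]
            if len(block.keys) < block.capacity:
                block.keys.append(key)
                return
            if block.j == self.b:
                # All b bits already used; a real system would need overflow blocks.
                raise OverflowError("block cannot be split further")
            if block.j == self.i:
                # Case 2: double the bucket array; entries w0 and w1 share w's block.
                self.directory = [blk for blk in self.directory for _ in (0, 1)]
                self.i += 1
            self._split(block)                      # Case 1, then retry

    def _split(self, block):
        j = block.j + 1
        zero, one = Block(j, self.capacity), Block(j, self.capacity)
        for k in block.keys:                        # distribute by the (j+1)-st bit
            bit = (self.h(k) >> (self.b - j)) & 1
            (one if bit else zero).keys.append(k)
        for idx, blk in enumerate(self.directory):  # repoint entries that used B
            if blk is block:
                bit = (idx >> (self.i - j)) & 1
                self.directory[idx] = one if bit else zero


table = ExtensibleHashTable(b=4, capacity=2)
for key in (0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001):
    table.insert(key)
```

Inserting 1010 and the second 1001 each trigger case 2 (the directory doubles), while inserting 0000 triggers only case 1, because the block holding 0001 and 0111 has j = 1 while i = 2.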
Extensible hashing: deletion
- Option 1: no merging of blocks
- Option 2: merge blocks and cut the directory if possible (reverse of the insert procedure)
Extensible hashing: Summary
- Can handle growing files
  - with less wasted space
  - with no full reorganizations
Linear Hash Tables
- The number of buckets increases more slowly than with extensible hashing
- The number of buckets is chosen so that on average each block is x% full (say 80%); x is the threshold
- Overflow blocks can occur, but the average number per bucket is << 1
- Use the i low-order bits of the hash function's result to index into the bucket array
Linear hashing
- Another dynamic hashing scheme
- (a) Use the i low-order bits of the hash value
- (b) Bucket array grows linearly
Inserting into Linear Hash Table
- To insert a record with key K, where the last i bits of h(K) are a1a2...ai:
- Let m be the integer represented by a1a2...ai in binary
- If m < n (the number of buckets), then bucket m exists: put the record in that bucket
- If m >= n, then bucket m does not (yet) exist, so put the record in the bucket whose index corresponds to 0a2...ai
Inserting cont'd
- If there is no room in the indicated bucket, then create an overflow block
- Compare the ratio (number of records) / (number of buckets) to the threshold
- If the ratio exceeds the threshold, then add a new bucket and rearrange records
- If the number of buckets exceeds 2^i, then increment i by 1
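The insertion rules above can be sketched as follows. This is a simplified in-memory model under stated assumptions: the names are hypothetical, the hash function is the identity on small integers, overflow blocks are modelled by letting a bucket's list grow past the block capacity, and the threshold is read as a fraction of block capacity (0.8 times 2 keys = 1.6 records per bucket).

```python
class LinearHashTable:
    def __init__(self, capacity=2, threshold=0.8, hash_fn=None):
        self.capacity = capacity            # keys per block
        self.threshold = threshold          # target average fill of a block
        self.h = hash_fn or (lambda k: k)   # assumed identity hash
        self.i = 1                          # low-order bits currently used
        self.n = 2                          # current number of buckets
        self.r = 0                          # current number of records
        self.buckets = [[], []]

    def _bucket_index(self, key):
        m = self.h(key) & ((1 << self.i) - 1)   # integer given by the last i bits
        if m >= self.n:                          # bucket m does not exist yet
            m -= 1 << (self.i - 1)               # 1a2...ai -> 0a2...ai
        return m

    def insert(self, key):
        self.buckets[self._bucket_index(key)].append(key)
        self.r += 1
        if self.r / self.n > self.threshold * self.capacity:
            self._add_bucket()

    def _add_bucket(self):
        new = self.n                        # index of the bucket being added
        self.n += 1
        if self.n > (1 << self.i):          # number of buckets exceeds 2^i
            self.i += 1
        self.buckets.append([])
        old = new - (1 << (self.i - 1))     # bucket 0a2...ai that gets split
        records, self.buckets[old] = self.buckets[old], []
        for k in records:                   # rearrange records between old and new
            self.buckets[self._bucket_index(k)].append(k)

    def contains(self, key):
        return key in self.buckets[self._bucket_index(key)]


table = LinearHashTable(capacity=2, threshold=0.8)
for key in (0b0000, 0b1010, 0b1111, 0b0101, 0b0101):
    table.insert(key)
```

Note that only one bucket is split each time the threshold is exceeded, and it is chosen by position (the partner of the new bucket), not by which bucket happens to be full. That is why overflow blocks can still occur.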
Example: b = 4 bits, i = 2, 2 keys/bucket
- Bucket 00: 0000, 1010
- Bucket 01: 0101, 1111
- Buckets 10, 11: future growth buckets
- m = 01 (max used block)
Example continued: how to grow beyond this?
- i = 2
- Bucket 00: 0000
- Bucket 01: 0101, 0101
- Bucket 10: 1010
- Bucket 11: 1111
- m = 11 (max used block)
Linear Hashing: Summary
- Can handle growing files
  - with less wasted space
  - with no full reorganizations
- No indirection like extensible hashing
Comparing Index Approaches
- Hashing is good for probes given a key
- e.g., SELECT ...
        FROM R
        WHERE R.A = 5
Indexing vs Hashing
- Sequential indexes and B-trees are good for range searches
- e.g., SELECT ...
        FROM R
        WHERE R.A > 5
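The contrast between the two query shapes can be illustrated with a small in-memory sketch. A Python dict stands in for a hash index and a sorted list searched with `bisect` stands in for a sequential index or the leaf level of a B-tree; the relation contents and all names are made up.

```python
import bisect

# Made-up relation R as (A, payload) pairs.
rows = [(3, "r0"), (5, "r1"), (8, "r2"), (5, "r3"), (9, "r4")]

# Hash index on A: key -> positions of matching rows.
hash_index = {}
for pos, (a, _) in enumerate(rows):
    hash_index.setdefault(a, []).append(pos)

# Sorted index on A: (key, position) pairs in key order.
sorted_index = sorted((a, pos) for pos, (a, _) in enumerate(rows))
sorted_keys = [a for a, _ in sorted_index]


def probe_equal(value):
    """WHERE R.A = value: a single probe of the hash index."""
    return [rows[p] for p in hash_index.get(value, [])]


def range_greater(value):
    """WHERE R.A > value: find the start point, then scan in key order."""
    start = bisect.bisect_right(sorted_keys, value)
    return [rows[p] for _, p in sorted_index[start:]]


equal_5 = probe_equal(5)
greater_5 = range_greater(5)
```

The hash index answers the equality probe directly but gives no help for `R.A > 5`, since hashing scatters neighbouring key values; the sorted index handles the range by locating one boundary and scanning from there.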
Index definition in SQL
- CREATE INDEX name ON rel (attr)
- CREATE UNIQUE INDEX name ON rel (attr)
  (defines a candidate key)
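Both statements can be tried out with Python's standard `sqlite3` module. The table, column and index names below are made up for illustration, and SQLite is just one convenient engine that accepts this syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rel (attr INTEGER, other TEXT)")

# Plain index: speeds up lookups on attr, duplicates still allowed.
conn.execute("CREATE INDEX rel_attr_idx ON rel (attr)")
# Unique index: additionally enforces that 'other' is a candidate key.
conn.execute("CREATE UNIQUE INDEX rel_other_idx ON rel (other)")

conn.execute("INSERT INTO rel VALUES (5, 'x')")
conn.execute("INSERT INTO rel VALUES (5, 'y')")   # duplicate attr is accepted

try:
    conn.execute("INSERT INTO rel VALUES (6, 'x')")   # duplicate 'other' value
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM rel").fetchone()[0]
```

The third insert is rejected because the unique index makes `other` behave as a key, so the table ends up with two rows.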
Note:
- CANNOT SPECIFY TYPE OF INDEX (e.g. B-tree, Hashing, ...)
- OR PARAMETERS (e.g. Load Factor, Size of Hash, ...)
- ... at least in SQL ...