Provided by: JeffU4

Learn more at: https://people.engr.tamu.edu

1
Yet More on Indexes

Hash Tables

Source: our textbook; slides by
Hector Garcia-Molina
2
Main Memory Hash Tables

A hash function h maps search keys to integers in
some range 0 to B-1
B is the number of buckets
There is a B-element array, each entry holds a
pointer to a linked list
Record with key k is put in the linked list that
starts at entry h(k) of the array.
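
A minimal sketch of the structure just described, assuming Python's built-in hash() reduced modulo B as the hash function and a Python list standing in for each linked list. The class and method names are illustrative, not from the slides.

```python
class ChainedHashTable:
    """Main-memory hash table: a B-element array, one chain of records per entry."""

    def __init__(self, num_buckets):
        self.B = num_buckets
        # One (initially empty) chain per bucket; a Python list plays the linked list.
        self.buckets = [[] for _ in range(num_buckets)]

    def h(self, key):
        # Assumed hash function: built-in hash() reduced to the range 0..B-1.
        return hash(key) % self.B

    def insert(self, key, record):
        # The record goes in the chain that starts at entry h(key) of the array.
        self.buckets[self.h(key)].append((key, record))

    def lookup(self, key):
        # Only the chain for bucket h(key) has to be scanned.
        return [rec for k, rec in self.buckets[self.h(key)] if k == key]


table = ChainedHashTable(4)
table.insert(5, "record for 5")
table.insert(9, "record for 9")   # 5 and 9 both land in bucket 1 when B = 4
print(table.lookup(9))
```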

3
Changes for Secondary Storage

Bucket array contains blocks, not pointers to
linked lists
Records that hash to a certain bucket are put in
the corresponding block
If a bucket overflows then start a chain of
overflow blocks

4
Insertion into Static Hash Table

To insert a record with key K:
compute h(K)
insert record into one of the blocks in the chain
of blocks for bucket number h(K), adding a new
block to the chain if necessary
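
The insertion steps can be sketched as follows. This is an in-memory model only: each "block" is a Python list with an assumed capacity of 2 records, and the hash function is passed in by the caller. The names StaticHashTable, BLOCK_CAPACITY and h_values are hypothetical.

```python
BLOCK_CAPACITY = 2  # records per block (assumed for illustration)


class StaticHashTable:
    def __init__(self, num_buckets, h):
        self.h = h  # hash function mapping a key to 0..num_buckets-1
        # Each bucket is a chain of blocks; the first (primary) block always exists.
        self.chains = [[[]] for _ in range(num_buckets)]

    def insert(self, key, record):
        chain = self.chains[self.h(key)]
        for block in chain:
            if len(block) < BLOCK_CAPACITY:
                block.append((key, record))
                return
        # Every block in the chain is full: add a new overflow block.
        chain.append([(key, record)])


# Hash values assumed for illustration: a, c and e all hash to bucket 1.
h_values = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}
table = StaticHashTable(4, h_values.__getitem__)
for key in "abcde":
    table.insert(key, key.upper())
print(table.chains[1])  # primary block holds a and c; e sits in an overflow block
```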

5
EXAMPLE: 2 records/bucket
Buckets: 0 1 2 3

INSERT:
h(a) = 1
h(b) = 2
h(c) = 1
h(d) = 0

h(e) = 1 (bucket 1 is already full, so e goes into an overflow block)
6
Deletion from a Static Hash Table

To delete records with key K:
Go to the bucket numbered h(K)
Search for records with key K, deleting any that
are found
Possibly condense the chain of overflow blocks
for that bucket
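
A sketch of deletion on one bucket's chain, under the same in-memory assumptions (a chain is a list of blocks, a block is a list of (key, record) pairs, capacity 2). The slide says the chain is "possibly" condensed; this version always condenses by repacking the survivors, which is one simple policy among several.

```python
BLOCK_CAPACITY = 2  # records per block (assumed for illustration)


def delete(chain, key):
    """Delete every record with `key` from one bucket's chain of blocks.

    The caller is assumed to have selected the chain for bucket h(key).
    Returns the condensed chain.
    """
    # Search every block in the chain, keeping only the records that survive.
    survivors = [(k, r) for block in chain for (k, r) in block if k != key]
    # Condense: repack the survivors into as few blocks as possible.
    condensed = [survivors[i:i + BLOCK_CAPACITY]
                 for i in range(0, len(survivors), BLOCK_CAPACITY)]
    return condensed or [[]]  # the bucket always keeps its primary block


chain = [[("a", 1), ("c", 2)], [("e", 3)]]   # primary block plus one overflow block
print(delete(chain, "c"))                     # e moves up; the overflow block disappears
```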

7
EXAMPLE: deletion
Delete: e, f
(Figure: buckets 0 1 2 3 holding records a, b, c, d, e, f, g)
8
Rule of thumb

Try to keep space utilization
between 50% and 80%
Utilization = (# keys used) / (total # keys that fit)
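
As a quick arithmetic check of the rule, here is a small sketch. It assumes "total keys that fit" means the capacity of the primary bucket blocks (overflow blocks not counted); the function name and the numbers are illustrative.

```python
def utilization(keys_used, num_buckets, keys_per_block):
    """Fraction of the slots in the primary bucket blocks that hold a key."""
    return keys_used / (num_buckets * keys_per_block)


# 4 buckets of 2 records each, 5 records stored (numbers chosen for illustration)
u = utilization(5, 4, 2)
print(f"{u:.1%}")        # 62.5%
print(0.5 <= u <= 0.8)   # True: inside the recommended range
```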

9
Efficiency of Static Hash Tables

If the hash table size is large enough and the
distribution of keys by the hash function is
sufficiently "even", then most buckets have no
overflow blocks
In this case lookup typically takes one disk I/O,
and insertion/deletion takes two
Significantly better than sequential indexes and
B-trees
(But hash tables do not support efficient range
queries as B-trees do)
What if there are long chains of overflow blocks?

10
How do we cope with growth?

Overflows and reorganizations
Dynamic hashing

11
Extensible Hash Tables

Each bucket in the bucket array contains a
pointer to a block, instead of a block itself
Bucket array can grow by doubling in size
Certain buckets can share a block if they hold
few enough records
The hash function computes a sequence of k bits,
but only the first i bits are used at any time to
index into the bucket array
Value of i can increase (corresponds to bucket
array doubling in size)
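
A small sketch of how the first i bits select a bucket-array entry, assuming hash values are held as k-bit integers and "first" means high-order. The function name directory_index is hypothetical.

```python
def directory_index(hash_value, k, i):
    """Return the integer formed by the first (high-order) i bits of a k-bit hash value."""
    return hash_value >> (k - i)


hv = 0b00110101  # an 8-bit hash value
for i in range(1, 5):
    print(i, format(directory_index(hv, 8, i), f"0{i}b"))
# Prints 0, 00, 001, 0011: each extra bit refines the previous index.
```

When i grows by one, the entry with index w splits into the two entries 2w and 2w+1 (bit strings w0 and w1), which is why the bucket array doubles.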

12
Extensible hashing: two ideas

(a) Use i of the b bits output by the hash function
i grows over time
Example: h(K) = 00110101 (b = 8); only the first i
bits are used
13

(b) Use a directory
The first i bits of h(K) select a directory entry,
which points to a bucket
14
Inserting into Extensible Hash Table

To insert record with key K:
compute h(K)
go to bucket indexed by first i bits of h(K)
follow the pointer to get to block B
if there is room in B, insert the record
else let j be the number of bits of the hash value
used to determine membership in B (two cases follow)

15
Insertion cont'd

Case 1: j < i.
split block B in two
distribute records in B to the 2 new blocks based
on the value of their (j+1)-st bit
update header of each new block to j+1
adjust pointers in bucket array so that entries
that used to point to B now point to the correct
block
if there is still no room in the appropriate block
for the new record, then repeat this process

16
Insertion cont'd

Case 2: j = i.
increment i by 1
double length of bucket array
entries for w0 and w1 both point to the same block
that old entry w pointed to (block is shared)
apply case 1 to split block B
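
The two cases can be put together in a short sketch. Assumptions, none of which come from the slides: keys are already k-bit integers, so h is the identity; blocks live in memory as Python objects; the table starts with i = 1 and two empty blocks; and a block whose header has reached k raises an error instead of growing an overflow chain. All class and method names are illustrative.

```python
class Block:
    def __init__(self, j):
        self.j = j       # header: number of hash bits that determine membership
        self.keys = []


class ExtensibleHashTable:
    def __init__(self, k=4, capacity=2):
        self.k = k                  # bits produced by the hash function
        self.capacity = capacity    # keys per block
        self.i = 1                  # bits currently used to index the bucket array
        self.directory = [Block(1), Block(1)]

    def h(self, key):
        return key  # assumption: keys are already k-bit hash values

    def _index(self, key):
        return self.h(key) >> (self.k - self.i)   # first i bits of h(key)

    def insert(self, key):
        while True:
            block = self.directory[self._index(key)]
            if len(block.keys) < self.capacity:
                block.keys.append(key)
                return
            if block.j == self.k:
                raise OverflowError("all k bits in use; overflow blocks not modeled")
            if block.j == self.i:
                # Case 2: double the bucket array; entries w0 and w1 share w's block.
                self.directory = [b for b in self.directory for _ in range(2)]
                self.i += 1
            self._split(block)      # Case 1 (j < i holds here); then retry

    def _split(self, block):
        j = block.j + 1
        halves = (Block(j), Block(j))
        for key in block.keys:
            bit = (self.h(key) >> (self.k - j)) & 1   # the (old j + 1)-st bit
            halves[bit].keys.append(key)
        # Redirect every entry that pointed at the old block.
        for idx, b in enumerate(self.directory):
            if b is block:
                self.directory[idx] = halves[(idx >> (self.i - j)) & 1]


table = ExtensibleHashTable(k=4, capacity=2)
for key in (0b0001, 0b1001, 0b1100, 0b1010):
    table.insert(key)
print(table.i, [[format(x, "04b") for x in b.keys] for b in table.directory])
```

Inserting 1010 finds the block for prefix 1 full with j = i = 1, so the bucket array doubles to four entries and that block splits on its second bit; entries 00 and 01 keep sharing the untouched block.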

17
Example: h(k) is 4 bits; 2 keys/bucket
i = 1
Block for 0 (header 1): 0001
Block for 1 (header 1): 1001, 1100
Insert 1010
18
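Example continued
i = 2; bucket array entries 00 01 10 11
Block shared by 00 and 01 (header 1): 0001, 0111
Block for 10: 1001, 1010
Block for 11: 1100
Insert 0111, 0000 (0111 is shown already placed)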
19
Example continued
i = 2; bucket array entries 00 01 10 11
Block for 00 (header 2): 0000, 0001
Block for 01 (header 2): 0111
Insert 1001
20
Extensible hashing deletion

Two options:
No merging of blocks
Merge blocks and cut directory if possible
(reverse of the insert procedure)

21
Extensible hashing
Summary

Can handle growing files
- with less wasted space
- with no full reorganizations

22
Linear Hash Tables

Number of buckets increases more slowly than with
extensible hashing
Number of buckets is such that on average each
block is x% full (say 80%); x is the threshold
Overflow blocks can occur but average number per
bucket << 1
Use the i low-order bits from the result of the
hash function to index into the bucket array

23
Linear hashing

Another dynamic hashing scheme

(b) Bucket array grows linearly
24
Inserting into Linear Hash Table

To insert record with key K, with last i bits of
h(K) being a1 a2 ... ai:
Let m be the integer represented by a1 a2 ... ai in
binary
If m < n (number of buckets), then bucket m
exists -- put record in that bucket
If m >= n, then bucket m does not (yet) exist, so
put record in bucket whose index corresponds to
0 a2 ... ai
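
The bucket choice can be sketched as a small function. It assumes hash values are integers and that n lies between 2^(i-1) + 1 and 2^i, so whenever m >= n the leading bit a1 must be 1. The name bucket_for is hypothetical.

```python
def bucket_for(hash_value, i, n):
    """Pick the bucket for a hash value when n buckets exist and i bits are in use."""
    m = hash_value & ((1 << i) - 1)   # integer formed by the last i bits a1 a2 ... ai
    if m < n:
        return m                      # bucket m exists
    return m - (1 << (i - 1))         # bucket m does not exist yet: turn a1 into 0


# i = 2 bits in use, n = 3 buckets (00, 01, 10); bucket 11 does not exist yet.
print(bucket_for(0b0110, 2, 3))   # last two bits 10 -> bucket 2
print(bucket_for(0b1011, 2, 3))   # last two bits 11 -> redirected to bucket 01 = 1
```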

25
Inserting cont'd