1
Yet More on Indexes
  • Hash Tables

Source: our textbook; slides by
Hector Garcia-Molina
2
Main Memory Hash Tables
  • A hash function h maps search keys to integers in
    some range 0 to B-1
  • B is the number of buckets
  • There is a B-element array, each entry holds a
    pointer to a linked list
  • Record with key k is put in the linked list that
    starts at entry h(k) of the array
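The structure above can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's code: the class name ChainedHashTable is hypothetical, Python lists stand in for the linked lists, and the built-in hash reduced modulo B stands in for h.

```python
class ChainedHashTable:
    """Main-memory hash table: a B-element array of chains (illustrative sketch)."""

    def __init__(self, num_buckets):
        self.B = num_buckets
        # Each array entry heads one chain; a Python list plays the linked list.
        self.buckets = [[] for _ in range(num_buckets)]

    def h(self, key):
        # Assumption: built-in hash modulo B stands in for the hash function h.
        return hash(key) % self.B

    def insert(self, key, record):
        self.buckets[self.h(key)].append((key, record))

    def lookup(self, key):
        return [rec for k, rec in self.buckets[self.h(key)] if k == key]


table = ChainedHashTable(num_buckets=4)
table.insert(17, "record for 17")
table.insert(21, "record for 21")  # 17 % 4 == 21 % 4 == 1: same chain
print(table.lookup(21))
```

Both keys land in the chain at entry 1, and the lookup scans only that chain.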

3
Changes for Secondary Storage
  • Bucket array contains blocks, not pointers to
    linked lists
  • Records that hash to a certain bucket are put in
    the corresponding block
  • If a bucket overflows then start a chain of
    overflow blocks

4
Insertion into Static Hash Table
  • To insert a record with key K:
  • compute h(K)
  • insert record into one of the blocks in the chain
    of blocks for bucket number h(K), adding a new
    block to the chain if necessary

5
EXAMPLE: 2 records/bucket
Buckets: 0 1 2 3
  • INSERT:
  • h(a) = 1
  • h(b) = 2
  • h(c) = 1
  • h(d) = 0

h(e) = 1 (bucket 1 is already full, so e goes into an overflow block)
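The insertion example can be replayed with a small sketch. It assumes each bucket is modeled as a list of blocks (primary block first, overflow blocks after it); the hash values are hard-coded from the example rather than computed, and the names H, buckets and insert are illustrative only.

```python
BLOCK_CAPACITY = 2  # records per block, as in the example

# Hash values copied from the example; a real h would compute them from the key.
H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}

# Each bucket is a chain of blocks; the first block is the primary block.
buckets = [[[]] for _ in range(4)]

def insert(key):
    chain = buckets[H[key]]
    for block in chain:
        if len(block) < BLOCK_CAPACITY:
            block.append(key)
            return
    chain.append([key])  # every block in the chain is full: add an overflow block

for key in "abcde":
    insert(key)

print(buckets)
# [[['d']], [['a', 'c'], ['e']], [['b']], [[]]]
```

Bucket 1 ends with a full primary block and one overflow block holding e.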
6
Deletion from a Static Hash Table
  • To delete records with key K:
  • Go to the bucket numbered h(K)
  • Search for records with key K, deleting any that
    are found
  • Possibly condense the chain of overflow blocks
    for that bucket
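A minimal sketch of deletion from one bucket's chain follows. It assumes records are identified by their key alone and that the chain is always condensed after a deletion; the slide only says "possibly", so eager condensing is one policy among several. The chain contents are made up for illustration.

```python
BLOCK_CAPACITY = 2  # records per block (assumed)

def delete(chain, key):
    """Delete all records with `key` from one bucket's chain, then condense it."""
    survivors = [rec for block in chain for rec in block if rec != key]
    # Repack the survivors into as few blocks as possible.
    condensed = [survivors[i:i + BLOCK_CAPACITY]
                 for i in range(0, len(survivors), BLOCK_CAPACITY)]
    return condensed or [[]]  # always keep the (possibly empty) primary block

chain = [["c", "e"], ["f", "g"]]   # hypothetical primary block plus one overflow block
chain = delete(chain, "e")
chain = delete(chain, "f")
print(chain)
# [['c', 'g']]
```

After both deletions the overflow block is no longer needed, so the chain shrinks back to a single block.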

7
EXAMPLE: deletion
Delete: e, f
[Figure: buckets 0 1 2 3 holding records a, b, c, d, e, f, g]
8
Rule of thumb
  • Try to keep space utilization
    between 50% and 80%
  • Utilization = (keys used) / (total keys that fit)
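As a quick arithmetic check of the rule of thumb, the sketch below computes utilization for a made-up table. It assumes "total keys that fit" counts only the primary blocks; the function name and the numbers are illustrative.

```python
def utilization(keys_used, buckets, keys_per_bucket):
    """Fraction of the available key slots that are occupied."""
    return keys_used / (buckets * keys_per_bucket)

u = utilization(keys_used=5, buckets=4, keys_per_bucket=2)
print(f"{u:.1%}")        # 62.5%
print(0.5 <= u <= 0.8)   # True: inside the suggested range
```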

9
Efficiency of Static Hash Tables
  • If the hash table size is large enough and the
    distribution of keys by the hash function is
    sufficiently "even", then most buckets have no
    overflow blocks
  • In this case lookup typically takes one disk I/O
    and insertion/deletion take two
  • Significantly better than sequential indexes and
    B-trees
  • (But hash tables do not support efficient range
    queries as B-trees do)
  • What if there are long chains of overflow blocks?

10
How do we cope with growth?
  • Overflows and reorganizations
  • Dynamic hashing

11
Extensible Hash Tables
  • Each bucket in the bucket array contains a
    pointer to a block, instead of a block itself
  • Bucket array can grow by doubling in size
  • Certain buckets can share a block if small enough
  • hash function computes a sequence of k bits, but
    only first i bits are used at any time to index
    into the bucket array
  • Value of i can increase (corresponds to bucket
    array doubling in size)

12
Extensible hashing: two ideas
  • (a) Use i of b bits output by hash function
  • h(K) → 00110101 (b bits)
  • use first i bits; i grows over time
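To make idea (a) concrete, the sketch below extracts the first i bits of the 8-bit value shown on the slide for growing i. The helper name first_bits is hypothetical.

```python
HASH_BITS = 8  # b: number of bits the hash function outputs

def first_bits(h, i):
    """Return the integer formed by the first (high-order) i bits of h."""
    return h >> (HASH_BITS - i)

h = 0b00110101
for i in range(1, 5):
    print(i, format(first_bits(h, i), f"0{i}b"))
# 1 0
# 2 00
# 3 001
# 4 0011
```

Each increase of i refines the previous prefix by one bit, which is what lets the bucket array double without rehashing every key.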
13
  • (b) Use directory
  • first i bits of h(K) → directory entry → pointer to bucket
14
Inserting into Extensible Hash Table
  • To insert record with key K:
  • compute h(K)
  • go to bucket indexed by first i bits of h(K)
  • follow the pointer to get to block B
  • if room in B, insert record
  • else let j be number of bits of hash value used
    to determine membership in B

15
Insertion cont'd
  • Case 1: j < i.
  • split block B in two
  • distribute records in B to the 2 new blocks based
    on value of their (j+1)-st bit
  • update header of each new block to j+1
  • adjust pointers in bucket array so that entries
    that used to point to B now point to correct
    block
  • if still no room in appropriate block for new
    record then repeat this process

16
Insertion cont'd
  • Case 2: j = i.
  • increment i by 1
  • double length of bucket array
  • entry for w0 and w1 both point to same block that
    old entry w pointed to (block is shared)
  • apply case 1 to split block B
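The two insertion cases can be sketched as follows. This is an illustrative model, not the textbook's code: keys are given directly as their 4-bit hash values, blocks are Python objects, and the names Block and ExtensibleHashTable are hypothetical. A block that is full while j already equals the number of hash bits cannot be split; the sketch raises an error there instead of chaining an overflow block.

```python
HASH_BITS = 4        # b: bits produced by the hash function (assumed)
BLOCK_CAPACITY = 2   # records per block (assumed)

class Block:
    def __init__(self, depth):
        self.depth = depth   # j: number of hash bits that determine membership
        self.keys = []

class ExtensibleHashTable:
    def __init__(self):
        self.i = 1                              # bits used to index the bucket array
        self.directory = [Block(1), Block(1)]   # bucket array of pointers to blocks

    def _index(self, h):
        return h >> (HASH_BITS - self.i)        # first i bits of h

    def insert(self, h):
        """Insert a key given directly as its b-bit hash value h."""
        while True:
            block = self.directory[self._index(h)]
            if len(block.keys) < BLOCK_CAPACITY:
                block.keys.append(h)
                return
            if block.depth == HASH_BITS:
                raise OverflowError("block full and no hash bits left to split on")
            if block.depth == self.i:
                # Case 2: double the array; entries w0 and w1 share w's block.
                self.directory = [blk for blk in self.directory for _ in range(2)]
                self.i += 1
            # Case 1: split the block on its (j+1)-st bit.
            j = block.depth
            zero, one = Block(j + 1), Block(j + 1)
            for key in block.keys:
                bit = (key >> (HASH_BITS - j - 1)) & 1
                (one if bit else zero).keys.append(key)
            # Redirect every entry that pointed to the old block.
            for idx, blk in enumerate(self.directory):
                if blk is block:
                    bit = (idx >> (self.i - j - 1)) & 1
                    self.directory[idx] = one if bit else zero
            # Loop again: the target block may still be full.

table = ExtensibleHashTable()
for h in [0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001]:
    table.insert(h)
print(table.i, len(table.directory))   # 3 8
```

With this insertion order the bucket array doubles twice (i goes from 1 to 3), while entries such as 000 and 001 still share a single block.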

17
Example: h(k) is 4 bits; 2 keys/bucket
  • i = 1
  • block for 0 (j = 1): 0001
  • block for 1 (j = 1): 1001, 1100
  • Insert 1010
18
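Example continued
  • i = 2; bucket array entries 00 01 10 11
  • block shared by 00 and 01 (j = 1): 0001, 0111
  • block for 10 (j = 2): 1001, 1010
  • block for 11 (j = 2): 1100
  • Insert 0111, 0000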
19
Example continued
  • i = 2; bucket array entries 00 01 10 11
  • block for 00 (j = 2): 0000, 0001
  • block for 01 (j = 2): 0111
  • Insert 1001
20
Extensible hashing: deletion
  • Option 1: no merging of blocks
  • Option 2: merge blocks and cut directory if possible
  • (Reverse insert procedure)

21
Extensible hashing: Summary
  • Can handle growing files
    - with less wasted space
    - with no full reorganizations

22
Linear Hash Tables
  • Number of buckets increases more slowly than with
    extensible hashing
  • Number of buckets is such that on average each
    block is x% full (say 80%) -- threshold
  • Overflow blocks can occur but average number per
    bucket << 1
  • Use the i low-order bits from the result of the
    hash function to index into the bucket array

23
Linear hashing
  • Another dynamic hashing scheme

(b) Bucket array grows linearly
24
Inserting into Linear Hash Table
  • To insert record with key K, with last i bits of
    h(K) being a1a2...ai:
  • Let m be the integer represented by a1a2...ai in
    binary
  • If m < n (number of buckets), then bucket m
    exists -- put record in that bucket
  • If m ≥ n, then bucket m does not (yet) exist, so
    put record in bucket whose index corresponds to
    0a2...ai

25
Inserting cont'd
  • If no room in indicated bucket, then create an
    overflow bucket
  • Compare the ratio (number of records) / (number of
    buckets) to the threshold
  • If it exceeds the threshold, then add a new bucket and
    rearrange records
  • If number of buckets exceeds 2^i, then increment i
    by 1
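The linear hashing insertion steps can be sketched as below. Several things are assumptions made for illustration: keys are given directly as 4-bit hash values, a bucket is a Python list whose length beyond two records models its overflow blocks, the threshold is 1.6 records per bucket (80% of a 2-record block), and the class name LinearHashTable is hypothetical.

```python
MAX_RECORDS_PER_BUCKET = 1.6   # assumed threshold: 80% of a 2-record block

class LinearHashTable:
    def __init__(self):
        self.i = 1              # number of low-order hash bits in use
        self.n = 2              # current number of buckets
        self.r = 0              # current number of records
        self.buckets = [[], []]

    def _bucket_index(self, h):
        m = h & ((1 << self.i) - 1)      # integer given by the last i bits
        if m >= self.n:                  # bucket m does not exist yet
            m -= 1 << (self.i - 1)       # clear the leading bit: 0a2...ai
        return m

    def insert(self, h):
        self.buckets[self._bucket_index(h)].append(h)
        self.r += 1
        if self.r / self.n > MAX_RECORDS_PER_BUCKET:
            self._add_bucket()

    def _add_bucket(self):
        if self.n == 1 << self.i:        # the new bucket needs one more bit
            self.i += 1
        new_index = self.n
        self.n += 1
        self.buckets.append([])
        # Records for the new bucket currently sit in the bucket whose index
        # differs from it only in the leading bit; rearrange just that bucket.
        old_index = new_index - (1 << (self.i - 1))
        old, self.buckets[old_index] = self.buckets[old_index], []
        for h in old:
            self.buckets[self._bucket_index(h)].append(h)

table = LinearHashTable()
for h in [0b0000, 0b1010, 0b1111, 0b0101, 0b0101]:
    table.insert(h)
print(table.i, table.n, table.buckets)
# 2 4 [[0], [5, 5], [10], [15]]
```

Only one bucket is split each time the threshold is crossed, so the file grows one bucket at a time rather than by doubling.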

26
Example: b = 4 bits, i = 2, 2 keys/bucket
  • insert 0101
  • bucket 00: 0000, 1010
  • bucket 01: 0101, 1111
  • buckets 10 and 11: future growth buckets

m = 01 (max used block)
27
Example: b = 4 bits, i = 2, 2 keys/bucket
  • bucket 00: 0000, 1010
  • bucket 01: 0101, 1111
  • buckets 10 and 11: future growth buckets

m = 01 (max used block)
28
Example continued: how to grow beyond this?
  • i = 2
  • bucket 00: 0000
  • bucket 01: 0101, 0101
  • bucket 10: 1010
  • bucket 11: 1111

m = 11 (max used block)
29
Linear Hashing: Summary
  • Can handle growing files
    - with less wasted space
    - with no full reorganizations
  • No indirection, unlike extensible hashing



30
Comparing Index Approaches
  • Hashing good for probes given key
  • e.g., SELECT ...
    FROM R
    WHERE R.A = 5

31
Indexing vs Hashing
  • Sequential indexes and B-trees good for
    range searches
  • e.g., SELECT ...
    FROM R
    WHERE R.A > 5
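The contrast between the two query shapes can be illustrated in memory. In this sketch a Python dict stands in for a hash index and a sorted list with binary search stands in for a sequential index or B-tree; the rows and payload names are made up.

```python
import bisect

rows = [(3, "r1"), (5, "r2"), (5, "r3"), (8, "r4"), (9, "r5")]  # (A, payload)

# Hash-style index on A: a single probe answers WHERE R.A = 5.
hash_index = {}
for a, payload in rows:
    hash_index.setdefault(a, []).append(payload)
equal_5 = hash_index.get(5, [])

# Ordered index on A: binary search finds where R.A > 5 starts,
# and all matches are contiguous from that position.
sorted_rows = sorted(rows)
keys = [a for a, _ in sorted_rows]
start = bisect.bisect_right(keys, 5)
greater_5 = [payload for _, payload in sorted_rows[start:]]

print(equal_5)    # ['r2', 'r3']
print(greater_5)  # ['r4', 'r5']
```

The dict gives no help with the range query, because keys that are close in value are scattered across unrelated buckets.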

32
Index definition in SQL
  • CREATE INDEX name ON rel (attr)
  • CREATE UNIQUE INDEX name ON rel (attr)
    (defines candidate key)
  • DROP INDEX name
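These statements can be tried with Python's built-in sqlite3 module. The sketch uses SQLite only because it ships with Python; the table, column and index names are placeholders, and other database systems may behave differently in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rel (attr INTEGER, other TEXT)")
conn.execute("CREATE INDEX attr_idx ON rel (attr)")
conn.execute("CREATE UNIQUE INDEX other_idx ON rel (other)")

conn.execute("INSERT INTO rel VALUES (1, 'x')")
try:
    # Same value in the uniquely indexed column: violates the candidate key.
    conn.execute("INSERT INTO rel VALUES (2, 'x')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.execute("DROP INDEX attr_idx")
remaining = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
print(duplicate_rejected, remaining)   # True ['other_idx']
```

The unique index rejects the second row, which is what makes the indexed attribute behave as a candidate key.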

33
  • CANNOT SPECIFY TYPE OF INDEX
  • (e.g. B-tree, Hashing, ...)
  • OR PARAMETERS
  • (e.g. Load Factor, Size of Hash,...)
  • ... at least in SQL...

Note