Title: Chapter 13B Week 4
1 Chapter 13B Week 4
2 External Hashing
- Hashing implemented in secondary storage
3 Static Hashing
- Unix Example
- Each Unix file has an inode that contains
  - File ownership info
  - Access rights
  - Location of data
4 Continued
- Have the start address contained in the inode point to a table that maps hash addresses to block addresses.
- The block (or run of contiguous blocks) at each block address is called a bucket.
- Each bucket contains multiple records.
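5 The scheme looks like this
[Figure: the inode points to a table with entries 0, 1, 2, ..., M-1. The hash function maps a key to a table entry, and each entry holds the block address of a bucket.]
- Buckets are contiguous blocks. Can be one or several. The bucket shown holds five records: r1 r2 r3 r4 r5.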
6 Problem: Number of buckets is fixed at file creation
- Remedy: maintain a linked list of overflow records for each bucket.
7 What happens if a bucket is filled
[Figure: the inode, a full bucket, and its overflow records linked by record pointers.]
- Record pointer contains both block address and the relative record position within the block
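The static scheme with overflow handling can be sketched in a few lines. This is a minimal in-memory illustration, not the Unix implementation: the class names, the `key mod M` hash function, and the values of `M` and `BUCKET_CAPACITY` are assumptions, and a plain Python list stands in for the linked list of overflow records.

```python
from dataclasses import dataclass, field

BUCKET_CAPACITY = 5   # records per bucket, matching the five-record bucket pictured
M = 4                 # number of buckets, fixed at file creation (assumed value)


@dataclass
class Bucket:
    records: list = field(default_factory=list)    # (key, value) pairs in the main block(s)
    overflow: list = field(default_factory=list)   # stand-in for the linked overflow list


class StaticHashFile:
    def __init__(self, m=M, capacity=BUCKET_CAPACITY):
        self.m = m
        self.capacity = capacity
        self.table = [Bucket() for _ in range(m)]   # table entry i -> bucket i

    def _address(self, key):
        return key % self.m                         # assumed hash function: key mod M

    def insert(self, key, value):
        bucket = self.table[self._address(key)]
        if len(bucket.records) < self.capacity:
            bucket.records.append((key, value))
        else:
            bucket.overflow.append((key, value))    # bucket is full: record goes to the overflow list

    def find(self, key):
        bucket = self.table[self._address(key)]
        for k, v in bucket.records + bucket.overflow:
            if k == key:
                return v
        return None
```

Because M never changes, a file that keeps growing pushes more and more records into the overflow lists, and every lookup in a full bucket has to scan them.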
8 Trie-Based Hashing
- Another remedy to the overflow/fixed space problem
- Allow the number of allocated buckets to grow and shrink as needed
- Scheme
  - H(K) produces an integer in binary
  - Distributes records among buckets based on the values of the leading bits in their hash values
  - Mimics a binary tree
9 Insert
- Begin with a bucket of disk addresses: r1 r2 r3 r4
- Overflow? Split the bucket (block) based on the first binary digit of the hash address
  - 0: r1 r3 r4 r5
  - 1: r2
10 Bucket 1 Overflows Again?
- Split on 2nd bit in the hash address
- Suppose we have

  Record Number   Hash Address
  r1              0000
  r2              1000
  r3              0010
  r4              0100
  r5              0110

- Insert r6 (hash address 0111) into the previous structure; its first bit sends it to the 0 bucket, which already holds r1 r3 r4 r5
11 New Structure
[Figure: binary tree after the second split.]
- 0
  - 00: r1 r3
  - 01: r4 r5 r6
- 1: r2
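The two splits above can be reproduced with a small sketch. It assumes a bucket capacity of four (the pictured bucket splits when the fifth record arrives) and takes the hash addresses as ready-made bit strings; the `Node` class and `insert` function are illustrative names, not part of the slides.

```python
BUCKET_CAPACITY = 4   # assumption: a bucket overflows on its fifth record


class Node:
    """A leaf while `bucket` is a list; an internal node once it has been split."""

    def __init__(self):
        self.bucket = []    # (record name, hash bits) pairs while this node is a leaf
        self.zero = None    # child for the next bit == 0
        self.one = None     # child for the next bit == 1

    def is_leaf(self):
        return self.bucket is not None


def insert(root, name, bits):
    # Walk down the tree, consuming one leading bit per internal node.
    node, depth = root, 0
    while not node.is_leaf():
        node = node.one if bits[depth] == "1" else node.zero
        depth += 1
    node.bucket.append((name, bits))

    # Split on the next bit for as long as the bucket holds too many records.
    while len(node.bucket) > BUCKET_CAPACITY and depth < len(bits):
        records, node.bucket = node.bucket, None
        node.zero, node.one = Node(), Node()
        for rec in records:
            child = node.one if rec[1][depth] == "1" else node.zero
            child.bucket.append(rec)
        # If every record landed in one child, that child must be split again.
        node = node.zero if len(node.zero.bucket) > BUCKET_CAPACITY else node.one
        depth += 1


root = Node()
for name, bits in [("r1", "0000"), ("r2", "1000"), ("r3", "0010"),
                   ("r4", "0100"), ("r5", "0110"), ("r6", "0111")]:
    insert(root, name, bits)
```

After the six inserts, `root.one` holds r2, `root.zero.zero` holds r1 and r3, and `root.zero.one` holds r4, r5 and r6.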
12 Find(K)
- Compute H(K)
- Traverse structure based on the binary value of the hash function
- Do linear search of bucket for K
- Go to the disk address contained there
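A sketch of these four steps over a hand-built tree follows. Internal nodes are dicts keyed by "0" and "1", buckets are lists of (key, disk address) pairs, and the lookup table `HASH_BITS` stands in for a real hash function. The disk addresses and the extra key r7 are made up for illustration.

```python
# Stand-in for H(K): hash addresses from the running example, plus a hypothetical r7.
HASH_BITS = {"r1": "0000", "r2": "1000", "r3": "0010",
             "r4": "0100", "r5": "0110", "r6": "0111", "r7": "0101"}

# Tree after the second split; the disk addresses are invented placeholders.
TRIE = {
    "0": {
        "0": [("r1", 100), ("r3", 102)],
        "1": [("r4", 103), ("r5", 104), ("r6", 105)],
    },
    "1": [("r2", 101)],
}


def find(trie, key):
    bits = HASH_BITS[key]              # 1. compute H(K)
    node, depth = trie, 0
    while isinstance(node, dict):      # 2. follow one bit per internal node
        node = node[bits[depth]]
        depth += 1
    for k, disk_address in node:       # 3. linear search of the bucket
        if k == key:
            return disk_address        # 4. the caller reads the record at this address
    return None                        # key is not stored in the file
```

Here `find(TRIE, "r5")` walks 0, then 1, and scans the three-entry bucket; `find(TRIE, "r7")` reaches the same bucket and returns `None`.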
13 Issues
- Hashing function that produces an n-bit address can accommodate 2^n buckets
  - Total records <= 2^n * NumRecs/bucket
- Hash function should distribute values uniformly to keep tree balanced
- Delete
  - Suppose r4, r5, r6 deleted
  - Remove pointer to their bucket
  - Remove 00/01 internal node
  - Reset the 0 branch of the 0/1 internal node to point to the bucket containing r1 r3
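The delete case can be sketched with the same kind of nested structure: dicts for internal nodes and lists of record names for buckets. The function below is an illustrative assumption about how the collapse might be coded; it only merges when the sibling of the emptied bucket is itself a bucket, which is the situation described above.

```python
def delete(node, key, bits, depth=0):
    """Remove `key` and return the subtree that should replace `node`."""
    if isinstance(node, list):                       # leaf bucket
        return [k for k in node if k != key]
    bit = bits[depth]
    other = "1" if bit == "0" else "0"
    node[bit] = delete(node[bit], key, bits, depth + 1)
    # An emptied bucket whose sibling is also a bucket: drop this internal node
    # and let the parent point straight at the sibling.
    if node[bit] == [] and isinstance(node[other], list):
        return node[other]
    return node


trie = {
    "0": {"0": ["r1", "r3"], "1": ["r4", "r5", "r6"]},
    "1": ["r2"],
}
for key, bits in [("r4", "0100"), ("r5", "0110"), ("r6", "0111")]:
    trie = delete(trie, key, bits)
```

After the three deletions the 00/01 node is gone and `trie` is `{"0": ["r1", "r3"], "1": ["r2"]}`.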
14 Extendible Hashing
- Replaces the binary tree of dynamic hashing with a directory consisting of 2^d bucket addresses
  - d is called the global depth of the directory
- First d bits of the hash value are an index into the array
- Each bucket has a local depth d', the number of leading hash bits on which its contents are based
  - d' <= d
15 Continued
- Not necessarily 2^d buckets allocated
- Several directory locations with the same first d' bits for their hash values may point to the same bucket address when d' < d for a particular bucket
16 Example
[Figure: directory with global depth d = 3 and entries 000, 001, 010, 011, ..., 111.]
- 000 -> b1: bucket for records whose hash values start with 000 (d' = 3)
- 010, 011 -> b2: bucket for records whose hash values start with 01 (d' = 2)
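The directory lookup in this example amounts to taking the first d bits of the hash value as an array index. The sketch below assumes 8-bit hash values; the placeholder name `b_other` marks directory entries whose buckets the example does not show.

```python
HASH_WIDTH = 8   # assumed number of bits produced by the hash function
d = 3            # global depth

# Directory indexed by the first d bits. Entries 010 and 011 share bucket b2 (d' = 2).
directory = ["b1", "b_other", "b2", "b2",
             "b_other", "b_other", "b_other", "b_other"]


def directory_index(hash_value, depth=d, width=HASH_WIDTH):
    """Return the leading `depth` bits of a `width`-bit hash value."""
    return hash_value >> (width - depth)


def bucket_for(hash_value):
    return directory[directory_index(hash_value)]
```

Hash values 01011010 and 01111111 index entries 010 and 011 and therefore reach the same bucket b2, while 00011111 indexes entry 000 and reaches b1.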
17 Continued
- Suppose bucket pointed to by 010, 011 overflows
- Allocate a new bucket. The old b2 now contains records whose hash values begin with 010
- The new b3 contains records whose hash values begin with 011
- d' of both b2 and b3 is now 3
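18 Bucket splits
[Figure: directory with global depth d = 3 and entries 000, 001, 010, 011, ..., 111 after the split.]
- 000 -> b1: bucket for recs whose hash starts with 000 (d' = 3)
- 010 -> b2: bucket for recs whose hash starts with 010 (d' = 3)
- 011 -> b3: bucket for recs whose hash starts with 011 (d' = 3)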
19 Increase Table Size
- What happens when d = d' for some bucket and overflow occurs?
- We need an extra table entry to distinguish the new value
  - E.g., if 000 overflows we need 0000 and 0001
- Increase d by 1
  - Doubles table size
20 Continued
[Figure: the doubled directory with entries 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, ..., 1111. Two buckets are labeled d' = 4 and two are labeled d' = 3; each holds records r1, r2, ..., rn.]
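Bucket splitting and directory doubling fit together as in the sketch below. It is a simplified in-memory model under stated assumptions: keys are their own hash values, hash values are `n_bits` wide, buckets hold `capacity` keys, and the class and method names are illustrative.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth    # d'
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity=2, n_bits=4):
        self.capacity = capacity
        self.n_bits = n_bits              # width of a hash value
        self.d = 0                        # global depth
        self.directory = [Bucket(0)]      # always 2**d entries

    def _index(self, h):
        return h >> (self.n_bits - self.d)          # first d bits of the hash value

    def insert(self, h):
        while True:
            bucket = self.directory[self._index(h)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(h)
                return
            if bucket.local_depth == self.n_bits:
                raise OverflowError("no hash bits left; an overflow chain would be needed")
            if bucket.local_depth == self.d:        # d = d': the directory must double first
                self._double()
            self._split(bucket)                     # then retry the insert

    def _double(self):
        # Old entry i becomes new entries 2i and 2i+1, both pointing to the same bucket.
        self.directory = [b for b in self.directory for _ in range(2)]
        self.d += 1

    def _split(self, old):
        old.local_depth += 1
        new = Bucket(old.local_depth)
        shift = self.n_bits - old.local_depth       # position of the newly significant bit
        keys, old.keys = old.keys, []
        for k in keys:
            (new if (k >> shift) & 1 else old).keys.append(k)
        # Directory entries whose newly significant bit is 1 now point to the new bucket.
        for i, b in enumerate(self.directory):
            if b is old and (i >> (self.d - old.local_depth)) & 1:
                self.directory[i] = new


table = ExtendibleHash(capacity=2, n_bits=4)
for h in (0b0000, 0b1000, 0b0100, 0b0010):
    table.insert(h)
```

After these four inserts the directory has doubled twice (d = 2). Entries 10 and 11 still share one bucket with d' = 1, while entries 00 and 01 point to separate buckets with d' = 2.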
21 What if d > d' for all buckets?
- Means that at least two directory positions point to the same bucket for all buckets
- We can now halve the directory
22 Continued
- Before (d = 3)
  - 000, 001: d' = 2
  - 010, 011: d' = 2
  - 100, 101: d' = 2
  - 110, 111: d' = 2
- Halve (d = 2)
  - 00: d' = 2
  - 01: d' = 2
  - 10: d' = 2
  - 11: d' = 2
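The halving step can be sketched as a small function. The directory is modeled as a list of bucket names indexed by leading bits, with a separate map from bucket name to local depth; the bucket names A through D are hypothetical.

```python
def halve(directory, d, local_depth):
    """Halve the directory if every bucket has local depth d' < d.

    directory:   list of 2**d bucket names, indexed by the first d hash bits
    local_depth: dict mapping bucket name -> d'
    Returns the (possibly unchanged) directory and global depth.
    """
    if d == 0 or any(local_depth[b] >= d for b in directory):
        return directory, d            # some bucket still needs all d bits
    # Entries 2i and 2i+1 point to the same bucket, so keep one of each pair.
    return directory[::2], d - 1


before = ["A", "A", "B", "B", "C", "C", "D", "D"]      # entries 000 .. 111
depths = {"A": 2, "B": 2, "C": 2, "D": 2}
after, new_d = halve(before, 3, depths)
```

With every local depth equal to 2, the eight-entry directory shrinks to the four entries 00, 01, 10, 11. A second call leaves it unchanged because d' = d for every bucket.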
23 Linear Hashing
- Drawback of trie-based and extendible hashing is that each requires an auxiliary data structure
  - Dynamic hashing: binary tree, which may not be balanced
  - Extendible hashing: directory
24 Task
- Design a scheme that adjusts size to the amount of data and requires no auxiliary data structures
- Linear Hashing Example
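One way to meet this task is the usual textbook formulation of linear hashing, sketched below. It may differ from the example the course goes on to present. It assumes integer keys, the hash family h_i(K) = K mod (M * 2^i), and a split of the bucket at the split pointer whenever any insert overflows a bucket; all names and parameter values are illustrative.

```python
class LinearHash:
    def __init__(self, m=2, capacity=2):
        self.m = m                    # initial number of buckets M
        self.capacity = capacity      # records a bucket holds before it overflows
        self.level = 0                # i: number of completed rounds of splitting
        self.split_ptr = 0            # n: next bucket to be split
        self.buckets = [[] for _ in range(m)]

    def _address(self, key):
        a = key % (self.m * 2 ** self.level)              # h_i
        if a < self.split_ptr:                            # bucket already split this round
            a = key % (self.m * 2 ** (self.level + 1))    # so use h_(i+1)
        return a

    def insert(self, key):
        bucket = self.buckets[self._address(key)]
        bucket.append(key)            # records beyond capacity stand in for an overflow chain
        if len(bucket) > self.capacity:
            self._split()

    def find(self, key):
        return key in self.buckets[self._address(key)]

    def _split(self):
        # Split the bucket at split_ptr, which need not be the one that overflowed.
        old = self.buckets[self.split_ptr]
        self.buckets[self.split_ptr] = []
        self.buckets.append([])                           # new bucket at split_ptr + M * 2**level
        size = self.m * 2 ** (self.level + 1)
        for k in old:
            self.buckets[k % size].append(k)
        self.split_ptr += 1
        if self.split_ptr == self.m * 2 ** self.level:    # round complete: file has doubled
            self.level += 1
            self.split_ptr = 0


lh = LinearHash(m=2, capacity=2)
for key in (0, 2, 4, 6, 1, 3, 5):
    lh.insert(key)
```

The only bookkeeping is the pair (level, split pointer); no tree or directory is kept. The file grows one bucket at a time, in a fixed linear order, which is where the name comes from.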