Chapter 13B Week 4

Provided by: paulde8

1
Chapter 13B Week 4
  • External Hashing

2
External Hashing
  • Hashing implemented in secondary storage

3
Static Hashing
  • Unix Example
  • Each Unix file has an inode that contains:
    • File ownership info
    • Access rights
    • Location of data

4
Continued
  • Have the start address contained in the inode
    point to a table that maps hash addresses to
    block addresses.
  • The block (or run of contiguous blocks) at each block address is
    called a bucket.
  • Each bucket contains multiple records

5
The scheme looks like this
[Figure: the inode points to a table with entries 0, 1, 2, ..., M-1. The
hash function maps a key to one of these entries, and each entry holds a
block address. Buckets are contiguous blocks, either one or several; the
bucket shown holds five records, r1 to r5.]
6
Problem: Number of buckets is fixed at file creation
  • Maintain a linked list of overflow records for
    each bucket.
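
The static scheme with overflow lists can be sketched as follows. This is a minimal in-memory sketch, not the Unix implementation: the class name, the `key % M` hash function, and the use of Python lists in place of disk blocks and linked lists are all assumptions made for illustration.

```python
class StaticHashFile:
    """Fixed number of buckets, each holding at most `capacity` records."""

    def __init__(self, num_buckets, capacity):
        self.num_buckets = num_buckets   # M, fixed at file creation
        self.capacity = capacity         # records per bucket
        self.buckets = [[] for _ in range(num_buckets)]
        # One overflow chain per bucket (a list stands in for a linked list).
        self.overflow = [[] for _ in range(num_buckets)]

    def _address(self, key):
        # Hypothetical hash function: integer key modulo M.
        return key % self.num_buckets

    def insert(self, key, record):
        i = self._address(key)
        if len(self.buckets[i]) < self.capacity:
            self.buckets[i].append((key, record))
        else:
            # Bucket is full: the record goes on that bucket's overflow chain.
            self.overflow[i].append((key, record))

    def find(self, key):
        i = self._address(key)
        # Search the bucket first, then its overflow chain.
        for stored_key, record in self.buckets[i] + self.overflow[i]:
            if stored_key == key:
                return record
        return None


f = StaticHashFile(num_buckets=4, capacity=2)
for key in (1, 5, 9, 13):        # all four keys hash to bucket 1
    f.insert(key, f"rec{key}")
```

With four buckets of capacity two, the third and fourth keys that hash to bucket 1 end up on its overflow chain, so lookups for them cost an extra scan.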

7
What happens if a bucket is filled?
[Figure: inode diagram. A record pointer contains both the block address
and the relative record position within the block.]
8
Trie-Based Hashing
  • Another remedy to the overflow/fixed space
    problem
  • Allow the number of allocated buckets to grow and
    shrink as needed
  • Scheme
  • H(K) produces an integer in binary
  • Distributes records among buckets based on the
    values of the leading bits in their hash values
  • Mimics a binary tree

9
Insert
  • Begin with a bucket of disk addresses: r1 r2 r3 r4
  • Overflow? Split the bucket (block) based on the first binary digit
    of the hash address:
    • 0: r1 r3 r4 r5
    • 1: r2
10
Bucket 1 Overflows Again?
  • Split on 2nd bit in the hash address
  • Suppose we have

      Record Number   Hash Address
      r1              0000
      r2              1000
      r3              0010
      r4              0100
      r5              0110

  • Insert r6 (hash address 0111) in the previous structure

11
New Structure
  • 0
    • 00: r1 r3
    • 01: r4 r5 r6
  • 1: r2
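
The insert-and-split behaviour on the last three slides can be sketched as follows, using the slides' records and hash addresses. The `Node` class, the `insert` and `dump` helpers, and the bucket capacity of four are assumptions for illustration; the slides do not give an implementation.

```python
BUCKET_CAPACITY = 4   # assumed: the Insert slide shows a bucket holding four records


class Node:
    def __init__(self):
        self.records = []      # (name, hash_bits) pairs, used only while a leaf
        self.children = None   # [child for bit 0, child for bit 1] once split


def insert(root, name, bits):
    node, depth = root, 0
    # Walk down internal nodes using successive leading bits of the hash.
    while node.children is not None:
        node = node.children[int(bits[depth])]
        depth += 1
    node.records.append((name, bits))
    # Split on the next bit while the bucket overflows and bits remain.
    while len(node.records) > BUCKET_CAPACITY and depth < len(bits):
        node.children = [Node(), Node()]
        for rec in node.records:
            node.children[int(rec[1][depth])].records.append(rec)
        node.records = []
        # If every record landed on one side, that side still overflows.
        node = max(node.children, key=lambda c: len(c.records))
        depth += 1


def dump(node, prefix=""):
    """Return {bit prefix: [record names]} for every bucket in the trie."""
    if node.children is None:
        return {prefix: [name for name, _ in node.records]}
    out = {}
    for bit, child in enumerate(node.children):
        out.update(dump(child, prefix + str(bit)))
    return out


root = Node()
for name, bits in [("r1", "0000"), ("r2", "1000"), ("r3", "0010"),
                   ("r4", "0100"), ("r5", "0110"), ("r6", "0111")]:
    insert(root, name, bits)
structure = dump(root)
```

Inserting r5 triggers the split on the first bit, and inserting r6 triggers the split on the second bit, which leaves three buckets with prefixes 00, 01, and 1.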
12
Find(K)
  1. Compute H(K)
  2. Traverse the structure based on the leading bits of the hash
    value until a bucket is reached
  3. Do a linear search of the bucket for K
  4. Go to the disk address contained in the matching entry
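
The four steps can be sketched as follows against the structure built from r1 to r6. The nested-list representation, the lookup table standing in for the hash function, and the disk addresses are all made up for illustration.

```python
# Hash addresses from the slides. H is a stand-in lookup, not a real hash function.
HASH_ADDRESS = {"r1": "0000", "r2": "1000", "r3": "0010",
                "r4": "0100", "r5": "0110", "r6": "0111"}


def H(key):
    return HASH_ADDRESS[key]


# Internal node: [subtree for bit 0, subtree for bit 1].
# Bucket (leaf): tuple of (key, disk_address) entries; addresses are hypothetical.
TRIE = [
    [
        (("r1", 100), ("r3", 116)),                 # prefix 00
        (("r4", 132), ("r5", 148), ("r6", 164)),    # prefix 01
    ],
    (("r2", 180),),                                 # prefix 1
]


def find(key, trie=TRIE):
    bits = H(key)                           # step 1: compute the hash value
    node, depth = trie, 0
    while isinstance(node, list):           # step 2: follow the leading bits
        node = node[int(bits[depth])]
        depth += 1
    for stored_key, address in node:        # step 3: linear search of the bucket
        if stored_key == key:
            return address                  # step 4: the disk address to visit
    return None
```

For example, `find("r5")` follows bits 0 then 1, scans the three-entry bucket, and returns the address stored beside r5.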

13
Issues
  • Hashing function that produces n-bit address can
    accommodate 2^n buckets
  • Total records ≤ 2^n × (number of records per bucket)
  • Hash function should distribute values uniformly
    to keep tree balanced
  • Delete
  • Suppose r4, r5, r6 deleted
  • Remove pointer to their bucket
  • Remove 00/01 internal node
  • Reset the 0/1 internal node to point to bucket
    containing r1 r3
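
The delete case above can be sketched as follows. The dict-and-list representation and the `delete` and `collapse` helpers are assumptions for illustration; this sketch only collapses an internal node when both children are buckets and one of them is empty.

```python
# Leaf bucket: list of record names. Internal node: {"0": subtree, "1": subtree}.
trie = {"0": {"0": ["r1", "r3"], "1": ["r4", "r5", "r6"]}, "1": ["r2"]}


def delete(node, names):
    """Return a copy of the trie with the named records removed."""
    if isinstance(node, list):
        return [r for r in node if r not in names]
    return {bit: delete(child, names) for bit, child in node.items()}


def collapse(node):
    """Replace an internal node by its surviving bucket when the sibling is empty."""
    if isinstance(node, list):
        return node
    left, right = collapse(node["0"]), collapse(node["1"])
    if isinstance(left, list) and isinstance(right, list) and (not left or not right):
        return left + right   # one side is empty, so this is the other bucket
    return {"0": left, "1": right}


after = collapse(delete(trie, {"r4", "r5", "r6"}))
```

After the deletion, the 00/01 internal node is gone and the 0 branch of the root points directly at the bucket holding r1 and r3.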

14
Extendible Hashing
  • Replaces the binary tree of dynamic hashing with
    a directory consisting of 2^d bucket addresses
  • d is called the global depth of the directory
  • First d bits of the hash value are an index into
    the array
  • d' is called the local depth of a bucket
  • d' ≤ d

15
Continued
  • Not necessarily 2^d buckets allocated
  • Several directory locations with the same first
    d' bits for their hash values may point to the
    same bucket address when d' < d for a particular
    bucket

16
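Example
[Figure: directory with global depth d = 3 and entries 000, 001, 010,
011, ..., 111.]
  • b1: bucket for records whose hash values start with 000 (d' = 3)
  • b2: bucket for records whose hash values start with 01 (d' = 2);
    directory entries 010 and 011 both point to it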
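
The directory lookup in this example can be sketched as follows. Only the buckets named on the slide are filled in; the dict fields and the `lookup` helper are hypothetical, and the remaining directory entries are left as `None` because the slide does not show them.

```python
d = 3  # global depth from the example

b1 = {"local_depth": 3, "prefix": "000"}
b2 = {"local_depth": 2, "prefix": "01"}

directory = [None] * (2 ** d)   # entries not shown on the slide stay None
directory[0b000] = b1
directory[0b010] = b2           # 010 and 011 share their first d' = 2 bits,
directory[0b011] = b2           # so both entries point to the same bucket


def lookup(hash_bits):
    """Use the first d bits of the hash value as an index into the directory."""
    return directory[int(hash_bits[:d], 2)]


sharing = sum(1 for entry in directory if entry is b2)
```

A bucket with local depth d' is referenced by 2^(d - d') directory entries, which is two entries for b2 here.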
17
Continued
  • Suppose bucket pointed to by 010,011 overflows
  • Allocate a new bucket. The old b2 now contains
    records whose hash values begin with 010
  • The new b3 contains records whose hash values
    begin 011
  • d' of each of these two buckets is now 3

18
Bucket splits
[Figure: directory with global depth d = 3 and entries 000, 001, 010,
011, ..., 111.]
  • b1: bucket for recs whose hash starts with 000 (d' = 3)
  • b2: bucket for recs whose hash starts with 010 (d' = 3)
  • b3: bucket for recs whose hash starts with 011 (d' = 3)
19
Increase Table Size
  • What happens when d' = d for some bucket and
    overflow occurs?
  • We need an extra table entry to distinguish the
    new value
  • E.g., if 000 overflows, we need 0000 and 0001
  • Increase d by 1
  • Doubles table size
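
Bucket splitting and directory doubling can be sketched together as follows. This is an in-memory sketch under stated assumptions: hash values are supplied directly as 4-bit strings, the bucket capacity is two, and the class and method names are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth   # d'
        self.records = []                # hash values as bit strings


class ExtendibleHash:
    def __init__(self, capacity=2, bits=4):
        self.capacity = capacity         # assumed records per bucket
        self.bits = bits                 # assumed hash width n
        self.global_depth = 0            # d
        self.directory = [Bucket(0)]     # 2^d entries

    def _index(self, h):
        # The first d bits of the hash value index the directory.
        return int(h[:self.global_depth], 2) if self.global_depth else 0

    def insert(self, h):
        while True:
            bucket = self.directory[self._index(h)]
            if len(bucket.records) < self.capacity:
                bucket.records.append(h)
                return
            if bucket.local_depth == self.bits:
                raise OverflowError("no hash bits left to split on")
            if bucket.local_depth == self.global_depth:
                self._double()           # d' = d: need one more directory bit
            self._split(bucket)

    def _double(self):
        # Old entry i becomes entries 2i and 2i+1, both pointing to the same bucket.
        self.directory = [b for b in self.directory for _ in range(2)]
        self.global_depth += 1

    def _split(self, bucket):
        depth = bucket.local_depth + 1
        zero, one = Bucket(depth), Bucket(depth)
        for h in bucket.records:
            (one if h[depth - 1] == "1" else zero).records.append(h)
        for i, b in enumerate(self.directory):
            if b is bucket:
                # Bit number `depth` (from the left) of the d-bit index i.
                bit = (i >> (self.global_depth - depth)) & 1
                self.directory[i] = one if bit else zero


table = ExtendibleHash(capacity=2)
for h in ("0000", "0001", "0010"):
    table.insert(h)
```

Because all three hash values share the prefix 00, the third insert keeps doubling the directory until the first three bits separate 0010 from the other two, which leaves d = 3 and an eight-entry directory.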

20
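Continued
[Figure: directory after doubling, with d = 4 and entries 0000, 0001,
0010, 0011, 0100, 0101, 0110, 0111, ..., 1111.]
Buckets shown:
  • d' = 4: r1, r2, ..., rn
  • d' = 4: r1, r2, ..., rn
  • d' = 3: r1, r2, ..., rn
  • d' = 3: r1, r2, ..., rn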
21
What if d > d' for all buckets?
  • Means that at least two directory positions point
    to the same bucket for all buckets
  • We can now halve the directory

22
Continued
  • 000, 001: d' = 2
  • 010, 011: d' = 2
  • 100, 101: d' = 2
  • 110, 111: d' = 2
  • Halve
  • 00: d' = 2
  • 01: d' = 2
  • 10: d' = 2
  • 11: d' = 2
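
The halving step can be sketched as follows, using the four buckets from this slide. The dict fields and the `can_halve` and `halve` helpers are hypothetical names chosen for illustration.

```python
def can_halve(directory, global_depth):
    """True when every bucket has local depth d' strictly below global depth d."""
    return global_depth > 0 and all(
        bucket["local_depth"] < global_depth for bucket in directory)


def halve(directory):
    # Entries 2i and 2i+1 share their first d-1 bits. Since d' < d for every
    # bucket, they point to the same bucket, so keeping the even entries suffices.
    return directory[0::2]


buckets = [{"prefix": p, "local_depth": 2} for p in ("00", "01", "10", "11")]
directory = [b for b in buckets for _ in range(2)]   # 000..111, d = 3
d = 3

if can_halve(directory, d):
    directory, d = halve(directory), d - 1
```

After halving, the directory has four entries indexed by two bits, and every bucket has d' = d = 2.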

23
Linear Hashing
  • Drawback of trie-based and extendible hashing is
    that each requires an auxiliary data structure
  • Dynamic hashing: binary tree, which may not be
    balanced
  • Extendible hashing: directory

24
Task
  • Design a scheme that adjusts size to the amount
    of data and requires no auxiliary data structures
  • Linear Hashing Example
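
The transcript ends before the example itself, so the following is a sketch of the commonly described linear hashing scheme rather than the presenter's version. The class name, the `key % m` hash family, the load-factor threshold, and the use of unbounded Python lists in place of buckets with overflow chains are all assumptions.

```python
class LinearHash:
    def __init__(self, initial_buckets=2, capacity=2, max_load=0.75):
        self.n0 = initial_buckets     # number of buckets before any split
        self.level = 0                # completed rounds of doubling
        self.next = 0                 # next bucket to split in this round
        self.capacity = capacity      # nominal records per bucket
        self.max_load = max_load      # assumed split trigger
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0

    def _address(self, key):
        a = key % (self.n0 * 2 ** self.level)
        if a < self.next:
            # That bucket was already split this round: use the finer hash.
            a = key % (self.n0 * 2 ** (self.level + 1))
        return a

    def insert(self, key):
        # A list longer than `capacity` stands in for an overflow chain.
        self.buckets[self._address(key)].append(key)
        self.count += 1
        if self.count / (len(self.buckets) * self.capacity) > self.max_load:
            self._split()

    def _split(self):
        # Split bucket `next`, whether or not it is the one that overflowed.
        modulus = self.n0 * 2 ** (self.level + 1)
        old = self.buckets[self.next]
        self.buckets.append([k for k in old if k % modulus != self.next])
        self.buckets[self.next] = [k for k in old if k % modulus == self.next]
        self.next += 1
        if self.next == self.n0 * 2 ** self.level:
            self.level += 1           # every bucket of this round has been split
            self.next = 0

    def find(self, key):
        return key in self.buckets[self._address(key)]


lh = LinearHash()
for key in range(20):
    lh.insert(key)
```

The only bookkeeping is the pair (`level`, `next`), so the file grows one bucket at a time without a tree or a directory.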