Title: Hashing
Lecture 6
Motivating Example
Want to store a list whose elements are integers
between 1 and 5.
Define an array A of size 5: if the list
has element j, then j is stored in A[j-1];
otherwise A[j-1] contains 0.
Complexity of the find operation is O(1).
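A minimal sketch of the array described above, assuming the keys are exactly the integers 1 to 5. The names A, insert and find are illustrative and do not come from the slides.

```python
# Direct-address table for keys 1..U (U = 5 as in the example).
U = 5
A = [0] * U  # A[j-1] == 0 means "j is not in the list"

def insert(j):
    # Key j lives in slot j-1.
    A[j - 1] = j

def find(j):
    # A single array access, so O(1).
    return A[j - 1] == j

for j in (2, 5):
    insert(j)

print(A)        # slots for 2 and 5 are filled, the rest stay 0
print(find(2))  # True
print(find(3))  # False
```

The cost of this scheme is one slot per possible key, which is only practical when the key range is small.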
Hash table
The objective is to find an element in constant
time on average.
Suppose we know the elements belong to {1, 2, ..., U},
and we are allowed an overall space of U; then
this can be done as described before. But U can
be very large.
The space used for storage is called the hash table, H.
Assume that the hash table has size M.
There is a hash function which maps an element to
a value p in {0, ..., M-1}, and the element is placed
in position p in the hash table.
The function is called h (the hash value for
j is h(j)). If h(j) = k, then the element is
added to H[k].
Suppose we want a list of integers; then an
example hash function is h(j) = j modulo M.
Note down example from board
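As a small illustration of h(j) = j modulo M, the sketch below places three integers into a table of size M = 7. The table size and the keys are assumptions chosen so that no two keys share a position; collisions are handled later in the lecture.

```python
M = 7  # table size (illustrative choice)

def h(j):
    # Hash function from the slide: h(j) = j modulo M.
    return j % M

H = [None] * M  # None marks an unused position

for j in (10, 22, 33):
    # These keys hash to 3, 1 and 5, so nothing is overwritten.
    H[h(j)] = j

print([h(j) for j in (10, 22, 33)])
print(H)
```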
We may want to store elements which are not
numbers, e.g., names.
Then we use a function to convert each element to
an integer and hash the integer.
Suppose we want to store the string abc. Represent each
symbol by its ASCII code and choose a number r; the
integer value for abc is
ASCII(a)·r^2 + ASCII(b)·r + ASCII(c)
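The conversion can be sketched as follows. The function name string_to_int and the default r = 31 are assumptions for illustration; the slides only say to choose a number r. The loop uses Horner's rule, which evaluates the same polynomial without computing powers of r explicitly.

```python
def string_to_int(s, r=31):
    # Computes ASCII(s[0])*r^(n-1) + ... + ASCII(s[n-2])*r + ASCII(s[n-1]).
    value = 0
    for ch in s:
        value = value * r + ord(ch)
    return value

def hash_string(s, M, r=31):
    # Convert to an integer first, then hash the integer modulo M.
    return string_to_int(s, r) % M

# For "abc" this equals ASCII(a)*r^2 + ASCII(b)*r + ASCII(c).
print(string_to_int("abc") == ord("a") * 31**2 + ord("b") * 31 + ord("c"))
print(hash_string("abc", 7))
```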
Implementation
Hash tables are arrays. The size of a hash table is
normally a prime number.
Two different elements may hash to the same value
(collision).
Hashing needs collision resolution.
Hash functions are chosen so that the hash values
are spread over {0, ..., M-1} and there are only a few
collisions.
Separate Chaining
Store all the elements mapped to the same
position in a linked list.
Note down the illustration from the board.
H[k] is the list of all elements mapped to k.
To find an element j, compute h(j). Let h(j) = k.
Then search in linked list H[k].
To insert an element j, compute h(j). Let h(j) = k.
Then insert in linked list H[k].
To delete an element, delete it from the linked list.
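A minimal sketch of the three operations, assuming integer keys and h(j) = j modulo M. Python lists stand in for the linked lists here, which is a simplification and not what the slides prescribe; the class and method names are illustrative.

```python
class ChainedHashTable:
    def __init__(self, M=7):
        self.M = M
        # H[k] holds every element whose hash value is k.
        self.H = [[] for _ in range(M)]

    def h(self, j):
        return j % self.M

    def insert(self, j):
        # Appends without checking for duplicates.
        self.H[self.h(j)].append(j)

    def find(self, j):
        # Only the one chain H[h(j)] is searched.
        return j in self.H[self.h(j)]

    def delete(self, j):
        chain = self.H[self.h(j)]
        if j in chain:
            chain.remove(j)

t = ChainedHashTable(7)
for j in (10, 17, 3, 8):
    t.insert(j)   # 10, 17 and 3 all hash to 3; 8 hashes to 1

print(t.H[3])     # the chain of colliding keys
t.delete(17)
print(t.find(17), t.find(3))
```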
Note down example from the board.
Insertion is O(1).
Worst-case search complexity depends on the
maximum length of a list H[p]: O(q) if q is the
maximum length.
We are interested in average search complexity.
Load factor λ is the average size of a list.
λ = (number of elements in the hash table) / (number
of positions in the hash table, M)
Average find complexity is 1 + λ.
Want λ to be approximately 1.
To reduce worst-case complexity we choose hash
functions which distribute the elements evenly
across the lists.
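A short sketch of how the load factor and the longest chain could be computed for a chained table. The table contents are made up for illustration, with M = 7 and each key stored at position key modulo 7.

```python
def load_factor(chains):
    # Number of stored elements divided by number of positions M.
    n = sum(len(c) for c in chains)
    return n / len(chains)

def longest_chain(chains):
    # Governs the worst-case search cost.
    return max(len(c) for c in chains)

# Illustrative table with M = 7 positions and 6 elements.
chains = [[7], [], [9, 16], [10], [], [12, 5], []]

print(load_factor(chains))    # 6 elements over 7 positions
print(longest_chain(chains))  # 2
```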
Open Addressing
Separate chaining requires manipulation of
pointers and dynamic memory allocation, which are
expensive.
Open addressing is an alternative scheme.
Want to insert key (element) j. Compute h(j) = k.
If H[k] is empty, store j in H[k]; otherwise try
H[k+1], H[k+2], etc. (incrementing modulo the table
size). This is linear probing.
Every position in the hash table contains at most one
element.
Note down example from board.
Can always insert a key as long as the table is
not full.
Finding may be slow if the table is close to
full.
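Insertion with linear probing can be sketched as below, assuming integer keys, h(j) = j modulo M, and None as the marker for an empty slot. The function name and the keys are illustrative.

```python
EMPTY = None

def linear_probe_insert(H, j):
    # Try H[k], H[k+1], H[k+2], ... modulo the table size.
    M = len(H)
    k = j % M
    for i in range(M):
        p = (k + i) % M
        if H[p] is EMPTY:
            H[p] = j
            return p  # position where j was stored
    raise OverflowError("hash table is full")

H = [EMPTY] * 7
# 10, 17 and 24 all hash to 3; 6 and 13 both hash to 6.
positions = [linear_probe_insert(H, j) for j in (10, 17, 24, 6, 13)]

print(positions)  # 13 wraps around from slot 6 to slot 0
print(H)
```

Keys that collide end up in consecutive slots, which is why searches get longer as the table fills.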
The idea is to declare a hash table large enough
so that it is never full.
Initially, all slots are empty.
Elements are inserted as described.
When an element is deleted, the slot is marked
deleted (empty and deleted are different).
During the find operation, one looks for element
k starting from where it should be (H[h(k)])
until the element is found or an empty slot is
found. In the latter case, we conclude that the
element is not in the table.
Any problem if empty and deleted are not
distinguished?
When we insert an element k, we start from
H[h(k)] and move until an empty or deleted slot
is found.
An element can be inserted as long as the
hash table is not full.
If hash values are clustered, then even if the hash
table is relatively empty, finding may be
slow.
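The find, insert and delete rules above can be sketched as one small class. This is a sketch under assumptions: integer keys, h(k) = k modulo M, and two sentinel objects EMPTY and DELETED whose names are illustrative. Insert does not check whether the key is already present.

```python
EMPTY = object()    # slot never used
DELETED = object()  # slot used once, element since removed

class LinearProbingTable:
    def __init__(self, M=7):
        self.M = M
        self.H = [EMPTY] * M

    def _probe(self, k):
        # Positions h(k), h(k)+1, ... modulo M, each visited once.
        start = k % self.M
        for i in range(self.M):
            yield (start + i) % self.M

    def insert(self, k):
        for p in self._probe(k):
            if self.H[p] is EMPTY or self.H[p] is DELETED:
                self.H[p] = k
                return p
        raise OverflowError("hash table is full")

    def find(self, k):
        for p in self._probe(k):
            if self.H[p] is EMPTY:
                return False  # an empty slot ends the search
            if self.H[p] is not DELETED and self.H[p] == k:
                return True
        return False

    def delete(self, k):
        for p in self._probe(k):
            if self.H[p] is EMPTY:
                return False
            if self.H[p] is not DELETED and self.H[p] == k:
                self.H[p] = DELETED  # mark, do not reset to EMPTY
                return True
        return False

t = LinearProbingTable(7)
for k in (10, 17, 24):
    t.insert(k)        # all hash to 3, stored in slots 3, 4, 5
t.delete(17)           # slot 4 becomes DELETED
print(t.find(24))      # True: the search passes over slot 4
print(t.insert(31))    # 31 also hashes to 3 and reuses slot 4
```

If delete reset slot 4 to EMPTY instead, the search for 24 would stop at slot 4 and report that 24 is absent, even though it sits in slot 5.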
Quadratic Probing
Alternative to linear probing.
To insert key k, try slot h(k). If the slot is
full, try slot h(k) + 1, then h(k) + 4, then h(k)
+ 9, and so on (all modulo M).
Advantage?
Are we guaranteed to be able to insert as long as
the hash table is not full?
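The probe order can be listed with a few lines of code. The sketch assumes h(k) = k modulo M; the function name and the values k = 10, M = 7 are illustrative.

```python
def quadratic_probe_sequence(k, M, count):
    # Slots h(k), h(k)+1, h(k)+4, h(k)+9, ... taken modulo M.
    start = k % M
    return [(start + i * i) % M for i in range(count)]

# First four probes for k = 10 in a table of size 7.
print(quadratic_probe_sequence(10, 7, 4))

# Probing seven times does not reach all seven slots.
visited = set(quadratic_probe_sequence(10, 7, 7))
print(sorted(visited))
```

For this table the sequence only ever reaches 4 of the 7 slots, so a free slot outside that set would never be found.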
If the size of the hash table M is a prime number greater
than 3, then we can always insert a new element
if the table is at most half full.
We want to insert element k, with h(k) = j. Let n =
⌊M/2⌋.
If the locations j, j + 1, j + 4, ..., j + n^2 are
all distinct modulo M, then we can insert an
element in the hash table. Why?
Proof by contradiction. Suppose there are p, q with 0
≤ p < q ≤ n and j + p^2 ≡ j + q^2 (mod M).
p^2 ≡ q^2 (mod M)
(p - q)(p + q) ≡ 0 (mod M)
Then either p ≡ q (mod M) or p + q ≡ 0 (mod M). Is
that right?
Since p and q are distinct and less than M/2,
neither p ≡ q (mod M) nor p + q ≡ 0 (mod M) holds.
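The distinctness claim can be checked numerically for small table sizes. This is a sanity check under the lecture's assumptions, not a replacement for the proof; the function name is illustrative.

```python
def first_probes_distinct(M):
    # Offsets 0, 1, 4, ..., n^2 modulo M with n = floor(M/2).
    n = M // 2
    offsets = [(i * i) % M for i in range(n + 1)]
    return len(set(offsets)) == n + 1

# Prime table sizes: the first n + 1 probe locations never repeat.
print([first_probes_distinct(M) for M in (5, 7, 11, 13, 101)])

# A non-prime size can repeat early: 4^2 = 16 is 0 modulo 16.
print(first_probes_distinct(16))
```

The starting position j is left out because adding the same j to every offset does not change whether two of them coincide modulo M.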
Rehashing
If the hash table is close to full, then a hash
table of bigger size is used. The elements of the old
hash table are reinserted into the new one using the
new table size, and the old hash table is
subsequently deleted.
Should be done infrequently.
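A sketch of rehashing for a linear-probing table, assuming integer keys, h(k) = k modulo M, and None for empty slots. Growing to the next prime at or above twice the old size is an assumed policy; the slides only ask for a bigger table. The helper names are illustrative.

```python
def is_prime(x):
    if x < 2:
        return False
    i = 2
    while i * i <= x:
        if x % i == 0:
            return False
        i += 1
    return True

def next_prime(n):
    # Smallest prime greater than or equal to n.
    while not is_prime(n):
        n += 1
    return n

def rehash(old):
    # Every key is hashed again, because h depends on the table size.
    new_M = next_prime(2 * len(old))
    new = [None] * new_M
    for key in old:
        if key is not None:
            p = key % new_M
            while new[p] is not None:  # linear probing in the new table
                p = (p + 1) % new_M
            new[p] = key
    return new

old = [13, None, None, 10, 17, 24, 6]  # size 7, nearly full
new = rehash(old)
print(len(new))  # 17
print(new)
```

Each rehash touches every stored element, which is why it should happen rarely.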
Chapter 5 of Weiss