Title: Hash Tables
1Hash Tables
2Exercise 2
    /* Exercise 1 */
    void mystery(int n)
    {
        int i, j, k;
        for (i = 1; i <= n - 1; i++) {
            for (j = i + 1; j <= n; j++) {
                for (k = 1; k <= j; k++) {
                    /* Some statement taking O(1) time */
                }
            }
        }
    }
3Exercise 3
    /* Exercise 2 */
    void veryodd(int n)
    {
        int i, j, x, y;
        x = 0;
        y = 0;
        for (i = 1; i <= n; i++) {
            if (i % 2 == 1) {
                for (j = i; j <= n; j++)
                    x = x + 1;
                for (j = 1; j <= i; j++)
                    y = y + 1;
            }
        }
    }
4Consider www.google.com
- Efficient searches: look up "laptop" in all web pages
- How many web pages? How fast is the response?
5Consider www.google.com
- 4 billion pages
- Consider data structures: linked list, sorted linked list, array, sorted array, BST
6Unsorted Linked List of n elem
    /* List is assumed to be a pointer to a node with fields data and next */
    List searchList(List a, int key)
    {
        if (a == NULL)
            return NULL;    // not found
        if (a->data == key)
            return a;
        return searchList(a->next, key);
    }
- Best, Average, Worst T(n)?
7Sorted Linked List of n elem
    /* List is assumed to be a pointer to a node with fields data and next */
    List searchList(List a, int key)
    {
        if (a == NULL)
            return NULL;    // not found
        if (a->data == key)
            return a;
        return searchList(a->next, key);
    }
- Best, Average, Worst T(n)?
8Unsorted Array of n elem
    /* Returns the index of key in a[0..n-1], or n if key is not present */
    int seq_search(int n, int a[], int key)
    {
        int i = 0;
        while (i < n && a[i] != key) {
            i++;
        }
        return i;
    }
- Best, Average, Worst T(n)?
9Sorted Array of n elem
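    /* Returns the index of the last element <= key (-1 if every element
       is larger); the caller checks a[lo] == key to confirm a match */
    int binary_search(int n, int a[], int key)
    {
        int lo = -1;
        int hi = n;
        while (hi - lo != 1) {
            int mid = (hi + lo) / 2;
            if (a[mid] <= key)
                lo = mid;
            else
                hi = mid;
        }
        return lo;
    }
- Best, Average, Worst T(n)?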
10How about BST ?
- Best: O(1)
- Average: O(log n)
- Worst: O(n), very imbalanced (tree degenerates to a list)
11Answer Hash Tables
- Search complexity is O(1) with a good hash function
- Hash table: a generalization of an array that, under some assumptions, allows O(1) for Insert/Delete/Search
12Intuition
- How can you store all student numbers in an array?
- Use an array with range 0 - 999,999,999
- This will give you O(1) access time, but considering there are approx. 5000 students, you waste lots of array entries!
- Problem: the range of key values (0 - 999,999,999) is too large when compared to the number of keys (students)
13Formal Definition
- Hash tables solve this problem by using a smaller array and mapping keys with a hash function.
- Given a set of keys K and an array of size m, a hash function h is a function from K to {0, ..., m-1}, that is, $h : K \to \{0, \dots, m-1\}$
14Example Hash Function
[Figure: hash table with slots 0 to 7 and the keys k = 888999222 and k = 123456789]
15Example Hash Function
- For example, if we hash the student number keys into a hash table with 8 entries, we could use h(key) = key mod 8
[Figure: hash table with slots 0 to 7 and the keys k = 888999222 and k = 123456789]
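The mapping h(key) = key mod 8 can be checked with a few lines of C. This is only a sketch: the names `hash_mod` and `print_example_slots` are made up for illustration, and keys are held in `unsigned long` so that nine-digit student numbers fit.

```c
#include <stdio.h>

/* Hypothetical helper: maps a non-negative key to a slot in 0..m-1. */
unsigned long hash_mod(unsigned long key, unsigned long m)
{
    return key % m;
}

/* Prints the slot of each of the two student numbers from the example. */
void print_example_slots(void)
{
    unsigned long keys[] = { 888999222UL, 123456789UL };
    for (int i = 0; i < 2; i++)
        printf("h(%lu) = %lu\n", keys[i], hash_mod(keys[i], 8));
}
```

With 8 entries, 888999222 lands in slot 6 and 123456789 in slot 5.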
16Problem ?
- Collisions: two keys hash into the same array entry
- h(888888888) = h(000000000) = key mod 8 = 0
[Figure: hash table with slots 0 to 7 and the keys k = 888999222 and k = 123456789]
17Solution
- Hashing with Chaining (Open Hashing): every hash table entry contains a pointer to a linked list of keys that hash to the same entry
- Closed Hashing: every hash table entry contains only one key. If a new key hashes to a table entry that is filled, systematically examine other table entries until you find an empty entry to place the new key
18Hashing with Chaining (Open Hashing)
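- h(54) = 54 mod 5 = 4 = h(34): collision solved by CHAIN-ing
[Figure: table with slots 0 to 4, each entry a chain of (key, next) nodes: slot 1 holds 21, slot 2 holds 2, slot 4 holds the chain 54, 34]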
19Hashing with Chaining
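- Insert 101: where does it hash to?
[Figure: table with slots 0 to 4, each entry a chain of (key, next) nodes: slot 1 holds 21, slot 2 holds 2, slot 4 holds the chain 54, 34]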
20Hashing with Chaining
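- Insert 101: h(101) = 101 mod 5 = 1
[Figure: the table before and after the insert. Afterwards slot 1 holds the chain 21, 101; slot 2 (key 2) and slot 4 (chain 54, 34) are unchanged]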
21Complexity Analysis
- What is the running time to insert/search/delete?
- Insert: it takes O(1) time to compute the hash function and insert at the head of the linked list
- Search: it is proportional to the maximum linked list length
- Delete: same as search
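A minimal C sketch of hashing with chaining, matching the costs listed above: insert pushes at the head of a chain in O(1), and search walks one chain. The type and function names (`ChainTable`, `chain_insert`, `chain_search`) are assumptions for illustration, keys are assumed non-negative, and delete and cleanup are left out to keep it short.

```c
#include <stdlib.h>

#define M 5                      /* table size, as in the 5-slot example */

typedef struct Node {
    int key;
    struct Node *next;
} Node;

typedef struct {
    Node *slot[M];               /* each entry points to a chain (or NULL) */
} ChainTable;

static int h(int key) { return key % M; }   /* assumes key >= 0 */

void chain_init(ChainTable *t)
{
    for (int i = 0; i < M; i++)
        t->slot[i] = NULL;
}

/* O(1): compute the hash and push the new node at the head of the chain.
   Returns 1 on success, 0 if memory could not be allocated. */
int chain_insert(ChainTable *t, int key)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->key = key;
    n->next = t->slot[h(key)];
    t->slot[h(key)] = n;
    return 1;
}

/* Proportional to the chain length: walk the list in slot h(key). */
Node *chain_search(const ChainTable *t, int key)
{
    for (Node *p = t->slot[h(key)]; p != NULL; p = p->next)
        if (p->key == key)
            return p;
    return NULL;                 /* not found */
}
```

Because this sketch inserts at the head, inserting 21 and then 101 leaves 101 first in the chain of slot 1.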
22What is a good hash ?
- Uniform hashing: each key is equally likely to hash into any of the m slots
- Creating a good hash function is black magic!
- How about when keys are student names?
- Interpret characters as numbers
- (int)'a', (int)'b', (int)'c' are 97, 98, 99
- Ex.: hash for names
- Name "abc" hashes to ('a' + 'b' + 'c') mod m
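The name hash above can be sketched in C as follows. The function name `hash_name` is made up for illustration, and the character values in the comment assume ASCII.

```c
/* Hypothetical helper: adds the character codes of name and reduces
   the sum mod m (m must be greater than 0). */
unsigned hash_name(const char *name, unsigned m)
{
    unsigned sum = 0;
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; p++)
        sum += *p;               /* e.g. 'a' is 97 in ASCII */
    return sum % m;
}
```

For "abc" and m = 8 this gives (97 + 98 + 99) mod 8 = 294 mod 8 = 6. Any rearrangement of the same letters, such as "cba", produces the same sum, so a plain character sum is far from uniform hashing.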
25Closed Hashing
- The key is first mapped to a slot: index = h(k)
- If there is a collision, subsequent probes are performed
- Collision resolution is done as a linear search. This is known as linear probing: index = (index + 1) mod m
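The probing rule can be sketched in C as below. This is an illustration under stated assumptions: the table size of 11, the `EMPTY` marker, and the names `ProbeTable`, `probe_insert` and `probe_search` are not from the slides, and keys are assumed non-negative.

```c
#define M 11                     /* table size (assumed for the sketch) */
#define EMPTY (-1)               /* marks an unused slot; keys must be >= 0 */

typedef struct {
    int slot[M];
} ProbeTable;

void probe_init(ProbeTable *t)
{
    for (int i = 0; i < M; i++)
        t->slot[i] = EMPTY;
}

/* Returns the slot where key was stored, or -1 if the table is full
   or the key is already present. */
int probe_insert(ProbeTable *t, int key)
{
    int index = key % M;         /* home position: index = h(k) */
    for (int i = 0; i < M; i++) {
        if (t->slot[index] == EMPTY) {
            t->slot[index] = key;
            return index;
        }
        if (t->slot[index] == key)
            return -1;           /* duplicate */
        index = (index + 1) % M; /* linear probing: try the next slot */
    }
    return -1;                   /* all M slots are occupied */
}

/* Returns the slot holding key, or -1 if key is not in the table. */
int probe_search(const ProbeTable *t, int key)
{
    int index = key % M;
    for (int i = 0; i < M && t->slot[index] != EMPTY; i++) {
        if (t->slot[index] == key)
            return index;
        index = (index + 1) % M;
    }
    return -1;
}
```

Inserting 22 and then 33 (both with home position 0) places them in slots 0 and 1. Bounding both loops by M keeps them from running forever on a full table.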
26Closed Hashing with Linear Probing
H(k) = k mod 11
slot:  0     1     2     3   4   5   6   7     8     9     10
key:   1001  9537  3016  -   -   -   -   9874  2009  9875  -
Insert(1100) -> ?
29Closed Hashing with Linear Probing
H(k) = k mod 11
slot:  0     1     2     3   4   5   6   7     8     9     10
key:   1001  9537  3016  -   -   -   -   9874  2009  9875  -
Insert(1100) -> 3 (1100 mod 11 = 0, and slots 0 to 2 are occupied)
Same for keys that hash into 0 or 1
Prob(insert_into_3) = ?
31Closed Hashing with Linear Probing
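H(k) = k mod 11
slot:  0     1     2     3   4   5   6   7     8     9     10
key:   1001  9537  3016  -   -   -   -   9874  2009  9875  -
Insert(1100) -> 3 (1100 mod 11 = 0, and slots 0 to 2 are occupied)
Same for keys that hash into 0 or 1
Prob(insert_into_3) = 4/11 (home positions 0, 1, 2, 3)
Prob(insert_into_4) = 1/11 (home position 4 only)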
32Closed Hashing with Linear Probing
H(k) = k mod 11
slot:  0     1     2     3   4   5   6   7     8     9     10
key:   1001  9537  3016  -   -   -   -   9874  2009  9875  1052
Assume Insert(1052) -> 10 (1052 mod 11 = 7, and slots 7 to 9 are occupied)
Prob(insert_into_3) = ?
Prob(insert_into_4) = ?
33Closed Hashing with Linear Probing
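H(k) = k mod 11
slot:  0     1     2     3   4   5   6   7     8     9     10
key:   1001  9537  3016  -   -   -   -   9874  2009  9875  1052
Assume Insert(1052) -> 10 (1052 mod 11 = 7, and slots 7 to 9 are occupied)
Prob(insert_into_3) = 8/11 (home positions 7, 8, 9, 10, 0, 1, 2, 3)
Prob(insert_into_4) = 1/11 (home position 4 only)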
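The probabilities above can be reproduced by counting, for each empty slot, how many home positions end up there under linear probing. The following C sketch does that count. The function name `homes_leading_to` and the occupancy-array representation are assumptions for illustration, and every home position is taken to be equally likely (uniform hashing).

```c
#define M 11                     /* table size from the example */

/* occupied[i] is 1 if slot i holds a key, 0 otherwise. Returns how many
   of the M home positions lead, under linear probing, to the next key
   being stored in slot target. Dividing by M gives the probability. */
int homes_leading_to(const int occupied[M], int target)
{
    if (occupied[target])
        return 0;                /* a filled slot cannot receive a key */
    int count = 0;
    for (int home = 0; home < M; home++) {
        int index = home;
        int steps = 0;
        while (occupied[index] && steps < M) {
            index = (index + 1) % M;   /* probe the next slot */
            steps++;
        }
        if (steps < M && index == target)
            count++;
    }
    return count;
}
```

With slots 0, 1, 2, 7, 8, 9 occupied, the count is 4 for slot 3 and 1 for slot 4. Once slot 10 is also filled, the cluster wraps around the end of the table and the count for slot 3 rises to 8.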
34Problem Clustering
- Even with a good hash function, linear probing has its problems
- The position of the initial mapping (i = 0) of key k is called the home of k.
- When several insertions map to the same home position, they end up placed contiguously in the table. This collection of keys with the same home position is called a cluster.
- As clusters grow, the probability that a key will map to the middle of a cluster increases, increasing the rate of the cluster's growth. As these clusters grow, they merge with other clusters, forming even bigger clusters which grow even faster.
- This tendency of linear probing to place items together is known as primary clustering.
35Complexity Analysis Worst Case
- What is the running time to insert/search/delete?
- Insert: same as search
- Search: it is proportional to the maximum number of probes
- Delete: same as search
- Worst case: O(n)
36Complexity Analysis
- When the hash table is empty, insert takes 1 step (in the home position)
- As the table fills up, the probability that a record can be inserted in 1 step decreases
- More and more records are likely to be inserted far from their home position
37Complexity Analysis - Intuition
- The expected (avg.) cost of hash operations (insert/search/delete) is a function of how full the table is
38The Load Factor
$\alpha = \frac{n}{m}$
n is the number of entries in the hash table that are occupied; m is the size of the hash table.
$\alpha = 1$ means the table is full, and $\alpha = 0$ means the table is empty.
39Complexity Analysis - Average Case
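- The load factor is $\alpha = \frac{n}{m}$, where n = current number of records and m = table size
- On average, the probability of finding the home position occupied is $\frac{n}{m} = \alpha$
- The probability of finding both the position and the next position occupied is $\frac{n}{m} \cdot \frac{n-1}{m-1}$
- The probability of i collisions is $\frac{n}{m} \cdot \frac{n-1}{m-1} \cdots \frac{n-i+1}{m-i+1} \approx \left(\frac{n}{m}\right)^{i}$
- Number of probes $= 1 + \sum_{i=1}^{N} \left(\frac{n}{m}\right)^{i}$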
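The sum in the last bullet has a simple closed form. As a sketch, assuming $\alpha = n/m < 1$ and replacing the finite upper limit by infinity (which can only overestimate, since every term is positive), the geometric series gives:

```latex
\[
\text{probes} \;\le\; 1 + \sum_{i=1}^{\infty} \alpha^{i}
  \;=\; \sum_{i=0}^{\infty} \alpha^{i}
  \;=\; \frac{1}{1-\alpha},
  \qquad \alpha = \frac{n}{m} < 1 .
\]
```

For example, a half-full table ($\alpha = 0.5$) gives about 2 probes, while $\alpha = 0.9$ gives about 10.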
40Complexity Analysis Average Case
- It can be shown that the number of probes in a successful search, C, and the number of probes in an unsuccessful search, C', are given by:
[Formulas for separate chaining and linear probing: images not preserved in the text]
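The formulas from this slide did not survive as text. For linear probing, the approximations usually quoted in textbooks (due to Knuth) are $\frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$ probes for a successful search and $\frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)$ for an unsuccessful one. The C sketch below evaluates them, on the assumption that these are the intended results; the function names are made up for illustration.

```c
/* Textbook approximations for linear probing at load factor a,
   valid for 0 <= a < 1. Not taken from the slides themselves. */
double lp_successful(double a)
{
    return 0.5 * (1.0 + 1.0 / (1.0 - a));
}

double lp_unsuccessful(double a)
{
    return 0.5 * (1.0 + 1.0 / ((1.0 - a) * (1.0 - a)));
}
```

At a load factor of 0.5 these give 1.5 and 2.5 probes. Both grow without bound as the load factor approaches 1.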
41Successful search
[Plot: average # of probes vs. load factor (axis marks at 0.8 and 1), with curves for linear probing, double hashing, and separate chaining]
42Unsuccessful search
[Plot: average # of probes vs. load factor (axis marks at 0.8 and 1), with curves for linear probing, double hashing, and separate chaining]
43Insert Implementation
    bool HashTable::hashInsert(const Elem& e)
    {
        int home;
        int index = home = h(getkey(e));        // home position
        for (int i = 1; !is_empty(HT[index]); i++) {
            if (is_equal(e, HT[index]))
                return false;                   // duplicate
            if (i >= m)
                return false;                   // all m slots probed: table is full
            index = (home + i) % m;             // follow probes
        }
        HT[index] = e;
        return true;
    }
44Search Implementation
    bool HashTable::hashSearch(const Key& k, Elem& e)
    {
        int home;
        int index = home = h(k);                // home position
        for (int i = 1;
             i < m && !is_empty(HT[index]) && !is_equal(k, HT[index]); i++)
            index = (home + i) % m;             // follow probes
        if (!is_empty(HT[index]) && is_equal(k, HT[index])) {   // found it
            e = HT[index];
            return true;
        }
        else return false;                      // k is not in the table
    }