Hash Tables - PowerPoint PPT Presentation

Slides: 45

Provided by: valueds270

Transcript and Presenter's Notes

Title: Hash Tables

1
Hash Tables
2
Exercise 2

/* Exercise 1 */
void mystery(int n) {
    int i, j, k;
    for (i = 1; i <= n - 1; i++)
        for (j = i + 1; j <= n; j++)
            for (k = 1; k <= j; k++) {
                /* Some statement taking O(1) time */
            }
}
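
One way to check a guess for the running time of this exercise is to count how often the innermost statement runs. The sketch below assumes the loop bounds are inclusive (`<=`); `count_mystery` and `mystery_closed_form` are hypothetical helper names, not part of the exercise.

```c
/* Counts how many times the innermost O(1) statement of mystery(n) runs.
   Assumption: all three loop bounds are inclusive (<=). */
long count_mystery(int n) {
    long count = 0;
    for (int i = 1; i <= n - 1; i++)
        for (int j = i + 1; j <= n; j++)
            for (int k = 1; k <= j; k++)
                count++;            /* stands in for the O(1) statement */
    return count;
}

/* Closed form of the same triple sum: n(n+1)(n-1)/3.
   The product of three consecutive integers is always divisible by 3. */
long mystery_closed_form(int n) {
    long m = n;
    return m * (m + 1) * (m - 1) / 3;
}
```

Under that assumption the count for n = 10 is 330, and it grows like n^3 / 3, which suggests a cubic running time.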

3
Exercise 3

/* Exercise 2 */
void veryodd(int n) {
    int i, j, x, y;
    x = 0;
    y = 0;
    for (i = 1; i <= n; i++)
        if (i % 2 == 1) {
            for (j = i; j <= n; j++)
                x = x + 1;
            for (j = 1; j <= i; j++)
                y = y + 1;
        }
}

4
Consider www.google.com

Efficient searches: look up "laptop" in all web pages
How many web pages? How fast is the response?

5
Consider www.google.com

4 billion pages
Consider data structures: linked list, sorted linked list, array, sorted array, BST

6
Unsorted Linked List of n elem

List *searchList(List *a, int key) {
    if (a == NULL)
        return NULL;   // not found
    if (a->data == key)
        return a;
    return searchList(a->next, key);
}
Best, Average, Worst T(n)?

7
Sorted Linked List of n elem

List *searchList(List *a, int key) {
    if (a == NULL)
        return NULL;   // not found
    if (a->data == key)
        return a;
    return searchList(a->next, key);
}
Best, Average, Worst T(n)?

8
Unsorted Array of n elem

int seq_search(int n, int a[], int key) {
    int i = 0;
    while (i < n && a[i] != key)
        i++;
    return i;   // i == n means not found
}
Best, Average, Worst T(n)?

9
Sorted Array of n elem

int binary_search(int n, int a[], int key) {
    int lo = -1;
    int hi = n;
    while (hi - lo != 1) {
        int mid = (hi + lo) / 2;
        if (a[mid] <= key)
            lo = mid;
        else
            hi = mid;
    }
    return lo;   // index of the last element <= key, or -1
}
Best, Average, Worst T(n)?
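
The binary search above returns a position rather than a found/not-found answer, so a caller still has to compare the element at that position with the key. A minimal sketch of that step, assuming the comparison in the loop is `<=` (the transcript lost the operator); `find_sorted` is a hypothetical wrapper name.

```c
/* Returns the index of the last element <= key in sorted a[0..n-1],
   or -1 if every element is greater than key.
   Assumption: the loop comparison is <=. */
int binary_search(int n, const int a[], int key) {
    int lo = -1;                 /* a[lo] <= key (or lo is before the array) */
    int hi = n;                  /* a[hi] >  key (or hi is past the array)   */
    while (hi - lo != 1) {
        int mid = (hi + lo) / 2;
        if (a[mid] <= key)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

/* Hypothetical wrapper: index of key in a, or -1 if key is absent. */
int find_sorted(int n, const int a[], int key) {
    int lo = binary_search(n, a, key);
    return (lo >= 0 && a[lo] == key) ? lo : -1;
}
```

The loop halves the gap between `lo` and `hi` on every pass, so it runs about log2(n) times whether or not the key is present.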

10
How about BST ?

Best: O(1)
Average: O(log n)
Worst: O(n), very imbalanced (tree degenerates to a list)

11
Answer: Hash Tables

Search complexity is O(1) with a good hash function
Hash Table: a generalization of an array that, under some assumptions, allows O(1) for Insert/Delete/Search

12
Intuition

How can you store all Student Numbers in an array?
Use an array with range 0 - 999,999,999
This will give you O(1) access time, but considering there are approx. 5000 students, you waste lots of array entries!
Problem: the range of key values (0 - 999,999,999) is too large when compared to the number of keys (students)

13
Formal Definition

Hash Tables solve this problem by using a smaller array and mapping keys with a hash function.
Given a set of keys K and an array of size m, a hash function h is a function from K to {0, ..., m-1}, that is, $h: K \to \{0, \dots, m-1\}$

14
Example Hash Function

Slots: 0 1 2 3 4 5 6 7
Keys to place: k = 888999222, k = 123456789

15
Example Hash Function

For example, if we hash the student number keys into a hash table with 8 entries, we could use h(key) = key mod 8

Slots: 0 1 2 3 4 5 6 7
Keys to place: k = 888999222, k = 123456789
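
As a minimal sketch of the mod-based hash function described on this slide, the function below reduces a key to a slot index. The name `hash_mod` and the use of `unsigned long` for student numbers are assumptions made for illustration.

```c
/* h(key) = key mod m, where m is the number of table entries.
   Assumption: keys are non-negative and fit in an unsigned long. */
unsigned hash_mod(unsigned long key, unsigned m) {
    return (unsigned)(key % m);
}
```

With m = 8, the two student numbers shown on the slide land in different slots: 888999222 maps to slot 6 and 123456789 maps to slot 5.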

16
Problem ?

Collisions: two keys hash into the same array entry
h(888888888) = h(000000000) = key % 8 = 0

Slots: 0 1 2 3 4 5 6 7
Keys to place: k = 888999222, k = 123456789

17
Solution

Hashing with Chaining (Open Hashing): every hash table entry contains a pointer to a linked list of keys that hash to the same entry
Closed Hashing: every hash table entry contains only one key. If a new key hashes to a table entry which is filled, systematically examine other table entries until you find an empty entry to place the new key

18
Hashing with Chaining (Open Hashing)

h(54) = 54 % 5 = 4 = h(34): collision solved by CHAIN-ing

slot | chain (key, next)
0    | (empty)
1    | 21
2    | 2
3    | (empty)
4    | 54, 34   (CHAIN)
19
Hashing with Chaining

Insert 101: where does it hash to?

slot | chain (key, next)
0    | (empty)
1    | 21
2    | 2
3    | (empty)
4    | 54, 34   (CHAIN)
20
Hashing with Chaining

h(101) = 101 % 5 = 1

Insert 101

slot | before insert | after insert
0    | (empty)       | (empty)
1    | 21            | 21, 101
2    | 2             | 2
3    | (empty)       | (empty)
4    | 54, 34        | 54, 34   (CHAIN)
21
Complexity Analysis

What is the running time to insert/search/delete?
Insert: it takes O(1) time to compute the hash function and insert at the head of the linked list
Search: it is proportional to the max linked list length
Delete: same as search
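
The three operations analysed on this slide can be sketched as follows. This is an illustrative implementation under stated assumptions, not the deck's own code: the table has 5 slots with h(key) = key % 5 as in the chaining example, keys are non-negative integers, and every name (`ChainTable`, `chain_insert`, and so on) is hypothetical.

```c
#include <stdlib.h>

#define M 5                          /* number of slots, as in the example */

typedef struct Node {
    int key;
    struct Node *next;
} Node;

typedef struct {
    Node *slot[M];                   /* each slot heads one linked list (chain) */
} ChainTable;

static int h(int key) { return key % M; }   /* assumes key >= 0 */

void chain_init(ChainTable *t) {
    for (int i = 0; i < M; i++) t->slot[i] = NULL;
}

/* Insert at the head of the chain: O(1). Returns 0 if allocation fails. */
int chain_insert(ChainTable *t, int key) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) return 0;
    n->key = key;
    n->next = t->slot[h(key)];
    t->slot[h(key)] = n;
    return 1;
}

/* Search walks a single chain: time proportional to that chain's length. */
Node *chain_search(const ChainTable *t, int key) {
    for (Node *p = t->slot[h(key)]; p != NULL; p = p->next)
        if (p->key == key) return p;
    return NULL;
}

/* Delete locates the node like search, then unlinks it. Returns 1 if removed. */
int chain_delete(ChainTable *t, int key) {
    Node **pp = &t->slot[h(key)];
    while (*pp != NULL && (*pp)->key != key) pp = &(*pp)->next;
    if (*pp == NULL) return 0;
    Node *dead = *pp;
    *pp = dead->next;
    free(dead);
    return 1;
}
```

Because insertion is at the head, inserting 21 and then 101 leaves 101 first in slot 1. A search for either key visits at most the two nodes in that chain.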

22
What is a good hash ?

Uniform hashing: each key is equally likely to hash into any of the m slots
Creating a good hash function is black magic!
How about when keys are student names?
Interpret characters as numbers: (int)'a', (int)'b', (int)'c' means 97, 98, 99
Ex. hash for names: name "abc" hashes to ('a' + 'b' + 'c') % m
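
The name-hashing idea above (add up the character codes, then reduce modulo the table size) might look like the sketch below. The function name `hash_name` is hypothetical, and the character values 97, 98, 99 assume an ASCII-compatible encoding.

```c
/* Sums the character codes of name and reduces the sum mod m.
   Assumption: m > 0 and the string is NUL-terminated. */
unsigned hash_name(const char *name, unsigned m) {
    unsigned sum = 0;
    for (const char *p = name; *p != '\0'; p++)
        sum += (unsigned char)*p;    /* e.g. 'a' contributes 97 in ASCII */
    return sum % m;
}
```

With m = 8, "abc" hashes to (97 + 98 + 99) % 8 = 294 % 8 = 6. Any rearrangement of the same letters, such as "cba", gives the same sum and therefore the same slot, which is one reason a plain character sum is far from uniform hashing.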

25
Closed Hashing

The key is first mapped to a slot: index = h(k)
If there is a collision, subsequent probes are performed.
Collision resolution is done as a linear search. This is known as linear probing:
index = (index + 1) % m
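
The probing rule above can be sketched in C as follows. This is a minimal illustration under stated assumptions: the table has 11 slots with h(k) = k % 11, keys are non-negative so that -1 can mark an empty slot, and the names `lp_init`, `lp_insert` and `lp_search` are hypothetical.

```c
#define TABLE_SIZE 11
#define EMPTY (-1)                   /* sentinel; assumes keys are non-negative */

void lp_init(int table[TABLE_SIZE]) {
    for (int i = 0; i < TABLE_SIZE; i++) table[i] = EMPTY;
}

/* Returns the slot where key was stored, or -1 if the table is full. */
int lp_insert(int table[TABLE_SIZE], int key) {
    int index = key % TABLE_SIZE;                 /* home position h(k) */
    for (int probes = 0; probes < TABLE_SIZE; probes++) {
        if (table[index] == EMPTY) {
            table[index] = key;
            return index;
        }
        index = (index + 1) % TABLE_SIZE;         /* linear probing step */
    }
    return -1;
}

/* Returns the slot holding key, or -1 if key is absent. */
int lp_search(const int table[TABLE_SIZE], int key) {
    int index = key % TABLE_SIZE;
    for (int probes = 0; probes < TABLE_SIZE; probes++) {
        if (table[index] == EMPTY) return -1;     /* empty slot ends the probe sequence */
        if (table[index] == key) return index;
        index = (index + 1) % TABLE_SIZE;
    }
    return -1;
}
```

Inserting 1001, 9537, 3016, 9874, 2009 and 9875 in that order (the order is an assumption) fills slots 0, 1, 2, 7, 8 and 9. A following insert of 1100, whose home position is 0, then probes slots 0, 1 and 2 before settling in slot 3.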

26
Closed Hashing with Linear Probing
H(k) = k % 11

slot:  0     1     2     3     4     5     6     7     8     9     10
key:   1001  9537  3016  -     -     -     -     9874  2009  9875  -

Insert(1100) -> ?
29
Closed Hashing with Linear Probing
H(k) = k % 11

slot:  0     1     2     3     4     5     6     7     8     9     10
key:   1001  9537  3016  -     -     -     -     9874  2009  9875  -

Insert(1100) -> 3
Same for keys that hash into 0 or 1
Prob(insert_into_3) = ?
30
Closed Hashing with Linear Probing
H(k) = k % 11

slot:  0     1     2     3     4     5     6     7     8     9     10
key:   1001  9537  3016  -     -     -     -     9874  2009  9875  -

Insert(1100) -> 3
Same for keys that hash into 0 or 1
Prob(insert_into_3) = 4/11
31
Closed Hashing with Linear Probing
H(k) = k % 11

slot:  0     1     2     3     4     5     6     7     8     9     10
key:   1001  9537  3016  -     -     -     -     9874  2009  9875  -

Insert(1100) -> 3
Same for keys that hash into 0 or 1
Prob(insert_into_3) = 4/11
Prob(insert_into_4) = 1/11
32
Closed Hashing with Linear Probing
H(k) = k % 11

slot:  0     1     2     3     4     5     6     7     8     9     10
key:   1001  9537  3016  -     -     -     -     9874  2009  9875  1052

Assume Insert(1052) -> 10
Prob(insert_into_3) = ?
Prob(insert_into_4) = ?
33
Closed Hashing with Linear Probing
H(k) = k % 11

slot:  0     1     2     3     4     5     6     7     8     9     10
key:   1001  9537  3016  -     -     -     -     9874  2009  9875  1052

Assume Insert(1052) -> 10
Prob(insert_into_3) = 8/11
Prob(insert_into_4) = 1/11
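
The probabilities on these slides can be checked by brute force: try every home position and see where linear probing would place the next key. The sketch below assumes uniform hashing over 11 slots; `landing_counts` is a hypothetical helper, and dividing each count by 11 gives the probability.

```c
#define SLOTS 11

/* For each slot, counts how many home positions send the next insert there
   under linear probing. occupied[i] != 0 marks a filled slot. */
void landing_counts(const int occupied[SLOTS], int counts[SLOTS]) {
    for (int i = 0; i < SLOTS; i++) counts[i] = 0;
    for (int home = 0; home < SLOTS; home++) {
        int index = home;
        int probes = 0;
        while (occupied[index] && probes < SLOTS) {
            index = (index + 1) % SLOTS;          /* linear probing step */
            probes++;
        }
        if (probes < SLOTS) counts[index]++;      /* nothing counted if the table is full */
    }
}
```

With slots 0, 1, 2, 7, 8 and 9 filled, slot 3 collects 4 of the 11 home positions and slot 4 collects 1. Once slot 10 is also filled, slot 3 collects 8 of 11, matching the values on the slides and showing how a cluster attracts more insertions as it grows.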
34
Problem: Clustering

Even with a good hash function, linear probing has its problems.
The position of the initial mapping i_0 of key k is called the home of k.
When several insertions map to the same home position, they end up placed contiguously in the table. This collection of keys with the same home position is called a cluster.
As clusters grow, the probability that a key will map to the middle of a cluster increases, increasing the rate of the cluster's growth. As these clusters grow, they merge with other clusters, forming even bigger clusters which grow even faster.
This tendency of linear probing to place items together is known as primary clustering.

35
Complexity Analysis - Worst Case

What is the running time to insert/search/delete?
Insert: same as search
Search: it is proportional to the max number of probes
Delete: same as search
Worst: O(n)

36
Complexity Analysis

When the hash table is empty, insert takes 1 step (in the home position)
As the table fills up, the probability that a record can be inserted in 1 step decreases
More and more records are likely to be inserted far from their home position

37
Complexity Analysis - Intuition

The expected (avg.) cost of hash
(insert/search/delete)
is a function of how full the table is

38
The Load Factor

$\alpha = n/m$
n is the number of entries in the hash table that are occupied; m is the size of the hash table
$\alpha = 1$ means the table is full, and $\alpha = 0$ means the table is empty.
39
Complexity Analysis - Average Case
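
The load factor is $\alpha = n/m$, where n is the current number of records
On average, the probability of finding the position occupied is $\alpha = n/m$
The probability of finding both the position and the next position occupied is $\frac{n}{m} \cdot \frac{n-1}{m-1}$
The probability of i collisions is $\frac{n}{m} \cdot \frac{n-1}{m-1} \cdots \frac{n-i+1}{m-i+1} \approx \left(\frac{n}{m}\right)^{i}$
Number of probes $\approx 1 + \sum_{i=1}^{N} \left(\frac{n}{m}\right)^{i}$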
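
To get a feel for the approximation in this derivation, the sketch below evaluates both sides numerically: the sum built from the exact products n/m, (n-1)/(m-1), and so on, and the geometric sum in powers of n/m. Both function names are hypothetical, and the code assumes uniform hashing with 0 <= n < m.

```c
/* 1 + sum over i = 1..n of the exact product
   (n/m) * ((n-1)/(m-1)) * ... * ((n-i+1)/(m-i+1)).
   Assumption: 0 <= n < m. */
double expected_probes_exact(int n, int m) {
    double total = 1.0;              /* the first probe always happens */
    double p = 1.0;                  /* probability that the first i probes all collide */
    for (int i = 1; i <= n; i++) {
        p *= (double)(n - i + 1) / (double)(m - i + 1);
        total += p;
    }
    return total;
}

/* Geometric approximation: 1 + sum over i = 1..terms of (n/m)^i. */
double expected_probes_approx(int n, int m, int terms) {
    double alpha = (double)n / (double)m;
    double total = 1.0;
    double power = 1.0;
    for (int i = 1; i <= terms; i++) {
        power *= alpha;
        total += power;
    }
    return total;
}
```

For a half-full table (n = 50, m = 100) the exact sum is about 1.98, while the geometric sum approaches 2 as more terms are added, so the approximation slightly overestimates the expected number of probes.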
40
Complexity Analysis - Average Case

It can be shown that the number of probes in a successful search, C, and the number of probes in an unsuccessful search, C', are given by

[Formulas for separate chaining and for linear probing: not preserved in this transcript]
41
Successful search
[Plot: average number of probes (y-axis) versus load factor (x-axis, marked at 0.8 and 1), with curves for linear probing, double hashing, and separate chaining]
42
Unsuccessful search
[Plot: average number of probes (y-axis) versus load factor (x-axis, marked at 0.8 and 1), with curves for linear probing, double hashing, and separate chaining]
43
Insert Implementation