Hash Tables

Provided by: valueds270

1
Hash Tables
2
Exercise 2
  • /* Exercise 1 */
  • void mystery(int n) {
  •   int i, j, k;
  •   for (i = 1; i <= n - 1; i++)
  •     for (j = i + 1; j <= n; j++)
  •       for (k = 1; k <= j; k++)
  •         /* Some statement taking O(1) time */;
  • }
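
One way to check an answer to this exercise is to count how often the innermost statement runs. The sketch below is not part of the original slides: `count_mystery` is a hypothetical helper, and it assumes the loop bounds are inclusive (`<=`), which is one reading of the flattened slide text.

```c
#include <assert.h>

/* Counts how many times the innermost O(1) statement of mystery(n) runs.
   Assumption: all three loop bounds are inclusive (<=). */
long count_mystery(int n)
{
    assert(n >= 0);
    long count = 0;
    for (int i = 1; i <= n - 1; i++)
        for (int j = i + 1; j <= n; j++)
            for (int k = 1; k <= j; k++)
                count++;            /* stands in for the O(1) statement */
    return count;
}
```

Under that assumption, `count_mystery(4)` returns 20 and `count_mystery(10)` returns 330. Tabulating the count while doubling n shows how fast T(n) grows.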

3
Exercise 3
  • /* Exercise 2 */
  • void veryodd(int n) {
  •   int i, j, x, y;
  •   x = 0;
  •   y = 0;
  •   for (i = 1; i <= n; i++)
  •     if (i % 2 == 1) {
  •       for (j = i; j <= n; j++)
  •         x = x + 1;
  •       for (j = 1; j <= i; j++)
  •         y = y + 1;
  •     }
  • }

4
Consider www.google.com
  • Efficient searches: look up "laptop" in all web
    pages
  • How many web pages? How fast is the response?

5
Consider www.google.com
  • 4 billion pages
  • Consider data structures: linked list, sorted
    linked list, array, sorted array, BST

6
Unsorted Linked List of n elem
  • List *searchList(List *a, int key) {
  •   if (a == NULL)
  •     return NULL; // not found
  •   if (a->data == key)
  •     return a;
  •   return searchList(a->next, key);
  • }
  • Best, Average, Worst T(n)?

7
Sorted Linked List of n elem
  • List *searchList(List *a, int key) {
  •   if (a == NULL)
  •     return NULL; // not found
  •   if (a->data == key)
  •     return a;
  •   return searchList(a->next, key);
  • }
  • Best, Average, Worst T(n)?

8
Unsorted Array of n elem
  • int seq_search(int n, int a[], int key) {
  •   int i = 0;
  •   while (i < n && a[i] != key)
  •     i++;
  •   return i; // i == n means not found
  • }
  • Best, Average, Worst T(n)?

9
Sorted Array of n elem
  • int binary_search(int n, int a[], int key) {
  •   int lo = -1;
  •   int hi = n;
  •   while (hi - lo != 1) {
  •     int mid = (hi + lo) / 2;
  •     if (a[mid] <= key)
  •       lo = mid;
  •     else
  •       hi = mid;
  •   }
  •   return lo; // key found iff lo >= 0 && a[lo] == key
  • }
  • Best, Average, Worst T(n)?
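
The loop above keeps `lo` and `hi` on either side of the answer and stops when they are adjacent. The sketch below restates it in compilable form, assuming the flattened comparison reads `a[mid] <= key` (so that the returned `lo` is the last index whose element does not exceed `key`). The helper `contains` is hypothetical and not from the slides.

```c
#include <assert.h>

/* Invariant: a[lo] <= key < a[hi], treating a[-1] as -infinity
   and a[n] as +infinity. Assumes a[0..n-1] is sorted ascending. */
int binary_search(int n, const int a[], int key)
{
    assert(n >= 0);
    int lo = -1;
    int hi = n;
    while (hi - lo != 1) {
        int mid = lo + (hi - lo) / 2;   /* same midpoint, avoids hi + lo overflow */
        if (a[mid] <= key)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

/* Hypothetical helper: 1 if key occurs in sorted a[0..n-1], else 0. */
int contains(int n, const int a[], int key)
{
    int lo = binary_search(n, a, key);
    return lo >= 0 && a[lo] == key;
}
```

For the array {2, 3, 5, 7, 11}, searching for 7 returns index 3, searching for 6 returns index 2 (the position of 5), and searching for 1 returns -1. The interval halves on every iteration, so the loop runs about log2(n) times.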

10
How about BST ?
  • Best: O(1)
  • Average: O(log n)
  • Worst: O(n), very imbalanced (tree degenerates to
    a list)

11
Answer: Hash Tables
  • Search complexity is O(1) with a good hash
    function
  • Hash table: a generalization of an array that,
    under some assumptions, allows O(1) for
    Insert/Delete/Search

12
Intuition
  • How can you store all Student Numbers in an
    array?
  • Use an array with range 0 - 999,999,999
  • This will give you O(1) access time, but
    considering there are approx. 5000 students,
    you waste lots of array entries!
  • Problem: the range of key values
    (0 - 999,999,999) is too large when compared to
    the # of keys (students)

13
Formal Definition
  • Hash tables solve this problem by using a smaller
    array and mapping keys with a hash function.
  • Given a set of keys K and an array of size m, a
    hash function h is a function from K to
    {0, ..., m-1}, that is,
    $h: K \to \{0, \dots, m-1\}$

14
Example Hash Function
[Figure: hash table with slots 0 to 7; example keys k = 888999222 and k = 123456789]

15
Example Hash Function
  • For example, if we hash the student number keys
    into a hash table with 8 entries we could use
    h(key) = key mod 8

[Figure: hash table with slots 0 to 7; example keys k = 888999222 and k = 123456789]
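
The mapping h(key) = key mod 8 can be tried on the two example keys with a few lines of C. This is a minimal sketch, not code from the slides: `hash_mod` is a hypothetical name, and keys are assumed to be non-negative integers.

```c
#include <assert.h>

/* h(key) = key mod m for non-negative integer keys. */
unsigned hash_mod(unsigned long key, unsigned m)
{
    assert(m > 0);
    return (unsigned)(key % m);
}
```

With m = 8, `hash_mod(888999222UL, 8)` gives 6 and `hash_mod(123456789UL, 8)` gives 5, so the two example keys land in different slots.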

16
Problem?
  • Collisions: two keys hash into the same array
    entry
  • h(888888888) = h(000000000) = key mod 8 = 0

[Figure: hash table with slots 0 to 7; example keys k = 888999222 and k = 123456789]

17
Solution
  • Hashing with Chaining (Open Hashing): every hash
    table entry contains a pointer to a linked list
    of keys that hash to the same entry
  • Closed Hashing: every hash table entry contains
    only one key. If a new key hashes to a table
    entry which is filled, systematically examine
    other table entries until you find an empty
    entry to place the new key

18
Hashing with Chaining (Open Hashing)
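  • h(54) = 54 % 5 = 4 = h(34): collision solved by
    CHAIN-ing

[Figure: table with slots 0 to 4; each chain node holds a key and a next pointer]
0: (empty)
1: 21
2: 2
3: (empty)
4: 54 -> 34 (CHAIN)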
19
Hashing with Chaining
  • Insert 101: where does it hash to?

[Figure: table with slots 0 to 4; each chain node holds a key and a next pointer]
0: (empty)
1: 21
2: 2
3: (empty)
4: 54 -> 34 (CHAIN)
20
Hashing with Chaining
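  • h(101) = 101 % 5 = 1

[Figure: the table before and after Insert 101]
Before                       After
0: (empty)                   0: (empty)
1: 21                        1: 21, 101 (CHAIN)
2: 2                         2: 2
3: (empty)                   3: (empty)
4: 54 -> 34 (CHAIN)          4: 54 -> 34 (CHAIN)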
21
Complexity Analysis
  • What is the running time to insert/search/delete?
  • Insert: it takes O(1) time to compute the hash
    function and insert at the head of the linked list
  • Search: proportional to the maximum linked-list
    length
  • Delete: same as search
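
The costs listed above follow directly from how a chained table is laid out. The following is a minimal sketch under stated assumptions, not the slides' implementation: the names `Node`, `ChainTable`, `chain_insert` and `chain_search` are hypothetical, the table size is fixed at 5 as in the example, and keys are non-negative integers.

```c
#include <assert.h>
#include <stdlib.h>

#define M 5                      /* table size used in the slides' example */

typedef struct Node {
    int key;
    struct Node *next;
} Node;

typedef struct {
    Node *slot[M];               /* each entry is the head of a chain */
} ChainTable;

static int h(int key)
{
    assert(key >= 0);            /* assumption: non-negative keys */
    return key % M;
}

/* Search: walk the chain in slot h(key); cost grows with chain length. */
Node *chain_search(const ChainTable *t, int key)
{
    for (Node *p = t->slot[h(key)]; p != NULL; p = p->next)
        if (p->key == key)
            return p;
    return NULL;
}

/* Insert at the head of the chain: O(1). Returns 0 if allocation fails. */
int chain_insert(ChainTable *t, int key)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->key = key;
    n->next = t->slot[h(key)];
    t->slot[h(key)] = n;
    return 1;
}
```

Starting from a zero-initialised table (`ChainTable t = {{0}};`) and inserting 21, 2, 54, 34, 101 in that order leaves slots 0 and 3 empty, with 101 at the head of the chain in slot 1. Delete would first locate the node the same way search does, which is why its cost matches search.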

22
What is a good hash?
  • Uniform hashing: each key is equally likely to
    hash into any of the m slots
  • Creating a good hash function is black magic!
  • How about when keys are student names?
  • Interpret characters as numbers
  • (int)'a', (int)'b', (int)'c' are 97, 98, 99
  • Ex. hash for names
  • Name "abc" hashes to ('a' + 'b' + 'c') % m
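
The name hash described above can be sketched as follows. `hash_name` is a hypothetical function name, and the character values 97, 98, 99 assume an ASCII-compatible encoding.

```c
#include <assert.h>

/* Sum the character codes of name and reduce the sum modulo m. */
unsigned hash_name(const char *name, unsigned m)
{
    assert(name != 0 && m > 0);
    unsigned sum = 0;
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; p++)
        sum += *p;
    return sum % m;
}
```

For "abc" the sum is 97 + 98 + 99 = 294, so with m = 8 the name hashes to 294 % 8 = 6. Any rearrangement of the same letters, such as "cba", gives the same sum and therefore the same slot, which is one reason this simple scheme falls short of uniform hashing.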

25
Closed Hashing
  • The key is first mapped to a slot:
  • index = h(k)
  • If there is a collision, subsequent probes are
    performed
  • Collision resolution is done as a linear search.
    This is known as linear probing.
  • index = (index + 1) % m
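
The probing rule above can be sketched as an insert routine. This is an illustration under assumptions, not the slides' code: the names `lp_init` and `lp_insert`, the `EMPTY` sentinel, the fixed table size of 11 and the restriction to non-negative keys are all choices made here.

```c
#include <assert.h>

#define M 11                 /* table size, matching h(k) = k % 11 */
#define EMPTY (-1)           /* assumed sentinel: keys are non-negative */

static int h(int key) { return key % M; }

/* Mark every slot as empty. */
void lp_init(int table[M])
{
    for (int i = 0; i < M; i++)
        table[i] = EMPTY;
}

/* Insert key with linear probing. Returns the slot used,
   or -1 if the key is already present or the table is full. */
int lp_insert(int table[M], int key)
{
    assert(key >= 0);
    int index = h(key);
    for (int probes = 0; probes < M; probes++) {
        if (table[index] == EMPTY) {
            table[index] = key;
            return index;
        }
        if (table[index] == key)
            return -1;                  /* duplicate */
        index = (index + 1) % M;        /* next probe */
    }
    return -1;                          /* every slot was examined */
}
```

Inserting 1001, 9537, 3016, 9874, 2009, 9875 in that order fills slots 0, 1, 2, 7, 8, 9. A following insert of 1100 has home slot 0, probes past slots 1 and 2, and settles in slot 3.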

26
Closed Hashing with Linear Probing
H(k) = k % 11

Slot:  0     1     2     3   4   5   6   7     8     9     10
Key:   1001  9537  3016  -   -   -   -   9874  2009  9875  -

Insert(1100) → ?
29
Closed Hashing with Linear Probing
H(k) = k % 11

Slot:  0     1     2     3   4   5   6   7     8     9     10
Key:   1001  9537  3016  -   -   -   -   9874  2009  9875  -

Insert(1100) → 3
Same for keys that hash into 0, 1, or 2; keys that hash into 3 also land there
Prob(insert_into_3) = ?
30
Closed Hashing with Linear Probing
H(k) = k % 11

Slot:  0     1     2     3   4   5   6   7     8     9     10
Key:   1001  9537  3016  -   -   -   -   9874  2009  9875  -

Insert(1100) → 3
Same for keys that hash into 0, 1, or 2; keys that hash into 3 also land there
Prob(insert_into_3) = 4/11
31
Closed Hashing with Linear Probing
H(k) = k % 11

Slot:  0     1     2     3   4   5   6   7     8     9     10
Key:   1001  9537  3016  -   -   -   -   9874  2009  9875  -

Insert(1100) → 3
Same for keys that hash into 0, 1, or 2; keys that hash into 3 also land there
Prob(insert_into_3) = 4/11
Prob(insert_into_4) = 1/11
32
Closed Hashing with Linear Probing
H(k) = k % 11

Assume Insert(1052) → 10

Slot:  0     1     2     3   4   5   6   7     8     9     10
Key:   1001  9537  3016  -   -   -   -   9874  2009  9875  1052

Prob(insert_into_3) = ?
Prob(insert_into_4) = ?
33
Closed Hashing with Linear Probing
H(k) = k % 11

Assume Insert(1052) → 10

Slot:  0     1     2     3   4   5   6   7     8     9     10
Key:   1001  9537  3016  -   -   -   -   9874  2009  9875  1052

Prob(insert_into_3) = 8/11
Prob(insert_into_4) = 1/11
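
The probabilities on these slides can be reproduced by counting, for each possible home position, where a linear probe would end. The sketch below is an illustration, not code from the slides: `homes_landing_in` is a hypothetical name, and it assumes every home position is equally likely (uniform hashing).

```c
#include <assert.h>

#define M 11

/* Number of home positions whose linear-probe sequence ends in `target`,
   given which slots are occupied (nonzero = occupied).
   Dividing the result by M gives Prob(insert_into_target). */
int homes_landing_in(const int occupied[M], int target)
{
    assert(target >= 0 && target < M);
    int count = 0;
    for (int home = 0; home < M; home++) {
        int index = home;
        int probes = 0;
        while (occupied[index] && probes < M) {
            index = (index + 1) % M;    /* linear probe */
            probes++;
        }
        if (probes < M && index == target)
            count++;                    /* this home ends up in target */
    }
    return count;
}
```

With slots 0, 1, 2, 7, 8, 9 occupied, four home positions (0 to 3) end in slot 3 and one ends in slot 4, giving 4/11 and 1/11. Once slot 10 is also occupied, probes from 7, 8, 9 and 10 wrap around to slot 3 as well, which raises its count to eight.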
34
Problem: Clustering
  • Even with a good hash function, linear probing
    has its problems
  • The position of the initial mapping (probe i = 0)
    of key k is called the home position of k.
  • When several insertions map to the same home
    position, they end up placed contiguously in the
    table. This collection of keys with the same
    home position is called a cluster.
  • As clusters grow, the probability that a key will
    map to the middle of a cluster increases,
    increasing the rate of the cluster's growth. As
    these clusters grow, they merge with other
    clusters, forming even bigger clusters which grow
    even faster.
  • This tendency of linear probing to place items
    together is known as primary clustering.

35
Complexity Analysis - Worst Case
  • What is the running time to insert/search/delete?
  • Insert: same as search
  • Search: proportional to the maximum number of probes
  • Delete: same as search
  • Worst: O(n)

36
Complexity Analysis
  • When the hash table is empty, insert takes 1 step
    (in the home position)
  • As the table fills up, the probability that a
    record can be inserted in 1 step decreases
  • More and more records are likely to be inserted
    far from their home position

37
Complexity Analysis - Intuition
  • The expected (avg.) cost of a hash operation
    (insert/search/delete) is a function of how full
    the table is

38
The Load Factor
$\alpha = \frac{n}{m}$
n is the number of entries in the hash table that
are occupied; m is the size of the hash table.
$\alpha = 1$ means the table is full, and $\alpha = 0$ means the
table is empty.
39
Complexity Analysis - Average Case
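  • The load factor is $\alpha = \frac{n}{m}$, where n is the
    current number of records
  • On average, the probability of finding the home
    position occupied is $\alpha = \frac{n}{m}$
  • The probability of finding both the home position
    and the next position occupied is
    $\frac{n}{m} \cdot \frac{n-1}{m-1}$
  • The probability of i collisions is
    $\frac{n}{m} \cdot \frac{n-1}{m-1} \cdots \frac{n-i+1}{m-i+1} \approx \left(\frac{n}{m}\right)^{i}$
  • # probes $= 1 + \sum_{i=1}^{N} \left(\frac{n}{m}\right)^{i}$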
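
The sum in the last bullet is a partial geometric series in $\alpha = n/m$. As a worked step that is not on the slide, letting the upper limit run to infinity (an assumption made here to obtain a closed form, valid only for $\alpha < 1$) gives an upper bound on the expected number of probes:

```latex
\[
1 + \sum_{i=1}^{\infty} \alpha^{i}
  \;=\; \sum_{i=0}^{\infty} \alpha^{i}
  \;=\; \frac{1}{1-\alpha},
\qquad \alpha = \frac{n}{m} < 1 .
\]
```

For example, a half-full table ($\alpha = 0.5$) gives about 2 probes, $\alpha = 0.8$ gives about 5, and $\alpha = 0.9$ gives about 10, so the cost rises sharply as the table approaches full.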
40
Complexity Analysis - Average Case
  • It can be shown that the number of probes in a
    successful search, C, and the number of probes in
    an unsuccessful search, C', are given by

Separate chaining
Linear probing
41
Successful search
[Plot: average # of probes versus load factor (axis marked at 0.8 and 1) for linear probing, double hashing, and separate chaining]
42
Unsuccessful search
[Plot: average # of probes versus load factor (axis marked at 0.8 and 1) for linear probing, double hashing, and separate chaining]
43
Insert Implementation
  • bool HashTable::hashInsert(const Elem& e) {
  •   int home;
  •   int index = home = h(getkey(e));
  •   for (int i = 1; !is_empty(HT[index]); i++) {
  •     // check before advancing so a duplicate in the home slot is caught
  •     if (is_equal(e, HT[index])) return false; // duplicate
  •     index = (home + i) % m; // follow probes
  •   }
  •   HT[index] = e;
  •   return true;
  • }

44
Search Implementation
  • bool HashTable::hashSearch(const Key& k, Elem& e) {
  •   int home;
  •   int index = home = h(k);
  •   for (int i = 1;
  •        !is_empty(HT[index]) && !is_equal(k, HT[index]);
  •        i++)
  •     index = (home + i) % m; // follow probes
  •   if (is_equal(k, HT[index])) { // found it
  •     e = HT[index];
  •     return true;
  •   }
  •   else return false; // k is not in the table
  • }
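
The search logic can also be sketched in plain C on the example table used earlier (h(k) = k % 11). This is an illustration under assumptions, not the slides' class: `lp_search` and the `EMPTY` sentinel are hypothetical, keys are assumed non-negative, and the probe count is capped at the table size so that a completely full table cannot cause an endless loop.

```c
#include <assert.h>

#define M 11
#define EMPTY (-1)           /* assumed sentinel: keys are non-negative */

static int h(int key) { return key % M; }

/* Returns the slot holding key, or -1 if key is not in the table. */
int lp_search(const int table[M], int key)
{
    assert(key >= 0);
    int home = h(key);
    for (int i = 0; i < M; i++) {
        int index = (home + i) % M;     /* follow probes */
        if (table[index] == EMPTY)
            return -1;                  /* empty slot reached: key is absent */
        if (table[index] == key)
            return index;               /* found it */
    }
    return -1;                          /* probed every slot */
}
```

On the table holding 1001, 9537, 3016 in slots 0 to 2 and 9874, 2009, 9875 in slots 7 to 9, searching for 9537 starts at home slot 0 and finds the key in slot 1, while searching for 1100 probes slots 0, 1, 2 and stops at the empty slot 3.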