Title: CS2420: Lecture 33
1CS2420 Lecture 33
- Vladimir Kulyukin
- Computer Science Department
- Utah State University
2Outline
3Motivation
- Recall Big Question 4
- How can I retrieve/search data efficiently?
- After investigating the balanced binary search trees (AVL, Red-Black), we can ask:
- Is it possible to break the log(n) barrier for insertion and deletion?
4Hash Tables
- A hash table is a data structure that was invented as an attempt to break the log(n) insertion and deletion barrier of balanced binary search trees.
- Conceptually, a hash table is an array of items plus a hash function that maps arbitrary objects to indices of the array.
- A hash function first extracts a key from a given object and then maps the key to a legal array index.
- For example, if an object is an employee record, the key could be the employee's SSN or the employee's first and last names.
- Typical keys are numbers and strings.
5Example: A Hash Table
[Figure: a hash table with ten cells, indexed 0 through 9, holding the names Mark, Rachel, John, David, and Deborah; the remaining cells are empty.]
6Hash Functions
[Figure: key extraction turns an object into a key; hashing then maps the key to a legal index of the array (cells 0 through 6 are shown).]
7Hash Functions
- It is impossible to find a hash function that computes distinct indices (two different array cells) for any two distinct keys. Why? Because there are infinitely many keys, but only finitely many slots in the table.
- Question: What are we to do?
- Answer: Look for hash functions that distribute keys evenly among the cells.
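The counting argument above can be sketched in C++. This is only an illustration under stated assumptions: Key mod TableSize stands in for an arbitrary hash function, and firstCollision is a hypothetical helper name. Feeding TableSize + 1 distinct keys into TableSize cells must place two keys in the same cell.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper: tries the keys 0, 1, ..., tableSize in order and
// returns the first pair of distinct keys that map to the same cell.
// Assumes tableSize > 0.
std::pair<int, int> firstCollision(int tableSize) {
    std::vector<int> owner(tableSize, -1);  // owner[i] = key stored in cell i, or -1
    for (int key = 0; key <= tableSize; ++key) {
        int index = key % tableSize;        // stand-in hash function
        if (owner[index] != -1)
            return std::make_pair(owner[index], key);
        owner[index] = key;
    }
    return std::make_pair(-1, -1);          // not reached when tableSize > 0
}
```

For a table of 10 cells, the keys 0 through 9 fill every cell, and key 10 then collides with key 0.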
8Three Hashing Problems
- Choose a hash function that is simple and fast and distributes keys evenly.
- Choose a table size.
- Choose a collision resolution strategy (what to do when several keys are mapped to the same index).
9Choosing a Hash Function
- If keys are integers, Key mod TableSize is a sensible strategy.
- Caveat: Keys should be random and should not have undesirable properties.
- For example, if TableSize = 10 and all keys end in 0, Key mod TableSize is not a sensible strategy: every key hashes to cell 0.
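To make the caveat concrete, here is a minimal C++ sketch. The names modHash and cellCounts are hypothetical helpers introduced for illustration, and keys are assumed to be non-negative.

```cpp
#include <vector>

// Hypothetical helper: Key mod TableSize for a non-negative integer key.
int modHash(int key, int tableSize) {
    return key % tableSize;
}

// Counts how many of the given keys land in each cell of the table.
std::vector<int> cellCounts(const std::vector<int>& keys, int tableSize) {
    std::vector<int> counts(tableSize, 0);
    for (int key : keys)
        ++counts[modHash(key, tableSize)];
    return counts;
}
```

With the keys 10, 20, 30, 40, 50 and a table size of 10, all five keys are counted in cell 0 and the other nine cells stay empty.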
10Choosing a Table Size
- To avoid uneven key distributions, TableSize is typically a prime number.
- When keys are random integers, Key mod TableSize works fairly well.
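The effect of a prime table size can be checked with a small C++ sketch. The function name maxCellLoad is an assumption made for this example, and keys are assumed to be non-negative.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: returns the largest number of keys that share a
// single cell when each key is placed at key % tableSize.
// Assumes tableSize > 0 and non-negative keys.
int maxCellLoad(const std::vector<int>& keys, int tableSize) {
    std::vector<int> counts(tableSize, 0);
    for (int key : keys)
        ++counts[key % tableSize];
    return *std::max_element(counts.begin(), counts.end());
}
```

For the keys 10, 20, 30, 40, 50, a table of size 10 puts all five keys in one cell, while the prime size 11 sends them to cells 10, 9, 8, 7, and 6, one key per cell.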
12A Hash Function Example 1
int hash(const string &key, int tableSize)
{
    int hashVal = 0;
    for (int i = 0; i < key.length(); i++)
        hashVal += key[i];          // add up the character codes
    return hashVal % tableSize;
}
13Comments on hash1
- Easy to compute and fast.
- If TableSize is large, the function may not distribute keys well.
- Why?
- Suppose TableSize = 10,007 (a prime) and all keys are ASCII strings of length 8 or smaller.
- An ASCII character has a value of at most 127, so the sum is at most 127 × 8 = 1016, and hash1's range is [0, 1016].
- This is NOT an acceptable distribution: only 1,017 of the 10,007 cells can ever be used.
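The bound can be verified with a short C++ sketch. It restates the additive hash under the name hash1; the cast to unsigned char is an assumption added here so that the result does not depend on whether char is signed.

```cpp
#include <string>

// Additive hash: sums the character codes, then reduces mod tableSize.
// Assumes tableSize > 0.
int hash1(const std::string& key, int tableSize) {
    int hashVal = 0;
    for (char ch : key)
        hashVal += static_cast<unsigned char>(ch);
    return hashVal % tableSize;
}
```

An 8-character key made entirely of the largest ASCII code (127) hashes to 1016 in a table of size 10,007, so no shorter or smaller ASCII key can reach a higher cell. Because addition ignores order, keys such as "ab" and "ba" also always collide.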
15Hash Function Example 2
int hash2(const string &key, int tableSize)
{
    int hashVal = 0;
    for (int j = 0; j < key.length(); j++)
        hashVal = 37 * hashVal + key[j];
    hashVal %= tableSize;
    if (hashVal < 0)                // overflow may have made hashVal negative
        hashVal += tableSize;
    return hashVal;
}
16Comments On Hash2
- Easy to compute.
- Fast on relatively short keys.
- Distributes keys fairly well.
- Potential problems with very long keys, because there will be lots of integer overflows and collisions.
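One way to handle the overflow is sketched below in C++. This is a variant and not the slide's code: hash2Unsigned is a hypothetical name, and it accumulates in unsigned int, where overflow wraps around by definition, so no negative fix-up is needed. Signed int overflow, by contrast, is undefined behavior in C++.

```cpp
#include <string>

// Variant of hash2 that accumulates in unsigned arithmetic.
// Unsigned overflow wraps modulo 2^N, so hashVal is never negative.
// Assumes tableSize > 0.
unsigned int hash2Unsigned(const std::string& key, unsigned int tableSize) {
    unsigned int hashVal = 0;
    for (char ch : key)
        hashVal = 37 * hashVal + static_cast<unsigned char>(ch);
    return hashVal % tableSize;
}
```

Unlike the additive hash, this function is sensitive to character order: with a table size of 10,007, "ab" maps to 3687 and "ba" maps to 3723.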
17Collision Resolution
- A collision occurs when an element is inserted under a key that hashes to a cell that is already occupied by a different element.
18Collision Resolution Strategies
- Separate chaining
- Open addressing
19Separate Chaining
- Separate chaining keeps a list of all elements whose keys hash to the same index.
- What does it mean?
- Under separate chaining, a hash table is an array of lists.
- The term "lists" is used rather loosely in the previous statement. The table can be an array of AVL search trees or an array of hash tables, but the linked list remains the most common choice.
20Hash Table Implementation
template <class T>
class CHashTable
{
    // ...
private:
    vector<list<T> > m_Lists;   // the table: one list per cell
    int m_Size;
    // ...
};

int hash(const string &key);
21Hash Table Implementation
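class CEmployee
{
    // ...
private:
    string m_Name;
    double m_Salary;
};

// The key of an employee record is the employee's name.
// GetName() is assumed to be a public accessor for m_Name (not shown on the slide).
int hash(const CEmployee &x)
{
    return hash(x.GetName());
}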
22Hash Table Implementation
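template <class T>
int CHashTable<T>::hashIndex(const T &x) const
{
    // Convert the size to int first; mixing int with the unsigned size()
    // would silently turn a negative hash value into a large positive one.
    int tableSize = static_cast<int>(m_Lists.size());
    int index = hash(x) % tableSize;
    if (index < 0)
        index += tableSize;
    return index;
}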
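The pieces above can be put together in a small, self-contained C++ sketch of separate chaining. It is a simplified illustration and not the CHashTable class from the slides: ChainedStringSet and its member functions are hypothetical names, keys are restricted to strings, and the table size is assumed to be greater than zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical, simplified separate-chaining table for std::string keys.
class ChainedStringSet {
public:
    // tableSize must be greater than zero; 101 is an arbitrary prime default.
    explicit ChainedStringSet(std::size_t tableSize = 101) : m_Lists(tableSize) {}

    bool contains(const std::string& key) const {
        const std::list<std::string>& chain = m_Lists[hashIndex(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Appends the key to its chain; returns false if it was already present.
    bool insert(const std::string& key) {
        if (contains(key))
            return false;
        m_Lists[hashIndex(key)].push_back(key);
        return true;
    }

    // Removes the key from its chain; returns false if it was not present.
    bool remove(const std::string& key) {
        std::list<std::string>& chain = m_Lists[hashIndex(key)];
        std::list<std::string>::iterator it =
            std::find(chain.begin(), chain.end(), key);
        if (it == chain.end())
            return false;
        chain.erase(it);
        return true;
    }

private:
    std::vector<std::list<std::string> > m_Lists;  // one chain per cell

    // Polynomial string hash (multiplier 37) reduced to a legal index.
    std::size_t hashIndex(const std::string& key) const {
        unsigned int hashVal = 0;
        for (char ch : key)
            hashVal = 37 * hashVal + static_cast<unsigned char>(ch);
        return hashVal % m_Lists.size();
    }
};
```

Constructing the set with a table size of 1 forces every key into the same chain, which is a convenient way to exercise collision handling: "Mark" and "Rachel" can both be inserted, found, and removed independently even though they share a cell.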