CS2420: Lecture 33 - PowerPoint PPT Presentation

Slides: 23
Provided by: Vladimir120

1
CS2420 Lecture 33
  • Vladimir Kulyukin
  • Computer Science Department
  • Utah State University

2
Outline
  • Hash Tables (Chapter 5)

3
Motivation
  • Recall Big Question 4:
  • How can I retrieve/search data efficiently?
  • After studying balanced binary search trees
    (AVL, Red-Black), we can ask:
  • Is it possible to break the log(N) barrier for
    insertion and deletion?

4
Hash Tables
  • A hash table is a data structure that was
    invented as an attempt to break the log(N)
    insertion and deletion barrier of the balanced
    binary search trees.
  • Conceptually, a hash table is an array of items
    plus a hash function that maps arbitrary objects
    to indices of the array.
  • A hash function first extracts a key from a given
    object and then maps the key into a legal array
    index.
  • For example, if an object is an employee record,
    the key could be the employee's SSN or the
    employee's first and last names.
  • Typical keys are numbers and strings.

5
Example: A Hash Table
(Figure: a table with cells 0 through 9 holding the names Mark, Rachel, John, David, and Deborah.)
6
Hash Functions
(Figure: Object → Key Extraction → Key → Hashing → legal index into the table.)
7
Hash Functions
  • It is impossible to find a hash function that
    maps any two distinct keys to two different
    array cells. Why? Because there are infinitely
    many keys, but only finitely many slots in the
    table, so some cell must receive more than one key.
  • Question: What are we to do?
  • Answer: Look for hash functions that distribute
    keys evenly among the cells.

8
Three Hashing Problems
  • Choose a hash function that is
      • simple and fast, and
      • distributes keys evenly.
  • Choose a table size.
  • Choose a collision resolution strategy (what to
    do when several keys are mapped to the same
    index).

9
Choosing a Hash Function
  • If keys are integers, Key Mod TableSize is a
    sensible strategy.
  • Caveat: keys should be random and should not have
    undesirable properties.
  • For example, if TableSize = 10 and all keys end
    in 0, Key Mod TableSize is not a sensible
    strategy: every key lands in cell 0.

10
Choosing a Table Size
  • To avoid such uneven key distributions,
    TableSize is typically a prime number.
  • When keys are random integers, Key Mod TableSize
    works fairly well.

11
A Hash Function Example 1
12
A Hash Function Example 1
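#include <string>
// hash1: add up the character codes of the key, then reduce mod tableSize.
int hash1(const std::string& key, int tableSize) {
    int hashVal = 0;
    for (std::size_t i = 0; i < key.length(); i++)
        hashVal += static_cast<unsigned char>(key[i]);
    return hashVal % tableSize;
}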
13
Comments on hash1
  • Easy to compute and fast.
  • If TableSize is large, the function may not
    distribute keys well.
  • Why?
  • Suppose TableSize = 10,007 (a prime) and all keys
    are ASCII strings of at most 8 characters.
  • Each ASCII code is at most 127, so hash1's range
    is [0, 127 × 8] = [0, 1016].
  • Only 1,017 of the 10,007 cells can ever be used:
    this is NOT an acceptable distribution.
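The bound can be written as one line of arithmetic, assuming 7-bit ASCII (character codes 0 to 127) and keys of at most 8 characters:

```latex
\[
0 \le \texttt{hash1}(key) \le 8 \times 127 = 1016 < 10007
\quad\Longrightarrow\quad
\text{at most } \frac{1017}{10007} \approx 10.2\% \text{ of the cells are reachable.}
\]
```

Because every sum is already below TableSize, the final `% tableSize` never changes the value, so the upper 90% of the table stays empty.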

14
Hash Function Example 2
15
Hash Function Example 2
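#include <string>
// hash2: Horner's rule with multiplier 37. Unsigned arithmetic wraps
// around on overflow (well defined in C++, unlike signed int overflow),
// so hashVal is never negative and no "if (hashVal < 0)" fix is needed.
int hash2(const std::string& key, int tableSize) {
    unsigned int hashVal = 0;
    for (std::size_t j = 0; j < key.length(); j++)
        hashVal = 37 * hashVal + static_cast<unsigned char>(key[j]);
    return static_cast<int>(hashVal % static_cast<unsigned int>(tableSize));
}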
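The loop in hash2 evaluates a polynomial in 37 by Horner's rule. Writing the key as characters $k_0 k_1 \cdots k_{L-1}$ with $L$ = `key.length()`, and ignoring overflow (the code actually reduces the running value modulo the word size), the result is:

```latex
\[
\texttt{hash2}(k_0 k_1 \cdots k_{L-1})
  = \left( \sum_{j=0}^{L-1} k_j \cdot 37^{\,L-1-j} \right) \bmod \mathit{TableSize}
\]
\[
\text{e.g. } L = 3:\quad
\bigl((k_0 \cdot 37 + k_1)\cdot 37 + k_2\bigr) = 37^2 k_0 + 37 k_1 + k_2
\]
```

Unlike hash1, character order matters: each position is weighted by a different power of 37.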
16
Comments on hash2
  • Easy to compute.
  • Fast on relatively short keys.
  • Distributes keys fairly well.
  • Potential problems with very long keys, because
    hashVal will overflow many times (integer
    overflow, not buffer overflow) and collisions
    may increase.

17
Collision Resolution
  • A collision occurs when an element is inserted
    under a key that hashes to a cell that is
    already occupied by a different element.

18
Collision Resolution Strategies
  • Separate chaining
  • Open addressing

19
Separate Chaining
  • Separate chaining keeps a list of all elements
    whose keys hash to the same index.
  • What does it mean?
  • Under separate chaining, a hash table is an array
    of lists.
  • The term "lists" is used rather loosely in the
    previous statement. It can be an array of AVL
    search trees or an array of hash tables. But the
    linked list remains the most common choice.

20
Hash Table Implementation
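#include <list>
#include <string>
#include <vector>

template <class T>
class CHashTable
{
private:
    std::vector<std::list<T> > m_Lists;  // one chain per table cell
    int m_Size;
    int hashIndex(const T& x) const;     // maps x to a legal index
};

int hash(const std::string& key);        // string hash, outside the class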

21
Hash Table Implementation
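class CEmployee
{
public:
    const std::string& GetName() const { return m_Name; }
private:
    std::string m_Name;
    double m_Salary;
};

// The key extracted from an employee is the name.
int hash(const CEmployee& x) { return hash(x.GetName()); }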

22
Hash Table Implementation
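template <class T>
int CHashTable<T>::hashIndex(const T& x) const
{
    // Use an int table size so the % and the sign test stay in signed arithmetic.
    int tableSize = static_cast<int>(m_Lists.size());
    int index = hash(x) % tableSize;  // negative if hash(x) < 0
    if (index < 0)
        index += tableSize;           // shift into 0..tableSize-1
    return index;
}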