Title: CS 225 Data Structures and Software Principles
2. Discussion Topics
- Hashing
- Open Hashing
- Closed Hashing
- Probing Methods
- Rehashing
- Sample Code
3. Hashing
- A process that places an item into a structure based on a key-to-address transformation
- Goal: optimize Find, Insert, and Remove to O(1)!
- Some terminology
  - Hash table
  - Hash function
  - Bucket
  - Collisions
4. Hashing
- Hash function: given a key from some key space K, output a legal index into a hash table with m entries, T[0..m-1]
- When two distinct keys hash to the same index we have a collision, requiring a method of collision resolution
- An ideal hash function should
  - be easy to compute
  - distribute keys uniformly across the various cells of the table
  - avoid systematic collisions when there is a systematic nonrandom pattern to key selection
5. Hashing
- Good heuristic: set the size of the table to a prime number. Then we can use the hash function
  - h(x) = x mod tablesize
- Two general ways to resolve collisions
  - Open Hashing
  - Closed Hashing

x         0  4  8  12  16  20
x mod 8   0  4  0   4   0   4   (not uniform!)
x mod 7   0  4  1   5   2   6   (much better)
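The comparison in the table above can be reproduced with a few lines of C++. This is only a sketch: bucketCounts is a hypothetical helper name, and it assumes non-negative integer keys.

```cpp
#include <cassert>
#include <vector>

// Count how many keys land in each cell under h(x) = x mod tablesize.
// bucketCounts is a hypothetical helper, not part of the course library.
std::vector<int> bucketCounts(const std::vector<int>& keys, int tablesize) {
    assert(tablesize > 0);
    std::vector<int> counts(tablesize, 0);
    for (int x : keys) {
        assert(x >= 0);            // the sketch only handles non-negative keys
        ++counts[x % tablesize];
    }
    return counts;
}
```

Calling bucketCounts on the keys 0, 4, 8, 12, 16, 20 with tablesize 8 piles every key into cells 0 and 4, while tablesize 7 gives each key its own cell.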
6. Open Hashing
- Collision resolution: build a linked list (bucket) off the table cell to hold multiple elements
- a.k.a. Separate Chaining: each bucket has a separate chain of elements

h(x) = x mod 7
cell 0:
cell 1: 22 -> 29 -> 8
cell 2:
cell 3: 17
cell 4: 11 -> 4
cell 5:
cell 6: 13
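A minimal sketch of separate chaining, assuming non-negative int keys and h(x) = x mod tablesize. The class name ChainedTable and its member names are made up for illustration, and std::list stands in for a hand-written linked list.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Separate chaining: each table cell owns a linked list (bucket) of keys.
// ChainedTable and its members are hypothetical names for this sketch.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t tablesize) : cells_(tablesize) {
        assert(tablesize > 0);
    }

    void insert(int x) {
        if (!find(x)) bucket(x).push_back(x);   // append to the end of the chain
    }

    bool find(int x) const {
        for (int k : cells_[index(x)])
            if (k == x) return true;
        return false;
    }

    void remove(int x) { bucket(x).remove(x); }  // unlink x from its chain

    std::size_t chainLength(std::size_t cell) const { return cells_.at(cell).size(); }

private:
    std::size_t index(int x) const {
        assert(x >= 0);
        return static_cast<std::size_t>(x) % cells_.size();
    }
    std::list<int>& bucket(int x) { return cells_[index(x)]; }

    std::vector<std::list<int>> cells_;
};
```

Inserting 22, 29, 8, 17, 11, 4, 13 into a ChainedTable of size 7 gives a chain of three keys in cell 1 and a chain of two in cell 4, as in the diagram.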
7. Open Hashing
- Running time of Insert, Remove and Find
  - average case: O(1)
  - worst case: O(n)
- Can we do better in the worst case? YES
  - Upper bound the list sizes, but leave table size unbounded
  - Run times improved from O(n) to O(1)
  - Do a rehashing on the table when that limit is exceeded
- What if I want to keep the table size fixed, but will let buckets grow?
8. Open Hashing
- Advantages
  - The number of keys stored can be greater than the size of the hash table itself
  - Good strategy when there aren't too many collisions
  - Remove is easy
- Disadvantages
  - Extra memory is used to store the pointers
  - Need to use dynamic memory for Insert and Remove
9. Closed Hashing
- Records are stored directly in the table (no linked lists)
- Collision resolution choices
  - delete the old element and replace it with the new one (!)
  - move the old element elsewhere in the table
  - move the new element elsewhere in the table
- a.k.a. open addressing: a record is no longer confined to the cell that its key hashes to
- How do we know where to move an element?
10. Closed Hashing: Probing
- Idea: a systematic way to find alternative cells in which to place a new record
- The sequence of cells explored is the probing sequence
- The probing sequence is defined by a probing function
  - f takes one parameter (the number of probes made so far) and returns an offset from the original cell
  - the initial hash attempt is the 0th probe
- Hashing function now is
  - H(x,i) = ( h(x) + f(i) ) mod tablesize
- If a cell is full, increment the probe count and try again
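The formula H(x,i) = ( h(x) + f(i) ) mod tablesize can be written directly as a function that takes the probing function as a parameter. This is a sketch under stated assumptions: probeCell is a hypothetical name, keys are non-negative ints, and h(x) = x mod tablesize.

```cpp
#include <cassert>
#include <functional>

// H(x,i) = ( h(x) + f(i) ) mod tablesize, with h(x) = x mod tablesize.
// probeCell is a hypothetical helper; f is the probing function.
int probeCell(int x, int i, int tablesize, const std::function<int(int)>& f) {
    assert(x >= 0 && i >= 0 && tablesize > 0);
    int h = x % tablesize;          // original cell (the 0th probe when f(0) = 0)
    return (h + f(i)) % tablesize;  // offset by f(i), wrapping around the table
}
```

Passing a different f (for example a lambda returning i, or i * i) changes the probing method without touching the rest of the table code.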
11. Probing Method: Linear Probing
- Probing function: f(i) = i
- Hash function is now
  - H(x,i) = ( h(x) + i ) mod tablesize
- Find algorithm: search the cells according to the probing sequence until we find the key or reach an empty cell
- This algorithm fails when we do a Remove
12. Linear Probing: Remove Problem
- h(x) = x mod 7, H(x,i) = ( h(x) + i ) mod 7

operation    0   1   2   3    4    5   6
insert 9     .   .   9   .    .    .   .
insert 23    .   .   9   23   .    .   .
insert 16    .   .   9   23   16   .   .
remove 23    .   .   9   .    16   .   .
find 16      probes cell 2 (holds 9), then cell 3 (empty): Not there!?
13. Linear Probing: Fixed
- To avoid this problem we maintain additional state information in the cells
  - Valid flag
  - Empty flag
  - Deleted flag
- Remove sets the deleted flag (Lazy Deletion)
- Find: deleted flag is same as valid flag (but ignore the key in the cell)
- Insert: deleted flag is same as empty flag
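The three flags and the Find/Insert rules above can be sketched as a small class. The names LinearTable, State, and stateAt are assumptions made for this example, keys are assumed to be non-negative ints, and h(x) = x mod tablesize.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Closed hashing with linear probing and lazy deletion.
// LinearTable and its member names are hypothetical, chosen for this sketch.
class LinearTable {
public:
    enum class State { Empty, Valid, Deleted };

    explicit LinearTable(std::size_t tablesize) : cells_(tablesize) {
        assert(tablesize > 0);
    }

    // Insert treats Deleted like Empty: the first non-Valid cell is reused.
    bool insert(int x) {
        if (find(x)) return true;                 // already present
        for (std::size_t i = 0; i < cells_.size(); ++i) {
            Cell& c = cells_[probe(x, i)];
            if (c.state != State::Valid) {
                c.key = x;
                c.state = State::Valid;
                return true;
            }
        }
        return false;                             // every cell is Valid
    }

    bool find(int x) const { return locate(x) != cells_.size(); }

    // Lazy deletion: only the flag changes, so later keys stay reachable.
    void remove(int x) {
        std::size_t pos = locate(x);
        if (pos != cells_.size()) cells_[pos].state = State::Deleted;
    }

    State stateAt(std::size_t cell) const { return cells_.at(cell).state; }

private:
    struct Cell { int key = 0; State state = State::Empty; };

    // H(x,i) = ( h(x) + i ) mod tablesize
    std::size_t probe(int x, std::size_t i) const {
        assert(x >= 0);
        std::size_t m = cells_.size();
        return (static_cast<std::size_t>(x) % m + i) % m;
    }

    // Find treats Deleted like Valid (keep probing) but ignores the stale key.
    std::size_t locate(int x) const {
        for (std::size_t i = 0; i < cells_.size(); ++i) {
            std::size_t pos = probe(x, i);
            const Cell& c = cells_[pos];
            if (c.state == State::Empty) break;   // truly empty cell: stop
            if (c.state == State::Valid && c.key == x) return pos;
        }
        return cells_.size();                     // sentinel: not found
    }

    std::vector<Cell> cells_;
};
```

With a table of size 7, inserting 9, 23, 16 and then removing 23 leaves cell 3 marked Deleted, so a later find of 16 keeps probing past it, and a later insert of 2 reuses cell 3.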
14. Linear Probing: Example
- h(x) = x mod 7, H(x,i) = ( h(x) + i ) mod 7

(E = Empty, V = Valid, D = Deleted)
operation    0   1   2     3      4      5   6
start        E   E   9 V   23 V   16 V   E   E
remove 23    E   E   9 V   23 D   16 V   E   E
find 16      E   E   9 V   23 D   16 V   E   E    Found!
insert 2     E   E   9 V   2 V    16 V   E   E

- Treat the Deleted flag like
  - Empty in Insert
  - Valid in Find
15. Linear Probing: Clustering Problem
- H(x,i) = ( h(x) + i ) mod tablesize, f(i) = i
- If collision, try h(x)+1, h(x)+2, etc.
- Good strategy when the table is not too full, but
- Has the Primary Clustering problem: once a cluster forms, it gets large quickly, resulting in long Find and Insert times
- Inserting anywhere in a cluster
  - requires probing to the end of the cluster
  - adds to the cluster size
16. Probing Method: Quadratic Probing
- Probing function: f(i) = i^2
- Hash function is now
  - H(x,i) = ( h(x) + i^2 ) mod tablesize
- Example: if h(x) = x mod 7
  - i^2 = 0, 1, 4, 9, 16, 25
  - values that hash to 2 would follow the sequence h(x) + i^2 = 2, 3, 6, 11, 18, 27
  - after taking mod 7, the cells probed are 2, 3, 6, 4, 4, 6
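The probe sequence in the example above can be generated mechanically. A minimal sketch, assuming non-negative int keys and h(x) = x mod tablesize; quadraticSequence is a hypothetical helper name.

```cpp
#include <cassert>
#include <vector>

// First `count` cells visited by quadratic probing:
// H(x,i) = ( h(x) + i^2 ) mod tablesize, for i = 0, 1, ..., count-1.
// quadraticSequence is a hypothetical helper for illustration.
std::vector<int> quadraticSequence(int x, int count, int tablesize) {
    assert(x >= 0 && count >= 0 && tablesize > 0);
    std::vector<int> cells;
    int h = x % tablesize;
    for (int i = 0; i < count; ++i)
        cells.push_back((h + i * i) % tablesize);
    return cells;
}
```

For a key with h(x) = 2 in a table of size 7 (for example x = 9), the first six probes are cells 2, 3, 6, 4, 4, 6. The repeated cells hint at why quadratic probing does not visit every cell.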
17. Quadratic Probing: Example
- h(x) = x mod 7, H(x,i) = ( h(x) + i^2 ) mod 7

(E = Empty, V = Valid)
operation    0   1   2     3      4   5   6
start        E   E   E     E      E   E   E
insert 9     E   E   9 V   E      E   E   E
insert 23    E   E   9 V   23 V   E   E   E
insert 16    E   E   9 V   23 V   E   E   16 V
18. Quadratic Probing: Clustering Problem
- Quadratic probing avoids primary clustering
  - Keys that hash to different cells no longer cluster together
- But it has a secondary clustering problem: keys that hash to the same cell still cluster together (resulting in long probe sequences)
- Quadratic probing cannot guarantee successful insertion when the table is half-full or more
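One way to see the half-full limit: from any starting cell, quadratic probing can only ever reach the cells h(x) + i^2 mod tablesize, and for a prime tablesize i^2 mod tablesize takes only (tablesize + 1) / 2 distinct values. The sketch below counts the reachable cells; reachableCells is a hypothetical helper name.

```cpp
#include <cassert>
#include <set>

// Number of distinct cells quadratic probing can ever visit from cell `home`.
// reachableCells is a hypothetical helper for illustration.
int reachableCells(int home, int tablesize) {
    assert(tablesize > 0 && home >= 0 && home < tablesize);
    std::set<int> seen;
    // i^2 mod tablesize repeats with period tablesize, so tablesize probes suffice.
    for (int i = 0; i < tablesize; ++i)
        seen.insert((home + i * i) % tablesize);
    return static_cast<int>(seen.size());
}
```

With tablesize 7 only 4 of the 7 cells are reachable from any starting cell, so once those cells are taken an insert fails even though free cells remain.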
19. Probing Method: Double Hashing
- Idea: avoid clusters by choosing a probe sequence independent of primary position
- Introduce a 2nd hash function h2(x)
- Probing function: f(x,i) = i * h2(x)
  - h2(x) = 1 is the same as linear probing
- Hash function is now
  - H(x,i) = ( h(x) + i * h2(x) ) mod tablesize
- Avoids the clustering problems
20. Double Hashing: Example
- h(x) = x mod 7
- h2(x) = 5 - (x mod 5)
- So H(x,i) = ( x mod 7 + i * (5 - x mod 5) ) mod 7

(E = Empty, V = Valid)
operation    probe seq   0   1   2     3   4      5   6
insert 9     2           E   E   9 V   E   E      E   E
insert 23    2, 4        E   E   9 V   E   23 V   E   E
insert 16    2, 6        E   E   9 V   E   23 V   E   16 V
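The example above can be checked with a short sketch that hard-codes the slide's two hash functions and a 7-cell table. The function names and the use of -1 to mark an empty cell are assumptions made for this illustration.

```cpp
#include <cassert>
#include <vector>

// Second hash from the example: h2(x) = 5 - (x mod 5), always in 1..5.
int h2(int x) { return 5 - (x % 5); }

// H(x,i) = ( x mod 7 + i * h2(x) ) mod 7
int doubleHashCell(int x, int i) {
    assert(x >= 0 && i >= 0);
    return (x % 7 + i * h2(x)) % 7;
}

// Insert x into a 7-cell table (-1 marks an empty cell) and return the
// sequence of cells probed. If no empty cell is found in 7 probes the key
// is not stored; a fuller sketch would report that case.
std::vector<int> insertDouble(std::vector<int>& table, int x) {
    assert(table.size() == 7);
    std::vector<int> seq;
    for (int i = 0; i < 7; ++i) {
        int cell = doubleHashCell(x, i);
        seq.push_back(cell);
        if (table[cell] == -1) {
            table[cell] = x;
            break;
        }
    }
    return seq;
}
```

Because 23 and 16 have different h2 values (2 and 4), they leave cell 2 in different directions even though both start there, which is how double hashing avoids secondary clustering.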
21. Rehashing
- Idea: increase the size of the hash table and rehash the old values into this new table
  - do not re-insert Deleted or Empty cells
- One good strategy: double the size of the table, then increase the size to the next larger prime number
- Rehash
  - when the table gets filled up
  - OR
  - to keep the table relatively unfilled for performance
- What is considered relatively unfilled?
22. Rehashing: Load Factor
- Load factor = cells used / tablesize
  - Could also use ( Valid cells + Deleted cells ) / tablesize
- Rehash when the load factor is above a certain threshold
  - Load factor of .50 means the table is 50% full
  - Example threshold: rehash when LF > 50%
- Effect on running times
  - O(n) for rehash, but now the table is only half as full, so over a long time insert is slower but still O(1) average case
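The load factor test above amounts to one division and one comparison. A minimal sketch, using the ( Valid cells + Deleted cells ) / tablesize variant; loadFactor and needsRehash are hypothetical names, and the default threshold of 0.5 is the slide's example value.

```cpp
#include <cassert>
#include <cstddef>

// Load factor counting both Valid and Deleted cells as used.
// loadFactor and needsRehash are hypothetical helpers for illustration.
double loadFactor(std::size_t validCells, std::size_t deletedCells, std::size_t tablesize) {
    assert(tablesize > 0);
    return static_cast<double>(validCells + deletedCells) / static_cast<double>(tablesize);
}

// Rehash when the load factor is strictly above the threshold.
bool needsRehash(std::size_t validCells, std::size_t deletedCells,
                 std::size_t tablesize, double threshold = 0.5) {
    return loadFactor(validCells, deletedCells, tablesize) > threshold;
}
```

For a table of size 7, three used cells give a load factor of about 0.43 (no rehash) and four give about 0.57 (rehash).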
23. Rehashing: Example with Quadratic Probing
- Insert(2) causes the load factor to exceed the 50% threshold, so
  - H(x,i) = ( h(x) + i^2 ) mod 7 becomes
  - H(x,i) = ( h(x) + i^2 ) mod 17

(E = Empty, V = Valid)
Before rehash (tablesize 7):
cell    0   1   2     3      4     5   6
        E   E   9 V   23 V   2 V   E   16 V

After rehash (tablesize 17):
cell    0   1   2     3   4   5   6      7   8   9     10  11  12  13  14  15  16
        E   E   2 V   E   E   E   23 V   E   E   9 V   E   E   E   E   E   E   16 V
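The rehash step in this example can be sketched as follows: pick the next prime at or above twice the old size, then re-insert every live key with the new modulus. All function names here are hypothetical, -1 marks an empty cell, and Deleted cells are not modeled (a table with flags would skip them as well).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Trial-division primality check; fine for small table sizes.
bool isPrime(std::size_t n) {
    if (n < 2) return false;
    for (std::size_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Double the size, then move up to the next prime (7 -> 14 -> 17).
std::size_t nextTableSize(std::size_t oldSize) {
    std::size_t n = 2 * oldSize;
    while (!isPrime(n)) ++n;
    return n;
}

// Place x using H(x,i) = ( h(x) + i^2 ) mod tablesize.
// Returns false if no free cell is reached.
bool insertQuadratic(std::vector<int>& table, int x) {
    assert(x >= 0 && !table.empty());
    std::size_t m = table.size();
    for (std::size_t i = 0; i < m; ++i) {
        std::size_t cell = (static_cast<std::size_t>(x) % m + i * i) % m;
        if (table[cell] == -1) {
            table[cell] = x;
            return true;
        }
    }
    return false;
}

// Build the larger table and re-insert only the live keys.
std::vector<int> rehash(const std::vector<int>& old) {
    std::vector<int> bigger(nextTableSize(old.size()), -1);
    for (int key : old)
        if (key != -1) insertQuadratic(bigger, key);
    return bigger;
}
```

Starting from the 7-cell table holding 9, 23, and 16, inserting 2 probes cells 2, 3, 6 and lands in cell 4; rehashing that table then places 2, 23, 9, and 16 in cells 2, 6, 9, and 16 of the 17-cell table.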
24. Hashing Summary
H(x,i) = ( h(x) + f(x,i) ) mod tablesize
h(x) = x mod tablesize (where tablesize is prime)

Hashing Category    Probing Method
Open Hashing        Store in list of fixed size (no additional probes)
Closed Hashing      Sequential (linear) Probing: f(x,i) = i
                    Quadratic Probing:           f(x,i) = i^2
                    Double Hashing:              f(x,i) = i * h2(x)
25. Sample Code
- All code discussed today is available at
  - cs225/src/library/14-closehash/
  - implements closed hashing

Class hierarchy:
HashBase: virtual Find() = 0, virtual HashFunction() = 0
  LinHashTable: Find()
    UserDefined: HashFunction()
  QuadHashTable: Find()
    UserDefined: HashFunction()
  DoubHashTable: Find(), virtual SecondHash() = 0
    UserDefined: HashFunction(), SecondHash()
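The hierarchy splits the work: the probing classes implement Find(), and the user's class supplies HashFunction(). The sketch below shows one branch of that idea only. The slide gives just the class and method names, so every signature, data member, the Insert() method, the ModHashTable class standing in for UserDefined, and the use of -1 for an empty cell are assumptions, not the course library's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Abstract base with the two pure virtual hooks named on the slide.
// Signatures and storage are assumed for this sketch.
class HashBase {
public:
    explicit HashBase(std::size_t tablesize) : cells_(tablesize, -1) {  // -1 = empty (assumed)
        assert(tablesize > 0);
    }
    virtual ~HashBase() = default;
    virtual bool Find(int key) const = 0;                 // supplied by the probing class
    virtual std::size_t HashFunction(int key) const = 0;  // supplied by the user

protected:
    std::vector<int> cells_;
};

// Linear probing supplies Find(). Insert() is an assumed addition so the
// sketch can be exercised; lazy deletion is left out for brevity.
class LinHashTable : public HashBase {
public:
    explicit LinHashTable(std::size_t tablesize) : HashBase(tablesize) {}

    bool Find(int key) const override {
        for (std::size_t i = 0; i < cells_.size(); ++i) {
            int stored = cells_[(HashFunction(key) + i) % cells_.size()];
            if (stored == -1) return false;   // empty cell ends the search
            if (stored == key) return true;
        }
        return false;
    }

    bool Insert(int key) {
        assert(key >= 0);
        for (std::size_t i = 0; i < cells_.size(); ++i) {
            int& cell = cells_[(HashFunction(key) + i) % cells_.size()];
            if (cell == -1 || cell == key) {
                cell = key;
                return true;
            }
        }
        return false;                         // table full
    }
};

// Plays the role of "UserDefined": only the hash function is written here.
class ModHashTable : public LinHashTable {
public:
    explicit ModHashTable(std::size_t tablesize) : LinHashTable(tablesize) {}
    std::size_t HashFunction(int key) const override {
        return static_cast<std::size_t>(key) % cells_.size();
    }
};
```

A DoubHashTable branch would follow the same pattern, with the probing class calling an extra pure virtual SecondHash() that the user's class also overrides.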
26. STL (and Java) Hash-related Classes
- Implemented as extensions to the C++ standard
  - hash_set, hash_multiset
  - hash_map, hash_multimap
- A number of predefined hash functions are available through the function object hash<T>
- Compare to java.util
  - HashMap, LinkedHashMap
  - HashSet, LinkedHashSet
  - Hashtable (open hashing)
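The hash_set and hash_map classes listed above were vendor extensions. Since C++11 the standard library provides std::unordered_set, std::unordered_multiset, std::unordered_map, and std::unordered_multimap, together with the function object std::hash<T>. The sketch below uses those standardized names rather than the extension classes; cellFor and countWords are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Cell a string key would map to in a table of `tablesize` cells, using the
// predefined function object std::hash<T>.
std::size_t cellFor(const std::string& key, std::size_t tablesize) {
    assert(tablesize > 0);
    return std::hash<std::string>{}(key) % tablesize;
}

// Count occurrences of each word with the standard hash map.
std::unordered_map<std::string, int> countWords(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;
    for (const std::string& w : words)
        ++counts[w];   // operator[] value-initializes the count to 0 on first use
    return counts;
}
```

The value returned by std::hash is implementation-defined, so the cell chosen by cellFor can differ between standard libraries; only its range (0 to tablesize - 1) is guaranteed by the modulo.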
27. Practice Problems
- Insert the keys and draw the resulting table and the # of probes used with the following hashing schemes (no rehashing)
  - 1) separate chaining (Open Hashing) (# probes N/A)
  - 2) open addressing (Closed Hashing) with linear probing
  - 3) open addressing (Closed Hashing) with double hashing
- The output of the hash functions has been provided.

K   h(k)   h2(k)   # probes
A   2      3
B   5      6
C   6      4
D   5      2
E   6      5
F   1      3

Table cells: 0 1 2 3 4 5 6