CS 225 Data Structures and Software Principles

Provided by: anandkris

1
CS 225 Data Structures and Software Principles

Session 14
Hashing

2
Discussion Topics

Hashing
Open Hashing
Closed Hashing
Probing Methods
Rehashing
Sample Code

3
Hashing

A process that places an item into a structure based on a key-to-address transformation
Goal: optimize Find, Insert, and Remove to O(1)!
Some terminology:
Hash table
Hash function
Bucket
Collisions

4
Hashing

Hash function: given a key from some key space K, output a legal index into a hash table with m entries, T[0..m-1]
When two distinct keys hash to the same index we have a collision, requiring a method of collision resolution
An ideal hash function should:
be easy to compute
distribute keys uniformly across the various cells of the table
avoid systematic collisions when there is a systematic, nonrandom pattern to key selection

5
Hashing

Good heuristic: set the size of the table to a prime number. Then we can use the hash function
h(x) = x mod tablesize
Two general ways to resolve collisions:
Open Hashing
Closed Hashing

x:        0  4  8  12  16  20
x mod 8:  0  4  0  4   0   4    (not uniform!)
x mod 7:  0  4  1  5   2   6    (much better)
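
The comparison above can be checked with a few lines of C++. This is a minimal sketch, not code from the course library: the function name bucket_counts is made up for illustration, and keys are assumed to be non-negative integers.

```cpp
#include <cassert>
#include <vector>

// Count how many of the given keys land in each cell under h(x) = x mod table_size.
// Assumption: all keys are non-negative.
std::vector<int> bucket_counts(const std::vector<int>& keys, int table_size) {
    assert(table_size > 0);
    std::vector<int> counts(table_size, 0);
    for (int key : keys) {
        ++counts[key % table_size];
    }
    return counts;
}
```

For the keys 0, 4, 8, 12, 16, 20, a table of size 8 uses only cells 0 and 4 (three keys each), while a table of size 7 puts every key in a cell of its own.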

6
Open Hashing

Collision resolution: build a linked list (bucket) off the table cell to hold multiple elements
a.k.a. Separate Chaining: each bucket has a separate chain of elements

Example with h(x) = x mod 7:
cell 0: (empty)
cell 1: 22, 29, 8
cell 2: (empty)
cell 3: 17
cell 4: 11, 4
cell 5: (empty)
cell 6: 13
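
Separate chaining can be sketched in C++ as a vector of linked lists. The class name ChainedHashSet and its member names are hypothetical, keys are assumed to be non-negative ints, and h(x) = x mod tablesize is used as on the slide. It is an illustration under those assumptions, not the course implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Minimal separate-chaining (open hashing) set of non-negative ints.
class ChainedHashSet {
public:
    explicit ChainedHashSet(std::size_t table_size) : buckets_(table_size) {
        assert(table_size > 0);
    }

    // Append the key to its cell's chain unless it is already stored.
    void insert(int key) {
        if (!find(key)) buckets_[index(key)].push_back(key);
    }

    // Walk only the chain of the cell the key hashes to.
    bool find(int key) const {
        for (int stored : buckets_[index(key)]) {
            if (stored == key) return true;
        }
        return false;
    }

    // std::list::remove erases every element equal to key.
    void remove(int key) { buckets_[index(key)].remove(key); }

    std::size_t chain_length(std::size_t cell) const { return buckets_[cell].size(); }

private:
    // h(x) = x mod tablesize
    std::size_t index(int key) const {
        return static_cast<std::size_t>(key) % buckets_.size();
    }

    std::vector<std::list<int>> buckets_;
};
```

Inserting 22, 29, 8, 17, 11, 4, 13 into a table of size 7 gives a chain of three keys at cell 1 and a chain of two at cell 4, matching the example.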

7
Open Hashing

Running time of Insert, Remove and Find:
average case O(1)
worst case O(n)
Can we do better in the worst case? YES
Upper bound the list sizes, but leave the table size unbounded
Worst-case run times improve from O(n) to O(1)
Rehash the table when that limit is exceeded
What if I want to keep the table size fixed, but let the buckets grow?

8
Open Hashing

Advantages:
The number of keys stored can be greater than the size of the hash table itself
Good strategy when there aren't too many collisions
Remove is easy
Disadvantages:
Extra memory is used to store the pointers
Need to use dynamic memory for Insert and Remove

9
Closed Hashing

Records are stored directly in the table (no linked lists)
Collision resolution choices:
delete the old element and replace it with the new one (!)
move the old element elsewhere in the table
move the new element elsewhere in the table
a.k.a. open addressing: a record is no longer confined to the cell that its key hashes to
How do we know where to move an element?

10
Closed Hashing: Probing

Idea: a systematic way to find alternative cells in which to place a new record
The sequence of cells explored is the probing sequence
The probing sequence is defined by a probing function f
f takes one parameter (the number of probes made so far) and returns an offset from the original cell
the initial hash attempt is the 0th probe
Hashing function now is
H(x,i) = ( h(x) + f(i) ) mod tablesize
If a cell is full, increment the probe count and try again
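
The formula H(x,i) = ( h(x) + f(i) ) mod tablesize can be written as one small C++ function with the probing function passed in. The name probe_cell and the parameter home (standing for h(x)) are assumptions made for this sketch.

```cpp
#include <cassert>
#include <functional>

// H(x, i) = (h(x) + f(i)) mod tablesize.
// home is h(x), i is the number of probes made so far, f is the probing function.
int probe_cell(int home, int i, int table_size, const std::function<int(int)>& f) {
    assert(table_size > 0 && home >= 0 && i >= 0);
    return (home + f(i)) % table_size;
}
```

Passing a different f changes the probing method without touching the rest of the table code: f(i) = i gives linear probing and f(i) = i*i gives quadratic probing.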

11
Probing Method: Linear Probing

Probing function: f(i) = i
Hash function is now
H(x,i) = ( h(x) + i ) mod tablesize
Find algorithm: search the cells according to the probing sequence until we find the key or reach an empty cell
This algorithm fails when we do a Remove

12
Linear Probing: Remove Problem

h(x) = x mod 7, H(x,i) = ( h(x) + i ) mod 7

cell:        0   1   2   3   4   5   6
insert 9:    -   -   9   -   -   -   -
insert 23:   -   -   9   23  -   -   -
insert 16:   -   -   9   23  16  -   -
remove 23:   -   -   9   -   16  -   -
find 16:     probes cell 2 (holds 9), then cell 3 (empty), and stops. Not there!?

13
Linear Probing: Fixed

To avoid this problem we maintain additional state information in the cells:
Valid flag
Empty flag
Deleted flag
Remove: sets the deleted flag (Lazy Deletion)
Find: the deleted flag is treated the same as the valid flag (but ignore the key in the cell)
Insert: the deleted flag is treated the same as the empty flag
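
The three flags and their handling in Remove, Find and Insert might look like this in C++. It is a sketch under assumptions: the class name LinearProbingSet and its members are invented here, keys are non-negative ints, h(x) = x mod tablesize, and there is no rehashing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Closed hashing with linear probing and lazy deletion (non-negative int keys).
class LinearProbingSet {
public:
    explicit LinearProbingSet(std::size_t table_size) : cells_(table_size) {
        assert(table_size > 0);
    }

    // Returns false if the key is already present or no free cell exists.
    bool insert(int key) {
        if (find(key)) return false;
        for (std::size_t i = 0; i < cells_.size(); ++i) {
            Cell& cell = cells_[probe(key, i)];
            if (cell.state != State::Valid) {  // Deleted is treated like Empty
                cell.key = key;
                cell.state = State::Valid;
                return true;
            }
        }
        return false;  // every cell is Valid
    }

    bool find(int key) const { return locate(key) != cells_.size(); }

    // Lazy deletion: only the flag changes, the cell keeps its place in the probe chain.
    void remove(int key) {
        std::size_t pos = locate(key);
        if (pos != cells_.size()) cells_[pos].state = State::Deleted;
    }

private:
    enum class State { Empty, Valid, Deleted };
    struct Cell {
        int key = 0;
        State state = State::Empty;
    };

    // (h(x) + i) mod tablesize with h(x) = x mod tablesize
    std::size_t probe(int key, std::size_t i) const {
        return (static_cast<std::size_t>(key) + i) % cells_.size();
    }

    // Index of the Valid cell holding key, or cells_.size() if the key is absent.
    std::size_t locate(int key) const {
        for (std::size_t i = 0; i < cells_.size(); ++i) {
            std::size_t pos = probe(key, i);
            const Cell& cell = cells_[pos];
            if (cell.state == State::Empty) break;  // only Empty ends the search
            if (cell.state == State::Valid && cell.key == key) return pos;
            // Deleted cells and cells holding other keys are skipped.
        }
        return cells_.size();
    }

    std::vector<Cell> cells_;
};
```

With a table of size 7, inserting 9, 23 and 16, then removing 23, leaves 16 findable because the search walks past the Deleted cell instead of stopping there. A later insert of 2 reuses that Deleted cell.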

14
Linear Probing: Example

h(x) = x mod 7, H(x,i) = ( h(x) + i ) mod 7

Cell contents and flags (E = Empty, V = Valid, D = Deleted):
cell:        0   1   2     3      4      5   6
start:       E   E   9 V   23 V   16 V   E   E
remove 23:   E   E   9 V   23 D   16 V   E   E
find 16:     E   E   9 V   23 D   16 V   E   E    Found!
insert 2:    E   E   9 V   2 V    16 V   E   E

Treat the Deleted flag like
Empty in Insert
Valid in Find

15
Linear Probing: Clustering Problem

H(x,i) = ( h(x) + i ) mod tablesize, f(i) = i
If collision, try h(x) + 1, h(x) + 2, etc.
Good strategy when the table is not too full, but
Has the Primary Clustering problem: once a cluster forms, it gets large quickly, resulting in long Find and Insert times
Inserting anywhere in a cluster:
requires probing to the end of the cluster
adds to the cluster size

16
Probing Method: Quadratic Probing

Probing function: f(i) = i^2
Hash function is now
H(x,i) = ( h(x) + i^2 ) mod tablesize
Example: if h(x) = x mod 7, values that hash to 2 would follow the sequence
h(x) + i^2: 2, 3, 6, 11, 18, 27
i^2: 0, 1, 4, 9, 16, 25
cell (mod 7): 2, 3, 6, 4, 4, 6
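
The probe sequence for a key whose home cell is 2 can be generated directly. The function name quadratic_sequence is an assumption for this sketch, and home stands for h(x).

```cpp
#include <cassert>
#include <vector>

// First `count` cells visited by quadratic probing: (home + i*i) mod table_size.
std::vector<int> quadratic_sequence(int home, int table_size, int count) {
    assert(table_size > 0 && home >= 0 && count >= 0);
    std::vector<int> cells;
    for (int i = 0; i < count; ++i) {
        cells.push_back((home + i * i) % table_size);
    }
    return cells;
}
```

For home cell 2 in a table of size 7, the offsets 0, 1, 4, 9, 16, 25 land on cells 2, 3, 6, 4, 4, 6 once the mod is applied.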

17
Quadratic Probing: Example

h(x) = x mod 7, H(x,i) = ( h(x) + i^2 ) mod 7

Cell contents and flags (E = Empty, V = Valid):
cell:        0   1   2     3      4   5   6
start:       E   E   E     E      E   E   E
insert 9:    E   E   9 V   E      E   E   E
insert 23:   E   E   9 V   23 V   E   E   E
insert 16:   E   E   9 V   23 V   E   E   16 V

18
Quadratic Probing: Clustering Problem

Quadratic probing avoids primary clustering: keys that hash to different cells no longer cluster together
But it has a secondary clustering problem: keys that hash to the same cell still cluster together (resulting in long probe sequences)
Quadratic probing cannot guarantee successful insertion when the table is half-full or more
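
One way to see why insertion can fail is to count how many distinct cells a quadratic probe sequence can ever reach. In the sketch below, reachable_cells is a hypothetical helper name and home stands for h(x).

```cpp
#include <cassert>
#include <set>

// Number of distinct cells that quadratic probing can ever reach from `home`.
// Probing table_size times is enough, because i*i mod table_size repeats after that.
int reachable_cells(int home, int table_size) {
    assert(table_size > 0 && home >= 0);
    std::set<int> seen;
    for (long long i = 0; i < table_size; ++i) {
        seen.insert(static_cast<int>((home + i * i) % table_size));
    }
    return static_cast<int>(seen.size());
}
```

In a table of size 7 only 4 of the 7 cells are reachable from any home cell, and in a table of size 17 only 9 are. If all reachable cells are occupied, the insert fails even though other cells are still empty.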

19
Probing Method: Double Hashing

Idea: avoid clusters by choosing a probe sequence independent of primary position
Introduce a 2nd hash function h2(x)
Probing function: f(x,i) = i * h2(x)
h2(x) = 1 is the same as linear probing
Hash function is now
H(x,i) = ( h(x) + i * h2(x) ) mod tablesize
Avoids the clustering problems

20
Double Hashing: Example

h(x) = x mod 7
h2(x) = 5 - (x mod 5)
=> H(x,i) = ( x mod 7 + i * (5 - x mod 5) ) mod 7

insert 9: Seq 2
insert 23: Seq 2, 4
insert 16: Seq 2, 6

Cell contents and flags (E = Empty, V = Valid):
cell:        0   1   2     3   4      5   6
insert 9:    E   E   9 V   E   E      E   E
insert 23:   E   E   9 V   E   23 V   E   E
insert 16:   E   E   9 V   E   23 V   E   16 V
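
The probe sequences in this example can be reproduced with the two hash functions from the slide. The C++ function names primary_hash, second_hash and double_hash_cell are assumptions for this sketch, and keys are assumed non-negative.

```cpp
#include <cassert>

// The example's functions: h(x) = x mod 7 and h2(x) = 5 - (x mod 5).
int primary_hash(int x) { return x % 7; }
int second_hash(int x) { return 5 - (x % 5); }  // always in 1..5, so never 0

// Cell for the i-th probe: H(x, i) = (h(x) + i * h2(x)) mod 7.
int double_hash_cell(int x, int i) {
    assert(x >= 0 && i >= 0);
    return (primary_hash(x) + i * second_hash(x)) % 7;
}
```

Keys 9, 23 and 16 all start at cell 2, but 23 steps by 2 (to cell 4) and 16 steps by 4 (to cell 6), so keys with the same home cell follow different probe sequences. A second hash that could return 0 would probe the same cell forever, which this h2 rules out.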

21
Rehashing

Idea: increase the size of the hash table and rehash the old values into this new table
do not re-insert Deleted or Empty cells
One good strategy: double the size of the table, then increase the size to the next prime number
Rehash
when the table gets filled up
OR
to keep the table relatively unfilled for performance
What is considered relatively unfilled?

22
Rehashing: Load Factor

Load factor = cells used / tablesize
Could also use ( Valid cells + Deleted cells ) / tablesize
Rehash when the load factor is above a certain threshold
A load factor of .50 means the table is 50% full
Example threshold: rehash when LF > 50%
Effect on running times:
O(n) for a rehash, but the table is now only half as full, so over a long time Insert is slower but still O(1) average case
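
The load-factor test and the "double, then go to the next prime" sizing rule can be sketched as follows. The helper names is_prime, next_table_size and needs_rehash are assumptions, and the threshold is passed in rather than fixed.

```cpp
#include <cassert>

// Trial-division primality test; adequate for table-sized numbers.
bool is_prime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// New table size: double the old size, then move up to the next prime.
int next_table_size(int old_size) {
    assert(old_size > 0);
    int candidate = 2 * old_size;
    while (!is_prime(candidate)) ++candidate;
    return candidate;
}

// True when cells_used / table_size exceeds the threshold (for example 0.5).
bool needs_rehash(int cells_used, int table_size, double threshold) {
    assert(table_size > 0);
    return static_cast<double>(cells_used) / table_size > threshold;
}
```

For a table of size 7 with a 50% threshold, three used cells do not trigger a rehash but a fourth does, and the new size is 17 (the first prime at or after 14).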

23
Rehashing: Example with Quadratic Probing

Insert(2) causes the load factor to exceed the 50% threshold, so
H(x,i) = ( h(x) + i^2 ) mod 7 becomes
H(x,i) = ( h(x) + i^2 ) mod 17

Old table (size 7) after Insert(2):
cell:   0   1   2     3      4     5   6
        E   E   9 V   23 V   2 V   E   16 V

New table (size 17) after rehashing:
cell 2: 2 V
cell 6: 23 V
cell 9: 9 V
cell 16: 16 V
all other cells (0, 1, 3-5, 7, 8, 10-15): E

24
Hashing Summary

H(x,i) = ( h(x) + f(x,i) ) mod tablesize
h(x) = x mod tablesize (where tablesize is prime)

Hashing Category    Probing Method
Open Hashing        Store in list of fixed size (no additional probes)
Closed Hashing      Sequential Probing: f(x,i) = i
Closed Hashing      Quadratic Probing: f(x,i) = i^2
Closed Hashing      Double Hashing: f(x,i) = i * h2(x)

25
Sample Code

All code discussed today is available at
cs225/src/library/14-closehash/
implements closed hashing

Class hierarchy:
HashBase: virtual Find() = 0, virtual HashFunction() = 0
  LinHashTable: Find()
    UserDefined: HashFunction()
  QuadHashTable: Find()
    UserDefined: HashFunction()
  DoubHashTable: Find(), virtual SecondHash() = 0
    UserDefined: HashFunction(), SecondHash()
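
The split of responsibilities in this hierarchy could look roughly like the C++ skeleton below. Only the names HashBase, LinHashTable, Find and HashFunction come from the slide. The signatures, the Insert method, the kEmpty sentinel and the ModHashTable class (standing in for a user-defined class) are assumptions made for illustration, and this is not the code in the course directory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Abstract base: derived classes supply the probing strategy and the hash function.
class HashBase {
public:
    explicit HashBase(std::size_t table_size) : keys_(table_size, kEmpty) {
        assert(table_size > 0);
    }
    virtual ~HashBase() = default;
    virtual bool Find(int key) const = 0;
    virtual std::size_t HashFunction(int key) const = 0;

protected:
    enum : int { kEmpty = -1 };  // sentinel for an unused cell (assumes keys >= 0)
    std::vector<int> keys_;
};

// Linear probing: implements Find but still leaves HashFunction to the user.
class LinHashTable : public HashBase {
public:
    explicit LinHashTable(std::size_t table_size) : HashBase(table_size) {}

    bool Find(int key) const override {
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            int stored = keys_[(HashFunction(key) + i) % keys_.size()];
            if (stored == kEmpty) return false;  // reached an empty cell
            if (stored == key) return true;
        }
        return false;
    }

    // Not named on the slide; added so the sketch can be exercised.
    void Insert(int key) {
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            int& cell = keys_[(HashFunction(key) + i) % keys_.size()];
            if (cell == kEmpty || cell == key) {
                cell = key;
                return;
            }
        }
    }
};

// User-defined leaf class: only the hash function is supplied.
class ModHashTable : public LinHashTable {
public:
    explicit ModHashTable(std::size_t table_size) : LinHashTable(table_size) {}

    std::size_t HashFunction(int key) const override {
        return static_cast<std::size_t>(key) % keys_.size();
    }
};
```

The probing class never needs to know how keys are hashed, and the user-defined class never needs to know how collisions are resolved. A double-hashing variant would add a second pure virtual function in the same way.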

26
STL (& Java) Hash-related Classes

Implemented as extensions to the C++ standard:
hash_set, hash_multiset
hash_map, hash_multimap
A number of predefined hash functions are available through the function object hash<T>
Compare to java.util:
HashMap, LinkedHashMap
HashSet, LinkedHashSet
Hashtable (open hashing)
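
Since C++11 the standard library itself provides hash containers (std::unordered_set, std::unordered_map and their multi variants), which play the role of the extension classes listed above. The sketch below uses std::unordered_map; the function name count_words is made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Count word occurrences with the standard hash map (C++11 and later).
std::unordered_map<std::string, int> count_words(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;
    for (const std::string& word : words) {
        ++counts[word];  // operator[] value-initializes a missing entry to 0
    }
    // The container rehashes on its own to keep the load factor within its limit.
    assert(counts.load_factor() <= counts.max_load_factor());
    return counts;
}
```

The default hash for std::string keys comes from std::hash<std::string>, and the members load_factor() and max_load_factor() expose the same load-factor idea discussed under rehashing.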

27
Practice Problems

Insert the keys and draw the resulting table and the # of probes used with the following hashing schemes (no rehashing):
1) separate chaining (Open Hashing) (# probes: N/A)
2) open addressing (Closed Hashing) with linear probing
3) open addressing (Closed Hashing) with double hashing
The output of the hash functions has been provided.

K   h(k)   h2(k)   # probes
A   2      3
B   5      6
C   6      4
D   5      2
E   6      5
F   1      3

Table cells: 0 1 2 3 4 5 6
