CS 225 Data Structures and Software Principles - PowerPoint PPT Presentation

Slides: 28
Provided by: anandkris
Transcript and Presenter's Notes

1
CS 225 Data Structures and Software Principles
  • Session 14
  • Hashing

2
Discussion Topics
  • Hashing
  • Open Hashing
  • Closed Hashing
  • Probing Methods
  • Rehashing
  • Sample Code

3
Hashing
  • A process that places an item into a structure
    based on a key-to-address transformation
  • Goal: optimize Find, Insert, and Remove to O(1)!
  • Some terminology:
      • Hash table
      • Hash function
      • Bucket
      • Collisions

4
Hashing
  • Hash function: given a key from some key space K,
    output a legal index into a hash table T with m
    entries, T[0..m-1]
  • When two distinct keys hash to the same index, we
    have a collision, requiring a method of collision
    resolution
  • An ideal hash function should:
      • be easy to compute
      • distribute keys uniformly across the various
        cells of the table
      • avoid systematic collisions when there is a
        systematic, nonrandom pattern to key selection

5
Hashing
  • Good heuristic: set the size of the table to a
    prime number. Then we can use the hash function
      • h(x) = x mod tablesize
  • Two general ways to resolve collisions:
      • Open Hashing
      • Closed Hashing

Example (keys x = 0, 4, 8, 12, 16, 20):
  x mod 8:  0, 4, 0, 4, 0, 4   (not uniform!)
  x mod 7:  0, 4, 1, 5, 2, 6   (much better)
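The x mod 8 versus x mod 7 comparison above can be checked mechanically. This is a minimal sketch that assumes non-negative integer keys. It counts how many keys fall into each cell for a given table size, and `bucketCounts` / `cellsUsed` are hypothetical helper names.

```cpp
#include <cassert>
#include <vector>

// Count how many keys land in each cell under h(x) = x mod m.
// Keys are assumed non-negative so that x % m is a valid index.
std::vector<int> bucketCounts(const std::vector<int>& keys, int m) {
    assert(m > 0);
    std::vector<int> counts(m, 0);
    for (int x : keys) {
        ++counts[x % m];
    }
    return counts;
}

// Number of cells that received at least one key.
int cellsUsed(const std::vector<int>& counts) {
    int used = 0;
    for (int c : counts) {
        if (c > 0) ++used;
    }
    return used;
}
```

With the keys 0, 4, ..., 20, a table of size 8 uses only cells 0 and 4, with three keys each. A table of size 7 spreads the same six keys over six different cells. The keys share the factor 4 with 8 but share no factor with 7.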
6
Open Hashing
  • Collision resolution: build a linked list
    (bucket) off the table cell to hold multiple
    elements
  • a.k.a. Separate Chaining: each bucket has a
    separate chain of elements

Figure: separate chaining with h(x) = x mod 7, cells 0-6
  cell 1: 22, 29, 8    cell 3: 17    cell 4: 11, 4    cell 6: 13
  (cells 0, 2, and 5 are empty)
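Separate chaining as described above can be sketched in a few lines. This is a minimal illustration, not the course's code. It assumes non-negative int keys, set semantics (no duplicates), and `std::list` as the chain type, and the class name `ChainedHashSet` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Separate chaining (open hashing): each table cell owns a linked list.
// h(x) = x mod tablesize; keys are assumed non-negative.
class ChainedHashSet {
public:
    explicit ChainedHashSet(std::size_t tableSize) : buckets_(tableSize) {
        assert(tableSize > 0);
    }

    // Append x to its bucket's chain; returns false if x is already present.
    bool insert(int x) {
        std::list<int>& chain = buckets_[hash(x)];
        for (int y : chain) {
            if (y == x) return false;
        }
        chain.push_back(x);
        return true;
    }

    // Only the one chain that x can live in is scanned.
    bool find(int x) const {
        for (int y : buckets_[hash(x)]) {
            if (y == x) return true;
        }
        return false;
    }

    // Remove is easy: unlink the node; no other cell is affected.
    bool remove(int x) {
        std::list<int>& chain = buckets_[hash(x)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (*it == x) {
                chain.erase(it);
                return true;
            }
        }
        return false;
    }

    std::size_t chainLength(std::size_t cell) const { return buckets_[cell].size(); }

private:
    std::size_t hash(int x) const { return static_cast<std::size_t>(x) % buckets_.size(); }

    std::vector<std::list<int>> buckets_;
};
```

Inserting 22, 29, 8, 17, 11, 4, 13 into a 7-cell table gives chains of length 3, 1, 2, and 1 in cells 1, 3, 4, and 6, matching the figure.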
7
Open Hashing
  • Running time of Insert, Remove, and Find:
      • average case O(1)
      • worst case O(n)
  • Can we do better in the worst case? YES
      • Upper-bound the list sizes, but leave the table
        size unbounded
      • Run times improve from O(n) to O(1)
      • Rehash the table when that limit is exceeded
  • What if I want to keep the table size fixed, but
    let the buckets grow?
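The bounded-list idea above can be sketched as follows. Several details are assumptions made for illustration:

* the bound `maxChain`,
* the growth rule (next prime at least twice the old size),
* repeating the rehash until every chain fits,
* the class name `BoundedChainSet`.

Because each chain holds at most `maxChain` keys, `contains` does a bounded amount of work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Separate chaining where every chain holds at most maxChain keys.
// If an insert would overflow a chain, the table grows to the next prime
// >= 2 * old size and all keys are rehashed (repeatedly, if needed, until
// every chain fits). Keys are assumed to be non-negative ints.
class BoundedChainSet {
public:
    BoundedChainSet(std::size_t tableSize, std::size_t maxChain)
        : table_(tableSize), maxChain_(maxChain) {
        assert(tableSize > 0 && maxChain > 0);
    }

    void insert(int x) {
        if (contains(x)) return;
        while (table_[cell(x)].size() >= maxChain_) grow();
        table_[cell(x)].push_back(x);
    }

    // Scans one chain of at most maxChain keys: O(1) in the worst case.
    bool contains(int x) const {
        for (int y : table_[cell(x)])
            if (y == x) return true;
        return false;
    }

    std::size_t tableSize() const { return table_.size(); }

private:
    static std::size_t nextPrime(std::size_t n) {
        auto isPrime = [](std::size_t k) {
            if (k < 2) return false;
            for (std::size_t d = 2; d * d <= k; ++d)
                if (k % d == 0) return false;
            return true;
        };
        while (!isPrime(n)) ++n;
        return n;
    }

    std::size_t cell(int x) const { return static_cast<std::size_t>(x) % table_.size(); }

    void grow() {
        std::vector<int> keys;
        for (const auto& chain : table_) keys.insert(keys.end(), chain.begin(), chain.end());
        bool fits = false;
        while (!fits) {
            table_ = std::vector<std::vector<int>>(nextPrime(2 * table_.size()));
            fits = true;
            for (int y : keys) {
                table_[cell(y)].push_back(y);
                if (table_[cell(y)].size() > maxChain_) fits = false;
            }
        }
    }

    std::vector<std::vector<int>> table_;
    std::size_t maxChain_;
};
```

An individual insert that triggers `grow()` costs O(n). Lookups never scan more than `maxChain` keys.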

8
Open Hashing
  • Advantages:
      • The number of keys stored can be greater than the
        size of the hash table itself
      • Good strategy when there aren't too many
        collisions
      • Remove is easy
  • Disadvantages:
      • Extra memory is used to store the pointers
      • Need to use dynamic memory for Insert and Remove

9
Closed Hashing
  • Records are stored directly in the table (no
    linked lists)
  • Collision resolution choices:
      • delete the old element and replace it with the
        new one (!)
      • move the old element elsewhere in the table
      • move the new element elsewhere in the table
  • a.k.a. open addressing: a record is no longer
    confined to the cell that its key hashes to
  • How do we know where to move an element?

10
Closed Hashing: Probing
  • Idea: a systematic way to find alternative cells in
    which to place a new record
  • The sequence of cells explored is the probing
    sequence
  • The probing sequence is defined by a probing function f
      • f takes one parameter, the number of probes made
        so far, and returns an offset from the original cell
      • the initial hash attempt is the 0th probe
  • The hash function is now
      • H(x,i) = ( h(x) + f(i) ) mod tablesize
  • If a cell is full, increment the probe count and try
    again
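The formula H(x,i) = ( h(x) + f(i) ) mod tablesize can be written once with f passed in as a parameter. This is a sketch assuming non-negative int keys and h(x) = x mod tablesize. The helper names `probeCell` and `probeSequence` are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// H(x,i) = ( h(x) + f(i) ) mod tablesize, with h(x) = x mod tablesize.
// f is a parameter so the same helper can express different probing methods.
int probeCell(int x, int i, int tableSize, const std::function<int(int)>& f) {
    assert(tableSize > 0 && x >= 0 && i >= 0);
    return (x % tableSize + f(i)) % tableSize;
}

// The first `count` cells of x's probing sequence (probe 0 is the initial hash).
std::vector<int> probeSequence(int x, int count, int tableSize,
                               const std::function<int(int)>& f) {
    std::vector<int> cells;
    for (int i = 0; i < count; ++i) {
        cells.push_back(probeCell(x, i, tableSize, f));
    }
    return cells;
}
```

The table size is 7 and the key is 9, so h(9) = 2.

* With `f(i) = i`, the sequence is 2, 3, 4, 5.
* With `f(i) = i * i`, it is 2, 3, 6, 4, 4, 6.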

11
Probing Method: Linear Probing
  • Probing function: f(i) = i
  • The hash function is now
      • H(x,i) = ( h(x) + i ) mod tablesize
  • Find algorithm: search the cells according to the
    probing sequence until we find the key or reach
    an empty cell
  • This algorithm fails when we do a Remove

12
Linear Probing: Remove Problem
  • h(x) = x mod 7, H(x,i) = ( h(x) + i ) mod 7

Figure (cells 0-6):
  insert 9  → cell 2
  insert 23 → cell 3 (cell 2 is taken)
  insert 16 → cell 4 (cells 2 and 3 are taken)
  remove 23 → cell 3 becomes empty
  find 16   → probes cell 2 (holds 9), then stops at empty cell 3: Not there!?
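The failure above can be reproduced with a deliberately naive table. This is a hypothetical sketch, not the course code. It assumes non-negative int keys, and -1 marks an empty cell. Its `remove` simply empties the cell, which is exactly the step that breaks `find`.

```cpp
#include <cassert>
#include <vector>

// Naive linear probing, written only to reproduce the Remove problem:
// cells hold a key or -1 (empty), and remove simply empties the cell.
class NaiveLinearTable {
public:
    explicit NaiveLinearTable(int size) : cells_(size, -1) { assert(size > 0); }

    // Probe h(x), h(x)+1, ... (mod size) and store x in the first empty cell.
    int insert(int x) {
        for (int i = 0; i < size(); ++i) {
            int c = cell(x, i);
            if (cells_[c] == -1) {
                cells_[c] = x;
                return c;
            }
        }
        return -1;  // table full
    }

    // Stops at the first empty cell: correct only if no cell was ever emptied.
    bool find(int x) const { return locate(x) != -1; }

    // The bug: emptying the cell cuts probe sequences that passed through it.
    void remove(int x) {
        int c = locate(x);
        if (c != -1) cells_[c] = -1;
    }

private:
    int size() const { return static_cast<int>(cells_.size()); }
    int cell(int x, int i) const { return (x % size() + i) % size(); }

    int locate(int x) const {
        for (int i = 0; i < size(); ++i) {
            int c = cell(x, i);
            if (cells_[c] == -1) return -1;
            if (cells_[c] == x) return c;
        }
        return -1;
    }

    std::vector<int> cells_;
};
```

After `insert(9)`, `insert(23)`, `insert(16)`, and `remove(23)`, the key 16 is still stored in cell 4. However, `find(16)` returns false because it stops at the now-empty cell 3.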
13
Linear Probing: Fixed
  • To avoid this problem, we maintain additional
    state information in each cell:
      • Valid flag
      • Empty flag
      • Deleted flag
  • Remove sets the Deleted flag (Lazy Deletion)
  • Find: a Deleted flag is treated the same as a Valid
    flag (but ignore the key in the cell)
  • Insert: a Deleted flag is treated the same as an
    Empty flag
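Lazy deletion with Valid/Empty/Deleted flags can be sketched as below. This is a minimal illustration assuming non-negative int keys and linear probing. The names `State` and `LazyLinearTable` are hypothetical. For brevity, `insert` does not check whether x already appears further along the probe sequence.

```cpp
#include <cassert>
#include <vector>

// Per-cell state flags for lazy deletion.
enum class State { Empty, Valid, Deleted };

class LazyLinearTable {
public:
    explicit LazyLinearTable(int size) : keys_(size, 0), state_(size, State::Empty) {
        assert(size > 0);
    }

    // Insert: a Deleted cell is treated like an Empty one and reused.
    int insert(int x) {
        for (int i = 0; i < size(); ++i) {
            int c = cell(x, i);
            if (state_[c] != State::Valid) {
                keys_[c] = x;
                state_[c] = State::Valid;
                return c;
            }
        }
        return -1;  // no free cell
    }

    // Find: a Deleted cell is treated like a Valid one (keep probing), but its
    // stale key is ignored. Only an Empty cell ends the search.
    int find(int x) const {
        for (int i = 0; i < size(); ++i) {
            int c = cell(x, i);
            if (state_[c] == State::Empty) return -1;
            if (state_[c] == State::Valid && keys_[c] == x) return c;
        }
        return -1;
    }

    // Remove: only flag the cell; probe sequences through it stay intact.
    bool remove(int x) {
        int c = find(x);
        if (c < 0) return false;
        state_[c] = State::Deleted;
        return true;
    }

    State stateAt(int c) const { return state_[c]; }

private:
    int size() const { return static_cast<int>(keys_.size()); }
    int cell(int x, int i) const { return (x % size() + i) % size(); }

    std::vector<int> keys_;
    std::vector<State> state_;
};
```

The sketch replays the deck's example:

1. Insert 9, 23, 16.
2. Remove 23, which leaves cell 3 flagged Deleted.
3. `find(16)` now probes past cell 3 and returns cell 4.
4. `insert(2)` reuses cell 3.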

14
Linear Probing: Example
  • h(x) = x mod 7, H(x,i) = ( h(x) + i ) mod 7

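Figure (cells 0-6; flags E = Empty, V = Valid, D = Deleted),
starting with 9 (V) in cell 2, 23 (V) in cell 3, 16 (V) in cell 4:
  remove 23 → cell 3 is flagged D
  find 16   → cell 2 (V, 9), cell 3 (D, keep probing), cell 4 (V, 16): Found!
  insert 2  → cell 2 is V, cell 3 is D, so 2 is stored in cell 3 (V)
  • Treat the Deleted flag like:
      • Empty in Insert
      • Valid in Find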

15
Linear Probing: Clustering Problem
  • H(x,i) = ( h(x) + i ) mod tablesize, f(i) = i
  • On a collision, try h(x) + 1, h(x) + 2, etc.
  • Good strategy when the table is not too full, but
  • Has a Primary Clustering problem: once a cluster
    forms, it grows quickly, resulting in long
    Find and Insert times
  • Inserting anywhere in a cluster
      • requires probing to the end of the cluster
      • adds to the cluster size
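Primary clustering becomes visible if each insert reports how many cells it examined. This sketch assumes non-negative int keys and marks empty cells with -1. The helper name `insertCountingProbes` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Linear probing insert that returns the number of cells examined
// (counting the initial hash), or -1 if the table is full.
int insertCountingProbes(std::vector<int>& table, int x) {
    int n = static_cast<int>(table.size());
    assert(n > 0 && x >= 0);
    for (int i = 0; i < n; ++i) {
        int c = (x % n + i) % n;
        if (table[c] == -1) {
            table[c] = x;
            return i + 1;
        }
    }
    return -1;
}
```

The sequence below uses a 13-cell table:

1. Keys 0 through 4 each take one probe and form a cluster in cells 0 to 4.
2. Key 13 hashes to cell 0 but needs 6 probes to reach cell 5.
3. Key 15 hashes to cell 2, in the middle of the cluster, and still needs 5 probes. It lands in cell 6 and extends the cluster again.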

16
Probing Method: Quadratic Probing
  • Probing function: f(i) = i^2
  • The hash function is now
      • H(x,i) = ( h(x) + i^2 ) mod tablesize
  • Example: if h(x) = x mod 7,
      • values that hash to 2 would follow the sequence
        h(x) + i^2 = 2, 3, 6, 11, 18, 27
        (cells 2, 3, 6, 4, 4, 6 after taking mod 7)
      • i^2 = 0, 1, 4, 9, 16, 25

17
Quadratic Probing: Example
  • h(x) = x mod 7, H(x,i) = ( h(x) + i^2 ) mod 7

Figure (cells 0-6; E = Empty, V = Valid):
  insert 9  → cell 2
  insert 23 → cell 2 is taken; i = 1: cell 2 + 1 = 3
  insert 16 → cells 2 and 3 are taken; i = 2: cell 2 + 4 = 6
  (all other cells remain E)
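The quadratic-probing example above can be replayed with a small insert routine. This is a sketch assuming non-negative int keys, with -1 marking an empty cell. `quadraticInsert` is a hypothetical name. It gives up after tablesize probes, because quadratic probing may never reach some free cells.

```cpp
#include <cassert>
#include <vector>

// Quadratic probing: H(x,i) = ( h(x) + i*i ) mod tablesize, h(x) = x mod tablesize.
// Returns the cell used, or -1 if no free cell was found within tablesize probes
// (with quadratic probing that can happen even while free cells remain).
int quadraticInsert(std::vector<int>& table, int x) {
    long long n = static_cast<long long>(table.size());
    assert(n > 0 && x >= 0);
    for (long long i = 0; i < n; ++i) {
        long long c = (x % n + i * i) % n;
        if (table[c] == -1) {
            table[c] = x;
            return static_cast<int>(c);
        }
    }
    return -1;
}
```

Inserting 9, 23, and 16 into a 7-cell table places them in cells 2, 3, and 6, as in the figure.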
18
Quadratic Probing: Clustering Problem
  • Quadratic probing avoids primary clustering
      • Keys that hash to different cells no longer
        cluster together
  • But it has a secondary clustering problem: keys that
    hash to the same cell still cluster together
    (resulting in long probe sequences)
  • Quadratic probing cannot guarantee successful
    insertion when the table is half full or more
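The half-full limit comes from how few distinct offsets i^2 mod m there are. For an odd prime m, the squares mod m take exactly (m + 1)/2 distinct values. A quadratic probe sequence can therefore visit only about half the cells. The standard converse is that a prime table less than half full always yields a free cell. This sketch counts the reachable offsets. `reachableCells` is a hypothetical helper name.

```cpp
#include <cassert>
#include <set>

// Count the distinct offsets i*i mod m that a quadratic probe sequence can
// produce in a table of size m. Since (m - i)^2 = i^2 (mod m), i < m suffices.
int reachableCells(int m) {
    assert(m > 0);
    std::set<long long> offsets;
    for (long long i = 0; i < m; ++i) {
        offsets.insert(i * i % m);
    }
    return static_cast<int>(offsets.size());
}
```

For m = 7, only the offsets {0, 1, 2, 4} occur, so 4 of the 7 cells are reachable from any starting cell. If those four cells are occupied, an insert fails even though three cells are still free.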

19
Probing Method: Double Hashing
  • Idea: avoid clusters by choosing a probe sequence
    independent of the primary position
  • Introduce a 2nd hash function h2(x)
      • Probing function: f(x,i) = i*h2(x)
      • h2(x) = 1 is the same as linear probing
      • h2(x) must never evaluate to 0
  • The hash function is now
      • H(x,i) = ( h(x) + i*h2(x) ) mod tablesize
  • Avoids the clustering problems
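Double hashing follows the same probing pattern with a key-dependent step. This sketch assumes non-negative int keys and uses -1 to mark an empty cell. It takes h(x) = x mod tablesize and h2(x) = r - (x mod r), where r = 5 and tablesize = 7 reproduce the deck's example. `doubleHashInsert` is a hypothetical name. Requiring 0 < r < tablesize keeps h2(x) between 1 and r, so it is never 0. With a prime tablesize, every such step size eventually visits all cells.

```cpp
#include <cassert>
#include <vector>

// Double hashing insert: H(x,i) = ( h(x) + i*h2(x) ) mod tablesize,
// h(x) = x mod tablesize, h2(x) = r - (x mod r).
// Returns the cells probed; the last one is where x went, unless the table was full.
std::vector<int> doubleHashInsert(std::vector<int>& table, int x, int r) {
    int n = static_cast<int>(table.size());
    assert(x >= 0 && 0 < r && r < n);
    int h = x % n;
    int h2 = r - x % r;  // in 1..r, never 0
    std::vector<int> probed;
    for (int i = 0; i < n; ++i) {
        int c = (h + i * h2) % n;
        probed.push_back(c);
        if (table[c] == -1) {
            table[c] = x;
            break;
        }
    }
    return probed;
}
```

With r = 5 on a 7-cell table, the keys 9, 23, and 16 probe the sequences (2), (2, 4), and (2, 6). Although all three keys share the initial cell 2, each one follows its own step size.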

20
Double Hashing: Example
  • h(x) = x mod 7
  • h2(x) = 5 - (x mod 5)
  • ⇒ H(x,i) = ( x mod 7 + i(5 - (x mod 5)) ) mod 7

Figure (cells 0-6):
  insert 9  (sequence 2)    → cell 2
  insert 23 (sequence 2, 4) → cell 4, since h2(23) = 2
  insert 16 (sequence 2, 6) → cell 6, since h2(16) = 4
21
Rehashing
  • Idea: increase the size of the hash table and
    rehash the old values into this new table
      • do not re-insert Deleted or Empty cells
  • One good strategy: double the size of the table
    and increase the size to the next larger prime
    number
  • Rehash
      • when the table gets filled up
      • OR
      • to keep the table relatively unfilled for
        performance
  • What is considered relatively unfilled?

22
Rehashing: Load Factor
  • Load factor = cells used / tablesize
      • Could also use ( Valid cells + Deleted cells ) /
        tablesize
  • Rehash when the load factor is above a certain
    threshold
      • A load factor of .50 means the table is 50% full
      • Example threshold: rehash when LF > 50%
  • Effect on running times:
      • a rehash costs O(n), but afterwards the table is
        only half as full; over a long run of inserts,
        Insert is slower but still O(1) in the average case
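The rehash policy above can be sketched with quadratic probing. The assumptions are:

* keys are non-negative ints, and -1 marks an Empty cell;
* there is no Remove, so no Deleted cells need to be skipped;
* the load factor must stay at or below 0.5.

The class name `RehashingQuadTable` is hypothetical. The table starts at a prime size and grows to the next prime at or above twice its size. It then re-inserts only occupied cells.

```cpp
#include <cassert>
#include <vector>

class RehashingQuadTable {
public:
    explicit RehashingQuadTable(int n) : cells_(n, -1) { assert(isPrime(n)); }

    void insert(int x) {
        // The load factor after this insert would be (count_ + 1) / size().
        if (2 * (count_ + 1) > size()) rehash();
        place(x);
        ++count_;
    }

    int size() const { return static_cast<int>(cells_.size()); }
    int at(int c) const { return cells_[c]; }

private:
    static bool isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; d * d <= n; ++d)
            if (n % d == 0) return false;
        return true;
    }

    // H(x,i) = ( h(x) + i^2 ) mod size, with h(x) = x mod size.
    void place(int x) {
        long long n = size();
        for (long long i = 0; i < n; ++i) {
            long long c = (x % n + i * i) % n;
            if (cells_[c] == -1) {
                cells_[c] = x;
                return;
            }
        }
        // Unreachable: a prime size and load factor <= 0.5 guarantee a free cell.
        assert(false);
    }

    // Grow to the next prime >= 2 * size and re-insert only occupied cells.
    void rehash() {
        std::vector<int> old = cells_;
        int newSize = 2 * size();
        while (!isPrime(newSize)) ++newSize;
        cells_.assign(newSize, -1);
        for (int key : old)
            if (key != -1) place(key);
    }

    std::vector<int> cells_;
    int count_ = 0;
};
```

Starting from 7 cells, inserting 9, 23, and 16 leaves the load factor at 3/7. Inserting 2 would push it to 4/7 > 0.5, so the table first grows to 17 cells. This matches the deck's rehashing example.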

23
Rehashing Example with Quadratic Probing
  • Insert(2) causes the load factor to exceed the 50%
    threshold, so
      • H(x,i) = ( h(x) + i^2 ) mod 7 becomes
      • H(x,i) = ( h(x) + i^2 ) mod 17
Figure: before, the 7-cell table holds 9 (cell 2), 23 (cell 3), and 16 (cell 6).
After rehashing into 17 cells (cells 0-16) and inserting 2: 2 in cell 2,
23 in cell 6, 9 in cell 9, 16 in cell 16; all other cells are E.
24
Hashing Summary
H(x,i) = ( h(x) + f(x,i) ) mod tablesize
h(x) = x mod tablesize (where tablesize is prime)

Hashing Category | Probing Method                                     | f(x,i)
Open Hashing     | store in list of fixed size (no additional probes) | n/a
Closed Hashing   | Sequential (Linear) Probing                        | i
Closed Hashing   | Quadratic Probing                                  | i^2
Closed Hashing   | Double Hashing                                     | i*h2(x)
25
Sample Code
  • All code discussed today is available at
  • cs225/src/library/14-closehash/
  • implements closed hashing

Class diagram:
  HashBase: virtual Find() = 0, virtual HashFunction() = 0
    • LinHashTable: Find()
        • UserDefined: HashFunction()
    • QuadHashTable: Find()
        • UserDefined: HashFunction()
    • DoubHashTable: Find(), virtual SecondHash() = 0
        • UserDefined: HashFunction(), SecondHash()
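The class diagram above might be fleshed out as follows. This is a hypothetical sketch only: the actual code in cs225/src/library/14-closehash/ is not shown here and may differ. The names `HashBase`, `LinHashTable`, `QuadHashTable`, `DoubHashTable`, `Find`, `HashFunction`, and `SecondHash` come from the diagram. The following pieces are assumptions added so the sketch runs:

* the helpers `Insert`, `Offset`, `Cell`, `Search`, and `Size`;
* the user class `MyDoubHash`;
* int keys with -1 as Empty.

Deletion is omitted.

```cpp
#include <cassert>
#include <vector>

class HashBase {
public:
    explicit HashBase(int size) : table_(size, -1) { assert(size > 0); }
    virtual ~HashBase() = default;

    virtual int Find(int x) const = 0;          // returns the cell holding x, or -1
    virtual int HashFunction(int x) const = 0;  // supplied by the user-defined class

    // Assumed helper: store x in the first empty cell of its probe sequence.
    bool Insert(int x) {
        for (int i = 0; i < Size(); ++i) {
            int c = Cell(x, i);
            if (table_[c] == -1) {
                table_[c] = x;
                return true;
            }
        }
        return false;
    }

protected:
    virtual int Offset(int x, int i) const = 0;  // the probing function f(x, i)
    int Size() const { return static_cast<int>(table_.size()); }
    int Cell(int x, int i) const { return (HashFunction(x) + Offset(x, i)) % Size(); }

    // Probe loop shared by the derived Find() implementations.
    int Search(int x) const {
        for (int i = 0; i < Size(); ++i) {
            int c = Cell(x, i);
            if (table_[c] == -1) return -1;
            if (table_[c] == x) return c;
        }
        return -1;
    }

    std::vector<int> table_;
};

class LinHashTable : public HashBase {
public:
    using HashBase::HashBase;
    int Find(int x) const override { return Search(x); }
protected:
    int Offset(int, int i) const override { return i; }
};

class QuadHashTable : public HashBase {
public:
    using HashBase::HashBase;
    int Find(int x) const override { return Search(x); }
protected:
    int Offset(int, int i) const override { return i * i; }
};

class DoubHashTable : public HashBase {
public:
    using HashBase::HashBase;
    int Find(int x) const override { return Search(x); }
    virtual int SecondHash(int x) const = 0;  // must never return 0
protected:
    int Offset(int x, int i) const override { return i * SecondHash(x); }
};

// Example user-defined class: the double-hashing example's functions, 7 cells.
class MyDoubHash : public DoubHashTable {
public:
    MyDoubHash() : DoubHashTable(7) {}
    int HashFunction(int x) const override { return x % 7; }
    int SecondHash(int x) const override { return 5 - x % 5; }
};
```

In this design, a user class chooses the probing method by deriving from one table type. It only supplies the hash function or functions.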
26
STL (and Java) Hash-related Classes
  • Implemented as extensions to the C++ standard library
      • hash_set, hash_multiset
      • hash_map, hash_multimap
      • A number of predefined hash functions are
        available through the function object hash<T>
  • Compare to java.util:
      • HashMap, LinkedHashMap
      • HashSet, LinkedHashSet
      • Hashtable (open hashing)
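The hash_set and hash_map classes listed above were pre-standard extensions. C++11 later standardized the same idea as `std::unordered_set` and `std::unordered_map`, plus the multi variants. These containers hash keys with `std::hash<Key>` by default. They also expose their bucket structure through `bucket_count()`, `load_factor()`, and `max_load_factor()`. The word-counting function here is only an illustration, and its name `countWords` is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Count word occurrences with std::unordered_map (keys hashed by std::hash<std::string>).
std::unordered_map<std::string, int> countWords(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;
    // The container adds buckets (rehashes) whenever size() / bucket_count()
    // would exceed this value, mirroring a 50% load-factor threshold.
    counts.max_load_factor(0.5f);
    for (const std::string& w : words) {
        ++counts[w];
    }
    return counts;
}
```

The default `max_load_factor()` is 1.0. Lowering it trades memory for shorter bucket chains, which mirrors the deck's rehashing discussion.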

27
Practice Problems
  • Insert the keys and draw the resulting table and
    the # of probes used with the following hashing
    schemes (no rehashing):
      1) separate chaining (Open Hashing) (# probes N/A)
      2) open addressing (Closed Hashing) with linear
         probing
      3) open addressing (Closed Hashing) with double
         hashing
  • The output of the hash functions has been
    provided.

K | h(k) | h2(k) | # probes
A |  2   |  3    |
B |  5   |  6    |
C |  6   |  4    |
D |  5   |  2    |
E |  6   |  5    |
F |  1   |  3    |

Table cells: 0 1 2 3 4 5 6