Title: Hashing
1Hashing
- Notes from Weiss, Ch. 20, and notes by Greg McCarra
from Napier University:
http://www.nada.kth.se/kurser/kth/2D1345/inda03/hashingReading.pdf
2Introduction
- What is hashing? Why is it useful to us?
- Many applications need to support ONLY the
operations INSERT, SEARCH, and DELETE. These are
known as dictionary operations.
- Hashing supports each of these operations in O(1)
time on average and O(n) in the worst case, and
it is quite fast in practice. Let's learn more.
3What is it?
- A hash table or hash map is a data structure that
uses a hash function to efficiently translate
certain keys (e.g., person names) into associated
values (e.g., their telephone numbers). The hash
function is used to transform the key into the
index (the hash) of an array element (the slot or
bucket) where the corresponding value is to be
sought.
- Ideally the hash function should map each
possible key to a different slot index, but this
goal is rarely achievable in practice. Most hash
table designs assume that hash collisions (pairs
of different keys with the same hash values) are
normal occurrences, and accommodate them in some
way.
- In a well-dimensioned hash table, the average
cost (number of instructions) for each lookup is
independent of the number of elements stored in
the table. Many hash table designs also allow
arbitrary insertions and deletions of key-value
pairs, at constant average (indeed, amortized)
cost per operation.
- In many situations, hash tables turn out to be
more efficient than search trees or any other
table lookup structure. For this reason, they are
widely used in all kinds of computer software.
- Source: Wikipedia
4Example
- We have a small group of people who wish to join
a club (say about 40 folks). If each of these
people has an ID (from 1 to 40), we could store
their information in an array and access it using
the ID as the array index.
5Example
- Now, we have 7 of these clubs, with consecutive
IDs going up to 280. Now what?
- We COULD create a 280-element array for each club
and use only 40 elements of it. (Wasteful?)
- We COULD create a 40-element array for each club
and calculate the index of each person using a
mapping. For the last club (IDs 241 to 280),
index = ID - 240.
6Example
- Now, imagine that we are hosting a club on campus
that is open to all students. We could use the PC
ID (8 digits long). How big should our array be?
- THINGS TO CONSIDER
- How many students do we expect to join?
- How can we derive an array index from this number?
7Hash Functions
- If we expect no more than 100 club members, we
can use the last two digits of the PC ID as our
index (the hash). Do we see any problems with
this?
- How do we get this number?
- Take the remainder:
- (PC ID % 100)
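The remainder step can be sketched in Java as follows. This is only an illustration: the class name `ClubHash` and the constant `TABLE_SIZE` are made up for this example, and the PC ID is assumed to fit in an `int`.

```java
// Hypothetical helper illustrating the "last two digits" idea.
class ClubHash {
    // Assumed table size: we expect no more than 100 club members.
    static final int TABLE_SIZE = 100;

    // Maps an 8-digit PC ID to an array index in the range 0..99.
    static int hash(int pcId) {
        return pcId % TABLE_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(hash(20061234)); // last two digits: 34
        System.out.println(hash(20061207)); // last two digits: 7
    }
}
```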
8Hash Functions
- Taking the remainder is called the
division-remainder technique and is an example of
a uniform hash function.
- A uniform hash function is designed to distribute
the keys roughly evenly into the available
positions within the array (or hash table).
9Collisions
- So what about students 20061234 and 20071234?
They will hash to the same position in the table!
What do we do?
10Collisions
- If no two values can map into the same position
in the hash table, we have what is known as ideal
hashing. For the hash function f, each key k maps
into position f(k). Then, to search for an
element, we simply compute its hash function and
look it up in the table.
11Collisions
- Usually, ideal hashing is not possible (or at
least not guaranteed). Some data is bound to
hash to the same table element, in which case we
have a collision.
- How do we solve this problem?
12Collisions
- We can think of each table location as a bucket
that contains several slots. Each slot is filled
with one piece of data.
- This approach involves chaining the data. This
is a common approach when the hash table is used
as disk storage. For each element of the table,
a linked list (of sorts) is maintained to hold
the data that map to the same location. New items
can simply be appended to the list as they arrive
(unordered), or they can be inserted in sorted
order (for easier retrieval).
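A minimal sketch of chaining in Java, assuming integer keys, the division-remainder hash, and unordered lists. The class `ChainedTable` and its method names are hypothetical and are not taken from Weiss or from the java.util library.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical chained hash table for integer keys.
class ChainedTable {
    // One linked list (chain) per table location.
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainedTable(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Division-remainder hash; floorMod keeps the index non-negative.
    private int hash(int key) {
        return Math.floorMod(key, buckets.size());
    }

    // Appends the key to its chain (unordered) unless it is already present.
    void insert(int key) {
        LinkedList<Integer> chain = buckets.get(hash(key));
        if (!chain.contains(key)) {
            chain.add(key);
        }
    }

    // Searches only the one chain the key can be in.
    boolean contains(int key) {
        return buckets.get(hash(key)).contains(key);
    }

    // Number of keys stored at one table location.
    int chainLength(int index) {
        return buckets.get(index).size();
    }
}
```

With a table of size 100, students 20061234 and 20071234 both land in the chain at location 34, and both can still be found.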
13Collisions
- Other solutions?
- Linear Probing
- Quadratic Probing
- Designing a Good Hash Function
14Linear Probing
- Have you ever been to a theatre or sports event
where the tickets were numbered?
- Has someone ever sat in your seat?
- How did you resolve this problem?
15Linear Probing
- Linear probing handles a collision by starting at
the hashed location and moving through the array
one slot at a time (circling back to the
beginning if necessary) until an open location is
found.
16Linear Probing
- Let's say that we have 1000 numbered tickets to
an event, but only sell 400. If we move the event
to a smaller venue, we must also renumber the
tickets. The hash function would work like this:
- (ticket number) % 400
- How many folks can get the same hashed number?
(Up to 3. For example, tickets 42, 442, and 842.)
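The count can be checked with a few lines of Java. This sketch assumes the tickets are numbered 1 to 1000; the class and method names are invented for the example.

```java
// Hypothetical check of how many tickets share one hashed seat number.
class TicketHash {
    // The renumbering rule from the slide: ticket number mod 400.
    static int seat(int ticket) {
        return ticket % 400;
    }

    // Counts the tickets in 1..1000 (assumed numbering) that hash to a seat.
    static int ticketsForSeat(int seat) {
        int count = 0;
        for (int ticket = 1; ticket <= 1000; ticket++) {
            if (seat(ticket) == seat) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(ticketsForSeat(42));  // 3: tickets 42, 442, 842
        System.out.println(ticketsForSeat(300)); // 2: tickets 300, 700
    }
}
```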
17Linear Probing
- The idea is that even though these numbers hash to
the same location, each still needs its own slot,
chosen relative to its hash index. Using linear
probing, each entry is placed into the next
available position.
18Linear Probing
- Consider inserting data with keys 24, 42, 34, 62,
73 into a table of size 10. These entries can be
placed into the table at the following locations:
19Linear Probing
- 24 % 10 = 4. Position is free. 24 placed into
element 4.
- 42 % 10 = 2. Position is free. 42 placed into
element 2.
- 34 % 10 = 4. Position is occupied. Try next place
in the table (5). 34 placed into position 5.
- 62 % 10 = 2. Position is occupied. Try next place
in the table (3). 62 placed into position 3.
- 73 % 10 = 3. Position is occupied. Try next place
in the table (4). Same problem. Try (5). Then
(6). 73 is placed into position 6.
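The walk-through above can be reproduced with a short Java sketch. It assumes an `Integer[]` table in which `null` marks an empty slot; `LinearProbingDemo` and `insert` are illustrative names, not part of any library.

```java
import java.util.Arrays;

// Hypothetical linear-probing insert for integer keys.
class LinearProbingDemo {
    // Places key in the first free slot at or after its home position,
    // wrapping around the end of the array. Returns the index used,
    // or -1 if the table is full.
    static int insert(Integer[] table, int key) {
        int home = Math.floorMod(key, table.length);
        for (int i = 0; i < table.length; i++) {
            int index = (home + i) % table.length;
            if (table[index] == null) {
                table[index] = key;
                return index;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {24, 42, 34, 62, 73}) {
            System.out.println(key + " placed into position " + insert(table, key));
        }
        // [null, null, 42, 62, 24, 34, 73, null, null, null]
        System.out.println(Arrays.toString(table));
    }
}
```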
20Linear Probing
- How would it look if the numbers were
28, 19, 59, 68, 89?
21Finding and Deleting
- Finding?
- Deleting?
- We must be more careful. Having found the
element, we can't just remove it. Why?
- Use lazy deletion.
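One way to see the reason: a search stops at the first empty slot, so emptying a slot in the middle of a probe sequence would hide every key stored after it. Lazy deletion marks the slot as deleted instead. The sketch below assumes linear probing and integer keys; the class `LazyDeletionTable` and its state constants are hypothetical.

```java
// Hypothetical linear-probing table with lazy deletion (tombstones).
class LazyDeletionTable {
    private static final int EMPTY = 0;
    private static final int OCCUPIED = 1;
    private static final int DELETED = 2;

    private final int[] keys;
    private final int[] state; // one of EMPTY, OCCUPIED, DELETED per slot

    LazyDeletionTable(int size) {
        keys = new int[size];
        state = new int[size];
    }

    private int home(int key) {
        return Math.floorMod(key, keys.length);
    }

    // Returns the index of key, or -1. The search skips deleted slots
    // and stops only at a slot that has never been used.
    int find(int key) {
        for (int i = 0; i < keys.length; i++) {
            int index = (home(key) + i) % keys.length;
            if (state[index] == EMPTY) {
                return -1;
            }
            if (state[index] == OCCUPIED && keys[index] == key) {
                return index;
            }
        }
        return -1;
    }

    // Inserts key into the first slot that is empty or marked deleted.
    boolean insert(int key) {
        if (find(key) >= 0) {
            return true; // already present
        }
        for (int i = 0; i < keys.length; i++) {
            int index = (home(key) + i) % keys.length;
            if (state[index] != OCCUPIED) {
                keys[index] = key;
                state[index] = OCCUPIED;
                return true;
            }
        }
        return false; // table is full
    }

    // Marks the slot as deleted instead of emptying it.
    boolean delete(int key) {
        int index = find(key);
        if (index < 0) {
            return false;
        }
        state[index] = DELETED;
        return true;
    }
}
```

For example, in a table of size 10, insert 24 and 34 (34 is pushed to position 5), then delete 24. A search for 34 still starts at position 4, passes over the deleted marker, and finds 34 at position 5.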
22Clustering
- Sometimes, data will cluster. This happens when
many elements hash to the same (or nearby)
locations and linear probing has been used often.
We can reduce this problem by carefully choosing
the divisor in our hash function and our table
size.
23Designing a Good Hash Function
- If the divisor is even and there are more even
than odd key values, the hash function will
produce an excess of even results. Likewise, an
excess of odd key values produces an excess of
odd results.
- However, if the divisor is odd, then either kind
of excess of key values would still give a
balanced distribution of odd/even results.
- Thus, the divisor should be odd. But this is not
enough.
24Designing a Good Hash Function
- If the divisor itself is divisible by a small odd
number (like 3, 5, or 7), the results are
unbalanced again. Ideally, the divisor should be
a prime number. If no suitable prime works for
our table size (the divisor, remember?), we
should use an odd number with no small factors.
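The effect of the divisor can be illustrated with a small experiment. The sketch below hashes 100 keys that are all multiples of a given step and counts how many distinct table positions are hit; `DivisorDemo` and `slotsUsed` are names chosen for this example, and the key pattern is an assumption made to exaggerate the imbalance.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical experiment: how many slots do patterned keys reach?
class DivisorDemo {
    // Hashes the keys 0, step, 2*step, ... (100 keys in total) with the
    // division-remainder technique and returns the number of distinct
    // positions that receive at least one key.
    static int slotsUsed(int step, int divisor) {
        Set<Integer> used = new HashSet<>();
        for (int j = 0; j < 100; j++) {
            used.add((j * step) % divisor);
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println(slotsUsed(2, 10)); // 5: even keys, even divisor
        System.out.println(slotsUsed(2, 11)); // 11: even keys, prime divisor
        System.out.println(slotsUsed(3, 9));  // 3: divisor shares the factor 3
        System.out.println(slotsUsed(3, 11)); // 11: prime divisor again
    }
}
```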
25Problems of Linear Probing
- The majority of the problems are caused by
clustering. These problems can be helped by
using Quadratic probing instead.
26Quadratic Probing
- Works like linear probing, but instead of trying
the very next position each time, the next
location is chosen by looking at the positions
that are 1^2, 2^2, 3^2, etc. positions ahead of
the hashed location.
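Written as a formula, and assuming the common convention that the offsets are measured from the hashed (home) location h(k) in a table of size m, the i-th location tried for key k is:

```latex
h_i(k) = \bigl(h(k) + i^2\bigr) \bmod m, \qquad i = 0, 1, 2, \ldots
```

Here h(k) = k mod m is the division-remainder hash, so the locations tried are h(k), h(k) + 1, h(k) + 4, h(k) + 9, and so on, each taken mod m.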
27Quadratic Probing
- Consider inserting data with keys 24, 42, 34, 62,
73 into a table of size 10. These entries can be
placed into the table at the following locations:
28Quadratic Probing
- 24 % 10 = 4. Position is free. 24 placed into
element 4.
- 42 % 10 = 2. Position is free. 42 placed into
element 2.
- 34 % 10 = 4. Position is occupied. Try the place
1^2 away in the table (5). 34 placed into
position 5.
- 62 % 10 = 2. Position is occupied. Try the place
1^2 away in the table (3). 62 placed into
position 3.
- 73 % 10 = 3. Position is occupied. Try the place
1^2 away in the table (4). Same problem. Try the
place 2^2 away in the table (7). 73 is placed
into position 7.
- Thus, we jumped over the existing cluster.
- This doesn't completely solve our problem, but it
helps.
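A Java sketch of quadratic probing, under the convention that the i-th try is i^2 positions past the hashed location (so key 73, hashed to 3, tries positions 3, 4, and then 7). The names `QuadraticProbingDemo` and `insert` are illustrative, and `null` is assumed to mark an empty slot.

```java
import java.util.Arrays;

// Hypothetical quadratic-probing insert for integer keys.
class QuadraticProbingDemo {
    // Tries positions home, home + 1, home + 4, home + 9, ... (mod table
    // size). Returns the index used, or -1 if no free slot was found
    // within table.length tries. Unlike linear probing, this sequence
    // is not guaranteed to visit every slot.
    static int insert(Integer[] table, int key) {
        int home = Math.floorMod(key, table.length);
        for (int i = 0; i < table.length; i++) {
            int index = (home + i * i) % table.length;
            if (table[index] == null) {
                table[index] = key;
                return index;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {24, 42, 34, 62, 73}) {
            System.out.println(key + " placed into position " + insert(table, key));
        }
        // [null, null, 42, 62, 24, 34, null, 73, null, null]
        System.out.println(Arrays.toString(table));
    }
}
```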
29Quadratic Probing
- How would it look if the numbers were
28, 19, 59, 68, 89?
30Advantages
- Fast: average constant time (O(1)) for finding
information, which is especially apparent when
the table is large.
- If the key/value pairs are known before
programming (disallowing insertions/deletions of
new data into the table), the programmer can
reduce average lookup cost by a careful choice of
the hash function, bucket table size, and
internal data structures. (Sometimes this allows
for perfect hashing.)
- Source: Wikipedia
31Perfect Hashing
- If all of the keys that will be used are known
ahead of time, and there are no more keys than
can fit the hash table, a perfect hash function
can be used to create a perfect hash table, in
which there will be no collisions. If minimal
perfect hashing is used, every location in the
hash table can be used as well.
- Perfect hashing allows for constant time lookups
in the worst case. This is in contrast to most
chaining and open addressing methods, where the
time for lookup is low on average, but may be
arbitrarily large.
- Source: Wikipedia
32Drawbacks
- More difficult to implement than search trees.
- Though operations take O(1) on average, the cost
of computing the hash function can be much higher
than a comparison step in a list or tree lookup,
so for small amounts of data, hash tables are not
as effective as a good tree structure.
- Can be very inefficient if there are many
collisions.
- Although this is unlikely in normal practice, a
crafty (malicious) user who knows the hash
function can supply keys that force worst-case
behavior by creating excessive collisions,
causing poor performance (a denial-of-service
attack).
- Source: Wikipedia
33Implementation
- In Java, hash tables are available through set
and map classes.
- The Set interface and the HashSet class are in the
java.util package.
- Sets in Java have four fundamental operations:
- Adding an element (add method)
- Removing an element (remove method)
- Containment testing: is the element in the set?
(contains method)
- Listing all elements in arbitrary order (using an
iterator for the set, with the hasNext and next
methods in a loop)
34ch16/set/SetDemo.java

    import java.util.HashSet;
    import java.util.Scanner;
    import java.util.Set;

    /**
       This program demonstrates a set of strings. The user
       can add and remove strings.
    */
    public class SetDemo
    {
       public static void main(String[] args)
       {
          Set<String> names = new HashSet<String>();
          Scanner in = new Scanner(System.in);

          // First loop: add names until the user enters Q.
          boolean done = false;
          while (!done)
          {
             System.out.print("Add name, Q when done: ");
             String input = in.next();
             if (input.equalsIgnoreCase("Q"))
                done = true;
             else
             {
                names.add(input);
                print(names);
             }
          }

          // Second loop: remove names until the user enters Q.
          done = false;
          while (!done)
          {
             System.out.print("Remove name, Q when done: ");
             String input = in.next();
             if (input.equalsIgnoreCase("Q"))
                done = true;
             else
             {
                names.remove(input);
                print(names);
             }
          }
       }

       /**
          Prints the contents of a set of strings.
          @param s a set of strings
       */
       private static void print(Set<String> s)
       {
          System.out.print("{ ");
          for (String element : s)
          {
             System.out.print(element);
             System.out.print(" ");
          }
          System.out.println("}");
       }
    }
37Maps
- A set stores elements. A map stores associations
between keys and values.
- In Java, hash tables are also available as maps.
- All of these types are found in the java.util
package:
- java.util.Map
- java.util.Set
- java.util.HashMap
- java.util.HashSet
38Maps
- A map keeps associations between key and value
objects.
- Mathematically speaking, a map is a function from
one set, the key set, to another set, the value
set.
- Every key in a map is associated with exactly one
value.
- A value may be associated with several keys.
- Classes that implement the Map interface:
- HashMap
- TreeMap
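These properties can be seen in a few lines of Java. The sketch below is only an illustration: the class `MapSemanticsDemo` and the helper `build` are invented for this example, and strings stand in for the value objects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo of the key/value rules listed above.
class MapSemanticsDemo {
    // Fills any Map implementation with the same three put calls.
    static Map<String, String> build(Map<String, String> map) {
        map.put("Juliet", "pink");
        map.put("Eve", "pink");     // one value may belong to several keys
        map.put("Juliet", "green"); // same key again: the old value is replaced
        return map;
    }

    public static void main(String[] args) {
        // HashMap: iteration order is not specified.
        System.out.println(build(new HashMap<>()));
        // TreeMap: keys are kept in sorted order.
        System.out.println(build(new TreeMap<>())); // {Eve=pink, Juliet=green}
    }
}
```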
39An Example of a Map
40ch16/map/MapDemo.java

    import java.awt.Color;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    /**
       This program demonstrates a map that maps names to colors.
    */
    public class MapDemo
    {
       public static void main(String[] args)
       {
          Map<String, Color> favoriteColors
                = new HashMap<String, Color>();
          favoriteColors.put("Juliet", Color.PINK);
          favoriteColors.put("Romeo", Color.GREEN);
          favoriteColors.put("Adam", Color.BLUE);
          favoriteColors.put("Eve", Color.PINK);

          // Visit every key and look up its associated value.
          Set<String> keySet = favoriteColors.keySet();
          for (String key : keySet)
          {
             Color value = favoriteColors.get(key);
             System.out.println(key + "->" + value);
          }
       }
    }