Hashing

About This Presentation

Slides: 42

Provided by: acade124

Transcript and Presenter's Notes

Title: Hashing

1
Hashing

Notes from Weiss, Ch 20 and Notes by Greg McCarra
from Napier University:
http://www.nada.kth.se/kurser/kth/2D1345/inda03/hashingReading.pdf

2
Introduction

What is hashing? Why is it useful to us?
Well, there are lots of applications out there
that need to support ONLY the operations INSERT,
SEARCH, and DELETE. These are known as
dictionary operations.
Hashing can perform these operations in O(1) time
on average, although the worst case is O(n), and it
is quite fast in practice. Let's learn more.

3
What is it?

A hash table or hash map is a data structure that
uses a hash function to efficiently translate
certain keys (e.g., person names) into associated
values (e.g., their telephone numbers). The hash
function is used to transform the key into the
index (the hash) of an array element (the slot or
bucket) where the corresponding value is to be
sought.
Ideally the hash function should map each
possible key to a different slot index but this
goal is rarely achievable in practice. Most hash
table designs assume that hash collisions (pairs
of different keys with the same hash values) are
normal occurrences, and accommodate them in some
way.
In a well-dimensioned hash table, the average
cost (number of instructions) for each lookup is
independent of the number of elements stored in
the table. Many hash table designs also allow
arbitrary insertions and deletions of key-value
pairs, at constant average (indeed, amortized)
cost per operation.
In many situations, hash tables turn out to be
more efficient than search trees or any other
table lookup structure. For this reason, they are
widely used in all kinds of computer software.
----Wikipedia

4
Example

We have a small group of people who wish to join
a club (say about 40 folks). If each of
these people has an ID associated with them
(from 1 to 40), we could store their information
in an array and access it using the ID as the
array index.

5
Example

Now, we have 7 of these clubs, with consecutive
IDs going up to 280. Now what?
We COULD create a 280-element array for each club
and use only 40 elements of each array. (Wasteful?)
We COULD create a 40-element array for each club
and calculate the index of each person using a
mapping (for example, for the seventh club,
index = ID - 240).
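
The mapping idea can be sketched in Java. This is only an illustration: the class and method names are hypothetical, and the general formula is an assumption that extends the slide's single case (index = ID - 240 for the seventh club) to all seven clubs of 40 members.

```java
// Sketch only: ClubIndex, slotFor and clubFor are hypothetical names.
// Assumes IDs run 1..280 and every club holds exactly 40 consecutive IDs.
class ClubIndex {
    static final int CLUB_SIZE = 40;

    // 1-based position of a member inside the club's own 40-element array.
    static int slotFor(int id) {
        return (id - 1) % CLUB_SIZE + 1;
    }

    // Which club (1..7) the ID belongs to.
    static int clubFor(int id) {
        return (id - 1) / CLUB_SIZE + 1;
    }

    public static void main(String[] args) {
        System.out.println(slotFor(241)); // prints 1, the same as 241 - 240
        System.out.println(slotFor(280)); // prints 40
        System.out.println(clubFor(280)); // prints 7
    }
}
```

Because Java arrays start at 0, a real array would use slotFor(id) - 1 as the index.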

6
Example

Now, imagine that we are hosting a club on campus
open to all students. We could use the PC ID (8
digits long). How big should our array be?
THINGS TO CONSIDER
How many students do we expect to join?
How can we create a key based on this number?

7
Hash Functions

If we expect no more than 100 club members, we
can use the last two digits of the PC ID as our
index (aka KEY). Do we see any problems with
this?
How do we get this number?
Take the remainder:
PC ID % 100

8
Hash Functions

Taking the remainder is called the
division-remainder technique and is an example of
a uniform hash function.
A uniform hash function is designed to distribute
the keys roughly evenly into the available
positions within the array (or hash table).

9
Collisions

So what about students 20061234 and 20071234?
They will hash to the same position in the table!
What do we do?
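
The collision is easy to reproduce. The sketch below assumes non-negative integer IDs and a table of 100 positions; the class name DivisionHash is made up for the example.

```java
// Sketch only: DivisionHash is a hypothetical name.
// Assumes keys are non-negative, so % never yields a negative index.
class DivisionHash {
    // Division-remainder technique: the index is the remainder key / tableSize.
    static int hash(int key, int tableSize) {
        return key % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(hash(20061234, 100)); // prints 34
        System.out.println(hash(20071234, 100)); // prints 34, a collision
    }
}
```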

10
Collisions

If no two values are able to map into the same
position in the hash table, we have what is known
as ideal hashing. For the hash function f,
each key k maps into position f(k). Then, to
search for an element, we simply compute its hash
value and look it up in the table.

11
Collisions

Usually, ideal hashing is not possible (or at
least not guaranteed). Some data is bound to
hash to the same table element, in which case, we
have a collision.
How do we solve this problem?

12
Collisions

We can think of each table location as a bucket
that contains several slots. Each slot is filled
with one piece of data.
This approach involves chaining the data. It
is a common approach when the hash table is used
as disk storage. For each element of the table,
a linked list (of sorts) is maintained to hold
data that map to the same location. Items can
simply be appended to this list as they are
entered (unordered), or they can be inserted in
sorted order (for easier retrieval).
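
A minimal in-memory sketch of chaining might look like the following. It assumes integer keys, the division-remainder hash, and unordered lists; ChainedTable and its method names are hypothetical, not taken from the slides.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch only: a chained hash table of int keys (hypothetical class).
class ChainedTable {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainedTable(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    private int hash(int key) {
        return Math.floorMod(key, buckets.size());
    }

    // Appends the key to its bucket's list unless it is already stored.
    void insert(int key) {
        LinkedList<Integer> chain = buckets.get(hash(key));
        if (!chain.contains(key)) {
            chain.add(key);
        }
    }

    boolean contains(int key) {
        return buckets.get(hash(key)).contains(key);
    }

    boolean remove(int key) {
        // Integer.valueOf selects remove(Object), not remove(int index).
        return buckets.get(hash(key)).remove(Integer.valueOf(key));
    }

    int chainLength(int slot) {
        return buckets.get(slot).size();
    }

    public static void main(String[] args) {
        ChainedTable table = new ChainedTable(100);
        table.insert(20061234);
        table.insert(20071234);
        System.out.println(table.chainLength(34)); // prints 2
    }
}
```

Both colliding IDs end up in the list for position 34, so a search only has to walk that one short list.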

13
Collisions

Other solutions?
Linear Probing
Quadratic Probing
Designing a Good Hash Function

14
Linear Probing

Have you ever been to a theatre or sports event
where the tickets were numbered?
Has someone ever sat in your seat?
How did you resolve this problem?

15
Linear Probing

Linear probing means that when an item already
occupies the hashed location, we move by 1 through
the array (circling to the beginning if necessary)
until an open location is found.

16
Linear Probing

Let's say that we have 1000 numbered tickets to
an event, but only sell 400. If we move the event
to a smaller venue, we must also renumber the
tickets. The hash function would work like this:
(ticket number) % 400.
How many folks can get the same hashed number?
(Up to 3. For example, tickets 42, 442, and 842.)

17
Linear Probing

The idea is that even though these numbers hash to
the same location, they need to be given a slot
based on their hash number index. Using linear
probing, the entries are placed into the next
available position.

18
Linear Probing

Consider inserting the data with keys 24, 42, 34,
62, 73 into a table of size 10. These entries can
be placed into the table at the following locations:

19
Linear Probing

24 % 10 = 4. Position is free. 24 placed into
element 4.
42 % 10 = 2. Position is free. 42 placed into
element 2.
34 % 10 = 4. Position is occupied. Try next place
in the table (5). 34 placed into position 5.
62 % 10 = 2. Position is occupied. Try next place
in the table (3). 62 placed into position 3.
73 % 10 = 3. Position is occupied. Try next place
in the table (4). Same problem. Try (5). Then
(6). 73 is placed into position 6.
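
The walk-through can be replayed with a short Java sketch. It assumes non-negative keys and uses -1 to mark an empty slot; the class name LinearProbing and the EMPTY marker are choices made for this example only.

```java
import java.util.Arrays;

// Sketch only: linear probing insertion for non-negative int keys.
class LinearProbing {
    static final int EMPTY = -1; // assumption: -1 is never a real key

    // Places each key at key % size, stepping forward by 1 (wrapping
    // around to slot 0) while the current slot is occupied.
    static int[] build(int[] keys, int size) {
        int[] table = new int[size];
        Arrays.fill(table, EMPTY);
        for (int key : keys) {
            int slot = key % size;
            int probes = 0;
            while (table[slot] != EMPTY) {
                if (++probes == size) {
                    throw new IllegalStateException("table is full");
                }
                slot = (slot + 1) % size;
            }
            table[slot] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] table = build(new int[] {24, 42, 34, 62, 73}, 10);
        System.out.println(Arrays.toString(table));
        // prints [-1, -1, 42, 62, 24, 34, 73, -1, -1, -1]
    }
}
```

Positions 2 through 6 are now one contiguous run of occupied slots, which is the clustering effect discussed later.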

20
Linear Probing

How would it look if the numbers were
28, 19, 59, 68, 89??

21
Finding and Deleting

Finding?
Deleting?
We must be more careful. Having found the
element, we can't just remove it. Why?
Use lazy deletion.
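
One way to see the problem: with linear probing, a search stops at the first empty slot, so emptying a slot can hide keys that were placed beyond it. Lazy deletion marks the slot as deleted instead. The sketch below is an assumed design (three slot states, int keys, hypothetical class name), not the version from the course text.

```java
import java.util.Arrays;

// Sketch only: linear probing with lazy deletion (hypothetical class).
class LazyDeletionTable {
    private enum State { EMPTY, OCCUPIED, DELETED }

    private final int[] keys;
    private final State[] states;

    LazyDeletionTable(int size) {
        keys = new int[size];
        states = new State[size];
        Arrays.fill(states, State.EMPTY);
    }

    // Returns the slot holding key, or -1. A DELETED slot does not stop
    // the search; only a truly EMPTY slot does.
    private int find(int key) {
        int slot = Math.floorMod(key, keys.length);
        for (int i = 0; i < keys.length; i++) {
            if (states[slot] == State.EMPTY) return -1;
            if (states[slot] == State.OCCUPIED && keys[slot] == key) return slot;
            slot = (slot + 1) % keys.length;
        }
        return -1;
    }

    boolean contains(int key) {
        return find(key) >= 0;
    }

    // Inserts into the first EMPTY or DELETED slot on the probe path.
    boolean insert(int key) {
        if (find(key) >= 0) return false; // already stored
        int slot = Math.floorMod(key, keys.length);
        for (int i = 0; i < keys.length; i++) {
            if (states[slot] != State.OCCUPIED) {
                keys[slot] = key;
                states[slot] = State.OCCUPIED;
                return true;
            }
            slot = (slot + 1) % keys.length;
        }
        return false; // table is full
    }

    // Marks the slot as DELETED rather than EMPTY.
    boolean delete(int key) {
        int slot = find(key);
        if (slot < 0) return false;
        states[slot] = State.DELETED;
        return true;
    }

    public static void main(String[] args) {
        LazyDeletionTable table = new LazyDeletionTable(10);
        table.insert(24); // lands in slot 4
        table.insert(34); // collides, lands in slot 5
        table.delete(24); // slot 4 becomes DELETED, not EMPTY
        System.out.println(table.contains(34)); // prints true
    }
}
```

Had slot 4 been reset to empty, the search for 34 would have stopped there and reported it missing.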

22
Clustering

Sometimes, data will cluster. This is caused
when many elements hash to the same (or similar)
location and linear probing has been used often.
We can help with this problem by choosing our
divisor carefully in our hash function and by
carefully choosing our table size.

23
Designing a Good Hash Function

If the divisor is even and there are more even
than odd key values, the hash function will
produce an excess of even values. This is also
true, with odd results, if there is an excess of
odd key values.
However, if the divisor is odd, then either kind
of excess of key values would still give a
balanced distribution of odd/even results.
Thus, the divisor should be odd. But, this is not
enough.

24
Designing a Good Hash Function

Thus, the divisor should be odd. But, this is not
enough.
If the divisor itself is divisible by a small odd
number (like 3, 5, or 7) the results are
unbalanced again. Ideally, it should be a prime
number. If no such prime number works for our
table size (the divisor, remember?), we should
use an odd number with no small factors.
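
The effect of the divisor can be observed by counting how many keys land in each slot. The sketch below uses a made-up key set (the even numbers 2 through 24) and compares divisor 10 with the prime divisor 11; the class name is hypothetical.

```java
import java.util.Arrays;

// Sketch only: compares slot usage for an even and a prime divisor.
class DivisorChoice {
    // counts[i] is the number of keys with key % divisor == i.
    // Assumes non-negative keys.
    static int[] histogram(int[] keys, int divisor) {
        int[] counts = new int[divisor];
        for (int key : keys) {
            counts[key % divisor]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] evenKeys = {2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24};
        System.out.println(Arrays.toString(histogram(evenKeys, 10)));
        // prints [2, 0, 3, 0, 3, 0, 2, 0, 2, 0]
        System.out.println(Arrays.toString(histogram(evenKeys, 11)));
        // prints [1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1]
    }
}
```

With divisor 10, every odd slot stays empty. With divisor 11, the same keys spread over all slots.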

25
Problems of Linear Probing

The majority of the problems are caused by
clustering. These problems can be helped by
using Quadratic probing instead.

26
Quadratic Probing

Works like linear probing, but instead of looking
to the next available position, the next location
is chosen by looking at the positions that are
1^2, 2^2, 3^2, etc. positions ahead of the
original hashed location.

27
Quadratic Probing

Consider inserting the data with keys 24, 42, 34,
62, 73 into a table of size 10. These entries can
be placed into the table at the following locations:

28
Quadratic Probing

24 % 10 = 4. Position is free. 24 placed into
element 4.
42 % 10 = 2. Position is free. 42 placed into
element 2.
34 % 10 = 4. Position is occupied. Try the place 1^2
away in the table (5). 34 placed into position 5.
62 % 10 = 2. Position is occupied. Try the place 1^2
away in the table (3). 62 placed into position 3.
73 % 10 = 3. Position is occupied. Try the place 1^2
away in the table (4). Same problem. Try the place
2^2 away from the original position (3 + 4 = 7).
73 is placed into position 7.
Thus, we jumped over the existing cluster.
This doesn't completely solve our problem, but it
helps.
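
Here is a sketch of quadratic probing in Java. It follows the common convention that the i-th probe is (home + i^2) % size, measured from the original hashed position, so 73 (home position 3) is tried at 4 and then at 3 + 4 = 7. The class name and the EMPTY marker are assumptions for the example.

```java
import java.util.Arrays;

// Sketch only: quadratic probing insertion for non-negative int keys.
class QuadraticProbing {
    static final int EMPTY = -1; // assumption: -1 is never a real key

    static int[] build(int[] keys, int size) {
        int[] table = new int[size];
        Arrays.fill(table, EMPTY);
        for (int key : keys) {
            int home = key % size;
            int slot = home;
            int i = 0;
            while (table[slot] != EMPTY) {
                i++;
                if (i >= size) {
                    // Quadratic probing may miss free slots, so give up
                    // after a bounded number of attempts.
                    throw new IllegalStateException("no free slot found for " + key);
                }
                slot = (home + i * i) % size; // 1, 4, 9, ... ahead of home
            }
            table[slot] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] table = build(new int[] {24, 42, 34, 62, 73}, 10);
        System.out.println(Arrays.toString(table));
        // prints [-1, -1, 42, 62, 24, 34, -1, 73, -1, -1]
    }
}
```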

29
Quadratic Probing

How would it look if the numbers were
28, 19, 59, 68, 89??

30
Advantages

Fast: average constant time (O(1)) for finding
information, especially apparent when the table is
large.
If the key/value pairs are known before
programming (disallowing insertions/deletions of
new data into the table), the programmer can
reduce average lookup cost by a careful choice of
the hash function, bucket table size, and
internal data structures. (Sometimes this allows
for perfect hashing)
---- Wikipedia

31
Perfect Hashing

If all of the keys that will be used are known
ahead of time, and there are no more keys than
can fit the hash table, a perfect hash function
can be used to create a perfect hash table, in
which there will be no collisions. If minimal
perfect hashing is used, every location in the
hash table can be used as well.
Perfect hashing allows for constant time lookups
in the worst case. This is in contrast to most
chaining and open addressing methods, where the
time for lookup is low on average, but may be
arbitrarily large.
---- Wikipedia
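
As a toy illustration only: when the keys are known in advance, one can search for a table size at which the division-remainder hash has no collisions. Real perfect hashing schemes are more sophisticated; the brute-force search and the class name below are assumptions made for this sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: brute-force search for a collision-free modulus.
// Assumes the keys are distinct and non-negative.
class PerfectModulus {
    // True if key % m is different for every key.
    static boolean collisionFree(int[] keys, int m) {
        Set<Integer> used = new HashSet<>();
        for (int key : keys) {
            if (!used.add(key % m)) {
                return false; // this slot was already taken
            }
        }
        return true;
    }

    // Smallest table size, starting at one slot per key, with no collisions.
    // Terminates for distinct keys: any m above the largest key works.
    static int smallestPerfectModulus(int[] keys) {
        int m = keys.length;
        while (!collisionFree(keys, m)) {
            m++;
        }
        return m;
    }

    public static void main(String[] args) {
        int[] keys = {24, 42, 34, 62, 73};
        System.out.println(smallestPerfectModulus(keys)); // prints 12
    }
}
```

The result is perfect but not minimal: 12 slots hold only 5 keys, so some locations stay unused.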

32
Drawbacks

Can be more difficult to implement than search
trees.
Though operations take O(1) on average, the cost
of computing the hash function can be much higher,
so for small amounts of data, hash tables are not
as effective as a good tree structure.
Can be very inefficient if there are many
collisions.
Although unlikely in normal practice, a crafty
(malicious) user can choose keys that force the
table into its worst-case behavior and create
excessive collisions, causing poor performance
(a denial-of-service attack).
---- Wikipedia

33
Implementation

In Java, hash tables are available as a set
or a map.
The Set interface and the HashSet class are in the
java.util package.
Sets in Java have four fundamental operations:
Adding an element (add method)
Removing an element (remove method)
Containment testing: is the element in the set?
(contains method)
Listing all elements, in arbitrary order (use an
iterator for the set, calling its hasNext and
next methods in a loop)
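
The four operations can be tried in a few lines. The names stored in the set are illustrative, and the class name SetOperations is hypothetical.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Sketch only: the four basic set operations on a HashSet.
class SetOperations {
    static Set<String> sample() {
        Set<String> names = new HashSet<String>();
        names.add("Romeo");
        names.add("Juliet");
        names.add("Romeo");   // duplicate: the set does not change
        names.add("Adam");
        names.remove("Adam");
        return names;
    }

    public static void main(String[] args) {
        Set<String> names = sample();
        System.out.println(names.contains("Juliet")); // prints true
        System.out.println(names.contains("Adam"));   // prints false

        // Listing: the iteration order of a HashSet is not specified.
        Iterator<String> iter = names.iterator();
        while (iter.hasNext()) {
            System.out.println(iter.next());
        }
    }
}
```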

34
ch16/set/SetDemo.java
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

/**
   This program demonstrates a set of strings. The user
   can add and remove strings.
*/
public class SetDemo
{
   public static void main(String[] args)
   {
      Set<String> names = new HashSet<String>();
      Scanner in = new Scanner(System.in);

      boolean done = false;
      while (!done)
      {
         System.out.print("Add name, Q when done: ");
         String input = in.next();
Continued
35
ch16/set/SetDemo.java (cont.)
         if (input.equalsIgnoreCase("Q"))
            done = true;
         else
         {
            names.add(input);
            print(names);
         }
      }

      done = false;
      while (!done)
      {
         System.out.print("Remove name, Q when done: ");
         String input = in.next();
         if (input.equalsIgnoreCase("Q"))
            done = true;
         else
         {
            names.remove(input);
            print(names);
         }
      }
   }
Continued
36
ch16/set/SetDemo.java (cont.)

   /**
      Prints the contents of a set of strings.
      @param s a set of strings
   */
   private static void print(Set<String> s)
   {
      System.out.print("{ ");
      for (String element : s)
      {
         System.out.print(element);
         System.out.print(" ");
      }
      System.out.println("}");
   }
}
Continued
37
Maps

A set stores elements. A map stores associations
between keys and values.
Java's hash tables also come in map form
(HashMap).
All of these interfaces and classes are found in
the java.util package:
java.util.Map
java.util.Set
java.util.HashMap
java.util.HashSet

38
Maps

A map keeps associations between key and value
objects.
Mathematically speaking, a map is a function from
one set, the key set, to another set, the value
set.
Every key in a map is associated with exactly one
value.
A value may be associated with several keys.
Classes that implement the Map interface:
HashMap
TreeMap
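
These properties can be checked with a short sketch. The names and color strings are illustrative, and MapSemantics is a hypothetical class name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: one value per key, several keys per value.
class MapSemantics {
    static Map<String, String> fill(Map<String, String> colors) {
        colors.put("Romeo", "green");
        colors.put("Juliet", "pink");
        colors.put("Eve", "pink");   // one value shared by two keys
        colors.put("Romeo", "blue"); // replaces green: one value per key
        return colors;
    }

    public static void main(String[] args) {
        Map<String, String> hashed = fill(new HashMap<String, String>());
        System.out.println(hashed.get("Romeo")); // prints blue
        System.out.println(hashed.size());       // prints 3

        // A TreeMap keeps its keys in sorted order; a HashMap does not
        // promise any particular order.
        Map<String, String> sorted = fill(new TreeMap<String, String>());
        System.out.println(sorted.keySet()); // prints [Eve, Juliet, Romeo]
    }
}
```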

39
An Example of a Map
40
ch16/map/MapDemo.java
import java.awt.Color;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
   This program demonstrates a map that maps names to colors.
*/
public class MapDemo
{
   public static void main(String[] args)
   {
      Map<String, Color> favoriteColors
            = new HashMap<String, Color>();
      favoriteColors.put("Juliet", Color.PINK);
      favoriteColors.put("Romeo", Color.GREEN);
      favoriteColors.put("Adam", Color.BLUE);
      favoriteColors.put("Eve", Color.PINK);
Continued
41
ch16/map/MapDemo.java (cont.)

      Set<String> keySet = favoriteColors.keySet();
      for (String key : keySet)
      {
         Color value = favoriteColors.get(key);
         System.out.println(key + "-" + value);
      }
   }
}
Continued
