Hashing - PowerPoint PPT Presentation

Provided by: acade124

Transcript and Presenter's Notes

Title: Hashing


1
Hashing
  • Notes from Weiss, Ch 20 and Notes by Greg McCarra
    from Napier University:
    http://www.nada.kth.se/kurser/kth/2D1345/inda03/hashingReading.pdf

2
Introduction
  • What is hashing? Why is it useful to us?
  • Well, there are lots of applications out there
    that need to support ONLY the operations INSERT,
    SEARCH, and DELETE. These are known as
    dictionary operations.
  • With hashing, each of these operations takes O(1)
    time on average and O(n) in the worst case, and
    it is quite fast in practice. Let's learn more.

3
What is it?
  • A hash table or hash map is a data structure that
    uses a hash function to efficiently translate
    certain keys (e.g., person names) into associated
    values (e.g., their telephone numbers). The hash
    function is used to transform the key into the
    index (the hash) of an array element (the slot or
    bucket) where the corresponding value is to be
    sought.
  • Ideally the hash function should map each
    possible key to a different slot index but this
    goal is rarely achievable in practice. Most hash
    table designs assume that hash collisions (pairs
    of different keys with the same hash values) are
    normal occurrences, and accommodate them in some
    way.
  • In a well-dimensioned hash table, the average
    cost (number of instructions) for each lookup is
    independent of the number of elements stored in
    the table. Many hash table designs also allow
    arbitrary insertions and deletions of key-value
    pairs, at constant average (indeed, amortized)
    cost per operation.
  • In many situations, hash tables turn out to be
    more efficient than search trees or any other
    table lookup structure. For this reason, they are
    widely used in all kinds of computer software.
  • ----Wikipedia

4
Example
  • We have a small group of people who wish to join
    a club (say about 40 folks). Then, if each of
    these people has an ID associated with them
    (from 1 to 40) we could store their information
    in an array and access it using the ID as the
    array index.

5
Example
  • Now, we have 7 of these clubs, with consecutive
    IDs going up to 280. Now what?
  • We COULD create a 280 element array for each club
    and use 40 elements of the array. (wasteful?)
  • We COULD create a 40 element array and calculate
    the index of each person using a mapping (for the
    seventh club, whose IDs run from 241 to 280,
    index = ID - 240).
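
The offset mapping can be sketched in Java as follows. This is only an illustration, not code from the slides: the class name ClubRoster, its methods, and the sample names are assumptions. Because Java arrays start at index 0, the sketch subtracts the club's first ID (241) rather than 240.

```java
/** Hypothetical roster for one club whose member IDs are consecutive. */
final class ClubRoster {
    private final int firstId;      // lowest ID in this club, e.g. 241
    private final String[] members; // one slot per possible ID

    ClubRoster(int firstId, int size) {
        this.firstId = firstId;
        this.members = new String[size];
    }

    /** Maps an ID to a zero-based array index by subtracting the first ID. */
    int indexOf(int id) {
        int index = id - firstId;
        if (index < 0 || index >= members.length) {
            throw new IllegalArgumentException("ID out of range: " + id);
        }
        return index;
    }

    void store(int id, String name) {
        members[indexOf(id)] = name;
    }

    String lookup(int id) {
        return members[indexOf(id)];
    }

    public static void main(String[] args) {
        ClubRoster seventhClub = new ClubRoster(241, 40);
        seventhClub.store(241, "Ada");
        seventhClub.store(280, "Grace");
        System.out.println(seventhClub.lookup(280));
    }
}
```

The array holds exactly 40 slots, so no space is wasted, but the trick only works because the IDs are consecutive.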

6
Example
  • Now, imagine that we are hosting a club on campus
    open to all students. We could use the PC ID (8
    digits long). How big should our array be?
  • THINGS TO CONSIDER
  • How many students do we expect to join?
  • How can we create a key based on this number?

7
Hash Functions
  • If we expect no more than 100 club members, we
    can use the last two digits of the PC ID as our
    index (the hash of the KEY). Do we see any
    problems with this?
  • How do we get this number?
  • Take the remainder
  • (PC ID % 100)
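
A minimal Java sketch of taking the remainder, assuming the PC ID is stored as a non-negative int. The class and method names are made up for illustration.

```java
/** Division-remainder hashing, assuming non-negative integer IDs. */
final class DivisionHash {
    /** Returns an index in the range 0 .. tableSize - 1. */
    static int hash(int pcId, int tableSize) {
        return pcId % tableSize;
    }

    public static void main(String[] args) {
        // With a table of 100 slots, the hash is the last two digits of the ID.
        System.out.println(hash(20061234, 100));
        System.out.println(hash(20071234, 100));
    }
}
```

Both sample IDs end in 34, so both print 34: two different students land in the same slot.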

8
Hash Functions
  • Taking the remainder is called the
    Division-remainder technique and is an example of
    a uniform hash function
  • A uniform hash function is designed to distribute
    the keys roughly evenly into the available
    positions within the array (or hash table).

9
Collisions
  • So what about students 20061234 and 20071234?
    They will hash to the same position in the table!
    What do we do?

10
Collisions
  • If no two keys can map into the same position in
    the hash table, we have what is known as ideal
    hashing. For the hash function f,
    each key k maps into position f(k). Then, to
    search for an element, we simply compute its hash
    function and look it up in the table.

11
Collisions
  • Usually, ideal hashing is not possible (or at
    least not guaranteed). Some data is bound to
    hash to the same table element, in which case, we
    have a collision.
  • How do we solve this problem?

12
Collisions
  • We can think of each table location as a bucket
    that contains several slots. Each slot is filled
    with one piece of data.
  • This approach involves chaining the data. This
    is a common approach when the hash table is used
    as disk storage. For each element of the table,
    a linked list (of sorts) is maintained to hold
    data that map to the same location. New items can
    simply be appended as they arrive (unordered), or
    they can be inserted in sorted order (for easier
    retrieval).
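
Chaining might be sketched in Java like this. It is a minimal in-memory illustration under assumptions, not the slides' implementation: the class ChainedTable is hypothetical, keys are plain integers, the hash is the remainder by the table size, and each chain is kept unordered.

```java
import java.util.LinkedList;

/** Hypothetical hash table of integer keys that resolves collisions by chaining. */
final class ChainedTable {
    private final LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    private int hash(int key) {
        return Math.floorMod(key, buckets.length);
    }

    /** Appends the key to its bucket's list unless it is already there. */
    void insert(int key) {
        LinkedList<Integer> chain = buckets[hash(key)];
        if (!chain.contains(key)) {
            chain.add(key);
        }
    }

    boolean contains(int key) {
        return buckets[hash(key)].contains(key);
    }

    /** Removes the key from its chain; returns false if it was absent. */
    boolean remove(int key) {
        return buckets[hash(key)].remove(Integer.valueOf(key));
    }

    /** Number of keys currently stored in the bucket at the given index. */
    int chainLength(int index) {
        return buckets[index].size();
    }

    public static void main(String[] args) {
        ChainedTable table = new ChainedTable(100);
        table.insert(20061234);
        table.insert(20071234); // same bucket, so the chain grows
        System.out.println(table.chainLength(34));
    }
}
```

Colliding keys share a bucket without disturbing any other bucket, at the cost of a list search inside that bucket.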

13
Collisions
  • Other solutions?
  • Linear Probing
  • Quadratic Probing
  • Designing a Good Hash Function

14
Linear Probing
  • Have you ever been to a theatre or sports event
    where the tickets were numbered?
  • Has someone ever sat in your seat?
  • How did you resolve this problem?

15
Linear Probing
  • Linear probing means that, when the hashed
    location already holds an item, we move through
    the array one slot at a time (circling to the
    beginning if necessary) until an open location
    is found.

16
Linear Probing
  • Let's say that we have 1000 numbered tickets to
    an event, but only sell 400. If we move the event
    to a smaller venue, we must also renumber the
    tickets. The hash function would work like this:
  • (ticket number) % 400
  • How many folks can get the same hashed number?
    (Up to 3: for example, tickets 42, 442, and 842)

17
Linear Probing
  • The idea is that even though these numbers hash to
    the same location, each one still needs a slot of
    its own, found by starting from its hash index.
    Using linear probing, the entries are placed into
    the next available position.

18
Linear Probing
  • Consider inserting the keys 24, 42, 34, 62, 73
    into a table of size 10. These entries can be
    placed into the table at the following locations:

19
Linear Probing
  • 24 % 10 = 4. Position is free. 24 placed into
    element 4
  • 42 % 10 = 2. Position is free. 42 placed into
    element 2
  • 34 % 10 = 4. Position is occupied. Try next place
    in the table (5). 34 placed into position 5.
  • 62 % 10 = 2. Position is occupied. Try next place
    in the table (3). 62 placed into position 3.
  • 73 % 10 = 3. Position is occupied. Try next place
    in the table (4). Same problem. Try (5). Then
    (6). 73 is placed into position 6.
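
The insertions traced above can be reproduced with a short Java sketch. The class LinearProbeTable is hypothetical and assumes non-negative integer keys, a remainder hash, and no duplicate checking.

```java
/** Hypothetical open-addressing table of non-negative integer keys. */
final class LinearProbeTable {
    private final Integer[] slots; // null marks an empty slot

    LinearProbeTable(int size) {
        slots = new Integer[size];
    }

    /** Inserts the key and returns the slot it ended up in, or -1 if the table is full. */
    int insert(int key) {
        int home = key % slots.length;
        for (int step = 0; step < slots.length; step++) {
            int index = (home + step) % slots.length; // wrap around at the end
            if (slots[index] == null) {
                slots[index] = key;
                return index;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbeTable table = new LinearProbeTable(10);
        for (int key : new int[] {24, 42, 34, 62, 73}) {
            System.out.println(key + " -> slot " + table.insert(key));
        }
    }
}
```

Running main places the five keys in slots 4, 2, 5, 3, and 6, matching the trace.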

20
Linear Probing
  • How would it look if the numbers were
  • 28, 19, 59, 68, 89??

21
Finding and Deleting
  • Finding?
  • Deleting?
  • We must be more careful. Having found the
    element, we can't just remove it. Why?
  • Use lazy deletion
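
One way to sketch lazy deletion in Java is to give every slot a state and mark removed entries as DELETED rather than emptying them. Emptying a slot would cut the probe sequence short for any key stored past it. The class LazyDeleteTable and its three-state enum are assumptions made for illustration, on top of linear probing with non-negative integer keys.

```java
import java.util.Arrays;

/** Hypothetical linear-probing table of non-negative int keys with lazy deletion. */
final class LazyDeleteTable {
    private enum State { EMPTY, OCCUPIED, DELETED }

    private final int[] keys;
    private final State[] states;

    LazyDeleteTable(int size) {
        keys = new int[size];
        states = new State[size];
        Arrays.fill(states, State.EMPTY);
    }

    /** Returns the slot holding the key, or -1 if the key is absent. */
    int find(int key) {
        int home = key % keys.length;
        for (int step = 0; step < keys.length; step++) {
            int index = (home + step) % keys.length;
            if (states[index] == State.EMPTY) {
                return -1; // a never-used slot ends the search
            }
            if (states[index] == State.OCCUPIED && keys[index] == key) {
                return index;
            }
            // A DELETED slot is skipped, so keys stored past it stay reachable.
        }
        return -1;
    }

    /** Inserts the key and returns its slot, or -1 if no slot is free. */
    int insert(int key) {
        int existing = find(key);
        if (existing >= 0) {
            return existing;
        }
        int home = key % keys.length;
        for (int step = 0; step < keys.length; step++) {
            int index = (home + step) % keys.length;
            if (states[index] != State.OCCUPIED) { // EMPTY or DELETED can be reused
                keys[index] = key;
                states[index] = State.OCCUPIED;
                return index;
            }
        }
        return -1;
    }

    /** Marks the key's slot as DELETED instead of emptying it. */
    boolean delete(int key) {
        int index = find(key);
        if (index < 0) {
            return false;
        }
        states[index] = State.DELETED;
        return true;
    }
}
```

For example, in a table of size 10, insert 24 and then 34 (which is pushed to slot 5). After deleting 24, a search for 34 still passes over slot 4 and finds it in slot 5.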

22
Clustering
  • Sometimes, data will cluster. This is caused
    when many elements hash to the same (or similar)
    locations and linear probing has been used often.
    We can help with this problem by choosing our
    divisor carefully in our hash function and by
    carefully choosing our table size.

23
Designing a Good Hash Function
  • If the divisor is even and there are more even
    than odd key values, the hash function will
    produce an excess of even values. The same holds
    for an excess of odd key values, which produces
    an excess of odd results.
  • However, if the divisor is odd, then either kind
    of excess of key values would still give a
    balanced distribution of odd/even results.
  • Thus, the divisor should be odd. But, this is not
    enough.

24
Designing a Good Hash Function
  • Thus, the divisor should be odd. But, this is not
    enough.
  • If the divisor itself is divisible by a small odd
    number (like 3, 5, or 7) the results are
    unbalanced again. Ideally, it should be a prime
    number. If no such prime number works for our
    table size (the divisor, remember?), we should
    use an odd number with no small factors.
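
The effect of the divisor can be checked with a small Java experiment. It takes the extreme case where every key is even and counts how many distinct slots are ever used. The class DivisorDemo and its helper methods are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical experiment comparing an even divisor with a prime one. */
final class DivisorDemo {
    /** Counts how many distinct slots the keys occupy under key % divisor. */
    static int slotsUsed(int[] keys, int divisor) {
        Set<Integer> used = new HashSet<>();
        for (int key : keys) {
            used.add(key % divisor);
        }
        return used.size();
    }

    /** Builds the first count even numbers: 0, 2, 4, ... */
    static int[] evenKeys(int count) {
        int[] keys = new int[count];
        for (int i = 0; i < count; i++) {
            keys[i] = 2 * i;
        }
        return keys;
    }

    public static void main(String[] args) {
        int[] keys = evenKeys(100);
        System.out.println("divisor 10: " + slotsUsed(keys, 10) + " slots used");
        System.out.println("divisor 11: " + slotsUsed(keys, 11) + " slots used");
    }
}
```

With divisor 10 only the five even slots are ever hit, while the prime divisor 11 spreads the same keys over all 11 slots.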

25
Problems of Linear Probing
  • The majority of the problems are caused by
    clustering. These problems can be helped by
    using Quadratic probing instead.

26
Quadratic Probing
  • Works like linear probing, but instead of trying
    the very next position, the next location is
    chosen by looking at the positions that are
    1², 2², 3², etc. positions ahead of the original
    hashed location.

27
Quadratic Probing
  • Consider inserting the keys 24, 42, 34, 62, 73
    into a table of size 10. These entries can be
    placed into the table at the following locations:

28
Quadratic Probing
  • 24 % 10 = 4. Position is free. 24 placed into
    element 4
  • 42 % 10 = 2. Position is free. 42 placed into
    element 2
  • 34 % 10 = 4. Position is occupied. Try place 1²
    away in the table (5). 34 placed into position 5.
  • 62 % 10 = 2. Position is occupied. Try place 1²
    away in the table (3). 62 placed into position 3.
  • 73 % 10 = 3. Position is occupied. Try place 1²
    away in the table (4). Same problem. Try place 2²
    away in the table (7). 73 is placed into position
    7.
  • Thus, we jumped over the existing cluster.
  • This doesn't completely solve our problem, but it
    helps.
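
A Java sketch of quadratic probing, assuming the offsets 1², 2², 3², ... are measured from the key's original hashed slot and wrap around the table. The class QuadraticProbeTable is hypothetical. Under that convention, key 73 probes slots 3, 4, and then 3 + 2² = 7.

```java
/** Hypothetical quadratic-probing table of non-negative integer keys. */
final class QuadraticProbeTable {
    private final Integer[] slots; // null marks an empty slot

    QuadraticProbeTable(int size) {
        slots = new Integer[size];
    }

    /** Inserts the key and returns its slot, or -1 if no free slot was probed. */
    int insert(int key) {
        int home = key % slots.length;
        for (int i = 0; i < slots.length; i++) {
            int index = (home + i * i) % slots.length; // offsets 0, 1, 4, 9, ...
            if (slots[index] == null) {
                slots[index] = key;
                return index;
            }
        }
        // The probe sequence may miss free slots, so -1 does not prove the table is full.
        return -1;
    }

    public static void main(String[] args) {
        QuadraticProbeTable table = new QuadraticProbeTable(10);
        for (int key : new int[] {24, 42, 34, 62, 73}) {
            System.out.println(key + " -> slot " + table.insert(key));
        }
    }
}
```

Unlike linear probing, the probe sequence does not visit every slot, which is one more reason to choose the table size carefully.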

29
Quadratic Probing
  • How would it look if the numbers were
  • 28, 19, 59, 68, 89??

30
Advantages
  • Fast: average constant time (O(1)) for finding
    information, especially apparent when the table
    is large.
  • If the key/value pairs are known before
    programming (disallowing insertions/deletions of
    new data into the table), the programmer can
    reduce average lookup cost by a careful choice of
    the hash function, bucket table size, and
    internal data structures. (Sometimes this allows
    for perfect hashing)
  • ---- Wikipedia

31
Perfect Hashing
  • If all of the keys that will be used are known
    ahead of time, and there are no more keys than
    can fit the hash table, a perfect hash function
    can be used to create a perfect hash table, in
    which there will be no collisions. If minimal
    perfect hashing is used, every location in the
    hash table can be used as well.
  • Perfect hashing allows for constant time lookups
    in the worst case. This is in contrast to most
    chaining and open addressing methods, where the
    time for lookup is low on average, but may be
    arbitrarily large.
  • ---- Wikipedia
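
When the keys are known in advance, one very simple way to look for a collision-free hash is to try table sizes until key % size gives every key its own slot. The Java sketch below does this by brute force. It is an illustration only, not the construction used by real perfect-hashing tools; the class PerfectModulus is hypothetical, and it assumes a non-empty array of distinct non-negative keys.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical brute-force search for a collision-free remainder hash. */
final class PerfectModulus {
    /** Returns the smallest table size m >= keys.length with no collisions under key % m. */
    static int find(int[] keys) {
        // Terminates for distinct keys: any m larger than the biggest key is collision-free.
        for (int m = keys.length; ; m++) {
            if (isCollisionFree(keys, m)) {
                return m;
            }
        }
    }

    /** True if every key lands in a different slot of a table of size m. */
    static boolean isCollisionFree(int[] keys, int m) {
        Set<Integer> seen = new HashSet<>();
        for (int key : keys) {
            if (!seen.add(key % m)) {
                return false; // slot already taken by an earlier key
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] keys = {24, 42, 34, 62, 73};
        System.out.println("collision-free table size: " + find(keys));
    }
}
```

For the keys 24, 42, 34, 62, 73 the search settles on a table of size 12, which is perfect but not minimal, since 7 of the 12 slots stay empty.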

32
Drawbacks
  • More difficult to implement than search trees
  • Though operations take O(1) on average, the cost
    of computing the hash function can be high, so
    for small amounts of data, hash tables are not
    as effective as a good tree structure.
  • Can be very inefficient if there are many
    collisions.
  • Although unlikely in normal practice, a crafty
    (malicious) user can choose keys that force the
    table into its worst-case behavior and create
    excessive collisions, causing poor performance
    (denial of service attacks)
  • ---- Wikipedia

33
Implementation
  • In Java, hash tables are available as a set
    (HashSet) or a map (HashMap)
  • The Set interface and the HashSet class are in
    the java.util package
  • Sets in Java have four fundamental operations:
  • Adding an element (add method)
  • Removing an element (remove method)
  • Containment testing (is element in set?)
    (contains method)
  • Listing all elements (in arbitrary order), using
    an iterator for the set with the hasNext and
    next methods in a loop

34
ch16/set/SetDemo.java
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

/**
   This program demonstrates a set of strings. The user
   can add and remove strings.
*/
public class SetDemo
{
   public static void main(String[] args)
   {
      Set<String> names = new HashSet<String>();
      Scanner in = new Scanner(System.in);

      boolean done = false;
      while (!done)
      {
         System.out.print("Add name, Q when done: ");
         String input = in.next();
Continued
35
ch16/set/SetDemo.java (cont.)
         if (input.equalsIgnoreCase("Q"))
            done = true;
         else
         {
            names.add(input);
            print(names);
         }
      }

      done = false;
      while (!done)
      {
         System.out.print("Remove name, Q when done: ");
         String input = in.next();
         if (input.equalsIgnoreCase("Q"))
            done = true;
         else
         {
            names.remove(input);
            print(names);
         }
      }
   }
Continued
36
ch16/set/SetDemo.java (cont.)

   /**
      Prints the contents of a set of strings.
      @param s a set of strings
   */
   private static void print(Set<String> s)
   {
      System.out.print("{ ");
      for (String element : s)
      {
         System.out.print(element);
         System.out.print(" ");
      }
      System.out.println("}");
   }
}
Continued
37
Maps
  • A set stores elements. A map stores associations
    between keys and values.
  • Hash tables are also used to implement maps in
    Java (HashMap).
  • All of these types are found in the java.util
    package:
  • java.util.Map
  • java.util.Set
  • java.util.HashMap
  • java.util.HashSet

38
Maps
  • A map keeps associations between key and value
    objects
  • Mathematically speaking, a map is a function from
    one set, the key set, to another set, the value
    set
  • Every key in a map has a unique value
  • A value may be associated with several keys
  • Classes that implement the Map interface
  • HashMap
  • TreeMap
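
The properties listed above can be tried out with both implementations. In this sketch the class MapBasics, the names, and the extension numbers are all made up for illustration. Only put and the HashMap and TreeMap constructors from java.util are used.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical demo of key/value rules shared by HashMap and TreeMap. */
final class MapBasics {
    /** Fills the given map with a few made-up name-to-extension pairs. */
    static Map<String, Integer> fill(Map<String, Integer> extensions) {
        extensions.put("Romeo", 101);
        extensions.put("Juliet", 102);
        extensions.put("Adam", 101);  // a value may be shared by several keys
        extensions.put("Romeo", 105); // same key again: the old value is replaced
        return extensions;
    }

    public static void main(String[] args) {
        System.out.println(fill(new HashMap<>())); // iteration order is unspecified
        System.out.println(fill(new TreeMap<>())); // keys come out in sorted order
    }
}
```

Both maps end up with three entries. The TreeMap lists its keys alphabetically, while the HashMap makes no promise about order.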

39
An Example of a Map
40
ch16/map/MapDemo.java
import java.awt.Color;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
   This program demonstrates a map that maps names to colors.
*/
public class MapDemo
{
   public static void main(String[] args)
   {
      Map<String, Color> favoriteColors
            = new HashMap<String, Color>();
      favoriteColors.put("Juliet", Color.PINK);
      favoriteColors.put("Romeo", Color.GREEN);
      favoriteColors.put("Adam", Color.BLUE);
      favoriteColors.put("Eve", Color.PINK);

Continued
41
ch16/map/MapDemo.java (cont.)
      Set<String> keySet = favoriteColors.keySet();
      for (String key : keySet)
      {
         Color value = favoriteColors.get(key);
         System.out.println(key + "->" + value);
      }
   }
}
Continued