Hash Table - PowerPoint PPT Presentation

About This Presentation
Title:

Hash Table

Description:

Chapter 12 Hash Table – PowerPoint PPT presentation

Slides: 49
Provided by: Darw2
Tags: hash | hashing | table

Transcript and Presenter's Notes

Title: Hash Table


1
Chapter 12
  • Hash Table

2
Hash Table
  • So far, the best worst-case time for searching is
    O(log n).
  • Hash tables achieve an average search time of
    O(1), but their worst-case search time is O(n).

3
Learning Objectives
  • Develop the motivation for hashing.
  • Study hash functions.
  • Understand collision resolution and compare and
    contrast various collision resolution schemes.
  • Summarize the average running times for hashing
    under various collision resolution schemes.
  • Explore the java.util.HashMap class.

4
12.1 Motivation
  • Let's design a data structure using an array for
    which the indices could be the keys of entries.
  • Suppose we wanted to store the keys 1, 3, 5, 8,
    10, with a guaranteed one-step access to any of
    these.
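Here is a minimal Java sketch of the array-as-table idea. It assumes non-negative int keys no larger than a chosen maximum, which is 10 here so that the example keys fit. The class name DirectAddressTable is hypothetical. Each key is used directly as an array index, so every access takes one step. The cost is that the array needs one slot per possible key.

```java
/** Hypothetical direct-address table: the key itself is the array index. */
class DirectAddressTable<V> {
    private final Object[] slots;

    /** maxKey is the largest key we ever expect to store (assumption). */
    DirectAddressTable(int maxKey) {
        slots = new Object[maxKey + 1];   // one slot per possible key
    }

    void put(int key, V value) {
        slots[key] = value;               // one step: index directly by key
    }

    @SuppressWarnings("unchecked")
    V get(int key) {
        return (V) slots[key];            // one step; null means "absent"
    }

    int capacity() {
        return slots.length;
    }

    public static void main(String[] args) {
        DirectAddressTable<String> t = new DirectAddressTable<>(10);
        for (int k : new int[] {1, 3, 5, 8, 10}) {
            t.put(k, "value" + k);
        }
        System.out.println(t.get(8));       // value8
        System.out.println(t.get(4));       // null
        System.out.println(t.capacity());   // 11 slots for only 5 entries
    }
}
```

Note that storing only five keys already takes eleven slots. This is exactly the space problem that the next slide raises.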

5
12.1 Motivation
  • The space consumption does not depend on the
    actual number of entries stored.
  • It depends on the range of keys.
  • What if we wanted to store strings?
  • For each string, we would first have to compute a
    numeric key that is equivalent to it.
  • java.lang.String.hashCode() computes the numeric
    equivalent (or hashcode) of a string by an
    arithmetic manipulation involving its individual
    characters.
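The Java API documentation specifies String.hashCode() as s[0]·31^(n-1) + s[1]·31^(n-2) + ... + s[n-1], computed with int arithmetic. The sketch below recomputes this by hand using Horner's rule and compares the result with the library value. The class name StringHash is hypothetical.

```java
/** Recomputes String.hashCode() by hand (hypothetical helper class). */
class StringHash {
    // Horner's rule for s[0]*31^(n-1) + ... + s[n-1]; int overflow simply
    // wraps around, which matches the documented String.hashCode() contract.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"cat", "ear", "sad", "aid"}) {
            System.out.println(s + ": " + hash(s) + " vs " + s.hashCode());
        }
        // "cat" -> 99*31*31 + 97*31 + 116 = 98262
    }
}
```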

6
12.1 Motivation
  • Using numeric keys directly as indices is out of
    the question for most applications.
  • There isn't enough space: a Java hashcode can be
    any of the $2^{32}$ int values, far more array
    slots than memory allows.

7
12.1 Motivation
8
12.2 Hashing
  • A simple hash function for a table of size 10:
    $h(k) = k \bmod 10$
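Here is a sketch of this mapping applied to hashcodes. The class name ModTenHash is hypothetical. One practical detail is not on the slide: String.hashCode() can be negative, and Java's % operator keeps the sign of its left operand. The sketch therefore uses Math.floorMod so that the index always falls in 0..9.

```java
/** h(k) = k mod 10 for a table of size 10 (hypothetical helper class). */
class ModTenHash {
    static final int TABLE_SIZE = 10;

    // Math.floorMod keeps the result in 0..9 even for negative hashcodes;
    // plain % would return a negative number for a negative k.
    static int h(int k) {
        return Math.floorMod(k, TABLE_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(h(25));                 // 5
        System.out.println(h("cat".hashCode()));   // 2, since 98262 mod 10 = 2
        System.out.println(-7 % TABLE_SIZE);       // -7: not a valid index
        System.out.println(h(-7));                 // 3
    }
}
```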

9
12.2 Hashing
  • "ear" collides with "cat" at position 4.
  • There is empty space in the table, and it is up
    to the collision resolution scheme to find an
    appropriate position for this string.
  • A better mapping function
  • For any hash function one could devise, there are
    always hashcodes that could force the mapping
    function to be ineffective by generating lots of
    collisions.
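The slide's own "better mapping function" was in an image and is not reproduced here. The sketch below illustrates the last bullet instead. It uses a hypothetical class, CollisionDemo, to count how many keys land in each slot under k mod n. With n = 10, every multiple of 10 lands in slot 0. Switching to the prime size 11, a common heuristic, spreads those same keys out. However, multiples of 11 then all collide in slot 0.

```java
import java.util.Arrays;

/** Hypothetical demo: some key set always defeats a given mod-n mapping. */
class CollisionDemo {
    // Counts how many keys land in each slot under k mod n.
    static int[] slotCounts(int[] keys, int n) {
        int[] counts = new int[n];
        for (int k : keys) {
            counts[Math.floorMod(k, n)]++;
        }
        return counts;
    }

    // The first `count` positive multiples of m.
    static int[] multiplesOf(int m, int count) {
        int[] keys = new int[count];
        for (int i = 0; i < count; i++) {
            keys[i] = m * (i + 1);
        }
        return keys;
    }

    public static void main(String[] args) {
        int[] tens = multiplesOf(10, 10);      // 10, 20, ..., 100
        int[] elevens = multiplesOf(11, 10);   // 11, 22, ..., 110
        System.out.println(Arrays.toString(slotCounts(tens, 10)));     // all 10 in slot 0
        System.out.println(Arrays.toString(slotCounts(tens, 11)));     // one per slot 1..10
        System.out.println(Arrays.toString(slotCounts(elevens, 11))); // all 10 in slot 0
    }
}
```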

10
12.2 Hashing
11
12.3 Collision Resolution
  • There are two ways to resolve collisions.
  • Open addressing: find another location for the
    colliding key within the hash table itself.
  • Closed addressing: store all keys that hash to
    the same location in a data structure that hangs
    off that location.

12
12.3.1 Linear Probing
13
12.3.1 Linear Probing
  • As more and more entries are hashed into the
    table, they tend to form clusters that get bigger
    and bigger.
  • The number of probes on collisions gradually
    increases, thus slowing down the hash time to a
    crawl.

14
12.3.1 Linear Probing
  • Insert "cat", "ear", "sad", and "aid"
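Here is a minimal linear-probing sketch. It uses a hypothetical class, LinearProbingTable, and takes the home slot of a String key to be Math.floorMod(key.hashCode(), N) with N = 10. Deletion and resizing are omitted to keep it short. Under this mapping, all four example strings land on home slot 2, because their hashcodes (98262, 100182, 113622, 96572) all end in 2. The run therefore does not reproduce the slide's figure, whose hash function differs, but it shows a cluster forming.

```java
/** Hypothetical open-addressing table of String keys using linear probing
 *  (no deletion and no resizing, to keep the sketch short). */
class LinearProbingTable {
    private final String[] slots;

    LinearProbingTable(int capacity) {
        slots = new String[capacity];
    }

    // Home slot: the hashcode mapped into 0..capacity-1.
    private int home(String key) {
        return Math.floorMod(key.hashCode(), slots.length);
    }

    /** Inserts key; returns the number of slots probed, or -1 if the table is full. */
    int insert(String key) {
        int i = home(key);
        for (int probes = 1; probes <= slots.length; probes++) {
            if (slots[i] == null) {
                slots[i] = key;
                return probes;
            }
            if (slots[i].equals(key)) {
                return probes;              // already present
            }
            i = (i + 1) % slots.length;     // linear step: next slot, wrapping around
        }
        return -1;
    }

    /** Returns the slot holding key, or -1 if absent; an empty slot ends the search. */
    int slotOf(String key) {
        int i = home(key);
        for (int probes = 0; probes < slots.length && slots[i] != null; probes++) {
            if (slots[i].equals(key)) {
                return i;
            }
            i = (i + 1) % slots.length;
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(10);
        for (String s : new String[] {"cat", "ear", "sad", "aid"}) {
            int probes = t.insert(s);
            System.out.println(s + " -> slot " + t.slotOf(s) + " after " + probes + " probe(s)");
        }
    }
}
```

Each string needs one more probe than the one before it. The cluster in slots 2 through 5 grows with every insertion, which is the slowdown described on slide 13.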

15
12.3.1 Linear Probing
  • Clustering is the downfall of linear probing, so
    we need to look to another method of collision
    resolution that avoids clustering.

16
12.3.2 Quadratic Probing
17
12.3.2 Quadratic Probing
  • Avoids the primary clustering of linear probing.
  • When the probing stops with a failure to find an
    empty spot, as many as half the locations of the
    table may still be unoccupied.
  • For a key that hashes to location 2, the probe
    locations 2, 3, 6, 0, 7, and 5 are repeated
    endlessly, and the insertion fails even though
    half the table is empty.

18
12.3.2 Quadratic Probing
  • For any given prime N, once a location is
    examined twice, all locations that are examined
    thereafter are also ones that have been already
    examined.
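The sketch below lists the probe locations (h + i²) mod N for i = 0, 1, 2, .... It uses a hypothetical class, QuadraticProbes. The table size N = 11 is our assumption: with h = 2, it reproduces exactly the cycle 2, 3, 6, 0, 7, 5 quoted on slide 17. It also counts how many distinct locations are ever reached. Trying i < N is enough, because (i + N)² ≡ i² (mod N).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper listing quadratic-probe locations (h + i*i) mod n, for 0 <= h < n. */
class QuadraticProbes {
    // Probe locations for i = 0 (the home slot), 1, ..., count-1.
    static List<Integer> sequence(int h, int n, int count) {
        List<Integer> seq = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            seq.add((int) ((h + (long) i * i) % n));   // long avoids overflow of i*i
        }
        return seq;
    }

    // Distinct locations ever probed, in first-visit order.
    static Set<Integer> distinct(int h, int n) {
        return new LinkedHashSet<>(sequence(h, n, n));
    }

    public static void main(String[] args) {
        System.out.println(sequence(2, 11, 12));   // [2, 3, 6, 0, 7, 5, 5, 7, 0, 6, 3, 2]
        System.out.println(distinct(2, 11));       // [2, 3, 6, 0, 7, 5]: 6 of 11 slots
    }
}
```

For a prime N, the distinct count is (N + 1)/2. That is 6 of 11 here, so five slots can never be reached from home slot 2.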

19
12.3.3 Chaining
  • If a collision occurs at location i of the hash
    table, chaining simply adds the colliding entry
    to a linked list that is built at that location.
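Here is a minimal chaining sketch that uses java.util.LinkedList for the chains. The class name ChainedHashTable and the floorMod mapping are assumptions made for illustration. Null keys are not supported. A collision never searches for another slot; it only makes the chain at that slot longer.

```java
import java.util.LinkedList;

/** Hypothetical chained hash table: each slot holds a linked list of entries. */
class ChainedHashTable<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] chains;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int capacity) {
        chains = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            chains[i] = new LinkedList<>();
        }
    }

    private int index(K key) {                       // key must be non-null
        return Math.floorMod(key.hashCode(), chains.length);
    }

    /** Inserts or replaces; a collision only lengthens the chain at that slot. */
    void put(K key, V value) {
        LinkedList<Entry<K, V>> chain = chains[index(key)];
        for (Entry<K, V> e : chain) {
            if (e.key.equals(key)) { e.value = value; return; }
        }
        chain.add(new Entry<>(key, value));
    }

    /** Searches only the one chain the key hashes to. */
    V get(K key) {
        for (Entry<K, V> e : chains[index(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    int chainLength(int slot) {
        return chains[slot].size();
    }

    public static void main(String[] args) {
        ChainedHashTable<String, String> t = new ChainedHashTable<>(10);
        for (String s : new String[] {"cat", "ear", "sad", "aid"}) {
            t.put(s, s.toUpperCase());
        }
        System.out.println(t.chainLength(2));   // 4: all four strings hash to slot 2
        System.out.println(t.get("aid"));       // AID
    }
}
```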

20
Running times
  • We assume that the hashing process itself
    (hashcode and mapping) takes O(1).
  • Running time of insertion is determined by the
    collision resolution scheme.

21
12.4 The java.util.HashMap Class
  • Consider a university-wide database that stores
    student records.
  • Every student is assigned a unique id (the key),
    with which several pieces of information are
    associated, such as name, address, credits, and
    GPA.
  • These pieces of information constitute the value.

22
12.4 The java.util.HashMap Class
  • A StudentInfo dictionary stores (id, info) pairs
    for all the students enrolled in the university.
  • The operations corresponding to this relationship
    can be found in java.util.Map<K,V>.

23
12.4 The java.util.HashMap Class
  • The Map interface also provides operations to
    enumerate all the keys, enumerate all the values,
    get the size of the dictionary, check whether the
    dictionary is empty, and so on.
  • The java.util.HashMap implements the dictionary
    abstraction as specified by the java.util.Map
    interface. It resolves collisions using chaining.
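The sketch below shows the StudentInfo dictionary as a java.util.HashMap keyed by student id. It exercises the Map operations listed above: put, get, size, isEmpty, containsKey, keySet, and values. The StudentInfo record, its fields, and the sample records are all invented for illustration, and the record syntax needs Java 16 or later.

```java
import java.util.HashMap;
import java.util.Map;

class StudentDirectory {
    // Hypothetical value type: the "info" half of each (id, info) pair.
    record StudentInfo(String name, String address, int credits, double gpa) {}

    // Invented sample records keyed by student id.
    static Map<Integer, StudentInfo> sampleDirectory() {
        Map<Integer, StudentInfo> students = new HashMap<>();
        students.put(1001, new StudentInfo("Ana", "12 Elm St", 64, 3.6));
        students.put(1002, new StudentInfo("Raj", "9 Oak Ave", 30, 3.1));
        return students;
    }

    public static void main(String[] args) {
        Map<Integer, StudentInfo> students = sampleDirectory();
        System.out.println(students.get(1002).name());    // Raj
        System.out.println(students.size());              // 2
        System.out.println(students.isEmpty());           // false
        System.out.println(students.containsKey(1003));   // false

        // Enumerate keys and values; HashMap makes no ordering guarantee.
        for (Integer id : students.keySet()) {
            System.out.println("id " + id);
        }
        for (StudentInfo info : students.values()) {
            System.out.println(info.name() + ", GPA " + info.gpa());
        }
    }
}
```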

24
12.4.1 Table and Load Factor
  • When the no-arg constructor is used, the HashMap
    gets the default initial capacity of 16 and the
    default load factor of 0.75.
  • The size of the table is defined as the number of
    key-value mappings actually stored in the hash
    table, not its capacity.

25
12.4.1 Table and Load Factor
  • We can choose an initial capacity, but HashMap
    only uses capacities that are powers of 2: a
    requested capacity of 101 becomes 128.
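The helper below shows only the rounding rule, not the JDK's actual code, which differs between versions. The names Capacity and roundUpToPowerOfTwo are hypothetical. Capping at 2^30 reflects the largest power of two that fits in a positive int.

```java
/** Hypothetical helper illustrating HashMap's power-of-two capacity rule. */
class Capacity {
    static final int MAX = 1 << 30;   // largest power of two that fits in an int

    /** Smallest power of two >= requested, for requested >= 1, capped at MAX. */
    static int roundUpToPowerOfTwo(int requested) {
        if (requested >= MAX) {
            return MAX;
        }
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1;           // double until we reach or pass the request
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(101));   // 128
        System.out.println(roundUpToPowerOfTwo(16));    // 16
        System.out.println(roundUpToPowerOfTwo(17));    // 32
    }
}
```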

26
12.4.1 Table and Load Factor
  • An initial capacity of 128.

27
12.4.2 Storage of Entries
  • Relevant fields in the HashMap class:
  • threshold is the size threshold, the product of
    the capacity N and the threshold load factor t,
    that is, $N \cdot t$.
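Here is the threshold formula worked for the two capacities mentioned on slides 24 through 26. We assume the default load factor t = 0.75 also applies to the 128-entry table:

```latex
\text{threshold} = N \cdot t, \qquad
16 \times 0.75 = 12, \qquad
128 \times 0.75 = 96
```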

28
12.4.2 Storage of Entries
  • Entry[] table sets up an array of chains.
  • Map.Entry<K,V> is defined inside the Map<K,V>
    interface.
  • next holds a reference to the next Entry in its
    linked list.

29
12.4.3 Adding an Entry
  • Example: a person's name serves as the key to a
    phone-number value.
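Here is a small phone-book example using the real java.util.HashMap API. The names and numbers are invented. The example relies on two documented behaviours: put returns the previous value for the key, or null if there was none, and HashMap accepts a single null key.

```java
import java.util.HashMap;
import java.util.Map;

class PhoneBook {
    // Hypothetical sample data: names are keys, phone numbers are values.
    static Map<String, String> sampleBook() {
        Map<String, String> phones = new HashMap<>();
        phones.put("Ann", "555-0101");
        phones.put("Bob", "555-0102");
        phones.put("Ann", "555-0199");   // same key: replaces Ann's old number
        phones.put(null, "555-0000");    // HashMap permits one null key
        return phones;
    }

    public static void main(String[] args) {
        Map<String, String> phones = new HashMap<>();
        System.out.println(phones.put("Ann", "555-0101"));  // null: no previous value
        System.out.println(phones.put("Ann", "555-0199"));  // 555-0101: previous value returned
        System.out.println(sampleBook().get("Ann"));        // 555-0199
        System.out.println(sampleBook().size());            // 3
    }
}
```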

30
12.4.3 Adding an Entry
31
12.4.3 Adding an Entry
  • If the key argument is null, a special object,
    NULL_KEY, is returned; otherwise, the argument
    key is returned as is.
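Here is a sketch of the null-key trick described above. NULL_KEY comes from the slide. The helper names maskNull and unmaskNull are assumptions, modelled on what older JDK versions appear to have used. The payoff is that the rest of the table code can call hashCode() and equals() on every key without a null check.

```java
/** Hypothetical sketch of the null-key substitution described on the slide. */
class NullKeyMasking {
    // A unique sentinel object stands in for the null key inside the table.
    static final Object NULL_KEY = new Object();

    // Replace null with the sentinel before hashing or comparing.
    static Object maskNull(Object key) {
        return key == null ? NULL_KEY : key;
    }

    // Reverse mapping, used when keys are handed back to callers.
    static Object unmaskNull(Object key) {
        return key == NULL_KEY ? null : key;
    }

    // Safe to call even for a null key: no NullPointerException.
    static int safeHash(Object key) {
        return maskNull(key).hashCode();
    }

    public static void main(String[] args) {
        System.out.println(maskNull(null) == NULL_KEY);          // true
        System.out.println(maskNull("cat"));                     // cat
        System.out.println(unmaskNull(maskNull(null)) == null);  // true
        System.out.println(safeHash(null) == NULL_KEY.hashCode()); // true
    }
}
```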

32
12.4.3 Adding an Entry
33
12.4.3 Adding an Entry
  • Example: h = 25 and length = 16.
  • The binary representations of h and length - 1
    are 11001 and 01111; their bitwise AND is 01001,
    which is 9 = 25 mod 16.

34
12.4.3 Adding an Entry
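  • Since length is a power of 2, say $2^k$, its
    binary representation is 1 followed by k zeros,
    and length - 1 is k ones.
  • Any h is expressible as $h = c \cdot 2^k + r$,
    where $0 \le r < 2^k$.
  • The bitwise AND of h with length - 1 yields r,
    since the $c \cdot 2^k$ part occupies only the
    higher-order bits (position k and above), which
    are zeroed out in the process. Hence
    $h \,\&\, (\mathit{length} - 1) = h \bmod \mathit{length}$.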
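The bitmask argument above can be checked directly in code. We call the method indexFor, a name we believe older JDK versions used, but treat that name as an assumption. For a power-of-two length, h & (length - 1) equals Math.floorMod(h, length) even for negative h. Plain h % length would give a negative index in that case.

```java
/** Hypothetical helper: slot index for a power-of-two table length. */
class IndexFor {
    // Keeps only the low k bits of h when length == 2^k.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(25));  // 11001
        System.out.println(indexFor(25, 16));            // 9, i.e. 01001
        System.out.println(25 % 16);                     // 9
        System.out.println(indexFor(-7, 16));            // 9
        System.out.println(-7 % 16);                     // -7: unusable as an index
    }
}
```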

35
12.4.3 Adding an Entry
36
12.4.3 Adding an Entry
  • The if statement triggers a rehashing process if
    the size is equal to or greater than the
    threshold.
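Here is a compact sketch that combines adding, rehashing, and searching for a chained map with a power-of-two capacity. The class RehashingMap is hypothetical and is not the JDK source. It omits the supplemental bit-spreading hash that real JDK versions apply, and it does not support null keys. Following the slide's rule, it doubles the capacity and rehashes every entry once the size reaches the threshold.

```java
/** Hypothetical chained map with power-of-two capacity and threshold-triggered rehashing. */
class RehashingMap<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    private Node<K, V>[] table;
    private final float loadFactor;
    private int size;
    private int threshold;

    @SuppressWarnings("unchecked")
    RehashingMap(int capacity, float loadFactor) {   // capacity assumed to be a power of two
        this.table = (Node<K, V>[]) new Node[capacity];
        this.loadFactor = loadFactor;
        this.threshold = (int) (capacity * loadFactor);
    }

    private static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    void put(K key, V value) {
        int i = indexFor(key.hashCode(), table.length);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.key.equals(key)) { e.value = value; return; }   // replace, size unchanged
        }
        table[i] = new Node<>(key, value, table[i]);              // prepend to the chain
        if (++size >= threshold) {                                // slide's rule: size >= threshold
            resize(2 * table.length);
        }
    }

    @SuppressWarnings("unchecked")
    private void resize(int newCapacity) {
        Node<K, V>[] old = table;
        table = (Node<K, V>[]) new Node[newCapacity];
        for (Node<K, V> head : old) {
            for (Node<K, V> e = head; e != null; ) {
                Node<K, V> next = e.next;
                int i = indexFor(e.key.hashCode(), newCapacity);  // rehash into the new table
                e.next = table[i];
                table[i] = e;
                e = next;
            }
        }
        threshold = (int) (newCapacity * loadFactor);
    }

    /** Searching: hash to one chain and walk it. */
    V get(K key) {
        for (Node<K, V> e = table[indexFor(key.hashCode(), table.length)]; e != null; e = e.next) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    int capacity() { return table.length; }

    int size() { return size; }

    public static void main(String[] args) {
        RehashingMap<Integer, String> m = new RehashingMap<>(16, 0.75f);
        for (int k = 0; k < 12; k++) {
            m.put(k, "v" + k);
        }
        System.out.println(m.capacity());   // 32: the 12th entry reached the threshold of 12
        System.out.println(m.get(7));       // v7
    }
}
```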

37
12.4.4 Rehashing
38
12.4.4 Rehashing
39
12.4.5 Searching
40
12.5 Quadratic Probing: Repetition of Probe Locations
  • When N is prime, quadratic probing examines only
    about N/2 distinct locations of the table before
    it starts repeating locations.
  • Suppose a key is hashed to location h, where
    there is a collision.
  • The following locations are then examined:
    $(h + 1^2) \bmod N$, $(h + 2^2) \bmod N$,
    $(h + 3^2) \bmod N$, and so on.

41
12.5 Quadratic Probing: Repetition of Probe Locations
  • What if two different probes, i and j, end up at
    the same location?

42
12.5 Quadratic Probing: Repetition of Probe Locations
  • Probes i and j coincide exactly when N divides
    $i^2 - j^2 = (i + j)(i - j)$. Since N is a prime
    number, it must divide one of the factors
    $(i + j)$ or $(i - j)$.
  • N divides $(i - j)$ only when at least N probes
    have already been made.
  • N divides $(i + j)$ when $i + j = N$, at the very
    least, that is, when $j = N - i$.
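Here is the argument from slides 41 and 42 written out as a short derivation. It uses the slides' assumptions: N is prime, and probe i examines $(h + i^2) \bmod N$, with i = 0 being the home location.

```latex
\begin{align*}
(h + i^2) \bmod N = (h + j^2) \bmod N
  &\iff N \mid (i^2 - j^2) = (i + j)(i - j) \\
  &\iff N \mid (i + j) \ \text{or} \ N \mid (i - j)
  \qquad (N \text{ prime})
\end{align*}
```

Suppose $0 \le i < j \le (N - 1)/2$. Then $0 < j - i < N$ and $0 < i + j \le N - 2$, so N divides neither factor. The first $(N + 1)/2$ probes, $i = 0, \ldots, (N - 1)/2$, are therefore all distinct. Conversely, probe $N - i$ lands where probe i did, because $(N - i)^2 = N^2 - 2Ni + i^2 \equiv i^2 \pmod N$. For N = 11 this gives 6 distinct locations. That count matches the cycle 2, 3, 6, 0, 7, 5 on slide 17 for h = 2, if, as we assume, the table there has N = 11.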

43
12.6 Summary
  • A hash table implements the dictionary operations
    of insert, search, and delete on (key, value)
    pairs.
  • Given a key, a hash function for a given hash
    table computes an index into the table as a
    function of the key by first obtaining a numeric
    hashcode, and then mapping this hashcode to a
    table location.

44
12.6 Summary
  • When a new key hashes to a location in the hash
    table that is already occupied, it is said to
    collide with the occupying key.
  • Collision resolution is the process used upon
    collision to determine an unoccupied location in
    the hash table where the colliding key may be
    inserted.
  • In searching for a key, the same hash function
    and collision resolution scheme must be used as
    for its insertion.

45
12.6 Summary
  • A good hash function must run in O(1) time and
    must distribute entries uniformly over the hash
    table.
  • Open addressing relocates a colliding entry within
    the hash table itself. Closed addressing stores
    all entries that hash to a location in a data
    structure that hangs off that location.
  • Linear probing and quadratic probing are
    instances of open addressing, while chaining is
    an instance of closed addressing.

46
12.6 Summary
  • Linear probing leads to clustering of entries
    with the clusters becoming increasingly larger as
    more and more collisions occur. Clustering
    degrades performance significantly.
  • Quadratic probing attempts to reduce clustering.
    On the other hand, quadratic probing may leave as
    many as half the hash table empty while reporting
    failure to insert a new entry.

47
12.6 Summary
  • Chaining is the simplest way to resolve
    collisions and also results in better performance
    than linear probing or quadratic probing.
  • The worst-case search time for linear probing,
    quadratic probing, and chaining is O(n).
  • The load factor of a hash table is the ratio of
    the number of keys, n, to the capacity, N.

48
12.6 Summary
  • The average performance of chaining depends on
    the load factor. For a perfect hash function that
    always distributes keys uniformly, the average
    search time for chaining is O(1).
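To make the load-factor dependence concrete, here are the standard textbook estimates for chaining. They assume simple uniform hashing, meaning each key is equally likely to hash to any of the N locations, independently of the others. They are general results, not taken from these slides:

```latex
\alpha = \frac{n}{N}, \qquad
E[\text{entries examined, unsuccessful search}] = \alpha, \qquad
E[\text{entries examined, successful search}] \approx 1 + \frac{\alpha}{2}
```

For example, n = 12 keys in N = 16 locations gives $\alpha = 0.75$. An unsuccessful search then examines 0.75 entries on average, and a successful one about 1.375. Both costs are $O(1 + \alpha)$. That is O(1) whenever the load factor is kept below a constant, as HashMap does by rehashing at its threshold.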