Hashing - PowerPoint PPT Presentation

About This Presentation
Title:

Hashing

Description:

Hashing The process of mapping a key value to a position in a table. A hash function maps key values to positions. A hash table is an array that holds the records. – PowerPoint PPT presentation

Slides: 15
Provided by: WillT152
Tags: hashing


Transcript and Presenter's Notes


1
Hashing
  • The process of mapping a key value to a position
    in a table.
  • A hash function maps key values to positions.
  • A hash table is an array that holds the records.
  • The hash table has M slots, numbered 0 to M-1.
  • For any value K in the key range and some hash
    function h, h(K) = i, where 0 <= i < M and
    key(T[i]) = K.

2
Hashing Situations
  • Hashing is appropriate for unique keys.
  • Good for both in-memory and disk-based
    applications.
  • Answers the question: what record, if any, has
    key value K?
  • Example: store n records with keys in the range
    0 to n-1.
  • Store the record with key i in slot i.
  • Uses the hash function h(K) = K (the identity
    function).

3
Collisions
  • A more reasonable example:
  • Store about 1000 records with keys in the range
    0-16,383.
  • It is impractical to keep a table of size 16,384.
  • We need a hash function to map keys to a smaller
    range.
  • Given a hash function h and two different keys k1
    and k2, let β be a position in the hash table.
  • If h(k1) = h(k2) = β, then k1 and k2 have a
    collision at β under h.
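
A minimal sketch of the collision definition above. The table size M = 1000 and the function h(K) = K mod M are assumptions chosen for illustration; the slides do not fix a particular function for this example.

```python
# Hypothetical setup: about 1000 slots for keys in the range 0..16383.
M = 1000

def h(key: int) -> int:
    """Map a key to a slot in 0..M-1 (assumed hash function: key mod M)."""
    return key % M

k1, k2 = 1234, 5234          # two different keys
slot1, slot2 = h(k1), h(k2)  # both land in slot 234

# k1 and k2 collide when they differ but hash to the same position.
collide = (k1 != k2) and (slot1 == slot2)
print(slot1, slot2, collide)
```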

4
Collision Resolution
  • To search for the record with key K:
  • Compute the table location h(K).
  • Starting with slot h(K), locate the record
    containing key K using (if necessary) a collision
    resolution policy.
  • Collisions are inevitable in most applications.
  • Example: in a group of 23 people, the odds are
    slightly better than even that at least one pair
    shares a birthday.
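
The birthday example can be checked with a short calculation. This sketch assumes 365 equally likely birthdays; the function name is hypothetical.

```python
def birthday_collision_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday,
    assuming each of `days` birthdays is equally likely."""
    p_all_distinct = 1.0
    for i in range(people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

# For 23 people the result is a little above one half.
print(round(birthday_collision_probability(23), 3))
```

The same reasoning applies to a hash table: with 365 slots, a collision is more likely than not after only 23 insertions.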

5
Hash Functions
  • Must return a value within the table range.
  • Should evenly distribute the records to be stored
    among the table slots.
  • Ideally, the function should distribute records
    with equal probability to all the positions. In
    practice, how well it does so usually depends on
    the data.
  • If we know nothing about the key distribution,
    evenly distribute the key range among the
    positions.
  • If we know about the key distribution, use a
    distribution-dependent hash function.

6
Example Hash Functions
  • h(key) = key mod 16: uses only the last 4 bits.
  • h(key) = key mod 1000: uses only the last 3
    decimal digits.
  • Use mod tablesize to make sure the result is in
    the range.
  • Mid-square method: square the key and take the
    middle r bits for a table of size 2^r.
  • Sum up the ASCII values of the characters and
    take the result modulo tablesize (a folding
    technique).
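
The three kinds of function listed above might be sketched as follows. The function names and the 16-bit key width in the mid-square version are assumptions made for this example.

```python
def mod_hash(key: int, table_size: int) -> int:
    """Keep only the low-order part of the key."""
    return key % table_size

def mid_square_hash(key: int, r: int, key_bits: int = 16) -> int:
    """Square the key and return the middle r bits of the square.
    Assumes keys fit in key_bits bits, so the square fits in 2*key_bits."""
    squared = key * key
    shift = (2 * key_bits - r) // 2          # drop the low-order bits
    return (squared >> shift) & ((1 << r) - 1)

def ascii_sum_hash(text: str, table_size: int) -> int:
    """Fold a string by summing its character codes, then reduce."""
    return sum(ord(ch) for ch in text) % table_size

print(mod_hash(37, 16))            # 37 is 0b100101, last 4 bits are 0101
print(mod_hash(123456, 1000))      # last 3 decimal digits
print(mid_square_hash(4567, 8))    # a slot in 0..255
print(ascii_sum_hash("ab", 10))    # (97 + 98) mod 10
```

Note that summing characters ignores their order, so anagrams such as "ab" and "ba" always collide under ascii_sum_hash.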

7
Collision Handling Categories
  • Open hashing - when there is a collision, put
    collided item outside the table.
  • Closed hashing - when there is a collision, put
    collided item inside the table.

8
Open Hashing
  • Treat each table element as the head of a
    linked list of the items that hash to that
    position.
  • The linked lists can be organized in several ways:
  • Ordered by key: unsuccessful searches can stop
    early, as soon as a larger key is seen.
  • Ordered by frequency: a good technique if a few
    records are searched for much more often than
    the rest.
  • If there are N records to be stored and the table
    is of size M then the average search length is
    O(N/M).
  • Good for internal memory. Linked nodes may be in
    different blocks on disk and cause many disk
    accesses.
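
A minimal sketch of open hashing, assuming integer keys and h(K) = K mod M. Python lists stand in for the linked lists, and the class name is hypothetical.

```python
class ChainedHashTable:
    """Open hashing: each slot holds a chain of (key, record) pairs."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.slots = [[] for _ in range(size)]  # one chain per slot

    def _home(self, key: int) -> int:
        return key % self.size

    def insert(self, key: int, record) -> None:
        chain = self.slots[self._home(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                 # keys are unique: overwrite
                chain[i] = (key, record)
                return
        chain.append((key, record))      # collided items go in the chain

    def search(self, key: int):
        for k, record in self.slots[self._home(key)]:
            if k == key:
                return record
        return None                      # not found

table = ChainedHashTable(7)
table.insert(3, "a")
table.insert(10, "b")    # 10 mod 7 == 3, so this collides with key 3
print(table.search(10), table.search(17))
```

With N records spread over M chains, a search scans one chain of average length N/M, which matches the O(N/M) bound above.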

9
Closed Hashing - Linear Probe
  • If the item you are looking for is not in its
    hash position, look in the next position.
  • Do the same for insert until you find an empty
    location.
  • When you reach the end of the table, wrap around
    to the beginning.
  • The table must have at least one empty slot, or
    a search for a missing key will loop forever.
  • Tends to have clustering since the collision
    position is not uniformly distributed (e.g. if
    keys collide at position 4, probing goes to
    position 5, then 6, independent of the key).
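
Linear probing could look like the sketch below, assuming integer keys, h(K) = K mod M, and None as the empty marker. The loops are bounded by the table size, so a full table raises an error or reports a miss instead of looping forever.

```python
EMPTY = None

def linear_probe_insert(table: list, key: int) -> int:
    """Insert key and return the slot used."""
    m = len(table)
    home = key % m
    for i in range(m):
        pos = (home + i) % m              # wrap around at the end
        if table[pos] is EMPTY or table[pos] == key:
            table[pos] = key
            return pos
    raise RuntimeError("hash table is full")

def linear_probe_search(table: list, key: int) -> int:
    """Return the slot holding key, or -1 if it is absent."""
    m = len(table)
    home = key % m
    for i in range(m):
        pos = (home + i) % m
        if table[pos] is EMPTY:           # an empty slot ends the search
            return -1
        if table[pos] == key:
            return pos
    return -1

table = [EMPTY] * 10
for key in (4, 14, 24, 9, 19):
    print(key, "->", linear_probe_insert(table, key))
```

Keys 4, 14 and 24 all have home position 4 and end up in slots 4, 5 and 6, which is the clustering described above.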

10
Better Linear Probe
  • Instead of going to the next slot, skip by some
    constant c.
  • The table size M and c should be relatively prime.
  • This ensures the probe sequence visits every slot
    in the table.
  • Still has some clustering.
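
The effect of choosing c relatively prime to M can be seen by listing the slots a probe sequence reaches. The helper name and the values M = 10, c = 3 and c = 2 are assumptions for illustration.

```python
from math import gcd

def probe_sequence(home: int, c: int, m: int) -> list:
    """Slots visited when skipping by a constant c from the home slot."""
    return [(home + i * c) % m for i in range(m)]

M = 10
for c in (3, 2):
    visited = set(probe_sequence(0, c, M))
    # gcd(c, M) == 1 means every slot is reached; otherwise only M/gcd are.
    print(c, gcd(c, M), len(visited))
```

With c = 3 all 10 slots are reached; with c = 2 only the 5 even slots are, so an insert could fail even though half the table is empty.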

11
Quadratic Probe
  • Instead of adding 1 to the home position on each
    probe, add i^2.
  • i is the probe number, so the offsets are 1, 4,
    9, 16, ...
  • Remember to take the result mod the table size.
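
A short sketch of the quadratic probe sequence, assuming the offset i^2 is added to the home position and the result is taken mod the table size. The function name and the example values are hypothetical.

```python
def quadratic_probe_sequence(home: int, m: int, probes: int) -> list:
    """First `probes` slots visited: (home + i*i) mod m for i = 0, 1, 2, ..."""
    return [(home + i * i) % m for i in range(probes)]

# Home slot 4 in a table of size 11: offsets 0, 1, 4, 9, 16.
print(quadratic_probe_sequence(4, 11, 5))
```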

12
Double Hashing
  • After a collision, use a second hash function to
    choose the probe step.
  • Eliminates clustering to some degree.
  • For example, if h(K) causes a collision, then use
    the probe function p(K, i) = i * h2(K).
  • h2 is a different hash function from h.
  • Keys with the same home position generally get
    different probe sequences.
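
Double hashing might be sketched as below. The table size M = 11 and the particular h1 and h2 are assumptions; h2 is written so that it never returns 0, since a step of 0 would probe the same slot repeatedly.

```python
M = 11  # hypothetical prime table size

def h1(key: int) -> int:
    """Primary hash: the home position."""
    return key % M

def h2(key: int) -> int:
    """Secondary hash: the probe step, always in 1..M-1."""
    return 1 + (key % (M - 1))

def probe(key: int, i: int) -> int:
    """Slot for the i-th probe: home position plus p(K, i) = i * h2(K)."""
    return (h1(key) + i * h2(key)) % M

# Keys 15 and 26 share home slot 4 but then follow different sequences.
print([probe(15, i) for i in range(3)])
print([probe(26, i) for i in range(3)])
```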

13
Analysis of Closed Hashing
  • Load factor: lf = N/M
  • N is the number of records.
  • M is the size of the table.
  • N/M is the fraction of the table that is full.
  • The larger the load factor, the greater the
    probability of a collision.
  • Average search length is O(1/(1-lf)).
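
To see how quickly the 1/(1-lf) term grows, the sketch below evaluates it for a few load factors. The function names are hypothetical, and the value is only the growth term from the bound above, not an exact probe count for any particular probing scheme.

```python
def load_factor(n: int, m: int) -> float:
    """Fraction of the table in use: N records in M slots."""
    return n / m

def search_length_term(lf: float) -> float:
    """The 1/(1-lf) term; only defined for a table that is not full."""
    if not 0 <= lf < 1:
        raise ValueError("load factor must be in [0, 1)")
    return 1.0 / (1.0 - lf)

for n in (500, 750, 900):
    lf = load_factor(n, 1000)
    print(lf, round(search_length_term(lf), 2))
```

Going from half full to 90% full raises the term from 2 to about 10, which is why closed hash tables are usually kept well below full.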

14
Deletions
  • If we simply empty a slot when deleting a value,
    later searches may stop prematurely (the probe
    chain is broken).
  • Use a special mark to indicate that something was
    deleted. When searching, continue past this mark
    rather than stopping as if the slot were empty.
  • Once we have many deleted items, we may wish to
    rehash everything remaining.
  • It is best to rehash the most frequently accessed
    items first.
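
The deletion mark can be sketched on top of linear probing. The sentinel names EMPTY and DELETED, the integer keys, and h(K) = K mod M are assumptions for this example.

```python
EMPTY = None
DELETED = object()   # special mark: a record used to be here

def find(table: list, key: int) -> int:
    """Return the slot holding key, or -1. Searches continue past DELETED."""
    m = len(table)
    for i in range(m):
        pos = (key % m + i) % m
        if table[pos] is EMPTY:          # a truly empty slot ends the search
            return -1
        if table[pos] is not DELETED and table[pos] == key:
            return pos
    return -1

def delete(table: list, key: int) -> bool:
    """Mark the key's slot as DELETED instead of emptying it."""
    pos = find(table, key)
    if pos == -1:
        return False
    table[pos] = DELETED
    return True

table = [EMPTY] * 10
table[4], table[5], table[6] = 4, 14, 24   # all three have home slot 4
delete(table, 14)
print(find(table, 24))   # still found: the search skips the mark in slot 5
```

Had slot 5 been reset to EMPTY instead, the search for 24 would have stopped there and reported a miss.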