Hash table - PowerPoint PPT Presentation

Provided by: phi747
1
Hash table
  • MCCS240-111

function H(key: KeyType): integer;
2
A basic problem
  • We have to store some records and perform the
    following
  • add new record
  • delete record
  • search for a record by key
  • Find a way to do these efficiently!

3
Unsorted array
  • Use an array to store the records, in unsorted
    order
  • add - append the record as the last entry: fast, O(1)
  • delete a target - slow at finding the target,
    fast at filling the hole (just move the last
    entry into it): O(n)
  • search - sequential search: slow, O(n)
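
The three operations on an unsorted array can be sketched as below. This is only an illustration: the Student record layout, the field sizes and the names AddRecord, FindRecord and DeleteRecord are assumptions, not part of the original slides.

```pascal
program UnsortedArrayDemo;

const
  MAXRECORDS = 1000;

type
  KeyType = string[7];
  Student = record            { assumed record layout }
    key: KeyType;
    name: string[20];
    score: real;
  end;

var
  A: array [1..MAXRECORDS] of Student;
  count: integer;             { number of records currently stored }

{ add: append as the last entry, O(1) }
procedure AddRecord (k: KeyType; n: string; s: real);
begin
  count := count + 1;
  A[count].key := k;
  A[count].name := n;
  A[count].score := s
end;

{ search: sequential scan, O(n); returns 0 when the key is absent }
function FindRecord (k: KeyType): integer;
var
  i: integer;
begin
  FindRecord := 0;
  for i := 1 to count do
    if A[i].key = k then
    begin
      FindRecord := i;
      exit
    end
end;

{ delete: O(n) to find the target, then O(1) to fill the hole }
procedure DeleteRecord (k: KeyType);
var
  pos: integer;
begin
  pos := FindRecord(k);
  if pos > 0 then
  begin
    A[pos] := A[count];       { move the last entry into the hole }
    count := count - 1
  end
end;

begin
  count := 0;
  AddRecord('0012345', 'andy', 81.5);
  AddRecord('0033333', 'betty', 90);
  AddRecord('0056789', 'david', 56.8);
  DeleteRecord('0012345');
  writeln(count, ' records left; slot 1 now holds ', A[1].name)
end.
```

After the deletion, the last record (david) has been moved into slot 1, which is why the array does not stay in insertion order.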

4
Sorted array
  • Use an array to store the records, keeping them
    in sorted order
  • add - insert the record in its proper position;
    much record movement: slow, O(n)
  • delete a target - how to handle the hole after
    deletion? The later records must be shifted to close
    it; much record movement: slow, O(n)
  • search - binary search: fast, O(log n)
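
A minimal sketch of binary search on a sorted array of keys follows. The array Keys and the function name BinarySearch are assumptions made for illustration; the keys are the student ids from the example table used later in the slides, and comparing them as strings works here because they all have the same length.

```pascal
program BinarySearchDemo;

const
  N = 7;

type
  KeyType = string[7];

const
  { keys from the example table, already in sorted order }
  Keys: array [1..N] of KeyType =
    ('0012345', '0033333', '0056789', '9801010',
     '9802020', '9903030', '9908080');

{ returns the index of target in Keys, or 0 when absent; O(log n) }
function BinarySearch (target: KeyType): integer;
var
  lo, hi, mid: integer;
begin
  BinarySearch := 0;
  lo := 1;
  hi := N;
  while lo <= hi do
  begin
    mid := (lo + hi) div 2;
    if Keys[mid] = target then
    begin
      BinarySearch := mid;
      exit
    end
    else if Keys[mid] < target then
      lo := mid + 1           { target can only be in the upper half }
    else
      hi := mid - 1           { target can only be in the lower half }
  end
end;

begin
  writeln(BinarySearch('9802020'));   { present: prints 5 }
  writeln(BinarySearch('5555555'))    { absent: prints 0 }
end.
```

Each iteration halves the range still under consideration, which is where the O(log n) bound comes from.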

5
Linked list
  • Store the records in a linked list (sorted /
    unsorted)
  • add - fast if the node can be inserted anywhere: O(1)
  • delete a target - fast at disposing of the node, but
    slow at finding the target: O(n)
  • search - sequential search: slow, O(n) (with only
    a linked list, we cannot use binary search even
    if the list is sorted)

6
What we will learn
  • We will learn two more approaches, which have
    better performance but are more complex
  • Hash table
  • Tree

7
Array as table
studid    name    score
0012345   andy    81.5
0033333   betty   90
0056789   david   56.8
...
9801010   peter   20
9802020   mary    100
...
9903030   tom     73
9908080   bill    49
Consider this problem. We want to store 1000
student records and search them by student id.
8
Array as table
One naive way is to store the records in a
huge array (index 0..9999999). The student id is used
as the index, i.e. the record of the student
with studid 0012345 is stored at A[12345].

index (studid)   name    score
0
...
12345            andy    81.5
...
33333            betty   90
...
56789            david   56.8
...
9908080          bill    49
...
9999999
9
Array as table
  • Store the records in a huge array where the index
    corresponds to the key
  • add - very fast O(1)
  • delete - very fast O(1)
  • search - very fast O(1)
  • But it wastes a lot of memory (10,000,000 slots
    for only 1000 records)! Not feasible.
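
To get a rough feel for the waste, the sketch below estimates the memory such an array would need. The Student record layout is an assumption, so the byte counts are only indicative and depend on the compiler.

```pascal
program DirectAddressCost;

type
  Student = record            { assumed record layout }
    name: string[20];
    score: real;
  end;

const
  SLOTS = 10000000;           { one slot per index 0..9999999 }
  RECORDS = 1000;             { records actually stored }

begin
  writeln('bytes per slot:  ', SizeOf(Student));
  writeln('bytes for table: ', 1.0 * SLOTS * SizeOf(Student):0:0);
  writeln('slots in use:    ', RECORDS / SLOTS * 100:0:2, ' %')
end.
```

Only 0.01 % of the slots would ever hold a record, whatever the exact record size turns out to be.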

10
Hash function
function Hash(key: KeyType): integer;
Imagine that we have such a magic function Hash.
It maps the keys (studid) of the 1000 records onto
the integers 0..999, one to one. No two
different keys map to the same number.
H(0012345) = 134, H(0033333) = 67, H(0056789) = 764, H(9908080) = 3
11
Hash table
To store a record, we compute Hash(studid) for
the record and store it at the location
Hash(studid) of the array. To search for a
student, we only need to peek at the location
Hash(target studid).

index   studid    name    score
0
...
3       9908080   bill    49
...
67      0033333   betty   90
...
134     0012345   andy    81.5
...
764     0056789   david   56.8
...
999
12
Hash table with Perfect Hash
  • Such a magic function is called a perfect hash
  • add - very fast O(1)
  • delete - very fast O(1)
  • search - very fast O(1)
  • But it is generally difficult to design a perfect
    hash (e.g. when the potential key space is large)

13
Hash function
  • A hash function maps a key to an index within
    a range
  • Desirable properties
  • simple and quick to calculate
  • even distribution, avoid collision as much as
    possible

function Hash(key: KeyType): integer;
14
Hash function
const
  HASHSIZE = 99;
type
  HashTable = array [0..HASHSIZE-1] of something;

function Hash (x: keytype): integer;
var
  i: integer;
  h: integer;
begin
  h := 0;
  for i := 1 to 7 do
    h := 4 * h + ord(x[i]);
  Hash := h mod HASHSIZE
end;

A hash function should try to mix the information
in the key and convert it to an index within
the range of the hash table (the hash address).
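
The hash function above can be tried out with a small program such as the following sketch. Two assumptions are made: KeyType is taken to be a 7-character string, and the accumulator is declared as longint, because with a 7-digit key the running value exceeds 32767 and would overflow on compilers where integer is 16 bits.

```pascal
program HashDemo;

const
  HASHSIZE = 99;

type
  KeyType = string[7];        { assumed key type: 7-character student id }

function Hash (x: KeyType): integer;
var
  i: integer;
  h: longint;                 { wide enough that 4 * h cannot overflow here }
begin
  h := 0;
  for i := 1 to 7 do
    h := 4 * h + ord(x[i]);   { mix every character of the key into h }
  Hash := h mod HASHSIZE      { reduce to a hash address 0..HASHSIZE-1 }
end;

begin
  writeln('Hash(0012345) = ', Hash('0012345'));
  writeln('Hash(0033333) = ', Hash('0033333'));
  writeln('Hash(9908080) = ', Hash('9908080'))
end.
```

The addresses printed will not match the values 134, 67 and 3 quoted on the earlier slides; those belong to the imagined perfect hash over 0..999, not to this function.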
15
Collision
  • In most cases, we cannot avoid collisions
  • Collision resolution - how to handle the case where two
    different keys map to the same index

H(0012345) = 134, H(0033333) = 67, H(0056789) = 764, H(9903030) = 3, H(9908080) = 3
16
Operations on Hash Table
  • CreateTable(H)
  • InsertTable(H, newEntry)
  • DeleteTable(target, x, H)
  • RetrieveTable(target, H, found, targetEntry)
  • ClearTable(H)

The user of a hash table does not care how the
entries are stored inside the hash table. The
user only adds, deletes and retrieves entries using
a known key.
17
Chained Hash Table
One way to handle collisions is to store the
collided records in a linked list. The array now
stores pointers to such lists. If no key maps to
a certain hash value, that array entry points to
nil.

index     entry
0         (pointer to list)
1         nil
2         nil
3         (pointer to list)
4         nil
5         (pointer to list)
...
HASHMAX   nil

Example list node: Key 9903030, name tom, score 73
18
Chained Hash table
  • Hash table, where collided records are stored in
    a linked list
  • With a good hash function and an appropriate hash size
      • Few collisions. Add, delete, search very fast:
        O(1)
  • Otherwise
      • some hash values have a long list of collided
        records
      • add - just insert at the head: fast, O(1)
      • delete a target - delete from an unsorted linked
        list: slow, O(n)
      • search - sequential search: slow, O(n)
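
The behaviour listed above can be sketched in one self-contained program. It does not use the SList unit of the course; the Node record, the parameter lists and the reuse of the earlier hash formula are assumptions made so that the example compiles on its own.

```pascal
program ChainedHashDemo;

const
  HASHSIZE = 997;
  HASHMAX = HASHSIZE - 1;

type
  KeyType = string[7];
  NodePtr = ^Node;
  Node = record               { assumed node layout }
    key: KeyType;
    name: string[20];
    score: real;
    next: NodePtr;
  end;
  HashTable = array [0..HASHMAX] of NodePtr;

function Hash (x: KeyType): integer;
var
  i: integer;
  h: longint;
begin
  h := 0;
  for i := 1 to length(x) do
    h := 4 * h + ord(x[i]);
  Hash := h mod HASHSIZE
end;

{ add: insert at the head of the chain, O(1) }
procedure InsertTable (var T: HashTable; k: KeyType; n: string; s: real);
var
  p: NodePtr;
  a: integer;
begin
  a := Hash(k);
  new(p);
  p^.key := k;
  p^.name := n;
  p^.score := s;
  p^.next := T[a];
  T[a] := p
end;

{ search: sequential search along one chain; returns nil when absent }
function RetrieveTable (var T: HashTable; k: KeyType): NodePtr;
var
  p: NodePtr;
  found: boolean;
begin
  p := T[Hash(k)];
  found := false;
  while (p <> nil) and not found do
    if p^.key = k then
      found := true
    else
      p := p^.next;
  RetrieveTable := p
end;

{ delete: find the node in its chain, then unlink and dispose of it }
procedure DeleteTable (var T: HashTable; k: KeyType);
var
  p, prev: NodePtr;
  a: integer;
  found: boolean;
begin
  a := Hash(k);
  prev := nil;
  p := T[a];
  found := false;
  while (p <> nil) and not found do
    if p^.key = k then
      found := true
    else
    begin
      prev := p;
      p := p^.next
    end;
  if found then
  begin
    if prev = nil then
      T[a] := p^.next         { target was the head of the chain }
    else
      prev^.next := p^.next;
    dispose(p)
  end
end;

var
  Table: HashTable;
  idx: integer;
  hit: NodePtr;

begin
  for idx := 0 to HASHMAX do
    Table[idx] := nil;        { every chain starts empty }
  InsertTable(Table, '9903030', 'tom', 73);
  InsertTable(Table, '9908080', 'bill', 49);
  DeleteTable(Table, '9908080');
  hit := RetrieveTable(Table, '9903030');
  if hit <> nil then
    writeln(hit^.name, ' ', hit^.score:0:1);
  if RetrieveTable(Table, '9908080') = nil then
    writeln('9908080 not found')
end.
```

Insertion never walks the chain, while retrieval and deletion cost time proportional to the length of one chain, so they stay fast only as long as the chains stay short.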

19
Implementation
The Hash table unit reuses the singly linked list
we learnt earlier. TableEntry = ListEntry is the
record we store in the Hash table.

unit HashTab;
interface
uses SList;
const
  HASHSIZE = 997;
  HASHMAX = HASHSIZE-1;
type
  TableEntry = ListEntry;
  HashAddress = 0..HASHMAX;
  HashTable = array [HashAddress] of List;

HASHSIZE should be an odd number, and even better, a
prime number. This helps to reduce collisions.
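
One common explanation for the advice on table size is that keys sharing a factor with the table size crowd into a few slots. The sketch below illustrates this with made-up values: the 100 keys 0, 10, 20, ..., 990 are reduced modulo a table size of 100 and then modulo the prime 97. The function name SlotsUsed and the choice of keys are assumptions for the demonstration only.

```pascal
program TableSizeDemo;

{ counts how many distinct slots the keys 0, 10, 20, ..., 990 occupy
  when reduced modulo the given table size (size must not exceed 1000) }
function SlotsUsed (size: integer): integer;
var
  used: array [0..999] of boolean;
  i, n: integer;
begin
  for i := 0 to size - 1 do
    used[i] := false;
  for i := 0 to 99 do
    used[(10 * i) mod size] := true;
  n := 0;
  for i := 0 to size - 1 do
    if used[i] then
      n := n + 1;
  SlotsUsed := n
end;

begin
  writeln('size 100: ', SlotsUsed(100), ' slots used');
  writeln('size  97: ', SlotsUsed(97), ' slots used')
end.
```

With size 100 the keys land in only 10 slots, so every occupied slot holds 10 collided keys; with size 97 they spread over all 97 slots.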
20
Create Table
procedure CreateTable (var H: HashTable);
var
  i: integer;
begin
  for i := 0 to HASHMAX do
    CreateList(H[i])
end;
21
Insert Table
procedure InsertTable (var H: HashTable;
                       newEntry: TableEntry);
{ Pre: ... H contains no current entry with key equal to that of
  newEntry. }
begin
  InsertList(1, newEntry, H[Hash(newEntry.key)])
end;
22
Delete Table
procedure DeleteTable (target: KeyType; var x: TableEntry;
                       var H: HashTable);
var
  pos: Position;     { position of the target in the list }
  haddr: integer;    { hash address of the target }
  found: boolean;
begin
  haddr := Hash(target);
  SequentialSearch(H[haddr], target, found, pos);
  if found then
    DeleteList(pos, x, H[haddr])
  else
    Error( )
end;
23
Retrieve Table
procedure RetrieveTable (target: KeyType; var H: HashTable;
                         var found: boolean;
                         var targetEntry: TableEntry);
var
  L: List;
  pos: Position;
begin
  L := H[Hash(target)];
  SequentialSearch(L, target, found, pos);
  if found then
    RetrieveList(pos, targetEntry, L)
end;

SequentialSearch is the sequential search implementation on a
linked list.