Title: Hash Table
1Hash Table
SoongSil Univ.
MultiMedia Lab.
Baek Hae Jung
2Contents
- 1. Direct-address table
- 2. Hash tables
- 3. Hash functions
- Division method
- Multiplication method
- Universal hashing
- 4. Open addressing
- Linear probing
- Quadratic probing
- Double hashing
3Direct-address table
[Figure: direct-address table T with slots 0-9. The universe of keys is U; the actual keys K = {2, 3, 5, 8} occupy slots 2, 3, 5, and 8, and every other slot holds NIL (/). Key k1 maps to slot k1.]
4Direct-address table
- Direct-address table T[0..m-1]
- Each key in the universe corresponds to an index (slot) in the table
- The set of actual keys determines the slots in the table that contain pointers to elements
- The other slots contain NIL (/)
5Operations
- Direct-address-search(T, k): return T[k]
- Direct-address-insert(T, x): T[key[x]] ← x
- Direct-address-delete(T, x): T[key[x]] ← NIL
- Time complexity: O(1) for each operation
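The three operations can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the class name `DirectAddressTable` is hypothetical, `None` stands in for NIL, and elements are passed as a (key, value) pair rather than as an object x with a key[x] field.

```python
class DirectAddressTable:
    """Direct-address table over the key universe {0, 1, ..., m-1}."""

    def __init__(self, m):
        self.slots = [None] * m  # None plays the role of NIL (/)

    def search(self, key):
        return self.slots[key]   # T[k]

    def insert(self, key, value):
        self.slots[key] = value  # T[key[x]] <- x

    def delete(self, key):
        self.slots[key] = None   # T[key[x]] <- NIL


table = DirectAddressTable(10)
for key in (2, 3, 5, 8):         # the actual keys from the figure
    table.insert(key, f"element-{key}")
table.delete(5)
```

Each operation is a single array access, which is where the O(1) bound comes from.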
6Hashing
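[Figure: hashing. A hash function h maps the actual keys K = {k1, ..., k5}, drawn from the universe U, to slots h(k1), ..., h(k5) of table T. Keys k2 and k5 hash to the same slot, and unused slots hold NIL (/). Key k1 maps to slot h(k1).]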
7Basic Idea
- Direct addressing: key k1 → slot k1
- Hashing: key k1 → function h → slot h(k1)
- An element with key k hashes to slot h(k)
- h(k) is the hash value of key k
8Collision
- Collision
- Two keys hash to the same slot under the hash function
- Cannot be ruled out when |U| > m: the universe has more keys than the table has slots
9Collision Resolution Policies
- Two classes
- (1) Open hashing (separate chaining)
- (2) Closed hashing (open addressing) [Sec. 12.4]
- The difference has to do with
- whether collisions are stored outside the table (open hashing), or
- whether collisions result in storing one of the records at another slot in the table (closed hashing)
10Open Hashing
- Collision resolution by chaining
- Chaining
- Put all the elements that hash to the same slot
in a linked list
11Example of Collision
12Operations
- Chained-hash-insert(T, x)
- insert x at the head of list T[h(key[x])]
- Chained-hash-delete(T, x)
- delete x from the list T[h(key[x])]
- Insert/delete time complexity: O(1) (for delete, given a pointer to x in a doubly linked list)
- Chained-hash-search(T, k)
- search for an element with key k in list T[h(k)]
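A small Python sketch of chaining, under stated assumptions: the class name `ChainedHashTable` is hypothetical, the hash function is taken to be k mod m, and a `collections.deque` stands in for the linked list. Deleting by key here scans the chain, unlike the O(1) pointer-based delete.

```python
from collections import deque


class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.chains = [deque() for _ in range(m)]  # one chain per slot

    def _h(self, key):
        return key % self.m  # assumed hash function (division method)

    def insert(self, key, value):
        # Insert at the head of the chain for slot h(key).
        self.chains[self._h(key)].appendleft((key, value))

    def search(self, key):
        # Walk the chain for slot h(key) looking for the key.
        for k, v in self.chains[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.chains[self._h(key)]
        for item in chain:
            if item[0] == key:
                chain.remove(item)  # return at once; iteration stops here
                return True
        return False
```

With m = 3 and keys 1 to 6 inserted, every chain holds two elements, so a search inspects at most two entries.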
13Analysis of open hashing
- Load factor (α)
- The average number of elements stored in a chain
- α = n / m, for n elements stored in m slots
- Ex) 3 slots, 6 elements: α = 6 / 3 = 2
[Figure: chains holding the keys k1 to k6 in a 3-slot table.]
- Search time complexity: Θ(n) in the worst case (all n keys hash to the same slot)
14Simple uniform hashing
- Average performance of hashing depends on
- how well the hash function h distributes the set of keys [Sec. 12.3]
- Assumption of simple uniform hashing
- Any given element is equally likely to hash into any of the m slots,
- independently of where any other element has hashed to
- Insertion/delete/search time complexity
- O(1) on average, as long as the load factor stays bounded by a constant
15Analysis of hashing with chaining
In a hash table in which collisions are resolved by chaining, an unsuccessful search takes time Θ(1 + α) on average, where α is the load factor.
16Analysis of hashing with chaining
In a hash table in which collisions are resolved by chaining, a successful search takes time Θ(1 + α) on average, where α is the load factor.
17Hash functions
- A good hash function
- Avoids collisions
- Minimizes the chance that closely related keys (variants) hash to the same slot
- Tends to spread keys evenly in the array
- Satisfies the assumption of simple uniform hashing
- Is easy to compute
- If keys are drawn from U according to a probability distribution P, simple uniform hashing means
Σ_{k : h(k) = j} P(k) = 1/m for j = 0, 1, ..., m-1
18Interpreting keys as natural number
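- Most hash functions assume
- the universe of keys is the set {0, 1, 2, ...} of natural numbers
- Key 30 → slot 30
- Key 14452 ("pt") → function h → slot h(k1)
- Ex) pt: p = 112, t = 116, so 112 · 128 + 116 · 1 = 14452
- => the string is read as a radix-128 integer whose digits are the ASCII values of its letters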
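The conversion of a string key to a natural number can be sketched as below. The function name `string_to_natural` is hypothetical; the radix of 128 matches the "pt" example and assumes plain ASCII input.

```python
def string_to_natural(s, radix=128):
    """Interpret the characters of s as digits of a radix-`radix` integer."""
    n = 0
    for ch in s:
        n = n * radix + ord(ch)  # shift previous digits left, append this one
    return n


key = string_to_natural("pt")  # 112 * 128 + 116
```

For "pt" this reproduces the value 14452 from the slide.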
19Three schemes for Hash function
- Division method
- h(k) = k mod m
- Multiplication method
- h(k) = ⌊m (kA mod 1)⌋
- Universal hashing
- Choose the hash function randomly
20Division method
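- Hash function
- h(k) = k mod m
- ex) k = 123, m = 15
- => h(123) = 123 mod 15 = 3
- Certain values of m should not be used
- m even
- m a power of 2
- m a power of 10, if the keys are decimal numbers
- m = 2^p - 1, if k is a character string interpreted in radix 2^p
- ex) radix 8, m = 7
- abcd: 97·8^3 + 98·8^2 + 99·8 + 100 = 56828, and 56828 mod 7 = 2
- badc: 98·8^3 + 97·8^2 + 100·8 + 99 = 57283, and 57283 mod 7 = 2
- => permuting the characters of the key does not change its hash value
- cf) Good values of m are primes
- => primes not too close to exact powers of 2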
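The division method and the permutation problem for m = 2^p - 1 can be checked with a short sketch. The helper names `division_hash` and `radix_value` are hypothetical; the radix of 8 and m = 7 follow the abcd/badc example.

```python
def division_hash(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


def radix_value(s, radix):
    """Read the character codes of s as digits of a radix-`radix` integer."""
    n = 0
    for ch in s:
        n = n * radix + ord(ch)
    return n


abcd = radix_value("abcd", 8)   # 97*8^3 + 98*8^2 + 99*8 + 100
badc = radix_value("badc", 8)   # 98*8^3 + 97*8^2 + 100*8 + 99
same_slot = division_hash(abcd, 7) == division_hash(badc, 7)
```

Here `same_slot` is true: with radix 8 and m = 7, every power of the radix is congruent to 1 mod m, so the hash depends only on the sum of the character codes.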
21Multiplication method
- Hash function
- h(k) = ⌊m (kA mod 1)⌋
- Two steps in the multiplication method
- 1. The key k is multiplied by a constant A in the range 0 < A < 1, and the fractional part of kA is extracted
- 2. This fractional part is multiplied by m and the floor is taken
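The two steps can be sketched in Python as follows. The constant A = (√5 - 1)/2 ≈ 0.618 is an assumption (a commonly suggested choice), as are the function name `multiplication_hash` and the sample values k = 123456 and m = 10000; none of these come from the slide. Floating-point arithmetic is used for brevity, which is adequate only for moderately sized keys.

```python
import math

A = (math.sqrt(5) - 1) / 2  # assumed constant, roughly 0.6180339887


def multiplication_hash(k, m, a=A):
    fractional = (k * a) % 1.0         # step 1: fractional part of k*A
    return math.floor(m * fractional)  # step 2: scale by m and take the floor


slot = multiplication_hash(123456, 10000)
```

For these sample values, k·A ≈ 76300.0041151, so the fractional part is about 0.0041151 and the slot is ⌊41.151⌋ = 41. Unlike the division method, the choice of m is not critical here.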
22Analysis of Universal hashing
If h is chosen from a universal collection of hash functions and is used to hash n keys into a table of size m, where n ≤ m, the expected number of collisions involving a particular key x is less than 1.
The expected number is (n - 1) / m, which is less than 1 because n ≤ m.
23Analysis of Universal hashing
The class H defined by equations (12.3) and (12.4) is a universal class of hash functions.
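Equations (12.3) and (12.4) are not reproduced on the slide, so the following is only an assumed illustration of choosing a hash function at random. It uses one well-known universal family: the key is split into pieces x_0, ..., x_r, each smaller than a prime m, and h_a(x) = (Σ a_i · x_i) mod m for randomly chosen coefficients a_i. The name `make_universal_hash` and the sample values are hypothetical.

```python
import random


def make_universal_hash(m, num_pieces, rng=random):
    """Pick h_a at random; m is assumed prime and every key piece < m."""
    a = [rng.randrange(m) for _ in range(num_pieces)]  # random coefficients

    def h(pieces):
        # Dot product of coefficients and key pieces, reduced mod m.
        return sum(ai * xi for ai, xi in zip(a, pieces)) % m

    return h


h = make_universal_hash(257, 4, random.Random(0))  # seeded only for repeatability
slot = h((112, 116, 0, 1))
```

The function is chosen once, when the table is created, and then used for every key; the randomness is in the choice of h, not in each lookup.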
24Open addressing
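- Collision resolution policy
- All elements are stored in the hash table itself
- Each table entry contains
- either an element of the dynamic set or NIL
- The hash table can fill up, so that
- no further insertions can be made
- Strengths
- Saves memory: no pointers are stored
- The memory saved can provide more slots, potentially giving fewer collisions
- and faster retrieval
[Figure: open-address table T with slots 0-6. Slot 2 holds 69, slot 3 holds 98, slot 5 holds 72, slot 6 holds 14; slots 0, 1, and 4 hold NIL (/).]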
25Probe
- Probe
- To insert, we successively examine (probe) the hash table until we find an empty slot in which to put the key
- The sequence of positions probed depends upon the key being inserted
26Operations
Hash-Insert(T, k)
  i ← 0
  repeat j ← h(k, i)
         if T[j] = NIL
            then T[j] ← k
                 return j
            else i ← i + 1
  until i = m
  error "hash table overflow"

Hash-Search(T, k)
  i ← 0
  repeat j ← h(k, i)
         if T[j] = k
            then return j
         i ← i + 1
  until T[j] = NIL or i = m
  return NIL
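A runnable Python sketch of Hash-Insert and Hash-Search follows. The slide does not say which probe function h(k, i) it uses, so linear probing on k mod m is assumed here; as a result the keys 69, 98, 72, 14 land in different slots than in the slide's figure.

```python
NIL = None


def probe(k, i, m):
    # Assumed probe function h(k, i): linear probing on k mod m.
    return (k % m + i) % m


def hash_insert(T, k):
    m = len(T)
    for i in range(m):               # i = 0, 1, ..., m-1
        j = probe(k, i, m)
        if T[j] is NIL:
            T[j] = k
            return j
    raise OverflowError("hash table overflow")


def hash_search(T, k):
    m = len(T)
    for i in range(m):
        j = probe(k, i, m)
        if T[j] == k:
            return j
        if T[j] is NIL:              # empty slot: k cannot be further along
            break
    return NIL


T = [NIL] * 7
slots = [hash_insert(T, k) for k in (69, 98, 72, 14)]
```

Under this assumed probe function, 14 collides with 98 at slot 0 and is placed in slot 1.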
27Operations
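- Delete
- Mark the slot with the special value DELETED instead of NIL
- Ex1) Delete 98 by storing NIL, then search for 100: the search stops at the NIL in slot 3 and never reaches 100 in slot 4
- Ex2) Delete 98 by storing DELETED, then search for 100: the search passes over slot 3 and finds 100 in slot 4
[Figure: two copies of table T (slots 0-6) holding 69, 100, 72, and 14 in slots 2, 4, 5, and 6. Slot 3, which held 98, is NIL (/) in one copy and DELETED in the other.]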
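The effect of the DELETED marker can be reproduced with a short sketch. Since the slide's hash function is unknown, linear probing on k mod 7 is assumed, and the keys 10 and 17 (both with home slot 3) replace the slide's 98 and 100; all function names are hypothetical.

```python
NIL = None
DELETED = object()  # sentinel marking a slot whose key was removed


def probe(k, i, m):
    return (k % m + i) % m  # assumed: linear probing on k mod m


def insert(T, k):
    for i in range(len(T)):
        j = probe(k, i, len(T))
        if T[j] is NIL or T[j] is DELETED:  # DELETED slots may be reused
            T[j] = k
            return j
    raise OverflowError("hash table overflow")


def search(T, k):
    for i in range(len(T)):
        j = probe(k, i, len(T))
        if T[j] is NIL:
            return None                     # a truly empty slot ends the search
        if T[j] is not DELETED and T[j] == k:
            return j
    return None


def delete(T, k, marker=DELETED):
    j = search(T, k)
    if j is not None:
        T[j] = marker
    return j


naive, marked = [NIL] * 7, [NIL] * 7
for table in (naive, marked):
    insert(table, 10)   # home slot 3
    insert(table, 17)   # home slot 3 is taken, so it lands in slot 4
delete(naive, 10, marker=NIL)   # Ex1: erase with NIL
delete(marked, 10)              # Ex2: erase with DELETED
lost, found = search(naive, 17), search(marked, 17)
```

In the NIL variant the key 17 becomes unreachable even though it is still stored; with DELETED it is found in slot 4.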
28Three Techniques of probing
29Linear probing
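- Table size D = 8; keys a, b, c, d have hash values h(a) = 3, h(b) = 0, h(c) = 4, h(d) = 3
- Where do we insert d? Slot 3 is already filled
- Probe sequence using linear probing
- h1(d) = (h(d) + 1) % 8 = 4 % 8 = 4
- h2(d) = (h(d) + 2) % 8 = 5 % 8 = 5
- h3(d) = (h(d) + 3) % 8 = 6 % 8 = 6
- etc.
- 7, 0, 1, 2
- Wraps around to the beginning of the table!
- Slot 4 already holds c, so d goes into slot 5
[Figure: table with slots 0-7. b is in slot 0, a in slot 3, c in slot 4, d in slot 5.]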
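The example can be replayed in Python. This is a sketch: the function name `linear_probe_insert` is hypothetical, and the home slots are supplied directly because the slide gives only the hash values, not the hash function.

```python
def linear_probe_insert(table, key, home):
    """Try home, home+1, home+2, ... (mod table size) until a free slot."""
    m = len(table)
    for i in range(m):
        j = (home + i) % m   # wraps around to the beginning of the table
        if table[j] is None:
            table[j] = key
            return j
    raise OverflowError("hash table overflow")


table = [None] * 8
homes = {"a": 3, "b": 0, "c": 4, "d": 3}   # hash values from the slide
placed = {key: linear_probe_insert(table, key, home)
          for key, home in homes.items()}
```

Key d finds slots 3 and 4 occupied and ends up in slot 5, matching the figure.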
30Quadratic probing
- Quadratic probing uses a hash function of the form
- h(k, p) = (h(k) + c1·p + c2·p^2) mod m
- Of course, the values of c1, c2, and m determine whether or not the entire table will be used.
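How the constants affect table coverage can be seen by listing a probe sequence. The values below (home slot 3, m = 8, c1 = c2 = 1) are arbitrary assumptions chosen for illustration, and the function name is hypothetical.

```python
def quadratic_probe_sequence(home, m, c1, c2):
    """Slots visited for p = 0, 1, ..., m-1 under quadratic probing."""
    return [(home + c1 * p + c2 * p * p) % m for p in range(m)]


seq = quadratic_probe_sequence(3, 8, 1, 1)
distinct = sorted(set(seq))
```

With these constants the sequence is 3, 5, 1, 7, 7, 1, 5, 3, so only four of the eight slots are ever probed and an insertion can fail while the table is half empty.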
31Analysis of open-address hashing
Given an open-address hash table with load factor α = n/m < 1, the expected number of probes in an unsuccessful search is at most 1/(1-α), assuming uniform hashing.
32Analysis of open-address hashing
Inserting an element into an open-address hash table with load factor α requires at most 1/(1-α) probes on average, assuming uniform hashing.
33Analysis of open-address hashing
Given an open-address hash table with load factor α < 1, the expected number of probes in a successful search is at most (1/α) ln(1/(1-α)), assuming uniform hashing and assuming that each key in the table is equally likely to be searched for.
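To get a feel for these bounds, they can be evaluated at a few load factors. The unsuccessful-search bound 1/(1-α) comes from the slides; the successful-search bound used here, (1/α) ln(1/(1-α)), is the form given in standard textbook treatments and should be read as an assumption as far as these slides go. The function names are hypothetical.

```python
import math


def unsuccessful_bound(alpha):
    """Upper bound on expected probes, unsuccessful search or insertion."""
    return 1 / (1 - alpha)


def successful_bound(alpha):
    """Assumed upper bound on expected probes in a successful search."""
    return (1 / alpha) * math.log(1 / (1 - alpha))


for alpha in (0.5, 0.75, 0.9):
    print(f"alpha={alpha}: unsuccessful <= {unsuccessful_bound(alpha):.2f}, "
          f"successful <= {successful_bound(alpha):.2f}")
```

At α = 0.5 an unsuccessful search needs at most 2 probes on average, while at α = 0.9 the bound grows to 10, which is why open-address tables are usually kept well below full.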
34Simulations
- Linear probing
- Quadratic probing
http://swww.ee.uwa.edu.au/plsd210/ds/hash_tables.html
35Another Scheme Overflow Area
- Divide the pre-allocated table into two sections
- primary area, to which keys are mapped
- overflow area, which holds the keys that collide
- Possible to design systems
- with multiple overflow tables
- which provide flexibility without losing
- the advantages of the overflow scheme.
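One way the primary/overflow split could look in code is sketched below. The details are assumptions, not taken from the slide: the class name `OverflowHashTable` is hypothetical, the hash function is k mod (primary size), colliding keys are appended to the overflow area in arrival order, and the overflow area is searched linearly.

```python
class OverflowHashTable:
    def __init__(self, primary_size, overflow_size):
        self.primary = [None] * primary_size    # keys are hashed into this area
        self.overflow = [None] * overflow_size  # colliding keys go here
        self.overflow_used = 0

    def _h(self, key):
        return key % len(self.primary)  # assumed hash function

    def insert(self, key):
        j = self._h(key)
        if self.primary[j] is None:
            self.primary[j] = key
            return ("primary", j)
        if self.overflow_used == len(self.overflow):
            raise OverflowError("overflow area is full")
        self.overflow[self.overflow_used] = key
        self.overflow_used += 1
        return ("overflow", self.overflow_used - 1)

    def search(self, key):
        j = self._h(key)
        if self.primary[j] == key:
            return ("primary", j)
        for i in range(self.overflow_used):  # linear scan of used entries
            if self.overflow[i] == key:
                return ("overflow", i)
        return None
```

For example, with a primary area of 7 slots, keys 10 and 17 both hash to slot 3: the first stays in the primary area and the second is placed at the start of the overflow area.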
36Summary