Title: Hash Tables
1Hash Tables
- a hash table is an array of size Tsize
- has index positions 0 .. Tsize-1
- two types of hash tables
- open hash table
- array element type is a <key, value> pair
- all items stored in the array
- chained hash table
- element type is a pointer to a linked list of
nodes containing <key, value> pairs
- items are stored in the linked list nodes
- keys are used to generate an array index
- home address (0 .. Tsize-1)
2faster searching
- "balanced" search trees guarantee O(log2 n)
search path by controlling height of the search
tree
- AVL tree
- 2-3-4 tree
- red-black tree (used by STL associative container
classes)
- hash table allows for O(1) average search performance
- search time does not increase as n increases,
provided the load factor stays bounded
3Considerations
- How big an array?
- load factor of a hash table is n/Tsize
- Hash function to use?
- int hash(KeyType key) // 0 .. Tsize-1
- Collision resolution strategy?
- hash function is many-to-one
4Hash Function
- a hash function is used to map a key to an array
index (home address)
- search starts from here
- insert, retrieve, update, delete all start by
applying the hash function to the key
5Some hash functions
- if KeyType is int - key % Tsize
- if KeyType is a string - convert to an integer
and then % Tsize
- goals for a hash function
- fast to compute
- even distribution
- cannot guarantee no collisions unless all key
values are known in advance
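The two cases above can be sketched in C++ roughly as follows. The function names `hashInt` and `hashString`, the multiplier 31, and the table size 7 are illustrative assumptions; the slides do not prescribe a particular string-to-integer conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Illustrative table size (an assumption; any positive size works).
const std::size_t Tsize = 7;

// Integer key: reduce modulo the table size to get a home address.
std::size_t hashInt(unsigned int key) {
    return key % Tsize;
}

// String key: fold the characters into an integer, then reduce modulo Tsize.
// The multiplier 31 is a common choice, used here only as an example.
std::size_t hashString(const std::string& key) {
    std::size_t h = 0;
    for (unsigned char c : key) {
        h = h * 31 + c;  // unsigned arithmetic wraps around on overflow
    }
    return h % Tsize;
}
```

Both functions are fast to compute and always return a value in 0 .. Tsize-1; how evenly they distribute keys depends on the actual data.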
6An Open Hash Table
Hash (key) produces an index in the range 0 to
6. That index is the home address
Some insertions: K1 --> 3, K2 --> 5, K3 --> 2
index  key  value
0
1
2      K3   K3info
3      K1   K1info
4
5      K2   K2info
6
7Handling Collisions
Some more insertions: K4 --> 3, K5 --> 2, K6 --> 4
Linear probing collision resolution strategy
index  key  value
0      K6   K6info
1
2      K3   K3info
3      K1   K1info
4      K4   K4info
5      K2   K2info
6      K5   K5info
8Search Performance
Average number of probes needed to retrieve the
value with key K?
Table contents by index: 0 K6, 1 (empty), 2 K3, 3 K1, 4 K4, 5 K2, 6 K5
K    hash(K)  probes
K1   3        1
K2   5        1
K3   2        1
K4   3        2
K5   2        5
K6   4        4
14/6 = 2.33 (successful)
unsuccessful search?
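The probe counts above can be reproduced with a small linear-probing table. This is a minimal sketch, assuming non-negative integer keys hashed with key % Tsize; the class name `OpenTable` and the stand-in keys 3, 5, 2, 10, 9, 11 (chosen so that they hash to the home addresses of K1 .. K6 when Tsize is 7) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kEmpty = -1;  // marks a slot that holds no key (keys must be >= 0)

class OpenTable {
public:
    explicit OpenTable(std::size_t tsize) : slots_(tsize, kEmpty) {}

    // Insert using linear probing; returns false if the table is full.
    bool insert(int key) {
        assert(key >= 0);
        const std::size_t home = hash(key);
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            const std::size_t pos = (home + i) % slots_.size();
            if (slots_[pos] == kEmpty) {
                slots_[pos] = key;
                return true;
            }
        }
        return false;
    }

    // Number of slots examined to find key; 0 if the key is not present.
    std::size_t probes(int key) const {
        const std::size_t home = hash(key);
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            const std::size_t pos = (home + i) % slots_.size();
            if (slots_[pos] == kEmpty) return 0;   // empty slot ends the search
            if (slots_[pos] == key) return i + 1;
        }
        return 0;
    }

private:
    std::size_t hash(int key) const {
        return static_cast<std::size_t>(key) % slots_.size();
    }
    std::vector<int> slots_;
};
```

Inserting the six stand-in keys in order into a table of size 7 and summing `probes` over them gives 14, matching the 14/6 average on the slide.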
9A Chained Hash Table
insert keys: K1 --> 3, K2 --> 5, K3 --> 2, K4 --> 3, K5 --> 2, K6 --> 4
index  chain
0
1
2      K3 -> K5
3      K1 -> K4
4      K6
5      K2
6
10Search Performance
Average number of probes needed to retrieve the
value with key K?
K    hash(K)  probes
K1   3        1
K2   5        1
K3   2        1
K4   3        2
K5   2        2
K6   4        1
8/6 = 1.33 (successful)
unsuccessful search?
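A chained table can be sketched the same way. This version assumes new items are appended at the end of a chain (which is what the probe counts above imply) and uses `std::list` for the chains; the class name `ChainedTable` and the stand-in integer keys are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

class ChainedTable {
public:
    explicit ChainedTable(std::size_t tsize) : chains_(tsize) {}

    // Append the key to the chain at its home address.
    void insert(int key) {
        chains_[hash(key)].push_back(key);
    }

    // Number of nodes examined to find key; 0 if the key is not present.
    std::size_t probes(int key) const {
        std::size_t count = 0;
        for (int k : chains_[hash(key)]) {
            ++count;
            if (k == key) return count;
        }
        return 0;
    }

private:
    std::size_t hash(int key) const {
        assert(key >= 0);  // sketch assumes non-negative keys
        return static_cast<std::size_t>(key) % chains_.size();
    }
    std::vector<std::list<int>> chains_;
};
```

With Tsize 7 and the keys 3, 5, 2, 10, 9, 11 standing in for K1 .. K6, the probe counts sum to 8, matching the 8/6 average. A search never has to look outside the home chain, which is why K5 needs 2 probes here rather than 5.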
11successful search performance
load factor  open addressing     open addressing    chaining
             (linear probing)    (double hashing)
0.5          1.50                1.39               1.25
0.7          2.17                1.72               1.35
0.9          5.50                2.56               1.45
1.0          ----                ----               1.50
2.0          ----                ----               2.00
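The slides do not say where these numbers come from, but they agree with the commonly quoted approximations for the average number of probes in a successful search at load factor a: (1 + 1/(1-a))/2 for linear probing, (1/a) ln(1/(1-a)) for double hashing, and 1 + a/2 for chaining. The sketch below assumes those formulas; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Approximate average probes for a successful search at load factor a.
// The open-addressing formulas are only meaningful for 0 < a < 1.

double linearProbing(double a) {
    return 0.5 * (1.0 + 1.0 / (1.0 - a));
}

double doubleHashing(double a) {
    return std::log(1.0 / (1.0 - a)) / a;
}

// Chaining also works for a >= 1, since chains can grow without limit.
double chaining(double a) {
    return 1.0 + a / 2.0;
}
```

Evaluating these at 0.5, 0.7 and 0.9 reproduces the table rows to two decimal places, and shows how quickly linear probing degrades as the table fills.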
12Factors affecting Search Performance
- quality of hash function
- how uniform?
- depends on actual data
- collision resolution strategy used
- load factor of the HashTable
- N/Tsize
- the lower the load factor the better the search
performance
13Traversal
- Visit each item in the hash table
- Open hash table
- O(Tsize) to visit all n items
- Tsize is larger than n
- Chained hash table
- O(Tsize + n) to visit all n items
- Items are not visited in order of key value
14Deletions?
- search for item to be deleted
- chained hash table
- find node and delete it
- open hash table
- must mark vacated spot as deleted
- "deleted" is different from "never used": a search must
probe past a deleted spot but can stop at a never-used one
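The open-table case can be sketched with a three-state slot. This is one possible arrangement, assuming linear probing, non-negative integer keys, and keys that are not already in the table when inserted; the names `SlotState`, `Slot` and `OpenTableWithDelete` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class SlotState { NeverUsed, Occupied, Deleted };

struct Slot {
    SlotState state = SlotState::NeverUsed;
    int key = 0;
};

class OpenTableWithDelete {
public:
    explicit OpenTableWithDelete(std::size_t tsize) : slots_(tsize) {}

    // Insert into the first slot that is not occupied (deleted slots are reused).
    // Assumes the key is not already in the table.
    bool insert(int key) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            Slot& s = slots_[position(key, i)];
            if (s.state != SlotState::Occupied) {
                s.state = SlotState::Occupied;
                s.key = key;
                return true;
            }
        }
        return false;  // table is full
    }

    bool contains(int key) const { return find(key) != slots_.size(); }

    // Mark the slot as Deleted instead of resetting it to NeverUsed.
    bool erase(int key) {
        const std::size_t pos = find(key);
        if (pos == slots_.size()) return false;
        slots_[pos].state = SlotState::Deleted;
        return true;
    }

private:
    std::size_t position(int key, std::size_t i) const {
        assert(key >= 0);
        return (static_cast<std::size_t>(key) + i) % slots_.size();
    }

    // Returns the index of key, or slots_.size() if it is absent.
    std::size_t find(int key) const {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            const std::size_t pos = position(key, i);
            const Slot& s = slots_[pos];
            if (s.state == SlotState::NeverUsed) break;  // safe to stop here
            if (s.state == SlotState::Occupied && s.key == key) return pos;
            // Deleted (or a different key): keep probing
        }
        return slots_.size();
    }

    std::vector<Slot> slots_;
};
```

If `erase` reset the slot to never used, a later search for a key that had collided with the erased one would stop too early and report it missing.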
15Hash Table Summary
- search speed depends on load factor and quality
of hash function
- load factor should be less than 0.75 for open addressing
- can be more than 1 for chaining
- items not kept sorted by key
- very good for fast access to unordered data with
a known upper bound on n
- the upper bound is needed to pick a good Tsize
16heap
- is a binary tree that
- is complete (but not necessarily full)
- every level is full except possibly the last
- has the heap-order property
- max heap - item stored in each node has a
key/priority that is >= the priority of the items
stored in each of its children
- min heap - item stored in each node has a
key/priority that is <= the priority of the items
stored in each of its children
- efficient data structure for PriorityQueue ADT
- requires the ability to compare items based on
their priorities
- basis for the heapsort algorithm
17two heaps
A heap is always a complete binary tree
18a complete binary tree can be stored in an array
for the item in A[i]: leftChild is in A[2i+1],
rightChild is in A[2i+2], parent is in A[(i-1)/2]
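As a small illustration, the index arithmetic for a 0-based array can be written as three helper functions. The function names are chosen here for readability; the parent formula relies on integer division rounding down.

```cpp
#include <cassert>
#include <cstddef>

std::size_t leftChild(std::size_t i)  { return 2 * i + 1; }
std::size_t rightChild(std::size_t i) { return 2 * i + 2; }

// The root (index 0) has no parent, so i must be at least 1.
std::size_t parent(std::size_t i) {
    assert(i > 0);
    return (i - 1) / 2;  // integer division rounds down
}
```

For example, the children of A[1] are A[3] and A[4], and both of those map back to parent index 1.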
19PriorityQueue ADT
- Data Items
- a collection of items which can be ordered by
priority
- Operations
- constructor - creates an empty PQ
- empty () - returns true iff a PQ is empty
- size () - returns the number of items in a PQ
- push (item) - adds an item to a PQ
- top () - returns the item in a PQ with the
highest priority
- pop () - removes the item with the highest
priority from a PQ
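These operations correspond directly to the member functions of `std::priority_queue` in the C++ standard library. The sketch below exercises each of them; the helper name `drainByPriority` is made up for this example, and with the default comparison a larger int counts as a higher priority.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Push every item, then pop them all, recording the order in which they leave.
std::vector<int> drainByPriority(const std::vector<int>& items) {
    std::priority_queue<int> pq;          // constructor: creates an empty PQ
    assert(pq.empty());

    for (int x : items) pq.push(x);       // push: add an item
    assert(pq.size() == items.size());    // size: number of items

    std::vector<int> out;
    while (!pq.empty()) {
        out.push_back(pq.top());          // top: highest-priority item
        pq.pop();                         // pop: remove that item
    }
    return out;
}
```

Note that `pop` returns nothing in the standard library, so the usual pattern is to call `top` first and then `pop`.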
20PQ Data structures
- unordered array or linked list
- push is O(1)
- top and pop are O(n)
- ordered array or linked list
- push is O(n)
- top and pop are O(1)
- heap
- top is O(1)
- push and pop are O(log2 n)
- STL has a priority_queue class
- is implemented using a heap
21PQ operations
- top
- return item at A[0]
- push and pop must maintain heap-order property
- push
- put new item at end (in A[size])
- re-establish the heap-order property by moving
the new item up to where it belongs
- pop
- A[0] is item to delete
- swap A[0] and A[size-1], then remove the last item
- move item at A[0] down a path to where it belongs
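The steps above can be sketched as a small max heap of ints stored in a `std::vector`. This is an illustrative version, not the STL's implementation; the class name `MaxHeap` is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class MaxHeap {
public:
    bool empty() const { return a_.empty(); }
    std::size_t size() const { return a_.size(); }

    int top() const {
        assert(!a_.empty());
        return a_[0];
    }

    void push(int item) {
        a_.push_back(item);                       // new item goes at the end
        std::size_t i = a_.size() - 1;
        while (i > 0 && a_[(i - 1) / 2] < a_[i]) {
            std::swap(a_[(i - 1) / 2], a_[i]);    // move up past a smaller parent
            i = (i - 1) / 2;
        }
    }

    void pop() {
        assert(!a_.empty());
        std::swap(a_[0], a_.back());              // root swaps with last item
        a_.pop_back();                            // old root leaves the heap
        std::size_t i = 0;
        for (;;) {
            std::size_t largest = i;
            const std::size_t l = 2 * i + 1;
            const std::size_t r = 2 * i + 2;
            if (l < a_.size() && a_[l] > a_[largest]) largest = l;
            if (r < a_.size() && a_[r] > a_[largest]) largest = r;
            if (largest == i) break;              // heap order restored
            std::swap(a_[i], a_[largest]);        // move down toward larger child
            i = largest;
        }
    }

private:
    std::vector<int> a_;
};
```

Each loop follows a single root-to-leaf path, so push and pop do at most about log2 n swaps.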
22pop( )
23Balanced Search Trees
- several varieties (Ch.13)
- AVL trees
- 2-3-4 trees
- Red-Black trees
- B-Trees (used for searching secondary memory)
- nodes are added and deleted so that the height of
the tree is kept under control
- insert and delete take more work, but retrieval
(also insert and delete) never takes more than O(log2 n)
steps because height is controlled
24AVL Trees
- a binary search tree in which each node has a
balance factor
- the balance factor of a node is the height of its
left subtree minus the height of its right
subtree
- balance factor of a leaf node is 0
- insertions or deletions change the balance factor
of one or more nodes
- if a balance factor becomes 2 or -2 the AVL tree
is rebalanced
- done by rotating nodes
25Some AVL Trees
Balance at a node is height(left subtree) -
height(right subtree)
26Inserting an item
- follow a search path as for a BST
- allocate a node and insert the item at the end of
the path (as for BST)
- balance factor of new node is 0
- as recursion unwinds update the balance factors
- if a balance factor becomes 2 or -2 perform a
rotation to bring the AVL tree back into balance
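The insertion procedure can be sketched recursively as below. This version stores each node's height and derives the balance factor from it, which is an implementation choice made here rather than something the slides specify; duplicate keys are ignored, and all names (`Node`, `insert`, `rotateLeft`, `rotateRight`, `destroy`) are illustrative.

```cpp
#include <algorithm>
#include <cassert>

struct Node {
    int key;
    int height;   // height of the subtree rooted here; a leaf has height 1
    Node* left;
    Node* right;
    explicit Node(int k) : key(k), height(1), left(nullptr), right(nullptr) {}
};

int height(const Node* n) { return n ? n->height : 0; }

// Balance factor: height(left subtree) - height(right subtree).
int balance(const Node* n) { return n ? height(n->left) - height(n->right) : 0; }

void update(Node* n) { n->height = 1 + std::max(height(n->left), height(n->right)); }

Node* rotateRight(Node* y) {      // left child becomes the new subtree root
    Node* x = y->left;
    y->left = x->right;
    x->right = y;
    update(y);
    update(x);
    return x;
}

Node* rotateLeft(Node* x) {       // right child becomes the new subtree root
    Node* y = x->right;
    x->right = y->left;
    y->left = x;
    update(x);
    update(y);
    return y;
}

Node* insert(Node* root, int key) {
    if (!root) return new Node(key);              // end of the search path
    if (key < root->key)      root->left = insert(root->left, key);
    else if (key > root->key) root->right = insert(root->right, key);
    else return root;                             // duplicate: nothing to do

    update(root);                                 // as recursion unwinds
    const int bf = balance(root);
    if (bf == 2) {                                // left side too tall
        if (balance(root->left) < 0)              // left-right case: double rotation
            root->left = rotateLeft(root->left);
        return rotateRight(root);
    }
    if (bf == -2) {                               // right side too tall
        if (balance(root->right) > 0)             // right-left case: double rotation
            root->right = rotateRight(root->right);
        return rotateLeft(root);
    }
    return root;
}

void destroy(Node* n) {
    if (!n) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

Inserting 1 through 7 in increasing order, which would produce a degenerate chain in a plain BST, yields a tree of height 3 with 4 at the root.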
27An Insertion (the numbers are balance factors)
28Another Insertion
29Another Insertion
30Another Insertion
31The right rotation
32The left rotation
33AVL Trees
- oldest form of balanced search tree
- maximum height is about 1.44 log2 N
- insert, delete and retrieve always O(log2 N)
- rebalancing needed for about 45% of the
insertions
- about half of the rebalancings require double
rotations
34 2-3-4 Tree
- uses larger nodes
- a node has fields for 3 items and 4 nodePointers
- 2-3-4 tree increases in height from the top, not
the bottom
- all leaf nodes are on the same level
- insertion and deletion simpler than AVL tree but
space is wasted
- 3/4 of the nodePointers are NULL
- some nodes hold only 1 or 2 items
35Inserting 35, 12, 68, 22
36Red-Black Tree
- implementation of a 2-3-4 tree which does not
require space which is unused
- nodes are like those for a BST with the addition
of a color field (red/black)
- search and traverse ignore the node color
- insert and delete use color to determine when a
rotation is needed to keep the tree balanced
- tree height guaranteed to be O(log2 N)
- the usual underlying data structure for the STL's
ordered associative containers (e.g. map, set)
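One visible consequence of the tree-based design is that `std::map` visits its keys in sorted order, unlike a hash table. The C++ standard does not require a red-black tree, but common implementations such as libstdc++ and libc++ use one. The helper name `keysInTraversalOrder` and the placeholder value "info" below are made up for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Insert the keys into a std::map and return them in iteration order.
std::vector<int> keysInTraversalOrder(const std::vector<int>& keys) {
    std::map<int, std::string> m;
    for (int k : keys) m[k] = "info";       // O(log N) per insertion

    std::vector<int> out;
    for (const auto& kv : m) out.push_back(kv.first);  // in-order traversal
    return out;
}
```

Whatever order the keys are inserted in, the traversal comes back sorted by key.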