Hash Tables

Provided by: wat88
1
Hash Tables
  • a hash table is an array of size Tsize
      • has index positions 0 .. Tsize-1
  • two types of hash tables
      • open hash table
          • array element type is a <key, value> pair
          • all items stored in the array
      • chained hash table
          • element type is a pointer to a linked list of
            nodes containing <key, value> pairs
          • items are stored in the linked list nodes
  • keys are used to generate an array index
      • home address (0 .. Tsize-1)

2
faster searching
  • "balanced" search trees guarantee an O(log2 n)
    search path by controlling the height of the
    search tree
      • AVL tree
      • 2-3-4 tree
      • red-black tree (typically used to implement the
        STL associative container classes)
  • hash table allows for O(1) average-case search
    performance
      • with a good hash function and a bounded load
        factor, search time does not increase as n
        increases

3
Considerations
  • How big an array?
      • load factor of a hash table is n/Tsize
  • Hash function to use?
      • int hash(KeyType key) // 0 .. Tsize-1
  • Collision resolution strategy?
      • hash function is many-to-one

4
Hash Function
  • a hash function is used to map a key to an array
    index (home address)
  • search starts from here
  • insert, retrieve, update, delete all start by
    applying the hash function to the key

5
Some hash functions
  • if KeyType is int - key % Tsize
  • if KeyType is a string - convert to an integer
    and then % Tsize
  • goals for a hash function
  • fast to compute
  • even distribution
  • cannot guarantee no collisions unless all key
    values are known in advance
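
The two cases above can be sketched as follows. This is a minimal illustration, not the presenter's code: the names `hashInt` and `hashString`, the table size 7, and the multiplier 31 used to fold a string into an integer are all assumptions chosen for the example.

```cpp
#include <cstddef>
#include <string>

// Assumed table size for the example (the later slides also use indices 0 .. 6).
constexpr std::size_t Tsize = 7;

// Integer key: the home address is key % Tsize.
std::size_t hashInt(unsigned int key) {
    return key % Tsize;
}

// String key: convert to an integer first, then reduce with % Tsize.
// The conversion here is a polynomial fold with multiplier 31 (an assumption).
std::size_t hashString(const std::string& key) {
    std::size_t h = 0;
    for (unsigned char c : key) {
        h = h * 31 + c;   // unsigned overflow wraps, which is well defined
    }
    return h % Tsize;
}
```

Both functions are fast to compute; how evenly they distribute keys still depends on the actual data.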

6
An Open Hash Table
Hash(key) produces an index in the range 0 to 6.
That index is the home address.
Some insertions: K1 -> 3, K2 -> 5, K3 -> 2
index  key  value
0
1
2      K3   K3info
3      K1   K1info
4
5      K2   K2info
6
7
Handling Collisions
Some more insertions: K4 -> 3, K5 -> 2, K6 -> 4
Linear probing collision resolution strategy
index  key  value
0      K6   K6info
1
2      K3   K3info
3      K1   K1info
4      K4   K4info
5      K2   K2info
6      K5   K5info
8
Search Performance
Average number of probes needed to retrieve the
value with key K?
K    hash(K)  probes
K1   3        1
K2   5        1
K3   2        1
K4   3        2
K5   2        5
K6   4        4
14/6 ≈ 2.33 (successful)
unsuccessful search?
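
The probe counts for linear probing can be reproduced with a small sketch. The names `Slot`, `OpenTable`, `insert` and `probesToFind` are hypothetical, and the home address is passed in directly because the slides give hash values (K1 -> 3, and so on) rather than a hash function.

```cpp
#include <array>
#include <cstddef>
#include <string>

constexpr std::size_t Tsize = 7;   // index positions 0 .. 6, as on the slides

struct Slot {
    bool used = false;
    std::string key;
    std::string value;
};

// Open hash table: every <key, value> pair is stored in the array itself.
struct OpenTable {
    std::array<Slot, Tsize> slots;

    // Linear probing: try home, home+1, ... (wrapping around) until a free
    // slot is found. Returns the number of probes used, or 0 if the table is full.
    int insert(std::size_t home, const std::string& key, const std::string& value) {
        for (std::size_t i = 0; i < Tsize; ++i) {
            Slot& s = slots[(home + i) % Tsize];
            if (!s.used) {
                s.used = true;
                s.key = key;
                s.value = value;
                return static_cast<int>(i) + 1;
            }
        }
        return 0;
    }

    // Returns the number of probes needed to find key, or 0 if it is absent.
    int probesToFind(std::size_t home, const std::string& key) const {
        for (std::size_t i = 0; i < Tsize; ++i) {
            const Slot& s = slots[(home + i) % Tsize];
            if (!s.used) return 0;                      // an unused slot ends the search
            if (s.key == key) return static_cast<int>(i) + 1;
        }
        return 0;
    }
};
```

Inserting K1 to K6 with home addresses 3, 5, 2, 3, 2, 4 places K6 at index 0 and K5 at index 6, and the six successful searches take 1 + 1 + 1 + 2 + 5 + 4 = 14 probes in total.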
9
A Chained Hash Table
insert keys: K1 -> 3, K2 -> 5, K3 -> 2, K4 -> 3, K5 -> 2, K6 -> 4
index  chain
0
1
2      K3 -> K5
3      K1 -> K4
4      K6
5      K2
6
10
Search Performance
Average number of probes needed to retrieve the
value with key K?
K    hash(K)  probes
K1   3        1
K2   5        1
K3   2        1
K4   3        2
K5   2        2
K6   4        1
8/6 ≈ 1.33 (successful)
unsuccessful search?
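
The same experiment for a chained table, as a rough sketch. `ChainedTable` and its member names are hypothetical, and `std::list` stands in for the hand-built linked list of nodes described on the slides. New items are appended to the end of a chain, which matches the probe counts in the table above.

```cpp
#include <array>
#include <cstddef>
#include <list>
#include <string>
#include <utility>

constexpr std::size_t Tsize = 7;   // index positions 0 .. 6, as on the slides

// Chained hash table: each array element heads a list of <key, value> pairs.
struct ChainedTable {
    std::array<std::list<std::pair<std::string, std::string>>, Tsize> chains;

    // Appends the item to the chain at its home address.
    void insert(std::size_t home, const std::string& key, const std::string& value) {
        chains[home].emplace_back(key, value);
    }

    // Returns the number of nodes examined to find key, or 0 if it is absent.
    int probesToFind(std::size_t home, const std::string& key) const {
        int probes = 0;
        for (const auto& item : chains[home]) {
            ++probes;
            if (item.first == key) return probes;
        }
        return 0;
    }
};
```

With the six insertions from the slide, K4 and K5 each sit second in their chains, so the successful searches take 1 + 1 + 1 + 2 + 2 + 1 = 8 probes in total.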
11
successful search performance
load    open addressing   open addressing   chaining
factor  (linear probing)  (double hashing)
0.5     1.50              1.39              1.25
0.7     2.17              1.72              1.35
0.9     5.50              2.56              1.45
1.0     ----              ----              1.50
2.0     ----              ----              2.00
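
The figures in this table are consistent with the classical textbook approximations for the expected number of probes in a successful search at load factor a: (1 + 1/(1-a))/2 for linear probing, (1/a) ln(1/(1-a)) for double hashing, and 1 + a/2 for chaining. The slides do not state these formulas, so treat the sketch below as an assumption about where the numbers come from; the function names are hypothetical.

```cpp
#include <cmath>

// Approximate expected probes for a successful search at load factor a.

// Linear probing; only meaningful for 0 <= a < 1.
double linearProbing(double a) {
    return 0.5 * (1.0 + 1.0 / (1.0 - a));
}

// Double hashing; only meaningful for 0 < a < 1.
double doubleHashing(double a) {
    return std::log(1.0 / (1.0 - a)) / a;
}

// Chaining; the load factor may exceed 1.
double chaining(double a) {
    return 1.0 + a / 2.0;
}
```

The open addressing formulas blow up as a approaches 1, which is why those columns have no entries at load factors 1.0 and 2.0.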
12
Factors affecting Search Performance
  • quality of hash function
      • how uniform?
      • depends on actual data
  • collision resolution strategy used
  • load factor of the HashTable
      • n/Tsize
      • the lower the load factor the better the search
        performance

13
Traversal
  • Visit each item in the hash table
  • Open hash table
      • O(Tsize) to visit all n items
      • Tsize is larger than n
  • Chained hash table
      • O(Tsize + n) to visit all n items
  • Items are not visited in order of key value

14
Deletions?
  • search for item to be deleted
  • chained hash table
      • find node and delete it
  • open hash table
      • must mark vacated spot as deleted
      • "deleted" is different from "never used": a
        search must keep probing past a deleted spot
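
A minimal sketch of why the "deleted" mark matters in an open hash table with linear probing. The three-state `State` enum and the `OpenTable` members are hypothetical names, home addresses are passed in directly, and keys are assumed to be distinct (the sketch does not check for duplicates on insert).

```cpp
#include <array>
#include <cstddef>
#include <string>

constexpr std::size_t Tsize = 7;

enum class State { NeverUsed, Occupied, Deleted };

struct Slot {
    State state = State::NeverUsed;
    std::string key;
};

struct OpenTable {
    std::array<Slot, Tsize> slots;

    // Stores key in the first slot along the probe path that is not occupied.
    bool insert(std::size_t home, const std::string& key) {
        for (std::size_t i = 0; i < Tsize; ++i) {
            Slot& s = slots[(home + i) % Tsize];
            if (s.state != State::Occupied) {   // never-used and deleted slots are reusable
                s.state = State::Occupied;
                s.key = key;
                return true;
            }
        }
        return false;                           // table is full
    }

    // Returns the index holding key, or -1 if it is absent.
    int find(std::size_t home, const std::string& key) const {
        for (std::size_t i = 0; i < Tsize; ++i) {
            std::size_t pos = (home + i) % Tsize;
            const Slot& s = slots[pos];
            if (s.state == State::NeverUsed) return -1;   // only a never-used slot ends the search
            if (s.state == State::Occupied && s.key == key) return static_cast<int>(pos);
        }
        return -1;
    }

    // Marks the slot as deleted instead of resetting it to never used.
    bool erase(std::size_t home, const std::string& key) {
        int pos = find(home, key);
        if (pos < 0) return false;
        slots[pos].state = State::Deleted;
        return true;
    }
};
```

If `erase` reset the slot to never used, a later search for a key that had probed past that slot would stop too early and report the key as missing.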

15
Hash Table Summary
  • search speed depends on load factor and quality
    of hash function
      • load factor should be less than 0.75 for open
        addressing
      • load factor can be more than 1 for chaining
  • items not kept sorted by key
  • very good for fast access to unordered data with
    a known upper bound on the number of items
      • the upper bound is needed to pick a good Tsize

16
heap
  • is a binary tree that
      • is complete (but not necessarily FULL): every
        level is full except possibly the LAST level
      • has the heap-order property
          • max heap - item stored in each node has a
            key/priority that is >= the priority of the
            items stored in each of its children
          • min heap - item stored in each node has a
            key/priority that is <= the priority of the
            items stored in each of its children
  • efficient data structure for PriorityQueue ADT
  • requires the ability to compare items based on
    their priorities
  • basis for the heapsort algorithm

17
two heaps
A heap is always a complete binary tree
18
a complete binary tree can be stored in an array
for the item in A[i]: leftChild is in A[2i+1],
rightChild is in A[2i+2], parent is in A[(i-1)/2]
(integer division)
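
The index arithmetic can be written as three one-line helpers. The function names are hypothetical; indices are zero-based, as on the slide.

```cpp
#include <cstddef>

// Index of the left child of the item stored at A[i].
std::size_t leftChild(std::size_t i) { return 2 * i + 1; }

// Index of the right child of the item stored at A[i].
std::size_t rightChild(std::size_t i) { return 2 * i + 2; }

// Index of the parent of the item stored at A[i]; requires i > 0 (the root has no parent).
std::size_t parent(std::size_t i) { return (i - 1) / 2; }
```

Integer division is what makes `parent` work for both children: the left child 2i+1 and the right child 2i+2 both map back to i.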
19
PriorityQueue ADT
  • Data Items
  • a collection of items which can be ordered by
    priority
  • Operations
  • constructor - creates an empty PQ
  • empty () - returns true iff a PQ is empty
  • size () - returns the number of items in a PQ
  • push (item) - adds an item to a PQ
  • top () - returns the item in a PQ with the
    highest priority
  • pop () - removes the item with the highest
    priority from a PQ

20
PQ Data structures
  • unordered array or linked list
      • push is O(1)
      • top and pop are O(n)
  • ordered array or linked list
      • push is O(n)
      • top and pop are O(1)
  • heap
      • top is O(1)
      • push and pop are O(log2 n)
  • STL has a priority_queue class
      • is implemented using a heap

21
PQ operations
  • top
      • return item at A[0]
  • push and pop must maintain heap-order property
  • push
      • put new item at end (in A[size])
      • re-establish the heap-order property by moving
        the new item to where it belongs
  • pop
      • A[0] is item to delete
      • swap A[0] and A[size-1], then decrease size by 1
      • move item at A[0] down a path to where it belongs
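
The three operations can be sketched for a max heap of `int` priorities stored in a `std::vector`. The struct name `MaxHeap` is hypothetical, and the sketch assumes `top` and `pop` are only called on a non-empty heap.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct MaxHeap {
    std::vector<int> A;   // A[0] is the root; A.size() plays the role of size

    // Precondition: the heap is not empty.
    int top() const { return A[0]; }

    void push(int item) {
        A.push_back(item);                     // put the new item at the end
        std::size_t i = A.size() - 1;
        while (i > 0 && A[(i - 1) / 2] < A[i]) {
            std::swap(A[(i - 1) / 2], A[i]);   // move it up past smaller parents
            i = (i - 1) / 2;
        }
    }

    // Precondition: the heap is not empty.
    void pop() {
        std::swap(A[0], A[A.size() - 1]);      // item to delete goes to the end
        A.pop_back();                          // size decreases by 1
        std::size_t i = 0;
        for (;;) {                             // move the item at A[0] down
            std::size_t l = 2 * i + 1, r = 2 * i + 2, largest = i;
            if (l < A.size() && A[largest] < A[l]) largest = l;
            if (r < A.size() && A[largest] < A[r]) largest = r;
            if (largest == i) break;           // heap-order property holds again
            std::swap(A[i], A[largest]);
            i = largest;
        }
    }
};
```

Each loop follows a single root-to-leaf path, so `push` and `pop` do at most about log2 n swaps.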

22
pop( )
23
Balanced Search Trees
  • several varieties (Ch.13)
  • AVL trees
  • 2-3-4 trees
  • Red-Black trees
  • B-Trees (used for searching secondary memory)
  • nodes are added and deleted so that the height of
    the tree is kept under control
  • insert and delete take more work, but retrieval
    (also insert and delete) never takes more than
    O(log2 n) steps because height is controlled

24
AVL Trees
  • a binary search tree in which each node has a
    balance factor
  • the balance factor of a node is the height of its
    left subtree minus the height of its right
    subtree
  • balance factor of a leaf node is 0
  • insertions or deletions change the balance factor
    of one or more nodes
  • if a balance factor becomes 2 or -2 the AVL tree
    is rebalanced
  • done by rotating nodes

25
Some AVL Trees
Balance at a node is height(left subtree) -
height(right subtree)
26
Inserting an item
  • follow a search path as for a BST
  • allocate a node and insert the item at the end of
    the path (as for BST)
  • balance factor of new node is 0
  • as recursion unwinds update the balance factors
  • if a balance factor becomes 2 or -2 perform a
    rotation to bring the AVL tree back into balance

27
An Insertion (the numbers are balance factors)
28
Another Insertion
29
Another Insertion
30
Another Insertion
31
The right rotation
32
The left rotation
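
Both single rotations are a few pointer assignments. This is a sketch of the pointer manipulation only: the names are hypothetical, and the balance factor updates and the double rotations that some cases need are left out. With balance defined as height(left) - height(right), a node at -2 is heavy on the right and is a candidate for a left rotation, while a node at 2 is a candidate for a right rotation.

```cpp
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Right rotation: the left child becomes the new root of the subtree.
// Precondition: root->left != nullptr. Returns the new subtree root.
Node* rotateRight(Node* root) {
    Node* pivot = root->left;
    root->left = pivot->right;   // pivot's right subtree moves under the old root
    pivot->right = root;
    return pivot;
}

// Left rotation: the right child becomes the new root of the subtree.
// Precondition: root->right != nullptr. Returns the new subtree root.
Node* rotateLeft(Node* root) {
    Node* pivot = root->right;
    root->right = pivot->left;   // pivot's left subtree moves under the old root
    pivot->left = root;
    return pivot;
}
```

The caller must store the returned pointer in the parent (or in the tree's root pointer), since the subtree has a new root. Both rotations preserve the search-tree ordering of the keys.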
33
AVL Trees
  • oldest form of balanced search tree
  • maximum height is about 1.44 log2 N
  • insert, delete and retrieve always O(log2 N)
  • rebalancing needed for about 45% of the
    insertions
  • about half of the rebalancings require double
    rotations

34
2-3-4 Tree
  • uses larger nodes
  • a node has fields for 3 items and 4 nodePointers
  • 2-3-4 tree increases in height from the top, not
    the bottom
  • all leaf nodes are on the same level
  • insertion and deletion simpler than AVL tree but
    space is wasted
  • 3/4 of the nodePointers are NULL
  • some nodes hold only 1 or 2 items
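
A node with room for 3 items and 4 nodePointers might be declared as below, together with a search that shows how the fields are used. `Node234`, `count` and `contains` are hypothetical names, items are assumed to be `int` keys kept in ascending order within a node, and insertion with node splitting is not shown.

```cpp
struct Node234 {
    int count = 0;                       // number of items in use: 1, 2 or 3
    int items[3] = {0, 0, 0};            // only items[0 .. count-1] are meaningful
    Node234* child[4] = {nullptr, nullptr, nullptr, nullptr};   // all NULL in a leaf
};

// Returns true if key is stored somewhere in the subtree rooted at n.
bool contains(const Node234* n, int key) {
    while (n) {
        int i = 0;
        while (i < n->count && key > n->items[i]) {
            ++i;                         // skip items smaller than key
        }
        if (i < n->count && key == n->items[i]) {
            return true;
        }
        n = n->child[i];                 // descend between items[i-1] and items[i]
    }
    return false;
}
```

The unused slots in `items` and `child` are the wasted space mentioned above: a node holding one item still reserves room for three.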

35
Inserting 35, 12, 68, 22
36
Red-Black Tree
  • binary tree implementation of a 2-3-4 tree which
    does not waste space on unused fields
  • nodes are like those for a BST with the addition
    of a color field (red/black)
  • search and traverse ignore the node color
  • insert and delete use color to determine when a
    rotation is needed to keep the tree balanced
  • tree height guaranteed to be O(log2 N)
  • the underlying data structure typically used for
    the STL's associative containers