Title: Dictionaries, Tables & Hashing
1 Dictionaries, Tables & Hashing
2 The Dictionary ADT
- a dictionary (table) is an abstract model of a database
- like a priority queue, a dictionary stores key-element pairs
- the main operation supported by a dictionary is searching by key
3 Examples
- Telephone directory
- Library catalogue
- Books in print (key: ISBN)
- FAT (File Allocation Table)
4 Main Issues
- Size
- Operations: search, insert, delete; others? (create reports? list?)
- What will be stored in the dictionary?
- How will items be identified?
5 The Dictionary ADT
- simple container methods
- size()
- isEmpty()
- elements()
- query methods
- findElement(k)
- findAllElements(k)
6 The Dictionary ADT
- update methods
- insertItem(k, e)
- removeElement(k)
- removeAllElements(k)
- special element
- NO_SUCH_KEY, returned by an unsuccessful search
7 Implementing a Dictionary with a Sequence
- unordered sequence
- searching and removing takes O(n) time
- inserting takes O(1) time
- applications to log files (frequent insertions, rare searches and removals)
(figure: unordered sequence holding the keys 34, 14, 12, 22, 18)
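A minimal sketch of the unordered-sequence ("log file") implementation, assuming an ArrayList as the underlying sequence. The class name LogFileDictionary and the Object[] pair representation are illustrative choices, not part of the slides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical log-file dictionary: an unordered sequence of (key, element) pairs.
class LogFileDictionary {
    // Sentinel returned by an unsuccessful search, mirroring NO_SUCH_KEY in the ADT.
    static final Object NO_SUCH_KEY = new Object();

    private final List<Object[]> items = new ArrayList<>();

    int size() {
        return items.size();
    }

    // O(1) amortized: append at the end without inspecting existing items.
    void insertItem(Object key, Object element) {
        items.add(new Object[] { key, element });
    }

    // O(n): scan the sequence until the key is found.
    Object findElement(Object key) {
        for (Object[] item : items) {
            if (item[0].equals(key)) {
                return item[1];
            }
        }
        return NO_SUCH_KEY;
    }

    // O(n): locate the item by scanning, then remove it from the sequence.
    Object removeElement(Object key) {
        for (int i = 0; i < items.size(); i++) {
            if (items.get(i)[0].equals(key)) {
                return items.remove(i)[1];
            }
        }
        return NO_SUCH_KEY;
    }
}
```

Insertion never looks at the existing items, which is why this layout suits workloads with frequent insertions and rare searches.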
8 Implementing a Dictionary with a Sequence
- array-based ordered sequence (assumes keys can be ordered)
- searching takes O(log n) time (binary search)
- inserting and removing takes O(n) time
- application to look-up tables (frequent searches, rare insertions and removals)
(figure: ordered sequence holding the keys 12, 14, 18, 22, 34)
9 Binary Search
- narrow down the search range in stages
- "high-low" game
- findElement(22)
Sorted keys: 2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37
(figure, stage 1: low and high span the whole sequence; key at mid is 14)
10 Binary Search
(figure, stage 2: search continues to the right of 14; key at mid is 25)
(figure, stage 3: search continues to the left of 25; key at mid is 19)
(figure, stage 4: low, mid and high coincide; key at mid is 22, found)
11 Pseudocode for Binary Search
Algorithm BinarySearch(S, k, low, high)
  if low > high then
    return NO_SUCH_KEY
  else
    mid <- (low + high) / 2
    if k = key(mid) then
      return key(mid)
    else if k < key(mid) then
      return BinarySearch(S, k, low, mid-1)
    else
      return BinarySearch(S, k, mid+1, high)
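The pseudocode can be sketched in Java as follows. This is an iterative variant over an int array, and it returns the rank of the key rather than the key itself; the class name, the method signature and the use of -1 as NO_SUCH_KEY are assumptions made for the example.

```java
class BinarySearchDemo {
    // Assumption: a missing key is reported as rank -1.
    static final int NO_SUCH_KEY = -1;

    // Returns the rank (index) of k in the sorted array s, or NO_SUCH_KEY.
    static int binarySearch(int[] s, int k) {
        int low = 0;
        int high = s.length - 1;
        while (low <= high) {
            // Same value as (low + high) / 2, but cannot overflow int.
            int mid = low + (high - low) / 2;
            if (k == s[mid]) {
                return mid;
            } else if (k < s[mid]) {
                high = mid - 1; // continue in the left half
            } else {
                low = mid + 1;  // continue in the right half
            }
        }
        return NO_SUCH_KEY;
    }
}
```

On the 16-key sequence used in the findElement(22) example, the keys inspected at mid are 14, 25, 19 and then 22.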
12 Running Time of Binary Search
- The range of candidate items to be searched is
halved after each comparison
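The halving argument can be written out as a short derivation. This is a sketch in which n is the number of items and m (a symbol introduced here) is the number of comparisons made so far.

```latex
\[
  n \;\longrightarrow\; \frac{n}{2} \;\longrightarrow\; \frac{n}{4}
    \;\longrightarrow\; \cdots \;\longrightarrow\; \frac{n}{2^{m}}
\]
\[
  \frac{n}{2^{m}} \le 1 \iff m \ge \log_2 n
\]
```

So after about log_2 n comparisons at most one candidate remains; for a 16-key sequence, log_2 16 = 4.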
13 Running Time of Binary Search
- In the array-based implementation, access by rank takes O(1) time, thus binary search runs in O(log n) time
- Binary search is applicable only to random-access structures (arrays, vectors)
14 Implementations
- Sorted? Not sorted?
- Elementary: arrays, vectors, linked lists
- Organization: none (log file), sorted, hashed
- Advanced: balanced trees
15 Skip Lists
- Simulate binary search on a linked list.
- Linked list allows easy insertion and deletion.
- http://www.epaperpress.com/s_man.html
16 A FAT Example
- Directory: key = file name; data = (time, date, size) and the location of the first block in the FAT.
- If the first block is in physical location 23 (disk block number), look up position 23 in the FAT. The entry either shows end of file or holds the number of the file's next block on disk.
- Example: the directory entry gives block 4
- FAT (positions 0 to 10): x x x F 5 6 10 x 23 25 3
- Following the chain 4 -> 5 -> 6 -> 10 -> 3 -> F (end of file), the file occupies blocks 4, 5, 6, 10, 3.
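The chain lookup in this example can be sketched as a loop over an int array. The encoding is an assumption made for illustration: -1 stands in for the "F" (end of file) entry and -2 for the "x" entries, and the chain is assumed to be well formed.

```java
import java.util.ArrayList;
import java.util.List;

class FatChainDemo {
    static final int END_OF_FILE = -1; // stands in for the "F" entry
    static final int OTHER = -2;       // stands in for the "x" entries

    // Follows the FAT chain from firstBlock until the end-of-file marker.
    // Assumes the chain is well formed (it never reaches an OTHER entry).
    static List<Integer> blocksOf(int[] fat, int firstBlock) {
        List<Integer> blocks = new ArrayList<>();
        int block = firstBlock;
        while (block != END_OF_FILE) {
            blocks.add(block);
            block = fat[block]; // the entry at this position names the next block
        }
        return blocks;
    }

    public static void main(String[] args) {
        int x = OTHER, f = END_OF_FILE;
        int[] fat = { x, x, x, f, 5, 6, 10, x, 23, 25, 3 };
        System.out.println(blocksOf(fat, 4)); // prints [4, 5, 6, 10, 3]
    }
}
```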
17 Hashing
- Place item with key k in position h(k).
- Hope h(k) is 1-1.
- Requires unique key (unless multiple items allowed). Key must be protected from change (use abstract class that provides only a constructor).
- Keys must be comparable.
18 Key class
public abstract class KeyID {
  private Comparable searchKey;

  // Only one constructor; with no setter, the key cannot change afterwards
  public KeyID(Comparable m) {
    searchKey = m;
  }

  public Comparable getSearchKey() {
    return searchKey;
  }
}
19 Hashing Problem
- RT&T is a large phone company, and they want to provide enhanced caller ID capability
- given a phone number, return the caller's name
- phone numbers are in the range 0 to R = 10^10 - 1
- n is the number of phone numbers used
- want to do this as efficiently as possible
20 Hashing Problem
- We know two ways to design this dictionary
- a balanced search tree (AVL, red-black) or a skip list with the phone number as the key has O(log n) query time and O(n) space: good space usage and search time, but can we reduce the search time to constant?
- a bucket array indexed by the phone number has optimal O(1) query time, but there is a huge amount of wasted space: O(n + R)
21 Bucket Array
- Each cell is thought of as a bucket or a container
- Holds key-element pairs
- In array A of size N, an element e with key k is inserted in A[k].
(figure: bucket array indexed 000-000-0000, 000-000-0001, ..., 401-863-7639, ..., 999-999-9999; the cell at 401-863-7639 holds Roberto, the other cells shown are null)
22 Generalized indexing
- Hash table
- Data storage associated with a key
- The key need not be an integer
23 Hash Tables
- A data structure
- The location of an item is determined
- directly as a function of the item itself
- Not by a sequence of trial and error comparisons
- Commonly used to provide faster searching
- O(n) for linear search
- O(log n) for binary search
- O(1) expected for a hash table
24 Example
- A symbol table constructed by a compiler
- Stores identifiers and information about them
25 Another Solution
- A hash table is an alternative solution with O(1) expected query time and O(n + N) space, where N is the size of the table
- Like an array, but with a function to map the large range of keys into a smaller one
- e.g., take the original key, mod the size of the table, and use that as an index
26 Example
- Insert item (401-863-7639, Roberto) into a table of size 5
- 4018637639 mod 5 = 4, so item (401-863-7639, Roberto) is stored in slot 4 of the table
- A lookup uses the same process: map the key to an index, then check the array cell at that index
(figure: table with slots 0 1 2 3 4; slot 4 holds (401-863-7639, Roberto))
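The index computation in this example is a single modulo operation. In the sketch below (the names PhoneIndexDemo and slotFor are illustrative) the phone number is held in a long, because 4018637639 does not fit in a 32-bit int.

```java
class PhoneIndexDemo {
    // Maps a non-negative phone number to a slot in [0, tableSize - 1].
    static int slotFor(long phoneNumber, int tableSize) {
        return (int) (phoneNumber % tableSize);
    }

    public static void main(String[] args) {
        System.out.println(slotFor(4018637639L, 5)); // prints 4
    }
}
```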
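27 Collision
- Insert (401-863-9350, Andy): 4018639350 mod 5 = 0, so Andy goes in slot 0
- And insert (401-863-2234, Devin): 4018632234 mod 5 = 4, the slot that already holds Roberto. We have a collision!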
28 Collision Resolution
- How to deal with two keys which map to the same cell of the array?
- Use chaining
- Set up lists of items with the same index
29 Chaining
(figure: table with slots 0 1 2 3 4; items with the same index are linked in a list off their slot)
30 Chaining
- The expected search/insertion/removal time is O(n/N), provided the indices are uniformly distributed
- The performance of the data structure can be fine-tuned by changing the table size N
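A minimal sketch of chaining, assuming non-negative long keys (phone numbers) and String elements. The class ChainedPhoneTable and its method names are hypothetical; null is used in place of NO_SUCH_KEY to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedPhoneTable {
    private static class Entry {
        final long key;
        final String name;

        Entry(long key, String name) {
            this.key = key;
            this.name = name;
        }
    }

    // One chain (list of entries) per table slot.
    private final List<List<Entry>> chains = new ArrayList<>();

    ChainedPhoneTable(int capacity) {
        for (int i = 0; i < capacity; i++) {
            chains.add(new LinkedList<>());
        }
    }

    // Division compression map; keys are assumed to be non-negative.
    private int index(long key) {
        return (int) (key % chains.size());
    }

    // Colliding keys simply share a chain.
    void insertItem(long key, String name) {
        chains.get(index(key)).add(new Entry(key, name));
    }

    // Only the chain at the key's index is searched: O(n/N) expected.
    String findElement(long key) {
        for (Entry e : chains.get(index(key))) {
            if (e.key == key) {
                return e.name;
            }
        }
        return null; // stands in for NO_SUCH_KEY
    }

    int chainLength(int slot) {
        return chains.get(slot).size();
    }
}
```

With a table of size 5, the entries for Roberto and Devin both land in the chain at slot 4, and a lookup for either scans only that chain.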
31 Hash Function
- Function h defined by h(i) = i
- Determines the location of an item i in the hash table
- Called a hash function.
- To reduce the large size of a hash table use
- h(i) = i mod 25
32 From Keys to Indices
- The mapping of keys to indices of a hash table is called a hash function
- A hash function is usually the composition of two maps
- hash code map: key -> integer
- compression map: integer -> [0, N - 1]
- An essential requirement of the hash function is to map equal keys to equal indices
- A good hash function minimizes the probability of collisions
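33 Java Hash
- Java provides a hashCode() method for the Object class, which returns a 32-bit int that is typically derived from the object's identity (for example, its memory address) rather than its contents.
- This default hash code would work poorly for Integer and String objects, since equal values stored in different objects would get different hash codes
- The hashCode() method should be suitably redefined by classes.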
34 Popular Hash-Code Maps
- Integer cast: for numeric types with 32 bits or less, we can reinterpret the bits of the number as an int
- Component sum: for numeric types with more than 32 bits (e.g., long and double), we can add the 32-bit components.
35 Popular Hash-Code Maps
- Polynomial accumulation: for strings of a natural language, combine the character values (ASCII or Unicode) a_0 a_1 ... a_{n-1} by viewing them as the coefficients of a polynomial: a_0 + a_1 x + ... + x^{n-1} a_{n-1}
36 Popular Hash-Code Maps
- The polynomial is computed with Horner's rule, ignoring overflows, at a fixed value x: a_0 + x (a_1 + x (a_2 + ... + x (a_{n-2} + x a_{n-1}) ... ))
- The choice x = 33, 37, 39, or 41 gives at most 6 collisions on a vocabulary of 50,000 English words
- Why is the component-sum hash code bad for strings?
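Both hash codes can be sketched side by side to see the difference. The class and method names below are illustrative. The component sum ignores character order, so anagrams such as "stop" and "pots" collide, while the polynomial code weights each position differently.

```java
class StringHashDemo {
    // Component sum: adds the character values, so character order is lost.
    static int componentSum(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    // Polynomial accumulation a_0 + a_1 x + ... + a_{n-1} x^{n-1},
    // evaluated with Horner's rule from the last character to the first.
    // int arithmetic wraps around silently, i.e. overflows are ignored.
    static int polynomial(String s, int x) {
        int h = 0;
        for (int i = s.length() - 1; i >= 0; i--) {
            h = h * x + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Same letters, different order: the component sums are equal.
        System.out.println(componentSum("stop") == componentSum("pots")); // prints true
        // The polynomial codes differ.
        System.out.println(polynomial("stop", 33) == polynomial("pots", 33)); // prints false
    }
}
```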
37 Random Hashing
- Random hashing
- Uses a simple random number generation technique
- Scatters the items randomly throughout the hash
table
38 Popular Compression Maps
- Division: h(k) = k mod N
- the choice N = 2^k is bad because not all the bits are taken into account
- the table size N is usually chosen as a prime number
- certain patterns in the hash codes are propagated
- Multiply, Add, and Divide (MAD)
- h(k) = (ak + b) mod N
- eliminates patterns provided a mod N ≠ 0
- same formula used in linear congruential (pseudo)random number generators
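The two compression maps can be sketched as small static methods. The names are illustrative, and the use of Math.floorMod is a choice made here: Java's % operator can return a negative result for a negative hash code, whereas floorMod keeps the index in [0, N - 1].

```java
class CompressionDemo {
    // Division method: h(k) = k mod N, with the result always in [0, N - 1].
    static int division(int k, int n) {
        return Math.floorMod(k, n);
    }

    // MAD method: h(k) = (a*k + b) mod N.
    // The product is computed in long so that a*k cannot overflow int.
    static int mad(int k, int a, int b, int n) {
        return (int) Math.floorMod((long) a * k + b, (long) n);
    }

    public static void main(String[] args) {
        int n = 13; // a prime table size, as the slide recommends
        System.out.println(division(40, n));  // prints 1
        System.out.println(division(-7, n));  // prints 6
        System.out.println(mad(10, 3, 5, n)); // prints 9, since 35 mod 13 = 9
    }
}
```

The values a = 3 and b = 5 are arbitrary example parameters; the slide only requires that a mod N is not 0.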
39 More on Collisions
- A key is mapped to an already occupied table location
- what to do?!?
- Use a collision handling technique
- We've seen Chaining
- Can also use Open Addressing
- Double Hashing
- Linear Probing
40 Linear Probing
- If the current location is used, try the next table location
linear_probing_insert(K)
  if (table is full) error
  probe = h(K)
  while (table[probe] occupied)
    probe = (probe + 1) mod M
  table[probe] = K
41 Linear Probing
- Lookups walk along table until the key or an empty slot is found
- Uses less memory than chaining
- don't have to store all those links
- Slower than chaining
- may have to walk along table for a long way
- Deletion is more complex
- either mark the deleted slot
- or fill in the slot by shifting some elements down
42 Linear Probing Example
- h(k) = k mod 13
- Insert keys
- 18 41 22 44 59 32 31 73
Table after the insertions (- marks an empty slot):
slot:  0  1  2  3  4  5  6  7  8  9 10 11 12
key:   -  - 41  -  - 18 44 59 32 22 31 73  -
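The example can be reproduced with a short sketch of linear_probing_insert. It assumes non-negative int keys so that -1 can mark an empty slot; the class and method names are illustrative.

```java
import java.util.Arrays;

class LinearProbingDemo {
    // Assumption: keys are non-negative, so -1 can mark an empty slot.
    static final int EMPTY = -1;

    // Inserts each key at h(k) = k mod m, stepping to the next slot
    // (with wrap-around) while the current one is occupied.
    static int[] insertAll(int[] keys, int m) {
        int[] table = new int[m];
        Arrays.fill(table, EMPTY);
        int count = 0;
        for (int k : keys) {
            if (count == m) {
                throw new IllegalStateException("table is full");
            }
            int probe = k % m;
            while (table[probe] != EMPTY) {
                probe = (probe + 1) % m;
            }
            table[probe] = k;
            count++;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = { 18, 41, 22, 44, 59, 32, 31, 73 };
        // prints [-1, -1, 41, -1, -1, 18, 44, 59, 32, 22, 31, 73, -1]
        System.out.println(Arrays.toString(insertAll(keys, 13)));
    }
}
```

Keys 44, 32, 31 and 73 each find their home slot taken and walk right, which is how the run of occupied slots from 5 to 11 builds up.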
43 Double Hashing
- Use two hash functions
- If M is prime, eventually will examine every position in the table
double_hash_insert(K)
  if (table is full) error
  probe = h1(K)
  offset = h2(K)
  while (table[probe] occupied)
    probe = (probe + offset) mod M
  table[probe] = K
44 Double Hashing
- Many of same (dis)advantages as linear probing
- Distributes keys more uniformly than linear
probing does
45 Double Hashing Example
- h1(K) = K mod 13
- h2(K) = 8 - K mod 8
- we want h2 to be an offset to add
- 18 41 22 44 59 32 31 73
Table after the insertions (- marks an empty slot):
slot:  0  1  2  3  4  5  6  7  8  9 10 11 12
key:  44  - 41 73  - 18 32 59 31 22  -  -  -
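The same insertion sequence can be run through a sketch of double_hash_insert with the two hash functions of this example. As before, -1 marks an empty slot on the assumption that keys are non-negative, and the class name is illustrative.

```java
import java.util.Arrays;

class DoubleHashingDemo {
    // Assumption: keys are non-negative, so -1 can mark an empty slot.
    static final int EMPTY = -1;
    static final int M = 13; // table size (prime)

    static int h1(int k) {
        return k % M;
    }

    // Always in the range 1..8, so the offset is never 0.
    static int h2(int k) {
        return 8 - k % 8;
    }

    static int[] insertAll(int[] keys) {
        int[] table = new int[M];
        Arrays.fill(table, EMPTY);
        int count = 0;
        for (int k : keys) {
            if (count == M) {
                throw new IllegalStateException("table is full");
            }
            int probe = h1(k);
            int offset = h2(k);
            while (table[probe] != EMPTY) {
                probe = (probe + offset) % M; // jump by the key's own offset
            }
            table[probe] = k;
            count++;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = { 18, 41, 22, 44, 59, 32, 31, 73 };
        // prints [44, -1, 41, 73, -1, 18, 32, 59, 31, 22, -1, -1, -1]
        System.out.println(Arrays.toString(insertAll(keys)));
    }
}
```

For instance, 44 hashes to slot 5 (taken by 18), then jumps by h2(44) = 4 to slot 9 (taken by 22), then to slot 0.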
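46 Hash code
// component sum of the two 32-bit halves of a long
static int hashCode(long i) {
  return (int) ((i >> 32) + (int) i);
}
47 Hash code
static int hashCode(String s) {
  int h = 0;
  for (int i = 0; i < s.length(); i++) {
    h = (h << 5) | (h >>> 27); // 5-bit cyclic shift of the running sum
    h += (int) s.charAt(i);    // add in next character
  }
  return h;
}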
48 Linear Probing Hash Table
public class LinearProbingHashTable implements Dictionary {
  /** Marker for deactivated buckets */
  private static Item AVAILABLE = new Item(null, null);
  /** number of items in the dictionary */
  private int n = 0;
  /** capacity of the bucket array */
  private int N;
  /** bucket array */
  private Item[] A;
49 Linear Probing Hash Table
  /** hash comparator */
  private HashComparator h;

  /** constructor providing the hash comparator */
  public LinearProbingHashTable(HashComparator hc) {
    h = hc;
    N = 1023; // default capacity
    A = new Item[N];
  }
50 Linear Probing Hash Table
  /** constructor providing the hash comparator and the capacity
      of the bucket array */
  public LinearProbingHashTable(HashComparator hc, int bN) {
    h = hc;
    N = bN;
    A = new Item[N];
  }
51 Linear Probing Hash Table
  // auxiliary methods
  private boolean available(int i) {
    return (A[i] == AVAILABLE);
  }

  private boolean empty(int i) {
    return (A[i] == null);
  }
52 Linear Probing Hash Table
  private Object key(int i) {
    return A[i].key();
  }

  private Object element(int i) {
    return A[i].element();
  }

  private void check(Object k) {
    if (!h.isComparable(k))
      throw new InvalidKeyException("Invalid key.");
  }
53 Helper search method
  /** helper search method */
  private int findItem(Object key) throws InvalidKeyException {
    check(key);
    // division method compression map (assumes hashValue is non-negative)
    int i = h.hashValue(key) % N;
    int j = i;
54 Helper search method
    do {
      if (empty(i))
        return -1; // item is not found
      if (available(i))
        i = (i + 1) % N; // bucket is deactivated
      else if (h.isEqualTo(key(i), key)) // we have found our item
        return i;
      else // we must keep looking
        i = (i + 1) % N;
    } while (i != j);
    return -1; // item is not found
  }
55 Dictionary
  // methods of the dictionary ADT
  public Object findElement(Object key) throws InvalidKeyException {
    int i = findItem(key); // helper method for finding a key
    if (i < 0)
      return Dictionary.NO_SUCH_KEY;
    return element(i);
  }
56 Dictionary
  public void insertItem(Object key, Object element) throws InvalidKeyException {
    check(key);
    // division method compression map (assumes hashValue is non-negative)
    int i = h.hashValue(key) % N;
    int j = i; // remember where we are starting
57 Dictionary
    do {
      if (empty(i) || available(i)) { // this slot is available
        A[i] = new Item(key, element);
        n++;
        return;
      }
      i = (i + 1) % N; // check next slot
    } while (i != j); // repeat until we return to start
    throw new HashTableFullException("Hash table is full.");
  }
58 Dictionary
  public Object removeElement(Object key) throws InvalidKeyException {
    int i = findItem(key); // find this key first
    if (i < 0)
      return Dictionary.NO_SUCH_KEY; // nothing to remove
    Object toReturn = element(i);
    A[i] = AVAILABLE; // mark this slot as deactivated
    n--;
    return toReturn;
  }