Title: Hash Tables
- CS341
- Western Washington University
A Dictionary
The concept of a dictionary is present in tons of software applications. Like what?
A dictionary is a collection of elements, where each element has a unique key and usually a field called the value. The operations performed on a dictionary include:
- insert an element
- search for an element
- delete an element
All of these operations depend on the key.
Access in a dictionary is random, as opposed to sequential. Which other data type uses random access? Sequential access? The STL provides map as an implementation of a dictionary:
template <class Key, class Value, class Compare = less<Key>, class Allocator = allocator<pair<const Key, Value>>> class map;
It is also possible for a dictionary to allow duplicates: entries that have the same key but different values. The STL provides a multimap class for this.
A Hash Table
A hash table is another alternative for representing a dictionary. In a hash table, a hash function is used to map keys into positions in a table. This is the act of hashing.
Here's the philosophy of hashing: in the ideal situation, if element e has the key k and f is the hash function, then e is stored in position f(k) of the table.
- Search: compute f(k) and see if there is an element there
- Insertion: compute f(k) and place the element in that position
- Deletion: compute f(k) and delete the element at that position
Let's analyze the performance of the ideal situation. Searching takes Θ(1) time (and so do insertion and deletion).
Before we can hash the key, we must numericize the key (borrowing vocabulary from Dr. Mobus): we need to translate the key into a numerical value. For some applications, the key may already be in a numerical format. The trick to hashing is three-fold:
1. Choosing how to numericize.
2. Choosing the number of buckets. This corresponds to the size of the hash table; sometimes it's called b, sometimes it's called D.
3. Choosing the hash function. A common hash function uses the modulus operator, f(k) = k % D, where the buckets in the hash table are indexed 0 through D-1.
Hash Table Implementations
Two approaches from the book are:
- Linear open addressing
- Chains
Before we talk about implementations, there's one more important term: collision. A collision occurs when multiple keys hash to the same location; this location is called their home bucket. How collisions are handled differs between the two implementations.
Linear Open Addressing: if a collision occurs, insert the entry into the next available bucket, wrapping around from bucket D-1 to bucket 0. What effect does this have on the dictionary operations?
Chains
With chaining, we maintain a chain of entries that have the same home bucket. We can implement this as an array of linked lists, but you probably already guessed that. What effect does chaining have on the dictionary operations?
A Hashing Application: Text Compression
We can use compression to reduce the amount of storage needed for a text file. There are many algorithms that can be used to perform the compression, and we will need decompression to decode the text file when retrieving it from storage.
One simple technique is to replace each run of a repeated character with the character followed by digits giving the number of occurrences in that run. This works well only for strings that contain long runs of the same character. Another technique is called the LZW method. It is named after its creators: Lempel, Ziv, and Welch.
LZW Compression
LZW compression maps strings of text characters into numeric codes (sound familiar?). Let's look at the string aaabbbbbbaabaaba. The alphabet consists of a's and b's, so we can assign a the code 0 and b the code 1. We need to create a dictionary to store the keys and codes. When compressing a string, we'll perform dictionary lookups to retrieve a code for a particular sequence in the string. We'll also need to modify the dictionary to contain additional key and code pairs.
The LZW Rule
The rule is to find the longest prefix, p, of the unencoded part of the input file that is in the dictionary, and output its code. Additionally, if there is a next character, c, in the input file, then pc is assigned the next code and inserted into the dictionary.
LZW Decompression
- For decompression, we reverse the process we used for compression. We will be searching the dictionary based on codes, rather than keys.
- We will still need to build the dictionary dynamically. We can initialize it exactly as before.
- Here are the cases to consider:
  - the code p is in the dictionary (hooray!)
  - the code p is not in the dictionary (boo!)
- In either case, we need to add an entry to the dictionary.
How do we build the dictionary during decompression? To start off, we know that the first code corresponds to a single character, so we can replace it with that character. When p is in the dictionary, text(p) is added to the decoded string. We also need to add the pair (next code, text(q)fc(p)) to the dictionary. Here q represents the code preceding p, and fc(p) is the first character of text(p). When p is not in the dictionary, we need to figure out what the appropriate text will be and add a new entry to the dictionary. In this case, text(p) = text(q)fc(q); this text is added to the decoded string, and the pair (p, text(p)) is added to the dictionary. Decompress the example string 0214537.