Title: CS2851, Dr. Mark L. Hornick
1. Hashing, HashMaps and HashSets
2. JCF HashMap is a Map
- Implements the same methods found in TreeMap
- put(), get()
- remove()
- entrySet(), keySet()
- containsKey(), containsValue()
- size(), equals(), clear()
3. The JCF HashMap
- Like the JCF TreeMap, this class implements the Map interface
- This implies a data structure based on key/value pairs
- public class HashMap<K, V>
- extends AbstractMap<K, V>
- implements Map<K, V>
- Example: HashMap<String, Double> students; // String holds student ID, Double holds GPA
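A minimal usage sketch of the students map described above. The student IDs, GPAs, and the class name StudentGpaDemo are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

class StudentGpaDemo {
    // Key = student ID (String), value = GPA (Double); all data is hypothetical.
    static Map<String, Double> buildStudents() {
        Map<String, Double> students = new HashMap<>();
        students.put("S1001", 3.75);
        students.put("S1002", 2.90);
        students.put("S1003", 3.40);
        return students;
    }

    public static void main(String[] args) {
        Map<String, Double> students = buildStudents();
        System.out.println(students.get("S1002"));          // 2.9
        System.out.println(students.containsKey("S9999"));  // false
        students.put("S1002", 3.10);   // put() on an existing key replaces its value
        students.remove("S1003");
        System.out.println(students.size());                // 2
    }
}
```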
4. JCF HashMap does not sort either keys or values
- Implements Map, not SortedMap
- entrySet() and keySet() generate unsorted Sets
- Iterating through these Sets yields keys or entries in no apparent order
- So:
- Why bother with a HashMap at all?
- What's the point?
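The contrast with TreeMap shows up when the same keys go into both maps. In this sketch the class name and sample words are illustrative. The TreeMap order is guaranteed, but the HashMap order is deliberately not printed as a fixed result because it depends on hash codes and table size:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class OrderDemo {
    // Inserts the same five words (key = word, value = its length) into any Map.
    static Map<String, Integer> fill(Map<String, Integer> map) {
        String[] words = {"pear", "apple", "fig", "banana", "cherry"};
        for (String w : words) {
            map.put(w, w.length());
        }
        return map;
    }

    public static void main(String[] args) {
        // TreeMap is a SortedMap: keySet() iterates in ascending key order.
        System.out.println(fill(new TreeMap<>()).keySet()); // [apple, banana, cherry, fig, pear]
        // HashMap: same keys, but no iteration order is guaranteed.
        System.out.println(fill(new HashMap<>()).keySet());
    }
}
```

Both maps hold the same entries; only the iteration order differs.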
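5. Review performance of previously covered data structures
- ArrayList
- get(): O(1)
- add(): O(1) amortized, when appending at the end
- contains(): O(n)
- LinkedList
- get(): O(n)
- add(): O(1), when appending at the end
- contains(): O(n)
- TreeMap/TreeSet
- get(): O(log n)
- add() (put): O(log n)
- contains(): O(log n)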
6. HashMap's advantage is overall performance
- Constant-time performance on average for the key-based operations!
- put()
- get()
- containsKey()
- How???
7. Hash definition
- A hash is a transformation of a key into a numeric value that maps to the index of an array (or table)
- This is done in two steps
- First, generate a numeric hashcode from the key
- Second, transform the hashcode into an array index
Key → hashcode → index
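The two steps can be sketched in one method. This version rests on an illustrative assumption: it uses the key's own hashCode() for step one and Math.floorMod for step two, which keeps the index non-negative. It is not the JCF HashMap's transformation, and the class name TwoStepIndex is hypothetical:

```java
class TwoStepIndex {
    // Step 1: key -> hashcode (the key's own hashCode()).
    // Step 2: hashcode -> index in [0, tableLength); floorMod never returns
    // a negative result, even for a negative hashcode.
    static int indexFor(Object key, int tableLength) {
        int hashcode = key.hashCode();
        return Math.floorMod(hashcode, tableLength);
    }

    public static void main(String[] args) {
        // String.hashCode() is specified by the JDK, so this value is fixed.
        System.out.println("hello".hashCode());     // 99162322
        System.out.println(indexFor("hello", 16));  // 2
    }
}
```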
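8. How do you generate a hashcode?
- In Java, every class has a hashCode() method, and many classes (such as String and Integer) override it
- Classes that don't override hashCode() inherit the Object class's hashCode() method
- Which returns an identity-based value (historically derived from the object's memory address), so it is not deterministic from one run of the program to the next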
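HashMap locates a key by its hashCode() and then confirms the match with equals(). A class used as a key should therefore override both, so that equal objects always return equal hash codes. This sketch uses a hypothetical StudentId key class and combines its fields with java.util.Objects.hash:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class: a department code plus a number.
final class StudentId {
    private final String dept;
    private final int number;

    StudentId(String dept, int number) {
        this.dept = dept;
        this.number = number;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StudentId)) return false;
        StudentId other = (StudentId) o;
        return number == other.number && dept.equals(other.dept);
    }

    // Built from exactly the fields equals() compares, so equal objects
    // produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(dept, number);
    }
}

class HashCodeDemo {
    public static void main(String[] args) {
        Map<StudentId, Double> gpas = new HashMap<>();
        gpas.put(new StudentId("CS", 2851), 3.5);
        // A different but equal object finds the same entry.
        System.out.println(gpas.get(new StudentId("CS", 2851))); // 3.5
    }
}
```

Suppose StudentId overrode equals() but kept Object's hashCode(). The lookup with a new, equal object would then usually miss and return null.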
9. How do you transform a hashcode into an array index?
- First, consider the Integer class
- Integer's hashCode() method simply returns the underlying int
- The HashMap class has a package-visible hash() method:
    static int hash(Object x) {
        int h = x.hashCode();
        h += ~(h << 9);
        h ^= (h >>> 14);
        h += (h << 4);
        h ^= (h >>> 10);
        return h;
    }
- This method further scrambles the hashcode; for example,
- hash(123456789) // Returns 1272491941
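The slide's hash() can be run as is. It matches the supplemental hash in older JDK HashMap releases. Java 8 and later spread the bits with a simpler step, h ^ (h >>> 16), so this is a sketch of the slide's version, not of today's JDK. The class name SupplementalHash is hypothetical:

```java
class SupplementalHash {
    // Mixes high-order bits of the hashcode into the low-order bits, which are
    // the only bits kept when the table length is a power of 2.
    static int hash(Object x) {
        int h = x.hashCode();
        h += ~(h << 9);
        h ^= (h >>> 14);
        h += (h << 4);
        h ^= (h >>> 10);
        return h;
    }

    public static void main(String[] args) {
        // 123456789 is autoboxed to Integer, whose hashCode() is the int itself.
        System.out.println(hash(123456789));        // 1272491941
        System.out.println(hash(123456789) & 1023); // 933
    }
}
```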
10. How do you transform a hashcode into an array index?
- An index in the range 0..1023 can be computed as follows:
- int index = hash(123456789) % 1024;
- or
- int index = hash(123456789) & 1023;
- The resulting index = 933
- The second operation is computationally faster
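For a non-negative hash, % 1024 and & 1023 give the same index. The scrambled hash can be negative, though, and Java's % then returns a negative remainder, which is not a valid array index. By contrast, & 1023 always yields a value in 0..1023. This sketch compares the two on keys from these slides. The class name is hypothetical, and hash() repeats the slide's method so the block is self-contained:

```java
class ModVersusMask {
    // The slide's supplemental hash (older JDK HashMap).
    static int hash(Object x) {
        int h = x.hashCode();
        h += ~(h << 9);
        h ^= (h >>> 14);
        h += (h << 4);
        h ^= (h >>> 10);
        return h;
    }

    public static void main(String[] args) {
        int h1 = hash(123456789);   // positive: both forms agree
        System.out.println((h1 % 1024) + " " + (h1 & 1023));   // 933 933

        int h2 = hash(428671256);   // negative: % keeps the sign, & does not
        System.out.println(h2);                                // -131979788
        System.out.println((h2 % 1024) + " " + (h2 & 1023));   // -524 500
    }
}
```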
11. How does the & operator work?
- The & operator performs a bitwise AND on its operands.
- For each pair of bits a and b, if a and b are both 1 bits, a & b = 1. Otherwise, a & b = 0.
- For example,
      10100001101001
    & 00000000001111
    = 00000000001001
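The slide's example can be checked with Java's binary literals (Java 7 and later). The names A and B are illustrative:

```java
class BitwiseAndDemo {
    static final int A = 0b10100001101001;
    static final int B = 0b00000000001111;   // keeps only the 4 rightmost bits of A

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(A & B));  // 1001
    }
}
```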
12. 1023 in binary form
- 00000000000000000000001111111111
- So (w & 1023)
- returns the rightmost 10 bits of the operand w
- In general, this works well as long as the table length is a power of 2
- Why?
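When the table length is a power of 2, length - 1 is a run of 1 bits (1023 is ten 1 bits). In that case h & (length - 1) keeps exactly the low-order bits and equals the non-negative remainder of h modulo length. For other lengths the mask has 0 bits inside it, and the result is no longer a remainder. This sketch checks both cases. The class and method names are hypothetical:

```java
class PowerOfTwoMask {
    // True when masking with (length - 1) gives the same index as the
    // non-negative remainder of h modulo length.
    static boolean maskMatchesMod(int h, int length) {
        return (h & (length - 1)) == Math.floorMod(h, length);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(1023));      // 1111111111 (ten 1 bits)
        System.out.println(maskMatchesMod(1272491941, 1024));  // true
        System.out.println(maskMatchesMod(-5, 1024));          // true (both give 1019)
        // 1000 is not a power of 2: 999 is 1111100111 in binary, so
        // 1272491941 & 999 is 933, while 1272491941 mod 1000 is 941.
        System.out.println(maskMatchesMod(1272491941, 1000));  // false
    }
}
```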
13. Exercise
What are the index values xxx, yyy, and zzz?
14. More hashing examples (for a table of length 1024)
- 123456789 indexes to 933
- 428671256 indexes to 500
- 884739816 indexes to 234
15. Hashing can result in collisions
- 123456789 indexes to 933
- 428671256 indexes to 500
- 884739816 indexes to 234
- 403578063 indexes to 933
- When two different keys yield the same index, that is called a collision.
- Keys that yield the same index are called synonyms.
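The indices on the last two slides can be reproduced by combining the slide's hash() with & 1023. This sketch also groups keys by index, so synonyms appear together. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CollisionDemo {
    // The slide's supplemental hash (older JDK HashMap).
    static int hash(Object x) {
        int h = x.hashCode();
        h += ~(h << 9);
        h ^= (h >>> 14);
        h += (h << 4);
        h ^= (h >>> 10);
        return h;
    }

    // Index into a table whose length is a power of 2.
    static int indexFor(int key, int tableLength) {
        return hash(key) & (tableLength - 1);
    }

    // Groups keys by index (sorted by index); a group of two or more keys is a set of synonyms.
    static Map<Integer, List<Integer>> groupByIndex(int[] keys, int tableLength) {
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int key : keys) {
            groups.computeIfAbsent(indexFor(key, tableLength), i -> new ArrayList<>()).add(key);
        }
        return groups;
    }

    public static void main(String[] args) {
        int[] keys = {123456789, 428671256, 884739816, 403578063};
        for (int key : keys) {
            System.out.println(key + " indexes to " + indexFor(key, 1024));
        }
        System.out.println(groupByIndex(keys, 1024));
        // {234=[884739816], 500=[428671256], 933=[123456789, 403578063]}
    }
}
```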
16. Hashing is inefficient when there are a lot of collisions
- Ideally, we want the hashing algorithm to generate indices sprinkled randomly throughout the underlying table
- The Uniform Hashing Assumption assumes:
- Each key is equally likely to hash to any one of the table addresses, independently of where the other keys have hashed
17. Even if this assumption is true, collisions still occur
- This is due to the finite set of indices in a table
- An infinite number of keys cannot be mapped one-to-one into a finite set of indices
- So collision handlers have to be implemented
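A short calculation shows how soon collisions appear, assuming the Uniform Hashing Assumption holds exactly. For n keys in a table of m slots, the probability that no two keys share an index is the product below. The exponential form is the standard approximation when n is much smaller than m:

```latex
P(\text{no collision}) = \prod_{i=0}^{n-1}\left(1 - \frac{i}{m}\right) \approx e^{-n(n-1)/(2m)}
```

For m = 1024 and n = 40, the exponent is 40 × 39 / 2048 ≈ 0.76, so P(no collision) ≈ 0.47 (the exact product is slightly smaller). A collision is therefore already more likely than not while the table is under 4% full. Once n > m, a collision is certain.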
18. The JCF HashMap collision handling mechanism
- At index i in the table, store the linked list of all elements whose keys hash to i
- This is called chaining
- It implements a simple singly-linked list (since Java 8, a chain that grows long is converted into a balanced tree)
- Note: the table length must be a power of 2.
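A minimal sketch of chaining, assuming non-null keys and a fixed power-of-2 table with no resizing. It reuses the slide's older supplemental hash for indexing. The JDK's actual HashMap code differs, and names such as ChainedHashTable and chainLength are hypothetical:

```java
// Each table slot holds a singly linked chain of the entries whose keys index there.
class ChainedHashTable<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry<K, V>[] table;
    private int size;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int lengthPowerOfTwo) {
        table = (Entry<K, V>[]) new Entry[lengthPowerOfTwo];
    }

    // Slide's supplemental hash, then mask with (length - 1).
    private int indexFor(Object key) {
        int h = key.hashCode();
        h += ~(h << 9);
        h ^= (h >>> 14);
        h += (h << 4);
        h ^= (h >>> 10);
        return h & (table.length - 1);
    }

    V put(K key, V value) {
        int i = indexFor(key);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {       // key already in this chain: replace value
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Entry<>(key, value, table[i]);   // prepend a new node
        size++;
        return null;
    }

    V get(Object key) {
        for (Entry<K, V> e = table[indexFor(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    // Number of entries in the chain at the given index.
    int chainLength(int index) {
        int n = 0;
        for (Entry<K, V> e = table[index]; e != null; e = e.next) {
            n++;
        }
        return n;
    }
}
```

With a 1024-slot table, the keys 123456789 and 403578063 both index to 933, so they share one chain of length 2.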
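20. As chains get long, performance degrades to O(m), where m is the length of the chain being searched
- Once the number of entries exceeds 75% of the table length (the default load factor of 0.75), the table is resized (its length doubled)
- All indices are recalculated
- Chains are removed or shortened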
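Two details of resizing can be sketched, assuming the default load factor of 0.75 and power-of-2 table lengths. The first is the entry count that triggers a resize. The second is where an entry lands after the length doubles. The class and method names are hypothetical:

```java
class ResizeDemo {
    static final double LOAD_FACTOR = 0.75;   // default HashMap load factor

    // Entry count above which a table of this length would be resized.
    static int threshold(int tableLength) {
        return (int) (tableLength * LOAD_FACTOR);
    }

    static int indexFor(int h, int tableLength) {
        return h & (tableLength - 1);          // tableLength must be a power of 2
    }

    public static void main(String[] args) {
        System.out.println(threshold(1024));   // 768

        // Doubling 1024 -> 2048 makes one more hash bit significant, so each
        // entry either keeps its index or moves up by exactly 1024.
        int[] hashes = {1272491941, 5, -5};
        for (int h : hashes) {
            System.out.println(indexFor(h, 1024) + " -> " + indexFor(h, 2048));
        }
        // 933 -> 1957, 5 -> 5, 1019 -> 2043
    }
}
```

Only one extra hash bit becomes significant, so each chain splits into at most two chains. One stays at the old index, and the other moves to the old index plus the old length.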
23. Another collision handler
- In open-address hashing, when a collision occurs, the next available index is used to store the key/value
- This leads to some interesting practical implementation problems (see the text)
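A minimal sketch of open addressing with linear probing, assuming int keys, a power-of-2 table, and no remove() or resizing. The class name, and the choice of using each key as its own hashcode, are illustrative. Removal is one of the practical problems: simply clearing a slot would cut later probe sequences short, so implementations need extra bookkeeping such as marking deleted slots:

```java
// Open addressing with linear probing; null in keys[] marks an empty slot.
class LinearProbingTable {
    private final Integer[] keys;
    private final String[] values;

    LinearProbingTable(int lengthPowerOfTwo) {
        keys = new Integer[lengthPowerOfTwo];
        values = new String[lengthPowerOfTwo];
    }

    // Illustrative choice: the int key serves as its own hashcode.
    private int indexFor(int key) {
        return key & (keys.length - 1);
    }

    // Stores the pair and returns the slot used.
    int put(int key, String value) {
        int i = indexFor(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null || keys[i] == key) {
                keys[i] = key;
                values[i] = value;
                return i;
            }
            i = (i + 1) & (keys.length - 1);   // next slot, wrapping to 0 at the end
        }
        throw new IllegalStateException("table is full");
    }

    String get(int key) {
        int i = indexFor(key);
        for (int probes = 0; probes < keys.length && keys[i] != null; probes++) {
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) & (keys.length - 1);
        }
        return null;   // hit an empty slot or probed every slot: key absent
    }
}
```

In an 8-slot table, keys 3, 11, and 4 land in slots 3, 4, and 5. Key 11 collides with 3 and moves to slot 4, which then pushes 4 to slot 5. This clustering is another of the practical problems.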
24. HashSets
25. A HashSet is an unordered Collection in which the element is the key
- The HashSet class has all of the methods in the Collection interface
- add, remove, size, contains,
- plus toString (inherited from AbstractCollection)
- public class HashSet<E>
- extends AbstractSet<E>
- implements Set<E>, Cloneable, java.io.Serializable
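In the JDK, HashSet is backed by a HashMap whose keys are the set's elements, which is why the element acts as the key. A short usage sketch follows. The course codes and the class name HashSetDemo are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

class HashSetDemo {
    // Builds a small set; the duplicate add is ignored.
    static Set<String> sampleCourses() {
        Set<String> courses = new HashSet<>();
        courses.add("CS2851");
        courses.add("CS1011");
        courses.add("CS2851");   // already present: add() returns false, set unchanged
        return courses;
    }

    public static void main(String[] args) {
        Set<String> courses = sampleCourses();
        System.out.println(courses.size());              // 2
        System.out.println(courses.contains("CS1011"));  // true
        System.out.println(courses.add("CS1011"));       // false
        courses.remove("CS1011");
        System.out.println(courses);                     // [CS2851] (toString from AbstractCollection)
    }
}
```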