Title: CS2851 Dr. Mark L. Hornick
1. Hashing, HashMaps and HashSets
2. JCF HashMap is a Map
- Implements same methods found in TreeMap
- put(), get()
- remove()
- entrySet(), keySet()
- containsKey(), containsValue()
- size(), equals(), clear()
3. The JCF HashMap
- Like the JCF TreeMap, this class implements the Map interface
- Implying a data structure based on key/value pairs
- public class HashMap<K, V>
- extends AbstractMap<K, V>
- implements Map<K, V>
- Example: HashMap<String, Double> students; // String holds student ID, Double holds GPA
4. JCF HashMap does not sort either keys or values
- Implements Map, not SortedMap
- entrySet(), keySet() generate unsorted Sets
- Iterating through these Sets results in keys or entries in no apparent order
- So...
- Why bother with a HashMap at all?
- What's the point?
5. Review performance of previously covered data structures
- ArrayList
- get()
- add()
- contains()
- LinkedList
- get()
- add()
- contains()
- TreeMap/TreeSet
- get()
- add() (put)
- contains()
6. HashMap's advantage is overall performance
- Constant time performance (on average) for the basic key-based operations!
- put()
- get()
- containsKey()
- How???
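A brief usage sketch of the three operations listed above, using a student-ID-to-GPA map like the one in the earlier example. The class name, IDs, and GPA values are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class StudentGpaDemo {
    // Builds a small map of student ID -> GPA (sample data, invented for this sketch).
    static Map<String, Double> buildStudents() {
        Map<String, Double> students = new HashMap<>();
        students.put("S1001", 3.4);
        students.put("S1002", 2.9);
        students.put("S1003", 3.8);
        return students;
    }

    public static void main(String[] args) {
        Map<String, Double> students = buildStudents();
        System.out.println(students.get("S1002"));         // 2.9
        System.out.println(students.containsKey("S9999")); // false
        students.put("S1002", 3.1);                        // put() on an existing key replaces its value
        System.out.println(students.get("S1002"));         // 3.1
    }
}
```

None of these calls depends on the order in which the entries were added, which is why HashMap makes no ordering promise.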
7. Hash definition
- A hash is a transformation of a key into a numeric value that maps to the index of an array (or table)
- This is done in two steps
- First, generate a numeric hashcode from the key
- Second, transform the hashcode into an array index
8. How do you generate a hashcode?
- In Java, every class has a hashCode() method, and many classes override it
- Classes that don't override hashCode() inherit the Object class's hashCode() method
- Which returns an identity-based value (typically derived from the object's memory address), so it can differ from one run to the next
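The difference between an overridden and an inherited hashCode() can be sketched as follows. StudentId and Plain are hypothetical classes made up for this example; only Objects.hash and Integer.hashCode come from the standard library.

```java
import java.util.Objects;

public class HashCodeDemo {
    // Hypothetical key class that overrides equals() and hashCode() consistently.
    static final class StudentId {
        private final String id;
        StudentId(String id) { this.id = id; }

        @Override public boolean equals(Object o) {
            return o instanceof StudentId && ((StudentId) o).id.equals(id);
        }

        @Override public int hashCode() {
            return Objects.hash(id); // derived from the field, so equal objects hash alike
        }
    }

    // Hypothetical class that inherits Object.hashCode() (identity-based).
    static final class Plain { }

    static int hashOfStudent(String id) { return new StudentId(id).hashCode(); }

    public static void main(String[] args) {
        System.out.println(hashOfStudent("S1001") == hashOfStudent("S1001")); // true
        System.out.println(Integer.valueOf(42).hashCode()); // 42: Integer returns its int value
        System.out.println(new Plain().hashCode());         // value may vary from run to run
    }
}
```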
9. How do you transform a hashcode into an array index?
- First, consider the Integer class
- Integer's hashCode() method simply returns the underlying int
- The HashMap class has a package-visible hash() method
    static int hash(Object x) {
        int h = x.hashCode();
        h += ~(h << 9);
        h ^= (h >>> 14);
        h += (h << 4);
        h ^= (h >>> 10);
        return h;
    }
- This method further scrambles the hashcode; for example
- hash(123456789) // Returns 1272491941
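A runnable sketch of the scrambling method quoted on this slide. The operators (`+=`, `~`, `^=`) are a reconstruction matching the supplemental hash in older JDK releases, which is an assumption; it reproduces the slide's sample value. Current JDK releases use a different scrambling step. The class name ScrambleDemo is made up for this example.

```java
public class ScrambleDemo {
    // Supplemental hash as quoted on the slide; the operators are reconstructed
    // (an assumption) and agree with the slide's sample result.
    static int hash(Object x) {
        int h = x.hashCode();
        h += ~(h << 9);
        h ^= (h >>> 14);
        h += (h << 4);
        h ^= (h >>> 10);
        return h;
    }

    public static void main(String[] args) {
        // 123456789 is autoboxed to an Integer, whose hashCode() is the int itself.
        System.out.println(hash(123456789)); // 1272491941
    }
}
```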
10. How do you transform a hashcode into an array index?
- An index in the range 0..1023 can be computed as follows
- int index = hash(123456789) % 1024;
- or
- int index = hash(123456789) & 1023;
- The resulting index = 933
- The second operation is computationally faster
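The two index computations can be compared side by side. The helper names below are hypothetical. Besides speed, the sketch shows a second difference: in Java, % keeps the sign of a negative hash, while the mask always yields a valid index.

```java
public class IndexDemo {
    // Index via remainder: may be negative when h is negative.
    static int indexByRemainder(int h, int length) { return h % length; }

    // Index via mask: assumes length is a power of 2; the result is never negative.
    static int indexByMask(int h, int length) { return h & (length - 1); }

    public static void main(String[] args) {
        int h = 1272491941; // hash(123456789), the value given on the slide
        System.out.println(indexByRemainder(h, 1024));  // 933
        System.out.println(indexByMask(h, 1024));       // 933
        System.out.println(indexByRemainder(-7, 1024)); // -7, not a usable array index
        System.out.println(indexByMask(-7, 1024));      // 1017
    }
}
```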
11. How does the & operator work?
- The & operator performs a bitwise AND on its operands.
- For each pair of bits a and b, if a and b are both 1 bits, a & b = 1. Otherwise, a & b = 0.
- For example,
-   10100001101001
- & 00000000001111
- = 00000000001001
12. 1023 in binary form
- 00000000000000000000001111111111
- So (w & 1023)
- returns the rightmost 10 bits of the operand w
- In general, this works well as long as the table length is a power of 2
- Why??
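One way to explore the "Why??" question: when the table length is a power of 2, length - 1 is a solid run of 1 bits, so every slot can be produced. For any other length the mask has 0 bits in it, and slots that need those bits are never reached. The sketch below counts reachable slots; the class and method names are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class MaskDemo {
    // Counts how many distinct indices h & (length - 1) produces for h = 0..4095.
    static int distinctIndices(int length) {
        Set<Integer> seen = new HashSet<>();
        for (int h = 0; h < 4096; h++) {
            seen.add(h & (length - 1));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(1023)); // 1111111111 (ten 1 bits)
        System.out.println(Integer.toBinaryString(999));  // 1111100111 (gaps in the mask)
        System.out.println(distinctIndices(1024)); // 1024: every slot is reachable
        System.out.println(distinctIndices(1000)); // 256: most slots can never be used
    }
}
```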
13. What are the index values xxx, yyy, and zzz?
14. More hashing examples (for a table 1024 in length)
- 123456789 indexes to 933
- 428671256 indexes to 500
- 884739816 indexes to 234
15. Hashing can result in collisions
- 123456789 indexes to 933
- 428671256 indexes to 500
- 884739816 indexes to 234
- 403578063 indexes to 933
- When two different keys yield the same index, that is called a collision.
- Keys that yield the same index are called synonyms.
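The collision listed above can be checked with a short sketch that groups keys by table index. It assumes the scrambling method from slide 9 with its operators reconstructed as in older JDK releases, and a table length of 1024. The names CollisionDemo, indexFor, and groupByIndex are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CollisionDemo {
    // Scrambling step from slide 9 (operators reconstructed, an assumption).
    static int hash(Object x) {
        int h = x.hashCode();
        h += ~(h << 9);
        h ^= (h >>> 14);
        h += (h << 4);
        h ^= (h >>> 10);
        return h;
    }

    // Index into a table of length 1024.
    static int indexFor(int key) { return hash(key) & 1023; }

    // Groups keys by table index, so colliding keys end up in the same list.
    static Map<Integer, List<Integer>> groupByIndex(int... keys) {
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int key : keys) {
            groups.computeIfAbsent(indexFor(key), i -> new ArrayList<>()).add(key);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Any index that maps to more than one key marks a collision.
        System.out.println(groupByIndex(123456789, 428671256, 884739816, 403578063));
    }
}
```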
16. Hashing is inefficient when there are a lot of collisions
- Ideally, we want the hashing algorithm to generate indices sprinkled randomly throughout the underlying table
- The Uniform Hashing Assumption assumes
- Each key is equally likely to hash to any one of the table addresses, independently of where the other keys have hashed
17. Even if this assumption is true, collisions can still occur
- This is due to the finite set of indices in a table
- An unbounded number of possible keys cannot be mapped into a finite set of indices without some keys sharing an index
- So collision handlers have to be implemented
18. The JCF HashMap collision handling mechanism
- At index i in the table, store the linked list of all elements whose keys hash to i
- This is called chaining
- Each chain is implemented as a simple singly-linked list
- Note: The table length must be a power of 2.
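Chaining can be sketched with a fixed-size table of singly-linked nodes. This is a minimal illustration, not the JCF source: ChainedMap and its members are hypothetical, null keys are not supported, and the table never resizes.

```java
public class ChainedMapDemo {
    // Minimal chained hash table: each slot holds the head of a singly-linked list.
    static final class ChainedMap<K, V> {
        private static final class Node<K, V> {
            final K key;
            V value;
            Node<K, V> next;
            Node(K key, V value, Node<K, V> next) {
                this.key = key; this.value = value; this.next = next;
            }
        }

        private final Node<K, V>[] table;
        private int size;

        @SuppressWarnings("unchecked")
        ChainedMap(int powerOfTwoLength) {
            table = (Node<K, V>[]) new Node[powerOfTwoLength];
        }

        private int indexFor(Object key) { return key.hashCode() & (table.length - 1); }

        // Returns the previous value for the key, or null if the key is new.
        V put(K key, V value) {
            int i = indexFor(key);
            for (Node<K, V> n = table[i]; n != null; n = n.next) {
                if (n.key.equals(key)) { V old = n.value; n.value = value; return old; }
            }
            table[i] = new Node<>(key, value, table[i]); // prepend to the chain at slot i
            size++;
            return null;
        }

        V get(Object key) {
            for (Node<K, V> n = table[indexFor(key)]; n != null; n = n.next) {
                if (n.key.equals(key)) return n.value;
            }
            return null;
        }

        int size() { return size; }
    }

    public static void main(String[] args) {
        ChainedMap<Integer, String> map = new ChainedMap<>(16);
        map.put(5, "five");
        map.put(21, "twenty-one"); // 21 & 15 == 5, so this key shares slot 5 with key 5
        System.out.println(map.get(5) + " " + map.get(21) + " " + map.size());
    }
}
```

Both keys land in the same slot, yet each is still found because get() walks the chain comparing keys with equals().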
20. As chains get long, performance degrades to O(m) for a chain of m entries
- Once the number of entries exceeds 75% of the table length, the table is resized
- All indices are recalculated
- Chains are eliminated or shortened
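A small sketch of why resizing shortens chains, assuming a load factor of 0.75 and a table that doubles in length. The hash values 5 and 21 and the helper names are chosen only for illustration.

```java
public class ResizeDemo {
    // Entry count at which a table of the given length is resized (load factor 0.75 assumed).
    static int threshold(int tableLength) { return (int) (tableLength * 0.75f); }

    // Index of a hash value in a table whose length is a power of 2.
    static int indexFor(int hash, int tableLength) { return hash & (tableLength - 1); }

    public static void main(String[] args) {
        System.out.println(threshold(16));    // 12
        System.out.println(indexFor(5, 16));  // 5
        System.out.println(indexFor(21, 16)); // 5  (collides with 5)
        System.out.println(indexFor(5, 32));  // 5
        System.out.println(indexFor(21, 32)); // 21 (the chain splits after doubling)
    }
}
```

Doubling the length adds one more bit to the mask, so keys that shared a slot are separated whenever their hashes differ in that bit.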
23. Another collision handler
- In Open Address hashing, when a collision occurs, the next available index is used to store the key/value
- This leads to some interesting practical implementation problems (see the text)
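Open addressing can be sketched with linear probing, where "next available index" is taken to mean the next slot in sequence, wrapping at the end of the table (an assumption; other probe sequences exist). ProbingMap is a hypothetical class. It deliberately omits removal and resizing, which are among the harder parts of this scheme.

```java
public class OpenAddressDemo {
    // Minimal open-address table with linear probing; no removal or resizing.
    static final class ProbingMap {
        private final Integer[] keys;
        private final String[] values;

        ProbingMap(int powerOfTwoLength) {
            keys = new Integer[powerOfTwoLength];
            values = new String[powerOfTwoLength];
        }

        // Returns the slot where the key was stored, or -1 if the table is full.
        int put(int key, String value) {
            int mask = keys.length - 1;
            int start = Integer.hashCode(key) & mask;
            for (int probe = 0; probe < keys.length; probe++) {
                int i = (start + probe) & mask; // wrap around at the end of the table
                if (keys[i] == null || keys[i] == key) {
                    keys[i] = key;
                    values[i] = value;
                    return i;
                }
            }
            return -1;
        }

        String get(int key) {
            int mask = keys.length - 1;
            int start = Integer.hashCode(key) & mask;
            for (int probe = 0; probe < keys.length; probe++) {
                int i = (start + probe) & mask;
                if (keys[i] == null) return null; // an empty slot ends the probe sequence
                if (keys[i] == key) return values[i];
            }
            return null;
        }
    }

    public static void main(String[] args) {
        ProbingMap map = new ProbingMap(8);
        System.out.println(map.put(3, "a"));  // 3
        System.out.println(map.put(11, "b")); // 4: slot 3 is taken, so the next slot is used
        System.out.println(map.get(11));      // b
    }
}
```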
25. A HashSet is an unordered Collection in which the element is the key
- The HashSet class has all of the methods in the Collection interface
- add, remove, size, contains, ...
- plus toString (inherited from AbstractCollection)
- public class HashSet<E>
- extends AbstractSet<E>
- implements Set<E>, Cloneable, java.io.Serializable
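A short usage sketch of the Collection methods named above. The class name and student IDs are invented for illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetDemo {
    // Builds a set from three add() calls, one of which is a duplicate.
    static Set<String> sampleIds() {
        Set<String> ids = new HashSet<>();
        ids.add("S1001");
        ids.add("S1002");
        ids.add("S1001"); // duplicate: add() returns false and the set is unchanged
        return ids;
    }

    public static void main(String[] args) {
        Set<String> ids = sampleIds();
        System.out.println(ids.size());            // 2
        System.out.println(ids.contains("S1002")); // true
        System.out.println(ids.remove("S1002"));   // true
        System.out.println(ids.contains("S1002")); // false
    }
}
```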