Title: Chapter 11 Searching
- CS 260 Data Structures
- Indiana University Purdue University Fort Wayne
- Mark Temte
2. Chapter outline
- Serial search
- Binary search
- Search by hashing
- Open-address hashing
- Hash functions
- Double hashing
- Chained hashing
- Analysis of hashing
- All search methods considered are array searches
3. Serial search
- This is also known as a . . .
- Linear search
- Sequential search
- Goal
- Look for a target value in a[first] .. a[first + n - 1]
- A search method typically might return
- first + i for success
- -1 to indicate failure
int i;
for ( i = 0; ( i < n ) && ( a[first + i] != target ); i++ )
    ;   // empty body: the loop only advances i
// The loop ended: either i == n or a[first + i] == target
if ( ( i < n ) && ( a[first + i] == target ) )
    return first + i;   // success at index first + i
else
    return -1;          // failure
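A minimal, runnable version of the loop above, packaged as a Java method. The class name SerialSearch and the signature serialSearch(a, first, n, target) are hypothetical choices for illustration. The method follows the slide's convention of returning first + i on success and -1 on failure.

```java
// Hypothetical helper class for illustration.
class SerialSearch {
    /**
     * Serial (linear) search of a[first] .. a[first + n - 1].
     * Returns the index of the first element equal to target, or -1 if none.
     */
    static int serialSearch(int[] a, int first, int n, int target) {
        int i = 0;
        // Advance until we run out of elements or find the target.
        while (i < n && a[first + i] != target) {
            i++;
        }
        return (i < n) ? first + i : -1;
    }

    public static void main(String[] args) {
        int[] a = {4, 8, 15, 16, 23, 42};
        System.out.println(serialSearch(a, 0, a.length, 23)); // prints 4
        System.out.println(serialSearch(a, 2, 3, 4));         // prints -1 (4 is outside a[2..4])
    }
}
```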
4. Serial search
- Analysis
- Best case
- Success on the first access
- O( 1 ) constant performance
- Worst case
- Failure
- O( n ) linear performance
- Average case
- Assume success equally likely at each position
- O( n ) linear performance
Total accesses over all n positions: $1 + 2 + \cdots + n = \frac{n(n+1)}{2}$
Average accesses: $\frac{\text{total accesses}}{\text{number of positions}} = \frac{n(n+1)/2}{n} = \frac{n+1}{2}$
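The average-case arithmetic can be checked empirically. This sketch runs a successful serial search once for every position of an array of distinct values (so each position is equally likely, as assumed above) and averages the number of array accesses. SerialSearchAverage and its methods are hypothetical names.

```java
// Hypothetical helper class for illustration.
class SerialSearchAverage {
    // Serial search that returns how many array accesses it made.
    static int countAccesses(int[] a, int target) {
        int accesses = 0;
        for (int i = 0; i < a.length; i++) {
            accesses++;
            if (a[i] == target) {
                break;
            }
        }
        return accesses;
    }

    // Average accesses when searching successfully for each element once.
    static double averageSuccessfulAccesses(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;                      // distinct values 0..n-1
        }
        long total = 0;
        for (int target = 0; target < n; target++) {
            total += countAccesses(a, target);
        }
        return (double) total / n;         // n(n+1)/2 divided by n
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println(averageSuccessfulAccesses(n)); // prints 500.5
        System.out.println((n + 1) / 2.0);                // prints 500.5
    }
}
```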
5. Binary search
- Binary search is often written as a recursive method
- The following version is easier to remember and code correctly than the version in the text
public static int binarySearch( int[] a, int first, int last, int target )
{
    if ( first > last )
        return -1;                             // empty range: failure
    int mid = first + ( last - first ) / 2;    // avoids int overflow of first + last
    if ( target < a[mid] )
        return binarySearch( a, first, mid - 1, target );
    else if ( target == a[mid] )
        return mid;
    else
        return binarySearch( a, mid + 1, last, target );
}
6. Binary search
- Recall the precondition
- The array must be sorted before the binary search may be used
- Analysis
- O( log(n) ) logarithmic performance
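The logarithmic bound can be observed directly. This sketch is an iterative rewrite of the recursive method (a swapped-in variant, not the slide's code) that also counts probes of a[mid]. For n elements a search makes at most $\lfloor \log_2 n \rfloor + 1$ probes, which is 20 for n = 1,000,000. BinarySearchProbes and searchWithProbes are hypothetical names.

```java
// Hypothetical helper class for illustration.
class BinarySearchProbes {
    // Iterative binary search; precondition: a is sorted in ascending order.
    // Returns {index of target or -1, number of probes of a[mid]}.
    static int[] searchWithProbes(int[] a, int target) {
        int first = 0, last = a.length - 1, probes = 0;
        while (first <= last) {
            int mid = first + (last - first) / 2;
            probes++;
            if (target < a[mid]) {
                last = mid - 1;
            } else if (target == a[mid]) {
                return new int[] {mid, probes};
            } else {
                first = mid + 1;
            }
        }
        return new int[] {-1, probes};
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = 2 * i;                  // sorted even numbers
        }
        // An odd target is never present, so this search fails.
        int[] r = searchWithProbes(a, 777_777);
        System.out.println(r[0] + " after " + r[1] + " probes");
    }
}
```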
7. Search by hashing
- Hashing is a search technique with average O(1) performance used to search a key-value table
- A key-value table is also known as a . . .
- Dictionary
- Map
- Associative array
- A hash function associates every possible key with a position in the array
- The hash function must be easy to compute
- To search for a key-value pair, the hash function is applied to the key and the resulting position in the array is accessed
8. Search by hashing
- Not only is the average hashing performance constant, but it is also efficient to add and remove key-value pairs
- The hash function has the form
private int hash( <key type> key )
- The integer returned by the hash function must be a valid array index
9. Example of a hash function
- Let class Pair represent a key-value pair object
- The table is the array table defined by
- Pair[] table = new Pair[1000];
- Each key is an employee social security number
- This is a String of characters of the form 999-99-9999
- The hash function maps the social security number to the array index defined by the last three digits of the social security number
- This integer is a valid array index in the range 0..999
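A sketch of this hash function, assuming every key is a well-formed String of the form 999-99-9999 (no validation is done). SsnHash is a hypothetical class name. Integer.parseInt accepts leading zeros, so "007" maps to index 7.

```java
// Hypothetical helper class for illustration.
class SsnHash {
    // Maps "999-99-9999" to the integer formed by its last three digits,
    // which always lies in 0..999. Assumes the key is well formed.
    static int hash(String ssn) {
        return Integer.parseInt(ssn.substring(ssn.length() - 3));
    }

    public static void main(String[] args) {
        System.out.println(hash("123-45-6789")); // prints 789
        System.out.println(hash("987-65-4007")); // prints 7
    }
}
```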
10. Search by hashing
- The ideal situation is to store the key-value pair in table[ hash( key ) ]
- Problem: the possibility of a collision
- Also called a hash clash
- A collision is when key1 != key2 but hash( key1 ) == hash( key2 )
- It is not usually possible to obtain a perfect hash function
- How we resolve this problem leads to various special hashing techniques
- Open-address hashing
- Double hashing
- Chained hashing
11. Collision example
- Hash your birthday to the range 0..364
- Ignore leap year
- Question
- In a classroom with 23 students, what is the probability of the students having at least one collision?
- Answer
- Greater than 50% (about 50.7%)
- So, with an array loading factor of only about 0.063 (23/365), there is more than a 50-50 chance of a collision
- Collisions are almost guaranteed to happen
- They must be handled in an efficient manner
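The "greater than 50%" answer follows from multiplying the probabilities that each new key avoids every earlier one: $P(\text{collision}) = 1 - \prod_{i=0}^{k-1} \frac{m-i}{m}$ for k keys hashed uniformly into m slots. The sketch below evaluates this product, assuming independent, uniformly distributed hash values; BirthdayCollision is a hypothetical class name.

```java
// Hypothetical helper class for illustration.
class BirthdayCollision {
    // Probability that at least two of k keys collide when each key is
    // hashed independently and uniformly into m slots.
    static double collisionProbability(int k, int m) {
        double noCollision = 1.0;
        for (int i = 0; i < k; i++) {
            noCollision *= (double) (m - i) / m;  // key i avoids the i earlier keys
        }
        return 1.0 - noCollision;
    }

    public static void main(String[] args) {
        System.out.printf("%.4f%n", collisionProbability(23, 365)); // about 0.507
        System.out.printf("%.4f%n", collisionProbability(22, 365)); // just under 0.5
    }
}
```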
12. Open-address hashing
- The open-address hashing technique resolves collisions using linear probing
- For linear probing, establish a sequence of predetermined alternate locations to use in the event of a collision
- Note that this wraps around the array if necessary
- Linear probing uses the first available open location
- Alternate locations are tried in order
- The sequence of alternates is needed in the event there are collisions at some of the alternate locations
Let $L_0 = \text{hash}(key)$. If $L_0$ is occupied, use a series of alternate locations $L_1, L_2, L_3, \ldots$
Alternate $L_{p+1}$ is defined by $L_{p+1} = (L_p + 1) \bmod \text{table.length}$
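The alternate-location formula can be generated directly. This sketch lists the first few locations of the linear-probing sequence for a given starting index; LinearProbing and probeSequence are hypothetical names, and the 9-slot table in main is an assumed example.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration.
class LinearProbing {
    // First `count` locations of the sequence L0 = start,
    // L(p+1) = (Lp + 1) % tableLength, which wraps around the array.
    static int[] probeSequence(int start, int tableLength, int count) {
        int[] seq = new int[count];
        int loc = start;
        for (int p = 0; p < count; p++) {
            seq[p] = loc;
            loc = (loc + 1) % tableLength;
        }
        return seq;
    }

    public static void main(String[] args) {
        // A 9-slot table where hash(key) = 7: the probes wrap past index 8 to 0.
        System.out.println(Arrays.toString(probeSequence(7, 9, 4))); // [7, 8, 0, 1]
    }
}
```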
13. Example
- Consider a hypothetical hash function at left
- Build the table in the order
- A, B, C, D, E, F, G
- Search for D
- Success at location 1
- Search for E
- Success at location 7
- Search for H with hash( H ) = 3
- Failure at location 5
- Delete C and search for G
- The search ends in failure at location 3 unless we know to skip over location 3
[Figure: a 9-slot table (locations 0 through 8) with columns keys and values]
14. Open-address hashing
- To handle deletions . . .
- Need to mark each location as one of . . .
- has been used
- has never been used
- For this purpose, add a new boolean instance variable hasBeenUsed to the Pair class
- Now the search for G has the information to skip over deleted location 3 and succeed at location 4
[Figure: the same 9-slot table (locations 0 through 8) with columns keys, values, and hasBeenUsed]
15. Open-address hashing
- The open-address hashing algorithm for searching is to use linear probing until . . .
- The key is found
- Success
- Or until reaching a location $L_p$ where table[ $L_p$ ].hasBeenUsed is false
- Failure
- To reduce the number of collisions, the maximum number of items to be placed in the table needs to be known in advance
- The capacity of the array must be set to a size somewhat larger than this maximum
16. The hashCode( ) method
- Every Java class inherits method hashCode( )
- This method maps any key object to an int
- The resulting int must subsequently be mapped to the range 0 . . (table.length-1) by a method hash( ) supplied by the programmer
table[ hash( key.hashCode( ) ) ]
anObject --hashCode( )--> int --hash( ), your choice--> array index
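One way to write the "your choice" step is Math.floorMod, a substitute for the Math.abs(h) % n idiom. The substitution matters for one input: Math.abs(Integer.MIN_VALUE) overflows and stays negative, so Math.abs(h) % n can produce a negative index. IndexFromHashCode is a hypothetical class name.

```java
// Hypothetical helper class for illustration.
class IndexFromHashCode {
    // Maps any object to an index in 0..tableLength-1.
    // Math.floorMod is never negative for a positive divisor.
    static int hash(Object key, int tableLength) {
        return Math.floorMod(key.hashCode(), tableLength);
    }

    public static void main(String[] args) {
        System.out.println(Math.abs(Integer.MIN_VALUE) % 1000);     // prints -648
        System.out.println(Math.floorMod(Integer.MIN_VALUE, 1000)); // prints 352
        System.out.println(hash("employee", 1000));                 // some index in 0..999
    }
}
```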
17. Not using Java?
- If the given language does not have a method such as hashCode( ), a replacement method must be implemented
- No problem if the key is already an integer
- Otherwise, use the data in a non-integer key to obtain an integer in some other way
- Perhaps use the integer ASCII codes of a character string to build an integer reflecting the differences in Strings
- Any data can be viewed as a bit string in assembly language if necessary
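A common way to turn character codes into an integer is to treat them as digits of a polynomial: $h = c_0 \cdot 31^{n-1} + c_1 \cdot 31^{n-2} + \cdots + c_{n-1}$, evaluated with Horner's rule. The multiplier 31 is one choice among many. It is the formula documented for java.lang.String.hashCode, which the test compares against. Weighting each character by its position makes the order of characters matter. StringToInt is a hypothetical class name.

```java
// Hypothetical helper class for illustration.
class StringToInt {
    // Polynomial hash of the character codes, evaluated by Horner's rule.
    // int overflow simply wraps around, which is acceptable for hashing.
    static int stringToInt(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(stringToInt("abc")); // prints 96354
        System.out.println(stringToInt("cab")); // a different value: order matters
    }
}
```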
18. Constructing hash( ) methods
- Assume that the key has already been converted to an int using hashCode( ) or some other method
- The hash( ) method used to map the int to a valid array index should . . .
- Be efficient to compute, in O(1) time
- Distribute the keys evenly throughout the array
- Use all key information
- Break up natural clusters of keys
19. Constructing hash( ) methods
- A very good hash method is known as division
hash( key ) = Math.abs( key ) % table.length
- This method satisfies the first three criteria for a good hash function
- However, it does not break up natural clusters of keys
- Nearby keys keep their relative positions except when one key wraps around and the other does not
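The clustering weakness is easy to see with consecutive keys. This sketch uses Math.floorMod in place of Math.abs(key) % table.length. The two agree for non-negative keys, and floorMod also stays valid for Integer.MIN_VALUE. DivisionHash and the table length 101 are hypothetical choices for illustration.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration.
class DivisionHash {
    // Division method: index = key mod tableLength (never negative).
    static int hash(int key, int tableLength) {
        return Math.floorMod(key, tableLength);
    }

    public static void main(String[] args) {
        int tableLength = 101;
        int[] keys = {5000, 5001, 5002, 5003};   // a natural cluster of keys
        int[] indexes = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            indexes[i] = hash(keys[i], tableLength);
        }
        // Consecutive keys land in consecutive slots: the cluster is preserved.
        System.out.println(Arrays.toString(indexes)); // [51, 52, 53, 54]
    }
}
```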
20. Constructing hash( ) methods
- Another hash method is multiplication
Let $M = (\sqrt{5} - 1)/2 \approx 0.6180339887$ and $\text{hash}(key) = (\text{int})\,\big(\text{arrayCapacity} \times \text{frac}(M \cdot key)\big)$, where $\text{frac}(x)$ is the fractional part of $x$
- Still another is called mid-square
$\text{hash}(key) =$ some middle digits or bits extracted from $key^2$
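Both formulas can be sketched in a few lines. The mid-square variant here extracts r middle decimal digits, so it assumes a table capacity of $10^r$. A bit-extraction variant would work the same way. MultiplicativeAndMidSquare, multiplicationHash, and midSquareDigits are hypothetical names.

```java
// Hypothetical helper class for illustration.
class MultiplicativeAndMidSquare {
    static final double M = (Math.sqrt(5.0) - 1.0) / 2.0;  // about 0.6180339887

    // Multiplication method: capacity * fractional part of (M * key).
    // product - floor(product) lies in [0, 1) even for negative keys.
    static int multiplicationHash(int key, int capacity) {
        double product = M * key;
        double fraction = product - Math.floor(product);
        return (int) (capacity * fraction);
    }

    // Mid-square method (decimal variant): square the key and keep the
    // r middle digits, giving an index in 0..10^r - 1.
    static int midSquareDigits(int key, int r) {
        String sq = Long.toString((long) key * key);   // a long avoids overflow
        if (sq.length() <= r) {
            return Integer.parseInt(sq);               // too short: use all digits
        }
        int start = (sq.length() - r) / 2;
        return Integer.parseInt(sq.substring(start, start + r));
    }

    public static void main(String[] args) {
        System.out.println(multiplicationHash(123, 1000)); // an index in 0..999
        System.out.println(midSquareDigits(1234, 3));      // 1234^2 = 1522756, prints 227
    }
}
```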
21. The Table class
- This is a class for a key-value table ADT
- Instead of defining a Pair class and having an array of Pair objects, we will use parallel arrays for keys, data, and hasBeenUsed
- State
private int manyItems;
private Object[] keys;
private Object[] data;
private boolean[] hasBeenUsed;
22. The Table class
- Behavior
- Table( capacity )
- Inefficient to change the capacity dynamically
- size( )
- capacity( )
- put( key, value )
- containsKey( key )
- get( key )
- remove( key )
23. The ADT invariant of the Table class
- The ADT invariant of the Table class
- The number of elements in the table is in the instance variable manyItems.
- The preferred location for an element with a given key is at index hash( key ). If a collision occurs, then a circular array search is performed in the forward direction to find the next open position. When an open position is found at index i, then the element itself is placed in data[i] and the element's key is placed in keys[i].
- An index i that is not currently used has data[i] and keys[i] set to null.
- If an index i has been used at some point (now or in the past), then hasBeenUsed[i] is true; otherwise it is false.
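24. The Table class
- Private helper methods
- hash( key )
- nextIndex( index )
- findIndex( key )
private int hash( Object key )
{
    // Remainder first: Math.abs( key.hashCode( ) ) is negative for Integer.MIN_VALUE
    return Math.abs( key.hashCode( ) % data.length );
}

private int nextIndex( int index )
{
    // Circular increment: wrap from the last index back to 0
    if ( index + 1 == data.length )
        return 0;
    else
        return index + 1;
}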
25. The Table class
// Returns the index of key in the table, or -1 if key is not present
private int findIndex( Object key )
{
    int count = 0;
    int i = hash( key );
    while ( ( count < data.length ) && hasBeenUsed[i] )
    {
        if ( key.equals( keys[i] ) )
            return i;
        count++;
        i = nextIndex( i );
    }
    return -1;
}
- Note the variable count is needed when the key is not in the table and every position has been used
- The search will terminate after every cell has been examined
26. The Table class
public Object get( Object key )
{
    int index = findIndex( key );
    if ( index == -1 )
        return null;
    else
        return data[index];
}
- If the search for key fails, the method returns null
- Otherwise, it returns the data associated with the key
27. The Table class
public Object put( Object key, Object element )
{
    int index = findIndex( key );
    Object answer;

    if ( index != -1 )
    {   // The key is already in the table.
        answer = data[index];
        data[index] = element;
        return answer;
    }
    else if ( manyItems < data.length )
    {   // The key is not yet in this Table.
        index = hash( key );
        while ( keys[index] != null )
            index = nextIndex( index );
        keys[index] = key;
        data[index] = element;
        hasBeenUsed[index] = true;
        manyItems++;
        return null;
    }
    else
    {   // The table is full.
        throw new IllegalStateException( "Table is full." );
    }
}
28. The Table class
public Object remove( Object key )
{
    int index = findIndex( key );
    Object answer = null;
    if ( index != -1 )
    {
        answer = data[index];
        keys[index] = null;
        data[index] = null;   // hasBeenUsed[index] stays true
        manyItems--;
    }
    return answer;
}
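For reference, here is one way the pieces above fit into a single compilable class. It is a sketch, not the text's implementation. It assumes non-null keys and elements, so that get can use null to mean "absent". It substitutes Math.floorMod in hash and adds a simple constructor check. The constructor, size, capacity, and containsKey bodies are filled in from the behavior list and are assumptions.

```java
// Hypothetical assembly of the slide methods into one class (illustration only).
class Table {
    private int manyItems;
    private final Object[] keys;
    private final Object[] data;
    private final boolean[] hasBeenUsed;

    public Table(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("Capacity must be positive.");
        }
        keys = new Object[capacity];
        data = new Object[capacity];
        hasBeenUsed = new boolean[capacity];
    }

    public int size() { return manyItems; }

    public int capacity() { return data.length; }

    public boolean containsKey(Object key) { return findIndex(key) != -1; }

    public Object get(Object key) {
        int index = findIndex(key);
        return (index == -1) ? null : data[index];
    }

    public Object put(Object key, Object element) {
        if (key == null || element == null) {
            throw new NullPointerException("key and element must be non-null");
        }
        int index = findIndex(key);
        if (index != -1) {                 // key already present: replace value
            Object answer = data[index];
            data[index] = element;
            return answer;
        }
        if (manyItems == data.length) {
            throw new IllegalStateException("Table is full.");
        }
        index = hash(key);
        while (keys[index] != null) {      // linear probing to an open slot
            index = nextIndex(index);
        }
        keys[index] = key;
        data[index] = element;
        hasBeenUsed[index] = true;
        manyItems++;
        return null;
    }

    public Object remove(Object key) {
        int index = findIndex(key);
        if (index == -1) {
            return null;
        }
        Object answer = data[index];
        keys[index] = null;                // hasBeenUsed[index] stays true,
        data[index] = null;                // so later searches probe past it
        manyItems--;
        return answer;
    }

    private int hash(Object key) {
        return Math.floorMod(key.hashCode(), data.length);
    }

    private int nextIndex(int index) {
        return (index + 1 == data.length) ? 0 : index + 1;
    }

    private int findIndex(Object key) {
        int count = 0;
        int i = hash(key);
        while (count < data.length && hasBeenUsed[i]) {
            if (key.equals(keys[i])) {
                return i;
            }
            count++;
            i = nextIndex(i);
        }
        return -1;
    }
}
```

With Integer keys in a Table of capacity 5, keys 1 and 6 collide at index 1. After removing 1, a search for 6 still succeeds because index 1 keeps hasBeenUsed set, which is exactly the deletion scenario from the open-address example.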
29. Double hashing
- Linear probing used with open-address hashing makes clustering worse
- The double hashing technique is similar to open-address hashing but reduces clustering
- The double hashing technique chooses a second hashing function hash2( key )
- Example
Suppose hash( key ) = 711 and hash2( key ) = 111
Linear probing sequence: 711, 712, 713, . . .
Double hashing sequence: 711, 822, 933, . . .
30. Double hashing
- For double hashing, the sequence of predetermined alternate locations to use in the event of a collision is defined as follows
- Note that the increment hash2( key ) is usually different for different keys
- For linear probing it was the same (i.e., 1) for all keys
Let $L_0 = \text{hash}(key)$. If $L_0$ is occupied, use a series of alternate locations $L_1, L_2, L_3, \ldots$
Alternate $L_{p+1}$ is defined by $L_{p+1} = (L_p + \text{hash2}(key)) \bmod \text{table.length}$
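The double-hashing recurrence differs from linear probing only in the step size. This sketch generates the sequence for given values of hash(key) and hash2(key); the table length of 1000 used for the slide-29 example is an assumption, since that slide does not state one. DoubleHashingProbes is a hypothetical class name.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration.
class DoubleHashingProbes {
    // First `count` locations of L0 = h, L(p+1) = (Lp + h2) % tableLength,
    // where h = hash(key) and h2 = hash2(key).
    static int[] probeSequence(int h, int h2, int tableLength, int count) {
        int[] seq = new int[count];
        int loc = h;
        for (int p = 0; p < count; p++) {
            seq[p] = loc;
            loc = (loc + h2) % tableLength;
        }
        return seq;
    }

    public static void main(String[] args) {
        // hash(key) = 711, hash2(key) = 111, assuming a table of length 1000.
        System.out.println(Arrays.toString(probeSequence(711, 111, 1000, 4))); // [711, 822, 933, 44]
        // Linear probing is the special case hash2(key) = 1.
        System.out.println(Arrays.toString(probeSequence(711, 1, 1000, 3)));   // [711, 712, 713]
    }
}
```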
31. Double hashing
- There is a problem with double hashing
- If hash2( key ) evenly divides the table size, many locations are never probed
- More generally, the problem occurs whenever hash2( key ) and the table size share a common factor
- Example
Suppose the array size is 1000 and hash2( key ) = 100. Suppose $L_0 = 327$. Then the sequence of probes examines only the locations 327, 427, 527, 627, 727, 827, 927, 27, 127, and 227
- The solution to this dilemma is to choose an array size that is a prime number
32. Double hashing
- Example of choosing the array size to be a prime
- Try this at home with your favorite prime number, any value for hash( key ), and any value of hash2( key ) from 1 to the array size - 1
Suppose the array size is 11 (prime) and hash2( key ) = 4. Suppose $L_0 = 6$. Then the sequence of probes examines the locations 6, 10, 3, 7, 0, 4, 8, 1, 5, 9, 2. This covers the entire array.
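The "try this at home" experiment can be automated. This sketch counts how many distinct locations the probe sequence visits before it repeats. With array size 1000 and step 100 it finds only 10 locations, while with the prime size 11 and step 4 it finds all 11. In general the count is the array size divided by gcd(step, size). ProbeCoverage and distinctProbes are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class for illustration.
class ProbeCoverage {
    // Counts the distinct locations visited by L(p+1) = (Lp + step) % size
    // starting from `start`, stopping at the first repeated location.
    static int distinctProbes(int start, int step, int size) {
        Set<Integer> seen = new HashSet<>();
        int loc = start;
        while (seen.add(loc)) {            // add returns false once a location repeats
            loc = (loc + step) % size;
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctProbes(327, 100, 1000)); // prints 10
        System.out.println(distinctProbes(6, 4, 11));       // prints 11 (whole array)
    }
}
```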
33. Double hashing
- The following are good choices for hash( key ) and hash2( key )
- Both use Java's hashCode( ) and the division method
- Remember, the value of data.length must be prime
hash( key ) = Math.abs( key.hashCode( ) % data.length )
hash2( key ) = 1 + Math.abs( key.hashCode( ) % ( data.length - 2 ) )
- Note that the value of hash2( key ) is such that $1 \le \text{hash2}(key) \le \text{data.length} - 2$
- The value of hash2( key ) cannot be 0 or data.length
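A sketch of the two functions. It uses Math.floorMod so that a hashCode of Integer.MIN_VALUE cannot yield a negative result, and it assumes the table length is a prime greater than 2 (1009 in main is one such prime). DoubleHashFunctions is a hypothetical class name.

```java
// Hypothetical helper class for illustration.
class DoubleHashFunctions {
    // hash(key) = hashCode mod tableLength, in 0..tableLength-1.
    static int hash(Object key, int tableLength) {
        return Math.floorMod(key.hashCode(), tableLength);
    }

    // hash2(key) = 1 + hashCode mod (tableLength - 2), in 1..tableLength-2,
    // so the step is never 0 and (for a prime length) shares no factor with it.
    static int hash2(Object key, int tableLength) {
        return 1 + Math.floorMod(key.hashCode(), tableLength - 2);
    }

    public static void main(String[] args) {
        int tableLength = 1009;   // a prime
        String key = "123-45-6789";
        System.out.println(hash(key, tableLength) + " " + hash2(key, tableLength));
    }
}
```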
34. Chained hashing
- Chained hashing uses linked lists
- Define a Node class with instance variables for
- the key
- the value
- a Node pointer
- Start with a Node array of any size
- Each array component is interpreted to be the head of a linked list of all key-value pairs that collide at that position
Node[] table = new Node[size];
35. Chained hashing
[Figure: a 7-slot table (positions 0 through 6), each slot heading a linked list of nodes]
- Three keys collide at position 2
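A minimal chained-hashing sketch, assuming non-null keys and using Math.floorMod for the index. New nodes are inserted at the head of their chain. ChainedTable and its nested Node class are hypothetical names, not the text's classes. With a 7-slot table and Integer keys 2, 9, and 16, all three keys collide at position 2, as in the figure.

```java
// Hypothetical helper class for illustration.
class ChainedTable {
    // One node of a chain: a key-value pair plus a link to the next node.
    private static class Node {
        final Object key;
        Object value;
        Node link;

        Node(Object key, Object value, Node link) {
            this.key = key;
            this.value = value;
            this.link = link;
        }
    }

    private final Node[] table;   // table[i] is the head of chain i, or null
    private int manyItems;

    ChainedTable(int size) {
        table = new Node[size];
    }

    private int hash(Object key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    // Returns the previous value for key, or null if key was new.
    Object put(Object key, Object value) {
        int i = hash(key);
        for (Node n = table[i]; n != null; n = n.link) {
            if (n.key.equals(key)) {
                Object old = n.value;
                n.value = value;
                return old;
            }
        }
        table[i] = new Node(key, value, table[i]);   // insert at head of chain
        manyItems++;
        return null;
    }

    Object get(Object key) {
        for (Node n = table[hash(key)]; n != null; n = n.link) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    // Unlinks the node for key; no hasBeenUsed marker is needed with chaining.
    Object remove(Object key) {
        int i = hash(key);
        Node prev = null;
        for (Node n = table[i]; n != null; prev = n, n = n.link) {
            if (n.key.equals(key)) {
                if (prev == null) {
                    table[i] = n.link;
                } else {
                    prev.link = n.link;
                }
                manyItems--;
                return n.value;
            }
        }
        return null;
    }

    int size() {
        return manyItems;
    }
}
```

Because each slot holds a whole list, the table never becomes "full" and removal simply unlinks a node, in contrast to the open-address Table class.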
36. Predefined Java classes
- Java has two commonly used predefined classes for hashing
- java.util.Hashtable
- java.util.HashMap
- Both use chained hashing: each array slot holds the entries whose keys hash to that slot
- See the text for details
- Appendix D, pages 764-765
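A short usage sketch of java.util.HashMap through the standard java.util.Map interface. Its put, get, containsKey, remove, and size operations correspond to the Table behaviors listed earlier. HashMapDemo and buildDirectory are hypothetical names, and the keys are made-up social security numbers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration.
class HashMapDemo {
    static Map<String, String> buildDirectory() {
        Map<String, String> directory = new HashMap<>();
        directory.put("123-45-6789", "Alice");   // put(key, value), as in Table
        directory.put("987-65-4321", "Bob");
        return directory;
    }

    public static void main(String[] args) {
        Map<String, String> directory = buildDirectory();
        System.out.println(directory.get("123-45-6789"));         // prints Alice
        System.out.println(directory.containsKey("000-00-0000")); // prints false
        System.out.println(directory.remove("987-65-4321"));      // prints Bob
        System.out.println(directory.size());                     // prints 1
    }
}
```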
37. Analysis of hashing
- We consider the result of an analysis of the three hash methods in the case of a successful search
- A statistically uniform hash function is assumed
- It is also assumed that no removals have taken place
- The analysis gives the average number of probes needed in a successful search as a function of the loading factor
Definition: the hashing loading factor is $\alpha = \frac{\text{number of keys stored}}{\text{array size}}$
38. Analysis of hashing
- The following table gives
- The average number of probes needed for each hash technique as a function of the loading factor
- Some representative values for various loading factors
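The widely cited approximations for the average number of probes in a successful search under uniform hashing with no removals can be evaluated directly: $\frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$ for linear probing, $\frac{-\ln(1-\alpha)}{\alpha}$ for double hashing, and $1 + \frac{\alpha}{2}$ for chained hashing. This sketch prints them for a few loading factors; it assumes these standard formulas and is not a reproduction of the slide's table. ProbeEstimates is a hypothetical class name.

```java
// Hypothetical helper class for illustration.
class ProbeEstimates {
    // Linear probing; requires 0 <= alpha < 1.
    static double linearProbing(double alpha) {
        return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
    }

    // Double hashing; requires 0 < alpha < 1.
    static double doubleHashing(double alpha) {
        return -Math.log(1.0 - alpha) / alpha;
    }

    // Chained hashing; alpha may exceed 1 because chains can grow.
    static double chainedHashing(double alpha) {
        return 1.0 + alpha / 2.0;
    }

    public static void main(String[] args) {
        double[] alphas = {0.5, 0.6, 0.7, 0.8, 0.9};
        System.out.println("alpha  linear  double  chained");
        for (double a : alphas) {
            System.out.printf("%.1f    %.2f    %.2f    %.2f%n",
                a, linearProbing(a), doubleHashing(a), chainedHashing(a));
        }
    }
}
```

The estimates for linear probing grow fastest as $\alpha$ approaches 1, which is consistent with the earlier point that linear probing makes clustering worse.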