Title: Searching / Hashing
1. Searching / Hashing
2. Big-O of Search Algorithms
- Sequential Search - O(n)
  - unsorted list in an array (did not do this term)
  - linked list, even if sorted (gradelnklist files)
- Binary Search - O(log2 n)
  - sorted list in an array (gradelistarray files)
  - BST if reasonably balanced (tree files)
- Hashing - O(1) - constant search time!
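
To make the first two bounds concrete, here is a minimal sketch of both searches over a `std::vector<int>`. The function names and the use of a vector (rather than the course's gradelistarray files) are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential search: checks elements one at a time, O(n).
// Returns the index of target, or -1 if it is absent.
int sequentialSearch(const std::vector<int>& list, int target)
{
    for (std::size_t i = 0; i < list.size(); ++i)
        if (list[i] == target)
            return static_cast<int>(i);
    return -1;
}

// Binary search: halves the remaining range at every step, O(log2 n).
// Precondition: list is sorted in ascending order.
int binarySearch(const std::vector<int>& list, int target)
{
    assert(std::is_sorted(list.begin(), list.end()));
    int low = 0;
    int high = static_cast<int>(list.size()) - 1;
    while (low <= high)
    {
        int mid = low + (high - low) / 2;
        if (list[mid] == target)
            return mid;
        if (list[mid] < target)
            low = mid + 1;      // target can only be in the upper half
        else
            high = mid - 1;     // target can only be in the lower half
    }
    return -1;
}
```

On a sorted list of a million items, the sequential version may need up to a million comparisons, while the binary version needs about twenty.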
3. Hashing Fundamentals
- Records (structs) are stored in an array
- Records are not sorted on a particular key
- Hash function calculates the position in the array in which a record is stored, based on the key
- Ideally, hash function should be one-to-one, i.e., two different keys should not "hash" to the same position
4. Hashing Fundamentals
- To add an item to a hash table, use the hash function to calculate its position and store it directly there
- To locate (search for) an item in a hash table, use the hash function to calculate its position and look for it directly there
- Unused positions in the hash table need to have a default "empty" value stored
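
The add and search steps above can be sketched as follows, assuming a one-to-one hash function so that no collisions occur. The names `Record`, `TABLE_SIZE`, `EMPTY_KEY`, `makeTable`, `add`, and `find` are hypothetical, as is the choice of -1 as the "empty" value.

```cpp
#include <cassert>
#include <string>
#include <vector>

const int TABLE_SIZE = 100;   // hypothetical small table
const long EMPTY_KEY = -1;    // assumed default "empty" value for unused positions

struct Record
{
    long key = EMPTY_KEY;     // every new slot starts out empty
    std::string name;
};

// Hypothetical one-to-one hash: valid only for keys 0..TABLE_SIZE-1.
int h(long key)
{
    assert(key >= 0 && key < TABLE_SIZE);
    return static_cast<int>(key);
}

// Builds a table in which every position holds the "empty" value.
std::vector<Record> makeTable()
{
    return std::vector<Record>(TABLE_SIZE);
}

// Add: calculate the position and store the record directly there.
void add(std::vector<Record>& table, const Record& rec)
{
    table[h(rec.key)] = rec;
}

// Search: calculate the position and look directly there.
// Returns a pointer to the record, or nullptr if that slot is empty.
const Record* find(const std::vector<Record>& table, long key)
{
    const Record& slot = table[h(key)];
    return slot.key == key ? &slot : nullptr;
}
```

Neither `add` nor `find` contains a loop, which is where the O(1) search time comes from.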
5. Example 1: Student Records with SSN as Key
Hash function: h(ssn) = ssn

    const int MAXSTUDENTS = 1000000000;

    struct StudentType
    {
        long ssn;
        string lastname;
        string firstname;
        char midinit;
        float gpa;
    };

    StudentType students[MAXSTUDENTS];
6. Example 1
- very simple hash function
- hash function is one-to-one
- a LOT of wasted space
- this example wastes 99.9999% of array positions
7. Example 2: Student Records with SSN as Key
Hash function: h(ssn) = ssn % 10000

    const int MAXSTUDENTS = 10000;

    struct StudentType
    {
        long ssn;
        string lastname;
        string firstname;
        char midinit;
        float gpa;
    };

    StudentType students[MAXSTUDENTS];
8. Example 2
- still a relatively simple hash function
- still some wasted space, but not as much (only wasting 90% of array positions)
- hash function is no longer guaranteed to be one-to-one
- no longer guaranteed O(1) searching
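
The wasted-space figures and the loss of the one-to-one property can be checked with a short sketch. The count of 1,000 stored students is an assumption: it is not stated on the slides, but it is the value consistent with both quoted percentages. The helper name `wastedFraction` is also hypothetical.

```cpp
#include <cassert>

// Fraction of array positions left unused when `records` records
// are stored in a table with `tableSize` positions.
double wastedFraction(long records, long tableSize)
{
    assert(tableSize > 0 && records >= 0 && records <= tableSize);
    return 1.0 - static_cast<double>(records) / tableSize;
}

// Example 2's hash function: keeps only the last four digits of the SSN.
int h(long ssn)
{
    return static_cast<int>(ssn % 10000);
}
```

Under that assumption, `wastedFraction(1000, 1000000000)` is about 0.999999 and `wastedFraction(1000, 10000)` is 0.9. The price of the smaller table is that any two SSNs ending in the same four digits, such as 123456321 and 987656321, both hash to position 6321.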
9. Collisions
- A collision occurs when two keys hash to the same value
- As seen in example 1, a perfect hash function can waste a lot of space, but ...
- ... reducing the wasted space can introduce the possibility of collisions!
- Want to find optimal array size and hash function to minimize wasted space and minimize collisions
10. Ways to Handle Collisions: Linear Probing
- To insert a record
  - Start by calculating the hash value
  - Starting at that position, do sequential search for an empty spot
  - Store record in empty spot

    indx = h(insertssn);
    while (students[indx].ssn != empty value)
        indx = (indx + 1) % MAXSTUDENTS;
    students[indx] = newstudentrecord;
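
The insertion steps can be written as a compilable sketch. The table size of 10, the `EMPTY_SSN` marker, the trimmed-down `StudentType`, and the function name `insertStudent` are assumptions for illustration. The probe counter is an addition not in the slide pseudocode: it stops the loop if the table is full, where the pseudocode would loop forever.

```cpp
#include <cassert>
#include <string>
#include <vector>

const int MAXSTUDENTS = 10;   // assumed small size so collisions are easy to see
const long EMPTY_SSN = -1;    // hypothetical "empty" marker

struct StudentType
{
    long ssn = EMPTY_SSN;     // a new slot starts out empty
    std::string lastname;
};

int h(long ssn)
{
    return static_cast<int>(ssn % MAXSTUDENTS);
}

// Inserts newStudent using linear probing.
// Returns the index where it was stored, or -1 if the table is full.
int insertStudent(std::vector<StudentType>& students, const StudentType& newStudent)
{
    assert(static_cast<int>(students.size()) == MAXSTUDENTS);
    int indx = h(newStudent.ssn);
    for (int probes = 0; probes < MAXSTUDENTS; ++probes)
    {
        if (students[indx].ssn == EMPTY_SSN)
        {
            students[indx] = newStudent;
            return indx;
        }
        indx = (indx + 1) % MAXSTUDENTS;   // wrap around at the end of the array
    }
    return -1;   // every position was occupied
}
```

With this table size, keys 23 and 33 both hash to position 3, so the second one inserted lands in position 4. A key that hashes to position 9 when 9 is taken wraps around to position 0.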
11. Ways to Handle Collisions: Linear Probing
- To locate (search for) a record
  - Start by calculating the hash value
  - Starting at that position, do sequential search for the record
  - If an empty spot is encountered before finding record, record is not there

    indx = h(searchssn);
    while (students[indx].ssn != searchssn &&
           students[indx].ssn != empty value)
        indx = (indx + 1) % MAXSTUDENTS;
    if (students[indx].ssn == searchssn)
        found student with searchssn
    else
        no student in table with searchssn
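
A compilable sketch of the search follows. As before, the table size of 10, the `EMPTY_SSN` marker, the trimmed-down `StudentType`, and the name `findStudent` are assumptions. The probe limit is an addition not in the slide pseudocode: it guards against an endless loop when the table is full and the key is absent.

```cpp
#include <cassert>
#include <string>
#include <vector>

const int MAXSTUDENTS = 10;   // assumed small size so collisions are easy to see
const long EMPTY_SSN = -1;    // hypothetical "empty" marker

struct StudentType
{
    long ssn = EMPTY_SSN;     // a new slot starts out empty
    std::string lastname;
};

int h(long ssn)
{
    return static_cast<int>(ssn % MAXSTUDENTS);
}

// Searches for searchSsn using linear probing.
// Returns the index of the matching record, or -1 if it is not in the table.
int findStudent(const std::vector<StudentType>& students, long searchSsn)
{
    assert(static_cast<int>(students.size()) == MAXSTUDENTS);
    assert(searchSsn != EMPTY_SSN);
    int indx = h(searchSsn);
    for (int probes = 0; probes < MAXSTUDENTS; ++probes)
    {
        if (students[indx].ssn == searchSsn)
            return indx;                   // found the record
        if (students[indx].ssn == EMPTY_SSN)
            return -1;                     // empty spot reached: not in table
        indx = (indx + 1) % MAXSTUDENTS;   // keep probing, wrapping around
    }
    return -1;   // probed every position without finding the key
}
```

The search is only O(1) when the record sits at or near its hashed position. As collisions accumulate, the probe sequence grows and the cost drifts toward a sequential search.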
12. Ways to Handle Collisions: Chaining
- Have each element in the array be the head pointer to a linked list of records whose keys hash to the same value
- Slightly better than linear probing - limits the length of the sequential search required once collisions start to occur
- Requires more storage than linear probing, even if same table size is used, because of space required for pointers
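
A minimal sketch of chaining is shown below. It swaps in `std::forward_list` from the standard library in place of a hand-written linked list with explicit head pointers. The table size of 10 and the names `ChainedTable`, `makeTable`, `insertStudent`, and `findStudent` are assumptions for illustration.

```cpp
#include <cassert>
#include <forward_list>
#include <string>
#include <vector>

const int TABLE_SIZE = 10;    // assumed small size so collisions are easy to see

struct StudentType
{
    long ssn;
    std::string lastname;
};

int h(long ssn)
{
    return static_cast<int>(ssn % TABLE_SIZE);
}

// Each array element is a singly linked list of the records
// whose keys hash to that position.
using ChainedTable = std::vector<std::forward_list<StudentType>>;

ChainedTable makeTable()
{
    return ChainedTable(TABLE_SIZE);   // every chain starts out empty
}

// Insert: add the record to the front of the chain at its hashed position.
void insertStudent(ChainedTable& table, const StudentType& student)
{
    assert(student.ssn >= 0);
    table[h(student.ssn)].push_front(student);
}

// Search: sequential search, but only through the one chain that can hold the key.
// Returns a pointer to the record, or nullptr if it is not in the table.
const StudentType* findStudent(const ChainedTable& table, long ssn)
{
    assert(ssn >= 0);
    for (const StudentType& student : table[h(ssn)])
        if (student.ssn == ssn)
            return &student;
    return nullptr;
}
```

A search only ever visits records that share its hash value, whereas a linear probe can also step over records that hashed to neighboring positions.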
13. Possible Hash Functions
- Division Method
  - h(key) = key % MAXSTUDENTS
- Folding
  - break key into "pieces" and do calculations with the pieces
  - ex: h(123-45-6321) = 12 + 34 + 56 + 32 + 1 = 135
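
Both methods can be sketched in a few lines. The folding version follows one reading of the slide's example that reproduces its result of 135: split the nine-digit key into two-digit pieces from the left, with the final digit as its own piece, and add the pieces. The function names and the table size of 10,000 are assumptions.

```cpp
#include <cassert>

const int MAXSTUDENTS = 10000;   // assumed table size

// Division method: the remainder is always a valid index 0..MAXSTUDENTS-1.
int divisionHash(long key)
{
    assert(key >= 0);
    return static_cast<int>(key % MAXSTUDENTS);
}

// Folding: treat the key as nine digits, add the four leading two-digit
// pieces and the final single digit.
int foldingHash(long key)
{
    assert(key >= 0 && key <= 999999999L);
    int sum = static_cast<int>(key % 10);   // the final single digit
    key /= 10;                              // eight digits remain
    while (key > 0)
    {
        sum += static_cast<int>(key % 100); // next two-digit piece
        key /= 100;
    }
    return sum % MAXSTUDENTS;               // keep the result inside the table
}
```

With this folding scheme the sum can never exceed 99 * 4 + 9 = 405, so only the first 406 positions of the table would ever be used. That makes collisions far more likely than with the division method on the same table.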
14. For more info
- Read pages 647-662 in text
- Look at problems 29, 32, 33 (only columns for 29 and 32)
- Food for thought
  - Do you think a hash table is a good storage option for a group of records that you want to display in various sorted orders?