Hashing Part I - PowerPoint PPT Presentation

Slides: 21
Provided by: Kristofer6


1
Hashing Part I
  • CS 367 Introduction to Data Structures

2
Searching
  • Up to now the only way to find a key is to search
    through all or part of the data
  • linked list O(n)
  • AVL tree O(log n)
  • binary search of array O(log n)
  • If lots of data and/or searching the data very
    often, these times can be long
  • given the key, would like to get the data directly

3
Hashing
  • The solution to this problem is to put the key
    through a function that says exactly where the
    data is (or where it should be placed)
  • this function is called a hash function
  • h(key) = integer
  • the integer obtained from a hash function can be
    used as an index into an array
  • if the hash function is perfect (always
    generates a unique integer for different keys),
    the time to place and access data is O(1)

4
Hashing
[Figure: keys A, M, and X are fed to a hashing function, which maps each key to a slot in an array indexed 0 to 11]

5
Hashing Functions
  • So what is the hashing function?
  • the simplest hashing function is to use the
    division remainder
  • assume the array is 1000 elements in size
  • translate the data into a number, n
  • h(n) = n % 1000

6
Hashing Functions
  • simple example
  • consider a small school
  • each student is tracked by a 4 digit ID number
  • each student's ID begins with the year they
    started
  • 2000 -> 0, 2001 -> 1, 2002 -> 2, etc.
  • all student records are stored in an array
  • maximum of 1000 students per year
  • let's look at records for all sophomores
  • assume they were freshmen in 2001

7
Hashing Functions
To find John's record in the array: 1009 % 1000 = 9.
Go to index number 9.
Mary's ID 1000, Pete's ID 1004, John's ID 1009, Amy's ID 1011
[Figure: array indexed 0 to 11; Mary's records at index 0, Pete's records at index 4, John's records at index 9, Amy's records at index 11]

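The lookup above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name StudentTable, its methods, and the use of plain strings as records are hypothetical and do not come from the slides.

```java
// Illustrative sketch of the student-ID example (names are hypothetical).
class StudentTable {
    static final int TABLE_SIZE = 1000; // maximum of 1000 students per year
    private final String[] records = new String[TABLE_SIZE];

    // Division-remainder hash: keeps the last three digits of the 4-digit ID.
    static int hash(int id) {
        return id % TABLE_SIZE;
    }

    // Stores a record directly at the hashed index.
    void put(int id, String record) {
        records[hash(id)] = record;
    }

    // Retrieves a record with one array access, no searching.
    String get(int id) {
        return records[hash(id)];
    }

    public static void main(String[] args) {
        StudentTable table = new StudentTable();
        table.put(1000, "Mary's records");
        table.put(1004, "Pete's records");
        table.put(1009, "John's records");
        table.put(1011, "Amy's records");
        System.out.println(hash(1009));      // 9
        System.out.println(table.get(1009)); // John's records
    }
}
```
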
8
Generating n
  • The previous example is rather simplistic in that
    it is hashing already unique integers
  • seems kind of pointless
  • maybe not if the integers are large
  • consider the UW's 10-digit ID numbers
  • Often it is desirable to hash some other kind of
    data
  • a person's name, for example

9
Generating n
  • How is a string converted into an integer?
  • the simplest method is to add all of the ASCII
    values for each character together
  • example
  • convert "amy" into an integer
  • a = 97, m = 109, y = 121
  • a + m + y = 327
  • there are lots of other ways to convert strings
    to integers
  • what are a few of them?
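
The character-sum conversion can be sketched as below. The class and method names (AsciiSum, toInt) are made up for this illustration, and the sketch assumes plain ASCII keys, so each char value equals its ASCII code.

```java
// Illustrative sketch: convert a string to an integer by summing character codes.
class AsciiSum {
    // Adds the code of every character in the key.
    static int toInt(String key) {
        int n = 0;
        for (int i = 0; i < key.length(); i++) {
            n += key.charAt(i);
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(toInt("amy")); // 97 + 109 + 121 = 327
        System.out.println(toInt("may")); // 327 as well: same letters, different order
    }
}
```

Because addition ignores order, any rearrangement of the same letters produces the same integer, which is one reason to look at other conversions.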

10
Hashing Functions
  • There are millions of possible hashing functions
  • we will not be considering them all
  • basically, anything you can think of to generate
    an integer could be used as a hashing function
  • Mathematicians have spent lots of time and effort
    to come up with some basic methods that work
    pretty well

11
Division
  • We have already seen the division method
  • it involves taking the remainder of division
  • h(key) = key % tableSize
  • A few notes about making this work better
  • table size should be a prime number
  • usually a good method if nothing or very little is
    known about the keys
  • the remaining methods will all use division as
    the final step in their calculation
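
A minimal sketch of the division method follows. The slide does not say why a prime table size helps; one common illustration, assumed here, is a set of keys that share a factor with the table size. The class name DivisionHash and the use of Math.floorMod (to keep the index non-negative for negative keys) are choices made for this sketch, not part of the slides.

```java
// Illustrative sketch of the division method, h(key) = key % tableSize.
class DivisionHash {
    // Math.floorMod keeps the result in [0, tableSize) even for negative keys.
    static int hash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        int[] keys = {100, 200, 300, 400};
        for (int key : keys) {
            // With table size 100 every key lands on index 0;
            // with the prime 101 the indices are 100, 99, 98, 97.
            System.out.println(key + " -> " + hash(key, 100) + " vs " + hash(key, 101));
        }
    }
}
```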

12
Folding
  • Separate the key into various equally sized parts
    and then recombine them
  • usually with addition
  • Two kinds of folding
  • shift folding
  • just add the various parts together as they are
  • boundary folding
  • reverse the order of every other part and add
    them together

13
Folding
  • Consider an SSN as a key
  • break it into 3 parts
  • first 3, second 3, last 3
  • Shift folding example
  • SSN 123-45-6789
  • first 123 second 456 third 789
  • h(key) = (first + second + third) % size
  • h(SSN) = 1368 % tableSize
  • Boundary folding example
  • h(key) = (first + R(second) + third) % size
  • h(key) = (123 + 654 + 789) % size
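
Both folding variants can be sketched for a 9-digit key as below. The class name Folding, the helper names, and the table size 1009 used in main are assumptions made for illustration; the SSN is taken as an int with the dashes removed.

```java
// Illustrative sketch of shift folding and boundary folding on a 9-digit key.
class Folding {
    // Splits the key into three 3-digit parts: first, second, third.
    static int[] parts(int ssn) {
        return new int[] { ssn / 1_000_000, (ssn / 1_000) % 1_000, ssn % 1_000 };
    }

    // Reverses the digits of a 3-digit part, e.g. 456 -> 654.
    static int reverse3(int part) {
        return (part % 10) * 100 + ((part / 10) % 10) * 10 + part / 100;
    }

    // Shift folding: add the parts as they are.
    static int shiftFold(int ssn, int tableSize) {
        int[] p = parts(ssn);
        return (p[0] + p[1] + p[2]) % tableSize;
    }

    // Boundary folding: reverse every other part (here the second) before adding.
    static int boundaryFold(int ssn, int tableSize) {
        int[] p = parts(ssn);
        return (p[0] + reverse3(p[1]) + p[2]) % tableSize;
    }

    public static void main(String[] args) {
        int ssn = 123456789;   // 123-45-6789 without the dashes
        int tableSize = 1009;  // hypothetical prime table size
        System.out.println(shiftFold(ssn, tableSize));    // 1368 % 1009 = 359
        System.out.println(boundaryFold(ssn, tableSize)); // 1566 % 1009 = 557
    }
}
```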

14
Increasing Performance
  • Consider using shifting and exclusive ORing to
    generate the key
  • exclusive OR parts together to generate index
  • Example
  • consider the string abcdefgh
  • if each part is a letter, just exclusive OR them
  • a ^ b ^ c ^ d ^ e ^ f ^ g ^ h
  • often, a character is represented by 8 bits
  • what's the problem with this?
  • might be better to exclusive OR chunks of the
    string
  • abcd ^ efgh
  • why were four characters chosen in this case?
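
The two approaches can be compared with a small sketch. The names XorDemo, xorChars, and pack4 are hypothetical, and the sketch assumes 8-bit characters and a 32-bit int, so four characters fill one int exactly.

```java
// Illustrative sketch: XOR of single characters versus XOR of 4-character chunks.
class XorDemo {
    // XORs the key one character at a time.
    static int xorChars(String key) {
        int result = 0;
        for (int i = 0; i < key.length(); i++) {
            result ^= key.charAt(i) & 0xFF; // keep only the low 8 bits of each character
        }
        return result; // always in 0..255, however long the key is
    }

    // Packs up to four 8-bit characters, starting at 'start', into one 32-bit int.
    static int pack4(String key, int start) {
        int chunk = 0;
        for (int j = 0; j < 4 && start + j < key.length(); j++) {
            chunk = (chunk << 8) | (key.charAt(start + j) & 0xFF);
        }
        return chunk;
    }

    public static void main(String[] args) {
        String key = "abcdefgh";
        System.out.println(xorChars(key)); // 8
        int chunked = pack4(key, 0) ^ pack4(key, 4); // "abcd" ^ "efgh"
        System.out.println(Integer.toHexString(chunked)); // 404040c
    }
}
```

The per-character result never needs more than 8 bits, so it can address at most 256 slots; the chunked result spreads over all 32 bits before the final reduction to the table size.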

15
Increasing Performance
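    // XOR-folds the key in 4-byte chunks and returns an index in [0, tableSize)
    int shiftFold(String key, int tableSize) {
        int chunk = 0;
        int result = 0;
        byte[] st = key.getBytes();
        for (int i = 0; i < st.length; i += 4) {
            // pack up to four bytes into one 32-bit chunk
            for (int j = 0; (j < 4) && (j + i < st.length); j++) {
                chunk = chunk << 8;                 // make room before adding the next byte
                chunk = chunk | (st[j + i] & 0xFF); // mask to avoid sign extension
            }
            result = result ^ chunk;
            chunk = 0;
        }
        return Math.floorMod(result, tableSize);    // non-negative even if result < 0
    }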

16
Increasing Performance
  • The performance could be increased even more if
    the table size was a power of 2
  • can get rid of the modulo operation at the end
  • modulo is an expensive calculation
  • could just do a subtraction and an AND operation
    instead
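
The subtraction-and-AND replacement for modulo can be sketched as follows. The class name PowerOfTwoIndex and the power-of-two guard are additions for this illustration; the trick is only valid when the table size is a power of two.

```java
// Illustrative sketch: index = hash & (tableSize - 1) for a power-of-two table size.
class PowerOfTwoIndex {
    static int index(int hash, int tableSize) {
        // A power of two has exactly one bit set.
        if (tableSize <= 0 || Integer.bitCount(tableSize) != 1) {
            throw new IllegalArgumentException("tableSize must be a power of two");
        }
        // tableSize - 1 is a mask of low-order one bits, e.g. 1023 for 1024.
        return hash & (tableSize - 1);
    }

    public static void main(String[] args) {
        int hash = 9740641;
        System.out.println(hash % 1024);       // 353
        System.out.println(index(hash, 1024)); // 353
    }
}
```

For non-negative hash values the two expressions agree; the AND form also stays non-negative when the hash value is negative.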

17
Mid-Square Function
  • Square the number and take the middle part as the
    index
  • a string must first be converted to get the
    number to square
  • The entire key gets used to generate the address
  • less chance for conflicts
  • more on this later
  • This method works best if the table size is a
    power of two

18
Mid-Square Function
  • Table size equals 1024 (2^10)
  • The key is 3121
  • 3121^2 = 9740641 = (1001010 0101000010 1100001)_2
  • the middle 10 bits of this value are the center
    group, with 7 bits on either side
  • Index in array is
  • (0101000010)_2 = 322
  • This is all very quick and easy to calculate
    using mask and shift operations

19
Mid-Square Function
    int tableSize = 1024;
    int mask = (tableSize - 1);
    int maskBits = logBase2(tableSize);
    int shiftBits = 7;

    // table size must be a power of two
    int midSquare(String key, int tableSize) {
        int n = stringToNum(key);
        n = n * n;
        // shift the middle bits down to the bottom, then keep maskBits of them
        return (n >> shiftBits) & mask;
    }
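
A self-contained variant that takes an integer key directly makes it possible to check the worked example (key 3121, table size 1024). The class name MidSquare and the parameters indexBits and shiftBits are assumptions for this sketch; a long is used for the square so that larger keys do not overflow an int.

```java
// Illustrative sketch of the mid-square function using shift and mask operations.
class MidSquare {
    // Squares the key, drops the low shiftBits bits, and keeps the next indexBits bits.
    static int midSquare(int key, int indexBits, int shiftBits) {
        long squared = (long) key * key;   // long avoids int overflow
        long mask = (1L << indexBits) - 1; // e.g. 1023 for a 1024-slot table
        return (int) ((squared >>> shiftBits) & mask);
    }

    public static void main(String[] args) {
        // 3121^2 = 9740641; skipping the low 7 bits and keeping 10 bits gives 322.
        System.out.println(midSquare(3121, 10, 7)); // 322
    }
}
```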

20
Extraction
  • Simply pull out a certain part of the key and use
    it as the index
  • example
  • SSN 123-45-6789
  • index = middle of key = 456
  • alternative index = first, middle, and last digits = 159
  • Should try to choose a part of the key that is
    most likely unique
  • consider foreign students' SSNs
  • they start with 999
  • probably not a great idea to extract the first
    three numbers
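
The two extraction choices from the example can be sketched as below. The class and method names are hypothetical, and the key is assumed to be passed as a string of digits with the dashes already removed.

```java
// Illustrative sketch of extraction: use selected digits of the key as the index.
class Extraction {
    // Returns the three digits in the middle of the key, e.g. 456 for 123456789.
    static int middleThree(String digits) {
        int start = (digits.length() - 3) / 2;
        return Integer.parseInt(digits.substring(start, start + 3));
    }

    // Returns the first, middle, and last digits combined, e.g. 159 for 123456789.
    static int firstMiddleLast(String digits) {
        String picked = "" + digits.charAt(0)
                + digits.charAt(digits.length() / 2)
                + digits.charAt(digits.length() - 1);
        return Integer.parseInt(picked);
    }

    public static void main(String[] args) {
        String ssn = "123-45-6789".replace("-", "");
        System.out.println(middleThree(ssn));     // 456
        System.out.println(firstMiddleLast(ssn)); // 159
    }
}
```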