Title: CS 1312
- Introduction to Object Oriented Programming
- Lecture 13
- Insertion Sort, Hashing
Insertion Sort
- In CS 1311 you were introduced to two sorting techniques:
  - Insertion Sort
    - e.g., inserting into a sorted linked list
  - Merge Sort
    - Classic divide-and-conquer algorithm
    - Actually, we just made up Merge Sort. It doesn't really work, which you can prove to yourself with Java.
Insertion Sort
- Today we'll look at Insertion Sort again.
  - Not because it's efficient (it's not: O(N²) comparisons in the worst case)
  - Because it is a good example of linked list operations in conjunction with the Comparable interface
- Insertion sort is the same technique you use to arrange playing cards in your hand: pick up each new card and slide it into its place among the cards you have already sorted.
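Before moving to linked lists, the card-sorting idea can be sketched on a plain array. This is a minimal sketch assuming an int[] input; the class and method names (InsertionSortDemo, insertionSort) are hypothetical, not from the lecture.

```java
import java.util.Arrays;

// Hypothetical demo: insertion sort on an int array, the way you slide
// each new card left into an already sorted hand.
class InsertionSortDemo {
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int card = a[i];          // the "new card" just picked up
            int j = i - 1;
            // shift larger cards one place right to open a gap
            while (j >= 0 && a[j] > card) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = card;          // drop the card into the gap
        }
    }

    public static void main(String[] args) {
        int[] hand = {7, 2, 9, 4, 2};
        insertionSort(hand);
        System.out.println(Arrays.toString(hand)); // prints [2, 2, 4, 7, 9]
    }
}
```

After each pass the prefix a[0..i] is sorted; a reversed input forces every card all the way left, which is where the O(N²) worst case comes from.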
You might sort by suit...
You might sort by first name...
You might sort by age...
[Figure: four cards labeled with birthdays 12/18/1980, 6/9/1981, 11/17/2791, and 12/21/1981, and ages 20, 19, 19, and -791]
Or you could sort them in the order their birthdays occur during the year. That way you can send them a birthday card in the desperate hope that they might come back. (Okay, some of them.)
The point is that you decide how you want the data sorted.
Let's write a Date class
- Note: Java actually has a Date class (java.util.Date)
    class Date implements Comparable {
        private int month;
        private int day;
        private int year;

        public Date(int month, int day, int year) {
            setMonth(month);
            setDay(day);
            setYear(year);
        }

        public String toString() {
            return "" + month + "/" + day + "/" + year;
        }

        // Packs a date into one int, yyyymmdd (e.g., 12/18/1980 -> 19801218),
        // so comparing the ints compares the dates chronologically
        public static int composDate(Date date) {
            return date.year * 10000
                 + date.month * 100
                 + date.day;
        }
        // class Date (continued)
        public void setMonth(int month) {
            this.month = month;
        }

        public int getMonth() {
            return month;
        }

        public void setDay(int day) {
            this.day = day;
        }

        public int getDay() {
            return day;
        }

        public void setYear(int year) {
            this.year = year;
        }

        public int getYear() {
            return year;
        }
        // class Date (continued)
        public int compareTo(Object o) {
            int retval = 0;
            Date d = (Date) o;
            int thisOne = composDate(this);
            int otherOne = composDate(d);
            if (thisOne > otherOne) {
                retval = 1;
            }
            else if (thisOne < otherOne) {
                retval = -1;
            }
            else {
                retval = 0;
            }
            return retval;
        }
    } // class Date
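Since Java 5, Comparable is generic, so compareTo can take a Date directly and the cast disappears. Below is a minimal sketch of the same yyyymmdd idea under that assumption; the class name DateG is hypothetical (chosen to avoid clashing with the lecture's Date), and Integer.compare is the standard library method (Java 7+).

```java
// Hypothetical generic variant of the lecture's Date class.
class DateG implements Comparable<DateG> {
    private final int month, day, year;

    DateG(int month, int day, int year) {
        this.month = month;
        this.day = day;
        this.year = year;
    }

    // Same encoding as composDate: 12/18/1980 becomes 19801218,
    // so comparing the ints compares the dates chronologically.
    static int composDate(DateG d) {
        return d.year * 10000 + d.month * 100 + d.day;
    }

    @Override
    public int compareTo(DateG other) {
        // No cast needed: the compiler guarantees other is a DateG.
        return Integer.compare(composDate(this), composDate(other));
    }

    @Override
    public String toString() {
        return month + "/" + day + "/" + year;
    }

    public static void main(String[] args) {
        DateG a = new DateG(12, 18, 1980);
        DateG b = new DateG(6, 9, 1981);
        System.out.println(composDate(a));   // prints 19801218
        System.out.println(a.compareTo(b));  // prints -1
    }
}
```

The encoding only works for non-negative years small enough that year * 10000 stays within the int range, which easily covers 2791.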
Questions?
Let's create a Girlfriend card class
    class Girlfriend implements Comparable {
        private String name;
        private Date birthday;

        public Girlfriend(String name, int month, int date, int year) {
            this(name, new Date(month, date, year));
        }

        public Girlfriend(String name, Date birthday) {
            setName(name);
            setBirthday(birthday);
        }

        public void setName(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }
        // class Girlfriend (continued)
        public void setBirthday(Date birthday) {
            this.birthday = birthday;
        }

        public Date getBirthday() {
            return birthday;
        }

        public String toString() {
            return "Girlfriend " + name + " Birthday " + birthday;
        }

        // Month and day only (e.g., 12/21 -> 1221): the year is ignored,
        // so cards sort in the order the birthdays occur during the year
        public static int composMonDay(Date date) {
            return date.getMonth() * 100 + date.getDay();
        }
        // class Girlfriend (continued)
        public int compareTo(Object o) {
            int retval;
            Girlfriend gf = (Girlfriend) o;
            int thisOne = composMonDay(this.getBirthday());
            int otherOne = composMonDay(gf.getBirthday());
            if (thisOne < otherOne) {
                retval = -1;
            }
            else if (thisOne > otherOne) {
                retval = 1;
            }
            else {
                retval = 0;
            }
            return retval;
        }
    } // class Girlfriend
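The point made earlier, that you decide how you want the data sorted, can also be expressed without touching compareTo at all: pass a Comparator to a library sort. This is a minimal sketch assuming Java 8+ (List.sort, Comparator.comparing, Comparator.comparingInt); Card and CardSorts are hypothetical stand-ins for Girlfriend that hold the month, day and year directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for Girlfriend: a name plus a birthday.
class Card {
    final String name;
    final int month, day, year;

    Card(String name, int month, int day, int year) {
        this.name = name;
        this.month = month;
        this.day = day;
        this.year = year;
    }

    int monDay()   { return month * 100 + day; }                 // like composMonDay
    int fullDate() { return year * 10000 + month * 100 + day; }  // like composDate
}

class CardSorts {
    static List<String> names(List<Card> cards) {
        List<String> out = new ArrayList<>();
        for (Card c : cards) out.add(c.name);
        return out;
    }

    public static void main(String[] args) {
        List<Card> hand = new ArrayList<>(Arrays.asList(
            new Card("Brittany", 12, 21, 1981),
            new Card("Natalie", 6, 9, 1981),
            new Card("Christina", 12, 18, 1980),
            new Card("Chewie", 11, 17, 2791)));

        hand.sort(Comparator.comparing((Card c) -> c.name));  // by first name
        System.out.println(names(hand)); // prints [Brittany, Chewie, Christina, Natalie]

        hand.sort(Comparator.comparingInt(Card::monDay));     // by birthday within the year
        System.out.println(names(hand)); // prints [Natalie, Chewie, Christina, Brittany]

        hand.sort(Comparator.comparingInt(Card::fullDate));   // by actual date of birth
        System.out.println(names(hand)); // prints [Christina, Natalie, Brittany, Chewie]
    }
}
```

The class's own compareTo fixes one "natural" order; Comparators let the caller pick a different order per sort.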
Questions?
Next, a DataNode
    class DataNode implements Comparable {
        private Comparable data;

        public DataNode(Comparable data) {
            setData(data);
        }

        public void setData(Comparable data) {
            this.data = data;
        }

        public Comparable getData() {
            return data;
        }

        public String toString() {
            return "" + data;
        }

        // Delegates to the stored object's own compareTo
        public int compareTo(Object o) {
            DataNode dn = (DataNode) o;
            return this.getData().compareTo(dn.getData());
        }
        // class DataNode (continued)
        public boolean equals(Object o) {
            DataNode dn = (DataNode) o;
            return getData().equals(dn.getData());
        }

        public static void main(String[] args) {
            DataNode dn1 = new DataNode("Node 1");
            DataNode dn2 = new DataNode("Node 2");
            DataNode nul = new DataNode(null);
            System.out.println(dn1);
            System.out.println(dn2);
            System.out.println(nul);          // "" + null prints "null"
            System.out.println("dn1.compareTo(dn2) " + dn1.compareTo(dn2));
            System.out.println("dn2.compareTo(dn1) " + dn2.compareTo(dn1));
            DataNode dngf = new DataNode(
                new Girlfriend("Chewie", 11, 17, 2791));
            System.out.println(dngf);
        }
    } // class DataNode
Questions?
ListNode
    class ListNode extends DataNode {
        private ListNode next;

        public ListNode(Comparable data) {
            this(data, null);
        }

        public ListNode(Comparable data, ListNode next) {
            super(data);
            setNext(next);
        }

        public void setNext(ListNode next) {
            this.next = next;
        }

        public ListNode getNext() {
            return next;
        }

        // Prints this node, then (recursively, via next.toString()) the rest of the list
        public String toString() {
            return "Data " + getData() + " Next\n" + next;
        }
        // class ListNode (continued)
        public int compareTo(Object o) {
            ListNode ln = (ListNode) o;
            return getData().compareTo(ln.getData());
        }

        public static void main(String[] args) {
            ListNode ln1 = new ListNode("abc");
            ListNode ln2 = new ListNode("xyz");
            ListNode lnBS = new ListNode(
                new Girlfriend("Brittany", 12, 21, 1981));
            ListNode lnCA = new ListNode(
                new Girlfriend("Christina", 12, 18, 1980));

            System.out.println(ln1);
            System.out.println(ln2);
            System.out.println(lnBS);
            System.out.println(lnCA);
            // class ListNode (continued)
            System.out.println("ln1.compareTo(ln2) " + ln1.compareTo(ln2));
            System.out.println("ln2.compareTo(ln1) " + ln2.compareTo(ln1));
            System.out.println("lnBS.compareTo(lnCA) " + lnBS.compareTo(lnCA));
            System.out.println("lnCA.compareTo(lnBS) " + lnCA.compareTo(lnBS));
            System.out.println(("Brittany").compareTo("Christina"));
            // Would throw ClassCastException: Girlfriend.compareTo casts its
            // argument to Girlfriend, but ln1 holds a String
            //System.out.println(lnBS.compareTo(ln1));

            ListNode n3 = new ListNode("Third");
            ListNode n2 = new ListNode("Second", n3);
            ListNode n1 = new ListNode("First", n2);
            ListNode head = new ListNode("Head", n1);
            // class ListNode (continued)
            // The nodes are still reachable through head, so the list survives
            n1 = null;
            n2 = null;
            n3 = null;

            System.out.println(head);

            ListNode a = new ListNode(
                new Girlfriend("Albertina", 1, 1, 100));
            ListNode b = new ListNode(
                new Girlfriend("Zoe", 12, 31, 3000));
            System.out.println(a.compareTo(b));
        } // main
    } // class
Questions?
SortedList
    class SortedList {
        private ListNode head;

        public SortedList() {
            head = null;
        }

        public String toString() {
            return "SortedList\n" + head;
        }
        // class SortedList (continued)
        public void add(Comparable data) {
            ListNode temp = new ListNode(data);
            if (head == null) {
                head = temp;                 // empty list
            }
            else if (head.compareTo(temp) > 0) {
                temp.setNext(head);          // new smallest item becomes the head
                head = temp;
            }
            else {
                add(head, temp);             // otherwise search further down the list
            }
        }
        // class SortedList (continued)
        // Inserts temp after current, before the first node larger than temp
        private void add(ListNode current, ListNode temp) {
            if (current.getNext() == null) {
                current.setNext(temp);                   // reached the end
            }
            else if (current.getNext().compareTo(temp) > 0) {
                temp.setNext(current.getNext());         // splice in
                current.setNext(temp);
            }
            else {
                add(current.getNext(), temp);            // keep walking
            }
        }
        // class SortedList (continued)
        public static void main(String[] args) {
            SortedList sl1 = new SortedList();
            sl1.add("abc");
            sl1.add("xyz");
            System.out.println(sl1);
            sl1.add("aaa");
            sl1.add("zzz");
            sl1.add("mmm");
            System.out.println(sl1);
            // class SortedList (continued)
            // main (continued)

            SortedList sl2 = new SortedList();
            sl2.add(new Girlfriend("Brittany", 12, 21, 1981));
            Date d = new Date(6, 9, 1981);
            Girlfriend gf2 = new Girlfriend("Natalie", d);
            sl2.add(gf2);
            sl2.add(new Girlfriend("Christina", 12, 18, 1980));
            sl2.add(new Girlfriend("Chewie", 11, 17, 2791));
            sl2.add(new Girlfriend("First", 1, 1, 3000));
            sl2.add(new Girlfriend("Last", 12, 31, 1000));
            System.out.println(sl2);
        } // main
    } // class SortedList
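The recursive add above walks the list until it finds the first node larger than the new one. The same insertion can be sketched iteratively with generics (Java 5+), which removes the casts inside compareTo. This is a minimal sketch; GenericSortedList and its nested Node are hypothetical names, not the lecture's classes. Building a list of N items this way is insertion sort, O(N²) in the worst case.

```java
// Hypothetical generic, iterative version of SortedList.
class GenericSortedList<T extends Comparable<T>> {
    private static class Node<E> {
        E data;
        Node<E> next;
        Node(E data, Node<E> next) { this.data = data; this.next = next; }
    }

    private Node<T> head;

    public void add(T data) {
        // Empty list, or new smallest item: it becomes the head.
        if (head == null || head.data.compareTo(data) > 0) {
            head = new Node<>(data, head);
            return;
        }
        // Otherwise stop at the last node that is <= data, so equal items
        // keep their arrival order (same rule as the recursive version).
        Node<T> current = head;
        while (current.next != null && current.next.data.compareTo(data) <= 0) {
            current = current.next;
        }
        current.next = new Node<>(data, current.next);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (Node<T> n = head; n != null; n = n.next) {
            sb.append(n.data);
            if (n.next != null) sb.append(", ");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        GenericSortedList<String> list = new GenericSortedList<>();
        for (String s : new String[] {"abc", "xyz", "aaa", "zzz", "mmm"}) {
            list.add(s);
        }
        System.out.println(list); // prints [aaa, abc, mmm, xyz, zzz]
    }
}
```

The loop replaces the private recursive helper; for very long lists it also avoids one stack frame per node visited.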
Questions?
Hashing
Desire
- We want to store objects in some structure and be able to retrieve them extremely fast.
- The number of items to store might be big.
Hashing: Why?
Motivation: Linked lists work well enough for most applications, but provide slow service for large data sets.
Ordered insertion takes too long for large sets: each insert into a sorted linked list is O(N).
[Figure: "Why it matters": steps versus number of items for O(N²), O(N), and O(log N) growth]
Big Uh Oh
Sanity Check
A search time of O(1)? How is this possible?
Corned Beef Hash(ing)
A classic use for leftover corned beef. If you don't have enough leftover potatoes, you can use frozen hash brown potatoes in this dish.
- 2 tablespoons vegetable oil
- 1 onion, finely chopped
- 1 cup peeled, cubed, cooked potatoes
- 2 cups finely diced cooked corned beef
- 1/2 teaspoon thyme
- Salt and pepper to taste
- Dash of Tabasco sauce
- 1/2 cup heavy cream
- 3 poached or fried eggs
Heat oil in a heavy skillet and sauté onions until tender. Add potatoes, meat, thyme, salt, pepper and Tabasco. Stir well and press the mixture down with a spatula to form a large pancake. Pour cream over and press the mixture down again. Cook for about 20 minutes, until the hash has a slight crust on the bottom. Flip it over. To do this easily, place a large dinner plate face down over the hash and turn the skillet and plate over. Slide the hash from the plate back into the skillet to cook the other side. Continue cooking for an additional 10 to 15 minutes. Slice the hash into three wedges. Top each wedge with an egg and serve immediately. Yield: 3 servings.
One Way
Naive solution: Imagine we had to create a large table, sized to the range of possible Social Security numbers:
    Data[] myRecord = new Data[999999999];
    // e.g., SSN 123-45-6789 would be stored at myRecord[123456789]
    /* NOTE: Here, we assume there are approximately
       a billion Social Security numbers */
Perhaps not the best?
Example
Social Security numbers follow the pattern 123-45-6578. There are hundreds of millions of potentially unique numbers.
[Figure: an array with indices 0, 1, 2, ..., 239,455, 239,456, 239,457, 239,458, 239,459, ...]
We might be tempted to use a Social Security number as an index into some data set...
Example
If we only planned on holding a few thousand records, an array sized to nearly a billion items would be very wasteful.
Q: How can we combine the speed of accessing an array with efficient use of available memory?
A: Shrink the range of key values to fit the array size. Use a hash function.
Hashing
Idea: Shrink the address space to fit the population size.
[Figure: the range of the address space, 000-00-0000 to 999-99-9999 (passed into a method), shrunk to the population size, 100 (usually a fixed array size)]
Example
Instead of using the Social Security number as the array index,
    StudentFile temp = studentRecords[iSocSecNum];
reduce the range of the number to something within the size of the array:
    StudentFile temp = studentRecords[iSocSecNum % studentRecords.length];
iSocSecNum % studentRecords.length returns an index within the appropriate range, 0 to studentRecords.length - 1.
Recall
- Our friend the mod function
  - x % y
  - will yield values between 0 and y - 1 (for x >= 0 and y > 0; in Java, a negative x gives a result between -(y - 1) and 0)
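A minimal sketch of the mod operator's range in Java, plus the negative-key caveat: % keeps the sign of the left operand, so a negative key (for example, an int that has overflowed) would produce a negative index. Math.floorMod (Java 8+) always returns a value between 0 and y - 1 for positive y. The class and method names (ModDemo, indexFor) are hypothetical.

```java
// Hypothetical demo of x % y versus Math.floorMod(x, y) in Java.
class ModDemo {
    // Reduce any int key to an index in 0..tableSize-1.
    static int indexFor(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        for (int x = 0; x < 7; x++) {
            System.out.print(x % 3 + " ");
        }
        System.out.println();                // the loop printed 0 1 2 0 1 2 0
        System.out.println(-7 % 5);          // prints -2: % keeps the sign of x
        System.out.println(indexFor(-7, 5)); // prints 3
    }
}
```

For the non-negative keys used in the rest of the lecture, % and floorMod agree.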
Reality Check
- Everyone getting the idea?
The Art of Hashing
Obviously, the hash function is the key. It takes a large range of values and shrinks them to fit a smaller address range.
[Figure: the range of Social Security numbers, 0 to 999,999,999, mapped onto the range of our table, 0 to N]
A problem...
- We have an array of length 100
- We have about 50 students
- We hash using ssn % 100
  - George P. Burdell: 123-45-6789
  - George W. Bush: 321-54-7689
Collision! Both numbers end in 89, so both hash to index 89.
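The collision can be checked directly: with ssn % 100 only the last two digits survive, so any two numbers ending in 89 land in the same slot. A minimal sketch; the class name and the choice to store names in a String[] are assumptions for illustration.

```java
// Hypothetical demo: hashing SSNs into a 100-slot array with ssn % 100.
class SsnCollision {
    static int hash(int ssn) {
        return ssn % 100;   // keeps only the last two digits
    }

    public static void main(String[] args) {
        String[] table = new String[100];
        int[] ssns = {123456789, 321547689};
        String[] names = {"George P. Burdell", "George W. Bush"};

        for (int k = 0; k < ssns.length; k++) {
            int slot = hash(ssns[k]);
            if (table[slot] != null) {
                System.out.println("Collision at " + slot + ": " + names[k]
                        + " vs " + table[slot]);
            } else {
                table[slot] = names[k];
            }
        }
        // prints: Collision at 89: George W. Bush vs George P. Burdell
    }
}
```

With about 50 students in 100 slots, some collision is very likely even with a well-behaved hash (the birthday paradox), which is why collision handling is unavoidable.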
Hash Functions: How to Design
- The perfect hash function
  - would be very fast (it is used for all data access)
  - would return a unique result for each key, i.e., would result in zero collisions
  - in the general case, a perfect hash doesn't exist (we can create one for a specific population, but as soon as that population changes...)
- Common hash functions
  - Digit selection: e.g., the last 4 digits of a phone number
  - Division: modulo
  - Character keys: use the ASCII numeric values of the characters (e.g., 'R' is 82)
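The three common hash functions listed above can be sketched as follows. These are minimal illustrations under stated assumptions: digit selection assumes the phone number is a 10-digit long, division assumes a non-negative key, and the character-key version simply sums character codes (the simplest option; the name-hashing slides show why a plain sum is weak for large tables). All names are hypothetical.

```java
// Hypothetical sketches of three common hash functions.
class SimpleHashes {
    // Digit selection: keep the last 4 digits of a phone number (0..9999).
    static int lastFourDigits(long phone) {
        return (int) (phone % 10000);
    }

    // Division: remainder after dividing by the table size (assumes key >= 0).
    static int division(int key, int tableSize) {
        return key % tableSize;
    }

    // Character keys: add up the character codes, then reduce by the table size.
    static int charSum(String key, int tableSize) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += key.charAt(i);   // e.g., 'R' contributes 82
        }
        return sum % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(lastFourDigits(5551234567L)); // prints 4567
        System.out.println(division(123456789, 100));    // prints 89
        System.out.println((int) 'R');                    // prints 82
        System.out.println(charSum("RR", 1000));          // prints 164
    }
}
```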
Cost of Hash
- Two costs of hashing
  1. Loss of natural order
     - side effect of the desired random shrinking
     - we lose any ordering of the original indices
  2. Collisions will occur
     - there is no perfect hash function
     - when (not if) a collision occurs, how do we handle it?
- Collision resolution strategies
  - Multiple-record buckets: small for each index, but...
  - Open address methods: look for the next open address, but...
  - Coalesced chaining: use a cellar for overflow (34% to 40% of the table size)
  - External chaining: a linked list at each location
Consider this classroom...
Collision Resolution
Technique: Multiple-element buckets
- Idea: have extra spaces there for overflow
- If the population is 8 and the hash function is mod 8, then each index gets three slots:
  [Figure: each row holds a hash slot, a 1st-collision slot, and a 2nd-collision slot]
Problems: uses 3N space; and what if a 3rd collision occurs at any one location?
Collision Resolution
Technique: Open address methods
- Idea: upon collision, look for an empty spot
- If the population is 8 and the hash function is mod 8
- Assume data items arrived in the order W, X, Y, Z, A, B, C, D
  - X is already at 3, so Z goes to the next available slot
  - B belongs at 5, but Z is already there
  - W is already at 1, so C goes to the next available slot
  - D belongs at 2, but C is already there
Problem: deteriorates to an unsorted list (i.e., O(N) search)
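The open-address example above can be reproduced with linear probing, the simplest open-address method: on a collision, try the next slot, wrapping around at the end. A minimal sketch; the class and method names are hypothetical, and because the letters' hash values come from the lecture's examples rather than from a real function, each key is inserted together with its home slot (W→1, X→3, Y→4, Z→3, A→6, B→5, C→1, D→2, as in the coalesced-chaining example).

```java
import java.util.Arrays;

// Hypothetical linear-probing table of chars; '.' marks an empty slot.
class LinearProbing {
    final char[] slots;

    LinearProbing(int size) {
        slots = new char[size];
        Arrays.fill(slots, '.');
    }

    // Insert key starting at its home slot, stepping forward to a free slot.
    int insert(char key, int home) {
        for (int step = 0; step < slots.length; step++) {
            int i = (home + step) % slots.length;  // wrap around at the end
            if (slots[i] == '.') {
                slots[i] = key;
                return i;
            }
        }
        throw new IllegalStateException("table is full");
    }

    public static void main(String[] args) {
        LinearProbing t = new LinearProbing(8);
        char[] keys  = {'W', 'X', 'Y', 'Z', 'A', 'B', 'C', 'D'};
        int[]  homes = { 1,   3,   4,   3,   6,   5,   1,   2 };
        for (int k = 0; k < keys.length; k++) {
            t.insert(keys[k], homes[k]);
        }
        System.out.println(new String(t.slots)); // prints DWCXYZAB
    }
}
```

D, whose home is slot 2, has to probe slots 2 through 7 and wrap around to slot 0: once the table fills, a probe sequence degenerates into a linear scan.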
Collision Resolution
Technique: Coalesced chaining
- Idea: have a small extra cellar to handle collisions
- If the population is 8 and the hash function is mod 8
- Assume data items arrived in the order W, X, Y, Z, A, B, C, D
Works well with a cellar of 35% to 40% of N if the hash function is good; the cellar can overflow if need be.
Resulting table (slots 0 to 7 are home addresses; slots 8 to 10 form the cellar):
- 0: empty
- 1: W (hashes to 1), link to 9
- 2: D (hashes to 2)
- 3: X (hashes to 3), link to 10
- 4: Y (hashes to 4)
- 5: B (hashes to 5)
- 6: A (hashes to 6)
- 7: empty
- 8: empty (cellar)
- 9: C (hashes to 1)
- 10: Z (hashes to 3)
Cellar bottom is now 8.
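The coalesced-chaining table above can be rebuilt in a few lines: 8 home slots plus a 3-slot cellar (37.5% of 8), where each collision takes the highest-numbered empty slot and links it onto the end of the chain. A minimal sketch; CoalescedTable is a hypothetical name, and as in the other sketches each letter is inserted with its home slot because the letters' hash values are given rather than computed.

```java
import java.util.Arrays;

// Hypothetical coalesced-chaining table: 'primary' home slots followed by a
// cellar; '.' marks an empty slot and -1 marks the end of a chain.
class CoalescedTable {
    final char[] keys;
    final int[] link;
    private int free;   // allocation point, moves upward from the bottom (highest index)

    CoalescedTable(int primary, int cellar) {
        keys = new char[primary + cellar];
        link = new int[primary + cellar];
        Arrays.fill(keys, '.');
        Arrays.fill(link, -1);
        free = keys.length - 1;
    }

    void insert(char key, int home) {
        if (keys[home] == '.') {        // home slot free: no collision
            keys[home] = key;
            return;
        }
        int end = home;                 // walk to the end of this chain
        while (link[end] != -1) end = link[end];
        // Take the highest-numbered empty slot; once the cellar is used up
        // this spills into the home area ("the cellar can overflow").
        while (free >= 0 && keys[free] != '.') free--;
        if (free < 0) throw new IllegalStateException("table is full");
        keys[free] = key;
        link[end] = free;
        free--;                         // the cellar bottom moves up
    }

    boolean contains(char key, int home) {
        for (int i = home; i != -1; i = link[i]) {
            if (keys[i] == key) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        CoalescedTable t = new CoalescedTable(8, 3);   // slots 0..7, cellar 8..10
        char[] keys  = {'W', 'X', 'Y', 'Z', 'A', 'B', 'C', 'D'};
        int[]  homes = { 1,   3,   4,   3,   6,   5,   1,   2 };
        for (int k = 0; k < keys.length; k++) t.insert(keys[k], homes[k]);
        System.out.println(new String(t.keys));   // prints .WDXYBA..CZ
        System.out.println(Arrays.toString(t.link));
        // prints [-1, 9, -1, 10, -1, -1, -1, -1, -1, -1, -1]
    }
}
```

Z (the first collision) takes slot 10 and C takes slot 9, matching the table; the next collision would use slot 8.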
Collision Resolution
Technique: External chaining
- Idea: keep pointers to all items at a given hash; handle a collision as a normal event
- If the population is 8 and the hash function is mod 8
- Assume data items arrived in the order W, X, Y, Z, A, B, C, D
Hashing with Chaining Example
    public class Node {
        int iData;
        Node nextNode;

        public Node() { }

        public Node(int iData) {
            this.iData = iData;
        }

        // Appends a new node holding iData to the end of this chain
        public void insertNode(int iData) {
            insertNode(iData, this);
        }

        public void insertNode(int iData, Node current) {
            if (current.getNextNode() == null)
                current.setNextNode(new Node(iData));
            else
                insertNode(iData, current.getNextNode());
        }
        // Returns the first node in the chain holding iData, or null
        public Node locateNode(int iData) {
            return locateNode(iData, this);
        }

        public Node locateNode(int iData, Node current) {
            if (iData == current.getData())
                return current;
            else if (current.getNextNode() == null)
                return null;
            else
                return locateNode(iData, current.getNextNode());
        }

        public int getData() {
            return iData;
        }

        public Node getNextNode() {
            return nextNode;
        }
        public void setNextNode(Node nextNode) {
            this.nextNode = nextNode;
        }

        public String toString() {
            return "Node " + iData;
        }
    } // Node
    public class HashChain {
        private Node[] bucket;
        private int TableSize;

        public HashChain(int TableSize) {
            this.TableSize = TableSize;
            bucket = new Node[TableSize];
            for (int i = 0; i < TableSize; i++)
                bucket[i] = new Node();      // dummy head node for each chain
        } // HashChain

        private int getHashKey(int newElement) {
            return newElement % TableSize;   // assumes newElement >= 0
        } // getHashKey

        public void addElement(int newElement) {
            int index = getHashKey(newElement);
            bucket[index].insertNode(newElement);
        } // addElement
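        public Node getElement(int iData) {
            int index = getHashKey(iData);
            // Search from the first real node: the dummy head's iData defaults
            // to 0, so searching from the dummy would wrongly "find" 0
            Node first = bucket[index].getNextNode();
            Node item = (first == null) ? null : first.locateNode(iData);
            return item;
        } // getElement

        public void printHashChain() {
            Node temp;
            for (int i = 0; i < TableSize; i++) {
                System.out.print(i + " ");
                temp = bucket[i];
                while (temp.getNextNode() != null) {   // skips the dummy head
                    temp = temp.getNextNode();
                    System.out.print(temp + " ");
                }
                System.out.println();
            }
        }
    } // HashChain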
    class Driver {
        public static void main(String[] arg) {
            int N = 50;
            // table size comes from the command line, e.g. "java Driver 22"
            HashChain hash = new HashChain(Integer.parseInt(arg[0]));
            for (int i = 0; i < N; i++) {
                hash.addElement((int) (Math.random() * N));   // random int in 0..N-1
            } // for
            hash.printHashChain();
        } // main
    } // Driver
    C:\My Documents\sandbox\Hashing>java Driver 22
    0 Node 22 Node 22
    1 Node 1 Node 45
    2 Node 24 Node 46 Node 24 Node 24
    3 Node 25 Node 25
    4 Node 4 Node 4
    5 Node 27 Node 5 Node 49 Node 27
    6 Node 6 Node 6
    7 Node 29 Node 29
    8
    9 Node 31 Node 9 Node 9
    10
    11 Node 11 Node 33 Node 33 Node 33 Node 33
    12 Node 12
    13 Node 13 Node 35 Node 35
    14 Node 14
    15 Node 15 Node 37
    16 Node 16 Node 38 Node 16 Node 38
    17 Node 39 Node 39
Load Factor
We can measure how full our table has become with a load factor. A load factor is the ratio of the number of items stored to the total number of spots in the table. It gives us a measure of table utilization.
This gives us a way of estimating the chance of a collision.
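A minimal sketch of computing the load factor (items stored divided by table size) and using it as a trigger for growing the table. The names and the 0.75 threshold are assumptions for illustration; java.util.HashMap, for instance, uses 0.75 as its default load factor. With chaining the load factor can exceed 1, since a bucket can hold several items; with open addressing it never can.

```java
// Hypothetical helpers for the load factor of a hash table.
class LoadFactor {
    static double loadFactor(int itemsStored, int tableSize) {
        return (double) itemsStored / tableSize;
    }

    // Grow once the table is fuller than maxLoad (e.g., 0.75).
    static boolean shouldGrow(int itemsStored, int tableSize, double maxLoad) {
        return loadFactor(itemsStored, tableSize) > maxLoad;
    }

    public static void main(String[] args) {
        System.out.println(loadFactor(6, 8));       // prints 0.75
        // The chaining Driver stores 50 items in 22 buckets:
        System.out.println(loadFactor(50, 22));     // about 2.27
        System.out.println(shouldGrow(7, 8, 0.75)); // prints true
    }
}
```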
What Good is a Load Factor?
[Figure: number of probes (0 to 15) versus load factor percentage (0 to 100%) for a linear probing hash, for successful and unsuccessful searches]
Probe?
- Is this lecture sponsored by
- No, not exactly.
- A probe refers to an attempt to find the target.
Rehashing
Performance charts suggest that as our load factor increases, the number of probes increases. At some point, it may be worth the trouble to grow the table and rehash: make a new table, and rehash each entry into the new table.
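A minimal sketch of growing and rehashing a chained table of non-negative ints once the load factor passes a limit. The class name, the 1.0 threshold, and the growth rule (2 * old size + 1, which keeps sizes odd; many implementations pick the next prime instead) are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical chained table that grows and rehashes when it gets too full.
// Assumes non-negative keys (a negative key % size would be a negative index).
class RehashingTable {
    private static final double MAX_LOAD = 1.0;   // assumed threshold
    private List<List<Integer>> buckets;
    private int size = 0;

    RehashingTable(int tableSize) {
        buckets = newBuckets(tableSize);
    }

    private static List<List<Integer>> newBuckets(int n) {
        List<List<Integer>> b = new ArrayList<>(n);
        for (int i = 0; i < n; i++) b.add(new ArrayList<>());
        return b;
    }

    int tableSize() { return buckets.size(); }

    // The index depends on the current table size.
    int indexFor(int key) { return key % buckets.size(); }

    void add(int key) {
        buckets.get(indexFor(key)).add(key);
        size++;
        if ((double) size / buckets.size() > MAX_LOAD) rehash();
    }

    boolean contains(int key) {
        return buckets.get(indexFor(key)).contains(key);
    }

    // Make a new, larger table and re-insert every entry, recomputing each
    // index with the new size.
    private void rehash() {
        List<List<Integer>> old = buckets;
        buckets = newBuckets(2 * old.size() + 1);
        for (List<Integer> chain : old) {
            for (int key : chain) buckets.get(indexFor(key)).add(key);
        }
    }

    public static void main(String[] args) {
        RehashingTable t = new RehashingTable(5);
        for (int k = 0; k < 12; k++) t.add(k * 7);
        System.out.println(t.tableSize());   // prints 23 (grew 5 -> 11 -> 23)
        System.out.println(t.contains(77));  // prints true
    }
}
```

Each rehash costs O(N), but because the table roughly doubles each time, the cost averaged over all insertions stays constant per insert.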
Rehashing
Question: Why can't we just reuse the old hash values in our new, larger table?
Make sure you can answer such a question.
Questions?
Better Hashing
The key to efficient hashing is the hash function. This is fairly easy if the data hold a uniformly distributed number. But how can we efficiently convert a name into a key number? Experimenting with this problem will expose some issues in hashing. Here's our basic method signature:
    public int getHash(String strName)
Hashing Names
Version 1
    public int getHash(String strName) {
        int hash = 0;
        for (int i = 0; i < strName.length(); i++)
            hash += (int) strName.charAt(i);   // add up the character codes
        hash %= tableSize;
        return hash;
    }
Hashing Names
For large tables, the Version 1 hash function does not distribute the keys very well.
For example, if names are at most eight characters long and each character has an ASCII value of at most 127, the sum can never exceed 8 × 127 = 1,016. If the table size is a large prime number well above that, we will never distribute keys to the upper portion of the table. As a result, we will tend to have more collisions in the lower part of the table.
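The range problem can be checked numerically. A minimal sketch, assuming names of at most eight characters with codes of at most 127, and a hypothetical table size of 10,007: the sum-of-characters hash can never reach an index above 1,016, so roughly nine out of ten slots are unreachable. SumHashRange is a hypothetical class name.

```java
import java.util.Arrays;

// Hypothetical check of the Version 1 "sum of the characters" hash.
class SumHashRange {
    static int getHash(String strName, int tableSize) {
        int hash = 0;
        for (int i = 0; i < strName.length(); i++)
            hash += strName.charAt(i);
        return hash % tableSize;
    }

    public static void main(String[] args) {
        int tableSize = 10007;                 // assumed large prime table size
        char[] worst = new char[8];
        Arrays.fill(worst, (char) 127);        // eight characters, each with code 127
        System.out.println(getHash(new String(worst), tableSize)); // prints 1016
        System.out.println(getHash("Brittany", tableSize));        // prints 845
        // Indices 1017..10006 can never be produced for such names.
    }
}
```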
Hashing Names
Version 2
    public int getHash(String strName) {
        int hash = 0;
        // assumes strName has at least three characters
        hash = (int) strName.charAt(0)
             + 27 * (int) strName.charAt(1)
             + 729 * (int) strName.charAt(2);
        hash %= tableSize;
        return hash;
    }
Strategy: only examine the first three characters.
Given: 27 is the number of characters in the alphabet, plus the space character. 729 is 27².
Hashing (cont'd)
With Version 2, there are now 26³ (or 17,576) combinations of letters. This should distribute evenly over a large table.
BUT English does not uniformly distribute letters in words. There are in fact only 2,851 combinations of three-letter sequences in English. So once again, we underutilize the table. (For example, with a table of about 10,000 slots, only about a quarter of the table is actually hashed to.)
Inductive Analysis
What happened in our two previous examples? They worked, but what made them inefficient?
[Figure: the hash does not expand a limited range; the range of name values is smaller than the table size]
The problem was a mismatch between address space and table size. If the table size exceeds the address range, underutilization occurs.
Improved Hash Function
    public int getHash(String strName) {
        int hash = 0;
        for (int i = 0; i < strName.length(); i++)
            hash = 27 * hash + (int) strName.charAt(i);
        hash %= tableSize;
        if (hash < 0)               // overflow can leave hash negative
            hash += tableSize;
        return hash;
    }
Side note for the mathematically inclined: this applies what is known as Horner's rule.
Why Is This a Better Hash?
It is still subject to quirks of the English language, but it is not sensitive to three-letter combinations. It uses a polynomial expansion to generate a large input value, so the hash will likely use the entire table, even for large tables.
The if (hash < 0) check addresses possible roll-over: for long names, 27 * hash overflows the int range and can wrap around to a negative value.
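Horner's rule evaluates the polynomial c0*27^(n-1) + c1*27^(n-2) + ... + c(n-1) with one multiply and one add per character. Java's own String.hashCode() is documented to use the same scheme with a multiplier of 31, so a minimal sketch can check the two against each other. The class name and the use of Math.floorMod (Java 8+) in place of the if (hash < 0) fix-up are assumptions of this sketch.

```java
// Hypothetical sketch: Horner's rule string hash with a chosen multiplier.
class HornerHash {
    // hash = (...((c0 * m + c1) * m + c2) * m + ...); int overflow just wraps
    static int horner(String s, int multiplier) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = multiplier * hash + s.charAt(i);
        }
        return hash;
    }

    // Reduce to 0..tableSize-1; floorMod handles a wrapped (negative) hash.
    static int getHash(String s, int tableSize) {
        return Math.floorMod(horner(s, 27), tableSize);
    }

    public static void main(String[] args) {
        String name = "Christina";
        System.out.println(horner(name, 31) == name.hashCode()); // prints true
        System.out.println(horner("abc", 27));  // prints 73458 (97*729 + 98*27 + 99)
        System.out.println(getHash(name, 10007));
    }
}
```

Using the library's hashCode and reducing it with Math.floorMod is a reasonable default in practice; the hand-rolled version just makes the polynomial visible.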
Hard Lessons about Hashing
Your hash function must be carefully selected. It varies with your data: you have to study your input and base your hash on the properties of the input data.
Your range of input should be larger than your table size (else your hashing will underutilize the table).
Watch out for tables sized to a large prime number.
Summary of Hash Tables
- Purpose: fast searching of lists by reducing the address space to approximately the population size
- Hash function: the reduction function
- Collision: hash(a) == hash(b), but a != b
- Collision resolution strategies
  - Multiple-element buckets: still risk collisions
  - Open addressing: quickly deteriorates to an unordered list
  - Chaining: the most general solution
Questions?
Test Yourself
- In the context of a hash table, what is an address space?
- What is a hashing function?
- Should a hashing function return values equal to, greater than, or less than the table size? Why?
- What data structure (seen in previous slides) might we use to implement a hash table?
Questions?