Title: Sorting Algorithms
1Sorting Algorithms
2Motivation
- Example: Phone Book Searching
- If the phone book were in random order, we would probably never use the phone!
- Let's say ½ second per entry
- There are 70,000 households in Ilam
- Worst case: 35,000 seconds ≈ 10 hrs to find a phone number!
- Best case: ½ second
- Average time is about 5 hrs
3Motivation
- The phone book is sorted
- Jump directly to the letter of the alphabet we are interested in
- Scan quickly to find the first two letters that are really close to the name we are interested in
- Flip whole pages at a time if not close enough
4The Big Idea
- Take a set of N randomly ordered pieces of data $a_j$ and rearrange the data such that for all j (j > 0 and j < N), $a_{j-1} \, R \, a_j$ holds, for relational operator R
- $a_0 \, R \, a_1 \, R \, a_2 \, R \, \dots \, R \, a_{N-2} \, R \, a_{N-1}$
- If R is <, we are doing an ascending sort: each consecutive item in the list is going to be larger than the previous
- If R is >, we are doing a descending sort: items get smaller as we move down the list
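The ordering condition above can be checked mechanically. The following is a minimal sketch, assuming the data is an array of int and that R is passed in as a comparison function object; the helper name `isOrderedBy` is hypothetical and does not come from the slides.

```cpp
#include <cstddef>
#include <functional>

// Returns true when a[j-1] R a[j] holds for every j with 0 < j < n.
// isOrderedBy is a hypothetical helper name used only for illustration.
template <typename R>
bool isOrderedBy(const int a[], std::size_t n, R r)
{
    for (std::size_t j = 1; j < n; j++)
        if (!r(a[j - 1], a[j]))
            return false;   // found a neighbouring pair that violates R
    return true;            // every neighbouring pair satisfies R
}
```

Passing `std::less<int>()` as R tests for an ascending sort, and `std::greater<int>()` tests for a descending sort.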
5Queue Example Radix Sort
- Also called bin sort
- Repeatedly shuffle data into small bins
- Collect data from bins into new deck
- Repeat until sorted
- Appropriate method of shuffling and collecting?
- For integers, the key is to shuffle data into bins on a per-digit basis, starting with the rightmost (ones) digit
- Collect in order, from bin 0 to bin 9, and left to right within a bin
6Radix Sort Ones Digit
- Data: 459 254 472 534 649 239 432 654 477
- Bin 0:
- Bin 1:
- Bin 2: 472 432
- Bin 3:
- Bin 4: 254 534 654
- Bin 5:
- Bin 6:
- Bin 7: 477
- Bin 8:
- Bin 9: 459 649 239
- After Call: 472 432 254 534 654 477 459 649 239
7Radix Sort Tens Digit
- Data: 472 432 254 534 654 477 459 649 239
- Bin 0:
- Bin 1:
- Bin 2:
- Bin 3: 432 534 239
- Bin 4: 649
- Bin 5: 254 654 459
- Bin 6:
- Bin 7: 472 477
- Bin 8:
- Bin 9:
- After Call: 432 534 239 649 254 654 459 472 477
8Radix Sort Hundreds Digit
- Data: 432 534 239 649 254 654 459 472 477
- Bin 0:
- Bin 1:
- Bin 2: 239 254
- Bin 3:
- Bin 4: 432 459 472 477
- Bin 5: 534
- Bin 6: 649 654
- Bin 7:
- Bin 8:
- Bin 9:
- Final Sorted Data: 239 254 432 459 472 477 534 649 654
9Radix Sort Algorithm
- Begin with current digit as ones digit
- While there is still a digit on which to classify
  - For each number in the master list,
    - Add that number to the appropriate sublist keyed on the current digit
  - For each sublist from 0 to 9
    - For each number in the sublist
      - Remove the number from the sublist and append to a new master list
  - Advance the current digit one place to the left.
10Radix Sort and Queues
- Each list (the master list (all items) and bins (per digit)) needs to be first-in, first-out ordered: perfect for a queue.
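The radix sort steps can be sketched with `std::queue` for both the master list and the ten bins. This is only an illustrative version, assuming non-negative integers and a caller-supplied digit count; the names `radixSort` and `digits` are assumptions, not part of the slides.

```cpp
#include <queue>
#include <vector>

// Least-significant-digit radix sort for non-negative integers.
// radixSort and its digits parameter are illustrative names (assumptions).
std::vector<int> radixSort(const std::vector<int>& data, int digits)
{
    std::queue<int> master;                  // master list, FIFO ordered
    for (int x : data) master.push(x);

    int divisor = 1;                         // 1 = ones, 10 = tens, 100 = hundreds
    for (int d = 0; d < digits; d++)
    {
        std::queue<int> bins[10];            // one FIFO bin per digit value

        // Shuffle: move each number into the bin keyed on the current digit.
        while (!master.empty())
        {
            int x = master.front();
            master.pop();
            bins[(x / divisor) % 10].push(x);
        }

        // Collect: bin 0 to bin 9, first in, first out within a bin.
        for (int b = 0; b < 10; b++)
        {
            while (!bins[b].empty())
            {
                master.push(bins[b].front());
                bins[b].pop();
            }
        }
        divisor *= 10;                       // advance one digit to the left
    }

    std::vector<int> sorted;
    while (!master.empty())
    {
        sorted.push_back(master.front());
        master.pop();
    }
    return sorted;
}
```

Called on the example data with `digits` set to 3, this reproduces the three passes shown on the ones, tens, and hundreds slides; with `digits` set to 1 it stops after the ones-digit pass.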
11A Quick Tangent
- How fast have the sorts you've seen before worked?
  - Bubble, Insertion, Selection: O(n^2)
- We will see sorts that are better, and in fact optimal for general sorting algorithms
  - Merge sort / Quicksort: O(n log n) (for quicksort, on average)
- How fast is radix sort?
12Analysis of Radix Sort
- Let n be the number of items to sort
- Outer loop control is on the maximum length of the input numbers in digits (let this be d)
- For every digit,
  - Assign each number to sort to a group (n operations)
  - Pull each number back into the master list (n operations)
- Overall running time: 2 · n · d => O(n), treating d as a constant
13Analysis of Radix Sort
- O(n log n) is optimal for general (comparison-based) sorting algorithms
- Radix sort is O(n)? How does that work?
- Radix sort is not a general sorting algorithm
  - It can't sort arbitrary information: Rectangle objects, Automobile objects, etc. are no good.
  - It can sort items that can be broken into constituent pieces and whose pieces can be ordered
  - Integers (digits), Strings (characters)
14Sorting Algorithms
- What does sorting really require?
- Compare pieces of data at different positions
- Swap the data at those positions until order is correct
- Example list: 20 3 18 9 5
15Selection Sort
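    #include <utility>   // std::swap
    using std::swap;

    int minimumIndex(int a[], int first, int last);   // defined below

    // Sort a[0..size-1] into ascending order.
    void selectionSort(int a[], int size)
    {
        for (int k = 0; k < size - 1; k++)
        {
            int index = minimumIndex(a, k, size);   // smallest of a[k..size-1]
            swap(a[k], a[index]);
        }
    }

    // Index of the smallest element in a[first..last-1].
    int minimumIndex(int a[], int first, int last)
    {
        int minIndex = first;
        for (int j = first + 1; j < last; j++)
            if (a[j] < a[minIndex]) minIndex = j;
        return minIndex;
    }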
16Selection Sort
- What is selection sort doing?
- Repeatedly
  - Finding smallest element by searching through list
  - Inserting at front of list
  - Moving front of list forward by 1
17Selection Sort Step Through
Start: 20 3 18 9 5 (a[0] to a[4])
minimumIndex(a, 0, 5) returns 1, so swap(a[0], a[1])
18Order From Previous
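Find minimumIndex(a, 1, 5): returns 4 (list is 3 20 18 9 5; after the swap, 3 5 18 9 20)
Find minimumIndex(a, 2, 5): returns 3 (after the swap, 3 5 9 18 20)
19Find minimumIndex(a, 3, 5): returns 3 (a[3] is swapped with itself; list stays 3 5 9 18 20)
k = 4 = size - 1: Done!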
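The step-through can be reproduced in code. The sketch below is one possible way to do it, assuming `std::vector<int>` instead of a raw array so that each intermediate state can be stored; `selectionSortTrace` is a hypothetical name introduced here.

```cpp
#include <utility>
#include <vector>

// Index of the smallest element in a[first..last-1].
int minimumIndex(const std::vector<int>& a, int first, int last)
{
    int minIndex = first;
    for (int j = first + 1; j < last; j++)
        if (a[j] < a[minIndex]) minIndex = j;
    return minIndex;
}

// Runs selection sort on a copy of the input and records the list
// after each pass of the outer loop (hypothetical tracing helper).
std::vector<std::vector<int>> selectionSortTrace(std::vector<int> a)
{
    std::vector<std::vector<int>> states;
    int size = static_cast<int>(a.size());
    for (int k = 0; k < size - 1; k++)
    {
        int index = minimumIndex(a, k, size);
        std::swap(a[k], a[index]);
        states.push_back(a);   // snapshot after pass k
    }
    return states;
}
```

For the list 20 3 18 9 5 this records four passes; the last pass swaps a[3] with itself and leaves the list unchanged.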
20Cost of Selection Sort
    void selectionSort(int a[], int size)
    {
        for (int k = 0; k < size - 1; k++)
        {
            int index = minimumIndex(a, k, size);
            swap(a[k], a[index]);
        }
    }

    int minimumIndex(int a[], int first, int last)
    {
        int minIndex = first;
        for (int j = first + 1; j < last; j++)
            if (a[j] < a[minIndex]) minIndex = j;
        return minIndex;
    }
21Cost of Selection Sort
- How many times through outer loop?
  - Iteration is for k = 0 to < (N-1) => N-1 times
- How many comparisons in minimumIndex?
  - Depends on outer loop. Consider 5 elements:
  - k = 0: j = 1, 2, 3, 4
  - k = 1: j = 2, 3, 4
  - k = 2: j = 3, 4
  - k = 3: j = 4
- Total comparisons is equal to 4 + 3 + 2 + 1, which is (N-1) + (N-2) + (N-3) + ... + 1
- What is that sum?
22Cost of Selection Sort
- $(N-1) + (N-2) + (N-3) + \dots + 3 + 2 + 1$
- $= [(N-1) + 1] + [(N-2) + 2] + [(N-3) + 3] + \dots$
- $= N + N + N + \dots$ => repeated addition of N
- How many repeated additions?
- There were N-1 total starting objects to add, and we grouped every 2 together: approximately N/2 repeated additions
- => Approximately $N \cdot N/2 = O(N^2)$ comparisons
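The comparison count can also be confirmed by instrumenting the sort. This is a rough sketch, assuming a `std::vector<int>` input and a counter incremented once per element comparison; `countSelectionComparisons` is a hypothetical name. The exact value of the sum is N(N-1)/2, which is what the counter should report for any input of N items.

```cpp
#include <vector>

// Selection sort on a copy of the input that returns how many element
// comparisons were made (hypothetical instrumented variant).
long countSelectionComparisons(std::vector<int> a)
{
    long comparisons = 0;
    int size = static_cast<int>(a.size());
    for (int k = 0; k < size - 1; k++)
    {
        int minIndex = k;
        for (int j = k + 1; j < size; j++)
        {
            comparisons++;                        // one a[j] < a[minIndex] test
            if (a[j] < a[minIndex]) minIndex = j;
        }
        int tmp = a[k];                           // swap a[k] and a[minIndex]
        a[k] = a[minIndex];
        a[minIndex] = tmp;
    }
    return comparisons;
}
```

For 5 elements the count is 4 + 3 + 2 + 1 = 10 regardless of the initial order, because the inner for loop always runs to the end of the list.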
23Insertion Sort
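    // Sort a[0..size-1] into ascending order.
    void insertionSort(int a[], int size)
    {
        for (int k = 1; k < size; k++)
        {
            int temp = a[k];       // new item to place
            int position = k;
            // Shift larger items right until temp's slot is found.
            while (position > 0 && a[position - 1] > temp)
            {
                a[position] = a[position - 1];
                position--;
            }
            a[position] = temp;
        }
    }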
24Insertion Sort
- List of size 1 (first element) is already sorted
- Repeatedly
  - Chooses new item to place in list (a[k])
  - Starting at back of the list, if new item is less than item at current position, shift current data right by 1.
  - Repeat shifting until new item is not less than thing in front of it.
  - Insert the new item
25Insertion Sort Step Through
Lists below are a[0] to a[4]; "|" separates the sorted part from the unsorted part.
Single card list already sorted: 20 | 3 18 9 5
Move 3 left until it hits something smaller
26Move 3 left until it hits something smaller. Now two sorted: 3 20 | 18 9 5
Move 18 left until it hits something smaller
27Move 18 left until it hits something smaller. Now three sorted: 3 18 20 | 9 5
Move 9 left until it hits something smaller
28Move 9 left until it hits something smaller. Now four sorted: 3 9 18 20 | 5
Move 5 left until it hits something smaller
29Move 5 left until it hits something smaller. Now all five sorted. Done: 3 5 9 18 20
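The same step-through can be generated by recording the list after each insertion. The sketch below assumes a `std::vector<int>` input so that snapshots can be stored; `insertionSortTrace` is a hypothetical name introduced for this illustration.

```cpp
#include <vector>

// Runs insertion sort on a copy of the input and records the list
// after each item has been inserted (hypothetical tracing helper).
std::vector<std::vector<int>> insertionSortTrace(std::vector<int> a)
{
    std::vector<std::vector<int>> states;
    int size = static_cast<int>(a.size());
    for (int k = 1; k < size; k++)
    {
        int temp = a[k];
        int position = k;
        while (position > 0 && a[position - 1] > temp)
        {
            a[position] = a[position - 1];   // shift larger item right
            position--;
        }
        a[position] = temp;                  // insert the new item
        states.push_back(a);                 // a[0..k] is now sorted
    }
    return states;
}
```

For 20 3 18 9 5 the recorded states are the "two sorted", "three sorted", "four sorted", and "all five sorted" lists from the step-through.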
30Cost of Insertion Sort
    void insertionSort(int a[], int size)
    {
        for (int k = 1; k < size; k++)
        {
            int temp = a[k];
            int position = k;
            while (position > 0 && a[position - 1] > temp)
            {
                a[position] = a[position - 1];
                position--;
            }
            a[position] = temp;
        }
    }
31Cost of Insertion Sort
- Outer loop
  - k = 1 to < size: 1, 2, 3, 4 => N-1 iterations
- Inner loop
  - Worst case: compare against all items in list
  - Inserting new smallest thing
  - k = 1: 1 step (position = k = 1, while position > 0)
  - k = 2: 2 steps (position = 2, 1)
  - k = 3: 3 steps (position = 3, 2, 1)
  - k = 4: 4 steps (position = 4, 3, 2, 1)
- Again, worst-case total comparisons is equal to the sum of i from 1 to N-1, which is O(N^2)
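To see the worst case next to an easy case, the inner loop can be instrumented. This is a sketch only, assuming a `std::vector<int>` input and counting one comparison for each `a[position - 1] > temp` test; `countInsertionComparisons` is a hypothetical name.

```cpp
#include <vector>

// Insertion sort on a copy of the input that returns how many element
// comparisons were made (hypothetical instrumented variant).
long countInsertionComparisons(std::vector<int> a)
{
    long comparisons = 0;
    int size = static_cast<int>(a.size());
    for (int k = 1; k < size; k++)
    {
        int temp = a[k];
        int position = k;
        while (position > 0)
        {
            comparisons++;                          // one a[position-1] > temp test
            if (!(a[position - 1] > temp)) break;   // early exit: slot found
            a[position] = a[position - 1];
            position--;
        }
        a[position] = temp;
    }
    return comparisons;
}
```

A reversed list of 5 items costs 1 + 2 + 3 + 4 = 10 comparisons, while an already sorted list of 5 items costs only 4, one per outer-loop iteration.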
32Cost of Swaps
- Selection Sort
    void selectionSort(int a[], int size)
    {
        for (int k = 0; k < size - 1; k++)
        {
            int index = minimumIndex(a, k, size);
            swap(a[k], a[index]);
        }
    }
- One swap each time through the outer loop, for O(N) swaps
33Cost of Swaps
- Insertion Sort
    void insertionSort(int a[], int size)
    {
        for (int k = 1; k < size; k++)
        {
            int temp = a[k];
            int position = k;
            while (position > 0 && a[position - 1] > temp)
            {
                a[position] = a[position - 1];
                position--;
            }
            a[position] = temp;
        }
    }
- Do a shift almost every time we do a compare, so O(n^2) shifts
- Shifts are faster than swaps (1 step vs 3 steps)
- Are we doing few enough of them to make up the difference?
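One way to explore that question is to count the data moves directly. The sketch below assumes `std::vector<int>` inputs and counts one swap per outer-loop pass of selection sort and one shift per inner-loop step of insertion sort; both function names are hypothetical.

```cpp
#include <vector>

// Number of swaps selection sort performs on a copy of the input.
long countSelectionSwaps(std::vector<int> a)
{
    long swaps = 0;
    int size = static_cast<int>(a.size());
    for (int k = 0; k < size - 1; k++)
    {
        int minIndex = k;
        for (int j = k + 1; j < size; j++)
            if (a[j] < a[minIndex]) minIndex = j;
        int tmp = a[k];            // a swap costs three assignments
        a[k] = a[minIndex];
        a[minIndex] = tmp;
        swaps++;
    }
    return swaps;
}

// Number of shifts insertion sort performs on a copy of the input.
long countInsertionShifts(std::vector<int> a)
{
    long shifts = 0;
    int size = static_cast<int>(a.size());
    for (int k = 1; k < size; k++)
    {
        int temp = a[k];
        int position = k;
        while (position > 0 && a[position - 1] > temp)
        {
            a[position] = a[position - 1];   // a shift costs one assignment
            position--;
            shifts++;
        }
        a[position] = temp;
    }
    return shifts;
}
```

For the list 20 3 18 9 5, selection sort does 4 swaps and insertion sort does 7 shifts. On a reversed list of 5 items insertion sort needs 10 shifts, and on an already sorted list it needs none, so the answer depends on how disordered the input is.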
34Another Issue - Memory
- Space requirements for each sort?
- All of these sorts require the space to hold the array: O(N)
- Require temp variable for swaps
- Require a handful of counters
- Can all be done "in place", so equivalent in terms of memory costs
- Not all sorts can be done in place though!
35Which O(n^2) Sort to Use?
- Insertion sort is the winner
  - Worst case requires all comparisons
  - Most cases don't (jump out of while loop early)
- Selection sort uses for loops, which go all the way through each time
36Tradeoffs
- Given random data, when is it more efficient to
  - Just search, versus
  - Insertion sort and search
- Assume Z searches
- Search on random data: $Z \cdot O(n)$
- Sort and binary search: $O(n^2) + Z \log_2 n$
37Tradeoffs
- Just searching is cheaper when $Z \cdot n < n^2 + Z \log_2 n$
- $Z \cdot n - Z \log_2 n < n^2$
- $Z (n - \log_2 n) < n^2$
- $Z < n^2 / (n - \log_2 n)$
- For large n, $\log_2 n$ is dwarfed by n in $(n - \log_2 n)$
- $Z < n^2 / n$
- $Z < n$ (approximately)
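The break-even point can be evaluated numerically. The following is a sketch of the cost model above with all constant factors ignored, which is an assumption; the three function names are hypothetical.

```cpp
#include <cmath>

// Cost of Z linear searches over n unsorted items (constant factors ignored).
double linearSearchCost(double n, double z)
{
    return z * n;
}

// Cost of one O(n^2) sort followed by Z binary searches.
double sortThenSearchCost(double n, double z)
{
    return n * n + z * std::log2(n);
}

// Number of searches at which the two strategies cost the same:
// Z = n^2 / (n - log2 n).
double breakEvenSearches(double n)
{
    return n * n / (n - std::log2(n));
}
```

For n = 1024 the break-even value is slightly above 1024, in line with the approximation Z < n: with 100 searches the plain linear search is cheaper under this model, and with 2000 searches sorting first is cheaper.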
38Improving Sorts
- Better sorting algorithms rely on divide and conquer (recursion)
  - Find an efficient technique for splitting data
  - Sort the splits separately
  - Find an efficient technique for merging the data
- We'll see two examples
  - One does most of its work splitting
  - One does most of its work merging