Title: Sorting
- The efficiency of data handling can often be substantially increased if the data are sorted
  - For example, it is practically impossible to find a name in a telephone directory if the entries are not sorted
- To sort a set of items such as numbers or words, two properties must be considered:
  - The number of comparisons required to arrange the data
  - The number of data movements
- Depending on the sorting algorithm, the exact number of comparisons or movements may not always be easy to determine
- Therefore, the numbers of comparisons and movements are approximated with big-O notation
- Some sorting algorithms do more data movement than data comparison
- It is up to the programmer to decide which algorithm is more appropriate for a specific set of data
  - For example, if only small keys such as integers or characters are compared, then comparisons are relatively fast and inexpensive
  - But if large, complex objects must be compared, then comparisons can be quite costly
- On the other hand, if the data items being moved are large and the algorithm performs relatively many movements, then movement, rather than comparison, stands out as the determining factor
- Further, a simple method may be only 20% less efficient than a more elaborate algorithm
  - If sorting is used in a program only once in a while and only for small sets of data, then using a more complicated algorithm may not be desirable
  - However, if the data set is large, 20% can make a significant difference and should not be ignored
- Let's look at different sorting algorithms now
- Insertion Sort
  - Start with the first two elements of the array, data[0] and data[1]
    - If they are out of order, they are interchanged
  - Next, data[2] is considered and placed into its proper position
    - If data[2] is smaller than data[0], it is placed before data[0] by shifting data[0] and data[1] down by one position
    - Otherwise, if data[2] is between data[0] and data[1], we just need to shift data[1] down and place data[2] in the second position
    - Otherwise, data[2] remains where it is in the array
  - Next, data[3] is considered and the same process repeats
  - And so on
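Algorithm and code for insertion sort
InsertionSort(data[], n)
    for (i = 1; i < n; i++)
        move all elements data[j] greater than data[i] by one position;
        place data[i] in its proper position;

template<class T>
void InsertionSort(T data[], int n) {
    for (int i = 1; i < n; i++) {
        T tmp = data[i];
        int j;   // declared outside the inner loop because it is used after it
        // move every element greater than tmp one position down
        for (j = i; j > 0 && tmp < data[j-1]; j--)
            data[j] = data[j-1];
        data[j] = tmp;   // place tmp in its proper position
    }
}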
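Example of Insertion Sort
(The array starts as 5 2 3 8 1; positions are counted from 1.)
- tmp = 2: move 5 down and put tmp = 2 in position 1, giving 2 5 3 8 1
- tmp = 3: move 5 down and put tmp = 3 in position 2, giving 2 3 5 8 1
- tmp = 8: since 5 is less than 8, no shifting is required, giving 2 3 5 8 1
- tmp = 1: move 8, 5, 3 and 2 down and put tmp = 1 in position 1, giving 1 2 3 5 8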
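To check the passes of the example, here is a minimal sketch that records the array after every pass of insertion sort. It assumes int elements in a std::vector and the starting array 5 2 3 8 1 implied by the steps above; the name insertionSortTrace is hypothetical, not part of the lecture code.

```cpp
#include <cstddef>
#include <vector>

// Sketch (assumes int elements): insertion sort that returns a snapshot of
// the array after each pass. insertionSortTrace is a hypothetical name.
std::vector<std::vector<int>> insertionSortTrace(std::vector<int> data) {
    std::vector<std::vector<int>> passes;
    for (std::size_t i = 1; i < data.size(); ++i) {
        int tmp = data[i];
        std::size_t j = i;
        // move every element greater than tmp down by one position
        while (j > 0 && tmp < data[j - 1]) {
            data[j] = data[j - 1];
            --j;
        }
        data[j] = tmp;            // put tmp in its proper position
        passes.push_back(data);   // record the array after this pass
    }
    return passes;
}
```

Calling insertionSortTrace({5, 2, 3, 8, 1}) returns four snapshots, one per value of tmp, the last being 1 2 3 5 8.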
- Advantage of insertion sort
  - If the data are already sorted, they remain sorted and essentially no movement is necessary
- Disadvantage of insertion sort
  - An item that is already in its right place may have to be moved temporarily in one iteration and then moved back into its original place
- Complexity of Insertion Sort
  - Best case: this happens when the data are already sorted. It takes O(n) to go through the elements
  - Worst case: this happens when the data are in reverse order; then for the ith item, i-1 movements are necessary
    - Total movement = 1 + 2 + ... + (n-1) = n(n-1)/2, which is O(n^2)
  - The average case is approximately half of the worst case, which is still O(n^2)
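The best-case and worst-case movement counts can be checked with a small counter. This sketch assumes int elements and counts one "movement" each time an element is shifted down by one position; the name insertionSortShifts is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Sketch (assumes int elements): counts how many times insertion sort shifts
// an element down by one position. insertionSortShifts is a hypothetical name.
long long insertionSortShifts(std::vector<int> data) {
    long long shifts = 0;
    for (std::size_t i = 1; i < data.size(); ++i) {
        int tmp = data[i];
        std::size_t j = i;
        while (j > 0 && tmp < data[j - 1]) {
            data[j] = data[j - 1];   // one movement
            ++shifts;
            --j;
        }
        data[j] = tmp;
    }
    return shifts;
}
```

For the numbers 1 to 10 in sorted order this returns 0 (best case), and for the same numbers in reverse order it returns 45, which is n(n-1)/2 with n = 10.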
- Selection Sort
  - Select the minimum in the array and swap it with the first element
  - Then select the second minimum in the array (the minimum of the remaining elements) and swap it with the second element
  - And so on until everything is sorted
Algorithm and code for selection sort
SelectionSort(data[], n)
    for (i = 0; i < n-1; i++)
        select the smallest element among data[i], ..., data[n-1];
        swap it with data[i];

#include <utility>   // std::swap

template<class T>
void SelectionSort(T data[], int n) {
    for (int i = 0; i < n-1; i++) {
        int least = i;
        // find the index of the smallest element in data[i..n-1]
        for (int j = i+1; j < n; j++)
            if (data[j] < data[least])
                least = j;
        std::swap(data[least], data[i]);
    }
}
Example of Selection Sort
- The first minimum is searched for in the entire array; it is 1. Swap 1 with the first position
- The second minimum is 2. Swap it with the second position
- The third minimum is 3. Swap it with the third position
- The fourth minimum is 5. Swap it with the fourth position
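The slides do not show the array used in this example, so the sketch below assumes the input 5 2 3 8 1, which makes the minima come out as 1, 2, 3 and 5 as described. It records the array after each pass; the name selectionSortTrace is hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch (assumes int elements): selection sort that returns the array after
// each pass. selectionSortTrace is a hypothetical name.
std::vector<std::vector<int>> selectionSortTrace(std::vector<int> data) {
    std::vector<std::vector<int>> passes;
    for (std::size_t i = 0; i + 1 < data.size(); ++i) {
        std::size_t least = i;
        // find the smallest element among data[i] .. data[n-1]
        for (std::size_t j = i + 1; j < data.size(); ++j)
            if (data[j] < data[least])
                least = j;
        std::swap(data[least], data[i]);   // may swap an element with itself
        passes.push_back(data);
    }
    return passes;
}
```

With this input, passes 2 and 3 swap 2 and 3 with themselves, since they are already in place; only passes 1 and 4 change the array.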
- Complexity of Selection Sort
  - The number of comparisons and/or movements is the same in each case (best case, average case and worst case)
  - The number of comparisons is
    - Total = (n-1) + (n-2) + (n-3) + ... + 1 = n(n-1)/2
    - which is O(n^2)
- Bubble Sort
  - Start from the bottom (the end of the array) and move the required elements up (i.e. bubble the elements up)
  - Two adjacent elements are interchanged if they are found to be out of order with respect to each other
    - First data[n-1] and data[n-2] are compared and swapped if they are not in order
    - Then data[n-2] and data[n-3] are swapped if they are not in order
    - And so on
Algorithm and code for bubble sort
BubbleSort(data[], n)
    for (i = 0; i < n-1; i++)
        for (j = n-1; j > i; --j)
            swap elements in positions j and j-1 if they are out of order;

#include <utility>   // std::swap

template<class T>
void BubbleSort(T data[], int n) {
    for (int i = 0; i < n-1; i++)
        // bubble the smallest remaining element up to position i
        for (int j = n-1; j > i; --j)
            if (data[j] < data[j-1])
                std::swap(data[j], data[j-1]);
}
Example of Bubble Sort
- Iteration 1: start from the last element up to the first element and bubble the smaller elements up
- Iteration 2: start from the last element up to the second element and bubble the smaller elements up
- Iteration 3: start from the last element up to the third element and bubble the smaller elements up
- Iteration 4: start from the last element up to the fourth element and bubble the smaller elements up
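The four iterations can be traced with a short sketch. The slides' example array is not shown, so the input 5 2 3 8 1 used here is an assumption; the name bubbleSortTrace is hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch (assumes int elements): bubble sort that returns the array after
// each iteration. bubbleSortTrace is a hypothetical name.
std::vector<std::vector<int>> bubbleSortTrace(std::vector<int> data) {
    std::vector<std::vector<int>> iterations;
    for (std::size_t i = 0; i + 1 < data.size(); ++i) {
        // walk from the last element up to position i+1,
        // swapping adjacent elements that are out of order
        for (std::size_t j = data.size() - 1; j > i; --j)
            if (data[j] < data[j - 1])
                std::swap(data[j], data[j - 1]);
        iterations.push_back(data);   // smallest remaining element is now at i
    }
    return iterations;
}
```

With this input, iteration 1 bubbles 1 all the way to the front (1 5 2 3 8), and by iteration 3 the array is already sorted, although iteration 4 still performs its comparison.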
- Complexity of Bubble Sort
  - The number of comparisons is the same in each case (best case, average case and worst case); the number of swaps depends on the data, from none for sorted data to n(n-1)/2 for data in reverse order
  - The number of comparisons is
    - Total = (n-1) + (n-2) + (n-3) + ... + 1 = n(n-1)/2
    - which is O(n^2)
- Comparing bubble sort with insertion and selection sorts, we can say that:
  - For the average case, bubble sort makes approximately twice as many comparisons as insertion sort and the same number of moves
  - Bubble sort also, on average, makes as many comparisons as selection sort and n times more moves than selection sort
  - Among these three sorts, insertion sort is generally the best algorithm, because if the array is already sorted its running time is only O(n), which is faster than the other two algorithms
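The comparison can be made concrete by instrumenting the three sorts. In this sketch a "move" is assumed to be one element assignment, so a swap counts as three moves; the struct SortCounts and the count* function names are hypothetical. The counts illustrate one input only, not the average case.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch: comparison and move counters. Assumption: a move is one element
// assignment, so a swap counts as three moves. All names are hypothetical.
struct SortCounts {
    long long comparisons = 0;
    long long moves = 0;
};

SortCounts countInsertion(std::vector<int> a) {
    SortCounts c;
    for (std::size_t i = 1; i < a.size(); ++i) {
        int tmp = a[i]; ++c.moves;
        std::size_t j = i;
        while (j > 0) {
            ++c.comparisons;
            if (!(tmp < a[j - 1])) break;
            a[j] = a[j - 1]; ++c.moves;
            --j;
        }
        a[j] = tmp; ++c.moves;
    }
    return c;
}

SortCounts countSelection(std::vector<int> a) {
    SortCounts c;
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t least = i;
        for (std::size_t j = i + 1; j < a.size(); ++j) {
            ++c.comparisons;
            if (a[j] < a[least]) least = j;
        }
        std::swap(a[least], a[i]); c.moves += 3;
    }
    return c;
}

SortCounts countBubble(std::vector<int> a) {
    SortCounts c;
    for (std::size_t i = 0; i + 1 < a.size(); ++i)
        for (std::size_t j = a.size() - 1; j > i; --j) {
            ++c.comparisons;
            if (a[j] < a[j - 1]) { std::swap(a[j], a[j - 1]); c.moves += 3; }
        }
    return c;
}
```

For the 10 values 7 3 9 1 6 2 8 5 4 0, which contain 29 out-of-order pairs, selection and bubble sort both make 45 = n(n-1)/2 comparisons, while selection sort makes 27 moves and bubble sort 87 (three per swap, one swap per out-of-order pair).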
- Shell Sort
  - Shell sort works on the idea that it is easier and faster to sort many short lists than to sort one large list
  - Select an increment value k (the best value for k is not necessarily clear)
  - Sort the sequence consisting of every kth element (use some simple sorting technique)
  - Decrease k and repeat the above step until k = 1
Example of Shell Sort
- Choose k = 4 first and sort each sub-array of elements that are 4 positions apart
- Now choose k = 2, and then k = 1, each time sorting the sub-arrays by insertion sort
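The example's sequence of increments (4, then 2, then 1) can be sketched directly. This version assumes int elements and fixed increments, and its name shellSort421 is hypothetical. Instead of sorting each sub-array separately, it walks the array once per increment and inserts every element into its own sub-array; because elements of different sub-arrays are never compared, the result of each pass is the same.

```cpp
#include <cstddef>
#include <vector>

// Sketch (assumes int elements and the fixed increments 4, 2, 1 of the
// example). shellSort421 is a hypothetical name.
void shellSort421(std::vector<int>& data) {
    const std::size_t increments[] = {4, 2, 1};
    for (std::size_t k : increments) {
        // insertion sort on every sub-array data[s], data[s+k], data[s+2k], ...
        for (std::size_t i = k; i < data.size(); ++i) {
            int tmp = data[i];
            std::size_t j = i;
            while (j >= k && tmp < data[j - k]) {
                data[j] = data[j - k];   // shift within the sub-array
                j -= k;
            }
            data[j] = tmp;
        }
    }
}
```

The last pass uses k = 1, which is ordinary insertion sort, so the array always ends up sorted; the earlier passes only reduce how far elements have to move in that final pass.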
Algorithm of shell sort
ShellSort(data[], n)
    determine numbers h_t, h_(t-1), ..., h_1 of ways of dividing array data into subarrays;
    for (h = h_t; t > 1; t--, h = h_t)
        divide data into h subarrays;
        for (i = 1; i <= h; i++)
            sort subarray data_i;
    sort array data;
- Complexity of shell sort
  - Shell sort works well on data that is almost sorted: O(n log^2 n)
  - Deeper analysis of Shell sort is quite difficult
  - It can be shown that, with increments such as the 3h + 1 sequence used in the code, Shell sort is O(n^(3/2))
Code for shell sort
template<class T>
void ShellSort(T data[], int arrSize) {
    int i, j, hCnt, h, k;
    int increments[20];
    // create an appropriate number of increments h (1, 4, 13, 40, ...)
    for (h = 1, i = 0; h < arrSize; i++) {
        increments[i] = h;
        h = 3*h + 1;
    }
    // loop on the number of different increments h, largest first
    for (i--; i >= 0; i--) {
        h = increments[i];
        // loop on the number of sub-arrays h-sorted in the ith pass
        for (hCnt = h; hCnt < 2*h; hCnt++) {
            // insertion sort for the sub-array containing every hth element of data
            for (j = hCnt; j < arrSize; j += h) {
                T tmp = data[j];
                k = j;
                while (k-h >= 0 && tmp < data[k-h]) {
                    data[k] = data[k-h];
                    k -= h;
                }
                data[k] = tmp;
            }
        }
    }
}
- Heap Sort
  - Heap sort uses a heap as described in earlier lectures
  - As we said before, a heap is a binary tree with the following two properties:
    - The value of each node is not less than the values stored in each of its children
    - The tree is perfectly balanced, and the leaves in the last level are all in the leftmost positions
- The procedure is:
  - The data are transformed into a heap first
    - After this step the data are not necessarily sorted; however, we know that the largest element is at the root
  - Thus, starting with a heap:
    - Swap the root with the last element
    - Restore all elements except the last one into a heap again
    - Repeat the process for all elements until you are done
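Algorithm and Code for Heap sort
HeapSort(data[], n)
    transform data into a heap;
    for (i = n-1; i >= 1; i--)
        swap the root with the element in position i;
        restore the heap property for the tree data[0], ..., data[i-1];

#include <utility>   // std::swap

// MoveDown(data, first, last), not shown in this lecture, restores the heap
// property for data[first..last]; it must be declared before this template
template<class T>
void HeapSort(T data[], int size) {
    for (int i = size/2 - 1; i >= 0; --i)
        MoveDown(data, i, size-1);        // creates the heap
    for (int i = size-1; i >= 1; --i) {
        std::swap(data[0], data[i]);      // move the largest item to data[i]
        MoveDown(data, 0, i-1);           // restores the heap
    }
}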
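The HeapSort code calls MoveDown, but the lecture does not show its body. Below is a minimal sketch of what such a helper could look like, assuming the array layout used by HeapSort (data[0] is the root and the children of data[p] are data[2p+1] and data[2p+2]); this is one possible implementation, not necessarily the course's own.

```cpp
#include <utility>

// Sketch of the MoveDown helper assumed by HeapSort: restores the heap
// property for the subtree rooted at data[first], looking only at
// data[first] .. data[last]. Children of data[p] are data[2p+1], data[2p+2].
template <class T>
void MoveDown(T data[], int first, int last) {
    int largest = 2 * first + 1;                   // left child
    while (largest <= last) {
        // pick the larger child, if a right child exists
        if (largest < last && data[largest] < data[largest + 1])
            ++largest;
        if (data[first] < data[largest]) {         // parent smaller than child:
            std::swap(data[first], data[largest]); // move it down one level
            first = largest;
            largest = 2 * first + 1;
        } else {
            break;                                 // heap property holds
        }
    }
}
```

Calling MoveDown(data, i, n-1) for i = n/2-1 down to 0 builds the heap (the first phase), and MoveDown(data, 0, i-1) after each swap restores it (the second phase). Because data[] of a built-in type has no associated namespace, this function must be declared before the HeapSort template for the call inside it to be found.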
Example of Heap Sort
- We first transform the data into a heap
- The initial tree is formed from the array as follows: data[0] is the root, and the children of data[i] are data[2i+1] and data[2i+2]
- We then turn this tree into a heap
- Now we start to sort the elements:
  - Swap the root with the last element
  - Restore the heap
  - Repeat these two steps, each time leaving out the element just moved to the end, until only the root remains
- Finally, place the elements into the array using a breadth-first traversal of the tree; the array is now sorted
- Complexity of heap sort
  - Heap sort requires a lot of movement, which can be inefficient for large objects
  - In the second phase, when we sort the elements while keeping the heap, we swap the root with the element in position i n-1 times and also restore the heap n-1 times, which takes O(n log n)
  - In general:
    - The first phase, where we turn the array into a heap, requires O(n) steps
    - The second phase, where we sort the elements, requires O(n-1) swaps and O(n log n) operations to restore the heap
    - Total: O(n) + O(n log n) + O(n-1) = O(n log n)
- Quick Sort
  - This is widely regarded as one of the fastest sorting methods in practice
  - In this scheme:
    - One of the elements in the array is chosen as the pivot
    - Then the array is divided into two sub-arrays
      - The elements smaller than the pivot go into one sub-array
      - The elements bigger than the pivot go into the other sub-array
      - The pivot goes between these two sub-arrays
    - Then each sub-array is partitioned the same way as the original array, and the process repeats recursively
Algorithm of quick sort
QuickSort(array[])
    if length(array) > 1
        choose a pivot;
        // partition array into array1 and array2
        while there are elements left in array
            include element either in array1   // if element <= pivot
            or in array2                       // if element >= pivot
        QuickSort(array1);
        QuickSort(array2);
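The pseudocode can be translated almost line by line if array1 and array2 are separate containers. This sketch assumes int elements in std::vector, uses the middle element as the pivot, and sends elements equal to the pivot into array2; the name quickSortCopy is hypothetical. It uses extra memory for every call, unlike the in-place code that follows.

```cpp
#include <cstddef>
#include <vector>

// Sketch (assumes int elements): out-of-place quicksort following the
// pseudocode. quickSortCopy is a hypothetical name.
std::vector<int> quickSortCopy(const std::vector<int>& array) {
    if (array.size() <= 1)
        return array;
    const std::size_t mid = array.size() / 2;
    const int pivot = array[mid];              // choose a pivot (middle element)
    std::vector<int> array1, array2;
    for (std::size_t i = 0; i < array.size(); ++i) {
        if (i == mid) continue;                // the pivot itself goes in the middle
        if (array[i] < pivot) array1.push_back(array[i]);
        else                  array2.push_back(array[i]);   // >= pivot
    }
    std::vector<int> result = quickSortCopy(array1);
    result.push_back(pivot);
    const std::vector<int> right = quickSortCopy(array2);
    result.insert(result.end(), right.begin(), right.end());
    return result;
}
```

Each call removes the pivot before recursing, so the sub-arrays always shrink and the recursion terminates even when many elements are equal.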
- Complexity of quick sort
  - The best case is when the arrays are always partitioned into equal halves
  - For the best case, the running time is O(n log n)
  - The running time for the average case is also O(n log n)
  - The worst case happens if the pivot is always either the smallest or the largest element in the array
  - In the worst case, the running time grows to O(n^2)
Code for quick sort
#include <utility>   // std::swap

template<class T>
void quicksort(T data[], int first, int last) {
    int lower = first+1, upper = last;
    std::swap(data[first], data[(first+last)/2]);  // middle element becomes the pivot
    T pivot = data[first];
    while (lower <= upper) {
        while (data[lower] < pivot)
            lower++;
        while (pivot < data[upper])
            upper--;
        if (lower < upper)
            std::swap(data[lower++], data[upper--]);
        else lower++;
    }
    std::swap(data[upper], data[first]);  // pivot goes to its final position
    if (first < upper-1)
        quicksort(data, first, upper-1);
    if (upper+1 < last)
        quicksort(data, upper+1, last);
}

template<class T>
void quicksort(T data[], int n) {
    if (n < 2)
        return;
    int max = 0;                          // put the largest element at the end,
    for (int i = 1; i < n; i++)           // where it stops the scans as a sentinel
        if (data[max] < data[i])
            max = i;
    std::swap(data[n-1], data[max]);
    quicksort(data, 0, n-2);
}
Example of Quick Sort
- Select a pivot (here 65)
- Partition the remaining elements into those smaller than 65 and those larger than 65
- Recursively apply quicksort to both partitions
- The result will ultimately be a sorted array:
  0 13 26 31 43 57 65 75 81 92
- Radix Sort
  - Radix refers to the base of the number system. For example, the radix for decimal numbers is 10, for hex numbers it is 16, and for the English alphabet it is 26
  - Radix sort has been called bin sort in the past
    - The name bin sort comes from mechanical devices that were used to sort keypunched cards
    - Cards would be directed into bins and returned to the deck in a new order, and then redirected into bins again
  - For integer data, the repeated passes of a radix sort focus on the ones place value, then on the tens place value, then on the hundreds place value, etc.
  - For character-based data, focus would be placed on the right-most character, then on the second right-most character, etc.
Algorithm and Code for Radix Sort (assuming the numbers to be sorted are all decimal integers)
RadixSort(array[])
    for (d = 1; d <= the position of the leftmost digit of the longest number; d++)
        distribute all numbers among piles 0 through 9 according to the dth digit;
        put all integers on one list;
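#include <queue>   // std::queue stands in for the lecture's Queue class

void radixsort(long data[], int n) {   // assumes non-negative keys
    const int radix = 10;     // because digits go from 0 to 9
    const int digits = 10;    // at most 10 digits per key
    std::queue<long> queues[radix];
    long long factor = 1;     // long long so that factor *= radix cannot overflow
    for (int i = 0; i < digits; i++, factor *= radix) {
        for (int j = 0; j < n; j++)    // distribute by the current digit
            queues[(data[j] / factor) % radix].push(data[j]);
        for (int j = 0, k = 0; j < radix; j++)   // collect piles 0..9 in order
            while (!queues[j].empty()) {
                data[k++] = queues[j].front();
                queues[j].pop();
            }
    }
}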
- Example of Radix Sort
  - Assume the data are:
    - 459 254 472 534 649 239 432 654 477
  - Radix sort will arrange the values into 10 bins based upon the ones place value:
      bin 2: 472 432
      bin 4: 254 534 654
      bin 7: 477
      bin 9: 459 649 239
      (bins 0, 1, 3, 5, 6 and 8 are empty)
  - The sublists are collected and made into one large bin (in the order given):
    - 472 432 254 534 654 477 459 649 239
  - Then radix sort will arrange the values into 10 bins based upon the tens place value:
      bin 3: 432 534 239
      bin 4: 649
      bin 5: 254 654 459
      bin 7: 472 477
      (bins 0, 1, 2, 6, 8 and 9 are empty)
  - The sublists are collected and made into one large bin (in the order given):
    - 432 534 239 649 254 654 459 472 477
  - Radix sort will arrange the values into 10 bins based upon the hundreds place value (done!):
      bin 2: 239 254
      bin 4: 432 459 472 477
      bin 5: 534
      bin 6: 649 654
      (bins 0, 1, 3, 7, 8 and 9 are empty)
  - The sublists are collected and the numbers are sorted:
    - 239 254 432 459 472 477 534 649 654
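The three passes of this example can be reproduced with a sketch that records the collected list after each pass. It assumes non-negative integers and a known number of digits, and uses std::queue for the bins; the name radixSortTrace is hypothetical.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Sketch (assumes non-negative keys with at most `digits` decimal digits):
// LSD radix sort with ten queues that returns the list collected after each
// pass (ones, tens, hundreds, ...). radixSortTrace is a hypothetical name.
std::vector<std::vector<long>> radixSortTrace(std::vector<long> data, int digits) {
    std::vector<std::vector<long>> passes;
    std::queue<long> bins[10];
    long factor = 1;
    for (int d = 0; d < digits; ++d, factor *= 10) {
        for (long x : data)
            bins[(x / factor) % 10].push(x);   // distribute by the current digit
        std::size_t k = 0;
        for (int b = 0; b < 10; ++b)           // collect bins 0..9 in order
            while (!bins[b].empty()) {
                data[k++] = bins[b].front();
                bins[b].pop();
            }
        passes.push_back(data);
    }
    return passes;
}
```

Calling radixSortTrace({459, 254, 472, 534, 649, 239, 432, 654, 477}, 3) returns the three collected lists shown above, the last one being the sorted data. Because each queue is first-in first-out, values that share a digit keep their order from the previous pass, which is what makes the later passes preserve the work of the earlier ones.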
- Another Example of Radix Sort
  - Assume the data are:
    - 9 54 472 534 39 43 654 77
  - To make it simple, rewrite the numbers so that they all have three digits:
    - 009 054 472 534 039 043 654 077
  - Radix sort will arrange the values into 10 bins based upon the ones place value:
      bin 2: 472
      bin 3: 043
      bin 4: 054 534 654
      bin 7: 077
      bin 9: 009 039
      (bins 0, 1, 5, 6 and 8 are empty)
  - The sublists are collected and made into one large bin (in the order given):
    - 472 043 054 534 654 077 009 039
  - Then radix sort will arrange the values into 10 bins based upon the tens place value:
      bin 0: 009
      bin 3: 534 039
      bin 4: 043
      bin 5: 054 654
      bin 7: 472 077
      (bins 1, 2, 6, 8 and 9 are empty)
  - The sublists are collected and made into one large bin (in the order given):
    - 009 534 039 043 054 654 472 077
  - Radix sort will arrange the values into 10 bins based upon the hundreds place value (done!):
      bin 0: 009 039 043 054 077
      bin 4: 472
      bin 5: 534
      bin 6: 654
      (bins 1, 2, 3, 7, 8 and 9 are empty)
  - The sublists are collected and the numbers are sorted:
    - 009 039 043 054 077 472 534 654
- Assume the data are:
  - area book close team new place print
- To sort these elements using radix sort, you need 26 buckets, one for each letter
- You also need one more character to represent a space, which has the lowest value. Suppose that the question mark ? is used to represent a space
- You can rewrite the data as follows:
  - area? book? close team? new?? place print
- Now all words have 5 characters, and it is easy to compare them with each other
- To do the sorting, start from the right-most character, place the data into the appropriate buckets and collect them. Then place them into buckets based on the second right-most character and collect them again, and so on
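The word procedure can be sketched with 27 buckets: bucket 0 for the padding character ? and buckets 1 to 26 for a to z. This sketch assumes the words contain only lowercase letters a to z; the name radixSortWords is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch (assumes words contain only lowercase letters 'a'..'z'):
// LSD radix sort on words padded with '?', which stands for a space and
// sorts before 'a'. radixSortWords is a hypothetical name.
std::vector<std::string> radixSortWords(std::vector<std::string> words) {
    std::size_t width = 0;
    for (const std::string& w : words)
        if (w.size() > width) width = w.size();
    for (std::string& w : words)
        w.resize(width, '?');                     // pad to equal length

    for (std::size_t pos = width; pos-- > 0;) {   // right-most character first
        std::vector<std::vector<std::string>> buckets(27);
        for (const std::string& w : words) {
            char c = w[pos];
            buckets[c == '?' ? 0 : c - 'a' + 1].push_back(w);
        }
        words.clear();
        for (const auto& bucket : buckets)        // collect buckets in order
            words.insert(words.end(), bucket.begin(), bucket.end());
    }
    for (std::string& w : words)                  // remove the padding again
        w.erase(w.find_last_not_of('?') + 1);
    return words;
}
```

For instance, sorting area, book, close, team, new, place and print this way gives area, book, close, new, place, print, team. Each bucket keeps the order in which words arrive, so ties on the current character are settled by the characters processed in earlier passes.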
- Complexity of Radix Sort
  - The complexity is O(n)
  - However, the key size (for example, the maximum number of digits) is a factor, but the relationship is still linear: for example, for at most 3 digits, 3n is still O(n)
  - Although theoretically O(n) is an impressive running time for a sort, it does not include the cost of the queue implementation
  - Further, if the radix r (the base) is a large number and a large amount of data has to be sorted, then radix sort requires r queues of size at most n, and the space needed, O(rn), can be substantially large depending on the size of r