Title: Sorting in Linear Time
1Sorting in Linear Time
- CS 583
- Analysis of Algorithms
2Outline
- Comparison Sort Algorithms
- Lower Bounds for Sorting
- Counting Sort
- Order Statistics
- Minimum and Maximum
- Selection
3Comparison Sorts
- We have seen several algorithms that can sort n numbers in O(n lg n) time.
- Merge sort and heapsort achieve this upper bound in the worst case; quicksort achieves it on average.
- These algorithms share one property: the sorted order they determine is based only on comparisons between the input elements. Such sorting algorithms are called comparison sorts.
- We prove that any comparison sort must make Ω(n lg n) comparisons in the worst case to sort n elements.
4Comparison Sort Decision Tree
We assume without loss of generality that all input elements are distinct. In this case, every comparison can be taken to have one form, for example a_i ≤ a_j. Comparison sorts can be viewed in terms of decision trees. For example, the decision tree for insertion sort on three elements begins as follows (a node i:j compares a_i with a_j; the left branch is ≤, the right branch is >):

                  1:2
              ≤ /     \ >
             2:3       1:3
         ≤ /    \ >     ...
    ⟨1,2,3⟩     1:3
             ≤ /    \ >
        ⟨1,3,2⟩    ⟨3,1,2⟩
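The three-element tree can be checked by brute force. The sketch below is illustrative only: the function names `insertion_sort_trace` and `decision_tree_leaves` are assumptions, not part of the slides. It runs an adjacent-swap insertion sort on every permutation and records each comparison outcome, so each tuple of outcomes is one root-to-leaf path. Node labels are not reproduced, only the branch taken at each node.

```python
from itertools import permutations


def insertion_sort_trace(values):
    """Insertion-sort a copy of `values`; return the tuple of comparison outcomes.

    Each outcome is True when the test a[i-1] <= a[i] succeeds, so the tuple
    spells out one root-to-leaf path in the decision tree.
    """
    a = list(values)
    trace = []
    for j in range(1, len(a)):
        i = j
        while i > 0:
            in_order = a[i - 1] <= a[i]
            trace.append(in_order)
            if in_order:
                break
            a[i - 1], a[i] = a[i], a[i - 1]
            i -= 1
    return tuple(trace)


def decision_tree_leaves(n):
    """Map each root-to-leaf path to the input permutations that follow it."""
    leaves = {}
    for perm in permutations(range(1, n + 1)):
        leaves.setdefault(insertion_sort_trace(perm), []).append(perm)
    return leaves


leaves = decision_tree_leaves(3)
print(len(leaves))                        # number of reachable leaves
print(max(len(path) for path in leaves))  # height of the tree
```

For n = 3 this reports 6 reachable leaves (one per permutation) and a longest path of 3 comparisons.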
5Comparison Sort Decision Tree (cont.)
- In a decision tree, each internal node is annotated by i:j for some i and j in the range 1 ≤ i, j ≤ n.
- Each leaf is annotated by a permutation ⟨π(1), ..., π(n)⟩.
- The execution of the sorting algorithm corresponds to tracing a path from the root to a leaf.
- Any correct sorting algorithm must be able to produce each permutation of its input, hence each of the n! permutations must appear as a "reachable" leaf of the decision tree.
6Lower Bound for Comparison Sorts
Theorem 8.1. Any comparison sort algorithm requires Ω(n lg n) comparisons in the worst case.
Proof. The length of the longest path from the root of a decision tree to any of its reachable leaves represents the worst-case number of comparisons that the sorting algorithm performs. Hence, we need to bound the height of a decision tree in which each permutation appears as a reachable leaf.
7Lower Bound for Comparison Sorts (cont.)
Consider a decision tree of height h with l reachable leaves. Since each permutation appears as a leaf, we have n! ≤ l. A binary tree of height h has no more than 2^h leaves, so
n! ≤ l ≤ 2^h  ⇒  h ≥ lg(n!)
lg(n!) = Θ(n lg n) (see 3.18)  ⇒  h = Ω(n lg n) ∎
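To get a feel for the bound h ≥ lg(n!), the following sketch computes the smallest integer h with 2^h ≥ n! for a few values of n and prints it next to n lg n. The function name `comparison_lower_bound` is an assumption made for illustration.

```python
import math


def comparison_lower_bound(n):
    """Return ceil(lg(n!)): the smallest h such that 2**h >= n!.

    For an integer m >= 1, ceil(lg m) equals (m - 1).bit_length(),
    which avoids floating-point rounding for large factorials.
    """
    return (math.factorial(n) - 1).bit_length()


for n in (3, 4, 5, 10, 100):
    bound = comparison_lower_bound(n)
    print(n, bound, round(n * math.log2(n), 1))
```

No comparison sort can sort every input of size n with fewer worst-case comparisons than the middle column shows; for n = 3 the bound is 3, which matches the height of the three-element decision tree.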
8Counting Sort
- This algorithm assumes that each of the n input elements is an integer in the range 0 to k.
- When k = O(n), the sort runs in Θ(n) time.
- The basic idea is to determine, for each input element x, the number of elements less than x.
- This information can be used to place x directly into its position in the output array.
- The algorithm requires an input array A, an output array B, and an intermediate (counting) array C.
9Counting Sort Example
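n = 5, k = 2
A (input):           2 1 0 2 2
C after steps 1-4:   index  0 1 2
                     value  1 1 3
C after loop 6:      index  0 1 2
                     value  1 2 5
B in loop 9:         index  1 2 3 4 5
                     values are placed in the order 2, 2, 0, 1, 2 (i = 5 down to 1), giving B = 0 1 2 2 2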
10Counting Sort Pseudocode
Counting-Sort(A, B, n, k)
1  for i = 0 to k
2      C[i] = 0
3  for i = 1 to n
4      C[A[i]]++
5  // C[i] contains the number of elements = i
6  for i = 1 to k
7      C[i] = C[i] + C[i-1]
8  // C[i] now contains the number of elements ≤ i
9  for i = n to 1
10     B[C[A[i]]] = A[i]
11     C[A[i]]--
12 return
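A minimal Python sketch of the same procedure follows. It is an adaptation, not the slide's exact code: Python lists are 0-indexed, so the count is decremented before the element is placed, and the function returns a new list instead of filling a caller-supplied B. The name `counting_sort` and its signature are assumptions.

```python
def counting_sort(a, k):
    """Return a sorted copy of `a`, whose items are integers in the range 0..k."""
    count = [0] * (k + 1)

    # count[v] = number of elements equal to v
    for x in a:
        count[x] += 1

    # count[v] = number of elements less than or equal to v
    for v in range(1, k + 1):
        count[v] += count[v - 1]

    # Walk the input from the end so equal values keep their input order.
    out = [None] * len(a)
    for x in reversed(a):
        count[x] -= 1          # 0-indexed slot for this occurrence of x
        out[count[x]] = x
    return out


print(counting_sort([2, 1, 0, 2, 2], k=2))
```

Run on the example input with n = 5 and k = 2, the counts become 1, 1, 3 and then 1, 2, 5 before the placement loop.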
11Counting Sort Performance
- After loop 6, C[i] contains the number of elements ≤ i, which is the last position in the output array occupied by an element with value i. At each iteration of loop 9, when an element with value i is placed into the output array, C[i] is decremented, so the next element with value i will be placed just before the current one.
- To calculate the running time, observe that the number of operations is (k+1) (loop 1) + n + k + n = Θ(k+n). When k = O(n), the running time is Θ(n).
- An important quality of counting sort is that it is stable: numbers with the same value appear in the output array in the same order as they do in the input array. The property of stability is important when satellite data are carried around with the key.
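Stability is easiest to see when each key carries satellite data. The sketch below is a hypothetical variant (the name `counting_sort_by_key` and the `key` parameter are assumptions) that sorts whole records by an integer key in 0..k; records with equal keys come out in their input order.

```python
def counting_sort_by_key(records, k, key):
    """Stable sort of `records` by key(record), an integer in the range 0..k."""
    count = [0] * (k + 1)
    for rec in records:
        count[key(rec)] += 1
    for v in range(1, k + 1):
        count[v] += count[v - 1]

    out = [None] * len(records)
    # Scanning from the end places the last record with a given key
    # in the last slot reserved for that key, which preserves input order.
    for rec in reversed(records):
        v = key(rec)
        count[v] -= 1
        out[count[v]] = rec
    return out


records = [(2, "a"), (1, "b"), (0, "c"), (2, "d"), (2, "e")]
print(counting_sort_by_key(records, k=2, key=lambda rec: rec[0]))
```

The three records with key 2 appear in the output as "a", "d", "e", the same relative order as in the input.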
12Order Statistics
- The ith order statistic of a set of n elements is the ith smallest element.
- The minimum element is the first order statistic (i = 1).
- The maximum element is the last order statistic (i = n).
- A median is the halfway point of the set (i = (n+1)/2 when n is odd).
- The selection problem is finding the ith order statistic of a set of n distinct numbers.
- It can be solved in O(n lg n) time by sorting the elements and then selecting the ith element from the sorted array.
- The fastest algorithm runs in O(n) time in the worst case.
13Minimum/Maximum Pseudocode
MINIMUM(A)
1 min = A[1]
2 for i = 2 to length[A]
3     if A[i] < min
4         min = A[i]
5 return min
- The above algorithm makes n-1 comparisons. Finding the maximum can be accomplished with n-1 comparisons as well:
MAXIMUM(A)
1 max = A[1]
2 for i = 2 to length[A]
3     if A[i] > max
4         max = A[i]
5 return max
14Simultaneous Minimum and Maximum
MINMAX(A)
1  min = A[1]
2  max = A[1]
3  i = 2
4  while (i ≤ length[A])
5      if (i+1) > length[A]
6          x_min = A[i]; x_max = A[i]
7      else
8          if A[i] < A[i+1]
9              x_min = A[i]; x_max = A[i+1]
10         else
11             x_min = A[i+1]; x_max = A[i]
12     if x_max > max
13         max = x_max
14     if x_min < min
15         min = x_min
16     i = i + 2
17 return (min, max)
The above algorithm performs at most 5n/2 comparisons to find both minimum and maximum when the loop tests on lines 4 and 5 are counted; at most 3⌊n/2⌋ of them (lines 8, 12 and 14) are comparisons between elements. Hence it runs in Θ(n) time.
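The pairwise scheme can be sketched in Python with a counter for element comparisons. This is an illustrative 0-indexed adaptation; the name `min_max` and the returned counter are assumptions added to make the comparison count observable.

```python
def min_max(a):
    """Return (minimum, maximum, element_comparisons) for a non-empty list."""
    lo = hi = a[0]
    comparisons = 0
    n = len(a)
    i = 1
    while i < n:
        if i + 1 >= n:
            # Only one element left: it is both candidates.
            x_min = x_max = a[i]
        else:
            # Compare the pair once, then test the smaller against lo
            # and the larger against hi.
            comparisons += 1
            if a[i] < a[i + 1]:
                x_min, x_max = a[i], a[i + 1]
            else:
                x_min, x_max = a[i + 1], a[i]
        comparisons += 1
        if x_max > hi:
            hi = x_max
        comparisons += 1
        if x_min < lo:
            lo = x_min
        i += 2
    return lo, hi, comparisons


print(min_max([3, 1, 4, 1, 5, 9, 2, 6]))
```

Each pass through the loop handles up to two elements with at most three element comparisons, whereas running MINIMUM and MAXIMUM separately would use 2(n-1).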
15General Selection
The general selection algorithm finds the ith order statistic. The algorithm below is modeled after quicksort and has expected running time O(n).
RANDOMIZED-SELECT(A, p, r, i)
1  if p = r
2      return A[p]
3  q = RANDOMIZED-PARTITION(A, p, r)
4  k = q - p + 1
5  if i = k   // the pivot element is the answer
6      return A[q]
7  else
8      if i < k
9          return RANDOMIZED-SELECT(A, p, q-1, i)
10     else
11         return RANDOMIZED-SELECT(A, q+1, r, i-k)
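A runnable sketch of this procedure in Python is given below. The slides do not show RANDOMIZED-PARTITION, so the version here is an assumption: a random pivot is swapped to the end and a Lomuto-style partition is applied. Indices p and r are 0-based and inclusive, while the rank i stays 1-based as in the pseudocode.

```python
import random


def randomized_partition(a, p, r):
    """Partition a[p..r] around a random pivot; return the pivot's final index.

    Assumed helper (not shown in the slides): Lomuto-style partition.
    """
    s = random.randint(p, r)
    a[s], a[r] = a[r], a[s]
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_select(a, p, r, i):
    """Return the i-th smallest (1-based) element of a[p..r]; rearranges `a`."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1              # rank of the pivot within a[p..r]
    if i == k:
        return a[q]            # the pivot element is the answer
    if i < k:
        return randomized_select(a, p, q - 1, i)
    return randomized_select(a, q + 1, r, i - k)


data = [7, 3, 9, 1, 5, 8, 2]
median = randomized_select(list(data), 0, len(data) - 1, (len(data) + 1) // 2)
print(median)
```

Passing `list(data)` keeps the caller's list intact, since the partition step rearranges its argument in place.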