CSE 326 Data Structures Sorting
  • Lecture 16 Friday, Feb 14, 2003

Review: QuickSort
procedure quickSortRecursive(Array A, int left, int right)
    if (left == right) return;
    int pivot = choosePivot(A, left, right);
    /* partition A s.t.
         A[left], A[left+1], ..., A[i] ≤ pivot
         A[i+1], A[i+2], ..., A[right] ≥ pivot */
    quickSortRecursive(A, left, i);
    quickSortRecursive(A, i+1, right);

Review: The Partition
i = left; j = right;
repeat {
    while (A[i] < pivot) i++;
    while (A[j] > pivot) j--;
    if (i < j) { swap(A[i], A[j]); i++; j--; }
    else break;
}
Why do we need i++, j-- ?
There exists a sentinel A[k] ≥ pivot to stop i (and a sentinel A[k] ≤ pivot to stop j)
[Figure: A[left] ... A[i-1] are ≤ pivot; A[i] ... A[j] are not yet examined; A[j+1] ... A[right] are ≥ pivot]

Review: The Partition
At the end:
[Figure: A[left] ... A[j] are ≤ pivot; A[i] ... A[right] are ≥ pivot, with A[j] to the left of A[i]]
Q: How are the elements between A[j] and A[i]?
A: They are = pivot!
quickSortRecursive(A, left, j);
quickSortRecursive(A, i, right);
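
The review above can be sketched in Python. This is a minimal sketch, not the lecture's exact code: the function name `quick_sort` and the middle-element pivot are assumptions, and the loop tests `i <= j` instead of `i < j` so that the two indices always cross and the recursive calls on `(left, j)` and `(i, right)` never overlap.

```python
def quick_sort(a, left=0, right=None):
    """Sort the list a in place between indices left and right (inclusive)."""
    if right is None:
        right = len(a) - 1
    if left >= right:
        return
    pivot = a[(left + right) // 2]  # assumption: middle element as the pivot
    i, j = left, right
    while i <= j:
        while a[i] < pivot:   # stops at the latest on an element >= pivot
            i += 1
        while a[j] > pivot:   # stops at the latest on an element <= pivot
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    # Here j < i, and any element strictly between j and i equals the pivot.
    quick_sort(a, left, j)
    quick_sort(a, i, right)


data = [5, 3, 8, 1, 9, 2, 5, 5]
quick_sort(data)
print(data)
```

Advancing both indices after every swap is what guarantees progress when A[i] and A[j] both equal the pivot.
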
Why is QuickSort Faster than Merge Sort?
  • Quicksort typically performs more comparisons
    than Mergesort, because partitions are not always
    perfectly balanced
  • Mergesort: n log n comparisons
  • Quicksort: 1.38 n log n comparisons on average
  • Quicksort performs many fewer copies, because on
    average half of the elements are on the correct
    side of the partition, while Mergesort copies
    every element when merging
  • Mergesort: 2n log n copies (using temp array),
    or n log n copies (using alternating array)
  • Quicksort: (n/2) log n copies on average

Stable Sorting Algorithms
  • Typical sorting scenario:
  • Given N records R1, R2, ..., RN
  • They have N keys R1.A, ..., RN.A
  • Sort the records s.t. R1.A ≤ R2.A ≤ ... ≤ RN.A
  • A sorting algorithm is stable if:
  • If i < j and Ri.A = Rj.A then Ri
    comes before Rj in the output

Stable Sorting Algorithms
  • Which of the following are stable sorting
    algorithms ?
  • Bubble sort
  • Insertion sort
  • Selection sort
  • Heap sort
  • Merge sort
  • Quick sort

Stable Sorting Algorithms
  • Which of the following are stable sorting
    algorithms ?
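  • Bubble sort: yes
  • Insertion sort: yes
  • Selection sort: yes only if the minimum is moved by
    shifting; the usual swap-based version is not stable
  • Heap sort: no
  • Merge sort: yes, provided the merge takes the element
    from the left half on ties
  • Quick sort: no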

We can always transform a non-stable sorting
algorithm into a stable one. How?
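
One common answer to the question above is to attach each record's original position and use it to break ties. The sketch below assumes a swap-based selection sort as the non-stable algorithm; the names `selection_sort`, `stabilized`, and `by_number` are hypothetical and chosen only for illustration.

```python
def selection_sort(items, key):
    """Swap-based selection sort; it can reorder records with equal keys."""
    a = list(items)
    for i in range(len(a)):
        # index of the first minimum in a[i:]
        m = min(range(i, len(a)), key=lambda k: key(a[k]))
        a[i], a[m] = a[m], a[i]
    return a


def stabilized(sort, records, key):
    """Run `sort` on (index, record) pairs, comparing (key, index) so that
    records with equal keys keep their input order."""
    decorated = list(enumerate(records))
    result = sort(decorated, key=lambda pair: (key(pair[1]), pair[0]))
    return [record for _, record in result]


records = [("b", 2), ("a", 2), ("c", 1)]
by_number = lambda r: r[1]
print(selection_sort(records, by_number))              # may reorder the two 2s
print(stabilized(selection_sort, records, by_number))  # keeps "b" before "a"
```

The cost is one extra index per record and slightly more expensive comparisons; the asymptotic running time of the underlying sort is unchanged.
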
Detour: Computing the Median
  • The median of A[1], A[2], ..., A[N] is some A[k] s.t.:
  • There exist N/2 elements ≤ A[k]
  • There exist N/2 elements ≥ A[k]
  • Think of it as the perfect pivot!
  • Very important in applications:
  • Median income vs. average income
  • Median grade vs. average grade
  • To compute: sort A[1], ..., A[N], then return A[N/2]
  • Time: O(N log N)
  • Can we do it in O(N) time?

Detour: Computing the Median
int medianRecursive(Array A, int left, int right)
    if (left == right) return A[left];
    . . . Partition . . .
    if (N/2 ≤ j) return medianRecursive(A, left, j);
    if (N/2 ≥ i) return medianRecursive(A, i, right);
    return pivot;

int median(Array A, int N)
    return medianRecursive(A, 0, N-1);

Why?
[Figure: A[left] ... A[j] are ≤ pivot; A[i] ... A[right] are ≥ pivot]
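
A runnable sketch of this selection idea in Python follows. It is an illustration under assumptions, not the lecture's code: the middle-element pivot, the explicit target index `k` (passed instead of the global N/2), and the loop test `i <= j` are choices made here so that the example is self-contained.

```python
def median_recursive(a, k, left, right):
    """Return the element that index k would hold if a were sorted.

    Requires left <= k <= right; rearranges a in place."""
    if left == right:
        return a[left]
    pivot = a[(left + right) // 2]  # assumption: middle element as the pivot
    i, j = left, right
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    if k <= j:   # the target lies in the "<= pivot" part
        return median_recursive(a, k, left, j)
    if k >= i:   # the target lies in the ">= pivot" part
        return median_recursive(a, k, i, right)
    return pivot  # k is strictly between j and i, where everything equals pivot


def median(values):
    """Median of a non-empty sequence (the upper median when N is even)."""
    a = list(values)
    return median_recursive(a, len(a) // 2, 0, len(a) - 1)


print(median([7, 1, 5, 3, 9]))
```

Unlike QuickSort, only one side of the partition is visited after each step, which is where the linear average running time comes from.
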
Detour: Computing the Median
  • Best case running time:
    T(N) = T(N/2) + cN
         = T(N/4) + cN(1 + 1/2)
         = T(N/8) + cN(1 + 1/2 + 1/4)
         = . . .
         = T(1) + cN(1 + 1/2 + 1/4 + ... + 1/2^k) = O(N)
  • Worst case: O(N^2)
  • Average case: O(N)
  • Question: how can you compute the median in O(N)
    worst case time? Note: it's tricky.

Back to Sorting
  • Naïve sorting algorithms:
  • Bubble sort, insertion sort, selection sort
  • Time: O(N^2)
  • Clever sorting algorithms:
  • Merge sort, heap sort, quick sort
  • Time: O(N log N)
  • I want to sort in O(N)!
  • Is this possible?

Could We Do Better?
  • Consider any sorting algorithm based on comparisons
  • Run it on A[1], A[2], ..., A[N]
  • Assume they are distinct
  • At each step it compares some A[i] with some A[j]
  • If A[i] < A[j] then it does something...
  • If A[i] > A[j] then it does something else...
  • ⇒ Decision tree!

Decision tree to sort list A, B, C
Every possible execution of the algorithm
corresponds to a root-to-leaf path in the tree.

Max depth of the decision tree
  • How many permutations are there of N numbers?
  • How many leaves does the tree have?
  • What's the shallowest tree with a given number of
    leaves?
  • What is therefore the worst running time (number
    of comparisons) by the best possible sorting
    algorithm?

Max depth of the decision tree
  • How many permutations are there of N numbers?
  • N!
  • How many leaves does the tree have?
  • N!
  • What's the shallowest tree with a given number of
    leaves?
  • log(N!)
  • What is therefore the worst running time (number
    of comparisons) by the best possible sorting
    algorithm?
  • log(N!)
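
To get a feel for the size of log(N!), the short sketch below evaluates the bound for a few values of N and prints N log2 N next to it. The function name `min_comparisons` is a hypothetical helper introduced here; the base-2 logarithm is assumed because each comparison has two outcomes.

```python
import math


def min_comparisons(n):
    """Smallest possible depth of a binary decision tree with n! leaves."""
    return math.ceil(math.log2(math.factorial(n)))


for n in (3, 4, 8, 16):
    # N, lower bound on worst-case comparisons, N log2 N for comparison
    print(n, min_comparisons(n), round(n * math.log2(n), 1))
```

For N = 3 the bound is 3 comparisons, which matches the depth of the decision tree for sorting A, B, C.
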

Stirling's approximation
At least one branch in the tree has this depth, log(N!)
If you forget Stirling's formula...
Theorem: Every algorithm that sorts by comparing
keys takes Ω(n log n) time
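
The formulas that accompanied this slide are not recoverable from the text, so what follows is a sketch of the standard argument rather than the lecture's own derivation. The first line is Stirling's approximation and its consequence for log(N!); the second line reaches the same bound without Stirling's formula by keeping only the largest half of the terms.

```latex
N! \approx \sqrt{2\pi N}\,\left(\frac{N}{e}\right)^{N}
\quad\Longrightarrow\quad
\log_2(N!) = N\log_2 N - N\log_2 e + O(\log N)

\log_2(N!) = \sum_{i=1}^{N} \log_2 i
\;\ge\; \sum_{i=\lceil N/2 \rceil}^{N} \log_2 i
\;\ge\; \frac{N}{2}\,\log_2\frac{N}{2}
\;=\; \Omega(N \log N)
```

The second inequality holds because the sum has at least N/2 terms, each at least log2(N/2).
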
Bucket Sort
  • Now let's sort in O(N)
  • Assume A[0], A[1], ..., A[N-1] ∈ {0, 1, ..., M-1},
    M not too big
  • Example: sort 1,000,000 person records on the
    first character of their last names
  • Hence M = 128 (in practice M = 27)

Bucket Sort
Queue bucketSort(Array A, int N)
    for k = 0 to M-1
        Q[k] = new Queue;
    for j = 0 to N-1
        Q[A[j]].enqueue(A[j]);
    Result = new Queue;
    for k = 0 to M-1
        Result = Result.append(Q[k]);
    return Result;

Stable sorting!
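
The pseudocode above translates almost line for line into Python. In this minimal sketch the `key` parameter is an addition not present in the slides; it lets whole records be sorted on a small integer key, as in the last-name example.

```python
from collections import deque


def bucket_sort(a, m, key=lambda x: x):
    """Stable sort of items whose key(item) is an integer in range(m)."""
    queues = [deque() for _ in range(m)]   # one FIFO queue per key value
    for item in a:
        queues[key(item)].append(item)     # enqueue keeps arrival order
    result = []
    for q in queues:                       # concatenate Q[0], ..., Q[m-1]
        result.extend(q)
    return result


names = ["Smith", "Jones", "Suciu", "Adams"]
# Assumes ASCII names, so the first character maps into range(128).
print(bucket_sort(names, 128, key=lambda s: ord(s[0])))
```

Because each queue is first-in first-out, "Smith" stays ahead of "Suciu", which is the stability property the slide points out.
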
Bucket Sort
  • Running time: O(M+N)
  • Space: O(M+N)
  • Recall that M << N, hence time O(N)
  • What about the Theorem that says sorting takes
    Ω(N log N)??

This is not real sorting, because it's for
trivial keys

Radix Sort
  • I still want to sort in time O(N): non-trivial keys
  • A[0], A[1], ..., A[N-1] are strings
  • Very common in practice
  • Each string is c[d-1] c[d-2] ... c[1] c[0], where c[0], c[1], ...,
    c[d-1] ∈ {0, 1, ..., M-1}, M = 128
  • Other example: decimal numbers

  • Radix: the base of a number system (Webster's dictionary)
  • Alternate terminology: radix is the number of bits
    needed to represent 0 to base-1; can say "base 8"
    or "radix 3"
  • Used in 1890 U.S. census by Hollerith
  • Idea: BucketSort on each digit, bottom up.

The Magic of RadixSort
  • Input list: 126, 328, 636, 341, 416, 131, 328
  • BucketSort on lower digit: 341, 131, 126, 636,
    416, 328, 328
  • BucketSort result on next-higher digit: 416, 126,
    328, 328, 131, 636, 341
  • BucketSort that result on highest digit: 126,
    131, 328, 328, 341, 416, 636

Inductive Proof that RadixSort Works
  • Keys: d-digit numbers, base B
  • Claim: after the ith BucketSort, the least significant i
    digits are sorted.
  • Base case: i = 0. 0 digits are sorted
    (that wasn't hard!)
  • Inductive step: assume for i, prove for i+1.
  • Consider two numbers X, Y. Say X_i is the ith digit
    of X
  • X_(i+1) < Y_(i+1): then the (i+1)th BucketSort will put them
    in order
  • X_(i+1) > Y_(i+1): same thing
  • X_(i+1) = Y_(i+1): order depends on the last i digits.
    The induction hypothesis says they are already sorted on
    these digits, and that order is preserved because
    BucketSort is stable

Radix Sort
Array radixSort(Array A, int N)
    for k = 0 to d-1
        A = bucketSort(A, on position k);
    return A;
Running time: T = O(d(M+N)) = O(dN) = O(Size)
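
As a concrete illustration, here is a minimal Python sketch of least-significant-digit radix sort for non-negative integers. The function name `radix_sort` and its parameters `digits` and `base` are assumptions made for this example; each pass is a stable bucket sort on one digit.

```python
def radix_sort(numbers, digits, base=10):
    """Sort non-negative integers that have at most `digits` digits in `base`."""
    a = list(numbers)
    for k in range(digits):                    # k = 0 is the lowest digit
        buckets = [[] for _ in range(base)]
        for x in a:
            buckets[(x // base ** k) % base].append(x)   # stable: keeps order
        a = [x for bucket in buckets for x in bucket]    # concatenate buckets
    return a


print(radix_sort([126, 328, 636, 341, 416, 131, 328], digits=3))
```

Running it on the seven-number list from "The Magic of RadixSort" gives the same final order as the slide: 126, 131, 328, 328, 341, 416, 636.
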
Radix Sort
Input: 35 53 55 33 52 32 25
Pass 1, on the last digit (queues Q0 ... Q9; queues not listed are empty):
    Q2: 52 32
    Q3: 53 33
    Q5: 35 55 25
Result: 52 32 53 33 35 55 25
Pass 2, on the first digit:
    Q2: 25
    Q3: 32 33 35
    Q5: 52 53 55
Result: 25 32 33 35 52 53 55

Running time of Radixsort
  • N items, D digit keys of max value M
  • How many passes?
  • How much work per pass?
  • Total time?

Running time of Radixsort
  • N items, D digit keys of max value M
  • How many passes? D
  • How much work per pass? N + M
  • Just in case M > N, need to account for time to
    empty out buckets between passes
  • Total time? O(D(N+M))

Radix Sort
  • What is the size of the input? Size = DN
  • Radix sort takes time O(Size)!!

         c[D-1] c[D-2] ... c[0]
  A[0]:  S m i t h
  A[1]:  J o n e s

Radix Sort
  • Variable length strings
  • Can adapt Radix Sort to sort in time O(Size) !
  • What about our Theorem ??

Radix Sort
  • Suppose we want to sort N distinct numbers
  • Represent them in decimal
  • Need D = log N digits
  • Hence RadixSort takes time O(DN) = O(N log N)
  • The total Size of N keys is O(N log N)!
  • No conflict with theory!
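
The arithmetic behind D = log N can be checked with a few lines of Python. The helper name `digits_needed` is hypothetical; it returns the fewest decimal digits D for which N distinct D-digit keys exist, and the loop prints the resulting total input size D·N.

```python
def digits_needed(n, base=10):
    """Fewest digits d with base**d >= n, so that n distinct keys fit."""
    d = 1
    while base ** d < n:
        d += 1
    return d


for n in (10, 1000, 1_000_000):
    d = digits_needed(n)
    print(n, d, d * n)   # N, D, total size D*N in digits
```

D grows like log N, so the O(DN) running time of RadixSort is O(N log N) when the N keys are all distinct.
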

Sorting HUGE Data Sets
  • US Telephone Directory:
  • 300,000,000 records
  • 64 bytes per record:
  • Name: 32 characters
  • Address: 54 characters
  • Telephone number: 10 characters
  • About 19 gigabytes of data (300,000,000 × 64 bytes)
  • Sort this on a machine with 128 MB RAM
  • Other examples?

Merge Sort: Good for Something!
  • Basis for most external sorting routines
  • Can sort any number of records using a tiny
    amount of main memory
  • In the extreme case, only need to keep 2 records in
    memory at any one time!

External MergeSort
  • Split input into two tapes (or areas of disk)
  • Merge tapes so that each group of 2 records is
    sorted
  • Split again
  • Merge tapes so that each group of 4 records is
    sorted
  • Repeat until data entirely sorted

log N passes

Better External MergeSort
  • Suppose main memory can hold M records.
  • Initially read in groups of M records and sort
    them (e.g. with QuickSort).
  • Number of passes reduced to log(N/M)
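
The pass count can be seen in a small in-memory simulation. This is only a sketch under assumptions: Python lists stand in for the tapes or disk areas, the function name `external_merge_sort` and its parameter `m` (records that fit in memory) are hypothetical, and the standard-library `sorted` is used for the in-memory step where the slide suggests QuickSort.

```python
import heapq


def external_merge_sort(records, m):
    """Simulate external merge sort; return (sorted list, number of merge passes)."""
    # Initial step: read groups of m records and sort each group in memory.
    runs = [sorted(records[i:i + m]) for i in range(0, len(records), m)]
    passes = 0
    # Each pass merges neighbouring runs pairwise, doubling the run length.
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]   # two runs, or one leftover run
            # heapq.merge reads its inputs sequentially, one record at a time
            merged.append(list(heapq.merge(*pair)))
        runs = merged
        passes += 1
    return (runs[0] if runs else []), passes


data = [9, 4, 7, 1, 8, 2, 6, 3, 5, 0, 15, 12, 11, 14, 13, 10]
print(external_merge_sort(data, 1))   # basic version: runs of a single record
print(external_merge_sort(data, 4))   # memory holds 4 records
```

With 16 records, runs of one record need 4 merge passes (log N), while starting from sorted runs of 4 records needs only 2 (log(N/M)).
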
