Title: CSE 326: Data Structures: Sorting
1. CSE 326 Data Structures: Sorting
- Lecture 16, Friday, Feb 14, 2003
2. Review: QuickSort
procedure quickSortRecursive(Array A, int left, int right)
  if (left >= right) return
  int pivot = choosePivot(A, left, right)
  /* partition A s.t.
       A[left], A[left+1], ..., A[i] <= pivot
       A[i+1], A[i+2], ..., A[right] >= pivot */
  quickSortRecursive(A, left, i)
  quickSortRecursive(A, i+1, right)
3. Review: The Partition
i = left; j = right
repeat
  while (A[i] < pivot) i++    /* stops: there exists a sentinel A[k] >= pivot */
  while (A[j] > pivot) j--    /* stops: there exists a sentinel A[k] <= pivot */
  if (i < j)
    swap(A[i], A[j])
    i++; j--
  else break
Why do we need i++, j-- ?
[Figure: A[left] ... A[i-1] | A[i] ... A[j] | A[j+1] ... A[right], with A[left..i-1] ≤ pivot and A[j+1..right] ≥ pivot]
4. Review: The Partition
At the end:
[Figure: A[left] ... A[j] | ... | A[i] ... A[right], with A[left..j] ≤ pivot and A[i..right] ≥ pivot]
Q: How are the elements between A[j] and A[i]?
A: They are = pivot!
quickSortRecursive(A, left, j)
quickSortRecursive(A, i, right)
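The partition and the two recursive calls above can be sketched in Python as follows. This is a minimal sketch, not the lecture's own code: the middle-element pivot is an assumption standing in for choosePivot, and the loop uses the common `i <= j` variant of the test so that `j < i` always holds when the loop ends.

```python
def quick_sort_recursive(a, left, right):
    """Sort a[left..right] in place (both bounds inclusive)."""
    if left >= right:
        return
    pivot = a[(left + right) // 2]  # assumption: middle element as pivot
    i, j = left, right
    while i <= j:
        while a[i] < pivot:  # stops at a sentinel >= pivot
            i += 1
        while a[j] > pivot:  # stops at a sentinel <= pivot
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    # Now j < i: a[left..j] <= pivot, a[i..right] >= pivot,
    # and anything strictly between j and i equals the pivot.
    quick_sort_recursive(a, left, j)
    quick_sort_recursive(a, i, right)


def quick_sort(a):
    """Sort the list a in place and return it."""
    quick_sort_recursive(a, 0, len(a) - 1)
    return a
```

Advancing both i and j after every swap is what keeps the loop from spinning forever when A[i] and A[j] both equal the pivot.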
5. Why is QuickSort Faster than Merge Sort?
- Quicksort typically performs more comparisons than Mergesort, because partitions are not always perfectly balanced
  - Mergesort: n log n comparisons
  - Quicksort: 1.38 n log n comparisons on average
- Quicksort performs many fewer copies, because on average half of the elements are on the correct side of the partition, while Mergesort copies every element when merging
  - Mergesort: 2n log n copies (using temp array), n log n copies (using alternating array)
  - Quicksort: (n/2) log n copies on average
6. Stable Sorting Algorithms
- Typical sorting scenario:
  - Given N records R1, R2, ..., RN
  - They have N keys R1.A, ..., RN.A
  - Sort the records s.t. R1.A ≤ R2.A ≤ ... ≤ RN.A
- A sorting algorithm is stable if:
  - If i < j and Ri.A = Rj.A then Ri comes before Rj in the output
7. Stable Sorting Algorithms
- Which of the following are stable sorting algorithms?
  - Bubble sort
  - Insertion sort
  - Selection sort
  - Heap sort
  - Merge sort
  - Quick sort
8. Stable Sorting Algorithms
- Which of the following are stable sorting algorithms?
  - Bubble sort: yes
  - Insertion sort: yes
  - Selection sort: no (the usual swap-based version)
  - Heap sort: no
  - Merge sort: yes (when the merge takes from the left run on ties)
  - Quick sort: no
We can always transform a non-stable sorting algorithm into a stable one. How?
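One possible answer to the question above is to tag every record with its original position and break ties on that tag. The sketch below illustrates this; `selection_sort` and `stable_sort` are names made up for the example, and the swap-based selection sort is included only as a convenient unstable sort to wrap.

```python
def selection_sort(items, key):
    """Swap-based selection sort; not stable on its own."""
    a = list(items)
    for i in range(len(a)):
        # index of the first smallest remaining element
        m = min(range(i, len(a)), key=lambda k: key(a[k]))
        a[i], a[m] = a[m], a[i]
    return a


def stable_sort(records, key, sort=selection_sort):
    """Make a comparison sort stable by breaking ties on the original index."""
    tagged = list(enumerate(records))  # (original index, record)
    ordered = sort(tagged, key=lambda t: (key(t[1]), t[0]))
    return [record for _, record in ordered]


records = [("b", 2), ("a", 2), ("c", 1)]
unstable = selection_sort(records, key=lambda r: r[1])
stable = stable_sort(records, key=lambda r: r[1])
```

Here `unstable` puts ("a", 2) ahead of ("b", 2), reversing their input order, while `stable` keeps ("b", 2) first. The price is O(N) extra space for the tags.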
9. Detour: Computing the Median
- The median of A1, A2, ..., AN is some Ak s.t.:
  - There exist N/2 elements ≤ Ak
  - There exist N/2 elements ≥ Ak
- Think of it as the perfect pivot!
- Very important in applications:
  - Median income vs. average income
  - Median grade vs. average grade
- To compute: sort A1, ..., AN, then median = A[N/2]
  - Time O(N log N)
- Can we do it in O(N) time?
10. Detour: Computing the Median
int medianRecursive(Array A, int left, int right)
  if (left == right) return A[left]
  . . . Partition . . .
  if N/2 <= j return medianRecursive(A, left, j)
  if N/2 >= i return medianRecursive(A, i, right)
  return pivot
int median(Array A, int N)
  return medianRecursive(A, 0, N-1)
Why?
[Figure: A[left] ... A[j] | ... | A[i] ... A[right], with A[left..j] ≤ pivot and A[i..right] ≥ pivot]
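The same idea can be sketched in Python. This is a sketch under assumptions rather than the lecture's code: the pivot is the middle element, the partition loop uses the `i <= j` test so both subranges always shrink, and the target index is passed explicitly as `k` instead of being read from a global N.

```python
def median_recursive(a, left, right, k):
    """Return the element that would be at index k if a[left..right] were sorted."""
    if left == right:
        return a[left]
    pivot = a[(left + right) // 2]  # assumption: middle element as pivot
    i, j = left, right
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    if k <= j:  # the answer lies in the part that is <= pivot
        return median_recursive(a, left, j, k)
    if k >= i:  # the answer lies in the part that is >= pivot
        return median_recursive(a, i, right, k)
    return pivot  # j < k < i: everything there equals the pivot


def median(values):
    """Median as defined on the slide: the element of rank N/2 (0-based)."""
    if not values:
        raise ValueError("median of an empty sequence")
    a = list(values)  # work on a copy; the partition reorders elements
    return median_recursive(a, 0, len(a) - 1, len(a) // 2)
```

Unlike QuickSort, only one side of the partition is explored, which is where the T(N) = T(N/2) + cN best-case recurrence comes from.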
11. Detour: Computing the Median
- Best case running time:
  T(N) = T(N/2) + cN
       = T(N/4) + cN(1 + 1/2)
       = T(N/8) + cN(1 + 1/2 + 1/4)
       = . . .
       = T(1) + cN(1 + 1/2 + 1/4 + ... + 1/2^k) = O(N)
- Worst case: O(N^2)
- Average case: O(N)
- Question: how can you compute the median in O(N) worst case time? Note: it's tricky.
12. Back to Sorting
- Naïve sorting algorithms
  - Bubble sort, insertion sort, selection sort
  - Time O(N^2)
- Clever sorting algorithms
  - Merge sort, heap sort, quick sort
  - Time O(N log N) (on average, in the case of quick sort)
- I want to sort in O(N)!
  - Is this possible?
13. Could We Do Better?
- Consider any sorting algorithm based on comparisons
- Run it on A1, A2, ..., AN
  - Assume they are distinct
- At each step it compares some Ai with some Aj
  - If Ai < Aj then it does something...
  - If Ai > Aj then it does something else...
- ⇒ Decision Tree!
14. Decision tree to sort list A, B, C
Every possible execution of the algorithm corresponds to a root-to-leaf path in the tree.
15. Max depth of the decision tree
- How many permutations are there of N numbers?
- How many leaves does the tree have?
- What's the shallowest tree with a given number of leaves?
- What is therefore the worst running time (number of comparisons) by the best possible sorting algorithm?
16. Max depth of the decision tree
- How many permutations are there of N numbers?
  - N!
- How many leaves does the tree have?
  - N!
- What's the shallowest tree with a given number of leaves?
  - log(N!)
- What is therefore the worst running time (number of comparisons) by the best possible sorting algorithm?
  - log(N!)
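To make the log(N!) bound concrete, the short sketch below computes the smallest possible depth of a binary tree with N! leaves, i.e. ⌈log2(N!)⌉. The function name is made up for this example.

```python
import math


def min_worst_case_comparisons(n):
    """Smallest depth of a binary decision tree with n! leaves: ceil(log2(n!))."""
    leaves = math.factorial(n)
    # For an integer x >= 1, (x - 1).bit_length() equals ceil(log2(x)) exactly.
    return (leaves - 1).bit_length()
```

For the three-element list A, B, C this gives 3: no comparison-based algorithm can sort every input of three distinct items with only two comparisons.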
17. Stirling's approximation
At least one branch in the tree has this depth
18. If you forget Stirling's formula...
Theorem: Every algorithm that sorts by comparing keys takes Ω(n log n) time
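One standard way to reach the theorem without Stirling's formula, not necessarily the derivation used in lecture, is to keep only the largest half of the terms of log(N!). For comparison, the second line states what Stirling's approximation gives.

```latex
\log_2(N!) \;=\; \sum_{i=1}^{N} \log_2 i
  \;\ge\; \sum_{i=\lceil N/2 \rceil}^{N} \log_2 i
  \;\ge\; \frac{N}{2}\,\log_2\frac{N}{2}
  \;=\; \Omega(N \log N)

\log_2(N!) \;=\; N \log_2 N \;-\; N \log_2 e \;+\; O(\log N)
```

The second inequality holds because the sum has at least N/2 terms and each of them is at least log2(N/2).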
19. Bucket Sort
- Now let's sort in O(N)
- Assume: A[0], A[1], ..., A[N-1] ∈ {0, 1, ..., M-1}, M not too big
- Example: sort 1,000,000 person records on the first character of their last names
  - Hence M = 128 (in practice M = 27)
20. Bucket Sort
Queue bucketSort(Array A, int N)
  for k = 0 to M-1
    Q[k] = new Queue
  for j = 0 to N-1
    Q[A[j]].enqueue(A[j])
  Result = new Queue
  for k = 0 to M-1
    Result = Result.append(Q[k])
  return Result
Stable sorting!
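A minimal Python sketch of the pseudocode above. The `key` parameter is an addition for illustration: it maps a record to its bucket number in 0..m-1, so the same function can sort records on one character.

```python
from collections import deque


def bucket_sort(records, m, key=lambda r: r):
    """Stable bucket sort; key(r) must be an integer in range(m)."""
    queues = [deque() for _ in range(m)]  # one FIFO queue per key value
    for r in records:
        queues[key(r)].append(r)  # enqueue keeps arrival order
    result = []
    for q in queues:  # concatenate the buckets in key order
        result.extend(q)
    return result


names = ["smith", "jones", "stone", "adams"]
by_first_letter = bucket_sort(names, 128, key=lambda s: ord(s[0]))
```

Records with equal keys leave each queue in the order they entered it, which is why "smith" stays ahead of "stone" and the sort is stable.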
21. Bucket Sort
- Running time: O(M+N)
- Space: O(M+N)
- Recall that M << N, hence time = O(N)
- What about the Theorem that says sorting takes Ω(N log N)??
This is not real sorting, because it's for trivial keys (and it never compares two keys, so the comparison-based lower bound does not apply)
22. Radix Sort
- I still want to sort in time O(N): non-trivial keys
- A[0], A[1], ..., A[N-1] are strings
  - Very common in practice
- Each string is c_{d-1} c_{d-2} ... c_1 c_0, where c_0, c_1, ..., c_{d-1} ∈ {0, 1, ..., M-1}; M = 128
- Other example: decimal numbers
23. RadixSort
- Radix = "The base of a number system" (Webster's dictionary)
  - alternate terminology: radix is number of bits needed to represent 0 to base-1; can say "base 8" or "radix 3"
- Used in 1890 U.S. census by Hollerith
- Idea: BucketSort on each digit, bottom up.
24. The Magic of RadixSort
- Input list: 126, 328, 636, 341, 416, 131, 328
- BucketSort on lower digit: 341, 131, 126, 636, 416, 328, 328
- BucketSort result on next-higher digit: 416, 126, 328, 328, 131, 636, 341
- BucketSort that result on highest digit: 126, 131, 328, 328, 341, 416, 636
25. Inductive Proof that RadixSort Works
- Keys: d-digit numbers, base B
- Claim: after the ith BucketSort, the least significant i digits are sorted.
- Base case: i = 0. 0 digits are sorted. (that wasn't hard!)
- Inductive step: Assume for i, prove for i+1.
  - Consider two numbers X, Y. Say X_i is the ith digit of X
  - X_{i+1} < Y_{i+1}: then the (i+1)th BucketSort will put them in order
  - X_{i+1} > Y_{i+1}: same thing
  - X_{i+1} = Y_{i+1}: order depends on the last i digits. The induction hypothesis says they are already sorted on these digits, and they stay in that order because BucketSort is stable
26. Radix Sort
Array radixSort(Array A, int N)
  for k = 0 to d-1
    A = bucketSort(A, on position k)
  return A
Running time: T = O(d(M+N)) = O(dN) = O(Size)
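A minimal Python sketch of the loop above for non-negative integer keys. The helper name `bucket_sort_on_digit` and the `base` parameter are introduced for this example; digit k is extracted arithmetically, with k = 0 the least significant digit.

```python
def bucket_sort_on_digit(a, k, base=10):
    """Stable bucket sort of non-negative integers on digit k (k = 0 is the lowest)."""
    queues = [[] for _ in range(base)]
    for x in a:
        queues[(x // base**k) % base].append(x)
    return [x for q in queues for x in q]


def radix_sort(a, d, base=10):
    """Sort d-digit non-negative integers with d stable bucket-sort passes."""
    for k in range(d):  # least significant digit first
        a = bucket_sort_on_digit(a, k, base)
    return a


first_pass = bucket_sort_on_digit([126, 328, 636, 341, 416, 131, 328], 0)
result = radix_sort([126, 328, 636, 341, 416, 131, 328], d=3)
```

On the input list from the "Magic of RadixSort" slide, `first_pass` is 341, 131, 126, 636, 416, 328, 328 and `result` is 126, 131, 328, 328, 341, 416, 636, matching the passes listed there.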
27. Radix Sort
A: 35 53 55 33 52 32 25
Pass 1 (last digit), buckets Q0 ... Q9:
  Q2: 52 32
  Q3: 53 33
  Q5: 35 55 25
A: 52 32 53 33 35 55 25
Pass 2 (first digit), buckets Q0 ... Q9:
  Q2: 25
  Q3: 32 33 35
  Q5: 52 53 55
A: 25 32 33 35 52 53 55
28. Running time of Radixsort
- N items, D-digit keys, each digit of max value M
- How many passes?
- How much work per pass?
- Total time?
29. Running time of Radixsort
- N items, D-digit keys, each digit of max value M
- How many passes? D
- How much work per pass? N + M
  - just in case M > N, need to account for time to empty out buckets between passes
- Total time? O(D(N+M))
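30. Radix Sort
- What is the size of the input? Size = DN
- Radix sort takes time O(Size)!!
[Figure: the keys laid out as an N × D table, one character per column c_{D-1}, c_{D-2}, ..., c_0; rows A[0] = "Smith", A[1] = "Jones", ..., A[N-1]]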
31. Radix Sort
- Variable length strings:
- Can adapt Radix Sort to sort in time O(Size)!
  - What about our Theorem??
[Figure: strings A[0] ... A[4] of different lengths]
32. Radix Sort
- Suppose we want to sort N distinct numbers
- Represent them in decimal:
  - Need D = log N digits
- Hence RadixSort takes time O(DN) = O(N log N)
- The total Size of N keys is O(N log N)!
- No conflict with theory!
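The digit count above follows from a counting argument, sketched here with B standing for the base (B = 10 for decimal). The symbol B is borrowed from the inductive-proof slide; the rest is a standard argument rather than something shown in lecture.

```latex
B^{D} \;\ge\; N
\quad\Longrightarrow\quad
D \;\ge\; \log_B N,
\qquad
\text{Size} \;=\; D \cdot N \;\ge\; N \log_B N
```

There are only B^D different D-digit strings, so N distinct keys need at least log_B N digits each, and the input itself already has size on the order of N log N.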
33. Sorting HUGE Data Sets
- US Telephone Directory:
  - 300,000,000 records
  - 64 bytes per record
    - Name: 32 characters
    - Address: 54 characters
    - Telephone number: 10 characters
  - About 2 gigabytes of data
  - Sort this on a machine with 128 MB RAM
- Other examples?
34. Merge Sort Good for Something!
- Basis for most external sorting routines
- Can sort any number of records using a tiny amount of main memory
  - in extreme case, only need to keep 2 records in memory at any one time!
35. External MergeSort
- Split input into two tapes (or areas of disk)
- Merge tapes so that each group of 2 records is sorted
- Split again
- Merge tapes so that each group of 4 records is sorted
- Repeat until data entirely sorted
log N passes
36. Better External MergeSort
- Suppose main memory can hold M records.
- Initially read in groups of M records and sort them (e.g. with QuickSort).
- Number of passes reduced to log(N/M)
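The pass count can be illustrated with a small in-memory simulation. This is only a sketch: Python lists stand in for the tapes or disk areas, `m` plays the role of M, and the function name is made up for the example. Each pass merges neighbouring sorted runs pairwise, so the number of runs halves per pass.

```python
from heapq import merge


def external_merge_sort(records, m):
    """Simulate external merge sort; returns (sorted list, number of merge passes)."""
    # Initial step: read groups of m records and sort each group in memory.
    runs = [sorted(records[k:k + m]) for k in range(0, len(records), m)]
    passes = 0
    while len(runs) > 1:
        # One pass over the data: merge neighbouring runs two at a time.
        runs = [list(merge(*runs[k:k + 2])) for k in range(0, len(runs), 2)]
        passes += 1
    return (runs[0] if runs else []), passes


data = list(range(16, 0, -1))
sorted_small_memory, passes_small = external_merge_sort(data, 1)
sorted_big_memory, passes_big = external_merge_sort(data, 4)
```

With N = 16 records, m = 1 needs log2(16) = 4 passes, while m = 4 starts from 4 sorted runs and needs only log2(16/4) = 2.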