CSE 326: Data Structures: Sorting - PowerPoint PPT Presentation

Slides: 37
Provided by: DANS154
Transcript and Presenter's Notes

1
CSE 326: Data Structures: Sorting
  • Lecture 16: Friday, Feb 14, 2003

2
Review: QuickSort
procedure quickSortRecursive(Array A, int left, int right)
  if (left == right) return;
  int pivot = choosePivot(A, left, right);
  /* partition A s.t.
       A[left], A[left+1], ..., A[i] ≤ pivot
       A[i+1], A[i+2], ..., A[right] ≥ pivot */
  quickSortRecursive(A, left, i);
  quickSortRecursive(A, i+1, right);
3
Review: The Partition
i = left; j = right;
repeat
  while (A[i] < pivot) i++;
  while (A[j] > pivot) j--;
  if (i < j) { swap(A[i], A[j]); i++; j--; }
  else break;
Why do we need i++, j-- ?
There exists a sentinel A[k] ≥ pivot, which stops the scan on i; symmetrically, some A[k] ≤ pivot stops the scan on j.
During the scan:
  A[left] ... A[i-1]    ≤ pivot
  A[i] ... A[j]         not yet examined
  A[j+1] ... A[right]   ≥ pivot
4
Review: The Partition
At the end (j < i):
  A[left] ... A[j]     ≤ pivot
  A[i] ... A[right]    ≥ pivot
Q: How are the elements between A[j] and A[i]?
A: They are = pivot!
quickSortRecursive(A, left, j);
quickSortRecursive(A, i, right);
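A minimal Python sketch of the scheme reviewed on the last three slides. Two choices here are assumptions rather than the slides' own: the pivot is simply the middle element (choosePivot is not specified), and the swap test is `i <= j` instead of `i < j`, so that the scan always finishes with j < i as in the picture above.

```python
def quick_sort(a, left=0, right=None):
    """Sort the list a in place between indices left and right (inclusive)."""
    if right is None:
        right = len(a) - 1
    if left >= right:                 # zero or one element: already sorted
        return
    pivot = a[(left + right) // 2]    # assumed pivot rule: middle element
    i, j = left, right
    while i <= j:
        while a[i] < pivot:           # stops at a sentinel a[k] >= pivot
            i += 1
        while a[j] > pivot:           # stops at a sentinel a[k] <= pivot
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1                    # step past the swapped pair, otherwise
            j -= 1                    # two keys equal to pivot would loop forever
    # Now j < i; anything strictly between j and i equals pivot.
    quick_sort(a, left, j)
    quick_sort(a, i, right)
```

Usage: `data = [5, 3, 8, 1]; quick_sort(data)` sorts `data` in place.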
5
Why is QuickSort Faster than Merge Sort?
  • Quicksort typically performs more comparisons
    than Mergesort, because partitions are not always
    perfectly balanced
  • Mergesort: n log n comparisons
  • Quicksort: 1.38 n log n comparisons on average
  • Quicksort performs many fewer copies, because on
    average half of the elements are on the correct
    side of the partition, while Mergesort copies
    every element when merging
  • Mergesort: 2n log n copies (using temp array),
    or n log n copies (using alternating array)
  • Quicksort: n/2 log n copies on average

6
Stable Sorting Algorithms
  • Typical sorting scenario
  • Given N records R[1], R[2], ..., R[N]
  • They have N keys R[1].A, ..., R[N].A
  • Sort the records s.t. R[1].A ≤ R[2].A ≤ ... ≤ R[N].A
  • A sorting algorithm is stable if:
  • If i < j and R[i].A = R[j].A then R[i]
    comes before R[j] in the output

7
Stable Sorting Algorithms
  • Which of the following are stable sorting
    algorithms ?
  • Bubble sort
  • Insertion sort
  • Selection sort
  • Heap sort
  • Merge sort
  • Quick sort

8
Stable Sorting Algorithms
  • Which of the following are stable sorting
    algorithms ?
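  • Bubble sort: yes
  • Insertion sort: yes
  • Selection sort: no (the usual swap-based array version
    can move a record past another with an equal key)
  • Heap sort: no
  • Merge sort: yes (if the merge takes from the left run on ties)
  • Quick sort: no

We can always transform a non-stable sorting
algorithm into a stable one. How?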
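One common answer, sketched in Python: extend each key with the record's original position, so no two keys are ever equal and ties are broken by input order. The names `selection_sort` and `stable_sort_with` are illustrative, not from the lecture.

```python
def selection_sort(a):
    """In-place selection sort with swaps; not stable on its own."""
    for i in range(len(a) - 1):
        # index of the smallest remaining element
        m = min(range(i, len(a)), key=lambda k: a[k])
        a[i], a[m] = a[m], a[i]

def stable_sort_with(unstable_sort, records, key):
    """Return records sorted by key, stably, using any in-place sort."""
    # (key, original index) pairs are all distinct, so whatever the
    # underlying sort does, equal keys end up in input order.
    decorated = [(key(r), idx) for idx, r in enumerate(records)]
    unstable_sort(decorated)
    return [records[idx] for _, idx in decorated]
```

The cost is one extra index per record and slightly more expensive comparisons.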
9
Detour: Computing the Median
  • The median of A[1], A[2], ..., A[N] is some A[k]
    s.t.
  • There exist N/2 elements ≤ A[k]
  • There exist N/2 elements ≥ A[k]
  • Think of it as the perfect pivot!
  • Very important in applications
  • Median income vs. average income
  • Median grade vs. average grade
  • To compute: sort A[1], ..., A[N], then
    median = A[N/2]
  • Time: O(N log N)
  • Can we do it in O(N) time?

10
Detour: Computing the Median
int medianRecursive(Array A, int left, int right)
  if (left == right) return A[left];
  . . . Partition . . .
  if N/2 ≤ j return medianRecursive(A, left, j);
  if N/2 ≥ i return medianRecursive(A, i, right);
  return pivot;

int median(Array A, int N)
  return medianRecursive(A, 0, N-1);

Why? After the partition:
  A[left] ... A[j]     ≤ pivot
  A[i] ... A[right]    ≥ pivot
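A hedged Python sketch of the same idea. It is written as a loop rather than a recursion, takes the middle element as pivot, and uses `i <= j` in the swap test so that the partition always ends with j < i; all three are assumptions, not details from the slide.

```python
def median(a):
    """Return the element of 0-based rank len(a)//2 of a non-empty list."""
    a = list(a)                       # work on a copy of the input
    k = len(a) // 2                   # the rank we are looking for
    left, right = 0, len(a) - 1
    while left < right:
        pivot = a[(left + right) // 2]
        i, j = left, right
        while i <= j:                 # same partition as in QuickSort
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        if k <= j:                    # rank k lies in the "<= pivot" part
            right = j
        elif k >= i:                  # rank k lies in the ">= pivot" part
            left = i
        else:                         # j < k < i: that slot holds the pivot
            return pivot
    return a[left]
```

Only one side of each partition is searched further, which is where the saving over a full sort comes from.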
11
Detour: Computing the Median
  • Best case running time:
    T(N) = T(N/2) + cN
         = T(N/4) + cN(1 + 1/2)
         = T(N/8) + cN(1 + 1/2 + 1/4)
         = . . .
         = T(1) + cN(1 + 1/2 + 1/4 + ... + 1/2^k)
         = O(N), because the geometric sum is less than 2
  • Worst case: O(N^2)
  • Average case: O(N)
  • Question: how can you compute the median in O(N)
    worst case time? Note: it's tricky.

12
Back to Sorting
  • Naïve sorting algorithms
  • Bubble sort, insertion sort, selection sort
  • Time O(N^2)
  • Clever sorting algorithms
  • Merge sort, heap sort, quick sort
  • Time O(N log N)
  • I want to sort in O(N) !
  • Is this possible ?

13
Could We Do Better?
  • Consider any sorting algorithm based on
    comparisons
  • Run it on A[1], A[2], ..., A[N]
  • Assume they are distinct
  • At each step it compares some A[i] with some A[j]
  • If A[i] < A[j] then it does something...
  • If A[i] > A[j] then it does something else...
  • ⇒ Decision Tree!

14
Decision tree to sort list A,B,C
Every possible execution of the algorithm
corresponds to a root-to-leaf path in the tree.
15
Max depth of the decision tree
  • How many permutations are there of N numbers?
  • How many leaves does the tree have?
  • What's the shallowest tree with a given number of
    leaves?
  • What is therefore the worst running time (number
    of comparisons) by the best possible sorting
    algorithm?

16
Max depth of the decision tree
  • How many permutations are there of N numbers?
  • N!
  • How many leaves does the tree have?
  • N!
  • What's the shallowest tree with a given number of
    leaves?
  • log(N!)
  • What is therefore the worst running time (number
    of comparisons) by the best possible sorting
    algorithm?
  • log(N!)
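To get a feel for the numbers, the bound can be evaluated exactly for small N. This is a small Python sketch; the function name is made up for illustration, and the logarithm is taken base 2 and rounded up, since a comparison count is a whole number.

```python
import math

def min_comparisons(n):
    """ceil(log2(n!)): a lower bound on worst-case comparisons for n keys."""
    # For an integer x >= 1, (x - 1).bit_length() equals ceil(log2(x)),
    # which avoids any floating-point rounding.
    return (math.factorial(n) - 1).bit_length()

for n in (3, 4, 5, 12):
    print(n, min_comparisons(n))
```

For three keys this gives 3, matching the depth of a decision tree that sorts a three-element list.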

17
Stirling's approximation
At least one branch in the tree has this depth
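For reference, a standard form of Stirling's approximation and what it gives for the tree depth log(N!). This is a textbook statement written out in LaTeX and may not match the slide's own derivation line by line.

```latex
\[
N! \;\approx\; \sqrt{2\pi N}\,\left(\frac{N}{e}\right)^{N}
\qquad\Longrightarrow\qquad
\log_2 (N!) \;\approx\; N \log_2 N \;-\; N \log_2 e \;+\; \tfrac{1}{2}\log_2 (2\pi N)
\;=\; \Theta(N \log N).
\]
```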
18
If you forget Stirling's formula...
Theorem: Every algorithm that sorts by comparing
keys takes Ω(n log n) time
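One elementary way to get the bound without Stirling is to keep only the larger half of the terms of log(n!). This is a standard argument written out in LaTeX and is not necessarily the exact steps shown in lecture.

```latex
\[
\log (n!) \;=\; \sum_{k=1}^{n} \log k
\;\ge\; \sum_{k=\lceil n/2 \rceil}^{n} \log k
\;\ge\; \frac{n}{2}\,\log \frac{n}{2}
\;=\; \Omega(n \log n).
\]
```

The middle step uses that there are at least n/2 terms with k ≥ n/2, each of size at least log(n/2).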
19
Bucket Sort
  • Now let's sort in O(N)
  • Assume: A[0], A[1], ..., A[N-1] ∈ {0, 1, ..., M-1},
    with M not too big
  • Example: sort 1,000,000 person records on the
    first character of their last names
  • Hence M = 128 (in practice M = 27)

20
Bucket Sort
int bucketSort(Array A, int N)
  for k = 0 to M-1
    Q[k] = new Queue;
  for j = 0 to N-1
    Q[A[j]].enqueue(A[j]);
  Result = new Queue;
  for k = 0 to M-1
    Result = Result.append(Q[k]);
  return Result;
Stable sorting!
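The pseudocode maps almost line for line onto Python. In this sketch the `key` parameter is an addition for illustration: it lets the same routine sort records on a small integer key rather than only bare integers.

```python
from collections import deque

def bucket_sort(a, m, key=lambda x: x):
    """Stable sort of a, where key(item) is an integer in 0..m-1."""
    queues = [deque() for _ in range(m)]      # one FIFO queue per key value
    for item in a:
        queues[key(item)].append(item)        # enqueue keeps input order
    result = []
    for q in queues:                          # concatenate Q[0], ..., Q[m-1]
        result.extend(q)
    return result
```

Because each queue is first-in, first-out, records with equal keys leave in the order they arrived, which is exactly the stability property.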
21
Bucket Sort
  • Running time: O(M+N)
  • Space: O(M+N)
  • Recall that M << N, hence time = O(N)
  • What about the Theorem that says sorting takes
    Ω(N log N)??

This is not real sorting, because it's for
trivial keys
22
Radix Sort
  • I still want to sort in time O(N): non-trivial
    keys
  • A[0], A[1], ..., A[N-1] are strings
  • Very common in practice
  • Each string is c[d-1] c[d-2] ... c[1] c[0], where c[0], c[1], ...,
    c[d-1] ∈ {0, 1, ..., M-1}, M = 128
  • Other example: decimal numbers

23
RadixSort
  • Radix: "The base of a number system" (Webster's
    dictionary)
  • Alternate terminology: radix is the number of bits
    needed to represent 0 to base-1; can say "base 8"
    or "radix 3"
  • Used in 1890 U.S. census by Hollerith
  • Idea: BucketSort on each digit, bottom up.

24
The Magic of RadixSort
  • Input list: 126, 328, 636, 341, 416, 131, 328
  • BucketSort on lower digit: 341, 131, 126, 636,
    416, 328, 328
  • BucketSort result on next-higher digit: 416, 126,
    328, 328, 131, 636, 341
  • BucketSort that result on highest digit: 126,
    131, 328, 328, 341, 416, 636

25
Inductive Proof that RadixSort Works
  • Keys: d-digit numbers, base B
  • Claim: after the ith BucketSort, the least significant i
    digits are sorted.
  • Base case: i = 0. 0 digits are sorted (that wasn't hard!)
  • Inductive step: assume for i, prove for i+1.
  • Consider two numbers X, Y. Say X[i] is the ith digit
    of X
  • X[i+1] < Y[i+1]: then the (i+1)th BucketSort will put them
    in order
  • X[i+1] > Y[i+1]: same thing
  • X[i+1] = Y[i+1]: order depends on the last i digits.
    Induction hypothesis says they are already sorted on
    these digits, and they stay that way because BucketSort is stable

26
Radix Sort
int radixSort(Array A, int N)
  for k = 0 to d-1
    A = bucketSort(A, on position k)
Running time: T = O(d(M+N)) = O(dN) = O(Size)
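A small Python sketch of the loop above for non-negative integers. The digit extraction `(x // base**k) % base` and the default base of 10 are assumptions made for the example; the bucket pass is written inline so the block stands on its own.

```python
def radix_sort(a, d, base=10):
    """Sort non-negative integers of at most d digits in the given base."""
    for k in range(d):                          # least significant digit first
        buckets = [[] for _ in range(base)]
        for x in a:
            digit = (x // base ** k) % base     # k-th digit of x
            buckets[digit].append(x)            # appending keeps the pass stable
        a = [x for bucket in buckets for x in bucket]
    return a
```

Running it on the earlier input list 126, 328, 636, 341, 416, 131, 328 with d = 3 goes through the same three intermediate orders shown there.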
27
Radix Sort
A:  35 53 55 33 52 32 25
Pass 1, BucketSort on the low digit (queues Q0 ... Q9):
  Q2: 52 32
  Q3: 53 33
  Q5: 35 55 25
A:  52 32 53 33 35 55 25
Pass 2, BucketSort on the high digit (queues Q0 ... Q9):
  Q2: 25
  Q3: 32 33 35
  Q5: 52 53 55
A:  25 32 33 35 52 53 55
28
Running time of Radixsort
  • N items, D digit keys of max value M
  • How many passes?
  • How much work per pass?
  • Total time?

29
Running time of Radixsort
  • N items, D digit keys of max value M
  • How many passes? D
  • How much work per pass? N + M
  • just in case M > N, need to account for time to
    empty out buckets between passes
  • Total time? O(D(N+M))

30
Radix Sort
  • What is the size of the input? Size = DN
  • Radix sort takes time O(Size)!!

          c[D-1] c[D-2] ... c[0]
A[0]      S  m  i  t  h
A[1]      J  o  n  e  s
...
A[N-1]
31
Radix Sort
  • Variable length strings
  • Can adapt Radix Sort to sort in time O(Size) !
  • What about our Theorem ??

(Figure: strings A[0], ..., A[4] of different lengths)
32
Radix Sort
  • Suppose we want to sort N distinct numbers
  • Represent them in decimal
  • Need D = log N digits
  • Hence RadixSort takes time O(DN) = O(N log N)
  • The total Size of N keys is O(N log N)!
  • No conflict with theory!

33
Sorting HUGE Data Sets
  • US Telephone Directory
  • 300,000,000 records
  • 96 bytes per record
  • Name: 32 characters
  • Address: 54 characters
  • Telephone number: 10 characters
  • About 29 gigabytes of data
  • Sort this on a machine with 128 MB RAM
  • Other examples?

34
Merge Sort: Good for Something!
  • Basis for most external sorting routines
  • Can sort any number of records using a tiny
    amount of main memory
  • in extreme case, only need to keep 2 records in
    memory at any one time!

35
External MergeSort
  • Split input into two tapes (or areas of disk)
  • Merge tapes so that each group of 2 records is
    sorted
  • Split again
  • Merge tapes so that each group of 4 records is
    sorted
  • Repeat until data entirely sorted

log N passes
36
Better External MergeSort
  • Suppose main memory can hold M records.
  • Initially read in groups of M records and sort
    them (e.g. with QuickSort).
  • Number of passes reduced to log(N/M)
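The pass counts can be checked with a small in-memory simulation in Python. This is only a sketch under loud assumptions: lists stand in for tapes or disk areas, `m` plays the role of the memory size M, and `heapq.merge` does the two-way merge while looking at one record per run at a time.

```python
import heapq

def external_merge_sort(records, m):
    """Simulate external MergeSort; return (sorted list, number of merge passes)."""
    # Initial runs: groups of m records, each sorted "in memory".
    runs = [sorted(records[s:s + m]) for s in range(0, len(records), m)]
    passes = 0
    while len(runs) > 1:
        merged = []
        for r in range(0, len(runs), 2):      # merge neighbouring runs pairwise
            merged.append(list(heapq.merge(*runs[r:r + 2])))
        runs = merged
        passes += 1                           # one full sweep over the data
    return (runs[0] if runs else []), passes
```

With 16 records, `m = 1` needs 4 merge passes (log N) and `m = 4` needs 2 (log(N/M)), in line with the two slides above.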