Title: Quicksort
1Quicksort
- Lecture 15
- Quicksort
- Background
- Worst Case Analysis
- Best Case Analysis
- Average Case Analysis
- Empirical Comparison
- Randomized Quicksort
- Lectures 13, 14
- Learning Activity
- Pair off with someone you don't know
- Take out a sheet of paper
- Solve problems
- Discover your questions
- Discuss in pairs
- Discuss in class
- Leave knowing how to solve linear constant-coefficient difference equations with geometric forcing terms
2Quicksort
- Mergesort is Θ(n log n), but inconvenient for implementation with arrays since we need space to merge
- Quicksort sorts in place, using partitioning
- Example: pivot about first element (3)
- 3 1 4 1 5 9 2 6 5 3 5 8 9 --- before
- 2 1 3 1 3 9 5 6 5 4 5 8 9 --- after
- At most n swaps
- Pivot element ends up in its final position
- No element left or right of pivot flips sides
- Sort each side independently
- Recursive Divide and Conquer approach
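The partitioning step in the example above can be sketched in Python. This is one two-pointer scheme that pivots about the first element; it reproduces the "after" row for the example input, but whether it is the exact routine used in the lecture is an assumption. The name `pivot` and the inclusive index convention `t[i..j]` are chosen here for illustration.

```python
def pivot(t, i, j):
    """Partition t[i..j] (inclusive) about p = t[i]; return the pivot's final index."""
    p = t[i]
    k, l = i, j + 1
    # Advance k to the first element greater than p (or stop at j).
    while True:
        k += 1
        if k >= j or t[k] > p:
            break
    # Retreat l to the last element not greater than p.
    while True:
        l -= 1
        if t[l] <= p:
            break
    # Swap out-of-place pairs until the two scans cross.
    while k < l:
        t[k], t[l] = t[l], t[k]
        while True:
            k += 1
            if t[k] > p:
                break
        while True:
            l -= 1
            if t[l] <= p:
                break
    # Drop the pivot into its final position.
    t[i], t[l] = t[l], t[i]
    return l


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9]
l = pivot(data, 0, len(data) - 1)
print(data, l)  # [2, 1, 3, 1, 3, 9, 5, 6, 5, 4, 5, 8, 9] 4
```

Everything left of index 4 is at most 3 and everything right of it is greater than 3, so the two sides can be sorted independently.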
3Quicksort
procedure quicksort(T[i..j])
  if j - i is small enough then insert(T[i..j])
  else
    pivot(T[i..j], l)
    quicksort(T[i..l-1])
    quicksort(T[l+1..j])
Dividing the problem into subproblems based on
where the pivot ends up
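A minimal runnable sketch of the procedure above, in Python. The cutoff value `CUTOFF = 8` is an assumption (the slide only says "small enough"), and the `pivot` shown here is a simple single-scan partition about the first element, used as a stand-in rather than as the lecture's own routine.

```python
CUTOFF = 8  # assumed threshold for "small enough"; tune empirically


def insert(t, i, j):
    """Insertion sort on t[i..j] (inclusive)."""
    for a in range(i + 1, j + 1):
        x = t[a]
        b = a - 1
        while b >= i and t[b] > x:
            t[b + 1] = t[b]
            b -= 1
        t[b + 1] = x


def pivot(t, i, j):
    """Partition t[i..j] about t[i]; return the pivot's final index l."""
    p = t[i]
    l = i
    for k in range(i + 1, j + 1):
        if t[k] < p:
            l += 1
            t[l], t[k] = t[k], t[l]
    t[i], t[l] = t[l], t[i]
    return l


def quicksort(t, i=0, j=None):
    """Sort t[i..j] in place."""
    if j is None:
        j = len(t) - 1
    if j - i < CUTOFF:
        insert(t, i, j)
    else:
        l = pivot(t, i, j)
        quicksort(t, i, l - 1)   # everything left of the pivot
        quicksort(t, l + 1, j)   # everything right of the pivot


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9]
quicksort(data)
print(data)
```

Because the pivot is already in its final position after partitioning, neither recursive call needs to include index `l`.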
4Choosing a Good Pivot
- This is the crux of an implementation
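- What would the worst case be?
Splitting the input as unevenly as possible leaves one empty subproblem and one of size n-1. The number of levels then becomes n, and partitioning a level with m elements left takes Θ(m), for a total complexity Θ(n²)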
5Choosing a Good Pivot
- This is the crux of an implementation
- What would the best case be?
Splitting the input as evenly as possible results in subproblems of size n/2. The total number of levels then becomes log₂ n, and the effort to partition each level is Θ(n), for a total complexity Θ(n log n)
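The level-counting argument can be checked numerically. The sketch below assumes the partitioning effort at a level is exactly n (standing in for Θ(n)) and that n is a power of two, which gives the recurrence t(n) = 2 t(n/2) + n with t(1) = 0; the function name is illustrative.

```python
def best_case_cost(n):
    """Solve t(n) = 2 t(n/2) + n with t(1) = 0, for n a power of two."""
    if n == 1:
        return 0
    return 2 * best_case_cost(n // 2) + n


for n in (2, 8, 64, 1024):
    levels = n.bit_length() - 1          # log2(n) for a power of two
    print(n, best_case_cost(n), n * levels)
```

Under these assumptions the two printed columns agree: the cost is exactly n log₂ n.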
6Choosing a Good Pivot
- The instances that become worst or best cases depend on the method used for choosing a pivot
- Book's pivot routine chooses first in list
- Worst case is an already sorted list (!)
- This can be very bad for many applications that expect the list to already be partially sorted
- Could instead choose
- The middle element
- Median of first, last, and middle
- A random element
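To see how the choice of pivot changes which inputs are bad, the sketch below counts element comparisons on an already sorted list under each of the four rules listed above. The helper names and the single-scan partition are assumptions made for illustration; an explicit stack is used so the first-element rule does not hit Python's recursion limit.

```python
import random


def first(t, i, j):
    return i


def middle(t, i, j):
    return (i + j) // 2


def median_of_three(t, i, j):
    m = (i + j) // 2
    # Index holding the median of t[i], t[m], t[j].
    return sorted((i, m, j), key=lambda k: t[k])[1]


def random_element(t, i, j):
    return random.randint(i, j)


def count_comparisons(t, choose):
    """Quicksort a copy of t; return the comparisons made while partitioning."""
    t = list(t)
    comparisons = 0
    stack = [(0, len(t) - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        c = choose(t, i, j)
        t[i], t[c] = t[c], t[i]          # move the chosen pivot to the front
        p, l = t[i], i
        for k in range(i + 1, j + 1):
            comparisons += 1
            if t[k] < p:
                l += 1
                t[l], t[k] = t[k], t[l]
        t[i], t[l] = t[l], t[i]
        stack.append((i, l - 1))
        stack.append((l + 1, j))
    return comparisons


already_sorted = list(range(1000))
for choose in (first, middle, median_of_three, random_element):
    print(choose.__name__, count_comparisons(already_sorted, choose))
```

With the first-element rule every split is maximally uneven, so the count is n(n-1)/2 = 499500 for n = 1000; the other rules stay far below that on this input.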
7Choosing a Good Pivot
- Quicksort takes Θ(n²) in the worst case, and Θ(n log n) in the best case. How does it perform on average?
- We want to consider the runtime as a Random Variable and compute its expected value given a particular distribution, or probability measure, on possible inputs.
8Recall Elementary Probability
- A random variable is a function that assigns an
arbitrary number to each sample point.
- Pr[X = x] means the probability of the event X = x, e.g.
- Pr[X = -1] = .1
- Pr[X = 5] = .2
- Pr[X = 0] = .1
[Figure: each sample point in the Set S carries a probability (Probability Measure) and a value (Random Variable, X)]
X            -1     0      4      -1.5   2      -3     5      5      9      3      2
Probability  .1     .1     .2     .1     .05    .05    .1     .1     .05    .1     .05
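The event probabilities quoted on this slide can be recomputed by summing the probability of every sample point that maps to the same value. The sketch below assumes the values and probabilities in the figure pair up in the order listed; exact fractions are used to avoid floating-point rounding.

```python
from collections import defaultdict
from fractions import Fraction

# (value of X, probability) per sample point, paired in listed order (assumption).
sample_points = [
    ("-1", "0.1"), ("0", "0.1"), ("4", "0.2"), ("-1.5", "0.1"),
    ("2", "0.05"), ("-3", "0.05"), ("5", "0.1"), ("5", "0.1"),
    ("9", "0.05"), ("3", "0.1"), ("2", "0.05"),
]


def pmf(points):
    """Return {x: Pr[X = x]} by accumulating probability over sample points."""
    p = defaultdict(Fraction)
    for value, prob in points:
        p[Fraction(value)] += Fraction(prob)
    return dict(p)


p = pmf(sample_points)
assert sum(p.values()) == 1              # a probability measure sums to 1
print(float(p[-1]), float(p[5]), float(p[0]))  # 0.1 0.2 0.1
```

Two different sample points map to 5, which is why Pr[X = 5] is .2 rather than .1.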
9Recall Elementary Probability
- The expectation of X (or expected value, or mean, or average) is given by
  $E(X) = \sum_{x} x \cdot \Pr[X = x]$
- Like the center of mass.
- In this example E(X) = 2.45
- The probability mass function of random variable X is just p(x) = Pr[X = x]
[Figure: the Set S from the previous slide, with the same values of X and probabilities]
10Quicksort Average Case Analysis
- What is the Sample Space?
All possible inputs of size n
[Figure: the Set S of inputs, with the Probability Measure and the Random Variable still to be identified]
11Quicksort Average Case Analysis
- What is the Probability Measure?
Assume Equally Likely? Then each of the n! orderings of an input of size n has probability 1/n!
[Figure: the Set S for n = 3, with probability 1/n! on each of the six inputs]
12Quicksort Average Case Analysis
- What is the Random Variable?
Runtime on that input
[Figure: each of the six inputs, with probability 1/n!, is mapped to its runtime]
t(left of pivot) + t(right of pivot) + linear stuff
13Quicksort Average Case Analysis
- What is the Random Variable?
Runtime on that input
Pivot Location
[Figure: each of the six inputs, with probability 1/n!, is mapped by the Random Variable, X to its runtime]
X = t(left of pivot) + t(right of pivot) + linear stuff
14Quicksort Average Case Analysis
- What is the Random Variable?
Runtime on that input
Pivot Location
[Figure: the six inputs, each with probability 1/n!; the pivot ends up 1st for two of them, 2nd for two, and 3rd for two]
X = t(left of pivot) + t(right of pivot) + linear stuff
15Quicksort Average Case Analysis
- What is the Random Variable?
Runtime on that input
Pivot Location
Where t(m) is the average time to sort an array of size m
[Figure: the six inputs, each with probability 1/n!, grouped by pivot location]
Pivot ends up 1st (two inputs): X = t(0) + t(2) + g(n)
Pivot ends up 2nd (two inputs): X = t(1) + t(1) + g(n)
Pivot ends up 3rd (two inputs): X = t(2) + t(0) + g(n)
16Quicksort Average Case Analysis
- What is the Probability Mass Function for X?
- The probability mass function of random variable X is just p(x) = Pr[X = x]
[Figure: the six inputs, each with probability 1/n!, grouped by their value of X; plot of p(x) against x]
p(x) = 1/n for each of x = t(0) + t(2) + g(n), t(1) + t(1) + g(n), t(2) + t(0) + g(n)
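The 1/n in this probability mass function can be checked by brute force. The sketch below assumes distinct keys, equally likely orderings, and a pivot taken from the first position; it enumerates all n! inputs and tallies where the pivot ends up. The function name is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial


def pivot_location_pmf(n):
    """PMF of the first element's final location, over all n! orderings."""
    counts = Counter()
    for perm in permutations(range(1, n + 1)):
        # Pivoting about perm[0] puts it at its rank: 1 + number of smaller keys.
        location = 1 + sum(1 for x in perm[1:] if x < perm[0])
        counts[location] += 1
    return {loc: Fraction(c, factorial(n)) for loc, c in sorted(counts.items())}


print(pivot_location_pmf(3))
# {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
```

For n = 3 each location collects two of the six inputs, so 2 · 1/n! = 1/3 = 1/n.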
17Quicksort Average Case Analysis
- What is the Expected, or Average, value of X?
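- The probability mass function of random variable X is just p(x) = Pr[X = x]
[Figure: the six inputs, each with probability 1/n!, grouped by their value of X; plot of p(x) against x]
With the pivot equally likely (probability 1/n) to end up in each location l = 1, ..., n:
$E(X) = \sum_{x} x \, p(x) = \frac{1}{n} \sum_{l=1}^{n} \left[ t(l-1) + t(n-l) + g(n) \right]$
Since t(n) is the average time to sort an array of size n, E(X) = t(n), so
$t(n) = g(n) + \frac{2}{n} \sum_{k=0}^{n-1} t(k)$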
20Quicksort Average Case Analysis
Is this difference equation Linear? Is it
Constant Coefficient? Do our solution methods
apply?
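One way to get a feel for the difference equation is to evaluate it directly. The sketch below assumes it has the standard form t(n) = g(n) + (2/n) Σ_{k<n} t(k) with t(0) = 0, and takes g(n) = n - 1 (the comparisons needed to partition n keys); both are assumptions, not values stated on the slides. It checks the recurrence against a brute-force average over all orderings for small n.

```python
from fractions import Fraction
from itertools import permutations


def t_recurrence(n_max):
    """t(n) = (n - 1) + (2/n) * sum_{k<n} t(k), with t(0) = 0."""
    t = [Fraction(0)]
    for n in range(1, n_max + 1):
        t.append((n - 1) + Fraction(2, n) * sum(t))
    return t


def comparisons(a):
    """Comparisons made by a first-element-pivot quicksort on distinct keys."""
    if len(a) <= 1:
        return 0
    p, rest = a[0], a[1:]
    left = [x for x in rest if x < p]
    right = [x for x in rest if x > p]
    return len(rest) + comparisons(left) + comparisons(right)


def t_brute_force(n):
    """Average of comparisons() over all n! equally likely orderings."""
    perms = list(permutations(range(n)))
    return Fraction(sum(comparisons(list(q)) for q in perms), len(perms))


t = t_recurrence(6)
for n in range(7):
    print(n, t[n], t_brute_force(n))
```

In this form the equation is linear in t, but the coefficient 2/n changes with n and t(n) depends on every earlier value, so it is not a constant-coefficient equation of fixed order.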
22Quicksort Average Case Analysis
Book claims the hidden constants are smaller
for Quicksort than for Heapsort or Mergesort
23Empirical Comparison
[Plot: Milliseconds vs. Size of Instance; averages computed from 50 random samples]
24Empirical Comparison
[Plot: Milliseconds vs. Size of Instance; averages computed from 50 random samples]
25Empirical Comparison
[Plot: Milliseconds (log scale) vs. Size of Instance; averages computed from 50 random samples]
26Empirical Comparison
[Plot: Milliseconds vs. Size of Instance; averages computed from 50 random samples]
27Worst Case in 50
[Plot: Milliseconds vs. Size of Instance; averages computed from 50 random samples]
Which has the better theoretical worst case? Practical worst case?
28Worst Case
[Plot: Milliseconds vs. Size of Instance; averages computed from 50 random samples]
29Worst Case Asymptotic Bound
[Plot: Milliseconds vs. Size of Instance; averages computed from 50 random samples]
30Why Randomize Quicksort?
- Avoiding pivoting around the first (or last) element yields a worst case instance that is not an already sorted list
- No matter how a pivot is (deterministically) chosen, though, there is an instance that generates the worst-case runtime
- An enemy could target the worst case instance! The average runtime depends strongly on how we assumed the probability measure is distributed over the algorithm domain, so by making the worst case instance more likely, the expected runtime approaches worst case
- If you scramble every input (or randomly choose a pivot), there is no instance that always yields the worst case runtime; the expected runtime becomes independent of the (unknown) probability measure on the algorithm domain
- The worst case is still possible (now, from every input), but it is also very unlikely from any input
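A minimal sketch of the "randomly choose a pivot" variant, in Python. The function name, the single-scan partition, and the choice to recurse only on the smaller side (which keeps the recursion shallow even on an unlucky run) are illustrative assumptions rather than details from the lecture.

```python
import random


def randomized_quicksort(t, i=0, j=None, rng=random):
    """Sort t[i..j] in place, pivoting about a uniformly random element."""
    if j is None:
        j = len(t) - 1
    while i < j:
        c = rng.randint(i, j)            # random pivot: no fixed input is always bad
        t[i], t[c] = t[c], t[i]
        p, l = t[i], i
        for k in range(i + 1, j + 1):
            if t[k] < p:
                l += 1
                t[l], t[k] = t[k], t[l]
        t[i], t[l] = t[l], t[i]
        # Recurse on the smaller side, loop on the larger one.
        if l - i < j - l:
            randomized_quicksort(t, i, l - 1, rng)
            i = l + 1
        else:
            randomized_quicksort(t, l + 1, j, rng)
            j = l - 1


data = list(range(2000))                 # already sorted: bad for a first-element pivot
randomized_quicksort(data)
print(data[:5], data[-5:])
```

Passing `rng=random.Random(seed)` makes a run reproducible, at the cost of handing a determined adversary a fixed pivot sequence again.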