Title: 2IL05 Data Structures 2IL06 Introduction to Algorithms
Spring 2009, Lecture 5: QuickSort, Selection
QuickSort
One more sorting algorithm
Sorting algorithms
- Input: a sequence of n numbers $\langle a_1, a_2, \dots, a_n \rangle$
- Output: a permutation $\langle a_{i_1}, a_{i_2}, \dots, a_{i_n} \rangle$ of the input such that $a_{i_1} \le a_{i_2} \le \dots \le a_{i_n}$
- Important properties of sorting algorithms:
- running time: how fast is the algorithm in the worst case
- in place: only a constant number of input elements are ever stored outside the input array
- Worst-case running time, and whether the algorithm sorts in place:
- InsertionSort: $\Theta(n^2)$, in place: yes
- MergeSort: $\Theta(n \log n)$, in place: no
- HeapSort: $\Theta(n \log n)$, in place: yes
- QuickSort: $\Theta(n^2)$, in place: yes
QuickSort
- Why QuickSort?
- Expected running time $\Theta(n \log n)$ (randomized QuickSort)
- Constants hidden in $\Theta(n \log n)$ are small
- Using linear-time median finding to guarantee a good pivot gives worst-case running time $\Theta(n \log n)$
QuickSort
- QuickSort is a divide-and-conquer algorithm
- To sort the subarray A[p..r]:
- Divide: partition A[p..r] into two subarrays A[p..q-1] and A[q+1..r], such that each element in A[p..q-1] is ≤ A[q] and A[q] is < each element in A[q+1..r]
- Conquer: sort the two subarrays by recursive calls to QuickSort
- Combine: no work is needed to combine the subarrays, since they are sorted in place
- Divide using a procedure Partition, which returns q
QuickSort
QuickSort(A, p, r)
  if p < r
    then q ← Partition(A, p, r)
         QuickSort(A, p, q-1)
         QuickSort(A, q+1, r)
Partition(A, p, r)
  x ← A[r]
  i ← p-1
  for j ← p to r-1
    do if A[j] ≤ x
         then i ← i+1
              exchange A[i] ↔ A[j]
  exchange A[i+1] ↔ A[r]
  return i+1
- Initial call: QuickSort(A, 1, n)
- Partition always selects A[r] as the pivot (the element around which to partition)
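A minimal Python sketch of the two procedures above, assuming a Python list and 0-based inclusive indices p..r, so the initial call QuickSort(A, 1, n) becomes `quicksort(a, 0, len(a) - 1)`. The function names are illustrative; the structure follows the pseudocode line by line.

```python
def partition(a: list, p: int, r: int) -> int:
    """Partition a[p..r] (inclusive) around the pivot x = a[r].

    Afterwards a[p..q-1] <= x, a[q] == x and a[q+1..r] > x; returns q.
    """
    x = a[r]
    i = p - 1
    for j in range(p, r):              # j = p .. r-1
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]    # move the pivot between the two regions
    return i + 1


def quicksort(a: list, p: int, r: int) -> None:
    """Sort a[p..r] (inclusive) in place."""
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)


data = [2, 8, 7, 1, 3, 5, 6, 4]
quicksort(data, 0, len(data) - 1)      # 0-based analogue of QuickSort(A, 1, n)
print(data)  # [1, 2, 3, 4, 5, 6, 7, 8]
```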
Partition
- As Partition executes, the array is partitioned into four regions (some may be empty); the fourth region, A[j..r-1], holds the entries not yet examined
- Loop invariant:
1. all entries in A[p..i] are ≤ pivot
2. all entries in A[i+1..j-1] are > pivot
3. A[r] = pivot
Partition - Correctness
- Initialization: before the loop starts, all conditions are satisfied, since x = A[r] is the pivot and the two subarrays A[p..i] and A[i+1..j-1] are empty
- Maintenance: while the loop is running, if A[j] ≤ pivot, then A[j] and A[i+1] are swapped and then i and j are incremented ⇒ 1. and 2. hold. If A[j] > pivot, then only j is incremented ⇒ 1. and 2. hold.
- Termination: when the loop terminates, j = r, so every element of A[p..r] is in one of three regions: A[p..i] ≤ pivot, A[i+1..r-1] > pivot, and A[r] = pivot
- The final exchange A[i+1] ↔ A[r] moves the pivot between the two subarrays, and Partition returns its position i+1
- Running time: $\Theta(n)$ for an n-element subarray
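The loop invariant can also be checked mechanically. This hypothetical `partition_checked` is the same Partition sketched in Python (0-based, inclusive p..r) with assertions for conditions 1-3 at the start of every iteration and for the termination case. The checks make it quadratic, so it is meant for experimentation only, not as a $\Theta(n)$ implementation.

```python
def partition_checked(a: list, p: int, r: int) -> int:
    """Partition a[p..r] around a[r], asserting the loop invariant each iteration."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        # Invariant at the start of each iteration:
        assert all(v <= x for v in a[p:i + 1])    # 1. a[p..i] <= pivot
        assert all(v > x for v in a[i + 1:j])     # 2. a[i+1..j-1] > pivot
        assert a[r] == x                          # 3. a[r] == pivot
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    # Termination: j == r, so a[p..r-1] is split into the <= and > regions.
    assert all(v <= x for v in a[p:i + 1]) and all(v > x for v in a[i + 1:r])
    a[i + 1], a[r] = a[r], a[i + 1]               # place the pivot between them
    return i + 1


a = [2, 8, 7, 1, 3, 5, 6, 4]
q = partition_checked(a, 0, len(a) - 1)
print(q, a)  # 3 [2, 1, 3, 4, 7, 5, 6, 8]
```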
QuickSort running time
- Running time depends on the partitioning of the subarrays:
- if they are balanced, then QuickSort is as fast as MergeSort
- if they are unbalanced, then QuickSort can be as slow as InsertionSort
- Worst case
- subarrays completely unbalanced: 0 elements in one, n-1 in the other
- $T(n) = T(n-1) + T(0) + \Theta(n) = T(n-1) + \Theta(n) = \Theta(n^2)$
- occurs, for example, when the input array is already sorted
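To see the worst case concretely, the following sketch counts the comparisons A[j] ≤ x made by QuickSort (same Partition, 0-based) on an already sorted input. Each call on m elements does m - 1 comparisons and leaves an empty right part, so the total should be $(n-1) + (n-2) + \dots + 1 = n(n-1)/2$. The helper name `quicksort_comparisons` is hypothetical, and n is kept small because the recursion depth equals n on this input.

```python
def quicksort_comparisons(a) -> int:
    """Sort a copy of a with QuickSort (pivot = last element) and
    return the number of comparisons a[j] <= x performed by Partition."""
    a = list(a)
    count = 0

    def partition(p: int, r: int) -> int:
        nonlocal count
        x = a[r]
        i = p - 1
        for j in range(p, r):
            count += 1                     # one comparison a[j] <= x
            if a[j] <= x:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[r] = a[r], a[i + 1]
        return i + 1

    def quicksort(p: int, r: int) -> None:
        if p < r:
            q = partition(p, r)
            quicksort(p, q - 1)
            quicksort(q + 1, r)

    quicksort(0, len(a) - 1)
    return count


n = 100
print(quicksort_comparisons(range(n)))     # 4950 = n(n-1)/2
```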
- Best case
- subarrays completely balanced: each has at most n/2 elements
- $T(n) = 2T(n/2) + \Theta(n) = \Theta(n \log n)$
- Average?
- Average running time is much closer to the best case than to the worst case
- Intuition: imagine that Partition always produces a 9-to-1 split
- $T(n) = T(9n/10) + T(n/10) + \Theta(n)$
Recursion tree for $T(n) = T(9n/10) + T(n/10) + \Theta(n)$
- Remember Section 4.2 (or Lecture 2)
- $\log_{10} n$ full levels, $\log_{10/9} n$ non-empty levels
- the base of the log does not matter in asymptotic notation (as long as it is constant)
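Under the simplifying assumption that the non-recursive cost of a call on m elements is exactly cm, the recursion tree gives matching lower and upper bounds:

```latex
\begin{align*}
\text{cost of every level} &\le cn, \qquad \text{the first } \log_{10} n \text{ levels are full and cost exactly } cn,\\
T(n) &\ge cn \log_{10} n = \Omega(n \log n),\\
T(n) &\le cn \left(\log_{10/9} n + 1\right) = O(n \log n),\\
\Rightarrow\ T(n) &= \Theta(n \log n).
\end{align*}
```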
- ⇒ $T(n) = \Theta(n \log n)$
- Any split of constant proportionality yields a recursion tree of depth $\Theta(\log n)$
- But splits will not always be constant: there will be a mix of good and bad splits
- More intuition: mixing good and bad splits does not affect the asymptotic running time
- assume levels alternate between best-case and worst-case splits
- the extra levels add only to the hidden constant; in both cases the running time is $O(n \log n)$
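One way to make the alternating-levels argument precise, in the same cost model: a bad split followed by a good split costs asymptotically no more than a single good split.

```latex
\begin{align*}
\text{bad split of } n \text{ elements} &:\ \Theta(n) \text{ work, subarrays of size } 0 \text{ and } n-1,\\
\text{good split of the } n-1 \text{ elements} &:\ \Theta(n-1) \text{ work, subarrays of size at most } (n-1)/2,\\
\text{both levels together} &:\ \Theta(n) + \Theta(n-1) = \Theta(n) \text{ work, subarrays of size at most } n/2.
\end{align*}
```

Every two levels therefore at least halve the subproblem size while each level still costs $O(n)$, so the depth is at most $2\log_2 n$ and the total stays $O(n \log n)$.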
Randomized QuickSort
- pick the pivot at random
RandomizedPartition(A, p, r)
  i ← Random(p, r)
  exchange A[r] ↔ A[i]
  return Partition(A, p, r)
- a random pivot results in a reasonably balanced split on average ⇒ expected running time $\Theta(n \log n)$
- see the book for a detailed analysis
- alternative: use linear-time median finding to find a good pivot ⇒ worst-case running time $\Theta(n \log n)$; price to pay: added complexity
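A Python sketch of randomized QuickSort, assuming 0-based inclusive indices. `random.Random.randint(p, r)` stands in for Random(p, r), since both include the two endpoints. Passing an explicit `rng` object is a design choice of this sketch so that runs can be reproduced with a seed.

```python
import random


def partition(a: list, p: int, r: int) -> int:
    """Partition a[p..r] around the pivot a[r]; return the pivot's final index."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_partition(a: list, p: int, r: int, rng: random.Random) -> int:
    i = rng.randint(p, r)                # Random(p, r): uniform over p..r inclusive
    a[r], a[i] = a[i], a[r]              # move the random pivot to position r
    return partition(a, p, r)


def randomized_quicksort(a: list, p: int, r: int, rng: random.Random) -> None:
    if p < r:
        q = randomized_partition(a, p, r, rng)
        randomized_quicksort(a, p, q - 1, rng)
        randomized_quicksort(a, q + 1, r, rng)


rng = random.Random(2009)                # fixed seed only for reproducible runs
data = list(range(1000))                 # sorted input: worst case for plain QuickSort
randomized_quicksort(data, 0, len(data) - 1, rng)
print(data == list(range(1000)))         # True
```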
Selection
Medians and Order Statistics
Definitions
- ith order statistic: the ith smallest of a set of n elements
- minimum: 1st order statistic
- maximum: nth order statistic
- median: halfway point
- n odd ⇒ unique median at $i = (n+1)/2$
- n even ⇒ lower median at $i = n/2$, upper median at $i = n/2 + 1$
- here, median means lower median
The selection problem
- Input: a set A of n distinct numbers and a number i, with $1 \le i \le n$
- Output: the element $x \in A$ that is larger than exactly $i-1$ other elements in A (the ith smallest element of A)
- Easy solution:
- sort the input in $\Theta(n \log n)$ time
- return the ith element in the sorted array
- This can be done faster; we start with the minimum and maximum
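The easy solution, sketched in Python with a 1-based i as on the slides and a 0-based list underneath. It also shows that the lower median from the definitions is always the order statistic $i = \lfloor (n+1)/2 \rfloor$, which covers both the odd and the even case. The function names are illustrative.

```python
def select_by_sorting(a, i: int):
    """Return the ith smallest element of a (1 <= i <= n) by sorting: Theta(n log n)."""
    if not 1 <= i <= len(a):
        raise ValueError("need 1 <= i <= n")
    return sorted(a)[i - 1]              # the sorted list is 0-based


def lower_median(a):
    """i = (n+1)/2 for odd n and i = n/2 for even n, i.e. i = floor((n+1)/2)."""
    return select_by_sorting(a, (len(a) + 1) // 2)


print(select_by_sorting([7, 2, 9, 4], 3))  # 7
print(lower_median([7, 2, 9, 4]))          # 4
print(lower_median([7, 2, 9]))             # 7
```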
Minimum and maximum
- Find the minimum with $n-1$ comparisons: examine each element in turn and keep track of the smallest one
- Is this the best we can do? Yes: each element (except the minimum) must be compared to a smaller element at least once
Minimum(A, n)
  min ← A[1]
  for i ← 2 to n
    do if min > A[i]
         then min ← A[i]
  return min
- Find the maximum by replacing > with <
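Minimum(A, n) in Python, assuming a non-empty list. The comparison counter in the returned tuple is an addition for illustration; it confirms the count of $n-1$.

```python
def minimum(a) -> tuple:
    """Return (smallest element, number of comparisons), following Minimum(A, n)."""
    smallest = a[0]                      # min <- A[1]
    comparisons = 0
    for v in a[1:]:                      # for i <- 2 to n
        comparisons += 1
        if smallest > v:
            smallest = v
    return smallest, comparisons


print(minimum([5, 3, 8, 1, 9]))  # (1, 4)
```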
Simultaneous minimum and maximum
- Assume we need to find both the minimum and the maximum
- Easy solution: find both separately ⇒ $2n-2$ comparisons ⇒ $\Theta(n)$ time
- But at most $3\lfloor n/2 \rfloor$ comparisons are needed:
- maintain the minimum and maximum seen so far
- don't compare elements to the minimum and maximum separately; process them in pairs
- compare the elements of each pair to each other, then compare the larger to the maximum and the smaller to the minimum
- ⇒ 3 comparisons for every 2 elements
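A sketch of the pairwise method, assuming a non-empty list. For odd n the first element initializes both the minimum and the maximum; for even n one comparison orders the first pair. Every further pair then costs exactly 3 comparisons, giving $3\lfloor n/2 \rfloor$ comparisons for odd n and $3n/2 - 2$ for even n. The comparison counter is only there to check this bound.

```python
def min_max(a) -> tuple:
    """Return (minimum, maximum, comparisons), processing elements in pairs."""
    n = len(a)
    comparisons = 0
    if n % 2 == 1:                       # odd n: first element is both min and max
        lo = hi = a[0]
        start = 1
    else:                                # even n: one comparison orders the first pair
        comparisons += 1
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    for k in range(start, n - 1, 2):
        x, y = a[k], a[k + 1]
        comparisons += 1                 # compare the pair with each other
        small, large = (x, y) if x < y else (y, x)
        comparisons += 1                 # smaller one against the minimum
        if small < lo:
            lo = small
        comparisons += 1                 # larger one against the maximum
        if large > hi:
            hi = large
    return lo, hi, comparisons


print(min_max([3, 9, 1, 7, 4, 8, 2]))     # (1, 9, 9)
print(min_max([3, 9, 1, 7, 4, 8, 2, 6]))  # (1, 9, 10)
```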
The selection problem
- Theorem: The ith smallest element of A can be found in $O(n)$ time in the worst case.
- Idea:
- partition the input array, recurse on one side of the split
- guarantee a good split
- use Partition with a designated pivot element
Selection in worst-case linear time
- Example: A = ⟨20, 3, 10, 8, 14, 6, 12, 9, 11, 18, 7, 4, 5, 17, 15, 1, 2, 13⟩ (n = 18), i = 12
1. Divide the n elements into groups of 5 ⇒ $\lceil n/5 \rceil$ groups
2. Find the median of each of the $\lceil n/5 \rceil$ groups (sort each group of 5 elements in constant time and simply pick the median)
3. Find the median x of the $\lceil n/5 \rceil$ medians recursively
4. Partition the array around x ⇒ x is the kth element after partitioning
5. If i = k, return x. If i < k, recursively find the ith smallest element on the low side. If i > k, recursively find the (i-k)th smallest element on the high side.
- In the example, the recursion continues on the high side with i = 5
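A compact Python sketch of the five steps, assuming distinct numbers as in the problem statement and groups formed from consecutive elements. For readability it builds new lists for the low and high sides instead of calling an in-place Partition with a designated pivot, and it switches to sorting for inputs of at most 5 elements; both are simplifications of this sketch, not part of the slides. On the example input it takes the path described above (i = 12, then i = 5 on the high side).

```python
def select(a: list, i: int):
    """Return the ith smallest (1-based) of the distinct numbers in a; worst-case O(n)."""
    if len(a) <= 5:                                  # small input: sort directly
        return sorted(a)[i - 1]
    # Steps 1-2: groups of 5 and the (lower) median of each group.
    groups = [a[k:k + 5] for k in range(0, len(a), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Step 3: median x of the medians, found recursively (lower median).
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around x; x is the kth smallest element.
    low = [v for v in a if v < x]
    high = [v for v in a if v > x]
    k = len(low) + 1
    # Step 5: return x, or recurse on one side only.
    if i == k:
        return x
    if i < k:
        return select(low, i)
    return select(high, i - k)


A = [20, 3, 10, 8, 14, 6, 12, 9, 11, 18, 7, 4, 5, 17, 15, 1, 2, 13]
print(select(A, 12))  # 12
```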
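Analysis
- How many elements are larger than x?
- At least half of the medians found in step 2 are ≥ x
- The groups of these medians contain at least 3 elements each that are > x (discounting x's own group and the last group, which may have fewer than 5 elements)
- ⇒ at least $3\left(\left\lceil \frac{1}{2}\left\lceil \frac{n}{5} \right\rceil \right\rceil - 2\right) \ge \frac{3n}{10} - 6$ elements are > x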
- Symmetrically, at least $3n/10 - 6$ elements are < x
- ⇒ the algorithm recurses on at most $7n/10 + 6$ elements
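The counting argument can be checked empirically. This hypothetical helper computes x as the lower median of the group medians (using sorting, so it checks only the bound, not the running time) and counts the elements on each side of x. The loop asserts that both counts are at least $3n/10 - 6$ for every n below 400, on a scrambled sequence of distinct numbers.

```python
def count_around_median_of_medians(a: list) -> tuple:
    """Return (#elements < x, #elements > x) for x = lower median of the group medians.

    Sorting is used to find x: this checks the counting argument only.
    """
    groups = [a[k:k + 5] for k in range(0, len(a), 5)]
    medians = sorted(sorted(g)[(len(g) - 1) // 2] for g in groups)
    x = medians[(len(medians) + 1) // 2 - 1]        # lower median of the medians
    smaller = sum(1 for v in a if v < x)
    larger = sum(1 for v in a if v > x)
    return smaller, larger


for n in range(1, 400):
    a = [(37 * k) % 401 for k in range(n)]          # n distinct numbers, scrambled
    smaller, larger = count_around_median_of_medians(a)
    assert smaller >= 3 * n / 10 - 6 and larger >= 3 * n / 10 - 6
```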
Analysis
1. Divide the n elements into groups of 5 ⇒ $\lceil n/5 \rceil$ groups: $O(n)$
2. Find the median of each of the $\lceil n/5 \rceil$ groups (sort each group of 5 elements in constant time and simply pick the median): $O(n)$
3. Find the median x of the $\lceil n/5 \rceil$ medians recursively: $T(\lceil n/5 \rceil)$
4. Partition the array around x: $O(n)$
5. If i = k, return x. If i < k, recursively find the ith smallest element on the low side. If i > k, recursively find the (i-k)th smallest element on the high side: $T(7n/10 + 6)$
- $T(n) \le T(\lceil n/5 \rceil) + T(7n/10 + 6) + O(n)$ for $n \ge 140$, and $T(n) = O(1)$ for small n (n < 140)
Solving the recurrence
- Solve by substitution
- Inductive hypothesis: $T(n) \le cn$ for some constant c and all $n > 0$
- assume that c is large enough such that $T(n) \le cn$ for all $n < 140$
- pick a constant a such that the $O(n)$ term is at most $an$ for all $n > 0$
- $T(n) \le c\lceil n/5 \rceil + c(7n/10 + 6) + an$
- $\le cn/5 + c + 7cn/10 + 6c + an$
- $= 9cn/10 + 7c + an$
- $= cn + (-cn/10 + 7c + an)$
- remains to show: $-cn/10 + 7c + an \le 0$
- $-cn/10 + 7c + an \le 0$
- $\iff cn/10 - 7c \ge an$
- $\iff cn - 70c \ge 10an$
- $\iff c(n - 70) \ge 10an$
- $\iff c \ge 10a\,(n/(n-70))$ (for $n > 70$)
- $n \ge 140 \Rightarrow n/(n-70) \le 2$
- $\Rightarrow 20a \ge 10a\,(n/(n-70))$
- choose $c \ge 20a \Rightarrow T(n) = O(n)$
Why 140? Any integer > 70 would have worked (with a correspondingly larger c).
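A quick numerical sanity check of the substitution proof, under hypothetical concrete choices: the $O(n)$ term is exactly $an$ with a = 1, $T(n) = a$ for n < 140, and the arguments are rounded to $\lceil n/5 \rceil$ and $\lfloor 7n/10 \rfloor + 6$. The table is filled bottom-up, which works because both arguments are smaller than n once $n \ge 140$.

```python
def recurrence_table(n_max: int, a: int = 1) -> list:
    """T[n] for T(n) = T(ceil(n/5)) + T(floor(7n/10) + 6) + a*n when n >= 140,
    and T(n) = a for 1 <= n < 140 (a hypothetical concrete base case)."""
    T = [0] * (n_max + 1)                # T[0] is unused
    for n in range(1, n_max + 1):
        if n < 140:
            T[n] = a
        else:
            # -(-n // 5) is ceil(n/5); both arguments are < n here
            T[n] = T[-(-n // 5)] + T[7 * n // 10 + 6] + a * n
    return T


T = recurrence_table(50_000)
print(all(T[n] <= 20 * n for n in range(1, len(T))))  # True: c = 20a suffices
```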
Selection
- Theorem: The ith smallest element of A can be found in $O(n)$ time in the worst case.
- Does not require any assumptions on the input
- Is not in conflict with the $\Omega(n \log n)$ lower bound for comparison-based sorting, since it does not sort
- Randomized Selection: pick a pivot at random
- Theorem: The ith smallest element of A can be found in $O(n)$ expected time.
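A sketch of randomized selection under the same assumptions as the deterministic sketch (distinct numbers, 1-based i, new lists instead of in-place partitioning). It is written as a loop rather than recursion, since only one side is ever kept. `rng.randrange` picks the pivot uniformly at random, and the seeded `random.Random` object is only there for reproducibility.

```python
import random


def randomized_select(a: list, i: int, rng: random.Random):
    """Return the ith smallest (1-based) of the distinct numbers in a; expected O(n) time."""
    while True:
        if len(a) == 1:
            return a[0]
        x = a[rng.randrange(len(a))]     # pivot chosen uniformly at random
        low = [v for v in a if v < x]
        high = [v for v in a if v > x]
        k = len(low) + 1                 # rank of the pivot
        if i == k:
            return x
        if i < k:
            a = low                      # continue on the low side only
        else:
            a, i = high, i - k           # continue on the high side with rank i - k


rng = random.Random(5)
example = [20, 3, 10, 8, 14, 6, 12, 9, 11, 18, 7, 4, 5, 17, 15, 1, 2, 13]
print(randomized_select(example, 12, rng))  # 12
```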
Tutorials this week
- Small tutorials on Tuesday 34.
- No Wednesday 78 big tutorial.
- Small tutorial Friday 78.