Title: Selection --Medians and Order Statistics (Chap. 9)
- The ith order statistic of n elements $S = \{a_1, a_2, \ldots, a_n\}$: the ith smallest element
- Also called the selection problem
- Minimum and maximum
- Median, lower median, upper median
- Selection in expected/average linear time
- Selection in worst-case linear time
O(n lg n) Algorithm
- Suppose the n elements are sorted by an O(n lg n) algorithm, e.g., MERGE-SORT
- Minimum: the first element
- Maximum: the last element
- The ith order statistic: the ith element
- Median:
  - If n is odd, the $((n+1)/2)$th element
  - If n is even, the $\lfloor (n+1)/2 \rfloor$th element is the lower median and the $\lceil (n+1)/2 \rceil$th element is the upper median
- After sorting, each selection takes O(1) time, so the total is O(n lg n)
- Can we do better?
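A minimal Python sketch of the sort-then-index approach, using 1-indexed ranks as in the slides. The function names are hypothetical, and Python's built-in `sorted` uses Timsort (a merge-sort hybrid) rather than MERGE-SORT itself, though it is also O(n lg n).

```python
def order_statistic_by_sorting(a, i):
    """Return the i-th smallest element (1-indexed) by sorting first: O(n lg n)."""
    if not 1 <= i <= len(a):
        raise ValueError("i must satisfy 1 <= i <= n")
    s = sorted(a)          # O(n lg n) sort
    return s[i - 1]        # O(1) lookup once sorted


def medians_by_sorting(a):
    """Return (lower median, upper median); they coincide when n is odd."""
    n = len(a)
    s = sorted(a)
    lower = s[(n + 1) // 2 - 1]   # floor((n+1)/2)-th smallest
    upper = s[(n + 2) // 2 - 1]   # ceil((n+1)/2)-th smallest
    return lower, upper


data = [7, 2, 9, 4, 1, 8]
print(order_statistic_by_sorting(data, 1))          # minimum: 1
print(order_statistic_by_sorting(data, len(data)))  # maximum: 9
print(medians_by_sorting(data))                     # (4, 7)
```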
Selection in Expected Linear Time O(n)
- Select the ith element
- A divide-and-conquer algorithm: RANDOMIZED-SELECT
- Like quicksort, it partitions the input array recursively
- Unlike quicksort, which works on both sides of the partition, it works on only one side
- This is called prune-and-search: prune one side and search only the other
- (Please review quicksort in Chapter 7.)
RANDOMIZED-SELECT(A, p, r, i)
- if p = r then return A[p]
- q ← RANDOMIZED-PARTITION(A, p, r)
- // now each element of A[p..q-1] ≤ A[q] ≤ each element of A[q+1..r]
- k ← q - p + 1   // number of elements in A[p..q]
- if i = k then return A[q]
- else if i < k
  - then return RANDOMIZED-SELECT(A, p, q-1, i)
  - else return RANDOMIZED-SELECT(A, q+1, r, i-k)
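A direct Python transcription of the pseudocode, offered as a sketch. RANDOMIZED-PARTITION is not shown on this slide; here it is assumed to be a Lomuto-style partition around a uniformly random pivot, in the spirit of Chapter 7. Arrays are 0-indexed, but the rank i stays 1-indexed as in the pseudocode.

```python
import random


def randomized_partition(A, p, r):
    """Lomuto-style partition of A[p..r] around a uniformly random pivot.

    Returns q such that A[p..q-1] <= A[q] <= A[q+1..r].
    """
    j = random.randint(p, r)            # random pivot index in [p, r]
    A[j], A[r] = A[r], A[j]             # move the pivot to the end
    pivot = A[r]
    store = p - 1                       # end of the "<= pivot" region
    for m in range(p, r):
        if A[m] <= pivot:
            store += 1
            A[store], A[m] = A[m], A[store]
    A[store + 1], A[r] = A[r], A[store + 1]  # put the pivot in its final place
    return store + 1


def randomized_select(A, p, r, i):
    """Return the i-th smallest element (1-indexed) of A[p..r]; A is reordered."""
    if p == r:
        return A[p]
    q = randomized_partition(A, p, r)
    k = q - p + 1                       # rank of the pivot within A[p..r]
    if i == k:
        return A[q]
    elif i < k:
        return randomized_select(A, p, q - 1, i)
    else:
        return randomized_select(A, q + 1, r, i - k)


data = [3, 1, 4, 1, 5, 9, 2, 6]
print(randomized_select(data[:], 0, len(data) - 1, 3))  # 3rd smallest: 2
```

Passing a copy (`data[:]`) keeps the caller's list in its original order, since the partition step rearranges elements in place.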
Analysis of RANDOMIZED-SELECT
- Worst-case running time is $\Theta(n^2)$. Why? The algorithm may be unlucky and always partition around an A[q] that leaves one side empty and the other side with all the remaining elements. Each partitioning of m elements takes $\Theta(m)$ time, with $m = n, n-1, \ldots, 2$, so the total is $\Theta(n) + \Theta(n-1) + \cdots + \Theta(2) = \Theta(n(n+1)/2 - 1) = \Theta(n^2)$. Moreover, because of the randomization, no particular input elicits the worst-case behavior.
- On average, however, it performs well: using probabilistic analysis with random variables, one can prove that the expected running time is O(n) (ref. page 187).
- Can we do better, i.e., achieve O(n) in the worst case?
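To make the worst-case sum concrete, this small sketch adds up the partition sizes n, n-1, ..., 2 for the unlucky case described above and compares the result with the closed form n(n+1)/2 - 1. It models only that scenario; it does not simulate random pivots, and `unlucky_partition_work` is a hypothetical name.

```python
def unlucky_partition_work(n):
    """Total elements scanned when every partition of m elements leaves one
    side empty and m - 1 elements on the other, for m = n, n-1, ..., 2."""
    return sum(m for m in range(2, n + 1))


for n in (10, 100, 1000):
    closed_form = n * (n + 1) // 2 - 1
    print(n, unlucky_partition_work(n), closed_form)
# 10 54 54
# 100 5049 5049
# 1000 500499 500499
```

The totals grow roughly as n²/2, which is the $\Theta(n^2)$ worst case.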
Selection in Worst-Case Linear Time O(n)
- Select the ith smallest element of $S = \{a_1, a_2, \ldots, a_n\}$
- Use the so-called prune-and-search technique
- Let $x \in S$, and partition S into three subsets:
  - $S_1 = \{a_j \mid a_j < x\}$, $S_2 = \{a_j \mid a_j = x\}$, $S_3 = \{a_j \mid a_j > x\}$
- If $|S_1| \ge i$, search for the ith smallest element in $S_1$ recursively (prune $S_2$ and $S_3$ away)
- Else if $|S_1| + |S_2| \ge i$, return x (the ith smallest element)
- Else search for the $(i - (|S_1| + |S_2|))$th smallest element in $S_3$ recursively (prune $S_1$ and $S_2$ away)
- The question is how to select x so that $|S_1|$ and $|S_3|$ are nearly equal
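The prune-and-search step can be sketched in Python with the choice of x left as a parameter; `choose_pivot` is a hypothetical hook, not part of the slides. Correctness does not depend on which x is chosen, only the running time does, which is why the next slide focuses on choosing x. The recursion is written as a loop because each step continues on a single subset.

```python
def prune_and_search(S, i, choose_pivot):
    """Return the i-th smallest (1-indexed) element of list S.

    choose_pivot(S) may return any element of S; only the running time,
    not the result, depends on that choice.
    """
    while True:
        x = choose_pivot(S)
        S1 = [a for a in S if a < x]
        S2 = [a for a in S if a == x]
        S3 = [a for a in S if a > x]
        if len(S1) >= i:                  # answer is in S1: prune S2 and S3
            S = S1
        elif len(S1) + len(S2) >= i:      # x itself is the i-th smallest
            return x
        else:                             # answer is in S3: prune S1 and S2
            i -= len(S1) + len(S2)
            S = S3


print(prune_and_search([5, 3, 8, 3, 9, 1], 3, lambda s: s[0]))  # 3
```

Since $S_2$ always contains x, each iteration discards at least one element, so the loop terminates for any pivot rule.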
The Way to Select x
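- Divide the elements into $\lceil n/5 \rceil$ groups of 5 elements each (the last group may have fewer). Find the median of each group, then find the median x of these medians.
- At least $3n/10 - 6$ elements are < x, and at least $3n/10 - 6$ elements are > x (assuming distinct elements)
- Because at least $\lceil \frac{1}{2} \lceil n/5 \rceil \rceil - 2$ groups (excluding the group containing x and the possibly incomplete last group) each contribute 3 elements that are > x, and symmetrically 3 elements that are < x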
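A sketch of the pivot rule above, assuming distinct elements and taking the lower median of each group and of the medians. To keep it short it sorts the list of medians directly instead of calling SELECT recursively, so it illustrates the quality of the pivot, not the linear running time. The loop counts how many elements fall strictly below and above x; by the argument above, both counts should be at least 3n/10 - 6.

```python
import random


def median_of_medians_pivot(S):
    """Split S into groups of at most 5, take each group's lower median,
    then return the lower median of those medians.

    Sorting the medians is a shortcut for illustration; SELECT would find
    this median recursively instead.
    """
    groups = [S[j:j + 5] for j in range(0, len(S), 5)]       # ceil(n/5) groups
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]  # median of each group
    return sorted(medians)[(len(medians) - 1) // 2]           # median of medians


random.seed(0)
for n in (50, 200, 1000):
    S = random.sample(range(10 * n), n)    # n distinct elements
    x = median_of_medians_pivot(S)
    below = sum(1 for a in S if a < x)
    above = sum(1 for a in S if a > x)
    print(n, below, above, 3 * n / 10 - 6)
```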
SELECT the ith Element in n Elements
- Step 1: Divide the n elements into $\lceil n/5 \rceil$ groups of 5 elements (the last group may have fewer)
- Step 2: Find the median of each group
- Step 3: Use SELECT recursively to find the median x of the $\lceil n/5 \rceil$ medians above
- Step 4: Partition the n elements around x into $S_1$, $S_2$, and $S_3$
- Step 5:
  - If $|S_1| \ge i$, search for the ith smallest element in $S_1$ recursively
  - Else if $|S_1| + |S_2| \ge i$, return x (the ith smallest element)
  - Else search for the $(i - (|S_1| + |S_2|))$th smallest element in $S_3$ recursively
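One possible Python rendering of the five steps, as a sketch rather than a tuned implementation. Assumptions not in the slides: the lower median is used for even-sized groups and for the median of medians, lists of at most 5 elements are handled by sorting, and the three-way partition builds new lists instead of partitioning in place. The base-case size should only affect constant factors; the 140 in the analysis is a constant for the proof, not a required cutoff.

```python
def select(S, i):
    """Worst-case linear-time selection (median of medians).

    Returns the i-th smallest (1-indexed) element of list S; S is not modified.
    """
    if len(S) <= 5:                                  # small input: sort directly
        return sorted(S)[i - 1]
    # Steps 1-2: groups of 5 and the (lower) median of each group
    groups = [S[j:j + 5] for j in range(0, len(S), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Step 3: recursively find the (lower) median x of the medians
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: three-way partition around x
    S1 = [a for a in S if a < x]
    S2 = [a for a in S if a == x]
    S3 = [a for a in S if a > x]
    # Step 5: recurse on one side only
    if len(S1) >= i:
        return select(S1, i)
    if len(S1) + len(S2) >= i:
        return x
    return select(S3, i - len(S1) - len(S2))


print(select([9, 1, 8, 2, 7, 3, 6, 4, 5, 0], 4))  # 3
```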
Analysis of SELECT (cont.)
- Steps 1, 2, and 4 take O(n) time
- Step 3 takes $T(\lceil n/5 \rceil)$ time
- Now consider step 5:
  - At least half of the medians found in step 2 are $\ge x$, so at least $\lceil \frac{1}{2} \lceil n/5 \rceil \rceil - 2$ groups contribute 3 elements that are $\ge x$, i.e., at least $3\left(\lceil \frac{1}{2} \lceil n/5 \rceil \rceil - 2\right) \ge 3n/10 - 6$ elements
  - Similarly, at least $3n/10 - 6$ elements are $\le x$
  - Thus $|S_1| \le n - (3n/10 - 6) = 7n/10 + 6$, and similarly for $|S_3|$
  - Thus SELECT in step 5 is called recursively on at most $7n/10 + 6$ elements
- The recurrence, for some constant threshold (e.g., 140), is
$$T(n) \le \begin{cases} O(1) & \text{if } n < 140, \\ T(\lceil n/5 \rceil) + T(7n/10 + 6) + O(n) & \text{if } n \ge 140. \end{cases}$$
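The counting bound can be checked exactly for many values of n with a short script (a sanity check, not a proof). `guaranteed_count` is a hypothetical helper that computes $3(\lceil \frac{1}{2}\lceil n/5 \rceil \rceil - 2)$ with integer ceilings, and `Fraction` keeps 3n/10 exact so no floating-point rounding is involved.

```python
from fractions import Fraction


def guaranteed_count(n):
    """3 * (ceil(ceil(n/5) / 2) - 2): elements guaranteed to be >= x (or <= x)."""
    groups = -(-n // 5)                  # ceil(n / 5) with integer arithmetic
    return 3 * (-(-groups // 2) - 2)     # 3 * (ceil(groups / 2) - 2)


# The slide's inequality, checked exactly for n = 1 .. 10000.
assert all(guaranteed_count(n) >= Fraction(3 * n, 10) - 6 for n in range(1, 10_001))

# Resulting upper bound on |S1| (or |S3|) versus 7n/10 + 6.
n = 1000
print(n - guaranteed_count(n), Fraction(7 * n, 10) + 6)  # 706 706
```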
Solve recurrence by substitution
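- Suppose $T(m) \le cm$ for all $m < n$, for some constant c, and let an bound the O(n) term.
- Then
$$\begin{aligned}
T(n) &\le c\lceil n/5 \rceil + c(7n/10 + 6) + an \\
&\le cn/5 + c + 7cn/10 + 6c + an \\
&= 9cn/10 + 7c + an \\
&= cn + (-cn/10 + an + 7c)
\end{aligned}$$
- This is at most cn if $-cn/10 + an + 7c \le 0$,
- i.e., $c \ge 10a \cdot n/(n - 70)$ when $n > 70$.
- So choose the threshold 140: for $n \ge 140$, $n/(n - 70) \le 2$, so $c \ge 20a$ suffices.
- Note the threshold need not be 140; any integer greater than 70 works (with c chosen accordingly).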
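As a numeric sanity check of the substitution, this sketch assumes a = 1 (a hypothetical value for the constant hidden in O(n)) and c = 20a, then verifies over a range of n ≥ 140 that $c\lceil n/5 \rceil + c(7n/10 + 6) + an \le cn$. The algebra above covers all n ≥ 140; the loop only spot-checks it.

```python
from fractions import Fraction


def substituted_rhs(n, c, a):
    """c*ceil(n/5) + c*(7n/10 + 6) + a*n, the bound after substituting T(m) <= c*m."""
    return c * (-(-n // 5)) + c * (Fraction(7 * n, 10) + 6) + a * n


a = 1            # assumed constant for the O(n) term
c = 20 * a       # the choice c >= 20a from the derivation

# For every tested n >= 140 the right-hand side stays within c*n.
assert all(substituted_rhs(n, c, a) <= c * n for n in range(140, 20_001))
print(substituted_rhs(140, c, a), c * 140)  # 2780 2800
```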
Summary
- Bucket sort, counting sort, radix sort
- Their running times,
- Modifications
- The ith order statistic of n elements $S = \{a_1, a_2, \ldots, a_n\}$: the ith smallest element
- Minimum and maximum
- Median, lower median, upper median
- Selection in expected/average linear time
- Worst-case running time
- Prune-and-search
- Selection in worst-case linear time
- Why group size 5?