Title: Sorting
Description: Sorting - Universidade Federal de Campina Grande

Transcript and Presenter's Notes
1
Sorting
2
Introduction
  • The objective is to take an unordered set of
    comparable data items and arrange them in order.
  • We will usually sort the data into ascending
    order; sorting into descending order is very
    similar.
  • Data can be sorted in various ADTs, such as
    arrays and trees; in fact, we have already seen
    how a binary search tree can be used to sort
    data.
  • There are two main sorting themes: address-based
    sorting and comparison-based sorting. We will
    focus on the latter.

3
The family of sorting methods
Main sorting themes:
  • Address-based sorting
    • Proxmap Sort
    • RadixSort
  • Comparison-based sorting
    • Transposition sorting: BubbleSort
    • Insert and keep sorted: Insertion sort, Tree sort
    • Diminishing increment sorting: ShellSort
    • Divide and conquer: MergeSort, QuickSort
4
Lecture schedule
  • An overview of sorting algorithms
  • We will not study address-based sorts in detail
    because they are quite restricted in their
    application
  • One slide on Proxmap/Radix sort
  • Examples from each type of comparison-based
    sorting algorithm:
  • Bubble sort: transposition sorting
  • Insertion sort (already seen, see Tree notes):
    insert and keep sorted
  • Selection sort: priority queue sorting
  • Shell sort: diminishing increment sorting
  • Merge sort and Quick sort: divide and conquer
    sorting

5
Bubble sort
  • A pretty dreadful type of sort!
  • However, the code is small:

for (int i = arr.length; i > 0; i--)
    for (int j = 1; j < i; j++)
        if (arr[j-1] > arr[j]) {
            int temp = arr[j-1];
            arr[j-1] = arr[j];
            arr[j] = temp;
        }
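A minimal runnable harness for this loop (our own wrapper; the class and
method names are illustrative, not from the slides):

public class BubbleSortDemo {
    // The slide's bubble sort loop, wrapped in a method for reuse.
    static void bubbleSort(int[] arr) {
        for (int i = arr.length; i > 0; i--)
            for (int j = 1; j < i; j++)
                if (arr[j - 1] > arr[j]) {
                    int temp = arr[j - 1];   // swap out-of-order neighbours
                    arr[j - 1] = arr[j];
                    arr[j] = temp;
                }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 10, 3, 9, 15};
        bubbleSort(data);
        System.out.println(java.util.Arrays.toString(data));
        // prints: [1, 2, 3, 4, 5, 9, 10, 15]
    }
}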

6
Insertion sort
  • Tree Insertion Sort
  • This is inserting into a normal tree structure,
    i.e. data are put into the correct position when
    they are inserted.
  • We've already seen this happening with a tree,
    requiring a find and an insert.
  • We know the time complexity for one insert is
    O(log N) + O(1) = O(log N); therefore inserting N
    items has a complexity of O(N log N).
  • Array Insertion Sort
  • The array must be kept sorted; insertion requires
    a find + insert.
  • The insertion time complexity is O(log N) + O(N) =
    O(N) for one item, therefore inserting N items
    has a complexity of O(N²) (a Java sketch follows).
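As an illustration of the array case, a sketch of insertion sort in Java
(our own code, not from the slides); the shifting step is what makes each
insert O(N):

// Sketch of array insertion sort. Finding the insertion point is cheap,
// but shifting larger elements right to make room is O(N) per insert,
// giving O(N²) overall for N items.
static void insertionSort(int[] arr) {
    for (int i = 1; i < arr.length; i++) {
        int key = arr[i];                  // next item to insert
        int j = i - 1;
        while (j >= 0 && arr[j] > key) {   // shift larger elements right
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = key;                  // drop key into its slot
    }
}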

7
SelectionSort
  • SelectionSort uses an array implementation of the
    Priority Queue ADT (not the heap implementation
    we will study)
  • The elements are inserted into the array as they
    arrive and then extracted in descending order
    into another array
  • The major disadvantage is the performance
    overhead of finding the largest element at each
    step: we have to traverse the entire array
    to find it

[Slide illustration: an example array containing 13 2 15 4]
8
SelectionSort (2)
  • In practice, we use the same array for the
    Priority Queue and the output results
  • General algorithm (a Java sketch follows):
  • 1. Initialise PQLast to the last index of the
    Priority Queue
  • 2. Search from the start of the array to PQLast
    for the largest element; call its position front
  • 3. Swap the element indexed by front with the
    element indexed by PQLast
  • 4. Decrement PQLast by one
  • 5. Repeat steps 2 to 4
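A Java sketch of this algorithm (our own code; the variable names follow
the slide's PQLast and front):

// In-place SelectionSort following the five steps above: repeatedly find
// the largest element in arr[0..pqLast] and swap it to position pqLast.
static void selectionSort(int[] arr) {
    for (int pqLast = arr.length - 1; pqLast > 0; pqLast--) {
        int front = 0;                            // index of largest so far
        for (int i = 1; i <= pqLast; i++)
            if (arr[i] > arr[front]) front = i;   // step 2: find largest
        int temp = arr[front];                    // step 3: swap to PQLast
        arr[front] = arr[pqLast];
        arr[pqLast] = temp;
    }
}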

9
SelectionSort (3)
[Slide illustration: the array 12 9 13 2 15 4 with PQLast marking the end
of the Priority Queue region; the largest element is swapped to PQLast,
PQLast moves left, etc.]
10
Time complexity of SelectionSort
  • The main operations being performed in the
    algorithm are:
  • 1. Comparisons to find the largest element in the
    Priority Queue subarray (to find the front index):
    there are n + (n-1) + ... + 2 + 1 comparisons,
    i.e., (n/2)(n+1) = (n² + n)/2, so this is an
    O(n²) operation
  • 2. Swapping elements between front and PQLast:
    n-1 exchanges are performed, so this is an O(n)
    operation
  • The dominant operation (time-wise) gives the
    overall time complexity, i.e., O(n²)
  • Although this is an O(n²) algorithm, its
    advantage over O(n log n) sorts is its simplicity

11
Time complexity of SelectionSort (2)
  • For very small sets of data, SelectionSort may
    actually be more efficient than O(n log n)
    algorithms
  • This is because many of the more complex sorts
    have a relatively high level of overhead
    associated with them, e.g., recursion is
    expensive compared with simple loop iteration
  • This overhead might outweigh the gains provided
    by a more complex algorithm where a small number
    of data elements is being sorted
  • SelectionSort does better than BubbleSort as
    fewer swaps are required, although the same
    number of comparison operations are performed
    (each swap puts an element in its correct place)

12
Shell sort: diminishing increment sorting
  • Named after D.L. Shell! But it can also be
    pictured as shrinking shells of Insertion Sort.
  • Shell sort aims to reduce the work done by
    insertion sort (i.e. scanning a list and
    inserting into the right position).
  • Do the following (a Java sketch follows):
  • Begin by looking at the sublists of elements x1
    apart and sort those elements by insertion sort
  • Reduce the number x1 to x2, so that x1 is not a
    multiple of x2, and repeat these two steps until
    x2 = 1.
  • It can be shown that this approach will sort a
    list with a time complexity of O(N^1.25).
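A minimal Java sketch (our own code): for brevity it simply halves the gap
each pass, whereas the next slides recommend choosing gaps more carefully:

// Shell sort: insertion sort on interleaved sublists of elements 'gap'
// apart, with the gap shrinking until a final gap-1 pass.
static void shellSort(int[] arr) {
    for (int gap = arr.length / 2; gap >= 1; gap /= 2) {
        for (int i = gap; i < arr.length; i++) {
            int key = arr[i];
            int j = i - gap;
            while (j >= 0 && arr[j] > key) {   // gapped insertion sort
                arr[j + gap] = arr[j];
                j -= gap;
            }
            arr[j + gap] = key;
        }
    }
}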

13
Shell Sort Illustration
  • Consider sorting the following list by Shell sort
    with gaps 3, 2, 1:

Initial list:        45 23  4  9  6  7 91  8 12 24

GAP = 3 sublists:    45  9 91 24  -> sorted:  9 24 45 91
                     23  6  8     -> sorted:  6  8 23
                      4  7 12     -> sorted:  4  7 12
After GAP = 3 pass:   9  6  4 24  8  7 45 23 12 91

GAP = 2 sublists:     9  4  8 45 12  -> sorted:  4  8  9 12 45
                      6 24  7 23 91  -> sorted:  6  7 23 24 91
After GAP = 2 pass:   4  6  8  7  9 23 12 24 45 91

GAP = 1 (ordinary insertion sort):
Sorted list:          4  6  7  8  9 12 23 24 45 91
14
Comparing O(N²), O(N^1.25) and O(N)
15
How do you choose the gap size?
  • The idea of the decreasing gap size is that the
    list becomes more and more sorted each time the
    gap size is reduced; therefore you don't (for
    example) want to have a gap size of 4 followed by
    a gap size of 2, because you'll be sorting half
    the numbers a second time.
  • There is no formal proof of a good initial gap
    size, but about N/10 is a reasonable start.
  • Try to use prime numbers as your gap size, or odd
    numbers if you cannot readily get a list of
    primes (though note that gaps of 9, 7, 5, 3, 1
    will be doing less work when gap = 3, since 9 is
    a multiple of 3).

16
Divide and conquer sorting
MergeSort
QuickSort
17
Divide ...
[Slide illustration: the list 5 1 4 2 10 3 9 15 being repeatedly split
into halves]
18
and conquer
[Slide illustration: the halves of 5 1 4 2 10 3 9 15 being merged back
together in sorted order]
19
MergeSort: divide and conquer sorting
  • For MergeSort an initial array is repeatedly
    divided into halves (usually each is a separate
    array), until arrays of just one or zero elements
    remain
  • At each level of recombination, two sorted arrays
    are merged into one
  • This is done by copying the smaller of the two
    elements from the sorted arrays into the new
    array, and then moving along the arrays

[Slide illustration: two sorted arrays, 1 13 24 26 and 2 15 27 38, about
to be merged]
20
Merging
[Slide illustration: merging 1 13 24 26 with 2 15 27 38; the smaller front
element, 1, is copied to the output first, etc.]
21
Merge Sort In Pseudocode
  • Basic idea in pseudocode

method mergeSort(array) is
    if len(array) == 0 or len(array) == 1 then   // terminating case
        return array
    else
        end = len(array)
        center = end / 2
        left = mergeSort( array[0..center] )     // recursive call
        right = mergeSort( array[center..end] )  // recursive call
        return merge(left, right)                // method that merges lists
    end if
end method
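The same idea rendered in Java (our own sketch; the merge step that the
pseudocode delegates to merge() is written out here):

import java.util.Arrays;

// Recursive MergeSort: split into halves, sort each, then merge.
static int[] mergeSort(int[] array) {
    if (array.length <= 1) return array;          // terminating case
    int center = array.length / 2;
    int[] left  = mergeSort(Arrays.copyOfRange(array, 0, center));
    int[] right = mergeSort(Arrays.copyOfRange(array, center, array.length));
    return merge(left, right);
}

// Repeatedly copy the smaller front element of the two sorted arrays,
// then drain whichever array still has elements remaining.
static int[] merge(int[] left, int[] right) {
    int[] result = new int[left.length + right.length];
    int i = 0, j = 0, k = 0;
    while (i < left.length && j < right.length)
        result[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
    while (i < left.length)  result[k++] = left[i++];
    while (j < right.length) result[k++] = right[j++];
    return result;
}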
22
Analysis of MergeSort
  • Let the time to carry out a MergeSort on n
    elements be T(n)
  • Assume that n is a power of 2, so that we always
    split into equal halves (the analysis can be
    refined for the general case)
  • For n = 1, the time is constant, so we can take
    T(1) = 1
  • Otherwise, the time T(n) is the time to do two
    MergeSorts on n/2 elements, plus the time to
    merge, which is linear
  • So, T(n) = 2T(n/2) + n
  • Divide through by n to get
    T(n)/n = T(n/2)/(n/2) + 1
  • Replacing n by n/2 gives
    T(n/2)/(n/2) = T(n/4)/(n/4) + 1
  • And again gives T(n/4)/(n/4) = T(n/8)/(n/8) + 1

23
Analysis of MergeSort (2)
  • We continue until we end up with
    T(2)/2 = T(1)/1 + 1
  • Since n is divided by 2 at each step, we have
    log₂n steps
  • Now, substituting the last equation in the
    previous one, and working back up to the top,
    gives T(n)/n = T(1)/1 + log₂n
  • That is, T(n)/n = log₂n + 1
  • So T(n) = n log₂n + n = O(n log n)
  • Although this is an O(n log n) algorithm, it is
    hardly ever used for main memory sorts because it
    requires linear extra memory

24
QuickSort: divide and conquer sorting
  • As its name implies, QuickSort is the fastest
    known sorting algorithm in practice
    (address-based sorts can be faster)
  • It was devised by C.A.R. Hoare in 1962
  • Its average running time is O(n log n) and it is
    very fast
  • It has worst-case performance of O(n²), but this
    can be made very unlikely with little effort
  • The idea is as follows:
  • 1. If the number of elements to be sorted is 0 or
    1, then return
  • 2. Pick any element, v (this is called the pivot)
  • 3. Partition the other elements into two disjoint
    sets, S1 of elements ≤ v, and S2 of elements > v
  • 4. Return QuickSort(S1) followed by v followed
    by QuickSort(S2)

25
QuickSort example
5  1  4  2  10  3  9  15  12
Pick the middle element as the pivot, i.e., 10
26
QuickSort Example (full)
[Slide illustration: full QuickSort trace]
Unsorted list:           5 1 4 2 10 3 9 15 12
Partition 1 (pivot 10):  5 1 4 2 3 9 | 10 | 15 12
Partitions 2 and 3 recursively sort each side
Sorted list:             1 2 3 4 5 9 10 12 15
27
QuickSort Pseudocode
  • Basic idea in pseudocode (compare with mergeSort)

method quickSort(array) is
    if len(array) == 0 or len(array) == 1 then      // terminating case
        return array
    else
        pivotIndx = int( random() * len(array) )
        pivot = array(pivotIndx)                    // just choose a random pivot
        (left, right) = partition(array, pivot)     // get left and right
        return quickSort(left) + pivot + quickSort(right)
    end if
end method
28
Partitioning example
5  11  4  25  10  3  9  15  12
Pick the middle element as the pivot, i.e., 10
29
Partitioning example (2)
[Slide illustration: partitioning partway through; the pivot 10 has been
moved to the front and the smaller elements 4 and 5 have been swapped
towards the start: 10 4 5 25 11 3 9 15 12]
30
Pseudo code for partitioning
pivotPos = middle of array a
swap a[pivotPos] with a[first]       // Move the pivot out of the way
swapPos = first + 1
for each element in the array from swapPos to last do
    // If the current element is smaller than pivot we
    // move it towards start of array
    if (a[currentElement] < a[first])
        swap a[swapPos] with a[currentElement]
        increment swapPos by 1
// Now move the pivot back to its rightful place
swap a[first] with a[swapPos-1]
return swapPos-1                     // Pivot position
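In Java, the same scheme might look like this (our own sketch; it returns
the pivot position and adds a recursive driver so the whole sort is
visible):

// Partition a[first..last] around the middle element, as in the
// pseudocode above; returns the pivot's final position.
static int partition(int[] a, int first, int last) {
    int pivotPos = (first + last) / 2;
    swap(a, pivotPos, first);             // move the pivot out of the way
    int swapPos = first + 1;
    for (int i = first + 1; i <= last; i++)
        if (a[i] < a[first])              // smaller than pivot:
            swap(a, swapPos++, i);        // move it towards the start
    swap(a, first, swapPos - 1);          // put the pivot back in place
    return swapPos - 1;                   // pivot position
}

static void swap(int[] a, int i, int j) {
    int t = a[i]; a[i] = a[j]; a[j] = t;
}

// In-place QuickSort driver using the partition above.
static void quickSort(int[] a, int first, int last) {
    if (first >= last) return;            // 0 or 1 elements: done
    int p = partition(a, first, last);
    quickSort(a, first, p - 1);
    quickSort(a, p + 1, last);
}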
31
Analysis of QuickSort
  • We assume a random choice of pivot
  • Let the time to carry out a QuickSort on n
    elements be T(n)
  • We have T(0) = T(1) = 1
  • The running time of QuickSort is the running time
    of the partitioning (linear in n) plus the
    running time of the two recursive calls of
    QuickSort
  • Let i be the number of elements in the left
    partition; then T(n) = T(i) + T(n-i-1) + cn (for
    some constant c)

32
Worst-case analysis
  • If the pivot is always the smallest element, then
    i = 0 always
  • We ignore the term T(0) = 1, so the recurrence
    relation is T(n) = T(n-1) + cn
  • So, T(n-1) = T(n-2) + c(n-1), and so on until we
    get T(2) = T(1) + c(2)
  • Substituting back up gives
    T(n) = T(1) + c(n + (n-1) + ... + 2) = O(n²)
  • Notice that this case happens if we always take
    the pivot to be the first element in the array
    and the array is already sorted
  • So, in this extreme case, QuickSort takes O(n²)
    time to do absolutely nothing!

33
Best-case analysis
  • In the best case, the pivot is in the middle
  • To simplify the equations, we assume that the two
    subarrays are each exactly half the length of the
    original (a slight overestimate which is
    acceptable for big-Oh calculations)
  • So, we get T(n) = 2T(n/2) + cn
  • This is very similar to the formula for
    MergeSort, and a similar analysis leads to
    T(n) = cn log₂n + n = O(n log n)

34
Average-case analysis
  • We assume that each of the sizes of the left
    partition is equally likely, and hence has
    probability 1/n
  • With this assumption, the average value of T(i),
    and hence also of T(n-i-1), is
    (T(0) + T(1) + ... + T(n-1))/n
  • Hence, our recurrence relation becomes
    T(n) = 2(T(0) + T(1) + ... + T(n-1))/n + cn
  • Multiplying by n gives
    nT(n) = 2(T(0) + T(1) + ... + T(n-1)) + cn²
  • Replacing n by n-1 gives
    (n-1)T(n-1) = 2(T(0) + T(1) + ... + T(n-2)) + c(n-1)²
  • Subtracting the last equation from the previous
    one gives nT(n) - (n-1)T(n-1) = 2T(n-1) + 2cn - c

35
Average-case analysis (2)
  • Rearranging, and dropping the insignificant c on
    the end, gives nT(n) = (n+1)T(n-1) + 2cn
  • Divide through by n(n+1) to get
    T(n)/(n+1) = T(n-1)/n + 2c/(n+1)
  • Hence, T(n-1)/n = T(n-2)/(n-1) + 2c/n, and so on
    down to T(2)/3 = T(1)/2 + 2c/3
  • Substituting back up gives
    T(n)/(n+1) = T(1)/2 + 2c(1/3 + 1/4 + ... + 1/(n+1))
  • The sum in brackets is about loge(n+1) + γ - 3/2,
    where γ is Euler's constant, which is
    approximately 0.577
  • So, T(n)/(n+1) = O(log n) and T(n) = O(n log n)

36
Some observations about QuickSort
  • We have seen that a consistently poor choice of
    pivot can lead to O(n²) time performance
  • A good strategy is to pick the middle value of
    the left, centre, and right elements
  • For small arrays, with n less than (say) 20,
    QuickSort does not perform as well as simpler
    sorts such as SelectionSort
  • Because QuickSort is recursive, these small cases
    will occur frequently
  • A common solution is to stop the recursion at
    n = 10, say, and use a different, non-recursive
    sort
  • This also avoids nasty special cases, e.g.,
    trying to take the middle of three elements when
    n is one or two

37
Address-Based Sorting
  • Proxmap uses techniques similar to hashing to
    assign an element to its correctly sorted
    position in a container such as an array, using a
    hash function
  • The algorithms are generally complex, and very
    often only suitable for certain kinds of data
  • You need to find a suitable hashing function that
    will distribute the data fairly evenly
  • Clashes are dealt with using a comparison-based
    sort, so the more clashes there are, the further
    the time complexity moves away from O(n)

38
Radix / Bucket / Bin sort
  • Radix Sort
  • In its favour, it is an O(n) sort, which makes it
    the fastest sort we have investigated.
  • However, it requires at least 2n space in which
    to operate, and is in principle exponential in
    its space complexity.
  • A radix sort makes one pass through the data for
    each atom in the key. If the key is a three-
    character uppercase alphabetic string, such as
    ABC, then each character (e.g. the A) is an atom.
    In this case, there would be three passes through
    the data.
  • The first pass sorts on the low-order element of
    the key (the C in our example). The second pass
    sorts on the next atom in order of importance
    (the B in our example). Each pass progresses
    toward the high-order atom of the key.
  • In each such pass, the elements of the array are
    distributed into k buckets. For our alphabetic
    key example, A's go into the 'A' bucket, B's into
    the 'B' bucket, and so on. These distribution
    buckets are then gathered, in order, and placed
    back in the original array. The next pass is then
    executed. When a pass has been made on the
    high-order atom in the key, the array will be
    sorted. (A Java sketch follows.)
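A hedged Java sketch of this pass/bucket structure (our own code, using
decimal digits as the atoms, matching the numeric example on the next
slide):

import java.util.ArrayList;
import java.util.List;

// LSD radix sort on non-negative integers: one pass per digit, least
// significant first; each pass distributes into ten bins, then gathers.
static void radixSort(int[] arr, int numDigits) {
    int divisor = 1;
    for (int pass = 0; pass < numDigits; pass++) {
        List<List<Integer>> bins = new ArrayList<>();
        for (int b = 0; b < 10; b++) bins.add(new ArrayList<>());
        for (int value : arr)                          // distribute
            bins.get((value / divisor) % 10).add(value);
        int k = 0;
        for (List<Integer> bin : bins)                 // gather in order
            for (int value : bin) arr[k++] = value;
        divisor *= 10;
    }
}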

39
Radix Sort Example
Input: 123 234 345 134 245 356 156 267 378 121 232 343

Pass 1 (units digit):
  bin 1: 121   bin 2: 232   bin 3: 123 343   bin 4: 234 134
  bin 5: 345 245   bin 6: 356 156   bin 7: 267   bin 8: 378
  (bins 0 and 9 are empty)
  Gathered: 121 232 123 343 234 134 345 245 356 156 267 378

Pass 2 (tens digit):
  bin 2: 121 123   bin 3: 232 234 134   bin 4: 343 345 245
  bin 5: 356 156   bin 6: 267   bin 7: 378
  (bins 0-1 and 8-9 are empty)
  Gathered: 121 123 232 234 134 343 345 245 356 156 267 378

Pass 3 (hundreds digit):
  bin 1: 121 123 134 156   bin 2: 232 234 245 267
  bin 3: 343 345 356 378
  (bins 0 and 4-9 are empty)
  Sorted: 121 123 134 156 232 234 245 267 343 345 356 378