Title: Chapter 10: Algorithm Efficiency
1 Chapter 10 Algorithm Efficiency
- As we have seen, there are multiple ways to implement a data structure
  - array-based
  - pointer-based
  - using a previously defined ADT like a List ADT
  - the implementation requires certain algorithms (for instance, an array-based sorted list requires an ordered insert by shifting)
- We can gauge the worthiness of an implementation by considering
  - the amount of memory space required for the algorithm(s) - known as the space complexity
  - the amount of time needed to perform the algorithm(s) - known as the time complexity
- In this chapter, we first consider what time complexity means and describe the tools for determining time complexities - known as analysis of algorithms
- We will then examine a variety of algorithms for their complexities - primarily searching and sorting algorithms
2 Comparing Algorithms
- Consider two algorithms, A and B, that solve a problem
  - you can run A and time it, then run B and time it, and compare these times (whether wall-clock time or system clock time)
- This is problematic
  - what if implementation A requires garbage collection and B does not?
  - what if B uses vector operations and your processor does not have these built-in?
  - what if, while running A, the OS is busy with other duties such as receiving e-mail messages or loading a large set of data from disk?
  - what if the data run on A differs in order from the data run on B?
- Using wall-clock or system clock time is not sufficient
  - we need a more fundamental idea of how well the algorithms perform
  - we will get this by counting the number of instructions required to execute the algorithm
  - we are not necessarily interested in knowing the precise number of operations, but instead how the number of operations to be executed changes as input size changes
3 Examples
- In the first example, we have 1 instruction prior to the loop and 2 instructions in the loop, plus 1 comparison before each execution of the loop body
  - if the loop executes n times, this gives us 1 + (n+1) + 2n total operations, or 3n + 2
- In the second example, the innermost instruction executes a total of 5n^2 times
  - to be more accurate, we might also count the operations in the for-loop mechanisms, like we counted the while-loop comparison above
  - we will see shortly that these extra operations are not really worth bothering about
- How many times does the instruction execute in this modified nested for-loop?
Node curr = head;
while (curr != null) {
    System.out.println(curr.getItem());
    curr = curr.getNext();
}

for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        for (k = 0; k < 5; k++)
            // some instruction goes here

for (i = 0; i < n; i++)
    for (j = 0; j < i; j++)
        for (k = 0; k < 5; k++)
            // some instruction goes here

Notice here that the middle loop iterates i times, not a constant
4 Algorithm Growth Rates
- We can view the complexity of an algorithm (or the number of instructions executed) as a function of the size of the data
- Assume n is the number of data in our data structure
  - the linked list traversal has a function f(n) = 3n + 2
  - the nested for-loops have a function f(n) = 5n^2 for the first for-loop example and f(n) = 5n(n-1)/2 for the second for-loop example
- When comparing algorithms, we will use such a function
- For convenience, we will round off our growth rates to the nearest level of complexity by dropping off lesser terms and constants
  - 3n + 2 rounds off to n
  - 5n^2 rounds off to n^2
- What is the function for the third loop?
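- Below is a small Java sketch (my own illustration, not from the book) that counts how many times the innermost instruction runs in the two nested for-loop examples, so the functions f(n) = 5n^2 and f(n) = 5n(n-1)/2 can be checked for a few values of n:

public class GrowthRateDemo {
    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            long count1 = 0, count2 = 0;
            for (int i = 0; i < n; i++)          // first example: j runs to n
                for (int j = 0; j < n; j++)
                    for (int k = 0; k < 5; k++)
                        count1++;                // stands in for "some instruction"
            for (int i = 0; i < n; i++)          // second example: j only runs to i
                for (int j = 0; j < i; j++)
                    for (int k = 0; k < 5; k++)
                        count2++;
            // expected: count1 = 5*n*n and count2 = 5*n*(n-1)/2
            System.out.println(n + ": " + count1 + " vs " + count2);
        }
    }
}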
5 Big O Notation
- Formally, we say that an algorithm is order f(n)
  - if there exist constants k and n0 such that the algorithm requires no more than k*f(n) instructions to solve a problem of size n > n0
- What does this mean?
  - if we can find a bounding function such that the algorithm always performs within some constant times that function for a reasonable size input, then the algorithm is in O(function)
  - a reasonable sized input means that the input is > n0, where n0 is some value specific to this algorithm - we include this in our definition because inputs of very small sizes might be special cases
    - for instance, searching a list of size 0 takes 1 operation
- If we can find such a bounding function f(n), then we say that the algorithm has a complexity of O(f(n)), or is bounded by O(f(n))
  - for instance, the first two algorithms two slides back would have complexities of O(n) and O(n^2) respectively
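- A worked example (my own, to make the definition concrete): for the traversal with f(n) = 3n + 2, choose k = 5 and n0 = 1; then 3n + 2 <= 5n for every n >= 1, so the traversal is O(n). Likewise 5n(n-1)/2 <= (5/2)n^2 for every n >= 1, so the modified nested loop is O(n^2).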
6 Some Example Growth-rate Functions
7 Comparing Growth Rates
We see above a table showing the approximate number of instructions needed for the different growth-rate functions for several sample sizes of n. To the left, these functions are graphed. Notice how large 2^n becomes very quickly.
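A representative version of such a table, with values computed here for a few sample sizes (not copied from the slide):

      n     log2 n   n log2 n        n^2          n^3            2^n
     10        ~3        ~33         100        1,000          1,024
    100        ~7       ~664      10,000    1,000,000   ~1.3 x 10^30
  1,000       ~10     ~9,966   1,000,000         10^9   ~1.1 x 10^301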
8 Best/Average/Worst Case Complexities
- Many algorithms can have different complexities based on the order of the data
  - for instance, consider searching a linked list for the Node storing the value 12
  - it might be stored in the first Node in the list (which will take 1 comparison)
  - or the last Node in the list (which will take n comparisons)
  - or it may not even be in the list (which will take n comparisons also)
- We can identify three time complexities with many algorithms
  - best case - the complexity when the algorithm has to do the least amount of work
  - worst case - the complexity when the algorithm has to do the most amount of work
  - average case - the complexity of the algorithm when it has to do the average amount of work
    - what does average amount of work mean?
    - is there an easy way to identify the least, most, and average amount of work?
- We are mostly concerned with worst case complexities
9 Example: Searching a Linked List

current = head;
while (current != null && current.getItem() != target)
    current = current.getNext();
- The code will find a particular item in a linked list
  - the complexity is determined by the number of iterations through the loop
  - unlike the previous version, which only terminated the loop when current == null, here the code will exit the loop as soon as it finds the value target (or when it reaches the end of the list)
- How long does it take?
  - in the best case, the item is at head, so it takes the initial assignment statement plus 2 comparisons, or 3 operations - a constant amount of time (because the number of operations is independent of the list size); since 3 = k*1 for k = 3, we have complexity of O(1)
  - in the worst case, the item is in the last position (or does not exist in the list at all), so it takes 1 + 3n (or 1 + 3(n+1)) operations, which is complexity O(n)
  - what is the average case?
    - can I just average (1 + 3n + 3) / 2, that is, take the average of the best and worst cases? no, it doesn't work like that
10 Computing Average Case
- This can be tricky and we will cover it more formally in CSC 464
- Here, we give a brief idea
  - the average complexity for search is
    - (complexity of finding item 1 * probability of wanting item 1) + (complexity of finding item 2 * probability of wanting item 2) + ... + (complexity of finding item n * probability of wanting item n)
  - for our search, we will assume that there is equal likelihood of wanting any particular item, so the probability of wanting item i = 1/n
  - the complexity of finding item 1 = 3, item 2 = 6, item 3 = 9, etc., and the complexity of finding item n = 3n, so we have
    - average case complexity = 3*1/n + 6*1/n + 9*1/n + ... + 3n*1/n = 3*(1 + 2 + 3 + ... + n)/n = 3*((n + 1)*n/2)/n = 3(n + 1)/2
  - Note: (1 + 2 + 3 + ... + n) = (n + 1)*n/2
- So our average case is 3(n + 1)/2, which is O(n), the same as the worst case complexity
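- A minimal Java sketch (mine, not the book's) that checks this result by averaging the operation count over every possible target position in a list of size n:

public class AverageCaseCheck {
    public static void main(String[] args) {
        int n = 1000;
        double total = 0;
        for (int pos = 1; pos <= n; pos++)
            total += 3 * pos;      // finding the item at position pos costs about 3*pos operations
        System.out.println("measured average:  " + total / n);
        System.out.println("formula 3(n+1)/2:  " + 3.0 * (n + 1) / 2);
    }
}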
11 Efficiencies of Common Algorithms
- Sequential search
  - best case: find it right away, O(1)
  - worst case: have to go through the entire list, O(n)
  - average case: O(n)
- Binary search
  - best case: find it right in the middle of the array, O(1)
  - worst case:
    - each comparison occurs in the middle of the current array
    - after each iteration, if we have not found the item, we chop the array in half, so the size of the array goes from n to n/2 to n/4 to n/8, etc.
    - we are guaranteed of finding the item (or discovering that the item is not in the array) once we have reduced the size of the array to 1
    - how many iterations does it take to reduce the array to 1 element? k iterations, where n / 2^k = 1; solving for k, we get 2^k = n, or k = log2 n
    - so our worst case complexity is k = log n, or O(log n)
  - average case: using a similar analysis as we did for sequential search, we find the average case is also O(log n)
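- A minimal iterative binary search sketch (variable names are mine, not the book's) that makes the halving behavior concrete - each pass either finds the target or discards half of the remaining range, so at most about log2 n passes are needed:

public static int binarySearch(int[] a, int target) {
    int low = 0, high = a.length - 1;
    while (low <= high) {
        int mid = (low + high) / 2;                // always compare against the middle element
        if (a[mid] == target) return mid;          // found it
        else if (a[mid] < target) low = mid + 1;   // discard the left half
        else high = mid - 1;                       // discard the right half
    }
    return -1;                                     // not in the array
}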
12 Deriving Complexities
- So far it seems pretty easy to determine a complexity
- Count the number of instructions; if the number of instructions is independent of the input size, then O(1)
- If there is a loop, determine the number of iterations through the loop and multiply this value by the number of instructions executed in the loop
  - for nested loops, multiply the complexities of each loop
- We get different best and worst cases when the number of executions of a loop can vary
  - for instance, the search algorithm's while loop will iterate anywhere from 0 to n times, whereas the print algorithm's while loop will iterate exactly n times
- So while loops and if statements can alter the complexity so that there are different complexities for best and worst case
  - what about if the algorithm uses recursion?
13 Recursive Example
- For recursive factorial
  - the function itself has only 1 operation
    - if (n > 1) return n * factorial(n-1); else return 1;
  - however, the function might call itself recursively, so we need to determine how many times the function might call itself
  - since each recursive call occurs with n being 1 less, and the recursion stops once n <= 1, we can determine the number of recursive calls as n-1
  - so the complexity is 1 * (n-1), or O(n) (see the sketch below)
- For recursive binary search
  - the function computes the midpoint, and then has a nested if-else statement to check whether the midpoint is < the target, == the target, or > the target
  - if this is not a base case, then recurse; otherwise return the location (or an error)
  - so no matter how many data are in the array, the number of operations per function call is constant (it varies from two to four)
  - how many times does the function recurse?
    - like with the iterative version, it varies between one time (found immediately) and log n times (found only at the end, or not at all)
  - so we have a different worst and best case complexity depending on which of these conditions is true
    - complexity is 1 * 1 = O(1) (best case) and 1 * log n = O(log n) (worst case)
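- A minimal recursive binary search sketch (names are mine, not the book's code): constant work per call, and each call recurses on at most half the range, giving the O(log n) worst case described above:

public static int binarySearch(int[] a, int target, int first, int last) {
    if (first > last) return -1;                          // base case: empty range, not found
    int mid = (first + last) / 2;
    if (a[mid] == target) return mid;                     // base case: found
    else if (a[mid] < target)
        return binarySearch(a, target, mid + 1, last);    // recurse on the right half
    else
        return binarySearch(a, target, first, mid - 1);   // recurse on the left half
}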
14 Another Recursive Example
- In Towers of Hanoi, we saw that for n = d disks, the algorithm recursively called the function with d-1 disks, moved 1 disk, and recursively called the function with d-1 disks
  - so the function did 1 thing but called itself twice with d-1
  - does this mean 2(n-1) + 1, or roughly O(n)?
  - in fact, the number of recursive calls grows as 2^n
    - when n = d, the function calls itself twice with n = d-1
    - when n = d-1, the function calls itself twice with n = d-2
    - when n = d-2, the function calls itself twice with n = d-3
    - this behavior continues until n = 1
- A tree of recursive calls for n = 4 is shown below
  - It should be easy to see that in fact there are 2^4 - 1 total calls, giving us a complexity of 1 * (2^n - 1), which we call O(2^n)
- With n = 4 initially, we do the function with n = 3 twice; calling the function with n = 3 results in calling the function with n = 2 twice, and since we call the function with n = 3 twice, we do n = 2 a total of four times
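- A minimal Towers of Hanoi sketch (mine, not the book's code) that counts calls, so the 2^n - 1 figure above can be checked directly; with the single-disk base case used here it reports 15 calls for n = 4:

public class Hanoi {
    static long calls = 0;
    public static void solve(int n, char from, char to, char spare) {
        calls++;
        if (n == 1) {                            // base case: one disk moves directly
            System.out.println("move disk 1 from " + from + " to " + to);
            return;
        }
        solve(n - 1, from, spare, to);           // move the top n-1 disks out of the way
        System.out.println("move disk " + n + " from " + from + " to " + to);
        solve(n - 1, spare, to, from);           // move those n-1 disks onto the target
    }
    public static void main(String[] args) {
        solve(4, 'A', 'C', 'B');
        System.out.println("calls: " + calls);   // 2^4 - 1 = 15
    }
}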
15 Fibonacci Analysis
- An interesting algorithm to analyze is Fibonacci
- The iterative version to the right consists of 3 + (n-2)*3 + 1 operations, so it is O(n)
- The recursive function either returns 1 or calls fib(n-1) and fib(n-2) and adds the results together
- The question is, how many recursive calls are there?
  - The behavior of the recursive Fibonacci is somewhat like the Towers of Hanoi in that each call contains 2 subcalls (although the tree of calls isn't symmetric, since one call is with size n-2)
  - The complexity of the recursive version is O(2^n)
- So here we see that the recursive version, while simpler to write and understand, is much, much worse!
  - determine the difference in instructions executed when n = 10. How about when n = 100? What about n = 1000?
public int fib1(int n) {
    int temp1 = 1, temp2 = 1, temp, j;
    for (j = 2; j < n; j++) {
        temp = temp1;
        temp1 = temp1 + temp2;
        temp2 = temp;
    }
    return temp1;
}

public int fib2(int n) {
    if (n < 2) return 1;
    else return fib2(n-1) + fib2(n-2);
}
16 Sorting Algorithms
- Now we turn to analyzing sorting algorithms
- Sorting is an interesting problem to investigate because there are many different ways to accomplish it
- What we will find is that many sorting algorithms offer different best, worst and average case complexities
- Your choice of sorting algorithm should, at least in part, be based on the algorithm's computational complexity
- However, we will find that complexity alone doesn't tell us the whole story, so we will also look at such things as
  - need for recursion
  - amount of memory space required
  - difficulty of the implementation itself
  - whether the complexity itself is misleading
    - we will find that mergesort often has a better complexity than quicksort and yet quicksort could be faster! and we will also find that radix sort has the best complexity but is possibly the slowest!
17 Selection Sort

for (last = n-1; last >= 1; last--) {
    largest = indexOfLargest(theArray, last+1);
    temp = theArray[largest];
    theArray[largest] = theArray[last];
    theArray[last] = temp;
}

private int indexOfLargest(Comparable[] a, int size) {
    int indexSoFar = 0;
    for (int j = 1; j < size; j++)
        if (a[j].compareTo(a[indexSoFar]) > 0)
            indexSoFar = j;
    return indexSoFar;
}
- The idea of the selection sort is to repeatedly find the smallest item in the remainder of the array
  - iterate for each position in the array
  - find the smallest leftover item (in the positions to the right of this location in the array)
  - swap the smallest with the value at the current position
  - the book's code (shown to the right) works right-to-left instead of left-to-right and moves the largest item into position last, decrementing last each pass
18 Selection Sort Analysis
- 1st iteration: largest value sought between array[0] and array[n-1]
  - 37 is found and swapped with the last value (13)
- 2nd iteration: largest value sought between array[0] and array[n-2]
  - 29 is found and swapped with the second-to-last value (13)
- Third iteration ...
- The entire process takes 4 = n-1 iterations for this example
  - when we are done, the smallest value has to be in position 0
- Finding the largest value in n items takes n comparisons, but n decreases as we iterate through the outer loop
- There are two for-loops
  - the outer loop iterates n-1 times
  - the inner loop iterates i times, where i is the iteration
  - the complexity is (n-1) + (n-2) + ... + 2 + 1 = n(n-1)/2 = (n^2 - n)/2, which is O(n^2)
  - notice that this complexity is the best, worst and average case - why?
19 Bubble Sort
- The idea of the Bubble Sort is
  - to bubble the largest value to the top of the array in repeated passes by swapping pair-wise values if the item on the left is > the item on the right
- If you can make it through one whole pass without swapping values, then the array is now in sorted order and you can quit
  - this allows us to exit the outer loop as soon as we can detect that we have a sorted array
endLimit = n;
boolean sorted = false;
while (!sorted) {                        // while you still need to sort
    sorted = true;                       // assume array is now sorted
    for (int index = 0; index < endLimit-1; index++) {
        if (theArray[index].compareTo(theArray[index+1]) > 0) {
            temp = theArray[index];
            theArray[index] = theArray[index+1];
            theArray[index+1] = temp;
            sorted = false;              // since we swapped items, array is not yet sorted, continue
        }
    }
    endLimit--;                          // reduce number of passes in for loop
}
20 Bubble Sort Analysis
- In the first pass
  - 29 and 10 are swapped
  - 29 and 14 are swapped
  - 29 and 37 do not need to be swapped
  - 37 and 13 are swapped
  - since there was some swapping, we do at least one more pass
- In the second pass
  - 10 and 14 are not swapped
  - 14 and 29 are not swapped
  - 29 and 13 are swapped
  - we don't check 37 since we know it's the largest
- The outer while loop iterates at least one time, but no more than n times
- The inner for-loop iterates n-1 times on the first pass, n-2 times on the next pass, etc.
- In the worst case, we do all n outer loops, giving us a total number of inner iterations of 1 + 2 + 3 + ... + (n-1), or O(n^2)
- In the best case, we have to do the outer loop once, resulting in just n-1 operations, or O(n)
- Average case?
21 Insertion Sort
- This algorithm works by doing an ordered insert of the next array item into a partially sorted array
  - take the next value
  - find its proper location by shifting elements in the array to the right, and insert the new item into its proper place
  - starting with the second element (the first element is sorted with respect to itself)
  - continue inserting through the last element
- The algorithm repeats this insert for array elements 1 to n-1
for (unsorted = 1; unsorted < n; unsorted++) {
    location = unsorted;
    current = theArray[unsorted];
    while ((location > 0) &&
           (theArray[location-1].compareTo(current) > 0)) {
        theArray[location] = theArray[location-1];
        location--;
    }
    theArray[location] = current;
}
22 Insertion Sort Analysis
- The outer loop iterates n - 1 times
- The inner while loop iterates until the new item is properly inserted, which will range from 1 to i times, where i is the iteration of the outer loop
- Like Bubble Sort, we have at most 1 + 2 + ... + (n - 1) = n(n - 1)/2 operations = O(n^2) in the worst case
- In the best case, each of the inner iterations takes only 1 comparison, resulting in O(n) operations
  - what is the average case?
23 Comparisons
- Of these three algorithms, we see that
  - Selection Sort is always O(n^2)
  - Bubble Sort and Insertion Sort range from O(n) in the best case to O(n^2) in the worst case
- Why then use Selection Sort?
  - Recall that a complexity of O(n^2) means that the number of operations is bounded by k*n^2 where k is some constant; it turns out that k is smaller for Selection Sort than for Bubble Sort in the worst case
  - Insertion Sort would be the best because it has a better best case than Selection Sort, and k is smaller for Insertion Sort than for Selection Sort
  - Note though that Bubble Sort works very well when an array is almost sorted - under what circumstances would an array be almost sorted?
- What are the algorithms' average cases?
  - All 3 have an average case of O(n^2)
  - This should be obvious for Selection Sort
  - For Insertion Sort, the average number of comparisons per iteration i is i/2, so we would have 1/2 + 2/2 + 3/2 + ... + (n - 1)/2 = n(n - 1)/4
24 Mergesort
- We now examine more complex (in terms of how they work) algorithms that have better performance than the previous three
- The Mergesort uses recursion and so is harder to understand
  - its principle is divide and conquer, reducing the sorting problem into one that contains two subarrays of half the size
    - an array of size 16 is divided into 2 arrays of size 8
    - an array of size 8 is divided into 2 arrays of size 4
    - an array of size 4 is divided into 2 arrays of size 2
    - an array of size 2 is divided into 2 arrays of size 1
    - an array of size 1 represents our base case, where such an array is always sorted
  - now we combine the results of our recursive calls by merging
    - merge the two arrays of size 1 into a sorted array of size 2
    - merge the two sorted arrays of size 2 into a sorted array of size 4
    - merge the two sorted arrays of size 4 into a sorted array of size 8
    - merge the two sorted arrays of size 8 into a sorted array of size 16
  - we must implement the recursive dividing algorithm and the iterative merge (which is where most of the work is done)
25 Mergesort Code

mergesort(array, first, last)
    if (first < last) {
        mid = (first + last) / 2;
        mergesort(array, first, mid);
        mergesort(array, mid+1, last);
        merge(array, first, mid, last);
    }

merge(array, first, mid, last)
    int index = first, mid2 = mid+1, f2 = first, l2 = last;
    temparray = new array[maxsize];
    while (f2 <= mid && mid2 <= l2)
        if (array[f2].compareTo(array[mid2]) < 0)
            temparray[index++] = array[f2++];
        else
            temparray[index++] = array[mid2++];
    while (f2 <= mid)
        temparray[index++] = array[f2++];
    while (mid2 <= last)
        temparray[index++] = array[mid2++];
    for (int temp = first; temp <= last; temp++)
        array[temp] = temparray[temp];
- The merge operation is the complex piece of code
  - go through both subarrays and merge them into a new temparray by placing elements in order
  - example: merge {2, 3, 6, 8} and {1, 4, 5, 7}
    - copy the smaller of 2 and 1 into temparray (1) and advance the second subarray's pointer to 4
    - copy the smaller of 2 and 4 into temparray (1, 2) and advance the first subarray's pointer to 3
    - continue until one subarray has been copied, and then copy the remainder of the other subarray into temparray
  - now copy temparray back into the original array (or the portion of the array from first to last)
26 Mergesort Example Analysis
- A call to mergesort results in two recursive calls to mergesort on an array half the size of the one from the current call, followed by a call to merge
  - each mergesort function call is O(1), as mergesort itself does either 1 thing or 5
  - merge requires that two n/2 arrays be merged into an array of size n, so it is O(n)
- How many recursive calls will there be?
  - We divide an array in half with each recursive call
  - We need to divide an array in half k times for it to reach a size of 1, where n / 2^k = 1, or k = log n
- Our mergesort complexity is O(n log n)
27 What if n is Not a Power of 2?
- If n is not a power of 2, not all of the base cases occur at the last level
- But all base cases will occur on the last or second-to-last level
  - all base cases occur on either level log n or log(n - 1)
- The complexity is then between n log(n - 1) and n log n
  - note that log(n - 1) > (log n)/2 once n is at least 3, so O(log(n - 1)) = O(log n)
- Mergesort then has best/average/worst case complexities all of O(n log n), irrespective of the order of the data or whether the number of data is a power of 2
28 Quicksort
- Recall the partition algorithm used to find the kth smallest item in an array
- Quicksort is centered around partition
  - find a pivot point in an array whereby all elements to the pivot's left are < the pivot and all elements to the pivot's right are > the pivot
  - recursively do the same thing to both the left-hand side and right-hand side of the array about the pivot
- Thus, some element, p, is in the right location in the array because all elements < p are to its left and all elements > p are to its right
- Now recursively sort the left side and the right side
- Quicksort has two parts
  - find a pivot point and partition the array as shown below
  - recursively call quicksort with the two subarrays (S1 and S2 below)
29 Quicksort Code

partition(theArray, first, last)
    pivot = theArray[first];
    temp = first;
    for (f1 = first+1; f1 <= last; f1++)
        if (theArray[f1].compareTo(pivot) < 0) {
            temp++;
            tempItem = theArray[f1];
            theArray[f1] = theArray[temp];
            theArray[temp] = tempItem;
        }
    tempItem = theArray[first];
    theArray[first] = theArray[temp];
    theArray[temp] = tempItem;
    return temp;    // lastS1: the last index of S1, i.e. the pivot's final position

quickSort(theArray, first, last)
    if (first < last) {
        pivotIndex = partition(theArray, first, last);
        quickSort(theArray, first, pivotIndex-1);
        quickSort(theArray, pivotIndex+1, last);
    }
30Example
Our array starts as 6 3 7 4 2 9 1 8
5 After partition executes 5 3 4 2 1 6
7 8 9 partition returns 5 (index of pivots
new location) Now we recursively call QuickSort
with the array and the two locations
that represent the start of the lower array (0)
and the upper array (6)
31 Quicksort Analysis
- Partition is O(n)
  - it iterates from first+1 to last (no more than n-1 iterations), each time doing 1 comparison and possibly 4 assignment statements, followed by moving the pivot (4 more assignment statements)
- How many times does partition get called?
  - this is trickier than mergesort, because mergesort always divided an array into two equal sized arrays
  - if pivotIndex is always halfway between first and last, then, like mergesort, we will always be dividing an array into 2 equal (or nearly equal) sized subarrays and therefore have log n levels
  - if we select a pivot such that it winds up closer to one end of the array than the other, we may not have all of our base cases end at the bottom two levels
    - consider trying to sort 1 5 3 2 6 8 7 4 - since all values are > 1 (our pivot), we end up after partition with the same array, and therefore we recursively call quicksort with an empty array (to the left of 1) and an array of size n - 1 (to the right of 1)
    - if we divide our array into an array of size 0 and an array of size n - 1, we will have as many as n levels
  - the complexity of quicksort therefore ranges between n*k*log n and n*k*n, or O(n log n) and O(n^2)
  - the best and average case is closer to O(n log n)
  - when will the worst case arise?
32 Quicksort's Worst Case
- Surprisingly, quicksort's worst case is when the array is already sorted in ascending or descending order
  - consider the array to the right
  - each time, the pivot chosen is the first value of the subarray, and it is always already in its proper place
  - the quicksort method then recurses on an array of size 0 and an array of size n-1, leading to n total quicksort calls
- The selection of the pivot can make a difference in the performance of quicksort
  - by selecting the first value in the array, quicksort's performance deteriorates if the array is nearly sorted
- There are other strategies available
  - select the pivot and pivotIndex randomly
  - select 3 possible pivot values and select the middle value of the 3 for your pivot (a sketch follows below)
  - select the pivot as the middle element of the array
- So as not to complicate partition unnecessarily, select the pivot and swap that value with the value at position first, before starting partition
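- A small sketch (mine, not the book's code) of the median-of-three strategy mentioned above: examine the first, middle and last elements, and swap the median of the three into position first before calling partition, so an already-sorted array no longer produces the worst case:

private static void medianOfThreePivot(Comparable[] theArray, int first, int last) {
    int mid = (first + last) / 2;
    // order the three sampled positions so that theArray[mid] holds their median
    if (theArray[mid].compareTo(theArray[first]) < 0) swap(theArray, first, mid);
    if (theArray[last].compareTo(theArray[first]) < 0) swap(theArray, first, last);
    if (theArray[last].compareTo(theArray[mid]) < 0) swap(theArray, mid, last);
    swap(theArray, first, mid);    // move the median into position first for partition
}

private static void swap(Comparable[] a, int i, int j) {
    Comparable tmp = a[i]; a[i] = a[j]; a[j] = tmp;
}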
33 Finding the Kth Smallest Revisited

findKSmallest(array, k, first, last)
    pivotIndex = partition(array, first, last);
    if (pivotIndex == k) return array[k];
    if (pivotIndex < k)
        return findKSmallest(array, k, pivotIndex+1, last);
    else
        return findKSmallest(array, k, first, pivotIndex-1);
- We now revisit finding the kth smallest item in an array, from chapter 3
- The partition algorithm is the same as in quicksort
- The remainder of the algorithm is shown to the right - notice that, unlike quicksort, we only recurse on the array to the left OR the right side of the pivot, which can reduce the complexity
- What is this algorithm's complexity?
  - Best case: O(n)
    - you would have to partition at least once
  - Worst case: O(n^2)
  - Average case: O(n)
- Partition the array around the pivot value; if the pivot falls at position k, you have found the kth smallest (actually, the (k+1)th smallest, since Java arrays start at 0); otherwise, if pivotIndex < k, then the kth smallest resides in the upper portion of the array, so recurse using the array to the right of the pivot; otherwise recurse using the array to the left of the pivot
34 Radix Sort
- This is a non-comparison sort
- This means that the sort does not compare array elements against each other
  - instead, it uses a collection of queues
  - peel off a digit/character of each item in the array
  - enqueue that array item into a queue matching the digit/character that we peeled off
  - once all array elements are inserted into their proper queues, dequeue them one at a time from each queue, placing each item back into the original array
  - repeat this process for each digit/character, right-to-left
  - see the example to the right
35 Radix Sort Analysis
- Pseudocode is given to the right
- Complexity is misleading here
  - two nested for-loops: k1 * n * d instructions
  - nested for/while loop that must dequeue n total items, so k2 * n instructions
  - d is the number of digits/characters of the largest (longest) array element, so it is a constant
- Radix Sort's complexity then is O(n) in the best, worst and average cases!
- The constant k can be quite large
  - for ints (10 digits), k is at least 10
  - for Strings that might be 255 characters long, k could be as large as 255!
- So the O(n) of radix sort can actually be worse than all of the previous sorting algorithms, because k may be so large!
let d be the size of the largest item in digits or characters
initialize q queues (10 for digits, 256 for ASCII characters, 65536 for Unicode characters!)
for (j = d; j > 0; j--) {
    for (i = 0; i < n; i++) {
        k = the jth digit/character of the ith array element
        queue[k].enqueue(theArray[i]);
    }
    for (i = 0; i < q; i++)
        while (!queue[i].empty()) {
            temp = queue[i].dequeue();
            add temp to the back of the array
        }
}

How do we get the jth digit/character? For Strings, use charAt(j-1)
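A runnable Java version of the idea (my own sketch, not the book's code), for non-negative ints with up to d decimal digits:

import java.util.ArrayDeque;
import java.util.Queue;

public class RadixSortDemo {
    public static void radixSort(int[] a, int d) {
        Queue<Integer>[] queues = new Queue[10];           // one queue per decimal digit
        for (int q = 0; q < 10; q++) queues[q] = new ArrayDeque<>();
        int divisor = 1;
        for (int pass = 1; pass <= d; pass++) {            // one pass per digit, right to left
            for (int value : a)
                queues[(value / divisor) % 10].add(value); // peel off the current digit
            int index = 0;
            for (int q = 0; q < 10; q++)                   // dequeue back into the array
                while (!queues[q].isEmpty())
                    a[index++] = queues[q].remove();
            divisor *= 10;
        }
    }
    public static void main(String[] args) {
        int[] data = {329, 457, 657, 839, 436, 720, 355};
        radixSort(data, 3);
        System.out.println(java.util.Arrays.toString(data));  // [329, 355, 436, 457, 657, 720, 839]
    }
}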
36 Conclusions/Comparisons
Recall that the best case complexities for Bubble Sort and Insertion Sort are O(n). Of the above sorts (not counting Heapsort), surprisingly, Quicksort has the best run-time performance on average, but Mergesort is the only one that guarantees O(n log n) performance in every case (excluding Radix Sort, which may be less efficient depending on the type of data). We will cover Treesort and Heapsort later in the semester.