Title: Computing Science 1P
1Computing Science 1P
Lecture 19, Friday 9th March
Simon Gay, Department of Computing Science,
University of Glasgow
2006/07
2What's coming up?
Fri 9th March (today): lecture as normal
Mon 12th to Wed 14th March: labs: FPP
Wed 14th March: lecture / tutorial as normal
Fri 16th March: NO LECTURE
Mon 19th March to Fri 6th April: EASTER BREAK
Tue 10th to Wed 11th April (Monday is a holiday): drop-in labs / FPP
Wed 11th April: lecture / tutorial as normal
NORMAL SCHEDULE RESUMES
3Free Programming Project 2
We feel that the FPP in semester 1 was very
beneficial for those of you who did it, and there
is some evidence that you enjoyed it too.
So, there will be another FPP now, and the
handout describes it.
As an added incentive, there will be prizes for
the best projects.
4FPP Timetable
Fri 9th March: Unit 16 (FPP2) handed out. Start
thinking about what you want to do.
Mon 12th to Wed 14th March: Discuss your idea with
your tutor; write a clear specification, work on
a plan.
Easter Break: Further work on your project, if you
wish.
Tue 10th to Wed 11th April: Further work and advice
from tutors.
Mon 16th to Wed 18th April: Demonstration to your
tutor; submission (there will also be another Unit
this week).
Tutors will nominate the best projects from each
group; the lecturers will select the winners;
winners will also be asked to explain their
programs.
5More on function parameters
We are very familiar with the idea of defining a
function with parameters
def f(x,y,z):
and then calling the function with the correct
number of parameters in the correct order
f(1,"hello",1.2)
So far, this is the norm in most programming
languages. Python is unusually flexible in
providing extra features.
6Naming the parameters when calling a function
Optionally we can give the name of the parameter
when we call the function
f(x=1, y="hello", z=1.2)
Why would we do this?
If the parameters have informative names, then
the function call (as well as the function
definition) becomes more readable
def lookup(phonebook,name):
number = lookup(phonebook=myBook, name="John")
7More on naming parameters
If we name the parameters when calling a
function, then we don't have to put them in the
correct order
number = lookup(phonebook=myBook, name="John")
number = lookup(name="John", phonebook=myBook)
are both correct.
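A small sketch of the two calls above. The slides only give the header of lookup, so the body used here, the choice of a dictionary for the phone book, and the sample entry are all assumptions made for illustration.

```python
# Hypothetical body for lookup: the slides only show its header.
def lookup(phonebook, name):
    return phonebook[name]

myBook = {"John": "555 0100"}   # made-up sample data

# Named parameters can be given in either order.
a = lookup(phonebook=myBook, name="John")
b = lookup(name="John", phonebook=myBook)
print(a == b)   # prints True
```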
8Default values of parameters
We can specify a default value for a parameter of
a function. Giving a value to that parameter when
calling the function then becomes optional.
Example
def lookup(phonebook,name,errorvalue=""):
then
number = lookup(myBook, "John")
is equivalent to
number = lookup(myBook, "John", "")
9Default values of parameters
def lookup(phonebook,name,errorvalue=""):
If we want to we can write
number = lookup(myBook, "John", "Error")
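The effect of the default value can be seen in a short sketch. The body of lookup is again an assumption (the slides give only the header): here it returns errorvalue when the name is not in the phone book, which is taken to be a dictionary with made-up contents.

```python
# Hypothetical body: return errorvalue when the name is missing.
def lookup(phonebook, name, errorvalue=""):
    if name in phonebook:
        return phonebook[name]
    return errorvalue

myBook = {"John": "555 0100"}   # made-up sample data

print(lookup(myBook, "John"))            # prints 555 0100
print(repr(lookup(myBook, "Mary")))      # prints '' (the default value)
print(lookup(myBook, "Mary", "Error"))   # prints Error
```

A parameter with a default value can also be named in the call, as in lookup(myBook, "Mary", errorvalue="Error").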
10Algorithms
We're going to spend a little time discussing
algorithms, a central aspect of computing science
and programming.
An algorithm is a systematic method or procedure
for solving a problem. Every computer program is
based on one or more algorithms, sometimes
simple, sometimes very complex.
11Quoted from Wikipedia
The word algorithm comes from the name of the 9th
century Persian mathematician Abu
Abdullah Muhammad ibn Musa al-Khwarizmi whose
works introduced Arabic numerals and algebraic
concepts. He worked in Baghdad at the time when
it was the centre of scientific studies and
trade. The word algorism originally referred
only to the rules of performing arithmetic using
Arabic numerals but evolved via European Latin
translation of al-Khwarizmi's name into
algorithm by the 18th century. The word evolved
to include all definite procedures for solving
problems or performing tasks.
12Algorithms
For a given problem there may be several
algorithms which will give the solution. We are
often interested in the most efficient algorithm;
usually this means the fastest.
A fundamental discovery of computing science is
the existence of so-called NP-complete problems.
These are problems which, as far as we know,
cannot be solved efficiently; however, an
efficient algorithm for any one of them would
mean that we could solve all of them efficiently.
We'll say a little more about this later, but
first let's see how different algorithms can be
more or less efficient.
13Sorting
Sorting means putting data into order: numerical,
alphabetical, whatever.
As you know, it is a fundamental operation
provided by databases; data is often stored in a
sorted form to make searching easier (e.g.
telephone directories).
Python lists have a built-in sort method. We can
happily use it, but as computing scientists we
would also like to know how it works.
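For reference, a brief sketch of the built-in facilities. The sort method changes the list it is called on; the built-in function sorted returns a new list instead. The sample lists are made up for illustration.

```python
data = [5, 3, 1, 8, 2, 7, 6, 4]

data.sort()            # the list method sorts in place and returns None
print(data)            # prints [1, 2, 3, 4, 5, 6, 7, 8]

words = ["pear", "apple", "fig"]
print(sorted(words))   # prints ['apple', 'fig', 'pear'] (a new list)
print(words)           # prints ['pear', 'apple', 'fig'] (unchanged)
```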
Let's start by thinking about possible algorithms
for sorting.
14How do we put things in order?
Think specifically about a list of numbers; we
want to put them into increasing order. How do we
do it?
Obvious idea:
Find the smallest number (we know how to do
that!). Remove it and put it into the first
position of a new list. Now find the smallest of
the remaining numbers; it should become the
second item of the new list. And so on.
15Selection Sort
original data: 5 3 1 8 2 7 6 4
Find the smallest by looking along the list from
the beginning: it is 1.
Start a new list with the smallest item.
original data: 5 3 8 2 7 6 4     sorted data: 1
Find the smallest by looking along the list from
the beginning: it is 2.
Put the smallest item into the new list.
original data: 5 3 8 7 6 4       sorted data: 1 2
And so on, until the original list is empty.
22Selection Sort Alternative
It is possible to reformulate the algorithm so
that instead of removing items from the original
list and putting them in a new list, we modify
the original list by moving items within it.
(In fact this is the more usual way to present
it).
23Selection Sort Alternative
(The sorted part of the list, yellow on the
slides, is shown here in square brackets.)
[] 5 3 1 8 2 7 6 4     find smallest item (1); swap it with the first item
[1] 3 5 8 2 7 6 4      the yellow part is now sorted; find smallest item in the non-yellow part (2); swap it with the first item in the non-yellow part
[1 2] 5 8 3 7 6 4      and now the sorted (yellow) part of the list is bigger; continue
[1 2 3] 8 5 7 6 4      continue
[1 2 3 4] 5 7 6 8      continue: 5 is in place already
[1 2 3 4 5] 7 6 8      continue
[1 2 3 4 5 6] 7 8      continue: 7 is in place already
[1 2 3 4 5 6 7] 8      continue: the last item is guaranteed to be in place
[1 2 3 4 5 6 7 8]      finished
42Selection Sort in Python
The first version, which builds a new list
def sort(x):
    s = []
    while len(x) > 0:
        p = 0                  # position of minimum so far
        i = 1
        while i < len(x):      # loop over the rest of x
            if x[i] < x[p]:    # smaller item found
                p = i          # update position
            i = i + 1
        s = s + [x[p]]         # put smallest in the new list
        del x[p]               # and remove from x
    return s
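One consequence of the first version is that it deletes items from the list it is given, so the caller's list ends up empty. The sketch below illustrates this. The name selection_sort_new_list is hypothetical, and the inner loop is written with for rather than while; otherwise it follows the same steps.

```python
def selection_sort_new_list(x):
    """Selection sort that builds a new list and empties x."""
    s = []
    while len(x) > 0:
        p = 0                          # position of minimum so far
        for i in range(1, len(x)):     # loop over the rest of x
            if x[i] < x[p]:
                p = i
        s.append(x[p])                 # put smallest in the new list
        del x[p]                       # and remove it from x
    return s

data = [5, 3, 1, 8, 2, 7, 6, 4]
result = selection_sort_new_list(list(data))   # pass a copy to keep data
print(result)    # prints [1, 2, 3, 4, 5, 6, 7, 8]
print(data)      # prints [5, 3, 1, 8, 2, 7, 6, 4]

emptied = [3, 1, 2]
selection_sort_new_list(emptied)               # no copy this time
print(emptied)   # prints []
```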
43Selection Sort in Python
The second version, which modifies the original
list
def sort(x):
    i = 0
    while i < len(x):
        p = i                  # position of minimum so far
        j = i+1
        while j < len(x):      # loop over the rest of x
            if x[j] < x[p]:    # smaller item found
                p = j          # update position
            j = j + 1
        temp = x[i]            # move smallest into position i,
        x[i] = x[p]            # extending the sorted region
        x[p] = temp            # of x
        i = i + 1
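The second version has no return statement: it rearranges the list it is given, much as the built-in sort method does. A sketch of how it behaves when called follows. The name selection_sort_in_place is hypothetical, and the three-line swap with temp is replaced here by a tuple assignment, which does the same job.

```python
def selection_sort_in_place(x):
    """Selection sort that rearranges x itself and returns nothing."""
    for i in range(len(x)):
        p = i                            # position of minimum so far
        for j in range(i + 1, len(x)):   # loop over the rest of x
            if x[j] < x[p]:
                p = j
        x[i], x[p] = x[p], x[i]          # swap without a temp variable

data = [5, 3, 1, 8, 2, 7, 6, 4]
returned = selection_sort_in_place(data)
print(data)       # prints [1, 2, 3, 4, 5, 6, 7, 8]
print(returned)   # prints None
```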
44Analyzing Selection Sort
How can we begin to analyze the efficiency
(meaning speed) of selection sort?
Of course we could try it on various data sets
and measure the time taken, but different
computers have different processing speeds, so in
general the time taken to sort 1000 numbers on
my computer does not tell you much about how long
it would take on your computer.
Also, as computing scientists, we would like to
understand something more fundamental than
empirical measurements.
45Counting Comparisons
The first idea is to analyze the algorithm and
work out how many computational steps are needed
to solve a problem of a given size.
For sorting algorithms it turns out that the most
relevant kind of computational step is the
comparison of two items in the list.
If the items are large pieces of data, e.g. long
strings, then comparing them can be slow, and all
of the other steps in the algorithm are
relatively quick.
For sorting algorithms we are interested in the
number of comparisons needed to sort n items,
expressed in terms of n.
46Analyzing Selection Sort
Assume that we start with a list of length n.
To find the smallest item, we go round a loop n-1
times, doing a comparison each time (items 2 ... n
are each compared with the smallest item found so
far).
Then we find the smallest of n-1 items, then the
smallest of n-2, and so on.
The total number of comparisons is
(n-1) + (n-2) + (n-3) + ... + 2 + 1
47Analyzing Selection Sort
If you are taking Maths, you know that
(n-1) + (n-2) + (n-3) + ... + 2 + 1 = n(n-1)/2
which can easily be proved by induction.
Or, as a picture: [diagram with sides labelled n
and n-1]
48Analyzing Selection Sort
Selection sort needs n(n-1)/2 comparisons to sort
n items.
As n becomes large, the dominant term is n²/2 and
we say that selection sort is an order n²
algorithm.
This tells us something useful, independently of
the speed of a particular computer.
If it takes a certain time to sort a certain data
set, then to sort 10 times more data will take
100 times as long. To sort 1000 times more data
will take 1 000 000 times as long. And so on.
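The count can be checked by instrumenting the algorithm. The sketch below is an illustration rather than part of the lecture code: count_comparisons is a hypothetical helper that runs the in-place selection sort on a copy of its argument and counts how many times two items are compared.

```python
def count_comparisons(x):
    """Selection sort a copy of x; return the number of comparisons made."""
    x = list(x)
    comparisons = 0
    for i in range(len(x)):
        p = i
        for j in range(i + 1, len(x)):
            comparisons += 1             # one comparison of two items
            if x[j] < x[p]:
                p = j
        x[i], x[p] = x[p], x[i]
    return comparisons

for n in [10, 100, 1000]:
    data = list(range(n, 0, -1))         # n items in decreasing order
    print(n, count_comparisons(data), n * (n - 1) // 2)
# prints:
# 10 45 45
# 100 4950 4950
# 1000 499500 499500
```

Each time n grows by a factor of 10, the count grows by a factor of roughly 100, whatever the initial order of the data.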
49Analyzing Selection Sort
n            n²             time
10           100
100          10 000
1 000        1 million      1 sec
10 000       100 million    100 sec
100 000      10 billion     3 hours
1 000 000    1 trillion     12 days
10 000 000   100 trillion   3 years
(times assume about 1 million comparisons per second)
50Can we do better?
There are several fairly obvious sorting
algorithms which are all order n². You can look
them up, e.g. insertion sort and bubble sort. They
may run at different speeds for particular data
sets, but they all have the feature that the
running time is proportional to the square of the
size of the data set.
It turns out that there are more efficient
sorting algorithms. The simplest to describe is
merge sort, so we'll look at that.
51Merge Sort
First we need the idea of merging two sorted
lists to form a new list which is also sorted.
sorted lists: 1 3 5 8  and  2 4 6 7
At each step, move the smallest remaining item
into the new list:
smallest: 1     new list: 1
smallest: 2     new list: 1 2
smallest: 3     new list: 1 2 3
smallest: 4     new list: 1 2 3 4
smallest: 5     new list: 1 2 3 4 5
smallest: 6     new list: 1 2 3 4 5 6
smallest: 7     new list: 1 2 3 4 5 6 7
only thing left: 8     new list: 1 2 3 4 5 6 7 8
59Merge Sort
Given some original data to sort
5 3 1 8 2 7 6 4
split it into two halves
5 3 1 8     2 7 6 4
sort each half (how? using merge sort!)
1 3 5 8     2 4 6 7
and merge
1 2 3 4 5 6 7 8
60Merge in Python
def merge(x,y):
    i = 0                      # position in x
    j = 0                      # position in y
    z = []                     # new list
    while i < len(x) and j < len(y):
        if x[i] < y[j]:        # next item comes from x
            z = z + [x[i]]
            i = i + 1
        else:                  # next item comes from y
            z = z + [y[j]]
            j = j + 1
    if i < len(x):             # unmerged items remain in x
        z = z + x[i:]
    else:                      # unmerged items remain in y
        z = z + y[j:]
    return z
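A compact sketch of merging, run on the two sorted lists from the slides and on two edge cases (an empty list, and repeated items). It follows the same steps as the slide but uses append and a single return; this is one possible way to write it, not the only one.

```python
def merge(x, y):
    """Merge two sorted lists into a new sorted list."""
    i, j, z = 0, 0, []
    while i < len(x) and j < len(y):
        if x[i] < y[j]:          # next item comes from x
            z.append(x[i])
            i += 1
        else:                    # next item comes from y
            z.append(y[j])
            j += 1
    # At most one of these two slices is non-empty.
    return z + x[i:] + y[j:]

print(merge([1, 3, 5, 8], [2, 4, 6, 7]))   # prints [1, 2, 3, 4, 5, 6, 7, 8]
print(merge([], [2, 4]))                   # prints [2, 4]
print(merge([1, 1, 2], [1, 3]))            # prints [1, 1, 1, 2, 3]
```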
61Merge Sort in Python
def sort(x):
    if len(x) <= 1:
        return x
    else:
        d = len(x)//2          # integer division: the midpoint
        return merge(sort(x[:d]),sort(x[d:]))
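Putting the two functions together, a self-contained sketch run on the data from the slides. The name merge_sort is used instead of sort only to keep it distinct from the earlier versions. Unlike the in-place selection sort, it returns a new list and leaves the original alone.

```python
def merge(x, y):
    """Merge two sorted lists into a new sorted list."""
    i, j, z = 0, 0, []
    while i < len(x) and j < len(y):
        if x[i] < y[j]:
            z.append(x[i])
            i += 1
        else:
            z.append(y[j])
            j += 1
    return z + x[i:] + y[j:]

def merge_sort(x):
    """Sort x by splitting it in half, sorting each half, and merging."""
    if len(x) <= 1:              # lists of length 0 or 1 are already sorted
        return x
    d = len(x) // 2              # integer division gives the midpoint
    return merge(merge_sort(x[:d]), merge_sort(x[d:]))

data = [5, 3, 1, 8, 2, 7, 6, 4]
print(merge_sort(data))   # prints [1, 2, 3, 4, 5, 6, 7, 8]
print(data)               # prints [5, 3, 1, 8, 2, 7, 6, 4]
```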
62Analyzing Merge Sort
The algorithm repeatedly splits lists in half,
sorts them, then merges the results. All the
comparisons are in the merging.
Think of it like this
length 1
merge
length n/4
merge
length n/2
merge
length n
63Analyzing Merge Sort
Merging to produce a list of length n requires
at most n-1 comparisons. The important thing is
that this is order n.
Each round of merging requires n comparisons in
total (not exactly, but we only care about the
fact that it is n not n² or something else).
How many rounds of merging are there? Easiest to
see if n is a power of 2
n = 8: 3 rounds; n = 16: 4 rounds; n = 32: 5
rounds; and so on.
The number of rounds is log n (meaning
logarithm to base 2).
64Analyzing Merge Sort
There are log n rounds of merging, each requiring
n comparisons. We say that merge sort has order
n log n.
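As with selection sort, the claim can be checked by counting. The sketch below is an illustration only: merge_sort_count is a hypothetical helper that performs merge sort and also reports how many comparisons were made, and the test data is an arbitrary fixed rearrangement of the numbers 0 to n-1.

```python
import math

def merge_sort_count(x):
    """Return (sorted copy of x, number of comparisons used)."""
    if len(x) <= 1:
        return list(x), 0
    d = len(x) // 2
    left, c_left = merge_sort_count(x[:d])
    right, c_right = merge_sort_count(x[d:])
    merged, i, j, c = [], 0, 0, 0
    while i < len(left) and j < len(right):
        c += 1                           # one comparison of two items
        if left[i] < right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged += left[i:] + right[j:]
    return merged, c_left + c_right + c

for n in [8, 16, 32, 1024]:
    data = [(i * 7919) % n for i in range(n)]   # fixed, jumbled order
    result, count = merge_sort_count(data)
    print(n, count, round(n * math.log2(n)))    # count stays below n log n
```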
65Comparing Selection Sort and Merge Sort
We now know that selection sort has order n² and
merge sort has order n log n.
n            n log n       time       n²             time
10           33                       100
100          664                      10 000
1 000        9 966         0.01 sec   1 million      1 sec
10 000       132 877       0.1 sec    100 million    100 sec
100 000      1.6 million   1.6 sec    10 billion     3 hours
1 000 000    20 million    20 sec     1 trillion     12 days
10 000 000   230 million   4 min      100 trillion   3 years
(times assume about 1 million comparisons per second)
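The figures in such a table can be recomputed with a few lines of code. This sketch assumes a machine that performs one million comparisons per second (the rate implied by the row where 1 million comparisons take 1 sec); RATE and row are names introduced here for illustration.

```python
import math

RATE = 10 ** 6   # assumed number of comparisons per second

def row(n):
    """Return (n, n log2 n, n squared, seconds for each at RATE)."""
    nlogn = n * math.log2(n)
    return n, round(nlogn), n * n, nlogn / RATE, n * n / RATE

for k in range(1, 8):                    # n = 10, 100, ..., 10 000 000
    n, nlogn, nsq, t_merge, t_select = row(10 ** k)
    print(n, nlogn, nsq, round(t_merge, 2), "sec", round(t_select, 4), "sec")
```

The times are printed in seconds; dividing by 3600, or by 86 400, converts them to hours or days.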
66Conclusion
There are usually many algorithms for a given
problem; some are more efficient than others, and
the difference can have huge practical significance.
The subject of algorithm analysis is a large area
of CS. It will come back later in the degree,
especially in Levels 3 and 4.
Even for the problem of sorting, there is much
more to say than the fact that merge sort is
better than selection sort. It is possible to
prove that a sorting algorithm based on comparing
items can't do better than order n log n, unless
the data has special properties.