Unit 12: Theory of Computation

Slides: 64

Provided by: daphnawe

1
Unit 12: Theory of Computation
syllabus

Algorithm design: the limits of algorithms - some problems are unsolvable
Algorithm efficiency: how do we measure the efficiency of an algorithm?
Improvement by factor and by order of magnitude
Some examples of complexity analysis
Intractable problems

basic programming concepts
object oriented programming
topics in computer science
2
Theory of Computation: Questions

Computability: are there algorithms which can solve our problem? Is there something we can say about every algorithm which solves the problem?
Complexity: how good is an algorithm which solves the problem?
is it efficient in terms of processing steps (time)?
is it efficient in terms of storage space (memory)?
how do we compare the efficiency of algorithms?
Verification: given an algorithm that solves the problem, how can we be sure that the algorithm is correct?

3
1. Computability

Can computers become powerful enough to enable us to solve any problem? Is it just a matter of waiting, or is there a more fundamental limit?
Answer: there are problems which cannot be solved by any computer!
This question was studied by mathematicians in the early 20th century, leading to one famous counterexample: the Halting Problem (Alan Turing, 1937)

4
The Halting Problem: assumption

Problem: given a program P and input x, does the program P halt on the input x?
Assumption: this problem is computable
there is an algorithm which always returns a yes/no answer
there exists a method
boolean doesHalt(P, x)
that returns true if P halts on the specified input x, and false if P does not halt on the specified input x
Goal: find a contradiction

5
Method doesHalt

boolean doesHalt(String P, String x)
// implements an algorithm which determines if program P halts on input x
read the program P (which is just a text file)
read the input x
run the algorithm
return true if P halts on the specified input x
return false if P does not halt on input x

6
The Halting Problem: Setup

Define a new method
testHalt(String P)
    if (doesHalt(P, P))
        loop forever
    else
        print "halt"
testHalt(P) does the opposite of what doesHalt(P, P) predicts

7
The logical catch

What happens if we run testHalt, and give it as input testHalt itself?
testHalt(testHalt)
??

8
The Halting Problem: Paradox

Suppose testHalt(testHalt) terminates and prints "halt"
→ doesHalt(testHalt, testHalt) returned false
→ testHalt(testHalt) does not terminate
Suppose testHalt(testHalt) loops forever
→ doesHalt(testHalt, testHalt) returned true
→ testHalt(testHalt) terminates
Conclusion: method testHalt() cannot exist. Since testHalt is built only from doesHalt, doesHalt cannot exist either
therefore our assumption is wrong
we say that the Halting Problem is undecidable
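The two cases of the paradox can be traced mechanically. The following is only a sketch in Java: doesHalt cannot be implemented, so the code models just the answer it would claim for the single call doesHalt(testHalt, testHalt). The class and method names (HaltingParadox, actuallyHalts) are illustrative and not part of the slides.

```java
// Sketch only: doesHalt cannot exist, so we model nothing but its claimed
// answer for the call doesHalt(testHalt, testHalt).
class HaltingParadox {

    // Given the claimed answer, return what testHalt(testHalt) would really do:
    // true = it halts, false = it loops forever.
    static boolean actuallyHalts(boolean oracleSaysHalts) {
        if (oracleSaysHalts) {
            return false; // testHalt takes the "loop forever" branch
        } else {
            return true;  // testHalt prints "halt" and terminates
        }
    }

    public static void main(String[] args) {
        for (boolean claim : new boolean[] {true, false}) {
            System.out.println("doesHalt claims halts=" + claim
                    + ", testHalt actually halts=" + actuallyHalts(claim));
        }
    }
}
```

Whichever answer doesHalt gives, the actual behavior is the opposite, which is exactly the contradiction on the slide.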

9
Decidability - the Bright Side

We have already seen that
Many problems can be solved algorithmically
There may be more than one way to solve a
particular problem

10
Models of computation

ideal computer model: simple to analyze, yet as powerful as a real computer
necessary features of a computing model
accepts input
stores and retrieves information (memory)
takes actions depending on internal state and
input
produces output

11
Conceptual Model: Turing Machine

Information representation
an alphabet containing b (blank), 0, 1, x, y, ...
a finite set of states
an infinite tape divided into cells, holding memory and input/output
each cell contains one symbol from the alphabet, with a finite number of non-blank symbols
a read/write head

12
Turing machine programs

Action: (s, a) → (s', a', ±1), written as the tuple (s, a, s', a', ±1)
Interpretation: for current state (s) and input symbol (a)
write a new symbol a'
go into new state s'
move one cell left (-1) or right (+1)
such a collection of instructions is called a Turing machine program (and a model for an algorithm)
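To make the instruction format concrete, here is a minimal simulator sketch in Java. It assumes 'b' is the blank symbol and that the machine halts when no instruction matches the current state and symbol; the class TuringMachineDemo, the Rule type and the bit-flipping program are hypothetical examples, not taken from the slides.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal Turing machine simulator (illustrative sketch, not from the slides).
class TuringMachineDemo {
    static final char BLANK = 'b';

    // One instruction (s, a) -> (s', a', move): what to write, where to go, how to move.
    static final class Rule {
        final char write;
        final String nextState;
        final int move; // -1 = left, +1 = right

        Rule(char write, String nextState, int move) {
            this.write = write;
            this.nextState = nextState;
            this.move = move;
        }
    }

    // Runs until no instruction matches (state, symbol), then returns the visited
    // part of the tape with surrounding blanks stripped. A program that never
    // halts makes this method loop forever.
    static String run(Map<String, Rule> program, String startState, String input) {
        Map<Integer, Character> tape = new HashMap<>();
        for (int i = 0; i < input.length(); i++) {
            tape.put(i, input.charAt(i));
        }
        String state = startState;
        int head = 0, min = 0, max = Math.max(0, input.length() - 1);
        while (true) {
            char symbol = tape.getOrDefault(head, BLANK);
            Rule rule = program.get(state + "," + symbol);
            if (rule == null) {
                break; // no applicable instruction: the machine halts
            }
            tape.put(head, rule.write);
            state = rule.nextState;
            head += rule.move;
            min = Math.min(min, head);
            max = Math.max(max, head);
        }
        StringBuilder out = new StringBuilder();
        for (int i = min; i <= max; i++) {
            out.append(tape.getOrDefault(i, BLANK));
        }
        int start = 0, end = out.length();
        while (start < end && out.charAt(start) == BLANK) start++;
        while (end > start && out.charAt(end - 1) == BLANK) end--;
        return out.substring(start, end);
    }

    // Hypothetical example program: flip every bit, halt at the first blank.
    static Map<String, Rule> flipBits() {
        Map<String, Rule> program = new HashMap<>();
        program.put("s0,0", new Rule('1', "s0", +1));
        program.put("s0,1", new Rule('0', "s0", +1));
        return program;
    }

    public static void main(String[] args) {
        System.out.println(run(flipBits(), "s0", "0110"));
    }
}
```

The program is just a lookup table keyed by (state, symbol), which is all a Turing machine program is.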

13
2. Complexity: Time Efficiency

How do we measure time efficiency?
Assume we have a problem P, with two algorithms A1 and A2 that solve it
Suppose that the algorithms were implemented on a computer, and their running times were measured
Algorithm A1: 1.25 seconds
Algorithm A2: 0.34 seconds
May we conclude that algorithm A2 is better?

Probably not!
14
Time Efficiency Questions We Must Ask

Were the algorithms tested on the same computer?
Which computer did we use? Is there a preferred
benchmark computer to test the algorithms?
What were the inputs given to the algorithm? Were
the inputs equal? Of equal size?
Is there a better way for measuring time
efficiency, independent of a particular computer?

15
Operations per Input Size

Measure amount of work as a function of the
size of input given to the algorithm
In an array sorting algorithm - number of cells
to sort
In an algorithm for finding a word in a text -
number of characters, or number of words

16
Measuring Efficiency

Measure:
Number of steps the algorithm performs for every input size (as a function of the input size)
Definition of step:
Anything that takes approximately constant time to run (i.e. running time does not depend on the input size)

17
Algorithmic Steps Examples

In a sort algorithm
switch two adjacent cells
In a search algorithm
read the content of the next cell (or stop)
find out if this is the element we're looking for
In a numeric algorithm for multiplying two numbers
multiply 2 digits / add 2 digits
These steps take constant time to perform, which does not depend on the size of the input (length of list, or number of digits in number)

18
Advantages of the Suggested Measure

It is not dependent on a particular computer
To figure out the running time on a particular
computer, we
estimate how long it takes to perform a basic
step on the particular computer
multiply by the number of steps as calculated for
a specific input size

19
Example: Character Search

Problem: find out if the character c is found in a given text
Solution 1

found ← false
while (more characters to read and found = false)
    read the next character in the text
    if this character is c, found ← true
if (found = false) print (not found)
else print (found)
20
Solution 1: Time Analysis

Input size?
n = number of characters in text
What is a basic step?
Find out if end of text has been reached
Read next character in text
Test if character is c
What is the running time as a function of input size n?
In the worst case, no more than n basic steps of 3 operations each
2 operations before and after the loop
T(n) ≤ 3n + 2

21
Character Search: Simple Improvement

Solution 2

found ← false
add c to end of text
while (found = false)
    read the next character in the text
    if this character is c, found ← true
if (end of text reached) print (not found)
else print (found)
remove c from end of text
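Both solutions can be sketched in Java as follows. This is a hedged illustration: the class and method names are made up here, and because Java arrays have a fixed length, the "add c to end of text" step is modelled by copying the text into an array with one spare cell. That copy costs time itself, so a real implementation would reserve the spare cell in advance.

```java
import java.util.Arrays;

class CharacterSearch {

    // Solution 1: every iteration tests both "more characters?" and "found?".
    static boolean search(char[] text, char c) {
        boolean found = false;
        int i = 0;
        while (i < text.length && !found) {
            if (text[i] == c) {
                found = true;
            }
            i++;
        }
        return found;
    }

    // Solution 2: append c as a sentinel, so the loop needs no end-of-text test.
    static boolean sentinelSearch(char[] text, char c) {
        // Assumption: the copy stands in for "add c to end of text".
        char[] padded = Arrays.copyOf(text, text.length + 1);
        padded[text.length] = c;
        int i = 0;
        while (padded[i] != c) {
            i++;
        }
        // Stopping on the sentinel cell means c was not in the original text.
        return i < text.length;
    }

    public static void main(String[] args) {
        char[] text = "theory of computation".toCharArray();
        System.out.println(search(text, 'p') + " " + sentinelSearch(text, 'p'));
        System.out.println(search(text, 'z') + " " + sentinelSearch(text, 'z'));
    }
}
```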
22
Solution 2: Time Analysis

The basic step is different
Read next character in text
Test if character is c
In the worst case, the running time of Solution 2 is
T(n) ≤ 2n + 4
Consequences
we shortened the time it takes to perform the basic step
but
we added a constant to the overall running time
Question: are we better off?

23
Running Time Tables

Input size          1     3     5     10    100   1000  30000  3000000
3n + 2              5     11    17    32    302   3002  90002  9000002
2n + 4              6     10    14    24    204   2004  60004  6000004
improvement ratio   0.83  1.1   1.21  1.33  1.48  1.5   1.5    1.5

improvement by factor: the ratio between the running times of both solutions, as n grows, converges to a constant
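The convergence can be checked numerically. The short Java sketch below recomputes the ratio (3n + 2) / (2n + 4) for the input sizes in the table; the class and method names are illustrative only.

```java
class ImprovementByFactor {

    static long stepsSolution1(long n) {
        return 3 * n + 2;
    }

    static long stepsSolution2(long n) {
        return 2 * n + 4;
    }

    // Ratio between the worst-case step counts of the two solutions.
    static double ratio(long n) {
        return (double) stepsSolution1(n) / stepsSolution2(n);
    }

    public static void main(String[] args) {
        long[] sizes = {1, 3, 5, 10, 100, 1000, 30000, 3000000};
        for (long n : sizes) {
            System.out.printf("n=%d ratio=%.2f%n", n, ratio(n));
        }
    }
}
```

As n grows, the constants 2 and 4 stop mattering and the ratio approaches 3/2.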
24
Best, Average and Worst cases

We analyzed the worst case, in which the
character c is not in the text
Other possibilities: average case
What is the advantage of measuring the worst
case?
The average case is a good measure, but it
characterizes only the overall performance over
many inputs
Computing the average case is quite complex
What information does best case analysis give us?

25
Finding Phone Number in Phonebook

Problem: find if a number x appears in a sorted array of numbers (e.g., a phonebook)
We can use the algorithms we developed for
character search (both are variants of the serial
search method)
However, the assumption that the array is sorted
can be used in a clever way

26
Binary Search

Basic idea: cut out half of the search space in every step
The basic step in binary search
Divide the remaining search space in 2
Find out which half contains the number we're looking for, and call it the remaining search space
Check termination condition: the number is found at the mid-point, or the remaining search space is of size 1
The basic step in serial search
Calculate the next cell to look at (index = index + 1)
Find out if this cell contains the number we're looking for
Check termination condition: the number is found, or the end of the array is reached

27
Search Efficiency Analysis

Suppose that the search array has 1000 cells
Binary search: in the worst case we inspect mid-points of ranges of size 1000, 500, 250, 125, 63, 32, 16, 8, 4, 2, a total of 10 steps
Serial search: 1,000 steps
How many steps in the general case? About log2(n) for binary search, versus n for serial search
With a million cells
Binary search: 20 steps in the worst case
Serial search: 1,000,000 steps
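These step counts can be reproduced with a small Java sketch that counts the mid-points binary search inspects. The class BinarySearchSteps and its helper range are hypothetical names; the worst case is triggered here by searching for a value larger than every element.

```java
class BinarySearchSteps {

    // Returns the number of mid-points inspected while searching for x
    // in a sorted array, whether or not x is found.
    static int countSteps(int[] sorted, int x) {
        int steps = 0;
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            steps++;
            if (sorted[mid] == x) {
                return steps;
            } else if (sorted[mid] < x) {
                lo = mid + 1; // continue in the upper half
            } else {
                hi = mid - 1; // continue in the lower half
            }
        }
        return steps;
    }

    // Helper: the sorted array 0, 1, ..., n-1.
    static int[] range(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(countSteps(range(1000), 1000));
        System.out.println(countSteps(range(1_000_000), 1_000_000));
    }
}
```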

28
Binary vs. Serial - Number of Steps

Input size          10    100   1000  10000  100000  1000000
serial              10    100   1000  10000  100000  1000000
binary              4     7     10    14     17      20
improvement ratio   2.5   14    100   714    5882    50000

improvement ratio grows as the input size grows
it is called improvement by order of magnitude
in contrast, with improvement by factor, the improvement ratio reached a constant plateau

29
What About the Cost of Basic Step?

When we dealt with improvement by factor, the duration of a basic step mattered a great deal: the improvement was the ratio between the durations of basic steps
Is it important now?
For example, assume that a single step in a serial search takes 1 time unit, and that a single step in a binary search takes 1000 time units: would there still be an improvement?

30
Binary vs. Serial - Different Step Duration

Input size          10      100    1000   10000  100000  1000000  10000000  100000000
serial              10      100    1000   10000  100000  1000000  10000000  100000000
binary              4000    7000   10000  14000  17000   20000    24000     27000
improvement ratio   0.0025  0.014  0.1    0.714  5.88    50       417       3,704
31
Duration of Basic Step is Negligible

Even with an unfavorable basic step duration ratio of 1000/1
for small input sizes (up to 10000), serial search wins
for larger input sizes, binary search wins
The reason:
the ratio between the durations of basic steps is constant
the ratio between the numbers of basic steps grows as the input size grows
Consequence: the dominant factor as the input size grows is the number of basic steps, not their duration

32
Complexity of algorithms

We saw two basic kinds of improvement in the running time of an algorithm
by factor
by order of magnitude
For large inputs the latter improvement is much more significant, canceling any increase in basic step cost
This is why, when comparing two running time functions, we only pay attention to their dominant element, or their order of magnitude

33
Linear Order

In serial search, any running time function will be of the form f(n) = an + b, a linear function
We say that the complexity of the algorithm is linear
Linear order is denoted by f(n) = O(n); this is called the Big-O notation
Note that the ratio between any two linear functions approaches a constant for large enough n, namely the ratio between the durations of the basic steps

34
Complexity: Order of Magnitude

In general, two functions are of the same order if the ratio between their values approaches a nonzero constant for large enough n
Example: all these functions are of quadratic order
n^2, 5n^2 + 6, 5n^2 + 100n - 90, 5000n^2, n^2/6
Hierarchy of orders of magnitude
O(log n) logarithmic
O(n) linear
O(n^2) quadratic
O(n^k) (k > 2) polynomial
O(2^n) exponential

35
Order of Magnitude - Neglecting Minor Elements

When we compare functions we mostly pay attention to the largest order of magnitude
Example: suppose we have two algorithms A1 and A2 whose running times are 100n and n^2/100
for n > 10000, n^2/100 > 100n
We prefer A2 if the input size is less than 10000, and prefer A1 otherwise

36
Example: Prime Test

Problem: determine if a number n is prime
First attempt
check if any of 2..n/2 is a divisor of n
complexity: n/2 = O(n)
Second attempt
check only odd divisors (after ruling out that n is even)
complexity: n/4 = O(n)
Third attempt
check only odd divisors up to sqrt(n)
complexity: O(√n)
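The third attempt might look as follows in Java. This is a sketch under the assumption that n fits in a long; the class name PrimeTest is illustrative. The loop condition is written as d <= n / d rather than d * d <= n to avoid overflow for large n.

```java
class PrimeTest {

    // Third attempt: rule out even numbers, then try only odd divisors up to sqrt(n).
    static boolean isPrime(long n) {
        if (n < 2) {
            return false;
        }
        if (n % 2 == 0) {
            return n == 2; // 2 is the only even prime
        }
        for (long d = 3; d <= n / d; d += 2) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPrime(97));
        System.out.println(isPrime(91));
    }
}
```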

37
Example: Two Letter Occurrences

Problem: for a given text input, find the most frequent adjacent two-letter pair in the text
First attempt
For every pair that appears in the text, count how many times this pair appears in the text, and find the maximum
Complexity: (n-1) * (n-1) = n^2 - 2n + 1 = O(n^2)
Second attempt
Use a two-dimensional 26x26 array
Complexity: (n - 1) + 2*26*26 = O(n)
Tradeoff: added storage complexity, reduced time complexity
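A possible Java version of the second attempt is sketched below. It assumes the text uses lowercase letters a..z and simply skips pairs that contain any other character; the class and method names are hypothetical.

```java
class LetterPairs {

    // Returns the most frequent adjacent pair of lowercase letters,
    // or null if the text contains no such pair.
    static String mostFrequentPair(String text) {
        int[][] counts = new int[26][26]; // Java zero-initializes the array
        // One pass over the text: n - 1 adjacent pairs.
        for (int i = 0; i + 1 < text.length(); i++) {
            char a = text.charAt(i);
            char b = text.charAt(i + 1);
            if (a >= 'a' && a <= 'z' && b >= 'a' && b <= 'z') {
                counts[a - 'a'][b - 'a']++;
            }
        }
        // One pass over the 26x26 table to find the maximum.
        int best = 0;
        String bestPair = null;
        for (int a = 0; a < 26; a++) {
            for (int b = 0; b < 26; b++) {
                if (counts[a][b] > best) {
                    best = counts[a][b];
                    bestPair = "" + (char) ('a' + a) + (char) ('a' + b);
                }
            }
        }
        return bestPair;
    }

    public static void main(String[] args) {
        System.out.println(mostFrequentPair("abcabcab"));
    }
}
```

The table has a fixed size regardless of the text length, which is why the extra storage buys a linear running time.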

38
Example: Ternary Search

Split the search space into three parts
Is it an improvement in order of magnitude? in
factor?

39
Example: Sort

Sorting is the process of arranging a list of
items into a particular order
There must be some value on which the order is
based
There are many algorithms for sorting a list of
items, which vary in efficiency
We will examine two specific algorithms
Selection Sort
Insertion Sort

40
Selection Sort

The approach of Selection Sort
select one value and put it in its final place in the sorted list
repeat for all other values
In more detail
find the smallest value in the list
switch it with the value in the first position
find the next smallest value in the list
switch it with the value in the second position
repeat until all values are placed

41
public static void selectionSort (int[] numbers) {
    int min, temp;
    for (int index = 0; index < numbers.length - 1; index++) {
        min = index;
        for (int scan = index + 1; scan < numbers.length; scan++)
            if (numbers[scan] < numbers[min])
                min = scan;
        // Swap the values
        temp = numbers[min];
        numbers[min] = numbers[index];
        numbers[index] = temp;
    }
}

42
Insertion Sort

The approach of Insertion Sort
Pick any item, insert it into its proper place in
a sorted sublist
repeat until all items have been inserted
In more detail
consider the first item to be a sorted sublist
(of one item)
insert the second item into the sorted sublist,
shifting items as necessary to make room to
insert the new addition
insert the third item into the sorted sublist (of
two items), shifting as necessary
repeat until all values are inserted into their
proper position

43
public static void insertionSort (int[] numbers) {
    for (int index = 1; index < numbers.length; index++) {
        int key = numbers[index];
        int position = index;
        // shift larger values to the right
        while (position > 0 && numbers[position - 1] > key) {
            numbers[position] = numbers[position - 1];
            position--;
        }
        numbers[position] = key;
    }
}
44
Comparing Sorts

Selection and Insertion sorts are similar in efficiency: same order of magnitude
Both have outer loops that scan all elements, and inner loops that compare the value of the outer loop with almost all values in the list
Therefore approximately n^2 comparisons are made to sort a list of size n
We therefore say that these sorts are of order n^2
Still, there is a difference in factor in average time
the inner loop of insertion sort inspects on average half the elements
Finally, there are numerous other sort algorithms which are more efficient in order of magnitude, e.g., order n log n
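The difference in factor can be observed by counting comparisons. The Java sketch below instruments both sorts; the class name and the counting convention (one count per comparison between two array values) are assumptions made for illustration.

```java
class SortComparisons {

    // Selection sort; returns the number of element comparisons performed.
    static long selectionSortComparisons(int[] numbers) {
        long comparisons = 0;
        for (int index = 0; index < numbers.length - 1; index++) {
            int min = index;
            for (int scan = index + 1; scan < numbers.length; scan++) {
                comparisons++;
                if (numbers[scan] < numbers[min]) {
                    min = scan;
                }
            }
            int temp = numbers[min];
            numbers[min] = numbers[index];
            numbers[index] = temp;
        }
        return comparisons;
    }

    // Insertion sort; returns the number of element comparisons performed.
    static long insertionSortComparisons(int[] numbers) {
        long comparisons = 0;
        for (int index = 1; index < numbers.length; index++) {
            int key = numbers[index];
            int position = index;
            while (position > 0) {
                comparisons++;
                if (numbers[position - 1] <= key) {
                    break; // key is already in its place
                }
                numbers[position] = numbers[position - 1];
                position--;
            }
            numbers[position] = key;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] data = {5, 2, 9, 1, 7, 3};
        System.out.println(selectionSortComparisons(data.clone()));
        System.out.println(insertionSortComparisons(data.clone()));
    }
}
```

Selection sort always performs n(n-1)/2 comparisons, while insertion sort performs that many only on reverse-sorted input and as few as n-1 on input that is already sorted.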

45
Example: The Sorted Array Sum Problem

Input: sorted array A of n numbers, and a number S
Output: are there two numbers in the array whose sum is S?
Algorithm 1: for each pair of numbers, check if their sum is S
Complexity 1: n(n-1)/2 pairs, quadratic complexity
Algorithm 2: for each A[i], binary search for S - A[i]
Complexity 2: n log n
Algorithm 3: left and right pointers
If A[left] + A[right] = S, finish
If A[left] + A[right] < S, left++
If A[left] + A[right] > S, right--
Complexity 3: linear!
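Algorithm 3 can be sketched in Java as below. The class and method names are illustrative, and the sketch assumes the two numbers must sit at different positions in the array.

```java
class SortedArraySum {

    // Two-pointer scan over a sorted array: each step moves one pointer,
    // so at most n - 1 steps are needed.
    static boolean hasPairWithSum(int[] a, int s) {
        int left = 0;
        int right = a.length - 1;
        while (left < right) {
            long sum = (long) a[left] + a[right]; // long avoids int overflow
            if (sum == s) {
                return true;
            } else if (sum < s) {
                left++;  // need a larger sum
            } else {
                right--; // need a smaller sum
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 4, 7, 11};
        System.out.println(hasPairWithSum(a, 14));
        System.out.println(hasPairWithSum(a, 6));
    }
}
```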

46
Why Bother with complexity?

Computers today are very fast, and perform
millions of operations per second
Nevertheless, improvement in order of magnitude
can reduce computation duration by seconds, hours
and even days
Moreover, the following appears to be true: for some problems, the only known algorithms take so many steps that even the fastest computers today, and any that will ever exist, are unable to solve the problem
Example: the traveling salesperson problem (TSP)

47
The Traveling Salesman Problem

Problem: find the shortest path which starts at some city and traverses all other cities

[Figure: graph of cities with edge lengths 6, 8, 11, 5, 13, 8, 6, 3, 7, 4, 11]
48
Brute Force Solution to TSP

Algorithm
For each possible path, find its length
Choose the path with minimum length
Number of possible paths
At most (n-1) * (n-2) * ... * 1 = (n-1)! ((n-1) factorial)
Complexity of algorithm: n * (n-1)! = n! = O(n!)
How long will it take to go over O(n!) paths for growing input size n?
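A brute-force search of this kind can be sketched in Java as follows. The sketch assumes the path starts at city 0, visits every other city exactly once and does not return to the start, and that dist[i][j] holds the distance between cities i and j; these conventions and all names are assumptions for illustration.

```java
class BruteForceTsp {

    // Length of the shortest path that starts at city 0 and visits all cities.
    static int shortestPath(int[][] dist) {
        boolean[] visited = new boolean[dist.length];
        visited[0] = true;
        return extend(dist, visited, 0, 1, 0);
    }

    // Tries every unvisited city as the next stop: (n-1)! orderings in total.
    private static int extend(int[][] dist, boolean[] visited,
                              int current, int visitedCount, int lengthSoFar) {
        if (visitedCount == dist.length) {
            return lengthSoFar; // a complete path
        }
        int best = Integer.MAX_VALUE;
        for (int next = 0; next < dist.length; next++) {
            if (!visited[next]) {
                visited[next] = true;
                int length = extend(dist, visited, next, visitedCount + 1,
                        lengthSoFar + dist[current][next]);
                best = Math.min(best, length);
                visited[next] = false; // backtrack
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] dist = {
            {0, 1, 4, 6},
            {1, 0, 2, 5},
            {4, 2, 0, 3},
            {6, 5, 3, 0}
        };
        System.out.println(shortestPath(dist));
    }
}
```

With 4 cities only 3! = 6 orderings are examined, but each added city multiplies the work by the number of cities already present.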

54
TSP Computing Times for Different Input Sizes
Suppose our computer computes a million paths per second

# of cities   # of paths                     computing time
6             120                            0.12 milliseconds
11            3,628,800                      3.6 seconds
13            479,001,600                    8 minutes
16            1,307,674,368,000              15 days
18            356,000,000,000,000            11 years
21            2,430,000,000,000,000,000      77,000 years!
55
TSP - an Intractable Problem

TSP cannot be solved this way for reasonable input sizes
The complexity of our algorithm for TSP, O(n!), grows even faster than O(2^n): it is at least exponential
An exponential running time function implies that the problem cannot be practically solved this way (only for a carefully selected small set of inputs)

56
Effect of Improved Technology
Size of Largest Problem Instance Solvable in 1 hour

Complexity                        n       n^2     n^3     n^5     2^n        3^n
With present computer             N1      N2      N3      N4      N5         N6
With computer 100 times faster    100N1   10N2    4.64N3  2.5N4   N5 + 6.64  N6 + 4.19
With computer 1000 times faster   1000N1  31.6N2  10N3    3.98N4  N5 + 9.97  N6 + 6.29
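The table entries follow from two short formulas, sketched here in Java with illustrative names. For running time n^k, a computer that is faster by a factor f multiplies the largest solvable size by f^(1/k); for running time b^n, it only adds log_b(f) to it.

```java
class FasterComputer {

    // Polynomial time n^k: the solvable size is multiplied by speedup^(1/k).
    static double polynomialFactor(int k, double speedup) {
        return Math.pow(speedup, 1.0 / k);
    }

    // Exponential time base^n: the solvable size grows by log_base(speedup).
    static double exponentialGain(double base, double speedup) {
        return Math.log(speedup) / Math.log(base);
    }

    public static void main(String[] args) {
        System.out.printf("n^3, 100x faster: size x %.2f%n", polynomialFactor(3, 100));
        System.out.printf("2^n, 100x faster: size + %.2f%n", exponentialGain(2, 100));
        System.out.printf("3^n, 1000x faster: size + %.2f%n", exponentialGain(3, 1000));
    }
}
```

For example, a cubic algorithm on a computer 100 times faster handles inputs about 4.64 times larger, while a 2^n algorithm gains fewer than 7 extra input elements.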
57
TSP - A Member of a Large Family

It may seem that TSP is just one problem
However, there is a whole set of problems, called NP problems, from a large variety of areas, which are very similar to TSP
Those problems are the focus of much CS research, and yet no efficient (polynomial) algorithm has been found
Although it has not been proven, it is strongly believed that there is no efficient algorithm for the hardest NP problems (this is the famous P vs. NP problem)

58
The NP-Complete Class

Many of the NP problems are complete, in the
sense that if an efficient solution to any one of
them is found, then all other NP problems can be
solved efficiently
This is true since
all the problems in the NP class were reduced to
a single NPC problem
this problem was reduced to many other NP
problems, each of which is therefore also NPC
A reduction from A to B means that given an
efficient algorithm that solves B, we can find an
efficient algorithm that solves A

59
Example of a Reduction Tree
SAT: special problem; if it is solvable then any NP problem is solvable
SAT is reduced to another problem
If we find a solution to any of the red problems, then we can find a solution to SAT (backtrack), and all NP problems are solvable
60
The Sorted Array Sum Revisited

Input: sorted array A of n numbers, and a number S
Output: is there a group of numbers in the array whose sum is S?
Possible solution: for each possible group of numbers, find out if its sum is S
Complexity: the number of groups is 2^n, therefore complexity is exponential
This problem is known to be NP-Complete!
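The exhaustive solution can be sketched in Java by letting the bits of a counter select which elements belong to the group. The sketch assumes the group must be non-empty and that the array has fewer than 63 elements so the counter fits in a long; names are illustrative.

```java
class SubsetSum {

    // Tries every non-empty group of elements: 2^n - 1 candidates.
    static boolean hasSubsetWithSum(int[] a, long s) {
        int n = a.length; // assumption: n < 63
        for (long mask = 1; mask < (1L << n); mask++) {
            long sum = 0;
            for (int i = 0; i < n; i++) {
                if ((mask & (1L << i)) != 0) {
                    sum += a[i]; // element i belongs to this group
                }
            }
            if (sum == s) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] a = {3, 5, 8, 13};
        System.out.println(hasSubsetWithSum(a, 16));
        System.out.println(hasSubsetWithSum(a, 2));
    }
}
```

Each extra element doubles the number of groups to examine, which is the exponential growth the slide refers to.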

61
Examples of NP-Complete Problems

Knapsack
Input: a set of elements U with weights; a number B
Problem: find a subset of U with maximum weight such that the sum of weights ≤ B
Minimum Set Cover
Input: a set of tasks to perform; a group of people, each of whom is able to perform a subset of the tasks
Problem: find a minimal-sized subgroup of people who can perform all the tasks

62
More NPC Problems

Graph Coloring
For a long time map makers believed that, with careful planning, any map could be colored with at most four colors; many mathematicians tried to prove this, but only in 1976, with the aid of a computer, was it shown to be true
There is no known polynomial time algorithm to color a graph with the minimum number of colors
Minimum Bin Packing (disk storage)
Input: k files of sizes s1, ..., sk; disk capacity M
Problem: find a partition of the files to disks such that each disk stores at most M bytes, using the minimal number of disks

63
The Good News About NPC Problems

Although there is no efficient algorithm known that can solve NPC problems, there are other approaches
Approximation: some problems have efficient algorithms which approximate the solution, i.e., find a solution which is within a factor of the optimal one
Randomization: some problems have efficient algorithms which use coin flips, and find a good solution with high probability
Average case: some NP problems are not so hard on average; this requires statistical approaches