Title: Unit 12: Theory of Computation
Syllabus
- Algorithm design: the limits of algorithms (some problems are unsolvable)
- Algorithm efficiency: how do we measure the efficiency of an algorithm?
- Improvement by factor and by order of magnitude
- Some examples of complexity analysis
- Intractable problems
[Figure: course map: basic programming concepts, object-oriented programming, topics in computer science]
Theory of Computation Questions
- Computability: are there algorithms which can solve our problem? Is there something we can say about every algorithm which solves the problem?
- Complexity: how good is an algorithm which solves the problem?
  - Is it efficient in terms of processing steps (time)?
  - Is it efficient in terms of storage space (memory)?
  - How do we compare the efficiency of algorithms?
- Verification: given an algorithm that solves the problem, how can we be sure that the algorithm is correct?
1. Computability
- Can computers become powerful enough to enable us to solve any problem? Is it just a matter of waiting, or is there a limit in principle?
- Answer: there are problems which cannot be solved by any computer!
- This question was studied by mathematicians in the early 20th century, leading to one famous counterexample: the Halting Problem (Alan Turing, 1937)
The Halting Problem: Assumption
- Problem: given a program P and input x, does the program P halt on the input x?
- Assumption: this problem is computable
  - there is an algorithm which always returns a yes/no answer
  - there exists a method
      boolean doesHalt(P, x)
    that returns true if P halts on the specified input x, and false if P does not halt on the specified input x
- Goal: find a contradiction
Method doesHalt
- boolean doesHalt(String P, String x)
  // implements an algorithm which determines whether program P halts on input x
  - read the program P (which is just a text file)
  - read the input x
  - run the algorithm
  - return true if P halts on the specified input x
  - return false if P does not halt on input x
The Halting Problem: Setup
- Define a new method:
      testHalt(String P)
          if (doesHalt(P, P))
              loop forever
          else
              print "halt"
- testHalt(P) does the opposite of what doesHalt(P, P) predicts about P running on itself
The Logical Catch
- What happens if we run testHalt and give it testHalt itself as input?
      testHalt(testHalt)
The Halting Problem Paradox
- Suppose testHalt(testHalt) terminates and prints "halt"
  ⇒ doesHalt(testHalt, testHalt) returned false
  ⇒ testHalt(testHalt) does not terminate: a contradiction
- Suppose testHalt(testHalt) loops forever
  ⇒ doesHalt(testHalt, testHalt) returned true
  ⇒ testHalt(testHalt) terminates: a contradiction
- Conclusion: method testHalt() cannot exist
  - since testHalt is built directly from doesHalt, doesHalt cannot exist either, so our assumption is wrong
  - we say that the Halting Problem is undecidable
Decidability: the Bright Side
- We have already seen that
- Many problems can be solved algorithmically
- There may be more than one way to solve a
particular problem
Models of Computation
- Ideal computer model: simple to analyze, yet as powerful as a real computer
- Necessary features of a computing model:
  - accepts input
  - stores and retrieves information (memory)
  - takes actions depending on its internal state and the input
  - produces output
Conceptual Model: the Turing Machine
- Information representation:
  - an alphabet, e.g., containing b (blank), 0, 1, x, y, ...
- a finite set of states
- an infinite tape divided into cells, serving as
  - memory
  - input/output
  - each cell contains one symbol from the alphabet, with a finite number of non-blank symbols
- a read/write head
Turing Machine Programs
- Action: (s, a) → (s', a', ±1)
- Interpretation: for the current state s and input symbol a
  - write a new symbol a'
  - go into the new state s'
  - move one cell left (-1) or right (+1)
- Such a collection of instructions is called a Turing-machine program (and is a model for an algorithm)
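To make the (s, a) → (s', a', ±1) rule concrete, here is a minimal simulator sketch. It is only an illustration. The class and method names, the start state name "start", the use of 'b' as the blank symbol, and the maxSteps safety limit are all assumptions. The machine halts when no instruction matches the current (state, symbol) pair.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative Turing machine simulator; class and method names are hypothetical.
final class TuringMachine {
    static final char BLANK = 'b'; // assumed blank symbol

    // Right-hand side of one instruction: (s', a', move), move = -1 (left) or +1 (right).
    private static final class Action {
        final String newState;
        final char newSymbol;
        final int move;

        Action(String newState, char newSymbol, int move) {
            this.newState = newState;
            this.newSymbol = newSymbol;
            this.move = move;
        }
    }

    private final Map<String, Action> program = new HashMap<>();

    // Adds the instruction (s, a) -> (s', a', move).
    void addInstruction(String s, char a, String newS, char newA, int move) {
        program.put(s + "," + a, new Action(newS, newA, move));
    }

    // Runs from state "start" with the head on cell 0. Halts when no instruction
    // matches (state, symbol), or after maxSteps steps as a safety limit.
    String run(String input, int maxSteps) {
        Map<Integer, Character> tape = new HashMap<>(); // unbounded tape; missing cells are blank
        for (int i = 0; i < input.length(); i++) {
            tape.put(i, input.charAt(i));
        }
        String state = "start";
        int head = 0;
        for (int step = 0; step < maxSteps; step++) {
            char symbol = tape.getOrDefault(head, BLANK);
            Action act = program.get(state + "," + symbol);
            if (act == null) {
                break;                     // no matching instruction: halt
            }
            tape.put(head, act.newSymbol); // write a'
            state = act.newState;          // go into state s'
            head += act.move;              // move one cell left or right
        }
        // Read back the cells from 0 up to the first blank.
        StringBuilder out = new StringBuilder();
        for (int i = 0; tape.getOrDefault(i, BLANK) != BLANK; i++) {
            out.append(tape.get(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TuringMachine tm = new TuringMachine();
        tm.addInstruction("start", '0', "start", '1', +1); // flip 0 to 1, move right
        tm.addInstruction("start", '1', "start", '0', +1); // flip 1 to 0, move right
        System.out.println(tm.run("0110", 1000));          // prints 1001
    }
}
```

The example program has one state and two instructions. It flips every bit while moving right and halts on the first blank cell.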
2. Complexity: Time Efficiency
- How do we measure time efficiency?
- Assume we have a problem P, with two algorithms A1 and A2 that solve it
- Suppose that the algorithms were implemented on a computer, and their running times were measured:
  - Algorithm A1: 1.25 seconds
  - Algorithm A2: 0.34 seconds
- May we conclude that algorithm A2 is better? Probably not!
Time Efficiency: Questions We Must Ask
- Were the algorithms tested on the same computer?
- Which computer did we use? Is there a preferred benchmark computer to test the algorithms?
- What were the inputs given to the algorithms? Were the inputs equal? Of equal size?
- Is there a better way of measuring time efficiency, independent of a particular computer?
Operations per Input Size
- Measure the amount of work as a function of the size of the input given to the algorithm
  - In an array sorting algorithm: the number of cells to sort
  - In an algorithm for finding a word in a text: the number of characters, or the number of words
Measuring Efficiency
- Measure: the number of steps the algorithm performs for every input size (as a function of the input size)
- Definition of a step: anything that takes approximately constant time to run (i.e., its running time does not depend on the input size)
Algorithmic Steps: Examples
- In a sort algorithm
  - switch two adjacent cells
- In a search algorithm
  - read the content of the next cell (or stop)
  - find out if this is the element we're looking for
- In a numeric algorithm for multiplying two numbers
  - multiply 2 digits / add 2 digits
- These steps take constant time to perform, which does not depend on the size of the input (the length of the list, or the number of digits in the number)
Advantages of the Suggested Measure
- It does not depend on a particular computer
- To figure out the running time on a particular computer, we
  - estimate how long it takes to perform a basic step on that computer
  - multiply by the number of steps calculated for a specific input size
Example: Character Search
- Problem: find out if the character c is found in a given text
- Solution 1:
      found ← false
      while (more characters to read and found = false)
          read the next character in the text
          if this character is c, found ← true
      if (found = false) print("not found")
      else print("found")
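Solution 1 can be sketched in Java as follows. The class and method names are hypothetical. Each loop iteration performs the three operations of one basic step. The result is decided by the found flag, because "end of text reached" is also true when c is the last character of the text.

```java
// Sketch of Solution 1 (serial search for a character); names are hypothetical.
final class CharSearch {
    // Returns true if c occurs in text.
    static boolean contains(String text, char c) {
        boolean found = false;
        int i = 0;
        // One basic step per iteration: test for end of text, read the next character, compare it with c.
        while (i < text.length() && !found) {
            char next = text.charAt(i);
            i++;
            if (next == c) {
                found = true;
            }
        }
        return found; // decide by the flag, not by "end of text reached"
    }

    public static void main(String[] args) {
        System.out.println(contains("computation", 't')); // prints true
        System.out.println(contains("computation", 'z')); // prints false
    }
}
```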
Solution 1: Time Analysis
- Input size?
  - n: the number of characters in the text
- What is a basic step? (3 operations)
  - find out if the end of the text has been reached
  - read the next character in the text
  - test if the character is c
- What is the running time as a function of the input size n?
  - in the worst case, no more than n basic steps, plus 2 operations before and after the loop
  - T(n) ≤ 3n + 2
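Character Search: Simple Improvement
- Solution 2 (add c to the end of the text as a sentinel, so the loop needs no end-of-text test):
      found ← false
      add c to the end of the text
      while (found = false)
          read the next character in the text
          if this character is c, found ← true
      if (the c found is the one added at the end) print("not found")
      else print("found")
      remove c from the end of the text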
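Below is a Java sketch of the sentinel idea in Solution 2. It makes an assumption the slide leaves open: "add c to the end of the text" is modelled as writing c into a spare cell at index n of a char array whose first n cells hold the text. The cell is restored afterwards. The names are hypothetical.

```java
// Sketch of Solution 2 (sentinel search); names and the spare-cell layout are assumptions.
final class SentinelSearch {
    // Searches buf[0..n-1] for c. Requires buf.length > n: cell n is used
    // temporarily for the sentinel and restored before returning.
    static boolean contains(char[] buf, int n, char c) {
        char saved = buf[n];
        buf[n] = c;              // add c to the end of the text
        int i = 0;
        while (buf[i] != c) {    // no end-of-text test: the sentinel guarantees the loop stops
            i++;
        }
        buf[n] = saved;          // remove c from the end of the text
        return i < n;            // stopped before the sentinel cell => a real occurrence
    }

    public static void main(String[] args) {
        char[] buf = {'h', 'a', 'l', 't', ' '}; // 4 text characters + 1 spare cell
        System.out.println(contains(buf, 4, 'l')); // prints true
        System.out.println(contains(buf, 4, 'z')); // prints false
    }
}
```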
Solution 2: Time Analysis
- The basic step is different (2 operations):
  - read the next character in the text
  - test if the character is c
- In the worst case, the running time of Solution 2 is
  - T(n) ≤ 2n + 4
- Consequences:
  - we shortened the time it takes to perform the basic step, but
  - we added a constant to the overall running time
- Question: are we better off?
Running Time Tables
Input size n      | 1    | 3   | 5    | 10   | 100  | 1000 | 30000 | 3000000
3n + 2            | 5    | 11  | 17   | 32   | 302  | 3002 | 90002 | 9000002
2n + 4            | 6    | 10  | 14   | 24   | 204  | 2004 | 60004 | 6000004
Improvement ratio | 0.83 | 1.1 | 1.21 | 1.33 | 1.48 | 1.5  | 1.5   | 1.5
Improvement by factor: as n grows, the ratio between the running times of the two solutions converges to a constant.
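The table rows can be recomputed directly from the two worst-case formulas. This is a small sketch with hypothetical names. It shows the ratio (3n + 2)/(2n + 4) approaching 3/2 as n grows.

```java
// Recomputes the running-time table from T1(n) = 3n + 2 and T2(n) = 2n + 4; names are hypothetical.
final class FactorImprovement {
    static long t1(long n) { return 3 * n + 2; } // Solution 1, worst case
    static long t2(long n) { return 2 * n + 4; } // Solution 2, worst case

    // Improvement ratio T1(n) / T2(n); tends to 3/2 for large n.
    static double ratio(long n) { return (double) t1(n) / t2(n); }

    public static void main(String[] args) {
        long[] sizes = {1, 3, 5, 10, 100, 1000, 30000, 3000000};
        for (long n : sizes) {
            System.out.printf("n=%d  3n+2=%d  2n+4=%d  ratio=%.2f%n", n, t1(n), t2(n), ratio(n));
        }
    }
}
```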
Best, Average and Worst Cases
- We analyzed the worst case, in which the character c is not in the text
- Other possibilities: the average case and the best case
- What is the advantage of measuring the worst case?
- The average case is a good measure, but it characterizes only the overall performance over many inputs
  - computing the average case is quite complex
- What information does best-case analysis give us?
Finding a Phone Number in a Phonebook
- Problem: find if a number x appears in a sorted array of numbers (e.g., a phonebook)
- We can use the algorithms we developed for character search (both are variants of the serial search method)
- However, the assumption that the array is sorted can be used in a clever way
Binary Search
- Basic idea: cut out half of the search space in every step
- The basic step in binary search:
  - divide the remaining search space into 2 halves
  - find out which half contains the number we're looking for, and call it the remaining search space
  - check the termination condition: the number is found at the mid-point, or the remaining search space is of size 1
- The basic step in serial search:
  - calculate the next cell to look at (index ← index + 1)
  - find out if this cell contains the number we're looking for
  - check the termination condition: the number is found, or the end of the array is reached
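Here is a minimal iterative binary search sketch in Java with hypothetical names. It uses a common variant of the termination condition described above: it stops when the number is found at the mid-point or when the remaining search space becomes empty, rather than when it reaches size 1.

```java
// Iterative binary search over a sorted int array; names are hypothetical.
final class BinarySearch {
    // Returns the index of x in the sorted array a, or -1 if x is absent.
    static int search(int[] a, int x) {
        int low = 0, high = a.length - 1;      // remaining search space: a[low..high]
        while (low <= high) {
            int mid = low + (high - low) / 2;  // mid-point (avoids int overflow)
            if (a[mid] == x) {
                return mid;                    // found at the mid-point
            }
            if (a[mid] < x) {
                low = mid + 1;                 // x can only be in the right half
            } else {
                high = mid - 1;                // x can only be in the left half
            }
        }
        return -1;                             // search space is empty: not found
    }

    public static void main(String[] args) {
        int[] phonebook = {1203, 2450, 3377, 4821, 5090, 6612, 7745};
        System.out.println(search(phonebook, 4821)); // prints 3
        System.out.println(search(phonebook, 5000)); // prints -1
    }
}
```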
Search Efficiency Analysis
- Suppose that the search array has 1000 cells
  - Binary search: in the worst case we inspect the mid-points of ranges of size 1000, 500, 250, 125, 63, 32, 16, 8, 4, 2, a total of 10 steps
  - Serial search: 1,000 steps
- How many steps are needed in the general case, for an array of n cells?
- With a million cells:
  - Binary search: 20 steps in the worst case
  - Serial search: 1,000,000 steps
Binary vs. Serial: Number of Steps
Input size        | 10  | 100 | 1000 | 10000 | 100000 | 1000000
Serial            | 10  | 100 | 1000 | 10000 | 100000 | 1000000
Binary            | 4   | 7   | 10   | 14    | 17     | 20
Improvement ratio | 2.5 | 14  | 100  | 714   | 5882   | 50000
- The improvement ratio grows as the input size grows
- This is called improvement by order of magnitude
- In contrast, with improvement by factor, the improvement ratio reached a constant plateau
What About the Cost of a Basic Step?
- When we dealt with improvement by factor, the duration of a basic step was very important: the improvement was the ratio between the durations of the basic steps
- Is it important now?
- For example, assume that a single step in serial search takes 1 time unit, and that a single step in binary search takes 1000 time units. Would there still be an improvement?
Binary vs. Serial: Different Step Duration
Input size        | 10     | 100   | 1000  | 10000 | 100000 | 1000000 | 10000000 | 100000000
Serial            | 10     | 100   | 1000  | 10000 | 100000 | 1000000 | 10000000 | 100000000
Binary            | 4000   | 7000  | 10000 | 14000 | 17000  | 20000   | 24000    | 27000
Improvement ratio | 0.0025 | 0.014 | 0.1   | 0.714 | 5.88   | 50      | 417      | 3,704
Duration of a Basic Step Is Negligible
- Even with an unfavorable basic step duration ratio of 1000/1:
  - for small input sizes (up to 10,000 in the table), serial search wins
  - for larger input sizes, binary search wins
- The reason:
  - the ratio between the durations of the basic steps is constant
  - the ratio between the numbers of basic steps grows as the input size grows
- Consequence: as the input size grows, the dominant factor is the number of basic steps, not their duration
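The exact crossover point can be found by direct search. This sketch rests on two assumptions taken from the tables: binary search takes ceil(log2 n) steps in the worst case, and each of its steps costs stepCost time units while a serial step costs 1. The names are hypothetical.

```java
// Finds where binary search (stepCost units per step) starts beating serial search (1 unit per step).
final class Crossover {
    // Worst-case binary search steps under the tables' model: ceil(log2 n), for n >= 1.
    static int binarySteps(int n) {
        return n <= 1 ? 0 : 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    // Smallest n >= 2 for which stepCost * ceil(log2 n) < n, i.e., binary search is strictly faster.
    static int crossover(int stepCost) {
        int n = 2;
        while ((long) stepCost * binarySteps(n) >= n) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(crossover(1000)); // prints 14001
    }
}
```

With a 1000/1 step-cost ratio the crossover lies at n = 14,001, which is consistent with the table: serial search still wins at 10,000 and binary search wins at 100,000.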
Complexity of Algorithms
- We saw two basic kinds of improvement in the running time of an algorithm:
  - by factor
  - by order of magnitude
- For large inputs, the latter improvement is much more significant, canceling any increase in the cost of a basic step
- This is why, when comparing two running time functions, we pay attention only to their dominant term, i.e., their order of magnitude
Linear Order
- In serial search, any running time function will be of the form f(n) = an + b, a linear function
- We say that the complexity of the algorithm is linear
- Linear order is denoted by f(n) = O(n); this is called the Big-O notation
- Note that the ratio between any two linear functions is constant for large enough n, approaching the ratio between the durations of the basic steps
Complexity: Order of Magnitude
- In general, two functions are of the same order if the ratio between their values approaches a positive constant for large enough n
- Example: all these functions are of quadratic order:
  - n^2, 5n^2 + 6, 5n^2 + 100n - 90, 5000n^2, n^2/6
- Hierarchy of orders of magnitude:
  - O(log n): logarithmic
  - O(n): linear
  - O(n^2): quadratic
  - O(n^k) (k > 2): polynomial
  - O(2^n): exponential
Order of Magnitude: Neglecting Minor Elements
- When we compare functions, we mostly pay attention to the largest order of magnitude
- Example: suppose we have two algorithms A1 and A2 whose running times are 100n and n^2/100
  - for n > 10000, n^2/100 > 100n
  - we prefer A2 if the input size is less than 10000, and prefer A1 otherwise
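The threshold of 10,000 follows from a one-line calculation, shown here for n > 0. At the threshold both algorithms take exactly 10^6 steps.

```latex
\begin{aligned}
\frac{n^2}{100} > 100n \;&\iff\; n^2 > 10^4\, n \;\iff\; n > 10^4 \qquad (n > 0)\\
\text{at } n = 10^4:\quad \frac{(10^4)^2}{100} &= 100 \cdot 10^4 = 10^6
\end{aligned}
```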
Example: Prime Test
- Problem: determine if a number n is prime
- First attempt:
  - check whether any of 2..n/2 divides n
  - complexity: proportional to n, i.e., O(n)
- Second attempt:
  - check 2, and then only odd divisors (an odd n has no even divisors)
  - complexity: about half as many checks, still O(n)
- Third attempt:
  - check 2, and then only odd divisors up to sqrt(n) (if n = a·b with a ≤ b, then a ≤ sqrt(n))
  - complexity: O(sqrt(n))
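The third attempt can be sketched in Java as shown below. The names are hypothetical. The test d * d <= n replaces computing sqrt(n) and avoids floating-point rounding. It assumes n is well below Long.MAX_VALUE, so that d * d cannot overflow.

```java
// Third attempt: trial division by 2, then by odd d while d*d <= n; names are hypothetical.
final class PrimeTest {
    static boolean isPrime(long n) {
        if (n < 2) {
            return false;
        }
        if (n % 2 == 0) {
            return n == 2;                     // the only even prime is 2
        }
        for (long d = 3; d * d <= n; d += 2) { // odd divisors up to sqrt(n)
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPrime(97)); // prints true
        System.out.println(isPrime(91)); // prints false (91 = 7 * 13)
    }
}
```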
Example: Two-Letter Occurrences
- Problem: for a given text input, find the most frequent adjacent two-letter pair in the text
- First attempt:
  - for every pair that appears in the text, count how many times this pair appears in the text, and find the maximum
  - complexity: (n-1)(n-1) = n^2 - 2n + 1, i.e., O(n^2)
- Second attempt:
  - use a two-dimensional 26x26 array of counters
  - complexity: (n-1) + 2·26·26, i.e., O(n)
- Tradeoff: added storage complexity, reduced time complexity
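Here is a sketch of the second attempt with hypothetical names. It assumes the text is treated case-insensitively and that only the English letters a-z count, so any pair involving another character is skipped. It makes one pass over the n - 1 adjacent pairs and then scans the fixed 26x26 table.

```java
import java.util.Locale;

// Second attempt: count adjacent letter pairs in a 26x26 table; names are hypothetical.
final class LetterPairs {
    // Returns the most frequent adjacent pair of letters, or null if the text has none.
    static String mostFrequentPair(String text) {
        int[][] count = new int[26][26];             // one counter per ordered letter pair
        String s = text.toLowerCase(Locale.ROOT);
        for (int i = 0; i + 1 < s.length(); i++) {   // one pass over the n-1 adjacent pairs
            char a = s.charAt(i), b = s.charAt(i + 1);
            if (a >= 'a' && a <= 'z' && b >= 'a' && b <= 'z') {
                count[a - 'a'][b - 'a']++;
            }
        }
        int best = 0;
        String bestPair = null;
        for (int x = 0; x < 26; x++) {               // scan the table: 26*26 steps, independent of n
            for (int y = 0; y < 26; y++) {
                if (count[x][y] > best) {            // ties keep the pair found first
                    best = count[x][y];
                    bestPair = "" + (char) ('a' + x) + (char) ('a' + y);
                }
            }
        }
        return bestPair;
    }
}
```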
Example: Ternary Search
- Split the search space into three parts
- Is it an improvement by order of magnitude? By factor?
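To experiment with the question, here is a ternary search sketch over a sorted array. The names are hypothetical. Each round compares x with up to two split points and keeps one of the three parts, so roughly one third of the search space survives each round.

```java
// Ternary search over a sorted int array; names are hypothetical.
final class TernarySearch {
    // Returns the index of x in the sorted array a, or -1 if x is absent.
    static int search(int[] a, int x) {
        int low = 0, high = a.length - 1;         // remaining search space: a[low..high]
        while (low <= high) {
            int third = (high - low) / 3;
            int m1 = low + third, m2 = high - third; // two split points
            if (a[m1] == x) {
                return m1;
            }
            if (a[m2] == x) {
                return m2;
            }
            if (x < a[m1]) {
                high = m1 - 1;                    // keep the left part
            } else if (x > a[m2]) {
                low = m2 + 1;                     // keep the right part
            } else {
                low = m1 + 1;                     // keep the middle part
                high = m2 - 1;
            }
        }
        return -1;
    }
}
```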
Example: Sorting
- Sorting is the process of arranging a list of
items into a particular order - There must be some value on which the order is
based - There are many algorithms for sorting a list of
items, which vary in efficiency - We will examine two specific algorithms
- Selection Sort
- Insertion Sort
Selection Sort
- The approach of Selection Sort:
  - select one value and put it in its final place in the sorted list
  - repeat for all other values
- In more detail:
  - find the smallest value in the list
  - switch it with the value in the first position
  - find the next smallest value in the list
  - switch it with the value in the second position
  - repeat until all values are placed
    public static void selectionSort(int[] numbers) {
        int min, temp;
        for (int index = 0; index < numbers.length - 1; index++) {
            min = index;
            // find the smallest value in numbers[index..]
            for (int scan = index + 1; scan < numbers.length; scan++) {
                if (numbers[scan] < numbers[min]) {
                    min = scan;
                }
            }
            // Swap the values
            temp = numbers[min];
            numbers[min] = numbers[index];
            numbers[index] = temp;
        }
    }
Insertion Sort
- The approach of Insertion Sort:
  - pick any item and insert it into its proper place in a sorted sublist
  - repeat until all items have been inserted
- In more detail:
  - consider the first item to be a sorted sublist (of one item)
  - insert the second item into the sorted sublist, shifting items as necessary to make room for the new addition
  - insert the third item into the sorted sublist (of two items), shifting as necessary
  - repeat until all values are inserted into their proper positions
    public static void insertionSort(int[] numbers) {
        for (int index = 1; index < numbers.length; index++) {
            int key = numbers[index];
            int position = index;
            // shift larger values to the right
            while (position > 0 && numbers[position - 1] > key) {
                numbers[position] = numbers[position - 1];
                position--;
            }
            numbers[position] = key;
        }
    }
Comparing Sorts
- Selection sort and insertion sort are similar in efficiency: they have the same order of magnitude
- Both have outer loops that scan all elements, and inner loops that compare the value of the outer loop with almost all values in the list
- Therefore on the order of n^2 comparisons are made to sort a list of size n (n(n-1)/2 for selection sort)
- We therefore say that these sorts are of order n^2
- Still, there is a difference by factor in average time: on average, the inner loop of insertion sort inspects only half of the elements
- Finally, there are numerous other sorting algorithms which are more efficient by order of magnitude, e.g., of order n log n
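To see the factor difference between the two sorts, here is a sketch that instruments both with a comparison counter. The names are hypothetical. The insertion sort loop is restructured so that every element comparison is counted, but it behaves the same way as the insertionSort method above.

```java
// Selection and insertion sort instrumented to count element comparisons; names are hypothetical.
final class SortComparisons {
    // Selection sort; always performs n(n-1)/2 comparisons.
    static long selectionSort(int[] a) {
        long comparisons = 0;
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                comparisons++;
                if (a[j] < a[min]) {
                    min = j;
                }
            }
            int t = a[min];
            a[min] = a[i];
            a[i] = t;
        }
        return comparisons;
    }

    // Insertion sort; between n-1 (sorted input) and n(n-1)/2 (reverse-sorted input) comparisons.
    static long insertionSort(int[] a) {
        long comparisons = 0;
        for (int i = 1; i < a.length; i++) {
            int key = a[i], pos = i;
            while (pos > 0) {
                comparisons++;
                if (a[pos - 1] <= key) {
                    break;                 // key's place is found
                }
                a[pos] = a[pos - 1];       // shift the larger value right
                pos--;
            }
            a[pos] = key;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println(selectionSort(new int[] {1, 2, 3, 4, 5})); // prints 10
        System.out.println(insertionSort(new int[] {1, 2, 3, 4, 5})); // prints 4
        System.out.println(insertionSort(new int[] {5, 4, 3, 2, 1})); // prints 10
    }
}
```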
Example: The Sorted Array Sum Problem
- Input: a sorted array A of n numbers, and a number S
- Output: are there two numbers in the array whose sum is S?
- Algorithm 1: for each pair of numbers, check if their sum is S
  - Complexity 1: n(n-1)/2 pairs, quadratic complexity
- Algorithm 2: for each A[i], binary search for S - A[i]
  - Complexity 2: n log n
- Algorithm 3: left and right pointers, starting at the two ends of the array, repeated until they meet:
  - if A[left] + A[right] = S, finish
  - if A[left] + A[right] < S, left++
  - if A[left] + A[right] > S, right--
  - Complexity 3: linear!
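Algorithm 3 can be sketched in Java as follows. The names are hypothetical. The sum is computed as a long to avoid int overflow. Every iteration moves one pointer inward, so there are at most n - 1 iterations.

```java
// Algorithm 3: two pointers over a sorted array; names are hypothetical.
final class PairSum {
    // Are there two elements at different positions of the sorted array a whose sum is s?
    static boolean hasPairWithSum(int[] a, long s) {
        int left = 0, right = a.length - 1;
        while (left < right) {
            long sum = (long) a[left] + a[right];
            if (sum == s) {
                return true;
            }
            if (sum < s) {
                left++;        // need a larger sum
            } else {
                right--;       // need a smaller sum
            }
        }
        return false;          // pointers met: no such pair
    }

    public static void main(String[] args) {
        System.out.println(hasPairWithSum(new int[] {1, 3, 4, 6, 9}, 10)); // prints true
    }
}
```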
Why Bother with Complexity?
- Computers today are very fast, and perform millions of operations per second
- Nevertheless, an improvement by order of magnitude can reduce computation time by seconds, hours and even days
- Moreover, the following appears to be true: for some problems, the only known algorithms take so many steps that even the fastest computers today, and any that will ever exist, are unable to solve the problem (except for small inputs)
- Example: the travelling salesperson problem (TSP)
The Traveling Salesman Problem
- Problem: find the shortest path which starts at some city and visits all the other cities
[Figure: example graph of cities connected by edges labeled with their lengths]
Brute Force Solution to TSP
- Algorithm:
  - for each possible path, find its length
  - choose the path with minimum length
- Number of possible paths:
  - at most (n-1)(n-2)···1 = (n-1)! ((n-1) factorial)
- Complexity of the algorithm: about n steps to find the length of each path, so n·(n-1)! = n!, i.e., O(n!)
- How long will it take to go over O(n!) paths for growing input size n?
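The brute force algorithm can be sketched with backtracking over all orderings of the remaining cities. This is a sketch under assumptions: cities are numbered 0..n-1, the route starts at city 0, distances come in a matrix, and the route does not return to the start, matching the problem as stated above. The names and the example matrix are hypothetical.

```java
// Brute force TSP (path version): tries all (n-1)! orderings of cities 1..n-1; names are hypothetical.
final class TspBruteForce {
    // Length of the shortest route that starts at city 0 and visits every other city once.
    static int shortestPath(int[][] dist) {
        boolean[] visited = new boolean[dist.length];
        visited[0] = true;
        return search(dist, visited, 0, 1, 0);
    }

    // Extends the current partial path in every possible way (backtracking).
    private static int search(int[][] dist, boolean[] visited, int current, int count, int length) {
        int n = dist.length;
        if (count == n) {
            return length;                       // a complete path: report its length
        }
        int best = Integer.MAX_VALUE;
        for (int next = 0; next < n; next++) {
            if (!visited[next]) {
                visited[next] = true;
                best = Math.min(best, search(dist, visited, next, count + 1, length + dist[current][next]));
                visited[next] = false;           // undo the choice and try the next city
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] dist = {
            {0, 1, 10, 10},
            {1, 0, 2, 10},
            {10, 2, 0, 3},
            {10, 10, 3, 0}
        };
        System.out.println(shortestPath(dist)); // prints 6 (route 0-1-2-3)
    }
}
```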
TSP Computing Times for Different Input Sizes
Suppose our computer computes a million paths per second.
Number of cities | Number of paths           | Computing time
6                | 120                       | 0.12 milliseconds
11               | 3,628,800                 | 3.6 seconds
13               | 479,001,600               | 8 minutes
16               | 1,307,674,368,000         | 15 days
18               | 355,687,428,096,000       | 11 years
21               | 2,432,902,008,176,640,000 | 77,000 years!
TSP: an Intractable Problem
- TSP cannot be solved this way for reasonable input sizes
- The complexity of our algorithm for TSP, O(n!), grows even faster than O(2^n): it is exponential
- Any exponential running time function implies that the problem cannot be solved in practice, except for a small set of small inputs
Effect of Improved Technology
Size of the largest problem instance solvable in 1 hour:
Complexity                      | n       | n^2     | n^3     | n^5     | 2^n       | 3^n
With present computer           | N1      | N2      | N3      | N4      | N5        | N6
With computer 100 times faster  | 100·N1  | 10·N2   | 4.64·N3 | 2.5·N4  | N5 + 6.64 | N6 + 4.19
With computer 1000 times faster | 1000·N1 | 31.6·N2 | 10·N3   | 3.98·N4 | N5 + 9.97 | N6 + 6.29
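The table entries follow from a short derivation. It assumes the running time is exactly T(n) steps with no lower-order terms, and that a computer k times faster performs k times as many steps in the same hour.

```latex
\begin{aligned}
&T(N) = \text{one hour's worth of steps now}, \qquad T(N') = k \, T(N) \text{ on a computer } k \text{ times faster}\\
&T(n) = n^c:\quad (N')^c = k \, N^c \;\Rightarrow\; N' = k^{1/c} N \qquad (100^{1/3} \approx 4.64,\; 1000^{1/5} \approx 3.98)\\
&T(n) = 2^n:\quad 2^{N'} = k \cdot 2^{N} \;\Rightarrow\; N' = N + \log_2 k \qquad (\log_2 100 \approx 6.64)\\
&T(n) = 3^n:\quad 3^{N'} = k \cdot 3^{N} \;\Rightarrow\; N' = N + \log_3 k \qquad (\log_3 1000 \approx 6.29)
\end{aligned}
```

For polynomial running times, a faster computer multiplies the solvable size. For exponential running times, it only adds a constant to it.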
TSP: a Member of a Large Family
- It may seem that TSP is just one problem
- However, there is a whole set of problems from a large variety of areas, called NP problems, which are very similar to TSP
- Those problems are the focus of much CS research, and yet no efficient (polynomial) algorithm has been found for them
- Although it has not been proven, it is strongly believed that there is no efficient algorithm for these problems (this is the famous P vs. NP problem)
The NP-Complete Class
- Many of the NP problems are complete, in the sense that if an efficient solution to any one of them is found, then all other NP problems can be solved efficiently
- This is true since:
  - all the problems in the NP class were reduced to a single NPC problem (SAT)
  - this problem was reduced to many other NP problems, each of which is therefore also NPC
- A reduction from A to B means that, given an efficient algorithm that solves B, we can find an efficient algorithm that solves A
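Example of a Reduction Tree
[Figure: reduction tree rooted at SAT, showing SAT reduced to other problems; some problems are highlighted in red]
- SAT is a special problem: if it is solvable efficiently, then any NP problem is solvable efficiently
- If we find an efficient solution to any of the red problems, then we can find one for SAT (by backtracking along the reductions up the tree), and all NP problems become efficiently solvable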
The Sorted Array Sum Revisited
- Input: a sorted array A of n numbers, and a number S
- Output: is there a group of numbers in the array whose sum is S?
- Possible solution: for each possible group of numbers, find out if its sum is S
- Complexity: the number of groups is 2^n, therefore the complexity is exponential
- This problem is known to be NP-complete!
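The exhaustive solution can be sketched by encoding each group as an n-bit mask, giving 2^n masks in total. This is only a sketch. It assumes fewer than 31 elements, so that 1 << n fits in an int. The empty group is also tried, so S = 0 always returns true. The names are hypothetical.

```java
// Brute force over all 2^n groups (bit masks); assumes a.length < 31; names are hypothetical.
final class SubsetSum {
    static boolean hasSubsetWithSum(int[] a, long s) {
        int n = a.length;
        for (int mask = 0; mask < (1 << n); mask++) {  // each mask encodes one group
            long sum = 0;
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    sum += a[i];                        // element i belongs to this group
                }
            }
            if (sum == s) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasSubsetWithSum(new int[] {3, 5, 8, 13}, 16)); // prints true
    }
}
```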
Examples of NP-Complete Problems
- Knapsack
  - Input: a set of elements U with weights, and a number B
  - Problem: find a subset of U with maximum weight such that the sum of weights ≤ B
- Minimum Set Cover
  - Input: a set of tasks to perform, and a group of people, each able to perform some subset of the tasks
  - Problem: find a minimum-size subgroup of people who together can perform all the tasks
More NPC Problems
- Graph Coloring
  - For a long time map makers believed that, if you planned carefully, you could color any map with at most four colors. Many mathematicians tried to prove this, but only in 1976, with the aid of a computer, was it shown to be true
  - There is no known polynomial-time algorithm to color a graph with the minimum number of colors
- Minimum Bin Packing (disk storage)
  - Input: k files of sizes s1, ..., sk; disk capacity M
  - Problem: find a partition of the files among disks such that each disk stores at most M bytes, using the minimum number of disks
The Good News About NPC Problems
- Although no efficient algorithm is known that can solve NP-complete problems, there are other approaches:
- Approximation: some problems have efficient algorithms which approximate the solution, i.e., find a solution which is within a known factor of the optimum
- Randomization: some problems have efficient algorithms which flip coins (make random choices) and find a good solution with high probability
- Average case: some NP problems are not so hard on average (analyzing this requires statistical approaches)