Title: COM328M2: Algorithms and Data Structures
1COM328M2 Algorithms and Data Structures
- Dr Zumao Weng
- http://www.infm.ulst.ac.uk/zumao/teaching/COM328M2
- School of Computing and Intelligent Systems
- University of Ulster at Magee
- 2-2-2009
2Chapter 2 Analysis of Algorithms II
3Problem Solving Main Steps
- Problem definition
- Algorithm design / Algorithm specification
- Algorithm analysis
- Implementation
- Testing
- Maintenance
41. Problem Definition
- What is the task to be accomplished?
- Calculate the average of the grades for a given student
- Understand the talks given out by politicians and translate them into Chinese
- What are the time / space / speed / performance requirements?
52. Algorithm Design / Specifications
- Algorithm: a finite set of instructions that, if followed, accomplishes a particular task.
- Describe in natural language / pseudo-code / diagrams / etc.
- Criteria to follow:
- Input: zero or more quantities (externally produced)
- Output: one or more quantities
- Definiteness: clarity and precision of each instruction
- Finiteness: the algorithm has to stop after a finite (possibly very large) number of steps
- Effectiveness: each instruction has to be basic enough and feasible
- Understand speech
- Translate to Chinese
64,5,6 Implementation, Testing, Maintenance
- Implementation
- Decide on the programming language to use
- C, C++, Lisp, Java, Perl, Prolog, assembly, etc.
- Write clean, well-documented code
- Test, test, test
- Integrate feedback from users, fix bugs, ensure compatibility across different versions → Maintenance
73. Algorithm Analysis
- Space complexity
- How much space is required?
- Time complexity
- How much time does it take to run the algorithm?
- Often, we deal with estimates!
8Space Complexity
- Space complexity: the amount of memory required by an algorithm to run to completion
- Core dumps: the most often encountered cause is "memory leaks", where the amount of memory required is larger than the memory available on a given system
- Some algorithms may be more efficient if the data is completely loaded into memory
- Need to look also at system limitations
- E.g. classify 2GB of text in various categories (politics, tourism, sport, natural disasters, etc.): can I afford to load the entire collection?
9Space Complexity (contd)
- Fixed part: the space required to store certain data/variables, which is independent of the size of the problem
- - e.g. name of the data collection
- - same size for classifying 2GB or 1MB of texts
- Variable part: space needed by variables whose size depends on the size of the problem
- - e.g. actual text
- - load 2GB of text vs. load 1MB of text
10Space Complexity (contd)
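- $S(P) = c + S(\text{instance characteristics})$
- $c$ = constant
- Example:
- float sum(float a[], int n)   // a is passed as a pointer to the caller's array
- {
-     float s = 0;
-     for (int i = 0; i < n; i++)
-         s += a[i];             // accumulate the running total
-     return s;
- }
- Space? One word for n, one for a (passed by reference!), one for i and one for s → constant space!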
11Time Complexity
- Often more important than space complexity
- space available (for computer programs!) tends to be larger and larger
- time is still a problem for all of us
- 3-4GHz processors on the market
- still ...
- researchers estimate that the computation of various transformations for one single DNA chain for one single protein on a 1 terahertz computer would take about 1 year to run to completion
- An algorithm's running time is an important issue
12Running Time
- Problem: prefix averages
- Given an array X
- Compute the array A such that A[i] is the average of elements X[0] ... X[i], for i = 0..n-1
- Sol 1
- At each step i, compute the element A[i] by traversing the array X up to position i and determining the sum of its elements, and from it the average
- Sol 2
- At each step i, update a running sum of the elements of the array X
- Compute the element A[i] as sum/(i+1)
- Big question: which solution to choose?
13Running time
- Suppose the program includes an if-then statement that may execute or not → variable running time
- Typically algorithms are measured by their worst case
14Experimental Approach
- Write a program that implements the algorithm
- Run the program with data sets of varying size.
- Determine the actual running time using a system call to measure time (e.g. system("date"))
- Problems?
15Experimental Approach
- It is necessary to implement and test the
algorithm in order to determine its running time.
- Experiments can be done only on a limited set of inputs, and may not be indicative of the running time for other inputs.
- The same hardware and software should be used in order to compare two algorithms: a condition that is very hard to achieve!
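The experimental approach can be sketched in C++ as follows. This is only a minimal illustration: `sumAll` is a hypothetical stand-in for the algorithm under test, and `std::chrono::steady_clock` is used instead of a shell call to `date` because it offers finer resolution. The measured numbers depend on the machine, compiler and load, which is exactly the limitation listed above.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical workload: sum the elements of a vector (linear in its size).
long long sumAll(const std::vector<int>& data) {
    long long s = 0;
    for (int v : data) s += v;
    return s;
}

// Times one call of sumAll on an input of n ones.
// Returns the elapsed time in microseconds; the computed sum goes to 'result'.
long long timeSumMicros(std::size_t n, long long& result) {
    std::vector<int> data(n, 1);
    auto start = std::chrono::steady_clock::now();  // monotonic clock
    result = sumAll(data);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling `timeSumMicros` for data sets of increasing size (for example 1,000, 10,000 and 100,000 elements) and comparing the results gives a rough picture of how the running time grows.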
16Use a Theoretical Approach
- Based on a high-level description of the algorithms, rather than language-dependent implementations
- Makes possible an evaluation of the algorithms that is independent of the hardware and software environments
- → Generality
17Algorithm Description
- How to describe algorithms independently of a programming language?
- Pseudo-code: a description of an algorithm that is
- more structured than usual prose but
- less formal than a programming language
- (Or diagrams)
- Example: find the maximum element of an array.
- Algorithm arrayMax(A, n)
- Input: An array A storing n integers.
- Output: The maximum element in A.
- currentMax ← A[0]
- for i ← 1 to n - 1 do
-     if currentMax < A[i] then currentMax ← A[i]
- return currentMax
18Pseudo Code
- Expressions: use standard mathematical symbols
- use ← for assignment (= in C/C++)
- use = for the equality relationship (== in C/C++)
- Method declarations: Algorithm name(param1, param2)
- Programming constructs
- decision structures: if ... then ... [else ...]
- while-loops: while ... do
- repeat-loops: repeat ... until ...
- for-loop: for ... do
- array indexing: A[i]
- Methods
- calls: object method(args)
- returns: return value
- Use comments
- Instructions have to be basic enough and feasible!
19Low Level Algorithm Analysis
- Based on primitive operations (low-level computations independent of the programming language)
- E.g.
- Make an addition = 1 operation
- Calling a method or returning from a method = 1 operation
- Index in an array = 1 operation
- Comparison = 1 operation, etc.
- Method: inspect the pseudo-code and count the number of primitive operations executed by the algorithm
20Example
- Algorithm arrayMax(A, n)
- Input: An array A storing n integers.
- Output: The maximum element in A.
- currentMax ← A[0]
- for i ← 1 to n - 1 do
-     if currentMax < A[i] then currentMax ← A[i]
- return currentMax
- How many operations?
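One way to explore the question is to instrument the algorithm and let it count for itself. The sketch below is a C++ rendering of arrayMax with a simplified tally: the `OpCount` struct is a hypothetical helper, and only comparisons against `currentMax` and assignments to it are counted (array indexing and loop bookkeeping are left out on purpose).

```cpp
#include <cstddef>
#include <vector>

// Counters for two kinds of primitive operations (a simplified tally).
struct OpCount {
    long comparisons = 0;  // evaluations of currentMax < A[i]
    long assignments = 0;  // writes to currentMax
};

// arrayMax from the pseudo-code, instrumented. Assumes A is non-empty.
int arrayMax(const std::vector<int>& A, OpCount& ops) {
    int currentMax = A[0];
    ops.assignments++;
    for (std::size_t i = 1; i < A.size(); ++i) {
        ops.comparisons++;
        if (currentMax < A[i]) {
            currentMax = A[i];
            ops.assignments++;
        }
    }
    return currentMax;
}
```

For n elements the number of comparisons is always n - 1, while the number of assignments ranges from 1 (maximum at the front) to n (ascending input). Under this tally the worst case is therefore linear in n.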
21Asymptotic Notation
- Need to abstract further
- Give an idea of how the algorithm performs
- $n$ steps vs. $n + 5$ steps
- $n$ steps vs. $n^2$ steps
22Problem
- Fibonacci numbers
- $F_0 = 0$
- $F_1 = 1$
- $F_i = F_{i-1} + F_{i-2}$ for $i \ge 2$
- Pseudo-code
- Number of operations
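A possible answer to this exercise, given as a sketch rather than the intended solution: an iterative C++ version that follows the recurrence directly and counts its additions. The `additions` parameter is a hypothetical counter added for illustration only.

```cpp
#include <cstdint>

// Iterative Fibonacci following F0 = 0, F1 = 1, Fi = F(i-1) + F(i-2).
// 'additions' records how many additions were performed.
// The result fits in 64 bits for n <= 93.
std::uint64_t fibonacci(unsigned n, unsigned& additions) {
    additions = 0;
    if (n == 0) return 0;
    std::uint64_t prev = 0, curr = 1;  // F0 and F1
    for (unsigned i = 2; i <= n; ++i) {
        std::uint64_t next = prev + curr;
        ++additions;
        prev = curr;
        curr = next;
    }
    return curr;
}
```

This version performs n - 1 additions for n ≥ 1, so the number of operations grows linearly with n.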
23Algorithm Analysis
- Abstract even further
- Characterise an algorithm as a function of the problem size
- E.g.
- Input data = array → problem size is N (length of array)
- Input data = matrix → problem size is N x M
24Asymptotic Notation
- Goal: to simplify analysis by getting rid of unneeded information (like rounding 1,000,001 ≈ 1,000,000)
- We want to say in a formal way that $3n^2 \approx n^2$
- The Big-Oh notation:
- given functions $f(n)$ and $g(n)$, we say that $f(n)$ is $O(g(n))$ if and only if there are positive constants $c$ and $n_0$ such that $f(n) \le c\,g(n)$ for $n \ge n_0$
25Graphic Illustration
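- $f(n) = 2n + 6$
- According to the definition:
- Need to find a function $g(n)$ and a constant $c$ such that $f(n) \le c\,g(n)$ for $n \ge n_0$
- $g(n) = n$ and $c = 4$ (the inequality $2n + 6 \le 4n$ holds for $n \ge n_0 = 3$)
- → $f(n)$ is $O(n)$
- The order of $f(n)$ is $n$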
[Figure: curves $c\,g(n) = 4n$ and $g(n) = n$ plotted against $n$]
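The choice of constants can be checked numerically. The C++ sketch below tests whether f(n) ≤ c·g(n) over a finite range; `boundHolds` and its `limit` parameter are hypothetical helpers, and a finite check like this only illustrates the bound, it does not prove it.

```cpp
// f(n) = 2n + 6 and g(n) = n, as on the slide.
long f(long n) { return 2 * n + 6; }
long g(long n) { return n; }

// Checks f(n) <= c * g(n) for every n in [n0, limit].
bool boundHolds(long c, long n0, long limit) {
    for (long n = n0; n <= limit; ++n) {
        if (f(n) > c * g(n)) return false;
    }
    return true;
}
```

With c = 4 the check succeeds from n0 = 3 onwards, since 2n + 6 ≤ 4n is equivalent to n ≥ 3. It fails if the range starts at n0 = 1, and with c = 2 it fails for every starting point.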
26More examples
- What about $f(n) = 4n^2$? Is it $O(n)$?
- Find a $c$ such that $4n^2 < cn$ for any $n > n_0$ (no such constant exists, so $4n^2$ is not $O(n)$)
- $50n^3 + 20n + 4$ is $O(n^3)$
- Would it be correct to say it is $O(n^3 + n)$?
- Not useful, as $n^3$ exceeds $n$ by far for large values
- Would it be correct to say it is $O(n^5)$?
- OK, but $g(n)$ should be as close as possible to $f(n)$
- $3\log(n) + \log(\log(n)) = O(\,?\,)$
- Simple rule: drop lower-order terms and constant factors
27Properties of Big-Oh
- If $f(n)$ is $O(g(n))$ then $a f(n)$ is $O(g(n))$ for any constant $a$.
- If $f(n)$ is $O(g(n))$ and $h(n)$ is $O(g'(n))$ then $f(n) + h(n)$ is $O(g(n) + g'(n))$
- If $f(n)$ is $O(g(n))$ and $h(n)$ is $O(g'(n))$ then $f(n) h(n)$ is $O(g(n) g'(n))$
- If $f(n)$ is $O(g(n))$ and $g(n)$ is $O(h(n))$ then $f(n)$ is $O(h(n))$
- If $f(n)$ is a polynomial of degree $d$, then $f(n)$ is $O(n^d)$
- $n^x$ is $O(a^n)$, for any fixed $x > 0$ and $a > 1$
- An algorithm of order $n$ to a certain power is better than an algorithm of order $a$ ($> 1$) to the power of $n$
- $\log n^x$ is $O(\log n)$, for $x > 0$. How?
- $\log^x n$ is $O(n^y)$ for $x > 0$ and $y > 0$
- An algorithm of order $\log n$ (to a certain power) is better than an algorithm of order $n$ raised to a power $y$.
28Asymptotic analysis - terminology
- Special classes of algorithms
- logarithmic: $O(\log n)$
- linear: $O(n)$
- quadratic: $O(n^2)$
- polynomial: $O(n^k)$, $k \ge 1$
- exponential: $O(a^n)$, $a > 1$
- Polynomial vs. exponential ?
- Logarithmic vs. polynomial ?
29Some Numbers
30Relatives of Big-Oh
- Relatives of the Big-Oh
- $\Omega(f(n))$: Big Omega, asymptotic lower bound
- $\Theta(f(n))$: Big Theta, asymptotic tight bound
- Big-Omega: think of it as the inverse of Big-Oh
- $g(n)$ is $\Omega(f(n))$ if $f(n)$ is $O(g(n))$
- Big-Theta: combine both Big-Oh and Big-Omega
- $f(n)$ is $\Theta(g(n))$ if $f(n)$ is $O(g(n))$ and $f(n)$ is $\Omega(g(n))$
- Make the difference:
- $3n + 3$ is $O(n)$ and is $\Theta(n)$
- $3n + 3$ is $O(n^2)$ but is not $\Theta(n^2)$
31More relatives
- Little-oh: $f(n)$ is $o(g(n))$ if for any $c > 0$ there is an $n_0$ such that $f(n) < c\,g(n)$ for $n > n_0$.
- Little-omega
- Little-theta
- $2n + 3$ is $o(n^2)$
- $2n + 3$ is $o(n)$?
32Example
- Remember the algorithm for computing prefix averages
- - compute an array A starting with an array X
- - every element A[i] is the average of all elements X[j] with j ≤ i
- Remember some pseudo-code: Solution 1
- Algorithm prefixAverages1(X)
- Input: An n-element array X of numbers.
- Output: An n-element array A of numbers such that A[i] is the average of elements X[0], ..., X[i].
- Let A be an array of n numbers.
- for i ← 0 to n - 1 do
-     a ← 0
-     for j ← 0 to i do
-         a ← a + X[j]
-     A[i] ← a/(i + 1)
- return array A
33Example (contd)
- Algorithm prefixAverages2(X)
- Input: An n-element array X of numbers.
- Output: An n-element array A of numbers such that A[i] is the average of elements X[0], ..., X[i].
- Let A be an array of n numbers.
- s ← 0
- for i ← 0 to n - 1 do
-     s ← s + X[i]
-     A[i] ← s/(i + 1)
- return array A
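The two prefix-average solutions can be written in C++ roughly as follows. This is a sketch under the assumption that the input is a `std::vector<double>`. The function names follow the pseudo-code; everything else is an illustrative choice.

```cpp
#include <cstddef>
#include <vector>

// Solution 1: recompute the sum of X[0..i] for every i (nested loops).
std::vector<double> prefixAverages1(const std::vector<double>& X) {
    std::vector<double> A(X.size());
    for (std::size_t i = 0; i < X.size(); ++i) {
        double a = 0;
        for (std::size_t j = 0; j <= i; ++j) a += X[j];
        A[i] = a / (i + 1);
    }
    return A;
}

// Solution 2: keep a running sum and reuse it (single loop).
std::vector<double> prefixAverages2(const std::vector<double>& X) {
    std::vector<double> A(X.size());
    double s = 0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        s += X[i];
        A[i] = s / (i + 1);
    }
    return A;
}
```

Both functions return the same array. The first performs 1 + 2 + ... + n = n(n+1)/2 additions in its inner loop, the second performs only n.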
34Back to the original question
- Which solution would you choose?
- $O(n^2)$ vs. $O(n)$
- Some math
- properties of logarithms
- $\log_b(xy) = \log_b x + \log_b y$
- $\log_b(x/y) = \log_b x - \log_b y$
- $\log_b x^a = a \log_b x$
- $\log_b a = \log_x a / \log_x b$
- properties of exponentials
- $a^{b+c} = a^b a^c$
- $a^{bc} = (a^b)^c$
- $a^b / a^c = a^{b-c}$
- $b = a^{\log_a b}$
- $b^c = a^{c \log_a b}$
35Important Series
- Sum of squares
- Sum of exponents
- Geometric series
- Special case when $A = 2$: $2^0 + 2^1 + 2^2 + \dots + 2^N = 2^{N+1} - 1$
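The formulas for the first three series did not survive on this slide. As an assumption, the standard closed forms are $\sum_{i=1}^{N} i^2 = N(N+1)(2N+1)/6$ for the sum of squares and $\sum_{i=0}^{N} A^i = (A^{N+1} - 1)/(A - 1)$ for the geometric series. The C++ sketch below computes both sums by brute force so that the closed forms can be compared against them; the function names are illustrative.

```cpp
#include <cstdint>

// Brute-force sum 1^2 + 2^2 + ... + N^2.
std::uint64_t sumOfSquares(std::uint64_t N) {
    std::uint64_t s = 0;
    for (std::uint64_t i = 1; i <= N; ++i) s += i * i;
    return s;
}

// Brute-force sum A^0 + A^1 + ... + A^N for an integer A.
// The caller is expected to keep A and N small enough to avoid overflow.
std::uint64_t geometricSum(std::uint64_t A, unsigned N) {
    std::uint64_t s = 0, term = 1;
    for (unsigned i = 0; i <= N; ++i) {
        s += term;
        term *= A;
    }
    return s;
}
```

For example, `geometricSum(2, 10)` adds the powers of two from 1 up to 1024, which matches the special case $2^{11} - 1 = 2047$.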
36Analysing recursive algorithms
- function foo (param A, param B)
- {
-     statement 1
-     statement 2
-     if (termination condition)
-         return
-     foo(A', B')
- }
37Solving recursive equations by repeated substitution
- $T(n) = T(n/2) + c$ (substitute for $T(n/2)$)
- $= T(n/4) + c + c$ (substitute for $T(n/4)$)
- $= T(n/8) + c + c + c$
- $= T(n/2^3) + 3c$ (in more compact form)
- $= \dots$
- $= T(n/2^k) + kc$ (inductive leap)
- $T(n) = T(n/2^{\log n}) + c \log n$ (choose $k = \log n$)
- $= T(n/n) + c \log n$
- $= T(1) + c \log n = b + c \log n = \Theta(\log n)$, where $b = T(1)$
38Solving recursive equations by telescoping
- $T(n) = T(n/2) + c$ (initial equation)
- $T(n/2) = T(n/4) + c$ (so this holds)
- $T(n/4) = T(n/8) + c$ (and this)
- $T(n/8) = T(n/16) + c$ (and this)
- ...
- $T(4) = T(2) + c$ (eventually)
- $T(2) = T(1) + c$ (and this)
- $T(n) = T(1) + c \log n$ (sum the equations, canceling the terms appearing on both sides)
- $T(n) = \Theta(\log n)$
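The recurrence T(n) = T(n/2) + c can be observed directly by counting how often a routine that halves its input calls itself. The function below is a hypothetical illustration in C++ (integer division, so n/2 is rounded down), not part of the original derivation.

```cpp
// Counts the recursive steps of a routine that halves n each time,
// mirroring T(n) = T(n/2) + c with T(1) as the base case.
unsigned halvingSteps(unsigned long n) {
    if (n <= 1) return 0;            // base case: no further halving
    return 1 + halvingSteps(n / 2);  // one unit of work, then recurse on n/2
}
```

For n = 1024 the function returns 10, which is log2(1024). For values that are not powers of two it returns the logarithm rounded down, still in line with the Θ(log n) result.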
39Problem
- Running time for finding a number in a sorted array
- binary search
- Pseudo-code
- Running time analysis
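One possible solution to this exercise, offered as a sketch: an iterative binary search in C++ that also counts how many array elements it inspects. The `probes` counter and the convention of returning -1 for a missing value are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Iterative binary search on a vector sorted in ascending order.
// Returns the index of 'target', or -1 if it is absent.
// 'probes' records how many elements were inspected.
long binarySearch(const std::vector<int>& a, int target, unsigned& probes) {
    probes = 0;
    std::size_t lo = 0, hi = a.size();  // search the half-open range [lo, hi)
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        ++probes;
        if (a[mid] == target) return static_cast<long>(mid);
        if (a[mid] < target) lo = mid + 1;  // discard the left half
        else hi = mid;                      // discard the right half
    }
    return -1;
}
```

Each iteration halves the remaining range, so the running time satisfies T(n) = T(n/2) + c and the search is O(log n). On an array of 8 elements no search inspects more than 4 elements.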
40ADT
- ADT = Abstract Data Types
- A logical view of the data objects together with specifications of the operations required to create and manipulate them.
- Describe an algorithm: pseudo-code
- Describe a data structure: ADT
41What is a data type?
- A set of objects, each called an instance of the data type. Some objects are sufficiently important to be provided with a special name.
- A set of operations. Operations can be realized via operators, functions, procedures, methods, and special syntax (depending on the implementing language)
- Each object must have some representation (not necessarily known to the user of the data type)
- Each operation must have some implementation (also not necessarily known to the user of the data type)
42What is a representation?
- A specific encoding of an instance
- This encoding MUST be known to implementors of the data type but NEED NOT be known to users of the data type
- Terminology: "we implement data types using data structures"
43Two varieties of data types
- Opaque data types, in which the representation is not known to the user.
- Transparent data types, in which the representation is profitably known to the user, i.e. the encoding is directly accessible and/or modifiable by the user.
- Which one do you think is better?
- What are the means provided by C++ for creating opaque data types?
44Why are opaque data types better?
- Representation can be changed without affecting the user
- Forces the program designer to consider the operations more carefully
- Encapsulates the operations
- Allows less restrictive designs which are easier to extend and modify
- Design is always done with the expectation that the data type will be placed in a library of types available to all.
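In C++, one common way to obtain an opaque data type is a class with a private representation. The sketch below uses a stack of integers as an example; the class name `IntStack` and the choice of `std::vector` as the hidden representation are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// An opaque stack of ints: users see only the operations, while the
// representation stays private and can be changed later.
class IntStack {
public:
    void push(int value) { items_.push_back(value); }

    // Removes and returns the top element. Assumes the stack is not empty.
    int pop() {
        int top = items_.back();
        items_.pop_back();
        return top;
    }

    bool isEmpty() const { return items_.empty(); }
    std::size_t size() const { return items_.size(); }

private:
    std::vector<int> items_;  // could become a linked list without affecting users
};
```

Code that uses `IntStack` can only call `push`, `pop`, `isEmpty` and `size`, so replacing the vector with another structure would not require any change on the user's side.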
45How to design a data type Step 1 Specification
- Make a list of the operations (just their names) you think you will need. Review and refine the list.
- Decide on any constants which may be required.
- Describe the parameters of the operations in detail.
- Describe the semantics of the operations (what they do) as precisely as possible.
46How to design a data type Step 2 Application
- Develop a real or imaginary application to test the specification.
- Missing or incomplete operations are found as a side-effect of trying to use the specification.
47How to design a data type Step 3 Implementation
- Decide on a suitable representation.
- Implement the operations.
- Test, debug, and revise.
48Example - ADT Integer
- Name of ADT: Integer
- Operation | Description | C/C++
- Create | Defines an identifier with an undefined value | int id1;
- Assign | Assigns the value of one integer identifier or value to another integer identifier | id1 = id2;
- isEqual | Returns true if the values associated with two integer identifiers are the same | id1 == id2
49Example ADT Integer
- LessThan | Returns true if the value of the first integer identifier is less than the value of the second integer identifier | id1 < id2
- Negative | Returns the negative of the integer value | -id1
- Sum | Returns the sum of two integer values | id1 + id2
- Operation Signatures
- Create: identifier → Integer
- Assign: Integer → Identifier
- IsEqual: (Integer, Integer) → Boolean
- LessThan: (Integer, Integer) → Boolean
- Negative: Integer → Integer
- Sum: (Integer, Integer) → Integer
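The operation list above could be mapped onto a C++ class along the following lines. This is a sketch only: the member function names mirror the ADT operations, and wrapping a built-in `int` is an assumed representation. Unlike a raw `int`, the default constructor here initialises the value to 0 rather than leaving it undefined.

```cpp
// A sketch of the Integer ADT with a hidden int representation.
class Integer {
public:
    Integer() : value_(0) {}                  // Create
    explicit Integer(int v) : value_(v) {}    // Create from a plain int value

    void assign(const Integer& other) { value_ = other.value_; }                 // Assign
    bool isEqual(const Integer& other) const { return value_ == other.value_; }  // IsEqual
    bool lessThan(const Integer& other) const { return value_ < other.value_; }  // LessThan
    Integer negative() const { return Integer(-value_); }                        // Negative
    Integer sum(const Integer& other) const { return Integer(value_ + other.value_); }  // Sum

private:
    int value_;  // representation, not visible to users of the type
};
```

Each member function corresponds to one signature: for instance `sum` takes two Integer values (the object itself and `other`) and yields an Integer, matching Sum: (Integer, Integer) → Integer.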
50More examples
- We'll see more examples throughout the module
- Stack
- Queue
- Tree
- And more