Title: Analysis of Algorithms CS 477/677
1Analysis of Algorithms CS 477/677
- Final Exam Review
- Instructor: George Bebis
2The Heap Data Structure
- Def: A heap is a nearly complete binary tree with the following two properties:
- Structural property: all levels are full, except possibly the last one, which is filled from left to right
- Order (heap) property: for any node x, Parent(x) ≥ x (for a max-heap)
[Figure: example heap containing the keys 8, 7, 4, 5, 2]
3Array Representation of Heaps
- A heap can be stored as an array A.
- Root of tree is A[1]
- Parent of A[i] = A[⌊i/2⌋]
- Left child of A[i] = A[2i]
- Right child of A[i] = A[2i + 1]
- Heapsize[A] ≤ length[A]
- The elements in the subarray A[(⌊n/2⌋ + 1) .. n] are leaves
- The root is the max/min element of the heap
A heap is a binary tree that is filled in order
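The index arithmetic above can be checked directly. The following is a minimal sketch, assuming a 1-based array (index 0 is an unused placeholder) and assuming the keys 8, 7, 4, 5, 2 are stored in level order; the helper names are illustrative.

```python
def parent(i):
    return i // 2          # floor(i / 2)

def left(i):
    return 2 * i

def right(i):
    return 2 * i + 1

# 1-based storage: A[0] is a placeholder so that the root sits at A[1].
A = [None, 8, 7, 4, 5, 2]
n = len(A) - 1             # heap size

def is_max_heap(A, n):
    """Order property: every non-root node is <= its parent."""
    return all(A[parent(i)] >= A[i] for i in range(2, n + 1))

# Indices floor(n/2) + 1 .. n have no children inside the heap, so they are leaves.
leaves = list(range(n // 2 + 1, n + 1))

print(is_max_heap(A, n), leaves)
```

With n = 5 the leaves are the indices 3, 4, 5, since left(3) = 6 already falls outside the heap.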
4Operations on Heaps (useful for sorting and priority queues)
- MAX-HEAPIFY: O(lg n)
- BUILD-MAX-HEAP: O(n)
- HEAP-SORT: O(n lg n)
- MAX-HEAP-INSERT: O(lg n)
- HEAP-EXTRACT-MAX: O(lg n)
- HEAP-INCREASE-KEY: O(lg n)
- HEAP-MAXIMUM: O(1)
- You should be able to show how these algorithms
perform on a given heap, and tell their running
time
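For practice, the main heap operations can be sketched as follows. This is one possible rendering, assuming a 0-based Python list (so the children of index i are 2i + 1 and 2i + 2); the function names mirror the slide but are otherwise illustrative.

```python
def max_heapify(a, i, heap_size):
    """Sift a[i] down until the subtree rooted at i is a max-heap: O(lg n)."""
    while True:
        l, r = 2 * i + 1, 2 * i + 2
        largest = i
        if l < heap_size and a[l] > a[largest]:
            largest = l
        if r < heap_size and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def build_max_heap(a):
    """Heapify every internal node, bottom up: O(n) in total."""
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))

def heap_sort(a):
    """Sort in place: O(n lg n)."""
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]     # move the current maximum to its final slot
        max_heapify(a, 0, end)

def heap_extract_max(a):
    """Remove and return the maximum: O(lg n)."""
    top = a[0]
    last = a.pop()
    if a:
        a[0] = last
        max_heapify(a, 0, len(a))
    return top

def max_heap_insert(a, key):
    """Append the key and float it up: O(lg n)."""
    a.append(key)
    i = len(a) - 1
    while i > 0 and a[(i - 1) // 2] < a[i]:
        a[i], a[(i - 1) // 2] = a[(i - 1) // 2], a[i]
        i = (i - 1) // 2
```

HEAP-MAXIMUM is simply `a[0]`, which is why it costs O(1).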
5Lower Bound for Comparison Sorts
- Theorem: Any comparison sort algorithm requires Ω(n lg n) comparisons in the worst case.
- Proof: How many leaves does the decision tree have?
- At least n! (each of the n! permutations of the input appears as some leaf) ⇒ number of leaves ≥ n!
- At most 2^h leaves (h is the height of the tree)
- ⇒ n! ≤ 2^h
- ⇒ h ≥ lg(n!) = Ω(n lg n)
[Figure: decision tree of height h and its leaves]
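The bound h ≥ lg(n!) can be evaluated for small n. The sketch below computes the smallest integer h with 2^h ≥ n!; the function name is illustrative, and integer bit lengths are used so that no floating-point rounding is involved.

```python
from math import factorial

def min_worst_case_comparisons(n):
    """Smallest h with 2**h >= n!, i.e. ceil(lg(n!))."""
    return (factorial(n) - 1).bit_length()

for n in (3, 4, 5, 10):
    print(n, min_worst_case_comparisons(n))
```

For n = 3 this gives 3, for n = 4 it gives 5, and for n = 5 it gives 7: no comparison sort can always finish with fewer comparisons than that.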
6Linear Time Sorting
- Any comparison sort takes Ω(n lg n) time in the worst case to sort an array of n numbers
- We can achieve a better running time for sorting if we can make certain assumptions about the input data
- Counting sort: each of the n input elements is an integer in the range [0, r] and r = O(n)
- Radix sort: the elements in the input are integers represented as d-digit numbers in some base k, where d = Θ(1) and k = O(n)
- Bucket sort: the numbers in the input are uniformly distributed over the interval [0, 1)
7Analysis of Counting Sort
- Alg.: COUNTING-SORT(A, B, n, r)
- for i ← 0 to r   ▹ Θ(r)
-   do C[i] ← 0
- for j ← 1 to n   ▹ Θ(n)
-   do C[A[j]] ← C[A[j]] + 1
- ▹ C[i] contains the number of elements equal to i
- for i ← 1 to r   ▹ Θ(r)
-   do C[i] ← C[i] + C[i - 1]
- ▹ C[i] contains the number of elements ≤ i
- for j ← n downto 1   ▹ Θ(n)
-   do B[C[A[j]]] ← A[j]
-     C[A[j]] ← C[A[j]] - 1
Overall time: Θ(n + r)
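A runnable version of the pseudocode might look like this. It is a sketch assuming 0-based Python lists, so the counter is decremented before the element is placed rather than after.

```python
def counting_sort(a, r):
    """Stable sort of integers in the range [0, r]: Theta(n + r)."""
    c = [0] * (r + 1)
    for x in a:                    # c[i] = number of elements equal to i
        c[x] += 1
    for i in range(1, r + 1):      # c[i] = number of elements <= i
        c[i] += c[i - 1]
    b = [None] * len(a)
    for x in reversed(a):          # right-to-left scan keeps equal keys in order
        c[x] -= 1
        b[c[x]] = x
    return b

print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))
```

Scanning the input from right to left is what makes the sort stable, which radix sort relies on.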
8RADIX-SORT
- Alg.: RADIX-SORT(A, d)
- for i ← 1 to d
-   do use a stable sort to sort array A on digit i
- 1 is the lowest-order digit, d is the highest-order digit
Running time: Θ(d(n + k))
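As an illustration, radix sort can be sketched with a stable counting sort on one digit at a time. The helper name `counting_sort_by_digit` and the default base k = 10 are assumptions made for this example.

```python
def counting_sort_by_digit(a, exp, k):
    """Stable counting sort on the base-k digit selected by exp (1, k, k**2, ...)."""
    c = [0] * k
    for x in a:
        c[(x // exp) % k] += 1
    for i in range(1, k):
        c[i] += c[i - 1]
    b = [0] * len(a)
    for x in reversed(a):
        digit = (x // exp) % k
        c[digit] -= 1
        b[c[digit]] = x
    return b

def radix_sort(a, d, k=10):
    """Sort non-negative integers with at most d base-k digits: Theta(d(n + k))."""
    exp = 1
    for _ in range(d):             # lowest-order digit first
        a = counting_sort_by_digit(a, exp, k)
        exp *= k
    return a

print(radix_sort([329, 457, 657, 839, 436, 720, 355], d=3))
```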
9Analysis of Bucket Sort
- Alg.: BUCKET-SORT(A, n)
- for i ← 1 to n   ▹ O(n)
-   do insert A[i] into list B[⌊n A[i]⌋]
- for i ← 0 to n - 1   ▹ Θ(n)
-   do sort list B[i] with quicksort
- concatenate lists B[0], B[1], . . . , B[n - 1] together in order   ▹ O(n)
- return the concatenated lists
Overall time: Θ(n)
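A short sketch of bucket sort follows. Note one substitution: each bucket is sorted with Python's built-in `list.sort` rather than quicksort, which does not change the idea. Inputs are assumed to lie in [0, 1).

```python
def bucket_sort(a):
    """Sort numbers drawn from [0, 1) by distributing them into n buckets."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        # floor(n * x) selects the bucket; min() guards against float rounding up to n.
        buckets[min(int(n * x), n - 1)].append(x)
    out = []
    for b in buckets:
        b.sort()                   # buckets are small on average for uniform input
        out.extend(b)
    return out

data = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]
print(bucket_sort(data))
```

The linear bound depends on the uniformity assumption: if all inputs land in one bucket, the cost is that of sorting the whole array.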
10Hash Tables
- Direct addressing (advantages/disadvantages)
- Hashing
- Use a function h to compute the slot for each key
- Store the element (or a pointer to it) in slot h(k)
- Advantages of hashing
- Can reduce storage requirements to Θ(|K|), where K is the set of keys actually stored
- Can still get O(1) search time in the average case
11Hashing with Chaining
- What is the main idea?
- Practical issues?
- Analysis of INSERT, DELETE
- Analysis of SEARCH
- Worst case
- Average case
- (both successful and unsuccessful)
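The main idea of chaining can be sketched as follows: every slot holds a list of the entries that hash to it. The class name and the use of Python's built-in `hash` are assumptions for illustration. Unlike the textbook O(1) head insertion, this `insert` first scans the chain so that an existing key is overwritten.

```python
class ChainedHashTable:
    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one chain per slot

    def _h(self, key):
        return hash(key) % self.m

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                      # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def search(self, key):
        """Cost is proportional to the chain length: O(1 + n/m) on average."""
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        idx = self._h(key)
        self.slots[idx] = [(k, v) for k, v in self.slots[idx] if k != key]

t = ChainedHashTable(m=8)
for k in (1, 9, 17):                          # all three keys collide in slot 1
    t.insert(k, str(k))
```

The worst case for SEARCH is visible here: when every key lands in the same slot, the search degenerates into a linear scan of one chain.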
12Designing Hash Functions
- The division method: h(k) = k mod m
- Advantage: fast, requires only one operation
- Disadvantage: certain values of m are bad (e.g., powers of 2)
- The multiplication method: h(k) = ⌊m (k A mod 1)⌋, with 0 < A < 1
- Advantage: the value of m is not critical (typically m = 2^p)
- Disadvantage: slower than the division method
- Universal hashing: select a hash function at random from a carefully designed class of functions
- Advantage: provides good results on average, independently of the keys to be stored
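The three designs can be compared in a few lines. This is a sketch under stated assumptions: the constant A = (√5 - 1)/2 is one commonly suggested choice, and the universal family ((a k + b) mod p) mod m with the prime p = 2^31 - 1 is one standard construction that requires integer keys smaller than p. The function names are illustrative.

```python
import math
import random

def h_division(k, m):
    return k % m

def h_multiplication(k, m, A=(math.sqrt(5) - 1) / 2):
    # (k * A) % 1 is the fractional part of k * A.
    return math.floor(m * ((k * A) % 1))

def make_universal_hash(m, p=2_147_483_647, rng=random):
    """Pick h(k) = ((a*k + b) mod p) mod m at random from the family."""
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda k: ((a * k + b) % p) % m

# With m a power of 2, the division method only looks at the low-order bits.
print([h_division(k, 8) for k in (8, 16, 24, 32)])
print(h_multiplication(123456, 2 ** 14))
```

The first line prints four zeros, which shows why m = 2^p is a poor choice for the division method when keys share their low-order bits.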
13Open Addressing
- Main idea
- Different implementations
- Linear probing
- Quadratic probing
- Double hashing
- Know how each one of them works and their main advantages/disadvantages
- How do you insert/delete?
- How do you search?
- Analysis of searching
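Insert, search, and delete under open addressing can be sketched with a pluggable probe sequence. The class name, the `DELETED` marker, and the particular second hash function 1 + (k mod (m - 1)) are assumptions chosen for this example; keys are assumed to be non-negative integers.

```python
EMPTY, DELETED = object(), object()

class OpenAddressTable:
    def __init__(self, m, probe):
        self.m = m
        self.probe = probe                 # probe(k, i, m) -> slot for the i-th attempt
        self.slots = [EMPTY] * m

    def insert(self, k):
        for i in range(self.m):
            j = self.probe(k, i, self.m)
            if self.slots[j] is EMPTY or self.slots[j] is DELETED:
                self.slots[j] = k
                return j
        raise OverflowError("hash table overflow")

    def search(self, k):
        for i in range(self.m):
            j = self.probe(k, i, self.m)
            if self.slots[j] is EMPTY:     # an empty slot ends the probe sequence
                return None
            if self.slots[j] is not DELETED and self.slots[j] == k:
                return j
        return None

    def delete(self, k):
        j = self.search(k)
        if j is not None:
            self.slots[j] = DELETED        # marking keeps later keys reachable

def linear_probe(k, i, m):
    return (k % m + i) % m

def double_hash_probe(k, i, m):
    return (k % m + i * (1 + k % (m - 1))) % m
```

Deleted slots are marked rather than emptied: emptying a slot would cut the probe sequence of any key that was inserted past it.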
14Binary Search Tree
- Tree representation
- A linked data structure in which each node is an object
- Binary search tree property
- If y is in the left subtree of x, then key[y] ≤ key[x]
- If y is in the right subtree of x, then key[y] ≥ key[x]
15Operations on Binary Search Trees
- SEARCH: O(h)
- PREDECESSOR: O(h)
- SUCCESSOR: O(h)
- MINIMUM: O(h)
- MAXIMUM: O(h)
- INSERT: O(h)
- DELETE: O(h)
- You should be able to show how these algorithms
perform on a given binary search tree, and tell
their running time
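Several of these operations can be sketched on a plain (unbalanced) binary search tree. The `Node` class and function names are illustrative; each function follows a single root-to-leaf or leaf-to-root path, which is where the O(h) bound comes from.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def tree_insert(root, key):
    """Insert key and return the (possibly new) root: O(h)."""
    node, parent, cur = Node(key), None, root
    while cur is not None:
        parent = cur
        cur = cur.left if key < cur.key else cur.right
    node.parent = parent
    if parent is None:
        return node
    if key < parent.key:
        parent.left = node
    else:
        parent.right = node
    return root

def tree_search(x, key):
    while x is not None and x.key != key:
        x = x.left if key < x.key else x.right
    return x

def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x

def tree_successor(x):
    if x.right is not None:
        return tree_minimum(x.right)
    y = x.parent
    while y is not None and x is y.right:   # climb until we leave a left subtree
        x, y = y, y.parent
    return y

root = None
for k in [15, 6, 18, 3, 7, 17, 20, 2, 4, 13, 9]:
    root = tree_insert(root, k)
```

MAXIMUM and PREDECESSOR are symmetric: swap the roles of left and right.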
16Red-Black-Trees Properties
- Binary search trees with additional properties
- Every node is either red or black
- The root is black
- Every leaf (NIL) is black
- If a node is red, then both its children are black
- For each node, all paths from the node to descendant leaves contain the same number of black nodes
17Properties of Red-Black-Trees
- Any node with height h has black-height ≥ h/2
- The subtree rooted at any node x contains at least 2^bh(x) - 1 internal nodes
- No path is more than twice as long as any other path ⇒ the tree is balanced
- Longest path: h ≤ 2 bh(root)
- Shortest path: bh(root)
18Upper bound on the height of Red-Black-Trees
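- Lemma: A red-black tree with n internal nodes has height at most 2 lg(n + 1).
- Proof: let h = height(root) and b = bh(root)
- n ≥ 2^b - 1 ≥ 2^(h/2) - 1 (n is the number of internal nodes; the second step holds since b ≥ h/2)
- Add 1 to both sides and then take logs:
- n + 1 ≥ 2^b ≥ 2^(h/2)
- lg(n + 1) ≥ h/2 ⇒
- h ≤ 2 lg(n + 1)
[Figure: red-black tree with height(root) = h and bh(root) = b, and subtrees l and r below the root]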
19Operations on Red-Black Trees
- SEARCH: O(h)
- PREDECESSOR: O(h)
- SUCCESSOR: O(h)
- MINIMUM: O(h)
- MAXIMUM: O(h)
- INSERT: O(h)
- DELETE: O(h)
- Red-black trees guarantee that the height of the tree will be O(lg n)
- You should be able to show how these algorithms perform on a given red-black tree (except for delete), and tell their running time
20Adj. List - Adj. Matrix Comparison
Graph representation: adjacency list, adjacency matrix
[Table: adjacency lists vs. adjacency matrices. Only the winner column survives: matrices; lists; lists, (m + n) vs. n^2; lists, (m + n) vs. n^2]
Adjacency list representation is better for most applications
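The space trade-off between the two representations can be seen on a small undirected graph. This is a sketch with illustrative function names; vertices are assumed to be numbered 0 .. n - 1.

```python
def build_adj_list(n, edges):
    """Theta(n + m) space: one list per vertex, two entries per undirected edge."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def build_adj_matrix(n, edges):
    """Theta(n^2) space, but testing whether (u, v) is an edge takes O(1)."""
    mat = [[0] * n for _ in range(n)]
    for u, v in edges:
        mat[u][v] = mat[v][u] = 1
    return mat

n, edges = 4, [(0, 1), (0, 2), (1, 2), (2, 3)]
adj = build_adj_list(n, edges)
mat = build_adj_matrix(n, edges)
print(sum(len(nbrs) for nbrs in adj), n * n)
```

The list stores 2m = 8 entries while the matrix stores n^2 = 16 cells; the gap widens quickly on sparse graphs.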
21Minimum Spanning Trees
- Given:
- A connected, undirected, weighted graph G = (V, E)
- A minimum spanning tree T:
- T connects all vertices
- w(T) = Σ_{(u,v)∈T} w(u, v) is minimized
22Correctness of MST Algorithms (Prim's and Kruskal's)
- Let A be a subset of some MST (i.e., T), (S, V - S) be a cut that respects A, and (u, v) be a light edge crossing (S, V - S). Then (u, v) is safe for A.
- Proof:
- Let T be an MST that includes A
- edges in A are shaded
- Case 1: If T includes (u, v), then it would be safe for A
- Case 2: Suppose T does not include the edge (u, v)
- Idea: construct another MST T' that includes A ∪ {(u, v)}
23PRIM(V, E, w, r)
- Q ← ∅
- for each u ∈ V   ▹ O(V) if Q is implemented as a min-heap
-   do key[u] ← ∞
-     π[u] ← NIL
-     INSERT(Q, u)
- DECREASE-KEY(Q, r, 0)   ▹ key[r] ← 0; O(lg V)
- while Q ≠ ∅   ▹ executed |V| times
-   do u ← EXTRACT-MIN(Q)   ▹ takes O(lg V); min-heap operations: O(V lg V)
-     for each v ∈ Adj[u]   ▹ executed O(E) times in total; O(E lg V) overall
-       do if v ∈ Q and w(u, v) < key[v]   ▹ constant
-         then π[v] ← u
-           DECREASE-KEY(Q, v, w(u, v))   ▹ takes O(lg V)
Total time: O(V lg V + E lg V) = O(E lg V)
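A runnable sketch of Prim's algorithm is shown below. One substitution should be noted: Python's `heapq` has no DECREASE-KEY, so this version pushes a new entry whenever a key improves and skips stale entries when they are popped. The graph format (a dict mapping each vertex to a list of (neighbor, weight) pairs) and the example graph are assumptions for illustration.

```python
import heapq

def prim(adj, r):
    """Return (parent map, total MST weight) for a connected undirected graph."""
    in_tree = set()
    parent = {r: None}
    key = {r: 0}
    pq = [(0, r)]
    total = 0
    while pq:
        k, u = heapq.heappop(pq)
        if u in in_tree:
            continue                        # stale entry left over from an earlier push
        in_tree.add(u)
        total += k
        for v, w in adj[u]:
            if v not in in_tree and w < key.get(v, float("inf")):
                key[v] = w                  # plays the role of DECREASE-KEY
                parent[v] = u
                heapq.heappush(pq, (w, v))
    return parent, total

adj = {
    "a": [("b", 4), ("c", 1)],
    "b": [("a", 4), ("c", 2), ("d", 5)],
    "c": [("a", 1), ("b", 2), ("d", 8)],
    "d": [("b", 5), ("c", 8)],
}
parent, total = prim(adj, "a")
print(parent, total)
```

The heap may hold up to O(E) entries in this variant, so the bound is O(E lg E), which is the same as O(E lg V) because E ≤ V^2.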
24KRUSKAL(V, E, w)
- A ← ∅
- for each vertex v ∈ V   ▹ O(V)
-   do MAKE-SET(v)
- sort E into non-decreasing order by w   ▹ O(E lg E)
- for each (u, v) taken from the sorted list   ▹ O(E) iterations
-   do if FIND-SET(u) ≠ FIND-SET(v)   ▹ O(lg V)
-     then A ← A ∪ {(u, v)}
-       UNION(u, v)
- return A
- Running time: O(V + E lg E + E lg V) = O(E lg E), dependent on the implementation of the disjoint-set data structure
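Kruskal's algorithm can be sketched with a small disjoint-set structure. The union-by-rank and path-halving heuristics used here are one possible implementation choice, and the edge format (weight, u, v) is an assumption for this example.

```python
def kruskal(vertices, edges):
    """edges: iterable of (weight, u, v). Returns the MST as a list of (u, v, weight)."""
    parent = {v: v for v in vertices}       # MAKE-SET for every vertex
    rank = {v: 0 for v in vertices}

    def find_set(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find_set(x), find_set(y)
        if rank[rx] < rank[ry]:
            rx, ry = ry, rx
        parent[ry] = rx                     # attach the shallower tree
        if rank[rx] == rank[ry]:
            rank[rx] += 1

    mst = []
    for w, u, v in sorted(edges):           # non-decreasing order by weight
        if find_set(u) != find_set(v):      # (u, v) joins two different components
            mst.append((u, v, w))
            union(u, v)
    return mst

edges = [(4, "a", "b"), (1, "a", "c"), (2, "b", "c"), (5, "b", "d"), (8, "c", "d")]
mst = kruskal("abcd", edges)
print(mst)
```

The sort dominates the running time, which matches the O(E lg E) total on the slide.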
25Shortest Paths Problem
- Variants of shortest paths problem
- Effect of negative weights/cycles
- Notation
- d[v]: shortest-path estimate
- δ(s, v): shortest-path weight
- Properties
- Optimal substructure theorem
- Triangle inequality
- Upper-bound property
- Convergence property
- Path relaxation property
26Relaxation
- Relaxing an edge (u, v): testing whether we can improve the shortest path to v found so far by going through u
- If d[v] > d[u] + w(u, v)
- we can improve the shortest path to v
- ⇒ update d[v] and π[v]
After relaxation: d[v] ≤ d[u] + w(u, v)
[Figure: two examples of RELAX(u, v, w)]
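The relaxation step is small enough to write out in full. In this sketch the estimates d and predecessors π are kept in dictionaries, and `relax` returns whether it changed anything; both choices are assumptions for illustration.

```python
import math

def initialize_single_source(vertices, s):
    d = {v: math.inf for v in vertices}     # shortest-path estimates
    pi = {v: None for v in vertices}        # predecessors
    d[s] = 0
    return d, pi

def relax(u, v, w, d, pi):
    """Improve d[v] via u if possible; afterwards d[v] <= d[u] + w always holds."""
    if d[v] > d[u] + w:
        d[v] = d[u] + w
        pi[v] = u
        return True
    return False

d, pi = initialize_single_source(["s", "u", "v"], "s")
d["u"], d["v"] = 5, 9
first = relax("u", "v", 2, d, pi)           # 9 > 5 + 2, so d[v] becomes 7
second = relax("u", "v", 2, d, pi)          # nothing left to improve
```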
27Single Source Shortest Paths
- Bellman-Ford Algorithm
- Allows negative edge weights
- Returns TRUE if no negative-weight cycles are reachable from the source s and FALSE otherwise
- Traverse all the edges |V| - 1 times, every time performing a relaxation step of each edge
- Dijkstra's Algorithm
- No negative-weight edges
- Repeatedly select a vertex with the minimum shortest-path estimate d[v]; uses a priority queue, in which keys are d[v]
28BELLMAN-FORD(V, E, w, s)
- INITIALIZE-SINGLE-SOURCE(V, s)   ▹ Θ(V)
- for i ← 1 to |V| - 1   ▹ O(V) iterations
-   do for each edge (u, v) ∈ E   ▹ O(E)
-     do RELAX(u, v, w)
- for each edge (u, v) ∈ E   ▹ O(E)
-   do if d[v] > d[u] + w(u, v)
-     then return FALSE
- return TRUE
- Running time: O(V + VE + E) = O(VE)
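The pseudocode translates almost line by line. This sketch assumes the graph is given as a vertex list plus a list of (u, v, weight) edges, and returns the estimates and predecessors together with the TRUE/FALSE result.

```python
import math

def bellman_ford(vertices, edges, s):
    """Return (ok, d, pi); ok is False if a negative cycle is reachable from s."""
    d = {v: math.inf for v in vertices}
    pi = {v: None for v in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):      # |V| - 1 passes over all edges
        for u, v, w in edges:
            if d[u] + w < d[v]:             # RELAX(u, v, w)
                d[v] = d[u] + w
                pi[v] = u
    for u, v, w in edges:                   # any edge that still relaxes exposes a cycle
        if d[u] + w < d[v]:
            return False, d, pi
    return True, d, pi

ok, d, pi = bellman_ford(["s", "a", "b"], [("s", "a", 4), ("s", "b", 5), ("b", "a", -2)], "s")
bad, _, _ = bellman_ford(["s", "a", "b"], [("s", "a", 1), ("a", "b", 1), ("b", "a", -2)], "s")
print(ok, d, bad)
```

In the second graph the cycle a → b → a has weight -1, so the final check finds an edge that can still be relaxed.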
29Dijkstra (G, w, s)
- INITIALIZE-SINGLE-SOURCE(V, s)   ▹ Θ(V)
- S ← ∅
- Q ← V[G]   ▹ O(V) build min-heap
- while Q ≠ ∅   ▹ executed O(V) times
-   do u ← EXTRACT-MIN(Q)   ▹ O(lg V)
-     S ← S ∪ {u}
-     for each vertex v ∈ Adj[u]   ▹ O(E) times (total)
-       do RELAX(u, v, w)
-         Update Q (DECREASE-KEY)   ▹ O(lg V)
- Running time: O(V lg V + E lg V) = O(E lg V)
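Dijkstra's algorithm can be sketched with `heapq`. As an implementation substitution, DECREASE-KEY is replaced by pushing a fresh entry and ignoring stale ones on extraction. The adjacency format and the example graph are assumptions for illustration; all weights must be non-negative.

```python
import heapq
import math

def dijkstra(adj, s):
    """adj: {u: [(v, w), ...]} with w >= 0. Returns (d, pi)."""
    d = {v: math.inf for v in adj}
    pi = {v: None for v in adj}
    d[s] = 0
    done = set()                            # the set S of finished vertices
    pq = [(0, s)]
    while pq:
        du, u = heapq.heappop(pq)           # EXTRACT-MIN
        if u in done:
            continue                        # stale entry
        done.add(u)
        for v, w in adj[u]:
            if du + w < d[v]:               # RELAX(u, v, w)
                d[v] = du + w
                pi[v] = u
                heapq.heappush(pq, (d[v], v))
    return d, pi

adj = {
    "s": [("t", 10), ("y", 5)],
    "t": [("x", 1), ("y", 2)],
    "y": [("t", 3), ("x", 9), ("z", 2)],
    "x": [("z", 4)],
    "z": [("s", 7), ("x", 6)],
}
d, pi = dijkstra(adj, "s")
print(d)
```

When a vertex is popped for the first time its estimate is already final, which is the invariant the correctness proof establishes.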
30Correctness
- Bellman-Ford's Algorithm: Show that d[v] = δ(s, v), for every v, after |V| - 1 passes.
- Dijkstra's Algorithm: For each vertex u ∈ V, we have d[u] = δ(s, u) at the time when u is added to S.
31NP-completeness
- Algorithmic vs Problem Complexity
- Class of P problems
- Tractable/Intractable/Unsolvable problems
- NP algorithms and NP problems
- P = NP?
- Reductions and their implication
- NP-completeness and examples of problems
- How do we prove a problem NP-complete?
- Satisfiability problem and its variations
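The gap between checking and finding a solution can be illustrated on satisfiability. In this sketch a CNF formula is a list of clauses, and each clause is a list of nonzero integers where 3 stands for x3 and -3 for its negation; this encoding and the function names are assumptions for the example.

```python
from itertools import product

def verify_cnf(clauses, assignment):
    """Check a proposed assignment in time linear in the formula size."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def brute_force_sat(clauses, num_vars):
    """Try all 2**num_vars assignments; exponential in the number of variables."""
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: b for i, b in enumerate(bits)}
        if verify_cnf(clauses, assignment):
            return assignment
    return None

# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
formula = [[1, -2], [2, 3], [-1, -3]]
print(verify_cnf(formula, {1: True, 2: True, 3: False}))
print(brute_force_sat(formula, 3))
```

Verifying a given assignment is fast, which is what places SAT in NP; no polynomial-time method for finding one is known.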